Preface


There are no tall towers of advanced differential geometry in this book. The objective of this book is to dig 
deep tunnels beneath the tall towers of advanced theory to investigate and reconstruct the foundations. 


This book has not been peer-reviewed, and probably never will be. The best reviewer is the reader when the 
electronic version is free. The peers will have little interest in this book because it is primarily concerned 
with the meaning of mathematics, not the serious business of discovering new theorems. So the reader must 
judge everything on its own merits. Abundant references to the authoritative standard literature are given 
to help judge the credibility of assertions made here, particularly those which are difficult or controversial. 


The purpose here is not to fit in with the current literature, but rather to examine some of that literature 
with the intention of reconstructing it into a more unified systematic framework. Doing things in the usual 
way is not the prime objective, but this book is not iconoclastic. Its purpose is to unify the scattered pieces. 


This book aims to reconcile the many differential geometry approaches which have developed and diverged 
during the last two hundred years. A core objective has been the removal of the need for intuition in proofs 
and definitions. A powerful tool for such “logicisation” is the ubiquitous use of logical quantifiers. The cost 
of replacing intuition and informality with detailed logical arguments is the excessive size of this book. 
Part IV of this book is currently being rewritten because it is too incomplete and inconsistent. Anything 
which looks unfinished probably is unfinished. Work is still in progress. The creative process for producing 
this book is illustrated in the following diagram. These processes are all happening concurrently. 


[Diagram: books → brain → desk (notes) → workstation → web server → Internet, with the successive arrows labelled read, write, type, upload, download.]


Unlike a jigsaw puzzle, no book can ever be complete. Books are abandoned, not completed. There is no 
“last piece” which can be placed in the puzzle to make it gap-free and perfect.


The purpose of this book was to replace the chaotic hustle and bustle of the differential geometry marketplace
with the serene order of a well-curated ornamental garden. This has failed, but there have been some
successes in some areas. There are too many gaps for one person to fill in one lifetime. Hopefully, when this 
book is finally “abandoned” several months from now, at least some aspects of this book may be useful to 
someone, somewhere, some time. 


January 2023 Dr. Alan U. Kennington 
Melbourne, Australia 


Disclaimer 


The author of this book disclaims any express or implied guarantee of the fitness of this book for any purpose.
In no event shall the author of this book be held liable for any direct, indirect, incidental, special, exemplary, 
or consequential damages (including, but not limited to, procurement of substitute services; loss of use, data, 
or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict 
liability, or tort (including negligence or otherwise) arising in any way out of the use of this book, even if 
advised of (or knew or should have known) the possibility of such damages. 


Biography 

The author was born in England in 1953 to a German mother and Irish father. The family migrated in 1963
to Adelaide, South Australia. The author graduated from the University of Adelaide in 1984 with a Ph.D. in 
mathematics. He was a tutor at University of Melbourne in 1984, research assistant at Australian National 
University (Canberra) in early 1985, Assistant Professor at University of Kentucky for the 1985/86 academic 
year, and visiting researcher at University of Heidelberg, Germany, in 1986/87. From 1987 to 2010, the 
author carried out research and development in communications and information technologies in Australia, 
Germany and the Netherlands. His time and energy since then have been consumed mostly by this book. 


This book is not “Differential Geometry Made Easy”. If you think differential geometry is easy, you haven’t
understood it! This book aims to be “Differential Geometry Without Gaps”, although a finite book cannot
fill every gap. Recursive gap-filling entails a broad and deep traversal of the catchment area of differential 
geometry. This catchment area includes a large proportion of undergraduate mathematics. 


This is a definitions book, not a theorems book. Definitions introduce concepts and their names. Theorems
assert properties and relations of concepts. Most mathematical texts give definitions to clarify the meaning 
of theorems. In this book, theorems are given to clarify the meaning of definitions. To be meaningful, 
many definitions require prior proofs of existence, uniqueness or properties. So some basic theorems are 
unavoidable. Some theorems are also given to motivate definitions, or to clarify the relations between them. 
The intention of this definition-centred approach is to assist comprehension of the theorem-centred literature.
The focus is on the foundations of differential geometry, not advanced techniques, but the reader who has a
confident grasp of basic definitions should be better prepared to understand advanced theory. 


To study differential geometry, the reader should already have a good understanding of mathematical logic, 
set theory, number systems, algebra, calculus and topology. These topics are presented in preliminary 
chapters here, but it is preferable to have made prior acquaintance with them. This book is targeted at the 
graduate perspective although much of the subject matter is typically taught in undergraduate courses. 


The central concepts of differential geometry are coordinate charts, tangent vectors, the exterior calculus, 
fibre bundles, connections, curvature and metric tensors. The intention of this book is to give the reader 
a deeper understanding of these concepts and the relations between them. The presentation strategy is to 
systematically stratify all differential geometry concepts into structural layers. 


This is not a textbook. It is a thoughtful investigation into the meanings of differential geometry concepts. 
Differential geometry imports a very wide range of definitions and theorems from other areas. Imperfect 
understanding of fundamental concepts can result in an insecure or even false understanding of advanced 
concepts. Therefore it is desirable to better understand the foundations. But the recursive definition of a 
concept in terms of other concepts inevitably leads to a dead end, a loop, or an infinite chain of definitions. 
Ultimately one must ask the ontology question: What is this mathematical concept in itself? The ontology
trail leads outside pure mathematics to other disciplines and into the minds of mathematicians. 

The author intends to use this book as a resource for writing other books. When this unabridged version 
has been released, the author may write a shorter version which omits the least popular technicalities. The 
shorter version may be titled “Differential Geometry Made Easy”. It might sell a lot of copies! 


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg.html
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 
1. Introduction


1.1. Structural layers of differential geometry 


The following table summarises the progressive build-up of the layers of structure of differential geometry in 
the chapters of this book. (Chapters in parentheses are concerned with non-manifold preliminaries.) 


layer                   main concept   structure                                         chapters
0  point-set layer      points         set of points with no topological structure       (7-30)
1  topological layer    connectivity   topological space: open neighbourhoods            (31-39) 47-50
2  differential layer   vectors        atlas of differentiable charts; tangent bundle    (40-46) 51-66
3  connection layer     parallelism    affine connection on the tangent bundle           67-72
4  metric layer         distance       Riemannian metric tensor field                    73-75


The following table shows structural layers required by some important concepts. For example, Jacobi fields 
may be defined for an affine connection or Riemannian metric, but not if given only differentiable structure. 


    concepts                    structural layers where concepts are meaningful
                                point   topology  differentiable  affine      riemannian
                                set               structure       connection  metric
0   cardinality of sets         yes     yes       yes             yes         yes
1   interior/boundary of sets           yes       yes             yes         yes
    connectivity of sets                yes       yes             yes         yes
    continuity of functions             yes       yes             yes         yes
2   tangent vectors                               yes             yes         yes
    differentials of functions                    yes             yes         yes
    tensor bundles                                yes             yes         yes
    differential forms                            yes             yes         yes
    Lie bracket                                   yes             yes         yes
    Lie derivative                                yes             yes         yes
    exterior derivative                           yes             yes         yes
    Stokes theorem                                yes             yes         yes
3a  connections                                                   yes+        yes
    curvature form [pfb]                                          yes+        yes
    covariant derivative [vb]                                     yes+        yes
    Riemann curvature [vb]                                        yes+        yes
3b  geodesic curves                                               yes         yes
    geodesic coordinates                                          yes         yes
    Jacobi fields                                                 yes         yes
    torsion tensor                                                yes         yes
    convex sets and functions                                     yes         yes
    Ricci curvature                                               yes         yes
4   distance between points                                                   yes
    length of a vector                                                        yes
    angle between vectors                                                     yes
    volume of a region                                                        yes
    area of a surface                                                         yes
    normal coordinates                                                        yes
    sectional curvature                                                       yes
    scalar curvature                                                          yes
    Laplace-Beltrami operator                                                 yes


Concepts marked “yes+” are meaningful for general connections on differentiable fibre bundles, not just for 
affine connections on tangent bundles. Thus “3a” denotes the sub-layer of connections on differentiable fibre 
bundles, whereas “3b” denotes the sub-layer of affine connections on tangent bundles. There are also some 
other sub-layers and sub-sub-layers, for example for vector bundles [vb] and principal fibre bundles [pfb]. 
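
The layering in the two tables above can be read as a lookup: each concept has a lowest structural layer at which it is meaningful, and it remains meaningful at every higher layer. The following Python sketch encodes a few rows of the second table under that reading. The names `Layer`, `MIN_LAYER` and `is_meaningful` are hypothetical and are not taken from this book, and the sketch deliberately ignores the 3a/3b sub-layers and the [vb] and [pfb] distinctions.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Structural layers 0 to 4, ordered so that a higher layer includes the lower ones."""
    POINT_SET = 0
    TOPOLOGY = 1
    DIFFERENTIABLE = 2
    AFFINE_CONNECTION = 3
    RIEMANNIAN_METRIC = 4


# Lowest layer at which each concept is meaningful (a few rows of the table only).
MIN_LAYER = {
    "cardinality of sets": Layer.POINT_SET,
    "continuity of functions": Layer.TOPOLOGY,
    "tangent vectors": Layer.DIFFERENTIABLE,
    "Lie derivative": Layer.DIFFERENTIABLE,
    "Jacobi fields": Layer.AFFINE_CONNECTION,
    "sectional curvature": Layer.RIEMANNIAN_METRIC,
}


def is_meaningful(concept: str, available: Layer) -> bool:
    """Return True if the concept can be defined given the available structure."""
    return available >= MIN_LAYER[concept]


# Jacobi fields need at least an affine connection.
print(is_meaningful("Jacobi fields", Layer.DIFFERENTIABLE))
print(is_meaningful("Jacobi fields", Layer.RIEMANNIAN_METRIC))
```

A linear order is enough for this simplified view because every "yes" row in the table extends to all layers on its right.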
1.2. Topic flow diagram 


Chapters 2 to 48 present various preliminary topics so as to avoid interrupting later chapters with frequent 
interpolations of prerequisites. These topics include logic, set theory, relations, functions, order and numbers 
(Chapters 3-16), algebra (Chapters 17-20), linear algebra (Chapters 22-26), multilinear and tensor algebra
(Chapters 27-30), topology (Chapters 31-39), differential and integral
calculus (Chapters 40-46) and fibre bundles (Chapters 21, 47-48).
The differential geometry of manifolds includes topological manifolds
(Chapters 49-50), differentiable manifolds (Chapter 51), general and 
affine connections (Chapters 67-72), and manifolds with a Riemannian 
or pseudo-Riemannian metric tensor field (Chapters 73-75). 


This book tries to clarify which definitions and theorems belong to four 
levels of manifold structure: topological structure (Chapters 49-50), 
differentiable structure (Chapters 51-61), affine connection structure 
(Chapters 67-72), and Riemannian metric structure (Chapters 73-75). 
Lie groups and differentiable fibre bundles (Chapters 62-66) are in some 
sense preliminary topics, but like tensor algebra (Chapters 27-30) and 
topological fibre bundles (Chapters 47-48), they may also be regarded 
as core topics of differential geometry. 


The topic flow diagram shows the progressive build-up of algebraic structure from “sets and numbers” to
“tensor algebra”. Analytical topics “topology” and “calculus” are combined with “tensor algebra” to define
“differentiable manifolds”. Then adding “topological fibre bundles” to this yields “differentiable fibre
bundles” on which parallelism and connections may be defined. In particular, “affine connections” are
defined on tangent bundles. Adding a metric tensor field leads to “Riemannian manifolds”. Adding a
space-time metric tensor field leads to “pseudo-Riemannian manifolds”.

[Topic flow diagram. Recoverable node labels: sets and numbers; algebra; topology; tensor algebra; differentiable fibre bundles; connections on fibre bundles; Riemannian manifolds; pseudo-Riemannian manifolds.]


1.3. Chapter groups


The page counts for the chapter groups outlined in Remark 1.3.1 are as follows. 


pages   chapter group
   14   introduction
  564   Part I: foundations
   20     general comments
  178     mathematical logic
  148     sets, relations, functions
  218     order, numbers
  412   Part II: algebra
  152     semigroups, groups, rings, fields, modules
  260     linear algebra, tensor algebra
  532   Part III: topology and analysis
1.3.1 REMARK: Chapters within each chapter group.

Chapter 1 is a general introduction.
Chapters 40 to 46 introduce analytical topics, namely the differential and integral calculus.


1.4. Objectives and motivations 


1.4.1 REMARK: The original book plan has suffered “mission creep” and “project drift”. 

The initial plan for this book in 1986 was to combine the topics in a dozen or so differential geometry related 
articles in the Mathematical Society of Japan’s Encyclopedic dictionary of mathematics [112] into a compact, 
coherent presentation in a logical order with uniform notation, together with prerequisites and further 
details from other books. Unfortunately the original target length of about 50 pages has been exceeded! 
The recursive catchment area of differential geometry prerequisites is a surprisingly large proportion of 
undergraduate mathematics. Recursive gap-filling is an almost never-ending process, reminiscent of the 
Sorcerer’s Apprentice’s broom-splitting. (See “Der Zauberlehrling”, by Goethe [493], pages 120-123.)


1.4.2 REMARK: Motivations for this book. 
The motivations for this book include the following. 


(1) Unify the definitions and notations of the various formalisms and styles of differential geometry. 
(2) Systematise differential geometry by organising concepts into clear-cut structural layers. 


(3) Integrate differential geometry with its prerequisites. This requires unified definitions and notations from 
the lowest layers of mathematical logic and set theory to the highest layers of differential geometry. 


(4) Provide a background resource for mathematics and physics students studying differential geometry. 
The principal application contexts are pure geometry, gravity theories and gauge theories. 


(5) Facilitate the generalisation of partial differential equations theory from flat space to curved space. 
(6) Provide a “source book” for the author to create smaller, beginner-friendlier books. 


1.4.3 REMARK: Book form. Textbook versus monograph versus survey versus investigation. 

A textbook is an educational course-in-a-book, used in a class or for self-study. A monograph is an orderly 
presentation of a research area. A survey is an overview of the ideas or developments in a subject. An 
investigation is an attempt to bring clarity or insight to an intriguing question, possibly suggesting new 
ideas for solutions. This book is not a textbook because it has no exercises. Nor can it be a research 
monograph because the subject matter is mostly well known. This book was initially intended to be a 
survey of the disparate and fragmented literature (mostly textbooks) on differential geometry, hopefully 
bringing clarity by comparing and contrasting various approaches. But the author’s desire to understand 
concepts more deeply, not just acquire skill in their application, has led him reluctantly into a prolonged 
investigation, like a journalist persistently sniffing out a story over many years because a routine report 
turned up some surprising developments. This book is a report of that journalistic investigation. 


A classic example of this genre is George Boole’s 1854 book, “An investigation of the laws of thought” [342]. 
In philosophy, it is not unusual to see the words “essay”, “treatise” or “enquiry” in a book title. (For example, 
de Montaigne [456, 457]; Locke [463]; Berkeley [453]; Hume [460].) This book is thus an investigation, essay, 
treatise or enquiry. Implicit in such titles is the possibility, or even the expectation, of failure! To judge this 
book as a textbook or monograph would be like judging the cat as fox-hunter, or the dog as mouse-catcher. 


1.4.4 REMARK: The original motivation for this book was elliptic second-order boundary value problems. 
In 1986 the author started trying to generalise his PhD research on certain geometric properties of solutions 
of second-order boundary and initial value problems from flat space to differentiable manifolds. This required 
some computations for parallel transport of second-order differential operators along geodesics in terms of 
the curvature of the point-space. The author could not find the necessary definitions in the literature. 


Looking beyond the author’s own research area, it seemed that the existence and regularity theory for partial 
differential equations in flat space, particularly for boundary and initial value problems, should be extended 
to curved space. If the universe is not flat, much PDE theory could be inapplicable. Many of the techniques of 
PDE theory depend on special properties of Euclidean space. So it seemed desirable to formulate differential 
geometry so as to facilitate “porting” PDE theory to curved space. For this task, the author could not 
find suitable differential geometry texts. The more he read, the more confusing the subject became because 
of the multitude of contradictory definitions and formalisms. The origins and motivations of differential 
geometry concepts are largely submerged under a century of continuous redefinition and rearrangement. The 
differential geometry literature contains a plethora of mutually incomprehensible formalisms and notations. 
Writing this book has been like creating a map of the world from a hundred regional maps which use different 
coordinate systems and languages for locating and naming geographical features. 
1.4.5 REMARK: Michael Spivak tried to reunify differential geometry in 1970.

In July 2002, long after initially writing the comments in Remark 1.4.4 about a “multitude of contradictory 
definitions and formalisms” and a “plethora of mutually incomprehensible formalisms and notations”, the 
author acquired a copy of Michael Spivak’s 5-volume DG book. The first two paragraphs of the preface to 
the 1970 edition ([37], volume 1, page ix) contain similar comments. 


[...] no one denies that modern definitions are clear, elegant, and precise; it’s just that it’s impossible
to comprehend how any one ever thought of them. And even after one does master a modern 
treatment of differential geometry, other modern treatments often appear simply to be about totally 
different subjects. 


Since 1970, little has changed. Now the DG literature is more like it was then than it ever was. One needs 
some kind of Rosetta stone to translate between the various languages in which the subject is expressed. 


1.4.6 REMARK: Computation is the robot’s task. Understanding mathematics is the human’s task. 

To suppose that mathematics is the art of calculation is like supposing that architecture is the art of 
bricklaying, or that literature is the art of typing. Calculation in mathematics is necessary, but it is an 
almost mechanical procedure which can be automated by machines. The mathematician does much more 
than a computerised mathematics package. The mathematician formulates problems, reduces them to a form 
which a computer can handle, checks that the results correspond to the original problem, and interprets and 
applies the answers. Therefore the mathematician must not be too concerned with mere computation. That 
is the task of the robot. The task of the human is to understand mathematics. 


Great expertise in theorem-proving and computation may be compared with expert horse-riding. Even the 
best horse-rider may know little about the internal anatomy, physiology, chemistry and physics of the horse. 
This book is more concerned with the horse itself than with the art of riding it. 


1.4.7 REMARK: It is important to identify the class of each mathematical object before using it. 
For every mathematical object, one may ask the following questions. 


(1) How do I perform calculations with this object? 
(2) In which space does this object “live”? 
(3) What is the “essential nature” of this object? 


Students who need mathematics only as a tool for other subjects may be taught not much more than how to 
perform calculations. This often leads to incorrect or meaningless results because of a lack of knowledge of 
which kind of space each object “lives” in. Different spaces have different rules. Question (3) is important 
to help guide one’s intuition to form conjectures and discover proofs and refutations. One must have some 
kind of mental concept of every class of object. This book tries to give answers to all three of the above 
questions. Human beings are half animal, half robot. It is important to satisfy the animal half’s need for 
motivation and meaning as well as the robot half’s need to make computations. 


If one cannot determine the class of object to which a symbolic expression refers, the expression could be 
a meaningless “pseudo-notation”. Such expressions are sometimes encountered in the differential geometry 
literature. In theoretical physics, explicit indications of set membership are rare. This book tries to replace 
pseudo-notations with well-defined, meaningful expressions. An effective tactic for making sense of difficult 
mathematical ideas is to determine the set-membership of every expression and sub-expression. A major 
objective of this book is to help clarify concepts by always making set-membership explicit. The theoretical 
physics literature could be made more comprehensible to mathematicians by adopting this approach. 


1.4.8 REMARK: Understanding is for wimps? 

It could be argued that “understanding is for wimps”. Anyone who needs to think long and hard about 
the meanings of mathematical concepts and procedures surely does not have what it takes to do advanced 
mathematics. In this author’s opinion, no one should be embarrassed by the desire to understand what 
the symbols and manipulations in mathematics actually mean. Difficulty in understanding a mathematical 
concept may be due to an ambiguous or faulty explanation, and sometimes the concept may be much deeper 
than it first seems, even to the person presenting it. Sometimes the concept may in fact be meaningless or 
self-contradictory. To paraphrase a famous song title, it don’t mean a thing if it don’t mean a thing. 
1.4.9 REMARK: Minimalism: This book attempts to remove unnecessary constraints on definitions.

In this book, an attempt is made to determine what happens if many of the apparently arbitrary choices 
and restrictions in definitions are dropped or relaxed. If removing or weakening a requirement results in a 
useless or meaningless definition, this helps to clarify the meaning and motivate the requirements. 


There is a kind of duality between definitions and theorems. If a definition is generalised, its set of theorems 
is typically restricted. If a definition is more constrained, its theorems can be strengthened. Conversely, to 
strengthen a theorem, one may need to constrain one or more definitions which are inputs to the theorem. 


For each input definition for each theorem, there is a maximal definition which makes that theorem valid for a 
given output, assuming that the other inputs are fixed. In many places in this book, such maximal definitions 
are sought. Since different authors have different favourite theorems, many of the most important definitions 
in mathematics have multiple versions, each optimised for a different set of favourite target theorems. 


1.4.10 REMARK: Minimalism: Excessive assumptions of smoothness hinder applications. 

Most DG textbooks assume a high degree of smoothness (e.g. $C^\infty$) for functions and manifolds to make their
work easier. This hinders applications to functions and manifolds which arise, for example, as solutions of
initial and boundary value problems for physical models, where analysts often have to work very hard to
prove that a problem has solutions with even limited regularity such as $C^1$ or $C^2$.


Anyone wishing to apply smooth DG theory to non-smooth manifolds and functions must first determine 
whether smooth results are still valid. This book tries to specify the weakest possible regularity assumptions 
for definitions and theorems. Weakening the input assumptions for definitions and theorems also gives deeper 
insight into how and why they work. But most importantly, their range of applicability is enhanced. 


1.4.11 REMARK: Minimalism: Sharp theorems and definitions are best. 

A “sharp” theorem is a theorem which is as strong as possible within some class of possibilities. For 
example, if a theorem is proved for a $C^k$ input (i.e. assumption) whereas the same output (i.e. assertion)
can be obtained for a $C^{k-1}$ input, the theorem requiring a $C^k$ input would not be sharp. Similarly, if the
output is shown to be $C^k$ when in fact it is possible to prove a $C^{k+1}$ output, such a theorem would not
be sharp. A theorem can be shown to be sharp by demonstrating the existence of a counterexample which
makes strengthening or “sharpening” of the theorem impossible within the specified class of possibilities. (In
this example, the “class of possibilities” is the set of all $C^k$ regularity levels.)
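
As a worked illustration of sharpness (a standard calculus example supplied here as an assumption, not one taken from this book), consider the theorem that the indefinite integral of a $C^k$ function on the real line is $C^{k+1}$. The functions $f_k$ and $F_k$ below are hypothetical names chosen for this sketch.

```latex
\[
  f_k(x) = x^k\,|x|, \qquad
  f_k^{(k)}(x) = (k+1)!\,|x|, \qquad
  F_k(x) = \int_0^x f_k(t)\,dt = \frac{x^{k+1}\,|x|}{k+2}.
\]
\[
  F_k^{(k+1)}(x) = f_k^{(k)}(x) = (k+1)!\,|x|
  \quad\Longrightarrow\quad
  F_k \in C^{k+1}(\mathbb{R}) \setminus C^{k+2}(\mathbb{R}).
\]
```

The input $f_k$ is $C^k$ because its $k$-th derivative is a continuous multiple of $|x|$. The output $F_k$ fails to be $C^{k+2}$ because $|x|$ is not differentiable at $0$. So the $C^{k+1}$ conclusion cannot be strengthened for all $C^k$ inputs, and the theorem is sharp in its output within the class of $C^k$ regularity levels.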


The assumptions of definitions can also be “sharpened”. For example, a continuous real function can be 
integrated, but much weaker assumptions than continuity still yield a meaningful integral. Similarly, many 
definitions for linear spaces may be easily extended to general modules over rings, and many definitions for 
Riemannian manifolds are perfectly meaningful for affinely connected manifolds. 


Whenever it has been relatively easy to do so, theorems and definitions have been made as sharp as possible 
in this book, although this has been avoided when the cost has clearly outweighed the benefits. Sometimes 
one must sacrifice some truth for beauty, although mostly it is better to sacrifice some beauty for truth. 


1.4.12 REMARK: Successful revolutions require a deep understanding of fundamentals. 

The current orthodoxy in cosmology may possibly require an overhaul some time soon. If general relativity
needs to be put on the operating table for major surgery, it will be important to have a deep understanding 
of the mathematical machinery underlying it. This justifies the detailed investigation of the fundamentals of 
differential geometry. It is perilous to overhaul, rebuild or redesign machinery which one does not understand 
in depth. Every part has a purpose which must be understood if surgery is to be successful. (This comment 
was composed in the late 1980s or early 1990s. It has seemed even more relevant every year since then.) 


1.4.13 REMARK: Folklore should be avoided. Methods and assumptions should be explicitly documented. 
Differential geometry has developed multiple languages and dialects for expressing its extensive network of 
concepts during the last 190 years. Familiarity with the folklore of each school of thought in this subject 
sometimes requires a lengthy apprenticeship to learn its languages and methods. It is the author's belief 
that a competent mathematician should be able to learn differential geometry from books alone without the 
need for initiation into the esoteric mysteries by the illuminated ones. The validity of theorems should follow 
from objective logical argument, not subjective geometric intuition. Differential geometry should receive the 
same kind of logically rigorous treatment that algebra, topology and analysis are subjected to. 
The author is reminded of the foreword to the counterpoint tutorial “Gradus ad Parnassum” ([497], page 17)
by Johann Joseph Fux, published in 1725. (For the original Latin, see Fux [490], praefatio, second page.) 


I do not believe that I can call back composers from the unrestrained insanity of their writing to 
normal standards. Let each follow his own counsel. My object is to help young persons who want 
to learn. I knew and still know many who have fine talents and are most anxious to study; however, 
lacking means and a teacher, they cannot realize their ambition, but remain, as it were, forever 
desperately athirst. 


Joseph Haydn was one of the composers who learned counterpoint from Fux’s book because he “lacked means 
and a teacher”. This differential geometry book is also aimed at those who lack means and a teacher. The 
“Gradus ad Parnassum” was much more successful than previous counterpoint tutorials because Fux arranged 
the ideas in a logical order, starting from the most basic two-part counterpoint (with gradually increasing 
rhythmic complexity), then three-part counterpoint in the same way, and finally four-part counterpoint in 
the same graduated manner. Fux [497], page 18, said about this systematic approach: “When I used this 
method of teaching I observed that the pupils made amazing progress within a short time.” This book tries 
similarly to arrange all of the ideas which are required for differential geometry in a clear systematic order. 


1.4.14 REMARK: Concentrated focus on definitions gives better understanding than peripheral perception. 
A vegetarian cook will generally cook vegetables better than the meat-centric cook who regards vegetables 
as a necessary but uninteresting background. And a definition-centric book will generally explain definitions 
better than a theorem-centric book which regards definitions as a necessary but uninteresting background. 


1.4.15 REMARK: Truth is not decided by majority vote. Seek truth from facts. 
This author has strongly believed since the age of 16 years that truth is not decided by majority vote. After 
a test had been handed back to the chemistry class, he noticed that the two questions where he had lost 
points had been marked incorrectly. So he raised this with the teacher. After about 20 minutes of open 
discussion, the teacher agreed that it had been marked incorrectly. Then this 16-year-old student started 
to make the case for the other lost mark. The teacher lost patience and asked for a show of hands. The 
majority voted on the side of the teacher. The teacher said this settled the matter and the aggrieved student 
was wrong. Since this student had studied chemistry by reading more broadly and intensively than expected, 
it seemed somewhat unjust that a majority vote had decided “the truth”. The disillusionment with science 
that arose from this experience is doubtless one of the historical causes of the present book. If the majority 
of mathematicians, teachers and textbooks agree on a point, this does not necessarily imply that it is correct. 
Assertions should be either true or false in themselves, not determined by majority vote. All majority beliefs 
should be examined and re-examined by free-thinking individuals — because the majority might be wrong! 
As Máo Zé-Dong and Déng Xiáo-Píng used to say: “Seek truth from facts.” (Shí shì qiú shì. 实事求是.)


1.5. Some minor details of presentation 


1.5.1 REMARK: A systematic notation is adopted for important sets of numbers. 
The mathematical literature has a wide range of notations for sets of numbers. This book uses the notations 
which are summarised in Table 1.5.1. (See Chapters 12, 14, 15 and 16 for details.) 


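number system        all                      positive                                          non-negative                                negative                   non-positive
integers             $\mathbb{Z}$             $\mathbb{Z}^+$, $\mathbb{N}$                      $\mathbb{Z}_0^+$, $\omega$                  $\mathbb{Z}^-$             $\mathbb{Z}_0^-$
rationals            $\mathbb{Q}$             $\mathbb{Q}^+$                                    $\mathbb{Q}_0^+$                            $\mathbb{Q}^-$             $\mathbb{Q}_0^-$
reals                $\mathbb{R}$             $\mathbb{R}^+$                                    $\mathbb{R}_0^+$                            $\mathbb{R}^-$             $\mathbb{R}_0^-$
extended integers    $\overline{\mathbb{Z}}$  $\overline{\mathbb{Z}}^+$, $\overline{\mathbb{N}}$  $\overline{\mathbb{Z}}_0^+$, $\omega^+$     $\overline{\mathbb{Z}}^-$  $\overline{\mathbb{Z}}_0^-$
extended rationals   $\overline{\mathbb{Q}}$  $\overline{\mathbb{Q}}^+$                         $\overline{\mathbb{Q}}_0^+$                 $\overline{\mathbb{Q}}^-$  $\overline{\mathbb{Q}}_0^-$
extended reals       $\overline{\mathbb{R}}$  $\overline{\mathbb{R}}^+$                         $\overline{\mathbb{R}}_0^+$                 $\overline{\mathbb{R}}^-$  $\overline{\mathbb{R}}_0^-$
Table 1.5.1 Summary of notations for number systems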


The notation $\omega$ is used for the non-negative integers $\mathbb{Z}_0^+$ when the ordinal number representation is meant.
Thus $0 = \emptyset$, $1 = \{0\}$, $2 = \{0, 1\}$ and so forth. Then $\omega^+ = \omega \cup \{\omega\}$. The notation $\mathbb{Z}_0^+$ is preferable when
the representation is not important. Note that the plus-symbol in $\mathbb{Z}_0^+$ indicates the inclusion of positive
numbers, whereas the plus-symbol in $\omega^+$ means that an extra element is added to the set $\omega$.


The notation $\mathbb{N}$ is mnemonic for the “natural numbers” $\mathbb{Z}^+$. (Some authors include zero in the natural
numbers. See Remark 14.1.1.) The bar over a set of numbers indicates that the base set is extended by
elements $\infty$ and $-\infty$. So $\overline{\mathbb{Z}} = \mathbb{Z} \cup \{\infty, -\infty\}$ and $\overline{\mathbb{R}} = \mathbb{R} \cup \{\infty, -\infty\}$. Positivity and negativity restrictions
are then applied to these extended base sets. Thus $\overline{\mathbb{Z}}_0^+ = \mathbb{Z}_0^+ \cup \{\infty\}$ and $\overline{\mathbb{R}}^- = \mathbb{R}^- \cup \{-\infty\}$.

The notations $\mathbb{N}_n = \{1, 2, \ldots n\}$ and $\mathbb{Z}_n = \{0, 1, \ldots n - 1\} = n$ are used as finite index sets. These are often
used interchangeably in an informal manner. Thus $\mathbb{R}^n$ usually means $\mathbb{R}^{\mathbb{N}_n}$ in practice.
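
The ordinal representation and the two finite index sets can be made concrete with a small sketch. The following Python fragment is only an illustration, assuming that finite ordinals are modelled as nested frozensets. The names successor, ordinal, N_index and Z_index are hypothetical helpers which do not appear in this book.

```python
def successor(s: frozenset) -> frozenset:
    """Return s ∪ {s}. This is the same operation which gives ω⁺ = ω ∪ {ω}."""
    return s | frozenset([s])


def ordinal(n: int) -> frozenset:
    """Build the finite ordinal n = {0, 1, ..., n-1} as nested frozensets."""
    result = frozenset()          # 0 is the empty set
    for _ in range(n):
        result = successor(result)
    return result


def N_index(n: int) -> set:
    """The index set {1, 2, ..., n}."""
    return set(range(1, n + 1))


def Z_index(n: int) -> set:
    """The index set {0, 1, ..., n-1}, which has the same elements as the ordinal n."""
    return set(range(n))


zero, one, two = ordinal(0), ordinal(1), ordinal(2)
assert zero == frozenset()
assert one == frozenset([zero])          # 1 = {0}
assert two == frozenset([zero, one])     # 2 = {0, 1}
assert len(ordinal(5)) == len(Z_index(5)) == len(N_index(5)) == 5

# The same point of three-dimensional space as a function on either index set.
point_on_N3 = {1: 0.5, 2: -1.0, 3: 2.0}
point_on_Z3 = {0: 0.5, 1: -1.0, 2: 2.0}
assert set(point_on_N3) == N_index(3) and set(point_on_Z3) == Z_index(3)
```

The two dictionaries at the end show why the index sets can be interchanged informally: they carry the same three coordinates and differ only in the labels of the coordinates.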


1.5.2 REMARK: Mathematical symbols at the beginning of a sentence are avoided. 

When mathematical symbols appear immediately after punctuation, the sentence structure can become 
unclear. Therefore a concerted effort has been made to commence all sentences with natural language words. 
For similar reasons, end-of-sentence mathematical symbols at the beginning of a line are also avoided, like 
7. These punctuation-related considerations sometimes lead to slightly unnatural grammatical structure
of sentences. Likewise, a concerted effort has been made to rearrange sentences in order to avoid hyphenation
at line breaks. The text is also often rewritten to avoid a single word on a final line, which would look
bad.


1.5.3 REMARK: There are no foot-notes or end-notes. 

Mathematics is painful enough already without having to go backwards and forwards between the main text 
and numerous foot-notes and/or end-notes. Such archaic parenthetical devices cause sore eyes and loss of 
concentration. Their original purposes are no longer so relevant in the era of computer typesetting. They 
were formerly used to save money when typesetting postscripts, after-thoughts and revision notes by editors. 
The bibliography, on the other hand, does belong at the end of a book or article, pointed to from the text.


1.5.4 REMARK: All theorems are based on Zermelo-Fraenkel axioms except those which are tagged. 
Theorems which are based on non-Zermelo-Fraenkel axioms are tagged to indicate the axiomatic framework.
This is like listing ingredients on food labels. (See Remarks 4.5.5 and 7.1.11 for some axiom-system tags.) 


1.5.5 REMARK: Pseudo-notations to introduce definitions are avoided.
A definition is not a logical implication or equivalence. Nor is it an axiom or a theorem. A definition
is a shorthand for an often-used concept. Therefore the symmetric symbols “$=$” for equality or “$\equiv$” for
equivalence are not used in this book for definitions, but the popular asymmetric definition “assignment”
notations such as “$\overset{\mathrm{def}}{=}$” and “$:=$” are clumsy, unattractive and incorrect. Plain language is used instead.

1.5.6 REMARK: An empty-square symbol indicates the end of each proof. 

The square symbol “□” is placed at the end of each completed proof, with the same meaning as the Latin
abbreviation QED (Quod Erat Demonstrandum) which was customary in Euclidean geometry textbooks.
This kind of QED symbol is generally attributed to Paul Richard Halmos. In the 1990 edition of a 1958
book, Eves [353], page 149, wrote: “The modern symbol □, suggested by Paul R. Halmos, or some variant
of it, is frequently used to signal the end of a proof.” In 1955, Kelley [101], page vi, wrote the following.


In some cases where mathematical content requires “if and only if” and euphony demands something
less I use Halmos’ “iff.” The end of each proof is signalized by ▮. This notation is also due to Halmos.


Many authors have used a solid rectangle. (This leaves unanswered the question of who first used the more 
attractive, understated empty square symbol instead of the brash, ink-wasting solid bar.) 


1.5.7 REMARK: The chicken-foot symbol indicates specification-tuple abbreviations. 

The non-standard “chicken-foot” symbols “$-\!\!<$” and “$>\!\!-$” indicate abbreviations of specification tuples for
mathematical objects. (See Section 8.8.) This avoids embarrassing absurdities such as: “Let $X = (X, T)$ be
a topological space.” Clearly abbreviation, not equality, is intended in such situations.


1.5.8 REMARK: Every remark and theorem has a one-line summary. 

Above each remark and theorem is a one-line summary. This is intended to be a kind of “punch-line” 
or “take-home message”. Such “subsection titles” should enable the reader to more quickly recognise the
meaning of each titled subsection, and to optionally skip it if it is not of interest. A similar convention was
followed in Galileo's time, when marginal notes summarised the main points in the text. (See for example
Galileo [268]. This practice was also followed more recently by Misner/Thorne/Wheeler [292].)


1.5.9 REMARK: Each remark is a mini-essay.

Each of the 4571 remarks in this book may be thought of as a mini-essay. Remarks are usually related to 
the immediately preceding or following subsections, but not always. Some remarks may seem to duplicate or 
contradict other remarks. This is sometimes intentional. For every argument, there is a counter-argument. 
For every counter-argument, there is a counter-counter-argument. A book that has been composed over 
many years may not have the same level of internal consistency as a book that is written in six months. If 
a subsection (or section or chapter) seems unacceptable or worthless, just skip to the next one. 


The definitions, notations, theorems and proofs are the “business” or “science” channel of this book. The 
remarks and diagrams are the “interpretation” or “philosophy” channel. So this book is “in stereo”. 


1.5.10 REMARK: All chapters start on odd-numbered pages. 
Chapters start on odd-numbered pages so that they can be printed and bound as individual “booklets”. 


1.5.11 REMARK: Cross-references and index-references point to subsections, not page numbers. 

There are two main reasons to use subsection numbers instead of page numbers for cross-references and 
index-references. A subsection is typically much smaller than a page. So it makes references more precise, 
saving time hunting for text. And if the referenced text is in the middle of a long subsection, it is usually 
best to read the context at the beginning of the subsection first. 


1.5.12 REMARK: Italicisation of Latin and other non-English phrases. 
It is conventional to typeset non-English phrases in italics, e.g. modus ponens and ex falso quodlibet, as a
hint to readers that these are imported from other languages. This convention is often adopted here also. 


1.5.13 REMARK: Translations of non-English-language quotations. 
All English-language translations preceded by their non-English source text are composed by this author. 
These translations prioritise fidelity to the original, not the naturalness of the English. 


1.6. Differences from other differential geometry books 


1.6.1 REMARK: General differences in the style of presentation in this book. 
The following are some of the general differences in presentation between this book and the majority of 
differential geometry textbooks. 


(1) This is not a textbook. (See Remark 1.4.3.) 


(2) Definitions are the main focus rather than theorems. Theorems are presented only to support the 
presentation of definitions. 


(3) Substantial preliminary chapters present most of the prerequisites for the book. This avoids having to 
weave elementary material into more advanced material wherever needed. 


(4) Classes of mathematical objects are generally defined in terms of specification tuples. 


(5) An effort has been made to identify which class each mathematical object belongs to, and the domain and 
range classes for all functions. Such tidy book-keeping helps ensure that every expression is meaningful. 


(6) Predicate logic notation is used extensively for the presentation of definitions and theorems. Predicate 
logic is more precise than explanations in natural language, but most of the formal definitions are 
accompanied by ample natural-language commentary, especially when the logic could be perplexing. 


(7) Differential geometry concepts are presented in a progressive order within a framework of structural 
layers and sublayers. In particular, the Riemannian metric is not introduced until affine connections have 
been presented, and concepts which are meaningful without a connection are defined in the differentiable 
manifold chapters before connections are presented. (See Section 1.1 for further details.) 


(8) There are no exercises (because this is not a textbook). So the reader is not required to remedy numerous 
gaps in the presentation. Proving theorems is the best kind of exercise. Readers should write or sketch 
proofs for most theorems themselves before reading the solutions provided here. (Most of the 1817 
proofs in this book would be set as exercises in typical textbooks.) Proof-writing is the principal skill 
of pure mathematics. Anyone can make conjectures. The ability to provide a proof is the difference 
between the mathematician and the dilettante. The dilettante conjectures. The mathematician proves. 
Creating examples and diagrams to elucidate theorems and definitions is another important kind of 
exercise. Also very educational is to rewrite everything in one's own personal choice of notation. 


1.6.2 REMARK: Specific differences in the choices of definitions in this book.
The following are some specific differences between the way concepts are defined in this book compared to 
many or most other differential geometry books. 


(1) Mathematical logic is presented with semantics as the primary “reality”, while the linguistic component 
of logic is secondary. In other words, models are primary and theories are secondary. 


(2) Axioms of infinite choice are weeded out of definitions and theorems wherever possible. Equivalent
AC-free definitions and theorems are given instead. Theorems which require an infinite choice axiom
are tagged to discourage their use. They are presented only as intellectual curiosities because the
mathematical universes to which they apply are too metaphysical, mystical and mythical for this book.


(3) Finite ordinal numbers are defined in terms of predecessors instead of successors.


(4) Tensor spaces are defined as duals of spaces of multilinear functions.


(5) Ordinary and principal fibre bundles are combined as “fibre/frame bundles”. (See Section 47.13.)


(6) Tangent vectors are defined as constant-velocity line trajectories rather than as differential operators or
curve classes. (See Section 53.1 for “the true nature of tangent vectors”.)


(7) Higher-order tangent operators are defined for use in partial differential equations on manifolds.


(8) Connections are defined as horizontal lift maps on ordinary fibre bundles.


(9) The covariant derivative is defined in terms of a connection by applying a “drop function”.


(10) Riemannian metric tensors are defined primarily as half the Hessian of the square of the point-to-point
distance function, rather than as symmetric covariant tensors of degree 2.


As mentioned in Remarks 1.4.3 and 1.6.1, this is not a textbook, nor is it a monograph. A textbook should 
introduce a student to the standard culture of a subject. A monograph should summarise the current state 
of the art in a topic. But in an essay, the author is free to diverge from the mainstream a little in order to 
find some possibly better ways of doing things. As the title of this book suggests, the objective here is to 
reconstruct the subject, not to describe only what is already present in the literature. Therefore if this book 
does diverge in some points from the mainstream, this is an inevitable consequence of its main purpose. 
Hopefully at least some of the heretical views expressed here will be to the long-term benefit of the subject. 


1.6.3 REMARK: Topics which are outside the scope of this book. 
All books have “loose ends”. Otherwise they would never end. Even encyclopedias must limit their scope. 
(1) NBG set theory. Neumann-Bernays-Gödel set theory is not presented due to lack of time, regrettably.


(2) Model theory. Although model theory is alluded to, and some basic models (the cumulative hierarchy 
and the constructible universe) are defined, this topic is not developed beyond some basic definitions. 


(3) Category theory. The bulk handling facilities for morphisms in category theory are not used here. 


(4) Complex numbers. As far as possible, complex numbers are avoided. However, they are required for 
the unitary Lie groups which are unavoidable in gauge theory. 


(5) Topological vector spaces. General topological vector spaces are very briefly outlined in Section 39.1. 
Schwartz distributions and Sobolev spaces are not presented. Uniformities are also not presented. 


(6) Algebraic topology. Apart from some brief allusions, algebraic topology and combinatorial topology 
in general are avoided. Only “point set topology” is presented. 


(7) Partial differential equations. PDE theory would have expanded the book by at least 1000 pages. 
(8) Manifolds with boundaries. This topic is too large to fit within the time constraints. 


(9) Lie groups. Chapters 62-63 present some basic definitions for Lie groups and transformation groups, 
but their deeper group-theoretic, combinatorial topology and representation theory are not covered. 

(10) Probability theory. Almost nothing is said about probability at all. Consequently information 
geometry and quantum theory are not presented. 

(11) Theoretical physics. Although one of the principal motivations for this book is to illuminate some 
portions of the mathematical foundations underlying general relativity and gauge theory, GR-related 
concepts are limited to some basic definitions for pseudo-Riemannian manifolds in Chapter 75, and only 
some very basic “classical” gauge theory definitions are presented. Quantization is out of scope. 


1.7. How to learn mathematics


1.7.1 REMARK: The asterisk learning method focuses attention on difficult points which need it most. 
This is the “asterisk method of learning” which I discovered in 1975. It worked! 


(1) Find somewhere quiet. Turn off the radio (and your portable digital music player). A library reading 
room is best if it is a quiet library reading room. Preferably have a large table or desk to work on. 


(2) Open the book or lecture notes which you wish to study. 


(3) Start copying the relevant part of the book or lecture notes to a (paper) notebook by hand. If you 
are studying a book, you should paraphrase or write a summary of what you are reading. If you are 
studying lecture notes, you should copy everything and add explanations where required. 


(4) Whenever you copy something, ask yourself if you really understand it completely. In other words, you 
must understand every word in every sentence. As long as you are completely comfortable with what 
you are copying, keep going. 

(5) If you read something which is difficult to understand, stop and think about it until you understand it 
clearly. If a mathematical expression is unclear, determine which set or class each term in the expression 
belongs to. All sub-expressions in a complex expression must belong to some set or class. Every operator 
and function in an expression must act only on variables in the domain of definition of the operator or 
function. Draw arrow diagrams for all functions, domains and ranges. Draw diagrams of everything! 


(6) If you find something that you really can't understand after a long time, copy it to your notebook, but 
put an asterisk in the margin. This means that you have copied something that you did not understand. 


(7) While you continue copying, keep going back to the lines which are marked with an asterisk to see if 
you can understand them. If you find an explanation later, you can erase the asterisk. 


(8) When you have finished copying enough material for one sitting, look over your notes to see if you can 
understand the lines which still have an asterisk. If you have no asterisks, that means that you have 
understood everything. So you can progress to the next section or chapter of the text. 


(9) If you still have one or more asterisks left in your notes after a day or more, you should keep trying to 
understand the lines with the asterisks. Whenever you get some spare time and energy, just look at the 
lines with asterisks on them. These are the lines that need your attention most. 


(10) If you discuss your work with other people, especially with teachers or tutors, show them your notes 
and the lines with the asterisks. Try to get them to explain these lines to you. 


If you keep working like this, you will find that your study becomes very efficient. This is because you do 
not waste your time and energy studying things which you have already understood. 


I used to notice that I would spend most of my time reading the things which I did understand. To learn 
efficiently, it is necessary to focus on the difficult things which you do not understand. That's why it is so 
important to mark the incomprehensible lines with an asterisk. 


Copying material by hand is important because this forces the ideas to go through the mind. The mind is 
on the path between the eyes and the hands. So when you copy something, it must go through your mind! 


It is also important to develop an awareness of whether you do or do not really understand something. It is 
important to remove the mental blind spots which hide the difficult things from the conscious mind. 


When copying material, it is important to determine which is the first word where it becomes difficult. Read 
a difficult sentence until you find the first incomprehensible word. Focus on that word. If a mathematical 
expression is too difficult to understand, read each symbol one by one and ask yourself if you really know 
what each symbol means. Make sure you know which set or space each symbol belongs to. Look up the 
meaning of any symbol which is not clear. 


1.7.2 REMARK: First learn. Then understand. Insight requires ideas to be uploaded to the mind first. 

It is often said that learning is most effective when the student has insight. This assertion is sometimes 
given as a reason to avoid “rote learning”, because insight must precede the acquisition of ideas. But the
best way to get insight into the properties and relations of ideas is to first upload them into the mind and 
then make connections between them. For example, a child learning multiplication tables will notice many 
patterns and redundancies during or after rote learning. The best strategy is to learn first, then understand 
more deeply. Connections can only be made between ideas when they are in the mind to be connected. 


1.8. MSC 2010 subject classification


The following “wheel of fortune” shows where differential geometry fits into the general scheme of things.


[Figure: MSC 2010 subject classification wheel. Legible sector labels include “Group theory and generalizations” and “Mechanics of particles and systems”.]


Readers interested only in differential geometry, as opposed to logic, set theory, number systems, algebra, 
topology and calculus, may prefer to commence reading at Chapter 49. Foundational topics and pre-requisites 
in earlier chapters are abundantly cross-referenced where they are required or relevant. 


Part I


Foundations 


The informal outlines at the beginning of Parts I, II, III and IV of this book are not intended to be read. 
However, these outlines could possibly be of some value as an adjunct to the table of contents and index. 
The reader is recommended to skip immediately to Chapter 2. 


PART I: Foundations 


(1) Part I introduces logic, sets and numbers. Each of these foundation topics requires the other two. So
it is not possible to present them in strict define-before-use order. (This would be like trying to write
a dictionary where every word must be defined using only words which have already been defined.)
The definitions and theorems before and including Definition 7.2.4 are tagged as metamathematical or
logical, but most definitions and theorems thereafter are based on Zermelo-Fraenkel set theory and are
presented in define-before-use order.

(2) Chapters 3, 4, 5 and 6 on mathematical logic are presented before all other topics because logic is
the most fundamental, ubiquitous and deeply pervasive part of mathematics. In particular, it is the
necessary foundation for axiomatic set theory.

(3) Chapter 7 introduces the Zermelo-Fraenkel set theory axioms, which provide the fundamental framework
for modern mathematics. Chapters 8, 9 and 10 present the most basic structures and theorems of
mathematics in terms of the ZF set theory foundation.

(4) Chapters 11, 12 and 13 introduce order, ordinal numbers and cardinality.

(5) Chapters 14, 15 and 16 introduce natural numbers, integers, rational numbers and real numbers.


CHAPTER 2: General comments on mathematics


(1) Chapter 2 is a preamble (or “preliminary ramble”) to some of the philosophical themes in this book.

(2) Remark 2.1.1 states that mathematics needs to be bootstrapped into existence. There is no solid
“bedrock” of mathematics which is self-evident and upon which everything else can be securely based.
Therefore Part I of this book cannot be written in a strictly reductionist manner from a small number
of self-evident axioms.


CHAPTER 3: Propositional logic 


(1) Remark 3.0.1 explains why this book commences with logic rather than numbers or sets.

(2) Section 3.1 asserts that logical propositions are either true or false, and cannot be both true and false.
This is the definition of a logical proposition. The validity of methods of logical argumentation must
be judged by their ability to correctly infer truth values of propositions. Truth is not decided by
argumentation. Truth can only be discovered through argumentation.

(3) Notation 3.2.2 introduces the most basic concepts in the book, namely the proposition-tags F (“false”)
and T (“true”). Definition 3.2.3 introduces “truth value maps” which associate true/false proposition-tags
with propositions in some “concrete proposition domain”. These concepts provide the initial basis
for all of the logic in this book.

(4) Definition 3.3.2 introduces “proposition name maps” which associate names with propositions in a
concrete proposition domain. This approach to “truth” is intended to be a description of how logic is
done in the real world, not an abstract system which is disconnected from how real-world logic is done.

(5) Definition 3.4.4 introduces “knowledge sets”. These describe the state of knowledge of some person or
community or system about a given concrete proposition domain.


[ draft: UTC 2023-1-3 Tuesday 00:13] 


(6) Section 3.5 gives an alternative representation of truth value maps using “truth domains”, which are
the possible sets of concrete propositions which are true. Then knowledge sets are given an alternative
representation in terms of such truth domains.

(7) Section 3.6 introduces logical expressions which can be used to describe one’s current knowledge about
a given concrete proposition domain. These logical expressions use familiar logical operations such as
“and”, “or”, “not” and “implies”.

(8) Definition 3.7.2 introduces “truth functions”, which are arbitrary maps from $\{F, T\}^n$ to $\{F, T\}$. These
are used to present some basic binary logical operators in Remark 3.7.12. The wide variety of notations
for logical operators is demonstrated by the survey in Table 3.7.1, described in Remark 3.7.14.

(9) Sections 3.9 and 3.10 introduce symbol-strings which may be used as “names” of propositional logic
expressions. This allows names which refer to particular knowledge sets to be combined into new knowledge
sets. When particular meanings are defined for logical symbol-strings, the result is a kind of “syntax” or
“language” for describing knowledge sets. Symbol-strings are a compact way of systematically naming
knowledge sets (i.e. logical expressions).

(10) Section 3.11 introduces the prefix and postfix styles for logical expressions, which are relatively easy to
manage. Section 3.12 introduces the infix style for logical expressions, which is relatively difficult to
manage, but this is the most popular style!

(11) Section 3.13 introduces tautologies and contradictions, which are important kinds of logical expressions.
It also mentions “substitution theorems” for tautologies and contradictions.

(12) Section 3.14 is concerned with the interpretation of logical operations on propositional logic expressions as
logical operations on knowledge sets.

(13) Section 3.15 mentions some of the limitations of the standard logical syntax systems for describing
complicated knowledge sets.

(14) Section 3.16 states that model theory for propositional logic has very limited interest. It also explains
very briefly what model theory is, and gives a broad classification of logical systems.
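
Several of the Chapter 3 concepts can be illustrated for a finite concrete proposition domain. The following Python fragment is only a sketch under stated assumptions. It assumes that a knowledge set may be modelled as the set of truth value maps which are not excluded by a logical expression, which is an interpretation adopted here for illustration rather than a quotation of Definition 3.4.4. The names truth_value_maps, is_tautology, is_contradiction and knowledge_set are hypothetical.

```python
from itertools import product

F, T = False, True  # the proposition-tags "false" and "true"


def truth_value_maps(names):
    """Return every truth value map on a finite concrete proposition domain."""
    return [dict(zip(names, values)) for values in product((F, T), repeat=len(names))]


def is_tautology(truth_function, n):
    """A truth function of n arguments is a tautology if its value is T for all inputs."""
    return all(truth_function(*args) for args in product((F, T), repeat=n))


def is_contradiction(truth_function, n):
    """A truth function of n arguments is a contradiction if its value is F for all inputs."""
    return not any(truth_function(*args) for args in product((F, T), repeat=n))


def knowledge_set(names, expression):
    """Assumed model: the set of truth value maps which the expression does not exclude."""
    return {frozenset(tau.items()) for tau in truth_value_maps(names) if expression(tau)}


names = ("A", "B")
K_all = knowledge_set(names, lambda tau: T)
K_A = knowledge_set(names, lambda tau: tau["A"])
K_B = knowledge_set(names, lambda tau: tau["B"])

# Logical operations on expressions correspond to set operations on knowledge sets.
assert knowledge_set(names, lambda tau: tau["A"] and tau["B"]) == K_A & K_B
assert knowledge_set(names, lambda tau: tau["A"] or tau["B"]) == K_A | K_B
assert knowledge_set(names, lambda tau: not tau["A"]) == K_all - K_A

# There are 2^(2^n) truth functions of n arguments, one for each possible output column.
binary_truth_tables = list(product((F, T), repeat=2 ** 2))
assert len(binary_truth_tables) == 16
```

In this model, “and”, “or” and “not” on logical expressions become intersection, union and complement on knowledge sets, and a tautology is an expression which excludes no truth value map at all.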


CHAPTER 4: Propositional calculus 


(1) Chapter 4 describes the argumentative approach to logic, where axioms and rules of inference are
first asserted (and presumably accepted by the reader), and logical propositions are then progressively
inferred from those axioms and rules. In this way, the reader can be led to accept propositions that
they otherwise might reject if they had foreseen the consequences of accepting the axioms and rules. In
propositional logic, the argumentative approach requires much more effort than truth-table computation.
However, the argumentative method is presented here, in the Hilbert style and without the assistance
of the deduction theorem, to demonstrate just how much hard work is required. This is a kind of
mental preparation for predicate calculus, where the argumentative method seems to be unavoidable
for a logical system which has “non-logical axioms”.

(2) Remark 4.1.5 mentions that the knowledge set concept is applied to the interpretation of propositional
logic argumentation in much the same way as it is to propositional logic expressions. This gives a
semantic basis for predicate logic argumentation. (In other words, it gives a basis for testing whether
all arguments infer only true conclusions from true assumptions.)

(3) Remark 4.1.6 gives a survey of the wide range of terminology which is found in the literature to refer
to various aspects of formalisations of logic.

(4) Table 4.2.1 in Remark 4.2.1 surveys the wide range of styles of axiomatisation of propositional calculus.

(5) Definition 4.3.9 and Notations 4.3.11, 4.3.12 and 4.3.15 define and introduce notations for unconditional
and conditional assertions.

(6) The “modus ponens” inference rule is introduced in Remark 4.3.17.

(7) Definition 4.4.3 presents the chosen axiomatisation for propositional calculus in this book. This consists
of the three Łukasiewicz axioms together with the modus ponens and substitution rules.

(8) The first propositional calculus theorem in this book is Theorem 4.5.7, which “bootstraps” the system
by proving some very elementary technical assertions. These are applied throughout the book to prove
other theorems. Because of the way in which logic is formulated here in terms of very general concrete
proposition domains and knowledge sets, the theorems in Chapter 4 are applicable to all logic and
mathematics. Further theorems for the implication operator are given in Section 4.6.

(9) Section 4.7 presents some theorems for the “and”, “or” and “if and only if” operators, which are
themselves defined in terms of the “implication” and “negation” operators.

(10) In Section 4.8, the “deduction theorem”, which is really a metatheorem, is discussed. However, it is not
used in Chapter 4 for proving any theorems. Instead, the “conditional proof” rule CP is included as an
axiom in Definition 6.3.9 because it is semantically obvious. It is not completely obvious that it follows
from metalogical arguments because of the relative informality of such arguments, but when viewed in
light of knowledge set semantics, there seems very little doubt about it.
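
The contrast between truth-table computation and the argumentative approach can be seen in a short sketch. The following Python fragment checks by brute force that three axioms are tautologies and that modus ponens preserves truth. It is an illustration under assumptions: the axioms are written in the form in which the three Łukasiewicz axioms are usually stated, which may differ in detail from Definition 4.4.3, and the definitions of “or”, “and” and “if and only if” in terms of implication and negation are the customary ones, not quotations from Section 4.7. All function names are hypothetical.

```python
from itertools import product


def imp(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def is_tautology(truth_function, n):
    """True if the truth function of n arguments is true for every input."""
    return all(truth_function(*args) for args in product((False, True), repeat=n))


# Assumed usual form of the three Lukasiewicz axioms.
def axiom_1(a, b):
    return imp(a, imp(b, a))


def axiom_2(a, b, c):
    return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))


def axiom_3(a, b):
    return imp(imp(not b, not a), imp(a, b))


def modus_ponens_is_sound():
    """Whenever a and (a implies b) are both true, b is also true."""
    return all(b for a, b in product((False, True), repeat=2) if a and imp(a, b))


# Assumed customary definitions of derived operators from implication and negation.
def or_(a, b):
    return imp(not a, b)


def and_(a, b):
    return not imp(a, not b)


def iff(a, b):
    return and_(imp(a, b), imp(b, a))


assert is_tautology(axiom_1, 2)
assert is_tautology(axiom_2, 3)
assert is_tautology(axiom_3, 2)
assert modus_ponens_is_sound()
```

Each check here is a finite computation over at most eight rows of a truth table. A Hilbert-style derivation of even a simple consequence of these axioms requires far more effort, which is the point made about the argumentative method in Chapter 4.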


CHAPTER 5: Predicate logic 


(1) Figure 5.0.1 in Remark 5.0.1 shows the relations between propositional logic, predicate logic, first-order
languages and Zermelo-Fraenkel set theory. Mathematics is viewed in this book, in principle, as the set
of all systems which can be modelled within ZF set theory. (This is a somewhat approximate delineation
which can be made more precise.)

(2) Section 5.1 discusses the underlying assumptions and interpretations of predicate logic. Remark 5.1.1
describes predicate logic as a framework for the “bulk handling” of parametrised sets of propositions,
particularly infinite sets of propositions.

(3) Notation 5.2.2 introduces the universal and existential quantifiers $\forall$ (“for all”) and $\exists$ (“for some”). These
are applied to parametrised propositions to form predicates, which are propositions which may contain
these quantifiers. A survey of notations for these quantifiers is given in Table 5.2.1 in Remark 5.2.4.

(4) Section 5.3 gives some informal arguments in favour of using natural deduction systems for predicate
calculus instead of Hilbert-style systems. Natural deduction systems use conditional assertions (called
“sequents”) for each line of an argument, whereas Hilbert-style systems use unconditional assertions.
Natural deduction systems correspond closely to the efficient way in which real mathematicians construct
proofs of real-world theorems, whereas Hilbert-style systems are often oppressively difficult to work with
because they are so unintuitive. Real mathematical proofs rely heavily on the inference methods known
as “Rule G” and “Rule C” (discussed in Remark 6.3.19 and elsewhere), but the Hilbert approach does
not permit these methods, except by way of metatheorems. The semantics of sequents is summarised
in Table 5.3.1 in Remark 5.3.10 for simple propositions, and in Remark 5.3.11 for logical quantifiers.
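
The “bulk handling” role of the quantifiers can be sketched for a finite parameter set. The following Python fragment is only an illustration with hypothetical names for_all and for_some. It covers finite domains only, whereas predicate logic is intended particularly for infinite sets of propositions, where no such exhaustive evaluation is available.

```python
def for_all(domain, predicate):
    """Universal quantifier over a finite domain: every proposition P(x) is true."""
    return all(predicate(x) for x in domain)


def for_some(domain, predicate):
    """Existential quantifier over a finite domain: at least one P(x) is true."""
    return any(predicate(x) for x in domain)


def is_even(x):
    return x % 2 == 0


domain = range(10)

# A parametrised proposition yields one concrete proposition for each x in the domain.
assert for_some(domain, is_even)
assert not for_all(domain, is_even)

# Duality of the quantifiers: "not for all" agrees with "for some not".
assert (not for_all(domain, is_even)) == for_some(domain, lambda x: not is_even(x))
assert (not for_some(domain, is_even)) == for_all(domain, lambda x: not is_even(x))
```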


CHAPTER 6: Predicate calculus 


(1) 
(2) 
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It is explained in Section 6.1 that there is no general truth-table-like automatic method for evaluating 
validity of logical predicates. Therefore the “argumentative method” must be used. 


Table 6.1.2 in Remark 6.1.5 is a survey of styles of axiomatisation of predicate calculus according to the 
numbers of basic logical operators, axioms and inference rules they use. The style chosen here is based 
on the four symbols ¬, ⇒, ∀ and ∃, together with the three Łukasiewicz axioms, the modus ponens and 
substitution rules, and five natural deduction rules. The propositional calculus theorems in Chapter 5 
are assumed to apply to predicate calculus because the same three Łukasiewicz axioms are used. 


The chosen predicate calculus system for this book is presented in Definition 6.3.9. 


Section 6.4 discusses the semantic interpretation of the chosen predicate calculus. Remark 6.5.9 gives 
an example of the semantics of some particular lines of logical argument in this natural deduction style. 


Section 6.6 presents the first “bootstrap” theorems for the chosen predicate calculus. Surprisingly 
difficult are the assertions in Theorem 6.6.4 for zero-variable predicates. These must be handled very 
carefully because of the subtleties of interpretation. The other theorems in Section 6.6 demonstrate 
some of the strategies which can be used for this natural deduction system, and also show how some 
kinds of “false theorems” can be easily avoided. (In some predicate calculus systems, “false theorems” 
are not so easy to avoid!) 


Section 6.7 introduces predicate calculus with equality. This is a very simple but important minimal 
example of a first-order language. An equality relation is usually required in logical systems where 
propositions refer to objects in some universe. (In the case of Zermelo-Fraenkel set theory, these objects 
are sets.) Equality means that two names refer to the same object. 


[ draft: UTC 2023-1-3 Tuesday 00:13] 




Part I. Foundations 


Section 6.8 introduces uniqueness and multiplicity, which are defined in terms of the equality relation 
of a predicate calculus with equality. 


Section 6.9 introduces first-order languages, which are distinguished from predicate calculus by having 
constant relations between objects. These relations are governed by non-logical axioms. 


CHAPTER 7: Set axioms 




Zermelo-Fraenkel set theory, the model-building factory for all of mathematics (according to this book), 
is axiomatised in Definition 7.2.4. Subsets and supersets are introduced in Definition 7.3.2. 

The meaning of the eight ZF axioms is discussed in Sections 7.5, 7.6, 7.7, 7.8 and 7.9. 

Since nothing is ever truly infinite in our experience, some difficulties with the concept of infinity are 
discussed in Section 7.10. 

The axiom of choice is presented in Section 7.11. Some popular theorems which are “lost” if the axiom 
of choice is not accepted are listed in Remark 7.11.13. 


Scorn is heaped on the axiom of choice in Section 7.12 (and elsewhere in this book). It is pointed out in 
Remark 7.12.5 that the axiom of infinite choice is false. This is sad because life would be much easier 
for mathematicians if it was true. 


CHAPTER 8: Set operations and constructions 




Chapter 8 presents numerous useful definitions and theorems of binary set operations and constructions 
such as unions, intersections, complements, and the symmetric set difference in Sections 8.1, 8.2 and 8.3. 
Corresponding definitions and theorems for unions and intersections of (possibly infinite) collections of 
sets are given in Sections 8.4 and 8.6. Section 8.5 gives some properties of power sets of general sets. The 
theorems in this chapter are heavily relied upon throughout the book. 


Section 8.7 gives some definitions of set covers and partitions which are useful for topology. 


Section 8.8 discusses "specification tuples" which are used for definition statements in this book. 


CHAPTER 9: Relations 




Chapter 9 introduces relations as arbitrary sets of ordered pairs in Section 9.5. 
Some basic theorems about ordered pairs are given in Section 9.2. 


Section 9.3 contains some comments on extensions of the ordered pair concept to more general finite 
ordered tuples. 


General Cartesian set-products and their basic properties are introduced in Section 9.4. 


Basic definitions and properties of general relations are presented in Section 9.5, including important 
concepts such as the domain, range, source set, target set and images of relations. 


Section 9.6 presents various properties of composites and inverses of relations. 


Section 9.7 defines direct products of relations. These are precursors of the direct products of 
functions in Section 10.14, which are used for direct products of manifolds in Sections 50.4 and 52.7. 


Equivalence relations, which are almost ubiquitous in mathematics, are defined in Section 9.8. 
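
The composites and inverses of relations summarised above for Section 9.6 can be tried out on small finite examples. The following Python sketch represents a relation as a set of ordered pairs, as in Chapter 9. The function names (compose, inverse, domain, range_, image) and the sample relations r and s are assumptions made for illustration only. They are not notations from this book.

```python
def compose(r, s):
    """Composite of r followed by s: all (a, c) with (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def inverse(r):
    """Inverse relation: swap the components of every ordered pair."""
    return {(b, a) for (a, b) in r}

def domain(r):
    """Set of first components of the pairs in r."""
    return {a for (a, _) in r}

def range_(r):
    """Set of second components of the pairs in r."""
    return {b for (_, b) in r}

def image(r, subset):
    """Image of a set under r: everything related to some element of the set."""
    return {b for (a, b) in r if a in subset}

# Two small hypothetical relations.
r = {(1, "x"), (2, "x"), (2, "y")}
s = {("x", True), ("y", False)}

rs = compose(r, s)
# The inverse of a composite is the composite of the inverses in reverse order.
law_holds = inverse(rs) == compose(inverse(s), inverse(r))
```

A relation need not be a function. Here the element 2 is related to both "x" and "y", so image(r, {2}) has two elements.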


CHAPTER 10: Functions 


Chapter 10 introduces functions, which are almost as important as sets in mathematics. Although 
functions are represented as sets in the modern set theory idiom, they are ontologically different kinds 
of objects. 


Section 10.1 makes some general philosophical observations about mathematical functions. 


Section 10.2 presents the most basic definitions and properties of functions. 


Section 10.3 defines choice functions. These are required for many kinds of definitions and theorems. 
Their existence can be guaranteed by invoking choice axioms, if so desired. 


Section 10.4 introduces restrictions, extensions and composition of functions. 
Section 10.5 introduces injections, surjections, bijections and inverse functions, and the various names 
for different kinds of “morphisms”. Category theory is not within the scope of this book. So only the 
most basic vocabulary is introduced. 


Sections 10.6 and 10.7 give definitions and theorems for function set-maps and inverse set-maps, which 
are especially important for topology, but also for most other topics. 


Section 10.8 presents many useful theorems for families of sets and families of functions, particularly in 
regard to function set-maps and inverse set-maps. 


Sections 10.9 and 10.10 concern the somewhat neglected topic of partially defined functions. These 
are almost ubiquitous in differential geometry, and yet their general properties are rarely presented 
systematically. Various useful properties of “partial functions” and their composites are given. 

Section 10.11 gives some basic properties of Cartesian products of sets. Since the axiom of choice is not 
adopted in this book, it is not necessarily guaranteed that non-empty products of non-empty families 
of sets are non-empty. However, this is not a problem in practice. 


Sections 10.12 and 10.13 define projection maps, “slices” and “lifts” for binary and general Cartesian 
products of sets. The slices and lifts are generally presented informally as required, but here they are 
given names and notations. 


Sections 10.14 and 10.15 present two different kinds of direct products of functions. The “double-domain” 
style of direct product is used for direct products of manifolds and other such structures. The 
“common-domain” style of direct product is used for fibre charts for fibre bundles. 


Section 10.16 presents equivalence relations and equivalence kernels, which are often used to define various 
kinds of quotient space structures. 


Section 10.17 introduces partial Cartesian products and “patchwork spaces”, which are used for 
constructing spaces by gluing together patches, as is often done in differential geometry. 


The “indexed set covers” in Section 10.18 are an obvious extension of the set covers in Section 8.7. 


Section 10.19 presents some useful extensions to the standard notations for sets of functions and map-rules. 
For example, f : A → (B → C) means that f is a function with domain A and values which are 
functions from B to C. Consistent with this, f : a ↦ (b ↦ g(a, b)) means that f is the function which 
maps a to the function which maps b to g(a, b). These notations are often used in this book. 
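
The function-valued notation described for Section 10.19 corresponds to what programmers call “currying”. The following Python sketch shows a function f with domain A whose values are themselves functions from B to C. The helper name curry and the sample function g are assumptions introduced for illustration. They do not appear in this book.

```python
def curry(g):
    """Turn a two-argument g into f, where f maps a to the function (b maps to g(a, b))."""
    return lambda a: (lambda b: g(a, b))

def g(a, b):
    """A hypothetical two-argument function, with A, B and C all taken to be the integers."""
    return 10 * a + b

f = curry(g)   # f maps each a to a function from B to C
f3 = f(3)      # the function which maps b to g(3, b)
value = f3(4)  # the same as g(3, 4)
```

The value f(3) is not a number. It is a function which still waits for its argument b, which is exactly the distinction which the bracketed notation is intended to express.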


CHAPTER 11: Order 




Section 11.1 defines partial order relations on sets. Since every kind of order is a special case of a partial 
order, the definitions and properties of partial orders are generally applicable. 

Section 11.2 defines upper and lower bounds, infimum and supremum, and the minimum and maximum 
of a subset of a partially ordered set. 

Section 11.3 defines upper and lower bounds, infimum and supremum, and the minimum and maximum 
for functions whose range is a partially ordered set. 

Section 11.4 defines lattices as a particular kind of partially ordered set. Although lattices have some 
applications to inferior and superior limits in Section 35.8, they are not much needed for differential 
geometry. 

Section 11.5 defines total order, which is very broadly applicable. 

Section 11.6 defines well ordered sets, which are required for transfinite induction. According to the 
axiom of choice, a well-ordering exists for every set, but this is about as useful as knowing that there’s 
at least one needle in the haystack, but you’re not sure which galaxy the haystack is located in. This 
doesn’t help you to find a needle. 

Section 11.7 is concerned with the comparability of well-ordered sets. 

Section 11.8 is on transfinite induction, which has applications in set theory, but not many applications 
in differential geometry. 


CHAPTER 12: Ordinal numbers 


Section 12.1 defines ordinal numbers in a way which differs subtly from the standard textbook definition. 
Definition 12.1.3 introduces “extended finite ordinal numbers” as sets whose elements equal either the 
empty set or the “successor” of some other element. (The name is chosen by analogy with the extended 
integers or real numbers.) By a long process of logical deduction, which depends heavily on the ZF 
regularity axiom, it is shown that ω ∪ {ω} equals the set of all extended finite ordinal numbers, where 
ω is the set of finite ordinal numbers. Along the way, many useful properties of extended finite ordinal 
numbers are shown. 


In Section 12.2, the standard total order (which is a well-ordering) on the extended finite ordinal numbers 
is defined to be the relation of set inclusion. 


Section 12.3 defines sequences with domains which are extended finite ordinal numbers. 


In Section 12.4, Theorem 12.4.2 shows without the axiom of choice that there exists an enumeration 
of any subset of ω by some element of ω ∪ {ω}. This shows that all subsets of ω are countable. 
Theorem 12.4.13 shows that there is no injection from ω to an element of ω. This implies that ω is 
infinite. 


Section 12.5 mentions general ordinal numbers, but only to observe that they have very little relevance 
to differential geometry beyond ω + ω. 


Section 12.6 presents the so-called von Neumann universe of sets, also known as the cumulative hierarchy 
of sets. This vast universe of sets is a kind of “standard model” for ZF set theory, although the stages 
beyond V_{ω+ω} have little relevance to differential geometry. 

Table 12.6.2 in Remark 12.6.5 summarises the lower inductive stages of both the ordinal numbers and 
the corresponding transfinite cumulative hierarchy of sets. The main purpose of presenting these is to 
give some idea of why they are not useful. 
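
The finite part of the constructions summarised above for Sections 12.1 and 12.2 can be checked by direct computation. The following Python sketch uses the usual von Neumann construction, in which the successor of n is n ∪ {n}. This construction is assumed here for illustration. The sketch is not claimed to reproduce Definition 12.1.3, and the names successor and ordinal are hypothetical.

```python
def successor(n):
    """Successor of a finite ordinal n, namely n ∪ {n}."""
    return n | frozenset([n])

def ordinal(k):
    """The finite ordinal with k elements, built from the empty set by k successor steps."""
    n = frozenset()
    for _ in range(k):
        n = successor(n)
    return n

zero, one, two, three = (ordinal(k) for k in range(4))

# For these sets, the usual order of the integers agrees with set inclusion.
order_is_inclusion = all(
    (ordinal(j) <= ordinal(k)) == (j <= k)   # "<=" on frozensets tests set inclusion
    for j in range(6) for k in range(6)
)
```

Each finite ordinal is the set of all smaller finite ordinals. So two is both an element and a subset of three.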


CHAPTER 13: Cardinality 




Cardinality is useful as an initial classification of sets into finite, countably infinite, or uncountably 
infinite. Cardinality is itself not very well defined. The comparative concepts of equal, lesser and 
greater cardinality are much better defined, but since they rely on the existence of certain kinds of 
functions, and that existence is often purchased at the price of the axiom of choice, even comparison of 
cardinality is often ambiguous. 


Section 13.1 introduces equinumerosity, which is, in principle, an equivalence relation on the class of all 
sets. Theorem 13.1.7 is the well-known Schröder-Bernstein theorem, which does not require any axiom 
of choice. It implies, roughly speaking, that two sets have equal cardinality if they are each greater than 
or equal to the other. Unfortunately, proving that in general at least one set of a pair is greater than or 
equal to the other does require the axiom of choice. 


Section 13.2 concerns the application of ordinal numbers to provide standard cardinality “yardsticks” 
for ZF sets. It is concluded that the ordinal numbers fail badly in this task for uncountable sets. 


Section 13.3 presents Hartogs’s theorem, which asserts (without any choice axiom) that for every ZF 
set, there exists an ordinal number with not-lower cardinality. Without AC, it cannot be shown that 
the cardinality is greater except for well-ordered sets (such as ordinal numbers). This reinforces the 
conclusion that the ordinal numbers are inadequate cardinality yardsticks for uncountable sets. 


Section 13.4 proposes that the standard “yardstick” for the cardinality of sets should be the set of 
finite ordinal numbers ω for finite sets, but that it should be the class of von Neumann universe stages 
for infinite sets. Thus infinite sets are compared with yardstick “beta-sets” like ω, 2^ω, 2^(2^ω), and so 
forth. This idea is given the name “beta-cardinality” here. This approach to cardinality yardsticks 
is motivated by the complete unsuitability of the infinite ordinal numbers (beyond ω) for measuring 
cardinality, whereas beta-sets are eminently suited to the purpose. (The infinite ordinal numbers are 
well suited to the classification of all possible well-orderings of infinite sets, a topic which has dubious 
relevance to applicable mathematics.) 


Section 13.5 introduces finite sets. 
Section 13.6 defines addition for finite ordinal numbers. 


Section 13.7 looks at the basic properties of infinite and countably infinite sets. Various difficulties arise 
because (without the axiom of countable choice) it is not possible to guarantee that a set which is not 
finite is necessarily equinumerous to ω. 

Theorem 13.8.5 shows that, without the axiom of choice, the union of an explicit countably infinite 
family of finite sets is countable. The explicitness means that a family of enumerations must be given for 
the sets. In practice, such a family is typically known, but in many contexts in measure theory, for 
example, such explicit families cannot be guaranteed to exist. A consequence of this kind of issue is 
that a large number of basic theorems in measure theory must be modified to make them independent 
of the axiom of choice. However, as mentioned in Remark 13.8.9, avoiding the axiom of choice is often 
a simple matter of “quantifier swapping”. 


Section 13.9 gives some explicit enumerations of subsets of ω × ω. These are useful in relation to measure 
theory in the proofs of Theorems 45.3.3 and 45.4.2. 


Section 13.10 is about the frustrating subject of the Dedekind definitions for finite and infinite sets, and 
“mediate cardinals”, whose cardinality is infinite but less than the cardinality of ω. Unfortunately, such 
issues arise frequently in topology and measure theory. 


To pour oil onto the fire, Section 13.11 gives eight different definitions of finite sets. If the axiom of 
choice is adopted, these definitions are all equivalent. This kind of chaos helps to explain why the vast 
majority of mathematicians prefer to adopt the axiom of choice, even though AC is in many ways even 
more disturbing than the consequences of not adopting it. 


Section 13.12 presents some definitions and properties of various kinds of “cardinality-constrained power 
sets”. These are useful for some topics in topology and measure theory. 
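
An explicit enumeration of ω × ω, of the kind mentioned above for Section 13.9, requires no choice axiom because it is given by a formula. The following Python sketch uses the well-known Cantor pairing function, which walks along the finite diagonals i + j = constant. It is offered only as one standard example. It is not claimed to be the particular enumeration used in Section 13.9, and the names pair and unpair are hypothetical.

```python
from math import isqrt

def pair(i, j):
    """Cantor pairing: position of (i, j) in a diagonal-by-diagonal enumeration of ω × ω."""
    return (i + j) * (i + j + 1) // 2 + j

def unpair(n):
    """Inverse of pair: recover (i, j) from its position n."""
    w = (isqrt(8 * n + 1) - 1) // 2   # index of the diagonal i + j = w which contains n
    j = n - w * (w + 1) // 2          # offset of n within that diagonal
    return (w - j, j)

first_six = [unpair(n) for n in range(6)]
```

Since unpair inverts pair, the map pair is a bijection from ω × ω to ω. Only integer arithmetic is used, so the inverse is exact for arbitrarily large arguments.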


CHAPTER 14: Natural numbers and integers 




Section 14.1 defines the natural numbers to be the positive integers. Table 14.1.1 gives a survey of 
authors who define zero to be a natural number and those who do not. The Peano axioms are 
presented for the natural numbers. 

Section 14.2 presents Peano-axioms-style tests for countability of sets. 

Section 14.3 makes some brief observations on natural number arithmetic. 

Section 14.4 introduces the (signed) integers and notations for various useful subsets. 

Sections 14.5 and 14.6 contain some brief definitions and notations for extended integers and Cartesian 
products of sequences of sets and functions. 

Section 14.7 defines indicator functions (also known as characteristic functions) and delta functions (also 
known as Kronecker delta functions). 


Section 14.8 defines permutations to be bijections from a set to itself. These have particular applications 
to components of tensors. Also defined in Section 14.8 are the parity function, factorial function, the 
Jordan factorial function, the Levi-Civita alternating symbol (or tensor), and multi-indices. 

Section 14.9 defines combination symbols, typically denoted C^n_r or C(n,r). Section 14.10 presents 
“ordered selections”, which are useful for tensor algebra. Section 14.11 presents the sorting of ordered 
selections by means of “rearrangements” by permuting the domains of selections. 


Section 14.12 introduces “list spaces”, which are sets of finite-length sequences of elements of a given 
set. List spaces are generally presented informally as needed in mathematics textbooks, but it is helpful 
to make some list operations more precise. 
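
The parity function and the Levi-Civita alternating symbol mentioned above for Section 14.8 are easy to compute for small index sets. The following Python sketch represents a permutation of {0, ..., n−1} as a tuple and determines its parity by counting inversions. The names parity and levi_civita, and the convention that indices start at zero, are assumptions made for illustration. They are not taken from the definitions in this book.

```python
from itertools import permutations

def parity(p):
    """Return +1 for an even permutation and -1 for an odd one, by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def levi_civita(indices):
    """Alternating symbol: zero if any index is repeated, otherwise the parity of the indices."""
    if len(set(indices)) < len(indices):
        return 0
    return parity(indices)

# For n = 3 there are 3! = 6 permutations, half of them even and half odd.
signs = [parity(p) for p in permutations(range(3))]
```

Swapping two entries of a permutation reverses its parity, which is why the symbol changes sign when two indices are exchanged.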


CHAPTER 15: Rational and real numbers 


Chapter 15 is a fairly conventional presentation of the rational and real numbers. The Dedekind cut 
representation is chosen here for the real numbers after considering the disadvantages of several other 
representations. Although the real number system has numerous philosophical and practical difficulties, 
it is assumed as the basis of most science and engineering, and a substantial proportion of mathematics. 
So it is a good idea to understand the real numbers well. They are not as simple as they seem. 

Section 15.1 introduces the rational numbers in the usual way, as equivalence classes of ratios of integers. 
In most situations, when people think they are using real numbers, they are really using rational 
numbers, particularly in computer hardware and software contexts. 


Section 15.2 considers the cardinality of the rational numbers, and explicit enumerations which prove 
that they are countably infinite. 




(4) Section 15.3 discusses a variety of representations of the real numbers, none of which is truly satisfactory 
for even modest requirements. 


the most elementary prerequisites. However, even the easiest option is not as easy as it seems! The real 
number system is a complete ordered field. But verifying every detail is not a trivial task. 
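
The observation above for Section 15.1, that supposed real numbers in computer contexts are really rational numbers, can be checked directly. The following Python sketch uses the standard library class fractions.Fraction to recover the exact rational value which is stored for the floating-point literal 0.1. It assumes the usual binary floating-point arithmetic of current hardware. The variable names are illustrative only.

```python
from fractions import Fraction

x = 0.1                 # intended to be the real number one tenth
exact = Fraction(x)     # the rational number which is actually stored
tenth = Fraction(1, 10)

# The stored value is a ratio of integers whose denominator is a power of two.
d = exact.denominator
denominator_is_power_of_two = d > 0 and (d & (d - 1)) == 0

# It is close to one tenth, but it is not equal to it.
error = abs(exact - tenth)
```

The conversion from float to Fraction is exact. So the nonzero error is a property of the stored number, not of the conversion.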


CHAPTER 16: Real-number constructions 


(1) Section 16.1 presents real-number interval notations and definitions. These are important in topology, 
for example for defining curves. The real-number intervals are the connected sets of real numbers. 


(2) Section 16.2 defines extended real numbers, which include negative and positive infinite pseudo-elements. 
Arithmetic for the extended real numbers is somewhat ambiguous and arbitrary. There are different 
versions of how the arithmetic operations are defined. However, they are useful for measure theory and 
some other purposes. 

(3) Sections 16.3 and 16.4 make some brief observations on extended rational numbers and real-number 
tuples respectively. 

(4) Sections 16.5, 16.6 and 16.7 define some familiar real-valued functions which are useful from time to 
time, particularly for constructing examples and counterexamples. 

(5) Section 16.8 very briefly defines complex numbers, but their use is avoided as much as possible. They 
are required only for defining unitary groups, which are relevant to gauge theory. 


2.0.1 REMARK: No bedrock of knowledge underlies mathematics. Reductionism ultimately fails. 

The following medieval-style “wheel of knowledge” shows some interdependencies between seven disciplines. 
One may seek answers to questions about the foundations of each discipline by following the arrow to a more 
fundamental discipline, but there seems to be no ultimate “bedrock of knowledge”. 


The author started writing this book with the intention of putting differential geometry on a firm, reliable 
footing by filling in all of the gaps in the standard textbooks. After reading a set theory book by Halmos [357] 
in 1975, it had seemed that all mathematics must be reducible to set constructions and their properties. But 
decades later, after delving more deeply into set theory and logic, that happy illusion faded and crumbled. 


It seems to be a completely reasonable idea to seek the meaning of each mathematical concept in terms of 
the meanings of its component concepts. But when this procedure is applied recursively, the inevitable result 
is either a cyclic definition at some level, or a definition which refers outside of mathematics. Any network 
of definitions can only be meaningful if at least one point in the network is externally defined. 


The reductionist approach to mathematics is very successful, but cannot be carried to its full conclusion. 
Even if all of mathematics may be reduced to set theory and logic, the foundations of set theory and logic 
will still require the importation of meaning from extra-mathematical contexts. 


The impossibility of a globally hierarchical arrangement of definitions may be due to the network organisation 
of the human mind. Whether the real world itself is a strict hierarchy of layered systems, or is rather a 
network of systems, is perhaps unknowable since we can only understand the world via human minds. 


2.0.2 REMARK: Intellectual towers on sandy foundations. 

It is unwise to build intellectual towers on sandy foundations. So Part I of this book attempts to bolster 
and compactify the foundations of mathematics. Where the foundations cannot be strengthened, one can 
at least study their weaknesses. Then if an occasional intellectual tower does begin to tilt, one may be able 
to promptly identify where further support is required before it topples and sinks into the sand, or if the 
deficiencies cannot be remedied, one may give up and recommend the evacuation of the building. 


This remark is more than mere idle philosophising. The foundations of mathematics are examined in Part I 
of this book because the author found numerous cases where a “trace-back” of definitions or theorems led to 
serious weaknesses in the foundations. Even some fairly “intuitively obvious” distance-function constructions 
for Riemannian manifolds in Chapter 73 lead directly to questions in set theory regarding constructibility 
for arcane concepts such as “mediate cardinals”, which can only be resolved (or at least understood) within 
mathematical logic and model theory. Consequently it is now this author’s belief that the foundations of 
mathematics should not be ignored, especially in applied mathematics. 


2.0.3 REMARK: Philosophy of mathematics. But this is not the professional philosopher’s philosophy. 
Chapter 2 was originally titled “Philosophy of mathematics”, but the professional philosopher’s idea of 
philosophy has almost nothing in common with the “philosophy” in this book. Every profession has its own 
“philosophical” frameworks within which ideas are developed and discussed. Mathematicians in particular 
have a need to develop and discuss ideas which arise directly from the day-to-day concerns of the métier itself. 
Philosophers have contemplated mathematics for thousands of years, possibly because it is simultaneously 
exceedingly abstract and yet extremely applicable. But as Richard Phillips Feynman is widely reputed (but 
not proved) to have said: “Philosophy of science is about as useful to scientists as ornithology is to birds.” 
The same is largely true of the philosophy of mathematics, except that some philosophers have in fact been 
mathematicians investigating philosophy, not philosophers investigating mathematics. Any “philosophy” in 
this book arises directly from the real, inescapable quandaries which are encountered while formulating and 
organising practical mathematics, not from the desire to have something to philosophise about. 


In regard to the absence of philosophy in mathematical works, Lanczos [280], page xxvii, wrote the following. 


The sober, practical, matter-of-fact nineteenth century—which carries over into our day—suspected 
all speculative and interpretative tendencies as “metaphysical” and limited its programme to the 
pure description of natural events. In this philosophy mathematics plays the role of a shorthand 
method, a conveniently economical language for expressing involved relations. 


In defence of the metaphysical in mathematical theories, Lanczos [280], page xxviii, mentioned that the 
theory of general relativity... 


[...] was obtained by mathematical and philosophical speculation of the highest order. Here was a 
discovery made by a kind of reasoning that a positivist cannot fail to call “metaphysical,” and yet 
it provided an insight into the heart of things that mere experimentation and sober registration of 
facts could never have revealed. The Theory of General Relativity brought once again to the fore 
the spirit of the great cosmic theorists of Greece and the eighteenth century. 




Likewise in this book, there is much “speculation” and “interpretation” because “insight into the heart of 
things” often leads to discoveries which mechanical computation does not. 


2.1. The bedrock of mathematics 


2.1.1 REMARK: Rigorous mathematics is boot-strapped from naive mathematics. 

Although logic is arguably the bedrock of mathematics, it floats on a sea of molten magma, namely the 
naive notions of logic, sets, functions, order and numbers which are used in the formulation of “rigorous” 
mathematical logic. This is illustrated in Figure 2.1.1. 


Mathematics is boot-strapped into existence by first assuming socially and biologically acquired naive notions, 
then building logical machinery on this wobbly basis, and then “rigorously” redefining the naive notions of 
sets, functions and numbers with this logical machinery. Terra firma floats on molten magma. Mathematics 
and logic are like two snakes biting each other’s tails. They cannot both swallow the other (although at first 
they might think they can). The way in which rigorous logic is boot-strapped from naive logic is summarised 
in the opening paragraph of Suppes/Hill [396], page 1. 

In the study of logic our goal is to be precise and careful. The language of logic is an exact one. Yet 
we are going to go about building a vocabulary for this precise language by using our sometimes 
confusing everyday language. We need to draw up a set of rules that will be perfectly clear and 
definite and free from the vagueness we may find in our natural language. We can use English 
sentences to do this, just as we use English to explain the precise rules of a game to someone who 
has not played the game. Of course, logic is more than a game. It can help us learn a way of 
thinking that is exact and very useful at the same time. 


Figure 2.1.1 Relations between naive mathematics and “rigorous mathematics” 


A mathematical education makes many iterations of such redefinition of relatively naive concepts in terms 
of relatively rigorous concepts until one’s thinking is synchronised with the current mathematics culture. 
Philosophers who speculate on the true nature of mathematics possibly make the mistake of assuming a 
static picture. They ignore the cyclic and dynamic nature of knowledge, which has no ultimate bedrock 
apart from the innate human capabilities in logic, sets, order, geometry and numbers. 


2.1.2 REMARK: Etymology and meaning of the word “naive”. 

The word “naive” is not necessarily pejorative. The French word “naive” comes from the Latin word 
“nativus”, which means “native”, “innate” or “natural”. These meanings are applicable in the context of 
logic and set theory. The Latin word “nativus” comes from Latin “natus”, which means “born”. So the 
word “naive” may be thought of as meaning “inborn” or “innate”. That is, naive mathematics is a capability 
which humans are born with. 


2.1.3 REMARK: Mathematics bedrock layers in history. 

The choice of nominal bedrock layer for mathematics seems to be a question of fashion. In ancient Greece, 
and for a long time after, the most popular choice for the bedrock layer was geometry, while arithmetic and 
algebra could be based upon the ruler-and-compass geometry of points, lines, planes, circles, spheres and 
conic sections. From the time of Descartes onwards, geometry could be based on arithmetic and algebra. 
Then geometry became progressively more secondary, and arithmetic became more primary, because algebra 
and analysis extended numbers well beyond what the old geometry could deliver. Weyl [311], page viii, wrote 
the following on this subject in 1928. 


Occidental mathematics has in past centuries broken away from the Greek view and followed a 
course which seems to have originated in India and which has been transmitted, with additions, to 
us by the Arabs; in it the concept of number appears as logically prior to the concepts of geometry. 
The result of this has been that we have applied this systematically developed number concept to 
all branches, irrespective of whether it is most appropriate for these particular applications. But 
the present trend in mathematics is clearly in the direction of a return to the Greek standpoint; 
we now look upon each branch of mathematics as determining its own characteristic domain of 
quantities. 


Later on, set theory provided a new, more primary layer upon which arithmetic, algebra, analysis and 
geometry could be based. Around the beginning of the 20th century, mathematical logic became the new 
bedrock layer upon which set theory and all of mathematics could be based, initiated by Peano [375] in 1889. 
At the beginning of the 21st century, it would probably be fair to say that the majority of mathematics 
is (or can be) modelled within the framework of Zermelo-Fraenkel or Neumann-Bernays-Gödel set theory 
(appealing to axioms of choice when absolutely necessary). ZF and NBG set theory may be expressed within 
the framework of first-order language theory, and first-order languages derive their semantics from model 
theory. Model theory is so abstract, abstruse and arcane that few mathematicians immerse themselves in it. 
But most mathematicians seem to accept the famous metatheorems of Kurt Gödel, Paul Cohen and others, 
which are part of model theory. Hence model theory could currently be viewed as the ultimate bedrock of 
mathematics. This is convenient because, being abstract, abstruse and arcane, model theory is examined 
by very few mathematicians closely enough to ascertain whether it provides a solid ultimate bedrock for 
their work. Thus the solidity of their discipline is outsourced. So it is “somebody else’s problem”. (See 
D. Adams [486], pages 329-330.) 

2.1.4 REMARK: Bertrand Russell’s disappointment with the non-provability of axioms. 

The lack of a solid ultimate bedrock for mathematics was encountered by the 11-year-old Bertrand Russell 
when he discovered that the intellectual edifice of Euclidean geometry was founded on axioms which had to be 
accepted without proof. The following paragraph (Clark [454], page 34) describes Russell’s disillusionment. 


Lack of an emotional hitching-post was quite clearly a major factor in driving the young Russell 
out on his quest for an intellectual alternative — for certainty in an uncertain world — a journey 
which took him first into mathematics and then into philosophy. The expedition had started by 
1883 when Frank Russell took his brother’s mathematical training in hand. ‘I gave Bertie his first 
lesson in Euclid this afternoon’, he noted in his diary on 9 August. ‘He is sure to prove a credit to 
his teacher. He did very well indeed, and we got half through the Definitions.’ Here there was to 
be no difficulty. The trouble came with the Axioms. What was the proof of these, the young pupil 
asked with naive innocence. Everything apparently rested on them, so it was surely essential that 
their validity was beyond the slightest doubt. Frank’s statement that the Axioms had to be taken 
for granted was one of Russell’s early disillusionments. ‘At these words my hopes crumbled’, he 
remembered; ‘... why should I admit these things if they can’t be proved?’ His brother warned that 
unless he could accept them it would be impossible to continue. Russell capitulated — provisionally. 


In later life, Russell tried, together with Whitehead [400, 401, 402], to provide a solid bedrock for mathematics, 
but this attempt was ultimately only partially successful. 


2.1.5 REMARK: The choice of foundations for this book. 

Questions regarding the “bedrock” (i.e. foundations) of mathematics often arose in very practical ways while 
writing this book. Most frequently, the questions concerned whether or not to accept some form of choice 
axiom so as to be able to prove various desirable theorems. Since axioms of choice are a kind of mystical 
“magic wand” which make theorems work when all else fails, it seems best to avoid them as a matter of 
principle, but most working mathematicians accept choice axioms whenever they are useful. (Readers are 
hereby warned that openly professing a lack of faith in the axiom of choice could harm their careers!) 


To resolve this issue, the author needed to look more deeply into the ultimate basis of mathematics to try 
to discover “the truth”. After a long search, it now appears that the axioms of choice are only the tip of 
the iceberg. The metaphysical (i.e. non-constructible) constructions and concepts of mathematics commence 
with the infinity axiom, and are amplified by the power-set axiom. Axioms of choice are only objectionable 
because they assert the non-emptiness of sets which contain no constructible elements. Thus, for example, 
the set of all well-orderings of the real numbers is asserted to be non-empty, but no such well-orderings are 
constructible. Such axiom-of-choice assertions would be harmless if the elements of the supposedly non-empty 
sets were not used as inputs to other theorems and definitions, which are thereby rendered non-constructible 
themselves. Some subjects consist, in fact, predominantly of theorems and definitions which are reliant on 
axioms of choice to guarantee “existence”. This may be very popular with mathematicians, but it confuses 
different kinds of “existence”. This is not a matter of small importance. A large proportion of mathematical 
analysis is concerned with the existence of solutions to problems. 


The resolution which the author has decided upon is to occasionally state theorems and definitions which use
the axiom of choice to assert the “existence” of non-constructible sets and functions. However, theorems and
definitions which are thus “tainted” will be tagged as such, as a warning that the objects whose existence is 
asserted can never be seen or thought of individually by human eyes or minds. The issue is not limited to 
the axiom of choice. For example, almost all real numbers are non-constructible, although the constructible
real numbers constitute a dense subset. Therefore the real numbers must themselves be used with care. 
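
The claim that almost all real numbers are non-constructible, while the constructible ones are dense, can be sketched as follows. This sketch is not taken from the text above. It assumes that “constructible” means specifiable by some finite string over a finite alphabet, and the symbols $C$, $c_n$ and $\varepsilon$ are introduced here only for illustration.

```latex
% Assumption: C denotes the set of real numbers which can be specified by a finite text.
% There are only countably many finite texts, so C is countable: C = \{c_0, c_1, c_2, \dots\}.
% Cover each c_n by an open interval of length \varepsilon / 2^{n+1}:
\[
  C \subseteq \bigcup_{n=0}^{\infty}
    \Bigl( c_n - \frac{\varepsilon}{2^{n+2}},\; c_n + \frac{\varepsilon}{2^{n+2}} \Bigr),
  \qquad
  \sum_{n=0}^{\infty} \frac{\varepsilon}{2^{n+1}} = \varepsilon .
\]
% Since \varepsilon > 0 is arbitrary, C has Lebesgue measure zero,
% so almost all real numbers lie outside C.
% Density: every rational number is in C, and
\[
  \forall a, b \in \mathbb{R},\quad a < b \;\Rightarrow\; \exists q \in \mathbb{Q},\ a < q < b ,
\]
% so C \supseteq \mathbb{Q} is dense in \mathbb{R}.
```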


In addition to the semi-arbitrariness of the axioms for set theory, there is also arbitrariness in the choice 
of “model” for the chosen set theory axioms. Such questions, and many other related questions, are the 
motivation for thinking long and hard about the “bedrock” of mathematics. 


Although it could be concluded that all problems in the foundations of mathematics could be removed by 
drastic measures, such as rejecting the infinity axiom, or the power-set axiom, or both, this would remove 
essentially all of modern analysis. The best policy is, all things considered, to manage the metaphysicality of 
mathematics, because when the metaphysical is removed from mathematics, almost nothing useful remains. 


If only concrete mathematics is accepted, then mathematics loses most of its power. The whole point of 
mathematics is to take human beings beyond the concrete. Restricting mathematics to the most concrete 
concepts defeats its prime purpose. Without infinities, and infinities of infinities, mathematics could not 
progress far beyond the abacus and the ruler-and-compass. But the price to pay to go far beyond directly 
perceptible realities is eternal vigilance. Mathematics needs to be anchored in reality to some extent. 


Mathematics spans a spectrum of metaphysicality from the intuitively clear elementary arithmetic to bizarre 
concepts such as “inaccessible cardinals”. Situated in between the extremes lie mathematical analysis and 
differential geometry. One does want to be able to assert the existence of solutions to problems of many 
kinds, but if the existence is of the shadowy axiom-of-choice kind, that won’t be of much practical use when 
one tries to design approximation algorithms for such solutions, or draw graphs of them. 


2.1.6 REMARK: The importance of the foundations of mathematics. 
Anyone who doubts the importance of the foundations of mathematics is perhaps unaware of the ubiquitous 
role of mathematics in the technological progress which improves quality of life. (See Figure 2.1.2.) 


Figure 2.1.2 Mathematics foundations underpin quality of life


The progress of technology in the last 500 years has rested heavily on progress in fundamental science, which 
in turn has relied heavily on progress in mathematics. The general public who consume modern technology 
are mostly unaware of the vast amount of mathematical analysis which underpins it. Mathematics is the
foundation of science, which is the foundation of modern technology, the principal factor which differentiates
life in the 21st century from life in the 16th century. Since mathematics has been such an important factor
in technological progress in the last 500 years, surely the foundations of mathematics deserve some attention
also. A large book could easily be filled with a list of the main contributions of mathematics to
modern science and technology. Unfortunately it is only mathematicians, scientists and engineers who can 
appreciate the full extent of the contributions of mathematics to quality of life. (See also Remark 3.0.4.) 


2.2. Ontology of mathematics 
2.2.1 REMARK: Ontology is concerned with the true nature of things, beyond their formal treatment. 
If the bedrock of mathematics ultimately lies outside of mathematics, this raises the question of where exactly
it does lie. This question is related to the concept of “ontology”.

2.2.2 REMARK: Albert Einstein’s explanation of the ontological problem for mathematics.

Albert Einstein [458], pages 154-155, wrote the following comments about the ontological problem in a 
1934 essay, “Das Raum-, Äther- und Feldproblem der Physik”, which originally appeared in 1930. He used
the terms “Wesensproblem” and “Wesensfrage” (literally “essence-problem” or “essence-question”) for the
ontological problem or question. 


Es ist die Sicherheit, die uns bei der Mathematik soviel Achtung einflößt. Diese Sicherheit aber ist
durch inhaltliche Leerheit erkauft. Inhalt erlangen die Begriffe erst dadurch, daß sie — wenn auch
noch so mittelbar — mit den Sinneserlebnissen verknüpft sind. Diese Verknüpfung aber kann keine 
logische Untersuchung aufdecken; sie kann nur erlebt werden. Und doch bestimmt gerade diese 
Verknüpfung den Erkenntniswert der Begriffssysteme. 


Beispiel: Ein Archäologe einer späteren Kultur findet ein Lehrbuch der euklidischen Geometrie 
ohne Figuren. Er wird herausfinden, wie die Worte Punkt, Gerade, Ebene in den Sätzen gebraucht
sind. Er wird auch erkennen, wie letztere auseinander abgeleitet sind. Er wird sogar selbst neue 
Sätze nach den erkannten Regeln aufstellen können. Aber das Bilden der Sätze wird für ihn ein 
leeres Wortspiel bleiben, solange er sich unter Punkt, Gerade, Ebene usw. nicht »etwas denken 
kann«. Erst wenn dies der Fall ist, erhält für ihn die Geometrie einen eigentlichen Inhalt. Analog
wird es ihm mit der analytischen Mechanik gehen, überhaupt mit Darstellungen logisch-deduktiver 
Wissenschaften. 


Was meint dies »sich unter Gerade, Punkt, schneiden usw. etwas denken können?« Es bedeutet
das Aufzeigen der sinnlichen Erlebnisinhalte, auf die sich jene Worte beziehen. Dies außerlogische
Problem bildet das Wesensproblem, das der Archäologe nur intuitiv wird lösen können, indem 
er seine Erlebnisse durchmustert und nachsieht, ob er da etwas entdecken kann, was jenen Ur- 
worten der Theorie und den für sie aufgestellten Axiomen entspricht. In diesem Sinn allein kann 
vernünftigerweise die Frage nach dem Wesen eines begrifflich dargestellten Dinges sinnvoll gestellt 
werden. 


This may be translated into English as follows. 


It is the certainty which so much commands our respect in mathematics. This certainty is purchased, 
however, at the price of emptiness of content. Concepts can acquire content only when — no matter 
how indirectly — they are connected with sensory experiences. But no logical investigation can 
uncover this connection; it can only be experienced. And yet it is precisely this connection which 
determines the knowledge-value of the systems of concepts. 


Example: An archaeologist from a later culture finds a textbook on Euclidean geometry without 
diagrams. He will find out how the words “point”, “line” and “plane” are used in the theorems. He 
will also discern how these are derived from each other. He will even be able to put forward new 
theorems himself according to the discerned rules. But the formation of theorems will be an empty 
word-game for him so long as he cannot “have something in mind” for a point, line, plane, etc. Until 
this is the case, geometry will hold no actual content for him. He will have a similar experience 
with analytical mechanics, in fact with descriptions of logical-deductive sciences in general. 


What does this “having something in mind for a line, point, intersection etc.” mean? It means
indicating the sensory experience content to which those words relate. This extralogical problem 
constitutes the essence-problem which the archaeologist will be able to solve only intuitively, by 
sifting through his experiences to see if he can find something which corresponds to the fundamental 
terms of the theory and the axioms which are proposed for them. Only in this sense can the question 
of the essence of an abstractly described entity be meaningfully posed in a reasonable way. 


(For an alternative English-language translation, see Einstein [459], pages 61-62, “The problem of space,
ether, and the field in physics”, where “Wesensfrage” is translated in the paragraph following the above
quotation as “ontological problem”. That translation is also in Commins/Linscott [455], pages 471-472.)


Mathematics cannot be defined in a self-contained way. Some science-fiction writers claim that a dialogue 
with extra-terrestrial life-forms will start with lists of prime numbers and gradually progress to discussions 
of interstellar space-ship design. But the ancient Greeks didn't even have the same definitions for odd and 
even numbers that we have now. (See for example Heath [244], pages 70-74.) So it is difficult to make the 
case that prime numbers will be meaningful to every civilisation in the galaxy. Since only one out of millions 
of species on Earth has developed language in the last half a billion years (ignoring extinct homininians), it
is not at all certain that even language is universal. The fact that one species out of millions can understand
numbers and differential calculus doesn’t imply that they are somehow universally self-evident, independent 
of our biology and culture. 


Einstein is probably not entirely right in saying that all meaning must be connected to “sensory experience” 
unless one includes internal perception as a kind of sensory experience. Pure mathematical language refers 
to activities which are perceived within the human brain, but internal perception originates in the ambient 
culture. So it is probably more accurate to say that words must be connected to cultural experience. A 
thousand years ago, no one on Earth could have understood Lebesgue integration, no matter how well it was 
explained, because the necessary intellectual foundations did not exist in the culture at that time. 


One of the main purposes of this book is to identify the meanings of many of the definitions in mathematics 
without the need for so much communication by osmosis as an apprentice in the mathematics trade. Anyone 
who reads older historical works will inevitably find concepts which make little or no sense in terms of 
modern mathematical thought. Often the ontology has been lost. Ancient authors had concepts in their 
minds which are no longer generally even thought about in the 21st century. Even in the relatively new 
subject of differential geometry, many older works are difficult to make sense of now. Hence it is important 
to attempt to explicitly state the meanings of mathematical concepts at this time so that they may be 
understood in future centuries which do not have access to our current “osmosis through apprenticeship”.


It may often seem that some of the explanations in this book are excessively explicit, but people in future 
centuries might not be able to “fill the gaps” in a way which we now take for granted. If the gaps are made 
smaller, they will be easier to fill, not only by people of the distant future, but also by people in this century. 
So the effort to make the implicit explicit is arguably worthwhile. 


2.2.3 REMARK: Ontology versus “an ontology”. 
Semantics is the association of meanings with texts. An ontology is a semantics for which the meanings are 
expressed in terms of world-models. 


In philosophy, the subject of “ontology” is generally defined as the study of being or the essence of things. The 
phrase “an ontology” has a different but related meaning. An ontology, especially in the context of artificial 
intelligence computer software, is a standardised model to which different languages may refer to facilitate 
translation between the languages. To achieve this objective, an ontology must contain representations of all 
the objects, classes, relations, attributes and other things which are signified by the languages in question. 


In the context of mathematics, an ontology must be able to represent all of the ideas which are signified by 
mathematical language. The big question is: What should an ontology for mathematics contain? In other 
words, what are the things to which mathematics refers and what are the relations between them? 


2.2.4 REMARK: Possible locations for mathematics. 
The following possible “locations” for mathematics are discussed in the mathematics philosophy literature. 


(1) Mathematics exists in the machinery of the universe, and that’s where humans find it. 


(2) Plato’s theory of ideal Forms. Mathematics exists in a “timeless realm of being”. 


(3) Mathematics exists in the human mind. 


2.2.5 REMARK: The physical-universe-structure mathematics ontology. 
The ontology category (1) in Remark 2.2.4 is exemplified by the following comments in Lakoff/Núñez [449],
page xv. (The authors then proceed to pour scorn on the idea.) 


Mathematics is part of the physical universe and provides rational structure to it. There are 
Fibonacci series in flowers, logarithmic spirals in snails, fractals in mountain ranges, parabolas in 
home runs, and π in the spherical shape of stars and planets and bubbles.

Later, these same authors say the following (Lakoff/Núñez [449], page 3).


How can we make sense of the fact that scientists have been able to find or fashion forms of
mathematics that accurately characterize many aspects of the physical world and even make correct
predictions? It is sometimes assumed that the effectiveness of mathematics as a scientific tool shows
that mathematics itself exists in the structure of the physical universe. This, of course, is not a
scientific argument with any empirical scientific basis.

[...] Our argument, in brief, will be that whatever “fit” there is between mathematics and the
world occurs in the minds of scientists who have observed the world closely, learned the appropriate 
mathematics well (or invented it), and fit them together (often effectively) using their all-too-human 
minds and brains. 


It is certain that mathematics does exist in the human mind. This mathematics corresponds to observations 
of, and interactions with, the physical universe. Those observations and interactions are limited by the 
human sensory system and the human cognitive system. We cannot be at all certain that the mathematics 
of our physical models is inherent in the observed universe itself. We can only say that our mathematics 
is well suited to describing our observations and interactions, which are themselves limited by the channels 
through which we make our observations. 


The correspondence between models and the universe seems to be good, but we view the universe through 
a cloud of statistical variations in all of our measurements. All observations of the physical universe require 
statistical inference to discern the noumena underlying the phenomena. This inference may be explicit, with 
reference to probabilistic models, or it may be implicit, as in the case of the human sensory system. 


2.2.6 REMARK: A plausible argument in favour of a Platonic-style ontology for mathematics. 
A plausible argument may be made in favour of Platonic ontology for the integers in the following way. 


(1) Numbers written on paper must refer to something, because so many people agree about them. 
(2) Numbers do not correspond exactly to anything in the perceived world. 


(3) Therefore numbers correspond exactly to something which is not in the perceived world. 


The same overall form of argument is applied to geometrical forms such as triangles in this passage from 
Russell [469], page 139. 


In geometry, for example, we say: ‘Let ABC be a rectilinear triangle.’ It is against the rules to ask 
whether ABC really is a rectilinear triangle, although, if it is a figure that we have drawn, we may 
be sure that it is not, because we can’t draw absolutely straight lines. Accordingly, mathematics 
can never tell us what is, but only what would be if... There are no straight lines in the sensible 
world; therefore, if mathematics is to have more than hypothetical truth, we must find evidence 
for the existence of super-sensible straight lines in a super-sensible world. This cannot be done by 
the understanding, but according to Plato it can be done by reason, which shows that there is a 
rectilinear triangle in heaven, of which geometrical propositions can be affirmed categorically, not 
hypothetically. 


Written numerals are only ink or graphite smeared onto paper. If billions of people are writing these symbols, 
they must mean something. They must refer to something in the world of experience of the people who 
write the symbols to communicate with each other. 


Since nothing in the physical world corresponds exactly to the idea of a number, and written numbers must 
refer to something, there must be a non-physical world where numbers exist. This non-physical world must 
be perceived by all humans because otherwise they could not communicate about numbers with each other. 
This proves the existence of a non-physical world where all numbers are perfect and eternal. This world 
may be referred to as a “mathematics heaven”. This number-world can be perceived by the minds of human 
beings. So the human mind is a sensory organ which can perceive the Platonic ideal universe in the same
way that the eyes see physical objects. (See Plato [465], pages 240-248, for “the simile of the cave”.)

2.2.7 REMARK: Lack of unanimity in perception of the Platonic universe of Forms. 

The history of arguments over the last 150 years about the foundations of mathematics, particularly in regard 
to intuitionism versus formalism or logicism, casts serious doubt on the claim that everyone perceives the same 
mathematical universe. There have even been serious disagreements as to the “reality” of negative numbers, 
zero, irrational numbers, transcendental numbers, complex numbers, and transfinite ordinal numbers. 


People born into a monolinguistic culture may have the initial impression that their language is “absolute” 
because everyone agrees on its meaning and usage, but later one discovers that language is mostly “relative”, 
i.e. determined by culture. In the same way, Plato might have thought that mathematics is absolute, but 
the more one understands of other cultures, the more one perceives one’s own culture to be relative. 

2.2.8 REMARK: Descartes and Hermite supported the Platonic ontology for mathematics.
Descartes and Hermite supported the Platonic Forms view of mathematics. Bell [233], page 457, published 
the following comment in 1937. 


Hermite’s number-mysticism is harmless enough and it is one of those personal things on which 
argument is futile. Briefly, Hermite believed that numbers have an existence of their own above 
all control by human beings. Mathematicians, he thought, are permitted now and then to catch 
glimpses of the superhuman harmonies regulating this ethereal realm of numerical existence, just 
as the great geniuses of ethics and morals have sometimes claimed to have visioned the celestial 
perfections of the Kingdom of Heaven. 


It is probably right to say that no reputable mathematician today who has paid any attention 
to what has been done in the past fifty years (especially the last twenty five) in attempting to 
understand the nature of mathematics and the processes of mathematical reasoning would agree 
with the mystical Hermite. Whether this modern skepticism regarding the other-worldliness of 
mathematics is a gain or a loss over Hermite’s creed must be left to the taste of the reader. What is 
now almost universally held by competent judges to be the wrong view of “mathematical existence” 
was so admirably expressed by Descartes in his theory of the eternal triangle that it may be quoted 
here as an epitome of Hermite’s mystical beliefs. 


“I imagine a triangle, although perhaps such a figure does not exist and never has existed anywhere
in the world outside my thought. Nevertheless this figure has a certain nature, or form, or
determinate essence which is immutable or eternal, which I have not invented and which in no way 
depends on my mind. This is evident from the fact that I can demonstrate various properties of 
this triangle, for example that the sum of its three interior angles is equal to two right angles, that 
the greatest angle is opposite the greatest side, and so forth. Whether I desire it or not, I recognize 
very clearly and convincingly that these properties are in the triangle although I have never thought 
about them before, and even if this is the first time I have imagined a triangle. Nevertheless no one 
can say that I have invented or imagined them.” Transposed to such simple “eternal verities” as
1+2=3, 2+2=4, Descartes’ everlasting geometry becomes Hermite’s superhuman arithmetic.


2.2.9 REMARK: Bertrand Russell abandoned the Platonic ontology for mathematics. 
Bertrand Russell once believed the ideal Forms ontology, but later rejected it. Bell [234], page 564, said the 
following about Bertrand Russell’s change of mind. 


In the second edition (1938) of the Principles, he recorded one such change which is of particular 
interest to mathematicians. Having recalled the influence of Pythagorean numerology on all subsequent
philosophy and mathematics, Russell states that when he wrote the Principles, most of it in
1900, he “shared Frege’s belief in the Platonic reality of numbers, which, in my imagination, peopled 
the timeless realm of Being. It was a comforting faith, which I later abandoned with regret.” 


2.2.10 REMARK: Sets and numbers exist in minds, not in a mathematics-heaven. 

The idea that all imperfect physical-world circles are striving towards a single Ideal circle form in a perfect 
Form-world is quite seductive. But the seduction leads nowhere. (For more discussion of Plato’s “theory of 
ideas”, see for example Russell [469], Chapter XV, pages 135-146; Foley [448], pages 81-83.)


The Platonic style of ontology is explicitly rejected in this book. Sets and numbers, for example, really 
exist in the mind-states and communications among human beings (and also in the electronic states and 
communications among computers and between computers and human beings), but sets and numbers do 
not exist in any “mathematics heaven” where everything is perfect and eternal. Perfectly circular circles,
perfectly straight lines and zero-width points all exist in the human mind, which is an adequate location for 
all mathematical concepts. There is no need to postulate some kind of extrasensory perception by which 
the mind perceives perfect geometric forms. The perfect Forms are inside the mind and are perceived by 
the mind internally. Plato’s “heaven for Forms” is located in the human mind, if anywhere, and it is neither
perfect nor eternal. 


2.2.11 REMARK: The location of mathematics is the human brain. 
The view taken in this book is option (3) in Remark 2.2.4. In other words, mathematics is located in the
human mind. Lakoff/Núñez [449], page 33, has the following comment on the location of mathematics.


Ideas do not float abstractly in the world. Ideas can be created only by, and instantiated only in,
brains. Particular ideas have to be generated by neural structures in brains, and in order for that 
to happen, exactly the right kind of neural processes must take place in the brain’s neural circuitry. 


Lakoff/Núñez [449], page 9, has the following similar comment.


Mathematics as we know it is human mathematics, a product of the human mind. Where does 
mathematics come from? It comes from us! We create it, but it is not arbitrary—not a mere 
historically contingent social construction. What makes mathematics nonarbitrary is that it uses 
the basic conceptual mechanisms of the embodied human mind as it has evolved in the real world. 
Mathematics is a product of the neural capacities of our brains, the nature of our bodies, our 
evolution, our environment, and our long social and cultural history. 


The mathematics-in-the-brain ontology implies that the concepts of mathematics are certain kinds of states 
or activities within the brain. A mathematics education requires the creation of mathematical objects 
within the brain. The capabilities of the brain are continuously “stretched” by each step in that education. 
(This might explain why most people find advanced mathematics painful.) Space-filling curves, Lebesgue 
non-measurable functions, infinite-dimensional Hilbert spaces, and everything else must be represented
somehow in the brain. 


An idea which is intuitive to a person immersed in advanced mathematics may be impossible to grasp for a 
person who lacks that immersion. The necessary structures and functions must be built up in the brain over 
time, level by level. Students of mathematics must literally create the subject matter of mathematics inside 
their brains, unlike the situation for the sciences and engineering, where the subject matter is “out there” in 
a shared universe which can be perceived by everyone. Mathematicians communicate and synchronise their 
ideas by words and symbols, but introspection is the only way to “see” those ideas. 


2.2.12 REMARK: Intuitionism versus formalism. 

During the first half of the twentieth century, there were battles between the intuitionists (such as Brouwer) 
and the formalists (such as Hilbert). During the second half of the century, intuitionism became more and 
more peripheral to the mainstream of mathematics and is now not at all prominent. Intuitionism lost the 
battle. The battle is well illustrated by the account by Reid [248], page 184, of a meeting in 1924 July 22. 
(See also Van Dalen [250], page 491.) 


The enthusiasm for Brouwer’s Intuitionism had definitely begun to wane. Brouwer came to Göttingen
to deliver a talk on his ideas to the Mathematics Club. [...] After a lively discussion Hilbert finally 
stood up. 


“With your methods,” he said to Brouwer, “most of the results of modern mathematics would have 
to be abandoned, and to me the important thing is not to get fewer results but to get more results.” 


He sat down to enthusiastic applause. 


The comment on this meeting by Hans Lewy seems as correct now as it was then. (See Reid [248], page 184; 
Van Dalen [250], pages 491-492.) 


The feeling of most mathematicians has been informally expressed by Hans Lewy, who as a Privatdozent
was present at Brouwer’s talk in Göttingen:


“It seems that there are some mathematicians who lack a sense of humor or have an over-swollen 
conscience. What Hilbert expressed there seems exactly right to me. If we have to go through so 
much trouble as Brouwer says, then nobody will want to be a mathematician any more. After all, 
it is a human activity. Until Brouwer can produce a contradiction in classical mathematics, nobody 
is going to listen to him.

“That is the way, in my opinion, that logic has developed. One has accepted principles until such
time as one notices that they may lead to contradiction and then he has modified them. I think 
this is the way it will always be. There may be lots of contradictions hidden somewhere; and as 
soon as they appear, all mathematicians will wish to have them eliminated. But until then we will 
continue to accept those principles that advance us most speedily.”


Both ends of the spectrum lack credibility. Pure formalism apparently accepts as mathematics anything that 
can be written down in symbolic logic without disobeying some arbitrary set of rules. But obeying the rules 
is only a necessary attribute of mathematics, certainly not sufficient to deliver meaningfulness. By contrast, 
intuitionism rejects any mathematics which has insufficient intuitive immediacy. But it is difficult to draw 
a clear line so that meaningful intuitive mathematics is on one side and meaningless mystical mathematics
lies on the other. Certainly to “get more results” cannot be the overriding consideration. 


There is a gradation of meaningfulness from the most intuitive concepts, such as finite integers, to clearly
non-intuitive concepts, such as real-number well-orderings or Lebesgue non-measurable sets. As concepts
become more and more metaphysical and lose their connection to reality, one must simply be aware that one 
is studying the properties of objects which cannot have any concrete representation, even within the human 
mind. One will never see a Lebesgue non-measurable set of points in the real world, nor in one’s own mind. 
One will never see a well-ordering of the real numbers. But one can meaningfully discuss the properties 
of such fantasy concepts in an axiomatic fashion, just as one may discuss the properties of a Solar System 
where the Moon is twice as heavy as it is, or a universe where the gravitational constant G is twice as high as 
it is, or a universe which has a thousand spatial dimensions. Discussing properties of metaphysical concepts 
is no more harmful than playing chess. Formalism is harmless as long as one is aware, for example, that it 
is impossible to write a computer program which can print out a well-ordering of all real numbers. 


The solution to the intuitionism-versus-formalism debate adopted here is to permit mildly metaphysical 
concepts, but to always strive to maintain awareness of how far each concept diverges from human intuition. 
(In concrete terms, this book is based on standard mathematical logic plus ZF set theory without the axiom 
of choice.) The approach adopted here might be thought of as “Hilbertian semi-formalism”, since Hilbert 
apparently thought of formalism as a way to extend mathematics in a logically consistent manner beyond 
the bounds of intuition. By proving self-consistency of a set of axioms for mathematics, he had hoped to 
at least show that such an extrapolation beyond intuition would be acceptable. When Gödel essentially 
showed that the axioms for arithmetic could not be proven by finitistic methods to be self-consistent, the 
response of later formalists, such as Curry, was to ignore semantics and focus entirely on symbolic languages 
as the real “stuff” of mathematics. (Note, however, that Gentzen proved that arithmetic is self-consistent 
by means of transfinite induction up to the ordinal ε₀, which goes beyond finitistic methods and therefore 
does not contradict Gödel's incompleteness metatheorem.) Model theory has brought some 
semantics back into mathematical logic. So modern mathematical logic, with model theory, cannot be said 
to be semantics-free. However, the ZF models which are used in model theory are largely non-constructible 
and no more intuitive than the languages which are interpreted with those models. 


The view adopted here, then, is that formalism is a “good idea in principle”, which facilitates the extension 
of mathematics beyond intuition. But the further mathematics is extrapolated, the more dubious one must 
be of its relevance and validity. It is generally observed in most disciplines that extrapolation is less reliable 
than interpolation. Euclidean geometry, for example, cannot be expected to be meaningful or accurate when 
extrapolated to a diameter of a trillion light-years. Every theory is limited in scope. 


2.2.13 REMARK: Pure mathematics ontology versus applied mathematics ontology. 

One gains another perspective regarding ontology by distinguishing pure mathematics ontology from applied 
mathematics ontology. In the former case, numbers, sets and functions refer to mental states and activities. 
In the latter case, numbers, sets and functions refer to real-world systems which are being modelled. 


Zermelo-Fraenkel set theory and other pure mathematical theories, such as those of geometry, arithmetic and 
algebra, are abstractions from the concrete models which are constructed for real-world systems in physics, 
chemistry, biology, engineering, economics and other areas. Such mathematical theories provide frameworks 
within which models for extra-mathematical systems may be constructed. Thus mathematical theories are 
like model-building kits such as are given to children for play or are used by architects or engineers to model 
buildings or bridges. For example, the interior of a planet or the structure of a human cell can be modelled 
within Zermelo-Fraenkel set theory. Even abstract geometry, arithmetic and algebra can be modelled within 
Zermelo-Fraenkel set theory. 


Whenever a mathematical theoretical framework is applied to some extra-mathematical purpose, the entities 
of the pure mathematical framework acquire meaning through their association with extra-mathematical 
objects. Thus in applied mathematics, the answer to the question “What do these numbers, sets and 
functions refer to?” may be answered by “following the arrow” from the model to the real-world system. 


In pure mathematics, one studies mathematical frameworks in themselves, just as one might study a molecule 
construction kit or the components which are used in an architect's model of a building. One may study the 
general properties of the components of such modelling kits even when the components are not currently 
being utilised in a particular model of a particular real-world system. In this pure mathematical case, one 
cannot determine the meanings of the components by “following the arrow” from the model to the real-world 
system. In this case, one is dealing with a human abstraction from the universe of all possible models of all 
systems which can be modelled with the abstract framework. 


The distinction between pure and applied mathematics ontologies is not clear-cut. All real-world categories, 
such as buildings, bridges, dogs, etc., are human mental constructs. There is no machine which can be 
pointed at an animal to say if it is a dog or not a dog. The set of dogs is a human classification of the 
world. There may be boundary cases, such as hybrids of dogs, dingos, wolves, foxes, and other dog-related 
animals. The situation is even more subjective in regard to breeds of dogs. A physical structure may be 
both a building and a bridge, or some kind of hybrid. Is a cave a building if someone lives in it and puts 
a door on the front? Is a caravan a building? Is a tree trunk a bridge if it has fallen across a river? Is a 
building a building if it is only partly built, or if it has been demolished already? It becomes rapidly clear 
that there is a second level of modelling here. The mathematical model points to an abstract world-model, 
and the world-model points in some subjective way to the real world. Ultimately there is no real world apart 
from what is perceived through observations from which various conceptions of the “real world” are inferred. 


As mentioned in Remark 3.3.1, the word “dog” is given real-world meaning by pointing to a real-world dog. 
The abstract number 3, written on paper, does not point to a real-world external object in the same way, 
since it refers to something which resides inside human minds. However, when the number 3 is applied 
within a model of the number of sheep in a field, it acquires meaning from the application context. 


One may conclude from these reflections that there is no unsolvable riddle or philosophical conundrum 
blocking the path to an ontology for mathematics. If anyone asks what any mathematical concept refers to, 
two answers may be given. In the abstract context, the answer is a concept in one or more human minds. 
In the applied context, the answer is some entity which is being modelled. Of course, some mathematics is 
so pure that one may have difficulty finding any application to anything resembling the real world, in which 
case only one of these answers may be given. 


2.2.14 REMARK: The importance of correct ontology for learning mathematics. 

Anyone who believes that the foundations of mathematics are either set in stone, or irrelevant, or both, 
should consider that despite the ubiquitous importance of mathematics in all areas of science, engineering, 
technology and economics, it remains the most unpopular and inaccessible subject for most of the people who 
need it most. The difficulties with learning mathematics may plausibly be attributed to its metaphysicality. 
Plato was essentially right in thinking that mathematics is perceived by the brain in much the same way 
that the physical world is perceived by sight and hearing and the other senses. But what one perceives is 
not some sort of astral plane, but rather the activity and structures within one’s own brain. 


The learning of mathematics requires restructuring the brain to accept and apply analytical ideas which 
employ a bewildering assortment of infinite systems as conceptual frameworks. When students say that 
they do not “see” an idea, that is literally true because the required idea-structure is not yet created in 
their brain. Learning mathematics requires the steady accumulation and interconnection of idea-structures. 
The ability to write answers on paper is not the main objective. The answers will appear on paper if the 
idea-structures in the brain are correctly installed, configured and functioning. Whereas the chemist studies 
chemicals which are “out there” in the physical world, mathematics students must not only look inside their 
own minds for the objects of their study, but also create those objects as required. 


Mathematics has strong similarities to computer programming. Theorems are analogous to library procedures 
which are invoked by passing parameters to them. If the parameters correctly match the assumptions of the 
theorem, the logic of the theorem’s proof is executed and the output is an assertion. The proof is analogous 
to the “body” or “code” of a computer program procedure. However, whereas computer code is executed on 
a semiconductor-based computer, mathematical theorems are executed inside human brains. 
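
The analogy can be made concrete with a small sketch. The following Python function is a hypothetical 
illustration only, not part of the formal development in this book. It treats Bézout's identity for integers 
as a "library procedure". The parameter check corresponds to matching the assumptions of the theorem, the 
body follows a standard constructive proof (the extended Euclidean algorithm), and the returned values 
witness the asserted conclusion. The name bezout is an assumption made for this example. 

```python
def bezout(a: int, b: int) -> tuple[int, int, int]:
    """Theorem (Bezout): if a and b are integers, not both zero, then there
    exist integers x and y with a*x + b*y == g, where g > 0 divides a and b.

    Returns the witnesses (g, x, y).
    """
    # Matching the parameters against the assumptions of the theorem.
    if a == 0 and b == 0:
        raise ValueError("assumption violated: a and b must not both be zero")

    # The "proof body": the extended Euclidean algorithm.
    # Invariant: a*old_x + b*old_y == old_r and a*x + b*y == r.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y

    # Normalise so that the common divisor g is positive.
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


# "Invoking the theorem": the output is an assertion which can be checked.
g, x, y = bezout(240, 46)
assert 240 * x + 46 * y == g
assert 240 % g == 0 and 46 % g == 0
```

Calling bezout(0, 0) raises an error, which corresponds to an attempt to apply a theorem whose 
assumptions are not satisfied. 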


2.3. Dark sets and dark numbers 


2.3.1 REMARK: Informal definition of a nameable set. 

A set (or number, or other kind of object) x may be considered to be “nameable” within a particular set 
theory framework if one can write a single-parameter predicate formula φ within that framework such that 
the proposition φ(x) is true, and it can be shown that x is the unique parameter for which φ(x) is true. This 
definition leaves some ambiguity. The phrase “can write” is ambiguous. If it would take a thousand years 
to print out a formula which x uniquely satisfies, and ten thousand years for an army of mathematicians to 
check the correctness of the formula, that would raise some doubts. Similarly, the phrase “can be shown” is 
ambiguous. If the existence of the formula φ, and the existence of proofs that (1) x satisfies φ, and that (2) it 
is unique, may be proved only by indirect methods which do not permit an explicit formula to be presented, 
that would raise some doubts also. 


An example of a nameable number would be π, because we can write a formula for it, and we can prove that 
one and only one number satisfies the formula (within a suitable set theory framework). Another example 
of a nameable number would be the quintillionth prime number, although we cannot currently write down 
even its first decimal digit. 
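
The idea of a uniquely satisfied single-parameter predicate can be sketched in Python as follows. This is a 
minimal illustration under stated assumptions: the helper names is_prime and make_phi are hypothetical, 
the index n = 10 is chosen only so that the search is fast, and the bounded search merely checks uniqueness 
within a finite range instead of proving it. 

```python
def is_prime(k: int) -> bool:
    """Trial-division primality test (adequate for small k)."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def make_phi(n: int):
    """Return the predicate phi(x): 'x is prime and exactly n - 1 primes
    are smaller than x', which says that x is the n-th prime number."""
    def phi(x: int) -> bool:
        primes_below = sum(1 for k in range(2, x) if is_prime(k))
        return is_prime(x) and primes_below == n - 1
    return phi


phi = make_phi(10)

# Bounded search: which x below 200 satisfy phi?
satisfiers = [x for x in range(200) if phi(x)]
print(satisfiers)
```

The same finite formula with n equal to one quintillion names the quintillionth prime number, although 
evaluating it in this naive way is far beyond any available computing power. 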


2.3.2 REMARK: Unnameable sets, mystical faith, Ockham’s razor and “unnecessary entities”. 

An example of an unnameable set would be a well-ordering of the real numbers. The believers “know” by 
the axiom of choice that such well-orderings “exist”, but no individual well-ordering of the real numbers can 
be named. The existence of such dark objects is a matter of mystical faith, which is therefore a matter of 
personal preference. It has been shown, more or less, that the axiom of choice is consistent with the more 
constructive axioms of set theory. Therefore it is not possible to disprove the faith of believers. Dark sets 
either exist or do not exist in an inaccessible zone where no one can identify individual objects. No light 
can ever be shone on this region. It is eternally in total darkness. If one accepts Ockham's razor, one surely 
does not wish to “multiply entities beyond necessity”. Newton [294], page 387, put it very well as follows. 


REGULÆ PHILOSOPHANDI 
REGULA I. 


Causas rerum naturalium non plures admitti debere, quam quæ & veræ sint & earum phænomenis 
explicandis sufficiant. 


Dicunt utique philosophi: Natura nihil agit frustra, & frustra fit per plura quod fieri potest per 
pauciora. Natura enim simplex est & rerum causis superfluis non luxuriat. 


This may be translated as follows. 


PHILOSOPHISING RULES 
RULE I. 


One must not admit more causes of natural things than are both true and sufficient to explain their 
appearances. 


The philosophers say anyway: Nature does nothing without effect, and it is without effect to make 
something with more which can be made with less. Indeed, nature is simple and does not luxuriate 
in superfluous causes of things. 


The intuitionists rejected more of the “dark entities” in mathematics than the logicists, but some of these 
dark entities are of great practical utility, particularly the “existence” of infinite sets, which are the basis of 
limits and all of analysis. Limits are no more harmful than the “points at infinity” of the artists’ rules of 
perspective. Points at infinity are convenient, and have a strong intuitive basis. But well-orderings of the 
real numbers are useless because one cannot approach them as “limits” in any way. They simply are totally 
inaccessible. They are therefore not a “necessity”. 


2.3.3 REMARK: Unnameable numbers and sets may be thought of as “dark numbers” and “dark sets”. 
The constraints of signal processing place a finite upper bound on the number of different symbols which 
can be communicated on electromagnetic channels, in writing or by spoken phonemes. Therefore in a finite 
time and space, only a finite amount of information can be communicated, depending on the length of time, 
the amount of space, and the nature of the communication medium. Consequently, mathematical texts can 
communicate only a finite number of different sets or numbers. The upper bound applies not only to the 
set of objects that can be communicated, but also to the set of objects from which the chosen objects are 
selected. In other words, one can communicate only a finite number of items from a finite menu. 


Since symbols are chosen from a finite set, and only a finite amount of time and space is available for 
communication, it follows that only a finite number of different sets and numbers can ever be communicated. 
Similarly, only a finite number of different sets and numbers can ever be thought about. Even if one imagines 
that an infinite amount of time and space is available, the total number of different sets and numbers that 
can be communicated or thought about is limited to countable infinity. Since symbols can be combined to 
signify an arbitrarily wide range of sets and numbers, it seems sensible to generously accept a countably 
infinite upper bound for the cardinality of the sets and numbers that can be communicated or thought about. 
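
The countability of the set of all finite texts over a finite alphabet can be illustrated by listing the texts 
in order of increasing length. The following Python sketch is only an illustration: the generator name 
all_texts and the two-letter alphabet are assumptions made for this example. Every finite text appears at 
some finite position in the list, which gives an explicit numbering of the texts by non-negative integers. 

```python
from itertools import count, islice, product


def all_texts(alphabet: str):
    """Yield every finite string over the given alphabet, shortest first,
    and in alphabet order within each length."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


# The first few texts over the alphabet {a, b}, starting with the empty text.
first_texts = list(islice(all_texts("ab"), 7))
print(first_texts)

# The position of a text in the enumeration is its "name" as an integer.
index_of_ba = next(i for i, t in enumerate(all_texts("ab")) if t == "ba")
print(index_of_ba)
```

No such listing can exhaust an uncountable set, which is why almost all of its elements must remain 
without names. 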


It follows that almost all elements of any uncountably infinite set are incapable of being communicated or 
thought about. One could refer to those sets and numbers which cannot be communicated or thought about 
as “unnameable”, “uncommunicable”, “unthinkable”, “unmentionable”, and so forth. 


Unnameable numbers and sets could be described as “dark numbers” and “dark sets” by analogy with “dark 
matter” and “dark energy”. (This is illustrated in Figure 2.3.1.) The supposed existence of dark matter and 
dark energy is inferred in aggregate from their effects in the same way that dark numbers and sets may be 
constructed in aggregate. Unnameable numbers and sets could also be called “pixie numbers” and “pixie 
sets”, because we know they are “out there”, but we can never see them. 


[Diagram labels: countable name space; universe of numbers] 

Figure 2.3.1 Nameable numbers and dark numbers 


The unnameability issue can be thought of as a lack of names in the name space which can be spanned 
by human texts. The name spaces used by human beings are necessarily countable. These names may be 
thought of as “illuminating” a countable subset of elements of any given set. Then if a set is uncountable, 
almost all of the set is necessarily “dark”. 


Since most elements of an uncountable set can never be thought of or mentioned, any uncountable set is 
metaphysical. One may write down properties which are satisfied by uncountable sets, but this is analogous 
to a physics model for a 500-dimensional universe. Whether or not such a model is self-consistent, the 
model still refers to a universe which we cannot experience. In fact, such a ridiculous physical model cannot 
be absolutely rejected as impossible, but the possibility of an uncountable set of names for things can be 
absolutely rejected. 


In particular, standard Zermelo-Fraenkel set theory is a world-model for a mathematical world which is 
almost entirely unmentionable and unthinkable. The use of ZF set theory entails a certain amount of 
self-deception. One pretends that one can construct sets with uncountable cardinality, but in reality these sets 
can never be populated. In other words, ZF set theory is a world-model for a fantasy mathematical universe, 
not the real mathematical universe of real mathematicians. 


This does not imply that ZF set theory must be abandoned. The sets in ZF set theory which can be named 
and mentioned and thought about are perfectly valid and usable. It is important to keep in mind, however, 
that ZF set theory models a universe which contains a very large proportion of wasted space. But there is 
no real problem with this because it costs nothing. 


2.3.4 REMARK: Arbitrary mathematical foundations need to be anchored. 

One of the most powerful capabilities of the human mind is its ability to imagine things which are not 
directly experienced. For example, we believe that the food in a refrigerator continues to exist when the 
door is closed. This ability to believe in the existence of things which are not directly perceived is an 
enormous advantage when the belief turns out to be true. However, modern mathematics and science have 
harnessed that ability, and have extrapolated it so that human beings can now believe in total absurdities 
when led by “reason” and “logic”. 
In the sciences, extrapolations and hypotheses are subject to the discipline of evidence and experiment. 
In mathematics, the only discipline is the requirement for self-consistency, which is not a very restrictive 
discipline. Hence there are concepts in pure mathematics which have an extremely tenuous connection with 
practical mathematics. The idea that mathematics is located in the human mind assists in making choices 
of foundations for mathematics from amongst the infinity of possible self-consistent foundations. Otherwise 
mathematics would be completely arbitrary, as some people do claim. 


2.3.5 REMARK: Unnameable sets and numbers are incompressible. 

Numbers and sets which cannot be referred to individually by any construction, nor by any specific individual 
definition, may be described as “unmentionable” numbers or sets. They could also be described as 
“incompressible” numbers and sets because the ability to reduce an infinite specification to a finite specification is a 
kind of compression. (The word “incompressible” was suggested to the author in a phone call on 2008-3-12 
or earlier [503].) For example, the real number π has an infinite number of digits in its decimal expansion, 
but the expression $\pi = 4 \sum_{n=0}^{\infty} (-1)^n (2n + 1)^{-1}$ is a finite specification for π which uniquely defines it so 
that it can be distinguished from any other specified real number. 
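
The sense in which a finite specification "compresses" an infinite decimal expansion can be sketched in a 
few lines of Python. This is an illustration only: the function name leibniz_partial_sum is an assumption 
made for this example, and the comparison uses the floating-point constant math.pi, which is itself only 
an approximation. The short program text determines as many digits as one has patience to compute, 
although this particular series converges very slowly. 

```python
import math


def leibniz_partial_sum(terms: int) -> float:
    """Return 4 * sum_{n=0}^{terms-1} (-1)^n / (2n + 1)."""
    return 4.0 * sum((-1) ** n / (2 * n + 1) for n in range(terms))


approx = leibniz_partial_sum(100_000)
error = abs(approx - math.pi)

# For an alternating series with decreasing terms, the error is bounded
# by the first omitted term, here 4 / (2 * 100000 + 1).
bound = 4.0 / (2 * 100_000 + 1)
print(approx, error, bound)
```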


2.3.6 REMARK: The price to pay for the power of mathematics. 
Bell [233], pages 521-522 made the following comment about irrational numbers and infinite concepts. 


It depends upon the individual mathematician's level of sophistication whether he regards these 
difficulties as relevant or of no consequence for the consistent development of mathematics. The 
courageous analyst goes boldly ahead, piling one Babel on top of another and trusting that no 
outraged god of reason will confound him and all his works, while the critical logician, peering 
cynically at the foundations of his brother's imposing skyscraper, makes a rapid mental calculation 
predicting the date of collapse. In the meantime all are busy and all seem to be enjoying themselves. 
But one conclusion appears to be inescapable: without a consistent theory of the mathematical 
infinite there is no theory of irrationals: without a theory of irrationals there is no mathematical 
analysis in any form even remotely resembling what we now have; and finally, without analysis 
the major part of mathematics—including geometry and most of applied mathematics—as it now 
exists would cease to exist. 


The most important task confronting mathematicians would therefore seem to be the construction 
of a satisfactory theory of the infinite. 


Mathematics sometimes seems like an exhausting series of mind-stretches. First one has to stretch one's 
ideas about integers to accept enormously large integers. Then one must accept negative numbers. Then 
fractional, algebraic and transcendental numbers. After this, one must accept complex numbers. Then there 
are real and complex vectors of any dimension, infinite-dimensional vectors and seminormed topological 
linear spaces. Then one must accept a dizzying array of transfinite numbers which are “more infinite than 
infinity”. Beyond this are dark numbers and dark sets which exist but can never be written down. If there 
were no benefits to this exhausting series of mind-stretches, only crazy people would study such stuff. It 
is only the amazing success of applications to science and engineering which justifies the towering edifice of 
modern mathematics. Intellectual discomfort is the price to pay for the analytical power of mathematics. 


2.3.7 REMARK: Even finite sets may be very “dim”. 

The difficulties with “dark sets” are not limited to infinite sets. As explained in Remark 12.6.2, the fairly 
straightforward von Neumann universe construction procedure yields a set with $2^{65536}$ elements at the sixth 
stage. This implies that the set of finite sets which have a set-membership tree-depth of 6 or less contains 
about $2.0035 \cdot 10^{19728}$ elements, which very greatly exceeds the number of atoms in the known physical 
Universe. This set of sets does not even contain the ordinal number 6, and yet it is utterly impossible to 
represent it explicitly with all of the computers on Earth. So even finite sets may be “dark” in this sense. 
One may perhaps refer to such sets as “dim sets”, since they are finite but still “invisible”. Hence even the 
mathematics of finite sets is only possible if one is highly selective. 
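
The size of the sixth stage can be checked with a short calculation. The following Python sketch assumes 
the usual indexing in which the zeroth stage is the empty set and each stage is the power set of the previous 
one, so that the number of elements at each stage is 2 raised to the number of elements at the stage before. 
The function name von_neumann_stage_sizes is hypothetical. The decimal size of the last stage is obtained 
from logarithms so that the huge integer never needs to be printed in full. 

```python
import math


def von_neumann_stage_sizes(last_stage: int) -> list[int]:
    """Return the element counts of stages 0, 1, ..., last_stage, where
    stage 0 is empty and each stage is the power set of the previous one."""
    sizes = [0]
    for _ in range(last_stage):
        sizes.append(2 ** sizes[-1])
    return sizes


sizes = von_neumann_stage_sizes(6)

# Stage 6 has 2**65536 elements. Express this as mantissa * 10**exponent.
log10_size = sizes[5] * math.log10(2)
exponent = math.floor(log10_size)
mantissa = 10 ** (log10_size - exponent)

print(sizes[:6])
print(f"stage 6 has about {mantissa:.4f} * 10^{exponent} elements")
```

The count itself is easy to compute. What is impossible is listing the elements which are being counted. 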


2.4. General comments on mathematical logic 


2.4.1 REMARK: Mathematical logic theories don’t need to be complete. 

Completeness of a logical theory is not essential. If the theory cannot generate all of the truth-values for a 
universe of propositions, that is unfortunate, but this is not a major failing. Most scientific theories are not 
complete, but they are still useful if they are correct. 


Logical argumentation is an algebra or calculus for “solving for unknowns”. It sometimes happens in 
elementary algebra that false results arise from correct argumentation. Then axioms must be added in order 
to prevent the false results. For example, one may know that a square has an area of 4 units, from which 
it can be shown that the side is 2 or —2 units. To remove the anomaly, one must specify that the side is 
non-negative. Similarly, in the logical theory of sets, for example, one must replace unrestricted comprehension 
with a restricted specification axiom to prevent 
contradictions such as Russell’s paradox, or if one wishes to ensure or prevent the existence of well-orderings 
of the real numbers, one must add an axiom to respectively either assert or deny the axiom of choice. In 
other words, the axioms must be chosen to describe the universe which one wishes to describe. Axioms do 
not fall from the sky on golden tablets! 


2.4.2 REMARK: Name clash for the word “model”. 

The word “model” in mathematical logic has an established meaning which is the inverse to the meaning 
in the sciences and engineering. (See also Remark 2.4.3 on this issue.) The meaning intended in this book is 
often the science and engineering meaning. In mathematical logic, the word “model” has come to mean an example 
system which satisfies a set of axioms, together with a map from the language of the axioms to the example 
system. Thus the Zermelo-Fraenkel set theory axioms, for example, are satisfied by many different 
set-universes, with many different possible maps from the variables, constants, relations and functions of the 
first-order language to these set-universes. The study of such “interpretations” of ZF set theory is called 
“model theory”. 


In the sciences and engineering, mathematical modelling means the creation of a mathematical system with 
variables and evolution equations so that the behaviour of a real-world observed system may be understood 
and predicted. In this meaning of “model”, a real-world system is a particular example which is well described 
by the mathematical model. By contrast, a mathematical logic “model” is a system which is well described 
by the axioms and deduction methods of the logical language. The confusion in the use of the word “model” 
was also discussed in 1957 by Suppes [394], pages 253-254. 


During the last two decades the phrases ‘model’ and ‘mathematical model’ have been widely used, 
particularly in the behavioral sciences. These phrases seem to be used in at least three distinct 
senses. [....] The third meaning of ‘model’, the one most popular with empirical scientists, is what 
we have meant by ‘theory’ in preceding pages. In this sense, to give a mathematical model for 
some branch of empirical science is to state an exact mathematical theory. 


Since the word “model” has been usurped to mean a system which satisfies a theory, the inverse notion 
may be called a “theory for a model”, or a “formulation of a theory”. (For “the theory of a model”, see 
Chang/Keisler [347], page 37; Bell/Slomson [339], page 140; Shoenfield [390], page 82. For “a formulation of 
a theory”, see Stoll [393], pages 242-243.) 


2.4.3 REMARK: The logic literature uses the word “model” differently. 
The word “model” is widely used in the logic literature in the reverse sense to the meaning in this book. 
(See also Remark 2.4.2 on this matter.) 


Shoenfield [390], page 18, first defines a “structure” for an abstract (first-order) language to be a set of 
concrete predicates, functions and variables, together with maps between the abstract and concrete spaces, 
as outlined in Remark 5.1.4. If such a structure maps true abstract propositions only to true concrete 
propositions, it is then called a “model”. (Shoenfield [390], page 22.) 


E. Mendelson [370], page 49, defines an “interpretation” with the same meaning as a “structure”, just 
mentioned. Then a “model” is defined as an interpretation with valid maps in the same way as just mentioned. 
(See E. Mendelson [370], page 51.) Shoenfield [390], pages 61, 62, 260, seems to define an “interpretation” as 
more or less the same as a “model”. 


Here the abstract language is regarded as the model for the concrete logical system which is being discussed. 
It is perhaps understandable that logicians would regard the abstract logic as the “real thing” and the 
concrete system as a mere “structure”, “interpretation” or “model”. But in the scientific context, generally 
the abstract description is regarded as a model for the concrete system being studied. It would be more accurate 
to use the word “application” for a valid “structure” or “interpretation” of a language rather than a “model”. 


All in all, the model theory use of the word “model” does make very good sense if one thinks of Gödel’s 
famous construction to represent arithmetic, and various constructions which represent ZF set theory. Since 
these are constructed in a more or less mechanical fashion, they do in some sense resemble the way architects 
or engineers might build models to represent the ideas which they have in their minds. Similarly a model ship 
is a construction which represents the real ship. Thus if one replaces the word “model” with “construction”, 
the meaning becomes clearer. 


However, the “construction” terminology is not quite so helpful when referring to all groups as “models” of 
the axioms of a group because of the extreme non-uniqueness. The concept of a model as a construction like a 
model ship makes most sense when the ship is a unique object. The very unfortunate discovery regarding ZF 
set theory is that it does admit so many models with such a vast range of mutually contradictory properties. 
This shows that the ZF axioms are nothing like a complete specification of set theory. Thus ZF set theory is 
not a definite fixed unique system, but rather a broad family of systems. And supposedly all of the concepts 
of mathematics can be represented (or “modelled”) within ZF set theory. Therefore many mathematical 
concepts have variable meaning according to how the underlying ZF set theory is modelled. This gives 
mathematics a partially subjective character, which contradicts the popular view that mathematics is a 
totally objective subject. 
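
The non-uniqueness of models can be illustrated on a very small scale with the group axioms. The following 
Python sketch is an illustration under stated assumptions, not a construction from the logic literature: the 
function names is_group and is_commutative are hypothetical, and only finite structures are checked, by 
brute force. It exhibits two structures which both satisfy the same axioms, but which disagree about the 
truth of a further proposition, namely commutativity. 

```python
from itertools import permutations, product


def is_group(elements, op) -> bool:
    """Check closure, associativity, identity and inverses by brute force."""
    elements = list(elements)
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    return all(any(op(a, b) == e == op(b, a) for b in elements)
               for a in elements)


def is_commutative(elements, op) -> bool:
    """Check whether op(a, b) == op(b, a) for all pairs of elements."""
    elements = list(elements)
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))


# Structure 1: the integers modulo 6 under addition.
z6 = list(range(6))
def add_mod6(a, b):
    return (a + b) % 6

# Structure 2: the permutations of {0, 1, 2} under composition.
s3 = list(permutations(range(3)))
def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

print(is_group(z6, add_mod6), is_commutative(z6, add_mod6))
print(is_group(s3, compose), is_commutative(s3, compose))
```

Both structures have six elements and both are models of the group axioms in the model theory sense, but 
the axioms alone do not decide whether multiplication is commutative. The situation for ZF set theory is 
analogous, although its models cannot be inspected by brute force in this way. 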


2.4.4 REMARK: Mathematical logic is an art of modelling, whether the subject matter is concrete or not. 
There is a parallel between the history of mathematics and the history of art. In the olden days, paintings 
were all representational. In other words, paintings were supposed to be modelled on visual perceptions of 
the physical world. But although the original objective of painting was to represent the world as accurately 
as possible, or at least convince the viewer that the representation was accurate, the advent of cameras made 
this representational role almost redundant. Photography produced much more accurate representations 
for a much lower price than any painter could compete with. So the visual arts moved further and further 
away from realism towards abstract non-representational art. In non-representational art, the methods and 
techniques were retained, but the original objective to accurately model the visual world was discarded. With 
modern computer generated imagery, the capability to portray things which are not physically perceived has 
increased enormously. 


Mathematics was originally supposed to represent the real world, for example in arithmetic (for counting and 
measuring), geometry (for land management and three-dimensional design) and astronomy (for navigation 
and curiosity). The demands of science required rapid development of the capabilities of mathematics to 
represent ever more bizarre conceptions of the physical world. But as in the case of the visual arts, the 
methods and techniques of mathematics took on a life of their own. Increasingly during the 19th and 20th 
centuries, mathematics took a turn towards the abstract. It was no longer felt necessary for pure mathematics 
to represent anything at all. Sometimes fortuitously such non-representational mathematics turned out to 
be useful for modelling something in the real world, and this justified further research into theories which 
modelled nothing at all. 


The fact that a large proportion of logic and mathematics theories no longer represent anything in the 
perceived world does not change the fact that the structures of logic and mathematics are indeed theories. 
Just as paintings are not necessarily paintings of physical things, but are still paintings, so also the 
structures of mathematics are not necessarily theories of physical things, but they are still theories. It follows 
from this that mathematics is an art of modelling, not a science of eternal truth. Therefore it is pointless to 
ask whether mathematical logic is correct or not. It simply does not matter. Mathematical logic provides 
tools, techniques and methods for building and studying mathematical models. 


2.4.5 REMARK: Theorems don’t need to be proved. Nor do they need to be read. 

In everyday life, the truth of a proposition is often decided by majority vote or by a kind of adversarial 
system. Thus one person may claim that a proposition is true, and if no one can find a counterexample 
after many years of concerted and strenuous efforts, it may be assumed that the proposition is thereby 
established. After all, if counterexamples are so hard to find, maybe they just don’t matter. And besides, 
delay in establishing validity has its costs also. The safety of new products could be determined in this way 
for example, especially if the new products have some benefits. Thus a certain amount of risk is generally
accepted in the real world. 


By contrast, it is rare that propositions are accepted in mathematics without absolute proof, although the 
adjective “absolute” can sometimes be somewhat elastic. One way to avoid the necessity of “absolutely 
proving” a proposition is to make it an axiom. Then it is beyond question! Propositions are also sometimes 
established as “folk theorems”, but these are mostly reluctantly accepted, and generally the users of folk 
theorems are conscious that such “theorems” still await proof or possible disproof. 


Theorems are also sometimes proved by sketches, typically including comments like “it is easy to see how to 
do X” or “by a well-known procedure, we may assert X”. Then anyone who does not see how to do it is 
exposing themselves as a rank amateur. Sometimes, such proofs can be made strictly correct only by finding 
some additional technical conditions which have been omitted because every professional mathematician 
would know how to fill in such gaps. Thus there is a certain amount of flexibility and elasticity in even the 
mathematical standards for proof of theorems. 


In textbooks, it is not unusual to leave most of the theorems unproved so that the student will have something 
to do. This is quite usual in articles in academic journals also, partly to save space because of the high cost 
of printing in the olden days, but also to avoid insulting the intelligence of expert readers. 


So one might fairly ask why the proofs of theorems should be written out at all. Every proposition could 
be regarded as an exercise for the reader to prove. Alternatively, each proposition could be thought of as 
a challenge to the reader to find counterexamples. If the reader does find a counterexample, that’s a boost 
of reputation for the reader and a blow to the writer. If a proposition survives many decades without a 
successful contradiction, that boosts the reputation of the writer. Most of the time, a false theorem can be 
fixed by adding some technical assumption anyway. So there’s not much harm done. Thus mathematics 
could, in principle, be written entirely in terms of assertions without proofs. The fear of losing reputation 
would discourage mathematicians from publishing false assertions. 


Even when proofs are provided, it is mostly unnecessary to read them. Proofs are mostly tedious or difficult 
to read — almost as tedious or difficult as they are to write! Therefore proofs mostly do not need to be 
read. A proof is analogous to the detailed implementation of computer software, whereas the statement of a 
theorem is analogous to the user interface. One does not need to read a proof unless one suspects a “bug” or 
one wishes to learn the tricks which are used to make the proof work. (Analogously, when people download 
application software to their mobile computer gadgets, they rarely insist on reading the source before using 
the “app”, but authors should still provide the source code in case problems arise, to give confidence that 
the software does what is claimed.) 


2.4.6 REMARK: Lemmas. 

A “lemma” in ancient Greek mathematics appears to have had more than one interpretation, and those
interpretations were not quite the same as the modern meanings. A lemma could have meant a kind of 
folk theorem which was used in proofs, but which no one had yet rigorously proved because it was deemed 
unnecessary to prove an assertion which is obviously true. Then at some later time, the lemma might have 
been given a proof. Heath [244], page 373, wrote the following on this subject. 


The word lemma (λῆμμα) simply means something assumed. Archimedes uses it of what is now
known as the Axiom of Archimedes, the principle assumed by Eudoxus and others in the method 
of exhaustion; but it is more commonly used of a subsidiary proposition requiring proof, which, 
however, it is convenient to assume in the place where it is wanted in order that the argument 
may not be interrupted or unduly lengthened. Such a lemma might be proved in advance, but the 
proof was often postponed till the end, the assumption being marked as something to be afterwards 
proved by some such words as ὡς δειχθήσεται, ‘as will be proved in due course’.


Elsewhere, Heath quotes the following later view of Proclus (412-485AD). (See Euclid/Heath [213], page 133.) 


“The term lemma,” says Proclus, “is often used of any proposition which is assumed for the 
construction of something else: thus it is a common remark that a proof has been made out of
such and such lemmas. But the special meaning of lemma in geometry is a proposition requiring 
confirmation. For when, in either construction or demonstration, we assume anything which has not 
been proved but requires argument, then, because we regard what has been assumed as doubtful 
in itself and therefore worthy of investigation, we call it a lemma, differing as it does from the 
postulate and the axiom in being matter of demonstration, whereas they are immediately taken for
granted, without demonstration, for the purpose of confirming other things.” 


In a footnote to this quotation from Proclus, Heath makes the following comment on the relative modernity 
of the Proclus concept of a lemma as a theorem whose proof is delayed to a later time to avoid disrupting a 
proof. (He mentions the first century AD Greek mathematician Geminus.) 


This view of a lemma must be considered as relatively modern. It seems to have had its origin 
in an imperfection of method. In the course of a demonstration it was necessary to assume a 
proposition which required proof, but the proof of which would, if inserted in the particular place, 
break the thread of the demonstration: hence it was necessary either to prove it beforehand as a 
preliminary proposition or to postpone it to be proved afterwards (ὡς ἑξῆς δειχθήσεται). When,
after the time of Geminus, the progress of original discovery in geometry was arrested, geometers 
occupied themselves with the study and elucidation of the works of the great mathematicians who 
had preceded them. This involved the investigation of propositions explicitly quoted or tacitly 
assumed in the great classical treatises; and naturally it was found that several such remained to be 
demonstrated, either because the authors had omitted them as being easy enough to be left to the 
reader himself to prove, or because books in which they were proved had been lost in the meantime. 
Hence arose a class of complementary or auxiliary propositions which were called lemmas. 


According to this view, lemmas are gaps which have been found in proofs. Nowadays a lemma often means an 
uninteresting, difficult, technical theorem which is only of use to prove a single interesting, useful theorem. 
Zorn's lemma, in particular, is a kind of gap-filler, replacing a common, reasonable assumption with a 
rigorous proof from some other basis. 


If lemmas are theorems whose assertions are obvious, but whose proofs are tedious, then certainly this book 
has lots of lemmas. Lemmas are needed in great numbers if one tries to fill all the gaps. History has shown 
that interesting non-trivial mathematics sometimes arises from the effort to “prove the obvious”. Therefore
the “plethora of lemmas” in this book can perhaps be excused. 


2.4.7 REMARK: The advantages of predicate logic notation. 

The very precise notation of predicate logic is still being adopted by mathematicians in general at an 
extremely slow rate. The current style of predicate logic notation, which is adopted fairly generally in the 
mathematical logic literature, has changed only in relatively minor ways since it was introduced in 1889 by 
Peano [375]. Predicate logic notation is not only beneficial for its lack of ambiguity. It is also concise, and 
it is actually much easier to read when one has become familiar with it. The “alphabet” and “vocabulary” 
of predicate logic are very tiny, particularly in comparison with the ad-hoc notations which are introduced 
in almost all mathematics textbooks. Every textbook seems to have its own peculiar dialect which must 
be learned before the book can be interpreted into one’s own “mother tongue”. So the acquisition of the 
compact, economical notation of predicate logic, whose form and meaning are almost universally accepted, 
seems like a very attractive investment of a small effort for a huge profit. Hence the extremely slow adoption 
of predicate logic notation is perplexing. Those authors who do adopt it sometimes feel the need to justify 
it, as for example Cullen [64], page 3, in 1972. 


We will find it convenient to use a small amount of standard mathematical shorthand as we proceed. 
Learning this notation is part of the education of every mathematician. In addition the gain in 
efficiency will be well worth any initial inconvenience. 


Some authors have felt the need to justify introducing any logic or set theory at all, as for example at the 
beginning of an introductory chapter on logic and set theory by Graves [85], page 1, in 1946. 


It should be emphasized that mathematics is concerned with ideas and concepts rather than with 
symbols. Symbols are tools for the transference of ideas from one mind to another. Concepts become 
meaningful through observation of the laws according to which they are used. This introductory 
chapter is concerned with certain fundamental notations of logic and of the calculus of classes. It 
will be understood better after the student has become familiar with the use of these concepts in 
the later chapters. 


Such authors felt the need to explain why they were adopting logic notations such as the universal and 
existential quantifiers “$\forall$” and “$\exists$” in their textbooks. Modern authors do not give such explanations
because they mostly do not even use predicate quantifiers. This book, by contrast, adopts these quantifiers 
as the two most important symbols in the mathematical alphabet. 
It is fairly widely recognised that progress in mathematics has often been held back by lack of an effective
notation, and that some notable historical surges in the progress of mathematics have been associated with 
advances in the quality of notation. Thus the general adoption of predicate logic notation can be expected 
to have substantial benefits. Lucid thinking is important for the successful understanding and development 
of mathematics, and good notation is important for lucid thinking. If one cannot express a mathematical 
idea in predicate logic, it has probably not been clearly understood. The exercise of converting ideas into 
predicate logic is an effective procedure to achieve both clarification and communication. 
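
As a hedged illustration of the conversion exercise just described, the following sketch writes the familiar definition of continuity of a function at a point in predicate logic notation. The symbols $f$, $a$, $\varepsilon$ and $\delta$ are assumptions chosen for this example. They are not taken from the surrounding text.

```latex
% Natural language: "f is continuous at a if f(x) can be made as close as
% desired to f(a) by taking x close enough to a."
% Predicate logic, assuming f : \mathbb{R} \to \mathbb{R} and a \in \mathbb{R}:
\[
  \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in \mathbb{R},\quad
  |x - a| < \delta \;\Rightarrow\; |f(x) - f(a)| < \varepsilon .
\]
```

The natural-language version leaves open which quantity is chosen first. The quantified version fixes the order of dependence explicitly: $\delta$ may depend on $\varepsilon$, but not on $x$.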


2.4.8 REMARK: The parallel roles of intuition and logic in mathematics. Thinking “in stereo”. 
Mathematics can be presented and understood in two ways: by intuition and by logic. If one understands 
only by intuition, serious errors may occur. If one understands only by logic, it may not be clear how to 
navigate from given assumptions to desired conclusions. (That is, proofs may be difficult to discover.) 


Intuition is generally built up from examples, but even a million examples do not constitute a proof. The 
counterexamples to some “intuitively obvious facts” of mathematics are generally not obvious until they 
are pointed out. So it is essential to verify all intuition by producing rigorous proofs. Mathematics has 
many infinities, and infinities of infinities, and so on ad infinitum. Existential and universal quantifiers are 
mixed in curious ways, and one may sometimes accidentally swap them when it is not justified to do so. 
Therefore one must thoroughly master complex logical assertions which contain numerous logical quantifiers 
and convoluted set relations and dependencies. 
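
A minimal worked example of an unjustified quantifier swap may help here. It uses the strict order on the integers, which is an example chosen for this sketch and is not taken from the text.

```latex
% True: every integer has a strictly larger integer (take y = x + 1).
\[ \forall x \in \mathbb{Z},\ \exists y \in \mathbb{Z},\quad x < y . \]
% False: no single integer is strictly larger than every integer
% (for any candidate y, the choice x = y makes x < y false).
\[ \exists y \in \mathbb{Z},\ \forall x \in \mathbb{Z},\quad x < y . \]
% Only one direction of the swap is valid for a general predicate P:
\[
  \bigl(\exists y,\ \forall x,\ P(x,y)\bigr)
  \;\Rightarrow\;
  \bigl(\forall x,\ \exists y,\ P(x,y)\bigr) .
\]
```

The two statements differ only in the order of the quantifiers, yet one is true and the other is false.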


If one thoroughly masters all aspects of the manipulation of symbolic logic with complex arrangements of 
existential and universal quantifiers, one can virtually guarantee that no false conclusions will arise from 
true assumptions. However, proof by logical deduction requires the skill to navigate vast chess-like decision 
trees. A 3-step proof might be easy using pure logic alone, but a 20-step proof will typically require some 
strategic thinking to ensure that the deduction path leads to the desired goal. (This is why computerised 
mathematics systems are good for verifying proofs, but not so good for finding them.) 


Sometimes the logical proof of a result comes first, and the intuition comes with great difficulty. Other times, 
the intuition comes first, and the logical proof comes with great difficulty. But in the end, one needs both. 
Therefore mathematics must be presented, studied and researched “in stereo”. In other words, there must 
be two communication channels: one channel for logic, the other channel for intuition. 

The reader may think there is too much symbolic logic in this book. The reader may also think there 
is too much informal discussion. However, the ideal presentation should have the maximum of rigorous 
logic, and abundant natural-language commentary. Both channels should be turned up loud and clear. (See 
Remark 1.5.9 for related comments about the formal and informal “stereo channels” of this book.) 
3.0.1 REMARK: Why commence with logic?

One could commence a systematic account of mathematics with either logic, numbers or sets. It is not 
possible to systematically present any one of these three subjects without first presenting the other two. 
Logic seems to be the best starting point for several reasons. 


(1) In mathematics, logical operations and expressions are required in almost every sentence and in all 
definitions and theorems, explicitly or implicitly, formally or informally, expressed in symbolic logic or 
in natural language. 


(2) Symbolic logic provides a compact, unambiguous language for the communication of mathematical 
assertions and definitions, helping to clear the fog of ambiguity and subjectivity which envelops natural 
language explanations. (This book uses symbolic logic more than most books about differential geometry.
Definitions and notations vary from decade to decade, from topic to topic, and from author to author, 
but symbolic logic makes possible precise, objective comparisons of concepts and arguments.) 


(3) Although the presentation of mathematics requires some knowledge of the integers, some understanding 
of set membership concepts, and the ability to order and manipulate symbols in sequences of lines on 
a two-dimensional surface of some kind, most people studying mathematics have adequate capabilities 
in these areas. However, the required sophistication in logic operations and expressions is beyond the 
naive level, especially when infinite or infinitesimal concepts of any kind are introduced. 


(4) The history of the development of mathematics in the last 350 years has shown that more than naive 
sophistication in logic is required to achieve clarity and avoid confusion in modern mathematical analysis. 
It is true that one may commence with numbers or sets, and that this is how mathematics is often 
introduced at an elementary level, but lack of sophistication in logic is very often a major obstacle to 
progress beyond the elementary level. Therefore the best strategy for more advanced mathematics is to
commence by strengthening one’s capabilities in logic. 


(5) Logic is wired into all brains at a very low level, human and otherwise. This is demonstrated in particular 
by the phenomenon of “conditioning”, whereby all animals (which have a nervous system) may associate 
particular stimuli with particular reactions. This is in essence a logical implication. Associations may 
be learned by all animals (which have a nervous system). So logic is arguably the most fundamental 
mathematical skill. (At a low level, brains look like vast pattern-matching machines, often producing 
true/false outputs from analogue inputs. See for example Churchland/Sejnowski [447].) 


(6) Logic is wired into human brains at a very high level. Ever since the acquisition of language (probably 
about 250,000 years ago), humans have been able to give names to individual objects and categories 
of objects, to individual locations, regions and paths, and to various relations between objects and 
locations (e.g. the lion is behind the tree, or Mkoko is the child of Mbobo). Through language, humans 
can make assertions and deny assertions. The linguistic style of logic is arguably the capability which 
gives humans the greatest advantage relative to other animals. 


Commencing with logic does not imply that all mathematics is based upon a bedrock of logic. Logic is not 
so much a part of the subject matter of mathematics, but is rather an essential and predominant part of the 
process of thinking in general. Arguably logic is essential for all disciplines and all aspects of human life. 
The special requirement for in-depth logic in mathematics arises from the extreme demands placed upon 
logical thinking when faced with the infinities and other such semi-metaphysical and metaphysical concepts 
which have been at the core of mathematics for the last 350 years. Mathematical logic provides a framework 
of language and concepts within which it is easier to explain and understand mathematics efficiently, with 
maximum precision and minimum ambiguity. Therefore strengthening one’s capabilities first in logic is a 
strategic choice to build up one’s intellectual reserves in preparation for the ensuing struggle. 


3.0.2 REMARK: The possibly sinister origins of language and logic. 

One might naively suppose that language developed in human beings for the benefit of the species. Language 
is the means by which propositions can be converted from non-verbal to verbal form, communicated via 
speech, and interpreted by an interlocutor from verbal form to non-verbal understanding. (In other words, 
coding, transmission and decoding, in both directions.) It is difficult to see anything negative in this. 
However, evolution is generally driven by naked self-interest, not the “greatest good of the greatest number”. 
The principal thesis of an article by Aiello/Dunbar [446], page 190, is that the development of human language 
was necessitated by evolutionary pressures to increase social group size. 


In groups of the size typical of non-human primates (and of “vocal-grooming” hominids), social 
knowledge is acquired by direct, first-hand interaction between individuals. This would not be 
possible in the large groups characteristic of modern humans, where cohesion can only be maintained 
if individuals are able to exchange information about behaviour and relationships of other group 
members. By the later part of the Middle Pleistocene (about 250,000 years ago), groups would 
have become so large that language with a significant social information content would have become 
essential. 


Regarding three possible explanations for the evolutionary pressure to increase human social group size, 
Aiello/Dunbar [446], page 191, make the following comment. 


The second possibility is that human groups are in fact larger than necessary to provide protection 
against predators because they are designed to provide protection against other human groups [...].
Competition for access to resources might be expected to lead to an evolutionary arms race because 
the larger groups would always win. Some evidence to support this suggestion may lie in the fact 
that competitive aggression of this kind resulting in intraspecific murder has been noted in only 
one other taxon besides humans, namely, chimpanzees. 


The need for human social groups to maintain cohesion in competition with other social groups would have 
been particularly great in times of ecological stress such as “megadrought” in eastern Africa during the Late 
Pleistocene. (See for example Cohen et alii [489].) Proficiency in a particular language would be a strong 
marker of “us” versus “them”, with the consequence that major wars between groups for access to resources 
would be much more likely than if individuals were not aggregated into groups by their language. (Any 
individual who could not speak one’s own language would be one of “them”, evoking hostility rather than 
cooperation.) Thus it seems quite likely that basic propositional logic, which is predicated on the ability to 
express ideas in verbal form, arose from the evolutionary benefits of forming and demarcating social groups
for success in genocidal wars during the last 250,000 years. (See also related discussion of the evolution of 
language by Foley [448], pages 43-78; Wade [452], pages 204-205.)


The much later date of 50,000 years ago, or a little earlier, is given for the emergence of “fully modern 
language” by Wade [452], pages 7-8, 32, 46, 50, 226. Much earlier dates for very primitive language, perhaps
as much as 250,000 years ago, are given by Leakey [450], pages 119-138. The date 250,000 years for very 
early language is also supported by Mithen [451], pages 140-142, 185-187. 


3.0.3 REMARK: The inadequacy and irrelevance of mathematical logic. 

Logic is the art of thinking. There are many styles of logic and many styles of thinking. Since analysis 
(including differential and integral calculus) was introduced in a significant way into mathematics in the 
17th century to support developments in mathematical physics, the demands on mathematical thinking have 
steadily increased, which has necessitated extensive developments in logic. Even in ancient Greek times, logic 
was insufficient to support their discoveries in mathematics. During the 20th century, mathematical logic 
was found to be inadequate, and the current state of mathematical logic is still far from satisfactory, bogged 
down in convoluted, mutually incompatible frameworks which have dubious relevance to the real-world 
mathematics issues which they were originally intended to resolve. The fact that practical mathematics has 
continued in blissful disregard of the paradoxes and the plethora of formalisms in the last 120 years gives 
some indication of the irrelevance of most mathematical logic research to most practical mathematics. 


Modern mathematics is presented predominantly within the framework of Zermelo-Fraenkel set theory, which 
is most precisely presented in terms of formal theory, including formal language, formal axioms and formal 
deduction. The purpose of such formality is to attempt to manage the issues which arise principally from 
the unrestrained application of infinity concepts. Practical mathematicians mostly show sufficient restraint 
in regard to infinity concepts, but sometimes they venture into dangerous territory. Then clear thinking is 
essential. It is not at all clear where the boundary between safe logic and dangerous logic lies. In some 
topics, mathematicians feel obliged to push far into dangerous territory, and one could argue that in many 
areas they have strayed well into the land of the pixies, where the concept of “existence” no longer has any 
connection with reality as we know it. So some logic is “a good idea”. 


3.0.4 REMARK: Why logic deserves to be studied and understood. 

Since the modern economy and standard of living are built with modern technology, and modern technology 
is nourished by modern science, and modern science is based upon mathematical models, and mathematics 
makes extensive use of mathematical logic, it would seem rational to invest some effort to acquire a clear 
and deep understanding of mathematical logic. If investment in scientific research is justified because science 
is the goose which lays the golden eggs, then surely logic, which is ubiquitous in science and technology, 
deserves at least some investment of time and effort. Many of the colossal blunders in science, technology 
and economics can be attributed to bad logic. So the ability to think logically is surely at least a valuable 
tool in the intellectual toolbox. The specific facts and results of mathematical logic are mostly not directly 
applicable in science, technology and economics, but the clarity and depth of thinking are directly applicable. 
Similarly, advanced mathematics in general is widely applicable in the sense that the mental skills and the 
clarity and depth of thinking are applicable. (See also Remark 2.1.6.) 


3.0.5 REMARK: Apologia for the assumption of a-priori knowledge of set theory and logic. 

It is assumed in Chapters 3, 4, 5 and 6 that the reader is familiar with the standard notations of elementary 
set theory and elementary symbolic logic. Since it is not possible to present mathematical logic without 
assuming some prior knowledge of the elements of set theory and logic, the circularity of definitions cannot 
be avoided. Therefore there is no valid reason to avoid assuming prior knowledge of some useful notations 
and basic definitions of set theory and logic, which enormously facilitate the efficient presentation of logic. 
Readers who do not have the required prior knowledge of set theory may commence reading further on in 
this book (where the required notations and definitions are presented), and return to this chapter later. 


If the cyclic dependencies between logic and set theory seem troubling, one could perhaps consider that “left” 
and “right” cannot be defined in a non-cyclic way. Left is the opposite of right, and right is the opposite 
of left. It is not clear that one concept may be defined independently prior to the other. And yet most 
people seem comfortable with the mutual dependency of these concepts. Similarly for “up” and “down”, 
and “inside” and “outside”. Few people complain that they cannot understand “inside” until “outside” 
has been defined, and they cannot understand “outside” until “inside” has been defined, and therefore they
cannot understand either! 


The purpose of Chapter 3 is more to throw light upon the “true nature” of mathematical logic than to try to
rigorously boot-strap mathematical logic into existence from minimalist assumptions. The viewpoint adopted 
here is that logical thinking is a physical phenomenon which can be modelled mathematically. Logic becomes 
thereby a branch of applied mathematics. This viewpoint clearly suffers from circularity, since mathematics 
models the logic which is the basis of mathematics. This is inevitable. Reality is like that. 


To a great extent, the concept of a “set” or “class” is a purely grammatical construction. Roughly speaking, 
properties and classes are in one-to-one correspondence with each other. Ignoring for now the paradoxes 
which may arise, the “unrestricted comprehension” rule says that for every predicate $\phi$, there is a set
$X_\phi = \{x;\ \phi(x)\}$ which contains precisely those objects which satisfy $\phi$, and for every set $X$, there is a
predicate $\phi$ such that $\phi(x)$ is true if and only if $x \in X$. It is possible to do mathematics without sets or
classes. This was always done in the distant past, and is still possible in modern times. (See for example 
Misner/Thorne/Wheeler [292], which apparently uses no sets.) Since a predicate may be regarded as an 
attribute or adjective, and a set or class is clearly a noun, the “unrestricted comprehension” rule simply
associates adjectives (or predicates) with nouns. Thus the adjective “mammalian”, or the predicate “is a 
mammal”, may be converted to “the set of mammals”, which is a nounal phrase. In this sense, one may say 
that prior knowledge of set theory is really only prior knowledge of logic, since sets are linguistic fictions. 
Hence only a-priori knowledge of logic is assumed in this presentation of mathematical logic, although the 
nounal language of sets and classes is often used to improve the efficiency of communication and thinking. 
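
The paradoxes which were set aside in the preceding paragraph can be indicated by a short sketch. Assuming the unrestricted comprehension rule as stated there, and choosing the predicate “$x \notin x$” (Russell's well-known choice, introduced here only for illustration), one obtains a contradiction. The name $R$ for the resulting set is an assumption of this sketch.

```latex
% Unrestricted comprehension: for every predicate \phi there is a set X_\phi with
\[ \forall x,\quad x \in X_\phi \;\Leftrightarrow\; \phi(x) . \]
% Choose \phi(x) to be x \notin x, and write R for the resulting set:
\[ \forall x,\quad x \in R \;\Leftrightarrow\; x \notin x . \]
% Substituting x = R yields a contradiction:
\[ R \in R \;\Leftrightarrow\; R \notin R . \]
```

This is one reason why the correspondence between predicates and sets is asserted only “roughly speaking”.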


(The Greek word “apología” [ἀπολογία] is used here in its original meaning as a “speech in defence”, not as
a regretful acceptance of blame or fault as the modern English word “apology” would imply. Somehow the 
word has reversed its meaning over time! See Liddell/Scott [478], page 102; Morwood/Taylor [480], page 44.)


3.0.6 REMARK: Why propositional logic deserves more pages than predicate logic. 

It could be argued that propositional logic is a “toy logic”, as in fact some authors do explicitly state. (See for
example Chang/Keisler [347], page 4.) Some authors of logic textbooks do not present a pure propositional
calculus, preferring instead to present only a predicate calculus with propositional calculus incorporated.


As suggested in Remarks 4.1.1 and 5.0.3, the main purpose of predicate calculus is to extend mathematics 
from intuitively clear concrete concepts to metaphysically unclear abstract concepts, in particular from the 
finite to the uncountably infinite. Since most interesting mathematics involves infinite classes of objects, it 
might seem that propositional logic has limited interest. However, as mentioned in Remark 3.3.5, there is 
a “naming bottleneck” in mathematics (which is caused by the finite bandwidth of human communications 
and thinking). So in reality, all mathematics is finite in the “language layer”, even when it is intended to
be infinite in the “semantics layer”. Therefore the apparently limited scope of propositional logic is not as
limited as it might seem. Another reason for examining the “toy” propositional logic very closely is to ensure 
that the meanings of concepts are clearly understood in a simple system before progressing to the predicate 
logic where concepts are often difficult to grasp. 


3.0.7 REMARK: How to skip this chapter. 

There is probably no harm at all in skipping Chapters 3 to 16. These topics, namely logic, set theory, order 
and numbers, are abundantly presented in standard textbooks. These chapters are provided primarily for 
“filling the gaps” in later chapters by back-referencing. The best way to skip this chapter would be to proceed
immediately to Chapter 17. On the other hand, the logic chapters 3, 4, 5 and 6 provide a solid foundation 
for a precise form of mathematical language which is intended to remedy the logical imprecision which is too 
often encountered in the literature. All later chapters make extensive use of this precise language. 


Chapter 3 presents the semantics of propositional logic. Chapter 4 presents the formalisation of the language 
of propositional logic, and some useful theorems. Chapters 5 and 6 extend logical argumentation tools for 
the analysis of relations between finite sets of individual propositions to infinite classes of propositions. The 
formal logic axioms and rules, and the logic theorems which are directly applied in the rest of the book, are 
in the following definitions and sections. 

    Definition 4.4.3          propositional calculus axiomatic system 
    Sections 4.5, 4.6, 4.7    propositional calculus theorems 
    Definition 6.3.9          predicate calculus axiomatic system 
    Section 6.6               predicate calculus theorems 
    Section 6.7               predicate calculus with equality 


3.1. Assertions, denials, deception, falsity and truth 


3.1.1 REMARK: The meaning of truth and falsity. 

Truth and falsity are apparently the lowest-level concepts in logic, but they are attributes of propositions. 
So propositions are also found at the lowest level. Propositions may be defined as those things which can be 
true or false, while truth and falsity may be defined as those attributes which apply to propositions. 


It seems plausible that, when human language first developed about 250,000 years ago, assertions developed 
very early, such as: “There’s a lion behind the tree.” It is possible that imperatives developed even earlier, 
such as “Come here.”, or “Go away.”, or “Give me some of that food.” But there is a fine line between 
assertions and imperatives. Every request or command contains some informational content, and assertions 
mostly carry implicit or explicit requests or commands. For example, “There’s a lion!” means much the 
same as “Run away from that lion.”, and “Give me food.” carries the information “I am hungry.” 


The earliest surviving written stories, such as the epic of Gilgamesh [491,492] in Mesopotamia, include 
numerous descriptions of deception using language. In fact, the first writing is generally believed to have 
originated in Mesopotamia about 3500BC as a means of enforcing contracts. Verbal contracts were often 
denied by one or both parties. So it made sense to record contracts in some way. Thus assertions, denials 
and deceptions have a long history. It seems plausible that these were part of human life for at least fifty 
thousand years. (For an ancient Egyptian text highlighting deception and the contrast between truth and 
falsity, see for example “The tale of the eloquent peasant” [499], pages 54-88.) 


It seems likely, therefore, that a sharp distinction between truth and falsity of natural-language assertions 
must have been part of daily human life since shortly after the advent of spoken language. So, like the 
numbers 1, 2 and 3, the logical notions of truth and falsity, and assertions and denials, may be assumed as 
naive knowledge. Hence it is unnecessary to define such concepts. In fact, it is probably impossible. The 
basic concepts of logic are deeply embedded in human language. 


3.1.2 REMARK: Truth and falsity are interchangeable in abstract logic. Truth is extra-logical. 

It is well known that mathematical logic is self-dual in the sense that if the truth values of all propositions 
are inverted, so that true propositions become false and false propositions become true, then the purely 
logical axioms and theorems remain as valid as they were before the inversion. (For example, the theorem 
A ⇒ (B ⇒ A) remains true because when each proposition is replaced by its negative, the result is another 
theorem, in this case (¬A) ⇒ ((¬B) ⇒ (¬A)).) 
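
The example in the preceding paragraph can be checked mechanically by enumerating truth values. The following Python sketch is an illustration only. The helper names `implies`, `is_tautology`, `theorem` and `negated_theorem` are hypothetical names chosen for this sketch, not notation used elsewhere in this book. It confirms that both the theorem and the formula obtained by replacing each proposition by its negative are true for every assignment of truth values to A and B.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

def is_tautology(formula, num_vars: int) -> bool:
    """Return True if formula is true under every truth value assignment."""
    return all(formula(*values)
               for values in product([False, True], repeat=num_vars))

def theorem(a: bool, b: bool) -> bool:
    """The formula A => (B => A)."""
    return implies(a, implies(b, a))

def negated_theorem(a: bool, b: bool) -> bool:
    """The same formula with each proposition replaced by its negative."""
    return implies(not a, implies(not b, not a))

print(is_tautology(theorem, 2))
print(is_tautology(negated_theorem, 2))
```

Both calls print `True`. A formula which is not a theorem, such as A ⇒ B, is rejected by the same check.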


This raises the question of whether truth and falsity have a real difference, or whether the designations 
“true” and “false” are completely arbitrary. In terms of abstract mathematical logic, these designations 
are in fact arbitrary. In the case of binary electronic circuits, for example, one may arbitrarily designate 
either high or low voltage to mean “true” or “false”. Mathematical logic is self-dual in the same way that 
binary circuits are self-dual. Thus within abstract mathematical logic, there is no real distinction between 
“true” and “false”. These concepts only have a real difference in the concrete proposition domains to which 
abstract logic is applied. (See Section 3.2 for concrete proposition domains.) For example, when abstract 
logic is applied to ZF set theory, the particular proposition “∅ ∈ ∅” is designated as “false”. The theorems of 
pure mathematical logic would still be true if this proposition were designated as “true”, but in a first-order 
language, there are non-logical axioms which assert that some particular propositions are either true or false. 
Similarly in the real world, it is false to say that the Earth is flat, but the purely logical axioms are equally 
valid whether the Earth is flat or not. 


In this sense, one may say that truth and falsity cannot be defined. In other words, “truth” has no meaning 
except in relation to “falsity”, and vice versa. This is perhaps disturbing because both logic and mathematics 
are based on truth and falsity as their lowest-level core concepts. This issue is easily resolved by recognising 
that “truth” is an extra-logical concept. In the same way that the origin and axes of a Cartesian coordinate 
system may be arbitrarily chosen, so also can truth values be arbitrarily chosen. But just as the objects in 
the world have definite coordinates when the coordinate system has been chosen, so also the propositions 
in a concrete proposition domain have definite truth values when the designations “true” and “false” have 
been assigned. It is reality which “breaks the symmetry” between “truth” and “falsity”. 


3.1.3 REMARK: All propositions are either true or false. A proposition cannot be both true and false. 

In a strictly logical model, all propositions are either true or false. This is the definition of a logical model. 
(If you disagree with this statement, you are in a universe where your belief is true and my belief is false!) 
Mathematical logic is the study of propositions which are either true or false, never both and never neither. 
Each proposition shall have one and only one truth value, and the truth value shall be either true or false. 


As a simple mathematical example, consider the proposition that “the octillionth decimal digit of π is even”. 
This is most likely beyond the abilities of current hardware to compute. (If computers have caught up 
with this problem by the time you read this, simply replace this proposition with some other mathematical 
proposition which cannot currently be answered. If no such propositions exist when you read this, you can 
stop reading this book because mathematicians are of no further use.) This is the kind of proposition which 
one would say is either true or false. So it is a valid proposition for the purpose of the logic in this book. 
The fact that we might not know the truth value of the proposition does not imply that the truth value is 
undefined or something other than “true” or “false”. Knowledge is a property of both the proposition and 
the person (or entity) which knows or does not know the truth value. Knowledge is subjective and variable. 


Nothing in the “real world” is clear-cut: true or false. Perceptions and observations of the “real world” 
are subjective, variable, indirect and error-prone. So one could argue that a strictly logical model cannot 
correctly represent such a “real world”. The “real world” is also apparently probabilistic according to 
quantum physics. We have imperfect knowledge of the values of parameters for even deterministic models, 
which suggests that propositions should also have possible truth values such as “unknown” or “maybe” or 
“not very likely”, or numerical probabilities or confidence levels could be given for all observations. But this 
would raise serious questions about how to define such concepts. Inevitably, one would need to define fuzzy 
truth values in terms of crisp true/false propositions. So it seems best to first develop two-valued logic, and 
then use this to define everything else. 


If someone claims that no proposition is either absolutely true or absolutely false, one could ask them 
whether they are absolutely certain of this. If they answer “yes”, then they have made an assertion which is 
absolutely true, which contradicts their claim. So they must answer “no”, which leaves open the possibility 
that absolutely true or absolutely false propositions do exist. More seriously, it is in the metalanguage that 
people are more likely to make categorical claims of truth or falsity, especially when there is no possibility 
of a proof. This supports the notion that a core foundation logic should be categorical and clear-cut. Then 
application-specific logics may be constructed within such a categorical metalogic. Therefore clear-cut logic 
does appear to have an important role as a kind of foundation layer for all other logics. 


The logic presented in this book is simple and clear-cut, with no probabilities, no confidence values, and no 
ambiguities. More general styles of logic may be defined within a framework of clear-cut true-or-false logic, 
but they are outside the scope of this book. One may perhaps draw a parallel with the history of computers, 
where binary zero-or-one computers almost completely displaced analogue designs because it was very much 
more convenient to convert analogue signals to and from binary formats and do the processing in binary than 
to design and build complex analogue computers. (There was a time long ago when digital and analogue 
computers were considered to be equally likely contenders for the future of computing.) 


3.1.4 REMARK: The truth values of propositions exist prior to argumentation. 

There is a much more important sense in which “all propositions are either true or false”. Most of the 
mathematical logic literature gives the impression that the truth or falsity of propositions comes via logical 
argumentation from axioms and deduction rules. Consequently, if it is not possible to prove a proposition 
true or false within a theory, the proposition is effectively assumed to have no truth value, because where 
else could a proposition obtain a truth value except through logical argumentation? This kind of thinking 
ignores the original purpose of logical argumentation, which was to deduce unknown truth values from known 
truth values. Just because one cannot prove that a lion is or isn't behind the tree does not mean that the 
proposition has no truth value. It only means that one's argumentation is insufficient to determine the 
matter. Similarly, in algebra for example, the inability of the axioms of a field to prove whether a field K is 
finite means that the axioms are insufficient, not that the proposition “K is finite” has no truth value. The 
proposition does have a truth value when a particular concrete field has been fully defined. 


This point may seem pedantic, but it removes any doubt about the “excluded middle” rule, which underlies 
the validity of “reductio ad absurdum” (RAA) arguments. So it will not be necessary to give any serious 
consideration to objections to the RAA method of argument. 


3.1.5 REMARK: RAA has absurd consequences. Therefore it is wrong. (Spot the logic error!) 

Some people in the last hundred years have put the argument that the excluded middle is unacceptable 
because if RAA is accepted as a mode of argument, then any proposition can be shown to be false from 
any contradiction. But this argument uses RAA itself, by apparently proving that RAA is false from the 
absurd consequence that any proposition can be shown to be false by a single contradiction. (The discomfort 
with the apparent arbitrariness of RAA is sometimes given the Latin name “ex falso (sequitur) quodlibet”, 
meaning “from falsity, anything (follows)”. See for example Curry [350], pages 264, 285. A more colloquial 
expression for this would be: “One out, all out.”) 


There is a more serious reason to reject the arguments against RAA. If a mathematical theory does yield 
a contradiction, by applying inference methods which are claimed to give only true conclusions from true 
assumptions, then the whole system should be rejected. The surfacing of a single contradiction means that 
either the inference methods contain a fault or the axioms contain a fault, in which case all conclusions 
arrived at by this system are thrown into doubt. In fact, in case of a contradiction, it becomes much more 
likely than not that a substantial number of invalid conclusions may already have been inferred. As an 
analogy, if a computer system yielded the output 3 + 2 = 7, one would either re-boot the computer, rewrite 
the software to remove the bugs, or consider purchasing different hardware. One would also consider that 
all computations up to that time have been thrown into serious doubt, necessitating a discard of all outputs 
from that system. 


Contradictions are regularly arrived at in practical mathematical logic by making tentative assumptions 
which one wishes to disprove, while maintaining a belief that all axioms, inference methods and prior theorems 
are correct. Then it is the tentative assumption which is rejected if a contradiction arises. It is difficult to 
think of another principle of inference which is more important to mathematics than RAA. This principle 
should not be accepted because it is indispensable. It should be accepted because it is clearly valid. 


RAA may be thought of as an application to logic of the “regula falsi” method, which was known to the 
ancient Egyptians and is similar to the “tentative assumption” method of Diophantus. (See Cajori [241], 
pages 12-13, 61, 91-93, 103.) The regula falsi idea is to make a first guess of the value of a variable, and 
then use the resulting error to assist discovery of the true value. In logic, one makes a first guess of the truth 
value of a proposition. If this yields a contradiction, this implies that the only other possible truth value 
must have been correct. This is a special case of the notion of negative feedback. 
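
The guess-and-correct procedure described above can be sketched for a very small case. The following Python fragment is only an illustration under stated assumptions. The premises “A ⇒ B” and “not B” and the helper names `implies` and `consistent` are hypothetical choices made for this sketch. A first guess is made for the truth value of A. If no truth value for B is compatible with the premises under that guess, the guess is rejected and the other truth value is accepted.

```python
def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

def consistent(premises, guess_a: bool) -> bool:
    """True if some truth value for B satisfies all premises when A = guess_a."""
    return any(all(p(guess_a, b) for p in premises) for b in [False, True])

# Hypothetical premises which are believed to be correct: A => B, and not B.
premises = [lambda a, b: implies(a, b), lambda a, b: not b]

first_guess = True  # tentative assumption: A is true
if not consistent(premises, first_guess):
    # The guess yields a contradiction, so the other truth value is accepted.
    a_value = not first_guess
else:
    # The guess has not been refuted, so nothing is concluded.
    a_value = None

print(a_value)
```

In this case the output is `False`. The conclusion relies on the belief that the premises themselves are correct, which is why the sketch also permits a check that the premises are compatible with the opposite guess by calling `consistent(premises, False)`.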


3.1.6 REMARK: The subservient role of mathematical logic. 

Another benefit of the assertion that “truth values of propositions exist prior to argumentation” is that it 
makes clear the subservient status of logical theories, axioms and deduction rules. They are not arbitrary, 
as one might suppose by reading much of the mathematical logic literature. (See Section 79.8 for a sample 
of this literature.) Logical theories, axioms and deduction rules must be judged to be either useful or useless 
according to whether they yield the right answers. A proposition is not true or false because some arbitrary 
choice of axioms (e.g. for set theory) “proves” that it is true or false. If a theory “proves” that a proposition 
is true when we know that it is false, then the theory must be wrong. This becomes particularly significant 
when one must choose axioms for set theory. Axioms do not fall from the sky on golden tablets. 


The role and mission of mathematical logic is to find axioms and deductive methods which will correctly 
generate truth values for propositions in a given universe of propositions. The validity of a logical theory 
comes from its ability to correctly generate truth values of propositions, in much the same way that a linear 
functional on a linear space can be generated by choosing a valid basis and then extending the value of 
the functional from the basis to the whole space. Similarly, in the sciences, the validity of physical “laws” 
is judged by their ability to generate predictions and explanations of physical observations. If theory and 
reality disagree, then the theory is wrong. 


It seems plausible that the instinct which drives societies to concoct creation myths and folk etymologies 
could be the same instinct which drives mathematicians to seek axioms to explain the origins of mathematical 
propositions. When one reverses the deductive method to seek these origins, there are inevitably gaps where 
“first causes” should be. There seems to be some kind of instinct to fill such gaps. 


3.1.7 REMARK: Valid logic must not permit false conclusions to be drawn from true assumptions. 

Any logic system which permits false conclusions to be drawn from true assumptions is clearly not valid. 
Such a logic system would lack the principal virtue which is expected of it. However, if it is supposed that 
truth is decided by argumentation within a logic system, the system would be deciding the truthfulness of 
its own outputs, which would make it automatically valid because it is the judge of its own validity. Thus 
all logic systems would be valid, and the concept of validity would be vacuous. Therefore to be meaningful, 
validity of a logic system must have some external test (by an impartial judge not connected with the case). 
This observation reinforces the assertion in Remark 3.1.6. Logical argumentation must be subservient to the 
extra-logical truth or falsity of propositions. 
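
For propositional inference rules, an external test of this kind can be sketched by letting truth tables play the role of the impartial judge. The following Python fragment is a minimal illustration. The function name `rule_is_valid` and the two sample rules are hypothetical choices for this sketch. A rule is accepted only if no assignment of truth values makes all of its premises true and its conclusion false.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

def rule_is_valid(premises, conclusion, num_vars: int) -> bool:
    """A rule is valid if no truth value assignment makes every premise
    true while making the conclusion false."""
    for values in product([False, True], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

# Modus ponens: from A and A => B, infer B.
modus_ponens = rule_is_valid(
    premises=[lambda a, b: a, lambda a, b: implies(a, b)],
    conclusion=lambda a, b: b,
    num_vars=2,
)

# Affirming the consequent: from B and A => B, infer A.
affirming_consequent = rule_is_valid(
    premises=[lambda a, b: b, lambda a, b: implies(a, b)],
    conclusion=lambda a, b: a,
    num_vars=2,
)

print(modus_ponens, affirming_consequent)
```

The output is `True False`. The second rule fails because the assignment where A is false and B is true makes both premises true and the conclusion false. The judgement comes from the truth values themselves, not from the rule being tested.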


3.1.8 REMARK: Mathematical logic theories must be consistent. 

If the universe of propositions which is modelled (in the sense of a world-model) by a logical theory is 
well-defined and self-consistent, one would expect the logical theory to also be well-defined and self-consistent. 
In fact, it is fairly clear that a logical theory which contains contradictions cannot be a valid model for a 
universe of propositions which does not contain contradictions. 


3.2. Concrete proposition domains 


3.2.1 REMARK: Aggregation of concrete propositions into “domains”. 

Concrete propositions are collected for convenience into “concrete proposition domains” in Definition 3.2.3. 
Propositional logic language consists of sequences of symbols which have no meaning of their own. They 
acquire meaning when proposition-name symbols “point to” concrete propositions. (The word “domain” is 
chosen here so as to avoid using the words “set” or “class”, which have very specific meanings in set theory.) 
Notation 3.2.2 gives convenient abbreviations for the proposition tags (i.e. attributes) “true” and “false”. 


3.2.2 NOTATION [MM]: 
F denotes the proposition-tag “false”. 
T denotes the proposition-tag “true”. 


3.2.3 DEFINITION [MM]: A truth value map is a map τ : P → {F, T} for some (naive) set P. 
The concrete proposition domain of a truth value map τ : P → {F, T} is its domain P. 
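
A truth value map on a small finite domain can be sketched as a lookup table. The following Python fragment is an illustration only. The names `truth_value_map` and `concrete_proposition_domain`, the representation of the tags F and T as strings, and the use of the floating-point constant `math.pi` as a stand-in for π are all assumptions made for this sketch.

```python
import math

# Proposition tags, modelled here as strings (an assumption of this sketch).
F, T = "F", "T"

# A small finite collection of concrete propositions, each represented
# by its text, together with its truth value.
truth_value_map = {
    "pi < 3": T if math.pi < 3 else F,
    "pi > 3": T if math.pi > 3 else F,
    "pi = 3": T if math.pi == 3 else F,
}

# The concrete proposition domain is the domain of the truth value map.
concrete_proposition_domain = set(truth_value_map)

# Each proposition has exactly one truth value, which is either F or T.
assert all(value in {F, T} for value in truth_value_map.values())

print(truth_value_map["pi > 3"])
```

The output is `T`. The dictionary guarantees that each proposition has one and only one truth value, which is the essential requirement of the definition.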


3.2.4 REMARK: Metamathematical definitions. 

Notation 3.2.2 and Definition 3.2.3 are marked as metamathematical by the bracketed letters “[MM]”, which 
means that they are intended for the discussion context, not as part of a logical or mathematical axiomatic 
system. (Metamathematical theorems are similarly tagged as “[MM]”, as described in Remark 3.13.4.) 


Metamathematical definitions are often given as unproclaimed running-commentary text, but it is sometimes 
useful to present them in numbered, highlighted metadefinition-proclamation text-blocks so that they can 
be easily cross-referenced. These may be regarded as “rendezvous points” for key concepts. 


Definitions, theorems and notations within the propositional calculus system in Definition 4.4.3 will be 
tagged as “[PC]”. Similarly the tag “[QC]” will be used for the predicate calculus system in Definition 6.3.9. 
Definitions, theorems and notations which are not tagged are assumed to be presented within the 
Zermelo-Fraenkel axiomatic system. 


3.2.5 REMARK: The use of basic set language and operations to explain logic. 
As suggested by Figure 2.1.1 in Remark 2.1.1, mathematical logic must be presented within a framework of 
pre-understood naive mathematics. The requirements of an “underlying set theory” for logic are minimal. 
(In the case of model theory, the need for an “underlying set theory” or “intuitive set theory” is stated 
explicitly by Chang/Keisler [347], pages 560, 579. Model theory demands much more from the underlying 
set theory than a basic introduction to propositional and predicate logic does.) 


Definition 3.2.3 uses the language of sets and functions because the mathematical reader will be familiar with 
these concepts. The purpose of an “underlying set theory” is mostly to supply vocabulary and notations for 
the explanation of logic. The most important operations are intersection and union, which are really nothing 
more than the logical operations “and” and “or”, and the most important relation is set-inclusion, which is 
really nothing more than the logical implication relation. 
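
The correspondence between set operations and logical operations can be checked on a small finite example. In the following Python sketch, the sets `universe`, `A`, `B` and `C` and the function `includes` are hypothetical choices made purely for illustration.

```python
universe = set(range(10))
A = {x for x in universe if x % 2 == 0}  # the even numbers in the universe
B = {x for x in universe if x < 5}       # the numbers less than 5
C = {0, 2}                               # a subset of both A and B

# Membership of an intersection corresponds to "and".
assert all((x in (A & B)) == (x in A and x in B) for x in universe)

# Membership of a union corresponds to "or".
assert all((x in (A | B)) == (x in A or x in B) for x in universe)

def includes(X: set, Y: set) -> bool:
    """X is included in Y if membership of X implies membership of Y."""
    return all((x not in X) or (x in Y) for x in universe)

# Set-inclusion agrees with the implication relation between memberships.
assert includes(C, A) == (C <= A)
assert includes(A, B) == (A <= B)

print(includes(C, A), includes(A, B))
```

The output is `True False`, since every element of C is even, whereas the element 6 of A is not less than 5.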


Some introductions to logic use two distinct logic notations in parallel, one for metalogic, the other for formal 
logic languages. Naive set notation is used here instead of a metalogical logic notation because it is less likely 
to be confused with formal logic notation. 


3.2.6 REMARK: Propositions have two possible truth values. This is the essence of propositions. 

In Definition 3.2.3, a truth value map is defined on a concrete proposition domain P which may be a set 
defined in some kind of set theory, or it may be a naive set. Each concrete proposition p ∈ P has a truth 
value τ(p) which may be true (T) or false (F). Some, all, or none of the truth values τ(p) may be known, 
but as discussed in Remarks 3.1.3 and 3.1.4, there are only two possible truth values. 


3.2.7 REMARK: Propositions obtain their meaning from applications outside pure logic. 

In applications of mathematical logic in particular contexts, the truth values for concrete propositions are 
typically evaluated in terms of a mathematical model of some kind. For example, the truth value of the 
proposition “Mars is heavier than Venus” may be evaluated in terms of the mass-parameters of planets in 
a Solar System model. (It's false!) Pure logic is an abstraction from applied logic. Abstract logic has the 
advantage that one can focus on the purely logical issues without being distracted by particular applications. 
Abstract logic can be applied to a wide variety of contexts. A disadvantage of abstraction is that it removes 
intuition as a guide. When intuition fails, one must return to concrete applications to seek guidance. 


Concrete propositions are located in human minds, in animal minds or in “computer minds”. The discipline 
of mathematical logic may be considered to be a collection of descriptive and predictive methods for human 
“thinking behaviour”, analogous to other behavioural sciences. This is reflected in the title of George Boole's 
1854 publication “An investigation of the laws of thought” [342]. Human logical behaviour is in some sense 
“the art of thinking”. Logical behaviour aims to derive conclusions from assumptions, to guess unknown 
truth values from known truth values. The discipline of mathematical logic aims to describe, systematise, 
prescribe, improve and validate methods of logical argument. 


3.2.8 REMARK: Logical language refers to concrete propositions, which refer to logic applications. 

The lowest level of logic, concrete proposition domains, is the most concrete, but it is also the most difficult 
to formalise because it is informal by nature, being formulated typically in natural language in a logic 
application context. This is the semantic level for logic, where real logical propositions reside. A logical 
language L obtains meaning by “pointing to” concrete propositions. Similarly, a concrete proposition domain 
P obtains meaning by “pointing to” a mathematical system S or a world-model for some kind of “world”. 
This idea is illustrated in Figure 3.2.1. 


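[Figure 3.2.1 shows three layers, each pointing to the one below it. 
Top: propositional calculus language L. Example: “p₁ or p₂”, where p₁ = “π < 3” and p₂ = “π > 3”. 
Middle: concrete proposition domain P, containing “π < 3”, “π > 3”, “π = 3”, ..., with truth value 
map τ : P → {F, T}, where τ(“π < 3”) = F, τ(“π > 3”) = T, .... 
Bottom: mathematical system S, containing π, 3, <, >, =, ....] 


Figure 3.2.1 Relation of logical language to concrete propositions and mathematical systems 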


The idea here is that abstract proposition symbols such as p₁ and p₂ refer to concrete propositions such as 
“π < 3” and “π > 3”, while symbols such as “π” and “3” refer to objects in some kind of mathematical 
system, which itself resides in some kind of mind. The details are not important here. The important thing 
to know is that despite the progressive (and possibly excessive) levels of abstraction in logic, the symbols do 
ultimately mean something. 


3.2.9 EXAMPLE: Some examples of concrete proposition domains. 
The following are some examples of concrete proposition domains which may be encountered in mathematics 
and mathematical logic. 


(1) P = the propositions “x < y” for real numbers x and y. The map τ : P → {F, T} is defined so that 
τ(“x < y”) = T if x < y and τ(“x < y”) = F if x ≥ y in the usual way. 

(2) P = the propositions “X ∈ Y” for sets X and Y in Zermelo-Fraenkel set theory. The map τ : P → {F, T} 
is defined so that τ(“X ∈ Y”) = T if X ∈ Y and τ(“X ∈ Y”) = F if X ∉ Y in the familiar way. 
(One may think of the totality of mathematics as the task of calculating this truth value map.) 

(3) P = the propositions “X ∈ Y” and “X ⊆ Y” for sets X and Y in Zermelo-Fraenkel set theory. The 
map τ : P → {F, T} is defined so that τ(“X ∈ Y”) = T if X ∈ Y, τ(“X ∈ Y”) = F if X ∉ Y, 
τ(“X ⊆ Y”) = T if X ⊆ Y, τ(“X ⊆ Y”) = F if X ⊈ Y in the familiar way. 

(4) P = the well-formed formulas of any first-order language. The map τ : P → {F, T} is defined so that 
τ(P) = T if P is true and τ(P) = F if P is false. 

(5) P = the well-formed formulas of Zermelo-Fraenkel set theory. The map τ : P → {F, T} is defined so 
that τ(P) = T if P is true and τ(P) = F if P is false. 


In case (1), the domain P is a well-defined set in the Zermelo-Fraenkel sense. In the other cases above, P is 
the class of all ZF sets. 


Examples (6) and (7) relating to the physical world are not so crisply defined as the others. 


(6) P = the propositions P(x,t) defined as “the voltage of transistor x is high at time t” for transistors x 
in a digital electronic circuit for times t. The truth value τ(P(x,t)) is not usually well defined for all 
pairs (x,t) for various reasons, such as settling time, related to latching and sampling. Nevertheless, 
propositional calculus can be usefully applied to this system. 


(7) P = the propositions P(n,t,z) defined as “the voltage Vₙ(t) at time t equals z” for locations n in an 
electronic circuit, at times t, with logical voltages z equal to 0 or 1. The truth value τ(P(n,t,z)) is 
usually not well defined at all times t. Propositions P(n,t,z) may be written as “Vₙ(t) = z”. 


Some of these examples of concrete proposition domains and truth value maps are illustrated in Figure 3.2.2. 


[Figure 3.2.2 shows three concrete proposition domains P₁, P₂ and P₃, each mapped into the truth value 
space {F, T}. P₁ contains propositions about real numbers such as “π < 3” and “3.1 < π”. P₂ contains 
set-theoretic propositions such as “∅ ∈ ∅”. P₃ contains circuit-voltage propositions of the form “Vₙ(t) = z”.] 


Figure 3.2.2 Concrete proposition domains and truth value maps 


3.2.10 REMARK: A concrete proposition domain is a pruned-back, limited-features kind of “set”. 

The naive set P of propositions which appears in Definition 3.2.3 is not the same as the kind of set which 
appears in set theories such as Zermelo-Fraenkel (Section 7.2) and Bernays-Gödel. A set P of concrete 
propositions requires only a very limited set-membership relation. 
The membership relation x ∈ P is defined for elements x of P, but there is no membership relation x ∈ y for 
elements x and y of P. It is meaningless to say that one proposition “is an element of” another proposition. 
So the thorny issues which arise in general set theory do not need to be considered here. 


For example, Russell’s paradox cannot arise because the membership relation can only occur between a 
proposition x on the left and a set of propositions P on the right. So “loops” and infinite chains of propositions 
linked by membership relations are automatically excluded. Therefore no regularity axiom is required. 


3.3. Proposition name spaces 


3.3.1 REMARK: The difference between a dog and the word “dog”. 

Normally we do not confuse the word “dog” with the dog itself. If someone says that the dog has long hair, 
we do not imagine that the word “dog” has long hair. The dog cannot itself be inserted into a sentence. The 
word “dog” is used instead of a real dog. But we have no difficulties with understanding what is meant. 


The situation in mathematical logic is different. Propositions are often confused with the labels which point 
to them. The formalist approach attempts to make the text itself the subject of study. Then a string of 
symbols such as “p ⇒ q” is considered to be a proposition, when in reality it is merely a string of symbols. 
The letters “p” and “q” point to concrete propositions, and the symbol “⇒” constructs something out of 
p and q which is not a concrete proposition, but the symbol-string “p ⇒ q” does point to something. The 
question of what the symbol-string “p ⇒ q” refers to is discussed in Section 3.4. The defeatist approach is 
to say that all logic is about symbol-strings. In this book, it is claimed that proposition names are labels for 
concrete propositions, and logical expressions point to “knowledge sets”. This gives symbol-strings meaning. 


The distinction between names and objects in logic is as important as distinguishing between points in 
manifolds and the coordinates which refer to those points. A hundred years ago, it was common to confuse 
points with coordinates. Such confusion is not generally accepted in modern differential geometry. (This 
“confusion” was introduced about 1637 by Descartes [212], and was formalised as an axiom by Cantor and 
Dedekind about 1872. See Boyer [236], pages 92, 267; Boyer [235], pages 291-292.) 


In this book, proposition names are admittedly often confused with the propositions which they point to. 
This mostly does as little harm as substituting the word “dog” for a real dog. When the difference does 
matter, the concepts of a “proposition name space” and a “proposition name map” will be invoked. In 
Section 3.4, for example, propositions are identified with their names. This is no more harmful than the 
standard practice of saying “let p be a point” instead of the more correct “let p denote a point” or “let p be 
a label for a point”. But in the axiomatisation of logic in Chapter 4, the distinction does matter. 


The issue of the distinction between names and objects is called “use versus mention” by Quine [380], page 23. 


3.3.2 DEFINITION [MM]: 
A proposition name map is a map μ : N → P from a (naive) set N to a concrete proposition domain P. 


The proposition name space of a proposition name map μ : N → P is its domain N. 


3.3.3 REMARK: Truth value maps for proposition name spaces. 

Let μ : N → P be a proposition name map. Then t = τ ∘ μ : N → {F, T} maps abstract proposition 
names to the truth values of the propositions to which they refer. (See Definition 3.2.3 for truth value 
maps τ : P → {F, T}.) The relation between proposition name spaces and concrete proposition domains is 
illustrated in Figure 3.3.1. 
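
The composition of a name map with a truth value map can be sketched with two lookup tables. In the following Python fragment, the tables `tau` and `mu`, the proposition texts and the names “p1”, “p2”, “p3” and “q” are all hypothetical choices made for illustration.

```python
# Proposition tags, modelled here as strings (an assumption of this sketch).
F, T = "F", "T"

# A hypothetical truth value map on a small concrete proposition domain.
tau = {"2 < 3": T, "3 < 2": F, "2 = 3": F}

# A hypothetical proposition name map from names to concrete propositions.
mu = {"p1": "2 < 3", "p2": "3 < 2", "p3": "2 = 3", "q": "2 < 3"}

def t(name: str) -> str:
    """Truth value of a name: apply the name map, then the truth value map."""
    return tau[mu[name]]

# The proposition name space is the domain of the name map.
name_space = set(mu)

print({name: t(name) for name in sorted(name_space)})
```

The output is `{'p1': 'T', 'p2': 'F', 'p3': 'F', 'q': 'T'}`. In this sketch the names “p1” and “q” point to the same concrete proposition, which is permitted because the name map is only required to be a map.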


From the formalist perspective, the name space N and the abstract truth value map t : N → {F, T} are 
the principal subject of study, while the domain P and the name map μ : N → P are considered to be 
components of an “interpretation” of the language. 


3.3.4 REMARK: Constant names, variable names, and name scopes. 

Names may be constant or variable. However, variable names are generally constant within some restricted
scope. The scope of a variable can be as small as a sub-expression of a logical sentence. Textbook
formalisations of mathematical logic generally define only two kinds of names: constant and variable. In real
mathematics, all names have a scope and a lifetime. The scope of a name may range in extent from the global
context of a theory down to a single sub-expression. In the global case, the name is called a “constant”,
while in the sub-expression case, the name may be called a “bound variable” or “dummy variable”.


Figure 3.3.1 Proposition name space and concrete proposition domain (diagram labels: concrete proposition domain, truth value space)


Within a particular scope, one typically uses some names for objects which are constant within that scope, 
while other names may refer to any object in the universe of objects. As the scopes are “pushed” and 
“popped” (i.e. commenced or terminated respectively), the set of locally constant names is augmented or 
restored respectively. Therefore the proposition name maps in Definition 3.3.2 are themselves variable in 
the sense that they are context-dependent. In Figure 3.3.1, then, the map μ is variable, while the domain P
may be fixed.


Scopes are typically organised in a hierarchy. If a scope S₁ is defined within a scope S₀, then S₀ may be said
to be the “enclosing scope” or “parent scope”, while S₁ may be referred to as the “enclosed scope”, “child
scope” or “local scope”. The constant name assignments in any enclosed child scope are generally inherited
from the enclosing parent scope.


A theorem, proof or discussion often commences with language like: “Let P be a proposition.” This means
that the variable name P is to be fixed within the following scope. Then P is constant within the enclosed 
scope, but is a free variable in the enclosing scope. 
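
The pushing and popping of scopes may be sketched with the `ChainMap` class in the Python standard library, which searches a child mapping first and then its parents. This is only an analogy under the assumption that a name map can be modelled as a stack of dictionaries. The names and objects below are hypothetical.

```python
from collections import ChainMap

# Global scope: names which are constant throughout a theory (hypothetical).
names = ChainMap({"zero": "the number 0"})

# "Push" a child scope, as in "Let P be a proposition."
names = names.new_child({"P": "there is smoke"})
assert names["P"] == "there is smoke"     # locally constant name
assert names["zero"] == "the number 0"    # inherited from the parent scope

# "Pop" the child scope: the previous name map is restored.
names = names.parents
assert "P" not in names                   # P is unassigned in the enclosing scope
```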


The discussion of constants and scopes may seem to be of no real importance. However, it is relevant to
issues such as the axiom of choice. Every time a choice of object is made, an assignment is made in the local 
scope between a name and a chosen object. This requires the name map to be modified so that a locally 
constant name is assigned to the chosen object. 


Shoenfield [390], page 7, describes the context-dependence of proposition name maps by saying that “[...] a 
syntactical variable may mean any expression of the language; but its meaning remains fixed throughout 
any one context.” (His “syntactical variable” is equivalent to a proposition name.)


3.3.5 REMARK: The naming bottleneck. 

Constant names cannot always be given to all propositions in a domain P. Variable names can in principle be
made to point at any individual proposition, but some propositions may be “unmentionable” as individuals 
because the selective specification of some objects may require an infinite amount of information. (For 
example, the digits of a truly random real number in decimal representation almost always cannot be 
specified by any finite rule.) 


Proposition name spaces in the real world are finite. Countably infinite name spaces are a modest extension 
from reality to the slightly metaphysical. Anything larger than this is somewhat fanciful. Some authors 
explicitly state that their proposition name spaces are countable. (See for example E. Mendelson [370], 
page 29.) Most authors indicate implicitly that their proposition name spaces are countable by using letters 
indexed by integers, such as p₁, p₂, p₃, . . ., for example.


3.3.6 REMARK: Application-independent logical axioms and logical theorems. 
A single proposition name space N as in Definition 3.3.2 may be mapped to many different concrete proposition
domains. (This is illustrated in Figure 3.3.2.) 


Propositional logic may be formalised abstractly without specifying the concrete proposition domain or the
proposition name map. Then it is intended that the theorems will be applicable to any concrete proposition
domain and any valid proposition name map. The axioms and theorems of such abstract propositional logic
are called “logical axioms” and “logical theorems” to distinguish them from axioms and theorems which are
specific to particular applications.


Figure 3.3.2 Proposition name space with multiple concrete proposition domains


3.4. Knowledge sets 


3.4.1 REMARK: A “knowledge set” is everything that is known (or believed) about a truth value map. 

If one has full knowledge of the truth values of all propositions in a concrete proposition domain P, this 
knowledge may be described by a single map τ : P → {F, T}. In this case, there is no need for logical
deduction at all, since there are no unknowns. Logical argumentation has a role to play only when there are 
some knowns and some unknowns. 


For temporary convenience within the logic chapters, let 2 denote the set {F, T}. Then τ ∈ 2^P.


The state of knowledge about a truth value map τ may be specified as a subset K of 2^P within which the
map is known to lie. This means that τ is known to be excluded from the set 2^P \ K. In terms of the
notation ℙ(2^P) for the set of all subsets of 2^P, one may write K ∈ ℙ(2^P).


Such a set K may be referred to as a “(deterministic) knowledge set” for the truth value map τ : P → {F, T}.
It would perhaps be more accurate to call K the “ignorance set”, since the larger K is, the less knowledge
one has. (Probabilistic knowledge could be defined similarly in terms of probability measures, but that is 
outside the scope of this book.) 


Although the word “knowledge” is used here, this does not mean knowledge in the sense of “true facts”
about the real world. Knowledge is always framed within some kind of model of the real world, although 
many people often do not distinguish between their world-model and the world itself. 


Even within a single world-model, each person may have different beliefs about the same propositions, and 
even one individual person may have different beliefs at different times, or even multiple inconsistent beliefs 
simultaneously. Thus “knowledge” may be static or dynamic, correct or incorrect, consensus or controversial, 
and may be asserted with great certainty or extreme doubt. 


Individual propositions, and assertions regarding their truth values, are observed “in the wild” in human 
behaviour. Assertions of constraints on the relations between truth values may also be observed “in the
wild”. The knowledge-set concept is a model for human knowledge (or beliefs, to be more precise). The
main task in the formulation of logic is to correctly describe valid, self-consistent logical thinking in terms of 
general laws for relations between proposition truth values, and also the processes of argumentation which 
aim to deduce unknowns from knowns. This task is carried out here in terms of the knowledge-set concept. 
Although the words “set” and “subset” are used in Definition 3.4.4, these are only naive sets because by 
Definition 3.3.2, a concrete proposition domain is only a naive set. These “sets” may or may not be sets or 
classes in the sense of set theories, but this does not matter because only the most naive properties of “sets” 
are required here. (The term “knowledge space” in Definition 3.4.3 is rarely used in this book.)


3.4.2 NOTATION [MM]:
2 denotes the (naive) set {F, T}.


2^P denotes the (naive) set of all truth value maps for a concrete proposition domain P. In other words,
2^P = {τ : P → {F, T}} is the (naive) set of all maps from P to {F, T}.


3.4.3 DEFINITION [MM]: The knowledge space for a concrete proposition domain P is the set 2^P.
3.4.4 DEFINITION [MM]: A knowledge set for a concrete proposition domain P is a subset of 2^P.


3.4.5 EXAMPLE: A knowledge set for a concrete proposition domain with 3 propositions. 
As an example of a knowledge set, let P = {p₁, p₂, p₃} be a concrete proposition domain. If it is known only
that p₁ is false or p₂ is true (or both), then the knowledge set is


K = {τ ∈ 2^P; τ(p₁) = F or τ(p₂) = T}
  = {(F, F, F), (F, F, T), (F, T, F), (F, T, T), (T, T, F), (T, T, T)},


where the ordered truth-value triples are defined in the obvious way.

The knowledge set K′ if it is known only that p₁ is true is K′ = {(T, F, F), (T, F, T), (T, T, F), (T, T, T)}.
Then the intersection K ∩ K′ = {(T, T, F), (T, T, T)} contains all truth value maps for which both p₁ and
p₂ are true. More generally, knowledge of truth value maps may be combined by intersecting knowledge sets.
(This is also stated in Remark 3.4.10.)
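
The two knowledge sets in this example and their intersection can be checked by brute-force enumeration. The following is a minimal sketch which assumes that a truth value map on three propositions is represented as a triple of the characters "F" and "T". The variable names are chosen only for illustration.

```python
from itertools import product

# All 8 truth value maps on P = {p1, p2, p3}, as triples (tau(p1), tau(p2), tau(p3)).
all_maps = set(product("FT", repeat=3))

# K: it is known only that p1 is false or p2 is true (or both).
K = {t for t in all_maps if t[0] == "F" or t[1] == "T"}

# K_prime: it is known only that p1 is true.
K_prime = {t for t in all_maps if t[0] == "T"}

# Combined knowledge is the intersection of the two knowledge sets.
combined = K & K_prime
print(len(K), len(K_prime))   # 6 4
print(sorted(combined))       # [('T', 'T', 'F'), ('T', 'T', 'T')]
```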


3.4.6 REMARK: Quantitative measures of knowledge (and ignorance). 

One may define quantitative measures of knowledge. For a finite set P, the real number μ(P, K) =
log₂(#(2^P)/#(K)) = #(P) − log₂(#(K)) satisfies 0 ≤ μ(P, K) ≤ #(P) for any non-empty knowledge set K ⊆ 2^P. One
may consider this to be a measure of the amount of information in the knowledge set K, measured in units
of “bits”. (In Example 3.4.5, μ(P, K) = 3 − log₂(6) ≈ 0.415 bits.)

This “knowledge measure” is really a measure of the set of possible values of the truth value map τ which are
excluded by K from 2^P. In other words, it measures 2^P \ K. This helps to explain the form of the function
μ(P, K) = log₂(#(2^P)/#(K)). The more one knows, the more possibilities are excluded, and vice versa.
The complementary quantity log₂(#(K)) may be regarded as an “ignorance measure”. (In Example 3.4.5,
the ignorance measure would be log₂(6) ≈ 2.585 bits.)


One has the maximum possible knowledge of the truth value map on a concrete proposition domain P
if the knowledge set has the form K = {τ} for some τ : P → {F, T}. If P is finite, then #(K) = 1
and μ(P, K) = #(P) bits. At the other extreme, one has the minimum possible knowledge if K = 2^P, which
gives μ(P, K) = 0 bits if P is finite.
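
The knowledge and ignorance measures are easily computed for small finite domains. The function names in the following sketch are hypothetical. It is assumed that the knowledge set is finite and non-empty, since the logarithm is undefined for an empty set.

```python
from itertools import product
from math import log2

def knowledge_measure(n_props, K):
    """Return #(P) - log2(#(K)) in bits, for a non-empty knowledge set K."""
    if not K:
        raise ValueError("the measure is undefined for an empty knowledge set")
    return n_props - log2(len(K))

def ignorance_measure(K):
    """Return the complementary quantity log2(#(K)) in bits."""
    return log2(len(K))

# The knowledge set of Example 3.4.5: p1 is false or p2 is true.
all_maps = set(product("FT", repeat=3))
K = {t for t in all_maps if t[0] == "F" or t[1] == "T"}

print(round(knowledge_measure(3, K), 3))             # 0.415
print(round(ignorance_measure(K), 3))                # 2.585
print(knowledge_measure(3, all_maps))                # 0.0 (minimum knowledge)
print(knowledge_measure(3, {("T", "T", "T")}))       # 3.0 (maximum knowledge)
```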


3.4.7 EXAMPLE: The logical relation between smoke and fire. 

Define a concrete proposition domain by P = {p, q}, where p = “there is smoke” and q = “there is fire”. It
is popularly believed that “where there is smoke, there is fire”. Whether this is true or not is unimportant.
The descriptive role of the discipline of mathematical logic is to model human thinking, whether human
beliefs and thought processes are correct or not. In this case, then, someone “knows” that the truth value
map τ : P → {F, T} lies in the knowledge set K = {(F, F), (F, T), (T, T)}. So the “knowledge measure” is
μ(P, K) = 2 − log₂(3) ≈ 0.415 bits.

If smoke is observed, then the knowledge set shrinks to K = {(T, T)}, and the measure of knowledge increases 
to 2 bits. This is a kind of “knowledge amplification”. But if only fire is observed, then the knowledge set 
becomes K = {(F, T), (T, T)}, which gives only 1 bit for the measure of knowledge. 


If the person also knows that “where there is fire, there is smoke”, then, in the absence of observations of 
either fire or smoke, the knowledge set is K = {(F,F),(T,T)}, which gives 1 bit of knowledge. If either 
smoke or fire is then observed, the knowledge will increase to 2 bits. 
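
The numbers in this example can be reproduced by filtering the four possible truth value maps. In the following sketch, a map is assumed to be represented as a pair of booleans (smoke, fire). The helper function `measure` is a hypothetical name for the knowledge measure of Remark 3.4.6.

```python
from itertools import product
from math import log2

P = ("smoke", "fire")
all_maps = set(product((False, True), repeat=len(P)))

def measure(K):
    """Knowledge measure #(P) - log2(#(K)) in bits, for non-empty K."""
    return len(P) - log2(len(K))

# "Where there is smoke, there is fire": exclude smoke without fire.
K = {(s, f) for (s, f) in all_maps if (not s) or f}
after_smoke = {(s, f) for (s, f) in K if s}    # smoke is observed
after_fire = {(s, f) for (s, f) in K if f}     # only fire is observed

# Adding the converse "where there is fire, there is smoke".
K_both = {(s, f) for (s, f) in K if (not f) or s}

print(round(measure(K), 3))   # 0.415
print(measure(after_smoke))   # 2.0
print(measure(after_fire))    # 1.0
print(measure(K_both))        # 1.0
```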


3.4.8 REMARK: Knowledge can assert individual propositions or relations between propositions. 

Although not all truth values τ(p) for p ∈ P may be directly observable, relations between truth-values may
be known or conjectured. The objective of logical argumentation is to solve for unknown truth-values in
terms of known truth-values, using known (or conjectural) relations between truth-values. In Example 3.4.7,
knowledge can have the form of a relation such as α = “where there is smoke, there is fire”. Knowledge is
stored in human minds not only as the truth values of individual propositions, but also as relations between 
truth values of propositions. If no knowledge could ever be expressed as relations between truth values of 
individual propositions, there would be no role for propositional calculus. 


It could be argued that someone whose knowledge is only the above logical relation α has no knowledge at all.
Does this person know whether there is smoke? No. Does this person know whether there is fire? No. There 
are only two questions, and the person does not know the answer to either question. Therefore they know 
nothing at all. Zero knowledge about smoke plus zero knowledge about fire equals zero total knowledge. 
The purpose of Remark 3.4.6 and Example 3.4.7 is to resolve this apparent paradox. The “knowledge set” 
concept in Remark 3.4.1 is an attempt to give a concrete representation for the meaning of the “partial 
knowledge” in compound logical propositions. 


3.4.9 REMARK: Logical deduction has similarities to classical algebra. 

In Example 3.4.7, the observation τ(p) = T is combined with the known logical relation α in Remark 3.4.8
to arrive at the conclusion τ(q) = T. This is analogous to solving arithmétic equations and inequalities.
Both propositional calculus and predicate calculus may be thought of as techniques for solving large sets of 
simultaneous logical equations and inequalities. 


3.4.10 REMARK: Knowledge sets may be combined. 

Any two knowledge sets K₁ and K₂ which are both valid for a concrete proposition domain P may be
combined into a knowledge set K = K₁ ∩ K₂ which represents the combined knowledge about P. In other
words, if K₁ and K₂ are both valid for P, then K₁ ∩ K₂ is valid for P.


If someone possesses multiple knowledge sets regarding a single concrete proposition domain, those knowledge 
sets can be combined by intersecting them to produce a single “grand unified knowledge set” K₀. Then all
of the individual knowledge sets will be supersets of this combined set K₀. If knowledge about the truth and
falsity of propositions was static, one could in principle construct such a combined set K₀ and discard all of
the contributing knowledge sets. However, knowledge is most often a dynamic process involving discoveries 
(and refutations) by multiple individuals. 


As knowledge accumulates, the unified set K₀ becomes smaller over time, but if some contributing knowledge
is found to be unreliable, it must be removed from the trusted knowledge base and the intersection of the 
remaining knowledge sets must be recomputed. 
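
The need to retain the contributing knowledge sets can be illustrated by a small sketch. The dictionary of named sources below is a hypothetical bookkeeping device which is not part of the formal development. Truth value maps on {p, q} are assumed to be represented as pairs of booleans.

```python
from functools import reduce
from itertools import product

# All truth value maps on P = {p, q}, as pairs (tau(p), tau(q)).
all_maps = frozenset(product((False, True), repeat=2))

# Hypothetical contributing knowledge sets, keyed by their source.
sources = {
    "smoke implies fire": frozenset(t for t in all_maps if (not t[0]) or t[1]),
    "smoke observed": frozenset(t for t in all_maps if t[0]),
}

def unified(sources):
    """Intersect all contributing knowledge sets (no sources gives all maps)."""
    return reduce(frozenset.intersection, sources.values(), all_maps)

K0 = unified(sources)
print(sorted(K0))               # [(True, True)]

# The observation is found to be unreliable, so it is removed and the
# intersection of the remaining knowledge sets is recomputed.
del sources["smoke observed"]
K0_recomputed = unified(sources)
print(sorted(K0_recomputed))    # [(False, False), (False, True), (True, True)]
```

If only the combined set had been stored, the larger set could not have been recovered after the retraction.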


3.5. Knowledge sets using truth domains 
3.5.1 REMARK: Replacing truth value maps with subsets, which are simpler and equi-informational. 


From Remark 3.2.6, it follows that the truth value maps in Definition 3.2.3 can be replaced with subsets 
of P, which are a much simpler concept. Two-valued maps and subsets are “equi-informational”. 


For any map τ : P → {F, T}, one may define the set S_τ consisting of all elements p of P which satisfy
τ(p) = T. For any subset S of P, one may define the map τ_S : P → {F, T} with τ_S(p) = T if and only if
p is an element of S. It is evident that τ_{S_τ} = τ and S_{τ_S} = S. Thus no information is lost (or gained) by
replacing truth value maps on P with the corresponding “truth subsets” of P.


Informally one may write S_τ = τ⁻¹({T}) and τ_S = (S × {T}) ∪ ((P \ S) × {F}) using set-theoretic notations
and concepts, but such formulas are too clumsy to be useful for presenting logic.
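
The two conversions and the round-trip identities may be checked in a few lines. In the following sketch, the function names `truth_subset` and `truth_map` are hypothetical, and a truth value map is assumed to be a dictionary with boolean values. Note that `truth_map` requires the domain P as an explicit argument, whereas `truth_subset` does not.

```python
def truth_subset(tau):
    """S_tau: the set of propositions p with tau(p) = T."""
    return frozenset(p for p, value in tau.items() if value)

def truth_map(P, S):
    """tau_S: the map on P with tau_S(p) = T if and only if p is in S."""
    return {p: (p in S) for p in P}

P = ["p1", "p2", "p3"]
tau = {"p1": False, "p2": True, "p3": True}
S = truth_subset(tau)

assert truth_map(P, S) == tau                 # tau_{S_tau} = tau
assert truth_subset(truth_map(P, S)) == S     # S_{tau_S} = S
print(sorted(S))                              # ['p2', 'p3']
```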


A difficulty arises with the “truth subset” representation of truth value maps when the domain P is variable 
or not stated. The domain can easily be recovered from a truth value map (because it is the domain of the 
map), but not from a set. Therefore to be unambiguous, one would have to define “truth subsets” as pairs
(P, S) for subsets S of P. Alternatively one could define “truth subsets” as pairs (S, S′), where S′ is the
complement of S in P. Then P could be recovered as the union of S and S′. Such an arrangement is quite
inconvenient. One would have to read pairs such as (P, S) or (S, S′) as “the subset S of P”.


Because of this “domain ambiguity” issue, the “truth value map” representation in Definition 3.2.3 has been
chosen for this book in preference to the “truth subset” representation. Truth subsets are introduced in
Definition 3.5.2 with the name “truth domains”, but they are not used outside Section 3.5.


3.5.2 DEFINITION [MM]: Truth domains.
The truth domain of a truth value map τ on a concrete proposition domain P is the subset of P consisting
of all propositions p in P which satisfy τ(p) = T.


A truth domain of a concrete proposition domain P is a subset of P. 
(In other words, it is a collection of propositions which are all in P.) 


A truth domain (pair) is a pair (P, S) such that P is a concrete proposition domain and S is a truth domain 
of P. (A truth domain pair (P, S) may be abbreviated to S when P is implicit in the context.) 


3.5.3 EXAMPLE: Simplification of knowledge sets by using truth domains. 
The truth value maps τ in Example 3.4.5 may be replaced with their corresponding truth domains as in
Definition 3.5.2. Then the knowledge set K may be replaced by the corresponding set


K̄ = {S_τ; τ ∈ K}
  = {S ∈ ℙ(P); p₁ is not in S or p₂ is in S}
  = {∅, {p₃}, {p₂}, {p₂, p₃}, {p₁, p₂}, {p₁, p₂, p₃}},


where ℙ(P) denotes the set of subsets of P. The elements of K̄ are the subsets S of P which are truth
domains of truth value maps in K.


3.5.4 EXAMPLE: Distinctions between propositions, truth domains, and knowledge sets. 
To clarify the distinctions between propositions, truth domains and knowledge sets, consider the concrete 
proposition domain P = {p₁, p₂} and list all of the associated truth domains and knowledge sets.
(1) The N = 2 concrete propositions are p₁ and p₂.
(2) The 2^N = 4 possible truth domains are: ∅, {p₁}, {p₂}, {p₁, p₂}.
(3) The 2^(2^N) = 16 possible knowledge sets are: ∅, {∅}, {{p₁}}, {{p₂}}, {{p₁, p₂}},
{∅, {p₁}}, {∅, {p₂}}, {∅, {p₁, p₂}}, {{p₁}, {p₂}}, {{p₁}, {p₁, p₂}}, {{p₂}, {p₁, p₂}},
{∅, {p₁}, {p₂}}, {∅, {p₁}, {p₁, p₂}}, {∅, {p₂}, {p₁, p₂}}, {{p₁}, {p₂}, {p₁, p₂}}, {∅, {p₁}, {p₂}, {p₁, p₂}}.


Truth domains are used in this example instead of truth value maps because truth domains are easier to 
write and easier to read. (This was in fact the motivation for introducing Definition 3.5.2.) 


An important point to note about this example is that the propositions p₁ and p₂ appear on their own in
all of the three levels, but they have a different meaning in each level.
(1) p₁ and p₂ are concrete propositions with no associated truth values.
(2) {p₁} and {p₂} are truth domains, where
{p₁} means that “p₁ is true and p₂ is false”, whereas
{p₂} means that “p₂ is true and p₁ is false”.
(3) {{p₁}} and {{p₂}} are knowledge sets, where
{{p₁}} means it is known that “p₁ is true and p₂ is false”, whereas
{{p₂}} means it is known that “p₂ is true and p₁ is false”.


The distinction between (2) and (3) may appear to be subtle. It is the distinction between merely mentioning 
a possible combination of truth values versus asserting that this is the only possible combination. 
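
The three levels in this example can be generated mechanically by applying a power-set construction twice. The `powerset` helper below is a hypothetical name. It is built from the standard `itertools` functions `chain` and `combinations`, and it uses frozen sets so that truth domains can themselves be elements of knowledge sets.

```python
from itertools import chain, combinations

def powerset(items):
    """Return the list of all subsets of items, each as a frozenset."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

P = ["p1", "p2"]                            # level (1): concrete propositions
truth_domains = powerset(P)                 # level (2): subsets of P
knowledge_sets = powerset(truth_domains)    # level (3): sets of truth domains

print(len(P), len(truth_domains), len(knowledge_sets))   # 2 4 16

# {p1} is a truth domain, whereas {{p1}} is a knowledge set.
assert frozenset({"p1"}) in truth_domains
assert frozenset({frozenset({"p1"})}) in knowledge_sets
```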


3.6. Basic logical expressions combining concrete propositions 

3.6.1 REMARK: Notations for basic logical operations 

Notation 3.6.2 gives informal natural-language meanings for some standard symbols for the most common
informal natural-language logical operations. (See Table 3.7.1 in Remark 3.7.14 for a literature survey of
alternative symbols for these logical operations.)


3.6.2 NOTATION [MM]: Basic unary and binary logical operator symbols.
(i) ¬ means “not” (logical negation).
(ii) ∧ means “and” (logical conjunction).
(iii) ∨ means “or” (logical disjunction).
(iv) ⇒ means “implies” (logical implication).
(v) ⇐ means “is implied by” (reverse logical implication).
(vi) ⇔ means “if and only if” (logical equivalence).


3.6.3 REMARK: Mnemonics and justifications for logical operator symbols. 

The ¬ (not) symbol is perhaps inspired by the − (minus) symbol for arithmétic negation. A popular
alternative for “not” is the ∼ symbol. However, this is easily confused with other uses of the same symbol.
So ¬ is preferable when logic is mixed with mathematics in a single text.


The ∧ and ∨ symbols have some easy mnemonics (in English). The ∧ symbol suggests the letter “A” in the
word “and”, but without the horizontal strut. The ∨ symbol is the opposite of this. The ∧ and ∨ symbols
for propositions match the corresponding ∩ and ∪ symbols for sets. The ∪ symbol resembles the letter “U”
for “union”. (Arguably the ∩ symbol should suggest the “A” in the word “all”.)

According to Quine [380], pages 12-13, Kleene [366], page 11, and Lemmon [367], page 19, the ∨ symbol
is actually a letter “v”, which is a mnemonic for the Latin word “vel”, which means the inclusive “or”, as
opposed to the Latin word “aut”, which means the exclusive “or”. (The mnemonic “vel” for “∨” is also
mentioned by Körner [461], page 39, and in a footnote by Eves [353], page 246. The Latin vel/aut distinction
is also described in Hilbert/Ackermann [358], page 4, where the letter “v” notation is used instead of “∨”.)
The symbol “∨” is attributed to Whitehead/Russell, Volume I [400] by Quine [381], page 15, and Quine [380],
page 14.

Although the first three symbols in Notation 3.6.2 are very common in logic, they are not so common in 
general mathematics. In differential geometry, the exterior algebra wedge product symbol clashes with the ∧
logic symbol. To avoid such clashes, one often uses natural-language words in practical mathematics instead 
of the more compact logical symbols. 


The arrow symbol “⇒” for implication is suggestive of an “arrow of time”. Thus the expression “A ⇒ B”
means that “if you have A now, then you will have B in the future”. The reverse arrow “⇐” may be thought
of in the same way. The double arrow “⇔” suggests that one may go in either direction. Thus “A ⇔ B”
means that “B follows from A, and A follows from B”. Single arrows are needed for other purposes in
mathematics. So double arrows are used in logic.


3.6.4 REMARK: Temporary knowledge-set notations for some basic logical expressions. 

The notations for some basic kinds of knowledge sets in Notation 3.6.5 are temporary. They are useful for 
presenting the semantics of logical expressions. The view taken here is that logical expressions represent either 
knowledge, conjectures or assertions of constraints on the truth values of propositions. These constraints 
may be expressed in terms of knowledge sets. Hence operations on knowledge sets are presented as a prelude 
to defining the meaning of logical expressions. 


3.6.5 NOTATION [MM]: Let P be a concrete proposition domain. Then for any propositions p, q ∈ P,
(1) K_p denotes the knowledge set {τ ∈ 2^P; τ(p) = T},
(2) K_¬p denotes the knowledge set {τ ∈ 2^P; τ(p) = F},
(3) K_{p∧q} denotes the knowledge set {τ ∈ 2^P; (τ(p) = T) ∧ (τ(q) = T)},
(4) K_{p∨q} denotes the knowledge set {τ ∈ 2^P; (τ(p) = T) ∨ (τ(q) = T)},
(5) K_{p⇒q} denotes the knowledge set {τ ∈ 2^P; (τ(p) = T) ⇒ (τ(q) = T)},
(6) K_{p⇐q} denotes the knowledge set {τ ∈ 2^P; (τ(p) = T) ⇐ (τ(q) = T)},
(7) K_{p⇔q} denotes the knowledge set {τ ∈ 2^P; (τ(p) = T) ⇔ (τ(q) = T)}.


3.6.6 REMARK: Construction of knowledge sets from basic logical operations on sets. 

The sets K_p and K_¬p in Notation 3.6.5 are unambiguous, but the sets in lines (3)-(7) require some explanation
because they are specified in terms of the symbols in Notation 3.6.2, which are abbreviations for
natural-language terms. These sets may be clarified as follows for any p, q ∈ P for a concrete proposition domain P.
(1) K_p = {τ ∈ 2^P; τ(p) = T}.
(2) K_¬p = 2^P \ K_p.
(3) K_{p∧q} = K_p ∩ K_q.
(4) K_{p∨q} = K_p ∪ K_q.
(5) K_{p⇒q} = K_¬p ∪ K_q.
(6) K_{p⇐q} = K_p ∪ K_¬q.
(7) K_{p⇔q} = K_{p⇒q} ∩ K_{p⇐q}.


In line (5), K_{p⇒q} denotes the set {τ ∈ 2^P; (τ(p) = T) ⇒ (τ(q) = T)}, which means the knowledge that
“if τ(p) = T, then τ(q) = T”. In other words, if τ(p) = T, then the possibility τ(q) = F can be excluded.
In other words, one excludes the joint occurrence of τ(p) = T and τ(q) = F. Therefore the knowledge set
K_{p⇒q} equals 2^P \ (K_p ∩ K_¬q). Hence K_{p⇒q} = K_¬p ∪ K_q. Lines (6) and (7) follow similarly.

Using basic set operations to define basic logic concepts and then “prove” relations between them is clearly 
cheating because basic set operations are first introduced systematically in Chapter 8. This circularity is 
not as serious as it may at first seem. The purpose of introducing the “knowledge set” concept is only to 
give meaning to logical expressions. It is possible to present formal logic without giving any meaning to 
logical expressions at all. Some people think that meaningless logic (i.e. logic without semantics) is best. 
Some people think it is inevitable. The idea that logical languages have no meaning, which has been broadly 
accepted for many decades, is expressed for example by Roitman [385], page 27. 


First-order languages have no meaning in themselves. There is syntax, but no semantics. At this 
level, sentences don’t mean anything. 


However, even the most semantics-free presentations of logic are inspired directly by the underlying meaning 
of the formalism. Making the intended meaning explicit helps to avoid the development of useless theories 
which are exercises in pure formalism. 

The sets in Notation 3.6.5 may be written out explicitly if P has only two elements. Let P = {p,q}. Denote 
each truth value map τ : P → {F, T} by the corresponding pair (τ(p), τ(q)). Then the sets (1)-(7) may be
written explicitly as follows.


(1) K_p = {(T, F), (T, T)}.
(2) K_¬p = {(F, F), (F, T)}.
(3) K_{p∧q} = {(T, T)}.
(4) K_{p∨q} = {(F, T), (T, F), (T, T)}.
(5) K_{p⇒q} = {(F, F), (F, T), (T, T)}.
(6) K_{p⇐q} = {(F, F), (T, F), (T, T)}.
(7) K_{p⇔q} = {(F, F), (T, T)}.
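
The seven explicit lists for P = {p, q} can be recomputed from the set-operation formulas for the knowledge sets. The following sketch assumes that each truth value map is represented as a pair of the characters "F" and "T". The variable names are hypothetical.

```python
from itertools import product

# 2^P for P = {p, q}, as pairs (tau(p), tau(q)).
U = set(product("FT", repeat=2))

K_p = {t for t in U if t[0] == "T"}
K_q = {t for t in U if t[1] == "T"}
K_not_p = U - K_p
K_not_q = U - K_q

K_and = K_p & K_q                      # p and q
K_or = K_p | K_q                       # p or q
K_implies = K_not_p | K_q              # p implies q
K_implied_by = K_p | K_not_q           # p is implied by q
K_iff = K_implies & K_implied_by       # p if and only if q

# The implication set excludes only the joint occurrence of p true and q false.
assert K_implies == U - (K_p & K_not_q)

print(sorted(K_implies))   # [('F', 'F'), ('F', 'T'), ('T', 'T')]
print(sorted(K_iff))       # [('F', 'F'), ('T', 'T')]
```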


The fact that the knowledge sets may be written out as explicit lists implies that there are no real difficulties 
with circularity of definitions here. The concept of a list may be considered to be assumed naive knowledge. 
The use of the set brackets “{” and “}” is actually superfluous. Thus any unary or binary logical expression 
may be interpreted as a specific constraint on the possible combinations of truth values of one or two specified 
propositions. Such a constraint may be specified as a list of possible truth-value combinations, while the truth 
values of all other propositions in the concrete proposition domain may vary arbitrarily. The truth-value 
combination list is finite, no matter how infinite the proposition domain is. 


3.6.7 REMARK:  Escaping the circularity of interpretation between logical and set operations. 

The circularity of interpretation between logical operations and set operations may seem inescapable within 
a framework of symbols printed on paper. The circularity can be escaped by going outside the written page, 
as one does in the case of defining a dog or any other object which is referred to in text. According to 
Lakoff/Núñez [449], pages 39-45, 121-152, the objects and operations for both logic and sets come from
real-world experience of containers (for sets) and locations (for states or predicates of objects), as in Venn
diagrams. There may be some truth in this, but for most people, logical operations are learned as operational
procedures. For example, the meaning of the intersection A ∩ B of two sets A and B (for some concrete
examples of A and B, such as “the spider has a black body and a red stripe on its back”), may be learned
somewhat as in the following kind of operational definition scheme. (This example may be thought of as
defining either the set expression A ∩ B or the logical expression (x ∈ A) ∧ (x ∈ B).)


(1) Consider an object x. (Either think about it, write it down, hold it in your hand, look at it from a
distance, or talk about it.)


(2) If you have previously determined whether x is in A ∩ B, and x is in the list of things which are in A ∩ B,
or in the list of things which are not in A ∩ B, that’s the answer. End of procedure.


(3) Consider whether x is in A. If x is in A, go to step (4). If x is not in A, add x to the list of things
which are not in A ∩ B. That’s the answer. End of procedure.


(4) Consider whether x is in B. If x is in B, add x to the list of things which are in A ∩ B. That’s the
answer. End of procedure. Otherwise, if x is not in B, add x to the list of things which are not in A ∩ B.
That’s the answer. End of procedure.
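
The four steps above can be sketched as a membership test which remembers earlier answers. This is only one possible reading of the procedure. The function names, the spider predicates and the representation of a spider as a pair of strings are all hypothetical.

```python
def make_intersection_test(in_A, in_B):
    """Return a membership test for the intersection of A and B, following steps (1)-(4)."""
    known = {}                      # memory of previously determined answers

    def in_A_and_B(x):
        if x in known:              # step (2): the answer is already recorded
            return known[x]
        if not in_A(x):             # step (3): first test fails, so stop here
            known[x] = False
            return False
        known[x] = in_B(x)          # step (4): the second test decides
        return known[x]

    return in_A_and_B

tests_run = []                      # records which tests were actually performed

def has_black_body(spider):
    tests_run.append("A")
    return spider[0] == "black"

def has_red_stripe(spider):
    tests_run.append("B")
    return spider[1] == "red stripe"

is_match = make_intersection_test(has_black_body, has_red_stripe)
print(is_match(("brown", "red stripe")))   # False (only test A is run)
print(is_match(("black", "red stripe")))   # True (tests A and B are run)
print(is_match(("black", "red stripe")))   # True (answer recalled, no tests run)
print(tests_run)                           # ['A', 'A', 'B']
```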


This kind of procedure is somewhat conjectural and introspective, but the important point is that in
operational terms, two tests must be applied to determine whether x is in A ∩ B. In principle, one may be able to
perform two tests concurrently, but in practice, generally one test will be more difficult than the other, and a 
person will know the answer to one of the tests before knowing the answer to the other test. The conclusion 
about the logical conjunction of the two propositions may be drawn in step (3) or in step (4), depending on 
the outcome of the first test. Typically the easier test will be performed first. One presumably retains some 
kind of memory of the outcomes of previous tests. So memory must play a role. If the answer to one test is 
known already, only one test remains to be performed, or it may be unnecessary to perform any test at all 
(if the first test failed). 


In early life, the meaning of the word “and” is generally learned as such a sequential procedure. Often a child 
will think that the criteria for some outcome have been met, but will be told that not only the criterion which 
has been clearly met, but also one more criterion, must be met before the outcome is achieved. E.g. “I’ve 
finished eating the potatoes. Can I have dessert now?” Response: “No, you must eat all of the potatoes and
all of the carrots.” Result: Child learns that two criteria must be met for the word “and”. In practice, the
criteria are generally met in a temporal sequence. The operational procedure for the word “or” is similar. 
(The English word “or” comes from the same source as the word “other”, meaning that if the first assertion 
is false, there is another assertion which will be true in that case, which suggests once again a temporal 
sequence of condition testing.) 


Basic logic operations such as “and” and “or” are learned at a very early age as concrete operational 
procedures to be performed to determine whether some combination of logical criteria has been met. Basic 
set operations such as union and intersection likewise correspond to concrete operational procedures. The 
difference between these set operations and the corresponding logical operations is more linguistic than 
substantial. Whether one reports that x is a rabbit or x is in the set of rabbits could be regarded as a 
question of grammar or linguistic style. 


It is perhaps of some value to mention that in digital computers, basic boolean operations on boolean variables 
may be carried out in a more-or-less concurrent manner by the settling of certain kinds of electronic circuits 
which are designed to combine the states of the bits in pairs of registers. However, a large amount of the logic 
in computer programs is of the sequential kind, especially if the information for individual tests becomes 
available at different times. Sometimes, the amount of work for two boolean tests is very different. So it is 
convenient to first perform the easy test, and only if the outcome is still unknown, the second (more “costly”)
test is performed. Such sequential testing corresponds more closely to real-life human logic. 
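
Sequential testing of this kind is built into many programming languages as “short-circuit” evaluation. In Python, for example, the right operand of `and` is evaluated only if the left operand is true. The two test functions in the following sketch are hypothetical. The list `costly_calls` merely records when the second test is run.

```python
costly_calls = []

def cheap_test(n):
    """An easy test: is n even?"""
    return n % 2 == 0

def costly_test(n):
    """A stand-in for a more costly test: is the digit sum of n divisible by 3?"""
    costly_calls.append(n)          # record that the costly test was run
    return sum(int(d) for d in str(n)) % 3 == 0

# The costly test is run only when the cheap test leaves the outcome unknown.
results = [cheap_test(n) and costly_test(n) for n in (7, 12, 15, 16)]
print(results)        # [False, True, False, False]
print(costly_calls)   # [12, 16]
```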


The static kinds of logical operations which are defined in mathematical logic have their ontological source
in human sequential logical operations, not static combinations of truth values. However, this difference can 
be removed by considering that mathematical logic applies to the outcomes of logical operations, not to the 
procedures which are performed to yield such outcomes. 


3.6.8 REMARK: Interpretations of basic logical expressions. 

Knowledge sets give meanings to logical expressions. When we say that p ∈ P is true, we mean that τ(p) = T,
which means that a knowledge set K for P satisfies K ⊆ K_p, for K_p as in Remark 3.6.6. The set K_p is an
interpretation for the knowledge (or assertion or conjecture) that p is true.


If we know that both p and q are true, then K ⊆ K_p ∩ K_q. The set K_p ∩ K_q gives a concrete interpretation
for the assertion “p and q”. Similarly, the assertion “p or q” may be interpreted as the set K_p ∪ K_q, and
“not p” may be interpreted as the set K_¬p = 2^P \ K_p.


Definition 3.6.9 formalises interpretations in terms of knowledge sets for logical expressions which use the 
unary and binary logical operators listed in Notation 3.6.2. Each symbolic expression in Definition 3.6.9 is 
interpreted as a set of possible truth value maps τ on a concrete proposition domain P.


3.6.9 DEFINITION [MM]: Let P be a concrete proposition domain. Then for any propositions p, q ∈ P, the
following are interpretations of logical expressions in terms of concrete propositions.


(1) “p” signifies the knowledge set K_p = {τ ∈ 2^P; τ(p) = T}.
(2) “¬p” signifies the knowledge set K_¬p = 2^P \ K_p = {τ ∈ 2^P; τ(p) = F}.
(3) “p ∧ q” signifies the knowledge set K_{p∧q} = K_p ∩ K_q = {τ ∈ 2^P; (τ(p) = T) ∧ (τ(q) = T)}.
(4) “p ∨ q” signifies the knowledge set K_{p∨q} = K_p ∪ K_q = {τ ∈ 2^P; (τ(p) = T) ∨ (τ(q) = T)}.
(5) “p ⇒ q” signifies the knowledge set K_{p⇒q} = K_¬p ∪ K_q = {τ ∈ 2^P; (τ(p) = T) ⇒ (τ(q) = T)}.
(6) “p ⇐ q” signifies the knowledge set K_{p⇐q} = K_p ∪ K_¬q = {τ ∈ 2^P; (τ(p) = T) ⇐ (τ(q) = T)}.
(7) “p ⇔ q” signifies the knowledge set K_{p⇔q} = K_{p⇒q} ∩ K_{p⇐q} = {τ ∈ 2^P; (τ(p) = T) ⇔ (τ(q) = T)}.
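
The interpretations in Definition 3.6.9 can be tried out on a small finite example. The following sketch assumes a hypothetical two-element proposition domain and represents each truth value map as a frozenset of (proposition, value) pairs. The names P, SPACE and K are illustrative only and are not notation from this book.

```python
from itertools import product

P = ("p", "q")  # hypothetical concrete proposition domain

def all_maps(domain):
    # Every truth value map tau : domain -> {False, True}, stored hashably.
    return {frozenset(zip(domain, values))
            for values in product((False, True), repeat=len(domain))}

SPACE = all_maps(P)  # plays the role of 2^P

def K(p):
    # Atomic knowledge set: all maps tau with tau(p) = T.
    return {tau for tau in SPACE if dict(tau)[p]}

K_not_p = SPACE - K("p")                 # "not p"
K_and = K("p") & K("q")                  # "p and q"
K_or = K("p") | K("q")                   # "p or q"
K_implies = (SPACE - K("p")) | K("q")    # "p implies q"
K_implied_by = K("p") | (SPACE - K("q")) # "p is implied by q"
K_iff = K_implies & K_implied_by         # "p if and only if q"
```

In this finite setting, each logical operator becomes an ordinary set operation on subsets of SPACE.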


3.6.10 REMARK: Logical expressions versus the meanings of logical expressions. 

It is important to distinguish between logical expressions and their meanings. A logical expression is a string 
of symbols which is written within a specified context. A meaning is associated with the logical expression by 
the context. Sometimes logical expressions are presented as merely strings of symbols which obey specified 
syntax rules and are manipulated according to specified deduction rules. However, logical expressions often 
also have concrete meanings with which the syntax and deduction rules must be consistent. It is possible in 
pure logic to define logical expressions in terms of linguistic rules alone, in the absence of any meaning, but 
such purely linguistic, semantics-free logic is of limited direct practical applicability in mathematics. 


The use of quotation marks in Definition 3.6.9 emphasises the fact that logical expressions are symbol-strings,
whereas their interpretations are sets. 


3.6.11 REMARK: Proposition name maps and the interpretation of logical expressions. 

The alert reader will recall the discussion in Section 3.3, where a sharp distinction was drawn between
proposition names in a set N and concrete propositions in a domain P. In Sections 3.4 and 3.6 up to
this point, it has been tacitly assumed that N = P, and that the proposition name map μ : N → P is
the identity map. The distinction starts to be important in Definition 3.6.9. If the assumption N = P is
not made, then Definition 3.6.9 would state that for any proposition names p, q ∈ N, “p” signifies the set
K_μ(p) = {τ ∈ 2^P; τ(μ(p)) = T}, and so forth. This would be too tediously pedantic. So it is assumed
that the reader is aware that proposition symbols such as p and q are names in a name space N which are
mapped to concrete propositions in a domain P via a variable map μ : N → P. The relations between
proposition name spaces, concrete propositions, logical expressions and knowledge sets (in the case N ≠ P)
are illustrated in Figure 3.6.1.


Figure 3.6.1 Proposition names, concrete propositions, logical expressions and knowledge sets


The proposition name map μ in Figure 3.6.1 is variable. Therefore the interpretation map is also variable.
To interpret a logical expression, one carries out the following steps. 


(1) Parse the logical expression (e.g. “p ∨ q”) to extract the proposition names in N (e.g. p and q).

(2) Follow the map μ : N → P to determine the named concrete propositions in P (e.g. μ(p) and μ(q)).

(3) Follow the embedding from P to ℙ(2^P) to determine the atomic knowledge sets (e.g. K_μ(p) and K_μ(q)).

(4) Apply the set-constructions to the atomic knowledge sets in accordance with the structure of the logical
expression to obtain the corresponding compound knowledge set (e.g. K_μ(p) ∪ K_μ(q)).
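
The four interpretation steps above can be sketched in a few lines of code. This is only an illustration under strong assumptions: expressions are restricted to the form "name operator name", the operators are spelled "and" and "or", and the concrete propositions and the name map mu are hypothetical.

```python
from itertools import product

P = ("smoke", "fire")              # hypothetical concrete propositions
mu = {"p": "smoke", "q": "fire"}   # hypothetical name map mu : N -> P
# Truth value maps in the order FF, FT, TF, TT, referred to by index.
SPACE = [dict(zip(P, v)) for v in product((False, True), repeat=len(P))]

def atomic(prop):
    # Indices into SPACE of the maps tau with tau(prop) = T.
    return {i for i, tau in enumerate(SPACE) if tau[prop]}

OPS = {"and": set.intersection, "or": set.union}

def interpret(expression):
    left, op, right = expression.split()      # step 1: parse the names
    props = (mu[left], mu[right])             # step 2: follow the name map
    k_left, k_right = map(atomic, props)      # step 3: atomic knowledge sets
    return OPS[op](k_left, k_right)           # step 4: set construction

K = interpret("p or q")
```

Changing the dictionary mu changes the knowledge set obtained from the same symbol-string, which reflects the fact that the interpretation map is variable.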


3.6.12 REMARK: Logical operators acting on propositions versus acting on logical expressions.

The logical operations which act on proposition names to form logical expressions in Definition 3.6.9 must
be distinguished from logical operations which act on general logical expressions in Section 3.9.


In Definition 3.6.9, a logical operator such as implication (“⇒”) is combined with proposition names p, q ∈ N
to form logical expressions. Let W denote the space of logical expressions. Then the implication operator
is effectively a map from N × N to W. By contrast, the logical operator “⇒” in Section 3.9 is effectively a
map from W × W to W, mapping each pair (φ_1, φ_2) ∈ W × W to a logical expression of the form “φ_1 ⇒ φ_2”.


3.6.13 REMARK: Logical expressions are analogous to navigation instructions. 

There are equivalences amongst even the rather basic kinds of logical expressions in Definition 3.6.9. For 
example, “p ⇒ q” always has the same meaning as “q ⇐ p”. Logical expressions may be thought of as
“navigation instructions” for how to arrive at a particular meaning. Many different paths may terminate
at the same end-point. Therefore the particular form of a logical expression is of less importance than the 
meaning which is arrived at by following the “instructions”. (Compare, for example, the fact that 2 + 2 and 
3 + 1 arrive at the same number 4. Many paths lead to the same number.) 


3.6.14 REMARK: Always-false and always-true nullary operators.

It is convenient to introduce here the always-false and always-true nullary operators. Notation 3.6.15 is
effectively a continuation of Notation 3.6.2. Definition 3.6.16 is effectively a continuation of Definition 3.6.9.
(The terms “falsum” and “verum” were used in an 1889 Latin-language work by Peano [375], page VIII,
although he used a capital-V notation for “verum” and a rotated capital-V for “falsum”.)


3.6.15 NOTATION [MM]: Nullary operator symbols.


(i) ⊥ means “always false” (the “falsum” symbol).


(ii) ⊤ means “always true” (the “verum” symbol).


3.6.16 DEFINITION [MM]: Let P be a concrete proposition domain. Then the following are interpretations
of logical expressions in terms of concrete propositions.


(1) “⊥” signifies the knowledge set ∅.
(2) “⊤” signifies the knowledge set 2^P.


3.6.17 REMARK: Nullary logical expressions.

The always-false and always-true logical expressions “⊥” and “⊤” are symbol-strings which contain only
one symbol each, and that symbol is a nullary operator symbol.


Although logical operator symbols ⊥ and ⊤ may appear in logical expressions on their own, they may also
be used in combination with other logic symbols, as mentioned in Remark 3.13.6.


3.6.18 REMARK: The difference between true and always-true. 

The always-false and always-true nullary operator symbols ⊥ and ⊤ are not the same as the respective false
and true truth-values F and T. Logical operator symbols are symbols which may appear in logical expressions
which signify knowledge sets. Truth values are elements of the range of truth value maps τ : P → {F, T}.


3.7. Truth functions for binary logical operators


3.7.1 REMARK: Some basic kinds of knowledge sets can be expressed in terms of “truth functions”. 

One of the strongest instincts of mathematicians is to generalise any nugget of truth in the hope of finding 
more nuggets. Since the logical operators in Definition 3.6.9 are useful, one naturally wants to know the set 
of all possible logical operators to determine whether there are additional useful ones. 


The basic kinds of knowledge sets in Definition 3.6.9 lines (3)-(7) are specified in terms of the truth values
of two concrete propositions p, q ∈ P. Such “binary knowledge sets” may be generalised to sets of the form
K^θ_{p,q} = {τ ∈ 2^P; θ(τ(p), τ(q)) = T} for arbitrary “truth functions” θ : {F, T}^2 → {F, T}. Definition 3.7.2
further generalises such “binary truth functions” to arbitrary numbers of parameters.


3.7.2 DEFINITION [MM]: A truth function with n parameters or n-ary truth function, for a non-negative
integer n, is a function θ : {F, T}^n → {F, T}.


3.7.3 REMARK: Justification for the range-set {F,T} for truth functions. 

It is not entirely obvious that the set {F, T} is the best range-set for truth functions in Definition 3.7.2.
One advantage of this range-set is that pairs of propositions are effectively combined to produce “compound
propositions”. To see this, consider the following equalities.


K_p = {τ ∈ 2^P; τ(p) = T} = {τ ∈ 2^P; f_p(τ) = T}
K^θ_{p,q} = {τ ∈ 2^P; θ(τ(p), τ(q)) = T} = {τ ∈ 2^P; f^θ_{p,q}(τ) = T},


where f_p : 2^P → {F, T} and f^θ_{p,q} : 2^P → {F, T} are defined for p, q ∈ P by f_p(τ) = τ(p) and f^θ_{p,q}(τ) =
θ(τ(p), τ(q)) respectively, for all τ ∈ 2^P. Since f_p effectively represents the concrete proposition p, the
function f^θ_{p,q} effectively represents the “compound proposition” formula p θ q. This makes such formulas
seem like a natural generalisation of “atomic propositions” p ∈ P. The convenience of this arrangement is
due to the fact that the range-set of θ is {F, T}, which is the same as the range-set of τ ∈ 2^P. One may
write K_p = f_p^{-1}({T}) and K^θ_{p,q} = (f^θ_{p,q})^{-1}({T}), using the inverse function notation in Definition 10.5.10.


3.7.4 REMARK: Disadvantages of the range-set {F, T} for truth functions. 

The range-set {F, T} for truth functions is not strictly correct. The range-value F for a truth function 
actually signifies “excluded combination”, and the range-value T signifies “not-excluded combination”. One 
cannot say that the expression “θ(τ(p), τ(q))” is “true” in any sense. This expression is only a formula for
a constraint on truth value maps. 


Strictly speaking, different symbols should be used, such as X for “excluded” and O for “not excluded” 
or “okay” or “open”. The use of the false/true values F and T leads to confusion regarding the meaning 
of compound propositions. A small economy in typography has significant negative effects on the teaching 
and understanding of logic. For example, most people seem to have difficulties, at least initially, with the 
idea that an expression “p ⇒ q” is true whenever p is false. Many textbooks try, often without success, to
convince readers that “p ⇒ q” can be true when p is false.


The teaching difficulty arises from the lack of distinction between concrete propositions (which are like
“atoms”) and logical expressions (which are like “molecules”). The concrete propositions are in fact either
true or false, but the logical expression molecules are either consistent with the truth values of the atoms or 
not consistent. The “output” from a logical expression should be marked as “consistent” or “not consistent” 
to indicate that the logical expression accepts or rejects the tuple of atoms. Conversely, the atom-tuples 
should be marked as “not excluded” or “excluded”. The logical expression is a kind of “claim” or “theory” 
or “hypothesis”. The atom tuples either are consistent with the theory, or else invalidate the theory. Thus a 
logical expression is never, strictly speaking, true or false. A logical expression is a kind of pattern-matching 
machine, like in a neural network, and the output signifies whether the pattern matches or does not. 


When one says that “if there is smoke, then there is fire”, one is expressing partial knowledge about the 
concrete propositions p = “there is smoke” and q = “there is fire”. If it is directly observed that there is 
no smoke, this does not imply that the compound proposition p ⇒ q is true. What it does mean is that
the partial knowledge p ⇒ q is not invalidated if p is false. On the other hand, the expression “p ⇒ q” is
invalidated if it is observed that there is smoke without fire.


3.7.5 REMARK: An ad-hoc notation for binary truth functions.

For any binary truth function θ : {F, T}^2 → {F, T}, let θ be denoted temporarily by the sequence of four
values θ(F, F) θ(F, T) θ(T, F) θ(T, T) of θ. Thus for example, FFTF denotes the binary truth function θ
which satisfies θ(A, B) = T if and only if (A, B) = (T, F).


The five kinds of binary logical expressions p θ q in Definition 3.6.9 have the following truth functions.


truth        logical
function     expression

FFFT         p ∧ q
FTTT         p ∨ q
TFFT         p ⇔ q
TFTT         p ⇐ q
TTFT         p ⇒ q


3.7.6 REMARK: Truth functions written in terms of exclusions and non-exclusions. 
The table of truth functions in Remark 3.7.5 may be rewritten in light of Remark 3.7.4 as follows.


truth        logical        output signal for input pattern
function     expression     FF    FT    TF    TT

XXXO         p ∧ q          X     X     X     O
XOOO         p ∨ q          X     O     O     O
OXXO         p ⇔ q          O     X     X     O
OXOO         p ⇐ q          O     X     O     O
OOXO         p ⇒ q          O     O     X     O


This shows the meaning of truth functions more clearly. Compound logical expressions signify exclusions 
of some combinations of truth values. As long as the excluded truth-value combinations do not occur, the 
logical expressions are not invalidated. 
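
The four-letter codes of Remark 3.7.5 and their exclusion forms in Remark 3.7.6 can be generated mechanically. The following sketch assumes that F and T are modelled by Python's False and True. The helper names code and exclusion_form and the operator labels are hypothetical.

```python
from itertools import product

# Input patterns in the order FF, FT, TF, TT used by the ad-hoc notation.
INPUTS = list(product((False, True), repeat=2))

def code(theta):
    # Four-letter code theta(F,F) theta(F,T) theta(T,F) theta(T,T).
    return "".join("T" if theta(a, b) else "F" for a, b in INPUTS)

def exclusion_form(theta):
    # X marks an excluded combination, O a not-excluded combination.
    return code(theta).replace("F", "X").replace("T", "O")

OPERATORS = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "iff": lambda a, b: a == b,
    "implied_by": lambda a, b: a or not b,
    "implies": lambda a, b: (not a) or b,
}
codes = {name: code(theta) for name, theta in OPERATORS.items()}
```

For the implication operator, the exclusion form shows a single X in the TF position, which is the smoke-without-fire combination.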


In terms of the smoke-and-fire example, the proposition “if there is smoke, then there is fire” cannot be
invalidated if there is no smoke, whether there is fire or not. Likewise, the proposition cannot be invalidated
if there is fire, whether there is smoke or not. The only combination which invalidates the proposition
is a situation where there is smoke without fire. If this situation occurs, the proposition is invalidated.
Conversely, if the proposition is valid, then a smoke-without-fire situation is excluded. Thus it is not correct
to say that a compound proposition such as “p ⇒ q” is true or false. It is correct to say that such a
proposition is “excluded” or “not excluded”. In other words, it is “invalidated” or “not invalidated”.


3.7.7 REMARK: Additional binary logical operations. 

Since there are 2^(2^2) = 16 theoretically possible binary truth functions θ in Remark 3.7.5, one naturally asks
whether some of the remaining 11 binary truth functions might be useful. Three additional useful binary 
logical operations, all of which are symmetric, are introduced in Definition 3.7.10. 


The logical operator symbols introduced in Notation 3.7.8 have multiple alternative names, as indicated. 
In electronics engineering, “nand”, “nor” and “xor” are generally capitalised as NAND, NOR and XOR 
respectively. In this book, both upper and lower case versions are used. 


3.7.8 NOTATION [MM]: Additional binary logical operator symbols.


(i) ↑ means “alternative denial” (NAND, Sheffer stroke).
(ii) ↓ means “joint denial” (NOR, Peirce arrow, Quine dagger).
(iii) ▵ means “exclusive or” (XOR, exclusive disjunction).


3.7.9 REMARK: History of the Sheffer stroke operator. 

The discovery of the properties of the Sheffer stroke operator is attributed by Hilbert/Bernays [359], page 48
footnote, to an 1880 manuscript by Charles Sanders Peirce, although the discovery only became generally
known much later through a 1913 paper by Henry Maurice Sheffer.


3.7.10 DEFINITION [MM]: Let P be a concrete proposition domain. Then for any propositions p, q ∈ P, the
following are interpretations of logical expressions in terms of concrete propositions.

(1) “p ↑ q” signifies the knowledge set 2^P \ (K_p ∩ K_q).

(2) “p ↓ q” signifies the knowledge set 2^P \ (K_p ∪ K_q).

(3) “p ▵ q” signifies the knowledge set (K_p \ K_q) ∪ (K_q \ K_p).


3.7.11 REMARK: Truth functions for the additional binary logical operators. 
The alternative denial operator has the truth function θ : {F, T}^2 → {F, T} satisfying θ(p, q) = F for
(p, q) = (T, T), otherwise θ(p, q) = T.


The joint denial operator has the truth function θ : {F, T}^2 → {F, T} satisfying θ(p, q) = T for (p, q) =
(F, F), otherwise θ(p, q) = F.

The exclusive-or operator has the truth function θ : {F, T}^2 → {F, T} satisfying θ(p, q) = T for (p, q) =
(T, F) or (p, q) = (F, T), otherwise θ(p, q) = F.


3.7.12 REMARK: Truth functions for eight binary logical operators. 
The following table combines the binary logical operators in Definition 3.7.10 with the table in Remark 3.7.5.


truth        logical
function     expression

FFFT         p ∧ q
FTTF         p ▵ q
FTTT         p ∨ q
TFFF         p ↓ q
TFFT         p ⇔ q
TFTT         p ⇐ q
TTFT         p ⇒ q
TTTF         p ↑ q
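
The three additional rows of this table can be checked with a short computation. The sketch below models F and T by False and True. The names nand, nor and xor are the usual engineering names for the operators of Definition 3.7.10, and the helper code is hypothetical.

```python
from itertools import product

INPUTS = list(product((False, True), repeat=2))  # FF, FT, TF, TT

def code(theta):
    # Four-letter code in the ad-hoc notation of Remark 3.7.5.
    return "".join("T" if theta(a, b) else "F" for a, b in INPUTS)

nand = lambda a, b: not (a and b)   # alternative denial
nor = lambda a, b: not (a or b)     # joint denial
xor = lambda a, b: a != b           # exclusive or

new_codes = {"nand": code(nand), "nor": code(nor), "xor": code(xor)}

# All 2^(2^2) = 16 binary truth functions, each as a four-letter code.
all_codes = ["".join(letters) for letters in product("FT", repeat=4)]
```

The list all_codes enumerates the sixteen binary truth functions, of which eight appear in the table above.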


3.7.13 REMARK: The “but-not” and “not-but” binary operators. 

The remaining non-degenerate binary operators which are not shown in the table in Remark 3.7.12 are the
“but-not” and “not-but” operators. Church [348], page 37, defines the “but-not” binary operator symbol “⊅”
to have the truth function FFTF, so that p ⊅ q means that p is true and q is false, and defines the “not-but”
binary operator symbol “⊄” to have the truth function FTFF, so that p ⊄ q means that p is false and q is
true. Thus p ⊅ q means that “p is true, but not q”, and p ⊄ q means that “p is not true, but q is”. In other
words, as the symbols suggest, p ⊅ q means that p ⇒ q is false, and p ⊄ q means that p ⇐ q is false. These
operators seem to be rarely used by authors other than Church. They are therefore not used in this book.
(More modern notations would be ⇏ for FFTF and ⇍ for FTFF, as in Table 3.8.1 in Remark 3.8.1.)


3.7.14 REMARK: A survey of symbols in the literature for some basic logical operations. 
The logical operation symbols in Notations 3.6.2 and 3.7.8 are by no means universal. Table 3.7.1 compares 
some logical operation symbols which are found in the literature. 


The NOT-symbols in Table 3.7.1 are prefixes except for the superscript dash (Graves [85]; Eves [353]), and
the over-line notation (Weyl [156]; Hilbert/Bernays [359]; Hilbert/Ackermann [358]; Quine [381]; EDM [112];
EDM2 [113]). In these cases, the empty-square symbol indicates where the logical expression would be. The
reverse implication operator “⇐” is omitted from this table because it is rarely used. When it is used, it is
usually denoted as the mirror image of the forward implication symbol.


The popular choice of the Sheffer stroke symbol “|” for the logical NAND operator is unfortunate. It
clashes with usage in number theory, probability theory, and the vertical bars |...| of the modulus, norm
and absolute value functions. The less popular vertical arrow notation “↑” has many advantages. Luckily
this operator is rarely needed.


3.7.15 REMARK: Peano's notation for logical operators.

In 1889, Peano [375], pages VII-VIII, used “∩” and “∪” for the conjunction and disjunction respectively, and
“−” for logical negation. He used a capital-C to mean “is a consequence”. Thus a C b would mean that a is a
consequence of b. Thus “C” was his abbreviation for “consequence”. Then he used a capital-C rotated 180°
to represent implication, the inverse operation. These two symbols resemble the more modern “⊂” and “⊃”
respectively. (Peano’s notation for the implication operator is indicated as “⊃” in Table 3.7.1 because of the
difficulty of finding a rotated-C character in standard TeX fonts.) More precisely, Peano [375], page VIII,
wrote the following.


[Signum C significat est consequentia; ita b C a legitur b est consequentia propositionis a. Sed hoc
signo nunquam utimur].
Signum ⊃ significat deducitur; ita a ⊃ b idem significat quod b C a.


This may be translated as follows.


[The symbol C means is a consequence; thus b C a is read as: b is a consequence of proposition a.
But this symbol is never used].
The symbol ⊃ means is deduced; thus a ⊃ b means the same as b C a.


According to Quine [381], page 26, Peano’s notation “⊃” for logical implication was “revived by Peano” from
an earlier usage of the same symbol in 1816 by Gergonne [418]. The same origin is identified by Cajori [242],
volume II, page 288, who wrote that Gergonne's “C” stood for “contains” and his 180°-rotated “C” stood
for “is contained in”. Peano [375], page XI, also uses these symbols for class inclusion relations as follows.


Signum ⊃ significat continetur. Ita a ⊃ b significat classis a continetur in classi b. [...]
[Formula b C a significare potest classis b continet classem a; at signo C non utimur].


This may be translated as follows.


The symbol ⊃ means is contained. Thus a ⊃ b means class a is contained in class b. [...]
[Formula b C a could mean class b contains class a; but the symbol C is not used].


Unfortunately this is the reverse of the modern convention for the meaning of “⊃”. But it does help to
explain why the logical operator “⊃” which is often seen in the logic literature is apparently around the
wrong way. In fact, the omitted text in the above quotation is as follows in modern notation.


∀a, b ∈ K, (a ⊃ b) ⇔ ∀x, (x ∈ a ⇒ x ∈ b),


where K denotes the class of all classes. This makes the logic and set theory symbols match, but it contradicts
the modern convention where the smaller side of the symbol “⊂” corresponds to the smaller set.


3.7.16 REMARK: Notation for the exclusive-OR operator. 

There are only three notations for the exclusive OR operator in Table 3.7.1, possibly because there is little 
need for it in logic and mathematics. The exclusive OR of A and B is (A ∨ B) ∧ ¬(A ∧ B), which is equivalent
to A ⇔ ¬B. This has the useful property that it is true if and only if the arithmetic sum of the truth
values τ(A) + τ(B), counting F as 0 and T as 1, is an odd integer. So a notation resembling the addition
symbol “+” could be suitable, such as “⊕”. This is in fact used in some contexts. (For example, see
Lin/Costello [496], pages 16-17; Ben-Ari [340], page 8.) But this symbol is also frequently used in algebra
with other meanings. The exclusive-OR symbol “≢” is sometimes used. (For example, Church [348], page 37;
CRC [63], pages 16-21.) But this also clashes with standard mathematical usage.
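
The three characterisations of the exclusive OR in this remark can be compared on all four input combinations. The following sketch assumes that F and T are modelled by Python's False and True, which convert to the integers 0 and 1. The variable names are hypothetical.

```python
from itertools import product

agreements = []
for a, b in product((False, True), repeat=2):
    by_definition = (a or b) and not (a and b)   # (A or B) and not (A and B)
    by_equivalence = (a == (not b))              # A if and only if not B
    by_parity = (int(a) + int(b)) % 2 == 1       # odd arithmetic sum
    agreements.append(by_definition == by_equivalence == by_parity)

all_agree = all(agreements)
```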

Modified OR symbols such as “V”, “V” or “Y” could be used for the exclusive-OR symbol because of the 
similarity between the inclusive-OR and exclusive-OR operations. 

A fairly rational notation choice would be a superposed OR and AND symbol “X”. This has the same sort of
4-way symmetry as the “⇔” biconditional symbol. But it would be tedious to write this by hand frequently.
The triangle notation “△” seems suitable for the exclusive OR because it resembles the Delta symbol Δ
and the corresponding set operation is the “set difference”. (See Definition 8.3.2 and Notation 8.3.3.) To
distinguish this symbol from the set operation, the small triangle symbol ▵ may be used.


3.8. Classification and characteristics of binary truth functions 
3.8.1 REMARK: There are essentially only two kinds of binary truth function. 


The 2^(2^2) = 16 possible binary truth functions are summarised in Table 3.8.1 in terms of the truth-function
notations in Remark 3.7.12. Here A and B are names of concrete propositions.


The binary logical expressions in Table 3.8.1 may be initially classified as follows.


truth      logical
function   expression   type           sub-type    equivalents             atoms

FFFF       ⊥            always false                                       0
FFFT       A ∧ B        conjunction                A, B                    2
FFTF       A ∧ ¬B       conjunction                A, ¬B      A ⇏ B        2
FFTT       A            atomic                                             1
FTFF       ¬A ∧ B       conjunction                ¬A, B      A ⇍ B        2
FTFT       B            atomic                                             1
FTTF       A ⇔ ¬B       implication    exclusive   A ▵ B      ¬A ⇔ B
FTTT       ¬A ⇒ B       implication    inclusive   A ∨ B      A ⇐ ¬B
TFFF       ¬A ∧ ¬B      conjunction                ¬A, ¬B     A ↓ B        2
TFFT       A ⇔ B        implication    exclusive   A ▵ ¬B     ¬A ⇔ ¬B
TFTF       ¬B           atomic                                             1
TFTT       ¬A ⇒ ¬B      implication    inclusive   A ∨ ¬B     A ⇐ B
TTFF       ¬A           atomic                                             1
TTFT       A ⇒ B        implication    inclusive   ¬A ∨ B     ¬A ⇐ ¬B
TTTF       A ⇒ ¬B       implication    inclusive   ¬A ∨ ¬B    A ↑ B
TTTT       ⊤            always true                                        0


Table 3.8.1 All possible binary truth functions


(1) The always-false (⊥) and always-true (⊤) truth functions convey no information. In fact, the always-false
truth function is not valid for any possible combinations of truth values. The always-true truth
function excludes no possible combinations of truth values at all.

(2) The 4 atomic propositions (A, B, ¬B and ¬A) convey information about only one of the propositions.
These are therefore not, strictly speaking, binary operations.

(3) The 4 conjunctions (A ∧ B, A ∧ ¬B, ¬A ∧ B and ¬A ∧ ¬B) are equivalent to simple lists of two
atomic propositions. So the information in the operators corresponding to these truth functions can be
conveyed by two-element lists of individual propositions.

(4) The 4 inclusive disjunctions (A ∨ B, A ∨ ¬B, ¬A ∨ B and ¬A ∨ ¬B) run through the 4 combinations
of F and T for the two propositions. They are essentially equivalent to each other.

(5) The 2 exclusive disjunctions (A ▵ B and A ▵ ¬B) are essentially equivalent to each other.


Apart from the logical expressions which are equivalent to lists of 0, 1 or 2 atomic propositions, there are only 
two distinct logical operators modulo negations of concrete-proposition truth values. There are therefore only 
two kinds of non-atomic binary truth function: inclusive and exclusive. There are 4 essentially-equivalent 
inclusive disjunctions, and 2 essentially-equivalent exclusive disjunctions. 
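
The classification “modulo negations of concrete-proposition truth values” can be checked by grouping the sixteen four-letter codes into classes which are closed under negating A, negating B, or both. The following sketch does this. The helper names negate_inputs and orbit are hypothetical, and the input bits 0 and 1 stand for F and T.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))   # FF, FT, TF, TT as bit pairs

def negate_inputs(code, flip_a, flip_b):
    # Code of the truth function (a, b) -> theta(a xor flip_a, b xor flip_b),
    # where code[2*a + b] is the letter for the input pattern (a, b).
    return "".join(code[2 * (a ^ flip_a) + (b ^ flip_b)] for a, b in INPUTS)

def orbit(code):
    # All codes obtained by negating neither, one, or both inputs.
    return frozenset(negate_inputs(code, fa, fb) for fa, fb in INPUTS)

all_codes = ["".join(letters) for letters in product("FT", repeat=4)]
orbits = {orbit(c) for c in all_codes}
sizes = sorted(len(o) for o in orbits)
```

Under these assumptions there are seven classes: the two constant functions, two classes of atomic propositions, one class of 4 conjunctions, one class of 4 inclusive disjunctions, and one class of 2 exclusive disjunctions.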


3.8.2 REMARK: The different qualities of inclusive and exclusive operators. 
The one-way implication and the inclusive OR-operator are very similar to each other. Also, the two-way 
implication and the exclusive OR-operator are very similar to each other. 


               one-way        two-way
               (inclusive)    (exclusive)

implication    A ⇒ B          A ⇔ B
disjunction    A ∨ B          A ▵ B


The two-way disjunction A ▵ B is typically thought of as exclusive multiple choices: (A ∧ ¬B) ∨ (¬A ∧ B).
The two-way implication is often thought of as a choice between “both true” and “neither true”: (A ∧ B) ∨
(¬A ∧ ¬B). Thus both cases may be most naturally thought of as a disjunction of conjunctions, not as a
list (i.e. conjunction) of one-way implications or disjunctions. In this sense, inclusive and exclusive binary
logical operations are quite different. However, a different point of view is described in Remark 3.8.3.


3.8.3 REMARK: Biconditionals are equivalent to lists of conditionals.

One may go beyond the classification in Remark 3.8.1 to note that the two essentially-different kinds of
disjunctions, the inclusive and exclusive, can be expressed in terms of a single kind.


The biconditional A ⇔ B (which is the same as the exclusive disjunction A ▵ ¬B) is equivalent to a list
of two simple conditionals: A ⇒ B, B ⇒ A. This re-write of biconditionals in terms of conditionals is
fairly close to how people think about biconditionals. The biconditional A ⇔ B is also equivalent to the
list: A ⇒ B, ¬A ⇒ ¬B, which is also fairly close to how people think about biconditionals. (The exclusive
disjunction A ▵ B is equivalent to a list of two inclusive disjunctions: A ∨ B, ¬A ∨ ¬B. But this is probably
not how most people spontaneously think of exclusive disjunctions.)
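
The equivalences stated in this remark can be confirmed by exhausting the four truth-value combinations. In the following sketch, a “list” of logical expressions is modelled as the conjunction of its members. The helper names implies and equivalent are hypothetical.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def equivalent(f, g):
    # Two binary truth functions are equivalent if they agree on all inputs.
    return all(f(a, b) == g(a, b)
               for a, b in product((False, True), repeat=2))

iff = lambda a, b: a == b
xor = lambda a, b: a != b

# A => B, B => A
list_1 = lambda a, b: implies(a, b) and implies(b, a)
# A => B, not A => not B
list_2 = lambda a, b: implies(a, b) and implies(not a, not b)
# A or B, not A or not B
list_3 = lambda a, b: (a or b) and ((not a) or (not b))
```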


It follows that every binary logical expression is equivalent to either a list of conditionals or a list of atomic
propositions, modulo negations of concrete propositions. In other words, the implication operator is,
apparently, the most fundamental building block for general binary logical operators. This tends to justify the
choice of basic logical operators “⇒” and “¬” for propositional calculus in Definition 4.4.3.


3.9. Basic logical operations on general logical expressions 


3.9.1 REMARK: Extension of logical operators from concrete propositions to general logical expressions. 
The interpretations of logical operator expressions in Definitions 3.6.9, 3.6.16 and 3.7.10 assume that the 
operands are concrete propositions. In Section 3.9, logical expressions are given interpretations when the 
operands are general logical expressions. This recursive application of logical operators yields a potentially 
infinite class of logical expressions which is boot-strapped with concrete propositions. 


Logical expressions are given meaning by interpreting them as knowledge sets. If the operands in a logical 
expression are concrete propositions, the expression is interpreted in terms of atomic knowledge sets. If 
the operands of a logical expression are themselves logical expressions, the resultant expression may be 
interpreted in terms of the interpretations of the operands. 


3.9.2 REMARK: Dereferencing logical expression names.

To avoid confusion when proposition names are mixed with logical expression names, a “dereference” or
“unquote” notational convention is required. For example, if φ is a label for the symbol-string “hair”, then
the symbol-string “cφ`s” is the same as “chairs”, whereas the symbol-string “cφs” is not the same as “chairs”.
The superscript “`” indicates that the preceding expression name is to be replaced with its value. So “φ`”
is the same symbol-string as φ, for any logical expression name φ. Dereferencing a name is sometimes also
called “expansion” of the name.


3.9.3 NOTATION [MM]: Logical expression name dereference.


φ`, for any logical expression name φ, denotes the dereferenced value of φ.


3.9.4 REMARK: Single substitution of a symbol string into a symbol string. 

To provide unnecessarily abundant clarity for the notion of dereferencing, Definition 3.9.5 gives the details 
of how symbols are rearranged in a symbol string when a single symbol in the string is substituted with 
another symbol string. This is in contrast to the uniform substitution procedure in Definition 3.10.7. (See 
Notation 14.4.7 for the sets ℤ_n = {0, ..., n − 1} for n ∈ ℤ_0^+.)


3.9.5 DEFINITION [MM]: The (single) substitution into a symbol string φ_0 of a symbol string φ_1 for the
symbol in position q is the symbol string φ'_0 = (φ'_{0,k})_{k=0}^{m'_0−1} which is constructed as follows.

(1) Let φ_i = (φ_{i,j})_{j=0}^{m_i−1} for i = 0 and i = 1.
(2) Let m'_0 = m_0 + m_1 − 1.
(3) Let φ'_{0,k} = φ_{0,k}          if k < q,
                   φ_{1,k−q}        if q ≤ k < q + m_1,
                   φ_{0,k−m_1+1}    if k ≥ q + m_1,
    for k ∈ ℤ_{m'_0}.
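
The index arithmetic in Definition 3.9.5 can be transcribed almost literally into code. The following sketch assumes that symbol strings are modelled as Python strings with positions numbered from 0. The function name substitute is hypothetical.

```python
def substitute(phi0, phi1, q):
    """Replace the single symbol at position q of phi0 by the string phi1."""
    m0, m1 = len(phi0), len(phi1)
    if not 0 <= q < m0:
        raise IndexError("position q must satisfy 0 <= q < len(phi0)")
    result = []
    for k in range(m0 + m1 - 1):          # the new length is m0 + m1 - 1
        if k < q:
            result.append(phi0[k])        # symbols before position q
        elif k < q + m1:
            result.append(phi1[k - q])    # the substituted string
        else:
            result.append(phi0[k - m1 + 1])  # symbols after position q
    return "".join(result)

expanded = substitute("cXs", "hair", 1)
```

In Python the same result is obtained more briefly by the slice expression phi0[:q] + phi1 + phi0[q + 1:]. The explicit loop is used here only to mirror the three cases of the definition.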


3.9.6 REMARK: Logical operator binding precedence rules and omission of parentheses. 

Parentheses are omitted whenever the meaning is clear according to the “usual binding precedence rules”. 
Conversely, when a logical expression is unquoted, the expanded expression must be enclosed in parentheses 
unless the usual binding precedence rules make them unnecessary. For example, if φ = “p_1 ∧ p_2”, then the
symbol-string “¬φ`” is understood to equal “¬(p_1 ∧ p_2)”.


3.9.7 REMARK: Construction of knowledge sets from basic logical operations on knowledge sets.
Knowledge sets may be constructed from other knowledge sets as follows. 


(i) For a given knowledge set K_1 ⊆ 2^P, construct the complement 2^P \ K_1 = {τ ∈ 2^P; τ ∉ K_1}.
(ii) For given K_1, K_2 ⊆ 2^P, construct the intersection K_1 ∩ K_2 = {τ ∈ 2^P; τ ∈ K_1 and τ ∈ K_2}.
(iii) For given K_1, K_2 ⊆ 2^P, construct the union K_1 ∪ K_2 = {τ ∈ 2^P; τ ∈ K_1 or τ ∈ K_2}.


By recursively applying the construction methods in lines (i), (ii) and (iii) to basic knowledge sets of the
form K_p = {τ ∈ 2^P; τ(p) = T} in Notation 3.6.5, any knowledge set K ⊆ 2^P may be constructed if P is a
finite set. (In fact, it is clear that this may be achieved using only the two methods (i) and (ii), or only the
two methods (i) and (iii).)
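
One way to see why every knowledge set can be constructed when P is finite is to build each single truth value map as an intersection of atomic knowledge sets or their complements, and then to take a union of such singletons. The following sketch assumes a hypothetical three-element domain and represents each truth value map as a tuple of booleans. All function names are illustrative.

```python
from itertools import product

P = ("p", "q", "r")   # hypothetical finite proposition domain
SPACE = frozenset(product((False, True), repeat=len(P)))  # plays the role of 2^P

def atomic(i):
    # Knowledge set of the maps tau which assign T to the i-th proposition.
    return frozenset(tau for tau in SPACE if tau[i])

def complement(k):
    return SPACE - k

def singleton(tau):
    # Intersect, for each proposition, either K_p or its complement.
    # SPACE itself equals the union of K_p and its complement.
    result = SPACE
    for i, value in enumerate(tau):
        result &= atomic(i) if value else complement(atomic(i))
    return result

def build(target):
    # Union of the singleton sets for all maps in the target knowledge set.
    result = frozenset()
    for tau in target:
        result |= singleton(tau)
    return result

target = frozenset({(True, False, True), (False, False, False)})
rebuilt = build(target)
```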

It is inconvenient to work directly at the semantic level with knowledge sets. It is more convenient to define 
logical expressions to symbolically represent constructions of knowledge sets. For example, if φ_1 and φ_2 are
the names of logical expressions which signify knowledge sets K_1 and K_2 respectively, then the expressions
“¬φ_1`”, “φ_1` ∧ φ_2`” and “φ_1` ∨ φ_2`” signify the knowledge sets 2^P \ K_1, K_1 ∩ K_2 and K_1 ∪ K_2 respectively.
This is an extension of the logical expressions which were introduced in Section 3.6 from expressions signifying
atomic knowledge sets K_p for p ∈ P to expressions signifying general knowledge sets K ⊆ 2^P. Logical
expressions of the form “p_1 θ` p_2” for p_1, p_2 ∈ P (as in Remark 3.7.12) are extended to logical expressions of
the form “φ_1` θ` φ_2`”, where φ_1 and φ_2 signify subsets of 2^P. (Note that p_1 and p_2 in the expression “p_1 θ` p_2”
must not be dereferenced.)


3.9.8 REMARK: Logical expression templates. 
Symbol-strings such as “$\phi_1 \vee \phi_2$” which require zero or more name substitutions to become true logical
expressions may be referred to as “logical expression templates”. A logical expression may be regarded as a
logical expression template with no required name substitutions.


It is often unclear whether one is discussing the template symbol-string before substitution or the logical 
expression which is obtained after substitution. (The confusion of symbolic names with their values is a 
constant issue in natural-language logic and mathematics!) This can generally be clarified (at least partially) 
by referring to such symbol-strings as “templates” if the before-substitution string is meant, or as “logical
expressions” if the after-substitution string is meant.


3.9.9 DEFINITION [MM]: A logical expression template is a symbol string which becomes a logical expression 
when the names in it are substituted with their values in accordance with Notation 3.9.3. 


3.9.10 REMARK: Extended interpretations are required for logical operators applied to logical expressions. 
The natural-language words “and” and “or” in Remark 3.9.7 lines (ii) and (iii) cannot be simply replaced
with the corresponding logical operators “$\wedge$” and “$\vee$” in Definition 3.6.9 because the expressions “$\tau \in K_1$”
and “$\tau \in K_2$” are not elements of the concrete proposition domain $\mathcal{P}$. (Note, however, that one could easily
construct a different context in which these expressions would be concrete propositions, but these would be
elements of a different domain $\mathcal{P}'$.)


Similarly, the words “and” and “or” in Remark 3.9.7 lines (ii) and (iii) cannot be easily interpreted using 
binary truth functions in the style of Definition 3.7.2 because the expressions “$\tau \in K_1$” and “$\tau \in K_2$” cannot
be given truth values which could be the arguments of such truth functions. The interpretation of these 
expressions requires set theory, which is not defined until Chapter 7. 


3.9.11 REMARK: Recursive syntax and interpretation for infix logical expressions. 

A general class of logical expressions is defined recursively in terms of unary and binary operations on 
a concrete proposition domain in Definition 3.9.12. (A non-recursive style of syntax specification for infix 
logical expressions is given in Definition 3.12.2, but the non-recursive specification is somewhat cumbersome.)


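3.9.12 DEFINITION [MM]: An infix logical expression on a concrete proposition domain $\mathcal{P}$, with proposition
name map $\mu : \mathcal{N} \to \mathcal{P}$, is any expression which may be constructed recursively as follows.


(i) For any $p \in \mathcal{N}$, the logical expression “$p$”, which signifies $\{\tau \in 2^{\mathcal{P}};\ \tau(\mu(p)) = \mathsf{T}\}$.
(ii) For any logical expressions $\phi_1, \phi_2$ signifying $K_1, K_2$ on $\mathcal{P}$, the following logical expressions with the
following meanings.

(1) “$(\neg\phi_1)$”, which signifies $2^{\mathcal{P}} \setminus K_1$.
(2) “$(\phi_1 \wedge \phi_2)$”, which signifies $K_1 \cap K_2$.
(3) “$(\phi_1 \vee \phi_2)$”, which signifies $K_1 \cup K_2$.
(4) “$(\phi_1 \Rightarrow \phi_2)$”, which signifies $(2^{\mathcal{P}} \setminus K_1) \cup K_2$.
(5) “$(\phi_1 \Leftarrow \phi_2)$”, which signifies $K_1 \cup (2^{\mathcal{P}} \setminus K_2)$.
(6) “$(\phi_1 \Leftrightarrow \phi_2)$”, which signifies $(K_1 \cap K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cap (2^{\mathcal{P}} \setminus K_2)) = ((2^{\mathcal{P}} \setminus K_1) \cup K_2) \cap (K_1 \cup (2^{\mathcal{P}} \setminus K_2))$.
(7) “$(\phi_1 \uparrow \phi_2)$”, which signifies $2^{\mathcal{P}} \setminus (K_1 \cap K_2)$.
(8) “$(\phi_1 \downarrow \phi_2)$”, which signifies $2^{\mathcal{P}} \setminus (K_1 \cup K_2)$.
(9) “$(\phi_1 \triangle \phi_2)$”, which signifies $(K_1 \setminus K_2) \cup (K_2 \setminus K_1) = (K_1 \cup K_2) \setminus (K_1 \cap K_2)$.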
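The recursive interpretation rule can be illustrated by a short Python sketch. The representation is an assumption made for this example only: expressions are nested tuples such as `("∧", "p1", "p2")`, the name space `NAMES` is a hypothetical three-element set, the name map is taken to be the identity, and truth value maps are tuples of booleans.

```python
from itertools import product

NAMES = ("p1", "p2", "p3")   # hypothetical name space, with the identity name map
FULL = frozenset(product((False, True), repeat=len(NAMES)))   # the set 2^P

# Knowledge-set meaning of each binary operator, following lines (2)-(9).
BINARY = {
    "∧": lambda k1, k2: k1 & k2,
    "∨": lambda k1, k2: k1 | k2,
    "⇒": lambda k1, k2: (FULL - k1) | k2,
    "⇐": lambda k1, k2: k1 | (FULL - k2),
    "⇔": lambda k1, k2: (k1 & k2) | ((FULL - k1) & (FULL - k2)),
    "↑": lambda k1, k2: FULL - (k1 & k2),
    "↓": lambda k1, k2: FULL - (k1 | k2),
    "△": lambda k1, k2: (k1 - k2) | (k2 - k1),
}


def signify(expr):
    """Return the knowledge set signified by a nested-tuple expression."""
    if isinstance(expr, str):                        # line (i): a proposition name
        i = NAMES.index(expr)
        return frozenset(tau for tau in FULL if tau[i])
    if expr[0] == "¬":                               # line (ii)(1)
        return FULL - signify(expr[1])
    op, left, right = expr                           # lines (ii)(2)-(9)
    return BINARY[op](signify(left), signify(right))


# The expression ((¬(p1 ∨ (¬p2))) ⇒ p3) as a nested tuple.
K = signify(("⇒", ("¬", ("∨", "p1", ("¬", "p2"))), "p3"))
```

For this example, `K` excludes exactly one truth value map, namely the one with `p1` false, `p2` true and `p3` false.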


3.9.13 REMARK: Suppression of implied proposition name maps. 

Although Definition 3.9.12 is presented in terms of a specified proposition name map $\mu : \mathcal{N} \to \mathcal{P}$, it will
be mostly assumed that $\mathcal{N} = \mathcal{P}$ and $\mu$ is the identity map. When the name map is fixed, it is tedious and
inconvenient to have to write, for example, that the logical expression “$(p_1 \wedge p_2) \Rightarrow (p_3 \vee p_4)$” signifies the
knowledge set $(2^{\mathcal{P}} \setminus (K_{\mu(p_1)} \cap K_{\mu(p_2)})) \cup (K_{\mu(p_3)} \cup K_{\mu(p_4)})$.


3.10. Logical expression equivalence and substitution 


3.10.1 REMARK: Equivalent logical expressions. 

Using Definition 3.9.12, one may construct logical expressions for a concrete proposition domain P. For 
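example, the logical expression “$(p_1 \wedge p_2) \Rightarrow (p_3 \vee p_4)$” signifies $K = (2^{\mathcal{P}} \setminus (K_{p_1} \cap K_{p_2})) \cup (K_{p_3} \cup K_{p_4})$,
where $\mathcal{P} = \{p_1, p_2, p_3, p_4, \ldots\}$. Similarly, the logical expression “$(p_3 \vee p_4) \Leftarrow (p_1 \wedge p_2)$” signifies $K' =
(K_{p_3} \cup K_{p_4}) \cup (2^{\mathcal{P}} \setminus (K_{p_1} \cap K_{p_2}))$. Since $K = K'$, these logical expressions are equivalent ways of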
signifying a single knowledge set. Therefore these may be referred to as “equivalent logical expressions”. In 
general, any two logical expressions which have the same interpretation in terms of knowledge sets are said 
to be equivalent. 


3.10.2 DEFINITION [MM]: Equivalent logical expressions $\phi_1, \phi_2$ on a concrete proposition domain $\mathcal{P}$ are
logical expressions $\phi_1, \phi_2$ on $\mathcal{P}$ whose interpretations as subsets of $2^{\mathcal{P}}$ are the same.


3.10.3 NOTATION [MM]: $\phi_1 \equiv \phi_2$, for logical expressions $\phi_1, \phi_2$ on a concrete proposition domain $\mathcal{P}$, means
that $\phi_1$ and $\phi_2$ are equivalent logical expressions on $\mathcal{P}$.


3.10.4 REMARK: Proving equivalences for logical expression templates. 

One may prove equivalences between logical expressions on concrete proposition domains by comparing the 
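corresponding knowledge sets as in Remark 3.10.1. For example, for any logical expressions $\phi_1, \phi_2$ on the
same concrete proposition domain, the following equivalences may be easily proved in this way.


“$\neg\neg\phi_1$” $\equiv$ “$\phi_1$”
“$\neg(\phi_1 \vee \phi_2)$” $\equiv$ “$\neg\phi_1 \wedge \neg\phi_2$”
“$\phi_1 \Rightarrow \phi_2$” $\equiv$ “$\neg\phi_1 \vee \phi_2$”.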


Discovering and proving such equivalences can give hours of amusement and recreation. Alternatively one 
may write a computer programme to print out thousands or millions of them. 




Equivalences between logical expressions may be efficiently demonstrated using truth tables or by deducing 
them from other equivalences. The deductive method is used to prove theorems in Sections 4.4, 4.6 and 4.7. 
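The truth table method can be sketched in Python as follows. The nested-tuple representation of expressions and the function names `names_in`, `value` and `equivalent` are assumptions made for illustration, and only four operators are covered. Two expressions are compared by evaluating both under every combination of truth values for the names which occur in them.

```python
from itertools import product


def names_in(expr):
    """Set of proposition names occurring in a nested-tuple expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(names_in(arg) for arg in expr[1:]))


def value(expr, tau):
    """Truth value of expr under the truth value map tau (a dict name -> bool)."""
    if isinstance(expr, str):
        return tau[expr]
    op = expr[0]
    args = [value(arg, tau) for arg in expr[1:]]
    if op == "¬":
        return not args[0]
    if op == "∧":
        return args[0] and args[1]
    if op == "∨":
        return args[0] or args[1]
    if op == "⇒":
        return (not args[0]) or args[1]
    raise ValueError(f"unknown operator {op!r}")


def equivalent(e1, e2):
    """Compare the truth tables of e1 and e2 row by row."""
    names = sorted(names_in(e1) | names_in(e2))
    for row in product((False, True), repeat=len(names)):
        tau = dict(zip(names, row))
        if value(e1, tau) != value(e2, tau):
            return False
    return True
```

For instance, `equivalent(("¬", ("∨", "a", "b")), ("∧", ("¬", "a"), ("¬", "b")))` compares the two sides of a De Morgan equivalence over four rows.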


The equivalence relation “$\equiv$” between logical expressions is different to the logical equivalence operation “$\Leftrightarrow$”.
The logical operation “$\Leftrightarrow$” constructs a new logical expression from two given logical expressions. The
equivalence relation “$\equiv$” is a relation between two logical expressions (which may be thought of as a
boolean-valued function of the two logical expressions).


3.10.5 REMARK: All logical expressions may be expressed in terms of negations and conjunctions. 

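The logical expression templates in Definition 3.9.12 lines (3)-(9) may be expressed in terms of the logical
negation and conjunction operations in lines (1) and (2) as follows. When logical expression names (such
as $\phi_1$ and $\phi_2$) are unspecified, it is understood that the equivalences are asserted for all logical expressions
which conform to Definition 3.9.12.

(1) “$(\phi_1 \vee \phi_2)$” $\equiv$ “$(\neg((\neg\phi_1) \wedge (\neg\phi_2)))$”.
(2) “$(\phi_1 \Rightarrow \phi_2)$” $\equiv$ “$(\neg(\phi_1 \wedge (\neg\phi_2)))$”.
(3) “$(\phi_1 \Leftarrow \phi_2)$” $\equiv$ “$(\neg((\neg\phi_1) \wedge \phi_2))$”.
(4) “$(\phi_1 \Leftrightarrow \phi_2)$” $\equiv$ “$(\neg((\neg(\phi_1 \wedge \phi_2)) \wedge (\neg((\neg\phi_1) \wedge (\neg\phi_2)))))$”
    $\equiv$ “$((\neg(\phi_1 \wedge (\neg\phi_2))) \wedge (\neg((\neg\phi_1) \wedge \phi_2)))$”.
(5) “$(\phi_1 \uparrow \phi_2)$” $\equiv$ “$(\neg(\phi_1 \wedge \phi_2))$”.
(6) “$(\phi_1 \downarrow \phi_2)$” $\equiv$ “$((\neg\phi_1) \wedge (\neg\phi_2))$”.
(7) “$(\phi_1 \triangle \phi_2)$” $\equiv$ “$(\neg((\neg(\phi_1 \wedge (\neg\phi_2))) \wedge (\neg((\neg\phi_1) \wedge \phi_2))))$”
    $\equiv$ “$((\neg((\neg\phi_1) \wedge (\neg\phi_2))) \wedge (\neg(\phi_1 \wedge \phi_2)))$”.

In abbreviated infix logical expression syntax, these equivalences are as follows.

(1) “$\phi_1 \vee \phi_2$” $\equiv$ “$\neg(\neg\phi_1 \wedge \neg\phi_2)$”.
(2) “$\phi_1 \Rightarrow \phi_2$” $\equiv$ “$\neg(\phi_1 \wedge \neg\phi_2)$”.
(3) “$\phi_1 \Leftarrow \phi_2$” $\equiv$ “$\neg(\neg\phi_1 \wedge \phi_2)$”.
(4) “$\phi_1 \Leftrightarrow \phi_2$” $\equiv$ “$\neg(\neg(\phi_1 \wedge \phi_2) \wedge \neg(\neg\phi_1 \wedge \neg\phi_2))$”
    $\equiv$ “$\neg(\phi_1 \wedge \neg\phi_2) \wedge \neg(\neg\phi_1 \wedge \phi_2)$”.
(5) “$\phi_1 \uparrow \phi_2$” $\equiv$ “$\neg(\phi_1 \wedge \phi_2)$”.
(6) “$\phi_1 \downarrow \phi_2$” $\equiv$ “$\neg\phi_1 \wedge \neg\phi_2$”.
(7) “$\phi_1 \triangle \phi_2$” $\equiv$ “$\neg(\neg(\phi_1 \wedge \neg\phi_2) \wedge \neg(\neg\phi_1 \wedge \phi_2))$”
    $\equiv$ “$\neg(\neg\phi_1 \wedge \neg\phi_2) \wedge \neg(\phi_1 \wedge \phi_2)$”.
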
These equivalences have the interesting consequence that the left-hand-side expressions may be defined to
mean the right-hand-side expressions. Then there would be only two basic logical operators “$\neg$” and “$\wedge$”,
and the remaining operators would be mere abbreviations. Such an approach is in fact often adopted.
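Treating the other operators as abbreviations amounts to a rewriting procedure. The following Python sketch shows one possible form of it, under assumptions made for this example only: expressions are nested tuples, the operator symbols are those of Definition 3.9.12, and the function names are hypothetical. Each rewrite is one of the negation-and-conjunction forms which are valid for the corresponding operator.

```python
from itertools import product


def neg(e):
    return ("¬", e)


def conj(e1, e2):
    return ("∧", e1, e2)


# One negation/conjunction form for each remaining binary operator.
REWRITE = {
    "∨": lambda a, b: neg(conj(neg(a), neg(b))),
    "⇒": lambda a, b: neg(conj(a, neg(b))),
    "⇐": lambda a, b: neg(conj(neg(a), b)),
    "⇔": lambda a, b: conj(neg(conj(a, neg(b))), neg(conj(neg(a), b))),
    "↑": lambda a, b: neg(conj(a, b)),
    "↓": lambda a, b: conj(neg(a), neg(b)),
    "△": lambda a, b: neg(conj(neg(conj(a, neg(b))), neg(conj(neg(a), b)))),
}

# Truth functions, used only to check the rewrites.
TRUTH = {
    "∧": lambda x, y: x and y,
    "∨": lambda x, y: x or y,
    "⇒": lambda x, y: (not x) or y,
    "⇐": lambda x, y: x or (not y),
    "⇔": lambda x, y: x == y,
    "↑": lambda x, y: not (x and y),
    "↓": lambda x, y: not (x or y),
    "△": lambda x, y: x != y,
}


def reduce_operators(expr):
    """Rewrite expr so that it uses only the operators ¬ and ∧."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "¬":
        return neg(reduce_operators(expr[1]))
    op, a, b = expr[0], reduce_operators(expr[1]), reduce_operators(expr[2])
    return conj(a, b) if op == "∧" else REWRITE[op](a, b)


def value(expr, tau):
    if isinstance(expr, str):
        return tau[expr]
    if expr[0] == "¬":
        return not value(expr[1], tau)
    return TRUTH[expr[0]](value(expr[1], tau), value(expr[2], tau))


def operators_used(expr):
    if isinstance(expr, str):
        return set()
    return {expr[0]}.union(*(operators_used(arg) for arg in expr[1:]))


def same_truth_table(e1, e2, names):
    return all(
        value(e1, dict(zip(names, row))) == value(e2, dict(zip(names, row)))
        for row in product((False, True), repeat=len(names))
    )
```

A nested expression such as `("⇒", ("∨", "a", "b"), ("△", "a", "c"))` is reduced bottom-up, so the result can grow considerably in length, which is one practical cost of operator minimalism.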


Similarly, all logical expressions are equivalent to expressions which use only the negation and implication 
operators, or only the negation and disjunction operators. Economising the number of basic logical operators 
in this way has some advantages in the boot-strapping of logic because some kinds of recursive proofs of 
metamathematical theorems thereby require fewer steps.


All logical expressions can in fact be written in terms of a single binary operator, namely the nand “$\uparrow$”
or the nor “$\downarrow$” operator, since both negations and conjunctions may be written in terms of either one of
these operators. However, defining all operators either in terms of the nand-operator or in terms of the
nor-operator generally makes recursive-style metamathematical theorem proofs more difficult.
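The reduction to a single operator rests on standard identities, which may be checked exhaustively. In the following Python sketch the function names are chosen for illustration only, and truth values are modelled as Python booleans.

```python
def nand(x, y):
    return not (x and y)


def nor(x, y):
    return not (x or y)


# Negation and conjunction written with nand alone.
def neg_from_nand(x):
    return nand(x, x)                      # ¬x  ≡  x ↑ x


def conj_from_nand(x, y):
    return nand(nand(x, y), nand(x, y))    # x ∧ y  ≡  (x ↑ y) ↑ (x ↑ y)


# Negation and conjunction written with nor alone.
def neg_from_nor(x):
    return nor(x, x)                       # ¬x  ≡  x ↓ x


def conj_from_nor(x, y):
    return nor(nor(x, x), nor(y, y))       # x ∧ y  ≡  (x ↓ x) ↓ (y ↓ y)
```

Since negation and conjunction suffice for all of the operators in Definition 3.9.12, either pair of functions gives a one-operator basis.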


Logical operator minimalism does not correspond to how people think. After the early stages of boot- 
strapping logic have been completed, it is best to use a full range of intuitively meaningful logical operators. 


The recursively defined logical expressions in Definition 3.9.12 are merely symbolic representations for
particular kinds of knowledge sets. They are helpful because they are intuitively clearer and more convenient
than set-expressions involving the set-operators $\setminus$, $\cap$ and $\cup$, but logical expressions are not “the real thing”.
Logical expressions merely provide an efficient, intuitive symbolic language for communicating knowledge,
beliefs, assertions or conjectures.


3.10.6 REMARK: Uniform substitution of symbol strings for symbols in a symbol string. 

Propositional calculus makes extensive use of “substitution rules”. These rules permit symbols in logical 
expressions to be substituted with other logical expressions under specified circumstances. Definition 3.10.7 
presents a general kind of uniform substitution for symbol strings. (See Notation 14.4.7 for the sets $\mathbb{Z}_n =
\{0, \ldots n-1\}$ for $n \in \mathbb{Z}^+$.)

Uniform substitution is not the same as the symbol-by-symbol dereferencing concept which is discussed in 
Remark 3.9.2 and Notation 3.9.3. Single-symbol substitution is described in Definition 3.9.5. 


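3.10.7 DEFINITION [MM]: The uniform substitution into a symbol string $\phi_0$ of symbol strings $\phi_1, \phi_2 \ldots \phi_n$
for distinct symbols $p_1, p_2 \ldots p_n$, where $n$ is a non-negative integer, is the symbol string $\phi_0' = (\phi'_{0,k})_{k=0}^{m_0'-1}$
which is constructed as follows.

(1) $\phi_i = (\phi_{i,j})_{j=0}^{m_i-1}$ for all $i \in \mathbb{Z}_{n+1}$.

(2) $\ell_k = \begin{cases} m_i & \text{if } \phi_{0,k} = p_i \\ 1 & \text{if } \phi_{0,k} \notin \{p_1, \ldots p_n\} \end{cases}$ for all $k \in \mathbb{Z}_{m_0}$.

(3) $m_0' = \sum_{k=0}^{m_0-1} \ell_k$.

(4) $\lambda(k) = \sum_{j=0}^{k-1} \ell_j$ for all $k \in \mathbb{Z}_{m_0}$.

(5) $\phi'_{0,\lambda(k)+j} = \begin{cases} \phi_{i,j} & \text{if } \phi_{0,k} = p_i \\ \phi_{0,k} & \text{if } \phi_{0,k} \notin \{p_1, \ldots p_n\} \end{cases}$ for all $j \in \mathbb{Z}_{\ell_k}$, for all $k \in \mathbb{Z}_{m_0}$.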


3.10.8 NOTATION [MM]: $\mathrm{Subs}(\phi_0;\ p_1[\phi_1], p_2[\phi_2], \ldots p_n[\phi_n])$, for symbol strings $\phi_0, \phi_1, \phi_2 \ldots \phi_n$ and symbols
$p_1, p_2 \ldots p_n$, for a non-negative integer $n$, denotes the uniform substitution into $\phi_0$ of the symbol strings
$\phi_1, \phi_2 \ldots \phi_n$ for the symbols $p_1, p_2 \ldots p_n$.


3.10.9 REMARK: The stages of the uniform substitution procedure. 
The stages of the procedure in Definition 3.10.7 may be expressed in plain language as follows. 


(1) Expand each symbol string $\phi_i$ as a symbol sequence $\phi_{i,0} \ldots \phi_{i,m_i-1}$.

(2) Let $\ell_k$ be the length of string which will be substituted for each symbol $\phi_{0,k}$ in the source string $\phi_0$. (If
there will be no substitution, the length will remain equal to 1.)

(3) Let $m_0'$ be the sum of the substituted lengths of symbols in $\phi_0$. This will be the length of $\phi_0'$.

(4) Let $\lambda(k)$ be the sum of the new (i.e. substituted) lengths of the first $k$ symbols in $\phi_0$.

(5) Corresponding to each symbol $\phi_{0,k}$ in $\phi_0$, let the $\ell_k$ symbols $\phi'_{0,\lambda(k)} \ldots \phi'_{0,\lambda(k)+\ell_k-1}$ in $\phi_0'$ equal the $\ell_k$
symbols $\phi_{i,0} \ldots \phi_{i,\ell_k-1}$ in the string $\phi_i$ if $\phi_{0,k} = p_i$. Otherwise let $\phi'_{0,\lambda(k)}$ equal the old value $\phi_{0,k}$. Then
$\phi_0' = (\phi'_{0,j})_{j=0}^{m_0'-1}$ is the string $\mathrm{Subs}(\phi_0;\ p_1[\phi_1], p_2[\phi_2], \ldots p_n[\phi_n])$.


All in all, the plain language explanation is not truly clearer than the symbolic formulas. 
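A third way to express the procedure is as a short program. The following Python sketch follows the stages of the plain-language description. It assumes that symbol strings are lists of hashable symbols and that the substitutions are supplied as a dictionary from each symbol $p_i$ to its replacement string $\phi_i$. The function name and this calling convention are illustrative only.

```python
def uniform_substitution(phi0, replacements):
    """Substitute replacements[p] for every occurrence of each symbol p in phi0.

    All occurrences are replaced simultaneously, so symbols inside the
    replacement strings are not themselves substituted again.
    """
    # Stage (2): substituted length of each symbol of phi0.
    lengths = [len(replacements[s]) if s in replacements else 1 for s in phi0]
    # Stage (3): length of the substituted string.
    new_length = sum(lengths)
    # Stage (4): offset of each symbol's block, the sum of the earlier lengths.
    offsets = [sum(lengths[:k]) for k in range(len(phi0))]
    # Stage (5): copy each replacement string, or the unchanged symbol, into place.
    result = [None] * new_length
    for k, symbol in enumerate(phi0):
        block = replacements.get(symbol, [symbol])
        for j in range(lengths[k]):
            result[offsets[k] + j] = block[j]
    return result


# Substituting into the postfix string "p q ∧ p ∨".
example = uniform_substitution(
    ["p", "q", "∧", "p", "∨"],
    {"p": ["a", "¬"], "q": ["b", "c", "∨"]},
)
```

Because the substitution is simultaneous, `uniform_substitution(["p", "q"], {"p": ["q"], "q": ["p"]})` swaps the two symbols rather than collapsing them into one.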


3.11. Prefix and postfix logical expression styles 


3.11.1 REMARK: Recursive logical expression definitions are an artefact of natural language. 

Most of the mathematical logic literature expresses assertions in terms of unary-binary trees. This kind of
structure is an artefact of natural language which is imported into mathematical logic because it is how
argumentation has been expressed in the sciences (and other subjects) for thousands of years.

A substantial proportion of the technical effort required to mechanise logic may be attributed to the use of
natural-language-style unary-binary trees to describe knowledge of the truth values of sets of propositions.
A particular technical issue which arises very early in propositional calculus is the question of how to
systematically describe all possible unary-binary logical expressions. The approaches include the following.


(1) Syntax tree diagrams. The logical expression is presented graphically as a syntax tree. 

(2) Symbol-strings. The logical expression is presented as a sequence of symbols using infix notation with 
parentheses, or in prefix or postfix notation without parentheses. 

(3) Functional expressions. The logical expression is constructed by a tree of function invocations. 

(4) Data structures. The logical expression is presented as a sequence of nodes, each of which specifies an 
operator or a proposition name. The operator nodes additionally give a list of nodes which are “pointed 
to” by the operator. (This kind of representation is suitable for computer programming.) 


These methods are illustrated in Figure 3.11.1. (See also Figure 3.12.1 in Remark 3.12.1.) 

All nodes are required to have a parent node except for a single node which is called the “root node”. Each
of these methods has advantages and disadvantages. Probably the postfix notation is the easiest to manage. 
It is unfortunate that the infix notation is generally used as the basis for axiomatic treatment since it is so 
difficult to manipulate. 

Recursive definitions and theorems, which seem to be ubiquitous in mathematical logic, are not truly intrinsic
to the nature of logic. They are a consequence of the argumentative style of logic, which is generally expressed
in natural language or some formalisation of natural language. If history had taken a different turn, logic
could have been expressed mostly in terms of truth tables which list exclusions in the style of Remark 3.7.6.


Figure 3.11.1 Representations of an example logical expression
[Figure not reproduced. Panels: syntax tree; postfix expression “$p_1\ p_2\ \neg\ \vee\ \neg\ p_3 \Rightarrow$”; prefix expression
“$\Rightarrow \neg \vee p_1\ \neg\ p_2\ p_3$”; infix expression “$((\neg(p_1 \vee (\neg p_2))) \Rightarrow p_3)$”; abbreviated infix
“$\neg(p_1 \vee \neg p_2) \Rightarrow p_3$”; functional expression; data structure (table with columns node, op, args).]


Ultimately all logical expressions boil down to exclusions of combinations of the truth values of concrete 
propositions. Each such exclusion may be expressed in infinitely many ways as logical expressions. This 
is analogous to the description of physical phenomena with respect to multiple observation frames. In 
each frame, the same reality is described with different numbers, but there is only one reality underlying 
the frame-dependent descriptions. In the same way, logical expressions have no particular significance in 
themselves. They only have significance as convenient ways of specifying lists of exclusions of combinations 
of truth values. 


3.11.2 REMARK: Punctuation is not part of the meaning of propositional logic. 

The most important take-home message from Section 3.11 is the observation that parentheses and other 
forms of proposition-grouping punctuation, such as the dot notations mentioned in Remark 4.1.6 item (4), 
are merely a side-effect of the importation of natural language conventions into symbolic logic. Punctuation 
is unnecessary for propositional logic. 

Some authors give detailed syntactical rules for the application of parentheses or other forms of punctuation, 
possibly giving the impression that such punctuation is a significant component of symbolic logic. The prefix 
and postfix notations show that punctuation can be dispensed with entirely. The punctuation is required 
only in order to express propositional logic in a way which resembles natural language. 


3.11.3 REMARK: Postfix logical expressions. 

Both the syntax and semantics rules for postfix logical expressions are very much simpler than for the infix 
logical expressions in Definition 3.9.12. In particular, the syntax does not need to be specified recursively. 
The space of all valid postfix logical expressions is neatly specified by lines (i) and (ii). 


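3.11.4 DEFINITION [MM]: A postfix logical expression on a concrete proposition domain $\mathcal{P}$, with operator
classes $O_0$ = {“$\bot$”, “$\top$”}, $O_1$ = {“$\neg$”}, $O_2$ = {“$\wedge$”, “$\vee$”, “$\Rightarrow$”, “$\Leftarrow$”, “$\Leftrightarrow$”, “$\uparrow$”, “$\downarrow$”, “$\triangle$”}, using a name map
$\mu : \mathcal{N} \to \mathcal{P}$, is any symbol-sequence $\phi = (\phi_i)_{i=0}^{n-1} \in X^n$ for positive integer $n$, where $X = \mathcal{N} \cup O_0 \cup O_1 \cup O_2$,
which satisfies

(i) $L(\phi, n) = 1$,

(ii) $a(\phi_i) \le L(\phi, i)$ for all $i = 0 \ldots n-1$,


where $L(\phi, i) = \sum_{j=0}^{i-1} (1 - a(\phi_j))$ for all $i = 0 \ldots n$, and

$a(\phi_i) = \begin{cases} 0 & \text{if } \phi_i \in O_0 \cup \mathcal{N} \\ 1 & \text{if } \phi_i \in O_1 \\ 2 & \text{if } \phi_i \in O_2 \end{cases}$

for all $i = 0 \ldots n-1$. The meanings of these logical expressions are defined recursively as follows.

(iii) For any $p \in \mathcal{N}$, the postfix logical expression “$p$” signifies the knowledge set $\{\tau \in 2^{\mathcal{P}};\ \tau(\mu(p)) = \mathsf{T}\}$.


(iv) For any postfix logical expressions $\phi_1, \phi_2$ signifying $K_1, K_2$ on $\mathcal{P}$, the following postfix logical expressions
have the following meanings.


(1) “$\phi_1 \neg$” signifies $2^{\mathcal{P}} \setminus K_1$.
(2) “$\phi_1 \phi_2 \wedge$” signifies $K_1 \cap K_2$.
(3) “$\phi_1 \phi_2 \vee$” signifies $K_1 \cup K_2$.
(4) “$\phi_1 \phi_2 \Rightarrow$” signifies $(2^{\mathcal{P}} \setminus K_1) \cup K_2$.
(5) “$\phi_1 \phi_2 \Leftarrow$” signifies $K_1 \cup (2^{\mathcal{P}} \setminus K_2)$.
(6) “$\phi_1 \phi_2 \Leftrightarrow$” signifies $(K_1 \cap K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cap (2^{\mathcal{P}} \setminus K_2)) = ((2^{\mathcal{P}} \setminus K_1) \cup K_2) \cap (K_1 \cup (2^{\mathcal{P}} \setminus K_2))$.
(7) “$\phi_1 \phi_2 \uparrow$” signifies $2^{\mathcal{P}} \setminus (K_1 \cap K_2)$.
(8) “$\phi_1 \phi_2 \downarrow$” signifies $2^{\mathcal{P}} \setminus (K_1 \cup K_2)$.
(9) “$\phi_1 \phi_2 \triangle$” signifies $(K_1 \setminus K_2) \cup (K_2 \setminus K_1) = (K_1 \cup K_2) \setminus (K_1 \cap K_2)$.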


3.11.5 NOTATION [MM]: Postfix logical expression space.
$\mathcal{W}^-(\mathcal{N}, O_0, O_1, O_2)$, for a proposition name space $\mathcal{N}$ and operator classes $O_0$, $O_1$ and $O_2$, denotes the set
of all postfix logical expressions $\phi$ as indicated in Definition 3.11.4.


$\mathcal{W}^-$ denotes $\mathcal{W}^-(\mathcal{N}, O_0, O_1, O_2)$ with parameters $\mathcal{N}$, $O_0$, $O_1$ and $O_2$ implied in the context.


3.11.6 NOTATION[MM]: Postfix logical expression interpretation map. 

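$\mathcal{I}_{\mathcal{W}^-;\,\mathcal{N},\mathcal{P},\mu,O_0,O_1,O_2}$, for a proposition name space $\mathcal{N}$, a concrete proposition domain $\mathcal{P}$, a proposition name
map $\mu : \mathcal{N} \to \mathcal{P}$, and operator classes $O_0$, $O_1$ and $O_2$, denotes the interpretation map from the postfix
logical expression space $\mathcal{W}^-(\mathcal{N}, O_0, O_1, O_2)$ to $\mathbb{P}(2^{\mathcal{P}})$ as indicated in Definition 3.11.4.


$\mathcal{I}_{\mathcal{W}^-}$ denotes $\mathcal{I}_{\mathcal{W}^-;\,\mathcal{N},\mathcal{P},\mu,O_0,O_1,O_2}$ with parameters $\mathcal{N}$, $\mathcal{P}$, $\mu$, $O_0$, $O_1$ and $O_2$ implied in the context.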


3.11.7 REMARK: Postfix notation stacks and arity. 

The function $a(\phi_i)$ denotes the “arity” of the proposition name or operator symbol $\phi_i$. This colloquial term
means the number of operands for a symbol, which is 0 for nullary symbols, 1 for unary symbols and 2 for
binary symbols. (See for example Huth/Ryan [363], page 99.) Postfix expressions are typically parsed by
a state machine which includes a “stack”, onto which symbols may be “pushed”, or from which symbols
may be “popped”. The formula $L(\phi, i)$ equals the length of the stack just before symbol $\phi_i$ is parsed.
Clearly the stack must contain at least $a(\phi_i)$ symbols for an operator with arity $a(\phi_i)$ to pop. All symbols
push one symbol onto the stack after being parsed. At the end of the parsing operation, the number of
symbols remaining on the stack is $L(\phi, n)$, which must equal 1. The semantic rules (iii) and (iv) are easily
implemented in a style of algorithm known as “syntax-directed translation”. (See for example
Aho/Sethi/Ullman [487], pages 279-342.)


The syntax rules (i) and (ii) in Definition 3.11.4 may be expanded as follows.


(i) $\sum_{j=0}^{n-1} a(\phi_j) = n - 1$.
(ii) $\sum_{j=0}^{i} a(\phi_j) \le i$ for all $i = 0 \ldots n-1$.


In essence, rule (i) means that the sum of the arities is one less than the total number of symbols, which 
means that at the end of the symbol-string, there is exactly one object left on the stack which has not been 
“gobbled” by the operators. Similarly, rule (ii) means that after each symbol has been processed, there is at 
least one object on the stack. In other words, the stack can never be empty. The object remaining on the 
stack at the end of the processing is the “value” of the logical expression. 
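The two postfix syntax conditions can be checked with a single pass over the symbol string. In the following Python sketch, symbol strings are lists, any symbol which is not a listed operator is treated as a proposition name, and the names `ARITY`, `stack_lengths` and `is_postfix` are assumptions made for illustration.

```python
# Arity of each operator symbol. Nullary operators are included for completeness.
ARITY = {
    "⊥": 0, "⊤": 0,
    "¬": 1,
    "∧": 2, "∨": 2, "⇒": 2, "⇐": 2, "⇔": 2, "↑": 2, "↓": 2, "△": 2,
}


def arity(symbol):
    """a(symbol): any symbol not listed in ARITY is taken to be a proposition name."""
    return ARITY.get(symbol, 0)


def stack_lengths(phi):
    """Return [L(phi, 0), ..., L(phi, n)], the stack length before each symbol."""
    lengths = [0]
    for symbol in phi:
        lengths.append(lengths[-1] + 1 - arity(symbol))
    return lengths


def is_postfix(phi):
    """Check conditions (i) and (ii) for a postfix logical expression."""
    n = len(phi)
    L = stack_lengths(phi)
    enough_operands = all(arity(phi[i]) <= L[i] for i in range(n))   # condition (ii)
    return n > 0 and L[n] == 1 and enough_operands                   # condition (i)


example = "p1 p2 ¬ ∨ ¬ p3 ⇒".split()
```

For `example`, the stack lengths are 0, 1, 2, 2, 1, 1, 2, 1, so every operator finds enough operands and exactly one object remains at the end.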


3.11.8 REMARK: Closure of postfix logical expression spaces under substitution. 
Theorem 3.11.9 asserts that postfix logical expression spaces are closed under substitution operations. (See 
Notation 3.10.8 for the uniform substitution notation.) 


3.11.9 THEOREM [MM]: Substitution of expressions for proposition names in postfix logical expressions. 
Let $\mathcal{P}$ be a concrete proposition domain with name map $\mu : \mathcal{N} \to \mathcal{P}$. Let $\phi_0, \phi_1, \ldots \phi_n \in \mathcal{W}^-(\mathcal{N}, O_0, O_1, O_2)$
be postfix logical expressions on $\mathcal{P}$, and let $p_1, \ldots p_n \in \mathcal{N}$ be proposition names, where $n$ is a non-negative
integer. Then $\mathrm{Subs}(\phi_0;\ p_1[\phi_1], \ldots p_n[\phi_n]) \in \mathcal{W}^-(\mathcal{N}, O_0, O_1, O_2)$.

PROOF: Let $\phi_0, \phi_1, \ldots \phi_n \in \mathcal{W}^-(\mathcal{N}, O_0, O_1, O_2)$, and let $\phi_0' = \mathrm{Subs}(\phi_0;\ p_1[\phi_1], \ldots p_n[\phi_n])$. It is clear from
Definition 3.11.4 that $\phi'_{0,k'} \in \mathcal{N} \cup O_0 \cup O_1 \cup O_2$ for all $k' \in \mathbb{Z}_{m_0'}$, where $\phi_0' = (\phi'_{0,k'})_{k'=0}^{m_0'-1}$.


Each individual substitution of an expression $\phi_i = (\phi_{i,j})_{j=0}^{m_i-1}$ into $\phi_0$ removes a single proposition name
symbol $p_i$, which has arity 0. The single-symbol string $\psi$ = “$p_i$” satisfies $L(\psi, 1) = 1$, but the substituting
expression $\phi_i$ satisfies $L(\phi_i, m_i) = 1$ also. So the net change to $L(\phi_0, m_0)$ is zero. Hence $L(\phi_0', m_0') =
L(\phi_0, m_0) = 1$. (This works because of the additivity properties of the stack-length function $L$.) This verifies
Definition 3.11.4 condition (i).


Definition 3.11.4 condition (ii) can be verified by induction on the index $k$ of $\phi_0$. Before any substitution at
index $k$, the preceding partial string $(\phi'_{0,j})_{j=0}^{k'-1}$ of the fully substituted string $(\phi'_{0,j})_{j=0}^{m_0'-1}$ satisfies $a(\phi'_{0,j}) \le
L(\phi_0', j)$ for $j < k' = \lambda(k)$ (using the notation $\lambda(k)$ in Definition 3.10.7). If $\phi_{0,k}$ is substituted by $\phi_i$, then
$a(\phi'_{0,j}) \le L(\phi_0', j)$ for $k' \le j < k' + m_i$, and therefore $a(\phi'_{0,j}) \le L(\phi_0', j)$ for $j < k' + m_i = (k+1)'$. So by
induction, $a(\phi'_{0,j}) \le L(\phi_0', j)$ for $j < m_0'$. Hence $\phi_0' \in \mathcal{W}^-(\mathcal{N}, O_0, O_1, O_2)$.


3.11.10 REMARK: Abbreviated notation for proposition name maps for postfix logical expressions. 

The considerations in Remark 3.9.13 regarding implicit proposition name maps apply to Definition 3.11.4
also. Although this definition is presented in terms of a specified proposition name map $\mu : \mathcal{N} \to \mathcal{P}$, it will
be mostly assumed that $\mathcal{N} = \mathcal{P}$ and $\mu$ is the identity map.


3.11.11 REMARK: Prefix logical expressions. 

The syntax and interpretation rules for prefix logical expressions in Definition 3.11.12 are very similar to 
those for postfix logical expressions in Definition 3.11.4. The prefix logical expression syntax specification in 
lines (i) and (ii) differs in that the “stack length” $L(\phi, i)$ has an upper limit instead of a lower limit. A
non-positive stack length may be interpreted as an expectation of future arguments for operators, which must be
fulfilled by the end of the symbol-string, at which point there must be exactly one object on the stack. Note
that for both postfix and prefix syntaxes, an empty symbol-string is prevented by the condition $L(\phi, n) = 1$.
(The function $L$ is identical in the two definitions.)
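The upper-limit form of the syntax condition for prefix expressions can be sketched in the same way as for postfix expressions. The Python below assumes that the prefix conditions are $L(\phi, n) = 1$ and $L(\phi, i) \le 0$ for all $i < n$, with the same arity function as in the postfix case. The names `ARITY`, `stack_lengths` and `is_prefix` are hypothetical.

```python
# Arity of each operator symbol. Unlisted symbols count as proposition names.
ARITY = {
    "⊥": 0, "⊤": 0,
    "¬": 1,
    "∧": 2, "∨": 2, "⇒": 2, "⇐": 2, "⇔": 2, "↑": 2, "↓": 2, "△": 2,
}


def stack_lengths(phi):
    """Return [L(phi, 0), ..., L(phi, n)] with L(phi, i) = sum over j < i of (1 - a(phi_j))."""
    lengths = [0]
    for symbol in phi:
        lengths.append(lengths[-1] + 1 - ARITY.get(symbol, 0))
    return lengths


def is_prefix(phi):
    """Check the assumed prefix conditions L(phi, n) = 1 and L(phi, i) <= 0 for i < n."""
    n = len(phi)
    L = stack_lengths(phi)
    return n > 0 and L[n] == 1 and all(L[i] <= 0 for i in range(n))


example = "⇒ ¬ ∨ p1 ¬ p2 p3".split()
```

For `example`, the values of $L$ are 0, -1, -1, -2, -1, -1, 0, 1. The negative values count the operands which are still owed to operators already read, and the final value 1 is reached only at the end of the string.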


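3.11.12 DEFINITION [MM]: A prefix logical expression on a concrete proposition domain $\mathcal{P}$, with operator
classes $O_0$ = {“$\bot$”, “$\top$”}, $O_1$ = {“$\neg$”}, $O_2$ = {“$\wedge$”, “$\vee$”, “$\Rightarrow$”, “$\Leftarrow$”, “$\Leftrightarrow$”, “$\uparrow$”, “$\downarrow$”, “$\triangle$”}, using a name map
$\mu : \mathcal{N} \to \mathcal{P}$, is any symbol-sequence $\phi = (\phi_i)_{i=0}^{n-1} \in X^n$ for positive integer $n$, where $X = \mathcal{N} \cup O_0 \cup O_1 \cup O_2$,
which satisfies


(i) $L(\phi, n) = 1$,
(ii) $L(\phi, i) \le 0$ for all $i = 0 \ldots n-1$,


where $L(\phi, i) = \sum_{j=0}^{i-1} (1 - a(\phi_j))$ for all $i = 0 \ldots n$, and

$a(\phi_i) = \begin{cases} 0 & \text{if } \phi_i \in O_0 \cup \mathcal{N} \\ 1 & \text{if } \phi_i \in O_1 \\ 2 & \text{if } \phi_i \in O_2 \end{cases}$

for all $i = 0 \ldots n-1$. The meanings of these logical expressions are defined recursively as follows.


(iii) For any $p \in \mathcal{N}$, the prefix logical expression “$p$” signifies the knowledge set $\{\tau \in 2^{\mathcal{P}};\ \tau(\mu(p)) = \mathsf{T}\}$.


(iv) For any prefix logical expressions $\phi_1, \phi_2$ signifying $K_1, K_2$ on $\mathcal{P}$, the following prefix logical expressions
have the following meanings.


(1) “$\neg\phi_1$” signifies $2^{\mathcal{P}} \setminus K_1$.
(2) “$\wedge\,\phi_1 \phi_2$” signifies $K_1 \cap K_2$.
(3) “$\vee\,\phi_1 \phi_2$” signifies $K_1 \cup K_2$.
(4) “$\Rightarrow \phi_1 \phi_2$” signifies $(2^{\mathcal{P}} \setminus K_1) \cup K_2$.
(5) “$\Leftarrow \phi_1 \phi_2$” signifies $K_1 \cup (2^{\mathcal{P}} \setminus K_2)$.
(6) “$\Leftrightarrow \phi_1 \phi_2$” signifies $(K_1 \cap K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cap (2^{\mathcal{P}} \setminus K_2)) = ((2^{\mathcal{P}} \setminus K_1) \cup K_2) \cap (K_1 \cup (2^{\mathcal{P}} \setminus K_2))$.
(7) “$\uparrow \phi_1 \phi_2$” signifies $2^{\mathcal{P}} \setminus (K_1 \cap K_2)$.
(8) “$\downarrow \phi_1 \phi_2$” signifies $2^{\mathcal{P}} \setminus (K_1 \cup K_2)$.
(9) “$\triangle\,\phi_1 \phi_2$” signifies $(K_1 \setminus K_2) \cup (K_2 \setminus K_1) = (K_1 \cup K_2) \setminus (K_1 \cap K_2)$.


3.11.13 NOTATION [MM]: Prefix logical expression space.
$\mathcal{W}^+(\mathcal{N}, O_0, O_1, O_2)$, for a proposition name space $\mathcal{N}$ and operator classes $O_0$, $O_1$ and $O_2$, denotes the set
of all prefix logical expressions $\phi$ as indicated in Definition 3.11.12.


$\mathcal{W}^+$ denotes $\mathcal{W}^+(\mathcal{N}, O_0, O_1, O_2)$ with parameters $\mathcal{N}$, $O_0$, $O_1$ and $O_2$ implied in the context.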


3.11.14 NOTATION [MM]: Prefix logical expression interpretation map. 

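$\mathcal{I}_{\mathcal{W}^+;\,\mathcal{N},\mathcal{P},\mu,O_0,O_1,O_2}$, for a proposition name space $\mathcal{N}$, a concrete proposition domain $\mathcal{P}$, a proposition name
map $\mu : \mathcal{N} \to \mathcal{P}$, and operator classes $O_0$, $O_1$ and $O_2$, denotes the interpretation map from the prefix logical
expression space $\mathcal{W}^+(\mathcal{N}, O_0, O_1, O_2)$ to $\mathbb{P}(2^{\mathcal{P}})$ as indicated in Definition 3.11.12.


$\mathcal{I}_{\mathcal{W}^+}$ denotes $\mathcal{I}_{\mathcal{W}^+;\,\mathcal{N},\mathcal{P},\mu,O_0,O_1,O_2}$ with parameters $\mathcal{N}$, $\mathcal{P}$, $\mu$, $O_0$, $O_1$ and $O_2$ implied in the context.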


3.11.15 REMARK: Closure of prefix logical expression spaces under substitution. 
Theorem 3.11.16 asserts that prefix logical expression spaces are closed under substitution operations. (See 
Notation 3.10.8 for the uniform substitution notation.) 


3.11.16 THEOREM[MM]: Substitution of expressions for proposition names in prefix logical expressions. 
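Let $\mathcal{P}$ be a concrete proposition domain with name map $\mu : \mathcal{N} \to \mathcal{P}$. Let $\phi_0, \phi_1, \ldots \phi_n \in \mathcal{W}^+(\mathcal{N}, O_0, O_1, O_2)$
be prefix logical expressions on $\mathcal{P}$, and let $p_1, \ldots p_n \in \mathcal{N}$ be proposition names, where $n$ is a non-negative
integer. Then $\mathrm{Subs}(\phi_0;\ p_1[\phi_1], \ldots p_n[\phi_n]) \in \mathcal{W}^+(\mathcal{N}, O_0, O_1, O_2)$.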


PROOF: The proof follows the pattern of Theorem 3.11.9. 


3.11.17 REMARK: Syntax-directed translation between logical expression styles.

The interpretation rules (iii) and (iv) in Definition 3.11.4, and the corresponding rules in Definition 3.11.12,
may be easily modified to generate logical expressions in other styles instead of generating knowledge sets.
For example, one may generate infix notation from postfix notation by applying the following rules to the
logical expression space $\mathcal{W}^-$ in Definition 3.11.4.


(iii′) For any $p \in \mathcal{N}$, the postfix logical expression “$p$” $\in \mathcal{W}^-$ signifies the infix logical expression “$p$”.

(iv′) For any postfix logical expressions $\phi_1, \phi_2 \in \mathcal{W}^-$ signifying infix logical expressions $\psi_1, \psi_2$, the following
postfix logical expressions have the following meanings.

(1) “$\phi_1 \neg$” signifies “$(\neg\psi_1)$”.
(2) “$\phi_1 \phi_2 \wedge$” signifies “$(\psi_1 \wedge \psi_2)$”.
(3) “$\phi_1 \phi_2 \vee$” signifies “$(\psi_1 \vee \psi_2)$”.
(4) “$\phi_1 \phi_2 \Rightarrow$” signifies “$(\psi_1 \Rightarrow \psi_2)$”.
(5) “$\phi_1 \phi_2 \Leftarrow$” signifies “$(\psi_1 \Leftarrow \psi_2)$”.
(6) “$\phi_1 \phi_2 \Leftrightarrow$” signifies “$(\psi_1 \Leftrightarrow \psi_2)$”.
(7) “$\phi_1 \phi_2 \uparrow$” signifies “$(\psi_1 \uparrow \psi_2)$”.
(8) “$\phi_1 \phi_2 \downarrow$” signifies “$(\psi_1 \downarrow \psi_2)$”.
(9) “$\phi_1 \phi_2 \triangle$” signifies “$(\psi_1 \triangle \psi_2)$”.

Although it is very easy to generate infix syntax from postfix syntax (and similarly from prefix syntax), it
is not so easy to parse and interpret infix syntax to produce the knowledge set interpretation or to convert
to other syntaxes. The difficulties with infix notation arise from the use of parentheses and the fact that
binary operators are half-postfix and half-prefix, since one operand is on each side, which is the reason why
parentheses are required. This suggests that infix syntax is a rather poor choice.
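The translation from postfix syntax to fully parenthesised infix syntax can be sketched as a stack procedure in Python. The representation of expressions as lists of symbols, the treatment of every unlisted symbol as a proposition name, and the function name `postfix_to_infix` are assumptions made for this illustration.

```python
# Arity of each operator symbol. Unlisted symbols count as proposition names.
ARITY = {
    "⊥": 0, "⊤": 0,
    "¬": 1,
    "∧": 2, "∨": 2, "⇒": 2, "⇐": 2, "⇔": 2, "↑": 2, "↓": 2, "△": 2,
}


def postfix_to_infix(phi):
    """Translate a postfix symbol list into a fully parenthesised infix string."""
    stack = []
    for symbol in phi:
        a = ARITY.get(symbol, 0)
        if len(stack) < a:
            raise ValueError("not a postfix logical expression")
        if a == 0:
            stack.append(symbol)                          # a name stands for itself
        elif a == 1:
            operand = stack.pop()
            stack.append(f"({symbol}{operand})")          # unary operator
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {symbol} {right})")    # binary operator
    if len(stack) != 1:
        raise ValueError("not a postfix logical expression")
    return stack[0]


infix = postfix_to_infix("p1 p2 ¬ ∨ ¬ p3 ⇒".split())
```

The stack holds the infix translations of the subexpressions which have been read so far, so each symbol is handled exactly once and no backtracking is needed.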


3.12. Infix logical expression style 


3.12.1 REMARK:  Non-recursive syntax specification for infix logical expressions. 
The space of infix logical expressions in Definition 3.9.12 is more difficult to specify non-recursively than the
corresponding postfix and prefix logical expression spaces. To see how this space might be specified, it is
useful to visualise infix expressions in terms of a “bracket-level diagram”. This is illustrated in Figure 3.12.1
for the infix logical expression example in Figure 3.11.1 (in Remark 3.11.1).


Figure 3.12.1 Bracket levels of infix logical expression example $((\neg(p_1 \vee (\neg p_2))) \Rightarrow p_3)$
[Figure not reproduced.]


The difficulties of non-recursively specifying the space of infix logical expressions in Definition 3.9.12 can
be gleaned from syntax conditions (i)-(v) in Definition 3.12.2. These conditions make use of the quantifier
notation which is introduced in Section 5.2 and the set notation $\mathbb{Z}_m = \{0, 1, \ldots m-1\}$ for integers $m \ge 0$.
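A bracket-level diagram can be computed symbol by symbol. The Python sketch below rests on an assumed reading of the bracket-level function: a left parenthesis, a proposition name or a nullary operator raises the level at its own position, and a right parenthesis, a proposition name or a nullary operator lowers the level from the next position onwards. The names `opens`, `closes` and `bracket_levels` are hypothetical.

```python
OPERATORS = {"¬", "∧", "∨", "⇒", "⇐", "⇔", "↑", "↓", "△"}


def opens(token):
    """1 if the token raises the bracket level: "(", a name or a nullary operator."""
    return 1 if token == "(" or (token != ")" and token not in OPERATORS) else 0


def closes(token):
    """1 if the token lowers the bracket level: ")", a name or a nullary operator."""
    return 1 if token == ")" or (token != "(" and token not in OPERATORS) else 0


def bracket_levels(tokens):
    """Return the level at each token and the level just after the last token."""
    levels, opened, closed = [], 0, 0
    for token in tokens:
        opened += opens(token)       # an opening counts at its own position
        levels.append(opened - closed)
        closed += closes(token)      # a closing counts from the next position on
    return levels, opened - closed


tokens = "( ( ¬ ( p1 ∨ ( ¬ p2 ) ) ) ⇒ p3 )".split()
levels, final_level = bracket_levels(tokens)
```

Under this reading, each operator sits at the same level as the parentheses which enclose it, and each proposition name sits one level above its surroundings.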


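3.12.2 DEFINITION [MM]: An infix logical expression on a concrete proposition domain $\mathcal{P}$, with operator
classes $O_0$ = {“$\bot$”, “$\top$”}, $O_1$ = {“$\neg$”}, $O_2$ = {“$\wedge$”, “$\vee$”, “$\Rightarrow$”, “$\Leftarrow$”, “$\Leftrightarrow$”, “$\uparrow$”, “$\downarrow$”, “$\triangle$”} and parenthesis
classes $B^+$ = {“(”} and $B^-$ = {“)”}, using a name map $\mu : \mathcal{N} \to \mathcal{P}$, is any symbol-sequence $\phi = (\phi_i)_{i=0}^{n-1} \in X^n$
for positive integer $n$, where $X = \mathcal{N} \cup O_0 \cup O_1 \cup O_2 \cup B^+ \cup B^-$, which satisfies


(i) $\forall i \in \mathbb{Z}_n,\ B(\phi, i) \ge 1$, where $\forall i \in \mathbb{Z}_n,\ B(\phi, i) = \sum_{j=0}^{i} b^+(\phi_j) - \sum_{j=0}^{i-1} b^-(\phi_j)$, where

$b^+(\phi_i) = \begin{cases} 1 & \text{if } \phi_i \in \mathcal{N} \cup O_0 \cup B^+ \\ 0 & \text{otherwise} \end{cases}$ and $b^-(\phi_i) = \begin{cases} 1 & \text{if } \phi_i \in \mathcal{N} \cup O_0 \cup B^- \\ 0 & \text{otherwise.} \end{cases}$

(The symbol string $\phi$ is assumed to be extended with empty characters outside its domain $\mathbb{Z}_n$.)
(ii) $B(\phi, n) = 0$.
(iii) For all $i_1, i_2 \in \mathbb{Z}_n$ with $i_1 \le i_2$ which satisfy

(1) $B(\phi, i_1 - 1) < B(\phi, i_1) = B(\phi, i_2) > B(\phi, i_2 + 1)$ and
(2) $B(\phi, i_1) \le B(\phi, i)$ for all $i \in \mathbb{Z}_n$ with $i_1 < i < i_2$,

either
(3) $i_1 = i_2$, or
(4) there is one and only one $i_0 \in \mathbb{Z}_n$ with $i_1 < i_0 < i_2$ and $B(\phi, i_1) = B(\phi, i_0)$, and for this $i_0$,
$\phi_{i_0} \in O_1$ and $i_1 + 1 = i_0 < i_2 - 1$, or
(5) there is one and only one $i_0 \in \mathbb{Z}_n$ with $i_1 < i_0 < i_2$ and $B(\phi, i_1) = B(\phi, i_0)$, and for this $i_0$,
$\phi_{i_0} \in O_2$ and $i_1 + 1 < i_0 < i_2 - 1$.

The meanings of these logical expressions are defined recursively as follows.


(iv) For any $p \in \mathcal{N}$, the infix logical expression “$p$” signifies the knowledge set $\{\tau \in 2^{\mathcal{P}};\ \tau(\mu(p)) = \mathsf{T}\}$.
(v) For any infix logical expressions $\phi_1, \phi_2$ signifying $K_1, K_2$ on $\mathcal{P}$, the following infix logical expressions
have the following meanings.


(1) “$(\neg\phi_1)$” signifies $2^{\mathcal{P}} \setminus K_1$.
(2) “$(\phi_1 \wedge \phi_2)$” signifies $K_1 \cap K_2$.
(3) “$(\phi_1 \vee \phi_2)$” signifies $K_1 \cup K_2$.
(4) “$(\phi_1 \Rightarrow \phi_2)$” signifies $(2^{\mathcal{P}} \setminus K_1) \cup K_2$.
(5) “$(\phi_1 \Leftarrow \phi_2)$” signifies $K_1 \cup (2^{\mathcal{P}} \setminus K_2)$.
(6) “$(\phi_1 \Leftrightarrow \phi_2)$” signifies $(K_1 \cap K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cap (2^{\mathcal{P}} \setminus K_2)) = ((2^{\mathcal{P}} \setminus K_1) \cup K_2) \cap (K_1 \cup (2^{\mathcal{P}} \setminus K_2))$.
(7) “$(\phi_1 \uparrow \phi_2)$” signifies $2^{\mathcal{P}} \setminus (K_1 \cap K_2)$.
(8) “$(\phi_1 \downarrow \phi_2)$” signifies $2^{\mathcal{P}} \setminus (K_1 \cup K_2)$.
(9) “$(\phi_1 \triangle \phi_2)$” signifies $(K_1 \setminus K_2) \cup (K_2 \setminus K_1) = (K_1 \cup K_2) \setminus (K_1 \cap K_2)$.


3.12.3 NOTATION [MM]: Infix logical expression space.
$\mathcal{W}^0(\mathcal{N}, O_0, O_1, O_2)$, for a proposition name space $\mathcal{N}$ and operator classes $O_0$, $O_1$ and $O_2$, denotes the set of
all infix logical expressions $\phi$ as indicated in Definition 3.12.2.


$\mathcal{W}^0$ denotes $\mathcal{W}^0(\mathcal{N}, O_0, O_1, O_2)$ with parameters $\mathcal{N}$, $O_0$, $O_1$ and $O_2$ implied in the context.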


3.12.4 NOTATION[MM]: Infix logical expression interpretation map. 

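$\mathcal{I}_{\mathcal{W}^0;\,\mathcal{N},\mathcal{P},\mu,O_0,O_1,O_2}$, for a proposition name space $\mathcal{N}$, a concrete proposition domain $\mathcal{P}$, a proposition name
map $\mu : \mathcal{N} \to \mathcal{P}$, and operator classes $O_0$, $O_1$ and $O_2$, denotes the interpretation map from the infix logical
expression space $\mathcal{W}^0(\mathcal{N}, O_0, O_1, O_2)$ to $\mathbb{P}(2^{\mathcal{P}})$ as indicated in Definition 3.12.2.


$\mathcal{I}_{\mathcal{W}^0}$ denotes $\mathcal{I}_{\mathcal{W}^0;\,\mathcal{N},\mathcal{P},\mu,O_0,O_1,O_2}$ with parameters $\mathcal{N}$, $\mathcal{P}$, $\mu$, $O_0$, $O_1$ and $O_2$ implied in the context.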


3.12.5 REMARK: Closure of infix logical expression spaces under substitution.
Theorem 3.12.6 asserts that infix logical expression spaces are closed under substitution operations. (See
Notation 3.10.8 for the uniform substitution notation.)


3.12.6 THEOREM[MM]: Substitution of expressions for proposition names in infix logical expressions. 

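Let $\mathcal{P}$ be a concrete proposition domain with name map $\mu : \mathcal{N} \to \mathcal{P}$. Let $\phi_0, \phi_1, \ldots \phi_n \in \mathcal{W}^{\circ}(\mathcal{N}, O_0, O_1, O_2)$
be infix logical expressions on $\mathcal{P}$, and let $p_1, \ldots p_n \in \mathcal{N}$ be proposition names, where $n$ is a non-negative
integer. Then $\mathrm{Subs}(\phi_0; p_1[\phi_1], \ldots p_n[\phi_n]) \in \mathcal{W}^{\circ}(\mathcal{N}, O_0, O_1, O_2)$.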


PROOF: The proof follows the pattern of Theorem 3.11.9.


3.12.7 REMARK: Disadvantages of infix syntax.

Syntax conditions (i)-(v) in Definition 3.12.2 apply to the fully parenthesised infix syntax which is defined 
much more easily by the corresponding recursive syntax conditions (i)-(ii) in Definition 3.9.12. Abbreviated 
versions of infix syntax are much more difficult to specify non-recursively. Infix syntax can be somewhat 
cumbersome to write metamathematical proofs for. 


Operators of any arity (such as ternary and quaternary operators) are easy to define for postfix and prefix 
syntax styles, but this is not true for the infix syntax style. This is a further disadvantage of infix notation. 


3.12.8 REMARK: Infix logical subexpressions. 

A symbol substring of $\phi$ which is restricted to the interval $[i_1, i_2] \subseteq \mathbb{Z}_n$ by Definition 3.12.2 syntax conditions
(iii) (1)-(2) is itself a syntactically correct infix logical expression. It is clear that such a substring (shifted
$i_1$ symbols to the left) satisfies conditions (iii) (3)-(5) if $\phi$ satisfies those conditions. These infix logical
subexpressions of $\phi$ may be classified according to which of the three conditions are satisfied.


(1) Subexpressions which satisfy Definition 3.12.2 condition (iii) (3) are either proposition names or nullary 
operators. 

(2) Subexpressions which satisfy Definition 3.12.2 condition (iii) (4) have the unary expression form in 
Definition 3.9.12 line (1). 


(3) Subexpressions which satisfy Definition 3.12.2 condition (iii) (5) have the binary expression form in 
Definition 3.9.12 lines (2)-(9). 


3.12.9 REMARK: The principal operator and matching parentheses in an infix logical subexpression. 
Let $\phi$ be an infix logical expression according to Definition 3.12.2, with domain $\mathbb{Z}_n$. For any $i \in \mathbb{Z}_n$ which
satisfies $\phi(i) \in O_1 \cup O_2 \cup B^+ \cup B^-$, the matching left and right parentheses are defined respectively by

$$\ell(i) = \max\{j \in \mathbb{Z}_n;\ j < i \text{ and } B(\phi, j) = B(\phi, i)\} \quad \text{if } \phi(i) \in O_1 \cup O_2 \cup B^-$$
and
$$r(i) = \min\{j \in \mathbb{Z}_n;\ j > i \text{ and } B(\phi, j) = B(\phi, i)\} \quad \text{if } \phi(i) \in O_1 \cup O_2 \cup B^+.$$
Then the following properties are satisfied by $\phi$.
(1) $\forall i \in \mathbb{Z}_n$, ($\phi_i \in O_1 \cup O_2$ implies ($\phi_{\ell(i)}$ = “(” and $\phi_{r(i)}$ = “)”)).
(2) $\forall i \in \mathbb{Z}_n$, ($\phi_i$ = “(” implies $\phi_{r(i)} \in O_1 \cup O_2$).
(3) $\forall i \in \mathbb{Z}_n$, ($\phi_i$ = “)” implies $\phi_{\ell(i)} \in O_1 \cup O_2$).


These properties imply that there is a unique unary or binary operator between any two matching
parentheses. The substring between (and including) matching parentheses is an infix logical subexpression.
3.12.10 REMARK: Outer versus inner parenthesisation for infix logical expressions.

An alternative to the “outer parenthesisation style” for infix logical expressions in Definition 3.12.2 semantic
conditions (iv)-(v) is the “inner parenthesisation style” as follows.

(iv') For any $p \in \mathcal{N}$, the logical expression “$p$” signifies $\{\tau \in 2^{\mathcal{P}};\ \tau(\mu(p)) = \mathrm{T}\}$.

(v') For any logical expressions $\phi_1$, $\phi_2$ signifying $K_1$, $K_2$ on $\mathcal{P}$, the following logical expressions have the
following meanings.

(1') “$\neg(\phi_1)$” signifies $2^{\mathcal{P}} \setminus K_1$.

(2') “$(\phi_1) \wedge (\phi_2)$” signifies $K_1 \cap K_2$.

(3') “$(\phi_1) \vee (\phi_2)$” signifies $K_1 \cup K_2$.

And so forth. Then the outer-style example “$((\neg(p_1 \vee (\neg p_2))) \Rightarrow p_3)$” in Figure 3.11.1 becomes the inner-style
expression “$(\neg((p_1) \vee (\neg(p_2)))) \Rightarrow (p_3)$”. The inner style has an extra two parentheses for each binary
operator in the expression. This makes it even more unreadable than the outer style.



3.12.11 REMARK: Many difficulties arise in logic because logical expressions imitate natural language. 

A conclusion which may be drawn from Section 3.11 is that many of the difficulties of logic arise from the way 
in which knowledge is typically described in natural language. In other words, the placing of constraints on 
combinations of truth values of propositions does not inherently suffer from the complexities and ambiguities 
of the forms of natural language used to communicate those constraints. The features of natural language 
which are imitated in propositional logic include the use of recursive-binary logical expressions, and the use 
of infix syntax. 


An even greater range of difficulties arises when the argumentative method is applied to logic in Chapter 4. 
The very long tradition of seeking to extend knowledge from the known to the unknown by logical
argumentation has resulted in a formalist style of modern logic which focuses excessively on the language of
argumentation while relegating semantics to a secondary role. Knowledge is sometimes identified in the logic 
literature with the totality of propositions which can be successfully argued for, using rules which are only 
distantly related to their meaning. The rules of logic are justifiable only if they lead to valid results. The 
rules of logic must be tested against the validity of the consequences, not vice versa. 


3.12.12 REMARK: Closure of infix logical expression spaces under substitution. 
In the case of infix logical expressions, substitutions are the same as for postfix and prefix logical expressions. 
(See Theorems 3.11.9 and 3.11.16.) 


3.12.13 REMARK: The functional alternative to symbol-string logical expressions. 

For metamathematical proofs, in view of the difficulties of manipulating natural-language imitating symbol- 
strings, it seems preferable to use function notation. Thus instead of the symbol-strings in Definitions 3.9.12, 
3.11.4 and 3.11.12, one may define logical expressions in terms of functions. In functional form, the example 
in Figure 3.11.1 in Remark 3.11.1 would have the form 


$$\phi = f_2(\text{“}\Rightarrow\text{”},\ f_1(\text{“}\neg\text{”},\ f_2(\text{“}\vee\text{”},\ \text{“}p_1\text{”},\ f_1(\text{“}\neg\text{”},\ \text{“}p_2\text{”}))),\ \text{“}p_3\text{”}), \qquad (3.12.1)$$


where the functions $f_1$ and $f_2$ are defined according to some syntax style. For example, in the infix syntax
style, $f_1(o_1, \phi_1)$ expands to “$o_1(\phi_1)$” and $f_2(o_2, \phi_2, \phi_3)$ expands to “$(\phi_2)o_2(\phi_3)$”. In the postfix syntax style,
$f_1(o_1, \phi_1)$ expands to “$\phi_1 o_1$” and $f_2(o_2, \phi_2, \phi_3)$ expands to “$\phi_2 \phi_3 o_2$”.


The observant reader will no doubt have noticed that the symbols in Equation (3.12.1) appear in the sequence
“$\Rightarrow\ \neg\ \vee\ p_1\ \neg\ p_2\ p_3$”, which (not) coincidentally is the same order as in the prefix syntax in Figure 3.11.1.
In fact, the conversion between functional notation and prefix notation is very easy to automate. The prefix 
and functional notations are well suited to proofs of metamathematical theorems. This suggests that for 
facilitating such proofs, the prefix or functional notations should be preferred. 
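
The ease of converting functional notation into the symbol-string styles can be sketched in a few lines of Python. This is only an illustration: the tuple representation and the helper names `f1`, `f2`, `to_prefix`, `to_postfix`, `to_infix_outer` and `to_infix_inner` are assumptions made for this sketch, not notation defined in this book.

```python
# Illustrative sketch only. Expressions are nested tuples (operator, operand, ...);
# proposition names are plain strings. All helper names are hypothetical.

def f1(op, a):
    """Build a unary expression node."""
    return (op, a)

def f2(op, a, b):
    """Build a binary expression node."""
    return (op, a, b)

def to_prefix(e):
    """Operator first, then the operands, with no parentheses."""
    if isinstance(e, str):
        return e
    op, *args = e
    return " ".join([op] + [to_prefix(a) for a in args])

def to_postfix(e):
    """Operands first, then the operator, with no parentheses."""
    if isinstance(e, str):
        return e
    op, *args = e
    return " ".join([to_postfix(a) for a in args] + [op])

def to_infix_outer(e):
    """Outer parenthesisation style: (op a) and (a op b)."""
    if isinstance(e, str):
        return e
    if len(e) == 2:
        return "(" + e[0] + to_infix_outer(e[1]) + ")"
    return "(" + to_infix_outer(e[1]) + " " + e[0] + " " + to_infix_outer(e[2]) + ")"

def to_infix_inner(e):
    """Inner parenthesisation style: op(a) and (a) op (b)."""
    if isinstance(e, str):
        return e
    if len(e) == 2:
        return e[0] + "(" + to_infix_inner(e[1]) + ")"
    return "(" + to_infix_inner(e[1]) + ") " + e[0] + " (" + to_infix_inner(e[2]) + ")"

# The example expression in functional form.
phi = f2("⇒", f1("¬", f2("∨", "p1", f1("¬", "p2"))), "p3")

for render in (to_prefix, to_postfix, to_infix_outer, to_infix_inner):
    print(render.__name__, ":", render(phi))
```

The prefix renderer simply reads off the symbols of the functional form from left to right, which is why that conversion is the easiest one to automate.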
3.13. Tautologies, contradictions and substitution theorems


3.13.1 REMARK: The definition of tautologies and contradictions. 

Definition 3.13.2 means that a “contradiction” is a logical expression which is equivalent to the always-false 
logical expression (because it signifies the same knowledge set), and a “tautology” is a logical expression 
which is equivalent to the always-true logical expression (for the same reason). In other words, a logical 
expression $\phi$ is a contradiction if $\phi \equiv$ “$\bot$”. It is a tautology if $\phi \equiv$ “$\top$”. (See Definition 3.6.16 for the
interpretations of “$\bot$” and “$\top$”.)


3.13.2 DEFINITION [MM]: Let P be a concrete proposition domain. 


(i) A contradiction on $\mathcal{P}$ is a logical expression on $\mathcal{P}$ which signifies the knowledge set $\emptyset$.
(ii) A tautology on $\mathcal{P}$ is a logical expression on $\mathcal{P}$ which signifies the knowledge set $2^{\mathcal{P}}$.


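3.13.3 THEOREM [MM]: Some useful tautologies for propositional calculus axiomatisation.
The following logical expressions are tautologies for any given logical expressions $\phi_1$, $\phi_2$ and $\phi_3$.
(i) “$\phi_1 \Rightarrow (\phi_2 \Rightarrow \phi_1)$”.
(ii) “$(\phi_1 \Rightarrow (\phi_2 \Rightarrow \phi_3)) \Rightarrow ((\phi_1 \Rightarrow \phi_2) \Rightarrow (\phi_1 \Rightarrow \phi_3))$”.
(iii) “$(\neg\phi_2 \Rightarrow \neg\phi_1) \Rightarrow (\phi_1 \Rightarrow \phi_2)$”.
PROOF: To verify these claims, let $\mathcal{P}$ be the concrete proposition domain on which $\phi_1$, $\phi_2$ and $\phi_3$ are
defined, and let $K_1$, $K_2$ and $K_3$ be the respective knowledge sets.


To verify (i), note that “$\phi_2 \Rightarrow \phi_1$” signifies the knowledge set $(2^{\mathcal{P}} \setminus K_2) \cup K_1$ by Definition 3.9.12 (ii) (4).
Therefore “$\phi_1 \Rightarrow (\phi_2 \Rightarrow \phi_1)$” signifies the knowledge set $(2^{\mathcal{P}} \setminus K_1) \cup ((2^{\mathcal{P}} \setminus K_2) \cup K_1)$ by the same definition.
By the commutativity and associativity of the union operation, this set equals $((2^{\mathcal{P}} \setminus K_1) \cup K_1) \cup (2^{\mathcal{P}} \setminus K_2)$,
which equals $2^{\mathcal{P}} \cup (2^{\mathcal{P}} \setminus K_2)$, which equals $2^{\mathcal{P}}$. Hence “$\phi_1 \Rightarrow (\phi_2 \Rightarrow \phi_1)$” is a tautology by Definition 3.13.2 (ii).
To verify (ii), it follows from Definition 3.9.12 (ii) (4) that “$(\phi_1 \Rightarrow \phi_2) \Rightarrow (\phi_1 \Rightarrow \phi_3)$” signifies the set

$$\begin{aligned}
(2^{\mathcal{P}} \setminus ((2^{\mathcal{P}} \setminus K_1) \cup K_2)) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3)
&= (K_1 \cap (2^{\mathcal{P}} \setminus K_2)) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3) \\
&= (K_1 \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3)) \cap ((2^{\mathcal{P}} \setminus K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3)) \\
&= (2^{\mathcal{P}} \cup K_3) \cap ((2^{\mathcal{P}} \setminus K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3)) \\
&= 2^{\mathcal{P}} \cap ((2^{\mathcal{P}} \setminus K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3)) \\
&= (2^{\mathcal{P}} \setminus K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3),
\end{aligned}$$

while “$\phi_1 \Rightarrow (\phi_2 \Rightarrow \phi_3)$” signifies the same knowledge set $(2^{\mathcal{P}} \setminus K_1) \cup ((2^{\mathcal{P}} \setminus K_2) \cup K_3)$. Therefore the logical
expression “$(\phi_1 \Rightarrow (\phi_2 \Rightarrow \phi_3)) \Rightarrow ((\phi_1 \Rightarrow \phi_2) \Rightarrow (\phi_1 \Rightarrow \phi_3))$” signifies the knowledge set

$$(2^{\mathcal{P}} \setminus ((2^{\mathcal{P}} \setminus K_2) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3))) \cup ((K_1 \cap (2^{\mathcal{P}} \setminus K_2)) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_3)) = 2^{\mathcal{P}}.$$

Hence the logical expression “$(\phi_1 \Rightarrow (\phi_2 \Rightarrow \phi_3)) \Rightarrow ((\phi_1 \Rightarrow \phi_2) \Rightarrow (\phi_1 \Rightarrow \phi_3))$” is a tautology.
For part (iii), “$\neg\phi_2 \Rightarrow \neg\phi_1$” signifies the knowledge set $(2^{\mathcal{P}} \setminus (2^{\mathcal{P}} \setminus K_2)) \cup (2^{\mathcal{P}} \setminus K_1)$ by Definition 3.9.12
parts (ii) (1) and (ii) (4), which equals $K_2 \cup (2^{\mathcal{P}} \setminus K_1)$, whereas “$(\phi_1 \Rightarrow \phi_2)$” signifies $(2^{\mathcal{P}} \setminus K_1) \cup K_2$, which is the
same set. Therefore “$(\neg\phi_2 \Rightarrow \neg\phi_1) \Rightarrow (\phi_1 \Rightarrow \phi_2)$” signifies $(2^{\mathcal{P}} \setminus ((2^{\mathcal{P}} \setminus K_1) \cup K_2)) \cup ((2^{\mathcal{P}} \setminus K_1) \cup K_2) = 2^{\mathcal{P}}$.
Hence “$(\neg\phi_2 \Rightarrow \neg\phi_1) \Rightarrow (\phi_1 \Rightarrow \phi_2)$” is a tautology.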
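
The set identities used in this proof can be checked mechanically for a small domain. The following Python sketch assumes a concrete proposition domain with just two propositions, so that there are four truth value maps and sixteen knowledge sets, and it tests all three expressions for every choice of knowledge sets K1, K2, K3. The names `UNIVERSE`, `neg`, `implies` and `all_knowledge_sets` are hypothetical helpers for this illustration. A finite check of this kind is a sanity test, not a replacement for the argument above.

```python
from itertools import combinations, product

# All truth value maps on a two-proposition domain, each stored as a tuple of booleans.
UNIVERSE = frozenset(product((False, True), repeat=2))

def neg(k):
    """Knowledge set signified by a negation: the complement within UNIVERSE."""
    return UNIVERSE - k

def implies(k1, k2):
    """Knowledge set signified by an implication: (complement of k1) union k2."""
    return (UNIVERSE - k1) | k2

def all_knowledge_sets():
    """Every subset of UNIVERSE (sixteen of them for two propositions)."""
    elems = sorted(UNIVERSE)
    for r in range(len(elems) + 1):
        for c in combinations(elems, r):
            yield frozenset(c)

def expr_i(k1, k2, k3):
    return implies(k1, implies(k2, k1))

def expr_ii(k1, k2, k3):
    return implies(implies(k1, implies(k2, k3)),
                   implies(implies(k1, k2), implies(k1, k3)))

def expr_iii(k1, k2, k3):
    return implies(implies(neg(k2), neg(k1)), implies(k1, k2))

SETS = list(all_knowledge_sets())
results = {
    name: all(f(k1, k2, k3) == UNIVERSE
              for k1 in SETS for k2 in SETS for k3 in SETS)
    for name, f in (("i", expr_i), ("ii", expr_ii), ("iii", expr_iii))
}
print(results)
```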


3.13.4 REMARK:  Metatheorems are not theorems. 

Theorem 3.13.3 is not a real theorem. It is only “proved” by informal arguments which are not based on 
formal axioms of any kind. Therefore it is tagged with the letters “MM” as a warning that it is “only 
metamathematics”. Real mathematics follows rules. Metamathematics is the discussion which sets the rules 
and judges the consequences of the rules. 


3.13.5 REMARK: Applications of tautologies. 
The fact that the three logical expressions in Theorem 3.13.3 signify the set $2^{\mathcal{P}}$ implies that they place no
constraint at all on the truth values of propositions in $\mathcal{P}$. In this sense, tautologies convey no information.
The value of such tautologies lies in the fact that any logical expressions $\phi_1$, $\phi_2$ and $\phi_3$ may be substituted
into them to give guaranteed valid compound propositions. Such substituted tautologies may then be applied 
in combination with “nonlogical axioms” to determine the logical consequences of such nonlogical axioms. 


The three tautologies in Theorem 3.13.3 are the basis of the axiomatisation of propositional calculus in 
Definition 4.4.3. The tedious nature of the above proofs using knowledge sets provides some motivation for 
the development of a propositional calculus. The purpose of a pure propositional calculus (which has no 
“nonlogical axioms”) is to generate all possible tautologies from a small number of basic tautologies. This 
is to some extent reminiscent of the way in which a linear space may be spanned by a set of basis vectors, 
or a group may be generated by a set of generator elements. 


3.13.6 REMARK: Always-false and always-true nullary logical operators in compound expressions. 

The always-false and always-true logical expressions in Definition 3.6.16 and Notation 3.6.15 may be used 
in compound logical expressions in the same way as other logical expressions. For example, the expressions 
“$\bot$”, “$\neg\top$”, “$A \wedge \bot$” and “$\neg(A \vee \top)$” are all valid logical expressions if $A$ is a proposition name. In fact,
these are all contradictions, no matter which knowledge set is signified by $A$. Similarly, the expressions “$\top$”,
“$\neg\bot$”, “$\neg(A \wedge \bot)$” and “$A \vee \top$” are all valid logical expressions which are tautologies, no matter which
knowledge set is signified by A. 


3.13.7 REMARK:  Tautologies and contradictions are independent of proposition name substitutions. 

If a logical expression is a tautology, it remains a tautology if the proposition names in the logical expression
are uniformly substituted with other proposition names. Thus if each occurrence of a proposition name $p$
appearing in a logical expression is substituted with a proposition name $\bar{p}$, the logical expression remains a
tautology. In other words, the tautology property depends only on the form of the logical expression, not
on the particular concrete propositions to which it refers. This also applies to contradictions.


It is not entirely obvious that this should be so. For example, the expression “$p_1 \Rightarrow (p_2 \Rightarrow p_1)$” is a tautology
for any proposition names $p_1, p_2 \in \mathcal{N}$, for a name map $\mu : \mathcal{N} \to \mathcal{P}$. This follows from the interpretation
$(2^{\mathcal{P}} \setminus K_{p_1}) \cup ((2^{\mathcal{P}} \setminus K_{p_2}) \cup K_{p_1})$, which equals $2^{\mathcal{P}}$. It seems from the form of proof that the validity does
not depend on the choice of proposition names. So one would expect that the proposition names may be
substituted with any other names. But arguments based on the form of a proof are intuitive and unreliable.
Asserting that the form of a natural-language proof has some attributes would be meta-metamathematical.


If one substitutes $p_3$ for only the first appearance of $p_1$, the resulting expression “$p_3 \Rightarrow (p_2 \Rightarrow p_1)$” is not a
tautology. So uniformity of substitution is important. Using Definition 3.10.7 for uniform substitution, it
should be possible to provide a merely metamathematical proof that uniform substitution converts tautologies
into tautologies. This is stated as (unproved) Propositions 3.13.8 and 3.13.10.


3.13.8 PROPOSITION[MM]: Substitution of proposition names into tautologies. 
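Let $\mathcal{P}$ be a concrete proposition domain with name map $\mu : \mathcal{N} \to \mathcal{P}$. Let $\phi_0 \in \mathcal{W}^{\circ}(\mathcal{N}, O_0, O_1, O_2)$ be an
infix logical expression on $\mathcal{P}$, let $p_1, \ldots p_n \in \mathcal{N}$ be distinct proposition names, and let $\bar{p}_1, \ldots \bar{p}_n \in \mathcal{N}$ be
proposition names, where $n$ is a non-negative integer. If $\phi_0$ is a tautology on $\mathcal{P}$, then the substituted symbol
string $\mathrm{Subs}(\phi_0; p_1[\text{“}\bar{p}_1\text{”}], \ldots p_n[\text{“}\bar{p}_n\text{”}]) \in \mathcal{W}^{\circ}(\mathcal{N}, O_0, O_1, O_2)$ is a tautology on $\mathcal{P}$.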


3.13.9 REMARK: Substitution of logical expressions into tautologies yields more tautologies. 

It may be observed that when any of the letters in a tautology are uniformly substituted with arbitrary 
logical expressions, the logical expression which results from the substitution is itself a tautology. However, 
this conjecture requires proof, even if the proof is only informal. 


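3.13.10 PROPOSITION [MM]: Substitution of logical expressions into tautologies.

Let $\mathcal{P}$ be a concrete proposition domain with name map $\mu : \mathcal{N} \to \mathcal{P}$. Let $\phi_0, \phi_1, \ldots \phi_n \in \mathcal{W}^{\circ}(\mathcal{N}, O_0, O_1, O_2)$
be infix logical expressions on $\mathcal{P}$, and let $p_1, \ldots p_n \in \mathcal{N}$ be distinct proposition names, where $n$ is a
non-negative integer. If $\phi_0$ is a tautology on $\mathcal{P}$, then $\mathrm{Subs}(\phi_0; p_1[\phi_1], \ldots p_n[\phi_n]) \in \mathcal{W}^{\circ}(\mathcal{N}, O_0, O_1, O_2)$ is a
tautology on $\mathcal{P}$.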
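
Although these two propositions are left unproved, individual instances are easy to test by brute force. The sketch below is only an experiment, not a proof. It assumes a tuple representation of expressions with the operator labels "not" and "implies", and the helper names `evaluate`, `is_tautology` and `subs` are hypothetical. The substitution is simultaneous: every occurrence of a name is replaced in one pass, and the replacements are not themselves rescanned.

```python
from itertools import product

def names_in(e):
    """Set of proposition names occurring in an expression."""
    if isinstance(e, str):
        return {e}
    return set().union(*(names_in(a) for a in e[1:]))

def evaluate(e, tau):
    """Truth value of expression e under the truth value map tau (a dict)."""
    if isinstance(e, str):
        return tau[e]
    if e[0] == "not":
        return not evaluate(e[1], tau)
    if e[0] == "implies":
        return (not evaluate(e[1], tau)) or evaluate(e[2], tau)
    raise ValueError("unknown operator: " + str(e[0]))

def is_tautology(e):
    """True if no truth value map on the names of e is excluded by e."""
    names = sorted(names_in(e))
    return all(evaluate(e, dict(zip(names, vals)))
               for vals in product((False, True), repeat=len(names)))

def subs(e, mapping):
    """Uniform, simultaneous substitution of expressions for proposition names."""
    if isinstance(e, str):
        return mapping.get(e, e)
    return (e[0],) + tuple(subs(a, mapping) for a in e[1:])

phi = ("implies", "p1", ("implies", "p2", "p1"))

# Uniform substitution of a name for a name.
renamed = subs(phi, {"p1": "p3"})

# Non-uniform substitution: only the first occurrence of p1 is changed by hand.
non_uniform = ("implies", "p3", ("implies", "p2", "p1"))

# Uniform substitution of compound expressions for names.
compound = subs(phi, {"p1": ("not", "p2"), "p2": ("implies", "p1", "p3")})

print(is_tautology(phi), is_tautology(renamed),
      is_tautology(non_uniform), is_tautology(compound))
```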
3.14. Interpretation of logical operations as knowledge set operations


3.14.1 REMARK: Knowledge sets supply an interpretation for logic operations. 

According to the perspective presented here, logical operations on symbol-strings have a secondary role. 
They provide a convenient and efficient way of describing knowledge (or ignorance) of the truth values of 
sets of propositions. Logical expressions are “navigation instructions” which tell you how to construct a set 
of non-excluded truth value maps. 


Specifying a set of non-excluded truth value maps with a logical expression is analogous to describing a 
region of Cartesian space in terms of unions and intersections of hyperplanes. The region is the primary 
object, but the constraints are sometimes a convenient, efficient way to specify which points are in the region. 
However, as mentioned in Examples 3.15.3 and 3.15.4, binary logical operations can be extremely inefficient 
for describing some kinds of knowledge sets, in the same way that linear constraints are extremely inefficient 
for describing a spherical region. 


Logic is not arbitrary. The axioms of logic may not be chosen at will. The axioms must generate the correct 
properties of set-operations on knowledge sets. In this book, the universe of concrete propositions is regarded 
as primary, and the theory must correctly describe that universe. 


3.14.2 REMARK: A test-function analogy for knowledge sets as interpretations of logical operators. 

The original difficulties in the interpretation of the Dirac delta distribution concept were resolved by applying 
distributions to smooth test functions. Then derivatives of Dirac delta “functions” could be interpreted by 
applying differentiation to the test functions instead of the symbolic functions. By analogy, one may regard 
knowledge sets as “test functions” for logical operators. Then symbolic logical operations may be interpreted 
as concrete operations on knowledge sets. 


3.14.3 REMARK: Logic operations and set operations are duals of each other. 

The interpretation of logic operations in terms of set operations is somewhat suspicious because the set 
operations are themselves defined in terms of natural-language logic operations. So whether logic is defined 
in terms of sets or vice versa is a moot point. However, the framework proposed here has definite consequences 
for logic. For example, this framework implies (without any shadow of a doubt) that two logical expressions 
of the form $(\neg p_1) \vee p_2$ and $p_1 \Rightarrow p_2$ have exactly the same meaning. Similarly, logical expressions of the
form $p_1$ and $\neg(\neg p_1)$ have exactly the same meaning (without any shadow of a doubt). More generally, any
two logical expressions are considered to have identical meaning if and only if they put exactly the same 
constraints on the truth value map for the relevant concrete proposition domain. 


Although logic operations and set operations are apparently duals of each other, since set operations may 
be defined in terms of logical operations (via a naive “axiom of comprehension"), and logical operations 
may be defined in terms of set operations, nevertheless the set operations are more concrete. Whenever 
there is ambiguity, one may easily demonstrate set operations on finite sets in a very concrete way by listing 
those elements which are in a resultant set, and those which are not. By contrast, clarifying ambiguities in 
logic operations can be exasperating. (A well-known example of this is the difficulty of convincing a naive 
audience that a conditional proposition $P \Rightarrow Q$ is true if $P$ is false.)


It could be argued that symbolic logic expressions are more concrete than set expressions because symbolic 
logic manipulates sequences of symbols on lines and such manipulation can be checked automatically by 
computers because the rules are so objective that they can be programmed in software. However, the 
corresponding set operators are equally automatable in the case of finite sets of propositions, although for 
infinite sets, only symbolic logic can be automated. This is because a finite sequence of symbols may be 
agreed to refer to an infinite set. This is a somewhat illusory advantage for symbolic logic because sets can
be defined otherwise than by comprehensive lists of their elements. For example, very large sparse matrices 
are typically represented in computers by listing only the non-zero elements. If an infinite matrix has only 
a finite number of non-zero elements, clearly one may easily represent such a matrix in a finite computer. 
Similarly, the number $\pi$ may be represented symbolically in a computer rather than in terms of its infinite
binary or decimal expansion. If one continues to apply and extend such methods of representing infinite 
objects by finite objects in computer software, the result is not very much different to symbolic logic. The 
real difference is how the computer implementation is thought about. 


3.14.4 REMARK: Alternative representations for propositional logic expressions. 
The set operations corresponding to basic logic operations are essentially equivalent to the corresponding
truth tables. However, set operations are more closely aligned with human intuition than truth tables. These
ways of representing a logic operation are compared in the case of the proposition $A \Rightarrow B$ in Figure 3.14.1.


Natural language:  “If A is true, then B is true.”
Symbolic logic:    $A \Rightarrow B$

Knowledge set and truth table ($\circ$ = not excluded, truth value map is in $K$; $\times$ = excluded, truth value map is not in $K$):

    A    B    A ⇒ B    knowledge set
    F    F      T        $\circ$
    F    T      T        $\circ$
    T    F      F        $\times$
    T    T      T        $\circ$

Abbreviated knowledge set (rows A, columns B):

             B = F    B = T
    A = F     $\circ$       $\circ$
    A = T     $\times$       $\circ$

Figure 3.14.1 Comparison of presentation styles for a logic operation $A \Rightarrow B$


In the knowledge-set table in Figure 3.14.1, the symbol “$\times$” signifies that a truth-value-map possibility is
excluded. The symbol “$\circ$” signifies that a truth-value-map possibility is not excluded. This helps to make
clear the meanings of the entries in the right column of the truth table. Each “F” in the right column
signifies that the truth-value combination to the left is excluded. Each “T” in the right column signifies
that the truth-value tuple to the left is not excluded. Thus the knowledge represented by the compound
proposition “$A \Rightarrow B$” is very limited. It excludes only one of the four possibilities. It says nothing at all
about the other three possibilities. 


3.14.5 REMARK: Finite knowledge sets are equi-informational to truth tables. 

It seems clear from Figure 3.14.1 (in Remark 3.14.4) that there is a one-to-one association between finite 
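knowledge sets and truth tables. This association is easy to demonstrate by associating subsets $K$ of $2^{\mathcal{P}}$
with functions from $2^{\mathcal{P}}$ to $\{\mathrm{F}, \mathrm{T}\}$.


A knowledge set $K$ for a concrete proposition domain $\mathcal{P}$ may be written as $K = \{\tau \in 2^{\mathcal{P}};\ \psi_K(\tau) = \mathrm{T}\}$,
where $\psi_K$ is the unique function $\psi_K : 2^{\mathcal{P}} \to \{\mathrm{F}, \mathrm{T}\}$ such that for all $\tau \in 2^{\mathcal{P}}$, $\psi_K(\tau) = \mathrm{T}$ if and only if $\tau \in K$.
Conversely, any function $\psi : 2^{\mathcal{P}} \to \{\mathrm{F}, \mathrm{T}\}$ determines a unique knowledge set $K_\psi = \{\tau \in 2^{\mathcal{P}};\ \psi(\tau) = \mathrm{T}\}$.
Thus there is a one-to-one association between knowledge sets and functions from $2^{\mathcal{P}}$ to $\{\mathrm{F}, \mathrm{T}\}$. (This is
analogous to the “indicator function” concept in Section 14.7.)


A function $\psi : 2^{\mathcal{P}} \to \{\mathrm{F}, \mathrm{T}\}$ for finite $\mathcal{P}$ is essentially the same thing as a truth table, since a truth table
lists the values $\psi(\tau)$ in the right column for the truth value maps $\tau \in 2^{\mathcal{P}}$ which are listed in the left columns.
This applies to any finite concrete proposition space $\mathcal{P}$.


Each truth table for a fixed finite concrete proposition space $\mathcal{P}$ may be associated in a one-to-one way with
a function $\theta : \{\mathrm{F}, \mathrm{T}\}^n \to \{\mathrm{F}, \mathrm{T}\}$ if a fixed bijection $\alpha \in \mathcal{P}^n$ (i.e. $\alpha : \{1, 2, \ldots n\} \to \mathcal{P}$) is given for some
non-negative integer $n$. Such a bijection induces a unique bijection $R_\alpha : \{\mathrm{F}, \mathrm{T}\}^{\mathcal{P}} \to \{\mathrm{F}, \mathrm{T}\}^n$ which satisfies
$R_\alpha : f \mapsto f \circ \alpha$ for $f \in 2^{\mathcal{P}} = \{\mathrm{F}, \mathrm{T}\}^{\mathcal{P}}$. This fixed bijection $R_\alpha$ may be composed with each function
$\theta : \{\mathrm{F}, \mathrm{T}\}^n \to \{\mathrm{F}, \mathrm{T}\}$ by the rule $\theta \mapsto \psi_\theta = \theta \circ R_\alpha$. Then $\psi_\theta : \{\mathrm{F}, \mathrm{T}\}^{\mathcal{P}} \to \{\mathrm{F}, \mathrm{T}\}$ is a well-defined truth
table. Conversely, the maps $\psi : \{\mathrm{F}, \mathrm{T}\}^{\mathcal{P}} \to \{\mathrm{F}, \mathrm{T}\}$ are associated in a one-to-one manner with the composite
maps $\theta_\psi = \psi \circ R_\alpha^{-1} : \{\mathrm{F}, \mathrm{T}\}^n \to \{\mathrm{F}, \mathrm{T}\}$. The association is one-to-one because $R_\alpha$ is a bijection.

In summary, for a fixed bijection $\alpha : \{1, 2, \ldots n\} \to \mathcal{P}$, there is a unique one-to-one association between the
truth functions $\theta : \{\mathrm{F}, \mathrm{T}\}^n \to \{\mathrm{F}, \mathrm{T}\}$ in Definition 3.7.2 and the knowledge sets $K \subseteq 2^{\mathcal{P}}$ and truth table
maps $\psi : 2^{\mathcal{P}} \to \{\mathrm{F}, \mathrm{T}\}$. Therefore it is a matter of convenience which form of presentation is used to define
knowledge on a finite concrete proposition domain $\mathcal{P}$.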
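
The association between knowledge sets and truth tables can be made concrete in a short Python sketch. It assumes a two-proposition domain listed in a fixed order (the tuple `P` plays the role of the fixed bijection from indices to propositions), and the helper names `indicator` and `knowledge_set` are hypothetical. Truth value maps are stored as tuples of booleans in the order given by `P`.

```python
from itertools import combinations, product

P = ("A", "B")                                         # fixed enumeration of the domain
ALL_MAPS = list(product((False, True), repeat=len(P)))  # the four truth value maps

def indicator(K):
    """Truth-table function of a knowledge set: maps each truth value map to True/False."""
    return {tau: (tau in K) for tau in ALL_MAPS}

def knowledge_set(psi):
    """Knowledge set determined by a truth-table function."""
    return frozenset(tau for tau in ALL_MAPS if psi[tau])

# Knowledge set for "A implies B": every map except (A, B) = (True, False).
K_implies = frozenset(tau for tau in ALL_MAPS if (not tau[0]) or tau[1])

# Print the truth table, one row per truth value map.
for tau, value in indicator(K_implies).items():
    print(tau, "->", value)

# The two constructions are mutually inverse on every subset of ALL_MAPS.
round_trip_ok = all(
    knowledge_set(indicator(frozenset(c))) == frozenset(c)
    for r in range(len(ALL_MAPS) + 1)
    for c in combinations(ALL_MAPS, r)
)
print(round_trip_ok)
```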


3.14.6 REMARK: Knowledge has more the nature of exclusion than inclusion. 

It seems clear from Remarks 3.7.4 and 3.14.4 that knowledge is more accurately thought of in terms of 
exclusion of possibilities rather than inclusion. The knowledge in multiple knowledge sets is combined by 
intersecting them (as in Remark 3.4.10) because the set of possibilities which are excluded by the intersection 
of a collection of sets equals the union of the possibilities excluded by each individual set in the collection. 
In other words, knowledge accumulates by excluding more and more possibilities. 
Similarly, statistical inference only excludes hypotheses. Statistical inference cannot prove hypotheses. It can
only exclude hypotheses at some confidence level (assuming a framework of a-priori modelling assumptions). 
Thus statistical “confidence level” refers to the confidence of exclusion, not the probability of inclusion. 
One might perhaps draw a further parallel with Karl Popper’s philosophy of science, which famously asserts 
that science progresses by refuting conjectures, not by proving them. (See Popper [467].) Thus scientific 
knowledge progresses by accumulating exclusions, not by accumulating inclusions. 


3.14.7 REMARK:  Nonlogical axioms are knowledge sets. 

In the axiomatic approach to logic, each nonlogical axiom determines a knowledge set for a system. (For 
nonlogical or “proper” axioms, see for example E. Mendelson [370], page 57; Shoenfield [390], page 22; 
Kleene [366], page 26; Margaris [369], page 112.) In principle, the intersection of such knowledge sets should 
fully determine the system. (Note that a fixed universe of propositions is assumed to be specified here. 
Therefore the theory has a unique interpretation in terms of a model. So this does not contradict the fact
that theories do not in general have unique models.) In practice, however, determining the “intersection” of 
a set of axioms may be extremely difficult or even impossible when the set of propositions is infinite. 


3.15. Knowledge set specification with general logical operations 


3.15.1 REMARK: Possible infeasibility of specifying general knowledge sets with binary logical operators. 
Section 3.15 is concerned with fully general knowledge sets for finite concrete proposition domains. One 
issue which arises is whether the method of specifying knowledge sets by recursively invoking binary logic 
operations is adequate and efficient in the general case. 


3.15.2 REMARK: Options for describing general finite knowledge sets. 
One option for describing a general (finite) knowledge set $K \subseteq 2^{\mathcal{P}}$ is to exhaustively list its elements. This
option has the advantage of full generality, but gives little insight into the meaning of one's knowledge. 


In practical applications, one's knowledge is typically a conjunction of multiple pieces of knowledge of a 
relatively simple kind. This conjunction-of-a-list approach to logic has the great advantage that if some item 
on a knowledge list turns out to be false, it may be removed and the combined knowledge may be recalculated. 
(This is analogous to the way in which lists of constraints are managed in linear programming.) 


To describe knowledge which is more complex than simple binary operations, one could define notations and
methods for managing the $2^{(2^n)}$ general $n$-ary logical operations for $n \ge 3$ in the same way that the $2^{(2^2)} = 16$
general binary operations are presented in Section 3.8. This approach is unsuitable largely because human
beings do not think or talk using $n$-ary logical operations for $n \ge 3$. In particular, the English language
describes general logical relations by combining unary and binary logical operations recursively. 
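
The growth in the number of general n-ary logical operations can be tabulated directly. In this small Python sketch, the function name `num_operations` is a hypothetical label. An n-ary truth function assigns one of two values to each of the 2^n input tuples, which gives 2^(2^n) functions.

```python
def num_operations(n):
    """Number of truth functions of n arguments: one of 2 outputs for each of 2**n inputs."""
    return 2 ** (2 ** n)

for n in range(1, 6):
    print(n, num_operations(n))
```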


In practice, general n-ary logical operations are expressed as combinations of binary operations. For example, 
the assertion that “if the Sun is up and the Moon is up, then a solar eclipse is not possible” has the form 
$(p_1 \wedge p_2) \Rightarrow \neg p_3$, where $p_1$ = “the Sun is up”, $p_2$ = “the Moon is up” and $p_3$ = “a solar eclipse is possible”.
This assertion could in principle be described in terms of a ternary logical operation with its own eight-line
truth table. (In this special case, there is in fact an alternative to compound binary operations because the
assertion is equivalent to $\neg(p_1 \wedge p_2) \vee \neg p_3$, which is equivalent to $\neg p_1 \vee \neg p_2 \vee \neg p_3$, which may be expressed
as “at least one of these propositions is false”, but this is not strictly a ternary logical operation.)
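
The eight-line truth table for this example, and the equivalence with the "at least one is false" form, can be checked with a few lines of Python. The function names `compound` and `flattened` are hypothetical labels for the two forms of the assertion.

```python
from itertools import product

def compound(p1, p2, p3):
    """(p1 and p2) implies (not p3), written with the material conditional."""
    return (not (p1 and p2)) or (not p3)

def flattened(p1, p2, p3):
    """(not p1) or (not p2) or (not p3): at least one proposition is false."""
    return (not p1) or (not p2) or (not p3)

rows = list(product((False, True), repeat=3))          # the eight truth value tuples
equivalent = all(compound(*v) == flattened(*v) for v in rows)
excluded = [v for v in rows if not compound(*v)]        # possibilities ruled out

print(equivalent)
print(excluded)
```

Only the tuple in which all three propositions are true is excluded, so this knowledge set contains seven of the eight truth value maps.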


The most usual way of describing finite knowledge sets is by “compound propositions”, which are binary 
operations whose “terms” are themselves either concrete propositions or compound propositions, and so 
forth recursively. This approach is equivalent to extending a concrete proposition domain P by adding 
binary logical operations to the domain. Then binary operations are applied to this extended domain to 
obtain a further extension of the domain, and so forth indefinitely. Since each such compound proposition ¢ 
may be identified with a knowledge set $K_\phi = \{\tau \in 2^{\mathcal{P}};\ \phi(\tau) = \mathrm{T}\}$, the end result of such recursion is a
second-order proposition domain which may be identified with $2^{2^{\mathcal{P}}}$ if $\mathcal{P}$ is a finite set.


3.15.3 EXAMPLE: Simple binary logical operations are inadequate to describe some small knowledge sets. 

A knowledge set $K \subseteq 2^{\mathcal{P}}$ for a concrete proposition domain $\mathcal{P}$ cannot always be expressed as a single
binary logic operation, nor even as the conjunction of a sequence of binary logic operations. For example,
if $\mathcal{P} = \{p_1, p_2, p_3, p_4\}$, then the knowledge that precisely two of the four propositions are true (and the
other two are false) corresponds to the knowledge set $K = \{\tau \in 2^{\mathcal{P}};\ \#\{p \in \mathcal{P};\ \tau(p) = \mathrm{T}\} = 2\}$. (This set
contains $C^4_2 = 6$ elements. See Notation 14.9.3 for combination symbols.) This set $K$ cannot be described
as the conjunction or disjunction of any list of binary logic operations because for all pairs $X = \{p_i, p_j\}$ of
propositions chosen from $\mathcal{P} = \{p_1, p_2, p_3, p_4\}$, for all binary truth functions of $p_i$ and $p_j$, there is always at
least one combination of the remaining pair in $\mathcal{P} \setminus X$ which is possible, and at least one which is excluded.


The condition $\#\{p \in \mathcal{P};\ \tau(p) = \mathrm{T}\} = 2$ for $\mathcal{P} = \{p_1, p_2, p_3, p_4\}$ corresponds to the logical expression


((p1 ^ p2) = (^ps ^ —pa)) ^ ((pr ^ pa) € (ps ^ pa)) ^ (pi ^ ^p2) € (ps ^ pa)). 


This compound proposition does not give a natural description of the knowledge set. Recursively compounded
binary logical operations seem unsuitable for such sets. If this knowledge set example is written in disjunctive 
normal form, it requires 6 terms, but this representation also does not communicate the intended meaning of 
the knowledge set. Propositional logic is traditionally presented in terms of binary operations and compounds 
of binary operations possibly because such forms of logical expressions are prevalent in natural-language 
logic usage. However, the natural-language-inspired compounded binary logic expressions are inefficient and 
clumsy for many situations. 
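
The "exactly two of four" knowledge set is small enough to enumerate. The Python sketch below builds it, counts its elements (which is also the number of terms in disjunctive normal form), and tests one reading of the claim about pairs: for every pair of propositions and every assignment of truth values to that pair, the remaining two propositions have at least one combination which is possible and at least one which is excluded. The helper name `pair_leaves_rest_undetermined` is hypothetical, and the reading of the claim is an assumption of this sketch.

```python
from itertools import combinations, product
from math import comb

N = 4
ALL_MAPS = list(product((False, True), repeat=N))      # the 16 truth value maps
K = frozenset(t for t in ALL_MAPS if sum(t) == 2)       # exactly two propositions true

def pair_leaves_rest_undetermined(i, j):
    """For each assignment to propositions i and j, check that the remaining pair
    has both a non-excluded and an excluded combination."""
    for a, b in product((False, True), repeat=2):
        status = [t in K for t in ALL_MAPS if t[i] == a and t[j] == b]
        if all(status) or not any(status):
            return False
    return True

dnf_terms = len(K)                                      # one conjunction per element of K
all_pairs_ok = all(pair_leaves_rest_undetermined(i, j)
                   for i, j in combinations(range(N), 2))

print(dnf_terms, comb(4, 2), all_pairs_ok)
```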


3.15.4 EXAMPLE: Recursive binary logical operations are unsuitable for most large knowledge sets. 

As a more extreme example, suppose one knows that 70 out of a set of 100 propositions are true (and the rest 
are false). Then disjunctive normal form requires $C^{100}_{70}$ = 29,372,339,821,610,944,823,963,760 terms, which
is clearly too many for practical purposes, whereas one may specify such a knowledge set fairly compactly in 
terms of the existence of a bijection from the first 70 integers to the set of propositions. (Such an example 
is not entirely artificial. For example, a test may have 100 questions, and the assessment may state that 70 
of the answers are correct. Then one knows that 70 of the propositions of the candidate are true and 30 are 
false. Similarly, one might know that the score is in the range from 70 to 80, which makes the knowledge 
set structure even more difficult to describe with compound binary logic operations.) 


As a slightly more natural approach to this “70 out of 100” example, one could attempt a divide-and-conquer
strategy to generate a compound expression. If $\tau(p_1) = \mathrm{T}$, then 69 of the remaining 99 propositions must
be true. If $\tau(p_1) = \mathrm{F}$, then 70 of the remaining propositions must be true. So the logical operation may
be written as $(p_1 \wedge P(99, 69)) \vee (\neg p_1 \wedge P(99, 70))$, where $P(m, k)$ means that $k$ of the last $m$ propositions
are true. Then $P(100, 70)$ may be expanded recursively into a binary tree. Unfortunately, this tree has
approximately $C^{100}_{70}$ terms. So it is not more compact than disjunctive normal form or an exhaustive listing
of a truth table, but it does have the small advantage that it gives some insight. However, it still requires
many trillions of terabytes of disk space to store the result.
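
The term counts in this example can be reproduced with exact integer arithmetic. The sketch below is only a counting argument. The helper `leaves(m, k)` is hypothetical: it counts the leaves of the tree obtained by expanding P(m, k) into (p ∧ P(m−1, k−1)) ∨ (¬p ∧ P(m−1, k)), under the assumption that expansion stops when k = 0 or k = m, where only one truth value map remains.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def leaves(m, k):
    # Number of leaves of the fully expanded binary tree for P(m, k).
    if k < 0 or k > m:
        return 0          # no truth value map satisfies the condition
    if k == 0 or k == m:
        return 1          # all remaining propositions false, or all true
    # First proposition true (k-1 more needed) or false (k still needed).
    return leaves(m - 1, k - 1) + leaves(m - 1, k)

dnf_terms = comb(100, 70)        # terms in disjunctive normal form
tree_leaves = leaves(100, 70)    # leaves of the divide-and-conquer tree

print(dnf_terms)
print(tree_leaves == dnf_terms)
```

The second printed line is `True`: the recursion satisfies the same rule as the binomial coefficients, so the expanded tree has as many leaves as the disjunctive normal form has terms.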


Another strategy to deal with the “70 out of 100” example is to recursively split the set of propositions in the
middle. Thus $Q(1, 100, 70)$ may be written as $\bigvee_{k=20}^{50} (Q(1, 50, k) \wedge Q(51, 100, 70 - k))$, where $Q(i, j, k)$ means
the assertion that exactly $k$ of the propositions from $p_i$ to $p_j$ inclusive are true. This strategy also fails to
deliver a more compact logical expression.
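
The range of k in the split-in-the-middle strategy, and the number of truth value maps which the disjuncts must cover between them, can be checked as follows. This is a minimal sketch which assumes that each half contains exactly 50 propositions. The variable names are illustrative.

```python
from math import comb

# k is the number of true propositions among p_1..p_50.  It must leave a
# feasible count 70 - k for the 50 propositions p_51..p_100.
ks = [k for k in range(0, 51) if 0 <= 70 - k <= 50]

# The disjunct Q(1,50,k) and Q(51,100,70-k) is satisfied by
# comb(50, k) * comb(50, 70 - k) truth value maps.
covered = sum(comb(50, k) * comb(50, 70 - k) for k in ks)

print(ks[0], ks[-1], covered == comb(100, 70))
```

This prints `20 50 True`. The disjuncts together account for every one of the truth value maps in the knowledge set, so a full expansion down to individual propositions cannot be smaller than the earlier forms.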


3.15.5 REMARK: Limited practical applicability of traditional binary-operation logic expressions. 

One may conclude from Examples 3.15.3 and 3.15.4 that compounds of binary logical operations are not 
satisfactory for completely general propositional logic, although they are often presented in elementary texts 
as the sole fundamental basis. 


3.15.6 REMARK: Arithmetic and min/max expressions for logical operators.

Some logical operators are expressed in Table 3.15.1 in terms of arithmetic relations among the values of
the function $\tau$ from the set of proposition names to the set of integers $\{0, 1\}$ as defined by $\tau(P) = 0$ if $P$ is
false and $\tau(P) = 1$ if $P$ is true.


The cases in Table 3.15.1 have been chosen so that both the logical and arithmetic expressions are simple. 
Most often, however, one style of expression may be short and simple while the other is long and complicated. 
As mentioned in Example 3.15.4, the mismatch can be huge. 

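The arithmetic expression $\tau(A) + \tau(B) + \tau(C) \ge 2$ is equivalent to $(A \wedge B) \vee (A \wedge C) \vee (B \wedge C)$, and
$\tau(A) + \tau(B) + \tau(C) = 2$ is equivalent to $((A \wedge B) \vee (A \wedge C) \vee (B \wedge C)) \wedge \neg(A \wedge B \wedge C)$, which is
equivalent to $(A \wedge B \wedge \neg C) \vee (A \wedge \neg B \wedge C) \vee (\neg A \wedge B \wedge C)$.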
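
These equivalences can be confirmed by checking all eight combinations of truth values. In the following sketch, the Python function `tau` is a stand-in for the truth value function of this remark, applied directly to Boolean values rather than to proposition names.

```python
from itertools import product

def tau(x):
    # Truth value as an integer: 0 for false, 1 for true.
    return 1 if x else 0

checks = []
for A, B, C in product([False, True], repeat=3):
    s = tau(A) + tau(B) + tau(C)
    at_least_two = (A and B) or (A and C) or (B and C)
    exactly_two_v1 = at_least_two and not (A and B and C)
    exactly_two_v2 = ((A and B and not C) or (A and not B and C)
                      or (not A and B and C))
    checks.append((s >= 2) == at_least_two)      # sum >= 2
    checks.append((s == 2) == exactly_two_v1)    # sum == 2, first form
    checks.append(exactly_two_v1 == exactly_two_v2)

print(all(checks))
```

The script prints `True`. The arithmetic form stays the same length as more propositions are added, whereas the number of disjuncts in the logical form grows combinatorially.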

Negatives, conjunctions and disjunctions may also be expressed in terms of minimum and maximum operators 
as in Table 3.15.2. 
logical expression                  arithmetic expression
$A$                                 $\tau(A) = 1$
$\neg A$                            $\tau(A) = 0$
$A \wedge B$                        $\tau(A) + \tau(B) = 2$
$A \vee B$                          $\tau(A) + \tau(B) \ge 1$
$A \Rightarrow B$                   $\tau(A) \le \tau(B)$
$A \Leftrightarrow B$               $\tau(A) = \tau(B)$
$A \triangle B$                     $\tau(A) + \tau(B) = 1$
$A \wedge B \wedge C$               $\tau(A) + \tau(B) + \tau(C) = 3$
$A \vee B \vee C$                   $\tau(A) + \tau(B) + \tau(C) \ge 1$
$A \triangle B \triangle C$         $\tau(A) + \tau(B) + \tau(C) = 1$ or $3$

Table 3.15.1 Equivalent arithmetic expressions for some logical expressions


logical expression                  min/max expression
$A$                                 $\min(\tau(A)) = 1$
$\neg A$                            $\max(\tau(A)) = 0$
$A \wedge B$                        $\min(\tau(A), \tau(B)) = 1$
$A \vee B$                          $\max(\tau(A), \tau(B)) = 1$
$A \wedge B \wedge C$               $\min(\tau(A), \tau(B), \tau(C)) = 1$
$A \vee B \vee C$                   $\max(\tau(A), \tau(B), \tau(C)) = 1$

Table 3.15.2 Equivalent min/max expressions for some logical expressions


An advantage of these min/max operators is that they are also valid for infinite sets of propositions. This is 
important in the predicate calculus, where propositions are organised into parametrised families which are 
typically infinite. 
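
The min/max style extends directly from two or three propositions to a whole parametrised family. The sketch below uses a finite sample of a family as a stand-in, since a program cannot enumerate an infinite family. The family "n·n ≥ n" is an arbitrary illustration.

```python
def tau(x):
    # Truth value as an integer: 0 for false, 1 for true.
    return 1 if x else 0

# A parametrised family of propositions P(n): "n*n >= n", sampled on a
# finite range of integers.
family = [n * n >= n for n in range(-5, 6)]

conjunction = min(tau(p) for p in family) == 1   # "P(n) for all n"
disjunction = max(tau(p) for p in family) == 1   # "P(n) for some n"

# Python's built-in all() and any() compute the same two truth values.
print(conjunction == all(family), disjunction == any(family))
```

The same two lines of min/max code serve for a family of any finite size, which is the property that carries over to the quantifiers of predicate calculus.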


3.16. Model theory for propositional logic 


3.16.1 REMARK: The concrete proposition domain is not determined by the language. 

The issue of “model theory” arises when the concrete proposition domain is not uniquely determined by 
a given logical language. In the case of pure propositional logic, the class of all propositions is completely 
unknown. It could even be empty, and all of the theorems of propositional calculus in 4.5, 4.6 and 4.7 would 
still be completely valid. 


In ZF set theory, which is the principal framework for the specification of mathematical systems, the class 
of all propositions is unknown (in some sense). This follows from the fact that the class of all sets in ZF 
set theory is unknown (in some sense). The concrete proposition domain for ZF set theory contains basic 
propositions of the form “$x \in y$” for sets $x$ and $y$, together with all propositions which can be built from these
using binary logical operators and quantifiers. This seems to be a satisfactory recursively defined domain of 
propositions, but the class of sets is not so easy to determine exactly. This is where “model theory” becomes 
relevant. Model theory is concerned with determining the properties of the underlying concrete proposition 
domain, including the class of sets, if only the language is given. 


If the class of propositions of a propositional logical system is given, then there is no need for model theory. 
In an introduction to propositional logic and propositional calculus, the concrete proposition domain is 
typically not given, but the class of propositions is generally not of interest in such a context because the 
level of abstraction makes it impossible to say anything at all about the model if only the language is given. 
Also in pure predicate logic and predicate calculus, there is no way to determine the underlying class of 
propositions and predicate-parameters from the language, although it is usual to assume that the class of 
predicate-parameters is non-empty to avoid needing to deal with this trivial case. Therefore in pure predicate 
logic, model theory has little interest because of the extreme generality of possible models. 

When non-logical axioms are introduced, the model theory question becomes genuinely interesting. In set 
theories in particular, the interest focuses on the class of sets in the system. This class is generally required 
to have at least one element, such as the empty set, and is also expected to contain all sets which can
be constructed recursively from this according to some rules. Therefore the set of “objects” in the system
must include at least those sets which can be constructed according to the rules, but may contain additional
objects. Model theory is concerned with determining the properties of those objects.


It would be possible to artificially construct a kind of model theory for propositional logic by introducing some 
constant propositions, analogous to the constant relation “$\in$” and constant set “$\emptyset$” in set theory. But then to
make this interesting, one would need some kind of recursive rule-set for constructing new propositions from 
old propositions, starting with the given constant propositions. It is true that one may recursively construct 
a hierarchy of compound propositions using binary logical operators. But these are quite redundant because 
they would simply shadow the compound propositions of the language itself. 


Set theory generates interesting classes of propositions by first generating interesting classes of sets, and these 
sets may be combined into propositions using the set membership relation, thereby defining an interesting 
class of concrete propositions. Thus in order to make model theory interesting, propositions must have 
parameters which are chosen from an interesting class of objects, and the non-logical axioms must then give 
rules which can be used to recursively construct the interesting class of objects. This is in essence what a 
“first-order language” is, and this explains why model theory is interesting for first-order languages, but is
not of any serious interest for propositional logic. 


3.16.2 REMARK: Broad classification of logical systems. 
Some styles of logical systems where the “model” is given a-priori may be classified as follows. 


(1) Pure propositional logic. A concrete proposition domain $P$ is given. Each proposition in the language
signifies a subset of the knowledge space $2^P$. The theorems of the language concern relations between
such propositions. The language is typically built recursively from unary and binary operations. The
“logical axioms” for such a system are tautologies which are valid for any subset of $2^P$.


(2) Propositional logic with “non-logical axioms”. A concrete proposition domain $P$ is given, and a
set of constraints on the knowledge space $2^P$ reduces it a-priori to a subset $K_0$ of $2^P$. The constraints
may be specified in the language as “non-logical axioms”. This is equivalent to asserting a-priori a single
proposition which signifies this reduced knowledge space $K_0$. This is a kind of “first-order language for
propositional logic”. The logical axioms for $2^P$ in (1) are also valid for $K_0$.


(3) Pure predicate logic. A space $V$ of predicate parameters (called “variables”) is given. A space $Q$ of
predicates is given. These are proposition-valued functions with zero or more parameters in $V$. (These
may be referred to as “constant predicates” to distinguish them from the predicates which are formed
in the language from logical operators.) A concrete proposition domain $P$ is defined to be the space
of all values of all predicates. (For example, $V$ could be a class of sets and $Q$ could consist of the two
binary predicates “=” and “$\in$”. Then $P$ would contain all propositions of the form “$x = y$” or “$x \in y$”
for $x$ and $y$ in the class $V$.) Then every proposition in the language signifies a subset of the knowledge
space $2^P$, and the theorems of the language concern relations between such propositions, exactly as
in (1). Because of the “opportunity” opened up by the parametrisation of propositions, the unary and
binary operations in (1) are “enhanced” by the introduction of universal and existential quantifiers. The
“logical axioms” for a pure predicate logic are tautologies which are valid for any subset of $2^P$.


(4) Predicate logic with “non-logical axioms”. This is the same as the pure predicate logic in (3)
except that the knowledge space $2^P$ is reduced a-priori to a subset $K_0$ by a set of constraints which are
typically written in the language as “non-logical axioms”. These are combined with the pure predicate 
logic “logical axioms” in (3). Non-logical axioms are written in terms of constant predicates, and possibly 
also constant objects and constant object-maps. (Otherwise the non-logical axioms would have nothing 
to refer to.) 


(5) Pure predicate logic with “object-maps”. This is the same as the pure predicate logic in (3)
except that a space of “object-maps” is added to the system. These object-maps are maps from $V^n$ to
$V$ for some non-negative integer $n$. These potentially make the concrete proposition domain $P$ larger
because concrete propositions may be constructed as a combination of predicates and object-maps. For
example, in axiomatic group theory (not based on ZF set theory), one might write $\sigma(g_1, \sigma(g_2, g_3)) =
\sigma(\sigma(g_1, g_2), g_3)$, which would be a concrete proposition formed from the variables $g_1$, $g_2$, $g_3$, the
object-map $\sigma$, and the predicate “=”.


(6) Pure predicate logic with “object-maps” and “non-logical axioms”. This is the same as (5),
but with the addition of non-logical axioms as in (4). This corresponds to a fairly typical description of
a “first-order language” in mathematical logic and model theory textbooks.


Logical system styles (5) and (6) are not required for ZF set theory, and are therefore not formally presented
in this book. They are more applicable to the axiomatisation of algebra without set theory. Style (1) is the
subject of Chapters 3 and 4. Style (3) is the subject of Chapters 5 and 6. Style (4) is the logical framework
for ZF set theory.
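
Styles (1) and (2) can be made concrete for a very small proposition domain. The following Python sketch represents the knowledge space as the set of all truth value maps on a hypothetical three-element domain, and represents each logical expression by the subset which it signifies. The names `space`, `signify` and `K0`, and the particular non-logical axioms chosen, are assumptions made for illustration.

```python
from itertools import product

P = ("p", "q", "r")   # hypothetical concrete proposition domain

# The knowledge space 2^P: all truth value maps on P, stored as tuples.
space = frozenset(product([False, True], repeat=len(P)))

def signify(expr):
    # The subset of 2^P signified by a logical expression, which is given
    # here as a Python predicate on the truth values of p, q and r.
    return frozenset(t for t in space if expr(*t))

# Style (1): a tautology signifies the whole knowledge space.
excluded_middle = signify(lambda p, q, r: p or not p)

# Style (2): the non-logical axioms "p implies q" and "q implies r"
# reduce 2^P a-priori to a subset K0.
K0 = signify(lambda p, q, r: (not p or q) and (not q or r))

# "p implies r" holds for every truth value map in K0, although it is
# not a tautology on the full space.
p_implies_r = signify(lambda p, q, r: not p or r)

print(excluded_middle == space, K0 <= p_implies_r, p_implies_r == space)
```

This prints `True True False`. For styles (3) to (6), the same picture applies with the domain generated from predicates and parameters, but the domain is then typically infinite and cannot be enumerated in this way.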


Model theory inverts the roles of languages and semantics. Instead of defining objects and propositions in 
advance, model theory takes a “language” or “theory”, including axioms, as the “input”, and then studies 
properties of the possible object spaces $V$ (and the constrained knowledge spaces $K_0 \subseteq 2^P$) as “output”.
In the case of a set theory, model theory is concerned with the properties of models which fit each set of 
axioms. Each model has a universe of sets and a set-membership relation for that universe. 


Model theory requires “underlying set theories” as frameworks in which to define models for the study of set
theories. This makes the whole endeavour seem very circular. This is a “credibility issue” for model theory.
(For the “underlying set theory” concept, or “intuitive set theory”, see for example Chang/Keisler [347],
pages 560, 579–596.)


4.0.1 REMARK: The argumentative approach to mathematics.

One may think of propositional calculus as the “argumentative approach” to propositional logic. There is a 
huge weight of history behind the argumentative approach to knowledge in general. One can see painfully 
clearly the difficulties of escaping from the ancient authority of Aristotle’s argumentative approach in the 
“Dialogue” of Galileo [268], where Aristotelian views must be fought against by employing dubious arguments 
to counter even more dubious arguments, while the evidence of experiment is relegated to an inferior status. 


Logical argumentation helps us to interpolate and extrapolate from what we do know to what we don’t know. 
Logical argumentation also assists the rational organisation of knowledge to facilitate learning, to identify 
supposed facts which “do not fit the pattern”, and to formulate new hypotheses. But in mathematics, the 
replacement of intuitive truth with inferred truth has arguably gone too far in some areas. 


Propositional calculus is an unnecessary expenditure of time and effort from the point of view of applications 
to propositional logic, which can be managed much more efficiently with truth tables. Propositional calculus 
must be endured for the sake of its extension to applications with infinite sets of propositions. 


4.1. Propositional calculus formalisations 


4.1.1 REMARK: A propositional calculus is a mechanisation of argumentative propositional logic. 

A propositional calculus is a formalisation of the methods of argument which are used to solve simultaneous 
logic equations. By formalising the text-level procedures which are observed in real-world methods of solution, 
it is hoped that it will be possible to dispense with semantics and perform all calculations mechanically 
without reference to the meanings of symbols in much the same way that computers do arithmetic without 
understanding what numbers are. 


The deeper reason for using the argumentative, deductive method, as opposed to direct calculation, is that 
by deduction from axioms, one may span a larger range of ideas than can be represented concretely. For 
example, infinite sets cannot be demonstrated concretely, but they may be managed as abstract properties 
in an abstract language which refers to a non-demonstrable, non-concrete universe. In other words, the 
axiomatic, argumentative, deductive approach extends our “knowledge” from concepts which we can know 
by other means to concepts which we cannot know by other means. 


4.1.2 REMARK: Veracity and credibility should arise from the mechanisation of logic.
The underlying assumption of all formal logical systems is the belief (or hope) that the veracity, or at least the
credibility, of a very wide range of assertions may be assured by first focusing intensely on the selection and
close examination of a small set of axioms and inference rules, and then merely monitoring and enforcing
conformance to the axioms and rules, so that all outputs from the process inherit veracity or credibility from
those axioms and rules.


This is somewhat similar to the care which is taken in the design of machines which manufacture products 
and other machines. The manufacturing machines must be calibrated and verified carefully, and one assumes 
(or hopes) that the ensuing mass production of products and more machines will be reliable. However, real- 
world manufacturing is almost always monitored by quality control processes. If the analogy is valid, then 
the products of mathematical theories also need to be monitored and subjected to quality control. 


The view that meaning should be removed from logic was clearly expressed by Church [348], page 48. 


Our procedure is not to define the new language merely by means of translations of its expressions 
(sentences, names, forms) into corresponding English expressions, because in this way it would hardly 
be possible to avoid carrying over into the new language the logically unsatisfactory features of the 
English language. Rather, we begin by setting up, in abstraction from all considerations of meaning, 
the purely formal part of the language, so obtaining an uninterpreted calculus or logistic system. 


4.1.3 REMARK: Social benefits of the mechanisation of logic. 

High on the list of expected benefits from the mechanisation of logic is the establishment of harmony 
and consensus in the mathematical community, thereby hopefully avoiding civil war between competing 
ideologies. The plethora of formalisms and interpretations of mathematical logic has, however, solved major 
disputes more by isolating the mutually hostile warring groups in disconnected locally harmonious islands
rather than joining all belief systems into a grand unified world-view. Nevertheless, good fences make good 
neighbours, and the axioms-and-rules approach to logic does give the important social benefit of dispute 
resolution efficiency. A straightforward comparison of the axioms and rules used by two mathematical 
belief-groups will generally reveal either that consensus is impossible or that the difference must lie in the 
execution of the common logical system. In the former case, no further time and energy need be expended
in the attempt to achieve consensus. In the latter case, close scrutiny and comparison of the competing 
arguments by humans or computers will reveal which side is right. 


4.1.4 REMARK: All propositional calculus formalisations must be equivalent. 

All formalisations of propositional calculus must be equivalent to each other because they must all correctly
describe the same “model”, which in this case consists of a concrete proposition domain, the space of
knowledge sets on this domain, and the collection of basic logical operations on these knowledge sets. 


4.1.5 REMARK: Interpretation of logical argumentation in terms of knowledge sets. 

In terms of knowledge sets, all valid logical argumentation commences with a set $A$ of accepted knowledge
sets, which are subsets of $2^P$ for some concrete proposition domain $P$, and then proceeds to demonstrate
that some specific set $K_0$ is a superset of the intersection of all sets $K \in A$. In other words, in terms of
Notation 8.4.2, one shows that $\bigcap A \subseteq K_0$, assuming only that $\bigcup A \subseteq 2^P$. (It is actually not necessary to
assume $A \ne \emptyset$ here because “$\bigcap A$” is a shorthand for $\{\tau \in 2^P;\ \forall K \in A,\ \tau \in K\}$, which is a well-defined set
(or class) if $2^P$ is.)

To see why this is so, let $\tau : P \to \{\mathrm{F}, \mathrm{T}\}$ be the truth value map on $P$. Then $\tau \in K$ for all $K \in A$.
So $\tau \in \bigcap A$. But this is the only knowledge that one has about $\tau$ because $\bigcap A$ combines all of the accepted
knowledge sets for $P$. If one shows that $\bigcap A \subseteq K_0$, then it immediately follows that $\tau \in K_0$. Therefore any
knowledge set $K_0 \subseteq 2^P$ which satisfies $\bigcap A \subseteq K_0$ is a valid knowledge set, because the definition of validity
of a knowledge set $K_0$ is that $\tau \in K_0$. However, if $\bigcap A \not\subseteq K_0$, then $(K_0 \cap (\bigcap A)) \ne \bigcap A$. (This would be
asserting that $A'$ is a set of valid knowledge sets for $P$, where $A' = A \cup \{K_0\}$.) But this would contradict
the assumption that $A$ contains all of the accepted knowledge about $P$. In other words, $K_0$ does not follow
from $A$. So any logical argument which claims to prove $K_0$ from $A$ must be an invalid argument. Such an
argument would conclude that $\tau \notin (\bigcap A) \setminus K_0$, whereas the possibility of $\tau \in (\bigcap A) \setminus K_0$ is not excluded by
any knowledge set $K \in A$.
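
For a finite concrete proposition domain, the validity criterion above can be computed directly. The following sketch uses a two-element domain. The helper names `signify`, `intersection` and `follows` are illustrative, and knowledge sets are given as Python predicates rather than as expressions of a formal language.

```python
from functools import reduce
from itertools import product

space = frozenset(product([False, True], repeat=2))   # 2^P for P = {p, q}

def signify(expr):
    # The knowledge set signified by a predicate on the truth values (p, q).
    return frozenset(t for t in space if expr(*t))

def intersection(A):
    # Intersection of a collection of knowledge sets.  For an empty
    # collection this is the whole space 2^P.
    return reduce(frozenset.intersection, A, space)

def follows(A, K0):
    # K0 follows from A when the intersection of A is a subset of K0.
    return intersection(A) <= K0

K_p = signify(lambda p, q: p)
K_p_implies_q = signify(lambda p, q: (not p) or q)
K_q = signify(lambda p, q: q)

print(follows([K_p, K_p_implies_q], K_q))   # modus ponens
print(follows([K_q, K_p_implies_q], K_p))   # affirming the consequent
```

The first line printed is `True` and the second is `False`. In the second case the truth value map with p false and q true lies in the intersection of the accepted knowledge sets but outside the claimed conclusion, so that possibility is not excluded.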


One might very reasonably ask why one does not simply calculate $\bigcap A$ for any given set of knowledge sets $A$.
In fact, this calculation is not at all difficult for finite concrete proposition domains $P$. But in the case of
infinite concrete proposition domains, the calculation is typically either too difficult or impossible. One may
think of logical argumentation as a procedure for putting bounds on the set $\bigcap A$ for a given set $A$ of subsets
of $2^P$ for a given set $P$.


Thus logical argumentation proceeds by accumulating a set $A$ of subsets of $2^P$ without ever changing the
value of $\bigcap A$. This may seem somewhat paradoxical, but it means, more or less, that the total knowledge of
a system is unchanged by the accumulation of theorems. In particular, if $A$ is initially empty, then the only
set which can be added to $A$ is the entire space $2^P$, which by Definition 3.13.2 is the interpretation of all
tautologies. In other words, any logical expression which is proved from $A = \emptyset$ as the initial set of knowledge
sets must be a tautology. One may consider this to be the task of any propositional calculus, namely the
discovery of tautologies. Therefore propositional calculus merely discovers many ways of expressing the
set $2^P$. The real profits of investment in logical argumentation are obtained only when $A \ne \emptyset$, which is true
in any formal system with one or more nonlogical axioms, as for example in set theory.
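
Both observations in the preceding paragraph can be illustrated on a two-element domain. In the sketch below, the candidate expressions and all helper names are chosen only for illustration.

```python
from itertools import product

space = frozenset(product([False, True], repeat=2))   # 2^P for P = {p, q}

def signify(expr):
    # The knowledge set signified by a predicate on the truth values (p, q).
    return frozenset(t for t in space if expr(*t))

def intersection(A):
    # Intersection of a list of knowledge sets, starting from all of 2^P.
    result = space
    for K in A:
        result = result & K
    return result

# With A empty, the intersection is all of 2^P, so only tautologies follow.
candidates = {
    "p or not p": signify(lambda p, q: p or not p),
    "p implies (q implies p)": signify(lambda p, q: (not p) or ((not q) or p)),
    "p implies q": signify(lambda p, q: (not p) or q),
}
provable_from_nothing = sorted(
    name for name, K in candidates.items() if intersection([]) <= K
)

# With A non-empty, adding a theorem K0 to A leaves the intersection unchanged.
A = [signify(lambda p, q: p), signify(lambda p, q: (not p) or q)]
K0 = signify(lambda p, q: q)
unchanged = intersection(A + [K0]) == intersection(A)

print(provable_from_nothing, unchanged)
```

Only the two tautologies are listed as provable from the empty set of knowledge sets, and `unchanged` is `True`.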


4.1.6 REMARK: The components of propositional calculus formalisations. 
Axiomatic systems for propositional calculus typically have the following components. These components 
are also found more generally in axiomatic systems for predicate calculus and a wide range of other theories. 


(1) Concrete propositions: These are the propositions to which proposition names refer. (The term 
“concrete propositions” is used also by Boole [342], page 52, and Carroll [346], page 59, with essentially 
the same meaning as intended here.) Concrete propositions are also known as “propositions” (Eves [353], 
pages 245, 250; Huth/Ryan [363], page 2), “statements” (Huth/Ryan [363], page 3; Margaris [369], 
page 1), “sentences” (Nagel/Newman [373], page 46), “elementary sentences” (Hilbert / Ackermann [358], 
pages 11, 18), “declarative sentences” (Lemmon [367], page 6; Huth/Ryan [363], page 2; Margaris [369], 
page 1), “atomic sentences” (E. Mendelson [370], page 16), “atoms” (Kleene [366], page 5), or “prime 
formulas” (Kleene [366], page 5). 


(2) Proposition names: These are labels for concrete propositions. Proposition names are typically single
letters, with or without subscripts. They are also called “propositional variables” (Bell/Slomson [339],
page 32; Lemmon [367], page 43; Rosenbloom [386], page 38; Smullyan [391], page 5), “proposition
variables” (Kleene [365], page 139; Church [348], page 69), “propositional letters” (Whitehead/Russell,
Volume I [400], page 5; Robbin [384], page 4), “proposition letters” (Kleene [365], page 108), “statement
letters” (E. Mendelson [370], page 30), “statement variables” (Stoll [393], page 375), “sentential
variables” (Nagel/Newman [373], page 46), “sentence symbols” (Chang/Keisler [347], page 5), “atoms”
(Curry [350], page 55), or “propositional atoms” (Huth/Ryan [363], page 32).

(3) Basic logical operators: The most often used basic logic operators are denoted here by the symbols
$\Rightarrow$, $\neg$, $\wedge$, $\vee$ and $\Leftrightarrow$. Basic logical operators are also called “logical operators” (Margaris [369],
page 30), “sentence-forming operators on sentences” (Lemmon [367], page 6), “operations” (Curry [350], 
page 55; Eves [353], page 250), “connectives” (Chang/Keisler [347], page 5; Rosenbloom [386], page 32), 
“sentential connectives” (Nagel/Newman [373], page 46; Robbin [384], page 1), “statement connectives” 
(Margaris [369], page 26), “propositional connectives” (Kleene [366], page 5; Kleene [365], page 73; 
Cohen [349], page 4), “logical connectives” (Bell/Slomson [339], page 32; Lemmon [367], page 42; 
Smullyan [391], page 4; Roitman [385], page 26), “primitive connectives” (E. Mendelson [370], page 30), 
“fundamental logical connectives” (Hilbert / Ackermann [358], page 3), “logical symbols” (Bernays [341], 
page 45), or “fundamental functions of propositions” (Whitehead/Russell, Volume I [400], page 6). 
(Operator symbols are identified here with the operators themselves. Strictly speaking a distinction 
should be made. Another distinction which should be made is between the formally declared basic 
logical operators and the non-basic logical operators which are defined in terms of them. However, the 
distinction is not always clear. Nor is the distinction necessarily of real importance. The non-basic 
operators may be defined explicitly in terms of the basic operators, or alternatively their meanings may 
be defined via axioms.) 


(4) Punctuation symbols: In the modern literature, punctuation symbols typically include parentheses 
such as “(” and “)”, although parentheses are dispensable if the operator syntax style is prefix or 
postfix for example. Punctuation symbols are called “punctuation symbols” by Bell/Slomson [339], 
page 33, or just “parentheses” by Chang/Keisler [347], page 5. In earlier days, many authors used
various styles of dots for punctuation. (See for example Whitehead/Russell, Volume I [400], page 10;
Rosenbloom [386], page 32; Robbin [384], pages 5, 7.) Some authors used square brackets (“[” and “]”)
instead of parentheses. These were called “brackets” by Church [348], page 69.


(5) Logical expression symbols: These are the symbols which are permitted in logical expressions.
These symbols are of three disjoint symbol classes: proposition names, basic operators and punctuation 
symbols. Logical expression symbols are also known as “basic symbols” (Bell/Slomson [339], page 32), 
“formal symbols” (Stoll [393], page 375), “primitive symbols” (Church [348], page 48; Robbin [384], 
page 4; Stoll [393], page 375), or “symbols of the propositional calculus” (Lemmon [367], page 43). 


(6) Logical expressions: These are the syntactically correct symbol-sequences. Logical expressions are
also known as “formulas” (Bell/Slomson [339], page 33; Kleene [366], page 5; Kleene [365], pages 72, 108;
Margaris [369], page 31; Shoenfield [390], page 4; Smullyan [391], page 5; Nagel/Newman [373], page 45;
Stoll [393], pages 375–376), “well-formed formulas” (Church [348], page 49; Lemmon [367], page 44;
Huth/Ryan [363], page 32; E. Mendelson [370], page 29; Robbin [384], page 5; Cohen [349], page 6), “proposition
letter formulas” (Kleene [365], page 108), “statement forms” (E. Mendelson [370], page 15), “sentences”
(Chang/Keisler [347], page 5; Rosenbloom [386], page 38), “elementary sentences” (Curry [350], page 56),
or “sentential combinations” (Hilbert/Ackermann [358], page 28).

Logical expressions which are not proposition names may be called “composite formulas” (Kleene [366],
page 5), “compound formulas” (Smullyan [391], page 8), or “molecules” (Kleene [366], page 5).


(7) Logical expression names: The permitted labels for well-formed formulas. (The labels for well-formed
formulas could be referred to as “wff names” or “wff letters”.) Logical expression names are also called
“sentential symbols” (Chang/Keisler [347], page 5), “sentence symbols” (Chang/Keisler [347], page 7),
“syntactical variables” (Shoenfield [390], page 7), “metalogical variables” (Lemmon [367], page 49), or
“metamathematical letters” (Kleene [365], pages 81, 108).


(8) Syntax rules: The rules which decide the syntactic correctness of logical expressions. Syntax rules are 
also known as “formation rules” (Church [348], page 50; Lemmon [367], page 42; Nagel/Newman [373],
page 45; Robbin [384], page 4; Kleene [365], page 72; Kleene [366], page 205; Smullyan [391], page 6). 

(9) Axioms: The collection of axioms or axiom schemes (or schemas). Axiom schemes (or schemas, or 
schemata) are templates into which arbitrary well-formed formulas may be substituted. (Axiom schemes 
are attributed by Kleene [365], page 140, to a 1927 paper by von Neumann [438].) Axioms can also be 
defined in terms of logical expression names for which any logical expressions may be substituted. Both 
of these two variants are equivalent to an axiom-set plus substitution rule. 

Almost everybody calls axioms “axioms”, but axioms are also called “primitive formulas” (Nagel/
Newman [373], page 46), “primitive logical formulas” (Hilbert/Ackermann [358], page 27), “primitive
tautologies” (Eves [353], page 251), “primitive propositions” (Whitehead/Russell, Volume I [400], pages
13, 98), “prime statements” (Curry [350], page 192), or “initial formulas” (Bernays [341], page 47).

(10) Inference rules: The set of deduction rules. Inference rules are also known as “rules of inference” 
(Kleene [366], page 34; Kleene [365], page 83; Church [348], page 49; Margaris [369], page 2; Robbin [384], 
page 14; Shoenfield [390], page 4; Stoll [393], page 376; Smullyan [391], page 80; Suppes/Hill [396], 
page 43; Tarski [398], page 132; Bernays [341], page 47; Bell/Slomson [339], page 36; E. Mendelson [370], 
page 29), “rules of proof” (Tarski [398], page 132), “rules of procedure” (Church [348], page 49), “rules of 
derivation” (Lemmon [367], page 8; Suppes [394], page 25; Bernays [341], page 47), “rules” (Curry [350], 
page 56; Eves [353], page 251), “primitive rules” (Hilbert/Ackermann [358], page 27), “natural deduction
rules” (Huth/Ryan [363], page 6), “deductive rules” (Kleene [365], page 80; Kleene [366], page 205),
“derivational rules” (Curry [350], page 322), “transformation rules” (Kleene [365], page 80; Kleene [366],
page 205; Nagel/Newman [373], page 46), or “principles of inference” (Russell [388], page 16).


The reader will (hopefully) gain the impression from this brief survey that the terms in use for the components 
of a propositional calculus formalisation are far from being standardised! In fact, it is not only the descriptive 
language terms that are unstandardised. There seems to be no majority in favour of any one way of doing 
almost anything in either the propositional or predicate calculus, and yet the resulting logics of all authors
are essentially equivalent. In order to read the literature, one must, therefore, be prepared to see familiar 
concepts presented in unfamiliar ways. 
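
The components listed in Remark 4.1.6 can be sketched in a few lines of code. The following Python fragment is a toy illustration and not a formalisation used in this book. It assumes that proposition names are strings, that the only basic operators are “not” and “implies” (encoded as tuples), that a list of given premises plays the role of the axioms, and that modus ponens is the only inference rule. No punctuation symbols are needed because the tuple nesting plays their role.

```python
def is_wff(e):
    # Syntax rule: decides whether e is a well-formed logical expression.
    if isinstance(e, str):
        return e.isalnum()                       # a proposition name
    if isinstance(e, tuple) and len(e) == 2 and e[0] == "not":
        return is_wff(e[1])
    if isinstance(e, tuple) and len(e) == 3 and e[0] == "implies":
        return is_wff(e[1]) and is_wff(e[2])
    return False

def check_proof(premises, lines):
    # Each line must be a premise, or must follow from two earlier lines
    # by modus ponens: from X and ("implies", X, Y), infer Y.
    proved = []
    for e in lines:
        ok = is_wff(e) and (
            e in premises
            or any(("implies", x, e) in proved for x in proved)
        )
        if not ok:
            return False
        proved.append(e)
    return True

premises = ["p", ("implies", "p", "q"), ("implies", "q", "r")]
proof = ["p", ("implies", "p", "q"), "q", ("implies", "q", "r"), "r"]

print(check_proof(premises, proof), check_proof(premises, ["r"]))
```

This prints `True False`. The checker compares symbol structures only and never consults a truth table, which is the sense in which such a calculus is mechanical.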


4.1.7 REMARK: The naming of concrete propositions and logical expressions. 
The relations between concrete propositions, proposition names, logical expressions and logical expression names
in Remark 4.1.6 are illustrated in Figure 4.1.1. 


Figure 4.1.1 Propositions, logical expressions and names


The concrete propositions in layer 1 in Figure 4.1.1 are in some externally defined concrete proposition
domain. (See Section 3.2 for concrete proposition domains.) Concrete propositions may be natural-language
statements, symbolic mathematics statements, transistor voltages, or the states of any other kind of two-state
elements of a system.


Concrete propositions are abstractions from a physical or social system. Therefore the fact that the “real 
world” does not have sharply two-state components is a modelling or “verisimilitude” question which does 
not affect the validity of argumentation within the logical system. (The “real world” may be considered to 
be in layer 0 in this diagram.) 


The proposition names in layer 2 are associated with concrete propositions. Two proposition names may 
refer to the same concrete proposition. The association may vary over time and according to context. 


The proposition names in layer 2 belong to the “discussion context”, whereas the concrete propositions in
layer 1 belong to the “discussed context”. Layer 3 also belongs to the discussion context. (Layers 4 and 5 
belong to a meta-discussion context, which discusses the layer 2/3 context.) 


In layer 3, the proposition names in layer 2 are combined into logical expressions. Layers 2 and 3 are the 
layers where the mechanisation of propositional calculus is executed. 


In layer 4, the metamathematical discussion of logical expressions, for example in axioms, inference rules, 
theorems and proofs, is facilitated by associating logical expression names with logical expressions in layer 3. 
This association is typically context-dependent and variable over time in much the same way as for proposition 
name associations. 


In layer 5, the logical expression names in layer 4 may be combined in logical expression formulas. These 
are formulas which are the same as the logical expression formulas in layer 3, except that the symbols 
in the formulas are logical expression names instead of concrete proposition names. The apparent logical 
expressions in axioms, inference rules, and most theorems and proofs, are in fact templates in which each 
symbol may be replaced with any valid logical expression. 


4.2. Selection of a propositional calculus formalisation 


4.2.1 REMARK: Survey of propositional calculus formalisations. 
There are hundreds of reasonable ways to develop a propositional calculus. Some of the propositional calculus 
formalisations in the literature are roughly summarised in Table 4.2.1. 


year  author                                basic operators   axioms  inference rules
1889  Peano [375], pages VII-X              ⊥, ¬, ∧, ∨, ⇒, ⇔  43      Subs, MP
1910  Whitehead/Russell [400], pages 4-13   ¬, ∨              5       MP
1919  Russell [389], pages 148-152          ↑                 5       MP
1919  Russell [389], page 152               ↑                 1       MP
1928  Hilbert/Ackermann [358], pages 27-28  ¬, ∨              4       Subs, MP
1928  Hilbert/Ackermann [358], page 29      ¬, ⇒              6       Subs, MP
1928  Hilbert/Ackermann [358], page 29      ¬, ⇒              3       Subs, MP
1928  Hilbert/Ackermann [358], page 29      ↑                 1       Subs, MP
1934  Hilbert/Bernays [359], page 65        ¬, ∧, ∨, ⇒, ⇔     15      Subs, MP
1941  Tarski [398], pages 147-148           ¬, ⇒, ⇔           7       Subs, MP
1944  Church [348], page 72                 ⊥, ⇒              3       Subs, MP
1944  Church [348], page 119                ¬, ⇒              3       Subs, MP
1944  Church [348], page 137                ¬, ∨              5       Subs, MP
1944  Church [348], page 137                ¬, ∨              4       Subs, MP
1944  Church [348], page 137                ¬, ∨              3       Subs, MP
1944  Church [348], page 138                +                 1       Subs, MP
1944  Church [348], page 138                f                 1       Subs, MP
1944  Church [348], page 138                +                 1       Subs, MP
1944  Church [348], page 157                em                7       Subs, MP
1944  Church [348], page 160                A mm              3       Subs, MP
1950  Quine [381], page 85                  ¬, ⇒              3       Subs, MP
1950  Quine [381], page 87                  ↑                 1       Subs, MP
1950  Rosenbloom [386], pages 38-39         ¬, ⇒              3       Subs, MP
1952  Kleene [365], pages 69-82, 108-109    ¬, ∧, ∨, ⇒        10      MP
1952  Wilder [403], pages 222-225           ¬, ∨              5       Subs, MP
1953  Rosser [387], pages 55-57             ¬, ∧, ⇒           3       MP
1957  Suppes [394], pages 20-30             ¬, ∧, ∨, ⇒        Taut    3 rules
1958  Eves [353], pages 250-252             ¬, ∨              4       4 rules
1958  Eves [353], page 256                  ↑                 1       4 rules
1958  Nagel/Newman [373], pages 45-49       ¬, ∧, ∨, ⇒        4       Subs, MP
1963  Curry [350], pages 55-56              ¬, ⇒              3       MP
1963  Stoll [393], pages 375-377            ¬, ⇒              3       MP
1964  E. Mendelson [370], pages 30-31       ¬, ⇒              3       MP
1964  E. Mendelson [370], page 40           ¬, ∨              4       MP
1964  E. Mendelson [370], page 40           ¬, ∧              3       MP
1964  E. Mendelson [370], page 40           ¬, ⇒              3       Subs, MP
1964  E. Mendelson [370], page 40           ¬, ∧, ∨, ⇒        10      MP
1964  E. Mendelson [370], page 42           ¬, ⇒              1       MP
1964  E. Mendelson [370], page 42           ↑                 1       MP
1964  Suppes/Hill [396], pages 12-109       ¬, ∧, ∨, ⇒        0       16 rules
1965  Lemmon [367], pages 5-40              ¬, ∧, ∨, ⇒        0       10 rules
1967  Kleene [366], pages 33-34             ¬, ∧, ∨, ⇒, ⇔     13      MP
1967  Margaris [369], pages 47-51           ¬, ⇒              3       MP
1969  Bell/Slomson [339], pages 32-37       ¬, ∧              9       MP
1969  Robbin [384], pages 3-14              ⊥, ⇒              3       MP
1973  Chang/Keisler [347], pages 4-9        ¬, ∧              Taut    MP
1993  Ben-Ari [340], page 55                ¬, ⇒              3       MP
2004  Huth/Ryan [363], pages 3-27           ¬, ∧, ∨, ⇒        0       9 rules

      Kennington                            ¬, ⇒              3       Subs, MP

Table 4.2.1 Survey of propositional calculus formalisation styles 


The basic operator notations in Table 4.2.1 have been converted to the notations used in this book for easy 
comparison. Usually the remaining (non-basic) logical operators are defined in terms of the basic operators. 
Sometimes the axioms are defined in terms of both basic and non-basic operators. So it is sometimes slightly 
subjective to designate a subset of operators to be “basic”. 


The axioms for some systems are really axiom templates. The number of axioms is in each case somewhat 
subjective since axioms may be grouped or ungrouped to decrease or increase the axiom count. The axiom 
set “Taut” means all tautologies. (Such a formalisation is not strictly of the same kind as is being discussed 
here. Axioms are usually chosen as a small finite subset of tautologies from which all other tautologies can 
be deduced via the rules.) 


The inference rules “MP” and “Subs” are abbreviations for “Modus Ponens” and “Substitution” respectively. 
For systems with large numbers of rules, or rules which have unclear names, only the total number of rules is 
given. Usually the number of rules is very subjective because similar rules are often grouped under a single 
name or split into several cases or even sub-cases. 


Propositional logic formalisations differ also in other ways. For example, some systems maintain dependency 
lists as part of the argumentation to keep track of the premises which are used to arrive at each assertion. 


Despite all of these caveats, Table 4.2.1 gives some indication of the wide variety of styles of formalisations 
for propositional calculus. Perhaps the primary distinction is between those systems which use a minimal 
rule-set (only MP and possibly the substitution rule) and those which use a larger rule-set. The large rule- 
set systems typically use fewer or no axioms. The minimum number of basic operators is one operator, 
but this makes the formalisation tedious and cumbersome. Single-operator systems have more academic 
than practical interest. Systems with four or more operators are typically chosen for first introductions to 
logic, whereas the more spartan two-operator logics are preferable for in-depth analysis, especially for model 
theory. For the applications in this book, the best style seems to be a two-operator system with minimal 
inference rules and 3 or 4 axioms or axiom templates. Such a system is a good compromise between intuitive 
comprehensibility and analytical precision. 


4.2.2 REMARK: Advantages of using exactly two basic logical operators. 
The advantages of using no more and no less than two logical operators (for the purposes of model theory) 
are expressed as follows by Chang/Keisler [347], page 6. 


[...] there are certain advantages to keeping our list of symbols short. [...] proofs by induction 
based on it are shorter this way. At the other extreme, we could have managed with only a single 
connective, whose English translation is ‘neither ... nor ...’. We did not do this because ‘neither 
... nor ...’ is a rather unnatural connective. 


The “single connective” which they refer to is the joint denial operator “↓” in Definition 3.7.10. A contrary 
view was taken by Russell [389], page 146. 


When our minds are fixed upon inference, it seems natural to take “implication” as the primitive 
fundamental relation, since this is the relation which must hold between p and q if we are to be able 
to infer the truth of q from the truth of p. But for technical reasons this is not the best primitive 
idea to choose. 


He chose the single alternative denial operator “↑” instead, but was then obliged to immediately define four 
other operators (including the implication operator) in terms of this “primitive” operator in order to state 
five axioms more comprehensibly in terms of the implication operator especially. (The very clumsy and 
inconvenient alternative denial operator was avoided in his later Principia Mathematica [400, 401, 402].) 


4.2.3 REMARK:  Propositional calculus with a single binary logical operator. 

The single axiom schema (α ↑ (β ↑ γ)) ↑ ((δ ↑ (δ ↑ δ)) ↑ ((ε ↑ β) ↑ ((α ↑ ε) ↑ (α ↑ ε)))) for the single NAND or “alternative 
denial” logical operator is attributed to Nicod [430] by Russell [389], pages 148-152, Hilbert/Ackermann [358], 
page 29, Church [348], pages 137-138, Quine [381], pages 87, 89, E. Mendelson [370], page 42, and Eves [353], 
page 256. This axiom schema requires only the single inference rule α, α ↑ (β ↑ γ) ⊢ γ, which is equivalent to 
modus ponens. (More precisely, the rule is equivalent to α, α ⇒ (β ∧ γ) ⊢ γ, which implies modus ponens.) 
Such minimalism seems to lead to no useful insights or technical advantages. (However, the NAND logic 
component has great importance in digital electronic circuit design.) 
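
The claims in this remark can be checked mechanically by truth tables. The following is a minimal sketch, assuming the usual truth-functional reading of the NAND operator; the function names are illustrative only and do not come from the cited literature. It confirms that the axiom schema is a tautology, that α ↑ (β ↑ γ) has the same truth table as α ⇒ (β ∧ γ), and that the inference rule never leads from true premises to a false conclusion.

```python
from itertools import product


def nand(a: bool, b: bool) -> bool:
    """Alternative denial: false only when both arguments are true."""
    return not (a and b)


def nicod_axiom(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool:
    """(a ↑ (b ↑ c)) ↑ ((d ↑ (d ↑ d)) ↑ ((e ↑ b) ↑ ((a ↑ e) ↑ (a ↑ e))))"""
    left = nand(a, nand(b, c))
    middle = nand(d, nand(d, d))
    right = nand(nand(e, b), nand(nand(a, e), nand(a, e)))
    return nand(left, nand(middle, right))


def is_tautology(f, arity: int) -> bool:
    """True if f is true for every assignment of truth values."""
    return all(f(*vals) for vals in product([False, True], repeat=arity))


nicod_is_tautology = is_tautology(nicod_axiom, 5)

# The rule's second premise, read with the ordinary operators.
premise_is_implication = all(
    nand(a, nand(b, c)) == ((not a) or (b and c))
    for a, b, c in product([False, True], repeat=3)
)

# Soundness: whenever both premises are true, the conclusion c is true.
rule_is_sound = all(
    c
    for a, b, c in product([False, True], repeat=3)
    if a and nand(a, nand(b, c))
)

print(nicod_is_tautology, premise_is_implication, rule_is_sound)
```

A truth-table check of this kind establishes only semantic validity. It says nothing about whether the single axiom is convenient for deriving theorems, which is the point at issue in this remark.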


4.2.4 REMARK: General considerations in the selection of a propositional calculus. 
When choosing a propositional calculus, the main consideration is probably its suitability for extension to 
predicate calculus, and to set theory, and to general mathematics beyond that. It could be argued that the 
choice does not matter at all. No matter which basic operators are chosen, all of the other operators are 
soon defined in terms of them, and it no longer matters which set one started with. Similarly, a small set of 
axioms is soon extended to the set of all tautologies. So once again, the starting set makes little difference. 
Likewise the inference rules are generally supplemented by “derived rules”, so that the final set of inference 
rules is the same, no matter which set one commenced with. 


Since the end-point of propositional calculus theorem-proving is always the same, particularly in view of 
Remark 4.1.4, one might ask whether there is any criterion for choosing formalisations at all. (There are 
logic systems which are not equivalent to the standard propositional calculus and predicate calculus, but 
these are outside the scope of this book, being principally of interest only for philosophy and pure logic 
research.) In this book, the principal reason for presenting mathematical logic is to facilitate “trace-back” 
as far as possible from any definition or theorem to the most basic levels. Therefore the formalisation is 
chosen to make the trace-back as meaningful as possible. Each step of the trace-back should hopefully add 
meaning to any definition or theorem. 


If the number of fundamental inference rules is too large, one naturally asks where these rules come from, 
and whether they are really all valid. Therefore a small number of intuitively clear rules is preferable. The 
modus ponens and substitution rules are suitable for this purpose. 


Similarly, a small set of meaningful axioms is preferable to a comprehensive set of axioms. The smaller the 
set of axioms which one must be convinced of, the better. If the main inference rule is modus ponens, then 
the axioms should be convenient for modus ponens. This suggests that the axioms should have the form 
α ⇒ β for some logical expressions α and β. This strongly suggests that the basic logical operators should 
include the implication operator “⇒”. 


The propositional calculus described by Bell/Slomson [339], pages 32-37, uses ¬ and ∧ as the basic operators, 
but their axioms are all written in terms of ⇒, which they define in terms of their two basic operators. 
Similarly, Whitehead/Russell, Volume I [400], pages 4-13, Hilbert/Ackermann [358], pages 27-28, Wilder [403], 
pages 222-225, and Eves [353], pages 250-252, all use ¬ and ∨ as basic operators, but they also write all of 
their axioms in terms of the implication operator after defining it in terms of the basic operators. This hints 
at the difficulty of designing an intuitively meaningful axiom-set without the implication operator. 


If it is agreed that implication “⇒” should be one of the basic logical operators, then the addition of the 
negation operator “¬” is both convenient and sufficient to span all other operators. It then remains to choose 
the axioms for these two operators. 
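
The sufficiency of implication and negation can be illustrated by a small truth-table computation. The sketch below is only an illustration under stated assumptions: the definitions of conjunction, disjunction and equivalence in terms of ¬ and ⇒ are the common textbook ones and are not necessarily the definitions adopted later in this book, and all function names are hypothetical. The second part closes the two projections under ¬ and ⇒ and counts the binary truth functions obtained.

```python
from itertools import product


def neg(a: bool) -> bool:
    return not a


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


# Assumed definitions in terms of the two basic operators.
def conj(a: bool, b: bool) -> bool:
    return neg(implies(a, neg(b)))            # α ∧ β as ¬(α ⇒ ¬β)


def disj(a: bool, b: bool) -> bool:
    return implies(neg(a), b)                 # α ∨ β as ¬α ⇒ β


def iff(a: bool, b: bool) -> bool:
    return conj(implies(a, b), implies(b, a))  # α ⇔ β


ROWS = list(product([False, True], repeat=2))

definitions_agree = all(
    conj(a, b) == (a and b) and disj(a, b) == (a or b) and iff(a, b) == (a == b)
    for a, b in ROWS
)


def reachable_binary_functions() -> set:
    """Truth tables (as 4-tuples) obtainable from the projections via ¬ and ⇒."""
    found = {tuple(a for a, _ in ROWS), tuple(b for _, b in ROWS)}
    while True:
        new = set(found)
        new |= {tuple(not x for x in f) for f in found}
        new |= {
            tuple((not x) or y for x, y in zip(f, g))
            for f in found
            for g in found
        }
        if new == found:
            return found
        found = new


print(definitions_agree, len(reachable_binary_functions()))
```

Since there are exactly 16 truth functions of two arguments, a count of 16 means that every binary operator is expressible in terms of the two basic operators.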


The three axioms in Definition 4.4.3 are well known to be adequate and minimal (in the sense that none of the 
three is superfluous). They are not intuitively the most obvious axioms. Nor is it intuitively obvious that 
they span all tautologies. These three axioms also make the boot-strapping of logical theorems somewhat 
austere and sometimes frustrating. However, they are chosen here because they are fairly popular, they give 
a strong level of assurance that theorems are correctly proved, and they provide good exercise in a wide 
range of austere techniques employed in formal (or semi-formal) proof. Then the extension to predicate logic 
follows much the same pattern. Zermelo-Fraenkel set theory can then be formulated axiomatically within 
predicate logic using the same techniques and applying the already-defined concepts and already-proved 
theorems of propositional and predicate logic. 


4.3. Logical deduction 


4.3.1 REMARK: Styles of logical deduction. 

There are many distinct styles of logical deduction. Any attempt to fit all formalisations into a single 
framework will either be too narrow to describe all systems or else so wide that the work required to 
reduce the broad abstraction to a concrete formalisation is excessively onerous. Some systems require the 
maintenance of dependency lists for each line of argument. Some systems use two-dimensional tableaux of 
various kinds, which are difficult to formalise and apply systematically. However, the majority of logical 
deduction systems follow a pattern which is relatively easy to describe and apply. 


The most popular kind of logical deduction system has globally fixed sets of proposition names, basic logical 
operators, punctuation symbols, syntax rules, axioms and inference rules. Generally one will commence 
logical deduction in possession of a (possibly empty) set of accumulated theorems from earlier logical deduction 
activity. The set of accumulated theorems is augmented by new theorems as they are discovered, and the 
logical deduction may exploit these accumulated theorems. The theorem accumulation set is generally not 
formalised explicitly. Theorems may be exchanged between the theorem accumulation sets of people who are 
using the same logical deduction system and trust each other not to make mistakes. Thus academic research 
journals become part of the logical deduction process. 


Typically a number of short-cuts are formulated as “metatheorems” or “derived inference rules” to automate 
tediously repetitive sequences of steps in proofs. These metatheorems and derived rules are maintained and 
exchanged in metatheorem or derived rule accumulation sets. 


It is rare that logical deduction systems are fully specified. Until one tries to implement a system in computer 
software, one does not become aware of many of the implicit state variables which must be maintained. (One 
notable computer implementation of mathematical logic is the Isabelle/HOL “proof assistant for higher-order 
logic”, Nipkow/Paulson/Wenzel [374].) 


4.3.2 REMARK: Every line of a logical argument is a tautology, more or less. 

In the currently most popular styles of formal argument for propositional logic, every line of an argument 
is, in principle, a tautology. A formal argument progresses by inferring new tautologies from earlier lines in 
the argument, or from an accumulated cache of tautologies, which may be expressed as axioms or theorems. 
Such inference follows rules which are (hopefully) guaranteed to permit only valid tautologies to be inferred 
from valid tautologies. One could perhaps describe propositional calculus without non-logical axioms as 
“propositional tautology calculus”. 


In practical applications, there are two notable departures from the notion that “every line is a tautology”. 
First, any proposition may be introduced at any point as a hypothesis. Hypotheses are typically not tau- 
tologies because tautologies can usually be proved, and therefore do not need to be assumed. One purpose 
of introducing a hypothesis is to establish a derived rule of inference which permits the inference of one 
proposition from another. Thus a proposition α may be introduced as a hypothesis, and the standard rules 
of inference may be applied to show that a proposition β is conditionally true if α is true. The informal word 
“true” here does not signify a tautology. Conditional “truth” means that if the global knowledge set 2^P is 
restricted to the subset K_α ⊆ 2^P which is signified by α, then the knowledge set K_β ⊆ 2^P signified by β is 
a tautology relative to K_α. This means that from the point of view of the subset K_α, the set K_β excludes 
no truth value maps. Thus if τ ∈ K_α, then τ ∈ K_β. In other words, K_α ⊆ K_β. 


After a hypothesis α has been introduced in a formal argument, all propositions β following this introduction 
are only required to signify knowledge sets K_β which are supersets of K_α. Then instead of writing ⊢ β, 
one writes α ⊢ β. This can then be claimed as a theorem which asserts that β may be inferred from α. 
By uniformly applying substitution rules to α and β, one may then apply such a theorem in other proofs. 
Whenever a proposition of the form α appears in an argument, it is permissible to introduce β as an inference. 
The moral justification for this is that one could easily introduce the lines of the proof of α ⊢ β into the 
importing argument with the desired uniform substitutions. To save time, one prepares in advance numerous 
stretches of argument which follow a common pattern. Such patterns may be exploited by invocation, without 
literally including a substituted copy into the importing argument. 


A second departure from the notion that “every line is a tautology” is the adoption of “non-logical axioms”. 
These are axioms which are not tautologies. Such axioms are adopted for the logical modelling of concrete 
proposition domains P where some relations between the concrete propositions p ∈ P are given a-priori. As 
a simple example, P could include propositions of the form α(x) = “x is a dog” and β(x) = “x is an animal” 
for x in some universe U. Then one might define K_1 = {τ ∈ 2^P; ∀x ∈ U, τ(α(x)) ⇒ τ(β(x))}. Usually 
this would make K_1 a proper subset of 2^P. This is equivalent to adopting all propositions of the form 
α(x) ⇒ β(x) as non-logical axioms. In a system which has non-logical axioms, all arguments are made 
relative to these axioms, which in semantic terms means that the global space 2^P is replaced by a proper 
subset which corresponds to “a-priori knowledge”. 
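
The dog-and-animal example can be made concrete for a very small universe. The following is a rough sketch only: the two-element universe and its element names are invented for illustration, and truth value maps are represented as Python dictionaries. It enumerates all truth value maps, selects those which respect the non-logical axioms, and shows that an instance of the axiom form holds for every map in the restricted set but not for every map in the full space.

```python
from itertools import product

# Hypothetical universe U with two elements.
UNIVERSE = ("rex", "tweety")

# Concrete proposition domain P: "x is a dog" and "x is an animal" for x in U.
PROPS = [(kind, x) for x in UNIVERSE for kind in ("dog", "animal")]

# All truth value maps from P to {False, True}, that is, all elements of 2^P.
ALL_MAPS = [
    dict(zip(PROPS, vals))
    for vals in product([False, True], repeat=len(PROPS))
]


def respects_axioms(tau: dict) -> bool:
    """For all x in U: tau(dog(x)) implies tau(animal(x))."""
    return all((not tau[("dog", x)]) or tau[("animal", x)] for x in UNIVERSE)


# The a-priori knowledge set K_1.
K1 = [tau for tau in ALL_MAPS if respects_axioms(tau)]


def claim(tau: dict) -> bool:
    """The proposition dog(rex) implies animal(rex)."""
    return (not tau[("dog", "rex")]) or tau[("animal", "rex")]


holds_relative_to_K1 = all(claim(tau) for tau in K1)
holds_in_full_space = all(claim(tau) for tau in ALL_MAPS)

print(len(ALL_MAPS), len(K1), holds_relative_to_K1, holds_in_full_space)
```

With two elements in the universe there are 16 truth value maps, of which 9 survive the restriction, so K_1 is a proper subset of 2^P in this case.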


In the case of predicate calculus, it transpires, at least in the “natural deduction” style of logical argument 
which is presented in Section 6.3, that “every line is a conditional tautology”. In other words, every line has 
the implicit form α_1, α_2, ... α_m ⊢ β. Such a form of conditional assertion has the semantic interpretation 
⋂_{i=1}^m K_{α_i} ⊆ K_β, which is equivalent to the set-equality (2^P \ ⋂_{i=1}^m K_{α_i}) ∪ K_β = 2^P, which effectively means 
that the proposition (α_1 ∧ α_2 ∧ ... α_m) ⇒ β is a tautology. In such a predicate calculus, the same two kinds 
of departures from the rule “every line is a tautology” are encountered, namely hypotheses and non-logical 
axioms. 
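
The semantic interpretation of a conditional assertion can be tested directly for a small propositional example. The sketch below is an illustration under assumptions: the three proposition names and the sample expressions are invented, and logical expressions are modelled as Python functions of a truth value map. It computes knowledge sets as sets of truth value maps and compares the inclusion test with the tautology test.

```python
from itertools import product

NAMES = ("p", "q", "r")   # hypothetical proposition names

# All truth value maps on the three names.
ALL_MAPS = [
    dict(zip(NAMES, vals))
    for vals in product([False, True], repeat=len(NAMES))
]
EVERYTHING = frozenset(range(len(ALL_MAPS)))


def knowledge_set(expr) -> frozenset:
    """Indices of the truth value maps which are not excluded by expr."""
    return frozenset(i for i, tau in enumerate(ALL_MAPS) if expr(tau))


def entails(premises, conclusion) -> bool:
    """Inclusion test: the intersection of the premise sets lies in K_β."""
    common = EVERYTHING
    for alpha in premises:
        common &= knowledge_set(alpha)
    return common <= knowledge_set(conclusion)


def is_tautology(expr) -> bool:
    return knowledge_set(expr) == EVERYTHING


def alpha1(t):
    return (not t["p"]) or t["q"]     # p ⇒ q


def alpha2(t):
    return (not t["q"]) or t["r"]     # q ⇒ r


def beta(t):
    return (not t["p"]) or t["r"]     # p ⇒ r


as_inclusion = entails([alpha1, alpha2], beta)
as_tautology = is_tautology(
    lambda t: (not (alpha1(t) and alpha2(t))) or beta(t)
)
converse_fails = not entails([beta], alpha1)

print(as_inclusion, as_tautology, converse_fails)
```

The two tests agree, as the set-equality in this remark requires. The converse check shows that the inclusion is a genuine restriction and not something which holds for arbitrary pairs of expressions.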


4.3.3 REMARK: Notation for the assertion of existence of a proof of a proposition. 

The “assertion” of a logical expression in a propositional calculus means that one is claiming to be in 
possession of a proof of the logical expression according to the axioms and rules of that propositional 
calculus. This is somewhat metamathematical, being defined in terms of the “socio-mathematical network” 
of mathematicians who maintain and exchange lists of proved theorems, and trust each other to not insert any 
faulty theorems into the common pool. The claim of “existence” of a proof is not like the claims of existence 
of sets in set theory, which are often proved by logical argument rather than by direct demonstration. The 
existence of a proof is supposed to mean that the claimant can produce the entire sequence of lines of the 
proof on demand. This is a much more concrete form of existence than, for example, the “existence” of an 
infinite set in set theory. 


4.3.4 REMARK: Propositional calculus scroll management. 

The logical deduction process may be thought of as unfolding on a scroll of paper which has definite left and 
top edges, but which is as long and wide as required. Typically there will be three columns: the first for the 
line number, the second for logical expressions, and the third for the justification of the logical expression. 


In some deduction systems, not all lines are asserted to be proved. For example, systems which use the RAA 
(“reductio ad absurdum”) method permit an arbitrary logical expression to be introduced as an “assumption” 
(or “premise”), with the intention of deducing a contradiction from it, which then justifies asserting the 
negation of the assumption. In such systems, all consequences of such assumptions will be shown to be 
conditional on a false premise. Therefore the lines which are provisional must be distinguished from those 
which are asserted. This is typically achieved by maintaining a fourth column which lists the previous lines 
upon which a given line depends. When the dependency list is empty, the logical expression in that line is 
being asserted unconditionally. 


The axioms may be thought of as being written on a separate small piece of paper, or as incorporated into 
a machine which executes the rules of the system, or they may be written on the first few lines of the main 
scroll. Most systems introduce axioms or instantiations of axiom templates as required. A substitution rule 
may or may not be required to produce a desired instantiation of an axiom. 


An “assertion” may now be regarded as a quotation of an unconditionally asserted logical proposition which 
has appeared in the main scroll. For this purpose, a separate scroll which lists theorems may be convenient. 
If challenged to produce a proof of a theorem, the keeper of the main scroll may locate the line in the scroll 
and perform a trace-back through the justifications to determine a minimal proof for the assertion. Since 
the justification of a line must quote the earlier line numbers which it requires, one may collect all of the 
relevant earlier lines into a complete proof. 
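
The trace-back procedure can be sketched in a few lines of code. This is only an illustrative model: the `Line` record, the contents of the sample scroll and the function name `trace_back` are hypothetical, and the axiom instances shown are of the familiar implication-axiom kind, which may differ from those in Definition 4.4.3. The logical expressions are stored as plain strings and are not validated here. Only the line-number bookkeeping is demonstrated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    expr: str            # second column: the logical expression
    rule: str            # third column: the justification
    cites: tuple = ()    # earlier line numbers quoted by the justification


# A small "main scroll". Line 3 is not used by the proof of line 6.
scroll = {
    1: Line("α ⇒ ((α ⇒ α) ⇒ α)", "axiom"),
    2: Line("(α ⇒ ((α ⇒ α) ⇒ α)) ⇒ ((α ⇒ (α ⇒ α)) ⇒ (α ⇒ α))", "axiom"),
    3: Line("β ⇒ (α ⇒ β)", "axiom"),
    4: Line("(α ⇒ (α ⇒ α)) ⇒ (α ⇒ α)", "MP", (1, 2)),
    5: Line("α ⇒ (α ⇒ α)", "axiom"),
    6: Line("α ⇒ α", "MP", (5, 4)),
}


def trace_back(scroll: dict, target: int) -> list:
    """Collect the line numbers on which the target line depends."""
    needed = set()
    stack = [target]
    while stack:
        n = stack.pop()
        if n not in needed:
            needed.add(n)
            stack.extend(scroll[n].cites)
    return sorted(needed)


proof_lines = trace_back(scroll, 6)
print(proof_lines)
for n in proof_lines:
    print(n, scroll[n].expr, scroll[n].rule, scroll[n].cites)
```

The lines returned form a complete proof of the target line when read in increasing order, because every quoted line number is earlier than the line which quotes it.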


Thus an “assertion” is effectively a claim that someone has a scroll, or can produce a scroll, which gives 
a proof of the assertion. In practice, mathematicians are almost always willing to accept mere sketches 
of proofs, which indicate how a proof could be produced. True proofs are almost never given, except for 
elementary logic theorems. 


4.3.5 REMARK: Why logic and mathematics are organised into theorems. The mind is a microscope. 

Recording all of the world’s logic and mathematics on a single scroll would have the advantage of ensuring 
consistency. In a very abstract sense, all of the argumentation is maintained on a single scroll for each logical 
system. In principle, all of the theorems scattered around the world could be merged into a single scroll. 
One obvious reason for not doing this is that there are so many people in different places independently 
contributing to the “main scroll”. But even individuals break up their logical arguments into small scrolls 
which each have a statement of a theorem followed by a proof. This kind of organisation is not merely 
convenient. It is necessitated by the very limited “bandwidth” of the human mind. While working on one 
theorem, an individual temporarily focuses on providing a correct (or convincing) proof of that assertion. 
During that time, maximum focus can be brought to bear on a small range of ideas. The mind is like 
a microscope which has a very limited field of view, resolution and bandwidth, but by breaking up big 
questions into little questions, it is possible to solve them. Theorems correspond, roughly speaking, to the 
mind’s capacity to comprehend ideas in a single chunk. A million lines of argument cannot be comprehended 
at one time, but it is possible in tiny chunks, like exploring and mapping a whole country with a microscope. 


4.3.6 REMARK: The validity of derived rules. 

This simple picture is blurred by the use of metamathematical “derived rules”, which are proved metamath- 
ematically to “exist” in some sense. Such proofs regard a propositional calculus (or other logical system) 
as a kind of machine which can itself be studied within a higher-order mathematical model. However, in 
the case of logical deduction systems, the validity of proofs in a higher-order model relies upon theorems 
which are produced within the lower-order model. Therefore the use of derived rules casts doubt upon 
theorems which are produced with their assistance. If a particular line of a proof is justified by the use of a 
derived rule, this is effectively a claim that a proof without derived rules may be produced on demand. In 
effect, this is something like a “macro” in computer programming languages. A “macro” is a single line of 
pseudo-code which is first expanded by a pre-processor to generate the real software, and is then interpreted 
by a language compiler in the usual way. If a derived rule can always be “expanded” to produce a specific 
sequence of deduction lines upon demand, then such derived rules should be essentially harmless. However, 
real danger arises if metaphysical methods such as choice axioms are employed in the metamathematical 
proof of a derived rule. As a rough rule, if a computer program can convert a derived rule into explicit 
deduction lines which are verifiably correct, then the derived rule should be acceptable. Thus derived rules 
which use induction are mostly harmless. 


One of the most important derived deduction rules is the invocation of a previously proved theorem. This is 
effectively equivalent to the insertion of the previous theorem’s proof in the lines of proof of the new theorem. 
In this sense, every theorem may be regarded as a template or “macro” whose proof may be expanded and 
inserted into new proofs. Since this rule may be applied recursively any finite number of times, a single 
invocation of a theorem may add thousands of lines to a fully expanded proof. 
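
The “macro” view of theorem invocation can be sketched as a recursive expansion. The following is a toy model under assumptions: the theorem names, the step labels and the function `expand` are all hypothetical, and the steps are bare labels, not real logical expressions. It shows only how the length of a fully expanded proof grows when theorems invoke other theorems.

```python
# Each proof is a list of steps. A step is either a primitive line
# (a string label) or an invocation ("invoke", theorem_name).
proofs = {
    "T1": ["axiom", "axiom", "MP"],
    "T2": ["axiom", ("invoke", "T1"), ("invoke", "T1"), "MP"],
    "T3": [("invoke", "T2"), ("invoke", "T2"), "MP"],
}


def expand(name: str, active: tuple = ()) -> list:
    """Replace every invocation by the expanded proof of the invoked theorem."""
    if name in active:
        raise ValueError("circular invocation of " + name)
    lines = []
    for step in proofs[name]:
        if isinstance(step, tuple) and step[0] == "invoke":
            lines.extend(expand(step[1], active + (name,)))
        else:
            lines.append(step)
    return lines


lengths = {name: len(expand(name)) for name in proofs}
print(lengths)
```

Here the written proofs have 3, 4 and 3 steps, but the fully expanded proofs have 3, 8 and 17 primitive lines. The guard against circular invocation reflects the requirement that a theorem may only quote theorems which were proved earlier.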


4.3.7 REMARK: The meaning of the assertion symbol. 

The assertion symbol “⊢” is used in many ways. Sometimes it is a report or claim that someone has found 
a proof of a specific assertion within some logical system. In this case, the reporter or claimant should be 
able to produce an explicit proof on demand. 


In the theorem-and-proof style of mathematical presentation, each theorem-statement text contains one or 
more assertions which are proved in the theorem-proof text which typically follows the theorem-statement. 
Stating the claim before the proof has the advantage that the reader can see the “punch line” without 
searching the proof. Another advantage is that a trusting reader may omit reading the proof since most 
readers are interested only in the highlighted assertions. 


In metamathematics, the assertion symbol generally indicates that an asserted logical proposition template 
(not a single explicit logical proposition) is claimed to have a proof which the claimant can produce on demand 
if particular logical propositions are substituted into the template. In this case, an explicit algorithm should 
be stated for the production of the proof. In the absence of an effective procedure for producing an explicit 
proof, some scepticism could be justified. 


The assertion symbol “⊢” is in all cases a metamathematical concept. In the metaphor of Remark 4.3.4, 
the assertion symbol is not found on “the main scroll”, unless one considers that all unconditional lines in 
“the main scroll” have an implicit assertion symbol in front of them, and quoting such lines in other “scrolls” 
quotes the assertion symbols along with the asserted logical propositions. 


4.3.8 REMARK: Three meanings for assertion symbols. 
There are at least three kinds of meaning of an assertion symbol in the logic literature. 


(1) Synonym for the implication operator. In this case, “α ⊢ β” means exactly the same thing as 
“α ⇒ β”. This is found for example in the case of sequents as discussed in Section 5.3. 


(2) Assertion of provability. In this case, “α ⊢ β” means that there exists a proof within the language 
that β is true if α is assumed. 


(3) Semantic assertion. In this case, “α ⊢ β” means that for all possible interpretations of the language, 
if the interpretation of α is true, then the interpretation of β is true. This is generally denoted with a 
double hyphen: “α ⊨ β”. 


According to Gentzen [413], page 180, the assertion symbol in sequents signifies exactly the same as an 
implication operator. This is not surprising because in his sequent calculus, the language was “complete”, 
and so all true theorems were provable. In other words, there was no difference between syntactic and 
semantic implication. This meaning for the assertion symbol in sequents is confirmed by Church [348], 
page 165. Contradicting this is a statement by Curry [350], page 184, that Gentzen’s “elementary statements 
are statements of deducibility”. However, this may be a kind of revisionist view of history after model theory 
came to dominate mathematical logic. The original paper by Gentzen is very clear and cannot be understood 
as Curry claims. 


In model theory, options (2) and (3) are sharply distinguished because model theory would have no purpose 
if they were the same. 


4.3.9 DEFINITION [MM]: An unconditional assertion of a logical expression α in an axiomatic system X is 
a metamathematical claim that α may be proved within X. 


A conditional assertion of a logical expression β from premises α_1, α_2, ... α_n (where n is a positive 
integer) in an axiomatic system X is a metamathematical claim that β may be proved within X from 
premises α_1, α_2, ... α_n. 


An assertion in an axiomatic system X is either an unconditional assertion of a logical expression within X 
or a conditional assertion of a logical expression within X from some given premises. 


4.3.10 REMARK: Unconditional assertions are tautologies. Conditional assertions are derived rules. 
Unconditional assertions in Definition 4.3.9 may be thought of as the special case of conditional assertions 
with n = 0. However, there is a qualitative difference between the two kinds of assertion. An unconditional 
assertion claims that the asserted logical expression is a consequence of the axioms and rules alone, which 
means that it is a “fact” about the system. In pure propositional calculus (without non-logical axioms), 
all such valid assertions signify tautologies. 


A conditional assertion is a claim that if the premises appear as lines on the “main scroll” of an axiomatic 
system, then the asserted logical expression may be produced on a later line by applying the axioms and 
rules of the system. Thus if the premises are not “facts” of the system, then the claimed consequence is not 
claimed to be a “fact”. A conditional assertion may be valid even when the premises are not “facts”. This 
is somewhat worrying when the “main scroll” is supposed to list only “facts”. 
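
The content of a conditional assertion can be illustrated by a very small derivation checker. This is a sketch under simplifying assumptions: logical expressions are nested tuples, the only inference rule is modus ponens, axiom instances are omitted, and the names `imp` and `check_derivation` are hypothetical. The checker accepts a list of lines only if each line is a quoted premise or follows from two earlier lines by modus ponens, and the last line is the asserted conclusion.

```python
def imp(a, b):
    """Build the expression a ⇒ b as a nested tuple."""
    return ("imp", a, b)


def check_derivation(premises, conclusion, lines) -> bool:
    """Check a claimed derivation of the conclusion from the premises.

    Each line is a pair (expr, justification), where the justification is
    "premise" or ("mp", i, j) with 0-based indices of earlier lines such
    that line j is the implication (line i) ⇒ expr.
    """
    for k, (expr, just) in enumerate(lines):
        if just == "premise":
            ok = expr in premises
        elif isinstance(just, tuple) and len(just) == 3 and just[0] == "mp":
            _, i, j = just
            ok = (
                0 <= i < k
                and 0 <= j < k
                and lines[j][0] == imp(lines[i][0], expr)
            )
        else:
            ok = False
        if not ok:
            return False
    return bool(lines) and lines[-1][0] == conclusion


# Claimed derivation for: a, a ⇒ b, b ⇒ c ⊢ c
premises = ["a", imp("a", "b"), imp("b", "c")]
derivation = [
    ("a", "premise"),
    (imp("a", "b"), "premise"),
    ("b", ("mp", 0, 1)),
    (imp("b", "c"), "premise"),
    ("c", ("mp", 2, 3)),
]

accepted = check_derivation(premises, "c", derivation)
rejected = check_derivation(premises[:2], "c", derivation)
print(accepted, rejected)
```

The derivation is accepted whether or not the premises are “facts” of the system. The check concerns only the form of the lines, which is the sense in which a conditional assertion may be valid when its premises are not.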


4.3.11 NOTATION[MM]: Unconditional assertions. 
⊢_X α denotes the unconditional assertion of the logical expression α in an axiomatic system X. 


⊢ α denotes the unconditional assertion of the logical expression α in an implicit axiomatic system. 


4.3.12 NOTATION[MM]: Conditional assertions. 
α_1, α_2, ... α_n ⊢_X β, for any positive integer n, denotes the conditional assertion of the logical expression β 
from premises α_1, α_2, ... α_n in an axiomatic system X. 


α_1, α_2, ... α_n ⊢ β, for any positive integer n, denotes the conditional assertion of the logical expression β 
from premises α_1, α_2, ... α_n in an implicit axiomatic system. 


4.3.13 REMARK: The assertion symbol. 

Lemmon [367], page 11, states that the assertion symbol “⊢” in Notations 4.3.11 and 4.3.12 is “called often
but misleadingly in the literature of logic the assertion sign. It may conveniently be read as ‘therefore’.
Before it, we list (in any order) our assumptions, and after it we write the conclusion drawn.” Curry [350],
page 66, suggests that the conditional assertion sign may be read as “entails” or “yields”. Possibly “entailment
sign” would be suitably precise, but most authors call it the “assertion symbol” or “assertion sign”.


It is very important not to confuse the conditional assertion “α ⊢ β” with the logical implication expression
“α ⇒ β” or the unconditional assertion “⊢ α ⇒ β”, although it is true that they signify related ideas. The
relations between these ideas are discussed in Remark 4.3.18, Section 4.8 and elsewhere.


4.3.14 REMARK: Notation for two-way assertions. 
The two-way assertion symbol in Notation 4.3.15 is rarely needed. It is necessarily restricted to the case 
n = 1 in Definition 4.3.9. 


4.3.15 NOTATION [MM]: Two-way assertions. 
α ⊣⊢_X β denotes the combined conditional assertions α ⊢_X β and β ⊢_X α in an axiomatic system X.


α ⊣⊢ β denotes the combined conditional assertions α ⊢ β and β ⊢ α in an implicit axiomatic system.


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 



4.3.16 REMARK: Logical expression formulas. 

Inference rules are often stated in terms of logical expression formulas. (For example, the formula α ⇒ β in
Remark 4.3.17.) When specific logical expressions are substituted for all letters in a single logical expression 
formula, the result is a single logical expression. (Logical expression formulas are located in layer 5 in 
Figure 4.1.1 in Remark 4.1.7.) For comparison, if concrete propositions are substituted for all letters in a 
single logical expression, the result is a “knowledge set” which excludes zero or more combinations of truth 
values of the concrete propositions. (Logical expressions are located in layer 3 in Figure 4.1.1.) 


4.3.17 REMARK: The concept of operation of the modus ponens rule. 
Let α be a logical expression. The modus ponens rule states that if α appears on some line (n₁):


(n₁) α

and for some logical expression β, the logical expression formula α ⇒ β appears on line (n₂):

(n₂) α ⇒ β

then the logical expression β may be written in any line (n₃):

(n₃) β          MP (n₂,n₁)


which satisfies n₁ < n₃ and n₂ < n₃. To assist error-checking and “trace-back”, it is customary to document
each application of the modus ponens rule in the third column in line (n₃), indicating the two input lines in
some way, such as for example “MP (n₂,n₁)”. (It is customary to list the input line dependencies with the
conditional proposition “α ⇒ β” first and the antecedent proposition “α” second.)
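
The line-number bookkeeping of the modus ponens rule can be sketched in code. The following Python fragment is only an illustration and is not part of the formal system. The encoding of wffs as nested tuples and the names imp, modus_ponens and append_mp are assumptions introduced here.

```python
# Assumed encoding of wffs for this sketch:
#   "A"              a proposition name
#   ("not", a)       the wff (¬a)
#   ("imp", a, b)    the wff (a ⇒ b)

def imp(a, b):
    return ("imp", a, b)

def modus_ponens(lines, n2, n1):
    """Return the wff licensed by MP (n2,n1), or raise ValueError.

    `lines` maps line numbers to wffs. Line n2 must hold an implication
    whose antecedent is exactly the wff on line n1.
    """
    conditional = lines[n2]
    antecedent = lines[n1]
    if not (isinstance(conditional, tuple) and conditional[0] == "imp"):
        raise ValueError(f"line ({n2}) is not an implication")
    if conditional[1] != antecedent:
        raise ValueError(f"line ({n1}) is not the antecedent of line ({n2})")
    return conditional[2]

def append_mp(lines, n3, n2, n1):
    """Write the MP result on line n3; both input lines must precede it."""
    if not (n1 < n3 and n2 < n3):
        raise ValueError("input lines must precede the output line")
    lines[n3] = modus_ponens(lines, n2, n1)
    return lines[n3]

proof = {1: "A", 2: imp("A", "B")}
append_mp(proof, 3, 2, 1)   # line (3) is now "B", justified by MP (2,1)
```

The argument order (n2, n1) follows the custom stated above: the conditional proposition is listed first and the antecedent second.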

4.3.18 REMARK: Modus ponens and the deduction metatheorem. 

The modus ponens inference rule may be thought of very roughly as replacing the ⇒ operator with the
⊢ symbol. The reverse replacement rule is the deduction metatheorem, which states that the theorem
Γ ⊢ α ⇒ β may be proved if the theorem Γ, α ⊢ β can be proved, where Γ is any list of logical expressions.
(See Section 4.8 for the deduction metatheorem.)


4.4. A three-axiom implication-and-negation propositional calculus 


4.4.1 REMARK: The chosen axiomatic system for propositional calculus for this book. 
Section 4.4 concerns the particular formalisation of propositional calculus which has been selected as the
starting point for propositional logic theorems (and all other theorems) in this book.


Definition 4.4.3 defines a one-rule, two-symbol, three-axiom system. It uses ⇒ and ¬ as primitive symbols,
and modus ponens and substitution as the inference rules.


The three axioms (PC 1), (PC 2) and (PC 3) are attributed to Jan Łukasiewicz by Hilbert/Ackermann [358],
page 29, Rosenbloom [386], pages 38-39, 196, Church [348], pages 119, 156, and Curry [350], pages 55-56, 84.
These axioms are also presented by Margaris [369], pages 49, 51, and Stoll [393], page 376. (According
to Hilbert/Ackermann [358], page 29, Łukasiewicz obtained these axioms as a simplification of an earlier
6-axiom system published in 1879 by Frege [355], but it is difficult to verify this because of Frege's almost 
inscrutable quipu-like notations for logical expressions.) 


The chosen three axioms are a compromise between a minimalist NAND-based axiom system (as described 
in Remark 4.2.3), and an easy-going 4-symbol, 10-axiom “natural” system which would be suitable for a 
first introduction to logic. Definition 4.4.3 incorporates these axioms in a one-rule, two-symbol, three-axiom 
system for propositional calculus. 


4.4.2 REMARK: Proposition names, wffs, wff-names and wff-wffs. 

Definition 4.4.3 is broadly organised in accordance with the general propositional calculus axiomatisation 
components which are presented in Remark 4.1.6. For brevity, the term “logical expression” is replaced 
with the abbreviation “wff” for “well-formed formula”. The term “scope” is very much context-dependent. 
So it cannot be defined here. A scope is a portion of text where the bindings of name spaces to concrete 
object spaces are held constant. A scope may inherit bindings from its parent scope. (See Remark 3.9.2 and 
Notation 3.9.3 for logical expression name dereferencing as in te => p.)

Wffs, wff-names and wff-wffs must be carefully distinguished (although in practice they are often confused).
The wffs in Definition 4.4.3 (v) are strings such as “((¬A) ⇒ (B ⇒ C))”, which are built recursively from
proposition names, logical operators and punctuation symbols. The wff-names in Definition 4.4.3 (iv) are
letters which are bound to particular wffs in a particular scope. Thus the wff-name “α” may be bound
to the wff “((¬A) ⇒ (B ⇒ C))”, for example. The wff-wffs in Definition 4.4.3 (vi) are strings such as
“(α ⇒ (β ⇒ α))” which appear in axiom templates and rules in Definition 4.4.3 (vii, viii), and also in
theorem assertions and proofs. Thus wff-wffs are templates containing wff-names which may be substituted
with wffs, and the wffs contain proposition names.


4.4.3 DEFINITION[MM]: The following is the axiomatic system PC for propositional calculus. 
(i) The proposition name space Np for each scope consists of lower-case and upper-case letters of the
Roman alphabet, with or without integer subscripts.
(ii) The basic logical operators are ⇒ (“implies”) and ¬ (“not”).
(iii) The punctuation symbols are “(” and “)”.
(iv) The logical expression names (“wff names”) are lower-case letters of the Greek alphabet, with or without
integer subscripts. Each wff name which is used within a given scope is bound to a unique wff.
(v) Logical expressions (“wffs”) are specified recursively as follows.
    (1) “A” is a wff for all A in Np.
    (2) “(α ⇒ β)” is a wff for any wffs α and β.
    (3) “(¬α)” is a wff for any wff α.
Any expression which cannot be constructed by recursive application of these rules is not a wff. (For
clarity, parentheses may be omitted in accordance with customary precedence rules.)
(vi) The logical expression formulas (“wff-wffs”) are defined recursively as follows.
    (1) “α” is a wff-wff for any wff-name α which is bound to a wff.
    (2) “(Ξ₁ ⇒ Ξ₂)” is a wff-wff for any wff-wffs Ξ₁ and Ξ₂.
    (3) “(¬Ξ)” is a wff-wff for any wff-wff Ξ.
(vii) The axiom templates are the following logical expression formulas for any wffs α, β and γ.
    (PC 1) α ⇒ (β ⇒ α).
    (PC 2) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
    (PC 3) (¬β ⇒ ¬α) ⇒ (α ⇒ β).
(viii) The inference rules are substitution and modus ponens. (See Remark 4.3.17.)
    (MP) α ⇒ β, α ⊢ β.
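
The recursive specification of wffs in part (v) translates fairly directly into a recursive-descent recogniser. The Python sketch below makes several assumptions which are not in the definition: input must be fully parenthesised (no precedence rules), integer subscripts are approximated by trailing digits (so that A1 stands for A with subscript 1), and the function names parse_wff and is_wff are hypothetical.

```python
import re

# A proposition name: one Roman letter, optionally followed by digits.
NAME = re.compile(r"[A-Za-z][0-9]*")

def _parse(s, i):
    """Parse one wff starting at index i; return (tree, next index)."""
    if i >= len(s):
        raise ValueError("unexpected end of input")
    if s[i] == "(":
        if s.startswith("¬", i + 1):            # rule (3): "(¬α)"
            inner, j = _parse(s, i + 2)
            tree = ("not", inner)
        else:                                   # rule (2): "(α ⇒ β)"
            left, j = _parse(s, i + 1)
            if not s.startswith("⇒", j):
                raise ValueError(f"expected ⇒ at position {j}")
            right, j = _parse(s, j + 1)
            tree = ("imp", left, right)
        if not s.startswith(")", j):
            raise ValueError(f"expected ) at position {j}")
        return tree, j + 1
    m = NAME.match(s, i)                        # rule (1): a proposition name
    if not m:
        raise ValueError(f"expected a proposition name at position {i}")
    return m.group(), m.end()

def parse_wff(text):
    """Return the parse tree of a fully parenthesised wff, or raise ValueError."""
    s = text.replace(" ", "")
    tree, pos = _parse(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree

def is_wff(text):
    """True if the string can be built by rules (1) to (3), False otherwise."""
    try:
        parse_wff(text)
        return True
    except ValueError:
        return False

print(parse_wff("((¬A) ⇒ (B ⇒ C))"))
```

A string such as “(α ⇒ β)” is rejected by this recogniser because Greek letters are wff names, not proposition names. Such a string is a wff-wff in the sense of part (vi), not a wff.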


4.4.4 REMARK: The lack of obvious veracity and adequacy of the propositional calculus axioms. 

Axioms PC 1, PC 2 and PC 3 in Definition 4.4.3 are not obvious consequences of the intended meanings of
the logical operators. (See Theorem 3.13.3 for a low-level proof of the validity of the axioms.) Nor is it
obvious that all intended properties of the logical operators follow from these three axioms.


Axiom PC 1 is one half of the definition of the ⇒ operator. Axiom PC 2 looks like a “distributivity axiom”
or “restriction axiom”, which says how the ⇒ operator is distributed or restricted by another ⇒ operator.
Axiom PC 3 looks like a modus tollens rule. (See Remark 4.8.15 for the modus tollens rule.)


4.4.5 REMARK: The interpretation of axioms, axiom schemas and axiom templates. 

Since this book is concerned more with meaning than with advanced techniques of computation, it is
important to clarify the meanings of the logical expressions which appear, for example, in Definition 4.4.3 and
Theorem 4.5.7 before proceeding to the mechanical grind of logical deduction. (The modern formalist fashion 
in mathematical logic is to study logical deduction without reference to semantics, regarding semantics as 
an optional extra. In this book, the semantics is primary.) 


Broadly speaking, the axioms in axiomatic systems may be true axioms or axiom schemas, and the proofs 
of theorems may be true proofs or proof schemas. In the case of true axioms, the “letters” in the logical
expression in an axiom have variable associations with concrete propositions. Each letter is associated with
at most one concrete proposition, the association depending on context. In this case of true axioms, the 
axioms are logical expressions in layer 3 in Figure 4.1.1 in Remark 4.1.7. An infinite number of tautologies 
may be generated from true axioms by means of a substitution rule. Such a rule states that a new tautology 
may be generated from any tautology by substituting an arbitrary logical expression for any letter in the 
tautology. In particular, axioms may yield infinitely many tautologies in this way. 


In the case of axiom schemas, the axioms are logical expression formulas (i.e. wff-wffs), which are in layer 5 
in Figure 4.1.1. There are (at least) two interpretations of how axiom schemas are implemented. In the 
first interpretation, the infinite set of all instantiations of the axiom schemas is notionally generated before 
deductions within the system commence. In this view, an axiom instance, obtained by substituting a 
particular logical expression for each letter in an axiom schema, “exists” in some sense before deductions 
commence, and is simply taken “off the shelf” or “out of the toolbox”. In the second interpretation, each 
axiom schema is a mere “axiom template” from which tautologies are generated “on demand” by substituting 
logical expressions for each letter in the template. In this “axiom template” interpretation, a substitution 
rule is required, but the source for substitution is in this case a wff-letter in a wff-wff rather than in a wff. 


For a standard propositional calculus (without nonlogical axioms), the choice between axioms, axiom schemas 
and axiom templates has no effect on the set of generated tautologies. The axioms in Definition 4.4.3 are 
intended to be interpreted as axiom templates. 


In the case of logic theorems such as Theorem 4.5.7, each assertion may be regarded as an “assertion schema", 
and each proof may be regarded as a “proof schema”. Then each theorem and proof may be applied by 
substituting particular logical expressions into these schemas to produce a theorem instantiation and proof 
instantiation. However, this interpretation does not seem to match the practical realities of logical theorems 
and proofs. By an abuse of notation, one may express the “true meaning” of an axiom such as “α ⇒ (β ⇒ α)”
as ∀α ∈ W, ∀β ∈ W, α ⇒ (β ⇒ α), where W denotes the “set” of all logical expressions. In other words,
the wff-wff “α ⇒ (β ⇒ α)” is thought of as a parametrised set of logical expressions which are handled
simultaneously as a single unit.


This is analogous to the way in which two real-valued functions f, g : ℝ → ℝ may be added to give a
function f + g : ℝ → ℝ by pointwise addition f + g : x ↦ f(x) + g(x). Similarly, one might write f ≤ g to
mean ∀x ∈ ℝ, f(x) ≤ g(x). Thus a wff-wff “α ⇒ (β ⇒ α)” may be regarded as a function of the arguments
α and β, and the logical expression formula is defined “pointwise” for each value of α and β. Thus an
infinite number of assertions may be made simultaneously. The “quantifier” expressions “∀α ∈ W” and
“∀β ∈ W” are implicit in the use of Greek letters in the axioms in Definition 4.4.3 and in the theorems and
the proof-lines in Theorem 4.5.7. To put it another way, one may interpret an axiom such as “α ⇒ (β ⇒ α)”
as signifying the map f : W × W → W given by f(“α”, “β”) = “α ⇒ (β ⇒ α)” for all α ∈ W and β ∈ W.
The quantifier-language “for all” and the set-language W may be removed here by interpreting the axiom
“α ⇒ (β ⇒ α)” to mean:


if α and β are wffs, then the substitution of α and β into the formula “α ⇒ (β ⇒ α)” yields a tautology.


Thus the substitution rule is implicitly built into the meaning of the axiom because the use of wff letters in the 
logical expression formula implies the claim that the formula yields a valid tautology for all wff substitutions. 


There is a further distinction here between wff-functions and wff-rules. A function is thought of as a set of 
ordered pairs, whereas a rule is an algorithm or procedure. Here the “rule” interpretation will be adopted. 
In other words, an axiom such as “α ⇒ (β ⇒ α)” will be interpreted as an algorithm or procedure to
construct a wff from zero or more wff-letter substitutions. This may be referred to as an “axiom template” 
interpretation. 
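
The “rule” interpretation can be mimicked by ordinary procedures. In the Python sketch below, each axiom template is a function which constructs a wff from the wffs supplied as arguments, and a brute-force truth table confirms that a constructed instance is a tautology. The tuple encoding of wffs, all function names, and the reading of ⇒ and ¬ by their usual truth tables are assumptions made for this illustration.

```python
from itertools import product

def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return ("not", a)

# Axiom templates as rules: procedures which map wffs to a wff.
def pc1(a, b):
    return imp(a, imp(b, a))

def pc2(a, b, c):
    return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))

def pc3(a, b):
    return imp(imp(neg(b), neg(a)), imp(a, b))

def names(w):
    """The set of proposition names which occur in the wff w."""
    if isinstance(w, str):
        return {w}
    return set().union(*(names(x) for x in w[1:]))

def value(w, tau):
    """Truth value of w under the truth value map tau (a dict)."""
    if isinstance(w, str):
        return tau[w]
    if w[0] == "not":
        return not value(w[1], tau)
    return (not value(w[1], tau)) or value(w[2], tau)

def is_tautology(w):
    """True if w is true for every assignment of truth values to its names."""
    ns = sorted(names(w))
    return all(value(w, dict(zip(ns, bits)))
               for bits in product([False, True], repeat=len(ns)))

instance = pc1(imp("B", "A"), neg("C"))   # the value of the rule at one argument pair
print(instance, is_tautology(instance))
```

Here pc1 itself plays the role of the rule, while pc1(imp("B", "A"), neg("C")) is one value of that rule, namely a particular wff.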


Yet another fine distinction here is the difference between the value of the axiom template α ⇒ (β ⇒ α)
for a particular choice of logical expressions α and β and the rule (α, β) ↦ “α ⇒ (β ⇒ α)”. The value
is a particular logical expression in W. The rule is a map from W × W to W. (This is analogous to the
distinction between the value of a real-valued function x² + 1 for a particular x ∈ ℝ and the rule x ↦ x² + 1.)
The intended meaning is generally implicit in each context.


4.5. A first theorem to boot-strap propositional calculus


4.5.1 REMARK: Interpretation of substitution in proofs of propositional calculus theorems. 

The validity of the use of axiom schemas and the substitution rule in propositional calculus depends entirely 
upon the fact that the substitution of arbitrary logical expressions into a tautology always yields a tautology. 
(This “fact” is stated as Proposition 3.13.10.) 


Every line of a proof (in the style of propositional calculus presented here) is claimed to be a tautology 
formula (if there are no nonlogical axioms). In other words, it is claimed that every line of a proof becomes a 
tautology if its letters are uniformly substituted with particular wffs. Moreover, it is claimed that the validity 
of such tautology formulas follows from previous lines according to the rules of the axiomatic system. 


When a new line is added to a proof, the justification column may contain one of the following texts. 


(1) “Hyp” to indicate that the line is being quoted verbatim from a premise of the conditional assertion in 
the theorem statement. 


(2) “MP” followed by the line-numbers of the premises for the application of a modus ponens invocation. 
(The substitution which is required will always be obvious because the second referred-to line will be 
a verbatim copy of the logical expression which is on the left of the implication in the first referred-to 
line, and the line being justified will be a verbatim copy of the logical expression which is on the right 
of the implication in the first referred-to line.) 


(3) The label of an axiom, followed by the substitutions which are to be applied. 

(4) The label of a previous unconditional-assertion theorem, followed by the substitutions which are to be 
applied. 

(5) The label of a previous conditional-assertion theorem, followed by the line-numbers of the premises for 
the application of the theorem, followed by the substitutions which are to be applied. 


In cases (3), (4) and (5), the substitutions are assumed to be valid because all axioms and theorem assertions 
are assumed to apply to all possible substitutions. Therefore the wff-wff which is constructed by uniformly 
substituting any wff-wffs into the letters of the theorem wff-wff is a valid wff-wff.


4.5.2 REMARK: The trade-off between precision and comprehensibility. 

It may seem that the rules stated here for propositional calculus lack precision. To state the rules with total 
precision would make them look like computer programs, which would be difficult for humans to read. The 
rules are hopefully stated precisely enough so that a competent programmer could translate them into a form 
which is suitable for a computer to read. However, this book is written for humans. The reader needs to be 
convinced only that the rules could be converted to a form suitable for verification by computer software. 
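
As a rough indication that such a conversion is feasible, the following Python sketch checks proofs whose lines are justified by cases (1), (2) and (3) of Remark 4.5.1. Citations of earlier theorems, cases (4) and (5), are deliberately left out. The data layout, the axiom labels written without spaces, and the function names are assumptions of this sketch.

```python
def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return ("not", a)

# Axiom templates of Definition 4.4.3 (vii), with wff letters as atoms.
AXIOMS = {
    "PC1": imp("α", imp("β", "α")),
    "PC2": imp(imp("α", imp("β", "γ")), imp(imp("α", "β"), imp("α", "γ"))),
    "PC3": imp(imp(neg("β"), neg("α")), imp("α", "β")),
}

def substitute(template, binding):
    """Simultaneously replace each wff letter by the wff-wff bound to it."""
    if isinstance(template, str):
        return binding[template]
    return (template[0],) + tuple(substitute(t, binding) for t in template[1:])

def check_proof(premises, conclusion, proof):
    """Return True if every line is justified and the last line is the conclusion.

    `proof` is a list of (wff-wff, justification) pairs; line numbers start at 1.
    A justification is ("Hyp",), ("MP", n2, n1) or (axiom label, binding).
    """
    for n, (wff, just) in enumerate(proof, start=1):
        if just[0] == "Hyp":
            ok = wff in premises
        elif just[0] == "MP":
            n2, n1 = just[1], just[2]
            ok = (1 <= n1 < n and 1 <= n2 < n
                  and proof[n2 - 1][0] == imp(proof[n1 - 1][0], wff))
        elif just[0] in AXIOMS:
            ok = wff == substitute(AXIOMS[just[0]], just[1])
        else:
            ok = False
        if not ok:
            raise ValueError(f"line ({n}) is not justified by {just}")
    return bool(proof) and proof[-1][0] == conclusion

# The assertion α ⊢ β ⇒ α, proved by Hyp, PC 1 and MP.
example = [
    ("α", ("Hyp",)),
    (imp("α", imp("β", "α")), ("PC1", {"α": "α", "β": "β"})),
    (imp("β", "α"), ("MP", 2, 1)),
]
print(check_proof(["α"], imp("β", "α"), example))
```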


4.5.3 REMARK: Notation for substitutions in proof-line justifications. 

The form of notation chosen here to specify the substitutions to be applied for the kinds of justifications in 
cases (3), (4) and (5) in Remark 4.5.1 is the sequence of all wff letters which appear in the original wff-wff, 
each letter being individually followed within square brackets by the wff-wff which is to be substituted for 
it. Thus, for example, in line (3) of the proof of part (iv) of Theorem 4.5.7, the justification is “part (i) (2):
α[β ⇒ γ], β[α]”. This means that the wff letters α and β in the wff-wffs “α” and “β ⇒ α” in the conditional
assertion “α ⊢ β ⇒ α” are to be substituted with the wff-wffs “β ⇒ γ” and “α” respectively. This yields the
conditional assertion “β ⇒ γ ⊢ α ⇒ (β ⇒ γ)” after the substitution. Then from line (2), the unconditional
assertion “α ⇒ (β ⇒ γ)” follows (because the required condition is satisfied by line (2) of the proof).
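
The substitution in this example can be reproduced mechanically. The Python sketch below applies the substitution α[β ⇒ γ], β[α] to the conditional assertion α ⊢ β ⇒ α. The tuple encoding of wff-wffs and the names substitute, instantiate and show are assumptions introduced for illustration.

```python
def imp(a, b):
    return ("imp", a, b)

def substitute(wffwff, binding):
    """Replace every wff letter simultaneously by the wff-wff bound to it."""
    if isinstance(wffwff, str):
        return binding[wffwff]      # every letter must be listed in the binding
    return (wffwff[0],) + tuple(substitute(t, binding) for t in wffwff[1:])

def instantiate(premises, conclusion, binding):
    """Apply one substitution to all premises and to the conclusion."""
    return ([substitute(p, binding) for p in premises],
            substitute(conclusion, binding))

def show(w, top=True):
    """Render a wff-wff as text, omitting only the outermost parentheses."""
    if isinstance(w, str):
        return w
    if w[0] == "not":
        return "¬" + show(w[1], False)
    text = f"{show(w[1], False)} ⇒ {show(w[2], False)}"
    return text if top else f"({text})"

# The assertion α ⊢ β ⇒ α with the substitution α[β ⇒ γ], β[α].
premises, conclusion = instantiate(
    ["α"], imp("β", "α"), {"α": imp("β", "γ"), "β": "α"})
print(", ".join(show(p) for p in premises), "⊢", show(conclusion))
```

The substitution is simultaneous: the letter α which is introduced by β[α] is not itself replaced again by α[β ⇒ γ].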


4.5.4 REMARK: The potentially infinite number of assertions in an axiom template. 

Each axiom template in Definition 4.4.3 may be thought of as an infinite set of axiom instances, where 
each axiom instance is a logical expression containing proposition names, and each such axiom instance 
represents a potentially infinite number of concrete logical expressions by substituting specific propositions 
for the variable proposition names. For example, the axiom “α ⇒ (β ⇒ α)” could be interpreted to represent
the aggregate of all logical expressions A ⇒ (B ⇒ A), B ⇒ (A ⇒ B), (B ⇒ A) ⇒ (A ⇒ (B ⇒ A)), and all
other logical expressions which can be constructed by such substitutions. Then each of these axiom instances
could be interpreted to mean concrete logical expressions such as (p ⇒ q) ⇒ (q ⇒ (p ⇒ q)) and any other
logical expressions obtained by substitution with particular choices of concrete propositions p, q ∈ P. If P is
infinite, this would give an infinite number of concrete logical expressions for each of the infinite number of
axiom instances. The phrase “potentially infinite” is often used to describe this situation. The application
of substitutions may generate any finite number of axiom instances, and from each axiom instance, any finite 
number of concrete logical expressions may be generated. One could say that the number is “unbounded” or 
“potentially infinite”, but the “existence” of aggregate sets of axiom instances and concrete logical expressions 
is not required. 
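
The “on demand” character of a potentially infinite collection is close to the behaviour of a lazy generator. The Python sketch below yields instances of the template α ⇒ (β ⇒ α) one at a time, so that any finite number of instances can be obtained without the aggregate ever being constructed. The enumeration order, the tuple encoding of wffs and the function names are assumptions of this sketch.

```python
from itertools import islice, product

def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return ("not", a)

def wffs(names):
    """Yield every wff over `names` exactly once, level by level, without end.

    Each level is computed in full before it is yielded, so this is suitable
    only for taking a modest number of wffs.
    """
    level = list(names)
    seen = set(level)
    yield from level
    while True:
        candidates = [neg(a) for a in level]
        candidates += [imp(a, b) for a, b in product(level, repeat=2)]
        fresh = [w for w in candidates if w not in seen]
        seen.update(fresh)
        yield from fresh
        level += fresh

def pc1_instances(names):
    """Yield instances of α ⇒ (β ⇒ α), pairing each new wff with all earlier ones."""
    pool = []
    for w in wffs(names):
        pool.append(w)
        for v in pool:
            yield imp(w, imp(v, w))          # α[w], β[v]
            if v != w:
                yield imp(v, imp(w, v))      # α[v], β[w]

for instance in islice(pc1_instances(["A", "B"]), 4):
    print(instance)
```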


4.5.5 REMARK: Tagging of theorems which are proved within the propositional calculus. 

Logic theorems which are deduced within the propositional calculus axiomatic system in Definition 4.4.3 
will be tagged with the abbreviation “PC”, as in Theorem 4.5.7 for example. (See Remark 6.1.1 for the 
corresponding tag QC for predicate calculus.) Theorems and definitions which are not tagged are, by 
default, stated within Zermelo-Fraenkel set theory. (See also Remark 7.1.11.) 


4.5.6 REMARK: Using logic theorems as exercises. 
To learn the art of theorem-proving, it is a good idea to write one’s own proofs for logic assertions such
as those given in Theorems 4.5.7, 4.5.16, 4.6.2, 4.6.4, 4.7.9 and 4.9.4, without first looking at the solutions
which are provided. Some of the proofs are trivial. Some are non-trivial. Your mileage may vary!


4.5.7 THEOREM [PC]: Initial theorem for the propositional calculus.
The following assertions follow from Definition 4.4.3.
(i) α ⊢ β ⇒ α.
(ii) α ⇒ (β ⇒ γ) ⊢ (α ⇒ β) ⇒ (α ⇒ γ).
(iii) ¬β ⇒ ¬α ⊢ α ⇒ β.
(iv) α ⇒ β, β ⇒ γ ⊢ α ⇒ γ.
(v) α ⇒ (β ⇒ γ) ⊢ β ⇒ (α ⇒ γ).
(vi) ⊢ (β ⇒ γ) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
(vii) ⊢ (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ)).
(viii) β ⇒ γ ⊢ (α ⇒ β) ⇒ (α ⇒ γ).
(ix) α ⇒ β ⊢ (β ⇒ γ) ⇒ (α ⇒ γ).
(x) ⊢ (α ⇒ β) ⇒ ((α ⇒ (β ⇒ γ)) ⇒ (α ⇒ γ)).
(xi) ⊢ α ⇒ α.
(xii) α ⊢ α.
(xiii) ⊢ α ⇒ ((α ⇒ β) ⇒ β).
(xiv) ⊢ (α ⇒ (β ⇒ γ)) ⇒ (β ⇒ (α ⇒ γ)).
(xv) ⊢ ¬α ⇒ (α ⇒ β).
(xvi) ⊢ α ⇒ (¬α ⇒ β).
(xvii) ⊢ (¬α ⇒ α) ⇒ α.
(xviii) ⊢ (¬α ⇒ β) ⇒ ((β ⇒ α) ⇒ α).
(xix) ⊢ (¬β ⇒ ¬α) ⇒ ((¬α ⇒ β) ⇒ β).
(xx) ⊢ ((α ⇒ β) ⇒ γ) ⇒ (β ⇒ γ).
(xxi) ⊢ ((α ⇒ β) ⇒ γ) ⇒ (¬α ⇒ γ).
(xxii) ⊢ ¬¬α ⇒ α.
(xxiii) ⊢ α ⇒ ¬¬α.
(xxiv) ¬¬α ⊢ α.
(xxv) α ⊢ ¬¬α.
(xxvi) ⊢ (α ⇒ ¬β) ⇒ (β ⇒ ¬α).
(xxvii) ⊢ (¬α ⇒ β) ⇒ (¬β ⇒ α).
(xxviii) ⊢ (α ⇒ β) ⇒ (¬β ⇒ ¬α).
(xxix) α ⇒ ¬β ⊢ β ⇒ ¬α.
(xxx) ¬α ⇒ β ⊢ ¬β ⇒ α.
(xxxi) α ⇒ β ⊢ ¬β ⇒ ¬α.
(xxxii) α ⇒ β, ¬β ⊢ ¬α.
(xxxiii) ⊢ (¬β ⇒ ¬α) ⇒ ((¬β ⇒ α) ⇒ β).
(xxxiv) ⊢ (α ⇒ β) ⇒ ((¬β ⇒ α) ⇒ β).
(xxxv) ⊢ (α ⇒ β) ⇒ ((¬α ⇒ β) ⇒ β).
(xxxvi) ⊢ (α ⇒ β) ⇒ (¬¬α ⇒ β).
(xxxvii) ⊢ (¬¬α ⇒ β) ⇒ (α ⇒ β).
(xxxviii) α ⇒ β ⊢ ¬¬α ⇒ β.
(xxxix) ¬¬α ⇒ β ⊢ α ⇒ β.
(xl) ¬α ⊢ α ⇒ β.
(xli) β ⊢ α ⇒ β.
(xlii) ¬(α ⇒ β) ⊢ α.
(xliii) ¬(α ⇒ β) ⊢ ¬β.
(xliv) ⊢ α ⇒ (¬β ⇒ ¬(α ⇒ β)).
(xlv) α, ¬β ⊢ ¬(α ⇒ β).

Pnoor: To prove part (i): 
1) a 

(2) a= (8 — a) 

3) 8-a 


ak bsa 


To prove part (ii): a 


1) a> (B = vy) 


2) (a= (8 > 7)) > ((a => B) > (a> v) 
(3) (a > B) => (a — v) 

To prove part (iii) | —8 — —a F a> 
(1) 38 > ~a 

(2) (^8 > ~a) > (a > B) 

3 o8 


To prove part (iv) a+6,B>yras>y 


) 
) 

(3) a= (B => 7) 
) (a= B) => (a> 7) 
) 

To prove part (v): 
1) a > (8 = 7) 
2) (a= B) => (a => 9) 

(3) B — (a — B) 

4) B=> (a => 7) 


> (L> 7) F (a> p) => 


Hyp 
PCI: ala], 8[8] 
MP (2,1) 
(a > v) 
Hyp 
PC2: afa], [5], h] 
MP (2,1) 


Hyp 
PC3: afa], 8[8] 
MP (2,1) 


Hyp 

Hyp 

part (i) (2): a[8 = 7], Bla] 
part (ii) (3): ala], 818], yh] 
MP (4,1) 


=>(6>y7) F B= (a=>7) 


To prove part (vi): 


(a> y) 


1) (8 2 y) => (a= (8 = 7)) 
2) (a = (B = 7)) = (a => B) 9 (a= 7)) 
(3) (8 = y) = ((a=> B) > (a> 7)) 
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To prove part (vii): F (a= 8) > ((8 => y) > (a= 7)) 
(1) (8 — y) > ((a > B) > (a> ») part (vi): ala], 8(8], [y] 
(2) (a => B) 2 ((B => y) = (a 7) part (v) (1): all > 7], Bla => 8], yla = 9] 
To prove part (viii): —.8 — y F (a> B) — (a — vy) 
(D Bey Hyp 
(2) a+ (854) part (i) (1): e[& + 7], Bla! 
(3) (a > B) => (a> v) part (ii) (2): ala], 8[8], yh] 
To prove part (ix): a= F (8 — y) — (a vy) 
(1) o2 8 Hyp 
2) (a> B) 2 (8 => y) = (a 7) part (vii): alo], 818], yh] 
3) (82 9) => (a 9) MP (2,1) 
To prove part (x): F (a= B) > ((a => (B => 7)) => (a > 7)) 
1) (a (89 9) > (a> 8) 9 (a+) PC2: ala], 615), al» 
2) (a > B) => ((a => (8 > 7)) > (a> v) part (v) (1): ala = (8 => 7)], bla = 8], yla > v 
To prove part (xi): Fa>a 
1) a > ((a => a) > a) PC1: ala], pla > a 
2) (a — (oa => a)) => la => a) part (ii) (1): afa], Bla > al, yla 
3) a> (a >a) PCI: ola], Bla 
(4) aa MP (2,3) 
To prove part (xii: aha 
(1) a Hyp 
(2)a>a part (xi): alfa] 
(3) a MP (2,1) 
To prove part (xiii): F a => ((a> B) — B) 
(1) (a 8) + (a>) part (xi): ala > à 
(2) a = ((a = 8) => 8B) part (v) (D: ala => £], Bla], "y[8 
To prove part (xiv): + (a> (8+ 7)) 9 (5> (a> 3) 
(1) (a> (893) + (a9 8) 9 (a 9) PC2: afa), BIS], sb» 
(2) B = (a = B) PC1: a[b], Bla 
(3) ((a=> 8) > (a > 7)) = (8 > (a => ») part (ix) (2): o[8], Bla = 8], yla > 7 
(4) (a= (8 > 7)) > (8 > (a> 7)) 


part (iv) (1,3): ala = (8 = 7)], ila = 8) => (o — vh; 118 — (a = 7) 


To prove part (xv):  F ~a > (a => f) 


(1) 2a + (48 > =a) 
(2) (28 > ^o) > (a 
(3) ~a > (a> 8) 


B) 


To prove part (xvi) + a= (7a => B) 
(1) ^a > (a > 8) 
(2) a> (~a => 8) 


To prove part (xvii): F (-a>a)>a 
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PC3: ala], [8 
part (iv) (1,2): aja], 8[48 > ~a], yla > 8 


part (xv): ala], 8|8 
part (v) (1): afha], Bla], [8 
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(1) ~a > (a (a => a)) part (xv): ala], 8[5(a => a) 
(2) (sa > a) > (sa > (o > a)) part (ii) (1): a[7o], Bla], y[^(a > a) 
(3) Ca > ~la > a)) > ((@ > a) > a) PC3: afa = aj, Ba 
(4) (^a 2 a) > ((a> a) => a) part (iv) (2,3): aja > a], 8[2o > —^(o > o)], y[(a > a) >a 
(5) (a > a) = ((^a > a) > o) part (v) (4): o[^o > a], Bla > a], yla 
(6) a>a part (xi): ala 
(T) (pa>a)>a MP (5,6) 
To prove part (xviii): F (~a => 8) — ((8 — a) — a) 
(1) (sa > B) > ((B > a) > (~a = o)) part (vii): o[^o], 8[8], yla 
(2) (ha >a) > part (xvii): ala 
3) (8 = a) > (^e > a)) > ((B > a) > a) part (viii) (2): a[8 = a], B[^a = a], yla 
4) Ca = B) - ((B-»a)-» o) part (iv) (1,3): o[^o = 8], B[(B > a) > (^a > a)], (8 > o) > a 
To prove part (xix): F (48 1a) > ((-a > B) > B) 
1) (28 > ~a) 2 (“a — £) ^ B) part (xviii): a[8], B[^a 
To prove part (xx): F ((a— B) > y) > (8 — 7) 
1) 8 = (a= b) PC1: o[8], Bla 
(2) (8 — (a > 8)) > (a> B) > 7) > (B 7)) part (vii): a[8], Bla = 8], viy 
3) ((a=> B) > 7) > (B 9) MP (2,1) 
To prove part (xxi): | ((a>8)>y7)=>(-a=>7) 
1) ~a > (a > B) part (xv): ala], 8(8] 
2) (^a > (a > 8)) > (a 8) > 7) 9 Ca > 7)) part (vii): o[^a], Bla = £], yh] 
3) (a 8) y) > (7a 7) MP (2,1) 
To prove part (xxii): F ~~a > « 
1) (FaSa)>a part (xvii): a[a| 
2) (Da => a) > a) > (77a > a) part (xxi): a[>a], B[a], yla] 
(3) 22a a MP (2,1) 
To prove part (xxiii): F à — ^a 
(1) 222a > ~a part (xxii): a[^o] 
(2) ana part (iii) (1): aja], A-a] 
To prove part (xxiv); =~a F a 
(1) 77a Hyp 
(2) =a > a part (xxii): afa] 
(3) a MP (2,1) 
To prove part (xxv): o F —-a. 
(1) a Hyp 
2) a > nng part (xxiii): ala] 
3) —-a MP (2,1) 
To prove part (xxvi): F(a iB) => (B 10) 
1) a >a part (xxii): ala 
2) (a > ^B) > (ma > >) part (ix): a[^7o], Bla], y8 
3) (77a => 78) => (B = ~a) PC3: alb], 8[^a 
4) (a => =£) > (8 > ~a) part (iv) (2,3): ala — ^8], fma > 8], 7[6 > ^o 
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To prove part (xxvii): F (~a > B) > (^8 => o) 

(1) B= = part (xxiii): a[@ 
(2) Cas B) + (2a. 8) part (viii): a^a], B18], 1-78 
(3) (^o = 778) > (98 = a) PC3: a[8], Bla 
(4) (7a = B) > (28 = a) part (iv) (2,3): aja = 8], B[>a > ^8], 4[^8 > a 
To prove part (xxviii): F} (a= 8) => (^8 10) 

(1) 77a a part (xxii): ala 
(2) E. => s (77a => B) part (ix): a[7^o], la], [8 
(3) (^ B) = (^B > ~a) part (xxvii): alsa], B[8 
(4) (a > 6) (28 = ^a) part (iv) (2,3): ala = 8], 8a = b], Y[^8 = ^a 
To prove part (xxix): o — F B>-7a 

(1) a> ^8 Hyp 
(2) (a 28) + (6 > ~a) part (xxvi): ajo], AIA] 
(3) 8 2 ~a MP (2,1) 
To prove part (xxx): a-—Btr--8-a 

(1) ^a => B Hyp 
(2) (sa = B) > (28 = a) part (xxvii): ala], 8[8] 
(3) -B Sa MP (2,1) 
To prove part (xxxi): o — F 7~B=>-7a 

(1 a B Hyp 
(2) (a 8) > (28. = ~a) part (xxviii): ala], [5] 
(3) 48 > -a MP (2,1) 
To prove part (xxxii): a+ B, ^8 F ~a 

(1) a — B Hyp 
(2) ^8 Hyp 
(3) 2B > ~a part (xxxi) (1): o[a], B[8] 
(4) ^a MP (3,2) 
To prove part (xxxiii) — - (58 1a) > ((58 > a) > B) 

(1) (48 = ~a) = ((^a = 8) = B) part (xix): ala], B8 
(2) (^8 > a) => (7a = B) part (xxvii): a[8], Bla 
(3) ((^a = B) > B) > ((48 = a) = B) part (ix) (2): o[28 = o], ba = £], [8 
(4) (48 > ^a) > ((^B > a) = B) part (iv) (1,3): a[>8 > ^a], [a = B) = 6], yi8 > a) = 8 
To prove part (xxxiv): H (a => B) > (48 + a) > 8) 

(1) (@ > £) > (“8 + a) part (xviii): afa], S18 
(2) (46 > ^a) > (£ > a) = B) part (xxxiii): afa], B[8 
(3) (a= B) => ((^8 = a) = B) part (iv) (1,2): ala = £], 8[^8 > ^o], y[(48 = o) => 8 
To prove part (xxxv): F (a= 8) => ((-a=> B) = B) 

(1) (a= 8) + (28 > ~a) part (xxviii): ala], 212 
(2) (48 > ^a) = (“a = B) = B) part (xix): ala], B/G 
(3) (a= B) = (Ga => B) = B) part (iv) (1,2): ala = B], 8[^8 > ^o], (^a > 8) => B 


To prove part (xxxvi): F (a = B) = (~~a — B) 
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(1) a= ((a — B) = B) part (xiii): a[a], 8[8 
(2) 220a part (xxii): ala 
(3) 7a = ((a => B) = B) part (iv) (2,1): a[^7o], Blo], y[(a > 8) > 8 
(4) (a — B) > (a = B) part (v) (3): a[^7a], Bla = B], l6 
To prove part (xxxvii): F (~~a => f) > (a = b) 

(1) 77a => ((>7a — B) = B) part (xiii): o[^^a], 88 
(2) a 2 ~~a part (xxiii): ala 
(3) a => (~~a => B) = B) part (iv) (2,1): ala], 8[77o], y[(77a — 8) > 8 
(4) (^70 => p) > (a= B) part (v) (3): o[a], B[^7« => 8], [8 
To prove part (xxxviii; a+ 8 F ~~a => £ 

(1) o9 B Hyp 
(2) (a= B) > (a= B) part (xxxvi): alo], 8[8] 
(3) sna => 6 MP (2,1) 
To prove part (xxxix): ~-a >f - a 

(1) =a => B Hyp 
(2) (ma = $) > (a= B) part (xxxvii): alo], 8[5] 
(3 a8 MP (2,1) 
To prove part (xl: ~a F a+ 8 

(1) ^a Hyp 
(2) ~a > (a > 8) part (xv): ala], 8[8] 
(3) 0 2 B MP (2,1) 
To prove part (xli): F o 8 

(1) 8 Hyp 
(2) B => (a => B) PC 1: aff], [o] 
(3) a> 8 MP (2,1) 
To prove part (xlii): — ^(a => 8) F 

(1) (a = 8) Hyp 
(2) ^a => (a ) part (xv): ala}, 8[6] 
(3) “(a> B) a part (xxx) (2): a[»a], Bla = £] 
(4) a MP (3,1) 
To prove part (xlii): |—(o => 8) F ^8 

(1) (a = B) Hyp 
(2) B = (a = B) PC1: a[b], Bla] 
(3) > = B) > B part (xxxi) (2): o[8], Bla = £] 
(4) = MP (3,1) 
To prove part (xliv): F o — (^8 (a => B)) 

(1) a= ((a = B) = B) part (xiii): ala], [8] 
(2) ((a = B) = B) > (^8 => ~la = B)) part (xxviii): ala = £], B[8] 
(3) a => (98 => ~(a > B)) part (iv) (1,2): alo], bia = 8) = 8], Y[^8 > ^(e = 8)] 


To prove part (xlv); a, 48 F -(a => f) 
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(1) a Hyp 
(2) ^8 Hyp 
(3) a > (^8 => ~la = )) part (xliv): abs ], 8[8] 
(4) 28 = ~la — B) MP (3,1) 
(5) (a = B) MP (4,2) 


This completes the proof of Theorem 4.5.7. 
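
As a semantic cross-check (which does not replace the syntactic proofs), assertions of this kind can be tested by truth tables: an unconditional assertion should be a tautology, and the conclusion of a conditional assertion should be true whenever all of its premises are true. The Python sketch below tests a few parts of Theorem 4.5.7 in this way. It assumes the usual truth-table reading of ⇒ and ¬, treats the wff letters as truth-valued variables, and uses hypothetical names throughout.

```python
from itertools import product

def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return ("not", a)

def value(w, tau):
    """Truth value of w when its letters take the truth values in tau."""
    if isinstance(w, str):
        return tau[w]
    if w[0] == "not":
        return not value(w[1], tau)
    return (not value(w[1], tau)) or value(w[2], tau)

def entails(premises, conclusion, letters=("α", "β", "γ")):
    """True if the conclusion holds in every row where all premises hold."""
    for bits in product([False, True], repeat=len(letters)):
        tau = dict(zip(letters, bits))
        if all(value(p, tau) for p in premises) and not value(conclusion, tau):
            return False
    return True

a, b, c = "α", "β", "γ"
PARTS = {   # (premises, conclusion) for selected parts of the theorem
    "iv": ([imp(a, b), imp(b, c)], imp(a, c)),
    "xvii": ([], imp(imp(neg(a), a), a)),
    "xxii": ([], imp(neg(neg(a)), a)),
    "xxviii": ([], imp(imp(a, b), imp(neg(b), neg(a)))),
    "xlii": ([neg(imp(a, b))], a),
    "xlv": ([a, neg(b)], neg(imp(a, b))),
}
for label, (premises, conclusion) in PARTS.items():
    print(label, entails(premises, conclusion))
```

A failed check would indicate a transcription error in the statement of an assertion. A successful check says nothing about whether the given proof lines are correctly justified.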


4.5.8 REMARK: How to prove the trivial theorem “α ⊢ α”.

It is often the most trivial theorems which are the most difficult to prove. Even though an assertion may 
be obviously true, it may not be obvious how to prove that it is true. Knowing that something is true does 
not guarantee that a particular framework for proving assertions can deliver the obviously true proposition 
as an output. 


Theorem 4.5.7 part (xii), could in principle be proved in a single line as follows. 


To prove part (xii): abt a 
(1) a Hyp 


The first line of this proof is the antecedent proposition a as hypothesis. The last line of this proof is the 
desired consequent proposition a. So apparently this does prove the assertion “ œ F o". The practical 
application of this theorem would be to permit any expression a to be introduced into an argument if it has 
appeared earlier in the same argument. But it is not at all obvious that this is not already obvious! Possibly 
such a theorem should be defined as an inference rule, not as a theorem. 


In the proof provided here for Theorem 4.5.7 (xii), the conclusion $\alpha$ follows by application of the almost trivial
assertion $\vdash \alpha \Rightarrow \alpha$, which is Theorem 4.5.7 (xi). However, part (xi) is not really trivial. It is a property of
the implication operator which must be shown from the axioms, and for the axioms in Definition 4.4.3, the
proof is not as simple as one might expect. In fact, here it is shown from Theorem 4.5.7 part (ii), which
states: $\alpha \Rightarrow (\beta \Rightarrow \gamma) \vdash (\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma)$. Thus the pathway from the axioms to the assertion “$\alpha \vdash \alpha$”
is quite complex, and not at all obvious. Hence the three-line proof which is given here for the theorem
“$\alpha \vdash \alpha$” seems to require a complicated proof-pathway to arrive at the inference rule which is apparently
the most obvious of all.


The validity of copying a proposition from one line to another (with or without some substitution) has been
assumed in the logical calculus framework which is presented here. However, to prove “$\alpha \vdash \alpha$” using
such “copying” would be to assume the theorem which is to be proved. So it seems that it cannot be done
because usually a theorem cannot be used to prove itself. If one makes the application of the substitution
rule explicit, one may write the following.


To prove part (xii): $\alpha \vdash \alpha$


(1) $\alpha$    Hyp
(2) $\alpha$    Subst (1): $\alpha[\alpha]$


It is the trivial borderline cases which often provide the impetus to properly formalise the methods of a 
logical or mathematical framework. This is perhaps such a case. 
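
The length of the pathway from axioms to even a trivial assertion can be illustrated with a small proof checker. The following Python sketch is not the proof used in Theorem 4.5.7. It checks the well-known five-line textbook derivation of $\vdash \alpha \Rightarrow \alpha$ from the two schemas $\alpha \Rightarrow (\beta \Rightarrow \alpha)$ and $(\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma))$ together with modus ponens. The tuple encoding of wffs and the names `imp`, `is_pc1`, `is_pc2` and `check` are assumptions made only for this illustration.

```python
def imp(a, b):
    """Build the wff 'a => b' as a nested tuple (hypothetical encoding)."""
    return ("=>", a, b)

def is_imp(w):
    return isinstance(w, tuple) and len(w) == 3 and w[0] == "=>"

def is_pc1(w):
    # Is w an instance of alpha => (beta => alpha)?
    return is_imp(w) and is_imp(w[2]) and w[2][2] == w[1]

def is_pc2(w):
    # Is w an instance of
    # (alpha => (beta => gamma)) => ((alpha => beta) => (alpha => gamma))?
    if not (is_imp(w) and is_imp(w[1]) and is_imp(w[1][2])):
        return False
    a, b, c = w[1][1], w[1][2][1], w[1][2][2]
    return w[2] == imp(imp(a, b), imp(a, c))

def check(proof):
    """Each proof line is (wff, rule, refs); refs are 1-based line numbers.
    For MP the first reference is the implication, the second its antecedent."""
    for i, (w, rule, refs) in enumerate(proof):
        if rule == "PC1":
            ok = is_pc1(w)
        elif rule == "PC2":
            ok = is_pc2(w)
        elif rule == "MP":
            ok = (len(refs) == 2
                  and all(1 <= r <= i for r in refs)
                  and proof[refs[0] - 1][0] == imp(proof[refs[1] - 1][0], w))
        else:
            ok = False
        if not ok:
            return False
    return True

a = "a"
aa = imp(a, a)
proof = [
    (imp(imp(a, imp(aa, a)), imp(imp(a, aa), aa)), "PC2", ()),
    (imp(a, imp(aa, a)), "PC1", ()),
    (imp(imp(a, aa), aa), "MP", (1, 2)),
    (imp(a, aa), "PC1", ()),
    (aa, "MP", (3, 4)),
]
valid = check(proof)
```

Even in this minimal setting, five lines and two carefully chosen substitutions are needed before $\alpha \Rightarrow \alpha$ appears as an output.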


4.5.9 REMARK: Equivalence of a logical implication expression to its contrapositive. 

An implication expression of the form $\neg\beta \Rightarrow \neg\alpha$ is known as the “contrapositive” of the corresponding
implication expression $\alpha \Rightarrow \beta$. Theorem 4.5.7 parts (iii) and (xxxi) demonstrate that the contrapositive
of a logical expression is equivalent to the original (i.e. “positive”) logical expression. (For the definition
of “contrapositive”, see for example Kleene [366], page 13; E. Mendelson [370], page 21. For the “law of
contraposition”, see for example Suppes [394], pages 34, 204-205; Margaris [369], pages 2, 4, 71; Church [348],
pages 119, 121; Kleene [365], page 113; Curry [350], page 287. The equivalence is also called the “principle
of transposition” by Whitehead/Russell, Volume I [400], page 121. It is called both “transposition” and
“contraposition” by Carnap [345], page 32.)


4.5.10 REMARK: Notation for the "burden of hypotheses". 
In the proofs of assertions in Theorem 4.5.7, the dependence of argument-lines on previous argument-lines is 


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 


114 4. Propositional calculus 


not indicated. Any assertion which follows from axioms only is clearly a direct, unconditional consequence of 
the axioms. But assertions which depend on hypotheses are only conditionally proved. An assertion which 
has a non-empty list of propositions “on the left” of the assertion symbol is in effect a deduction rule. Such 
an assertion states that if the propositions on the left occur in an argument, then the proposition “on the 
right” may be proved. In other words, the assertion is a derived rule which may be applied in later proofs, 
not an assertion of a tautology. 


Strictly speaking, one should distinguish between argument-lines which are unconditional assertions of
tautologies and conditional argument lines. In some logical calculus frameworks, this is done systematically.
For example, one might rewrite part (i) of Theorem 4.5.7 as follows.


To prove part (i): $\alpha \vdash \beta \Rightarrow \alpha$


(1) $\alpha$    Hyp    $\dashv$ (1)
(2) $\alpha \Rightarrow (\beta \Rightarrow \alpha)$    PC1: $\alpha[\alpha]$, $\beta[\beta]$    $\dashv$ $\emptyset$
(3) $\beta \Rightarrow \alpha$    MP (2,1)    $\dashv$ (1)


Clearly the proposition “$\alpha$” in line (1) is not a tautology. It is an assertion which follows from itself. Therefore
the “burden of hypotheses” is indicated on the right as line (1). Since line (2) follows from axioms only, its
“burden of hypotheses” is an empty list. However, line (3) depends upon both lines (1) and (2). Therefore
it imports or inherits all of the dependencies of lines (1) and (2). Hence line (3) has the dependency list (1).
Therefore the assertion in line (3) may be written out more fully as “$\alpha \vdash \beta \Rightarrow \alpha$”, which is the proposition
to be proved. One could indicate dependencies more explicitly as follows.


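To prove part (i): $\alpha \vdash \beta \Rightarrow \alpha$


(1) $\alpha \vdash \alpha$    Hyp
(2) $\vdash \alpha \Rightarrow (\beta \Rightarrow \alpha)$    PC1: $\alpha[\alpha]$, $\beta[\beta]$
(3) $\alpha \vdash \beta \Rightarrow \alpha$    MP (2,1)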


This would be somewhat tedious and unnecessary. All of the required information is in the dependency 
lists. A slightly more interesting example is part (iv) of Theorem 4.5.7, which may be written with explicit 
dependencies as follows. 


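To prove part (iv): $\alpha \Rightarrow \beta, \beta \Rightarrow \gamma \vdash \alpha \Rightarrow \gamma$


(1) $\alpha \Rightarrow \beta$    Hyp    $\dashv$ (1)
(2) $\beta \Rightarrow \gamma$    Hyp    $\dashv$ (2)
(3) $\alpha \Rightarrow (\beta \Rightarrow \gamma)$    part (i) (2): $\alpha[\beta \Rightarrow \gamma]$, $\beta[\alpha]$    $\dashv$ (2)
(4) $(\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma)$    part (ii) (3): $\alpha[\alpha]$, $\beta[\beta]$, $\gamma[\gamma]$    $\dashv$ (2)
(5) $\alpha \Rightarrow \gamma$    MP (4,1)    $\dashv$ (1,2)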


Here the dependency list for line (5) is the union of the dependency lists for lines (1) and (4). The explicit 
indication of the accumulated “burden of hypotheses” may seem somewhat unnecessary for propositional 
calculus, but the predicate calculus in Chapter 6 often requires both accumulation and elimination (or 
"discharge") of hypotheses from such dependency lists. Then inspired guesswork must be replaced by a 
systematic discipline. 
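
The bookkeeping of dependency lists in Remark 4.5.10 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `Line`, `hyp` and `derive` are hypothetical, wffs are stored as plain strings, and the validity of each inference is not checked. Only the accumulation of the “burden of hypotheses” as a union of dependency sets is modelled, and the discharge of hypotheses is not.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    number: int
    wff: str
    deps: frozenset  # numbers of the hypothesis lines which this line depends on

def hyp(number, wff):
    # A hypothesis is an assertion which follows from itself.
    return Line(number, wff, frozenset({number}))

def derive(number, wff, *sources):
    # A derived line inherits the union of the dependency lists of its sources.
    # With no sources (an axiom instance), the dependency list is empty.
    deps = frozenset().union(*(s.deps for s in sources))
    return Line(number, wff, deps)

# The argument for "a => b, b => c |- a => c", following the proof of part (iv).
l1 = hyp(1, "a => b")
l2 = hyp(2, "b => c")
l3 = derive(3, "a => (b => c)", l2)          # part (i) applied to line (2)
l4 = derive(4, "(a => b) => (a => c)", l3)   # part (ii) applied to line (3)
l5 = derive(5, "a => c", l4, l1)             # MP (4,1)

axiom_line = derive(2, "a => (b => a)")      # an axiom instance has no burden
```

Here `l5.deps` is the set containing 1 and 2, which corresponds to the dependency list (1,2) shown for line (5).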


4.5.11 REMARK: Comparison of propositional calculus argumentation with computer software libraries. 
The gradual build-up of theorems in an axiomatic system (such as propositional calculus, predicate calculus 
or Zermelo-Fraenkel set theory) is analogous to the way programming procedures are built up in software 
libraries. In both cases, there is an attempt to amass a hoard of re-usable “intellectual capital” which can 
be used in a wide variety of future work. Consequently the work gets progressively easier as “user-friendly” 
theorems (or programming procedures) accumulate over time. 


Accumulation of “theorem libraries" sounds like a good idea in principle, but a single error in a single re-usable 
item (a theorem or a programming procedure) can easily propagate to a very wide range of applications. In 
other words, “bugs” can creep into re-usable libraries. It is for this reason that there is so much emphasis 
on total correctness in mathematical logic. The slightest error could propagate to all of mathematics. 


The development of the propositional calculus is also analogous to “boot-strapping” a computer operating 
system. The propositional calculus is the lowest functional layer of mathematics. Everything else is based
on this substrate. Logic and set theory may be thought of as a kind of “operating system” of mathematics.
Then differential geometry is a “user application” in this “operating system”. 


4.5.12 REMARK: The discovery of interesting theorems by systematic spanning. 

Theorem 4.5.16 follows the usual informal approach to proofs in logic, which is to find proofs for desired
assertions, while building up a small library of useful intermediate assertions along the way. A different
approach would be to generate all possible assertions which can be obtained in n deduction steps with all
possible combinations of applications of the deduction rules. Then n can be gradually increased to discover
all possible theorems. This is analogous to finding the span of a set of vectors in a linear space. However,
such a systematic, exhaustive approach is about as useful as generating the set of all possible chess games
according to the number of moves n. As n increases, the number of possible games increases very rapidly.
But a more serious problem is that the vast majority of such games are worthless. Similarly in logic, the
vast majority of true assertions are uninteresting.
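
The idea of systematic spanning can be made concrete in a heavily simplified setting. The Python sketch below assumes a finite seed set of wffs and applies only modus ponens, round by round. Substitution into axiom schemas, which is what makes the real search space explode, is deliberately left out. The tuple encoding of wffs and the function name `mp_closure` are assumptions made for this illustration.

```python
def mp_closure(seeds, rounds):
    """Return a list of sets: entry n holds every wff obtainable from the
    seeds by at most n rounds of modus ponens."""
    known = set(seeds)
    layers = [set(known)]
    for _ in range(rounds):
        # From p => q and p, conclude q.
        new = {w[2] for w in known
               if isinstance(w, tuple) and w[0] == "=>" and w[1] in known}
        known = known | new
        layers.append(set(known))
    return layers

# Atoms are strings; ("=>", p, q) stands for the implication p => q.
seeds = {
    "a",
    ("=>", "a", "b"),
    ("=>", "b", ("=>", "a", "c")),
}
layers = mp_closure(seeds, 4)
sizes = [len(layer) for layer in layers]
```

For this seed set the span stops growing after three rounds. With schema substitution included, each round would instead add an unbounded supply of new wffs, most of them of no interest.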


4.5.13 REMARK: Choosing the best order for attacking theorems. 

A particular difficulty of propositional calculus theorem proving is knowing the best order in which to attack 
a desired set of assertions. If the order is chosen well, the earlier assertions can be very useful for proving 
later assertions. It is often only after the proofs are found that one can see a better order of attack. 


In practice, one typically has a single target assertion to prove and seeks assertions which can assist with 
this. If this backwards search leads to known results, the target assertion can be proved. The natural order 
of discovery is typically the opposite of the order of deduction of propositions. This is a constant feature 
of mathematics research. Discovery of new results typically starts from the conclusions and works back to 
old results. So results are presented in journal articles and books in the reverse order to the sequence of 
discovery insights. 


4.5.14 REMARK: Propositional calculus is absurdly complex, not a true “calculus”. 

The absurdly complex methods and procedures of propositional calculus argumentation may be contrasted 
with the extreme simplicity of logical calculations using knowledge sets or truth tables. Far from being a 
true calculus, the so-called propositional “calculus” is an arduous, frustratingly indirect set of methods for 
obtaining even the most trivial results. One might reasonably ask why anyone would prefer the argumentative 
method to simple calculations. 
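
For comparison, the “simple calculation” approach can be sketched as a truth-table check. The following Python fragment is an illustration only. The helper names `implies` and `is_tautology` are hypothetical, and the three formulas tested are the assertion $\alpha \Rightarrow \alpha$, the equivalence of an implication to its contrapositive, and the schema $(\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma))$.

```python
from itertools import product

def implies(p, q):
    # Truth function of the implication operator.
    return (not p) or q

def is_tautology(f, nvars):
    # Evaluate f on every row of the truth table for nvars variables.
    return all(f(*row) for row in product([False, True], repeat=nvars))

self_implication = is_tautology(lambda a: implies(a, a), 1)

contraposition = is_tautology(
    lambda a, b: implies(a, b) == implies(not b, not a), 2)

distribution = is_tautology(
    lambda a, b, c: implies(implies(a, implies(b, c)),
                            implies(implies(a, b), implies(a, c))), 3)

# A wff which is not a tautology fails on at least one row.
bare_implication = is_tautology(lambda a, b: implies(a, b), 2)
```

Each check inspects at most eight rows, whereas the corresponding derivations from axioms require chains of earlier theorems.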


This author’s best guess is that the argumentative method is preferred because of the historical origins of 
logic, mathematics and science in the ancient Greek tradition of peripatetic (i.e. ambulatory) philosophy, 
where truth was arrived at through argumentation in preference to experimentation. The great success of 
Euclid’s axiomatisation of geometry inspired axiomatisations of other areas of mathematics and in several 
areas of physics and chemistry, which were also very successful. The reductionist approach whereby the 
“elements” of a subject are discovered and everything else follows automatically by deduction has been 
very successful generally, but this approach must always be combined with experimentation to fine-tune the 
axioms and methods of deduction. In mathematics, however, there is an unfortunate tendency to strongly 
assert the validity of anything which follows from a set of axioms, as if deduction from axioms were the 
golden test of validity. As in the sciences, mathematicians should be more willing to question their golden 
axioms if the consequences are too absurd. 


If a result is questioned on the grounds of absurdity, some people defend the result on the grounds that it 
follows inexorably from axioms, and the axioms are defended on the grounds that if they are not accepted, 
then one loses certain highly desirable results which follow from them. Thus one is offered a package of axioms 
which have mixed results, including both desirable results and undesirable results. The heavy investment in 
the argumentative method yields profits when one is able to make assertions with them. The choice of axioms 
confers the power to decide which results are true and which are false. Perhaps it is this “power over truth 
and falsity” which makes the argumentative method so attractive. The ability to arrive at absurd conclusions 
from minimal assumptions is a power which is difficult to resist, despite the necessary expenditure of effort 
on arduous, frustrating proofs. If one merely wrote out all of the desired tautologies of propositional logic 
as simple calculations from their semantic definitions, they would not have the same power of persuasion as 
a lengthy argument from “first principles”. 


4.5.15 REMARK: Identifying the logical conjunction operator in the propositional calculus. 
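Theorem 4.5.16 establishes that the logical expression formula $\neg(\alpha \Rightarrow \neg\beta)$ has the properties that one would
expect of the conjunction $\alpha \wedge \beta$, or equivalently the list of atomic wff-wffs $\alpha$ and $\beta$.


4.5.16 THEOREM [PC]: Initial assertions for the logical conjunction operator expression.
The following assertions follow from Definition 4.4.3.

(i) $\vdash \neg(\alpha \Rightarrow \neg\beta) \Rightarrow \alpha$.
(ii) $\vdash \neg(\alpha \Rightarrow \neg\beta) \Rightarrow \beta$.
(iii) $\vdash \alpha \Rightarrow (\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta))$.
(iv) $\alpha, \beta \vdash \neg(\alpha \Rightarrow \neg\beta)$.


PROOF: To prove part (i): $\vdash \neg(\alpha \Rightarrow \neg\beta) \Rightarrow \alpha$

(1) $\neg\alpha \Rightarrow (\alpha \Rightarrow \neg\beta)$    Theorem 4.5.7 (xv): $\alpha[\alpha]$, $\beta[\neg\beta]$
(2) $(\neg\alpha \Rightarrow (\alpha \Rightarrow \neg\beta)) \Rightarrow (\neg(\alpha \Rightarrow \neg\beta) \Rightarrow \alpha)$    Theorem 4.5.7 (xxvii): $\alpha[\alpha]$, $\beta[\alpha \Rightarrow \neg\beta]$
(3) $\neg(\alpha \Rightarrow \neg\beta) \Rightarrow \alpha$    MP (2,1)

To prove part (ii): $\vdash \neg(\alpha \Rightarrow \neg\beta) \Rightarrow \beta$

(1) $\neg\beta \Rightarrow (\alpha \Rightarrow \neg\beta)$    PC1: $\alpha[\neg\beta]$, $\beta[\alpha]$
(2) $(\neg\beta \Rightarrow (\alpha \Rightarrow \neg\beta)) \Rightarrow (\neg(\alpha \Rightarrow \neg\beta) \Rightarrow \beta)$    Theorem 4.5.7 (xxvii): $\alpha[\beta]$, $\beta[\alpha \Rightarrow \neg\beta]$
(3) $\neg(\alpha \Rightarrow \neg\beta) \Rightarrow \beta$    MP (2,1)

To prove part (iii): $\vdash \alpha \Rightarrow (\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta))$

(1) $(\alpha \Rightarrow \neg\beta) \Rightarrow (\alpha \Rightarrow \neg\beta)$    Theorem 4.5.7 (xi): $\alpha[\alpha \Rightarrow \neg\beta]$
(2) $\alpha \Rightarrow ((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\beta)$    Theorem 4.5.7 (v) (1): $\alpha[\alpha \Rightarrow \neg\beta]$, $\beta[\alpha]$, $\gamma[\neg\beta]$
(3) $((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\beta) \Rightarrow (\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta))$    Theorem 4.5.7 (xxvi): $\alpha[\alpha \Rightarrow \neg\beta]$, $\beta[\beta]$
(4) $\alpha \Rightarrow (\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta))$    Theorem 4.5.7 (iv) (2,3): $\alpha[\alpha]$, $\beta[(\alpha \Rightarrow \neg\beta) \Rightarrow \neg\beta]$, $\gamma[\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta)]$

To prove part (iv): $\alpha, \beta \vdash \neg(\alpha \Rightarrow \neg\beta)$

(1) $\alpha$    Hyp
(2) $\beta$    Hyp
(3) $\alpha \Rightarrow (\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta))$    part (iii): $\alpha[\alpha]$, $\beta[\beta]$
(4) $\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta)$    MP (3,1)
(5) $\neg(\alpha \Rightarrow \neg\beta)$    MP (4,2)


This completes the proof of Theorem 4.5.16.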


4.6. Further theorems for the implication operator 


4.6.1 REMARK: The purpose of Theorem 4.6.2. 
One purpose of Theorem 4.6.2 is to prove Theorem 4.7.6 (viii), which is equivalent to Theorem 4.6.2 (vi). 
However, the other assertions are applied in other theorems also. 


4.6.2 THEOREM [PC]: Some useful technical assertions for the implication operator. 
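The following assertions follow from the propositional calculus in Definition 4.4.3.


(i) $\vdash (\neg\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow \beta)$.
(ii) $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \beta)$.
(iii) $\alpha \Rightarrow (\beta \Rightarrow \gamma), \gamma \Rightarrow \delta \vdash \alpha \Rightarrow (\beta \Rightarrow \delta)$.
(iv) $\vdash (\neg\alpha \Rightarrow \beta) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow \gamma))$.
(v) $\alpha \Rightarrow (\beta \Rightarrow (\gamma \Rightarrow \delta)) \vdash \alpha \Rightarrow (\gamma \Rightarrow (\beta \Rightarrow \delta))$.
(vi) $\vdash (\alpha \Rightarrow \gamma) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \gamma))$.


PROOF: To prove part (i): $\vdash (\neg\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow \beta)$

(1) $(\neg\beta \Rightarrow \neg\alpha) \Rightarrow ((\neg\beta \Rightarrow \alpha) \Rightarrow \beta)$    Theorem 4.5.7 (xxxiii)
(2) $(\alpha \Rightarrow \beta) \Rightarrow (\neg\beta \Rightarrow \neg\alpha)$    Theorem 4.5.7 (xxviii)
(3) $(\alpha \Rightarrow \beta) \Rightarrow ((\neg\beta \Rightarrow \alpha) \Rightarrow \beta)$    Theorem 4.5.7 (iv) (2,1)
(4) $(\neg\beta \Rightarrow \alpha) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow \beta)$    Theorem 4.5.7 (v) (3)
(5) $(\neg\alpha \Rightarrow \beta) \Rightarrow (\neg\beta \Rightarrow \alpha)$    Theorem 4.5.7 (xxvii)
(6) $(\neg\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow \beta)$    Theorem 4.5.7 (iv) (5,4)

To prove part (ii): $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \beta)$

(1) $(\neg\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow \beta)$    part (i)
(2) $(\alpha \Rightarrow \beta) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \beta)$    Theorem 4.5.7 (v) (1)

To prove part (iii): $\alpha \Rightarrow (\beta \Rightarrow \gamma), \gamma \Rightarrow \delta \vdash \alpha \Rightarrow (\beta \Rightarrow \delta)$

(1) $\alpha \Rightarrow (\beta \Rightarrow \gamma)$    Hyp
(2) $\gamma \Rightarrow \delta$    Hyp
(3) $(\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma))$    PC2
(4) $(\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma)$    MP (3,1)
(5) $(\alpha \Rightarrow \gamma) \Rightarrow ((\gamma \Rightarrow \delta) \Rightarrow (\alpha \Rightarrow \delta))$    Theorem 4.5.7 (vii)
(6) $(\alpha \Rightarrow \beta) \Rightarrow ((\gamma \Rightarrow \delta) \Rightarrow (\alpha \Rightarrow \delta))$    Theorem 4.5.7 (iv) (4,5)
(7) $(\beta \Rightarrow \gamma) \Rightarrow ((\gamma \Rightarrow \delta) \Rightarrow (\beta \Rightarrow \delta))$    Theorem 4.5.7 (vii)
(8) $\alpha \Rightarrow ((\gamma \Rightarrow \delta) \Rightarrow (\beta \Rightarrow \delta))$    Theorem 4.5.7 (iv) (1,7)
(9) $(\gamma \Rightarrow \delta) \Rightarrow (\alpha \Rightarrow (\beta \Rightarrow \delta))$    Theorem 4.5.7 (v) (8)
(10) $\alpha \Rightarrow (\beta \Rightarrow \delta)$    MP (9,2)

To prove part (iv): $\vdash (\neg\alpha \Rightarrow \beta) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow \gamma))$

(1) $(\neg\alpha \Rightarrow \beta) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow (\neg\alpha \Rightarrow \gamma))$    Theorem 4.5.7 (vii)
(2) $(\neg\alpha \Rightarrow \gamma) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow \gamma)$    part (i)
(3) $(\neg\alpha \Rightarrow \beta) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow \gamma))$    part (iii) (1,2)

To prove part (v): $\alpha \Rightarrow (\beta \Rightarrow (\gamma \Rightarrow \delta)) \vdash \alpha \Rightarrow (\gamma \Rightarrow (\beta \Rightarrow \delta))$

(1) $\alpha \Rightarrow (\beta \Rightarrow (\gamma \Rightarrow \delta))$    Hyp
(2) $(\beta \Rightarrow (\gamma \Rightarrow \delta)) \Rightarrow (\gamma \Rightarrow (\beta \Rightarrow \delta))$    Theorem 4.5.7 (xiv)
(3) $\alpha \Rightarrow (\gamma \Rightarrow (\beta \Rightarrow \delta))$    Theorem 4.5.7 (iv) (1,2)

To prove part (vi): $\vdash (\alpha \Rightarrow \gamma) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \gamma))$

(1) $(\neg\alpha \Rightarrow \beta) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow \gamma))$    part (iv)
(2) $(\beta \Rightarrow \gamma) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow \gamma))$    Theorem 4.5.7 (v) (1)
(3) $(\beta \Rightarrow \gamma) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \gamma))$    part (v) (2)
(4) $(\alpha \Rightarrow \gamma) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \gamma))$    Theorem 4.5.7 (v) (3)


This completes the proof of Theorem 4.6.2.


4.6.3 REMARK: Simpler proof of an assertion.
Theorem 4.6.2 (iii) is identical to Theorem 4.6.4 (x), but the proof given for the latter is apparently simpler
and shorter.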


4.6.4 THEOREM [PC]: 


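(i) $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\alpha)$.
(ii) $\alpha \Rightarrow \beta, \alpha \Rightarrow \neg\beta \vdash \neg\alpha$.
(iii) $\alpha \Rightarrow \beta, \neg\alpha \Rightarrow \beta \vdash \beta$.
(iv) $\beta, \alpha \Rightarrow \neg\beta \vdash \neg\alpha$.
(v) $\neg\beta, \alpha \Rightarrow \beta \vdash \neg\alpha$.
(vi) $\beta, \neg\alpha \Rightarrow \neg\beta \vdash \alpha$.
(vii) $\neg\beta, \neg\alpha \Rightarrow \beta \vdash \alpha$.
(viii) $\alpha \Rightarrow (\beta \Rightarrow \gamma), \beta \vdash \alpha \Rightarrow \gamma$.
(ix) $\alpha \Rightarrow \beta, \gamma, \beta \Rightarrow (\gamma \Rightarrow \delta) \vdash \alpha \Rightarrow \delta$.
(x) $\alpha \Rightarrow (\beta \Rightarrow \gamma), \gamma \Rightarrow \delta \vdash \alpha \Rightarrow (\beta \Rightarrow \delta)$.
(xi) $\vdash (\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\delta \Rightarrow \alpha) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma)))$.
(xii) $\alpha \Rightarrow (\beta \Rightarrow \gamma) \vdash (\delta \Rightarrow \alpha) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma))$.


PROOF: To prove part (i): $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\alpha)$

(1) $(\alpha \Rightarrow \beta) \Rightarrow (\neg\beta \Rightarrow \neg\alpha)$    Theorem 4.5.7 (xxviii)
(2) $(\neg\beta \Rightarrow \neg\alpha) \Rightarrow ((\beta \Rightarrow \neg\alpha) \Rightarrow \neg\alpha)$    Theorem 4.6.2 (i)
(3) $(\alpha \Rightarrow \beta) \Rightarrow ((\beta \Rightarrow \neg\alpha) \Rightarrow \neg\alpha)$    Theorem 4.5.7 (iv) (1,2)
(4) $(\beta \Rightarrow \neg\alpha) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow \neg\alpha)$    Theorem 4.5.7 (v) (3)
(5) $(\alpha \Rightarrow \neg\beta) \Rightarrow (\beta \Rightarrow \neg\alpha)$    Theorem 4.5.7 (xxvi)
(6) $(\alpha \Rightarrow \neg\beta) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow \neg\alpha)$    Theorem 4.5.7 (iv) (5,4)
(7) $(\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\alpha)$    Theorem 4.5.7 (v) (6)

To prove part (ii): $\alpha \Rightarrow \beta, \alpha \Rightarrow \neg\beta \vdash \neg\alpha$

(1) $\alpha \Rightarrow \beta$    Hyp
(2) $\alpha \Rightarrow \neg\beta$    Hyp
(3) $(\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\alpha)$    part (i)
(4) $(\alpha \Rightarrow \neg\beta) \Rightarrow \neg\alpha$    MP (3,1)
(5) $\neg\alpha$    MP (4,2)

To prove part (iii): $\alpha \Rightarrow \beta, \neg\alpha \Rightarrow \beta \vdash \beta$

(1) $\alpha \Rightarrow \beta$    Hyp
(2) $\neg\alpha \Rightarrow \beta$    Hyp
(3) $(\alpha \Rightarrow \beta) \Rightarrow ((\neg\alpha \Rightarrow \beta) \Rightarrow \beta)$    Theorem 4.6.2 (ii)
(4) $(\neg\alpha \Rightarrow \beta) \Rightarrow \beta$    MP (3,1)
(5) $\beta$    MP (4,2)

To prove part (iv): $\beta, \alpha \Rightarrow \neg\beta \vdash \neg\alpha$

(1) $\beta$    Hyp
(2) $\alpha \Rightarrow \neg\beta$    Hyp
(3) $(\alpha \Rightarrow \neg\beta) \Rightarrow (\beta \Rightarrow \neg\alpha)$    Theorem 4.5.7 (xxvi)
(4) $\beta \Rightarrow \neg\alpha$    MP (3,2)
(5) $\neg\alpha$    MP (4,1)

To prove part (v): $\neg\beta, \alpha \Rightarrow \beta \vdash \neg\alpha$

(1) $\neg\beta$    Hyp
(2) $\alpha \Rightarrow \beta$    Hyp
(3) $\neg\beta \Rightarrow \neg\alpha$    Theorem 4.5.7 (xxxi) (2)
(4) $\neg\alpha$    MP (3,1)

To prove part (vi): $\beta, \neg\alpha \Rightarrow \neg\beta \vdash \alpha$

(1) $\beta$    Hyp
(2) $\neg\alpha \Rightarrow \neg\beta$    Hyp
(3) $\neg\neg\alpha$    part (iv)
(4) $\neg\neg\alpha \Rightarrow \alpha$    Theorem 4.5.7 (xxii)
(5) $\alpha$    MP (4,3)

To prove part (vii): $\neg\beta, \neg\alpha \Rightarrow \beta \vdash \alpha$

(1) $\neg\beta$    Hyp
(2) $\neg\alpha \Rightarrow \beta$    Hyp
(3) $\neg\neg\alpha$    part (v)
(4) $\neg\neg\alpha \Rightarrow \alpha$    Theorem 4.5.7 (xxii)
(5) $\alpha$    MP (4,3)

To prove part (viii): $\alpha \Rightarrow (\beta \Rightarrow \gamma), \beta \vdash \alpha \Rightarrow \gamma$

(1) $\alpha \Rightarrow (\beta \Rightarrow \gamma)$    Hyp
(2) $\beta$    Hyp
(3) $(\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow (\beta \Rightarrow (\alpha \Rightarrow \gamma))$    Theorem 4.5.7 (xiv)
(4) $\beta \Rightarrow (\alpha \Rightarrow \gamma)$    MP (1,3)
(5) $\alpha \Rightarrow \gamma$    MP (2,4)

To prove part (ix): $\alpha \Rightarrow \beta, \gamma, \beta \Rightarrow (\gamma \Rightarrow \delta) \vdash \alpha \Rightarrow \delta$

(1) $\alpha \Rightarrow \beta$    Hyp
(2) $\gamma$    Hyp
(3) $\beta \Rightarrow (\gamma \Rightarrow \delta)$    Hyp
(4) $\alpha \Rightarrow (\gamma \Rightarrow \delta)$    Theorem 4.5.7 (iv) (1,3)
(5) $\alpha \Rightarrow \delta$    part (viii) (4,2)

To prove part (x): $\alpha \Rightarrow (\beta \Rightarrow \gamma), \gamma \Rightarrow \delta \vdash \alpha \Rightarrow (\beta \Rightarrow \delta)$

(1) $\alpha \Rightarrow (\beta \Rightarrow \gamma)$    Hyp
(2) $\gamma \Rightarrow \delta$    Hyp
(3) $(\beta \Rightarrow \gamma) \Rightarrow ((\gamma \Rightarrow \delta) \Rightarrow (\beta \Rightarrow \delta))$    Theorem 4.5.7 (vii)
(4) $\alpha \Rightarrow (\beta \Rightarrow \delta)$    part (ix) (1,2,3)

To prove part (xi): $\vdash (\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\delta \Rightarrow \alpha) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma)))$

(1) $(\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\delta \Rightarrow \alpha) \Rightarrow (\delta \Rightarrow (\beta \Rightarrow \gamma)))$    Theorem 4.5.7 (vi)
(2) $(\delta \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma))$    PC2
(3) $(\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\delta \Rightarrow \alpha) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma)))$    part (x) (1,2)

To prove part (xii): $\alpha \Rightarrow (\beta \Rightarrow \gamma) \vdash (\delta \Rightarrow \alpha) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma))$

(1) $\alpha \Rightarrow (\beta \Rightarrow \gamma)$    Hyp
(2) $(\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\delta \Rightarrow \alpha) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma)))$    part (xi)
(3) $(\delta \Rightarrow \alpha) \Rightarrow ((\delta \Rightarrow \beta) \Rightarrow (\delta \Rightarrow \gamma))$    MP (1,2)


This completes the proof of Theorem 4.6.4.


4.7. Theorems for other logical operators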


4.7.1 REMARK: Definition of non-basic logical operators. 

The implication-based propositional calculus introduced in Section 4.4 offers two logical operators, $\Rightarrow$ and $\neg$.
Of the five logical connectives in Notation 3.6.2, the two connectives $\Rightarrow$ and $\neg$ are undefined in the axiom
system described in Definition 4.4.3 because they are primitive connectives. (All of the operators are fully
defined in the semantic context, but the symbols-only language context defines only manipulation rules, not
meaning.) The other three connectives are defined in terms of $\Rightarrow$ and $\neg$ in Definition 4.7.2.


4.7.2 DEFINITION [MM]: 


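(i) $\alpha \vee \beta$ means $(\neg\alpha) \Rightarrow \beta$ for any wffs $\alpha$ and $\beta$.
(ii) $\alpha \wedge \beta$ means $\neg(\alpha \Rightarrow \neg\beta)$ for any wffs $\alpha$ and $\beta$.
(iii) $\alpha \Leftrightarrow \beta$ means $(\alpha \Rightarrow \beta) \wedge (\beta \Rightarrow \alpha)$ for any wffs $\alpha$ and $\beta$.
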
4.7.3 REMARK: Logical operators which are of less importance. 
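The NOR ($\downarrow$), NAND ($\uparrow$), XOR ($\triangle$) and implied-by ($\Leftarrow$) operators may be defined in a similar manner
to Definition 4.7.2 as follows. They are not presented in propositional logic theorems here because their
properties follow easily from properties of the other operators.
(1) $\alpha \downarrow \beta$ means $\neg(\alpha \vee \beta)$ for any wffs $\alpha$ and $\beta$.
(2) $\alpha \uparrow \beta$ means $\neg(\alpha \wedge \beta)$ for any wffs $\alpha$ and $\beta$.
(3) $\alpha \triangle \beta$ means $\neg(\alpha \Leftrightarrow \beta)$ for any wffs $\alpha$ and $\beta$.
(4) $\alpha \Leftarrow \beta$ means $\beta \Rightarrow \alpha$ for any wffs $\alpha$ and $\beta$.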
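
That the definitions in Definition 4.7.2 and Remark 4.7.3 yield the expected truth functions can be confirmed by direct calculation. The Python sketch below is an illustration only. It assumes the usual truth functions for implication and negation, and the function names are hypothetical. Each derived operator is built from `imp` and `neg` exactly as in the definitions, then compared with the native Boolean operation on all four rows of the truth table.

```python
from itertools import product

def imp(p, q):
    # Primitive implication operator.
    return (not p) or q

def neg(p):
    # Primitive negation operator.
    return not p

def or_(a, b):         # Definition 4.7.2 (i): (not a) => b
    return imp(neg(a), b)

def and_(a, b):        # Definition 4.7.2 (ii): not (a => not b)
    return neg(imp(a, neg(b)))

def iff(a, b):         # Definition 4.7.2 (iii): (a => b) and (b => a)
    return and_(imp(a, b), imp(b, a))

def nor(a, b):         # Remark 4.7.3 (1)
    return neg(or_(a, b))

def nand(a, b):        # Remark 4.7.3 (2)
    return neg(and_(a, b))

def xor(a, b):         # Remark 4.7.3 (3)
    return neg(iff(a, b))

def implied_by(a, b):  # Remark 4.7.3 (4)
    return imp(b, a)

rows = list(product([False, True], repeat=2))
checks = {
    "or": all(or_(a, b) == (a or b) for a, b in rows),
    "and": all(and_(a, b) == (a and b) for a, b in rows),
    "iff": all(iff(a, b) == (a == b) for a, b in rows),
    "nor": all(nor(a, b) == (not (a or b)) for a, b in rows),
    "nand": all(nand(a, b) == (not (a and b)) for a, b in rows),
    "xor": all(xor(a, b) == (a != b) for a, b in rows),
    "implied_by": all(implied_by(a, b) == (a or not b) for a, b in rows),
}
```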


4.7.4 REMARK: Arguments for and against a minimal set of basic logical operators. 

Definition 4.7.2 is not how logical connectives are defined in the real world. It just happens that there is a lot 
of redundancy among the operators in Notation 3.6.2. So it is possible to define the full set of operators in 
terms of a proper subset. Defining the operators in terms of a minimal set of operators is part of a minimalist 
mode of thinking which is not necessarily useful or helpful. 


Reductionism has been enormously successful in the natural sciences in the last couple of centuries. But 
minimalism is not the same thing as reductionism. Reductionism recursively reduces complex systems to 
fundamental principles and synthesises entire systems from these simpler principles. (E.g. Solar System 
dynamics can be synthesised from Newton's laws.) However, it cannot be said that the operators $\Rightarrow$ and $\neg$
are more “fundamental” than the operators $\wedge$ and $\vee$. The best way to think of the basic logical connectives
is as a network of operators which are closely related to each other. 


In many contexts, the set of three operators $\wedge$, $\vee$ and $\neg$ is preferred as the “fundamental” operator set.
(For example, there are useful decompositions of all truth functions into “disjunctive normal form” and
“conjunctive normal form”.) In the context of propositional calculus, the $\Rightarrow$ operator is more “fundamental”
because it is the basis of the modus ponens inference rule. But the modus ponens rule could be replaced by 
an equivalent set of rules which use one or more different logical connectives. 


A propositional calculus based on the single NAND operator with a single axiom and modus ponens is 
possible, but it requires a lot of work for nothing. The NAND operator is in no way “the fundamental 
operator” underlying all other operators. It is minimal, not fundamental. 
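
The sense in which NAND is “minimal” can be checked by a short calculation. The Python sketch below is illustrative only and says nothing about the NAND-based axiom system itself. It merely confirms, by truth tables, the standard fact that negation, conjunction, disjunction and implication can each be expressed using NAND alone. The function names are hypothetical.

```python
from itertools import product

def nand(p, q):
    return not (p and q)

def not_(p):
    return nand(p, p)

def and_(p, q):
    return nand(nand(p, q), nand(p, q))

def or_(p, q):
    return nand(nand(p, p), nand(q, q))

def imp(p, q):
    # p => q is the same truth function as NAND(p, not q).
    return nand(p, nand(q, q))

rows = list(product([False, True], repeat=2))
nand_is_sufficient = all(
    not_(p) == (not p)
    and and_(p, q) == (p and q)
    and or_(p, q) == (p or q)
    and imp(p, q) == ((not p) or q)
    for p, q in rows
)
```

The expressions grow quickly in size, which is consistent with the remark that such a reduction costs a lot of work for no gain in understanding.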


4.7.5 REMARK: Demonstrating a larger set of natural axioms for non-basic logical operators. 
The ten assertions of Theorem 4.7.6 are the same as the axiom schemas of the propositional calculus described 
by E. Mendelson [370], page 40. 


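4.7.6 THEOREM [PC]: Proof of 10 axioms for an alternative propositional calculus with $\Rightarrow$, $\wedge$, $\vee$ and $\neg$.
The following assertions for wffs $\alpha$, $\beta$ and $\gamma$ follow from the propositional calculus in Definition 4.4.3.


(i) $\vdash \alpha \Rightarrow (\beta \Rightarrow \alpha)$.
(ii) $\vdash (\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma))$.
(iii) $\vdash (\alpha \wedge \beta) \Rightarrow \alpha$.
(iv) $\vdash (\alpha \wedge \beta) \Rightarrow \beta$.
(v) $\vdash \alpha \Rightarrow (\beta \Rightarrow (\alpha \wedge \beta))$.
(vi) $\vdash \alpha \Rightarrow (\alpha \vee \beta)$.
(vii) $\vdash \beta \Rightarrow (\alpha \vee \beta)$.
(viii) $\vdash (\alpha \Rightarrow \gamma) \Rightarrow ((\beta \Rightarrow \gamma) \Rightarrow ((\alpha \vee \beta) \Rightarrow \gamma))$.
(ix) $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\alpha)$.
(x) $\vdash \neg\neg\alpha \Rightarrow \alpha$.


PROOF: Parts (i) and (ii) are identical to axiom schemas PC1 and PC2 respectively in Definition 4.4.3.

Part (iii) is the same as $\vdash \neg(\alpha \Rightarrow \neg\beta) \Rightarrow \alpha$ by Definition 4.7.2 (ii). But this assertion is identical to
Theorem 4.5.16 (i). Similarly, part (iv) is the same as $\vdash \neg(\alpha \Rightarrow \neg\beta) \Rightarrow \beta$ by Definition 4.7.2 (ii), and this
assertion is identical to Theorem 4.5.16 (ii).

To prove part (v): $\vdash \alpha \Rightarrow (\beta \Rightarrow (\alpha \wedge \beta))$

(1) $(\alpha \Rightarrow \neg\beta) \Rightarrow (\alpha \Rightarrow \neg\beta)$    Theorem 4.5.7 (xi)
(2) $\alpha \Rightarrow ((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\beta)$    Theorem 4.5.7 (v) (1)
(3) $((\alpha \Rightarrow \neg\beta) \Rightarrow \neg\beta) \Rightarrow (\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta))$    Theorem 4.5.7 (xxvi)
(4) $\alpha \Rightarrow (\beta \Rightarrow \neg(\alpha \Rightarrow \neg\beta))$    Theorem 4.5.7 (iv) (2,3)
(5) $\alpha \Rightarrow (\beta \Rightarrow (\alpha \wedge \beta))$    Definition 4.7.2 (ii) (4)

To prove part (vi): $\vdash \alpha \Rightarrow (\alpha \vee \beta)$

(1) $\alpha \Rightarrow (\neg\alpha \Rightarrow \beta)$    Theorem 4.5.7 (xvi)
(2) $\alpha \Rightarrow (\alpha \vee \beta)$    Definition 4.7.2 (i) (1)

To prove part (vii): $\vdash \beta \Rightarrow (\alpha \vee \beta)$

(1) $\beta \Rightarrow (\neg\alpha \Rightarrow \beta)$    PC1
(2) $\beta \Rightarrow (\alpha \vee \beta)$    Definition 4.7.2 (i) (1)

Part (viii) is identical to Theorem 4.6.2 (vi).
Part (ix) is identical to Theorem 4.6.4 (i).
Part (x) is identical to Theorem 4.5.7 (xxii).
This completes the proof of Theorem 4.7.6.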


4.7.7 REMARK: Joining and splitting tautologies to and from logical expressions. 

If a wff $\alpha$ is a tautology, it may be combined in a conjunction with any wff $\beta$ without altering the truth or
falsity of the enclosing wff. Thus if $\beta$ is a sub-wff of a wff, it may be replaced with the combined wff $(\alpha \wedge \beta)$.
This fact follows from Theorem 4.7.6 (v) and the theorem $\alpha \vdash (\alpha \wedge \beta) \Rightarrow \beta$. Conversely, a tautology may
be removed from a conjunction. Thus $\alpha$ may be substituted for $\alpha \wedge \beta$ if $\beta$ is a tautology.


4.7.8 REMARK: Joining and splitting contradictions to and from logical expressions. 

There is a corresponding observation to Remark 4.7.7 in terms of a contradiction instead of a tautology. (See
Definition 3.13.2 (i) for contradictions.) A contradiction $\beta$ can be combined in a disjunction with any proposition $\alpha$
without altering its truth or falsity. Thus the combined proposition $\alpha \vee \beta$ is equivalent to $\alpha$ for any proposition $\alpha$
and contradiction $\beta$.
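
The two splitting observations, namely that a tautology may be removed from a conjunction and that a contradiction may be removed from a disjunction, can be verified by a short truth-table calculation. The Python sketch below is an illustration only. It assumes the particular tautology $\beta \vee \neg\beta$ and the particular contradiction $\beta \wedge \neg\beta$ as representatives, and the variable names are hypothetical.

```python
from itertools import product

rows = list(product([False, True], repeat=2))

def tautology(b):
    # A representative tautology: b or not b.
    return b or (not b)

def contradiction(b):
    # A representative contradiction: b and not b.
    return b and (not b)

# alpha and (tautology) has the same truth value as alpha on every row.
tautology_drops_from_conjunction = all(
    (a and tautology(b)) == a for a, b in rows)

# alpha or (contradiction) has the same truth value as alpha on every row.
contradiction_drops_from_disjunction = all(
    (a or contradiction(b)) == a for a, b in rows)
```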


4.7.9 THEOREM [PC]: More assertions for propositional calculus with the operators $\Rightarrow$, $\neg$, $\wedge$, $\vee$ and $\Leftrightarrow$.
The following assertions for wffs $\alpha$, $\beta$ and $\gamma$ follow from the propositional calculus in Definition 4.4.3.


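(i) $\alpha, \beta \vdash \alpha \wedge \beta$.
(ii) $\alpha \Rightarrow \beta, \beta \Rightarrow \alpha \vdash \alpha \Leftrightarrow \beta$.
(iii) $\vdash \alpha \Leftrightarrow \alpha$.
(iv) $\alpha, \beta \vdash \alpha \Leftrightarrow \beta$.
(v) $\vdash (\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Leftrightarrow (\beta \Rightarrow (\alpha \Rightarrow \gamma))$.
(vi) $\vdash (\alpha \Rightarrow \neg\beta) \Leftrightarrow (\beta \Rightarrow \neg\alpha)$.
a V B) = ((7a) = B).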
a) = ((a V B) = B). 
(a = 28)) > (a ^ B). 


aI 
= 


- (=la ^ 8)) > (a = ^8). 


^ B F- a. 


(xx) aS B,a ey F- Bem. 


VB,a-vy,B-",t». 
—y,B-—*t(avwBB)w. 
oVa)-a. 

aVa) sa. 

a — (a ^ a). 

Q0 ^a)-«ea. 

a ^ B) => ((Bo» 7) > m. 


(a ^ B) > 7) > (a> (8 > 7)). 
a> (B => )) > ((a A 8)  »). 


= (B-«)tr(a^8B)o. 
(a ^ B)


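(li) $\vdash ((\alpha \wedge \beta) \Rightarrow \gamma) \Leftrightarrow (\beta \Rightarrow (\alpha \Rightarrow \gamma))$.
(lii) $\vdash ((\alpha \wedge \beta) \Rightarrow \gamma) \Leftrightarrow ((\beta \wedge \alpha) \Rightarrow \gamma)$.
(liii) $\vdash ((\alpha \wedge \beta) \wedge \gamma) \Leftrightarrow ((\beta \wedge \alpha) \wedge \gamma)$.
(liv) $\vdash ((\alpha \wedge \beta) \Rightarrow \neg\gamma) \Leftrightarrow ((\alpha \wedge \gamma) \Rightarrow \neg\beta)$.
(lv) $\vdash ((\alpha \wedge \beta) \wedge \gamma) \Leftrightarrow ((\alpha \wedge \gamma) \wedge \beta)$.
(lvi) $\vdash (\alpha \wedge (\beta \wedge \gamma)) \Leftrightarrow ((\alpha \wedge \beta) \wedge \gamma)$.
(lvii) $\vdash (\alpha \vee (\beta \vee \gamma)) \Rightarrow (\beta \vee (\alpha \vee \gamma))$.
(lviii) $\vdash (\alpha \vee (\beta \vee \gamma)) \Leftrightarrow (\beta \vee (\alpha \vee \gamma))$.
(lix) $\vdash (\alpha \vee (\beta \vee \gamma)) \Leftrightarrow ((\alpha \vee \beta) \vee \gamma)$.
(lx) $\alpha \Rightarrow \beta \vdash \neg\alpha \vee \beta$.
(lxi) $\neg\alpha \vee \beta \vdash \alpha \Rightarrow \beta$.
(lxii) $\alpha \Rightarrow \beta \dashv\vdash \neg\alpha \vee \beta$.
(lxiii) $\alpha \vee \beta \vdash \neg\beta \Rightarrow \alpha$.
(lxiv) $\neg\beta \Rightarrow \alpha \vdash \alpha \vee \beta$.
(lxv) $\neg\beta \Rightarrow \alpha \dashv\vdash \alpha \vee \beta$.
(lxvi) $\neg(\alpha \Rightarrow \beta) \vdash \alpha \wedge \neg\beta$.
(lxvii) $\alpha \wedge \neg\beta \vdash \neg(\alpha \Rightarrow \beta)$.
(lxviii) $\neg(\alpha \Rightarrow \beta) \dashv\vdash \alpha \wedge \neg\beta$.
(lxix) $\neg\alpha \vdash \neg(\alpha \wedge \beta)$.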
(Ixx) o 8 F (a ^f) 
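(lxxi) $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\alpha \wedge \beta) \Rightarrow \alpha)$.
(lxxii) $\vdash (\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow (\alpha \wedge \beta))$.
(lxxiii) $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\alpha \Rightarrow \gamma) \Rightarrow (\alpha \Rightarrow (\beta \wedge \gamma)))$.
(lxxiv) $\alpha \Rightarrow \beta, \alpha \Rightarrow \gamma \vdash \alpha \Rightarrow (\beta \wedge \gamma)$.
(lxxv) $\alpha \Rightarrow (\beta \Rightarrow \gamma), \alpha \Rightarrow (\gamma \Rightarrow \beta) \vdash \alpha \Rightarrow (\beta \Leftrightarrow \gamma)$.
(lxxvi) $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\alpha \wedge \beta) \Leftrightarrow \alpha)$.
(lxxvii) $\alpha \Rightarrow \beta \vdash (\gamma \vee \alpha) \Rightarrow (\gamma \vee \beta)$.
(lxxviii) $\alpha \Rightarrow \beta \vdash (\gamma \wedge \alpha) \Rightarrow (\gamma \wedge \beta)$.
(lxxix) $\vdash (\alpha \Rightarrow \beta) \Rightarrow ((\gamma \wedge \alpha) \Rightarrow (\gamma \wedge \beta))$.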
(box) F (a (85 3) => (e ^ B) = (ay). 
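(lxxxi) $\vdash (\alpha \vee (\beta \wedge \gamma)) \Rightarrow ((\alpha \vee \beta) \wedge (\alpha \vee \gamma))$.
(lxxxii) $\vdash ((\alpha \vee \beta) \wedge (\alpha \vee \gamma)) \Rightarrow (\alpha \vee (\beta \wedge \gamma))$.
(lxxxiii) $\vdash (\alpha \vee (\beta \wedge \gamma)) \Leftrightarrow ((\alpha \vee \beta) \wedge (\alpha \vee \gamma))$.
(lxxxiv) $\vdash (\alpha \wedge (\beta \vee \gamma)) \Rightarrow ((\alpha \wedge \beta) \vee (\alpha \wedge \gamma))$.
(lxxxv) $\vdash ((\alpha \wedge \beta) \vee (\alpha \wedge \gamma)) \Rightarrow (\alpha \wedge (\beta \vee \gamma))$.
(lxxxvi) $\vdash (\alpha \wedge (\beta \vee \gamma)) \Leftrightarrow ((\alpha \wedge \beta) \vee (\alpha \wedge \gamma))$.
(lxxxvii) $\vdash (\alpha \vee (\alpha \wedge \beta)) \Leftrightarrow \alpha$.
(lxxxviii) $\vdash (\alpha \wedge (\alpha \vee \beta)) \Leftrightarrow \alpha$.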
PROOF: To prove part (i): $\alpha, \beta \vdash \alpha \wedge \beta$
(1) $\alpha$    Hyp
(2) $\beta$    Hyp
(3) $\alpha \Rightarrow (\beta \Rightarrow (\alpha \wedge \beta))$    Theorem 4.7.6 (v)
(4) $\beta \Rightarrow (\alpha \wedge \beta)$    MP (3,1)
(5) $\alpha \wedge \beta$    MP (4,2)
To prove part (ii): a


lha=B 
)82a 
(3) (a= B) ^ 
) 


4) aep 


(8 => 


To prove part (ii): 
(1) aoa 
2 asa 


To prove part (iv): 




a) 


B, B 


Fasa 


arkaseB 


a,BrFaeBp 


1) (a > (8 — 9) 


3) (a> (8 
To prove part (vi): 


1) (a = 28) = (8 


2) (B > ^a) > (a 


( 

( 
3) (a > ^B) e (8 

) 


To prove part (vii): 


1) (œa) > 8) > (( 
2) (a v B) = (a) 
To prove part (viii): 
1) (a V B) = ((7) 
2) (^a) = ((a v B) 
To prove part (ix): 


1) =a => -8)) 


(a 


( 
(2) Cla => 28) = ( 


) 
To prove part (x); (^ 
)2 (a AB) 


(1) (a+ 8) 
(2) Gla ^ 8) > 
To prove part (xi): 
(1) a ^B 

(2) (aA B)» 
(3) a 


To prove part (xii): 


a ^ B)


(a = =$) 


anf Fa 


aNBr B 


Hyp 

Hyp 

part (i) (1,2) 
Definition 4.7.2 (iii) (3) 


Theorem 4.5.7 (xi) 
part (i) (1,1) 


Hyp 
Hyp 
4.5.7 (1) (2) 
Theorem 4.5.7 (i) (1) 


part (ii) (3,4) 


Theorem 


Theorem 4.5.7 (xiv 


xiv 
Theorem 4.5.7 (xiv) 
part (ii) 


Theorem 4.5. 


Theorem 4.5.7 (xxvi) 


Theorem 4.5.7 (v) (1) 


Theorem 
Definition 4.7.2 "m (1) 


part (ix) 
Theorem 4.5.7 (xxx) (1) 


Hyp 
Theorem 4.7.6 (iii) 
MP (2,1)


(1) aA B 

(2) (aA B) 2 B 

(3) B 

To prove part (xiii): a - ov 
(1) a 

(2) a= (a V B) 

(3) av B 

To prove part (xiv): 6 Fav 6 
(1) B 

(2) B = (a V B) 

(3) av B 

To prove part (xv) «€ 8 F- a B 
(1) asb 

(2) (a= B) ^ (B= o) 

(3) a B 

To prove part (xvi): aß - Ba 
(1 asg 

(2) (a= 8) ^ (8 = o) 

(3) BS a 


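To prove part (xvii): $\alpha \Leftrightarrow \beta \vdash \beta \Leftrightarrow \alpha$


To prove part (xviii): $\alpha \Leftrightarrow \beta, \beta \Leftrightarrow \gamma \vdash \alpha \Leftrightarrow \gamma$


To prove part (xix): $\alpha \Leftrightarrow \gamma, \beta \Leftrightarrow \gamma \vdash \alpha \Leftrightarrow \beta$


To prove part (xx): $\alpha \Leftrightarrow \beta, \alpha \Leftrightarrow \gamma \vdash \beta \Leftrightarrow \gamma$
Hyp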
Theorem 4.7.6 (iv) 
MP (2,1) 


Hyp 
Theorem 4.7.6 (vi) 
MP (2,1) 


Hyp 
Theorem 4.7.6 (vii) 
MP (2,1) 


Hyp 
Definition 4.7.2 (iii) (1) 
part (xi) (2) 


Hyp 
Definition 4.7.2 (iii) (1) 
part (xii) (2) 


Hyp 

Hyp 

part (xv) (1) 

part (xv) (2) 

Theorem 4.5.7 (iv) (3,4) 
part (xvi) (1) 

part (xvi) (2) 

) 
) 


Theorem 4.5.7 (iv) (7, 
part (i) (5 


2 
6 
8 
Hyp 
Hyp 


part (xvii) (2) 
part (xviii) (1,3)
To prove part (xxiii):


(1 
(2 
(3 
(4 


2B 
(7a) € (78) 


a € 


1-3 
) 
) 
) 


To prove part (xxiv): 


(1) ~a 
(2) asa 


To prove part (xxv): 


(a2 y)-— 
(637) => 
(ao y)e 




a € B F ^a & 3f 


—-2a € 86 t-aef 


~a, np rasp 


2a F-oea 


asb tr(y2o)e(y- B) 


o € B F (a y) & (B — v). 


(8 2 v) 
(a = 7) 
(8 = 7) 


To prove part (xxvii): 


$\alpha \Leftrightarrow \beta \vdash (\gamma \vee \alpha) \Leftrightarrow (\gamma \vee \beta)$.
Hyp

Hyp 

part (xvii) (1) 
part (xviii) (3,2) 


Hyp 

part (xv) (1) 

part (xvi) (1) 

Theorem 4.5.7 (xxxi) (3) 
Theorem 4.5.7 (xxxi) (2) 
part (i) (4,5) 


xv) (1) 

part = ) (1) 
Theorem 4.5.7 m (3) 
) 

) 


Hyp 
Hyp 
part (iv) (1,2) 
part (xxii) (3) 


Hyp 
part (xxiii) (1,1) 


Hyp 

part (xv) (1) 

part (xvi) (1) 
Theorem 4.5.7 (viii) (2) 
Theorem 4.5.7 (viii) (3) 
part (ii) (4,5) 


Hyp 

part (xv) (1) 

part (xvi) (1) 
Theorem 4.5.7 (ix) (3) 
Theorem 4.5.7 (ix) (2) 
part (ii) (4,5)


(1) aes 
(2) (^y > a) e (y = B) 
(3) (y V a) e (7 V B) 


To prove part (xxviii): 


e 


o € B F (a V y) & (B v 9). 
a € p 

aa € 3B 

(^a => y) e (“6 > 7) 

(a V y) € (B V 9) 


To prove part (xxix): 


2 
3 
4 


) 
) 
) 
) 


a € B - (y ^o) & (v^ f). 


(y = 70) e 2( > 78) 


o € B F- (a ^v) & (B ^). 


(a > —y) € (8 => 79) 

(a = oy) e a(8 => 79) 

(a ^ y) € (B ^ 9) 

To prove part (xxxi): - (a V B) — 
1) Ca > B) > ($ > a) 

2) (a V B) = (B V a) 
To prove part (xxxii): 
(1) a V B 

2) (a V B) 
3 Va 

To prove part (xxxiii): 
1) (a V B) > (8 V a) 
2) (B V a) > (a V B) 
3) (a V B) € (B V a) 


To prove part (xxxiv): 


(8 V o) 


aovB-6va 


(8 V o) 


Flav B) &(B Va) 


F(aA B)> 
1) (8 — ~a) > (a = =p) 

2) (a = =$) > -(8 = ~a) 

3) (a ^ B) = (B ^ a) 
To prove part (xxxv): 
1) (a ^ B) = (B ^ a) 
2) (B ^ a) = (a ^ B) 
3) (a ^ B) & (8 ^ a) 
To prove part (xxxvi): 
1) 8 Va) ty vB) 
2) (a= (BV 9) > (a 


(8 ^ a) 


FE (a ^ B) & (B Aa) 


F (a= (B v 9)) => (a (7 V B)) 


(y v 8))
Hyp
part (xxv) (1) 
Definition 4.7.2 (i) (2) 


Hyp 

part (xxi) (1) 

part (xxvi) (2) 
Definition 4.7.2 (i) (3) 


part (xxi 


part (xxv 


part (xxi 


Definition 4.7.2 (ii 


Hyp 

part (xxvi) (1) 

part (xxi) (2) 
Definition 4.7.2 (ii) (3) 


Theorem 4.5.7 (xxvii 
Definition 4.7.2 (i) (1 


— 


— 


Hyp 
part (xxxi) 
MP (2,1) 


part (xxxi) 
part (xxxi) 
part (ii) (1,2) 


Theorem 4.5.7 (xxvi) 
Theorem 4.5.7 (xxxi) (1) 
Definition 4.7.2 (ii) (2) 


part, (xxxiv) 
part (xxxiv) 
part (ii) (1,2) 


part (xxxi) 
Theorem 4.5.7 (viii) (1)
1) (y > (a > 8) > (7 > a) > (Cv
(2) ( a) > (77 => (a = B)) > (^v 
3) a> (7 => a) 
4) a= ((7y 9 (a = B)) 2 (vy => B) 
5) (^y > (a= B)) = (a (7 = 8) 
6) (y V (a= B)) > (a= (v V B) 
7) (a> B) V y) - (Y V (a B) 
(8) ((a= B) V y) > (a (7 V B) 
9) (a> (y V B)) > (a — (8 V 9) 
(10) ((a = 8) V y) > (a= (8 V 9) 
To prove part (xxxviii: — F- ((a 9 8) ^ y) 2 (a 9 (B ^ vy)) 
(1) (8 — ^y) > (a> B) > (a > ^v) 
2) ((a > B) > (a => 7») > (a > ((a = B) 
3) (B= 77) = (a > ((a => B) > 7») 
4) a => ((8 > ^y) > ((a => B) > 7») 
9) (8 => 77) = ((a => 8) > 7») > (la 
6) a > (a > B) = 77) > B8 = v) 
7) a((a > B) > y) > (a > “(8 = 7) 
8) (a> B) ^ y) > (a — (8A 7) 


Part (xxxix) is identical to Theorem 4.6.2 (iv). 


To prove part (xD: oVvf,oa-y,B-wytw 


1 o2 

2) 82 Y 

(3) (a> 7) 9 (8 7) 9 (a v 8) > 7) 
4) (B => y) > ((a V B) 2 ») 

5) (aV B) 2» 


To prove part (xlii): F(a Vo)-a« 
1 oa 

2) (aVa)>a 

To prove part (xliii: F(a V a) a 
1 (aVa)>a 

2)a>(aVa) 

3) $(\alpha \vee \alpha) \Leftrightarrow \alpha$
Theorem 4.5.7 (iv) (3,2)
Theorem 4.5.7 (v) (4) 
Definition 4.7.2 (i ) 

part (xxxi) 

6) 
vi) 
) 


E 


Theorem 4.5.7 (iv 
part (xxxvi 
Theorem 4.5.7 (iv) (8,9 


~ 
N 


Theorem 4.5.7 (vi) 
Theorem 4.5.7 (v) 
Theorem 4.5.7 (iv) (1,2) 
Theorem 4.5.7 (v) (3) 
Theorem 4.5.7 (xxviii) 
) 

) 

) 


Theorem 4.5.7 (iv) (4,5 
Theorem 4.5.7 (v 
Definition 4.7.2 (ii 


Hyp 

Hyp 

Theorem 4.7.6 (viii) 
MP (3,1) 

MP (4,2) 


Theorem 4.5.7 (xi) 
part (sli) (1,1) 


part (xlii) 
Theorem 4.7.6 (vi) 
part (ii) (1.2) 
4.7. Theorems for other logical operators


To prove part (xliv): F a= (a ^ o) 


1) a = (a = (a ^ a)) 
2) (a= (a> (a ^ a))) > ((a > a) > (a > (a ^ a))) 
3) (a= a) > (a => (a ^ a)) 
(4 a>a 
5) a (a ^a) 
To prove part (xlv): F(a^a) ea 
1 (aA a)>a 
2)a>(aAa) 
3) (a^o)ea 
To prove part (xlvi: F(a A 8) => ((8 — y) — vy) 
1) (eo^ B) 2 8 
2) B= ((B— y)— 7) 
(3) (a ^ B) => ((B- y) v 
To prove part (xlvii): F ((a ^ 8) 2 y) > (a> (8 —)) 
1) a = (8 => (a ^ B)) 
(2) (8 — (e ^ B)) > (((a ^ B) > y) > (8 — 7) 
3) a= (((a ^ B) 2 y) = (8 => ») 
4) (a ^ B) 2 y) > (a (B 0) 
To prove part (xlviii): F (a => (B — y)) => ((a ^ B) 2 v) 
1 (aA B) Sa 
2) (a= (B>7)) > ((@A B) > (8 => 9) 
(3) (a ^ B) => (8 > y) > v 
4) (a ^ B) => B= 5)) = ((a ^ B) > 7) 
5) (a= (8 => 9)) > ((@A B) > 9) 
To prove part (xlix): a — (8 y) F (aA B) y 
1) a > (8 — 7) 
2) (a= (8 => 7)) > ((@A B) 9 9) 
3) (aA B) y 
To prove part (D: -((aA 8) 2 y) € (a 2 (8 2 «)) 
1) ((a@ ^ B) 9) > (a> (8 > 0) 
2) (a= (8 = 7)) > ((@A B) > 9) 
3) (a ^ B) > y) S (a — (8 > 9) 
To prove part (li): EK ((a ^ 8) 2: y) € (8 29 (a )) 
(1) (o ^ 8 9 3) e (a (8 +7) 
2) (a= (B = 7)) € (8 > (a> ») 
3) ((a ^ B) => y) € (8 > (a> ») 
To prove part (li): F ((a ^ B) 2 y) & ((B ^ o) 2 v) 
1) (la A 8) 9 3) (a (84) 
2) ((8 ^a) e 3) e (a> (899) 
3) ((a ^ B) > y) € ((B ^ a) + 7) 
Theorem 4
Theorem 4.7.6 6 i 


o 
o 


Theorem 


Theorem 4.7.6 (iv) 
Theorem 4.5.7 (xiii) 


Theorem 4.5.7 (iv) (1,2) 


Theorem 4.7.6 (v) 
Theorem 4.5.7 (vii) 
32) 
) 


um 
d ae 


Theorem 4.5.7 (iv) (1 


Theorem 4.5.7 (v) (a 


Theorem 4.7.6 (iii 
Theorem 4.5.7 (i 


ee 
=a 
m: 


E 
vs. 
~ 
. Tl 
Z z E 


O 
2 
H 
e 

CRF ELS 
an 
5 


"Jj 
Q 
N 


Theorem 4.5.7 (iv) (2,4) 


Hyp 
part (xlviii) 
MP (1,2) 


part (xlvii) 
part (xlviii) 
part Gi) (1.2) 


part (1) 
part (v) 
part (xviii) (1,2) 


part (1) 
part (li) 
part (xx) (1,2) 


To prove part (liii): ⊢ ((α ∧ β) ∧ γ) ⇔ ((β ∧ α) ∧ γ)
(1) ((α ∧ β) ⇒ ¬γ) ⇔ ((β ∧ α) ⇒ ¬γ)
(2) ¬((α ∧ β) ⇒ ¬γ) ⇔ ¬((β ∧ α) ⇒ ¬γ)
(3) ((α ∧ β) ∧ γ) ⇔ ((β ∧ α) ∧ γ)

To prove part (liv): ⊢ ((α ∧ β) ⇒ ¬γ) ⇔ ((α ∧ γ) ⇒ ¬β)
(1) (β ⇒ ¬γ) ⇔ (γ ⇒ ¬β)
(2) (α ⇒ (β ⇒ ¬γ)) ⇔ (α ⇒ (γ ⇒ ¬β))
(3) ((α ∧ β) ⇒ ¬γ) ⇔ (α ⇒ (β ⇒ ¬γ))
(4) ((α ∧ β) ⇒ ¬γ) ⇔ (α ⇒ (γ ⇒ ¬β))
(5) ((α ∧ γ) ⇒ ¬β) ⇔ (α ⇒ (γ ⇒ ¬β))
(6) ((α ∧ β) ⇒ ¬γ) ⇔ ((α ∧ γ) ⇒ ¬β)

To prove part (lv): ⊢ ((α ∧ β) ∧ γ) ⇔ ((α ∧ γ) ∧ β)
(1) ((α ∧ β) ⇒ ¬γ) ⇔ ((α ∧ γ) ⇒ ¬β)
(2) ¬((α ∧ β) ⇒ ¬γ) ⇔ ¬((α ∧ γ) ⇒ ¬β)
(3) ((α ∧ β) ∧ γ) ⇔ ((α ∧ γ) ∧ β)

To prove part (lvi): ⊢ (α ∧ (β ∧ γ)) ⇔ ((α ∧ β) ∧ γ)
(1) ((β ∧ γ) ∧ α) ⇔ ((β ∧ α) ∧ γ)
(2) (α ∧ (β ∧ γ)) ⇔ ((β ∧ γ) ∧ α)
(3) (α ∧ (β ∧ γ)) ⇔ ((β ∧ α) ∧ γ)
(4) ((β ∧ α) ∧ γ) ⇔ ((α ∧ β) ∧ γ)
(5) (α ∧ (β ∧ γ)) ⇔ ((α ∧ β) ∧ γ)

To prove part (lvii): ⊢ (α ∨ (β ∨ γ)) ⇒ (β ∨ (α ∨ γ))
(1) (¬α ⇒ (¬β ⇒ γ)) ⇒ (¬β ⇒ (¬α ⇒ γ))
(2) (α ∨ (β ∨ γ)) ⇒ (β ∨ (α ∨ γ))

To prove part (lviii): ⊢ (α ∨ (β ∨ γ)) ⇔ (β ∨ (α ∨ γ))
(1) (¬α ⇒ (¬β ⇒ γ)) ⇔ (¬β ⇒ (¬α ⇒ γ))
(2) (α ∨ (β ∨ γ)) ⇔ (β ∨ (α ∨ γ))

To prove part (lix): ⊢ (α ∨ (β ∨ γ)) ⇔ ((α ∨ β) ∨ γ)
(1) (α ∨ (γ ∨ β)) ⇔ (γ ∨ (α ∨ β))
(2) (γ ∨ (α ∨ β)) ⇔ ((α ∨ β) ∨ γ)
(3) (α ∨ (γ ∨ β)) ⇔ ((α ∨ β) ∨ γ)
(4) (β ∨ γ) ⇔ (γ ∨ β)
(5) (α ∨ (β ∨ γ)) ⇔ (α ∨ (γ ∨ β))
(6) (α ∨ (β ∨ γ)) ⇔ ((α ∨ β) ∨ γ)

To prove part (lx): α ⇒ β ⊢ ¬α ∨ β
(1) α ⇒ β
(2) ¬¬α ⇒ β
(3) ¬α ∨ β

To prove part (lxi): ¬α ∨ β ⊢ α ⇒ β
(1) ¬α ∨ β
(2) ¬¬α ⇒ β
(3) α ⇒ β
part (lii)
part (xxi) (1) 
Definition 4.7.2 (ii) (2) 


part (vi) 

part (xxv) (1) 
part (1) 

part (xviii) (3,2) 
part (1) 

part (xix) (4,5) 


part (liv) 


part (xxi) (1) 


Definition 4.7.2 (ii) (2) 


part (xviii 


) 
part m 
part (xviii) (3, 


part (v) 
Definition 4.7.2 (i) (1) 


part (lviii 
part (xxxiii 


part (xviii) (1, 


) 
) 
2) 
part (xxxiii) 
part (xxvii) (4) 
3) 


part (xviii) (5, 


Hyp 
Theorem 4.5.7 (xxxviii) (1) 
Definition 4.7.2 (i) (2) 


Hyp 
Definition 4.7.2 (i) (1) 
Theorem 4.5.7 (xxxix) (2) 


[ draft: UTC 2023-1-3 Tuesday 00:13] 




Part (lxii) follows from parts (lx) and (lxi).
To prove part (lxiii): α ∨ β ⊢ ¬β ⇒ α
(1) α ∨ β
(2) β ∨ α
(3) ¬β ⇒ α

To prove part (lxiv): ¬β ⇒ α ⊢ α ∨ β
(1) ¬β ⇒ α
(2) β ∨ α
(3) α ∨ β


Part (lxv) follows from parts (lxiii) and (lxiv).
To prove part (lxvi): — 2(a > 8) F a ^ ^8 


1) “(a > B) 

) 

) 

)a 

To prove part E aA 3B F -(a B) 
^ B 


4) “(a => 8) 
Part (lxviii) follows from parts (lxvi) and (lxvii).


To prove part (Ixix): ~a F —^(a ^ B) 


To prove part (lxxi): ⊢ (α ⇒ β) ⇒ ((α ∧ β) ⇒ α)
(1) (α ∧ β) ⇒ α
(2) (α ⇒ β) ⇒ ((α ∧ β) ⇒ α)

To prove part (lxxii): ⊢ (α ⇒ β) ⇒ (α ⇒ (α ∧ β))
(1) α ⇒ (β ⇒ (α ∧ β))
(2) (α ⇒ β) ⇒ (α ⇒ (α ∧ β))

To prove part (lxxiii): ⊢ (α ⇒ β) ⇒ ((α ⇒ γ) ⇒ (α ⇒ (β ∧ γ)))


Hyp 
part (xxxii) (1) 
Definition 4.7.2 (i) (2) 


Hyp 
Definition 4.7.2 (i) (1) 
part (xxxii) (2) 


Hyp 

Theorem 4.5.7 (xlii) (1) 
Theorem 4.5.7 (xliii) (1) 
) 


part (i) (2,3 


Theorem 4.5.7 (i 
Theorem 4.5.7 (xxix 
Theorem 4.5.7 (xxv 
Definition 4.7.2 (ii 


Hyp 


Theorem 4.7.6 (iii 


[un] 
— 


Theorem 4.5.7 (i) 
Theorem 4.7.6 (v) 
Theorem 4.5.7 m ii) (1) 
(1) β ⇒ (γ ⇒ (β ∧ γ))
(2) (α ⇒ β) ⇒ ((α ⇒ γ) ⇒ (α ⇒ (β ∧ γ)))

To prove part (lxxiv): α ⇒ β, α ⇒ γ ⊢ α ⇒ (β ∧ γ)


To prove part (lxxv): α ⇒ (β ⇒ γ), α ⇒ (γ ⇒ β) ⊢ α ⇒ (β ⇔ γ)
(1) α ⇒ (β ⇒ γ)
(2) α ⇒ (γ ⇒ β)
(3) α ⇒ ((β ⇒ γ) ∧ (γ ⇒ β))
(4) α ⇒ (β ⇔ γ)

To prove part (lxxvi): ⊢ (α ⇒ β) ⇒ ((α ∧ β) ⇔ α)
(1) (α ⇒ β) ⇒ ((α ∧ β) ⇒ α)
(2) (α ⇒ β) ⇒ (α ⇒ (α ∧ β))
(3) (α ⇒ β) ⇒ ((α ∧ β) ⇔ α)

To prove part (lxxvii): α ⇒ β ⊢ (γ ∨ α) ⇒ (γ ∨ β)
(1) α ⇒ β
(2) ((¬γ) ⇒ α) ⇒ ((¬γ) ⇒ β)
(3) (γ ∨ α) ⇒ (γ ∨ β)

To prove part (lxxviii): α ⇒ β ⊢ (γ ∧ α) ⇒ (γ ∧ β)
(1) α ⇒ β
(2) (¬β) ⇒ (¬α)
(3) (γ ⇒ ¬β) ⇒ (γ ⇒ ¬α)
(4) (¬(γ ⇒ ¬α)) ⇒ (¬(γ ⇒ ¬β))
(5) (γ ∧ α) ⇒ (γ ∧ β)
To prove part (lxxix): ⊢ (α ⇒ β) ⇒ ((γ ∧ α) ⇒ (γ ∧ β))
(1) (α ⇒ β) ⇒ ((¬β) ⇒ (¬α))
(2) ((¬β) ⇒ (¬α)) ⇒ ((γ ⇒ ¬β) ⇒ (γ ⇒ ¬α))
(3) ((γ ⇒ ¬β) ⇒ (γ ⇒ ¬α)) ⇒ ((¬(γ ⇒ ¬α)) ⇒ (¬(γ ⇒ ¬β)))
(4) ((γ ⇒ ¬β) ⇒ (γ ⇒ ¬α)) ⇒ ((γ ∧ α) ⇒ (γ ∧ β))
(5) (α ⇒ β) ⇒ ((γ ⇒ ¬β) ⇒ (γ ⇒ ¬α))
(6) (α ⇒ β) ⇒ ((γ ∧ α) ⇒ (γ ∧ β))

To prove part (lxxx): ⊢ (α ⇒ (β ⇒ γ)) ⇒ ((α ∧ β) ⇒ (α ∧ γ))
(1) (β ⇒ γ) ⇒ ((¬γ) ⇒ (¬β))
(2) (α ⇒ (β ⇒ γ)) ⇒ (α ⇒ ((¬γ) ⇒ (¬β)))
(3) (α ⇒ ((¬γ) ⇒ (¬β))) ⇒ ((α ⇒ (¬γ)) ⇒ (α ⇒ (¬β)))
(4) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ (¬γ)) ⇒ (α ⇒ (¬β)))
(5) ((α ⇒ (¬γ)) ⇒ (α ⇒ (¬β))) ⇒ ((¬(α ⇒ (¬β))) ⇒ (¬(α ⇒ (¬γ))))
(6) ((α ⇒ (¬γ)) ⇒ (α ⇒ (¬β))) ⇒ ((α ∧ β) ⇒ (α ∧ γ))
(7) (α ⇒ (β ⇒ γ)) ⇒ ((α ∧ β) ⇒ (α ∧ γ))
Theorem 4.5.7 (iv) (2,3)
)
Theorem 4.5.7 (iv) (4,6
To prove part (lxxxi): ⊢ (α ∨ (β ∧ γ)) ⇒ ((α ∨ β) ∧ (α ∨ γ))

(1) (β ∧ γ) ⇒ β    Theorem 4.7.6 (iii)
(2) (α ∨ (β ∧ γ)) ⇒ (α ∨ β)    part (lxxvii) (1)
(3) (β ∧ γ) ⇒ γ    Theorem 4.7.6 (iv)
(4) (α ∨ (β ∧ γ)) ⇒ (α ∨ γ)    part (lxxvii) (3)
(5) (α ∨ (β ∧ γ)) ⇒ ((α ∨ β) ∧ (α ∨ γ))    part (lxxiv) (2,4)
To prove part (lxxxii): ⊢ ((α ∨ β) ∧ (α ∨ γ)) ⇒ (α ∨ (β ∧ γ))

(1) ((¬α) ⇒ β) ⇒ (((¬α) ⇒ γ) ⇒ ((¬α) ⇒ (β ∧ γ)))    part (lxxiii)
(2) (α ∨ β) ⇒ ((α ∨ γ) ⇒ (α ∨ (β ∧ γ)))    Definition 4.7.2 (i) (1)
(3) ((α ∨ β) ∧ (α ∨ γ)) ⇒ (α ∨ (β ∧ γ))    part (xlix) (2)
To prove part (lxxxiii): ⊢ (α ∨ (β ∧ γ)) ⇔ ((α ∨ β) ∧ (α ∨ γ))

(1) (α ∨ (β ∧ γ)) ⇒ ((α ∨ β) ∧ (α ∨ γ))    part (lxxxi)
(2) ((α ∨ β) ∧ (α ∨ γ)) ⇒ (α ∨ (β ∧ γ))    part (lxxxii)
(3) (α ∨ (β ∧ γ)) ⇔ ((α ∨ β) ∧ (α ∨ γ))    part (ii) (1,2)
To prove part (lxxxiv): ⊢ (α ∧ (β ∨ γ)) ⇒ ((α ∧ β) ∨ (α ∧ γ))

(1) (¬β) ⇒ ((β ∨ γ) ⇒ γ)    part (viii)
(2) (α ⇒ ¬β) ⇒ (α ⇒ ((β ∨ γ) ⇒ γ))    Theorem 4.5.7 (viii) (1)
(3) (¬(α ∧ β)) ⇒ (α ⇒ ¬β)    part (x)
(4) (¬(α ∧ β)) ⇒ (α ⇒ ((β ∨ γ) ⇒ γ))    Theorem 4.5.7 (iv) (3,2)
(5) (α ⇒ ((β ∨ γ) ⇒ γ)) ⇒ ((α ∧ (β ∨ γ)) ⇒ (α ∧ γ))    part (lxxx)
(6) (¬(α ∧ β)) ⇒ ((α ∧ (β ∨ γ)) ⇒ (α ∧ γ))    Theorem 4.5.7 (iv) (4,5)
(7) (α ∧ (β ∨ γ)) ⇒ ((¬(α ∧ β)) ⇒ (α ∧ γ))    Theorem 4.5.7 (v) (6)
(8) (α ∧ (β ∨ γ)) ⇒ ((α ∧ β) ∨ (α ∧ γ))    Definition 4.7.2 (i) (7)
To prove part (lxxxv): ⊢ ((α ∧ β) ∨ (α ∧ γ)) ⇒ (α ∧ (β ∨ γ))

(1) β ⇒ (β ∨ γ)    Theorem 4.7.6 (vi)
(2) (α ∧ β) ⇒ (α ∧ (β ∨ γ))    part (lxxviii) (1)
(3) γ ⇒ (β ∨ γ)    Theorem 4.7.6 (vii)
(4) (α ∧ γ) ⇒ (α ∧ (β ∨ γ))    part (lxxviii) (3)
(5) ((α ∧ β) ∨ (α ∧ γ)) ⇒ (α ∧ (β ∨ γ))    part (xli) (2,4)
To prove part (lxxxvi): ⊢ (α ∧ (β ∨ γ)) ⇔ ((α ∧ β) ∨ (α ∧ γ))

(1) (α ∧ (β ∨ γ)) ⇒ ((α ∧ β) ∨ (α ∧ γ))    part (lxxxiv)
(2) ((α ∧ β) ∨ (α ∧ γ)) ⇒ (α ∧ (β ∨ γ))    part (lxxxv)
(3) (α ∧ (β ∨ γ)) ⇔ ((α ∧ β) ∨ (α ∧ γ))    part (ii) (1,2)
To prove part (lxxxvii): ⊢ (α ∨ (α ∧ β)) ⇔ α

(1) α ⇒ α    Theorem 4.5.7 (xi)
(2) (α ∧ β) ⇒ α    Theorem 4.7.6 (iii)
(3) (α ∨ (α ∧ β)) ⇒ α    part (xli) (1,2)
(4) α ⇒ (α ∨ (α ∧ β))    Theorem 4.7.6 (vi)
(5) (α ∨ (α ∧ β)) ⇔ α    part (ii) (3,4)
To prove part (lxxxviii): ⊢ (α ∧ (α ∨ β)) ⇔ α

(1) (α ∧ (α ∨ β)) ⇒ α    Theorem 4.7.6 (iii)
(2) α ⇒ α    Theorem 4.5.7 (xi)
(3) α ⇒ (α ∨ β)    Theorem 4.7.6 (vi)
(4) (α ∧ (α ∨ β)) ⇔ α    part (ii) (1,3)


This concludes the proof of Theorem 4.7.9.


4.7.10 REMARK: Some basic logical equivalences which shadow the corresponding set-theory equivalences.
The purpose of Theorem 4.7.11 is to supply logical-operator analogues for the basic set-operation identities
in Theorem 8.1.6.


4.7.11 THEOREM [PC]: Logical equivalences which correspond to some set-expression equivalences. 
The following assertions for wffs α, β and γ follow from the propositional calculus in Definition 4.4.3.


(i) ⊢ (α ∨ β) ⇔ (β ∨ α). (Commutativity of disjunction.)

(ii) ⊢ (α ∧ β) ⇔ (β ∧ α). (Commutativity of conjunction.)

(iii) ⊢ (α ∨ α) ⇔ α.

(iv) ⊢ (α ∧ α) ⇔ α.

(v) ⊢ (α ∨ (β ∨ γ)) ⇔ ((α ∨ β) ∨ γ). (Associativity of disjunction.)

(vi) ⊢ (α ∧ (β ∧ γ)) ⇔ ((α ∧ β) ∧ γ). (Associativity of conjunction.)

(vii) ⊢ (α ∨ (β ∧ γ)) ⇔ ((α ∨ β) ∧ (α ∨ γ)). (Distributivity of disjunction over conjunction.)

(viii) ⊢ (α ∧ (β ∨ γ)) ⇔ ((α ∧ β) ∨ (α ∧ γ)). (Distributivity of conjunction over disjunction.)

(ix) ⊢ (α ∨ (α ∧ β)) ⇔ α. (Absorption of disjunction over conjunction.)

(x) ⊢ (α ∧ (α ∨ β)) ⇔ α. (Absorption of conjunction over disjunction.)


PROOF: Part (i) is the same as Theorem 4.7.9 (xxxiii).

Part (ii) is the same as Theorem 4.7.9 (xxxv).

Part (iii) is the same as Theorem 4.7.9 (xliii).

Part (iv) is the same as Theorem 4.7.9 (xlv).

Part (v) is the same as Theorem 4.7.9 (lix).

Part (vi) is the same as Theorem 4.7.9 (lvi).

Part (vii) is the same as Theorem 4.7.9 (lxxxiii).

Part (viii) is the same as Theorem 4.7.9 (lxxxvi).

Part (ix) is the same as Theorem 4.7.9 (lxxxvii).

Part (x) is the same as Theorem 4.7.9 (lxxxviii).
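
The way in which the ten equivalences of Theorem 4.7.11 shadow set identities can be illustrated by a brute-force check. The following Python sketch evaluates each pair of expressions twice: once over all truth values, and once over all subsets of a small universe. The universe {0, 1, 2}, the dictionary layout and the function names are assumptions made only for this illustration, and the set identities checked are the usual ones, which are presumed (not verified here) to match those in Theorem 8.1.6.

```python
from itertools import combinations, product

# Each entry pairs the two sides of one equivalence in Theorem 4.7.11.
# Only `|` and `&` are used: on Python bools they act as "or"/"and",
# and on Python sets they act as union/intersection.
IDENTITIES = {
    "i":    (lambda a, b, c: a | b,       lambda a, b, c: b | a),
    "ii":   (lambda a, b, c: a & b,       lambda a, b, c: b & a),
    "iii":  (lambda a, b, c: a | a,       lambda a, b, c: a),
    "iv":   (lambda a, b, c: a & a,       lambda a, b, c: a),
    "v":    (lambda a, b, c: a | (b | c), lambda a, b, c: (a | b) | c),
    "vi":   (lambda a, b, c: a & (b & c), lambda a, b, c: (a & b) & c),
    "vii":  (lambda a, b, c: a | (b & c), lambda a, b, c: (a | b) & (a | c)),
    "viii": (lambda a, b, c: a & (b | c), lambda a, b, c: (a & b) | (a & c)),
    "ix":   (lambda a, b, c: a | (a & b), lambda a, b, c: a),
    "x":    (lambda a, b, c: a & (a | b), lambda a, b, c: a),
}

def subsets(universe):
    """Return all subsets of `universe` as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def holds_everywhere(lhs, rhs, domain):
    """Check lhs == rhs for every triple (a, b, c) drawn from `domain`."""
    return all(lhs(a, b, c) == rhs(a, b, c)
               for a, b, c in product(domain, repeat=3))

BOOLS = [False, True]
SETS = subsets({0, 1, 2})

logic_ok = {k: holds_everywhere(l, r, BOOLS) for k, (l, r) in IDENTITIES.items()}
sets_ok = {k: holds_everywhere(l, r, SETS) for k, (l, r) in IDENTITIES.items()}
```

Such a check is a semantic sanity test only. It says nothing about derivability from the axioms, which is what the proofs in Theorem 4.7.9 establish.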


4.7.12 REMARK: The absurdity of axiomatic propositional calculus. 

A rational person would probably ask why in axiomatic propositional calculus, one discards almost everything 
that one knows about logic, leaving only three axioms and one rule from which the whole edifice must be 
rebuilt. (That's like throwing away all of your money except for three dollars and then trying to rebuild all 
of your former wealth from that.) It is difficult to answer such a criticism in a rational way. Some authors 
do in fact assume that all of propositional calculus is determined by truth tables, thereby short-circuiting 
the Gordian knot of the axiomatic approach. Then the axiomatic approach is applied to predicate calculus 
and first order languages, where the argumentative calculus approach can be better justified. 


In Sections 4.4, 4.5, 4.6 and 4.7, various expedient metatheorems, such as the deduction metatheorem in 
Section 4.8, have been eschewed, unless one counts amongst metatheorems the substitution rule and the 
invocation of theorems and definitions for example. The fact that many presentations of mathematical logic 
do resort to various metatheorems to reduce the burden of proving basic logic theorems by hand suggests that 
minimalist axiomatisation is not really the best way to view logic. Euclid's “Elements” [213, 214, 215, 216] 
set the pattern for minimalist axiomatisation, but the intuitive clarity of Euclid's axioms and method of 
proof far exceeded the intuitive clarity of logical theorem proofs by the three Łukasiewicz axioms using only
the modus ponens rule. 
Propositional calculus, even with a generous set of axioms and rules, is an extremely inefficient method
of determining whether logical expressions are tautologies. Propositional calculus is a challenging mental 
exercise, somewhat similar to various frustrating toy-shop puzzles and mazes. It is vastly more efficient to 
substitute all possible combinations of truth values for the letters in an expression to determine whether it is 
a tautology. One could make the claim that the argumentative method of propositional calculus mimics the 
way in which people logically argue points, but this does not justify the excessive effort compared to truth- 
table calculations. One could also make the argument that the axiomatic method gives a strong guarantee of 
the validity of theorems. But the fact that the theorems are all derived from three axioms does not increase 
the likelihood of their validity. In fact, one could regard reliance on a tiny set of axioms to be vulnerable in 
the sense that a biological monoculture is vulnerable. Either they all survive or none survive. A population 
which is cloned from a very small set of ancestor axioms could be entirely wiped out by a tiny fault in a 
single axiom! It is somewhat disturbing to ponder the fact that the most crucial axioms and rules of the 
whole edifice of mathematics are themselves completely incapable of being proved or verified in any way. 
(See Remark 2.1.4 for Bertrand Russell’s similar observations on this issue.) 
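
The truth-table method referred to in this remark can be sketched in a few lines of Python. The nested-tuple encoding of formulas and all function names are assumptions made for this illustration. The three formulas checked are the standard Łukasiewicz axiom templates written with the letters a, b and c, which are assumed here to coincide with the axioms of Definition 4.4.3.

```python
from itertools import product

# A formula is a proposition letter (a string), ("not", p) or ("imp", p, q).
def imp(p, q):
    return ("imp", p, q)

def neg(p):
    return ("not", p)

def letters(f):
    """Collect the proposition letters which occur in formula f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(letters(sub) for sub in f[1:]))

def evaluate(f, tau):
    """Evaluate formula f under the truth value map tau (a dict)."""
    if isinstance(f, str):
        return tau[f]
    if f[0] == "not":
        return not evaluate(f[1], tau)
    if f[0] == "imp":
        return (not evaluate(f[1], tau)) or evaluate(f[2], tau)
    raise ValueError(f"unknown operator: {f[0]!r}")

def is_tautology(f):
    """Try all 2**n truth value combinations for the n letters in f."""
    names = sorted(letters(f))
    return all(evaluate(f, dict(zip(names, values)))
               for values in product([False, True], repeat=len(names)))

PC1 = imp("a", imp("b", "a"))
PC2 = imp(imp("a", imp("b", "c")), imp(imp("a", "b"), imp("a", "c")))
PC3 = imp(imp(neg("b"), neg("a")), imp("a", "b"))
```

For a formula with n letters, this costs 2**n evaluations, which is small for the formulas in this chapter but grows exponentially with n.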


4.7.13 REMARK: What real mathematical proofs look like. 

One of the advantages of presenting some very basic logic proofs in full in this book is that the reader gets 
some idea of what real mathematical proofs look like. The proofs which are given for the vast majority of 
theorems in the vast majority of textbooks are merely sketches of proofs. This is always acceptable if one 
really does know how to “fill in the gaps”. The formal proofs for predicate calculus in Section 6.6 are in the 
“natural deduction” style, which is quite different to the “Hilbert style” which is used in Sections 4.5, 4.6 
and 4.7 for propositional calculus. Nevertheless, even the natural deduction style is quite austere, although 
it is closer to the way mathematicians write proofs “in the wild” than the Hilbert style where every line is 
effectively a theorem. 


4.8. The deduction metatheorem and other metatheorems 


4.8.1 REMARK: Deduction metatheorem not used for propositional calculus in this book. 

Most presentations of propositional calculus include some kind of deduction metatheorem, and many also 
include a proof. (See Remark 4.8.6 for further comments on the literature for this subject.) Propositional 
calculus can be carried out entirely without using any kind of deduction metatheorem. In fact, no deduction 
metatheorem has been used to prove the propositional calculus theorems in Chapter 4. 


For the predicate calculus in Chapter 6, no deduction metatheorem is required because it is assumed that
the assertions α ⊢ β and ⊢ α ⇒ β are equivalent. (This equivalence is implicit in the rules MP and CP in
Definition 6.3.9.)


4.8.2 REMARK: The purpose of the deduction metatheorem. 

Assertions of the forms (1) “α ⊢ β” and (2) “⊢ α ⇒ β” are similar but different. The first form means
that one is claiming to be in possession of a proof of β, starting from the assumption of α. This proof is
constrained to be in accordance with the rules of the system which one has defined. The second form of
assertion is much less clear. The logical operator “⇒” has no concrete meaning within the axiomatic system.
It is just a symbol which is subject to various inference rules. Since its meaning has been intentionally 
removed, so as to ensure that no implicit, subjective thinking can enter into the inference process, one 
cannot be certain what it really “means”. The motivation for establishing an axiomatic system is clearly to 
determine “facts” about some objects in some universe of objects, but the facts about that universe are only 
supposed to enter into the choices of axioms, inference rules, and other aspects of the axiomatic system at 
the time of its establishment. Thereafter, all controversy is removed by the prohibition of intuition in the 
mechanised inference process. 


Despite the prohibition on intuitive inference, it is fairly clear that “α ⇒ β” is intended to mean that β is
“true” whenever α is “true”, although truth itself is relegated to the role of a mere symbol which is subject
to mechanised rules and procedures. If this interpretation is permitted, one would expect that α ⇒ β should
be “true” if β can be proved from α. Conversely, one could hope that the “truth” of α ⇒ β might imply the
provability of β from α. In formal propositional calculus, the semantic notion of “truth” is replaced by the
more objective concept of provability. (Ironically, non-provability is usually impossible to prove in a general
axiomatic system except by using metamathematics. So provability is itself a very slippery concept.)

As things are arranged in a standard propositional calculus, the slightly more dubious expectation is more
easily fulfilled. That is, the provability of α ⇒ β does automatically imply the provability of β from α. This
is exactly what the modus ponens rule delivers. (The “inputs” to the MP rule are listed in Section 4.8 in
the opposite order to other sections. Here the implication is listed before the premise.)


To prove: α ⇒ β, α ⊢ β


(1) α ⇒ β    Hyp
(2) α    Hyp
(3) β    MP (1,2)


In other words, if the logical expression α ⇒ β can be proved, then β may be deduced from α. Perhaps
surprisingly, the reverse implication is not so easy to show because it is not declared to be a rule of the system.
However, the “deduction metatheorem” states that if β can be inferred from α, then the logical expression
α ⇒ β is a provable theorem in the axiomatic system. Unfortunately, there is no corresponding “deduction
theorem” which would prove this result within the axiomatic system. (The deduction metatheorem requires
mathematical induction in its proof, for example, which is not available within propositional calculus.) The
deduction metatheorem and modus ponens may be summarised as follows.


inference rule name       inference rule                  Curry code
modus ponens              given ⊢ α ⇒ β, infer α ⊢ β      Pe
deduction metatheorem     given α ⊢ β, infer ⊢ α ⇒ β      Pi


Curry [350], page 176, gives the abbreviation “Pe” for modus ponens and “Pi” for the deduction metatheorem. 
These signify “implication elimination” and “implication introduction” respectively. In a rule-based “natural 
deduction” system, both of these are included in the logical framework, whereas in an “assertional system” 
with only the MP inference rule, implication introduction is a metatheorem. 


4.8.3 REMARK: Example to motivate the deduction metatheorem. 

To see why proving the deduction metatheorem is not a shallow exercise, consider the theorem ⊢ α ⇒ α
for example. (See Theorem 4.5.7 (xi).) It is difficult to think of a more obvious theorem than this! But the
only tools at one’s disposal are MP and the three axiom templates. The corresponding conditional theorem
is α ⊢ α, which is very easily and trivially proved.


To prove: α ⊢ α


(1) α    Hyp

The conclusion is the same as the hypothesis. So the formal proof is finished as soon as it is started. When
attempting to prove ⊢ α ⇒ α, one has no hypothesis to the left of the assertion symbol. So one must arrive at
conclusions from the axioms and MP alone. Since the compound logical expression α ⇒ α does not appear
amongst the axiom templates, the proof must have at least two lines as follows.


To prove: ⊢ α ⇒ α


(1) [logical expression?]    [axiom instance?]
(2) α ⇒ α    [justification?]

Here the logical expression on line (1) must be an axiom instance, since there is no other source of logical
expressions. This must lead to α ⇒ α on line (2) by MP, since there is only one inference rule. However,
MP requires two (different) inputs to generate one output. So at least three lines of proof are required.


To prove: ⊢ α ⇒ α


(1) [logical expression?]    [axiom instance?]
(2) [logical expression?]    [axiom instance?]
(3) α ⇒ α    MP (2,1)


With a little effort, one may convince oneself that such a 3-line proof is impossible. The MP rule requires
one input with the symbol “⇒” between two logical subexpressions, where the subexpression on the left
is on the other input line, and α ⇒ α is the subexpression on the right. This can only be achieved with
the axiom instances α ⇒ (α ⇒ α) or (¬α ⇒ ¬α) ⇒ (α ⇒ α). In neither case can the left subexpression
be obtained from an axiom instance. Therefore a 3-line proof is not possible. One may continue in this
fashion, systematically increasing the number of lines and considering all possible proofs of that length. The
following is a solution to the problem.


To prove: ⊢ α ⇒ α


(1) α ⇒ ((α ⇒ α) ⇒ α)    PC 1: α[α], β[α ⇒ α]
(2) (α ⇒ ((α ⇒ α) ⇒ α)) ⇒ ((α ⇒ (α ⇒ α)) ⇒ (α ⇒ α))    PC 2: α[α], β[α ⇒ α], γ[α]
(3) (α ⇒ (α ⇒ α)) ⇒ (α ⇒ α)    MP (2,1)
(4) α ⇒ (α ⇒ α)    PC 1: α[α], β[α]
(5) α ⇒ α    MP (4,3)


The proof could possibly be shortened by a line (although it is difficult to see how). But it is clear that a
quick conversion of the proof of α ⊢ α into a proof of ⊢ α ⇒ α is not likely. If it is not obvious how to
convert the most trivial one-line proof, it is difficult to imagine that a 20-line proof will be easy either.
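
The five-line proof of ⊢ α ⇒ α can be checked mechanically. The following Python sketch uses an assumed tuple encoding of formulas and hypothetical helper names. It builds lines (1), (2) and (4) as instances of the axiom templates PC 1 and PC 2, and obtains lines (3) and (5) by modus ponens. It verifies one given proof only; it does not search for proofs.

```python
# A formula is a wff letter (a string) or ("imp", p, q), which stands for p => q.
def imp(p, q):
    return ("imp", p, q)

def pc1(alpha, beta):
    """Instance of template PC 1: alpha => (beta => alpha)."""
    return imp(alpha, imp(beta, alpha))

def pc2(alpha, beta, gamma):
    """Instance of template PC 2:
    (alpha => (beta => gamma)) => ((alpha => beta) => (alpha => gamma))."""
    return imp(imp(alpha, imp(beta, gamma)),
               imp(imp(alpha, beta), imp(alpha, gamma)))

def modus_ponens(implication, premise):
    """Return q if `implication` has the form premise => q, else raise ValueError."""
    if (isinstance(implication, tuple) and implication[0] == "imp"
            and implication[1] == premise):
        return implication[2]
    raise ValueError("modus ponens does not apply to these two lines")

a = "a"
line1 = pc1(a, imp(a, a))            # a => ((a => a) => a)
line2 = pc2(a, imp(a, a), a)         # line1 => ((a => (a => a)) => (a => a))
line3 = modus_ponens(line2, line1)   # (a => (a => a)) => (a => a)
line4 = pc1(a, a)                    # a => (a => a)
line5 = modus_ponens(line3, line4)   # a => a
```

If any line were not of the required form, `modus_ponens` would raise an exception instead of returning a formula, which is all that "checking" means in this sketch.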


4.8.4 REMARK: Alternatives to the deduction metatheorem. 

One way to establish the validity of the deduction metatheorem in propositional calculus is to simply declare 
this metatheorem to be a deduction rule. In other words, one may declare that whenever there exists a 
proof of an assertion of the form “α ⊢ β”, it is permissible to infer the assertion “⊢ α ⇒ β”. A deduction
rule can permit any kind of inference one desires. In this case, we “know” that this rule will always give 
true conclusions from true assumptions. Since all axioms and all other rules of inference are justified meta- 
mathematically anyway, there is no overwhelming reason to exclude this kind of deduction rule. In fact, 
all inference rules may be regarded as metatheorems because they can be justified metamathematically. 
(Lemmon [367], pages 14-18, Suppes [394], pages 28-29, and Suppes/Hill [396], page 131, do in fact give the 
deduction metatheorem as a basic inference rule called “conditional proof”.)


Another way to avoid the need for a deduction metatheorem would be to exclude all conditional theorems.
Given a theorem of the form “⊢ α ⇒ β”, one may always infer β from α by modus ponens. In other
words, there is no need for theorems of the form “α ⊢ β”. This would make the presentation of symbolic
logic slightly more tedious. An assertion of the form α1, α2, ... αn ⊢ β would need to be presented in the
form ⊢ α1 ⇒ (α2 ⇒ (...(αn ⇒ β)...)), which is somewhat untidy. For example, α1, α2, α3 ⊢ β becomes
⊢ α1 ⇒ (α2 ⇒ (α3 ⇒ β)). The real disadvantage here is that it is more difficult to prove theorems of the
unconditional type than the corresponding theorems of the conditional type.
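
The rewriting of a conditional assertion as a single nested implication is purely mechanical, as the following Python sketch suggests. The string representation, the ASCII arrow "=>" and the function name are assumptions for illustration only. The hypotheses and the conclusion are assumed to be single letters or already parenthesised.

```python
def as_unconditional(hypotheses, conclusion):
    """Rewrite 'a1, ..., an |- b' as the string a1 => (a2 => (...(an => b)...))."""
    formula = conclusion
    # Work from the last hypothesis outwards so that it ends up innermost.
    for depth, hypothesis in enumerate(reversed(hypotheses)):
        if depth == 0:
            formula = f"{hypothesis} => {formula}"
        else:
            formula = f"{hypothesis} => ({formula})"
    return formula

example = as_unconditional(["a1", "a2", "a3"], "b")
```

The sketch only rewrites the statement. It gives no help with the harder task of finding a proof of the resulting unconditional theorem.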


4.8.5 REMARK: The semantic background for the deduction metatheorem. 
The assertion “⊢ α” of a compound proposition α has the meaning “τ ∈ K_α”, where τ : P → {F, T} is a
truth value map as in Definition 3.2.3, and K_α ⊆ 2^P is the knowledge set denoted by α.


The assertion “⊢ α ⇒ β” for compound propositions α and β has the meaning “τ ∈ (2^P \ K_α) ∪ K_β”, where
K_β ⊆ 2^P is the knowledge set denoted by β.


The assertion “α ⊢ β” for compound propositions α and β has the meaning “τ ∈ K_α ⇛ τ ∈ K_β”, where the
triple arrow “⇛” denotes implication in the semantic (metamathematical) framework. This is equivalent to
the (naive) set-inclusion K_α ⊆ K_β. From this, it follows that any τ ∈ 2^P must satisfy τ ∈ (2^P \ K_α) ∪ K_β.
(To see this, note that τ ∈ 2^P implies τ ∈ K_β ∪ (2^P \ K_β) ⊆ K_β ∪ (2^P \ K_α).) However, it does not follow
that the validity of τ ∈ (2^P \ K_α) ∪ K_β for a single truth value map τ implies the set-inclusion K_α ⊆ K_β. To
obtain this conclusion, one requires the validity of τ ∈ (2^P \ K_α) ∪ K_β for all possible truth value maps τ.
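
The difference between a single truth value map and all truth value maps can be made concrete with a small computation. In the following Python sketch, P is a hypothetical set of two proposition names, truth value maps are represented as tuples of Booleans, and knowledge sets are computed by brute force. None of these names or representations come from the text.

```python
from itertools import product

P = ("p", "q")  # assumed proposition names
# Every truth value map on P, as a tuple of Booleans; this stands in for 2^P.
ALL_MAPS = set(product([False, True], repeat=len(P)))

def knowledge_set(predicate):
    """Set of truth value maps tau for which predicate(tau as a dict) is true."""
    return {tau for tau in ALL_MAPS if predicate(dict(zip(P, tau)))}

def implication_holds_at(tau, K_a, K_b):
    """Test whether tau lies in (2^P minus K_a) union K_b."""
    return tau in (ALL_MAPS - K_a) | K_b

K_p = knowledge_set(lambda t: t["p"])
K_q = knowledge_set(lambda t: t["q"])
K_p_and_q = knowledge_set(lambda t: t["p"] and t["q"])

# The knowledge set of "p and q" is included in that of "p",
# so the implication holds at every truth value map.
included = K_p_and_q <= K_p
everywhere = all(implication_holds_at(tau, K_p_and_q, K_p) for tau in ALL_MAPS)

# K_p is not included in K_q, although the implication "p implies q"
# does hold at the single truth value map (True, True).
single = implication_holds_at((True, True), K_p, K_q)
not_included = not (K_p <= K_q)
```

In this finite model, the inclusion of K_a in K_b is the same as the implication holding at every element of `ALL_MAPS`, which is the "for all τ" condition stated above.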


Thus in semantic terms, the assertion “α ⊢ β” implies τ ∈ (2^P \ K_α) ∪ K_β, which implies the validity of the
assertion “⊢ α ⇒ β”. So in terms of semantics, the deduction metatheorem is necessarily true. Therefore
one might reasonably ask why implication introduction is not a fundamental rule of any logical system. Any
reasonable logical system should guarantee that “⊢ α ⇒ β” follows from “α ⊢ β” if it accurately describes
the underlying semantics. The validity of the implication introduction rule is thrown in doubt in minimalist 
logical systems which are based on modus ponens as the sole inferential rule. Natural deduction systems 
which have rules that mimic real-world mathematics do not need to have such doubt because they can 
prescribe implication introduction as an a-priori rule. 


The requirement to prove the deduction metatheorem may be thought of as a “burden of proof” for any
logical system which declines to include the implication introduction rule amongst its a-priori rules. This
includes all systems which rely upon modus ponens as the sole rule. This provides an incentive to abandon
“assertional” systems in favour of rule-based natural deduction systems.


Non-MP rules become even more compelling in predicate calculus, where Rule G and Rule C, for example, 
must be proved as metatheorems in assertional systems. (See Remark 6.1.7 for Rules G and C.) This suggests 
that the austere kind of axiomatic framework in Definition 4.4.3 is too austere for practical mathematics. 
In fact, even for the purposes of pure logic, it seems of dubious value to discard a necessary rule and then 
have to work hard to prove it in a precarious metatheorem. Such an approach only seems reasonable if one 
starts from the assumption that the theory or language is the primary “reality” and that the underlying 
meaning (or model) is of secondary importance. Making the deduction metatheorem an a-priori rule only 
seems like “cheating” if one assumes that the language is the real thing and the underlying space of concrete 
propositions is a secondary construction. 


4.8.6 REMARK: Literature for the deduction metatheorem for propositional calculus. 

Presentations of the deduction metatheorem (and its proof) for various axiomatisations of propositional 
calculus are given by several authors as indicated in Table 4.8.1. Most authors refer to the deduction
metatheorem as the “deduction theorem”. (The propositional algebra used in the presentation by Curry [350],
pages 178-181, lacks the negation operator. So it is apparently not equivalent to the propositional calculus
in Definition 4.4.3.)


year  author                              basic operators   axioms  inference rules
1944  Church [348], pages 88-89           ⊥, ⇒              3       MP
1952  Kleene [365], pages 90-98           ¬, ∧, ∨, ⇒        10      MP
1953  Rosser [387], pages 75-76           ¬, ∧, ⇒           3       MP
1963  Curry [350], pages 178-181          ∧, ∨, ⇒           3       MP
1963  Stoll [393], pages 378-379          ¬, ⇒              3       MP
1964  E. Mendelson [370], pages 32-33     ¬, ⇒              3       MP
1967  Kleene [366], pages 39-41           ¬, ∧, ∨, ⇒, ⇔     13      MP
1967  Margaris [369], pages 55-58         ¬, ⇒              3       MP
1969  Bell/Slomson [339], pages 43-44     ¬, ∧              9       MP
1969  Robbin [384], pages 16-18           ⊥, ⇒              3       MP
1993  Ben-Ari [340], pages 57-58          ¬, ⇒              3       MP


Table 4.8.1 Survey of deduction metatheorem presentations for propositional calculus 


The deduction metatheorem is attributed to Herbrand [421,422] by Kleene [365], page 98; Kleene [366], 
page 39; Curry [350], page 249; Lemmon [367], page 32; Margaris [369], page 191; Suppes [394], page 29. 


4.8.7 REMARK: Different deduction metatheorem proofs for different axiomatic systems. 
Each axiomatic system requires its own deduction metatheorem proof because the proofs depend very much 
on the formalism adopted. (See Remark 6.6.27 for the deduction metatheorem for predicate calculus.) 


4.8.8 REMARK: Arguments against the deduction metatheorem. 

After doing a fairly large number of proofs in the style of Theorem 4.5.7, one naturally feels a desire for
short-cuts. One of the most significant frustrations is the inability to convert an assertion of the form “α ⊢ β”
to an implication of the form “⊢ α ⇒ β”.


An assertion of the form “α ⊢ β” means that the writer claims that there exists a proof of the wff β from
the assumption α. Therefore if α appears on a line of an argument, then β may be validly written on any
later line. It is intuitively clear that this is equivalent to the assertion “⊢ α ⇒ β”. The latter assertion can
be used as an input to modus ponens whereas the assertion “α ⊢ β” cannot.


Although it is possible to convert an assertion of the form “⊢ α ⇒ β” to the assertion “α ⊢ β” (as mentioned
in Remark 4.8.2), there is no way to make the reverse conversion. A partial solution to this problem is
called the “Deduction Theorem”. This is not actually a theorem at all. It is sometimes referred to as a
metatheorem, but if the logical framework for the proof of the metatheorem is not well defined, it cannot be
said to be a real theorem at all. It might be more accurate to call it a “naive theorem” since the proof uses
naive logic and naive mathematics.

The proof of this “theorem” requires mathematical induction, which requires the arithmetic of infinite sets,
which requires set theory, which requires logic. (This inescapable cycle of dependencies was forewarned
in Remark 2.1.1 and illustrated in Figure 2.1.1.) However, all of the propositional calculus requires naive
arithmetic and naive set theory already. So one may as well go the whole hog and throw in some naive
mathematical induction too. (Mathematical induction is generally taught by the age of 16 years. One
cannot generally progress in mathematics without accepting it. A concept which is taught at a young
enough age is generally accepted as “obvious”.)


One could call these sorts of “logic theorems” by various names, like “pseudo-theorems”, “metatheorems”
or “fantasy theorems”. In this book, they will be called “naive theorems” since they are proved using naive
logic and naive set theory.


4.8.9 REMARK: Notation and the space of formulas for the deduction metatheorem. 
Theorem 4.8.11 attempts to provide a corollary for modus ponens. 


To formulate naive Theorem 4.8.11, let W denote the set of all possible wffs in the propositional calculus in
Definition 4.4.3, let W^n denote the set of all sequences of n wffs for non-negative integers n, and let List(W)
denote the set ⋃_{n=0}^∞ W^n of sequences of wffs with non-negative length. An element of W^n is said to be a wff
sequence of length n. The concatenation of two wff sequences Γ1 and Γ2 is denoted with a comma as Γ1, Γ2.


4.8.10 REMARK: Informality of the deduction metatheorem.

Concepts like “line” and “proof” and “quoting a theorem” and “deduction rule” are not formally defined
in preparation for Theorem 4.8.11 because that would require a meta-metalogic for the definitions, and this 
process would have to stop somewhere. Metamathematics could be formally defined in terms of sets and lists 
and integers, and this is in fact informally assumed in practice. Attempting to formalise this kind of circular 
application of higher concepts to define lower concepts would put an embarrassing focus on the weaknesses 
of the entire conceptual framework. 


4.8.11 THEOREM [MM]: Deduction metatheorem
In the propositional calculus in Definition 4.4.3, let Γ ∈ List(W) be a wff sequence. Let α ∈ W and β ∈ W
be wffs for which the assertion Γ, α ⊢ β is provable. Then the assertion Γ ⊢ α ⇒ β is provable.


PROOF: Let Γ ∈ List(W) be a wff sequence. Let α ∈ W and β ∈ W be wffs. Let Δ = (δ_1, ..., δ_m) ∈ W^m be
a proof of the assertion Γ, α ⊢ β with δ_m = β. First assume that no other theorems are used in the
proof Δ.


Define the proposition P(k) for integers k with 1 ≤ k ≤ m by P(k) = “the assertion Γ ⊢ α ⇒ δ_i is provable
for all positive i with i ≤ k”.

To prove P(1) it must be shown that Γ ⊢ α ⇒ δ_1 is provable by some proof Δ′ ∈ List(W). Every line of
the proof Δ must be either (a) an axiom (possibly with some substitution for the wff names), (b) an element
of the wff sequence Γ, (c) the wff α, or (d) the output of a modus ponens rule application. Line δ_1 of the
proof Δ cannot be a modus ponens output. So there are only three possibilities. Suppose δ_1 is an axiom.
Then the following argument is valid.


Γ ⊢ α ⇒ δ_1

(1) δ_1                      (from some axiom)
(2) δ_1 ⇒ (α ⇒ δ_1)          PC 1
(3) α ⇒ δ_1                  MP (2,1)

If δ_1 is an element of the wff sequence Γ, the situation is almost identical.

Γ ⊢ α ⇒ δ_1

(1) δ_1                      Hyp (from Γ)
(2) δ_1 ⇒ (α ⇒ δ_1)          PC 1
(3) α ⇒ δ_1                  MP (2,1)


If δ_1 equals α, the following proof works.


Γ ⊢ α ⇒ δ_1

(1) α ⇒ α                    Theorem 4.5.7 (xi)


When a theorem is quoted as above, it means that a full proof could be written out “inline” according to the
proof of the quoted theorem. Now the proposition P(1) is proved. So it remains to show that P(k − 1) ⇒ P(k)
for all k > 1.


Now suppose that P(k − 1) is true for an integer k with 1 < k ≤ m. That is, assume that the assertion
Γ ⊢ α ⇒ δ_i is provable for all integers i with 1 ≤ i < k. Line δ_k of the original proof Δ must have been
justified by one of the four reasons (a) to (d) given above. Cases (a) to (c) lead to a proof of Γ ⊢ α ⇒ δ_k
exactly as for establishing P(1) above.

In case (d), line δ_k of proof Δ is arrived at by modus ponens from two lines δ_i and δ_j with 1 ≤ i < k
and 1 ≤ j < k with i ≠ j, where δ_j has the form δ_i ⇒ δ_k. By the inductive hypothesis P(k − 1), there
are valid proofs Δ′ ∈ W^{m′} for Γ ⊢ α ⇒ δ_i and Δ″ ∈ W^{m″} for Γ ⊢ α ⇒ δ_j. (For the principle of
mathematical induction, see Theorem 12.2.12.) Then the concatenated argument Δ′, Δ″ ∈ W^{m′+m″} has
α ⇒ δ_i on line (m′) and α ⇒ (δ_i ⇒ δ_k) on line (m′ + m″). A proof of Γ ⊢ α ⇒ δ_k may then be constructed
as an extension of the argument Δ′, Δ″ as follows.


Γ ⊢ α ⇒ δ_k

(m′)             α ⇒ δ_i                                           (above lines of Δ′)
(m′ + m″)        α ⇒ (δ_i ⇒ δ_k)                                   (above lines of Δ″)
(m′ + m″ + 1)    (α ⇒ (δ_i ⇒ δ_k)) ⇒ ((α ⇒ δ_i) ⇒ (α ⇒ δ_k))       PC 2
(m′ + m″ + 2)    (α ⇒ δ_i) ⇒ (α ⇒ δ_k)                             MP (m′ + m″ + 1, m′ + m″)
(m′ + m″ + 3)    α ⇒ δ_k                                           MP (m′ + m″ + 2, m′)


(Alternatively one could first prove the theorem α ⇒ β, α ⇒ (β ⇒ γ) ⊢ α ⇒ γ and apply this to lines (m′)
and (m′ + m″).) This establishes P(k). Therefore by mathematical induction, it follows that P(m) is true,
which means that Γ ⊢ α ⇒ β is provable.
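

The inductive construction in this proof is effectively an algorithm, so it may be sketched as a short
program. The following Python fragment is only an illustration under stated assumptions: the tuple encoding
of wffs, the justification tags and the names imp and deduce are all hypothetical, line numbers are 0-based,
and the five-line inline proof of α ⇒ α from PC 1 and PC 2 is a standard one which is assumed here in place
of a quotation of Theorem 4.5.7 (xi).

```python
def imp(a, b):
    """Build the wff (a => b) as a nested tuple."""
    return ("->", a, b)


def deduce(alpha, proof):
    """Turn a proof of  Gamma, alpha |- beta  into a proof of  Gamma |- alpha => beta.

    proof is a list of (wff, justification) pairs with 0-based line numbers.
    A justification is "ax" (axiom instance), "hyp" (element of Gamma),
    "alpha" (the extra hypothesis), or ("mp", i, j), where line i is the
    implication (line j => this line).
    """
    out = []   # lines of the new proof
    pos = {}   # old line number k -> new line number of (alpha => delta_k)

    def emit(wff, just):
        out.append((wff, just))
        return len(out) - 1

    for k, (delta, just) in enumerate(proof):
        goal = imp(alpha, delta)
        if just in ("ax", "hyp"):
            # Cases (a) and (b): quote delta, then weaken it with PC 1.
            d = emit(delta, just)
            w = emit(imp(delta, goal), "ax")
            pos[k] = emit(goal, ("mp", w, d))
        elif just == "alpha":
            # Case (c): inline a five-line proof of alpha => alpha.
            aa = imp(alpha, alpha)
            s1 = emit(imp(alpha, imp(aa, alpha)), "ax")                # PC 1
            s2 = emit(imp(out[s1][0], imp(imp(alpha, aa), aa)), "ax")  # PC 2
            s3 = emit(imp(imp(alpha, aa), aa), ("mp", s2, s1))
            s4 = emit(imp(alpha, aa), "ax")                            # PC 1
            pos[k] = emit(aa, ("mp", s3, s4))
        else:
            # Case (d): delta came from lines i (delta_j => delta) and j.
            _, i, j = just
            delta_j = proof[j][0]
            pc2 = emit(imp(imp(alpha, imp(delta_j, delta)),
                           imp(imp(alpha, delta_j), goal)), "ax")      # PC 2
            step = emit(imp(imp(alpha, delta_j), goal), ("mp", pc2, pos[i]))
            pos[k] = emit(goal, ("mp", step, pos[j]))
    return out


# Example: Gamma = (A => B, B => C) and alpha = A, with a proof of C.
old_proof = [
    (imp("A", "B"), "hyp"),
    ("A", "alpha"),
    ("B", ("mp", 0, 1)),
    (imp("B", "C"), "hyp"),
    ("C", ("mp", 3, 2)),
]
new_proof = deduce("A", old_proof)
print(new_proof[-1][0])
```

In this example the last line of the new proof is the wff A ⇒ C, and no line of the new proof is justified
by the extra hypothesis. Each old line costs three or five new lines, which is why the constructed proof is
longer than the original.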


4.8.12 REMARK: Equivalence of the implication operator to the assertion symbol.
It follows from the deduction metatheorem that the following assertions are interchangeable.


⊢ (A ∧ (A ⇒ B)) ⇒ B
A ∧ (A ⇒ B) ⊢ B
A, (A ⇒ B) ⊢ B
A ⊢ (A ⇒ B) ⇒ B.


More generally, logical expressions may be freely moved between the left and the right of the assertion symbol
in a similar fashion.


4.8.13 REMARK: Substitution in a theorem proof yields a proof of the substituted theorem. 

If all lines of a theorem are uniformly substituted with arbitrary logical expressions, the resulting substituted 
deduction lines form a valid proof of the similarly substituted theorem. (See for example Kleene [365], 
pages 109-110.) 


4.8.14 REMARK: Quoting a theorem is a kind of metamathematical deduction rule. 

Since it is intuitively obvious that the quotation of a logic theorem is equivalent to interpolating the theorem 
inline, with substitutions explicitly indicated, this is generally not presented as an explicit mathematical 
deduction rule. If a logical system is coded as computer software, the quotation of theorems must be defined 
very precisely. Such a programming task is outside the scope of this book. 


Also outside the scope of this book is the management of the theorem-hoards of multiple people over time, 
including the hoards of theorems which appear in the literature. The basic specification for mathematical 
logic corresponds, in principle, to a single scroll of paper with infinitely many lines. (See also Remarks 
4.3.4, 4.3.7 and 4.3.10 regarding “scroll management”.) But in practice, there are many “scrolls” which 
dynamically change over time. Even though people may quote theorems from other people, there is no
standardised procedure to ensure consistency of meaning and axioms between the very many “scrolls”.
Real-world mathematics would require “theorem import” rules or metatheorems to manage this situation.


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




4.8.15 REMARK: The medieval modes of reasoning. 
The phrase modus ponens is an abbreviation for the medieval reasoning principle called modus ponendo 
ponens. This was one of the following four principles of reasoning. (See Lemmon [367], page 61.) 
(i) Modus ponendo ponens. Roughly speaking: A, A ⇒ B ⊢ B.
(ii) Modus tollendo tollens. Roughly speaking: ¬B, A ⇒ B ⊢ ¬A.
(iii) Modus ponendo tollens. Roughly speaking: A, ¬(A ∧ B) ⊢ ¬B.
(iv) Modus tollendo ponens. Roughly speaking: ¬A, A ∨ B ⊢ B.


The Latin words in these reasoning-mode names come from the verbs “ponere” and “tollere”. The verb
“ponere” (which means “to put”) has the following particular meanings in the context of logical argument.
(See White [485], page 474.)

(1) [In speaking or writing:] To lay down as true; to state, assert, maintain, allege. 

(2) To put hypothetically, to assume, suppose. 
Meaning (1) is intended in the word “ponens”. Meaning (2) is intended in the word “ponendo”. Thus the
literal meaning of “modus ponendo ponens” is “assertion-by-assumption mode”. In other words, when A is
assumed, B may be asserted.
The Latin verb “tollere” (which means “to lift up”) has the following figurative meanings. (White [485],
page 612.)

To do away with, remove; to abolish, annul, abrogate, cancel. 
Mode (ii) (“modus tollendo tollens”) may be translated “negative-assertion-by-negative-assumption mode”
or “negation-by-negation mode”. In other words, when B is assumed to be false, A may be asserted to be
false. Effectively “tollendo” means “by negative assumption” while “tollens” means “negative assertion”.
Mode (iii) (“modus ponendo tollens”) may be translated “negative-assertion-by-positive-assumption mode”
or “negation-by-assumption mode”. When A is assumed to be true, B may be asserted to be false.
Mode (iv) (“modus tollendo ponens”) may be translated “positive-assertion-by-negative-assumption mode”
or “assertion-by-negation mode”. In this case, when A is assumed to be false, B may be asserted to be true.
Since modes (iii) and (iv) are rarely used as inference rules, the more popular modes (i) and (ii) are generally
abbreviated to simply “modus ponens” and “modus tollens” respectively.
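

The four medieval modes can be checked against truth tables. The following Python sketch tests semantic
validity only (every truth assignment which satisfies the premises also satisfies the conclusion), which is
not the same thing as derivability in a propositional calculus. The names implies, valid and MODES are
hypothetical, and the premises of each mode are taken in their “roughly speaking” forms.

```python
from itertools import product


def implies(p, q):
    """Truth function of the implication operator."""
    return (not p) or q


def valid(premises, conclusion):
    """True if every truth assignment to (A, B) which satisfies all of the
    premises also satisfies the conclusion."""
    return all(conclusion(a, b)
               for a, b in product([False, True], repeat=2)
               if all(p(a, b) for p in premises))


MODES = {
    "modus ponendo ponens":
        ([lambda a, b: a, lambda a, b: implies(a, b)], lambda a, b: b),
    "modus tollendo tollens":
        ([lambda a, b: not b, lambda a, b: implies(a, b)], lambda a, b: not a),
    "modus ponendo tollens":
        ([lambda a, b: a, lambda a, b: not (a and b)], lambda a, b: not b),
    "modus tollendo ponens":
        ([lambda a, b: not a, lambda a, b: a or b], lambda a, b: b),
}

for name, (premises, conclusion) in MODES.items():
    print(name, valid(premises, conclusion))

# For contrast, "affirming the consequent" (B, A => B |- A) is not valid.
fallacy = valid([lambda a, b: b, lambda a, b: implies(a, b)], lambda a, b: a)
```

All four modes pass the test, whereas the contrasting fallacy fails for the assignment where A is false and
B is true.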


4.9. Some propositional calculus variants 


4.9.1 REMARK: The original selection of a propositional calculus for this book. 

Definition 4.9.2 is the propositional calculus system which was originally selected as the basis for propositional
logic theorems in this book. It differs from Definition 4.4.3 only in axiom (PC’3). (This system is also used
by E. Mendelson [370], pages 30-31.) Like Definition 4.4.3, Definition 4.9.2 is a one-rule, two-symbol,
three-axiom system with ⇒ and ¬ as logical operators and modus ponens as the sole inference rule.


4.9.2 DEFINITION [MM]: The following is the axiomatic system PC’ for propositional calculus.
(i) The proposition names are lower-case and upper-case letters of the Roman alphabet, with or without
integer subscripts.
(ii) The basic logical operators are ⇒ (“implies”) and ¬ (“not”).
(iii) The punctuation symbols are “(” and “)”.
(iv) Logical expressions (“wffs”) are specified recursively. Any proposition name is a wff. For any wff α, the
substituted expression (¬α) is a wff. For any wffs α and β, the substituted expression (α ⇒ β) is a wff.
Any expression which cannot be constructed by recursive application of these rules is not a wff. (For
clarity, parentheses may be omitted in accordance with customary precedence rules.)
(v) The logical expression names (“wff names”) are lower-case letters of the Greek alphabet, with or without
integer subscripts.
(vi) The logical expression formulas (“wff-wffs”) are logical expressions whose letters are logical expression
names.
(vii) The axiom templates are the following logical expression formulas:
(PC’1) α ⇒ (β ⇒ α).
(PC’2) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
(PC’3) (¬β ⇒ ¬α) ⇒ ((¬β ⇒ α) ⇒ β).
(viii) The inference rules are substitution and modus ponens. (See Remark 4.3.17.)
(MP) α ⇒ β, α ⊢ β.
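

As a sanity check on the axiom templates, one may confirm by truth tables that each of them is a tautology
and that modus ponens cannot lead from true premises to a false conclusion. The Python sketch below does
this by brute force. The names imp, AXIOMS, is_tautology and mp_is_sound are hypothetical, and the form
assumed for axiom PC 3 of Definition 4.4.3, namely (¬β ⇒ ¬α) ⇒ (α ⇒ β), is inferred from the statement
of Theorem 4.9.4 (iii) rather than quoted from that definition.

```python
from itertools import product


def imp(p, q):
    """Truth function of the implication operator."""
    return (not p) or q


# Each template is encoded as a truth function of (alpha, beta, gamma).
AXIOMS = {
    "PC'1": lambda a, b, c: imp(a, imp(b, a)),
    "PC'2": lambda a, b, c: imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c))),
    "PC'3": lambda a, b, c: imp(imp(not b, not a), imp(imp(not b, a), b)),
    "PC 3": lambda a, b, c: imp(imp(not b, not a), imp(a, b)),  # assumed form
}


def is_tautology(f):
    """True if f is true under all eight truth assignments."""
    return all(f(*values) for values in product([False, True], repeat=3))


def mp_is_sound():
    """True if no assignment makes (a => b) and a true while b is false."""
    return all(b for a, b in product([False, True], repeat=2) if imp(a, b) and a)


for name, f in AXIOMS.items():
    print(name, is_tautology(f))
print("MP", mp_is_sound())
```

Such a check says nothing about whether the axioms are intuitively obvious, nor about completeness. It
shows only that nothing false under the truth-table interpretation can be derived from them.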


4.9.3 REMARK: Comparison of the Lukasiewicz and Mendelson axioms 
Axiom PC3 in Definition 4.4.3 is slightly briefer than axiom PC’3 in Definition 4.9.2, although the other 
two axioms are the same. 


The equivalence of the sets of axioms in Definitions 4.4.3 and 4.9.2 is established by Theorem 4.5.7 (xxxiii) 
and Theorem 4.9.4 (iii). Since the axioms of each system are provable in the other system, all theorems are 
provable in each system if and only if they are provable in the other. Therefore one may freely choose which 
set of axioms to adopt. 


The axioms in Definition 4.4.3 appear to be more popular than the axioms in Definition 4.9.2, but it seems
much more difficult to prove the required initial low-level theorems with Definition 4.4.3. (The proof of
Theorem 4.5.7 (xxxiii) uses 18 other parts of the same theorem, whereas the proof of Theorem 4.9.4 (iii) is
accomplished using only the two other parts of Theorem 4.9.4.) In order to not seem to be avoiding hard
work, the author has chosen the more popular axioms in Definition 4.4.3 in this book!


The axioms in Definition 4.4.3 are slightly more “minimalist”, since axiom PC 3 has one fewer wff-letter than
axiom PC’3. It is perhaps noteworthy that axiom PC’3 resembles the RAA rule, whereas axiom PC 3 has
some resemblance to the modus tollens rule. (See Remark 4.8.15 for the modus tollens rule.) Axiom PC’3
states that if the assumption ¬β implies both ¬α and α, then the assumption ¬β must be false; in other
words β must be true. Although axioms PC 3 and PC’3 differ by only one wff letter, the difference in power
of these axioms is quite significant.


4.9.4 THEOREM [PC’]: Some assertions for a Lukasiewicz axiom-set variant.
The following assertions follow from Definition 4.9.2.


(i) α ⇒ β, β ⇒ γ ⊢ α ⇒ γ.
(ii) α ⇒ (β ⇒ γ) ⊢ β ⇒ (α ⇒ γ).
(iii) ⊢ (¬β ⇒ ¬α) ⇒ (α ⇒ β).


PROOF: To prove part (i): α ⇒ β, β ⇒ γ ⊢ α ⇒ γ

(1) α ⇒ β                                        Hyp
(2) β ⇒ γ                                        Hyp
(3) (β ⇒ γ) ⇒ (α ⇒ (β ⇒ γ))                      PC’1: α[β ⇒ γ], β[α]
(4) α ⇒ (β ⇒ γ)                                  MP (3,2)
(5) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ))          PC’2: α[α], β[β], γ[γ]
(6) (α ⇒ β) ⇒ (α ⇒ γ)                            MP (5,4)
(7) α ⇒ γ                                        MP (6,1)

To prove part (ii): α ⇒ (β ⇒ γ) ⊢ β ⇒ (α ⇒ γ)

(1) α ⇒ (β ⇒ γ)                                  Hyp
(2) β ⇒ (α ⇒ β)                                  PC’1: α[β], β[α]
(3) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ))          PC’2: α[α], β[β], γ[γ]
(4) (α ⇒ β) ⇒ (α ⇒ γ)                            MP (3,1)
(5) β ⇒ (α ⇒ γ)                                  part (i) (2,4): α[β], β[α ⇒ β], γ[α ⇒ γ]

To prove part (iii): ⊢ (¬β ⇒ ¬α) ⇒ (α ⇒ β)

(1) α ⇒ (¬β ⇒ α)                                 PC’1: α[α], β[¬β]
(2) (¬β ⇒ ¬α) ⇒ ((¬β ⇒ α) ⇒ β)                   PC’3: α[α], β[β]
(3) (¬β ⇒ α) ⇒ ((¬β ⇒ ¬α) ⇒ β)                   part (ii) (2): α[¬β ⇒ ¬α], β[¬β ⇒ α], γ[β]
(4) α ⇒ ((¬β ⇒ ¬α) ⇒ β)                          part (i) (1,3): α[α], β[¬β ⇒ α], γ[(¬β ⇒ ¬α) ⇒ β]
(5) (¬β ⇒ ¬α) ⇒ (α ⇒ β)                          part (ii) (4): α[α], β[¬β ⇒ ¬α], γ[β]

This completes the proof of Theorem 4.9.4. 
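

Proofs of this kind are mechanical enough to be checked by a program. The following Python sketch checks
the seven-line proof of part (i) against the three axiom templates of Definition 4.9.2 and the modus ponens
rule. It is a minimal illustration, not a general proof assistant: the tuple encoding of wffs, the
justification tags and all function names are hypothetical, the proposition names A, B and C stand in for
the wff names α, β and γ, line numbers are 0-based, and quoting earlier theorems is not supported.

```python
def imp(a, b):
    return ("->", a, b)


def neg(a):
    return ("~", a)


# Axiom templates; the lower-case strings are template variables (wff names).
PC1 = imp("a", imp("b", "a"))
PC2 = imp(imp("a", imp("b", "c")), imp(imp("a", "b"), imp("a", "c")))
PC3 = imp(imp(neg("b"), neg("a")), imp(imp(neg("b"), "a"), "b"))


def match(template, wff, env):
    """True if wff is a uniform substitution instance of template."""
    if isinstance(template, str):
        # A template variable must be bound to the same wff every time.
        return env.setdefault(template, wff) == wff
    return (isinstance(wff, tuple) and len(wff) == len(template)
            and wff[0] == template[0]
            and all(match(t, w, env) for t, w in zip(template[1:], wff[1:])))


def is_axiom(wff):
    return any(match(t, wff, {}) for t in (PC1, PC2, PC3))


def check(hyps, proof):
    """Check a list of (wff, justification) lines. A justification is "hyp",
    "ax", or ("mp", i, j), where line i is (line j => this line)."""
    for n, (wff, just) in enumerate(proof):
        if just == "hyp":
            ok = wff in hyps
        elif just == "ax":
            ok = is_axiom(wff)
        else:
            _, i, j = just
            ok = i < n and j < n and proof[i][0] == imp(proof[j][0], wff)
        if not ok:
            return False
    return True


A, B, C = "A", "B", "C"
hyps = [imp(A, B), imp(B, C)]
proof_i = [
    (imp(A, B), "hyp"),
    (imp(B, C), "hyp"),
    (imp(imp(B, C), imp(A, imp(B, C))), "ax"),                    # PC'1
    (imp(A, imp(B, C)), ("mp", 2, 1)),
    (imp(imp(A, imp(B, C)), imp(imp(A, B), imp(A, C))), "ax"),    # PC'2
    (imp(imp(A, B), imp(A, C)), ("mp", 4, 3)),
    (imp(A, C), ("mp", 5, 0)),
]
print(check(hyps, proof_i))
```

The checker accepts this proof, and it rejects the same list of lines if, for example, the final line is
replaced by C ⇒ A.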


4.9.5 REMARK: Single axiom schema for the implication and negation operators. 
It is stated by E. Mendelson [370], page 42, that the single axiom schema 


((((α ⇒ β) ⇒ (¬γ ⇒ ¬δ)) ⇒ γ) ⇒ ε) ⇒ ((ε ⇒ α) ⇒ (δ ⇒ α))


was shown by Meredith [427] to be equivalent to the three-axiom Definition 4.9.2. (Consequently it is also 
equivalent to Definition 4.4.3.) The minimalism of a single-axiom formalisation is clearly past the point of 
diminishing returns. Even the three-axiom sets are difficult to consider to be intuitively obvious. 


It should be remembered that the main justification for an axiomatic system is that one merely needs to 
accept the obvious validity of the axioms and rules (as for example in the case of the axioms of Euclidean 
geometry), and then all else follows from the axioms and rules. If the axioms are opaque, the whole raison
d’être of an axiomatic system is null and void. An axiomatic system should lead from the obvious to the
not-obvious, not from the not-obvious to the obvious, which is what these formalisms do in fact achieve.
Chapter 5


PREDICATE LOGIC


5.0.1 REMARK: Classes of logic systems.

Propositional logic may be extended to predicate logic by adding some basic “management tools” for families 
of parametrised propositions. A predicate logic with equality extends the predicate logic concept by explicitly 
stating non-logical axioms for equality. A predicate logic system becomes a first-order language when purely 
logical axioms are extended by the addition of non-logical axioms. A first-order language with equality 
extends the first-order language concept by explicitly stating non-logical axioms for equality. Zermelo- 
Fraenkel set theory is a particular example of a first-order language with equality. This progression of 
concepts is illustrated in Figure 5.0.1. (See also Remark 3.16.2 for this progression of concepts.) 


propositional logic
         ↓
predicate logic       →   predicate logic with equality
         ↓                             ↓
first-order language  →   first-order language with equality
                                       ↓
                          Zermelo-Fraenkel set theory


Figure 5.0.1 Classes of logic systems


5.0.2 REMARK: The fundamental role of predicate logic. 

Mathematicians have a fairly clear intuition for the meanings of objects and logical expressions which arise 
in their discussions and publications. The task of predicate calculus is to facilitate and automate the logical 
argumentation of mathematicians. Just as important as facilitation and automation are the adjudication and 
arbitration services which mathematical logic provides. The controversies in the last half of the nineteenth 
century required some kind of resolution. It was hoped that the formalisation of mathematics could resolve 
the issues. Russell [389], pages 144-145, said the following about the formalisation of mathematics. 


Mathematics is a deductive science: starting from certain premisses, it arrives, by a strict process 
of deduction, at the various theorems which constitute it. It is true that, in the past, mathematical 
deductions were often greatly lacking in rigour; it is true also that perfect rigour is a scarcely 
attainable ideal. Nevertheless, in so far as rigour is lacking in a mathematical proof, the proof is 
defective; it is no defence to urge that common sense shows the result to be correct, for if we were to 
rely upon that, it would be better to dispense with argument altogether, rather than bring fallacy 
to the rescue of common sense. No appeal to common sense, or “intuition,” or anything except 
strict deductive logic, ought to be needed in mathematics after the premisses have been laid down. 


The difficulty here is that if everything depends so heavily on the axioms and rules of deduction, then the 
choice of axioms and rules requires extremely close scrutiny. Even a tiny error in the deepest substratum of 
mathematics could cause grief in the upper strata. If the deductions differ from intuition too much and too 
often, one would have to suspect that the fundamentals require an overhaul. 
5.0.3 REMARK: Predicate logic attempts to formalise theories which describe infinities.

Mathematical logic would be a shallow subject if infinities played no role. It is the infinities which make 
mathematical logic “interesting”. (One could perhaps make the even bolder claim that it is the infinities 
which prevent mathematics in general from being shallow!) Since infinities range from slightly metaphysical 
to extremely metaphysical, it is not possible to settle questions regarding infinite logic and infinite sets by 
reference to concrete examples. At best, one may hope to make the theory correspond accurately to the 
metaphysical universes which are imagined to exist beyond the finite world which we can perceive. Predicate 
logic is primarily an attempt to formalise logical argumentation for systems where infinities play a role. 




5.0.4 REMARK: Logic is the real “stuff” of mathematics. 

As mentioned in Remarks 7.1.4 and 7.1.5, mathematics does not reside inside ZF set theory. Most of the 
theories of mathematics can be modelled inside ZF set theory, but no one really believes that ZF sets are 
the real “stuff” of mathematics. For example, the assertions 2 ⊂ 3, 2 ∪ 4 = 4 and ⋂ 8 = 0 seem somewhat
absurd, although the standard ZF model for the ordinal numbers makes these assertions true. 
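

Assertions of this kind can be tested in a small model. The following Python sketch builds the finite von
Neumann ordinals as nested frozensets, using 0 = ∅ and n + 1 = n ∪ {n}, and evaluates set-theoretic
statements about them. The function names ordinal and big_intersection are hypothetical.

```python
from functools import reduce


def ordinal(n):
    """The finite von Neumann ordinal n = {0, 1, ..., n-1} as a frozenset."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])   # successor: s ∪ {s}
    return s


def big_intersection(s):
    """The intersection of all elements of the non-empty set s."""
    return reduce(frozenset.intersection, s)


two, three, four, eight = ordinal(2), ordinal(3), ordinal(4), ordinal(8)

print(two < three)                            # 2 is a proper subset of 3
print(two in three)                           # 2 is also an element of 3
print(two | four == four)                     # 2 ∪ 4 = 4
print(big_intersection(eight) == ordinal(0))  # ⋂ 8 = 0
```

All four tests succeed in this model, which illustrates that such statements are true of the representation
of the numbers as sets, not of the numbers as they are ordinarily understood.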


Mathematical theories are typically formulated nowadays as axiomatic systems such as first-order languages. 
(See Section 6.9 for first-order languages.) Every such system may be given an interpretation as a “model” 
in terms of some “underlying set theory”, but the underlying framework for the model could be simply a set 
of integers or a set of real numbers, or some other kind of model-framework which is vastly simpler than the 
principal ZF set theory model known as the von Neumann universe, for example. (See Section 12.6 for the 
von Neumann universe.) 


As mentioned in Remarks 3.16.2 and 5.1.3, there is not much difference between predicate calculus and a 
first-order language. One merely adds some non-logical axioms to a predicate calculus, and then restructures 
the universe of propositions correspondingly. (This rearrangement is summarised in Remark 6.9.1.) 


Much mathematics falls more naturally into the predicate logic and general axiomatic systems framework 
rather than into the ZF set theory framework. It seems, then, that Bertrand Russell’s logicist programme 
has prevailed. Mathematics is a branch of logic! Predicate calculus and first-order languages are closer to 
being the “stuff” of mathematics than ZF set theory, which is, after all, only one theory among many. 


5.1. Parametrised families of propositions 


5.1.1 REMARK: Predicate logic is parametrised propositional logic with “bulk handling facilities”. 
Predicate logic introduces parameters for the propositions of propositional logic. If the concrete proposition 
domain is finite, the use of parameters is a mere management convenience, permitting “bulk handling” of 
large classes as opposed to individual treatment, but for an infinite set of propositions, the parametrised 
management of propositions and their truth values is unavoidable. It is true that all of propositional logic is 
immediately and directly applicable to arbitrary infinite collections of propositions, but propositional logic 
lacks the “bulk handling” facilities. The additional facilities of predicate logic consist of just two quantifiers, 
namely the universal and existential quantifiers. (So it would be fairly justified to refer to predicate logic as 
“quantifier logic” or “propositional logic with quantifiers”.)

In practice, proposition parameters are typically thought of as objects of some kind. Examples of kinds 
of objects are numbers, sets, points, lines, and instants of time. Although the parameters for propositions 
may be in multiple object-universes, it is usually most convenient to define the universal and existential 
quantifiers to apply to a single combined object-universe. Then individual classes of objects are selected by 
applying preconditions to parameters in order to restrict the application of proposition quantifiers to the 
relevant object classes. 


5.1.2 REMARK: Predicate logic is “object oriented”. 

Whereas propositions in a propositional calculus may be notated as letters such as A or B, in a predicate 
calculus one typically notates propositions, for example, as A(x) or B(y) for some x and y in a universe U 
which contains all relevant objects. Parametrised propositions may also be organised in doubly indexed 
families notated, for example, as A(x, y) or B(x, y), or with any non-negative finite number of parameters.


Single-parameter propositions may be thought of as attributes of their parameters, and multi-parameter 
propositions may be thought of as relations between their parameters. In other words, if the proposition- 
parameters are thought of as objects, parametrised propositions are then thought of as attributes and 
relations of objects. This suggests some kind of dual relationship between objects and propositions, similar
to the relationship between rows and columns of matrices. One may focus on either the propositions or 
the objects as the primary entities in a logical system. In predicate logic, the focus shifts to a great extent 
from the propositions to the objects. This is related to the concept of “object-oriented programming” in 
computer science, where the focus is shifted from functions and procedures to objects. Then functions and 
procedures become secondary concepts which derive their significance from the objects to which they are 
attached. Similarly in predicate logic, attributes and relations become secondary concepts which derive their 
significance from the objects on which they act. 


5.1.3 REMARK: First-order languages. 

The principal difference between predicate calculus and a first-order language is that the latter has
non-logical axioms, a universe of objects, and some “constant predicates”. The logical axioms are those which
are valid for any collection of finitely-parametrised families of propositions. These logical axioms follow
almost automatically from the axioms of propositional calculus. In terms of the concrete proposition domain
concept in Section 3.2, a predicate calculus has a concrete proposition domain P which typically contains
an infinite number of atomic propositions. (P must include the ranges of all families of propositions in the
logical system, and must also contain all unparametrised propositions in the system.) The set of truth value
maps τ ∈ 2^P is then completely unrestricted. Every logical axiom in a predicate calculus is a compound
logical expression φ which is a tautology, which means that it signifies a knowledge set K_φ ⊆ 2^P which is
equal to the entire knowledge space 2^P of truth value maps. (See Section 3.4 for knowledge sets.) In other
words, every truth value map in 2^P satisfies such a tautology φ. The difference in the case of a first-order
language is that the space 2^P is restricted by non-logical axioms, which correspond to constraints on this
space. For example an equality relation E on an object universe U would satisfy τ(E(x, x)) = T for all x ∈ U,
for all τ ∈ 2^P. (Usually “E(x, y)” would be notated as “x = y” for all x, y ∈ U.) This non-logical axiom
for equality would restrict the set of truth value maps from 2^P to {τ ∈ 2^P; ∀x ∈ U, τ(E(x, x)) = T}. (Note
that Range(E) ⊆ P.)
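

The restriction of the knowledge space by a non-logical axiom can be seen in a toy example. The Python
sketch below assumes a two-element object universe and a single two-parameter relation E, so that the
concrete proposition domain has four atomic propositions and the unrestricted knowledge space has 2^4 = 16
truth value maps. The names U, P, all_truth_maps, knowledge_space and restricted are hypothetical, and the
finite universe is an assumption made only so that the space can be enumerated.

```python
from itertools import product

U = ["u0", "u1"]                            # a toy object universe
P = [("E", x, y) for x in U for y in U]     # atomic propositions E(x, y)


def all_truth_maps(props):
    """Enumerate every truth value map on the given propositions."""
    for values in product([False, True], repeat=len(props)):
        yield dict(zip(props, values))


knowledge_space = list(all_truth_maps(P))

# The non-logical axiom "E(x, x) is true for all x in U" is a constraint.
restricted = [tau for tau in knowledge_space
              if all(tau[("E", x, x)] for x in U)]


def tautology(tau):
    """E(u0, u1) or not E(u0, u1): a compound expression true in every map."""
    return tau[("E", "u0", "u1")] or not tau[("E", "u0", "u1")]


print(len(knowledge_space), len(restricted))
print(all(tautology(tau) for tau in knowledge_space))
```

The reflexivity constraint fixes two of the four atomic truth values, so the restricted space contains 4 of
the 16 maps, whereas the tautology is satisfied by all 16 and so constrains nothing.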


In addition to constraining the knowledge space, first-order languages may also introduce object-maps, each 
of which maps finite sequences of objects to single objects. Since object-maps are not required for ZF set 
theory, this feature of first-order languages is not presented in any detail here. (See Section 6.9 for some 
brief discussion of general first-order languages.) 


5.1.4 REMARK: Name maps for predicates, variables and propositions in predicate logic. 

Predicate calculus requires two kinds of names, namely (1) names which refer to predicates and (2) names
which refer to parameters for the predicates. Thus the notation “P(x)” refers to the proposition which
is “output” by the predicate referred to by “P” when the input is a variable which is referred to by the
symbol “x”. In pure predicate calculus, there are no “constant predicates” (such as “=” and “∈” in set
theory). Therefore the symbols “P” and “x” may refer to any predicate and variable respectively. However,
within the scope of these symbols, their meaning or “binding” does not change. Similarly, the notation “Q(x, y)”
refers to the proposition which is output by the predicate bound to “Q” from inputs which are bound to “x”
and “y”. And so forth for any number of predicate parameters.


The name-to-object maps (or “bindings”) for a predicate language are summarised in the following table.
The maps are fixed within their scope.


name map    name space    object domain    object type
μ_V         N_V           V                variables
μ_Q         N_Q           Q                predicates
μ_P         N_P           P                propositions


The choice of the words “space” and “domain” here is fairly arbitrary. (The most important thing is to 
avoid using the word “set”.) The spaces and maps in the above table are illustrated in Figure 5.1.1. 


In Figure 5.1.1, the proposition name “Q(y, z)” would be mapped by μ_P to μ_P(“Q(y, z)”), which is the
output from the predicate μ_Q(“Q”) when it is applied to the variables μ_V(“y”) and μ_V(“z”). Thus one could
write μ_P(“Q(y, z)”) = μ_Q(“Q”)(μ_V(“y”), μ_V(“z”)). Then if “Q” is “∈”, “y” is “∅” and “z” is “{∅}”, one
could write μ_P(“∅ ∈ {∅}”) = μ_Q(“∈”)(μ_V(“∅”), μ_V(“{∅}”)).
[Figure 5.1.1: the abstract name spaces N_V (variable names x, y, z, ...), N_Q (predicate names P, Q, ...)
and N_P (proposition names P(x), Q(y, z), ...) are mapped by μ_V, μ_Q and μ_P to the concrete spaces V
(variables), Q (predicates) and P (propositions), alongside the truth value names and truth values {F, T}.]

Figure 5.1.1 Variables, predicates, propositions and truth values


The space Q of predicates may be partitioned according to the number of parameters of each predicate.
Thus Q = Q_0 ∪ Q_1 ∪ Q_2 ..., where Q_k is the space of predicates which have k parameters. The predicates
f ∈ Q_k have the form f : V^k → {F, T}. In terms of the list notation in Section 14.12, the predicates f ∈ Q
have the form f : List(V) → {F, T}.
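

The composition of name maps described in this remark can be sketched with dictionaries. In the Python
fragment below, the names mu_V, mu_Q and mu_P are hypothetical renderings of the three name maps, concrete
sets are modelled as frozensets, and each concrete predicate is a function from lists of variables to truth
values. The sketch therefore identifies a concrete proposition with its truth value, which is a
simplification made only for brevity.

```python
EMPTY = frozenset()

# Variable name map: names of individuals -> concrete objects.
mu_V = {
    "∅": EMPTY,
    "{∅}": frozenset([EMPTY]),
}

# Predicate name map: names of predicates -> concrete two-parameter predicates.
mu_Q = {
    "∈": lambda x, y: x in y,
    "=": lambda x, y: x == y,
}


def mu_P(predicate_name, *variable_names):
    """Proposition name map: apply the concrete predicate bound to the
    predicate name to the concrete objects bound to the variable names."""
    predicate = mu_Q[predicate_name]
    return predicate(*(mu_V[name] for name in variable_names))


print(mu_P("∈", "∅", "{∅}"))   # the proposition named "∅ ∈ {∅}"
print(mu_P("∈", "{∅}", "∅"))   # the proposition named "{∅} ∈ ∅"
```

Rebinding a name in mu_V or mu_Q changes which concrete proposition a given proposition name refers to,
which is the sense in which the bindings must be fixed within their scope.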


5.1.5 REMARK: Name spaces for truth values. 

Strictly speaking, there should be a map μ_T : N_T → 𝒯, where 𝒯 is the concrete set of truth values, and
N_T = {F, T} is the abstract set of truth values, but this fairly trivial mapping is ignored here for simplicity.
The concrete truth values may be voltages, for example. And the truth map may be varied for the same
abstract and concrete truth value spaces. The abstract truth value name space N_T probably should be a
fixed space. On the other hand, other kinds of logic could use more than two truth values for example.


5.1.6 REMARK: Five layers of linguistic structure. Logical expressions and logical expression names. 

The five layers of metamathematical language for propositional calculus in Figure 4.1.1 in Remark 4.1.7 are
applicable also to predicate calculus. The main difference is that the logical operators are extended by the
addition of universal and existential quantifiers in the predicate calculus. When writing axioms, for example,
one uses logical expression names to refer to logical expressions whose individual letters refer to entire
proposition names. For example, in the logical expression name “∀x, (α(x) ⇒ β(x))”, the letters α and β
could refer to logical expressions such as “P(x) ∧ A” and “Q(x) ∧ A” respectively. Then “∀x, (α(x) ⇒ β(x))”
would mean “∀x, ((P(x) ∧ A) ⇒ (Q(x) ∧ A))” in the language of the theory. Thus “∀x, (α(x) ⇒ β(x))” is
a kind of template for logical expressions which have a particular form. The distinction becomes important
when writing axioms and theorems for a language. Axioms and theorems are generally written in template
form, and proofs of theorems are really templates of proofs which are instantiated by replacing logical
expression names with specific logical expressions in the language. (This idea is illustrated in Figure 5.1.2.)
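

The instantiation of a template can be sketched as a substitution on expression trees. In the following
Python fragment, logical expressions are nested tuples, the strings “α(x)” and “β(x)” play the role of
logical expression names, and the functions substitute and show are hypothetical. Treating “α(x)” as a
single name which stands for an expression in x is a simplifying assumption of the sketch.

```python
def substitute(template, bindings):
    """Replace each logical expression name in the template by the logical
    expression which is bound to it. Other symbols are left unchanged."""
    if isinstance(template, str):
        return bindings.get(template, template)
    return tuple(substitute(part, bindings) for part in template)


def show(e):
    """Render an expression tree in the usual infix notation."""
    if isinstance(e, str):
        return e
    op = e[0]
    if op == "forall":
        return "∀" + e[1] + ", " + show(e[2])
    if op in ("⇒", "∧"):
        return "(" + show(e[1]) + " " + op + " " + show(e[2]) + ")"
    return op + "(" + ", ".join(e[1:]) + ")"   # predicate applied to variables


template = ("forall", "x", ("⇒", "α(x)", "β(x)"))
bindings = {
    "α(x)": ("∧", ("P", "x"), "A"),
    "β(x)": ("∧", ("Q", "x"), "A"),
}
instance = substitute(template, bindings)
print(show(template))
print(show(instance))
```

The same template yields a different expression of the theory for each choice of bindings, which is the
sense in which an axiom written in template form stands for a whole family of axioms.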


5.1.7 REMARK: Finite number of predicates and relations in first-order languages. 

Although the variables in predicate calculus may be arbitrarily infinite, there are typically only a small finite 
number of “constant predicates” in a first-order language. For example, in ZF set theory, there are only two 
predicates, namely the set equality and set membership relations. (In NBG set theory, there is an additional 
single-parameter predicate which indicates whether a class is a set.) This large difference in size between 
the spaces of variables and predicates is completely understandable when one views predicates as families of 
propositions. Each predicate corresponds to a different concept, typically a fundamental relation or property 
of objects. 


5.1.8 REMARK: Predicate calculus requires a domain of interpretation. 

The wffs in a predicate calculus are meaningful only if a “domain of interpretation” is specified for
its statement variables, relations, functions and constants. According to E. Mendelson [370], page 49, an
“interpretation” requires that the variables be elements of a set D and each logical relation, function and
constant must be associated with a relation, function and constant in the set-theory sense within the set D.
In other words, the interpretation of a predicate calculus requires some of the basic set theory which is
presented in Chapters 7, 8, 9 and 10. (Or more accurately, some naive form of set theory is required.)
[Figure 5.1.2: layers labelled “compounds”, “names” and “concrete objects”, with concrete predicates q1, q2,
concrete variables v1, v2, v3, concrete propositions such as q1(v1, v2), and a “semantics” label.]


Figure 5.1.2 Concrete objects, names, logical expressions and logical expression names


5.1.9 REMARK: The always-false and always-true predicates. 

It is sometimes convenient to have a notation for a predicate which is always true (or always false). Such 
a predicate does not need parameters. So Notation 5.1.10 introduces the zero-parameter predicates which 
are always true or always false. This is the same notation as for zero-operand logical operators. (See 
Notation 3.6.15.) 


These predicates are added to the abstract predicate names for any particular predicate calculus. There are 
not necessarily any concrete predicates for which these abstract predicates are labels. 


As an extension of notation, logical predicates which are always true or always false and have one or more
parameters may be denoted as for functions. Thus, for example, ⊤(x, y, z) would be a true proposition for
any variables x, y and z, and ⊥(a, b, c, d) would be a false proposition for any variables a, b, c and d. Luckily
such notation is rarely needed.


5.1.10 NOTATION [MM]:
⊤ denotes the (always) true zero-parameter logical predicate.
⊥ denotes the (always) false zero-parameter logical predicate.


5.1.11 REMARK:  Constants in predicate calculus. 

In addition to the variable predicates, variable functions and individual variables, there is also a requirement
for constants in each of these categories. For example, in ZF set theory, “∈” is a constant predicate and “∅”
is an individual constant. Figure 5.1.3 illustrates the map μ_V of variable names for sets and the map μ_Q for
the constant name “∈” for the concrete set membership predicate.


5.1.12 REMARK: Constants and equality on concrete object spaces.

The definition of “constant” in Remark 5.1.11, for names of predicates, functions and individuals, only 
has meaning if there is a definition of equality (or “identity” ) on the corresponding concrete object space. 
Otherwise one cannot know whether two names are pointing to the same object. But definitions of constants 
require even more than that. 


The definition of the word “constant” is not at all obvious. In mathematics presentations, one often hears 
statements like: “Let C be a constant." But then it generally transpires that C is quite arbitrary. So it 
isn't constant at all, although it will generally be constant with respect to something. Analysis texts often 
have propositions of the general form: $\forall x, \exists C, \forall y, P(x, y, C)$, where P is some three-parameter predicate.
(See for example the epsilon-delta criterion for continuity on metric spaces in Theorem 38.1.3.) In this case,
C is constant with respect to y, but not with respect to x. Thus one may typically write C(x) to informally
suggest that C may depend on x but not on y.


Figure 5.1.3 Mapping the constant name “$\in$” to a concrete predicate


So the question arises in the case of names of predicates, functions and variables as to what a constant name 
should be constant with respect to. The simple answer is that constant names are constant with respect to 
name maps, but this requires some explanation. 


Consider first the name maps $\mu_V : \mathcal{N}_V \to \mathcal{V}$ from the individual name space $\mathcal{N}_V$ to the individual object
space $\mathcal{V}$. (The “individual” spaces are usually called “variable” spaces, but this is confusing when one is
talking about “constant variables”. A more accurate term would be “predicate parameter spaces”.) In
principle, the language defined in terms of the name space should be invariant under arbitrary choices of the
name map $\mu_V$. That is, all of the axioms and rules should be valid for any given name map. Therefore the
axioms and rules should be invariant under any permutation of the elements of the object space $\mathcal{V}$.

If all elements of a concrete space are equivalent, there is no need to be concerned with the choice of names. 
But this is not usually the case. More typically, each element of a concrete space has a unique character 
which is of interest in the study of that space. In this case, one often wishes to have fixed names for some 
concrete objects. 

Consider the example of the empty set in ZF set theory. Let $a \in \mathcal{V}$ be the empty set in a concrete ZF set
theory (often called a “model” of the theory). Then the name $A \in \mathcal{N}_V$ in the parameter name space $\mathcal{N}_V$
points to a if and only if $\mu_V(A) = a$, where “$=$” $: \mathcal{V} \times \mathcal{V} \to \mathcal{P}$ denotes the equality relation on the concrete
parameter space $\mathcal{V}$. (See Figure 5.1.4.)


[Diagram: two panels, each showing abstract names mapped by a name map to concrete objects a, b and c.]


Figure 5.1.4 Definition of a constant name 


A variable name $C \in \mathcal{N}_V$ can be said to be “constant” if $\forall \mu_V^1, \mu_V^2 \in$ ...
denotes the set of functions $f : \mathcal{N}_V \to \mathcal{V}$.


Since the concrete parameter space has unknown elements, one cannot know what a is. (Concrete spaces are
an “implementation” matter, which cannot be standardised at the linguistic level in logic.) Therefore the
condition $\mu_V(A) = a$ is not meaningful. However, if C is a name which we know (somehow) has the property
$\mu_V(C) = a$, we can define A by the condition $\mu_V(A) = \mu_V(C)$. This is now importable into the
abstract name space as the condition: $A = C$, where “$=$” $: \mathcal{N}_V \times \mathcal{N}_V \to \mathcal{N}_P$ is the import of the concrete
relation “$=$” to the parameter name space.


Now the condition $A = C$ is also not meaningful because C is not known. The problem here is that the
equality relation is fully democratic. To see how to escape from this problem, denote an ad-hoc abstract
single-parameter predicate $P_C$ by $P_C(X) =$ “$X = C$”. (This is equivalent to $P_C(X) =$ “$\mu_V(X) = a$”.) Then
clearly $P_C(X)$ is true if and only if $X = C$. That is, $\forall X, (P_C(X) \Leftrightarrow (\mu_V(X) = a))$. So the predicate $P_C$
characterises the required constant C. That is, we can determine if any name X points to the fixed object
a by testing it with the predicate $P_C$.

A predicate $P \in \mathcal{N}_Q$ which satisfies a uniqueness rule, namely $\forall x \in \mathcal{N}_V, \forall y \in \mathcal{N}_V, ((P(x) \wedge P(y)) \Rightarrow (x = y))$,
characterises a unique concrete object. The predicate $P_C$ defined by $P_C(X) =$ “$\mu_V(X) = a$” does satisfy this
uniqueness requirement, but this alone cannot define $P_C$ because it does not define the unknown object $a \in \mathcal{V}$.
The predicate $P_C(X) =$ “$\mu_V(X) = a$” is specific to a, but there is no way to explicitly specify a.

$P_C$ can be converted into a predicate which is meaningful by defining $P(X) =$ “$\forall y, y \notin X$”. The predicate
P does not refer to any a-priori knowledge of the identity of the empty set at all. So this does define a
constant name for the empty set. A name X is the name of the constant object (called “the empty set”) if
and only if P(X) is true.


Next it must be noted that the predicate expression $P(X) =$ “$\forall y, y \notin X$” depends on the membership
relation “$\in$”, which is in turn a fixed predicate which is imported from the concrete predicate space. This
has shifted the constancy problem from predicate parameters to predicates. It seems that the need to have
an a-priori map from some constant predicate names to particular concrete predicates is unavoidable. On
the other hand, the ZF axioms characterise the membership relation “$\in$” very precisely. If there are two or
more such membership relations on the concrete space, that is not a problem. All of the consequences of
ZF set theory will be valid for all maps $\mu_Q$ from the predicate name space $\mathcal{N}_Q$ to the concrete predicate
space $\mathcal{Q}$.

The “constant” empty set name is characterised by the predicate $P(X) =$ “$\forall y, y \notin X$”, which depends on the
variable name “$\in$” for a membership relation which satisfies the ZF axioms. However, this is not a problem
which anyone worries about. The choice of predicate name map $\mu_Q : \mathcal{N}_Q \to \mathcal{Q}$ is very strongly constrained
by the ZF set axioms. Then in terms of the choice of the membership relation “$\in$”, the name $\emptyset$ is uniquely
constrained by the predicate $P(\emptyset)$, which is true only if the name “$\emptyset$” points to the unique empty set for the
particular choice of predicate name map for “$\in$”. Then we have $a = \mu_V(\emptyset)$.


In conclusion, the definition of a constant name (for a predicate parameter) requires a definition of equality 
on the concrete parameter space, which implies that a first-order language without equality cannot have 
constant names, and at least one additional “constant” predicate must be specified in order to distinguish 
one parameter from another, so that one will know which concrete object is being pointed to by the constant. 
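

The characterisation of a constant name by a predicate can be illustrated with a small computational sketch. It assumes a toy concrete parameter space of three hereditarily finite sets, encoded here as Python frozensets, and it considers only bijective name maps, so that quantifying over names is the same as quantifying over objects. The identifiers OBJECTS, NAMES, is_empty_name and characterised_objects are hypothetical and are introduced only for this illustration. The sketch suggests that the predicate “for all y, y is not an element of X” picks out the same concrete object under every name map, even though the name which satisfies it changes from map to map.

```python
from itertools import permutations

# Hypothetical concrete parameter space: three hereditarily finite sets.
# These stand in for the unknown objects of a concrete ZF model.
EMPTY = frozenset()
ONE = frozenset({EMPTY})
TWO = frozenset({EMPTY, ONE})
OBJECTS = [EMPTY, ONE, TWO]

# Abstract variable names. A name by itself says nothing about its object.
NAMES = ["A", "B", "C"]


def is_empty_name(name, name_map):
    """Evaluate P(X) = "for all y, y is not in X" through a given name map."""
    return all(name_map[y] not in name_map[name] for y in NAMES)


def characterised_objects():
    """Collect the object picked out by P under every bijective name map."""
    picked = set()
    for perm in permutations(OBJECTS):
        name_map = dict(zip(NAMES, perm))
        satisfying = [n for n in NAMES if is_empty_name(n, name_map)]
        # The uniqueness rule holds in this toy model: exactly one name fits.
        assert len(satisfying) == 1
        picked.add(name_map[satisfying[0]])
    return picked


if __name__ == "__main__":
    print(characterised_objects())
```

Note that the Python operator `in` plays the role of the fixed concrete membership predicate here, which mirrors the observation that the constancy problem is shifted from parameters to predicates rather than eliminated.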


5.2. Logical quantifiers 


5.2.1 REMARK: The two logical quantifiers. 

The quantifier symbols $\forall$ and $\exists$ mean “for all” (the universal quantifier) and “for some” (the existential
quantifier) respectively. The existential quantifier may, in principle, be defined in terms of the universal
quantifier. Thus $\exists x, P(x)$ may be defined to mean $\neg(\forall x, \neg P(x))$. However, minimising the number of
primitive symbols is not always the best approach. As mentioned in Remarks 4.2.2 and 4.2.3, reducing
primitive logical operators to a single operator such as $\downarrow$ (NOR) or $\uparrow$ (NAND) obfuscates the meaning and
creates unnecessary extra work to define the other operators and demonstrate their properties.


The statement that “there is a lion in the forest” is logically equivalent to the statement that “it is not true 
that everything in the forest is not a lion". However, the second statement is not how one generally thinks 
about the existence of lions in a forest. 

Definition 6.3.9 (viii) gives separate rules for the universal and existential quantifiers, from which all desired
properties may be inferred in a meaningful way. (See Theorems 6.6.7 (ii) and 6.6.10 (vi).) Notation 5.2.2 is 
merely an attempt to explain the meanings of these quantifier symbols to human readers. 
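

For a finite universe, the equivalence between the existential quantifier and its definition in terms of the universal quantifier can be checked exhaustively. The following is a minimal sketch under that finiteness assumption, not a substitute for the inference rules. The function names and the example universe are hypothetical.

```python
from itertools import product


def exists_primitive(universe, pred):
    """The existential quantifier as a primitive: P(x) holds for some x."""
    return any(pred(x) for x in universe)


def exists_via_forall(universe, pred):
    """The existential quantifier defined as: not (for all x, not P(x))."""
    return not all(not pred(x) for x in universe)


def definitions_agree(universe):
    """Compare both definitions for every possible predicate on the universe."""
    universe = list(universe)
    for values in product([False, True], repeat=len(universe)):
        table = dict(zip(universe, values))
        pred = table.__getitem__
        if exists_primitive(universe, pred) != exists_via_forall(universe, pred):
            return False
    return True


if __name__ == "__main__":
    forest = ["deer", "lion", "owl"]
    print(exists_primitive(forest, lambda x: x == "lion"))
    print(definitions_agree(forest))
```

The two functions agree on every predicate, but the first one reads the way one thinks about lions in a forest, which is the point made above about retaining both quantifiers as primitives.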


5.2.2 NOTATION [MM]: 
$\forall x, P(x)$ for any variable name x and predicate name P means “$P(x)$ is true for all values of x”.
$\exists x, P(x)$ for any variable name x and predicate name P means “$P(x)$ is true for some value of x”.


5.2.3 REMARK: Erroneous interpretation of the existential quantifier. 
The symbol $\exists$ does not mean “there exists” as many elementary texts erroneously claim. It is a quantifier,
not a verb. Sentences of the form “$\exists x$ such that $P(x)$.” are wrong, and very annoying to the cognoscenti.


The correct form is “$\exists x, P(x)$.” or: “There exists x such that $P(x)$.” Thus in colloquial contexts one may
read $\exists x$ as “there exists x such that”, but not “there exists x”.


Although the $\exists$ symbol does not mean “there exists”, the symbol is mnemonic for the letter “E” of the word
“exists”. Similarly the $\forall$ symbol is mnemonic for “A” in the word “all”. The rotation of the symbols through
180° is most likely a relic of the olden days when typesetting used lead fonts. Using an existing character
presumably saved space in the upper font drawer (called the “upper case”).


5.2.4 REMARK: Notations for logical quantifier symbols and syntax. 

The use of a comma after every quantifier is intended to remove ambiguity. The comma terminates the 
quantifier unambiguously. This is a good idea even if the following sub-expression is parenthesised because 
juxtaposition of two sub-expressions could be confusing. It is better to simply make all expressions easy to 
parse by using the quantifier-terminating commas. 


Table 5.2.1 is a sample of logical quantifier notations in the literature. 


The universal and existential quantifiers are sometimes denoted by $\bigwedge$ and $\bigvee$ respectively. This choice
of symbols may be justified by considering expressions of the form “$P(x_1) \wedge P(x_2) \wedge P(x_3) \wedge \ldots$” and
“$P(x_1) \vee P(x_2) \vee P(x_3) \vee \ldots$” respectively. However, these notations would be confusing in the context of
tensor algebra. So they are not used in this book.


5.2.5 REMARK: Suppression of parameters of parametrised proposition families in abbreviated notation. 
One often sees in the mathematical logic literature the suppression of parameter lists for proposition families, 
such as the parameter list “(x,y)” in a parametrised proposition “P(x, y)". Then the surrounding text must 
indicate whether each variable is, or is not, a free variable in a logical expression. It seems desirable to make 
all symbolic expressions contain sufficient information to interpret their meaning, rather than providing 
textual comments in the nearby context to indicate what the expressions mean. 


In the abbreviated style of predicate logic notation, an author would typically write an expression such as
“$\exists x, \forall y, P$” and then explain informally that, for example, “x may be free in P, but y is not free in P”. This
same information could be communicated unambiguously in the symbolic expression form “$\exists x, \forall y, P(x)$”.
Many logic textbooks become particularly confusing when they write their inference rules in the abbreviated
style. In this book, in Definition 6.3.9 rule (UI) for example, one sees a sentence similar to: “From $A \vdash \alpha(y)$,
infer $A \vdash \forall x, \alpha(x)$.” In the abbreviated notation convention, one must add a subsidiary note such as “where
x is not free in A” to a sentence such as: “From $A \vdash \alpha$, infer $A \vdash \forall x, \alpha$.” When multiple inference rules
must be applied in complex practical proofs of logic theorems, the verification of the correct application of
informally written subsidiary conditions is error-prone, which tends to defeat the main purpose of formal
logic, which is to guarantee correctness as far as possible.


The habit of suppressing implicit parameters is seen almost universally in the mathematics literature. This
is particularly so in differential geometry and partial differential equations. Thus one writes “$\Delta u = f$” as an
abbreviation for “$\forall x \in \mathbb{R}^n, \Delta u(x) = f(x)$”, for example. Free variables, such as “x” in this case, are often
suppressed, and their “universes” are generally indicated somewhere in the context. Complicated tensor
expressions are quite often tedious to write out in full. For example, the equation

$$\forall x \in \mathrm{Range}(\psi),\ \forall (i,j,k,\ell) \in \mathbb{N}_n^4,$$
$$R^i{}_{jk\ell}(x) = \Gamma^i_{j\ell,k}(x) - \Gamma^i_{jk,\ell}(x) + \sum_{m=1}^n \bigl( \Gamma^i_{mk}(x)\, \Gamma^m_{j\ell}(x) - \Gamma^i_{m\ell}(x)\, \Gamma^m_{jk}(x) \bigr)$$

would typically be abbreviated to:

$$R^i{}_{jk\ell} = \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \Gamma^i_{mk} \Gamma^m_{j\ell} - \Gamma^i_{m\ell} \Gamma^m_{jk}$$

year  author                        universal quantifier                                        existential quantifier
1928  Hilbert/Ackermann [358]       $(x)A(x)$                                                   $(Ex)A(x)$
1931  Gödel [356]                   $x_1\Pi(x_2(x_1))$, $(x)(F(x))$                             $(Ex)(F(x))$
1934  Hilbert/Bernays [359]         $(x)A(x)$                                                   $(Ex)A(x)$
1940  Quine [380]                   $(\alpha)$                                                  $(\exists\alpha)$
1944  Church [348]                  $(x)F(x)$                                                   $(\exists x)F(x)$
1950  Quine [381]                   $\forall x\, Fx$                                            $\exists x\, Fx$
1950  Rosenbloom [386]              $(\alpha)\mathfrak{A}(\alpha)$                              $(\exists\alpha)\mathfrak{A}(\alpha)$
1952  Kleene [365]                  $\forall xA(x)$                                             $\exists xA(x)$
1952  Wilder [403]                  $\forall x\, P(x)$                                          $\exists x\, P(x)$
1953  Rosser [387]                  $(x)F(x)$                                                   $(Ex)F(x)$
1953  Tarski [399]                  $\bigwedge x\, Px$                                          $\bigvee x\, Px$
1954  Carnap [345]                  $(x)Px$                                                     $(\exists x)Px$
1955  Allendoerfer/Oakley [48]      $\forall_x p_x$                                             $\exists_x p_x$
1957  Suppes [394]                  $(x)Fx$                                                     $(\exists x)Fx$
1958  Bernays [341]                 $(x)\mathfrak{A}(x)$                                        $(Ex)\mathfrak{A}(x)$
1958  Nagel/Newman [373]            $(x)(F(x))$                                                 $(\exists x)(F(x))$
1960  Körner [461]                  $(x)P(x)$                                                   $(\exists x)P(x)$
1960  Suppes [395]                  $(\forall x)\varphi(x)$                                     $(\exists x)\varphi(x)$
1963  Curry [350]                   $(\forall x)A$                                              $(\exists x)A$
1963  Quine [382]                   $(x)Fx$                                                     $(\exists x)Fx$
1963  Stoll [393]                   $(x)P(x)$                                                   $(\exists x)P(x)$
1964  E. Mendelson [370]            $(x)\mathcal{A}(x)$                                         $(Ex)\mathcal{A}(x)$
1964  Suppes/Hill [396]             $(\forall x)$                                               $(\exists x)$
1965  KEM [103]                     $\forall x\, A(x)$                                          $\exists x\, A(x)$
1965  Lemmon [367]                  $(x)Px$                                                     $(\exists x)Px$
1966  Cohen [349]                   $\forall x\, A(x)$                                          $\exists x\, A(x)$
1967  Kleene [366]                  $\forall x\, P(x)$                                          $\exists x\, P(x)$
1967  Margaris [369]                $\forall xP$                                                $\exists xP$
1967  Shoenfield [390]              $\forall xA$                                                $\exists xA$
1968  Smullyan [391]                $(\forall x)Px$                                             $(\exists x)Px$
1969  Bell/Slomson [339]            $(\forall x)P(x)$                                           $(\exists x)P(x)$
1969  Robbin [384]                  $\forall xP(x)$                                             $\exists xP(x)$
1970  Quine [383]                   $\forall x\, Fx$                                            $\exists x\, Fx$
1971  Hirst/Rhodes [361]            $\forall x, P(x)$                                           $\exists x, P(x)$
1973  Chang/Keisler [347]           $(\forall x)\varphi(x)$                                     $(\exists x)\varphi(x)$
1973  Jech [364]                    $\forall x\, \varphi(x)$                                    $\exists x\, \varphi(x)$
1974  Reinhardt/Soeder [124]        $\bigwedge_x P(x)$                                          $\bigvee_x P(x)$
1979  Lévy [368]                    $\forall x\varphi$                                          $\exists x\varphi$
1980  EDM [112]                     $\forall xF(x)$, $(x)F(x)$, $\Pi xF(x)$, $\bigwedge xF(x)$  $\exists xF(x)$, $(Ex)F(x)$, $\Sigma xF(x)$, $\bigvee xF(x)$
1982  Moore [371]                   $\forall x\, P(x)$                                          $\exists x\, P(x)$
1990  Roitman [385]                 $\forall x\, \varphi(x)$                                    $\exists x\, \varphi(x)$
1993  Ben-Ari [340]                 $\forall x\, P(x)$                                          $\exists x\, P(x)$
1993  EDM2 [113]                    $\forall xF(x)$                                             $\exists xF(x)$
1996  Smullyan/Fitting [392]        $(\forall x)P(x)$                                           $(\exists x)P(x)$
1998  Howard/Rubin [362]            $(\forall x)P(x)$                                           $(\exists x)P(x)$
2004  Huth/Ryan [363]               $\forall x\, P(x)$                                          $\exists x\, P(x)$
2004  Szekeres [305]                $\forall x(P(x))$                                           $\exists x(P(x))$
      Kennington                    $\forall x, P(x)$                                           $\exists x, P(x)$

Table 5.2.1 Survey of logical quantifier notations


As a rule of thumb, such expressions are interpreted to mean that the parametrised proposition is being 
asserted for all values of all free variables in the universes defined within the context for those variables. 
This is only a problem if multiple contexts are combined and the meanings therefore become ambiguous.


The beneficial purpose of suppressing proposition and object parameters, and universal quantifiers for free 
variables, is to focus the minds of readers and writers on particular aspects of expressions and propositions, 
while placing routine parameters in the background. There is a trade-off here between precision and focus. 
One may sometimes wonder whether a writer even knows how to write out an abbreviated expression in 
full. When mathematical expressions seem difficult to comprehend, it is often a rewarding exercise to seek 
out all of the variables, and the universes for those variables, and write down fully explicit versions of those 
expressions, showing all quantifiers, all variables, and all universes for those variables. 


As mentioned also in Remark 6.6.3, the abbreviated style of predicate logic notation where proposition 
parameters are suppressed is not adopted in this book. Thus if a proposition parameter list is not shown, 
then the proposition has no parameters. If one parameter is shown, then the proposition belongs to a single- 
parameter family with one and only one parameter. And so forth for any number of indicated parameters. 


This rule implies that so-called “bound variables” are suppressed. Thus one may write, for example, “$A(x)$”
to mean “$\forall y \in U, P(x,y)$”, which roughly speaking may be written as $A(x) =$ “$\forall y \in U, P(x,y)$”. If one
expands the meaning of $A(x)$, the bound variable y appears. But since it is bound by the quantifier “$\forall y$”, it
is not a free variable in $A(x)$. Therefore it is not indicated because $(A(x))_{x \in U}$ is a single-parameter family
of propositions. The expression “$\forall y \in U, P(x,y)$” yields a single proposition for each choice of the free
variable x.


To be a little more precise, a two-parameter family of propositions $(P(x,y))_{x,y \in U}$ is (in metamathematical
language) effectively a map $P : U \times U \to \mathcal{P}$ from pairs $(x,y) \in U \times U$ to concrete propositions $P(x,y) \in \mathcal{P}$
for some universe of parameters U and some concrete proposition domain $\mathcal{P}$. The expression $A(x) =$
“$\forall y \in U, P(x,y)$” then denotes a one-parameter family of propositions $(A(x))_{x \in U}$, which is effectively
a map $A : U \to \mathcal{P}$. The logical expression “$\forall y \in U, P(x,y)$” determines one and only one proposition
in $\mathcal{P}$ for each choice of $x \in U$. (The semantic interpretation of $A(x) =$ “$\forall y \in U, P(x,y)$” is discussed
for example in Remark 5.3.11.) Therefore it is valid to denote the (proposition pointed to by the) logical
expression “$\forall y \in U, P(x,y)$” by a single-parameter expression “$A(x)$” for each $x \in U$, and so $(A(x))_{x \in U}$ is
a well-defined proposition family $A : U \to \mathcal{P}$.
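

The passage from a two-parameter family to a one-parameter family by binding a variable can be sketched in code. This is only an illustration under simplifying assumptions: the universe U is finite, and concrete propositions are represented crudely by their truth values. The name bind_universal and the example predicate are hypothetical.

```python
def bind_universal(P, universe):
    """Return the one-parameter family A with A(x) = "for all y in universe, P(x, y)".

    The bound variable y is local to the returned function, so it does not
    appear in the parameter list of A.
    """
    universe = list(universe)

    def A(x):
        return all(P(x, y) for y in universe)

    return A


if __name__ == "__main__":
    U = range(4)

    def P(x, y):
        # Hypothetical two-parameter family: "x is at most y".
        return x <= y

    A = bind_universal(P, U)
    print([A(x) for x in U])
```

The function A accepts exactly one argument, which corresponds to the convention that the parameter list of a proposition family shows the free variables and nothing else.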


5.2.6 REMARK: Relative information content of universal and existential quantifiers. 

The universal quantifier gives the maximum possible information about the truth values of a predicate P
because the statement $\forall x, P(x)$ determines the truth value of the proposition $P(x)$ for all values of x in
the universe of the logical system. By contrast, the existential quantifier gives almost the minimum possible
information about the truth values of a predicate P because the statement $\exists x, P(x)$ determines the truth
value of $P(x)$ for only one value of x in the universe, and we don't even know which value of x this is.


Although universal and existential quantifiers are superficially similar, since they are in some sense duals of 
each other, the differences in information content between them show that they are fundamentally different. 
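

The difference in information content can be made concrete for a finite universe by counting the truth-value assignments which remain possible after each kind of statement is asserted. The following sketch assumes a universe of n elements and treats a predicate as a tuple of n truth values. The function name count_consistent is hypothetical.

```python
from itertools import product


def count_consistent(n, statement):
    """Count assignments of truth values to P(x_1), ..., P(x_n) for which
    the given quantified statement holds."""
    return sum(1 for values in product([False, True], repeat=n) if statement(values))


if __name__ == "__main__":
    n = 4
    # "For all x, P(x)" leaves exactly one assignment possible.
    print(count_consistent(n, all))
    # "For some x, P(x)" excludes only the all-false assignment.
    print(count_consistent(n, any))
```

For a universe of n elements, the universal statement leaves one assignment out of $2^n$, whereas the existential statement leaves $2^n - 1$ of them.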


5.2.7 REMARK: Difficulties with real-world interpretation of logical quantifiers. 

It is very difficult to specify notations for semantics. So Notation 5.2.2 necessarily explains the basic quantifier 
notations in natural language. If one looks too closely at this short explanation of the universal and existential 
quantifiers, however, some serious difficulties arise. The words “all” and “some” are not easy to explain 
precisely. In the physical world, it is very often impossible to prove that all members of a class have a given 
property. Even if the class is not infinite, it may still be impossible to test the property for all members. 


One might think that the word “some” is easier because the truth of the proposition is firmly proved as soon
as one example is found. But if the proposition $\exists x, P(x)$ is true, it might still not be possible to find the single required
example x which proves the proposition. There is thus a clear duality between the two quantifiers. If the
class of variables x is infinite for empirical propositions $P(x)$, it is never possible to prove that $\forall x, P(x)$ is
true, and it is never possible to prove that $\exists x, P(x)$ is false. (This duality follows from the fact that “true”
and “false” are duals of each other.)


The inability to establish quantified predicates by physical testing, in the case of empirical propositions, is 
not necessarily a show-stopper. Propositions usually do not refer to the “real world” at all. Propositions are 
usually attributes of models of the real world. And the quantified predicates may not arise from case-by-case 
testing, but rather from some sort of a-priori assumption about the model. Then if the model does find a 
contradiction with the real world, the model must be modified or limited in some way. This is how most 
universal propositions enter into models. They are adopted because they are initially not contradicted by
observations, and the model is maintained as long as it is useful. 


In the case of mathematical logic, it is sometimes impossible to find counterexamples to universal
propositions, or examples for existential propositions. An important example of this is the idea of a Lebesgue
non-measurable set. No examples of these sets can be found in the sense of being able to write down a rule
for determining what the elements of such a set are. If the axiom of choice is added to the ZF set theory
axioms, it can be proved by means of axioms and deduction rules that such sets must exist. In other words,
it is shown that $\exists x, P(x)$ is true for the predicate $P(x) =$ “the set x is Lebesgue non-measurable”. But no
examples can be found. This does not prove that the assertion is false, which is very frustrating if one thinks
about it too much.


The difficulties with universal and existential quantifiers in mathematics are in one sense the reverse of the
difficulties for empirical propositions. The empirical claim that “all swans are white” is always vulnerable
to the discovery of new cases. So empirical universals can never be certain. But in the case of mathematics,
universal propositions are often quite robust. For example, one may be entirely certain of the proposition:
“All squares of even integers are even.” If anyone did find a counterexample, namely an even integer x whose
square $x^2$ is not even, one could apply a quick deduction (from the definitions) that x is not even if $x^2$ is not even,
which would be a contradiction. More generally, universal mathematical propositions are usually proved by
demonstrating the absurdity of any counterexample. There is no corresponding method of proof to show the
total absurdity of non-white swans. (Popper [467], page 381, attributes the use of swans in logic to Carnap.
See also Popper [466], page 47, for ravens.)
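

The robustness of the proposition about squares of even integers may be seen by writing out the deduction from the definitions. The following is a sketch which assumes that “even” means “equal to 2k for some integer k”. The symbol k is introduced here only for illustration.

```latex
\begin{align*}
x \text{ is even}
  &\;\Rightarrow\; \exists k \in \mathbb{Z},\ x = 2k \\
  &\;\Rightarrow\; \exists k \in \mathbb{Z},\ x^2 = 4k^2 = 2(2k^2) \\
  &\;\Rightarrow\; x^2 \text{ is even}.
\end{align*}
```

Any even integer whose square is not even would contradict this chain of implications, which is why no search for counterexamples is required.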


5.2.8 REMARK: The problem of infinite sets in logic. 

The problem of interpreting infinite sets within set theory is essentially the same as the infinity problem in 
logic. The predicate calculus already has all of the infinity difficulties. On the other hand, no infinite set 
of propositions can ever be written down and checked one by one. So one can never really prove that any 
universally quantified infinite family of propositions is satisfied. This seems to put infinite conjunctions in 
the same general category as propositions which can never be proved true or false. 


There are adequate metaphors for infinity concepts. For example, the idea that no matter how large a 
number is, there will always be a larger number, is very convincing. We simply cannot imagine that any 
number could be so big that we could not add 1 to it. But this, and any other infinity metaphor, breaks 
down when we consider straws on the backs of camels. It is very difficult to imagine that a single straw will 
break the back of a camel. But we also know that a weight of 100 tons cannot be carried by any camel. So 
there must be a point at which the camel will collapse. If we replace straws with single hairs or even lighter 
objects, it is even more difficult to imagine the camel collapsing from such a tiny weight increment. The 
boundedness of the integers seems entirely plausible. But we just can’t imagine the breaking point where 
the universe’s ability to represent an integer would break down. 


5.3. Natural deduction, sequent calculus and semantics 


5.3.1 REMARK: The motivation for natural deduction systems. 

It was discovered in the 1930s and earlier that for axiomatic systems in the spartan style proposed by Hilbert, 
where the only inference rules were modus ponens and substitution, very long proofs could be short-circuited 
by applying derived inference rules by which the existence of proofs could be guaranteed, thereby removing 
the need to actually construct full proofs. An important kind of derived inference rule is a deduction 
metatheorem such as is discussed in Section 4.8. (A deduction metatheorem may be thought of as a converse 
of the modus ponens rule.) 


A typical toolbox of useful derived inference rules for a Hilbert system is given by Kleene [365], pages 98-101,
146-148, or Kleene [366], pages 44-45, 118-119. Since derived rules are proved metamathematically, and
the main purpose of the axiomatisation of logic is to guarantee correctness by mechanising logical proofs, it 
seems counterproductive to start with axioms which are inadequate to requirements, and then fill the gap 
with metatheorems which are proved subjectively outside the mechanical system. The sought-for objectivity 
is thereby lost. It seems far preferable to start with rules which are adequate, and which will not require 
frequent recourse to metatheorems to fill gaps. 


A natural deduction system may be thought of as a logical inference system where the derived rules of a 
Hilbert-style system are accepted as a-priori obvious instead of axioms. 

Derived rules from Hilbert-style systems typically state that if some proposition can be proved from some set
of assumptions, then some other proposition can be proved from another set of assumptions. As an example,
a deduction metatheorem typically states that if C can be inferred from propositions $A_1, A_2, \ldots A_m$ and B,
then $B \Rightarrow C$ can be inferred from $A_1, A_2, \ldots A_m$, where $m \in \mathbb{Z}_0^+$. If the assertion symbol “$\vdash$” is understood
to signify provability, this derived rule states that if $A_1, A_2, \ldots A_m, B \vdash C$, then $A_1, A_2, \ldots A_m \vdash B \Rightarrow C$.
Most derived rules have this kind of appearance. Therefore it makes practical sense to replace simple
assertions of the form “$\vdash B$” with provability assertions of the form “$A_1, A_2, \ldots A_m \vdash B$”.


The fact that the assertion symbol signifies provability in a Hilbert system can be conveniently forgotten. 
In the new “natural deduction” system, the historical origin does not matter. Each line of an argument can 
now be a “conditional assertion”, which can signify simply that if all of the antecedent propositions are true,
then the consequent proposition is true. In the new system, “provability” means provability of a conditional 
assertion from other conditional assertions. Since deduction metatheorems are assumed to have been proved 
in the original Hilbert system, one can freely move propositions from the left to the right of the assertion 
symbol. Therefore there is no real difference between an assertion symbol and an implication operator. This 
conclusion is essentially the starting point of the 1934 paper by Gentzen [413]. 
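

The freedom to move a proposition across the assertion symbol has a semantic counterpart which can be checked by brute force for propositional formulas. The following sketch is about truth-table entailment, not about provability in a Hilbert system, so it illustrates the conditional-assertion reading rather than proving a deduction metatheorem. Formulas are represented by their truth functions on a fixed number of atoms, and all function names are hypothetical.

```python
from itertools import product


def entails(premises, conclusion, n_atoms):
    """Semantic entailment: every truth assignment which satisfies all of the
    premises also satisfies the conclusion."""
    return all(
        conclusion(v)
        for v in product([False, True], repeat=n_atoms)
        if all(p(v) for p in premises)
    )


def implies(b, c):
    """Truth function of the implication from b to c."""
    return lambda v: (not b(v)) or c(v)


def all_truth_functions(n_atoms):
    """List every truth function on n_atoms propositional atoms."""
    rows = list(product([False, True], repeat=n_atoms))
    functions = []
    for outputs in product([False, True], repeat=len(rows)):
        table = dict(zip(rows, outputs))
        functions.append(table.__getitem__)
    return functions


def deduction_property_holds(n_atoms=2):
    """Check that A, B entails C exactly when A entails (B implies C),
    for all truth functions A, B and C on n_atoms atoms."""
    formulas = all_truth_functions(n_atoms)
    return all(
        entails([a, b], c, n_atoms) == entails([a], implies(b, c), n_atoms)
        for a in formulas
        for b in formulas
        for c in formulas
    )


if __name__ == "__main__":
    print(deduction_property_holds(2))
```

With two atoms there are 16 truth functions, so the check covers 4096 triples. The single antecedent A may be regarded as the conjunction of any finite list of antecedents.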


In a sense, there is no real difference between a Hilbert system and a natural deduction system. In practice,
they often use the same inference rules. In the former case, they are “derived rules”, whereas in the latter
case they are the basic assumptions of the system. The main difference is in the initial starting point of
the presentation. A Hilbert system commences with a set of axioms with very few inference rules, and the
assertions are initially of the unconditional form “$\vdash \alpha$” for well-formed formulas $\alpha$. In the natural deduction
case, conditional assertions of the form “$\alpha_1, \ldots \alpha_m \vdash \beta$” are the starting point, and comprehensive
deduction rules are given, with no axioms or very few axioms.


5.3.2 REMARK: Natural deduction argues with conditional assertions instead of unconditional assertions. 
Conditional assertions of the form “$A_1, A_2, \ldots A_m \vdash B$” for $m \in \mathbb{Z}_0^+$ were presented by Gentzen [413, 414]
as the basis for a predicate calculus system which he called “natural deduction". This style of logical calculus 
had well over a dozen inference rules, but no axioms. This was in conscious contrast to the Hilbert-style 
systems which used absolutely minimal inference rules, usually only modus ponens and possibly some kind 
of substitution rule. The natural deduction style of logical calculus permits proofs to be mechanised in a 
way which closely resembles the way mathematicians write proofs, and also tends to permit shorter proofs 
than with the more austere Hilbert-style calculus. The natural deduction style of logical calculus is adapted 
for the presentation of predicate calculus in this book. It is both rigorous and easy to use for practical 
theorem-proving, especially in the tabular form presented in comprehensive introductions to logical calculus 
by Suppes [394] and Lemmon [367]. Some texts in the logic literature which present tabular systems of 
natural deduction are as follows. 


(1) 1940. Quine [380], pages 91-93, indicated antecedent dependencies by line numbers in square brackets, 
anticipating Suppes’ 1957 line-number notation. 


(2) 1944. Church [348], pages 164-165, 214-217, gives a very brief description of natural deduction and 
sequent calculus systems. 


(3) 1950-1982. Quine [381], pages 241-255. It is unclear which of the four editions introduced tabular 
presentations of natural deduction systems. This book by Quine demonstrates a technique of using 
one or more asterisks to the left of each line of a proof to indicate dependencies. This is equivalent to 
Kleene's vertical bar-lines. However, this system is not applied to a wide range of practical situations, 
and does not use the very convenient and flexible numbered reference lines as in the books by Suppes 
and Lemmon. 


(4) 1952. Kleene [365], page 182, uses vertical bars on the left of the lines of an argument to indicate
antecedent dependencies. On pages 408-409, he uses explicit dependencies on the left of each argument
line. On pages 98-101, Kleene gives many derived rules for a Hilbert system which look very much like
the basic rules for a natural deduction system.

(5) 1957. Suppes [394], pages 25-150, seems to give the first presentation of the modern tabular form of 
natural deduction with numbered reference lines. It is intended for introductory logic teaching, but also 
has some theoretical depth. 

(6) 1963. Stoll [393], pages 183-190, 215-219, used sets of line numbers to indicate antecedent dependencies 
for the lines of tabular logical arguments based on natural deduction inference rules. 

(7) 1965. Lemmon [367] gives a full presentation of practical propositional calculus and predicate calculus
which uses many rules, but no axioms. It emphasises practical introductory theorem-proving, but does 
have some theoretical treatment. It is derived directly from Suppes [394]. 


(8) 1965. Prawitz [378, 379]. 


(9) 1967. Kleene [366], pages 50-58, 128-130, briefly demonstrates two styles of tabular natural deduction 
proof, one using explicit full quotations of antecedent propositions on the left of each line, the other 
using vertical bar-lines on the left to indicate antecedents. There is some serious theoretical treatment. 
In particular, it is shown that his propositional inference rules are valid (Kleene [366], pages 44-45), and 
that his rules for quantifiers are valid (Kleene [366], pages 118-119). 

(10) 1967. Margaris [369] presents an introduction to logic which has Hilbert-style axioms. He uses single- 
consequent assertions to present his derived inference rules, and uses indents to indicate hypotheses and 
Rule C. But otherwise, it is not a natural deduction system. 

(11) 1993-2012. Ben-Ari [340], pages 55-71. This is a kind of natural deduction for computer applications. 
It uses explicit antecedent dependency lists on every line for propositional calculus. 


(12) 2004. Huth/Ryan [363]. This is a presentation of natural deduction for computer applications. 


For the presentation of predicate calculus in this book, the tabular form of natural deduction is adapted 
by reducing the number of rules from over a dozen down to about seven, but adding the three Lukasiewicz 
axioms from the propositional calculus which is developed in the Hilbert style in Chapter 4. It is thus a 
hybrid of Hilbert and Gentzen styles. 


5.3.3 REMARK: Hilbert assertions, Gentzen assertions, natural deduction and sequent calculus. 

The kinds of assertion statements employed in various kinds of deduction systems may be very roughly 
characterised as follows. 

(1) Hilbert style. Assertions of the form “$\vdash \beta$” for some logical formula $\beta$.

(2) Gentzen style. Conditional assertions are the “elementary statements” of the system.

    (2.1) Natural deduction. Assertions of the form “$\alpha_1, \alpha_2, \ldots \alpha_m \vdash \beta$” for logical formulas $\alpha_i$ and $\beta$,
    for $i \in \mathbb{N}_m$, where $m \in \mathbb{Z}_0^+$.

    (2.2) Sequent calculus. Assertions of the form “$\alpha_1, \alpha_2, \ldots \alpha_m \vdash \beta_1, \beta_2, \ldots \beta_n$” for logical formulas
    $\alpha_i$ and $\beta_j$, for $i \in \mathbb{N}_m$ and $j \in \mathbb{N}_n$, where $m, n \in \mathbb{Z}_0^+$.


The semantics of each of the three styles of assertion statements is fairly straightforward to state. (See 
particularly Table 5.3.1 in Remark 5.3.10.) However, the axioms and inference rules for deduction systems 
based on these styles of assertion statements show enormous variety in the literature. Any deduction system 
which infers true assertions from true assertions is acceptable. And of course, such deduction systems should 
preferably deliver all, or as many as possible, of the true inferences that are semantically correct. 


The corresponding styles of assertions, independent of the styles of inference systems, are as follows. 


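(1) Unconditional assertions: “$\vdash \beta$” for some logical formula $\beta$.
    Meaning: $\beta$ is true.

(2) Conditional assertions: (also called “sequents”)

    (2.1) Simple sequents: “$\alpha_1, \alpha_2, \ldots \alpha_m \vdash \beta$” for formulas $\alpha_i$ and $\beta$, for $i \in \mathbb{N}_m$, where $m \in \mathbb{Z}_0^+$.
    Meaning: If $\alpha_i$ is true for all $i \in \mathbb{N}_m$, then $\beta$ is true.

    (2.2) Compound sequents: “$\alpha_1, \alpha_2, \ldots \alpha_m \vdash \beta_1, \beta_2, \ldots \beta_n$” for formulas $\alpha_i$ and $\beta_j$, for $i \in \mathbb{N}_m$
    and $j \in \mathbb{N}_n$, where $m, n \in \mathbb{Z}_0^+$.
    Meaning: If $\alpha_i$ is true for all $i \in \mathbb{N}_m$, then $\beta_j$ is true for some $j \in \mathbb{N}_n$.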
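
When there are only finitely many propositions, the meanings stated above for simple and compound sequents can be tested by running through every truth value map. The following Python sketch is an illustration only. The names `assignments` and `sequent_holds`, and the representation of each formula as a Boolean function of a truth value map, are assumptions made for this example and are not notation from this book.

```python
from itertools import product

def assignments(names):
    """Yield every truth value map on the given proposition names."""
    for values in product([False, True], repeat=len(names)):
        yield dict(zip(names, values))

def sequent_holds(antecedents, consequents, names):
    """Test a sequent against its stated meaning.

    Each formula is a function from a truth value map to bool. The sequent
    holds if every map which satisfies all of the antecedents satisfies at
    least one of the consequents. A simple sequent has one consequent.
    """
    return all(
        any(b(t) for b in consequents)
        for t in assignments(names)
        if all(a(t) for a in antecedents)
    )

names = ["p", "q"]
p = lambda t: t["p"]
q = lambda t: t["q"]
not_p = lambda t: not t["p"]
p_implies_q = lambda t: (not t["p"]) or t["q"]

modus_ponens = sequent_holds([p, p_implies_q], [q], names)  # simple sequent
excluded_middle = sequent_holds([], [p, not_p], names)      # compound sequent with m = 0
invalid = sequent_holds([q], [p], names)                    # fails when q is true and p is false
```

The empty antecedent list is handled by the convention that `all` of an empty collection is true, which matches the case m = 0.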


5.3.4 REMARK: It is convenient to define a “sequent” to be a single-consequent assertion. 
In 1965, Lemmon [367], page 12, defined “sequent” with the same meaning as the “simple sequent” in
Definition 5.3.14.

Thus a sequent is an argument-frame containing a set of assumptions and a conclusion which is
claimed to follow from them. [...] The propositions to the left of ‘$\vdash$’ become assumptions of
the argument, and the proposition to the right becomes a conclusion validly drawn from those
assumptions.

Huth/Ryan [363], page 5, also defines a sequent to have one and only one consequent proposition. This seems
far preferable in a natural deduction context. The fact that Gentzen did not define it in this way in 1934 
should not be an obstacle to sensible terminology in the 21st century. This is how the term “sequent” will 
be understood in this book. 


5.3.5 REMARK: Sequents, natural deduction, provability and model theory. 

The form of a conditional assertion is only one small component of a logical system. The same form of 
conditional assertion may be associated with dozens or hundreds of different inference systems and semantic 
interpretations. Therefore it is certainly not very informative to state merely the syntax of conditional 
assertions. Even if the basic semantics is specified, this still leaves open the possibility for a very wide range 
of inference systems and philosophical and practical interpretations. 


Particularly in the case of simple conditional assertions, there are two widely divergent styles of inference 
systems. First there are the very specific sets of rules and interpretations of Gentzen’s original natural 
deduction system introduced in 1934/35. (See Gentzen [413, 414].) He presented rules for natural deduction, 
and also rules for intuitionistic inference, both based on the same range of syntaxes and semantics for simple 
conditional assertions. He stated explicitly that his assertion symbol has exactly the same meaning as the
implication operator. Second, there are the logical systems, which could be broadly characterised as “natural 
deduction” systems, in which the assertion symbol signifies provability, but because of the adoption of the 
“conditional proof” inference rule, they effectively equate the assertion symbol to the implication operator. 
In these systems, the deduction metatheorem, which is very useful for Hilbert-style systems, is redundant 
because it is assumed as a rule. The rules in these kinds of systems have little in common with Gentzen’s 
intuitionistic rule-sets, which avoid using the “excluded middle” rule. Generally, the later natural deduction 
systems have assumed the excluded middle as one of their standard rules. 


In the modern model theory style of interpretation, conditional assertions are sharply distinct from semantic 
implication. The ability to prove a conditional assertion implies that the assertion is valid as an implication 
in all interpretations of the language in terms of models, but the inability to prove a conditional assertion 
does not necessarily prevent the assertion from being valid in all models. Thus if a model can be found
which contradicts an assertion, this implies that the assertion cannot be proved within the language, but 
even if an assertion can be proved to be true in all models, this does not imply that the assertion can be 
proved within the language. In other words, the language may be incomplete in the sense that there are 
theorems which are true in all models although they cannot be proved within the language. 


The distinction between syntactic and semantic assertions is not present in the typical natural deduction 
system. Therefore the assertion symbol in such systems does not signify provability as distinct from validity. 
If it is valid, it is provable, and vice versa. 


5.3.6 REMARK: History of the meaning of the assertion symbol. Truth versus provability. 

The assertion symbol has changed meaning since the 1930s. Gentzen [413], page 180, wrote the following in 
1934. (In his notation, the arrow symbol “$\rightarrow$” took the place of the modern assertion symbol “$\vdash$”, and his
superset symbol “$\supset$” took the place of the modern implication operator symbol “$\Rightarrow$”.)


2.4. Die Sequenz $A_1, \ldots, A_\mu \rightarrow B_1, \ldots, B_\nu$ bedeutet inhaltlich genau dasselbe wie die Formel
$(A_1 \,\&\, \ldots \,\&\, A_\mu) \supset (B_1 \vee \ldots \vee B_\nu)$.


(Translation: “The sequent $A_1, \ldots, A_\mu \rightarrow B_1, \ldots, B_\nu$ signifies, as regards content, exactly the same as
the formula $(A_1 \,\&\, \ldots \,\&\, A_\mu) \supset (B_1 \vee \ldots \vee B_\nu)$.”)


In 1939, Hilbert/Bernays [360], page 385, wrote the following in a footnote.

Für die inhaltliche Deutung ist eine Sequenz
$A_1, \ldots, A_r \rightarrow B_1, \ldots, B_s$,
worin die Anzahlen r und s von 0 verschieden sind, gleichbedeutend mit der Implikation
$(A_1 \,\&\, \ldots \,\&\, A_r) \rightarrow (B_1 \vee \ldots \vee B_s)$.

(Translation: “For the contained meaning, a sequent $A_1, \ldots, A_r \rightarrow B_1, \ldots, B_s$, in which the numbers r
and s are different from 0, is synonymous with the implication $(A_1 \,\&\, \ldots \,\&\, A_r) \rightarrow (B_1 \vee \ldots \vee B_s)$.”)

This interpretation is agreed with by Church [348], page 165, as follows. (Church’s reference to the Gentzen
1934 paper is replaced here by the corresponding reference as listed in this book.) 


Employment of the deduction theorem as primitive or derived rule must not, however, be confused 
with the use of “Sequenzen” by Gentzen.[413, pp. 190-210] ([...]) For Gentzen’s arrow, $\rightarrow$, is not
comparable to our syntactical notation, $\vdash$, but belongs to his object language (as is clear from the
fact that expressions containing it appear as premisses and conclusions in applications of his rules 
of inference). 


In other words, Gentzen's apparent assertion symbol is not metamathematical (in the “meta-language”, as
Church calls it), but is rather a part of the language in which propositions are written. That is, it is the 
same as the implication operator, not an assertion of provability. 


After the 1944 book by Church [348], page 165, all books on mathematical logic have apparently defined the 
assertion symbol in a sequent to mean provability, not truth. In other words, even though an implication 
may be true for all possible models, the assertion is not valid if there exists no proof of the assertion within 
the proof system. A distinction arose between what can be proved and what is true. An incomplete theory 
is a theory where some assertions are true but not provable. The double-hyphen notation “$\vDash$” is often used
for truth of an assertion in all possible models, whereas the notation “$\vdash$” is reserved for assertions which can
be proved within the system. (Assertions that cannot be proved within a theory can sometimes be proved 
by metamathematical methods outside the theory.) 


In books in 1963 by Curry [350], page 184, in 1965 by Lemmon [367], page 12, and in 2004 by Huth/Ryan
[363], page 5, it is stated that sequents do signify assertions of provability. (See Remark 5.3.4 for the
relevant statement by Lemmon.) This is possibly because the assertion symbol had been redefined during
that time, particularly under the influence of model theory.


Consequently the modern meaning of “sequent” is a single-consequent assertion which is provable within the 
proof system in which it is written. 


5.3.7 REMARK:  Gentzen-style multi-consequent sequent calculus is not used in this book. 

The sequent calculus logic framework which is presented in Remark 5.3.9 is not utilised in this book. 
Gentzen's multi-output sequent calculus has theoretical applications which are not required in this book. 
(See for example Takeuti [397], pages 7-481; Kleene [366], pages 305-369; Kleene [365], pages 440—516.) It 
is the simple (“single-output”) conditional assertions which are used here because of their great utility for 
rigorous practical theorem-proving. 

It is a misfortune that the word “sequent” is so closely associated with Gentzen's sequent calculus because this 
word is so much shorter than the phrase “simple conditional assertion”. Nevertheless, the word “sequent” 
(or the phrase “simple sequent”) is used in this book in connection with natural deduction, not the
sequent calculus which Gentzen sharply counterposed to the natural deduction systems which he described.


The real innovation in the sequent concept lies not in the form and semantics of the sequent itself, but rather 
in the use of sequents as the lines of arguments instead of unconditional assertions as was the case in the 
earlier Hilbert-style logical calculus. In other words, every line of a Gentzen-style argument is, or represents, 
a valid sequent, and valid sequents are inferred from other valid sequents. This is quite different to the 
Hilbert-style argument where every line represents a true proposition, and true propositions are inferred 
from other true propositions. It is the transition from the “one valid assertion per line” style of argument to
the “one valid conditional assertion per line” style which is really valuable for practical logical proofs, not
the disjunctive right-hand-side semantics. 


5.3.8 REMARK: Replacing ordered sequences with order-independent sets in sequents. 

In Gentzen’s original definition of sequents, he stipulated that the antecedents and consequents of a sequent 
were true sequences. Hence the same proposition could appear multiple times in each of the two sequences, 
and the order was significant. This required the adoption of rules for swapping propositions, and introducing 
and eliminating multiple copies of propositions. (See Gentzen [413], page 192. He called these rules
“Vertauschung”, “Verdünnung” and “Zusammenziehung”, meaning “exchange”, “thinning” and “contraction”
respectively.)

For an informal presentation of predicate calculus and first-order languages, the rules for swapping formulas in 
a sequent, and for introducing and eliminating them, are somewhat tedious and unnecessary. It is preferable 
to regard both sides of the assertion symbol as sets of propositions rather than sequences. In practical 
arguments, proposition lists must be listed in some order, and multiple copies of propositions may occur.
These are best dealt with by informal procedures rather than cluttering arguments with lines whose sole 
purpose is to swap formulas and introduce and eliminate superfluous copies of formulas. 


If the two sides of a sequent are sets instead of sequences, and the right side is a single-element “sequence”, 
it is questionable whether the term "sequent" still evokes the correct concept. However, its meaning and 
historical associations are close enough to make the term useful in the absence of better candidates. 


5.3.9 REMARK: Sequent calculus and disjunctive semantics for consequent propositions. 

Gentzen generalised “single-output” assertions to “multi-output” assertions which he called "sequents". 
(Actually he called them “Sequenzen” in German, and this has been translated into English as “sequents” to 
avoid a clash with the word “sequences”. See Kleene [366], page 441, for a similar comment.) Sequents have 
the fairly obvious form “$A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$” for $m, n \in \mathbb{Z}_0^+$, but the meaning of Gentzen-style
sequents is not at all obvious.


The commas on the left of a sequent “$A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$” are interpreted according to the
unsurprising convention that each comma is equivalent to a logical conjunction operator, but each comma
on the right is interpreted as a logical disjunction operator. In other words, such a sequent means that if all
of the prior propositions are true, then at least one of the posterior propositions is true. This is equivalent
to the simple unconditional assertion “$\vdash (A_1 \wedge A_2 \wedge \ldots A_m) \Rightarrow (B_1 \vee B_2 \vee \ldots B_n)$”. (This is consequently
equivalent to the disjunction of the $n$ sequents “$A_1, A_2, \ldots A_m \vdash B_j$” for $j \in \mathbb{N}_n$.)


The historical origin of the somewhat perplexing disjunctive semantics on the right of the assertion symbol 
is the “sequent calculus” which was introduced by Gentzen [413], pages 190-193. The adoption of disjunctive 
semantics on the right has various theoretical advantages. The 22 inference rules for predicate calculus with 
such semantics have a very symmetric appearance and are easily modified to obtain an intuitionistic inference 
system. But Gentzen’s principal reason for introducing this style of sequent was its usefulness for proving a 
completeness theorem for predicate calculus. 


5.3.10 REMARK: Semantics of sequents in natural deduction and sequent calculus. 

The interpretation of conditional assertions in natural deduction and sequent calculus inference systems is 
indicated in Table 5.3.1 in terms of the knowledge sets introduced in Section 3.4. (It is assumed that there 
are no non-logical axioms. See Remark 6.9.1 for comparative interpretation of purely logical systems and 
first-order languages.) 


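    sequent                                                     relation interpretation                                           set interpretation

1.  $\vdash B$                                                  $2^P \subseteq K_B$                                               $K_B = 2^P$
2.  $A \vdash B$                                                $K_A \subseteq K_B$                                               $(2^P \setminus K_A) \cup K_B = 2^P$
3.  $A_1, A_2, \ldots A_m \vdash B$                             $\bigcap_{i=1}^m K_{A_i} \subseteq K_B$                           $(2^P \setminus \bigcap_{i=1}^m K_{A_i}) \cup K_B = 2^P$
4.  $A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$          $\bigcap_{i=1}^m K_{A_i} \subseteq \bigcup_{j=1}^n K_{B_j}$       $(2^P \setminus \bigcap_{i=1}^m K_{A_i}) \cup \bigcup_{j=1}^n K_{B_j} = 2^P$
5.  $\vdash B_1, B_2, \ldots B_n$                               $2^P \subseteq \bigcup_{j=1}^n K_{B_j}$                           $\bigcup_{j=1}^n K_{B_j} = 2^P$
6.  $A_1, A_2, \ldots A_m \vdash$                               $\bigcap_{i=1}^m K_{A_i} \subseteq \emptyset$                     $2^P \setminus \bigcap_{i=1}^m K_{A_i} = 2^P$
7.  $A \vdash$                                                  $K_A \subseteq \emptyset$                                         $2^P \setminus K_A = 2^P$
8.  $\vdash$                                                    $2^P \subseteq \emptyset$                                         $\emptyset = 2^P$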

Table 5.3.1 The semantics of conditional and unconditional assertions 


In each row in Table 5.3.1, the inclusion relation in the “relation interpretation” column is equivalent to
the equality in the “set interpretation” column. (This follows from the fact that all knowledge sets are
necessarily subsets of $2^P$.) The inclusion relations have the advantage that they visually resemble the form
of the corresponding sequents. The assertion symbol may be replaced by the inclusion relation and the
commas may be replaced by intersection and union operations on the left and right hand sides respectively.
The sets on the left of the equalities in the “set interpretation” column are asserted to be tautologies. (The
correspondence between tautologies and the set $2^P$ is given by Definition 3.13.2 (ii).) One may think of each
sequent as signifying a set. When a sequent is asserted, the statement being asserted is that the truth value
map $\tau$ is an element of the sequent’s knowledge set for all $\tau \in 2^P$.

The assertion “$\vdash A$” may be interpreted as $\tau \in K_A$. The empty left hand side of this sequent signifies the
intersection of a list of zero knowledge sets (or an empty union of copies of $2^P$), which yields $2^P$, the set of
all possible truth value maps. Thus “$\vdash A$” has the meaning $\tau \in 2^P \Rightarrow \tau \in K_A$. Since the truth value map $\tau$
is an implicit universal variable, the logical implication is equivalent to the set-inclusion relation $2^P \subseteq K_A$.
Since the constraint $\tau \in 2^P$ is always in force, this inclusion relation is equivalent to the equality $K_A = 2^P$.
Thus logical expressions which are asserted without preconditions are always tautologies in a purely logical
system. (In a system with non-logical axioms, the set $2^P$ is replaced with some subset of $2^P$.)


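The sequent “$A \vdash B$” may be interpreted as the inclusion relation $K_A \subseteq K_B$, which means that if $A$ is true
then $B$ is true. The sequent “$A \vdash B$” may also be interpreted as the set $(2^P \setminus K_A) \cup K_B$, which is also the
knowledge set for the compound proposition “$A \Rightarrow B$”. If this sequent is asserted in a purely logical system
(i.e. without non-logical axioms), then it is being asserted that $(2^P \setminus K_A) \cup K_B$ equals the set $2^P$.


It is easily seen that $K_A \subseteq K_B$ if and only if $(2^P \setminus K_A) \cup K_B = 2^P$ for any sets $K_A, K_B \in \mathbb{P}(2^P)$.


$$
\begin{aligned}
(2^P \setminus K_A) \cup K_B = 2^P
  &\Leftrightarrow 2^P \setminus ((2^P \setminus K_A) \cup K_B) = \emptyset \\
  &\Leftrightarrow K_A \cap (2^P \setminus K_B) = \emptyset \\
  &\Leftrightarrow K_A \setminus K_B = \emptyset \\
  &\Leftrightarrow K_A \subseteq K_B.
\end{aligned}
$$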


This justifies the set interpretations in the above table where there are one or more propositions on the left 
of the assertion symbol. 
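
The equivalence between the inclusion relation and the set equality can also be confirmed by brute force in a small case. The following Python sketch assumes a hypothetical domain of two propositions, so that there are four truth value maps and sixteen candidate knowledge sets. The names `ALL_MAPS`, `subsets`, `inclusion` and `set_form` are chosen for this illustration only.

```python
from itertools import combinations, product

PROPS = ["p", "q"]  # hypothetical proposition domain for this example
# All truth value maps on PROPS, each stored as a tuple of booleans.
ALL_MAPS = frozenset(product([False, True], repeat=len(PROPS)))

def subsets(s):
    """Yield every subset of s as a frozenset."""
    items = list(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def inclusion(k_a, k_b):
    """Relation interpretation: K_A is a subset of K_B."""
    return k_a <= k_b

def set_form(k_a, k_b):
    """Set interpretation: (ALL_MAPS minus K_A) union K_B equals ALL_MAPS."""
    return (ALL_MAPS - k_a) | k_b == ALL_MAPS

checked = 0
agree = True
for k_a in subsets(ALL_MAPS):
    for k_b in subsets(ALL_MAPS):
        checked += 1
        agree = agree and (inclusion(k_a, k_b) == set_form(k_a, k_b))
```

Such a check is no substitute for the four-line argument above, which holds for any proposition domain, but it gives a quick sanity test of the table entries.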


Every sequent in a purely logical system is effectively a tautology. This fact is important in regard to the 
design of a predicate calculus. In the Hilbert-style predicate calculus systems, there are no left-hand sides 
for the lines of a proof. Each line is asserted to be a tautology in a purely logical system (or it is asserted to 
be a consequence of the full set of axioms in a system with non-logical axioms). In a sequent-based “natural 
deduction” system, each line of an argument signifies a (single-output) sequent, and the inference rules are 
designed to efficiently manipulate those sequents. The principal advantage of such sequent-based argument 
systems is that they more closely resemble the way in which mathematicians construct proofs. As a bonus, 
the proofs are generally shorter and easier to discover in a sequent-based system. 


The idea that the intersection of a sequence of zero knowledge sets yields the set $2^P$ is consistent with
the inductive rule that each additional knowledge set $K_{A_i}$ in the sequence $K_{A_1}, K_{A_2}, \ldots K_{A_m}$ restricts the
resultant set. Since all knowledge sets are subsets of $2^P$, the smallest set for $m = 0$ is $2^P$. Anything
larger would go outside the domain of interest. Another way of looking at this is to think of $\bigcap_{i=1}^m K_{A_i}$
as $2^P \setminus \bigcup_{i=1}^m (2^P \setminus K_{A_i})$. Then the empty intersection for $m = 0$, which is normally not well defined, is
now well defined as $2^P \setminus \bigcup_{i=1}^0 (2^P \setminus K_{A_i})$, which equals $2^P \setminus \emptyset$, which equals $2^P$. (The union of an empty
family of sets is well defined and equal to the empty set by Theorem 8.4.6 (i).) Therefore a sequent of the
form “$A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$” with $m = 0$ is consistent with an unconditional sequent of the form
“$\vdash B_1, B_2, \ldots B_n$”. In other words, the unconditional style of sequent is a special case of the corresponding
conditional sequent with $m = 0$.


Similarly, when $n = 0$, so that the right-hand-side proposition list “$B_1, B_2, \ldots B_n$” is empty, the natural
limit of the knowledge-set union $\bigcup_{j=1}^n K_{B_j}$ is the empty set $\emptyset$. Then the interpretation of the sequent is
that the left-hand-side knowledge set $\bigcap_{i=1}^m K_{A_i}$ is a subset of $\emptyset$, which is only true if there is no truth value
map which is an element of all of the knowledge sets $K_{A_i}$. This means that the composite logical expression
$A_1 \wedge A_2 \wedge \ldots A_m$ is a contradiction.

The case n = 0 is of only peripheral interest here. In this book, only sequents with exactly one consequent 
formula will be used in the development of predicate calculus and first-order logic. 

In the degenerate case $m = n = 0$, the almost-empty sequent “$\vdash$” signifies $\emptyset = 2^P$, which is not possible
even if $P = \emptyset$ because $2^\emptyset = \{\emptyset\} \ne \emptyset$ by Theorem 10.2.26 (ii). The set of empty functions from $\emptyset$ to $P$ is not
the same as the empty set which contains no functions. (It's okay to make forward references to set-theory
theorems here because “anything goes” in metamathematics!)
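
The conventions for the boundary cases m = 0 and n = 0 can be sketched in Python as follows. This is an illustration under stated assumptions: truth value maps are represented as frozensets of (name, value) pairs, and the helper names `all_maps`, `intersection`, `union` and `compound_sequent_holds` are hypothetical. The empty intersection is made to yield the whole space by using the space itself as the starting value.

```python
from functools import reduce
from itertools import product

def all_maps(props):
    """Return the set of all truth value maps on props.

    Each map is a frozenset of (name, value) pairs, so an empty list of
    propositions yields exactly one map, namely the empty map.
    """
    return frozenset(
        frozenset(zip(props, values))
        for values in product([False, True], repeat=len(props))
    )

def intersection(family, universe):
    """Intersect a list of subsets of universe; an empty list yields universe."""
    return reduce(lambda acc, k: acc & k, family, universe)

def union(family):
    """Unite a list of sets; an empty list yields the empty set."""
    return reduce(lambda acc, k: acc | k, family, frozenset())

def compound_sequent_holds(k_as, k_bs, universe):
    """Test whether the left intersection is included in the right union."""
    return intersection(k_as, universe) <= union(k_bs)

maps = all_maps(["p"])
k_p = frozenset(t for t in maps if ("p", True) in t)
k_not_p = maps - k_p

m0 = intersection([], maps) == maps                    # m = 0 gives the whole space
n0 = compound_sequent_holds([k_p, k_not_p], [], maps)  # n = 0 with a contradictory left side
empty_sequent = compound_sequent_holds([], [], maps)   # m = n = 0
no_props = all_maps([])                                # one empty map, not the empty set
empty_sequent_no_props = compound_sequent_holds([], [], no_props)
```

The last two lines mirror the observation that the almost-empty sequent fails even for an empty proposition domain, because the space of truth value maps then still contains the empty map.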


5.3.11 REMARK: Semantics of logical quantifiers. 
The interpretation of some kinds of universal and existential expressions in sequents in a purely logical 
system is suggested by the examples in the following table. 
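
sequent                                               relation interpretation                                                    set interpretation

$\vdash \forall x, \beta(x)$                          $2^P \subseteq \bigcap_{x \in U} K_{\beta(x)}$                             $\bigcap_{x \in U} K_{\beta(x)} = 2^P$
$\vdash \exists x, \beta(x)$                          $2^P \subseteq \bigcup_{x \in U} K_{\beta(x)}$                             $\bigcup_{x \in U} K_{\beta(x)} = 2^P$
$\forall x, \alpha(x) \vdash \exists y, \beta(y)$     $\bigcap_{x \in U} K_{\alpha(x)} \subseteq \bigcup_{y \in U} K_{\beta(y)}$  $(2^P \setminus \bigcap_{x \in U} K_{\alpha(x)}) \cup \bigcup_{y \in U} K_{\beta(y)} = 2^P$
$\forall x, \alpha(x) \vdash$                         $\bigcap_{x \in U} K_{\alpha(x)} \subseteq \emptyset$                      $2^P \setminus \bigcap_{x \in U} K_{\alpha(x)} = 2^P$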


A conditional assertion of the form “$\forall x, \alpha(x) \vdash \exists y, \beta(y)$” is very similar to a conditional assertion of the
form “$A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$”. Their interpretations are also very similar. Instead of finite sets of
indices $1 \ldots m$ and $1 \ldots n$, however, the quantified expressions “$\forall x, \alpha(x)$” and “$\exists y, \beta(y)$” have the implied
index set $U$, which is the universe of parameters for propositions.

The similarity between the sequents “$\forall x, \alpha(x) \vdash \exists y, \beta(y)$” and “$A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$” has some
relevance for the design of a predicate calculus for quantified expressions based on inspiration from the much
simpler propositional calculus for unquantified expressions.
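
The reading of quantifiers as intersections and unions over the index set can be made concrete when the universe is finite. The following Python sketch assumes a hypothetical three-element universe and two parametrised families named "alpha" and "beta", which are illustrative stand-ins only. In general the universe of parameters may be infinite, in which case no such enumeration is available.

```python
from itertools import product

U = [0, 1, 2]  # hypothetical finite universe of parameters
PROPS = [(name, x) for name in ("alpha", "beta") for x in U]
# Every truth value map on the six propositions alpha(x), beta(x) for x in U.
MAPS = [dict(zip(PROPS, values))
        for values in product([False, True], repeat=len(PROPS))]

def K(name, x):
    """Knowledge set of name(x): the indices of the maps in which it is true."""
    return frozenset(i for i, t in enumerate(MAPS) if t[(name, x)])

def K_forall(name):
    """Knowledge set of the universal expression: intersection over U."""
    result = frozenset(range(len(MAPS)))
    for x in U:
        result &= K(name, x)
    return result

def K_exists(name):
    """Knowledge set of the existential expression: union over U."""
    result = frozenset()
    for x in U:
        result |= K(name, x)
    return result

# "for all x, alpha(x)" entails "for some y, alpha(y)" because U is non-empty.
valid = K_forall("alpha") <= K_exists("alpha")
# "for all x, alpha(x)" does not entail "for some y, beta(y)".
not_valid = K_forall("alpha") <= K_exists("beta")
```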


5.3.12 REMARK: Representation of disjunctions of propositions as sequents. 

As a curiosity, one may note that the disjunction of two antecedent propositions in a sequent may be 
represented as the conjunction of two sequents. To be more precise, suppose that one wishes to represent 
the sequent “$\alpha_1 \vee \alpha_2 \vdash \beta$” as a sequent using a comma instead of the logical disjunction operator. This is
not possible because the comma is understood to mean a conjunction of propositions. However, the single
sequent “$\alpha_1 \vee \alpha_2 \vdash \beta$” is equivalent to the conjunction of the two sequents “$\alpha_1 \vdash \beta$” and “$\alpha_2 \vdash \beta$”. This
idea is put to practical application when it is observed that a sequent of the form $\exists x, \alpha(x) \vdash \beta$ is equivalent
to the sequent $\alpha(\hat{x}) \vdash \beta$ because the presence of the free variable $\hat{x}$ causes the sequent to be universalised,
which is equivalent to the conjunction of all individual sequents of that form.
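
In terms of knowledge sets, the equivalence for a disjunctive antecedent says that a union of two sets is included in a third set if and only if each of the two sets is included in it. The following Python sketch checks this exhaustively over a hypothetical three-element stand-in for the space of truth value maps. The names `UNIVERSE` and `subsets` are assumptions made for this example.

```python
from itertools import combinations

UNIVERSE = frozenset(range(3))  # hypothetical stand-in for the set of truth value maps

def subsets(s):
    """Yield every subset of s as a frozenset."""
    items = list(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

checked = 0
agree = True
for k_1 in subsets(UNIVERSE):
    for k_2 in subsets(UNIVERSE):
        for k_b in subsets(UNIVERSE):
            checked += 1
            single = (k_1 | k_2) <= k_b            # one sequent with a disjunctive antecedent
            pair = (k_1 <= k_b) and (k_2 <= k_b)   # conjunction of two sequents
            agree = agree and (single == pair)
```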


5.3.13 REMARK: Definition of simple and compound sequents. 

The notation $\mathbb{Z}_0^+$ for the non-negative integers in Definition 5.3.14 is given in Notation 14.4.5. The circularity
of this kind of definition is acceptable because it is a metadefinition, and anything goes in metadefinitions.
The same holds for $\mathbb{N}_m$ and $\mathbb{N}_n$, which are defined in Notation 14.1.21.


5.3.14 DEFINITION[MM]: A (simple) sequent is an expression of the form “$A_1, A_2, \ldots A_m \vdash B$” for some
sequence of logical expressions $(A_i)_{i=1}^m$, for some $m \in \mathbb{Z}_0^+$, and some logical expression $B$.


A compound sequent is an expression of the form “$A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$” for some sequence
of logical expressions $(A_i)_{i=1}^m$, for some $m \in \mathbb{Z}_0^+$, and some sequence of logical expressions $(B_j)_{j=1}^n$ for
some $n \in \mathbb{Z}_0^+$.


The meaning of a simple sequent “$A_1, A_2, \ldots A_m \vdash B$” is the knowledge set inclusion relation
$\bigcap_{i=1}^m K_{A_i} \subseteq K_B$, where $K_{A_i}$ is the meaning of “$A_i$” for all $i \in \mathbb{N}_m$, and $K_B$ is the meaning of “$B$”.


The meaning of a compound sequent “$A_1, A_2, \ldots A_m \vdash B_1, B_2, \ldots B_n$” is the knowledge set inclusion
relation $\bigcap_{i=1}^m K_{A_i} \subseteq \bigcup_{j=1}^n K_{B_j}$, where $K_{A_i}$ is the meaning of “$A_i$” for all $i \in \mathbb{N}_m$, and $K_{B_j}$ is the meaning
of “$B_j$” for all $j \in \mathbb{N}_n$.


5.3.15 REMARK: Provability and semantics of sequents. 

The question of whether the assertion symbol signifies provability in Definition 5.3.14 is avoided by providing 
a semantic test for the truth of each sequent. If the test does not deliver an answer, one must accept that the 
truth or falsity of the assertion is unknown. In all cases, a sequent is either true or false, with no distinction 
between provability (within a theory) and truth (in all models). There is only one model. So model theory 
doesn’t have a substantial role here. 


6.0.1 REMARK: Pure predicate calculus has no non-logical axioms.

One could perhaps describe predicate calculus which has no non-logical axioms as “predicate tautology 
calculus” because every line in a predicate calculus argument represents a simple sequent which is effectively 
a tautology. (See Remark 4.3.2 for a corresponding observation for propositional calculus. See Figure 5.0.1 
in Remark 5.0.1 for the relations between predicate logic, predicate logic with equality, and first order 
languages. See Definition 5.3.14 for simple sequents.) 


As mentioned in Remark 3.16.2, constant predicates and non-logical axioms may be added to pure predicate 
calculus to form a first-order language. In this book, “predicate calculus” will generally mean the pure 
predicate calculus, which takes no account of any constant predicates, object-maps or non-logical axioms. 


6.1. Predicate calculus considerations 


6.1.1 REMARK: Abbreviation for predicate calculus. 

In this book, QC is an abbreviation for predicate calculus. The Q suggests the word “quantifier”. In fact, 
a more accurate name for predicate calculus would be “quantifier calculus”. (See Remark 4.5.5 for the 
corresponding tag PC for propositional calculus.) 


6.1.2 REMARK: The need for semantics to guide the axiomatic method for predicate calculus. 

Truth tables are a tabular representation of the method of exhaustive substitution. When there are N 
propositions in a logical expression, there are $2^N$ combinations to test. When there are infinitely many
propositions, the exhaustive substitution method is clearly not applicable. However, since even predicate 
calculus logical formulas contain only finitely many symbols, it seems reasonable to hope for the discovery 
of some finite procedure of determining whether an expression is a tautology or not. 
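
For the propositional case, the method of exhaustive substitution is short enough to write out in full. The following Python sketch is an illustration only, with hypothetical helper names `truth_table`, `is_tautology` and `implies`. It tabulates all $2^N$ combinations for an expression in N propositions and tests Peirce's law as an example.

```python
from itertools import product

def truth_table(expr, n):
    """Return the rows (values, result) for an expression in n propositions."""
    return [(values, expr(*values))
            for values in product([False, True], repeat=n)]

def is_tautology(expr, n):
    """Exhaustive substitution: the expression must be true in every row."""
    return all(result for _, result in truth_table(expr, n))

def implies(a, b):
    """Truth function of the implication operator."""
    return (not a) or b

# Peirce's law ((p => q) => p) => p, a tautology in two propositions.
peirce = lambda p, q: implies(implies(implies(p, q), p), p)
# A bare implication p => q is not a tautology.
bare = lambda p, q: implies(p, q)

table = truth_table(peirce, 2)
```

The number of rows doubles with each additional proposition, so the method is finite but already impractical for moderately large N, and it has no direct counterpart when the family of propositions is infinite.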


E. Mendelson [370], page 56, comments as follows on the existence of a truth-table-like procedure for the 
evaluation of logical expressions involving parametrised families of propositions. 


In the case of the propositional calculus, the method of truth tables provides an effective test as to 
whether any given statement form is a tautology. However, there does not seem to be any effective 
process to determine whether a given wf is logically valid, since, in general, one has to check the 
truth of a wf for interpretations with arbitrarily large finite or infinite domains. In fact, we shall see 
later that, according to a fairly plausible definition of “effective”, it may actually be proved that 
there is no effective way to test for logical validity. The axiomatic method, which was a luxury in
the study of the propositional calculus, thus appears to be a necessity in the study of wfs involving 
quantifiers, and we therefore turn now to the consideration of first-order theories. 


This observation does not imply that parametrised families of propositions can derive their validity only 
by deduction from axioms. The semantics of parametrised propositions is not entirely different to the 
semantics of unparametrised propositions. The introduction of a space of variables, together with universal 
and existential quantifiers to indicate infinite conjunctions and disjunctions respectively, does not imply 
that the semantic foundations are irrelevant to the determination of truth and falsity of logical expressions. 
The axioms of a predicate calculus are justified by semantic considerations. It is therefore possible to 
similarly examine all logical expressions in predicate calculus to determine their validity by reference to their 
meaning. However, the pursuit of this approach rapidly leads to a set of procedures which closely resemble 
the axiomatic approach. (This is demonstrated particularly by the semantic interpretations for predicate 
calculus in Section 6.4.) 


The axiomatisation of logic is merely a formalisation of a kind of “logical algebra”. There is an analogy 
here to numerical algebra, where a proposition such as $x^2 + 2x = (x + 1)^2 - 1$ may be shown to be true
for all $x \in \mathbb{R}$ without needing to substitute all values of $x$ to ensure the validity of the formula. One uses
algebraic manipulation rules which preserve the validity of propositions, such as “add the same number to 
both sides of an equation” and “multiply both sides of the equation by a non-zero number”, together with 
the rules of distributivity, commutativity, associativity and so forth. But these rules are based directly on 
the semantics of addition and multiplication. The symbol manipulation rules are valid if and only if they 
match the semantics of the arithmetic operations. In the same way, the axioms of predicate calculus are not
arbitrary or optional. The predicate calculus axioms and rules derive their validity from the propositional 
calculus, which derives its axioms and rules from the semantics of propositional logic. 


Consequently, the validity of predicate calculus is derived from exhaustive substitution in principle. It is not 
possible to carry out exhaustive substitution for infinite families of propositions. But the rules and axioms 
are determined from a study of the meanings of logical expressions. 


Thus predicate calculus is a formalisation of the “algebra” of parametrised families of propositions, and the 
objective of this algebra is to “solve” for the truth values of particular propositions and logical expressions, 
and also to show equality (or other relations) between the truth value functions represented by various logical 
expressions which may involve quantifiers. 


There is a temptation in predicate calculus to regard truth as arising solely from manipulations of symbols 
according to apparently somewhat arbitrary rules and axioms. The truth of a proposition does not arise 
from line-by-line deduction rituals. The truth arises from the underlying semantics of the proposition, which 
is merely discovered or inferred by means of line-by-line argument. 


The above comment by E. Mendelson [370], page 56, continues into the following footnote. 


There is still another reason for a formal axiomatic approach. Concepts and propositions which 
involve the notion of interpretation, and related ideas such as truth, model, etc., are often called 
semantical to distinguish them from syntactical precise formal languages. Since semantical notions 
are set-theoretic in character, and since set theory, because of the paradoxes, is considered a rather 
shaky foundation for the study of mathematical logic, many logicians consider a syntactical ap- 
proach, consisting in a study of formal axiomatic theories using only rather weak number-theoretic 
methods, to be much safer. 


Placing all of one’s hopes on abstract languages, detached from semantics, seems like abandoning the original 
problem because it is difficult. The validity of predicate calculus is derived from set-theory semantics. This 
cannot be avoided. The detachment of logic from set theory creates vast “opportunities” to deviate from the 
natural meanings of logical propositions and walk off into territory populated by ever more bizarre paradoxes 
of a logical kind. At the very least, the axioms of predicate calculus should be based on a consideration of
some kind of underlying semantics. This should prevent aimless wandering from the original purposes of the 
study of mathematical logic. 


6.1.3 REMARK: The inapplicability of truth tables to predicate calculus. 
Lemmon [367], page 91, has the following comment on the inapplicability of the truth table approach to the 
predicate calculus. (The word “sequent” here is the same as the “simple sequent” in Definition 5.3.14.) 
At more complex levels of logic, however, such as that of the predicate calculus [...], the truth-table
approach breaks down; indeed there is known to be no mechanical means for sifting expressions 
at this level into the valid and the invalid. Hence we are required there to use techniques akin to 
derivation for revealing valid sequents, and we shall in fact take over the rules of derivation for 
the propositional calculus, expanding them to meet new needs. The propositional calculus is thus 
untypical: because of its relative simplicity, it can be handled mechanically—indeed, [...] we can 
even generate proofs mechanically for tautologous sequents. For richer logical systems this aid is 
unavailable, and proof-discovery becomes, as in mathematics itself, an imaginative process. 
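
The mechanical handling of the propositional calculus which Lemmon describes can be illustrated by a short
sketch. The following Python fragment is only an illustration and is not part of the formal development of
this book. The names is_tautology, implies, modus_ponens and affirming_consequent are chosen here for the
example. A formula in n proposition names is tested by exhausting all 2^n truth value assignments, which is
exactly the step that has no counterpart for an infinite family of propositions.

```python
from itertools import product


def is_tautology(formula, num_vars):
    """Return True if formula holds for every assignment of truth values."""
    return all(formula(*values)
               for values in product([False, True], repeat=num_vars))


def implies(p, q):
    """Truth function of the implication operator."""
    return (not p) or q


# ((p => q) and p) => q : the formula corresponding to modus ponens.
modus_ponens = lambda p, q: implies(implies(p, q) and p, q)

# ((p => q) and q) => p : "affirming the consequent", which is not valid.
affirming_consequent = lambda p, q: implies(implies(p, q) and q, p)

print(is_tautology(modus_ponens, 2))          # True
print(is_tautology(affirming_consequent, 2))  # False
```

The second formula fails for the assignment p = False, q = True, so the table has a false row.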


The idea that there is no effective means of mechanical computation to determine the truth values of 
propositions in the predicate calculus is illustrated in Figure 6.1.1. 


                          mechanical      logical
                          computation     argumentation

objects, predicates,      ?               predicate calculus       infinite quantification
relations, functions      (impossible)    (difficult)              operators

simple propositions       truth tables    propositional calculus   binary operators
                          (easy)          (difficult)

Figure 6.1.1   The non-availability of mechanical computation for predicate calculus


The requirement for the use of “the rules of derivation” is not very different to the requirement for deduction 
and reduction procedures in numerical algebra. The task of predicate calculus is to solve problems in “logical 
algebra”. If infinite sets of variables are present in any kind of algebra, numerical or logical, it is fairly clear 
that one must use rules to handle the infinite sets rather than particular instances. But this does not imply 
that the methods of argument, devoid of semantics, have a life of their own. By analogy, one may specify an 
arbitrary set of rules for manipulating numerical algebra expressions, equations and relations, but if those 
rules do not correspond to the underlying meaning of the expressions, equations and relations, those rules 
will be of recreational interest at best, and a waste of time and resources at worst. To put it simply, semantics 
does matter. 


Mathematics, at one time, was entirely an “imaginative process”. Then the communications between
mathematicians were codified symbolically to the extent that some mechanisation and automation was possible.
But now, when mechanical methods are inadequate, the application of the “imaginative process” is almost 
regarded as a necessary evil, whereas it was originally the whole business. 


One could mention a parallel here with the industrial revolution, during which a large proportion of human 
productive activity was mechanised and automated. The fact that the design, use and repair of machines 
requires human intervention, including manual dexterity and “an imaginative process”, is perhaps inconvenient
at times, but one should not forget that humans were an endangered species of monkey only a couple
of hundred thousand years ago. The automation of a large proportion of human thought by mechanising 
logical processes can be as beneficial as the automation of economic goods and services. But semantics 
defines the meaning and purposes of logical processes. So semantics is as necessary to logic as human needs 
are to the totality of economic production. One should not permit mechanised methods of logic to force 
mathematicians to conclusions which seem totally ridiculous. If the conclusions are too bizarre, then maybe 
it is the mechanised logic which needs to be fixed, not the minds of mathematicians. 


6.1.4 REMARK: The decision problem. 
Rosser [387], page 161, said the following. 
The problem of finding a systematic procedure which, for a given set of axioms and rules, will tell
whether or not any particular arbitrarily chosen statement can be derived from the given set of
axioms and rules is called the decision problem for the given set of axioms and rules. The method
of truth-value tables gives a solution to the decision problem for the statement calculus [...]. What
about a solution to the decision problem for the predicate calculus [...]. So far none has been
discovered.


Shortly after this remark, Rosser [387], page 162, said the following. 
As we said, no solution of the decision problem for the predicate calculus has yet been discovered. 


We can say much more. There is no solution to be discovered! This result was proved by Alonzo 
Church (see Church, 1936). 


Let there be no misunderstanding here. Church’s theorem does not say merely that no solution 
has been found or that a solution is hard to find. It says that there simply is no solution at all. 
What is more, the theorem continues to hold as we add further axioms (unless by mischance we 
add axioms which lead to a contradiction, so that every statement becomes derivable). Applying 
Church’s theorem to our complete set of axioms for mathematics, we conclude that there can never 
be a systematic or mechanical procedure for solving all mathematical problems. In other words, 
the mathematician will never be replaced by a machine. 


This makes clear the underlying motivation of formal mathematical logic, which is to mechanise mathematics. 
In this connection, the following paragraph by Curry [350], page 357, is of interest. (The system “HK*” is a 
formulation of predicate calculus.) 


For a long time the study of the predicate calculus was dominated by the decision problem
(Entscheidungsproblem). This is the problem of finding a constructive process which, when applied to a given
proposition of HK*, will determine whether or not it is an assertion. [...] Since a great variety of
mathematical problems can be formulated in the system, this would enable many important mathematical
problems to be turned over to a machine. Much effort in the early 1930’s went into special
results bearing on this problem. In 1936, Church showed [...] that the problem was unsolvable. 


This shows once again the mechanisation motivation in the formalisation of mathematical logic. 


6.1.5 REMARK: Survey of predicate calculus formalisations. 
A rough summary of some predicate calculus formalisations in the literature is presented in Table 6.1.2. 


The symbol notations in Table 6.1.2 have been converted to the notations of this book for easy comparison. 
In the “axioms” column, the first number in each pair is the number of propositional calculus axioms, and the 
second is the number of axioms which contain quantifiers. “Taut” means that the propositional axioms are all 
tautologous propositions, which are determined by truth tables or some other means. The inference rules are 
abbreviated as “Subs” (one or more substitution rules), “MP” (Modus Ponens), “Gen” (the generalisation 
rule). Since the hypothesis introduction rule is assumed to be available in all systems, it is not mentioned 
here. For any additional rules, only the number of them is given. (See Remark 4.2.1 for the propositional 
calculus version of this table.) 


6.1.6 REMARK: Criteria for the choice of a suitable predicate calculus. 
For this book, there are three principal criteria for the choice of a predicate calculus. 


(1) Provide a framework for efficiently proving logic theorems.
(2) Provide clarification of the underlying meaning of logical propositions and arguments.
(3) Provide a convenient and expressive notation to express logical ideas.


Many formulations of predicate calculus are oriented towards research into rather abstract (and arcane) 
questions in model theory and proof theory. The most general formulations are best suited to algebraic 
systems which are far beyond the scope and requirements of Zermelo-Fraenkel set theory. 


For differential geometry, one merely requires a method for ensuring that no false arguments are used in 
proofs of theorems. One should preferably have total confidence in all logical arguments which are used. 
This confidence has two aspects. One must have confidence in the basis which justifies the predicate calculus 
itself, and one must have confidence in the methods which are derived from that basis. 
year  author                                 basic operators       axioms   inference rules

1928  Hilbert/Ackermann [358], pages 65-81   ¬, ∧, ∨, ⇒, ∀, ∃      4+2      Subs, MP
1934  Hilbert/Bernays [359], pages 104-105   ¬, ∧, ∨, ⇒, ⇔, ∀, ∃   15+2     Subs, MP+2 rules
1944  Church [348], pages 168-176            ¬, ⇒, ∀               3+2      Subs, MP, Gen
1950  Rosenbloom [386], pages 69-88          ¬, ⇒, ∀               3+1      MP
1952  Kleene [365], pages 69-85              ¬, ∧, ∨, ⇒, ∀, ∃      10+2     MP+2 rules
1952  Wilder [403], pages 234-236            ¬, ∧, ∨, ⇒, ∀, ∃      5+2      Subs, MP+2 rules
1953  Rosser [387], pages 101-102            ¬, ∧, ⇒, ∀            3+4      MP
1957  Suppes [394], pages 98-99              ¬, ∧, ∨, ⇒, ⇔, ∀, ∃   Taut+0   3+7 rules
1958  Bernays [341], pages 45-52             ¬, ∧, ∨, ⇒, ⇔, ∀, ∃   Taut+2   Subs, MP+2 rules
1963  Curry [350], page 344                  ¬, ⇒, ∀, ∃            3+4      Subs, MP, Gen
1963  Curry [350], page 344                  ¬, ⇒, ∀, ∃            3+6      Subs, MP, Gen
1963  Stoll [393], pages 387-390             ¬, ⇒, ∀               3+2      MP, Gen
1964  E. Mendelson [370], pages 45-58        ¬, ⇒, ∀               3+2      MP, Gen
1965  Lemmon [367], pages 92-159             ¬, ∧, ∨, ⇒, ∀, ∃      0+0      10+4 rules
1967  Kleene [366], pages 107-112            ¬, ∧, ∨, ⇒, ⇔, ∀, ∃   13+2     MP+2 rules
1967  Margaris [369], pages 47-51            ¬, ⇒, ∀               3+3      MP, Gen
1967  Shoenfield [390], pages 14-23          ¬, ∨, ∃               1+1      MP+5 rules
1969  Bell/Slomson [339], pages 50-61        ¬, ∧, ∃               9+2      MP, Gen
1969  Robbin [384], pages 32-44              ⊥, ⇒, ∀               3+2      MP, Gen
1973  Chang/Keisler [347], pages 18-25       ¬, ∧, ∀               Taut+2   MP, Gen
1993  Ben-Ari [340], page 158                ¬, ⇒, ∀               3+2      MP, Gen
      Kennington                             ¬, ⇒, ∀, ∃            3+0      Subs, MP+5 rules

Table 6.1.2   Survey of predicate calculus formalisation styles


When there is some doubt about the basis or methods, individual mathematicians should be able to pick and 
choose which parts of the theory to accept. It is also highly desirable for every mathematician to use the same 
consensus-approved logical system. This suggests that the basic logical system should be minimal, subject 
to providing all required methods. The chosen logical system should be as compact and pruned-down as 
possible with no superfluous concepts, but it should cover all of the requirements of day-to-day mathematics. 


The formulations of predicate calculus in the overview in Table 6.1.2 (in Remark 6.1.5) give some idea of
the wide variety in the literature. However, the variety is very much greater than suggested by such an 
overview. Most of these systems are incompatible without major adjustments to accommodate them. Proofs 
in one logical system mostly cannot be imported into others. The axioms in one system are often expressed 
as rules or definitions in another system, and vice versa. Hence it is important to choose a system which 
makes the import and export of theorems and definitions as painless as possible (although the pain cannot 
be reduced to zero). 


6.1.7 REMARK: Undesirability of derived inference rules with metamathematical proofs. 

In addition to the criteria in Remark 6.1.6, it is desirable to avoid as far as possible any reliance on inference 
rules which are provided by metatheorems. Inference rules which are proved metamathematically defeat the 
main purpose of formalisation, which is to provide a high level of certainty in the validity and verifiability of 
proofs. Metamathematical proofs generally exhibit the kind of informality which formal logic is supposed to 
avoid. Such proofs often require naive induction, which introduces a serious circularity in logical reasoning. 
It seems slightly absurd to assume axioms which are not obvious so as to meta-prove rules of inference which 
are obvious. The whole raison d’être of axiomatisation is to start with a statement of the obvious, which
then provides a solid basis for proving the not-so-obvious. (A similar comment is made in Remark 4.9.5.) 


As an example, “Rule C” is presented by some authors as a primary rule which does not need to be proved 
from other rules and axioms (for example, Kleene [365], pages 82, 98-100), while others regard it as a provable 
metatheorem (for example, Rosser [387], pages 128-139, 147-149; Margaris [369], pages 40-42, 78-88, 191; 
Kleene [366], pages 118-119; E. Mendelson [370], pages 73-75). Similarly, some authors present predicate
calculus frameworks which depend very heavily on the deduction metatheorem. 
6.1.8 REMARK: Desirability of harmonising predicate calculus with conjunction and disjunction axioms.
Another criterion for the choice of a predicate calculus is compatibility with intuition. The axioms and rules
should be obvious, and their application should be obvious. In terms of the underlying semantics, a universal
quantifier is a multiple-AND operator, and an existential quantifier is a multiple-OR operator. Therefore for
a finite universe to quantify over, one would expect the axioms and rules for the quantifiers to resemble the
axioms for the ∧-operator and ∨-operator respectively. There are various propositional calculus frameworks
in the literature which give axioms for the basic operators ¬, ⇒, ∧ and ∨. (See for example Kleene [365],
page 82; E. Mendelson [370], pages 40-41; Kleene [366], pages 15-16.) Since all properties of these operators
can be deduced from their axioms, one could reasonably expect that similar axioms for the ∀-quantifier and
∃-quantifier would yield the expected deductions for a predicate calculus.
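
The multiple-AND and multiple-OR interpretation of the quantifiers over a finite universe may be sketched as
follows. This Python fragment is an informal illustration only. The function names forall and exists, the
universe U and the two sample predicates are assumptions made for the example. Each quantifier is written
out as a repeated binary operator so that the resemblance to the ∧-operator and ∨-operator is explicit.

```python
def forall(universe, predicate):
    """Universal quantifier over a finite universe, computed as a repeated AND."""
    result = True
    for x in universe:
        result = result and predicate(x)
    return result


def exists(universe, predicate):
    """Existential quantifier over a finite universe, computed as a repeated OR."""
    result = False
    for x in universe:
        result = result or predicate(x)
    return result


U = range(1, 7)                      # the finite universe {1, 2, 3, 4, 5, 6}
is_positive = lambda x: x > 0
is_even = lambda x: x % 2 == 0

print(forall(U, is_positive))        # True
print(forall(U, is_even))            # False
print(exists(U, is_even))            # True

# The duality of the quantifiers mirrors De Morgan's law for AND and OR.
print((not forall(U, is_even)) == exists(U, lambda x: not is_even(x)))  # True
```

For an infinite universe, the two loops never terminate in general, which is why axioms and rules are
needed instead of direct evaluation.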


6.1.9 REMARK: Outline of practical procedures for manipulating universal and existential propositions. 

A predicate calculus needs four kinds of fundamental rules for exploiting universal and existential expressions. 
These are the UI (universal introduction), UE (universal elimination), EI (existential introduction) and EE 
(existential elimination) rules. The concept of operation of these rules may be broadly summarised as follows. 


(UI) Assumption: A(y) for arbitrary y. Conclusion: ∀x, A(x). (Requires some work to discover a proof.)
  * One picks an arbitrary element y from a class U and demonstrates that the proposition A(y) is true.
    From this one concludes that A(x) is true for all x in U because the proof procedure did not depend on
    the choice of y.
  * For example, one may be able to prove that if y is a dog, then y is an animal, without knowing or making
    use of any particular characteristics of the dog y. If so, one may conclude that all dogs are animals.

(UE) Assumption: ∀x, A(x). Conclusion: A(y) for arbitrary y. (Requires no work at all. Just substitution.)
  * If one knows that A(x) is true for all x, then one may conclude that A(y) for an arbitrary choice of y.
  * For example, if one knows that all dogs are animals, one may pick any dog at random and conclude
    that it is an animal without needing to consider any of its individual characteristics.

(EI) Assumption: A(y) for some y. Conclusion: ∃x, A(x). (Requires some work to find an example.)
  * One finds some element y from a class U and discovers that A(y) is true. From this one concludes
    that A(x) is true for some x in U.
  * For example, if one discovers a single black swan y, this immediately justifies the conclusion that x is
    black for some swan x. So this inference rule is very easy to apply after the individual y has been found,
    but it is not always clear how to discover y in the first place. Sometimes there is no known procedure
    for discovering it.

(EE) Assumption: ∃x, A(x). Conclusion: A(y) for some y. (May be very difficult or even impossible.)
  * If one knows that A(x) is true for some x, then one may conclude that A(y) for some choice of y.
    However, it is not at all clear how to discover this y. There may be no known discovery procedure.
  * For example, even if one is told that there is some swan that is black, one may not be able to find
    it in one lifetime. In the case of a cryptographic puzzle, for example, all the computing power on Earth
    may be insufficient to find the answer, even though it is known for certain that an answer exists.


Following the application of an elimination rule, the individual proposition which has been stripped of its 
quantifiers may be manipulated according to propositional calculus methods. So the real interest in predicate 
calculus lies in the four kinds of procedures outlined above. The universal rules are essentially concerned with
theorems which prove the validity of propositions for a large class of parameters. The existential rules are 
generally more concerned with examples and counterexamples. The discovery of a counterexample makes 
the corresponding theorem impossible to prove. Conversely, the discovery of a proof for a theorem excludes 
the possibility of finding a counterexample. Mathematical research often consists of a mixture of theorem 
proving and counterexample discovery. Whichever succeeds first precludes the possibility of the other. 


As a matter of proof strategy, the purpose of the elimination rules is to permit the de-quantified expressions 
to be manipulated according to propositional calculus, which is already well defined in Chapter 4. Following 
such propositional manipulation, the introduction rules are applied to return to quantified expressions. Hence 
a proof typically commences with one or more elimination rule applications, followed by some propositional 
calculus rule applications, followed by one or more introduction rule applications. There could be two or
more elimination/manipulation/introduction cycles in a single proof. 
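
The four kinds of rules, and the elimination/manipulation/introduction cycle, can be illustrated in a modern
proof assistant. The following fragment is a sketch in Lean 4 syntax. It is not the formal system of this
book, and the names U, A, B, h, y and hy are chosen only for the illustration. The type U plays the role of
the universe of objects.

```lean
-- (UE) From ∀x, A(x), conclude A(y) for an arbitrary y. Just substitution.
example {U : Type} (A : U → Prop) (h : ∀ x, A x) (y : U) : A y :=
  h y

-- (UI) Prove A(y) for an arbitrary y, then conclude ∀x, A(x).
-- Cycle: eliminate ∀ at y, use a propositional step (∧-elimination), reintroduce ∀.
example {U : Type} (A B : U → Prop) (h : ∀ x, A x ∧ B x) : ∀ x, A x :=
  fun y => (h y).left

-- (EI) From A(y) for a particular y, conclude ∃x, A(x).
example {U : Type} (A : U → Prop) (y : U) (hy : A y) : ∃ x, A x :=
  ⟨y, hy⟩

-- (EE) followed by (EI): unpack a witness y, manipulate, then repackage.
example {U : Type} (A B : U → Prop) (h : ∃ x, A x ∧ B x) : ∃ x, A x :=
  match h with
  | ⟨y, hy⟩ => ⟨y, hy.left⟩
```

In the last example the witness y exists only inside the match branch. It cannot appear in the final
conclusion, which is consistent with the observation that existential elimination is an intermediate step.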
6.1.10 REMARK: Predicate calculus formalisms in the literature.

It is difficult to compare the predicate calculus systems in the literature because of the plethora of notations, 
vocabulary and conceptual frameworks. In the following very brief summaries, only the quantifier axioms 
and rules are shown. In particular, the modus ponens and substitution are tacitly assumed. Propositional 
calculus axioms are not shown since they all yield the same theorems. The purpose of this summary is to 
assist comparison of the systems. (The notations and terminology are harmonised to this book. The free 
variables in each predicate are listed after its name. For example, “A(x, y)” indicates that A has the two free
variables x and y and no others.) The natural deduction systems of Suppes [394] and Lemmon [367] (which
are listed in Table 6.1.2) are not included in the following summary.


1928. Hilbert/Ackermann [358], pages 65-70.
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: A(y) ⇒ (∃x, A(x)).
  (3) Rule: From A ⇒ B(x), obtain A ⇒ (∀x, B(x)).
  (4) Rule: From B(x) ⇒ A, obtain (∃x, B(x)) ⇒ A.

1944. Church [348], page 172.
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Rule: From A, infer ∀x, A(x).

1952. Kleene [365], page 82. (Same as Hilbert/Ackermann [358].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: A(y) ⇒ (∃x, A(x)).
  (3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
  (4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.

1952. Wilder [403], page 236. (Same as Hilbert/Ackermann [358].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: A(y) ⇒ (∃x, A(x)).
  (3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
  (4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.

1953. Rosser [387], pages 101-102.
  (1) Axiom: A ⇒ (∀x, A). (No free occurrences of x in A.)
  (2) Axiom: ((∀x, A(x)) ⇒ A(y)) ⇒ ((∀x, A(x)) ⇒ (∀y, A(y))).
  (3) Axiom: (∀x, A(x, y)) ⇒ A(y, y).
  (4) Derived Rule G: From A(y), infer ∀x, A(x). (Rosser [387], pages 124-126.)
  (5) Derived Rule C: From ∃x, A(x), infer A(y). (Rosser [387], pages 126-133.)

1958. Bernays [341], page 47. (Same as Hilbert/Ackermann [358], who attribute their system to Bernays.)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: A(y) ⇒ (∃x, A(x)).
  (3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
  (4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.

1963. Curry [350], page 342.
  (1) Rule: From ⊢ ∀x, A(x), infer ⊢ A(y).
  (2) Rule: From ⊢ A(y), infer ⊢ ∀x, A(x). (Rule G.)
  (3) Rule: From A(y) ⊢ B, infer ∃x, A(x) ⊢ B. (Rule C.)
  (4) Rule: From ⊢ A(y), infer ⊢ ∃x, A(x).
1963. Curry [350], page 344. (Same as Church [348], plus two extra axioms.)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Axiom: A(y) ⇒ (∃x, A(x)).
  (4) Axiom: (∀x, (A(x) ⇒ B)) ⇒ ((∃x, A(x)) ⇒ B).
  (5) Rule: From A(y), infer ∀x, A(x).

1963. Curry [350], page 344.
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: A ⇒ ∀x, A.
  (3) Axiom: (∀x, (A(x) ⇒ B(x))) ⇒ ((∀x, A(x)) ⇒ (∀x, B(x))).
  (4) Axiom: A(y) ⇒ (∃x, A(x)).
  (5) Axiom: (∃x, A) ⇒ A.
  (6) Axiom: (∀x, (A(x) ⇒ B(x))) ⇒ ((∃x, A(x)) ⇒ (∃x, B(x))).
  (7) Rule: From A(y), infer ∀x, A(x).


1963. Stoll [393], pages 389-390. (Same as Church [348].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Rule: From A, infer ∀x, A(x).

1964. E. Mendelson [370], page 57. (Same as Church [348].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Rule: From A, infer ∀x, A(x).
  (4) Derived Rule C: From ⊢ ∃x, A(x) and A(x) ⊢ B, infer ⊢ B. (E. Mendelson [370], pages 73-75.)

1967. Kleene [366], page 107. (Same as Hilbert/Ackermann [358].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: A(y) ⇒ (∃x, A(x)).
  (3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
  (4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.

1967. Margaris [369], page 49.
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: A ⇒ (∀x, A). (No free occurrences of x in A.)
  (3) Axiom: ((∀x, A(x)) ⇒ A(y)) ⇒ ((∀x, A(x)) ⇒ (∀y, A(y))).
  (4) Axiom: ∀x, A(x) for any axiom A.
  (5) Derived Rule C: From ⊢ ∃x, A(x) and A(x) ⊢ B, infer ⊢ B. (Margaris [369], pages 40, 78-79.)

1967. Shoenfield [390], page 21.
  (1) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A. (Rule C.)

1969. Bell/Slomson [339], pages 57-58. (Same as Church [348].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Rule: From A, infer ∀x, A(x).

1969. Robbin [384], page 43. (Same as Church [348].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Rule: From A, infer ∀x, A(x).

1973. Chang/Keisler [347], pages 24-25. (Same as Church [348].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Rule: From A, infer ∀x, A(x).

1993. Ben-Ari [340], page 158. (Same as Church [348].)
  (1) Axiom: (∀x, A(x)) ⇒ A(y).
  (2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
  (3) Rule: From A(y), infer ∀x, A(x).


Eight of these frameworks use the Church [348] system, five of them use the Hilbert/Ackermann [358] system,
and five of these systems are not like the others. 


6.1.11 REMARK: Ambiguity in notations for universal and existential free variables. 

A disagreeable aspect of many of the predicate calculus frameworks in Remark 6.1.10 is the lack of distinction 
between universal and existential free variables. When a free variable is introduced in a proof, precisely the 
same notation denotes a universal free variable, which is completely arbitrary, and an existential free variable, 
which is chosen from a restricted set (or class) of “solutions” of a logical predicate. These two kinds of free 
variables have an entirely different character, and yet they are notated identically. Therefore such systems 
are highly vulnerable to error and subjectivity, since the context determines the meaning of symbols in a 
subtle way. 


When a universal free variable is introduced into a proof, it is obtained from a universal logical expression of
the form ∀x, P(x). If U denotes the universe of individual indices or objects, this expression has the informal
interpretation ∀x ∈ U, P(x), which is equivalent to ∀x, (x ∈ U ⇒ P(x)). But an existential expression of
the form ∃x, Q(x) has the informal interpretation ∃x ∈ U, Q(x), which is equivalent to ∃x, (x ∈ U ∧ Q(x)).
It is significant that the implicit logical operator is “⇒” in the first case, and “∧” in the second case.


logical         informal           equivalent
expression      interpretation     expression

∀x, P(x)        ∀x ∈ U, P(x)       ∀x, (x ∈ U ⇒ P(x))
∃x, P(x)        ∃x ∈ U, P(x)       ∃x, (x ∈ U ∧ P(x))
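
The difference between the implicit implication and the implicit conjunction can be checked on a small
finite example. The following Python sketch is illustrative only. The names forall_in, exists_in, domain, U
and is_even are assumptions made for the example, with domain standing for a finite stock of individuals
which contains the universe U as a subset.

```python
def forall_in(domain, U, P):
    """Restricted universal: for all x in domain, (x in U) implies P(x)."""
    return all((x not in U) or P(x) for x in domain)


def exists_in(domain, U, P):
    """Restricted existential: for some x in domain, (x in U) and P(x)."""
    return any((x in U) and P(x) for x in domain)


domain = range(10)
U = {2, 4, 6}
is_even = lambda x: x % 2 == 0

print(forall_in(domain, U, is_even))      # True: every element of U is even
print(exists_in(domain, U, is_even))      # True: some element of U is even

# With an empty universe the two quantifiers behave very differently.
print(forall_in(domain, set(), is_even))  # True (vacuously)
print(exists_in(domain, set(), is_even))  # False

# Using conjunction for the universal quantifier gives the wrong answer,
# because it demands that every individual in the domain belongs to U.
print(all((x in U) and is_even(x) for x in domain))  # False
```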


It seems reasonable to adopt the universal quantifier elimination rule x̂₀ ∈ U, ∀x, P(x) ⊢ P(x̂₀). In other
words, for any x̂₀ ∈ U, the proposition P(x̂₀) follows from the proposition ∀x, P(x). The analogous introduction
rule for the existential quantifier is x̌₀ ∈ U, P(x̌₀) ⊢ ∃x, P(x). In other words, for any x̂₀ ∈ U, the
proposition ∃x, P(x) follows from the proposition P(x̂₀). This gives two rules which seem reasonable and
natural for the universal and existential quantifiers.


quantifier      introduction                    elimination
expression      rule                            rule

∀x, P(x)        Rule G                          x̂₀ ∈ U, ∀x, P(x) ⊢ P(x̂₀)
∃x, P(x)        x̌₀ ∈ U, P(x̌₀) ⊢ ∃x, P(x)        Rule C


The real difficulties commence when one attempts to fill the gaps in this table. These two gaps, labelled
“Rule G” and “Rule C”, have a very different nature to each other. An introduction rule for the universal
quantifier (i.e. Rule G) would require the prior proof of P(x) for every x ∈ U. This can be achieved by
using a form of proof where x is unknown but arbitrary, and the proof argument is known (by intuition
or assumption) to be independent of the choice of x. By contrast, when one attempts to eliminate the
existential quantifier (i.e. Rule C), the resulting assertion must be that P(x) is true for some choice of x, but
how one may make such a choice is often unknowable. (For example, the discovery of a valid choice might
require more computing power than is available in the universe if the problem is “hard”.)


Many predicate frameworks have ∃-elimination rules which look like ∃x, P(x) ⊢ P(y), and ∀-elimination rules
which look like ∀x, P(x) ⊢ P(y). It is clear that P(y) has a totally different significance in these two cases,
but notationally they look the same. Hence a safer and more credible method of inference is required.


A clue to the meaning of an elimination rule like “∃x, P(x) ⊢ P(y)” is the fact that in practice, no proof ends
with the proposition P(y) for an indeterminate object y, whereas one often ends a proof with a conclusion
of the form ∃x, P(x). The last step in such a proof would be an existential introduction of some kind, not an
elimination. So existential elimination should be thought of only as an intermediate step in a proof. In other 
words, one eliminates the existential quantifier, then makes some use of the liberated proposition, and then 
introduces an existential quantifier again in some way. This helps to explain the unusual form of Rule C. 
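
The form of Rule C given by some of the authors in Remark 6.1.10, namely “from ⊢ ∃x, A(x) and A(y) ⊢ B,
infer ⊢ B”, corresponds closely to the existential eliminator of a modern proof assistant. The following
Lean 4 fragment is offered only as an illustration of this correspondence, not as the rule adopted in this
book. The names U, P, B, h and k are chosen for the example.

```lean
-- Rule C as an eliminator: the witness y is visible only inside the argument k.
-- The conclusion B is a proposition which does not mention y.
example {U : Type} (P : U → Prop) (B : Prop)
    (h : ∃ x, P x) (k : ∀ y, P y → B) : B :=
  Exists.elim h k
```

Since B is declared before the witness is introduced, the liberated proposition P y can be used only as an
intermediate step, which is the restriction described above.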


6.1.12 REMARK: The contrasting roles of elimination and introduction rules for quantifiers. 

Quantifier elimination rules are generally used in the body of a proof to narrow the focus to the quantified 
logical expression, which can then be manipulated with a propositional calculus. Following such manipulation,
quantifier introduction rules are required to conclude the proof. In predicate calculus proofs, one does
not usually wish to conclude with an intermediate variable in the final expression. 


In applications, however, one often wishes to apply a quantified expression to a particular concrete variable. 
For example, if one proves that the square of an odd integer is an odd integer, one may wish to apply this 
theorem to a particular odd integer, and if one proves that there exists a prime number greater than any 
given prime number, one may wish to find an example of a particular prime number which is larger than a 
particular given prime number. 


6.2. Requirements for a predicate calculus 


6.2.1 REMARK: Predicate structure and object-relation-function models for predicate logic. 

The term “predicate calculus” gives a hint of the first requirement of a predicate calculus. Whereas a 
propositional calculus only needs to provide a framework for propositions whose only property is that they 
are true or false, a predicate calculus must be able to “look into” the propositions to describe their structure. 
In natural language grammar, a clause may be analysed into a subject and predicate. For example, if one 
says that “the dog bit the cat”, the subject is “the dog” and the predicate is “bit the cat”. In the
subject-predicate model for clauses, the subject is thought of as a member of a class of objects, while the predicates
are thought of as functions which take objects as arguments and deliver true-or-false values. A clause as a 
whole is then thought of as a proposition which may be true or false. So the first task of a predicate calculus 
is to provide a framework for describing propositions which have at least such a subject-predicate structure. 


A further level of intra-proposition structure which must be described by a predicate calculus is the
object-relation-object form. For example, the clause “the dog is bigger than the cat” is a clause with two objects
and one relation. Such clause structure may be thought of as a boolean-valued function of two objects. The
space of objects for such relations is typically the same as the space of objects for the subject-predicate
clause-form. Both clause-forms may be thought of as boolean-valued functions of objects, with one or two
parameters respectively. The inductive instinct of mathematicians leads one to immediately embed such 
clause-forms in a framework with predicates which take any non-negative number of parameters. 
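
The view of predicates and relations as boolean-valued functions of objects can be sketched in a few lines.
The following Python fragment is an informal illustration. The universe of three animals, the SIZE
attribute and the function names it_is_raining, is_dog and is_bigger_than are all hypothetical choices made
for the example.

```python
# Hypothetical universe of objects, each with an illustrative size attribute.
SIZE = {"dog": 30, "cat": 4, "mouse": 1}
UNIVERSE = list(SIZE)


def it_is_raining():
    """Zero-parameter predicate: a proposition with no internal structure."""
    return False


def is_dog(x):
    """One-parameter predicate (subject-predicate form)."""
    return x == "dog"


def is_bigger_than(x, y):
    """Two-parameter relation (object-relation-object form)."""
    return SIZE[x] > SIZE[y]


print(is_dog("dog"))                 # True
print(is_bigger_than("dog", "cat"))  # True
print(is_bigger_than("cat", "dog"))  # False
```

All three functions deliver a truth value. They differ only in the number of object parameters.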


A second category of requirements for a predicate calculus is the need to be able to describe infinities. One of 
the most quintessential instincts of mathematicians is to generalise everything to infinite classes and infinite 
classes of infinite classes, and so forth. The unary and binary operators from which compound propositions 
are constructed in the propositional calculus are inadequate to describe conjunctions or disjunctions of infinite 
collections of propositions. The use of quantifiers to extend unary and binary operators to infinite classes is 
generally considered to be an integral part of any predicate calculus. 


Thus there are two principal technical requirements for a predicate calculus, which are independent. 

(1) Proposition structuring. Require object-predicate and object-relation structure for propositions. 
(2) Quantifier operations. Require quantifiers for infinite collections of propositions. 

One could design a calculus with predicate and relation structuring for propositions, but with no quantifiers
for infinite collections of propositions. But one could also design a calculus with quantifiers for infinite
collections of propositions, but with no structuring of those propositions.
At the very minimum, a predicate calculus for mathematics should be able to describe the language of
Zermelo-Fraenkel set theory. ZF set theory has the membership relation “∈”, which is a two-parameter
relation. Although individual sets cannot be “true” or “false”, in NBG set theory, there is a single-parameter 
predicate which is true or false according to whether a class is a set. Otherwise, it seems that set theory 
requires only relations for objects called “sets” in some universe of sets. However, it is customary to provide 
additionally “functions” in a predicate calculus. These are not the set-of-ordered-pair style of functions within 
set theory. Predicate calculus “functions” are object-valued functions with zero or more object parameters 
(as contrasted with boolean-valued functions). Although these are apparently not required for ZF set theory, 
they are required for algebra which is not explicitly based on set theory. Algebra is largely concerned with 
operations which deliver a unique object from one or two parameter objects. Thus the requirements for a 
predicate calculus customarily include the following. 


(1) Proposition structuring. Require object-predicate and object-relation structure for propositions. 
(2) Quantifier operations. Require quantifiers for infinite collections of propositions. 

(3) Object functions. Require object-functions which map from object-tuples to objects. 

To meet all of these requirements, one needs at least the following classes of entities. 

(1) Objects. A universe of objects.
(2) Relations. Classes of boolean-valued relations with zero or more object-parameters.
(3) Quantifiers. Quantifier operations for boolean-valued relations.
(4) Functions. Classes of object-valued functions with zero or more object-parameters.


A pure predicate calculus provides only quantifiers for parametrised propositions, with no explicit support 
for constant relations and functions which take objects as parameters. A pure predicate calculus also has 
no non-logical axioms. Non-logical axioms generally state some a-priori properties of constant relations and 
functions, but if such constant relations and functions are not present, it is clearly not possible to assert 
properties for them. 


6.2.2 REMARK: Parametrisation of propositions. The introduction of objects and predicates. 
A first step to go beyond a simple propositional calculus is to parametrise propositions. For example, one
may introduce index sets $I$ and organise propositions into families $(P_i)_{i \in I}$.


It is a small step from the indexed families of propositions to the introduction of a space of objects so that 
propositions can be interpreted as properties and relations of objects. (Properties and relations may be
grouped together as “predicates”.) Thus one may introduce a set of objects $U$, and the propositions may
be organised into families of the form $(P_1(i))_{i \in U}$, $(P_2(i,j))_{i,j \in U}$, $(P_3(i,j,k))_{i,j,k \in U}$, and so forth.
There may be any number of proposition families with a given number of parameters, and some families may not be
defined for the whole universe of objects $U$. Partial definition may be managed by defining a default value
(such as “false”) for the truth-value of undefined propositions. Since such proposition families have some
significance in particular applications, they can be given notations which suggest their meaning. 
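
The organisation of propositions into families with a default truth-value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-object universe, the dictionary `bigger_than` and the function name `P2` are hypothetical and do not appear elsewhere in this book.

```python
from itertools import product

# Hypothetical three-object universe; the object names are illustrative only.
U = ["a", "b", "c"]

# A partially defined two-parameter family: only some pairs are listed.
bigger_than = {("a", "b"): True, ("b", "c"): True, ("a", "c"): True}

def P2(i, j):
    """Truth value of the proposition P2(i, j); undefined pairs default to False."""
    return bigger_than.get((i, j), False)

# The whole family (P2(i, j)) for i, j in U, as a table of truth values.
family = {(i, j): P2(i, j) for i, j in product(U, repeat=2)}
```

The table `family` is just an indexed collection of truth values, which reflects the point that nothing beyond propositional logic has been added at this stage.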


With the introduction of objects and predicates, there is still in fact no significant difference between such 
a minimalist “predicate logic” and the propositional logic in Chapters 3 and 4. Even if there are finite 
rules which specify infinite sets of relations between the truth-values of propositions, these rules are part 
of the application context, not explicitly supported by the propositional calculus. For example, one might 
specify that $P(x_1, x_2)$ is a false proposition for every $x_1$ in $U$, for some $x_2$ in $U$. (Such a situation arises,
for example, with the ZF empty set axiom $\exists x_2, \forall x_1, x_1 \notin x_2$.) A propositional calculus does not permit any
conclusions to be drawn from such a universal/existential statement. Nor does it have language in which 
such a specification can be expressed. 


Although in a literal sense, a logical framework which explicitly supports notions of objects and predicates 
may be called a “predicate logic”, any “predicate calculus” for such a logic would not say much more than
is said by a simple propositional logic. Therefore one may safely reject such a very literal interpretation of
the concept of a “predicate logic”.


6.2.3 REMARK: The introduction of quantifiers. 
It is possible to introduce universal and existential quantifiers to propositional logic without first introducing 
objects and predicates. For example, if the universe of propositions is indexed in the form $(P_i)_{i \in I}$ for
some infinite index set $I$, one might specify some constraints on the truth values of the propositions with a
universal-quantifier expression such as $\forall i \in I, (P_i \Rightarrow P_{i+1})$, where $I = \mathbb{Z}_0^+$. This simply makes formal and
explicit a rule which could otherwise have been specified informally in the application context. 


It is important to indicate the scope of each quantifier. The scope of a quantifier must be a valid proposition 
for each choice of a “free variable” in the scope. By convention, the scope of a quantifier must be a contiguous 
substring of the full string. So in expressions of the form $\forall i, (A(i))$ or $\exists j, (B(j))$, the subexpressions $A(i)$
and $B(j)$ must be valid propositions for all choices of the variables $i$ and $j$. The scope of each quantifier is
delimited by parentheses. 


One possible difficulty with this approach is the assignment of meaning to subexpressions of a logical
expression. In the case of pure propositional logic, each logical subexpression is identified with a “knowledge set”.
(See Sections 3.4 and 3.6 for knowledge sets.) Therefore every subexpression in propositional logic may be 
interpreted as a knowledge set. Since quantifier logic merely extends propositional logic by the addition of 
quantifier operators, it seems plausible that a similar kind of “knowledge set” could be defined for each valid 
subexpression of a valid expression which contains quantifiers. 


6.2.4 REMARK: The combination of quantifiers, objects and predicates. 

As suggested in Remark 6.2.2, the indices for families of propositions often have the character of object 
labels, and the propositions often have the character of properties or relations. Both quantifiers and the 
object-predicate perspective naturally arise out of application contexts for any propositional logic. The ZF 
empty-set axiom, for example, demonstrates the way in which quantifiers, objects and predicates arise. In 
the rule $\exists x, \forall y, \neg(y \in x)$, the labels $x$ and $y$ refer to objects (i.e. sets in this case), the set membership
relation “$\in$” takes two parameters, and the proposition “$y \in x$” is given the false truth-value for an infinite
set of object-pairs (x, y) by a single explicit rule. 
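
As a finite illustration of how a single quantified rule fixes the truth-values of many membership propositions at once, the following Python sketch checks the empty-set rule over a small universe of hereditarily finite sets. The universe and the helper names `member` and `empty_set_axiom` are assumptions made for this sketch only. A real ZF universe is infinite and cannot be enumerated in this way.

```python
# Three hereditarily finite sets, modelled as nested frozensets.
empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
U = [empty, one, two]

def member(y, x):
    """The two-parameter membership relation: y is an element of x."""
    return y in x

def empty_set_axiom(universe):
    """Evaluate: there exists x such that for all y, not (y is an element of x)."""
    return any(all(not member(y, x) for y in universe) for x in universe)

# The propositions "y in x" which the witness x = empty makes false, one per object y.
false_pairs = [(y, empty) for y in U if not member(y, empty)]
```

Here the witness is the object `empty`, and the single rule assigns the false truth-value to every pair in `false_pairs`.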


A predicate calculus must support the explicit specification of infinite sets of relations between propositions, 
and provide methods to safely combine such sets of relations to infer conclusions. The object-predicate 
structure of propositions is not the primary task. However, the specification of infinite sets of relations 
between propositions is mostly expressed in terms of the object-predicate structure. Therefore it makes 
good sense to adopt a single predicate calculus which offers both quantifiers and object-predicate structure. 


One opportunity which is opened up by the interpretation of proposition indices as object labels is the 
possibility of introducing an equality relation. Thus when two labels x and y point to the same object, one 
may write x = y, or some such expression. (This is the subject of Section 6.7.) 


6.2.5 REMARK: Wildcards and links. 

Universal quantification of predicates may be achieved with the use of “wildcards”. For example, if a relation 
$P(x, y)$ is asserted to be true for a particular $x$ and an arbitrary $y$, one may express this as “$P(x, \forall)$”, where
the symbol “$\forall$” is thought of as a “wildcard” which can take on any value. This avoids the customary
use of quantifiers as in “$\exists x, \forall y, P(x,y)$”. This approach runs into difficulties with compound expressions
such as “$P(x_1,\forall) \Rightarrow P(x_2,\forall)$”. The meaning of such an expression is ambiguous. This can be remedied
with expressions such as “$P(x_1,\forall_1) \Rightarrow P(x_2,\forall_1)$” or “$P(x_1,\forall_1) \Rightarrow P(x_2,\forall_2)$”. The former would mean
$\forall y_1, (P(x_1,y_1) \Rightarrow P(x_2,y_1))$, while the latter could mean $(\forall y_1, P(x_1,y_1)) \Rightarrow (\forall y_2, P(x_2,y_2))$. This suggests
that “labelled wildcards” should be used. These make possible a linkage between wildcards to indicate that
they are intended to be the same, but arbitrary otherwise. The advantage of a “linked wildcards” notation
is that it avoids the syntactical clumsiness of the customary quantifier notation.


A disadvantage of the “linked wildcards” notation is ambiguity. For example, “$P(x_1,\forall_1) \Rightarrow P(x_2,\forall_2)$”
could mean either $(\forall y_1, P(x_1,y_1)) \Rightarrow (\forall y_2, P(x_2,y_2))$ or $\forall y_1, \forall y_2, (P(x_1,y_1) \Rightarrow P(x_2,y_2))$. The scope of
each quantifier must be indicated somehow. The most obvious way to indicate scope is with parentheses of
some kind. But then these parentheses must be labelled to indicate which quantifier is being scoped. So
one may as well return to the conventional notation where blocks of the form $\forall x, (\ldots)$ or $\exists x, (\ldots)$ indicate
at the commencement of the scope-block which substitution is intended for which subexpression. To see
this scoping issue more clearly, consider the expression “$\forall x, P(x) \Rightarrow Q$”, which could be notated with a
“wildcard” as “$P(\forall) \Rightarrow Q$”. It is important to distinguish between the interpretations “$\forall x, (P(x) \Rightarrow Q)$”
and “$(\forall x, P(x)) \Rightarrow Q$”, which differ only in their scope.
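
That the two scopings really are different propositions can be checked on a small finite universe. The following Python sketch is a minimal illustration. The universe `U`, the predicate `P` and the truth-value chosen for `Q` are hypothetical choices made only to exhibit a case where the two readings disagree.

```python
U = [0, 1, 2]

def P(x):
    """A hypothetical predicate which holds for exactly one object."""
    return x == 0

Q = False  # a proposition with no free variables

def implies(p, q):
    return (not p) or q

# Wide scope: for all x, (P(x) implies Q).
wide = all(implies(P(x), Q) for x in U)

# Narrow scope: (for all x, P(x)) implies Q.
narrow = implies(all(P(x) for x in U), Q)
```

With these choices the wide-scope reading is false, because `P(0)` holds while `Q` does not, whereas the narrow-scope reading is true, because its premise fails.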


In summary, three aspects of quantifiers must be indicated. 


(1) the kind of quantifier (universal or existential),
(2) the variables to which each quantifier applies (which implies linkage if more than one variable is
substituted by a single quantifier),
(3) the exact scope of each quantifier.


6.2.6 REMARK: Interpretation styles for sequents in predicate calculus arguments. 

There are, broadly speaking, three styles of interpretation for the lines which appear in predicate calculus 
arguments. Style (1) may be thought of as the “Hilbert style”. Style (2) may be thought of as the “sequent 
style” or “Gentzen style”. (The latter originated in the 1934/35 papers by Gentzen [413, 414].) The sequents 
here are the simple sequents in Definition 5.3.14. 


(1) Every line is a true proposition. 
(2) Every line is a true sequent, interpreted with universals, implications and conjunctions. 


(3) Every line is a true sequent, interpreted with existentials, universals, implications and conjunctions. 


Perhaps the strongest argument in favour of interpretation style (3) is the observation that mathematicians 
seem to believe that they are talking about an actual choice of an actual value of $x = x_0$ when a proposition
of the form “$\exists x, \alpha(x)$” is assumed in a theorem and they say, for example: “Let $x_0$ satisfy $\alpha(x_0)$.” Then they
proceed to use this “constant variable” $x_0$ to prove some other proposition $\beta$, which is then considered to be
proved from the original assumption “$\exists x, \alpha(x)$”. The choice of variable $x_0$ is apparently not thought of as part
of an implication of the form “$\forall x_0 \in U, (\alpha(x_0) \Rightarrow \beta)$”, from which the conclusion “$(\exists x_0 \in U, \alpha(x_0)) \Rightarrow \beta$” is
subsequently inferred. They do not say that since $\beta$ can be inferred from $\alpha(x_0)$ for any $x_0$, it follows that $\beta$
has been inferred from the assumption $\exists x, \alpha(x)$. In other words, the variable $x_0$ is not thought of as a free
universal variable, but rather as a fixed “existential variable”.


The “Rule C” approach has the advantage that all lines of an argument are sequents where all “free variables” 
are subject to universalisation as in argument style (2), which prevents existential semantics from being 
implemented in the way that mathematicians really work. Style (3) permits existential “constant variables” 
to appear in formal predicate calculus arguments, which is even more “natural” than the pure universalised 
sequent argument style (2). 


6.3. A sequent-based predicate calculus 


6.3.1 REMARK: Objectives of a good predicate calculus. 

A good predicate calculus should permit proofs which are short, simple, comprehensible and clearly correct. 
Proofs should also be easy to discover, construct, verify and modify. These objectives underlie the choices
which have been made in the design of the predicate calculus in Definition 6.3.9. (See also Section 6.2 for 
some discussion of the requirements for a predicate calculus.) 


6.3.2 REMARK: The importance of the safe management of temporary assumptions in proofs. 

In real-life mathematics, one generally keeps mental track of the assumptions upon which all assertions are 
based. Some of the assumptions are fairly permanent within a particular subject. Some of the assumptions 
are restricted to a narrower context than the whole subject. Some assumptions may be restricted to a 
single chapter or section of a work. Sometimes the assumptions are present only in the statement of a 
particular theorem. And quite often, assumptions are adopted and discharged during the development of a 
proof. At the beginning of a proof, it is usually fairly clear which assumptions are in force at that point, 
but as a long proof progresses, it may become unclear which temporary assumptions are still in force and 
which temporary assumptions have been discharged already. A good predicate calculus should make the 
management of temporary assumptions clear, safe and easy. (The safe, tidy management of temporary 
assumptions in proofs is discussed particularly in Section 6.5.) 


6.3.3 REMARK: A predicate calculus which uses true sequents instead of tautologies. 

The predicate calculus in Definition 6.3.9 is based on the propositional calculus in Definition 4.4.3. The 
axioms (PC 1), (PC 2) and (PC 3) are incorporated essentially verbatim. To these axioms, 7 inference rules 
are added. 


The 7 inference rules in Definition 6.3.9 are written in terms of “sequents” rather than propositions. A sequent 
is a sequence of zero or more proposition templates followed by an assertion symbol, which is followed by a 
single proposition template. (All of the sequents in Chapter 6 are simple sequents as in Definition 5.3.14.
Simple sequents have one and only one proposition on the right of the assertion symbol.) Each such sequent
essentially means that if the input templates at the left of the assertion symbol match propositions which are 
found in a logical argument, then the corresponding filled-in output template at the right of the assertion 
symbol may be inserted into the argument at any point after the appearance of the inputs. 


When a rule is applied, the list of inputs of the sequent is indicated by line numbers which refer to earlier 
propositions. Thus every line of an argument is a sequent in an abbreviated notation. (This style of 
abbreviated logical calculus was used in textbooks by Suppes [394], pages 25-150, and Lemmon [367].) In 
propositional calculus, or in a Hilbert-style predicate calculus, every line of an argument is a tautology. In a 
natural deduction style of argument, every line represents a true sequent, which is not very different to the 
true-proposition style of argument. 


6.3.4 REMARK: Non-emptiness of the universe of objects. 

The universe of objects U in Definition 6.3.9 is assumed to be non-empty. If this were not so, there would 
be no reason to extend propositional logic to predicate logic. Very often in mathematics, the degenerate 
cases of definitions and theorems are adequately handled by the same general rule which is applied to the 
non-degenerate case. However, in the case of predicate calculus, the treatment would be significantly less 
tidy if an empty universe of objects were permitted. Since predicate calculus with a non-empty universe is 
already “interesting” enough, the degenerate empty universe case is excluded here. Shoenfield [390], page 10, 
put it more succinctly as follows.


We shall restrict ourselves to axiom systems in which the universe is not empty, i.e., in which there 
is at least one individual. This is technically convenient; and it obviously does not exclude any 
interesting cases. 


The non-empty universe assumption has the advantage, for example, that one may infer $\exists x, A(x)$ from the
assumption $\forall x, A(x)$. This is seen in the proof of Theorem 6.6.7 (i), using Definition 6.3.9 rule EI, which is
valid only in a non-empty universe. In the empty universe case, the proposition $\forall x, A(x)$ is always true, and
the proposition $\exists x, A(x)$ is always false.
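
The behaviour of the two quantifiers over an empty universe can be observed directly with finite quantifiers in Python, where `all` over an empty collection is true and `any` over an empty collection is false. The helper names `forall` and `exists`, and the three-object universe used for the exhaustive check, are assumptions of this sketch.

```python
from itertools import product

def forall(universe, pred):
    """Finite universal quantifier."""
    return all(pred(x) for x in universe)

def exists(universe, pred):
    """Finite existential quantifier."""
    return any(pred(x) for x in universe)

def A(x):
    """A hypothetical predicate; its definition is irrelevant for the empty universe."""
    return x > 0

empty_universal = forall([], A)    # vacuously true
empty_existential = exists([], A)  # false, since there is no witness

# On a non-empty universe, the universal statement implies the existential one
# for every possible predicate (enumerated here as truth tables).
U = [0, 1, 2]
inference_valid = all(
    (not forall(U, lambda x: table[x])) or exists(U, lambda x: table[x])
    for table in product([False, True], repeat=len(U))
)
```

So the inference from the universal to the existential statement fails only in the excluded degenerate case.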


Only one universe of objects is specified because in practice, multiple universes are much more conveniently 
managed with single-parameter predicates which indicate which universe each object belongs to. Thus, for 
example, sets and classes in NBG set theory may be indicated by predicates “Set(...)” and “Class(...)”. The
fundamental properties of such constant predicates may be conveniently introduced via non-logical axioms. 


6.3.5 REMARK: Design of a user-friendly predicate calculus based on “true sequents”.

Natural deduction predicate calculus systems are based on “true sequents” rather than the Hilbert-style “true
propositions”. In sequents, there are two kinds of operator-like connectives which separate propositions.
Every sequent contains, in principle, a single assertion symbol “$\vdash$”, and zero or more commas before the
assertion symbol. The meaning of the assertion symbol is very closely related to the implication symbol “$\Rightarrow$”.
The meaning of the comma separator is very closely related to the conjunction symbol “$\wedge$”. Thus for example
the sequent “$\alpha_1, \alpha_2, \alpha_3 \vdash \beta$” has a meaning which is closely related to the single compound proposition
“$(\alpha_1 \wedge \alpha_2 \wedge \alpha_3) \Rightarrow \beta$”.


It is quite straightforward to represent universal quantifier elimination and introduction procedures in terms 
of implication and conjunction operators. Thus if the universe of objects is a 3-element set $U = \{x_1, x_2, x_3\}$
for example, then the predicate formula “$\forall x \in U, \alpha(x)$” may be written as “$\alpha(x_1) \wedge \alpha(x_2) \wedge \alpha(x_3)$”,
and this can be rewritten as “$\forall x, (x \in U \Rightarrow \alpha(x))$”. In the first case, only conjunctions are required,
whereas in the second case, an implication is required. The formula “$\forall x \in U, \alpha(x)$” corresponds roughly
to the sequent “$x \in U \vdash \alpha(x)$”. In other words, from the assumption $x \in U$ one infers $\alpha(x)$, no matter
which element $x \in U$ is chosen. So the sequents “$A \vdash \forall x, \alpha(x)$” and “$A, \hat{x} \in U \vdash \alpha(\hat{x})$” may be regarded as
essentially interchangeable for any ambient assumption-set $A$. It is important to note that any sequent such as
“$\alpha(x) \vdash \beta(x)$” is implicitly universal, which means that such a sequent is interpreted as “$\forall x, (\alpha(x) \Rightarrow \beta(x))$”.
Therefore the pseudo-sequent “$x \in U \vdash \alpha(x)$” is interpreted as “$\forall x, (x \in U \Rightarrow \alpha(x))$”.
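
The agreement between the conjunction form, the bounded quantifier form and the implication form of a universal statement can be checked exhaustively for a 3-element universe. In the following Python sketch, the larger ambient collection `DOMAIN` (which contains one object outside $U$) and the function name `three_forms_agree` are assumptions introduced only so that the implication form has something to exclude.

```python
from itertools import product

# Hypothetical names: the universe U sits inside a slightly larger collection of objects.
DOMAIN = ["x1", "x2", "x3", "x4"]
U = {"x1", "x2", "x3"}

def three_forms_agree(alpha):
    """alpha maps each object name to the truth value of alpha(x)."""
    # Form 1: an explicit conjunction over the three elements of U.
    conjunction = alpha["x1"] and alpha["x2"] and alpha["x3"]
    # Form 2: the bounded quantifier "for all x in U, alpha(x)".
    bounded = all(alpha[x] for x in U)
    # Form 3: the unbounded quantifier with an implication "x in U implies alpha(x)".
    guarded = all((x not in U) or alpha[x] for x in DOMAIN)
    return conjunction == bounded == guarded

# Check every possible truth-value assignment for alpha on DOMAIN.
results = [
    three_forms_agree(dict(zip(DOMAIN, values)))
    for values in product([False, True], repeat=len(DOMAIN))
]
```

All sixteen assignments agree, including those where the predicate fails for the object outside $U$, which the implication form ignores.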

The real difficulty starts with the representation of existential quantifier elimination and introduction in
terms of sequents. The predicate formula “$\exists x \in U, \alpha(x)$” for the 3-element universe $U = \{x_1, x_2, x_3\}$, for
example, may be written as “$\alpha(x_1) \vee \alpha(x_2) \vee \alpha(x_3)$”, or as “$\exists x, (x \in U \wedge \alpha(x))$”. The appearance of the
disjunction operator “$\vee$” cannot easily be implemented in terms of implication and conjunction operators,
and the existential quantifier “$\exists x$” cannot easily be implemented in terms of a universal quantifier. Therefore
it seems that existential logic may be very difficult to implement in terms of sequents. In fact, predicate
calculus systems in general seem to be not very well suited to the implementation of existential logic. To 
a great extent the cause of this is the very limited language of logic arguments. Conjunctions, implications 
and universals are apparently inherent in the culture of logical argumentation, whereas disjunctions and 
existentials are alien to the culture. 


One of the keys to the solution of this problem is the observation that one rarely, if ever, ends a proof 
with a particular choice of a free variable $x$ which exemplifies an existential statement of the form $\exists x, \alpha(x)$.
Typically any such choice will be introduced with a phrase such as: “Let $x$ be such that $\alpha(x)$ is true.”
Then the particular choice of $x$ is used for some purpose, and some conclusion is drawn from it. But the
particular choice of $x$ does not usually appear in the conclusion of the argument. Thus choices of variables
are generally stages of arguments which lead to particular conclusions. As mentioned in Remark 6.1.10, the
1928 work by Hilbert/Ackermann [358], pages 65-70, proposes a predicate calculus which includes rules for
the introduction of universal and existential quantifiers in the following style.

(3) Rule: From $A \Rightarrow B(x)$, obtain $A \Rightarrow (\forall x, B(x))$.
(4) Rule: From $B(x) \Rightarrow A$, obtain $(\exists x, B(x)) \Rightarrow A$.


In the existential case (4), the quantified expression is introduced as a premise for a conclusion $A$, whereas
in the universal case (3), the quantified expression is the conclusion of a premise $A$. This effectively means
that the proposition “$B(x) \Rightarrow A$” is interpreted as “$\forall x, (B(x) \Rightarrow A)$”, which is equivalent to
“$(\exists x, B(x)) \Rightarrow A$”. This demonstrates that expressions containing free variables are generally interpreted with an implicit
universal quantifier. By making the variable proposition $B(x)$ the premise, the universal quantifier yields the
existential expression “$\exists x, B(x)$” as the premise for $A$. This kind of “reverse logic” for existential quantifiers
is required because of the default universal interpretation when free variables are present.

In natural deduction systems (where true sequents appear on every line instead of true propositions), the 
argumentation for existential quantifier introduction and elimination is implemented principally as “Rule C”,
which is presented in the following form by E. Mendelson [370], pages 73-75, and Margaris [369], pages 78-79.

Derived Rule C: From $\vdash \exists x, A(x)$ and $A(x) \vdash B$, infer $\vdash B$.

When the implicit universal quantifier is applied to the sequent “$A(x) \vdash B$” (and the assertion symbol is
replaced with the conditional operator), the result is “$\forall x, (A(x) \Rightarrow B)$”, which is equivalent to
“$(\exists x, A(x)) \Rightarrow B$”, which may be combined with the sequent “$\vdash \exists x, A(x)$” to obtain the assertion “$\vdash B$”. Thus an existential
quantifier is again represented via the implicit universal quantifier by placing the quantified predicate on the
left of the implication operator (whose role is in this case performed by the assertion symbol).
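
The equivalence between “$\forall x, (A(x) \Rightarrow B)$” and “$(\exists x, A(x)) \Rightarrow B$”, where $B$ does not depend on $x$, can be confirmed by brute force over a small finite universe. The following Python sketch is only a finite sanity check, not a proof for general universes. The universe size and the function names are assumptions of the sketch.

```python
from itertools import product

U = [0, 1, 2]  # hypothetical finite universe

def implies(p, q):
    return (not p) or q

def equivalence_holds():
    """Compare the two forms for every predicate A on U and every truth value of B."""
    for table in product([False, True], repeat=len(U)):
        for B in (False, True):
            universal_form = all(implies(table[x], B) for x in U)
            existential_form = implies(any(table[x] for x in U), B)
            if universal_form != existential_form:
                return False
    return True

def rule_c_is_sound():
    """Whenever 'exists x, A(x)' and 'for all x, (A(x) implies B)' both hold, B holds."""
    for table in product([False, True], repeat=len(U)):
        for B in (False, True):
            has_witness = any(table[x] for x in U)
            universalised = all(implies(table[x], B) for x in U)
            if has_witness and universalised and not B:
                return False
    return True
```

The second function mirrors the way Rule C combines the two input sequents to obtain the conclusion.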


In a typical example of existential quantifiers in ordinary mathematical usage, one might have a function f 
which is somehow known to satisfy $\exists K \in \mathbb{R}, \forall t \in \mathbb{R}, |f(t)| < K$. To exploit such an assumption, one may
argue along the lines: “Choose $K \in \mathbb{R}$ such that $\forall t \in \mathbb{R}, |f(t)| < K$.” Then one may perform manipulations
with this number $K$, and later on may draw a conclusion about the function $f$ which is independent
of the choice of $K$. Thus one obtains a conclusion which does not refer to the arbitrary choice of $K$. In the
abstract predicate logic context, choosing $K$ could be written as “$\vdash \check{x} \in U \wedge A(\check{x})$” or “$\vdash \check{x} \in U, A(\check{x})$”.
Then one may manipulate the arbitrary choice of $\check{x}$ to obtain a conclusion, after which the arbitrary choice
is removed because the validity of the conclusion does not depend on this choice. Such a form of argument
may be written abstractly as the following existential elimination and introduction rules.

(EE) From $\vdash \exists x, \alpha(x)$, infer $\vdash \check{x} \in U$ and $\vdash \alpha(\check{x})$.
(EI) From $\vdash \check{x} \in U$ and $\vdash \beta(\check{x})$, infer $\vdash \exists x, \beta(x)$.


In principle, there is no problem with such an approach. The fly in the ointment here is the fact that 
the standard implicit interpretation of the sequents “$\vdash \check{x} \in U$” and “$\vdash \alpha(\check{x})$” would be $\forall x, x \in U$ and
$\forall x, \alpha(x)$, which is quite nonsensical. One really wants the interpretation $\exists x, (x \in U \wedge \alpha(x))$, which is
equivalent to $\exists x \in U, \alpha(x)$. There is no easy way to “turn off” this standard interpretation. Although the
logical argumentation style which results from this approach is valid, since it is equivalent to the “Rule C”
approach, it does not result in lines which are all valid sequents. Hence the rather clumsy “Rule C” approach
is to be preferred. Its clumsiness is not inherent in the notion of an existential quantifier nor in the arbitrary
choice of an object for such a quantifier. The clumsiness arises from the default interpretation of sequents,
which is unfairly biased in favour of universal quantifiers.

Some logical inference systems actually use different ranges of the alphabet for free variables to indicate
which ones should be universalised and which ones are constants. This is even clumsier than using, say, hat
and check notations such as $\hat{x}$ and $\check{x}$ to indicate universal and existential free variables respectively.


6.3.6 REMARK: Parametrised names for logical formulas in predicate calculus. 

In the case of propositional calculus, it is possible to write axioms, rules, theorems and proofs in terms of 
wff-names, which are simple names for general well-formed formulas. In the case of predicate calculus, wffs 
have a more complicated structure, as in Definition 6.3.9(v). This complicated structure is not a special 
problem in itself. The difficulty arises when such wffs are used in axioms, rules, theorems and proofs. These 
contexts require at least some information about the presence or absence of free variables. (The bound 
variables do not play a role in these contexts.) Therefore a form of notation is required which indicates 
enough information about free variables to ensure correct inference. 


The most important “design principle” for wff-names is that whenever the parameters are substituted with
particular variable names, the resulting expression should refer to a well-defined proposition. For example, let
$P : \mathcal{N}_V \times \mathcal{N}_V \to \mathcal{N}_P$ denote the map $P : (x, y) \mapsto$ “$x$` = $y$`” for variable names $x, y \in \mathcal{N}_V$. (See Remark 3.9.2
for the back-quote symbol “`” for dereferencing name-strings.) Then $P$(“$x_1$”, “$y_1$”) = “$x_1 = y_1$” is the name
of a well-defined proposition (in predicate calculus with equality) for any “$x_1$”, “$y_1$” $\in \mathcal{N}_V$.


Now let $Q : \mathcal{N}_V \to \mathcal{N}_P$ denote the map $Q : (x) \mapsto$ “$x$` = $y$”. When a particular variable name “$x_1$” $\in \mathcal{N}_V$
is substituted for $x$, the result is “$x_1 = y$”, where $y$ is unspecified. This is in fact not an element of $\mathcal{N}_P$
because “$x_1 = y$” is not a well-defined proposition-name unless a variable name is substituted for $y$. This
situation can be salvaged by noting that $Q(x)$ effectively yields a single-parameter predicate $R_x : \mathcal{N}_V \to \mathcal{N}_P$
defined by $R_x : (y) \mapsto Q(x, y) =$ “$x$` = $y$`” for each $x$ in $\mathcal{N}_V$. Then for each $x$ in $\mathcal{N}_V$, $R_x$ is a well-defined
map from $\mathcal{N}_V$ to $\mathcal{N}_P$. This can be expressed by writing $Q : \mathcal{N}_V \to (\mathcal{N}_V \to \mathcal{N}_P)$, which means that $Q$ is a
kind of “predicate-valued predicate” defined by $Q : (x) \mapsto ((y) \mapsto$ “$x$` = $y$`”$)$. (See Section 10.19 for extended
notations for function-spaces and map-rules.) 
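
The step from a two-parameter name-map to a “predicate-valued predicate” is the familiar operation of currying. The following Python sketch models proposition names as strings and substitution of variable names as string formatting. This modelling, and the identifiers `P`, `Q` and `R_x` as Python functions, are assumptions made for illustration and are not part of the formal definitions.

```python
from typing import Callable

def P(x: str, y: str) -> str:
    """Two-parameter name-map: both variable names are substituted at once."""
    return f"{x} = {y}"

def Q(x: str) -> Callable[[str], str]:
    """Curried form: substituting x alone yields a single-parameter map R_x."""
    def R_x(y: str) -> str:
        return f"{x} = {y}"
    return R_x

# A proposition name is obtained only after both variable names are supplied.
full_name = P("x1", "y1")
partial = Q("x1")           # a map still awaiting a variable name for y
completed = partial("y1")   # now a well-defined proposition name
```

The intermediate value `partial` is a function rather than a string, which corresponds to the observation that “$x_1 = y$” is not yet a proposition name.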


The concept which is required for predicate calculus rules, theorems and proofs may be thought of as “soft
predicates”. These are analogous to software-defined functions by contrast with the concrete predicates whose
names are in $\mathcal{N}_Q$ in Definition 6.3.9. For example, in set theory “$=$” and “$\in$” would be the names of hard
predicates, whereas the map $(x, y, z) \mapsto$ “$x \in y \wedge y \in z$” would be a soft predicate with three parameters,
and the map $(y) \mapsto$ “$\forall x, \neg(x \in y)$” would be a soft predicate with one parameter. Since rules, theorems and
proofs are written in terms of soft predicates, they must be (more or less) formally specified. 


6.3.7 REMARK: Predicate names, variable names, wffs, wff-names, wff-wffs and “soft predicates”.

The relations between predicate names, variable names, wffs, wff-names and wff-wffs for predicate calculus
are an extension of the corresponding concepts for propositional calculus which are described in Remark 4.4.2.
In Definition 6.3.9 (v), wffs are defined as strings such as “$(\forall x_1, (\exists x_2, (A(x_1, x_2) \Rightarrow B(x_2, x_3))))$”, which are
built recursively from predicate names, variable names, logical operators and punctuation symbols. Each
wff-name in Definition 6.3.9 (iv) is a letter which may be bound to a particular wff within each scope. Thus the
wff-name “$\alpha$” may be bound to the wff “$(\forall x_1, (\exists x_2, (A(x_1, x_2) \Rightarrow B(x_2, x_3))))$”, for example. The wff-wffs
in Definition 6.3.9 (vi) are strings such as “$(\forall x_3, (\neg\alpha(x_3)))$”, which may appear in axioms, inference rules,
theorem assertions and proofs. Each wff-wff is a template into which particular wffs may be substituted.


The wff-wff construction rule (1) in Definition 6.3.9 (vi) requires the list of variable names following a
wff-name $\alpha$ to contain all of the free variables in the wff which $\alpha$ is bound to. Thus, for example,
the wff-wff “$\alpha(x_3)$” is valid if $\alpha$ is bound to a wff which contains only “$x_3$” as a free variable, such as
“$(\forall x_1, (\exists x_2, (A(x_1, x_2) \Rightarrow B(x_2, x_3))))$”. The set of free variables in a wff may be determined by means of
the recursive formulas at the right of each rule in Definition 6.3.9 (v). Expressions which are defined
according to this rule are “soft predicates”, which may be thought of as software-defined predicates by contrast
with the “hard predicates” with names in $\mathcal{N}_Q$.
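
The recursive determination of free-variable name-sets can be sketched as a small program. The following Python code is a minimal illustration which assumes a tree representation of wffs. The class names `Atom`, `Implies`, `Not`, `ForAll` and `Exists`, and the helper `valid_soft_predicate`, are hypothetical and are chosen to mirror the five wff construction rules in Definition 6.3.9 (v).

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

@dataclass(frozen=True)
class Atom:            # rule (1): predicate name applied to variable names
    name: str
    args: Tuple[str, ...]

@dataclass(frozen=True)
class Implies:         # rule (2)
    left: "Wff"
    right: "Wff"

@dataclass(frozen=True)
class Not:             # rule (3)
    body: "Wff"

@dataclass(frozen=True)
class ForAll:          # rule (4)
    var: str
    body: "Wff"

@dataclass(frozen=True)
class Exists:          # rule (5)
    var: str
    body: "Wff"

Wff = Union[Atom, Implies, Not, ForAll, Exists]

def free_vars(w: Wff) -> FrozenSet[str]:
    """Free-variable name-set S, computed by recursion on the construction rules."""
    if isinstance(w, Atom):
        return frozenset(w.args)
    if isinstance(w, Implies):
        return free_vars(w.left) | free_vars(w.right)
    if isinstance(w, Not):
        return free_vars(w.body)
    if isinstance(w, (ForAll, Exists)):
        return free_vars(w.body) - {w.var}
    raise TypeError(f"not a wff: {w!r}")

def valid_soft_predicate(w: Wff, params: Tuple[str, ...]) -> bool:
    """The listed parameters must contain every free variable of the bound wff."""
    return free_vars(w) <= set(params)

# The example wff from this remark, with x3 as its only free variable.
example = ForAll("x1", Exists("x2", Implies(Atom("A", ("x1", "x2")),
                                             Atom("B", ("x2", "x3")))))
```

For the example wff, the computed name-set contains only the name `x3`, so a parameter list consisting of `x3` alone is acceptable for a wff-name bound to it.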


6.3.8 REMARK: The chosen predicate calculus. 

Definition 6.3.9 is the pure predicate calculus which has been chosen for this book. (See Remarks 4.4.2 
and 6.3.7 for comments on the terminology and notation in Definition 6.3.9. See Remark 5.1.4 for the spaces 
$\mathcal{Q}_k$ of predicates with $k$ parameters, for non-negative integers $k$. See Section 6.6 for some basic theorems for
this predicate calculus.)

Compared to other predicate calculus systems, this one is relatively simple and intuitive to use, even compared
to other natural deduction systems. It is not designed for theoretical analysis, but rather for its ease of use as 
a practical deduction system for all mathematics. It is directly applicable to first-order logic systems which 
have specified constant predicates and non-logical axioms. Proofs are relatively easy to discover, compared 
to other systems, and errors in deduction are relatively easy to detect. 


Definition 6.3.9 is quite informal, and has many “rough edges”. This is an inevitable consequence of the lack 
of a formal language for metamathematics in this book. In terms of such a formal language, for example for 
a suitable computer software package, it should be possible to specify all of the rules much more precisely. 
The informal specification given here is sufficient for the proofs of some basic theorems in Section 6.6, which 
are useful for the rest of the book. 


6.3.9 DEFINITION[MM]: The following is the axiomatic system QC for predicate calculus. 


(i) The predicate name space $\mathcal{N}_Q$ for each scope consists of upper-case letters of the Roman alphabet, with
or without integer subscripts. The variable name space $\mathcal{N}_V$ for each scope consists of lower-case letters
of the Roman alphabet, with or without integer subscripts.

(ii) The basic logical operators are $\Rightarrow$ (“implies”), $\neg$ (“not”), $\forall$ (“for all”) and $\exists$ (“for some”).

(iii) The punctuation symbols are “(” and “,” and “)”.

(iv) The logical predicate expression names (“wff names”) are lower-case letters of the Greek alphabet, with
or without integer subscripts. Within each scope, each wff name may be bound to a wff.

(v) Logical predicate expressions (*wffs"), and their free-variable name-sets S, are built recursively according 
to the following rules. 


(1) *A(zi,...z4)" is a wff for any A in No, and z1,...24 in Ny. [S =A tiyin] 
(2) “(a` > B')" is a wff for any wffs o and £. [S = Sa U Sg] 
(3) “(ma`)” is a wff for any wf a. [S = Sa] 
(4) “(Va, o^)" is a wff for any wf a and x in Ny. [S = Sa \ {e} 
(5) “(Aa, a`)” is a wff for any wff a and x in Ny. [S = Sa \ {x}] 


Any expression which cannot be constructed by recursive application of these rules is not a wff. (For
clarity, parentheses may be omitted in accordance with customary precedence rules. In particular, the 
parentheses for the parameters of a zero-parameter predicate may be omitted.) 


The free-variable name-set of a wff is the set S defined recursively by the rules used for its construction. 


A k-parameter wff, for non-negative integer k, is a wff whose free-variable name-set contains k names. 


A free variable (name) in a wff is any element of its free-variable name-set. 


A bound variable (name) in a wff is any variable name in the wff which is not an element of its free- 
variable name-set. 


(vi) The logical expression formulas (“wff-wffs”) are defined recursively as follows.


(1) “$\alpha(x_1, \ldots x_k)$” is a wff-wff for any wff-name $\alpha$, for any finite sequence $x_1, \ldots x_k$ in $\mathcal{N}_V$, where $\alpha$ is
bound to a wff whose free variable names are all contained in the set $\{x_1, \ldots x_k\}$. Logical expression
formulas constructed according to this rule are “soft predicates”.
(2) “$(\Xi_1 \Rightarrow \Xi_2)$” is a wff-wff for any wff-wffs $\Xi_1$ and $\Xi_2$.
(3) “$(\neg\Xi)$” is a wff-wff for any wff-wff $\Xi$.
(4) “$(\forall x, \Xi)$” is a wff-wff for any wff-wff $\Xi$ and $x$ in $\mathcal{N}_V$.
(5) “$(\exists x, \Xi)$” is a wff-wff for any wff-wff $\Xi$ and $x$ in $\mathcal{N}_V$.


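(vii) The axiom templates are the following sequents for any soft predicates $\alpha$, $\beta$ and $\gamma$ with any numbers of
parameters, applied uniformly for each soft predicate.


(PC1) $\vdash \alpha \Rightarrow (\beta \Rightarrow \alpha)$.
(PC2) $\vdash (\alpha \Rightarrow (\beta \Rightarrow \gamma)) \Rightarrow ((\alpha \Rightarrow \beta) \Rightarrow (\alpha \Rightarrow \gamma))$.
(PC3) $\vdash (\neg\beta \Rightarrow \neg\alpha) \Rightarrow (\alpha \Rightarrow \beta)$.

(viii) The inference rules are uniform substitution and the following rules for any soft predicates $\alpha$ and $\beta$ with
any numbers of extra parameters, applied uniformly for each soft predicate.

(MP) From $\Delta_1 \vdash \alpha \Rightarrow \beta$ and $\Delta_2 \vdash \alpha$, infer $\Delta_1 \cup \Delta_2 \vdash \beta$.
(CP) From $\Delta, \alpha \vdash \beta$, infer $\Delta \vdash \alpha \Rightarrow \beta$.
(Hyp) Infer $\alpha \vdash \alpha$.
(UE) From $\Delta(\hat{x}) \vdash \forall x, \alpha(x, \hat{x})$, infer $\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$.
(UI) From $\Delta \vdash \alpha(\hat{x})$, infer $\Delta \vdash \forall x, \alpha(x)$.
(EE) From $\Delta_1 \vdash \exists x, \alpha(x)$ and $\Delta_2, \alpha(\hat{x}) \vdash \beta$, infer $\Delta_1 \cup \Delta_2 \vdash \beta$.
(EI) From $\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$, infer $\Delta(\hat{x}) \vdash \exists x, \alpha(x, \hat{x})$.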


Each symbol “$\Delta$” indicates a dependency list, which is a sequence of zero or more wffs which may have
$\hat{x}$ as a free variable only if so indicated. The wffs $\alpha$ and $\beta$, and also the wffs in dependency lists, may
have free variables as indicated and also any free variables other than $\hat{x}$, applied uniformly in each rule.
“$\Delta_1 \cup \Delta_2$” means the concatenation of dependency lists $\Delta_1$ and $\Delta_2$, with removal of duplicates.


6.3.10 REMARK: Bound and free variables. 

Definition 6.3.9 (v) classifies variables in logical predicate expressions as “bound” or “free”. In essence, a 
bound variable has local scope as a parameter of a quantifier whereas a free variable has global scope. This 
implies that a variable may be a free variable for a sub-formula of a formula and simultaneously a bound 
variable within the full formula. In computer software, a bound variable would be called a “dummy variable". 
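
The recursive rules for free-variable name-sets in Definition 6.3.9 (v) can be sketched in a few lines of Python. The nested-tuple encoding of wffs and the function names below are assumptions made only for this illustration; they are not part of the formal system.

```python
# Hypothetical encoding of wffs as nested tuples (illustration only):
#   ("pred", "A", ("x", "y"))   atomic predicate A(x, y)
#   ("imp", a, b)               (a => b)
#   ("not", a)                  (not a)
#   ("all", "x", a)             (for all x, a)
#   ("ex", "x", a)              (for some x, a)

def free_vars(wff):
    """Free-variable name-set S, following rules (1)-(5) of part (v)."""
    tag = wff[0]
    if tag == "pred":
        return set(wff[2])                              # S = {x1, ..., xn}
    if tag == "imp":
        return free_vars(wff[1]) | free_vars(wff[2])    # S = S_a U S_b
    if tag == "not":
        return free_vars(wff[1])                        # S = S_a
    if tag in ("all", "ex"):
        return free_vars(wff[2]) - {wff[1]}             # S = S_a \ {x}
    raise ValueError(f"not a wff: {wff!r}")

def all_vars(wff):
    """Every variable name which occurs anywhere in the wff."""
    tag = wff[0]
    if tag == "pred":
        return set(wff[2])
    if tag == "imp":
        return all_vars(wff[1]) | all_vars(wff[2])
    if tag == "not":
        return all_vars(wff[1])
    if tag in ("all", "ex"):
        return {wff[1]} | all_vars(wff[2])
    raise ValueError(f"not a wff: {wff!r}")

def bound_vars(wff):
    """Variable names in the wff which are not in its free-variable name-set."""
    return all_vars(wff) - free_vars(wff)

# x is free in the sub-formula A(x, y) but bound in the full formula.
inner = ("pred", "A", ("x", "y"))
outer = ("all", "x", inner)
```

Here `free_vars(inner)` contains both names, whereas `free_vars(outer)` contains only `y`, which matches the observation that a variable may be free in a sub-formula and bound in the full formula.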


6.3.11 REMARK: Selective quantification of free variables in the predicate calculus rules. 

The rules UE and EI in Definition 6.3.9 (viii) permit “selective quantification of free variables”. In other
words, if the wff indicated by $\alpha$ has more than one occurrence of the free variable $\hat{x}$, then the quantified
form of this wff may substitute any number of instances of $\hat{x}$ with the quantified variable $x$. In these two
rules, the wff list $\Delta(\hat{x})$ is not substituted in any way with the quantified free variable.


6.3.12 REMARK: Extraneous free variables for the three propositional calculus axioms.

The three axioms in part (vii) of Definition 6.3.9 are copied from the Łukasiewicz axioms in part (vii) of
Definition 4.4.3. However, the wff specifications are different. So the meaning is slightly different. In the
predicate calculus case, the wffs may have free variables, which are indicated in Definition 6.3.9 (v). In
the special case of a wff $\alpha$ with an empty free-variable name-set $S_\alpha$, it is not difficult to believe that the
propositional calculus axioms of Definition 4.4.3 are applicable here, and that therefore all of the theorems
for that calculus are valid also.


If a wff appearing in a sequent has one or more free variables, these free variables are interpreted according 
to the universalisation semantics in Remark 6.4.1. This effectively implies that the sequent asserts that 
the result of any uniform substitution of particular variables for the free variables yields a logical formula 
which is true for the underlying knowledge space. Since the axioms in Definition 6.3.9 (vii) are deemed to
be tautologies for any substitution of wffs for $\alpha$, $\beta$ and $\gamma$, every uniform substitution for their free variables
must yield a valid tautology. Therefore they continue to be tautologies when they are universalised.


Another way to think of this is that when free variables are seen in sequents, they are in fact not free 
variables. They are in fact always implicitly universalised and are therefore actually bound variables. 


It follows from these observations that all of the theorems in Sections 4.5, 4.6 and 4.7 for the propositional 
calculus in Definition 4.4.3 are applicable for proving theorems for the predicate calculus in Definition 6.3.9. 
A typical example of this is the proof of Theorem 6.6.7 (iii), where lines (2) and (5) assert $\neg A(\hat{x}_0)$ and
$(\forall x, A(x)) \Rightarrow A(\hat{x}_0)$ respectively, and Theorem 4.6.4 (v) is then applied to assert $\neg(\forall x, A(x))$ on line (6).
The “input” propositions both contain the free variable $\hat{x}_0$, which is implicitly substituted in each proposition
so that the free variable is no longer free. Then the conclusion follows by propositional calculus for each
possible substitution.
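
Once every free variable has been replaced by a particular value, each soft predicate is simply true or false, so the tautology claim for the three axioms reduces to a truth-table check. The following is a minimal sketch of that check; the helper names `imp`, `AXIOMS` and `is_tautology` are assumptions for this illustration only.

```python
from itertools import product

def imp(p, q):
    """Material implication p => q."""
    return (not p) or q

# The three axiom templates, with a, b, c standing for the truth values
# of alpha, beta and gamma after substitution for the free variables.
AXIOMS = {
    "PC1": lambda a, b, c: imp(a, imp(b, a)),
    "PC2": lambda a, b, c: imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c))),
    "PC3": lambda a, b, c: imp(imp(not b, not a), imp(a, b)),
}

def is_tautology(formula):
    """True if the formula holds for all eight truth-value combinations."""
    return all(formula(a, b, c)
               for a, b, c in product([False, True], repeat=3))

results = {name: is_tautology(f) for name, f in AXIOMS.items()}
```

Since the check succeeds for every combination of truth values, it succeeds for every substitution of values for the free variables, which is the content of the universalisation argument.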


6.3.13 REMARK: Extraneous free variables in the predicate calculus inference rules.

The inference rules in Definition 6.3.9 (viii) permit free variables in much the same way as for the axioms,
as mentioned in Remark 6.3.12. However, there are some differences. The inference rules contain both wffs
and dependency lists, and these have both explicit indications of free variables and implicit permission for
“extra” or “extraneous” free variables. For the same reasons as given in Remark 6.3.12, the extraneous free
variables do not diminish the validity of the rules. If a rule is valid for each substitution of particular values
for the extraneous free variables, then the universalisation of the rule is also valid.


An example of the application of extraneous free variables to an inference rule may be found in the proof
of Theorem 6.6.24 (i) in lines (2) and (3). The (consequent) proposition in line (2) is “$\forall y, A(\hat{x}_0, y)$”, and
the (consequent) proposition in line (3) is “$A(\hat{x}_0, \hat{y}_0)$”, which follows by the UE rule. The first proposition
contains the extraneous free variable $\hat{x}_0$, which is retained in the second proposition. The rule acts only on
the variable $y$, ignoring the variable $\hat{x}_0$ as “extraneous”. Since the inference is valid for each fixed value
of $\hat{x}_0$, it is therefore valid for all fixed values. Hence universalisation semantics may be validly applied to $\hat{x}_0$.


The antecedent dependency list for both lines (2) and (3) in the proof of Theorem 6.6.24 (i) is the single wff
“$\forall x, \forall y, A(x, y)$”, which has no free variables. Therefore the form of rule being applied is as follows.


From $\Delta \vdash \forall y, \alpha(\hat{x}_0, y)$, infer $\Delta \vdash \alpha(\hat{x}_0, \hat{y}_0)$.

This is a special case of the UE rule, but with the extraneous free variable $\hat{x}_0$ indicated explicitly.


The additional (optional) dependence of the soft predicate $\alpha$ on the free variable $\hat{x}$ in rules UE and EI in
Definition 6.3.9 (viii) effectively means that any number of instances of $\hat{x}$ may be bound to the quantified
variable $x$ or left in place, as desired.


6.3.14 REMARK: Brief justification of quantifier introduction and elimination rules. 

The fact that the UE and EI rules in Definition 6.3.9 permit the antecedent conditions to depend on the free
variable $\hat{x}$, whereas the UI and EE rules do not enjoy this privilege, requires some explanation. The quantifier
introduction and elimination rules in Definition 6.3.9 may be very briefly justified in terms of the following
basic properties of the quantifier concepts using “naive predicate calculus”. (Sequents are interpreted with
default “universalisation semantics”, which means that “$\vdash$” is interpreted as “$\Rightarrow$”, and this implication
operator is assumed to be valid for all values of all free variables in a universe of variables $U$.)


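(UE) The sequent $\forall x, A(x, \hat{x}) \vdash A(\hat{x}, \hat{x})$ follows from the tautology $\forall \hat{x} \in U, ((\forall x, A(x, \hat{x})) \Rightarrow A(\hat{x}, \hat{x}))$. So
$\vdash A(\hat{x}, \hat{x})$ can be inferred from $\vdash \forall x, A(x, \hat{x})$.

(UI) From the sequent $\vdash A(\hat{x})$, the sequent $\vdash \forall x, A(x)$ can be inferred because the sequent $\vdash A(\hat{x})$ means
$\forall \hat{x} \in U, A(\hat{x})$, which is equivalent to $\forall x, A(x)$.

(EE) From the sequent $A(\hat{x}) \vdash B$, the sequent $\vdash (\exists x, A(x)) \Rightarrow B$ can be inferred because the sequent $A(\hat{x}) \vdash B$
means $\forall \hat{x} \in U, (A(\hat{x}) \Rightarrow B)$, which is equivalent to $(\exists x, A(x)) \Rightarrow B$.

(EI) The sequent $A(\hat{x}, \hat{x}) \vdash \exists x, A(x, \hat{x})$ follows from the tautology $\forall \hat{x} \in U, (A(\hat{x}, \hat{x}) \Rightarrow \exists x, A(x, \hat{x}))$. So
$\vdash \exists x \in U, A(x, \hat{x})$ may be inferred from $\vdash A(\hat{x}, \hat{x})$.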


Even more briefly, the basic properties of the quantifiers may be written as follows purely in terms of sequents. 
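(UE) Infer $\forall x, A(x, \hat{x}) \vdash A(\hat{x}, \hat{x})$.
(UI) From $\vdash A(\hat{x})$ infer $\vdash \forall x, A(x)$.
(EE) From $A(\hat{x}) \vdash B$ infer $\vdash (\exists x, A(x)) \Rightarrow B$.
(EI) Infer $A(\hat{x}, \hat{x}) \vdash \exists x, A(x, \hat{x})$.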


In condensed form, these properties encapsulate the meaning of the two quantifiers. The duality between 
the UE and EI properties is clear. The duality between the UI and EE properties is less clear because the 
default universalisation semantics for sequents breaks the symmetry here. However, the symmetry can be 
restored by observing that the UI property is equivalent to: “From $B \vdash A(\hat{x})$ infer $B \Rightarrow \forall x, A(x)$.” (Then
$B$ may be replaced by the always-true proposition $\top$.)


The sequents “$\Delta(\hat{x}) \vdash \forall x, \alpha(x, \hat{x})$” and “$\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$” in the UE rule in Definition 6.3.9 are expanded as
lines (2) and (3) respectively as follows, where line (1) is the basic (naive predicate calculus) property (UE)
of the universal quantifier indicated above.


(1) $\forall \hat{x} \in U, (\forall x, \alpha(x, \hat{x})) \Rightarrow \alpha(\hat{x}, \hat{x})$    Hyp
(2) $\forall \hat{x} \in U, \Delta(\hat{x}) \Rightarrow \forall x, \alpha(x, \hat{x})$    Hyp
(3) $\forall \hat{x} \in U, \Delta(\hat{x}) \Rightarrow \alpha(\hat{x}, \hat{x})$    from (1,2)

It is clear that line (3) follows from lines (1) and (2). In other words, the UE rule in Definition 6.3.9 follows
from the basic property of the universal quantifier in line (1). Significantly, the antecedent condition list $\Delta$
is permitted to depend on the free variable $\hat{x}$.


Similarly, the sequents “$\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$” and “$\Delta(\hat{x}) \vdash \exists x, \alpha(x, \hat{x})$” in the EI rule in Definition 6.3.9 are expanded
as lines (2) and (3) respectively as follows, where line (1) is the basic (naive predicate calculus) property (EI)
of the existential quantifier indicated above.


(1) $\forall \hat{x} \in U, \alpha(\hat{x}, \hat{x}) \Rightarrow \exists x, \alpha(x, \hat{x})$    Hyp
(2) $\forall \hat{x} \in U, \Delta(\hat{x}) \Rightarrow \alpha(\hat{x}, \hat{x})$    Hyp
(3) $\forall \hat{x} \in U, \Delta(\hat{x}) \Rightarrow \exists x, \alpha(x, \hat{x})$    from (1,2)


Once again, it is clear that line (3) follows from lines (1) and (2). In other words, the EI rule in Definition 6.3.9
follows from the basic property of the existential quantifier in line (1). Significantly, the antecedent condition
list $\Delta$ is permitted to depend on the free variable $\hat{x}$.


The UI rule may be interpreted as follows. There is no mystery to be solved here.

(1) $\forall \hat{x} \in U, \Delta \Rightarrow \alpha(\hat{x})$    Hyp
(2) $\Delta \Rightarrow \forall x, \alpha(x)$    from (1)


The reason for requiring the antecedent conditions list $\Delta$ to be independent of $\hat{x}$ for the UI and EE rules,
but not for the UE and EI rules, is explained in Remark 6.4.5. The EE rule may be interpreted as follows.


(1) $\Delta_1 \Rightarrow \exists x, \alpha(x)$    Hyp
(2) $\forall \hat{x} \in U, (\Delta_2 \wedge \alpha(\hat{x})) \Rightarrow \beta$    Hyp
(3) $\Delta_2 \Rightarrow \forall x, (\alpha(x) \Rightarrow \beta)$    from (2)
(4) $\Delta_2 \Rightarrow ((\exists x, \alpha(x)) \Rightarrow \beta)$    from (3)
(5) $(\Delta_1 \wedge \Delta_2) \Rightarrow \beta$    from (1,4)


Here line (5) is inferred from lines (1) and (2), which validates the EE rule. Note that the expression
“$\Delta_1 \wedge \Delta_2$” in line (5) denotes the logical conjunction of the sequences of wffs in $\Delta_1$ and $\Delta_2$, which corresponds
to the union of the wffs in the two sequences, which is denoted here as “$\Delta_1 \cup \Delta_2$”. The reason for the apparent
disagreement here is that conjunction semantics are applied to lists of antecedent wffs. (Strictly speaking,
the set-union operator “$\cup$” is incorrect because the correct operator is list-concatenation.)


Thus expanding sequent lines according to universalisation semantics and then applying well-known logical 
rules for the quantifiers validates all four of the quantifier rules for predicate calculus. 
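
The naive quantifier properties used above can be checked exhaustively on a small finite universe. The sketch below assumes a three-element universe `U` and represents predicates as Python dictionaries; both choices are assumptions made only for illustration, and a finite check is of course evidence of plausibility rather than a proof for arbitrary universes.

```python
from itertools import product

U = [0, 1, 2]  # a small non-empty universe (assumption for illustration)

def imp(p, q):
    """Material implication p => q."""
    return (not p) or q

def unary_predicates():
    """Yield every one-parameter predicate on U as a dict x -> bool."""
    for values in product([False, True], repeat=len(U)):
        yield dict(zip(U, values))

def binary_predicates():
    """Yield every two-parameter predicate on U as a dict (x, y) -> bool."""
    pairs = list(product(U, repeat=2))
    for values in product([False, True], repeat=len(pairs)):
        yield dict(zip(pairs, values))

def ue_holds(A):
    # (UE): for all xh, (forall x, A(x, xh)) => A(xh, xh)
    return all(imp(all(A[x, xh] for x in U), A[xh, xh]) for xh in U)

def ei_holds(A):
    # (EI): for all xh, A(xh, xh) => (exists x, A(x, xh))
    return all(imp(A[xh, xh], any(A[x, xh] for x in U)) for xh in U)

def ee_equivalent(A, B):
    # (EE): (for all xh, A(xh) => B)  iff  ((exists x, A(x)) => B)
    lhs = all(imp(A[xh], B) for xh in U)
    rhs = imp(any(A[x] for x in U), B)
    return lhs == rhs

ue_ok = all(ue_holds(A) for A in binary_predicates())
ei_ok = all(ei_holds(A) for A in binary_predicates())
ee_ok = all(ee_equivalent(A, B)
            for A in unary_predicates() for B in (False, True))
```

All three flags evaluate to true for this universe, in agreement with the tautologies quoted for (UE) and (EI) and the equivalence quoted for (EE).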


6.3.15 REMARK: The propositional calculus analogue for the EE rule. 

The similarity between the EE rule and Theorem 4.7.9 part (xli) is not pure coincidence. The EE rule 
is the natural generalisation to an infinite family of propositions of the propositional calculus theorem
“$\alpha \Rightarrow \gamma, \beta \Rightarrow \gamma \vdash (\alpha \vee \beta) \Rightarrow \gamma$”. The two propositions $\alpha$ and $\beta$ are replaced by $\alpha(\hat{x})$, which represents
a generic member of a family of propositions, and the expression $\alpha \vee \beta$ is replaced by $\exists x, \alpha(x)$, which is
effectively the disjunction of all members of the family of propositions. (The consequent proposition $\gamma$ is
replaced by $\beta$.)


6.3.16 REMARK: The accents on universal and existential variables are superfluous in principle. 

In Definition 6.3.9, the “hat” accent “$\hat{\ }$” serves only to hint that a variable is free on a line of a logical
argument. It is unnecessary and may be omitted without changing the meaning. The variables which are
free or bound are objectively discernible by reading the text of each line.


As a mnemonic for the meaning of the “hat” accent “$\hat{\ }$”, one may think of it as the logical conjunction
operator symbol “$\wedge$”. A universally quantified proposition “$\forall x, A(x)$” may be thought of as “$A(x_1) \wedge
A(x_2) \wedge A(x_3) \wedge \ldots$” (if it is assumed that the elements of the parameter-universe may be enumerated).
Similarly, one may think of the “check” accent “$\check{\ }$” as a hint for the logical disjunction operator “$\vee$” because
an existentially quantified proposition “$\exists x, A(x)$” may be thought of as “$A(x_1) \vee A(x_2) \vee A(x_3) \vee \ldots$”.
(Such existentialised free variables are not used for the predicate calculus in Definition 6.3.9.)


6.3.17 REMARK: Alternative form for the existential quantifier elimination rule. 
The following is a simpler-looking alternative for the existential elimination rule EE in Definition 6.3.9.

(EE′) From $\Delta \vdash \alpha(\hat{x}) \Rightarrow \beta$, infer $\Delta \vdash (\exists x, \alpha(x)) \Rightarrow \beta$.


Rules EE and EE′ are “equipotent” in the sense that each may be derived from the other. To see how this
could be plausible, suppose first that rule EE is assumed to hold, and suppose that the sequent $\Delta \vdash \alpha(\hat{x}) \Rightarrow \beta$
is given. Let $\Delta_1 = \Delta_2 = \Delta$. Then the sequent $\Delta_2, \alpha(\hat{x}) \vdash \beta$ follows by application of the MP rule. So the
sequent $\Delta_1 \cup \Delta_2 \vdash \beta$ may be inferred via the EE rule from the sequent $\Delta_1 \vdash \exists x, \alpha(x)$. In other words, the
sequent $\Delta \vdash \beta$ may be inferred from the sequent $\Delta \vdash \exists x, \alpha(x)$. Therefore the sequent $\Delta \vdash (\exists x, \alpha(x)) \Rightarrow \beta$
follows via the CP rule. Hence the EE′ rule follows from the application of the EE, MP and CP rules. (See
also Remark 6.4.10 for a demonstration of the plausibility of rule EE′ on knowledge-set semantics grounds.)


Now assume that rule EE′ holds, and suppose that the sequents $\Delta_1 \vdash \exists x, \alpha(x)$ and $\Delta_2, \alpha(\hat{x}) \vdash \beta$ are given.
Then the sequent $\Delta_2 \vdash (\exists x, \alpha(x)) \Rightarrow \beta$ follows via the EE′ and CP rules, and the sequent $\Delta_1 \cup \Delta_2 \vdash \beta$ follows from
this and $\Delta_1 \vdash \exists x, \alpha(x)$ via the MP rule. Hence the EE rule follows from the application of the EE′, MP
and CP rules.


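Thus the EE and EE′ rules are equipotent. Any inference which can be obtained from one can be obtained
from the other, modulo applications of the MP and CP rules. Some other equivalent rules (modulo MP
and CP) are as follows.


(EE1) From $\Delta, \alpha(\hat{x}) \vdash \beta$, infer $\Delta \vdash (\exists x, \alpha(x)) \Rightarrow \beta$.
(EE2) From $\Delta \vdash \alpha(\hat{x}) \Rightarrow \beta$, infer $\Delta, \exists x, \alpha(x) \vdash \beta$.
(EE3) From $\Delta, \alpha(\hat{x}) \vdash \beta$, infer $\Delta, \exists x, \alpha(x) \vdash \beta$.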


In view of the existence of so many simpler-looking equivalent formulations of the EE rule, one might ask 
what benefits might be obtained by adopting the formulation presented in Definition 6.3.9. The answer 
is that rule (EE), as given, is the basis for the “Rule C" inference procedure, which is very close to the 
way mathematics is done in the real world for existential propositions. Adoption of the simpler-looking 
alternatives would necessitate extra lines invoking the MP and CP rules to achieve the same result. 


6.3.18 REMARK: Slightly incongruous existential elimination rule. 

Three of the four rules (UE), (UI), (EE) and (EI) follow a common pattern, but rule (EE) does not fit it.
Rule (UI) is almost the exact reverse of rule (UE), and vice versa. (The only symmetry-breaking factor is
the requirement that the UI rule antecedent list $\Delta$ and the consequent $\forall x, \alpha(x)$ be independent of $\hat{x}$.) But
rules (EE) and (EI) are not almost exactly the reverse of each other. The underlying cause of this is that
according to the de-facto standard interpretation, sequents are assumed to be “universalised” with respect
to free variables. The reverse of rule (EI) seems like it should be somewhat as follows.


(EE″) From $\Delta \vdash \exists x, \alpha(x)$, infer $\Delta \vdash \alpha(\check{x})$.


This rule would be perfectly valid if an existential-style sequent interpretation were available for $\check{x}$, but in
terms of the current conventions, there is no way to express the idea that the variable $\check{x}$ in a sequent is
“existentialised”. Therefore rule (EE″) cannot be used in Definition 6.3.9.


6.3.19 REMARK: Rule G and Rule C. 

The expressions “Rule G” and “Rule C” appear to be due to Rosser [387], pages 124-139. In short, Rule G
is very simply the method of inference where a parametrised proposition $\alpha(\hat{x})$ is first demonstrated for an
arbitrary choice of the variable $\hat{x}$, and from this it is inferred that the quantified expression $\forall x, \alpha(x)$ has been
proved. There is nothing at all convoluted about this. This method of inference is arguably a very sensible
definition of what the expression “$\forall x, \alpha(x)$” means. This is nothing more or less than generalisation from
a general proof for an arbitrary choice of free variable to a general conclusion, hence the name “Rule G”. It
is clear that this method of inference is the most straightforward application of rule UI in Definition 6.3.9.
An application of Rule G looks somewhat as follows in practice.


(i) $\alpha(\hat{x})$                        $\dashv$ ($\Delta$)
(n) $\forall x, \alpha(x)$        UI (i)    $\dashv$ ($\Delta$)


Rule C is not as simple to explain as Rule G. In the Rule C method of inference, one typically commences
with an argument line which asserts $\exists x, \alpha(x)$ with some list of dependencies $\Delta_1$. Then one commences a
new argument from scratch, assuming the proposition $\alpha(\hat{x})$ as a hypothesis, where $\hat{x}$ is an arbitrary choice of
the parameter for $\alpha(\ldots)$. Then one shows that a proposition $\beta$ may be inferred with dependency $\alpha(\hat{x})$ and
an arbitrary list of further dependencies $\Delta_2$ (which must all be fixed relative to $\hat{x}$). Then Rule C consists in
the inference that the sequent $\Delta_1 \cup \Delta_2 \vdash \beta$ must be valid under these conditions. This is clearly the inference
of $\Delta_1 \cup \Delta_2 \vdash \beta$ from the sequents $\Delta_1 \vdash \exists x, \alpha(x)$ and $\Delta_2, \alpha(\hat{x}) \vdash \beta$, which is exactly what the EE rule in
Definition 6.3.9 says. An application of Rule C looks somewhat as follows in practice.


(i) $\exists x, \alpha(x)$                      $\dashv$ ($\Delta_1$)
(j) $\alpha(\hat{x})$          Hyp            $\dashv$ (j)
(k) $\beta$                                $\dashv$ ($\Delta_2$, j)
(n) $\beta$           EE (i,j,k)      $\dashv$ ($\Delta_1 \cup \Delta_2$)


6.3.20 REMARK: Regarding predicate calculus rules as definitions of logical operators. 
One may regard the MP/CP, UE/UI and EE/EI rule-pairs as “operational definitions” of the meanings of
the “$\Rightarrow$”, “$\forall$” and “$\exists$” operators respectively.


6.4. Interpretation of a sequent-based predicate calculus 


6.4.1 REMARK: Universalisation semantics for free variables in sequents. 

In Chapter 6, sequents are interpreted in accordance with two kinds of variable names: bound variables and 
free variables. Free variables are denoted by letters with a hat-accent and an optional subscript. Bound 
variables are denoted by letters with no accent. 


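A sequent of the form “$\alpha(\hat{x}_0, \hat{x}_1, \ldots) \vdash \beta(\hat{x}_0, \hat{x}_1, \ldots)$” is interpreted as:
$\forall \hat{x}_0, \forall \hat{x}_1, \ldots, \alpha(\hat{x}_0, \hat{x}_1, \ldots) \Rightarrow \beta(\hat{x}_0, \hat{x}_1, \ldots)$.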


Any commas between propositions in the proposition lists at the left and right of the assertion symbol are 
converted to logical conjunction operators before the quantifiers are applied. Each of the universal variables 
may occur on the left or the right of the assertion symbol any number of times, including zero times. 

Strictly speaking, sequents are not truly universally quantified for each free variable which they contain.
Rather, a sequent which contains one or more free variables is a different sequent for each combination
of free variables. In other words, the sequent may be thought of as a kind of template into which
the free variables may be inserted. One may refer to such templates as “parametrised sequents”. Since
the combinations of free variables may be freely chosen, there is not much difference between saying that
the sequent is true for each combination and saying that it is true for all combinations of free variables.
The difference here is between metamathematical quantification and “object language” quantification. (The
“object language” is the name sometimes given to the language in which the wffs of the predicate calculus
are written.)
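
The universalisation semantics for parametrised sequents can be sketched as a brute-force check over a finite universe. The function `sequent_valid`, the universe `U` and the use of the ordering relation `<` as a sample predicate are all assumptions introduced for this illustration.

```python
from itertools import product

def sequent_valid(antecedents, consequent, universe, n_free):
    """Check a parametrised sequent under universalisation semantics.

    antecedents: list of functions of n_free values returning bool
    consequent:  function of n_free values returning bool
    The commas on the left become conjunction, and the implication must
    then hold for every combination of values of the free variables.
    """
    for values in product(universe, repeat=n_free):
        if all(a(*values) for a in antecedents) and not consequent(*values):
            return False
    return True

U = range(4)  # hypothetical finite universe {0, 1, 2, 3}

# "x0 < x1, x1 < 3 |- x0 < 3" holds for every combination of (x0, x1).
valid = sequent_valid(
    [lambda x0, x1: x0 < x1, lambda x0, x1: x1 < 3],
    lambda x0, x1: x0 < 3, U, 2)

# "x0 < x1 |- x1 < 3" fails for the combination (x0, x1) = (0, 3).
invalid = sequent_valid(
    [lambda x0, x1: x0 < x1],
    lambda x0, x1: x1 < 3, U, 2)
```

The loop over `product(universe, repeat=n_free)` plays the role of the metamathematical "for each combination", which on a finite universe coincides with the object-language "for all".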


6.4.2 REMARK: Knowledge-set interpretation for the MP inference rule. 
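The MP (modus ponens) rule in Definition 6.3.9 is expressed in abstract summary form as follows.


(MP) From $\Delta_1 \vdash \alpha \Rightarrow \beta$ and $\Delta_2 \vdash \alpha$, infer $\Delta_1 \cup \Delta_2 \vdash \beta$.

If the proposition list $\Delta_1$ is the list $\gamma^1_1, \gamma^1_2, \ldots \gamma^1_{m_1}$, for some $m_1 \in \mathbb{Z}_0^+$, the combined knowledge set for
$\Delta_1$ may be written as $K_{\Delta_1} = 2^{\mathcal{P}} \cap \bigcap_{i=1}^{m_1} K_{\gamma^1_i}$, where $\mathcal{P}$ is the concrete proposition domain for the system
being modelled, and $K_{\gamma^1_i} \subseteq 2^{\mathcal{P}}$ is the knowledge set for $\gamma^1_i$ for each $i \in \mathbb{N}_{m_1}$. (The intersection with
$2^{\mathcal{P}}$ is needed only for the case $m_1 = 0$.) Then the sequent “$\Delta_1 \vdash \alpha \Rightarrow \beta$” signifies the knowledge set
relation $K_{\Delta_1} \subseteq (2^{\mathcal{P}} \setminus K_\alpha) \cup K_\beta$. Similarly, the sequent “$\Delta_2 \vdash \alpha$” signifies the relation $K_{\Delta_2} \subseteq K_\alpha$, where
$K_{\Delta_2} = 2^{\mathcal{P}} \cap \bigcap_{i=1}^{m_2} K_{\gamma^2_i}$ is the knowledge set signified by $\Delta_2$, which equals the proposition list $\gamma^2_1, \gamma^2_2, \ldots \gamma^2_{m_2}$,
for some $m_2 \in \mathbb{Z}_0^+$. Then it follows from the sequents “$\Delta_1 \vdash \alpha \Rightarrow \beta$” and “$\Delta_2 \vdash \alpha$” that

\begin{align*}
K_{\Delta_1 \cup \Delta_2} &= K_{\Delta_1} \cap K_{\Delta_2} \\
&\subseteq ((2^{\mathcal{P}} \setminus K_\alpha) \cup K_\beta) \cap K_\alpha \\
&= K_\beta \cap K_\alpha \\
&\subseteq K_\beta.
\end{align*}

Consequently the sequent “$\Delta_1 \cup \Delta_2 \vdash \beta$” is valid because its interpretation is $K_{\Delta_1 \cup \Delta_2} \subseteq K_\beta$.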
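
The set inclusions in the MP argument can be confirmed by exhaustive search when the concrete proposition domain is very small. The sketch below assumes a two-element domain `P`, so that there are four truth value maps and sixteen candidate knowledge sets; the names `TWO_P`, `ALL_K` and `mp_sound` are hypothetical.

```python
from itertools import chain, combinations, product

P = ["p", "q"]  # a tiny concrete proposition domain (assumption)

# 2^P: all truth value maps, each encoded as a tuple of booleans.
TWO_P = frozenset(product([False, True], repeat=len(P)))

def subsets(s):
    """All subsets of s as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

ALL_K = subsets(TWO_P)  # the 16 candidate knowledge sets

def mp_sound():
    """Search for a counterexample to the MP rule; True if none exists."""
    for K_a, K_b, K_d1, K_d2 in product(ALL_K, repeat=4):
        premise1 = K_d1 <= ((TWO_P - K_a) | K_b)   # Delta1 |- alpha => beta
        premise2 = K_d2 <= K_a                     # Delta2 |- alpha
        conclusion = (K_d1 & K_d2) <= K_b          # Delta1 U Delta2 |- beta
        if premise1 and premise2 and not conclusion:
            return False
    return True

mp_ok = mp_sound()
```

The search covers every choice of the four knowledge sets, so `mp_ok` being true agrees with the chain of inclusions above for this small domain.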
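6.4.3 REMARK: Knowledge-set interpretation for the CP inference rule.
The CP (conditional proof) rule may be similarly interpreted.


(CP) From $\Delta, \alpha \vdash \beta$, infer $\Delta \vdash \alpha \Rightarrow \beta$.


The sequent “$\Delta, \alpha \vdash \beta$” has the knowledge-set interpretation $K_\Delta \cap K_\alpha \subseteq K_\beta$, for any proposition-list $\Delta$
and propositions $\alpha$ and $\beta$. It follows that

\begin{align*}
K_\Delta &= K_\Delta \cap 2^{\mathcal{P}} \\
&= K_\Delta \cap ((2^{\mathcal{P}} \setminus K_\alpha) \cup K_\alpha) \\
&= (K_\Delta \cap (2^{\mathcal{P}} \setminus K_\alpha)) \cup (K_\Delta \cap K_\alpha) \\
&\subseteq (K_\Delta \cap (2^{\mathcal{P}} \setminus K_\alpha)) \cup K_\beta \\
&\subseteq (2^{\mathcal{P}} \setminus K_\alpha) \cup K_\beta.
\end{align*}

Consequently the sequent “$\Delta \vdash \alpha \Rightarrow \beta$” is valid because its interpretation is $K_\Delta \subseteq (2^{\mathcal{P}} \setminus K_\alpha) \cup K_\beta$.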

It is noteworthy that the CP rule and MP rule are converses of each other. (This may be easily verified by
the interested reader! Hint: Show that the MP rule is equivalent to inferring $\Delta, \alpha \vdash \beta$ from $\Delta \vdash \alpha \Rightarrow \beta$ for
any proposition list $\Delta$.)
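
The converse relationship between CP and MP can be seen semantically: the two sequents involved have equivalent knowledge-set interpretations. The following sketch checks this equivalence exhaustively for a two-element proposition domain; the domain and all names are assumptions for illustration.

```python
from itertools import chain, combinations, product

# All truth value maps on a hypothetical two-proposition domain.
TWO_P = frozenset(product([False, True], repeat=2))

def subsets(s):
    """All subsets of s as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

ALL_K = subsets(TWO_P)

def cp_and_converse_sound():
    """True if the two sequent interpretations agree for all knowledge sets."""
    for K_a, K_b, K_d in product(ALL_K, repeat=3):
        with_hypothesis = (K_d & K_a) <= K_b               # Delta, alpha |- beta
        with_implication = K_d <= ((TWO_P - K_a) | K_b)    # Delta |- alpha => beta
        if with_hypothesis != with_implication:
            return False
    return True

cp_ok = cp_and_converse_sound()
```

The forward direction of the equivalence corresponds to the CP rule and the backward direction to the converse inference mentioned in the hint.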


6.4.4 REMARK: Knowledge-set interpretation for the Hyp inference rule. 
The Hyp (hypothesis) rule is trivially valid because the knowledge-set interpretation of “$\alpha \vdash \alpha$” is $K_\alpha \subseteq K_\alpha$.


(Hyp) Infer $\alpha \vdash \alpha$.


6.4.5 REMARK: Knowledge-set interpretation for the UE and UI inference rules. 
The four rules for universal and existential quantifier elimination and introduction are more “interesting” 
than the MP, CP and Hyp rules. The UE (universal elimination) rule and UI (universal introduction) rule 
are as follows. 

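(UE) From $\Delta(\hat{x}) \vdash \forall x, \alpha(x, \hat{x})$, infer $\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$.

(UI) From $\Delta \vdash \alpha(\hat{x})$, infer $\Delta \vdash \forall x, \alpha(x)$.

The knowledge-set interpretation of “$\Delta(\hat{x}) \vdash \forall x, \alpha(x, \hat{x})$” is $\forall \hat{x} \in U, (K_{\Delta(\hat{x})} \subseteq \bigcap_{x \in U} K_{\alpha(x,\hat{x})})$, where $U$
is the universe of parameters of the predicate logic system which is being referred to. The interpretation
of “$\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$” is $\forall \hat{x} \in U, (K_{\Delta(\hat{x})} \subseteq K_{\alpha(\hat{x},\hat{x})})$. A universal quantifier is implied for all free variables
because there is an implicit claim that the assertion is true for all values of $\hat{x}$ in the universe of parameters.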


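Let $\tau \in 2^{\mathcal{P}}$ be a truth value map. Then $\tau \in \bigcap_{x \in U} K_{\alpha(x,\hat{x})}$ if and only if $\tau \in K_{\alpha(x,\hat{x})}$ for all $x \in U$. (See
Notation 10.8.10 for the intersection of a family of sets.) Therefore

\begin{align*}
\forall \hat{x} \in U, \Bigl(K_{\Delta(\hat{x})} \subseteq \bigcap_{x \in U} K_{\alpha(x,\hat{x})}\Bigr)
&\Leftrightarrow \forall \hat{x} \in U, \forall \tau \in 2^{\mathcal{P}}, \Bigl(\tau \in K_{\Delta(\hat{x})} \Rightarrow \tau \in \bigcap_{x \in U} K_{\alpha(x,\hat{x})}\Bigr) \\
&\Leftrightarrow \forall \hat{x} \in U, \forall \tau \in 2^{\mathcal{P}}, (\tau \in K_{\Delta(\hat{x})} \Rightarrow \forall \hat{y} \in U, \tau \in K_{\alpha(\hat{y},\hat{x})}) \\
&\Leftrightarrow \forall \hat{x} \in U, \forall \tau \in 2^{\mathcal{P}}, \forall \hat{y} \in U, (\tau \in K_{\Delta(\hat{x})} \Rightarrow \tau \in K_{\alpha(\hat{y},\hat{x})}) \tag{6.4.1} \\
&\Leftrightarrow \forall \hat{x} \in U, \forall \hat{y} \in U, \forall \tau \in 2^{\mathcal{P}}, (\tau \in K_{\Delta(\hat{x})} \Rightarrow \tau \in K_{\alpha(\hat{y},\hat{x})}) \tag{6.4.2} \\
&\Leftrightarrow \forall \hat{x} \in U, \forall \hat{y} \in U, (K_{\Delta(\hat{x})} \subseteq K_{\alpha(\hat{y},\hat{x})}) \tag{6.4.3} \\
&\Rightarrow \forall \hat{x} \in U, (K_{\Delta(\hat{x})} \subseteq K_{\alpha(\hat{x},\hat{x})}), \tag{6.4.4}
\end{align*}

which is the same as the knowledge-set interpretation for the sequent “$\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$”. (Line (6.4.1)
follows from Theorem 6.6.12 (xviii). Line (6.4.2) follows from Theorem 6.6.24 (i). Line (6.4.3) follows from
Definition 7.3.2 and Notation 7.3.3.) Hence the inference of the sequent “$\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$” from the sequent
“$\Delta(\hat{x}) \vdash \forall x, \alpha(x, \hat{x})$” is always valid. This establishes the validity of the UE rule.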


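In the case of the UI rule, the same reasoning applies as for the UE rule, except that the final line (6.4.4)
cannot be reversed if $\Delta$ depends on $\hat{x}$ or $\alpha$ has an extraneous dependence on $\hat{x}$. Therefore the dependence
on $\hat{x}$ must be removed. The resulting semantic calculation is then as follows.

\begin{align*}
K_\Delta \subseteq \bigcap_{x \in U} K_{\alpha(x)}
&\Leftrightarrow \forall \tau \in 2^{\mathcal{P}}, \Bigl(\tau \in K_\Delta \Rightarrow \tau \in \bigcap_{x \in U} K_{\alpha(x)}\Bigr) \\
&\Leftrightarrow \forall \tau \in 2^{\mathcal{P}}, (\tau \in K_\Delta \Rightarrow \forall \hat{x} \in U, \tau \in K_{\alpha(\hat{x})}) \\
&\Leftrightarrow \forall \tau \in 2^{\mathcal{P}}, \forall \hat{x} \in U, (\tau \in K_\Delta \Rightarrow \tau \in K_{\alpha(\hat{x})}) \\
&\Leftrightarrow \forall \hat{x} \in U, \forall \tau \in 2^{\mathcal{P}}, (\tau \in K_\Delta \Rightarrow \tau \in K_{\alpha(\hat{x})}) \\
&\Leftrightarrow \forall \hat{x} \in U, (K_\Delta \subseteq K_{\alpha(\hat{x})}),
\end{align*}

which is the same as the knowledge-set interpretation for the sequent “$\Delta \vdash \alpha(\hat{x})$”. This gives some insight
into why the UE rule permits the precondition $\Delta$ to depend on $\hat{x}$, and the wff $\alpha$ to have an extraneous
dependence on $\hat{x}$, whereas the UI rule does not.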
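
A small counterexample shows why the UI rule requires the dependency list to be independent of the free variable. The sketch below assumes a two-element universe and a single one-parameter predicate $A$, so that a truth value map is a pair of truth values for $A(0)$ and $A(1)$. These choices and all function names are assumptions for illustration.

```python
from itertools import product

U = [0, 1]  # hypothetical two-element universe

# Each truth value map tau is a tuple with tau[u] = truth value of A(u).
TWO_P = set(product([False, True], repeat=len(U)))

def K_A(u):
    """Knowledge set for the proposition A(u)."""
    return {tau for tau in TWO_P if tau[u]}

def K_forall_A():
    """Knowledge set for 'forall x, A(x)': intersection of the K_A(x)."""
    result = set(TWO_P)
    for x in U:
        result &= K_A(x)
    return result

# Premise "A(xh) |- A(xh)", where the dependency list depends on xh.
premise = all(K_A(xh) <= K_A(xh) for xh in U)

# Illegitimate conclusion "A(xh) |- forall x, A(x)".
bad_conclusion = all(K_A(xh) <= K_forall_A() for xh in U)

# Legitimate use of UI: the dependency list "forall x, A(x)" has no xh.
good_premise = all(K_forall_A() <= K_A(xh) for xh in U)
good_conclusion = K_forall_A() <= K_forall_A()
```

The premise holds trivially, but the illegitimate conclusion fails for the truth value map in which $A(0)$ is true and $A(1)$ is false. When the dependency list does not mention the free variable, premise and conclusion stand or fall together.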


6.4.6 REMARK: Circularity of knowledge-set interpretations for predicate calculus rules. 

It is not possible to hide the fact here that elementary theorems of predicate calculus and set theory are 
being used in these semantic justifications of inference rules, and these inference rules will be used later to 
justify these very same theorems of predicate calculus and set theory. (The knowledge-set semantic calculus 
is itself justified by invoking predicate calculus theorems!) This is not as fatal as it might seem. First, these 
semantic justifications have the benefit that they establish at least the consistency or coherence of the whole 
dual system of logic and set theory. Second, there is no real harm in these justifications because after they 
have been established on the grounds that they are consistent with what we already “know” about predicate 
logic and sets, the justifications may be discarded and we may pretend that the inference rules “fell from 
the sky on golden tablets”. In other words, they are either a-priori knowledge, or they are established on 
the unquestionable authority of great minds or the weight of history and tradition. The “golden tablets” 
stratagem neatly removes circularity from the mutual dependence between logic and set theory. 


Third, and most importantly, these semantic justifications make clear the intended significance of the rules.
The rules of mathematical logic are not, as some writers have supposed, mere conventions for the manipulation
of meaningless symbols on paper. That would be a purely “behaviourist” interpretation of mathematical
behaviour, whereby the states and activities of the minds of mathematicians are held to be conjectural and
beyond scientific investigation. However, this behaviourist point of view may be applied to all human observations
of all behaviour of all aspects of the universe. The view taken here is that mathematical texts have
significance beyond the ink they are written or printed in. In the case of mathematical logic, a very reasonable
supposition is that a logical expression signifies the inclusion of some propositions and the exclusion
of others. Such inclusions and exclusions may be identified with knowledge sets, where each knowledge set
excludes some subset of the set $2^{\mathcal{P}}$ of all possible combinations $\tau \in 2^{\mathcal{P}}$ of true and false values for all propositions
in a concrete proposition domain $\mathcal{P}$. It is helpful to have something to think of as the significance of
each logical assertion, and if that significance is fully consistent with all practical applications of that logic,
so much the better!


Fourth, although the logical axioms for predicate calculus may seem to be not much more than a convenient
shorthand for the corresponding set-manipulation operations, the real advantage of the predicate
calculus argument language is its extension to general first-order languages, where non-logical axioms are
introduced. Such non-logical axioms restrict the full space $2^{\mathcal{P}}$ to a proper subset in ways which could be
excessively complex for convenient practical argumentation. Mathematical literature is almost entirely written
in an argumentative language from assumptions to assertions. Even though in principle such arguments
are equivalent to calculations of constraints above or below given target knowledge sets, that is not how
mathematicians generally think. (Recall that the subset relation for knowledge sets corresponds to the implication
operator for logical propositions. So constraints “above” or “below” knowledge sets correspond to
implication relations.)


6.4.7 REMARK: Knowledge-set interpretation for the EE and EI inference rules. 
The EE (existential elimination) rule and EI (existential introduction) rule are even more “interesting” than 
the UE and UI rules in Remark 6.4.5. 

(EE) From $\Delta_1 \vdash \exists x, \alpha(x)$ and $\Delta_2, \alpha(\hat{x}) \vdash \beta$, infer $\Delta_1 \cup \Delta_2 \vdash \beta$.

(EI) From $\Delta(\hat{x}) \vdash \alpha(\hat{x}, \hat{x})$, infer $\Delta(\hat{x}) \vdash \exists x, \alpha(x, \hat{x})$.


At first sight, the EI rule appears to be almost identical to the UI rule, and in fact it almost is. Whenever
the UI rule may be used to infer a universally quantified proposition, the EI rule may be used to infer the
corresponding existentially quantified proposition from the same list of premise lines. One may (almost)
freely choose which inference to draw. This seems perhaps somewhat paradoxical. One might reasonably
expect the conditions for the inference of “$\Delta \vdash \exists x, \alpha(x)$” to be significantly weaker than the conditions
for the inference of “$\Delta \vdash \forall x, \alpha(x)$”. In fact, the antecedent proposition list $\Delta$ is permitted to depend
on $\hat{x}$ in the EI case, and the wff $\alpha$ may retain one or more copies of the free variable $\hat{x}$ in the consequent
proposition $\exists x, \alpha(x, \hat{x})$. These are in practice two significant weakenings of the conditions under which the
rule may be applied, which enormously increases the power of the rule.


6.4.8 REMARK: Knowledge-set interpretation for the EI inference rule. 

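For the EI rule, the knowledge-set interpretation for “$\Delta(\hat{x}) \vdash \alpha(\hat{x},\hat{x})$” is
$\forall \hat{x} \in U,\ (K_{\Delta(\hat{x})} \subseteq K_{\alpha(\hat{x},\hat{x})})$, and the interpretation of
“$\Delta(\hat{x}) \vdash \exists x, \alpha(x,\hat{x})$” is $\forall \hat{x} \in U,\ (K_{\Delta(\hat{x})} \subseteq \bigcup_{x \in U} K_{\alpha(x,\hat{x})})$,
where $U$ is the universe of predicate parameters. A universal quantifier is implied for the free variable $\hat{x}$
(because existential sequents are not defined for this style of predicate calculus). Let $\tau \in 2^{\mathcal{P}}$ be a truth
value map. Then $\tau \in \bigcup_{x \in U} K_{\alpha(x,\hat{x})}$ if and only if $\tau \in K_{\alpha(x,\hat{x})}$ for some $x \in U$, where $U$ is
assumed to be non-empty, as mentioned in Remark 6.3.4. (See Notation 10.8.10 for the union of a family of sets.)
Therefore

\begin{align*}
\forall \hat{x} \in U,\ \Bigl(K_{\Delta(\hat{x})} \subseteq \bigcup_{x \in U} K_{\alpha(x,\hat{x})}\Bigr)
&\Leftrightarrow \forall \hat{x} \in U,\ \forall \tau \in 2^{\mathcal{P}},\ \Bigl(\tau \in K_{\Delta(\hat{x})} \Rightarrow \tau \in \bigcup_{x \in U} K_{\alpha(x,\hat{x})}\Bigr) \\
&\Leftrightarrow \forall \hat{x} \in U,\ \forall \tau \in 2^{\mathcal{P}},\ \bigl(\tau \in K_{\Delta(\hat{x})} \Rightarrow \exists x \in U,\ \tau \in K_{\alpha(x,\hat{x})}\bigr) \\
&\Leftrightarrow \forall \hat{x} \in U,\ \forall \tau \in 2^{\mathcal{P}},\ \exists x \in U,\ \bigl(\tau \in K_{\Delta(\hat{x})} \Rightarrow \tau \in K_{\alpha(x,\hat{x})}\bigr) \tag{6.4.5} \\
&\Leftarrow \forall \hat{x} \in U,\ \exists x \in U,\ \forall \tau \in 2^{\mathcal{P}},\ \bigl(\tau \in K_{\Delta(\hat{x})} \Rightarrow \tau \in K_{\alpha(x,\hat{x})}\bigr) \tag{6.4.6} \\
&\Leftrightarrow \forall \hat{x} \in U,\ \exists x \in U,\ \bigl(K_{\Delta(\hat{x})} \subseteq K_{\alpha(x,\hat{x})}\bigr),
\end{align*}

and the last line follows from the knowledge-set interpretation for the sequent “$\Delta(\hat{x}) \vdash \alpha(\hat{x},\hat{x})$” by
choosing $x = \hat{x}$. (Line (6.4.5) follows from Theorem 6.6.12 (xxvii). Line (6.4.6) follows from
Theorem 6.6.24 (iii).) Hence the inference of the sequent “$\Delta(\hat{x}) \vdash \exists x, \alpha(x,\hat{x})$” from the sequent
“$\Delta(\hat{x}) \vdash \alpha(\hat{x},\hat{x})$” is always valid. This establishes the
validity of the EI rule.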
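
The validity argument for the EI rule can be spot-checked by brute force on a very small model. The following Python sketch is only an illustration under stated assumptions: the two-element sets `T` (standing in for the truth value maps in $2^{\mathcal{P}}$) and `U`, and all of the function names, are hypothetical choices made for this example. It enumerates every assignment of knowledge sets $K_{\Delta(\hat{x})}$ and $K_{\alpha(x,\hat{x})}$, and tests whether the EI premise always implies the EI conclusion, and whether the converse implication always holds.

```python
from itertools import product

T = (0, 1)  # assumed stand-ins for truth value maps tau
U = (0, 1)  # assumed non-empty universe of predicate parameters

# All subsets of T, used as candidate knowledge sets.
SUBSETS = [frozenset(t for t, keep in zip(T, bits) if keep)
           for bits in product((False, True), repeat=len(T))]
PAIRS = list(product(U, U))  # index pairs (x, xhat) for K_alpha(x, xhat)


def ei_premise(k_delta, k_alpha):
    """For all xhat: K_Delta(xhat) is a subset of K_alpha(xhat, xhat)."""
    return all(k_delta[xh].issubset(k_alpha[(xh, xh)]) for xh in U)


def ei_conclusion(k_delta, k_alpha):
    """For all xhat: K_Delta(xhat) is inside the union over x of K_alpha(x, xhat)."""
    return all(
        k_delta[xh].issubset(frozenset().union(*(k_alpha[(x, xh)] for x in U)))
        for xh in U
    )


def check_ei():
    """Return (EI valid for every assignment, converse valid for every assignment)."""
    valid = converse = True
    for d in product(SUBSETS, repeat=len(U)):
        k_delta = dict(zip(U, d))
        for a in product(SUBSETS, repeat=len(PAIRS)):
            k_alpha = dict(zip(PAIRS, a))
            p = ei_premise(k_delta, k_alpha)
            c = ei_conclusion(k_delta, k_alpha)
            valid = valid and ((not p) or c)
            converse = converse and ((not c) or p)
    return valid, converse


print(check_ei())
```

On this small model the first component is true and the second is false, which agrees with the one-way implication in line (6.4.6). A finite check of this kind is of course not a proof for general $U$ and $\mathcal{P}$.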


6.4.9 REMARK: Knowledge-set interpretation for the EE inference rule. 

The EE rule is a different kettle of fish. The EE rule does not look like a simple converse of the EI rule in the
way that the UE rule looks like a simple converse of the UI rule. The main reason for this is that the converse
of the EI rule isn't valid. This is partly because the existential meaning of the logical expression “$\alpha(\hat{x})$”
cannot be interpreted within the standard universalised sequent semantics, but also because the swapping of
universal and existential quantifiers in line (6.4.6) in Remark 6.4.8 cannot be reversed. A different semantic
system could remove both of these show-stoppers, but within the standard interpretation, these obstacles
cannot be removed. However, existential semantics can be effectively procured by locating the
free-variable-containing term “$\alpha(\hat{x})$” on the left of the assertion symbol. (This left-hand-side free-variable
trick can be seen as early as the 1928 axioms and rules of Hilbert/Ackermann [358], pages 65-70, which are
summarised in Remark 6.1.10.)


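The sequent “$\Delta_2, \alpha(\hat{x}) \vdash \beta$” has the interpretation $\forall \hat{x} \in U,\ ((\Delta_2 \wedge \alpha(\hat{x})) \Rightarrow \beta)$ using standard
universalised free variable semantics, where any commas in the proposition list $\Delta_2$ are converted to
conjunction operators. By Theorem 6.6.12 (iii), this expression is equivalent to
$(\exists \hat{x} \in U,\ (\Delta_2 \wedge \alpha(\hat{x}))) \Rightarrow \beta$. Thus an existential meaning is very conveniently manufactured using
the standard universalised free variable interpretation. By Theorem 6.6.16 (vi), this expression is also
equivalent to $(\Delta_2 \wedge (\exists \hat{x} \in U,\ \alpha(\hat{x}))) \Rightarrow \beta$, which is equal to the expansion of the sequent
“$\Delta_2, \exists x, \alpha(x) \vdash \beta$”. This may quite plausibly be combined with the sequent
“$\Delta_1 \vdash \exists x, \alpha(x)$” to produce the sequent “$\Delta_1 \cup \Delta_2 \vdash \beta$”. This may be more carefully verified using the
knowledge-set interpretation.


In terms of knowledge sets, the sequent “$\Delta_1 \vdash \exists x, \alpha(x)$” has the interpretation
$K_{\Delta_1} \subseteq \bigcup_{x \in U} K_{\alpha(x)}$, the sequent “$\Delta_2, \alpha(\hat{x}) \vdash \beta$” has the interpretation
$\forall \hat{x} \in U,\ (K_{\Delta_2} \cap K_{\alpha(\hat{x})} \subseteq K_\beta)$, and the sequent “$\Delta_1 \cup \Delta_2 \vdash \beta$”
has the interpretation $K_{\Delta_1} \cap K_{\Delta_2} \subseteq K_\beta$. Therefore the EE rule may be interpreted as follows.

(EE) From $K_{\Delta_1} \subseteq \bigcup_{x \in U} K_{\alpha(x)}$ and $\forall \hat{x} \in U,\ (K_{\Delta_2} \cap K_{\alpha(\hat{x})} \subseteq K_\beta)$, infer $K_{\Delta_1} \cap K_{\Delta_2} \subseteq K_\beta$.


The second precondition may be rewritten as follows.

\begin{align*}
\bigl(\forall \hat{x} \in U,\ (K_{\Delta_2} \cap K_{\alpha(\hat{x})} \subseteq K_\beta)\bigr)
&\Leftrightarrow \bigcup_{\hat{x} \in U} (K_{\Delta_2} \cap K_{\alpha(\hat{x})}) \subseteq K_\beta \\
&\Leftrightarrow K_{\Delta_2} \cap \bigcup_{\hat{x} \in U} K_{\alpha(\hat{x})} \subseteq K_\beta \tag{6.4.7} \\
&\Leftrightarrow K_{\Delta_2} \subseteq \Bigl(2^{\mathcal{P}} \setminus \bigcup_{\hat{x} \in U} K_{\alpha(\hat{x})}\Bigr) \cup K_\beta,
\end{align*}

where line (6.4.7) follows from Theorem 10.8.14 (i). So the conjunction of the two preconditions implies

\begin{align*}
K_{\Delta_1} \cap K_{\Delta_2}
&\subseteq \Bigl(\bigcup_{x \in U} K_{\alpha(x)}\Bigr) \cap \Bigl(\Bigl(2^{\mathcal{P}} \setminus \bigcup_{\hat{x} \in U} K_{\alpha(\hat{x})}\Bigr) \cup K_\beta\Bigr) \\
&= \Bigl(\bigcup_{x \in U} K_{\alpha(x)}\Bigr) \cap K_\beta \\
&\subseteq K_\beta.
\end{align*}

Thus the inference $K_{\Delta_1} \cap K_{\Delta_2} \subseteq K_\beta$ is demonstrated. Hence the inference of the sequent “$\Delta_1 \cup \Delta_2 \vdash \beta$”
from the sequents “$\Delta_1 \vdash \exists x, \alpha(x)$” and “$\Delta_2, \alpha(\hat{x}) \vdash \beta$” is always valid. This establishes the validity of
the EE rule. This completes the demonstration of the plausibility of the inference rules in Definition 6.3.9.
Therefore it is probably safe to discard the semantic justifications and pretend that Definition 6.3.9 fell from
the sky on golden tablets (which are kept in a secure location where sceptics are not permitted to visit).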
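
The knowledge-set form of the EE rule can likewise be tested exhaustively on a tiny model. The Python sketch below is a hedged illustration only: the sets `T` and `U` and the function names are assumptions made for this example. The function checks the EE inference for every assignment of subsets of `T` to $K_{\Delta_1}$, $K_{\Delta_2}$, $K_{\alpha(\hat{x})}$ and $K_\beta$. As a contrast, it can also be run with $K_{\Delta_2}$ allowed to vary with $\hat{x}$, which the statement of the EE rule does not permit.

```python
from itertools import product

T = (0, 1)  # assumed stand-ins for truth value maps
U = (0, 1)  # assumed non-empty parameter universe
SUBSETS = [frozenset(t for t, keep in zip(T, bits) if keep)
           for bits in product((False, True), repeat=len(T))]


def big_union(sets):
    """Union of a finite family of frozensets."""
    return frozenset().union(*sets)


def ee_always_valid(delta2_depends_on_xhat):
    """Test the EE inference over all knowledge-set assignments on T and U."""
    for k1, kb in product(SUBSETS, repeat=2):
        for a in product(SUBSETS, repeat=len(U)):
            k_alpha = dict(zip(U, a))
            for d in product(SUBSETS, repeat=len(U)):
                if not delta2_depends_on_xhat and len(set(d)) != 1:
                    continue  # keep only families K_Delta2 which ignore xhat
                k2 = dict(zip(U, d))
                pre1 = k1.issubset(big_union(k_alpha[x] for x in U))
                pre2 = all((k2[xh] & k_alpha[xh]).issubset(kb) for xh in U)
                concl = all((k1 & k2[xh]).issubset(kb) for xh in U)
                if pre1 and pre2 and not concl:
                    return False
    return True


print(ee_always_valid(False), ee_always_valid(True))
```

On this model the rule holds when $\Delta_2$ is independent of $\hat{x}$ and fails when it is not. This gives a concrete reason for the independence condition on the dependency lists in the EE rule.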


6.4.10 REMARK: Demonstration of plausibility of an alternative existential elimination rule. 
It is possibly (but not very likely) of some interest to demonstrate the plausibility of the alternative existential 
elimination rule EE’ in Remark 6.3.17, which may be repeated here as follows. 


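(EE′) From $\Delta \vdash \alpha(\hat{x}) \Rightarrow \beta$, infer $\Delta \vdash (\exists x, \alpha(x)) \Rightarrow \beta$.
The sequent $\Delta \vdash \alpha(\hat{x}) \Rightarrow \beta$ has the knowledge-set interpretation
$\forall \hat{x} \in U,\ (K_\Delta \subseteq (2^{\mathcal{P}} \setminus K_{\alpha(\hat{x})}) \cup K_\beta)$. But

\begin{align*}
\bigl(\forall \hat{x} \in U,\ (K_\Delta \subseteq (2^{\mathcal{P}} \setminus K_{\alpha(\hat{x})}) \cup K_\beta)\bigr)
&\Leftrightarrow K_\Delta \subseteq \bigcap_{\hat{x} \in U} \bigl((2^{\mathcal{P}} \setminus K_{\alpha(\hat{x})}) \cup K_\beta\bigr) \\
&\Leftrightarrow K_\Delta \subseteq \Bigl(\bigcap_{\hat{x} \in U} (2^{\mathcal{P}} \setminus K_{\alpha(\hat{x})})\Bigr) \cup K_\beta \\
&\Leftrightarrow K_\Delta \subseteq \Bigl(2^{\mathcal{P}} \setminus \bigcup_{\hat{x} \in U} K_{\alpha(\hat{x})}\Bigr) \cup K_\beta,
\end{align*}

which is the knowledge-set interpretation for the sequent $\Delta \vdash (\exists x, \alpha(x)) \Rightarrow \beta$. Hence rule EE′ is plausible.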


6.4.11 REMARK: Curious similarity of UE/UI and MP/CP rules. 

As a curiosity, it may be noted that the UE and MP rules are similar, and the UI and CP rules are similar.
The UE rule in Definition 6.3.9 has the following form. (The permitted dependence of $\Delta$ on $\hat{x}$ is ignored
here for the sake of tidiness and symmetry. But the conclusions are the same if $\Delta$ does depend on $\hat{x}$.)


(UE) From $\Delta \vdash \forall x, \alpha(x)$, infer $\Delta \vdash \alpha(\hat{x})$.


But one may think of the UE rule as having the following form.
(UE′) From $\Delta_1 \vdash \forall x, \alpha(x)$ and $\Delta_2 \vdash \hat{x} \in U$, infer $\Delta_1 \cup \Delta_2 \vdash \alpha(\hat{x})$.
This is very similar in form to the MP rule.


(MP) From $\Delta_1 \vdash \alpha \Rightarrow \beta$ and $\Delta_2 \vdash \alpha$, infer $\Delta_1 \cup \Delta_2 \vdash \beta$.


The formula “$\alpha \Rightarrow \beta$” is a kind of template into which a matching pattern $\alpha$ may be inserted to obtain $\beta$ as
output. In a similar way, the formula “$\forall x, \alpha(x)$” is a kind of template into which $\hat{x}$ may be inserted to obtain
$\alpha(\hat{x})$ as output. This is less surprising if one considers that $\forall x, \alpha(x)$ means more precisely $\forall x \in U, \alpha(x)$,
which is equivalent to $\forall x, (x \in U \Rightarrow \alpha(x))$. But “$x \in U \Rightarrow \alpha(x)$” is equivalent to “$x \in U \vdash \alpha(x)$”. Hence
from $\vdash \forall x, \alpha(x)$ and $\vdash \hat{x} \in U$, one may infer $\vdash \alpha(\hat{x})$. This leads directly to an inference rule of the form UE′.


Likewise, the UI rule in Definition 6.3.9 has the following form.
(UI) From $\Delta \vdash \alpha(\hat{x})$, infer $\Delta \vdash \forall x, \alpha(x)$.
But one may think of the UI rule as having the following form.


(UI′) From $\Delta, \hat{x} \in U \vdash \alpha(\hat{x})$, infer $\Delta \vdash \forall x, \alpha(x)$.
This is very similar in form to the CP rule.
(CP) From $\Delta, \alpha \vdash \beta$, infer $\Delta \vdash \alpha \Rightarrow \beta$.


Applying the CP rule to the pseudo-sequent “$\Delta, \hat{x} \in U \vdash \alpha(\hat{x})$”, one obtains “$\Delta \vdash \hat{x} \in U \Rightarrow \alpha(\hat{x})$”. When
the free variable $\hat{x}$ is universalised, the result is “$\Delta \vdash \forall \hat{x}, (\hat{x} \in U \Rightarrow \alpha(\hat{x}))$”, and this is equivalent to
“$\Delta \vdash \forall x \in U, \alpha(x)$” or “$\Delta \vdash \forall x, \alpha(x)$”, as in the standard UI rule. One could almost be tempted to try to
design an alternative kind of predicate calculus based upon such observations!


This kind of argument suggests that a universal formula “$\forall x, \alpha(x)$” may be thought of as having the
same meaning as the conditional formula “$\hat{x} \in U \Rightarrow \alpha(\hat{x})$”, where the variable $\hat{x}$ is universalised. If this were
accepted, then the UE/UI and MP/CP rules would in fact be the same.

The situation for the EE/EI rules is somewhat different because by default, free variables are assumed
to be universal. However, a formula “$\exists x, \alpha(x)$” may be thought of as having the same meaning as
the formula “$\hat{x} \in U \wedge \alpha(\hat{x})$”, where the variable $\hat{x}$ is existentialised. Then the standard properties of
the conjunction operator in Section 4.7 yield the EE/EI rules in a similar way because “$\exists x, \alpha(x)$” means
“$\exists x \in U, \alpha(x)$”, which is equivalent to “$\exists x, (x \in U \wedge \alpha(x))$”, which is simply the existentialised version of
the formula “$x \in U \wedge \alpha(x)$”.


A potential advantage of a predicate calculus in which the universe for predicate parameters has an explicit 
role would be the ability to support more than one parameter-universe. The multiple universe scenario 
can be supported in other ways, for example by defining fixed predicates $U_k(x)$ for $x \in U$ which mean
that $x$ is an element of a universe $U_k$. Therefore the extra effort to support explicit indication of multiple
parameter-universes is probably not profitable. 


6.5. Practical application of predicate calculus rules 


6.5.1 REMARK: Practical application of predicate calculus axioms. 

The axioms in Definition 6.3.9 mean that in any line of an argument, any axiom may be inserted with 
any well-defined formula uniformly substituted for the letters of the axiom. This substitution procedure is 
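demonstrated by the proofs of many assertions in Theorems 4.5.7, 4.5.16, 4.6.2 and 4.6.4. An example line
is as follows for Theorem 4.5.7 (vi).


    (1) $(\beta \Rightarrow \gamma) \Rightarrow (\alpha \Rightarrow (\beta \Rightarrow \gamma))$    PC1: $\alpha[\beta \Rightarrow \gamma]$, $\beta[\alpha]$ $\dashv$ $\emptyset$


Axiom (PC1) is the proposition template “$\alpha \Rightarrow (\beta \Rightarrow \alpha)$”, which is here uniformly substituted with the
wff “$\beta \Rightarrow \gamma$” for the letter $\alpha$, and with the wff “$\alpha$” for the letter $\beta$. The resulting substituted formula
“$(\beta \Rightarrow \gamma) \Rightarrow (\alpha \Rightarrow (\beta \Rightarrow \gamma))$” is then appended to the list of argument lines. Substitutions into axioms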
are always tautologies. So the list of dependencies for an axiom substitution line is always the empty list. 
(Note that argument line dependency lists are not shown in Chapter 4 because the lists are always empty 
for propositional calculus.) 


6.5.2 REMARK: Practical application of the predicate calculus MP rule. 

Suppose that a logical argument currently contains $n - 1$ argument lines with labels (1) to ($n - 1$). The
MP rule in Definition 6.3.9 permits line ($n$) to be appended to the argument as follows if there are lines ($i$)
and ($j$) already in the argument with the following form.


    (i) $\alpha \Rightarrow \beta$    $J_i$ $\dashv$ ($\Delta_i$)
    (j) $\alpha$    $J_j$ $\dashv$ ($\Delta_j$)
    (n) $\beta$    MP (i,j) $\dashv$ ($\Delta_i \cup \Delta_j$)


Here $J_i$ and $J_j$ denote the justifications for lines $i$ and $j$ respectively, where $i$ may be less than or greater
than $j$. The order does not matter, but the inequality $\max(i, j) < n$ is mandatory. $\Delta_i$ and $\Delta_j$ denote the lists
of line numbers of the antecedent propositions for the sequents on lines $i$ and $j$ respectively. The consequent
propositions of lines $i$ and $j$ have the form $\alpha \Rightarrow \beta$ and $\alpha$ respectively, where $\alpha$ and $\beta$ are well-formed formulas
in accordance with Definition 6.3.9. It is probably good practice to maintain the dependency line number
lists in numerical order for easy checking, whereas the justification “MP (i,j)” should list the line numbers $i$
and $j$ so that the formula “$\alpha \Rightarrow \beta$” appears on line $i$ and the formula “$\alpha$” appears on line $j$. In other words,
if $j < i$, the line numbers in the justification will be in reverse numerical order. An example application of
the MP rule appears on line (4) of the proof of Theorem 6.6.12 part (i) as follows.

    (1) $(\exists x, A(x)) \Rightarrow B$    Hyp $\dashv$ (1)
    (2) $A(\hat{x}_0)$    Hyp $\dashv$ (2)
    (3) $\exists x, A(x)$    EI (2) $\dashv$ (2)
    (4) $B$    MP (1,3) $\dashv$ (1,2)


In this example, $i = 1$, $j = 3$, $n = 4$, $\Delta_i$ = “1”, $\Delta_j$ = “2”, $\Delta_n$ = “1,2”, $\alpha$ = “$(\exists x, A(x))$” and $\beta$ = “$B$”.
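
The bookkeeping for the MP template can be mimicked mechanically. The following Python sketch is only an illustration: the `Line` record, the nested-tuple encoding of formulas and the function `apply_mp` are hypothetical names invented for this example, not part of Definition 6.3.9. It checks that line $i$ has the form $\alpha \Rightarrow \beta$, that line $j$ is $\alpha$, and then merges the two dependency lists in numerical order.

```python
from typing import NamedTuple, Tuple


class Line(NamedTuple):
    formula: object          # a string, or ("imp", a, b) for "a => b"
    justification: str
    deps: Tuple[int, ...]    # dependency list, kept in numerical order


def apply_mp(lines, i, j):
    """Build the line obtained by MP from line i (alpha => beta) and line j (alpha)."""
    major, minor = lines[i], lines[j]
    f = major.formula
    if not (isinstance(f, tuple) and len(f) == 3 and f[0] == "imp"):
        raise ValueError(f"line {i} is not an implication")
    _, alpha, beta = f
    if minor.formula != alpha:
        raise ValueError(f"line {j} does not match the antecedent of line {i}")
    deps = tuple(sorted(set(major.deps) | set(minor.deps)))
    return Line(beta, f"MP ({i},{j})", deps)


# Lines (1) to (3) of the proof of Theorem 6.6.12 part (i), in the assumed encoding.
EX_A = ("exists", "x", "A(x)")
lines = {
    1: Line(("imp", EX_A, "B"), "Hyp", (1,)),
    2: Line("A(x0)", "Hyp", (2,)),
    3: Line(EX_A, "EI (2)", (2,)),
}
lines[4] = apply_mp(lines, 1, 3)
print(lines[4])
```

Swapping the arguments, as in `apply_mp(lines, 3, 1)`, raises an error because line (3) is not an implication. This reflects the convention that the justification lists the implication line first.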


6.5.3 REMARK: Practical application of the predicate calculus CP rule. 
The application of the CP rule follows very much the pattern of the MP rule. A template for the application 
of the CP rule is as follows. 


    (i) $\alpha$    Hyp $\dashv$ (i)
    (j) $\beta$    $J_j$ $\dashv$ ($\Delta_j$)
    (n) $\alpha \Rightarrow \beta$    CP (i,j) $\dashv$ ($\Delta'_j$)


In this case, instead of merging the lists of dependency line numbers $\Delta_i$ and $\Delta_j$ to construct the list $\Delta_n$,
the output dependency list is constructed by removing one of the line numbers from the list $\Delta_j$. The only
possible justification of line $i$ is the Hyp rule. This is because the dependency list $\Delta_j = (\Delta'_j, i)$ for line $j$ must
indicate a dependency on line $i$, but the only way in which $\Delta_j$ can indicate line $i$ is via the Hyp rule, which
always sets the dependency list to “(i)”, with no other line numbers in the list. All other rules obtain their
dependency lists from earlier line numbers, or else have an empty dependency list. An example application
of rule CP appears on line (5) of the proof of Theorem 6.6.12 part (xvii) as follows.


    (1) $A \Rightarrow \forall x, B(x)$    Hyp $\dashv$ (1)
    (2) $A$    Hyp $\dashv$ (2)
    (3) $\forall x, B(x)$    MP (1,2) $\dashv$ (1,2)
    (4) $B(\hat{x}_0)$    UE (3) $\dashv$ (1,2)
    (5) $A \Rightarrow B(\hat{x}_0)$    CP (2,4) $\dashv$ (1)

In this example, $i = 2$, $j = 4$, $n = 5$, $\Delta_i$ = “2”, $\Delta_j$ = “1,2”, $\Delta_n$ = “1”, $\alpha$ = “$A$” and $\beta$ = “$B(\hat{x}_0)$”.
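
The discharge of a hypothesis by the CP rule amounts to deleting one line number from a dependency list. The short Python sketch below illustrates only this bookkeeping. The function name `cp_deps` is a hypothetical choice, and the requirement that $i$ must appear in $\Delta_j$ follows the template described here rather than any independently verified formal specification.

```python
def cp_deps(deps_j, i):
    """Dependency list for a CP (i,j) line: deps_j with the hypothesis line i removed."""
    if i not in deps_j:
        raise ValueError(f"line {i} is not a dependency, so there is nothing to discharge")
    return tuple(d for d in deps_j if d != i)


# Line (5) of the proof of Theorem 6.6.12 part (xvii): CP (2,4) with Delta_4 = (1,2).
print(cp_deps((1, 2), 2))
```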

6.5.4 REMARK: Practical application of the predicate calculus Hyp rule. 

A template for the application of the Hyp rule “$\alpha \vdash \alpha$” is as follows.

    (n) $\alpha$    Hyp $\dashv$ (n)


The Hyp rule is the only rule which introduces a new line number into dependency lists, and only the
dependency lists of lines which apply the Hyp rule satisfy $n \in \Delta_n$. One may think of Hyp rule lines as
"rhetorical" because they don't really assert anything. Such lines are inserted into an argument because 
they will be useful later. They are completely useless and without effect if they are not referred to later in 
the argument. 


6.5.5 REMARK: Practical application of the predicate calculus UE and UI rules. 
A template for the application of the UE rule is as follows. (Note that $\Delta_i$ may depend on $\hat{x}$, and $\alpha$ may
have an extra dependence on $\hat{y}$.)


    (i) $\forall x, \alpha(x)$    $J_i$ $\dashv$ ($\Delta_i$)
    (n) $\alpha(\hat{x})$    UE (i) $\dashv$ ($\Delta_i$)

For the UE rule, the dependency list $\Delta_n$ is always the same as the dependency list $\Delta_i$. To keep the free
variables distinct, each introduced free variable $\hat{x}$ must have a different label. It is convenient to do this by
adding subscript indices, like $\hat{x}_0$, $\hat{x}_1$, etc.


A template for the application of the UI rule is as follows. (Note that $\Delta_i$ must be independent of $\hat{x}$, and $\alpha$
may have an extra dependence on $\hat{y}$.)

    (i) $\alpha(\hat{x})$    $J_i$ $\dashv$ ($\Delta_i$)
    (n) $\forall x, \alpha(x)$    UI (i) $\dashv$ ($\Delta_i$)

This is almost the exact converse of the UE rule. (The only difference is that the dependency list $\Delta_i$ may
depend on $\hat{x}$ in the UE case.)

Both the UE and UI rules are well exemplified in the proof of Theorem 6.6.12 part (xvii) as follows. (This
example also neatly demonstrates the interaction between the Hyp, MP and CP rules.)


    (1) $A \Rightarrow \forall x, B(x)$    Hyp $\dashv$ (1)
    (2) $A$    Hyp $\dashv$ (2)
    (3) $\forall x, B(x)$    MP (1,2) $\dashv$ (1,2)
    (4) $B(\hat{x}_0)$    UE (3) $\dashv$ (1,2)
    (5) $A \Rightarrow B(\hat{x}_0)$    CP (2,4) $\dashv$ (1)
    (6) $\forall x, (A \Rightarrow B(x))$    UI (5) $\dashv$ (1)


A very important point to note about the application of the UI rule is that the dependency list must contain
no instances of the free variable which is being removed by the application of the rule. This is a very easy
requirement to overlook. The dependencies of each argument line are abbreviated as a list of line numbers in
parentheses at the far right of the line. The lines referred to by these numbers must be examined to ensure
that the removed free variable is not present. In the above example, the UI rule is applied to line (5) to
remove the free variable $\hat{x}_0$. Line (5) depends on line (1), which luckily does not contain the free variable $\hat{x}_0$.
Two examples of attempted pseudo-proofs where this non-dependency check could easily be overlooked are
described in Remarks 6.6.8 and 6.6.26. This kind of error is particularly easy to make because it is not the
line to which the rule is being applied which must be checked. It is the line or lines referred to by the number
or numbers in parentheses at the right which must be checked!
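
The non-dependency check for the UI rule is easy to automate once the free variables of each line are recorded. The following Python sketch is a minimal illustration under assumptions: each argument line is represented by a hypothetical pair consisting of the set of free variables in its formula and its dependency list, and the function name `ui_allowed` is invented for this example.

```python
def ui_allowed(lines, i, var):
    """True if no line in the dependency list of line i contains the free variable var.

    lines maps a line number to (free variables of the formula, dependency list).
    Note that it is the dependency lines which are inspected, not line i itself.
    """
    _, deps = lines[i]
    return all(var not in lines[d][0] for d in deps)


# Proof of Theorem 6.6.12 part (xvii), lines (1) to (5).
proof_xvii = {
    1: (set(), (1,)),        # A => forall x, B(x)
    2: (set(), (2,)),        # A
    3: (set(), (1, 2)),      # forall x, B(x)
    4: ({"x0"}, (1, 2)),     # B(x0)
    5: ({"x0"}, (1,)),       # A => B(x0)
}

# Pseudo-proof of Remark 6.6.8, lines (1) and (2).
pseudo_proof = {
    1: (set(), (1,)),        # exists x, A(x)
    2: ({"x0"}, (2,)),       # A(x0), a hypothesis which depends on itself
}

print(ui_allowed(proof_xvii, 5, "x0"), ui_allowed(pseudo_proof, 2, "x0"))
```

The first application passes the check because line (1) contains no free variable. The second fails because line (2) depends on itself and contains the variable which is to be removed.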


6.5.6 REMARK: Limited usefulness of the free variable dependency permission for the UE rule. 

The UE rule seems to only rarely exploit the permission to make the dependency list $\Delta_i$ depend on the free
variable $\hat{x}$. This may be because when this happens, considerable information is “thrown away”, and mostly
one does not wish to “throw away” information in a proof. The weakness of this case can be seen if one
considers that the UE rule allows one to infer a sequent of the form “$\Delta(\hat{y}) \vdash \alpha(\hat{x})$” from a sequent of the
form “$\Delta(\hat{y}) \vdash \forall x, \alpha(x)$”. A sequent “$\Delta(\hat{y}) \vdash \alpha(\hat{x})$” is valid both if $\hat{x}$ and $\hat{y}$ are different and if they are the
same, but requiring them to be the same clearly “throws away” a large range of possibilities. It is always
permitted to uniformly substitute one free variable for another in a sequent, and if this makes two of the free
variables the same, the range of propositions which are asserted by the sequent is considerably restricted.
The appearance of the free variable $\hat{x}$ in the statement of Definition 6.3.9 rule (UE) is merely permitting
the possibility of discarding useful information, which one can easily do in other ways. By contrast, the
appearance of the free variable $\hat{x}$ in Definition 6.3.9 rule (EI) offers a considerable strengthening of the rule
because in that case, $\hat{x}$ is being eliminated from the consequent of the sequent, not being introduced.


The following is a contrived example application of the UE rule where the dependency list contains the same
free variable which is being eliminated from the universally quantified predicate.


    (1) $\forall x, (A(x) \Rightarrow \forall y, B(y))$    Hyp $\dashv$ (1)
    (2) $A(\hat{x}_0) \Rightarrow \forall y, B(y)$    UE (1) $\dashv$ (1)
    (3) $A(\hat{x}_0)$    Hyp $\dashv$ (3)
    (4) $\forall y, B(y)$    MP (2,3) $\dashv$ (1,3)
    (5) $B(\hat{x}_0)$    UE (4) $\dashv$ (1,3)
    (6) $A(\hat{x}_0) \Rightarrow B(\hat{x}_0)$    CP (3,5) $\dashv$ (1)
    (7) $\forall x, (A(x) \Rightarrow B(x))$    UI (6) $\dashv$ (1)


In this argument, clearly a lot of information has been “given away” by the application of the UE rule in
line (5) because the same free variable $\hat{x}_0$ was chosen for the universal elimination as was used in the universal
elimination in line (2). This kind of choice is rarely useful in a proof, whereas in the case of the EI rule, it is
very common to see an existential quantifier introduction use the same free variable as a proposition in the
dependency list.


6.5.7 REMARK: Practical application of the predicate calculus EE rule. 
A template for the application of the EE rule is as follows. (Note that $\Delta_i$ and $\Delta'_k$ must be independent
of $\hat{x}$.)

    (i) $\exists x, \alpha(x)$    $J_i$ $\dashv$ ($\Delta_i$)
    (j) $\alpha(\hat{x})$    Hyp $\dashv$ (j)
    (k) $\beta$    $J_k$ $\dashv$ ($\Delta_k$)
    (n) $\beta$    EE (i,j,k) $\dashv$ ($\Delta_i \cup \Delta'_k$)


Here $\Delta'_k$ denotes $\Delta_k \setminus \{j\}$, where $\Delta_k$ is the set (or list) of dependencies of line $k$. So one may write
$\Delta_n = \Delta_i \cup (\Delta_k \setminus \{j\})$. Thus the dependency on line $j$ is “discharged”. The justification $J_n$ = “EE (i,j,k)”
lists the input lines $i$, $j$ and $k$ in the order in which they appear in the statement of the EE rule in
Definition 6.3.9. The justification of line $j$ is always the Hyp rule because the only way in which line $j$
can be a dependency of line $k$ is via the Hyp rule. This observation is also mentioned in Remark 6.5.3 in
connection with the CP rule. Thus both the CP and EE rules introduce hypotheses with the intention of
discharging them.


The EE rule is the most complex of the inference rules in Definition 6.3.9 in practical applications. It always
yields two lines with the same consequent proposition $\beta$. This might make one think that an efficiency can
be achieved by removing one of these lines. Some authors do in fact merge the final two lines $k$ and $n$ into a
single line, usually omitting the dependency lists. This can make errors difficult to find. All in all, it is best
to write out the sequent dependencies in full, showing the full four lines as indicated here.


A typical application of the EE rule is shown in the proof of Theorem 6.6.12 part (xi) as follows.


    (1) $\exists x, (A(x) \Rightarrow B)$    Hyp $\dashv$ (1)
    (2) $A(\hat{x}_0) \Rightarrow B$    Hyp $\dashv$ (2)
    (3) $\forall x, A(x)$    Hyp $\dashv$ (3)
    (4) $A(\hat{x}_0)$    UE (3) $\dashv$ (3)
    (5) $B$    MP (2,4) $\dashv$ (2,3)
    (6) $B$    EE (1,2,5) $\dashv$ (1,3)
    (7) $(\forall x, A(x)) \Rightarrow B$    CP (3,6) $\dashv$ (1)


The hypothesis in line (2) is introduced so that it can be discharged by the EE rule in line (6). The hypothesis
in line (3) is introduced so that it can be discharged by the CP rule in line (7). The justification “EE (1,2,5)”
for line (6) lists line (1) as the existentially quantified proposition $\exists x, (A(x) \Rightarrow B)$ which will become the
new dependency for the proposition $B$ which is input on line (5), while discharging the dependency on the
proposition $A(\hat{x}_0) \Rightarrow B$ on line (2). Thus the dependency list $\Delta_5 = (2,3)$ is replaced with $\Delta_6 = (1,3)$. This
is an important psychological step. It means that the argument from line (2) to line (5), which proves $B$ from
$A(\hat{x}_0) \Rightarrow B$ for an arbitrary choice of $\hat{x}_0$, justifies the conclusion that $B$ follows from the existential proposition
$\exists x, (A(x) \Rightarrow B)$. In other words, no matter which value of $\hat{x}_0$ “exists” in line (1), the conclusion $B$ followed
from $A(\hat{x}_0) \Rightarrow B$. Therefore the conclusion $B$ follows from $\exists x, (A(x) \Rightarrow B)$, independent of the choice of $\hat{x}_0$,
and therefore the dependency on line (2) may be discharged. It is noteworthy that at no point is it stated
in this argument that $\hat{x}_0$ “exists”, nor that $\hat{x}_0$ is “chosen”, even though a mathematician composing such a
proof may often think of the value of $\hat{x}_0$ as “existing” or having been “chosen”.
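
The dependency arithmetic for the EE rule, namely $\Delta_n = \Delta_i \cup (\Delta_k \setminus \{j\})$, can be written out as a one-line computation. The Python sketch below is purely illustrative. The function name `ee_deps` is hypothetical, and the check that $j$ appears in $\Delta_k$ is an assumption taken from the template, where line $j$ is a hypothesis introduced in order to be discharged.

```python
def ee_deps(deps_i, deps_k, j):
    """Dependency list for an EE (i,j,k) line: deps_i merged with deps_k minus line j."""
    if j not in deps_k:
        raise ValueError(f"line {j} is not a dependency of the input line k")
    return tuple(sorted(set(deps_i) | (set(deps_k) - {j})))


# Line (6) of the proof of Theorem 6.6.12 part (xi): EE (1,2,5),
# with Delta_1 = (1) and Delta_5 = (2,3).
print(ee_deps((1,), (2, 3), 2))
```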


6.5.8 REMARK: Practical application of the predicate calculus EI rule. 
A template for the application of the EI rule is as follows. (Note that $\Delta_i$ may depend on $\hat{x}$, and $\alpha$ may have
an extra dependence on $\hat{y}$.)

    (i) $\alpha(\hat{x})$    $J_i$ $\dashv$ ($\Delta_i$)
    (n) $\exists x, \alpha(x)$    EI (i) $\dashv$ ($\Delta_i$)

This is almost identical to the template for the UI rule. A typical application of the EI rule is shown in the
proof of Theorem 6.6.24 part (ii) as follows.


    (1) $\exists x, \exists y, A(x,y)$    Hyp $\dashv$ (1)
    (2) $\exists y, A(\hat{x}_0, y)$    Hyp $\dashv$ (2)
    (3) $A(\hat{x}_0, \hat{y}_0)$    Hyp $\dashv$ (3)
    (4) $\exists x, A(x, \hat{y}_0)$    EI (3) $\dashv$ (3)
    (5) $\exists y, \exists x, A(x,y)$    EI (4) $\dashv$ (3)
    (6) $\exists y, \exists x, A(x,y)$    EE (2,3,5) $\dashv$ (2)
    (7) $\exists y, \exists x, A(x,y)$    EE (1,2,6) $\dashv$ (1)


Although in general the EI rule may seem to “discard information” because it yields a weaker conclusion
than the UI rule from the same antecedent proposition, in this case no information is “lost”. This is because
the free variable $\hat{x}_0$ in the antecedent proposition $A(\hat{x}_0, \hat{y}_0)$ for the EI rule application in line (4), for example,
is effectively an existential variable. When the dependency on $\hat{x}_0$ is discharged by the EE rule in line (6),
there may be only one value of $\hat{x}_0$ which “exists”. (It is very significant that in both lines (4) and (5),
the antecedent proposition in line (3) depends on the variable which is being converted to an existential
quantifier, exploiting the permitted dependence of $\Delta(\hat{x})$ on $\hat{x}$ in the EI rule.) In the proof of Theorem 6.6.24
part (iii), by contrast, information is “lost” because the corresponding variable $\hat{x}_0$ in the antecedent line (3)
is a completely arbitrary element of the parameter-universe $U$, and this is not exploited by the EI rule
application in line (4) of that proof.


6.5.9 REMARK: Sequent expansions and knowledge set semantics for predicate calculus arguments. 

Every line in an argument according to the axioms and rules in Definition 6.3.9 must be valid in itself when 
interpreted as a sequent. In other words, each line is effectively a tautology. The rules are designed to ensure 
that only valid sequents follow from valid sequents. 


As an example, the following argument lines are taken from the proof of part (i) of Theorem 6.6.12. 


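    argument lines    sequent expansions
    (1) $(\exists x, A(x)) \Rightarrow B$    Hyp $\dashv$ (1)    $((\exists x, A(x)) \Rightarrow B) \Rightarrow ((\exists x, A(x)) \Rightarrow B)$
    (2) $A(\hat{x}_0)$    Hyp $\dashv$ (2)    $\forall \hat{x}_0,\ A(\hat{x}_0) \Rightarrow A(\hat{x}_0)$
    (3) $\exists x, A(x)$    EI (2) $\dashv$ (2)    $\forall \hat{x}_0,\ A(\hat{x}_0) \Rightarrow \exists x, A(x)$
    (4) $B$    MP (1,3) $\dashv$ (1,2)    $\forall \hat{x}_0,\ (((\exists x, A(x)) \Rightarrow B) \wedge A(\hat{x}_0)) \Rightarrow B$
    (5) $A(\hat{x}_0) \Rightarrow B$    CP (2,4) $\dashv$ (1)    $\forall \hat{x}_0,\ ((\exists x, A(x)) \Rightarrow B) \Rightarrow (A(\hat{x}_0) \Rightarrow B)$
    (6) $\forall x, (A(x) \Rightarrow B)$    UI (5) $\dashv$ (1)    $((\exists x, A(x)) \Rightarrow B) \Rightarrow (\forall x, (A(x) \Rightarrow B))$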
The argument lines on the left are expanded as sequents on the right. The dependency lists in parentheses at 
the right of each argument line are fully expanded on the right, with commas replaced by logical conjunction 


operators. Assertion symbols are converted to implication operators and free variables are assumed to be 
universally quantified. Visual inspection of the sequent expansions reveals that they are all tautological. 
(See Remark 6.6.26 for an example of how such sequent expansions can be used to reveal inference errors.) 
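
The claim that every sequent expansion is tautological can be checked mechanically on a small finite model instead of by visual inspection. The Python sketch below is an illustration under assumptions: the two-element universe `U`, the dictionary encoding of truth value maps and the function names are all choices made for this example. It evaluates the six sequent expansions for the proof of Theorem 6.6.12 part (i) under every truth assignment to $A(0)$, $A(1)$ and $B$, and every value of the free variable $\hat{x}_0$.

```python
from itertools import product

U = (0, 1)  # assumed two-element universe of predicate parameters


def models():
    """Yield every truth assignment to A(0), A(1) and B."""
    for a0, a1, b in product((False, True), repeat=3):
        yield {"A": {0: a0, 1: a1}, "B": b}


def imp(p, q):
    return (not p) or q


def ex_a(m):
    """Truth value of: exists x, A(x)."""
    return any(m["A"][x] for x in U)


def hyp1(m):
    """Truth value of line (1): (exists x, A(x)) => B."""
    return imp(ex_a(m), m["B"])


# Sequent expansions, with the universal quantifier on x0 applied by the caller.
SEQUENTS = {
    1: lambda m, x0: imp(hyp1(m), hyp1(m)),
    2: lambda m, x0: imp(m["A"][x0], m["A"][x0]),
    3: lambda m, x0: imp(m["A"][x0], ex_a(m)),
    4: lambda m, x0: imp(hyp1(m) and m["A"][x0], m["B"]),
    5: lambda m, x0: imp(hyp1(m), imp(m["A"][x0], m["B"])),
    6: lambda m, x0: imp(hyp1(m), all(imp(m["A"][x], m["B"]) for x in U)),
}


def all_valid():
    """Map each line number to True if its expansion holds in every model."""
    return {n: all(s(m, x0) for m in models() for x0 in U)
            for n, s in SEQUENTS.items()}


print(all_valid())
```

All six lines are reported as valid on this model. Such a check covers only one finite universe, so it supports the visual inspection rather than replacing the general argument.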


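The meaning of all lines in the above argument may be expressed in terms of the knowledge set concept in
Section 3.4 as follows. (To save space, the universal quantifier “$\forall \hat{x}_0 \in U$” is abbreviated as “$\forall \hat{x}_0$”, where $U$
is the universe of predicate parameters.)


    argument lines    knowledge set semantics
    (1) $(\exists x, A(x)) \Rightarrow B$    Hyp $\dashv$ (1)    $(2^{\mathcal{P}} \setminus \bigcup_{x \in U} K_{A(x)}) \cup K_B \subseteq (2^{\mathcal{P}} \setminus \bigcup_{x \in U} K_{A(x)}) \cup K_B$
    (2) $A(\hat{x}_0)$    Hyp $\dashv$ (2)    $\forall \hat{x}_0,\ K_{A(\hat{x}_0)} \subseteq K_{A(\hat{x}_0)}$
    (3) $\exists x, A(x)$    EI (2) $\dashv$ (2)    $\forall \hat{x}_0,\ K_{A(\hat{x}_0)} \subseteq \bigcup_{x \in U} K_{A(x)}$
    (4) $B$    MP (1,3) $\dashv$ (1,2)    $\forall \hat{x}_0,\ ((2^{\mathcal{P}} \setminus \bigcup_{x \in U} K_{A(x)}) \cup K_B) \cap K_{A(\hat{x}_0)} \subseteq K_B$
    (5) $A(\hat{x}_0) \Rightarrow B$    CP (2,4) $\dashv$ (1)    $\forall \hat{x}_0,\ (2^{\mathcal{P}} \setminus \bigcup_{x \in U} K_{A(x)}) \cup K_B \subseteq (2^{\mathcal{P}} \setminus K_{A(\hat{x}_0)}) \cup K_B$
    (6) $\forall x, (A(x) \Rightarrow B)$    UI (5) $\dashv$ (1)    $(2^{\mathcal{P}} \setminus \bigcup_{x \in U} K_{A(x)}) \cup K_B \subseteq \bigcap_{x \in U} ((2^{\mathcal{P}} \setminus K_{A(x)}) \cup K_B)$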


The above table exemplifies the core concept underlying predicate logic in this book. As mentioned in 
Remarks 2.1.1, 3.0.5 and 3.14.3, there is a kind of duality between logic and naive set theory where logical 
language may be regarded as a shorthand for logical operations on knowledge sets, and knowledge sets may 
be regarded as a “model” for a logical language. Some samples of this duality between the “language world” 
and the “model world" for predicate logic are given in the table in Remark 5.3.11. The “naive set theory” 
referred to here is the very minimalist set theory which is hinted at in Remark 3.2.10. 
6.6. Some basic predicate calculus theorems


6.6.1 REMARK: Some theorems for predicate calculus. 

Theorems 6.6.4, 6.6.7, 6.6.10, 6.6.12, 6.6.16, 6.6.18 and 6.6.24 present and prove some basic predicate calculus 
tautologies and sequents. Every sequent (with a non-empty list of premises) is a kind of “derived rule” for 
obtaining true propositions from true propositions, whereas an unconditional assertion (with an empty list 


of premises) is asserted to be a tautology. 


These theorems are directly applicable to set theory, topology and analysis. It often happens that a step in a 
proof is intuitively fairly clear, but is open to some lingering doubts. In cases where the intuition is uncertain, 
it is very helpful to have a small “library” of rigorously verified logic theorems at hand to help ensure that 
logical “bugs” do not enter into the proof. The style of predicate calculus in Definition 6.3.9 has been chosen 
to facilitate the expression of natural deduction arguments in rigorous form. Real mathematicians do not 
use the austere styles of logical argument where modus ponens is the only inference rule. 


6.6.2 REMARK: Quantification of constant propositions. 

In practice, predicates of the form $\forall x, A$ or $\exists x, A$, where $A$ is a logical formula with no occurrences of the
free variable $x$, may seem to be of technical interest only. However, such predicates do have an important
practical role in addition to their technical role. Some basic assertions for such predicates are given in
Theorem 6.6.4. The non-emptiness of the universe of objects $U$ is required for parts (ii) and (iv). (See
Remark 6.3.4 for empty universes of objects.)


6.6.3 REMARK: Convention for indicating predicate parameter lists. 

The notational conventions to indicate dependence or independence of a predicate on a variable are adopted 
from common informal practice in the mathematical literature. This must be distinguished from the more 
formal practice, where f(x), for example, means the value of a function f for a parameter x. Thus f is the 
function and f(x) is its value. 


In the often-seen informal practice, a list of variables may be appended to any expression to indicate the
expression's dependencies which are relevant in a particular context. Thus, for example, when one writes
$\forall \varepsilon \in \mathbb{R}^+, \exists \delta \in \mathbb{R}^+, A(\delta,\varepsilon)$, one may write $\delta(\varepsilon)$ instead of $\delta$ to indicate that $\delta$ depends on $\varepsilon$. In another
kind of context, the bounds for solutions of partial differential equations are often written with a list of
dependencies. Thus, for example, this assertion (under some complicated conditions) appears at the end of
a theorem which appears in Gilbarg/Trudinger [81], page 95.


Then
\[
|u|^{*}_{2,\alpha;\Omega \cup T} \le C\bigl(|u|_{0;\Omega} + |f|^{(2)}_{0,\alpha;\Omega \cup T}\bigr)
\]
where $C = C(n, \alpha, \lambda, \Lambda)$.

This informal practice contradicts the convention that $C$ denotes the function and $C(n, \alpha, \lambda, \Lambda)$ denotes the
value of the function. (See Notation 10.2.9 for this convention for functions and their values.) The purpose
of this style of notation is to suggest that an expression depends only on the indicated list of parameters,
or at least it is intended to suggest that any other dependencies are irrelevant to the context in which the
expression appears.


It is the informal practice which is adopted for predicate calculus here. Thus “$\forall x, A$”, for example, indicates
that the logical expression $A$ has no occurrence of the variable $x$ when it is fully expanded. This is essentially
equivalent to writing “$\forall x, A(x)$” with the proviso that $A(x)$ is the same proposition for any $x \in U$, for the
relevant universe $U$. One may think of the absence of a parameter $x$ in a logical expression $A$ as meaning
that the proposition “$\forall x, \forall y, (A(x) \Rightarrow A(y))$” is being asserted. In mathematical logic, as in mathematics in
general, one must always be alert for hidden dependencies. The standard logic textbooks abound in tedious
conditions regarding bound and free variables. It is often not at all clear that the technical conditions
regarding variables in predicate calculus are not subjective or intuitive.


The convention adopted in this book for the indication of proposition parameter lists is also described in 
Remark 5.2.5. 


6.6.4 THEOREM [QC]: Some assertions for zero-parameter predicates. 
The following assertions follow from the predicate calculus in Definition 6.3.9. 
(i) $\forall x, A \vdash A$.
(ii) $\exists x, A \vdash A$.
(iii) $A \vdash \forall x, A$.
(iv) $A \vdash \exists x, A$.

PROOF: To prove part (i): $\forall x, A \vdash A$

    (1) $\forall x, A$    Hyp $\dashv$ (1)
    (2) $A$    UE (1) $\dashv$ (1)

To prove part (ii): $\exists x, A \vdash A$

    (1) $\exists x, A$    Hyp $\dashv$ (1)
    (2) $A$    Hyp $\dashv$ (2)
    (3) $A$    Theorem 4.5.7 (xii) (2) $\dashv$ (2)
    (4) $A$    EE (1,2,3) $\dashv$ (1)

To prove part (iii): $A \vdash \forall x, A$

    (1) $A$    Hyp $\dashv$ (1)
    (2) $\forall x, A$    UI (1) $\dashv$ (1)

To prove part (iv): $A \vdash \exists x, A$

    (1) $A$    Hyp $\dashv$ (1)
    (2) $\exists x, A$    EI (1) $\dashv$ (1)


This completes the proof of Theorem 6.6.4.


6.6.5 REMARK: Scrutiny of predicate calculus for zero-parameter predicates. 

Theorem 6.6.4 requires some scrutiny to ensure that there is no sleight of hand here, no cards up the sleeve, 
no rabbits in hats. It is helpful in part (i) to add a redundant dependency on the variable x to see how the 
proof develops with exactly the same rules being applied at each step. 


To prove Theorem 6.6.4 part (i): $\forall x, A \vdash A$

    (1) $\forall x, A(x)$    Hyp $\dashv$ (1)
    (2) $A(\hat{x}_0)$    UE (1) $\dashv$ (1)


This is a correct proof of part (i) because $A(\hat{x}_0)$ is the same as $A$ for any choice of $\hat{x}_0$. In the proof provided
for part (i) following the statement of Theorem 6.6.4, the redundant dependency on the parameter of $A$ has
been omitted. As mentioned in Remark 6.6.3, the convention is adopted here that the absence of an explicit
parameter for a predicate means that the predicate is independent of that parameter.


To prove Theorem 6.6.4 part (ii): $\exists x, A \vdash A$

    (1) $\exists x, A(x)$    Hyp $\dashv$ (1)
    (2) $A(\hat{x}_0)$    Hyp $\dashv$ (2)
    (3) $A(\hat{x}_0)$    Theorem 4.5.7 (xii) (2) $\dashv$ (2)
    (4) $A(\hat{x}_0)$    EE (1,2,3) $\dashv$ (1)


Once again, the fact that $A(\hat{x}_0)$ is independent of $\hat{x}_0$ implies that the final line (4) is the same as $A$ with
the redundant dependency suppressed. However, there is no need to resort to such artifices. The proof of
Theorem 6.6.4 follows the specified rules.


6.6.6 REMARK: Basic relations between universal and existential quantifiers. 

Theorem 6.6.7 gives some very basic, low-level relations between the universal and existential quantifiers. 
Theorem 6.6.7 part (i) shows that $A(x)$ is true for some $x$ if it is true for all $x$. It is assumed that the
universe of objects is non-empty. (See Remark 6.3.4.) Consequently the assertion of $A(\hat{x}_0)$ for all variables
$\hat{x}_0$ in the universe on line (2) validly implies $\exists x, A(x)$ on line (3) by the EI rule.


Theorem 6.6.7 parts (iii) and (v) show the dual relationship between the universal and existential quantifiers. 
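6.6.7 THEOREM [QC]: Some elementary assertions for single-parameter predicates.
The following assertions follow from the predicate calculus in Definition 6.3.9.

(i) $\forall x, A(x) \vdash \exists x, A(x)$.
(ii) $\exists x, A(x) \vdash \neg(\forall x, \neg A(x))$.
(iii) $\exists x, \neg A(x) \vdash \neg(\forall x, A(x))$.
(iv) $\forall x, A(x) \vdash \neg(\exists x, \neg A(x))$.
(v) $\forall x, \neg A(x) \vdash \neg(\exists x, A(x))$.

PROOF: To prove part (i): $\forall x, A(x) \vdash \exists x, A(x)$

    (1) $\forall x, A(x)$    Hyp $\dashv$ (1)
    (2) $A(\hat{x}_0)$    UE (1) $\dashv$ (1)
    (3) $\exists x, A(x)$    EI (2) $\dashv$ (1)

To prove part (ii): $\exists x, A(x) \vdash \neg(\forall x, \neg A(x))$

    (1) $\exists x, A(x)$    Hyp $\dashv$ (1)
    (2) $A(\hat{x}_0)$    Hyp $\dashv$ (2)
    (3) $\forall x, \neg A(x)$    Hyp $\dashv$ (3)
    (4) $\neg A(\hat{x}_0)$    UE (3) $\dashv$ (3)
    (5) $(\forall x, \neg A(x)) \Rightarrow \neg A(\hat{x}_0)$    CP (3,4) $\dashv$ $\emptyset$
    (6) $\neg(\forall x, \neg A(x))$    Theorem 4.6.4 (iv) (2,5) $\dashv$ (2)
    (7) $\neg(\forall x, \neg A(x))$    EE (1,2,6) $\dashv$ (1)

To prove part (iii): $\exists x, \neg A(x) \vdash \neg(\forall x, A(x))$

    (1) $\exists x, \neg A(x)$    Hyp $\dashv$ (1)
    (2) $\neg A(\hat{x}_0)$    Hyp $\dashv$ (2)
    (3) $\forall x, A(x)$    Hyp $\dashv$ (3)
    (4) $A(\hat{x}_0)$    UE (3) $\dashv$ (3)
    (5) $(\forall x, A(x)) \Rightarrow A(\hat{x}_0)$    CP (3,4) $\dashv$ $\emptyset$
    (6) $\neg(\forall x, A(x))$    Theorem 4.6.4 (v) (2,5) $\dashv$ (2)
    (7) $\neg(\forall x, A(x))$    EE (1,2,6) $\dashv$ (1)

To prove part (iv): $\forall x, A(x) \vdash \neg(\exists x, \neg A(x))$

    (1) $\forall x, A(x)$    Hyp $\dashv$ (1)
    (2) $\exists x, \neg A(x)$    Hyp $\dashv$ (2)
    (3) $\neg(\forall x, A(x))$    part (iii) (2) $\dashv$ (2)
    (4) $(\exists x, \neg A(x)) \Rightarrow \neg(\forall x, A(x))$    CP (2,3) $\dashv$ $\emptyset$
    (5) $\neg(\exists x, \neg A(x))$    Theorem 4.6.4 (iv) (1,4) $\dashv$ (1)

To prove part (v): $\forall x, \neg A(x) \vdash \neg(\exists x, A(x))$

    (1) $\forall x, \neg A(x)$    Hyp $\dashv$ (1)
    (2) $\exists x, A(x)$    Hyp $\dashv$ (2)
    (3) $\neg(\forall x, \neg A(x))$    part (ii) (2) $\dashv$ (2)
    (4) $(\exists x, A(x)) \Rightarrow \neg(\forall x, \neg A(x))$    CP (2,3) $\dashv$ $\emptyset$
    (5) $\neg(\exists x, A(x))$    Theorem 4.6.4 (iv) (1,4) $\dashv$ (1)


This completes the proof of Theorem 6.6.7.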
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6.6.8 REMARK: Testing the reversibility of a non-reversible conditional assertion. 

A first test for any predicate calculus formalism is whether the assertion in part (i) of Theorem 6.6.7 can
be reversed to “prove” the obviously false assertion “∃x, A(x) ⊢ ∀x, A(x)”. (Preventing this kind of false
proof is not as easy as one might expect.) Attempting to write such a proof helps to clarify the rules. An
attempt to carry out such a false proof might have the following form.

To prove: ∃x, A(x) ⊢ ∀x, A(x)    [pseudo-theorem]

(1) ∃x, A(x)    Hyp ⊣ (1)
(2) A(x̂₀)    Hyp ⊣ (2)
(3) ∀x, A(x)    (*) UI (2) ⊣ (2)
(4) ∀x, A(x)    EE (1,2,3) ⊣ (1)

Line (3) is marked with an asterisk “(*)” because it is erroneous. Rule UI in Definition 6.3.9 states that from
Δ ⊢ α(x̂), one may infer Δ ⊢ ∀x, α(x), but Δ must not have a parameter x̂ because there is no suffix “(x̂)”
to indicate any such parameter. As mentioned in Remark 6.6.3, the absence of a parameter indication means
that such a parameter is absent. In this case, however, the dependency for A(x̂₀) in line (2) is A(x̂₀). In
semantic terms, this means that the assertion of A(x̂₀) is not being made for general x̂₀ ∈ U. Hence line (3)
does not follow from line (2). (See Remark 6.6.26 for some similar comments in regard to the application of
Rule G in the context of a related pseudo-proof.)
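
The failure of the reversed assertion can also be observed semantically. The following Python sketch is not part of the predicate calculus in Definition 6.3.9. It assumes a two-element universe U and interprets A as an arbitrary truth-valued function on U; the helper names are hypothetical and chosen only for this illustration.

```python
from itertools import product

def predicates(universe):
    """Yield every predicate A on the universe as a dict x -> bool."""
    for values in product([False, True], repeat=len(universe)):
        yield dict(zip(universe, values))

def forall_implies_exists(A, universe):
    # Semantic reading of "∀x, A(x) ⊢ ∃x, A(x)".
    return (not all(A[x] for x in universe)) or any(A[x] for x in universe)

def exists_implies_forall(A, universe):
    # Semantic reading of the pseudo-theorem "∃x, A(x) ⊢ ∀x, A(x)".
    return (not any(A[x] for x in universe)) or all(A[x] for x in universe)

U = [0, 1]  # assumed non-empty universe
valid_forward = all(forall_implies_exists(A, U) for A in predicates(U))
counterexamples = [A for A in predicates(U) if not exists_implies_forall(A, U)]
```

Under these assumptions the forward assertion holds for every predicate, while any predicate which is true for exactly one element of U refutes the reversed assertion. Such a search says nothing about which inference rule was misapplied; it only confirms that no correct proof can exist.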


6.6.9 REMARK: Some practically useful relations between universal and existential quantifiers. 
Theorem 6.6.10 gives some practically useful equivalent forms of assertions (ii) and (iv) in Theorem 6.6.7. 


6.6.10 THEOREM [QC]: Basic assertions for one single-parameter predicate.
The following assertions follow from the predicate calculus in Definition 6.3.9.

(i) ⊢ (¬(∃x, A(x))) ⇒ (∀x, ¬A(x)).
(ii) ⊢ (¬(∀x, ¬A(x))) ⇒ (∃x, A(x)).
(iii) ⊢ (¬(∃x, ¬A(x))) ⇒ (∀x, A(x)).
(iv) ⊢ (¬(∀x, A(x))) ⇒ (∃x, ¬A(x)).
(v) ¬(∃x, A(x)) ⊢ ∀x, ¬A(x).
(vi) ¬(∀x, ¬A(x)) ⊢ ∃x, A(x).
(vii) ¬(∃x, ¬A(x)) ⊢ ∀x, A(x).
(viii) ¬(∀x, A(x)) ⊢ ∃x, ¬A(x).
(ix) ∃x, A(x) ⊣⊢ ¬(∀x, ¬A(x)).
(x) ∃x, ¬A(x) ⊣⊢ ¬(∀x, A(x)).
(xi) ∀x, A(x) ⊣⊢ ¬(∃x, ¬A(x)).
(xii) ∀x, ¬A(x) ⊣⊢ ¬(∃x, A(x)).
(xiii) ⊢ ∀x, (A(x) ⇒ A(x)).

PROOF: To prove part (i): ⊢ (¬(∃x, A(x))) ⇒ (∀x, ¬A(x))

(1) ¬(∃x, A(x))    Hyp ⊣ (1)
(2) A(x̂₀)    Hyp ⊣ (2)
(3) ∃x, A(x)    EI (2) ⊣ (2)
(4) A(x̂₀) ⇒ ∃x, A(x)    CP (2,3) ⊣ ∅
(5) ¬A(x̂₀)    Theorem 4.6.4 (v) (1,4) ⊣ (1)
(6) ∀x, ¬A(x)    UI (5) ⊣ (1)
(7) (¬(∃x, A(x))) ⇒ (∀x, ¬A(x))    CP (1,6) ⊣ ∅

To prove part (ii): ⊢ (¬(∀x, ¬A(x))) ⇒ (∃x, A(x))

(1) (¬(∃x, A(x))) ⇒ (∀x, ¬A(x))    part (i) ⊣ ∅
(2) (¬(∀x, ¬A(x))) ⇒ (∃x, A(x))    Theorem 4.5.7 (xxx) (1) ⊣ ∅

To prove part (iii): ⊢ (¬(∃x, ¬A(x))) ⇒ (∀x, A(x))

(1) ¬(∃x, ¬A(x))    Hyp ⊣ (1)
(2) ¬A(x̂₀)    Hyp ⊣ (2)
(3) ∃x, ¬A(x)    EI (2) ⊣ (2)
(4) ¬A(x̂₀) ⇒ ∃x, ¬A(x)    CP (2,3) ⊣ ∅
(5) A(x̂₀)    Theorem 4.6.4 (vii) (1,4) ⊣ (1)
(6) ∀x, A(x)    UI (5) ⊣ (1)
(7) (¬(∃x, ¬A(x))) ⇒ (∀x, A(x))    CP (1,6) ⊣ ∅

To prove part (iv): ⊢ (¬(∀x, A(x))) ⇒ (∃x, ¬A(x))

(1) (¬(∃x, ¬A(x))) ⇒ (∀x, A(x))    part (iii) ⊣ ∅
(2) (¬(∀x, A(x))) ⇒ (∃x, ¬A(x))    Theorem 4.5.7 (xxx) (1) ⊣ ∅

To prove part (v): ¬(∃x, A(x)) ⊢ ∀x, ¬A(x)

(1) ¬(∃x, A(x))    Hyp ⊣ (1)
(2) (¬(∃x, A(x))) ⇒ (∀x, ¬A(x))    part (i) ⊣ ∅
(3) ∀x, ¬A(x)    MP (2,1) ⊣ (1)

To prove part (vi): ¬(∀x, ¬A(x)) ⊢ ∃x, A(x)

(1) ¬(∀x, ¬A(x))    Hyp ⊣ (1)
(2) (¬(∀x, ¬A(x))) ⇒ (∃x, A(x))    part (ii) ⊣ ∅
(3) ∃x, A(x)    MP (2,1) ⊣ (1)

To prove part (vii): ¬(∃x, ¬A(x)) ⊢ ∀x, A(x)

(1) ¬(∃x, ¬A(x))    Hyp ⊣ (1)
(2) (¬(∃x, ¬A(x))) ⇒ (∀x, A(x))    part (iii) ⊣ ∅
(3) ∀x, A(x)    MP (2,1) ⊣ (1)

To prove part (viii): ¬(∀x, A(x)) ⊢ ∃x, ¬A(x)

(1) ¬(∀x, A(x))    Hyp ⊣ (1)
(2) (¬(∀x, A(x))) ⇒ (∃x, ¬A(x))    part (iv) ⊣ ∅
(3) ∃x, ¬A(x)    MP (2,1) ⊣ (1)

Part (ix) follows from part (vi) and Theorem 6.6.7 (ii).
Part (x) follows from part (viii) and Theorem 6.6.7 (iii).
Part (xi) follows from part (vii) and Theorem 6.6.7 (iv).
Part (xii) follows from part (v) and Theorem 6.6.7 (v).

To prove part (xiii): ⊢ ∀x, (A(x) ⇒ A(x))

(1) A(x̂₀) ⇒ A(x̂₀)    Theorem 4.5.7 (xi) ⊣ ∅
(2) ∀x, (A(x) ⇒ A(x))    UI (1) ⊣ ∅

This completes the proof of Theorem 6.6.10.
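
The equivalences in parts (ix) to (xii) of Theorem 6.6.10 can be spot-checked semantically on small finite universes. The Python sketch below is only a sanity check under the assumption that A is interpreted as a truth-valued function on a finite universe; it is not a substitute for the proofs, and the function names are hypothetical.

```python
from itertools import product

def predicates(universe):
    """Yield every predicate A on the universe as a dict x -> bool."""
    for values in product([False, True], repeat=len(universe)):
        yield dict(zip(universe, values))

def dualities_hold(A, universe):
    ex = any(A[x] for x in universe)          # ∃x, A(x)
    fa = all(A[x] for x in universe)          # ∀x, A(x)
    ex_not = any(not A[x] for x in universe)  # ∃x, ¬A(x)
    fa_not = all(not A[x] for x in universe)  # ∀x, ¬A(x)
    return (ex == (not fa_not)         # part (ix)
            and ex_not == (not fa)     # part (x)
            and fa == (not ex_not)     # part (xi)
            and fa_not == (not ex))    # part (xii)

# Check every predicate on universes with 1, 2, 3 and 4 elements.
results = {n: all(dualities_hold(A, list(range(n)))
                  for A in predicates(list(range(n))))
           for n in range(1, 5)}
```

Each entry of `results` records whether all four equivalences hold for every predicate on a universe of the given size.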


6.6.11 REMARK: Some basic assertions for a single-parameter predicate and a single proposition. 
In Theorem 6.6.12, part (x) is the contrapositive of part (ix), and is therefore essentially equivalent. However,
the method of proof is notably different.


6.6.12 THEOREM [QC]: Basic assertions for one single-parameter predicate and one simple proposition.
The following assertions follow from the predicate calculus in Definition 6.3.9.

(i) (∃x, A(x)) ⇒ B ⊢ ∀x, (A(x) ⇒ B).
(ii) ∀x, (A(x) ⇒ B) ⊢ (∃x, A(x)) ⇒ B.
(iii) (∃x, A(x)) ⇒ B ⊣⊢ ∀x, (A(x) ⇒ B).
(iv) ∃x, ¬A(x) ⊢ ∃x, (A(x) ⇒ B).
(v) ⊢ (∃x, ¬A(x)) ⇒ (∃x, (A(x) ⇒ B)).
(vi) B ⊢ ∀x, (A(x) ⇒ B).
(vii) B ⊢ ∃x, (A(x) ⇒ B).
(viii) ⊢ B ⇒ (∃x, (A(x) ⇒ B)).
(ix) (∀x, A(x)) ⇒ B ⊢ ∃x, (A(x) ⇒ B).
(x) ¬(∃x, (A(x) ⇒ B)) ⊢ ¬((∀x, A(x)) ⇒ B).
(xi) ∃x, (A(x) ⇒ B) ⊢ (∀x, A(x)) ⇒ B.
(xii) (∀x, A(x)) ⇒ B ⊣⊢ ∃x, (A(x) ⇒ B).
(xiii) ⊢ (∃x, (A(x) ⇒ B)) ⇒ ((∀x, A(x)) ⇒ B).
(xiv) ¬((∀x, A(x)) ⇒ B) ⊢ ¬(∃x, (A(x) ⇒ B)).
(xv) ¬(∃x, (A(x) ⇒ B)) ⊣⊢ ¬((∀x, A(x)) ⇒ B).
(xvi) ∀x, (A ⇒ B(x)) ⊢ A ⇒ ∀x, B(x).
(xvii) A ⇒ ∀x, B(x) ⊢ ∀x, (A ⇒ B(x)).
(xviii) ∀x, (A ⇒ B(x)) ⊣⊢ A ⇒ ∀x, B(x).
(xix) ¬A ⊢ ∀x, (A ⇒ B(x)).
(xx) ⊢ ¬A ⇒ ∀x, (A ⇒ B(x)).
(xxi) ¬A ⊢ ∃x, (A ⇒ B(x)).
(xxii) ⊢ ¬A ⇒ ∃x, (A ⇒ B(x)).
(xxiii) ∃x, B(x) ⊢ ∃x, (A ⇒ B(x)).
(xxiv) ⊢ ∃x, B(x) ⇒ ∃x, (A ⇒ B(x)).
(xxv) ∃x, (A ⇒ B(x)) ⊢ A ⇒ ∃x, B(x).
(xxvi) A ⇒ ∃x, B(x) ⊢ ∃x, (A ⇒ B(x)).
(xxvii) ∃x, (A ⇒ B(x)) ⊣⊢ A ⇒ ∃x, B(x).
(xxviii) ¬(∀x, (A(x) ⇒ B)) ⊢ ¬((∃x, A(x)) ⇒ B).
(xxix) ¬((∃x, A(x)) ⇒ B) ⊢ ¬(∀x, (A(x) ⇒ B)).
(xxx) ¬(∀x, (A(x) ⇒ B)) ⊣⊢ ¬((∃x, A(x)) ⇒ B).

PROOF: To prove part (i): (∃x, A(x)) ⇒ B ⊢ ∀x, (A(x) ⇒ B)

(1) (∃x, A(x)) ⇒ B    Hyp ⊣ (1)
(2) A(x̂₀)    Hyp ⊣ (2)
(3) ∃x, A(x)    EI (2) ⊣ (2)
(4) B    MP (1,3) ⊣ (1,2)
(5) A(x̂₀) ⇒ B    CP (2,4) ⊣ (1)
(6) ∀x, (A(x) ⇒ B)    UI (5) ⊣ (1)

To prove part (ii): ∀x, (A(x) ⇒ B) ⊢ (∃x, A(x)) ⇒ B

(1) ∀x, (A(x) ⇒ B)    Hyp ⊣ (1)
(2) ∃x, A(x)    Hyp ⊣ (2)
(3) A(x̂₀)    Hyp ⊣ (3)
(4) A(x̂₀) ⇒ B    UE (1) ⊣ (1)
(5) B    MP (4,3) ⊣ (1,3)
(6) B    EE (2,3,5) ⊣ (1,2)
(7) (∃x, A(x)) ⇒ B    CP (2,6) ⊣ (1)
Part (iii) follows from parts (i) and (ii).

To prove part (iv): ∃x, ¬A(x) ⊢ ∃x, (A(x) ⇒ B)

(1) ∃x, ¬A(x)    Hyp ⊣ (1)
(2) ¬A(x̂₀)    Hyp ⊣ (2)
(3) ¬A(x̂₀) ∨ B    Theorem 4.7.9 (xiii) (2) ⊣ (2)
(4) A(x̂₀) ⇒ B    Theorem 4.7.9 (lxi) (3) ⊣ (2)
(5) ∃x, (A(x) ⇒ B)    EI (4) ⊣ (2)
(6) ∃x, (A(x) ⇒ B)    EE (1,2,5) ⊣ (1)

To prove part (v): ⊢ (∃x, ¬A(x)) ⇒ (∃x, (A(x) ⇒ B))

(1) ∃x, ¬A(x)    Hyp ⊣ (1)
(2) ∃x, (A(x) ⇒ B)    part (iv) (1) ⊣ (1)
(3) (∃x, ¬A(x)) ⇒ (∃x, (A(x) ⇒ B))    CP (1,2) ⊣ ∅

To prove part (vi): B ⊢ ∀x, (A(x) ⇒ B)

(1) B    Hyp ⊣ (1)
(2) A(x̂₀) ⇒ B    Theorem 4.5.7 (i) (1) ⊣ (1)
(3) ∀x, (A(x) ⇒ B)    UI (2) ⊣ (1)

To prove part (vii): B ⊢ ∃x, (A(x) ⇒ B)

(1) B    Hyp ⊣ (1)
(2) A(x̂₀) ⇒ B    Theorem 4.5.7 (i) (1) ⊣ (1)
(3) ∃x, (A(x) ⇒ B)    EI (2) ⊣ (1)

To prove part (viii): ⊢ B ⇒ (∃x, (A(x) ⇒ B))

(1) B    Hyp ⊣ (1)
(2) ∃x, (A(x) ⇒ B)    part (vii) (1) ⊣ (1)
(3) B ⇒ (∃x, (A(x) ⇒ B))    CP (1,2) ⊣ ∅

To prove part (ix): (∀x, A(x)) ⇒ B ⊢ ∃x, (A(x) ⇒ B)

(1) (∀x, A(x)) ⇒ B    Hyp ⊣ (1)
(2) (¬(∀x, A(x))) ∨ B    Theorem 4.7.9 (lx) (1) ⊣ (1)
(3) ¬(∀x, A(x))    Hyp ⊣ (3)
(4) ∃x, ¬A(x)    Theorem 6.6.10 (viii) (3) ⊣ (3)
(5) ∃x, (A(x) ⇒ B)    part (iv) (4) ⊣ (3)
(6) (¬(∀x, A(x))) ⇒ (∃x, (A(x) ⇒ B))    CP (3,5) ⊣ ∅
(7) B ⇒ (∃x, (A(x) ⇒ B))    part (viii) ⊣ ∅
(8) ∃x, (A(x) ⇒ B)    Theorem 4.7.9 (xl) (2,6,7) ⊣ (1)

To prove part (x): ¬(∃x, (A(x) ⇒ B)) ⊢ ¬((∀x, A(x)) ⇒ B)

(1) ¬(∃x, (A(x) ⇒ B))    Hyp ⊣ (1)
(2) ∀x, ¬(A(x) ⇒ B)    Theorem 6.6.10 (v) (1) ⊣ (1)
(3) ¬(A(x̂₀) ⇒ B)    UE (2) ⊣ (1)
(4) A(x̂₀)    Theorem 4.5.7 (xlii) (3) ⊣ (1)
(5) ∀x, A(x)    UI (4) ⊣ (1)
(6) ¬B    Theorem 4.5.7 (xliii) (3) ⊣ (1)
(7) ¬((∀x, A(x)) ⇒ B)    Theorem 4.5.7 (xlv) (5,6) ⊣ (1)

To prove part (xi): ∃x, (A(x) ⇒ B) ⊢ (∀x, A(x)) ⇒ B

(1) ∃x, (A(x) ⇒ B)    Hyp ⊣ (1)
(2) A(x̂₀) ⇒ B    Hyp ⊣ (2)
(3) ∀x, A(x)    Hyp ⊣ (3)
(4) A(x̂₀)    UE (3) ⊣ (3)
(5) B    MP (2,4) ⊣ (2,3)
(6) B    EE (1,2,5) ⊣ (1,3)
(7) (∀x, A(x)) ⇒ B    CP (3,6) ⊣ (1)

Part (xii) follows from parts (ix) and (xi).

To prove part (xiii): ⊢ (∃x, (A(x) ⇒ B)) ⇒ ((∀x, A(x)) ⇒ B)

(1) ∃x, (A(x) ⇒ B)    Hyp ⊣ (1)
(2) (∀x, A(x)) ⇒ B    part (xi) (1) ⊣ (1)
(3) (∃x, (A(x) ⇒ B)) ⇒ ((∀x, A(x)) ⇒ B)    CP (1,2) ⊣ ∅

To prove part (xiv): ¬((∀x, A(x)) ⇒ B) ⊢ ¬(∃x, (A(x) ⇒ B))

(1) ¬((∀x, A(x)) ⇒ B)    Hyp ⊣ (1)
(2) (∃x, (A(x) ⇒ B)) ⇒ ((∀x, A(x)) ⇒ B)    part (xiii) ⊣ ∅
(3) ¬(∃x, (A(x) ⇒ B))    Theorem 4.6.4 (v) (1,2) ⊣ (1)

Part (xv) follows from parts (x) and (xiv).

To prove part (xvi): ∀x, (A ⇒ B(x)) ⊢ A ⇒ ∀x, B(x)

(1) ∀x, (A ⇒ B(x))    Hyp ⊣ (1)
(2) A ⇒ B(x̂₀)    UE (1) ⊣ (1)
(3) A    Hyp ⊣ (3)
(4) B(x̂₀)    MP (2,3) ⊣ (1,3)
(5) ∀x, B(x)    UI (4) ⊣ (1,3)
(6) A ⇒ ∀x, B(x)    CP (3,5) ⊣ (1)

To prove part (xvii): A ⇒ ∀x, B(x) ⊢ ∀x, (A ⇒ B(x))

(1) A ⇒ ∀x, B(x)    Hyp ⊣ (1)
(2) A    Hyp ⊣ (2)
(3) ∀x, B(x)    MP (1,2) ⊣ (1,2)
(4) B(x̂₀)    UE (3) ⊣ (1,2)
(5) A ⇒ B(x̂₀)    CP (2,4) ⊣ (1)
(6) ∀x, (A ⇒ B(x))    UI (5) ⊣ (1)

Part (xviii) follows from parts (xvi) and (xvii).

To prove part (xix): ¬A ⊢ ∀x, (A ⇒ B(x))

(1) ¬A    Hyp ⊣ (1)
(2) A ⇒ B(x̂₀)    Theorem 4.5.7 (xl) (1) ⊣ (1)
(3) ∀x, (A ⇒ B(x))    UI (2) ⊣ (1)
To prove part (xx): ⊢ ¬A ⇒ ∀x, (A ⇒ B(x))

(1) ¬A    Hyp ⊣ (1)
(2) ∀x, (A ⇒ B(x))    part (xix) (1) ⊣ (1)
(3) ¬A ⇒ ∀x, (A ⇒ B(x))    CP (1,2) ⊣ ∅

To prove part (xxi): ¬A ⊢ ∃x, (A ⇒ B(x))

(1) ¬A    Hyp ⊣ (1)
(2) A ⇒ B(x̂₀)    Theorem 4.5.7 (xl) (1) ⊣ (1)
(3) ∃x, (A ⇒ B(x))    EI (2) ⊣ (1)

To prove part (xxii): ⊢ ¬A ⇒ ∃x, (A ⇒ B(x))

(1) ¬A    Hyp ⊣ (1)
(2) ∃x, (A ⇒ B(x))    part (xxi) (1) ⊣ (1)
(3) ¬A ⇒ ∃x, (A ⇒ B(x))    CP (1,2) ⊣ ∅

To prove part (xxiii): ∃x, B(x) ⊢ ∃x, (A ⇒ B(x))

(1) ∃x, B(x)    Hyp ⊣ (1)
(2) B(x̂₀)    Hyp ⊣ (2)
(3) A ⇒ B(x̂₀)    Theorem 4.5.7 (i) (2) ⊣ (2)
(4) ∃x, (A ⇒ B(x))    EI (3) ⊣ (2)
(5) ∃x, (A ⇒ B(x))    EE (1,2,4) ⊣ (1)

To prove part (xxiv): ⊢ ∃x, B(x) ⇒ ∃x, (A ⇒ B(x))

(1) ∃x, B(x)    Hyp ⊣ (1)
(2) ∃x, (A ⇒ B(x))    part (xxiii) (1) ⊣ (1)
(3) ∃x, B(x) ⇒ ∃x, (A ⇒ B(x))    CP (1,2) ⊣ ∅

To prove part (xxv): ∃x, (A ⇒ B(x)) ⊢ A ⇒ ∃x, B(x)

(1) ∃x, (A ⇒ B(x))    Hyp ⊣ (1)
(2) A ⇒ B(x̂₀)    Hyp ⊣ (2)
(3) A    Hyp ⊣ (3)
(4) B(x̂₀)    MP (2,3) ⊣ (2,3)
(5) ∃x, B(x)    EI (4) ⊣ (2,3)
(6) ∃x, B(x)    EE (1,2,5) ⊣ (1,3)
(7) A ⇒ ∃x, B(x)    CP (3,6) ⊣ (1)

To prove part (xxvi): A ⇒ ∃x, B(x) ⊢ ∃x, (A ⇒ B(x))    [first proof]

(1) A ⇒ ∃x, B(x)    Hyp ⊣ (1)
(2) A    Hyp ⊣ (2)
(3) ∃x, B(x)    MP (1,2) ⊣ (1,2)
(4) B(x̂₀)    Hyp ⊣ (4)
(5) A ⇒ B(x̂₀)    Theorem 4.5.7 (i) (4) ⊣ (4)
(6) ∃x, (A ⇒ B(x))    EI (5) ⊣ (4)
(7) ∃x, (A ⇒ B(x))    EE (3,4,6) ⊣ (1,2)
(8) A ⇒ ∃x, (A ⇒ B(x))    CP (2,7) ⊣ (1)
(9) ¬A ⇒ ∃x, (A ⇒ B(x))    part (xxii) ⊣ ∅
(10) ∃x, (A ⇒ B(x))    Theorem 4.6.4 (iii) (8,9) ⊣ (1)

To prove part (xxvi): A ⇒ ∃x, B(x) ⊢ ∃x, (A ⇒ B(x))    [second proof]

(1) A ⇒ ∃x, B(x)    Hyp ⊣ (1)
(2) ¬A ∨ ∃x, B(x)    Theorem 4.7.9 (lx) (1) ⊣ (1)
(3) ¬A ⇒ ∃x, (A ⇒ B(x))    part (xxii) ⊣ ∅
(4) ∃x, B(x) ⇒ ∃x, (A ⇒ B(x))    part (xxiv) ⊣ ∅
(5) (¬A ∨ ∃x, B(x)) ⇒ ∃x, (A ⇒ B(x))    Theorem 4.7.9 (xli) (3,4) ⊣ ∅
(6) ∃x, (A ⇒ B(x))    MP (5,2) ⊣ (1)

Part (xxvii) follows from parts (xxv) and (xxvi).


To prove part (xxviii): ¬(∀x, (A(x) ⇒ B)) ⊢ ¬((∃x, A(x)) ⇒ B)

(1) ¬(∀x, (A(x) ⇒ B))    Hyp ⊣ (1)
(2) (∃x, A(x)) ⇒ B    Hyp ⊣ (2)
(3) ∀x, (A(x) ⇒ B)    part (i) (2) ⊣ (2)
(4) ((∃x, A(x)) ⇒ B) ⇒ (∀x, (A(x) ⇒ B))    CP (2,3) ⊣ ∅
(5) ¬((∃x, A(x)) ⇒ B)    Theorem 4.5.7 (xxxii) (4,1) ⊣ (1)

To prove part (xxix): ¬((∃x, A(x)) ⇒ B) ⊢ ¬(∀x, (A(x) ⇒ B))

(1) ¬((∃x, A(x)) ⇒ B)    Hyp ⊣ (1)
(2) ∀x, (A(x) ⇒ B)    Hyp ⊣ (2)
(3) (∃x, A(x)) ⇒ B    part (ii) (2) ⊣ (2)
(4) (∀x, (A(x) ⇒ B)) ⇒ ((∃x, A(x)) ⇒ B)    CP (2,3) ⊣ ∅
(5) ¬(∀x, (A(x) ⇒ B))    Theorem 4.5.7 (xxxii) (4,1) ⊣ (1)

Part (xxx) follows from parts (xxviii) and (xxix).


6.6.13 REMARK: A difficult predicate calculus assertion proof. 

The proof of Theorem 6.6.12 part (xxvi) is slightly more complex than the proofs of most other parts. The
first proof given here almost fails at line (7) because of the residual dependency on the proposition A from
line (2), from which “∃x, B(x)” was obtained in line (3), from which “∃x, (A ⇒ B(x))” was then obtained
in line (7). The situation is saved by introducing an initial (apparently superfluous) condition A at line (8)
and then removing it by noting that the desired conclusion also follows from ¬A. So it must be true!


Margaris [369], pages 93-94, gives two proofs of an assertion which is slightly stronger than part (xxvi), 
suggesting that he may have encountered similar difficulties proving this assertion. 
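
The case split on A and ¬A in the first proof can be mirrored by a truth-table check. The Python sketch below assumes a finite universe, with B given as a tuple of truth values (one per element); the function names are hypothetical, and the check is a semantic illustration rather than a derivation within the calculus.

```python
from itertools import product

def part_xxvi_holds(A, B_values):
    """Check that "A ⇒ ∃x, B(x)" implies "∃x, (A ⇒ B(x))" for one interpretation.

    A is a bool; B_values lists the truth value of B(x) for each x in U.
    """
    hypothesis = (not A) or any(B_values)
    conclusion = any((not A) or b for b in B_values)
    return (not hypothesis) or conclusion

def valid_on_universe_of_size(n):
    # Try both truth values of A and every predicate B on n elements.
    return all(part_xxvi_holds(A, B_values)
               for A in (False, True)
               for B_values in product([False, True], repeat=n))

nonempty_sizes_valid = [valid_on_universe_of_size(n) for n in (1, 2, 3)]
empty_universe_valid = valid_on_universe_of_size(0)
```

In this model the assertion holds on every non-empty universe tested, but fails on the empty universe when A is false, because the ¬A case still needs some element to act as a witness for the existential quantifier.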


6.6.14 REMARK: Completeness of a logical calculus. 

When a proof runs into difficulties (as suggested by the minor difficulties described in Remark 6.6.13), and 
it is “known” that the desired conclusion of the proof is valid, one does very reasonably ask whether a 
particular logical calculus has the ability to prove all true theorems within the given logical system. This is 
the “completeness” question. This metamathematical question is outside the scope of this book, but it is 
thoroughly and very adequately investigated in the mathematical logic literature. 


The possible existence of “dark theorems” which are true but not provable is not a purely philosophical or
psychological concern. In practical mathematics research, it is very reasonable to ask after an open problem 
has been researched intensively for a hundred or more years whether the problem is a “dark problem” which 
no amount of analysis can resolve. If the problem can be correctly expressed within a logical system which 
is provably complete, then one can have confidence that further analysis will some day achieve a result one 
way or the other. And if one can prove somehow that no proof within the system is possible, one can state 
with some confidence that the desired theorem is false. But within an incomplete system, it would not be 
clear whether the expenditure of further effort is justified after a hundred or more years of intense research 
has achieved no result. 


In this book, it is generally assumed that there is a fixed underlying logical system (or “model”), and that 
the logical language or calculus is merely a convenient means for discovering theorems for that logical system. 
If the calculus is not capable of proving all theorems which are true within the underlying system, this raises
the possibility that other systems might satisfy all of the provable theorems of the calculus, while differing from
the chosen system only in those theorems which are not provable. This kind of speculation leads down the
path to “model theory”, where one fixes the axioms and rules of the language and investigates the possible
systems which satisfy the language. In model theory, the language is fixed and the underlying model is 
variable. But if the model is fixed, and the language is regarded as a mere tool for studying it, then model 
theory loses much of its allure. Model theory is less interesting if the model is a single fixed model.

Figure 6.6.1 illustrates one way in which a logical language could be used either with only one interpretation
or with many interpretations. In the case of the transmitting (TX) entity, there is only one model, which is 
axiomatised for communicating ideas efficiently. The receiving (RX) entity only has access to the language 
and therefore cannot be certain about the model from which the TX entity created the language. So for the 
RX entity, there is ambiguity in the interpretation, while for the TX entity there is no such ambiguity. 


Figure 6.6.1 Possible origin of single-model and multi-model logical languages
[Diagram labels: TX model, axiomatisation, semantics, logical language, communication, logical language, interpretation, semantics, RX models]


6.6.15 REMARK: Some basic assertions for the conjunction and disjunction operators. 

The assertions of Theorem 6.6.12 for a single-parameter predicate and a single proposition all concern the 
implication and negation logical operators. (Some of the proofs for Theorem 6.6.12 did admittedly make 
use of the conjunction and disjunction operators for convenience, but these operators did not appear in the 
assertions of the theorem themselves.) Theorem 6.6.16 concerns the conjunction and disjunction operators. 


Theorem 6.6.16 parts (ii) and (iii) are not valid if the universe is empty. Therefore they cannot be extended
to obtain the corresponding set-theoretic theorems of the form ∀x ∈ X, (A(x) ∧ B) ⊢ (∀x ∈ X, A(x)) ∧ B
and (∀x ∈ X, A(x)) ∧ B ⊣⊢ ∀x ∈ X, (A(x) ∧ B) unless it is known that X ≠ ∅. The same comment
applies to parts (vii) and (viii). (The empty universe issue is discussed in Remark 6.3.4. It is rule EI in
Definition 6.3.9 which guarantees a non-empty universe.)
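
The failure of the set-theoretic form of part (ii) for an empty set can be made concrete. The Python sketch below assumes that X is a finite set, that A is a truth-valued function and that B is a fixed truth value; the names `forall_in` and `part_ii_holds` and the sample predicate are hypothetical, chosen only for this illustration.

```python
def forall_in(X, pred):
    """Bounded universal quantifier "∀x ∈ X, pred(x)" (vacuously true for empty X)."""
    return all(pred(x) for x in X)

def part_ii_holds(X, A, B):
    # Semantic reading of "∀x ∈ X, (A(x) ∧ B) ⊢ (∀x ∈ X, A(x)) ∧ B".
    hypothesis = forall_in(X, lambda x: A(x) and B)
    conclusion = forall_in(X, A) and B
    return (not hypothesis) or conclusion

A = lambda x: x > 0  # sample predicate, assumed for the example

# With X empty and B false, the hypothesis is vacuously true but the conclusion is false.
empty_fails = not part_ii_holds(set(), A, False)

# With non-empty X, the assertion holds for these samples.
nonempty_ok = all(part_ii_holds(X, A, B)
                  for X in ({1}, {0, 1}, {1, 2})
                  for B in (False, True))
```

The example covers only part (ii); it is the vacuous truth of the bounded universal quantifier over an empty set which breaks the inference.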


6.6.16 THEOREM [QC]: Some basic assertions for quantifiers and conjunction and disjunction operators.
The following assertions follow from the predicate calculus in Definition 6.3.9.

(i) (∀x, A(x)) ∧ B ⊢ ∀x, (A(x) ∧ B).
(ii) ∀x, (A(x) ∧ B) ⊢ (∀x, A(x)) ∧ B.
(iii) (∀x, A(x)) ∧ B ⊣⊢ ∀x, (A(x) ∧ B).
(iv) (∃x, A(x)) ∧ B ⊢ ∃x, (A(x) ∧ B).
(v) ∃x, (A(x) ∧ B) ⊢ (∃x, A(x)) ∧ B.
(vi) (∃x, A(x)) ∧ B ⊣⊢ ∃x, (A(x) ∧ B).
(vii) (∀x, A(x)) ∨ B ⊢ ∀x, (A(x) ∨ B).
(viii) ∀x, (A(x) ∨ B) ⊢ (∀x, A(x)) ∨ B.
(ix) (∀x, A(x)) ∨ B ⊣⊢ ∀x, (A(x) ∨ B).

PROOF: To prove part (i): (∀x, A(x)) ∧ B ⊢ ∀x, (A(x) ∧ B)

(1) (∀x, A(x)) ∧ B    Hyp ⊣ (1)
(2) ¬((∀x, A(x)) ⇒ ¬B)    Definition 4.7.2 (ii) (1) ⊣ (1)
(3) ¬(∃x, (A(x) ⇒ ¬B))    Theorem 6.6.12 (xiv) (2) ⊣ (1)
(4) ∀x, ¬(A(x) ⇒ ¬B)    Theorem 6.6.10 (v) (3) ⊣ (1)
(5) ∀x, (A(x) ∧ B)    Definition 4.7.2 (ii) (4) ⊣ (1)

To prove part (ii): ∀x, (A(x) ∧ B) ⊢ (∀x, A(x)) ∧ B

(1) ∀x, (A(x) ∧ B)    Hyp ⊣ (1)
(2) ∀x, ¬(A(x) ⇒ ¬B)    Definition 4.7.2 (ii) (1) ⊣ (1)
(3) ¬(∃x, (A(x) ⇒ ¬B))    Theorem 6.6.7 (v) (2) ⊣ (1)
(4) ¬((∀x, A(x)) ⇒ ¬B)    Theorem 6.6.12 (x) (3) ⊣ (1)
(5) (∀x, A(x)) ∧ B    Definition 4.7.2 (ii) (4) ⊣ (1)

Part (iii) follows from parts (i) and (ii).

To prove part (iv): (∃x, A(x)) ∧ B ⊢ ∃x, (A(x) ∧ B)    [first proof]

(1) (∃x, A(x)) ∧ B    Hyp ⊣ (1)
(2) ¬((∃x, A(x)) ⇒ ¬B)    Definition 4.7.2 (ii) (1) ⊣ (1)
(3) ¬(∀x, (A(x) ⇒ ¬B))    Theorem 6.6.12 (xxix) (2) ⊣ (1)
(4) ∃x, ¬(A(x) ⇒ ¬B)    Theorem 6.6.10 (viii) (3) ⊣ (1)
(5) ∃x, (A(x) ∧ B)    Definition 4.7.2 (ii) (4) ⊣ (1)

To prove part (iv): (∃x, A(x)) ∧ B ⊢ ∃x, (A(x) ∧ B)    [second proof]

(1) (∃x, A(x)) ∧ B    Hyp ⊣ (1)
(2) ∃x, A(x)    Theorem 4.7.9 (xi) (1) ⊣ (1)
(3) B    Theorem 4.7.9 (xii) (1) ⊣ (1)
(4) A(x̂₀)    Hyp ⊣ (4)
(5) A(x̂₀) ∧ B    Theorem 4.7.9 (i) (4,3) ⊣ (1,4)
(6) ∃x, (A(x) ∧ B)    EI (5) ⊣ (1,4)
(7) ∃x, (A(x) ∧ B)    EE (2,4,6) ⊣ (1)

To prove part (v): ∃x, (A(x) ∧ B) ⊢ (∃x, A(x)) ∧ B

(1) ∃x, (A(x) ∧ B)    Hyp ⊣ (1)
(2) ∃x, ¬(A(x) ⇒ ¬B)    Definition 4.7.2 (ii) (1) ⊣ (1)
(3) ¬(∀x, (A(x) ⇒ ¬B))    Theorem 6.6.7 (iii) (2) ⊣ (1)
(4) ¬((∃x, A(x)) ⇒ ¬B)    Theorem 6.6.12 (xxviii) (3) ⊣ (1)
(5) (∃x, A(x)) ∧ B    Definition 4.7.2 (ii) (4) ⊣ (1)

Part (vi) follows from parts (iv) and (v).

To prove part (vii): (∀x, A(x)) ∨ B ⊢ ∀x, (A(x) ∨ B)

(1) (∀x, A(x)) ∨ B    Hyp ⊣ (1)
(2) ¬B ⇒ ∀x, A(x)    Theorem 4.7.9 (lxiii) (1) ⊣ (1)
(3) ∀x, (¬B ⇒ A(x))    Theorem 6.6.12 (xvii) (2) ⊣ (1)
(4) ¬B ⇒ A(x̂₀)    UE (3) ⊣ (1)
(5) A(x̂₀) ∨ B    Theorem 4.7.9 (lxiv) (4) ⊣ (1)
(6) ∀x, (A(x) ∨ B)    UI (5) ⊣ (1)

To prove part (viii): ∀x, (A(x) ∨ B) ⊢ (∀x, A(x)) ∨ B

(1) ∀x, (A(x) ∨ B)    Hyp ⊣ (1)
(2) A(x̂₀) ∨ B    UE (1) ⊣ (1)
(3) ¬B ⇒ A(x̂₀)    Theorem 4.7.9 (lxiii) (2) ⊣ (1)
(4) ∀x, (¬B ⇒ A(x))    UI (3) ⊣ (1)
(5) ¬B ⇒ ∀x, A(x)    Theorem 6.6.12 (xvi) (4) ⊣ (1)
(6) (∀x, A(x)) ∨ B    Theorem 4.7.9 (lxiv) (5) ⊣ (1)

Part (ix) follows from parts (vii) and (viii).


6.6.17 REMARK: Importation of two-proposition theorems into predicate calculus. 
All of the propositional calculus theorems which are valid for two logical expressions can be immediately 
imported into predicate calculus by making both logical expressions depend on the same variable object. 


Theorem 6.6.18 gives a sample of such imported assertions. Such assertions could
obviously be generated automatically.


6.6.18 THEOREM [QC]: Some assertions which universalise some propositional calculus assertions.
The following assertions follow from the predicate calculus in Definition 6.3.9.

(i) ∀x, (A(x) ⇒ B(x)) ⊢ ∀x, (¬A(x) ∨ B(x)).
(ii) ∀x, (¬A(x) ∨ B(x)) ⊢ ∀x, (A(x) ⇒ B(x)).
(iii) ∀x, (A(x) ⇒ B(x)) ⊣⊢ ∀x, (¬A(x) ∨ B(x)).
(iv) ∀x, B(x) ⊢ ∀x, (A(x) ⇒ B(x)).
(v) ∀x, ¬A(x) ⊢ ∀x, (A(x) ⇒ B(x)).

PROOF: To prove part (i): ∀x, (A(x) ⇒ B(x)) ⊢ ∀x, (¬A(x) ∨ B(x))

(1) ∀x, (A(x) ⇒ B(x))    Hyp ⊣ (1)
(2) A(x̂₀) ⇒ B(x̂₀)    UE (1) ⊣ (1)
(3) ¬A(x̂₀) ∨ B(x̂₀)    Theorem 4.7.9 (lx) (2) ⊣ (1)
(4) ∀x, (¬A(x) ∨ B(x))    UI (3) ⊣ (1)

To prove part (ii): ∀x, (¬A(x) ∨ B(x)) ⊢ ∀x, (A(x) ⇒ B(x))

(1) ∀x, (¬A(x) ∨ B(x))    Hyp ⊣ (1)
(2) ¬A(x̂₀) ∨ B(x̂₀)    UE (1) ⊣ (1)
(3) A(x̂₀) ⇒ B(x̂₀)    Theorem 4.7.9 (lxi) (2) ⊣ (1)
(4) ∀x, (A(x) ⇒ B(x))    UI (3) ⊣ (1)

Part (iii) follows from parts (i) and (ii).

To prove part (iv): ∀x, B(x) ⊢ ∀x, (A(x) ⇒ B(x))

(1) ∀x, B(x)    Hyp ⊣ (1)
(2) B(x̂₀)    UE (1) ⊣ (1)
(3) A(x̂₀) ⇒ B(x̂₀)    Theorem 4.5.7 (i) (2) ⊣ (1)
(4) ∀x, (A(x) ⇒ B(x))    UI (3) ⊣ (1)

To prove part (v): ∀x, ¬A(x) ⊢ ∀x, (A(x) ⇒ B(x))

(1) ∀x, ¬A(x)    Hyp ⊣ (1)
(2) ¬A(x̂₀)    UE (1) ⊣ (1)
(3) A(x̂₀) ⇒ B(x̂₀)    Theorem 4.5.7 (xl) (2) ⊣ (1)
(4) ∀x, (A(x) ⇒ B(x))    UI (3) ⊣ (1)

This completes the proof of Theorem 6.6.18.


6.6.19 REMARK: Universal implies existential if the universe is non-empty.
Theorem 6.6.20 may be applied to sets to assert that ∀x ∈ X, B(x) implies ∃x ∈ X, B(x) if it is assumed
that X ≠ ∅. In this application, A(x) = “x ∈ X”. (See Theorem 7.6.7.)

Unfortunately commas cannot be used to separate the assumptions in sequents for predicate calculus because
of the resulting ambiguity. So a semi-colon is used for Theorem 6.6.20 (i) instead. Alternatively the two
assumptions could be enclosed in parentheses and joined by a conjunction, but this would require extra lines
in the proof.


6.6.20 THEOREM [QC]: Universal implication implies existential conjunction.
The following assertion follows from the predicate calculus in Definition 6.3.9.

(i) ∃x, A(x); ∀x, (A(x) ⇒ B(x)) ⊢ ∃x, (A(x) ∧ B(x)).

PROOF: To prove part (i): ∃x, A(x); ∀x, (A(x) ⇒ B(x)) ⊢ ∃x, (A(x) ∧ B(x))

(1) ∃x, A(x)    Hyp ⊣ (1)
(2) ∀x, (A(x) ⇒ B(x))    Hyp ⊣ (2)
(3) A(x̂₀)    Hyp ⊣ (3)
(4) A(x̂₀) ⇒ B(x̂₀)    UE (2) ⊣ (2)
(5) B(x̂₀)    MP (4,3) ⊣ (2,3)
(6) A(x̂₀) ∧ B(x̂₀)    Theorem 4.7.9 (i) (3,5) ⊣ (2,3)
(7) ∃x, (A(x) ∧ B(x))    EI (6) ⊣ (2,3)
(8) ∃x, (A(x) ∧ B(x))    EE (1,3,7) ⊣ (1,2)

This completes the proof of Theorem 6.6.20.
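
As a semantic cross-check of Theorem 6.6.20, the following Python sketch enumerates all pairs of predicates on small finite universes. It assumes that A and B are represented as tuples of truth values indexed by the elements of the universe; the function names are hypothetical, and the enumeration is a sanity check rather than a proof.

```python
from itertools import product

def theorem_6_6_20_holds(A, B):
    """A and B are tuples of truth values indexed by the elements of U."""
    some_A = any(A)                                            # ∃x, A(x)
    all_A_implies_B = all((not a) or b for a, b in zip(A, B))  # ∀x, (A(x) ⇒ B(x))
    some_A_and_B = any(a and b for a, b in zip(A, B))          # ∃x, (A(x) ∧ B(x))
    return (not (some_A and all_A_implies_B)) or some_A_and_B

def valid_for_size(n):
    # Try every pair of predicates on a universe with n elements.
    values = list(product([False, True], repeat=n))
    return all(theorem_6_6_20_holds(A, B) for A in values for B in values)

checked = {n: valid_for_size(n) for n in range(0, 4)}
```

The hypothesis “∃x, A(x)” is what supplies the witness here, so the check succeeds even for a universe of size zero, where that hypothesis can never be satisfied.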


6.6.21 REMARK: Repeated use of the same free variable. 

There is a subtlety in lines (5) and (6) of the proof of part (ii) of Theorem 6.6.22. Line (5) uses the same
free variable x̂₀ as line (4). This is not an immediate problem. The semantics is clear. Each line is a
distinct sequent whose free variables are universalised to produce an interpretation. Lines (4) and (5) are
correctly inferred by universal elimination from lines (2) and (3) respectively. Universal elimination replaces
an explicit universal quantifier with an implicit universal quantifier. The possible difficulty arises here in
line (6). When this line is interpreted by universalising the single free variable x̂₀, the inference of line (6)
from lines (4) and (5) is equivalent to the proposition to be proved, namely part (ii) of the theorem.


More important than the validity of the inference is whether it obeys the stated rules. The application of the
rules must not rely upon intuition or guesswork. If an invalid form of argument is accepted because it gives
a valid result, it may later be applied to “prove” an invalid result. The inference of line (6) from lines (4)
and (5) is justified by Theorem 4.7.9 (i), which is based upon the three axiom templates in Definition 6.3.9 (vii)
and the modus ponens rule Definition 6.3.9 (MP). These axioms and the MP rule explicitly permit arbitrary
soft predicates with any numbers of parameters to be substituted for the wff-names. So the rules are in fact
followed correctly here.


6.6.22 THEOREM [QC]: Some assertions which split universal conjunctions into conjunctions of universals.
The following assertions follow from the predicate calculus in Definition 6.3.9.

(i) ∀x, (A(x) ∧ B(x)) ⊢ (∀x, A(x)) ∧ (∀x, B(x)).
(ii) (∀x, A(x)) ∧ (∀x, B(x)) ⊢ ∀x, (A(x) ∧ B(x)).
(iii) ∀x, (A(x) ∧ B(x)) ⊣⊢ (∀x, A(x)) ∧ (∀x, B(x)).

PROOF: To prove part (i): ∀x, (A(x) ∧ B(x)) ⊢ (∀x, A(x)) ∧ (∀x, B(x))

(1) ∀x, (A(x) ∧ B(x))    Hyp ⊣ (1)
(2) A(x̂₀) ∧ B(x̂₀)    UE (1) ⊣ (1)
(3) A(x̂₀)    Theorem 4.7.9 (xi) (2) ⊣ (1)
(4) ∀x, A(x)    UI (3) ⊣ (1)
(5) B(x̂₀)    Theorem 4.7.9 (xii) (2) ⊣ (1)
(6) ∀x, B(x)    UI (5) ⊣ (1)
(7) (∀x, A(x)) ∧ (∀x, B(x))    Theorem 4.7.9 (i) (4,6) ⊣ (1)

To prove part (ii): (∀x, A(x)) ∧ (∀x, B(x)) ⊢ ∀x, (A(x) ∧ B(x))

(1) (∀x, A(x)) ∧ (∀x, B(x))    Hyp ⊣ (1)
(2) ∀x, A(x)    Theorem 4.7.9 (xi) (1) ⊣ (1)
(3) ∀x, B(x)    Theorem 4.7.9 (xii) (1) ⊣ (1)
(4) A(x̂₀)    UE (2) ⊣ (1)
(5) B(x̂₀)    UE (3) ⊣ (1)
(6) A(x̂₀) ∧ B(x̂₀)    Theorem 4.7.9 (i) (4,5) ⊣ (1)
(7) ∀x, (A(x) ∧ B(x))    UI (6) ⊣ (1)


Part (iii) follows from parts (i) and (ii).


6.6.23 REMARK: A theorem for two-parameter predicates.
Theorem 6.6.24 gives some basic quantifier-swapping properties for two-parameter predicates. Part (iii) is
frequently used in mathematical analysis. The converse of part (iii) is not valid, but this false converse
provides a useful “stress test” of a predicate calculus, as discussed in Remark 6.6.26.


6.6.24 THEOREM [QC]: Some propositions for two-parameter predicates.
The following assertions follow from the predicate calculus in Definition 6.3.9.

(i) ∀x, ∀y, A(x, y) ⊢ ∀y, ∀x, A(x, y).
(ii) ∃x, ∃y, A(x, y) ⊢ ∃y, ∃x, A(x, y).
(iii) ∃x, ∀y, A(x, y) ⊢ ∀y, ∃x, A(x, y).

PROOF: To prove part (i): ∀x, ∀y, A(x, y) ⊢ ∀y, ∀x, A(x, y)

(1) ∀x, ∀y, A(x, y)    Hyp ⊣ (1)
(2) ∀y, A(x̂₀, y)    UE (1) ⊣ (1)
(3) A(x̂₀, ŷ₀)    UE (2) ⊣ (1)
(4) ∀x, A(x, ŷ₀)    UI (3) ⊣ (1)
(5) ∀y, ∀x, A(x, y)    UI (4) ⊣ (1)

To prove part (ii): ∃x, ∃y, A(x, y) ⊢ ∃y, ∃x, A(x, y)

(1) ∃x, ∃y, A(x, y)    Hyp ⊣ (1)
(2) ∃y, A(x̂₀, y)    Hyp ⊣ (2)
(3) A(x̂₀, ŷ₀)    Hyp ⊣ (3)
(4) ∃x, A(x, ŷ₀)    EI (3) ⊣ (3)
(5) ∃y, ∃x, A(x, y)    EI (4) ⊣ (3)
(6) ∃y, ∃x, A(x, y)    EE (2,3,5) ⊣ (2)
(7) ∃y, ∃x, A(x, y)    EE (1,2,6) ⊣ (1)

To prove part (iii): ∃x, ∀y, A(x, y) ⊢ ∀y, ∃x, A(x, y)

(1) ∃x, ∀y, A(x, y)    Hyp ⊣ (1)
(2) ∀y, A(x̂₀, y)    Hyp ⊣ (2)
(3) A(x̂₀, ŷ₀)    UE (2) ⊣ (2)
(4) ∃x, A(x, ŷ₀)    EI (3) ⊣ (2)
(5) ∃x, A(x, ŷ₀)    EE (1,2,4) ⊣ (1)
(6) ∀y, ∃x, A(x, y)    UI (5) ⊣ (1)

This completes the proof of Theorem 6.6.24.


6.6.25 REMARK: An erroneous application of the EE rule to prove a correct theorem.


The following “proof” of Theorem 6.6.24 (ii) appears to be quite plausible if one does not look too closely.
The theorem is correct, but the proof is incorrect.

(1) ∃x, ∃y, A(x, y)    Hyp ⊣ (1)
(2) ∃y, A(x̂₀, y)    Hyp ⊣ (2)
(3) A(x̂₀, ŷ₀)    Hyp ⊣ (3)
(4) ∃x, A(x, ŷ₀)    EI (3) ⊣ (3)
(5) ∃x, A(x, ŷ₀)    (*) EE (2,3,4) ⊣ (2)
(6) ∃y, ∃x, A(x, y)    EI (5) ⊣ (2)
(7) ∃y, ∃x, A(x, y)    EE (1,2,6) ⊣ (1)

Line (5) in this pseudo-proof is erroneous because the consequent proposition which is asserted in line (4) is
not independent of ŷ₀. If line (5) is expanded to a full sequent, it becomes evident that this line is not only
falsely inferred. It is also a false sequent.

(1) ∃x, ∃y, A(x, y) ⇒ ∃x, ∃y, A(x, y)    ⊣ (1)
(2) ∀x̂₀ ∈ U, ∃y, A(x̂₀, y) ⇒ ∃y, A(x̂₀, y)    ⊣ (2)
(3) ∀x̂₀ ∈ U, ∀ŷ₀ ∈ U, A(x̂₀, ŷ₀) ⇒ A(x̂₀, ŷ₀)    ⊣ (3)
(4) ∀x̂₀ ∈ U, ∀ŷ₀ ∈ U, A(x̂₀, ŷ₀) ⇒ ∃x, A(x, ŷ₀)    ⊣ (3)
(5) ∀x̂₀ ∈ U, ∀ŷ₀ ∈ U, ∃y, A(x̂₀, y) ⇒ ∃x, A(x, ŷ₀)    (*) ⊣ (2)
(6) ∀x̂₀ ∈ U, ∃y, A(x̂₀, y) ⇒ ∃y, ∃x, A(x, y)    ⊣ (2)
(7) ∃x, ∃y, A(x, y) ⇒ ∃y, ∃x, A(x, y)    ⊣ (1)

Only the sequent for line (5) is invalid. The other sequents are valid, including the concluding line (7).
One might easily overlook the erroneous inference on line (5) because the conclusion is correct. This
demonstrates a benefit of having well-defined sequent semantics for each line of a proof to highlight erroneous
inference steps. However, although an invalid sequent always indicates an erroneous inference, an erroneous
inference does not always yield an invalid sequent. So even the correctness of all sequents in an argument
does not imply that all applications of inference rules have been correctly executed.

In the above argument, a correct application of the EI rule on line (6) yields a valid sequent from the invalid
sequent on line (5), and subsequently the concluding line (7) is valid. But the entire argument must be
rejected as invalid because there is a single erroneous sequent, which arises from a single false application of
a rule. The correctness of the conclusion is not the sole criterion of the correctness of a proof. Valid arguments
yield valid conclusions, but valid conclusions may follow from invalid arguments. It is the integrity of the
process which determines the validity of an argument.
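
The claim that only the expanded sequent for line (5) is invalid can be tested on a small model. The Python sketch below assumes a two-element universe U and represents the two-parameter predicate A as a dictionary from pairs to truth values; the function names are hypothetical and follow the numbering of the expanded sequents (4), (5) and (7).

```python
from itertools import product

U = [0, 1]  # assumed two-element universe

def relations():
    """Yield every two-parameter predicate A on U as a dict (x, y) -> bool."""
    pairs = list(product(U, repeat=2))
    for values in product([False, True], repeat=len(pairs)):
        yield dict(zip(pairs, values))

def sequent_4(A):
    # ∀x̂₀, ∀ŷ₀, A(x̂₀, ŷ₀) ⇒ ∃x, A(x, ŷ₀)
    return all((not A[x0, y0]) or any(A[x, y0] for x in U)
               for x0 in U for y0 in U)

def sequent_5(A):
    # ∀x̂₀, ∀ŷ₀, (∃y, A(x̂₀, y)) ⇒ ∃x, A(x, ŷ₀)
    return all((not any(A[x0, y] for y in U)) or any(A[x, y0] for x in U)
               for x0 in U for y0 in U)

def sequent_7(A):
    # (∃x, ∃y, A(x, y)) ⇒ ∃y, ∃x, A(x, y)
    lhs = any(any(A[x, y] for y in U) for x in U)
    rhs = any(any(A[x, y] for x in U) for y in U)
    return (not lhs) or rhs

line_4_valid = all(sequent_4(A) for A in relations())
line_7_valid = all(sequent_7(A) for A in relations())
line_5_failures = [A for A in relations() if not sequent_5(A)]
```

In this model, the predicate which is true only at the pair (0, 0) is one of the interpretations which refute sequent (5), while sequents (4) and (7) hold for all sixteen interpretations.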


6.6.26 REMARK: An attempt to reverse a non-reversible theorem. 

In Remark 6.6.8, an attempt is made to reverse the non-reversible theorem ∀x, A(x) ⊢ ∃x, A(x), which is
given as Theorem 6.6.7 part (i). Similarly, Theorem 6.6.24 part (iii) is non-reversible. It is of some interest
to determine whether the chosen predicate calculus in Definition 6.3.9 will permit a false reversal in this
case, and if not, why not.

False converse of Theorem 6.6.24 part (iii): ∀y, ∃x, A(x, y) ⊢ ∃x, ∀y, A(x, y)    [pseudo-theorem]

(1) ∀y, ∃x, A(x, y)    Hyp ⊣ (1)
(2) ∃x, A(x, ŷ₀)    UE (1) ⊣ (1)
(3) A(x̂₀, ŷ₀)    Hyp ⊣ (3)
(4) ∀y, A(x̂₀, y)    (*) UI (3) ⊣ (3)
(5) ∃x, ∀y, A(x, y)    EI (4) ⊣ (3)
(6) ∃x, ∀y, A(x, y)    EE (2,3,5) ⊣ (1)

It is quite difficult to see why this form of proof fails. Lines (1) to (6) have the following meanings when
they are explicitly expanded as sequents. (See Remark 6.5.9 regarding such sequent expansions.)

(1) ∀y, ∃x, A(x, y) ⇒ ∀y, ∃x, A(x, y)    ⊣ (1)
(2) ∀ŷ₀ ∈ U, ∀y, ∃x, A(x, y) ⇒ ∃x, A(x, ŷ₀)    ⊣ (1)
(3) ∀x̂₀ ∈ U, ∀ŷ₀ ∈ U, A(x̂₀, ŷ₀) ⇒ A(x̂₀, ŷ₀)    ⊣ (3)
(4) ∀x̂₀ ∈ U, ∀ŷ₀ ∈ U, A(x̂₀, ŷ₀) ⇒ ∀y, A(x̂₀, y)    (*) ⊣ (3)
(5) ∀x̂₀ ∈ U, ∀ŷ₀ ∈ U, A(x̂₀, ŷ₀) ⇒ ∃x, ∀y, A(x, y)    (*) ⊣ (3)
(6) ∀y, ∃x, A(x, y) ⇒ ∃x, ∀y, A(x, y)    (*) ⊣ (1)

When expanded as explicit sequents like this, it is clear that line (4) is not a valid sequent. (As a consequence,
lines (5) and (6) are also not valid.) The cause of this is the incorrect application of the UI rule. This rule
states that from Δ ⊢ α(x̂) one may infer Δ ⊢ ∀x, α(x). The dependency list Δ for line (3) contains the single
proposition “A(x̂₀, ŷ₀)”, which clearly does depend on the parameter ŷ₀. So the UI rule cannot be applied.
Thus the error in the above pseudo-proof lies in the incorrect application of the UI rule, not the EE rule as
one could have suspected. In fact, it is always invalid to apply the UI rule to a hypothesised sequent, since
the free variable to which the universalisation is to be applied always appears in the dependency list if it
appears in the consequence. (See Remark 6.6.8 for some similar comments in regard to the application of
Rule G in the context of a related pseudo-proof.) Rosser [387], page 124, referred to the UI rule as “Rule G”,
meaning “the generalisation rule”.


At first sight, it might seem that the inference of line (4) from line (3) is correct because line (3) (in the
abbreviated argument lines) seems to say that A(x̂₀, ŷ₀) is true for all x̂₀ ∈ U and ŷ₀ ∈ U. But it does not
say this. Line (3) actually says that A(x̂₀, ŷ₀) ⇒ A(x̂₀, ŷ₀) for all x̂₀ ∈ U and ŷ₀ ∈ U, which really asserts
nothing about A at all. One might also think that the fully expanded sequent on line (4) asserts that the
statement “∀x̂₀ ∈ U, ∀y, A(x̂₀, y)” follows from the statement “∀x̂₀ ∈ U, ∀ŷ₀ ∈ U, A(x̂₀, ŷ₀)”. That would
be a correct assertion. But line (4) does not say this! The key point here is that the universal quantifiers
for the free variables are applied to the entire sequent, not to the consequent and antecedent propositions
of the sequent separately.
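
A concrete interpretation makes the invalidity of expanded sequent (4) visible. The Python sketch below assumes the universe U = {0, 1} and takes A(x, y) to mean x = y; both choices are illustrative assumptions rather than anything prescribed by the text.

```python
from itertools import product

U = [0, 1]  # assumed two-element universe
A = {(x, y): x == y for x, y in product(U, repeat=2)}  # A(x, y) means x = y

# Hypothesis and conclusion of the pseudo-theorem.
forall_y_exists_x = all(any(A[x, y] for x in U) for y in U)
exists_x_forall_y = any(all(A[x, y] for y in U) for x in U)

# Expanded sequent (3): a tautology, true for every choice of x̂₀ and ŷ₀.
sequent_3 = all((not A[x0, y0]) or A[x0, y0] for x0 in U for y0 in U)

# Expanded sequent (4): the quantifiers apply to the whole implication,
# so one pair (x̂₀, ŷ₀) with A true but "∀y, A(x̂₀, y)" false refutes it.
sequent_4 = all((not A[x0, y0]) or all(A[x0, y] for y in U)
                for x0 in U for y0 in U)
```

With this interpretation the hypothesis of the pseudo-theorem is true and its conclusion is false, sequent (3) holds, and sequent (4) fails at x̂₀ = ŷ₀ = 0.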


This pseudo-proof example demonstrates that it is not sufficient to examine only the clearly visible consequent
proposition of a sequent before applying an inference rule. The consequent proposition appears clearly in the
foreground of an argument line, but in the background is a list of dependencies at the right of the line. Every
dependency in that list must be examined to ensure that the variable to be universalised in an application of
the UI rule is not present. The dependency list is an integral part of the sequent. This observation applies
also to the EE rule. The absence of a free variable parameter “$(\hat{x})$” appended to a dependency list $\Delta$ in
Definition 6.3.9 means that the propositions in the dependency list must not contain the free variable $\hat{x}$.


It is perhaps worth mentioning that although the free variable $\hat{y}_0$ appears in both lines (2) and (3) in the
above pseudo-proof, this does not in any way link the two lines, except as a mnemonic for the person reading
or writing the logical argument. Each line is independent. The scope of any free variables in a line is restricted
to that line. One might think of the free variable $\hat{y}_0$ in the formula “$A(\hat{x}_0, \hat{y}_0)$” on line (3) as meaning “the
variable $\hat{y}_0$ which was chosen in line (2)”, which is what it would be in standard mathematical argumentation
style. In fact, the formula “$A(\hat{x}_0, \hat{y}_0)$” on line (3) commences a new argument, whose lines are linked only
by the line numbers in parentheses at the right of each line. The free variables may be arbitrarily renamed
on each line without affecting the validity or outcome of the argument in any way. Thus the meaning of
every line is history-independent. In other words, there is no “entanglement” between the lines of a predicate
calculus argument! (This is hugely beneficial for the robustness and verifiability of proofs.) Hence the error
at line (4) in the pseudo-proof has nothing at all to do with the apparent linkage of the variable $\hat{y}_0$ (which
is removed by the application of the UI rule) to the apparent fixed choice of this variable in line (2). The
variable isn't fixed and it isn't linked!


It is perhaps also worth mentioning that the error in the above pseudo-proof has nothing to do with the
invocation of the EE rule on line (6). Nor is the error due to the fact that the UI rule is applied to a line
which is within the apparent scope of the application of the EE rule from line (2) through to line (5). The
fact that the UI rule is applied to a variable $\hat{y}_0$ which appears in the existential formula “$\exists x,\ A(x, \hat{y}_0)$” in
line (2) is irrelevant. (See E. Mendelson [370], page 74, line (III), for a statement which does suggest this
kind of linkage for a closely related variant of the predicate calculus which is presented here.)


6.6.27 REMARK: Deduction metatheorems for predicate calculus. 

There are many kinds of deduction metatheorems for predicate calculus in the literature. (See for example 
Kleene [366], pages 112-114; Shoenfield [390], page 33; Robbin [384], pages 45-47; Stoll [393], page 391.) 
Different logical frameworks require different deduction metatheorems, but their proofs all require some kind 
of metamathematical induction or recursion argument. 


Since the principal objective of formal logic is to guarantee the logical correctness of mathematical proofs 
by automating and mechanising deductions, it seems somewhat counterproductive to rely on metatheorems 
whose proofs require informal inference by human beings who check their work by testing with examples 
and thinking as carefully as they can. It would be far preferable to adopt rules and axioms which permit 
the correctness of mathematical proofs to be fully automated, thereby “taking the human out of the loop". 
If some metatheorems seem indispensable for efficient logical deduction, one may as well convert those 
metatheorems into primary inference rules, which is more or less what a natural inference system does. Since 
a billion lines of inference may be checked every second by modern computers, it seems quite unnecessary 
to invoke efficiency as a motivation for metatheorems. The deduction metatheorem, in particular, says only 
that an equivalent proof exists without the benefit of the metatheorem. If metatheorem methods are justified 
by their equivalence to methods which do not use metatheorems, it could be argued that it is somewhat 
superfluous to adopt them. 
6.7. Predicate calculus with equality


6.7.1 REMARK: Equality is an axiomatised concrete predicate in a predicate calculus. 
The semantics of predicate calculus with equality is an excellent introduction to the semantics of the more 
general first-order languages. Some classes of logic languages are compared in Table 6.7.1. 


abbreviation   class of logic language            concrete spaces   axiomatised predicates
PC             propositional calculus             P
QC             (pure) predicate calculus          Q, V, P
QC + EQ        predicate calculus with equality   Q, V, P           “=” in Q
FOL            first-order language               Q, V, P, F        all predicates/maps in Q, F
ZF             Zermelo-Fraenkel set theory        Q, V, P           “=”, “∈” in Q = {“=”, “∈”}

Table 6.7.1 Some classes of logic languages

In QC+EQ, the concrete predicate space Q includes at least the binary equality predicate “=”, which is
axiomatised. Other concrete predicates may also exist, but they do not have specified axioms or rules. In a
predicate calculus with equality, any additional predicates are only axiomatised with respect to substitutivity
of variable names which refer to the same variable.


In a pure predicate calculus QC (without equality), there may be any number of predicates, but there are
no non-logical axioms at all for these. In a FOL, there may be any number of concrete predicates in Q
and concrete object-maps in F, and these are typically all axiomatised. ZF is a particular FOL which
is a predicate calculus with equality with one additional axiomatised concrete predicate and no concrete
object-maps. In other words, Q = {“=”, “∈”} and F = ∅.


As outlined in Remark 5.1.4, a pure predicate calculus has any number of concrete predicates and variables, 
which are combined to produce concrete propositions. These concrete propositions are then suitable for 
formalisation as a concrete proposition domain P as in Section 3.2, with a proposition name space $N_P$ as
in Section 3.3. In a predicate calculus with equality, the equality predicate is axiomatised both in relation 
to itself and in relation to all other predicates. This requires some “non-logical” or “extralogical” axioms in 
addition to the purely logical axioms which are given in Definition 6.3.9 for pure predicate calculus. 


The two kinds of predicates are distinguished here as “concrete” and “abstract”, but some books refer to them
as “constant” and “variable” predicates. Another way to think of them is as “hard” and “soft” predicates, 
analogous to the hardware and software in computers. The “hard” predicates are built-in, constant and 
concrete, whereas the “soft” predicates are constructed as required as logical formulas from the basic logical 
operators and quantifiers. (See also Remark 6.3.7 for soft predicates.) 


6.7.2 REMARK: Confusability of relation symbols in languages and metalanguage. 

Regrettably the symbols “=” and “∈” are used here with two very different meanings which can be easily
confused. Some texts use different sets of symbols for mathematical and metamathematical relations to
avoid confusion. However, here the different kinds of meanings are distinguished only by context.

The symbols “=” and “∈” in Table 6.7.1 are “axiomatised predicates” in the “(object) language”, whereas
the same symbols “=” and “∈” in Notation 3.6.5, for example, are part of the metalanguage. The different
meanings of these symbols can be determined from the expressions on their left and right, depending on
whether the expressions are part of the language or the metalanguage.


6.7.3 REMARK: Equality in predicate calculus means that two names are “bound” to the same object. 

As in the cases of pure propositional calculus and pure predicate calculus, so also in the case of predicate
calculus with equality, the formulation in terms of a language must be guided by the underlying semantics.
In Figure 5.1.2 in Remark 5.1.6, concrete object names are shown in a separate layer to the concrete objects
themselves. The name map $\mu_V : N_V \to V$ for variables may or may not be injective in a particular scope.


There is an underlying assumption in the formalisation of equality that the space of concrete variables V has
some kind of “native” equality relation which is extralogical, meaning that it is an application issue. (I.e. it's
somebody else's problem. See D. Adams [486], pages 329-330.) Then two names of variables x and y are said
to be equal in the name space $N_V$ whenever $\mu_V$(“x”) and $\mu_V$(“y”) are equal in the concrete object space V.
In other words, x and y are “bound” to the same object. (The object binding map $\mu_V$ is scope-dependent.)

6.7.4 REMARK: A concrete variable space (almost) always has an equality relation.

If a space of concrete variables V is given for a logical system, one usually does have the ability to determine 
if two objects in V are the same or different. This is surely one of the most basic properties of any set or class
or space of objects. In the physical world, two electrons may be indistinguishable. They have no “identity”. 
But abstract objects constructed by humans generally do not have that problem. 


An equality relation may be thought of as a map $E : V \times V \to P$ which has the symmetry and transitivity
properties of an equivalence relation as in Definition 9.8.2. In other words, $E(v_1, v_2) \Rightarrow E(v_2, v_1)$ for all
$v_1, v_2 \in V$ and $(E(v_1, v_2) \wedge E(v_2, v_3)) \Rightarrow E(v_1, v_3)$ for all $v_1, v_2, v_3 \in V$. However, a difficulty arises when
one attempts to require the reflexivity property. To require E(v, v) to be true for any $v \in V$ exposes the
issue of the difference between names and objects. This requirement is equivalent to requiring that $E(v_1, v_2)$
be true if $v_1 = v_2$, which can only be determined by reference to an even more concrete equality relation. An
equality relation E should satisfy $E(v_1, v_2)$ if and only if $v_1 = v_2$. In other words, if $v_1 \neq v_2$, then $E(v_1, v_2)$
must be false. But this means that E must be the same relation as “=”, which raises the question of where
this pre-defined relation “=” comes from.


When using a variable name v in some context in some language, it is assumed that it has a fixed “binding”
within its scope. For example, when one writes “x < x”, one assumes that the first “x” refers to the same
object (such as a number or an element of an ordered set) as the second “x”. This means that the
name-to-object map $\mu_V$ maps the two instances of “x” to the same object. In other words, $\mu_V(x_1) = \mu_V(x_2)$,
where $x_1$ and $x_2$ are the first and second occurrences of the symbol “x”. But this equality of $\mu_V(x_1)$ and
$\mu_V(x_2)$ is only meaningful if an equality relation is already defined on the object space V. So it is not even
possible to say what it means that the binding of a variable name stays the same within its scope unless an
equality relation is available for the object space. Consequently, if the symbol “v” is a name in the variable
name space $N_V$, and E is a relation on this name space, then the requirement that E(v, v) must be true for
all $v \in N_V$ is meaningless because it is not possible to say that the binding of “v” has stayed the same within
its scope. To put this another way, the definition of a function requires it to have a unique value for each
element of its domain, but uniqueness cannot be defined without an equality relation on the target space.
So it is impossible to say whether the name map $\mu_V : N_V \to V$ is well defined.


6.7.5 REMARK: Distinguishing an equality relation from an equivalence relation. 

Although an equality relation is necessarily an equivalence relation, it is not immediately clear how to 
add some extra condition to distinguish equality from equivalence. Condition (i) in Definition 9.8.2 for an
equivalence relation R on a set X requires x R x to be true for all x in X, which means that x R y must
be true whenever x = y. Since all equivalence relations are implicitly or explicitly defined in terms of an
underlying equality relation, the circularity of definitions is difficult to overcome, as noted in Remark 6.7.4. 


There are two main ways to formalise equality within Zermelo-Fraenkel set theory. Approach (1) is applicable 
to general predicate calculus with equality. Approach (2) is specific to set theories. (The ZF axiom of 
extension has two different forms depending on which approach is adopted. See Remark 7.5.12 for more 
detailed discussion of these two approaches to defining equality.) 


(1) Require the equality relation to satisfy “substitutivity of equality” with respect to all other predicates. 
This means that any two names which refer to the same object may be used interchangeably in any 
other predicate. 


(2) Define the equality relation in terms of set membership. The equality “x = y” may be defined in a set
theory to be an abbreviation for the logical expression $\forall z,\ ((z \in x) \Leftrightarrow (z \in y))$, where “∈” is the name
bound to the concrete set membership relation.
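
Approach (2) can be imitated in a toy model. The following Python sketch is only an illustration under stated assumptions: sets are modelled by `frozenset`, the quantifier over z ranges over a small hand-picked list `universe`, and the name `ext_equal` is hypothetical. It tests equality purely through membership, as in the abbreviation above.

```python
def ext_equal(x, y, universe):
    # "x = y" abbreviates: for all z, (z in x) <=> (z in y).
    # The quantifier ranges only over the finite list `universe`.
    return all((z in x) == (z in y) for z in universe)

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
universe = [empty, one, two]    # assumed large enough to separate the sets tested

same = ext_equal(frozenset([empty, one]), frozenset([one, empty]), universe)
different = ext_equal(one, two, universe)
```

Here the two differently written versions of {∅, {∅}} are found to be equal, whereas {∅} and {∅, {∅}} are distinguished by the member {∅}.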


The axioms in Definition 6.7.6 (ii) require the equality relation to satisfy the three basic conditions for an 
equivalence relation and a substitutivity condition with respect to any concrete predicate whose name A is in 
$N_{Q_n}$, the set of names of concrete predicates with n parameters, where n is a non-negative integer. (See the
proof of Theorem 7.3.5 (iv) for an application of the substitutivity axiom, Definition 6.7.6 (ii) (Subs). The 
substitutivity axiom is usually applied as a self-evident standard procedure without quoting any axioms.) 


6.7.6 DEFINITION [MM]: Predicate calculus with equality. 
The axiomatic system QC+EQ for predicate calculus with equality is the axiomatic system QC for predicate
calculus (in Definition 6.3.9) together with the following additional rules and axioms.

(i) The predicate name “=” in $N_Q$ is bound to the same binary concrete predicate in all scopes.

(ii) The additional axiom templates are the following logical expression formulas.

(EQ1) $\vdash x_1 = x_1$ for $x_1 \in N_V$.

(EQ2) $\vdash (x_1 = x_2) \Rightarrow (x_2 = x_1)$ for $x_1, x_2 \in N_V$.

(EQ3) $\vdash (x_1 = x_2) \Rightarrow ((x_2 = x_3) \Rightarrow (x_1 = x_3))$ for $x_1, x_2, x_3 \in N_V$.

(Subs) $\vdash (x_i = y_i) \Rightarrow (A(x_1, \ldots x_i, \ldots x_n) \Rightarrow A(x_1, \ldots y_i, \ldots x_n))$ for $A \in N_{Q_n}$, $x_i, y_i \in N_V$, $i = 1, \ldots n$.
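
The semantic content of these axiom templates can be tested in a finite model. The following Python sketch makes several assumptions which are not in the definition: the object space is a small finite range, the candidate relation `eq` and the concrete predicates are ordinary Python functions, and the function name `satisfies_equality_axioms` is hypothetical. It checks (EQ1), (EQ2), (EQ3) and (Subs) by exhaustive enumeration.

```python
from itertools import product

def satisfies_equality_axioms(V, eq, predicates):
    """Check EQ1-EQ3 and Subs for `eq` on the finite space V.

    `predicates` is a list of (A, n) pairs, where A takes n arguments from V.
    """
    eq1 = all(eq(x, x) for x in V)
    eq2 = all((not eq(x, y)) or eq(y, x) for x, y in product(V, repeat=2))
    eq3 = all((not (eq(x, y) and eq(y, z))) or eq(x, z)
              for x, y, z in product(V, repeat=3))
    # Subs: replacing the i-th argument by an "equal" object preserves truth.
    subs = all((not eq(args[i], y)) or (not A(*args))
               or A(*args[:i], y, *args[i + 1:])
               for A, n in predicates
               for args in product(V, repeat=n)
               for i in range(n)
               for y in V)
    return eq1 and eq2 and eq3 and subs

V = range(4)
identity = lambda x, y: x == y
same_parity = lambda x, y: x % 2 == y % 2      # an equivalence relation only
less_than = (lambda x, y: x < y, 2)
is_even = (lambda x: x % 2 == 0, 1)

identity_ok = satisfies_equality_axioms(V, identity, [less_than, is_even])
parity_vs_less = satisfies_equality_axioms(V, same_parity, [less_than])
parity_vs_even = satisfies_equality_axioms(V, same_parity, [is_even])
```

In this model, the parity relation satisfies the three equivalence conditions but fails substitutivity with respect to the order predicate. It passes only when every concrete predicate is blind to the difference, which illustrates how (Subs) separates equality from a mere equivalence relation relative to the available predicates.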


6.7.7 NOTATION: $x \neq y$, for variable names x and y in a predicate calculus with equality, means $\neg(x = y)$.


6.7.8 REMARK: Application of predicate calculus with equality to proofs of theorems. 

Theorem 6.7.9 shows how predicate calculus with equality differs in some small points from pure predicate 
calculus. (When you build a new plane, it's a good idea to take it out for a test flight before putting it 
into regular service!) Part (i) of Theorem 6.7.9 reflects the underlying assumption that the concrete variable 
space is non-empty. 

In the proof of Theorem 6.7.9 part (iii), line (2), only one of the two instances of the free variable $\hat{x}$ is bound
to the quantified variable y by the EI rule, Definition 6.3.9 (viii) (EI), which permits selective quantification
of a free variable, as described in Remark 6.3.11. Similarly, in the proof of Theorem 6.7.9 part (x), line (4)
follows by re-binding only two of three instances of “$\hat{y}$” to the quantified variable “z”.

There are formal applications of assertions in Theorem 6.7.9 to the proofs in Remark 7.3.10, and parts
(xi) and (xiv) explicitly justify lines (10.8.9) and (10.8.20) respectively in the proof of Theorem 10.8.12.
Otherwise, QC+EQ theorems are generally applied without specific attribution.


6.7.9 THEOREM [QC+EQ]: Some propositions for predicate calculus with equality. 
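The following assertions follow from the predicate calculus with equality in Definition 6.7.6.

(i) $\vdash \exists x,\ x = x$.

(ii) $\vdash \forall x,\ x = x$.

(iii) $\vdash \exists x,\ \exists y,\ x = y$.

(iv) $\vdash \forall x,\ \exists y,\ x = y$.

(v) $\hat{y}_1 = \hat{y}_2 \vdash \hat{y}_2 = \hat{y}_1$.

(vi) $\hat{y}_1 = \hat{y}_2,\ A(\hat{x}, \hat{y}_1) \vdash A(\hat{x}, \hat{y}_2)$.

(vii) $\hat{y}_1 = \hat{y}_2,\ A(\hat{x}, \hat{y}_2) \vdash A(\hat{x}, \hat{y}_1)$.

(viii) $\hat{y}_1 = \hat{y}_2 \vdash \forall x,\ (A(x, \hat{y}_1) \Rightarrow A(x, \hat{y}_2))$.

(ix) $\exists z,\ (z = \hat{y} \wedge A(\hat{x}, z)) \vdash A(\hat{x}, \hat{y})$.

(x) $A(\hat{x}, \hat{y}) \vdash \exists z,\ (z = \hat{y} \wedge A(\hat{x}, z))$.

(xi) $\exists z,\ (z = \hat{y} \wedge A(\hat{x}, z)) \dashv\vdash A(\hat{x}, \hat{y})$.

(xii) $\forall z,\ (z = \hat{y} \Rightarrow A(\hat{x}, z)) \vdash A(\hat{x}, \hat{y})$.

(xiii) $A(\hat{x}, \hat{y}) \vdash \forall z,\ (z = \hat{y} \Rightarrow A(\hat{x}, z))$.

(xiv) $\forall z,\ (z = \hat{y} \Rightarrow A(\hat{x}, z)) \dashv\vdash A(\hat{x}, \hat{y})$.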


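PROOF: To prove part (i): $\vdash \exists x,\ x = x$

(1) $\hat{x} = \hat{x}$          EQ1 $\dashv \emptyset$
(2) $\exists x,\ x = x$          EI (1) $\dashv \emptyset$

To prove part (ii): $\vdash \forall x,\ x = x$

(1) $\hat{x} = \hat{x}$          EQ1 $\dashv \emptyset$
(2) $\forall x,\ x = x$          UI (1) $\dashv \emptyset$

To prove part (iii): $\vdash \exists x,\ \exists y,\ x = y$

To prove part (iv): $\vdash \forall x,\ \exists y,\ x = y$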


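To prove part (v): $\hat{y}_1 = \hat{y}_2 \vdash \hat{y}_2 = \hat{y}_1$

(1) $\hat{y}_1 = \hat{y}_2$
(2) $\hat{y}_1 = \hat{y}_2 \Rightarrow \hat{y}_2 = \hat{y}_1$
(3) $\hat{y}_2 = \hat{y}_1$

To prove part (vi): $\hat{y}_1 = \hat{y}_2,\ A(\hat{x}, \hat{y}_1) \vdash A(\hat{x}, \hat{y}_2)$

To prove part (vii): $\hat{y}_1 = \hat{y}_2,\ A(\hat{x}, \hat{y}_2) \vdash A(\hat{x}, \hat{y}_1)$

(1) $\hat{y}_1 = \hat{y}_2$
(2) $A(\hat{x}, \hat{y}_2)$
(3) $\hat{y}_2 = \hat{y}_1$
(4) $A(\hat{x}, \hat{y}_1)$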
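To prove part (viii): $\hat{y}_1 = \hat{y}_2 \vdash \forall x,\ (A(x, \hat{y}_1) \Rightarrow A(x, \hat{y}_2))$

(1) $\hat{y}_1 = \hat{y}_2$                                          Hyp $\dashv$ (1)
(2) $A(\hat{x}, \hat{y}_1)$                                          Hyp $\dashv$ (2)
(3) $A(\hat{x}, \hat{y}_2)$                                          part (vi) (1,2) $\dashv$ (1,2)
(4) $A(\hat{x}, \hat{y}_1) \Rightarrow A(\hat{x}, \hat{y}_2)$        CP (2,3) $\dashv$ (1)
(5) $\forall x,\ (A(x, \hat{y}_1) \Rightarrow A(x, \hat{y}_2))$      UI (4) $\dashv$ (1)

To prove part (ix): $\exists z,\ (z = \hat{y} \wedge A(\hat{x}, z)) \vdash A(\hat{x}, \hat{y})$

(1) $\exists z,\ (z = \hat{y} \wedge A(\hat{x}, z))$                 Hyp $\dashv$ (1)
(2) $\hat{z} = \hat{y} \wedge A(\hat{x}, \hat{z})$                   Hyp $\dashv$ (2)
(3) $\hat{z} = \hat{y}$                                              Theorem 4.7.9 (xi) (2) $\dashv$ (2)
(4) $A(\hat{x}, \hat{z})$                                            Theorem 4.7.9 (xii) (2) $\dashv$ (2)
(5) $A(\hat{x}, \hat{y})$                                            part (vi) (3,4) $\dashv$ (2)
(6) $A(\hat{x}, \hat{y})$                                            EE (1,2,5) $\dashv$ (1)

To prove part (x): $A(\hat{x}, \hat{y}) \vdash \exists z,\ (z = \hat{y} \wedge A(\hat{x}, z))$

(1) $A(\hat{x}, \hat{y})$                                            Hyp $\dashv$ (1)
(2) $\hat{y} = \hat{y}$                                              EQ1 $\dashv \emptyset$
(3) $(\hat{y} = \hat{y}) \wedge A(\hat{x}, \hat{y})$                 Theorem 4.7.9 (i) (2,1) $\dashv$ (1)
(4) $\exists z,\ ((z = \hat{y}) \wedge A(\hat{x}, z))$               EI (3) $\dashv$ (1)

Part (xi) follows from parts (ix) and (x).

To prove part (xii): $\forall z,\ (z = \hat{y} \Rightarrow A(\hat{x}, z)) \vdash A(\hat{x}, \hat{y})$

(1) $\forall z,\ (z = \hat{y} \Rightarrow A(\hat{x}, z))$            Hyp $\dashv$ (1)
(2) $\hat{y} = \hat{y} \Rightarrow A(\hat{x}, \hat{y})$              UE (1) $\dashv$ (1)
(3) $\hat{y} = \hat{y}$                                              EQ1 $\dashv \emptyset$
(4) $A(\hat{x}, \hat{y})$                                            MP (3,2) $\dashv$ (1)
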
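To prove part (xiii): $A(\hat{x}, \hat{y}) \vdash \forall z,\ (z = \hat{y} \Rightarrow A(\hat{x}, z))$

(1) $A(\hat{x}, \hat{y})$                                            Hyp $\dashv$ (1)
(2) $\hat{z} = \hat{y}$                                              Hyp $\dashv$ (2)
(3) $A(\hat{x}, \hat{z})$                                            part (vii) (2,1) $\dashv$ (1,2)
(4) $\hat{z} = \hat{y} \Rightarrow A(\hat{x}, \hat{z})$              CP (2,3) $\dashv$ (1)
(5) $\forall z,\ (z = \hat{y} \Rightarrow A(\hat{x}, z))$            UI (4) $\dashv$ (1)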


Part (xiv) follows from parts (xii) and (xiii). 


6.7.10 REMARK: Preventing false theorems in predicate calculus with equality. 

When designing a logical calculus, it is even more important to prevent false positives than to prevent
false negatives. In other words, the failure to prove a true theorem is less important than the failure to
prevent false theorems from being proved. So it is of some interest to consider why the form of proof of
Theorem 6.7.9 (iii) does not succeed for the “proof” of the false theorem “$\exists x,\ \forall y,\ x = y$”. A “proof” following
this pattern would be as follows.


To prove “false theorem”: $\vdash \exists x,\ \forall y,\ x = y$    [pseudo-theorem]
(1) $\hat{x} = \hat{x}$                        EQ1 $\dashv \emptyset$
(2) $\forall y,\ \hat{x} = y$            (*)   UI (1) $\dashv \emptyset$
(3) $\exists x,\ \forall y,\ x = y$            EI (2) $\dashv \emptyset$


This fails because the UI rule in Definition 6.3.9 does not permit “selective quantification” of the
variable $\hat{x}$ by the quantified variable y. Similarly, this pattern of proof cannot succeed for the pseudo-theorem
“$\forall x,\ \forall y,\ x = y$”.
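
That the two pseudo-theorems really are false can be confirmed semantically in any model with more than one object. The following Python sketch assumes a hypothetical two-element universe `U` with Python's `==` playing the role of the equality predicate. It evaluates the two valid assertions of Theorem 6.7.9 (iii) and (iv) alongside the two pseudo-theorems.

```python
U = [0, 1]   # hypothetical universe with two distinct objects

# Theorem 6.7.9 (iii): exists x, exists y, x = y
exists_exists = any(any(x == y for y in U) for x in U)
# Theorem 6.7.9 (iv): for all x, exists y, x = y
forall_exists = all(any(x == y for y in U) for x in U)
# Pseudo-theorem: exists x, for all y, x = y
exists_forall = any(all(x == y for y in U) for x in U)
# Pseudo-theorem: for all x, for all y, x = y
forall_forall = all(all(x == y for y in U) for x in U)
```

The first two evaluate to true and the last two to false in this model, so a sound calculus must not be able to derive either pseudo-theorem.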


Guaranteeing the impossibility of “false positives” is more important than avoiding “false negatives". In 
other words, if a logical calculus is unable to demonstrate some true propositions, that is a much less serious 
problem than the ability to prove even a single false proposition. 


6.7.11 REMARK: The history of the equality symbol. 
The equals sign “=” was first used by Robert Recorde, “Whetstone of witte”, 1557. (See Cajori [242],
Volume I, pages 165, 297-309.) His symbol “=====” (“a paire of parelleles”) was somewhat longer than the
modern version. Recorde wrote that he chose this symbol “bicause noe.2. thynges, can be moare equalle.”


6.8. Uniqueness and multiplicity 


6.8.1 REMARK: The importance of uniqueness in mathematics. 

Much of pure mathematics is concerned with proving that sets (or numbers, or functions) exist and are 
unique. Functions are at least as important as sets in mathematics. According to Definition 10.2.2, a 
function is primarily required to have a unique value. Existence of a value is a secondary requirement. 
According to Definition 10.9.2, a “partial function" or “partially defined function” requires only uniqueness, 
existence not being required at all. (Partial functions are frequently encountered in differential geometry.) 
The deeper results of most areas of mathematics are concerned with existence and uniqueness of solutions to 
problems. Typically uniqueness is easier to prove than existence, but uniqueness is still of great importance. 


If a set exists and is unique, it can be used in a definition. Definitions mostly give names to things which exist 
and are unique. One may use the word "the" for unique objects. For example, the Zermelo-Fraenkel empty 
set axiom (Definition 7.2.4 (2)) states that $\exists A,\ \forall x,\ x \notin A$. From this, it is easily proved that $\exists' A,\ \forall x,\ x \notin A$.
(In other words, A is unique.) Therefore the set A may be given a name “the empty set” and a specific
notation “∅”. (See Theorem 7.6.2 and Notation 7.6.4.) The ability to use the word “the” for unique objects
is more than a grammatical convenience. It means that knowledge about the object accumulates. In other 
words, whatever is discovered about it continues to be valid every time it is encountered. One does not need 
to ask if two theorems which mention the empty set are talking about the same set! 

6.8.2 REMARK: Unique existence.

The shorthand “$\exists'$” is often used with the meaning “for some unique”. For example, “$\exists' x,\ P(x)$” means
“for some unique x, P(x) is true”. This may be expanded to the statement “for some x, P(x) is true; and
there is at most one x such that P(x) is true”. So it signifies the conjunction of two statements. This is
stated more precisely in Notation 6.8.3. (It is tacitly understood that the soft predicate P in Notation 6.8.3
may have any number of “extraneous free variables” as indicated explicitly in Notation 6.8.5.)


6.8.3 NOTATION[MM]: Unique existence quantifier notation. 
$\exists' x,\ P(x)$, for a predicate P and variable name x, means:


$(\exists x,\ P(x)) \wedge (\forall x,\ \forall y,\ ((P(x) \wedge P(y)) \Rightarrow (x = y)))$.
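
The conjunction in Notation 6.8.3 translates directly into a finite check. The following Python sketch is an illustration only: the function name `exists_unique` and the finite universe `U` are assumptions, and Python's `==` stands in for the equality predicate. The existence and uniqueness parts are computed separately and then combined.

```python
from itertools import product

def exists_unique(P, U):
    # Existence part: for some x, P(x).
    existence = any(P(x) for x in U)
    # Uniqueness part: for all x, y, (P(x) and P(y)) implies x = y.
    uniqueness = all((not (P(x) and P(y))) or x == y
                     for x, y in product(U, repeat=2))
    return existence and uniqueness

U = range(10)
root_of_nine = exists_unique(lambda x: x * x == 9, U)    # only x = 3 qualifies
even_number = exists_unique(lambda x: x % 2 == 0, U)     # existence holds, uniqueness fails
too_large = exists_unique(lambda x: x > 100, U)          # uniqueness holds vacuously, existence fails
```

The last two cases show that each conjunct can fail independently of the other.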


6.8.4 REMARK: Hazards of the unique existence notation. 
If the predicate P in Notation 6.8.3 is a very complicated expression, possibly possessing several parameters, 
the longhand proposition could be quite irksome. 


It is important in practice to not ignore additional predicate parameters because uniqueness is usually
conditional upon other parameters being restricted in some way. In colloquial contexts, the “$\exists'$” notation is
sometimes used in such a way that the dependencies are ambiguous. This can be avoided by ensuring that
all quantifiers and conditions are written out fully and in the correct order. Notation 6.8.5 gives an extension
of Notation 6.8.3 to explicitly show other parameters of the predicate P.


6.8.5 NOTATION: Unique existence quantifier notation, showing extraneous free variables. 


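$\exists' x,\ P(y_1, \ldots y_m, x, z_1, \ldots z_n)$, for a soft predicate P with m + n + 1 parameters for non-negative integers m
and n, and a variable name x, means:


$(\exists x,\ P(y_1, \ldots y_m, x, z_1, \ldots z_n))$
$\wedge\ (\forall u,\ \forall v,\ ((P(y_1, \ldots y_m, u, z_1, \ldots z_n) \wedge P(y_1, \ldots y_m, v, z_1, \ldots z_n)) \Rightarrow (u = v)))$.
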
6.8.6 REMARK: Alternatives for the unique existential quantifier notation. 


Alternative unique existence quantifier notations include “$\exists!$” (Cohen [349], page 21; Kleene [365], page 225),
“E!” (Suppes [395], page 3; Kleene [365], page 225; Lévy [368], page 4), and “Sl” (Graves [85], page 12). 


6.8.7 REMARK: Using the cardinality operator notation to denote uniqueness in a set theory. 

In terms of the cardinality notation in set theory, one might write $\exists' x,\ P(x)$ informally as “#{x; P(x)} = 1”,
meaning that there is precisely one thing x such that P(x) is true. (Existence means that #{x; P(x)} ≥ 1
whereas uniqueness means that #{x; P(x)} ≤ 1.) But the notation “$\exists'$” is well defined in a general predicate
calculus with equality, which is much more general than set theory.


6.8.8 REMARK:  Generalised multiplicity quantifiers. 

There are some (rare) occasions where it would be convenient to have notations for concepts such as “there
are at least 2 things x such that P(x) is true”, notated “$\exists_2 x,\ P(x)$”. (Margaris [369], page 174, gives the
notation “$\exists!n$” for essentially the same concept as the quantifier “$\exists_n$” presented here.) More generally, “there
are between m and n things such that P(x) is true” for cardinal numbers m ≤ n, notated “$\exists_m^n x,\ P(x)$”.


Then $\exists_n$ would be a sensible shorthand for $\exists_n^\infty$. Hence the quantifiers $\exists_1$, $\exists$ and $\exists_1^\infty$ are equivalent.


The notation “$\exists^n x,\ P(x)$” should mean: “There are at most n things such that P(x) is true.” So “$\exists^n$” should
mean the same as “$\exists_0^n$”. (See Remarks 6.8.9, 6.8.10, 6.8.11, 6.8.12, 6.8.13, 6.8.14, 6.8.15, 6.8.16 regarding
expressions for “$\exists_m^n$”.)


Such generalised uniqueness and existence quantifiers could be referred to as “multiplicity quantifiers” . 


Perhaps generalised existential quantifier notations are best written out in longhand. The simple case
$\exists_2 x,\ P(x)$ may be defined as $\exists x,\ \exists y,\ (P(x) \wedge P(y) \wedge (x \neq y))$, but the more general cases become rapidly
more complex. The slightly more complex case $\exists_2^2 x,\ P(x)$ (that is, exactly two things x satisfy P) would mean


$(\exists x,\ \exists y,\ (P(x) \wedge P(y) \wedge (x \neq y))) \wedge (\forall x,\ \forall y,\ \forall z,\ ((P(x) \wedge P(y) \wedge P(z)) \Rightarrow (x = y \vee y = z \vee x = z)))$.


The statement “$\exists^n x,\ P(x)$” is equivalent to “$\neg(\exists_{n+1} x,\ P(x))$” for non-negative integers n. So “$\exists_n^n x,\ P(x)$”
is equivalent to “$(\exists_n x,\ P(x)) \wedge \neg(\exists_{n+1} x,\ P(x))$”.

The statement “$\exists_n x,\ P(x)$” is equivalent to “$\neg(\exists^{n-1} x,\ P(x))$” for positive integers n.
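
These relations between the multiplicity quantifiers can be tried out on a finite universe. The following Python sketch is an illustration under stated assumptions: the function names `at_least`, `at_most` and `between` are hypothetical, the universe `U` is a finite collection of pairwise distinct objects, and “at most n” is defined as the negation of “at least n + 1”, as in the equivalence stated above.

```python
from itertools import combinations

def at_least(n, P, U):
    # combinations(U, n) yields n pairwise distinct elements of U,
    # so this checks for n unequal objects which all satisfy P.
    return any(all(P(x) for x in c) for c in combinations(U, n))

def at_most(n, P, U):
    # "At most n" is the negation of "at least n + 1".
    return not at_least(n + 1, P, U)

def between(m, n, P, U):
    # "At least m and at most n".
    return at_least(m, P, U) and at_most(n, P, U)

U = range(6)
is_even = lambda x: x % 2 == 0      # satisfied by 0, 2 and 4

exactly_three = between(3, 3, is_even, U)
two_to_three = between(2, 3, is_even, U)
at_most_two = at_most(2, is_even, U)
```

With three even numbers in the universe, the “exactly 3” and “between 2 and 3” quantifiers hold, whereas “at most 2” does not.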

6.8.9 REMARK: Logical expression for existence of at least three things.
The expression $\exists_3 x,\ P(x)$, which means “there exist at least 3 things with property P”, may be written as:


$\exists x,\ \exists y,\ \exists z,\ P(x) \wedge P(y) \wedge P(z) \wedge (x \neq y) \wedge (y \neq z) \wedge (x \neq z)$.

6.8.10 REMARK: Logical expression for existence of at most three things. 

The expression $\exists^3 x,\ P(x)$, which means “there exist at most 3 things with property P”, may be written as:

$\forall w,\ \forall x,\ \forall y,\ \forall z,$
$(P(w) \wedge P(x) \wedge P(y) \wedge P(z)) \Rightarrow (w = x \vee w = y \vee w = z \vee x = y \vee x = z \vee y = z)$.

Alternatively,

$\forall w,\ \forall x,\ \forall y,\ \forall z,$
$(w \neq x \wedge w \neq y \wedge w \neq z \wedge x \neq y \wedge x \neq z \wedge y \neq z) \Rightarrow (\neg P(w) \vee \neg P(x) \vee \neg P(y) \vee \neg P(z))$.

Saying that there are at most 3 things x such that P(x) is true is not really an existential statement. All
uniqueness statements (in this case a “tripliqueness” statement) are negative existence statements. Therefore
universal quantifiers are used rather than existential quantifiers.


The similarity of form to Remark 6.8.9 is not surprising because $\exists^3 x,\ P(x)$ means the same thing as
$\neg(\exists_4 x,\ P(x))$. There exist at most three x if and only if there do not exist four x. It follows by simple
negation of $\exists_4 x,\ P(x)$ that there is a third equivalent form for $\exists^3 x,\ P(x)$, namely

$\forall w,\ \forall x,\ \forall y,\ \forall z,$
$\neg P(w) \vee \neg P(x) \vee \neg P(y) \vee \neg P(z) \vee w = x \vee w = y \vee w = z \vee x = y \vee x = z \vee y = z$.
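
The agreement of the three forms can be confirmed exhaustively on a small universe. The following Python sketch is only a sanity check under stated assumptions: the five-element universe, the function names and the encoding of predicates as subsets are all choices made here. For every predicate on the universe, it compares the three forms with a direct count of the objects satisfying the predicate.

```python
from itertools import combinations, product

def some_equal(t):
    # True if at least two entries of the tuple t are equal.
    return len(set(t)) < len(t)

def form1(P, U):
    # (P(w) and P(x) and P(y) and P(z)) implies some pair is equal.
    return all((not all(P(v) for v in t)) or some_equal(t)
               for t in product(U, repeat=4))

def form2(P, U):
    # (all pairwise different) implies (some entry fails P).
    return all(some_equal(t) or any(not P(v) for v in t)
               for t in product(U, repeat=4))

def form3(P, U):
    # The pure disjunction obtained by negating "at least 4".
    return all(any(not P(v) for v in t) or some_equal(t)
               for t in product(U, repeat=4))

U = range(5)
subsets = [set(s) for r in range(len(U) + 1) for s in combinations(U, r)]
forms_agree = all(
    form1(S.__contains__, U) == form2(S.__contains__, U)
    == form3(S.__contains__, U) == (len(S) <= 3)
    for S in subsets
)
```

All 32 predicates on this universe give the same truth value for the three forms, and that value is true exactly when at most three objects satisfy the predicate.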


6.8.11 REMARK: Logical expression for existence of exactly three things. 
The expression $\exists_3^3 x,\ P(x)$, which means “there exist exactly 3 things with property P”, may be written as:


$(\exists x,\ \exists y,\ \exists z,\ (P(x) \wedge P(y) \wedge P(z) \wedge (x \neq y) \wedge (y \neq z) \wedge (x \neq z)))$
$\wedge\ (\forall w,\ \forall x,\ \forall y,\ \forall z,\ ((P(w) \wedge P(x) \wedge P(y) \wedge P(z)) \Rightarrow (w = x \vee w = y \vee w = z \vee x = y \vee x = z \vee y = z)))$.


This is the same as the conjunction of $\exists_3 x,\ P(x)$ and $\exists^3 x,\ P(x)$ in Remarks 6.8.9 and 6.8.10. It is probably not
possible to reduce the complexity of the statement by exploiting some sort of redundancy in the combination
of statements.


6.8.12 REMARK: Logical expression for existence of exactly two or three things. 
The expression $\exists_2^3 x,\ P(x)$, which means “there exist at least 2 and at most 3 things with property P”, may be
written as:


$(\exists x,\ \exists y,\ (P(x) \wedge P(y) \wedge (x \neq y)))$
$\wedge\ (\forall w,\ \forall x,\ \forall y,\ \forall z,\ ((P(w) \wedge P(x) \wedge P(y) \wedge P(z)) \Rightarrow (w = x \vee w = y \vee w = z \vee x = y \vee x = z \vee y = z)))$.


This may be interpreted as: “There are at least 2, but less than 4, x such that P(x) is true.” More informally,
one could write 2 ≤ #{x; P(x)} < 4.


6.8.13 REMARK: Logical expression for existence of at least n things. 

To write a logical formula for $\exists_n x,\ P(x)$ for a general non-negative integer n, which means “there exist at
least n things with property P”, is more difficult. It is tempting to approach this task using induction.
However, that would not yield a closed formula for the desired statement $\exists_n x,\ P(x)$. It is easier to think of
the ordinal number n as a general set. We want to require the existence of a unique $x_i$ such that $P(x_i)$ is
true for each $i \in n$. The logical expression (6.8.1) is one way of writing $\exists_n x,\ P(x)$.


$\exists f,\ (\forall i \in n,\ \exists x,\ ((i, x) \in f \wedge P(x))) \wedge (\forall i \in n,\ \forall j \in n,\ \forall x,\ (((i, x) \in f \wedge (j, x) \in f) \Rightarrow i = j))$.    (6.8.1)


In other words, there exists a set f which includes an injective relation on n, such that P(f(i)) is true
for all $i \in n$. (It is only required that f restricted to n be an injective relation on n. The purpose of this
generality is to simplify the logic.) Thus statement (6.8.1) means that there exists a distinct x which satisfies
P for each $i \in n$. As a bonus, the logical expression (6.8.1) is valid for any set n, even if n is countably or
uncountably infinite.
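
For a finite universe and a finite n, the idea behind expression (6.8.1) can be tested by searching for a suitable injective map. The following Python sketch simplifies the formula in ways which are assumptions made here: f is restricted to a function from {0, …, n−1} into a finite universe `U` rather than an arbitrary set of ordered pairs, and the name `at_least_by_injection` is hypothetical.

```python
from itertools import permutations

def at_least_by_injection(n, P, U):
    # Each tuple f from permutations(U, n) lists n pairwise distinct
    # elements of U, so i -> f[i] is an injective map from
    # {0, ..., n-1} into U. We ask for one such f with P(f[i]) for all i.
    return any(all(P(x) for x in f) for f in permutations(U, n))

U = range(6)
is_even = lambda x: x % 2 == 0      # satisfied by 0, 2 and 4

three_evens = at_least_by_injection(3, is_even, U)
four_evens = at_least_by_injection(4, is_even, U)
```

An injective map with all values even exists for n = 3 but not for n = 4, matching the count of even numbers in the universe.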

6.8.14 REMARK: Logical expression for existence of at most n things.

Consider the task of writing a logical formula for $\exists^n x,\ P(x)$ for general non-negative integer n, which means
“there exist at most n things with property P”. Note that $\exists^n x,\ P(x)$ is equivalent to $\neg(\exists_{n+1} x,\ P(x))$, which
can be obtained from Remark 6.8.13 by simple negation as in the following expression, where $n^+ = n + 1$.


$\forall f,\ (\forall i \in n^+,\ \forall j \in n^+,\ \forall x,\ (((i, x) \in f \wedge (j, x) \in f) \Rightarrow i = j)) \Rightarrow (\exists i \in n^+,\ \forall x,\ ((i, x) \in f \Rightarrow \neg P(x)))$.


This may be interpreted as: “For any injective function f on $n^+$, for some i in $n^+$, P(f(i)) is false.” The
generalisation of this expression to infinite n does not seem to be as straightforward as in Remark 6.8.13.
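
The informal interpretation above can be checked directly on a finite universe. The following Python sketch rests on assumptions made here: only functions from {0, …, n} into a finite universe `U` are enumerated, not arbitrary sets f, and the name `at_most_by_injection` is hypothetical.

```python
from itertools import permutations

def at_most_by_injection(n, P, U):
    # Every injective map from n+ = {0, ..., n} into U must have at
    # least one value which fails P. If U has fewer than n + 1 elements,
    # there are no such maps and the statement holds vacuously.
    return all(any(not P(x) for x in f) for f in permutations(U, n + 1))

U = range(6)
is_even = lambda x: x % 2 == 0      # satisfied by 0, 2 and 4

at_most_three = at_most_by_injection(3, is_even, U)
at_most_two = at_most_by_injection(2, is_even, U)
at_most_ten = at_most_by_injection(10, is_even, U)
```

The third case illustrates the vacuous situation in which n + 1 exceeds the size of the universe.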


6.8.15 REMARK: Logical expression for existence of exactly n things. 

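Consider the task of writing a logical formula for $\exists_n^n x,\ P(x)$ for general non-negative integer n, which means
“there exist exactly n things with property P”. This task can be tackled by combining Remarks 6.8.13
and 6.8.14.


$\exists f,\ ((\forall i \in n,\ \exists x,\ ((i, x) \in f \wedge P(x))) \wedge (\forall i \in n,\ \forall j \in n,\ \forall x,\ (((i, x) \in f \wedge (j, x) \in f) \Rightarrow i = j)))$
$\wedge\ \forall g,\ (\forall i \in n^+,\ \forall j \in n^+,\ \forall y,\ (((i, y) \in g \wedge (j, y) \in g) \Rightarrow i = j)) \Rightarrow (\exists i \in n^+,\ \forall y,\ ((i, y) \in g \Rightarrow \neg P(y)))$.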


6.8.16 REMARK: Logical expression for existence of at least m things and at most n things. 

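Consider the task of writing a logical formula for $\exists_m^n x,\ P(x)$, for general non-negative integers m and n
with m ≤ n, which means “there exist at least m and at most n things with property P”. This logical
expression is a simple variation of Remark 6.8.15.


$\exists f,\ ((\forall i \in m,\ \exists x,\ ((i, x) \in f \wedge P(x))) \wedge (\forall i \in m,\ \forall j \in m,\ \forall x,\ (((i, x) \in f \wedge (j, x) \in f) \Rightarrow i = j)))$
$\wedge\ \forall g,\ (\forall i \in n^+,\ \forall j \in n^+,\ \forall y,\ (((i, y) \in g \wedge (j, y) \in g) \Rightarrow i = j)) \Rightarrow (\exists i \in n^+,\ \forall y,\ ((i, y) \in g \Rightarrow \neg P(y)))$.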


6.8.17 REMARK: Terminology for generalised multiplicity. 
Just as “unique” means that there is one thing only, the word “duplique” must mean that there are two 
things only. The adjectives for 3 and 4 might be “triplique” and “quadruplique” respectively. 


Just as the noun “existence” means that there exists at least one thing, the word “duplicity” might mean 
that two things exist. (The corresponding adjective might be “duplicate” or “duplex”.) The nouns for 3 and 
4 might be “triplicity” and “quadruplicity” respectively.


6.8.18 REMARK: An equality relation is required for both upper and lower bounds on numbers of objects. 
In Remark 6.8.8, both upper and lower limits on the cardinality of objects satisfying a given predicate require 
an equality relation on the space of objects. (Here “cardinality” is meant in the logical sense for non-negative
integers only, not the set-theoretic sense.) Whereas uniqueness places an upper bound of 1 on the number of
objects satisfying a predicate by requiring all such objects to be equal, the notion of duplicity puts a lower
bound on “cardinality” by requiring the existence of at least two unequal objects. In each case, the equality 
relation is required for “counting” objects. 


6.9. First-order languages 


6.9.1 REMARK: The semantics underlying first-order languages and non-logical axioms. 

At the semantic level, the main difference between a pure predicate calculus and a first-order language is the
absence of any specified constraint on the truth value map $\tau \in 2^{\mathcal{P}}$ in a pure predicate calculus, whereas a
first-order language may (and usually does) specify constraints on the knowledge space $2^{\mathcal{P}}$ by way of axioms.
(Truth value maps $\tau : \mathcal{P} \to \{\mathrm{F}, \mathrm{T}\}$ for a concrete proposition domain $\mathcal{P}$ are introduced in Definition 3.2.3.
The notation $2^{\mathcal{P}}$ is introduced in Remark 3.4.1.) Thus the axioms of a pure predicate calculus are merely
“logical axioms” which are tautologies.


In a pure predicate calculus, a valid theorem of the simple form “$\vdash \alpha$” has the meaning $K_\alpha = 2^{\mathcal{P}}$, where $K_\alpha$
is the knowledge set corresponding to the logical expression $\alpha$. (Knowledge sets are introduced in Section 3.4.
See Remark 5.3.11 for some discussion of knowledge sets for pure predicate logic without non-logical axioms.)
In other words, all valid theorems without antecedent propositions are tautologies. Similarly, a sequent of the
form “$\alpha_1, \ldots \alpha_n \vdash \beta$” has the meaning $\bigcap_{i=1}^n K_{\alpha_i} \subseteq K_\beta$, which expresses a relation between the propositions
in the sequent. (It means that if “the truth” lies in all of the knowledge sets $K_{\alpha_i}$, then “the truth” must also
lie in the knowledge set $K_\beta$.)


In a first-order language, the non-logical axioms constrain the knowledge space $2^{\mathcal{P}}$ a-priori to some subset
$K_Z$ of $2^{\mathcal{P}}$, where $Z$ is the logical conjunction of all of the axioms of the language. This is like preceding
all sequents by the proposition $Z$. Thus an assertion “$\vdash \alpha$” has the meaning $K_Z \subseteq K_\alpha$ because “$\vdash \alpha$”
is tacitly converted to the sequent “$Z \vdash \alpha$”. Similarly, a general sequent of the form “$\alpha_1, \ldots \alpha_n \vdash \beta$” is
tacitly converted to “$Z, \alpha_1, \ldots \alpha_n \vdash \beta$”, which has the meaning $K_Z \cap \bigcap_{i=1}^n K_{\alpha_i} \subseteq K_\beta$. Consequently the
theorems of a first-order logic with non-logical axioms are not necessarily tautologies. Such theorems are 
all consequences of the non-logical axioms. However, all of the theorems from the pure predicate calculus 
continue to be tautologies. So pure predicate calculus theorems are valid for any first-order logic. 
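
The knowledge-set semantics in Remark 6.9.1 can be illustrated on a tiny concrete proposition domain. The following Python sketch assumes a hypothetical two-element domain `PROPS`, represents each logical expression as a Python function of a truth value map, and represents the axiom conjunction Z as a list of such functions. None of these names come from the text. It is a sketch of the subset test only, not of a deduction system.

```python
from itertools import product

PROPS = ("p", "q")  # hypothetical concrete proposition domain

# The knowledge space: all truth value maps from PROPS to {False, True}.
SPACE = frozenset(product((False, True), repeat=len(PROPS)))


def knowledge_set(expr):
    """The set of truth value maps for which the expression holds."""
    return frozenset(tau for tau in SPACE if expr(dict(zip(PROPS, tau))))


def sequent_holds(antecedents, consequent, axioms=()):
    """Test that the intersection of the knowledge sets of the axioms and
    antecedents is included in the knowledge set of the consequent."""
    k = SPACE
    for expr in (*axioms, *antecedents):
        k = k & knowledge_set(expr)
    return k <= knowledge_set(consequent)
```

With no axioms, `sequent_holds([], expr)` succeeds only when `expr` is a tautology. With axioms supplied, the same call can succeed for an expression which is not a tautology, which is the distinction drawn in the remark.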


6.9.2 REMARK: Object-maps in first-order languages. 

As mentioned in Remark 6.7.1, first-order languages are distinguished from a pure predicate calculus in two
main ways. A first-order language may have non-logical axioms, as outlined in Remark 6.9.1, but they also
differ by having optional object-maps. These are functions which map finite sequences of elements of the 
concrete variable space V to individual objects in this space. As mentioned in Remark 3.16.2 (5), this kind 
of structure is relevant for axiomatic algebra which is not based on set theory. However, this option is not 
exercised in ZF set theory. So it can be safely ignored here. 


6.9.3 REMARK: Concrete propositions for first-order languages. 

Despite the apparent differences between propositional calculus and a first-order language, they have in
common a universe of propositions $\mathcal{P}$ which is the basis of all assertions in the language. (In fact, every assertion
corresponds to a subset of $2^{\mathcal{P}}$.) The following tables give an impression of this space of propositions in
the case of a set theory with only equality and set membership predicates. (In each table, the entry “O”
indicates that the object on the left has the relevant relation to the object in the top row, and “X” indicates
that the relation does not hold.)


“∈”        ∅    {∅}   {{∅}}   {∅,{∅}}          “=”        ∅    {∅}   {{∅}}   {∅,{∅}}
∅          X     O      X        O             ∅          O     X      X        X
{∅}        X     X      O        O             {∅}        X     O      X        X
{{∅}}      X     X      X        X             {{∅}}      X     X      O        X
{∅,{∅}}    X     X      X        X             {∅,{∅}}    X     X      X        O


In principle, it is possible to determine a complete universe of objects for such a system, and to then 
determine all of the true/false entries in the tables. This does not make set theory simple for several reasons. 
Determining even one complete universe for ZF set theory, for example, is not entirely trivial. (The study 
of such universes leads to model theory.) More importantly, the propositions of interest are not these simple 
concrete propositions, but rather the “soft predicates” which can be built from them. For example, the
proposition “$\forall x, x \notin \emptyset$” means that the entire first column of the “∈” table contains X. This very simple
“soft proposition” is the conjunction of the falsity of the infinitely many concrete propositions in the first
column. The proposition “$\forall x, x \notin x$” is the conjunction of the falsity of all diagonal propositions in the “∈”
table. Most of the interesting theorems and conjectures in mathematics are vastly more complicated logical
combinations of concrete propositions than that. As in the game of chess, the basic rules seem simple, but 
playing the game is famously non-trivial. 
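
The two tables in Remark 6.9.3 can be recomputed mechanically. The following Python sketch models the four sets as nested `frozenset` objects, which is an assumption made purely for illustration, and writes "0" for the empty set in the labels. The names `OBJECTS` and `relation_table` are hypothetical.

```python
EMPTY = frozenset()

# The four objects of the tables, labelled with "0" for the empty set.
OBJECTS = {
    "0": EMPTY,
    "{0}": frozenset({EMPTY}),
    "{{0}}": frozenset({frozenset({EMPTY})}),
    "{0,{0}}": frozenset({EMPTY, frozenset({EMPTY})}),
}


def relation_table(rel):
    """Row x, column y: 'O' if rel(x, y) holds, otherwise 'X'."""
    return {
        label: "".join("O" if rel(x, y) else "X" for y in OBJECTS.values())
        for label, x in OBJECTS.items()
    }


member = relation_table(lambda x, y: x in y)  # the membership table
equal = relation_table(lambda x, y: x == y)   # the equality table

# Two "soft propositions", restricted to this four-object fragment:
# nothing is a member of the empty set, and nothing is a member of itself.
first_column_empty = all(row[0] == "X" for row in member.values())
diagonal_false = all(row[i] == "X" for i, row in enumerate(member.values()))
```

The last two values are finite conjunctions over four objects only. In a full universe of sets the corresponding propositions are conjunctions over infinitely many concrete propositions, which no such enumeration can complete.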


6.9.4 REMARK: No theorems for first-order logic. 

The reader may be concerned about the lack of theorems for first-order logic in Section 6.9. However, 
(abstract) first-order logic is only a general framework for developing particular (concrete) first-order logics 
which have particular specifications for spaces of variables, predicates and object-maps. As mentioned in 
Remark 6.9.1, all of the theorems for pure predicate calculus are also theorems for any first-order logic. 
Moreover, all of the theorems for predicate calculus with equality in Section 6.7 are also theorems for 
any first-order logic with equality because the axioms for an equality relation are essentially universal and 
standard. But when further relations (such as set-membership) are introduced, together with non-logical 
axioms (such as the ZF set theory axioms), additional theorems must be developed in accordance with those 
relations and axioms. And that is an application issue, not a framework issue. 


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




Chapter 7 


SET AXIOMS 


7.0.1 REMARK: Set theory research versus applicable set theory. 

The set theory presented in Chapter 7 is intended to be merely a “set theory which works”. Set theory 
research questions are not investigated here, and the style of presentation given here is not intended to 
facilitate such research. The author has combined many ideas from numerous set theory approaches to 
produce a basic axiomatic framework which is more than adequate for the rest of the book. 


Although some of the deeper set theory questions are discussed in Chapters 7, 11, 12 and 13, the advanced 
methods which are required for investigating these questions are not presented. Thus one may say that the 
presentation here is restricted to applicable set theory. It is the minimum amount of set theory to ensure 
that differential geometry has a sufficiently solid basis for all practical purposes. 


7.1. The role of set theory in mathematics 


7.1.1 REMARK: Set theory as a foundation layer for mathematics. 
Set theory is a foundation upon which most mathematics may be built. According to Halmos [357], page vi: 


[...] general set theory is pretty trivial stuff really, but, if you want to be a mathematician, you 
need some, and here it is; read it, absorb it, and forget it. 


It is true that mathematics can be done successfully without studying set theory or mathematical logic, but 
some people have a desire to understand what everything means, not just how to do it. Since this book is 
strongly oriented towards meaning and understanding, as opposed to just proving theorems, the presentation 
here commences with the foundations. Set theory is explained in terms of an even lower foundation layer, 
mathematical logic, which is presented in Chapters 3, 4, 5 and 6. Although this is all “pretty trivial stuff”,
it can help to give meaning to an otherwise seemingly endless treadmill of abstract theorem-proving.


7.1.2 REMARK: The importance of not examining the foundations of mathematics too closely. 
Mathematics is full of perplexing and incomprehensible ambiguities. So it is always useful to be able to 
claim that everything can be made meaningful using “rigorous set theory”. This is like the ancient Greeks 
explaining events (e.g. in the Iliad) in terms of the whims of deities on Mount Olympus. As long as no
one went there, no one could prove that the whims of deities on Mount Olympus didn’t explain events. 
Similarly, one should not examine set theory too closely, because the myth of a solid basis in rigorous set 
theory is useful to maintain. Even in times when most logicians thought that set theory was irreparably 
self-inconsistent, mathematics progressed regardless. Mathematics did not stop for a decade while waiting 
for paradoxes in set theory to be resolved. 


7.1.3 REMARK: Arguments for and against defining mathematics in terms of sets. 

Constructing all mathematical objects from ZF sets is like building all houses, cars and computers out of 
empty boxes. Although it is possible to build mathematics from highly convoluted constructions which are 
defined entirely in terms of the empty set, this is not at all natural. 


An alternative to the sets-only approach for the definition of mathematical objects is the axiomatic approach, 
where objects such as integers, real numbers and other low-level structures are defined by axioms. Then 
the particular set-based constructions of these structures are merely representations for the corresponding 
axiomatic systems. 


By allowing structures other than sets to be defined axiomatically, mathematical constructions will not then 
all be derivable from empty sets as their core underlying object. Some minimalism is thereby sacrificed for 
the sake of naturalism. However, minimalism in the foundations implies a vast amount of pointless work 
in the construction phase, constructing all mathematical objects out of the empty set. The fact that it 
can be done does not imply that it should be done. Suitable candidates for axiomatisation (i.e. definition 
independent of pure set theory) include numbers, groups, fields, linear spaces, tensors and tangent vectors. 


Allowing some classes of mathematical objects to be defined independently of set theory could be regarded 
as a kind of diversification strategy or insurance policy. If it turns out later that there is some fundamental 
flaw in pure set theory which cannot be fixed, it would be useful to have an independent basis for at least 
some of the structures which are required for mathematics. 


7.1.4 REMARK: ZF is a model validation framework for theories. Mathematics does not reside inside ZF. 
It is not true to say that all mathematics is about ZF set theory. Nor is all mathematics expressed within the 
framework of ZF set theory. Mathematics thrived for thousands of years before set theory was invented, but 
towards the end of the 19th century and the beginning of the 20th century, the accumulation of paradoxes in 
mathematics motivated attempts to axiomatise all of mathematics as Euclid had done for plane geometry. 
Peano, Frege, Whitehead and Russell, Hilbert, and others attempted to bring mathematics within a logical 
system which could be shown to be at least consistent, and preferably complete, so that mathematics in 
general could be validated, and future discoveries of paradoxes could be absolutely excluded. 


Although the logicist and formalist research programmes were ultimately considered to have failed to deliver 
absolute certainty to mathematics, the Zermelo-Fraenkel set theory framework remains as a comprehensive 
framework within which almost all mathematics can be represented. When most of the key questions of 
independence of axioms (such as the various axioms of choice and continuum hypotheses) had been reasonably 
well resolved in the 1960s by Paul Cohen and others, mathematicians largely lost interest in research into 
the validity of the foundations of mathematics. 


Now in the post-1960s mathematical world, ZF set theory has a much more limited role. The original 
motivation for trying to fit all of mathematics into ZF set theory was to define models whose validity could 
be established within a strict axiomatic framework. One can say now that if a mathematical theory can be 
modelled within ZF set theory, then probably there are no serious fundamental problems which can endanger 
the theory. For example, Russell’s paradox has been banished from ZF set theory. If a new contradiction 
does in fact arise in ZF set theory, it will have major effects on all of mathematics, not merely on a single 
isolated area of mathematics. Such a discovery would probably bring some fame to the discoverer, which 
would more than compensate for the loss of validity of some small area of mathematics. So ZF set theory is 
a fairly safe framework within which to build mathematical models. There is a kind of a built-in “insurance 
scheme” which will pay out more than adequate compensation for any damage. 


At the beginning of the 21st century, one may regard (almost) all mathematics as being formulated within 
the framework of a first-order language (FOL) or in some related kind of logical language. (See Section 6.9 for 
first-order language.) Any valid FOL for a mathematical theory typically has infinitely many interpretations 
by models within ZF set theory if it has any at all. But the ability to find a model for a theory within the 
ZF framework is not the sole criterion for validity. Other kinds of models can be defined. The principal
advantage of finding a model for a theory within ZF is the confidence that one obtains from this. For 
example, the real numbers may be implemented as Cauchy sequences or as Dedekind cuts, both within ZF 
set theory. Likewise, the integers, rational numbers, complex numbers, group theory, topological spaces, 
linear spaces, Schwartz distributions, Carathéodory measure theory and partial differential equations can all 
be modelled very straightforwardly within the ZF set theory framework. 


Mathematical theories and structures are not all modelled within ZF set theory. The important point is 
that almost all mathematical theories and structures can be modelled within ZF set theory. As soon as this 
feasibility has been demonstrated, the immersion of a theory within ZF set theory may be safely discarded. 
Thus for example the non-negative integers may be modelled as sets $0 = \emptyset$, $1 = \{\emptyset\}$, and so forth. But this
is not really the definition of the non-negative integers. It is merely a feasibility demonstration. Similarly, 
representations of real numbers as Cauchy sequences or Dedekind cuts may be safely discarded as soon as the 
theory of real numbers has been validated for the particular model which is chosen. The real real numbers 
are extra-mathematical entities, not elements of the von Neumann universe or any other set theory universe. 
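
The feasibility demonstration for the non-negative integers can be made concrete. The following Python sketch assumes the usual von Neumann successor rule, where k+1 is the union of k with {k}, and models sets as `frozenset` objects. The function name `von_neumann` is hypothetical. In keeping with the remark, it is one possible representation, not a definition of the integers.

```python
def von_neumann(n):
    """Return the set representing n: 0 is the empty set, k+1 is k union {k}."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset({s})  # successor step
    return s


zero = von_neumann(0)   # the empty set
one = von_neumann(1)    # the set whose only element is the empty set
two = von_neumann(2)    # the set containing zero and one
```

In this representation the set for n has exactly n elements, and the order relation on the integers is represented by both membership and proper inclusion.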


7.1.5 REMARK: ZF set theory does not exist. It’s only a theory. Or a model... 

Whereas Remark 7.1.4 asserts that ZF set theory is a testing and validation framework which is a kind of 
factory that provides models for all of mathematics, ZF set theory itself is a theory which requires models. In 
practice, the models for ZF set theory are provided by ZF set theory! (See for example Chang/Keisler [347], 
page 560, for some issues in the choice of an “underlying set theory” for model theory.) In other words, it 
is a self-modelling theory. This should, perhaps, be alarming because almost all of mathematics depends on 
the consistency of ZF set theory. 


The 20th century literature on set theory shows broad disagreement regarding which axioms should be part 
of axiomatic set theory. Several axioms have swayed between broad acceptance and broad scepticism. (Some 
of the more controversial axioms are the choice, continuum hypothesis, infinity, replacement and regularity 
axioms.) The situation in mathematical logic is not much different. As a result, there are now dozens 
of variants of set theory and hundreds of variants of mathematical logic, each with their own cohort of 
“believers” who defend their own systems and denigrate other systems. The idea that mathematics is the 
most logical, water-tight and bullet-proof of all intellectual disciplines is inconsistent with the literature. 


It is probably fairest to say that ZF set theory provides a well-studied core set of axioms which are accepted 
by enough mathematicians to make it a standard point of reference or “trig point” for most of mathematics. 
In this sense, one may regard ZF set theory as a unifying basic “credo” for mathematics. Whether one 
actually believes in it or not, it does provide a common language to communicate mathematical ideas. 


At the opposite extreme, one may think of mathematics as a completely “atomised” subject where every 
definition is a mini-theory which can have its own mini-models. In other words, mathematics is a network 
of definitions, and ZF set theory is only one theory amongst millions. Some other fundamental theories are 
the integers and the real numbers, which are also used as models for the interpretation of many theories. 
Since both the integers and real numbers can themselves be modelled within ZF set theory, this may give the 
impression that the definitions of mathematics may be organised as a “directed acyclic graph” of definitions 
which all ultimately derive their meaning from ZF set theory. Such an outcome would have pleased Peano, 
Frege, Russell and Hilbert. However, the validity of mathematics does not nowadays reside in a single bucket 
into which all other definitions ultimately drain. 


Axiomatic systems such as first-order logics provide a very general intellectual framework for constructing 
mathematical theories and concepts, clearly much more general than ZF set theory. However, axiomatic 
systems are usually modelled in terms of some kind of “underlying set theory”, which is typically chosen as 
some variant of ZF set theory. 


7.1.6 REMARK: The origin of set theory. 

The origin of modern set theory is generally traced back to Georg Cantor’s work in the last quarter of the 
19th century, particularly his paper “Grundlagen einer allgemeinen Mannigfaltigkeitslehre” (“Foundations 
of a general theory of aggregates”) in 1883. This partially intuitive form of set theory contained paradoxes 
which were resolved within axiomatic set theory in the early 20th century. 


The earliest publications on set theory appeared principally in German, using the word “Menge” for “set”. 
This word for the mathematical set concept was introduced posthumously in 1851 by Bolzano [205], § 4. 


Einen Inbegriff, den wir einem solchen Begriffe unterstellen, bei dem die Anordnung seiner Teile
gleichgültig ist (an dem sich also nichts für uns Wesentliches ändert, wenn sich bloß diese ändert), 
nenne ich eine Menge. 


This may be translated literally into English as follows. 


An embodiment, which we subordinate to such a concept, where the order of its parts is immaterial 
(where thus nothing essential is changed for us, if it [the order] actually is changed), I call a “set”. 


In other words, a set is a collection of elements, disregarding the order in which they appear. However, 
although the use of the word “set” as opposed to a “class” may be attributed to Cantor, the axiomatic 
development of set theory seems to be more attributable to the 1889 work by Peano [375]. 


7.1.7 REMARK: Collections are synonymous with sets. 

In Zermelo-Fraenkel set theory, there is no distinction between sets and collections. In a hierarchically 
organised set theory, there could be objects which are not sets, called “atoms” for example, and then sets 
which may contain such atoms, and finally “collections” which may contain sets. This is not the case in ZF 
set theory. 


The term “collection” is used here merely as a hint to the reader (and to the writer) that the purpose of the 
collection is to contain various other sets which appear in the same context. Thus “collection” means “set 
of sets”. However, all ZF sets are sets of sets, even though a set may be empty. Apart from the empty set, 
all ZF sets contain other sets. However, although the number 2 may be defined as $\{\emptyset, \{\emptyset\}\}$, it is not thought
of as a set of sets. It is generally thought of as an “atom”. Thus some sets are thought of as atoms, whereas 
others are thought of as “collections” or “sets of sets”. Thus there is a difference here between the austere 
abstract theory and the way mathematicians really think. 


7.1.8 REMARK: The roles of set theory in mathematics language and semantics. 

It is strongly emphasised in Chapters 4 and 6 (and elsewhere) that mathematical logic language can be based 
upon semantics which is expressed in terms of elementary naive set theory and logic. The very broad picture 
looks like Figure 2.1.1, which shows rigorous mathematics based upon a “bedrock layer” of mathematical 
logic, which in turn is supported by naive logic and set theory. Figure 7.1.1 shows a slightly more concrete 
version of this picture. 


[Diagram not reproduced. Layers from top to bottom: mathematics (language and semantics), set theory (language and semantics), logic (language and semantics), naive set theory, naive logic.]
Figure 7.1.1 Layering of mathematics language and semantics


The naive set theory here is nothing like Zermelo-Fraenkel axiomatic set theory. The naive logic is intuitively 
clear, trivial logic. But the important assertion here is that mathematics language can be backed up by 
semantics to which the language “points”. In principle, it should be possible to “follow an arrow” from any
mathematical language construct to the supporting semantics to discover the meaning of the construct. 


Very roughly speaking, naive logic is described in Sections 3.1, 3.2 and 3.3, the relevant naive set theory is 


and 3.15, and in Chapter 5, and logic language is described in Chapters 4 and 6. 


7.1.9 REMARK: Set theory is just a first-order language.

From the abstract logician’s point of view, set theory could be thought of as just one language in an infinite 
universe of possible first-order languages which have no meaning apart from their properties as sequences of 
symbol strings of various kinds. Even model theory, which “interprets” logical languages, interprets them 
only in terms of further dry, meaningless universes which are defined in terms of other logical languages. 
Viewing set theory as just a language is like saying that the English language is just one possible grammar 
and vocabulary amongst infinitely many possibilities, and the Internet engineering specification (which is a 
kind of “grammar and vocabulary” for a network) is just one engineering specification amongst infinitely 
many possible network specifications. But there is a huge lively literature written in English, and there is 
a huge real-world network which implements the Internet specification. In the same way, there is a huge 
mathematics literature which gives a kind of “life” to abstract set theory. Therefore set theory is more than 
“just a first-order language”. The fact that a grammar and specification can be written for the language of 
mathematics does not mean that mathematics is just a language. It is the ideas which are communicated in 
the language which are the real mathematics, in the same way that the ideas communicated in English and 
on the Internet are the real life which underlies the grammar books and the technical specifications. 


7.1.10 REMARK: Some set theory abbreviations. 
The following are abbreviations for some particular set theories. ZF means “Zermelo-Fraenkel”. BG means
“Bernays-Gödel”. NBG means “Neumann-Bernays-Gödel”.


The abbreviation AC means “axiom of choice”. CC means “axiom of countable choice”. The abbreviation 
CC,, is also popular for the axiom of countable choice. 


7.1.11 REMARK: Tagging of theorems proved in set theories beyond Zermelo-Fraenkel. 

Theorems which are based on non-standard axioms (i.e. not Zermelo-Fraenkel set theory) are tagged. For 
example, a theorem based on the axioms ZF plus AC is written as “THEOREM [ZF-AC]”. For example, see
Theorem 22.7.21. (See also Remark 4.5.5.) 


Non-standard additions to ZF set theory (“optional extras”) include AC, CC, DC (dependent choice), CH 
(continuum hypothesis) and GCH (generalised continuum hypothesis). The main purpose of presenting such 
beyond-ZF theorems in this book is to help the reader to recognise them in the wild and avoid them. It is 
this author’s belief that any theorem which relies upon a choice axiom as evidence is, in the terminology of 
Scottish law, “not proven”, no matter how plausible its verdict may seem. A second purpose for presenting 
beyond-ZF theorems is to show where a choice axiom has been used in the proof so as to assist the discovery 
of standard ZF methods of proof starting from slightly stronger conditions which guarantee the existence of 
any required choice functions. 


7.1.12 REMARK: The advantages of multiple set theories. 

The purpose of law is to prevent war, or more precisely, to manage conflict between individuals, associations, 
nations and so forth. Laws are chosen so that the vast majority of individuals are content to abide by them. 
The existence of multiple set theory axiomatisations is analogous to the existence of different legal systems 
in the 200 or so countries in the world. Individuals who are not happy in one system can take refuge in other 
systems. For this book, the ZF system is adopted as standard, and occasionally the consequences of some 
closely related systems are mentioned. The ZF system is adequate for most of differential geometry. 


The rapid development of axiomatic set theory and mathematical logic around the beginning of the 20th 
century followed some occasionally vicious warfare between mathematicians in the 19th century. Axiomatic 
set theory proscribes some behaviours and gives licence to others. This does not mean that the axioms are 
universally true. The axioms help to keep the peace. ZF set theory is a kind of mainstream law-book which 
may be amended as required for special applications. 


7.2. Zermelo-Fraenkel set theory axioms 


7.2.1 REMARK:  Set-theoretic formulas. 

Definition 7.2.4 requires some predicate calculus terminology. In Definition 7.2.2, set-theoretic formulas are 
expressed in terms of the “predicate name space” and “soft predicates” in Definition 6.3.9 (vi) (1). (Soft 
predicates are essentially logical predicate expressions which are parametrised by their free variables.) The 
“predicate calculus with equality” in Definition 7.2.4 is introduced in Definition 6.7.6. 


7.2.2 DEFINITION [MM]:
A set-theoretic formula is a soft predicate in a predicate calculus with predicate name space {“=”, “∈”}.


7.2.3 REMARK: Proposition templates. 

The function of a free variable in a logical expression is to permit substitution of values from a variable space.
Thus a logical expression which contains free variables may be regarded as a template which generates 
a different particular proposition for each combination of values substituted for the free variables in the 
template expression. 

The function of a bound variable is to link two points in a logical expression. It is not permitted to substitute 
particular values for a bound variable as one may do for free variables. 

The ZF set theory axioms in Definition 7.2.4 are proposition templates. Concrete individuals of the indicated 
type may be substituted for any of the free variables. All of the quantifiers refer to the set of individuals, 
not predicates or functions. All individuals are sets. So all quantifiers refer to sets. 


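7.2.4 DEFINITION [MM]: Zermelo-Fraenkel set theory is a predicate calculus with equality, with a binary
relation “∈” which satisfies the following axioms.


(1) The extension axiom: For any sets A and B,

    $(\forall x, (x \in A \Leftrightarrow x \in B)) \Rightarrow (A = B)$.


(2) The empty set axiom:

    $\exists A, \forall x, \neg(x \in A)$.


(3) The unordered pair axiom: For any sets A and B,

    $\exists C, \forall z, (z \in C \Leftrightarrow (z = A \vee z = B))$.


(4) The union axiom: For any set K,

    $\exists X, \forall z, (z \in X \Leftrightarrow \exists S, (z \in S \wedge S \in K))$.


(5) The power set axiom: For any set X,

    $\exists P, \forall A, (A \in P \Leftrightarrow \forall x, (x \in A \Rightarrow x \in X))$.


(6) The replacement axiom: For any two-parameter set-theoretic formula f and set A,

    $(\forall x, \forall y, \forall z, ((f(x,y) \wedge f(x,z)) \Rightarrow y = z)) \Rightarrow \exists B, \forall y, (y \in B \Leftrightarrow \exists x, (x \in A \wedge f(x,y)))$.


(7) The regularity axiom: For any set X,

    $(\exists y, y \in X) \Rightarrow \exists y, (y \in X \wedge \forall z, (z \in X \Rightarrow \neg(z \in y)))$.


(8) The infinity axiom:

    $\exists X, \forall z, (z \in X \Leftrightarrow ((\forall u, \neg(u \in z)) \vee \exists y, (y \in X \wedge \forall v, (v \in z \Leftrightarrow (v \in y \vee v = y)))))$.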
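
Some of the axioms in Definition 7.2.4 can be sanity-checked on a small fragment of the hereditarily finite sets. The following Python sketch is an illustration under stated assumptions, not a model of ZF. It represents sets as `frozenset` objects, so that the extension axiom is built into equality, and it builds the 16-element stage `V` by four applications of a power set function starting from the empty set. The names `powerset`, `pair`, `union`, `regular` and `V` are hypothetical. The infinity and replacement axioms are not addressed.

```python
from itertools import combinations


def powerset(s):
    """All subsets of the frozenset s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


def pair(a, b):
    """The unordered pair whose only elements are a and b."""
    return frozenset({a, b})


def union(k):
    """The set of all elements of elements of k."""
    return frozenset(z for s in k for z in s)


# Start from the empty set and apply the power set four times (16 elements).
V = frozenset()
for _ in range(4):
    V = powerset(V)


def regular(x):
    """Axiom (7): a non-empty x has an element y with no element in x."""
    return not x or any(all(z not in y for z in x) for y in x)


regularity_ok = all(regular(x) for x in V)
```

Every set built this way is well-founded, so `regularity_ok` is expected to be true. The check says nothing about whether regularity should be adopted as an axiom. It only confirms that this finite fragment does not violate it.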


7.2.5 NOTATION: $x \notin y$ means $\neg(x \in y)$.


7.2.6 REMARK: Domain-restricted universal and existential quantifiers. 

The meanings of the logical predicate expressions “$\forall x \in S, P(x)$” and “$\exists x \in S, P(x)$” in Notation 7.2.7 are of
different kinds. The former uses “$\Rightarrow$” while the latter uses “$\wedge$”. This difference ensures that “$\exists x \in S, P(x)$”
is equivalent to “$\neg(\forall x \in S, \neg P(x))$”. This matches the expected form of an existential proposition. (See
Remark 5.2.1 and Theorems 6.6.7 (ii) and 6.6.10 (vi).)


7.2.7 NOTATION:
A proposition of the form $\forall x \in S, P(x)$ for a set S and set-theoretic formula P means $\forall x, (x \in S \Rightarrow P(x))$.
A proposition of the form $\exists x \in S, P(x)$ for a set S and set-theoretic formula P means $\exists x, (x \in S \wedge P(x))$.
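
The asymmetry between the two expansions in Notation 7.2.7 can be checked directly on a finite universe. The following Python sketch assumes a hypothetical finite `universe` standing in for the space of all sets. The function names `forall_in` and `exists_in` are likewise hypothetical.

```python
def forall_in(S, P, universe):
    """Bounded universal quantifier: for all x, (x in S) implies P(x)."""
    return all((x not in S) or P(x) for x in universe)


def exists_in(S, P, universe):
    """Bounded existential quantifier: for some x, (x in S) and P(x)."""
    return any((x in S) and P(x) for x in universe)


def dual_exists_in(S, P, universe):
    """The dual form: not (for all x in S, not P(x))."""
    return not forall_in(S, lambda x: not P(x), universe)
```

For any choice of S and P on a finite universe, `exists_in` and `dual_exists_in` agree, as stated in Remark 7.2.6. When S is empty, `forall_in` is vacuously true and `exists_in` is false.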


7.2.8 REMARK: Compact summary of ZF axioms. 
The following is a compact summary of the ZF set theory axioms in Definition 7.2.4. The word “formula” 
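on the left means “two-parameter set-theoretic formula”.


(1) sets A, B: $(\forall x, (x \in A \Leftrightarrow x \in B)) \Rightarrow (A = B)$
(2) $\exists A, \forall x, x \notin A$
(3) sets A, B: $\exists C, \forall z, (z \in C \Leftrightarrow (z = A \vee z = B))$
(4) set K: $\exists X, \forall z, (z \in X \Leftrightarrow \exists S \in K, z \in S)$
(5) set X: $\exists P, \forall A, (A \in P \Leftrightarrow \forall x, (x \in A \Rightarrow x \in X))$
(6) formula f, set A: $(\forall x, \forall y, \forall z, ((f(x,y) \wedge f(x,z)) \Rightarrow y = z)) \Rightarrow \exists B, \forall y, (y \in B \Leftrightarrow \exists x \in A, f(x,y))$
(7) set X: $(\exists y, y \in X) \Rightarrow \exists y \in X, \forall z \in X, z \notin y$
(8) $\exists X, \forall z, (z \in X \Leftrightarrow ((\forall u, u \notin z) \vee \exists y \in X, \forall v, (v \in z \Leftrightarrow (v \in y \vee v = y))))$.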


It is quite impressive that the entire basis of set theory can be written in 8 lines. The rest of mathematics 
is just definitions, notations, theorems and remarks. 


7.2.9 REMARK: The notation for the set membership symbol. 

The set membership symbol “∈” could plausibly have originated from the first letter of the word “element”,
and this is a useful mnemonic for it. (This symbol should not be confused with ϵ (epsilon) or ε (variant
epsilon) in modern mathematics.) However, Gaal [77], page 18, states: “The membership relation which we
denote by ∈ was originally introduced by Peano who used ε as an abbreviation for the Greek word ἐστί.” The
word “ἐστί” is indeed the third person singular of the Greek verb “to be”. In other words, it means “is” in
English. (See for example Morwood [481], page 93.) This is confirmed by Cajori [242], volume II, page 299,
section 689. The first publication of a symbol like “∈” for the set membership relation is generally held to
be an 1889 work in Latin by Peano [375], page X, although Peano offered no etymology for the symbol there.


Signum ε significat est. Ita a ε b legitur a est quoddam b; a ε K significat a est quaedam classis;
a ε P significat a est quaedam propositio.


'This may be translated into English as follows. 


The sign e signifies is. Thus aeb is read a is a particular b; aeK signifies that a is a particular 

class; aeP signifies that a is a particular proposition. 
It seems plausible that Peano could have had in mind the Italian word “è”, which also means “is” in English. 
In the French translation of Peano’s 1895 work “Formulario mathematico” (written in his own peculiar 
simplified version of Latin), an etymology was offered. In the 1901 “Formulaire des mathématiques”, the 
following text appeared. (See Peano [376], page 4, point 4.) 

    ·4 ε est la lettre initiale du mot ἐστί.

This may be translated into English as follows.

    ·4 ε is the initial letter of the word ἐστί.

It seems plausible that this could have been an after-thought by Peano, an attempt to strengthen the case for 
the adoption of this notation since he was enthusiastically advocating the adoption of his entire mathematical 
notation system. General adoption by mathematicians was almost guaranteed by its appearance in 1910 in 
Whitehead/Russell, Volume I [400], pages 25-26, as follows. 

    The propositional function “x is a member of the class α” will be expressed, following Peano, by
    the notation “x ε α.” Here ε is chosen as the initial of the word ἐστί. “x ε α” may be read
    “x is an α.” Thus “x ε man” will mean “x is a man,” and so on.

And the rest, as they say, is history. Or maybe not! One must also ask why the symbol “∈” persisted when
classical Greek ceased to be a part of every mathematician's education. The longer-term persistence of the
symbol “∈” for the “element-of” relation was quite plausibly assisted by the fact that the word “element”
begins with the letter “e”.


7.2.10 REMARK: Set theory is really “membership theory”. Membership is the only attribute of a set. 
Set theory provides axioms for the membership relation “∈” rather than for the sets themselves. So it would
perhaps be more accurate to call set theory “membership theory” or “containment theory”.


In set theory, it is considered that the only property of a set is its membership relations “on the left”. In
other words, two sets with names A and B must be the same set if they satisfy ∀x, (x ∈ A ⇔ x ∈ B). The
assertion that sets have no other properties than their on-the-left membership relations is enforced by the
ZF extension axiom, Definition 7.2.4 (1), which states that ∀x, (x ∈ A ⇔ x ∈ B) implies A = B. (This issue
is also discussed in Remark 6.7.5 and Section 7.5.)


In practice, mathematicians typically think of sets as having more properties than are indicated by their 
membership. The cause of this is that set theory is used as a “model-construction factory” for most of the 
systems in mathematics, but the sets in the models of different systems often overlap, so that the same set has 
different meanings in different systems. However, this is an application issue. Sets have many applications, 
and their meanings are different in each application. This is inevitable. In pure set theory, extraneous 
attributes of sets are totally ignored. (This issue is also mentioned in Section 8.8.) 


7.3. Set inclusion relations 


7.3.1 REMARK: Inclusion is the third important relation for sets. 

The three most important relations between sets are equality, membership and inclusion. Both equality and 
membership are defined by the axioms of ZF set theory. Luckily inclusion may be defined very easily in 
terms of membership as in Definition 7.3.2. So there is no need to axiomatise a third relation!


7.3.2 DEFINITION: 

A subset of a set B is a set A such that ∀x, (x ∈ A ⇒ x ∈ B).
A superset of a set A is a set B such that ∀x, (x ∈ A ⇒ x ∈ B).
A set A is said to be included in a set B when A is a subset of B.
A set A is said to include a set B when A is a superset of B.


7.3.3 NOTATION: 
A ⊆ B, for sets A and B, means that A is a subset of B.
In other words, A ⊆ B means ∀x, (x ∈ A ⇒ x ∈ B).


A ⊇ B, for sets A and B, means that A is a superset of B.
In other words, A ⊇ B means ∀x, (x ∈ B ⇒ x ∈ A).


A ⊈ B, for sets A and B, means that A is not a subset of B.
In other words, A ⊈ B means ¬(∀x, (x ∈ A ⇒ x ∈ B)).


A ⊉ B, for sets A and B, means that A is not a superset of B.
In other words, A ⊉ B means ¬(∀x, (x ∈ B ⇒ x ∈ A)).
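The reduction of inclusion to membership can be tried out on small finite sets. The following is a minimal sketch, assuming that Python frozensets model (hereditarily finite) sets and that a finite list `universe` stands in for “all sets”. The function names are hypothetical.

```python
# Inclusion defined purely from membership, on finite sets.
# Assumption: Python frozensets model (hereditarily finite) sets,
# and `universe` is a finite stand-in for "all sets".

def is_subset(A, B, universe):
    """A ⊆ B, i.e. ∀x, (x ∈ A ⇒ x ∈ B)."""
    return all((x not in A) or (x in B) for x in universe)

def is_superset(A, B, universe):
    """A ⊇ B, i.e. ∀x, (x ∈ B ⇒ x ∈ A)."""
    return is_subset(B, A, universe)

empty = frozenset()
one = frozenset({empty})            # {∅}
two = frozenset({empty, one})       # {∅, {∅}}
universe = [empty, one, two]
```

Here `is_subset(one, two, universe)` holds while `is_subset(two, one, universe)` fails, in agreement with Python's built-in `<=` comparison for sets.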


7.3.4 REMARK: Notation for set inclusion relations. 

The use of the notation “⊂” to mean set inclusion is avoided in this book because although “⊂” frequently
means the same thing as “⊆”, it also often means proper inclusion. Similarly the notation “⊃” is avoided.
Mathematics is difficult enough already without introducing guessing games through ambiguous notations. 
(Theorem 7.3.5 is also proved in the rigorous predicate calculus style in Remark 7.3.10.) 


7.3.5 THEOREM: Some basic properties of the inclusion relation. 
Let A, B and C be sets. 


(i) A ⊆ A.
(ii) A ⊇ A.
(iii) (A ⊆ B ∧ B ⊆ A) ⇒ A = B.
(iv) A = B ⇒ A ⊆ B.
(v) A = B ⇒ A ⊇ B.
(vi) A = B ⇔ (A ⊆ B ∧ B ⊆ A).
(vii) (A ⊆ B ∧ B ⊆ C) ⇒ A ⊆ C.


PROOF: For part (i), let A be a set. Then ∀x, (x ∈ A ⇒ x ∈ A) by Theorem 6.6.10 (xiii). So A ⊆ A by
Notation 7.3.3.

Part (ii) may be proved as for part (i).

Part (iii) follows from Definitions 7.2.4 (1), 7.3.2 and 4.7.2 (iii).

For part (iv), define a two-parameter predicate P by P(y, z) = “∀x, (x ∈ y ⇒ x ∈ z)”. By the predicate
calculus with equality axiom of substitutivity, Definition 6.7.6 (ii) (Subs), P(A, A) ⇒ P(A, B) since A = B.
But P(A, A) = “∀x, (x ∈ A ⇒ x ∈ A)” (which is the same as A ⊆ A) follows from Theorem 6.6.10 (xiii), and
P(A, B) = “∀x, (x ∈ A ⇒ x ∈ B)” then follows from this. Hence A = B ⇒ A ⊆ B by Definition 7.3.2.

Part (v) may be proved as for part (iv).

Part (vi) follows from parts (iii), (iv) and (v).

For part (vii), let A, B and C be sets with A ⊆ B ∧ B ⊆ C. Then by Definition 7.3.2, ∀x, (x ∈ A ⇒ x ∈ B)
and ∀x, (x ∈ B ⇒ x ∈ C). Let x be a set. Then (x ∈ A ⇒ x ∈ B) ∧ (x ∈ B ⇒ x ∈ C). Therefore
x ∈ A ⇒ x ∈ C by Theorem 4.5.7 (iv). Consequently ∀x, (x ∈ A ⇒ x ∈ C) by Definition 6.3.9 (UI).
Therefore A ⊆ C by Definition 7.3.2. So (A ⊆ B ∧ B ⊆ C) ⇒ A ⊆ C.
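The properties in Theorem 7.3.5 can be sanity-checked by brute force on a small example. The sketch below assumes that the subsets of a three-element base set are an adequate test bed. Such a finite check is only a plausibility test and is no substitute for the proof above. The name `incl` is hypothetical.

```python
from itertools import combinations, product

# Brute-force check of Theorem 7.3.5 on all subsets of a 3-element set.
# Assumption: a finite check like this is only a sanity test, not a proof.

base = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(base) + 1)
           for c in combinations(base, r)]

def incl(A, B):
    """A ⊆ B unfolded as ∀x, (x ∈ A ⇒ x ∈ B) over the base set."""
    return all((x not in A) or (x in B) for x in base)

# Parts (i) and (ii): every set includes itself.
reflexive = all(incl(A, A) for A in subsets)

# Parts (iii) to (vi): equality is the same as mutual inclusion.
antisymmetric = all((A == B) == (incl(A, B) and incl(B, A))
                    for A, B in product(subsets, repeat=2))

# Part (vii): inclusion is transitive.
transitive = all(incl(A, C) or not (incl(A, B) and incl(B, C))
                 for A, B, C in product(subsets, repeat=3))
```

All three flags evaluate to true for the eight subsets of the base set.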


7.3.6 DEFINITION: 
A proper subset of a set B is a set A such that A ⊆ B and A ≠ B.
A proper superset of a set A is a set B such that A ⊆ B and A ≠ B.


7.3.7 NOTATION: 
A ⊊ B, for sets A and B, means that A is a proper subset of B.
A ⊋ B, for sets A and B, means that A is a proper superset of B.


7.3.8 REMARK: Notation for strict inclusion relations between sets. 

To be safe, one should use the clumsy-looking strict inclusion notations ⊊ and ⊋ respectively for the proper
subset and superset relations. The clumsiness does not matter because these strict inclusion relations are
almost never needed. It goes perhaps without saying that the relations ⊊ and ⊋ must not be confused with
the relations ⊈ and ⊉ respectively in Notation 7.3.3.


7.3.9 REMARK: Abstract high-level summary of ZF axioms. 

The following is an informal interpretation of the compact summary of ZF set theory axioms in Remark 7.2.8 
using the set inclusion notation and many other notations from later sections and chapters. The variables 
on the left indicate what kind of free variable or free predicate is to be used in the axiom schema on the 
right. The word “formula” on the left means “two-parameter set-theoretic formula”. 


(1) sets A, B:           (A ⊆ B ∧ B ⊆ A) ⇒ (A = B)              [extension]
(2)                      ∅ is a set                              [empty set]
(3) sets A, B:           {A, B} is a set                         [pair]
(4) set K:               ⋃K is a set                             [union]
(5) set X:               P(X) is a set                           [power set]
(6) formula f, set A:    f is a function ⇒ f(A) is a set         [replacement]
(7) set X:               X ≠ ∅ ⇒ ∃y ∈ X, y ∩ X = ∅               [regularity]
(8)                      ω is a set.                             [infinity]
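The constructive axioms in this summary can be imitated with nested frozensets. This is only a sketch, assuming the hereditarily finite sets as a model. The infinity axiom (8) necessarily fails in such a model, so only the successor operation y ∪ {y} which it uses is shown. All function names are hypothetical.

```python
from itertools import combinations

# Hereditarily finite sets as nested frozensets (an assumed model).
# The function names below are hypothetical, chosen for this sketch.

EMPTY = frozenset()                                   # axiom (2)

def pair(A, B):                                       # axiom (3): {A, B}
    return frozenset({A, B})

def union(K):                                         # axiom (4): ⋃K
    return frozenset(z for S in K for z in S)

def power_set(X):                                     # axiom (5): P(X)
    elems = list(X)
    return frozenset(frozenset(c) for r in range(len(elems) + 1)
                     for c in combinations(elems, r))

def image(f, A):                                      # axiom (6): f(A)
    return frozenset(f(x) for x in A)

def successor(y):                                     # y ∪ {y}, used in axiom (8)
    return union(pair(y, pair(y, y)))

one = successor(EMPTY)        # {∅}
two = successor(one)          # {∅, {∅}}
```

Each function returns exactly one set determined by its arguments, which reflects the way each existence axiom anoints a single well-specified set.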


7.3.10 REMARK: Some sample proofs of ZF set theory assertions using formal predicate calculus. 

The above proof for Theorem 7.3.5 uses the standard “informal sketch” style of argumentation in which the 
vast majority of mathematical proofs are written (by human beings). The reader should, in principle, be able 
to build a rigorous proof from an informal sketch. Formal line-by-line logical calculus becomes rapidly more 
tedious and incomprehensible as the theory progresses from the basics to more advanced topics. However, 
the very elementary Theorem 7.3.5 is not too onerous to prove, as follows, in the predicate calculus fashion
as in Section 6.7. (Note that accents on free variables, i.e. sets, are optional because they are merely hints
that the variables are free. The sets A and B are free variables in the following proofs.)


To prove Theorem 7.3.5 (i):    ⊢ A ⊆ A

   (1)  ∀x, (x ∈ A ⇒ x ∈ A)                   Theorem 6.6.10 (xiii)            ⊣ ∅
   (2)  A ⊆ A                                  Notation 7.3.3 (1)               ⊣ ∅

To prove Theorem 7.3.5 (ii):    ⊢ A ⊇ A

   (1)  ∀x, (x ∈ A ⇒ x ∈ A)                   Theorem 6.6.10 (xiii)            ⊣ ∅
   (2)  A ⊇ A                                  Notation 7.3.3 (1)               ⊣ ∅

To prove Theorem 7.3.5 (iii):    ⊢ (A ⊆ B ∧ B ⊆ A) ⇒ A = B

   (1)  A ⊆ B ∧ B ⊆ A                          Hyp                              ⊣ (1)
   (2)  A ⊆ B                                  Theorem 4.7.9 (xi) (1)           ⊣ (1)
   (3)  ∀x, (x ∈ A ⇒ x ∈ B)                   Notation 7.3.3 (2)               ⊣ (1)
   (4)  x̂ ∈ A ⇒ x̂ ∈ B                        UE (3)                           ⊣ (1)
   (5)  B ⊆ A                                  Theorem 4.7.9 (xii) (1)          ⊣ (1)
   (6)  ∀x, (x ∈ B ⇒ x ∈ A)                   Notation 7.3.3 (5)               ⊣ (1)
   (7)  x̂ ∈ B ⇒ x̂ ∈ A                        UE (6)                           ⊣ (1)
   (8)  x̂ ∈ A ⇔ x̂ ∈ B                        Theorem 4.7.9 (ii) (4,7)         ⊣ (1)
   (9)  ∀x, (x ∈ A ⇔ x ∈ B)                   UI (8)                           ⊣ (1)
  (10)  (∀x, (x ∈ A ⇔ x ∈ B)) ⇒ A = B         Definition 7.2.4 (1)             ⊣ ∅
  (11)  A = B                                  MP (9,10)                        ⊣ (1)
  (12)  (A ⊆ B ∧ B ⊆ A) ⇒ A = B                CP (1,11)                        ⊣ ∅

To prove Theorem 7.3.5 (vi):    ⊢ A = B ⇔ (A ⊆ B ∧ B ⊆ A)

   (1)  A = B ⇒ A ⊆ B                          part (iv)                        ⊣ ∅
   (2)  A = B ⇒ B ⊆ A                          part (v)                         ⊣ ∅
   (3)  A = B ⇒ (A ⊆ B ∧ B ⊆ A)                Theorem 4.7.9 (…) (1,2)          ⊣ ∅
   (4)  (A ⊆ B ∧ B ⊆ A) ⇒ A = B                part (iii)                       ⊣ ∅
   (5)  A = B ⇔ (A ⊆ B ∧ B ⊆ A)                Theorem 4.7.9 (ii) (3,4)         ⊣ ∅

To prove Theorem 7.3.5 (vii):    ⊢ (A ⊆ B ∧ B ⊆ C) ⇒ A ⊆ C

   (1)  A ⊆ B ∧ B ⊆ C                          Hyp                              ⊣ (1)
   (2)  A ⊆ B                                  Theorem 4.7.9 (xi) (1)           ⊣ (1)
   (3)  ∀x, (x ∈ A ⇒ x ∈ B)                   Notation 7.3.3 (2)               ⊣ (1)
   (4)  x̂ ∈ A ⇒ x̂ ∈ B                        UE (3)                           ⊣ (1)
   (5)  B ⊆ C                                  Theorem 4.7.9 (xii) (1)          ⊣ (1)
   (6)  ∀x, (x ∈ B ⇒ x ∈ C)                   Notation 7.3.3 (5)               ⊣ (1)
   (7)  x̂ ∈ B ⇒ x̂ ∈ C                        UE (6)                           ⊣ (1)
   (8)  x̂ ∈ A ⇒ x̂ ∈ C                        Theorem 4.5.7 (iv) (4,7)         ⊣ (1)
   (9)  ∀x, (x ∈ A ⇒ x ∈ C)                   UI (8)                           ⊣ (1)
  (10)  A ⊆ C                                  Notation 7.3.3 (9)               ⊣ (1)
  (11)  (A ⊆ B ∧ B ⊆ C) ⇒ A ⊆ C                CP (1,10)                        ⊣ ∅


Such formality in proofs does not greatly assist comprehension of the ideas, but it is occasionally useful 
to convert some lines of informal sketch-proofs to rigorous predicate calculus when the logic is unclear or 
uncertain. In principle, all pure mathematics is no more than a long sequence of predicate calculus arguments 
as above, but certainly no one would read a book which is written entirely in this way. Nor would they learn 
much about mathematics. It would be more like a computer program than a book for humans to read. 


This naturally raises the question of what essence would be lost in millions of lines of rigorous predicate 
calculus without commentary. If a super-computer could be programmed to execute a billion lines of predicate 
calculus per second and store all of the resulting theorems in a database, why would that not be “doing
mathematics”? Logical calculus without commentary would be almost meaningless, like bones without flesh.
But a sequence of assertions without the iron discipline of logical calculus would be dubious speculation, like 
flesh without bones. 


7.4. Comments on ZF set theory 


7.4.1 REMARK: Ontology for set theory. 

The definition of a ZF set theory is presented as a set of axioms in Definition 7.2.4 because sets are almost 
the most basic concept in this book. There is no canonical representation of the system of sets in terms of 
other systems. Each mathematician is expected to have their own representation of sets. The ZF axioms 
just provide a set of tests to ensure that all mathematicians are discussing an equivalent system when they 
discuss sets. Sets of axioms such as ZF may be thought of as “interoperability specifications” for outsourced 
mathematical systems. (This is similar to the way in which the rules of chess ensure interoperability even 
when the board and pieces have different sizes and colours and shapes.) A conformant system is given a 
certificate of compliance if it satisfies the axioms. The choice of representation is a mere implementation 
issue to be resolved by the supplier. A model (or implementation or interpretation) which is outsourced is 
then “somebody else's problem”. (See D. Adams [486], pages 329-330.)


7.4.2 REMARK: The different qualities of the ZF axioms. 

Definition 7.2.4 specifies the properties of the membership relation between sets. It also specifies construction 
methods which yield new sets from given sets. So it provides a combination of membership relation properties 
and set existence rules. The set existence axioms 2, 3, 4, 5, 6 and 8 guarantee the existence of various kinds 
of sets. They effectively “anoint” sets which are constructed according to specified rules. Axioms 1 and 7 
specify constraints which limit set theory to safe ground. Thus there are two technical axioms and six 
“anointment” axioms. (This is perhaps clearer in the high-level summary in Remark 7.3.9.) 


All of the ZF set existence axioms guarantee the existence of sets which are constructed with the aid of 
pre-existing sets except for axioms 2 and 8, which each make a single set out of nothing. 


The productive axioms 2, 3, 4, 5, 6 and 8 may be thought of as defining lower bounds on the sets which may
exist in ZF set theory. The restrictive axioms 1 and 7 may be thought of as defining upper bounds.


Axiom 2 is implied by the other axioms. Axiom 6 allows you to prune back any set to the empty set. Since 
the infinity axiom 8 guarantees the existence of at least one set, it follows that an empty set exists. Thus 
axiom 2 follows from axioms 6 and 8. So it is technically redundant, but it is generally retained so that if 
the infinity axiom is removed, the remaining axioms guarantee that the universe of sets is non-empty. 


7.4.3 REMARK: The sets whose existence is asserted by ZF set theory.
So each individual application of these axioms yields one and only one set, whose elements are precisely
determined by the parameters (if any) of the axiom. In each case, the anointed set X is specified by a logical
expression of the form ∃X, ∀z, (z ∈ X ⇔ P(z, ...)) for some predicate P with all parameters specified except
for z, which is bound to a universal quantifier. (The special case of the empty set axiom 2 is equivalent to
∃X, ∀z, (z ∈ X ⇔ ⊥(z)), where ⊥ denotes the always-false predicate. See Notation 5.1.10 for ⊥.) The form
of these logical expressions implies their uniqueness by Theorem 7.5.2.


The combination of existence and uniqueness permits a specific name or notation to be given to a set. So
all of the ZF set existence axioms yield unique, well-specified sets. This contrasts with the axiom of choice, 
which merely claims that a specified class of sets is non-empty. This is why the axiom of choice is so useless. 
(See also Remark 7.12.1.) 


7.4.4 REMARK: ZF consistency and existence of models. 

One of the first things that a mathematician asks about a set of axioms is the existence question. Does 
there “exist” at least one system which satisfies the axioms? Existence is an extremely woolly philosophical 
notion, but it is generally agreed (almost universally) that a prerequisite for existence is consistency of the 
axioms. Conversely, if a model (or “exemplar”) for the axioms can be found, then one concludes (rightly or
wrongly) that the axioms must be consistent. Since it is generally not possible (for various reasons) to prove 
within an axiomatic system that the system itself is consistent, mathematical logicians have resorted to the 
“construction” of models for theories as a way of demonstrating consistency. Unfortunately, most of these 
model constructions rely upon the consistency of similar sets of axioms for their “existence” proofs. The 
word “construction” here does not mean literally constructive methods, but rather the definition of classes 
of objects within other axiomatic systems. Since essentially all of the axiomatic systems used for defining 
models are themselves in doubt, all proofs of model-existence and theory-consistency are necessarily relative. 


Among the hundreds of models which have been defined for ZF set theory (mostly to answer questions about 
relative consistency or independence of some axioms relative to others), there are two models which are 
the most basic, obvious and intuitively clear. Although these two models have large numbers of difficulties 
and unanswered questions, they are at least simple in principle. They are generally called the constructible 
universe, denoted L, and the von Neumann universe, denoted V. Both of these ZF universes require ordinal 
numbers for their definition. So their presentation is delayed until Chapter 12. (The von Neumann universe 
is discussed in Section 12.6. The constructible universe is mentioned very briefly in Remark 12.6.8.) 


Although there is no absolute proof of the consistency of the ZF axioms, most mathematicians accept as 
a matter of faith that at least one model does exist in some sense. This is not the kind of faith which is 
required in order to accept the axiom of choice. In the AC case, one must accept the existence of entities 
for which no concrete examples will ever be seen. In the ZF consistency case, all of mathematics has made 
extensive use and application of the axioms every minute of every day for a hundred years without finding any 
inconsistency. While AC is incapable of being tested, ZF is continually tested by every line of mainstream 
mathematics. Thus one can say that ZF has a superlative test record, whereas AC has no record of testing 
at all. Hence the choice between ZF and ZFC as a basis for mathematics is very clear. Of course, ZFC yields 
more theorems, which is the motivation for its wide acceptance, but the extra theorem-proving power is 
purchased at the cost of extreme metaphysicality. It is a Faustian pact, obtaining a richness of theorems in 
return for one’s mathematical integrity. 


7.5. The ZF extension axiom and singleton sets 


7.5.1 REMARK: The meaning of the ZF extension axiom. 

The extension axiom, Definition 7.2.4 (1), is also known as the axiom of extensionality. This axiom may be 
thought of as the proposition (A ⊆ B ∧ B ⊆ A) ⇒ A = B. That is, if the sets referred to by two abstract
set names A and B have the same membership relations “to the left” with all other abstract set names, then 
the abstract set names must refer to the same concrete object. (See Figure 7.5.1.) 

If two names refer to the same object, all predicates of that object must be identical. This is formalised as 
the axiom of substitutivity for predicate calculus with equality in Definition 6.7.6 (ii) (Subs). The converse 
is formalised by the ZF extension axiom. In ZF set theory, the only property of a set is what it contains 
because equality and set membership are the only predicates for sets in Definition 7.2.4. 

By the extension axiom, if ∀x, (x ∈ A ⇔ x ∈ B) then A = B. So the substitutivity axiom of predicate
calculus with equality implies ∀x, (A ∈ x ⇔ B ∈ x). In other words, if two sets have the same membership
relations “on the left”, then they also have the same set membership relations “on the right”. The converse
is also true, as shown by Theorem 7.5.10.


[Figure 7.5.1: ZF extension axiom, (∀x, (x ∈ A ⇔ x ∈ B)) ⇒ A = B]


7.5.2 THEOREM: Uniqueness of singleton sets. 
Let z₁, z₂, S be sets such that ∀x, (x ∈ z₁ ⇔ x = S) and ∀x, (x ∈ z₂ ⇔ x = S). Then z₁ = z₂.


PROOF: Let z₁, z₂, S satisfy ∀x, (x ∈ z₁ ⇔ x = S) and ∀x, (x ∈ z₂ ⇔ x = S). Then ∀x, (x ∈ z₁ ⇔ x ∈ z₂)
by Theorem 4.7.9 (xix). Hence z₁ = z₂ by the extension axiom, Definition 7.2.4 (1).


7.5.3 REMARK: Existence and uniqueness of singleton sets.
For the proof of Theorem 7.5.10, it is convenient to first define singleton sets, whose existence and uniqueness 
is shown in Theorem 7.5.4 by applying the ZF unordered pair axiom, Definition 7.2.4 (3). 


7.5.4 THEOREM: Existence and uniqueness of singleton sets. 
∀x, ∃′S, ∀z, (z ∈ S ⇔ z = x).


PROOF: Let x be a set. Then ∃S, ∀z, (z ∈ S ⇔ (z = x ∨ z = x)) by the pair axiom, Definition 7.2.4 (3)
(by letting A = x and B = x). Therefore ∃S, ∀z, (z ∈ S ⇔ z = x). The uniqueness of S follows from
Theorem 7.5.2. Hence ∃′S, ∀z, (z ∈ S ⇔ z = x) for all sets x.


7.5.5 REMARK: Definition and notation for singleton sets. 
Since the set S in Theorem 7.5.4 exists and is unique, it may be given a name, and the word "the" may be 
used for it. It is called “the singleton set which contains x”.


7.5.6 DEFINITION: A singleton (set) is a set S which satisfies ∃x, ∀z, (z ∈ S ⇔ z = x).
The singleton (set) which contains a set x is the set S which satisfies ∀z, (z ∈ S ⇔ z = x).


7.5.7 NOTATION: {x}, for a set x, denotes the set S which satisfies ∀z, (z ∈ S ⇔ z = x). In other words,
{x} denotes the singleton which contains x.


7.5.8 THEOREM: Elementary properties of singleton sets. 


(i) ∀x, ∀z, (z ∈ {x} ⇔ z = x).
(ii) A set S is a singleton if and only if ∃′x, x ∈ S.
(iii) ∀x, x ∈ {x}.
(iv) ∀S, ∀x, (x ∈ S ⇔ {x} ⊆ S).


PROOF: Part (i) follows immediately from Definition 7.5.6 and Notation 7.5.7.


For part (ii), let S be a singleton. Then ∃x, ∀z, (z ∈ S ⇔ z = x) by Definition 7.5.6. So there exists a set
x such that ∀z, (z ∈ S ⇔ z = x). Let z = x. Then z ∈ S. So x ∈ S. The uniqueness of x follows from
Theorem 7.5.2. Therefore ∃′x, x ∈ S.


Now assume that S is a set with ∃′x, x ∈ S. Then ∃x, x ∈ S and ∀a, ∀b, ((a ∈ S and b ∈ S) ⇒ a = b) by
Notation 6.8.3. Let x be a set which satisfies x ∈ S. Let z be a set. If z ∈ S, then z = x by the uniqueness
of x (because (a ∈ S and b ∈ S) ⇒ a = b for all sets a and b). And if z = x, then clearly z ∈ S. Thus
∃x, ∀z, (z ∈ S ⇔ z = x). So S is a singleton by Definition 7.5.6. It follows that S is a singleton if and only
if ∃′x, x ∈ S.

For part (iii), let x be a set. Then {x} is a well-defined set by Theorem 7.5.4 and Notation 7.5.7, and x ∈ {x}
by Notation 7.5.7. Hence ∀x, x ∈ {x}.

For part (iv), let S and x be sets. Assume that x ∈ S. Suppose that y ∈ {x}. Then y = x by Notation 7.5.7.
So y ∈ S. Thus {x} ⊆ S by Notation 7.3.3. Therefore x ∈ S ⇒ {x} ⊆ S.

Now assume that {x} ⊆ S. It follows from part (iii) that x ∈ {x}. So x ∈ S by Notation 7.3.3. Thus
{x} ⊆ S ⇒ x ∈ S. Hence x ∈ S ⇔ {x} ⊆ S.


7.5.9 REMARK: Same membership relations on the left imply same membership relations on the right. 
From Theorem 7.5.10, it follows that two sets are equal if and only if they have the same membership 
relations on the right. But the extension axiom, Definition 7.2.4 (1), states only that two sets are equal if 
and only if they have the same membership relations on the left. 


7.5.10 THEOREM: Sets must be equal if they are elements of exactly the same sets. 
Let A, B be sets such that ∀z, (A ∈ z ⇔ B ∈ z). Then A = B.


PROOF: Let A, B be sets such that ∀z, (A ∈ z ⇔ B ∈ z). Let z₁ = {A}. Then ∀x, (x ∈ z₁ ⇔ x = A).
So A ∈ z₁. Therefore B ∈ z₁. Let x = B. Then x = A. Hence A = B.
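The role of the singleton {A} in this proof can be seen in a small finite example. The sketch below assumes that frozensets model sets and that a short list `universe` stands in for the range of the quantifier ∀z. The helper names are hypothetical.

```python
# Finite illustration of the proof of Theorem 7.5.10.
# Assumption: frozensets model sets; `singleton` is a hypothetical helper
# built from an unordered pair with both components equal, as in Theorem 7.5.4.

def singleton(x):
    return frozenset({x, x})          # {x, x} collapses to {x}

def same_on_the_right(A, B, universe):
    """∀z, (A ∈ z ⇔ B ∈ z), with z ranging over a finite universe."""
    return all((A in z) == (B in z) for z in universe)

A = frozenset()
B = frozenset({A})
universe = [singleton(A), singleton(B), frozenset({A, B})]
```

Since A and B are different sets here, the singleton {A} contains A but not B, so the two sets do not have the same membership relations on the right. This is the contrapositive of the theorem.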


7.5.11 REMARK: Equality in set theories which are not based on predicate calculus with equality. 

As mentioned in Remark 6.7.5, if the object equality predicate for sets is not axiomatised within a predicate 
calculus with equality (QC+EQ), then the equality relation may be defined in terms of set membership. 
(See for example the presentation of NBG (Neumann-Bernays-Gödel) set theory by E. Mendelson [370],
pages 159-170.) Then the extension axiom in Definition 7.2.4 (1) must be replaced by an axiom which
replaces the substitutivity axiom in QC+EQ. In other words, the equality relation may be defined in terms
of the membership relation “∈” by defining A = B to mean ∀x, (x ∈ A ⇔ x ∈ B). In terms of this defined
equality relation, the extension axiom then requires this relation to have the following relation to the set
membership relation “on the right”.


    A = B ⇒ ∀x, (A ∈ x ⇔ B ∈ x).    (7.5.1)


This would follow automatically from the QC+EQ axiom Definition 6.7.6 (EQ 1) (Subs). At first sight,
line (7.5.1) could easily be confused with the extension axiom in Definition 7.2.4 (1).


7.5.12 REMARK: Set theories based on predicate calculus with or without equality. 
Two ways of introducing the extension axiom are summarised in the following table. 


     proposition                                  concrete    abstract
(1)  (A = B) ⇒ ∀z, (z ∈ A ⇔ z ∈ B)                QC+EQ       definition
(2)  (A = B) ⇒ ∀z, (A ∈ z ⇔ B ∈ z)                QC+EQ       axiom
(3)  (∀z, (z ∈ A ⇔ z ∈ B)) ⇒ (A = B)              axiom       definition
(4)  (∀z, (A ∈ z ⇔ B ∈ z)) ⇒ (A = B)              theorem     theorem


The more concrete approach, based on predicate calculus with equality, is adopted in this book and also by 
Shoenfield [390], pages 238-240, and by EDM2 [113], 33.B, pages 147-148. In this approach, it is assumed 
that the names in the variable name space N_V of the language refer to objects in a concrete variable space V.
(In the case of set theory, this means that the names of sets refer to concrete sets in some externally defined
space.) An equality relation is assumed to be defined already on the concrete variable space, and this relation
is imported into the abstract space via the variable name map μ_V : N_V → V.


Therefore in the concrete approach, the equality relation is defined as an import of a concrete equality
relation. Since it is also assumed that the membership relation “∈” is imported from a concrete membership
relation, and the concrete relation is assumed to be well-defined for the concrete variables, it automatically
follows that in the abstract name space, (A = B) ⇒ (z ∈ A ⇔ z ∈ B) and (A = B) ⇒ (A ∈ z ⇔ B ∈ z).
In fact, B may be substituted for any instance of A like this without changing the truth value of the
proposition. That is, (A = B) ⇒ (F(A) ⇔ F(B)) for any predicate F. (This is called the “substitutivity of
equality” axiom. See Remark 6.7.5.) So lines (1) and (2) in the above table follow immediately. (“QC+EQ”
abbreviates “predicate calculus with equality”.) In this case, line (3), (∀z, (z ∈ A ⇔ z ∈ B)) ⇒ (A = B),
is specified as an axiom of extension. This is required because it is not a-priori obvious that the concrete
membership relation would imply the concrete equality relation in this way.


The abstract approach is used by E. Mendelson [370], pages 159-170. In this approach, the equality relation is 
not imported from a concrete space. So the equality relation needs to be defined in terms of the membership 
relation, which is the only relation imported from the concrete space. In this approach, A = B is defined to 
mean ∀z, (z ∈ A ⇔ z ∈ B). This yields lines (1) and (3) in the table.


In the abstract approach, since the equality relation is no more than a definition in terms of the membership 
relation, there is no guarantee at all that this abstract equality of A and B implies that they both refer to 
the same concrete object, which therefore would have all attributes identical. Therefore this “substitutivity
of equality” property must be specified as an axiom of extension.


This explains why line (3) is an axiom in the concrete approach whereas it is given by a definition in the 
abstract approach, and the reverse is true for line (1). 


7.6. The ZF empty set, pair, union and power set axioms 


7.6.1 REMARK: The empty set axiom. How to name a set. 
The empty set axiom states that there exists an empty set. It is unique by Theorem 7.6.2. Therefore it can 
be given a name and a notation. 


More generally, names (especially definitions and notations) can be given to individual “constant objects” in 
any predicate calculus with equality if they are shown to be unique. (See Section 6.7 for predicate calculus 
with equality. See Section 6.8 for uniqueness.) 


Each assignment of a name to an object requires a prior proved assertion or a definition that the object is 
unique. Then a name can be assigned to the unique object. The first and most important name assignment 
in set theory is to the empty set. Its uniqueness is asserted and proved in Theorem 7.6.2. Its existence is 
asserted as a ZF axiom in Definition 7.2.4 (2). (Definition 7.2.4 also asserts that ZF set theory is a predicate 
calculus with equality.) 


The combined existence and uniqueness assertion may be written as “∃′z, ∀y, y ∉ z”. The unique object to
be named is defined by the predicate P(z) = “∀y, y ∉ z”. Then one may write “∃′z, P(z)” as the uniqueness
assertion.


A serious difficulty arises when one wishes to state what the empty set is. In a non-standard way, one
could write: ∅ = “z; ∀y, y ∉ z”. This does not make much sense because the equality symbol is defined only
between sets in this predicate calculus with equality. The right hand side of this “equality” for ∅ is not a
set, nor is it even a predicate. Nevertheless, this might be a meaningful way of assigning names to sets if the
equality symbol is replaced by something else, like the word “means”.

Using Notations 7.5.7 and 7.7.10, one could write {∅} = {z; ∀y, y ∉ z}, which would be correct. This tends
to support the idea of writing that “∅ means “z; ∀y, y ∉ z””. This is expressed informally in plain language
in Definition 7.6.3 so as not to overload the reader with formalities.


7.6.2 THEOREM: Uniqueness of the empty set. 
Let A₁, A₂ be sets which satisfy ∀x, x ∉ A₁ and ∀x, x ∉ A₂. Then A₁ = A₂.


PROOF: Let A₁, A₂ be sets which satisfy ∀x, x ∉ A₁ and ∀x, x ∉ A₂. Then ∀x, (x ∉ A₁ ⇒ x ∉ A₂) by
Theorem 6.6.18 (iii). Similarly ∀x, (x ∉ A₂ ⇒ x ∉ A₁). Therefore ∀x, (x ∉ A₁ ⇔ x ∉ A₂), which is
equivalent to ∀x, (x ∈ A₁ ⇔ x ∈ A₂). Hence A₁ = A₂ by the extension axiom, Definition 7.2.4 (1).


7.6.3 DEFINITION: The empty set is the set A which satisfies ∀x, x ∉ A.
A non-empty set is a set which is not the empty set. 


7.6.4 NOTATION: ∅ denotes the empty set.


7.6.5 THEOREM: Some basic properties of the empty set. 
Let A be a set. 

(i) ∅ ⊆ A.

(ii) If A ⊆ ∅, then A = ∅.


PROOF: For part (i), ∀x, x ∉ ∅ by Definition 7.6.3. So ∀x, (x ∉ ∅ ∨ x ∈ A) by Theorem 4.7.6 (vi). This is
equivalent to ∀x, (x ∈ ∅ ⇒ x ∈ A) by Definition 4.7.2. Hence ∅ ⊆ A by Definition 7.3.2.

For part (ii), let A ⊆ ∅. Suppose that x ∈ A. Then x ∈ ∅ by Notation 7.3.3. But x ∉ ∅ by Definition 7.6.3.
This is a contradiction. So x ∉ A. Thus ∀x, x ∉ A. Hence A = ∅ by Definition 7.6.3 and Theorem 7.6.2.


7.6.6 REMARK: Universal implies existential, if the set is non-empty. 

One often wishes to assert that if a proposition is true for all x ∈ X, then it is true for some x ∈ X. This is
shown in Theorem 7.6.7. Of course X must not be the empty set. In terms of Notations 7.2.7 and 7.6.4, this
means that “∀x ∈ X, P(x)” implies “∃x ∈ X, P(x)” if X ≠ ∅. (This is closely related to Theorem 8.4.8 (i).)


7.6.7 THEOREM: For all x implies for some x, if the set is non-empty. 
Let X be a non-empty set. Let P be a single-parameter set-theoretic predicate. Suppose that ∀x ∈ X, P(x).
Then ∃x ∈ X, P(x).


PROOF: The assertion follows from Theorem 6.6.20 (i) with A(x) = “x ∈ X” for all x.


7.6.8 REMARK: All assertions are “vacuously true” for elements of the empty set. 
Theorem 7.6.9 may seem trivial, but it is used quite often in practical applications.


7.6.9 THEOREM: All properties are valid for the elements of the empty set. 
Let P be a single-parameter set-theoretic predicate. Then ∀x ∈ ∅, P(x).


PROOF: By Notation 7.2.7, “∀x ∈ ∅, P(x)” means “∀x, (x ∈ ∅ ⇒ P(x))”. Let x be a set. Then ¬(x ∈ ∅)
by Definition 7.6.3. So x ∈ ∅ ⇒ P(x) by Theorem 4.5.7 (xl). So ∀x, (x ∈ ∅ ⇒ P(x)) by Definition 6.3.9 (UI).
Hence ∀x ∈ ∅, P(x). (Alternatively, apply Theorem 6.6.18 (v).)
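Vacuous truth over the empty set, and the passage from “for all” to “for some” over a non-empty set, have direct finite analogues. The following is a minimal sketch, assuming that Python's built-in `all` and `any` over a finite set play the role of the bounded quantifiers. The variable names are chosen only for this illustration.

```python
# Vacuous truth over the empty set, and Theorem 7.6.7 for a non-empty set.
# Assumption: Python's all()/any() over a finite set play the role of
# the bounded quantifiers ∀x ∈ X and ∃x ∈ X.

empty = frozenset()
always_false = lambda x: False

vacuous = all(always_false(x) for x in empty)       # ∀x ∈ ∅, P(x)
witness = any(always_false(x) for x in empty)       # ∃x ∈ ∅, P(x)

X = frozenset({1, 3, 5})
is_odd = lambda x: x % 2 == 1
forall_X = all(is_odd(x) for x in X)                # ∀x ∈ X, P(x)
exists_X = any(is_odd(x) for x in X)                # ∃x ∈ X, P(x)
```

Even the always-false predicate holds for all elements of the empty set, but no existential statement over the empty set holds. This is why Theorem 7.6.7 requires X to be non-empty.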


7.6.10 REMARK: The unordered pair axiom. 
Similarly to Theorem 7.5.4 for singleton sets, Theorem 7.6.11 shows existence and uniqueness of unordered 
pair sets which contain two specified sets. Then these unordered pairs can be given a name and a notation. 


7.6.11 THEOREM: Uniqueness and existence of unordered pair sets. 
∀x, ∀y, ∃′S, ∀z, (z ∈ S ⇔ (z = x ∨ z = y)).


PROOF: Let x and y be sets. Then ∃S, ∀z, (z ∈ S ⇔ (z = x ∨ z = y)) by the pair axiom, Definition 7.2.4 (3).
To show uniqueness, suppose that Sₖ satisfies ∀z, (z ∈ Sₖ ⇔ (z = x ∨ z = y)) for k = 1, 2. Then
∀z, (z ∈ S₁ ⇔ z ∈ S₂) by Theorem 4.7.9 (xix). So S₁ = S₂ by the extension axiom, Definition 7.2.4 (1).
Hence ∃′S, ∀z, (z ∈ S ⇔ (z = x ∨ z = y)) for all sets x and y.


7.6.12 DEFINITION: An (unordered) pair is a set S which satisfies ∃x, ∃y, ∀z, (z ∈ S ⇔ (z = x ∨ z = y)).
The unordered pair which contains sets x and y is the set S which satisfies ∀z, (z ∈ S ⇔ (z = x ∨ z = y)).


7.6.13 NOTATION: {x, y}, for sets x and y, denotes the set S such that ∀z, (z ∈ S ⇔ (z = x ∨ z = y)). In
other words, {x, y} denotes the unordered pair which contains x and y.


7.6.14 REMARK: Unordered pairs which are singletons. 
In the special case that x = y in Notation 7.6.13, the unordered pair {x, y} equals the singleton {x}, which
equals {y}. But if x ≠ y, then {x, y} is not a singleton.
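
As an informal check of Remark 7.6.14, unordered pairs can be modelled by Python frozenset objects, which ignore both order and repetition. This is only a finite sketch under that modelling assumption. The function names pair and singleton are hypothetical.

```python
def pair(x, y):
    """Finite model of the unordered pair {x, y}."""
    return frozenset({x, y})


def singleton(x):
    """Finite model of the singleton {x}."""
    return frozenset({x})


a = frozenset()       # the empty set
b = singleton(a)      # the set {∅}, which differs from ∅

order_is_irrelevant = pair(a, b) == pair(b, a)
degenerate_pair_is_singleton = pair(a, a) == singleton(a)
distinct_pair_has_two_elements = len(pair(a, b)) == 2
```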


7.6.15 REMARK: The union axiom. 
The ZF union axiom, Definition 7.2.4 (4), states that ∃X, ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S) for any set K.
(This is illustrated in Figure 7.6.1.) 


The uniqueness of the set X in this proposition, for any given set K, may be proved exactly as for singletons 
in Theorem 7.5.4, the empty set in Theorem 7.6.2, and unordered pairs in Theorem 7.6.11. (There is no need 
to formally prove an assertion which is obvious. It is the custom in mathematics that any unproved assertion 
which the reader does not believe automatically becomes an exercise question for which the author will not 
provide an answer!) Thus clearly ∀K, ∃′X, ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S). And as usual, this existence and
uniqueness implies that it is possible to give a name and notation to this set.


[Diagram: ∃X, ∀z, (z ∈ X ⇔ ∃S, (z ∈ S ∧ S ∈ K))]

Figure 7.6.1 ZF union axiom for a general collection of sets


7.6.16 DEFINITION: The union of a set K is the set X which satisfies ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S).


7.6.17 NOTATION: ⋃K, for a set K, denotes the set X which satisfies ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S). In
other words, ⋃K denotes the union of K.


7.6.18 REMARK: The union of a pair of sets. 

The union axiom may now be combined with the unordered pair axiom to define and give a notation for
the union of two sets. The binary union is generally considered to be a simpler operation than the general
union, but a single axiom is adequate to construct them both. Thus if K = {A, B} for sets A and B, then
the set ⋃K = ⋃{A, B} is already defined in Definition 7.6.16. (This is illustrated in Figure 7.6.2.)


[Diagram: A ∪ B, with ∃X, ∀z, (z ∈ X ⇔ (z ∈ A ∨ z ∈ B))]

Figure 7.6.2 ZF union axiom for a collection of two sets


7.6.19 DEFINITION: The union of two sets A and B is the set ⋃{A, B}.


7.6.20 NOTATION: A ∪ B, for sets A and B, denotes the union of A and B.
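
The way in which the binary union is obtained from the general union and the unordered pair can be imitated for finite sets. The following Python sketch is a rough illustration only, assuming that sets are modelled by frozenset objects. The names big_union and binary_union are hypothetical.

```python
def big_union(K):
    """Union of a collection K: all z such that z is in S for some S in K."""
    return frozenset(z for S in K for z in S)


def binary_union(A, B):
    """A ∪ B defined as the union of the unordered pair {A, B} (Definition 7.6.19)."""
    return big_union(frozenset({A, B}))


A = frozenset({1, 2})
B = frozenset({2, 3})
AB = binary_union(A, B)
```

Here binary_union needs no separate construction: it only forms the pair {A, B} and then applies the general union, as in Remark 7.6.18.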


7.6.21 REMARK: The intersection of a pair of sets requires the specification theorem. 

It may seem straightforward to define the intersection of two sets A and B as the set C which satisfies
∀z, (z ∈ C ⇔ (z ∈ A ∧ z ∈ B)), but the ZF union axiom does not imply the existence of such a set, and
there is no ZF intersection axiom. However, the existence of the intersection of two sets does follow from the
specification theorem which follows from the ZF replacement axiom. (See Theorem 7.7.2 and Remark 7.7.19.)


7.6.22 REMARK: A singleton axiom would be weaker than the unordered pair axiom. 

It may seem that if the unordered pair axiom were replaced with a “singleton axiom” which guarantees the
existence of the set {x} for any x, this could be combined with the union axiom to construct an unordered
pair {x, y} as the union {x} ∪ {y} of the two sets {x} and {y}. However, this would require the existence of
the unordered pair {{x}, {y}} in Definition 7.6.19.


Thus the ZF axioms stipulate the existence of a set containing exactly 0 elements and the existence of sets
containing any given 2 elements. Then existence for all other finite numbers of elements can be demonstrated
from these. Oddly, there is a gap in the middle between 0 and 2 here! If one attempts to stipulate only
existence of 1-element and 2-element sets in terms of given sets, this would not exclude the possibility of an
empty universe if the infinity axiom is omitted.


7.6.23 REMARK: Power set axiom. 

The power set axiom, Definition 7.2.4 (5), may be written more concisely as ∀X, ∃P, ∀z, (z ∈ P ⇔ z ⊆ X).
In other words, P is the set of all subsets of X. Since a definite pattern is emerging in the uniqueness proofs 
for singletons, the empty set and unordered pairs in Theorems 7.5.2, 7.6.2 and 7.6.11 respectively, a more 
general form of such proofs is given in Theorem 7.6.24. (See Remarks 7.7.3 and 7.7.9 for some comments 
on definition of sets by their “comprehension”.) This is immediately applicable to the union axiom and the 
power set axiom. Therefore the set P in Definition 7.6.25 is unique. 


7.6.24 THEOREM: Uniqueness of sets defined by their “comprehension”. 
Let A be a single-parameter set-theoretic formula. Let S₁, S₂ be sets which satisfy ∀z, (z ∈ Sₖ ⇔ A(z))
for k = 1, 2. Then S₁ = S₂.


PROOF: Let Sₖ be sets which satisfy ∀z, (z ∈ Sₖ ⇔ A(z)) for k = 1, 2. Then ∀z, (z ∈ S₁ ⇔ z ∈ S₂) by
Theorem 4.7.9 (xix). So S₁ = S₂ by the extension axiom, Definition 7.2.4 (1).


7.6.25 DEFINITION: The power set of a set X is the set P which satisfies ∀z, (z ∈ P ⇔ z ⊆ X).


7.6.26 NOTATION: ℙ(X), for a set X, denotes the power set of X.


7.6.27 REMARK: Variant notation for power sets. 

A popular alternative notation for the power set ℙ(X) is 2^X, but this has two major disadvantages. The
notation 2^X is inconvenient for typesetting, especially if X is a complicated expression, but more importantly,
it lacks validity. The notation 2^X is best reserved for its literal meaning, which is the set of functions with
domain X and range 2 = {0, 1}. (See Notation 10.2.17.) Such functions are called “indicator functions”
on X. (See Section 14.7 for indicator functions.) It is true that indicator functions on X may be "identified" 
with subsets of X because there is a natural bijection between them, but the confusion of concepts by such 
identification maps is unnecessary and often misleading. (See Remark 14.7.6 for this identification map.) 
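
The distinction between subsets of X and indicator functions on X, and the natural bijection between them, can be made concrete for a small finite set. The following Python sketch is illustrative only. It assumes that subsets are modelled by frozenset objects and indicator functions by dictionaries with values 0 and 1. The names power_set, indicator_functions and subset_of are hypothetical.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of the finite set X, collected in a frozenset."""
    elems = list(X)
    return frozenset(
        frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)
    )


def indicator_functions(X):
    """All functions from X to {0, 1}, each represented as a dict."""
    elems = sorted(X)
    return [dict(zip(elems, bits)) for bits in product((0, 1), repeat=len(elems))]


def subset_of(chi):
    """The subset of X which is identified with the indicator function chi."""
    return frozenset(x for x, bit in chi.items() if bit == 1)


X = frozenset({1, 2, 3})
P = power_set(X)                 # a set of sets
F = indicator_functions(X)       # a list of functions, a different kind of object
image = frozenset(subset_of(chi) for chi in F)
```

The two collections have the same number of elements and subset_of maps one onto the other, but the objects in P and F are of different kinds, which is the point made in Remark 7.6.27.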


7.6.28 REMARK: Using the power set notation to clarify quantified logical expressions. 

Propositions of the form “∀z, (z ⊆ X ⇒ A(z))”, for some set-theoretic formula A, are often seen in practical
mathematics. Since the power set ℙ(X) is defined by ∀z, (z ∈ ℙ(X) ⇔ z ⊆ X), the proposition z ⊆ X is
interchangeable with the proposition z ∈ ℙ(X). So the proposition “∀z, (z ⊆ X ⇒ A(z))” is equivalent to
“∀z, (z ∈ ℙ(X) ⇒ A(z))”. This may be written in terms of Notation 7.2.7 as “∀z ∈ ℙ(X), A(z)”, which
has considerable advantages when other quantifiers come before or after the quantifier “∀z ∈ ℙ(X)”.


7.7. The ZF replacement axiom 


7.7.1 REMARK: The meaning of the ZF replacement axiom. 

The ZF replacement axiom, Definition 7.2.4 (6), states that for any pre-existing set A and any two-parameter 
set-theoretic formula f which determines a unique value y which satisfies f(x, y) for each x, there is a set B 
which contains the value y for each x in the domain A, and only those values. The formula f may be thought 
of as a function which maps elements of an old (pre-existing) set A to sets (which are inside or outside A), 
which are then gathered together as the elements of a new set B. This is illustrated in Figure 7.7.1. 


[Diagram: the formula f maps each element x of the old set A to an element y of the new set B.]

Figure 7.7.1 ZF replacement axiom: ∀y, (y ∈ B ⇔ ∃x ∈ A, f(x, y))


Although each individual set y which satisfies f(x, y) for some x ∈ A is clearly contained in at least one
pre-existing set because y ∈ {y}, it does not follow in general from the other axioms that the collection of
all such sets y is contained in or included in a pre-existing set.


The replacement axiom is directly useful for showing that infinite collections of sets which are determined 
by some set-theoretic rule may be considered to be sets. The union axiom is not helpful here because to
apply the union axiom would require the sets in question to already constitute a well-defined set so that
their union may be shown to exist. 


The distinction between set-theoretic formulas and ZF functions (which are sets of ordered pairs in the style 
of Definition 10.2.2) is very significant. In the case of ZF functions, the domain and range must be known 
in advance to be well-defined ZF sets. So the ZF replacement axiom gives no new information at all. With 
a set-theoretic formula, it is possible to specify a new set in terms of an old set according to a fixed rule for 
an infinite set of old sets. Then the aggregate of the specified new sets is a well-defined ZF set by the ZF 
replacement axiom. The new aggregate set does not require the prior existence of any container set. 


Some authors call the replacement axiom the “substitution axiom”, but the word “substitution” is best 
reserved for text-level substitution of expressions into other expressions. (See for example Sections 3.10 
and 3.13, and Definitions 4.4.3 (viii) and 6.3.9 (viii).) 

The most useful practical applications of the replacement axiom are indirect by way of the specification 
theorem, Theorem 7.7.2, which is used for the great majority of set constructions in mathematics. 


7.7.2 THEOREM: The Zermelo-Fraenkel specification theorem. 
Let A be a set. Let P be a single-parameter set-theoretic formula. Then 


∃B, ∀y, (y ∈ B ⇔ (y ∈ A ∧ P(y))).


PROOF: Let A be a set. Let P be a single-parameter set-theoretic formula. Define the two-parameter
set-theoretic formula f by f(x, y) = “P(x) ∧ (x = y)” for sets x and y. Then the ZF replacement axiom,
Definition 7.2.4 (6), asserts the following for any set A and two-parameter set-theoretic formula f.


(∀x, ∀y₁, ∀y₂, ((f(x, y₁) ∧ f(x, y₂)) ⇒ y₁ = y₂)) ⇒ ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f(x, y))).


Let x, y₁, y₂ be sets such that f(x, y₁) ∧ f(x, y₂) is true. Then P(x) ∧ (x = y₁) ∧ P(x) ∧ (x = y₂) is true
by the definition of f. So y₁ = y₂. Therefore f satisfies ∀x, ∀y₁, ∀y₂, ((f(x, y₁) ∧ f(x, y₂)) ⇒ y₁ = y₂).
Consequently ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f(x, y))). In other words, there exists a set B which satisfies
∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ P(x) ∧ x = y)). This is equivalent to ∀y, (y ∈ B ⇔ ∃x, (y ∈ A ∧ P(y) ∧ x = y)),
which is equivalent to ∀y, (y ∈ B ⇔ (y ∈ A ∧ P(y))).
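
The construction in this proof can be imitated in a finite setting. The following Python sketch is only a rough model, not a formalisation. It assumes that all sets under consideration are drawn from a small finite collection called universe, which is needed only because a program cannot search over all sets. The names is_function_like, replacement and specification are hypothetical.

```python
def is_function_like(f, universe):
    """Check that (f(x, y1) and f(x, y2)) implies y1 == y2 within the finite universe."""
    return all(
        y1 == y2
        for x in universe
        for y1 in universe
        for y2 in universe
        if f(x, y1) and f(x, y2)
    )


def replacement(A, f, universe):
    """Finite model of replacement: the set of y such that f(x, y) for some x in A."""
    return frozenset(y for y in universe if any(f(x, y) for x in A))


def specification(A, P, universe):
    """The set of y in A with P(y), obtained from replacement as in the proof."""
    def f(x, y):
        # The two-parameter formula f(x, y) = "P(x) and x = y" used in the proof.
        return P(x) and x == y

    assert is_function_like(f, universe)
    return replacement(A, f, universe)


def is_even(n):
    return n % 2 == 0


universe = frozenset(range(10))
A = frozenset({1, 2, 3, 4, 5, 6})
B = specification(A, is_even, universe)
```

The resulting set B agrees with the direct description of the elements of A which satisfy the predicate, and it is a subset of A.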


7.7.3 REMARK: The ZF replacement axiom versus the Z specification axiom. 

The ZF replacement axiom was introduced in 1922 by Fraenkel [412] as a substitute for the specification
axiom which was introduced in 1908 by Zermelo [444]. Zermelo's original specification axiom was the same
as the ZF specification theorem, Theorem 7.7.2. (The replacement axiom was also hinted at informally by 
Dmitry Mirimanoff in 1917 and published formally by Thoralf Skolem in 1922 or 1923. See Moore [371], 
page 262; Lévy [368], page 24; Stoll [393], page 303; Suppes [395], page 202.) Set theory which is the same 
as ZF, but with the weaker specification axiom substituted for the ZF replacement axiom is called Zermelo 
set theory, abbreviated as Z. 


The replacement axiom extends the class of sets which can be proved to exist. It was noticed by Fraenkel
and Skolem that the range of the recursively defined function f given by f(0) = ω and f(i + 1) = ℙ(f(i))
for all i ∈ ω, was not guaranteed to exist in Z. In other words, the set {ω, ℙ(ω), ℙ(ℙ(ω)), ...} could not be
guaranteed to exist in Z, but its existence was guaranteed in ZF. (See for example Halmos [357], page 76; 
Moore [371], page 262; Stoll [393], page 303.) 
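
The recursion f(i + 1) = ℙ(f(i)) can be tried out on a finite analogue. In the following Python sketch, the seed is the empty set rather than ω. This substitution is an assumption made purely so that the computation terminates, and it says nothing about the infinite case, where the replacement axiom is needed. The names power_set and iterate_power_set are hypothetical.

```python
from itertools import combinations


def power_set(X):
    """All subsets of the finite set X, collected in a frozenset."""
    elems = list(X)
    return frozenset(
        frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)
    )


def iterate_power_set(seed, n):
    """Return the list [f(0), ..., f(n)] with f(0) = seed and f(i + 1) = P(f(i))."""
    stages = [frozenset(seed)]
    for _ in range(n):
        stages.append(power_set(stages[-1]))
    return stages


stages = iterate_power_set(frozenset(), 4)
sizes = [len(s) for s in stages]      # the sizes grow as 0, 1, 2, 4, 16
range_of_f = frozenset(stages)        # finite analogue of {f(0), f(1), f(2), ...}
```

For a finite number of stages, the range of f is clearly a set. The difficulty noticed by Fraenkel and Skolem arises only for the full infinite sequence.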


The greater power of the replacement axiom, relative to the specification axiom, is required for various pure
mathematical set theory purposes, particularly to guarantee the existence of sets beyond the “von Neumann
universe stage” V_{ω+ω}, which is a Zermelo set theory model. (See Section 12.6 for the cumulative hierarchy
of ZF sets.) For most purposes in applicable mathematics, however, the sets whose existence is guaranteed 
in Z are more than adequate. The omitted sets are a waste of space. (See Remark 12.6.3 for some comments 
on this smaller universe of sets for Zermelo set theory.) 


[www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 


Some applications of the ZF replacement axiom, for which the ZF specification theorem is insufficient, are 
Theorems 13.2.5 and 13.3.2, which are both related to ordinal numbers. This is not mere coincidence. Most 
“normal” mathematics can be done in Z set theory. It is the ordinal numbers which primarily justify the 
adoption of the ZF replacement axiom. 


One of the advantages of Zermelo set theory is that the specification axiom (6′) in Definition 7.7.4 is much
simpler than the replacement axiom (6) in Definition 7.2.4. This suggests that it is more likely to be a
fundamental property of sets. The specification axiom is also known as the axiom of comprehension or the
axiom of separation. (Another advantage of Z set theory is that the dizzying complexity of ordinal and
cardinal numbers beyond V_{ω+ω} can be ignored.)


7.7.4 DEFINITION[MM]: Zermelo set theory is the same as Zermelo-Fraenkel set theory (Definition 7.2.4)
except that the replacement axiom (6) is replaced by Axiom (6′).


(6′) The specification axiom: For any set A and single-parameter set-theoretic formula P,


∃B, ∀y, (y ∈ B ⇔ (y ∈ A ∧ P(y))).


7.7.5 REMARK: Name for two-parameter set-theoretic formulas which are function-like. 

It is convenient to give a name to the kind of two-parameter set-theoretic formula which is required for
the ZF replacement axiom. Such a formula f satisfies ∀x, ∀y₁, ∀y₂, ((f(x, y₁) ∧ f(x, y₂)) ⇒ y₁ = y₂). It is
difficult to call this a “set-theoretic function” because this is very easily confused with two-parameter
set-theoretic object-maps in general first-order languages. However, within ZF set theory, the opportunities for
confusion should be minimal. There is a more serious danger of confusion with single-parameter set-theoretic
object-maps because this is essentially what they are. (At least they may be “identified” with each other.)


One fairly safe solution to this problem is to allow free substitution of one form for the other, regarding 
the difference as a mere notational issue. Notation 7.7.7 permits a “boolean-valued” two-parameter function 
formula to be notated as if it were a set-valued single-parameter formula. Although this is a harmless 
fiction, it could seem to contradict the claim that ZF set theory has no set-theoretic object-maps. Note that 
Notation 7.7.7 also clashes with the conventions for the ordered-pair-set functions in Definition 10.2.2, which 
are based on the ordered-pair-set relations in Definition 9.5.2. 


7.7.6 DEFINITION: A (two-parameter) set-theoretic function (formula) is a two-parameter set-theoretic
formula f which satisfies ∀x, ∀y₁, ∀y₂, ((f(x, y₁) ∧ f(x, y₂)) ⇒ y₁ = y₂).


7.7.7 NOTATION: f(x), for a set x and a two-parameter set-theoretic function formula f, denotes the set 
y for which f(x, y) is true. 


7.7.8 REMARK: Brace-and-semicolon notations for sets. 

There are three basic notations which use braces (or “curly brackets”) and semicolons for sets. The validity
of these forms of notation is dependent on the replacement axiom and the restriction theorem. In each
case, the first brace may be read as “the set of”, and the semicolon may be read as “such that”. Thus
“{x ∈ A; P(x)}” may be read as “the set of x in A such that P of x is true”.


(1) {x; P(x)} denotes the set of x which satisfy predicate P.
(2) {x ∈ A; P(x)} denotes the set of x in A which satisfy predicate P.
(3) {f(x); x ∈ A} denotes the set of f(x) such that x is in A.


In case (1), there is no guarantee in general that a set exists which contains precisely those sets x which
satisfy P(x). As an extreme example, if P is the always-true predicate “⊤”, then {x; P(x)} would presumably
mean the set of all sets, which is forbidden by the ZF regularity axiom because it would lead to Russell’s
paradox. (See Remark 7.8.6 for Russell’s paradox.) Therefore this notation must only be used when it is
known that there is a well-defined set which satisfies P. To be more precise, the proposition
∃X, ∀z, (z ∈ X ⇔ P(z)) must be provable in ZF set theory. Provability is, unfortunately, a slightly slippery notion. It is
difficult to prove that a proposition cannot be proved. Nevertheless, the onus is on the user to demonstrate
that the notation is well-defined before using it.


In case (2), there is generally no serious difficulty with the validity of its use. The specification theorem
guarantees that {x ∈ A; P(x)} represents a valid set whenever it can be shown that A is a set. Any
well-formed predicate P combined with any well-defined set A will yield a valid set.


In case (3), the ZF replacement axiom must be applied unless it is known that all of the sets f(x) for x ∈ A
are contained in a single well-defined set. This notation is mostly used when all of the elements of the range 
of f are in fact elements of a set whose validity is known. Therefore the application of the replacement axiom 
is, most of the time, superfluous in applicable mathematics. 


It should be noted that in all of these cases, there is no absolute general guarantee that the notations actually 
denote valid sets. They always require implicit or explicit justification, especially the forms (1) and (3), which 
almost always require explicit justification. In case (2), the only point which needs to be checked is that A 
is a set, which is typically clear within the context. 
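
Forms (2) and (3) have close counterparts in the comprehension syntax of some programming languages, which may help to fix ideas. The following Python sketch is an informal analogy only, assuming finite sets modelled by frozenset objects. The particular set A, predicate P and function f are hypothetical examples.

```python
A = frozenset(range(1, 11))


def P(x):
    """Example predicate: x is divisible by 3."""
    return x % 3 == 0


def f(x):
    """Example function-like rule: x is mapped to its square."""
    return x * x


# Form (2): {x ∈ A; P(x)}, justified by the specification theorem.
case2 = frozenset(x for x in A if P(x))

# Form (3): {f(x); x ∈ A}, justified by the replacement axiom.
case3 = frozenset(f(x) for x in A)

# Form (1), {x; P(x)}, has no direct counterpart here: a comprehension must
# iterate over an existing collection, so the unbounded form cannot be written.
```

In this analogy the result of form (2) is always a subset of A, whereas the result of form (3) need not be contained in A at all.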


7.7.9 REMARK: Case (1): Notation for sets which are justified by “semi-naive comprehension”.

Although Zermelo-Fraenkel set theory does not permit completely general “naive comprehension” as a basis
for specifying sets, it is convenient to introduce a form of notation which at least appears to specify sets
in such a naive way. (“Naive comprehension” means that {x; P(x)} specifies a valid set or class for any
set-theoretic formula P.) To ensure that this form of notation does not introduce non-sets into mathematical
texts, the use of Notation 7.7.10 must always be justified with a proof that the object referred to is a genuine
ZF set. One could perhaps refer to such careful use of this notation as “semi-naive comprehension”.


Theorem 7.6.24 shows that there is at most one ZF set which is specified by a single-parameter set-theoretic 
formula. So if at least one set can be proved to satisfy the formula, it must be unique. Therefore any 
proposition which specifies at least one set specifies exactly one set. Hence Notation 7.7.10 is well defined if 
the stipulated condition for P is met. 


7.7.10 NOTATION: Set whose existence is justified by “semi-naive comprehension”. 
{x; P(x)}, for any single-parameter set-theoretic formula P which satisfies ∃X, ∀z, (z ∈ X ⇔ P(z)), denotes
the set X which satisfies ∀z, (z ∈ X ⇔ P(z)).


7.7.11 REMARK: Variant notations for sets which are specified by semi-naive comprehension. 
There are some variants of Notation 7.7.10 in the literature. For example, Shoenfield [390], page 242, uses
“[x | P(x)]” to denote {x; P(x)}. (See also Remark 7.7.16 for discussion of similar alternative notations.)


7.7.12 REMARK: Case (2): Notation for sets which are justified by the specification theorem. 

There are no validity issues for Notation 7.7.13 if A denotes a well-defined set. The set which is denoted by
{x ∈ A; P(x)} is the same as the set which is denoted by {x; x ∈ A ∧ P(x)} in terms of Notation 7.7.10.
This set is well defined by Theorems 7.6.24 and 7.7.2.


7.7.13 NOTATION: Set whose existence is justified by the specification theorem. 


{x ∈ A; P(x)}, for a set A and a single-parameter set-theoretic formula P, denotes the set X which satisfies
∀z, (z ∈ X ⇔ (z ∈ A ∧ P(z))).


7.7.14 THEOREM: Expression for subset-specification notation using semi-naive comprehension notation.
Let A be a set and P be a single-parameter set-theoretic formula. Then {x ∈ A; P(x)} = {x; x ∈ A ∧ P(x)}.


PROOF: Let A be a set and P be a single-parameter set-theoretic formula. Let X = {x ∈ A; P(x)}. Define
the single-parameter set-theoretic formula Q by Q(z) = “z ∈ A ∧ P(z)”. Then ∀z, (z ∈ X ⇔ Q(z)) by
Notation 7.7.13. So ∃X, ∀z, (z ∈ X ⇔ Q(z)). Therefore {x; Q(x)} denotes the set X by Notation 7.7.10.
Hence {x ∈ A; P(x)} = {x; x ∈ A ∧ P(x)}.


7.7.15 THEOREM: Some basic properties of sets specified as subsets of other sets. 

(i) A = {x; x ∈ A} = {x ∈ A; x = x} = {x ∈ A; ⊤} for any set A.

(ii) {x ∈ A; P(x)} ⊆ A, for any set A and any single-parameter set-theoretic formula P.
(iii) If A ⊆ B, then A = {x ∈ B; x ∈ A}.


PROOF: For part (i), let A be a set. Then ∀y, (y ∈ {x; x ∈ A} ⇔ y ∈ A) by Notation 7.7.10. Therefore
A = {x; x ∈ A} by Definition 7.2.4 (1). By Notation 7.7.13, {x ∈ A; x = x} = {x; x ∈ A and x = x}, which
equals {x; x ∈ A} because x = x is true for all sets. (See Section 6.7 and Remark 7.5.12 for equality.) Then
{x ∈ A; x = x} = {x ∈ A; ⊤} because the always-true proposition “⊤” is always true. (See Notation 5.1.10.)


For part (ii), let A be a set and let P be a single-parameter set-theoretic formula. Let y ∈ {x ∈ A; P(x)}.
Then y ∈ A and P(y) by Notation 7.7.13. Therefore y ∈ A. Hence {x ∈ A; P(x)} ⊆ A.

For part (iii), let A and B be sets with A ⊆ B. Then ∀z, (z ∈ A ⇒ z ∈ B) by Notation 7.3.3. It follows
from Theorem 4.7.9 (lxx) that ∀z, ((z ∈ A ∧ z ∈ B) ⇔ z ∈ A).

Define a set-theoretic formula P by P(x) = “x ∈ A”. Then by Notation 7.7.13, {x ∈ B; P(x)} denotes the
set X such that ∀z, (z ∈ X ⇔ (z ∈ B ∧ P(z))). This is the same as ∀z, (z ∈ X ⇔ (z ∈ B ∧ z ∈ A)). This is
equivalent to ∀z, (z ∈ X ⇔ z ∈ A) by the above observation, which is equivalent to ∀z, (z ∈ X ⇔ (z ∈ A ∧ ⊤)).
So by Notation 7.7.13, {x ∈ B; x ∈ A} is equal to {x ∈ A; ⊤}, which is equal to A by part (i).


7.7.16 REMARK: Clashes in some variant notations for functions. 
Some people use the notation {x ∈ A | P(x)} instead of {x ∈ A; P(x)}, but the vertical stroke “|” can be
confusing when combined with the following notations.


(i) The probability notation “Prob(E | F)” for events E and F, which means the probability of the event
E conditioned by the event F.
(ii) The modulus |z| of a complex number z, the norm |v| of a vector v, or the determinant |A| of a matrix A.
(iii) The restriction f|_X of a function f to a set X.
(iv) The divisibility operator. For example, m | n means that m divides n.


Some people use the notation {x ∈ A : P(x)} instead of {x ∈ A; P(x)}. The colon can be confusing in set
constructions such as {f : A → B : P(f)}. (See Notation 10.2.18 for {f : A → B; P(f)}.)


7.7.17 REMARK: Case (3): Notation for sets which are justified by the replacement axiom. 
The set notation {f(x); x ∈ A} is well defined by the ZF replacement axiom, Definition 7.2.4 (6).


7.7.18 NOTATION: Set whose existence is justified by the replacement axiom. 
{f(x); x ∈ A}, for a set A and a two-parameter set-theoretic function f, means {y; ∃x ∈ A, f(x, y)}.


7.7.19 REMARK: Existence of intersections of sets follows from the specification theorem. 
The intersection of a pair of sets A and B may be defined by restricting elements of one of the sets to be 
contained in the other as in Definition 7.7.20, using Notation 7.7.13. This is well-defined by Theorem 7.7.2. 


7.7.20 DEFINITION: The intersection of two sets A and B is the set {x ∈ A; x ∈ B}.


7.7.21 NOTATION: A ∩ B, for sets A and B, denotes the intersection of the two sets A and B.
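
The definition of the intersection by specification, and parts (ii) and (iii) of Theorem 7.7.15, can be checked on small finite examples. The following Python sketch is illustrative only, assuming finite sets modelled by frozenset objects. The names specify and intersection are hypothetical.

```python
def specify(A, P):
    """Finite model of the specification notation {x ∈ A; P(x)}."""
    return frozenset(x for x in A if P(x))


def intersection(A, B):
    """A ∩ B defined as {x ∈ A; x ∈ B}, as in Definition 7.7.20."""
    return specify(A, lambda x: x in B)


A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5})
C = frozenset({3, 4})             # C is a subset of A

AB = intersection(A, B)
BA = intersection(B, A)
C_again = specify(A, lambda x: x in C)
```

Although the definition treats A and B asymmetrically, the two orders give the same set, and C_again recovers C as in Theorem 7.7.15 (iii).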


7.7.22 REMARK: Replacing containment with inclusion in brace-and-semicolon set notations. 

A difficulty with Notation 7.7.13 is the fact that the expression “x ∈ A” in “{x ∈ A; P(x)}” appears to be
a proposition, and a proposition may plausibly be replaced by another proposition. However, the expression
“x ∈ A” is in fact a combination of a bound variable x with a proposition x ∈ A because “{x ∈ A; P(x)}” is
an abbreviation for “{x; x ∈ A ∧ P(x)}”, where the role of “x” as a bound variable is more evident. (This
is analogous to the way in which “∃x ∈ S, P(x)” abbreviates “∃x, (x ∈ S ∧ P(x))” in Notation 7.2.7.)


One possible generalisation of Notation 7.7.13 would be to say that {x R y; P(x)} means {x; (x R y) ∧ P(x)}
for any relation R. But a possible problem could then arise with an expression such as “{A ⊆ X; P(A, X)}”,
which should presumably be equivalent to “{X ⊇ A; P(A, X)}” because A ⊆ X is equivalent to X ⊇ A.
But then the bound variable must be assumed to be A instead of X because A is the first-listed variable
name. Since the notation “{A ⊆ X; P(A, X)}” is sometimes useful, it is presented here as Notation 7.7.23,
but it is preferable to write a much less ambiguous expression such as {A ∈ ℙ(X); P(A, X)} instead.

Note that the notation “{X ⊇ A; P(A, X)}” is intentionally not defined here because it would be almost
meaningless. The set of all supersets of A is not a set, although the expression “{X; X ⊇ A ∧ P(A, X)}”
could be a well-defined set. Since this form of notation might seem to give an unjustified guarantee that the
expression is well-defined, it is best avoided.


7.7.23 NOTATION: 
{A ⊆ X; P(A, X)}, for a set X and set-theoretic formula P, means {A ∈ ℙ(X); P(A, X)}.


7.7.24 REMARK: Other variations of the brace-and-semicolon set notations. 
Expressions such as {(x, y, z) ∈ ℝ³; y = z}, {(x, y, y); x, y ∈ ℝ} or {f : X → Y; f is injective} can be
confusing. Such expressions can be re-expressed in terms of Notation 7.7.13. Thus these three examples may
be written as {p ∈ ℝ³; p = (x, y, z) ∧ y = z}, {p ∈ ℝ³; p₂ = p₃} and {f ∈ Y^X; f is injective} respectively.
(See also Notation 10.2.18 for specification of sets of functions.) Usually the context makes clear which
variables are bound variables, but sometimes not.

Another kind of confusion can arise with expressions such as {f(x); x ∈ A} for a set-theoretic expression f
which is a function on A (i.e. f(x) is uniquely defined for all x ∈ A). Such expressions are introduced in
Notation 7.7.18. It follows by ZF Axiom (6) (Definition 7.2.4) that {f(x); x ∈ A} is a set if A is a set and
f is a single-parameter set-valued formula defined on A.


7.8. The ZF regularity axiom 


7.8.1 REMARK: The meaning of the ZF regularity axiom. 
The regularity axiom, Definition 7.2.4 (7), also known as the “axiom of foundation”, may be written as: 


∀X, X ≠ ∅ ⇒ ∃z ∈ X, z ∩ X = ∅. (7.8.1)


This implies that a sequence of set memberships must terminate “on the left”. Without this axiom, a
non-empty set X might exist such that ∀z ∈ X, ∃z′ ∈ X, z′ ∈ z. This would mean that every element z₀ of X
would contain at least one element z₁ of X, which in turn would contain at least one element z₂ of X, and
so forth. An infinite sequence of such containments might never arrive at a set which contains no elements
of X. (See Figure 7.8.1.)


[Diagram: ... ∈ z₅ ∈ z₄ ∈ z₃ ∈ z₂ ∈ z₁ ∈ z₀]

Figure 7.8.1 Infinite chain of set memberships “on the left”


The axiom of regularity implies that any infinite sequence of set memberships ... zₙ ∈ ... ∈ z₂ ∈ z₁ ∈ z₀
terminates on the left for some n. This can be proven by letting X be this sequence of sets.


The regularity axiom differs significantly from the other axioms by not actually anointing any new sets. It 
simply outlaws certain kinds of sets which are considered objectionable. Therefore there are no definitions 
or notations for sets which are constructed in accordance with this axiom because it produces no new sets. 
The negative nature of this axiom can be seen more clearly by rewriting line (7.8.1) as: 


∀X, (X ≠ ∅ ∧ ∀z ∈ X, z ∩ X ≠ ∅) ⇒ ⊥,


where ⊥ denotes the always-false predicate in Notation 5.1.10. In other words, X ≠ ∅ ∧ ∀z ∈ X, z ∩ X ≠ ∅
is forbidden. Alternatively, this may be written as:


∀X, (∀z ∈ X, z ∩ X ≠ ∅) ⇒ X = ∅.


7.8.2 REMARK: Equivalent formulation of the ZF regularity axiom. 

Theorem 7.8.3 (i) is the contrapositive of the ZF regularity axiom. (This contrapositive is used in the proof
of Theorem 12.5.19 (xi).) It means that if every element of a set X contains some element of X, then X
must be the empty set. This implies that if every element z ∈ X has the property ∃y ∈ X, y ∈ z, then no
element z ∈ X has the property ∃y ∈ X, y ∈ z. However, it is certainly possible that some element z ∈ X
has the property ∃y ∈ X, y ∈ z. For example X = {∅, {∅}}. Thus the axiom implies that if some element
z ∈ X has the property ∃y ∈ X, y ∈ z, then some element z ∈ X does not have the property ∃y ∈ X, y ∈ z.


7.8.3 THEOREM: The contrapositive of the ZF regularity axiom. 
Let X be a set. 


(i) (∀z ∈ X, ∃y ∈ X, y ∈ z) ⇒ X = ∅.


PROOF: For part (i), let X satisfy ∀z ∈ X, ∃y ∈ X, y ∈ z. Then ¬(∃z ∈ X, ∀y ∈ X, y ∉ z). By
Notation 7.2.7, this means ¬(∃z, (z ∈ X and ∀y, (y ∈ X ⇒ y ∉ z))). So ¬(∃y, y ∈ X) by Definition 7.2.4 (7).
In other words, X = ∅.


7.8.4 THEOREM: Impossibility of 1-set, 2-set and 3-set containment loops with the ZF regularity axiom.
Let A, B and C be sets.


(i) A ∉ A.
(ii) A ∉ B or B ∉ A. In other words, if A ∈ B then B ∉ A.
(iii) ¬((A ∈ B) ∧ (B ∈ C) ∧ (C ∈ A)).


PROOF: For part (i), let A be a set. Let X = {A}. Then X ≠ ∅. So ∃z ∈ X, z ∩ X = ∅ by the regularity
axiom. But the only element of X is z = A. So A ∩ X = ∅. That is, A ∩ {A} = ∅. Hence A ∉ A.


For part (ii), let A and B be sets. Let X = {A, B}. Then X ≠ ∅. So ∃z ∈ X, z ∩ X = ∅ by the regularity
axiom. There are only two elements in X. So either z = A or z = B. Suppose that z = A. Then A ∩ X = ∅.
That is, A ∩ {A, B} = ∅. So A ∉ A and B ∉ A. In particular, B ∉ A. Similarly, from z = B it follows
that A ∉ B. Hence A ∉ B or B ∉ A.


For part (iii), let A, B and C be sets. Let X = {A, B, C}. Then X ≠ ∅ and so ∃z ∈ X, z ∩ X = ∅. But then
z = A or z = B or z = C. Suppose that z = A. Then A ∩ {A, B, C} = ∅. So C ∉ A. Similarly, setting z = B
yields A ∉ B, and setting z = C yields B ∉ C. In each case, the proposition (A ∈ B) ∧ (B ∈ C) ∧ (C ∈ A)
is contradicted. Hence ¬((A ∈ B) ∧ (B ∈ C) ∧ (C ∈ A)).
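
The role of the element z with z ∩ X = ∅ in these proofs can be seen in a small model of hereditarily finite sets. The following Python sketch is only an illustration. It assumes that sets are modelled by nested frozenset objects, which are built from the bottom up and so cannot contain membership loops. The name minimal_elements is hypothetical.

```python
def minimal_elements(X):
    """Elements z of X with z ∩ X = ∅, whose existence the regularity axiom asserts."""
    return [z for z in X if not (z & X)]


empty = frozenset()
one = frozenset({empty})           # {∅}
two = frozenset({empty, one})      # {∅, {∅}}

X = frozenset({one, two})          # one ∈ two, so two is not minimal in X
minimal_in_X = minimal_elements(X)

# Part (i) with X = {A}: the only element A must satisfy A ∩ {A} = ∅.
minimal_in_singleton = minimal_elements(frozenset({two}))

# Part (ii) for this example: one ∈ two holds, so two ∈ one must fail.
no_two_loop = (one in two) and (two not in one)
```

In this model every non-empty set of sets has at least one such minimal element, which mirrors the way the theorem excludes short membership loops.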


7.8.5 REMARK: Application of the axiom of regularity to ordinal numbers. 

The axiom of regularity is very directly applicable, in particular, to proofs of the most basic properties of the
infinite set ω of finite ordinal numbers in Section 12.1. Although the infinity axiom guarantees the existence
of infinitely many ordinal numbers, it is the regularity axiom which guarantees the uniqueness of its structure
by forbidding infinite membership-relation sequences on the left. The regularity axiom enforces a unique
“stopping point” for such sequences, namely the empty set. This makes the finite ordinal numbers well
ordered, which makes mathematical induction valid. (The infinity axiom which is adopted here is in a sense 
the reverse of its most popular forms. This causes the structure of the proofs to be somewhat different. See 
Remarks 7.9.2, 7.9.3, 7.9.4 and 12.1.8 for comments on the “design choices" for the infinity axiom.) 


7.8.6 REMARK: The axiom of regularity prevents Russell’s paradox.

The most immediate motivation for the ZF regularity axiom is to exclude Russell’s paradox. There is a
body of thought which says that Russell’s paradox should not be excluded, in particular because the concept 
of “the set of all sets” sounds so plausible at first. However, more careful thinking reveals that a container 
does not contain itself in the real world. There is an analogy here with computer file systems. On some 
operating systems, it is possible to make a “hard link” from a folder inside a folder to the parent folder. 
This results in total chaos for many kinds of software which perform recurse descent into folders. A rough 
idea of how this would look in terms of windows on a computer screen is shown in Figure 7.8.2. 


[Diagram: nested windows labelled “universe U_0”, “universe U_1”, “universe U_2”, and so on.]


Figure 7.8.2 A “folder” which contains a “hard link” to its “parent folder”
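

The difficulty which a cyclic folder structure causes for recursive descent can be sketched in a few lines.
This is a toy illustration and not a model of any real file system: the dictionaries plain and linked and the
function has_cycle are hypothetical names introduced only for this example. A folder structure is
represented as a map from each folder name to the names of its sub-folders.

```python
def has_cycle(children, root):
    """Depth-first search: True if some folder is reachable from itself."""
    on_path = set()

    def visit(name):
        if name in on_path:
            return True  # the descent has returned to a folder it is still inside
        on_path.add(name)
        found = any(visit(child) for child in children.get(name, []))
        on_path.discard(name)
        return found

    return visit(root)

# An ordinary tree of folders.
plain = {"top": ["a", "b"], "a": ["c"]}

# The same tree with a link from folder "a" back to its parent "top".
linked = {"top": ["a", "b"], "a": ["c", "top"]}

plain_is_cyclic = has_cycle(plain, "top")
linked_is_cyclic = has_cycle(linked, "top")
```

A descent which does not keep track of the folders on its current path would never terminate on the
second structure. The regularity axiom plays the role of a guarantee that no such bookkeeping is needed
for sets.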


Theorem 7.8.4 (i) prevents Russell’s paradox by excluding the existence of a set U satisfying ∀x, x ∈ U.
Russell’s paradox is thus resolved in ZF set theory by simply forbidding the kinds of sets which cause
embarrassment. There are many ways to exclude Russell’s paradox. Neumann-Bernays-Gödel set theory
avoids the paradox by demoting the problematic “sets” such as U to “classes” which are forbidden to
be elements of sets. An alternative is to not exclude the paradox, but to let it propagate into the deeper 
foundations of mathematics by formalising “inconsistency-tolerant logic”. (See for example Mortensen [372].) 
Except in very elementary introductions, the concept of a set or universe of all sets is not often seen in 
serious mathematics. Therefore not much is lost by excluding it. The metaphor for a set in everyday life is a 
container, which clearly cannot contain itself. If the word “container” had been adopted instead of “set” or 
“class”, the concept of a “container which contains all containers” would have been less appealing, and the 
temptation would have been easier to resist. Thus Russell’s paradox may be viewed as a linguistic confusion 
or riddle rather than a true paradox. 


7.8.7 REMARK: The ZF regularity axiom is stronger than is required to avoid Russell’s paradox. 

Russell’s paradox follows from the existence of a universal set which contains all sets, not from the mere 
existence of sets which contain themselves. Logically consistent set-universes may be constructed which 
contradict the regularity axiom. In fact, some approaches to ZF set theory regard the regularity axiom as 
optional or unnecessary. 


Consider a set x which contains itself. In other words, x ∈ x. Let y = {z ∈ x; z ∉ z}. Suppose that y ∈ y.
Then y ∈ x and y ∉ y, which is a contradiction. So y ∉ y. However, it is not possible to prove y ∈ y from
the assumption that y ∉ y, as one would do in the demonstration of Russell’s paradox. To achieve this, one
would need to assume that y ∈ x. In fact, now suppose that y ∈ x. Then y ∈ y because y ∉ y, which is
a contradiction. So y ∉ x. Thirdly, suppose that x ∈ y. Then x ∈ x and x ∉ x, which is a contradiction.
Therefore x ∉ y. Thus three non-containments are proved, namely that y ∉ y, y ∉ x and x ∉ y. In the
Russell’s paradox scenario, one has y ∈ x because x is the set of all sets, which yields y ∈ y and y ∉ y. In
the non-paradox situation, one has y ⊆ x \ {x}, which causes no problems, whether y is empty or not. (See
Figure 7.8.3.) Thus the ZF regularity axiom is stronger than it needs to be if the objective is merely to
exclude Russell’s paradox.


[Diagram: the set x with the region x \ {x} marked.]


Figure 7.8.3 Russell’s paradox construction for a self-containing set
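

The three non-containments may be illustrated in a small non-well-founded membership structure. Since
Python sets cannot contain themselves, membership is modelled here as a map from set names to the names
of their members. The names members and russell_part, and the particular sets a, b and x, are assumptions
made for this sketch only.

```python
# members[s] holds the names of the members of the set named s.
members = {
    "a": set(),            # a = {}
    "b": {"a"},            # b = {a}
    "x": {"x", "a", "b"},  # x = {x, a, b}, so x is a member of itself
}

def russell_part(name):
    """Names z in the given set such that z is not a member of z."""
    return {z for z in members[name] if z not in members[z]}

y = russell_part("x")

# y is a subset of x with x itself removed.
y_is_inside = y <= members["x"] - {"x"}

# No member of x has the same members as y, so y is not a member of x.
y_in_x = any(members[z] == y for z in members["x"])

# No member of y has the same members as y, so y is not a member of itself.
y_in_y = any(members[z] == y for z in y)

x_in_y = "x" in y
```

Here membership of y in another set is tested extensionally, by looking for a member with exactly the
same members as y.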


In the final analysis, it does seem to be an uneconomical application of the human intellect to permit sets to 
contain themselves and then devote one’s energy to the earnest study of such pathology. Self-containing sets 
are a source of intriguing riddles for mathematical recreations. But they seem to offer no real benefit. The 
same observation applies to set-pairs (x, y) such that x ∈ y and y ∈ x, and to any other set-of-sets which
has a non-acyclic containment relation. For practical purposes, it is important to understand such oddities 
only in order to exclude them. Hence the ZF regularity axiom is relied upon in this book as a secure pillar 
of the set theory edifice. 


7.8.8 REMARK: The “depth” of sets “on the left” can be unbounded, but not infinite. 

If any chain of membership relations “on the left” is required to be finite by the regularity axiom, then 
every such chain is equinumerous to one and only one finite ordinal number. (See Definition 12.1.34 and 
Section 12.2 for finite ordinal numbers. See Section 13.1 for equinumerosity.) The finite stages V_n of the
cumulative hierarchy of sets (i.e. the von Neumann universe) have the interesting property that they contain
precisely those sets of finite ordinal numbers which have maximum depth n on the left. (This is clear from
the method of construction.) However, the first infinite von Neumann universe stage, denoted V_ω, does not
contain sets with infinite depth. The depth is unbounded, but never infinite for a single set in the universe
stage V_ω. Thus the regularity axiom is not contravened.


The set membership tree for ω (which is a subset of V_ω) is illustrated in Figure 7.8.4. Any particular path
downwards from ω terminates after a finite number of steps, although this number of steps is unbounded.
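

The phrase “unbounded, but not infinite” can be made concrete by computing depths of finite ordinal
numbers. In the sketch below, the depth of the empty set is taken to be 0, which is a convention assumed
here, and the function names successor, ordinal and depth are illustrative only.

```python
from functools import lru_cache

def successor(x):
    """Return the von Neumann successor of x, namely x together with x as a new member."""
    return x | frozenset([x])

def ordinal(n):
    """Return the n-th finite von Neumann ordinal as a frozenset."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x

@lru_cache(maxsize=None)
def depth(x):
    """Length of the longest leftward membership chain starting at x."""
    return 0 if not x else 1 + max(depth(m) for m in x)

depths = [depth(ordinal(n)) for n in range(8)]
```

Each ordinal number has a finite depth, but the list of depths grows without bound as more ordinals are
generated.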


Figure 7.8.4 Set membership tree for the set of finite ordinal numbers


7.8.9 REMARK: Infinite set membership chains to the right are permitted.

Although the regularity axiom prevents an infinite chain of set memberships “to the left”, there is nothing
to stop an infinite chain of set memberships “to the right”. In fact, the set of finite ordinal numbers is such
an infinite sequence. By Theorem 7.5.8 (iii), the infinite sequence in line (7.8.2) is valid for any set x.


x ∈ {x} ∈ {{x}} ∈ {{{x}}} ∈ ... (7.8.2)


But an infinite sequence as in line (7.8.3) is forbidden.


... ∈ x_4 ∈ x_3 ∈ x_2 ∈ x_1 ∈ x_0. (7.8.3)


One might ask why there is an asymmetry here. One way to look at this is to consider that the nature and 
meaning of a set is determined by what it contains, not by what it is contained in. Thus to determine the 
nature of a set, one follows the membership relation network to the left. By looking at the members, and 
the members of the members, and so forth, one should be able to find an end-point of any traversal along 
a leftwards path in the membership relation network within a finite number of steps. The regularity axiom 
guarantees that this is so. Therefore the “nature” of any set may be determined in this way. 
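

The asymmetry can be seen in a finite experiment. A rightward chain may be extended as far as desired
by wrapping a set in braces, while any leftward traversal from a hereditarily finite set reaches the empty
set after finitely many steps. The functions wrap and descend are hypothetical helpers written for this
sketch.

```python
def wrap(x, times):
    """Return the chain [x, {x}, {{x}}, ...] with times + 1 entries."""
    chain = [x]
    for _ in range(times):
        chain.append(frozenset([chain[-1]]))
    return chain

def descend(x):
    """Follow one leftward path from x and count the steps to the empty set."""
    steps = 0
    while x:
        x = next(iter(x))  # pick an arbitrary member and continue from it
        steps += 1
    return steps

chain = wrap(frozenset(), 5)

# Each entry of the chain is a member of the next entry.
rightward_ok = all(a in b for a, b in zip(chain, chain[1:]))

# Walking leftwards from the last entry terminates.
steps = descend(chain[-1])
```

The loop in descend is guaranteed to stop only because the sets which Python can represent in this way
are well-founded.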


One might ask also why the nature of a set is not determined by the sets which it is contained in. That 
would seem to be not entirely unreasonable. However, this approach would require a very large number of 
top-level sets to be given meaning outside pure set theory. In the ZF approach to set theory, all membership 
relation traversals to the left ultimately end with the empty set. Therefore one only needs to give meaning 
to one set, namely the empty set. All other sets then derive their meanings from this one set by following 
set membership chains on the left. It is not at all clear how one could start with a single universe set and 
give meaning to its members, and members of members, and so forth. 


So one possible answer to the question of why the regularity axiom is needed is that it provides certainty 
that all sets have determinable content. 


7.8.10 REMARK: Constructible sets are guaranteed to satisfy the ZF regularity axiom. 

If all sets in ZF set theory are built up by a finite number of “stages” from the empty set ∅ (axiom 2) and
the (infinite) set of finite ordinals ω (axiom 8), it seems fairly clear that the ZF regularity axiom should
hold automatically. (See for example Shoenfield [390], pages 270-276 for a constructible universe for ZF set
theory. See also Remark 12.6.8.) Each of the set construction axioms 3, 4, 5 and 6 seems to build sets which
obey the regularity axiom if the source-sets out of which they are built obey the regularity axiom. The fly
in the ointment here is the observation, made in Remark 7.8.5, that the set ω requires the regularity axiom
to force it to be regular. Since all infinite sets require ω at some stage of their construction, this means that
all infinite sets owe their regularity to the regularity of ω.


7.8.11 REMARK: History of the axiom of regularity. 
According to Stoll [393], pages 304-305, various versions of the ZF regularity axiom were given by Mirimanoff 
(1917), von Neumann (1925, 1930), and Zermelo (1929). 


7.9. The ZF infinity axiom


7.9.1 REMARK: The infinity axiom does not explicitly specify a set. 

In the ZF theory in Definition 7.2.4, axioms 2, 3, 4, 5 and 6 all specify the contents of their “anointed 
sets” explicitly. In each case, the set X which is asserted to exist is defined by a proposition of the form 
∀z, (z ∈ X ⇔ P(z)), where P is independent of X. Therefore uniqueness is guaranteed by Theorem 7.6.24,
and the anointed set may be given a name and a notation. The case of the infinity axiom 8 is not so easy. 


Axiom 8 may be written as follows. 


∃X, ∀z, (z ∈ X ⇔ (z = ∅ ∨ ∃y ∈ X, ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y)))).


This has the form ∀z, (z ∈ X ⇔ P(z, X)). It is because of this self-reference in the defining predicate that
so much effort is required to demonstrate uniqueness, which is the main subject of Section 12.1. (Such a 
self-reference appears also in the infinity axioms of other authors which are described in Remark 7.9.2.) 
Consequently the unique set of ordinal numbers whose existence is stipulated by axiom 8 is not given a name 
and notation until Section 12.1. (See Definition 12.1.28 and Notation 12.1.29.) 


7.9.2 REMARK: Variants of the infinity axiom. 
There are several variants of the infinity axiom in the literature, including the following. 


∃X, (∃y, (y ∈ X ∧ ∀z, z ∉ y) ∧
∀u, (u ∈ X ⇒ ∃v, (v ∈ X ∧ ∀w, (w ∈ v ⇔ (w ∈ u ∨ w = u))))). (7.9.1)
∃X, (∅ ∈ X ∧ (∀y ∈ X, (y ∪ {y} ∈ X))). (7.9.2)


Line (7.9.1) is given by Shoenfield [390], page 240. Line (7.9.2), which is equivalent to line (7.9.1), is given
by EDM2 [113], 33.B. Line (7.9.3), which is given by E. Mendelson [370], page 169, and EDM [112], 35.B,
may be written equivalently as line (7.9.4).


∃X, ((∃u, u ∈ X) ∧ ∀u, (u ∈ X ⇒ ∃v, (v ∈ X ∧ u ⊆ v ∧ v ≠ u))). (7.9.3)
∃X, (X ≠ ∅ ∧ ∀u ∈ X, ∃v ∈ X, u ⊊ v). (7.9.4)

7.9.3 REMARK: Defining finite ordinal numbers in terms of predecessors instead of successors. 
A style of infinity axiom which asserts the existence of predecessors rather than successors is as follows. 


∃X, ∀z, (z ∈ X ⇔ (z = ∅ ∨ ∃y ∈ X, z = y ∪ {y})). (7.9.5)


Proposition (7.9.5) may be rewritten in low-level predicate logic as follows. 


∃X, ∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))). (7.9.6)


This superficially resembles the successor-style axioms in Remark 7.9.2, but in Section 12.1 it becomes clear
that this adjustment to the infinity axiom makes proofs of basic properties of the ordinal numbers quite
different. The successor style asserts that X contains ∅ and all successors of all elements of X, whereas the
predecessor style asserts that X contains ∅ and all successors of all elements of X, and nothing else. This
is enforced by requiring each element (except ∅) to have a predecessor. Roughly speaking, the conventional
successor style implies that ω ⊆ X for some set X, whereas the predecessor style asserts that ω is a set.
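

The difference between the two styles can be seen on finite truncations. No finite set satisfies either axiom,
so the sketch below tests only the “every element is ∅ or has a predecessor in X” half of line (7.9.5), and
uses the presence of the first few ordinals as a finite stand-in for ω ⊆ X. The function names and the
example sets good and junk are assumptions made for this illustration.

```python
def successor(x):
    """Return the von Neumann successor of x."""
    return x | frozenset([x])

def ordinal(n):
    """Return the n-th finite von Neumann ordinal as a frozenset."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x

def has_predecessors(X):
    """True if every z in X is empty or is the successor of some y in X."""
    return all((not z) or any(z == successor(y) for y in X) for z in X)

def contains_initial_ordinals(X, n):
    """True if the ordinals 0, 1, ..., n - 1 all belong to X."""
    return all(ordinal(k) in X for k in range(n))

def closed_under_successor(X):
    """True if the successor of every element of X belongs to X."""
    return all(successor(y) in X for y in X)

good = frozenset(ordinal(k) for k in range(4))   # {0, 1, 2, 3}
junk = good | frozenset([frozenset([ordinal(1)])])  # adds the extra element {1}
```

Both sets contain the ordinals 0 to 3, so a successor-style condition cannot distinguish them on this
range, but only good passes the predecessor test. Neither is closed under successors, which is why an
infinite set is needed.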


The author has not yet found any set theory textbook which formulates the infinity axiom in terms of
predecessors as here. E. Mendelson [370], page 175, does define ω in terms of predecessors, but he first
defines general ordinal numbers in the usual way (in terms of transitivity and well-ordering), and he states
the infinity axiom in terms of successors. (See E. Mendelson [370], page 169.)


7.9.4 REMARK: Historical note on the adoption of the predecessor-style infinity axiom in this book. 

Since the author cannot find the version of the infinity axiom which is used in Definition 7.2.4 (8) in the
literature, there is some mystery as to where it could have originated from. After developing all of the
essential theory for this variant definition in Section 12.1, it was perplexing that no reference could be found
for it. Since the variant axiom yielded all of the expected results, although with unexpectedly large effort,
it was surprising that it was even valid.


The infinity axiom in terms of predecessors in Remark 7.9.3, line (7.9.6), which is adopted here as one of the
ZF axioms (Definition 7.2.4 (8)), was added to this book on 2009-1-26, replacing an earlier successor-style
infinity axiom. Infinity axioms are expressed in most textbooks in the successor style as in Remark 7.9.2.


The 2009-1-26 predecessor-style infinity axiom was apparently based on a finite ordinal numbers definition 
which was added to this book on 2005-3-14. That definition, in turn, seems very likely to have been inspired 
by pure intuitive conjecture (i.e. guesswork) by inspecting earlier place-holder text, which was as follows. 


“The finite ordinal numbers are 0 = ∅, 1 = {0}, 2 = {0, 1}, n⁺ = n ∪ {n}, etc.”


The replacement text for this on 2005-3-14 was as follows. 


“The set of finite ordinal numbers is the set ω defined by


x ∈ ω ⇔ ((x = ∅) ∨ (∃y ∈ ω, x = y ∪ {y})). (???)”


The question marks indicated that the author was unsure of the new place-holder definition, suggesting that 
it was a hastily concocted conjecture to be either verified, corrected or deleted later. 


The change from successor axiom to predecessor axiom could plausibly have been influenced by reading 
E. Mendelson [370], page 175, where finite ordinal numbers are defined in a similar way, but the similarity is 
not instantly obvious. Also, Mendelson requires the elements of ω to be a-priori ordinal numbers, whereas
here there is no such a-priori assumption. In other words, his definition only says which ordinals are in ω,
not how the ordinals in ω can be defined directly in terms of ZF without first defining general ordinals. So
it seems unlikely that this was the source. It seems most likely that the form of infinity axiom chosen for 
this book was a “lucky guess”, based on a combination of intuition, optimism and naivety! 


7.9.5 REMARK: Justification of the form of finite ordinal number sets. 

One might reasonably ask why the very general concept of infinite sets is described in ZF set theory in terms 
of the von Neumann ordinal number construction, where each element in the set ω is constructed as x ∪ {x}
from each preceding set x. 


The construction of an infinite set X requires the specification of a new set which is different to all preceding 
sets, for each given subset of X. To be able to state that a set X is infinite, we need to be able to assert 
that no matter how large any subset Y ⊆ X is, we can always generate a set S(Y) which is different to all
elements of Y. So the general requirement is to find a general construction rule which yields a set S(Y) for
any given set Y, such that S(Y) ∉ Y for any set Y.


The construction Y ↦ Y ∪ {Y} happens to satisfy the requirement. To see this, let S(Y) = Y ∪ {Y} and
note that Y ∈ S(Y), but Y ∉ Y because this is forbidden by the regularity axiom. Therefore S(Y) ≠ Y, by
the “substitution of equality” axiom of a first-order language with equality. So each generated set is different
to the previous set.


We also need to prove that S(Y) is different to all preceding elements of the sequence of sets. To show
this, note that Y ⊆ S(Y). In fact, by mathematical induction, it is clear that all generated elements of the
sequence include all previous elements of the sequence. In fact, each element Y of the sequence equals the
union of all elements in the sequence up to and including Y. Therefore the proposition Y ∉ Y also implies
that S(Y) is different to all of the preceding elements of the sequence.


The construction S(Y) = {Y} looks simpler, and the inequality S(Y) ≠ Y is guaranteed by the regularity
axiom. (This construction was presented in a 1908 paper by Zermelo.) One may easily show that the sets
in the sequence ∅, {∅}, {{∅}}, etc., are different by applying the regularity axiom inductively, but such
induction would have to be naive induction since ZF induction is not available before the integers have been
defined. (The Zermelo style of number construction is also discussed in Remark 12.1.15.)
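

The two constructions may be compared for the first few terms. The sketch below checks that each
sequence consists of distinct sets, that S(Y) ∉ Y for the von Neumann rule, and that each von Neumann
term equals the union of the terms up to and including itself. The names von_neumann, zermelo and
iterate are illustrative assumptions.

```python
def von_neumann(y):
    """The rule S(Y) = Y together with Y as a new member."""
    return y | frozenset([y])

def zermelo(y):
    """The rule S(Y) = {Y}."""
    return frozenset([y])

def iterate(step, count):
    """Return the sequence starting at the empty set with count applications of step."""
    seq = [frozenset()]
    for _ in range(count):
        seq.append(step(seq[-1]))
    return seq

vn = iterate(von_neumann, 6)
zr = iterate(zermelo, 6)

vn_distinct = len(set(vn)) == len(vn)
zr_distinct = len(set(zr)) == len(zr)

# The newly generated set is never a member of the set it was generated from.
vn_new_each_time = all(von_neumann(y) not in y for y in vn)

# Each von Neumann term is the union of all terms up to and including itself.
vn_cumulative = all(
    frozenset().union(*vn[: k + 1]) == vn[k] for k in range(len(vn))
)

vn_sizes = [len(s) for s in vn]
zr_sizes = [len(s) for s in zr]
```

The von Neumann terms grow by one element at each step, whereas every Zermelo term after the first
has exactly one element.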


7.10.1 REMARK: The induction principle, name maps and dark numbers.

The principle of mathematical induction requires a name to be assigned to an object in a set, and an induction 
rule is applied to this named object to show that another named object (its successor) is in the set, which 
then shows that there is no maximum element in the set. This is what we mean by an infinite set. However, 
since we cannot name all objects, and we cannot explicitly specify an infinite number of naming maps, it 
is not possible to validate the application of mathematical induction to concrete systems. The concept of 
“the largest nameable number in a set” is as circular as “the largest number which can be written in ten 
words”. When one writes N = “the largest nameable number in set X”, one must adjust one’s naming 
system so that the name N points to the largest nameable number in X. Thus the name map from abstract 
to concrete objects is defined in terms of itself. There is an implicit name-to-object map μ_V : N_V → V here
which maps an abstract name such as N to a concrete number μ_V(N). (See Remark 5.1.4 for notations.)
This is illustrated in Figure 7.10.1. 


[Diagram with the labels “abstract name space” and “modelled concrete numbers”.]


Figure 7.10.1 The naming bottleneck and dark numbers


7.10.2 REMARK: Socio-mathematical network “virtual infinity” interpretation of ZF set theory axioms. 
The set theory axioms in general may be thought of as giving a “social licence” for the creation of sets by 
mathematicians. If a mathematician agrees to only claim the creation of sets which are in conformance with 
the axioms, other mathematicians will not make objections. 


Not all sets which are permitted by ZF set theory actually exist. The ZF axioms are like legislation which 
licenses anyone to construct sets within specified constraints without hindrance. But sets do not exist in any 
real sense. They exist only in the minds and literature and discussions of communities of mathematicians. 
Different communities have different laws about what is permitted. As in the case of legislation in other 
areas of social behaviour, the laws of set theory are chosen so as to help resolve arguments about what 
is right and wrong. The social harmony of “the community” is an important consideration. Clarity and
consistency of the laws tend to minimise social discord and vexatious litigation. Admittance to citizenship
in each mathematical community is subject to acceptance of their constitution and laws. 


Infinite sets, such as the integers, grant a very broad licence to create and discuss numbers. It is not possible 
for a finite number of finite minds to simultaneously contemplate an infinite set of integers, but the infinite 
freedom to create integers ad-libitum is required. This has some similarities to the way computer virtual 
memory works. A computer process may have the freedom to access a thousand times as much memory as 
physically exists in the computer, but the memory is only allocated on demand. When the process wishes 
to write to a block of memory, it is instantiated on demand. Thus any particular portion of the potentially 
infinite virtual memory is accessible, but the whole virtual memory cannot be instantiated simultaneously. 
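

The on-demand character of set creation has a direct analogue in lazy evaluation. In the following sketch,
which uses a hypothetical generator named finite_ordinals, the ordinal numbers are produced one at a
time as they are requested, and the whole infinite set is never instantiated.

```python
from itertools import islice

def finite_ordinals():
    """Yield the von Neumann ordinals 0, 1, 2, ... one at a time, on demand."""
    x = frozenset()
    while True:
        yield x
        x = x | frozenset([x])  # the successor of x

# Only the five requested ordinals are ever constructed.
first_five = list(islice(finite_ordinals(), 5))
sizes = [len(x) for x in first_five]
```

The generator grants an unlimited licence to request further ordinals, but at any time only finitely many
of them have been constructed.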


On-demand set creation is like walking on rolls of red carpet in a desert. You do not need to cover the whole 
desert with a single red carpet. You only need a dozen rolls of carpet and a few helpers. Whenever you 
want to walk in some direction, the helpers roll out a carpet in the direction you want to walk. When the 
carpets are all used, the helpers roll up one of the used carpets and roll it out in front of you. In this way, 
you can walk anywhere in an infinite desert with a small fixed finite number of carpets. (This is similar to 
how the human mind’s short-term consciousness works.) Thus one has the happy illusion that the whole
infinite desert is covered in red carpet. 


The axioms of set theory are like helpers rolling out pieces of red carpet in the desert. They allow you to 
walk in an infinite universe of sets, but you can only walk in a finite area at any one time. Sets do not 
exist until someone thinks of them. Sets and numbers come into existence and persist as long as you want 
them to. But they do not all exist permanently in some universe of sets. Sets and numbers exist only in the 
human socio-mathematical network, and they exist only in the dynamical sense of on-demand set creation. 


The red-carpet-in-the-desert metaphor is related to the way in which computer users may view an infinite 
amount of information on a finite computer screen by scrolling, and one may achieve any level of resolution 
by zooming in. In this way, one has the illusion that the objects being viewed have infinite resolution and 
infinite extent, although the viewing window has a fixed finite number of pixels. 


7.10.3 REMARK: The infinity axiom is less metaphysical than the axiom of choice. 

Whereas the axiom of choice (in Section 7.11) requires faith because it does not give you any set at all, the 
axiom of infinity gives you as much of an infinite set as you wish, and you can employ any means, including 
computers, to generate the elements of an infinite set. It does require some faith to believe that an infinite 
set goes on forever, even though we can never add up all of the terms of a series such as is employed for a 
number such as π. But since we can have as many digits as we wish, it seems to be somewhat metaphysical
to suppose that an infinite set or series stops at some point. No one has ever encountered an infinite series
which reaches its final term. Calculation of the decimal digits of π has never encountered a show-stopper
which cannot be removed by employing larger and faster computers. 


In the case of the infinity axiom, it is very difficult indeed to imagine that there is a point at which the 
integers suddenly stop existing. The induction principle has a very strong intuitive basis indeed. It would 
require enormous faith to believe that there is a point at which the digits of π are exhausted, or to believe
that there is a largest integer beyond which there are no more integers. It is a common riddle for 5-year-old
children to name the largest number. By the age of 10, very few children are in any doubt that numbers 
may be arbitrarily large and that there are infinitely many numbers. 


On the other hand, the axiom of choice delivers precisely nothing. In general, the axiom of choice delivers an 
infinite set of sets if it delivers only one set. But it does not deliver the first set from which the others may 
be constructed. This is because an infinite set of choices can generally be modified in an infinite number of 
ways to generate more choice sets. The ZF axioms can generate infinitely many choice sets if we are delivered 
only one. So it is very frustrating that the first one is never delivered. 


It is tempting to think that it is a small hurdle to obtain the first choice set because it is only a single set. 
If someone tells you that they will give you a billion dollars if they give you even one single dollar, it is very 
tempting to feel cheerful about this. But if the first dollar is impossible to deliver, one must restrain one’s 
irrational exuberance. No technological development in the size and speed of computers will ever deliver the 
first choice set. The axiom of infinity gives you as many dollars as you want. (The axiom of choice also gives 
you as many dollars as you want, but only if it gives you the first dollar, which it never does.) 


It may be concluded, then, that only the most extreme sceptic would reject the infinity axiom, whereas 
enormous trust is required to accept the choice axiom. One could perhaps call the axiom of choice the 
“rabbit out of a hat” axiom. But there is no rabbit! All you get is an IOU for a rabbit! You can only collect 
your rabbit if your internet link has infinite bandwidth and your computer has infinite memory. 


If one takes the view that the axiom of choice is actually false, then all of the theorems which make use of 
sets whose existence is “proved” by invoking the axiom of choice are in reality only asserting properties of 
the empty set. Thus they may be completely correct, but also totally useless. 


As mentioned in Remark 7.12.9, anyone who actually desires metaphysicality in their mathematics may 
console themselves with the more restrained metaphysicality of the infinity axiom. 


7.11. The axiom of infinite choice 


7.11.1 REMARK: Living without the axiom of choice. 
It is assumed in this book that all axioms of infinite choice are false. Choice functions whose existence can 
be proved within ZF set theory are adequate for applicable mathematics. Some abstract theorems which 
require a choice axiom are presented for comparison with the standard literature, but they are tagged so
that they can be avoided, and any theorem which uses a tagged theorem is tagged similarly. The axiom of 
choice is presented here to assist the reader to develop avoidance strategies. In various situations where it is 
customary to invoke a choice axiom, here it is shown instead how to live without it. If someone is wearing 
an imaginary parachute, it’s better to become aware of this before jumping from the plane, even though a 
real parachute is more expensive. 


7.11.2 REMARK: The literature on the axiom of choice. 

The study of the axiom of choice in the 20th century has been deep and broad. Some particularly useful 
books which present and interpret research into the axiom of choice are those by Moore [371], Jech [364], 
and Howard/Rubin [362]. There are also useful chapters or sections on the axiom of choice in books by 
Halmos [357], pages 59-69; Suppes [395], pages 239-253; Bernays [341], pages 133-137, 142-147; Stoll [393], 
pages 111-118, 124-126; Quine [382], pages 217-238; Wilder [403], pages 72-79, 120-141; E. Mendelson [370], 
pages 197-206; Rosenbloom [386], pages 146-151; Roitman [385], pages 79-88; Lévy [368], pages 158-195; 
Cohen [349], pages 136-142; Kelley [101], pages 31-36; EDM2 [113], section 34, pages 153-154; Russell [389], 
pages 117-130; Whitehead/Russell, Volume I [400], pages 500-539; Whitehead/Russell, Volume III [402],
pages 1-17; Smullyan/Fitting [392], pages 200-203, 244-247, 267-283; Pinter [377], pages 110-122.


7.11.3 REMARK: Choosable collections of sets.

The axiom of choice asserts that all pairwise disjoint collections of non-empty sets are choosable in the sense of
Definition 7.11.5. There are many weaker variants of the axiom of choice which assert choosability for special
classes of pairwise disjoint collections of non-empty sets. Theorem 7.11.6 shows that a choosable collection
of sets cannot contain the empty set. So it is not technically necessary to require this in Definition 7.11.5.


7.11.4 DEFINITION: A (set-collection) choice set for a set X is a set C such that ∀A ∈ X, ∃′x ∈ C, x ∈ A.


7.11.5 DEFINITION: A choosable collection (of sets) is a pairwise disjoint collection of sets X which satisfies
∃C, ∀A ∈ X, ∃′x ∈ C, x ∈ A. In other words, it is a pairwise disjoint collection of sets for which a
set-collection choice set exists.


7.11.6 THEOREM: Choosable sets of sets cannot contain the empty set.
Let X be a choosable collection of sets. Then ∅ ∉ X.


PROOF: Let X be a choosable collection of sets. Suppose that ∅ ∈ X. Then there is a set C which satisfies
∀A ∈ X, ∃′x ∈ C, x ∈ A. Let A = ∅. Then A ∈ X. So ∃′x ∈ C, x ∈ A. Therefore ∃x ∈ C, x ∈ A. In other
words, ∃x, (x ∈ C and x ∈ A). So ∃x, x ∈ A, which contradicts the emptiness of A. Hence ∅ ∉ X.
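

For finite collections, the choice set condition can be tested directly. The sketch below reads the condition
as “each set A in X contains exactly one element of C”, which is an interpretation assumed here, and the
function names is_choice_set and is_pairwise_disjoint are hypothetical.

```python
def is_choice_set(C, X):
    """True if every set A in the collection X contains exactly one element of C."""
    return all(len(C & A) == 1 for A in X)

def is_pairwise_disjoint(X):
    """True if no two distinct members of the collection X share an element."""
    members = list(X)
    return all(
        not (a & b) for i, a in enumerate(members) for b in members[i + 1:]
    )

X = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
C = frozenset({2, 3, 4})

X_is_disjoint = is_pairwise_disjoint(X)
C_chooses_from_X = is_choice_set(C, X)

# Adding the empty set to the collection defeats every candidate choice set,
# because the empty set cannot contain an element of any C.
with_empty = X + [frozenset()]
C_chooses_from_with_empty = is_choice_set(C, with_empty)
```

The last test fails for every candidate C, not only for the one shown, since the intersection of any set
with the empty set has no elements.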


7.11.7 REMARK: Using choice sets instead of choice functions for the axiom of choice. 

There are two main (equivalent) approaches to formulating the axiom of choice, in terms of choice functions 
or choice sets. It is more natural to think of choice axioms in terms of choice functions, but functions 
are defined in Chapter 10 in terms of ordered pairs, which adds to the complexity. The option of using 
set-theoretic functions as in Definition 7.7.6 is not feasible because this would require a different kind of 
predicate calculus, where predicates can be subject to quantifiers. (That would be “second-order logic”, 
which is a different kettle of fish.) A second and more important reason to not use set-theoretic functions is 
that there are not enough of them. More precisely, they are capable of representing only finitely specifiable 
rules, whereas sets-of-ordered-pairs functions are almost unlimited. (They may represent rules which require 
an infinite, even uncountably infinite, amount of information to specify.) 


The main advantage of formulating the axiom of choice in terms of choice sets is that it can be stated as 
a low-level set-theoretic formula. The principal disadvantage is that the set-collection to choose from must 
contain pairwise disjoint sets. This constraint is not necessary for choice functions. All in all, it seems best 
to commence with choice sets, such as C in Definitions 7.11.5 and 7.11.10. However, the axiom is still quite 
long when expressed as a pure set-theoretic formula. 


The choice-function approach does not require a choosable set-collection to have the pairwise disjoint property
because the choice function associates each choice of element with the set it has been chosen from. In the
choice-set approach, pairwise disjointness is required because, for example, two of the sets A_1 and A_2 in the
set-collection could be subsets of a third set A_3. Then if A_1 and A_2 are disjoint, they would each contain a
single element of C, but then A_3 would contain two elements of C. (See Figure 7.11.1.)


Figure 7.11.1 Choice set C is impossible if A_1 ∩ A_2 = ∅ and A_1 ∪ A_2 ⊆ A_3
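

The failure for overlapping sets can be confirmed by exhaustive search in a small case. The function
choice_sets and the particular sets used below are assumptions made for this sketch. It suffices to search
among subsets of the union of the collection, because elements outside the union do not affect the
condition.

```python
from itertools import combinations

def choice_sets(X):
    """All subsets C of the union of X which meet each member of X exactly once."""
    universe = sorted(set().union(*X))
    return [
        frozenset(c)
        for r in range(len(universe) + 1)
        for c in combinations(universe, r)
        if all(len(frozenset(c) & A) == 1 for A in X)
    ]

A1 = frozenset({1})
A2 = frozenset({2})
A3 = frozenset({1, 2, 3})  # contains both A1 and A2

overlapping = choice_sets([A1, A2, A3])
disjoint = choice_sets([A1, A2, frozenset({3})])
```

Any candidate C must contain 1 and 2 in order to meet A1 and A2, and then it meets A3 at least twice,
so the first search finds nothing. When the third set is replaced by a set which is disjoint from the
other two, a choice set exists.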


7.11.8 REMARK: Undecidability of the choosability of collections of sets.

Definition 7.11.5 names a property of collections of sets which may not be decidable within ZF set theory.
Each different form of the axiom of choice guarantees choosability of different classes of collections of sets.
This is not a defect in the definition. The existence of the set ω is not decidable without the axiom of infinity,
and the decidability of the existence of the von Neumann stage V_{ω+ω} requires both the axioms of infinity
and replacement, and the decidability of the existence or non-existence of mediate cardinals requires some
form of the continuum hypothesis. So undecidability is just a fact of life.


Definition 7.11.5 is expressed in terms of a proposition of the form ∀X, ∃C, P(X, C) for some set-theoretic
predicate P. The validity of this assertion can be made more likely by either constraining the class of sets 
X for which it is asserted, or else by expanding the universe of sets C with which the proposition P(X,C) 
can be satisfied. For example, if X is constrained to be finite, then the induction principle makes the 
proposition true. But if X is an unrestricted collection of non-empty sets, the universe of sets C must be 
expanded to improve the chances of the proposition being true. Since different models of ZF set theory have 
different universes of sets, it is not surprising that a given class of eligible set-collections may be guaranteed 
to be always choosable in one model, but guaranteed to be not always choosable in another model, and the 
choosability may be unknown or unknowable in yet other models. (See Howard/Rubin [362], pages 91-103, 
145-221, for comprehensive details. Their list includes 95 forms of choice axioms which have not yet been 
shown to be equivalent in ZF set theory, and most of these axioms are known to be independent of most of 
the others. A sample of this complexity is shown in Figure 13.8.1 in Remark 13.8.8.) 


The plausibility of axioms of choice no doubt derives from the fact that in practical mathematics, choosing 
elements of sets according to a finitely expressible rule is always feasible because the sets follow an explicit 
pattern which facilitates the choice of elements according to a rule. 


7.11.9 REMARK: Adding AC to the ZF axioms. 

Definition 7.11.10 adds the axiom of choice to the ZF set theory in Definition 7.2.4. This can be useful when 
courage and moral fortitude fail. (“If it’s too difficult to prove, make it an axiom.” Because axioms don’t 
need to be proved!) The set of axioms ZF plus AC is typically abbreviated as ZFC. (Definition 7.11.10 (9) 
is illustrated in Figure 7.11.2.) 


Figure 7.11.2 Choice of one element $z_i$ from each set $A_i$ of a disjoint collection X


7.11.10 DEFINITION [MM]: Zermelo-Fraenkel set theory with the axiom of choice is Zermelo-Fraenkel set 
theory as in Definition 7.2.4, together with the additional Axiom (9). 


(9) The choice axiom: For every set X, 


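\[
\begin{aligned}
&\bigl((\forall A,\ (A \in X \Rightarrow \exists x,\ x \in A)) \wedge \forall A,\ \forall B,\ ((A \in X \wedge B \in X) \Rightarrow (A = B \vee \neg\exists x,\ (x \in A \wedge x \in B)))\bigr) \\
&\quad \Rightarrow \exists C,\ \forall A,\ (A \in X \Rightarrow \exists z,\ (z \in A \wedge z \in C \wedge \forall y,\ ((y \in A \wedge y \in C) \Rightarrow y = z))).
\end{aligned}
\]

In other words,

\[
\forall X,\ (\emptyset \notin X \wedge \forall A, B \in X,\ (A = B \vee A \cap B = \emptyset)) \Rightarrow (\exists C,\ \forall A \in X,\ \exists' z \in C,\ z \in A). \tag{7.11.1}
\]
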
In other words, every pairwise disjoint collection of non-empty sets is choosable. 


7.11.11 REMARK: Formulation of the axiom of choice in terms of choice functions. 

Although choice functions are the most meaningful and applicable way to express the axiom of choice, 
functions are higher-level structures defined in terms of set products and ordered pairs. Therefore the 
technical discussion of choice functions is delayed to Section 10.3. (See also Remark 7.11.7 for some technical 
advantages and disadvantages of the choice-set and choice-function formulations of AC.) The axiom of choice 
is typically first defined in terms of choice functions, and then it is shown or asserted that this is equivalent
to the choice-set version in Definition 7.11.10.


The set-theoretic formula for the choice-set version of the axiom of choice in Definition 7.11.10 (9) is about 
twice as long as the longest ZF axioms in Definition 7.2.4. Both the complexity, and the fact that AC 
does not specify a set with a proposition of the form $\exists C,\ \forall z,\ (z \in C \Leftrightarrow P(z, X))$ for some set-theoretic
predicate P as the ZF existence axioms do, tend to suggest that AC is not a truly fundamental axiom of set 
theory. Most of the weaker choice axioms, such as countable choice, require even more complexity because 
they are expressed in terms of cardinality, which is at an even higher structural level than functions, and 
this cardinality is often expressed in terms of ordinal numbers or cardinal numbers. Therefore although full 
AC can be expressed in terms of choice functions in Section 10.3, the other forms of choice axioms must 
be delayed even further, which suggests that they are even less fundamental. (For example, the axiom of 
countable choice is presented in Definition 13.7.21.) 


7.11.12 REMARK: Some AC equivalents. 

In the literature, there are many lists of equivalents for the axiom of choice (modulo the ZF axioms). See 
especially the structured list of 74 AC equivalents by Moore [371], pages 330-334, and the 86 AC equivalents 
with source references by Howard/Rubin [362], pages 11-16. See also some shorter lists (including some
proofs) by Suppes [395], pages 243-251; Jech [364], pages 9-11, 183; Lévy [368], pages 158-166; Stoll [393],
pages 116-118; E. Mendelson [370], pages 197-198; Wilder [403], pages 135-139; EDM2 [113], pages 153-154;
Quine [382], pages 220-230. The following are some AC equivalents which are relevant for this book.


(1) Every surjective function has a right inverse. (See Theorem 10.5.17 (i) and Pinter [377], pages 114-115; 
Moore [371], page 333; MacLane/Birkhoff [110], pages 7-9.) 


(2) The well-ordering theorem. This theorem states that any set can be well-ordered if the axiom of choice 
is assumed. (See Section 11.6 for well-ordering.) Zermelo [443] formulated the axiom of choice in 1904 
to provide a means of proving the well-ordering theorem. (See Moore [371], page 330.) 


(3) Trichotomy. For any sets X and Y, either X is equinumerous to a subset of Y, or Y is equinumerous 
to a subset of X. In other words, either there exists a bijection from X to a subset of Y, or vice versa.
In other words, cardinal numbers are totally ordered. (See E. Mendelson [370], page 198; Moore [371], 
page 330.) 


(4) Every set which spans a linear space includes a basis for the space. (See Moore [371], page 332.) 


(5) Tikhonov's compactness theorem. The product of any family of compact topological spaces is compact. 
(Moore [371], page 333.) 
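
Equivalent (1) in the list above can be illustrated in the finite case, where no choice axiom is needed because an explicit selection rule is available. The following sketch is an assumption-laden example rather than anything from the cited sources: the function name `right_inverse`, the squaring map and the "smallest element of each fibre" rule are all chosen here purely for illustration.

```python
def right_inverse(f, domain):
    """Build a dict g with f(g[y]) == y for every y in the image of f on a finite domain."""
    g = {}
    for x in sorted(domain):
        # Keep the first (smallest) x found in each fibre of f.
        # This explicit rule is what replaces an appeal to a choice axiom.
        g.setdefault(f(x), x)
    return g

def square(x):
    return x * x

# Hypothetical example: squaring maps {-3, ..., 3} onto {0, 1, 4, 9}.
g = right_inverse(square, range(-3, 4))
print(g)
```

For an arbitrary infinite surjection there is in general no such rule for picking one element from each fibre, which is where the axiom of choice enters.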


7.11.13 REMARK: Some AC consequences which are “lost” if AC is not accepted. 

It is important to know what one loses by not accepting the axiom of choice (because the gap must be filled 
by other means). The term “AC consequence” means an assertion which can be proved in ZF+AC but not 
in ZF. Similarly, a “CC consequence” means an assertion which can be proved in ZF+CC but not in ZF. 
The following assertions are AC consequences, but not CC consequences, which are relevant to this book. 
(The AC equivalents in Remark 7.11.12 are not repeated here.) 


(1) Every linear space has a basis. (See Theorem 22.7.21 and Jech [364], page 145.) 


(2) Every set is equinumerous to an ordinal number. Therefore the cardinality of all sets is well defined. 
(See E. Mendelson [370], page 200.) 


(3) There is a Lebesgue non-measurable subset of R. (See Jech [364], page 144.) 
(4) The product of a family of compact topological spaces is a compact topological space. (Tikhonov’s
theorem. See Remark 33.5.16.) For a topological space to be compact requires only that a particular kind 
of subcollection of given collections of sets should exist. Compactness does not require the specification 
of any method of determining the contents of these subcollections. This makes the axiom of choice very 
tempting because it asserts the existence of sets, but no method of determining the contents. (See also 
Rudin [130], page 391, for the Tikhonov theorem.) 


(5) Hausdorff's maximality theorem. Every non-empty partially ordered set contains a totally ordered subset
which is maximal with respect to the property of being totally ordered. (See Rudin [130], page 391.)

(6) Every proper ideal in a commutative unitary ring lies in a maximal ideal. (See Rudin [130], page 391.)

(7) The general Hahn-Banach theorem. (See Rudin [130], page 391.)

(8) The Krein-Milman theorem. (See Rudin [130], page 391.)

(9) Urysohn’s lemma. (See Simmons [137], pages 135-137.)

(10) Tietze extension theorem. (See Simmons [137], pages 135-137.)

(11) Generalised Stone-Weierstraß theorem. (See Rudin [130], page 121.)

(12) Stone-Čech compactification. (See Simmons [137], pages 141, 331-332; Rudin [130], page 411.)


The following assertions are CC consequences which are relevant to this book. 


(13) The union of a countable number of countable sets is countable. (See Theorem 13.9.9 and Jech [364], 
page 142.) 

(14) R is not equal to a countable union of countable sets. (Howard/Rubin [362], pages 30, 132.)

(15) Any Dedekind-finite set (Definition 13.10.3) is a finite set as defined by equinumerosity to finite ordinal
numbers. (See Theorem 13.10.11 (iii). It is shown by Jech [364], pages 119-131, that the axiom of countable
choice implies the equivalence of finite and Dedekind-finite sets, but that this finiteness equivalence
does not imply the CC axiom.)


(16) Eight independent definitions of finiteness in ZF are equivalent in ZF+CC. (See Table 13.11.1 in
Remark 13.11.1. Two of these definitions are ordinary finiteness and Dedekind-finiteness.)


(17) Every infinite set has a subset which is equinumerous to the set $\omega$ of finite ordinal numbers. That is, if X is an infinite
set, then $\exists f,\ ((\forall i \in \omega,\ f(i) \in X) \wedge (\forall i, j \in \omega,\ (i = j \Leftrightarrow f(i) = f(j))))$. (See Theorem 13.10.10 and
Jech [364], page 141.)


(18) Every sequentially compact subset of R is closed. (See Remark 35.6.11 and Jech [364], page 142.) 
(19) Every sequentially compact subset of R is bounded. (See Remark 35.6.11 and Jech [364], page 142.) 


These “benefits” of AC and CC must surely be irresistible to any ambitious mathematician who wants to 
succeed in a competitive market. Why would anyone give up such huge benefits if they cost nothing? (One 
possible reason to resist temptation would be that these axioms could be false.) A comprehensive list of 
several hundred consequences of AC, CC, DC and many other AC variants, is given by Howard/Rubin [362], 
pages 9-136. A less formidable list of (almost) irresistible AC consequences (and the bad things which can 
happen without AC) is given by Jech [364], pages 141-166. 


7.11.14 REMARK: Tagging of theorems which require an axiom of choice. 

All “AC-tainted” theorems will be tagged with a warning indication such as “THEOREM [ZF+AC]”. Those
who are comfortable with AC might call such theorems “AC-enhanced” or “AC-enriched”. Theorems which
can be proven with the weaker axiom of countable choice will be tagged as “THEOREM [ZF+CC]” and will
be called “CC-tainted”. (Examples: Theorems 13.9.9 and 13.10.10.) It is very easy to accidentally use AC 
in a proof, especially in topology. Theorems claimed as AC-free in this book may need to be tagged as 
AC-tainted or CC-tainted in a later revision if a requirement for AC is discovered. Caveat lector! 


7.12. Comments on the axioms of infinite choice 


7.12.1 REMARK: The axiom of choice does not specify set membership. 
The ZF existence axioms 2, 3, 4, 5, 6 and 8 in Definition 7.2.4 have the form $\exists X,\ \forall z,\ (z \in X \Leftrightarrow P(z))$
for some predicate P. This form determines set membership explicitly. The axiom of choice is not of
this form. An explicit set-membership specification makes possible the definition of a unique set X whose
members satisfy P, as described in Remark 7.4.3. The axiom of choice does not raise the prospect of any 
such definition. This is why the axiom of choice is of no value in applied mathematics. AC does not tell 
you what the members of a set are. A set is defined by its members. The membership of a set is its only 
property. If the membership is not known, then nothing is known about the set. If an aeroplane pilot asks 
the navigation computer for route details and the response is that infinitely many routes exist, but it can't 
choose one because that would require infinite memory, such a response would be of limited value. Thus the 
axiom of choice is really the axiom of no choice because it delivers nothing except a promise. 


7.12.2 REMARK: Bertrand Russell’s boots-and-socks metaphor for the axiom of choice. 
In 1919, Russell [389], pages 125-127, explained the axiom of choice in terms of a millionaire who was so rich 
that he had a countable infinity of pairs of boots and a countable infinity of pairs of socks. 


Among boots we can distinguish right and left, and therefore we can make a selection of one out 
of each pair, namely, we can choose all the right boots or all the left boots; but with socks no such 
principle of selection suggests itself, and we cannot be sure, unless we assume the multiplicative 
axiom, that there is any class consisting of one sock out of each pair. 


Russell used the term “multiplicative axiom” for the axiom of choice. Referring to the ordering of a countably 
infinite set of pairs of objects, he wrote: 


There is no difficulty in doing this with the boots. The pairs are given as forming an $\aleph_0$, and
therefore as the field of a progression. Within each pair, take the left boot first and the right 
second, keeping the order of the pair unchanged; in this way we obtain a progression of all the 
boots. But with the socks we shall have to choose arbitrarily, with each pair, which to put first; 
and an infinite number of arbitrary choices is an impossibility. Unless we can find a rule for selecting, 
i.e. a relation which is a selector, we do not know that a selection is even theoretically possible. 


Russell then went on to suggest that the difficulty could be resolved by using the location of the centre 
of mass of each sock as a selector. The kind of axiom of choice described in the socks metaphor is for a 
countable sequence of pairs, which is probably the weakest of all choice axioms. 


7.12.3 REMARK: The axiom of choice is a “virtual axiom”. 
According to Moore [371], page 75: 


In their magistral Principia Mathematica, Russell and Whitehead carefully made the Multiplicative 
Axiom an explicit hypothesis for those theorems depending on it, but they remained dubious of its 
truth. 


A very similar approach was taken by Quine [382], page 217, who wrote that 


[...] it is common practice to distinguish between results that depend on the axiom of choice and 
those that do not. Accordingly I shall not adopt the principle as axiom or axiom schema, but shall 
merely use it as premiss where needed. 


This is more or less the approach that is adopted in this book. And in fact, this is de facto the approach 
taken by many authors who are careful to point out each use of AC, as if to alert the reader that each theorem 
which depends on it is dubious. Thus it becomes a kind of “virtual axiom” which comes into existence and 
disappears according to convenience, like putting on and taking off a Santa Claus costume. In other words, 
even when it is used, it is generally not truly believed. 


7.12.4 REMARK: The advantages of “local choice axioms”. 

Even more useful than a global on/off “switchable axiom of choice” as in Remark 7.12.3 is a “locally 
switchable axiom of choice”, which is a condition of the form: “If X has a choice function, then ...”. 
In other words, if a theorem depends on the existence of a choice function for a particular set, this should 
be listed as one of the conditions for the theorem. In this way, if the user of the theorem can prove in ZF 
that such a choice function exists, then the theorem becomes a pure ZF theorem. In ZF+AC set theory, the 
choice-function-existence condition can be removed. 


This choice-function-condition approach may be thought of as a kind of “local axiom of choice” which is 
invoked explicitly as a theorem-condition whenever required. By stating exactly which choice function is 
required in order to make the theorem’s assertion valid, the conditional theorem can be fully valid in ZF 
set theory. This approach has the virtue that all AC-dependent ZF theorems can be converted to pure ZF 
theorems by merely identifying the required choice function(s) in the list of conditions. Hence there is no
need to expurgate AC theorems from the textbooks. Such an approach is often adopted in this book. 
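
The idea of a theorem-condition of the form “if X has a choice function, then ...” has a rough analogue in code: the choice data is passed in as an explicit argument instead of being assumed to exist. The sketch below applies this to the countable union theorem, where the hidden choice is the selection of one enumeration for each set. The names `enumerate_union` and `multiples`, and the example family of sets, are hypothetical and are not taken from this book.

```python
from itertools import count, islice

def enumerate_union(enumeration):
    """Yield enumeration(i)(j) for all pairs (i, j) of non-negative integers.

    `enumeration(i)` must return a surjection from the non-negative integers
    onto the i-th set. Supplying it plays the role of the explicit
    choice-function condition; nothing is chosen inside this function.
    """
    for n in count():             # walk the anti-diagonals i + j == n
        for i in range(n + 1):
            yield enumeration(i)(n - i)

# Hypothetical example: the i-th set is the set of multiples of i + 1.
def multiples(i):
    return lambda j: (i + 1) * j

first = list(islice(enumerate_union(multiples), 10))
print(first)
```

Every element of every set in the family is eventually produced, possibly more than once, so the union is enumerated without any appeal to a choice axiom. The caller carries the whole burden of exhibiting the enumerations.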


In regard to Hilbert’s concern, mentioned in Remark 2.2.12, that Brouwer’s intuitionism would lead to “less 
results” by forbidding certain kinds of mathematical argument, the “local axiom of choice” idea does not 
reduce the number of results. It merely replaces the completely general axiom of choice as a pre-condition 
with a very much more specific pre-condition that a particular kind of choice function must be known to 
exist. Thus blunt theorems are replaced by sharper theorems which identify their conditions more precisely. 
This increases the amount of information contained in such theorems, and the writer of each theorem must 
exercise some care to determine which sets require choice functions. Usually this can be achieved by merely 
inspecting the proof to find exactly where AC has been invoked. If this is not easy to determine, probably 
the proof should be rewritten for greater clarity. (See also Remark 13.8.14 regarding “AC shim theorems”.)


Of some relevance to Hilbert’s concern to obtain “more results”, Lévy [368], page 159, made the following 
allusion to the general motivation for adopting the axiom of choice. 


What leads mathematicians to adopt the axiom of choice as an axiom of set theory, in addition 
to the opportunistic reason that it enables them to prove many theorems, is the following 
consideration. [...] 


Every AC-dependent theorem may be decomposed into a choice-function-conditioned ZF theorem plus an 
“AC shim theorem”. Therefore all AC-dependent mathematical literature may be rewritten in terms of 
AC-free theorems (which explicitly state their choice-function requirements), plus AC-dependent automatic 
corollaries which assert exactly the same propositions, but omitting the choice-function pre-conditions. The 
AC-dependent corollaries make exactly the same assertions as were made before the decomposition. So 
no theorems are lost. But as a bonus, one obtains a pure-ZF theorem corresponding to each original 
ZF+AC theorem (which now becomes a trivial corollary). So this kind of ZF/AC theorem decomposition 
doubles the total count of theorems! Moreover, the pure-ZF theorems contain valuable information about 
the choice-function dependencies. This ZF/AC decomposition approach clearly has many advantages and 
no disadvantages except that a small amount of thinking is required. 


When a body of mathematical literature has been decomposed into pure-ZF theorems and AC corollaries, 
it becomes clear that the axiom of choice is vacuous. The AC corollaries state merely that their author has 
faith that the required choice functions can be obtained somehow, either from a valid ZF proof or by an act 
of pure will. The difference between the literature before and after ZF/AC decomposition is a set of trivial 
AC corollaries which have no real content except that they advertise the faith of their author. 


7.12.5 REMARK: The axiom of choice is mathematically false, even when it is metamathematically true. 
Since the axiom of choice is equivalent to the assertion that all sets can be well-ordered, it can be refuted 
by showing that some sets cannot be well ordered. (See Section 11.6 for well-orderings.) The abstract of 
a 1964 article by Feferman [411], page 325, seems to clinch the issue. (This quotation is also discussed in 
Remark 22.7.23.) 


No set-theoretically definable well-ordering of the continuum can be proved to exist from the 
Zermelo-Fraenkel axioms together with the axiom of choice and the generalized continuum 
hypothesis. 


If the axiom of choice were valid, every set would have a well-ordering. But no “set-theoretically definable 
well-ordering” can be “proved to exist”. In other words, with finite bandwidth, it will never be possible to
write down such a thing nor even contemplate it. An axiom which asserts a proposition which is known to be 
factually false seems to be a somewhat unwise axiom. One could argue that the well-orderability of the real 
numbers is valid metamathematically in a larger system within which ZF is modelled, but that well-ordering 
is not accessible within the language. 


If one adopts some constructible universe model for Zermelo-Fraenkel set theory so that the axiom of choice 
is true for this model, the universe of all sets becomes countable metamathematically. If it were true that 
the set of all real numbers was countable within the theory, this would contradict the Cantor diagonalisation 
procedure which shows that the set of real numbers cannot be countable. (See Theorem 13.1.27 (v).) Since 
the well-orderability of the real numbers in the constructible universe follows from the countability of the 
set of real numbers in the model, it is not possible to say within the ZF theory that the real numbers are 
well-orderable, nor that they are countable, unless one states this simply as an axiom. 
Another way to put this is that if a well-ordering of the real numbers existed, one would not need any
axiom of choice to assert its existence. For example, it is easily shown that the non-negative integers are
well ordered. There is no need of an axiom of choice in this case. If a well-ordering of the real numbers
really existed, why should an axiom be invented to support the assertion? This would be relying on ad-hoc
hypotheses when no genuine proof is possible. As Newton [294], page 530, once wrote: “hypotheses non
fingo”. In other words, one should not make up metaphysical stories which are not justified by the evidence.


Rationem vero harum gravitatis proprietatum ex phaenomenis nondum potui deducere, & hypotheses 
non fingo. Quicquid enim ex phaenomenis non deducitur, hypothesis vocanda est; & hypotheses seu 
metaphysicae, seu physicae, seu qualitatum occultarum, seu mechanicae, in philosophia experimentali
locum non habent. In hac philosophia propositiones deducuntur ex phaenomenis, & redduntur 
generales per inductionem. 


This may be translated as follows.


I have not yet been able to deduce the true reason of these properties of gravity from phenomena, and 
I do not contrive hypotheses. Indeed whatever is not deduced from phenomena should be called a 
hypothesis; and hypotheses, be they metaphysical or physical, of occult qualities or mechanical, have 
no place in experimental philosophy. In this philosophy, propositions are deduced from phenomena,
and are made general by induction. 


This kind of thinking may equally be applied to mathematics. The rules of mathematics should be inferred 
from everyday mathematical practice, not drawn from the imagination. If no well-orderings for the real 
numbers are observed in real-world mathematics, it seems somewhat arbitrary to suppose that they “exist” 
in some metaphysical universe to which there is no possible access. 


7.12.6 REMARK: Metamathematical choosability does not guarantee mathematical choosability. 

It could be argued, in favour of the axiom of infinite choice, that it is completely acceptable because in various 
well-known constructible universe models for ZF set theory, the class of all sets is countable and therefore 
well-orderable, and hence the class of all sets possesses a choice function. This kind of argument unfortunately 
also implies that the class of all sets is countable because metamathematically this class is countable in a 
constructible model. This seems to contradict Cantor's diagonalisation argument, which demonstrates that 
the set of real numbers, or equivalently the set P(w), is uncountable. (See Remark 13.7.23 for further 
comment on the apparently paradoxical countability of the constructible real numbers.) This apparent 
contradiction comes under the heading of Skolem’s paradox. (See for example Wilder [403], pages 237-238; 
Moore [371], pages 252, 261; Kleene [366], pages 323-324; Stoll [393], pages 453-454; Körner [461], page 154;
Kleene [365], page 427; EDM2 [113], pages 615-616; Curry [350], pages 6-7; Mendelson [370], pages 183-184; 
Bell/Slomson [339], page 306.) 


In the same way that the metamathematical countability of the real numbers in a constructible ZF model 
cannot be imported into the ZF set theory first-order language because an enumeration function which maps 
the integers to the class of all sets cannot be written as a finite formula within the language, a choice function 
for the class of all sets in a constructible ZF model cannot be imported into ZF set theory as a finite formula 
in its first-order language. 


        metamathematics                     mathematics
        ZF or ZF+AC countable model         ZF or ZF+AC first-order language

        R is countable.                     R is not countable.
                                            Reason: Cantor [408]. 1891.

        R is well-orderable.                R is not well-orderable.
                                            Reason: Feferman [411]. 1964.


Thus there is an apparent contradiction here which is analogous to Skolem’s paradox. (One could perhaps 
refer to it as “Feferman’s paradox”.) The metamathematical choosability of the class of all ZF sets in a 
constructible model implies that the axiom of choice is metamathematically consistent with ZF set theory, 
but it does not imply that the ZF set-universe has a choice function which can be written with ZF set theory’s 
first-order language. If one were to assert that there exists a choice function for every ZF set, one would 
be equally justified in asserting that there is a countable enumeration of every ZF set, which is clearly not 
acceptable in view of Cantor’s well-established diagonalisation demonstration of the uncountability of P(w) 
and R. (See Theorem 13.1.27 (v).) One may therefore conclude that AC is metamathematically consistent
with ZF, but that the axiom of choice is false within ZF set theory itself. (See Remark 7.12.5 for the falsity 
of the axiom of choice.) 


To put it another way, the existence of a well-ordering for the real numbers in the case of a countable ZF 
model consisting of constructible sets applies only to the constructible real numbers. If these are the only 
real numbers, then the set of all real numbers is well-ordered. But since there are only countably many 
constructible real numbers, and Cantor's diagonalisation argument demonstrates the uncountability of the 
real numbers, it follows that the well-ordering imported from the model does not in fact apply to all real 
numbers. Thus the price of the well-ordering of all real numbers is their countability. 


7.12.7 REMARK: How the choice-axiom bug became a “feature”. 

Choice functions whose existence is asserted by a choice axiom are like “pixies at the bottom of the garden”. 
You know they’re there, but you never see them! A set which “exists” only by virtue of an axiom of 
choice has unknown contents. (A good example of this is a Lebesgue non-measurable function. The author 
discovered this issue as an undergraduate when he tried to write a computer program to print a graph of 
such a function to see what it looked like.) Some mathematicians feel quite comfortable with this while 
others find the discomfort intolerable. E. Mendelson [370], page 197, said: 


The Axiom of Choice is one of the most celebrated and contested statements in the theory of sets.


Struik [249], page 201, said the following about the 1904 proof of the well-ordering theorem by Zermelo [443].


Since Zermelo based his proof on the Auswahlaxiom, which states that from each subset of a given 
set one of its elements can be singled out, mathematicians differed on the acceptability of a proof 
where no constructive procedure could be given for the finding of such an element. Hilbert and 
Hadamard were willing to accept it; Poincaré and Borel were not. 


Most modern mathematicians have lost interest in the question because it has been known since 1938 that 
AC is relatively consistent with the ZF axioms. AC also makes very numerous questions very much simpler. 
AC is convenient, and is apparently not logically inconsistent with ZF. So most mathematicians acquiesce, 
accepting an apparently vacuous axiom because it makes some theorems more elegant and easier to prove. 


The axiom of choice is untestable because it gives you sets which you can never see. Just as Ockham’s razor 
is invoked in science to eliminate arbitrary and untestable theorising, so also should the axiom of choice 
be rejected in mathematics. Real mathematicians solve problems by constructing solutions, not by pulling 
them out of a hat. 


When some anti-AC mathematicians discovered that they had implicitly relied upon choice axioms in their 
own publications for the “construction” of choice functions, this was understandably embarrassing. The way 
in which the acceptance of AC developed in the early 20th century is reminiscent of the attitude which is 
sometimes adopted in the computer software world, where a design blunder or implementation error which 
is difficult to fix is marketed as a “feature”. Since it was too onerous, and embarrassing, to remove or disown
all of the published “results” which tacitly invoked AC, the axiom transmuted from a bug into a feature. As 
mentioned in Remark 7.11.9, “if it can’t be proved, make it an axiom”. The progress of the “controversy” 
is well documented by Moore [371], including this comment on page 64. 


It is a historic irony that many of the mathematicians who later opposed the Axiom of Choice 
had used it implicitly in their own researches. This occurred chiefly to those who were pursuing 
Cantorian set theory or applying it in real analysis. At the turn of the century such analysts included 
René Baire, Emile Borel, and Henri Lebesgue in France, as well as W. H. Young in Germany. [....] 
At times these various mathematicians used the Assumption in the guise of infinitely many arbitrary 
choices, and at times it appeared indirectly via the Countable Union Theorem and similar results. 
Certainly the indirect uses did not indicate any conscious adoption of non-constructive premises. 


The 1938 relative consistency proof by Gödel [419] showed that AC was in some sense harmless. But the
real harm lies in the diversion of much human creativity into a fruitless metaphysical domain. If the bug 
had been taken more seriously ten or twenty years earlier, it could have been eradicated. Now it is too late. 
Abstention from choice axioms is now almost as difficult and rare as vegetarianism amongst the Eskimo. 


Also of interest in this connection is the following statement by Fraenkel in the historical introduction to a 
1958 book by Bernays [341], page 20. 
[....] to-day, after more than fifty years, there is still a lot of controversy regarding the Axiom
of Choice, not only among mathematicians in general but in particular among those working in 
set theory. (Intuitionists, of course, reject the axiom because of its purely existential character — 
except for Poincaré who was ready to admit it.) [....] Some authors, e.g. Borel and Denjoy, think 
the axiom rather acceptable if t is a denumerable set, while the scepticism of Lebesgue and others 
makes no distinction between denumerable and non-denumerable t. 


7.12.8 REMARK: Axiom of choice scepticism. 
Penrose [297], pages 14-15, makes the following sceptical comment about the axiom of choice in the context
of the subjectivity or otherwise of the Wiles proof of Fermat’s last theorem.


It should perhaps be mentioned that, from the point of view of mathematical logic, the Fermat 
assertion is actually a mathematical statement of a particularly simple kind, whose objectivity 
is especially apparent. Only a tiny minority of mathematicians would regard the truth of such 
assertions as being in any way 'subjective'—although there might be some subjectivity about the 
types of argument that would be regarded as being convincing. However, there are other kinds 
of mathematical assertion whose truth could plausibly be regarded as being a ‘matter of opinion’. 
Perhaps the best known of such assertions is the axiom of choice. [...] Most mathematicians would 
probably regard the axiom of choice as ‘obviously true’, while others may regard it as a somewhat 
questionable assertion which might even be false (and I am myself inclined, to some extent, towards 
this second viewpoint). Still others would take it as an assertion whose ‘truth’ is a mere matter of 
opinion or, rather, as something which can be taken one way or the other, depending upon which 
system of axioms and rules of procedure [...] one chooses to adhere to. 


Penrose [297], pages 366-367, makes the following further comment in the context of the comparability of 
cardinal numbers. 


We may ask whether there are pairs of cardinals $\alpha$ and $\beta$ for which neither of the relations $\alpha \le \beta$ 
and $\beta \le \alpha$ holds. Such cardinals would be non-comparable. In fact, it follows from the assumption 
known as the axiom of choice [...] that non-comparable cardinals do not exist. 


The axiom of choice asserts that if we have a set A, all of whose members are non-empty sets, then 
there exists a set B which contains exactly one element from each of the sets belonging to A. It 
would appear, at first, that the axiom of choice is merely asserting something absolutely obvious! 
[...] However, it is not altogether uncontroversial that the axiom of choice should be accepted as 
something that is universally valid. My own position is to be cautious about it. The trouble with 
this axiom is that it is a pure ‘existence’ assertion, without any hint of a rule whereby the set B 
might be specified. In fact, it has a number of alarming consequences. One of these is the Banach- 
Tarski theorem, one version of which says that the ordinary unit sphere in Euclidean 3-space can 
be cut into five pieces with the property that, simply by Euclidean motions (i.e. translations and 
rotations), these pieces can be reassembled to make two complete unit spheres! The ‘pieces’, of 
course, are not solid bodies, but intricate assemblages of points, and are defined in a very non- 
constructive way, being asserted to ‘exist’ only by use of the axiom of choice. 


Probably anyone who applies mathematics to the real world, for example to physics, would tend to be an 
AC-sceptic. No choice axiom is required for mathematics if existence of choice functions is incorporated into 
definitions and theorems as conditions rather than assuming that they will appear by magic. 


Axioms should in principle be propositions about which there is not the slightest doubt. Since breathtakingly 
high towers of logic are built upon the foundations provided by axioms, even the slightest doubt about their 
veracity should be taken as a powerful warning to not rely upon them. The axiom of choice has been the 
subject of doubt by very numerous authors. This axiom is almost always introduced as a special kind of 
axiom with some kind of mystery attached to it. Therefore it would seem inimical to the spirit of mathematics 
to adopt a dubious proposition as one of the central pillars of the entire mathematical edifice. 


Mathematics is more than just a symbol manipulation game with arbitrary rules. Mathematical propositions 
have meanings. Even though AC may be independent of ZF, this does not mean that its truth value can 
be arbitrarily assigned, just as we do not arbitrarily assert that “the Moon is made of cheese” is true or 
false. When an axiom asserts that something exists, the existential quantifier “$\exists$” is supposed to mean 
that something does actually exist, not merely that contradictions cannot arise from the assertion. Since 
well-orderings for the set of real numbers cannot be written within ZF set theory, it seems absurd to assert 
that they can be written. 


In the middle of the 20th century, the notion that mathematics is merely a symbol manipulation game where 
the symbols have no meaning did in fact gain some popularity. Certainly the validity of argumentation should 
not depend on the meanings of propositions, and intuition should not enter into purely logical argumentation. 
However, in the choice of axioms and inference rules, certainly it is intuition and real-world facts which must 
determine what is asserted and what is not. If ZF set theory is intended to realistically represent what real 
mathematicians think and do, then surely the axioms must be chosen to be valid beyond doubt, after which 
all argumentation on the basis of the axioms should be mechanical, utilising intuition only for discovering 
proofs, not for justifying argumentation steps. Thus it is as important to apply intuition to the design of 
axioms as it is to exclude intuition from logical inference. 


Stoll [393], page 403, made the following comment regarding David Hilbert’s view on the controversial nature 
of the axiom of choice. 


Hilbert took the position that a metatheory (that is, the study of a formal theory in the meta- 
language selected) should have the following form. First of all, it should belong to intuitive and 
informal mathematics; thus, it is to be expressible in ordinary language with mathematical symbols. 
Further, its theorems (that is, the metatheorems of the formal theory) must be understood and the 
deductions must carry conviction. To help ensure the latter, all controversial principles of reasoning 
such as the axiom of choice must not be used. 


Since Hilbert was the principal advocate of formalism, as opposed to intuitionism, it is interesting that even 
he considered the axiom of choice to lack conviction. 


7.12.9 REMARK: Mathematics without AC would be reduced to a collection of algorithms. 
Moore [371], page 1, says the following regarding one motivation for accepting the axiom of choice. 


Rarely have the practitioners of mathematics, a discipline known for the certainty of its conclusions, 
differed so vehemently over one of its central premises as they have done over the Axiom of Choice. 
Yet without the Axiom, mathematics today would be quite different. The very nature of modern 
mathematics would be altered and, if the Axiom's most severe constructivist critics prevailed, 
mathematics would be reduced to a collection of algorithms. 


There are some difficulties with this point of view. First, the form of this argument resembles the assertion 
that life without Father Xmas would be empty and meaningless, and therefore Father Xmas must exist. The 
search for truth does not always bring happiness. At best it brings truth, and sometimes the truth is painful. 
The truth regarding any question is not necessarily the solution which gives the greatest happiness. 


Second, mathematics without AC certainly is more than a mere “collection of algorithms”. Every concept 
in plain Zermelo-Fraenkel mathematics (with or without AC) corresponds to concepts and activities inside 
the minds of mathematicians and the users of mathematics. In other words, there is an ontology underlying 
mathematics. In the same way that the formulas of electronics engineering refer to real-world electronic 
components, the formulas of mathematics are connected with a very wide range of world-models in a very 
wide range of disciplines. Pure mathematics is concerned with the abstract study of mathematical tools 
and "algorithms", but in application contexts, the abstract concepts come to life. Denying oneself a highly 
metaphysical concept such as the axiom of choice has no practical consequences for applied mathematics. 


This may be contrasted with the infinity axiom, without which the majority of non-trivial theorems of modern 
pure mathematics would be lost. As noted in Remark 7.10.3, the infinity axiom is less metaphysical than 
the choice axiom, but the infinity axiom is nevertheless metaphysical. Therefore anyone who laments the 
reduction of mathematics “to a collection of algorithms” may content themselves with the more restrained 
metaphysicality of the infinity axiom, which has the advantage of being almost indispensable to modern 
mathematical thought, whereas the choice axiom is eminently dispensable. 


7.12.10 REMARK: No one worries about AC any more, but (almost) everyone points out where it is used. 
Halmos [357], page 60, wrote the following comment in 1960 regarding the history of trying to avoid the 
axiom of choice. 


It used to be considered important to examine, for each consequence of the axiom of choice, the 
extent to which the axiom is needed in the proof of the consequence. An alternative proof without 
the axiom of choice spelled victory; a converse proof, showing that the consequence is equivalent to 
the axiom of choice (in the presence of the remaining axioms of set theory) meant honorable defeat. 
Anything in between was considered exasperating. 


However, truth is not decided by majority vote. Intellectual fashions come and go. A few years earlier, in 
the 1956 second edition of a book on real analysis, Graves [85], pages 321-322, made the following related 
comment, saying that AC was well accepted, but that many mathematicians do point out where it is used. 


The axiom of choice and proofs by transfinite induction have been the subject of serious controversy 
among mathematicians. Some mathematicians have made strenuous efforts to avoid use of such 
logical tools, and some even regard as doubtful all proofs of existence which give no means of 
identifying an object whose existence is asserted. An existence proof depending on the axiom of 
choice is necessarily of this character, and such proofs enter into many parts of mathematics. Since 
the axiom of choice has a way of slipping into proofs without being noticed, it is well for the student 
to become thoroughly familiar with its various forms. The consistency of this axiom with the other 
axioms of set theory has been demonstrated by Gödel. Its use in mathematics is now generally 
regarded as well justified, but some mathematicians make a practice of systematically pointing out 
the occasions of its use. 


This comment seems to indicate somewhat more wariness of the axiom of choice. Even until the current 
time, most books on analysis which are not at the most elementary level point out where the axiom of choice 
is needed, at least much of the time. On this topic, Willard [165], page 9, wrote in 1970 that the axiom of 
choice “is assumed by most mathematicians when they need it, to the unremitting disgust of a few.” He 
then wrote the following comment. 


It bothers some people because it asserts the existence of a set [...] without giving enough infor- 
mation to determine that set uniquely (by applying a finite number of rules), and it is the only 
formal set-theoretic axiom which does this. For this reason it is customary to mention the axiom 
of choice whenever it is used. 


The strong tendency of mathematicians to think of the axiom of choice as “well justified”, “intuitive”, or 
“obviously true”, may perhaps be attributed to confusion between implicit and explicit existence of families 
of enumeration functions. (See Remark 13.8.1 for further comments on this.) 


Chapter 8 


SET OPERATIONS AND CONSTRUCTIONS 


8.1 Binary set unions and intersections 
8.2 The set complement operation 
8.3 The symmetric set difference operation 
8.4 Unions and intersections of collections of sets 
8.5 Power sets 
8.6 Closure of set unions under arbitrary unions 
8.7 Set covers and partitions 
8.8 Specification tuples and object classes 


8.0.1 REMARK:  Theorems for set operations and set constructions are numerous and (potentially) tedious. 
The plethora of tiny propositions (and their sometimes tedious proofs) in Sections 8.1, 8.2, 8.3 and 8.4 may 
seem monotonous and unenlightening reading material, but skill in the correct manipulation of fundamental 
set operations is essential in order to avoid error at crucial points in proofs of more interesting and uplifting 
theorems. It is therefore beneficial to attempt to prove all of the assertions in these sections without looking 
at the proofs provided, at least once in one's life. A possibly better option is to refer to these propositions as 
needed, and then verify the proofs before they are applied to ensure that one is not “benefiting” from false 
theorems! 


An alternative to logical argumentation to prove propositions about the unary and binary set operations in 
Sections 8.1, 8.2 and 8.3 is to use truth tables. Membership of each set corresponds to a logical proposition 
with two possible truth values, and each unary or binary set operation corresponds to a unary or binary logical 
operator. There is a one-to-one correspondence between propositional calculus tautologies and identities for 
simple set-operation expressions. A much easier option is to use boolean arithmetic, representing each set 
by a single-bit boolean variable. The “proofs” of set-expression identities in Chapter 8 are informal. There 
is no attempt to apply formal methods here because the theorems are all of a very elementary character. 
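
The single-bit boolean approach mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used for the proofs in this chapter. The helper name `is_identity` is hypothetical. Each boolean argument stands for the proposition "x is a member of that set", so union, intersection and relative complement become `or`, `and` and `and not`.

```python
from itertools import product

def is_identity(lhs, rhs, num_sets):
    """Return True if lhs and rhs agree for every membership pattern.

    Each argument passed to lhs/rhs is a bool meaning "x belongs to that set".
    (Hypothetical helper, introduced here for illustration only.)
    """
    return all(
        lhs(*bits) == rhs(*bits)
        for bits in product((False, True), repeat=num_sets)
    )

# Distributivity of union over intersection: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
distributive = is_identity(
    lambda a, b, c: a or (b and c),
    lambda a, b, c: (a or b) and (a or c),
    3,
)

# Absorption of union over intersection: A ∪ (A ∩ B) = A.
absorption = is_identity(lambda a, b: a or (a and b), lambda a, b: a, 2)

# A non-identity for comparison: A \ B = B \ A fails in general.
difference_symmetric = is_identity(
    lambda a, b: a and not b,
    lambda a, b: b and not a,
    2,
)

print(distributive, absorption, difference_symmetric)  # True True False
```

A check of this kind covers only identities between set-operation expressions. Statements which involve set inclusion as a hypothesis or conclusion are not pointwise in the same way, and need a different treatment.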


Most of the theorems in Sections 8.1, 8.2 and 8.3 are limited to three sets, and yet the complexity of the 
statements and proofs of the theorems is somewhat burdensome. The reader should try to imagine the 
complexity which would arise for four or five sets or more. The reason for the complexity becomes clearer 
when one considers that there are $2^{2^n-1}$ possible sets which can be constructed from $n$ sets for $n \in \mathbb{Z}_0^+$, 
which generates the sequence 1, 2, 8, 128, 32768, 2147483648, 9223372036854775808 (for 6 sets), and so forth. 
Each of these sets can be represented by an infinite number of set-operation expressions. So the possibilities 
for discovering set-expression identities are endless. Therefore this chapter presents only a selection of useful 
identities. The infinite sets of sets which are permitted by the general unions and intersections in Section 
8.4 open up an enormous new can of worms. It becomes rapidly clear that elementary set algebra is not a 
walk in the park. 
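
The growth of this count is easy to reproduce. The sketch below simply evaluates the formula $2^{2^n-1}$. The function name is hypothetical, and the comment in the code gives one possible reading of the formula (an interpretation which is not spelled out in the text): $n$ sets in general position divide their union into $2^n - 1$ regions, and each constructible set is a union of some of those regions.

```python
def count_constructible_sets(n):
    """Evaluate 2^(2^n - 1) for n sets.

    Assumed interpretation: n sets in general position split their union
    into 2^n - 1 regions, and every constructible set is a union of regions.
    """
    regions = 2**n - 1
    return 2**regions

sequence = [count_constructible_sets(n) for n in range(7)]
print(sequence)
# [1, 2, 8, 128, 32768, 2147483648, 9223372036854775808]
```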


8.1. Binary set unions and intersections 


8.1.1 REMARK: Unions and intersections of sets are well defined. 
The union of any two sets is well defined by the union Axiom (4) in Definition 7.2.4. The intersection of 
any two sets is well defined by the replacement Axiom (6). (For completeness and tidiness, Definition 8.1.2 
and Notation 8.1.3 duplicate the earlier Definition 7.6.19 and Notation 7.6.20 for binary set-union and 
Definition 7.7.20 and Notation 7.7.21 for binary set-intersection.) 


8.1.2 DEFINITION: 
The union of sets $A$ and $B$ is the set $\{x;\ x \in A \vee x \in B\}$. 
The intersection of two sets $A$ and $B$ is the set $\{x;\ x \in A \wedge x \in B\}$. 


8.1.3 NOTATION: 
$A \cup B$ for sets $A$ and $B$ denotes the union of $A$ and $B$. 
$A \cap B$ for sets $A$ and $B$ denotes the intersection of $A$ and $B$. 


8.1.4 THEOREM: Some very basic properties of set union and intersection. 
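(i) $A \cup \emptyset = A$ for any set $A$. 
(ii) $A \cap \emptyset = \emptyset$ for any set $A$. 
(iii) $A \subseteq B \Leftrightarrow \forall x,\ (x \in A \Leftrightarrow (x \in A \wedge x \in B))$ for any sets $A$ and $B$. 
(iv) $A \subseteq \emptyset \Rightarrow A = \emptyset$ for any set $A$. 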


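PROOF: To prove part (i), note that 

\begin{align*}
x \in A \cup \emptyset &\Leftrightarrow (x \in A \vee x \in \emptyset) \\
&\Leftrightarrow x \in A. \tag{8.1.1}
\end{align*}

Line (8.1.1) follows from Remark 4.7.8 and the fact that $\forall x,\ x \notin \emptyset$ by the definition of an empty set. 
To prove part (ii), note that 

\begin{align*}
x \in A \cap \emptyset &\Leftrightarrow (x \in A \wedge x \in \emptyset) \\
&\Leftrightarrow x \in \emptyset. \tag{8.1.2}
\end{align*}

Line (8.1.2) follows from Remark 4.7.8 and the fact that $\forall x,\ x \notin \emptyset$ by the definition of an empty set. 
To prove part (iii), note that 

\begin{align*}
A \subseteq B &\Leftrightarrow \forall x,\ (x \in A \Rightarrow x \in B) \\
&\Leftrightarrow \forall x,\ (x \in A \Rightarrow (x \in A \wedge x \in B)) \\
&\Leftrightarrow \forall x,\ (x \in A \Leftrightarrow (x \in A \wedge x \in B)).
\end{align*}

To prove part (iv), let $A$ be a set which satisfies $A \subseteq \emptyset$. Suppose that $A \ne \emptyset$. Then $x \in A$ for some $x$ by 
Definition 7.6.3. So $x \in \emptyset$ by Notation 7.3.3 and Definition 7.3.2. But this contradicts Definition 7.6.3 for 
the empty set. Hence $A = \emptyset$. 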


8.1.5 REMARK: The logical equivalent of a set-operation theorem. 
Theorem 8.1.4 (iii) is equivalent to the tautology $(\alpha \Rightarrow \beta) \Leftrightarrow (\alpha \Leftrightarrow (\alpha \wedge \beta))$ with the meanings $\alpha$ = “$x \in A$” 
and $\beta$ = “$x \in B$”. 


8.1.6 THEOREM: Some fundamental properties of union and intersection. 
The following identities hold for all sets A, B and C. 


(i) $A \cup B \supseteq A$. 
(ii) $A \cup B \supseteq B$. 
(iii) $A \cap B \subseteq A$. 
(iv) $A \cap B \subseteq B$. 
(v) $A \cup B = B \cup A$. [Commutativity of union] 
(vi) $A \cap B = B \cap A$. [Commutativity of intersection] 
(vii) $A \cup A = A$. 
(viii) $A \cap A = A$. 
(ix) $A \cup (B \cup C) = (A \cup B) \cup C$. [Associativity of union] 
(x) $A \cap (B \cap C) = (A \cap B) \cap C$. [Associativity of intersection] 
(xi) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$. [Distributivity of union over intersection] 
(xii) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. [Distributivity of intersection over union] 
(xiii) $A \cup (A \cap B) = A$. [Absorption of union over intersection] 
(xiv) $A \cap (A \cup B) = A$. [Absorption of intersection over union] 


PROOF: These formulas follow from the corresponding formulas for the logical operators $\vee$ and $\wedge$. 


Part (i) follows from Theorem 4.7.6 (vi). Part (ii) follows from Theorem 4.7.6 (vii). Part (iii) follows from 
Theorem 4.7.6 (iii). Part (iv) follows from Theorem 4.7.6 (iv). Part (v) follows from Theorem 4.7.11 (i). 
Part (vi) follows from Theorem 4.7.11 (ii). Part (vii) follows from Theorem 4.7.11 (iii). Part (viii) follows from 
Theorem 4.7.11 (iv). Part (ix) follows from Theorem 4.7.11 (v). Part (x) follows from Theorem 4.7.11 (vi). 
Part (xi) follows from Theorem 4.7.11 (vii). Part (xii) follows from Theorem 4.7.11 (viii). Part (xiii) follows 
from Theorem 4.7.11 (ix). Part (xiv) follows from Theorem 4.7.11 (x). 


8.1.7 THEOREM: Some more properties of union and intersection. 
Let A, B and C be sets. 
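(i) $A = B \Leftrightarrow (A \subseteq B \text{ and } B \subseteq A)$. 
(ii) $A \subseteq B \Rightarrow (A \cup C) \subseteq (B \cup C)$. 
(iii) $A \subseteq B \Rightarrow (A \cap C) \subseteq (B \cap C)$. 
(iv) $A \subseteq B \Leftrightarrow A \cup B = B$. 
(v) $A \subseteq B \Leftrightarrow A \cap B = A$. 
(vi) $A \cap B \subseteq A \cup B$. 
(vii) $(A \cup B) \cup (A \cap B) = A \cup B$. 
(viii) $(A = B \cup C) \Rightarrow (A = B \Leftrightarrow C \subseteq B)$. 
(ix) $(A \subseteq C \wedge B \subseteq C) \Leftrightarrow (A \cup B \subseteq C)$. 
(x) $(A \subseteq C \wedge B \subseteq C) \Rightarrow (A \cap B \subseteq C)$. 
(xi) $(A \subseteq C \vee B \subseteq C) \Rightarrow (A \cap B \subseteq C)$. 
(xii) $(A \subseteq B \wedge A \subseteq C) \Leftrightarrow (A \subseteq B \cap C)$. 
(xiii) $(A \subseteq B \wedge A \subseteq C) \Rightarrow (A \subseteq B \cup C)$. 
(xiv) $(A \subseteq B \vee A \subseteq C) \Rightarrow (A \subseteq B \cup C)$. 
(xv) $A \cup (B \cap C) = (A \cup B) \cap C \Leftrightarrow A \subseteq C$. 
(xvi) $A \cap (B \cup C) = (A \cap B) \cup C \Leftrightarrow A \supseteq C$. 
(xvii) $A \cup B = \emptyset \Leftrightarrow (A = \emptyset \text{ and } B = \emptyset)$. 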


PROOF: Part (i) follows from Theorem 7.3.5 (iii, iv, v). 

For part (ii), suppose that $A \subseteq B$. Let $x \in A \cup C$. Then $x \in A$ or $x \in C$ (or both). Suppose $x \in A$. 
Then $x \in B$. So $x \in B \cup C$. Suppose $x \in C$. Then $x \in B \cup C$ also. Therefore $(A \cup C) \subseteq (B \cup C)$. 

For part (iii), suppose that $A \subseteq B$. Let $x \in A \cap C$. Then $x \in A$ and $x \in C$. So $x \in B$. So $x \in B \cap C$. 
Therefore $(A \cap C) \subseteq (B \cap C)$. 

For part (iv), let $A \subseteq B$. Then $A \cup B \subseteq B \cup B = B$ by part (ii). But $B \subseteq A \cup B$ by Theorem 8.1.6 (ii). 
So $A \cup B = B$. To show the converse, suppose that $A \cup B = B$. Then $A \cup B \subseteq B$ by Theorem 7.3.5 (iv). 
Let $x \in A$. Then $x \in A \cup B$. Therefore $x \in B$. Hence $A \subseteq B$. 

For part (v), let $A \subseteq B$. Then $A \cap B \supseteq A \cap A = A$ by part (iii). But $A \supseteq A \cap B$ by Theorem 8.1.6 (iii). 
So $A \cap B = A$. To show the converse, suppose that $A \cap B = A$. Then $A \cap B \supseteq A$ by Theorem 7.3.5 (iv). 
Let $x \in A$. Then $x \in A \cap B$. Therefore $x \in B$. Hence $A \subseteq B$. 

Part (vi) follows from Theorem 8.1.6 (iii) and Theorem 8.1.6 (i). 

Part (vii) follows from parts (vi) and (iv) and Theorem 8.1.6 (v). 


For part (viii), suppose that $A = B \cup C$. Then $C \subseteq A$ by Theorem 8.1.6 (ii). So if $A = B$, then $C \subseteq B$. 
But if $C \subseteq B$, then $A = B \cup C \subseteq B \cup B$ by part (ii), and $B \cup B = B$ by Theorem 8.1.6 (vii). So $A \subseteq B$. 
But $A = B \cup C$ implies $B \subseteq A$ by Theorem 8.1.6 (i). Therefore $A = B$ by the ZF extension axiom. Hence 
$A = B \Leftrightarrow C \subseteq B$. 


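For part (ix), assume that $A \subseteq C$ and $B \subseteq C$. Let $x \in A \cup B$. Then $x \in A$ or $x \in B$. Suppose that $x \in A$. 
Then $x \in C$ because $A \subseteq C$. Similarly, if $x \in B$ then $x \in C$. Therefore $(A \subseteq C \wedge B \subseteq C) \Rightarrow (A \cup B \subseteq C)$. 
Now suppose that $A \cup B \subseteq C$. Let $x \in A$. Then $x \in A \cup B$, and so $x \in C$. Therefore $A \subseteq C$. Similarly, $B \subseteq C$. 
Therefore $A \subseteq C$ and $B \subseteq C$. In other words, $A \subseteq C \wedge B \subseteq C$. Therefore $(A \cup B \subseteq C) \Rightarrow (A \subseteq C \wedge B \subseteq C)$. 
Hence $(A \subseteq C \wedge B \subseteq C) \Leftrightarrow (A \cup B \subseteq C)$. 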


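For part (x), the implication $(A \cup B \subseteq C) \Rightarrow (A \cap B \subseteq C)$ follows from part (vi) and Theorem 7.3.5 (vii). 
Hence $(A \subseteq C \wedge B \subseteq C) \Rightarrow (A \cap B \subseteq C)$ by part (ix). 

For part (xi), assume that $A \subseteq C$ or $B \subseteq C$. Suppose that $A \subseteq C$. Then $A \cap B \subseteq C$ because $A \cap B \subseteq A$ by 
Theorem 8.1.6 (iii). Similarly, if $B \subseteq C$ then $A \cap B \subseteq C$ because $A \cap B \subseteq B$ by Theorem 8.1.6 (iv). Hence 
$(A \subseteq C \vee B \subseteq C) \Rightarrow (A \cap B \subseteq C)$. 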


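For part (xii), assume that $A \subseteq B$ and $A \subseteq C$. Let $x \in A$. Then $x \in B$ and $x \in C$. So $x \in B \cap C$. 
Therefore $A \subseteq B \cap C$. Hence $(A \subseteq B \wedge A \subseteq C) \Rightarrow (A \subseteq B \cap C)$. Now assume that $A \subseteq B \cap C$. Let 
$x \in A$. Then $x \in B \cap C$. So $x \in B$. Therefore $A \subseteq B$. Similarly $A \subseteq C$. So $A \subseteq B \wedge A \subseteq C$. Therefore 
$(A \subseteq B \cap C) \Rightarrow (A \subseteq B \wedge A \subseteq C)$. Hence $(A \subseteq B \wedge A \subseteq C) \Leftrightarrow (A \subseteq B \cap C)$. 

For part (xiii), the implication $(A \subseteq B \cap C) \Rightarrow (A \subseteq B \cup C)$ follows from part (vi) and Theorem 7.3.5 (vii). 
Hence $(A \subseteq B \wedge A \subseteq C) \Rightarrow (A \subseteq B \cup C)$ follows from part (xii). 

For part (xiv), assume that $A \subseteq B$ or $A \subseteq C$. Suppose that $A \subseteq B$. Then $A \subseteq B \cup C$ because $B \subseteq B \cup C$ 
by Theorem 8.1.6 (i). Similarly, if $A \subseteq C$ then $A \subseteq B \cup C$ because $C \subseteq B \cup C$ by Theorem 8.1.6 (ii). Hence 
$(A \subseteq B \vee A \subseteq C) \Rightarrow (A \subseteq B \cup C)$. 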


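For part (xv), $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ by Theorem 8.1.6 (xi). Suppose that $A \subseteq C$. Then $A \cup C = C$ 
by part (iv). So $A \cup (B \cap C) = (A \cup B) \cap C$. Now suppose that $A \cup (B \cap C) = (A \cup B) \cap C$. Then 
$A \subseteq A \cup (B \cap C) = (A \cup B) \cap C \subseteq C$ by Theorem 8.1.6 (i, iv). Thus $A \subseteq C$. 

For part (xvi), $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ by Theorem 8.1.6 (xii). Suppose that $A \supseteq C$. Then 
$A \cap C = C$ by part (v). So $A \cap (B \cup C) = (A \cap B) \cup C$. Now suppose that $A \cap (B \cup C) = (A \cap B) \cup C$. Then 
$A \supseteq A \cap (B \cup C) = (A \cap B) \cup C \supseteq C$ by Theorem 8.1.6 (iii, ii). Thus $A \supseteq C$. 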


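For part (xvii), suppose that $A \cup B = \emptyset$. Then $A \subseteq \emptyset$ by Theorem 8.1.6 (i). So $A = \emptyset$ by Theorem 8.1.4 (iv). 
Similarly $B = \emptyset$. Thus $A \cup B = \emptyset \Rightarrow (A = \emptyset \text{ and } B = \emptyset)$. For the converse, suppose that $A = \emptyset$ and $B = \emptyset$. 
Then $A \cup B = A \cup \emptyset = A = \emptyset$ by Theorem 8.1.4 (i). Thus $A \cup B = \emptyset \Leftrightarrow (A = \emptyset \text{ and } B = \emptyset)$. 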
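
Several parts of Theorem 8.1.7 are implications or equivalences between inclusion statements, so they cannot be checked one element at a time in the way that a pure set identity can. As a sanity check (not a proof), such statements can be tested over every choice of subsets of a small universe. The sketch below assumes a hypothetical three-element universe, and the helper names `subsets` and `holds_for_all` are illustrative only.

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]

def holds_for_all(statement, universe, num_sets):
    """Check a statement about num_sets sets for every choice of subsets."""
    return all(
        statement(*choice)
        for choice in product(subsets(universe), repeat=num_sets)
    )

U = {0, 1, 2}  # hypothetical small universe

# Part (iv): A ⊆ B  ⇔  A ∪ B = B.
part_iv = holds_for_all(lambda A, B: (A <= B) == ((A | B) == B), U, 2)

# Part (xv): A ∪ (B ∩ C) = (A ∪ B) ∩ C  ⇔  A ⊆ C.
part_xv = holds_for_all(
    lambda A, B, C: ((A | (B & C)) == ((A | B) & C)) == (A <= C), U, 3
)

# The converse of part (x) is not asserted by the theorem, and it fails:
# A ∩ B ⊆ C does not imply (A ⊆ C and B ⊆ C).
converse_x = holds_for_all(
    lambda A, B, C: (not ((A & B) <= C)) or (A <= C and B <= C), U, 3
)

print(part_iv, part_xv, converse_x)  # True True False
```

A failure found this way is a genuine counterexample, whereas a success over a finite universe only fails to refute the statement.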




8.1.8 DEFINITION: Two sets $A$ and $B$ are disjoint if $A \cap B = \emptyset$. 


8.1.9 THEOREM: Some 4-set properties of union and intersection. 
The following identities hold for all sets A, B, C and D. 


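(i) $(A \cup B) \cap (C \cup D) = (A \cap C) \cup (B \cap C) \cup (A \cap D) \cup (B \cap D)$. 
(ii) $(A \cap B) \cup (C \cap D) = (A \cup C) \cap (B \cup C) \cap (A \cup D) \cap (B \cup D)$. 
(iii) $(A \subseteq C \wedge B \subseteq D) \Rightarrow (A \cup B \subseteq C \cup D)$. 
(iv) $(A \subseteq C \wedge B \subseteq D) \Rightarrow (A \cap B \subseteq C \cap D)$. 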


PROOF: Part (i) follows from Theorem 8.1.6. Thus $(A \cup B) \cap (C \cup D) = ((A \cup B) \cap C) \cup ((A \cup B) \cap D) = 
((A \cap C) \cup (B \cap C)) \cup ((A \cap D) \cup (B \cap D))$. 

For part (ii), $(A \cap B) \cup (C \cap D) = ((A \cap B) \cup C) \cap ((A \cap B) \cup D) = ((A \cup C) \cap (B \cup C)) \cap ((A \cup D) \cap (B \cup D))$ 
follows from Theorem 8.1.6. 

For part (iii), assume $A \subseteq C$ and $B \subseteq D$. Let $x \in A \cup B$. Then $x \in A$ or $x \in B$. Suppose that $x \in A$. 
Then $x \in C$. So $x \in C \cup D$. Similarly, if $x \in B$ then $x \in C \cup D$. Therefore by Theorem 4.7.9 (xli), 
$(x \in A \text{ or } x \in B) \Rightarrow x \in C \cup D$. So $x \in A \cup B \Rightarrow x \in C \cup D$. Hence $A \cup B \subseteq C \cup D$. 

For part (iv), suppose that $A \subseteq C$ and $B \subseteq D$. Then $A \cap B \subseteq A$ by Theorem 8.1.6 (iii). So $A \cap B \subseteq C$ 
by Theorem 7.3.5 (vii). Similarly, $A \cap B \subseteq B \subseteq D$ by Theorem 8.1.6 (iv). Hence $A \cap B \subseteq C \cap D$ by 
Theorem 8.1.7 (xii). 


8.2. The set complement operation 


8.2.1 DEFINITION: The (relative) complement of a set $B$ within a set $A$ is the set $\{x \in A;\ x \notin B\}$. 


8.2.2 NOTATION: $A \setminus B$ for sets $A$ and $B$ denotes the complement of $B$ within $A$. 
That is, $A \setminus B = \{x \in A;\ x \notin B\}$. 


8.2.3 THEOREM: De Morgan’s laws for complements of unions and intersections. 
The following identities hold for all sets A, B and X. 


(i) $X \setminus (A \cup B) = (X \setminus A) \cap (X \setminus B)$. 
(ii) $X \setminus (A \cap B) = (X \setminus A) \cup (X \setminus B)$. 


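PROOF: For part (i), let $x \in X \setminus (A \cup B)$. Then $x \in X$ and $x \notin A \cup B$. So $\neg(x \in A \vee x \in B)$. So 
$x \notin A \wedge x \notin B$. So $x \in X \setminus A$ and $x \in X \setminus B$. So $x \in (X \setminus A) \cap (X \setminus B)$. Now suppose that $x \in (X \setminus A) \cap (X \setminus B)$. 
Then $x \in X \setminus A$ and $x \in X \setminus B$. So $x \in X$ and $x \notin A$ and $x \notin B$. So $\neg(x \in A \vee x \in B)$. So $x \notin A \cup B$. 
So $x \in X \setminus (A \cup B)$. Hence $X \setminus (A \cup B) = (X \setminus A) \cap (X \setminus B)$. 

As an alternative approach for part (i), note that $(x \in A \setminus (B \cup C)) \Leftrightarrow ((x \in A) \wedge (x \notin B \wedge x \notin C)) \Leftrightarrow 
((x \in A \wedge x \notin B) \wedge (x \in A \wedge x \notin C)) \Leftrightarrow ((x \in A \setminus B) \wedge (x \in A \setminus C)) \Leftrightarrow (x \in (A \setminus B) \cap (A \setminus C))$. 

For part (ii), let $x \in X \setminus (A \cap B)$. Then $x \in X$ and $x \notin A \cap B$. So $\neg(x \in A \wedge x \in B)$. So $x \notin A \vee x \notin B$. So 
$x \in X \setminus A$ or $x \in X \setminus B$. So $x \in (X \setminus A) \cup (X \setminus B)$. Now suppose that $x \in (X \setminus A) \cup (X \setminus B)$. Then $x \in X \setminus A$ 
or $x \in X \setminus B$. So $x \in X$, and $x \notin A$ or $x \notin B$. So $\neg(x \in A \wedge x \in B)$. So $x \notin A \cap B$. So $x \in X \setminus (A \cap B)$. 
Hence $X \setminus (A \cap B) = (X \setminus A) \cup (X \setminus B)$. 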
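
A small concrete instance may help to fix the meaning of the relative complement in Theorem 8.2.3. The sets below are hypothetical values chosen for illustration. Note that $B$ is deliberately not a subset of $X$, since the theorem is stated for all sets $A$, $B$ and $X$.

```python
X = set(range(1, 9))    # hypothetical ambient set {1, ..., 8}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 9}     # 9 lies outside X on purpose

# Part (i): X \ (A ∪ B) = (X \ A) ∩ (X \ B).
lhs_i, rhs_i = X - (A | B), (X - A) & (X - B)

# Part (ii): X \ (A ∩ B) = (X \ A) ∪ (X \ B).
lhs_ii, rhs_ii = X - (A & B), (X - A) | (X - B)

print(sorted(lhs_i), sorted(rhs_i))      # [7, 8] [7, 8]
print(sorted(lhs_ii), sorted(rhs_ii))    # [1, 2, 5, 6, 7, 8] [1, 2, 5, 6, 7, 8]
```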


8.2.4 THEOREM: Expression for intersection in terms of unions and complements. 
Let A and B be sets. 


(i) $A \cap B = (A \cup B) \setminus ((A \setminus B) \cup (B \setminus A))$. 


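PROOF: For part (i), let $x \in A \cap B$. Then $x \in A$ and $x \in B$. So $x \notin A \setminus B$ and $x \notin B \setminus A$. So 
$x \notin (A \setminus B) \cup (B \setminus A)$. But $x \in A \cup B$. Therefore $x \in (A \cup B) \setminus ((A \setminus B) \cup (B \setminus A))$. To show the 
reverse inclusion, suppose that $x \in (A \cup B) \setminus ((A \setminus B) \cup (B \setminus A))$. Then $x \in A \cup B$, and so $x \in A$ 
or $x \in B$. Suppose that $x \in A$, and assume that $x \notin B$. Then $x \in A \setminus B$. So $x \in (A \setminus B) \cup (B \setminus A)$. So 
$x \notin (A \cup B) \setminus ((A \setminus B) \cup (B \setminus A))$, which is a contradiction. Therefore $x \in B$. Similarly, the assumption 
$x \in B$ implies $x \in A$. Therefore $x \in A \cap B$ in both cases. Hence $A \cap B = (A \cup B) \setminus ((A \setminus B) \cup (B \setminus A))$. 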


8.2.5 THEOREM: Some basic properties of the complement operation. 
Let X and A be sets. 


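(i) $X \setminus X = \emptyset$. 
(ii) $X \setminus \emptyset = X$. 
(iii) $X \setminus A \subseteq X$. 
(iv) $X \setminus A = \emptyset \Leftrightarrow X \subseteq A$. 
(v) $(A \subseteq X \text{ and } X \setminus A = \emptyset) \Leftrightarrow A = X$. 
(vi) $(X \setminus A) \cap A = \emptyset$. 
(vii) $(X \setminus A) \cap (A \setminus X) = \emptyset$. 
(viii) $(X \setminus A) \cup A = X \cup A$. 
(ix) If $A \subseteq X$, then $(X \setminus A) \cup A = X$. 
(x) $(X \setminus A) \cup X = X$. 
(xi) $(X \cap A) \cup (X \setminus A) = X$. 
(xii) If $X \cap A = \emptyset$, then $X \setminus A = X$. 
(xiii) If $X \setminus A = X$, then $X \cap A = \emptyset$. 
(xiv) $(X \setminus A) \setminus A = X \setminus A$. 
(xv) $X \setminus (X \setminus A) = X \cap A$. 
(xvi) If $A \subseteq X$, then $X \setminus (X \setminus A) = A$. 
(xvii) $X \setminus (X \cap A) = X \setminus A$. 
(xviii) $(X \setminus A) \cap X = X \setminus A$. 
(xix) $(X \cup A) \setminus A = X \setminus A$. 
(xx) $(X \cap A) \cup ((X \cup A) \setminus X) = A$. 
(xxi) $X \setminus A = A \setminus X$ if and only if $A = X$. 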


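PROOF: For part (i), note that $X \setminus X = \{x \in X;\ x \notin X\} = \emptyset$ by Definition 8.2.1. 

For part (ii), note that $X \setminus \emptyset = \{x \in X;\ x \notin \emptyset\} = X$ by Definition 8.2.1. 

For part (iii), let $x \in X \setminus A$. Then $x \in X$, by Definition 8.2.1. Hence $X \setminus A \subseteq X$. 

For part (iv), note that $X \setminus A = \emptyset$ if and only if $\forall x,\ \neg(x \in X \setminus A)$, if and only if $\forall x,\ \neg(x \in X \wedge x \notin A)$ 
by Definition 8.2.1, if and only if $\forall x,\ (x \notin X \vee x \in A)$, if and only if $\forall x,\ (x \in X \Rightarrow x \in A)$, if and only 
if $X \subseteq A$. 

For part (v), $(A \subseteq X \text{ and } X \setminus A = \emptyset) \Leftrightarrow (A \subseteq X \text{ and } X \subseteq A)$ by part (iv) and Theorem 4.7.9 (xxix). Hence 
$(A \subseteq X \text{ and } X \setminus A = \emptyset) \Leftrightarrow A = X$ by Theorem 8.1.7 (i). 

For part (vi), let $x \in (X \setminus A) \cap A$. Then $x \in X \setminus A$. So $x \notin A$ by Definition 8.2.1. So $x \notin A$ and $x \in A$, 
which is a contradiction. Hence $(X \setminus A) \cap A = \emptyset$. 

Part (vii) follows from part (vi) and Theorem 8.1.7 (iii), since $A \setminus X \subseteq A$ by part (iii). 

For part (viii), note that $X \setminus A \subseteq X$. So $(X \setminus A) \cup A \subseteq X \cup A$ by Theorem 8.1.7 (ii). Now suppose 
that $x \in X \cup A$. Then $x \in X$ or $x \in A$ (or both). If $x \in A$, then $x \in (X \setminus A) \cup A$ by the definition of set 
union. If $x \notin A$, then $x \in X \setminus A$ by Definition 8.2.1. So $x \in (X \setminus A) \cup A$ by the definition of set union. 
Hence $(X \setminus A) \cup A = X \cup A$. 

For part (ix), suppose that $A \subseteq X$. Then $(X \setminus A) \cup A = X \cup A$ by part (viii). But $X \cup A = X$ by 
Theorem 8.1.7 (iv). So $(X \setminus A) \cup A = X$. 

For part (x), $(x \in ((X \setminus A) \cup X)) \Leftrightarrow ((x \in X \wedge x \notin A) \vee x \in X) \Leftrightarrow x \in X$ by Theorem 4.7.11 (ix). 

For part (xi), let $x \in X$. Then $x \in A$ or $x \notin A$. If $x \in A$, then $x \in X \cap A$. If $x \notin A$, then $x \in X \setminus A$. So 
$x \in (X \cap A) \cup (X \setminus A)$. Conversely, suppose that $x \in (X \cap A) \cup (X \setminus A)$. Then $x \in X \cap A$ or $x \in X \setminus A$. 
Then $x \in X$ in either case. So $(X \cap A) \cup (X \setminus A) = X$. 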


Part (xii) follows directly from part (xi) and Theorem 8.1.4 (i). 
Part (xiii) follows directly from part (vi). 


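For part (xiv), note that $(X \setminus A) \setminus A \subseteq X \setminus A$ by Definition 8.2.1. Suppose $x \in X \setminus A$. Then $x \notin A$ by 
Definition 8.2.1. So $x \in (X \setminus A) \setminus A$ by Definition 8.2.1. Hence $X \setminus A \subseteq (X \setminus A) \setminus A$. 

For part (xv), let $x \in X \setminus (X \setminus A)$. Then $x \in X$ and $x \notin X \setminus A$. From $x \notin X \setminus A$, it follows that $x \notin X$ 
or $x \in A$. But $x \notin X$ contradicts $x \in X$. So $x \in A$. Thus $x \in X \cap A$. Hence $X \setminus (X \setminus A) \subseteq X \cap A$. Now 
suppose that $x \in X \cap A$. Then $x \in X$ and $x \in A$. So $x \notin X \setminus A$, by Definition 8.2.1. So $x \in X \setminus (X \setminus A)$, by 
Definition 8.2.1. Hence $X \cap A \subseteq X \setminus (X \setminus A)$. 

For part (xvi), suppose that $A \subseteq X$. Then $X \setminus (X \setminus A) = X \cap A$ by part (xv). But $X \cap A = A$ by 
Theorem 8.1.7 (v). So $X \setminus (X \setminus A) = A$. 

Part (xvii) follows from Theorem 8.2.3 (ii) and part (i). 

For part (xviii), $(x \in ((X \setminus A) \cap X)) \Leftrightarrow ((x \in X \wedge x \notin A) \wedge x \in X) \Leftrightarrow (x \in X \wedge x \notin A)$ by 
Theorem 4.7.11 (ii, vi, iv). 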

For part (xix), let $x \in (X \cup A) \setminus A$. Then $x \in X \cup A$ and $x \notin A$, by Definition 8.2.1. But from $x \in X \cup A$, 
it follows that $x \in X$ or $x \in A$ (or both). So $x \in X$. Hence $(X \cup A) \setminus A \subseteq X \setminus A$, by Definition 8.2.1. 
Now suppose that $x \in X \setminus A$. Then $x \in X$ and $x \notin A$, by Definition 8.2.1. But from $x \in X$, it follows 
that $x \in X \cup A$ by Theorem 8.1.6 (i). So $x \in (X \cup A) \setminus A$. Hence $(X \cup A) \setminus A = X \setminus A$. 

For part (xx), $(X \cap A) \cup ((X \cup A) \setminus X) = (X \cap A) \cup (A \setminus X)$ by part (xix) and Theorem 8.1.6 (v). Hence 
$(X \cap A) \cup ((X \cup A) \setminus X) = A$ by part (xi) and Theorem 8.1.6 (vi). 

For part (xxi), suppose that $X \setminus A = A \setminus X$. Let $x \in X \setminus A$. Then $x \notin A$. So $x \notin A \setminus X$, which contradicts 
the assumption. Therefore there is no $x \in X \setminus A$. Thus $X \setminus A = \emptyset$. So $X \subseteq A$ by part (iv). Similarly $A \subseteq X$. 
Hence $A = X$. The converse follows from the double application of part (iv). 


8.2.6 THEOREM: Some not-so-basic properties of complements, unions and intersections. 
The following identities hold for all sets A, B, C and X. 


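(i) $A \cap (B \setminus C) = (A \cap B) \setminus C$. 
(ii) If $A \cap C = \emptyset$, then $A \cap (B \setminus C) = A \cap B$. 
(iii) $(A \cup B) \setminus C = (A \setminus C) \cup (B \setminus C)$. 
(iv) If $B \subseteq C$, then $(A \cup B) \setminus C = A \setminus C$. 
(v) $A \setminus B \subseteq C$ if and only if $A \subseteq B \cup C$. 
(vi) $(A \cap B) \setminus C = (A \setminus C) \cap (B \setminus C) = (A \setminus C) \cap B$. 
(vii) $(A \setminus B) \setminus C = A \setminus (B \cup C) = (A \cup B) \setminus (B \cup C)$. 
(viii) $(A \setminus B) \setminus B = A \setminus B$. 
(ix) $(A \setminus B) \setminus C = (A \setminus C) \setminus B$. 
(x) $(A \setminus C) \setminus (B \setminus C) = A \setminus (B \cup C)$. 
(xi) $(A \setminus B) \setminus C = (A \setminus B) \setminus (C \setminus B)$. 
(xii) $(A \cap B) \setminus C = (A \setminus C) \setminus (A \setminus B) = (B \setminus C) \setminus (B \setminus A)$. 
(xiii) $A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C)$. 
(xiv) $(A \cup C) \setminus (B \setminus C) = (A \setminus B) \cup C$. 
(xv) If $A \subseteq B$, then $A \setminus C \subseteq B \setminus C$. 
(xvi) If $A \subseteq B$, then $C \setminus A \supseteq C \setminus B$. 
(xvii) $C \setminus A \supseteq C \setminus B \Leftrightarrow C \cap A \subseteq C \cap B$. 
(xviii) $(A \setminus C) \cap (B \setminus C) = \emptyset \Leftrightarrow A \cap B \subseteq C$. 
(xix) $((A \cap C \subseteq B \cap C) \wedge (A \setminus C \subseteq B \setminus C)) \Leftrightarrow A \subseteq B$. 
(xx) If $A \subseteq X$, then $(X \setminus B) \cap A = A \setminus B$. 
(xxi) If $A, B \subseteq X$, then $((C \cap A \subseteq C \cap B) \wedge ((X \setminus C) \cap A \subseteq (X \setminus C) \cap B)) \Leftrightarrow A \subseteq B$. 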


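PROOF: Part (i) follows from $(x \in (A \cap (B \setminus C))) \Leftrightarrow (x \in A \wedge (x \in B \wedge x \notin C)) \Leftrightarrow ((x \in A \wedge x \in 
B) \wedge x \notin C) \Leftrightarrow x \in ((A \cap B) \setminus C)$. 

Part (ii) follows from a double application of part (i). Suppose $A \cap C = \emptyset$. Then $A \cap (B \setminus C) = (A \cap B) \setminus C = 
(B \cap A) \setminus C = B \cap (A \setminus C) = B \cap A$ by Theorem 8.2.5 (xii). 

For part (iii), let $A$, $B$ and $C$ be sets. Then $x \in (A \cup B) \setminus C$ if and only if $x \in A \cup B$ and $x \notin C$, if and 
only if ($x \in A$ or $x \in B$) and $x \notin C$, if and only if ($x \in A$ and $x \notin C$) or ($x \in B$ and $x \notin C$), if and only if 
$x \in A \setminus C$ or $x \in B \setminus C$, if and only if $x \in (A \setminus C) \cup (B \setminus C)$. 

For part (iv), let $B \subseteq C$. Then $B \setminus C = \emptyset$ by Theorem 8.2.5 (iv). Therefore $(A \cup B) \setminus C = (A \setminus C) \cup (B \setminus C) = 
A \setminus C$ by part (iii) and Theorem 8.1.4 (i). 

For part (v), let $A \setminus B \subseteq C$. Then by Theorem 8.2.5 (xi), $A = (A \cap B) \cup (A \setminus B) \subseteq (A \cap B) \cup C \subseteq B \cup C$. 
Conversely, suppose that $A \subseteq B \cup C$. Then by Theorem 8.2.5 (xix), $A \setminus B \subseteq (B \cup C) \setminus B = C \setminus B \subseteq C$. 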

For the first equality in part (vi), let $A$, $B$ and $C$ be sets. Then $x \in (A \cap B) \setminus C$ if and only if $x \in A \cap B$ 
and $x \notin C$, if and only if $x \in A$ and $x \in B$ and $x \notin C$, if and only if ($x \in A$ and $x \notin C$) and ($x \in B$ and 
$x \notin C$), if and only if $x \in A \setminus C$ and $x \in B \setminus C$, if and only if $x \in (A \setminus C) \cap (B \setminus C)$. 

The second equality in part (vi) follows from the fact that $(A \setminus C) \cap C = \emptyset$ by Theorem 8.2.5 (vi), and so 
$(A \setminus C) \cap (B \setminus C) = (A \setminus C) \cap B$ by part (ii). Alternatively, the equality $(A \cap B) \setminus C = (A \setminus C) \cap B$ follows 
directly from part (i). 

For the first equality in part (vii), let $A$, $B$ and $C$ be sets. Then $x \in (A \setminus B) \setminus C$ if and only if $x \in A \setminus B$ 
and $x \notin C$, if and only if ($x \in A$ and $x \notin B$) and $x \notin C$, if and only if $x \in A$ and $x \notin B \cup C$, if and only if 
$x \in A \setminus (B \cup C)$. 

For the second equality in part (vii), note that $B \subseteq B \cup C$ by Theorem 8.1.6 (ii). Hence $(A \cup B) \setminus (B \cup C) = 
A \setminus (B \cup C)$ by part (iv). 

For part (viii), $(A \setminus B) \setminus B = A \setminus (B \cup B)$ by part (vii). But $B \cup B = B$ by Theorem 8.1.6 (vii). Hence 
$(A \setminus B) \setminus B = A \setminus B$. 

For part (ix), note that $(A \setminus B) \setminus C = A \setminus (B \cup C)$ by part (vii). Similarly, $(A \setminus C) \setminus B = A \setminus (C \cup B)$. 
But $B \cup C = C \cup B$. Therefore $(A \setminus B) \setminus C = (A \setminus C) \setminus B$. 

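For part (x), $x \in (A \setminus C) \setminus (B \setminus C) \Leftrightarrow ((x \in A \wedge x \notin C) \wedge \neg(x \in B \wedge x \notin C)) \Leftrightarrow ((x \in A \wedge x \notin C) \wedge (x \notin B \vee x \in 
C)) \Leftrightarrow ((x \in A \wedge x \notin C \wedge x \notin B) \vee (x \in A \wedge x \notin C \wedge x \in C)) \Leftrightarrow (x \in A \wedge x \notin C \wedge x \notin B) \Leftrightarrow x \in A \setminus (B \cup C)$. 

Part (xi) follows from parts (vii) and (x). 

For the first equality of part (xii), note that $(A \setminus C) \setminus (A \setminus B) = (A \setminus (A \setminus B)) \setminus C$ by part (ix). But 
$A \setminus (A \setminus B) = A \cap B$ by Theorem 8.2.5 (xv). Therefore $(A \setminus C) \setminus (A \setminus B) = (A \cap B) \setminus C$. The second equality 
follows from the first by swapping $A$ and $B$ because $A \cap B = B \cap A$. 

For part (xiii), let $A$, $B$ and $C$ be sets. Then $x \in A \setminus (B \setminus C)$ if and only if $x \in A$ and $x \notin B \setminus C$, if and 
only if $x \in A$ and ($x \notin B$ or $x \in C$), if and only if ($x \in A$ and $x \notin B$) or ($x \in A$ and $x \in C$), if and only if 
$x \in A \setminus B$ or $x \in A \cap C$, if and only if $x \in (A \setminus B) \cup (A \cap C)$. 

For part (xiv), $(A \cup C) \setminus (B \setminus C) = ((A \cup C) \setminus B) \cup ((A \cup C) \cap C)$ by part (xiii). This equals $((A \cup C) \setminus B) \cup C$ 
by Theorem 8.1.6 (xiv). Then this equals $((A \setminus B) \cup (C \setminus B)) \cup C = (A \setminus B) \cup ((C \setminus B) \cup C)$ by part (iii). 
And this equals $(A \setminus B) \cup C$ by Theorem 8.2.5 (x). 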

For part (xv), let A, B and C be sets such that A ⊆ B. Let x ∈ A \ C. Then x ∈ A and x ∉ C. So x ∈ B
and x ∉ C. So x ∈ B \ C. Hence A \ C ⊆ B \ C.

For part (xvi), let A, B and C be sets such that A ⊆ B. Let x ∈ C \ B. Then x ∈ C and x ∉ B.
Therefore x ∉ A. So x ∈ C \ A. Hence C \ A ⊇ C \ B.

For part (xvii), let A, B and C be sets such that C \ A ⊇ C \ B. Let x ∈ C ∩ A. Then x ∈ C and x ∈ A.
Then x ∉ C \ A. So x ∉ C \ B. So x ∈ B because x ∈ C. So x ∈ C ∩ B. Hence C ∩ A ⊆ C ∩ B. To show the
converse, suppose that C ∩ A ⊆ C ∩ B. Then C \ (C ∩ A) ⊇ C \ (C ∩ B) by part (xvi). Hence C \ A ⊇ C \ B
by Theorem 8.2.5 (xvii).

For part (xviii), note that (A \ C) ∩ (B \ C) = ∅ if and only if (A ∩ B) \ C = ∅ by part (vi), if and only if
(A ∩ B) ⊆ C by Theorem 8.2.5 (iv).

For part (xix), let A, B and C be sets such that (A ∩ C ⊆ B ∩ C) ∧ (A \ C ⊆ B \ C). Let x ∈ A.
Then x ∈ A ∩ C or x ∈ A \ C by Theorem 8.2.5 (xi). If x ∈ A ∩ C then x ∈ B ∩ C. If x ∈ A \ C then
x ∈ B \ C. So x ∈ (B ∩ C) ∪ (B \ C). Therefore x ∈ B by Theorem 8.2.5 (xi). Hence A ⊆ B. Conversely,
suppose that A ⊆ B. Let x ∈ A ∩ C. Then x ∈ A and x ∈ C. Therefore x ∈ B and x ∈ C. So x ∈ B ∩ C.
Hence A ∩ C ⊆ B ∩ C. Now let x ∈ A \ C. Then x ∈ A and x ∉ C. Therefore x ∈ B and x ∉ C. So x ∈ B \ C.
Hence A \ C ⊆ B \ C. Therefore (A ∩ C ⊆ B ∩ C) ∧ (A \ C ⊆ B \ C).

For part (xx), let A, B and X be sets such that A ⊆ X. Let x ∈ (X \ B) ∩ A. Then x ∈ X \ B and x ∈ A.
So x ∉ B and x ∈ A. So x ∈ A \ B. Conversely, suppose that x ∈ A \ B. Then x ∈ A and x ∉ B. But x ∈ X
because A ⊆ X. So x ∈ X \ B and x ∈ A. So x ∈ (X \ B) ∩ A.

For part (xxi), let A, B, C and X be sets with A, B ⊆ X. Then (X \ C) ∩ A = A \ C and (X \ C) ∩ B = B \ C
by part (xx). So the proposition (X \ C) ∩ A ⊆ (X \ C) ∩ B is equivalent to A \ C ⊆ B \ C. Hence the
proposition ((C ∩ A ⊆ C ∩ B) ∧ ((X \ C) ∩ A ⊆ (X \ C) ∩ B)) ⇔ A ⊆ B follows from part (xix).


8.3. The symmetric set difference operation 


8.3.1 REMARK: The symmetric set difference is a disjoint union of two complements. 
The union in Definition 8.3.2 is disjoint by Theorem 8.2.5 (vii). 


8.3.2 DEFINITION: The (symmetric) set difference of two sets A and B is the set (A \ B) ∪ (B \ A).


8.3.3 NOTATION: A △ B, for sets A and B, denotes the symmetric set difference of A and B.


8.3.4 THEOREM: Some basic properties of the symmetric set-difference operation. 
The symmetric set difference operation has the following properties for sets A, B and C.
(i) A △ A = ∅.
(ii) A △ ∅ = A.
(iii) A △ B = B △ A.
(iv) A ⊆ B if and only if A △ B = B \ A.
(v) A = B if and only if A △ B = ∅.
(vi) A △ B = (A ∪ B) \ (A ∩ B) = (A ∪ B) △ (A ∩ B).
(vii) If A ∩ B = ∅, then A △ B = A ∪ B.
(viii) (A △ B) ∩ (A ∩ B) = ∅.
(ix) (A △ B) \ (A ∩ B) = A △ B.
(x) (A △ B) ∪ (A ∩ B) = A ∪ B.
(xi) (A △ B) △ (A ∩ B) = A ∪ B.
(xii) (A △ B) △ C = (A \ (B ∪ C)) ∪ (B \ (A ∪ C)) ∪ (C \ (A ∪ B)) ∪ (A ∩ B ∩ C).
(xiii) (A △ B) △ C = A △ (B △ C).
(xiv) (A △ B) △ A = B.
(xv) (A △ B) ∩ C = (C \ A) △ (C \ B).
(xvi) A \ (B △ C) = (A \ (B ∪ C)) ∪ (A ∩ B ∩ C).


PROOF: For part (i), let A be a set. Let x ∈ A △ A. Then x ∈ (A \ A) ∪ (A \ A) by Definition 8.3.2. So
x ∈ A \ A. So x ∈ A and x ∉ A, by Definition 8.2.1. This is a contradiction. Hence A △ A = ∅.

For part (ii), let A be a set. Then x ∈ A △ ∅ if and only if x ∈ (A \ ∅) ∪ (∅ \ A) (by Definition 8.3.2), if and
only if x ∈ A \ ∅ or x ∈ ∅ \ A, if and only if x ∈ A \ ∅, if and only if x ∈ A and x ∉ ∅, if and only if x ∈ A.
So A △ ∅ = A.

Part (iii) follows from the symmetry of the union operation in Definition 8.3.2.

For part (iv), let A ⊆ B. Then A △ B = (A \ B) ∪ (B \ A) = ∅ ∪ (B \ A) = B \ A by Theorem 8.2.5 (iv) and
Theorem 8.1.4 (i).

For the converse of part (iv), suppose that A △ B = B \ A. Then (A \ B) ∪ (B \ A) = B \ A. Therefore
A \ B ⊆ B \ A by Theorem 8.1.7 (iv). So A \ B ⊆ B by Theorem 8.2.5 (iii). So (A \ B) \ B ⊆ B \ B by
Theorem 8.2.6 (xv). But B \ B = ∅ by Theorem 8.2.5 (i) and (A \ B) \ B = A \ B by Theorem 8.2.5 (xiv).
So A \ B ⊆ ∅. Therefore A \ B = ∅. Hence A ⊆ B by Theorem 8.2.5 (iv).

For part (v), let A = B. Then A △ B = A △ A = ∅ by part (i). For the converse, suppose that A △ B = ∅.
Then (A \ B) ∪ (B \ A) = ∅ by Definition 8.3.2. So A \ B = ∅ and B \ A = ∅ by Theorem 8.1.7 (xv).
Therefore A ⊆ B and B ⊆ A by Theorem 8.2.5 (iv). Thus A = B. Hence A = B if and only if A △ B = ∅.

For the first equality in part (vi), let A and B be sets. Then

    x ∈ A △ B ⇔ x ∈ (A \ B) ∪ (B \ A)
              ⇔ (x ∈ A ∧ x ∉ B) ∨ (x ∈ B ∧ x ∉ A)
              ⇔ (x ∈ A ∨ x ∈ B) ∧ (x ∉ B ∨ x ∈ B) ∧ (x ∈ A ∨ x ∉ A) ∧ (x ∉ A ∨ x ∉ B)
              ⇔ (x ∈ A ∨ x ∈ B) ∧ (x ∉ A ∨ x ∉ B)
              ⇔ x ∈ A ∪ B ∧ x ∉ A ∩ B
              ⇔ x ∈ (A ∪ B) \ (A ∩ B).

The second equality in part (vi) follows from (A ∩ B) ⊆ (A ∪ B) and part (iv).

Part (vii) follows directly from the first equality in part (vi).

For part (viii), A △ B = (A ∪ B) \ (A ∩ B) by part (vi). So (A △ B) ∩ (A ∩ B) = ((A ∪ B) \ (A ∩ B)) ∩ (A ∩ B)
(by doing the same thing to both sides of the equation). Therefore (A △ B) ∩ (A ∩ B) = ∅ by Theorem 8.2.5 (vi).

Part (ix) follows from part (viii) and Theorem 8.2.5 (xii).

For part (x), (A △ B) ∪ (A ∩ B) = ((A ∪ B) \ (A ∩ B)) ∪ (A ∩ B) by part (vi). Therefore (A △ B) ∪ (A ∩ B) =
(A ∪ B) ∪ (A ∩ B) by Theorem 8.2.5 (viii). Hence (A △ B) ∪ (A ∩ B) = A ∪ B by Theorem 8.1.7 (vii).

For part (xi), note that (A △ B) △ (A ∩ B) = (A △ B) ∪ (A ∩ B) by part (vii) because (A △ B) ∩ (A ∩ B) = ∅
by part (viii). Hence (A △ B) △ (A ∩ B) = A ∪ B by part (x).

For part (xii), let A, B and C be sets. Then (A △ B) △ C = ((A △ B) \ C) ∪ (C \ (A △ B)). But
(A △ B) \ C equals ((A \ B) ∪ (B \ A)) \ C by Definition 8.3.2, which equals ((A \ B) \ C) ∪ ((B \ A) \ C)
by Theorem 8.2.6 (iii), which equals (A \ (B ∪ C)) ∪ (B \ (A ∪ C)) by Theorem 8.2.6 (vii). Also, C \ (A △ B)
equals C \ ((A ∪ B) \ (A ∩ B)) by part (vi), which equals (C \ (A ∪ B)) ∪ (C ∩ (A ∩ B)) by Theorem 8.2.6 (xiii).
Therefore (A △ B) △ C = (A \ (B ∪ C)) ∪ (B \ (A ∪ C)) ∪ (C \ (A ∪ B)) ∪ (A ∩ B ∩ C).

For part (xiii), note that the expression for (A △ B) △ C in part (xii) is symmetric with respect to permutations
of A, B and C. Hence or otherwise, (A △ B) △ C = A △ (B △ C).

For part (xiv), (A △ B) △ A = (B △ A) △ A by part (iii), which equals B △ (A △ A) by part (xiii), which
equals B △ ∅ by part (i), which equals B by part (ii).

For part (xv), note that (C \ A) △ (C \ B) = ((C \ A) \ (C \ B)) ∪ ((C \ B) \ (C \ A)) by Definition 8.3.2.
But (C \ A) \ (C \ B) = (B ∩ C) \ A and (C \ B) \ (C \ A) = (A ∩ C) \ B by Theorem 8.2.6 (xii). Also,
(B ∩ C) \ A = (B \ A) ∩ C and (A ∩ C) \ B = (A \ B) ∩ C by Theorem 8.2.6 (vi). So it follows that
(C \ A) △ (C \ B) = ((B \ A) ∩ C) ∪ ((A \ B) ∩ C). This equals ((B \ A) ∪ (A \ B)) ∩ C by Theorem 8.1.6 (xii),
which equals (B △ A) ∩ C by Definition 8.3.2, which equals (A △ B) ∩ C by Theorem 8.3.4 (iii).

For part (xvi), A \ (B △ C) = A \ ((B ∪ C) \ (B ∩ C)) by part (vi). Hence A \ ((B ∪ C) \ (B ∩ C)) =
(A \ (B ∪ C)) ∪ (A ∩ (B ∩ C)) by Theorem 8.2.6 (xiii).
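
The identities of Theorem 8.3.4 can also be checked mechanically on a small finite universe. The following Python sketch is only an illustration, not a proof: it assumes finite sets modelled as frozensets, and the helper names subsets, sym_diff and check_theorem_8_3_4 are hypothetical names chosen here. It enumerates all triples (A, B, C) of subsets and asserts each part as it is used in the proof above.

```python
from itertools import combinations, product

def subsets(universe):
    """Return all subsets of a finite iterable as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def sym_diff(a, b):
    """Symmetric difference as a union of two relative complements (Definition 8.3.2)."""
    return (a - b) | (b - a)

def check_theorem_8_3_4(universe):
    """Assert parts (i)-(xvi) for every triple of subsets of a finite universe."""
    empty = frozenset()
    d = sym_diff
    for a, b, c in product(subsets(universe), repeat=3):
        assert d(a, a) == empty                                   # (i)
        assert d(a, empty) == a                                   # (ii)
        assert d(a, b) == d(b, a)                                 # (iii)
        assert (a <= b) == (d(a, b) == b - a)                     # (iv)
        assert (a == b) == (d(a, b) == empty)                     # (v)
        assert d(a, b) == (a | b) - (a & b) == d(a | b, a & b)    # (vi)
        assert (a & b != empty) or d(a, b) == a | b               # (vii)
        assert d(a, b) & (a & b) == empty                         # (viii)
        assert d(a, b) - (a & b) == d(a, b)                       # (ix)
        assert d(a, b) | (a & b) == a | b                         # (x)
        assert d(d(a, b), a & b) == a | b                         # (xi)
        assert d(d(a, b), c) == ((a - (b | c)) | (b - (a | c))
                                 | (c - (a | b)) | (a & b & c))   # (xii)
        assert d(d(a, b), c) == d(a, d(b, c))                     # (xiii)
        assert d(d(a, b), a) == b                                 # (xiv)
        assert d(a, b) & c == d(c - a, c - b)                     # (xv)
        assert a - d(b, c) == (a - (b | c)) | (a & b & c)         # (xvi)
    return True
```

Calling check_theorem_8_3_4(range(3)) runs through all 512 triples of subsets of a three-point universe. A finite check of this kind says nothing about infinite collections of sets, which is where the deductive proofs are needed.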


8.3.5 REMARK: The simplicity of the properties of the symmetric difference operation. 
Properties (i), (ii), (iii) and (xiii) in Theorem 8.3.4 for the symmetric difference operation are satisfyingly 
simple and symmetric. This is because a symmetric set difference contains a point if and only if the number 
of sets which contain the point is odd. This observation implies both the symmetry rule (iii) and the 
associativity rule (xiii). The set A △ A in part (i) contains no points because all points are either in zero or
two of the sets A and A.


Theorem 8.3.7, by contrast, contains few satisfyingly simple symmetric rules because the symmetric set
difference operator is arithmetic in nature, whereas the union and intersection operators are logical in nature.
(Admittedly, parts (i) and (iii) are quite satisfying.) The logical operators have simple relations amongst
themselves and the arithmetic operator △ has simple relations. But combinations of these two kinds of
relations yield very few simple formulas.
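
The parity observation in this remark can be illustrated with a short Python sketch. It assumes finite sets modelled as frozensets, and the function names iterated_sym_diff and odd_membership are hypothetical. The first function folds the symmetric difference over a list of sets. The second collects the points which lie in an odd number of the sets.

```python
from collections import Counter
from functools import reduce

def iterated_sym_diff(sets):
    """Fold the symmetric difference over a finite list of sets."""
    return reduce(lambda acc, s: acc ^ s, sets, frozenset())

def odd_membership(sets):
    """Points which belong to an odd number of the given sets."""
    counts = Counter(x for s in sets for x in s)
    return frozenset(x for x, n in counts.items() if n % 2 == 1)

sets = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})]
same = iterated_sym_diff(sets) == odd_membership(sets)
```

Because the parity count does not depend on the order or grouping of the sets, the two functions agree for every ordering of the list, which is the content of the symmetry and associativity rules.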


8.3.6 REMARK: Truth tables give quick proofs of set-operation theorems, but deduction has advantages. 
Theorems 8.3.4 and 8.3.7 can be more easily proved using truth tables or Venn diagrams, but the logical 
deduction method can be more easily extended to assertions which involve infinite collections of sets. 
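
A truth-table proof of a finite set identity can be sketched in Python as follows. This is a minimal illustration under the assumption that each set is represented only by the truth value of "x belongs to the set" for a generic point x. The names is_identity and xor are hypothetical. The first check is the identity (A △ B) ∩ C = (A ∩ C) △ (B ∩ C). The second shows that the analogous formula with unions in place of intersections fails.

```python
from itertools import product

def xor(p, q):
    """Membership in a symmetric difference: exactly one of p, q holds."""
    return p != q

def is_identity(lhs, rhs, nvars=3):
    """True if lhs and rhs agree on every row of the truth table."""
    return all(lhs(*row) == rhs(*row)
               for row in product([False, True], repeat=nvars))

# (A △ B) ∩ C = (A ∩ C) △ (B ∩ C) holds on all eight rows.
intersection_distributes = is_identity(
    lambda a, b, c: xor(a, b) and c,
    lambda a, b, c: xor(a and c, b and c))

# (A △ B) ∪ C = (A ∪ C) △ (B ∪ C) fails, for example on any row with c true.
union_distributes = is_identity(
    lambda a, b, c: xor(a, b) or c,
    lambda a, b, c: xor(a or c, b or c))
```

The table has only 2^n rows for n sets, so this method is limited to assertions about finitely many sets.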


8.3.7 THEOREM: Some slightly less basic properties of the symmetric set-difference operation. 
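The symmetric set difference operation has the following properties for sets A, B and C.
(i) (A △ B) ∩ C = (A ∩ C) △ (B ∩ C).
(ii) (A △ B) ∩ C = ∅ ⇔ (A ∩ C) = (B ∩ C).
(iii) (A △ B) \ C = (A ∪ C) △ (B ∪ C) = (A \ C) △ (B \ C).
(iv) (A △ B) ∪ C = (A ∪ B ∪ C) \ ((A ∩ B) \ C).
(v) (A △ B) ∪ B = A ∪ B.
(vi) A △ (A ∩ B) = A \ B.
(vii) (A △ B) ∩ A = A \ B.
(viii) (A △ B) \ A = B \ A.
(ix) (A ∩ B) \ (A △ C) = (A ∩ B ∩ C).
(x) (A ∩ B) \ (A △ B) = (A ∩ B).
(xi) (A △ B) ∪ (A △ C) = (A ∪ B ∪ C) \ (A ∩ B ∩ C).
(xii) (A △ B) ∩ (A △ C) = (A \ (B ∪ C)) ∪ ((B ∩ C) \ A).
(xiii) A △ (B ∪ C) = ((A △ B) ∪ (A △ C)) \ (A ∩ (B △ C)).
(xiv) A △ (B ∪ C) = ((A △ B) ∩ (A △ C)) ∪ ((B △ C) \ A).
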
PROOF: For part (i), (A △ B) ∩ C = ((A \ B) ∪ (B \ A)) ∩ C = ((A \ B) ∩ C) ∪ ((B \ A) ∩ C) =
((A ∩ C) \ B) ∪ ((B ∩ C) \ A) = ((A ∩ C) \ (B ∩ C)) ∪ ((B ∩ C) \ (A ∩ C)) = (A ∩ C) △ (B ∩ C).

Part (ii) follows from part (i) and Theorem 8.3.4 (v).

For the first equality in part (iii), (A △ B) \ C = ((A \ B) ∪ (B \ A)) \ C = ((A \ B) \ C) ∪ ((B \ A) \ C)
by Theorem 8.2.6 (iii). This equals ((A ∪ C) \ (B ∪ C)) ∪ ((B ∪ C) \ (A ∪ C)) by Theorem 8.2.6 (vii), which
equals (A ∪ C) △ (B ∪ C) by Definition 8.3.2.

For the second equality in part (iii), (A \ C) △ (B \ C) equals ((A \ C) \ (B \ C)) ∪ ((B \ C) \ (A \ C)) by
Definition 8.3.2. This equals (A \ (C ∪ (B \ C))) ∪ (B \ (C ∪ (A \ C))) by Theorem 8.2.6 (vii), which equals
(A \ (C ∪ B)) ∪ (B \ (C ∪ A)) by Theorem 8.2.5 (viii). This equals ((A ∪ C) \ (C ∪ B)) ∪ ((B ∪ C) \ (C ∪ A))
by Theorem 8.2.6 (vii), which equals (A ∪ C) △ (B ∪ C) by Definition 8.3.2.

For part (iv), (A △ B) ∪ C = ((A ∪ B) \ (A ∩ B)) ∪ C by Theorem 8.3.4 (vi). This equals
(A ∪ B ∪ C) \ ((A ∩ B) \ C) by Theorem 8.2.6 (xiv).

Part (v) follows from part (iv) by substituting B for C and noting that (A ∩ B) \ B = ∅ by Theorem 8.2.5 (iv)
because A ∩ B ⊆ B by Theorem 8.1.6 (iv).

For part (vi), note that A △ (A ∩ B) = (A \ (A ∩ B)) ∪ ((A ∩ B) \ A) = (A \ B) ∪ ∅ = A \ B.

For part (vii), substituting A for C in part (i) gives (A △ B) ∩ A = (A ∩ A) △ (B ∩ A) = A △ (A ∩ B) = A \ B
by part (vi).

For part (viii), note that (A △ B) \ A = (A \ A) △ (B \ A) by part (iii), which equals ∅ △ (B \ A) by
Theorem 8.2.5 (i), which equals B \ A by Theorem 8.3.4 (iii) and Theorem 8.3.4 (ii).

For part (ix), note that (A ∩ B) \ (A △ C) = (A ∩ B) \ ((A ∪ C) \ (A ∩ C)) by Theorem 8.3.4 (vi). This equals
((A ∩ B) \ (A ∪ C)) ∪ ((A ∩ B) ∩ (A ∩ C)) by Theorem 8.2.6 (xiii). This equals A ∩ B ∩ C by Theorem 8.1.7 (iv)
because ((A ∩ B) \ (A ∪ C)) ⊆ ((A ∩ B) ∩ (A ∩ C)) = A ∩ B ∩ C.

Part (x) follows from part (ix) by substituting B for C.

For part (xi), note that (A △ B) ∪ (A △ C) = ((A ∪ B) \ (A ∩ B)) ∪ (A △ C) by Theorem 8.3.4 (vi). This equals
((A ∪ B) ∪ (A △ C)) \ ((A ∩ B) \ (A △ C)) by Theorem 8.2.6 (xiv). But (A ∪ B) ∪ (A △ C) = A ∪ B ∪ C by
part (v), and (A ∩ B) \ (A △ C) = A ∩ B ∩ C by part (ix). Hence (A △ B) ∪ (A △ C) = (A ∪ B ∪ C) \ (A ∩ B ∩ C).

For part (xii), (A △ B) ∩ (A △ C) = ((A \ B) ∪ (B \ A)) ∩ (A △ C) = ((A \ B) ∩ (A △ C)) ∪ ((B \ A) ∩ (A △ C)).
But (A \ B) ∩ (A △ C) = (A ∩ (A △ C)) \ B = (A \ C) \ B = A \ (B ∪ C) by part (vii) and Theorem 8.2.6 (vii).
Moreover, (B \ A) ∩ (A △ C) = ((B \ A) ∩ A) △ ((B \ A) ∩ C) by part (i), which equals ∅ △ ((B \ A) ∩ C) =
(B \ A) ∩ C = (B ∩ C) \ A by Theorem 8.2.6 (i). Hence (A △ B) ∩ (A △ C) = (A \ (B ∪ C)) ∪ ((B ∩ C) \ A).

For part (xiii), note that (A △ B) ∪ (A △ C) = (A ∪ B ∪ C) \ (A ∩ B ∩ C) by part (xi). So
((A △ B) ∪ (A △ C)) \ (A ∩ (B △ C)) = ((A ∪ B ∪ C) \ (A ∩ B ∩ C)) \ (A ∩ (B △ C)) =
(A ∪ B ∪ C) \ ((A ∩ B ∩ C) ∪ (A ∩ (B △ C))) by Theorem 8.2.6 (vii). But
(A ∩ B ∩ C) ∪ (A ∩ (B △ C)) = A ∩ ((B ∩ C) ∪ (B △ C)) by Theorem 8.1.6 (xii), which equals A ∩ (B ∪ C) by
Theorem 8.3.4 (x). Hence ((A △ B) ∪ (A △ C)) \ (A ∩ (B △ C)) = (A ∪ B ∪ C) \ (A ∩ (B ∪ C)) =
A △ (B ∪ C) by Definition 8.3.2, as claimed.

For part (xiv), (A △ B) ∩ (A △ C) = (A \ (B ∪ C)) ∪ ((B ∩ C) \ A) by part (xii). So
((A △ B) ∩ (A △ C)) ∪ ((B △ C) \ A) = (A \ (B ∪ C)) ∪ ((B ∩ C) \ A) ∪ ((B △ C) \ A). This equals
(A \ (B ∪ C)) ∪ (((B ∩ C) ∪ (B △ C)) \ A) by Theorem 8.2.6 (iii), which equals (A \ (B ∪ C)) ∪ ((B ∪ C) \ A)
by Theorem 8.3.4 (x), and this equals A △ (B ∪ C) by Definition 8.3.2.


8.4. Unions and intersections of collections of sets 


8.4.1 REMARK: Applicability of properties of infinite unions and intersections in topology. 
The properties of general unions and intersections are especially applicable in topology. For completeness 
(and aesthetic reasons), Notation 7.6.17 is duplicated here. 


8.4.2 NOTATION: 
⋃S denotes the set {x; ∃A ∈ S, x ∈ A} for any set S.
⋂S denotes the set {x; ∀A ∈ S, x ∈ A} for any non-empty set S.


8.4.3 THEOREM: Well-definition of unions and intersections of collections of sets. 
Let S be a set.
(i) ⋃S is a well-defined Zermelo-Fraenkel set.
(ii) If S ≠ ∅, then ⋂S is a well-defined Zermelo-Fraenkel set.


PROOF: For part (i), the set ⋃S is well defined by the ZF union axiom, Definition 7.2.4 (4).

For part (ii), let S be a non-empty set. Then X ∈ S for some set X. Define a single-parameter predicate
P by P(x) = “∀A ∈ S, x ∈ A” for sets x. Then the set {x ∈ X; P(x)} is well defined by Theorem 7.7.2.
In other words, {x ∈ X; ∀A ∈ S, x ∈ A} is a well-defined set. But (∀A ∈ S, x ∈ A) ⇒ x ∈ X. Therefore
(∀A ∈ S, x ∈ A) ⇔ (x ∈ X ∧ (∀A ∈ S, x ∈ A)). So {x ∈ X; ∀A ∈ S, x ∈ A} = {x; ∀A ∈ S, x ∈ A}. So
{x; ∀A ∈ S, x ∈ A} is a well-defined set. Hence ⋂S is well defined by Notation 8.4.2.


8.4.4 THEOREM: Some equivalent conditions for membership of unions and intersections. 
(i) x ∈ ⋃S ⇔ (∃A ∈ S, x ∈ A) ⇔ (∃A, (x ∈ A ∧ A ∈ S)) for any x and any set of sets S.
(ii) x ∈ ⋂S ⇔ (∀A ∈ S, x ∈ A) ⇔ (∀A, (x ∈ A ∨ A ∉ S)) for any x and any non-empty set of sets S.


PROOF: For part (i), the first equivalence (x ∈ ⋃S) ⇔ (∃A ∈ S, x ∈ A) follows directly from Notation 8.4.2.
The second equivalence (∃A ∈ S, x ∈ A) ⇔ (∃A, (x ∈ A ∧ A ∈ S)) follows from Notation 7.2.7 (which defines
the meanings of the restricted quantifiers “∀x ∈ S” and “∃x ∈ S”).

For part (ii), the first equivalence (x ∈ ⋂S) ⇔ (∀A ∈ S, x ∈ A) follows directly from Notation 8.4.2. The
equivalence (∀A ∈ S, x ∈ A) ⇔ (∀A, (A ∈ S ⇒ x ∈ A)) follows from Notation 7.2.7, from which the second
equivalence (∀A ∈ S, x ∈ A) ⇔ (∀A, (x ∈ A ∨ A ∉ S)) follows by Theorem 6.6.18 (iii).


8.4.5 REMARK: Checking some trivial properties of unions and intersections of set-collections. 

Despite the triviality of Theorem 8.4.6, it is important to confirm the validity of trivial “boundary cases" 
for definitions and theorems. Part (i) is proved in unnecessary semi-formality in order to demonstrate to 
some extent how propositional calculus and predicate calculus may be applied in such simple cases. A fully 
formal proof would, of course, be excessively tedious. Mathematicians give only sketches of proofs. Life is 
too short to waste on completely formal proofs. (Computer programs can do that if required.) The details 
of a step in a proof only need to be filled in if some scepticism is expressed as to its validity. 


8.4.6 THEOREM: Some very basic properties of general unions and intersections. 
(i) ⋃∅ = ∅.
(ii) ⋃{A} = A for any set A.
(iii) ⋂{A} = A for any set A.
(iv) ⋃{A, B} = A ∪ B for any sets A and B.
(v) ⋂{A, B} = A ∩ B for any sets A and B.


PROOF: For part (i), let x ∈ ⋃∅. Then ∃A, (x ∈ A ∧ A ∈ ∅) by Theorem 8.4.4 (i). But ∀A, A ∉ ∅ by
Definition 7.6.3. So ∀A, ¬(A ∈ ∅ ∧ x ∈ A) by Theorem 4.7.9 (lxix). Therefore ¬(∃A, (A ∈ ∅ ∧ x ∈ A)) by
Theorem 6.6.7 (v). This contradiction gives x ∉ ⋃∅ by Theorem 4.6.4 (v). Therefore ∀x, x ∉ ⋃∅ by the
universal introduction axiom, Definition 6.3.9 (UI). Hence ⋃∅ = ∅ by Definition 7.6.3.

For part (ii), let A be a set. Then y ∈ {A} if and only if y = A, for any given y by Definition 7.5.6. So
x ∈ ⋃{A} if and only if ∃y ∈ {A}, x ∈ y, which holds if and only if x ∈ A. Hence ⋃{A} = A.

For part (iii), let A be a set. Then y ∈ {A} if and only if y = A, for any given y. So x ∈ ⋂{A} if and only
if ∀y ∈ {A}, x ∈ y, which holds if and only if x ∈ A. Hence ⋂{A} = A.

For part (iv), let A and B be sets. Then y ∈ {A, B} if and only if y = A or y = B, for any given y. So
x ∈ ⋃{A, B} if and only if ∃y ∈ {A, B}, x ∈ y, which holds if and only if x ∈ A or x ∈ B. But x ∈ A ∪ B if
and only if x ∈ A or x ∈ B, by Definition 8.1.2 and Notation 8.1.3. Hence ⋃{A, B} = A ∪ B.

For part (v), let A and B be sets. Then y ∈ {A, B} if and only if y = A or y = B, for any given y. So
x ∈ ⋂{A, B} if and only if ∀y ∈ {A, B}, x ∈ y, which holds if and only if x ∈ A and x ∈ B. But x ∈ A ∩ B
if and only if x ∈ A and x ∈ B, by Definition 8.1.2 and Notation 8.1.3. Hence ⋂{A, B} = A ∩ B.
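
The boundary cases of Theorem 8.4.6 can be illustrated for finite sets with the following Python sketch. It assumes that collections are given as lists of frozensets. The function names big_union and big_intersection are hypothetical. The intersection function rejects the empty collection, mirroring the restriction to non-empty S in Notation 8.4.2.

```python
def big_union(collection):
    """Union of a collection of sets; the empty collection yields the empty set."""
    result = frozenset()
    for member in collection:
        result |= member
    return result

def big_intersection(collection):
    """Intersection of a non-empty collection of sets."""
    members = list(collection)
    if not members:
        # Notation 8.4.2 defines the intersection only for non-empty collections.
        raise ValueError("intersection of an empty collection is not defined")
    result = frozenset(members[0])
    for member in members[1:]:
        result &= member
    return result

A = frozenset({1, 2, 3})
B = frozenset({2, 3, 4})

assert big_union([]) == frozenset()           # part (i)
assert big_union([A]) == A                    # part (ii)
assert big_intersection([A]) == A             # part (iii)
assert big_union([A, B]) == A | B             # part (iv)
assert big_intersection([A, B]) == A & B      # part (v)
```

Raising an exception for the empty collection is a design choice of this sketch. In ZF set theory the corresponding class would contain every set, so it is not a set at all.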


8.4.7 REMARK: Various generalisations of union and intersection properties. 
Theorem 8.4.8 gives some generalisations of the statements in Theorems 8.1.6 and 8.1.9 to sets of sets. (The
reader may like to ponder why part (x) of Theorem 8.4.8 requires S₁ and S₂ to be non-empty. Hint: Consider
the example S₁ = ∅, S₂ = {X} where X ≠ ∅.)

The slightly unusual assertion of Theorem 8.4.8 (xxiv) is applicable to ordinal numbers. (See for example
Theorem 12.5.23 (xiv).) It is presented here because it is valid for general sets of sets, and because it shows
how sometimes a multiple-case formula can be converted to a linear formula, which makes typesetting easier!
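
The hint can be worked through on a concrete instance. The Python sketch below assumes finite sets modelled as frozensets, with a hypothetical helper big_union. With S₁ empty there are no pairs (X₁, X₂) at all, so the collection of pairwise unions is empty and its union is the empty set, whereas the left-hand side is X.

```python
def big_union(collection):
    """Union of a finite collection of frozensets (empty collection gives the empty set)."""
    return frozenset().union(*collection)

X = frozenset({1})
S1 = []        # the empty collection
S2 = [X]

lhs = big_union(S1) | big_union(S2)
rhs = big_union(x1 | x2 for x1 in S1 for x2 in S2)
```

Here lhs is X and rhs is the empty set, so the two sides of the formula in part (x) differ when one of the collections is empty.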


8.4.8 THEOREM: Some slightly less basic properties of general unions and intersections. 
Let A be a set. Let S, S₁ and S₂ be sets (of sets).

(i) ⋂S ⊆ ⋃S if S ≠ ∅.
(ii) S₁ ⊆ S₂ ⇒ ⋃S₁ ⊆ ⋃S₂.
(iii) S₁ ⊆ S₂ ⇒ ⋂S₁ ⊇ ⋂S₂ if S₁ ≠ ∅.
(iv) A ∩ (⋃S) = ⋃{A ∩ X; X ∈ S}.
(v) A ∪ (⋃S) = ⋃{A ∪ X; X ∈ S}.
(vi) A ∪ (⋂S) = ⋂{A ∪ X; X ∈ S} if S ≠ ∅.
(vii) A ∩ (⋂S) = ⋂{A ∩ X; X ∈ S} if S ≠ ∅.
(viii) (⋃S₁) ∩ (⋃S₂) = ⋃{X₁ ∩ X₂; X₁ ∈ S₁, X₂ ∈ S₂}.
(ix) (⋂S₁) ∪ (⋂S₂) = ⋂{X₁ ∪ X₂; X₁ ∈ S₁, X₂ ∈ S₂} if S₁ ≠ ∅ and S₂ ≠ ∅.
(x) (⋃S₁) ∪ (⋃S₂) = ⋃{X₁ ∪ X₂; X₁ ∈ S₁, X₂ ∈ S₂} if S₁ ≠ ∅ and S₂ ≠ ∅.
(xi) (⋂S₁) ∩ (⋂S₂) = ⋂{X₁ ∩ X₂; X₁ ∈ S₁, X₂ ∈ S₂} if S₁ ≠ ∅ and S₂ ≠ ∅.
(xii) (⋃S₁) ∪ (⋃S₂) = ⋃(S₁ ∪ S₂).
(xiii) (⋂S₁) ∩ (⋂S₂) = ⋂(S₁ ∪ S₂) if S₁ ≠ ∅ and S₂ ≠ ∅.
(xiv) A ∈ S ⇒ A ⊆ ⋃S.
(xv) A ∈ S ⇒ A ⊇ ⋂S.
(xvi) ∅ ∈ S ⇒ ⋂S = ∅.
(xvii) ⋃S ⊆ A ⇔ ∀B ∈ S, B ⊆ A.
(xviii) ⋂S ⊇ A ⇔ ∀B ∈ S, B ⊇ A if S ≠ ∅.
(xix)
(xx) A ∈ S ⇒ ⋃S = ⋃{A ∪ X; X ∈ S}.
(xxi) A ∈ S ⇒ ⋂S = ⋂{A ∩ X; X ∈ S}.
(xxii) (∀X₁ ∈ S₁, ∀X₂ ∈ S₂, X₁ ∩ X₂ = ∅) ⇔ (⋃S₁ ∩ ⋃S₂ = ∅).
(xxiii) If S₁ ≠ ∅ and ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, X₁ ⊆ X₂, then ⋂(S₁ ∪ S₂) = ⋂S₁.
(xxiv) ⋃S ∪ (S ∩ {⋃S}) equals ⋃S ∪ {⋃S} if ⋃S ∈ S, and equals ⋃S if ⋃S ∉ S.


PROOF: For part (i), let S ≠ ∅. Then ⋂S and ⋃S are well-defined sets. Let x ∈ ⋂S. Then ∀y ∈ S, x ∈ y.
But z ∈ S for some z because S ≠ ∅. So x ∈ z for such a set z. Therefore ∃z ∈ S, x ∈ z. Thus x ∈ ⋃S.
Hence ⋂S ⊆ ⋃S.

For part (ii), let S₁ and S₂ be sets of sets such that S₁ ⊆ S₂. Let x ∈ ⋃S₁. Then ∃A ∈ S₁, x ∈ A. But
A ∈ S₁ ⇒ A ∈ S₂. So ∃A ∈ S₂, x ∈ A, which means that x ∈ ⋃S₂.

To prove part (iii), let S₁ ≠ ∅ and S₂ be sets (of sets) such that S₁ ⊆ S₂. Then S₂ ≠ ∅. Let x ∈ ⋂S₂. Then
∀A ∈ S₂, x ∈ A. But A ∈ S₁ ⇒ A ∈ S₂. So ∀A ∈ S₁, x ∈ A, which means that x ∈ ⋂S₁.


To prove part (iv), let A be a set and S be a set of sets. Then

    x ∈ A ∩ (⋃S) ⇔ (x ∈ A) ∧ (∃X ∈ S, x ∈ X)
                 ⇔ ∃X ∈ S, (x ∈ A ∧ x ∈ X)
                 ⇔ ∃X ∈ S, x ∈ A ∩ X
                 ⇔ x ∈ ⋃{A ∩ X; X ∈ S}.

To prove part (v), let A be a set and S be a set of sets. Then

    x ∈ A ∪ (⋃S) ⇔ (x ∈ A) ∨ (∃X ∈ S, x ∈ X)
                 ⇔ ∃X ∈ S, (x ∈ A ∨ x ∈ X)
                 ⇔ ∃X ∈ S, x ∈ A ∪ X
                 ⇔ x ∈ ⋃{A ∪ X; X ∈ S}.

To prove part (vi), let A be a set and S be a non-empty set of sets. Then

    x ∈ A ∪ (⋂S) ⇔ (x ∈ A) ∨ (∀X ∈ S, x ∈ X)
                 ⇔ ∀X ∈ S, (x ∈ A ∨ x ∈ X)
                 ⇔ ∀X ∈ S, x ∈ A ∪ X
                 ⇔ x ∈ ⋂{A ∪ X; X ∈ S}.

To prove part (vii), let A be a set and S be a non-empty set of sets. Then

    x ∈ A ∩ (⋂S) ⇔ (x ∈ A) ∧ (∀X ∈ S, x ∈ X)
                 ⇔ ∀X ∈ S, (x ∈ A ∧ x ∈ X)
                 ⇔ ∀X ∈ S, x ∈ A ∩ X
                 ⇔ x ∈ ⋂{A ∩ X; X ∈ S}.

To prove part (viii), let S₁ and S₂ be sets of sets. Then

    x ∈ (⋃S₁) ∩ (⋃S₂) ⇔ (∃X₁ ∈ S₁, x ∈ X₁) ∧ (∃X₂ ∈ S₂, x ∈ X₂)
                      ⇔ ∃X₁ ∈ S₁, ∃X₂ ∈ S₂, (x ∈ X₁ ∧ x ∈ X₂)
                      ⇔ ∃X₁ ∈ S₁, ∃X₂ ∈ S₂, x ∈ X₁ ∩ X₂
                      ⇔ x ∈ ⋃{X₁ ∩ X₂; X₁ ∈ S₁, X₂ ∈ S₂}.

To prove part (ix), let S₁ and S₂ be non-empty sets of sets. Then

    x ∈ (⋂S₁) ∪ (⋂S₂) ⇔ (∀X₁ ∈ S₁, x ∈ X₁) ∨ (∀X₂ ∈ S₂, x ∈ X₂)
                      ⇔ ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, (x ∈ X₁ ∨ x ∈ X₂)
                      ⇔ ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, x ∈ X₁ ∪ X₂
                      ⇔ x ∈ ⋂{X₁ ∪ X₂; X₁ ∈ S₁, X₂ ∈ S₂}.

To prove part (x), let S₁ and S₂ be non-empty sets of sets. Then

    x ∈ (⋃S₁) ∪ (⋃S₂) ⇔ (∃X₁ ∈ S₁, x ∈ X₁) ∨ (∃X₂ ∈ S₂, x ∈ X₂)
                      ⇔ ∃X₁ ∈ S₁, ∃X₂ ∈ S₂, (x ∈ X₁ ∨ x ∈ X₂)
                      ⇔ ∃X₁ ∈ S₁, ∃X₂ ∈ S₂, x ∈ X₁ ∪ X₂
                      ⇔ x ∈ ⋃{X₁ ∪ X₂; X₁ ∈ S₁, X₂ ∈ S₂}.


To prove part (xi), let S₁ and S₂ be non-empty sets of sets. Then

    x ∈ (⋂S₁) ∩ (⋂S₂) ⇔ (∀X₁ ∈ S₁, x ∈ X₁) ∧ (∀X₂ ∈ S₂, x ∈ X₂)
                      ⇔ ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, (x ∈ X₁ ∧ x ∈ X₂)
                      ⇔ ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, x ∈ X₁ ∩ X₂
                      ⇔ x ∈ ⋂{X₁ ∩ X₂; X₁ ∈ S₁, X₂ ∈ S₂}.

To prove part (xii), let S₁ and S₂ be sets of sets. Then

    x ∈ (⋃S₁) ∪ (⋃S₂) ⇔ (x ∈ ⋃S₁) ∨ (x ∈ ⋃S₂)
                      ⇔ (∃X, (X ∈ S₁ ∧ x ∈ X)) ∨ (∃X, (X ∈ S₂ ∧ x ∈ X))
                      ⇔ ∃X, ((X ∈ S₁ ∧ x ∈ X) ∨ (X ∈ S₂ ∧ x ∈ X))
                      ⇔ ∃X, ((X ∈ S₁ ∨ X ∈ S₂) ∧ x ∈ X)
                      ⇔ ∃X, ((X ∈ S₁ ∪ S₂) ∧ x ∈ X)
                      ⇔ ∃X ∈ S₁ ∪ S₂, x ∈ X
                      ⇔ x ∈ ⋃(S₁ ∪ S₂).

To prove part (xiii), let S₁ and S₂ be non-empty sets of sets. Then

    x ∈ (⋂S₁) ∩ (⋂S₂) ⇔ (x ∈ ⋂S₁) ∧ (x ∈ ⋂S₂)
                      ⇔ (∀X, (X ∉ S₁ ∨ x ∈ X)) ∧ (∀X, (X ∉ S₂ ∨ x ∈ X))
                      ⇔ ∀X, ((X ∉ S₁ ∨ x ∈ X) ∧ (X ∉ S₂ ∨ x ∈ X))
                      ⇔ ∀X, ((X ∉ S₁ ∧ X ∉ S₂) ∨ x ∈ X)
                      ⇔ ∀X, ((X ∉ S₁ ∪ S₂) ∨ x ∈ X)
                      ⇔ ∀X ∈ S₁ ∪ S₂, x ∈ X
                      ⇔ x ∈ ⋂(S₁ ∪ S₂).

The above calculations use the fact that the proposition ∀x ∈ X, P(x), for any set X and set-theoretic
formula P, means ∀x, (x ∈ X ⇒ P(x)), which is equivalent to the proposition ∀x, (x ∉ X ∨ P(x)). (See
Notation 7.2.7 and Remark 7.2.6.)


To prove part (xiv), let S be a set of sets, let A ∈ S and let z ∈ A. It follows that ∃y, (z ∈ y ∧ y ∈ S)
because this is true for y = A. Therefore z ∈ {x; ∃y, (x ∈ y ∧ y ∈ S)} = ⋃S. Hence A ⊆ ⋃S.

To prove part (xv), let S be a set of sets with A ∈ S. Then S ≠ ∅. So ⋂S is well defined by Theorem 8.4.3 (ii).
Let z ∈ ⋂S = {x; ∀y, (y ∈ S ⇒ x ∈ y)}. Then ∀y, (y ∈ S ⇒ z ∈ y). From A ∈ S it then follows that z ∈ A.
Hence ⋂S ⊆ A.

For part (xvi), suppose that ∅ ∈ S. Then ⋂S ⊆ ∅ by part (xv). Hence ⋂S = ∅ by Theorem 7.6.5 (ii).

For part (xvii), suppose that ⋃S ⊆ A. Let B ∈ S. Then B ⊆ ⋃S by part (xiv). So B ⊆ A. Thus
⋃S ⊆ A ⇒ ∀B ∈ S, B ⊆ A. For the converse, suppose that ∀B ∈ S, B ⊆ A. Let x ∈ ⋃S. Then x ∈ B
for some B ∈ S. So x ∈ A for some B ∈ S. So x ∈ A. Thus ⋃S ⊆ A. Thus ⋃S ⊆ A ⇐ ∀B ∈ S, B ⊆ A.
Hence ⋃S ⊆ A ⇔ ∀B ∈ S, B ⊆ A.

For part (xviii), suppose that ⋂S ⊇ A. Let B ∈ S. Then B ⊇ ⋂S by part (xv). So B ⊇ A. Thus
∀B ∈ S, B ⊇ A. So ⋂S ⊇ A ⇒ ∀B ∈ S, B ⊇ A. For the converse, suppose that ∀B ∈ S, B ⊇ A. Let
x ∈ A. Let B ∈ S. Then x ∈ B. Thus x ∈ B for all B ∈ S. Therefore x ∈ ⋂S. So ⋂S ⊇ A. Thus
⋂S ⊇ A ⇐ ∀B ∈ S, B ⊇ A. Hence ⋂S ⊇ A ⇔ ∀B ∈ S, B ⊇ A.

For part (xx), let A ∈ S. Then A ⊆ ⋃S by part (xiv). So A ∪ (⋃S) = ⋃S by Theorem 8.1.7 (iv). Hence
⋃S = ⋃{A ∪ X; X ∈ S} by part (v).

For part (xxi), let A ∈ S. Then S ≠ ∅. So A ⊇ ⋂S by part (xv). So A ∩ (⋂S) = ⋂S by Theorem 8.1.7 (v).
Hence ⋂S = ⋂{A ∩ X; X ∈ S} by part (vii).

To prove part (xxii), let S₁ and S₂ be sets of sets. By part (viii), (⋃S₁) ∩ (⋃S₂) = ⋃{X₁ ∩ X₂; X₁ ∈
S₁, X₂ ∈ S₂}. But ⋃{X₁ ∩ X₂; X₁ ∈ S₁, X₂ ∈ S₂} = {y; ∃X₁ ∈ S₁, ∃X₂ ∈ S₂, y ∈ X₁ ∩ X₂}. This equals
the empty set if and only if the proposition “∃X₁ ∈ S₁, ∃X₂ ∈ S₂, y ∈ X₁ ∩ X₂” is false for all y. In other
words, ∀y, ¬(∃X₁ ∈ S₁, ∃X₂ ∈ S₂, y ∈ X₁ ∩ X₂). That is, ∀y, ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, y ∉ X₁ ∩ X₂. This
may be rearranged as ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, ∀y, y ∉ X₁ ∩ X₂. But ∀y, y ∉ X₁ ∩ X₂ means precisely that
X₁ ∩ X₂ = ∅. Hence (⋃S₁) ∩ (⋃S₂) = ∅ if and only if ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, X₁ ∩ X₂ = ∅.

For part (xxiii), assume that S₁ ≠ ∅ and ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, X₁ ⊆ X₂. Then ⋂(S₁ ∪ S₂) ⊆ ⋂S₁ by
part (iii). Now let x ∈ ⋂S₁. Then ∀X₁ ∈ S₁, x ∈ X₁. So ∀X₁ ∈ S₁, ∀X₂ ∈ S₂, x ∈ X₂. Therefore
∀X₂ ∈ S₂, x ∈ X₂ because S₁ ≠ ∅. So ∀X ∈ S₁ ∪ S₂, x ∈ X because ∀X₁ ∈ S₁, x ∈ X₁. Thus
x ∈ ⋂(S₁ ∪ S₂). Consequently ⋂S₁ ⊆ ⋂(S₁ ∪ S₂). Hence ⋂(S₁ ∪ S₂) = ⋂S₁.

For part (xxiv), if ⋃S ∈ S, then S ∩ {⋃S} = {⋃S}. So ⋃S ∪ (S ∩ {⋃S}) = ⋃S ∪ {⋃S}. If ⋃S ∉ S,
then S ∩ {⋃S} = ∅. So ⋃S ∪ (S ∩ {⋃S}) = ⋃S ∪ ∅ = ⋃S. This verifies the identity.
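
Several of the distributive rules proved above can be spot-checked on a small universe. The Python sketch below is an illustration under the assumption that all sets are finite frozensets. The names subsets, big_union, big_inter and check_theorem_8_4_8 are hypothetical. It covers parts (i), (iv), (vi), (vii), (viii), (ix), (xii) and (xiii), applying the non-emptiness conditions where the proofs above assume them.

```python
from itertools import combinations, product

def subsets(items):
    """All subsets of a finite iterable, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def big_union(coll):
    return frozenset().union(*coll)

def big_inter(coll):
    coll = list(coll)
    if not coll:
        raise ValueError("the intersection of an empty collection is undefined")
    return frozenset.intersection(*coll)

def check_theorem_8_4_8(universe):
    points = subsets(universe)        # candidate sets A and X
    collections = subsets(points)     # candidate sets of sets S
    for a, s in product(points, collections):
        assert a & big_union(s) == big_union(a & x for x in s)          # (iv)
        if s:
            assert big_inter(s) <= big_union(s)                         # (i)
            assert a | big_inter(s) == big_inter(a | x for x in s)      # (vi)
            assert a & big_inter(s) == big_inter(a & x for x in s)      # (vii)
    for s1, s2 in product(collections, repeat=2):
        assert big_union(s1) & big_union(s2) == big_union(
            x1 & x2 for x1 in s1 for x2 in s2)                          # (viii)
        assert big_union(s1) | big_union(s2) == big_union(s1 | s2)      # (xii)
        if s1 and s2:
            assert big_inter(s1) | big_inter(s2) == big_inter(
                x1 | x2 for x1 in s1 for x2 in s2)                      # (ix)
            assert big_inter(s1) & big_inter(s2) == big_inter(s1 | s2)  # (xiii)
    return True
```

With a two-point universe there are 4 subsets and 16 collections of subsets, so check_theorem_8_4_8(range(2)) finishes immediately.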


8.4.9 REMARK: Unions and intersections of single-parameter set-valued set-theoretic formulas. 

To write Theorem 8.4.12, Notation 8.4.10 is required. This anticipates Notation 10.8.10, which applies to 
families of sets, whose unions are guaranteed by the union axiom to be well defined because their ranges are 
well-defined ZF sets of sets. 

The well-definition of unions ⋃_{X∈S} f(X) in Notation 8.4.10 is not always guaranteed by the ZF union axiom.
The ZF replacement axiom Definition 7.2.4 (6) is required in the general case. The intersections ⋂_{X∈S} f(X),
by contrast, generally yield valid ZF sets by the weaker ZF specification “axiom”, which is Theorem 7.7.2.


8.4.10 NOTATION: ⋃_{X∈S} f(X), for a set S and a single-parameter set-valued set-theoretic function f such
that f(X) is well-defined for all X ∈ S, denotes the set ⋃{f(X); X ∈ S}. In other words,

    ⋃_{X∈S} f(X) = ⋃{f(X); X ∈ S}
                 = {y; ∃X ∈ S, y ∈ f(X)}.

⋂_{X∈S} f(X), for a non-empty set S and a single-parameter set-valued set-theoretic function f such that f(X)
is well-defined for all X ∈ S, denotes the set ⋂{f(X); X ∈ S}. In other words,

    ⋂_{X∈S} f(X) = ⋂{f(X); X ∈ S}
                 = {y; ∀X ∈ S, y ∈ f(X)}.


8.4.11 REMARK: Some generalisations of de Morgan’s law. 
Propositions (i) and (ii) in Theorem 8.4.12 are generalisations of De Morgan’s law. (See Theorem 8.2.3.) 


8.4.12 THEOREM: De Morgan’s law for complements of general unions and intersections. 
Let A be a set. Let S be a set of sets. 


(i) A \ ⋃S = ⋂{A \ X; X ∈ S} = ⋂_{X∈S} (A \ X) if S ≠ ∅.
(ii) A \ ⋂S = ⋃{A \ X; X ∈ S} = ⋃_{X∈S} (A \ X) if S ≠ ∅.

PROOF: For part (i), let A be a set, and let S be a non-empty set of sets. Then A \ ⋃S = {x ∈ A; x ∉ ⋃S}
= {x ∈ A; ∀X ∈ S, x ∉ X} = {x; x ∈ A ∧ ∀X ∈ S, x ∉ X} = {x; ∀X ∈ S, x ∈ A \ X} = ⋂_{X∈S} (A \ X).

For part (ii), let A be a set, and let S be a non-empty set of sets. Then A \ ⋂S = {x ∈ A; x ∉ ⋂S} =
{x ∈ A; ∃X ∈ S, x ∉ X} = {x; x ∈ A ∧ ∃X ∈ S, x ∉ X} = {x; ∃X ∈ S, x ∈ A \ X} = ⋃_{X∈S} (A \ X).
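
A concrete instance of the generalised De Morgan laws may help. The Python sketch below assumes finite sets modelled as frozensets. The helper names big_union and big_intersection and the particular sets A and S are hypothetical choices made only for illustration.

```python
def big_union(collection):
    return frozenset().union(*collection)

def big_intersection(collection):
    members = list(collection)
    if not members:
        raise ValueError("intersection of an empty collection is not defined")
    return frozenset.intersection(*members)

A = frozenset(range(6))
S = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({2, 4, 7})]

# Part (i): the complement of a union is the intersection of the complements.
lhs_i = A - big_union(S)
rhs_i = big_intersection(A - X for X in S)

# Part (ii): the complement of an intersection is the union of the complements.
lhs_ii = A - big_intersection(S)
rhs_ii = big_union(A - X for X in S)
```

In this instance both sides of part (i) are {5}, and both sides of part (ii) are {0, 1, 3, 4, 5}.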


8.4.13 REMARK: Unions and intersections of set-collections specified by restriction. 

Notation 7.7.13 introduced the abbreviation {x ∈ A; P(x)} for {x; (x ∈ A) ∧ P(x)}, where A is a set and
P is a set-theoretic formula. Theorem 8.4.14 presents some properties of unions and intersections applied to 
sets of this form. Part (ii) is perhaps more surprising than part (i). 


8.4.14 THEOREM: Equivalent conditions for membership of unions and intersections of specified subsets. 
(i) x ∈ ⋃{A ∈ S; P(A)} ⇔ (∃A ∈ S, (x ∈ A ∧ P(A))) ⇔ (∃A, (x ∈ A ∧ A ∈ S ∧ P(A)))
for any set of sets S and set-theoretic formula P.
(ii) x ∈ ⋂{A ∈ S; P(A)} ⇔ (∀A ∈ S, (x ∈ A ∨ ¬P(A))) ⇔ (∀A, (x ∈ A ∨ A ∉ S ∨ ¬P(A)))
for any non-empty set of sets S and set-theoretic formula P which satisfies ∃A ∈ S, P(A).


PROOF: To prove part (i), note that

    x ∈ ⋃{A ∈ S; P(A)} ⇔ x ∈ ⋃{B ∈ S; P(B)}
                       ⇔ ∃A, (x ∈ A ∧ A ∈ {B ∈ S; P(B)})        (8.4.1)
                       ⇔ ∃A, (x ∈ A ∧ A ∈ S ∧ P(A))
                       ⇔ ∃A ∈ S, (x ∈ A ∧ P(A)).                (8.4.2)

To prove part (ii), note that

    x ∈ ⋂{A ∈ S; P(A)} ⇔ x ∈ ⋂{B ∈ S; P(B)}
                       ⇔ ∀A, (x ∈ A ∨ A ∉ {B ∈ S; P(B)})        (8.4.3)
                       ⇔ ∀A, (x ∈ A ∨ ¬(A ∈ S ∧ P(A)))
                       ⇔ ∀A, (x ∈ A ∨ A ∉ S ∨ ¬P(A))
                       ⇔ ∀A ∈ S, (x ∈ A ∨ ¬P(A)).               (8.4.4)

Line (8.4.1) follows from Theorem 8.4.4 (i). Line (8.4.2) follows from Notation 7.2.7. Line (8.4.3) follows
from Theorem 8.4.4 (ii). Line (8.4.4) follows from Remark 7.2.6.


8.5. Power sets 


8.5.1 REMARK: Some properties of power sets. 
Theorem 8.5.2 presents some basic properties of the power sets in Definition 7.6.25 and Notation 7.6.26. 


8.5.2 THEOREM: Some basic properties of power sets of general sets. 

Let A₁ and A₂ be sets. Let S₁ and S₂ be sets (of sets).
(i) ∅ ∈ ℙ(A₁).
(ii) A₁ ∈ ℙ(A₁).
(iii) A₁ ⊆ A₂ ⇒ ℙ(A₁) ⊆ ℙ(A₂).
(iv) A₁ ∈ ℙ(A₂) ⇒ ℙ(A₁) ∈ ℙ(ℙ(A₂)).
(v) ⋃(ℙ(A₁)) = A₁.
(vi) ⋂(ℙ(A₁)) = ∅.
(vii) ∀C ∈ ℙ(S₂), ⋃C ∈ ℙ(⋃S₂). That is, S₁ ⊆ S₂ ⇒ ⋃S₁ ⊆ ⋃S₂.
(viii) S₁ ⊆ ℙ(⋃S₁).
(ix) S₁ ⊆ ℙ(A₁) ⇒ ⋃S₁ ⊆ A₁. (That is, S₁ ∈ ℙ(ℙ(A₁)) ⇒ ⋃S₁ ∈ ℙ(A₁).)
(x) ⋃S₁ ⊆ A₁ ⇒ S₁ ⊆ ℙ(A₁). (That is, ⋃S₁ ∈ ℙ(A₁) ⇒ S₁ ∈ ℙ(ℙ(A₁)).)
(xi) ⋃S₁ ⊆ A₁ ⇔ S₁ ⊆ ℙ(A₁). (That is, ⋃S₁ ∈ ℙ(A₁) ⇔ S₁ ∈ ℙ(ℙ(A₁)).)
(xii) (S₁ ⊆ ℙ(A₁) ∧ S₁ ≠ ∅) ⇒ ⋂S₁ ⊆ A₁. (That is, S₁ ∈ ℙ(ℙ(A₁)) \ {∅} ⇒ ⋂S₁ ∈ ℙ(A₁).)
(xiii) {⋃C; C ∈ ℙ(ℙ(A₁))} = ℙ(A₁).
(xiv) {⋂C; C ∈ ℙ(ℙ(A₁)) \ {∅}} = ℙ(A₁).
(xv) ℙ(A₁ ∩ A₂) = ℙ(A₁) ∩ ℙ(A₂).
(xvi) ℙ(A₁ ∪ A₂) ⊇ ℙ(A₁) ∪ ℙ(A₂).
(xvii) ℙ(⋂S₁) = ⋂_{A∈S₁} ℙ(A) if S₁ ≠ ∅.
(xviii) ℙ(⋃S₁) ⊇ ⋃_{A∈S₁} ℙ(A).
(xix) ∀C ∈ ℙ(ℙ(A₁)), ∀S ∈ C, {S′ ∈ ℙ(A₁); S′ ⊆ S} = ℙ(S).

PROOF: For part (i), ∅ ⊆ A₁ by Theorem 7.6.5 (i). Hence ∅ ∈ ℙ(A₁) by Definition 7.6.25.

For part (ii), A₁ ⊆ A₁. Hence A₁ ∈ ℙ(A₁) by Definition 7.6.25. (See also Remark 7.6.28.)

To prove part (iii), let A₁ and A₂ be sets with A₁ ⊆ A₂. Let X ∈ ℙ(A₁). Then X ⊆ A₁. So X ⊆ A₂.
So X ∈ ℙ(A₂).

Part (iv) follows from part (iii) and Definition 7.6.25.

To prove part (v), let A₁ be a set. Then A₁ ∈ ℙ(A₁). So A₁ ⊆ ⋃(ℙ(A₁)). To show the reverse
inclusion, let x ∈ ⋃(ℙ(A₁)). Then ∃X ∈ ℙ(A₁), x ∈ X. So ∃X, (X ⊆ A₁ ∧ x ∈ X). Hence x ∈ A₁.
Therefore ⋃(ℙ(A₁)) ⊆ A₁. It follows that ⋃(ℙ(A₁)) = A₁.

For part (vi), ∅ ∈ ℙ(A₁) by part (i). Hence ⋂(ℙ(A₁)) = ∅ by Theorem 8.4.8 (xvi).

Part (vii) follows immediately from Theorem 8.4.8 (ii).

To prove part (viii), let S₁ be a set of sets. Let X ∈ S₁. Then X ⊆ ⋃S₁. So X ∈ ℙ(⋃S₁). It follows
that S₁ ⊆ ℙ(⋃S₁).

To prove part (ix), suppose S₁ ⊆ ℙ(A₁) and let B ∈ ⋃S₁. Then ∃U, (U ∈ S₁ ∧ B ∈ U). So ∃U, (U ∈
ℙ(A₁) ∧ B ∈ U) (because S₁ ⊆ ℙ(A₁)). That is, B ∈ ⋃(ℙ(A₁)). But ⋃(ℙ(A₁)) = A₁ by part (v).
So B ∈ A₁. Therefore ⋃S₁ ⊆ A₁ as claimed.

To prove part (x), suppose that ⋃S₁ ⊆ A₁. Then ℙ(⋃S₁) ⊆ ℙ(A₁) by part (iii). But S₁ ⊆ ℙ(⋃S₁) by
part (viii). Hence S₁ ⊆ ℙ(A₁).

Part (xi) follows trivially from parts (ix) and (x).

To prove part (xii), suppose S₁ ⊆ ℙ(A₁) and S₁ ≠ ∅. Then ⋃S₁ ⊆ A₁ by part (ix). But ⋂S₁ ⊆ ⋃S₁
because S₁ ≠ ∅. So ⋂S₁ ⊆ A₁ as claimed.

For part (xiii), let Z = {⋃C; C ∈ ℙ(ℙ(A₁))}. Then Z ⊇ {⋃{a}; {a} ∈ ℙ(ℙ(A₁))} = {a; a ∈ ℙ(A₁)} =
ℙ(A₁).

Now let C ∈ Z. Then C = ⋃D for some D ∈ ℙ(ℙ(A₁)). So ⋃D ∈ ℙ(A₁) by part (ix). So C ∈ ℙ(A₁).
Hence Z ⊆ ℙ(A₁). Therefore Z = ℙ(A₁) as claimed.
To prove part (xiv), let Z = {⋂C; C ∈ ℙ(ℙ(A₁)) \ {∅}}. Let C ∈ ℙ(A₁). Then
{C} ∈ ℙ(ℙ(A₁)) \ {∅}. So C = ⋂{C} ∈ Z. Hence ℙ(A₁) ⊆ Z.

Now let C ∈ Z. Then C = ⋂D for some D ∈ ℙ(ℙ(A₁)) \ {∅}. So ⋂D ∈ ℙ(A₁) by part (xii). So C ∈ ℙ(A₁).
Hence Z ⊆ ℙ(A₁). Therefore Z = ℙ(A₁) as claimed.

Part (xv) follows from Theorem 8.1.7 (xii).

Part (xvi) follows from Theorem 8.1.7 (xiv).

For part (xvii), let S₁ ≠ ∅. Then

    x ∈ ℙ(⋂S₁) ⇔ x ⊆ ⋂S₁
               ⇔ ∀A ∈ S₁, x ⊆ A                                  (8.5.1)
               ⇔ ∀A ∈ S₁, x ∈ ℙ(A)
               ⇔ x ∈ ⋂_{A∈S₁} ℙ(A),

where line (8.5.1) follows from Theorem 8.4.8 (xviii). Hence ℙ(⋂S₁) = ⋂_{A∈S₁} ℙ(A).

For part (xviii), let A ∈ S₁. Then A ⊆ ⋃S₁ by Theorem 8.4.8 (xiv). So ℙ(A) ⊆ ℙ(⋃S₁) by part (iii).
Hence ⋃_{A∈S₁} ℙ(A) ⊆ ℙ(⋃S₁) by Theorem 8.4.8 (xvii).

For part (xix), let C ∈ ℙ(ℙ(A₁)) and S ∈ C. Then {S′ ∈ ℙ(A₁); S′ ⊆ S} = ℙ(A₁) ∩ ℙ(S). But C ⊆ ℙ(A₁).
So S ∈ ℙ(A₁), which implies S ⊆ A₁. Therefore ℙ(S) ⊆ ℙ(A₁) by part (iii). So ℙ(A₁) ∩ ℙ(S) = ℙ(S) by
Theorem 8.1.7 (v). Hence {S′ ∈ ℙ(A₁); S′ ⊆ S} = ℙ(S).
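
Some of the power-set properties are easy to illustrate for finite sets. The Python sketch below assumes frozensets of small integers. The names power_set, big_union and big_intersection are hypothetical. It checks instances of parts (i), (ii), (iii), (v), (vi), (xv) and (xvi), and shows that the inclusion in part (xvi) can be strict.

```python
from itertools import combinations

def power_set(a):
    """Set of all subsets of a finite set, as a frozenset of frozensets."""
    items = list(a)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))

def big_union(collection):
    return frozenset().union(*collection)

def big_intersection(collection):
    # The collection is assumed to be non-empty.
    return frozenset.intersection(*collection)

A1 = frozenset({1, 2})
A2 = frozenset({2, 3})

assert frozenset() in power_set(A1)                               # part (i)
assert A1 in power_set(A1)                                        # part (ii)
assert power_set(A1 & A2) <= power_set(A1)                        # instance of part (iii)
assert big_union(power_set(A1)) == A1                             # part (v)
assert big_intersection(power_set(A1)) == frozenset()             # part (vi)
assert power_set(A1 & A2) == power_set(A1) & power_set(A2)        # part (xv)
assert power_set(A1) | power_set(A2) < power_set(A1 | A2)         # part (xvi), strict here
```

The strictness in the last line comes from subsets such as {1, 3}, which lie inside A1 ∪ A2 without lying inside A1 or inside A2.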


8.6. Closure of set unions under arbitrary unions 


8.6.1 REMARK: The closure of unions of collections of sets under unions is useful in topology. 
Theorem 8.6.2 states that for any set of sets S, the set of unions of subsets of S is closed under arbitrary 
unions. This is useful in topology, for example to prove Theorems 32.1.8, 32.1.10, 32.1.15 and 37.5.4. 


There is some danger of implicitly invoking the axiom of choice for Theorem 8.6.2 because it could be proved 
by choosing, for each set E ∈ Q, a particular collection C such that E = ⋃C. The axiom is avoided by
utilising all such sets C in the construction of the set D. So no choices are required! Using all elements of 
a set is often a successful tactic for avoiding making a single choice from that set. 


8.6.2 THEOREM: The set of unions of elements of a collection is closed under the union operation.
Let S be a set of sets. Let T = {⋃C; C ∈ ℙ(S)}. Then ∀Q ∈ ℙ(T), ⋃Q ∈ T.


PROOF: Let T = {⋃C; C ⊆ S}. Let Q ⊆ T. Let D = {A ∈ S; ∃C ∈ ℙ(S), (A ∈ C ∧ ⋃C ∈ Q)}. Then

    ∀x,  x ∈ ⋃D  ⇔  ∃A, (x ∈ A ∧ A ∈ D)
                  ⇔  ∃A, (x ∈ A ∧ A ∈ S ∧ ∃C ∈ ℙ(S), (A ∈ C ∧ ⋃C ∈ Q))
                  ⇔  ∃A, ∃C, (x ∈ A ∧ A ∈ S ∧ C ⊆ S ∧ A ∈ C ∧ ⋃C ∈ Q)      (8.6.1)
                  ⇔  ∃A, ∃C, (x ∈ A ∧ A ∈ C ∧ C ⊆ S ∧ ⋃C ∈ Q)              (8.6.2)
                  ⇔  ∃C, (x ∈ ⋃C ∧ ⋃C ∈ Q ∧ C ⊆ S)
                  ⇔  ∃C, ∃E, (x ∈ E ∧ E ∈ Q ∧ C ⊆ S ∧ E = ⋃C)
                  ⇔  ∃E, (x ∈ E ∧ E ∈ Q ∧ ∃C ∈ ℙ(S), E = ⋃C)
                  ⇔  ∃E, (x ∈ E ∧ E ∈ Q ∧ E ∈ T)
                  ⇔  ∃E, (x ∈ E ∧ E ∈ Q)                                   (8.6.3)
                  ⇔  x ∈ ⋃Q.

Line (8.6.1) follows from Theorem 6.6.16 (vi). Line (8.6.2) follows from Definition 7.3.2. Line (8.6.3) follows
from Q ⊆ T. Thus ⋃Q = ⋃D. But D ⊆ S. So ⋃Q ∈ T.
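
The construction in the proof of Theorem 8.6.2 can be checked by brute force on a small finite example. The following is only a sketch: the collection S and the helper names subsets, union_of and proof_set_D are illustrative assumptions, not part of the text. It builds T, tests closure under unions for every Q in ℙ(T), and confirms that the choice-free set D from the proof has the same union as Q.

```python
from itertools import chain, combinations


def subsets(items):
    """Return every subset of `items` as a frozenset."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def union_of(collection):
    """Union of a collection of sets; the union of the empty collection is empty."""
    return frozenset(chain.from_iterable(collection))


# Hypothetical example: a small collection S of sets.
S = frozenset({frozenset({1, 2}), frozenset({2, 3}), frozenset({4})})

# T is the set of unions of subcollections of S.
T = frozenset(union_of(C) for C in subsets(S))


def proof_set_D(Q):
    """The set D from the proof: every A in S lying in some C whose union is in Q."""
    return frozenset(A for A in S if any(A in C and union_of(C) in Q for C in subsets(S)))


# Closure under arbitrary unions, checked for all Q in P(T).
closed_under_unions = all(union_of(Q) in T for Q in subsets(T))

# No choice of a single C per element of Q is made: D uses all such C at once.
witness_agrees = all(union_of(proof_set_D(Q)) == union_of(Q) for Q in subsets(T))

print(len(T), closed_under_unions, witness_agrees)
```

A finite check of this kind proves nothing in general, but it does illustrate why D needs no selection of one collection C for each element of Q.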


8.6.3 REMARK: Well-definition of a set of unions.

In the statement of Theorem 8.6.2, ⋃C ⊆ ⋃S for all C ∈ ℙ(S) by Theorem 8.4.8 (ii). So ⋃C ∈ ℙ(⋃S), as
stated in Theorem 8.5.2 (vii). Therefore T ⊆ ℙ(⋃S). This guarantees that T is a well-defined set by the
ZF axiom of specification.


8.6.4 REMARK: Accidental use of the axiom of choice. 

Assertions of the form ∀x ∈ X, ∃y, P(x,y), for a proposition P depending on two variables, often lead to
the temptation to use the axiom of choice because such assertions intuitively suggest the existence of a
function f on the set X such that P(x, f(x)) holds for all x ∈ X. In the particular case of Theorem 8.6.2,
∀U ∈ Q, ∃C ∈ ℙ(S), U = ⋃C, which leads to the temptation to propose a function f : Q → ℙ(S) satisfying
⋃f(U) = U for all U ∈ Q. This issue is discussed in more detail in Remark 10.11.11.


8.7. Set covers and partitions 


8.7.1 REMARK: Covers of subsets are useful for topological compactness definitions. 
Set covers and subcovers are essential for the topological compactness definitions in Sections 33.5 and 33.7. 
Indexed set covers are defined in Section 10.18. 


For Definition 8.7.2, recall that in theory, a “collection” is synonymous with a “set”. (See Remark 7.1.7.) 
But the set C is a set of sets of “atomic” objects in X. So it is called a “collection” here. 


8.7.2 DEFINITION: A cover of a subset A in a set X is a collection C ∈ ℙ(ℙ(X)) such that A ⊆ ⋃C.


8.7.3 DEFINITION: A subcover of a cover C of a subset A in a set X is a cover C′ of A in X such that C′ ⊆ C.


8.7.4 DEFINITION: A refinement of a cover C of a subset A in a set X is a cover C′ of A in X such that
∀S′ ∈ C′, ∃S ∈ C, S′ ⊆ S.


8.7.5 THEOREM: A covered set is equal to the union of intersections of sets which cover it.
Let C be a cover of a subset A of a set X. Then A = ⋃_{S∈C} (A ∩ S).


PROOF: Let A ⊆ ⋃C. Then by Theorems 8.1.7 (v) and 8.4.8 (iv), A = A ∩ (⋃C) = ⋃{A ∩ S; S ∈ C} =
⋃_{S∈C} (A ∩ S).
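
Definitions 8.7.2, 8.7.3 and 8.7.4 and Theorem 8.7.5 may be illustrated with finite sets. This is a minimal sketch in which the function names and the example data are assumptions made for illustration. The ambient set X is left implicit, so the condition C ∈ ℙ(ℙ(X)) is not checked.

```python
def union_of(collection):
    """Union of a collection of sets."""
    return frozenset().union(*collection)


def is_cover(C, A):
    """Definition 8.7.2: C covers A when A is included in the union of C."""
    return A <= union_of(C)


def is_subcover(C1, C, A):
    """Definition 8.7.3: C1 covers A and every set in C1 is a set in C."""
    return is_cover(C1, A) and C1 <= C


def is_refinement(C1, C, A):
    """Definition 8.7.4: C1 covers A and every set in C1 is included in some set in C."""
    return is_cover(C1, A) and all(any(S1 <= S for S in C) for S1 in C1)


# Hypothetical example data: A is a subset of X = {1, ..., 6}.
A = frozenset({1, 2, 3})
C = frozenset({frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({5, 6})})
C_sub = frozenset({frozenset({1, 2}), frozenset({2, 3, 4})})
C_ref = frozenset({frozenset({1}), frozenset({2, 3})})

# Theorem 8.7.5: A equals the union of the intersections of A with the sets in C.
pieces = union_of(A & S for S in C)

print(is_cover(C, A), is_subcover(C_sub, C, A), is_refinement(C_ref, C, A), pieces == A)
```

Every subcover is a refinement, but C_ref above shows that a refinement need not be a subcover.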


8.7.6 REMARK: Covers of sets. 

The term “cover of a set” sometimes means a cover of a subset in the sense of Definition 8.7.2, where the sets 
in the cover may overflow into the set which includes the given subset, but sometimes what is meant is an 
exact cover which covers only the given set and nothing more. In the case of topological compactness, open 
covers are permitted to “overflow” outside a given set because the sets in the cover are intended to be open 
sets in the topological space which includes the given set. Restricting these covering sets to the set which is 
covered would lose the openness property. However, there are also situations where the term “cover of a set”
means an exact cover which covers the given set and nothing more. This is the purpose of Definition 8.7.7.


The intended meaning of the term “set cover” can usually be determined from its context.


8.7.7 DEFINITION: A cover of a set X is a collection C ∈ ℙ(ℙ(X)) such that X = ⋃C.


8.7.8 DEFINITION: A subcover of a cover C of a set X is a cover C′ of X such that C′ ⊆ C.


8.7.9 DEFINITION: A refinement of a cover C of a set X is a cover C′ of X such that
∀S′ ∈ C′, ∃S ∈ C, S′ ⊆ S.


8.7.10 REMARK: Technically superfluous condition in the definition of a set cover.

As a curiosity, one may note that the condition C ∈ ℙ(ℙ(X)) in Definition 8.7.7 is superfluous, at least in
the technical sense. The requirement X = ⋃C implies that S ⊆ X for all S ∈ C by Theorem 8.4.8 (xvii).
So S ∈ ℙ(X) for all S ∈ C. Therefore C ⊆ ℙ(X), which implies C ∈ ℙ(ℙ(X)). (Alternatively see
Theorem 8.5.2 (xi) for the assertion that ⋃C ⊆ X if and only if C ∈ ℙ(ℙ(X)).) However, it often happens
in mathematics that a superfluous condition may give some extra meaning or understanding. In this case,
the condition C ∈ ℙ(ℙ(X)) means that C is a set of subsets of X, which gives an immediate idea (and
probably a mental image) of its relation to X. By the ZF power set and specification axioms, one can also
determine immediately that the set of all covers of a set X is a well-defined set. Stating the superfluous
condition C ∈ ℙ(ℙ(X)) first saves some mental energy in determining this basic relation between C and X.


8.7.11 REMARK: Partitions of sets are special kinds of set covers. 

A partition of a set X is a cover C of X (in the exact covering sense of Definition 8.7.7) such that each
element of X is an element of one and only one element of C. (The relations between covers and partitions are
illustrated in Figure 8.7.1. A topology on X is also a kind of set cover of X. See Definition 31.3.2.)


[Figure 8.7.1: Family tree for covers and partitions. A cover of a subset specialises to a cover of a set,
which in turn specialises to a partition of a set and to a topology on a set.]


Partitions are closely related to equivalence relations, which are defined in Section 9.8. 


8.7.12 DEFINITION: A partition of a set X is a set C ⊆ ℙ(X) \ {∅} such that


(i) ⋃C = X.
(ii) ∀A, B ∈ C, (A = B or A ∩ B = ∅).


A set X is said to equal the disjoint union of C if C is a partition of X.
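
The three requirements of Definition 8.7.12 translate directly into a finite test. The following sketch assumes finite sets represented as Python frozensets. The function name is_partition and the example data are illustrative only.

```python
def is_partition(C, X):
    """Definition 8.7.12: non-empty subsets of X, pairwise equal or disjoint, with union X."""
    blocks = [frozenset(block) for block in C]
    # C must be included in P(X) \ {∅}.
    if any(not block or not block <= X for block in blocks):
        return False
    # Condition (i): the union of C equals X.
    if frozenset().union(*blocks) != X:
        return False
    # Condition (ii): any two elements of C are equal or disjoint.
    return all(A == B or not (A & B) for A in blocks for B in blocks)


X = frozenset({1, 2, 3, 4})

print(is_partition([{1, 2}, {3}, {4}], X))       # a partition
print(is_partition([{1, 2}, {2, 3}, {4}], X))    # covers X, but the sets overlap
print(is_partition([set(), {1, 2}, {3, 4}], X))  # the empty set is forbidden
```

With this definition, the empty collection is the only partition of the empty set.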


8.7.13 REMARK: Difficulties with the term “disjoint union”. 

There is a slight abuse of language in the term “disjoint union”. This term creates the impression that there
are two kinds of union: an ordinary union ⋃C and a disjoint union of C, which many people denote with a
separate symbol such as ⨆C.
However, the phrase “X is the disjoint union of C” means “X is the union of C, and C is a pairwise disjoint
collection of sets”. The phrase “disjoint union of C” means “union of C, which happens to be a pairwise
disjoint collection”. If the collection C doesn’t happen to be a pairwise disjoint collection, then the phrase
“the disjoint union of C” has no meaning.


8.7.14 REMARK: Exclusion of the empty set as an element of a partition.

It seems reasonable to require ∅ ∉ C in Definition 8.7.12. The empty set contributes no useful information
to a partition. The empty set may be added to or removed from any partition without affecting the implied 
equivalence relation. One may either require it, forbid it, or tolerate it. Here it is forbidden, but in some 
ways, it would be simpler to require it. A strong argument in favour of forbidding the inclusion of the empty 
set in partitions is that it adversely affects the cardinality of the partition. For example, the finest partition 
of a finite set has the same cardinality as the set. It would be inconvenient to have to add 1 to obtain 
the cardinality of the finest partition. A weak argument for allowing the empty set to be an element of a 
partition is that this would make Definition 8.7.12 marginally simpler. 


8.7.15 DEFINITION:
A partition C′ of a set X is (weakly) finer than a partition C of X if ∀x ∈ C′, ∃y ∈ C, x ⊆ y.


A partition C′ of a set X is (weakly) coarser than a partition C of X if ∀y ∈ C, ∃x ∈ C′, y ⊆ x.


8.7.16 DEFINITION: A (weak) refinement of a partition C of a set X is a partition C′ of X such that
∀x ∈ C′, ∃y ∈ C, x ⊆ y.
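
The quantified conditions in Definitions 8.7.15 and 8.7.16 may be tested on finite partitions as follows. This is a sketch only: the names is_finer and is_coarser and the example partitions are assumptions, and the inputs are presumed to be partitions of the same set already.

```python
def is_finer(C1, C):
    """C1 is (weakly) finer than C: every set in C1 is included in some set in C."""
    return all(any(x <= y for y in C) for x in C1)


def is_coarser(C1, C):
    """C1 is (weakly) coarser than C: every set in C is included in some set in C1."""
    return all(any(y <= x for x in C1) for y in C)


X = frozenset({1, 2, 3, 4})
coarse = frozenset({frozenset({1, 2}), frozenset({3, 4})})
fine = frozenset({frozenset({1}), frozenset({2}), frozenset({3, 4})})
finest = frozenset(frozenset({x}) for x in X)  # the partition of X into singletons

print(is_finer(fine, coarse), is_finer(coarse, fine), is_coarser(coarse, fine))
print(len(finest) == len(X))
```

The word “weakly” is reflected in the fact that every partition is both finer and coarser than itself. The last line illustrates the cardinality argument in Remark 8.7.14: the finest partition of a finite set has the same number of elements as the set because the empty set is excluded.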


8.8. Specification tuples and object classes 


8.8.1 REMARK: Mathematical objects need “class tags” to indicate their meaning. 

The fact that mathematicians do not agree on which sets should represent the concepts of differential 
geometry shows that the sets are not themselves the objects under consideration, but merely serve to indicate 
which object is being looked at within a class of objects. 


Every mathematical object should ideally have not only a set to indicate which object it is, but also an 
object class, a name tag, a scope identifier and an encoding class. The encoding class is really necessary 
for the same reason that it is necessary in computer software. Computers represent all data in terms of 
zeros and ones. These are the same kinds of zeros and ones whether the data is text, integers, floating-point 
numbers or images in dozens of formats. Therefore all data in computers has some sort of indication of the 
encoding rules used for each piece of data. It is also necessary to know which class of object is represented. 
To distinguish one object from another, identifiers (name tags) are used. Since names from different contexts 
may clash, scope identifiers are often used implicitly or explicitly. 


Strictly speaking, mathematics should also have all five of these components: (1) a set to indicate which 
object of a class is indicated; (2) a class tag to indicate the human significance of the object class; (3) an 
encoding tag to indicate the chosen representation; (4) a name tag to indicate which object of a class is 
indicated; (5) a scope tag to remove ambiguity from multiple uses of a name tag. All of this is done 
informally in most mathematics, but it is helpful to be aware of the limits of the expressive power of sets. 
Sets should be thought of as the mathematical equivalent of the zeros and ones of computer data. Either 
explicitly or implicitly, this raw data must be brought to life. 


8.8.2 REMARK: Particular objects within a mathematical class may be singled out with specification tuples. 
Mathematical objects may be organised into classes. The objects in each mathematical class may be indicated 
by a “specification tuple”, a parameter sequence which uniquely determines a single object in the class. 
Names for things are often confused with the things which they refer to. Mathematical names are no 
exception. A specification tuple is often thought of as the definition of the object itself. For example, a pair 
(G, σ_G) may be defined to be a group if the function σ_G : G × G → G satisfies the axioms of a group. The
trivial group with identity 0 would then be the tuple ({0}, {((0,0),0)}). However, given only this pair of 
sets, it would be difficult to guess that it is supposed to be a group. There is something missing in the bare 
parameter list. One could think of the missing significance as the “essence of a group”. 


As another example, consider the set {∅, {∅}}. This has the following interpretations.


(1) ℙ({∅}). (See Definition 7.6.25.)

(2) The unordered pair {0,1}, because 0 = ∅ and 1 = {∅}. (See Remark 12.2.1.)

(3) The ordinal number 2. (Also defined in Remark 12.2.1.)

(4) The topological space (X, T) with X = ∅ and T = ℙ(X). (See Remark 31.3.17.)


Thus by looking at this set without a class tag, it is impossible to know if it is a set of two sets, a set of two
ordinal numbers, a single ordinal number, or a topological space.
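
The need for a class tag has a direct analogue in software. In the following sketch, the names Tagged, class_tag and data are assumptions made for illustration. The same underlying set is built in two ways and compares equal as raw data, whereas the tagged objects are distinguishable.

```python
from dataclasses import dataclass

EMPTY = frozenset()             # the empty set, which is also the ordinal number 0
ONE = frozenset({EMPTY})        # the ordinal number 1 = {∅}
TWO = frozenset({EMPTY, ONE})   # the ordinal number 2 = {∅, {∅}}

# The power set of {∅}, listed explicitly: its subsets are ∅ and {∅}.
power_set_of_one = frozenset({frozenset(), frozenset({EMPTY})})


@dataclass(frozen=True)
class Tagged:
    """A raw set together with a hypothetical class tag giving its intended meaning."""
    class_tag: str
    data: frozenset


as_power_set = Tagged("power set", power_set_of_one)
as_ordinal = Tagged("ordinal number", TWO)

print(as_power_set.data == as_ordinal.data)  # the bare sets cannot be told apart
print(as_power_set == as_ordinal)            # the tagged objects can
```

The tag here is only a string. Its meaning still lies outside the formalism, in the mind of whoever reads it.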


8.8.3 REMARK: Specification tuples may be abbreviated.

A full specification tuple may be inconveniently long. In this case, the tuple may be abbreviated. For
example, a topological group might be referred to as a set G with various operations and a topology, whereas
a full specification tuple might be (G, T_G, σ_G) where T_G is the topology and σ_G is the group operation.
Informal specifications are fine for simple situations, but as structures become progressively more complex, 
the burden on the reader's memory and guesswork becomes excessive. It is best to specify the full set of 
parameters to avoid ambiguity when introducing a new concept. 


8.8.4 REMARK: A notation to indicate abbreviation of specification tuples.
Although one may think of notations such as G and (G, T_G, σ_G) as referring to the same object, they cannot
be equal. A statement such as “G = (G, T_G, σ_G)” is logical nonsense. It is preferable to use an asymmetric
notation such as

    G −< (G, T_G, σ_G)    or    (G, T_G, σ_G) >− G


to indicate that G is an abbreviation for (G, T_G, σ_G). The non-standard chicken-foot symbols “−<” and “>−”
are frequently used in this book.


8.8.5 REMARK: A more precise alternative specification notation which is not used in this book. 

In a more formal presentation, one might indicate mathematical classes explicitly. For example, the class of
all topological principal fibre bundles with structure group G might be denoted as TPFB[G, T_G, σ_G], where
G −< (G, T_G, σ_G) is a topological group. (In this case, there is a different class TPFB[G, T_G, σ_G] for each
choice of the triple of parameters (G, T_G, σ_G), where G is a group, T_G is a topology on G and σ_G is an
algebraic operation on G.)


An individual member of this class might be denoted as TPFB[G, T_G, σ_G](P, T_P, q, B, T_B, A_P^G, μ_G^P), which
indicates both the parameters of the class and the parameters of the particular object. The 3 class parameters
are (G, T_G, σ_G). The 7 object parameters are (P, T_P, q, B, T_B, A_P^G, μ_G^P).

With such a notation, the reader would see at all times the clear division between class and object parameters.
In practice, such an object is denoted as (P, q, B) or (P, T_P, q, B, T_B, A_P^G, μ_G^P), and the class membership is
indicated in the informal context. Although this book is not written in such a formal way, an effort has been 
made to ensure that most class and object specifications could be formalised if required. Most texts freely 
mix class and object parameters, which can make it difficult to think clearly about what one is doing. 


8.8.6 REMARK: Idea for replacing specification tuples with specification trees or networks. 
Another possible idea for upgrading the information level of specification tuples would be to replace them 
with specification trees or specification networks as discussed very briefly in Remark 47.6.3. 


8.8.7 REMARK: Example of subtle ambiguity which results from not indicating the class of an object. 
Suppose LTG(G, X, σ, μ) denotes a left transformation group G with group operation σ : G × G → G, acting
on X with action μ : G × X → X. Let RTG(G, X, σ, μ) denote the corresponding right transformation group.
Then for any group GP(G, σ), the parameters of LTG(G, G, σ, σ) and RTG(G, G, σ, σ) are identical. Yet
they are respectively the left and right transformation groups of G acting on G. So they are different classes
of structure with identical parameters. (This particular ambiguity is also mentioned in Remark 63.7.1.)


A subtle feature of this example is that even if all of the components of the specification tuple have well- 
defined class tags, it is still not possible to know what the class of the entire tuple is. Thus it is necessary 
to recursively specify tags for the tuple and all of its components. 


This shows that there must be something extra which indicates the class of a specification tuple. Thus, for 
example, when this book talks about “the group (G, σ)”, what is really meant is “the structure GP(G, σ)”,
where the meaning of GP is explained only in non-technical human-to-human language. The meaning of
the class of a structure lies outside formal mathematics in the socio-mathematical context. So the reader
should have no illusions that the pair (G, σ) is a group. It is just a pair of parameters.


9.0.1 REMARK: The close similarity between relations and functions.
Relations and functions are closely related. A relation may be thought of as a boolean-valued function of
tuples of objects. A function may be thought of as a map from tuples of objects to objects.


Relations and functions evidently have common features. Therefore it makes sense to discuss some aspects
of relations and functions together. A relation is essentially a map from tuples of objects (x₁, x₂, … xₘ)
to the truth values F and T, whereas a function is essentially a map from tuples of objects (x₁, x₂, … xₘ)
to individual objects y. Although relations and functions are thus clearly distinct, the standard way to
represent functions in mathematics is as special cases of relations. So the representation of relations has
important consequences for the representation of functions.


9.0.2 REMARK: Family tree of relations and functions.
A family tree for relations and functions is illustrated in Figure 9.0.1.


[Figure 9.0.1: Family tree for relations and functions. Nodes: Cartesian product, relation, equivalence
relation, partial order, lattice order, total order, well-ordering, partially defined function,
multiple-valued function, function, injective function, surjective function, bijective function.]


Figure 9.0.1 includes equivalence relations, which are defined in Section 9.8, functions, which are defined in
Chapter 10, and some types of order, which are defined in Chapter 11.


9.1. Representation of relations and functions as sets


9.1.1 REMARK: Representing relations and functions as sets of ordered pairs. 

Relations and functions may be represented as sets of ordered pairs. In other words, associations between 
objects may be explicitly and exhaustively listed. In practice, such sets of ordered pairs are too numerous to 
list individually. So these ordered-pair lists are specified by finite sets of rules. Therefore the set-of-ordered- 
pairs representation is merely an in-principle abstraction of relations and functions. (Functions are defined 
as special cases of relations. This is illustrated in Figure 9.0.1.) 


Two rule-sets which specify relations are equivalent if and only if they generate the same set of ordered pairs.
Thus two relations are equivalent if and only if their rule-sets are logically equivalent. Suppose relations R₁
and R₂ are defined as the ordered-pair sets R₁ = {(x,y); P₁(x,y)} and R₂ = {(x,y); P₂(x,y)}, where P₁
and P₂ are set-theoretic relations (i.e. predicates) with two variables. Then


    R₁ = R₂  ⇔  (∀x, ∀y, P₁(x,y) ⇔ P₂(x,y)).


In other words, testing equality of the sets R₁ and R₂ is equivalent to testing the logical equivalence of the
predicates P₁ and P₂. (This is clear because (x,y) ∈ R_j ⇔ P_j(x,y) for j = 1,2.) Although one is supposed
to think of relations as ordered-pair sets, in practice one works with logical predicates most of the time.


These two ways of thinking about relations may be referred to as “ordered-pair-sets” and “membership- 
predicates”. The correspondence between these representations is not perfect for (at least) two reasons. 
Firstly, a membership predicate may not specify a Zermelo-Fraenkel set. For example, the predicate P(x, y) =
“x ⊆ y” does not specify a ZF set. (It is too “big”.) Secondly, a set may not be constructible from a finite
set of rules. For example, the set ℙ(ℝ × ℝ) of all relations between real numbers is a well-defined ZF set.
But almost all elements of this set are not specifiable by a finite set of rules. (In fact, almost all elements of
ℝ are not specifiable by a finite set of rules.) Thus most of the elements of such sets are “unreachable” by
membership predicates. But they can be said to “exist in aggregate”.


Ordered-pair-sets have the advantage of avoiding issues such as the existence of finite rules to specify them. 
But membership-predicates have the advantage that they are well defined even when the domain and range 
of the relation are not sets. Chapter 9 presents the ordered-pair-set representation for relations. In practice, 
these sets are typically defined as a predicate restricting a well-defined ZF set. (See Notation 7.7.13.) This 
leverages the axiom of specification to guarantee that the relation is a ZF set. 
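
The two ways of thinking about relations in Remark 9.1.1 can be compared on a finite universe. In the following sketch, the universe U, the helper relation and the three predicates are assumptions made for illustration, and Python tuples stand in for ordered pairs. Restricting each predicate to U × U plays the role of the axiom of specification.

```python
from itertools import product

U = range(5)  # a hypothetical finite universe standing in for a well-defined ZF set


def relation(predicate, universe=U):
    """The ordered-pair set {(x, y); P(x, y)} restricted to universe × universe."""
    return frozenset((x, y) for x, y in product(universe, repeat=2) if predicate(x, y))


R1 = relation(lambda x, y: x < y)       # rule-set P1
R2 = relation(lambda x, y: y - x > 0)   # rule-set P2, logically equivalent to P1 on U
R3 = relation(lambda x, y: x <= y)      # a rule-set which is not equivalent to P1

print(R1 == R2, R1 == R3, len(R1), len(R3))
```

The rules for R1 and R2 are written differently, but they generate the same ordered-pair set, so they specify the same relation.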


9.1.2 REMARK: Graphs of relations and functions. 

If relations and functions are identified with the set of ordered pairs associating elements of the domain with 
elements of the range (as in Sections 9.5 and 10.2), then it is difficult to say what the "graph of a relation" 
or “graph of a function” means, since the graph (i.e. the set of ordered pairs) is identical to the function. 
The term “graph” for a relation or function does have a non-vacuous meaning, however, if a function is 
defined to be a triple consisting of a domain, a range and a rule (or rule-set). Then the graph is the set of 
element-pairs determined by the rule. 


9.1.3 REMARK: Explicit specification of domains and ranges of relations and functions. 

Some authors specify relations and functions as an ordered triple such as f = (X,Y, S) to mean that X is the 
domain, Y is the range, and S is the set of ordered pairs of the function. Then two functions with the same 
domain set and ordered-pair set are considered to be equal functions if and only if the range sets are the 
same. (See for example MacLane/Birkhoff [110], page 4.) The sets X and Y are superfluous. Nevertheless, 
if this formalism is adopted, the set S may be referred to as the graph of the function f to distinguish it 
from the domain/range/graph triple. 


In contexts where the domain and range are fixed in the context background, one naturally ascribes properties 
such as “well-defined” and “surjective” to the ordered-pair sets of functions — because the domain and range 
sets are not varying. So one attributes these properties to the things which do vary. But in contexts where 
the domain and range sets are highly variable, one becomes aware that the “well-defined” and “surjective” 
properties are actually relations between the ordered-pair sets and the domain and range sets. In some 
contexts, each individual function may be specified as a fixed rule which is interpreted differently according 
to the domain and range. 


Thus, for example, the squaring rule maps n to n × n for integers, but the same rule may map z to z² for
complex numbers or A to A² for (finite or infinite) square matrices over the quaternions. Indicator maps
on sets X map elements of X to the range Y = {0,1}, but this range may be interpreted as a subset of the
ordinal numbers ω, or the signed integers ℤ, or the real numbers ℝ, and so forth. Thus a single function
may have many different ranges.
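
The domain/range/graph triple of Remark 9.1.3 may be sketched as follows. The class FunctionTriple, its field names and the test is_surjective are illustrative assumptions. The field codomain holds the set which the text calls the range Y.

```python
from typing import NamedTuple


class FunctionTriple(NamedTuple):
    """A function specified as a triple (X, Y, S)."""
    domain: frozenset
    codomain: frozenset  # the set called the "range" Y in the text
    graph: frozenset     # the set S of ordered pairs


def is_surjective(f):
    """Surjectivity relates the graph to the chosen range, not to the graph alone."""
    return {y for (_, y) in f.graph} == set(f.codomain)


X = frozenset({0, 1, 2})
# Graph of the indicator map of the subset {1} of X.
S = frozenset((x, 1 if x == 1 else 0) for x in X)

f = FunctionTriple(X, frozenset({0, 1}), S)
g = FunctionTriple(X, frozenset({0, 1, 2, 3}), S)

print(f.graph == g.graph, f == g, is_surjective(f), is_surjective(g))
```

The two triples have identical graphs but are different as triples, and only one of them is surjective.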


9.1.4 REMARK: The qualitative difference between relations and functions. 

Although functions are represented as particular kinds of sets, they may be thought of as being dynamic, in 
contrast to general relations and sets, which are typically regarded as static. Functions may be thought of as 
moving or transforming objects. Sometimes one thinks of functions as having a temporal or causal quality. 


Since functions may be applied to functions, it often happens that a function acts on static objects, but the 
function is itself acted upon by other functions. So the same function may sometimes have the active role, 
sometimes a passive role. 


Similarly, relations are typically thought of as more than just sets of ordered pairs. They may be thought of 
as indicating associations between things. 


The meanings of sets, relations and functions are clearly not fully captured in the set theory formalism. The 
meanings are provided by the human being who interprets these objects. Sets merely provide parametrisation 
for mathematical objects. Sets are not the real objects of mathematics. (This is similar to the notion that 
words merely provide parametrisation for real-world objects and ideas.) 


9.2. Ordered pairs 


9.2.1 REMARK: The representation of ordered pairs. 

To define relations and functions within the ZF set theory framework, one first defines ordered pairs. This 
is not unavoidable. For example, a relation or function between two sets A and B could be specified as a set 
of unordered pairs {a,b} with a € A and b € B if A and B are disjoint. There would then be no ambiguity 
about which element of each pair belongs to the domain or the range. If the sets A and B are different, but 
not necessarily disjoint, one could specify a relation or function from A to B as a set of pairs {{A, a}, {B, b}}
for a ∈ A and b ∈ B. Since a = A and b = B are then impossible, the information about which element a
or b belongs to each set can be retrieved. (It is not possible for both a = B and b = A to be true because
the axiom of regularity excludes this.)


Another way to specify ordered pairs would be to introduce a new kind of object into the set theory frame- 
work, namely the class of pairs of objects. These could be axiomatised in the same way as sets, but then 
one would have to describe also pairs whose elements may be pairs. (See Remark 9.2.4.) Similarly, one 
could define relations and functions as axiomatised classes of objects, which would make the system even 
more unwieldy. According to E. Mendelson [370], page 162, Kazimierz (Casimir) Kuratowski discovered the 
representation of ordered pairs in Definition 9.2.2. (See also Quine [382], page 58, and Kuratowski [425], 
page 171.) It has the advantage that pairs are implemented in terms of sets, thereby avoiding the nuisance 
of axiomatising multiple classes of objects, but it has the disadvantage that several kinds of ambiguities 
occur. 


For any sets a and b, the ordered pair construction {{a}, {a,b}} in Definition 9.2.2 is a well-defined set. 
(This follows from ZF Axiom (3). See Definition 7.2.4.) A peculiarity of this representation is the fact that 
the ordered pair (a, a) is represented by {{a}}. One would not immediately think of a set of the form {{a}} 
as an ordered pair. The difference is a matter of interpretation and context. The advantage of representing 
pairs as ordinary sets is the spartan economy of ZF set theory. The disadvantage is that each set has a 
meaning only when accompanied by commentary. 


9.2.2 DEFINITION: An ordered pair is a set of the form {{a}, {a, b}} for any sets a and b.


9.2.3 NOTATION: (a,b) denotes the ordered pair {{a}, {a, b}} for any a and b.
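
The representation in Definition 9.2.2 can be imitated with nested frozensets. This is a sketch under the assumption that hashable Python objects (here small integers) stand in for the sets a and b. The name kpair is illustrative.

```python
def kpair(a, b):
    """The ordered pair {{a}, {a, b}} of Definition 9.2.2."""
    return frozenset({frozenset({a}), frozenset({a, b})})


# The order of the two items matters, unlike for the unordered pair {a, b}.
print(kpair(1, 2) == kpair(2, 1))

# The pair (a, a) collapses to {{a}}, as noted in Remark 9.2.1.
print(kpair(1, 1) == frozenset({frozenset({1})}))

# Two pairs are equal if and only if their left and right items agree.
items = range(4)
characteristic_property = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a in items for b in items for c in items for d in items
)
print(characteristic_property)
```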


9.2.4 REMARK: Alternative representation of ordered pairs as a class outside ZF set theory. 

One could axiomatically define a “system of pairs of sets”, called Pairs, say, to contain all of the ordered
pairs of sets in a ZF set theory Sets, say. The relations between sets and pairs could be defined in terms
of a function P : Sets × Sets → Pairs and functions L : Pairs → Sets and R : Pairs → Sets such that
L(P(x,y)) = x, R(P(x,y)) = y and P(L(p), R(p)) = p for sets x and y and pairs p. (Of course, functions and
cross-products have not been defined yet, and Sets and Pairs are classes, not sets. So these are merely naive
notions here, not axiomatic.) We could call P(x,y) the ordered pair of sets x and y. Then Definition 9.2.2
would be just one possible representation of the class Pairs in terms of sets. The set expression {{a}, {a, b}}
is merely a parametrisation of the concept of an ordered pair.

The set-theoretic functions P, L and R for the ordered pair representation in Definition 9.2.2 must satisfy
P(a,b) = {{a}, {a, b}}, L({{a}, {a, b}}) = a and R({{a}, {a, b}}) = b. But the operators L(p) and R(p)
must be able to extract a and b from p without being given them in advance. Theorems 9.2.5 and 9.2.6 show
that the left and right elements of an ordered pair can be extracted without knowing them in advance.


9.2.5 THEOREM: Set-theoretic formula for the left element of an ordered pair.
The left element of an ordered pair p is the set a defined by ∀x ∈ p, a ∈ x.


PROOF: Let p = (a,b) be an ordered pair. Then p = {{a}, {a,b}}. Therefore ∀x ∈ p, a ∈ x. But it must
be shown that this uniquely determines the set a. In other words, it must be shown that if ∀x ∈ p, a′ ∈ x,
then a′ = a. Suppose a = b. Then a′ ∈ {a}. Therefore a′ = a. Suppose a ≠ b. Then a′ ∈ {a} and a′ ∈ {a,b}.
In particular, a′ ∈ {a} again. So a′ = a.


9.2.6 THEOREM: Set-theoretic formula for the right element of an ordered pair.
The right element of an ordered pair p is the set b defined by ∃′x ∈ p, b ∈ x.


PROOF: The proposition ∃′x ∈ p, b′ ∈ x means ∃x ∈ p, b′ ∈ x and ∀x, y ∈ p, ((b′ ∈ x ∧ b′ ∈ y) ⇒ x = y).
(See Notation 6.8.3 for the unique existence notation.) This clearly holds with b′ = b. So it remains to show
that any b′ which satisfies ∃′x ∈ p, b′ ∈ x must equal b. Suppose a = b. Then b′ ∈ {a} = {b}. So b′ = b.
Suppose a ≠ b. Then either b′ ∈ {a} or b′ ∈ {a,b}. So either b′ = a or b′ = b. Suppose b′ = a. Then x = {a}
and y = {a,b} implies that (b′ ∈ x ∧ b′ ∈ y) and x ≠ y, which contradicts the assumption. Hence b′ = b.


9.2.7 REMARK: The difficulty of extracting the left and right items from an ordered pair. 

Theorems 9.2.5 and 9.2.6 are unsatisfying. One would ideally like to have simple logical expressions “left”
and “right” which yield a and b from p as a = left(p) and b = right(p). Instead we have expressions only for
{a} and {b} in terms of p.


    {a} = {y; ∀x ∈ p, y ∈ x}
        = ⋂p
    {b} = {y; ∃′x ∈ p, y ∈ x}
        = ⋃p \ ⋂p   if ⋃p ≠ ⋂p
          ⋃p        if ⋃p = ⋂p.


These are clumsy constructions, but there seem to be no expressions for a and b themselves at all. In fact,
more generally, there seems to be no expression to simply construct a from {a}.


One way out of this difficulty would be to invent a form of expression such as “x; x ∈ X” to extract a
from X = {a}. But this logical expression only yields a single thing if the set is a singleton. It is desirable
to have an operator which yields “the thing inside the set”. The closest one can come to “x = the thing
inside X” is “x ∈ X”. Within the application context, one must also then assert that ∃′x, x ∈ X. In a
formal logic approach, this is essentially what is done. Thus for example, the proposition “∀x ∈ p, a ∈ x”
may be introduced to mean that a is the left element of p.


The left and right operators may be written as L(p) = “a; ∀x ∈ p, a ∈ x” and R(p) = “b; ∃′x ∈ p, b ∈ x”.
This is equivalent to writing ∀x ∈ p, L(p) ∈ x and ∃′x ∈ p, R(p) ∈ x.
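
The characterisations of the left and right elements in Theorems 9.2.5 and 9.2.6 can be turned into finite searches. This is a sketch in which the names kpair, left and right are assumptions and small integers stand in for sets. In a programming language, “the thing inside a singleton” is available by unpacking, which is exactly the operation which seems to have no direct set-theoretic expression.

```python
def kpair(a, b):
    """The ordered pair {{a}, {a, b}} of Definition 9.2.2."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def left(p):
    """The unique a such that a is an element of every x in p (Theorem 9.2.5)."""
    (a,) = frozenset.intersection(*p)  # unpack the singleton {a}, the intersection of p
    return a


def right(p):
    """The unique b which is an element of exactly one x in p (Theorem 9.2.6)."""
    candidates = frozenset().union(*p)
    (b,) = [y for y in candidates if sum(y in x for x in p) == 1]
    return b


pairs_ok = all(
    left(kpair(a, b)) == a and right(kpair(a, b)) == b
    for a in range(3) for b in range(3)
)
print(pairs_ok)
```

The case a = b is handled without special treatment: the pair is then {{a}}, and a lies in exactly one element of it.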


9.2.8 REMARK: Extraction of elements from singletons and ordered pairs if the elements are sets. 

Kelley [101], pages 258-259, gives the methods in Theorem 9.2.9 for extracting the elements of singletons and 
ordered pairs. These methods depend very much on the assumption that the objects a and b in an ordered 
pair (a,b) are themselves sets. By the axiom of extension, Definition 7.2.4 (1), {x; x € a} =a for any set a. 
(In a set theory with “atoms”, this equality is not valid.) 


9.2.9 THEOREM: Some basic properties of ordered pairs (and singletons and unordered pairs). 
Let a and b be sets. 


(i) Ufa} =a. 
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(ii) (aj = a. 
(iii) Ufa, b} 2 aUb 
(iv) (Ma, b} —- an b. 
(v) Ula, b) = {a,b}. 
(vi) Nla, b) = {a}. 
(vii) UU(a,b)=aUb 
(viii) NU(a,b) 2 an b. 
(ix) Uf Ya, b) =a. 
(x) MM, b) =a. 
(xi) (NMUla, b)) U ((UU(Ga 5) ) \ Ufa 0) ) =b 


PROOF: For part (i), let a be a set. Then ⋃{a} = {x; ∃y ∈ {a}, x ∈ y} = {x; ∃y, (y ∈ {a} ∧ x ∈ y)}. But
y ∈ {a} if and only if y = a. So ⋃{a} = {x; ∃y, (y = a ∧ x ∈ y)} = {x; x ∈ a} = a by Theorem 7.7.15 (i).

For part (ii), let a be a set. Then ⋂{a} = {x; ∀y ∈ {a}, x ∈ y} = {x; ∀y, (y ∈ {a} ⇒ x ∈ y)}. But y ∈ {a}
if and only if y = a. So ⋂{a} = {x; ∀y, (y = a ⇒ x ∈ y)} = {x; x ∈ a} = a by Theorem 7.7.15 (i).

For part (iii), let a and b be sets. Then
⋃{a, b} = {x; ∃y ∈ {a, b}, x ∈ y}
= {x; ∃y, (y ∈ {a, b} ∧ x ∈ y)}
= {x; ∃y, ((y = a ∨ y = b) ∧ x ∈ y)}
= {x; ∃y, ((y = a ∧ x ∈ y) ∨ (y = b ∧ x ∈ y))}
= {x; x ∈ a ∨ x ∈ b} = a ∪ b.

For part (iv), let a and b be sets. Then
⋂{a, b} = {x; ∀y ∈ {a, b}, x ∈ y}
= {x; ∀y, (y ∈ {a, b} ⇒ x ∈ y)}
= {x; ∀y, ((y = a ∨ y = b) ⇒ x ∈ y)}
= {x; ∀y, ((y = a ⇒ x ∈ y) ∧ (y = b ⇒ x ∈ y))}
= {x; x ∈ a ∧ x ∈ b} = a ∩ b.

For part (v), let a and b be sets. Then ⋃(a, b) = ⋃{{a}, {a, b}} = {x; x ∈ {a} ∨ x ∈ {a, b}}
= {x; (x = a) ∨ (x = a ∨ x = b)} = {x; x = a ∨ x = b} = {a, b}.

For part (vi), let a and b be sets. Then ⋂(a, b) = ⋂{{a}, {a, b}} = {x; x ∈ {a} ∧ x ∈ {a, b}}
= {x; (x = a) ∧ (x = a ∨ x = b)} = {x; x = a} = {a}.

For part (vii), let a and b be sets. Then ⋃⋃(a, b) = ⋃{a, b} = a ∪ b by parts (v) and (iii).

For part (viii), let a and b be sets. Then ⋂⋃(a, b) = ⋂{a, b} = a ∩ b by parts (v) and (iv).

For part (ix), let a and b be sets. Then ⋃⋂(a, b) = ⋃{a} = a by parts (vi) and (i).

For part (x), let a and b be sets. Then ⋂⋂(a, b) = ⋂{a} = a by parts (vi) and (ii).

For part (xi), let a and b be sets. Then (⋂⋃(a, b)) ∪ ((⋃⋃(a, b)) \ ⋃⋂(a, b)) = (a ∩ b) ∪ ((a ∪ b) \ a) = b
by parts (viii), (vii) and (ix) and Theorem 8.2.5 (xx).
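The extraction formulas of Theorem 9.2.9 can be checked mechanically on small hereditarily finite sets. The following is only a sketch, assuming that Python frozensets stand in for ZF sets; the helper names big_union, big_intersection, pair, left and right are illustrative and are not notation from this book.

```python
def big_union(z):
    """Return the union of all elements of z (each element is a frozenset)."""
    return frozenset().union(*z)


def big_intersection(z):
    """Return the intersection of all elements of z; an empty z is rejected."""
    if not z:
        raise ValueError("the intersection of the empty set is not a ZF set")
    return frozenset.intersection(*z)


def pair(a, b):
    """Ordered pair (a, b) represented as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def left(p):
    """Part (x): the left element is the double intersection of p."""
    return big_intersection(big_intersection(p))


def right(p):
    """Part (xi): (meet of union) joined with (union of union minus union of meet)."""
    u = big_union(p)
    return big_intersection(u) | (big_union(u) - big_union(big_intersection(p)))


# Three small sets whose elements are themselves sets.
e0 = frozenset()
e1 = frozenset({e0})
e2 = frozenset({e0, e1})

for a in (e0, e1, e2):
    for b in (e0, e1, e2):
        p = pair(a, b)
        assert big_union(p) == frozenset({a, b})       # part (v)
        assert big_intersection(p) == frozenset({a})   # part (vi)
        assert left(p) == a and right(p) == b          # parts (x) and (xi)
```

The loop includes the case a = b, where the pair collapses to {{a}} and the formulas still return a for both components.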


9.2.10 REMARK: Unnaturalness of left and right element extraction from ordered pairs. 

The left and right element extraction operations for ordered pairs in Theorem 9.2.9 parts (ix), (x) and (xi) are 
quite unnatural. Certainly an expression such as “(⋂⋃z) ∪ ((⋃⋃z) \ ⋃⋂z)” does not prompt a spontaneous
association with the idea of “the second element of the ordered pair z”. However, this representation for 
ordered pairs has been adopted as the de-facto standard, which has the advantage that everyone suffers the 
same inconvenience. Theorem 9.2.11 expands the expressions for a and b in Theorem 9.2.9 as (very slightly 
abbreviated) set-theoretic formulas. 


9.2.11 THEOREM: Some explicit formulas for the left and right elements of an ordered pair. 
Let P = (a,b) be an ordered pair. 


(i) a = {x; ∃y, (x ∈ y ∧ ∀z ∈ P, y ∈ z)} = {x; ∀y, (x ∈ y ∨ ∃z ∈ P, y ∉ z)}.
(ii) a ∪ b = {x; ∃y, (x ∈ y ∧ ∃z ∈ P, y ∈ z)}.
(iii) a ∩ b = {x; ∀y, (x ∈ y ∨ ∀z ∈ P, y ∉ z)}.
(iv) b = {x; (∀y, (x ∈ y ∨ ∀z ∈ P, y ∉ z)) ∨ ((∃y, (x ∈ y ∧ ∃z ∈ P, y ∈ z)) ∧ ¬(∃y, (x ∈ y ∧ ∀z ∈ P, y ∈ z)))}.

PROOF: For part (i), it follows from Theorem 9.2.9 (ix) that

x ∈ a ⇔ x ∈ ⋃⋂P
⇔ ∃y ∈ ⋂P, x ∈ y
⇔ ∃y, (y ∈ ⋂P ∧ x ∈ y)
⇔ ∃y, (x ∈ y ∧ ∀z ∈ P, y ∈ z).

Similarly, it follows from Theorem 9.2.9 (x) that

x ∈ a ⇔ x ∈ ⋂⋂P
⇔ ∀y ∈ ⋂P, x ∈ y
⇔ ∀y, (y ∈ ⋂P ⇒ x ∈ y)
⇔ ∀y, (x ∈ y ∨ ∃z ∈ P, y ∉ z).

For part (ii), it follows from Theorem 9.2.9 (vii) that

x ∈ a ∪ b ⇔ x ∈ ⋃⋃P
⇔ ∃y ∈ ⋃P, x ∈ y
⇔ ∃y, (y ∈ ⋃P ∧ x ∈ y)
⇔ ∃y, (x ∈ y ∧ ∃z ∈ P, y ∈ z).

For part (iii), it follows from Theorem 9.2.9 (viii) that

x ∈ a ∩ b ⇔ x ∈ ⋂⋃P
⇔ ∀y ∈ ⋃P, x ∈ y
⇔ ∀y, (y ∈ ⋃P ⇒ x ∈ y)
⇔ ∀y, (x ∈ y ∨ ∀z ∈ P, y ∉ z).

Part (iv) follows by substituting the formulas for a, a ∪ b and a ∩ b in parts (i), (ii) and (iii) into the identity
b = (a ∩ b) ∪ ((a ∪ b) \ a) given by Theorem 8.2.5 (xx).


9.2.12 REMARK: Left and right element extraction operations for ordered pairs. 
It is occasionally convenient to encapsulate the expressions in Theorems 9.2.9 (x, xi) and 9.2.11 (i, iv) as the 
left and right element operators “Left” and “Right” in Definition 9.2.13 and Notation 9.2.14. 


9.2.13 DEFINITION: The left element of an ordered pair of sets p is the set ⋂⋂p.
The right element of an ordered pair of sets p is the set (⋂⋃p) ∪ ((⋃⋃p) \ ⋃⋂p).


9.2.14 NOTATION: Left(p), for an ordered pair of sets p, denotes the left element of p. 


Right(p), for an ordered pair of sets p, denotes the right element of p. 


9.2.15 REMARK: How to test a set to verify that it is an ordered pair. 

Before one applies the left and right element extraction operations, one must first test the given object z 
to determine that it is in fact an ordered pair. Otherwise the outcome of such operations will be incorrect. 
First one must establish that z ≠ ∅ because ⋂∅ is the class of all sets, which is not itself a Zermelo-Fraenkel
set. If z = {∅}, then ⋂⋂z = ⋂∅ and ⋂⋃z = ⋂∅, which are both similarly undefined. Similarly, if the
elements of z are disjoint, then ⋂⋂z = ⋂∅, which is undefined. Otherwise, the operations in Theorem 9.2.9
parts (ix), (x) and (xi) all yield a well-defined set. So assuming that z ≠ ∅, and then ⋂z ≠ ∅, one may
proceed to test whether z is an ordered pair.

The next obvious test to apply to a set z is a determination that it has one and only one element in its
intersection. In other words, one must verify the proposition ∃′x, x ∈ ⋂z, which is really two tests, namely
∃x, x ∈ ⋂z and ∀x, ∀y, ((x ∈ ⋂z ∧ y ∈ ⋂z) ⇒ x = y).

Then one must verify that ⋃z has at most two elements. In other words, one must verify the proposition
∃²x, x ∈ ⋃z. (See Remark 6.8.8 for “multiplicity quantifiers”.) This test may be written as:

∀x₁, ∀x₂, ∀x₃, ((x₁ ∈ ⋃z ∧ x₂ ∈ ⋃z ∧ x₃ ∈ ⋃z) ⇒ (x₁ = x₂ ∨ x₂ = x₃ ∨ x₁ = x₃)).

Passing all of these tests excludes the possibility that z could have the form {{a}, {a, b}, {b}} with a ≠ b
because this would have an empty intersection although its union would equal {a, b}.

After a set has passed all of these tests, one may safely apply the left and right element extraction operations 
in Theorem 9.2.9 parts (ix), (x) and (xi). This strengthens the impression mentioned in Remark 9.2.10 that 
the standard representation for ordered pairs is unnatural. 


Another way of testing a given set z to verify that it is an ordered pair is by way of the proposition
∃a, ∃b, z = {{a}, {a, b}}. This may be written out more fully as ∃a, ∃b, ∀c, (c ∈ z ⇔ (c = {a} ∨ c = {a, b})),
which may be written out even more fully as

∃a, ∃b, ∀c, (c ∈ z ⇔ ((∀x, (x ∈ c ⇔ x = a)) ∨ (∀x, (x ∈ c ⇔ (x = a ∨ x = b))))).
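The two ways of testing for an ordered pair can be compared on small examples. This is a sketch under the assumption that the given object z is a Python frozenset whose elements are frozensets; the function names are hypothetical and are not notation from this book.

```python
def big_union(z):
    """Return the union of all elements of z (each element is a frozenset)."""
    return frozenset().union(*z)


def is_ordered_pair_by_tests(z):
    """Successive tests: z is non-empty, its intersection has exactly one
    element, and its union has at most two elements."""
    if not z:
        return False  # the intersection of the empty set is not a set
    meet = frozenset.intersection(*z)
    if len(meet) != 1:
        return False
    return len(big_union(z)) <= 2


def is_ordered_pair_by_form(z):
    """Direct test that z = {{a}, {a, b}} for some a and b.
    Any such a and b lie in the union of z, so the search is restricted to it."""
    candidates = big_union(z)
    return any(
        z == frozenset({frozenset({a}), frozenset({a, b})})
        for a in candidates
        for b in candidates
    )


e0 = frozenset()
e1 = frozenset({e0})

samples = [
    frozenset(),                                                  # the empty set
    frozenset({e0}),                                              # {0}
    frozenset({frozenset({e0})}),                                 # (a, a)
    frozenset({frozenset({e0}), frozenset({e0, e1})}),            # (a, b)
    frozenset({frozenset({e0, e1})}),                             # {{a, b}}
    frozenset({frozenset({e0}), frozenset({e0, e1}), frozenset({e1})}),
]

for z in samples:
    assert is_ordered_pair_by_tests(z) == is_ordered_pair_by_form(z)
```

On these samples the two tests agree: only the third and fourth sets are accepted.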


9.2.16 REMARK: Ordered pairs are equal if and only if both components are equal. 

Theorem 9.2.17 asserts the property of ordered pairs which motivates the representation of ordered pairs in 
Definition 9.2.2. Any alternative representation which has the same property would be equally acceptable, 
but the representation in Definition 9.2.2 seems to be universally accepted. (See also proofs of Theorem 9.2.17 
by Halmos [357], page 23 and E. Mendelson [370], page 162.) 


9.2.17 THEOREM:  Element-wise equivalent condition for equality of ordered pairs. 
Ordered pairs satisfy ∀a, ∀b, ∀c, ∀d, ((a, b) = (c, d) ⇔ ((a = c) ∧ (b = d))).

PROOF: This proof uses the idea that two finite sets that are equal must have the same number of elements.
Suppose (a, b) = (c, d). By Definition 9.2.2, (a, b) = {{a}, {a, b}} and (c, d) = {{c}, {c, d}}. So {a} ∈ (a, b).
Therefore {a} ∈ (c, d). So by definition of {{c}, {c, d}}, either {a} = {c} or {a} = {c, d}. If {a} = {c},
then a = c. If {a} = {c, d}, then a = c = d. In either case, a = c.

To show that b = d, first suppose that a = b. Then (a, b) = {{a}}. Therefore (c, d) = {{c}}. So c = d.
Therefore b = d.

Now suppose that a ≠ b. Then {a, b} ∈ (c, d) since {a, b} ∈ (a, b). So {a, b} = {c} or {a, b} = {c, d}.
Therefore {a, b} = {c, d} since a ≠ b. Hence c ≠ d. So b = c or b = d. But a = c and a ≠ b. So b = d.

The converse implication is immediate by substitution.


9.3. Ordered tuples 


9.3.1 REMARK: Low-level and high-level representations of ordered pairs.

In Section 14.6, ordered n-tuples are defined for any non-negative integer n ∈ ℤ₀⁺. These are defined as
functions from ℕₙ = {1, 2, ... n} to a fixed set X. When n = 2, this gives an alternative representation
for ordered pairs. To avoid confusion and circularity, it is important to distinguish between ordered pairs
and ordered 2-tuples. The former are defined in Definition 9.2.2 whereas the latter are represented as
functions, which cannot be defined until ordered pairs have been defined. In practice, the n-tuple function
representation is most often used because this is uniform and convenient. To avoid circularity, a
“boot-strapping” procedure is followed. First, low-level ordered pairs are defined as in Definition 9.2.2. Then
general functions are defined in terms of these low-level ordered pairs. Then general n-tuples are defined in
terms of functions.

It is possible to define ordered triples and quadruples directly in terms of ordered pairs as in Definition 9.3.3 
without first defining functions. These are quite inelegant, however, and are not the usual way of defining 
general ordered tuples. 


9.3.2 REMARK: Low-level definitions of ordered triples and quadruples.

Ordered triples, quadruples, quintuples and higher-order tuples are defined in Definition 14.6.5 in terms of 
functions (which are in turn defined in terms of ordered pairs in Section 10.2). An alternative low-level 
construction for ordered triples and quadruples is given in Definition 9.3.3. (This construction is described 
by Shoenfield [390], page 244.)


9.3.3 DEFINITION: An ordered triple is a set of the form (a, (b, c)) for any three sets a, b and c. This is
denoted as (a, b, c).


An ordered quadruple is a set of the form (a,(b,c,d)) for any four sets a, b, c and d. This is denoted 
as (a, b, c, d). 


9.3.4 REMARK: Low-level definitions of ordered tuples.

By naive induction, Definition 9.3.3 may be extended to any order of tuple by the rule (a₁, a₂, ... aₙ₊₁) =
(a₁, (a₂, ... aₙ, aₙ₊₁)). To make the induction complete, it is convenient to define a 1-tuple as (a) = a for
any set a. (This inductive rule cannot include the definition of a 0-tuple.)
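The inductive rule can be written as a short recursion. This is a minimal sketch, assuming Python frozensets as a stand-in for sets; the names pair and low_level_tuple are hypothetical.

```python
def pair(a, b):
    """Ordered pair (a, b) represented as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def low_level_tuple(*items):
    """Apply (a1) = a1 and (a1, a2, ... an+1) = (a1, (a2, ... an+1))."""
    if not items:
        raise ValueError("the inductive rule does not define a 0-tuple")
    if len(items) == 1:
        return items[0]
    return pair(items[0], low_level_tuple(*items[1:]))


e0 = frozenset()
e1 = frozenset({e0})
e2 = frozenset({e0, e1})

# The rule reproduces the ordered pair, triple and quadruple constructions.
assert low_level_tuple(e0, e1) == pair(e0, e1)
assert low_level_tuple(e0, e1, e2) == pair(e0, pair(e1, e2))
assert low_level_tuple(e2, e0, e1, e2) == pair(e2, pair(e0, pair(e1, e2)))
```

The base case returns the single component unchanged, which is what allows the 2-tuple case to coincide with the ordered pair.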


9.3.5 REMARK: Unsuitability of low-level representations of tuples.
The set constructions for ordered triples and quadruples in Definition 9.3.3 are unattractive when expanded.

(a, (b, c)) = {{a}, {a, (b, c)}}
= {{a}, {a, {{b}, {b, c}}}}
(a, (b, c, d)) = {{a}, {a, (b, c, d)}}
= {{a}, {a, {{b}, {b, {{c}, {c, d}}}}}}

The corresponding constructions for a different permutation of the components are even less attractive. (This
alternative representation for ordered triples and quadruples is described by E. Mendelson [370], page 162.)

((a, b), c) = {{(a, b)}, {(a, b), c}}
= {{{{a}, {a, b}}}, {{{a}, {a, b}}, c}}
((a, b, c), d) = {{((a, b), c)}, {((a, b), c), d}}
= {{{{{{a}, {a, b}}}, {{{a}, {a, b}}, c}}}, {{{{{a}, {a, b}}}, {{{a}, {a, b}}, c}}, d}}.

On the other hand, this is not very much more unattractive than the standard representation for the ordinal
numbers. (See Remark 12.2.1.) The formulas for extracting the components of these ordered triples and
quadruples are even more unattractive. This is a further motivation for not using this representation.


It would seem reasonable to use a representation {{a}, {a, b}, {a, b, c}} for the ordered triple (a, b, c), and a
similar expression for ordered quadruples etc. However, when two of the components of such a triple are
equal, the construction is ambiguous.

{{a}, {a, b}, {a, b, c}} = {{a}, {a, c}} = (a, c) = (b, c) if a = b
{{a}, {a, b}, {a, b, c}} = {{a}, {a, b}} = (a, b) = (a, c) if b = c
{{a}, {a, b}, {a, b, c}} = {{a}, {a, b}} = (a, b) = (c, b) if a = c.

This would imply that (a, a, c) = (a, c, c) and (a, b, a) = (a, b, b), and so forth. Therefore it is best to abandon
such low-level constructions for ordered tuples except in the case of ordered pairs, which are required to
boot-strap the definitions of Cartesian products, relations and functions.
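The ambiguity is easy to exhibit concretely. The sketch below assumes Python frozensets as a stand-in for sets; naive_triple and nested_triple are hypothetical names for the tempting representation {{a}, {a, b}, {a, b, c}} and for the nested-pair representation (a, (b, c)) respectively.

```python
def pair(a, b):
    """Ordered pair (a, b) represented as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def naive_triple(a, b, c):
    """The tempting representation {{a}, {a, b}, {a, b, c}}."""
    return frozenset({frozenset({a}), frozenset({a, b}), frozenset({a, b, c})})


def nested_triple(a, b, c):
    """The nested-pair representation (a, (b, c))."""
    return pair(a, pair(b, c))


x = frozenset()       # two distinct sets
y = frozenset({x})

# Three different triples collapse to the same set {{x}, {x, y}}.
assert naive_triple(x, x, y) == naive_triple(x, y, y) == naive_triple(x, y, x)
assert naive_triple(x, x, y) == pair(x, y)

# The nested-pair representation keeps them apart.
assert len({nested_triple(x, x, y), nested_triple(x, y, y), nested_triple(x, y, x)}) == 3
```

The collapse happens because a set cannot record repeated elements, so {x, x, y}, {x, y, y} and {x, y, x} are all the same set {x, y}.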


9.4. Cartesian product of a pair of sets 


9.4.1 REMARK: Representation of Cartesian products of sets in terms of ordered pairs. 

The Cartesian product of a pair of sets may be defined in terms of ordered pairs. The Cartesian product of
more than two sets is generally defined in terms of functions and integers, which are introduced in Chapters 10
and 14 respectively. Therefore Section 9.4 is concerned only with the Cartesian product of two sets.


9.4.2 DEFINITION: The Cartesian product of two sets A and B is the set {(a, b); a ∈ A, b ∈ B} of all
ordered pairs (a, b) such that a ∈ A and b ∈ B.


9.4.3 NOTATION: A × B denotes the Cartesian product of sets A and B.

Figure 9.4.1 Cartesian product of sets


9.4.4 REMARK: Diagram of the Cartesian product of two sets. 
The Cartesian product of two sets A and B may be visualised as in Figure 9.4.1. 


Cartesian products may be thought of as “rectangular sets”, like plain blocks of wood from the timber store. 
Cartesian products are less valuable in themselves than when they have been sawn, planed and sanded to 
the desired shape, and then nailed, screwed or glued to other components of a mathematical system. 


9.4.5 REMARK: Well-definition of Cartesian products. 

The Cartesian product A × B is well defined because it is a subset of ℙ(ℙ(A ∪ B)). To see this, note that
{a} and {a, b} are elements of ℙ(A ∪ B) so that (a, b) ∈ ℙ(ℙ(A ∪ B)) and A × B ∈ ℙ(ℙ(ℙ(A ∪ B))). Then
A × B = {p ∈ ℙ(ℙ(A ∪ B)); ∃a ∈ A, ∃b ∈ B, p = (a, b)}. Thus the well-definition of the ZF set A × B
follows from Axioms (3), (4), (5) and (6).
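The containment of A × B in the double power set of A ∪ B can be checked by brute force for small sets. This is a sketch only, assuming Python frozensets for sets and integers as convenient atoms for the elements of A and B; power_set, pair and cartesian_product are hypothetical helper names.

```python
from itertools import chain, combinations


def power_set(s):
    """Return the set of all subsets of s."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(c) for c in subsets)


def pair(a, b):
    """Ordered pair (a, b) represented as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def cartesian_product(A, B):
    """The set of all ordered pairs (a, b) with a in A and b in B."""
    return frozenset(pair(a, b) for a in A for b in B)


A = frozenset({1, 2})
B = frozenset({2, 3})
pp = power_set(power_set(A | B))   # 2**(2**3) = 256 elements
product = cartesian_product(A, B)

# Every ordered pair is an element of the double power set.
assert product.issubset(pp)

# The product is recovered by specification from the double power set.
specified = frozenset(
    p for p in pp if any(p == pair(a, b) for a in A for b in B)
)
assert specified == product
```

The second assertion mirrors the specification-style formula for A × B in the remark.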


9.4.6 THEOREM: Some basic properties of Cartesian set-products. 
Let A, B, C and D be sets. 
(i) (A × B = ∅) ⇔ ((A = ∅) ∨ (B = ∅)).
(ii) If A ⊆ C and B ⊆ D, then A × B ⊆ C × D.
(iii) If A ≠ ∅ and B ≠ ∅, then (A × B ⊆ C × D) ⇔ ((A ⊆ C) ∧ (B ⊆ D)).
(iv) (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).
(v) If A ∩ C = ∅ or B ∩ D = ∅, then (A × B) ∩ (C × D) = ∅.
(vi) (A × C) ∩ (B × C) = (A ∩ B) × C and (C × A) ∩ (C × B) = C × (A ∩ B).
(vii) (A × C) ∪ (B × C) = (A ∪ B) × C and (C × A) ∪ (C × B) = C × (A ∪ B).
(viii) (A × B) ∪ (C × D) ⊆ (A ∪ C) × (B ∪ D).

PROOF: For part (i), note that A × B = {(a, b); a ∈ A, b ∈ B} by Definition 9.4.2. It follows that
x ∈ A × B ⇔ ∃a, ∃b, (x = (a, b) ∧ a ∈ A ∧ b ∈ B). Therefore

A × B = ∅ ⇔ ∀x, x ∉ A × B
⇔ ∀x, ∀a, ∀b, (x ≠ (a, b) ∨ a ∉ A ∨ b ∉ B)
⇔ ∀x, ∀a, ∀b, (x = (a, b) ⇒ (a ∉ A ∨ b ∉ B))
⇔ ∀a, ∀b, (a ∉ A ∨ b ∉ B)
⇔ ∀a, (a ∉ A ∨ (∀b, b ∉ B)) (9.4.1)
⇔ (∀a, a ∉ A) ∨ (∀b, b ∉ B) (9.4.2)
⇔ (A = ∅) ∨ (B = ∅).

Lines (9.4.1) and (9.4.2) follow from Theorem 6.6.16 (ix).


For part (ii), suppose that A ⊆ C and B ⊆ D. Let x ∈ A × B. Then x = (a, b) for some a ∈ A and b ∈ B
by Definition 9.4.2. So a ∈ C and b ∈ D by Definition 7.3.2. Therefore (a, b) ∈ C × D by Definition 9.4.2.
Hence A × B ⊆ C × D by Definition 7.3.2.

For part (iii), let A ≠ ∅ and B ≠ ∅. Suppose that A × B ⊆ C × D. Let a ∈ A. Then (a, b) ∈ A × B for some b ∈ B
because B ≠ ∅. So (a, b) ∈ C × D. Therefore (a, b) = (c, d) for some c ∈ C and d ∈ D by Definition 9.4.2.
So a = c and b = d by Theorem 9.2.17. So a ∈ C. Thus A ⊆ C. Similarly B ⊆ D. The converse implication
follows from part (ii).

For part (iv), (x, y) ∈ (A × B) ∩ (C × D) if and only if (x ∈ A and y ∈ B) and (x ∈ C and y ∈ D), if and
only if (x ∈ A and x ∈ C) and (y ∈ B and y ∈ D), if and only if x ∈ A ∩ C and y ∈ B ∩ D, if and only if
(x, y) ∈ (A ∩ C) × (B ∩ D). Hence (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).

Part (v) follows from parts (i) and (iv). 

Part (vi) follows from part (iv). 

For part (vii), (x, y) ∈ (A × C) ∪ (B × C) if and only if (x ∈ A and y ∈ C) or (x ∈ B and y ∈ C), if and
only if (x ∈ A or x ∈ B) and y ∈ C, if and only if x ∈ A ∪ B and y ∈ C, if and only if (x, y) ∈ (A ∪ B) × C.
Hence (A × C) ∪ (B × C) = (A ∪ B) × C. The second equality may be proved in the same way.

For part (viii), note that

(x, y) ∈ (A × B) ∪ (C × D) ⇔ (x ∈ A ∧ y ∈ B) ∨ (x ∈ C ∧ y ∈ D)
⇔ ((x ∈ A ∧ y ∈ B) ∨ x ∈ C) ∧ ((x ∈ A ∧ y ∈ B) ∨ y ∈ D)
⇔ (x ∈ A ∨ x ∈ C) ∧ (y ∈ B ∨ x ∈ C) ∧ (x ∈ A ∨ y ∈ D) ∧ (y ∈ B ∨ y ∈ D)
⇒ (x ∈ A ∨ x ∈ C) ∧ (y ∈ B ∨ y ∈ D)
⇔ (x, y) ∈ (A ∪ C) × (B ∪ D).

Hence (A × B) ∪ (C × D) ⊆ (A ∪ C) × (B ∪ D).
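Several parts of Theorem 9.4.6 can be spot-checked on a small example, which also shows that the inclusion in part (viii) may be strict. This sketch assumes that Python tuples stand in for ordered pairs and Python sets for sets; the function name cart is hypothetical.

```python
from itertools import product


def cart(A, B):
    """Cartesian product of A and B as a Python set of tuples."""
    return set(product(A, B))


A, B, C, D = {1, 2}, {3, 4}, {2, 5}, {4, 6}

# Part (i): a product with an empty factor is empty.
assert cart(A, set()) == set() and cart(set(), B) == set()

# Part (iv): intersections distribute component-wise.
assert cart(A, B) & cart(C, D) == cart(A & C, B & D)

# Part (vii): unions with a common factor.
assert cart(A, C) | cart(B, C) == cart(A | B, C)
assert cart(C, A) | cart(C, B) == cart(C, A | B)

# Part (viii): only an inclusion holds in general.
union_of_products = cart(A, B) | cart(C, D)
product_of_unions = cart(A | C, B | D)
assert union_of_products.issubset(product_of_unions)
assert (1, 6) in product_of_unions - union_of_products
```

The pair (1, 6) takes its first component from A and its second from D, so it lies in neither A × B nor C × D. This corresponds to the one-way implication in the proof of part (viii).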


9.4.7 THEOREM: Properties of unions and intersections of collections of set-products. 
Let C₁ and C₂ be (collections of) sets.

(i) ⋃_{A∈C₁} (A × B) = (⋃C₁) × B.
(ii) ⋂_{A∈C₁} (A × B) = (⋂C₁) × B if C₁ ≠ ∅.
(iii) ⋃_{B∈C₂} (A × B) = A × (⋃C₂).
(iv) ⋂_{B∈C₂} (A × B) = A × (⋂C₂) if C₂ ≠ ∅.
(v) ⋃_{A∈C₁} ⋃_{B∈C₂} (A × B) = (⋃C₁) × (⋃C₂).
(vi) ⋂_{A∈C₁} ⋃_{B∈C₂} (A × B) = (⋂C₁) × (⋃C₂) if C₁ ≠ ∅.
(vii) ⋃_{A∈C₁} ⋂_{B∈C₂} (A × B) = (⋃C₁) × (⋂C₂) if C₂ ≠ ∅.
(viii) ⋂_{A∈C₁} ⋂_{B∈C₂} (A × B) = (⋂C₁) × (⋂C₂) if C₁ ≠ ∅ and C₂ ≠ ∅.
PROOF: For part (i),

∀a, ∀b, (a, b) ∈ ⋃_{A∈C₁} (A × B) ⇔ ∃A ∈ C₁, (a, b) ∈ A × B
⇔ ∃A ∈ C₁, (a ∈ A ∧ b ∈ B)
⇔ (∃A ∈ C₁, a ∈ A) ∧ b ∈ B (9.4.3)
⇔ (a ∈ ⋃_{A∈C₁} A) ∧ b ∈ B
⇔ (a ∈ ⋃C₁) ∧ b ∈ B
⇔ (a, b) ∈ (⋃C₁) × B,

where line (9.4.3) follows from Theorem 6.6.16 (iii). Hence ⋃_{A∈C₁} (A × B) = (⋃C₁) × B.
Parts (ii), (iii) and (iv) may be proved as for part (i).
Part (v) follows from parts (i) and (iii).
Part (vi) follows from parts (ii) and (iii).
Part (vii) follows from parts (i) and (iv).
Part (viii) follows from parts (ii) and (iv).


9.4.8 REMARK: Projection maps for binary Cartesian products.
Since the definitions for projection maps for binary Cartesian products require the concept of a function, 
this important construction for Cartesian set-products is delayed until Section 10.12. 


9.4.9 REMARK: Emphatic version of the Cartesian set-product notation. 

The product symbol “×” in Notation 9.4.3 is very much over-used in mathematics. It can mean a plain
Cartesian set-product as in Definition 9.4.2, a relation product as in Definition 9.7.2, a function product 
as in Definitions 10.14.3 and 10.15.2, a group product as in Definition 17.7.14, a topological space product 
as in Definition 32.9.4, or a differentiable manifold product as in Definition 52.6.2. Most of the time, the 
product construction which is intended is clear from the object classes of the component spaces, but there are 
also situations where the class-specific interpretation must be overridden. (See for example Remark 52.6.6.) 
Notation 9.4.10 is provided to override all such class-specific interpretations. 


9.4.10 NOTATION: Strict notation for binary Cartesian set products. 
A x? B denotes the Cartesian product of sets A and B. 


9.5. Relations 


9.5.1 REMARK: The definition of a relation between two sets. 

Remark 9.1.1 mentions the difference between a membership-predicate relation (defined by a logical predicate 
expression or rule) and an ordered-pair-set relation (defined as a set of ordered pairs of related objects). In 
this terminology, Definition 9.5.2 defines an ordered-pair-set relation. Each pair (a, b) in a relation defines a 
directed association between the components a and b. 


9.5.2 DEFINITION: A relation is a set of ordered pairs. 
A relation between a set A and a set B is a subset of A × B.

A relation from a set A to a set B is a subset of A × B.

A relation on a set A is a subset of A × A.
A pair (a, b) is said to satisfy the relation R if (a, b) ∈ R.


9.5.3 EXAMPLE: Some relations which are defined for all sets. 

The following sets are relations from X to Y for any sets X and Y.

(1) ∅. The empty relation. Nothing is related to anything else.
(2) X × Y. Every element of X is related to every element of Y.
(3) {(x, y) ∈ X × Y; x = y}. The equality relation.
(4) {(x, y) ∈ X × Y; x ≠ y}. The inequality relation.
(5) {(x, y) ∈ X × Y; x ∈ y}. The containment or “element-of” relation.
(6) {(x, y) ∈ X × Y; x ⊆ y}. The set-inclusion relation.

The following sets are relations from X to Y for any sets X, Y, A and B.

(7) {(x, y) ∈ X × Y; x ∈ A}. This is the same as (X ∩ A) × Y.
(8) {(x, y) ∈ X × Y; x ∉ A}. This is the same as (X \ A) × Y.
(9) {(x, y) ∈ X × Y; y ∈ B}. This is the same as X × (Y ∩ B).
(10) {(x, y) ∈ X × Y; y ∉ B}. This is the same as X × (Y \ B).

The set {(x, y) ∈ X × Y; P(x, y)} is a relation from X to Y for any sets X and Y and binary predicate P.

The relations “=”, “≠”, “∈”, “⊆” in the above list are set-theoretic relations. The relations “=” and “∈” are
primary relations in ZF set theory, whereas “≠” and “⊆” are composed from two or more primary relations.
(Thus A ≠ B means ¬(A = B), and A ⊆ B means ∀x, (x ∈ A ⇒ x ∈ B) for example.) These set-theoretic
relations must be distinguished from the corresponding ordered-pair-set relations {(x, y) ∈ X × Y; x = y},
{(x, y) ∈ X × Y; x ≠ y}, {(x, y) ∈ X × Y; x ∈ y} and {(x, y) ∈ X × Y; x ⊆ y}. Usually the context makes
clear which kind of relation is meant.


Set-theoretic relations typically apply to all sets without restriction, whereas ordered-pair-set relations are 
generally defined only within a specified Cartesian product of sets.


9.5.4 DEFINITION: The domain of a relation R is the set {a; ∃b, (a, b) ∈ R}.
The range (or the image or the codomain) of a relation R is the set {b; ∃a, (a, b) ∈ R}.


9.5.5 REMARK: Terminology for the domain and range of relations. 

The word “image” is used both for the overall range set {b; ∃a, (a, b) ∈ R} of a relation R, and also for
the restriction {b; ∃a ∈ A, (a, b) ∈ R} of the range corresponding to any set A. It would therefore seem
reasonable to use the term “pre-image” for the overall domain of the relation. However, this is not usually
done. The term “pre-image” customarily means the set {a; ∃b ∈ B, (a, b) ∈ R} for some set B, and this set
is called the pre-image of B.


9.5.6 NOTATION: 
Dom(R) denotes the domain of a relation R. 


Range(R) denotes the range of a relation R. 
Img(R) is an alternative notation for the range of a relation R. 


9.5.7 REMARK: The domain and range of a relation are well-defined sets. 

Since Definition 9.5.2 specifies that a relation is a set, it follows that the domain and range in Definition 9.5.4 
are also sets. It is perhaps not immediately clear why this is so. The specification axiom (which follows from 
the ZF replacement axiom, Definition 7.2.4 (6)) implies that any expression of the form {x ∈ X; P(x)} is a
genuine set if X is a set and P is a set-theoretic function. Hence it is sufficient to find suitable X and P to
represent the domain and range of a relation. The set X = ⋃⋃R includes both the domain and range for
any relation R. Theorem 9.5.8 expresses the domain and range of any relation in terms of this set X and 
suitable predicates P. 


9.5.8 THEOREM: Well-definition of relation domain and range by ZF union and specification axioms. 
The domain and range of a relation are sets. For any relation R, 


Dom(R) = {a ∈ ⋃⋃R; ∃b ∈ ⋃⋃R, (a, b) ∈ R}

and

Range(R) = {b ∈ ⋃⋃R; ∃a ∈ ⋃⋃R, (a, b) ∈ R}.


PROOF: Let R be a relation. Suppose that (a, b) ∈ R. By definition of an ordered pair, this means that
{{a}, {a, b}} ∈ R. It follows that {a} ∈ (a, b) for some b. Let y = (a, b). Then {a} ∈ y and y ∈ R for some y.
That is, ∃y, ({a} ∈ y ∧ y ∈ R). By the definition of set union, this is equivalent to {a} ∈ ⋃R. Let z = {a}.
Then a ∈ z and z ∈ ⋃R. So by the definition of set union again, this implies a ∈ ⋃⋃R. This argument
may be summarised as follows.

(a, b) ∈ R ⇔ {{a}, {a, b}} ∈ R
⇒ ∃y, ({a} ∈ y ∧ y ∈ R)
⇔ {a} ∈ ⋃R
⇒ ∃z, (a ∈ z ∧ z ∈ ⋃R)
⇔ a ∈ ⋃⋃R.

An almost identical argument shows that (a, b) ∈ R ⇒ b ∈ ⋃⋃R as follows.

(a, b) ∈ R ⇔ {{a}, {a, b}} ∈ R
⇒ ∃y, ({a, b} ∈ y ∧ y ∈ R)
⇔ {a, b} ∈ ⋃R
⇒ ∃z, (b ∈ z ∧ z ∈ ⋃R)
⇔ b ∈ ⋃⋃R.

By a double application of the ZF union axiom (Definition 7.2.4 (4)), ⋃⋃R is a set. By the replacement
axiom (Definition 7.2.4 (6)), {a ∈ ⋃⋃R; ∃b ∈ ⋃⋃R, (a, b) ∈ R} and {b ∈ ⋃⋃R; ∃a ∈ ⋃⋃R, (a, b) ∈ R}
are both sets also. But since (a, b) ∈ R implies that a ∈ ⋃⋃R and b ∈ ⋃⋃R, these sets are the same
as {a; ∃b, (a, b) ∈ R} and {b; ∃a, (a, b) ∈ R} respectively, which are simply the definitions of Dom(R)
and Range(R). It follows that the domain and range of R are sets for any relation R.
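The bounded formulas for the domain and range can be evaluated directly for a small relation. This is a sketch under the assumption that Python frozensets stand in for sets, with integers used as convenient atoms for the related objects; big_union, pair, domain and range_of are hypothetical names.

```python
def big_union(z):
    """Return the union of all elements of z (each element is a frozenset)."""
    return frozenset().union(*z)


def pair(a, b):
    """Ordered pair (a, b) represented as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def domain(R):
    """All a in the double union of R such that (a, b) is in R for some b there."""
    X = big_union(big_union(R))
    return frozenset(a for a in X if any(pair(a, b) in R for b in X))


def range_of(R):
    """All b in the double union of R such that (a, b) is in R for some a there."""
    X = big_union(big_union(R))
    return frozenset(b for b in X if any(pair(a, b) in R for a in X))


R = frozenset({pair(1, 2), pair(1, 3), pair(4, 4)})

# The double union strips both levels of set-braces from the pairs.
assert big_union(big_union(R)) == frozenset({1, 2, 3, 4})
assert domain(R) == frozenset({1, 4})
assert range_of(R) == frozenset({2, 3, 4})
```

Both searches are confined to the double union of R, so no unbounded quantifier is needed to compute either set.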



9.5.9 REMARK: The method of proof that the domain and range of a relation are well-defined sets. 

The steps in the proof of Theorem 9.5.8 may be thought of as progressively replacing the set-braces “{”
and “}” with the set-union operator “⋃”. The two levels of nesting of a and b in set-braces are replaced
by two levels of set-union operators. This theorem depends very particularly on the representation chosen 
for ordered pairs in Definition 9.2.2. 


9.5.10 THEOREM: Inclusion of a relation within Cartesian products constructed from the relation. 
Let R be a relation. 


(i) R ⊆ (⋃⋃R) × (⋃⋃R).
(ii) R ⊆ Dom(R) × Range(R).

PROOF: Part (i) follows from Theorem 9.5.8.
For part (ii), let x ∈ R. Then x = (a, b) for some a and b by Definition 9.5.2. So a satisfies ∃b, (a, b) ∈ R,
and b satisfies ∃a, (a, b) ∈ R. Therefore a ∈ Dom(R) and b ∈ Range(R) by Definition 9.5.4. So (a, b) ∈
Dom(R) × Range(R) by Definition 9.4.2. Hence R ⊆ Dom(R) × Range(R).


9.5.11 THEOREM: Domain and range of a relation are included in source and target spaces. 
Let R be a relation. 


(i) ∀A, ∀B, (R ⊆ A × B ⇒ Dom(R) ⊆ A).
(ii) ∀A, ∀B, (R ⊆ A × B ⇒ Range(R) ⊆ B).

PROOF: For part (i), let A and B be sets satisfying R ⊆ A × B. Let a ∈ Dom(R). Then (a, b) ∈ R for some
b by Definition 9.5.4. So (a, b) ∈ A × B. Therefore a ∈ A by Definition 9.4.2. Hence Dom(R) ⊆ A.

For part (ii), let A and B be sets satisfying R ⊆ A × B. Let b ∈ Range(R). Then (a, b) ∈ R for some a
by Definition 9.5.4. So (a, b) ∈ A × B. Therefore b ∈ B by Definition 9.4.2. Hence Range(R) ⊆ B.


9.5.12 THEOREM: Some very basic properties of relations and the elements of their ordered pairs. 
Let R be a relation. 


(i) ∀a, ∀b, ((a, b) ∈ R ⇒ a ∈ Dom(R)).
(ii) ∀a, ∀b, ((a, b) ∈ R ⇒ b ∈ Range(R)).

PROOF: For part (i), let (a, b) ∈ R. Then ∃b, (a, b) ∈ R by Definition 6.3.9 (EI). So a ∈ Dom(R) by
Definition 9.5.4 and Notation 9.5.6. The assertion follows.

For part (ii), let (a, b) ∈ R. Then ∃a, (a, b) ∈ R by Definition 6.3.9 (EI). So b ∈ Range(R) by Definition 9.5.4
and Notation 9.5.6. The assertion follows.


9.5.13 THEOREM: Domains and ranges are non-decreasing with respect to relations. 
Let R₁ and R₂ be relations.

(i) R₁ ⊆ R₂ ⇒ Dom(R₁) ⊆ Dom(R₂).
(ii) R₁ ⊆ R₂ ⇒ Range(R₁) ⊆ Range(R₂).

PROOF: For part (i), let R₁ ⊆ R₂. Let x ∈ Dom(R₁). Then (x, y) ∈ R₁ for some y. So (x, y) ∈ R₂ for
some y. Therefore x ∈ Dom(R₂) by Definition 9.5.4. Hence Dom(R₁) ⊆ Dom(R₂).

For part (ii), let R₁ ⊆ R₂. Let y ∈ Range(R₁). Then (x, y) ∈ R₁ for some x. So (x, y) ∈ R₂ for some x.
Therefore y ∈ Range(R₂) by Definition 9.5.4. Hence Range(R₁) ⊆ Range(R₂).


9.5.14 THEOREM: Unions and intersections of relations are relations. 
Let A and B be sets. Let C be a set of relations from A to B. 


(i) ⋃C is a relation from A to B.
(ii) If C ≠ ∅, then ⋂C is a relation from A to B.

PROOF: For part (i), C ∈ ℙ(ℙ(A × B)). So ⋃C ∈ ℙ(A × B) by Theorem 8.5.2 (ix). Therefore ⋃C is a
relation from A to B by Definition 9.5.2.

For part (ii), C ∈ ℙ(ℙ(A × B)). So ⋂C ∈ ℙ(A × B) by Theorem 8.5.2 (xii). Therefore ⋂C is a relation
from A to B by Definition 9.5.2.




9.5.15 REMARK: The image and pre-image of a relation. 
The sets A and B in Definition 9.5.16 are completely arbitrary. In particular, they are not required to be 
subsets of the domain and range respectively of the relation R. 


The notation R⁻¹(B) in Notation 9.5.17 is more or less a “forward reference” to Notation 9.6.14 for the
inverse of a relation. However, to be pedantic, the expression “R⁻¹(B)” means the pre-image of B by R
whereas “(R⁻¹)(B)” means the image of B by R⁻¹. Since both expressions refer to the same set, the
confusion is harmless.


9.5.16 DEFINITION: The image of a set A by a relation R is the set {b; ∃a ∈ A, (a, b) ∈ R}.
The pre-image (or inverse image) of a set B by a relation R is the set {a; ∃b ∈ B, (a, b) ∈ R}.


9.5.17 NOTATION: R(A), for a relation R and set A, denotes the image of A by R.
R⁻¹(B), for a relation R and set B, denotes the pre-image of B by R.


9.5.18 THEOREM: Image and pre-image set-maps are non-decreasing. 
Let R be a relation. Let A and B be sets.

(i) R(A) ⊆ Range(R).
(ii) R⁻¹(A) ⊆ Dom(R).
(iii) If A ⊆ B, then R(A) ⊆ R(B).
(iv) If A ⊆ B, then R⁻¹(A) ⊆ R⁻¹(B).

PROOF: Parts (i) and (ii) follow from Definitions 9.5.4 and 9.5.16.

For part (iii), suppose that A ⊆ B. Let b ∈ R(A). Then ∃a ∈ A, (a, b) ∈ R by Definition 9.5.16. So
∃a ∈ B, (a, b) ∈ R. Therefore b ∈ R(B). Hence R(A) ⊆ R(B).

For part (iv), suppose that A ⊆ B. Let a ∈ R⁻¹(A). Then ∃b ∈ A, (a, b) ∈ R by Definition 9.5.16. So
∃b ∈ B, (a, b) ∈ R. Therefore a ∈ R⁻¹(B). Hence R⁻¹(A) ⊆ R⁻¹(B).
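The image and pre-image set-maps, together with their monotonicity, can be illustrated on a small finite relation. This is a sketch only, assuming that Python tuples stand in for ordered pairs and Python sets for sets; the function names image, preimage, dom and rng are hypothetical.

```python
def dom(R):
    """Domain: all a such that (a, b) is in R for some b."""
    return {a for (a, b) in R}


def rng(R):
    """Range: all b such that (a, b) is in R for some a."""
    return {b for (a, b) in R}


def image(R, A):
    """Image of A by R: all b such that (a, b) is in R for some a in A."""
    return {b for (a, b) in R if a in A}


def preimage(R, B):
    """Pre-image of B by R: all a such that (a, b) is in R for some b in B."""
    return {a for (a, b) in R if b in B}


R = {(1, "x"), (2, "x"), (2, "y"), (3, "z")}

# The argument sets are arbitrary: 99 and "w" lie outside the domain and range.
A_small, A_big = {1}, {1, 2, 99}
B_small, B_big = {"y"}, {"y", "z", "w"}

assert image(R, A_small).issubset(image(R, A_big))        # non-decreasing image
assert image(R, A_big).issubset(rng(R))                   # bounded by the range
assert preimage(R, B_small).issubset(preimage(R, B_big))  # non-decreasing pre-image
assert preimage(R, B_big).issubset(dom(R))                # bounded by the domain
```

Elements of the argument set which are not related to anything simply contribute nothing to the image or pre-image.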


9.5.19 THEOREM: Images, pre-images, domains and ranges of relations. 
Let R be a relation. Let A and B be sets.

(i) R(A) = Range(R ∩ (A × Range(R))).
(ii) R⁻¹(B) = Dom(R ∩ (Dom(R) × B)).
(iii) R(A) = R(A ∩ Dom(R)).
(iv) R⁻¹(B) = R⁻¹(B ∩ Range(R)).

PROOF: For part (i), Range(R ∩ (A × Range(R))) = {b; ∃a, (a, b) ∈ R ∩ (A × Range(R))} by Definition 9.5.4,
and (a, b) ∈ R ∩ (A × Range(R)) if and only if (a, b) ∈ R and (a, b) ∈ A × Range(R). But (a, b) ∈
A × Range(R) if and only if a ∈ A and b ∈ Range(R). Since (a, b) ∈ R implies b ∈ Range(R), it follows
from Theorem 4.7.9 (lxx) that (a, b) ∈ R ∩ (A × Range(R)) if and only if (a, b) ∈ R and a ∈ A. Hence
Range(R ∩ (A × Range(R))) = {b; ∃a ∈ A, (a, b) ∈ R}, which equals R(A) by Definition 9.5.16.

For part (ii), Dom(R ∩ (Dom(R) × B)) = {a; ∃b, (a, b) ∈ R ∩ (Dom(R) × B)} by Definition 9.5.4, and
(a, b) ∈ R ∩ (Dom(R) × B) if and only if (a, b) ∈ R and (a, b) ∈ Dom(R) × B. But (a, b) ∈ Dom(R) × B if and
only if a ∈ Dom(R) and b ∈ B. Since (a, b) ∈ R implies a ∈ Dom(R), it follows from Theorem 4.7.9 (lxx)
that (a, b) ∈ R ∩ (Dom(R) × B) if and only if (a, b) ∈ R and b ∈ B. Hence Dom(R ∩ (Dom(R) × B)) =
{a; ∃b ∈ B, (a, b) ∈ R}, which equals R⁻¹(B) by Definition 9.5.16.

For part (iii), R(A ∩ Dom(R)) ⊆ R(A) by Theorem 9.5.18 (iii). Let b ∈ R(A). Then ∃a ∈ A, (a, b) ∈ R by
Definition 9.5.16. But (a, b) ∈ R implies a ∈ Dom(R) by Theorem 9.5.12 (i). So ∃a ∈ A, (a ∈ Dom(R) and
(a, b) ∈ R). Thus ∃a ∈ A ∩ Dom(R), (a, b) ∈ R. So b ∈ R(A ∩ Dom(R)) by Definition 9.5.16. Therefore
R(A) ⊆ R(A ∩ Dom(R)). Hence R(A) = R(A ∩ Dom(R)).

For part (iv), R⁻¹(B ∩ Range(R)) ⊆ R⁻¹(B) by Theorem 9.5.18 (iv). Let a ∈ R⁻¹(B). Then ∃b ∈ B,
(a, b) ∈ R by Definition 9.5.16. But (a, b) ∈ R implies b ∈ Range(R) by Theorem 9.5.12 (ii). So ∃b ∈ B,
(b ∈ Range(R) and (a, b) ∈ R). Thus ∃b ∈ B ∩ Range(R), (a, b) ∈ R. So a ∈ R⁻¹(B ∩ Range(R)) by
Definition 9.5.16. Therefore R⁻¹(B) ⊆ R⁻¹(B ∩ Range(R)). Hence R⁻¹(B) = R⁻¹(B ∩ Range(R)).


9.5.20 DEFINITION: A source set for a relation R is any set A such that A ⊇ Dom(R).
A target set for a relation R is any set B such that B ⊇ Range(R).


9.5.21 REMARK: Source sets and target sets for relations. 

The terms “source set” and “target set” in Definition 9.5.20 are possibly non-standard. These sets are often 
referred to colloquially as the “domain” and “range” of the relation. If a relation is defined to be merely a 
set R of ordered pairs, as opposed to the tuple (R, X, Y), then the sets X and Y are not uniquely determined 
properties of the set R. These sets are merely part of the context in which the relation is discussed. Thus 
Dom(R) and Range(R) are uniquely determined by R, whereas Source( R) and Target(R) are not well defined 
because they are not uniquely determined. Nevertheless it is very often useful to be able to refer to the source 
and target sets. Therefore they need names which are different to “domain” and “range”. (Other suitable 
names for the source and target sets would be the “origin set” and “destination set” respectively.)


When a relation is defined to be a tuple (R, X, Y), incorporating the source and target sets X and Y into 
the specification, the set of ordered pairs R is called the “graph” of the relation. 


When relations are defined as merely the set of ordered pairs R, the word “graph” is a synonym for the 
relation itself, although the use of the word “graph” then draws attention to the fact that the relation is 
represented as a set of ordered pairs. One generally thinks of sets and relations as being in different classes 
in some sense. (Class tags for sets are discussed in Remark 9.1.4.) According to this perspective, sets merely 
provide a space of parameters for the objects in object classes. Consequently, using the term “graph” is 
supposed to indicate that the set of ordered pairs should be thought of as a member of the Set class rather 
than the Relation class. (Object classes here are only distantly related to the categories of category theory.) 
Thus it is useful to be able to refer to the set of ordered pairs of a relation as a “graph” because this 
signals that one should remove the metamathematical “Relation” tag for the set, and replace it with the
metamathematical “Set” tag.


In defence of the relation triples (R, X, Y), one may argue that in practice, most relations and functions 
are introduced into mathematical contexts by specifying the source and target sets and a rule for effectively 
determining which ordered pairs are to be included. However, this is not always so. A rule may be specified 
for the set of ordered pairs, and it may sometimes be a non-trivial task to determine the domain and range. 
This is more of an issue for functions, which are required to be defined for all elements of the domain. 


9.5.22 REMARK:  Redundant, but useful, definition and notation for the graph of a relation. 

Definition 9.5.23 and Notation 9.5.24 are, strictly speaking, entirely redundant, as is proved by the equally 
redundant Theorem 9.5.25. However, as suggested in Remark 9.5.21, both the definition and the notation 
have some value in changing the perception of a relation from an association between elements of two sets 
to a simple subset of a Cartesian product set. 


9.5.23 DEFINITION: The graph of a relation R is the set {(a,b) ∈ Dom(R) × Range(R); (a,b) ∈ R}.


9.5.24 NOTATION: graph(R), for a relation R, denotes the graph of R. 


9.5.25 THEOREM: Every relation is exactly the same thing as its own graph. 
Let R be a relation. Then graph(R) = R. 


PROOF: Let R be a relation. Then R is a set of ordered pairs by Definition 9.5.2, and it follows from
Notation 9.5.24 and Definition 9.5.23 that graph(R) = {(a,b) ∈ Dom(R) × Range(R); (a,b) ∈ R}. Let
x ∈ graph(R). Then x = (a,b) for some (a,b) ∈ R. So x ∈ R. Therefore graph(R) ⊆ R. Now let x ∈ R.
Then x = (a,b) for some ordered pair (a,b), and so (a,b) ∈ R. Therefore a ∈ Dom(R) and b ∈ Range(R) by
Notation 9.5.6 and Definition 9.5.4. So (a,b) ∈ Dom(R) × Range(R) by Notation 9.4.3 and Definition 9.4.2.
Therefore x ∈ graph(R). So R ⊆ graph(R). Hence graph(R) = R.
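
For finite relations, the redundancy of the graph construction can be checked mechanically. The following
Python sketch is an illustration only, not part of the formal development. It assumes that a relation is
modelled as a Python set of 2-tuples, and the helper names dom, rng and graph are hypothetical.

```python
# A relation is modelled as a Python set of 2-tuples (an assumption of this
# sketch). The helper names dom, rng and graph are hypothetical.

def dom(R):
    """Dom(R): the set of first components of the pairs in R."""
    return {a for (a, _) in R}

def rng(R):
    """Range(R): the set of second components of the pairs in R."""
    return {b for (_, b) in R}

def graph(R):
    """Definition 9.5.23: the pairs of Dom(R) x Range(R) which lie in R."""
    return {(a, b) for a in dom(R) for b in rng(R) if (a, b) in R}

R = {(1, "p"), (1, "q"), (2, "q")}
same = (graph(R) == R)  # the assertion of Theorem 9.5.25 for this R
```

The comparison succeeds because filtering Dom(R) × Range(R) by membership of R cannot add or remove pairs.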


9.5.26 REMARK: Confusion in terminology for the “range” of a relation. 

It is clear from Definitions 9.5.2 and 9.5.4 that if R is a relation between sets A and B, then Dom(R) ⊆ A
and Range(R) ⊆ B.

The word “range” is used in the mathematics literature for two different concepts. Sometimes the range of a
relation R means the set B (in the case of a relation between sets A and B), but equally often it means the
set {b; ∃a, (a,b) ∈ R}. It is the latter definition which is adopted here. This choice of definition is influenced
by the meaning of the English-language word “range”, but it also agrees with Halmos [357], page 27.


9.5.27 REMARK: Interpretation of the meaning of a relation.

A relation-set is not usually thought of as just a set of ordered pairs. Generally a relation will be introduced 
into a mathematical discussion as a “relation between (sets) A and B” or a “relation from (set) A to (set) B”. 
A relation is thought of as associating objects with each other. Thus the ordered pair (a, b) associates object a 
with object b. A relation associates any number of objects in this way. 


Generally the writer or speaker will have in mind specific sets A and B between which the relation defines 
associations. Therefore it is common to define a relation as a subset of a Cartesian product A × B. However,
it is, strictly speaking, unnecessary to specify the source and target sets A and B respectively. The fact that
a relation R is required in Definition 9.5.2 to be a set guarantees that Dom(R) and Range(R) are both sets.
From this it follows that R ⊆ A × B if A and B are chosen as A = Dom(R) and B = Range(R).


9.5.28 REMARK: Including source and target sets in specification tuples for relations. 

A relation is sometimes defined as a triple of sets (R, A, B), where A and B are sets and R ⊆ A × B. This
is then generally abbreviated to R. Thus R −< (R, A, B), using the “chicken-foot” notation for abbreviations
introduced in Section 8.8. In such a formalism, the reader must decide according to context whether R
means the full tuple (R, A, B) or just the set of ordered pairs R ⊆ A × B.


The tuple formalism (R, A, B) has the advantage of communicating the intended context to the reader, but 
it is often clumsy and confusing. The contextual sets A and B are probably best communicated in the 
surrounding text. 


9.5.29 NOTATION: a R b for a relation R means that (a,b) ∈ R.


9.5.30 REMARK: Justification of the infix notation for relation expressions.

The infix notation a R b is abstracted from the well-known notations for relations such as a = b, a ≠ b, a ∈ b,
a < b, a > b, a ≤ b, a ≥ b, a ⊆ b, a ⊇ b, a ≡ b, a ∼ b, a ≈ b, a ≃ b and a ≅ b. However, when the relation is
denoted by a letter rather than a special symbol, it can give rise to expressions which are difficult to read. It 
is usually best to use infix notation only for relations which are denoted by special symbols. The letter “R” 
used in Section 9.5 is a meta-symbol which is intended to be replaced by some concrete particular relation 
symbol. If the concrete symbol is a letter, it is best notated as a prefix. (In other words, in functional 
notation with parenthesised arguments.) If it is a special symbol, it is best notated as an infix. 


9.5.31 REMARK: Classes and properties of relations. 
Some textbooks give preliminary lists of sample classes of relations. In practice, it is generally best to define 
properties or attributes of relations as they are needed. Quite often, the same words are used for different 
properties. Therefore their definitions are best seen in their application contexts. However, there is probably 
little harm in giving a short sample list of typical relation properties here. 
(1) Empty. (Example 9.5.3, Definition 10.2.21.)
(2) Cross product, equality, inequality, containment, inclusion. (Example 9.5.3.)
(3) Identity. (Definitions 9.6.9, 10.2.27.)
(4) Reflexive. (Definitions 9.6.15, 11.1.2, 11.5.1.)
(5) Symmetric. (Definition 9.6.15.)
(6) Antisymmetric. (Definitions 11.1.2, 11.5.1.)
(7) Transitive. (Definitions 9.6.15, 11.1.2, 11.5.1, 12.5.7.)
(8) Injective. (Definition 9.6.18.)
(9) Equivalence. (Definition 9.8.2.)
(10) Unique value, existent value, range inclusion. (I.e. functions, Definitions 10.2.2, 10.9.2.)
(11) Injective, surjective, bijective function. (Definitions 10.5.2, 10.9.8.)


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]


9.6. Composition and inversion of relations 


9.6.1 REMARK: Justification of the reverse order of the notation for the composition of relations. 

The reverse order of appearance of R₁ and R₂ in Notation 9.6.3 is an annoyance which is almost universally
accepted. When one speaks of “the composition of R₁ and R₂”, this means that R₁ is applied first and
R₂ is applied second. (The word “apply” here is more meaningful for functions than for relations.) But to
harmonise Notation 9.6.3 with the functional notation R₂(R₁(x)) for x ∈ Dom(R₁), the order is reversed
so that (R₂ ∘ R₁)(x) = R₂(R₁(x)). This reverse composition order notation is followed, for example, by
Halmos [357], page 40; Stoll [393], page 39; Mattuck [114], page 127. However, Rose [127], page 3, adopts the
left-to-right order for denoting function composition, with the following comment.

    (This corresponds to the European convention of reading from left to right.) With the functional
    notation customary in analysis, by which the image of x under φ is denoted by φ(x), the composite
    map is denoted by ψφ : (ψφ)(x) = ψ(φ(x)).

Since composed functions are written in reverse order, this must be done for general relations also so that
one can say that functions are a special kind of relation, even though the reverse order makes even less sense
for relations than it does for functions.


9.6.2 DEFINITION: The composition or composite of relations R₁ and R₂ is the set
{(a,c); ∃b, ((a,b) ∈ R₁ ∧ (b,c) ∈ R₂)}.


9.6.3 NOTATION: R₂ ∘ R₁ denotes the composition of two relations R₁ and R₂. In other words,
R₂ ∘ R₁ = {(a,c); ∃b, ((a,b) ∈ R₁ ∧ (b,c) ∈ R₂)}.
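
The reverse-order convention is easy to get wrong in practice, so a small executable sketch may help. It
assumes that relations are modelled as Python sets of 2-tuples. The function name compose is hypothetical,
and its argument order deliberately mirrors the written order R₂ ∘ R₁.

```python
# Sketch of the composition of finite relations, modelled as sets of 2-tuples.
# The name compose is hypothetical; compose(R2, R1) corresponds to R2 o R1.

def compose(R2, R1):
    """Return {(a, c); exists b, (a, b) in R1 and (b, c) in R2}."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

R1 = {(1, "x"), (2, "y")}                 # applied first
R2 = {("x", "A"), ("y", "B"), ("z", "A")}  # applied second

R2_after_R1 = compose(R2, R1)  # pairs linked through a common middle element
R1_after_R2 = compose(R1, R2)  # no middle elements match in this direction
```

Here compose(R2, R1) links 1 to "A" and 2 to "B", whereas compose(R1, R2) is empty because no second
component of R2 is a first component of R1.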


9.6.4 REMARK: Composition of relations. 

The composite of two relations is a relation. This is almost obvious because a relation is defined as a set of
ordered pairs. The only non-obvious assertion is that the composite of two relations is a set. The fact that
someone writes down something of the form X = {x; P(x)} does not necessarily imply that X is a set in
Zermelo-Fraenkel set theory. Theorem 9.6.5 verifies that the composition of relations yields a set.


9.6.5 THEOREM: The composite of two relations is a relation.
Let R₁, R₂ be relations. Then R₂ ∘ R₁ is a relation and R₂ ∘ R₁ ⊆ Dom(R₁) × Range(R₂).


PROOF: Let R₁ and R₂ be relations. By Theorem 9.5.8, Dom(Rᵢ) and Range(Rᵢ) are sets for i = 1,2.
Thus Rᵢ ⊆ Dom(Rᵢ) × Range(Rᵢ) for i = 1,2. Let R = R₂ ∘ R₁. Let (a,b) ∈ R. Then (a,c) ∈ R₁ and
(c,b) ∈ R₂ for some c. Therefore a ∈ Dom(R₁) and b ∈ Range(R₂). Hence R ⊆ Dom(R₁) × Range(R₂).


9.6.6 THEOREM: Some basic properties of composites of relations and their inverses.
Let R₁ and R₂ be relations.
(i) Dom(R₂ ∘ R₁) = R₁⁻¹(Dom(R₂)).
(ii) Range(R₂ ∘ R₁) = R₂(Range(R₁)).
(iii) Dom(R₂ ∘ R₁⁻¹) = R₁(Dom(R₂)).
(iv) Range(R₂ ∘ R₁⁻¹) = R₂(Dom(R₁)).
(v) Dom(R₂ ∘ R₁) = R₁⁻¹(Dom(R₂) ∩ Range(R₁)).
(vi) Range(R₂ ∘ R₁) = R₂(Range(R₁) ∩ Dom(R₂)).
(vii) Dom(R₂ ∘ R₁⁻¹) = R₁(Dom(R₁) ∩ Dom(R₂)).
(viii) Range(R₂ ∘ R₁⁻¹) = R₂(Dom(R₁) ∩ Dom(R₂)).

PROOF: For part (i), let R₁ and R₂ be relations. Then

  ∀x, x ∈ Dom(R₂ ∘ R₁) ⇔ ∃y, (x,y) ∈ R₂ ∘ R₁
                       ⇔ ∃y, ∃z, ((x,z) ∈ R₁ and (z,y) ∈ R₂)
                       ⇔ ∃z, ((x,z) ∈ R₁ and ∃y, (z,y) ∈ R₂)
                       ⇔ ∃z, ((x,z) ∈ R₁ and z ∈ Dom(R₂))
                       ⇔ ∃z, ((z,x) ∈ R₁⁻¹ and z ∈ Dom(R₂))
                       ⇔ x ∈ R₁⁻¹(Dom(R₂)).

Hence Dom(R₂ ∘ R₁) = R₁⁻¹(Dom(R₂)).
For part (ii), let R₁ and R₂ be relations. Then

  ∀y, y ∈ Range(R₂ ∘ R₁) ⇔ ∃x, (x,y) ∈ R₂ ∘ R₁
                         ⇔ ∃x, ∃z, ((x,z) ∈ R₁ and (z,y) ∈ R₂)
                         ⇔ ∃z, ((∃x, (x,z) ∈ R₁) and (z,y) ∈ R₂)
                         ⇔ ∃z, (z ∈ Range(R₁) and (z,y) ∈ R₂)
                         ⇔ y ∈ R₂(Range(R₁)).

Hence Range(R₂ ∘ R₁) = R₂(Range(R₁)).
Part (iii) follows from part (i).
Part (iv) follows from part (ii).
Part (v) follows from part (i) and Theorem 9.5.19 (iv).
Part (vi) follows from part (ii) and Theorem 9.5.19 (iii).
Part (vii) follows from part (iii) and Theorem 9.5.19 (iii).
Part (viii) follows from part (iv) and Theorem 9.5.19 (iii).
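
Parts (i) and (ii) can be spot-checked on a finite example. The sketch below assumes that relations are
Python sets of 2-tuples and that the image R(A) of a set A under R means {b; ∃a ∈ A, (a,b) ∈ R}, which is
how Notation 9.5.17 is used in this section. All helper names are hypothetical.

```python
# Finite spot-check of Theorem 9.6.6 (i) and (ii); helper names are hypothetical.

def dom(R):
    return {a for (a, _) in R}

def rng(R):
    return {b for (_, b) in R}

def inverse(R):
    return {(b, a) for (a, b) in R}

def image(R, A):
    """Assumed meaning of R(A): all b with (a, b) in R for some a in A."""
    return {b for (a, b) in R if a in A}

def compose(R2, R1):
    """R2 o R1: R1 is applied first."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

R1 = {(1, "x"), (2, "y"), (3, "w")}
R2 = {("x", "A"), ("y", "B"), ("z", "C")}
C = compose(R2, R1)

part_i = (dom(C) == image(inverse(R1), dom(R2)))   # Dom(R2 o R1) = R1^-1(Dom(R2))
part_ii = (rng(C) == image(R2, rng(R1)))           # Range(R2 o R1) = R2(Range(R1))
```

In this example the element 3 drops out of the domain of the composite because its image "w" is not in Dom(R₂).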


9.6.7 THEOREM: Associativity of composition of relations.
(R₃ ∘ R₂) ∘ R₁ = R₃ ∘ (R₂ ∘ R₁) for any relations R₁, R₂ and R₃.


PROOF: Let R₁, R₂ and R₃ be relations. Then R₃ ∘ R₂ = {(b,d); ∃c, ((b,c) ∈ R₂ ∧ (c,d) ∈ R₃)} and
(R₃ ∘ R₂) ∘ R₁ = {(a,d); ∃b, ((a,b) ∈ R₁ ∧ (b,d) ∈ R₃ ∘ R₂)} by Definition 9.6.2. Therefore

  (R₃ ∘ R₂) ∘ R₁ = {(a,d); ∃b, ∃c, ((a,b) ∈ R₁ ∧ (b,c) ∈ R₂ ∧ (c,d) ∈ R₃)}.

Similarly R₂ ∘ R₁ = {(a,c); ∃b, ((a,b) ∈ R₁ ∧ (b,c) ∈ R₂)}, and so

  R₃ ∘ (R₂ ∘ R₁) = {(a,d); ∃c, ((a,c) ∈ R₂ ∘ R₁ ∧ (c,d) ∈ R₃)}
                 = {(a,d); ∃b, ∃c, ((a,b) ∈ R₁ ∧ (b,c) ∈ R₂ ∧ (c,d) ∈ R₃)}.

Hence (R₃ ∘ R₂) ∘ R₁ = R₃ ∘ (R₂ ∘ R₁).


9.6.8 REMARK: The identity relation on a set. 
Since the identity relation (template) is a function for any set X, it is also defined as the identity function. 
(See Definition 10.2.27 and Notation 10.2.29.) 


Definition 9.6.9 may be thought of as a "forward reference" to Definition 10.2.27. It is useful for stating some 
properties of general relations. Even when an identity relation and identity function are the same set, they 
are in different object-classes. As mentioned in Section 8.8, ZF sets should be regarded as merely parameters 
which indicate which object in an object-class is intended. Each object-class has its own definitions, notations, 
meanings and applications. As mentioned in Remark 9.1.4, a relation has a static character, whereas a 
function has a more dynamic character, even when they are represented by the same set. 


9.6.9 DEFINITION: The identity relation on a set X is the relation {(x,x); x ∈ X}.

9.6.10 NOTATION: id_X denotes the identity relation on a set X. In other words, id_X = {(x,x); x ∈ X}.


9.6.11 THEOREM: Some elementary properties of the identity relation.
(i) id_X ∘ id_X = id_X for any set X.
(ii) R ∘ id_{Dom(R)} = R for any relation R.
(iii) id_{Range(R)} ∘ R = R for any relation R.

PROOF: For part (i), let (x,z) ∈ id_X ∘ id_X. Then (x,y) ∈ id_X and (y,z) ∈ id_X for some y by Notation 9.6.3.
So x = y and y = z by Notation 9.6.10. Therefore x = z by Definitions 7.2.4 and 6.7.6 (EQ 3). So (x,z) ∈ id_X
by Notation 9.6.10. Thus id_X ∘ id_X ⊆ id_X. Now suppose that (x,y) ∈ id_X. Then x = y by Notation 9.6.10.
So (x,y) ∈ id_X ∘ id_X by Notation 9.6.3. Thus id_X ⊆ id_X ∘ id_X. Hence id_X ∘ id_X = id_X.
For part (ii), let (x,z) ∈ R ∘ id_{Dom(R)}. Then (y,z) ∈ R and (x,y) ∈ id_{Dom(R)} for some y by Notation 9.6.3.
But then y = x by Notation 9.6.10. So (x,z) ∈ R. Thus R ∘ id_{Dom(R)} ⊆ R. Now suppose that (x,y) ∈ R.
Then (x,x) ∈ id_{Dom(R)} by Notation 9.6.10 because x ∈ Dom(R). So (x,y) ∈ R ∘ id_{Dom(R)} by Notation 9.6.3.
Thus R ⊆ R ∘ id_{Dom(R)}. Hence R ∘ id_{Dom(R)} = R by Theorem 7.3.5 (iii).
For part (iii), let (x,z) ∈ id_{Range(R)} ∘ R. Then (x,y) ∈ R and (y,z) ∈ id_{Range(R)} for some y by Notation 9.6.3.
But then y = z by Notation 9.6.10. So (x,z) ∈ R. Thus id_{Range(R)} ∘ R ⊆ R. Now let (x,y) ∈ R. Then
(y,y) ∈ id_{Range(R)} by Notation 9.6.10 because y ∈ Range(R). So (x,y) ∈ id_{Range(R)} ∘ R by Notation 9.6.3.
Thus R ⊆ id_{Range(R)} ∘ R. Hence id_{Range(R)} ∘ R = R by Theorem 7.3.5 (iii).
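
The three identities for the identity relation can be confirmed for a small finite relation. This is a sketch
only, assuming that relations are Python sets of 2-tuples. The names identity, compose, dom and rng are
hypothetical.

```python
# Finite check of the identity-relation properties; all names are hypothetical.

def dom(R):
    return {a for (a, _) in R}

def rng(R):
    return {b for (_, b) in R}

def identity(X):
    """id_X = {(x, x); x in X}."""
    return {(x, x) for x in X}

def compose(R2, R1):
    """R2 o R1: R1 is applied first."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

X = {1, 2, 3}
R = {(1, "a"), (2, "a"), (2, "b")}

idem = (compose(identity(X), identity(X)) == identity(X))  # part (i)
right_unit = (compose(R, identity(dom(R))) == R)            # part (ii)
left_unit = (compose(identity(rng(R)), R) == R)             # part (iii)
```

Replacing dom(R) by a smaller set in part (ii) would in general remove pairs from R, which is the
observation behind the restriction formulas later in this section.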


9.6.12 REMARK: Obvious properties of inverses of relations. 
It goes without saying (and without proof) that the inverse of a relation is a relation. Also, if R is a relation 
from A to B, then R⁻¹ is a relation from B to A.


9.6.13 DEFINITION: The inverse of a relation R is the set {(b,a); (a,b) ∈ R}.


9.6.14 NOTATION: R⁻¹ denotes the inverse of a relation R.


9.6.15 DEFINITION: A reflexive relation in a set X is a relation R in X such that ∀a ∈ X, (a,a) ∈ R.
A symmetric relation is a relation R such that ∀a, ∀b, ((a,b) ∈ R ⇒ (b,a) ∈ R).
A transitive relation is a relation R such that ∀a, ∀b, ∀c, (((a,b) ∈ R ∧ (b,c) ∈ R) ⇒ (a,c) ∈ R).


9.6.16 THEOREM: Some basic properties of inverses of relations.
(i) (id_X)⁻¹ = id_X for any set X.
(ii) (R⁻¹)⁻¹ = R for any relation R.
(iii) R⁻¹ ∘ R is a symmetric relation for any relation R.
(iv) R⁻¹ ∘ R ⊇ id_{Dom(R)} for any relation R.
(v) R ∘ R⁻¹ ⊇ id_{Range(R)} for any relation R.
(vi) (R₂ ∘ R₁)⁻¹ = R₁⁻¹ ∘ R₂⁻¹ for any relations R₁ and R₂.


PROOF: For part (i), let X be a set. Then by Definition 9.6.13, (x,y) ∈ (id_X)⁻¹ if and only if (y,x) ∈ id_X.
But (y,x) ∈ id_X if and only if x ∈ X and y = x by Notation 9.6.10. So (x,y) ∈ (id_X)⁻¹ if and only if x ∈ X
and y = x by Theorem 4.7.9 (xviii). Therefore (x,y) ∈ (id_X)⁻¹ if and only if (x,y) ∈ id_X by Notation 9.6.10.
Hence (id_X)⁻¹ = id_X by Theorem 7.5.10.

For part (ii), let R be a relation. Then R⁻¹ = {(b,a); (a,b) ∈ R} by Definition 9.6.13. Therefore (R⁻¹)⁻¹ =
{(b,a); (a,b) ∈ R⁻¹} = {(b,a); (b,a) ∈ R} = R.

To prove part (iii), let R be a relation. Let S denote the composite R⁻¹ ∘ R of R with its inverse. Suppose
(x₁,x₂) ∈ S = R⁻¹ ∘ R. Then for some y, (x₁,y) ∈ R and (y,x₂) ∈ R⁻¹. So (x₂,y) ∈ R by the
definition of R⁻¹. Similarly, (y,x₁) ∈ R⁻¹. Hence (x₂,x₁) ∈ S by the definition of the composite. Therefore
S is symmetric.

For part (iv), let R be a relation. Then R⁻¹ ∘ R = {(a,c); ∃b, ((a,b) ∈ R ∧ (b,c) ∈ R⁻¹)} by Definition 9.6.2.
So by Definition 9.6.13,

  R⁻¹ ∘ R = {(a,c); ∃b, ((a,b) ∈ R ∧ (c,b) ∈ R)}
          ⊇ {(a,a); ∃b, ((a,b) ∈ R ∧ (a,b) ∈ R)}
          = {(a,a); a ∈ Dom(R)}
          = id_{Dom(R)}.

For part (v), let R be a relation. Then R ∘ R⁻¹ = {(a,c); ∃b, ((a,b) ∈ R⁻¹ ∧ (b,c) ∈ R)} by Definition 9.6.2.
So by Definition 9.6.13,

  R ∘ R⁻¹ = {(a,c); ∃b, ((b,a) ∈ R ∧ (b,c) ∈ R)}
          ⊇ {(a,a); ∃b, ((b,a) ∈ R ∧ (b,a) ∈ R)}
          = {(a,a); a ∈ Range(R)}
          = id_{Range(R)}.

For part (vi), let R₁ and R₂ be relations. Then (R₂ ∘ R₁)⁻¹ = ({(a,c); ∃b, ((a,b) ∈ R₁ ∧ (b,c) ∈ R₂)})⁻¹ =
{(c,a); ∃b, ((a,b) ∈ R₁ ∧ (b,c) ∈ R₂)} = {(c,a); ∃b, ((b,a) ∈ R₁⁻¹ ∧ (c,b) ∈ R₂⁻¹)} = R₁⁻¹ ∘ R₂⁻¹.
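
A finite example shows why parts (iv) and (v) are inclusions rather than equalities. The sketch assumes
relations are Python sets of 2-tuples. The helper names are hypothetical.

```python
# Finite illustration of Theorem 9.6.16; helper names are hypothetical.

def dom(R):
    return {a for (a, _) in R}

def inverse(R):
    """R^-1 = {(b, a); (a, b) in R}."""
    return {(b, a) for (a, b) in R}

def identity(X):
    return {(x, x) for x in X}

def compose(R2, R1):
    """R2 o R1: R1 is applied first."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

R1 = {(1, "a"), (2, "a"), (3, "b")}   # not injective: 1 and 2 share "a"
R2 = {("a", "X"), ("b", "Y")}

S = compose(inverse(R1), R1)                      # R1^-1 o R1
double_inverse_ok = (inverse(inverse(R1)) == R1)  # part (ii)
symmetric_ok = (inverse(S) == S)                  # part (iii)
contains_identity = identity(dom(R1)) <= S        # part (iv)
strictly_bigger = (S != identity(dom(R1)))        # (1, 2) and (2, 1) are extra
reversal_ok = (inverse(compose(R2, R1))
               == compose(inverse(R1), inverse(R2)))  # part (vi)
```

The extra pairs (1,2) and (2,1) arise because 1 and 2 are both related to "a", which anticipates the role of
injectivity in the characterisation of when equality holds.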


9.6.17 REMARK: Algebraic inverses of relations. 

Although the name “inverse” is given to the “inverse of a relation” in Definition 9.6.13, it would be more 
accurate to call it the “reverse of a relation” because it simply reverses the order of the ordered pairs. 
Theorem 9.6.16 (iv, v) does not imply that R and R⁻¹ are algebraic inverses of each other in the sense of the
group element inverses in Definition 17.3.2 (iii). However, for injective relations, Theorem 9.6.19 (ii, iii)
shows that inverses of relations compose in a way which resembles the algebraic concept of inverses for 
groups. In fact, the algebraic properties of composition, identities and inverses for relations have similarities 
to the algebraic properties of multiplication, identities and inverses of matrices in Section 25.3. 


9.6.18 DEFINITION: An injective relation is a relation R which satisfies 


  ∀x₁, ∀x₂, ∀y, ((x₁,y) ∈ R ∧ (x₂,y) ∈ R) ⇒ x₁ = x₂.


9.6.19 THEOREM: Some basic properties of injective relations.
(i) The composite of any two injective relations is an injective relation.
(ii) R⁻¹ ∘ R = id_{Dom(R)} if and only if R is injective, for any relation R.
(iii) R ∘ R⁻¹ = id_{Range(R)} if and only if R⁻¹ is injective, for any relation R.

PROOF: For part (i), let R₁ and R₂ be injective relations. By Definition 9.6.2, the composite of R₁ and R₂
is the relation R = {(a,b); ∃c, ((a,c) ∈ R₂ ∧ (c,b) ∈ R₁)}. Suppose (a₁,b) ∈ R and (a₂,b) ∈ R. Then for
some c₁ and c₂, (a₁,c₁), (a₂,c₂) ∈ R₂ and (c₁,b), (c₂,b) ∈ R₁. Since R₁ is an injective relation, c₁ = c₂. So
a₁ = a₂ because R₂ is an injective relation. Hence R is an injective relation.

For part (ii), let R be an injective relation. Then

  R⁻¹ ∘ R = {(a,c); ∃b, ((a,b) ∈ R ∧ (b,c) ∈ R⁻¹)}
          = {(a,c); ∃b, ((a,b) ∈ R ∧ (c,b) ∈ R)}
          = {(a,a); ∃b, ((a,b) ∈ R ∧ (a,b) ∈ R)}                                   (9.6.1)
          = {(a,a); a ∈ Dom(R)}
          = id_{Dom(R)}.

Line (9.6.1) follows from the injectivity of R. Now suppose that R is not injective. Then (a₁,b), (a₂,b) ∈ R
for some a₁, a₂ and b with a₁ ≠ a₂, and so (b,a₁), (b,a₂) ∈ R⁻¹. Therefore ∃b, ((a₁,b) ∈ R ∧ (b,a₂) ∈ R⁻¹).
So (a₁,a₂) ∈ R⁻¹ ∘ R. But (a₁,a₂) ∉ id_{Dom(R)}. So R⁻¹ ∘ R ≠ id_{Dom(R)}.

For part (iii), let R be a relation such that R⁻¹ is injective. Then

  R ∘ R⁻¹ = {(a,c); ∃b, ((a,b) ∈ R⁻¹ ∧ (b,c) ∈ R)}
          = {(a,c); ∃b, ((a,b) ∈ R⁻¹ ∧ (c,b) ∈ R⁻¹)}
          = {(a,a); ∃b, ((a,b) ∈ R⁻¹ ∧ (a,b) ∈ R⁻¹)}                               (9.6.2)
          = {(a,a); a ∈ Dom(R⁻¹)}
          = id_{Dom(R⁻¹)} = id_{Range(R)}.

Line (9.6.2) follows from the injectivity of R⁻¹. Suppose that R⁻¹ is not injective. Then (a₁,b), (a₂,b) ∈ R⁻¹
for some a₁, a₂ and b with a₁ ≠ a₂, and so (b,a₁), (b,a₂) ∈ R. Therefore ∃b, ((a₁,b) ∈ R⁻¹ ∧ (b,a₂) ∈ R).
So (a₁,a₂) ∈ R ∘ R⁻¹. But (a₁,a₂) ∉ id_{Range(R)}. So R ∘ R⁻¹ ≠ id_{Range(R)}.
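
The two-way characterisation of injectivity can be tested on finite relations. The sketch below assumes that
relations are Python sets of 2-tuples with hashable components. The names is_injective, inverse, identity and
compose are hypothetical.

```python
# Finite check that R^-1 o R = id_Dom(R) exactly when R is injective.
# All helper names are hypothetical.

def dom(R):
    return {a for (a, _) in R}

def inverse(R):
    return {(b, a) for (a, b) in R}

def identity(X):
    return {(x, x) for x in X}

def compose(R2, R1):
    """R2 o R1: R1 is applied first."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

def is_injective(R):
    """Definition 9.6.18: (x1, y) and (x2, y) in R imply x1 = x2."""
    first_seen = {}
    for (x, y) in R:
        # setdefault records x for a new y, or returns the x already recorded.
        if first_seen.setdefault(y, x) != x:
            return False
    return True

def left_inverse_is_identity(R):
    return compose(inverse(R), R) == identity(dom(R))

R_inj = {(1, "a"), (2, "b"), (2, "c")}   # injective, although not a function
R_not = {(1, "a"), (2, "a")}             # 1 and 2 share the value "a"
```

Note that R_inj is injective without being single-valued, so its inverse is not injective. This is why the
two conditions on R and R⁻¹ are stated separately.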


9.6.20 REMARK: Generalisation of function restrictions to relations.

Definition 9.6.21 and Notation 9.6.22 extend the restriction concept in Definition 10.4.3 and Notation 10.4.4 
from functions to general relations. Although domain restrictions for relations are a straightforward and 
obvious generalisation, range restrictions are not often mentioned. Therefore an ad-hoc notation is given 
here, which is probably non-standard. The restriction R|_A^B of a relation R to a set-pair (A, B) is almost
never encountered. It is given here only for reasons of morbid curiosity. 


Relation restrictions are closely related to compositions of relations with identity relations. 


9.6.21 DEFINITION: Restrictions of relations.
The domain restriction of a relation R to a set A is the relation {(x,y) ∈ R; x ∈ A}.
The range restriction of a relation R to a set B is the relation {(x,y) ∈ R; y ∈ B}.
The restriction of a relation R to a set-pair (A, B) is the relation R ∩ (A × B) = {(x,y) ∈ R; x ∈ A, y ∈ B}.
The restriction of a relation R to a set C is the relation R ∩ (C × C) = {(x,y) ∈ R; x ∈ C, y ∈ C}.


9.6.22 NOTATION: Relation domain and range restrictions.
R|_A, for a relation R and set A, denotes the domain restriction of R to A.
R|^B, for a relation R and set B, denotes the range restriction of R to B.
R|_A^B, for a relation R and sets A and B, denotes the restriction of R to the set-pair (A, B).


9.6.23 REMARK: Reconstruction of relations from restrictions to "patches". 
Theorem 9.6.24 (v) has some relevance to the concept of an atlas. If two sets M₁ and M₂ are covered by
set-collections C₁ and C₂ respectively, any relation R between M₁ and M₂ may be constructed from the
restrictions of R to “patches” A ∈ C₁ and B ∈ C₂. (See Section 49.3 for non-topological charts and atlases.
See the proof of Theorem 49.3.6 (iv, v) for applications of Theorem 9.6.24 (v).)


9.6.24 THEOREM: Some properties of identity relations and restrictions of relations.
Let R be a relation.

(i) R ∘ id_A = R ∩ (A × Range(R)) for any set A. In other words, R ∘ id_A = R|_A.
(ii) id_B ∘ R = R ∩ (Dom(R) × B) for any set B. In other words, id_B ∘ R = R|^B.
(iii) id_B ∘ R ∘ id_A = R ∩ (A × B) for any sets A and B. In other words, id_B ∘ R ∘ id_A = R|_A^B.
(iv) ⋃_{A∈C₁} ⋃_{B∈C₂} (id_B ∘ R ∘ id_A) = R ∩ ((⋃C₁) × (⋃C₂)) for any (collections of) sets C₁ and C₂.
     In other words, ⋃_{A∈C₁} ⋃_{B∈C₂} (id_B ∘ R ∘ id_A) = ⋃_{A∈C₁} ⋃_{B∈C₂} R|_A^B = R|_{⋃C₁}^{⋃C₂}.
(v) If Dom(R) ⊆ ⋃C₁ and Range(R) ⊆ ⋃C₂, then R = ⋃_{A∈C₁} ⋃_{B∈C₂} (id_B ∘ R ∘ id_A).


PROOF: For part (i), it follows from Notations 9.6.3 and 9.6.10 that

  R ∘ id_A = {(a,c); ∃b, ((a,b) ∈ id_A ∧ (b,c) ∈ R)}
           = {(a,c); ∃b, (a = b ∧ b ∈ A ∧ (b,c) ∈ R)}
           = {(a,c); a ∈ A ∧ (a,c) ∈ R}
           = {(a,c) ∈ R; a ∈ A}
           = R ∩ (A × Range(R)).

Parts (ii) and (iii) may be proved in a similar way to part (i).

For part (iv), ⋃_{A∈C₁} ⋃_{B∈C₂} (id_B ∘ R ∘ id_A) = ⋃_{A∈C₁} ⋃_{B∈C₂} (R ∩ (A × B)) by part (iii), which equals
R ∩ (⋃_{A∈C₁} ⋃_{B∈C₂} (A × B)) by Theorem 8.4.8 (iv), which equals R ∩ ((⋃C₁) × (⋃C₂)) by Theorem 9.4.7 (v).

Part (v) follows from part (iv) because R ⊆ Dom(R) × Range(R).
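
The reconstruction of a relation from its restrictions to patches, as in part (v), can be sketched for finite
sets. The code assumes that relations are Python sets of 2-tuples and that the covers C1 and C2 are lists of
sets. The names restrict, identity and compose are hypothetical.

```python
# Finite sketch of Theorem 9.6.24 (iii) and (v); helper names are hypothetical.

def identity(X):
    return {(x, x) for x in X}

def compose(R2, R1):
    """R2 o R1: R1 is applied first."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

def restrict(R, A, B):
    """Restriction of R to the set-pair (A, B), i.e. R intersected with A x B."""
    return {(x, y) for (x, y) in R if x in A and y in B}

R = {(1, "a"), (2, "b"), (3, "c"), (4, "a")}
C1 = [{1, 2}, {2, 3, 4}]        # overlapping cover of Dom(R)
C2 = [{"a"}, {"b", "c"}]        # cover of Range(R)

# Part (iii): id_B o R o id_A equals the restriction to (A, B).
A, B = C1[0], C2[0]
part_iii = (compose(identity(B), compose(R, identity(A))) == restrict(R, A, B))

# Part (v): the union of all patch restrictions recovers R.
rebuilt = set().union(*(restrict(R, A, B) for A in C1 for B in C2))
```

The patches are allowed to overlap, as with charts in an atlas. The pair (2, "b") appears in two patch
restrictions but only once in the union.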


9.6.25 THEOREM: More properties of identity relations and restrictions of relations.
Let R be a relation. Let A and B be sets.
(i) R ∘ id_A = R|_A ⊆ R|^{R(A)} = id_{R(A)} ∘ R.
(ii) R ∘ id_{R⁻¹(B)} = R|_{R⁻¹(B)} ⊇ R|^B = id_B ∘ R.
(iii) R ∘ id_A = R|_A = R|^{R(A)} = id_{R(A)} ∘ R if R is injective.
(iv) R ∘ id_{R⁻¹(B)} = R|_{R⁻¹(B)} = R|^B = id_B ∘ R if R⁻¹ is injective.
(v) R⁻¹ ∘ id_B ∘ R = id_{R⁻¹(B)} if R is injective.
(vi) R ∘ id_A ∘ R⁻¹ = id_{R(A)} if R⁻¹ is injective.


PROOF: For part (i), let (a,b) ∈ R|_A. Then (a,b) ∈ R and a ∈ A. Therefore (a,b) ∈ R and b ∈ R(A)
by Notation 9.5.17. So (a,b) ∈ R|^{R(A)}. Hence R|_A ⊆ R|^{R(A)}. The other two equalities then follow from
Theorem 9.6.24 (i, ii).

Part (ii) follows as for part (i), using Notation 9.5.17 for R⁻¹(B).

For part (iii), let (a,b) ∈ R|^{R(A)}. Then (a,b) ∈ R and b ∈ R(A). So (a′,b) ∈ R for some a′ ∈ A by
Notation 9.5.17. But then a′ = a by Definition 9.6.18 because R is injective. So (a,b) ∈ R and a ∈ A.
Therefore (a,b) ∈ R|_A. Thus R|_A ⊇ R|^{R(A)}. Hence R|_A = R|^{R(A)} by part (i). The other two equalities also
follow by part (i).

Part (iv) may be proved similarly to part (iii).

For part (v), it follows from Notations 9.6.3 and 9.6.10 that

  R⁻¹ ∘ id_B ∘ R = {(a,d); ∃b, ∃c, ((a,b) ∈ R and (b,c) ∈ id_B and (c,d) ∈ R⁻¹)}
                 = {(a,d); ∃b, ∃c, ((a,b) ∈ R and (b = c and b ∈ B) and (d,c) ∈ R)}
                 = {(a,d); ∃b, ((a,b) ∈ R and b ∈ B and (d,b) ∈ R)}
                 = {(a,d); ∃b, ((a,b) ∈ R and b ∈ B and a = d)}                     (9.6.3)
                 = {(a,d); a = d and ∃b, ((a,b) ∈ R and b ∈ B)}                     (9.6.4)
                 = {(a,d); a = d and ∃b ∈ B, (a,b) ∈ R}
                 = {(a,d); a = d and a ∈ R⁻¹(B)}
                 = id_{R⁻¹(B)},

where line (9.6.3) follows from the injectivity of R, and line (9.6.4) follows from Theorem 6.6.16 (vi).
Part (vi) follows from part (v) by substituting R⁻¹ for R.


9.7. Double-domain direct products of relations 


9.7.1 REMARK: Definition and notation for the double-domain direct product of two relations. 

The style of direct product of relations in Definition 9.7.2 is relevant to the definition of atlases for direct 
products of manifolds. (See Definition 50.4.6 for example.) It is a generalisation of the double-domain direct 
product of functions in Definition 10.14.3. 


9.7.2 DEFINITION: The (double-domain) direct product of two relations R₁ and R₂ is the relation:
{((x₁,x₂),(y₁,y₂)); (x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂}.


9.7.3 NOTATION: R₁ ×̈ R₂, for relations R₁ and R₂, denotes the double-domain direct product of R₁ and R₂.
In other words,

  R₁ ×̈ R₂ = {((x₁,x₂),(y₁,y₂)); (x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂}.


9.7.4 REMARK: Ambiguous notation for direct product of relations. 

The alternative R₁ × R₂ for Notation 9.7.3 would clash with Notation 9.4.3 for the Cartesian product of sets.
To be specific, for any relations R₁ and R₂, the Cartesian set-product of R₁ and R₂ is {(a,b); a ∈ R₁ and
b ∈ R₂}, which is equal to {((x₁,y₁),(x₂,y₂)); (x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂}. The direct relation-product
in Definition 9.7.2 gives {((x₁,x₂),(y₁,y₂)); (x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂} for the same relations. Thus
((x₁,y₁),(x₂,y₂)) is in the set-product if and only if ((x₁,x₂),(y₁,y₂)) is in the relation-product.

(1) R₁ × R₂ = {((x₁,y₁),(x₂,y₂)); (x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂} is the Cartesian set-product.
(2) R₁ ×̈ R₂ = {((x₁,x₂),(y₁,y₂)); (x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂} is the double-domain direct product.


The “information” in these tuples is the same, but arranged differently. Since they are so similar, it would 
be easy to confuse them. Therefore they must be handled with care! (This applies equally to the direct 
function products in Definition 10.14.3 and the direct partial function products in Definition 10.14.10.) 
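
The difference between the two products is only a rearrangement of tuple components, which a short sketch
makes concrete. Relations are assumed to be Python sets of 2-tuples. The names set_product and direct_product
are hypothetical.

```python
# Cartesian set-product versus double-domain direct product of two relations.
# Both function names are hypothetical.

def set_product(R1, R2):
    """Cartesian product of R1 and R2 as sets: pairs of pairs ((x1,y1),(x2,y2))."""
    return {(p, q) for p in R1 for q in R2}

def direct_product(R1, R2):
    """Definition 9.7.2: pairs ((x1,x2),(y1,y2)) with (x1,y1) in R1, (x2,y2) in R2."""
    return {((x1, x2), (y1, y2)) for (x1, y1) in R1 for (x2, y2) in R2}

R1 = {(1, "a")}
R2 = {(2, "b"), (3, "c")}

cartesian = set_product(R1, R2)
direct = direct_product(R1, R2)

# Swapping the two inner components converts one product into the other.
rearranged = {((x1, x2), (y1, y2)) for ((x1, y1), (x2, y2)) in cartesian}
```

The two results have the same number of elements, but they are different sets whenever the relations are
non-empty and some pair has distinct components, so the two notations cannot safely be merged.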


9.7.5 THEOREM: Domains and ranges of direct products of relations.
Let R₁ and R₂ be relations.

(i) Dom(R₁ ×̈ R₂) = Dom(R₁) × Dom(R₂).
(ii) Range(R₁ ×̈ R₂) = Range(R₁) × Range(R₂).

PROOF: For part (i), it follows from Definition 9.5.4 that

  Dom(R₁ ×̈ R₂) = {a; ∃b, (a,b) ∈ R₁ ×̈ R₂}
               = {(x₁,x₂); ∃(y₁,y₂), ((x₁,x₂),(y₁,y₂)) ∈ R₁ ×̈ R₂}
               = {(x₁,x₂); ∃(y₁,y₂), ((x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂)}
               = {(x₁,x₂); (∃y₁, (x₁,y₁) ∈ R₁) and (∃y₂, (x₂,y₂) ∈ R₂)}
               = {(x₁,x₂); x₁ ∈ Dom(R₁) and x₂ ∈ Dom(R₂)}
               = Dom(R₁) × Dom(R₂).

For part (ii), it follows from Definition 9.5.4 that

  Range(R₁ ×̈ R₂) = {b; ∃a, (a,b) ∈ R₁ ×̈ R₂}
                 = {(y₁,y₂); ∃(x₁,x₂), ((x₁,x₂),(y₁,y₂)) ∈ R₁ ×̈ R₂}
                 = {(y₁,y₂); ∃(x₁,x₂), ((x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂)}
                 = {(y₁,y₂); (∃x₁, (x₁,y₁) ∈ R₁) and (∃x₂, (x₂,y₂) ∈ R₂)}
                 = {(y₁,y₂); y₁ ∈ Range(R₁) and y₂ ∈ Range(R₂)}
                 = Range(R₁) × Range(R₂).


9.7.6 REMARK: Alternative expression for the direct product of two relations. 
By Theorem 9.7.5, one may write: 
  ∀((x₁,x₂),(y₁,y₂)) ∈ Dom(R₁ ×̈ R₂) × Range(R₁ ×̈ R₂),
      ((x₁,x₂),(y₁,y₂)) ∈ R₁ ×̈ R₂ ⇔ ((x₁,y₁) ∈ R₁ and (x₂,y₂) ∈ R₂).


9.7.7 REMARK: Composition of direct products of relations. 

The composition of two direct products of relations yields a direct product of two composites of relations. 
This is shown in Theorem 9.7.8 (i). This kind of construction is relevant to transition maps for direct products 
of manifolds. (See for example Theorem 52.6.5.) 


9.7.8 THEOREM: Some properties of direct products of composites of relations.
Let R₁, R₂, S₁, S₂ be relations.

(i) (S₁ ∘ R₁) ×̈ (S₂ ∘ R₂) = (S₁ ×̈ S₂) ∘ (R₁ ×̈ R₂).
(ii) R₁⁻¹ ×̈ R₂⁻¹ = (R₁ ×̈ R₂)⁻¹.
(iii) (S₁ ∘ R₁⁻¹) ×̈ (S₂ ∘ R₂⁻¹) = (S₁ ×̈ S₂) ∘ (R₁ ×̈ R₂)⁻¹.

PROOF: For part (i), it follows from Definitions 9.6.2 and 9.7.2 that

  ∀((x₁,x₂),(y₁,y₂)) ∈ Dom((S₁ ∘ R₁) ×̈ (S₂ ∘ R₂)) × Range((S₁ ∘ R₁) ×̈ (S₂ ∘ R₂)),
      ((x₁,x₂),(y₁,y₂)) ∈ (S₁ ∘ R₁) ×̈ (S₂ ∘ R₂)
    ⇔ (x₁,y₁) ∈ (S₁ ∘ R₁) ∧ (x₂,y₂) ∈ (S₂ ∘ R₂)
    ⇔ (∃z₁, ((x₁,z₁) ∈ R₁ ∧ (z₁,y₁) ∈ S₁)) ∧ (∃z₂, ((x₂,z₂) ∈ R₂ ∧ (z₂,y₂) ∈ S₂))
    ⇔ ∃(z₁,z₂), ((x₁,z₁) ∈ R₁ ∧ (x₂,z₂) ∈ R₂ ∧ (z₁,y₁) ∈ S₁ ∧ (z₂,y₂) ∈ S₂)
    ⇔ ∃(z₁,z₂), (((x₁,x₂),(z₁,z₂)) ∈ R₁ ×̈ R₂ ∧ ((z₁,z₂),(y₁,y₂)) ∈ S₁ ×̈ S₂)
    ⇔ ((x₁,x₂),(y₁,y₂)) ∈ (S₁ ×̈ S₂) ∘ (R₁ ×̈ R₂).

Hence (S₁ ∘ R₁) ×̈ (S₂ ∘ R₂) = (S₁ ×̈ S₂) ∘ (R₁ ×̈ R₂).
For part (ii), it follows from Definitions 9.6.2, 9.6.13 and 9.7.2 that

  ∀((x₁,x₂),(y₁,y₂)) ∈ Dom(R₁⁻¹ ×̈ R₂⁻¹) × Range(R₁⁻¹ ×̈ R₂⁻¹),
      ((x₁,x₂),(y₁,y₂)) ∈ R₁⁻¹ ×̈ R₂⁻¹ ⇔ (x₁,y₁) ∈ R₁⁻¹ ∧ (x₂,y₂) ∈ R₂⁻¹
                                       ⇔ (y₁,x₁) ∈ R₁ ∧ (y₂,x₂) ∈ R₂
                                       ⇔ ((y₁,y₂),(x₁,x₂)) ∈ R₁ ×̈ R₂
                                       ⇔ ((x₁,x₂),(y₁,y₂)) ∈ (R₁ ×̈ R₂)⁻¹.

Hence R₁⁻¹ ×̈ R₂⁻¹ = (R₁ ×̈ R₂)⁻¹.
Part (iii) follows from parts (i) and (ii).
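
Parts (i) and (ii) can be confirmed on a small example. The sketch assumes that relations are Python sets of
2-tuples, so that elements of a direct product are pairs of pairs. The helper names are hypothetical.

```python
# Finite check that direct products commute with composition and inversion.
# All helper names are hypothetical.

def inverse(R):
    return {(b, a) for (a, b) in R}

def compose(R2, R1):
    """R2 o R1: R1 is applied first."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

def direct_product(R1, R2):
    """Double-domain direct product of Definition 9.7.2."""
    return {((x1, x2), (y1, y2)) for (x1, y1) in R1 for (x2, y2) in R2}

R1 = {(1, "a"), (2, "a")}
S1 = {("a", "P")}
R2 = {(10, "u")}
S2 = {("u", "Q"), ("u", "Q2")}

lhs = direct_product(compose(S1, R1), compose(S2, R2))
rhs = compose(direct_product(S1, S2), direct_product(R1, R2))
part_i = (lhs == rhs)

part_ii = (direct_product(inverse(R1), inverse(R2))
           == inverse(direct_product(R1, R2)))
```

The intermediate elements of the composite on the right-hand side are themselves pairs such as ("a", "u"),
which Python compares component by component.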


9.7.9 REMARK: Common-domain direct products of relations. 
The common-domain direct products of functions in Definition 10.15.2 may be generalised to relations, but 
in the absence of any obvious applications to differential geometry, they are not defined here. 


9.8. Equivalence relations 


9.8.1 REMARK: Equivalence classes are closely related to partitions. 
Although equivalence relations cannot be defined before relations, the closely related lower-level concept of 
a partition is defined in Section 8.7. 


9.8.2 DEFINITION: An equivalence relation on a set X is a relation R on X such that 


(i) ∀x ∈ X, x R x; [reflexivity]
(ii) ∀x,y ∈ X, (x R y ⇒ y R x); [symmetry]
(iii) ∀x,y,z ∈ X, ((x R y and y R z) ⇒ x R z). [transitivity]


9.8.3 REMARK:  Equi-informational definitions of partitions and equivalence relations. 

Theorem 9.8.4 could be thought of as the “fundamental theorem of partitions and equivalence relations”.
Partitions and equivalence relations are “equi-informational” in much the same way that subsets of sets and
indicator functions on sets are “equi-informational”. (See Remark 14.7.6 for the equi-informationality of the
set of subsets P(S) and the set of functions 2^S for any set S.)


Another structure which is equi-informational to a partition of X or an equivalence relation on X is a 
function defined on X. This function defines an “equivalence kernel” on X which is an equivalence relation.
(See Definition 10.16.7.) 


9.8.4 THEOREM: Interdefinability of partitions and equivalence relations.
(i) Let S be a partition of a set X. Then the relation R on X defined by

      ∀x,y ∈ X, x R y ⇔ ∃A ∈ S, (x ∈ A and y ∈ A)                                  (9.8.1)

    is an equivalence relation on X.

(ii) Let R be an equivalence relation on a set X. Then

      {A ∈ P(X) \ {∅}; (∀x,y ∈ A, x R y) and (∀x ∈ A, ∀y ∈ X \ A, ¬(x R y))}        (9.8.2)

    is a partition of X.


PROOF: For part (i), let S be a partition of a set X, and define a relation R on X as in line (9.8.1). Let
x ∈ X. Then x ∈ A for some A ∈ S by Definition 8.7.12 (i). So ∃A ∈ S, (x ∈ A and x ∈ A). Therefore x R x,
which verifies Definition 9.8.2 (i). The symmetry condition Definition 9.8.2 (ii) follows from the symmetry
of the form of the relation in line (9.8.1). Now suppose that x,y,z ∈ X satisfy x R y and y R z. Then
∃A ∈ S, (x ∈ A and y ∈ A) and ∃B ∈ S, (y ∈ B and z ∈ B). By Definition 8.7.12 (ii), either A = B or
A ∩ B = ∅. If A = B, then x ∈ A and z ∈ B = A, and so x R z. But if A ∩ B = ∅, then either y ∉ A or y ∉ B,
which contradicts the choice of A and B. Thus Definition 9.8.2 (iii) is
verified. Hence R is an equivalence relation on X.


For part (ii), let R be an equivalence relation on X. Let S be the subset of P(X) \ {∅} as indicated in
line (9.8.2). Then ⋃S ⊆ X. Let x ∈ X. Let A = {y ∈ X; x R y}. Then A ⊆ X, and A ≠ ∅ because x ∈ A
by Definition 9.8.2 (i). So A ∈ P(X) \ {∅}. Let x,y ∈ A. Then x R y by the definition of A. Let x ∈ A and
y ∈ X \ A. Then x R y is false by the definition of A. So A ∈ S. Therefore X ⊆ ⋃S and so ⋃S = X,
which verifies Definition 8.7.12 (i). Now let A, B ∈ S satisfy A ∩ B ≠ ∅. Then x ∈ A ∩ B for some x ∈ X.
Let y ∈ A. Then x R y by line (9.8.2). Suppose that y ∉ B. Then x R y is false by line (9.8.2). So A ⊆ B.
Similarly B ⊆ A. So A = B. This verifies Definition 8.7.12 (ii). Hence S is a partition of X.
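
The two constructions in the theorem are mutually inverse on finite sets, which can be checked directly. The
sketch assumes that a partition is a Python set of frozensets and a relation is a set of 2-tuples. The
function names are hypothetical. The second function builds the sets {y ∈ X; x R y} used in the proof of
part (ii) rather than searching P(X) as in line (9.8.2).

```python
# Round trip between a finite partition and its equivalence relation.
# Function names are hypothetical.

def relation_from_partition(S):
    """Line (9.8.1): x R y iff some block A in S contains both x and y."""
    return {(x, y) for A in S for x in A for y in A}

def partition_from_relation(X, R):
    """The sets {y in X; x R y} for x in X, as in the proof of part (ii)."""
    return {frozenset(y for y in X if (x, y) in R) for x in X}

X = {1, 2, 3, 4, 5}
S = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5})}

R = relation_from_partition(S)
reflexive = all((x, x) in R for x in X)
symmetric = all((y, x) in R for (x, y) in R)
transitive = all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)

round_trip = (partition_from_relation(X, R) == S)
```

For a relation which is not an equivalence relation, the sets produced by partition_from_relation need not
be pairwise disjoint, so the round trip is only claimed under the hypotheses of the theorem.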


9.8.5 DEFINITION: The equivalence class containing an element x of a set X, for an equivalence relation
R on X, is the set {y ∈ X; x R y}.


An equivalence class of an equivalence relation R on a set X is the equivalence class for R containing 
some x ∈ X.


9.8.6 NOTATION: [x]_R, for x in a set X with an equivalence relation R on X, denotes the equivalence class
of x with respect to R.


[x] is an abbreviation for [x]_R when the equivalence relation R is implicit in the context.


9.8.7 DEFINITION: The quotient set of a set X with respect to an equivalence relation R is the set of 
equivalence classes of X with respect to R. 


9.8.8 NOTATION: X/R, for an equivalence relation R on a set X, denotes the quotient (set) of X with
respect to R. In other words, X/R = {[x]_R; x ∈ X}.
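
As an informal illustration of Definition 9.8.5 and Notation 9.8.8, the following Python sketch computes equivalence classes and the quotient set for congruence modulo 3 on a small finite set. The function names and the choice of example relation are assumptions made for this sketch only.

```python
def equivalence_class(X, related, x):
    """[x]_R = {y in X; x R y}, as in Definition 9.8.5."""
    return frozenset(y for y in X if related(x, y))

def quotient_set(X, related):
    """X/R = {[x]_R; x in X}, as in Notation 9.8.8."""
    return {equivalence_class(X, related, x) for x in X}

X = range(7)
same_residue = lambda x, y: (x - y) % 3 == 0   # congruence modulo 3
classes = quotient_set(X, same_residue)
```

Because the classes are collected into a set, the seven elements of X yield only three distinct classes.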


9.8.9 REMARK: Identification spaces and quotient sets. 

The set X/R of equivalence classes of a set X with respect to an equivalence relation R is often called the
“identification set” or “identification space” of the relation R on X, particularly in the context of
topological spaces. It is also known as the “decomposition set” or “decomposition space” of the relation.
Another name for the quotient set X/R is the “classification” of X by R. It may also be called simply the
“partition” of X by R.


9.8.10 REMARK: Equivalence kernels.
Equivalence relations on a set X arise very naturally as “equivalence kernels” of arbitrary functions on X.
Since this concept requires the definition of a function, it is delayed until Definition 10.16.7.


10.1. General comments on functions


10.1.1 REMARK: Functions are quintessentially different to relations. 

As mentioned in Remark 9.0.1, although functions are represented as a special case of relations, the meaning 
of functions is not a special case of the meaning of relations. Relations are essentially thought of as maps 
from tuples of objects to the truth values F and T, and maps are functions. So it would seem that functions 
must be defined before relations. But functions are essentially thought of as maps from tuples of objects to 
individual objects, and maps are functions. So it would seem that functions must be defined before functions! 
This circularity of definitions is broken by representing functions as a special case of relations, although their 
meaning is not a special case. 


10.1.2 REMARK: Set-theoretic function predicates versus Cartesian-product functions. 
The word “function” refers to two kinds of mathematical entity:
(i) a rule-based “set-theoretic function” (specified as a “set-theoretic formula” as in Definition 7.2.2);
(ii) a special kind of relation-set (as defined in Definition 9.5.2).

A rule-based (“set-theoretic”) function may be thought of as a procedure or a sequence of operations in the
logic layer which yields a new set from a given set. The domain of a set-theoretic function is not necessarily
a set. As an example, the operation of constructing the union X ∪ {∅} from any given set X is clearly a
well-defined operation on all sets. But the “set of all sets” is not a set. This kind of logic-layer set-theoretic
function is not the subject of Section 10.1.

A reasonable name for the kind of function in part (i) would be a “function-predicate” by analogy with the
“relation-predicate” alluded to in Remark 9.1.1. Then a reasonable name for the kind of function in part (ii)
would be a “function-set” by analogy with the corresponding “relation-set”.


10.1.3 REMARK: The active nature of functions. 

Functions are generally represented in set theory as particular kinds of sets, but they are usually thought of 
as being a separate object class — something like a machine which produces outputs for given inputs. The 
modelling of functions as sets is an economical measure which keeps the number of object classes low. (The 
advantages and disadvantages of such “conceptual economy” are discussed by Halmos [357], pages 24-25. 
Dissatisfaction with the passive definition of a function as a set is discussed by Halmos [357], page 30.) 


10.1.4 REMARK: Function procedures versus function look-up tables. 

It may be that the feeling which mathematicians have that a function is different to a set is due to the 
historical origins of functions. In the olden days, for instance, the square of an integer x was defined by a 
procedure of multiplication of x by itself, which was an active process of generating one number from another. 
But the set definition of the “square function” is more like a look-up table in computing. The set definition is
a set of ordered pairs (x, x²). So to calculate the square of a number with the set definition, you look up the
value in the set of ordered pairs. In a more active definition of functions, you would specify an algorithm or
procedure. It seems to have been necessary historically to abandon functions defined as procedures in favour
of functions defined as look-up tables in order to remove an arbitrary limitation on the set of functions that
one can discuss. The downside of this has been the loss of “active mood” in the definition of functions.
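
The contrast between the two styles can be made concrete in a programming language. The following Python sketch is an informal illustration with hypothetical names. It necessarily restricts the look-up table to a finite range of integers, whereas the set-theoretic square function has no such restriction.

```python
def square_procedure(x):
    """Active definition: compute the value by multiplying x by itself."""
    return x * x

# Passive definition: a finite set of ordered pairs (x, x^2), used as a look-up table.
square_table = {(x, x * x) for x in range(-3, 4)}

def look_up(table, x):
    """Find the unique y with (x, y) in the table."""
    (y,) = [v for (u, v) in table if u == x]
    return y
```

The single-element unpacking in `look_up` fails if the table contains no pair, or more than one pair, for the given argument. This corresponds to the existence and uniqueness requirements for a function-set.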


10.1.5 REMARK: Other terminology for functions. 

The nouns “function” and “map” are used synonymously in this book. In many contexts, the word “map” 
indicates a function between sets which are peers in some sense (such as differentiable manifolds), whereas
“function” is then used to indicate a more light-weight function such as a real-valued function. The noun
“mapping” is synonymous with “map”, and a “family” is really the same thing as a general function except 
that it is thought about differently. All of these synonyms for “function” are useful for putting the focus on 
different aspects of functions. 


10.1.6 REMARK: Historical origin of the word “function”. 

The word function in the mathematical sense was apparently introduced by Leibniz. Bell [233], page 98, says: 
“The word function (or its Latin equivalent) seems to have been introduced into mathematics by Leibniz in 
1694; the concept now dominates much of mathematics and is indispensable in science. Since Leibniz’ time 
the concept has been made precise.” 


In an important 1875 paper, Darboux [176], page 59, felt it necessary to strongly assert that a well-defined 
function f is any correspondence of a unique value f(x) to each value of x, and that each individual value 
f(x₁), f(x₂), ... may be defined in a completely arbitrary manner. This was at a time when it was still
believed that all continuous functions are differentiable because the notion of a function was much narrower 
than is now accepted. Some impression of the gulf separating the concept of a function 130 years ago from 
the present day may be gained from this opening paragraph of a paper by Darboux [176], page 57, where he 
referred to an 1854 treatise by Riemann [195], posthumously published in 1868. 


Jusqu'à l'apparition du Mémoire de Riemann sur les séries trigonométriques aucun doute ne s'était
élevé sur l'existence de la dérivée des fonctions continues. D'excellents, d'illustres géomètres, au
nombre desquels il faut compter Ampère, avaient essayé de donner des démonstrations rigoureuses
de l'existence de la dérivée. Ces tentatives étaient loin sans doute d'être satisfaisantes; mais, je
le répète, aucun doute n'avait été formulé sur l'existence même d'une dérivée pour les fonctions
continues.


This may be translated as follows. 
Until the appearance of Riemann's memoir on trigonometric series, no doubt had been raised about
the existence of the derivative of continuous functions. Excellent, illustrious geometers, amongst
whose number one must count Ampère, had tried to give rigorous demonstrations of the existence
of the derivative. Without doubt, these attempts were far from being satisfactory; but, I repeat, 
no doubt had been formulated about the existence itself of a derivative for continuous functions. 


10.2.1 REMARK: Inclusion of the domain and range in the specification of a function.
Whereas a relation is introduced in Definition 9.5.2 as a set of ordered pairs without any specification of an 
explicit domain or range set, most introductory texts do explicitly define a function in terms of a specified 
domain set and range set. But functions, like relations, do not need the domain or range to be specified in 
advance. The domain can always be determined from the set of ordered pairs in a function. Similarly, any 
set which contains all values of the function may be considered to be a range set for it. 


Therefore Definition 10.2.2 introduces three levels of specification of a function. When neither the domain
nor the range is specified, condition (i) requires only that the function must have a unique value at each
argument if it has a value there at all. When the domain is specified, it is required by condition (ii) to be equal to the
set of values for which the relation does have a value. When both the domain and range sets are specified, 
condition (iii) only requires that the specified range set Y should contain all of the values of the relation f. 
It does not need to equal the set of values of the relation. 


10.2.2 DEFINITION: A function is a relation f such that

(i) ∀x, ∀y₁, ∀y₂, (((x, y₁) ∈ f ∧ (x, y₂) ∈ f) ⇒ y₁ = y₂). [uniqueness]

A function on X, for any set X, is a function f such that

(ii) Dom(f) = X. [existence]

A function from X to Y, for any sets X and Y, is a function f on X such that

(iii) Range(f) ⊆ Y. [range inclusion]


10.2.3 NOTATION: f : X → Y means that f is a function from X to Y.
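
The three levels of specification in Definition 10.2.2 can be tested mechanically for finite relations. In the following Python sketch, which is illustrative only, a relation is a set of ordered pairs, and the names `dom`, `rng`, `is_function`, `is_function_on` and `is_function_from` are hypothetical.

```python
def dom(f):
    return {x for (x, y) in f}

def rng(f):
    return {y for (x, y) in f}

def is_function(f):
    """Condition (i): each argument has at most one value."""
    return all(y1 == y2 for (x1, y1) in f for (x2, y2) in f if x1 == x2)

def is_function_on(f, X):
    """Conditions (i) and (ii): Dom(f) = X."""
    return is_function(f) and dom(f) == set(X)

def is_function_from(f, X, Y):
    """Conditions (i), (ii) and (iii): Range(f) is included in Y."""
    return is_function_on(f, X) and rng(f) <= set(Y)

f = {(1, "a"), (2, "a"), (3, "b")}
g = {(1, "a"), (1, "b")}          # two values at the argument 1
```

Note that the target set in `is_function_from` only needs to include the range. It does not need to equal it.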


10.2.4 THEOREM: Equivalent definition for a function from X to Y. 
Let X and Y be sets. Then f is a function from X to Y if and only if f ⊆ X × Y and

∀x ∈ X, ∃′y ∈ Y, (x, y) ∈ f. (10.2.1)

Hence a relation f from X to Y is a function from X to Y if and only if line (10.2.1) is satisfied.


PROOF: First suppose that f is a function from X to Y. Then f is a set of ordered pairs by Definitions
10.2.2 and 9.5.2. Each pair (x, y) ∈ f has x ∈ Dom(f) = X and y ∈ Range(f) ⊆ Y by Definition 10.2.2 (ii, iii).
So f ⊆ X × Y.
By Definitions 9.5.4 and 10.2.2 (ii), X = Dom(f) = {x; ∃y, (x, y) ∈ f}. Let x ∈ X. Then ∃y, (x, y) ∈ f. But
by Definition 10.2.2 (i), this value of y is unique for a given value of x. Therefore ∀x ∈ X, ∃′y ∈ Y, (x, y) ∈ f,
which verifies line (10.2.1).


Now suppose that f ⊆ X × Y satisfies line (10.2.1). Then f is a set by Remark 9.4.5, and so f is a relation
by Definition 9.5.2. The uniqueness property in Definition 10.2.2 (i) follows directly from line (10.2.1). The
existence property ∀x ∈ X, ∃y ∈ Y, (x, y) ∈ f also follows directly from line (10.2.1). So Dom(f) ⊇ X. But
Dom(f) ⊆ X by Theorem 9.5.11 (i). So Dom(f) = X. This verifies Definition 10.2.2 (ii). The set inclusion
Range(f) ⊆ Y follows from Theorem 9.5.11 (ii), which verifies Definition 10.2.2 (iii).


10.2.5 REMARK: Separation of conditions for functions into existence and uniqueness conditions. 

The existence and uniqueness condition on line (10.2.1) in Theorem 10.2.4 can be expressed in terms of set
cardinality as ∀x ∈ X, #{y ∈ Y; (x, y) ∈ f} = 1. In other words, there is one and only one y for each x ∈ X
such that (x, y) ∈ f. In terms of the set-map f in Definition 10.6.4 (i), the condition for a relation f to be a
function f : X → Y may be written as ∀x ∈ X, #(f({x})) = 1.


The first two conditions in Definition 10.2.2 may be expressed in terms of image set cardinality as follows.


(i′) ∀x ∈ X, #{y; (x, y) ∈ f} ≤ 1. [uniqueness]
(ii′) ∀x ∈ X, #{y; (x, y) ∈ f} ≥ 1. [existence]

If condition (i′) is not required, the relation is a “multiple-valued function”. If condition (ii′) is not required,
the relation is a “partial function”. Neither of these classes of “function” has the huge importance of the 
“well-defined function” in Definition 10.2.2. The reason for the core significance of well-defined functions 
in mathematics is the fact that one may use the word “the” for the value of the function. One may give a 
name and a notation to an object which exists and is unique. A function introduces a unique object into a 
mathematical discussion for each element of the domain. A very large proportion of definitions and notations 
depend for their well-definition on the existence and uniqueness of function values. This point cannot be 
over-emphasised. Existence and uniqueness proofs are two of the core preoccupations of the mathematician, 
especially the pure mathematician. When one says that something is well defined, one generally means that 
it exists and is unique. The two conditions in the definition of a function are therefore constantly employed 
in all of mathematics. 
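
The separation into a uniqueness condition and an existence condition can be illustrated for finite relations by counting values at each argument. The following Python sketch is informal. The names `image_count` and `classify`, and the labels which are returned, are assumptions for this illustration.

```python
def image_count(f, x):
    """#{y; (x, y) in f} for a relation f given as a set of pairs."""
    return len({y for (u, y) in f if u == x})

def classify(f, X):
    """Classify a relation f on the finite set X by its value counts."""
    counts = [image_count(f, x) for x in X]
    unique = all(n <= 1 for n in counts)   # uniqueness: at most one value
    exists = all(n >= 1 for n in counts)   # existence: at least one value
    if unique and exists:
        return "function"
    if unique:
        return "partial function"
    if exists:
        return "multiple-valued function"
    return "neither"
```

For example, `classify({(1, 2)}, {1, 2})` reports a partial function because the argument 2 has no value.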


10.2.6 REMARK: Domain, range and image of functions are inherited from relations. 
The domain, range and image of a function are defined exactly as for relations in Definition 9.5.4. The 
notations Dom(f), Range(f) and Img(f) are defined for functions exactly as for relations in Notation 9.5.6.


Likewise, the definition and notation for graph(f), the graph of a function f, are inherited from general
relations, namely from Definition 9.5.23 and Notation 9.5.24. (Since graph(f) = f for any function f, this
definition and notation are, of course, totally redundant, as discussed in Remarks 9.5.21 and 9.5.22.) 


10.2.7 REMARK: Ambiguity in the meaning of the range of a function. 

The “range” of a function f from X to Y is sometimes defined to be the set Y rather than the set
{y ∈ Y; ∃x ∈ X, (x, y) ∈ f} ⊆ Y, which is not generally the same set. The term “image”, however, is always
the set {y ∈ Y; ∃x ∈ X, (x, y) ∈ f} of values of f. In this book, both the range and image of a function
are understood to be the set of values {y ∈ Y; ∃x ∈ X, (x, y) ∈ f} of the function f as in Definition 9.5.4.
The term “image” should be preferred because it is less ambiguous. A practical difficulty with the word
“image” is the fact that the shorter abbreviation “Im” clashes with the abbreviation for the imaginary part
of a complex number. It is therefore preferable to use “Img” as an abbreviation for “Image”.


For maximum clarity one should use the term “target set” for the set Y in the phrase “function from X
to Y”, and “image” for the set of values of f. (This is discussed for general relations in Remark 9.5.21.)


10.2.8 DEFINITION: An argument of a function f is any element of the domain of f. 


A value of a function f is any element of the range of f. 


The value of a function f for an argument x of f is the value y of f such that (x, y) ∈ f.


10.2.9 NOTATION: f(x) denotes the value of a function f for an argument x ∈ Dom(f).


f_x denotes the value of a function f for an argument x ∈ Dom(f).


10.2.10 NOTATION: (f_i)_{i∈X} is an alternative notation for a function f with domain X.


10.2.11 THEOREM: A function is equal to its set of argument/value ordered pairs. 
Let f be a function. 


(i) ∀x, ∀y, (x, y) ∈ f ⇔ (x ∈ Dom(f) and y = f(x)).
(ii) f = {(x, f(x)); x ∈ Dom(f)}.


PROOF: For part (i), let (x, y) ∈ f. Then x ∈ Dom(f) by Definition 9.5.4, and y = f(x) by Definition 10.2.8
and Notation 10.2.9. Now suppose that x ∈ Dom(f) and y = f(x). Then (x, y) ∈ f by Definition 10.2.8 and
Notation 10.2.9. Hence (x, y) ∈ f ⇔ (x ∈ Dom(f) and y = f(x)).

For part (ii), let f be a function. Let z ∈ f. Then z = (x, y) for some x and y by Definition 9.5.2 because
f is a relation by Definition 10.2.2. So x ∈ Dom(f) and y = f(x) by part (i). Thus z = (x, f(x)) for some
x ∈ Dom(f). Therefore z ∈ {(x, f(x)); x ∈ Dom(f)}. Consequently f ⊆ {(x, f(x)); x ∈ Dom(f)}.

Now suppose that z ∈ {(x, f(x)); x ∈ Dom(f)}. Then z = (x, f(x)) for some x ∈ Dom(f) by Notation 7.7.18.
In other words, z = (x, y) and y = f(x) for some x ∈ Dom(f). Therefore z ∈ f by part (i). Consequently
f ⊇ {(x, f(x)); x ∈ Dom(f)}. Hence f = {(x, f(x)); x ∈ Dom(f)}.


10.2.12 REMARK: Functions are the same if and only if they agree at all arguments.
Theorem 10.2.13 is fairly obvious, but is sometimes useful in applications, to remove any lingering doubts. 
Theorem 10.2.14 similarly follows from the uniqueness of function values. 


10.2.13 THEOREM: Argument-by-argument test for equality of functions on a given domain. 
Let f and g be functions on X. Then f = g if and only if ∀x ∈ X, (f(x) = g(x)).


PROOF: Let f = g. Let x ∈ X. Let y = f(x) and z = g(x). Then (x, y) ∈ f and (x, z) ∈ g = f by
Definition 10.2.8. So y = z by Definition 10.2.2 (i). Thus f(x) = g(x). Hence ∀x ∈ X, (f(x) = g(x)).

Now assume ∀x ∈ X, (f(x) = g(x)). Let p ∈ f. Then p = (a, b) for some a ∈ X by Definitions 10.2.2 (ii)
and 9.5.2. So b is the value of f for argument a by Definition 10.2.8. Therefore b = f(a) by Notation 10.2.9.
So b = g(a) by the assumption. So (a, b) ∈ g by Definition 10.2.8 and Notation 10.2.9. Consequently f ⊆ g.
Similarly g ⊆ f. Hence f = g by Theorem 7.3.5 (iii).


10.2.14 THEOREM: Inclusion test for equality of functions on a given domain. 
Let f and g be functions on X with f ⊆ g. Then f = g.


PROOF: Let f ⊆ g. Suppose that q ∈ g \ f. Then q = (x, z) for some x ∈ X and some z. Since f is a
function on X, there is a pair p = (x, y) ∈ f for some y. But f ⊆ g then implies that p ∈ g. Thus (x, y) ∈ g and
(x, z) ∈ g. So y = z by Definition 10.2.2 (i). Therefore q = (x, z) ∈ f, which contradicts the assumption.
Thus g \ f = ∅. Hence f = g by Theorem 8.2.5 (v).
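
Theorems 10.2.13 and 10.2.14 can be spot-checked on small examples. In the following Python sketch, which is illustrative only, functions are sets of ordered pairs, and the helper names `value` and `equal_by_arguments` are hypothetical.

```python
def value(f, x):
    """f(x): the unique y with (x, y) in f."""
    (y,) = {v for (u, v) in f if u == x}
    return y

def equal_by_arguments(f, g, X):
    """Theorem 10.2.13: compare f(x) and g(x) for every x in X."""
    return all(value(f, x) == value(g, x) for x in X)

X = {0, 1, 2}
f = {(x, x * x) for x in X}
g = {(x, x ** 2) for x in X}
h = {(x, x) for x in X}

# Theorem 10.2.14: among functions on the same domain, inclusion forces equality.
inclusion_implies_equality = all(
    a == b for a in (f, g, h) for b in (f, g, h) if a <= b)
```

The functions f and h agree at 0 and 1 but differ at 2, so neither is included in the other.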


10.2.15 REMARK: Notation for the set of functions with a given domain and range. 
The function set Y^X is the set of functions f such that Dom(f) = X and Range(f) ⊆ Y. By Theorem 10.2.4,
the function set Y^X may be expressed more formally as on line (10.2.2).


10.2.16 DEFINITION: The function set from X to Y, for sets X and Y, is the set of functions from X to Y. 


10.2.17 NOTATION: Y^X, for sets X and Y, denotes the function set from X to Y. In other words,


Y^X = {f ∈ P(X × Y); ∀x ∈ X, ∃′y ∈ Y, (x, y) ∈ f}. (10.2.2)


10.2.18 NOTATION: The template {f : X → Y; P(f)} denotes the set {f ∈ Y^X; P(f)} for any sets X and
Y and any set-theoretic formula P.
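
For finite sets, the function set and the template notation can be enumerated directly. The following Python sketch is an informal illustration. The names `function_set` and `restricted` are hypothetical, and each function is represented as a frozenset of ordered pairs.

```python
from itertools import product

def function_set(X, Y):
    """Y^X: all functions from X to Y, each as a frozenset of (x, y) pairs."""
    X, Y = list(X), list(Y)
    return {frozenset(zip(X, values)) for values in product(Y, repeat=len(X))}

def restricted(X, Y, P):
    """The template {f : X -> Y; P(f)}."""
    return {f for f in function_set(X, Y) if P(f)}

X, Y = {1, 2, 3}, {"a", "b"}
all_functions = function_set(X, Y)
is_injective = lambda f: len({y for (_, y) in f}) == len(f)
injections = restricted(X, Y, is_injective)
```

The number of functions found is the cardinality of Y raised to the power of the cardinality of X, which is the motivation for the exponential notation.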


10.2.19 REMARK: Template required in an expression with a variable predicate. 
Notation 10.2.18 is called a template because it contains an arbitrary set-theoretic formula. 


10.2.20 REMARK: Alternative notations for the set of functions with a given domain and range. 

If T denotes the always-true set-theoretic formula with zero arguments (as in Notation 5.1.10), then the set
Y^X may be written in terms of Notation 10.2.18 as {f : X → Y; T}. But then the function name f does
not appear in the expression P(f) = T. So it seems superfluous to write the letter f at all. (One could make
use here of the single-parameter always-true logical predicate which is alluded to in Remark 5.1.9.) The set
Y^X is sometimes written as {f : X → Y}, which also contains the superfluous function name f.


In this book, the notation “X → Y” is proposed as an equivalent for Y^X. (See Notation 10.19.2.) This
notation has the advantage that it avoids the superfluous function name, but it is slightly non-standard.
One place where the author has found this sort of notation is in a computer software user manual for the 
Isabelle/HOL “proof assistant for higher-order logic” [374], page 5. 


10.2.21 DEFINITION: The empty function is the set ∅.


10.2.22 REMARK: Functions with empty domain or range. 

If f is the empty function f = ∅, then Dom(f) = ∅ and Range(f) = ∅. It follows that f is a function from X
to Y if and only if X = ∅. In other words, the target set is arbitrary. This observation is proved in a little
more detail in Theorem 10.2.23 (i). (The proof is nevertheless still only a sketch.) 


10.2.23 THEOREM: A function is empty if and only if its domain is empty.
Let X and Y be sets, and let f : X → Y be a function.


(i) f = ∅ if and only if X = ∅.
(ii) If X ≠ ∅, then f ≠ ∅ and Y ≠ ∅.
(iii) If Y = ∅, then X = ∅ and f = ∅.


PROOF: For part (i), let X and Y be sets, and let f : X → Y be a function. Suppose that f = ∅.
Then Dom(f) = {x; ∃y, (x, y) ∈ f} = {x; ∃y, (x, y) ∈ ∅}. But ∀z, z ∉ ∅ by Definition 7.6.3. Therefore
∀x, ∀y, (x, y) ∉ ∅. So ∀x, ¬(∃y, (x, y) ∈ ∅) by Theorem 6.6.7 (v). Therefore ∀x′, x′ ∉ {x; ∃y, (x, y) ∈ ∅}. So
{x; ∃y, (x, y) ∈ ∅} = ∅ by Definition 7.6.3. Hence X = ∅.


Now suppose that X = ∅. Then Dom(f) = ∅. That is, {x; ∃y, (x, y) ∈ f} = ∅. So ∀x, ¬(∃y, (x, y) ∈ f).
Therefore ∀x, ∀y, (x, y) ∉ f by Theorem 6.6.10 (v). But all elements of a function are ordered pairs. So
∀z, z ∉ f. Hence f = ∅ by Definition 7.6.3.


For part (ii), suppose that X ≠ ∅. Then x ∈ X for some x. So (x, y) ∈ f for some y by Definitions 9.5.4
and 10.2.2. Then y ∈ Y by Definition 10.2.2 (iii). Hence f ≠ ∅ and Y ≠ ∅.


For part (iii), let Y = ∅. Then X = ∅ by part (ii). So f = ∅ by part (i). Hence X = ∅ and f = ∅.


10.2.24 REMARK: Zero to the power zero. 

One of the riddles often posed in elementary mathematics is: “What is zero to the power zero?” Since 0^x = 0
for general x > 0, and x^0 = 1 for general x > 0, the answers 0 and 1 both seem like reasonable guesses.
Whichever choice one makes will break one rule or the other (or both). Using some analysis, one may
resolve the issue by observing that lim_{x→0+} x^x = lim_{x→0+} e^{x ln x} = exp(lim_{x→0+} x ln x) = 1. Coincidentally,
Theorem 10.2.25 gives 1 as the cardinality of the set ∅^∅. This coincidence seems to justify the answer 0^0 = 1.
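
The limit in this remark can be observed numerically. The following Python sketch is only an informal check, not a proof. It evaluates exp(x ln x) at a few sample points approaching zero from above, and also records the conventions which Python itself adopts for zero to the power zero.

```python
import math

# x^x = exp(x ln x) for x > 0; evaluate it at points approaching 0 from above.
samples = [10.0 ** (-k) for k in range(1, 9)]
values = [math.exp(x * math.log(x)) for x in samples]

# Python's own conventions for integer and floating-point zero.
integer_power = 0 ** 0
float_power = math.pow(0.0, 0.0)
```

The computed values increase towards 1 as x decreases, and both of the built-in conventions agree with the answer 1.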


10.2.25 THEOREM: There is one and only one function from the empty set to the empty set. 


∅^∅ = {∅}.


PROOF: By Theorem 10.2.23 (i), if f = ∅ then f : ∅ → ∅. In other words, ∅ ∈ ∅^∅. To show that this is the
only element of ∅^∅, let f ∈ ∅^∅. Then f ⊆ ∅ × ∅ = ∅ by Definitions 10.2.2 and 9.5.2. So f = ∅. So ∅^∅ contains
one and only one element, namely the empty set ∅.


10.2.26 THEOREM: Some elementary properties of function-set boundary cases. 
(i) ∅^X = ∅ for any non-empty set X.

(ii) Y^∅ = {∅} for any set Y.

(iii) Y^{x} = {{(x, y)}; y ∈ Y} for any object x and set Y.


PROOF: To show part (i), let f ∈ ∅^X. Then f ⊆ X × ∅ = ∅. Hence f = ∅. But
a function must contain at least one ordered pair for each element of the source set, which in this case is
non-empty. Therefore there are no such functions. So ∅^X = ∅.

To show part (ii), let f ∈ Y^∅. Then f ⊆ ∅ × Y = ∅. So f = ∅. By Remark 10.2.22, ∅ ∈ Y^∅. So Y^∅ contains
one and only one element, namely the empty set ∅.

For part (iii), let f ∈ Y^{x} for some set Y and object x. Then f ⊆ {x} × Y = {(x, y); y ∈ Y}. Therefore
∃y ∈ Y, f = {(x, y)} because y is unique for each x. Conversely, if f = {(x, y)} for some y ∈ Y, then
f ∈ Y^{x}. So f ∈ Y^{x} if and only if f = {(x, y)} for some y ∈ Y. Hence Y^{x} = {{(x, y)}; y ∈ Y}.
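
The boundary cases in Theorems 10.2.25 and 10.2.26 can be confirmed for small finite sets by applying the criterion on line (10.2.2) literally. The following Python sketch is illustrative only. It filters every subset of the Cartesian product, and the name `function_set` is hypothetical.

```python
from itertools import chain, combinations, product

def function_set(X, Y):
    """Line (10.2.2): subsets f of X x Y with exactly one y for each x in X."""
    pairs = list(product(X, Y))
    subsets = chain.from_iterable(
        combinations(pairs, n) for n in range(len(pairs) + 1))
    return {frozenset(f) for f in subsets
            if all(sum(1 for (u, _) in f if u == x) == 1 for x in X)}

empty = frozenset()   # the empty function
```

When X is empty, the condition over all x in X is vacuously true, so the empty set of pairs always qualifies. When X is non-empty and Y is empty, no subset can satisfy the existence requirement.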


10.2.27 DEFINITION: The identity function on a set X is the function f : X → X with ∀x ∈ X, f(x) = x.


10.2.28 REMARK: The identity function on a set. 

The identity function on any set X is the set {(x, x); x ∈ X}. This replicates Notation 9.6.10 for identity
relations, and is consistent with it. They denote the same ZF set in each case, but the metamathematical
meaning is different.


10.2.29 NOTATION: id_X denotes the identity function on a set X.


10.2.30 REMARK: The identity function is a function template.
Since the identity function id_X is parametrised by a set which is defined in the context where it is used, it
is really a kind of “meta-function” or “function template”. 


10.2.31 REMARK: Functions of several variables. 

Mathematics texts often talk about a “function of several variables”. There is no specific definition of this 
concept as a kind of set-construction in ZF set theory. Presumably one could define such set-constructions 
by generalising ordered pairs to ordered tuples, but usually the single domain set is simply replaced by a 
Cartesian set-product of some kind. (By contrast, two-parameter functions are explicitly defined in first- 
order languages, where the set membership predicate, for example, is a function with two set-arguments, not 
a function with a single set-pair as an argument. But the domain of the set-membership predicate is not a 
ZF set, nor is it a pair of ZF sets.) 


A “function of two variables”, where the variables are in sets X₁ and X₂, and the range is a set Y, may
be formalised as a function with domain X = X₁ × X₂ and range Y. Thus there is no difference between
a function with a single variable x = (x₁, x₂) ∈ X = X₁ × X₂ and a function of two variables x₁ ∈ X₁
and x₂ ∈ X₂. The general definition of functions of more than two variables requires the construction of
Cartesian products of finite families of sets. (See Section 10.11.) 


10.2.32 DEFINITION: A function of two variables is a function whose domain is a Cartesian product of two 
sets. In other words, it is a function f with Dom(f) = X x Y for some sets X and Y. 
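
Definition 10.2.32 corresponds closely to how a two-argument function can be stored as data. In the following Python sketch, which is an informal illustration with hypothetical names, addition on two small sets is represented as a set of pairs whose first components are themselves ordered pairs.

```python
from itertools import product

X1, X2 = {0, 1, 2}, {0, 1}
domain = set(product(X1, X2))          # the Cartesian product X1 x X2

# A function of two variables as a set of pairs ((x1, x2), y).
add = {((x1, x2), x1 + x2) for (x1, x2) in domain}

def value(f, *args):
    """f(x1, x2) is read as f((x1, x2)): the argument is one ordered pair."""
    (y,) = {v for (u, v) in f if u == args}
    return y
```

The call `value(add, 2, 1)` packs its two arguments into the single pair (2, 1) before the look-up, which makes the identification of two variables with one pair-valued variable explicit.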


10.3. Choice functions 


10.3.1 REMARK: The importance of choice functions in ZF set theory without the axiom of choice. 
Choice functions play an important role in most areas of mathematics which are based on ZF set theory 
(with or without an axiom of choice). If an axiom of choice is added to the list of axioms, then choice 
functions are guaranteed to exist, although nothing further may be known about them. If no axiom of choice 
is adopted, any choice functions which are required must be constructed on the basis of ZF set theory alone. 
So it could be argued that choice functions are of much greater importance when the axiom of choice is not 
assumed because in its absence, the existence of all choice functions must be demonstrated the hard way, 
using logical deduction from constructive ZF axioms. Identifying where choice functions are required, and 
how to construct them without AC, is one of the principal themes of this book. Axioms of choice are not 
invoked in this book except to show what is “lost” by not giving in to temptation. 


It is sometimes argued that one obtains more theorems with the axiom of choice than without it. But 
it can be even more strongly argued that without the axiom of choice, one must make up the deficit by 
constructing all required choice functions explicitly rather than relying on an “imaginary parachute”, and
these explicit constructions all require definitions and theorems to back them up. So one should really expect 
more theorems without the axiom of choice. Measure theory and topology, for example, take on a different 
character when all constructions must be explicitly demonstrated to exist. 


Choice functions are frequently required as inputs to theorems or definitions. In many situations, their 
existence can be guaranteed in plain ZF set theory with moderate or little effort. If the required range of 
choice functions cannot be proved to exist (without AC), it is usually a simple matter to put preconditions 
on theorems or definitions to require choice functions to be provided as a prerequisite to make the theorems 
or definitions valid. 


Adding a condition to a theorem that a suitable kind of choice function must exist for the theorem to be 
valid is very similar to requiring a number to be non-zero before dividing another number by it. This is 
requiring the multiplicative inverse of the number to exist. But choice functions are the same (or essentially 
the same) as right inverses of given surjective functions. (See Remark 7.11.12 (1) and Theorem 10.5.17 (i) for 
this equivalence. See Definition 10.5.2 for surjective functions.) So an axiom of choice is always guaranteeing 
that a right inverse of a function exists. If mathematicians can be disciplined enough to require a number 
to be non-zero to make a theorem valid, then surely requiring a function to have a right inverse is not much 
more onerous. 
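
For finite sets, the correspondence between choice functions and right inverses can be made explicit. The following Python sketch is illustrative only. Functions are represented as dictionaries here for brevity, the names `fibres` and `right_inverse` are hypothetical, and the built-in `min` serves as an explicitly constructed choice function, which is available because the example domain consists of integers.

```python
def fibres(f):
    """Map each value y to the non-empty set {x; f(x) = y}."""
    result = {}
    for x, y in f.items():
        result.setdefault(y, set()).add(x)
    return result

def right_inverse(f, choose):
    """Build g with f(g(y)) = y by choosing one element from each fibre."""
    return {y: choose(A) for y, A in fibres(f).items()}

f = {1: "a", 2: "b", 3: "a", 4: "b", 5: "c"}   # a surjection onto {"a", "b", "c"}
g = right_inverse(f, min)                       # min is an explicit choice function here
```

Any other rule which selects an element of each fibre would yield a different right inverse. The existence of some such rule is what must be supplied, either by construction or by an axiom.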


A very large proportion of all the hard problems in mathematics, which require so much time, effort, ingenuity 
and luck to solve, sometimes for a century or more, are inverse problems. Solving any algebraic, differential 
or integral equation, for example, is solving an inverse problem. It could therefore be perceived as extremely
lazy for someone to say that a two-line axiom gives them an inverse to any surjective function without 
making any effort at all, whereas so many mathematicians devote their whole lives and creativity to finding 
inverses. By renouncing all axioms of choice, mathematicians would be forced to construct inverses instead 
of merely fantasising them into existence. Making this effort sometimes reveals features of the inverses and 
their construction which had not previously been noticed, and which may lead to further developments. 


10.3.2 REMARK: The advantages of choice functions relative to choice sets. 

As mentioned in Remark 7.11.11, the most natural way of formulating the axiom of choice is in terms of 
choice functions rather than the low-level choice sets which are used for Definition 7.11.10 (9). Choice sets 
are limited to making choices from pairwise disjoint set-collections, as discussed in Remark 7.11.7. Using
functions, it is possible to keep track of which choice goes with which set in an arbitrary set-collection.
The choice-function and choice-set formulations are equivalent, but it is the choice-function version which
is generally the most applicable and most comprehensible. The first application of the axiom of choice is to 
Theorem 10.5.17 (i), which makes much more sense in terms of choice functions than choice sets. 


10.3.3 REMARK: Styles of choice sets and choice functions. 
The styles of choice sets and choice functions include the following. 


(1) Set-collection choice set. Given a pairwise disjoint set of non-empty sets X:
Choose a set C such that ∀A ∈ X, ∃′z ∈ C, z ∈ A. (See Definition 7.11.4.)


(2) Set-collection choice function. Given a set of non-empty sets X:
Choose a function f : X → ⋃X such that ∀A ∈ X, f(A) ∈ A. (See Definition 10.3.4.)


(3) Power-set choice function. Given a set X:
Choose a function f : P(X) \ {∅} → X such that ∀A ∈ P(X) \ {∅}, f(A) ∈ A. (See Definition 10.3.5.)


(4) Set-family choice function. (“Multiplicative axiom”.) Given a family of non-empty sets (X_α)_{α∈I}:
Choose a function f : I → ⋃_{α∈I} X_α such that ∀α ∈ I, f(α) ∈ X_α. In other words, choose f ∈ ×_{α∈I} X_α.
(See Definition 10.11.8.)


The axiom of choice in Definition 7.11.10 (9) is the hypothesis that every eligible set-collection (i.e. pairwise
disjoint set of non-empty sets) in ZF set theory has a set-collection choice set. Style (4), which requires
the definition of families of sets in Section 10.8, is equivalent to asserting that the Cartesian set-product
×_{α∈I} X_α is non-empty. (Cartesian set-products of set-families are introduced in Section 10.11.)


The AC version which asserts that every Cartesian product of non-empty sets is non-empty, often called 
the “multiplicative axiom”, is sometimes given as the standard AC definition, possibly because it is so 
intuitively appealing. Both the power-set and set-family choice function formulations were given in 1904 by 
Zermelo [443]. (The choice-set version of AC in Definition 7.11.10 (9) is called the “multiplicative axiom” by 
E. Mendelson [370], page 198, but this disagrees with the terminology of most authors.)


10.3.4 DEFINITION: A set-collection choice function for a set of non-empty sets S is a function f : S → ⋃S
which satisfies ∀A ∈ S, f(A) ∈ A.


10.3.5 DEFINITION: A power-set choice function for a set X is a function f : P(X) \ {∅} → X which
satisfies ∀A ∈ P(X) \ {∅}, f(A) ∈ A.


10.3.6 REMARK: Power-set choice functions. 

Definition 10.3.5 is the special case of Definition 10.3.4 with S = ℙ(X) \ {∅}. (Definition 10.3.5 is illustrated
in Figure 10.3.1.) Therefore for any set X, any set-collection choice function for ℙ(X) \ {∅} is a power-set
choice function for X.

Conversely, let S be a set of non-empty sets, and let X = ⋃S. Then S ⊆ ℙ(X) by Theorem 8.5.2 (viii).
Let f be a power-set choice function for X, and let g = {(x, y) ∈ f; x ∈ S}. (That is, g is the restriction of
f to S as in Definition 10.4.3.) Then g is a function with domain S, and g(A) ∈ A for all A ∈ S. So g is a
set-collection choice function for S.
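
The two directions of this remark can be tried out on finite sets, where no axiom is needed. The following is only a sketch: the rule "take the minimum" and the names nonempty_subsets and power_set_choice are illustrative assumptions, not part of the text.

```python
from itertools import combinations

def nonempty_subsets(X):
    """All non-empty subsets of a finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def power_set_choice(X):
    """A power-set choice function for X, stored as a dict A -> f(A).
    Taking min(A) is an arbitrary rule chosen for illustration."""
    return {A: min(A) for A in nonempty_subsets(X)}

# A set S of non-empty sets, and X = union of S.
S = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3})}
X = frozenset().union(*S)

f = power_set_choice(X)
# g is the restriction of f to S, so g is a set-collection choice function for S.
g = {A: y for A, y in f.items() if A in S}
```

Here g has domain S and satisfies g(A) ∈ A for every A ∈ S, which is the "conversely" half of the remark in miniature.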

Figure 10.3.1 Choice of one element for every subset A_i of a given set X


10.3.7 REMARK: Axiom of choice for power-set choice functions.
An axiom of choice may be stated in terms of power-set choice functions as: 


∀X, ∃f : ℙ(X) \ {∅} → X, ∀A ∈ ℙ(X) \ {∅}, f(A) ∈ A.


This may be written in terms of lower-level logic as follows.


∀X, ∃f, (∀x, y, z, (((x, y) ∈ f ∧ (x, z) ∈ f) ⇒ (y = z)))
    ∧ (∀A, ((A ⊆ X ∧ A ≠ ∅) ⇒ f(A) ∈ A)).
To be suitable for use as an axiom in Section 7.11, this would have to be expanded to a much lower-level
set-theoretic formula. For this, an ordered pair (x, y) must be written as {{x}, {x, y}}. The condition
“(x, y) ∈ f” may be written as:


∃p, (p ∈ f ∧ (∀a, (a ∈ p ⇔ ((∀b, (b ∈ a ⇔ b = x)) ∨ (∀c, (c ∈ a ⇔ (c = x ∨ c = y))))))).


This expansion must be performed for the expression “(x, z) ∈ f” also. But this is not the end of the story.
The expression “A ⊆ X” may be written as “∀a ∈ A, a ∈ X”, which is simple enough. If f is a function,
then “f(A) ∈ A” may be written as:


∃p, (p ∈ f ∧ ((∀a ∈ p, A ∈ a) ∧ (∃a ∈ p, ∃b ∈ a, b ∈ A))).


But there's more. To ensure that f really is a function, all elements of f must also be required to be ordered
pairs. The proposition “∀p ∈ f, ∃x, ∃y, p = (x, y)” may be written as:


∀p ∈ f, ∃x, ∃y, ((∃a ∈ p, x ∈ a) ∧ (∃a ∈ p, y ∈ a) ∧ (∀a ∈ p, x ∈ a) ∧ (∀a ∈ p, ∀b ∈ a, (b = x ∨ b = y))).


Alternatively, this may be written as:


∀p ∈ f, ∃x, ∃y, ∀a, (a ∈ p ⇔ ((∀b, (b ∈ a ⇔ b = x)) ∨ (∀c, (c ∈ a ⇔ (c = x ∨ c = y))))).


It is fairly clear that the axiom of choice for power-set choice functions cannot be written on a single line as 
a set-theoretic formula. (It seems to require about five lines.) This is ample reason to use the set-collection 
choice sets for Definition 7.11.10, where the axiom requires only two lines as a set-theoretic formula. It is 
difficult to believe that a five-line set-theoretic formula could be a fundamental property of sets. 
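
The ordered-pair encoding {{x}, {x, y}} used in these expansions can be made concrete with nested frozensets. This is a sketch for finite test data only; the helper names kpair, encodes_pair and is_ordered_pair are hypothetical, and the quantifiers ∃x, ∃y are replaced by a search over a finite universe supplied by the caller.

```python
def kpair(x, y):
    """Ordered pair (x, y) encoded as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def encodes_pair(p, x, y):
    """Checks: for all a, a ∈ p ⇔ (a = {x} or a = {x, y})."""
    return p == {frozenset({x}), frozenset({x, y})}

def is_ordered_pair(p, universe):
    """Checks ∃x, ∃y, p = (x, y), with x and y ranging over a finite universe."""
    return any(encodes_pair(p, x, y) for x in universe for y in universe)
```

For example, kpair(1, 2) and kpair(2, 1) are different sets, while kpair(1, 1) collapses to {{1}}, which is why the formulas above must allow the two members of p to coincide.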


10.3.8 REMARK: The axiom of choice for a family of sets. 

Theorem 10.3.9 effectively expresses the axiom of choice in Definition 7.11.10 line (7.11.1) in terms of a 
family of sets instead of a set of disjoint sets. The set-valued function S in the statement of Theorem 10.3.9 
is effectively a family of non-empty subsets of a fixed set Y. (The ZF replacement axiom in Section 7.7 
implies that the range of any family of sets is a set. So the requirement S(x) C Y for some fixed Y does not 
place any real restriction on S.) The pairwise disjointness pre-condition for Definition 7.11.10 is achieved by 
tagging the elements y of each set S(x) with x so that the products {x} x S(x) are pairwise disjoint. 


10.3.9 THEOREM [ZF+AC]: Functional form of the axiom of choice. 
Let X and Y be sets. Let S : X → ℙ(Y) \ {∅} be a function. Then ∃g : X → Y, ∀x ∈ X, g(x) ∈ S(x).

PROOF: Let Z = {{x} × S(x); x ∈ X}. Then ∅ ∉ Z
because S : X → ℙ(Y) \ {∅} is a function, and so {x} × S(x) ≠ ∅ for all x ∈ X because S(x) ≠ ∅.


Let A₁, A₂ ∈ Z. Then A₁ = {x₁} × S(x₁) and A₂ = {x₂} × S(x₂) for some x₁, x₂ ∈ X. If A₁ ∩ A₂ ≠ ∅,
then x₁ = x₂, which implies A₁ = A₂. Thus Z satisfies the conditions for Definition 7.11.10 line (7.11.1).
Therefore by the axiom of choice, Definition 7.11.10 (9), there is a set C such that ∀A ∈ Z, ∃′p ∈ C, p ∈ A.


Let g = C ∩ (X × Y). Let x ∈ X. Let A = {x} × S(x). Then A ∈ Z. Therefore ∃′p ∈ C, p ∈ A. In other
words, there is a unique p ∈ C such that p ∈ A. Such p ∈ C satisfies p = (x, y) for some y ∈ S(x). But
S(x) ⊆ Y because S(x) ∈ ℙ(Y) \ {∅}. So y ∈ Y. Therefore p ∈ g. Thus ∀x ∈ X, ∃y ∈ Y, (x, y) ∈ g. But y is
unique for a given x ∈ X because p ∈ C is unique. So ∀x ∈ X, ∃′y ∈ Y, (x, y) ∈ g. Thus g is a well-defined
function from X to Y by Definition 10.2.2 (iii). Hence ∃g : X → Y, ∀x ∈ X, g(x) ∈ S(x).
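
The tagging construction in this proof can be followed step by step on a small finite example. This is a sketch only: for finite Z a choice set can be written down explicitly, so no axiom is involved, and the rule min used to build C is an arbitrary assumption.

```python
X = {"a", "b"}
Y = {1, 2, 3}
S = {"a": {1, 2}, "b": {2, 3}}   # S : X -> P(Y) \ {∅}, with overlapping values

# Z = {{x} × S(x); x ∈ X}. Tagging with x makes the members pairwise disjoint,
# even though S("a") and S("b") share the element 2.
Z = [frozenset((x, y) for y in S[x]) for x in X]

# A choice set C containing exactly one element of each member of Z.
C = {min(A) for A in Z}

# g = C ∩ (X × Y), read as a function x -> y.
g = {x: y for (x, y) in C if x in X and y in Y}
```

The resulting g satisfies g(x) ∈ S(x) for each x ∈ X, as in the statement of the theorem.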


10.4. Restrictions, extensions and composition of functions 


10.4.1 REMARK: Restrictions of function domains. 

The function domain restrictions in Definition 10.4.3 and Notation 10.4.4 are very frequently required in
differential geometry because manifolds and fibre bundles are defined in terms of atlases, which implies that 
most concepts must be defined locally via charts. Therefore global functions and maps must very frequently 
be restricted to chart domains for definitions and theorems. 


Domain and range restrictions for general relations are not so often encountered in differential geometry. 
These are given in the corresponding Definition 9.6.21 and Notation 9.6.22. 


10.4.2 DEFINITION: A restriction of a function f is any function g which satisfies g C f. 


10.4.3 DEFINITION: The restriction of a function f to a set A is the function {(x, y) ∈ f; x ∈ A}.


10.4.4 NOTATION: Function domain restrictions. 
f|_A, for a function f and set A, denotes the restriction of f to A.


10.4.5 THEOREM: Elementary properties of function restrictions. 
Let f be a function. Let A be a set. Then 


f|_A = {(x, y) ∈ f; x ∈ A}
     = {(x, f(x)); x ∈ A},                      (10.4.1)
Dom(f|_A) = A ∩ Dom(f)                          (10.4.2)
          = {x ∈ A; ∃y, (x, y) ∈ f},
Range(f|_A) = {f(x); x ∈ A ∩ Dom(f)}
            = {y; ∃x ∈ A, (x, y) ∈ f}.          (10.4.3)


PROOF: The assertions follow directly from Definition 10.4.3 and Notation 10.4.4.
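
With a function stored as a finite set of ordered pairs, the restriction in Definition 10.4.3 and the domain and range formulas above can be checked directly. The helper names restrict, dom and rng are illustrative assumptions.

```python
def restrict(f, A):
    """Restriction of f to A: {(x, y) ∈ f; x ∈ A}. A need not lie inside Dom(f)."""
    return {(x, y) for (x, y) in f if x in A}

def dom(f):
    """Domain of a set of ordered pairs."""
    return {x for (x, _) in f}

def rng(f):
    """Range of a set of ordered pairs."""
    return {y for (_, y) in f}

f = {(1, "p"), (2, "q"), (3, "p")}
A = {2, 3, 4}            # 4 is not in Dom(f)
fA = restrict(f, A)

# The same restriction written as an intersection with A × Range(f).
fA_as_intersection = f & {(x, y) for x in A for y in rng(f)}
```

The element 4 of A simply contributes nothing, so Dom(f|_A) is A ∩ Dom(f) rather than A.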


10.4.6 REMARK: Mild clash of terminology between function restrictions and relation restrictions. 

The restriction of a function f to a set A in Definition 10.4.3 is the same as the domain restriction of the 
relation f which is given in Definition 9.6.21. Thus there is a mild clash of terminology here. The "restriction 
of a relation to a set" in Definition 9.6.21 restricts both the domain and range to the same set. However, 
the relation restriction definition specifies that the source and target sets of the relation are the same. (This 
is in itself a source of ambiguity, since source and target sets are highly non-unique, being defined in each 
context for convenience, as a kind of “hint”.)


10.4.7 REMARK: The function restriction notation. 

It is not necessary for the set A in Definition 10.4.3 to be a subset of the domain X of f. A function g is a
restriction of a function f if and only if it is a restriction of f to some set A. Suppose g is a restriction of f.
Then g ⊆ f. So g = f|_{Dom(g)}. Conversely, if g = f|_A then clearly g ⊆ f by the definition of f|_A.

10.4.8 THEOREM: Some basic properties of function restrictions.
Let f and g be functions. Let A and B be sets.

(i) f|_A ⊆ f.
(ii) If A ⊆ B, then f|_A ⊆ f|_B.
(iii) f|_A = f ∩ (A × Range(f)).
(iv) f|_A = f ∩ (A × Range(f|_A)).
(v) If f ⊆ g, then f|_A ⊆ g|_A.
(vi) (f|_A)|_B = f|_{A∩B}.

PROOF: Part (i) follows from Definition 10.4.3.
Part (ii) follows from Definition 10.4.3.
For part (iii), note that (x, y) ∈ f ∩ (A × Range(f)) if and only if (x, y) ∈ f and x ∈ A and y ∈ Range(f)
by Definition 9.4.2. But (x, y) ∈ f implies y ∈ Range(f) by Definition 9.5.4 and Remark 10.2.6. So
(x, y) ∈ f ∩ (A × Range(f)) if and only if (x, y) ∈ f and x ∈ A by Theorem 4.7.9 (lxx). Therefore f|_A =
f ∩ (A × Range(f)) for any set A by Definition 10.4.3.
For part (iv), f|_A = f ∩ (A × Range(f)) ⊇ f ∩ (A × Range(f|_A)) by parts (i) and (iii). Let (x, y) ∈ f|_A. Then
(x, y) ∈ f ∩ (A × Range(f)) by part (iii). So (x, y) ∈ f and x ∈ A. So y ∈ Range(f|_A) by Theorem 10.4.5 line (10.4.3).
So (x, y) ∈ f ∩ (A × Range(f|_A)). So f|_A ⊆ f ∩ (A × Range(f|_A)). Hence f|_A = f ∩ (A × Range(f|_A)).
For part (v), let f ⊆ g. Then Range(f) ⊆ Range(g) by Theorem 9.5.13 (ii). So by part (iii) and Theorems
9.4.6 (ii) and 8.1.7 (iii), f|_A = f ∩ (A × Range(f)) ⊆ f ∩ (A × Range(g)) ⊆ g ∩ (A × Range(g)) = g|_A.
For part (vi), (f|_A)|_B = (f|_A) ∩ (B × Range(f|_A)) = f ∩ (A × Range(f)) ∩ (B × Range(f|_A)) by part (iii). But
(A × Range(f)) ∩ (B × Range(f|_A)) = (A ∩ B) × Range(f|_A) by Theorem 9.4.6 (iv) because Range(f|_A) ⊆
Range(f) by part (i). So (f|_A)|_B ⊆ f ∩ ((A ∩ B) × Range(f)) = f|_{A∩B} by parts (i) and (iii), and (f|_A)|_B ⊇
f ∩ ((A ∩ B) × Range(f|_{A∩B})) = f|_{A∩B} by parts (ii) and (iv). Hence (f|_A)|_B = f|_{A∩B}.


10.4.9 REMARK: Subsets of algebraic operations are restrictions of those operations. 

Theorem 10.4.10 is useful when demonstrating the most basic properties of subsystems of algebraic systems. 
Algebraic systems generally have one or more operations of the form μ : X × Y → Z, where typically Z = X
or Z = Y, and often X = Y. Then a subsystem would generally be defined to have a sub-operation μ′ ⊆ μ
with Dom(μ′) = X′ × Y′, where X′ ⊆ X and Y′ ⊆ Y. This interprets the prefix “sub” very literally to mean
that the operation μ′ is a subset of the full system's operation μ. However, one usually wants to know that
μ′ agrees with μ on Dom(μ′), which means that μ′ = μ|_{X′×Y′}. This is a more or less obvious consequence
of the set-inclusion rules, but it must be proved, as in Theorem 10.4.10. (For applications of this apparently
worthless theorem, see the proofs of Theorems 19.3.10 (i, ii) and 19.10.18 (i).)


10.4.10 THEOREM: Inclusion of one two-parameter function within another implies it is a restriction.
Let X₁, X₂, Y₁, Y₂ be sets with X₁ ⊆ X₂ and Y₁ ⊆ Y₂. Let f₁ and f₂ be functions with Dom(f₁) = X₁ × Y₁,
Dom(f₂) = X₂ × Y₂ and f₁ ⊆ f₂. Then f₁ = f₂|_{X₁×Y₁}.


PROOF: Dom(f₁) = X₁ × Y₁ implies f₁ = f₁|_{X₁×Y₁}. But also f₂|_{X₁×Y₁} is a function with domain X₁ × Y₁
by Theorem 10.4.5 line (10.4.2) because X₁ × Y₁ ⊆ X₂ × Y₂ by Theorem 9.4.6 (ii). Then f₁ ⊆ f₂ implies
f₁|_{X₁×Y₁} ⊆ f₂|_{X₁×Y₁} by Theorem 10.4.8 (v). Thus f₁ ⊆ f₂|_{X₁×Y₁}, where f₁ and f₂|_{X₁×Y₁} are functions with
the same domain X₁ × Y₁. Hence f₁ = f₂|_{X₁×Y₁} by Theorem 10.2.14.
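
A small finite instance of Theorem 10.4.10 may help to fix ideas. The example operation (addition modulo 4) and all names below are assumptions chosen for illustration; functions are stored as dicts keyed by argument pairs.

```python
def restrict(f, A):
    """Restriction of a function f (stored as a dict) to the set A."""
    return {k: v for k, v in f.items() if k in A}

X2 = Y2 = range(4)
X1 = Y1 = range(2)

# f2: addition modulo 4 on X2 × Y2.
f2 = {(x, y): (x + y) % 4 for x in X2 for y in Y2}
# f1: ordinary addition on X1 × Y1, which happens to be a subset of f2.
f1 = {(x, y): x + y for x in X1 for y in Y1}

# f1 ⊆ f2 as sets of (argument, value) pairs.
included = all(f2.get(k) == v for k, v in f1.items())
domain_X1Y1 = {(x, y) for x in X1 for y in Y1}
```

Since f1 ⊆ f2 and Dom(f1) = X1 × Y1, the theorem predicts that f1 coincides with the restriction of f2 to X1 × Y1.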


10.4.11 REMARK: Extensions of functions. 

Definition 10.4.12 defines extensions of functions so that g is an extension of f if and only if f is a restriction 
of g. By contrast with restrictions of functions, there is no such thing as the extension of a function f to a 
set A which includes the domain of f, because obviously this would not be unique. 


10.4.12 DEFINITION: An extension of a function f is any function g which satisfies g ⊇ f.


10.4.13 DEFINITION: An extension of a function f to a set A such that Dom(f) ⊆ A is any function g
which satisfies g ⊇ f and has domain A.

10.4.14 REMARK: Post-evaluation expression substitution notation.

Notation 10.4.15 is useful for substituting a complicated value for a free variable in a complicated expression.
Roughly speaking, Notation 10.4.15 means that F(x)|_{x=a} = F(a). However, the following example shows
some possible difficulties with this interpretation.


( x (Hava (2))) ) NT (p)' (10.4.4) 


In this example, the variable x is bound to the partial differential operator. After the differentiation is exe- 
cuted, the result becomes a function of a free variable x, which is then substituted with the expression v (p). 
The substitution cannot occur before differentiation is carried out because substitution for x in the partial 
differential operator would be very difficult to interpret. Therefore it is understood that the expression Ya (p) 
is substituted after differentiation is carried out. 


The difficulty arises here because of an anomaly in the standard way in which differentiation is denoted in
the literature. Notations with a form similar to “(d/dx) cos²(x)” commonly signify the result of carrying
out a differentiation on a function, not just its value at x, and then regarding the result as a functional
expression for the same variable x. Thus the notation “d/dx” effectively means that the following expression
is expected to be an inline definition of a function (which may contain multiple instances of the variable x,
as in “x² + x”), which in this example is the composition of the cosine and square functions. Then this
inline-defined function is to be differentiated, and the resulting function is understood to have the same
variable x. A more formally correct notation would be d(x ↦ cos²(x))(x), where the inline function is now
defined explicitly as “x ↦ cos²(x)”, then this is differentiated by d, and the resulting derivative is evaluated
at x. Sadly history did not go down this path. But if it had, one could have written, instead of the expression
in line (10.4.4), the expression
O;(x > vis (o (2))) (9 (0). 
Thus Notation 10.4.15 may be thought of as a “post-evaluation expression substitution" notation. 


Notation 10.4.15 must not be confused with function restriction. The notation F(x)|_{x=a} denotes substitution
of a value a, which yields a value for the expression F when it is restricted to the set {a} ⊆ Dom(F). Clearly
F(a) is not exactly the same thing as F|_{{a}}, which equals the set {(a, F(a))}.


10.4.15 NOTATION: F(x)|_{x=a}, for an expression F with variable x, and an expression a, denotes the result
of substituting a for x into F after it has been evaluated.


10.4.16 REMARK: Composition of functions versus composition of relations. 

Definition 10.4.17 for composition of functions is effectively the direct application of Definition 9.6.2 for 
composition of relations, but in the case of functions, it is possible to write the value of the composite in 
terms of the values of the individual functions because “the value” is well defined by the unique existence of 
the values of functions for all elements in their domains. 


An important difference between the function and relation composition definitions is that Definition 10.4.17 
requires the range of the first function to be included in the domain of the second function. The purpose of 
this is to ensure that the domain of the composite function is the same as the domain of the first function. 
One could perhaps refer to functions which satisfy the “chain condition” Range(f) ⊆ Dom(g) as “(fully)
chained functions” or “dovetailed functions”, although the word “dovetail” would suggest an exact equality
of Range(f) and Dom(g).


10.4.17 DEFINITION: The composition or composite of two functions f and g such that Range(f) ⊆ Dom(g)
is the function h defined by h(x) = g(f(x)) for all x ∈ Dom(f).


10.4.18 NOTATION: g ∘ f denotes the composition of two functions f and g.


10.4.19 REMARK: Pitfalls of the function composition notation. 
Although (g o f)(x) is the same thing as g(f(x)), g o f is not the same thing as g(f) because g o f is a 
function constructed from f and g, not the value of g for the argument f. 


10.4.20 REMARK:  Associativity of function composition. 
Theorem 10.4.21 shows that function composition is associative in the sense of Definition 17.1.1 (i). Note 
that Definition 10.4.17 implies Dom(g o f) = Dom(f) because of the domain/range chaining requirement. 

10.4.21 THEOREM: Well-definition and associativity of function composition.
Let f, g, h be functions with Range(f) ⊆ Dom(g) and Range(g) ⊆ Dom(h).


(i) (h ∘ g) ∘ f and h ∘ (g ∘ f) are well-defined functions with domain equal to Dom(f).
(ii) (h ∘ g) ∘ f = h ∘ (g ∘ f).


PROOF: For part (i), let Range(f) ⊆ Dom(g) and Range(g) ⊆ Dom(h). Then h ∘ g and g ∘ f are well-defined
functions, and Dom(h ∘ g) = Dom(g) and Range(g ∘ f) ⊆ Range(g). So Range(f) ⊆ Dom(h ∘ g)
and Range(g ∘ f) ⊆ Dom(h). Hence (h ∘ g) ∘ f and h ∘ (g ∘ f) are well-defined functions with
Dom((h ∘ g) ∘ f) = Dom(f) and Dom(h ∘ (g ∘ f)) = Dom(g ∘ f) = Dom(f).

For part (ii), let x ∈ Dom(f). Then ((h ∘ g) ∘ f)(x) = (h ∘ g)(f(x)) = h(g(f(x))) by Definition 10.4.17.
Similarly (h ∘ (g ∘ f))(x) = h((g ∘ f)(x)) = h(g(f(x))). Thus ∀x ∈ Dom(f), ((h ∘ g) ∘ f)(x) = (h ∘ (g ∘ f))(x).
Hence (h ∘ g) ∘ f = h ∘ (g ∘ f).
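
The chain condition and the associativity result can be checked on finite functions stored as dicts. This is a sketch under that assumption; the function name compose and the decision to raise an error when the chain condition fails are illustrative choices, not part of the definitions.

```python
def compose(g, f):
    """g ∘ f for functions stored as dicts, requiring Range(f) ⊆ Dom(g)."""
    if not set(f.values()) <= set(g):
        raise ValueError("chain condition Range(f) ⊆ Dom(g) fails")
    return {x: g[y] for x, y in f.items()}

f = {1: "a", 2: "b"}
g = {"a": 10, "b": 20, "c": 30}
h = {10: True, 20: False, 30: True}

left = compose(compose(h, g), f)     # (h ∘ g) ∘ f
right = compose(h, compose(g, f))    # h ∘ (g ∘ f)
```

Both bracketings give the same dict, and its key set is the key set of f, in line with Dom(g ∘ f) = Dom(f).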


10.4.22 REMARK: Restrictions of composites of functions. 
Theorem 10.4.23 (i) implies that there is no need to use parentheses around expressions of the form g ∘ f|_A.
The result is the same whether f is first restricted and then composed, or first composed and then restricted.


10.4.23 THEOREM: Restricting a function composite is the same as restricting the right-hand function. 
Let f, g be functions with Range(f) ⊆ Dom(g).


(i) (g ∘ f)|_A = g ∘ (f|_A) for any set A.


PROOF: For part (i), note that Range(f|_A) ⊆ Range(f) ⊆ Dom(g) by Definition 10.4.17. So g ∘ (f|_A) is a
well-defined composition by Definition 10.4.17. Then


(g ∘ f)|_A = {(x, y) ∈ g ∘ f; x ∈ A}
           = {(x, y); x ∈ A and ∃z, ((x, z) ∈ f and (z, y) ∈ g)}
           = {(x, y); ∃z, ((x, z) ∈ f|_A and (z, y) ∈ g)}
           = g ∘ (f|_A).


10.4.24 REMARK:  Partially defined composites of functions. 

It is often desirable to give a meaning to g ∘ f even if Range(f) ⊈ Dom(g) in Definition 10.4.17. One
workaround for such “partially chained functions” is to restrict f to f⁻¹(Dom(g)) before composing f with g.
This function domain restriction is automatically achieved if one applies Definition 9.6.2 (for the composition 
of general relations) to functions, although this has the drawback that the domain of the composite is no 
longer guaranteed to be the same as the domain of the first function. 


The composition of partially chained functions is also well defined if one regards functions as a sub-class 
of partially defined functions. (See Definition 10.10.6 and Notation 10.10.7 for the composition of partial 
functions.) In this case, the “output” from the composition of two functions is guaranteed to be a partial 
function, which may or may not have the same domain as the first function in the composition. 
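
The relation-style composite described in this remark is easy to see on finite functions stored as dicts. In this sketch the name compose_partial is hypothetical; it keeps only those arguments x for which f(x) lies in Dom(g).

```python
def compose_partial(g, f):
    """Relation-style composite: defined only where f(x) lies in Dom(g)."""
    return {x: g[y] for x, y in f.items() if y in g}

f = {1: "a", 2: "b", 3: "c"}
g = {"a": 10, "b": 20}           # "c" is not in Dom(g)

gf = compose_partial(g, f)
preimage = {x for x, y in f.items() if y in g}    # the set f⁻¹(Dom(g))
```

The domain of the composite is f⁻¹(Dom(g)) = {1, 2}, which is strictly smaller than Dom(f), illustrating the drawback mentioned above.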


10.4.25 REMARK:  Pointwise composition of function-valued functions. 

Occasionally it is convenient to have a definition and notation for the “pointwise composition of function-
valued functions” f and g, which means the function-valued function which is obtained by composing the
functions f(p) and g(p) at each point p in the common domain of the two function-valued functions. This is
given in Definition 10.4.26. (See Theorem 69.14.6 for an application.)


10.4.26 DEFINITION: The pointwise composition of function-valued functions f and g which have the same
domain X, and which satisfy ∀p ∈ X, Range(f(p)) ⊆ Dom(g(p)), is the function-valued function g oo f with
domain X which is defined by


∀p ∈ X, (g oo f)(p) = g(p) ∘ f(p).
Thus
∀p ∈ X, ∀x ∈ Dom(f(p)), (g oo f)(p)(x) = g(p)(f(p)(x)).
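
A finite sketch of pointwise composition, with function-valued functions stored as dicts of dicts, is given below. The name pointwise_compose is an assumption; a KeyError is raised if the chain condition Range(f(p)) ⊆ Dom(g(p)) fails at some point p.

```python
def pointwise_compose(g, f):
    """(g oo f)(p) = g(p) ∘ f(p) for each p in the common domain of f and g."""
    assert set(f) == set(g), "f and g need the same domain"
    return {p: {x: g[p][y] for x, y in f[p].items()} for p in f}

# Two function-valued functions on the domain {"p", "q"}.
f = {"p": {1: "a"}, "q": {1: "b", 2: "b"}}
g = {"p": {"a": 10}, "q": {"b": 20}}

gf = pointwise_compose(g, f)
```

Each value gf[p] is an ordinary composite, so for example gf["q"] sends both 1 and 2 to 20.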

10.5. Injections, surjections and inverses


10.5.1 REMARK: The relations between injectivity, surjectivity and bijectivity. 
Figure 10.5.1 illustrates the basic relations between injectivity, surjectivity and bijectivity for both functions 
and partial functions. (See Section 10.9 for partially defined functions.) 


[Tree diagram; each node is listed with its constraints.]

relation
partial function: #(y) ≤ 1
injective partial function: #(x) ≤ 1, #(y) ≤ 1
function: #(y) = 1
surjective partial function: #(x) ≥ 1, #(y) ≤ 1
injective function: #(x) ≤ 1, #(y) = 1
surjective function: #(x) ≥ 1, #(y) = 1
bijective function: #(x) = 1, #(y) = 1


Figure 10.5.1 Family tree for injectivity, surjectivity and bijectivity


The inequalities in Figure 10.5.1 are abbreviations where “#(x)” means #{x; (x, y) ∈ f} and “#(y)” means
#{y; (x, y) ∈ f}, and the inequalities indicate constraints on the numbers of values as in Remark 10.5.22.
(See Chapter 13 for cardinality.) Thus “≤ 1” indicates uniqueness, “≥ 1” indicates existence, and “= 1”
indicates unique existence.


10.5.2 DEFINITION: 

A function f : X → Y is injective when ∀x₁, x₂ ∈ X, (f(x₁) = f(x₂) ⇒ x₁ = x₂).
A function f : X → Y is surjective when ∀y ∈ Y, ∃x ∈ X, f(x) = y.

A function f : X → Y is bijective when it is injective and surjective.


An injection from a set X to a set Y is a function f : X → Y which is injective.
A surjection from a set X to a set Y is a function f : X → Y which is surjective.
A bijection from a set X to a set Y is a function f : X → Y which is bijective.


10.5.3 REMARK: Injections with unspecified target sets. 

The surjections and bijections in Definition 10.5.2 must always have specified target sets because part of 
their definition is that the range must be a superset of some specified target set. Injections, by contrast, do 
not need a target set to be specified. (The functions in Definition 10.2.2 (i, ii) similarly do not need a target 
set specification.) Therefore injections can be defined in Definition 10.5.4 without naming a target set. 


Similarly, a surjection does not require a source set to be specified. This is also introduced in Definition 10.5.4, 
although surjections with unspecified source sets are seen much less often than injections with unspecified 
target sets. 


10.5.4 DEFINITION: 

An injection is a function f which satisfies ∀x₁, x₂ ∈ Dom(f), (f(x₁) = f(x₂) ⇒ x₁ = x₂).

In other words, an injection is a function f which satisfies ∀(x₁, y₁), (x₂, y₂) ∈ f, (y₁ = y₂ ⇒ x₁ = x₂).
In other words, an injection is a relation f which satisfies ∀(x₁, y₁), (x₂, y₂) ∈ f, (y₁ = y₂ ⇔ x₁ = x₂).


An injection on a set X is a function on X which is an injection.
A surjection to a set Y is a function f such that Range(f) ⊇ Y.
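
For finite functions stored as dicts, the two properties can be tested as follows. This is a sketch; the predicate names are assumptions. Note that is_injective needs only the function, while is_surjective_onto needs the target set as a second argument.

```python
def is_injective(f):
    """No two arguments of f share a value."""
    return len(set(f.values())) == len(f)

def is_surjective_onto(f, Y):
    """Every element of the set Y is a value of f, that is, Range(f) ⊇ Y."""
    return set(Y) <= set(f.values())

f = {1: "a", 2: "b", 3: "a"}
```

The same f is surjective onto {"a", "b"} but not onto {"a", "b", "c"}, so surjectivity is a property of the pair (f, Y) rather than of f alone.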


10.5.5 THEOREM: Identity functions are bijections on their domains. 
Let X be a set. Then id_X : X → X is a bijection.


PROOF: By Definition 10.2.27 and Notation 10.2.29, Dom(id_X) = Range(id_X) = X. Injectivity follows
from the observation that if id_X(x₁) = id_X(x₂), then x₁ = x₂ by the definition of id_X. For surjectivity, let
y ∈ X. Let x = y. Then id_X(x) = y. So id_X : X → X is a bijection by Definition 10.5.2.

10.5.6 THEOREM: Inheritance of injectivity and surjectivity by composites of functions.
Let X, Y and Z be sets.
(i) If f : X → Y and g : Y → Z are injections, then g ∘ f : X → Z is an injection.
(ii) If f : X → Y and g : Y → Z are surjections, then g ∘ f : X → Z is a surjection.
(iii) If f : X → Y and g : Y → Z are bijections, then g ∘ f : X → Z is a bijection.


PROOF: For part (i), let f : X → Y and g : Y → Z be injections. Let g(f(x₁)) = g(f(x₂)) for some
x₁, x₂ ∈ X. Then f(x₁) = f(x₂) because g is injective. So x₁ = x₂ because f is injective. Hence g ∘ f is an
injection.

For part (ii), let f : X → Y and g : Y → Z be surjections. Let h = g ∘ f. Let z ∈ Z. Then z = g(y)
for some y ∈ Y because Range(g) = Z. So y = f(x) for some x ∈ X because Range(f) = Y. Thus
∀z ∈ Z, ∃x ∈ X, z = g(f(x)). Therefore Range(g ∘ f) ⊇ Z. But Range(g ∘ f) ⊆ Z. So Range(g ∘ f) = Z
by Theorem 7.3.5 (iii). Hence g ∘ f : X → Z is a surjection.


Part (iii) follows from parts (i) and (ii). 


10.5.7 REMARK: Difficulties with the terms “one-to-one” and “onto”. 

Injective functions are sometimes called “one-to-one” functions. (The term “one-to-one” is often abbreviated
as “1-1”.) This is ambiguous because “one-to-one” often means bijective, which is in fact a more reasonable
meaning to give it.

The word “onto” is often used for surjective functions, but it suffers from similar ambiguities. (It is also not
an adjective in standard English.) However, since surjectivity of a function is always relative to a particular 
target set, the word “onto” may be used correctly and unambiguously as a preposition as in the phrase: 
“f is surjective onto Y.” A real hazard of the terms “surjective” and “bijective” is that they are meaningless 
if the target set is not specified. The term “onto” has at least the advantage that in natural English, it 
is a preposition requiring an object. Since “onto” does not require a grammatical object in mathematical 
English, it suffers from the same hazards as the word “surjective”. 


10.5.8 REMARK: The dependence of surjectivity on the target set of a function. 

Whereas the injective property is independent of the choice of target set Y, the surjective property depends
completely on the set Y. In fact, since a function f is specified as a set of ordered pairs, it is possible to
determine the domain of f as the set Dom(f) = {x; (x, y) ∈ f}, but it is not possible to determine the target
set of f from only the ordered pairs of f. That is, the target set of a function is not an attribute of the function as
usually specified. One can only determine that Y ⊇ Range(f) = {y; (x, y) ∈ f}. Then a function f : X → Y
is said to be surjective if Y = Range(f), which means that surjectivity is a relation between a function f
and a given set Y, not an attribute of the function as injectivity is.


It follows that the target set Y of a function f : X → Y must always be specified when asserting that a
function is surjective. It is best to say explicitly something like “f is surjective onto Y”.


10.5.9 THEOREM:  Bijections are functions whose inverse relation is a function. 
Let f : X → Y be a function. Then f : X → Y is a bijection if and only if its inverse relation f⁻¹ is a
function from Y to X.


PROOF: Let f : X → Y be a bijection. Then by Definition 10.5.2, f is injective and f : X → Y is a
surjection, and so ∀x₁, x₂ ∈ X, (f(x₁) = f(x₂) ⇒ x₁ = x₂) and ∀y ∈ Y, ∃x ∈ X, f(x) = y. The former
property for the relation f implies ∀y, ∀x₁, ∀x₂, (((x₁, y) ∈ f ∧ (x₂, y) ∈ f) ⇒ x₁ = x₂) (because f ⊆ X × Y),
which is equivalent to the property ∀x, ∀y₁, ∀y₂, (((x, y₁) ∈ f⁻¹ ∧ (x, y₂) ∈ f⁻¹) ⇒ y₁ = y₂) for the inverse
relation f⁻¹. Therefore f⁻¹ is a function by Definition 10.2.2 (i). The latter property for the relation f
implies ∀y ∈ Y, ∃x ∈ X, (x, y) ∈ f, which is equivalent to the property ∀x ∈ Y, ∃y ∈ X, (x, y) ∈ f⁻¹ for
the inverse relation f⁻¹. So Dom(f⁻¹) ⊇ Y. But f ⊆ X × Y implies f⁻¹ ⊆ Y × X. So Dom(f⁻¹) ⊆ Y.
Therefore Dom(f⁻¹) = Y. So f⁻¹ is a function on Y by Definition 10.2.2 (ii). From f⁻¹ ⊆ Y × X, it also
follows that Range(f⁻¹) ⊆ X. Therefore f⁻¹ is a function from Y to X by Definition 10.2.2 (iii).

Now suppose that f : X → Y is a function and its inverse relation f⁻¹ is a function from Y to X. Then by
Definition 10.2.2 (i), ∀x, ∀y₁, ∀y₂, (((x, y₁) ∈ f⁻¹ ∧ (x, y₂) ∈ f⁻¹) ⇒ y₁ = y₂). Therefore f has the property
∀y, ∀x₁, ∀x₂, (((x₁, y) ∈ f ∧ (x₂, y) ∈ f) ⇒ x₁ = x₂). So ∀y, ∀x₁, x₂, ((f(x₁) = y ∧ f(x₂) = y) ⇒ x₁ = x₂),
which is equivalent to ∀x₁, x₂, (f(x₁) = f(x₂) ⇒ x₁ = x₂). So ∀x₁, x₂ ∈ X, (f(x₁) = f(x₂) ⇒ x₁ = x₂), and
therefore f is injective by Definition 10.5.2. From Definition 10.2.2 (ii), it follows that Dom(f⁻¹) = Y. So
f⁻¹ satisfies ∀y ∈ Y, ∃x ∈ X, f⁻¹(y) = x. So f satisfies ∀y ∈ Y, ∃x ∈ X, f(x) = y. Therefore f is surjective
by Definition 10.5.2. Hence f is a bijection.
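
Theorem 10.5.9 can be checked on small relations stored as sets of ordered pairs. The helper names below are assumptions introduced for this sketch.

```python
def inverse_relation(f):
    """Inverse relation of a set of ordered pairs."""
    return {(y, x) for (x, y) in f}

def is_function(r):
    """A relation is a function when no first component is repeated."""
    return len({x for (x, _) in r}) == len(r)

def is_function_from(r, source, target):
    """r is a function with domain equal to source and range inside target."""
    return (is_function(r)
            and {a for (a, _) in r} == set(source)
            and {b for (_, b) in r} <= set(target))

X, Y = {1, 2, 3}, {"a", "b", "c"}
bij = {(1, "a"), (2, "b"), (3, "c")}        # a bijection from X to Y
not_inj = {(1, "a"), (2, "a"), (3, "c")}    # not injective
```

The inverse of bij is a function from Y to X. The inverse of not_inj is not a function at all, and the inverse of bij fails to be a function from Y ∪ {"d"} to X because bij is not surjective onto that larger target set.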


10.5.10 DEFINITION: The inverse of a function f : A → B is the inverse relation f⁻¹.


10.5.11 THEOREM: The inverse of a bijection is a bijection.
Let f : A → B be a bijection. Then the inverse relation f⁻¹ : B → A is a bijection.


PROOF: Let f : A → B be a bijection. Let g = f⁻¹, where f⁻¹ is the inverse relation of f. Then g : B → A
is a function by Theorem 10.5.9. The inverse relation of g is g⁻¹ = (f⁻¹)⁻¹ = f by Theorem 9.6.16 (ii).
But f : A → B is a function. So g⁻¹ : A → B is a function. Therefore g : B → A is a bijection by
Theorem 10.5.9. Hence f⁻¹ : B → A is a bijection.


10.5.12 REMARK: Inverse functions. 
It is not strictly necessary to define “inverse function” because it is the same as the “inverse relation” if the
“inverse function” is well defined. (See Definition 9.6.13 for inverse of a relation.)


The inverse relation of a function is always well defined, but the inverse might not be a function. Thus “the
inverse relation f⁻¹” is well defined for any function (or relation) f, but “the inverse function f⁻¹” is only
well defined if f is a bijection.


10.5.13 DEFINITION: 
A left inverse of a function f : X → Y is a function g : Y → X such that g ∘ f = id_X.
A right inverse of a function f : X → Y is a function g : Y → X such that f ∘ g = id_Y.


10.5.14 THEOREM: Injectivity, surjectivity, bijectivity, and the existence of left and right inverses.
Let X and Y be sets.
(i) A function g : Y → X is the inverse of a function f : X → Y if and only if g is both a left inverse and
right inverse of f.
(ii) A function f : X → Y is an injection if and only if f has a left inverse.
(iii) A function f : X → Y is a surjection if f has a right inverse.
(iv) A function f : X → Y is a bijection if and only if f has an inverse function.
(v) A function f : X → Y is an injection if and only if (f ∘ g = f ∘ h) ⇒ (g = h) for all functions g : Z → X
and h : Z → X and sets Z.
(vi) A function f : X → Y is a surjection if and only if (g ∘ f = h ∘ f) ⇒ (g = h) for all functions g : Y → Z
and h : Y → Z and sets Z.


PROOF: For part (i), let f : X → Y be a function for sets X and Y. Suppose that there exists a function g
which is the inverse of f. Then by Definition 10.5.10, g = f⁻¹. Let x ∈ X. Then g(f(x)) = z if and only
if (f(x), z) ∈ g, if and only if (z, f(x)) ∈ g⁻¹ = f, if and only if f(z) = f(x). So z = x because f is an
injection, because f is a bijection by Theorem 10.5.9. Therefore g(f(x)) = x for all x ∈ X. So g ∘ f = id_X.
Therefore g is a left inverse of f by Definition 10.5.13. Similarly, f(g(y)) = z if and only if (g(y), z) ∈ f,
if and only if (z, g(y)) ∈ f⁻¹, if and only if f⁻¹(z) = f⁻¹(y). So z = y because f is a function. Therefore
f(g(y)) = y for all y ∈ Y. So f ∘ g = id_Y. Therefore g is a right inverse of f by Definition 10.5.13.

Now suppose that f : X → Y is a function for sets X and Y, and that g is both a left inverse and right inverse
of f. Then g : Y → X is a function, and g ∘ f = id_X and f ∘ g = id_Y by Definition 10.5.13. Let (x, y) ∈ f.
Then y = f(x). So g(y) = g(f(x)) = id_X(x) = x. So (y, x) ∈ g. Conversely, let (y, x) ∈ g. Then
x = g(y). So f(x) = f(g(y)) = id_Y(y) = y. So (x, y) ∈ f. Thus (x, y) ∈ f if and only if (y, x) ∈ g, which
means that g is the inverse of f by Definitions 10.5.10 and 9.6.13.

For part (ii), let f : X → Y be an injection. Let h be the inverse relation of f. Then h : f(X) → X is
a function. Define g : Y → X by g(y) = h(y) if y ∈ f(X) and g(y) = 0 if y ∉ f(X). Then g ∘ f = id_X.
Therefore f has a left inverse.

Let f : X → Y be a function which has a left inverse g : Y → X. Then g ∘ f = id_X. Let x₁, x₂ ∈ X.
Suppose f(x₁) = f(x₂). Then g(f(x₁)) = g(f(x₂)). But g(f(x₁)) = x₁ and g(f(x₂)) = x₂. So x₁ = x₂.
Therefore f is injective.

For part (iii), let f : X → Y be a function which has a right inverse g : Y → X. Then f ∘ g = id_Y.
Let y ∈ Y. Then f(g(y)) = y. Let x = g(y). Then f(x) = y. Therefore f is surjective.

For part (iv), let f : X → Y be a bijection for sets X and Y. Then the relation f⁻¹ is a function by
Theorem 10.5.9. So f has an inverse function. Now suppose that f : X → Y is a function and that f has an
inverse function g. Then g is both a left inverse and right inverse of f by part (i). So f is both an injection
and a surjection by parts (ii) and (iii). Therefore f is a bijection by Definition 10.5.2.

For part (v), let f : X → Y be an injection for sets X and Y. Let Z be a set, and let g : Z → X and
h : Z → X be functions. Suppose that f ∘ g = f ∘ h. Let z ∈ Z. Then f(g(z)) = f(h(z)). So g(z) = h(z)
by Definition 10.5.2 because f is injective. Therefore ∀z ∈ Z, g(z) = h(z), which means that g = h.

Now let f : X → Y be a function such that (f ∘ g = f ∘ h) ⇒ (g = h) for all functions g : Z → X
and h : Z → X and sets Z. Let x₁, x₂ ∈ X be such that f(x₁) = f(x₂). Let Z = {z} for any z, and define
functions g : Z → X and h : Z → X by g : z ↦ x₁ and h : z ↦ x₂. Then f ∘ g = f ∘ h because f(g(z)) =
f(x₁) = f(x₂) = f(h(z)) and so ∀z ∈ Z, f(g(z)) = f(h(z)). So g = h. Therefore x₁ = g(z) = h(z) = x₂. So
f is injective.

For part (vi), let f : X → Y be a surjection for sets X and Y. Let Z be a set. Let g : Y → Z and h : Y → Z
be functions. Suppose that g ∘ f = h ∘ f. Let y ∈ Y. Then y = f(x) for some x ∈ X because f is surjective.
So g(f(x)) = h(f(x)), which implies that g(y) = h(y). Therefore ∀y ∈ Y, g(y) = h(y). So g = h.

Now let X and Y be sets, and let f : X → Y be a function such that (g ∘ f = h ∘ f) ⇒ (g = h) for all
functions g : Y → Z and h : Y → Z and sets Z. Suppose that f is not surjective. Then there is an element
y₀ ∈ Y such that ∀x ∈ X, f(x) ≠ y₀. In other words, y₀ ∉ Range(f). Let Z = {z₁, z₂} be a set containing
two different elements z₁ and z₂. Define g : Y → Z by g(y) = z₁ for all y ∈ Y. Define h : Y → Z by
h(y) = z₁ for all y ∈ Y \ {y₀}, and h(y₀) = z₂. Then g ∘ f = h ∘ f because g(f(x)) = h(f(x)) = z₁ for
all x ∈ X. But g ≠ h because g(y₀) = z₁ ≠ z₂ = h(y₀). This contradicts the assumption on f. Hence f is
surjective.


10.5.15 REMARK: Slightly paradoxical existence of inverse of bijections without axiom of choice. 

There seems to be some kind of minor paradox between parts (iii) and (iv) of Theorem 10.5.14. On the one 
hand, it cannot be asserted (within ZF set theory) that every surjection has a right inverse. On the other 
hand, a bijection has both a left and right inverse. An AC-free mathematician might suspect that part (iv) 
could make some use of the axiom of choice. However, it is the injectivity which guarantees that no choice 
is required when constructing the inverse. 
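
The point can be illustrated for finite sets by a minimal Python sketch, assuming that a function is modelled as a dict of ordered pairs. (The name `inverse_of_bijection` is hypothetical, chosen only for this illustration.) The inverse is obtained by reversing each ordered pair, so nothing is ever chosen.

```python
def inverse_of_bijection(f, codomain):
    """Reverse the ordered pairs of f; reject f if it is not a bijection onto codomain."""
    inv = {}
    for x, y in f.items():
        if y in inv:
            # Two different x-values share the same y-value.
            raise ValueError("f is not injective")
        inv[y] = x
    if set(inv) != set(codomain):
        # Some y-value in the codomain has no x-value.
        raise ValueError("f is not surjective")
    return inv

f = {1: "a", 2: "b", 3: "c"}
g = inverse_of_bijection(f, {"a", "b", "c"})
```

Here `g` is both a left inverse and a right inverse of `f`, and it is determined uniquely by `f`.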


10.5.16 REMARK: Proof of existence of right inverse for surjective functions requires axiom of choice. 

It is not possible to show in Theorem 10.5.14 (iii) that a function f : X — Y is a surjection only if f has a 
right inverse unless the axiom of choice is invoked. (See for example Remark 7.11.12.) The axiom of choice is 
used for AC-tainted Theorem 10.5.17 (i), but if the domain set X can be given a well-ordering, the axiom of 
choice is not required. (See Theorem 11.8.4.) Note, however, that the necessary and sufficient condition for 
surjectivity in part (vi) of Theorem 10.5.14 does not require the axiom of choice, and its proof is elementary. 


Theorem 10.5.17 (ii) asserts, effectively, that the universal and existential quantifiers in the proposition
“$\forall \varepsilon \in Y,\ \exists \delta \in X,\ R(\delta, \varepsilon)$” can be swapped by replacing $\delta \in X$ with a function $\delta' : Y \to X$ which chooses a
value $\delta'(\varepsilon)$ for each $\varepsilon \in Y$.

Theorem 10.5.17 (ii) could be applied to X = Y = Rt with R(d,e) = ((6,2) € X x Y; f(Bz,5) € Ba), for 
all (6,7) € X x Y. (See Definition 38.2.9.) In this case, AC is not required because of the special monotonic 
structure of the relation R. (This is shown in Theorem 38.2.10.) However, in the case of a set M of Lebesgue 
measure zero, $X$ is a set of covers of $M$, and $R(\delta, \varepsilon)$ means that $\delta$ is a cover of $M$ whose sum of volumes is
less than $\varepsilon$. (See Definitions 45.1.3 and 45.2.2.) In this case, the ability to “swap the quantifiers” does not
follow from pure ZF without AC. (See Theorem 45.2.5.)


10.5.17 THEOREM [ZF+AC]: Existence of right inverses of surjections. 
Let X and Y be sets. 
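(i) A function $f : X \to Y$ is a surjection if and only if $f$ has a right inverse.
(ii) Let $R \subseteq X \times Y$ satisfy $\forall \varepsilon \in Y,\ \exists \delta \in X,\ R(\delta, \varepsilon)$. Then $\exists \delta' : Y \to X,\ \forall \varepsilon \in Y,\ R(\delta'(\varepsilon), \varepsilon)$.
In other words, ($R \subseteq X \times Y$ and $R(X) \supseteq Y$) implies $\exists \delta' : Y \to X,\ R \circ \delta' \supseteq \mathrm{id}_Y$.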


PROOF: For part (i), let $f : X \to Y$ be a surjection. Then the function $S : Y \to \mathbb{P}(X) \setminus \{\emptyset\}$ defined by
$S(y) = \{x \in X;\ f(x) = y\}$ for all $y \in Y$ is well defined. So by Theorem 10.3.9, there exists (by the axiom of
choice) a function $g : Y \to X$ such that $\forall y \in Y,\ g(y) \in S(y)$. It follows that $f(g(y)) = y$ for all $y \in Y$. In
other words, $g$ is a right inverse for $f$ by Definition 10.5.13. Thus $f$ has a right inverse. The converse follows
from Theorem 10.5.14 (iii).

For part (ii), let $R \subseteq X \times Y$ satisfy $\forall \varepsilon \in Y,\ \exists \delta \in X,\ R(\delta, \varepsilon)$. Then the function
$S : Y \to \mathbb{P}(X) \setminus \{\emptyset\}$ with $S(\varepsilon) = \{\delta \in X;\ R(\delta, \varepsilon)\}$ for all $\varepsilon \in Y$ is well defined. So by
Theorem 10.3.9, there exists (by the axiom of choice) a function $\delta' : Y \to X$ such that $\forall \varepsilon \in Y,\ \delta'(\varepsilon) \in S(\varepsilon)$.
But $\delta'(\varepsilon) \in S(\varepsilon)$ if and only if $\delta'(\varepsilon) \in X$ and $R(\delta'(\varepsilon), \varepsilon)$. Hence
$\exists \delta' : Y \to X,\ \forall \varepsilon \in Y,\ R(\delta'(\varepsilon), \varepsilon)$.
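
The role of the set-valued function $S$ in this proof can be sketched in Python for finite sets. This is only an illustration under stated assumptions: functions are modelled as dicts, the names `fibres` and `right_inverse` are hypothetical, and the built-in `min` is used as an explicit choice rule. For finite sets of comparable elements such a rule always exists, so no axiom of choice is involved here.

```python
def fibres(f, codomain):
    """S(y) = {x in X; f(x) = y} for each y in the codomain."""
    S = {y: set() for y in codomain}
    for x, y in f.items():
        S[y].add(x)
    return S

def right_inverse(f, codomain, choose=min):
    """Pick one element of each fibre S(y) with the explicit rule `choose`."""
    S = fibres(f, codomain)
    if any(not s for s in S.values()):
        raise ValueError("f is not surjective, so some S(y) is empty")
    return {y: choose(S[y]) for y in codomain}

f = {0: "a", 1: "a", 2: "b", 3: "b"}
g = right_inverse(f, {"a", "b"})
```

A different choice rule, such as `max`, would give a different right inverse. The non-uniqueness is exactly what makes a choice necessary in the general case.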


10.5.18 REMARK: Properties of compositions of functions with their inverses. 

If a function $f : X \to Y$ is not surjective, its inverse relation $f^{-1}$ is only a partial function on $Y$, not a
function on $Y$. (See Definition 10.9.2 for partial functions.) If a function $f : X \to Y$ is not injective, its
inverse relation does not have unique values at some points of its domain. However, the composition of 
a function with its inverse relation does have some useful properties as indicated in Theorem 10.5.19. In 
particular, the inverse relation is more or less the right inverse of an original function which is a surjection. 


10.5.19 THEOREM: Basic properties of composites of functions with their inverses.
Let $X$ and $Y$ be sets. Let $f : X \to Y$.

(i) $f \circ f^{-1} = \mathrm{id}_{\mathrm{Range}(f)}$.
(ii) $f \circ f^{-1} = \mathrm{id}_Y$ if and only if $f : X \to Y$ is surjective.
(iii) $f^{-1} \circ f \supseteq \mathrm{id}_X$.

PROOF: For part (i), let $(y, z) \in f \circ f^{-1}$. Then by Definition 9.6.2, $(y, x) \in f^{-1}$ and $(x, z) \in f$ for some
$x \in X$. So $(x, y) \in f$ and $(x, z) \in f$ for some $x \in X$ by Definitions 10.5.10 and 9.6.13. So $y = z \in \mathrm{Range}(f)$
by Definitions 10.2.2 and 9.5.4. Therefore $(y, z) \in \mathrm{id}_{\mathrm{Range}(f)}$ by Definition 9.6.9. Thus $f \circ f^{-1} \subseteq \mathrm{id}_{\mathrm{Range}(f)}$.

For the reverse inclusion, let $(y, z) \in \mathrm{id}_{\mathrm{Range}(f)}$. Then $y \in \mathrm{Range}(f)$ and $z = y$ by Definition 9.6.9. So
$(x, z) = (x, y) \in f$ for some $x \in X$ by Definition 9.5.4. So $(y, x) \in f^{-1}$ by Definitions 10.5.10 and 9.6.13. So
$(y, z) \in f \circ f^{-1}$ by Definition 9.6.2. Thus $f \circ f^{-1} \supseteq \mathrm{id}_{\mathrm{Range}(f)}$. Hence $f \circ f^{-1} = \mathrm{id}_{\mathrm{Range}(f)}$.

Part (ii) follows from part (i) and Definitions 10.5.2 and 9.5.4.

For part (iii), let $(x, w) \in \mathrm{id}_X$. Then $x \in X$ and $w = x$ by Definition 9.6.9. So $(x, y) \in f$ for some $y \in Y$
by Definition 10.2.2. Then $(y, x) \in f^{-1}$ by Definitions 10.5.10 and 9.6.13. So $(x, w) = (x, x) \in f^{-1} \circ f$ by
Definition 9.6.2. Hence $f^{-1} \circ f \supseteq \mathrm{id}_X$.
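
These three assertions can be checked on a small example with a short Python sketch, assuming that relations are modelled as sets of ordered pairs. The helper names `compose`, `inverse` and `identity` are hypothetical. The composition order follows the convention used in the proof above: a pair $(a, c)$ is in $s \circ r$ when $(a, b) \in r$ and $(b, c) \in s$ for some $b$.

```python
def compose(s, r):
    """s o r = {(a, c); (a, b) in r and (b, c) in s for some b}."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def inverse(r):
    """Inverse relation: swap the components of every ordered pair."""
    return {(b, a) for (a, b) in r}

def identity(A):
    return {(a, a) for a in A}

X = {1, 2, 3}
Y = {"a", "b", "c"}
f = {(1, "a"), (2, "a"), (3, "b")}      # neither injective nor surjective
f_inv = inverse(f)
range_f = {y for (_, y) in f}
```

For this `f`, the composite `compose(f, f_inv)` is the identity on `{"a", "b"}`, not on all of `Y`, and `compose(f_inv, f)` strictly contains the identity on `X` because it also contains the pairs `(1, 2)` and `(2, 1)`.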


10.5.20 REMARK: The six kinds of morphisms between sets. 

By analogy with the six kinds of morphisms for groups and other algebraic categories, one may define also 
a very minimal six kinds of morphisms for unstructured sets as in Definition 10.5.21. (See Definition 17.4.1 
for the analogous group morphisms.) 


10.5.21 DEFINITION: Set morphisms. 
A set homomorphism from a set $S_1$ to a set $S_2$ is a map $\phi : S_1 \to S_2$.

A set monomorphism from a set $S_1$ to a set $S_2$ is an injective set homomorphism $\phi : S_1 \to S_2$.

A set epimorphism from a set $S_1$ to a set $S_2$ is a surjective set homomorphism $\phi : S_1 \to S_2$.

A set isomorphism from a set $S_1$ to a set $S_2$ is a bijective set homomorphism $\phi : S_1 \to S_2$.

A set endomorphism of a set $S$ is a set homomorphism $\phi : S \to S$. Alternative name: endofunction.

A set automorphism of a set $S$ is a set isomorphism $\phi : S \to S$.


10.5.22 REMARK: Common names for the six kinds of set morphisms. 
In the case of plain sets, which have no order structure, nor algebraic, topological, differentiable or any other 
kind of structure, most kinds of morphisms are better known by their common names. 

map                      common name    morphism name       alternative     x-values    y-values
$\phi : S_1 \to S_2$     function       set homomorphism                                $= 1$
$\phi : S_1 \to S_2$     injection      set monomorphism                    $\le 1$     $= 1$
$\phi : S_1 \to S_2$     surjection     set epimorphism                     $\ge 1$     $= 1$
$\phi : S_1 \to S_2$     bijection      set isomorphism                     $= 1$       $= 1$
$\phi : S \to S$                        set endomorphism    endofunction                $= 1$
$\phi : S \to S$         permutation    set automorphism                    $= 1$       $= 1$


Only the term “set endomorphism” requires an alternative name “endofunction” because all of the other
kinds of morphisms have well-known common names. The “x-values” and “y-values” columns indicate the
constraints on $x$ and $y$ values. (More precisely, the “x-values” column signifies $\#\{x;\ \phi(x) = y\}$ for arbitrary $y$,
while the “y-values” column signifies $\#\{y;\ \phi(x) = y\}$ for arbitrary $x$.) Thus all classes of set morphisms
must have one and only one $y$ value in the target set for each $x$ value in the source set. Injections must have
at most one $x$ value for each $y$ value. Surjections must have at least one $x$ value for each $y$ value.
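
The “x-values” criterion can be sketched in Python for finite sets, assuming that a map is modelled as a dict, which automatically has exactly one $y$ value for each $x$ value. The function name `classify` is hypothetical.

```python
from collections import Counter

def classify(phi, target):
    """Name the kind of map phi (a dict) into the finite set target by its fibre sizes."""
    fibre_size = Counter(phi.values())        # number of x-values for each y-value
    sizes = [fibre_size[y] for y in target]   # a Counter gives 0 for an unused y-value
    injective = all(n <= 1 for n in sizes)
    surjective = all(n >= 1 for n in sizes)
    if injective and surjective:
        return "bijection"
    if injective:
        return "injection"
    if surjective:
        return "surjection"
    return "function"
```

For example, `classify({1: "a", 2: "b"}, {"a", "b", "c"})` reports an injection because the $y$ value `"c"` has no $x$ value, while every other $y$ value has exactly one.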


10.5.23 REMARK: Morphisms versus homomorphisms. Terminology diversity. 

In this book, the term “morphism” is a synonym for “homomorphism”, and the terms “monomorphism”, 
“epimorphism” and “isomorphism” mean injective, surjective and bijective homomorphisms respectively for 
all categories. Then the terms “endomorphism” and “automorphism” mean respectively homomorphisms 
and isomorphisms whose domain and target set are the same. Authors who define mono/epi/isomorphisms 
in this way explicitly for all categories include MacLane/Birkhoff [110], page 38; S. Warner [155], page 82. 
Monomorphisms and epimorphisms are defined by Lang [108], page 120, as the exactness of certain diagrams, 
which are then equivalent to the simple injectivity/surjectivity criteria. Authors who define them in this 
way for multiple individual categories (including groups, rings and modules) include Grove [88], pages 8, 49, 
127; Ash [50], pages 19, 37, 89. 


These terms are defined differently in some accounts of category theory. For example, EDM2 [113], page 204, 
defines monomorphisms by the left cancellative property which appears in Theorem 10.5.14 (v), which is 
equivalent to injectivity for set homomorphisms, but may not necessarily be equivalent for some higher 
categories. They also define epimorphisms by the right cancellative property in Theorem 10.5.14 (vi), which 
differs from surjectivity of homomorphisms for some higher categories, such as rings for example.


As mentioned in Remark 1.6.3 item (3), category theory is outside the scope of this book. 


10.5.24 REMARK:  Notations for spaces of set homomorphisms. 
It is occasionally useful to have specific notations for various classes of set homomorphisms. Notation 10.5.25 
is non-standard. 


10.5.25 NOTATION: Let X and Y be sets. 

(i) $\mathrm{Inj}(X, Y)$ denotes the set of injections from $X$ to $Y$.
(ii) $\mathrm{Surj}(X, Y)$ denotes the set of surjections from $X$ to $Y$.
(iii) $\mathrm{Bij}(X, Y)$ denotes the set of bijections from $X$ to $Y$.


10.5.26 REMARK: Composition of functions with function inverses. 

A function composition of the form $g \circ f^{-1} : B \to C$ for $f : A \to B$ and $g : A \to C$ may be defined when $f$
is not injective if the non-invertibility of $f$ is somehow cancelled by the non-invertibility of $g$. To make sense
of this, suppose that $f$ is surjective and that $g(x_1) = g(x_2)$ whenever $f(x_1) = f(x_2)$ for $x_1, x_2 \in A$. (This
means that $\forall y \in B,\ \exists z \in C,\ f^{-1}(\{y\}) \subseteq g^{-1}(\{z\})$, where $z$ is unique for each $y$.) Then the set $g(f^{-1}(\{y\}))$
must be a singleton for all $y \in B$. Consequently $g \circ f^{-1}$ may be defined to map $y$ to the unique element of
this singleton. This is formalised in Definition 10.5.27. (It is tempting to go wobbly at the knees here and
reach for an axiom of choice to construct a right inverse $f^{-1}$ for $f$ from which $g \circ f^{-1}$ may be constructed.
Luckily that's not the only way to do it.) Definition 10.5.27 is illustrated in Figure 10.5.2.


10.5.27 DEFINITION: The function quotient of a function $g : A \to C$ with respect to a surjective function
$f : A \to B$ such that $\forall x_1, x_2 \in A,\ (f(x_1) = f(x_2) \Rightarrow g(x_1) = g(x_2))$ is the function $g \circ f^{-1} : B \to C$ defined
by $g \circ f^{-1} = \{(f(x), g(x));\ x \in A\}$.

Figure 10.5.2 Generalised right inverse function: $g = (g \circ f^{-1}) \circ f$


10.5.28 REMARK: Well-definition of function quotients. 

The relation $h = \{(f(x), g(x));\ x \in A\} \subseteq B \times C$ in Definition 10.5.27 is a function from $B$ to $C$ because for
all $y \in B$, there is at least one $x \in A$ with $f(x) = y$ since $f$ is surjective, and the uniqueness of $g(x)$ for a given
$f(x)$ follows from the assumed relation between $f$ and $g$. The function quotient satisfies $g = (g \circ f^{-1}) \circ f$.
This may be regarded as a “generalised right inverse” of some kind.
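
The well-definition argument can be sketched in Python for finite sets, assuming that functions are dicts with a common domain. The name `function_quotient` is hypothetical. The quotient is built directly from the pairs $(f(x), g(x))$, so no right inverse of $f$ is ever constructed.

```python
def function_quotient(g, f, B):
    """Return h = {(f(x), g(x)); x in A} as a dict, checking that it is a function on B."""
    h = {}
    for x in f:
        y, z = f[x], g[x]
        if y in h and h[y] != z:
            # f(x1) = f(x2) but g(x1) != g(x2): the relation is not single-valued.
            raise ValueError("g is not constant on the fibres of f")
        h[y] = z
    if set(h) != set(B):
        raise ValueError("f is not surjective onto B")
    return h

f = {0: "even", 1: "odd", 2: "even", 3: "odd"}
g = {0: 1, 1: -1, 2: 1, 3: -1}
h = function_quotient(g, f, {"even", "odd"})
```

In this example `h[f[x]] == g[x]` for every `x`, which is the finite form of the identity $g = (g \circ f^{-1}) \circ f$.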


10.6. Function set-maps and inverse set-maps 


10.6.1 REMARK: Applications of function set-maps to continuity of functions. 

The properties of function set-maps and inverse set-maps are of fundamental importance for the continuity 
of functions in Section 31.12. (For properties of set-maps and inverse set-maps, see also EDM2 [113], 381.C, 
and Simmons [137], pages 14-20.) Theorems 10.6.7, 10.6.10 and 10.7.1 are amongst the most-used theorems 
in this book, each being explicitly invoked dozens of times. 


10.6.2 REMARK: The advantages of fully written-out proofs. 

The propositions and proofs in Sections 10.6, 10.7, 10.8, 10.9 and 10.10 are as tedious as those in Sections 8.1, 
8.2, 8.3 and 8.4. Therefore the comments in Remark 8.0.1 apply here also. No doubt the proofs in Sections 
10.6, 10.7, 10.8, 10.9 and 10.10 may be considerably compactified. But sometimes, a fully written-out “inline” 
proof gives clarity which highly leveraged proofs do not. 


10.6.3 REMARK: ZF set-maps versus set-theoretic set-maps. 

There is a minor “set-theoretic issue” with Definition 10.6.4. The expression $\{y \in Y;\ \exists x \in A,\ (x, y) \in f\}$
is well defined for any set $A$, for any function $f : X \to Y$. It seems reasonable to refer to this set by a
notation such as $f(A)$ for any set $A$. However, $f$ is then not a ZF function, although it is a well-defined
“set-theoretic function”. In other words, one may consider the text “$f(A)$” to be replaced by the text
“$\{y \in Y;\ \exists x \in A,\ (x, y) \in f\}$” wherever it is used, but one cannot then refer to $f$ as a function in the sense
of a set of ordered pairs. Nevertheless, there is no harm in using a notation such as $f(A)$ for general sets $A$
as long as $f$ is not then used as a ZF set.


10.6.4 DEFINITION: Set-maps and inverse set-maps of functions. 
(i) The set-map for a function $f : X \to Y$, for sets $X$ and $Y$, is the function $\bar{f} : \mathbb{P}(X) \to \mathbb{P}(Y)$ defined by
$\bar{f}(A) = \{y \in Y;\ \exists x \in A,\ (x, y) \in f\} = \{f(x);\ x \in A\}$ for all $A \subseteq X$.
(ii) The inverse set-map for a function $f : X \to Y$, for sets $X$ and $Y$, is the function $\bar{f}^{-1} : \mathbb{P}(Y) \to \mathbb{P}(X)$
defined by $\bar{f}^{-1}(B) = \{x \in X;\ \exists y \in B,\ (x, y) \in f\} = \{x \in X;\ f(x) \in B\}$ for all $B \subseteq Y$.
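
For finite sets, the two set-maps can be sketched in Python as follows, assuming that the function $f$ is modelled as a dict. The names `set_map` and `inverse_set_map` are hypothetical.

```python
def set_map(f, A):
    """Image of a subset A of the domain: {f(x); x in A}."""
    return {f[x] for x in A}

def inverse_set_map(f, B):
    """Pre-image of a subset B of the target set: {x in X; f(x) in B}."""
    return {x for x in f if f[x] in B}

f = {1: "a", 2: "a", 3: "b"}            # a function from {1, 2, 3} to {"a", "b", "c"}
```

Note that `inverse_set_map` is defined for every subset of the target set, whether or not $f$ is injective or surjective. For example, `inverse_set_map(f, {"c"})` is the empty set.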


10.6.5 REMARK: Diagram illustrating function set-maps. 
The set-map $\bar{f}$ in Definition 10.6.4 (i) is illustrated in Figure 10.6.1.


10.6.6 REMARK: The usual notation for function set-maps and inverse set-maps. 

The set-maps $\bar{f}$ and $\bar{f}^{-1}$ in Definition 10.6.4 are usually denoted simply as $f$ and $f^{-1}$ respectively. This
notation re-use, although economical, leads to real ambiguity when the domain or range of $f$ has elements
which are contained in other elements. Thus for instance, if $f : X \to Y$ and $\emptyset \in X$, then $f(\emptyset)$ may refer to
the original (point-map) function $f$ or the corresponding set-map $\bar{f}$. If this seems to be a problem only for
pathological sets $X$ and $Y$, consider $X = \omega$, the set of finite ordinal numbers. In this case, every element of $\omega$ is also
a subset of $\omega$, which makes it absolutely essential to distinguish between a point-map function $f : X \to Y$
and its corresponding set-map function $\bar{f}$. Another common circumstance where serious ambiguity arises
is when the domain or range is a power set. (The set-map notation ambiguity issue is also discussed by
Halmos [357], page 31.)

Figure 10.6.1 Set-map between power sets

For clarity in Theorems 10.6.7, 10.6.10, 10.7.1 and 10.7.6, set-maps are notated differently to the corresponding
point-map functions. Despite the real danger of ambiguity and confusion, most texts do use the
same notation for the function and its set-map. Willard [165], page 4, wrote that “here we are following
the unfortunate, but common, practice of denoting the elevation of $f$ from $A$ to $\mathbb{P}(A)$ by $f$ also”, where
“elevation of $f$” means the set-map $\bar{f}$. Kasriel [100], page 17, denoted the image $\bar{f}(A)$ of a set $A$ by a function
$f$ as “$f[A]$”, but gave no notation for the set-map $\bar{f}$ itself.
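
The ambiguity for ordinal numbers can be made concrete with a small Python sketch. It assumes that the first few von Neumann ordinals are modelled as nested `frozenset` objects and that a function is a dict. The names `zero`, `one`, `two` and `set_map` are hypothetical.

```python
# Von Neumann ordinals: 0 = {}, 1 = {0}, 2 = {0, 1}.
zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

X = {zero, one, two}
f = {zero: "a", one: "b", two: "c"}     # a point-map function on X

def set_map(f, A):
    """Image of the subset A under f."""
    return {f[x] for x in A}

point_value = f[one]                    # the ordinal 1 regarded as an element of X
set_value = set_map(f, one)             # the same object regarded as the subset {0} of X
```

The object `one` is both an element and a subset of `X`, and the two readings of “$f(1)$” give different results: the point-map gives `"b"`, whereas the set-map gives `{"a"}`.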


10.6.7 THEOREM: Properties of function set-maps. 
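Let $f : X \to Y$ be a function, and let $\bar{f} : \mathbb{P}(X) \to \mathbb{P}(Y)$ denote the set-map for $f$.

(i) $\bar{f}(\emptyset) = \emptyset$ and $\bar{f}(X) \subseteq Y$.
(i′) $\bar{f}(X) = Y$ if and only if $f$ is surjective.
(ii) $\forall A, B \in \mathbb{P}(X),\ A \subseteq B \Rightarrow \bar{f}(A) \subseteq \bar{f}(B)$.
(ii′) $\forall A, B \in \mathbb{P}(X),\ A \subseteq B \Leftrightarrow \bar{f}(A) \subseteq \bar{f}(B)$ if and only if $f$ is injective.
(iii) $\forall A, B \in \mathbb{P}(X),\ \bar{f}(A \cup B) = \bar{f}(A) \cup \bar{f}(B)$.
(iv) $\forall A, B \in \mathbb{P}(X),\ \bar{f}(A \cap B) \subseteq \bar{f}(A) \cap \bar{f}(B)$.
(iv′) $\forall A, B \in \mathbb{P}(X),\ \bar{f}(A \cap B) = \bar{f}(A) \cap \bar{f}(B)$ if and only if $f$ is injective.
(v) $\forall A \in \mathbb{P}(X),\ \bar{f}(X \setminus A) \supseteq \bar{f}(X) \setminus \bar{f}(A)$.
(v′) $\forall A \in \mathbb{P}(X),\ \bar{f}(X \setminus A) = \bar{f}(X) \setminus \bar{f}(A)$ if and only if $f$ is injective.
(vi) $\forall A, B \in \mathbb{P}(X),\ \bar{f}(B \setminus A) \supseteq \bar{f}(B) \setminus \bar{f}(A)$.
(vi′) $\forall A, B \in \mathbb{P}(X),\ \bar{f}(B \setminus A) = \bar{f}(B) \setminus \bar{f}(A)$ if and only if $f$ is injective.
(vii) $\forall A, B \in \mathbb{P}(X),\ A \subsetneq B \Rightarrow \bar{f}(A) \subsetneq \bar{f}(B)$ if and only if $f$ is injective.
(vii′) $\forall A, B \in \mathbb{P}(X),\ A \subsetneq B \Leftrightarrow \bar{f}(A) \subsetneq \bar{f}(B)$ if and only if $f$ is injective.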


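PROOF: For part (i), note that $\bar{f}(\emptyset) = \{f(x);\ x \in \emptyset\} = \emptyset$. A function $f : X \to Y$ satisfies $f(x) \in Y$ for all
$x \in X$, by Definition 10.2.2 (iii). So $\bar{f}(X) = \{f(x);\ x \in X\} \subseteq Y$.

For part (i′), let $f : X \to Y$ be surjective. Then $\forall y \in Y,\ \exists x \in X,\ f(x) = y$, by Definition 10.5.2. Let $y \in Y$.
Then $\exists x \in X,\ f(x) = y$. So $y \in \{f(x);\ x \in X\} = \{z;\ \exists x \in X,\ z = f(x)\}$. So $Y \subseteq \{f(x);\ x \in X\} = \bar{f}(X)$.
Hence $\bar{f}(X) = Y$ by part (i). Now let $\bar{f}(X) = Y$. Then $\{f(x);\ x \in X\} = Y$. So $\forall y \in Y,\ \exists x \in X,\ f(x) = y$.
So $f$ is surjective.

For part (ii), let $A, B \in \mathbb{P}(X)$. Suppose that $A \subseteq B$. Let $y \in \bar{f}(A)$. Then $\exists x \in A,\ y = f(x)$. Let $x \in A$
satisfy $y = f(x)$. Then $x \in B$. So $\exists x,\ (x \in B \wedge y = f(x))$. In other words, $\exists x \in B,\ y = f(x)$. So
$y \in \{z;\ \exists x \in B,\ z = f(x)\} = \bar{f}(B)$. Hence $\bar{f}(A) \subseteq \bar{f}(B)$.

For part (ii′), suppose that $f$ is injective. Let $A, B \in \mathbb{P}(X)$. Then $A \subseteq B \Rightarrow \bar{f}(A) \subseteq \bar{f}(B)$ by part (ii).
Suppose that $\bar{f}(A) \subseteq \bar{f}(B)$. Let $x \in A$. Then $f(x) \in \bar{f}(A)$. So $f(x) \in \bar{f}(B)$. So $f(x) = f(x')$ for some $x' \in B$
by Definition 10.6.4 (i). Then $x = x'$ by the injectivity of $f$. So $x \in B$. Hence $\bar{f}(A) \subseteq \bar{f}(B) \Rightarrow A \subseteq B$.
So $A \subseteq B \Leftrightarrow \bar{f}(A) \subseteq \bar{f}(B)$. To show the converse, assume that $\forall A, B \in \mathbb{P}(X),\ A \subseteq B \Leftrightarrow \bar{f}(A) \subseteq \bar{f}(B)$.
Let $x, x' \in X$ with $f(x) = f(x')$. Let $A = \{x\}$ and $B = \{x'\}$. Then $\bar{f}(A) = \{f(x)\} = \{f(x')\} = \bar{f}(B)$.
Therefore $A = B$. So $x = x'$. Hence $f$ is injective.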
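
For part (iii), let $A, B \in \mathbb{P}(X)$. Let $y \in \bar{f}(A \cup B)$. Then $y = f(x)$ for some $x \in A \cup B$. So $x \in A$ or $x \in B$
(or both). Suppose $x \in A$. Then $f(x) \in \bar{f}(A)$. So $y \in \bar{f}(A) \cup \bar{f}(B)$. The same follows if $x \in B$. Hence
$\bar{f}(A \cup B) \subseteq \bar{f}(A) \cup \bar{f}(B)$. Now suppose that $y \in \bar{f}(A) \cup \bar{f}(B)$. Then either $y \in \bar{f}(A)$ or $y \in \bar{f}(B)$. Suppose
that $y \in \bar{f}(A)$. Then $y = f(x)$ for some $x \in A$. Then $x \in A \cup B$. So $f(x) \in \bar{f}(A \cup B)$. Similarly if $y \in \bar{f}(B)$.
Hence $\bar{f}(A) \cup \bar{f}(B) \subseteq \bar{f}(A \cup B)$. So $\bar{f}(A \cup B) = \bar{f}(A) \cup \bar{f}(B)$.

For part (iv), let $A, B \in \mathbb{P}(X)$. Then $\bar{f}(A \cap B) \subseteq \bar{f}(A)$ and $\bar{f}(A \cap B) \subseteq \bar{f}(B)$ by part (ii). Hence
$\bar{f}(A \cap B) \subseteq \bar{f}(A) \cap \bar{f}(B)$ by Theorem 8.1.7 (xii).

For part (iv′), suppose that $f$ is injective. Let $A, B \in \mathbb{P}(X)$. Then $\bar{f}(A \cap B) \subseteq \bar{f}(A) \cap \bar{f}(B)$ by part (iv).
Let $y \in \bar{f}(A) \cap \bar{f}(B)$. Then $y \in \bar{f}(A)$ and $y \in \bar{f}(B)$. So $y = f(x)$ for some $x \in A$, and $y = f(x')$ for
some $x' \in B$. Then $x = x'$ by the injectivity of $f$. So $x = x' \in A \cap B$. So $y \in \bar{f}(A \cap B)$. Therefore
$\bar{f}(A) \cap \bar{f}(B) \subseteq \bar{f}(A \cap B)$. Hence $\bar{f}(A \cap B) = \bar{f}(A) \cap \bar{f}(B)$. To show the converse, assume that
$\forall A, B \in \mathbb{P}(X),\ \bar{f}(A \cap B) = \bar{f}(A) \cap \bar{f}(B)$. Let $x, x' \in X$ with $f(x) = f(x')$. Let $A = \{x\}$ and $B = \{x'\}$. Then
$\bar{f}(A) = \{f(x)\} = \{f(x')\} = \bar{f}(B)$. So $\bar{f}(A \cap B) = \bar{f}(A) \cap \bar{f}(B) = \{f(x)\} \ne \emptyset$. So $A \cap B \ne \emptyset$. So $x = x'$.
Hence $f$ is injective.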
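
For part (v), $X = (X \setminus A) \cup A$ by Theorem 8.2.5 (ix). So $\bar{f}(X) = \bar{f}((X \setminus A) \cup A) = \bar{f}(X \setminus A) \cup \bar{f}(A)$ by
part (iii). So $\bar{f}(X) \setminus \bar{f}(A) = (\bar{f}(X \setminus A) \cup \bar{f}(A)) \setminus \bar{f}(A) = \bar{f}(X \setminus A) \setminus \bar{f}(A) \subseteq \bar{f}(X \setminus A)$ by Theorem 8.2.5 (xix).

For part (v′), suppose that $f$ is injective. Let $A \in \mathbb{P}(X)$. Then $\bar{f}(X \setminus A) \supseteq \bar{f}(X) \setminus \bar{f}(A)$ by part (v).
Let $y \in \bar{f}(X \setminus A)$. Then $y = f(x)$ for some $x \in X \setminus A$. Then $x \in X$ and $x \notin A$ for such $x$. Suppose
that $y \in \bar{f}(A)$. Then $y = f(x')$ for some $x' \in A$. But then $x = x'$ by the injectivity of $f$. So $x \in A$,
which is a contradiction. Therefore $y \notin \bar{f}(A)$. So $y \in \bar{f}(X) \setminus \bar{f}(A)$. Hence $\bar{f}(X \setminus A) \subseteq \bar{f}(X) \setminus \bar{f}(A)$, and so
$\bar{f}(X \setminus A) = \bar{f}(X) \setminus \bar{f}(A)$. For the converse, assume that $\forall A \in \mathbb{P}(X),\ \bar{f}(X \setminus A) = \bar{f}(X) \setminus \bar{f}(A)$. Let $x, x' \in X$
satisfy $f(x) = f(x') = y$. Let $A = \{x\}$. Then $\bar{f}(A) = \{y\}$. So $y \notin \bar{f}(X) \setminus \bar{f}(A)$. Suppose that $x \ne x'$. Then
$x' \in X \setminus A$. So $y \in \bar{f}(X \setminus A)$, which contradicts the assumption. So $x = x'$. Hence $f$ is injective.

For part (vi), $B = (B \setminus A) \cup (B \cap A)$ by Theorem 8.2.5 (xi). So $\bar{f}(B) = \bar{f}(B \setminus A) \cup \bar{f}(B \cap A)$ by part (iii). So
$\bar{f}(B) \subseteq \bar{f}(B \setminus A) \cup \bar{f}(A)$ by part (ii). By Theorem 8.2.5 (xix), $(\bar{f}(B \setminus A) \cup \bar{f}(A)) \setminus \bar{f}(A) = \bar{f}(B \setminus A) \setminus \bar{f}(A)$.
So $\bar{f}(B) \setminus \bar{f}(A) \subseteq \bar{f}(B \setminus A) \setminus \bar{f}(A) \subseteq \bar{f}(B \setminus A)$.

For part (vi′), let $f$ be injective. Let $A, B \in \mathbb{P}(X)$. Then $\bar{f}(B \setminus A) = \bar{f}(B \cap (X \setminus A)) = \bar{f}(B) \cap \bar{f}(X \setminus A) =
\bar{f}(B) \cap (\bar{f}(X) \setminus \bar{f}(A))$ by parts (iv′) and (v′). So $\bar{f}(B \setminus A) = \bar{f}(B) \setminus \bar{f}(A)$ by part (ii) and Theorem 8.2.6 (xx).
Now suppose that $f$ is not injective. Then there exist distinct $x_1, x_2 \in X$ with $f(x_1) = f(x_2) = y$. Let $A = \{x_1\}$
and $B = \{x_1, x_2\}$. So $\bar{f}(B \setminus A) = \{y\}$ and $\bar{f}(B) \setminus \bar{f}(A) = \{y\} \setminus \{y\} = \emptyset \ne \bar{f}(B \setminus A)$. Hence the assertion.

For part (vii), suppose that $f$ is injective. Let $A, B \in \mathbb{P}(X)$ with $A \subsetneq B$. Then $\bar{f}(A) \subseteq \bar{f}(B)$ by part (ii), and
$B \setminus A \ne \emptyset$. So $\bar{f}(B \setminus A) \ne \emptyset$. Therefore $\bar{f}(B) \setminus \bar{f}(A) \ne \emptyset$ by part (vi′). So $\bar{f}(A) \subsetneq \bar{f}(B)$. Now suppose that
$f$ is not injective. Then there exist distinct $x_1, x_2 \in X$ with $f(x_1) = f(x_2) = y$. Let $A = \{x_1\}$ and $B = \{x_1, x_2\}$.
Then $A \subsetneq B$, but $\bar{f}(A) = \{y\} = \bar{f}(B)$. So it is not true that $\bar{f}(A) \subsetneq \bar{f}(B)$. The assertion follows.

For part (vii′), let $f$ be injective. Let $A, B \in \mathbb{P}(X)$ with $A \subsetneq B$. Then $\bar{f}(A) \subsetneq \bar{f}(B)$ by part (vii). Now
suppose that $\bar{f}(A) \subsetneq \bar{f}(B)$. Then $A \subseteq B$ by part (ii′), and $\bar{f}(B) \setminus \bar{f}(A) \ne \emptyset$. So $\bar{f}(B \setminus A) \ne \emptyset$ by part (vi′).
Therefore $B \setminus A \ne \emptyset$ because $\bar{f}(\emptyset) = \emptyset$. So $A \subsetneq B$. Thus $A \subsetneq B \Leftrightarrow \bar{f}(A) \subsetneq \bar{f}(B)$. Now suppose that $f$ is not
injective. Then by part (vii), there exist $A, B \in \mathbb{P}(X)$ for which it is not true that $A \subsetneq B \Rightarrow \bar{f}(A) \subsetneq \bar{f}(B)$.
So it is a-fortiori not true that $A \subsetneq B \Leftrightarrow \bar{f}(A) \subsetneq \bar{f}(B)$. The assertion follows.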
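
Parts (iii) and (iv′) can be checked by brute force on small finite sets. The following Python sketch assumes that a function is modelled as a dict. All of the helper names are hypothetical. It tests the two identities over every pair of subsets of the domain.

```python
from itertools import combinations

def subsets(X):
    """All subsets of the finite set X."""
    elems = list(X)
    return [set(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]

def set_map(f, A):
    """Image of the subset A under the function f."""
    return {f[x] for x in A}

def is_injective(f):
    return len(set(f.values())) == len(f)

def preserves_unions(f):
    P = subsets(f.keys())
    return all(set_map(f, A | B) == set_map(f, A) | set_map(f, B) for A in P for B in P)

def preserves_intersections(f):
    P = subsets(f.keys())
    return all(set_map(f, A & B) == set_map(f, A) & set_map(f, B) for A in P for B in P)

f_inj = {1: "a", 2: "b", 3: "c"}
f_non = {1: "a", 2: "a", 3: "b"}
```

Unions are preserved by both functions, but intersections are preserved only by the injective one. For `f_non`, the sets `{1}` and `{2}` are disjoint although their images coincide.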




10.6.8 REMARK: Relations between function restrictions, domains, ranges and set-maps. 
In terms of the function restriction operation in Definition 10.4.3, the relations in Theorem 10.6.9 follow 
almost trivially. But checking the logic is a good exercise. 


10.6.9 THEOREM: The range of a restricted function equals the image of the restricted domain. 
Let f be a function. Let A be a set. 


(i) $\mathrm{Range}(f) = \bar{f}(\mathrm{Dom}(f))$.
(ii) $\mathrm{Range}(f|_A) = \bar{f}(A)$.

PROOF: For part (i), $\bar{f}(\mathrm{Dom}(f)) = \{y;\ \exists x \in \mathrm{Dom}(f),\ (x, y) \in f\}$ by Definition 10.6.4 (i). But $x \in \mathrm{Dom}(f)$
if and only if $\exists z,\ (x, z) \in f$, by Definition 9.5.4. So $\bar{f}(\mathrm{Dom}(f)) = \{y;\ \exists x,\ ((\exists z,\ (x, z) \in f) \text{ and } (x, y) \in f)\}$.
But $(x, y) \in f$ implies $\exists z,\ (x, z) \in f$. So the proposition “$(\exists z,\ (x, z) \in f)$ and $(x, y) \in f$” is equivalent to the
proposition “$(x, y) \in f$” by Theorem 4.7.9 (lxx). Therefore $\bar{f}(\mathrm{Dom}(f)) = \{y;\ \exists x,\ (x, y) \in f\}$, which equals
$\mathrm{Range}(f)$ by Definition 9.5.4.

For part (ii), $f|_A = \{(x, y) \in f;\ x \in A\}$ by Definition 10.4.3. So $\mathrm{Range}(f|_A) = \{y;\ \exists x \in A,\ (x, y) \in f\}$ by
Definition 9.5.4, which equals $\bar{f}(A)$ by Definition 10.6.4 (i).


10.6.10 THEOREM: Properties of function inverse set-maps. 
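Let $f : X \to Y$ be a function. Let $\bar{f}^{-1} : \mathbb{P}(Y) \to \mathbb{P}(X)$ denote the inverse set-map for $f$.

(i) $\bar{f}^{-1}(\emptyset) = \emptyset$ and $\bar{f}^{-1}(Y) = X$.
(i′) $\forall A \in \mathbb{P}(Y),\ \bar{f}^{-1}(A) = \emptyset \Leftrightarrow A = \emptyset$ if and only if $f$ is surjective.
(i″) $\forall A \in \mathbb{P}(Y) \setminus \{\emptyset\},\ \bar{f}^{-1}(A) \ne \emptyset$ if and only if $f$ is surjective.
(ii) $\forall A, B \in \mathbb{P}(Y),\ A \subseteq B \Rightarrow \bar{f}^{-1}(A) \subseteq \bar{f}^{-1}(B)$.
(ii′) $\forall A, B \in \mathbb{P}(Y),\ A \subseteq B \Leftrightarrow \bar{f}^{-1}(A) \subseteq \bar{f}^{-1}(B)$ if and only if $f$ is surjective.
(ii″) $\forall A, B \in \mathbb{P}(Y),\ A = B \Leftrightarrow \bar{f}^{-1}(A) = \bar{f}^{-1}(B)$ if and only if $f$ is surjective.
(iii) $\forall A, B \in \mathbb{P}(Y),\ \bar{f}^{-1}(A \cup B) = \bar{f}^{-1}(A) \cup \bar{f}^{-1}(B)$.
(iv) $\forall A, B \in \mathbb{P}(Y),\ \bar{f}^{-1}(A \cap B) = \bar{f}^{-1}(A) \cap \bar{f}^{-1}(B)$.
(v) $\forall A, B \in \mathbb{P}(Y),\ \bar{f}^{-1}(A \setminus B) = \bar{f}^{-1}(A) \setminus \bar{f}^{-1}(B)$.
(v′) $\forall A \in \mathbb{P}(Y),\ \bar{f}^{-1}(Y \setminus A) = X \setminus \bar{f}^{-1}(A)$.
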
PROOF: For part (i), suppose that $x \in \bar{f}^{-1}(\emptyset)$. Then $x \in X$ and $f(x) \in \emptyset$ by Definition 10.6.4 (ii). But
this contradicts Definition 10.2.2 (iii) for a function $f : X \to Y$. So $x \notin \bar{f}^{-1}(\emptyset)$. Hence $\bar{f}^{-1}(\emptyset) = \emptyset$. By
Definition 10.6.4 (ii), $\bar{f}^{-1}(Y) = \{x \in X;\ f(x) \in Y\}$. But this equals $X$ because $f(x) \in Y$ for all $x \in X$.

For part (i′), suppose that $f$ is surjective. Let $A \in \mathbb{P}(Y)$. If $A = \emptyset$, then $\bar{f}^{-1}(A) = \emptyset$ by part (i). Suppose
that $\bar{f}^{-1}(A) = \emptyset$. Let $y \in A$. Then $y = f(x)$ for some $x \in X$, by the surjectivity of $f$. Then $x \in \bar{f}^{-1}(A)$,
which contradicts the assumption. So $A = \emptyset$. To show the converse, assume that $\forall A \in \mathbb{P}(Y),\ \bar{f}^{-1}(A) = \emptyset \Leftrightarrow A = \emptyset$.
Let $y \in Y$. Let $A = \{y\}$. Then $A \ne \emptyset$. So $\bar{f}^{-1}(A) \ne \emptyset$. So $x \in \bar{f}^{-1}(A)$ for some $x \in X$. Thus
$y = f(x)$ for some $x \in X$. Hence $f$ is surjective.

Part (i″) follows directly from parts (i) and (i′).

For part (ii), let $A, B \in \mathbb{P}(Y)$ satisfy $A \subseteq B$. Let $x \in \bar{f}^{-1}(A)$. Then $x \in X$ and $f(x) \in A$ by
Definition 10.6.4 (ii). So $f(x) \in B$. So $x \in \bar{f}^{-1}(B)$ by Definition 10.6.4 (ii). Hence $\bar{f}^{-1}(A) \subseteq \bar{f}^{-1}(B)$.

For part (ii′), suppose that $f$ is surjective. Let $A, B \in \mathbb{P}(Y)$. If $A \subseteq B$, then $\bar{f}^{-1}(A) \subseteq \bar{f}^{-1}(B)$
by part (ii). Suppose that $\bar{f}^{-1}(A) \subseteq \bar{f}^{-1}(B)$. Let $y \in A$. Then $y = f(x)$ for some $x \in \bar{f}^{-1}(A)$, by the
surjectivity of $f$ and Definition 10.6.4 (ii). Then $x \in \bar{f}^{-1}(B)$ for such $x$. So $y = f(x) \in B$ by Definition 10.6.4 (ii). Hence $A \subseteq B$.
To show the converse, assume that $\forall A, B \in \mathbb{P}(Y),\ A \subseteq B \Leftrightarrow \bar{f}^{-1}(A) \subseteq \bar{f}^{-1}(B)$. Let $y \in Y$. Let $A = \{y\}$.
Suppose that $\bar{f}^{-1}(A) = \emptyset$. Let $B = \emptyset$. Then $\bar{f}^{-1}(B) = \emptyset$ by part (i). So $\bar{f}^{-1}(A) \subseteq \bar{f}^{-1}(B)$. So $A \subseteq B$. That
is, $\{y\} \subseteq \emptyset$, which is a contradiction. So $\bar{f}^{-1}(A) \ne \emptyset$. So $y = f(x)$ for some $x \in X$. Hence $f$ is surjective.

For part (ii″), suppose that $f$ is surjective. Let $A, B \in \mathbb{P}(Y)$. Then $A = B \Leftrightarrow \bar{f}^{-1}(A) = \bar{f}^{-1}(B)$ by a double
application of part (ii′). To show the converse, assume that $\forall A, B \in \mathbb{P}(Y),\ A = B \Leftrightarrow \bar{f}^{-1}(A) = \bar{f}^{-1}(B)$.
Let $y \in Y$. Let $A = \{y\}$. Suppose that $\bar{f}^{-1}(A) = \emptyset$. Let $B = \emptyset$. Then $\bar{f}^{-1}(B) = \emptyset$ by part (i). So
$\bar{f}^{-1}(A) = \bar{f}^{-1}(B)$. So $A = B$. That is, $\{y\} = \emptyset$, which is a contradiction. So $\bar{f}^{-1}(A) \ne \emptyset$. So $y = f(x)$ for
some $x \in X$. Hence $f$ is surjective.


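For part (iii), let $A, B \in \mathbb{P}(Y)$. Let $x \in \bar{f}^{-1}(A \cup B)$. Then $x \in X$ and $f(x) \in A \cup B$. So $f(x) \in A$ or $f(x) \in B$
(or both). If $f(x) \in A$, then $x \in \bar{f}^{-1}(A)$. So $x \in \bar{f}^{-1}(A) \cup \bar{f}^{-1}(B)$. This follows similarly for $f(x) \in B$.
Hence $\bar{f}^{-1}(A \cup B) \subseteq \bar{f}^{-1}(A) \cup \bar{f}^{-1}(B)$. Now suppose that $x \in \bar{f}^{-1}(A) \cup \bar{f}^{-1}(B)$. Then $x \in \bar{f}^{-1}(A)$
or $x \in \bar{f}^{-1}(B)$ (or both). Suppose that $x \in \bar{f}^{-1}(A)$. Then $f(x) \in A$. So $f(x) \in A \cup B$. So $x \in \bar{f}^{-1}(A \cup B)$,
and similarly if $x \in \bar{f}^{-1}(B)$. Hence $\bar{f}^{-1}(A \cup B) \supseteq \bar{f}^{-1}(A) \cup \bar{f}^{-1}(B)$. So $\bar{f}^{-1}(A \cup B) = \bar{f}^{-1}(A) \cup \bar{f}^{-1}(B)$.

For part (iv), let $A, B \in \mathbb{P}(Y)$. Let $x \in \bar{f}^{-1}(A \cap B)$. Then $x \in X$ and $f(x) \in A \cap B$. So $f(x) \in A$ and
$f(x) \in B$. So $x \in \bar{f}^{-1}(A)$ and $x \in \bar{f}^{-1}(B)$. So $x \in \bar{f}^{-1}(A) \cap \bar{f}^{-1}(B)$. Hence $\bar{f}^{-1}(A \cap B) \subseteq \bar{f}^{-1}(A) \cap \bar{f}^{-1}(B)$.
Now suppose that $x \in \bar{f}^{-1}(A) \cap \bar{f}^{-1}(B)$. Then $x \in \bar{f}^{-1}(A)$ and $x \in \bar{f}^{-1}(B)$. So $f(x) \in A$ and $f(x) \in B$.
So $f(x) \in A \cap B$. So $x \in \bar{f}^{-1}(A \cap B)$. Hence $\bar{f}^{-1}(A \cap B) \supseteq \bar{f}^{-1}(A) \cap \bar{f}^{-1}(B)$.

For part (v), let $A, B \in \mathbb{P}(Y)$. Let $x \in \bar{f}^{-1}(A \setminus B)$. Then $x \in X$ and $f(x) \in A \setminus B$. So $f(x) \in A$
and $f(x) \notin B$. So $x \in \bar{f}^{-1}(A)$ and $x \notin \bar{f}^{-1}(B)$ by Definition 10.6.4 (ii). So $x \in \bar{f}^{-1}(A) \setminus \bar{f}^{-1}(B)$. Now
suppose that $x \in \bar{f}^{-1}(A) \setminus \bar{f}^{-1}(B)$. Then $x \in \bar{f}^{-1}(A)$ and $x \notin \bar{f}^{-1}(B)$. So $f(x) \in A$ and $f(x) \notin B$ by
Definition 10.6.4 (ii). So $f(x) \in A \setminus B$. So $x \in \bar{f}^{-1}(A \setminus B)$. Hence $\bar{f}^{-1}(A \setminus B) = \bar{f}^{-1}(A) \setminus \bar{f}^{-1}(B)$.

Part (v′) follows directly from part (v) by substituting $Y$ for $A$ and $A$ for $B$, and noting that $\bar{f}^{-1}(Y) = X$
by part (i).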
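
As a finite sanity check of parts (iii), (iv) and (v), the following Python sketch tests the three identities over all pairs of subsets of the target set, for a function which is neither injective nor surjective. It assumes that the function is a dict. The helper names are hypothetical.

```python
from itertools import combinations

def subsets(Y):
    """All subsets of the finite set Y."""
    elems = list(Y)
    return [set(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]

def inverse_set_map(f, B):
    """Pre-image of B: {x in X; f(x) in B}."""
    return {x for x in f if f[x] in B}

def inverse_set_map_preserves_operations(f, Y):
    inv = lambda B: inverse_set_map(f, B)
    P = subsets(Y)
    return all(
        inv(A | B) == inv(A) | inv(B)
        and inv(A & B) == inv(A) & inv(B)
        and inv(A - B) == inv(A) - inv(B)
        for A in P for B in P
    )

f = {1: "a", 2: "a", 3: "b"}
Y = {"a", "b", "c"}
```

No injectivity or surjectivity hypothesis is needed for these identities, in contrast to the corresponding parts of Theorem 10.6.7.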


10.6.11 REMARK: Inverse set-maps have simpler properties than function set-maps. 

The reason for the simplicity of Theorem 10.6.10 parts (iii), (iv) and (v′), compared to the corresponding
Theorem 10.6.7 parts (iii), (iv), (iv′), (v) and (v′), is the fact that the inverse of a function is automatically
injective and surjective, even if the inverse is not a function.


10.6.12 THEOREM: Set-maps of restrictions of functions. 
Let f : X — Y be a function. 


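(i) $\forall U \in \mathbb{P}(X),\ \forall S \in \mathbb{P}(X),\ \overline{f|_U}(S) = \bar{f}(U \cap S)$.
(ii) $\forall U \in \mathbb{P}(X),\ \forall S \in \mathbb{P}(Y),\ \overline{f|_U}^{-1}(S) = U \cap \bar{f}^{-1}(S)$.

PROOF: For part (i), $\overline{f|_U}(S) = \{y;\ \exists x \in S,\ (x, y) \in f|_U\} = \{y;\ \exists x \in S,\ ((x, y) \in f \text{ and } x \in U)\}$ by
Definitions 10.6.4 (i) and 10.4.3. So $\overline{f|_U}(S) = \{y;\ \exists x \in U \cap S,\ (x, y) \in f\} = \bar{f}(U \cap S)$ by Definition 10.6.4 (i).

For part (ii), $\mathrm{Dom}(f|_U) = U \cap \mathrm{Dom}(f)$ by Theorem 10.4.5 line (10.4.2). Then by Definition 10.6.4 (ii),
$\overline{f|_U}^{-1}(S) = \{x \in U \cap \mathrm{Dom}(f);\ f|_U(x) \in S\}$. But $f|_U(x) = f(x)$ for all $x \in U \cap \mathrm{Dom}(f)$. So $\overline{f|_U}^{-1}(S) =
\{x \in U \cap \mathrm{Dom}(f);\ f(x) \in S\} = U \cap \{x \in \mathrm{Dom}(f);\ f(x) \in S\} = U \cap \bar{f}^{-1}(S)$ by Definition 10.6.4 (ii).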


10.6.13 THEOREM: Decomposition of set-maps into their values for singletons. 
Let f : X — Y be a function. 


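(i) $\mathrm{Range}(f) = \bigcup_{x \in X} \bar{f}(\{x\})$.
(ii) $\mathrm{Dom}(f) = \bigcup_{y \in Y} \bar{f}^{-1}(\{y\})$.

PROOF: For part (i), let $y \in \mathrm{Range}(f)$. Then $y = f(x)$ for some $x \in X$ by Notation 10.2.10 and Definitions
9.5.4 and 10.2.2. Then $f(x) \in \bar{f}(\{x\})$ for such $x$ by Definition 10.6.4 (i). So $y \in \bar{f}(\{x\})$. So $y \in \bigcup_{x \in X} \bar{f}(\{x\})$
by Notation 8.4.10. Thus $\mathrm{Range}(f) \subseteq \bigcup_{x \in X} \bar{f}(\{x\})$.

Now suppose that $y \in \bigcup_{x \in X} \bar{f}(\{x\})$. Then $y \in \bar{f}(\{x\})$ for some $x \in X$ by Notation 8.4.10. So $y = f(x)$ by
Definition 10.6.4 (i). Therefore $y \in \mathrm{Range}(f)$ by Definitions 9.5.4 and 10.2.2. Thus $\mathrm{Range}(f) \supseteq \bigcup_{x \in X} \bar{f}(\{x\})$.
Hence $\mathrm{Range}(f) = \bigcup_{x \in X} \bar{f}(\{x\})$.

For part (ii), let $x \in \mathrm{Dom}(f)$. Let $y = f(x)$. Then $y \in Y$, and $f(x) \in \{y\}$ by Definition 7.5.6. So $x \in \bar{f}^{-1}(\{y\})$
by Definition 10.6.4 (ii). Therefore $x \in \bigcup_{y \in Y} \bar{f}^{-1}(\{y\})$ by Notation 8.4.10. Thus $\mathrm{Dom}(f) \subseteq \bigcup_{y \in Y} \bar{f}^{-1}(\{y\})$.

Now suppose that $x \in \bigcup_{y \in Y} \bar{f}^{-1}(\{y\})$. Then $x \in \bar{f}^{-1}(\{y\})$ for some $y \in Y$ by Notation 8.4.10. Therefore
$x \in X$ and $f(x) \in \{y\}$ by Definition 10.6.4 (ii). So $f(x) = y$ by Notation 7.5.7. Therefore $x \in \mathrm{Dom}(f)$.
Thus $\mathrm{Dom}(f) \supseteq \bigcup_{y \in Y} \bar{f}^{-1}(\{y\})$. Hence $\mathrm{Dom}(f) = \bigcup_{y \in Y} \bar{f}^{-1}(\{y\})$.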


10.7. Mixed function set-maps and double set-maps 


10.7.1 THEOREM: Properties of compositions of mixed function set-maps and inverse set-maps. 
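Let $f : X \to Y$ for sets $X$ and $Y$. Let $\bar{f}$ and $\bar{f}^{-1}$ be the set-maps in Definition 10.6.4.

(i) $\forall S \in \mathbb{P}(Y),\ \bar{f}(\bar{f}^{-1}(S)) = S \cap \bar{f}(X)$.
(i′) $\forall S \in \mathbb{P}(Y),\ \bar{f}(\bar{f}^{-1}(S)) \subseteq S$.
(i″) $\forall S \in \mathbb{P}(Y),\ \bar{f}(\bar{f}^{-1}(S)) = S$ if and only if $f$ is surjective.
(ii) $\forall S \in \mathbb{P}(X),\ \bar{f}^{-1}(\bar{f}(S)) \supseteq S$.
(ii′) $\forall S \in \mathbb{P}(X),\ \bar{f}^{-1}(\bar{f}(S)) = S$ if and only if $f$ is injective.
(iii) $\forall S \in \mathbb{P}(Y),\ \bar{f}^{-1}(S) = \bar{f}^{-1}(S \cap \bar{f}(X))$.
(iii′) $\forall S \in \mathbb{P}(Y),\ \bar{f}^{-1}(S) = \emptyset \Leftrightarrow S \cap \bar{f}(X) = \emptyset$.
(iii″) $\forall S \in \mathbb{P}(\bar{f}(X)),\ \bar{f}^{-1}(S) = \emptyset \Leftrightarrow S = \emptyset$.
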
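PROOF: For part (i), let $f : X \to Y$ for sets $X$ and $Y$. Let $S \in \mathbb{P}(Y)$. Then
\begin{align*}
\forall y \in Y,\quad y \in \bar{f}(\bar{f}^{-1}(S)) &\Leftrightarrow \exists x \in \bar{f}^{-1}(S),\ f(x) = y \\
&\Leftrightarrow y \in S \wedge \exists x,\ f(x) = y \\
&\Leftrightarrow y \in S \wedge y \in \bar{f}(X) \\
&\Leftrightarrow y \in S \cap \bar{f}(X).
\end{align*}

Parts (i′) and (i″) follow directly from part (i).

For part (ii), let $f : X \to Y$ for sets $X$ and $Y$. Let $S \in \mathbb{P}(X)$. Let $x \in S$. Then $f(x) \in \bar{f}(S)$. So $x \in \bar{f}^{-1}(\bar{f}(S))$ by
Definition 10.6.4 (ii). Hence $S \subseteq \bar{f}^{-1}(\bar{f}(S))$.

For part (ii′), let $f : X \to Y$ be an injective function for sets $X$ and $Y$. Let $S \in \mathbb{P}(X)$. Let $x \in \bar{f}^{-1}(\bar{f}(S))$. Then $x \in X$ and
$f(x) \in \bar{f}(S)$ by Definition 10.6.4 (ii). So $f(x) = f(x')$ for some $x' \in S$ by Definition 10.6.4 (i). But $x = x'$
by the injectivity of $f$. So $x \in S$. Hence $\forall S \in \mathbb{P}(X),\ \bar{f}^{-1}(\bar{f}(S)) = S$ by part (ii).

For the converse of part (ii′), let $f$ satisfy $\forall S \in \mathbb{P}(X),\ \bar{f}^{-1}(\bar{f}(S)) = S$. Let $x, x' \in X$ satisfy $f(x) = f(x')$.
Then $\bar{f}^{-1}(\bar{f}(\{x\})) = \{x\}$ and $\bar{f}^{-1}(\bar{f}(\{x'\})) = \{x'\}$. Let $y = f(x) = f(x')$. Then $\bar{f}(\{x\}) = \{y\}$ and
$\bar{f}(\{x'\}) = \{y\}$. So $\bar{f}^{-1}(\bar{f}(\{x\})) = \bar{f}^{-1}(\bar{f}(\{x'\}))$. So $x = x'$. Hence $f$ is injective.

For part (iii), let $S \in \mathbb{P}(Y)$. Then $\bar{f}^{-1}(S \cap \bar{f}(X)) = \bar{f}^{-1}(S) \cap \bar{f}^{-1}(\bar{f}(X)) \supseteq \bar{f}^{-1}(S) \cap X = \bar{f}^{-1}(S)$ by
Theorem 10.6.10 (iv) and part (ii). But $\bar{f}^{-1}(S \cap \bar{f}(X)) \subseteq \bar{f}^{-1}(S)$ by Theorems 10.6.10 (iv) and 8.1.6 (iii).
Therefore $\bar{f}^{-1}(S) = \bar{f}^{-1}(S \cap \bar{f}(X))$.

For part (iii′), let $S \in \mathbb{P}(Y)$. Suppose that $\bar{f}^{-1}(S) = \emptyset$. Then $\bar{f}(\bar{f}^{-1}(S)) = \emptyset$ by Theorem 10.6.7 (i).
Therefore $S \cap \bar{f}(X) = \emptyset$ by part (i). Now suppose that $S \cap \bar{f}(X) = \emptyset$. Then $\bar{f}^{-1}(S) = \emptyset$ by part (iii) and
Theorem 10.6.10 (i).

For part (iii″), let $S \in \mathbb{P}(\bar{f}(X))$. Then $S \cap \bar{f}(X) = S$ by Theorem 8.1.7 (v). Therefore $\bar{f}^{-1}(S) = \emptyset \Leftrightarrow S = \emptyset$
by part (iii′).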
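
Parts (i) and (ii) can be illustrated with a short Python sketch on a function which is neither injective nor surjective. It assumes that the function is a dict. The helper names are hypothetical.

```python
def set_map(f, A):
    """Image of the subset A under f."""
    return {f[x] for x in A}

def inverse_set_map(f, B):
    """Pre-image of the subset B under f."""
    return {x for x in f if f[x] in B}

f = {1: "a", 2: "a", 3: "b"}            # X = {1, 2, 3}, Y = {"a", "b", "c"}
image_of_X = set_map(f, f.keys())

S = {"b", "c"}                          # a subset of Y
down_then_up = set_map(f, inverse_set_map(f, S))

T = {1}                                 # a subset of X
up_then_down = inverse_set_map(f, set_map(f, T))
```

Here `down_then_up` equals `S & image_of_X`, which is `{"b"}`, a proper subset of `S` because `f` is not surjective. Similarly `up_then_down` is `{1, 2}`, a proper superset of `T` because `f` is not injective.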


10.7.2 THEOREM: Properties of compositions of unmixed function set-maps and inverse set-maps. 
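Let $f : X \to Y$ and $g : Y \to Z$ for sets $X$, $Y$ and $Z$. Let $h = g \circ f$. Let $\bar{f}$, $\bar{f}^{-1}$, $\bar{g}$, $\bar{g}^{-1}$, $\bar{h}$ and $\bar{h}^{-1}$ be the
corresponding function set-maps and inverse set-maps as in Definition 10.6.4.
(i) $\forall S \in \mathbb{P}(X),\ \bar{g}(\bar{f}(S)) = \bar{h}(S)$. In other words, $\bar{g} \circ \bar{f} = \bar{h}$.
(ii) $\forall S \in \mathbb{P}(Z),\ \bar{f}^{-1}(\bar{g}^{-1}(S)) = \bar{h}^{-1}(S)$. In other words, $\bar{f}^{-1} \circ \bar{g}^{-1} = \bar{h}^{-1}$.

PROOF: For part (i), let $S \in \mathbb{P}(X)$. Then by Definition 10.6.4, $\bar{h}(S) = \{h(x);\ x \in S\} = \{g(f(x));\ x \in S\}
= \{g(y);\ \exists x \in S,\ y = f(x)\} = \{g(y);\ y \in \bar{f}(S)\} = \bar{g}(\bar{f}(S))$.

For part (ii), let $S \in \mathbb{P}(Z)$. Then $\bar{h}^{-1}(S) = \{x \in X;\ h(x) \in S\} = \{x \in X;\ g(f(x)) \in S\} = \{x \in X;\
f(x) \in \bar{g}^{-1}(S)\} = \bar{f}^{-1}(\bar{g}^{-1}(S))$ by Definition 10.6.4.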


10.7.3 THEOREM: A property of combinations of function set-maps and inverse set-maps. 
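Let $f : X \to Y$ for sets $X$ and $Y$. Let $\bar{f}$ and $\bar{f}^{-1}$ be the set-maps in Definition 10.6.4.

(i) $\forall A \in \mathbb{P}(X),\ \forall B \in \mathbb{P}(Y),\ \bar{f}(A) \subseteq B \Leftrightarrow A \subseteq \bar{f}^{-1}(B)$.

PROOF: For part (i), let $A \in \mathbb{P}(X)$ and $B \in \mathbb{P}(Y)$. Suppose $\bar{f}(A) \subseteq B$. Then $\bar{f}^{-1}(\bar{f}(A)) \subseteq \bar{f}^{-1}(B)$ by
Theorem 10.6.10 (ii). But $\bar{f}^{-1}(\bar{f}(A)) \supseteq A$ by Theorem 10.7.1 (ii). Hence $A \subseteq \bar{f}^{-1}(B)$.

Now suppose that $A \subseteq \bar{f}^{-1}(B)$. Then $\bar{f}(A) \subseteq \bar{f}(\bar{f}^{-1}(B))$ by Theorem 10.6.7 (ii). But $\bar{f}(\bar{f}^{-1}(B)) \subseteq B$ by
Theorem 10.7.1 (i′). Hence $\bar{f}(A) \subseteq B$.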
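
The equivalence can be checked exhaustively for a small example. The following Python sketch assumes that the function is a dict, and all of the helper names are hypothetical. It compares the two conditions for every subset $A$ of the domain and every subset $B$ of the target set.

```python
from itertools import combinations

def subsets(X):
    """All subsets of the finite set X."""
    elems = list(X)
    return [set(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]

def set_map(f, A):
    return {f[x] for x in A}

def inverse_set_map(f, B):
    return {x for x in f if f[x] in B}

def equivalence_holds(f, Y):
    """Check that f(A) is a subset of B exactly when A is a subset of the pre-image of B."""
    return all(
        (set_map(f, A) <= B) == (A <= inverse_set_map(f, B))
        for A in subsets(f.keys())
        for B in subsets(Y)
    )

f = {1: "a", 2: "a", 3: "b"}
Y = {"a", "b", "c"}
```

The check succeeds although `f` is neither injective nor surjective, which agrees with the absence of any such hypothesis in the theorem.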


10.7.4 REMARK: “Double set-maps” are applicable in topology. 

Theorem 10.7.6 extends Theorems 10.6.7 and 10.6.10 to arbitrary sets of sets. This theorem involves the
“double set-map” $\bar{\bar{f}} : \mathbb{P}(\mathbb{P}(X)) \to \mathbb{P}(\mathbb{P}(Y))$ for the function $f$, which is defined by $\bar{\bar{f}} : S \mapsto \{\bar{f}(A);\ A \in S\}$,
and the “double inverse set-map” (the set-map of the inverse set-map) $\bar{\bar{f}}^{-1} : \mathbb{P}(\mathbb{P}(Y)) \to \mathbb{P}(\mathbb{P}(X))$ defined
by $\bar{\bar{f}}^{-1} : S \mapsto \{\bar{f}^{-1}(A);\ A \in S\}$. (See Theorem 10.8.18 for an indexed set-family version of Theorem 10.7.6.)
The double inverse set-map is applicable to the generation of topologies from inverse functions. (See for
example Theorem 32.4.2.)


10.7.5 DEFINITION: Double set-maps for functions.

The double set-map for a function f : X → Y, for sets X and Y, is the function f̿ : ℙ(ℙ(X)) → ℙ(ℙ(Y)) defined by f̿(S) = {f̄(A); A ∈ S} for all S ∈ ℙ(ℙ(X)).

The double inverse set-map for a function f : X → Y, for sets X and Y, is the function f̿⁻¹ : ℙ(ℙ(Y)) → ℙ(ℙ(X)) defined by f̿⁻¹(S) = {f̄⁻¹(A); A ∈ S} for all S ∈ ℙ(ℙ(Y)).


10.7.6 THEOREM: Properties of “double set-maps” for sets of sets. 
Let f : X → Y be a function. Let S ∈ ℙ(ℙ(X)) and S′ ∈ ℙ(ℙ(Y)).

(i) f̄(⋃S) = ⋃{f̄(A); A ∈ S} = ⋃ f̿(S).
(ii) f̄(⋂S) ⊆ ⋂{f̄(A); A ∈ S} = ⋂ f̿(S) if S ≠ ∅.
(iii) f̄⁻¹(⋃S′) = ⋃{f̄⁻¹(A); A ∈ S′} = ⋃ f̿⁻¹(S′).
(iv) f̄⁻¹(⋂S′) = ⋂{f̄⁻¹(A); A ∈ S′} = ⋂ f̿⁻¹(S′) if S′ ≠ ∅.

PROOF: For part (i), let f : X → Y be a function, and let S ∈ ℙ(ℙ(X)). Then

f̄(⋃S) = {y ∈ Y; ∃x ∈ ⋃S, f(x) = y}
      = {y ∈ Y; ∃x, (x ∈ ⋃S ∧ f(x) = y)}
      = {y ∈ Y; ∃x, ((∃A ∈ S, x ∈ A) ∧ f(x) = y)}
      = {y ∈ Y; ∃A ∈ S, ∃x, (x ∈ A ∧ f(x) = y)}
      = {y ∈ Y; ∃A ∈ S, y ∈ f̄(A)}
      = ⋃{f̄(A); A ∈ S}.

The equality ⋃{f̄(A); A ∈ S} = ⋃ f̿(S) then follows directly from Definition 10.7.5.


For part (ii), let f : X → Y be a function, and let S ∈ ℙ(ℙ(X)) \ {∅}. Then

f̄(⋂S) = {y ∈ Y; ∃x ∈ ⋂S, f(x) = y}
      = {y ∈ Y; ∃x, (x ∈ ⋂S ∧ f(x) = y)}
      = {y ∈ Y; ∃x, ((∀A ∈ S, x ∈ A) ∧ f(x) = y)}
      = {y ∈ Y; ∃x, ((∀A, (A ∈ S ⇒ x ∈ A)) ∧ f(x) = y)}
      = {y ∈ Y; ∃x, ∀A, ((A ∈ S ⇒ x ∈ A) ∧ f(x) = y)}      (10.7.1)
      ⊆ {y ∈ Y; ∀A, ∃x, ((A ∈ S ⇒ x ∈ A) ∧ f(x) = y)}      (10.7.2)
      ⊆ {y ∈ Y; ∀A, ∃x, (A ∈ S ⇒ (x ∈ A ∧ f(x) = y))}      (10.7.3)
      = {y ∈ Y; ∀A, (A ∈ S ⇒ (∃x, (x ∈ A ∧ f(x) = y)))}    (10.7.4)
      = {y ∈ Y; ∀A ∈ S, ∃x, (x ∈ A ∧ f(x) = y)}
      = {y ∈ Y; ∀A ∈ S, ∃x ∈ A, f(x) = y}
      = {y ∈ Y; ∀A ∈ S, y ∈ f̄(A)}
      = ⋂{f̄(A); A ∈ S}.

Line (10.7.1) follows from Theorem 6.6.16 (iii). Line (10.7.2) follows from Theorem 6.6.24 (iii). Line (10.7.3) follows from Theorem 4.7.9 (xxxviii). Line (10.7.4) follows from Theorem 6.6.12 (xxvii). Then the equality ⋂{f̄(A); A ∈ S} = ⋂ f̿(S) follows directly from Definition 10.7.5.
For part (iii), let f : X → Y be a function, and let S′ ∈ ℙ(ℙ(Y)). Then

f̄⁻¹(⋃S′) = {x ∈ X; f(x) ∈ ⋃S′}
         = {x ∈ X; ∃A ∈ S′, f(x) ∈ A}
         = {x ∈ X; ∃A ∈ S′, x ∈ f̄⁻¹(A)}
         = ⋃{f̄⁻¹(A); A ∈ S′}.

The equality ⋃{f̄⁻¹(A); A ∈ S′} = ⋃ f̿⁻¹(S′) then follows directly from Definition 10.7.5.

For part (iv), let f : X → Y be a function, and let S′ ∈ ℙ(ℙ(Y)) \ {∅}. Then

f̄⁻¹(⋂S′) = {x ∈ X; f(x) ∈ ⋂S′}
         = {x ∈ X; ∀A ∈ S′, f(x) ∈ A}
         = {x ∈ X; ∀A ∈ S′, x ∈ f̄⁻¹(A)}
         = ⋂{f̄⁻¹(A); A ∈ S′}.

The equality ⋂{f̄⁻¹(A); A ∈ S′} = ⋂ f̿⁻¹(S′) then follows directly from Definition 10.7.5.
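The following sketch illustrates Definition 10.7.5 and Theorem 10.7.6 on a small example, including a case where the inclusion in part (ii) is strict. It assumes that f is a Python dictionary and that sets of sets are sets of frozensets. All helper names are hypothetical.

```python
def image(f, A):
    """Set-map of f applied to the set A."""
    return frozenset(f[x] for x in A)

def preimage(f, B):
    """Inverse set-map of f applied to the set B."""
    return frozenset(x for x in f if f[x] in B)

def double_image(f, S):
    """Double set-map: apply the set-map to each member of S."""
    return {image(f, A) for A in S}

def double_preimage(f, S):
    """Double inverse set-map: apply the inverse set-map to each member of S."""
    return {preimage(f, B) for B in S}

def big_union(S):
    return frozenset().union(*S)

def big_intersection(S):
    members = list(S)
    if not members:
        raise ValueError("intersection of an empty set of sets is not defined here")
    return frozenset.intersection(*members)

f = {1: "a", 2: "a", 3: "b"}
S = {frozenset({1}), frozenset({2})}                # a set of subsets of X
S2 = {frozenset({"a"}), frozenset({"a", "b"})}      # a set of subsets of Y

# Part (ii) can be a strict inclusion because f is not injective here.
lhs = image(f, big_intersection(S))                 # image of the empty set
rhs = big_intersection(double_image(f, S))          # intersection of {"a"} with itself
```

In this example the sets {1} and {2} are disjoint, but their images coincide, so the image of the intersection is empty while the intersection of the images is not.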


10.8. Families of sets and functions 


10.8.1 REMARK: A family is a function which is thought of as an indexed set. 

Families are technically defined to be synonymous to functions. The main difference is in the notation and 
the focus on the function values. The domain of a family is regarded as a mere index set which only provides 
tags for manipulating the values of the function. A family usually has values that are all sets or all functions. 
A set of sets, or set of functions, is often provided with tags to create a family out of a set. 


Sets of things and families of things are often thought of interchangeably since a family can be constructed 
from a set by providing tags for all elements of the set, and a set can be constructed from a family as the 
range of the family. An important difference is the fact that the same object may appear twice in the range 
of a family whereas there cannot be two copies of an object in a set. So if a set is indexed and the indices
are removed, the original set is recovered. But if a family has its indices removed, it is not generally possible
to reconstruct the original family from the range of the family. 


A family of sets or functions may be thought of as an “indexed set” of sets or functions. But the indexed 
set is sometimes defined to be the range of the family. Many authors add and remove indices as required, 
without much comment. Often indices are used merely as a notational device. The great danger of this is 
that one often unconsciously assumes that the index set has some kind of order, even a well-ordering, so that 
various kinds of traversals of the range of the family can be defined, as mentioned in Remark 10.8.6. (This 
is also mentioned in Remark 33.5.3.) 


Although families have the same set representation as functions, they are thought of as being a different 
kind of object. This shows once again that mathematical objects have more significance than their set 
representation alone. A similar observation is that all functions are represented as sets. So a family of 
functions is also a family of sets, although a family of sets is not generally a family of functions. 


10.8.2 DEFINITION: A family of sets with index set I is a function S with domain I such that S(i) is a
set for all i ∈ I.


10.8.3 NOTATION: S_i, for a family of sets S and i ∈ Dom(S), denotes S(i). In other words, S_i = S(i).
(S_i)_{i∈I}, for a family of sets S such that Dom(S) = I, denotes S. In other words, S = (S_i)_{i∈I}.


10.8.4 DEFINITION: A family of functions with index set I is a function f with domain I such that f(i) is
a function for all i ∈ I.


10.8.5 NOTATION: f_i, for a family of functions f and i ∈ Dom(f), denotes f(i). In other words, f_i = f(i).
(f_i)_{i∈I}, for a family of functions f such that Dom(f) = I, denotes f. In other words, f = (f_i)_{i∈I}.


10.8.6 REMARK: The hidden agenda of families of sets and functions.

For the axiom-of-choice believer, any set of objects may be indexed by an index set which is an ordinal 
number. Then the well-ordering of the index set may be exploited in applications for the traversal of the 
whole family, visiting each element of the family once and only once. Families of sets or functions are often 
used as if the function which maps the index set to the target set gives a kind of “handle” by which the 
elements of the target set may be manipulated. For example, a set X may be known to be infinite, and 
one might write: “Let (r;);e, be a sequence of elements of X." This immediately invokes the axiom of 
(countable) choice if it is being tacitly asserted that such a sequence "exists". 


Every set X can be trivially indexed by itself. The identity function id_X : X → X on a set X, defined by id_X : x ↦ x, satisfies the requirements for an injective family of sets whose range is the set X. Thus (S_i)_{i∈I}, with I = X and S_i = i for all i ∈ I, is an injective family of sets with range X. However, such a useless indexing of a set X is often not what is implicitly signified by the introduction of a family of sets or functions. Whenever introducing a family of sets or functions, one should be conscious of whether one is implicitly assuming the existence of such a family when it cannot necessarily be readily justified.


10.8.7 NOTATION: As a convenience in notation, if two families (A_i)_{i∈I} and (B_i)_{i∈I} have the same index set I, then the family of pairs (A_i, B_i) may be denoted in abbreviated fashion as (A_i, B_i)_{i∈I} instead of the explicit notation ((A_i, B_i))_{i∈I}.

An n-tuple of families ((A^1_i)_{i∈I}, (A^2_i)_{i∈I}, … (A^n_i)_{i∈I}) for n > 1 may be denoted as (A^1_i, A^2_i, … A^n_i)_{i∈I} instead of the explicit notation ((A^1_i, A^2_i, … A^n_i))_{i∈I}.

10.8.8 REMARK: Comparison of set-family unions and intersections with set-theoretic versions. 

Notation 10.8.10 for set-families is very similar to Notation 8.4.10 for set-valued set-theoretic functions, 
but a set-theoretic function does not always have a well-defined domain. It could be defined for all ZF 
sets, for example, so that the domain would not be a ZF set. In this sense, the set-theoretic versions of 
Notation 10.8.10 are more general. 


10.8.9 DEFINITION: The union of a family of sets S = (S_i)_{i∈I} is the union of the range of S. In other
words, it is the set ⋃{S_i; i ∈ I} = {x; ∃i ∈ I, x ∈ S_i}.

The intersection of a family of sets S = (S_i)_{i∈I} such that I ≠ ∅ is the intersection of the range of S. In
other words, it is the set ⋂{S_i; i ∈ I} = {x; ∀i ∈ I, x ∈ S_i}.


10.8.10 NOTATION: ⋃_{i∈I} S_i, for a family of sets S = (S_i)_{i∈I}, denotes the union of S.
⋂_{i∈I} S_i, for a family of sets S = (S_i)_{i∈I}, denotes the intersection of S.
Alternative notations: ⋃_{i∈I} S(i) and ⋂_{i∈I} S(i).
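As an informal illustration of Definitions 10.8.2 and 10.8.9, a finite family of sets can be modelled as a Python dictionary whose keys are the indices. This representation and the helper names below are assumptions made for the example only. The example also shows the point made in Remark 10.8.1 that the range of a family may have fewer elements than the family has indices.

```python
# A family of sets (S_i) with index set I = {"i", "j", "k"}.
# The same set appears twice, under the indices "i" and "k".
family = {
    "i": frozenset({1, 2}),
    "j": frozenset({2, 3}),
    "k": frozenset({1, 2}),
}

def family_union(S):
    """Union of the range of the family S."""
    return frozenset().union(*S.values())

def family_intersection(S):
    """Intersection of the range of the family S (index set must be non-empty)."""
    if not S:
        raise ValueError("the intersection requires a non-empty index set")
    return frozenset.intersection(*S.values())

# Removing the indices gives the range, which forgets the repetition.
range_of_family = set(family.values())
```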


10.8.11 REMARK: The union and intersection of the range of a set-valued formula. 
The replacement axiom, Definition 7.2.4 (6), guarantees that {y; ∃x ∈ X, y = A(x)} = {A(x); x ∈ X} is a well-defined set via Theorem 7.7.2. The union axiom, Definition 7.2.4 (4), then guarantees that the union of this set is also a set. So the construction ⋃_{x∈X} A(x) in Notation 10.8.10 is a well-defined set.

The set construction ⋂_{x∈X} A(x) in Notation 10.8.10 is well defined by the axiom of specification because it can be written as a restriction of the set ⋃_{x∈X} A(x) by a set-theoretic formula.


10.8.12 THEOREM: Different ways of writing the union and intersection of a family of sets. 
Let A = (A_x)_{x∈X} = (A(x))_{x∈X} be a family of sets.

(i) ⋃{A(x); x ∈ X} = {y; ∃x ∈ X, y ∈ A(x)} = ⋃_{x∈X} A(x) = ⋃{y; ∃x ∈ X, y = A(x)}.
(ii) ⋂{A(x); x ∈ X} = {y; ∀x ∈ X, y ∈ A(x)} = ⋂_{x∈X} A(x) = ⋂{y; ∃x ∈ X, y = A(x)} if X ≠ ∅.


PROOF: From Notation 8.4.2, it follows that ⋃{A(x); x ∈ X} = {y; ∃z ∈ {A(x); x ∈ X}, y ∈ z}. Therefore

⋃{A(x); x ∈ X} = {y; ∃z, (z ∈ {A(x); x ∈ X} ∧ y ∈ z)}                 (10.8.1)
               = {y; ∃z, (z ∈ {w; ∃x ∈ X, A(x) = w} ∧ y ∈ z)}         (10.8.2)
               = {y; ∃z, ((∃x ∈ X, A(x) = z) ∧ y ∈ z)}                (10.8.3)
               = {y; ∃z, ∃x ∈ X, (A(x) = z ∧ y ∈ z)}                  (10.8.4)
               = {y; ∃z, ∃x, (x ∈ X ∧ A(x) = z ∧ y ∈ z)}              (10.8.5)
               = {y; ∃x, ∃z, (x ∈ X ∧ A(x) = z ∧ y ∈ z)}              (10.8.6)
               = {y; ∃x, (x ∈ X ∧ (∃z, (A(x) = z ∧ y ∈ z)))}          (10.8.7)
               = {y; ∃x ∈ X, ∃z, (A(x) = z ∧ y ∈ z)}                  (10.8.8)
               = {y; ∃x ∈ X, y ∈ A(x)},                               (10.8.9)


where line (10.8.1) follows from Notation 7.2.7, line (10.8.2) follows from Notation 7.7.18, line (10.8.3) follows from Notation 7.7.10, line (10.8.4) follows from Theorem 6.6.16 (vi), line (10.8.5) follows from Notation 7.2.7, line (10.8.6) follows from Theorem 6.6.24 (ii), line (10.8.7) follows from Theorem 6.6.16 (vi), line (10.8.8) follows from Notation 7.2.7, and line (10.8.9) follows from Theorem 6.7.9 (xi).

The equality {y; ∃x ∈ X, y ∈ A(x)} = ⋃_{x∈X} A(x) follows from Notation 10.8.10.

Notation 7.7.18 gives the equality {A(x); x ∈ X} = {y; ∃x ∈ X, y = A(x)}. Taking the union of both sides of this equality gives ⋃{A(x); x ∈ X} = ⋃{y; ∃x ∈ X, y = A(x)}. This completes the proof of part (i).

For part (ii), note that {A(x); x ∈ X} is non-empty because X is non-empty. Then


⋂{A(x); x ∈ X} = {y; ∀z ∈ {A(x); x ∈ X}, y ∈ z}                       (10.8.10)
               = {y; ∀z, (z ∈ {A(x); x ∈ X} ⇒ y ∈ z)}                 (10.8.11)
               = {y; ∀z, (z ∈ {w; ∃x ∈ X, A(x) = w} ⇒ y ∈ z)}         (10.8.12)
               = {y; ∀z, ((∃x ∈ X, A(x) = z) ⇒ y ∈ z)}                (10.8.13)
               = {y; ∀z, ((∃x, (x ∈ X ∧ A(x) = z)) ⇒ y ∈ z)}          (10.8.14)
               = {y; ∀z, ∀x, ((x ∈ X ∧ A(x) = z) ⇒ y ∈ z)}            (10.8.15)
               = {y; ∀z, ∀x, (x ∈ X ⇒ (A(x) = z ⇒ y ∈ z))}            (10.8.16)
               = {y; ∀x, ∀z, (x ∈ X ⇒ (A(x) = z ⇒ y ∈ z))}            (10.8.17)
               = {y; ∀x, (x ∈ X ⇒ ∀z, (A(x) = z ⇒ y ∈ z))}            (10.8.18)
               = {y; ∀x ∈ X, ∀z, (A(x) = z ⇒ y ∈ z)}                  (10.8.19)
               = {y; ∀x ∈ X, y ∈ A(x)},                               (10.8.20)


where line (10.8.10) follows from Notation 8.4.2, line (10.8.11) follows from Notation 7.2.7, line (10.8.12) follows from Notation 7.7.18, line (10.8.13) follows from Notation 7.7.10, line (10.8.14) follows from Notation 7.2.7, line (10.8.15) follows from Theorem 6.6.12 (iii), line (10.8.16) follows from Theorem 4.7.9 (1), line (10.8.17) follows from Theorem 6.6.24 (i), line (10.8.18) follows from Theorem 6.6.12 (xviii), line (10.8.19) follows from Notation 7.2.7, and line (10.8.20) follows from Theorem 6.7.9 (xiv).

The equality {y; ∀x ∈ X, y ∈ A(x)} = ⋂_{x∈X} A(x) follows from Notation 10.8.10.

Notation 7.7.18 gives the equality {A(x); x ∈ X} = {y; ∃x ∈ X, y = A(x)}. Taking the intersection of both sides of this equality gives ⋂{A(x); x ∈ X} = ⋂{y; ∃x ∈ X, y = A(x)}. This completes the proof of part (ii).


10.8.13 REMARK: Extension of various general properties of unions and intersections to families of sets.
Theorem 10.8.14 applies Theorem 8.4.8 parts (iv), (vi), (viii) and (ix) to families of sets. 


10.8.14 THEOREM: Unions and intersections of set-family unions and intersections. 
Let A = (A_i)_{i∈I} and B = (B_j)_{j∈J} be families of sets and let C be a set.

(i) C ∩ (⋃_{i∈I} A_i) = ⋃_{i∈I} (C ∩ A_i).
(ii) C ∪ (⋂_{i∈I} A_i) = ⋂_{i∈I} (C ∪ A_i) if I ≠ ∅.
(iii) (⋃_{i∈I} A_i) ∩ (⋃_{j∈J} B_j) = ⋃_{(i,j)∈I×J} (A_i ∩ B_j).
(iv) (⋂_{i∈I} A_i) ∪ (⋂_{j∈J} B_j) = ⋂_{(i,j)∈I×J} (A_i ∪ B_j) if I ≠ ∅ and J ≠ ∅.


PROOF: Part (i) follows from Theorem 8.4.8 (iv). Part (ii) follows from Theorem 8.4.8 (vi). Part (iii) follows 
from Theorem 8.4.8 (viii). Part (iv) follows from Theorem 8.4.8 (ix). 


10.8.15 REMARK: Swapping unions and intersections of a doubly indexed family of sets. 

In the degenerate case I = ∅ in Theorem 10.8.16 (iii), the double index set is I × J = ∅ × J = ∅. Then A = ∅, and x ∈ ⋃_{i∈I} ⋂_{j∈J} A_{i,j} ⇔ ∃i ∈ I, ∀j ∈ J, x ∈ A_{i,j} ⇔ ∃i, (i ∈ ∅ ∧ ∀j ∈ J, x ∈ A_{i,j}) ⇔ ⊥, where ⊥ denotes the always-false proposition. (See Notation 3.6.15.) Therefore ⋃_{i∈I} ⋂_{j∈J} A_{i,j} = ∅. Similarly, x ∈ ⋂_{j∈J} ⋃_{i∈I} A_{i,j} ⇔ ∀j ∈ J, ∃i ∈ I, x ∈ A_{i,j} ⇔ ∀j ∈ J, ∃i, (i ∈ ∅ ∧ x ∈ A_{i,j}) ⇔ ∀j ∈ J, ⊥, which is false because J ≠ ∅. Therefore ⋂_{j∈J} ⋃_{i∈I} A_{i,j} = ∅.


10.8.16 THEOREM: Double unions and intersections of doubly indexed set-families. 
Let A = (A_{i,j})_{i∈I,j∈J} be a doubly indexed family of sets.
(i) ⋃_{i∈I} ⋃_{j∈J} A_{i,j} = ⋃_{j∈J} ⋃_{i∈I} A_{i,j}.
(ii) If I ≠ ∅ and J ≠ ∅, then ⋂_{i∈I} ⋂_{j∈J} A_{i,j} = ⋂_{j∈J} ⋂_{i∈I} A_{i,j}.
(iii) If J ≠ ∅, then ⋃_{i∈I} ⋂_{j∈J} A_{i,j} ⊆ ⋂_{j∈J} ⋃_{i∈I} A_{i,j}.


PROOF: For part (i),

∀x, x ∈ ⋃_{i∈I} ⋃_{j∈J} A_{i,j} ⇔ ∃i ∈ I, x ∈ ⋃_{j∈J} A_{i,j}
                                 ⇔ ∃i ∈ I, ∃j ∈ J, x ∈ A_{i,j}
                                 ⇔ ∃j ∈ J, ∃i ∈ I, x ∈ A_{i,j}        (10.8.21)
                                 ⇔ x ∈ ⋃_{j∈J} ⋃_{i∈I} A_{i,j},

where line (10.8.21) follows from Theorem 6.6.24 (ii). Hence ⋃_{i∈I} ⋃_{j∈J} A_{i,j} = ⋃_{j∈J} ⋃_{i∈I} A_{i,j}.

For part (ii),

∀x, x ∈ ⋂_{i∈I} ⋂_{j∈J} A_{i,j} ⇔ ∀i ∈ I, x ∈ ⋂_{j∈J} A_{i,j}
                                 ⇔ ∀i ∈ I, ∀j ∈ J, x ∈ A_{i,j}
                                 ⇔ ∀j ∈ J, ∀i ∈ I, x ∈ A_{i,j}        (10.8.22)
                                 ⇔ x ∈ ⋂_{j∈J} ⋂_{i∈I} A_{i,j},

where line (10.8.22) follows from Theorem 6.6.24 (i). Hence ⋂_{i∈I} ⋂_{j∈J} A_{i,j} = ⋂_{j∈J} ⋂_{i∈I} A_{i,j}.

For part (iii), let A = (A_{i,j})_{i∈I,j∈J} be a doubly indexed family of sets such that J ≠ ∅. Then

x ∈ ⋃_{i∈I} ⋂_{j∈J} A_{i,j} ⇔ ∃i ∈ I, ∀j ∈ J, x ∈ A_{i,j}
                             ⇒ ∀j ∈ J, ∃i ∈ I, x ∈ A_{i,j}            (10.8.23)
                             ⇔ x ∈ ⋂_{j∈J} ⋃_{i∈I} A_{i,j},

where line (10.8.23) follows from Theorem 6.6.24 (iii). Hence ⋃_{i∈I} ⋂_{j∈J} A_{i,j} ⊆ ⋂_{j∈J} ⋃_{i∈I} A_{i,j}.
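The inclusion in Theorem 10.8.16 (iii) is in general strict, which reflects the fact that "there exists i for all j" is stronger than "for all j there exists i". The following sketch gives a small instance. It assumes a doubly indexed family stored as a dictionary keyed by pairs (i, j), and the helper names are hypothetical.

```python
I = [0, 1]
J = [0, 1]

# A "diagonal" family: A[i, j] contains the point "x" only when i == j.
A = {(i, j): ({"x"} if i == j else set()) for i in I for j in J}

def union(sets):
    return set().union(*sets)

def intersection(sets):
    sets = list(sets)          # must be non-empty
    return set.intersection(*sets)

# Union over i of the intersection over j.
union_of_intersections = union(intersection(A[i, j] for j in J) for i in I)

# Intersection over j of the union over i.
intersection_of_unions = intersection(union(A[i, j] for i in I) for j in J)

# Part (i): the two orders of a double union agree.
double_union_1 = union(union(A[i, j] for j in J) for i in I)
double_union_2 = union(union(A[i, j] for i in I) for j in J)
```

Here no single index i works for every j, so the left side of part (iii) is empty, whereas every j has some suitable i, so the right side contains "x".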


10.8.17 REMARK: Application of a double set-map theorem to families of sets. 

Theorem 10.8.18 is an indexed version of Theorem 10.7.6. The single and double set-map notations f̄, f̄⁻¹, f̿ and f̿⁻¹ are introduced in Definitions 10.6.4 and 10.7.5. The proof given here for Theorem 10.8.18 is essentially the same as the proof given for Theorem 10.7.6.


10.8.18 THEOREM: Properties of “double set-maps” for indexed families of sets. 
Let f : X → Y be a function. Let A = (A_i)_{i∈I} ∈ ℙ(X)^I and B = (B_i)_{i∈I} ∈ ℙ(Y)^I be families of sets.

(i) f̄(⋃_{i∈I} A_i) = ⋃_{i∈I} f̄(A_i) = ⋃ f̿({A_i; i ∈ I}) = ⋃ f̿(Range(A)).
(ii) f̄(⋂_{i∈I} A_i) ⊆ ⋂_{i∈I} f̄(A_i) = ⋂ f̿({A_i; i ∈ I}) = ⋂ f̿(Range(A)) if I ≠ ∅.
(iii) f̄⁻¹(⋃_{i∈I} B_i) = ⋃_{i∈I} f̄⁻¹(B_i) = ⋃ f̿⁻¹({B_i; i ∈ I}) = ⋃ f̿⁻¹(Range(B)).
(iv) f̄⁻¹(⋂_{i∈I} B_i) = ⋂_{i∈I} f̄⁻¹(B_i) = ⋂ f̿⁻¹({B_i; i ∈ I}) = ⋂ f̿⁻¹(Range(B)) if I ≠ ∅.

PROOF: For part (i), let f : X → Y be a function, and let (A_i)_{i∈I} be a family of subsets of X. Then

y ∈ f̄(⋃_{i∈I} A_i) ⇔ ∃x ∈ ⋃_{i∈I} A_i, y = f(x)
                    ⇔ ∃i ∈ I, ∃x ∈ A_i, y = f(x)
                    ⇔ ∃i ∈ I, y ∈ f̄(A_i)
                    ⇔ y ∈ ⋃_{i∈I} f̄(A_i).

Hence f̄(⋃_{i∈I} A_i) = ⋃_{i∈I} f̄(A_i). Then ⋃_{i∈I} f̄(A_i) = ⋃ f̿({A_i; i ∈ I}) = ⋃ f̿(Range(A)) follows directly from Definition 10.7.5.

For part (ii), let f : X → Y be a function, and let (A_i)_{i∈I} be a non-empty family of subsets of X. Then

y ∈ f̄(⋂_{i∈I} A_i) ⇔ ∃x ∈ ⋂_{i∈I} A_i, y = f(x)
                    ⇔ ∃x, (x ∈ ⋂_{i∈I} A_i ∧ y = f(x))
                    ⇔ ∃x, ((∀i, (i ∈ I ⇒ x ∈ A_i)) ∧ y = f(x))
                    ⇔ ∃x, ∀i, ((i ∈ I ⇒ x ∈ A_i) ∧ y = f(x))          (10.8.24)
                    ⇒ ∀i, ∃x, ((i ∈ I ⇒ x ∈ A_i) ∧ y = f(x))          (10.8.25)
                    ⇒ ∀i, ∃x, (i ∈ I ⇒ (x ∈ A_i ∧ y = f(x)))          (10.8.26)
                    ⇔ ∀i, (i ∈ I ⇒ ∃x, (x ∈ A_i ∧ y = f(x)))          (10.8.27)
                    ⇔ ∀i ∈ I, ∃x ∈ A_i, y = f(x)
                    ⇔ ∀i ∈ I, y ∈ f̄(A_i)
                    ⇔ y ∈ ⋂_{i∈I} f̄(A_i).

Line (10.8.24) follows from Theorem 6.6.16 (ii). Line (10.8.25) follows from Theorem 6.6.24 (iii). Line (10.8.26) follows from Theorem 4.7.9 (xxxviii). Line (10.8.27) follows from Theorem 6.6.12 (xxvii). Hence f̄(⋂_{i∈I} A_i) ⊆ ⋂_{i∈I} f̄(A_i). Then ⋂_{i∈I} f̄(A_i) = ⋂ f̿({A_i; i ∈ I}) = ⋂ f̿(Range(A)) follows directly from Definition 10.7.5.


For part (iii), let f : X → Y be a function, and let (B_i)_{i∈I} be a family of subsets of Y. Then

x ∈ f̄⁻¹(⋃_{i∈I} B_i) ⇔ f(x) ∈ ⋃_{i∈I} B_i
                      ⇔ ∃i ∈ I, f(x) ∈ B_i
                      ⇔ ∃i ∈ I, x ∈ f̄⁻¹(B_i)
                      ⇔ x ∈ ⋃_{i∈I} f̄⁻¹(B_i).

Hence f̄⁻¹(⋃_{i∈I} B_i) = ⋃_{i∈I} f̄⁻¹(B_i). Then ⋃_{i∈I} f̄⁻¹(B_i) = ⋃ f̿⁻¹({B_i; i ∈ I}) = ⋃ f̿⁻¹(Range(B)) follows directly from Definition 10.7.5.

For part (iv), let f : X → Y be a function, and let (B_i)_{i∈I} be a non-empty family of subsets of Y. Then

x ∈ f̄⁻¹(⋂_{i∈I} B_i) ⇔ f(x) ∈ ⋂_{i∈I} B_i
                      ⇔ ∀i ∈ I, f(x) ∈ B_i
                      ⇔ ∀i ∈ I, x ∈ f̄⁻¹(B_i)
                      ⇔ x ∈ ⋂_{i∈I} f̄⁻¹(B_i).

Hence f̄⁻¹(⋂_{i∈I} B_i) = ⋂_{i∈I} f̄⁻¹(B_i). Then ⋂_{i∈I} f̄⁻¹(B_i) = ⋂ f̿⁻¹({B_i; i ∈ I}) = ⋂ f̿⁻¹(Range(B)) follows directly from Definition 10.7.5.


10.8.19 REMARK: Some theorems about unions of unions. 

Theorems 10.8.20 and 10.8.22 are applicable to topology. Theorem 10.8.20 means that the union of the unions ⋃ f(A) of individual set-collections f(A) in a family of set-collections (f(A))_{A∈I} is equal to the union ⋃⋃{f(A); A ∈ I} of the combined collection ⋃{f(A); A ∈ I} of the sets in each of the individual collections f(A). In other words, merging each collection f(A) individually into a single set first and then merging these merged sets gives the same result as combining all of the collections f(A) into one big collection and merging that big combined collection.


10.8.20 THEOREM: Unions of unions of families of sets of sets. 
Let f : I → ℙ(ℙ(X)) for sets I and X. Then ⋃{⋃f(A); A ∈ I} = ⋃⋃{f(A); A ∈ I}. In other words, ⋃_{A∈I} ⋃ f(A) = ⋃ ⋃_{A∈I} f(A).

PROOF: The left-hand side expression ⋃{⋃f(A); A ∈ I} satisfies

x ∈ ⋃{⋃f(A); A ∈ I} ⇔ ∃A ∈ I, x ∈ ⋃f(A)
                     ⇔ ∃A ∈ I, ∃B ∈ f(A), x ∈ B.

The right-hand side expression ⋃⋃{f(A); A ∈ I} satisfies

x ∈ ⋃⋃{f(A); A ∈ I} ⇔ ∃B ∈ ⋃{f(A); A ∈ I}, x ∈ B
                     ⇔ ∃B, (B ∈ ⋃{f(A); A ∈ I} ∧ x ∈ B)
                     ⇔ ∃B, ∃A ∈ I, (B ∈ f(A) ∧ x ∈ B)
                     ⇔ ∃A ∈ I, ∃B ∈ f(A), x ∈ B.

The result follows.


10.8.21 REMARK: Versions of a union-of-unions theorem with and without functions. 

See Theorem 8.6.2 for a version of Theorem 10.8.22 which does not use functions. (Theorem 10.8.22 originally 
had an application to some topological construction which involved the axiom of choice. It is now no longer 
clear what it is useful for. However, see Remarks 8.6.1 and 10.8.19.) 


10.8.22 THEOREM: Formulas for the union of a set of sets which satisfy a mysterious condition. 
Let X be a set and Q ∈ ℙ(ℙ(X)). Let f : Q → ℙ(ℙ(X)) be a function such that A = ⋃f(A) for all A ∈ Q.
Then ⋃Q = ⋃{⋃f(A); A ∈ Q} = ⋃⋃{f(A); A ∈ Q}.


PROOF: Clearly ⋃Q = ⋃{A; A ∈ Q} = ⋃{⋃f(A); A ∈ Q}, and from Theorem 10.8.20, it follows that ⋃{⋃f(A); A ∈ Q} = ⋃⋃{f(A); A ∈ Q}.

10.8.23 REMARK: Unions and intersections of collections of sets constrained by a predicate. 
The set ⋃_{y∈Y} {x ∈ X; P(x, y)} in Theorem 10.8.24 may be thought of as the domain of the set-theoretic formula P if it is subject to the restrictions x ∈ X and y ∈ Y on P(x, y).

The set ⋂_{y∈Y} {x ∈ X; P(x, y)} is perhaps less useful. If the set {(x, y) ∈ X × Y; P(x, y)} is interpreted as the graph of a function f : X → Y, the set ⋂_{y∈Y} {x ∈ X; P(x, y)} is the set of x ∈ X for which f({x}) = Y.


10.8.24 THEOREM: Unions and intersections of two-parameter set-theoretic formulas. 
Let X and Y be sets. Let P be a two-parameter set-theoretic formula. 


(i) ⋃_{y∈Y} {x ∈ X; P(x, y)} = {x ∈ X; ∃y ∈ Y, P(x, y)}.
(ii) ⋂_{y∈Y} {x ∈ X; P(x, y)} = {x ∈ X; ∀y ∈ Y, P(x, y)} if P satisfies ∃x ∈ X, ∃y ∈ Y, P(x, y).


PROOF: To prove part (i), note that

⋃_{y∈Y} {x ∈ X; P(x, y)} = {z; ∃y ∈ Y, z ∈ {x ∈ X; P(x, y)}}
                         = {z; ∃y ∈ Y, (z ∈ X ∧ P(z, y))}
                         = {z; z ∈ X ∧ ∃y ∈ Y, P(z, y)}
                         = {x ∈ X; ∃y ∈ Y, P(x, y)}.

To prove part (ii), note that

⋂_{y∈Y} {x ∈ X; P(x, y)} = {z; ∀y ∈ Y, z ∈ {x ∈ X; P(x, y)}}
                         = {z; ∀y ∈ Y, (z ∈ X ∧ P(z, y))}
                         = {z; z ∈ X ∧ ∀y ∈ Y, P(z, y)}                (10.8.28)
                         = {x ∈ X; ∀y ∈ Y, P(x, y)}.

Line (10.8.28) follows from Y ≠ ∅, which follows from the assumption ∃x ∈ X, ∃y ∈ Y, P(x, y).


10.8.25 THEOREM: Intersections of families of set-pairs. 
Let (A_i, B_i)_{i∈I} be a family of set-pairs. Then ⋂_{i∈I} A_i × B_i = (⋂_{i∈I} A_i) × (⋂_{i∈I} B_i).


PROOF: Let (A_i, B_i)_{i∈I} be a family of set-pairs. Then

∀x, ∀y, (x, y) ∈ ⋂_{i∈I} A_i × B_i ⇔ ∀i ∈ I, (x, y) ∈ A_i × B_i
                                    ⇔ ∀i ∈ I, ((x ∈ A_i) and (y ∈ B_i))
                                    ⇔ (∀i ∈ I, x ∈ A_i) and (∀i ∈ I, y ∈ B_i)      (10.8.29)
                                    ⇔ x ∈ ⋂_{i∈I} A_i and y ∈ ⋂_{i∈I} B_i
                                    ⇔ (x, y) ∈ (⋂_{i∈I} A_i) × (⋂_{i∈I} B_i),

where line (10.8.29) follows from Theorem 6.6.22 (iii). Hence ⋂_{i∈I} A_i × B_i = (⋂_{i∈I} A_i) × (⋂_{i∈I} B_i).
10.9. Partially defined functions


10.9.1 REMARK: Motivation for partially defined functions. 

A function is said to be “well defined” on a specified source set X if it has one and only one value for every 
element of X. In other words, a function is well defined on X if and only if it is a function on X. The reason 
for the superfluous adjective “well-defined” is the fact that sometimes one wishes to discuss “partially defined 
functions”, which are functions according to Definition 10.2.2 (i), but have a domain which is a subset of 
some specified source set. 


Partially defined functions, also known as “partial functions”, are defined in the general sense described here by E. Mendelson [370], page 7. The term “partial function” is used in a much more specialised way in the context of recursion theory. (See for example Stoll [393], page 429; Shoenfield [390], page 145; Kleene [365], page 325; Kleene [366], page 244; Takeuti [397], page 51.)


The partially defined function concept is not a formalistic “road to nowhere”. Partially defined functions 
are ubiquitous in differential geometry because the charts of manifolds and fibre bundles are defined as 
partially defined functions. Since all manifold and locally Cartesian fibre bundle concepts are based upon 
charts, partially defined functions are a constant feature of definitions, not a degenerate curiosity which can 
be relegated to the background. Therefore partially defined functions deserve almost as much attention as 
fully-defined functions. 

Conditions (i), (ii) and (iii) in Definition 10.9.2 correspond to conditions (i), (ii) and (iii) for a fully-defined 
function in Definition 10.2.2. E. Mendelson [370], page 168, uses the adjective “univocal” to refer to ordered- 
pair relations which have the uniqueness property in Definition 10.9.2 (i). 


10.9.2 DEFINITION: A partially defined function is a relation f such that 


(i) ∀(x₁, y₁), (x₂, y₂) ∈ f, (x₁ = x₂ ⇒ y₁ = y₂). [uniqueness]


A partially defined function on a set X is a partially defined function f such that


(ii) Dom(f) ⊆ X. [domain inclusion]


A partially defined function from X to Y is a partially defined function f on X such that


(iii) Range(f) ⊆ Y. [range inclusion]


A partial function, locally defined function and local function are alternative names for a partially defined 
function. 
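The three conditions of Definition 10.9.2 can be tested mechanically for a finite relation. The following sketch assumes that a relation is given as a Python set of ordered pairs. The function name is_partial_function and its optional parameters X and Y are hypothetical and are used only for this illustration.

```python
def is_partial_function(rel, X=None, Y=None):
    """Check uniqueness, and optionally domain and range inclusion."""
    values = {}
    for x, y in rel:
        # Condition (i): each x has at most one value y.
        if values.setdefault(x, y) != y:
            return False
    # Condition (ii): Dom(f) is a subset of X.
    if X is not None and not set(values) <= set(X):
        return False
    # Condition (iii): Range(f) is a subset of Y.
    if Y is not None and not set(values.values()) <= set(Y):
        return False
    return True

X = {1, 2, 3}
Y = {"a", "b"}
chart_like = {(1, "a"), (2, "b")}       # defined only on {1, 2}
not_univocal = {(1, "a"), (1, "b")}     # two values at x = 1
```

The relation chart_like is a partial function from X to Y whose domain {1, 2} is a proper subset of X, which is the typical situation for a chart on a manifold.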


10.9.3 NOTATION: X →̊ Y denotes the set of all partially defined functions from a set X to a set Y.
f : X →̊ Y means that f is a partially defined function from a set X to a set Y.


10.9.4 REMARK: Notation for sets of partially defined functions. 

The non-standard Notation 10.9.3 is difficult to justify. However, a well-defined function f has one and only one value for each element of its domain X. That is, #{y; (x, y) ∈ f} = 1 for all x ∈ X. By comparison, a partially defined function has at most one value. That is, #{y; (x, y) ∈ f} ≤ 1 for all x ∈ X. So the number of values is either 0 or 1 for all x ∈ X. So the circle over the arrow in “f : X →̊ Y” may be thought of as a warning that the function may have zero values for some elements of the nominal domain.


10.9.5 REMARK: Possible alternative notations for sets of partially defined functions. 
There is a notation Y^X for the set of functions f : X → Y, but there is no simple notation for the set of
partially defined functions {f : U → Y; U ⊆ X}. This could be denoted in the following ways.


{f:U>Y;UCX}= U YY 
UCX 
eguimp* 


= (yey, 


The second notation has the advantage of making it obvious that the cardinality of the set is (#(Y) + 1)^{#(X)},
but it is not good, meaningful set notation.
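The cardinality formula can be confirmed by enumeration for small finite sets, since each element of X is either sent to one of the #(Y) elements of Y or left undefined. The sketch below assumes that a partial function is a dictionary whose keys form its domain. The generator name partial_functions and the sentinel UNDEFINED are hypothetical.

```python
from itertools import product

def partial_functions(X, Y):
    """Yield every partial function from X to Y as a dict."""
    xs = sorted(X)
    UNDEFINED = object()                 # marks "no value at this x"
    choices = list(Y) + [UNDEFINED]
    for values in product(choices, repeat=len(xs)):
        yield {x: v for x, v in zip(xs, values) if v is not UNDEFINED}

X = {1, 2, 3}
Y = {"a", "b"}

all_partial = list(partial_functions(X, Y))
count = len(all_partial)                                  # (2 + 1) ** 3
fully_defined = [f for f in all_partial if set(f) == X]   # these are the functions
```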
10.9.6 REMARK: The relation between functions and partial functions.

A partial function f on a set X is a function if and only if Dom(f) = X. In other words, a partial function 
is a function if and only if its domain and source set are the same. As mentioned in Remark 9.5.21, the 
source set for a relation is in practice a metamathematical concept. In this case, when one states within some 
context that f is a “partial function on a set X", the set X becomes the source set of f for that context. 
This is analogous to the metamathematical concept of the target set of a function which is mentioned in 
Remark 10.2.7. Just as surjectivity of functions and partial functions is defined with respect to a contextually 
specified target set, a partial function may be said to be “defined everywhere" when its domain equals a 
contextually specified source set. It then becomes a “well-defined function". 


10.9.7 REMARK: Injective, surjective and bijective partial functions. 

Definition 10.9.8 is the partial function version of Definition 10.5.2 for functions. If the source set X is 
replaced with the domain Dom(f) of f, then the three adjectives in Definition 10.9.8 are the same as in 
Definition 10.5.2. This follows from the fact that replacing X with Dom(f) makes f a function. 


10.9.8 DEFINITION: 

A partial function f : X →̊ Y is injective when ∀x₁, x₂ ∈ Dom(f), (f(x₁) = f(x₂) ⇒ x₁ = x₂).
A partial function f : X →̊ Y is surjective when ∀y ∈ Y, ∃x ∈ Dom(f), f(x) = y.

A partial function f : X →̊ Y is bijective when it is injective and surjective.


10.9.9 REMARK: Set-maps for partial functions and inverses of partial functions. 

The definitions and notations for function set-maps and inverse set-maps in Section 10.6 are immediately applicable to partial functions. (Of course, they may also be applied to general relations.) Definition 10.9.10 is a partial function version of Definition 10.6.4. As mentioned in Remark 10.6.6, the notations f and f⁻¹ are employed, in practice, instead of f̄ and f̄⁻¹ respectively, despite the potential for confusion and error.


10.9.10 DEFINITION: Set-maps and inverse set-maps for partial functions. 
(i) The set-map for a partial function f : X →̊ Y, for sets X and Y, is the function f̄ : ℙ(X) → ℙ(Y) defined by f̄(A) = {y ∈ Y; ∃x ∈ A, (x, y) ∈ f} = {f(x); x ∈ A} for all A ⊆ X.
(ii) The inverse set-map for a partial function f : X →̊ Y, for sets X and Y, is the function f̄⁻¹ : ℙ(Y) → ℙ(X) defined by f̄⁻¹(B) = {x ∈ X; ∃y ∈ B, (x, y) ∈ f} = {x ∈ X; f(x) ∈ B} for all B ⊆ Y.
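A small computation may help to show how these set-maps behave outside the domain of f. In this sketch a partial function is assumed to be a dictionary defined on only part of the source set X, and image and preimage are hypothetical helper names.

```python
def image(f, A):
    """Set-map: values of f at those points of A where f is defined."""
    return {f[x] for x in A if x in f}

def preimage(f, B):
    """Inverse set-map: points of the domain of f whose value lies in B."""
    return {x for x, y in f.items() if y in B}

X = {1, 2, 3}
Y = {"a", "b", "c"}
f = {1: "a", 2: "b"}          # partial function from X to Y, undefined at 3
domain = set(f)

A = {2, 3}
S = {3}                        # a set disjoint from the domain of f
round_trip = preimage(f, image(f, S))
```

Points outside Dom(f) contribute nothing to the set-map, so the image of A equals the image of A ∩ Dom(f), and the preimage of the image of S need not contain S when S is not contained in Dom(f).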


10.9.11 REMARK: Difference between properties of function set-maps and partial function set-maps. 

It is perhaps remarkable that there is almost no difference between Theorem 10.6.7 for function set-maps and Theorem 10.9.12 for partial function set-maps. The only noticeable difference is seen in part (ii′) of Theorem 10.9.12, where the equivalence of set inclusions forces the partial function to be a function.


10.9.12 THEOREM: Properties of partial function set-maps. 

Let f : X →̊ Y be a partial function, and let f̄ : ℙ(X) → ℙ(Y) denote the set-map for f.

(i) f̄(∅) = ∅ and f̄(X) ⊆ Y.
(i′) f̄(X) = Y if and only if f is surjective.
(ii) ∀A, B ∈ ℙ(X), A ⊆ B ⇒ f̄(A) ⊆ f̄(B).
(ii′) ∀A, B ∈ ℙ(X), A ⊆ B ⇔ f̄(A) ⊆ f̄(B) if and only if f is injective and Dom(f) = X.
(ii″) ∀A ∈ ℙ(X), f̄(Dom(f) ∩ A) = f̄(A).
(iii) ∀A, B ∈ ℙ(X), f̄(A ∪ B) = f̄(A) ∪ f̄(B).
(iv) ∀A, B ∈ ℙ(X), f̄(A ∩ B) ⊆ f̄(A) ∩ f̄(B).
(iv′) ∀A, B ∈ ℙ(X), f̄(A ∩ B) = f̄(A) ∩ f̄(B) if and only if f is injective.
(v) ∀A ∈ ℙ(X), f̄(X \ A) ⊇ f̄(X) \ f̄(A).
(v′) ∀A ∈ ℙ(X), f̄(X \ A) = f̄(X) \ f̄(A) if and only if f is injective.


PROOF: Part (i) is obvious from Definition 10.9.10 (i).
Part (i′) is obvious from part (i) and Definition 10.9.8.
Part (ii) is obvious from Definition 10.9.10 (i).
For part (ii′), let f : X →̊ Y be a partial function which satisfies ∀A, B ∈ ℙ(X), A ⊆ B ⇔ f̄(A) ⊆ f̄(B). To show that f is injective, let x, x′ ∈ Dom(f) with f(x) = f(x′). Let A = {x} and B = {x′}. Then f̄(A) = {f(x)} = {f(x′)} = f̄(B). Therefore A = B by the assumed property of f. So x = x′. Hence f is injective. To show that Dom(f) = X, let x ∈ X \ Dom(f). Let A = {x} and B = ∅. Then f̄(A) = ∅ = f̄(B). So A = B by the assumed property of f, which contradicts A = {x} ≠ ∅. Hence Dom(f) = X.


For the converse of part (ii′), let f : X →̊ Y be a partial function which is injective and satisfies Dom(f) = X. Then f is a fully-defined function. So the property ∀A, B ∈ ℙ(X), A ⊆ B ⇔ f̄(A) ⊆ f̄(B) follows from Theorem 10.6.7 (ii′).


For part (ii″), let A ∈ ℙ(X). Then f̄(Dom(f) ∩ A) ⊆ f̄(A) by part (ii) because Dom(f) ∩ A ⊆ A by Theorem 8.1.6 (iv). For the reverse inclusion, let y ∈ f̄(A). Then y = f(x) for some x ∈ A. So x ∈ Dom(f) by Definition 9.5.4. Therefore x ∈ Dom(f) ∩ A by Definition 8.1.2. So y ∈ f̄(Dom(f) ∩ A) by Definition 10.9.10. Thus f̄(Dom(f) ∩ A) ⊇ f̄(A). Hence f̄(Dom(f) ∩ A) = f̄(A).


Part (iii) follows by exactly the same proof as for Theorem 10.6.7 (iii).


Part (iv) follows by exactly the same proof as for Theorem 10.6.7 (iv). Alternatively it follows by part (ii).
For part (iv′), let f : X →̊ Y be a partial function which satisfies ∀A, B ∈ ℙ(X), f̄(A ∩ B) = f̄(A) ∩ f̄(B). To show that f is injective, let x, x′ ∈ Dom(f) with f(x) = f(x′). Let A = {x} and B = {x′}. Then f̄(A) = {f(x)} = {f(x′)} = f̄(B). So f̄(A ∩ B) = f̄(A) ∩ f̄(B) = {f(x)} ≠ ∅. So A ∩ B ≠ ∅. Therefore x = x′. Hence f is injective. The converse of part (iv′) follows exactly as in the proof of Theorem 10.6.7 (iv′).

Part (v) follows by exactly the same proof as for Theorem 10.6.7 (v).


Part (v′) follows exactly as for Theorem 10.6.7 (v′). Alternatively note that X = (X \ Dom(f)) ∪ Dom(f), and so X \ A = ((X \ Dom(f)) ∪ Dom(f)) \ A = ((X \ Dom(f)) \ A) ∪ (Dom(f) \ A) for all A ∈ ℙ(X), by Theorem 8.2.6 (iii). But f̄((X \ Dom(f)) \ A) ⊆ f̄(X \ Dom(f)) = ∅ by part (ii). So f̄(X \ A) = f̄(Dom(f) \ A) for all A ∈ ℙ(X), by part (iii). Similarly f̄(X) = f̄(Dom(f)). So ∀A ∈ ℙ(X), f̄(X \ A) = f̄(X) \ f̄(A) if and only if ∀A ∈ ℙ(X), f̄(Dom(f) \ A) = f̄(Dom(f)) \ f̄(A). But f̄(A) = f̄(A ∩ Dom(f)) and f̄(Dom(f) \ A) = f̄(Dom(f) \ (A ∩ Dom(f))) for all A ∈ ℙ(X). So X may be replaced by Dom(f) in the statement of part (v′), and f is a function on Dom(f). Hence the assertion follows from the application of Theorem 10.6.7 (v′) to this function.


10.9.13 REMARK: Differences between function and partial function inverse set-map properties. 
Theorem 10.9.14 for partial function inverse set-maps is almost identical to Theorem 10.6.10 for function 
inverse set-maps. The only noticeable difference is that X is replaced by Dom( f). 


10.9.14 THEOREM: Properties of partial function inverse set-maps. 
Let f : X →̊ Y be a partial function. Let f̄⁻¹ : ℙ(Y) → ℙ(X) denote the inverse set-map for f.

(i) f̄⁻¹(∅) = ∅ and f̄⁻¹(Y) = Dom(f).
(i′) ∀A ∈ ℙ(Y), f̄⁻¹(A) = ∅ ⇔ A = ∅ if and only if f is surjective.
(ii) ∀A, B ∈ ℙ(Y), A ⊆ B ⇒ f̄⁻¹(A) ⊆ f̄⁻¹(B).
(ii′) ∀A, B ∈ ℙ(Y), A ⊆ B ⇔ f̄⁻¹(A) ⊆ f̄⁻¹(B) if and only if f is surjective.
(ii″) ∀A, B ∈ ℙ(Y), A = B ⇔ f̄⁻¹(A) = f̄⁻¹(B) if and only if f is surjective.
(iii) ∀A, B ∈ ℙ(Y), f̄⁻¹(A ∪ B) = f̄⁻¹(A) ∪ f̄⁻¹(B).
(iv) ∀A, B ∈ ℙ(Y), f̄⁻¹(A ∩ B) = f̄⁻¹(A) ∩ f̄⁻¹(B).
(v) ∀A, B ∈ ℙ(Y), f̄⁻¹(A \ B) = f̄⁻¹(A) \ f̄⁻¹(B).
(v′) ∀A ∈ ℙ(Y), f̄⁻¹(Y \ A) = Dom(f) \ f̄⁻¹(A).
(vi) ∀A ∈ ℙ(Y), f̄⁻¹(A ∩ Range(f)) = f̄⁻¹(A).


PROOF: Let f : X →̊ Y be a partial function. Then f is a function from Dom(f) to Y. So Theorem 10.6.10 is directly applicable to f with X replaced by Dom(f). Hence part (i) follows from Theorem 10.6.10 (i), part (i′) follows from Theorem 10.6.10 (i′), part (ii) follows from Theorem 10.6.10 (ii), part (ii′) follows from Theorem 10.6.10 (ii′), part (ii″) follows from Theorem 10.6.10 (ii″), part (iii) follows from Theorem 10.6.10 (iii), part (iv) follows from Theorem 10.6.10 (iv), part (v) follows from Theorem 10.6.10 (v), and part (v′) follows from Theorem 10.6.10 (v′).
For part (vi), let A ∈ ℙ(Y). Then f̄⁻¹(A ∩ Range(f)) ⊆ f̄⁻¹(A) by part (ii). Now let x ∈ f̄⁻¹(A). Then y = f(x) for some y ∈ A. But then y ∈ Range(f). So y ∈ A ∩ Range(f). Therefore x ∈ f̄⁻¹(A ∩ Range(f)). Thus f̄⁻¹(A ∩ Range(f)) ⊇ f̄⁻¹(A). Hence f̄⁻¹(A ∩ Range(f)) = f̄⁻¹(A).


10.9.15 REMARK: Comments on partially defined functions. 

A relation $f : X \mathrel{\mathring{\to}} Y$ is an injective partially defined function if and only if the inverse relation $f^{-1} : Y \mathrel{\mathring{\to}} X$
is an injective partially defined function. Such a function could be referred to as a “partial bijection” or a
“local bijection”. Such names could be justified by observing that $f : \mathrm{Dom}(f) \to \mathrm{Range}(f)$ is a bijection.


10.10. Composition of partially defined functions 


10.10.1 REMARK: Partial function set-map and inverse set-map composition properties. 

Theorem 10.10.2 is a partial function version of Theorem 10.7.1 for the properties of compositions of set-maps 
and inverse set-maps. Although the “functions” here are only partial functions, their set-maps and inverse 
set-maps are fully-defined functions on the corresponding power sets. 


10.10.2 THEOREM: Properties of compositions of function set-maps and inverse set-maps. 
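Let $f : X \mathrel{\mathring{\to}} Y$ be a partial function for sets $X$ and $Y$. (In other words, $f$ is a function such that $f \subseteq X \times Y$.)
Let $\bar{f}$ and $\bar{f}^{-1}$ be the corresponding set-map and inverse set-map respectively as in Definition 10.9.10.


(i) $\forall S \in \mathbb{P}(Y)$, $\bar{f}(\bar{f}^{-1}(S)) = S \cap \bar{f}(X)$.
(i′) $\forall S \in \mathbb{P}(Y)$, $\bar{f}(\bar{f}^{-1}(S)) \subseteq S$.
(i″) $\forall S \in \mathbb{P}(Y)$, $\bar{f}(\bar{f}^{-1}(S)) = S$ if and only if $f$ is surjective.
(ii) $\forall S \in \mathbb{P}(X)$, $\bar{f}^{-1}(\bar{f}(S)) \supseteq S$ if and only if $\mathrm{Dom}(f) = X$.
(ii′) $\forall S \in \mathbb{P}(X)$, $\bar{f}^{-1}(\bar{f}(S)) \subseteq S$ if and only if $f$ is injective.
(ii″) $\forall S \in \mathbb{P}(X)$, $\bar{f}^{-1}(\bar{f}(S)) = S$ if and only if $f$ is injective and $\mathrm{Dom}(f) = X$.
(iii) $\forall S$, $\bar{f}^{-1}(\bar{f}(S)) \supseteq S \cap \mathrm{Dom}(f)$.
(iii′) $\forall S$, $\bar{f}^{-1}(\bar{f}(S)) \subseteq S \cap \mathrm{Dom}(f)$ if $f$ is injective.
(iii″) $\forall S$, $\bar{f}^{-1}(\bar{f}(S)) = S \cap \mathrm{Dom}(f)$ if $f$ is injective.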


PROOF: Part (i) may be proved exactly as for Theorem 10.7.1 (i).

Part (i′) follows trivially from part (i).

For part (i″), suppose that $f : X \mathrel{\mathring{\to}} Y$ is surjective. Then $\bar{f}(X) = Y$ by Definition 10.9.8. So it follows
from part (i) that $\forall S \in \mathbb{P}(Y)$, $\bar{f}(\bar{f}^{-1}(S)) = S$. To show the converse, assume $\forall S \in \mathbb{P}(Y)$, $\bar{f}(\bar{f}^{-1}(S)) = S$.
Let $S = Y$. Then $\bar{f}(\bar{f}^{-1}(Y)) = Y$. So for all $y \in Y$, for some $x \in \bar{f}^{-1}(Y)$, $f(x)$ equals $y$. But $\bar{f}^{-1}(Y) \subseteq \mathrm{Dom}(f)$
by Definition 10.9.10 (ii) for $\bar{f}^{-1}$. So for all $y \in Y$, for some $x \in \mathrm{Dom}(f)$, $f(x)$ equals $y$. Hence $f$ is
surjective by Definition 10.9.8.


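For part (ii), suppose that $\mathrm{Dom}(f) = X$. Then $f$ is a function. So $\forall S \in \mathbb{P}(X)$, $\bar{f}^{-1}(\bar{f}(S)) \supseteq S$ follows from
Theorem 10.7.1 (ii). To show the converse, suppose that $f : X \mathrel{\mathring{\to}} Y$ satisfies $\forall S \in \mathbb{P}(X)$, $\bar{f}^{-1}(\bar{f}(S)) \supseteq S$.
Let $S = X$. Then $\bar{f}^{-1}(\bar{f}(X)) \supseteq X$. So for all $x \in X$, for some $y \in \bar{f}(X)$, $f(x) = y$ by Definition 10.9.10 (ii)
for $\bar{f}^{-1}$. But $\bar{f}(X) \subseteq Y$ by Definition 10.9.10 (i) for $\bar{f}$. So for all $x \in X$, for some $y \in Y$, $f(x) = y$. Therefore
$X \subseteq \mathrm{Dom}(f)$. Hence $X = \mathrm{Dom}(f)$.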

For part (ii′), suppose that $f : X \mathrel{\mathring{\to}} Y$ is injective. Let $S \in \mathbb{P}(X)$. Let $x \in \bar{f}^{-1}(\bar{f}(S))$. Then by
Definition 10.9.10 (ii) for $\bar{f}^{-1}$, there is a $y \in \bar{f}(S)$ such that $f(x) = y$, and then by Definition 10.9.10 (i)
for $\bar{f}$, there is an $x' \in S$ such that $f(x') = y$. But $x = x'$ by the injectivity of $f$. So $x \in S$. Hence
$\bar{f}^{-1}(\bar{f}(S)) \subseteq S$. To show the converse, suppose that $f : X \mathrel{\mathring{\to}} Y$ satisfies $\forall S \in \mathbb{P}(X)$, $\bar{f}^{-1}(\bar{f}(S)) \subseteq S$.
Let $x, x' \in \mathrm{Dom}(f)$ satisfy $f(x) = f(x')$. Let $y = f(x)$. Then $y \in Y$. Let $S = \{x\}$. Then $S \in \mathbb{P}(X)$.
So $\bar{f}^{-1}(\bar{f}(\{x\})) \subseteq \{x\}$. But $\bar{f}(\{x\}) = \{y\}$, and $\bar{f}^{-1}(\{y\}) \supseteq \{x, x'\}$ because $f(x) = y$ and $f(x') = y$. So
$\{x, x'\} \subseteq \{x\}$. Therefore $x = x'$. Hence $f$ is injective.


Part (ii″) follows by pure predicate calculus from parts (ii) and (ii′).


For part (iii), let $S$ be a set. Let $x \in S \cap \mathrm{Dom}(f)$. Then $y = f(x)$ for some unique $y$, and $y \in \bar{f}(S)$. So
$x \in \bar{f}^{-1}(\{y\}) \subseteq \bar{f}^{-1}(\bar{f}(S))$. Hence $\bar{f}^{-1}(\bar{f}(S)) \supseteq S \cap \mathrm{Dom}(f)$.
For part (iii′), let $S$ be a set. Let $x \in \bar{f}^{-1}(\bar{f}(S))$. Then $x \in \mathrm{Dom}(f)$ and $f(x) \in \bar{f}(S)$. So $f(x) = f(x')$ for
some $x' \in S$. Therefore $x = x' \in S$ if $f$ is injective. Hence $\bar{f}^{-1}(\bar{f}(S)) \subseteq S \cap \mathrm{Dom}(f)$ if $f$ is injective.
Part (iii″) follows from parts (iii) and (iii′).
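
The set-map identities of Theorem 10.10.2 can be checked on a small finite example. The following is a minimal sketch, not part of the book's formal development: it assumes that a partial function is modelled as a Python dict whose keys form Dom(f), and the helper names `set_map` and `inverse_set_map` are hypothetical stand-ins for the bar-notation set-map and inverse set-map.

```python
def set_map(f, S):
    """Image of S under the partial function f: {f(x); x in S and x in Dom(f)}."""
    return {f[x] for x in S if x in f}


def inverse_set_map(f, T):
    """Preimage of T under the partial function f: {x in Dom(f); f(x) in T}."""
    return {x for x in f if f[x] in T}


X = {1, 2, 3, 4}
Y = {"a", "b", "c"}

# A partial function X -> Y: 4 is outside Dom(f), f is not injective,
# and "c" is outside the range, so f is not surjective.
f = {1: "a", 2: "a", 3: "b"}

# An injective partial function on the same sets.
g = {1: "a", 3: "b"}

S_Y = {"a", "c"}
S_X = {1, 4}

# Part (i): image of the preimage equals S intersected with the image of X.
image_of_preimage = set_map(f, inverse_set_map(f, S_Y))

# Part (iii): preimage of the image contains S intersected with Dom(f).
preimage_of_image_f = inverse_set_map(f, set_map(f, S_X))

# Part (iii''): equality holds when the partial function is injective.
preimage_of_image_g = inverse_set_map(g, set_map(g, S_X))
```

In this example the preimage of the image of `S_X` under `f` is strictly larger than `S_X ∩ Dom(f)` because `f` is not injective, and it does not contain `S_X` itself because 4 lies outside Dom(f), which is the situation described by parts (ii) and (ii′).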


10.10.3 REMARK: Adoption of colloquial notations for function set-maps from this point on. 

The customary notations $f(A) = \{b \in Y;\ \exists a \in A,\ b = f(a)\}$ and $f^{-1}(B) = \{a \in X;\ f(a) \in B\}$ are used
from this point onwards for the set-maps and inverse set-maps respectively of any function $f : X \to Y$, for
sets $A \in \mathbb{P}(X)$ and $B \in \mathbb{P}(Y)$, instead of the more explicit bar-notations $\bar{f}(A)$ and $\bar{f}^{-1}(B)$ used in Sections
10.6, 10.7, 10.8, 10.9 and 10.10.


10.10.4 REMARK: Composition of partially defined functions. 

Since partially defined functions are a sub-class of the class of relations, the definitions and notations for the 
composition of partially defined functions can be directly inherited from the corresponding definitions and 
notations for relations, such as Definition 9.6.2 and Notation 9.6.3. The principal additional requirement is 
to ensure that the class of partially defined functions is closed under composition. 


10.10.5 THEOREM: Composition of partial functions yields partial functions. 

The composite of any two partially defined functions is a partially defined function. 
PROOF: Let $f_1$ and $f_2$ be partially defined functions. By Definition 9.6.2 for the composite of relations,
the composite of $f_1$ and $f_2$ is the relation $f = \{(a, b);\ \exists c,\ ((a, c) \in f_1 \wedge (c, b) \in f_2)\}$. Suppose $(a, b_1) \in f$ and
$(a, b_2) \in f$. Then $(a, c_1), (a, c_2) \in f_1$ and $(c_1, b_1), (c_2, b_2) \in f_2$ for some $c_1, c_2$. Since $f_1$ is a partially
defined function, $c_1 = c_2$. So $b_1 = b_2$ because $f_2$ is a partially defined function. Hence $f$ is a partially defined
function by Definition 10.9.2.


10.10.6 DEFINITION: The composition or composite of two partially defined functions $f_1$ and $f_2$ is the
partially defined function $\{(a, b);\ \exists c,\ ((a, c) \in f_1 \wedge (c, b) \in f_2)\}$.


10.10.7 NOTATION: $g \circ f$ denotes the composition of partially defined functions $f$ and $g$.


10.10.8 REMARK: Notation for composition of partially defined functions. 

Notation 9.6.3 for the composition of relations, and Notation 10.4.18 for the composition of functions, are 
used also for the composition of partially defined functions. (These are inherited from the corresponding 
Definition 9.6.2 and Notation 9.6.3 for relations.) Thus $f_2 \circ f_1$ denotes the composite of partially defined
functions $f_1$ and $f_2$. It is customary, but unfortunate, that the order of the functions in the symbolic notation
is the reverse of the order in the English-language sentence. 


This is yet another example which shows clearly that relations and functions are more than just sets of 
ordered pairs. One thinks of relations and functions as having some kind of temporal, chronological or 
causal meaning. Thus one often describes the composition of $f_1$ and $f_2$ as “first” applying $f_1$ and “then”
applying $f_2$ to the “result” of $f_1$. Such significance cannot be gleaned from an inspection of the sets of
ordered pairs alone. The meaning is in the socio-mathematical context. (One may observe similarly that
the use of the word “then” in logical statements of the form “if $P_1$, then $P_2$” also has a temporal significance
which cannot be gleaned from truth tables alone.) 


10.10.9 THEOREM: Source and target sets of the composite of two partially defined functions.

The composite of partially defined functions $f_1 : A_1 \mathrel{\mathring{\to}} B_1$ and $f_2 : A_2 \mathrel{\mathring{\to}} B_2$ is a partially defined function
$f_2 \circ f_1 : A_1 \mathrel{\mathring{\to}} B_2$.

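PROOF: Let $f_1 \subseteq A_1 \times B_1$ and $f_2 \subseteq A_2 \times B_2$ be partially defined functions. Theorem 10.10.5 implies that
$f_2 \circ f_1$ is a partially defined function. It must be shown that $\mathrm{Dom}(f_2 \circ f_1) \subseteq A_1$ and $\mathrm{Range}(f_2 \circ f_1) \subseteq B_2$.
Let $a \in \mathrm{Dom}(f_2 \circ f_1)$. Then $(a, b) \in f_2 \circ f_1$ for some $b$. So $\exists c,\ ((a, c) \in f_1 \wedge (c, b) \in f_2)$ by Definition 10.10.6.
So $(a, c) \in f_1$ for some $c$. So $a \in \mathrm{Dom}(f_1)$.

Let $b \in \mathrm{Range}(f_2 \circ f_1)$. Then $(a, b) \in f_2 \circ f_1$ for some $a$. So $\exists c,\ ((a, c) \in f_1 \wedge (c, b) \in f_2)$ by
Definition 10.10.6. So $(c, b) \in f_2$ for some $c$. So $b \in \mathrm{Range}(f_2)$.

Therefore $\mathrm{Dom}(f_2 \circ f_1) \subseteq \mathrm{Dom}(f_1)$ and $\mathrm{Range}(f_2 \circ f_1) \subseteq \mathrm{Range}(f_2)$. But $\mathrm{Dom}(f_1) \subseteq A_1$ and $\mathrm{Range}(f_2) \subseteq B_2$.
Hence $\mathrm{Dom}(f_2 \circ f_1) \subseteq A_1$ and $\mathrm{Range}(f_2 \circ f_1) \subseteq B_2$.


10.10.10 REMARK: The composite of two functions.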

Theorem 10.10.11 shows that the composition of any two functions with specified domains and target spaces 
yields a partially defined function from the first domain to the second target space. If the domains and 
target spaces of the functions are not specified, as in Definition 10.2.2 line (i), then the result is a function
in the same sense, i.e. without any specified domain or target space. (Theorems 10.10.9 and 10.10.11 are
illustrated in Figure 10.10.1.)


Figure 10.10.1 Composite of two functions is a partially defined function
(Diagram labels: $f_1 : A_1 \to B_1$, $f_2 : A_2 \to B_2$, $f_2 \circ f_1 : A_1 \mathrel{\mathring{\to}} B_2$.)


10.10.11 THEOREM: Source and target sets for the composite of arbitrary functions. 
The composite of any two functions $f_1 : A_1 \to B_1$ and $f_2 : A_2 \to B_2$ is a partially defined function
$f_2 \circ f_1 : A_1 \mathrel{\mathring{\to}} B_2$.


PROOF: This is an immediate corollary of Theorem 10.10.9.


10.10.12 REMARK: Domains and ranges of composites of functions. 
Theorem 10.10.13 requires $f_1$ and $f_2$ to be functions, but by Theorem 10.10.11, the composite $f_2 \circ f_1$ is only a partial
function on the source space $\mathrm{Dom}(f_1)$. But if the domain is not mentioned, the composite $f_2 \circ f_1$ is a
function in the sense of Definition 10.2.2 line (i). However, if $f_1$ is not injective, the composite $f_2 \circ f_1^{-1}$ is
only guaranteed to be a relation, not necessarily a function. Nevertheless, the domain and range of general
relations are well defined by Definition 9.5.4.


10.10.13 THEOREM: Domains and ranges of composites of functions and inverses of functions. 
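Let $f_1$ and $f_2$ be functions.


(i) $\mathrm{Dom}(f_2 \circ f_1) = f_1^{-1}(\mathrm{Dom}(f_2))$.
(ii) $\mathrm{Range}(f_2 \circ f_1) = f_2(\mathrm{Range}(f_1))$.
(iii) $\mathrm{Dom}(f_2 \circ f_1^{-1}) = f_1(\mathrm{Dom}(f_2))$.
(iv) $\mathrm{Range}(f_2 \circ f_1^{-1}) = f_2(\mathrm{Dom}(f_1))$.
(v) $\mathrm{Dom}(f_2 \circ f_1^{-1}) = \mathrm{Range}(f_1|_{\mathrm{Dom}(f_2)})$.
(vi) $\mathrm{Range}(f_2 \circ f_1^{-1}) = \mathrm{Range}(f_2|_{\mathrm{Dom}(f_1)})$.
(vii) $\mathrm{Dom}(f_2 \circ f_1) = f_1^{-1}(\mathrm{Dom}(f_2) \cap \mathrm{Range}(f_1))$.
(viii) $\mathrm{Range}(f_2 \circ f_1) = f_2(\mathrm{Range}(f_1) \cap \mathrm{Dom}(f_2))$.
(ix) $\mathrm{Dom}(f_2 \circ f_1^{-1}) = f_1(\mathrm{Dom}(f_1) \cap \mathrm{Dom}(f_2))$.
(x) $\mathrm{Range}(f_2 \circ f_1^{-1}) = f_2(\mathrm{Dom}(f_1) \cap \mathrm{Dom}(f_2))$.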


PROOF: Part (i) follows from Theorem 9.6.6 (i).
Part (ii) follows from Theorem 9.6.6 (ii).
Part (iii) follows from Theorem 9.6.6 (iii).
Part (iv) follows from Theorem 9.6.6 (iv).
Part (v) follows from part (iii) and Theorem 10.6.9 (ii).
Part (vi) follows from part (iv) and Theorem 10.6.9 (ii).
Part (vii) follows from Theorem 9.6.6 (v).
Part (viii) follows from Theorem 9.6.6 (vi).
Part (ix) follows from Theorem 9.6.6 (vii).
Part (x) follows from Theorem 9.6.6 (viii).
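
Definition 10.10.6 and parts (i) to (iv) of Theorem 10.10.13 can be illustrated with finite relations. The following is a minimal sketch under the assumption that relations and functions are modelled as Python sets of ordered pairs; the helper names (`compose`, `inverse`, `dom`, `rng`, `image`, `is_partial_function`) are hypothetical and chosen only for this illustration.

```python
def compose(f2, f1):
    """Composite f2 o f1 = {(a, b); exists c, (a, c) in f1 and (c, b) in f2}."""
    return {(a, b) for (a, c) in f1 for (c2, b) in f2 if c == c2}


def inverse(f):
    """Inverse relation {(b, a); (a, b) in f}."""
    return {(b, a) for (a, b) in f}


def dom(f):
    return {a for (a, _) in f}


def rng(f):
    return {b for (_, b) in f}


def image(f, S):
    """Set-map of the relation f applied to the set S."""
    return {b for (a, b) in f if a in S}


def is_partial_function(f):
    """A set of pairs is a partial function when no first element repeats."""
    return len(dom(f)) == len(f)


# Two functions whose range and domain only partly overlap; f1 is not injective.
f1 = {(1, 2), (2, 3), (3, 3), (4, 9)}
f2 = {(2, 10), (3, 20), (5, 30)}

g = compose(f2, f1)           # f2 o f1
h = compose(f2, inverse(f1))  # f2 o f1^{-1}
```

Here `g` is a partial function whose domain is strictly smaller than Dom(f1), while `h` is a relation which is not a partial function because `f1` is not injective, in line with Remark 10.10.12.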


10.10.14 REMARK: Regarding identity functions as partial functions. 

In Theorem 10.10.15, identity functions are regarded as partial functions. (For some related assertions for 
well-defined functions on specified domains, see Theorem 10.5.19. For an application of Theorem 10.10.15, 
see the proof of Theorem 52.1.13.) 


In Theorem 10.10.15 (v), the inverse $f^{-1}$ of $f$ is not a function if $f$ is not injective, but the expression $f^{-1}|_{f(X)}$
is well defined by an obvious extension of Notation 10.4.4 to general relations. (See Notation 9.6.22.)


For symmetry, one would expect a variant of part (iii) which asserts something like $\mathrm{id}_Y \circ f = f|^{Y}$, where
$f|^{Y}$ denotes the restriction of the range of $f$ to $Y$. Such a notation is not in common use. (Nor is the
concept.) So this assertion is not made here. Consequently part (vi) cannot assert the symmetrically
expected $\mathrm{id}_{f(X)} \circ f = f|^{f(X)}$ for example, even though it is true. (See Theorem 9.6.24 (ii) for this.)


10.10.15 THEOREM: Some basic composition properties of identity functions. 
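Let $X$ and $Y$ be sets. Let $f$ and $g$ be partial functions.
(i) $\mathrm{id}_X(Y) = X \cap Y$.
(ii) $\mathrm{id}_X \circ \mathrm{id}_Y = \mathrm{id}_{X \cap Y}$.
(iii) $f \circ \mathrm{id}_X = f|_X$.
(iv) $\mathrm{id}_Y \circ f = f \circ \mathrm{id}_{f^{-1}(Y)} = f|_{f^{-1}(Y)}$.
(v) $\mathrm{id}_X \circ f^{-1} = f^{-1} \circ \mathrm{id}_{f(X)} = f^{-1}|_{f(X)}$ if $f$ is injective.
(vi) $f \circ \mathrm{id}_X = \mathrm{id}_{f(X)} \circ f$ if $f$ is injective.
(vii) $f^{-1} \circ \mathrm{id}_Y = \mathrm{id}_{f^{-1}(Y)} \circ f^{-1}$. (Note that $f^{-1}$ is not necessarily a function.)
(viii) $f \circ (g|_X) = f \circ \mathrm{id}_{g(X)} \circ g = f|_{g(X)} \circ g$ if $g$ is injective.
(ix) $f|_X \circ g = f \circ \mathrm{id}_X \circ g = f \circ (g|_{g^{-1}(X)})$.
(x) $f^{-1} \circ f = \mathrm{id}_{\mathrm{Dom}(f)}$ if $f$ is injective.
(xi) $f \circ f^{-1} = \mathrm{id}_{\mathrm{Range}(f)}$. (Note that $f^{-1}$ is not necessarily a function.)
(xii) $f^{-1} \circ \mathrm{id}_Y \circ f = f^{-1}|_Y \circ f = \mathrm{id}_{f^{-1}(Y)}$ if $f$ is injective.
(xiii) $f \circ \mathrm{id}_X \circ f^{-1} = f|_X \circ f^{-1} = \mathrm{id}_{f(X)}$. (Note that $f^{-1}$ is not necessarily a function.)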


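PROOF: For part (i), $b \in \mathrm{id}_X(Y)$ if and only if $\exists a \in Y,\ (a, b) \in \mathrm{id}_X$ by Definition 10.6.4 (i). So $b \in \mathrm{id}_X(Y)$
if and only if $\exists a \in Y,\ (a \in X \text{ and } a = b)$ by Definition 10.2.27. (Alternatively by Notation 9.6.10.) Hence
$b \in \mathrm{id}_X(Y)$ if and only if $b \in Y$ and $b \in X$. In other words, $b \in X \cap Y$ by Definition 8.1.2.

For part (ii), $\mathrm{Dom}(\mathrm{id}_X \circ \mathrm{id}_Y) = \mathrm{id}_Y^{-1}(\mathrm{Dom}(\mathrm{id}_X))$ by Theorem 10.10.13 (i). By Definition 10.2.27, $\mathrm{id}_Y^{-1} = \mathrm{id}_Y$
and $\mathrm{Dom}(\mathrm{id}_X) = X$. So $\mathrm{Dom}(\mathrm{id}_X \circ \mathrm{id}_Y) = \mathrm{id}_Y(X) = X \cap Y$ by part (i). For $z \in X \cap Y$, $(\mathrm{id}_X \circ \mathrm{id}_Y)(z) = \mathrm{id}_X(\mathrm{id}_Y(z)) = \mathrm{id}_X(z) = z$
by Definition 10.2.27. Hence $\mathrm{id}_X \circ \mathrm{id}_Y = \mathrm{id}_{X \cap Y}$ by Definition 10.2.27.

For part (iii), $\mathrm{Dom}(f \circ \mathrm{id}_X) = \mathrm{id}_X^{-1}(\mathrm{Dom}(f))$ by Theorem 10.10.13 (i). So $\mathrm{Dom}(f \circ \mathrm{id}_X) = X \cap \mathrm{Dom}(f)$ by
part (i). Then by Theorem 10.4.5 line (10.4.2), $\mathrm{Dom}(f|_X) = X \cap \mathrm{Dom}(f) = \mathrm{Dom}(f \circ \mathrm{id}_X)$. For any $z$ in
this common domain of $f \circ \mathrm{id}_X$ and $f|_X$, $(f \circ \mathrm{id}_X)(z) = f(\mathrm{id}_X(z)) = f(z)$ and $f|_X(z) = f(z)$. Hence
$f \circ \mathrm{id}_X = f|_X$ by Theorem 10.2.13. (Alternatively, part (iii) follows from Theorem 9.6.24 (i).)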

Part (iv) follows from Theorem 9.6.25 (iv). 

Part (v) follows from parts (iv) and (iii). 

Part (vi) follows from Theorem 9.6.25 (iii). 

Part (vii) follows from Theorem 9.6.25 (iii). 

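For part (viii), $f \circ (g|_X) = f \circ g \circ \mathrm{id}_X = f \circ \mathrm{id}_{g(X)} \circ g$ by parts (iii) and (vi) because $g$ is assumed to be
injective, and then $f \circ \mathrm{id}_{g(X)} \circ g = f|_{g(X)} \circ g$ by part (iii).
For part (ix), $f|_X \circ g = f \circ \mathrm{id}_X \circ g$ by part (iii), and then $f \circ \mathrm{id}_X \circ g = f \circ (g|_{g^{-1}(X)})$ by part (iv).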




Part (x) follows from Theorem 9.6.19 (ii).
Part (xi) follows from Theorem 9.6.19 (iii).
For part (xii), $f^{-1} \circ \mathrm{id}_Y \circ f = f^{-1} \circ f \circ \mathrm{id}_{f^{-1}(Y)} = \mathrm{id}_{\mathrm{Dom}(f)} \circ \mathrm{id}_{f^{-1}(Y)} = \mathrm{id}_{\mathrm{Dom}(f) \cap f^{-1}(Y)} = \mathrm{id}_{f^{-1}(Y)}$ by
parts (iv), (x) and (ii), or alternatively by Theorem 9.6.25 (v), and $f^{-1} \circ \mathrm{id}_Y \circ f = f^{-1}|_Y \circ f$ by part (iii).
Part (xiii) follows from Theorem 9.6.25 (vi) and part (iii).


10.11. Cartesian products of families of sets 


10.11.1 REMARK: Alternative ways of thinking about the Cartesian product of a family of sets. 

The elements of the Cartesian product in Definition 10.11.2 may be thought of as either functions or sets 
of ordered pairs (the graphs of the functions). The perspective may be chosen according to one's purposes. 
(This subjective difference between functions and their graphs is discussed particularly in Remark 19.6.2 in 
the context of relations.) 


10.11.2 DEFINITION: The Cartesian product of a family of sets $(S_i)_{i \in I}$ is the set of functions

\[ \{ f : I \to \textstyle\bigcup_{i \in I} S_i;\ \forall i \in I,\ f_i \in S_i \}. \]

10.11.3 NOTATION: $\times_{i \in I} S_i$, for a family of sets $(S_i)_{i \in I}$, denotes the Cartesian product of the family of
sets according to Definition 10.11.2.


10.11.4 REMARK: Alternative expression for the Cartesian product of a family of sets. 
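The Cartesian product in Definition 10.11.2 may be written as

\[ \times_{i \in I} S_i = \{ f : I \to \textstyle\bigcup_{i \in I} S_i;\ \forall i \in I,\ f_i \in S_i \} \]
\[ = \{ f \in \mathbb{P}(I \times \textstyle\bigcup_{i \in I} S_i);\ (\forall i \in I,\ \exists' y \in \textstyle\bigcup_{i \in I} S_i,\ (i, y) \in f) \wedge (\forall j \in I,\ \exists y \in S_j,\ (j, y) \in f) \}, \]

which shows that $\times_{i \in I} S_i$ is a well-defined ZF set by Theorem 7.7.2, the ZF specification theorem. Otherwise,
this formula seems to have no obvious value.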


10.11.5 REMARK: The special case of the Cartesian product of a constant family of sets. 

If $S_i = X$ for all $i \in I$, then $\times_{i \in I} S_i = X^I$ for any sets $X$ and $I$. (See Notation 10.2.17 for the set of
functions $X^I$.)

If $I = \mathbb{N}_n$ for some $n \in \mathbb{Z}_0^+$ and $S_i = X$ for all $i \in I$, then $\times_{i \in I} S_i = X^n$. (See Notation 14.1.21 for index
sets $\mathbb{N}_n$.)


10.11.6 THEOREM: Some very basic properties of Cartesian products of families of sets. 
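Cartesian products $\times_{i \in I} S_i$ of families of sets $(S_i)_{i \in I}$ have the following properties.


(i) If $I = \emptyset$, then $\times_{i \in I} S_i = \{\emptyset\}$.
(ii) If $S_i = \emptyset$ for some $i \in I$, then $\times_{i \in I} S_i = \emptyset$.
(iii) If $I = \{j\}$, then $\times_{i \in I} S_i = \{(j, x);\ x \in S_j\} = \{j\} \times S_j$.
(iv) If $\bigcap_{i \in I} S_i \neq \emptyset$, then $\times_{i \in I} S_i \neq \emptyset$.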


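PROOF: For part (i), let $(S_i)_{i \in I}$ be a family of sets. Then $\times_{i \in I} S_i = \{f : I \to \bigcup_{i \in I} S_i;\ \forall i \in I,\ f_i \in S_i\}$ by
Definition 10.11.2. Let $I = \emptyset$. Then $\times_{i \in I} S_i = \{f : \emptyset \to \emptyset;\ \top\} = \emptyset^{\emptyset} = \{\emptyset\}$ by Theorem 10.2.25. (The symbol
“$\top$” denotes the always-true predicate. See Notation 5.1.10.)

For part (ii), let $(S_i)_{i \in I}$ be a family of sets such that $S_j = \emptyset$ for some $j \in I$. Then $\times_{i \in I} S_i = \{f : I \to \bigcup_{i \in I} S_i;\ \forall i \in I,\ f_i \in S_i\} = \{f : I \to \bigcup_{i \in I} S_i;\ f_j \in \emptyset \wedge \forall i \in I \setminus \{j\},\ f_i \in S_i\} = \emptyset$.

For part (iii), let $(S_i)_{i \in I}$ be a family of sets with $I = \{j\}$ for some $j$. Then $\times_{i \in I} S_i = \{f : \{j\} \to S_j;\ f_j \in S_j\}$
by Definition 10.11.2. Hence $\times_{i \in I} S_i = \{(j, x);\ x \in S_j\} = \{j\} \times S_j$.

For part (iv), let $(S_i)_{i \in I}$ satisfy $\bigcap_{i \in I} S_i \neq \emptyset$. Let $x \in \bigcap_{i \in I} S_i$. Let $f = I \times \{x\}$. Then $f_j = x \in \bigcap_{i \in I} S_i \subseteq S_j$
for all $j \in I$. So $f \in \times_{i \in I} S_i$. Therefore $\times_{i \in I} S_i \neq \emptyset$ by Definition 6.3.9 (EE).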
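
For a finite index set, the Cartesian product of Definition 10.11.2 can be enumerated explicitly. The following is a minimal sketch, assuming that a family of sets is modelled as a Python dict from indices to sets and that each element of the product (a function on the index set) is modelled as a dict; the name `cartesian_product` is hypothetical.

```python
from itertools import product


def cartesian_product(family):
    """List all functions f on the index set with f[i] in family[i] for every i."""
    index = list(family)
    return [dict(zip(index, values))
            for values in product(*(family[i] for i in index))]


# Empty index set: exactly one element, the empty function.
empty_index = cartesian_product({})

# One empty member set: the product is empty.
with_empty_member = cartesian_product({"a": set(), "b": {1}})

# Singleton index set {j}: one function for each x in S_j.
singleton_index = cartesian_product({"j": {1, 2}})

# Non-empty intersection: the constant function with value 1 is an element.
overlapping = cartesian_product({1: {0, 1}, 2: {1, 2}})
```

The empty-index case relies on `itertools.product()` with no arguments yielding a single empty tuple, which mirrors the fact that the only function on the empty set is the empty function.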


10.11.7 REMARK: Set-family choice functions.

Definition 10.11.8 is analogous to Definitions 10.3.4 and 10.3.5 for set-collection choice functions and power- 
set choice functions respectively. (See Remark 10.3.3 for the various kinds of choice functions.) So the 
Cartesian product of a family of non-empty sets is exactly the same thing as the set of all choice functions 
for that set-family. Hence an axiom of choice guarantees the existence of such choice functions if and only if
the Cartesian product set is non-empty. Thus Cartesian products of set-families are very closely linked with 
axioms of choice. 


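10.11.8 DEFINITION: A set-family choice function for a family of non-empty sets $(X_\alpha)_{\alpha \in I}$ is a function
$f : I \to \bigcup_{\alpha \in I} X_\alpha$ which satisfies $\forall \alpha \in I$, $f(\alpha) \in X_\alpha$. In other words, it is an element of $\times_{\alpha \in I} X_\alpha$.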


10.11.9 REMARK: Axiom of choice and non-emptiness of Cartesian products of families of non-empty sets.
The Cartesian product in Definition 10.11.2 is a well-defined set because it is specified as a subset of a well-defined
set $\mathbb{P}(I \times \bigcup_{i \in I} S_i)$ which is restricted by a well-defined predicate. However, it is not guaranteed to be
non-empty in general unless one adds an axiom of choice to the Zermelo-Fraenkel axioms. Except when 
one is trying very hard to construct pathological sets, it will usually be true that a Cartesian product is 
non-empty if all of the member sets are non-empty. But the non-emptiness of the Cartesian product of an 
infinite family of non-empty sets should not be assumed without proof. 


If the index set is finite, then the product of a family of non-empty sets is non-empty by an inductive 
argument in Zermelo-Fraenkel set theory. If the index set is countably infinite, then the product of a family 
of non-empty sets is non-empty by an inductive argument in Zermelo-Fraenkel set theory with the axiom 
of countable choice. For a general index set, the product of a family of non-empty sets is non-empty in 
Zermelo-Fraenkel set theory with the general axiom of choice. Since finite and countably infinite sets are 
defined in Sections 13.5 and 13.7 respectively, the statement and proof of those cases must wait until then, 
but the general case can be stated here as Theorem 10.11.10. 


If one possesses a well-ordering on the union of a family of non-empty sets, the axiom of choice
is unnecessary to prove that the Cartesian product is non-empty. Thus if one possesses a well-ordering on
the set $\bigcup_{i \in I} S_i$ corresponding to the family $(S_i)_{i \in I}$ in the proof of Theorem 10.11.10, one may define the
choice function $A$ as the set $\{(i, \min(S_i));\ i \in I\}$. (See Section 11.6 for well-orderings.)
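
The explicit choice function $\{(i, \min(S_i));\ i \in I\}$ can be imitated for finite families. The following is a minimal sketch, assuming that the member sets contain integers so that Python's built-in ordering can stand in for the well-ordering on the union; the name `min_choice` is hypothetical.

```python
def min_choice(family):
    """Choice function i -> min(S_i) for a family {i: S_i} of non-empty sets.

    The ordering used by min() plays the role of the given well-ordering.
    """
    return {i: min(S) for i, S in family.items()}


family = {"a": {3, 1, 2}, "b": {7}, "c": {5, 4}}
choice = min_choice(family)
```

No arbitrary selection is made here: the value at each index is determined by a single rule, which is why no axiom of choice is needed in the well-ordered case.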


10.11.10 THEOREM [ZF+AC]: Whitehead and Russell’s “multiplicative axiom”.
The Cartesian product of a family of non-empty sets is non-empty.


PROOF: Some formulations of ZF+AC set theory give the statement of this theorem as the definition of the
axiom of choice. Axiom (9) in Definition 7.11.10 means that there exists a set which intersects each member
of an arbitrary set of disjoint non-empty sets in exactly one element. For an arbitrary family of non-empty
sets $(S_i)_{i \in I}$, the sets $\{i\} \times S_i$ for $i \in I$ are disjoint. Therefore the axiom of choice implies that there
exists a set $A \subseteq \bigcup_{i \in I}(\{i\} \times S_i)$ such that $A \cap (\{i\} \times S_i)$ contains one and only one element for each $i \in I$.
Such a set $A$ is a function from $I$ to $\bigcup_{i \in I} S_i$ such that $A(i) \in S_i$ for all $i \in I$. In other words, $A \in \times_{i \in I} S_i$.
Hence $\times_{i \in I} S_i \neq \emptyset$.


10.11.11 REMARK: The axiom of choice permits universal and existential quantifiers to be swapped. 
In predicate logic, from a statement “$\exists x, \forall y, A(x, y)$”, one may infer the corresponding statement with
quantifiers swapped, namely “$\forall y, \exists x, A(x, y)$”. (See Theorem 6.6.24 (iii).) But the converse inference is not
justified. Similarly, if the quantified variables are restricted to specific sets (as in Notation 7.2.7), then from
a statement “$\exists x \in S, \forall y \in T, A(x, y)$”, one may also infer the corresponding statement with quantifiers
swapped, namely “$\forall y \in T, \exists x \in S, A(x, y)$”, but the converse inference is not justified. An axiom of choice
may be thought of as reversing the quantifier order in a way which, at first sight, might seem to be not
justified. With a suitable choice axiom, one may infer (2) from (1), where $(T_x)_{x \in S}$ is a family of sets.


(1) $\forall x \in S,\ \exists y \in T_x,\ A(x, y)$.
(2) $\exists y \in \times_{x \in S} T_x,\ \forall x \in S,\ A(x, y(x))$.


In statement (2), the simple variable $y$ has been replaced with a function $y : S \to \bigcup_{x \in S} T_x$ such that
$\forall x \in S$, $y(x) \in T_x$. This is the kind of choice function which is delivered by an axiom of choice. (The axiom
of countable choice requires the set $S$ to be countable.) It is very tempting to think that since there exists
at least one $y$ in each set $T_x$ in line (1), then $y$ may be regarded as a function of $x$ in the same way that $T_x$
is. But in the absence of a suitable axiom of choice, this kind of functional dependence cannot be proved in
general in ZF set theory. (See Theorem 10.5.17 (ii) for proof in ZF+AC set theory.)


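In some common scenarios, swapping quantifiers is easy to achieve without an axiom of choice. A well known
example is the $\varepsilon$-$\delta$ limit definition. Thus if the proposition “$\forall \varepsilon \in \mathbb{R}^+, \exists \delta \in \mathbb{R}^+, f(B_{a,\delta}) \subseteq B_{f(a),\varepsilon}$” is given,
one may infer the proposition “$\exists \delta : \mathbb{R}^+ \to \mathbb{R}^+, \forall \varepsilon \in \mathbb{R}^+, f(B_{a,\delta(\varepsilon)}) \subseteq B_{f(a),\varepsilon}$” without invoking any axiom
of choice. (This example is discussed further in Remark 45.2.1.)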


One may sometimes have the impression that the need for an axiom of choice in a particular proof is a 
somewhat subjective matter. This is not true. When the quantifiers are reversed (or swapped), the choice 
function, which is $\delta : \mathbb{R}^+ \to \mathbb{R}^+$ in the real function limit example, is supposed to be an object in the set
theory universe in which Zermelo-Fraenkel set theory is being interpreted. This object either does or does 
not exist in that universe. One may not personally know currently whether that object is contained in that 
universe, but once the universe has been specified, it is a clear-cut true/false question whether the object is 
contained in it. 


10.11.12 REMARK: Analogy of Cartesian products of families to cross-sections of fibrations. 

The definition of $\times_{i \in I} S_i$ is (dimly) reminiscent of the definition of a cross-section of a non-uniform fibration
because of the way that an element of $S_i$ must be chosen for each $i \in I$. (See Definition 21.1.2 for non-uniform
fibrations.) In this (slightly far-fetched) analogy, the set $I$ acts in the role of a base set, while $S_i$ is the fibre
set at each point $i$ in the base set.


10.12. Projection maps and slices for binary Cartesian products 


10.12.1 REMARK: Projection maps of Cartesian products of sets. 

Projection maps are amongst the most useful constructions for Cartesian set-products. The choice of symbol
“$\Pi$” in Notation 10.12.3 for projection maps is slightly non-standard. The symbol “$\pi$” is seen more often,
but in the differential geometry context, there are many kinds of projections, for example for tangent bundles
and various kinds of fibrations and fibre bundles. In this book, the symbol “$\pi$” is preferred for such structural
projection maps, while “$\Pi$” is preferred for selecting sub-tuples of elements of Cartesian set-products. (See
Definitions 10.13.2 and 10.13.6 for projection maps for Cartesian products of families of sets.) 


10.12.2 DEFINITION: The projection (map) of a Cartesian product $S_1 \times S_2$ onto a component $k \in \{1, 2\}$
is the map $(x_1, x_2) \mapsto x_k$ from $S_1 \times S_2$ to $S_k$.


10.12.3 NOTATION: $\Pi_k$, for $k \in \{1, 2\}$, denotes the projection map of a Cartesian product $S_1 \times S_2$ onto
the component $k$. In other words, $\Pi_k : S_1 \times S_2 \to S_k$ is defined by $(x_1, x_2) \mapsto x_k$.


10.12.4 REMARK:  Cartesian slice sets. 

The “slice sets” for Cartesian set-products in Theorem 10.12.8 are applied in the proof of Theorem 32.10.3. 
They are also directly relevant to the definitions of partial derivatives in Section 41.1. (See Definition 10.13.10 
for slice sets for Cartesian products of families of sets.) Definitions 10.12.2, 10.12.5 and 10.12.7 are illustrated 
in Figure 10.12.1. 


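Figure 10.12.1 Slices of subsets of Cartesian products through given points
(Diagram labels: $S_1$, $S_2$, $S_1 \times S_2$, the set $A$, the point $p = (p_1, p_2)$, the slices $\mathrm{Slice}_1^p(A)$ and $\mathrm{Slice}_2^p(A)$, and the projected slices $\Pi_1(\mathrm{Slice}_1^p(A))$ and $\Pi_2(\mathrm{Slice}_2^p(A))$.)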


10.12.5 DEFINITION: Slices of subsets of Cartesian set-products through given points.
The slice of a set $A$ through a point $p$ along component $k$, for $A \subseteq S_1 \times S_2$, $p = (p_1, p_2) \in S_1 \times S_2$ and
$k \in \{1, 2\}$, is the subset $\{(x_1, x_2) \in A;\ \forall i \in \{1, 2\} \setminus \{k\},\ x_i = p_i\}$ of $A$.


10.12.6 NOTATION: $\mathrm{Slice}_k^p(A)$, for $k \in \{1, 2\}$, $p \in S_1 \times S_2$ and $A \subseteq S_1 \times S_2$, denotes the slice of $A$ through
$p$ along component $k$. In other words, $\mathrm{Slice}_k^p(A) = \{(x_1, x_2) \in A;\ \forall i \in \{1, 2\} \setminus \{k\},\ x_i = p_i\}$.


10.12.7 DEFINITION: Projected slices of subsets of Cartesian set-products through given points.
The projected slice of a set $A$ through a point $p$ along component $k$, for $A \subseteq S_1 \times S_2$, $p = (p_1, p_2) \in S_1 \times S_2$
and $k \in \{1, 2\}$, is the set $\Pi_k(\mathrm{Slice}_k^p(A)) = \{x_k \in S_k;\ (x_1, x_2) \in A \text{ and } \forall i \in \{1, 2\} \setminus \{k\},\ x_i = p_i\}$.


10.12.8 THEOREM: The projected slice of a set-union equals the union of projections of slices. 
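Let $S_1$ and $S_2$ be sets.


(i) $\forall p \in S_1 \times S_2,\ \forall C \in \mathbb{P}(\mathbb{P}(S_1 \times S_2)),\ \Pi_1(\mathrm{Slice}_1^p(\bigcup C)) = \bigcup_{A \in C} \Pi_1(\mathrm{Slice}_1^p(A))$.

(ii) $\forall p \in S_1 \times S_2,\ \forall C \in \mathbb{P}(\mathbb{P}(S_1 \times S_2)),\ \Pi_2(\mathrm{Slice}_2^p(\bigcup C)) = \bigcup_{A \in C} \Pi_2(\mathrm{Slice}_2^p(A))$.

PROOF: For part (i), let $p \in S_1 \times S_2$ and $C \subseteq \mathbb{P}(S_1 \times S_2)$. Then $\Pi_1(\mathrm{Slice}_1^p(\bigcup C)) = \{x_1;\ x \in \mathrm{Slice}_1^p(\bigcup C)\}$
$= \{x_1;\ x \in \bigcup C \text{ and } x_2 = p_2\} = \{x_1;\ (\exists A \in C,\ x \in A) \text{ and } x_2 = p_2\} = \{x_1;\ \exists A \in C,\ (x \in A \text{ and } x_2 = p_2)\}$
$= \bigcup_{A \in C} \{x_1;\ x \in A \text{ and } x_2 = p_2\} = \bigcup_{A \in C} \{x_1;\ x \in \mathrm{Slice}_1^p(A)\} = \bigcup_{A \in C} \Pi_1(\mathrm{Slice}_1^p(A))$.
Part (ii) may be proved as for part (i).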


10.12.9 REMARK: Slices of functions on subsets of Cartesian products. 

Definition 10.12.10 is the function slice analogue of the set slices in Definition 10.12.5. A “slice of a function” 
focuses attention on the properties of the function along a particular coordinate direction, as one does when 
defining partial derivatives. 


The “projected slice” concept in Definition 10.12.7 shifts the domain of a function slice to a component
set. The expression $f|_{\mathrm{Slice}_k^p(A)} \circ \Pi_k^{-1}$ may at first sight not appear to be a valid function because $\Pi_k$ is
not an invertible function in general. However, $\Pi_k^{-1}$ is a valid relation, which is the inverse of the function
$\Pi_k : S_1 \times S_2 \to S_k$, and when $\Pi_k^{-1}$ is composed with the restricted function $f|_{\mathrm{Slice}_k^p(A)}$, the result is a
well-defined function from the projected set-slice $\Pi_k(\mathrm{Slice}_k^p(A))$ to $Y$.

10.12.10 DEFINITION: Slices of functions on Cartesian products through given points. 


The slice of a function $f : A \to Y$ through a point $p$ along component $k$, for $A \subseteq S_1 \times S_2$, a set $Y$,
$p = (p_1, p_2) \in S_1 \times S_2$ and $k \in \{1, 2\}$, is the restriction $f|_{\mathrm{Slice}_k^p(A)}$ of $f$ to $\mathrm{Slice}_k^p(A)$.


The projected slice of a function $f : A \to Y$ through a point $p$ along component $k$, for $A \subseteq S_1 \times S_2$, a set $Y$,
$p = (p_1, p_2) \in S_1 \times S_2$ and $k \in \{1, 2\}$, is the function $f|_{\mathrm{Slice}_k^p(A)} \circ \Pi_k^{-1}$ on the subset $\Pi_k(\mathrm{Slice}_k^p(A))$ of $S_k$.


10.12.11 NOTATION: $\mathrm{Slice}_k^p(f)$, for $k \in \{1, 2\}$, $p \in S_1 \times S_2$, $f : A \to Y$, $A \subseteq S_1 \times S_2$ and a set $Y$, denotes
the slice of $f$ through $p$ along component $k$. In other words, $\mathrm{Slice}_k^p(f) = f|_{\mathrm{Slice}_k^p(A)}$.


10.12.12 REMARK: Projected function slices, substitution operators and lift maps. 
The “projected function slice” operation in Definition 10.12.10 has the same effect as the substitution of one
of the parameters of the function. Thus

\[ \forall t \in S_k,\quad (\mathrm{Slice}_k^p(f) \circ \Pi_k^{-1})(t) = (f|_{\mathrm{Slice}_k^p(A)} \circ \Pi_k^{-1})(t) = f(\mathrm{subs}_{k,t}(p)), \]

where $\mathrm{subs}_{k,t}(p)$ means the result of substituting $t$ for element $k$ of $p$. (See Definition 14.12.23 (ii).) However,
it is more accurate to think of projected function slices as “lift maps” because they are right inverses of
projection maps.


10.12.13 DEFINITION: Lift maps from component spaces to Cartesian set-products. 
The lift map for component $k$ to a point $(p_1, p_2) \in S_1 \times S_2$, for $k \in \{1, 2\}$, is the map from $S_k$ to $S_1 \times S_2$
defined by $t \mapsto (x_1, x_2)$ with $x_k = t$ and $\forall i \in \{1, 2\} \setminus \{k\}$, $x_i = p_i$.


10.12.14 NOTATION: $\mathrm{Lift}_k^p$, for $k \in \{1, 2\}$ and $p \in S_1 \times S_2$, denotes the lift map for component $k$ of the
point $p$. In other words, $\mathrm{Lift}_k^p : S_k \to S_1 \times S_2$ is the map defined by

\[ \forall t \in S_k,\ \forall i \in \{1, 2\},\quad \mathrm{Lift}_k^p(t)(i) = \begin{cases} t & \text{if } i = k \\ p_i & \text{if } i \neq k. \end{cases} \]


10.12.15 REMARK: Lift maps are right inverses of projection maps. 

Although Definition 10.12.13 and Notation 10.12.14 may seem overly complicated for such a simple task, lift
maps are often required. One may write the map $\mathrm{Slice}_k^p(f) \circ \Pi_k^{-1} = f|_{\mathrm{Slice}_k^p(A)} \circ \Pi_k^{-1}$ as
the simpler and more meaningful expression $f \circ \mathrm{Lift}_k^p$, where $A = \mathrm{Dom}(f)$. Lift maps are right inverses of
projection maps as follows.

\[ \forall k \in \{1, 2\},\ \forall p \in S_1 \times S_2,\ \forall t \in S_k,\quad \Pi_k(\mathrm{Lift}_k^p(t)) = t. \]

In other words, $\forall k \in \{1, 2\}$, $\forall p \in S_1 \times S_2$, $\Pi_k \circ \mathrm{Lift}_k^p = \mathrm{id}_{S_k}$. Note, however, that this assumes the partial
function composition concept in Definition 10.10.6.


Substitution operators are distinguished from lift maps by the fact that substitution operators map from
tuples to tuples, whereas lift maps map from values of individual components to the tuple space. Thus
$\mathrm{Lift}_k^p(t)$ gives the same “output” as the expression $\mathrm{subs}_{k,t}(p)$. In other words, $p$ is a parameter for the lift
map, but $p$ is an “input” for the substitution operator, whereas $t$ is round the other way.
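
The relations between projections, slices and lift maps for binary products can be checked on a small finite example. The following is a minimal sketch, assuming that elements of $S_1 \times S_2$ are modelled as Python 2-tuples, that components are numbered 1 and 2 as in the text, and that a function on a finite set $A$ is modelled as a dict; all helper names (`proj`, `slice_set`, `projected_slice`, `lift`) are hypothetical.

```python
def proj(k, x):
    """Projection of the pair x onto component k, for k in {1, 2}."""
    return x[k - 1]


def slice_set(k, p, A):
    """Slice of A through p along component k: the other component equals that of p."""
    other = 2 - k  # zero-based position of the component which is held fixed
    return {x for x in A if x[other] == p[other]}


def projected_slice(k, p, A):
    """Projection onto component k of the slice of A through p along component k."""
    return {proj(k, x) for x in slice_set(k, p, A)}


def lift(k, p):
    """Lift map for component k to the point p: t -> p with component k replaced by t."""
    def lift_kp(t):
        return (t, p[1]) if k == 1 else (p[0], t)
    return lift_kp


A = {(1, 5), (2, 5), (2, 6), (3, 7)}
p = (2, 5)
f = {x: 10 * x[0] + x[1] for x in A}  # a function f : A -> Y

# The projected function slice, computed directly from the slice of A ...
projected_f = {proj(1, x): f[x] for x in slice_set(1, p, A)}
# ... agrees with the composite of f with the lift map on the projected set-slice.
f_after_lift = {t: f[lift(1, p)(t)] for t in projected_slice(1, p, A)}

# A collection C of subsets whose union is A, as in Theorem 10.12.8.
C = [{(1, 5)}, {(2, 5), (3, 7)}, {(2, 6)}]
union_of_slices = set().union(*(projected_slice(1, p, B) for B in C))
```

The lift map is defined on all of $S_k$, but the composite with `f` is evaluated only on the projected set-slice, where the lifted point lies in $A$.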


10.12.16 REMARK: Partial maps are pull-backs of functions of several variables by lift functions. 

The partial maps in Definition 10.12.17 are pull-backs of functions of two variables by the lift functions in
Definition 10.12.13. Thus the partial maps $f_k^{p_{3-k}} : X_k \to Y$ in Definition 10.12.17 satisfy $f_k^{p_{3-k}} = f \circ \mathrm{Lift}_k^p$ for
$k = 1, 2$ and $p = (p_1, p_2) \in X_1 \times X_2$, where $f : X_1 \times X_2 \to Y$.

The name “partial map” is chosen here by analogy with the familiar partial derivatives in Section 41.1. The
analogy would be more precise if the name “partial function” was used, but this name is already used in this 
book for the partially defined functions in Section 10.9. In this context, the words “function” and “map” 
have distinct meanings, which is unfortunate. 


A better name for the partial maps in Definition 10.12.17 might be “(function) argument projections”. Then
$f_1^{x_2}$ and $f_2^{x_1}$ would be the left and right argument projections respectively, or first and second argument
projections respectively. For more than two arguments, one could define “$n$th argument projections”.


The dot notation in Notation 10.12.18 is completely standard. A particular variable in the variable list is
substituted with a “placeholder” dot. This allows partial maps to be defined inline with no ambiguity.


10.12.17 DEFINITION: Partial maps of functions of two variables. 

A first partial map of a function $f : X_1 \times X_2 \to Y$ is a function $f_1^{x_2} : X_1 \to Y$ defined by $f_1^{x_2} : x_1 \mapsto f(x_1, x_2)$
for some $x_2 \in X_2$.

A second partial map of a function $f : X_1 \times X_2 \to Y$ is a function $f_2^{x_1} : X_2 \to Y$ defined by $f_2^{x_1} : x_2 \mapsto f(x_1, x_2)$
for some $x_1 \in X_1$.


A partial map of a function $f : X_1 \times X_2 \to Y$ is a first or second partial map of $f$.


10.12.18 NOTATION: Dot-notation for partial maps of functions of two variables. 

$f(\cdot, x_2)$, for a function $f : X_1 \times X_2 \to Y$ and $x_2 \in X_2$, denotes the partial map $f_1^{x_2} : X_1 \to Y$ defined by 
$f_1^{x_2} : x_1 \mapsto f(x_1, x_2)$. 

$f(x_1, \cdot)$, for a function $f : X_1 \times X_2 \to Y$ and $x_1 \in X_1$, denotes the partial map $f_2^{x_1} : X_2 \to Y$ defined by 
$f_2^{x_1} : x_2 \mapsto f(x_1, x_2)$. 
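
The dot notation can be imitated in a programming language by fixing one argument of a two-argument function. The following Python sketch is only an illustration: the function `f` and the helper names `first_partial_map` and `second_partial_map` are hypothetical and do not appear elsewhere in this book. 

```python
from functools import partial


def f(x1, x2):
    """Hypothetical example function of two variables."""
    return 10 * x1 + x2


def first_partial_map(f, x2):
    """Return f(., x2), the map x1 -> f(x1, x2) with x2 held fixed."""
    return lambda x1: f(x1, x2)


def second_partial_map(f, x1):
    """Return f(x1, .), the map x2 -> f(x1, x2) with x1 held fixed."""
    return partial(f, x1)


f_dot_7 = first_partial_map(f, 7)   # plays the role of f(., 7)
f_3_dot = second_partial_map(f, 3)  # plays the role of f(3, .)
```

Each returned object is a function of the single remaining variable, which is the sense in which a partial map is obtained from a function of two variables. 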


10.13. Projection maps and slices for general Cartesian products 


10.13.1 REMARK: Projection maps of Cartesian products of set-families onto components. 

As mentioned in Remark 10.12.1, projection maps of Cartesian products of sets onto a single component set 
are among the most useful constructions for a Cartesian product. Definition 10.13.2 generalises the projection 
map in Definition 10.12.2 from the product of two sets to the product of a family of sets. Definition 10.13.6 
generalises this further, namely from a single-component projection to a projection onto an arbitrary subset 
of the index set. 


10.13.2 DEFINITION: The projection map of a Cartesian product $\times_{i \in I} S_i$ onto a component $k \in I$ is the 
map from $\times_{i \in I} S_i$ to $S_k$ defined by $x \mapsto x_k$ for $x = (x_i)_{i \in I} \in \times_{i \in I} S_i$. 


10.13.3 NOTATION: $\Pi_k$, for $k \in I$, denotes the projection map of a Cartesian set-product $\times_{i \in I} S_i$ onto the 
component $k$. In other words, $\Pi_k : \times_{i \in I} S_i \to S_k$ is defined by $\Pi_k : x \mapsto x_k$. 


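10.13.4 THEOREM: The projection of a set-union equals the union of the projections. 
$\forall k \in I$, $\forall C \in \mathbb{P}(\mathbb{P}(\times_{i \in I} S_i))$, $\Pi_k(\bigcup C) = \bigcup_{A \in C} \Pi_k(A)$. 


PROOF: Let $k \in I$ and $C \in \mathbb{P}(\mathbb{P}(\times_{i \in I} S_i))$. Then $\Pi_k(\bigcup C) = \bigcup_{A \in C} \Pi_k(A)$ by Theorem 10.7.6 (i). 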


10.13.5 REMARK: Projection maps of Cartesian products of set-families onto component subsets. 

Strictly speaking, the “subset of components” in Definition 10.13.6 is a subset of the set of ordered pairs $S = 
\{(i, S_i); i \in I\}$, not a subset of the set of component sets $\mathrm{Range}(S) = \{S_i; i \in I\}$, because some of the 
component sets may be identical, and in fact very often are, as for example in the case of the Cartesian 
product $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$. The ambiguity here arises from the long-accepted habit in mathematics of referring 
to a member $S_i$ of a family when one really means the ordered pair $(i, S_i)$. This is yet another example of 
a kind of pseudo-notation which is common in practical mathematics. By referring to elements or subsets 
of the index set $I$ in Definitions 10.13.2 and 10.13.6, the ambiguity is removed, although strictly speaking, 
one should refer to elements $(k, S_k)$ or subsets $S|_J = \{(i, S_i); i \in J\}$ of the family $S$. In most contexts, such 
ambiguities are easily resolved by guesswork. 

If the index set $I$ in Definition 10.13.6 and Notation 10.13.7 is totally ordered, the subset $J$ may be defined in 
terms of the index-set order. (See Definition 11.5.25 and Notation 11.5.26 for this.) In differential geometry, 
the Cartesian product is often of the form $\mathbb{R}^m$ for some $m \in \mathbb{Z}_0^+$, and projection maps where $J$ is a contiguous 
sub-sequence of indices in $I = \mathbb{N}_m$ are required. 


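10.13.6 DEFINITION: The projection map of a Cartesian product $\times_{i \in I} S_i$ onto a subset of components 
$J \subseteq I$ is the map from $\times_{i \in I} S_i$ to $\times_{j \in J} S_j$ defined by $(x_i)_{i \in I} \mapsto (x_j)_{j \in J}$. 


10.13.7 NOTATION: $\Pi_J$, for $J \subseteq I$ for a Cartesian product $\times_{i \in I} S_i$, denotes the projection map of $\times_{i \in I} S_i$ 
onto $J$. Thus $\Pi_J : \times_{i \in I} S_i \to \times_{j \in J} S_j$ is defined by $\Pi_J : (x_i)_{i \in I} \mapsto (x_j)_{j \in J}$. 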


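10.13.8 THEOREM: The projection of a set-union equals the union of the projections. 
$\forall J \in \mathbb{P}(I)$, $\forall C \in \mathbb{P}(\mathbb{P}(\times_{i \in I} S_i))$, $\Pi_J(\bigcup C) = \bigcup_{A \in C} \Pi_J(A)$. 


PROOF: Let $J \subseteq I$ and $C \in \mathbb{P}(\mathbb{P}(\times_{i \in I} S_i))$. Then $\Pi_J(\bigcup C) = \bigcup_{A \in C} \Pi_J(A)$ by Theorem 10.7.6 (i). 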
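
For finite families, the projection maps in Definitions 10.13.2 and 10.13.6 can be checked by brute force. The following Python sketch is an illustration under stated assumptions: each element of the product is stored as a frozen set of ordered pairs $(i, x_i)$, and the names `cartesian_product`, `proj`, `proj_subset` and `image`, as well as the example family `S`, are hypothetical. 

```python
from itertools import product


def cartesian_product(family):
    """All choice functions of `family` (dict: index -> set), each stored as
    a frozenset of ordered pairs (i, x_i)."""
    indices = sorted(family)
    return {frozenset(zip(indices, values))
            for values in product(*(family[i] for i in indices))}


def proj(k, x):
    """Single-component projection: x -> x_k."""
    return dict(x)[k]


def proj_subset(J, x):
    """Subset projection: (x_i)_{i in I} -> (x_j)_{j in J}."""
    return frozenset((i, v) for (i, v) in x if i in J)


def image(fn, A):
    """Set-image of A under fn."""
    return {fn(x) for x in A}


S = {1: {"a", "b"}, 2: {0, 1}, 3: {"u"}}
X = cartesian_product(S)
A1 = {x for x in X if proj(2, x) == 0}
A2 = {x for x in X if proj(1, x) == "a"}
pi_1 = lambda x: proj(1, x)
pi_J = lambda x: proj_subset({1, 3}, x)
```

With the collection $C = \{A_1, A_2\}$, the image of the union can be compared with the union of the images, as asserted by Theorems 10.13.4 and 10.13.8. 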


10.13.9 REMARK:  Cartesian slice sets. 

The “slice sets” for Cartesian products of families of sets in Definition 10.13.10 are generalised from the 
Cartesian product of only two sets in Definition 10.12.5. The “slice set” concept is relevant to the definitions 
of partial derivatives in Section 41.1. 


10.13.10 DEFINITION: Slices of subsets of Cartesian set-products through given points. 
The slice of a set $A$ through a point $p$ along component $k$, for $A \subseteq \times_{i \in I} S_i$, $p = (p_i)_{i \in I} \in \times_{i \in I} S_i$ and $k \in I$, 
is the subset $\{x \in A; \forall i \in I \setminus \{k\}, x_i = p_i\}$ of $A$. 


10.13.11 NOTATION: $\mathrm{Slice}^p_k(A)$, for $k \in I$, $p \in \times_{i \in I} S_i$ and $A \subseteq \times_{i \in I} S_i$, denotes the slice of $A$ through $p$ 
along component $k$. In other words, $\mathrm{Slice}^p_k(A) = \{x \in A; \forall i \in I \setminus \{k\}, x_i = p_i\}$. 


10.13.12 DEFINITION: Projected slices of subsets of Cartesian set-products through given points. 
The projected slice of a set $A$ through a point $p$ along component $k$, for $A \subseteq \times_{i \in I} S_i$, $p = (p_i)_{i \in I} \in \times_{i \in I} S_i$ 
and $k \in I$, is the set $\Pi_k(\mathrm{Slice}^p_k(A)) = \{x_k \in S_k; x \in A \text{ and } \forall i \in I \setminus \{k\}, x_i = p_i\}$. 


10.13.13 REMARK: Slice sets for empty and singleton index sets. 

If $I = \emptyset$ in Definition 10.13.10, then $\times_{i \in I} S_i = \{\emptyset\}$. Therefore $p = \emptyset$ and either $A = \emptyset$ or $A = \{p\} = \{\emptyset\}$. 
However, the condition $k \in I$ cannot be met. So Definition 10.13.10 is vacuous in this case. Similarly, there 
are no projected slices either. 


Let $I$ be a singleton, such as $\{j\}$. Then $\times_{i \in I} S_i = \{\{(j, x_j)\}; x_j \in S_j\}$. (This is not precisely the same 
as $S_j = \{x_j; x_j \in S_j\}$, but it is often identified with it. This identification is not assumed here.) In this 
case, $p = \{(j, x_j)\}$ for some $x_j \in S_j$, and $A$ is a subset of $\{\{(j, x_j)\}; x_j \in S_j\}$. The only possibility for $k$ is $j$. 
So the slice of $A$ through $p$ along component $k$ is $\mathrm{Slice}^p_k(A) = \{x \in A; \forall i \in I \setminus \{k\}, x_i = p_i\}$, which equals 
$A$ because $I \setminus \{k\} = \emptyset$. In other words, the intuitively expected result is obtained. 


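10.13.14 THEOREM: The projected slice of a set-union equals the union of projections of slices. 
Let $(S_i)_{i \in I}$ be a family of sets. 
(i) If $I = \{j\}$, then $\forall k \in I$, $\forall p \in \times_{i \in I} S_i$, $\forall A \in \mathbb{P}(\times_{i \in I} S_i)$, $\Pi_k(\mathrm{Slice}^p_k(A)) = \Pi_k(A)$. 
(ii) If $I = \{j\}$, then $\forall k \in I$, $\forall p \in \times_{i \in I} S_i$, $\forall C \in \mathbb{P}(\mathbb{P}(\times_{i \in I} S_i))$, $\Pi_k(\mathrm{Slice}^p_k(\bigcup C)) = \bigcup_{A \in C} \Pi_k(\mathrm{Slice}^p_k(A))$. 
(iii) $\forall k \in I$, $\forall p \in \times_{i \in I} S_i$, $\forall C \in \mathbb{P}(\mathbb{P}(\times_{i \in I} S_i))$, $\Pi_k(\mathrm{Slice}^p_k(\bigcup C)) = \bigcup_{A \in C} \Pi_k(\mathrm{Slice}^p_k(A))$. 

PROOF: Part (i) follows from the observation in Remark 10.13.13 that if $I$ is a singleton, then $\mathrm{Slice}^p_k(A) = A$ 
for all $k \in I$, $p \in \times_{i \in I} S_i$ and $A \in \mathbb{P}(\times_{i \in I} S_i)$. Hence $\Pi_k(\mathrm{Slice}^p_k(A)) = \Pi_k(A)$. 

For part (ii), let $I = \{j\}$, $k \in I$, $p \in \times_{i \in I} S_i$ and $C \subseteq \mathbb{P}(\times_{i \in I} S_i)$. Then by part (i) and Theorem 10.13.4, 
$\Pi_k(\mathrm{Slice}^p_k(\bigcup C)) = \Pi_k(\bigcup C) = \bigcup_{A \in C} \Pi_k(A) = \bigcup_{A \in C} \Pi_k(\mathrm{Slice}^p_k(A))$. 

For part (iii), let $k \in I$, $p \in \times_{i \in I} S_i$ and $C \subseteq \mathbb{P}(\times_{i \in I} S_i)$. Then 
\begin{align*}
\Pi_k(\mathrm{Slice}^p_k(\textstyle\bigcup C)) &= \{x_k; x \in \mathrm{Slice}^p_k(\textstyle\bigcup C)\} \\
&= \{x_k; x \in \textstyle\bigcup C \text{ and } \forall i \in I \setminus \{k\}, x_i = p_i\} \\
&= \{x_k; (\exists A \in C, x \in A) \text{ and } \forall i \in I \setminus \{k\}, x_i = p_i\} \\
&= \{x_k; \exists A \in C, (x \in A \text{ and } \forall i \in I \setminus \{k\}, x_i = p_i)\} \\
&= \textstyle\bigcup_{A \in C} \{x_k; x \in A \text{ and } \forall i \in I \setminus \{k\}, x_i = p_i\} \\
&= \textstyle\bigcup_{A \in C} \{x_k; x \in \mathrm{Slice}^p_k(A)\} \\
&= \textstyle\bigcup_{A \in C} \Pi_k(\mathrm{Slice}^p_k(A)).
\end{align*}
In the case $I = \emptyset$, the proposition “$k \in I$” is always false, and so the assertion is vacuously true. The case that $I$ is 
a singleton (i.e. when $I \setminus \{k\} = \emptyset$) is additionally verified by part (ii). 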
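
Slices and projected slices are easy to compute for finite subsets of a finite product. The following Python sketch is an illustration only. It assumes that the index set is $\{0, 1\}$ (Python tuple positions) rather than the index sets used in this book, and the names `slice_set` and `projected_slice` and the example sets are hypothetical. 

```python
def slice_set(A, p, k):
    """Slice of A through p along component k: the points of A which agree
    with p in every component except possibly component k."""
    return {x for x in A
            if all(x[i] == p[i] for i in range(len(p)) if i != k)}


def projected_slice(A, p, k):
    """Projection onto component k of the slice of A through p."""
    return {x[k] for x in slice_set(A, p, k)}


A1 = {(0, 0), (1, 0), (2, 1)}
A2 = {(3, 0), (0, 5)}
p = (9, 0)  # the point p does not need to be an element of A1 or A2
```

Here the slice along component 0 keeps the points whose component 1 equals $p_1 = 0$, and the projected slice of the union $A_1 \cup A_2$ agrees with the union of the projected slices, as in Theorem 10.13.14 (iii). 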


10.13.15 DEFINITION: Slices of functions on Cartesian products through given points. 
The slice of a function $f : A \to Y$ through a point $p$ along component $k$, for $A \subseteq \times_{i \in I} S_i$, a set $Y$, $p \in \times_{i \in I} S_i$ 
and $k \in I$, is the restriction $f|_{\mathrm{Slice}^p_k(A)}$ of $f$ to $\mathrm{Slice}^p_k(A)$. 


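The projected slice of a function $f : A \to Y$ through a point $p$ along component $k$, for $A \subseteq \times_{i \in I} S_i$, a set $Y$, 
$p \in \times_{i \in I} S_i$ and $k \in I$, is the function $f \circ (\Pi_k|_{\mathrm{Slice}^p_k(A)})^{-1}$ on the subset $\Pi_k(\mathrm{Slice}^p_k(A))$ of $S_k$. 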


10.13.16 NOTATION: $\mathrm{Slice}^p_k(f)$, for $k \in I$, $p \in \times_{i \in I} S_i$, $f : A \to Y$, $A \subseteq \times_{i \in I} S_i$ and a set $Y$, denotes the 
slice of $f$ through $p$ along component $k$. In other words, $\mathrm{Slice}^p_k(f) = f|_{\mathrm{Slice}^p_k(A)}$. 


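10.13.17 DEFINITION: Lift maps from component spaces to Cartesian set-products. 
The lift map for component $k$ to a point $p \in \times_{i \in I} S_i$, for $k \in I$, is the map from $S_k$ to $\times_{i \in I} S_i$ defined by 
$t \mapsto x$ with $x_k = t$ and $\forall i \in I \setminus \{k\}, x_i = p_i$. 


10.13.18 NOTATION: $\mathrm{Lift}^p_k$, for $k \in I$ and $p \in \times_{i \in I} S_i$, denotes the lift map for component $k$ of the point $p$. 
In other words, $\mathrm{Lift}^p_k : S_k \to \times_{i \in I} S_i$ is the map defined by 
\[
\forall t \in S_k, \ \forall i \in I, \quad
\mathrm{Lift}^p_k(t)(i) = \begin{cases} t & \text{if } i = k \\ p_i & \text{if } i \ne k. \end{cases}
\]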
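
The lift map can be sketched in Python for finite tuples. This is an illustration under stated assumptions: the index set is taken to be the tuple positions $\{0, 1, 2\}$, and the names `lift`, `proj` and the example function `f` are hypothetical. 

```python
def lift(p, k):
    """Lift map for component k of the point p: t -> the tuple which equals
    p except that its component k is replaced by t."""
    return lambda t: p[:k] + (t,) + p[k + 1:]


def proj(k):
    """Projection map onto component k."""
    return lambda x: x[k]


def f(x):
    """Hypothetical function on the product set."""
    return x[0] + x[1] * x[2]


p = (5, 6, 7)
lift_1 = lift(p, 1)
partial_map = lambda t: f(lift_1(t))  # pull-back of f by the lift map
```

Composing the projection onto component 1 with `lift_1` returns the input unchanged, and `partial_map` is the function of one variable obtained by pulling `f` back through the lift map. 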


10.14. Double-domain products of functions 


10.14.1 REMARK: Definition and notation for the double-domain function product. 

Definition 10.14.3 is standard. (See for example EDM2 [113], page 1421; MacLane/Birkhoff [110], page 10.) 
This kind of “double-domain” function product is used in the definition of atlases for direct products of 
manifolds. (For example, see Definition 50.4.6.) 


Definition 10.14.3 and Notation 10.14.4 are a specialisation from relations to functions of Definition 9.7.2 
and Notation 9.7.3. The double-domain product of $f$ and $g$ according to Definition 9.7.2 is the relation 
$\{((x_1, x_2), (y_1, y_2)); (x_1, y_1) \in f \text{ and } (x_2, y_2) \in g\}$. The fact that this relation is a function with domain 
and range $\mathrm{Dom}(f) \times \mathrm{Dom}(g)$ and $\mathrm{Range}(f) \times \mathrm{Range}(g)$ respectively is shown in Theorem 10.14.2 (i, ii, iii). 
(Definition 10.14.3 and Notation 10.14.4 are illustrated in Figure 10.14.1.) 


[Figure 10.14.1: Double-domain product of two functions. Labels: $\mathrm{Dom}(f \ddot{\times} g) = \mathrm{Dom}(f) \times \mathrm{Dom}(g)$, $\mathrm{Range}(f \ddot{\times} g) = \mathrm{Range}(f) \times \mathrm{Range}(g)$, $\mathrm{Dom}(g)$, $\mathrm{Range}(g)$.] 


10.14.2 THEOREM: Some basic properties of double-domain products of functions regarded as relations. 
Let $f$ and $g$ be functions. Let $h$ be the double-domain product of $f$ and $g$, regarded as relations. (See 
Definition 9.7.2.) In other words, let $h = \{((x_1, x_2), (y_1, y_2)); (x_1, y_1) \in f \text{ and } (x_2, y_2) \in g\}$. 

(i) $h$ is a function. 
(ii) $\mathrm{Dom}(h) = \mathrm{Dom}(f) \times \mathrm{Dom}(g)$. 
(iii) $\mathrm{Range}(h) = \mathrm{Range}(f) \times \mathrm{Range}(g)$. 
(iv) $\forall (x_1, x_2) \in \mathrm{Dom}(h)$, $h(x_1, x_2) = (f(x_1), g(x_2))$. 

PROOF: For part (i), let $((x_1, x_2), (y_1, y_2)) \in h$ and $((x_1, x_2), (y_1', y_2')) \in h$. Then $(x_1, y_1) \in f$, $(x_2, y_2) \in g$, 
$(x_1, y_1') \in f$ and $(x_2, y_2') \in g$ by Definition 9.7.2. So $y_1 = y_1'$ and $y_2 = y_2'$ by Definition 10.2.2 (i). Therefore 
$(y_1, y_2) = (y_1', y_2')$ by Theorem 9.2.17. Hence $h$ is a function by Definition 10.2.2 (i). 
Part (ii) follows from Theorem 9.7.5 (i). 
Part (iii) follows from Theorem 9.7.5 (ii). 
For part (iv), let $(x_1, x_2) \in \mathrm{Dom}(h)$. Then $x_1 \in \mathrm{Dom}(f)$ and $x_2 \in \mathrm{Dom}(g)$ by part (ii). So $(x_1, f(x_1)) \in f$ 
and $(x_2, g(x_2)) \in g$. Therefore $((x_1, x_2), (f(x_1), g(x_2))) \in h$. Hence $h(x_1, x_2) = (f(x_1), g(x_2))$. 


10.14.3 DEFINITION: The double-domain product of functions $f$ and $g$ is the function from $\mathrm{Dom}(f) \times 
\mathrm{Dom}(g)$ to $\mathrm{Range}(f) \times \mathrm{Range}(g)$ defined by $(x_1, x_2) \mapsto (f(x_1), g(x_2))$. 


10.14.4 NOTATION: $f \ddot{\times} g$, for functions $f$ and $g$, denotes the double-domain product of $f$ and $g$. Thus 
$\forall (x_1, x_2) \in \mathrm{Dom}(f) \times \mathrm{Dom}(g)$, $(f \ddot{\times} g)(x_1, x_2) = (f(x_1), g(x_2))$. 

In other words, $f \ddot{\times} g = \{((x_1, x_2), (y_1, y_2)); (x_1, y_1) \in f \text{ and } (x_2, y_2) \in g\}$. 
Alternative notation: $f \times g$. (Not recommended.) 
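
For functions with finite domains, the double-domain product can be written out explicitly. In the following Python sketch, a function is represented by a dictionary from arguments to values. The name `double_domain_product` and the example functions are hypothetical, chosen only to illustrate Definition 10.14.3. 

```python
def double_domain_product(f, g):
    """Graph of the double-domain product of f and g (both dicts):
    (x1, x2) -> (f[x1], g[x2]) for x1 in Dom(f) and x2 in Dom(g)."""
    return {(x1, x2): (y1, y2)
            for x1, y1 in f.items()
            for x2, y2 in g.items()}


f = {1: "a", 2: "b"}
g = {"u": True}
h = double_domain_product(f, g)
```

The domain of `h` is the Cartesian product of the two domains, and its set of values is the Cartesian product of the two ranges, in agreement with Theorem 10.14.2 (ii, iii). 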


10.14.5 REMARK: Emphatic notation for double-domain product of functions. 

In some contexts, the double-domain function product in Definition 10.14.3 and the common-domain function 
product in Definition 10.15.2 must be distinguished. For this reason, the notation “$f \ddot{\times} g$” is used here. The 
double dot is a mnemonic for the double domain. 


10.14.6 THEOREM: Injectivity and bijectivity of double-domain products of functions. 
Let $f$ and $g$ be functions. 
(i) If $f$ and $g$ are injective, then $f \ddot{\times} g$ is injective. 
(ii) If $f : X_1 \to Y_1$ and $g : X_2 \to Y_2$ are bijections, then $f \ddot{\times} g : X_1 \times X_2 \to Y_1 \times Y_2$ is a bijection. 


PROOF: For part (i), let $(x_1, y_1), (x_2, y_2) \in \mathrm{Dom}(f) \times \mathrm{Dom}(g)$ satisfy $(f \ddot{\times} g)(x_1, y_1) = (f \ddot{\times} g)(x_2, y_2)$. Then 
$(f(x_1), g(y_1)) = (f(x_2), g(y_2))$ by Definition 10.14.3. So $f(x_1) = f(x_2)$ and $g(y_1) = g(y_2)$ by Theorem 9.2.17. 
Therefore $x_1 = x_2$ and $y_1 = y_2$ by Definition 10.5.2. So $(x_1, y_1) = (x_2, y_2)$ by Theorem 9.2.17. Hence $f \ddot{\times} g$ 
is injective by Definition 10.5.2. 

For part (ii), let $f : X_1 \to Y_1$ and $g : X_2 \to Y_2$ be bijections. Then $f \ddot{\times} g$ is injective by part (i). 
Also $\mathrm{Range}(f \ddot{\times} g) = \mathrm{Range}(f) \times \mathrm{Range}(g) = Y_1 \times Y_2$ by Theorem 10.14.2 (iii). So $f \ddot{\times} g$ is surjective. 
Hence $f \ddot{\times} g : X_1 \times X_2 \to Y_1 \times Y_2$ is a bijection. 


10.14.7 REMARK: Composition of double-domain products of functions. 

Unsurprisingly, the composition of two double-domain products of functions yields a single double-domain 
product of two composites of functions. (See Definition 10.4.17 for composition of functions.) This is shown 
in Theorem 10.14.8. This kind of construction is relevant to transition maps for direct products of manifolds. 
(See for example Theorem 52.6.5.) 


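10.14.8 THEOREM: Double-domain product of composites equals the composite of double-domain products. 
Let $f_1, f_2, g_1, g_2$ be functions with $\mathrm{Range}(f_1) \subseteq \mathrm{Dom}(f_2)$ and $\mathrm{Range}(g_1) \subseteq \mathrm{Dom}(g_2)$. 
Then $(f_2 \circ f_1) \ddot{\times} (g_2 \circ g_1) = (f_2 \ddot{\times} g_2) \circ (f_1 \ddot{\times} g_1)$. 


PROOF: From Definitions 10.4.17 and 10.14.3, it follows that 
\begin{align*}
\forall (x, y) \in \mathrm{Dom}(f_1) \times \mathrm{Dom}(g_1), \quad
((f_2 \circ f_1) \ddot{\times} (g_2 \circ g_1))(x, y) &= (f_2(f_1(x)), g_2(g_1(y))) \\
&= (f_2 \ddot{\times} g_2)(f_1(x), g_1(y)) \\
&= (f_2 \ddot{\times} g_2)((f_1 \ddot{\times} g_1)(x, y)) \\
&= ((f_2 \ddot{\times} g_2) \circ (f_1 \ddot{\times} g_1))(x, y).
\end{align*}
Hence $(f_2 \circ f_1) \ddot{\times} (g_2 \circ g_1) = (f_2 \ddot{\times} g_2) \circ (f_1 \ddot{\times} g_1)$. 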


10.14.9 REMARK: Definition and notation for the double-domain product of partial functions. 

The double-domain product of partial functions in Definition 10.14.10 generalises Definition 10.14.3 from 
fully to partially defined functions. This kind of partial function product is useful for the definition of 
atlases for Cartesian products of manifolds. (See Definition 50.4.6.) Definition 10.14.10 is specialised from 
Definition 9.7.2 for general relations. 


Although it is almost trivial to generalise double-domain products from fully to partially defined functions, 
it is not so easy to define such a generalisation in the case of common-domain products because the domains 
are then not so easy to manage. More importantly, double-domain products of partial functions do have 
substantial applications due to the fact that manifold charts are defined to be partial functions, whereas 
common-domain function products are typically applied to local trivialisations for fibre bundles, where 
usually the first map (the projection map) is fully defined and the second map (the fibre chart) is partially 
defined, which is quite easy to manage as a fully defined function on its domain. (Nevertheless, common- 
domain products are defined for the more general partial functions in Definition 10.15.2.) 


10.14.10 DEFINITION: The double-domain product of two partial functions $f$ and $g$ is the partial function 
$\{((x_1, x_2), (y_1, y_2)); (x_1, y_1) \in f \text{ and } (x_2, y_2) \in g\}$. 


10.14.11 NOTATION: 

$f \ddot{\times} g$, for partial functions $f$ and $g$, denotes the double-domain product of $f$ and $g$. 
In other words, $f \ddot{\times} g = \{((x_1, x_2), (y_1, y_2)); (x_1, y_1) \in f \text{ and } (x_2, y_2) \in g\}$. 
Alternative notation: $f \times g$. (Not recommended.) 


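10.14.12 REMARK: Domain and range of double-domain product of two partial functions. 
It is easily seen from Definition 10.14.10 and Notation 10.14.11 that $\mathrm{Dom}(f \ddot{\times} g) = \mathrm{Dom}(f) \times \mathrm{Dom}(g)$ and 
$\mathrm{Range}(f \ddot{\times} g) \supseteq \mathrm{Range}(f) \times \mathrm{Range}(g)$. One may then write: 
\[
\forall ((x_1, x_2), (y_1, y_2)) \in \mathrm{Dom}(f \ddot{\times} g) \times \mathrm{Range}(f \ddot{\times} g), \quad
((x_1, x_2), (y_1, y_2)) \in f \ddot{\times} g \ \Leftrightarrow\ (x_1, y_1) \in f \text{ and } (x_2, y_2) \in g.
\]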


10.14.13 REMARK: Composition of double-domain products of partial functions. 

The composition of two double-domain products of partial functions is equal to the double-domain product 
of composites of two partial functions. (This is asserted by Theorem 10.14.14 (i).) This style of partial 
function product is relevant to transition maps for direct products of manifolds. (See for example the proof 
of Theorem 52.6.5.) 


10.14.14 THEOREM: Some properties of composites and double-domain products of partial functions. 
Let $f_1, f_2, g_1, g_2$ be partial functions. 

(i) $(g_1 \circ f_1) \ddot{\times} (g_2 \circ f_2) = (g_1 \ddot{\times} g_2) \circ (f_1 \ddot{\times} f_2)$. 
(ii) $f_1^{-1} \ddot{\times} f_2^{-1} = (f_1 \ddot{\times} f_2)^{-1}$. 
(iii) $(g_1 \circ f_1^{-1}) \ddot{\times} (g_2 \circ f_2^{-1}) = (g_1 \ddot{\times} g_2) \circ (f_1 \ddot{\times} f_2)^{-1}$. 

PROOF: Part (i) follows from Theorem 9.7.8 (i) because every partial function is a relation. 
Part (ii) follows from Theorem 9.7.8 (ii) because the inverse of a partial function is a relation. 
Part (iii) follows from parts (i) and (ii). 
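
Since the inverse of a partial function is in general only a relation, a finite check of these identities is most naturally done with relations stored as sets of ordered pairs. The following Python sketch is an illustration only. The names `dd_product`, `compose` and `inverse` and the example relations are hypothetical. 

```python
def dd_product(r, s):
    """Double-domain product of relations r and s (sets of ordered pairs)."""
    return {((x1, x2), (y1, y2)) for (x1, y1) in r for (x2, y2) in s}


def compose(s, r):
    """Composite "s after r": {(x, z); (x, y) in r and (y, z) in s}."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def inverse(r):
    """Inverse relation: {(y, x); (x, y) in r}."""
    return {(y, x) for (x, y) in r}


f1 = {(1, 10), (2, 20)}
f2 = {("a", "A")}
g1 = {(10, 100), (30, 300)}  # a partial function: 20 is not in its domain
g2 = {("A", "AA")}

lhs = dd_product(compose(g1, f1), compose(g2, f2))
rhs = compose(dd_product(g1, g2), dd_product(f1, f2))
```

In this example both sides of identity (i) reduce to the single ordered pair $((1, a), (100, AA))$, because the argument 2 is lost when $f_1$ is composed with the partial function $g_1$. 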


10.15. Common-domain direct products of functions 


10.15.1 REMARK: Application of common-domain direct products to fibre bundle definitions. 

The “common-domain” direct product of two functions in Definition 10.15.2 and Notation 10.15.3 is used 
extensively in this book as the most natural way of defining “local trivialisations” for a fibre bundle. (See 
for example Definition 21.5.2 and Remarks 21.5.5 and 21.5.11.) 


A common-domain direct product f x g may be thought of as a kind of “curve” in the Cartesian product-set 
Range(f) x Range(g). The functions f and g each contribute one “coordinate” for the “curve”. Since the 
decomposition of points in Cartesian set products into components is unique and well defined, any function 
from a set to a Cartesian product of two sets can be decomposed into its component functions and then 
re-composed by applying the common-domain direct product. 


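10.15.2 DEFINITION: Common-domain direct product of two functions. 
The (common-domain) direct product of two functions $f$ and $g$ is the function from $\mathrm{Dom}(f) \cap \mathrm{Dom}(g)$ to 
$\mathrm{Range}(f) \times \mathrm{Range}(g)$ defined by $x \mapsto (f(x), g(x))$ for $x \in \mathrm{Dom}(f) \cap \mathrm{Dom}(g)$. 


10.15.3 NOTATION: $f \dot{\times} g$, for functions $f$ and $g$, denotes the common-domain direct product of $f$ and $g$. 
Thus 
\[
\forall x \in \mathrm{Dom}(f) \cap \mathrm{Dom}(g), \quad (f \dot{\times} g)(x) = (f(x), g(x)),
\]
where $\mathrm{Dom}(f \dot{\times} g) = \mathrm{Dom}(f) \cap \mathrm{Dom}(g)$ and $\mathrm{Range}(f \dot{\times} g) \subseteq \mathrm{Range}(f) \times \mathrm{Range}(g)$. 

Alternative notation: $f \times g$. (Not recommended.) 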

10.15.4 REMARK:  Double-domain, common-domain and Cartesian products of functions. 

The three kinds of products of functions may be contrasted as follows. 

(1) $f_1 \dot{\times} f_2 = \{(x, (y_1, y_2)); (x, y_1) \in f_1 \text{ and } (x, y_2) \in f_2\}$ is the common-domain direct product. 

(2) $f_1 \ddot{\times} f_2 = \{((x_1, x_2), (y_1, y_2)); (x_1, y_1) \in f_1 \text{ and } (x_2, y_2) \in f_2\}$ is the double-domain direct product. 

(3) $f_1 \times f_2 = \{((x_1, y_1), (x_2, y_2)); (x_1, y_1) \in f_1 \text{ and } (x_2, y_2) \in f_2\}$ is the Cartesian set-product. 

(1) is useful for local trivialisations of fibre bundles, (2) is useful for Cartesian products of manifold charts, 
and (3) has no obvious use. 


10.15.5 REMARK: Relation of common-domain direct function products to diagonal maps. 

The common-domain direct product function is the same as the composition of the double-domain direct 
product with a diagonal map. Let $X = \mathrm{Dom}(f) \cap \mathrm{Dom}(g)$ in Definition 10.15.2 and define the diagonal map 
$d : X \to X \times X$ by $d(x) = (x, x)$ for all $x \in X$. Then $(f \ddot{\times} g) \circ d$ is the same as the common-domain direct 
product $f \dot{\times} g$. The two kinds of function product may be expressed in terms of their graph sets as follows. 
\begin{align*}
f \ddot{\times} g &= \{((x_1, x_2), (f(x_1), g(x_2))); (x_1, x_2) \in \mathrm{Dom}(f) \times \mathrm{Dom}(g)\} \\
&= \{((x_1, x_2), (y_1, y_2)); (x_1, y_1) \in f \text{ and } (x_2, y_2) \in g\} \\
f \dot{\times} g &= \{(x, (f(x), g(x))); x \in \mathrm{Dom}(f) \cap \mathrm{Dom}(g)\} \\
&= \{(x, (y_1, y_2)); (x, y_1) \in f \text{ and } (x, y_2) \in g\}.
\end{align*}
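
The relation between the two products and the diagonal map can be checked for finite functions. In the following Python sketch, functions are dictionaries, and the names `common_domain_product` and `double_domain_product` and the example data are hypothetical. 

```python
def common_domain_product(f, g):
    """x -> (f[x], g[x]) on the intersection of the two domains."""
    return {x: (f[x], g[x]) for x in f.keys() & g.keys()}


def double_domain_product(f, g):
    """(x1, x2) -> (f[x1], g[x2]) on the product of the two domains."""
    return {(x1, x2): (f[x1], g[x2]) for x1 in f for x2 in g}


f = {1: "a", 2: "b", 3: "c"}
g = {2: 20, 3: 30, 4: 40}

X = f.keys() & g.keys()            # common domain
d = {x: (x, x) for x in X}         # diagonal map
dd = double_domain_product(f, g)
via_diagonal = {x: dd[d[x]] for x in X}
```

The dictionary `via_diagonal` represents the composite of the double-domain product with the diagonal map, and it coincides with the common-domain product computed directly. 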


10.15.6 THEOREM: Some basic properties of common-domain direct function products. 
Let $f_1 : X \to Y_1$ and $f_2 : X \to Y_2$ be functions for sets $X$, $Y_1$ and $Y_2$. 

(i) $\forall S \in \mathbb{P}(X)$, $(f_1 \dot{\times} f_2)(S) \subseteq f_1(S) \times f_2(S)$. 
(ii) $\mathrm{Range}(f_1 \dot{\times} f_2) \subseteq \mathrm{Range}(f_1) \times \mathrm{Range}(f_2)$. 
(iii) $(f_1 \dot{\times} f_2)^{-1}(S_1 \times S_2) = f_1^{-1}(S_1) \cap f_2^{-1}(S_2)$ for any sets $S_1$ and $S_2$. 


PROOF: For part (i), let $y \in (f_1 \dot{\times} f_2)(S)$. Then $y = (f_1 \dot{\times} f_2)(x)$ for some $x \in S$. So by Definition 10.15.2, 
$y = (f_1(x), f_2(x))$ for some $x \in S$. Therefore $y \in f_1(S) \times f_2(S)$ by Definition 9.4.2. It then follows that 
$(f_1 \dot{\times} f_2)(S) \subseteq f_1(S) \times f_2(S)$. 
Part (ii) follows from part (i). 
For part (iii), it follows from Definition 10.15.2 that 
\begin{align*}
(f_1 \dot{\times} f_2)^{-1}(S_1 \times S_2) &= \{x \in \mathrm{Dom}(f_1) \cap \mathrm{Dom}(f_2); (f_1(x), f_2(x)) \in S_1 \times S_2\} \\
&= \{x \in \mathrm{Dom}(f_1) \cap \mathrm{Dom}(f_2); x \in f_1^{-1}(S_1) \text{ and } x \in f_2^{-1}(S_2)\} \\
&= \mathrm{Dom}(f_1) \cap f_1^{-1}(S_1) \cap \mathrm{Dom}(f_2) \cap f_2^{-1}(S_2) \\
&= f_1^{-1}(S_1) \cap f_2^{-1}(S_2).
\end{align*}
Hence the assertion. 


10.15.7 REMARK: Composition of common-domain and double-domain products of partial functions. 

The composition of double-domain and common-domain products of partial functions yields a common- 
domain product of composites of partial functions. (This is asserted by Theorem 10.15.8 (i).) This style of 
composite partial function product is often required for induced charts on total spaces of differentiable fibre 
bundles. (See for example Definition 64.9.2.) They are also required for the closely related differentials of 
common-domain product-maps, as in Section 58.7. 


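10.15.8 THEOREM: Composites of double-domain and common-domain products of partial functions. 
Let $f_1, f_2, g_1, g_2$ be partial functions. 

(i) $(g_1 \ddot{\times} g_2) \circ (f_1 \dot{\times} f_2) = (g_1 \circ f_1) \dot{\times} (g_2 \circ f_2)$. 


PROOF: For part (i), 
\begin{align*}
\mathrm{Dom}((g_1 \ddot{\times} g_2) \circ (f_1 \dot{\times} f_2)) &= (f_1 \dot{\times} f_2)^{-1}(\mathrm{Dom}(g_1 \ddot{\times} g_2)) \tag{10.15.1} \\
&= (f_1 \dot{\times} f_2)^{-1}(\mathrm{Dom}(g_1) \times \mathrm{Dom}(g_2)) \tag{10.15.2} \\
&= f_1^{-1}(\mathrm{Dom}(g_1)) \cap f_2^{-1}(\mathrm{Dom}(g_2)) \tag{10.15.3} \\
&= \mathrm{Dom}(g_1 \circ f_1) \cap \mathrm{Dom}(g_2 \circ f_2) \tag{10.15.4} \\
&= \mathrm{Dom}((g_1 \circ f_1) \dot{\times} (g_2 \circ f_2)), \tag{10.15.5}
\end{align*}
where line (10.15.1) follows by Theorem 10.10.13 (i), line (10.15.2) follows by Definition 10.14.3, line (10.15.3) 
follows by Theorem 10.15.6 (iii), line (10.15.4) follows by Theorem 10.10.13 (i), and line (10.15.5) follows 
by Definition 10.15.2. To verify that the two functions have the same values on their domain, let $x \in 
f_1^{-1}(\mathrm{Dom}(g_1)) \cap f_2^{-1}(\mathrm{Dom}(g_2))$. Then 
\begin{align*}
((g_1 \ddot{\times} g_2) \circ (f_1 \dot{\times} f_2))(x) &= (g_1 \ddot{\times} g_2)(f_1(x), f_2(x)) \\
&= (g_1(f_1(x)), g_2(f_2(x))) \\
&= ((g_1 \circ f_1)(x), (g_2 \circ f_2)(x)) \\
&= ((g_1 \circ f_1) \dot{\times} (g_2 \circ f_2))(x).
\end{align*}
Hence $(g_1 \ddot{\times} g_2) \circ (f_1 \dot{\times} f_2) = (g_1 \circ f_1) \dot{\times} (g_2 \circ f_2)$. 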


10.15.9 REMARK: Some bijection-related properties of common-domain direct function products. 

The common-domain direct function product properties in Theorem 10.15.10 are illustrated in Figure 10.15.1. 
(This is a “commutative diagram” in category theory terminology. See for example Lang [108], pages ix-x; 
MacLane/Birkhoff [110], page 16; Ash [50], page 20; Lang [23], pages 4-5; Wallace [154], page 14.) 


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




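[Figure 10.15.1: Maps and spaces for a common-domain direct function product. The diagram shows the sets $X$, $A$, $B$ and $A \times B$, the maps $f$, $g$ and $f \dot{\times} g$, and the projection maps $\Pi_1$ and $\Pi_2$.] 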


10.15.10 THEOREM: Bijections from horizontal/vertical subsets to direct product components. 
Let $f : X \to A$ and $g : X \to B$ be functions for sets $X$, $A$ and $B$. 

(i) If $f \dot{\times} g : X \to A \times B$ is a bijection, then $\forall a \in A$, $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is a bijection. 
(ii) If $f \dot{\times} g : X \to A \times B$ is a bijection, then $\forall b \in B$, $f|_{g^{-1}(\{b\})} : g^{-1}(\{b\}) \to A$ is a bijection. 
(iii) If $\forall a \in A$, $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is a bijection, then $f \dot{\times} g : X \to A \times B$ is a bijection. 
(iv) If $\forall b \in B$, $f|_{g^{-1}(\{b\})} : g^{-1}(\{b\}) \to A$ is a bijection, then $f \dot{\times} g : X \to A \times B$ is a bijection. 


PROOF: For part (i), let $f \dot{\times} g : X \to A \times B$ be a bijection. Let $a \in A$. Then $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is 
a well-defined function. (See Definition 10.4.3 and Notation 10.4.4 for restrictions of functions.) Let $b \in B$. 
Then $(f \dot{\times} g)(x) = (a, b)$ for some $x \in X$. So $f(x) = a$ and $g(x) = b$ for some $x \in X$. Therefore $g(x) = b$ for 
some $x \in f^{-1}(\{a\})$. So $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is surjective. To show injectivity, suppose that $x_1, x_2 \in f^{-1}(\{a\})$ 
satisfy $g|_{f^{-1}(\{a\})}(x_1) = g|_{f^{-1}(\{a\})}(x_2) = b$. Then $(f \dot{\times} g)(x_1) = (f \dot{\times} g)(x_2) = (a, b)$. So $x_1 = x_2$ because 
$f \dot{\times} g$ is injective. Therefore $g|_{f^{-1}(\{a\})}$ is injective. Hence $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is a bijection. 

Part (ii) may be proved exactly as for part (i). 

For part (iii), suppose that $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is a bijection for all $a \in A$. It follows from the 
equality $\mathrm{Dom}(f \dot{\times} g) = \mathrm{Dom}(f) \cap \mathrm{Dom}(g) = X$ that $f \dot{\times} g : X \to A \times B$ is a well-defined function. Let 
$(a, b) \in A \times B$. Then $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is a bijection, and so $g(x) = b$ for some $x \in f^{-1}(\{a\})$. 
So $(f \dot{\times} g)(x) = (f(x), g(x)) = (a, b)$ for some $x \in X$. Therefore $(a, b) \in \mathrm{Range}(f \dot{\times} g)$. Hence $f \dot{\times} g$ is 
surjective. To show injectivity, let $x_1, x_2 \in X$ satisfy $(f \dot{\times} g)(x_1) = (f \dot{\times} g)(x_2) = (a, b)$ for some $(a, b) \in A \times B$. 
Then $f(x_1) = f(x_2) = a$ and $g(x_1) = g(x_2) = b$. So $x_1, x_2 \in f^{-1}(\{a\})$. But $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is a 
bijection. So $x_1 = x_2$. Therefore $f \dot{\times} g$ is injective. Hence $f \dot{\times} g : X \to A \times B$ is a bijection. 

Part (iv) may be proved exactly as for part (iii). 
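
A small finite example may help to make the equivalence concrete. In the following Python sketch, the sets and maps are hypothetical choices made only for illustration: $X = \{0, \ldots, 5\}$, $A = \{0, 1, 2\}$, $B = \{0, 1\}$, with $f(x)$ the integer quotient of $x$ by 2 and $g(x)$ the remainder. The helper name `is_bijection` is also hypothetical. 

```python
X = range(6)
A = range(3)
B = range(2)


def f(x):
    """Hypothetical projection map X -> A."""
    return x // 2


def g(x):
    """Hypothetical fibre chart X -> B."""
    return x % 2


def is_bijection(fn, domain, codomain):
    """True if fn maps `domain` injectively onto `codomain` (finite sets)."""
    images = [fn(x) for x in domain]
    return len(set(images)) == len(images) and set(images) == set(codomain)


# The common-domain product x -> (f(x), g(x)) as a map from X to A x B.
product_is_bijection = is_bijection(
    lambda x: (f(x), g(x)), X, [(a, b) for a in A for b in B])

# The restriction of g to each vertical subset (fibre set) of f.
fibres_ok = all(
    is_bijection(g, [x for x in X if f(x) == a], B) for a in A)
```

Both flags evaluate to true for this example, which is consistent with parts (i) and (iii) of the theorem. 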


10.15.11 REMARK: Product-structured spaces are the same as trivial fibrations. 

The set $X$ in Theorem 10.15.10 may be thought of as a “product-structured set”, meaning that it is not 
actually a direct product of two sets, but it does have the same structure as a direct product of sets $A$ and $B$ 
because of the existence of the bijection $f \dot{\times} g : X \to A \times B$. In this case, there is no additional structure on 
the sets, but in the case of topological and differentiable fibre bundles, the map $f \dot{\times} g$ is a homeomorphism 
or diffeomorphism respectively. Then the set $X$ is the “total space” of a trivial fibre bundle, $A$ is the “base 
space”, $B$ is the “fibre space”, $f$ is a “projection map”, and $g$ is a “fibre chart”. 


Since fibre bundles are the key unifying concept of differential geometry, and product-structured spaces 
are trivial fibre bundles, it follows that product-structured spaces play a fundamental role in differential 
geometry. “Local trivialisation” maps, which are the quintessence of fibre bundles, are always bijective 
common-domain function products which are isomorphisms of some kind. 


Definition 10.15.12 introduces a very basic kind of “product-structured set”, for which “horizontal” and 
“vertical” subsets are defined. The convention that the set of points with a constant second coordinate is 
called “horizontal” arises from the modern practice of listing the horizontal coordinate before the vertical 
coordinate. In the fibre bundle context, the vertical subsets, called “fibre sets”, have greater importance, 
partly because they are independent of the choice of “local trivialisation” maps $f \dot{\times} g$. 


10.15.12 DEFINITION: A product-structured set is a tuple $X \mathrel{-\!<} (X, f, g, A, B)$ where $X$, $A$ and $B$ are 
sets and $f : X \to A$ and $g : X \to B$ are functions such that the common-domain function product 
$f \dot{\times} g : X \to A \times B$ is a bijection. 

A horizontal subset of a product-structured set $(X, f, g, A, B)$ is a set $g^{-1}(\{b\})$ for some $b \in B$. 

A vertical subset of a product-structured set $(X, f, g, A, B)$ is a set $f^{-1}(\{a\})$ for some $a \in A$. 


10.15.13 REMARK: Inverse “fibre-set-restricted” maps expressed as direct-product-inverse partial maps. 
Theorem 10.15.14 (i) expresses the inverses $(g|_{f^{-1}(\{a\})})^{-1}$ of “fibre charts” $g$, restricted to “fibre sets” $f^{-1}(\{a\})$, 
in terms of partial maps $(f \dot{\times} g)^{-1}(a, \cdot)$ of the inverse $(f \dot{\times} g)^{-1}$ of a common-domain direct product $f \dot{\times} g$. 
(See Definition 21.1.2 for fibre sets. See Definition 21.5.2 for fibre charts.) 


This has the useful consequence that if $f \dot{\times} g$ is a homeomorphism (or diffeomorphism), then the “fibre sets” 
$f^{-1}(\{a\})$ can be shown to be homeomorphic (or diffeomorphic) to the corresponding “fibre space” $B$. 


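10.15.14 THEOREM: Partial map expressions for inverse horizontal/vertical component maps. 
Let $f : X \to A$ and $g : X \to B$ be functions for sets $X$, $A$ and $B$. 

(i) If $f \dot{\times} g : X \to A \times B$ is a bijection, then $\forall a \in A$, $(g|_{f^{-1}(\{a\})})^{-1} = (f \dot{\times} g)^{-1}(a, \cdot)$. 
(ii) If $f \dot{\times} g : X \to A \times B$ is a bijection, then $\forall b \in B$, $(f|_{g^{-1}(\{b\})})^{-1} = (f \dot{\times} g)^{-1}(\cdot, b)$. 

PROOF: For part (i), let $f \dot{\times} g : X \to A \times B$ be a bijection. Then the inverse $(f \dot{\times} g)^{-1} : A \times B \to X$ 
is a well-defined bijection. Let $a \in A$. Then $(f \dot{\times} g)^{-1}(a, \cdot) : B \to X$ is given by Definition 10.12.17 and 
Notation 10.12.18 as the map-rule $b \mapsto (f \dot{\times} g)^{-1}(a, b)$. In other words, $(f \dot{\times} g)^{-1}(a, \cdot)(b) = (f \dot{\times} g)^{-1}(a, b)$. 

By Theorem 10.15.10 (i), $g|_{f^{-1}(\{a\})} : f^{-1}(\{a\}) \to B$ is a bijection. So $(g|_{f^{-1}(\{a\})})^{-1} : B \to f^{-1}(\{a\})$ is a 
well-defined bijection. Let $b \in B$. Then $x = (g|_{f^{-1}(\{a\})})^{-1}(b)$ is well defined, and $g(x) = g|_{f^{-1}(\{a\})}(x) = b$. But 
$f(x) = a$ because $x \in \mathrm{Dom}(g|_{f^{-1}(\{a\})}) = f^{-1}(\{a\})$. So $(f \dot{\times} g)(x) = (a, b)$. Therefore $(f \dot{\times} g)^{-1}(a, b) = x$. 
So $(f \dot{\times} g)^{-1}(a, \cdot)(b) = x = (g|_{f^{-1}(\{a\})})^{-1}(b)$ for all $b \in B$. Hence $(f \dot{\times} g)^{-1}(a, \cdot) = (g|_{f^{-1}(\{a\})})^{-1}$. 
Part (ii) may be proved exactly as for part (i). 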


10.15.15 THEOREM: Component maps constructed from a product-structured set. 
Let $f_i : X \to Y_i$ for $i = 1, 2$. Let $f_1 \dot{\times} f_2 : X \to Y_1 \times Y_2$ be a bijection. Then $f_i \circ (f_1 \dot{\times} f_2)^{-1} = \Pi_i$ for 
$i = 1, 2$, where $\Pi_i : Y_1 \times Y_2 \to Y_i$ is the projection map as in Notation 10.12.3. 


PROOF: Let $y \in Y_1 \times Y_2$. Then $y = (y_1, y_2)$ for some (unique) $y_1 \in Y_1$ and $y_2 \in Y_2$. Let $x = (f_1 \dot{\times} f_2)^{-1}(y)$. 
Then $x \in X$ and $(f_1(x), f_2(x)) = (f_1 \dot{\times} f_2)(x) = y = (y_1, y_2)$. So $f_i(x) = y_i$ for $i = 1, 2$. Therefore 
$f_i((f_1 \dot{\times} f_2)^{-1}(y)) = f_i(x) = y_i = \Pi_i(y)$ by Notation 10.12.3. Hence $f_i \circ (f_1 \dot{\times} f_2)^{-1} = \Pi_i$. 


10.16. Equivalence relations and equivalence kernels of functions 


10.16.1 REMARK: The canonical map for the quotient of a set with respect to an equivalence relation. 
Definition 9.8.7 introduces the idea of the quotient set $X/R$ of a set $X$ with respect to an equivalence relation 
$R$ on $X$. Definition 10.16.2 introduces the canonical quotient map for such a quotient set. 


10.16.2 DEFINITION: The quotient map of an equivalence relation $R$ on a set $X$ is the map $f : X \to X/R$ 
given by $f : x \mapsto \{y \in X; x \mathrel{R} y\}$. 


10.16.3 THEOREM: Partition of the domain by a function. 
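Let $X$ and $Y$ be sets and let $f : X \to Y$ be a function from $X$ to $Y$. Then $X$ is the disjoint union of the 
sets $f^{-1}(\{y\})$ for $y \in Y$. That is, 
\[
X = \bigcup_{y \in Y} f^{-1}(\{y\})
\]
and 
\[
\forall y_1, y_2 \in Y, \quad y_1 \ne y_2 \ \Rightarrow\ f^{-1}(\{y_1\}) \cap f^{-1}(\{y_2\}) = \emptyset.
\]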




PROOF: Let $f : X \to Y$ be a function. Let $y_1, y_2 \in Y$. Suppose that $f^{-1}(\{y_1\}) \cap f^{-1}(\{y_2\}) \ne \emptyset$. Then 
$f(x) = y_1$ and $f(x) = y_2$ for some $x \in X$. So $y_1 = y_2$. Therefore $f^{-1}(\{y_1\}) = f^{-1}(\{y_2\})$. So the sets 
$f^{-1}(\{y\})$ are pairwise disjoint for $y \in Y$. 

Let $x \in X$. Let $y = f(x)$. Then $x \in f^{-1}(\{y\})$. Therefore $X = \bigcup_{y \in Y} f^{-1}(\{y\})$. Hence $X$ equals the disjoint 
union of the sets $f^{-1}(\{y\})$ for $y \in Y$. 
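
The partition of a finite domain by a function can be computed directly by grouping elements according to their values. The following Python sketch is an illustration only. The name `fibres` and the example function are hypothetical, and only the non-empty sets $f^{-1}(\{y\})$ are returned, namely those with $y \in f(X)$. 

```python
from collections import defaultdict


def fibres(f, X):
    """Return {y: inverse image of {y}} for the values y attained by f on X."""
    parts = defaultdict(set)
    for x in X:
        parts[f(x)].add(x)
    return dict(parts)


X = set(range(7))
parts = fibres(lambda x: x % 3, X)
```

The sets in `parts` cover `X`, and their sizes add up to the size of `X`, which for finite sets is equivalent to pairwise disjointness of a cover. 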


10.16.4 REMARK: Functions induce equivalence relations on their domains. 
Theorem 10.16.3 is illustrated in Figure 10.16.1. 


[Figure 10.16.1: Partitioning of a set $X$ by an inverse function $f^{-1}$. The diagram shows the set $X$ above the set $Y$, with points $y_1$ and $y_2$ marked in $Y$.] 


Functions provide a useful tool for partitioning sets. The value of a function f on a set X may be thought 
of as a tag which identifies elements of X which belong to the same part of the partition. In other words, a 
function effectively defines an equivalence relation on its domain. 


10.16.5 REMARK: Application of partitioning by inverse functions to fibrations. 

Theorem 10.16.3 provides the foundation for non-topological fibrations (groupless non-topological fibre 
bundles). The tuple $(E, \pi, B)$ could be regarded as a fibration if $\pi : E \to B$ is any function from $E$ to $B$. 
Then the “total space” $E$ is partitioned by the “fibre sets” $\pi^{-1}(\{b\})$ for $b \in B$. In other words, the set 
$\{\pi^{-1}(\{b\}); b \in B\}$ is a partition of $E$. (See Theorem 21.1.9 for the equi-informationality of “partition 
tag-maps” and non-topological fibration projection maps.) 


10.16.6 REMARK: The equivalence relation induced by an inverse function is called an equivalence kernel. 
The relation $R$ on $X$ in Definition 10.16.7 is an equivalence relation by Theorem 10.16.3. (This definition of 
“equivalence kernel” is given, for example, by MacLane/Birkhoff [110], page 33.) 


10.16.7 DEFINITION: The equivalence kernel of a function $f : X \to Y$ is the equivalence relation $R \subseteq X \times X$ 
defined by $(x_1, x_2) \in R \Leftrightarrow f(x_1) = f(x_2)$. 


10.16.8 THEOREM: Properties of equivalence kernels. 
Let f : X — Y be a function. Let R C X x X be the equivalence kernel of f. 
(i) The function g : X —> X/R defined by g : x f ((f(x)]) is well defined and surjective. 
(ii) The function h: X/R — f (X) defined by h: g(x) — f(x) for x € X is a well-defined bijection. 
(iii) f — hog. 
PROOF: For part (i), let R be the equivalence kernel of a function f : X → Y. For x ∈ X, let g(x) =
f⁻¹({f(x)}). Then g(x) = {x′ ∈ X; f(x′) ∈ {f(x)}} = {x′ ∈ X; f(x′) = f(x)} = {x′ ∈ X; x′ R x} ∈ X/R.
So g is well defined.

Let S ∈ X/R. Then S = {x′ ∈ X; x′ R x} = {x′ ∈ X; f(x′) = f(x)} for some x ∈ X, and so S = g(x).
Therefore g : X → X/R is surjective.


For part (ii), define h to be the set of ordered pairs {(g(x), f(x)); x ∈ X}. Then h ⊆ (X/R) × f(X) because
Range(g) = X/R by part (i). So h is a relation from X/R to f(X). In fact, Range(g) = X/R implies
that Dom(h) = X/R. To show that h(S) has a unique value for all S ∈ X/R, let S = g(x_1) = g(x_2)
for x_1, x_2 ∈ X. Then {x ∈ X; f(x) = f(x_1)} = {x ∈ X; f(x) = f(x_2)}. So f(x_1) = f(x_2). Therefore h is
well defined.

The surjectivity of h follows from the definition of f(X) = {f(x); x ∈ X}. To show that h is injective,
suppose that S_1, S_2 ∈ X/R satisfy h(S_1) = h(S_2). Then S_1 = g(x_1) and S_2 = g(x_2) for some
x_1, x_2 ∈ X by the surjectivity of g. So h(g(x_1)) = h(g(x_2)), and so f(x_1) = f(x_2). So g(x_1) = g(x_2)
by the definition of g, which means that S_1 = S_2. Therefore h is injective.
Hence h is a well-defined bijection.

For part (iii), let f : X → Y be a function, and define g and h as in parts (i) and (ii). Since g : X → X/R and
h : X/R → f(X) are well-defined surjections, it follows that h ∘ g : X → f(X) is a well-defined surjection.
Let x ∈ X. Then (h ∘ g)(x) = h(g(x)) = f(x) by the definition of h. Hence h ∘ g = f.
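
The factorisation f = h ∘ g in Theorem 10.16.8 can be checked on a small finite example. The following Python sketch is only an illustration: the helper name kernel_factorisation, the sample domain and the sample function are assumptions made here, not part of the text. Equivalence classes are stored as frozensets, and g and h are stored as dictionaries.

```python
def kernel_factorisation(f, X):
    """Return (g, h) for a function f on a finite set X, so that f = h o g.

    g maps each x to its class f^{-1}({f(x)}) in X/R, and h maps each
    class to the common value of f on that class.
    """
    X = list(X)
    # g(x) = {x' in X; f(x') = f(x)}, stored as a hashable frozenset.
    g = {x: frozenset(u for u in X if f(u) == f(x)) for x in X}
    # h = {(g(x), f(x)); x in X}. A dict holds one value per class,
    # which is safe because f is constant on each class.
    h = {g[x]: f(x) for x in X}
    return g, h


X = range(-3, 4)       # illustrative domain {-3, ..., 3}
f = lambda x: x * x    # illustrative function
g, h = kernel_factorisation(f, X)
```

In this example the classes are {0}, {-1, 1}, {-2, 2} and {-3, 3}, and h sends them to the distinct values 0, 1, 4 and 9, so h is a bijection onto f(X).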


10.17. Partial Cartesian products and patchwork spaces 


10.17.1 REMARK: Partial Cartesian products of set-families are useful for patchwork spaces. 

The Cartesian product ×_{i∈I} S_i of a set-family (S_i)_{i∈I} in Definition 10.11.2 may be thought of as the set of
all choice functions for the set-family. In other words, each element (x_i)_{i∈I} in the product is a selection of
exactly one element x_i from each set S_i in the set-family, and these choices are “tagged” by indices i ∈ I to
indicate which set each element was chosen from.


The partial Cartesian product ×̊_{i∈I} S_i of a set-family (S_i)_{i∈I} in Definition 10.17.2 may be thought of as
the set of all partial choice functions for the set-family. In other words, each element (x_i)_{i∈J} in the partial
product is a selection of at most one element x_i from each set S_i in the set-family.


The partial Cartesian products of set-families in Definition 10.17.2 have some limited applicability as a “place
to live” for the patchwork spaces in Definition 10.17.8.


10.17.2 DEFINITION: The partial Cartesian product of a family of sets (S_i)_{i∈I} is the set of functions
{x : J → ⋃_{i∈I} S_i; J ⊆ I and ∀i ∈ J, x_i ∈ S_i}.


10.17.3 NOTATION: ×̊_{i∈I} S_i denotes the partial Cartesian product of a family of sets (S_i)_{i∈I}. Thus

×̊_{i∈I} S_i = {x : J → ⋃_{i∈I} S_i; J ⊆ I and ∀i ∈ J, x_i ∈ S_i}.


10.17.4 REMARK:  Well-definition of partial Cartesian products of families of sets. 
The partial Cartesian product of a family of sets in Definition 10.17.2 is a well-defined set by Theorem 7.7.2,
the ZF specification theorem, because it is a subset of P(I × ⋃_{i∈I} S_i) specified by a set-theoretic formula.


10.17.5 REMARK: Partial sequences. 
If the index set I is a subset of the integers, then the elements of a partial Cartesian product may be referred
to as “partial sequences”.


10.17.6 REMARK: Motivation for definition of partial Cartesian products of families of sets. 

Definition 10.17.2 is probably non-standard. The idea here is that the functions x are not necessarily defined
on all of the index set I. The set ×̊_{i∈I} S_i is a superset of the standard Cartesian product set ×_{i∈I} S_i. Elements
of ×̊_{i∈I} S_i are restrictions of the elements of ×_{i∈I} S_i to arbitrary subsets of I. (These restrictions are actually
projections.) That is, the elements of the partial Cartesian product may be thought of as families (x_i)_{i∈I}
of the normal Cartesian product for which some of the elements x_i may be undefined. Unfortunately, set
theory does not have a standard symbol or definition for an undefined element.


10.17.7 REMARK: Non-emptiness of partial Cartesian products of families of sets. 
Unlike the situation with full Cartesian products, the partial Cartesian products in Definition 10.17.2 are
guaranteed to be non-empty, even if some or all of the sets in the family are empty. The empty function,
which has domain J = ∅, is always an element.
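
For a finite family, the partial Cartesian product of Definition 10.17.2 can be listed explicitly. The following Python sketch is an illustration only: the function name partial_cartesian_product and the sample families are assumptions made here. Each partial choice function is stored as a dictionary whose keys form the subset J of the index set.

```python
from itertools import combinations, product


def partial_cartesian_product(family):
    """List all partial choice functions for a finite family {i: S_i}.

    Each element is a dict x with Dom(x) = J, a subset of the index set,
    and x[i] in S_i for every i in J.
    """
    indices = sorted(family)
    result = []
    for r in range(len(indices) + 1):
        for J in combinations(indices, r):
            # One full choice function on J for each tuple of values.
            for values in product(*(sorted(family[i]) for i in J)):
                result.append(dict(zip(J, values)))
    return result


example = partial_cartesian_product({1: {"a", "b"}, 2: {"c"}})
with_empty_set = partial_cartesian_product({1: set(), 2: {"c"}})
```

The first family gives (2 + 1)(1 + 1) = 6 partial choice functions, including the empty function. The second family has an empty full Cartesian product, but its partial Cartesian product still contains the empty function and the function {2: "c"}.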


10.17.8 DEFINITION: A patchwork space of the family of sets (S_i)_{i∈I} is a subset X of the partial Cartesian
product set ×̊_{i∈I} S_i such that:
(i) ∅ ∉ X.
(ii) ∀i ∈ I, ∀s ∈ S_i, ∃x ∈ X, x_i = s. In other words, ∀i ∈ I, S_i ⊆ {x_i; x ∈ X}.
(iii) ∀x, y ∈ X, ∀i ∈ Dom(x) ∩ Dom(y), (x_i = y_i ⇒ x = y).


In other words, the elements of X are non-empty elements of ×̊_{i∈I} S_i such that every element of each set S_i
is the value x_i of one and only one partial choice function x ∈ X.


10.17.9 REMARK: Gluing sets onto each other. 

Patchwork spaces are closely related to quotient sets, which are introduced in Definition 9.8.7.
Definition 10.17.8 requires some interpretation. Strictly speaking, a subset X of the set ×̊_{i∈I} S_i is only a
representation of a patchwork space of the family of sets (S_i)_{i∈I}. The patchwork space of a family of sets is a
kind of equivalence class of the sets in the family. It is easier to interpret Definition 10.17.8 if the sets S_i are
pairwise disjoint. In this case, there is a canonical map f : ⋃_{i∈I} S_i → X defined so that f(s) is the unique
element x of X such that x_i = s for some i ∈ I. (See Figure 10.17.1, where a dot “·” is used in tuples such
as (x_1, ·) for the “undefined” elements of sequences in X.)


[Figure label: f(x_1) = f(x_2) = (x_1, x_2)]


Figure 10.17.1 Patchwork space constructed from sets S_1 and S_2


Since f is a well-defined surjective function, the sets of the form f⁻¹({x}) for x ∈ X are non-empty and
constitute a partition of ⋃_{i∈I} S_i. The elements of each set f⁻¹({x}) are regarded as identified. In other
words, the elements in such a set are regarded as “glued” onto each other.


If the sets S_i are not disjoint, it is straightforward to attach a “tag” i to elements of each S_i so as to force the
sets to be disjoint. If the sets S_i are disjoint, it is possible to formalise the elements of the patchwork space as
a partition of the set ⋃_{i∈I} S_i rather than as a set of tagged elements of the partial Cartesian product ×̊_{i∈I} S_i.
In fact, the partial Cartesian product set ×̊_{i∈I} S_i is simply the set of subsets of ⋃_{i∈I} S_i with tags on each
element to indicate which set S_i it was drawn from.


The idea of superposing sets to construct “patchwork spaces” is fundamental to differential geometry.
Differentiable and topological manifolds are constructed by gluing portions of Cartesian spaces onto each other
to create more general classes of spaces. The sets (S_i)_{i∈I} are in this case the domains of charts in an atlas.
The domain of each chart is a “patch”. The patches are glued together to form an abstract manifold. It is
necessary to also define a patchwork-space topology and a patchwork-space differentiable structure, and so
forth, in order to build up the structural layers of manifolds.


10.18. Indexed set covers 


10.18.1 REMARK: Set covers are useful for topological compactness definitions. 

Set covers and subcovers are needed for the topological compactness definitions in Sections 33.5 and 33.7. 
Unindexed set covers are defined in Section 8.7. Set covers are a generalisation of set partitions. A set cover 
may have non-empty overlaps of sets in the collection, whereas all overlaps of a partition must be empty. 


10.18.2 DEFINITION: An indexed cover of a subset A of a set X is a family (B_i)_{i∈I} such that
(i) ∀i ∈ I, B_i ⊆ X,
(ii) A ⊆ ⋃_{i∈I} B_i.


10.18.3 DEFINITION: A subcover of an indexed cover (B_i)_{i∈I} of a subset A of a set X is an indexed cover
(C_j)_{j∈J} of the subset A of X such that ∀j ∈ J, ∃i ∈ I, C_j = B_i. In other words, (C_j)_{j∈J} ⊆ (B_i)_{i∈I}.


10.18.4 DEFINITION: A refinement of an indexed cover (B_i)_{i∈I} of a subset A of a set X is an indexed cover
(C_j)_{j∈J} of the subset A of X such that ∀j ∈ J, ∃i ∈ I, C_j ⊆ B_i.


10.18.5 REMARK: Confusion of families of sets with their ranges. 

There is a subtle issue with the inclusion (C_j)_{j∈J} ⊆ (B_i)_{i∈I} in Definition 10.18.3. It is not important that
the indexing of the two families (C_j)_{j∈J} and (B_i)_{i∈I} should be the same. In fact, only the ranges of these
two families are intended to satisfy the inclusion. In other words, {C_j; j ∈ J} ⊆ {B_i; i ∈ I}.


10.18.6 REMARK: Indexed set covers and patchwork spaces.

Theorem 10.18.7 shows that indexed covers of sets are in essence the same thing as the patchwork spaces 
in Definition 10.17.8. However, a patchwork space is more general because it permits the gluing together 
of disjoint sets without even having a “substrate” (such as the set A in Definition 10.18.2) to attach the 
patches to. A patchwork space builds a set from patches “in the air". 


10.18.7 THEOREM: Construction of a patchwork space from an indexed cover. 
Let (B_i)_{i∈I} be an indexed cover of A ⊆ X. Define x(s) = {(i, s); i ∈ I and s ∈ B_i} for all s ∈ A. Let
Y = {x(s); s ∈ A}. Then Y is a patchwork space of (B_i)_{i∈I}.


PROOF: For all s ∈ A, let J(s) = {i ∈ I; s ∈ B_i}. Then x(s) is a function with Dom(x(s)) = J(s) ⊆ I and
∀i ∈ J(s), x(s)(i) ∈ B_i. So x(s) ∈ ×̊_{i∈I} B_i by Notation 10.17.3. Thus Y ⊆ ×̊_{i∈I} B_i.

Let s ∈ A. Then s ∈ ⋃_{i∈I} B_i by Definition 10.18.2 (ii). So s ∈ B_i for some i ∈ I. So x(s) ≠ ∅. So ∅ ∉ Y.
Therefore Y satisfies Definition 10.17.8 (i).

Let i ∈ I and s ∈ B_i. Then x(s) ∈ Y and x(s)(i) = s. So ∀i ∈ I, ∀s ∈ B_i, ∃y ∈ Y, y_i = s. Therefore Y
satisfies Definition 10.17.8 (ii).

Let y¹, y² ∈ Y. Then y¹ = x(s_1) and y² = x(s_2) for some s_1, s_2 ∈ A. Let i ∈ Dom(y¹) ∩ Dom(y²).
Suppose that y¹_i = y²_i. Then x(s_1)(i) = x(s_2)(i). So s_1 = s_2. So x(s_1) = x(s_2). So y¹ = y². Consequently
∀y¹, y² ∈ Y, ∀i ∈ Dom(y¹) ∩ Dom(y²), (y¹_i = y²_i ⇒ y¹ = y²). Therefore Y satisfies Definition 10.17.8 (iii). Hence
Y is a patchwork space of (B_i)_{i∈I}.
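
The construction in Theorem 10.18.7 can be tried out on a small finite cover. The following Python sketch is an illustration under stated assumptions: the function names patchwork_from_cover and is_patchwork_space, and the sample cover, are hypothetical, and the example takes A to be the whole union of the sets B_i.

```python
def patchwork_from_cover(cover, A):
    """Return Y = {x(s); s in A}, where x(s) = {(i, s); s in B_i} is a dict."""
    return [{i: s for i, B in cover.items() if s in B} for s in sorted(A)]


def is_patchwork_space(Y, family):
    """Check membership in the partial product and Definition 10.17.8 (i)-(iii)."""
    in_partial_product = all(y[i] in family[i] for y in Y for i in y)
    # (i) the empty function is not an element of Y
    cond_i = all(len(y) > 0 for y in Y)
    # (ii) every element of every S_i is the value y_i of some y in Y
    cond_ii = all(any(i in y and y[i] == s for y in Y)
                  for i, S in family.items() for s in S)
    # (iii) agreement at one common index forces equality
    cond_iii = all(y1 == y2
                   for y1 in Y for y2 in Y
                   for i in y1.keys() & y2.keys() if y1[i] == y2[i])
    return in_partial_product and cond_i and cond_ii and cond_iii


cover = {1: {0, 1, 2}, 2: {2, 3}}   # illustrative sets B_1 and B_2
A = {0, 1, 2, 3}                    # assumption: A is the whole union
Y = patchwork_from_cover(cover, A)
```

Here the element 2 lies in both B_1 and B_2, so x(2) has domain {1, 2}, which is how the two patches are glued. In this sketch, condition (ii) is checked against the sets B_i themselves, so the check succeeds only when every element of every B_i lies in A, which is why the example takes A to be the whole union.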


10.19. Extended notations for function-sets and map-rules 


10.19.1 REMARK: Extended notations for sets of functions. 

Let A and B be sets. It is useful to denote by A → B the set of functions f : A → B. Then one may
write f ∈ (A → B). This set of functions may also be denoted as B^A. (See Notation 10.2.17.) Thus
f ∈ (A → B) and f : A → B mean exactly the same thing as f ∈ B^A. The notation A → B seems useless
until one considers function-valued functions. Let C be a set, and let f be a function on A whose values
are functions from B to C. One may write this as f : A → C^B or f ∈ (C^B)^A. It is much clearer to write
f : A → (B → C) or f ∈ (A → (B → C)). (See Remark 10.2.20 for further comments on this notation.)


10.19.2 NOTATION: X → Y for sets X and Y denotes the set Y^X.


10.19.3 REMARK: Extended notations for map-rules.

The map-rule notation “f : a ↦ E(a)” means that the function f maps an element a to the expression E(a).
The context must state the domain and range for the function f. This kind of notation mimics the function
notation “f : A → B”, where A and B are sets.


This standard map-rule notation may be conveniently extended in the same way that the function notation
f : A → B may be extended to f : A → (B → C). The notation f : a ↦ (b ↦ E(a, b)) may be understood
to mean that f is a function-valued function f : A → (B → C), and the rule which determines this function
is the map from a ∈ A to the map b ↦ E(a, b) for b ∈ B.

A practical example of this double-map-rule notation is the canonical map from a linear space V to its
second dual in Definition 23.10.4. This function maps a vector v ∈ V to a function h_v : V* → K which is
defined by h_v : f ↦ f(v) for f ∈ V*. This may be succinctly expressed as h : v ↦ (f ↦ f(v)). Without the
double-map-rule notation, this is h(v)(f) = f(v) for all v ∈ V and f ∈ V*.


See Remark 10.19.5 for practical examples of triple-map-rule notation. 


10.19.4 REMARK: Function-valued function transposes.

For sets A, B and C, a function f : A → (B → C) has a “function transpose” f^t : B → (A → C) defined by
f^t(b)(a) = f(a)(b) for a ∈ A and b ∈ B. This may be written as f^t : b ↦ (a ↦ f(a)(b)) in terms of the
double-map-rule notation in Remark 10.19.3. As a practical example, a tangent operator field on a differentiable
manifold M is of the form X : M → ((M → ℝ) → ℝ). (Here A = M, B = (M → ℝ) and C = ℝ.) This
may be transposed as the function X^t : (M → ℝ) → (M → ℝ) defined by X^t(f)(x) = X(x)(f) for all
f ∈ (M → ℝ) and x ∈ M. (In terms of the double-map-rule notation, X^t : f ↦ (x ↦ X(x)(f)).) In fact,
whenever a function is function-valued, and the target functions all have the same domain, the base function
may be transposed in this way to make a new function.


It often happens in differential geometry that function-valued functions are transposed in this way for
convenience according to context. Noteworthy examples of this are differentials and connections. There are
four different ways of representing the information in a function f : A × B → C using transposition and
“domain-splitting”. These are as follows.


f : A × B → C
f^t : B × A → C        defined by f^t(b, a) = f(a, b) or f^t : (b, a) ↦ f(a, b)
f̄ : A → (B → C)        defined by f̄(a)(b) = f(a, b) or f̄ : a ↦ (b ↦ f(a, b))
f̄^t : B → (A → C)      defined by f̄^t(b)(a) = f(a, b) or f̄^t : b ↦ (a ↦ f(a, b)),


where f̄ denotes the “domain-split” of f. Any one of these functions may be defined in terms of any of the
others. Therefore they all contain the same information. Functions are often freely converted between these
forms with little or no comment.


Any function valued on a cross-product may be regarded as a function-valued function by fixing the value
of one coordinate and constructing a function using the remaining coordinate. (See Theorem 10.19.7.) The
functions f̄ : A → (B → C) and f̄^t : B → (A → C) may be thought of as “projections” of f onto A and B
respectively. One could even invent notations such as π_1 f for f̄ and π_2 f for f̄^t. Obviously there are very
many more ways of doing this in the case of a Cartesian product of many sets.


10.19.5 REMARK: Splitting domain components of functions on Cartesian set-products. 

Let f : X × Y → A be a function for sets X, Y and A. For x ∈ X, define g_x : Y → A by g_x : y ↦ f(x, y).
Define g : X → A^Y by g : x ↦ g_x.

For y ∈ Y, define h_y : X → A by h_y : x ↦ f(x, y). Define h : Y → A^X by h : y ↦ h_y. Then f ∈ A^{X×Y},
g ∈ (A^Y)^X and h ∈ (A^X)^Y. The maps from f to g and h are bijections. This is shown in Theorem 10.19.7.
The map-rule notation which is used in Theorem 10.19.7 possibly requires some explanation. (See also
Remark 10.19.3.) The notation “x ↦ f(x, y)” means the function {(x, f(x, y)); x ∈ X}, which is the same
as h_y for any y. The set X in {(x, f(x, y)); x ∈ X} is implicit in the function f. In fact, h_y equals
the set {(x, f(x, y)); (x, y) ∈ Dom(f)}, which does not mention X. Alternatively, note that X = Dom(g), so that
h_y = {(x, f(x, y)); x ∈ Dom(g)} for any y, which also does not mention X.

The notation “y ↦ (x ↦ f(x, y))” means {(y, {(x, f(x, y)); x ∈ X}); y ∈ Y}, which is the same as the
function h = {(y, h_y); y ∈ Y}. This follows from a second application of the map-rule notation. Then
“π_1 : f ↦ (x ↦ (y ↦ f(x, y)))” means π_1 = {(f, {(x, {(y, f(x, y)); y ∈ Y}); x ∈ X}); f ∈ A^{X×Y}} by a third
application of the map-rule notation. In this case, the predicate “f ∈ A^{X×Y}” comes from the context, where
it is stated that π_1 : A^{X×Y} → (A^Y)^X.
Similarly, “π_2 : f ↦ (y ↦ (x ↦ f(x, y)))” means π_2 = {(f, {(y, {(x, f(x, y)); x ∈ X}); y ∈ Y}); f ∈ A^{X×Y}}.


The choice of left or right domain splitting of functions has some relation to the notion of primary and
secondary keys for indexes in databases.


10.19.6 DEFINITION: The left domain split of a function f : X × Y → A is the function g : X → (Y → A)
with the map-rule g : x ↦ (y ↦ f(x, y)).

The right domain split of a function f : X × Y → A is the function h : Y → (X → A) with the map-rule
h : y ↦ (x ↦ f(x, y)).


10.19.7 THEOREM: Canonical maps between sets of functions of two variables. 
Let X, Y and A be sets. Define the maps π_1 : A^{X×Y} → (A^Y)^X and π_2 : A^{X×Y} → (A^X)^Y by the map-rules
π_1 : f ↦ (x ↦ (y ↦ f(x, y))) and π_2 : f ↦ (y ↦ (x ↦ f(x, y))). Then π_1 and π_2 are bijections (with
respect to their indicated domains and ranges).


PROOF: Let X, Y and A be sets. Define π_1 : A^{X×Y} → (A^Y)^X by π_1 : f ↦ (x ↦ (y ↦ f(x, y))). This is
a well-defined map because for all functions f : X × Y → A and elements x ∈ X and y ∈ Y, f(x, y) has a
well-defined value.


Let g ∈ (A^Y)^X. Then g(x) ∈ A^Y for all x ∈ X. Define a function f : X × Y → A by f(x, y) = g(x)(y) for
all (x, y) ∈ X × Y. Then π_1(f)(x)(y) = f(x, y) = g(x)(y) for all (x, y) ∈ X × Y. This means that π_1(f) = g.
So g ∈ Range(π_1). Therefore π_1 : A^{X×Y} → (A^Y)^X is a surjection.

Now suppose that a function g ∈ (A^Y)^X satisfies π_1(f_1) = g and π_1(f_2) = g for some f_1, f_2 ∈ A^{X×Y}. Then
π_1(f_1)(x)(y) = g(x)(y) for all x ∈ X and y ∈ Y. But π_1(f_1)(x)(y) = f_1(x, y) by the definition of π_1. So
f_1(x, y) = g(x)(y) for all (x, y) ∈ X × Y. Similarly f_2(x, y) = g(x)(y) for all (x, y) ∈ X × Y. Therefore
f_1(x, y) = f_2(x, y) for all (x, y) ∈ X × Y, which means that f_1 = f_2. Therefore π_1 is an injection. Hence
π_1 is a bijection.

It may be proved that π_2 : A^{X×Y} → (A^X)^Y is a bijection in the same way as for π_1, or one may simply
transpose π_2, apply the result for π_1, and then transpose π_2 back to the original map.
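
For finite sets, the maps π_1 and π_2 of Theorem 10.19.7 and the function transpose of Remark 10.19.4 can be written out with dictionaries. The following Python sketch is an illustration only: the function names pi1, pi2, unsplit_left and transpose, and the sample sets, are assumptions made here. A function f : X × Y → A is stored as a dict keyed by pairs (x, y).

```python
from itertools import product


def pi1(f, X, Y):
    """Left domain split: f |-> (x |-> (y |-> f(x, y)))."""
    return {x: {y: f[(x, y)] for y in Y} for x in X}


def pi2(f, X, Y):
    """Right domain split: f |-> (y |-> (x |-> f(x, y)))."""
    return {y: {x: f[(x, y)] for x in X} for y in Y}


def unsplit_left(g):
    """Inverse of pi1: rebuild f from g via f(x, y) = g(x)(y)."""
    return {(x, y): a for x, g_x in g.items() for y, a in g_x.items()}


def transpose(g):
    """Function transpose: g^t(b)(a) = g(a)(b)."""
    bs = {b for g_a in g.values() for b in g_a}
    return {b: {a: g[a][b] for a in g} for b in bs}


X, Y, A = [0, 1], ["p", "q"], [False, True]   # illustrative finite sets
pairs = list(product(X, Y))
# All 2^4 = 16 functions from X x Y to A.
all_f = [dict(zip(pairs, values)) for values in product(A, repeat=len(pairs))]
```

Applying unsplit_left after pi1 returns the original function for each of the 16 functions, which reflects the bijectivity of π_1 in this finite case, and the transpose of the left domain split agrees with the right domain split.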


10.19.8 REMARK: Circled arrow notation for the action of one set on another set. 

As a minor innovation (if it is indeed new), actions of sets on other sets are indicated in diagrams (such as
Figure 47.6.2) by a “circled arrow” or “William Tell” notation “⇴”. Thus the expression G ⇴ F denotes
G → (F → F) or G × F → F. Then if μ : G × F → F is a left group action, for example, this could be
written as μ : G ⇴ F instead of μ : G → (F → F). (This is more useful for arrows in diagrams than in
linear text.)


11.1. Partial order


11.1.1 REMARK: The family tree of orders on sets. 

A “partial order” is a weak form of order. It is so named to contrast with the total order in Definition 11.5.1.
(Some authors abbreviate the term “partially ordered set” to “poset”.) Figure 11.1.1 shows a family tree of
classes of order relations. (See Figure 9.0.1 for a family tree showing order relations in the context of other
kinds of relations.)


relation → partial order → lattice order → total order → well-ordering


Figure 11.1.1 Family tree for classes of order relations


11.1.2 DEFINITION: A (partial) order on a set X is a relation R on X which satisfies


(i) ∀x ∈ X, x R x; [weak reflexivity]
(ii) ∀x, y ∈ X, (x R y ∧ y R x) ⇒ x = y; [antisymmetry]
(iii) ∀x, y, z ∈ X, (x R y ∧ y R z) ⇒ x R z. [transitivity]


A (partially) ordered set is a pair X −< (X, R) such that X is a set and R is a partial order on X.


11.1.3 THEOREM: The inverse of a partial order is a partial order. 
A relation R on a set X is a partial order on X if and only if the inverse relation R⁻¹ is a partial order
on X.


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg. html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 


PROOF: Let R be a partial order on X. Then R satisfies Definition 11.1.2 conditions (i), (ii) and (iii). So
R⁻¹ satisfies


(i′) ∀x ∈ X, x R⁻¹ x;
(ii′) ∀x, y ∈ X, (y R⁻¹ x ∧ x R⁻¹ y) ⇒ x = y;
(iii′) ∀x, y, z ∈ X, (y R⁻¹ x ∧ z R⁻¹ y) ⇒ z R⁻¹ x.


Condition (i′) for R⁻¹ is the same as condition (i) for R⁻¹. The equivalence of conditions (ii′) and (ii)
for R⁻¹ follows by swapping the dummy variables x and y. The equivalence of conditions (iii′) and (iii)
for R⁻¹ follows by swapping the dummy variables x and z and observing that the logical AND-operator is
symmetric. Hence R⁻¹ is a partial order on X. The converse follows similarly.


11.1.4 REMARK: Equivalent expressions for the conditions for partial orders. 
If id_X denotes the identity map on a set X, and R is a relation on X, then condition (i) in Definition 11.1.2
is equivalent to id_X ⊆ R, condition (ii) is equivalent to R ∩ R⁻¹ ⊆ id_X, and condition (iii) is equivalent
to R ∘ R ⊆ R. This is formalised as Theorem 11.1.5.


11.1.5 THEOREM:  Set-operation conditions for a relation to be a partial order. 
Let R be a relation on X. Then R is a partial order on X if and only if all of the following conditions hold. 


(i) id_X ⊆ R. [weak reflexivity]
(ii) R ∩ R⁻¹ ⊆ id_X. [antisymmetry]
(iii) R ∘ R ⊆ R. [transitivity]


PROOF: For part (i), note that the proposition x R x in Definition 11.1.2 (i) means (x, x) ∈ R. But
id_X = {(x, x); x ∈ X}. So the proposition ∀x ∈ X, (x, x) ∈ R is equivalent to ∀z ∈ id_X, z ∈ R, which is
equivalent to id_X ⊆ R.

For part (ii), note that the proposition x R y ∧ y R x in Definition 11.1.2 (ii) means (x, y) ∈ R ∧ (y, x) ∈ R,
which is equivalent to (x, y) ∈ R ∧ (x, y) ∈ R⁻¹, which is equivalent to (x, y) ∈ R ∩ R⁻¹. But for x, y ∈ X,
the proposition x = y is equivalent to (x, y) ∈ id_X. So Definition 11.1.2 (ii) is equivalent to R ∩ R⁻¹ ⊆ id_X.

For part (iii), note that R ∘ R = {(x, y); ∃z, ((x, z) ∈ R ∧ (z, y) ∈ R)} by Definition 9.6.2. So R ∘ R ⊆ R
if and only if ∀x, y, ((∃z, ((x, z) ∈ R ∧ (z, y) ∈ R)) ⇒ ((x, y) ∈ R)). By Theorem 6.6.12 (iii), this is
equivalent to ∀x, y, (∀z, (((x, z) ∈ R ∧ (z, y) ∈ R) ⇒ ((x, y) ∈ R))). Since R ⊆ X × X, this is equivalent to
∀x, y, z ∈ X, (((x, z) ∈ R ∧ (z, y) ∈ R) ⇒ ((x, y) ∈ R)), which is equivalent to Definition 11.1.2 (iii).


11.1.6 REMARK: Consequences of the conditions for a partial order. 

The three conditions in Theorem 11.1.5 are line-for-line equivalent to the three conditions of Definition 11.1.2.
However, Theorem 11.1.5 conditions (i) and (ii) may be combined into the single condition R ∩ R⁻¹ = id_X,
which is equivalent to ∀x, y ∈ X, ((x R y and y R x) ⇔ x = y). Similarly, the combination of Theorem 11.1.5
conditions (i) and (iii) implies R ∘ R = R. Thus one arrives at Theorem 11.1.7.


11.1.7 THEOREM: Simpler-looking set-operation conditions for a relation to be a partial order. 

Let R be a relation on X. Then R is a partial order on X if and only if the following conditions both hold. 
(i) R ∩ R⁻¹ = id_X.
(ii) R ∘ R = R.

PROOF: From (i), it follows that id_X ⊆ R and R ∩ R⁻¹ ⊆ id_X. This verifies Theorem 11.1.5 (i, ii). From (ii),
it follows that R ∘ R ⊆ R, which verifies Theorem 11.1.5 (iii). Thus R is a partial order on X.

Now suppose that R is a partial order on X. Then id_X ⊆ R and id_X ⊆ R⁻¹ by Theorem 11.1.3. So
id_X ⊆ R ∩ R⁻¹ ⊆ id_X by Theorem 11.1.5 (ii). So R ∩ R⁻¹ = id_X by Theorem 7.3.5 (iii). This verifies (i).

To verify (ii), let (x, y) ∈ R. Then (x, x) ∈ R because id_X ⊆ R. So (x, y) ∈ R ∘ R by Notation 9.6.3. Thus
R ⊆ R ∘ R. But R ∘ R ⊆ R by Theorem 11.1.5 (iii). Therefore R ∘ R = R.
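
The equivalence between Definition 11.1.2 and the two set-operation conditions of Theorem 11.1.7 can be tested by brute force on small sets. The following Python sketch is an illustration only: all function names and the sample divisibility relation are assumptions made here, and relations are stored as sets of ordered pairs.

```python
from itertools import product


def identity(X):
    return {(x, x) for x in X}


def inverse(R):
    return {(y, x) for (x, y) in R}


def compose(S, R):
    """S o R = {(x, y); there is z with (x, z) in R and (z, y) in S}."""
    return {(x, y) for (x, z) in R for (w, y) in S if z == w}


def is_partial_order_def(R, X):
    """Conditions (i)-(iii) of Definition 11.1.2, checked directly."""
    reflexive = all((x, x) in R for x in X)
    antisymmetric = all(x == y for (x, y) in R if (y, x) in R)
    transitive = all((x, z) in R for (x, y) in R for (w, z) in R if y == w)
    return reflexive and antisymmetric and transitive


def is_partial_order_setops(R, X):
    """Conditions (i)-(ii) of Theorem 11.1.7."""
    return (R & inverse(R)) == identity(X) and compose(R, R) == R


def all_relations(X):
    """Yield every relation on the finite set X."""
    pairs = [(x, y) for x in X for y in X]
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, keep in zip(pairs, bits) if keep}


X = {1, 2, 3, 4, 6, 12}
divides = {(x, y) for x in X for y in X if y % x == 0}   # illustrative order
```

Both tests accept the divisibility relation on X, and both reject the relation X × X on a two-element set. Running both tests over all 512 relations on a three-element set gives the same verdict for every relation.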


11.1.8 REMARK: Redundancy of the base sets of the specification tuple for a partial order. 

As often happens, the base set X in the specification pair (X, R) for a partial order in Definition 11.1.2 
is redundant. The set X may be determined from the set R as X = {x; (x, x) ∈ R}. Alternatively
X = Dom(R) = Range(R). However, these observations do not apply to the corresponding strict (or strong)
inequality R \ id_X of R.


11.1.9 EXAMPLE: The weakest partial order on a set is the identity relation.

For any set X, the identity relation id_X is the weakest partial order on X. To show that id_X is a partial order
on X, note that id_X⁻¹ = id_X by Theorem 9.6.16 (i), so that id_X ∩ id_X⁻¹ = id_X, and id_X ∘ id_X = id_X by
Theorem 9.6.11 (i). Therefore id_X is a partial order on X by Theorem 11.1.7. The fact that id_X is the weakest
partial order on X then follows from Theorem 11.1.5 (i).


11.1.10 REMARK: There is no strongest partial order on a general given set. 

For a given set X containing more than one element, there is no strongest partial order. In other words,
there is no partial order on X which includes all other partial orders on X. The relation X × X, for example,
is not a partial order if X is not empty and not a singleton, although by Remark 11.5.4, R ∪ R⁻¹ = X × X
for any total order R on X. However, a partial order may be maximal in the sense that no ordered pair may
be added to it. In fact, all total orders are maximal in this sense. (See Theorem 11.5.5 (i).)


11.1.11 EXAMPLE: Inclusion relations are partial orders. 

The inclusion relation for sets is a prime example of a partial order which is not necessarily a total order.
For any set A, a partial order R = {(x, y) ∈ X × X; x ⊆ y} may be defined on the power set X = P(A).
In other words, x R y ⇔ x ⊆ y for x, y ∈ X. Thus the inclusion relation “⊆” is a partial order on P(A)
for any set A. If the set A has two or more elements, the inclusion relation is not a total order because
{a} ⊈ {b} and {b} ⊈ {a} whenever a ≠ b. This is illustrated in Figure 11.1.2. (For clarity, the “transitivity
arrows” have been omitted from the diagram. See Remark 11.2.3 for discussion of transitivity arrows.)


Figure 11.1.2 Inclusion relation for sets of two, three or four elements


11.1.12 THEOREM: The set-inclusion relation for a set is a partial order on the power set.
For any set X, the pair (P(X), ⊆) is a partially ordered set.


PROOF: Let X be a set. Let A ∈ P(X). Then A ⊆ A by Theorem 7.3.5 (i). Let A, B ∈ P(X) with A ⊆ B
and B ⊆ A. Then A = B by Theorem 7.3.5 (iii). Let A, B, C ∈ P(X) with A ⊆ B and B ⊆ C. Then A ⊆ C
by Theorem 7.3.5 (vii). Hence (P(X), ⊆) is a partially ordered set by Definition 11.1.2.
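
The inclusion order on a small power set can be built and checked directly. The following Python sketch is an illustration only: the helper name power_set and the three-element sample set are assumptions made here. Subsets are stored as frozensets, for which the operator <= is the subset test.

```python
from itertools import combinations


def power_set(A):
    """List all subsets of the finite set A as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


A = {"a", "b", "c"}   # illustrative base set
P = power_set(A)
# R = {(x, y) in P x P; x is a subset of y}
R = {(x, y) for x in P for y in P if x <= y}

# The three conditions of Definition 11.1.2.
reflexive = all((x, x) in R for x in P)
antisymmetric = all(x == y for (x, y) in R if (y, x) in R)
transitive = all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

# The singletons {a} and {b} are not comparable, so R is not a total order.
a, b = frozenset({"a"}), frozenset({"b"})
incomparable = (a, b) not in R and (b, a) not in R
```

For a three-element set, the power set has 8 elements and the inclusion relation has 27 ordered pairs, since each element of A is in neither set, in the larger set only, or in both.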


11.1.13 DEFINITION: The dual order of an order R for a set X is the inverse R⁻¹ of the relation R. In
other words, the dual of an order R is the relation R⁻¹ = {(x, y); (y, x) ∈ R}.


11.1.14 NOTATION: The symbol “≤” may be used for a partial order R. Then x ≤ y means x R y and the
following notations apply.


The symbol “≥” means the dual of the order “≤”. Thus “≥” equals R⁻¹. Thus x ≥ y means y ≤ x.
The symbol “<” denotes the relation which satisfies x < y ⇔ (x ≤ y ∧ x ≠ y).
The symbol “>” denotes the relation which satisfies x > y ⇔ (x ≥ y ∧ x ≠ y).


The symbols “≰”, “≱”, “≮” and “≯” denote the logical negations of “≤”, “≥”, “<” and “>” respectively.
Thus x ≰ y means ¬(x ≤ y), x ≱ y means ¬(x ≥ y), x ≮ y means ¬(x < y), and x ≯ y means ¬(x > y).


11.1.15 REMARK: Relations which contain the same information as a partial order. 
A partial order “≤” contains the same information as the corresponding relations “<”, “≥” and “>”.
Therefore the specification of a partial order may be freely chosen from these four representations. To avoid
defining everything four times, it is usual to standardise the representation as meaning “less than or equal
to”. Then “<” may be defined by (x < y) ⇔ (x ≤ y ∧ x ≠ y), with similar expressions for “≥” and “>”.


11.1.16 THEOREM: Exclusion of simultaneous opposing order relations. 
Let (X, ≤) be a partially ordered set.
(i) ∀x, y ∈ X, (x < y ⇒ ¬(x > y)).
(ii) ∀x, y ∈ X, (x > y ⇒ ¬(x < y)).

PROOF: Let (X, ≤) be a partially ordered set. For part (i), let x, y ∈ X with x < y. Suppose that x > y.
Then y ≤ x and x ≠ y by Notation 11.1.14. Also x ≤ y because x < y. So x = y by Definition 11.1.2 (ii).
This contradicts x ≠ y. Therefore ¬(x > y). The proof of part (ii) is essentially identical.


11.1.17 REMARK: Similarities between partial orders and topologies. 

The structure bestowed by a partial order on a set is similar in some ways to the structure bestowed by a
topology or metric. One may say that an element y of a partially ordered set (X, ≤) lies “between” elements
x, z ∈ X if x ≤ y and y ≤ z. (One could say “strictly between” x and z if x < y and y < z.) This is similar
to saying that y is “closer” to x than z is, and is “closer” to z than x is. A metric specifies distance whereas a
topology defines the concept of a “neighbourhood”. So both a metric and a topology specify the notion of
closeness in some sense.


11.1.18 REMARK: Closure of partial orders under restriction of the domain and range. 
Theorem 11.1.19 uses Definition 9.6.21 for the restriction of a relation. 


11.1.19 THEOREM: Any restriction of a partial order is also a partial order. 
The restriction of an order R on a set X to any subset Y of X is an order on Y.


PROOF: Let R be an order on X. Let Y be a subset of X. The restriction of R to Y is the set R_Y =
R ∩ (Y × Y), which is a relation on Y. Let x ∈ Y. Then (x, x) ∈ R. So (x, x) ∈ R_Y because (x, x) ∈ Y × Y.
So R_Y satisfies Definition 11.1.2 part (i) for an order on Y.

Let x, y ∈ Y satisfy x R_Y y and y R_Y x. Then (x, y) ∈ R and (y, x) ∈ R. So x = y by Definition 11.1.2 (ii)
for R. Therefore R_Y satisfies Definition 11.1.2 (ii).

Let x, y, z ∈ Y satisfy x R_Y y and y R_Y z. Then (x, y), (y, z) ∈ R. So (x, z) ∈ R by Definition 11.1.2 (iii)
for R. But (x, z) ∈ Y × Y. So (x, z) ∈ R_Y. Therefore R_Y satisfies Definition 11.1.2 (iii). Hence R_Y is an
order on Y.


11.1.20 REMARK: Order homomorphisms, monomorphisms and isomorphisms. 

Homomorphisms, monomorphisms and isomorphisms between ordered sets are introduced in the usual way in 
Definition 11.1.21. (See Definition 10.5.21 for the full range of morphisms.) These are particularly important 
for the theory of well-ordered sets and ordinal numbers in Sections 11.7, 13.2 and 13.3. 


11.1.21 DEFINITION: Let (X, ≤_X) and (Y, ≤_Y) be ordered sets.
(i) A weak order homomorphism from (X, ≤_X) to (Y, ≤_Y) is a function f : X → Y which satisfies
∀x_1, x_2 ∈ X, (x_1 ≤_X x_2 ⇒ f(x_1) ≤_Y f(x_2)).
(ii) A strong order homomorphism from (X, ≤_X) to (Y, ≤_Y) is a function f : X → Y which satisfies
∀x_1, x_2 ∈ X, (x_1 <_X x_2 ⇒ f(x_1) <_Y f(x_2)).
(iii) An order monomorphism from (X, ≤_X) to (Y, ≤_Y) is an injection f : X → Y which is a weak order
homomorphism from (X, ≤_X) to (Y, ≤_Y).
(iv) An order isomorphism from (X, ≤_X) to (Y, ≤_Y) is a bijection f : X → Y such that f is a weak
order homomorphism from (X, ≤_X) to (Y, ≤_Y) and f⁻¹ is a weak order homomorphism from (Y, ≤_Y)
to (X, ≤_X).

In other words, an order isomorphism from (X, ≤_X) to (Y, ≤_Y) is a bijection f : X → Y which satisfies
∀x_1, x_2 ∈ X, (x_1 ≤_X x_2 ⇔ f(x_1) ≤_Y f(x_2)).


11.1.22 REMARK: Order homomorphisms with respect to weak or strong order.

A weak order homomorphism according to Definition 11.1.21 (i) does not necessarily satisfy the strong order
homomorphism rule ∀x_1, x_2 ∈ X, (x_1 <_X x_2 ⇒ f(x_1) <_Y f(x_2)). For example, any constant function is a
weak order homomorphism. But in the case of order monomorphisms and isomorphisms, it is not necessary
to distinguish between the weak and strong rules.


A bijection f : X → Y which is an order monomorphism from (X, ≤_X) to (Y, ≤_Y) is not necessarily an order
isomorphism from (X, ≤_X) to (Y, ≤_Y) because the relation ≤_Y could be a proper superset of the induced
order {(y_1, y_2) ∈ Y × Y; f⁻¹(y_1) ≤_X f⁻¹(y_2)} in Definition 11.1.28. (See Example 11.1.23.) This
is why Definition 11.1.21 (iv) requires bidirectional order preservation. (See Theorem 11.5.33 for the totally
ordered set case, where every bijective order homomorphism is automatically an order isomorphism.)


Note also that a strong order homomorphism is not necessarily an injection. (See Example 11.1.24.)


11.1.23 EXAMPLE:  Bijective order monomorphism which is not an order isomorphism. 

Let X = {a, b} and ≤_X = {(a, a), (b, b)} = id_X. Let Y = {c, d} with ≤_Y = {(c, c), (d, d), (c, d)} = id_Y ∪ {(c, d)}.
Then (X, ≤_X) and (Y, ≤_Y) are ordered sets. Let f = {(a, c), (b, d)}. Then f : X → Y is a bijection which is an
order monomorphism, and also a strong order homomorphism, but f : X → Y is not an order isomorphism.


11.1.24 EXAMPLE: Strong order homomorphism which is not an injection. 

Let X = {a, b} and ≤_X = {(a, a), (b, b)} = id_X. Let Y = {c} with ≤_Y = {(c, c)} = id_Y. Then (X, ≤_X)
and (Y, ≤_Y) are ordered sets. Let f = {(a, c), (b, c)}. Then f : X → Y is a strong order homomorphism.
However, f : X → Y is not an injection.
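
Examples 11.1.23 and 11.1.24 are small enough to verify mechanically. The following Python sketch is an illustration only: the function names are assumptions made here, orders are stored as sets of ordered pairs, and functions are stored as dictionaries.

```python
def strict(R):
    """The strict order R minus the identity pairs."""
    return {(x, y) for (x, y) in R if x != y}


def is_weak_hom(f, RX, RY):
    """x1 <= x2 implies f(x1) <= f(x2)."""
    return all((f[x1], f[x2]) in RY for (x1, x2) in RX)


def is_strong_hom(f, RX, RY):
    """x1 < x2 implies f(x1) < f(x2)."""
    return all((f[x1], f[x2]) in strict(RY) for (x1, x2) in strict(RX))


def is_injective(f):
    return len(set(f.values())) == len(f)


def is_isomorphism(f, RX, RY, Y):
    """Bijection onto Y which preserves the order in both directions."""
    if not (is_injective(f) and set(f.values()) == set(Y)):
        return False
    f_inv = {y: x for x, y in f.items()}
    return is_weak_hom(f, RX, RY) and is_weak_hom(f_inv, RY, RX)


# Example 11.1.23: bijective monomorphism which is not an isomorphism.
RX = {("a", "a"), ("b", "b")}
RY = {("c", "c"), ("d", "d"), ("c", "d")}
f = {"a": "c", "b": "d"}

# Example 11.1.24: strong homomorphism which is not an injection.
RY2 = {("c", "c")}
f2 = {"a": "c", "b": "c"}
```

In both examples the strict order on X is empty, so the strong order homomorphism condition holds vacuously. The isomorphism test fails in the first example because the pair (c, d) is mapped by the inverse of f to (a, b), which is not in the order on X.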


11.1.25 THEOREM: Relations between weak and strong order homomorphism properties. 
Let (X, ≤_X) and (Y, ≤_Y) be ordered sets. Let f : X → Y.

(i) If f is a strong order homomorphism, then f is a weak order homomorphism.
(ii) If f is an order monomorphism, then f is a strong order homomorphism.
(iii) If f is an order isomorphism, then f and f⁻¹ are strong order homomorphisms.


PROOF: For part (i), let f : X → Y be a strong order homomorphism. Let x1, x2 ∈ X with x1 ≤_X x2.
Then x1 <_X x2 or x1 = x2. So f(x1) <_Y f(x2) or f(x1) = f(x2). Therefore f(x1) ≤_Y f(x2). Hence
f : X → Y is a weak order homomorphism.


For part (ii), let f : X → Y be an order monomorphism. Let x1, x2 ∈ X with x1 <_X x2. Then x1 ≤_X x2.
So f(x1) ≤_Y f(x2). Therefore f(x1) <_Y f(x2) or f(x1) = f(x2). But f(x1) ≠ f(x2) because x1 ≠ x2 and
f is injective. So f(x1) <_Y f(x2). Hence f : X → Y is a strong order homomorphism.


Part (iii) follows from part (ii) and Definition 11.1.21 (iii, iv).


11.1.26 REMARK: Induction of order by a bijection. 

Definition 11.1.21 (iv) assumes that two ordered sets are given, and then defines order isomorphisms in terms 
of them. Theorem 11.1.27 assumes that one ordered set and one bijection are given, and constructs a second 
ordered set from them. The proof is very simple and mechanical. 


11.1.27 THEOREM: The order induced by a bijection is order isomorphic to the original order. 

Let (X, ≤_X) be an ordered set. Let φ : X → Y be a bijection. Define the relation ≤_Y ∈ P(Y × Y)
by ≤_Y = {(y1, y2) ∈ Y × Y; φ⁻¹(y1) ≤_X φ⁻¹(y2)}. Then (Y, ≤_Y) is an ordered set, and φ is an order
isomorphism from (X, ≤_X) to (Y, ≤_Y).


PROOF: Let y ∈ Y. Then φ⁻¹(y) ≤_X φ⁻¹(y) by Definition 11.1.2 (i). So y ≤_Y y by the definition of ≤_Y.
Thus ≤_Y satisfies Definition 11.1.2 (i).

Let y1, y2 ∈ Y satisfy y1 ≤_Y y2 and y2 ≤_Y y1. Then φ⁻¹(y1) ≤_X φ⁻¹(y2) and φ⁻¹(y2) ≤_X φ⁻¹(y1). So
φ⁻¹(y1) = φ⁻¹(y2) by Definition 11.1.2 (ii). So y1 = y2. Thus ≤_Y satisfies Definition 11.1.2 (ii).

Let y1, y2, y3 ∈ Y satisfy y1 ≤_Y y2 and y2 ≤_Y y3. Then φ⁻¹(y1) ≤_X φ⁻¹(y2) and φ⁻¹(y2) ≤_X φ⁻¹(y3). So
φ⁻¹(y1) ≤_X φ⁻¹(y3) by Definition 11.1.2 (iii). So y1 ≤_Y y3. Thus ≤_Y satisfies Definition 11.1.2 (iii). Hence
(Y, ≤_Y) is an ordered set by Definition 11.1.2, and φ : X → Y is an order isomorphism by construction.
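
The construction in Theorem 11.1.27 can be carried out directly for a finite set. The sketch below is an illustration under assumptions: the divisibility order on {1, 2, 3, 6} and the relabelling `phi` are chosen only for the example, and the helper names are hypothetical. Because the map is a bijection, the set of pairs (y1, y2) whose inverse images are related is the same as the set of images of the related pairs.

```python
from itertools import product

def induced_order(le_x, phi):
    """Image of the order le_x under a bijection phi, given as a dict."""
    return {(phi[a], phi[b]) for (a, b) in le_x}

def is_partial_order(le, S):
    """Check reflexivity, antisymmetry and transitivity of le on S."""
    reflexive = all((s, s) in le for s in S)
    antisymmetric = all(a == b for (a, b) in le if (b, a) in le)
    transitive = all((a, d) in le
                     for (a, b) in le for (c, d) in le if b == c)
    return reflexive and antisymmetric and transitive

# Illustrative order: divisibility on {1, 2, 3, 6}.
X = {1, 2, 3, 6}
le_X = {(a, b) for a, b in product(X, repeat=2) if b % a == 0}

# Illustrative bijection onto a set of letters.
phi = {1: "p", 2: "q", 3: "r", 6: "s"}
Y = set(phi.values())
le_Y = induced_order(le_X, phi)

print(is_partial_order(le_Y, Y))
print(("q", "s") in le_Y, ("q", "r") in le_Y)
```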


11.1.28 DEFINITION: The induced order by a bijection φ : X → Y of an order ≤_X on X is the order
relation {(y1, y2) ∈ Y × Y; φ⁻¹(y1) ≤_X φ⁻¹(y2)}.


11.1.29 REMARK: Increasing, decreasing, non-decreasing and non-increasing functions.

Definition 11.1.30 defines increasing and decreasing properties of functions whose domains and ranges are 
partially ordered sets. This kind of generality is particularly useful in the case of set-valued functions whose 
range is ordered by the set-inclusion relation, which is always a partial order by Theorem 11.1.12. 


The terms “non-decreasing” and “non-increasing” are misnomers, especially in the context of partially
ordered sets. In the total order context, “non” really means “never”. If the target space of a function is
a totally ordered set, the value of the function has only three possibilities when the domain parameter is
varied. It can increase, decrease, or stay the same. Thus “never-decreasing” implies that the function is
always increasing or staying the same.


A function with a partially ordered target set has an additional possibility when the domain parameter 
is varied. The function’s values at two points may be unrelated. (For example, in the case of the set- 
inclusion order in Theorem 11.1.12, it is not generally true that one set of a pair is included in the other.) 
Definition 11.1.30 specifically excludes the possibility of unrelated function values when the domain values 
are related. Consequently “non-decreasing” really means “always either increasing or constant”. It does not 
mean “never decreasing”, except when the target set is totally ordered. 


11.1.30 DEFINITION: Let (X, ≤_X) and (Y, ≤_Y) be ordered sets.

An increasing function f : X → Y is a strong order homomorphism from (X, ≤_X) to (Y, ≤_Y).
In other words, ∀x1, x2 ∈ X, (x1 <_X x2 ⇒ f(x1) <_Y f(x2)).

A decreasing function f : X → Y is a strong order homomorphism from (X, ≤_X) to (Y, ≥_Y).
In other words, ∀x1, x2 ∈ X, (x1 <_X x2 ⇒ f(x1) >_Y f(x2)).

A non-decreasing function f : X → Y is a weak order homomorphism from (X, ≤_X) to (Y, ≤_Y).
In other words, ∀x1, x2 ∈ X, (x1 ≤_X x2 ⇒ f(x1) ≤_Y f(x2)).

A non-increasing function f : X → Y is a weak order homomorphism from (X, ≤_X) to (Y, ≥_Y).
In other words, ∀x1, x2 ∈ X, (x1 ≤_X x2 ⇒ f(x1) ≥_Y f(x2)).
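
The distinction drawn in Remark 11.1.29 between “non-decreasing” and “never decreasing” can be seen in a two-point example. The Python sketch below is illustrative only: the function `f`, which takes values in subsets of {1, 2} ordered by inclusion, and the helper names are assumptions made for this example.

```python
def is_non_decreasing(f, X, le_x, le_y):
    """Weak rule of Definition 11.1.30: a <= b implies f(a) <= f(b)."""
    return all(le_y(f[a], f[b]) for a in X for b in X if le_x(a, b))

def never_decreases(f, X, le_x, le_y):
    """No related pair a <= b has f(b) strictly below f(a)."""
    return not any(le_y(f[b], f[a]) and f[a] != f[b]
                   for a in X for b in X if le_x(a, b))

X = [0, 1]

def le_int(a, b):
    return a <= b

def subset(s, t):
    return s <= t      # inclusion order on frozensets

# f(0) and f(1) are unrelated by inclusion.
f = {0: frozenset({1}), 1: frozenset({2})}

print(never_decreases(f, X, le_int, subset))
print(is_non_decreasing(f, X, le_int, subset))
```

The two values of `f` are unrelated, so `f` never decreases, but it fails the rule for a non-decreasing function.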


11.1.31 REMARK: General sets of increasing functions. 
Notation 11.1.32 for sets of increasing and decreasing functions is non-standard but useful. 


The set Inc(X, Y) might be thought to contain only injective functions. However, this is the set of strong
order homomorphisms from X to Y, and as shown in Example 11.1.24, such functions are not necessarily
injective. In the context of partial orders, one must forget what one knows about total orders! Similarly, the
functions in Dec(X, Y) are not necessarily injections.


The set NonDec(X, Y) is the same as the set of weak order homomorphisms from X to Y.


11.1.32 NOTATION: Let (X, ≤_X) and (Y, ≤_Y) be partially ordered sets.
Inc(X, Y) denotes the set {f : X → Y; ∀j, k ∈ X, j <_X k ⇒ f(j) <_Y f(k)}.
Dec(X, Y) denotes the set {f : X → Y; ∀j, k ∈ X, j <_X k ⇒ f(j) >_Y f(k)}.
NonDec(X, Y) denotes the set {f : X → Y; ∀j, k ∈ X, j ≤_X k ⇒ f(j) ≤_Y f(k)}.
NonInc(X, Y) denotes the set {f : X → Y; ∀j, k ∈ X, j ≤_X k ⇒ f(j) ≥_Y f(k)}.


11.1.33 THEOREM: Transitivity of monotonicity properties of maps.
Let (X, ≤_X), (Y, ≤_Y) and (Z, ≤_Z) be partially ordered sets.
(i) If f : X → Y and g : Y → Z are non-decreasing, then g ∘ f : X → Z is non-decreasing.
(ii) If f : X → Y and g : Y → Z are increasing, then g ∘ f : X → Z is increasing.

PROOF: For part (i), let j, k ∈ X with j ≤_X k. Then f(j) ≤_Y f(k) by Definition 11.1.30. Then similarly,
(g ∘ f)(j) = g(f(j)) ≤_Z g(f(k)) = (g ∘ f)(k). Hence g ∘ f : X → Z is non-decreasing by Definition 11.1.30.


For part (ii), let j, k ∈ X with j <_X k. Then f(j) <_Y f(k) by Definition 11.1.30. Then similarly,
(g ∘ f)(j) = g(f(j)) <_Z g(f(k)) = (g ∘ f)(k). Hence g ∘ f : X → Z is increasing by Definition 11.1.30.


11.1.34 REMARK: Conversion of general partial orders to set-inclusion relations. 

Any partial order can be converted to a set-inclusion relation on its power set. Let X be a set with a partial
order “≤”. Define the function φ : X → P(X) by φ(x) = {z ∈ X; z ≤ x} for all x ∈ X. (Then φ(x) is the
same as the set {x}⁻ in terms of the notation in Theorem 11.2.11 (i).) Suppose that x ≤ y. Then φ(x) ⊆ φ(y)
because z ≤ x implies z ≤ y by transitivity, Definition 11.1.2 (iii). Suppose that φ(x) ⊆ φ(y). Then x ∈ φ(x)
by reflexivity, Definition 11.1.2 (i). So x ∈ φ(y). Therefore x ≤ y. It follows that φ is injective by
antisymmetry, Definition 11.1.2 (ii). Thus the map φ : X → P(X) is an order
isomorphism between (X, ≤) and (Y, ⊆) by Definition 11.1.21 (iv), where Y = Range(φ). Therefore every
partially ordered set can be “realised” as an inclusion relation for some set of sets. Conversely, any set (of
sets) is a partially ordered set for the relation “⊆”. (See Theorem 11.1.12.)
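
The conversion described in Remark 11.1.34 is easy to test on a finite example. In the sketch below, the divisibility order on a small set of integers is an assumption chosen for illustration, and `phi` computes the set of all elements which are less than or equal to its argument.

```python
from itertools import product

X = [1, 2, 3, 4, 6, 12]

def le(a, b):
    """Divisibility order, chosen only for illustration."""
    return b % a == 0

def phi(x):
    """The set of all z in X with z <= x."""
    return frozenset(z for z in X if le(z, x))

# x <= y holds exactly when phi(x) is included in phi(y).
order_matches_inclusion = all(le(x, y) == (phi(x) <= phi(y))
                              for x, y in product(X, repeat=2))

# Distinct elements have distinct images.
injective = len({phi(x) for x in X}) == len(X)

print(sorted(phi(6)))
print(order_matches_inclusion, injective)
```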


11.2. Bounds for sets 


11.2.1 REMARK: Extremums and bounds on sets with a partial order. 

For a general partial order, the many kinds of bounds in Definitions 11.2.2 and 11.2.4 may or may not exist. 
Intuition is often deceptive when dealing with bounds on partially ordered sets. Therefore it is generally 
safest to add the caveat “if it exists” to any kind of bound for a general partially ordered set. Even in the 
case of a general total order, such caveats are required. 


11.2.2 DEFINITION: Let (X, ≤) be a partially ordered set. Let A be a subset of X.
A minimal element of A in X is an element b ∈ A such that ∀a ∈ A, ¬(a < b).


A maximal element of A in X is an element b ∈ A such that ∀a ∈ A, ¬(b < a).


11.2.3 REMARK: Arrow diagrams for partial orders. 

The minimal and maximal elements in Definition 11.2.2 are defined by the absence of lesser or greater 
elements respectively. This is illustrated by the example on the left of Figure 11.2.1, where an arrow from x 
to y means that x < y. 


[Diagram: the same partially ordered set drawn twice as an arrow diagram, on the left with “transitivity
arrows” and on the right without “transitivity arrows”. In both drawings, the maximal elements are a, b
and the minimal elements are f, g, h, i.]


Figure 11.2.1 Example of minimal and maximal elements of a partially ordered set 


In the graph on the left of Figure 11.2.1, all pairs (x, y) with x < y are shown. Strictly speaking, an order
relation “≤” should also show “self-arrows”, which are arrows pointing from each node to itself. For clarity,
these are usually omitted. The arrow diagram for a partial order omitting “self-arrows” is always a directed
acyclic graph. (This follows from the transitivity and antisymmetry properties.) Omitting the self-arrows
is, of course, equivalent to graphing the strict relation “<” instead of “≤”.


Arrow diagrams for partial orders are usually further simplified by omitting arrows for pairs (x, z) for
which there is an element y with x < y and y < z. Thus an arrow is shown from x to y if and only if
(x < y) ∧ (∀z, (x ≮ z ∨ z ≮ y)). In other words, the transitivity is usually implicit. The full order relation
can always be reconstructed by adding transitivity and self-arrows. The abbreviated graph may be thought 
of as having all of the “short-cuts” removed. That is, only the longest (directed) paths are shown because 
the “short-cuts” are implicit. 
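
For a finite partially ordered set, the abbreviated diagram and the reconstruction of the strict order from it can both be computed. The following Python sketch is an illustration under assumptions: the divisibility order on six integers is chosen for the example, and the function names are hypothetical.

```python
def strict_pairs(X, le):
    """All pairs (x, y) with x < y."""
    return {(x, y) for x in X for y in X if x != y and le(x, y)}

def abbreviated_arrows(X, le):
    """Keep x -> y only if x < y and no z satisfies x < z and z < y."""
    lt = strict_pairs(X, le)
    return {(x, y) for (x, y) in lt
            if not any((x, z) in lt and (z, y) in lt for z in X)}

def transitive_closure(edges):
    """Add back the implicit transitivity arrows."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

X = [1, 2, 3, 4, 6, 12]

def divides(a, b):
    return b % a == 0

arrows = abbreviated_arrows(X, divides)
print(sorted(arrows))
print(transitive_closure(arrows) == strict_pairs(X, divides))
```

Adding the self-arrows to the closure then recovers the full relation “≤” for this finite example.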


It is important to note that minimal and maximal elements of a set are non-unique in general. Thus neither 
existence nor uniqueness is guaranteed for minimal and maximal elements. This contrasts with the infimum,
supremum, minimum and maximum of a set in Definition 11.2.4, which are all unique if they exist. 


11.2.4 DEFINITION: Let (X, ≤) be a partially ordered set. Let A be a subset of X.
A lower bound for A in X is an element b ∈ X such that ∀a ∈ A, b ≤ a.
An upper bound for A in X is an element b ∈ X such that ∀a ∈ A, a ≤ b.

An infimum or greatest lower bound of A in X is a lower bound b for A in X such that x ≤ b for all lower
bounds x for A in X. In other words, (∀a ∈ A, b ≤ a) and ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b).

A supremum or least upper bound of A in X is an upper bound b for A in X such that b ≤ x for all upper
bounds x for A in X. In other words, (∀a ∈ A, a ≤ b) and ∀x ∈ X, ((∀a ∈ A, a ≤ x) ⇒ b ≤ x).

A minimum of A in X is an element b ∈ A such that ∀a ∈ A, b ≤ a.
A maximum of A in X is an element b ∈ A such that ∀a ∈ A, a ≤ b.
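
Definitions 11.2.2 and 11.2.4 translate almost word for word into code when X is finite. The sketch below is a minimal illustration, assuming the divisibility order on a small set of integers. The function names are hypothetical, and each function returns a set so that non-existence shows up as the empty set.

```python
def lower_bounds(A, X, le):
    return {b for b in X if all(le(b, a) for a in A)}

def upper_bounds(A, X, le):
    return {b for b in X if all(le(a, b) for a in A)}

def infimums(A, X, le):
    """Lower bounds b with x <= b for every lower bound x."""
    lbs = lower_bounds(A, X, le)
    return {b for b in lbs if all(le(x, b) for x in lbs)}

def supremums(A, X, le):
    """Upper bounds b with b <= x for every upper bound x."""
    ubs = upper_bounds(A, X, le)
    return {b for b in ubs if all(le(b, x) for x in ubs)}

def minimums(A, le):
    return {b for b in A if all(le(b, a) for a in A)}

def minimal_elements(A, le):
    """Elements b of A with no a in A strictly below b."""
    return {b for b in A if not any(le(a, b) and a != b for a in A)}

X = {1, 2, 3, 4, 6, 12}

def divides(a, b):
    return b % a == 0

A = {4, 6}
print(lower_bounds(A, X, divides), infimums(A, X, divides))
print(upper_bounds(A, X, divides), supremums(A, X, divides))
print(minimums(A, divides), minimal_elements(A, divides))
```

In this example the set A = {4, 6} has two minimal elements and no minimum, and its infimum 2 is not an element of A.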


11.2.5 DEFINITION: Bounded and unbounded sets.

A bounded-above subset of a partially ordered set (X, ≤) is a set A ∈ P(X) such that X contains an upper
bound for A. In other words, ∃b ∈ X, ∀a ∈ A, a ≤ b.

A bounded-below subset of a partially ordered set (X, ≤) is a set A ∈ P(X) such that X contains a lower
bound for A. In other words, ∃b ∈ X, ∀a ∈ A, b ≤ a.

A bounded subset of a partially ordered set (X, ≤) is a set A ∈ P(X) such that X contains both a lower
bound for A and an upper bound for A. In other words, A is both bounded above and bounded below.

An unbounded subset of a partially ordered set (X, ≤) is a set A ∈ P(X) which is not a bounded subset of X.
In other words, A is not bounded above or not bounded below.


11.2.6 REMARK: Relations between extremum classes and bounds of partially ordered sets. 

The relations between the classes of bounds in Definitions 11.2.2 and 11.2.4 are summarised in Figure 11.2.2
and asserted in Theorem 11.2.7. A minimum is a particular kind of infimum, and an infimum is a particular
kind of lower bound. A minimum is also a particular kind of minimal element. (For brevity, the logical
conditions illustrated in Figure 11.2.2 for the infimum and supremum are the shorter forms which are
demonstrated in Theorem 11.2.35.)


minimal element: b ∈ A, ∀a ∈ A, ¬(a < b)                 maximal element: b ∈ A, ∀a ∈ A, ¬(a > b)
lower bound: b ∈ X, ∀a ∈ A, b ≤ a                        upper bound: b ∈ X, ∀a ∈ A, b ≥ a
infimum: b ∈ X, ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a)          supremum: b ∈ X, ∀x ∈ X, (x ≥ b ⇔ ∀a ∈ A, x ≥ a)
minimum: b ∈ A, ∀a ∈ A, b ≤ a                            maximum: b ∈ A, ∀a ∈ A, b ≥ a


Figure 11.2.2 Relations between classes of bounds for a subset A of an ordered set X 


11.2.7 THEOREM: Relations between various extremal concepts for a partial order. 
Let (X, ≤) be a partially ordered set. Let A ⊆ X.

(i) For any x ∈ X, if x is a minimum of A, then x is a minimal element of A.
(ii) For any x ∈ X, if x is a maximum of A, then x is a maximal element of A.
(iii) For any x ∈ X, if x is a minimum of A, then x is an infimum of A.
(iv) For any x ∈ X, if x is a maximum of A, then x is a supremum of A.
(v) For any x ∈ X, x is a minimum of A if and only if x ∈ A and x is an infimum of A.
(vi) For any x ∈ X, x is a maximum of A if and only if x ∈ A and x is a supremum of A.


PROOF: Let (X, ≤) be a partially ordered set. Let A ⊆ X. For part (i), let x ∈ X be a minimum of A.
Then x ∈ A and ∀a ∈ A, x ≤ a. So ∀a ∈ A, ¬(x > a) by Theorem 11.1.16 (i). Hence x is a minimal element
of A by Definition 11.2.2. The proof of part (ii) is essentially identical to the proof of part (i).

For part (iii), let x1 be a minimum of A in X. Then x1 is a lower bound for A in X by Definition 11.2.4.
Suppose that x2 is also a lower bound for A in X. Then ∀a ∈ A, x2 ≤ a. So x2 ≤ x1 because x1 ∈ A. Hence x1 is an
infimum of A in X. Part (iv) follows as for part (iii).


For part (v), let x ∈ A be an infimum of A in X. Then x is a lower bound for A by Definition 11.2.4. So x
is a minimum of A in X because x ∈ A. The converse follows from part (iii). Part (vi) follows as for part (v).


11.2.8 REMARK: Uniqueness of upper and lower bounds on sets.

There is at most one minimum and one maximum for any subset of a partially ordered set. Therefore there
is at most one infimum and one supremum, because the infimum of a subset A is a maximum of the set
of all lower bounds of A, and the supremum of A is a minimum of the set of all upper bounds of A. These
uniqueness properties are formalised in Theorem 11.2.9.


11.2.9 THEOREM: Uniqueness of infimum, supremum, minimum and maximum. 
Let (X, ≤) be a partially ordered set. Let A ⊆ X.


(i) A has at most one infimum and at most one supremum. 


(ii) A has at most one minimum and at most one maximum. 


PROOF: Let (X, ≤) be a partially ordered set. Let A ⊆ X. To show part (i), suppose that x1, x2 ∈ X are
both an infimum of A in X. Then x1, x2 are lower bounds for A in X. But an infimum of a set is greater than
or equal to all lower bounds for that set. So x1 ≤ x2 and x2 ≤ x1. Hence x1 = x2 by Definition 11.1.2 (ii).
The uniqueness of the supremum of A follows similarly.

For part (ii), suppose that x1, x2 ∈ X are both minimums of A in X. Then by Theorem 11.2.7 (iii), x1 and
x2 are both infimums of A in X. Hence x1 = x2 by part (i). The uniqueness of the maximum of A follows
similarly.


11.2.10 REMARK: Conditions for existence of the infimum and supremum of a set. 

For any subset A of a partially ordered set (X, ≤), the infimum of A is defined as the maximum of the set
of lower bounds of A. (This is fairly obvious from the alternative name “greatest lower bound”, if for no 
other reason.) Therefore the infimum of A exists if and only if the maximum of the set of lower bounds of A 
exists, and they are equal if they do exist. Similarly, the supremum of A exists if and only if the minimum 
of the set of upper bounds of A exists, and they are equal if they do exist. These observations are formalised 
in Theorem 11.2.11. 


11.2.11 THEOREM: Relations between extremums of sets and their left/right segments. 
Let (X, ≤) be a partially ordered set. Let A ∈ P(X).
(i) Let A⁻ = {x ∈ X; ∀a ∈ A, x ≤ a}.
Then for all b ∈ X, b is an infimum of A if and only if b is a maximum of A⁻.
(ii) Let A⁺ = {x ∈ X; ∀a ∈ A, x ≥ a}.
Then for all b ∈ X, b is a supremum of A if and only if b is a minimum of A⁺.


PROOF: Let (X, ≤) be a partially ordered set. Let A ∈ P(X). Let A⁻ = {x ∈ X; ∀a ∈ A, x ≤ a}. Then
b ∈ X is an infimum of A if and only if ∀a ∈ A, b ≤ a and ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b) by
Definition 11.2.4. This holds if and only if b ∈ A⁻ and x ≤ b for all x ∈ A⁻, which holds if and only if b is a
maximum of A⁻. This shows part (i), and part (ii) follows similarly.


11.2.12 REMARK: Left and right segments of subsets of partially ordered sets. 

The set A⁻ in Theorem 11.2.11 (i) is a kind of “left segment” of the set A. (This is related to the “weak
initial segment” concept mentioned in Remark 11.7.10.) A⁻ may be thought of as the largest common left
segment of all elements of A.


11.2.13 EXAMPLE: Left segments for partial order by inclusion on a power set. 
For the partial order by inclusion of a power set X = P(Y), the “left segment” of A ∈ P(X) \ {∅} would be

    A⁻ = {x ∈ P(Y); ∀a ∈ A, x ⊆ a}
       = ⋂_{a∈A} {x ∈ P(Y); x ⊆ a}
       = ⋂_{a∈A} P(a)                      (11.2.1)
       = P(⋂A),                            (11.2.2)

where line (11.2.1) follows from Theorem 8.5.2 (xix), and line (11.2.2) follows from Theorem 8.5.2 (xvii). But
⋂A ∈ P(Y) by Theorem 8.5.2 (xii). So P(⋂A) ∈ P(P(Y)) by Theorem 8.5.2 (iv). Then A⁻ ∈ P(X) \ {∅}
by Theorem 8.5.2 (i). Since B ⊆ ⋂A for all B ∈ P(⋂A), and ⋂A ∈ P(⋂A) by Theorem 8.5.2 (ii), it
follows that ⋂A = max(A⁻). This agrees with the fact that ⋂A = inf(A). (See Theorem 11.6.16.)
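
The chain of equalities in Example 11.2.13 can be confirmed for a small power set. The sketch below is illustrative only: the set Y = {1, 2, 3}, the collection A and the helper `powerset` are assumptions made for the example.

```python
from itertools import chain, combinations

def powerset(s):
    """Set of all subsets of s, each represented as a frozenset."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

Y = {1, 2, 3}
X = powerset(Y)        # ordered by inclusion

# A non-empty collection of subsets of Y.
A = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 2, 3})}

# Left segment: all x in X which are included in every element of A.
left_segment = {x for x in X if all(x <= a for a in A)}

# Intersection of all elements of A.
meet = frozenset.intersection(*A)

print(left_segment == powerset(meet))
print(meet in left_segment and all(x <= meet for x in left_segment))
```

The second check says that the intersection is the maximum of the left segment, which is the infimum of A by Theorem 11.2.11 (i).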


11.2.14 REMARK: Existence of extremums and bounds on the empty set.

The empty set is not excluded from Definitions 11.2.2 and 11.2.4. Theorem 11.2.15 gives some properties of 
the various bounds for the empty set. By Theorem 11.2.9, the infimum, supremum, minimum and maximum 
of any subset of a partially ordered set are unique if they exist. 


11.2.15 THEOREM: Bounds and extremums for the empty set. 
Let (X, ≤) be a partially ordered set.
(i) ∅ has no minimal or maximal element.
(ii) All elements of X are both lower and upper bounds for ∅.
(iii) ∅ has no minimum or maximum.
(iv) The infimum of ∅ equals the maximum of X, if the maximum of X exists.
Otherwise, the infimum of ∅ does not exist.
(v) The supremum of ∅ equals the minimum of X, if the minimum of X exists.
Otherwise, the supremum of ∅ does not exist.


PROOF: Let (X, ≤) be a partially ordered set. Let A = ∅.

Part (i) follows from the requirement that a minimal or maximal element of A must be an element of A,
which is impossible.

For part (ii), let b ∈ X. Then ∀a, (a ∈ A ⇒ b ≤ a) because ∀a, a ∉ ∅ by definition of the empty set.
Thus ∀a ∈ A, b ≤ a. Hence b is a lower bound for A. Similarly, b is an upper bound for A.

Part (iii) follows from the requirement that a minimum or maximum element of A must be an element of A, 
which is impossible. 

For part (iv), note that b is an infimum for A if and only if b is the maximum of the set of lower bounds 
of A. But by part (ii), this set of lower bounds equals X. So an infimum for A exists if and only if X has a 
maximum, and if they do both exist, they must be the same element of X. 


The proof for part (v) is essentially identical to the proof for part (iv). 
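
Theorem 11.2.15 can be observed on two small examples, one where X has a maximum and a minimum and one where it has neither. The sketch below assumes the divisibility order and uses hypothetical function names. The infimum is computed as the maximum of the set of lower bounds, as in the proof of part (iv).

```python
def lower_bounds(A, X, le):
    return {b for b in X if all(le(b, a) for a in A)}

def upper_bounds(A, X, le):
    return {b for b in X if all(le(a, b) for a in A)}

def maximums(S, le):
    return {b for b in S if all(le(a, b) for a in S)}

def minimums(S, le):
    return {b for b in S if all(le(b, a) for a in S)}

def infimums(A, X, le):
    # The infimum is the maximum of the set of lower bounds.
    return maximums(lower_bounds(A, X, le), le)

def supremums(A, X, le):
    # The supremum is the minimum of the set of upper bounds.
    return minimums(upper_bounds(A, X, le), le)

def divides(a, b):
    return b % a == 0

X1 = {1, 2, 3, 6}      # maximum 6 and minimum 1 under divisibility
X2 = {2, 3}            # neither maximum nor minimum under divisibility
empty = set()

print(lower_bounds(empty, X1, divides) == X1)
print(infimums(empty, X1, divides), supremums(empty, X1, divides))
print(infimums(empty, X2, divides), supremums(empty, X2, divides))
```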


11.2.16 NOTATION: Let (X, ≤) be a partially ordered set. Let A be a subset of X.
inf(A) denotes the infimum of A in X, if it exists.

sup(A) denotes the supremum of A in X, if it exists. 

min(A) denotes the minimum of A in X, if it exists. 


max(A) denotes the maximum of A in X, if it exists. 


11.2.17 THEOREM: Some equalities and inequalities for various kinds of extremums. 
Let (X, ≤) be a partially ordered set. Let A be a subset of X.
(i) If A ≠ ∅ and the infimum and supremum of A both exist, then inf(A) ≤ sup(A).
(ii) If the minimum of A exists, then min(A) = inf(A).
(iii) If the maximum of A exists, then sup(A) = max(A).
(iv) If the minimum and maximum of A both exist, then min(A) = inf(A) ≤ sup(A) = max(A).

PROOF: For part (i), ∀a ∈ A, (inf(A) ≤ a and a ≤ sup(A)) because by Definition 11.2.4, inf(A) is a lower
bound for A and sup(A) is an upper bound for A. But A ≠ ∅. So inf(A) ≤ sup(A) by transitivity,
Definition 11.1.2 (iii).
Part (ii) follows from Theorems 11.2.7 (iii) and 11.2.9 (i, ii).
Part (iii) follows from Theorems 11.2.7 (iv) and 11.2.9 (i, ii).


Part (iv) follows from parts (i), (ii) and (iii). (The existence of the minimum and maximum implies that the 
set is non-empty by Definition 11.2.4.) 


11.2.18 REMARK: Existence of the infimum, supremum, minimum and maximum of a set. 

There is an implicit caveat when using any of the notations in Notation 11.2.16. Thus for example, a
statement such as “b = inf(A)” means (1) “the infimum of A exists and equals b”. Alternatively, it could
mean (2) “b is equal to the infimum of A if it exists”, which is a different meaning. Interpretation (2) does
not assert the existence of the infimum, and so it does not give a value to b if it does not exist. Therefore
interpretation (1) is to be preferred.


The need to repetitively provide the caveat “if it exists” is generally avoided in the case of topological limits 
by restricting attention to continuous functions, for example, so that the limits are guaranteed to exist. In 
some naive introductions to analysis, one sometimes sees language such as: “inf(A) = undefined”, using 
the pseudo-value “undefined”. In fact, one needs three pseudo-values. One needs a “plus infinity” which is 
added to the ordered set X as an upper bound for all elements of X, a “negative infinity” which is a lower 
bound for all of X, and an “undefined” value which means that there is neither an infimum nor extremum 
amongst the elements of X, nor amongst the two pseudo-infinities. 


The issue of infinities and undefined values in the extended-real numbers is not solved by introducing even 
three of the pseudo-values for limits. And then one has the further issue of how to combine these pseudo- 
values. The arithmetic of pseudo-values is defective. For example, adding, subtracting, multiplying or divid- 
ing these three pseudo-values to each other gives rise to yet further pseudo-values. In the case of the bounds 
in Notation 11.2.16, one often wishes to evaluate combinations of them. For example, Notation 11.3.11 uses 
the “returned values” of one kind of bound in the evaluation of another bound. 


The same kind of issue arises when dealing with functions in general. One often sees the caveat “if it is
defined” in the case of partially defined functions such as the tangent in Definition 44.2.14 and the majority
of complex-analytic functions. One wishes to write f(x) for the value of f at x even when this is undefined
for some x. The expression “f(x)” is well defined only if it has one and only one value. It is possible to
“wave one's hands” when dealing with functions which occasionally have more or less than one value, but
then one is skating on thin ice. In some contexts, the functions may be defined “almost everywhere”, and
yet one still wishes to use the notation “f(x)”, even when f may be undefined on a dense subset of its
domain. Such “hand-waving” is acceptable only if one knows how to provide a rigorous treatment. In the
case of divergent series, the divergence of a series to plus or minus infinity can be remedied by adding the
pseudo-values −∞ and +∞ to the real numbers, or by adding a single “point at infinity” (or a “point at
infinity” for each direction) to the complex numbers. But this does not fix the problem of bounded series
which do not converge, for example.


In the case of the bounds in Notation 11.2.16, there are at least two remedies to the “undefined value”
problem (in addition to the remedy where one employs pseudo-values). One approach would be to introduce
the non-standard notation inf set(A), for example, to mean {x ∈ X; x is an infimum for A}. Such a notation
has the disadvantage of mixing English with the more traditional Latin notations, which are arguably more
international. Another alternative would be a notation such as Inf(A), capitalising the initial letter to
indicate that the set of possible infimum values is intended, not necessarily a well-defined single value. Yet
another possible notation would be an over-bar, such as \overline{inf}(A), for example, to indicate the set of infimum
values. Probably such an over-bar is the least bad notation for such sets.


A second approach is to regard the bounds in Notation 11.2.16 as functions from P(X) to X, which is in 
fact what they are! In most texts, these bounds are regarded almost as unary operators acting on sets (or 
on functions), whereas they actually depend on the choice of order on the containing set (or on the function 
range). When these bounds are recognised as functions, it becomes possible to deal with them in the same 
way as general functions, namely by referring to the domain of the function in any caveat. In other words, 
one may define the domain of these bounds to be the set of subsets of X for which the bound exists. Then 
A ∈ Dom(inf) means that inf(A) exists. Since the bound depends on the order, one may add a tag for the
ordered set X, as for example inf^X(A) or inf_X(A), although the subscript is already used for other purposes
in the standard notation.


From these considerations, one arrives at the non-standard Definitions 11.2.19 and 11.2.21, and Notations 
11.2.20 and 11.2.22. 


11.2.19 DEFINITION:
The infimum set-map for a partially ordered set (X, ≤) is the map φ : P(X) → P(X) defined by
φ(A) = {b ∈ X; b is an infimum for A} for all A ∈ P(X).

The supremum set-map for a partially ordered set (X, ≤) is the map φ : P(X) → P(X) defined by
φ(A) = {b ∈ X; b is a supremum for A} for all A ∈ P(X).

The minimum set-map for a partially ordered set (X, ≤) is the map φ : P(X) → P(X) defined by
φ(A) = {b ∈ X; b is a minimum for A} for all A ∈ P(X).

The maximum set-map for a partially ordered set (X, ≤) is the map φ : P(X) → P(X) defined by
φ(A) = {b ∈ X; b is a maximum for A} for all A ∈ P(X).


11.2.20 NOTATION: Let (X, ≤) be a partially ordered set.
\overline{inf}^X : P(X) → P(X) denotes the infimum set-map on X.
\overline{sup}^X : P(X) → P(X) denotes the supremum set-map on X.
\overline{min}^X : P(X) → P(X) denotes the minimum set-map on X.
\overline{max}^X : P(X) → P(X) denotes the maximum set-map on X.

The notations \overline{inf}, \overline{sup}, \overline{min} and \overline{max} are abbreviations for \overline{inf}^X, \overline{sup}^X,
\overline{min}^X and \overline{max}^X when the partially ordered set X −< (X, ≤) is implicit in the context.


11.2.21 DEFINITION:

The infimum map for a partially ordered set (X, ≤) is the map φ : {A ∈ P(X); \overline{inf}^X(A) ≠ ∅} → X defined
by φ(A) = inf(A) for all A ∈ P(X) such that \overline{inf}^X(A) ≠ ∅.

The supremum map for a partially ordered set (X, ≤) is the map φ : {A ∈ P(X); \overline{sup}^X(A) ≠ ∅} → X defined
by φ(A) = sup(A) for all A ∈ P(X) such that \overline{sup}^X(A) ≠ ∅.

The minimum map for a partially ordered set (X, ≤) is the map φ : {A ∈ P(X); \overline{min}^X(A) ≠ ∅} → X defined
by φ(A) = min(A) for all A ∈ P(X) such that \overline{min}^X(A) ≠ ∅.

The maximum map for a partially ordered set (X, ≤) is the map φ : {A ∈ P(X); \overline{max}^X(A) ≠ ∅} → X
defined by φ(A) = max(A) for all A ∈ P(X) such that \overline{max}^X(A) ≠ ∅.


11.2.22 NOTATION: Let (X, ≤) be a partially ordered set.

inf^X : {A ∈ P(X); \overline{inf}^X(A) ≠ ∅} → X denotes the infimum map for X.
sup^X : {A ∈ P(X); \overline{sup}^X(A) ≠ ∅} → X denotes the supremum map for X.
min^X : {A ∈ P(X); \overline{min}^X(A) ≠ ∅} → X denotes the minimum map for X.
max^X : {A ∈ P(X); \overline{max}^X(A) ≠ ∅} → X denotes the maximum map for X.

The notations inf, sup, min and max are abbreviations for inf^X, sup^X, min^X and max^X when the partially
ordered set X −< (X, ≤) is implicit in the context.
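
The difference between the set-maps of Definition 11.2.19 and the maps of Definition 11.2.21 resembles the difference between a total function returning a set and a partial function represented by a dictionary. The sketch below is a minimal illustration, assuming a three-element set with the divisibility order. The names `inf_set`, `inf_map` and `subsets` are hypothetical.

```python
from itertools import chain, combinations

def inf_set(A, X, le):
    """Infimum set-map: the set of all infimums of A in X."""
    lbs = {b for b in X if all(le(b, a) for a in A)}
    return frozenset(b for b in lbs if all(le(x, b) for x in lbs))

def subsets(X):
    items = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def inf_map(X, le):
    """Infimum map as a dict whose keys are the subsets in its domain."""
    result = {}
    for A in subsets(X):
        values = inf_set(A, X, le)
        if values:                  # A is in the domain only if an infimum exists
            (b,) = values           # at most one value, by Theorem 11.2.9
            result[A] = b
    return result

X = {2, 3, 12}

def divides(a, b):
    return b % a == 0

inf_X = inf_map(X, divides)
print(inf_set({2, 3}, X, divides))       # empty: {2, 3} has no infimum in X
print(frozenset({2, 3}) in inf_X)        # so {2, 3} is not in the domain
print(inf_X[frozenset({2, 12})])
```

A test such as `A in inf_X` then plays the role of the statement A ∈ Dom(inf), so that no “if it exists” caveat is needed.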


11.2.23 REMARK: The infimum, supremum, minimum and maximum maps are well-defined ZF functions. 
The maps in Definition 11.2.21 are all well-defined functions in ZF set theory because they are subsets of 
P(X) x X which are determined by set-theoretic predicates. Thus for example, by Definition 11.2.4, 


inf^X = {(A, b) ∈ P(X) × X; (∀a ∈ A, b ≤ a) and ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b)}.


Existence is obtained by restricting the domain to those sets for which the infimum is defined, and uniqueness 
follows from Theorem 11.2.9. These assertions are given as Theorem 11.2.24. 


11.2.24 THEOREM:  Well-definition of extremum maps. 
Let (X, ≤) be a partially ordered set.

(i) inf^X : {A ∈ P(X); \overline{inf}^X(A) ≠ ∅} → X is a well-defined function.
(ii) sup^X : {A ∈ P(X); \overline{sup}^X(A) ≠ ∅} → X is a well-defined function.
(iii) min^X : {A ∈ P(X); \overline{min}^X(A) ≠ ∅} → X is a well-defined function.
(iv) max^X : {A ∈ P(X); \overline{max}^X(A) ≠ ∅} → X is a well-defined function.


PROOF: Part (i) follows from Theorem 11.2.9 (i).
Part (ii) follows from Theorem 11.2.9 (i).
Part (iii) follows from Theorem 11.2.9 (ii).
Part (iv) follows from Theorem 11.2.9 (ii).


11.2.25 REMARK: Alternative expressions for some properties of extremums and bounds.
Theorem 11.2.7 (iii, iv) and Theorem 11.2.9 (i, ii) may be rewritten as follows for any partially ordered set
(X, ≤), using the numerosity notation “#” for finite sets in Notation 13.5.6.
(i) Dom(min^X) ⊆ Dom(inf^X).
(ii) Dom(max^X) ⊆ Dom(sup^X).
(iii) ∀A ∈ P(X), #(\overline{min}^X(A)) ≤ #(\overline{inf}^X(A)) ≤ 1.
(iv) ∀A ∈ P(X), #(\overline{max}^X(A)) ≤ #(\overline{sup}^X(A)) ≤ 1.
Parts (i) and (ii) follow from Theorem 11.2.7 (iii, iv).


Parts (iii) and (iv) follow from Theorem 11.2.7 (iii, iv) and Theorem 11.2.9 (i, ii).


11.2.26 REMARK: Domains of the extremum and bound maps. 

When one removes the formalities of Definitions 11.2.19 and 11.2.21, and Notations 11.2.20 and 11.2.22, the
result is that the “inf” map has domain {A ∈ P(X); A has an infimum} and values in X. Similarly for the
sup, min and max maps. In terms of logical expressions, one has the following.


    Dom(inf) = {A ∈ P(X); ∃b ∈ X, ((∀a ∈ A, b ≤ a) ∧ (∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b)))}
    Dom(sup) = {A ∈ P(X); ∃b ∈ X, ((∀a ∈ A, b ≥ a) ∧ (∀x ∈ X, ((∀a ∈ A, x ≥ a) ⇒ x ≥ b)))}
    Dom(min) = {A ∈ P(X); ∃b ∈ A, ∀a ∈ A, b ≤ a}
    Dom(max) = {A ∈ P(X); ∃b ∈ A, ∀a ∈ A, b ≥ a}.

The value of each map is the unique value of b € X which satisfies the corresponding condition in the 
expression defining the map's domain. 


11.2.27 REMARK: Expressions for bound-sets in terms of extremum and bound operators.
Theorem 11.2.28 is a rewrite of Theorem 11.2.11 using Notation 11.2.16. 


11.2.28 THEOREM: Relations between extremums of sets and their left/right segments. 
Let (X, ≤) be a partially ordered set. Let A ∈ P(X).


(i) Let A⁻ = {x ∈ X; ∀a ∈ A, x ≤ a}. Then inf(A) exists if and only if max(A⁻) exists. If they exist,
then inf(A) = max(A⁻).

(ii) Let A⁺ = {x ∈ X; ∀a ∈ A, x ≥ a}. Then sup(A) exists if and only if min(A⁺) exists. If they exist,
then sup(A) = min(A⁺).


PROOF: Parts (i) and (ii) follow immediately from Theorem 11.2.11 (i, ii) and Notation 11.2.16.


11.2.29 REMARK: Relations between bound-sets and extremums. 

The sets and relations in Theorem 11.2.28 are illustrated in Figure 11.2.3. Note that c = min(A⁺) = sup(A)
must be an element of A⁺ if it exists, but it is not necessarily an element of A. (This is illustrated by the
dotted outline of A.) Similarly, b = max(A⁻) = inf(A) must be an element of A⁻ if it exists, but it is not
necessarily an element of A.


[Diagram: the set A, drawn with a dotted outline, lies between its set of lower bounds A⁻ and its set of
upper bounds A⁺, with b = max(A⁻) = inf(A) and c = min(A⁺) = sup(A).]
Figure 11.2.3 Infimum and supremum of a set A


From Theorems 11.2.28 (i) and 11.2.7 (v), it follows that A ∈ P(X) has a minimum in an ordered set (X, ≤)
if and only if max(A⁻) exists and max(A⁻) ∈ A. Then inf(A) = min(A) = max(A⁻). Similarly A ∈ P(X)
has a maximum if and only if min(A⁺) exists and min(A⁺) ∈ A. Then sup(A) = max(A) = min(A⁺).


11.2.30 REMARK: Alternative expressions for relations between bound-sets and extremums. 

Applying the comments in Remark 11.2.26 to Theorem 11.2.28, one obtains the equations inf(A) = max(A⁻)
for all A ∈ Dom(inf), and sup(A) = min(A⁺) for all A ∈ Dom(sup). By referring to the domains of the
bounds, one avoids using the word “exists” and the term “well defined”. Theorem 11.2.28 also says that
A⁻ ∈ Dom(max) if and only if A ∈ Dom(inf), and A⁺ ∈ Dom(min) if and only if A ∈ Dom(sup).


The set-maps in Definition 11.2.19 and Notation 11.2.20 may be extended to define the lower-bound
set-map lb : P(X) → P(X) and the upper-bound set-map ub : P(X) → P(X) by


∀A ∈ P(X), lb(A) = {b ∈ X; ∀a ∈ A, b ≤ a}
∀A ∈ P(X), ub(A) = {b ∈ X; ∀a ∈ A, b ≥ a}.


Then A⁻ = lb(A) and A⁺ = ub(A) for all A ∈ P(X). Hence Theorem 11.2.28 may be expressed in symbolic
logic as follows for any partially ordered set (X, ≤).
(i) ∀A ∈ P(X), A ∈ Dom(inf) ⇔ lb(A) ∈ Dom(max).
(ii) ∀A ∈ Dom(inf), inf(A) = max(lb(A)).
(iii) ∀A ∈ P(X), A ∈ Dom(sup) ⇔ ub(A) ∈ Dom(min).
(iv) ∀A ∈ Dom(sup), sup(A) = min(ub(A)).
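The set-maps lb and ub, and the identities inf(A) = max(lb(A)) and sup(A) = min(ub(A)), can be tried out on a finite partially ordered set. The following is only a minimal sketch: the function names, the use of None to signal "not in the domain", and the example poset ({1, ..., 12} ordered by divisibility) are illustrative assumptions, not notation from this book.

```python
def lb(X, leq, A):
    """Lower-bound set lb(A) = {b in X; for all a in A, b <= a}."""
    return {b for b in X if all(leq(b, a) for a in A)}

def ub(X, leq, A):
    """Upper-bound set ub(A) = {b in X; for all a in A, b >= a}."""
    return {b for b in X if all(leq(a, b) for a in A)}

def minimum(leq, A):
    """Minimum of A, or None when A is not in Dom(min)."""
    found = [b for b in A if all(leq(b, a) for a in A)]
    return found[0] if found else None

def maximum(leq, A):
    """Maximum of A, or None when A is not in Dom(max)."""
    found = [b for b in A if all(leq(a, b) for a in A)]
    return found[0] if found else None

def inf(X, leq, A):
    """inf(A) computed as max(lb(A))."""
    return maximum(leq, lb(X, leq, A))

def sup(X, leq, A):
    """sup(A) computed as min(ub(A))."""
    return minimum(leq, ub(X, leq, A))

# Assumed example: X = {1, ..., 12} partially ordered by divisibility.
X = set(range(1, 13))

def divides(x, y):
    return y % x == 0

print(inf(X, divides, {4, 6}))  # 2
print(sup(X, divides, {4, 6}))  # 12
print(sup(X, divides, {5, 7}))  # None: {5, 7} has no upper bound in X
```

In this sketch the set {5, 7} is in Dom(inf) but not in Dom(sup), because ub({5, 7}) is empty within X.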


11.2.31 REMARK: Alternative expressions for some properties of extremums and bounds. 
Most of Theorem 11.2.15 may be rewritten as in Theorem 11.2.32. 


11.2.32 THEOREM: Some properties of extremums and bounds for the empty set.
Let (X, ≤) be a partially ordered set.


(i) lb(∅) = ub(∅) = X.

(ii) ∅ ∉ Dom(min) and ∅ ∉ Dom(max).

(iii) ∅ ∈ Dom(inf) ⇔ X ∈ Dom(max). If ∅ ∈ Dom(inf), then inf(∅) = max(X).
(iv) ∅ ∈ Dom(sup) ⇔ X ∈ Dom(min). If ∅ ∈ Dom(sup), then sup(∅) = min(X).


PROOF: Parts (i), (ii), (iii) and (iv) follow immediately from Theorem 11.2.15 (ii, iii, iv, v).
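The empty-set cases are easy to get wrong, so a small finite check may help. This is a sketch under assumed names and an assumed example (divisibility on small sets of integers); None stands for "the extremum does not exist".

```python
def lb(X, leq, A):
    """Lower-bound set of A in X."""
    return {b for b in X if all(leq(b, a) for a in A)}

def ub(X, leq, A):
    """Upper-bound set of A in X."""
    return {b for b in X if all(leq(a, b) for a in A)}

def minimum(leq, A):
    found = [b for b in A if all(leq(b, a) for a in A)]
    return found[0] if found else None

def maximum(leq, A):
    found = [b for b in A if all(leq(a, b) for a in A)]
    return found[0] if found else None

def divides(x, y):
    return y % x == 0

empty = set()

# Assumed example 1: X has both a minimum (1) and a maximum (6).
X = {1, 2, 3, 6}
lb_empty = lb(X, divides, empty)            # every element is a lower bound
ub_empty = ub(X, divides, empty)            # every element is an upper bound
inf_empty = maximum(divides, lb_empty)      # inf(empty) = max(X)
sup_empty = minimum(divides, ub_empty)      # sup(empty) = min(X)

# Assumed example 2: Y has no maximum, so inf(empty) does not exist in Y.
Y = {2, 3}
inf_empty_Y = maximum(divides, lb(Y, divides, empty))

print(lb_empty == X, ub_empty == X)   # True True
print(inf_empty, sup_empty)           # 6 1
print(minimum(divides, empty))        # None: the empty set has no minimum
print(inf_empty_Y)                    # None
```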


11.2.33 REMARK: An analogy between existence of extremums and topological connectedness.

There is some similarity between the extremums of a set and the tangents of circles (and other convex curves)
in Euclidean geometry. A tangent to a curve is supposed to touch it at one and only one point, and there
must be no “air gap” between the tangent and the curve. Similarly, the infimum of a set is unique if it
exists, and no other lower bound should lie between the infimum and the set. That is, there must be no “air
gap”. Figure 11.2.3 is intended to suggest that the element c is tangential to the sets A⁺ and A, and that
the element b is tangential to the sets A and A⁻.


11.2.34 REMARK: Some properties of the infimum and supremum. 

There is probably no standard binary relation symbol which means that x and y are neither less than, greater
than or equal to each other. But if one were to use symbols such as x ⋚ y or x ⋛ y to mean that either
x ≥ y or x ≤ y, then one could use x ⋚̸ y or x ⋛̸ y to mean that x ≰ y and x ≱ y. These are clumsy symbols
(and far too tall), but luckily they will never be needed! Except in these two tables.


condition   consequence            condition   consequence

x > b       ¬(∀a ∈ A, x ≤ a)       x > b       ∀a ∈ A, x ≥ a

x = b       ∀a ∈ A, x ≤ a          x = b       ∀a ∈ A, x ≥ a

x < b       ∀a ∈ A, x ≤ a          x < b       ¬(∀a ∈ A, x ≥ a)

x ⋚̸ b       ¬(∀a ∈ A, x ≤ a)       x ⋚̸ b       ¬(∀a ∈ A, x ≥ a)
       b = inf(A)                         b = sup(A)

The left table shows the consequences of the four possible relations between an element x of a partially
ordered set X, and an infimum b for a subset A of X. If x ≤ b, then ∀a ∈ A, x ≤ a. Otherwise, the logical
negative of this proposition is true. This gives the following expression for b = inf(A).


∀x ∈ X, x ≤ b ⇔ ∀a ∈ A, x ≤ a.


However, it is important to note that this does not imply ∀x ∈ X, (x > b ⇔ ¬(∀a ∈ A, x ≤ a)). It is equally
important to note that ¬(∀a ∈ A, x ≤ a) is not equivalent to ∃a ∈ A, x > a.


Similarly, the right table shows the consequences of the four possible relations between an element x of a
partially ordered set X, and a supremum b for a subset A of X. If x ≥ b, then ∀a ∈ A, x ≥ a. Otherwise,
the logical negative of this proposition is true. This gives the following expression for b = sup(A).


∀x ∈ X, x ≥ b ⇔ ∀a ∈ A, x ≥ a.


These comments are formalised in Theorem 11.2.35.


11.2.35 THEOREM: Some logical predicates for the infimum and supremum maps.
Let (X, ≤) be a partially ordered set. Let A ⊆ X.
(i) b = inf(A) if and only if ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a).
(ii) b = sup(A) if and only if ∀x ∈ X, (x ≥ b ⇔ ∀a ∈ A, x ≥ a).
(iii) inf(A) exists if and only if ∃b ∈ X, ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a).
(iv) sup(A) exists if and only if ∃b ∈ X, ∀x ∈ X, (x ≥ b ⇔ ∀a ∈ A, x ≥ a).


PROOF: Let X be a partially ordered set. Let A ⊆ X. Then by Definition 11.2.4, b = inf(A) if and
only if (1) ∀a ∈ A, b ≤ a and (2) ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b). Proposition (2) is equivalent to
∀x ∈ X, (x ≰ b ⇒ ¬(∀a ∈ A, x ≤ a)). In other words, if x > b, or x and b are not related by the partial
order, then the proposition ∀a ∈ A, x ≤ a is false.


If x = b, then proposition (1) implies ∀a ∈ A, x ≤ a. If x < b, then ∀a ∈ A, x ≤ a by transitivity, using
proposition (1). So this covers all of the four possible relations (or non-relations) between x and b. The case
x > b and the “unrelated” case imply ¬(∀a ∈ A, x ≤ a). The other two cases imply ∀a ∈ A, x ≤ a. Hence
b = inf(A) implies ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a), as claimed.

To show the converse, let b ∈ X satisfy (3) ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a). Substituting b for x
gives ∀a ∈ A, b ≤ a because b ≤ b by reflexivity. So b is a lower bound for A. Let x be any lower bound
for A. Then proposition (3) implies x ≤ b. Therefore b is an infimum for A in X by Definition 11.2.4.
This verifies part (i).


The proof of part (ii) follows similarly.


Parts (iii) and (iv) are immediate corollaries of parts (i) and (ii) respectively.
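The equivalence in Theorem 11.2.35 (i) can be checked by brute force on a small finite poset. The sketch below compares the two-condition definition of an infimum with the single biconditional predicate, for every subset A and every candidate b. The poset (a set of integers ordered by divisibility, chosen so that some pairs are unrelated and some infimums do not exist) and all names are assumptions made for illustration only.

```python
from itertools import combinations

# Assumed example poset: integers ordered by divisibility.
X = [2, 3, 4, 6, 8, 12]

def leq(x, y):
    return y % x == 0

def subsets(elements):
    """Yield every subset of the given list, including the empty set."""
    for r in range(len(elements) + 1):
        for combo in combinations(elements, r):
            yield set(combo)

def is_inf_by_definition(b, A):
    """b is a lower bound for A, and every lower bound x satisfies x <= b."""
    is_lower_bound = all(leq(b, a) for a in A)
    is_greatest = all(leq(x, b) for x in X if all(leq(x, a) for a in A))
    return is_lower_bound and is_greatest

def is_inf_by_predicate(b, A):
    """For all x in X: x <= b if and only if x is a lower bound for A."""
    return all(leq(x, b) == all(leq(x, a) for a in A) for x in X)

agree = all(
    is_inf_by_definition(b, A) == is_inf_by_predicate(b, A)
    for A in subsets(X)
    for b in X
)
print(agree)  # True
```

A finite check of this kind is not a proof, but it exercises the “unrelated” case, which is the case most easily overlooked.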


11.2.36 REMARK: Alternative proof of Theorem 11.2.35. 

The proof of Theorem 11.2.35 follows perhaps more straightforwardly from the approach in Theorem 11.2.28.
Let A⁻ = {x ∈ X; ∀a ∈ A, x ≤ a}. First note that by Theorem 11.2.28 (i), b = inf(A) if and only
if b = max(A⁻). But b = max(A⁻) if and only if (1) b ∈ A⁻ and (2) ∀x ∈ A⁻, x ≤ b.

Condition (1) holds if and only if b ∈ X and ∀a ∈ A, b ≤ a. But ∀a ∈ A, b ≤ a holds if and only if
∀x ∈ X, (x ≤ b ⇒ (∀a ∈ A, x ≤ a)). In other words, b is a lower bound for A if and only if all elements
x ∈ X with x ≤ b are lower bounds for A. (This follows from the transitivity of the partial order.)
Condition (2) holds if and only if ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b).


Combining conditions (1) and (2), it follows that b = inf(A) if and only if ∀x ∈ X, (x ≤ b ⇔ (∀a ∈ A, x ≤ a)).
This proves Theorem 11.2.35 (i), and part (ii) is similar.


11.2.37 THEOREM: Logical expressions for the domains of the infimum and supremum maps.
Let (X, ≤) be a partially ordered set.


(i) Dom(inf) = {A ∈ P(X); ∃b ∈ X, ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a)}.
(ii) Dom(sup) = {A ∈ P(X); ∃b ∈ X, ∀x ∈ X, (x ≥ b ⇔ ∀a ∈ A, x ≥ a)}.

PROOF: Parts (i) and (ii) follow from Theorem 11.2.35 parts (iii) and (iv) respectively.


11.2.38 DEFINITION: Let (X, ≤) be a partially ordered set. Let A ⊆ X.

A is bounded below (in X) when there exists a lower bound for A in X.

A is bounded above (in X) when there exists an upper bound for A in X.

A is bounded (in X) when it is both bounded below and bounded above in X.


11.2.39 REMARK: Some conditions for boundedness of sets.
By applying the notations in Remark 11.2.30 to Definition 11.2.38, one may say that a set A ∈ P(X) is
bounded below if lb(A) ≠ ∅, and bounded above if ub(A) ≠ ∅.


11.2.40 THEOREM: Some relations between extremums and bounds of sets.
Let (X, ≤) be a partially ordered set. Let A be a subset of X.

(i) If inf(A) exists and b is a lower bound for A, then inf(A) ≥ b.

(ii) If A ≠ ∅ and inf(A) exists, and c is an upper bound for A, then inf(A) ≤ c.
(iii) If A ≠ ∅ and sup(A) exists, and b is a lower bound for A, then sup(A) ≥ b.
(iv) If sup(A) exists and c is an upper bound for A, then sup(A) ≤ c.


PROOF: Let (X, ≤) be a partially ordered set. Let A ∈ P(X). For part (i), suppose that inf(A) exists
and that b ∈ X satisfies ∀a ∈ A, a ≥ b. Then A⁻ = {x ∈ X; ∀a ∈ A, a ≥ x} is a non-empty subset of X
and inf(A) = max(A⁻) by Theorem 11.2.28. Since b ∈ A⁻, it follows that b ≤ max(A⁻) = inf(A).


For part (ii), suppose that A is non-empty, that inf(A) exists and that c ∈ X satisfies ∀a ∈ A, a ≤ c.
Then since A is non-empty, there is an a ∈ A with a ≤ c. So by transitivity, inf(A) ≤ a ≤ c.


Part (iii) follows as for part (ii). Part (iv) follows as for part (i).


11.2.41 REMARK: Some alternative expressions for properties of the infimum and supremum. 
One may express Theorem 11.2.40 in terms of Notation 11.2.22 and Remark 11.2.30 as follows for any
partially ordered set (X, ≤).

(i) ∀A ∈ Dom(inf), ∀b ∈ lb(A), inf(A) ≥ b.

(ii) ∀A ∈ Dom(inf) \ {∅}, ∀c ∈ ub(A), inf(A) ≤ c.
(iii) ∀A ∈ Dom(sup) \ {∅}, ∀b ∈ lb(A), sup(A) ≥ b.
(iv) ∀A ∈ Dom(sup), ∀c ∈ ub(A), sup(A) ≤ c.


11.2.42 THEOREM: Monotonicity of extremums with respect to set inclusion.
Let (X, ≤) be a partially ordered set.
(i) ∀A, B ∈ Dom(inf), A ⊆ B ⇒ inf(A) ≥ inf(B).
(ii) ∀A, B ∈ Dom(sup), A ⊆ B ⇒ sup(A) ≤ sup(B).
(iii) ∀A, B ∈ Dom(min), A ⊆ B ⇒ min(A) ≥ min(B).
(iv) ∀A, B ∈ Dom(max), A ⊆ B ⇒ max(A) ≤ max(B).


PROOF: Let (X, ≤) be a partially ordered set. For part (i), let A, B ∈ Dom(inf). Suppose that A ⊆ B.
Let b = inf(B). Then ∀x ∈ B, b ≤ x. So ∀x ∈ A, b ≤ x. So b ∈ lb(A). Let a = inf(A). Then a = max(lb(A)).
So b ≤ a. Thus inf(A) ≥ inf(B).


The proof of part (ii) parallels the proof of part (i).


For part (iii), let A, B ∈ Dom(min). Suppose that A ⊆ B. Let b = min(B). Then ∀x ∈ B, b ≤ x. So
∀x ∈ A, b ≤ x. Let a = min(A). Then a ∈ A. Therefore b ≤ a. Thus min(A) ≥ min(B). (Alternatively use
part (i) together with Theorem 11.2.7 (iii).)


The proof of part (iv) parallels the proof of part (iii).
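Part (i) of Theorem 11.2.42 can be illustrated by checking every pair of subsets A ⊆ B of a small poset for which both infimums exist. The example poset (integers under divisibility) and the helper names are assumptions for this sketch; None marks a subset outside Dom(inf).

```python
from itertools import combinations

# Assumed example poset: integers ordered by divisibility.
X = [2, 3, 4, 6, 12]

def leq(x, y):
    return y % x == 0

def inf(A):
    """Greatest lower bound of A in X, or None if A is not in Dom(inf)."""
    lower = [b for b in X if all(leq(b, a) for a in A)]
    greatest = [b for b in lower if all(leq(x, b) for x in lower)]
    return greatest[0] if greatest else None

subsets = [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]

# For A, B in Dom(inf) with A a subset of B, check inf(B) <= inf(A).
monotone = all(
    leq(inf(B), inf(A))
    for A in subsets
    for B in subsets
    if A <= B and inf(A) is not None and inf(B) is not None
)

print(inf({4, 12}), inf({4, 6, 12}))  # 4 2
print(inf({2, 3}))                    # None: no common lower bound in X
print(monotone)                       # True
```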
11.2.43 REMARK: Order relations between points and sets induced by order relations on point-pairs.
For any partially ordered set (X, ≤), corresponding relations R₁ on X × P(X) and R₂ on P(X) × X may
be defined by


∀x ∈ X, ∀A ∈ P(X), x R₁ A ⇔ ∀a ∈ A, x ≤ a
∀A ∈ P(X), ∀x ∈ X, A R₂ x ⇔ ∀a ∈ A, a ≤ x.


The same symbol “≤” may be used for R₁ and R₂ when there is no danger of confusion. Then one may
write, for example, “x ≤ A” to mean that x is a lower bound for A. Then Theorem 11.2.35 (i) may be
written as: “b = inf(A) if and only if ∀x ∈ X, (x ≤ b ⇔ x ≤ A)”. Much of Section 11.2 would look much
simpler if lower and upper bound conditions were written in this extended notation.


To take this style of notation further, one may also define a relation R₀ on P(X) × P(X) by


∀A, B ∈ P(X), A R₀ B ⇔ ∀a ∈ A, ∀b ∈ B, a ≤ b.


Then R₀ may also use the same symbol “≤”. This notation is not so useful for Section 11.2, although it does
at least have some of the properties of a partial order on P(X). Unfortunately, the empty set is problematic
because one would have ∅ ≤ A ≤ ∅ for all A ∈ P(X), which implies that X = ∅. One may restrict the relation
R₀ to (P(X) \ {∅}) × (P(X) \ {∅}), but the restricted relation is still not a valid partial order on P(X) \ {∅}
because the weak reflexivity condition A ≤ A is not valid in general. (In fact, A ≤ A if and only if A is a
singleton {x} for some x ∈ X.) However, the transitivity and antisymmetry conditions do seem to be valid.
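The behaviour of the set-to-set relation R₀ described above can be observed directly on a tiny example. The sketch assumes X = {1, 2, 3} with the usual order of integers; the function name set_leq is illustrative.

```python
from itertools import combinations

# Assumed example: X = {1, 2, 3} with the usual integer order.
X = [1, 2, 3]

def set_leq(A, B):
    """A R0 B: every element of A is <= every element of B."""
    return all(a <= b for a in A for b in B)

subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
empty = frozenset()

# The empty set lies both below and above every subset (vacuous truth).
empty_sandwich = all(set_leq(empty, A) and set_leq(A, empty) for A in subsets)

# Among the non-empty subsets, only singletons satisfy A R0 A.
reflexive_sets = [A for A in subsets if A and set_leq(A, A)]

print(empty_sandwich)                        # True
print(sorted(sorted(A) for A in reflexive_sets))  # [[1], [2], [3]]
```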


11.3. Bounds for functions 


11.3.1 REMARK: Extending extremal notions from sets to functions. 

Extremal notions such as infimum, supremum, minimum and maximum may be extended from sets to 
functions. A function is said to be bounded if its range is bounded. Definitions 11.3.2 and 11.3.7 are 
adaptations of Definitions 11.2.4 and 11.2.38 respectively. The extension of Definition 11.2.2 from sets to 
functions is probably not so useful. So it is not given here. 


11.3.2 DEFINITION: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y.
A lower bound for f is a lower bound for f(X) in Y.
An upper bound for f is an upper bound for f(X) in Y.


An infimum or greatest lower bound for f is an infimum for f(X) in Y. 


A supremum or least upper bound for f is a supremum for f(X) in Y. 


A minimum for f is a minimum for f(X) in Y. 


A maximum for f is a maximum for f(X) in Y. 


11.3.3 NOTATION: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y.
inf(f) denotes the infimum of f, if it exists. Alternatives: inf f, inf_{x∈X} f(x).

sup(f) denotes the supremum of f, if it exists. Alternatives: sup f, sup_{x∈X} f(x).

min(f) denotes the minimum of f, if it exists. Alternatives: min f, min_{x∈X} f(x).

max(f) denotes the maximum of f, if it exists. Alternatives: max f, max_{x∈X} f(x).


11.3.4 NOTATION: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y and A ⊆ X.
inf_{x∈A} f(x) denotes the infimum of f|_A, if it exists. Alternatives: inf_A(f), inf_A f.

sup_{x∈A} f(x) denotes the supremum of f|_A, if it exists. Alternatives: sup_A(f), sup_A f.

min_{x∈A} f(x) denotes the minimum of f|_A, if it exists. Alternatives: min_A(f), min_A f.

max_{x∈A} f(x) denotes the maximum of f|_A, if it exists. Alternatives: max_A(f), max_A f.

11.3.5 THEOREM: Some basic relations between extremums of functions.
Let f : X → Y for a set X and a partially ordered set (Y, ≤). Let A be a subset of X.
(i) If A ≠ ∅ and the infimum and supremum of f|_A both exist, then inf_A(f) ≤ sup_A(f).
(ii) If the minimum of f|_A exists, then min_A(f) = inf_A(f).
(iii) If the maximum of f|_A exists, then sup_A(f) = max_A(f).
(iv) If the minimum and maximum of f|_A both exist, then min_A(f) = inf_A(f) ≤ sup_A(f) = max_A(f).


PROOF: For part (i), f(A) ≠ ∅ because A ≠ ∅. So inf_A(f) ≤ sup_A(f) by Theorem 11.2.17 (i). Similarly,
parts (ii), (iii) and (iv) follow from Theorem 11.2.17 (ii, iii, iv).


11.3.6 REMARK: Extremal notions for functions are defined on their range sets.

The various kinds of bounds in Definition 11.3.2 are the same as the corresponding bounds in Definition 11.2.4
applied to the image of each function, not the graph. For most purposes, it is not necessary to know that a
set of ordered pairs is a function. Most of the time, one may regard the graph (i.e. the set of ordered pairs)
of a function either as a set or as a function. But in the case of bounds, the distinction is important. When
the set of ordered pairs is thought of as a function f : X → Y, the bounds are applied to the image set
f(X) ⊆ Y of the function, not the graph f = {(x, f(x)); x ∈ X}. However, one may also define bounds for
the graph as a subset of X × Y if the Cartesian set product has an order defined on it.


11.3.7 DEFINITION: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y.
f is bounded below (in Y) when f(X) is bounded below in Y.

f is bounded above (in Y) when f(X) is bounded above in Y. 

f is bounded (in Y) when f(X) is bounded in Y. 


11.3.8 THEOREM: Some relations between extremums and bounds for functions.
Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y. Let S ∈ P(X).
(i) If inf(f(S)) exists and b ∈ Y is a lower bound for f(S), then inf(f(S)) ≥ b.
(ii) If S ≠ ∅ and inf(f(S)) exists, and c ∈ Y is an upper bound for f(S), then inf(f(S)) ≤ c.
(iii) If S ≠ ∅ and sup(f(S)) exists, and b ∈ Y is a lower bound for f(S), then sup(f(S)) ≥ b.
(iv) If sup(f(S)) exists and c ∈ Y is an upper bound for f(S), then sup(f(S)) ≤ c.

PROOF: Let (Y, ≤) be a partially ordered set. Let f : X → Y. For part (i), suppose S is a
subset of X such that inf(f(S)) exists, and let b ∈ Y satisfy ∀x ∈ S, f(x) ≥ b. Then b is a lower bound
for the subset A = f(S) of Y. So inf(f(S)) = inf(A) ≥ b by Theorem 11.2.40 (i).


Similarly, parts (ii), (iii) and (iv) follow from Theorem 11.2.40 (ii, iii, iv).


11.3.9 THEOREM: Monotonicity of extremums with respect to function inequalities.
Let X be a set. Let (Y, ≤) be a partially ordered set. Let f, g : X → Y.

(i) If ∀x ∈ X, f(x) ≤ g(x), and inf(f) and inf(g) both exist, then inf(f) ≤ inf(g).
(ii) If ∀x ∈ X, f(x) ≤ g(x), and sup(f) and sup(g) both exist, then sup(f) ≤ sup(g).


PROOF: For part (i), let X be a set, let (Y, ≤) be a partially ordered set, and let f, g : X → Y satisfy
∀x ∈ X, f(x) ≤ g(x). Suppose that inf(f) and inf(g) exist. Then b = inf(f) ∈ Y is a lower bound for f(X),
by Definition 11.2.4. So b ≤ f(x) for all x ∈ X. So b ≤ g(x) for all x ∈ X, by Definition 11.1.2 (iii). So
b ≤ inf(g) by Theorem 11.3.8 (i). Hence inf(f) ≤ inf(g).


Part (ii) may be proved as for part (i). 
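A numerical illustration of Theorem 11.3.9 may be useful. The sketch assumes a finite domain and integer values, so that the infimum and supremum of each function are simply the minimum and maximum of its range; the two functions f and g are arbitrary choices for this example.

```python
# Assumed example: a finite domain X and integer-valued functions f <= g.
X = range(-3, 4)

def f(x):
    return x * x - 2

def g(x):
    return x * x + abs(x)

# Hypothesis of the theorem: f(x) <= g(x) for all x in X.
pointwise_leq = all(f(x) <= g(x) for x in X)

# On a finite totally ordered range, inf and sup are min and max.
inf_f = min(f(x) for x in X)
inf_g = min(g(x) for x in X)
sup_f = max(f(x) for x in X)
sup_g = max(g(x) for x in X)

print(pointwise_leq)   # True
print(inf_f, inf_g)    # -2 0
print(sup_f, sup_g)    # 7 12
```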


11.3.10 REMARK: Mixed infimums and supremums for functions of several variables.
For functions of two or more variables, one may mix infimum and supremum operations. The infimum over
all variables individually is the same as the infimum over the combined cross-product domain. (Similarly
for the supremum.) Notation 11.3.11 gives some samples of mixed infimum/supremum combinations. Of
course, there are infinitely many variations and combinations generalising this notation.

11.3.11 NOTATION: Let (X, ≤) be a partially ordered set. Let f : S × T → X.
(i) inf_{s∈S} sup_{t∈T} f(s,t) denotes inf{sup{f(s,t); t ∈ T}; s ∈ S} if g(s) = sup{f(s,t); t ∈ T} exists for all
s ∈ S and inf_{s∈S} g(s) exists.
(ii) sup_{s∈S} inf_{t∈T} f(s,t) denotes sup{inf{f(s,t); t ∈ T}; s ∈ S} if h(s) = inf{f(s,t); t ∈ T} exists for all
s ∈ S and sup_{s∈S} h(s) exists.
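The mixed expressions in Notation 11.3.11 can be evaluated on a small table of values. The sketch assumes finite index sets S and T and integer values (so inf and sup reduce to min and max); the table itself is an arbitrary example.

```python
# Assumed example: f is given by a 2-by-2 table of integers.
S = [0, 1]
T = [0, 1]
table = {(0, 0): 1, (0, 1): 4,
         (1, 0): 3, (1, 1): 2}

def f(s, t):
    return table[(s, t)]

# inf over s of (sup over t), as in Notation 11.3.11 (i).
inf_sup = min(max(f(s, t) for t in T) for s in S)

# sup over s of (inf over t), as in Notation 11.3.11 (ii).
sup_inf = max(min(f(s, t) for t in T) for s in S)

# Unmixed case: iterated infimum versus infimum over the product S x T.
inf_inf = min(min(f(s, t) for t in T) for s in S)
inf_product = min(f(s, t) for s in S for t in T)

print(inf_sup, sup_inf)       # 3 2
print(inf_inf, inf_product)   # 1 1
```

In this example the two mixed expressions differ, whereas the iterated infimum agrees with the infimum over the combined domain, as stated in Remark 11.3.10.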


11.4. Lattices 


11.4.1 REMARK: Usefulness of complete lattices for defining extremums. 

Complete lattices are well suited to be the range of functions for which inferior and superior limits are to be 
defined, for example in Section 35.8. The existence of mixed extremum expressions such as in Notation 11.3.11 
is also guaranteed when the function target space is a complete lattice. (See Figure 11.6.1 for a family tree 
including lattices and other classes of partial orders.) 


For the definition of a lattice, see for example Bell/Slomson [339], page 8. 


11.4.2 DEFINITION: A lattice is a partially ordered set X such that inf{a,b} and sup{a,b} both exist in X
for all a, b ∈ X.


11.4.3 REMARK: Notations for infimums and supremums on lattices. 

The standard notations for the infimums and supremums in Definition 11.4.2 are a ∧ b for inf{a,b} and
a ∨ b for sup{a,b}. These symbols are the same as those used for logical AND and OR respectively in
Notation 3.6.2. The symbol “∧” may also be confused with the wedge product in Notation 30.4.23 for
antisymmetric tensor products. The expressions a ∧ b = inf{a,b} and a ∨ b = sup{a,b} are known as the
“meet” and “join” respectively of a and b.
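For a finite partially ordered set, the lattice condition of Definition 11.4.2 can be tested by computing the meet and join of every pair. This is a sketch with assumed names and assumed examples (sets of integers ordered by divisibility); None indicates that a meet or join does not exist.

```python
def meet(X, leq, a, b):
    """inf{a, b} in X, or None if it does not exist."""
    lower = [c for c in X if leq(c, a) and leq(c, b)]
    best = [c for c in lower if all(leq(x, c) for x in lower)]
    return best[0] if best else None

def join(X, leq, a, b):
    """sup{a, b} in X, or None if it does not exist."""
    upper = [c for c in X if leq(a, c) and leq(b, c)]
    best = [c for c in upper if all(leq(c, x) for x in upper)]
    return best[0] if best else None

def is_lattice(X, leq):
    """Every pair of elements has both a meet and a join in X."""
    return all(
        meet(X, leq, a, b) is not None and join(X, leq, a, b) is not None
        for a in X for b in X
    )

def divides(x, y):
    return y % x == 0

D12 = [1, 2, 3, 4, 6, 12]      # all divisors of 12
P = [2, 3, 4, 6, 12]           # same set without the element 1

print(meet(D12, divides, 4, 6), join(D12, divides, 4, 6))  # 2 12
print(is_lattice(D12, divides))  # True
print(is_lattice(P, divides))    # False: 2 and 3 have no meet in P
```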


11.4.4 DEFINITION: A complete lattice is a partially ordered set X such that inf A and sup A both exist
in X for all A ∈ P(X) \ {∅}.


11.4.5 EXAMPLE: Some examples of lattices and complete lattices. 
Let X = ℝ̄ (the extended real numbers) with the usual order. Then X is a complete lattice.


Let Y = ℝ with the usual order. Then Y is a lattice, but not a complete lattice. All bounded non-empty
subsets of Y do have an infimum and supremum, but unbounded subsets lack either an infimum or supremum,
or both.


Let Z = ℚ with the usual order. Then Z is a lattice, but not a complete lattice. In this case, unbounded
sets are not the only problem. The problem here is sets like {x ∈ ℚ; x² < 2}, which has neither an infimum
nor a supremum in ℚ.


For both of the ordered sets Y and Z, every subset has an infimum and supremum in X.


11.4.6 EXAMPLE: More examples of lattices and complete lattices. 

Let X = ℝ̄^ℝ, the set of extended-real-valued functions on the real-number domain. Define the pointwise
order “≤” on X by φ₁ ≤ φ₂ for φ₁, φ₂ ∈ X if and only if ∀x ∈ ℝ, φ₁(x) ≤ φ₂(x). Then (X, ≤) is a complete
lattice because every subset of ℝ̄ has an infimum and supremum in ℝ̄.


Let Y = {f : ℝ → ℝ; f is continuous}. (See Section 31.12 for continuous functions.) Define the pointwise
order on Y as for X. Then (Y, ≤) is a lattice, but not a complete lattice. The pointwise minimum and
maximum of any f, g ∈ Y are continuous functions, but one may easily construct infinite subsets of Y whose
pointwise infimum and supremum are not continuous. For example, let A_h = {f ∈ Y; ∀x ∈ ℝ, |f(x)| ≤ h(x)},
where h : ℝ → ℝ₀⁺ is discontinuous. Then for suitable h, there is no function in Y which is an infimum or
supremum for A_h in Y. For example, let h(x) = 0 for x ≤ 0 and h(x) = 1 for x > 0. Then every upper bound
g ∈ Y for A_h satisfies g(0) ≥ 1 by continuity, and g may always be replaced by a continuous upper bound
which is strictly smaller at some point to the left of 0. So sup(A_h) does not exist in Y, and similarly
inf(A_h) does not exist in Y. But if one regards A_h as a subset of X, then sup(A_h) = h and inf(A_h) = −h.
The set Y itself does not have an infimum or supremum in Y, but regarding Y as a subset of X, the supremum
of Y is the function x ↦ +∞, and the infimum of Y is the function x ↦ −∞.


11.4.7 REMARK: Completion of incomplete lattices. 

Examples 11.4.5 and 11.4.6 hint at the idea of constructing the completion of any incomplete lattice X by
adding to the lattice X some formal infimums and supremums which “fill the gaps”. This would result in a
complete lattice X̄ which one could refer to as the “completion” of the lattice X.

11.4.8 THEOREM: Equivalent condition for a partially ordered set to be a complete lattice.
Let (X, ≤) be a partially ordered set. Then (X, ≤) is a complete lattice if and only if Dom(inf*) ⊇ P(X) \ {∅}
and Dom(sup*) ⊇ P(X) \ {∅}.


PROOF: Let (X, ≤) be a partially ordered set. Then by Definition 11.4.4, (X, ≤) is a complete lattice if
and only if both inf(A) and sup(A) exist in X for all A ∈ P(X) \ {∅}. By Notation 11.2.22, this holds if and
only if Dom(inf*) ⊇ P(X) \ {∅} and Dom(sup*) ⊇ P(X) \ {∅}.


11.5. Total order 


11.5.1 DEFINITION: A total order on a set X is a relation R on X which satisfies 


(i) ∀x, y ∈ X, x R y ∨ y R x; [strong reflexivity]
(ii) ∀x, y ∈ X, (x R y ∧ y R x) ⇒ x = y; [antisymmetry]
(iii) ∀x, y, z ∈ X, (x R y ∧ y R z) ⇒ x R z. [transitivity]


A totally ordered set is a pair (X, R) such that X is a set and R is a total order on X. 


11.5.2 REMARK: Alternative names for total order. 
A total order is sometimes called a “linear order”. (See for example Willard [165], page 5.) A totally ordered 
set is sometimes called a “chain”, a “simply ordered set” or a “linearly ordered set”. 


Definition 11.5.1 condition (i) is sometimes called the “dichotomy” condition. (See for example Bell/ 
Slomson [339], page 7.) 


11.5.3 THEOREM: The inverse of a total order is a total order. 
If a relation R for a set X is a total order, then the inverse relation R⁻¹ is a total order for X also.


PROOF: Let R be a total order on X. Then R⁻¹ is a relation on X because R is. By Definition 11.5.1 (i),
∀x, y ∈ X, x R y ∨ y R x. So ∀x, y ∈ X, y R⁻¹ x ∨ x R⁻¹ y by the definition of an inverse relation. So R⁻¹
satisfies Definition 11.5.1 (i).


By Definition 11.5.1 (ii), ∀x, y ∈ X, (x R y ∧ y R x) ⇒ x = y. This is also clearly symmetric with respect
to x and y. So R⁻¹ satisfies the same condition.


By Definition 11.5.1 (iii), ∀x, y, z ∈ X, (x R y ∧ y R z) ⇒ x R z. So from the definition of an inverse relation,
∀x, y, z ∈ X, (y R⁻¹ x ∧ z R⁻¹ y) ⇒ z R⁻¹ x. So ∀x, y, z ∈ X, (y R⁻¹ z ∧ x R⁻¹ y) ⇒ x R⁻¹ z by swapping
dummy variables x and z. Hence R⁻¹ satisfies Definition 11.5.1 (iii) by symmetry of the AND-operator.
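Theorem 11.5.3 can be illustrated for a finite set by representing a relation as a set of ordered pairs and testing the three conditions of Definition 11.5.1 directly. The representation and the function name is_total_order are assumptions made for this sketch.

```python
def is_total_order(X, R):
    """Check Definition 11.5.1 (i), (ii), (iii) for a relation R on a finite set X."""
    strongly_reflexive = all((x, y) in R or (y, x) in R for x in X for y in X)
    antisymmetric = all(x == y for (x, y) in R if (y, x) in R)
    transitive = all((x, z) in R for (x, y) in R for (w, z) in R if y == w)
    return strongly_reflexive and antisymmetric and transitive

# Assumed example: the usual order on {1, 2, 3} as a set of pairs.
X = {1, 2, 3}
R = {(x, y) for x in X for y in X if x <= y}

# The inverse relation swaps the components of every pair.
R_inv = {(y, x) for (x, y) in R}

all_pairs = {(x, y) for x in X for y in X}

print(is_total_order(X, R))        # True
print(is_total_order(X, R_inv))    # True
print(R | R_inv == all_pairs)      # True: the union of R and its inverse is X x X
```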


11.5.4 REMARK: Automatic application of partial order properties to total orders. 

The comments on partial orders in Remark 11.1.4 (and proved in Theorem 11.1.5) apply also to total orders. 
Thus a partial order R on a set X is a relation on X which satisfies the three conditions: id_X ⊆ R,
R ∩ R⁻¹ ⊆ id_X and R ∘ R ⊆ R. The additional condition (i) in Definition 11.5.1 for a total order may be
written in the same idiom as R ∪ R⁻¹ = X × X. Theorem 11.5.5 justifies the name “total order”.


11.5.5 THEOREM: Total orders are maximal partial orders.
Let X be a set.
(i) Let R be a total order on X. Let S be a partial order on X such that R ⊆ S. Then R = S.


(ii) Let R be a partial order on X which is not a total order on X. Then there exists a partial order S on
X which satisfies R ⊊ S.

(iii) Let R be a partial order on X such that there is no partial order S on X which satisfies R ⊊ S. Then
R is a total order on X.

(iv) A partial order R on a set X is a total order on X if and only if it is a maximal partial order.


In other words, a partial order R on a set X is a total order on X if and only if there is no partial order
S on X with R ⊊ S.

PROOF: For part (i), suppose that R ⊊ S. Then there is a pair (x, y) ∈ S such that (x, y) ∉ R. So (y, x) ∈ R
by Definition 11.5.1 (i), and so (y, x) ∈ S because R ⊆ S. Therefore x = y by Definition 11.1.2 (ii) for S.
So (x, y) ∈ R by Definition 11.5.1 (i), which is a contradiction. Hence R = S.


For part (ii), let R be a partial order on X which is not a total order on X. Then there exist x₀, x₁ ∈ X
with (x₀, x₁) ∉ R and (x₁, x₀) ∉ R. For such x₀ and x₁, let


S = R ∪ {(t₀, t₁) ∈ X × X; (t₀, x₀) ∈ R and (x₁, t₁) ∈ R}.


Then (x₀, x₁) ∈ S since (x₀, x₀) ∈ R and (x₁, x₁) ∈ R, and (x₁, x₀) ∉ S since (x₁, x₀) ∉ R. To show that S is
a partial order on X, first note that id_X ⊆ S because id_X ⊆ R and R ⊆ S. So S satisfies Definition 11.1.2 (i).
To show that S is antisymmetric, let x, y ∈ X satisfy (x, y) ∈ S and (y, x) ∈ S. If (x, y) ∈ R and (y, x) ∈ R,
then x = y because R is a partial order.

If (x, y) ∈ R and (y, x) ∈ S \ R, then (y, x) = (t₀, t₁) for some (t₀, x₀) ∈ R and (x₁, t₁) ∈ R, and then
(y, x₀) ∈ R and (x₁, x) ∈ R. Combined with (x, y) ∈ R, this gives (x₁, x₀) ∈ R by a double application of the
transitivity of R. This is a contradiction. So the case (x, y) ∈ R and (y, x) ∈ S \ R cannot occur. Similarly
the case (x, y) ∈ S \ R with (y, x) ∈ R cannot occur.


Now suppose that (x, y) ∈ S \ R and (y, x) ∈ S \ R. Then (x, y) = (t₀, t₁) for some (t₀, x₀) ∈ R and
(x₁, t₁) ∈ R, and (y, x) = (t₀′, t₁′) for some (t₀′, x₀) ∈ R and (x₁, t₁′) ∈ R. So (x, x₀) ∈ R, (x₁, y) ∈ R,
(y, x₀) ∈ R and (x₁, x) ∈ R. Therefore (x₁, x₀) ∈ R, which is a contradiction. Thus (x, y) ∈ R and
(y, x) ∈ R is the only possible case, and this implies x = y. Consequently S satisfies Definition 11.1.2 (ii).


To show transitivity of S, let (x, y) ∈ S and (y, z) ∈ S. If (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R. So
suppose that (x, y) ∈ R and (y, z) ∈ S \ R. (See Figure 11.5.1.) Then (y, z) = (t₀, t₁) for some (t₀, x₀) ∈ R
and (x₁, t₁) ∈ R. Thus (y, x₀) ∈ R and (x₁, z) ∈ R. But the transitivity of R then implies that (x, x₀) ∈ R
because (x, y) ∈ R. Therefore (x, z) ∈ S with (x, x₀) ∈ R and (x₁, z) ∈ R. Thus transitivity of S holds in
the case (x, y) ∈ R and (y, z) ∈ S \ R, and similarly in the case (x, y) ∈ S \ R and (y, z) ∈ R.


Figure 11.5.1 Transitivity of extended partial order 


Now suppose that (x, y) ∈ S \ R and (y, z) ∈ S \ R. Then (x₁, y) ∈ R and (y, x₀) ∈ R. So (x₁, x₀) ∈ R by
the transitivity of R, which is a contradiction. So this case cannot occur. Thus S satisfies the transitivity
condition, Definition 11.1.2 (iii). Hence S is a partial order by Definition 11.1.2.


Part (iii) follows as a contrapositive of part (ii). (See Theorem 4.5.7 (xxx).)


Part (iv) follows from parts (i) and (iii).
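The extension S constructed in the proof of part (ii) can be computed explicitly for a finite example. The sketch assumes X = {1, 2, 3, 6} ordered by divisibility, with the unrelated pair x₀ = 2, x₁ = 3; relations are represented as sets of ordered pairs, and is_partial_order is an illustrative helper.

```python
def is_partial_order(X, R):
    """Check reflexivity, antisymmetry and transitivity of R on a finite set X."""
    reflexive = all((x, x) in R for x in X)
    antisymmetric = all(x == y for (x, y) in R if (y, x) in R)
    transitive = all((x, z) in R for (x, y) in R for (w, z) in R if y == w)
    return reflexive and antisymmetric and transitive

# Assumed example: divisibility on {1, 2, 3, 6}; 2 and 3 are unrelated.
X = {1, 2, 3, 6}
R = {(x, y) for x in X for y in X if y % x == 0}
x0, x1 = 2, 3

# S = R united with all pairs (t0, t1) such that t0 R x0 and x1 R t1.
S = R | {(t0, t1) for t0 in X for t1 in X if (t0, x0) in R and (x1, t1) in R}

print(is_partial_order(X, R))   # True
print(is_partial_order(X, S))   # True
print(S - R)                    # {(2, 3)}
print((x1, x0) in S)            # False
```

In this small case the extension adds only the pair (2, 3), and the result happens to be a total order; in general a single application of the construction need not give a total order.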


11.5.6 REMARK: Contrast between partial orders and total orders. 

A total order has a stronger reflexivity condition than a partial order. Every total order is a partial order.
So all theorems and definitions which require a partial order apply also to a total order. A family tree for
some classes of order relations is illustrated in Figure 11.1.1. (See Figure 9.0.1 for a family tree showing
order relations in the context of more general relations.)


11.5.7 THEOREM: Some very basic properties of totally ordered sets.
Let (X, ≤) be a totally ordered set.
(i) The set {a,b} has both a minimum and a maximum in X for all a, b ∈ X.
(ii) (X, ≤) is a lattice.

PROOF: Let (X, ≤) be a totally ordered set. For part (i), let a, b ∈ X. Then a ≤ b or b ≤ a (or both).
Let A = {a,b}. If a ≤ b, then ∀x ∈ A, a ≤ x and ∀x ∈ A, x ≤ b. So a is the minimum of A and b is the
maximum of A. Similarly if b ≤ a. So {a,b} always has a minimum and a maximum in X for all a, b ∈ X.


Part (ii) follows immediately from part (i), Definition 11.4.2 and Theorem 11.2.7 (iii, iv).

11.5.8 REMARK: Total orders are “linear”.

Figure 11.5.2 illustrates an example of a totally ordered set. The dots are intended to suggest a possibly
infinite number of elements lying between e and f, and between j and k. There could also be an infinite
number of elements to the left of a or to the right of o. The important feature to notice is the absence of
“branch nodes”. Thus a total order may be characterised as a “linear order”, while a partial order (as in
the example illustrated in Figure 11.2.1) may be characterised as a “directed acyclic graph order” because
“cycles” (which start and end at the same point) are not permitted.


a → b → c → d → e ... f → g → h → i → j ... k → l → m → n → o


Figure 11.5.2 Example of total order (without transitivity arrows)


The arrows in Figure 11.5.2 link elements which have no intervening elements. Thus when an arrow is shown
from a to b, this means that a < b, and that there is no x with a < x < b. Omitted from the diagram are the
“transitivity arrows”, such as to indicate the relation a < c, which is required by the transitivity property.
(Transitivity arrows are also discussed in Remark 11.2.3.)


Interpreted in terms of such a diagram, Theorem 11.5.5 (i) means that it is not possible to add any elements
outside the “line” of nodes. In other words, if one attempts to add a “detour” to the diagram, then the
relation will fail to be a (partial) order. In other words, no further ordering is possible.


11.5.9 REMARK: Generalisation of real-number interval concepts to total orders. 

Most of the order concepts of the real numbers may be imported into the general total order context because 
of the linear character of a total order. (See Definition 16.1.5 for real number intervals.) Forms of intervals
such as {x ∈ X; a ≤ x ∧ x ≤ b}, which have two inequalities, are conveniently abbreviated to the form
{x ∈ X; a ≤ x ≤ b}.

The words “bounded”, “open” and “closed” are used here without implications for metric or topological 
spaces, although they do have a close correspondence in the special case of the real numbers. The word 
“infinite”, likewise, does not imply anything about the cardinality or diameter of intervals. 


11.5.10 DEFINITION: Let (X, ≤) be a totally ordered set.
A (bounded) closed interval of X is a set of the form {x ∈ X; a ≤ x ≤ b} for some a, b ∈ X.

A (bounded) open interval of X is a set of the form {x ∈ X; a < x < b} for some a, b ∈ X.

A (bounded) closed-open interval of X is a set of the form {x ∈ X; a ≤ x < b} for some a, b ∈ X.

A (bounded) open-closed interval of X is a set of the form {x ∈ X; a < x ≤ b} for some a, b ∈ X.

A (bounded) semi-closed interval of X is any closed-open or open-closed interval of X.

A (bounded) semi-open interval is the same as a bounded semi-closed interval.

A semi-infinite closed interval of X is a set of the form {x ∈ X; a ≤ x} or {x ∈ X; x ≤ b} for some a, b ∈ X.

A semi-infinite open interval of X is a set of the form {x ∈ X; a < x} or {x ∈ X; x < b} for some a, b ∈ X.

An interval of X is any open, closed, semi-open or semi-closed interval, any semi-infinite interval of X, or
the whole set X.

A bounded interval of X is any bounded open, closed, closed-open or open-closed interval of X.


11.5.11 REMARK: Interval classes for totally ordered sets. 

All of the styles of intervals in Definition 11.5.10 are well defined for a general partial order. However,
the use of the word “interval” for partially ordered sets could mislead one's intuition. As an example,
consider the power set X = P(S) partially ordered by set inclusion, for some set S. Then the “interval”
[A, B] for sets A, B ∈ P(S) means the set {C ∈ X; A ⊆ C ∧ C ⊆ B}. For example, if S = ℝ² and
B_r = {x ∈ S; x₁² + x₂² < r²}, then the “interval” [B₁, B₂] is the set of all sets which include B₁ and are
included in B₂. Such sets are potentially useful, but it is difficult to think of them as “intervals”.


11.5.12 REMARK:  Notations for intervals in general totally ordered sets. 

Bounded intervals of a totally ordered set (X, ≤) may be denoted as in Notation 11.5.13. Since interval
notations such as “[a, b]” are generally assumed to mean real number intervals, it is necessary to specify the
set X here.


11.5.13 NOTATION: Let (X, ≤) be a totally ordered set.
X[a, b] denotes the set {x ∈ X; a ≤ x ≤ b} for a, b ∈ X.
X[a, b) denotes the set {x ∈ X; a ≤ x < b} for a, b ∈ X.
X(a, b] denotes the set {x ∈ X; a < x ≤ b} for a, b ∈ X.
X(a, b) denotes the set {x ∈ X; a < x < b} for a, b ∈ X.
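
The four bounded interval notations can be illustrated on a small finite totally ordered set. The following is only a sketch: the function name interval, the order le_words (words compared by length, then alphabetically) and the sample set are hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def interval(X: Iterable[T], le: Callable[[T, T], bool], a: T, b: T,
             closed_left: bool = True, closed_right: bool = True) -> List[T]:
    """Return the bounded interval of X from a to b under the total order le."""
    def lt(u: T, v: T) -> bool:
        # Strict order derived from the weak order.
        return le(u, v) and u != v

    left = le if closed_left else lt
    right = le if closed_right else lt
    return [x for x in X if left(a, x) and right(x, b)]


def le_words(u: str, v: str) -> bool:
    """Hypothetical total order: shorter words first, ties broken alphabetically."""
    return (len(u), u) <= (len(v), v)


X = ["a", "b", "aa", "ab", "ba", "aaa"]
closed = interval(X, le_words, "b", "ba")                          # X[b, ba]
closed_open = interval(X, le_words, "b", "ba", closed_right=False)  # X[b, ba)
open_closed = interval(X, le_words, "b", "ba", closed_left=False)   # X(b, ba]
open_both = interval(X, le_words, "b", "ba", False, False)          # X(b, ba)
```

The set X must be passed explicitly, which mirrors the need to write X[a, b] rather than [a, b].
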


11.5.14 REMARK: Intervals with single-sided bounds in totally ordered sets. 

Semi-infinite intervals of (X, ≤) may be denoted as in Notation 11.5.15. The lemniscate symbol “∞” is, of
course, meaningless in general totally ordered set contexts. Therefore these semi-infinite interval notations
should be avoided.


11.5.15 NOTATION: Let (X, ≤) be a totally ordered set.

X(−∞, b] denotes the set {x ∈ X; x ≤ b} for b ∈ X.

X(−∞, b) denotes the set {x ∈ X; x < b} for b ∈ X.

X[a, +∞) denotes the set {x ∈ X; a ≤ x} for a ∈ X. (Alternative: X[a, ∞).)
X(a, +∞) denotes the set {x ∈ X; a < x} for a ∈ X. (Alternative: X(a, ∞).)


11.5.16 REMARK: Conditions for infimum and supremum of a partially ordered set. 

Theorem 11.2.35 is valid for total orders (because a total order is a partial order). Thus for subsets A of
an ordered set X, b = inf(A) if and only if ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a). In the case of a total
order, this is equivalent to ∀x ∈ X, (x > b ⇔ ∃a ∈ A, x > a). Consequently X is partitioned into two
intervals by the infimum, if it exists. The proposition ∀a ∈ A, x ≤ a is true for x ∈ X(−∞, b], and false for
x ∈ X(b, ∞). In the dual order case, if c = sup(A) exists, then ∀a ∈ A, x ≥ a is true for x ∈ X[c, ∞), and
false for x ∈ X(−∞, c).
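
The two characterisations of the infimum can be checked by brute force on a finite totally ordered set. This is a sketch under the assumption that X = {0, ..., 9} with the usual order and A = {4, 6, 7}; the names are illustrative only.

```python
X = range(10)   # totally ordered by the usual <=
A = {4, 6, 7}


def is_lower_bound(x, A):
    """The proposition: for all a in A, x <= a."""
    return all(x <= a for a in A)


lower_bounds = [x for x in X if is_lower_bound(x, A)]
b = max(lower_bounds)   # the infimum of A (here it is also the minimum of A)

# Partial-order form: x <= b if and only if x is a lower bound of A.
cond_partial = all((x <= b) == is_lower_bound(x, A) for x in X)

# Total-order form: x > b if and only if x > a for some a in A.
cond_total = all((x > b) == any(x > a for a in A) for x in X)
```

The list lower_bounds is the interval X(−∞, b], and its complement in X is X(b, ∞).
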


11.5.17 REMARK: Minimal and maximal elements are minima and maxima in totally ordered sets. 

For elements a, b in a totally ordered set (X, ≤), ¬(a < b) is true if and only if b ≤ a is true. Therefore
minimum and minimal elements of a subset A of X mean the same thing for a total order. (See Definitions 
11.2.2 and 11.2.4 for minimal and minimum elements.) Similarly, maximum and maximal elements of a 
subset are the same thing. 


11.5.18 REMARK: The relation of the equality relation to total orders. 
In computer programming (in some computer languages), a function f which defines a total order on a set
X typically returns a value in the set {−1, 0, 1}, where


f(x, y) = −1 if x < y,
f(x, y) = 0 if x = y,
f(x, y) = +1 if x > y.


However, in mathematics an order relation is represented as a subset of X × X, which is equivalent to a
boolean (or indicator) function on X × X, which is a function whose values lie in {0, 1}, where the integer 1
represents “true”. The reason for the difference is the fact that equality of elements x, y ∈ X is taken for
granted in mathematics. Therefore only two of the order-function values need to be specified.


In computer programming, partial orders are not often represented as functions. For such an order-function,
it would be necessary to have a fourth value to represent “unrelated” because it is possible that neither
(x, y) nor (y, x) is an element of the order.
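
The contrast between the two representations can be sketched as follows. The names cmp3, le and cmp_divides are hypothetical, and the divisibility order is just one convenient example of a partial order which is not total. The only library call used is functools.cmp_to_key from the Python standard library.

```python
from functools import cmp_to_key
from typing import Optional


def cmp3(x: int, y: int) -> int:
    """Three-valued order function with values in {-1, 0, +1}."""
    return (x > y) - (x < y)


def le(x: int, y: int) -> bool:
    """Boolean indicator function of the order relation as a subset of X x X."""
    return cmp3(x, y) <= 0


def cmp_divides(x: int, y: int) -> Optional[int]:
    """Order function for the partial order "x divides y" on positive integers.

    The fourth value None represents "unrelated".
    """
    if x == y:
        return 0
    if y % x == 0:
        return -1
    if x % y == 0:
        return 1
    return None


# A three-valued comparator is what sorting routines typically consume.
sorted_values = sorted([3, 1, 2], key=cmp_to_key(cmp3))
```
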


11.5.19 REMARK: Families with ordered index sets.

Families of sets and functions are defined in Section 10.8. The pair I −< (I, R) in Definitions 11.5.21
and 11.5.22 consists of the set I, which is the domain of the families of sets or functions, and the relation
R ⊆ I × I, which is a total order on I.


11.5.20 DEFINITION: A totally ordered family of elements of a set X is a family (x_i)_{i∈I} ∈ X^I such that
the index set I is totally ordered.


11.5.21 DEFINITION: A totally ordered family of sets is a family of sets (X_i)_{i∈I} such that the index set I
is totally ordered.


11.5.22 DEFINITION: A totally ordered family of functions is a family of functions (f_i)_{i∈I} such that the
index set I is totally ordered.


11.5.23 REMARK: The difference between a sequence and a family. 

A family of sets or functions (with no total order on the index set) may sometimes be referred to as a 
“sequence” of sets or functions. If the index set has no specified order, the word “family” should be used. 
The word “sequence” comes from the Latin word “sequi” meaning “to follow”. So a sequence must have 
a specified total order. More precise than the term “sequence” are terms like “ordered family”, “totally 
ordered family”, and “well-ordered family”.


11.5.24 REMARK: Projection maps for totally ordered families of sets. 

Since Definition 11.5.25 requires order, it could not appear in conjunction with Definition 10.13.6 for
projections of Cartesian products of set-families onto component subsets in Section 10.11. Definition 11.5.25
uses the order on the index set to specify a range of components.


11.5.25 DEFINITION: The projection map of a Cartesian product ×_{i∈I} S_i onto a range of components J =
{j ∈ I; n_1 ≤ j ≤ n_2}, where I is a totally ordered set and n_1, n_2 ∈ I, is the map from ×_{i∈I} S_i to ×_{j∈J} S_j
defined by (x_i)_{i∈I} ↦ (x_j)_{j∈J}. In other words, x ↦ x|_J for x ∈ ×_{i∈I} S_i.


11.5.26 NOTATION: Π_{n_1}^{n_2}, for n_1, n_2 ∈ I, denotes the projection map of ×_{i∈I} S_i onto the component range
J = {j ∈ I; n_1 ≤ j ≤ n_2}. Thus Π_{n_1}^{n_2} : ×_{i∈I} S_i → ×_{j∈J} S_j is defined by Π_{n_1}^{n_2} : (x_i)_{i∈I} ↦ (x_j)_{j∈J}.
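
A projection onto a range of components is just a restriction of the family to an interval of the index set. The following sketch represents a family as a Python dict keyed by its indices; the function name project and the sample point are hypothetical.

```python
def project(x, n1, n2):
    """Restrict the family x = (x_i), i in I, to the indices j with n1 <= j <= n2."""
    return {j: x_j for j, x_j in x.items() if n1 <= j <= n2}


# A point of R^{2n} with n = 3, indexed by 1, ..., 6.
n = 3
point = {i: float(10 * i) for i in range(1, 2 * n + 1)}

first_half = project(point, 1, n)           # components 1, ..., n
second_half = project(point, n + 1, 2 * n)  # components n+1, ..., 2n
```

The second projection keeps the original index labels n+1, ..., 2n, so a relabelling to 1, ..., n would be a separate step.
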


11.5.27 REMARK: Projection maps from Cartesian products to component ranges. 

Definition 11.5.25 and Notation 11.5.26 are useful for various definitions for charts for tangent bundles. If
a manifold has dimension n, then the tangent bundle is a manifold of dimension 2n. The horizontal and
vertical components of the standard charts on the tangent bundle can then be separated using the projection
maps Π_1^n : ℝ^{2n} → ℝ^n and Π_{n+1}^{2n} : ℝ^{2n} → ℝ^n. (A standard canonical map is assumed from the index set
J = {i ∈ ℕ_{2n}; i ≥ n+1} = ℕ_{2n} \ ℕ_n to ℕ_n, as in Definition 14.6.11.)


11.5.28 REMARK: Ordered traversals. 

The non-standard Definition 11.5.29 is essentially equivalent to Definition 11.5.21. It is intended as a 
generalisation of the concept of a continuous curve in a topological space (as in Section 36.2) to a non- 
topological curve. The very least that one expects of a curve is that it should have an order in which the 
points are traversed. If I = ℝ, then this permits discontinuities which a continuous curve forbids. An
equally good term for a traversal would be a “trajectory”, although this word has a more specific meaning 
in physics. 


An obvious variation of the ordered traversal idea is to simply transfer the order relation from the ordered
set I in Definition 11.5.29 to the set X itself. One can obviously also invert the map f without materially
affecting the definition. A much more interesting variation is to replace the total order on I with a general
partial order on I. But once again, one may as well consider the order to be defined on the set X itself.


The purpose of inventing an “ordered traversal” is to put any superfluous structures on the ordered set I 
into the background. If one uses real-number intervals for I, the topological, differentiability and algebraic 
properties of the interval suggest themselves, which is often not desirable. 


11.5.29 DEFINITION: An ordered traversal of a set X is a bijection f : I → X for some totally ordered
set I.


11.5.30 REMARK: Sets of functions which are totally ordered by set-inclusion. 

Theorem 11.5.31 states in essence that totally ordered sets of functions (with order defined by set-inclusion) 
have both a maximum and a minimum, and that both the maximum and minimum are functions. A set 
which is totally ordered by set-inclusion is called a “nest” by Smullyan/Fitting [392], page 36. 


11.5.31 THEOREM: Some properties of sets of functions which are totally ordered by set-inclusion. 

Let X and Y be sets. Let F be a set of functions from subsets of X to subsets of Y which is totally ordered
by set inclusion. In other words,

(1) ∀f ∈ F, ∃A ∈ ℙ(X), ∃B ∈ ℙ(Y), f : A → B. (In other words, F ⊆ {f ∈ ℙ(X × Y); f is a function}.)


(2) ∀f, g ∈ F, (f ⊆ g or g ⊆ f).

Then
(i) ∀f, g ∈ F, f ∪ g is a function.
(ii) ∀f, g ∈ F, f ∩ g is a function.
(iii) ⋃F is a function from ⋃_{f∈F} Dom(f) to ⋃_{f∈F} Range(f).
(iv) If F ≠ ∅, then ⋂F is a function from ⋂_{f∈F} Dom(f) to ⋂_{f∈F} Range(f).


PROOF: For part (i), let f, g ∈ F. Then f ⊆ g or g ⊆ f by (2). So f ∪ g = g or f ∪ g = f by
Theorem 8.1.7 (iv). Hence f ∪ g is a function.

For part (ii), let f, g ∈ F. Then f ⊆ g or g ⊆ f by (2). So f ∩ g = f or f ∩ g = g by Theorem 8.1.7 (v).
Hence f ∩ g is a function. (Note that this actually holds for arbitrary functions f and g.)

For part (iii), let D = ⋃_{f∈F} Dom(f) and R = ⋃_{f∈F} Range(f). Let x ∈ D. Then x ∈ Dom(f) for some
f ∈ F. So (x, y) ∈ f for some f ∈ F and y ∈ R because Range(f) ⊆ R. Therefore (x, y) ∈ ⋃F for some
y ∈ R. To show that y is unique for a given x ∈ D, suppose that (x, y_1), (x, y_2) ∈ ⋃F. Then (x, y_1) ∈ f
and (x, y_2) ∈ g for some f, g ∈ F. So (x, y_1), (x, y_2) ∈ f ∪ g. Therefore y_1 = y_2 by part (i). Hence ⋃F is a
function from ⋃_{f∈F} Dom(f) to ⋃_{f∈F} Range(f).


For part (iv), ⋂F is a relation from X to Y by Theorem 9.5.14 (ii), and Dom(⋂F) ⊆ ⋂_{f∈F} Dom(f) by (1).
Let D = ⋂_{f∈F} Dom(f) and R = ⋂_{f∈F} Range(f). Both D and R are well-defined sets because F ≠ ∅. Let
x ∈ D. Then x ∈ Dom(f) for all f ∈ F. So x ∈ Dom(f_0) for some f_0 ∈ F (because F ≠ ∅). Let f_0 be such
a function. Let y_0 = f_0(x). Let f ∈ F. Then x ∈ Dom(f). But f ⊆ f_0 or f_0 ⊆ f by (2). So f(x) = y_0.
Thus ∀f ∈ F, f(x) = y_0 by Definition 6.3.9 (UI). Therefore ∃y_0, ∀f ∈ F, f(x) = y_0 by Definition 6.3.9 (EI).
Consequently ∃y_0, (x, y_0) ∈ ⋂F. Thus x ∈ Dom(⋂F). So D ⊆ Dom(⋂F). Therefore D = Dom(⋂F).
To show that ⋂F is a well-defined function on D, let x ∈ D and suppose that (x, y_1), (x, y_2) ∈ ⋂F. Then
(x, y_1), (x, y_2) ∈ f for some f ∈ F because F ≠ ∅. So y_1 = y_2 because f is a function. To show
that Range(⋂F) ⊆ R, let (x, y) ∈ ⋂F. Then (x, y) ∈ f for all f ∈ F. So y ∈ Range(f) for all f ∈ F. So
y ∈ R. Hence ⋂F is a function from ⋂_{f∈F} Dom(f) to ⋂_{f∈F} Range(f).
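
Parts (iii) and (iv) can be illustrated with finite functions represented as sets of ordered pairs. This is only a sketch; the helper is_function and the sample chain f1 ⊆ f2 ⊆ f3 are hypothetical.

```python
def is_function(pairs):
    """True if the set of (x, y) pairs contains at most one y for each x."""
    xs = [x for x, _ in pairs]
    return len(xs) == len(set(xs))


# A chain f1 ⊆ f2 ⊆ f3 of functions, each represented as a set of pairs.
f1 = {(0, "a")}
f2 = f1 | {(1, "b")}
f3 = f2 | {(2, "c")}
F = [f1, f2, f3]

union_F = set().union(*F)               # the union of all elements of F
intersection_F = set.intersection(*F)   # the intersection (F is non-empty)

# Without the chain condition (2), a union of functions can fail to be a function.
g = {(0, "z")}
bad_union = f1 | g
```

For a finite chain the union and intersection are simply the largest and smallest elements of F, which need not be the case for an infinite chain.
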


11.5.32 REMARK: All total-order-preserving bijections are order isomorphisms. 

A useful fact about totally ordered sets is that an order homomorphism between totally ordered sets auto- 
matically becomes an order isomorphism if it is a bijection. Theorem 11.5.33 is not valid between general 
partially ordered sets. (See Example 11.1.23.) 


11.5.33 THEOREM: Bijective total order homomorphisms are order isomorphisms. 
Let φ : X → Y be a bijection which is an order homomorphism between totally ordered sets (X, ≤_X) and
(Y, ≤_Y). Then φ : X → Y is an order isomorphism from (X, ≤_X) to (Y, ≤_Y).


PROOF: Let φ : X → Y be a bijection and an order homomorphism. Let y_1, y_2 ∈ Y with y_1 ≤ y_2. Let
x_1 = φ^{-1}(y_1) and x_2 = φ^{-1}(y_2). Suppose that x_1 > x_2. Then φ(x_1) ≥ φ(x_2) because φ is an order
homomorphism, and φ(x_1) ≠ φ(x_2) because φ is injective. So y_1 > y_2, which contradicts
Theorem 11.1.16 (ii). Therefore x_1 ≯ x_2. So x_1 ≤ x_2 by Definition 11.5.1 (ii). Thus φ^{-1} is an order
homomorphism from (Y, ≤_Y) to (X, ≤_X). Hence φ is an order isomorphism from (X, ≤_X) to (Y, ≤_Y).
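
The role of totality can be seen in a small brute-force check. The sketch below uses a three-element set; the partial-order counterexample (the identity map from the equality order to the usual order) is a standard one chosen here for illustration and is not necessarily the one in Example 11.1.23. All names are hypothetical.

```python
from itertools import permutations


def is_order_hom(phi, le_dom, le_cod, X):
    """True if a <= b in the domain implies phi(a) <= phi(b) in the codomain."""
    return all(le_cod(phi[a], phi[b]) for a in X for b in X if le_dom(a, b))


def inverse(phi):
    return {v: k for k, v in phi.items()}


def le(a, b):
    """The usual total order."""
    return a <= b


def eq(a, b):
    """The equality relation, a partial order which is not total on X."""
    return a == b


X = [0, 1, 2]

# Total case: every bijective homomorphism has a homomorphic inverse.
total_ok = all(
    is_order_hom(inverse(phi), le, le, X)
    for perm in permutations(X)
    for phi in [dict(zip(X, perm))]
    if is_order_hom(phi, le, le, X)
)

# Partial case: the identity map from (X, =) to (X, <=).
ident = {x: x for x in X}
hom_forward = is_order_hom(ident, eq, le, X)
hom_backward = is_order_hom(inverse(ident), le, eq, X)
```
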


11.6. Well-ordering 


11.6.1 DEFINITION: A (left) well-ordering on a set X is a partial order R on X such that all non-empty 
subsets of X have a minimum. That is, 


∀A ∈ ℙ(X) \ {∅}, ∃b ∈ A, ∀a ∈ A, b R a.


A right well-ordering on a set X is a partial order R on X such that all non-empty subsets of X have a 
maximum. That is, 


∀A ∈ ℙ(X) \ {∅}, ∃b ∈ A, ∀a ∈ A, a R b.


A (left) well-ordered set is a pair X −< (X, R) such that R is a left well-ordering on X.
A right well-ordered set is a pair X −< (X, R) such that R is a right well-ordering on X.


11.6.2 THEOREM: Every well-ordering is a total order.
If R is a left or right well-ordering for a set X, then R is a total order for X. 


PROOF: Let R be a left well-ordering on a set X. Let x,y € X. Let S = {x,y}. Then S has a minimum 
element, which must be either x or y. So either x R y or y R x. Therefore R is a total order on X. 


Let R be a right well-ordering on a set X. Let x,y € X. Let S = {x,y}. Then S has a maximum element, 
which must be either x or y. So either x R y or y R x. Therefore R is a total order on X.


11.6.3 REMARK: Relations between lattice orders and total orders. 
Figure 11.6.1 is the same as Figure 11.1.1 except that the two kinds of well-ordering in Definition 11.6.1 have 
been included. Also included are lattices, which are defined in Section 11.4. 


[Diagram omitted. Node labels recoverable from the figure: relation, partial order, lattice order,
complete lattice order, left well-ordering, right well-ordering.]


Figure 11.6.1 Family tree for classes of order relations, including various well-orderings


There is some similarity between total orders and lattice orders. A total order on a set X guarantees the 
existence of min{a, b} and max{a, b} for any a, b ∈ X. A lattice order on X guarantees the existence of
inf{a, b} and sup{a, b} for any a, b ∈ X. Hence a total order is necessarily a lattice order.


11.6.4 REMARK: Alternative terminology for left and right well-orderings. 
One could also refer to a left well-ordering as a “lower well-ordering”. One could refer to a right well-ordering
as an “upper well-ordering”.


11.6.5 REMARK: Well-orderings constructed by restriction or induced by bijections. 
Theorems 11.6.6 and 11.6.7 assert that the restrictions to subsets of well-ordered sets and well-ordered set 
isomorphisms are well-ordered sets and well-ordered set isomorphisms. 


Theorem 11.6.8 asserts that a bijection from a well-ordered set to a general set induces a well-ordering on 
its range, and then the bijection becomes an order isomorphism between the domain order and range order. 


11.6.6 THEOREM: Any restriction of a well-ordering is a well-ordering. 
Let (X, ≤_X) be a well-ordered set. Let Y ∈ ℙ(X) and ≤_Y = ≤_X|_{Y×Y}. Then (Y, ≤_Y) is a well-ordered set.


PROOF: ≤_Y is an order on Y by Theorem 11.1.19. (See Definition 9.6.21 for the restriction of a relation.)
Let A ∈ ℙ(Y) \ {∅}. Then A ∈ ℙ(X) \ {∅}. So ∃b ∈ A, ∀a ∈ A, b ≤_Y a by Definition 11.6.1 because a, b ∈ Y
for all a, b ∈ A. Hence ≤_Y is a well-ordering on Y by Definition 11.6.1.


11.6.7 THEOREM: All restrictions of well-ordering-isomorphisms are well-ordering-isomorphisms. 

Let φ : X → Y be an order isomorphism between well-ordered sets (X, ≤_X) and (Y, ≤_Y). Let A ∈ ℙ(X) and
B = φ(A). Let ≤_A = ≤_X|_{A×A} and ≤_B = ≤_Y|_{B×B}. Then φ|_A : A → B is an order isomorphism between
well-ordered sets (A, ≤_A) and (B, ≤_B).


PROOF: (A, ≤_A) and (B, ≤_B) are well-ordered sets by Theorem 11.6.6, and φ|_A : A → B is clearly an order
isomorphism between (A, ≤_A) and (B, ≤_B) by Definition 11.1.21.


11.6.8 THEOREM: Induction of a well-ordering onto a set via a bijection. 

Let (X, ≤_X) be a well-ordered set. Let φ : X → Y be a bijection. Define the relation ≤_Y ∈ ℙ(Y × Y)
by ≤_Y = {(y_1, y_2) ∈ Y × Y; φ^{-1}(y_1) ≤_X φ^{-1}(y_2)}. Then (Y, ≤_Y) is a well-ordered set, and φ is an order
isomorphism from (X, ≤_X) to (Y, ≤_Y).


PROOF: By Theorem 11.1.27, (Y, ≤_Y) is an ordered set, and φ : X → Y is an order isomorphism from
(X, ≤_X) to (Y, ≤_Y). To show that ≤_Y is a well-ordering of Y, let B ∈ ℙ(Y) \ {∅}. Let A = φ^{-1}(B). Then
A ∈ ℙ(X) \ {∅}. So by Definition 11.6.1, there is an element a_0 ∈ A which satisfies ∀a ∈ A, a_0 ≤_X a. Let
b_0 = φ(a_0). Let b ∈ B. Then φ^{-1}(b_0) = a_0 ≤_X φ^{-1}(b) because φ^{-1}(b) ∈ A. So b_0 ≤_Y b by the definition
of ≤_Y. Thus b_0 ∈ B satisfies ∀b ∈ B, b_0 ≤_Y b. Therefore ≤_Y is a well-ordering of Y by Definition 11.6.1.
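
The induced order can be sketched for a finite example, where the well-ordering property is checked by enumerating all non-empty subsets. The bijection phi and the helper names are hypothetical choices for illustration.

```python
from itertools import combinations

X = [0, 1, 2, 3]                        # well-ordered by the usual <=
phi = {0: "d", 1: "b", 2: "a", 3: "c"}  # an arbitrary bijection from X to Y
phi_inv = {y: x for x, y in phi.items()}
Y = list(phi.values())


def le_Y(y1, y2):
    """Induced order: y1 <=_Y y2 iff phi^{-1}(y1) <=_X phi^{-1}(y2)."""
    return phi_inv[y1] <= phi_inv[y2]


def minimum(B, le):
    """Return the minimum of the non-empty finite set B under le, or None."""
    for b in B:
        if all(le(b, other) for other in B):
            return b
    return None


subsets = [set(c) for r in range(1, len(Y) + 1) for c in combinations(Y, r)]
all_have_min = all(minimum(B, le_Y) is not None for B in subsets)

# phi is an order isomorphism: it preserves the order in both directions.
phi_is_iso = all((a <= b) == le_Y(phi[a], phi[b]) for a in X for b in X)
```
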


11.6.9 REMARK: Extremums at both ends of subsets of well-ordered sets. 

Definition 11.6.1 implies that every non-empty subset A of a left well-ordered set X has a minimum, which 
means that the infimum of A is well defined in X, and also that inf(A) is contained in A. This implies in 
particular that every subset is bounded below. 


Theorem 11.6.10 (i) asserts that in a left well-ordered set, every non-empty subset which is also bounded 
above has a supremum in X, although this does not imply that this supremum is contained in A. In other 
words, it does not imply that A has a maximum. Thus Theorem 11.6.10 (i) does not imply that all bounded 
left well-ordered sets are right well-ordered. (See the counter-example in Example 11.6.11.) 


It is not difficult to show that a set which is both left and right well-ordered must be finite. (Suppose that
a set X is both left and right well-ordered. Let A be a subset of X. Inductively construct a map φ : ω → A
by φ(i) = min{x ∈ A; ∀j < i, x ≠ φ(j)} for all i ∈ ω. Then φ is an increasing sequence of elements of A.
Let S = Range(φ). Then S must contain a maximum φ(i_0) for some i_0 ∈ ω because X is right well-ordered.
This is a contradiction. Therefore A must be exhausted after a finite number of inductive steps. So A is
finite. Hence X is finite. See Section 13.5 for finite sets.)


In the case of a totally ordered set, it is not guaranteed that subsets will have either a supremum or infimum 
in the whole set. (See Example 11.6.12.) So the assertions of Theorem 11.6.10 do require a well-ordering. 
The slightly paradoxical point here is that guaranteeing the existence of a minimum also guarantees the 
existence of a supremum as a kind of accidental by-product. 


11.6.10 THEOREM: Bounded subsets of left well-ordered sets have a supremum.
(i) Every non-empty bounded-above subset of a left well-ordered set (X, ≤) has a supremum in X.
(ii) Every non-empty bounded-below subset of a right well-ordered set (X, ≤) has an infimum in X.


PROOF: For part (ii), let (X, ≤) be a right well-ordered set. Let A be a non-empty bounded-below subset
of X. Then A has a lower bound in X. So the set A^- = {x ∈ X; ∀a ∈ A, x ≤ a} is non-empty. Therefore
A^- has a maximum by Definition 11.6.1. Hence A has an infimum by Theorem 11.2.11.


Part (i) follows similarly to part (ii).


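11.6.11 EXAMPLE: A well-ordered set where most infinite subsets have no maximum.
Let X = ℤ̄_0^+, the non-negative integers extended by ∞, where all subsets are bounded above by ∞ ∈ ℤ̄_0^+.
(See Notation 14.5.4 for ℤ̄_0^+.) Every infinite subset of X \ {∞} then has the supremum ∞ in X, as
guaranteed by Theorem 11.6.10 (i), but has no maximum.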


11.6.12 EXAMPLE: A totally ordered set where bounded sets may have no infimum or supremum. 
Let X = ℚ, the set of rational numbers. (See Section 15.1.) Let A = {x ∈ ℚ; x^2 < 2}. Then A has neither
an infimum nor a supremum in X with respect to the usual total order on ℚ. (See Definition 15.1.10.)
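
The absence of a supremum can be made concrete with exact rational arithmetic. A positive rational u is an upper bound for A exactly when u^2 > 2, and the sketch below uses the averaging step u ↦ (u + 2/u)/2, which is an illustrative choice and not taken from the text, to produce a strictly smaller rational upper bound from any given one. Hence no rational upper bound is the least.

```python
from fractions import Fraction


def smaller_upper_bound(u: Fraction) -> Fraction:
    """Given a rational u > 0 with u*u > 2, return a smaller rational with u*u > 2."""
    return (u + 2 / u) / 2


u = Fraction(3, 2)
bounds = [u]
for _ in range(4):
    u = smaller_upper_bound(u)
    bounds.append(u)

all_are_upper_bounds = all(b * b > 2 for b in bounds)
strictly_decreasing = all(s > t for s, t in zip(bounds, bounds[1:]))
```

The new value v = (u + 2/u)/2 satisfies v^2 − 2 = (u − 2/u)^2 / 4 > 0 and v < u whenever u^2 > 2, which is why the loop never leaves the set of upper bounds.
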


11.6.13 REMARK: Motivation for focusing on left well-orderings. 

The definition of a well-ordering is not symmetric. A relation R can be a well-ordering while its inverse
R^{-1} is not. Thus in Definition 11.6.1, for example, the phrase “unique minimum element” would need to be
replaced by “unique maximum element” if “≤” is replaced by the corresponding representation “≥”. The
mathematics of well-ordering would be the same with either “minimum” or “maximum” in Definition 11.6.1.
But then it would be necessary to make conversions for each application.


Probably one good reason why one-sided well-orderings are customarily defined rather than two-sided ones 
is the fact that transfinite induction (in Section 11.8) requires only a one-sided well-ordering. Transfinite 
induction is similar to the standard induction on the non-negative integers. Every non-empty subset of the 
non-negative integers, with the usual order, is guaranteed to have a minimum element. This is why standard 
induction works. 


Transfinite induction is not the only application of well-orderings. The existence of the infimum of any 
non-empty set is guaranteed if the order is a left well-ordering. Similarly, the existence of the supremum is 
guaranteed if the order is a right well-ordering.


11.6.14 THEOREM: All well-ordered sets are semi-bounded.
Let (X, ≤) be a non-empty ordered set.

(i) If “≤” is a left well-ordering on X, then X is bounded below.
(ii) If “≤” is a right well-ordering on X, then X is bounded above.


PROOF: To show part (i), let (X, ≤) be a non-empty ordered set. Let “≤” be a left well-ordering. Then
∃b ∈ X, ∀a ∈ X, b ≤ a by Definition 11.6.1 because X ∈ ℙ(X) \ {∅}. In other words, there is an element
b ∈ X which is a lower bound for X. Hence X is bounded below by Definition 11.2.38.


Part (ii) follows similarly. 


11.6.15 REMARK: Sets which are well-ordered by the set-inclusion relation. 

Every set of sets has a well-defined set-inclusion relation, and as mentioned in Remark 11.1.34, any partially
ordered set (X, ≤) may be automatically converted to an order-isomorphic partially ordered set (Y, ⊆) for
some Y ⊆ ℙ(X). Thus sets which are partially ordered by set-inclusion are of some interest. For such
partially ordered sets, Theorem 11.6.16 gives the simpler-looking condition in Definition 11.6.17 for the
partial order to be a well-ordering.


Theorem 11.6.16 follows from the observation that ⋂A ⊆ a for all a ∈ A, and ⋂A is the largest set b such
that b ⊆ a for all a ∈ A. Therefore ⋂A must be the minimum of any set-of-sets A which has a minimum.
So the only remaining question is whether ⋂A is an element of A. Thus the question of whether every
non-empty subset of X contains its minimum is equivalent to the question whether ⋂A ∈ A for all
A ∈ ℙ(X) \ {∅}. This is a consequence of the special structure of the set-inclusion relation, which implies
that the minimum of a set of sets, if it exists, will always be its intersection.


11.6.16 THEOREM: Condition for the set-inclusion relation to be a well-ordering. 
Let X be a set of sets. Then the set-inclusion relation “⊆” on X is a well-ordering of X if and only if
∀A ∈ ℙ(X) \ {∅}, ⋂A ∈ A.


PROOF: Let X be a set which satisfies ∀A ∈ ℙ(X) \ {∅}, ⋂A ∈ A. Let A ∈ ℙ(X) \ {∅}. Let b = ⋂A.
Then b ⊆ a for all a ∈ A by Theorem 8.4.8 (xv). Thus ∀A ∈ ℙ(X) \ {∅}, ∃b ∈ A, ∀a ∈ A, b ⊆ a. Hence ⊆ is
a well-ordering of X by Definition 11.6.1.

Now assume that ⊆ is a well-ordering of X. Let A ∈ ℙ(X) \ {∅}. Then by Definition 11.6.1, there is a b ∈ A
with ∀a ∈ A, b ⊆ a. So b ⊆ ⋂A by Theorem 8.4.8 (xviii). But ⋂A ⊆ c for all c ∈ A by Theorem 8.4.8 (xv).
So ⋂A ⊆ b. Thus b = ⋂A. So ⋂A ∈ A. Hence ∀A ∈ ℙ(X) \ {∅}, ⋂A ∈ A.
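
For finite sets of sets, the equivalence can be confirmed by enumerating all non-empty subsets. The sketch below compares the condition of Definition 11.6.1 with the intersection criterion on two hypothetical examples, one a chain and one not.

```python
from itertools import combinations


def nonempty_subsets(X):
    X = list(X)
    return [frozenset(c) for r in range(1, len(X) + 1) for c in combinations(X, r)]


def has_minimum(A):
    """Definition 11.6.1 for inclusion: some b in A is a subset of every a in A."""
    return any(all(b <= a for a in A) for b in A)


def intersection_in(A):
    """Criterion of Theorem 11.6.16: the intersection of A is an element of A."""
    return frozenset.intersection(*A) in A


chain = {frozenset(), frozenset({1}), frozenset({1, 2})}
non_chain = {frozenset({1}), frozenset({2}), frozenset({1, 2})}

results = {
    name: (all(has_minimum(A) for A in nonempty_subsets(X)),
           all(intersection_in(A) for A in nonempty_subsets(X)))
    for name, X in [("chain", chain), ("non_chain", non_chain)]
}
```

In each case the two booleans agree, as the theorem requires. In the second example the subset {{1}, {2}} has no minimum and its intersection ∅ is not an element of it.
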


11.6.17 DEFINITION: A set (of sets) X is well ordered by set-inclusion when (X, ⊆) is a well-ordered set,
where ⊆ is the set-inclusion relation on X. In other words,


∀A ∈ ℙ(X) \ {∅}, ⋂A ∈ A.


11.6.18 REMARK: Lexicographic order.
The lexicographic order in Definition 11.6.19 is useful for various constructions and counterexamples. It may
be easily generalised to any family of totally ordered sets (X_i, ≤_i)_{i∈I}, for any well ordered set (I, ≤_I).


To see the necessity for the well-ordering on the index set I, consider X = {−1, 1} and I = ℤ, with x, y ∈ X^I
defined by x_i = (−1)^i and y_i = (−1)^{i+1} for all i ∈ I. Then x ≤ y and y ≤ x by line (11.6.1), but x ≠ y.
Therefore ≤ is not a total order on X^I because it fails the antisymmetry condition, Definition 11.5.1 (ii).
This makes lexicographic order essentially useless if the index set is not well ordered.


The essence of lexicographic ordering is comparison of the lowest-index component of two different families
x and y in X^I. In other words, one first determines the lowest index j_0 = min{j ∈ I; x_j ≠ y_j} ∈ I where
the two families x = (x_i)_{i∈I} and y = (y_i)_{i∈I} differ. Then the order relation between x and y is determined
solely by this component. The necessity to always first locate the lowest differing index implies the necessity
of a well-ordering on the index set.


11.6.19 DEFINITION: The lexicographic order on a Cartesian product X^I, for a totally ordered set X and
a well ordered set I, is the order “≤” on X^I given by


∀x, y ∈ X^I, x ≤ y ⇔ ∀j ∈ I, ((∀i ∈ I, (i < j ⇒ x_i = y_i)) ⇒ x_j ≤ y_j). (11.6.1)


11.6.20 THEOREM: Lexicographic order is a total order if the index set is well-ordered.
Let (X, ≤_X) be a totally ordered set. Let (I, ≤_I) be a well ordered set. Then the lexicographic order on X^I
is a total order.


PROOF: Let “≤” denote the lexicographic order on X^I. Let x, y ∈ X^I satisfy x ≰ y. Then by line (11.6.1),
there is a j_0 ∈ I which satisfies (∀i ∈ I, (i <_I j_0 ⇒ x_i = y_i)) and x_{j_0} >_X y_{j_0}. Let j ∈ I. If j <_I j_0, then the
proposition (∀i ∈ I, (i <_I j ⇒ y_i = x_i)) ⇒ y_j ≤_X x_j is true because y_j = x_j. If j = j_0, then the proposition
(∀i ∈ I, (i <_I j ⇒ y_i = x_i)) ⇒ y_j ≤_X x_j is true because y_j <_X x_j. If j >_I j_0, then the proposition
(∀i ∈ I, (i <_I j ⇒ y_i = x_i)) ⇒ y_j ≤_X x_j is true because the proposition ∀i ∈ I, (i <_I j ⇒ y_i = x_i) is
false. Consequently ∀j ∈ I, ((∀i ∈ I, (i <_I j ⇒ y_i = x_i)) ⇒ y_j ≤_X x_j). So y ≤ x by line (11.6.1). Thus “≤”
satisfies Definition 11.5.1 (i).


Let x, y ∈ X^I satisfy x ≤ y and y ≤ x. Suppose that x ≠ y. Let j_0 be the least j ∈ I which satisfies
x_j ≠ y_j. (This exists and is unique because ≤_I is a well-ordering on I.) Then ∀i ∈ I, (i <_I j_0 ⇒ x_i = y_i).
So x_{j_0} ≤_X y_{j_0} because x ≤ y, and y_{j_0} ≤_X x_{j_0} because y ≤ x. So x_{j_0} = y_{j_0} because ≤_X is a total order. But
this contradicts x_{j_0} ≠ y_{j_0}. Consequently x = y. Thus “≤” satisfies Definition 11.5.1 (ii).


Let x, y, z ∈ X^I satisfy x ≤ y and y ≤ z. If x = y, then x ≤ z, and if y = z, then x ≤ z. So suppose that
x ≠ y and y ≠ z. Let j_0 = min{j ∈ I; x_j ≠ y_j} and j_1 = min{j ∈ I; y_j ≠ z_j}. These are well defined
because ≤_I is a well-ordering. If j_0 ≤_I j_1, then ∀i ∈ I, (i <_I j_0 ⇒ x_i = z_i) and x_{j_0} <_X y_{j_0} ≤_X z_{j_0}. So
x ≤ z. Similarly if j_1 <_I j_0, then ∀i ∈ I, (i <_I j_1 ⇒ x_i = z_i) and x_{j_1} ≤_X y_{j_1} <_X z_{j_1}. So x ≤ z. Thus x ≤ z
in all cases. So “≤” satisfies Definition 11.5.1 (iii). Hence the lexicographic order on X^I is a total order.
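
For a finite index set I = {0, ..., n−1}, line (11.6.1) can be transcribed almost literally. The following sketch (the name lex_le is hypothetical) checks by exhaustive enumeration that this transcription agrees with Python's built-in tuple comparison, which is lexicographic, and that it is total and antisymmetric on a small example.

```python
from itertools import product


def lex_le(x, y):
    """Line (11.6.1): for every j, if x and y agree at all i < j, then x_j <= y_j."""
    return all(
        x[j] <= y[j]
        for j in range(len(x))
        if all(x[i] == y[i] for i in range(j))
    )


points = list(product([0, 1, 2], repeat=3))   # X = {0, 1, 2}, I = {0, 1, 2}

agrees_with_builtin = all(lex_le(x, y) == (x <= y) for x in points for y in points)
is_total = all(lex_le(x, y) or lex_le(y, x) for x in points for y in points)
is_antisymmetric = all(
    x == y for x in points for y in points if lex_le(x, y) and lex_le(y, x)
)
```

A finite check of this kind cannot exhibit the failure of antisymmetry for I = ℤ, because every finite totally ordered index set is well ordered.
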


11.6.21 THEOREM: Equivalent conditions defining lexicographic order. 
Let (X, ≤_X) be a totally ordered set. Let (I, ≤_I) be a well ordered set. Let ≤ denote the lexicographic order
on X^I.


(i) ∀x, y ∈ X^I, (x ≤ y ⇔ ∀j ∈ I, ((x_j ≤_X y_j) ∨ (∃i ∈ I, (i <_I j ∧ x_i ≠ y_i)))).
(ii) ∀x, y ∈ X^I, (x < y ⇔ ∃j ∈ I, ((x_j <_X y_j) ∧ (∀i ∈ I, (i <_I j ⇒ x_i = y_i)))).


PROOF: Part (i) follows from Definition 11.6.19 and Theorems 4.7.9 (lxii), 6.6.10 (x) and 4.7.9 (lxviii).


For part (ii), Theorem 11.6.20 implies x < y ⇔ y ≰ x. Then the assertion follows from Definition 11.6.19
and Theorems 6.6.10 (viii) and 4.7.9 (lxviii).
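
Both equivalent conditions can be compared with line (11.6.1) by brute force over a small finite example. This is a sketch assuming X = {0, 1} and I = {0, 1, 2}; the function names are hypothetical.

```python
from itertools import product


def lex_le(x, y):
    """Line (11.6.1)."""
    return all(x[j] <= y[j] for j in range(len(x))
               if all(x[i] == y[i] for i in range(j)))


def cond_i(x, y):
    """Condition (i): for every j, x_j <= y_j or x and y differ at some i < j."""
    return all(x[j] <= y[j] or any(x[i] != y[i] for i in range(j))
               for j in range(len(x)))


def cond_ii(x, y):
    """Condition (ii): for some j, x_j < y_j and x, y agree at all i < j."""
    return any(x[j] < y[j] and all(x[i] == y[i] for i in range(j))
               for j in range(len(x)))


points = list(product([0, 1], repeat=3))
part_i_holds = all(lex_le(x, y) == cond_i(x, y) for x in points for y in points)
part_ii_holds = all((lex_le(x, y) and x != y) == cond_ii(x, y)
                    for x in points for y in points)
```
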


11.6.22 REMARK: Every set can be well-ordered with the axiom of choice. 
Theorem 11.6.23, the well-ordering theorem, is called “Zermelo's theorem” by Willard [165], page 10.


11.6.23 THEOREM [ZF+AC]: Zermelo's well-ordering theorem. 
On every set, there exists a well-ordering. 


PROOF: Since this is an AC theorem which is not entirely trivial to prove, no proof is given here. Proofs 
are given directly from the axiom of choice, or indirectly via Zorn's lemma, a numeration theorem (showing 
equinumerosity to ordinal numbers), or some other equivalent axiom, by Zermelo [443], pages 514-516;
Stoll [393], pages 116-117; Pinter [377], pages 120-121; Lévy [368], pages 160-161; Halmos [357], pages 68-69; 
E. Mendelson [370], pages 197-199; Kelley [101], page 35; Shoenfield [390], page 253; Roitman [385], page 82; 
Bernays [341], pages 138-139; Smullyan/Fitting [392], pages 59-60. 


11.7. Comparability of well-ordered sets 


11.7.1 REMARK: Order isomorphisms between well-ordered sets. 

Cantor's late 19th century work on set theory included many results concerning “order types”, which are
equivalence classes of well-ordered sets with respect to order isomorphisms. (See Definition 11.1.21 (iv) 
for order isomorphisms.) A substantial proportion of abstract set theory is concerned with order types 
and ordinal numbers. In particular, comparability of well-ordered sets is closely related to equinumerosity 
(in Section 13.1), which is the fundamental concept underlying the cardinality of sets. Hartogs's theorem 
concerning cardinality follows from the basic properties of order types. (See Theorem 13.3.2.)


11.7.2 THEOREM: Properties of order isomorphisms between well-ordered sets.
Let (X, ≤) be a well-ordered set.


(i) Let Y ∈ ℙ(X) \ {X} with ∀y ∈ Y, ∀x ∈ X \ Y, y < x. Then Y = {x ∈ X; x < a} for some a ∈ X.
(ii) Let Y ∈ ℙ(X) and suppose that φ : X → Y is an order isomorphism from (X, ≤) to (Y, ≤_Y). (In other
words, φ is an order monomorphism from (X, ≤) to (X, ≤).) Then ∀x ∈ X, φ(x) ≥ x.
(iii) Let φ : X → X be an order isomorphism from (X, ≤) to (X, ≤). Then ∀x ∈ X, φ(x) = x.
(iv) Let a ∈ X and Y = {x ∈ X; x < a}. Then (X, ≤) and (Y, ≤_Y) are not order isomorphic.
(v) Let Y ∈ ℙ(X), a ∈ Y and Z = {y ∈ Y; y < a}. Then (X, ≤) and (Z, ≤_Z) are not order isomorphic.
(vi) Let a, b ∈ X. Let Y = {x ∈ X; x < a} and Z = {x ∈ X; x < b}. Then (Y, ≤_Y) and (Z, ≤_Z) are order
isomorphic if and only if a = b.

Let (X, ≤_X) and (Y, ≤_Y) be well-ordered sets.


(vii) If φ_1 : X → Y and φ_2 : X → Y are order isomorphisms from (X, ≤_X) to (Y, ≤_Y), then φ_1 = φ_2.
(viii) Let φ : X → Y be an order isomorphism from (X, ≤_X) to (Y, ≤_Y). Let a ∈ X and A = {x ∈ X; x <_X a}.
Then φ(A) = {y ∈ Y; y <_Y φ(a)}.
(ix) Let a ∈ X and b_1, b_2 ∈ Y. Let A = {x ∈ X; x <_X a} and B_i = {y ∈ Y; y <_Y b_i} for
i = 1, 2. Let ≤_A and ≤_{B_i} be the restrictions of ≤_X and ≤_Y respectively to A and B_i for i = 1, 2. Let
φ_i : A → B_i be order isomorphisms from (A, ≤_A) to (B_i, ≤_{B_i}) for i = 1, 2. Then B_1 = B_2 and φ_1 = φ_2.
(x) Let a_1, a_2 ∈ X and b_1, b_2 ∈ Y. For i = 1, 2, let A_i = {x ∈ X; x <_X a_i} and B_i = {y ∈ Y; y <_Y b_i},
and let ≤_{A_i} and ≤_{B_i} be the restrictions of ≤_X and ≤_Y respectively to A_i and B_i. Let φ_i : A_i → B_i be
order isomorphisms from (A_i, ≤_{A_i}) to (B_i, ≤_{B_i}) for i = 1, 2. Then φ_1 ⊆ φ_2 or φ_2 ⊆ φ_1.


PROOF: For part (i), let a be the minimum of X \ Y, which exists by Definition 11.6.1 because X \ Y ≠ ∅.
Then Y ⊆ {x ∈ X; x < a} because a ∈ X \ Y. Suppose that b ∈ {x ∈ X; x < a} \ Y. Then b ∈ X \ Y and
b < a, which contradicts the minimality of a. Hence Y = {x ∈ X; x < a}.

For part (ii), suppose that it is not true that ∀x ∈ X, φ(x) ≥ x. Then A = {x ∈ X; φ(x) < x} is a non-empty
subset of X because ≤ is a total order on X by Theorem 11.6.2. So A has a least element y with respect to
≤ by Definition 11.6.1. But then φ(y) < y, and so φ(φ(y)) < φ(y) by Definition 11.1.21 (iv). So φ(y) ∈ A.
Therefore y is not the least element of A, which is a contradiction. Hence ∀x ∈ X, φ(x) ≥ x.


Part (iii) follows from part (ii) because φ^{-1} is an order isomorphism.

For part (iv), suppose that (X, ≤) and (Y, ≤_Y) are order isomorphic. Then there is an order isomorphism
φ : X → Y = {x ∈ X; x < a}. So φ(a) < a. But φ(a) ≥ a by part (ii), which is a contradiction. Hence
(X, ≤) and (Y, ≤_Y) are not order isomorphic.

For part (v), suppose that (X, ≤) and (Z, ≤_Z) are order isomorphic. Then there is an order isomorphism
φ : X → Z = {y ∈ Y; y < a}. So φ(a) < a. But φ(a) ≥ a by part (ii), which is a contradiction. Hence
(X, ≤) and (Z, ≤_Z) are not order isomorphic.

Part (vi) follows from part (iv).

For part (vii), φ_2^{-1} ∘ φ_1 is an order isomorphism from (X, ≤_X) to (X, ≤_X). So φ_2^{-1} ∘ φ_1 = id_X by part (iii).
Hence φ_1 = φ_2.

For part (viii), φ(A) ⊆ {y ∈ Y; y <_Y φ(a)} by Definition 11.1.21 (iv). Suppose that y ∈ Y with y <_Y φ(a).
Then φ^{-1}(y) <_X a. So φ^{-1}(y) ∈ A. Therefore y ∈ φ(A). Hence φ(A) = {y ∈ Y; y <_Y φ(a)}.

For part (ix), φ_2 ∘ φ_1^{-1} : B_1 → B_2 is an order isomorphism from (B_1, ≤_{B_1}) to (B_2, ≤_{B_2}). So b_1 = b_2 by
part (vi). So B_1 = B_2. Hence φ_1 = φ_2 by part (vii).

For part (x), suppose that a_1 ≤_X a_2. Then A_1 ⊆ A_2. So φ_2(A_1) ⊆ φ_2(A_2) = B_2. But part (viii) implies that
φ_2(A_1) = {y ∈ Y; y <_Y φ_2(a_1)}. So φ_2(A_1) = B_1 and φ_2|_{A_1} = φ_1 by part (ix). Thus φ_1 ⊆ φ_2. Similarly,
a_2 ≤_X a_1 implies φ_2 ⊆ φ_1.
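
A finite analogue of parts (ii) and (iii) can be checked by enumeration. The sketch below lists all strictly increasing maps from {0, ..., k−1} into {0, ..., n−1}; each such map is determined by its range, so the maps correspond to k-element subsets. This finite setting is an assumption made for illustration (in a finite well-ordered set the only subset order isomorphic to the whole set is the set itself), and the name embeddings is hypothetical.

```python
from itertools import combinations


def embeddings(k, n):
    """All strictly increasing maps from {0, ..., k-1} into {0, ..., n-1}.

    Each map phi is returned as the tuple (phi(0), ..., phi(k-1)).
    """
    return list(combinations(range(n), k))


# Analogue of part (ii): an order-preserving injection never moves an element down.
never_decreases = all(
    phi[x] >= x for phi in embeddings(3, 5) for x in range(3)
)

# Analogue of part (iii): the only order isomorphism of {0, ..., 4} to itself.
automorphisms = embeddings(5, 5)
```
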


11.7.3 REMARK: Lower sections of well-ordered sets. 
The concept of a “lower section” of a well-ordered set arises naturally from Theorem 11.7.2 (i). Much of 
Theorem 11.7.2 can be more naturally expressed in terms of lower sections. A lower section of a well-ordered
set (X, ≤) can be either a set Y, as in Definition 11.7.4, or the pair (Y, ≤_Y) where ≤_Y = ≤|_{Y×Y} is the
restriction of ≤ to Y. Theorem 11.7.5 expresses some parts of Theorem 11.7.2 in terms of lower sections.


A lower section is called an “initial segment” by Cohen [349], page 57. A proper lower section is called an
“initial segment” by Stoll [393], page 103; Halmos [357], page 56.


11.7.4 DEFINITION: 
A lower section of a well-ordered set (X, ≤) is a set Y ∈ P(X) which satisfies ∀y ∈ Y, ∀x ∈ X \ Y, y < x.


A proper lower section of a well-ordered set (X, ≤) is a lower section Y of (X, ≤) such that Y ≠ X.


The usual well-ordering on a lower section Y of a well-ordered set (X, ≤) is the restricted order ≤|_{Y×Y}.
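
As a small finite illustration of Definition 11.7.4, the following Python sketch tests the lower-section
condition directly. It assumes that X is a finite set of integers with the usual order, which is a
well-ordering. The function names is_lower_section and is_proper_lower_section are hypothetical and are
not notation from this book.

```python
def is_lower_section(Y, X):
    """Test whether Y is a lower section of the finite well-ordered set X.

    Assumption: X is a finite set of integers with the usual order.
    Condition tested: Y is a subset of X, and y < x for all y in Y and x in X minus Y.
    """
    Y, X = set(Y), set(X)
    return Y <= X and all(y < x for y in Y for x in X - Y)


def is_proper_lower_section(Y, X):
    """A proper lower section is a lower section which is not all of X."""
    return is_lower_section(Y, X) and set(Y) != set(X)


X = set(range(6))                        # X = {0, 1, 2, 3, 4, 5}
print(is_lower_section({0, 1, 2}, X))    # True
print(is_lower_section({0, 2}, X))       # False: 1 is in X minus Y, but 2 < 1 fails
print(is_lower_section(set(), X))        # True: the condition is vacuous
print(is_proper_lower_section(X, X))     # False: X is a lower section, but not proper
```

The empty set and X itself both pass the test, which agrees with Theorem 11.7.5 (i).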


11.7.5 THEOREM: Properties of order isomorphisms between lower sections of well-ordered sets. 
Let (X, ≤) be a well-ordered set.


(i) ∅ and X are lower sections of (X, ≤).
(ii) Let A and B be lower sections of (X, ≤). Then A ⊆ B or B ⊆ A.
(iii) Let A be a proper lower section of (X, ≤). Then A = {x ∈ X; x < a} for some a ∈ X.
(iv) Let A = {x ∈ X; x < a} for some a ∈ X. Then A is a proper lower section of (X, ≤).
(v) Let A ∈ P(X). Then A is a proper lower section of (X, ≤) if and only if A = {x ∈ X; x < a} for
some a ∈ X.
(vi) Let A ∈ P(X). Then A is a lower section of (X, ≤) if and only if A = X or A = {x ∈ X; x < a} for
some a ∈ X.
(vii) Let a ∈ X and A = {x ∈ X; x < a}. Then A ∪ {a} is a lower section of (X, ≤).
(viii) Let A be a proper lower section of (X, ≤). Then (A, ≤_A) and (X, ≤) are not order isomorphic.
(ix) Let Y ∈ P(X), and A be a proper lower section of (Y, ≤_Y). Then (A, ≤_A) and (X, ≤) are not order
isomorphic.
(x) Let A and B be lower sections of (X, ≤). Then (A, ≤_A) and (B, ≤_B) are order isomorphic if and only
if A = B.
(xi) Let C be a set of lower sections of (X, ≤). Then ⋃C is a lower section of (X, ≤).


Let (X, ≤_X) and (Y, ≤_Y) be well-ordered sets.

(xii) Let φ : X → Y be an order isomorphism from (X, ≤_X) to (Y, ≤_Y). Let A be a lower section of (X, ≤_X).
Then φ(A) is a lower section of (Y, ≤_Y).

(xiii) Let φ : X → Y be an order isomorphism from (X, ≤_X) to (Y, ≤_Y). Let A be a proper lower section
of (X, ≤_X). Then φ(A) is a proper lower section of (Y, ≤_Y).

(xiv) Let A be a lower section of (X, ≤_X). Let B₁ and B₂ be lower sections of (Y, ≤_Y). Let φ_i : A → B_i be
order isomorphisms from (A, ≤_A) to (B_i, ≤_{B_i}) for i = 1, 2. Then B₁ = B₂ and φ₁ = φ₂.

(xv) Let A₁ and A₂ be lower sections of (X, ≤_X). Let B be a lower section of (Y, ≤_Y). Let φ_i : A_i → B be
order isomorphisms from (A_i, ≤_{A_i}) to (B, ≤_B) for i = 1, 2. Then A₁ = A₂ and φ₁ = φ₂.

(xvi) Let A₁ and A₂ be lower sections of (X, ≤_X). Let B₁ and B₂ be lower sections of (Y, ≤_Y). Let φ_i : A_i →
B_i be order isomorphisms from (A_i, ≤_{A_i}) to (B_i, ≤_{B_i}) for i = 1, 2. Then φ₁ ⊆ φ₂ or φ₂ ⊆ φ₁.


PROOF: Part (i) follows trivially from Definition 11.7.4. 


For part (ii), suppose that A ⊈ B. Then A \ B ≠ ∅ by Theorem 8.2.5 (iv). So a ∈ A \ B for some a ∈ X.
Let b ∈ B \ A. Then b < a because b ∈ B and a ∈ X \ B. But a < b because a ∈ A and b ∈ X \ A, which is
a contradiction. So B \ A = ∅. Therefore B ⊆ A by Theorem 8.2.5 (iv).


Part (iii) follows from Theorem 11.7.2 (i). 


For part (iv), let y ∈ A and x ∈ X \ A. Then y < a and x ≮ a. So y < x because ≤ is a total order. So
A is a lower section of (X, ≤) by Definition 11.7.4. Hence A is a proper lower section of (X, ≤) because
a ∈ X \ A.


Part (v) follows from parts (iii) and (iv).

Part (vi) follows from parts (i) and (v).

For part (vii), let y ∈ A ∪ {a} and x ∈ X \ (A ∪ {a}). If y ∈ A, then y < x by Definition 11.7.4 because
x ∈ X \ A and A is a lower section of (X, ≤) by part (iv). If y = a, suppose that x ≤ y. Then x < y or x = y.
If x < y, then x ∈ A and so x ∉ X \ (A ∪ {a}). So x ≮ y. But if x = y, then x = a and so x ∉ X \ (A ∪ {a}).
Thus x ≤ y is not possible. So y < x because ≤ is a total order on X. Hence A ∪ {a} is a lower section of
(X, ≤) by Definition 11.7.4.


Part (viii) follows from Theorem 11.7.2 (iv). 


Part (ix) follows from Theorem 11.7.2 (v). 


Part (x) follows from Theorem 11.7.2 (vi) and part (viii). 

For part (xi), let Y = ⋃C. Then Y ∈ P(X) by Theorem 8.5.2 (ix). Let y ∈ Y and x ∈ X \ Y. Then
y ∈ A for some A ∈ C, and x ∈ X \ B for all B ∈ C by Theorem 8.4.12 (i). Let B = A. Then y < x by
Definition 11.7.4 because A is a lower section of X. Hence Y is a lower section of X by Definition 11.7.4.

For part (xii), let y₁ ∈ φ(A) and y₂ ∈ Y \ φ(A). Let x₁ = φ⁻¹(y₁) and x₂ = φ⁻¹(y₂). Then x₁ ∈ A and
x₂ ∈ X \ A. So x₁ < x₂ by Definition 11.7.4. Therefore y₁ < y₂ by Definition 11.1.21 (iv). Hence φ(A) is a
lower section of Y by Definition 11.7.4.


Part (xiii) follows from part (iii) and Theorem 11.7.2 (viii).

Part (xiv) follows from Theorem 11.7.2 (ix) for the case of proper lower sections. The case of general lower
sections follows from part (x) and Theorem 11.7.2 (iii) with φ = φ₂ ∘ φ₁⁻¹.


Part (xv) follows from part (xiv) by replacing the order isomorphisms with their inverses and reversing the 
roles of the sets. 


Part (xvi) follows from Theorem 11.7.2 (x) for the case of proper lower sections. The case of general lower
sections may be proved similarly. By part (ii), A₁ ⊆ A₂ or A₂ ⊆ A₁. Suppose that A₁ ⊆ A₂. Then
φ₂(A₁) ⊆ φ₂(A₂) = B₂. So φ₂(A₁) is a lower section of (B₂, ≤_{B₂}) by part (xii). Therefore φ₂(A₁) = B₁ and
φ₂|_{A₁} = φ₁ by part (xiv). Thus φ₁ ⊆ φ₂. Similarly, A₂ ⊆ A₁ implies φ₂ ⊆ φ₁.
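
Parts (ii), (vi) and (xi) of Theorem 11.7.5 can be checked by brute force on a small example. The following
Python sketch assumes a finite set of integers with the usual order. The helper names is_lower_section and
lower_sections are hypothetical. A check of this kind is only a sanity test on one example, not a proof.

```python
from itertools import combinations


def is_lower_section(Y, X):
    """Y is a subset of X, and y < x for all y in Y and x in X minus Y."""
    return Y <= X and all(y < x for y in Y for x in X - Y)


def lower_sections(X):
    """All lower sections of X, found by brute force over all subsets of X."""
    elems = sorted(X)
    subsets = (set(c) for r in range(len(elems) + 1) for c in combinations(elems, r))
    return [Y for Y in subsets if is_lower_section(Y, X)]


X = {2, 3, 5, 7, 11}
sections = lower_sections(X)

# Part (ii): any two lower sections are comparable by set inclusion.
assert all(A <= B or B <= A for A in sections for B in sections)

# Part (vi): the lower sections are X itself and the sets {x in X; x < a} for a in X.
expected = [X] + [{x for x in X if x < a} for a in X]
assert all(Y in expected for Y in sections) and all(Y in sections for Y in expected)

# Part (xi): a union of lower sections is a lower section.
assert is_lower_section(set().union(*sections[:3]), X)

print(len(sections))   # 6 lower sections for this 5-element set
```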


11.7.6 REMARK: Comparability of well-ordered sets. 

Lévy [368], pages 44-45, attributes Theorem 11.7.7 to Cantor, 1897. It is called “the fundamental theorem 
for well-ordered sets” by Suppes [395], pages 233-234. It is referred to as “the fundamental theorem of well 
ordering” or “the comparability theorem for well orderings” by Smullyan/Fitting [392], page 82. 


This comparability effectively means that the class of all well-ordered sets is totally ordered with respect to 
the relation that the first set is order isomorphic to a lower section of the second. 


11.7.7 THEOREM: The well-ordered set comparability theorem. 
Let (X, ≤_X) and (Y, ≤_Y) be well-ordered sets. Then either (i) or (ii) or (iii) is true.
(i) (X, ≤_X) is order isomorphic to (Y, ≤_Y).
(ii) (X, ≤_X) is order isomorphic to (B, ≤_B) with B = {y ∈ Y; y <_Y b} and ≤_B = ≤_Y|_{B×B} for some b ∈ Y.
In other words, (X, ≤_X) is order isomorphic to a proper lower section of (Y, ≤_Y).
(iii) (Y, ≤_Y) is order isomorphic to (A, ≤_A) with A = {x ∈ X; x <_X a} and ≤_A = ≤_X|_{A×A} for some a ∈ X.
In other words, (Y, ≤_Y) is order isomorphic to a proper lower section of (X, ≤_X).


In other words, (X, ≤_X) is order isomorphic to a lower section of (Y, ≤_Y) or (Y, ≤_Y) is order isomorphic to
a lower section of (X, ≤_X).


PROOF: Let W_X = {A ∈ P(X); (A, ≤_X|_{A×A}) is a well-ordered set}. Then W_X is a well-defined ZF set.
Similarly, W_Y = {B ∈ P(Y); (B, ≤_Y|_{B×B}) is a well-ordered set} is a well-defined ZF set. It follows that
N = {φ ∈ P(X × Y); ∃(A, B) ∈ W_X × W_Y, φ : A → B is an order isomorphism} is a well-defined ZF set.
By Theorem 11.7.5 (xvi), N is totally ordered by the set-inclusion relation.

Let ψ = ⋃N. Then ψ ∈ P(X × Y) by Theorem 8.5.2 (ix), and ψ is a function from D = ⋃_{φ∈N} Dom(φ)
to R = ⋃_{φ∈N} Range(φ) by Theorem 11.5.31 (iii). Similarly, ψ⁻¹ is a function from R to D. So ψ : D → R
is a bijection by Theorem 10.5.9. To show that ψ : D → R is an order isomorphism, let x₁, x₂ ∈ D with
x₁ ≤ x₂. Then x₁ ∈ Dom(φ₁) and x₂ ∈ Dom(φ₂) for some φ₁, φ₂ ∈ N. Suppose (without loss of generality)
that φ₁ ⊆ φ₂. Then x₁, x₂ ∈ Dom(φ₂). So φ₂(x₁) ≤ φ₂(x₂). Therefore ψ(x₁) ≤ ψ(x₂) because φ₂ ⊆ ψ. Thus
ψ : D → R is an order isomorphism.

By Theorem 11.7.5 (xi), D is a lower section of (X, ≤_X), and R is a lower section of (Y, ≤_Y). Suppose that
D is a proper lower section of (X, ≤_X) and R is a proper lower section of (Y, ≤_Y). Then it follows from
Theorem 11.7.5 (iii) that D = {x ∈ X; x <_X a} for some a ∈ X, and R = {y ∈ Y; y <_Y b} for some b ∈ Y.
Let D⁺ = D ∪ {a} and R⁺ = R ∪ {b}. Then D⁺ is a lower section of (X, ≤_X), and R⁺ is a lower section
of (Y, ≤_Y) by Theorem 11.7.5 (vii). Define ψ⁺ : D⁺ → R⁺ by ψ⁺(x) = ψ(x) for x ∈ D and ψ⁺(a) = b.
Then ψ⁺ ∈ N. So ψ⁺ ⊆ ψ. So D⁺ ⊆ D, which is impossible. Therefore either D = X or R = Y, or both.
Consequently (X, ≤_X) is order isomorphic to a lower section of (Y, ≤_Y) or (Y, ≤_Y) is order isomorphic to a
lower section of (X, ≤_X).
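
For finite well-ordered sets, the maximal order isomorphism ψ of this proof can be computed directly. The
following Python sketch is not the construction used in the proof, which takes the union of all order
isomorphisms between lower sections. Instead it pairs off least remaining elements, which gives the same
map in the finite case. It assumes finite sets of integers with the usual order. The function name
compare_well_ordered is hypothetical.

```python
def compare_well_ordered(X, Y):
    """Pair off the elements of two finite well-ordered sets in increasing order.

    Returns the partial map psi as a dict. Its domain D is a lower section of X,
    its range R is a lower section of Y, and D = X or R = Y (or both).
    """
    xs, ys = sorted(X), sorted(Y)
    return dict(zip(xs, ys))        # zip stops when the shorter list is exhausted


X = {10, 20, 30}
Y = {1, 4, 9, 16, 25}
psi = compare_well_ordered(X, Y)
print(psi)                          # {10: 1, 20: 4, 30: 9}

D, R = set(psi), set(psi.values())
assert D == X                                   # here X is exhausted first
assert R == {y for y in Y if y < 16}            # R is the proper lower section below b = 16
assert all(psi[a] < psi[b] for a in D for b in D if a < b)   # psi preserves order
```

In this example, case (ii) of Theorem 11.7.7 holds with b = 16.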


11.7.8 REMARK: Order isomorphism existence between subsets and lower sections of well-ordered sets. 
Theorem 11.7.9 is perhaps somewhat surprising. Any subset at all of a well-ordered set (X, ≤) is order
isomorphic to some lower section of that set. This implies that P(X) may be partitioned into equivalence
classes according to the lower section to which each subset is order isomorphic. As an example, consider the
set of even elements of ω. This subset is order isomorphic to ω itself, which is a lower section of ω. Similarly,
the set of even elements less than 100 is order isomorphic to the set of all elements less than 50, which is a
proper lower section of ω.
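
The finite example in this remark is easily verified. The following Python sketch assumes the usual order
on the non-negative integers and uses the halving map, which is one natural choice of order isomorphism.
The names A, B and phi are chosen here only for illustration.

```python
A = [n for n in range(100) if n % 2 == 0]    # the even elements less than 100
B = list(range(50))                          # all elements less than 50

phi = {a: a // 2 for a in A}                 # candidate order isomorphism a -> a/2

assert sorted(phi.values()) == B                                    # phi is a bijection from A to B
assert all(phi[a1] < phi[a2] for a1 in A for a2 in A if a1 < a2)    # phi preserves order
print(phi[98])                               # 49
```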


11.7.9 THEOREM: All subsets of a well-ordered set are order isomorphic to a lower section. 
Let (X, ≤) be a well-ordered set. Let A ∈ P(X). Let ≤_A = ≤|_{A×A}. Then (A, ≤_A) is order isomorphic to a
lower section of (X, ≤).


PROOF: By Theorem 11.6.6, (A, ≤_A) is a well-ordered set. So by Theorem 11.7.7, either (A, ≤_A) is order
isomorphic to a lower section of (X, ≤), or (X, ≤) is order isomorphic to a lower section of (A, ≤_A). But by
Theorem 11.7.5 (ix), (X, ≤) cannot be order isomorphic to a proper lower section of (A, ≤_A). Hence (A, ≤_A)
is order isomorphic to a lower section of (X, ≤).


11.7.10 REMARK: Weak initial segments are special kinds of proper lower sections. 

One might like to know why “strong initial segments” {x ∈ X; x < a} are of such interest for well-ordered
sets (X, ≤), while “weak initial segments” {x ∈ X; x ≤ a} are apparently disregarded. (See Halmos [357],
page 56, for the term “weak initial segment”.) For general orders, these strong and weak kinds of “initial
segments” have mirror image properties, but well-orderings are not symmetric under mirror reversal, i.e. by
replacing “<” with “>”. In particular, subsets of a well-ordered set (X, ≤) are strong initial segments if
and only if they are proper lower sections, but this does not apply to weak initial segments. In general, every
weak initial segment {x ∈ X; x ≤ a} is a lower section of (X, ≤), but the converse is not valid.

For example, let X = ω ∪ {ω}, ordered by set inclusion. (See Notation 12.1.29 for ω. See Definition 12.2.5
for the standard order on ω.) Then ω is a proper lower section of X, and ω = {x ∈ X; x < ω}, but there is
no a ∈ X for which ω can be expressed as {x ∈ X; x ≤ a}.


11.7.11 THEOREM: Some trivial properties of weak initial segments of well-ordered sets. 
Let (X, ≤) be a well-ordered set. Let L⁺_a(X) denote {x ∈ X; x ≤ a} for all a ∈ X.


(i) ∀a, b ∈ X, (a ≤ b ⇒ L⁺_a(X) ⊆ L⁺_b(X)).
(ii) X = ⋃_{a∈X} L⁺_a(X).
(iii) ∀A ∈ P(X), A ⊆ ⋃_{a∈A} L⁺_a(X).
(iv) ∀a ∈ X, (L⁺_a(X) ≠ X ⇒ L⁺_{min(X \ L⁺_a(X))}(X) = L⁺_a(X) ∪ {min(X \ L⁺_a(X))}).


Let (Y, ≤_Y) be a well-ordered set. Let φ : X → Y be an order isomorphism from (X, ≤) to (Y, ≤_Y). Let
L⁺_b(Y) denote {y ∈ Y; y ≤_Y b} for all b ∈ Y.


(v) ∀a ∈ X, φ(L⁺_a(X)) = L⁺_{φ(a)}(Y).


PROOF: Part (i) follows immediately from the definition of L⁺_a(X) for a ∈ X.

For part (ii), ⋃_{a∈X} L⁺_a(X) ⊆ X because L⁺_a(X) ⊆ X for all a ∈ X. Now let x ∈ X. Then x ∈ L⁺_x(X). So
x ∈ ⋃_{a∈X} L⁺_a(X). Hence X = ⋃_{a∈X} L⁺_a(X).

For part (iii), let x ∈ A. Then x ∈ L⁺_x(X). So x ∈ ⋃_{a∈A} L⁺_a(X). Hence A ⊆ ⋃_{a∈A} L⁺_a(X).

For part (iv), let a ∈ X with L⁺_a(X) ≠ X. Then X \ L⁺_a(X) ≠ ∅. Let x₀ = min(X \ L⁺_a(X)), which is
well defined by Definition 11.6.1. Then x₀ > a and so L⁺_{x₀}(X) ⊇ L⁺_a(X) ∪ {x₀}. Now let y ∈ L⁺_{x₀}(X). Then
y ∈ X and y ≤ x₀. If y ≤ a, then y ∈ L⁺_a(X). Otherwise a < y ≤ x₀. But this implies y = x₀ by the
definition of x₀. Thus L⁺_{x₀}(X) ⊆ L⁺_a(X) ∪ {x₀}. Hence L⁺_{x₀}(X) = L⁺_a(X) ∪ {x₀}.


For part (v), let a ∈ X. Let y ∈ φ(L⁺_a(X)). Let x = φ⁻¹(y). Then x ∈ L⁺_a(X). So x ≤ a. So φ(x) ≤_Y φ(a).
So y = φ(x) ∈ L⁺_{φ(a)}(Y). Therefore φ(L⁺_a(X)) ⊆ L⁺_{φ(a)}(Y). Now let y ∈ L⁺_{φ(a)}(Y). Then y ∈ Y and
y ≤_Y φ(a). Let x = φ⁻¹(y). Then x ∈ X and x ≤ a. So x ∈ L⁺_a(X). So y ∈ φ(L⁺_a(X)). Therefore
L⁺_{φ(a)}(Y) ⊆ φ(L⁺_a(X)). Hence φ(L⁺_a(X)) = L⁺_{φ(a)}(Y).
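
The statements about weak initial segments which are proved here for parts (ii) and (iv) can be checked on
a small example. The following Python sketch assumes a finite set of integers with the usual order. The
function name weak_initial_segment is hypothetical.

```python
def weak_initial_segment(a, X):
    """The weak initial segment {x in X; x <= a}."""
    return {x for x in X if x <= a}


X = {1, 3, 4, 8}

# Part (ii): X is the union of its weak initial segments.
assert set().union(*(weak_initial_segment(a, X) for a in X)) == X

# Part (iv): adjoining x0 = min(X minus L) to a weak initial segment L which is not
# all of X gives the weak initial segment for x0.
for a in X:
    L = weak_initial_segment(a, X)
    if L != X:
        x0 = min(X - L)
        assert weak_initial_segment(x0, X) == L | {x0}

print(sorted(weak_initial_segment(3, X)))   # [1, 3]
```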


11.8. Transfinite induction 


11.8.1 REMARK: Relation of transfinite induction to the axiom of choice. 

The principle of transfinite induction (Theorem 11.8.2) does not require the axiom of choice. In fact, it is 
often a very useful substitute for the axiom of choice. Transfinite induction is applicable to any set which 
has a well-ordering, no matter how infinite the set is. 


The axiom of choice, for those who believe in it, would imply that all sets have a well-ordering, and that 
therefore the principle of transfinite induction would be applicable to all sets. In AC-free set theory, one 
must explicitly require a well-ordering to be available rather than relying on AC to magically provide one. 


The existence of a well-ordering is apparently a fairly weak assumption. So transfinite induction should be 
very widely applicable. For example, this principle can be usefully applied to prove existence theorems for 
linear spaces. (See for example Theorems 22.7.21, 22.7.26 and 23.5.7.) Unfortunately, well-orderings for 
uncountably infinite sets are not in general easy to construct. This is why the axiom of choice is invoked so 
often for uncountably infinite sets. So in practice, transfinite induction is limited to uncountably infinite sets 
which have the right kind of structure for well-orderings to be defined on them. (Uncountable well-ordered 
sets can be “constructed” in pure ZF set theory using Hartogs's theorem. But this kind of “construction” 
relies on set cardinality tests which are ZF model-dependent. See Theorems 13.3.2 and 13.3.9.) 


11.8.2 THEOREM: Principle of transfinite induction. 
Let ≤ be a well-ordering on a set X. Let S ⊆ X satisfy ∀x ∈ X, (({s ∈ X; s < x} ⊆ S) ⇒ x ∈ S).
Then S = X.


PROOF: Let ≤ be a well-ordering on X. Let S ⊆ X satisfy ∀x ∈ X, (({s ∈ X; s < x} ⊆ S) ⇒ x ∈ S).
Suppose S ≠ X. Then X \ S ≠ ∅. So there is a minimum element x₀ of X \ S. This means that
{s ∈ X \ S; s < x₀} = ∅. Hence {s ∈ X; s < x₀} ⊆ S. So x₀ ∈ S. So x₀ ∈ S ∩ (X \ S) = ∅. This is a
contradiction. Therefore S = X.
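
The argument of this proof can be watched in action on a small finite example. The following Python sketch
assumes X = {0, 1, 2, 3} with the usual order, and tests every subset S of X against the hypothesis of
Theorem 11.8.2. The function name is_inductive is hypothetical. This is a finite sanity check only, whereas
the theorem applies to arbitrary well-ordered sets.

```python
from itertools import combinations


def is_inductive(S, X):
    """Hypothesis of Theorem 11.8.2: for all x in X, if {s in X; s < x} is a subset of S, then x is in S."""
    return all(x in S for x in X if {s for s in X if s < x} <= S)


X = {0, 1, 2, 3}
subsets = [set(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]

# The only subset S of X which satisfies the hypothesis is X itself.
inductive = [S for S in subsets if is_inductive(S, X)]
print(inductive)                    # [{0, 1, 2, 3}]

# For any other S, the minimum of X minus S is a first point of failure.
S = {0, 1, 3}
x0 = min(X - S)
assert {s for s in X if s < x0} <= S and x0 not in S
```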


11.8.3 REMARK: An AC-free version of the theorem that surjections have right inverses. 

Theorem 11.8.4 is an AC-free version of AC-tainted Theorem 10.5.17 (i). To put this theorem more briefly, 
every surjection on a well-ordered set has a right inverse. (The converse is elementary. So it does not need 
to be stated.) 


11.8.4 THEOREM: Every surjection has a right inverse if the domain is well-ordered. 
Let X be a well-ordered set. Let Y be a set. Then a function f : X → Y is a surjection if and only if f has
a right inverse.


PROOF: Let R ⊆ X × X be a well-ordering of a set X. Let f : X → Y be a surjection. Define g ⊆ Y × X
by g = {(y, x) ∈ Y × X; f(x) = y and ∀x′ ∈ X, (f(x′) = y ⇒ x R x′)}. Then {x ∈ X; (y, x) ∈ g} is a
singleton for all y ∈ Y because f is a surjection and R is a well-ordering of X. So g : Y → X is a function.
By construction, f ∘ g = id_Y. So f has a right inverse. The converse follows from Theorem 10.5.14 (iii).
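
The right inverse g in this proof sends each y to the least element of its preimage, so no arbitrary choices
are required. The following Python sketch illustrates this for a finite set of integers with the usual
order. The function name right_inverse and the example map f are assumptions made here for illustration.

```python
def right_inverse(f, X, Y):
    """Return g : Y -> X with f(g(y)) = y, choosing g(y) = min{x in X; f(x) = y}.

    Assumptions: X is a finite set of integers with the usual order (a well-ordering),
    and f maps X onto Y.
    """
    g = {}
    for y in Y:
        preimage = [x for x in X if f(x) == y]   # non-empty because f is surjective
        g[y] = min(preimage)                     # the least element of the preimage
    return g


X = set(range(10))
Y = {0, 1, 2}
f = lambda x: x // 4                # a surjection from X onto Y

g = right_inverse(f, X, Y)
print(g)                            # {0: 0, 1: 4, 2: 8}
assert all(f(g[y]) == y for y in Y)     # f o g is the identity on Y
```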


11.8.5 REMARK:  Metaphors for transfinite induction. 

The transfinite induction principle may be thought of as a kind of continuity or connectedness principle. 
(See Section 34.1 for connectedness.) It is similar to finding a break in a water line, an electricity line, or 
a network path. If a rule is going to fail, there must be a first place where it fails. One must examine the 
“first point of failure” to ensure that the inductive rule is satisfied. If the rule is followed everywhere along
a path, then there can be no first point of failure, and therefore there is no point of failure at all. 


Theorem 11.8.2 effectively says that if the set of points where a proposition is true is a closed set (i.e. if
∀x ∈ X, (({s ∈ X; s < x} ⊆ S) ⇒ x ∈ S)), and the set of all points where the proposition is false is a
closed set (i.e. ∀Y ∈ P(X) \ {∅}, ∃z ∈ Y, ∀y ∈ Y, z ≤ y), then either the proposition is true everywhere
or false everywhere. This is because the two sets are disjoint. (If the set X is non-empty, it is easily
shown that the proposition cannot be false everywhere. The set X must have a minimum z, which gives
{s ∈ X; s < z} = ∅ ⊆ S. Therefore z ∈ S. I.e. the property is always true for the minimum element z of X.)


Another metaphor for transfinite induction is that it is like left and right continuity. If a function is every- 
where left-continuous and everywhere right-continuous, then it must be continuous everywhere. 


11.8.6 REMARK: Literature for transfinite induction. 
For proofs of transfinite induction, see E. Mendelson [370], page 174; Wilder [403], pages 123-124. For 
transfinite recursion, see Quine [382], pages 174-191; Bernays [341], pages 100-113. 


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




Chapter 12 


ORDINAL NUMBERS


12.0.1 REMARK: How to count the elements of a set.

One of the most fundamental attributes of any set is its cardinality, which means the number of elements in 
the set. The number of elements is independent of any individual attributes of the elements. In other words, 
cardinality measures quantity, not quality. 


The most obvious way of determining the number of elements in a real-world set is to count them. In other 
words, one associates each element of the set with a number, starting at 1, for example, and continuing in 
increasing order until all of the elements have been counted. Then the number of the last element counted 
is the number of elements in the set. The numbers which are used for this counting procedure are known 
as “ordinal numbers”. Roughly speaking, an ordinal number is a number used in the counting procedure, 
whereas a cardinal number is the result of the counting procedure, i.e. the last number which was counted. 


In natural-language grammar, the ordinal numbers are “first”, “second”, “third”, etc., whereas the cardinal 
numbers are “one”, “two”, “three”, etc. This distinction reflects the role of ordinal numbers in the assignment
of an ordered sequence to the elements of a set, whereas the cardinal numbers merely tell you the total
number of elements. The cardinality of a set depends only on the last number reached, not on the particular 
order in which the elements were counted, whereas the ordinal numbers are used for associating particular 
number-tags with particular elements. 


Since ordinal numbers provide the means by which the elements of sets are counted, it seems that ordinal 
numbers should be defined before cardinal numbers, because elements can be counted only by associating 
numbers in a one-to-one (i.e. bijective) manner with the elements of the set. In other words, the counting 
procedure is a prerequisite for defining the cardinality of a set. 


It is true that one may compare the cardinalities of sets without counting them (i.e. bijectively associating 
them) with the elements of a standard set of ordinal numbers. So it could be argued that cardinality 
is the prior concept. But in practice, one usually wants to know cardinalities of sets in some absolute 
sense, measured with respect to sets with known, standardised cardinality, not merely comparing relative 
cardinalities. Therefore ordinal numbers are presented here as the prior concept. 


12.0.2 REMARK: The primacy of numbers and sets in mathematics. 

Numbers are arguably more fundamental than sets. Sets are merely containers. If containers contain only 
containers, it is difficult to see how structures built only from containers can signify anything useful. If one 
looks inside the sets in mathematical analysis, one ultimately finds numbers. In other words, the sets either 
contain numbers, or sets of numbers, or sets of sets of numbers, etc. In algebra and geometry, the ultimate 
elements of sets are often points and other kinds of objects, but in practical situations, these points and 
other objects are mostly expressed in terms of numbers. (E.g. points are generally given coordinates, and 
elements of algebraic structures are typically indexed by integers in some way.)


Mathematics thrived without set theory until the time of Cantor in the late nineteenth century. Before the
introduction of Cartesian coordinates, most mathematics was expressed in terms of integers, points and lines, 
but line segments, in particular the theory of proportion, played a role similar to the modern concept of real 
numbers. Most of the progress in pure and applied mathematics in the last 400 years has been expressed in 
terms of numbers. 


Despite the apparent primacy of numbers in mathematics, and the consequent secondary role of sets, numbers 
have also been defined in pure mathematics for the last hundred years in terms of sets. In other words, sets 
have been regarded as the primary concept, and numbers have been defined within a set theory framework. 
This is a kind of “set-ism”, analogous to the early 20th century “logicism” which claimed that all mathematics
is reducible to logic. 


It is perfectly possible, and very much more natural, to axiomatise numbers independently of sets. In this 
book, ordinal numbers are defined in terms of sets in the usual way, and sets are axiomatised as the primary 
conceptual framework. A very strong argument in favour of defining at least the ordinal numbers within set 
theory is the requirement for sufficient “standard-cardinality” sets to use in equinumerosity tests to measure 
cardinality. Nevertheless, numbers are not sets. Although it is possible to represent numbers as particular set 
constructions, it is also possible to arithmetise set theory. Numbers and sets are complementary conceptual 
frameworks. They have “shared primacy”. In fact, it seems even more true to say that all three conceptual 
domains, logic, sets and numbers, have shared primacy. 


12.0.3 REMARK: The special role of ordinal numbers in Zermelo-Fraenkel set theory. 

Ordinal numbers are not merely a useful construction within ZF set theory for measuring the cardinality of 
finite sets, for providing “order types” for comparison with general well-ordered sets, and for parametrising 
general transfinite recursion procedures. The ordinal numbers also provide a kind of “spine” or “backbone” 
for the “class of all sets”. This special role for ordinal numbers in ZF set theory is discussed in Remark 12.6.4, 
where von Neumann’s transfinite recursion procedure for ordinal numbers is applied to sets. Then ordinal 
numbers act as the “rank” parameter for general sets. Each single step in the transfinite inductive procedure 
is associated with an application of the power-set axiom to expand the “class of all sets”. Each limit operation 
in this procedure is associated with a similar application of the union axiom. In this way, every set in a 
minimal ZF universe is associated with a unique ordinal number, which is known as the “rank” of the set. 


There is some similarity between the von Neumann style of ordinal number construction and a construction 
which arises when trying to resolve Russell’s paradox. If one requires a universe of sets U to be an element of
itself, one way to (try to) achieve this is to extend the universe to the set U ∪ {U}. But then this expanded
universe needs to be included in itself, which requires a further expansion to the set U ∪ {U} ∪ {U ∪ {U}}
and so forth. This gives rise to a construction which somewhat resembles a von Neumann style of hierarchy
built from two atoms, one of them being the empty set ∅, the other being the universe U.


Despite the unquestioned importance of the ordinal numbers for general set theory, the portion of set theory 
which is applicable to differential geometry does not require most of the breathtaking size and often exotic 
phenomena which are associated with ordinal numbers beyond the first few stages of the transfinite recursion. 
(The apparent irrelevance of most of ordinal number theory for practical mathematics is also mentioned in 
Remarks 12.5.4 and 12.5.5.) 


12.1. Extended finite ordinal numbers 


12.1.1 REMARK: Differences between ordinal numbers and integers. 
Ordinal numbers are similar to integers, but different. 


(1) Ordinal numbers include an enormous variety of infinite numbers which have no counterpart in the set 
of integers. 

(2) Integers include all of the finite ordinal numbers (in the sense that the non-negative integers may be 
identified with the finite ordinal numbers), but integers include also the negative integers, which cannot 
be identified with ordinal numbers. 


(3) Ordinal numbers have order structure defined by the set inclusion relation, but no arithmetic operations, 
whereas the integers have arithmetic operations such as addition and multiplication.


The finite ordinal numbers are defined inductively as 0 = ∅, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, n⁺ = n ∪ {n},
and so forth. (This construction for the ordinal numbers is due to von Neumann [436].) Unfortunately,
mathematical induction cannot be used to define ordinal numbers because induction requires the prior 
definition of all finite ordinal numbers. (See Theorem 12.2.12.) Construction of the ordinal numbers must 
be achieved using logical predicates which do not refer to the ordinal numbers, directly or indirectly. 


The von Neumann construction for finite ordinal numbers has various useful properties. For example,
0 ∈ 1 ∈ 2 ∈ 3 ... and 0 ⊆ 1 ⊆ 2 ⊆ 3 .... But x ∈ y ⇒ x ≠ y. Therefore 0 ⊊ 1 ⊊ 2 ⊊ 3 .... The membership
relation is transitive, and in fact corresponds exactly to the usual notion of order on the integers.
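
The von Neumann construction can be imitated with Python frozensets. The following sketch is only an
illustration. It builds each ordinal by iterating over Python integers, so it presupposes the integers and
does not address the difficulty, noted above, of defining the ordinal numbers without circularity inside
ZF set theory. The function names successor and ordinal are hypothetical.

```python
def successor(a):
    """The successor map a -> a U {a} on hereditarily finite sets."""
    return a | frozenset([a])


def ordinal(n):
    """The finite von Neumann ordinal n, built from the empty set by n successor steps."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a


three = ordinal(3)
assert three == frozenset([ordinal(0), ordinal(1), ordinal(2)])   # 3 = {0, 1, 2}
assert len(three) == 3                                            # the ordinal n has n elements

# Membership and proper inclusion both agree with the usual order on 0, ..., 5.
for m in range(6):
    for n in range(6):
        assert (ordinal(m) in ordinal(n)) == (m < n)
        assert (ordinal(m) < ordinal(n)) == (m < n)    # "<" on frozensets means proper subset
```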


12.1.2 REMARK: Finite ordinal numbers are the most useful. 

It is convenient to first define finite ordinal numbers, which are by far the most useful, and then separately 
define the infinite and transfinite ordinal numbers. Definition 12.1.34 characterises finite ordinal numbers in 
terms of a set-theoretic formula (i.e. a logical predicate). Definition 12.1.28 introduces the set of all ordinal 
numbers. But first, Definition 12.1.3 introduces ordinal numbers whose elements are all finite. 


12.1.3 DEFINITION: An extended finite ordinal number is a set N such that 
∀m ∈ N, (m = ∅ or ∃a ∈ N, m = a ∪ {a}). (12.1.1)


12.1.4 REMARK: Every non-empty element of an extended finite ordinal number has a predecessor. 

A ZF set is an “extended finite ordinal number” if and only if it is either a finite ordinal number or else the
single infinite ordinal number ω, which is equal to the set of all finite ordinal numbers. (This is shown in
Theorem 12.1.39 (i). See Notation 12.1.29 for ω. See Definition 12.1.34 for finite ordinal numbers.) These
finite ordinal numbers, and one infinite ordinal number, have in common the property that each of their
elements is either empty or is the successor of some other element of the set. (See Definition 12.2.17 for
successor sets.) So these may both be regarded as “sequences”, since all of their elements, except the empty
element, have a predecessor within the set.


Looking further forward to the set N = ω⁺ = ω ∪ {ω}, using Notation 12.2.18, it may be noted that such
N is not an extended finite ordinal number because it contains ω, which is not the successor of any element
of N. Looking even further forward to the general ordinal numbers in Section 12.5, one may observe that
ω is a “limit ordinal” by Definition 12.5.16. With this terminology, line (12.1.1) excludes limit ordinals as
elements of N. In other words, extended finite ordinal numbers are ordinal numbers which do not contain
any limit ordinals.


12.1.5 REMARK: Simplification of extended finite ordinal definition using the successor map. 

If the map s : a ↦ a ∪ {a} is regarded as a function, one may rewrite line (12.1.1) in the function set-map
notation as in Remark 10.6.6 as N \ {∅} ⊆ s(N), which looks much simpler. If the map s is restricted to
a set N, then it is in fact a true ZF function by the axiom of replacement. (See Section 7.7.) It may seem
surprising that such a simple formula constrains N to be either a finite ordinal or else equal to the set of all
finite ordinals. It is the special nature of the map s which imposes this very strong constraint. (This is the
“successor map” construction in Definition 12.2.17.)

If p denotes the inverse of the injective map s : a ↦ a ∪ {a}, then line (12.1.1) implies p(N \ {∅}) ⊆ N. In
other words, the predecessor of every element of N, apart from ∅, is an element of N. However, s is not
surjective to the class of all sets. So this set-inclusion is weaker than line (12.1.1). This can be remedied
by also requiring N \ {∅} to lie in the range of the successor map s. (This follows from Theorem 10.7.3 (i) in
the case of ZF sets and functions.) Thus line (12.1.1) is equivalent to the requirement that all non-empty
elements of N are successor sets and p(N \ {∅}) ⊆ N.
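
The condition on line (12.1.1) and its set-map form can be compared on hereditarily finite sets. The
following Python sketch represents sets as frozensets, so only finite examples can be tested and the
infinite case N = ω is out of reach. All function names are hypothetical.

```python
def successor(a):
    """The successor map s : a -> a U {a}."""
    return a | frozenset([a])


def ordinal(n):
    """The finite von Neumann ordinal n, built by n successor steps from the empty set."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a


def is_extended_finite_ordinal(N):
    """Line (12.1.1): every m in N is empty or equals a U {a} for some a in N."""
    return all(m == frozenset() or any(m == successor(a) for a in N) for m in N)


def satisfies_set_map_form(N):
    """The set-map form: N minus {empty set} is a subset of s(N)."""
    return (N - {frozenset()}) <= frozenset(successor(a) for a in N)


samples = {
    "4": ordinal(4),                                  # the ordinal 4 = {0, 1, 2, 3}
    "{0, 2}": frozenset([ordinal(0), ordinal(2)]),    # 2 = s(1), but 1 is missing
    "{1}": frozenset([ordinal(1)]),                   # 1 = s(0), but 0 is missing
}
for name, N in samples.items():
    assert is_extended_finite_ordinal(N) == satisfies_set_map_form(N)
    print(name, is_extended_finite_ordinal(N))
# prints "4 True", then "{0, 2} False", then "{1} False"
```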


12.1.6 REMARK: Set-theoretic expression for the extended finite ordinal number property. 
Since the extended finite ordinal number property in Definition 12.1.3 is useful for defining the countable 
choice axiom in Definition 13.7.21, it is of some interest to determine the complexity of set expression which 
is required for stating this property. Line (12.1.1) may be fully expanded as in line (12.1.2). 


∀m, (m ∈ N ⇒ ((∀x, ¬(x ∈ m)) ∨ (∃a, (a ∈ N ∧ (∀x, (x ∈ m ⇔ (x ∈ a ∨ x = a))))))). (12.1.2)


This is effectively a definition for ω⁺ = ω ∪ {ω} by Notations 12.2.18 and 12.1.29, Definition 12.1.28 and
Theorem 12.1.37 (i). This is then used for Definition 13.7.6 for countable sets. Therefore line (12.1.2) becomes 
effectively part of Definition 13.7.21 for the countable choice axiom.


12.1.7 REMARK: Complementary roles of the ZF infinity and regularity axioms.

The ZF infinity and regularity axioms play complementary roles in the existence and uniqueness of the set
ω of all finite ordinal numbers. The infinity axiom guarantees the existence of a set ω where every successor
of an element is also an element, and where every element (except the empty set) is a successor of at least one
element. In other words, the elements of ω are precisely those which are equal to the empty set or else are
the successor of some element. However, this is not enough to prevent infinite sequences of containments
“on the left”. In other words, there could be a sequence of containments which is not terminated “on the
left” by the empty set. The ZF regularity axiom requires every containment sequence to terminate with the
empty set. This implies the uniqueness of the set ω. Thus, roughly speaking, the regularity axiom guarantees
termination on the left, and the infinity axiom guarantees non-termination on the right. (This is illustrated
in Figure 12.1.1.)


[Figure: the sets ∅, {∅}, {∅, {∅}}, ... numbered 0, 1, 2, ..., 8, with “axiom of regularity” at the left end and “axiom of infinity” at the right end.]


Figure 12.1.1 Complementary roles of the ZF axioms of regularity and infinity 


Termination on the left is just as important as non-termination on the right. (Termination on the left is 
what makes the induction principle work, and induction is the principal principle in mathematics!) 


12.1.8 REMARK: Design choices for definitions of ordinal numbers. 

It very often happens in mathematics that one knows some properties of a concept which one has in mind, 
and so one can write down a few properties and hope that these are sufficient and necessary to nail down the 
concept as an object or a class of objects. Therefore one of the most important tasks after defining a concept 
is to prove existence and uniqueness in the case of a single object, or to prove that the class of objects is as 
expected in the case of a class of objects. In the case of ordinal numbers, there should be an infinite class of 
finite ordinals, and a single set which contains them all. The purpose of many of the theorems in Section 12.1 
is to verify that Definition 12.1.3 is as intended. 


Most textbook definitions of ordinal numbers are stated in terms of well-ordering and the transitivity of the 
set-membership relation as in Definition 12.5.7. (A transitive set is a set X whose elements are all subsets 
of X. That is, ∀x ∈ X, ∀y ∈ x, y ∈ X. In other words, ∀x ∈ X, x ⊆ X.) Suppes [395], page 131, defines 
ordinal numbers to be both well-ordered by the set-membership relation and transitive (which he calls 
“completeness”). Transitivity and set-membership well-ordering are required also by Jech [364], page 33; 
E. Mendelson [370], page 172; Quine [382], page 157; Cohen [349], page 60; Roitman [385], page 73. Ordinals 
are defined to be transitive, totally ordered and well-founded (i.e. satisfying the ZF regularity axiom) by 
Bernays [341], page 80. (This seems to be an erroneous definition. Similarly, only transitivity and total order 
by set-membership are required by Pinter [377], page 183.) 


According to Chang/Keisler [347], page 580, the definition which “has by now become fairly standard in the 
literature” defines an ordinal as a set X which is well-ordered by set-membership and satisfies ⋃X ⊆ X. 
The condition ⋃X ⊆ X is equivalent to transitivity. So the Chang/Keisler definition is also in terms of 
well-ordering and transitivity, but they comment that: “This definition of ordinals is, of course, quite artificial”. 
According to a footnote by Quine [382], page 157, the earliest formulation of this well-ordering/transitivity 
kind was in 1937 by Robinson [433]. (The same attribution is given by Bernays [341], page 80. For an 
apparently different and earlier attribution, see Lévy [368], page 52.) 


Slightly different definitions are given by some other authors. Shoenfield [390], page 246, defines ordinals 
as transitive sets whose elements are all transitive. Stoll [393], page 307, and Halmos [357], page 75, define 
them as well-ordered sets X for which each element x ∈ X equals its “initial segment” {y ∈ X; y < x}. 
Generally ordinal numbers are defined to make proofs convenient rather than to maximise intuitive appeal. 
Definition 12.1.3 is chosen for maximum intuitive appeal, although the cost is a not-so-easy proof of basic 
properties in Theorems 12.1.17, 12.1.19, 12.1.21 and 12.1.23. The transitivity property of extended finite 
ordinal numbers is proved in Theorem 12.1.17 (iv). 


12.1.9 REMARK: The advantage of bare-handed techniques relative to heavy machinery. 
Restricting the focus to finite ordinal numbers (and the smallest infinite ordinal number) in Definition 12.1.3 
has the advantage that bare-handed techniques yield all of the expected properties, which avoids the need for 
the heavy machinery of the fully general theory of ordinal numbers. Self-contained, bare-handed techniques 
can give a stronger understanding of the mechanisms underlying results. 


12.1.10 REMARK: A sequence of set-containments must be terminated “on the left”. 

If condition (12.1.1) in Definition 12.1.3 omits the possibility that m may be empty, then N is necessarily 
empty by the axiom of regularity, Definition 7.2.4 (7). (This axiom forbids finite cycles of set-containments 
in particular, as asserted in Theorem 7.8.4.) The option m = ∅ permits the sequence of set-containments to 
be terminated “on the left”. 


12.1.11 THEOREM: Impossibility of set-containment chains which do not “terminate on the left”. 
Let N be a set which satisfies ∀m ∈ N, ∃a ∈ N, m = a ∪ {a}. Then N = ∅. 


PROOF: Let N be a set which satisfies ∀m ∈ N, ∃a ∈ N, m = a ∪ {a}. If m = a ∪ {a} and a ∈ N, then 
m ∩ N ⊇ {a} ≠ ∅. So ∀m ∈ N, m ∩ N ≠ ∅. But N ≠ ∅ ⇒ ∃z ∈ N, z ∩ N = ∅ by the ZF regularity axiom, 
Definition 7.2.4 (7). Hence N = ∅. 


12.1.12 REMARK: A useful form of the axiom of regularity for proving ordinal number theorems. 
Theorem 12.1.13 is a contrapositive form of Theorem 12.1.11 which is useful for proving theorems. 


12.1.13 THEOREM: Termination of predecessor chains in subsets of extended ordinal numbers. 
Let N be an extended finite ordinal number. Let N′ be a non-empty subset of N which satisfies ∅ ∉ N′. 
Then ∃m ∈ N′, ∃a ∈ N \ N′, m = a ∪ {a}. In other words, ∃a ∈ N \ N′, a ∪ {a} ∈ N′. 


PROOF: Let N be an extended finite ordinal number. Let N′ be a set such that ∅ ≠ N′ ⊆ N and ∅ ∉ N′. 
Then ∀m ∈ N′, ∃a ∈ N, m = a ∪ {a} by Definition 12.1.3, because N′ ⊆ N and ∀m ∈ N′, m ≠ ∅. By 
Theorem 12.1.11, ¬(∀m ∈ N′, ∃a ∈ N′, m = a ∪ {a}) because N′ ≠ ∅. So ∃m ∈ N′, ∀a ∈ N′, m ≠ a ∪ {a}. 
Let m ∈ N′ satisfy ∀a ∈ N′, m ≠ a ∪ {a}. Then ∃a ∈ N, m = a ∪ {a}. Let a ∈ N satisfy m = a ∪ {a}. 
Then a ∉ N′. So ∃a ∈ N \ N′, m = a ∪ {a}. Hence ∃m ∈ N′, ∃a ∈ N \ N′, m = a ∪ {a}. 
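
The statement of Theorem 12.1.13 can be checked mechanically on small examples. The following Python sketch 
is only an illustration and is not part of the formal development. It models hereditarily finite sets as 
frozenset objects; the names succ, von_neumann and chain_starts are hypothetical helpers introduced here. 

```python
def succ(a: frozenset) -> frozenset:
    """Return the successor a ∪ {a}."""
    return a | frozenset({a})


def von_neumann(k: int) -> frozenset:
    """Return the finite ordinal with k elements: ∅, {∅}, {∅, {∅}}, ..."""
    n = frozenset()
    for _ in range(k):
        n = succ(n)
    return n


def chain_starts(n: frozenset, sub: frozenset) -> set:
    """Elements m of sub whose predecessor a (with m = a ∪ {a}) lies in n but not in sub."""
    return {m for m in sub for a in n - sub if m == succ(a)}


N = von_neumann(6)                                     # the ordinals 0, 1, ..., 5
N_sub = frozenset(von_neumann(k) for k in (2, 4, 5))   # non-empty, and ∅ is not a member
starts = chain_starts(N, N_sub)
print(sorted(len(m) for m in starts))                  # prints [2, 4]
```

In this example the predecessor chains inside the subset {2, 4, 5} start at 2 and 4, whose predecessors 1 and 3 
lie outside the subset, as the theorem requires for at least one element. 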


12.1.14 THEOREM: Every successor has a unique predecessor. 
Let m, a and b be sets such that m = a U {a} and m = bU {b}. Then a = b. In other words, for any set m, 
there is at most one set a such that m = aU {a}. 


PROOF: Let m, a and b be sets such that m = a ∪ {a} and m = b ∪ {b}. Then a ∈ m. So a ∈ b ∪ {b}. So 
a ∈ b or a = b. Similarly, b ∈ m and so b ∈ a or a = b. Suppose that a ≠ b. Then a ∈ b and b ∈ a. This is 
excluded by the ZF axiom of regularity. (See Theorem 7.8.4 (ii).) Hence a = b. 
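
As an informal illustration of Theorem 12.1.14, the following Python sketch lists all predecessors of a 
hereditarily finite set, modelled as a frozenset. The helper names succ and predecessors are hypothetical. 
Since any a with m = a ∪ {a} is an element of m, it suffices to search the elements of m. 

```python
def succ(a: frozenset) -> frozenset:
    """Return the successor a ∪ {a}."""
    return a | frozenset({a})


def predecessors(m: frozenset) -> list:
    """Return every a with m = a ∪ {a}; any such a must be an element of m."""
    return [a for a in m if succ(a) == m]


EMPTY = frozenset()
ONE = succ(EMPTY)                 # {∅}
TWO = succ(ONE)                   # {∅, {∅}}
THREE = succ(TWO)
ZERMELO_TWO = frozenset({ONE})    # {{∅}}, which is not a successor

print(predecessors(THREE) == [TWO])   # True
print(predecessors(ZERMELO_TWO))      # []
print(predecessors(EMPTY))            # []
```

In each case the list has at most one entry, in agreement with the theorem. 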


12.1.15 REMARK: Trivial examples of extended finite ordinal numbers. 

Clearly N = ∅ satisfies Definition 12.1.3. Let N = {∅}. Let m ∈ N. Then m = ∅. So ∅ and {∅} are both 
extended finite ordinal numbers. Let N = {∅, {∅}}. Let m ∈ N. Then m = ∅ or m = {∅}. In the second case, 
let a = ∅. Then m = a ∪ {a}. So {∅, {∅}} is also an extended finite ordinal number. However, N = {{∅}} is 
not an extended finite ordinal number because ∅ ∉ N. 
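
The condition in Definition 12.1.3 is easy to test on hereditarily finite sets. The following Python sketch 
is an informal illustration only. It assumes the condition ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}), and the 
function names succ and is_extended_finite_ordinal are hypothetical. 

```python
def succ(a: frozenset) -> frozenset:
    """Return the successor a ∪ {a}."""
    return a | frozenset({a})


def is_extended_finite_ordinal(n: frozenset) -> bool:
    """Check that every m in n is empty or equals a ∪ {a} for some a in n."""
    return all(m == frozenset() or any(m == succ(a) for a in n) for m in n)


EMPTY = frozenset()
ONE = succ(EMPTY)          # {∅}
TWO = succ(ONE)            # {∅, {∅}}
BAD = frozenset({ONE})     # {{∅}}

print(is_extended_finite_ordinal(EMPTY))   # True
print(is_extended_finite_ordinal(ONE))     # True
print(is_extended_finite_ordinal(TWO))     # True
print(is_extended_finite_ordinal(BAD))     # False
```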


It is easily verified that the sets {∅, {∅}, {∅, {∅}}} and {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}} are also extended 
finite ordinal numbers. It would seem preferable to avoid such exponentially (or geometrically) expanding 
ordinal numbers by using instead the simpler sequence ∅, {∅}, {{∅}}, {{{∅}}}, etc. The members of this 
sequence are different to each other. So they could be used as a representation of the non-negative integers. 
(This construction appeared in a 1908 paper by Zermelo [444], pages 266-267. See also Bernays [341], page 21; 
Suppes [395], page 139; Quine [382], page 81; Moore [371], page 273; Smullyan/Fitting [392], page 31.) 
However, the simpler construction does not generalise easily to transfinite ordinal numbers. (This comment 
was also made by von Neumann [439], page 385.) It also has the disadvantage that each non-empty member 
of the sequence contains exactly one element. This makes it useless as a “cardinality yardstick” for measuring 
the cardinality of sets. 
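
The “cardinality yardstick” comparison can be seen concretely in the following Python sketch, which is an 
informal illustration only. The two constructor names von_neumann and zermelo are hypothetical labels for 
the sequence used in this book and for the simpler sequence ∅, {∅}, {{∅}}, ... respectively. 

```python
def von_neumann(k: int) -> frozenset:
    """k-th member of ∅, {∅}, {∅, {∅}}, ...: each step maps n to n ∪ {n}."""
    n = frozenset()
    for _ in range(k):
        n = n | frozenset({n})
    return n


def zermelo(k: int) -> frozenset:
    """k-th member of ∅, {∅}, {{∅}}, ...: each step maps n to {n}."""
    n = frozenset()
    for _ in range(k):
        n = frozenset({n})
    return n


for k in range(5):
    # Columns: k, number of elements of the k-th member of each sequence.
    print(k, len(von_neumann(k)), len(zermelo(k)))
```

The k-th member of the first sequence has exactly k elements, whereas every non-empty member of the second 
sequence has exactly one element. 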


12.1.16 REMARK: Basic properties of extended finite ordinal numbers. 

Some basic general properties of extended finite ordinal numbers are given in Theorem 12.1.17. These 
properties are arguably intuitively obvious. But the proofs of these properties are probably not so obvious. 
Part (xiv) asserts, in slightly cryptic form, that extended finite ordinal numbers are totally ordered by 
set-containment. Part (xv) says the same thing in slightly less cryptic form. 
12.1.17 THEOREM: Some order-related properties of extended finite ordinal numbers. 
Let N be an extended finite ordinal number. 
(i) If N ≠ ∅, then ∅ ∈ N. 
(ii) ∀m ∈ N \ {∅}, ∅ ∈ m. 
(iii) ∅ ∈ ⋂(N \ {∅}) if N \ {∅} ≠ ∅. 
(iv) ∀m ∈ N, m ⊆ N. In other words, ⋃N ⊆ N. [Transitivity of “∈” on N.] 
(v) ∀m ∈ N, ∀a ∈ m, a ∈ N. (In other words, (a ∈ m and m ∈ N) ⇒ a ∈ N.) 
(vi) ∀m ∈ N, m ⊊ N. 
(vii) ∀m ∈ N, ∀b ∈ m, b ⊆ m. In other words, ∀m ∈ N, ⋃m ⊆ m. [Transitivity of “∈” on elements of N.] 
(viii) ∀m ∈ N, ∀b ∈ m, b ∪ {b} ⊆ m. 
(ix) ∀m ∈ N, ∀b ∈ m, b ∪ {b} ∈ N. 
(x) ∀m ∈ N, ∀a, b ∈ m, (a ∈ b ∨ a = b ∨ b ∈ a). [The elements of N are totally ordered by “∈”.] 
(xi) If m ∈ N, then m is an extended finite ordinal number. 
(xii) If m ∈ N, then m ∪ {m} is an extended finite ordinal number. 
(xiii) If m ∈ N, then {x ∈ N; m ∉ x} is an extended finite ordinal number. 
(xiv) ∀m ∈ N, m ∪ {m} = {x ∈ N; m ∉ x}. [N is totally ordered by “∈”.] 
(xv) ∀a, b ∈ N, (a ∈ b ∨ a = b ∨ b ∈ a). [N is totally ordered by “∈”.] 
(xvi) ∀a, b ∈ N, (a ∈ b ⇔ a ⊊ b). 


PROOF: For part (i), let N be an extended finite ordinal number. Suppose ∅ ∉ N. Then it follows from 
Definition 12.1.3 that ∀m ∈ N, ∃a ∈ N, m = a ∪ {a}. Hence N = ∅ by Theorem 12.1.11. 

For part (ii), let N be an extended finite ordinal number. Let N′ = {m ∈ N \ {∅}; ∅ ∉ m}. Let m ∈ N′. 
Then m ∈ N \ {∅}. So ∃a ∈ N, m = a ∪ {a} by Definition 12.1.3. But ∅ ∉ m. So for a such that m = a ∪ {a}, 
both ∅ ∉ a and ∅ ≠ a. So a ∈ N′. Therefore ∃a ∈ N′, m = a ∪ {a}. So ∀m ∈ N′, ∃a ∈ N′, m = a ∪ {a}. 
Hence N′ = ∅ by Theorem 12.1.11. 


Part (iii) follows from part (ii) and Theorem 8.4.4 (ii). 


For part (iv), let N be an extended finite ordinal number. Suppose that m ⊈ N for some m ∈ N. In 
other words, (⋃N) \ N ≠ ∅. Let x ∈ (⋃N) \ N, and let N′ = {m ∈ N; x ∈ m}. Then N′ ≠ ∅, and 
∀m ∈ N′, m ≠ ∅, and ∀m ∈ N′, ∃a ∈ N, m = a ∪ {a} because N′ ⊆ N. If m = a ∪ {a} with m ∈ N′ 
and a ∈ N, then x ∈ m, and so x ∈ a or x = a. But x ∉ N and a ∈ N. So x ≠ a. Therefore x ∈ a. 
So a ∈ N′. Consequently, ∀m ∈ N′, ∃a ∈ N′, m = a ∪ {a}. Therefore N′ = ∅ by Theorem 12.1.11, which 
contradicts N′ ≠ ∅. Hence m ⊆ N for all m ∈ N. 

Part (v) is a paraphrase of part (iv) by Notation 7.3.3. 

For part (vi), let N be an extended finite ordinal number. Let m ∈ N. Then m ⊆ N by part (iv). But 
m = N would imply N ∈ N, which is excluded by the ZF axiom of regularity. Hence m ⊊ N. 

For parts (vii), (viii) and (ix), let N be an extended finite ordinal number. Let m ∈ N and b ∈ m. Define 
N′ = {x ∈ N; b ∈ x ∧ x ⊆ m}. Then N′ ≠ ∅ because m ∈ N′. Clearly ∅ ∉ N′ because b ∉ ∅. So by 
Theorem 12.1.13, ∃x ∈ N′, ∃a ∈ N \ N′, x = a ∪ {a}. For such x and a, either b ∉ a or a ⊈ m (or both), 
and x = a ∪ {a} satisfies b ∈ x and x ⊆ m. But a ⊆ m because a ⊆ x. So b ∉ a. Therefore b = a 
because b ∈ a ∪ {a}. So x = b ∪ {b}. So b ∪ {b} ∈ N′. Therefore b ∪ {b} ⊆ m, which proves part (viii). 
Therefore b ⊆ m, which proves part (vii). Also b ∪ {b} ∈ N because N′ ⊆ N, which proves part (ix). 


The proof of part (x) follows the pattern of part (vii). Let N be an extended finite ordinal number. Let 
m ∈ N, a ∈ m and b ∈ m. Define N′ = {x ∈ N; a ∈ x ∧ b ∈ x ∧ x ⊆ m}. Then m ∈ N′. So N′ ≠ ∅. 
Clearly ∅ ∉ N′ because a ∉ ∅. So ∀x ∈ N′, ∃y ∈ N, x = y ∪ {y} by Definition 12.1.3. Assume that 
∀x ∈ N′, (x ≠ a ∪ {a} ∧ x ≠ b ∪ {b}). Then ∀x ∈ N′, ∃y ∈ N, (x = y ∪ {y} ∧ x ≠ a ∪ {a} ∧ x ≠ b ∪ {b}). 
So ∀x ∈ N′, ∃y ∈ N, (x = y ∪ {y} ∧ y ≠ a ∧ y ≠ b). But the conditions a ∈ x, b ∈ x, x = y ∪ {y}, y ≠ a 
and y ≠ b together imply a ∈ y and b ∈ y. So ∀x ∈ N′, ∃y ∈ N, (x = y ∪ {y} ∧ a ∈ y ∧ b ∈ y). Therefore 
∀x ∈ N′, ∃y ∈ N′, x = y ∪ {y}. From this, it follows by Theorem 12.1.11 that N′ = ∅. Therefore the 
assumption ∀x ∈ N′, (x ≠ a ∪ {a} ∧ x ≠ b ∪ {b}) must be false. So ∃x ∈ N′, (x = a ∪ {a} ∨ x = b ∪ {b}). 
So a ∪ {a} ∈ N′ or b ∪ {b} ∈ N′. So either b ∈ a ∪ {a} ⊆ m or a ∈ b ∪ {b} ⊆ m. So either (b ∈ a ∨ 
b = a) ∧ a ∪ {a} ⊆ m or (a ∈ b ∨ a = b) ∧ b ∪ {b} ⊆ m. Hence either a ∈ b or a = b or b ∈ a. 


For part (xi), let N be an extended finite ordinal number. Let m ∈ N. Let b ∈ m \ {∅}. Then b ∈ N 
because m ⊆ N by part (iv). So ∃a ∈ N, b = a ∪ {a} by Definition 12.1.3. Then a ∈ b for some a ∈ N 
satisfying b = a ∪ {a}. But b ⊆ m by part (vii). So a ∈ m and b = a ∪ {a} for some a ∈ N. So 
∀b ∈ m, (b = ∅ ∨ ∃a ∈ m, b = a ∪ {a}). Hence m is an extended finite ordinal number by Definition 12.1.3. 


For part (xii), let N be an extended finite ordinal number. Let m ∈ N. Then m is an extended finite ordinal 
number by part (xi). So ∀b ∈ m, (b = ∅ ∨ ∃a ∈ m ∪ {m}, b = a ∪ {a}). If m ≠ ∅, then m = c ∪ {c} for 
some c ∈ N. Then c ∈ m. So c ∈ m ∪ {m}. Therefore ∀b ∈ m ∪ {m}, (b = ∅ ∨ ∃a ∈ m ∪ {m}, b = a ∪ {a}). 
Hence m ∪ {m} is an extended finite ordinal number. 


For part (xiii), let N be an extended finite ordinal number. Let m ∈ N. Let Nₘ = {x ∈ N; m ∉ x}. Let 
x ∈ Nₘ \ {∅} ⊆ N. Then by Definition 12.1.3, ∃y ∈ N, x = y ∪ {y}. But if m ∉ x, then m ∉ y. So 
∃y ∈ Nₘ, x = y ∪ {y}. Therefore ∀x ∈ Nₘ, (x = ∅ ∨ ∃y ∈ Nₘ, x = y ∪ {y}). Hence {x ∈ N; m ∉ x} is an 
extended finite ordinal number. 


For part (xiv), let N be an extended finite ordinal number. Let m ∈ N, and let Nₘ = {x ∈ N; m ∉ x}. 
Then m ∪ {m} ⊆ Nₘ because m ∪ {m} ⊆ N and (x ∈ m ∨ x = m) ⇒ m ∉ x. For the first stage of 
this proof, let N′ₘ = {y ∈ Nₘ; y ∉ m ∪ {m}} = Nₘ \ (m ∪ {m}) = {y ∈ N; m ∉ y ∧ y ∉ m ∪ {m}}. 
Then ∅ ∉ N′ₘ because ∅ ∈ N by part (i), since N ≠ ∅, and m ∉ ∅ and ∅ ∈ m ∪ {m} by parts (xii) and (i). 
Assume that N′ₘ ≠ ∅. Since Nₘ is an extended finite ordinal number by part (xiii), Theorem 12.1.13 implies 
∃y ∈ N′ₘ, ∃z ∈ Nₘ \ N′ₘ, y = z ∪ {z}. Then m ∉ y, m ∉ z, y ∉ m ∪ {m}, z ∈ m ∪ {m}, and y = z ∪ {z} for 
such y and z. From z ∈ m ∪ {m}, it follows that z ∈ m or z = m. From m ∉ y = z ∪ {z}, it follows that 
m ∉ z and m ≠ z. So z = m is excluded. Therefore z ∈ m. Note that z ∪ {z} = y ∉ m ∪ {m}. (This first 
stage of the proof of part (xiv) has found a “branch point” z between the sets m ∪ {m} and Nₘ.) 


For the second stage of the proof of part (xiv), let N″ₘ = {w ∈ m ∪ {m}; z ∈ w}. Then N″ₘ ≠ ∅ 
because m ∈ N″ₘ. But ∅ ∉ N″ₘ because z ∉ ∅, and m ∪ {m} is an extended finite ordinal number by 
part (xii). So ∃w ∈ N″ₘ, ∃v ∈ (m ∪ {m}) \ N″ₘ, w = v ∪ {v} by Theorem 12.1.13. Then v ∈ m ∪ {m}, 
w ∈ m ∪ {m}, z ∉ v and z ∈ w = v ∪ {v}. So z ∈ v or z = v. But z ∈ v is excluded by z ∉ v. So z = v. So 
z ∪ {z} = v ∪ {v} = w ∈ m ∪ {m}. This contradicts the conclusion that z ∪ {z} ∉ m ∪ {m}. So N′ₘ = ∅. 
Hence m ∪ {m} = {x ∈ N; m ∉ x}. (The proof of part (xiv) is illustrated in Figure 12.1.2. The question 
marks in the diagram indicate what would happen if the axiom of regularity was not enforced.) 


[Diagram: the sets m, m ∪ {m}, Nₘ = {x ∈ N; m ∉ x} and N, with the elements w = v ∪ {v} and y = z ∪ {z} marked.] 

Figure 12.1.2 Strategy for a proof that Nₘ ⊆ m ∪ {m} 


For part (xv), let N be an extended finite ordinal number, and let a, b ∈ N. Suppose that a ∉ b. Let 
Nₐ = {x ∈ N; a ∉ x}. Then b ∈ Nₐ. But Nₐ = a ∪ {a} by part (xiv). So b ∈ a or b = a. 

For part (xvi), let N be an extended finite ordinal number, and let a, b ∈ N. Suppose that a ∈ b. Then a ⊊ b 
by part (vi). Now suppose that a ⊊ b. Then a ≠ b. So a ∈ b or b ∈ a by part (xv). Suppose that b ∈ a. Then 
b ⊊ a by part (vi). This contradicts the assumption a ⊊ b. So a ∈ b. 
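
Several parts of Theorem 12.1.17 can be spot-checked on a small finite ordinal. The following Python sketch 
is a sanity check on one example, not a proof. It models sets as frozenset objects, and the helper name 
von_neumann and the variable names are hypothetical. 

```python
from itertools import product


def von_neumann(k: int) -> frozenset:
    """Return the finite ordinal with k elements: ∅, {∅}, {∅, {∅}}, ..."""
    n = frozenset()
    for _ in range(k):
        n = n | frozenset({n})
    return n


N = von_neumann(5)
pairs = list(product(N, repeat=2))

part_iv = all(m <= N for m in N)                            # m ⊆ N
part_vi = all(m < N for m in N)                             # m ⊊ N
part_xiv = all(m | frozenset({m}) == frozenset(x for x in N if m not in x) for m in N)
part_xv = all((a in b) or (a == b) or (b in a) for a, b in pairs)
part_xvi = all((a in b) == (a < b) for a, b in pairs)       # a ∈ b ⇔ a ⊊ b

print(part_iv, part_vi, part_xiv, part_xv, part_xvi)        # True True True True True
```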


12.1.18 REMARK: The ZF axiom of regularity guarantees extended finite ordinals have a lower bound. 

In Theorem 12.1.17, the ZF axiom of regularity, via Theorems 12.1.11 and 12.1.13, plays a pivotal role which 
is analogous to the principle of mathematical induction. Since the elements of extended finite ordinals are 
totally ordered by the membership relation, the ZF axiom of regularity effectively guarantees that every 
subset of such an ordinal must have a minimum element. This is a kind of well-ordering property. Hence 
induction-style proofs are able to be used in Theorem 12.1.11, arriving at contradictions by examining the 
minimum object of a set of objects which do not satisfy some asserted proposition. 
12.1.19 THEOREM: Total order of extended finite ordinal numbers by set-inclusion. 
Let N₁ and N₂ be extended finite ordinal numbers. 


(i) N₁ ∩ N₂ is an extended finite ordinal number. 
(ii) N₁ ∪ N₂ is an extended finite ordinal number. 
(iii) Either N₁ ⊆ N₂ or N₂ ⊆ N₁. 


PROOF: For part (i), let N₁ and N₂ be extended finite ordinal numbers and let N = N₁ ∩ N₂. Let m ∈ N. 
Then m = ∅ ∨ (∃a₁ ∈ N₁, m = a₁ ∪ {a₁}) and m = ∅ ∨ (∃a₂ ∈ N₂, m = a₂ ∪ {a₂}) by Definition 12.1.3. 
Therefore either m = ∅ or (∃a₁ ∈ N₁, m = a₁ ∪ {a₁}) ∧ (∃a₂ ∈ N₂, m = a₂ ∪ {a₂}). So either m = ∅ or 
∃a₁ ∈ N₁, ∃a₂ ∈ N₂, m = a₁ ∪ {a₁} = a₂ ∪ {a₂}. But a₁ = a₂ for such a₁ and a₂ by Theorem 12.1.14. 
Therefore for all m ∈ N, either m = ∅ or ∃a ∈ N, m = a ∪ {a}. Hence N = N₁ ∩ N₂ is an extended finite 
ordinal number. 


For part (ii), let N₁ and N₂ be extended finite ordinal numbers and let N = N₁ ∪ N₂. Let m ∈ N. Then either 
m ∈ N₁ ∩ N₂ or m ∈ N₁ \ N₂ or m ∈ N₂ \ N₁. If m ∈ N₁ ∩ N₂, then either m = ∅ or ∃a ∈ N, m = a ∪ {a}, 
as in the proof of part (i). Suppose that m ∈ N₁ \ N₂. Then either m = ∅ or ∃a₁ ∈ N₁, m = a₁ ∪ {a₁} by 
Definition 12.1.3. Therefore either m = ∅ or ∃a₁ ∈ N, m = a₁ ∪ {a₁}. If m ∈ N₂ \ N₁, the same consequence 
follows. Hence N = N₁ ∪ N₂ is an extended finite ordinal number by Definition 12.1.3. 

For part (iii), let N₁ and N₂ be extended finite ordinal numbers, and suppose that N₁ ⊈ N₂ and N₂ ⊈ N₁. 
Let x₁ ∈ N₁ \ N₂ and x₂ ∈ N₂ \ N₁. By part (ii), N₁ ∪ N₂ is an extended finite ordinal number. So by 
Theorem 12.1.17 (xv), x₁ ∈ x₂, x₁ = x₂ or x₂ ∈ x₁. Suppose that x₁ ∈ x₂. Then x₁ ∈ N₂ because x₂ ⊆ N₂ 
by Theorem 12.1.17 (iv). This contradicts the assumption x₁ ∈ N₁ \ N₂. So either N₁ ⊆ N₂ or N₂ ⊆ N₁. 
The same conclusion follows from the assumptions x₁ = x₂ or x₂ ∈ x₁. Hence N₁ ⊆ N₂ or N₂ ⊆ N₁. 
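
The three parts of Theorem 12.1.19 can be illustrated on a pair of small finite ordinals. The following Python 
sketch is a sanity check on one example only. The helper names succ, von_neumann and 
is_extended_finite_ordinal are hypothetical, and the checker assumes the condition 
∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}) of Definition 12.1.3. 

```python
def succ(a: frozenset) -> frozenset:
    """Return the successor a ∪ {a}."""
    return a | frozenset({a})


def von_neumann(k: int) -> frozenset:
    """Return the finite ordinal with k elements."""
    n = frozenset()
    for _ in range(k):
        n = succ(n)
    return n


def is_extended_finite_ordinal(n: frozenset) -> bool:
    """Check that every m in n is empty or equals a ∪ {a} for some a in n."""
    return all(m == frozenset() or any(m == succ(a) for a in n) for m in n)


N1, N2 = von_neumann(3), von_neumann(5)
part_i = is_extended_finite_ordinal(N1 & N2)     # intersection
part_ii = is_extended_finite_ordinal(N1 | N2)    # union
part_iii = (N1 <= N2) or (N2 <= N1)              # comparability by inclusion
print(part_i, part_ii, part_iii)                 # True True True
```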


12.1.20 REMARK: Distinguishing finite from infinite ordinal numbers. 

One may be tempted to try to prove that the inclusion ⋃N ⊆ N in Theorem 12.1.17 (iv) is either always an 
equality or always a strict inequality. (This set inequality is the transitivity property of ordinal numbers.) 
But if N = ∅, then ⋃N = ∅ = N. If N = {∅}, then ⋃N = ∅ ⊊ N. If N = {∅, {∅}}, then ⋃N = {∅} ⊊ N. 
So the inclusion is sometimes an equality and sometimes a strict inequality. In fact, this question determines 
(roughly speaking) whether an ordinal number is finite or infinite. (Note that it does not distinguish finite 
from infinite for general ordinal numbers.) 


Theorem 12.1.21 (xii) states that the criterion ⋃N ⊊ N is equivalent to the condition ∃x ∈ N, x ∪ {x} ∉ N. 
Since the set x ∪ {x} may be interpreted as “the successor of x”, this condition may be interpreted to mean 
that there is an element of N which has no successor, which seems intuitively to be a necessary condition for 
a set to be finite. The condition ∃x ∈ N, x ∪ {x} ∉ N has a stronger intuitive appeal, whereas the equivalent 
condition ⋃N ⊊ N has a stronger “minimalist” appeal. 
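
The two equivalent criteria can be compared on the first few finite ordinals. The following Python sketch is 
an informal illustration only; the helper names succ, von_neumann and big_union are hypothetical. 

```python
def succ(a: frozenset) -> frozenset:
    """Return the successor a ∪ {a}."""
    return a | frozenset({a})


def von_neumann(k: int) -> frozenset:
    """Return the finite ordinal with k elements."""
    n = frozenset()
    for _ in range(k):
        n = succ(n)
    return n


def big_union(n: frozenset) -> frozenset:
    """Return ⋃n, the union of all elements of n."""
    return frozenset().union(*n)


rows = []
for k in range(4):
    N = von_neumann(k)
    U = big_union(N)
    # Elements of N whose successor is not an element of N.
    top = [x for x in N if succ(x) not in N]
    rows.append((k, U == N, U < N, [len(x) for x in top]))
    print(rows[-1])
```

For N = ∅ the union equals N and no element lacks a successor in N. For each non-empty finite ordinal in this 
range, the union is a proper subset of N and exactly one element (the last one) has its successor outside N. 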


12.1.21 THEOREM: Some properties to distinguish finite from infinite extended finite ordinals. 
Let N be an extended finite ordinal number. 
(i) If x ∈ N and x ∪ {x} ∉ N, then N \ {x} is an extended finite ordinal number. 
(ii) If x ∈ N and N \ {x} is an extended finite ordinal number, then x ∪ {x} ∉ N. 
(iii) If x ∈ N and x ∪ {x} ∉ N, then N ∪ {x ∪ {x}} is an extended finite ordinal number. 
(iv) If x ∈ N and x ∪ {x} ∉ N, then N = x ∪ {x}. 
(v) (∃x ∈ N, x ∪ {x} ∉ N) ⇔ (∃x ∈ N, N = x ∪ {x}). 
(vi) If ∃x ∈ N, x ∪ {x} ∉ N, then N ∪ {N} is an extended finite ordinal number. 
(vii) If ∃x ∈ N, x ∪ {x} ∉ N, then ⋃N ⊊ N. 
(viii) If x ∈ N and x ∪ {x} ∉ N, then ⋃N = x. 
(ix) If ∀x ∈ N, x ∪ {x} ∈ N, then ⋃N = N. 
(x) ⋃N is an extended finite ordinal number. 
(xi) (∀x ∈ N, x ∪ {x} ∈ N) ⇔ ⋃N = N. 
(xii) (∃x ∈ N, x ∪ {x} ∉ N) ⇔ ⋃N ⊊ N. 
(xiii) If N ≠ ∅ and N ∪ {N} is an extended finite ordinal number, then ∃x ∈ N, x ∪ {x} ∉ N. 
(xiv) If N ∪ {N} is an extended finite ordinal number, then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N. 
(xv) (∃x ∈ N, x ∪ {x} ∉ N) ⇔ ((N ≠ ∅) ∧ (N ∪ {N} is an extended finite ordinal number)). 


PROOF: For part (i), let N be an extended finite ordinal number, and let x ∈ N satisfy x ∪ {x} ∉ N. 
Suppose that N \ {x} is not an extended finite ordinal number. Then ∃m′ ∈ N \ {x}, (m′ ≠ ∅ ∧ ∀a ∈ 
N \ {x}, m′ ≠ a ∪ {a}) follows from Definition 12.1.3. So some m′ ∈ N satisfies m′ ≠ x, m′ ≠ ∅, and 
∀a ∈ N \ {x}, m′ ≠ a ∪ {a}. But ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}) by Definition 12.1.3 because 
N is an extended finite ordinal number. So m′ satisfies ∃a ∈ N, m′ = a ∪ {a}. Combining this with 
∀a ∈ N \ {x}, m′ ≠ a ∪ {a} gives a = x and m′ = x ∪ {x}. But this contradicts the assumption x ∪ {x} ∉ N. 
Therefore N \ {x} is an extended finite ordinal number. 


For part (ii), let N be an extended finite ordinal number, and let x ∈ N be such that N \ {x} is an extended 
finite ordinal number. Then ∀m ∈ N \ {x}, (m = ∅ ∨ (∃a ∈ N \ {x}, m = a ∪ {a})) by Definition 12.1.3. 
Suppose that x ∪ {x} ∈ N. Then x ∪ {x} ∈ N \ {x} because x ≠ x ∪ {x}. Let m = x ∪ {x}. Then m ≠ ∅. 
So ∃a ∈ N \ {x}, m = a ∪ {a}. But x ∪ {x} = a ∪ {a} implies x = a by Theorem 12.1.14. So x ∈ N \ {x}, 
which is a contradiction. Hence x ∪ {x} ∉ N. 


For part (iii), let N be an extended finite ordinal number. Let x ∈ N satisfy x ∪ {x} ∉ N. Let N′ = 
N ∪ {x ∪ {x}}. Let m ∈ N′. Then m ∈ N or m = x ∪ {x}. If m ∈ N, then m = ∅ ∨ ∃a ∈ N, m = a ∪ {a} by 
Definition 12.1.3. So m = ∅ ∨ ∃a ∈ N′, m = a ∪ {a}. If m = x ∪ {x}, then m = ∅ ∨ ∃a ∈ N′, m = a ∪ {a} 
holds with a = x ∈ N′. So ∀m ∈ N′, (m = ∅ ∨ ∃a ∈ N′, m = a ∪ {a}). Hence N′ is an extended finite 
ordinal number by Definition 12.1.3. 


For part (iv), let N be an extended finite ordinal number. Let x ∈ N satisfy x ∪ {x} ∉ N. Then x ∪ {x} ⊆ N 
since x ⊆ N by Theorem 12.1.17 (iv). Suppose that y ∈ N \ (x ∪ {x}). Then y ∉ x and y ≠ x. Therefore 
x ∈ y by Theorem 12.1.17 (xv). So x ∪ {x} ∈ N by Theorem 12.1.17 (ix). This is a contradiction. Therefore 
N \ (x ∪ {x}) = ∅. Hence N = x ∪ {x}. 

For part (v), let N be an extended finite ordinal number. Suppose that ∃x ∈ N, x ∪ {x} ∉ N. Then 
∃x ∈ N, N = x ∪ {x} by part (iv). Now suppose that ∃x ∈ N, N = x ∪ {x}. Then ∃x ∈ N, x ∪ {x} ∉ N 
because N ∉ N by the axiom of regularity. 


Part (vi) follows immediately from parts (iii) and (iv). 


For part (vii), let N be an extended finite ordinal number which satisfies ∃x ∈ N, x ∪ {x} ∉ N. Let x ∈ N 
satisfy x ∪ {x} ∉ N. Suppose that ⋃N = N. Then x ∪ {x} = ⋃N by part (iv). So x ∈ y for some y ∈ N. 
Therefore x ∪ {x} ∈ N by Theorem 12.1.17 (ix). This is a contradiction. So ⋃N ≠ N. But ⋃N ⊆ N by 
Theorem 12.1.17 (iv). Hence ⋃N ⊊ N. 


For part (viii), let N be an extended finite ordinal number. Let x ∈ N satisfy x ∪ {x} ∉ N. Then x ⊆ ⋃N 
by the definition of ⋃N because x ∈ N. Suppose that x ≠ ⋃N. Then there exists y ∈ ⋃N with y ∉ x. 
Therefore, since ⋃N ⊆ N by Theorem 12.1.17 (iv), there exist y, z ∈ N such that y ∈ z and y ∉ x. But then 
x = y or x ∈ y by Theorem 12.1.17 (xv). So either x ∈ z or else x ∈ y and y ∈ z. In either case, it follows 
from Theorem 12.1.17 (ix) that x ∪ {x} ∈ N. This is a contradiction. Hence ⋃N = x. 


For part (ix), let N be an extended finite ordinal number which satisfies ∀x ∈ N, x ∪ {x} ∈ N. Then 
∀x ∈ N, ∃y ∈ N, x ∈ y because x ∈ x ∪ {x} for any set x. So ∀x ∈ N, x ∈ ⋃N. Therefore N ⊆ ⋃N. But 
⋃N ⊆ N by Theorem 12.1.17 (iv). Hence ⋃N = N. 


For part (x), let N be an extended finite ordinal number. Then N must satisfy either ∃x ∈ N, x ∪ {x} ∉ N 
or ∀x ∈ N, x ∪ {x} ∈ N, since these propositions are logical negations of each other. In the former case, 
part (viii) implies that ⋃N equals x, which is an extended finite ordinal number by Theorem 12.1.17 (xi). 
In the latter case, part (ix) implies that ⋃N equals N, which is also an extended finite ordinal number. 


Parts (xi) and (xii) follow immediately from parts (vii) and (ix). 


For part (xiii), let N be a non-empty extended finite ordinal number such that N ∪ {N} is an extended finite 
ordinal number. Then ∀m ∈ N ∪ {N}, (m = ∅ ∨ (∃a ∈ N ∪ {N}, m = a ∪ {a})) by Definition 12.1.3. In the 
case m = N, this gives N = ∅ ∨ (∃a ∈ N ∪ {N}, N = a ∪ {a}). But N ≠ ∅. So ∃a ∈ N ∪ {N}, N = a ∪ {a}. 
But N ∉ N by the axiom of regularity. Hence ∃a ∈ N, a ∪ {a} ∉ N. 


Part (xiv) follows from part (xiii). 


Part (xv) follows from parts (vi) and (xiii). 
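
Some parts of Theorem 12.1.21 can be spot-checked for a non-empty finite ordinal N and its last element x. 
The following Python sketch is a sanity check on one example, not a proof. The helper names succ, 
von_neumann, big_union and is_extended_finite_ordinal are hypothetical, and the checker assumes the condition 
∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}) of Definition 12.1.3. 

```python
def succ(a: frozenset) -> frozenset:
    """Return the successor a ∪ {a}."""
    return a | frozenset({a})


def von_neumann(k: int) -> frozenset:
    """Return the finite ordinal with k elements."""
    n = frozenset()
    for _ in range(k):
        n = succ(n)
    return n


def big_union(n: frozenset) -> frozenset:
    """Return ⋃n, the union of all elements of n."""
    return frozenset().union(*n)


def is_extended_finite_ordinal(n: frozenset) -> bool:
    """Check that every m in n is empty or equals a ∪ {a} for some a in n."""
    return all(m == frozenset() or any(m == succ(a) for a in n) for m in n)


N = von_neumann(4)
x = von_neumann(3)                      # x ∈ N and x ∪ {x} ∉ N
hypothesis = (x in N) and (succ(x) not in N)

checks = {
    "(i)": is_extended_finite_ordinal(N - frozenset({x})),
    "(iii)": is_extended_finite_ordinal(N | frozenset({succ(x)})),
    "(iv)": N == succ(x),
    "(vi)": is_extended_finite_ordinal(succ(N)),
    "(viii)": big_union(N) == x,
}
print(hypothesis, checks)
```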
12.1.22 REMARK: Uniqueness of the infinite extended finite ordinal number. 

Theorem 12.1.23 (xii) asserts that there is at most one non-empty extended finite ordinal number N for 
which ⋃N = N. To be given a constant name and symbol, a set must exist and be unique. Existence is shown 
in Theorem 12.1.24. A constant name and symbol are bestowed in Definition 12.1.28 and Notation 12.1.29. 


12.1.23 THEOREM: Some order properties, and uniqueness of the infinite extended finite ordinal. 
Let N₁ and N₂ be extended finite ordinal numbers. 


(i) If ∅ ≠ N₁ ⊊ N₂, then ∃x ∈ N₁, x ∪ {x} ∈ N₂ \ N₁. 
(ii) If N₁ ⊊ N₂, then N₁ ∈ N₂. 
(iii) N₁ ⊊ N₂ if and only if N₁ ∈ N₂. 
(iv) N₁ ⊆ N₂ if and only if (N₁ ∈ N₂ or N₁ = N₂). 
(v) If N₁ ≠ N₂, then N₁ ∈ N₂ or N₂ ∈ N₁. 
(vi) N₁ = N₂ or N₁ ∈ N₂ or N₂ ∈ N₁. 
(vii) (N₁ = N₂ or N₁ ∈ N₂) if and only if N₂ ∉ N₁. 
(viii) N₁ ⊆ N₂ if and only if N₂ ∉ N₁. 
(ix) If N₁ ≠ N₂, then N₁ ⊊ N₂ or N₂ ⊊ N₁. 
(x) N₁ = N₂ or N₁ ⊊ N₂ or N₂ ⊊ N₁. 
(xi) N₁ ⊆ N₂ or N₂ ⊆ N₁. 
(xii) If N₁ ≠ ∅, N₂ ≠ ∅, ⋃N₁ = N₁ and ⋃N₂ = N₂, then N₁ = N₂. 
(xiii) ∀x ∈ N₁, (N₂ ⊆ x ⇒ N₂ ∈ N₁). (That is, N₁ contains all extended finite ordinals “lower” than x.) 


PROOF: For part (i), let N₁ and N₂ be extended finite ordinal numbers with ∅ ≠ N₁ ⊊ N₂. Define N′ = N₂ \ N₁. 
Then ∅ ≠ N′ ⊆ N₂, and ∅ ∉ N′ because ∅ ∈ N₁ by Theorem 12.1.17 (i). So ∃x ∈ N₂ \ N′, x ∪ {x} ∈ N′ by 
Theorem 12.1.13. In other words, ∃x ∈ N₁, x ∪ {x} ∈ N₂ \ N₁. 

For part (ii), let N₁ and N₂ be extended finite ordinal numbers with N₁ ⊊ N₂. If N₁ = ∅, then N₁ ∈ N₂ 
because ∅ ∈ N₂ by Theorem 12.1.17 (i). So suppose that N₁ ≠ ∅. Then by part (i), x ∪ {x} ∈ N₂ \ N₁ for 
some x ∈ N₁. For this x, Theorem 12.1.21 (iv) implies N₁ = x ∪ {x}. Hence N₁ ∈ N₂. 

Part (iii) follows from part (ii) and Theorem 12.1.17 (vi). 


Part (iv) follows from part (iii). 

Part (v) follows from part (ii) and Theorem 12.1.19 (iii). 

Part (vi) follows from part (v) and Theorem 4.7.9 (lxv). 


For part (vii), suppose that N₁ = N₂ or N₁ ∈ N₂. Then N₂ ∉ N₁ by Theorem 7.8.4 (i). But suppose that 
N₂ ∉ N₁. Then N₁ = N₂ or N₁ ∈ N₂ by part (vi). Hence (N₁ = N₂ or N₁ ∈ N₂) if and only if N₂ ∉ N₁. 


Part (viii) follows from parts (iv) and (vii). 


Part (ix) follows from parts (iii) and (v). 

Part (x) follows from part (ix) and Theorem 4.7.9 (lxv). 

Part (xi) follows from part (x). 

For part (xii), let N₁ and N₂ be non-empty extended finite ordinal numbers with ⋃N₁ = N₁ and ⋃N₂ = N₂. 
From Theorem 12.1.19 (iii), it follows that N₁ ⊆ N₂ or N₂ ⊆ N₁. Suppose that N₁ ⊊ N₂. Then by part (i), 
there exists x ∈ N₁ with x ∪ {x} ∈ N₂ \ N₁. So ∃x ∈ N₁, x ∪ {x} ∉ N₁. Therefore ⋃N₁ ≠ N₁ by 
Theorem 12.1.21 (vii). Similarly, N₂ ⊊ N₁ contradicts ⋃N₂ = N₂. Hence N₁ = N₂. 

For part (xiii), let x ∈ N₁ satisfy N₂ ⊆ x. Then by Theorem 12.1.17 (xi), x is an extended finite ordinal 
number. So N₂ = x or N₂ ∈ x by part (iv). If N₂ = x, then clearly N₂ ∈ N₁. If N₂ ∈ x, then N₂ ∈ N₁ by 
Theorem 12.1.17 (v). Hence N₂ ⊆ x ⇒ N₂ ∈ N₁. 


12.1.24 THEOREM: Existence of an infinite extended finite ordinal number. 
There exists a non-empty extended finite ordinal number N such that ⋃N = N. 
PROOF: The ZF infinity axiom, Definition 7.2.4 (8), asserts the existence of a set X which satisfies: 


∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))). 


This is more conveniently expressed as 


∀z, (z ∈ X ⇔ (z = ∅ ∨ (∃y ∈ X, z = y ∪ {y}))). (12.1.3) 


Any set X which satisfies line (12.1.3) must satisfy ∀z ∈ X, (z = ∅ ∨ (∃y ∈ X, z = y ∪ {y})). So any set X 
which satisfies line (12.1.3) is an extended finite ordinal number according to Definition 12.1.3. 


Any set X satisfying line (12.1.3) clearly must satisfy ∅ ∈ X because (z = ∅) ⇒ (z ∈ X). So X ≠ ∅. For 
any X satisfying line (12.1.3), let y ∈ X and let z = y ∪ {y}. This z satisfies ∃y ∈ X, z = y ∪ {y}. Therefore 
z ∈ X for this z. Thus z = y ∪ {y} ∈ X for any y ∈ X. In other words, ∀y ∈ X, ∃z ∈ X, z = y ∪ {y}. So 
∀y ∈ X, ∃z ∈ X, y ∈ z. This is equivalent to ∀y ∈ X, y ∈ ⋃X. And this is equivalent to X ⊆ ⋃X. Hence 
⋃X = X because ⋃X ⊆ X by Theorem 12.1.17 (iv). 


12.1.25 THEOREM: Existence and uniqueness of the infinite extended finite ordinal number. 
There is a unique non-empty extended finite ordinal number N such that ⋃N = N. 


PROOF: Existence follows from Theorem 12.1.24. Uniqueness follows from Theorem 12.1.23 (xii). 


12.1.26 REMARK: The regularity and infinity axioms shape the set of finite ordinal numbers. 

Roughly speaking, the axiom of regularity guarantees that the set of finite ordinal numbers is bounded 
“on the left”, while the axiom of infinity guarantees that it is unbounded “on the right”. In other words, 
the axiom of regularity forbids infinite chains of set-membership relations “on the left”, while the axiom of 
infinity requires the existence of infinite chains of set-membership relations “on the right”. 


12.1.27 REMARK: The acceptance or rejection of the infinity axiom is "academic". 

It is notable that the properties of finite ordinal numbers can be fully developed from Definition 12.1.3 until 
Theorem 12.1.24 without asserting the existence of an infinite set. The infinity axiom’s assertion that an 
infinite ordinal number exists is apparently superfluous. Since there is no practical way to determine from 
real-world experience whether infinite sets exist or not, the issue is metaphysical. In the absence of an infinity 
axiom, one could use a form of words such as: “If an infinite set exists, then ...”. All calculations would then 
be exactly the same except that a caveat would need to be attached to calculations involving infinite sets. 
For example, instead of the existential quantifier “∃x”, one could write instead the tagged quantifier “∃^∞ x”, 
signifying that existence is conditional on the infinity axiom. Then someone who rejects the infinity axiom, an 
ultrafinitist for example, can accept the proposition as conditional on an existence which may or may not be 
true, whereas the infinity axiom believer may ignore the caveat. The same calculations are made regardless. 
In this sense, the infinity axiom is an insubstantial philosophical matter, one might say “academic”. The 
axiom of infinity merely “anoints” the set of all ordinal numbers, accepting it into the mathematical arena 
as a dinkum set. It is possible for people to work for causes in which they do not believe. Similarly, one may 
carry out calculations concerning conjectural metaphysical entities in which one does not believe. 

One may argue similarly that the axiom of choice could have a special existential quantifier “∃^AC x”, for 
example. Once again, the calculations would be the same. However, almost nothing could have been done 
in the applied mathematics of the last 400 years without infinity-related concepts such as limits, whereas 
very little of a concrete practical nature ensues from AC-dependent existence assertions. Therefore axiom- 
dependence tagging has a practical benefit in the case of the axioms of choice, whereas infinity-axiom tagging 
would be annoyingly ubiquitous. 


12.1.28 DEFINITION: The set of finite ordinal numbers is the unique non-empty extended finite ordinal 
number N such that ⋃N = N. 


In other words, the set of finite ordinal numbers is the unique set N such that: 


(i) N ≠ ∅, 
(ii) ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}), 
(iii) ⋃N = N. 
12.1.29 NOTATION: ω denotes the set of finite ordinal numbers. 


12.1.30 THEOREM: Some basic properties of the infinite extended finite ordinal number. 
(i) ω is an extended finite ordinal number. 
(ii) ∅ ∈ ω. 
(iii) ∀m ∈ ω, m ∪ {m} ∈ ω. 
(iv) ∀m₁, m₂ ∈ ω, ((m₁ ∪ {m₁} = m₂ ∪ {m₂}) ⇒ m₁ = m₂). 
(v) ∅ ∈ ⋂(ω \ {∅}). 


PROOF: Part (i) follows from Theorem 12.1.25 and Definition 12.1.28. 
Part (ii) follows from Theorem 12.1.17 (i) and Definition 12.1.28 (i). 
Part (iii) follows from Theorem 12.1.21 (xi) and Definition 12.1.28 (iii). 


For part (iv), let m₁, m₂ ∈ ω satisfy m₁ ∪ {m₁} = m₂ ∪ {m₂}. Then m₁ ∈ m₁ ∪ {m₁}. So m₁ ∈ m₂ or 
m₁ = m₂. Similarly, m₂ ∈ m₁ or m₂ = m₁. Suppose that m₁ ≠ m₂. Then m₁ ∈ m₂ and m₂ ∈ m₁, which 
contradicts Theorem 7.8.4 (ii). Hence m₁ = m₂. 

Part (v) follows from Theorem 12.1.17 (iii) and parts (ii) and (iii). 


12.1.31 REMARK: Existence and uniqueness of w justifies a constant name. 

Existence and uniqueness of the set N in Definition 12.1.28 is guaranteed by Theorem 12.1.25. So the symbol 
ω is a constant name for a fixed object in Zermelo-Fraenkel set theory. This constant name for a fixed object 
probably ranks second in importance next to the constant symbolic name ∅ for the empty set. 


12.1.32 REMARK: Notation for the set of ordinal numbers. 
The notation “ω” for the set of finite ordinal numbers is generally attributed to Cantor. (See for example 
Cajori [242], Volume 2, page 45; Quine [382], page 151.) Cantor used “ω” for the “first number of the second 
number-class” in an 1883 article, explaining his choice in a footnote as follows. (See Cantor [407], page 577, 
also in Cantor [344], page 195. Part 2 refers to his 1880 paper, Cantor [406], also in Cantor [344], page 147.) 


Das Zeichen ∞, welches ich in Nr. 2 dieses Aufsatzes (Bd. XVII, pag. 357) gebraucht habe, ersetze 
ich von nun an durch ω, weil das Zeichen ∞ schon vielfach zur Bezeichnung von unbestimmten 
Unendlichkeiten verwandt wird. 


This may be translated as follows. 


The symbol ∞, which I have used in part 2 of this essay (Vol. 17, page 357), I replace from now on 
with ω, because the symbol ∞ is already employed in many cases for the indication of indefinite 
infinities. 

Cantor was emphasising that his notation ω indicated an “actual infinity”, as opposed to the “indefinite 
infinity” or “potential infinity” notation ∞. The potential infinity ∞ was a limit which could never be 
reached. This Aristotelian non-issue was still quite active in philosophical circles in Cantor's time. (See 
for example Körner [461], pages 20-21, 62; Struik [249], pages 80-81; Kleene [365], pages 48-49; Weyl [157], 
pages 24-26; Boyer/Merzbach [237], page 541.) 


12.1.33 REMARK: The relation between ω and the finite ordinal numbers. 
The name “set of finite ordinal numbers” in Definition 12.1.28 is justified by the name “finite ordinal number” 
in Definition 12.1.34, together with Theorem 12.1.37 (i). 


12.1.34 DEFINITION: 
A finite ordinal number is an extended finite ordinal number N such that N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N. 


12.1.35 THEOREM: Some equivalent conditions for extended finite ordinals to be finite. 
Let N be an extended finite ordinal number. 
(i) N is a finite ordinal number if and only if N = ∅ or ⋃N ≠ N. 
(ii) N is a finite ordinal number if and only if N ≠ ω. 
(iii) N is a finite ordinal number if and only if N ∪ {N} is an extended finite ordinal number. 
(iv) ω is not a finite ordinal number. 
(v) N is a finite ordinal number if and only if N = ∅ or ∃x ∈ N, N = x ∪ {x}. 
(vi) N is a finite ordinal number if and only if N = ∅ or N \ {x} is an extended finite ordinal number for 
some x ∈ N. 
(vii) N is a finite ordinal number if and only if N ∈ ω. 
(viii) N is a finite ordinal number if and only if N ⊊ ω. 


PROOF: For part (i), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. 
If N ≠ ∅, then ∃x ∈ N, x ∪ {x} ∉ N by Definition 12.1.34. So ⋃N ≠ N by Theorem 12.1.21 (ix). Now 
suppose that N = ∅ or ⋃N ≠ N. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Theorem 12.1.21 (ix). So N is 
a finite ordinal number by Definition 12.1.34. 


For part (ii), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. 
Then N = ∅ or ⋃N ≠ N by part (i). So N ≠ ω by Definition 12.1.28 (i, iii). Now suppose that N is not 
a finite ordinal number. Then N ≠ ∅ and ⋃N = N by part (i). So N = ω by Definition 12.1.28 and 
Notation 12.1.29. 


For part (iii), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. Then 
N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Definition 12.1.34. So N = ∅ or N ∪ {N} is an extended finite ordinal 
number by Theorem 12.1.21 (vi). But N = ∅ implies that N ∪ {N} = {∅} is an extended finite ordinal 
number by Remark 12.1.15. So N ∪ {N} is an extended finite ordinal number. Now suppose that N ∪ {N} 
is an extended finite ordinal number. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Theorem 12.1.21 (xiv). Hence 
N is a finite ordinal number by Definition 12.1.34. 

Part (iv) follows from part (iii) and Theorem 12.1.30 (i). 

For part (v), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. If 
N ≠ ∅, then ∃x ∈ N, x ∪ {x} ∉ N. So ∃x ∈ N, N = x ∪ {x} by Theorem 12.1.21 (iv). Therefore N = ∅ or 
∃x ∈ N, N = x ∪ {x}. Now suppose that N = ∅ or ∃x ∈ N, N = x ∪ {x}. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N 
because N ∉ N by the axiom of regularity. So N is a finite ordinal number by Definition 12.1.34. 


For part (vi), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. Then 
N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Definition 12.1.34. So by Theorem 12.1.21 (i), N = ∅ or N \ {x} is an 
extended finite ordinal number for some x ∈ N. Now suppose that N = ∅ or N \ {x} is an extended finite 
ordinal number for some x ∈ N. If N = ∅, then N is a finite ordinal number by Definition 12.1.34. So let 
x ∈ N be such that N \ {x} is an extended finite ordinal number. Then x ∪ {x} ∉ N by Theorem 12.1.21 (ii). 
Therefore N is a finite ordinal number by Definition 12.1.34. 


For part (vii), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. 
Then N ≠ ω by part (ii). So N ⊊ ω or ω ⊊ N by Theorem 12.1.19 (iii). Suppose that ω ⊊ N. Then 
∃x ∈ ω, x ∪ {x} ∈ N \ ω by Theorem 12.1.23 (i). So ∃x ∈ ω, x ∪ {x} ∉ ω. So ⋃ω ≠ ω by Theorem 12.1.21 (vii). 
This contradicts Definition 12.1.28. So N ⊊ ω. Therefore N ∈ ω by Theorem 12.1.23 (ii). Now suppose 
that N ∈ ω. Then N ≠ ω by the axiom of regularity. So N is a finite ordinal number by part (ii). 


Part (viii) follows from part (ii) and Theorem 12.1.23 (iii). 


12.1.36 REMARK: The set of finite ordinal numbers is the set of finite extended finite ordinal numbers. 
Theorem 12.1.37 (i) verifies that ω is indeed the set of finite ordinal numbers, which justifies the name given to 
it in Definition 12.1.28 and Notation 12.1.29. Note that Theorem 12.1.35 (vii), which seems to be the same 
proposition, is in fact predicated on the assumption that N is an extended finite ordinal number, whereas 
Theorem 12.1.37 (i) effectively assumes only that N is a ZF set. 


12.1.37 THEOREM: Some basic properties of finite ordinal numbers. 
(i) N is a finite ordinal number if and only if N ∈ ω. 
(ii) If N is a finite ordinal number, then N ⊊ ω. In other words, ∀N ∈ ω, N ⊊ ω. 
(iii) ∀N ∈ ω, N ∪ {N} ∈ ω \ {∅}. 
(iv) ∀N ∈ ω, ⋃N ∈ ω. 


PROOF: For part (i), suppose that N is a finite ordinal number. Then N is an extended finite ordinal number 
by Definition 12.1.34. So N ∈ ω by Theorem 12.1.35 (vii). Now suppose that N ∈ ω. Then N is an extended 
finite ordinal number by Definition 12.1.28. So N is a finite ordinal number by Theorem 12.1.35 (vii). 


Part (ii) follows from part (i) and Theorem 12.1.35 (viii). 


For part (iii), let N ∈ ω. Then N is a finite ordinal number by part (i). So N ∪ {N} is an extended finite 
ordinal number by Theorem 12.1.35 (iii). Let x = N. Then x ∈ N ∪ {N}, but x ∪ {x} = N ∪ {N} ∉ N ∪ {N} 
by Theorem 7.8.4 (i). So N ∪ {N} is a finite ordinal number by Definition 12.1.34. Therefore N ∪ {N} ∈ ω 
by part (i). But N ∪ {N} ≠ ∅ because N ∈ N ∪ {N}. Hence N ∪ {N} ∈ ω \ {∅}. 


For part (iv), let N ∈ ω. Then N is a finite ordinal number by part (i). So ⋃N is an extended finite 
ordinal number by Theorem 12.1.21 (x). But N = x ∪ {x} for some x ∈ N by Theorem 12.1.35 (v), and so 
⋃N = (⋃x) ∪ x. Suppose that ⋃N = N. Then from x ∈ N, it follows that x ∈ ⋃N = (⋃x) ∪ x. But x ∉ x 
by Theorem 7.8.4 (i). So x ∈ ⋃x. Therefore x ∈ y for some y ∈ x by Notation 8.4.2. But this is impossible 
by Theorem 7.8.4 (ii). So ⋃N ≠ N. Therefore ⋃N is a finite ordinal number by Theorem 12.1.35 (i). Hence 
⋃N ∈ ω by part (i). 


12.1.38 REMARK: Precise identification of the set of extended finite ordinal numbers. 
Theorem 12.1.39 shows that ω ∪ {ω} = {N; N is an extended finite ordinal number}. In other words, 


ω ∪ {ω} = {N; ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a})}. 


This set is an “extension” of the set ω of finite ordinal numbers in the sense that the element ω of ω ∪ {ω} is 
a kind of “point at infinity” or “limit” for the finite elements in ω. (For comparison, see extended integers 
in Section 14.5 and extended real numbers in Section 16.2.) 


One may similarly write ω as follows. 


ω = {N; (N = ∅) ∨ (⋃N ≠ N ∧ (∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a})))} 
  = {N; ((N = ∅) ∨ (⋃N ≠ N)) ∧ (∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}))}. 


The expression for ω ∪ {ω} seems simpler than the expression for ω. This is because both sets have the same 
“interior equation”, but the set ω requires a “boundary condition”. 


12.1.39 THEOREM: The set of extended finite ordinals is not an extended finite ordinal. 
(i) N ∈ ω ∪ {ω} if and only if N is an extended finite ordinal number. 
(ii) ω ∪ {ω} is not an extended finite ordinal number. 


PROOF: For part (i), let N ∈ ω ∪ {ω}. If N ∈ ω, then N is a finite ordinal number by Theorem 12.1.35 (vii), 
and so N is an extended finite ordinal number by Definition 12.1.34. If N = ω, then N is an extended finite 
ordinal number by Definition 12.1.28. Therefore N is an extended finite ordinal number. Now suppose 
that N is an extended finite ordinal number. Then either ⋃N ∈ N or ⋃N = N by Theorem 12.1.17 (iv). 
If ⋃N ∈ N, then N is a finite ordinal number by Theorem 12.1.35 (i), and so N ∈ ω by Theorem 12.1.35 (vii). 
If ⋃N = N, then N ∈ ω if N = ∅, and N = ω if N ≠ ∅ by Definition 12.1.28. Hence N ∈ ω ∪ {ω}. 


For part (ii), suppose that ω ∪ {ω} is an extended finite ordinal number. Then ω ∪ {ω} ∈ ω ∪ {ω} by part (i), 
which contradicts the axiom of regularity. Hence ω ∪ {ω} is not an extended finite ordinal number. 


12.2. Order on extended finite ordinal numbers 


12.2.1 REMARK: Representation of non-negative integers as finite ordinal numbers. 
The finite ordinal numbers become impractical very rapidly if written out in full. For example, the ordinal 
numbers 0, 1, 2, 3, 4, 5 are expressed as follows in terms of the empty set. 


0 = ∅ 
1 = {∅} 
2 = {∅, {∅}} 
3 = {∅, {∅}, {∅, {∅}}} 
4 = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}} 
5 = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}} 
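
The rapid growth can be checked mechanically. The following Python sketch is not part of the formal 
development. It assumes that hereditarily finite sets are modelled by Python frozenset objects, and the 
helper names successor, ordinal and show are chosen here purely for illustration. Each finite ordinal is 
built by iterating the map x ↦ x ∪ {x} from the empty set, and is then printed in brace notation. 

```python
def successor(x: frozenset) -> frozenset:
    """Return the successor set x ∪ {x}."""
    return x | frozenset([x])


def ordinal(n: int) -> frozenset:
    """Build the finite ordinal n by applying the successor map n times to ∅."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


def show(x: frozenset) -> str:
    """Write a set in brace notation, listing elements with fewer members first."""
    if not x:
        return "∅"
    # The elements of a finite ordinal have distinct sizes, so sorting by size is unambiguous.
    return "{" + ", ".join(show(e) for e in sorted(x, key=len)) + "}"


for n in range(4):
    print(n, "=", show(ordinal(n)))
```

In this model, the written form of the ordinal n contains 2^(n−1) copies of the symbol ∅ for n ≥ 1, which 
is why the full notation becomes impractical so quickly. 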
Figure 12.2.1 The first seven ordinal numbers 


Figure 12.2.2 The ordinal number 10 


12.2.2 REMARK: Diagrams to illustrate finite ordinal numbers. 
Figure 12.2.1 illustrates the first seven ordinal numbers. An empty box represents the empty set. A box 
containing only an empty box represents {∅}, and so forth. 


The ordinal number 10 is illustrated in Figure 12.2.2. 


12.2.3 THEOREM: The minimum of a set of finite ordinals is an element of the set. 
Let S be a non-empty subset of ω. 


(i) ⋂S ∈ ω. 
(ii) ⋂S ∈ S. 


PROOF: For part (i), let S be a non-empty subset of ω. Let N = ⋂S. Then N is a well-defined set by 
Theorem 8.4.3 (ii), and N ⊆ ω because x ⊆ ω for all x ∈ S by Theorem 12.1.17 (iv). Let m ∈ N \ {∅}. 
Let n₁ ∈ S. Then m = ∅ ∨ (∃a₁ ∈ n₁, m = a₁ ∪ {a₁}) follows from Theorem 12.1.17 (xi) and Definition 12.1.3. 
So ∃a₁ ∈ n₁, m = a₁ ∪ {a₁}. But ∀n ∈ S, m ∈ n. So ∀n ∈ S, (m = ∅ ∨ (∃a ∈ n, m = a ∪ {a})) by 
Definition 12.1.3. So ∀n ∈ S, ∃a ∈ n, m = a ∪ {a}. However, a ∪ {a} = m = a₁ ∪ {a₁} implies a = a₁ by 
Theorem 12.1.14. So ∀n ∈ S, ∃a ∈ n, (a = a₁ ∧ m = a ∪ {a}). Therefore ∀n ∈ S, (a₁ ∈ n ∧ m = a₁ ∪ {a₁}). So 
m = a₁ ∪ {a₁} and a₁ ∈ ⋂S. Therefore ∀m ∈ N, (m = ∅ ∨ (∃a₁ ∈ N, m = a₁ ∪ {a₁})). So N is an extended 
finite ordinal number by Definition 12.1.3. But N ⊆ n₁ and n₁ ∈ ω. So n₁ ⊊ ω by Theorem 12.1.17 (vi). 
Therefore N ⊊ ω. Hence N ∈ ω by Theorem 12.1.21 (ii). 

For part (ii), let S be a non-empty subset of ω. Suppose that ⋂S ∉ S. Then ∀n ∈ S, ⋂S ≠ n. But 
the definition of ⋂S implies ∀n ∈ S, ⋂S ⊆ n. So ∀n ∈ S, ⋂S ⊊ n. Therefore ∀n ∈ S, ⋂S ∈ n by 
Theorem 12.1.23 (ii). So ⋂S ∈ ⋂S by the definition of ⋂S. This contradicts the axiom of regularity. 
So ⋂S ∈ S. 
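
Theorem 12.2.3 can be spot-checked on a small example. The following Python sketch is only an informal 
illustration, not a proof. It assumes that finite ordinals are modelled as frozenset objects, and the helper 
name ordinal is hypothetical. It computes ⋂S for the set S = {2, 4, 5} and confirms that the intersection 
is the ordinal 2, which is an element of S. 

```python
from functools import reduce


def ordinal(n: int) -> frozenset:
    """Build the finite ordinal n as a nested frozenset."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset([x])  # successor x ∪ {x}
    return x


S = {ordinal(2), ordinal(4), ordinal(5)}

# The intersection ⋂S of all elements of the non-empty set S.
meet = reduce(frozenset.intersection, S)

print(meet == ordinal(2), meet in S)
```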


12.2.4 REMARK: Order for the ordinal numbers in terms of the set membership relation. 

The "standard order relation on the ordinal numbers" in Definition 12.2.5 is a total order on the ordinal 
numbers by Theorem 12.2.6. Theorem 12.2.7 asserts that this order is a well-ordering. The principle of 
mathematical induction follows from this. 


12.2.5 DEFINITION: The standard order on the finite ordinal numbers is the relation “≤” which is defined 
on ω by ∀x, y ∈ ω, (x ≤ y ⇔ x ⊆ y). In other words, “≤” = {(x, y) ∈ ω × ω; x ⊆ y}. 
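
As an informal sanity check of this definition, the following Python sketch compares the set-inclusion 
order with the usual order on small non-negative integers. The frozenset model and the names ordinal and 
orders_agree are assumptions of this illustration. Python's “<=” operator on frozenset objects is the 
subset test, so it plays the role of “⊆” here. 

```python
def ordinal(n: int) -> frozenset:
    """Build the finite ordinal n as a nested frozenset."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset([x])  # successor x ∪ {x}
    return x


def orders_agree(limit: int) -> bool:
    """Check, for all a, b < limit, that x ⊆ y matches a ≤ b and matches (x ∈ y or x = y)."""
    for a in range(limit):
        for b in range(limit):
            x, y = ordinal(a), ordinal(b)
            if (x <= y) != (a <= b):
                return False
            if (x <= y) != (x in y or x == y):
                return False
    return True


print(orders_agree(6))
```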


12.2.6 THEOREM: The set-inclusion order on the finite ordinals is a total order. 
The standard order on the finite ordinal numbers is a total order. 


PROOF: Let N₁, N₂ ∈ ω. Then N₁ ⊆ N₂ or N₂ ⊆ N₁ by Theorem 12.1.23 (xi). So N₁ ≤ N₂ or N₂ ≤ N₁ for 
all N₁, N₂ ∈ ω. Therefore “≤” satisfies Definition 11.5.1 (i) (strong reflexivity). 

Suppose that N₁, N₂ ∈ ω satisfy N₁ ≤ N₂ and N₂ ≤ N₁. Then N₁ ⊆ N₂ and N₂ ⊆ N₁ by Definition 12.2.5. 
So N₁ = N₂ by Theorem 7.3.5 (iii). Therefore “≤” satisfies Definition 11.5.1 (ii) (antisymmetry). 

Suppose that N₁, N₂, N₃ ∈ ω satisfy N₁ ≤ N₂ and N₂ ≤ N₃. Then N₁ ⊆ N₂ and N₂ ⊆ N₃ by 
Definition 12.2.5. So N₁ ⊆ N₃ by Theorem 7.3.5 (vii), and so N₁ ≤ N₃ by Definition 12.2.5. Therefore “≤” 
satisfies Definition 11.5.1 (iii) (transitivity). Hence “≤” is a total order by Definition 11.5.1. 


12.2.7 THEOREM: The set-inclusion order on the set of finite ordinals is a well-ordering. 
The standard order on the finite ordinal numbers is a well-ordering. 


PROOF: Let S be a non-empty subset of ω. Let N = ⋂S. Then N ∈ ω by Theorem 12.2.3 (i). Since N ⊆ n 
for all n ∈ S, it follows from Definition 12.2.5 that N ≤ n for all n ∈ S. But N ∈ S by Theorem 12.2.3 (ii). 
Hence S is well ordered by the standard order on ω. 


12.2.8 THEOREM: The set-inclusion order on any finite ordinal is a well-ordering. 
The restriction of the standard order on the set ω of finite ordinal numbers to any individual finite ordinal 
number N ∈ ω is a well-ordering on N. 


PROOF: Let N ∈ ω. Then N ⊊ ω by Theorem 12.1.37 (ii), and ≤_N = {(x, y) ∈ N × N; x ⊆ y} is the 
restriction to N of the standard order “≤” on ω in Definition 12.2.5. Let S be a non-empty subset of N. 
Let N′ = ⋂S. Then N′ ⊆ n for all n ∈ S. So N′ ≤_N n for all n ∈ S. But N′ ∈ S by Theorem 12.2.3 (ii) 
because S ⊆ ω. Therefore S is well ordered by the restricted standard order ≤_N on N. 


12.2.9 REMARK: The set of finite ordinal numbers satisfies transitivity and well-ordering. 

Since the set membership relation on ω is transitive by Theorem 12.1.17 (vii), and the set inclusion relation 
on ω is a well-ordering by Theorem 12.2.7, the two conditions which typically define general ordinal numbers 
in textbooks are now established. (See Remark 12.1.8 and Definition 12.5.7 for these conditions.) 


12.2.10 REMARK: Total order on extended finite ordinal numbers. 

The total order in Definition 12.2.5 is extended to extended finite ordinal numbers in Definition 12.2.11. 
The mathematical induction principle in Theorem 12.2.12 is a consequence of the fact that this order is a 
well-ordering. Theorem 12.2.13 is the finite version of Theorem 12.2.12. 


12.2.11 DEFINITION: The standard order on the extended finite ordinal numbers is the relation “≤” which 
is defined on ω⁺ by ∀x, y ∈ ω⁺, (x ≤ y ⇔ x ⊆ y). In other words, “≤” = {(x, y) ∈ ω⁺ × ω⁺; x ⊆ y}. 


12.2.12 THEOREM: The principle of mathematical induction. 
Let S be a set with S ⊆ ω, ∅ ∈ S, and ∀x ∈ S, x ∪ {x} ∈ S. Then S = ω. 


PROOF: Let S be a set with S ⊆ ω, ∅ ∈ S, and ∀x ∈ S, x ∪ {x} ∈ S. Suppose that S ≠ ω. Let Y = ω \ S. 
Then Y ≠ ∅. Let N = ⋂Y. Then N ∈ Y by Theorem 12.2.3 (ii). But Y ⊆ ω \ {∅} because ∅ ∈ S. Therefore 
⋂Y ⊇ ⋂(ω \ {∅}) by Theorem 8.4.8 (iii). So ∅ ∈ ⋂Y by Theorem 12.1.30 (v). Therefore N ≠ ∅. So 
∃x ∈ ω, N = x ∪ {x} by Definition 12.1.3 because ω is an extended finite ordinal number. For such x, if 
x ∈ S then N ∈ S, which would contradict N ∈ Y = ω \ S. So x ∈ Y. But N ⊆ y for all y ∈ Y, by the 
definition of ⋂Y. So N ⊆ x. So x ∪ {x} ⊆ x, which contradicts the axiom of regularity. Hence S = ω. 


12.2.13 THEOREM: The principle of finite mathematical induction. 
Let m ∈ ω, and let S be a set such that S ⊆ m, ∅ ∈ S, and ∀x ∈ S, x ∪ {x} ∈ S ∪ {m}. Then S = m. 


PROOF: Let m ∈ ω. Let S be a set which satisfies S ⊆ m, ∅ ∈ S, and ∀x ∈ S, x ∪ {x} ∈ S ∪ {m}. 
Suppose that S ≠ m. Let Y = m \ S. Then Y ≠ ∅. So N = ⋂Y is a well-defined set, and N ∈ Y by 
Theorem 12.2.3 (ii). But Y ⊆ m \ {∅} because ∅ ∈ S. Therefore ⋂Y ⊇ ⋂(m \ {∅}) by Theorem 8.4.8 (iii). 
So ∅ ∈ ⋂Y by Theorem 29.6.3 (iii) because m \ {∅} ≠ ∅. Therefore N ≠ ∅. So ∃x ∈ m, N = x ∪ {x} by 
Definition 12.1.3 because m is an extended finite ordinal number. For such x, if x ∈ S then N ∈ S ∪ {m}, 
which would contradict N ∈ Y = m \ S. (The possibility N = m is excluded because m ∈ m would contradict 
Theorem 7.8.4 (i).) So x ∈ Y. But N ⊆ y for all y ∈ Y, by the definition of ⋂Y. So N ⊆ x. So x ∪ {x} ⊆ x, 
which contradicts the axiom of regularity. Hence S = m. 


12.2.14 REMARK:  Equinumerous ordinal numbers are identical ordinal numbers. 

Theorem 12.2.15 states in essence that two equinumerous ordinal numbers must be the same ordinal number. 
(See Section 13.1 for equinumerosity.) This theorem has some technical applications, for example in the proof 
of Theorem 14.11.7 (vi). 


12.2.15 THEOREM: All increasing bijections between extended finite ordinals are identity maps. 
(i) For all X₁, X₂ ∈ ω, if f : X₁ → X₂ is an increasing bijection, then X₁ = X₂ and f = id_{X₁}. 
(ii) For all X₁, X₂ ∈ ω⁺, if f : X₁ → X₂ is an increasing bijection, then X₁ = X₂ and f = id_{X₁}. 


PROOF: For part (i), let X₁, X₂ ∈ ω. Let f : X₁ → X₂ be an increasing bijection. Suppose that X₁ = ∅ = 0. 
Then X₂ = ∅ = 0. Similarly, if X₂ = 0, then X₁ = 0. So assume that X₁ ≠ 0 and X₂ ≠ 0. Suppose that 
f(0) = k ∈ X₂ \ {0}. Then f(ℓ) = 0 for some ℓ ∈ X₁ \ {0}, and so 0 < ℓ and k > 0. So f is not increasing, 
which is a contradiction. Therefore f(0) = 0. Now let k ∈ X₁ and suppose that f(i) = i for all i ∈ X₁ 
with i < k. Then f(k) ≥ k because f is a bijection. Suppose that f(k) > k. Then f(ℓ) = k for some ℓ > k. 
So k < ℓ and f(k) > f(ℓ), and so f is not increasing, which is a contradiction. Therefore f(k) = k. By 
induction, f(k) = k for all k ∈ X₁. So Range(f) ⊆ X₂. Therefore Range(f) = X₂ because otherwise f 
would not be surjective. Hence f = id_{X₁} and so X₁ = X₂. 


For part (ii), let X₁, X₂ ∈ ω⁺. Let f : X₁ → X₂ be an increasing bijection. Suppose that X₁ ∈ ω and X₂ = ω. 
Then Range(f) = X₁ and f = id_{X₁} as in the proof of part (i). So f is not surjective. Therefore X₂ ∈ ω. So 
X₁ = X₂ and f = id_{X₁} by part (i). Now suppose that X₁ = ω. Then by the inductive method of part (i), 
X₂ = Range(f) = ω and f = id_{X₁}. 


12.2.16 REMARK: Successor sets are well defined. 

For any set X, the successor set X ∪ {X} is a well-defined set. This follows from the unordered pair axiom 
and the union axiom in ZF set theory. The inequality X ∪ {X} ≠ X follows from the ZF regularity axiom, 
which implies that X ∉ X for sets X. 

Unfortunately, the map-rule X ↦ X ∪ {X} is not a ZF function because its domain, the class of all sets, 
is “too big”. So it cannot be called the “successor function” unless its domain is restricted to a ZF set. 
However, one may refer to this map-rule informally as “the successor map”. 


12.2.17 DEFINITION: The successor set of a set X is the set X ∪ {X}. 


12.2.18 NOTATION: X⁺, for any set X, denotes the successor set X ∪ {X} of X. 


12.2.19 REMARK: The successor set notation. 

The notation X⁺ = X ∪ {X} for successor sets clashes negligibly with notations such as ℝ⁺ and ℤ⁺. Peano 
used the notation a+ for the abstract successor of a positive integer a in an 1891 work (Peano [431], page 20), 
but not in an earlier 1889 work (Peano [375]). 


12.2.20 THEOREM: Some basic properties of the successor function for extended finite ordinals. 


PROOF: Part (i) follows directly from Theorem 12.1.37 (iii) and Notation 12.2.18. 


For part (ii), let X ∈ ω⁺. If X = ω, then clearly X ⊆ ω. So suppose that X ∈ ω. Let m ∈ X. Then m is 
an extended finite ordinal number by Theorem 12.1.17 (xi). So m ∈ ω⁺ by Theorem 12.1.39 (i). But m ≠ ω 
because the containment ω ∈ X ∈ ω is impossible by the axiom of regularity. (See Theorem 7.8.4 (ii).) 
So m ∈ ω. Therefore ∀m ∈ X, m ∈ ω. In other words, X ⊆ ω. 

For part (iii), let X ∈ ω. Suppose that ω ⊆ X. Then X = ω because X ⊆ ω by part (ii). So ω ∈ ω. But 
this is impossible by the axiom of regularity. (See Definition 7.2.4 (7).) Hence ω ⊈ X. 

For part (iv), let N ∈ ω, X ∈ ω⁺ and X ⊆ N. Suppose that X = ω. Then ω ⊆ N, which is impossible by 
part (iii). Therefore X ≠ ω. Hence X ∈ ω. 


Part (v) follows from Theorem 12.1.30 (iv). 


12.2.21 REMARK:  Bijections constructed from successor maps for extended finite ordinals. 

It is sometimes convenient to construct bijections from successor maps for extended finite ordinals. For a 
finite ordinal N, one may map each element to its successor, except that the largest element is mapped 
to ∅. It is shown in Theorem 12.2.23 (i, ii) that this is a well-defined bijection from N to N. (In terms of 
Definition 14.8.2, it is a permutation of N.) Theorem 12.2.23 (iii) is useful for the proof of Theorem 12.2.24. 


For the infinite ordinal ω, a useful bijection from ω to ω \ {0} may be constructed from the successor map as in 
the statement of Theorem 12.2.23 (iv). 


12.2.22 DEFINITION: The wrapped successor function for a finite ordinal number N ∈ ω is the function 
s_N : N → N defined by 


    ∀i ∈ N,   s_N(i) = i⁺   if i⁺ ∈ N, 
              s_N(i) = ∅    otherwise. 
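
The wrap-around behaviour of this function can be illustrated concretely. The following Python sketch is 
an informal illustration under the assumption that finite ordinals are modelled as frozenset objects. The 
names ordinal and wrapped_successor are hypothetical. It applies the wrapped successor function to 
every element of N = 5 and confirms that the image is the whole of N. 

```python
def ordinal(n: int) -> frozenset:
    """Build the finite ordinal n as a nested frozenset."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset([x])  # successor x ∪ {x}
    return x


def wrapped_successor(N: frozenset, i: frozenset) -> frozenset:
    """Return i⁺ if i⁺ is an element of N, and ∅ otherwise."""
    succ = i | frozenset([i])
    return succ if succ in N else frozenset()


N = ordinal(5)
image = {wrapped_successor(N, i) for i in N}

# The largest element 4 of N = 5 wraps around to 0 = ∅, and the image is all of N.
print(wrapped_successor(N, ordinal(4)) == ordinal(0), image == N)
```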


12.2.23 THEOREM: Some basic properties of the wrapped successor function for finite ordinals. 
(i) For all finite ordinal numbers N, the wrapped successor function s_N : N → N is a well-defined function. 
(ii) For all finite ordinal numbers N, the wrapped successor function s_N : N → N is a bijection. 
(iii) For all finite ordinal numbers N, if S satisfies S ⊆ N and s_N(S) ⊆ S, then ∀x ∈ S, x ∪ {x} ∈ S ∪ {N}. 
(iv) The map s_ω : ω → ω \ {0} defined by s_ω : x ↦ x⁺ is a well-defined bijection. 


PROOF: For part (i), s_N is a well-defined function because i⁺ is well-defined for all i ∈ N, and the condition 
“i⁺ ∈ N” ensures that ∅ is substituted for i⁺ if it is not in N. If i ∈ N satisfies i⁺ ∉ N, then ∅ must be an 
element of N because i ∈ N implies N ≠ ∅, which implies ∅ ∈ N by Theorem 12.1.17 (i). 

For part (ii), let i, j ∈ N with s_N(i) = s_N(j). If i⁺ ∈ N and j⁺ ∈ N, then i⁺ = j⁺, which implies i = j 
by Theorem 12.2.20 (v). If i⁺ ∉ N and j⁺ ∈ N, then s_N(i) = ∅ and s_N(j) ≠ ∅ by Theorem 12.2.20 (i). So 
s_N(i) ≠ s_N(j), which contradicts the assumption. Similarly, the combination i⁺ ∈ N and j⁺ ∉ N is excluded. 
If i⁺ ∉ N and j⁺ ∉ N, then i⁺ = N = j⁺ by Theorem 12.1.21 (iv). So i = j by Theorem 12.2.20 (v). Thus 
i = j in all cases. Therefore s_N is injective. 


To prove surjectivity, suppose that m ∈ N \ s_N(N). By Definition 12.1.3, m ∈ N implies that either m = ∅ 
or m = x⁺ for some x ∈ N. Suppose that m = ∅. Then N ≠ ∅. So by Theorem 12.1.35 (v) there exists i ∈ N 
such that N = i⁺. But then i⁺ ∉ N because N ∉ N by Theorem 7.8.4 (i). So s_N(i) = ∅, and so ∅ ∈ s_N(N), 
which contradicts the assumption m ∈ N \ s_N(N). So suppose that m = x⁺ for some x ∈ N. Then 
m ∈ s_N(N), which is once again a contradiction. Therefore N ⊆ s_N(N), which implies that s_N : N → N is 
surjective. Hence s_N : N → N is a bijection. 

For part (iii), let x ∈ S, where S ⊆ N and s_N(S) ⊆ S. Then x ∈ N ⊆ ω. If x⁺ ∈ N, then s_N(x) = x⁺ by 
Definition 12.2.22. So x⁺ ∈ s_N(S) ⊆ S ⊆ S ∪ {N}. If x⁺ ∉ N, then N = x ∪ {x} by Theorem 12.1.21 (iv). 
So x⁺ ∈ S ∪ {N}. Thus in all cases, x⁺ ∈ S ∪ {N}. 

For part (iv), s_ω : ω → ω \ {0} is a well-defined function by Theorem 12.2.20 (i), and s_ω is injective by 
Theorem 12.2.20 (v), and s_ω : ω → ω \ {0} is surjective by Definition 12.1.3. Hence s_ω : ω → ω \ {0} is a 
well-defined bijection. 


12.2.24 THEOREM: The principle of wrapped finite mathematical induction. 
Let N ∈ ω, and let S be a set such that S ⊆ N, ∅ ∈ S and s_N(S) ⊆ S, where s_N is the wrapped successor 
function for N. Then S = N. 


PROOF: Let N ∈ ω. Let S ⊆ N satisfy ∅ ∈ S and s_N(S) ⊆ S. Then it follows from Theorem 12.2.23 (iii) 
that S satisfies the assumptions of Theorem 12.2.13. Hence S = N. 


12.2.25 REMARK: Notation for the set of extended finite ordinal numbers. 

Using Notation 12.2.18, the set of extended finite ordinal numbers may be written as ω⁺ = ω ∪ {ω}. This 
notation can be useful to refer to sequences which have either a finite or infinite domain. The domain of a 
finite sequence is an element of ω, whereas the domain of an infinite sequence is the set ω itself. 


12.3. Sequences 


12.3.1 DEFINITION: A sequence is a family (X_i)_{i∈I} such that I ∈ ω⁺ = ω ∪ {ω}. 
A finite sequence is a family (X_i)_{i∈I} such that I ∈ ω. 
An infinite sequence is a family (X_i)_{i∈I} such that I = ω. 


12.3.2 REMARK: Injective sequences and index maps. 

The range, Range(X) = {X_i; i ∈ I}, of a sequence (X_i)_{i∈I} is a well defined set by the Zermelo-Fraenkel 
axiom of replacement. (See Definition 7.2.4 (6).) Since all sequences are functions, the concept of an “injective 
sequence” is well defined. (See Definition 10.5.2.) In general, the increasing index map β in Definition 12.3.3 
is not uniquely determined by the sequence and subsequence, but if the sequence is injective, the index map 
is unique. This is asserted by Theorem 12.3.4 (i). 


12.3.3 DEFINITION: A subsequence of a sequence (X_i)_{i∈I} is a sequence (Y_j)_{j∈J} such that Y = X ∘ β for 
some increasing function β : J → I. 


A finite subsequence of a sequence (X_i)_{i∈I} is a subsequence (Y_j)_{j∈J} of (X_i)_{i∈I} such that J ∈ ω. 


An infinite subsequence of a sequence (X_i)_{i∈I} is a subsequence (Y_j)_{j∈J} of (X_i)_{i∈I} such that J = ω. 
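
The role of the index map β can be illustrated for finite sequences. In the following Python sketch, which 
is only an informal illustration, a finite sequence with domain n = {0, ..., n − 1} is modelled as a Python 
list of length n, and an index map β : J → I is modelled as a list of indices. The function name subsequence 
is hypothetical, and “increasing” is assumed here to mean strictly increasing. The second example shows 
that the index map need not be unique when the sequence is not injective. 

```python
def subsequence(X: list, beta: list) -> list:
    """Return Y = X ∘ beta for a strictly increasing index map beta : J -> I."""
    if any(beta[j] >= beta[j + 1] for j in range(len(beta) - 1)):
        raise ValueError("beta must be strictly increasing")
    if any(b < 0 or b >= len(X) for b in beta):
        raise ValueError("beta must take values in the domain of X")
    return [X[b] for b in beta]


# An injective sequence with I = 5 and an index map with J = 3.
X = ["a", "b", "c", "d", "e"]
Y = subsequence(X, [0, 2, 3])

# A non-injective sequence: two different index maps give the same subsequence.
X2 = ["a", "a"]
same = subsequence(X2, [0]) == subsequence(X2, [1])

print(Y, same)
```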


12.3.4 THEOREM: Some very elementary properties of subsequences. 
(i) Let (Y_j)_{j∈J} be a subsequence of an injective sequence (X_i)_{i∈I} with Y = X ∘ β₁ and Y = X ∘ β₂, where 
β₁ : J → I and β₂ : J → I are increasing functions. Then β₁ = β₂. 
(ii) A subsequence of a finite sequence X is a finite subsequence of X. 
(iii) A subsequence of a subsequence of a sequence X is a subsequence of X. 
(iv) A subsequence of a finite subsequence of a sequence X is a finite subsequence of X. 
(v) An infinite subsequence of a subsequence of a sequence X is an infinite subsequence of X. 


PROOF: For part (i), let (Y_j)_{j∈J} be a subsequence of an injective sequence (X_i)_{i∈I} with Y = X ∘ β₁ and 
Y = X ∘ β₂, where β₁ : J → I and β₂ : J → I are increasing functions. Then X(β₁(j)) = X(β₂(j)) for 
all j ∈ J. So β₁(j) = β₂(j) for all j ∈ J because X is injective. Hence β₁ = β₂. 


For part (ii), let X be a finite sequence. Then X = (X_i)_{i∈I} for some I ∈ ω. Let Y be a subsequence of X. 
Then Y is a sequence (Y_j)_{j∈J} with J ∈ ω⁺ such that Y = X ∘ β for some increasing function β : J → I. 
Suppose that J = ω. Then I ∈ J. Let I′ = β(I) ∈ I. Then I ≤ I′ by Theorem 11.7.2 (ii). So I ∈ I by 
Theorem 12.1.23 (xiii), which contradicts Theorem 7.8.4 (i). So J ≠ ω. Hence Y is a finite subsequence of X. 


For part (iii), let X = (X_i)_{i∈I} be a sequence for some I ∈ ω⁺. Let Y = (Y_j)_{j∈J} be a subsequence of X for 
some J ∈ ω⁺. Then there is an increasing function β₁ : J → I such that Y = X ∘ β₁. Let Z = (Z_k)_{k∈K} be a 
subsequence of Y for some K ∈ ω⁺. Then there is an increasing function β₂ : K → J such that Z = Y ∘ β₂. 
Let β = β₁ ∘ β₂. Then Z = X ∘ β, where β : K → I is an increasing function by Theorem 11.1.33 (ii). 
Hence Z is a subsequence of X. 

Part (iv) follows from part (ii). 

For part (v), let X be a sequence. Let Y be a subsequence of X. Let Z be an infinite subsequence of Y. 
Then Z = (Z_i)_{i∈K} for some K ∈ ω⁺ by Definition 12.3.3, and Z is a subsequence of X by part (iii). Hence 
Z is an infinite subsequence of X. 


12.4. Enumerations of sets by extended finite ordinal numbers 


12.4.1 REMARK:  Countability of sets of finite ordinal numbers. 

Finite and countably infinite sets are discussed in Sections 13.5 and 13.7 respectively for general ZF sets. 
Cardinality of sets is defined in general in terms of the existence of bijections between standard sets which 
have known cardinality and sets whose cardinality is to be measured. (See Sections 13.1 and 13.4.) In the 
case of sets of finite ordinal numbers, the proof of Theorem 12.4.2 constructs explicit bijections with the 
assistance of countable induction. No axiom of choice was used in the proof of this theorem. This is important 
because a surprisingly large proportion of intuitively appealing cardinality assertions regarding countable 
sets do require some kind of axiom of choice. The simple construction idea in the proof of Theorem 12.4.2 
is illustrated in Figure 12.4.1. 


Figure 12.4.1 Enumeration of an arbitrary subset of ω 


12.4.2 THEOREM: Enumeration of an arbitrary subset of ω. 
For all Y ∈ P(ω), there exists a unique increasing bijection f : X → Y with X ∈ ω⁺. 


PROOF: Let Y ∈ P(ω). Define sets g_k ⊆ ω × Y inductively for k ∈ ω by g₀ = ∅ and 


    ∀k ∈ ω,   g_{k+1} = g_k ∪ {(k, ⋂Y_k)}   if Y_k ≠ ∅, 
              g_{k+1} = g_k                 otherwise,                    (12.4.1) 


where Y_k denotes the subset Y \ Range(g_k) = Y \ {g_k(i); i ∈ k} of ω for all k ∈ ω. In other words, each 
g_{k+1} extends g_k by adding the ordered pair (k, ⋂Y_k), where the minimum element ⋂Y_k of the non-empty 
subset Y_k of ω is a well-defined element of ω because ω is well ordered. Since (g_k)_{k∈ω} is a non-decreasing 
sequence of subsets of ω × Y, it follows that (Y_k)_{k∈ω} is a non-increasing sequence of subsets of ω. 
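
For a finite subset Y, the inductive construction (12.4.1) can be carried out mechanically. The following 
Python sketch is an informal illustration only. It assumes that finite ordinals are represented by ordinary 
non-negative integers, so that the minimum ⋂Y_k becomes min(Y_k), and the function name 
enumerate_subset is hypothetical. An infinite subset Y would need a lazy variant, which is not attempted here. 

```python
def enumerate_subset(Y: set) -> dict:
    """Return the increasing bijection f : X -> Y as a dict, where X = len(Y)."""
    g = {}            # g_0 = ∅
    k = 0
    Y_k = set(Y)      # Y_0 = Y \ Range(g_0) = Y
    while Y_k:        # stop at the first k for which Y_k = ∅
        g[k] = min(Y_k)          # g_{k+1} = g_k ∪ {(k, min Y_k)}
        Y_k = Y_k - {g[k]}       # Y_{k+1} = Y \ Range(g_{k+1})
        k += 1
    return g


f = enumerate_subset({7, 2, 9, 4})
print(f)
```

The domain of the resulting map is the finite ordinal X = 4 = {0, 1, 2, 3}, and the values are listed in 
increasing order, as the theorem asserts. 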


Let X = {k ∈ ω; Y_k ≠ ∅}. Then X ⊆ ω. Suppose that X ≠ ω. Then ω \ X ≠ ∅, and so N = ⋂(ω \ X) 
is a well-defined element of ω because ω is well ordered. But ω \ X = {k ∈ ω; Y_k = ∅}. So Y_k ≠ ∅ for 
all k ∈ N, and Y_k = ∅ for all k ∈ ω \ N because (Y_k)_{k∈ω} is non-increasing. Therefore X = N ∈ ω if X ≠ ω. 
Thus X ∈ ω⁺. 


Since g₀ = ∅, g₀ is a function with domain ∅ = 0 ∈ ω. For k ∈ X, suppose that g_k is a function whose 
domain is k. Then g_{k+1} = g_k ∪ {(k, ⋂Y_k)} is a function with domain k + 1 = k ∪ {k} because Y_k ≠ ∅. 
Therefore by induction, g_{k+1} is a function with domain k + 1 for all k ∈ X. 

Let f = ⋃_{k∈X} g_{k+1}. If X ≠ ω, then X ∈ ω and so f = g_X because (g_k)_{k∈ω} is a non-decreasing sequence 
of subsets of ω × Y and X = ⋃{i + 1; i ∈ X} for all X ∈ ω (since each ordinal number is equal to 1 more 
than the largest of its elements). So Dom(f) = X. Then Range(f) = Y because Range(g_k) = Y \ Y_k for all 
k ∈ ω, and so Range(g_X) = Y \ Y_X = Y because X ∈ ω \ X since X ∉ X. But f is an injection because 
f(k) = ⋂Y_k = ⋂(Y \ Range(g_k)) = ⋂(Y \ Range(f|_k)) for all k ∈ X, and so f(k) ∉ Range(f|_k). Therefore 
f : X → Y is a bijection. 

If $X = \omega$, then $f = \bigcup_{k \in X} g_{k+1}$ is a function with $\mathrm{Dom}(f) = \omega$ because $f$ contains one and only one
ordered pair for each $k \in \omega$, and $f$ is an injection because $f(k) = \bigcap (Y \setminus \mathrm{Range}(f|_k))$ for all $k \in X$, and so
$f(k) \notin \mathrm{Range}(f|_k)$. To show that $f : X \to Y$ is a surjection, suppose that $Y \setminus \mathrm{Range}(f) \neq \emptyset$. Let $z = \bigcap (Y \setminus \mathrm{Range}(f))$, the smallest element of
$Y$ which is not in the range of $f$, and let $y = \bigcup \{j \in \mathrm{Range}(f);\ j < z\}$, the largest element of $\mathrm{Range}(f)$ which
is less than $z$. (Note that $z = \bigcap Y$ in the exceptional case where $\{j \in \mathrm{Range}(f);\ j < z\} = \emptyset$, and so $f(0) = z$,
which is a contradiction. So this case can be ignored.) Let $k = f^{-1}(y)$. Then $f(k+1) = \bigcap (Y \setminus \{f(i);\ i \le k\})$
by the definition of $f$. But $z \in Y \setminus \{f(i);\ i \le k\}$ because $Y \setminus \{f(i);\ i \le k\} \supseteq Y \setminus \mathrm{Range}(f)$ and $z \in Y \setminus \mathrm{Range}(f)$.
Note that $f$ is a strictly increasing function because $(Y_k)_{k \in \omega}$ is a strictly decreasing sequence of sets (with
respect to the set-inclusion partial order on $\mathbb{P}(\omega)$).

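Suppose that $z > f(i)$ for some $i \in \omega$ with $i > k$. Then $f(i)$ would be an element of $\{j \in \mathrm{Range}(f);\ j < z\}$
with $f(k) < f(i)$, which would contradict the definition of $k$. Therefore $z < f(i)$ for all $i \in \omega$ with $i > k$.
So $z = \bigcap (Y \setminus \mathrm{Range}(f)) = \bigcap ((Y \setminus \{f(i);\ i \le k\}) \cap (Y \setminus \{f(i);\ i > k\})) = \bigcap (Y \setminus \{f(i);\ i \le k\})$. Therefore
$f(k + 1) = z$, which is a contradiction because $z \notin \mathrm{Range}(f)$. So $\mathrm{Range}(f) = Y$. Hence $f$ is a bijection.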

To show the uniqueness of the increasing bijection $f : X \to Y$ with $X \in \omega^+$, let $f_1 : X_1 \to Y$ and $f_2 : X_2 \to Y$
be increasing bijections from $X_1, X_2 \in \omega^+$ to $Y$. Then $f_0 = f_2^{-1} \circ f_1 : X_1 \to X_2$ is an increasing bijection.
So $X_1 = X_2$ and $f_0 = \mathrm{id}_{X_1}$ by Theorem 12.2.15 (ii). Hence $f_1 = f_2$.


12.4.3 REMARK: Example of the ambiguity of the customary set-map notation. 

In the proof of Theorem 12.4.2, the set-expression $\{g_k(i);\ i \in k\}$ is the image of the set $k = \{0, 1, \ldots, k - 1\}$
under the map $g_k$. Thus one could write $\bar{g}_k(k)$ instead of $\{g_k(i);\ i \in k\}$, using the unambiguous notation
introduced in Definition 10.6.4. The customary notation for the image of the set $k$ under the map $g_k$ would
be $g_k(k)$, as mentioned in Remark 10.10.3. However, this would be confusing because $g_k(k)$ would mean both
the value of $g_k$ for the domain element $k$ and the set $\{g_k(i);\ i \in k\}$, which are clearly different in general.


12.4.4 REMARK: Non-standard name for standard enumeration of a set of finite ordinal numbers. 

The unique increasing enumeration of an arbitrary set of finite ordinal numbers in Theorem 12.4.2 is useful 
enough to deserve a name. (The existence and uniqueness is necessary for a definition to use the word “the” .) 
The name given in Definition 12.4.5 is non-standard. 


12.4.5 DEFINITION: The standard enumeration for a set of finite ordinal numbers $Y \in \mathbb{P}(\omega)$ is the unique
increasing bijection $f : X \to Y$ with $X \in \omega^+$.


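12.4.6 THEOREM: Explicit inductive formula for the standard enumeration of a set of finite ordinals.
Let $f : X \to Y$ be the standard enumeration for a set $Y \in \mathbb{P}(\omega)$.


(i) $f = \bigcup_{k \in \omega} g_k$, where $g_k \subseteq \omega \times Y$ is defined inductively for $k \in \omega$ by $g_0 = \emptyset$ and

$$
\forall k \in \omega, \qquad
g_{k+1} =
\begin{cases}
g_k \cup \{(k, \bigcap Y_k)\} & \text{if } Y_k \neq \emptyset \\
g_k & \text{otherwise,}
\end{cases}
\tag{12.4.2}
$$

where $Y_k$ denotes the subset $Y \setminus \mathrm{Range}(g_k) = Y \setminus \{g_k(i);\ i \in k\}$ of $\omega$ for all $k \in \omega$.

(ii) $\forall i \in X,\ f(i) \ge i$.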

PROOF: Part (i) follows from the proof of Theorem 12.4.2. 

For part (ii), note first that $f(0) \ge 0$ if $0 \in X$. Suppose that $k \in X$ and $k + 1 \in X$, and $f(k) \ge k$. Then
$f(k + 1) > f(k)$ because $f$ is increasing. So $f(k + 1) \ge f(k) + 1 \ge k + 1$. So $f(i) \ge i$ for all $i \in X$ by
induction.
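
The inductive formula in Theorem 12.4.6 (i) can be traced by hand for a finite set. The following Python sketch is only an illustration under the assumption that $Y$ is finite (so that the loop terminates); the function name `standard_enumeration` is hypothetical and a Python `dict` stands in for the set of ordered pairs $g_k$.

```python
def standard_enumeration(Y):
    """Return the increasing bijection f : X -> Y as a dict, for a finite set Y of naturals."""
    Y = set(Y)
    g = {}                           # g_0 is the empty function
    k = 0
    while True:
        Y_k = Y - set(g.values())    # Y_k = Y \ Range(g_k)
        if not Y_k:                  # Y_k is empty, so g_{k+1} = g_k from here on
            break
        g[k] = min(Y_k)              # g_{k+1} = g_k with the pair (k, min Y_k) added
        k += 1
    return g


f = standard_enumeration({7, 2, 11, 3})
print(f)  # the domain is the finite ordinal 4 = {0, 1, 2, 3}
```

For a finite set, the result agrees with sorting $Y$ into increasing order, and the inequality $f(i) \ge i$ of part (ii) can be checked directly on the output.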


12.4.7 REMARK: Equinumerosity of sets of finite ordinal numbers to elements of $\omega^+$.

In the terminology of equinumerosity in Section 13.1, Theorem 12.4.8 means that every set of finite ordinal
numbers is equinumerous to a finite ordinal number or to the set of all finite ordinal numbers. In other words, every
subset of $\omega$ is equinumerous to some element of $\omega^+$.


Similarly, Theorem 12.4.10 means that every subset of a finite ordinal number $N$ is equinumerous to an
element of $N$ or to $N$ itself.


12.4.8 THEOREM: Equinumerosity of subsets of $\omega$ to elements of $\omega^+$.
$\forall Y \in \mathbb{P}(\omega),\ \exists X \in \omega^+,\ \exists f \subseteq X \times Y$, $f$ is a bijection.


PROOF: The assertion follows immediately from Theorem 12.4.2. 


12.4.9 REMARK: Enumeration of an arbitrary subset of a finite ordinal number. 
Theorem 12.4.10 is a finite-set analogue of Theorem 12.4.8 which, for example, is useful for showing that all 
finite sets are Dedekind finite in Section 13.10. 


12.4.10 THEOREM: Equinumerosity of subsets of $N \in \omega$ to elements of $N^+$.
$\forall N \in \omega,\ \forall Y \in \mathbb{P}(N),\ \exists X \in N^+,\ \exists f \subseteq X \times Y$, $f$ is a bijection.


PROOF: Let $N \in \omega$ and $Y \in \mathbb{P}(N)$. By Theorem 12.4.2, there is a unique increasing bijection $f : X \to Y$
with $X \in \omega^+$. But $i \le f(i)$ for all $i \in X$ by Theorem 12.4.6 (ii). So $X \subseteq N$. Hence $X \in N^+$.


12.4.11 THEOREM: A proper subset of a finite ordinal has fewer elements than the finite ordinal.
Let $f : X \to Y$ be the standard enumeration for a set $Y \in \mathbb{P}(N)$ for some $N \in \omega$.


(i) $X \subseteq N$.
(ii) If $Y \neq N$, then $X < N$.


PROOF: For part (i), let $Y \subseteq N$ for some $N \in \omega$. Then $i \le f(i)$ for all $i \in X$ by Theorem 12.4.6 (ii).
So $X \subseteq N$.

For part (ii), let $Y \subseteq N$ for some $N \in \omega$. Then $X \subseteq N$ by Theorem 12.4.10. Suppose that $X = N$. Let
$y \in N \setminus Y$. Define a bijection $g : N \setminus \{y\} \to N - 1$ by

$$
\forall i \in N \setminus \{y\}, \qquad
g(i) =
\begin{cases}
i & \text{if } i < y \\
i - 1 & \text{if } i > y.
\end{cases}
$$

Then $Y \subseteq \mathrm{Dom}(g)$ and $g$ is increasing on $Y$. But $f : N \to Y$ is increasing. So $g \circ f : N \to N - 1$ is
increasing. So $X \subseteq N - 1$ by part (i), which contradicts the assumption $X = N$. Hence $X < N$.


12.4.12 REMARK: Injective and surjective finite set endomorphisms must be bijections. 

In the proof of Theorem 12.4.13 (i), the strategy follows the typical inductive pattern, namely to state first 
that if something “goes wrong”, then there must be a least integer for which it “goes wrong”, and then to 
show that there is an even smaller integer where it “goes wrong”, which is a contradiction. 


A possibly interesting aspect of the proof of Theorem 12.4.13 (ii) is the constructive choice of a right inverse of 
a surjective function. As mentioned in Remark 10.5.16, the general existence of right inverses of surjections in 
Theorem 10.5.17 (i) requires the axiom of choice, but in this case, an explicit right inverse may be constructed 
because every finite set has an explicit well-ordering. 


12.4.13 THEOREM: All injections and surjections on a finite ordinal are bijections.
(i) Let $\phi : N \to N$ be an injection for some $N \in \omega$. Then $\phi$ is a bijection.
(ii) Let $\phi : N \to N$ be a surjection for some $N \in \omega$. Then $\phi$ is a bijection.
(iii) There is no injection from $\omega$ to a finite ordinal number.
PROOF: For part (i), suppose that for some $N \in \omega$, there exists an injection $\phi : N \to N$ which is not a
bijection. Let $N_0 \in \omega$ be the minimum $N \in \omega$ such that there exists an injection $\phi : N \to N$ which is not
a bijection. ($N_0$ is well defined because $\omega$ is well ordered.) Let $\phi_0 : N_0 \to N_0$ be an injection which is not a
bijection. Then $y \in N_0 \setminus \mathrm{Range}(\phi_0)$ for some $y$. Define $g : N_0 \setminus \{y\} \to N_0 - 1$ by

$$
\forall i \in N_0 \setminus \{y\}, \qquad
g(i) =
\begin{cases}
i & \text{if } i < y \\
i - 1 & \text{if } i > y.
\end{cases}
$$

Then $g$ is an injection from $\mathrm{Range}(\phi_0)$ to $N_0 - 1$. Define $\phi_1 : N_0 - 1 \to N_0 - 1$ to be the restriction of $g \circ \phi_0$
to $N_0 - 1$. Then $\phi_1$ is an injection. But $\phi_1$ is not a bijection because $g(\phi_0(N_0 - 1)) \notin \mathrm{Range}(\phi_1)$. This
contradicts the hypothesis that $N_0$ is the minimum finite ordinal number with the specified property. Hence
all injections $\phi : N \to N$ for $N \in \omega$ are bijections. (See Figure 12.4.2.)

Figure 12.4.2 Proof that an injective finite set endomorphism is a bijection

For part (ii), suppose that for some $N \in \omega$, there exists a surjection $\phi : N \to N$. Define $\psi : N \to N$ by
$\psi(j) = \min\{i \in N;\ \phi(i) = j\}$ for all $j \in N$. Then $\psi : N \to N$ is a well-defined function because $\phi$ is a
surjection, and $\psi : N \to N$ is an injection because $\phi$ is a well-defined function. Therefore $\psi : N \to N$ is a
bijection by part (i). Hence $\phi : N \to N$ is a bijection.

For part (iii), let $f : \omega \to N$ be an injection for some $N \in \omega$. Let $\phi = f|_N$. Then $\phi : N \to N$ is an
injection. Therefore $\phi$ is a bijection from $N$ to $N$. Let $x \in \omega \setminus N$. Then $f(x) \in N$. Let $y = \phi^{-1}(f(x))$.
Then $f(y) = f(x)$ because $\phi$ is a bijection. But $y \in N$ because $\mathrm{Dom}(\phi) = N$. So $x \neq y$. Therefore $f$ is not
injective, which is a contradiction. Hence there is no injection from $\omega$ to $N$.
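
The constructive right inverse used in part (ii) can be tried out on small cases. The following Python sketch is only a brute-force illustration, not a proof: it represents a map $\phi : N \to N$ as a tuple of length $N$, and the helper names (`right_inverse`, `check`) are hypothetical.

```python
from itertools import product


def is_injective(phi):
    return len(set(phi)) == len(phi)


def is_surjective(phi, N):
    return set(phi) == set(range(N))


def right_inverse(phi, N):
    """psi(j) = min{i in N; phi(i) = j}; only meaningful when phi is a surjection."""
    return tuple(min(i for i in range(N) if phi[i] == j) for j in range(N))


def check(N):
    """Test every map phi : N -> N: injective iff surjective, and phi o psi = id."""
    for phi in product(range(N), repeat=N):
        if is_injective(phi) != is_surjective(phi, N):
            return False
        if is_surjective(phi, N):
            psi = right_inverse(phi, N)
            if any(phi[psi[j]] != j for j in range(N)):
                return False
    return True


print(all(check(N) for N in range(5)))
```

No choice axiom is involved in `right_inverse` because each pre-image is a subset of a finite ordinal, which has a minimum.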


12.4.14 REMARK: The first-instance subsequence of a sequence is injective and has the same range. 

It is often useful to be able to replace a sequence with an injective subsequence which has the same range. 
This is easily achieved by selecting the first occurrence of each value in the sequence. This construction is 
shown in Theorem 12.4.15 to have the desired properties. It is given a name in Definition 12.4.16. 


12.4.15 THEOREM: Eliminating duplicates from a sequence leaves the range unchanged. 

Let $(x_i)_{i \in I}$ be a sequence for some $I \in \omega^+$. Let $S = \{i \in I;\ \forall j \in i,\ x_j \neq x_i\}$. Let $\beta : J \to S$ be
the standard enumeration for $S$. Then $x \circ \beta$ is an injective sequence which is a subsequence of $x$, and
$\mathrm{Range}(x \circ \beta) = \mathrm{Range}(x)$.


PROOF: Let $(x_i)_{i \in I}$ be a sequence for some $I \in \omega^+$. Let $S = \{i \in I;\ \forall j \in i,\ x_j \neq x_i\}$. Then $S \subseteq \omega$
because $I \subseteq \omega$. Let $Y = \mathrm{Range}(x)$ and $Z = \{x_i;\ i \in S\} = \mathrm{Range}(x|_S)$. Clearly $Z \subseteq Y$. Let $y \in Y$. Then
$y = x_i$ for some $i \in I$. Let $j = \min(\{k \in I;\ x_k = x_i\})$. Then $j$ is well defined because $\{k \in I;\ x_k = x_i\} \neq \emptyset$.
Clearly $x_j = x_i$. But $x_k \neq x_j$ for all $k \in j$ by the definition of $j$. So $j \in S$. Therefore $x_j \in Z$. So $x_i \in Z$.
Thus $Y \subseteq Z$, and so $Y = Z$.


Let $\beta$ be the standard enumeration for $S$. (See Definition 12.4.5.) Then $\beta : J \to S$ is an increasing bijection
for some $J \in \omega^+$. Therefore $x \circ \beta = (x_{\beta(j)})_{j \in J}$ is a sequence by Definition 12.3.1, which is a subsequence
of $x$ by Definition 12.3.3. Suppose that $j_1, j_2 \in J$ with $j_1 \neq j_2$. Then $\beta(j_1), \beta(j_2) \in S$, and $\beta(j_1) \neq \beta(j_2)$
because $\beta$ is increasing. Suppose that $\beta(j_1) < \beta(j_2)$. Then $\forall j \in \beta(j_2),\ x_j \neq x(\beta(j_2))$ by the definition
of $S$ because $\beta(j_2) \in S$. So $x(\beta(j_1)) \neq x(\beta(j_2))$. Similarly, $\beta(j_1) > \beta(j_2)$ implies $x(\beta(j_1)) \neq x(\beta(j_2))$.
Therefore $x \circ \beta$ is injective. The equality $\mathrm{Range}(x \circ \beta) = \mathrm{Range}(x)$ follows from the equality
$\mathrm{Range}(x \circ \beta) = \mathrm{Range}(x|_{\mathrm{Range}(\beta)}) = \mathrm{Range}(x|_S) = \mathrm{Range}(x)$.


12.4.16 DEFINITION: The first-instance subsequence of a sequence $(x_i)_{i \in I}$ is the subsequence $x \circ \beta =
(x_{\beta(j)})_{j \in J}$, where $\beta : J \to S$ is the standard enumeration of $S = \{i \in I;\ \forall j \in i,\ x_j \neq x_i\}$.
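
For a finite sequence, Definition 12.4.16 amounts to keeping the first occurrence of each value. The following Python sketch illustrates this under the assumption that the index set $I$ is a finite ordinal, represented here by `range(len(x))`; the function name is hypothetical.

```python
def first_instance_subsequence(x):
    """Return (beta, x o beta) for a finite sequence x indexed by range(len(x))."""
    # S = {i in I; for all j in i, x_j != x_i}: indices of first occurrences.
    S = [i for i in range(len(x)) if all(x[j] != x[i] for j in range(i))]
    # S is listed in increasing order, so enumerate(S) is its standard enumeration.
    beta = dict(enumerate(S))
    return beta, [x[beta[j]] for j in range(len(S))]


beta, sub = first_instance_subsequence("abracadabra")
print(beta, sub)
```

The subsequence is injective and has the same range as the original sequence, as asserted by Theorem 12.4.15.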


12.4.17 REMARK: Equinumerosity of finite ordinal numbers implies equality. 

In the terminology of Section 13.1, the existence of a bijection between finite ordinal numbers $N_1$ and $N_2$ in
Theorem 12.4.18 (ii) means that they are equinumerous. Thus Theorem 12.4.18 (ii) implies that two finite
ordinal numbers are equinumerous if and only if they are equal. (Note that part (ii) of Theorem 12.4.18 can
also be proved by a double application of part (i).)


12.4.18 THEOREM: Equinumerous finite ordinals are identical. Less numerous are subsets.


[www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




(i) Let $N_1, N_2 \in \omega$. Let $\phi : N_1 \to N_2$ be an injection. Then $N_1 \subseteq N_2$. Hence $N_1 \le N_2$.
(ii) Let $N_1, N_2 \in \omega$. Let $\phi : N_1 \to N_2$ be a bijection. Then $N_1 = N_2$.


PROOF: For part (i), let $N_1, N_2 \in \omega$. Then $N_1 \subseteq N_2$ or $N_2 \subseteq N_1$ by Theorem 12.1.23 (xi). So $N_1 \subseteq N_2$
or $N_2 \subsetneq N_1$. Let $\phi : N_1 \to N_2$ be an injection and suppose that $N_2 \subsetneq N_1$. Define $\phi' = \phi|_{N_2}$. Then
$\phi' : N_2 \to N_2$ is an injection. So $\phi' : N_2 \to N_2$ is a bijection by Theorem 12.4.13 (i), and so $\mathrm{Range}(\phi') = N_2$.
Let $x \in N_1 \setminus N_2$. Then $\forall z \in N_2,\ \phi(z) \neq \phi(x)$ because $\forall z \in N_2,\ z \neq x$ and $\phi$ is injective. Thus $\phi(x) \notin N_2$,
which is a contradiction. Hence $N_1 \subseteq N_2$.

For part (ii), let $N_1, N_2 \in \omega$. Then $N_1 \subseteq N_2$ or $N_2 \subseteq N_1$ by Theorem 12.1.23 (xi). Let $\phi : N_1 \to N_2$
be a bijection. Then $\phi^{-1} : N_2 \to N_1$ is a bijection by Theorem 10.5.11. Suppose that $N_1 \subseteq N_2$. Then
$\phi^{-1} : N_2 \to N_2$ is an injection because $\phi^{-1} : N_2 \to N_1$ is an injection. So $\phi^{-1} : N_2 \to N_2$ is a bijection by
Theorem 12.4.13 (i). Therefore $\mathrm{Range}(\phi^{-1}) = N_2$. But $\mathrm{Range}(\phi^{-1}) = N_1$, because $\phi^{-1}$ is a bijection. So
$N_1 = N_2$. Similarly, if $N_2 \subseteq N_1$ then $N_2 = \mathrm{Range}(\phi) = N_1$ by Theorem 12.4.13 (i) because $\phi : N_1 \to N_1$ is
injective. Hence $N_1 = N_2$.


12.5. General ordinal numbers 


12.5.1 REMARK: History of general ordinal numbers. 

General ordinal numbers were introduced as abstract entities in 1883 by Cantor [344], pages 165-209. He
presented a theory of abstract “order types” in 1895. (See Cantor [344], pages 282-307.) The von Neumann
representation of general ordinal numbers as ZF sets appeared in 1923. (See von Neumann [436].) The
formulation of the von Neumann representation in terms of well-ordering and transitivity is attributed by
Quine [382], page 157, to a 1937 paper by Robinson [433]. (The historical origins of this style of ordinal
number definition are discussed in more detail in Remark 12.1.8.)


12.5.2 REMARK: Literature for general ordinal numbers. 

Some presentations of general ordinal numbers are given by Smullyan/Fitting [392], pages 71-90; Lévy [368], 
pages 49-71, 112-157; Roitman [385], pages 73-77, 109-116; Pinter [377], pages 166-212; Chang/Keisler [347], 
pages 580-584; Shoenfield [390], pages 246-252; Cohen [349], pages 56-64; Halmos [357], pages 74-85; 
Stoll [393], pages 98-110, 306-316; Mendelson [370], pages 170-197; Suppes [395], pages 127-158, 195-238; 
Bernays [341], pages 80-92, 203-209; Wilder [403], pages 114-141. 


12.5.3 REMARK: The indispensable role of the general ordinal numbers in ZF set theory. 

Probably the general ordinal numbers are most useful, and least avoidable, in their role as the “parameter 
space” for the construction of the cumulative hierarchy of sets in Section 12.6. This recursive construction 
procedure builds a universe of sets which is closed with respect to the ZF axioms for set-union, power sets, 
infinity and replacement. In some sense, one could say that each time the ordinal-number parameter $\alpha$ is
“incremented”, the parametrised stage $V_\alpha$ of the cumulative hierarchy is expanded. Then the union of all
stages $V_\alpha$ for all ordinals $\alpha$ is closed under ZF operations. In other words, it is a “model” for Zermelo-Fraenkel
set theory. This is the main standard model for ZF, and the general ordinal numbers are an indispensable 
part of the infrastructure for this model. 


12.5.4 REMARK: The usefulness of general ordinal numbers as “yardsticks” for cardinality. 

When the cardinality of a set must be measured by comparison with a specific “yardstick” (or “metre rule” ) 
set, so as to avoid using abstract non-set cardinal numbers, it is the general ordinal numbers which are 
usually used. However, this only works for all sets if all sets are well-ordered, which they are not in practice, 
although choice-axiom-believers claim that well-orderings do exist in some inaccessible universe. Hence in 
practice, ordinal numbers are fairly useless as “cardinality yardsticks” except for finite and countably infinite 
sets. Even the set $\mathbb{R}$ of real numbers has no demonstrable injection $f : \mathbb{R} \to \alpha$ for a general ordinal number $\alpha$,
because the construction of such a map would induce a well-ordering on the real numbers, which is generally 
agreed to be impossible to construct. (Anyone who could achieve such a construction explicitly would deserve 
a Nobel prize for literature because it would be an outstanding work of fiction!) A map which can only be 
explicitly constructed for sets which can be explicitly well-ordered is of limited practical value. 


It is very much more natural and practical to use “beta sets” as cardinality yardsticks for infinite sets. 
This approach will be adopted here. (See Section 13.4 for beta-cardinality.) Therefore general ordinal
numbers are not directly required in their role as cardinality yardsticks. General ordinal numbers are useful,
however, for building ZF set theory models such as the “von Neumann universe” and the “constructible
universe”. In conjunction with the construction of such ZF models, one easily defines also the beta-sets in
Definition 13.4.13. Each beta-set $B_\alpha$ has a subscript $\alpha$ which is a general ordinal number. So general ordinal
numbers are required for defining general cardinality, but only as subscripts for beta-sets and as the ranks
of stages in the construction of the von Neumann universe. On the other hand, one rarely requires ranks
above $\omega + 3$. But it is nice to be able to say that the cardinality of any ZF set can be represented by a
particular yardstick-set of some kind, no matter how absurdly big the sets-to-be-measured are. 


12.5.5 REMARK: A personal view of the importance (or otherwise) of transfinite ordinal numbers. 

It is this author’s personal view that the transfinite ordinal numbers are one of the monumental dead-ends 
of mathematics. There are two issues to be addressed here. First, whether or not general transfinite ordinal 
numbers have any really useful role to play in mathematics. And second, how did it happen that such a 
huge amount of intellectual energy and heavy machinery has been brought to bear on a cul-de-sac for over 
a hundred years? 


The importance of the finite ordinal numbers is unquestionable. They serve as yardsticks for both order and 
cardinality. The set $\omega$ of finite ordinal numbers is unquestionably important. But it should be asked whether we
need any more than these. The ordinal numbers beyond $\omega$ are worse than useless as cardinality yardsticks.
The cardinal number for the set of real numbers is an ordinal number which is no less than $\omega_1$, which lies
somewhere beyond $\varepsilon_0$.


The role of transfinite ordinal numbers as order-yardsticks is highly questionable because the existence of 
well-orderings on sets is generally “proved” only by invoking the axiom of choice, which delivers only an IOU 
instead of a real “rabbit out of a hat”. The importance of the ordinal numbers in the time of Cantor, Hilbert 
and von Neumann depended on the value of well-orderings on all sets, which clearly cannot be constructed 
in general. (If they could be constructed, you wouldn't need the axiom of choice!) 


Since virtually all of real-world mathematics (and most fanciful mathematics) can be very easily and comfortably
represented within the von Neumann partial universe $V_{\omega+\omega}$, there seems no need to go beyond this.
For cardinalities, the existence of sets with cardinality between the beth numbers cannot be demonstrated,
since this has been shown to be model-dependent. Therefore the beta-sets $B_\alpha$ defined recursively by $B_0 = \omega$
and $B_{\alpha+1} = \mathbb{P}(B_\alpha)$ (and so forth) are adequate cardinality yardsticks for all sets. (See Definition 13.4.13 and
Section 13.4 for beta-sets and beta-cardinality.) One may remove the fear of not being able to find an exact
match amongst these sets by invoking the generalised continuum hypothesis, which is yet another “rabbit out 
of a hat”. (This particular magic trick guarantees the non-existence of some kinds of sets, whereas the axiom 
of choice guarantees the existence of particular kinds of sets!) However, such fear is not important. It would 
be a truly important discovery if one could in fact find such “mediate cardinals” between the cardinalities 
of the beta-sets. Knowing the smallest beta-set which dominates a given set provides more than adequate 
measurement of the size of sets. 


The remaining question is why so much intellect has been expended in the last 120 years on transfinite ordinal 
numbers which bring no obvious benefits but cause so much obvious trouble. Probably the endorsement 
of Cantor's “paradise” by Hilbert was an important factor influencing the investment of research effort. 
(See Hilbert [423], page 170: “Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben
können.” Literally: “Out of the Paradise which Cantor created for us, must no one be able to expel us.”)
Von Neumann's work on the ordinals undoubtedly added to their prestige as a mountain to be climbed. 
Understandably, real mathematicians have largely lost interest in the foundations of mathematics since the 
end of the 1960s, when the most important questions seemed to have been resolved. It is not surprising at 
all that set theory and model theory split off into a separate domain of mathematics with limited relevance 
for the mainstream. What is regrettable, however, is that the teaching of set theory continues to include 
substantial coverage of arcane aspects of transfinite ordinal numbers, and that the axiom of choice is taught 
as an article of unquestioned mainstream belief. In practice, the axiom of choice is false because no one can 
construct an explicit well-ordering of the set of real numbers (including all non-constructible real numbers). 


Conclusion: The von Neumann partial universe $V_{\omega+\omega}$ is adequate for almost all mathematics. (It doesn't
contain $\omega + \omega$ as an element though!) The generalised continuum hypothesis should be ignored as an
unnecessary distraction. And the axiom of choice should be resisted wherever and whenever it arises.
Mathematics can live without it. No serious theorems will be lost. And the beta-sets such as $\omega$, $2^\omega$, $2^{2^\omega}$
should be adopted as transfinite cardinality yardsticks.


12.5.6 REMARK: Abstract formulation for general ordinal numbers. 

Some of the literature for the abstract characterisation of ordinal numbers in Definition 12.5.7 is discussed 
in Remark 12.1.8. Unsurprisingly, all extended finite ordinal numbers are ordinal numbers according to 
Definition 12.5.7. This is asserted as Theorem 12.5.13 (iii). Consequently the class of general ordinal numbers 
is a super-class of the class of extended finite ordinal numbers determined by Definition 12.1.3. 


It is perhaps noteworthy that it is not immediately obvious how the conditions of Definition 12.1.3 imply the 
conditions of Definition 12.5.7. Many textbooks commence with the general definition and then define the 
finite ordinals and w in terms of the general class. Since the general ordinals other than the finite ordinals 
and w are largely inapplicable in most mathematics (apart from the study of the class of all ZF sets), it 
seems preferable to focus one’s efforts on the very concrete and useful extended finite ordinals first, and leave 
the very troublesome and useless general ordinals in the background as a purely philosophical investigation. 


In Definition 12.5.7 (i), the equivalence of the well-ordering of $\alpha$ to the condition $\forall S \in \mathbb{P}(\alpha) \setminus \{\emptyset\},\ \bigcap S \in S$
is asserted in Theorem 11.6.16.


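12.5.7 DEFINITION: A (general) ordinal number is a set $\alpha$ which satisfies (i) and (ii).


(i) $\alpha$ is well ordered by the set-inclusion relation. [well-ordering]
In other words, $\forall S \in \mathbb{P}(\alpha) \setminus \{\emptyset\},\ \exists \beta \in S,\ \forall \gamma \in S,\ \beta \subseteq \gamma$.
In other words, $\forall S \in \mathbb{P}(\alpha) \setminus \{\emptyset\},\ \exists \beta \in S,\ \beta \subseteq \bigcap S$.
In other words, $\forall S \in \mathbb{P}(\alpha) \setminus \{\emptyset\},\ \bigcap S \in S$.


(ii) $\bigcup \alpha \subseteq \alpha$. [transitivity]
In other words, $\forall \gamma,\ ((\exists \beta,\ (\gamma \in \beta \text{ and } \beta \in \alpha)) \Rightarrow \gamma \in \alpha)$.
In other words, $\forall \beta \in \alpha,\ \forall \gamma \in \beta,\ \gamma \in \alpha$.
In other words, $\forall \beta,\ \forall \gamma,\ ((\gamma \in \beta \text{ and } \beta \in \alpha) \Rightarrow \gamma \in \alpha)$.
In other words, $\forall \beta,\ (\beta \in \alpha \Rightarrow \beta \subseteq \alpha)$.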
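
The two conditions of Definition 12.5.7 can be tested mechanically on small hereditarily finite sets. The following Python sketch is only an illustration: sets are modelled as nested `frozenset` objects, all function names are hypothetical, and the subset enumeration is exponential in the size of the set, so it is suitable only for very small examples.

```python
from itertools import combinations


def nonempty_subsets(a):
    """Yield every non-empty subset S of a as a frozenset."""
    items = list(a)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)


def is_transitive(a):
    # Condition (ii): every element of every element of a is an element of a.
    return all(g in a for b in a for g in b)


def is_inclusion_well_ordered(a):
    # Condition (i), third form: every non-empty S within a contains its own intersection.
    return all(frozenset.intersection(*S) in S for S in nonempty_subsets(a))


def is_ordinal(a):
    return is_transitive(a) and is_inclusion_well_ordered(a)


def von_neumann(n):
    """The finite ordinal n, built by n applications of a -> a U {a}."""
    a = frozenset()
    for _ in range(n):
        a = a | {a}
    return a


zero, one = von_neumann(0), von_neumann(1)
odd = frozenset({zero, one, frozenset({one})})   # transitive, but not an inclusion chain
print([is_ordinal(von_neumann(n)) for n in range(5)], is_ordinal(odd))
```

The set `odd` shows that transitivity alone is not enough: its elements $\{\emptyset\}$ and $\{\{\emptyset\}\}$ are not comparable by inclusion, so condition (i) fails.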


12.5.8 REMARK: Set-inclusion well-ordering plus ZF regularity implies set-membership well-ordering. 

The great majority of textbooks which present the Robinson-style characterisation of the general ordinal 
numbers as in Definition 12.5.7 assume a stronger version of condition (i), namely they assume well-ordering 
with respect to the set-membership relation “$\in$” rather than the set-inclusion relation “$\subsetneq$”. (In other words,
their strong order relation is “$\in$” instead of “$\subsetneq$”, and their weak order relation is “$\in$ or $=$” instead of “$\subseteq$”.)
The motivation for this could be the historical reluctance to accept the ZF regularity axiom in pure set
theory for theoretical, formal, aesthetic or recreational reasons. (One ironic consequence of rejecting the 
regularity axiom on the grounds that it is too complicated or unnatural-looking is the emergence of a vast 
sprawling, complicated, unnatural-looking class of ordinal numbers which have no identifiable benefits.) 


If the ZF regularity axiom is accepted, as it must be in order to avoid Russell's paradox, then set-inclusion 
well-ordering implies set-membership well-ordering. This follows from the observation that for $\alpha$ and $\beta$
satisfying Definition 12.5.7, $\alpha \in \beta$ if and only if $\alpha \subsetneq \beta$. This follows from Theorem 12.5.19 (xii) combined
with the transitivity condition, Definition 12.5.7 (ii). Thus condition (i) is an easier test for ordinal numbers 
to pass, but in combination with transitivity and regularity (i.e. well-foundedness), the same class of ordinals 
is specified. 


12.5.9 REMARK:  Abbreviation for the proposition that a set is an ordinal number. 

Notation 12.5.10, or something equivalent, is widely used as an abbreviation for the logical predicate which 
means that a set $X$ is an ordinal number. The notation $\mathrm{Ord}(X)$ is slightly unusual because the “output” from
this notation is a predicate, not a set. In other words, “Ord” maps sets to predicates instead of mapping sets
to sets as a notation such as $\mathbb{P}(X)$ does. Therefore “$\mathrm{Ord}(X)$” must be used in a logical predicate in the role
of a proposition, not in the role of a set. In the context of a set theory such as NBG, the class of all ordinals
is well defined. But in ZF set theory, it is not possible to regard the ordinal class as an object in the theory.
This is the reason for defining the ordinals by a predicate formula here rather than a class formula. Thus
one must write “$\mathrm{Ord}(X)$” rather than something like “$X \in \mathrm{Ords}$”.


12.5.10 NOTATION: $\mathrm{Ord}(X)$, for a set $X$, means that $X$ is an ordinal number. Thus $\mathrm{Ord}(X)$ means


$\bigcup X \subseteq X$ and $\forall S \in \mathbb{P}(X) \setminus \{\emptyset\},\ \bigcap S \in S$.


12.5.11 REMARK: The standard order for general ordinal numbers is not a ZF set.

The standard order in Definition 12.2.11 for the set $\omega^+$ of extended finite ordinal numbers is a ZF set. It is
a relation in the sense of Definition 9.5.2, and it is an order in the sense of Definition 11.1.2, and it is a
well-ordering in the sense of Definition 11.6.1. The standard order for general ordinal numbers is none of
these because the ZF set of all general ordinal numbers does not exist. (This is because the ZF axiom of
regularity forbids it via Theorem 7.8.4 (i). If it did exist, it would permit the Burali-Forti paradox. See
Burali-Forti [404], page 164.)


Definition 12.5.12 defines the standard order for general ordinal numbers as a two-parameter set-theoretic 
formula so as to avoid going outside ZF set theory. This formula is merely an abbreviation for the set-inclusion 
relation, which is a set-theoretic formula given in Notation 7.3.3. 


12.5.12 DEFINITION: The standard order on the (general) ordinal numbers is the set-theoretic relation
formula “$\le$” which is defined for ordinal number pairs $(\alpha, \beta)$ by $\alpha \le \beta \Leftrightarrow \alpha \subseteq \beta$.


12.5.13 THEOREM: Some basic relations between extended finite ordinals and general ordinals. 
(i) w is an ordinal number. 
(ii) Every finite ordinal number is an ordinal number. 


(iii) Every extended finite ordinal number is an ordinal number. 


PROOF: For part (i), w is well ordered by Theorem 12.2.7, and transitive by Theorem 12.1.17 (vii). So w 
satisfies the conditions of Definition 12.5.7. 


For part (ii), Theorems 12.2.8 and 12.1.17 (vii) imply that all elements of w also satisfy Definition 12.5.7. 
Part (iii) follows from parts (i) and (ii) and Theorems 12.1.30 (i) and 12.1.35 (ii). 


12.5.14 REMARK: Some properties of general ordinal numbers. 

It should not be forgotten that the von Neumann representation of ordinal numbers in terms of ZF sets and 
the successor construction $\alpha \mapsto \alpha \cup \{\alpha\}$ is a pragmatic representation, many of whose properties are artefacts
of the particular choices for identifications of abstract ordinal numbers with ZF sets. This representation is 
very convenient in many ways, but not all properties of the ZF representation are meaningful. 


Theorem 12.5.15 (iii) states that the successor set of any ordinal number is an ordinal number. (For successor 
sets, see Definition 12.2.17.) But not all ordinal numbers are necessarily successor sets. For example, $\emptyset$ and
$\omega$ are not successor sets of ordinal numbers. Definition 12.5.16 gives the name “successor ordinal (number)”
to those ordinal numbers which are successor sets of ordinal numbers. The ordinal numbers, apart from $\emptyset$,
which are not expressible as the successor set of another ordinal number must presumably be constructed
by some other method, if they are in fact constructible.


The name “limit ordinal” is given to those ordinal numbers, apart from $\emptyset$, which are not successor sets of
ordinal numbers. This is supposed to suggest that they may be constructed by some kind of limit process,
which would require proof if it is true. So the term “limit ordinal” must not be assumed to imply the
availability of a limit construction for such an ordinal.


12.5.15 THEOREM: The successor of any ordinal number is an ordinal number.
(i) If $\alpha$ is an ordinal number, then $\alpha \cup \{\alpha\}$ is a well-ordered set.
(ii) If $\alpha$ is an ordinal number, then $\alpha \cup \{\alpha\}$ is a transitive set.


(iii) If $\alpha$ is an ordinal number, then $\alpha \cup \{\alpha\}$ is an ordinal number.


PROOF: For part (i), let $\alpha$ be an ordinal number. Let $\alpha^+ = \alpha \cup \{\alpha\}$. Let $S$ be a non-empty subset of $\alpha^+$.
If $\alpha \notin S$, then $S \subseteq \alpha$. So $\forall x \in S,\ \bigcap S \subseteq x$, and $\bigcap S \in S$ because $\alpha$ is well ordered by inclusion by
Definition 12.5.7 (i). Thus $S$ contains its lower bound $\bigcap S$.


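Now suppose that $\alpha \in S$. Let $S' = S \setminus \{\alpha\}$. If $S' = \emptyset$, then $S = \{\alpha\}$ and clearly $\alpha \in S$ and $\forall x \in S,\ \alpha \subseteq x$.
(Thus $S$ contains its lower bound $\bigcap S = \alpha$.) So assume that $S' \neq \emptyset$. Then $S' \subseteq \alpha$ and $\forall x \in S',\ \bigcap S' \subseteq x$, and
$\bigcap S' \in S'$ since $\alpha$ is well ordered by inclusion. But $\bigcap S = (\bigcap S') \cap \alpha = \bigcap S'$ because $\bigcap S' \subseteq \bigcup S' \subseteq \bigcup \alpha \subseteq \alpha$
by Definition 12.5.7 (ii) and Theorem 8.4.8 (i, iii) since $S' \subseteq \alpha$. So $\bigcap S \in S' \subseteq S$ and $\forall x \in S,\ \bigcap S \subseteq x$
because $\bigcap S \subseteq \alpha$. Thus $S$ contains its lower bound $\bigcap S$. Hence $\alpha^+$ is well ordered by inclusion.

For part (ii), let $\alpha$ be an ordinal number. Let $\alpha^+ = \alpha \cup \{\alpha\}$. Then $\bigcup(\alpha^+) = (\bigcup \alpha) \cup \alpha$ by Theorems
8.4.8 (xii) and 8.4.6 (ii). So $\bigcup(\alpha^+) \subseteq \alpha \cup \alpha = \alpha \subseteq \alpha^+$ by Definition 12.5.7 (ii). Hence $\alpha^+$ is a transitive set.
Part (iii) follows from parts (i) and (ii) by Definition 12.5.7.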
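
The union identity used for part (ii) can be observed on the first few finite ordinals. The following Python sketch is only a finite illustration with hypothetical helper names (`successor`, `union`), again modelling sets as nested `frozenset` objects.

```python
def successor(a):
    """The successor set a U {a}."""
    return a | frozenset({a})


def union(a):
    """The union of all elements of a."""
    out = frozenset()
    for b in a:
        out |= b
    return out


alpha = frozenset()                              # the ordinal 0
for _ in range(5):
    nxt = successor(alpha)
    assert union(nxt) == union(alpha) | alpha    # U(a+) = (U a) U a
    assert union(nxt) <= nxt                     # a+ is transitive
    alpha = nxt

print(len(alpha))   # alpha is now the finite ordinal 5
```

Each pass through the loop applies the successor construction once, so the loop checks transitivity for the ordinals 1 to 5 only; it says nothing about limit ordinals.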


12.5.16 DEFINITION: A successor ordinal (number) is an ordinal number $\alpha$ which satisfies $\alpha = \beta \cup \{\beta\}$ for
some ordinal number $\beta$.


A limit ordinal (number) is a non-zero ordinal number which is not a successor ordinal. 


12.5.17 DEFINITION: A set-inclusion chain is a set (of sets) X such that 


$\forall y, z \in X,\ y \subseteq z$ or $z \subseteq y$.


In other words, X is totally ordered by the set-inclusion relation. 


12.5.18 REMARK: ZF regularity axiom implies that all non-empty ordinal numbers contain the empty set. 
The proof of Theorem 12.5.19 (vii) uses the ZF axiom of regularity, and in fact the (direct or indirect) use 
of this axiom is indispensable for this proof. Transitivity combined with well-foundedness (i.e. regularity) 
forces the empty set to be a member of every non-empty ordinal number. The consequence $A \notin A$ for all sets $A$ of the
ZF regularity axiom is applied explicitly or implicitly at numerous points in the proof of Theorem 12.5.19. 
But the most substantial direct application of ZF regularity is in the proof of part (xi). 


Figure 12.1.2 in the proof of Theorem 12.1.17 (xiv) gives some idea of the kind of behaviour which can occur 
in the absence of the ZF regularity axiom. Ordinal numbers could then be “founded” on “atoms” (which 
are empty sets which are not the empty set) instead of being “founded” on the empty set as the minimum 
element “on the left” of a chain of containment relations. 


Without the axiom of regularity, the set $\bigcap \alpha$ could be non-empty for some non-empty ordinal number $\alpha$.
Then $\bigcap \alpha$ would be a non-empty ordinal number by Theorem 12.5.19 (v). So $\bigcap \bigcap \alpha$ would be an ordinal
number also. Now suppose that $\bigcap \bigcap \alpha = \emptyset$. Then $\emptyset \in \bigcap \alpha$ because $\bigcap \bigcap \alpha \in \bigcap \alpha$ by Definition 12.5.7 (i)
since $\bigcap \alpha$ is a non-empty subset of the ordinal number $\bigcap \alpha$. But then $\emptyset \in \bigcap \alpha$ would imply that $\emptyset \in \beta$ for
all $\beta \in \alpha$. So $\emptyset \in \alpha$ by Definition 12.5.7 (ii). Therefore $\bigcap \alpha = \emptyset$ by Theorem 8.4.8 (xv). It then follows
from this contradiction that $\bigcap \bigcap \alpha \neq \emptyset$. The recursive application of this argument yields a family $(\alpha_i)_{i \in \omega}$
of non-empty ordinal numbers defined inductively by $\alpha_0 = \alpha$ and $\alpha_{i+1} = \bigcap \alpha_i$ for all $i \in \omega$, which has the
property $\forall i \in \omega,\ \alpha_{i+1} \in \alpha_i$. (This family has the form illustrated in Figure 7.8.1 in Remark 7.8.1.) Such
oddities are sometimes admitted into set theory by dropping the axiom of regularity, but it is not at all clear 
that the benefits of this are anything more than recreational. 


Another consequence of ill-founded set theory, where the regularity axiom is not assumed, is that none of
the “inductive” ordinal numbers summarised in Table 12.6.2 in Remark 12.6.5 can be elements of an ordinal
number α satisfying ⋂α ≠ ∅. If ⋂α ≠ ∅, then ∅ ∉ α, and then by Definition 12.5.7 (ii), ∅ ∉ γ for all γ ∈ α.
In other words, no ordinal number containing the empty set can be a member of α. But all of the ordinal
numbers in Table 12.6.2 do contain the empty set. So in particular, ω ∩ α = ∅. This means that α belongs
to some kind of disjoint “branch” of the ordinal numbers, and this “branch” has membership chains which
extend infinitely “to the left”. This is exactly the kind of problem which is hinted at by the question marks
in Figure 12.1.2 in the proof of Theorem 12.1.17 (xiv). It is difficult to understand why one would want to
introduce such pathological branches into the class of ordinal numbers, which is already excessively abstract.


In terms of the cardinality notation in Notations 13.1.5 and 13.1.11, Theorem 12.5.19 part (vii) asserts that
1 ≤ #(α) implies 1 ⊆ α, and part (ix) asserts that 2 ≤ #(α) implies 2 ⊆ α.

12.5.19 THEOREM: Some basic properties of general ordinal numbers.

(i) If α is an ordinal number, then every element of α is an ordinal number.
(ii) Let X be a set of ordinal numbers. Then every element of ⋃X is an ordinal number.
(iii) Let X be a non-empty set of ordinal numbers. Then every element of ⋂X is an ordinal number.
(iv) Let some element of a set X be an ordinal number. Then every element of ⋂X is an ordinal number.
(v) Let X be a non-empty set of ordinal numbers. Then ⋂X is an ordinal number.
(vi) If α is a non-empty ordinal number, then ⋂α is an ordinal number and ⋂α ∈ α.
(vii) If α is a non-empty ordinal number, then ∅ ∈ α. (In other words, {∅} ⊆ α.)
(viii) If α is a non-empty ordinal number, then ⋂α = ∅.
(ix) If α is an ordinal number with ∃β ∈ α, β ≠ ∅, then {∅, {∅}} ⊆ α.
(x) If α and γ are ordinal numbers with γ ⊊ α, then ⋂(α \ γ) ⊆ γ.
(xi) If α and γ are ordinal numbers with γ ⊊ α, then ⋂(α \ γ) = γ.
(xii) If α and β are ordinal numbers with α ⊊ β, then α ∈ β.
(xiii) If α and β are ordinal numbers, then α ∈ β if and only if α ⊊ β.
(xiv) If α and β are ordinal numbers, then α ⊆ β if and only if α ∈ β or α = β.
(xv) If α is an ordinal number, then α = {β ∈ ℙ(α) \ {α}; Ord(β)}.
(xvi) If α and β are ordinal numbers with α ⊊ β, then α⁺ ⊆ β.
(xvii) If α and β are ordinal numbers, then α ∈ β if and only if α⁺ ⊆ β.
(xviii) If α and β are ordinal numbers with α ⊆ β and #(β \ α) = 1, then β = α⁺.
(xix) If α and β are ordinal numbers with α ⊆ β, then α⁺ ⊆ β⁺.
(xx) If α and β are ordinal numbers with α ⊆ β, then α ∈ β⁺.
(xxi) If α and β are ordinal numbers, then α ⊆ β or β ⊆ α (or both).
(xxii) Every set of ordinal numbers is a set-inclusion chain.
(xxiii) If α and β are ordinal numbers, then α = β or α ∈ β or β ∈ α.
(xxiv) If α and β are ordinal numbers, then β ∉ α implies α ⊆ β.
(xxv) If α and β are ordinal numbers, then α ⊆ β or β ∈ α (but not both).
(xxvi) If α and β are ordinal numbers, then β ∉ α if and only if α ⊆ β.
(xxvii) If α and β are ordinal numbers, then α ∪ β is an ordinal number.
(xxviii) Let X be a set of ordinal numbers. Then ⋃X is well ordered by set-inclusion.
(xxix) Let X be a set of ordinal numbers. Then ⋃X is an ordinal number.
(xxx) If α is an ordinal number, then ⋃α is an ordinal number.
(xxxi) Let X be a set of ordinal numbers. Then X ⊆ (⋃X)⁺.
(xxxii) Let X be a set of ordinal numbers. Then X is well ordered by set-inclusion.
(xxxiii) Let X be a set of ordinal numbers. Then X is an ordinal number if and only if ⋃X ⊆ X.
(xxxiv) Let X be a set of ordinal numbers. Then X is an ordinal number if and only if X = ⋃X or X = (⋃X)⁺.
(xxxv) If α is an ordinal number, then ⋃(α⁺) = α.
(xxxvi) If α is an ordinal number, then α is a successor ordinal if and only if ∃β ∈ α, β⁺ = α.
(xxxvii) If α is a successor ordinal, then ⋃α ∈ α.
(xxxviii) If α is a successor ordinal, then α = ⋃α ∪ {⋃α}.
(xxxix) If α is a non-empty ordinal number, then α is a limit ordinal if and only if ∀β ∈ α, β⁺ ∈ α.
(xl) If α is a limit ordinal, then ⋃α = α.
(xli) If α is an ordinal number, then α ⊆ (⋃α)⁺.
(xlii) If α is an ordinal number, then α is a successor ordinal if and only if ⋃α ∈ α.
(xliii) If α is an ordinal number, then α is a limit ordinal if and only if α ≠ ∅ and ⋃α = α.
(xliv) If α is an ordinal number, then α is a limit ordinal if and only if α ≠ ∅ and ⋃α ∉ α.
(xlv) If α is an ordinal number, then either ⋃α ∈ α or ⋃α = α (but not both).
(xlvi) Let α be an ordinal number. Then α = ⋃α ∪ (α ∩ {⋃α}).


PROOF: For part (i), let α be an ordinal number. Let β ∈ α. Let S ∈ ℙ(β) \ {∅}. Then S ∈ ℙ(α) \ {∅}
because S ⊆ β ⊆ α by Definition 12.5.7 (ii). So ⋂S ∈ S by Definition 12.5.7 (i). Therefore β is well ordered
by set-inclusion by Definition 11.6.17.

Now let γ ∈ β. Then γ ∈ α by Definition 12.5.7 (ii). So either γ ⊆ β or β ⊆ γ because ⊆ is a total order
of α by Theorem 11.6.2. Suppose that β ⊆ γ. Then γ ∈ γ, which contradicts Theorem 7.8.4 (i). Therefore
γ ⊆ β. Thus ∀γ, (γ ∈ β ⇒ γ ⊆ β). So β satisfies the transitivity condition, Definition 12.5.7 (ii). Hence β
is an ordinal number by Definition 12.5.7.

For part (ii), let X be a set of ordinal numbers. Let α ∈ ⋃X. Then α ∈ β for some β ∈ X. So α is an
ordinal number by part (i).

For part (iii), let X be a non-empty set of ordinal numbers. Let α ∈ ⋂X. Then α ∈ β for every β ∈ X. So
α ∈ β for some β ∈ X because X ≠ ∅. So α is an ordinal number by part (i).

For part (iv), let X be a set, and let α ∈ X be an ordinal number. Then ⋂X is well defined because X ≠ ∅,
and then ⋂X ⊆ α by Theorem 8.4.8 (xv). Hence by part (i), every element of ⋂X is an ordinal number.

For part (v), let X be a non-empty set of ordinal numbers. Let S ∈ ℙ(⋂X) \ {∅}. Then S ⊆ ⋂X. So
S ⊆ β for all β ∈ X. But X ≠ ∅. So there exists an ordinal number β such that S ⊆ β. Therefore ⋂S ∈ S
by Definition 12.5.7 (i) because S ≠ ∅. Thus ⋂X is well ordered by Theorem 11.6.16.

Now let β ∈ ⋂X. Then β ∈ γ for all γ ∈ X. So β ⊆ γ for all γ ∈ X by Definition 12.5.7 (ii) because all
elements of X are ordinal numbers. Therefore β ⊆ ⋂X. Thus ∀β, (β ∈ ⋂X ⇒ β ⊆ ⋂X). So ⋂X satisfies
the transitivity condition, Definition 12.5.7 (ii). Hence ⋂X is an ordinal number by Definition 12.5.7.


For part (vi), ⋂α is an ordinal number by parts (i) and (v), and ⋂α ∈ α by Definition 12.5.7 (i).

For part (vii), let α be a non-empty ordinal number. Then by the ZF axiom of regularity, Definition 7.2.4 (7),
there is a set β ∈ α such that β ∩ α = ∅. Such β satisfies β ⊆ α by Definition 12.5.7 (ii). So β = β ∩ α = ∅.
Hence ∅ ∈ α.

For part (viii), let α be a non-empty ordinal number. Then ⋂α ⊆ β for all β ∈ α by Theorem 8.4.8 (xv).
But ∅ ∈ α by part (vii). So ⋂α ⊆ ∅. Hence ⋂α = ∅ by Theorem 7.6.5 (ii).


For part (ix), let α be an ordinal number and let β ∈ α for some β ≠ ∅. Then ∅ ∈ α by part (vii).
Let S = α \ {∅}. Then S ≠ ∅. So ⋂S ∈ S by Definition 12.5.7 (i). But all elements of S are ordinal
numbers by part (i). So all elements of S contain ∅ by part (vii). Therefore ∅ ∈ ⋂S. Let γ = ⋂S. Then
γ ⊇ {∅}. Suppose that γ ≠ {∅}. Then δ ∈ γ for some δ ≠ ∅. So δ ∈ α by Definition 12.5.7 (ii). But
then δ ∈ S. So γ ⊆ δ because γ = ⋂S. Therefore δ ∈ δ, which contradicts Theorem 7.8.4 (i) (which is a
consequence of the ZF axiom of regularity). Consequently there is no such δ ∈ γ. Thus γ = {∅}. Therefore
{∅} = γ = ⋂S ∈ S ⊆ α. Thus {∅} ∈ α. Hence {∅, {∅}} ⊆ α.

For part (x), let α and γ be ordinal numbers with γ ⊊ α. Let S = α \ γ. Then S ≠ ∅ and S ⊆ α. Let
δ = ⋂S. Then δ ∈ S by Definition 12.5.7 (i). Let ε ∈ α \ γ. Then δ ⊆ ε. So ε ∉ δ because ε ∉ ε. Therefore
δ ∩ (α \ γ) = ∅. Hence δ ⊆ γ. In other words, ⋂(α \ γ) ⊆ γ.


For part (xi), let α be an ordinal number. Let S = {γ ∈ ℙ(α) \ {α}; Ord(γ) and ⋂(α \ γ) ≠ γ}. Let
γ ∈ S. Then γ is an ordinal number with γ ⊊ α and ⋂(α \ γ) ≠ γ. Let δ = ⋂(α \ γ). Then δ ∈ α \ γ by
Definition 12.5.7 (i) because α \ γ ∈ ℙ(α) \ {∅} and α is an ordinal number. So δ is an ordinal number by
part (i) because δ ∈ α, and δ ⊆ γ by part (x). But then δ ≠ γ because γ ∈ S. Therefore δ ⊊ γ. So γ \ δ ≠ ∅.
Let ε = ⋂(γ \ δ). Then ε ∈ γ \ δ by Definition 12.5.7 (i) because γ \ δ ∈ ℙ(γ) \ {∅}. Therefore ε ≠ δ because
δ ∈ α \ γ. But ε ⊆ δ by part (x). So ⋂(α \ δ) = (⋂(α \ γ)) ∩ (⋂(γ \ δ)) = δ ∩ ε = ε. Therefore δ ∈ S
because ⋂(α \ δ) = ε ≠ δ. It follows that ∀γ ∈ S, ⋂(α \ γ) ∈ S. Then by applying this rule twice, one
obtains ∀γ ∈ S, ⋂(α \ ⋂(α \ γ)) ∈ S. (This procedure is illustrated in Figure 12.5.1.)


[Figure 12.5.1 Application of the ZF regularity axiom to show that γ ⊊ α implies γ ∈ α]


Now let γ ∈ S. Let δ = ⋂(α \ γ) and ε = ⋂(α \ δ). Then it has been shown that δ and ε are well-defined
elements of S, and ε ∈ γ \ δ ⊆ γ. Thus ⋂(α \ ⋂(α \ γ)) = ε ∈ γ. Consequently ∀γ ∈ S, ∃ε ∈ S, ε ∈ γ. So
S = ∅ by Theorem 7.8.3 (i). (In other words, the infinite sequence of membership relations “on the left” is
forbidden by the ZF regularity axiom.) Hence if α and γ are ordinal numbers with γ ⊊ α, then ⋂(α \ γ) = γ.

For part (xii), let α and β be ordinal numbers with α ⊊ β. Then α = ⋂(β \ α) by part (xi). But ⋂(β \ α) ∈ β \ α
by Definition 12.5.7 (i) because β is an ordinal number and β \ α ∈ ℙ(β) \ {∅}. Therefore ⋂(β \ α) ∈ β.
Hence α ∈ β.


For part (xiii), let α and β be ordinal numbers. Then α ∈ β implies α ⊊ β by Definition 12.5.7 (ii) and
Theorem 7.8.4 (i). The converse follows from part (xii). Hence α ∈ β if and only if α ⊊ β.


Part (xiv) follows from part (xiii). 

For part (xv), let α be an ordinal number. Let β ∈ α. Then β ⊆ α by Definition 12.5.7 (ii). But α ∉ α. So
β ≠ α. Also, β is an ordinal number by part (i). Therefore α ⊆ {β ∈ ℙ(α) \ {α}; Ord(β)}.

Now let β be an ordinal number with β ⊊ α. Then β ∈ α by part (xii). So α ⊇ {β ∈ ℙ(α) \ {α}; Ord(β)}.
Hence α = {β ∈ ℙ(α) \ {α}; Ord(β)}.


Part (xvi) follows from part (xii). 


For part (xvii), let α and β be ordinal numbers. If α ∈ β, then α⁺ ⊆ β by parts (xiii) and (xvi). Conversely,
if α⁺ ⊆ β, then α ∈ β because α ∈ α⁺. Hence α ∈ β if and only if α⁺ ⊆ β.

For part (xviii), let α and β be ordinal numbers with α ⊆ β and #(β \ α) = 1. Then β = α ∪ {γ} for some set
γ with γ ∉ α. But α ∈ β by part (xii), and α ∉ α by Theorem 7.8.4 (i). So γ = α. Hence β = α⁺ = α ∪ {α}.

For part (xix), let α and β be ordinal numbers. Then by Theorem 12.5.15 (iii), α⁺ and β⁺ are ordinal
numbers. Assume that α ⊆ β. If α = β, then clearly α⁺ = β⁺. So assume that α ⊊ β. Then α⁺ ⊆ β by
part (xvi). Hence α⁺ ⊆ β⁺.


Part (xx) follows from part (xix) because α ∈ α⁺.


For part (xxi), let α and β be ordinal numbers. Let γ = α ∩ β. Then γ is an ordinal number by part (v),
and γ ⊆ α and γ ⊆ β. Suppose that α ⊈ β and β ⊈ α. Then γ ⊊ α and γ ⊊ β. So γ ∈ α and γ ∈ β by
part (xii). But γ ∉ γ. So γ ∈ α \ γ and γ ∈ β \ γ. Therefore γ ∈ (α ∩ β) \ γ = ∅, which is impossible by the
definition of ∅. Hence α ⊆ β or β ⊆ α (or both).

Part (xxii) follows from part (xxi) and Definition 12.5.17. 

Part (xxiii) follows from parts (xii) and (xxi). 

For part (xxiv), let α and β be ordinal numbers with β ∉ α. Then α ∈ β or α = β by part (xxiii). Hence
α ⊆ β by Definition 12.5.7 (ii).

For part (xxv), the assertion α ⊆ β or β ∈ α follows from part (xxiv) and Theorem 4.7.9 (lxv). The
impossibility of both follows from the observation that β ∈ α ⊆ β is forbidden by the ZF regularity axiom.

Part (xxvi) follows from part (xxiv) and the fact that β ∈ α ⊆ β is forbidden by the ZF regularity axiom.

Part (xxvii) follows from part (xxi).


For part (xxviii), let X be a set of ordinal numbers. Let S be a non-empty subset of ⋃X. Let β ∈ S. Then
β is an ordinal number by part (ii). Let S₁ = {α ∈ S; α ⊆ β}. Then S₁ ≠ ∅ because β ∈ S₁. Let α ∈ S₁.
Then α ⊆ β. So α ∈ β⁺ by part (xx). Thus S₁ ⊆ β⁺. So ⋂S₁ ∈ S₁ by Definition 12.5.7 (i) because β⁺ is
an ordinal number by Theorem 12.5.15 (iii).

Let S₂ = S \ S₁. Then S = S₁ ∪ S₂ because S₁ ⊆ S. Let α₁ ∈ S₁ and α₂ ∈ S₂. Then α₁ and α₂ are
ordinal numbers by part (ii), and α₁ ⊆ β and α₂ ⊈ β. So β ⊆ α₂ by part (xxi). Therefore α₁ ⊆ α₂. Thus
∀α₁ ∈ S₁, ∀α₂ ∈ S₂, α₁ ⊆ α₂. So ⋂S = ⋂(S₁ ∪ S₂) = ⋂S₁ by Theorem 8.4.8 (xxiii). But ⋂S₁ ∈ S₁. So
⋂S ∈ S. Hence ⋃X is well-ordered by set-inclusion.

For part (xxix), let X be a set of ordinal numbers. Let α ∈ ⋃X. Then α is an ordinal number by part (ii),
and α ∈ β for some β ∈ X. So by Definition 12.5.7 (ii), α ⊆ β for some β ∈ X. Therefore α ⊆ ⋃X. Thus ⋃X
satisfies transitivity, Definition 12.5.7 (ii). Hence ⋃X is an ordinal number by part (xxviii).


Part (xxx) follows from parts (i) and (xxix). 

For part (xxxi), let α ∈ X \ ⋃X. Then ∀β ∈ X, α ∉ β. So ∀β ∈ X, β ⊆ α by part (xxiv). So ⋃X ⊆ α
by Theorem 8.4.8 (xvii). But α ⊆ ⋃X by Theorem 8.4.8 (xiv) because α ∈ X. So α = ⋃X. Thus
X \ ⋃X ⊆ {⋃X}. Hence X ⊆ ⋃X ∪ {⋃X} = (⋃X)⁺ by Theorem 8.2.6 (v).

For part (xxxii), let X be a set of ordinal numbers. Then X ⊆ (⋃X)⁺ by part (xxxi). So X is a subset of
the ordinal number (⋃X)⁺ by part (xxx) and Theorem 12.5.15 (iii). But then (⋃X)⁺ is well-ordered by
set-inclusion by Definition 12.5.7 (ii). So X is well ordered by set-inclusion by Theorem 11.6.6.


Part (xxxiii) follows from part (xxxii) and Definition 12.5.7. 






For part (xxxiv), let X be a set of ordinal numbers. Suppose that X is an ordinal number. Then ⋃X ⊆
X ⊆ ⋃X ∪ {⋃X} by parts (xxxiii) and (xxxi). Therefore X = ⋃X or X = (⋃X)⁺. Now suppose that
X = ⋃X or X = (⋃X)⁺. Then ⋃X ⊆ X. So X is an ordinal number by part (xxxiii). Hence X is an
ordinal number if and only if X = ⋃X or X = (⋃X)⁺.

For part (xxxv), let α be an ordinal number. Then ⋃α ⊆ α by Definition 12.5.7 (ii), from which it then
follows that ⋃(α⁺) = ⋃(α ∪ {α}) = α ∪ ⋃α = α.

For part (xxxvi), let α be an ordinal number. Suppose that β⁺ = α for some β ∈ α. Then such β is
an ordinal number by part (i). So α is a successor ordinal by Definition 12.5.16. Now suppose that α is
a successor ordinal. Then α = β⁺ = β ∪ {β} for some ordinal number β. But β ∈ α for such β. Thus
∃β ∈ α, β⁺ = α. Hence an ordinal number α is a successor ordinal if and only if ∃β ∈ α, β⁺ = α.


For part (xxxvii), let α be a successor ordinal. Then α = δ⁺ for some ordinal number δ by Definition 12.5.16.
Let γ = ⋃α. Then γ is an ordinal number by part (xxx), and γ ⊆ α by Definition 12.5.7 (ii). But δ ∈ α. So
δ ≠ α by Theorem 7.8.4 (i). Therefore γ ⊊ α because γ = ⋃(δ⁺) = δ by part (xxxv). So γ ∈ α by part (xii).
In other words, ⋃α ∈ α.

For part (xxxviii), let α be a successor ordinal. Then ⋃α ⊆ α by Definition 12.5.7 (ii), and ⋃α ∈ α by
part (xxxvii). So α ⊇ ⋃α ∪ {⋃α}. Now let β ∈ α. Then β ⊆ ⋃α by Theorem 8.4.8 (xiv). So β ∈ ⋃α
or β = ⋃α by part (xiv) because ⋃α is an ordinal number by part (xxx) and β is an ordinal number by
part (i). So β ∈ ⋃α ∪ {⋃α}. Therefore α ⊆ ⋃α ∪ {⋃α}. Hence α = ⋃α ∪ {⋃α}.

For part (xxxix), let α be a non-empty ordinal number. Then by Definition 12.5.16 and part (xxxvi), α is a
limit ordinal if and only if ∀β ∈ α, β⁺ ≠ α. But ∀β ∈ α, β⁺ ⊆ α by part (xvii). So α is a limit ordinal if
and only if ∀β ∈ α, β⁺ ⊊ α. Hence by part (xiii), α is a limit ordinal if and only if ∀β ∈ α, β⁺ ∈ α.

For part (xl), let α be a limit ordinal. Then ⋃α ⊆ α by Definition 12.5.7 (ii). To show that α ⊆ ⋃α,
let β ∈ α. Then β is an ordinal number by part (i), and β ⊆ ⋃α by Theorem 8.4.8 (xiv). But ⋃α is
an ordinal number by part (xxx). So β ∈ ⋃α or β = ⋃α by part (xiv). Suppose that β = ⋃α. Since
α ≠ ∅ because β ∈ α, part (xxxix) implies that β⁺ ∈ α. Therefore β⁺ ⊆ ⋃α by Theorem 8.4.8 (xiv). So
β ∈ ⋃α = β, which contradicts Theorem 7.8.4 (i). Therefore β ≠ ⋃α, and so β ∈ ⋃α. Thus α ⊆ ⋃α.
Hence ⋃α = α.


Part (xli) follows from parts (xxxviii) and (xl). 


For part (xlii), let α be a successor ordinal. Then ⋃α ∈ α by part (xxxvii). Now let α be an ordinal number
with ⋃α ∈ α. Then α ≠ ∅, and ⋃α ≠ α by Theorem 7.8.4 (i). So by part (xl), α is not a limit ordinal.
Hence α is a successor ordinal by Definition 12.5.16.

For part (xliii), let α be a limit ordinal. Then α ≠ ∅ by Definition 12.5.16, and ⋃α = α by part (xl). Now
let α be a non-zero ordinal number with ⋃α = α. Then by part (xxxvii) and Theorem 7.8.4 (i), α is not a
successor ordinal. Hence α is a limit ordinal by Definition 12.5.16.


Part (xliv) follows from part (xlii) and Definition 12.5.16. 


Part (xlv) follows from parts (xlii) and (xliii) and Definition 12.5.16. (Alternatively the assertion follows 
from parts (xiv) and (xxx) combined with Definition 12.5.7 (ii).) 

For part (xlvi), if ⋃α ∈ α, then α = (⋃α)⁺ by parts (xlii) and (xxxviii). So α = ⋃α ∪ (α ∩ {⋃α}). If
⋃α ∉ α, then α = ⋃α by parts (xliv) and (xl). So α = ⋃α ∪ (α ∩ {⋃α}). Thus the formula is valid in all
cases.
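
Several parts of Theorem 12.5.19 can be spot-checked mechanically on the finite ordinals. The sketch below is a sanity check, not a proof. It assumes that sets are modelled as Python `frozenset` objects, and the helper names (`successor`, `big_union`, `big_intersection`, `check_finite_ordinals`) are illustrative only. It tests parts (vii), (viii), (xiii), (xxi), (xxiii) and (xxxv) for the first n finite ordinals.

```python
from functools import reduce
from itertools import product


def successor(a: frozenset) -> frozenset:
    """Return a+ = a ∪ {a}."""
    return a | frozenset([a])


def big_union(X) -> frozenset:
    """Return ⋃X, the union of all members of X."""
    return frozenset().union(*X)


def big_intersection(X) -> frozenset:
    """Return ⋂X for a non-empty collection X of sets."""
    return reduce(frozenset.intersection, X)


def check_finite_ordinals(n: int) -> bool:
    """Spot-check some parts of Theorem 12.5.19 on the ordinals 0, ..., n-1."""
    ordinals = [frozenset()]
    for _ in range(n - 1):
        ordinals.append(successor(ordinals[-1]))
    empty = frozenset()
    checks = []
    for a in ordinals:
        if a:
            checks.append(empty in a)                   # part (vii)
            checks.append(big_intersection(a) == empty) # part (viii)
        checks.append(big_union(successor(a)) == a)     # part (xxxv)
    for a, b in product(ordinals, repeat=2):
        checks.append((a in b) == (a < b))              # part (xiii): ∈ iff ⊊
        checks.append(a <= b or b <= a)                 # part (xxi)
        checks.append(a == b or a in b or b in a)       # part (xxiii)
    return all(checks)
```

A call such as `check_finite_ordinals(6)` examines 36 ordered pairs. The `<` operator on `frozenset` is the proper-subset test, which is what part (xiii) requires.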


12.5.20 REMARK: The “unique continuation property” of ordinal numbers. 

Theorem 12.5.19 (xviii) may be thought of as the “unique continuation property” of ordinal numbers. Thus
given any ordinal number α, the “next” ordinal number after α must be α⁺. This is the only choice for an
ordinal number which is “one size bigger” than α.


12.5.21 REMARK: Limit ordinals are limits of their elements. 

Theorem 12.5.19 part (xl) is a quite remarkable property of ordinal numbers, particularly when combined
with part (xxxvii). Part (xl) states that a limit ordinal α is equal to the “limit” ⋃α of its elements. With
respect to the set-inclusion order for the class of ordinal numbers, which is asserted to be a total order in
Theorem 12.5.19 (xxi), the set ⋃α is the supremum of the elements of α since γ = ⋃α is the smallest set
such that β ⊆ γ for all β ∈ α. So the term “limit ordinal” is vindicated! This property of limit ordinals α is
often written as α = ⋃_{β∈α} β or α = ⋃_{β<α} β.

The limit property of a limit ordinal, that it is equal to the limit of its elements, implies that there are
two distinct sub-classes of non-zero ordinal numbers, namely successor ordinals which are constructed by 
“adding one” to the previous ordinal, and limit ordinals which are constructed as the union of all smaller 
ordinals. The alternating application of these two construction procedures is demonstrated in Table 12.6.2 
in Remark 12.6.5. 


The fact that there are only two kinds of non-zero ordinal numbers, the successors and limits, makes it 
possible to define “cumulative hierarchies” of sets by transfinite induction. The most important example 
of this is probably the von Neumann universe discussed in Section 12.6. Another useful such transfinite 
inductive construction is the family of beta-sets in Definition 13.4.13. Since the parameter-space of this 
“family” is not a ZF set, it is not a ZF family of sets in the sense of Definition 10.8.2, but it does become a 
genuine ZF set-family if the parameter-space is restricted to a set. 


12.5.22 REMARK: Maxima and minima of sets of ordinal numbers. 

Theorem 12.5.23 gives some properties of sets of ordinal numbers which can be interpreted in terms of 
maxima or minima. In ZF set theory, a class of ordinal numbers which is only bounded below is not a set. 
So the minimum of such a class cannot be expressed as the minimum of a set. So other ways must be found 
to express such an idea. 


For example, one may say that α⁺ is the least ordinal which is greater than any given ordinal number α. This
observation cannot be expressed as “α⁺ = min{β; Ord(β) and α ∈ β}” because “{β; Ord(β) and α ∈ β}” is
not a ZF set. But this is the implicit significance of Theorem 12.5.23 (i).


12.5.23 THEOREM: Some maximum and minimum properties for sets of ordinal numbers.

(i) If α and β are ordinal numbers, then α ⊊ β if and only if α⁺ ⊆ β.
In other words, α⁺ is the least ordinal which is greater than α.

(ii) Let X be a set of ordinal numbers. Then X ⊆ (⋃X)⁺.

(iii) Let X be a set of ordinal numbers. Then ⋃X ∉ X if and only if X ⊆ ⋃X.

(iv) Let X be a set of ordinal numbers. Then ∀β, (Ord(β) ⇒ (X ⊆ β ⇒ ⋃X ⊆ β)).
In other words, any ordinal number which covers X also covers ⋃X.

(v) Let X be a set of ordinal numbers. Then ⋃X ∈ X if and only if ∃α ∈ X, ∀β ∈ X, β ⊆ α.
In other words, ⋃X ∈ X if and only if X contains a maximum element.

(vi) Let X be a set of ordinal numbers. Then ⋃X ∉ X if and only if ∀α ∈ X, ∃β ∈ X, α ∈ β.
In other words, ⋃X ∉ X if and only if X does not contain a maximum element.

(vii) An ordinal number is a successor ordinal if and only if it contains a maximum element.

(viii) An ordinal number is a limit ordinal if and only if it is non-empty and contains no maximum element.

(ix) Let X be a set of ordinal numbers. Then ⋃X ∈ X if and only if
∀β, (Ord(β) ⇒ (X ⊆ β ⇔ (⋃X)⁺ ⊆ β)). That is, (⋃X)⁺ is the least ordinal which includes X.

(x) Let X be a set of ordinal numbers. Then ⋃X ∉ X if and only if
∀β, (Ord(β) ⇒ (X ⊆ β ⇔ ⋃X ⊆ β)). That is, ⋃X is the least ordinal which includes X.

(xi) Let X be a set of ordinal numbers. Then the least ordinal number which is greater than all elements of
X is the same as the least ordinal number which includes X.

(xii) Let X be a set of ordinal numbers. Then ⋃X ∈ X if and only if (⋃X)⁺ is the least ordinal number
which is greater than all elements of X.

(xiii) Let X be a set of ordinal numbers. Then ⋃X ∉ X if and only if ⋃X is the least ordinal number which
is greater than all elements of X.

(xiv) Let X be a set of ordinal numbers. Then the least ordinal number greater than all elements of X is

⋃X ∪ (X ∩ {⋃X}) = (⋃X)⁺ if ⋃X ∈ X,
                  ⋃X    if ⋃X ∉ X.

(xv) Let X be a set of ordinal numbers. Let α ∈ ⋃X ∪ (X ∩ {⋃X}). Then ∃β ∈ X, α ⊆ β.
PROOF: Part (i) follows from Theorem 12.5.19 (xvii, xiii).

For part (ii), let X be a set of ordinal numbers. Let α ∈ X. Then α ⊆ ⋃X by Theorem 8.4.8 (xiv). So
α ∈ (⋃X)⁺ by Theorem 12.5.19 (xxix, xx). Hence X ⊆ (⋃X)⁺.

Part (iii) follows immediately from part (ii) because (⋃X)⁺ = ⋃X ∪ {⋃X}.

For part (iv), let β be an ordinal number such that X ⊆ β. Let α ∈ ⋃X. Then α ∈ γ for some γ ∈ X,
which implies γ ∈ β. So α ∈ β by transitivity, Definition 12.5.7 (ii). Thus ⋃X ⊆ β. Hence ∀β, (Ord(β) ⇒
(X ⊆ β ⇒ ⋃X ⊆ β)).

For part (v), let X be a set of ordinal numbers such that ⋃X ∈ X. Let α = ⋃X. Let β ∈ X. Then
β ⊆ ⋃X by Theorem 8.4.8 (xiv). Thus ∃α ∈ X, ∀β ∈ X, β ⊆ α. Now let X be a set of ordinal numbers
which satisfies ∃α ∈ X, ∀β ∈ X, β ⊆ α. Let α ∈ X satisfy ∀β ∈ X, β ⊆ α. Let γ ∈ ⋃X. Then γ ∈ β for
some β ∈ X. So γ ∈ α because β ⊆ α. Thus ⋃X ⊆ α. But α ⊆ ⋃X by Theorem 8.4.8 (xiv). Therefore
⋃X = α ∈ X. Hence ⋃X ∈ X if and only if ∃α ∈ X, ∀β ∈ X, β ⊆ α.

Part (vi) follows from part (v) and Theorem 12.5.19 (xiii). 


Part (vii) follows from Theorem 12.5.19 (i, xlii) and part (v). 

Part (viii) follows from Theorem 12.5.19 (i, xliii) and part (vi). 

For part (ix), first assume that ⋃X ∈ X. Let β be an ordinal number with X ⊆ β. Let α ∈ (⋃X)⁺.
Then α ∈ ⋃X or α = ⋃X. If α = ⋃X, then α ∈ X, and so α ∈ β. If α ∈ ⋃X, then by transitivity,
Definition 12.5.7 (ii), α ∈ β because ⋃X ∈ β. Thus α ∈ β in either case. So (⋃X)⁺ ⊆ β. Now suppose
that (⋃X)⁺ ⊆ β. Then X ⊆ β by part (ii). Consequently X ⊆ β ⇔ (⋃X)⁺ ⊆ β.

Now assume ∀β, (Ord(β) ⇒ (X ⊆ β ⇔ (⋃X)⁺ ⊆ β)). With β = ⋃X, one obtains X ⊈ ⋃X because
(⋃X)⁺ ⊈ ⋃X. Therefore ⋃X ∈ X by part (iii).

For part (x), first assume that ⋃X ∉ X. Let β be an ordinal with X ⊆ β. Then ⋃X ⊆ β by part (iv). Now
let β satisfy ⋃X ⊆ β. Then X ⊆ β by part (iii). Consequently ∀β, (Ord(β) ⇒ (X ⊆ β ⇔ ⋃X ⊆ β)).
Now assume ∀β, (Ord(β) ⇒ (X ⊆ β ⇔ ⋃X ⊆ β)). Then β = ⋃X gives X ⊆ ⋃X. Hence ⋃X ∉ X by
part (iii).

For part (xi), ∀β, (Ord(β) ⇒ (X ⊆ β ⇔ ∀α ∈ X, α ∈ β)) follows from Notations 7.2.7 and 7.3.3. But
α ∈ β if and only if β is greater than α. (This follows from Definition 12.5.12 and Theorem 12.5.19 (xiii).)
Hence the least ordinal which includes X is the same as the least ordinal greater than all elements of X.


Part (xii) follows from parts (ix) and (xi). 

Part (xiii) follows from parts (x) and (xi). 

Part (xiv) follows from parts (xii) and (xiii), and Theorem 8.4.8 (xxiv). 

For part (xv), let α ∈ ⋃X ∪ (X ∩ {⋃X}). Suppose that ⋃X ∈ X. Then α ∈ ⋃X ∪ {⋃X}. If α = ⋃X,
then α ⊆ β with β = ⋃X ∈ X. If α ∈ ⋃X, then α ∈ β for some β ∈ X by Notation 8.4.2. So α ⊆ β for
some β ∈ X by Definition 12.5.7 (ii). Now suppose that ⋃X ∉ X. Then α ∈ ⋃X. So for the same reasons,
α ⊆ β for some β ∈ X. Hence ∃β ∈ X, α ⊆ β in all cases.
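
Parts (xii) and (xiii) of Theorem 12.5.23 say that the least ordinal number greater than all elements of X is (⋃X)⁺ when ⋃X ∈ X, and ⋃X otherwise. Both cases are covered by the single expression ⋃X ∪ (X ∩ {⋃X}). The following sketch evaluates this expression for finite sets of finite ordinals. It assumes that sets are modelled as Python `frozenset` objects, and the function names are illustrative only.

```python
def successor(a: frozenset) -> frozenset:
    """Return a+ = a ∪ {a}."""
    return a | frozenset([a])


def finite_ordinal(n: int) -> frozenset:
    """Build the finite ordinal n from the empty set."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a


def big_union(X) -> frozenset:
    """Return ⋃X, the union of all members of X."""
    return frozenset().union(*X)


def least_strict_upper_bound(X: frozenset) -> frozenset:
    """Return ⋃X ∪ (X ∩ {⋃X}) for a set X of ordinal numbers."""
    u = big_union(X)
    return u | (X & frozenset([u]))
```

For X = {1, 3}, the union ⋃X = 3 is an element of X, so the function returns 3⁺ = 4. For X = ∅, the union ⋃X = ∅ is not an element of X, so the function returns ∅ = 0. In a finite model, the case ⋃X ∉ X occurs only for X = ∅, since every non-empty finite set of ordinals contains its maximum.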


12.5.24 REMARK: Comparability properties of ordinal numbers and well-ordered sets. 
Since general ordinal numbers are well ordered by Definition 12.5.7, all of the comparability theorems in 
Section 11.7 are applicable. Theorem 12.5.25 gives some isomorphism properties for ordinal numbers. 


12.5.25 THEOREM: Some isomorphism-related properties of ordinal numbers.

(i) Let α be a lower section of an ordinal number β. Then α is an ordinal number with α ⊆ β.
(ii) Let α and β be ordinal numbers. Then α ⊊ β if and only if α is a proper lower section of β.
(iii) Let α and β be ordinal numbers. Then α ⊆ β if and only if α is a lower section of β.
(iv) Let α and β be ordinal numbers. Then α is a lower section of β, or β is a lower section of α, or both.
(v) Let α and β be ordinal numbers. Then α is a proper lower section of β, or β is a proper lower section
of α, or α = β. (These are mutually exclusive possibilities.)
(vi) Let α be an ordinal number. Let φ : α → α be an order isomorphism. Then φ = id_α.
(vii) Let α and β be order isomorphic ordinal numbers. Then α = β.

Let (X, ≤) be a well-ordered set.

(viii) Let φ₁, φ₂ : X → α be order isomorphisms from (X, ≤) to an ordinal number α. Then φ₁ = φ₂.
(ix) Let φ_k : X → α_k be order isomorphisms from (X, ≤) to ordinal numbers α_k for k = 1, 2. Then α₁ = α₂
and φ₁ = φ₂.
(x) Let φ : X → α be an order isomorphism from (X, ≤) to an ordinal number α. Let A be a proper lower
section of X. Then φ(A) is an ordinal number with φ(A) ∈ α.


PROOF: For part (i), let β be an ordinal number. Let α be a lower section of β. Then by Theorem 11.7.5 (vi),
either α = β or α = {γ ∈ β; γ ∈ δ} for some δ ∈ β. But then δ ⊆ β by Definition 12.5.7 (ii). So α = δ. But
δ is an ordinal number by Theorem 12.5.19 (i). Hence α is an ordinal number with α ⊆ β.

For part (ii), let α and β be ordinal numbers with α ⊊ β. Then α = β ∩ α by Theorem 8.1.7 (v). Thus
α = {γ ∈ β; γ ∈ α}. But α ∈ β by Theorem 12.5.19 (xii). So α is a proper lower section of β by
Definition 11.7.4. The converse follows from Definition 11.7.4.

For part (iii), let α and β be ordinal numbers with α ⊆ β. If α = β, then α is a lower section of β by
Definition 11.7.4. If α ≠ β, then α ⊊ β, and so α is a lower section of β by part (ii). The converse follows
from Definition 11.7.4.


Part (iv) follows from part (iii) and Theorem 12.5.19 (xxi). 
Part (v) follows from part (iv) and Definition 11.7.4. 
Part (vi) follows from Theorem 11.7.2 (iii). 


For part (vii), let α and β be order isomorphic ordinal numbers. Suppose that α ≠ β. Then α ⊊ β or β ⊊ α
by Theorem 12.5.19 (xxi). Suppose that α ⊊ β. Then α is a proper lower section of β by part (ii). So by
Theorem 11.7.5 (viii), α and β are not order isomorphic. Similarly if β ⊊ α. Hence α = β.


Part (viii) follows from Theorem 11.7.2 (vii). 


Part (ix) follows from parts (vii) and (viii) because φ₂ ∘ φ₁⁻¹ : α₁ → α₂ is an order isomorphism.

Part (x) follows from Theorem 11.7.5 (xiii), part (ii), and Theorem 12.5.19 (xii).
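
For a finite well-ordered set, the unique order isomorphism onto an ordinal number in parts (viii) to (x) can be computed explicitly. The sketch below uses the standard recursion φ(x) = {φ(y); y < x}, which is an assumption of this illustration and is not taken from the text. The function name `collapse` is hypothetical, sets are modelled as Python `frozenset` objects, and the input is assumed to consist of distinct, mutually comparable items.

```python
def collapse(points):
    """Map each point of a finite totally ordered set to a finite ordinal.

    The value phi[x] is the set of values phi[y] for all y < x, so the
    least point is sent to the empty set and the order is preserved.
    """
    phi = {}
    for x in sorted(points):
        # Every key already in phi is smaller than x, since the points
        # are distinct and are visited in increasing order.
        phi[x] = frozenset(phi[y] for y in phi if y < x)
    return phi


phi = collapse([30, 10, 20])
image = frozenset(phi.values())
```

Here `image` is the ordinal number 3 = {0, 1, 2}. The proper lower section {10, 20} is mapped onto `phi[30]`, which is the ordinal number 2, and this is an element of `image`, as part (x) predicts.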


12.5.26 REMARK: Limit ordinals cannot be weak initial segments. 

As mentioned in Remark 11.7.10, sets of the form {x ∈ X; x ≤ a} for well-ordered sets (X, ≤) are sometimes
referred to as “weak initial segments”. In the ordinal number context, these are almost the same thing as
successor ordinals. (See Theorem 12.5.27 (ii).) Therefore weak initial segments are never limit ordinals.


12.5.27 THEOREM: Basic properties of weak initial segments of ordinal numbers. 
Let œ be an ordinal number. Let L$ (o) denote (^ € o; y € B] for all 8 € a. 


(i) V8 € a, Lf (a) = BU (8]. 


(ii) VB € a, LZ (a) is a non-zero successor ordinal. 

(iii) VB € a, L5 (o) is not a limit ordinal. 

(iv) If a Z 0, then a is a limit ordinal if and only if V8 € a, L$ (o) € a. 
(v) a is a successor ordinal if and only if 38 € a, L5 (a) =a. 


Let (X, X) be a well-ordered set. For B € P(X) and ordinal numbers 8, let B = 8 mean that (X,<) and 
(B, Ca are order isomorphic. 


(vi) If X “a, then Vb € X, 3B, (Ord(8) and Lj (X) & 8). 


Pnoor: For part (i), let 8 € a. Let y € L5 (a). Then y C 8. Soy € BU (8) by Theorem 12.5.19 (xiv). 
Therefore L7 (a) C 8U(8). Now suppose that y € 8U (8. Then y C 8 by Theorem 12.5.19 (xiv). If y = 8, 
theny € a because B € a. Ify € B, then y € a because 8 C a by Definition 12.5.7 (ii). Thus y € o in either 
case. Soy € L5 (a ). Therefore 8U (8) € L3 (a ). Hence Lj (o )28uU18]. 


Part (ii) follows from part (i) and Definition 12.5.16. 

Part (iii) follows from part (ii) and Definition 12.5.16. 
Part (iv) follows from part (i) and Theorem 12.5.19 (xxxix).


Part (v) follows from part (i) and Theorem 12.5.19 (xxxvi). 
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For part (vi), let X = a. Let ó be the order isomorphism from (X,<) to (8,€), which is unique by 
Theorem 11.7.2 (vii). Let b € X. Then (L$ (X)) = Lj )(@) by Theorem 11.7.1 i» But L2(o) is an 


ordinal number and Lz (a) C a by Theorem 12.5.25 (i). Thus 46, (Ord(8) and L (X) e p). 


IIe 


12.6. The transfinite recursive cumulative hierarchy of sets 


12.6.1 REMARK: The von Neumann universe. 

In view of the close connection between general ordinal numbers, cardinality of ZF sets, and the rank of sets 
in the von Neumann universe, it makes sense to define these three concepts together. The von Neumann 
universe (also known as “the cumulative hierarchy of sets”) is discussed by Bernays [341], pages 203-209; 
Jech [364], pages 33, 54; E. Mendelson [370], page 202; Moore [371], pages 270-271, 279, 281; Pinter [377], 
pages 213-218; Roitman [385], pages 77-79. As mentioned by Moore [371], pages 270, 279, 281, the so-called 
von Neumann universe was in fact first published in 1930 by Zermelo [445], pages 36-40, not by von Neumann. 
(The cumulative hierarchy is also attributed to Zermelo by Pinter [377], page 214.) However, the application 
of von Neumann's transfinite hierarchy to the construction of the cumulative hierarchy of sets was not a 
huge leap forward. The idea was inherent and clear in von Neumann's papers in the 1920s. 


Existence and uniqueness for the general transfinite recursive construction of hierarchies of sets was proved 
by von Neumann in two papers in 1928 for Zermelo-Fraenkel set theory (von Neumann [439]), and for 
von Neumann's own set theory (von Neumann [440]), which later developed into NBG set theory. In neither 
of these papers did he apply his transfinite recursive method to construct a universe of all sets. 


Presentations of the von Neumann universe by Bernays [341], pages 203-209, and E. Mendelson [370], 
page 202, give credit to von Neumann for the transfinite induction construction method which they use, 
but not for its particular application to the construction of a universe of sets. Roitman [385], page 79, 
states (without references) that: “The realization that regularity is equivalent to $V = \bigcup_{\alpha \in \mathrm{ON}} V_\alpha$ is due to
von Neumann.”


The standard notation for a universe of sets is $V$, which apparently originates with the early use in set theory
of the symbol $\wedge$ or $\Lambda$ (or upside-down V) for the empty set, and $\vee$ or V for the universe of sets. (The former
is the intersection of all sets, while the latter is the union of all sets. See for example Peano [375], page XI;
Whitehead/Russell, Volume I [400], page 229; Carnap [345], pages 125-126; Bernays [341], pages 33, 58.) It
is possible that this use of the letter V could have been taken as a hint by some mathematicians that the
universe was being attributed to von Neumann.


12.6.2 REMARK: The finite stages of the von Neumann universe. 

The zeroth stage of the von Neumann universe is $V_0 = \emptyset = 0$. The first stage is $V_1 = \mathbb{P}(V_0) = \{0\} = 1$. The
second stage is $V_2 = \mathbb{P}(V_1) = \{0, \{0\}\} = \{0, 1\} = 2$. The third stage is $V_3 = \mathbb{P}(V_2) = \{0, 1, \{1\}, 2\}$. This is
not an ordinal number, whereas the earlier stages are ordinal numbers. The fourth stage is

\[
\begin{aligned}
V_4 &= \mathbb{P}(V_3) \\
&= \{\, 0, 1, \{1\}, 2, \{\{1\}\}, \{0, \{1\}\}, \{1, \{1\}\}, \{0, 1, \{1\}\}, \\
&\qquad \{2\}, \{0, 2\}, \{1, 2\}, 3, \{\{1\}, 2\}, \{0, \{1\}, 2\}, \{1, \{1\}, 2\}, \{0, 1, \{1\}, 2\} \,\}.
\end{aligned}
\]

Thus $V_4$ contains $16 = 2^4$ elements. It is even more unpleasant when written out in full.


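\[
\begin{aligned}
V_4 = \{\, & \emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}, \\
& \{\{\{\emptyset\}\}\}, \{\emptyset, \{\{\emptyset\}\}\}, \{\{\emptyset\}, \{\{\emptyset\}\}\}, \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}\}, \\
& \{\{\emptyset, \{\emptyset\}\}\}, \{\emptyset, \{\emptyset, \{\emptyset\}\}\}, \{\{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \\
& \{\{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}\}, \{\emptyset, \{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}\}, \{\{\emptyset\}, \{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}\}, \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}\} \,\}.
\end{aligned}
\]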
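
The finite stages can also be generated mechanically. The following Python sketch is only an illustration, not part of the formal development: it models hereditarily finite sets as nested `frozenset` objects, and the helper names `power_set`, `stages` and `ordinal` are hypothetical names chosen for this example.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of the finite set s (as frozensets)."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )

def stages(n):
    """Return the list [V_0, V_1, ..., V_n], where V_0 is empty and V_(k+1) = P(V_k)."""
    v = [frozenset()]
    for _ in range(n):
        v.append(power_set(v[-1]))
    return v

def ordinal(k):
    """Return the finite von Neumann ordinal k, built by iterating a -> a U {a}."""
    o = frozenset()
    for _ in range(k):
        o = o | frozenset([o])
    return o

V = stages(5)
sizes = [len(stage) for stage in V]              # cardinalities of V_0 .. V_5
is_ordinal = [V[k] == ordinal(len(V[k])) for k in range(4)]
print(sizes)
print(is_ordinal)
```

Under these assumptions, `sizes` lists the cardinalities of the first six stages, and `is_ordinal` records that the first three stages coincide with ordinal numbers whereas the stage $V_3$ does not.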


The first 5 mini-universes are illustrated in Figure 12.6.1 using the same visualisation conventions as in 
Figure 12.2.1. (See Remark 12.2.2.) 
[Figure 12.6.1: The first five von Neumann universe stages, $V_0$ to $V_4$]


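The mini-universe $V_5$ contains $2^{16} = 65536$ elements, which is too many to usefully list here. The first few
elements of $V_5$ are as follows (in lexicographic order).

\[
\begin{aligned}
V_5 &= \mathbb{P}(V_4) \\
&= \{\, 0, 1, \{1\}, 2, \{\{1\}\}, \{0, \{1\}\}, \{1, \{1\}\}, \{0, 1, \{1\}\}, \\
&\qquad \{2\}, \{0, 2\}, \{1, 2\}, 3, \{\{1\}, 2\}, \{0, \{1\}, 2\}, \{1, \{1\}, 2\}, \{0, 1, \{1\}, 2\}, \\
&\qquad \{\{\{1\}\}\}, \{0, \{\{1\}\}\}, \{1, \{\{1\}\}\}, \{0, 1, \{\{1\}\}\}, \\
&\qquad \{\{1\}, \{\{1\}\}\}, \{0, \{1\}, \{\{1\}\}\}, \{1, \{1\}, \{\{1\}\}\}, \{0, 1, \{1\}, \{\{1\}\}\}, \\
&\qquad \{2, \{\{1\}\}\}, \{0, 2, \{\{1\}\}\}, \{1, 2, \{\{1\}\}\}, \{0, 1, 2, \{\{1\}\}\}, \\
&\qquad \{\{1\}, 2, \{\{1\}\}\}, \{0, \{1\}, 2, \{\{1\}\}\}, \{1, \{1\}, 2, \{\{1\}\}\}, \{0, 1, \{1\}, 2, \{\{1\}\}\}, \\
&\qquad \{\{0, \{1\}\}\}, \{0, \{0, \{1\}\}\}, \{1, \{0, \{1\}\}\}, \{0, 1, \{0, \{1\}\}\}, \\
&\qquad \{\{1\}, \{0, \{1\}\}\}, \{0, \{1\}, \{0, \{1\}\}\}, \{1, \{1\}, \{0, \{1\}\}\}, \{0, 1, \{1\}, \{0, \{1\}\}\}, \\
&\qquad \{2, \{0, \{1\}\}\}, \ldots \,\}.
\end{aligned}
\]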


The universe stage $V_6$ contains $2^{65536} \approx 2.0035 \cdot 10^{19728}$ sets. This is very significantly greater than the
number of electrons in the visible physical universe. The set $V_k$ expands astonishingly quickly as $k \in \omega$
increases. This shows how deceptive the word “finite” is. It is clearly not possible to represent the arbitrary
choice of a single element of $V_7$ using two spin states of every electron in the known Universe. So $V_{10}$ is totally
beyond description. This gives some idea of the strength of the power-set axiom. In a very small number of
stages, the bounds of human comprehension are exceeded, long before the infinity axiom is applied.
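
The order of magnitude of $\#(V_6) = 2^{65536}$ can be checked with a few lines of Python. This is a minimal sketch which works with base-10 logarithms rather than printing the full integer; the variable names are chosen only for this illustration.

```python
import math

exponent_of_two = 2 ** 16                 # V_6 = P(V_5) has 2**(2**16) = 2**65536 elements
log10_size = exponent_of_two * math.log10(2)

decimal_exponent = math.floor(log10_size)            # power of ten in scientific notation
mantissa = 10 ** (log10_size - decimal_exponent)     # leading factor, between 1 and 10

# The exact integer is still easy to hold in memory: it needs 65537 binary digits.
bit_length = (2 ** exponent_of_two).bit_length()

print(f"#(V_6) is approximately {mantissa:.4f}e{decimal_exponent}")
```

Selecting one element of $V_7 = \mathbb{P}(V_6)$ requires one binary digit for each element of $V_6$, which is why a single element of $V_7$ cannot be recorded using any physically available collection of two-state systems.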


The finite universe stages $V_k$ for $k \in \omega$ satisfy $k \subseteq V_k$ and $k \in V_{k+1} \setminus V_k$. Therefore $V_k$ is the smallest
universe stage which includes $k$, and $V_{k+1}$ is the smallest stage which contains $k$.


For all $k \in \omega \setminus \{0\}$, the difference set $V_k \setminus V_{k-1}$ consists of all set-membership trees which have depth exactly
equal to $k$. It is straightforward to arrange all elements of $V_\omega = \bigcup_{k \in \omega} V_k$ in a “lexicographic order” so that
all elements of $V_k$ come before all elements of $V_{k+1}$, for all $k \in \omega$.
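
The relations $k \subseteq V_k$ and $k \in V_{k+1} \setminus V_k$, and the layering of the stages by depth, can be verified by brute force for small $k$. The sketch below is an illustration under stated assumptions: sets are modelled as nested `frozenset` objects, the helper names are hypothetical, and the function `rank` counts the empty set as rank 0, so that a tree of “depth $k$” in the sense above is assumed to correspond to rank $k - 1$.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of the finite set s (as frozensets)."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )

def ordinal(k):
    """Return the finite von Neumann ordinal k = {0, 1, ..., k-1}."""
    o = frozenset()
    for _ in range(k):
        o = o | frozenset([o])
    return o

def rank(x):
    """Rank of a hereditarily finite set: 0 for the empty set, else 1 + max rank of elements."""
    return max((rank(y) + 1 for y in x), default=0)

V = [frozenset()]
for _ in range(5):
    V.append(power_set(V[-1]))                       # V_0 .. V_5

included = [ordinal(k) <= V[k] for k in range(5)]    # k is a subset of V_k
first_appears = [ordinal(k) in V[k + 1] and ordinal(k) not in V[k] for k in range(5)]
layer_ranks = {k: {rank(x) for x in V[k] - V[k - 1]} for k in range(1, 6)}
print(included, first_appears, layer_ranks)
```

Each layer $V_k \setminus V_{k-1}$ turns out to contain sets of a single rank only, which is the computational counterpart of the statement about trees of a fixed depth.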


12.6.3 REMARK: Some slightly infinite stages of the von Neumann universe 

It is intuitively clear that all possible finite sets which can be built from the empty set will be constructed
in stage $V_k$ for some $k \in \omega$ as discussed in Remark 12.6.2. Therefore the union $V_\omega = \bigcup_{k \in \omega} V_k$ of all such
“finite universes” is the set of all possible finite ZF sets which can be constructed from the empty set. In
fact, it follows from the ZF regularity axiom that no other finite sets are possible. In particular, $\omega \subseteq V_\omega$. So
$V_\omega$ is an infinite set. It is clearly possible to write an explicit bijection $f : \omega \to V_\omega$ by continually extending
bijections $f_k : n_k \to V_k$, where $n_k \in \omega$ has the same cardinality as $V_k$. This is not difficult because $V_k \subseteq V_{k+1}$
for all $k \in \omega$. Hence $V_\omega$ is countable.
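
One concrete way to realise such a bijection is the well-known Ackermann coding of hereditarily finite sets, in which the binary digits of a natural number specify which previously coded sets are elements. This is a swapped-in standard technique, not the stage-by-stage extension described above, although it happens to list all elements of $V_k$ before the new elements of $V_{k+1}$. The function names `decode` and `encode` are hypothetical.

```python
def decode(n):
    """Return the hereditarily finite set coded by n.

    Bit i of n is set if and only if decode(i) is an element of decode(n).
    """
    return frozenset(decode(i) for i in range(n.bit_length()) if (n >> i) & 1)

def encode(s):
    """Inverse of decode: the sum of 2**encode(x) over the elements x of s."""
    return sum(2 ** encode(x) for x in s)

first_sixteen = {decode(n) for n in range(16)}   # the codes 0..15 enumerate V_4
round_trip = all(encode(decode(n)) == n for n in range(300))
print(len(first_sixteen), round_trip)
```

Since every natural number has a unique binary expansion, `decode` is a bijection from $\omega$ to the set of sets built in finitely many steps from the empty set, and `encode` is its inverse.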


The set $V_\omega$ is a model for ZF set theory without the axiom of infinity. This is stated in simple terms as
follows by Cohen [349], page 54.


The first really interesting axiom [of ZF set theory] is the Axiom of Infinity. If we drop it, then we
can take as a model for ZF the set M of all finite sets which can be built up from $\emptyset$. [...] It is clear
that M will be a model for the other axioms, since none of these lead out of the class of finite sets.


In fact, $V_\alpha$ is a model for Zermelo set theory without the axiom of infinity for any limit ordinal $\alpha$. (See
Smullyan/Fitting [392], page 96, for a proof of this. They call any universe which satisfies the Zermelo set
theory axioms apart from the infinity axiom a “basic universe”.)

The cardinalities of $n$ and $V_n$ are the same for $n = 0$, $n = 1$ and $n = 2$, but $n$ is smaller than $V_n$ for $n \in \omega$
with $n \ge 3$. However, the cardinalities of $\omega$ and $V_\omega$ are the same, whereas $n$ is smaller than $V_n$ for all ordinal
numbers $n > \omega$.


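The set $V_{\omega+1} = \mathbb{P}(V_\omega)$ breaks through into a universe where infinite sets are possible. From $\omega \subseteq V_\omega$, it
follows that $\omega + 1 = \omega \cup \{\omega\} \subseteq \mathbb{P}(\omega) \subseteq \mathbb{P}(V_\omega) = V_{\omega+1}$. The inclusion $\mathbb{P}(\omega) \subseteq V_{\omega+1}$ implies that $V_{\omega+1}$ is
uncountable by Cantor's diagonalisation procedure. (See Theorem 13.1.27 (v).) It is well-known that $\mathbb{P}(\omega)$
is equinumerous to the set $\mathbb{R}$ of real numbers. So $V_{\omega+1}$ is an effective “cardinality yardstick” for $\mathbb{R}$. It is
convenient to define $\beta_0 = \omega$, and $\beta_{\alpha+1} = \mathbb{P}(\beta_\alpha)$ for all ordinal numbers $\alpha$, and $\beta_\alpha = \bigcup_{\gamma < \alpha} \beta_\gamma$ for limit
ordinal numbers $\alpha$. (See Definition 13.4.13.) Then in particular, $\beta_\alpha$ is equinumerous to $V_{\omega+\alpha}$ for all finite
ordinal numbers $\alpha$.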


The set $V_{\omega 2}$ is a model for Zermelo set theory. (See Remark 7.7.3 for Zermelo set theory. See Smullyan/
Fitting [392], page 96, for a proof that $V_{\omega 2}$ is a model for Zermelo set theory. They show furthermore that
$V_\alpha$ is a Zermelo universe for any ordinal number $\alpha$ such that $\alpha > \omega 2$.) The fact that $V_{\omega 2}$ is not a model
for Zermelo-Fraenkel set theory is clear from the fact that $\omega 2 \notin V_{\omega 2}$, although $\omega + i \in V_{\omega 2}$ for all $i \in \omega$,
and $\omega 2 = \omega + \omega = \bigcup_{i \in \omega} (\omega + i)$. The ZF replacement axiom implies that $\omega 2$ is a ZF set by considering the
replacement map $i \mapsto \omega + i$ for $i \in \omega$.


12.6.4 REMARK: The relation between ordinal numbers and von Neumann universe stages. 

The cumulative hierarchies of ordinal numbers and sets are generated by the same transfinite recursion 
procedure. The parallel development of these two hierarchies is illustrated in Table 12.6.2. (For brevity, 
some stages have been omitted.) 


The stage number, or “rank” of each stage in the cumulative hierarchy in Table 12.6.2 is the same as the 
ordinal number which is constructed at each stage. Therefore the left column is superfluous. The ordinal 
numbers may be considered to be the “spine” or “backbone” of set theory because of the central role they 
play as the rank parameter in this construction of the “class of all sets”.


At each stage of the hierarchy, non-limit ordinals $\alpha + 1$ are constructed as $\alpha \cup \{\alpha\}$, whereas set-universe
stages $V_{\alpha+1}$ are constructed as $\mathbb{P}(V_\alpha)$, but both the limit ordinal numbers $\alpha_0$ and the von Neumann universe
limit stages $V_{\alpha_0}$ are constructed by the same method, namely as the union of all previous stages. Thus
$\alpha_0 = \bigcup_{\alpha < \alpha_0} \alpha$ and $V_{\alpha_0} = \bigcup_{\alpha < \alpha_0} V_\alpha$ for all limit ordinals $\alpha_0$.


The ad-hoc notation $\varpi_i$ in Table 12.6.2 means the solution, for $i \in \omega$, of the inductive rule $\varpi_{i+1} = \omega^{\varpi_i}$
with $\varpi_0 = \omega$. (Note that $\varpi$ is a variant of the Greek letter $\pi$, not the Greek letter $\omega$. A possible mnemonic
for the letter $\varpi$ in this application is the word “power”.)


The beta-cardinality numbers $\beta_k$ in Table 12.6.2 are defined for all ordinal numbers $k$ by $\beta_k = \#(V_{\omega+k})$
for ordinal numbers $k < \omega^2$ and $\beta_k = \#(V_k)$ for ordinal numbers $k \ge \omega^2$. (See Section 13.4 for beta-
cardinality.) The principal idea of “beta-cardinality” is that the cardinality of sets is measured by
the ordinal numbers for finite and countably infinite sets. For uncountably infinite sets, the ordinal numbers 
are totally inadequate to the task of acting as “cardinality yardsticks”. Therefore beta-cardinality switches 
to the von Neumann set hierarchy as a source of “cardinality yardsticks” for all uncountable sets. Since the 
services of the axiom of choice, and therefore also of the generalised continuum hypothesis, are declined in this 
book, it is not possible to assert that all sets are equinumerous to some stage of the von Neumann hierarchy. 
However, the concrete existence of such sets with “mediate cardinality” has never been demonstrated in a 
constructive way because if they could be explicitly constructed, they would have a well-defined cardinality 
representative amongst the von Neumann universe stages. (In fact, if the universe of sets is restricted to 
constructible sets, both AC and GCH are true in the model. But beware of Skolem’s paradox!) Therefore 
beta-cardinality is well defined for all constructible sets, which is adequate for all practical purposes. 


12.6.5 REMARK: Justification of the stages of the von Neumann universe. 
Each stage in the hierarchy of sets in Table 12.6.2 in Remark 12.6.4 may be justified in one way or another. 


(1) The finite ordinal numbers are justified by the empty set axiom, the pair axiom and the union axiom.
(This is the subject of Section 12.1.) Each ordinal number $\alpha + 1 = \alpha \cup \{\alpha\}$ can be justified by the pair
axiom to obtain $\{\alpha, \alpha\} = \{\alpha\}$, and the union axiom to obtain $\alpha \cup \{\alpha\}$. The universe stage $V_{\alpha+1} = \mathbb{P}(V_\alpha)$
is then justified by the power set axiom.


(2) The existence of the first infinite ordinal number $\omega$ cannot be justified by applying the union axiom to
obtain $\omega = \bigcup_{i \in \omega} i = \bigcup \{i;\ i \in \omega\} = \bigcup \omega$ because $\omega$ must be defined before it is used in this way. This is
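rank $\alpha$    ordinal number    cumulative set hierarchy    $\#(V_\alpha)$

$0$    $0 = \emptyset$    $V_0 = \emptyset$    $0$
$1$    $1 = 0 \cup \{0\} = \{\emptyset\}$    $V_1 = \mathbb{P}(\emptyset) = \{\emptyset\}$    $1$
$2$    $2 = 1 \cup \{1\} = \{\emptyset, \{\emptyset\}\}$    $V_2 = \mathbb{P}(\mathbb{P}(\emptyset)) = \{\emptyset, \{\emptyset\}\}$    $2$
$3$    $3 = 2 \cup \{2\} = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}$    $V_3 = \mathbb{P}(\mathbb{P}(\mathbb{P}(\emptyset))) = \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}\}$    $4$
$4$    $4 = 3 \cup \{3\} = \{0, 1, 2, 3\}$    $V_4 = \mathbb{P}(V_3)$    $2^{(2^2)} = 16$
$5$    $5 = 4 \cup \{4\} = \{0, 1, 2, 3, 4\}$    $V_5 = \mathbb{P}(V_4)$    $2^{16} = 65536$
$6$    $6 = 5 \cup \{5\} = \{0, 1, 2, 3, 4, 5\}$    $V_6 = \mathbb{P}(V_5)$    $2^{65536}$
$7$    $7 = 6 \cup \{6\} = \{0, 1, 2, 3, 4, 5, 6\}$    $V_7 = \mathbb{P}(V_6)$    $2^{(2^{65536})}$
$\omega$    $\omega = \{0, 1, 2, 3, \ldots\} = \bigcup \omega = \varpi_0$    $V_\omega = \bigcup \{V_0, V_1, V_2, \ldots\} = \bigcup_{i \in \omega} V_i$    $\beta_0 = \#(\omega)$
$\omega + 1$    $\omega + 1 = \omega \cup \{\omega\}$    $V_{\omega+1} = \mathbb{P}(V_\omega)$    $\beta_1 = \#(2^\omega)$
$\omega + 2$    $\omega + 2 = \omega \cup \{\omega, \omega + 1\}$    $V_{\omega+2} = \mathbb{P}(V_{\omega+1})$    $\beta_2 = \#(2^{(2^\omega)})$
$\omega 2$    $\omega 2 = \omega + \omega = \bigcup_{i \in \omega} (\omega + i)$    $V_{\omega 2} = \bigcup \{V_\omega, V_{\omega+1}, V_{\omega+2}, \ldots\} = \bigcup_{i \in \omega} V_{\omega+i}$    $\beta_\omega$
$\omega 2 + 1$    $\omega 2 + 1 = \omega 2 \cup \{\omega 2\}$    $V_{\omega 2+1} = \mathbb{P}(V_{\omega 2})$    $\beta_{\omega+1}$
$\omega 2 + 2$    $\omega 2 + 2 = \omega 2 \cup \{\omega 2, \omega 2 + 1\}$    $V_{\omega 2+2} = \mathbb{P}(V_{\omega 2+1})$    $\beta_{\omega+2}$
$\omega 3$    $\omega 3 = \bigcup_{i \in \omega} (\omega 2 + i)$    $V_{\omega 3} = \bigcup_{i \in \omega} V_{\omega 2+i}$    $\beta_{\omega 2}$
$\omega^2$    $\omega^2 = \omega \omega = \bigcup_{i \in \omega} (\omega \cdot i)$    $V_{\omega^2} = \bigcup_{i \in \omega} V_{\omega \cdot i}$    $\beta_{\omega^2}$
$\omega^3$    $\omega^3 = \omega^2 \omega = \bigcup_{i \in \omega} (\omega^2 \cdot i)$    $V_{\omega^3} = \bigcup_{i \in \omega} V_{\omega^2 \cdot i}$    $\beta_{\omega^3}$
$\omega^\omega$    $\omega^\omega = \bigcup_{i \in \omega} \omega^i = \varpi_1$    $V_{\omega^\omega} = \bigcup_{i \in \omega} V_{\omega^i}$    $\beta_{\omega^\omega}$
$\omega^\omega + 1$    $\omega^\omega + 1 = \omega^\omega \cup \{\omega^\omega\}$    $V_{\omega^\omega+1} = \mathbb{P}(V_{\omega^\omega})$    $\beta_{\omega^\omega+1}$
$\omega^\omega + 2$    $\omega^\omega + 2 = \omega^\omega \cup \{\omega^\omega, \omega^\omega + 1\}$    $V_{\omega^\omega+2} = \mathbb{P}(\mathbb{P}(V_{\omega^\omega}))$    $\beta_{\omega^\omega+2}$
$\omega^\omega + \omega$    $\omega^\omega + \omega = \bigcup_{i \in \omega} (\omega^\omega + i)$    $V_{\omega^\omega+\omega} = \bigcup_{i \in \omega} V_{\omega^\omega+i}$    $\beta_{\omega^\omega+\omega}$
$\omega^\omega + \omega 2$    $\omega^\omega + \omega 2 = \bigcup_{i \in \omega} ((\omega^\omega + \omega) + i)$    $V_{\omega^\omega+\omega 2} = \bigcup_{i \in \omega} V_{\omega^\omega+\omega+i}$    $\beta_{\omega^\omega+\omega 2}$
$\omega^\omega + \omega 3$    $\omega^\omega + \omega 3 = \bigcup_{i \in \omega} ((\omega^\omega + \omega 2) + i)$    $V_{\omega^\omega+\omega 3} = \bigcup_{i \in \omega} V_{\omega^\omega+\omega 2+i}$    $\beta_{\omega^\omega+\omega 3}$
$\omega^\omega \cdot 2$    $\omega^\omega \cdot 2 = \bigcup_{i \in \omega} (\omega^\omega + \omega^i)$    $V_{\omega^\omega \cdot 2} = \bigcup_{i \in \omega} V_{\omega^\omega+\omega^i}$    $\beta_{\omega^\omega \cdot 2}$
$\omega^\omega \cdot 3$    $\omega^\omega \cdot 3 = \bigcup_{i \in \omega} (\omega^\omega \cdot 2 + \omega^i)$    $V_{\omega^\omega \cdot 3} = \bigcup_{i \in \omega} V_{\omega^\omega \cdot 2+\omega^i}$    $\beta_{\omega^\omega \cdot 3}$
$\omega^\omega \cdot \omega$    $\omega^\omega \cdot \omega = \bigcup_{i \in \omega} (\omega^\omega \cdot i)$    $V_{\omega^\omega \cdot \omega} = \bigcup_{i \in \omega} V_{\omega^\omega \cdot i}$    $\beta_{\omega^\omega \cdot \omega}$
$\omega^\omega \cdot \omega 2$    $\omega^\omega \cdot \omega 2 = \bigcup_{i \in \omega} (\omega^\omega \cdot (\omega + i))$    $V_{\omega^\omega \cdot \omega 2} = \bigcup_{i \in \omega} V_{\omega^\omega \cdot (\omega+i)}$    $\beta_{\omega^\omega \cdot \omega 2}$
$\omega^\omega \cdot \omega^2$    $\omega^\omega \cdot \omega^2 = \bigcup_{i \in \omega} (\omega^\omega \cdot \omega i)$    $V_{\omega^\omega \cdot \omega^2} = \bigcup_{i \in \omega} V_{\omega^\omega \cdot \omega i}$    $\beta_{\omega^\omega \cdot \omega^2}$
$\omega^\omega \cdot \omega^3$    $\omega^\omega \cdot \omega^3 = \bigcup_{i \in \omega} (\omega^\omega \cdot \omega^2 i)$    $V_{\omega^\omega \cdot \omega^3} = \bigcup_{i \in \omega} V_{\omega^\omega \cdot \omega^2 i}$    $\beta_{\omega^\omega \cdot \omega^3}$
$(\omega^\omega)^2$    $(\omega^\omega)^2 = \bigcup_{i \in \omega} (\omega^\omega \cdot \omega^i)$    $V_{(\omega^\omega)^2} = \bigcup_{i \in \omega} V_{\omega^\omega \cdot \omega^i}$    $\beta_{(\omega^\omega)^2}$
$(\omega^\omega)^\omega$    $(\omega^\omega)^\omega = \bigcup_{i \in \omega} (\omega^\omega)^i$    $V_{(\omega^\omega)^\omega} = \bigcup_{i \in \omega} V_{(\omega^\omega)^i}$    $\beta_{(\omega^\omega)^\omega}$
$\omega^{(\omega^2)}$    $\omega^{(\omega^2)} = (\omega^\omega)^\omega = \bigcup_{i \in \omega} \omega^{(\omega i)}$    $V_{\omega^{(\omega^2)}} = V_{(\omega^\omega)^\omega} = \bigcup_{i \in \omega} V_{\omega^{(\omega i)}}$    $\beta_{\omega^{(\omega^2)}}$
$\omega^{(\omega^\omega)}$    $\omega^{(\omega^\omega)} = \bigcup_{i \in \omega} \omega^{(\omega^i)} = \varpi_2$    $V_{\omega^{(\omega^\omega)}} = \bigcup_{i \in \omega} V_{\omega^{(\omega^i)}}$    $\beta_{\omega^{(\omega^\omega)}}$
$\varepsilon_0$    $\varepsilon_0 = \bigcup_{i \in \omega} \varpi_i$    $V_{\varepsilon_0} = \bigcup_{i \in \omega} V_{\varpi_i}$    $\beta_{\varepsilon_0}$
$\varepsilon_0 + 1$    $\varepsilon_0 + 1 = \varepsilon_0 \cup \{\varepsilon_0\}$    $V_{\varepsilon_0+1} = \mathbb{P}(V_{\varepsilon_0})$    $\beta_{\varepsilon_0+1}$

Table 12.6.2 The cumulative hierarchies of ordinal numbers and sets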
the most important stage of the whole hierarchy because the entire tower rests on this foundation, and
all limit stages explicitly use $\omega$ for their justification. However, this is also the weakest stage. This set
is defined by Definition 12.1.28. As noted in Remark 12.1.27, the acceptance or rejection of the infinity
axiom is “academic”. The important point is that the properties of $\omega$ are unambiguous if it exists.
The conditional assertions of properties of $\omega$ are true whether it exists or not. (For comparison, the
proposition that “if Mars collides with the Earth, we are all doomed” is clearly true. This proposition
does not require an actual collision to make it true. It is also true that the lengths of the sides of a
perfect equilateral triangle differ by less than $10^{-100}$ of a metre, although perfect equilateral triangles
do not exist.) If the existence of $\omega$ is acceptable, then the existence of $V_\omega = \bigcup_{i \in \omega} V_i$ is justified by the
replacement axiom and the union axiom.


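(3) The ordinal numbers $\omega + i$ and stages $V_{\omega+i}$ are justified for $i \in \omega$ as for ordinals $i$ and stages $V_i$, since
$\omega + (i + 1) = (\omega + i) \cup \{\omega + i\}$ and $V_{\omega+(i+1)} = \mathbb{P}(V_{\omega+i})$ for $i \in \omega$. Then $\omega 2 = \omega + \omega = \bigcup_{i \in \omega} (\omega + i)$ and
$V_{\omega 2} = V_{\omega+\omega} = \bigcup_{i \in \omega} V_{\omega+i}$. This requires the replacement axiom. This is summarised as follows.

\[
\begin{aligned}
\forall i \in \omega, \quad \omega + (i + 1) &= (\omega + i)^+ & V_{\omega+(i+1)} &= \mathbb{P}(V_{\omega+i}) & (12.6.1) \\
\omega 2 = \omega + \omega &= \textstyle\bigcup_{i \in \omega} (\omega + i) & V_{\omega 2} = V_{\omega+\omega} &= \textstyle\bigcup_{i \in \omega} V_{\omega+i} & (12.6.2)
\end{aligned}
\]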


In ordinal arithmetic, the plus operator “+” signifies that the two well-ordered sets on the left and right
of the operator, $\alpha$ and $\beta$ respectively say, are to be concatenated, maintaining their order internally,
and extending order to the concatenated set $\alpha + \beta$ by defining $x < y$ for all $x \in \alpha$ and $y \in \beta$. According
to this addition rule, the ordered set $\alpha + 1$ is isomorphic (as an ordered set) to the ordinal number
$\alpha^+ = \alpha \cup \{\alpha\}$. (This is actually a theorem about $\alpha + 1$, not a definition for $\alpha + 1$.) Similarly, the
ordered set $\omega + 1$ is order-isomorphic to the ordinal number $\omega^+ = \omega \cup \{\omega\}$. (Once again, this is a
theorem about $\omega + 1$, not a definition for $\omega + 1$.) Thus strictly speaking, the equalities in line (12.6.1)
are actually order-isomorphisms. However, it is customary to identify ordinal number concatenations
with their order-isomorphic representatives in the class of ordinal numbers. (These representatives are
always unique.) This addition operation is not commutative in general. (E.g., $1 + \omega = \omega \ne \omega + 1$.)

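(4) For any $i, j \in \omega$, the ordinal number $\omega \cdot j + (i + 1) = \omega j + (i + 1) = (\omega j + i) \cup \{\omega j + i\}$, and the hierarchy
stage $V_{\omega j+(i+1)} = \mathbb{P}(V_{\omega j+i})$, are well defined as in step (3). Then $\omega \cdot (j + 1) = \omega \cdot j + \omega = \bigcup_{i \in \omega} ((\omega \cdot j) + i)$
and $V_{\omega \cdot (j+1)} = \bigcup_{i \in \omega} V_{(\omega \cdot j)+i}$ are also well defined as in step (3). Taking the limit of the stages $\omega \cdot j$ for
$j \in \omega$ yields $\omega^2 = \omega \cdot \omega = \bigcup_{j \in \omega} (\omega \cdot j)$ and $V_{\omega^2} = V_{\omega \cdot \omega} = \bigcup_{j \in \omega} V_{\omega \cdot j}$. This is summarised as follows.

\[
\begin{aligned}
\forall i, j \in \omega, \quad \omega j + (i + 1) &= (\omega j + i)^+ & V_{\omega j+(i+1)} &= \mathbb{P}(V_{\omega j+i}) \\
\forall j \in \omega, \quad \omega \cdot (j + 1) = \omega j + \omega &= \textstyle\bigcup_{i \in \omega} (\omega j + i) & V_{\omega \cdot (j+1)} = V_{\omega j+\omega} &= \textstyle\bigcup_{i \in \omega} V_{\omega j+i} \\
\omega^2 = \omega \cdot \omega &= \textstyle\bigcup_{j \in \omega} (\omega j) & V_{\omega^2} = V_{\omega \cdot \omega} &= \textstyle\bigcup_{j \in \omega} V_{\omega j}
\end{aligned}
\]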


The multiplication operator in ordinal arithmetic signifies that if the well-ordered sets on the left
and right of the operator are $\alpha$ and $\beta$ respectively, then $\alpha \cdot \beta$ is the concatenation of $\beta$ copies of the
set $\alpha$. The order for this concatenation is inherited from $\alpha$ within each copy of $\alpha$, and $x < y$ if $x \in \alpha_L$
and $y \in \alpha_R$ are elements of copies $\alpha_L$ and $\alpha_R$ of $\alpha$, where $\alpha_L$ is strictly to the left of $\alpha_R$. This
multiplication operation is not commutative in general. (E.g., $2 \cdot \omega = \omega \ne \omega \cdot 2$.) The operator symbol
“$\cdot$” is often omitted to save space. Thus for example, “$\omega j + i$” means $(\omega \cdot j) + i$.


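(5) In the “quadratic tranche of stages”, one obtains “quadratic polynomials in $\omega$” as follows.

\[
\begin{aligned}
\forall i, j, k \in \omega, \quad \omega^2 k + \omega j + (i + 1) &= (\omega^2 k + \omega j + i)^+ & V_{\omega^2 k+\omega j+(i+1)} &= \mathbb{P}(V_{\omega^2 k+\omega j+i}) \\
\forall j, k \in \omega, \quad \omega^2 k + \omega (j + 1) &= \textstyle\bigcup_{i \in \omega} (\omega^2 k + \omega j + i) & V_{\omega^2 k+\omega(j+1)} &= \textstyle\bigcup_{i \in \omega} V_{\omega^2 k+\omega j+i} \\
\forall k \in \omega, \quad \omega^2 (k + 1) = \omega^2 k + \omega^2 &= \textstyle\bigcup_{j \in \omega} (\omega^2 k + \omega j) & V_{\omega^2(k+1)} &= \textstyle\bigcup_{j \in \omega} V_{\omega^2 k+\omega j} \\
\omega^3 = \omega^2 \cdot \omega &= \textstyle\bigcup_{k \in \omega} (\omega^2 k) & V_{\omega^3} = V_{\omega^2 \cdot \omega} &= \textstyle\bigcup_{k \in \omega} V_{\omega^2 k}
\end{aligned}
\]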


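These “quadratic polynomials” which lead up to $\omega^3$ may be extended to “cubic polynomials” which lead
up to $\omega^4$. The “cubic polynomials in $\omega$” may be written out as follows. (To save space, addition of 1
has been abbreviated to the superscript “+”. Thus $i^+$ means $(i + 1)$, and so forth.)

\[
\begin{aligned}
\forall i, j, k, \ell \in \omega, \quad \omega^3 \ell + \omega^2 k + \omega j + i^+ &= (\omega^3 \ell + \omega^2 k + \omega j + i)^+ & V_{\omega^3 \ell+\omega^2 k+\omega j+i^+} &= \mathbb{P}(V_{\omega^3 \ell+\omega^2 k+\omega j+i}) \\
\forall j, k, \ell \in \omega, \quad \omega^3 \ell + \omega^2 k + \omega j^+ &= \textstyle\bigcup_{i \in \omega} (\omega^3 \ell + \omega^2 k + \omega j + i) & V_{\omega^3 \ell+\omega^2 k+\omega j^+} &= \textstyle\bigcup_{i \in \omega} V_{\omega^3 \ell+\omega^2 k+\omega j+i} \\
\forall k, \ell \in \omega, \quad \omega^3 \ell + \omega^2 k^+ &= \textstyle\bigcup_{j \in \omega} (\omega^3 \ell + \omega^2 k + \omega j) & V_{\omega^3 \ell+\omega^2 k^+} &= \textstyle\bigcup_{j \in \omega} V_{\omega^3 \ell+\omega^2 k+\omega j} \\
\forall \ell \in \omega, \quad \omega^3 \ell^+ = \omega^3 \ell + \omega^3 &= \textstyle\bigcup_{k \in \omega} (\omega^3 \ell + \omega^2 k) & V_{\omega^3 \ell^+} = V_{\omega^3 \ell+\omega^3} &= \textstyle\bigcup_{k \in \omega} V_{\omega^3 \ell+\omega^2 k} \\
\omega^4 = \omega^3 \cdot \omega &= \textstyle\bigcup_{\ell \in \omega} (\omega^3 \ell) & V_{\omega^4} = V_{\omega^3 \cdot \omega} &= \textstyle\bigcup_{\ell \in \omega} V_{\omega^3 \ell}
\end{aligned}
\]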
One has thus $\omega^4 = \omega^3 \cdot \omega = \bigcup_{\ell \in \omega} (\omega^3 \ell)$ and $V_{\omega^4} = V_{\omega^3 \cdot \omega} = \bigcup_{\ell \in \omega} V_{\omega^3 \ell}$ at the end of this “cubic tranche
of stages” of the von Neumann universe. An induction argument must now be applied to all finite-degree
“polynomials in $\omega$”. There are only a countable number of such polynomials, and it is not too
difficult to construct a map from $\omega$ to the set of all such polynomials. The union of all such “ordinal
number polynomials” is the next-level ordinal number limit, which is denoted $\omega^\omega$. This ordinal number
satisfies $\omega^\omega = \bigcup_{\ell \in \omega} \omega^\ell$. Consequently, the next-level von Neumann universe stage is $V_{\omega^\omega} = \bigcup_{\ell \in \omega} V_{\omega^\ell}$.
For convenience later on, the rank of this stage is also written as $\varpi_1 = \omega^\omega$, while $\varpi_0$ is defined by $\varpi_0 = \omega$.
(6) Let $\sum_{k=0}^{m} \omega^k i_k = \omega^m i_m + \omega^{m-1} i_{m-1} + \ldots + \omega^2 i_2 + \omega i_1 + i_0$ denote the “polynomial in $\omega$” which is
hinted at in part (5). The order of the summation must be from the highest-degree monomials down
to the lowest, as indicated. The multipliers $i_k$ are all elements of $\omega$, and $k$ ranges over elements of $\omega$
which satisfy $k \le m$, where $m \in \omega$ is the “degree” of this “polynomial”. All of the stages up to and
including the $\omega^\omega$ stage may now be repeated with an additional term $\omega^\omega$ on the left. In other words,
one forms the “polynomials” $\omega^\omega + \sum_{k=0}^{m} \omega^k i_k$ for all coefficient tuples $(i_m, \ldots, i_0)$, for all $m \in \omega$. The
union of all of these stages is then $\omega^\omega \cdot 2 = \omega^\omega + \omega^\omega$. One may repeat this procedure any finite number
of times to produce $\omega^\omega \cdot j$ for any $j \in \omega$. The union of these stages is then $\omega^\omega \cdot \omega = \bigcup_{j \in \omega} (\omega^\omega \cdot j)$. This
procedure may be repeated to produce $\omega^\omega \cdot \omega^2 = (\omega^\omega \cdot \omega) \cdot \omega$. (The notation $\omega^\omega \cdot \omega^2$ is not merely a
convenient notation for $(\omega^\omega \cdot \omega) \cdot \omega$. It is verifiable that this follows from the definitions of ordinal number
addition and multiplication. In this special case, the associative law for ordinal number multiplication
is valid.) One may then proceed to produce $\omega^\omega \cdot \omega^\ell$ for $\ell \in \omega$, and the union (i.e. limit) of these is
$(\omega^\omega)^2 = \omega^\omega \cdot \omega^\omega = \bigcup_{\ell \in \omega} (\omega^\omega \cdot \omega^\ell)$. (It is essential to distinguish between $(\omega^\omega)^2$ and $\omega^{(\omega^2)}$ here!)


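(7) Having arrived at $(\omega^\omega)^2$ in part (6), one may now repeat the procedure to arrive at $(\omega^\omega)^k$ for any $k \in \omega$.
The union (i.e. limit) of all such ordinal numbers is then $(\omega^\omega)^\omega = \bigcup_{k \in \omega} (\omega^\omega)^k$. One may now add
arbitrary “polynomials” to this term as in part (6) to produce $(\omega^\omega)^\omega + \sum_{k=0}^{m} \omega^k i_k$ for tuples $(i_m, \ldots, i_0)$,
for all $m \in \omega$. The union of these is $(\omega^\omega)^\omega + \omega^\omega = \bigcup_{j \in \omega} ((\omega^\omega)^\omega + \omega^j)$. Repeating this procedure yields
$(\omega^\omega)^\omega + \omega^\omega k$ for $k \in \omega$, and taking the limit of these gives $(\omega^\omega)^\omega + \omega^\omega \omega = \bigcup_{k \in \omega} ((\omega^\omega)^\omega + \omega^\omega k)$. Repeating
this gives $(\omega^\omega)^\omega + ((\omega^\omega \cdot \omega) \cdot 2) = ((\omega^\omega)^\omega + \omega^\omega \omega) + \omega^\omega \omega$. But $(\omega^\omega \cdot \omega) \cdot 2 = \omega^\omega \cdot (\omega \cdot 2)$. (This is another
special case where associativity gives the right answer!) So $(\omega^\omega)^\omega + ((\omega^\omega \cdot \omega) \cdot 2) = (\omega^\omega)^\omega + \omega^\omega \cdot (\omega 2)$.
This procedure may be repeated to give $(\omega^\omega)^\omega + \omega^\omega \cdot (\omega k)$ for $k \in \omega$, and the limit of these stages yields
$(\omega^\omega)^\omega + \omega^\omega \cdot (\omega^2)$. This procedure may be repeated to give $(\omega^\omega)^\omega + \omega^\omega \cdot (\omega^\ell)$ for $\ell \in \omega$, and in the
limit, this yields $(\omega^\omega)^\omega + \omega^\omega \cdot (\omega^\omega)$, which may be written as $(\omega^\omega)^\omega + (\omega^\omega)^2$. When this procedure is
repeated, the result is $(\omega^\omega)^\omega + (\omega^\omega)^m$ for $m \in \omega$, and in the limit, this yields $(\omega^\omega)^\omega + (\omega^\omega)^\omega$, which is
clearly equal to $(\omega^\omega)^\omega \cdot 2$. This procedure may be repeated to produce $(\omega^\omega)^\omega \cdot k$ for $k \in \omega$, and in the
limit, this gives $(\omega^\omega)^\omega \cdot \omega$. When this procedure is repeated, one obtains $((\omega^\omega)^\omega \cdot \omega) \cdot \omega$, which equals
$(\omega^\omega)^\omega \cdot \omega^2$ by the fortuitous associativity which was noted earlier. This may be repeated to produce
$(\omega^\omega)^\omega \cdot \omega^\ell$ for $\ell \in \omega$, which yields $(\omega^\omega)^\omega \cdot \omega^\omega$ in the limit. If this procedure is repeated, one obtains
$((\omega^\omega)^\omega \cdot \omega^\omega) \cdot \omega^\omega$, which equals $(\omega^\omega)^\omega \cdot (\omega^\omega)^2$ by fortuitous associativity once again. This procedure may
be repeated to produce $(\omega^\omega)^\omega \cdot (\omega^\omega)^m$ for $m \in \omega$, and this yields $(\omega^\omega)^\omega \cdot (\omega^\omega)^\omega$ in the limit. Clearly this
may be written as $((\omega^\omega)^\omega)^2$. This procedure may be repeated to produce $((\omega^\omega)^\omega)^k$ for $k \in \omega$, which
yields $((\omega^\omega)^\omega)^\omega$ in the limit.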


(8) Having arrived at $((\omega^\omega)^\omega)^\omega$ in part (7), one naturally asks whether “exponentiation” by $\omega$ can be
simplified “fortuitously” by analogy with function-sets of the form $Y^X$, which denotes the set of functions
from $X$ to $Y$. (See Notation 10.2.17.) Such function-sets have the property that there is a canonical
bijection $h : (Z^X)^Y \to Z^{X \times Y}$. (Let $f \in (Z^X)^Y$. Then one may define $h(f) \in Z^{X \times Y}$ by $h(f)(x, y) =
f(y)(x)$ for all $(x, y) \in X \times Y$.) Therefore one may, if one wishes, identify the sets $(Z^X)^Y$ and $Z^{X \times Y}$
with each other. By analogy, one would hope to identify the ordinal number $(\omega^\omega)^\omega$ with a well-ordered
set of the form $\omega^{(\omega^2)}$ for example.


The first fly in the ointment here is the fact that an ordinal number of the form $\alpha^\omega$ is not the same as
the set of functions from $\omega$ to $\alpha$. The ordinal number $\alpha^\omega$ may be visualised as an infinite-dimensional
hypercube with $\alpha$ as the parameter-set along each axis, but the non-zero coordinates are restricted to
a finite subset of the set of axis-labels $\omega$. Each point in this infinite-dimensional hypercube represents
a “polynomial in $\omega$” (with coefficients in $\alpha$), similar to the “polynomials” mentioned in part (6). The
elements of the ordinal number $\alpha^\omega$ may be identified with the functions from $\omega$ to $\alpha$ which have “finite
support”. (This means functions whose value is zero outside a finite set.) By contrast, general functions
from $\omega$ to $\alpha$ have “unbounded support”. In other words, general functions from $\omega$ to $\alpha$ are like
“power series in $\omega$” with coefficients in $\alpha$. This difference underlies the fact that the function-set $2^\omega$ is
uncountable, whereas the ordinal number $\omega^\omega$ is countable. So it will clearly not be possible to construct
a bijection between the ordinal number set $\omega^\omega$ and the set of functions from $\omega$ to $\omega$.


The second fly in the ointment here is the fact that an ordinal number $\alpha^\omega$ is an ordered set. The
well-ordering on $\alpha^\omega$ is the lexicographic order determined by the highest component where two tuples
differ, just as polynomials are compared by their highest-degree terms first. Thus if $x, y \in \alpha^\omega$, then $x < y$
if and only if $\exists n \in \omega, ((\forall i \in \omega, (i > n \Rightarrow x_i = y_i)) \wedge (x_n \in y_n))$.
The bare set of functions from $\omega$ to $\alpha$ has no such order relation.


Despite the imperfect analogy between ordinal numbers of the form $\alpha^\omega$ and the corresponding sets of
functions from $\omega$ to $\alpha$, the analogy does fortuitously give the right answer. An ordinal number of the
form $(\alpha^\omega)^\omega$ may be visualised as an infinite-dimensional hypercube where the “coordinate” along each
axis is itself an infinite-dimensional hypercube. The lower-level hypercube represents “polynomials in $\omega$
with coefficients in $\alpha$”, whereas the higher-level hypercube represents “polynomials in $\omega$ with coefficients
in $\alpha^\omega$”. In other words, the coefficients for polynomials in the higher-level hypercube are themselves
polynomials in the lower-level hypercube. Any element of $(\alpha^\omega)^\omega$ may be uniquely identified by a function
$g : \omega \times \omega \to \alpha$ with “finite support”. If one defines $h : (\alpha^\omega)^\omega \to \alpha^{\omega \times \omega}$ and $\bar{h} : \alpha^{\omega \times \omega} \to (\alpha^\omega)^\omega$ by
$h(f)(x, y) = f(y)(x)$ and $\bar{h}(g)(y)(x) = g(x, y)$ for all $f \in (\alpha^\omega)^\omega$ and $g \in \alpha^{\omega \times \omega}$, and $x, y \in \omega$, then $h$
and $\bar{h}$ are inverses of each other, and they map functions with finite support to functions with finite
support. (In the case of $f \in (\alpha^\omega)^\omega$, the function $f(y)$ also has finite support for all $y \in \omega$.) Hence $h$
and $\bar{h}$ are bijections between $(\alpha^\omega)^\omega$ and $\alpha^{(\omega^2)}$ if the ordinal number $\alpha^{(\omega^2)}$ is defined to contain only
those functions from $\omega^2$ to $\alpha$ which have finite support.

One has therefore the identification $\omega^{(\omega^2)} = (\omega^\omega)^\omega$, where $\omega^{(\omega^2)}$ is the well-ordered set constructed as
the set of “binomials in $\omega$” with lexicographic order. Similarly, $\omega^{(\omega^3)} = \omega^{(\omega^2 \cdot \omega)} = (\omega^{(\omega^2)})^\omega = ((\omega^\omega)^\omega)^\omega$.
The elements of this set may be thought of as multinomials in three variables, each of which is an
element of $\omega$. In other words, they are “trinomials in $\omega$”. By repeating this procedure, one obtains
$\omega^{(\omega^k)}$ for $k \in \omega$. These are multinomials in $k$ variables which are elements of $\omega$. Hence one obtains
$\omega^{(\omega^\omega)} = \bigcup_{k \in \omega} \omega^{(\omega^k)}$, which may be thought of as the set of all multinomials in $\omega$. Let $\varpi_2 = \omega^{(\omega^\omega)}$. (For
comparison, $\varpi_1$ consists of all single-variable polynomials in $\omega$.)


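(9) The procedure from $\varpi_1 = \omega^\omega$ to $\varpi_2 = \omega^{(\omega^\omega)}$ may be repeated to produce $\varpi_3 = \omega^{(\omega^{(\omega^\omega)})}$. By induction,
one may produce $\varpi_\ell$ for $\ell \in \omega$ satisfying $\varpi_{\ell+1} = \omega^{\varpi_\ell}$ for all $\ell \in \omega$. Then one may define $\varepsilon_0 = \bigcup_{\ell \in \omega} \varpi_\ell$.
(For general epsilon-numbers, see Pinter [377], pages 202-205. For $\varepsilon_0$, see also Quine [382], page 212;
Halmos [357], page 77.)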
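
The addition and multiplication rules used in parts (3) to (6) can be experimented with on the “polynomials in $\omega$”, that is, on ordinal numbers below $\omega^\omega$. The following Python sketch is an illustration under stated assumptions: an ordinal is represented by a tuple of (exponent, coefficient) pairs with strictly decreasing natural-number exponents and positive coefficients, and the names `add`, `mul`, `ONE`, `TWO` and `OMEGA` are hypothetical names chosen for this example.

```python
# An ordinal below omega**omega is a tuple of (exponent, coefficient) pairs,
# with strictly decreasing exponents and positive coefficients.
ZERO = ()
ONE = ((0, 1),)
TWO = ((0, 2),)
OMEGA = ((1, 1),)

def add(a, b):
    """Ordinal sum a + b: terms of a below the leading term of b are absorbed."""
    if not b:
        return a
    lead_exp, lead_coeff = b[0]
    kept = tuple(t for t in a if t[0] > lead_exp)
    same = sum(c for (e, c) in a if e == lead_exp)
    return kept + ((lead_exp, lead_coeff + same),) + tuple(b[1:])

def mul(a, b):
    """Ordinal product a . b, using left distributivity over the terms of b."""
    if not a or not b:
        return ZERO
    a_exp, a_coeff = a[0]
    result = ZERO
    for (e, d) in b:
        if e > 0:
            term = ((a_exp + e, d),)                 # a . omega**e . d = omega**(a_exp + e) . d
        else:
            term = ((a_exp, a_coeff * d),) + a[1:]   # a . d for a finite multiplier d
        result = add(result, term)
    return result

print(add(ONE, OMEGA) == OMEGA, add(OMEGA, ONE) == OMEGA)
print(mul(TWO, OMEGA) == OMEGA, mul(OMEGA, TWO) == add(OMEGA, OMEGA))
```

The two printed lines reproduce the examples $1 + \omega = \omega \ne \omega + 1$ and $2 \cdot \omega = \omega \ne \omega \cdot 2$ in this representation. Extending the representation to $\varepsilon_0$ would require the exponents themselves to be ordinals of the same kind, which is not attempted here.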


12.6.6 REMARK: The ordinal number $\varepsilon_0$.

One somewhat surprising observation about the ordinal number $\varepsilon_0$ is that it is countable. In other words,
$\#(\varepsilon_0) = \#(V_\omega)$ although $\varepsilon_0 \subseteq V_{\varepsilon_0}$ and $\varepsilon_0 \not\subseteq V_\alpha$ for all ordinal numbers $\alpha < \varepsilon_0$. Therefore when it is claimed that
the set of real numbers can be well-ordered, it is being claimed that the real numbers can be put into a
one-to-one association with all of the ordinal numbers up to $\varepsilon_0$ and well beyond that point in Table 12.6.2.


12.6.7 REMARK: The benefits of ordinal numbers. 

The saddest fact about the limits, the limits of limits, and so on, in the kind of procedure illustrated in
Table 12.6.2 is that this tedious and onerous process is insufficient to escape from the countable ordinal
numbers. To find an ordinal number which is supposedly order-equivalent to an uncountable set such as the
real numbers, it is necessary to make a leap out of this process by applying completely different kinds of
constructions. (See Section 13.3.) Very little is known about the smallest uncountable ordinal number $\omega_1$
except that it exists and contains all of the countable ordinal numbers. The smallest ordinal number which
has cardinality greater than $\omega_1$ is even more nebulous than that. So this very tedious and onerous system of
ordinal numbers yields no benefits for measuring cardinality beyond the countably infinite ordinal number $\omega$.


The whole idea of ordinal numbers seems to be a waste of intellectual effort apart from building a cardinality
yardstick for finite and countably infinite sets. The extended finite ordinal number construction presented
in Section 12.1 does have real benefits, but for measuring cardinality beyond this point, the von Neumann
universe rank in Definition 13.4.3 requires little mental effort and fits the requirement precisely. Although
the family of von Neumann universes uses ordinal numbers for its parameter, in practice this parameter is
almost never useful beyond $\omega + \omega$.


12.6.8 REMARK: Literature for constructible universes. 

For discussion of ZF constructible universe models, see Bernays [341], page 34; Cohen [349], pages 99-106; 
Shoenfield [390], pages 270-281; Smullyan/Fitting [392], pages 155-203; Chang/Keisler [347], pages 558-578; 
Jech [364], pages 38-39; Roitman [385], pages 138-145; Pinter [377], pages 222-234; Lévy [368], pages 289-291; 
Moore [371], pages 280-283, 305. There are many variations of the idea of a ZF constructible universe, but
they are all essentially equivalent to the first presentation by Gödel [356] in 1931. (See also the modernised
English-language presentation of Gödel's paper by Nagel/Newman [373].)
13.1. Equinumerosity


13.1.1 REMARK: Defining cardinality in terms of equinumerosity. 

Cardinal numbers are difficult to define, but the relation of equal cardinality, or “equinumerosity”, is easy
to define in a strictly logical manner. However, there is ambiguity in the word “exists” in Definition 13.1.2,
which is the root cause of very deep and frustrating issues regarding the comparability of sets. 


13.1.2 DEFINITION: Sets $X$ and $Y$ are equinumerous or equipotent if there exists a bijection $f : X \to Y$.


13.1.3 REMARK: There is no explicit logical predicate which counts the elements in a set. 


Two sets $X$ and $Y$ are defined (in terms of the unique existence quantifier “$\exists'$”) to be equinumerous if and
only if

$\exists f \in \mathbb{P}(X \times Y),\ (\forall x \in X, \exists' y \in Y, (x, y) \in f) \wedge (\forall y \in Y, \exists' x \in X, (x, y) \in f)$.


This is a well-defined logical expression for any sets $X$ and $Y$ because the Cartesian set product $X \times Y$ is
a well-defined set. This is important to know because the relation of equinumerosity is defined for all sets,
and the set of sets is not a set! However, there is evidently no such logical expression for the numerosity
or cardinality of a set in pure ZF set theory. That is, there is no logical expression of the form $N(X)$ such
that $N(X)$ is a set for all sets $X$, and $N(X) = N(Y)$ if and only if $X$ and $Y$ are equinumerous. (However,
Theorem 13.2.5 asserts that every well-ordered set is order isomorphic to a unique ordinal number in pure
ZF set theory, and in ZF+AC set theory, every set has a well-ordering by Theorem 11.6.23. So the desired
predicate $N(X)$ does exist in ZF+AC set theory.)


It is possible to construct logical expressions to test for each cardinality. For example, $\forall x, x \notin X$ is an
effective test for zero cardinality. And the logical expression $(\exists x, x \in X) \wedge (\forall x_1, x_2 \in X, x_1 = x_2)$ is
an effective test for cardinality equal to 1. For small finite cardinalities, one may use logical multiplicity
quantifiers as in Remark 6.8.8, but these become rapidly more impractical and tedious. (See for example
Remark 6.8.11.) More generally, one may test sets for cardinality equal to any ordinal number (or any
other set) by applying the above logical expression for equinumerosity of sets $X$ and $Y$ with $Y$ equal to the
comparison set. This is quite simple and convenient, but it requires the provision of a class of comparison
sets, one comparison set for each cardinality. And that is where some of the deeper issues in the subject
of cardinality begin to arise. Even the smaller ordinal numbers become rapidly tedious to write out in full.
(For examples, see Remark 12.2.1.)
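
For small finite sets, the logical expression for equinumerosity can be evaluated literally, by searching through all subsets $f$ of $X \times Y$ and testing the two unique-existence clauses. The following Python sketch is purely illustrative (the search is exponential in the size of $X \times Y$), and the function names `is_bijection_graph` and `equinumerous` are hypothetical.

```python
from itertools import chain, combinations, product

def is_bijection_graph(f, X, Y):
    """Test whether the set of pairs f satisfies both unique-existence clauses."""
    f = set(f)
    if not all(x in X and y in Y for (x, y) in f):        # f is a subset of X x Y
        return False
    unique_y = all(sum((x, y) in f for y in Y) == 1 for x in X)
    unique_x = all(sum((x, y) in f for x in X) == 1 for y in Y)
    return unique_y and unique_x

def equinumerous(X, Y):
    """Search all subsets f of X x Y for one which is the graph of a bijection."""
    pairs = list(product(X, Y))
    candidates = chain.from_iterable(
        combinations(pairs, r) for r in range(len(pairs) + 1)
    )
    return any(is_bijection_graph(f, X, Y) for f in candidates)

print(equinumerous({1, 2, 3}, {"a", "b", "c"}))
print(equinumerous({1, 2}, {"a", "b", "c"}))
```

Note that the search never counts the elements of either set. It only tests for the existence of a suitable set of ordered pairs, which mirrors the absence of a counting predicate $N(X)$ in the definition.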


13.1.4 REMARK: Pseudo-notation for equinumerosity. 

Despite the lack of a function in ZF set theory from the universe of sets to the universe of sets whose value 
is equal for two sets if and only if the sets are equinumerous, Notation 13.1.5 is a very convenient pseudo- 
notation for the proposition that two sets are equinumerous. The symbol “#” does not denote a function 
in ZF set theory, although for finite sets and countably infinite sets, it can be defined to be a function valued 
in $\omega^+$ by means of Definitions 13.5.2 and 13.7.6. Otherwise, the pseudo-notation “#(X) = #(Y)” denotes a
well-defined ZF set-theoretic expression according to Notation 13.1.5, but the two apparent sub-expressions 
“#(X)” and “#(Y)” have no meaning. 


13.1.5 NOTATION: #(X) = #(Y), for sets X and Y, means that X and Y are equinumerous. 


13.1.6 REMARK: The Schröder-Bernstein theorem for proving equinumerosity of sets. 

The Schröder-Bernstein theorem is a powerful tool for proving that two sets are equinumerous. (It is called 
Bernshtein's theorem by EDM2 [113], 49.B, page 199. According to Wilder [403], page 109, it is sometimes 
called the Bernstein theorem or the Cantor-Bernstein theorem. It is in fact called the Cantor-Bernstein 
theorem by Lévy [368], pages 85-86; Willard [165], page 7; Kolmogorov/Fomin [104], page 17. It is called the 
Cantor-Bernstein-Dedekind theorem by Graves [85], page 34.) 


The idea of the proof given here for Theorem 13.1.8 is to construct sets $X_0 = g(Y \setminus f(X))$, $X_1 = g(f(X_0))$, 
$X_2 = g(f(X_1))$, and so forth. This requires induction, but unlike many tools for determining cardinality, 
this theorem does not require the axiom of choice. (The axiom of countable choice and the principle of 
induction are easily confused. The difference is that the former requires an infinite number of choices to be 
made, whereas the latter requires only a finite number of choices. This is slightly paradoxical because one 
would expect a proof to be easier when there are infinitely many ways to get existence. Instead, one requires a 
mysterious, mystical axiom of choice when there are infinitely many options for doing it.) 


The proof of Theorem 13.1.8 has the very great advantage that it gives an explicit construction for the 
claimed bijection. Some required properties of this construction are presented first as Theorem 13.1.7. 

For alternative proofs of the Schröder-Bernstein theorem, see Halmos [357], pages 88-89; Simmons [137], 
pages 29-30; S.J. Taylor [147], pages 6-7; Kleene [365], pages 11-12. However, these proofs are essentially 
identical to the proof given here. 


13.1.7 THEOREM: The pairwise disjoint property for the Schröder-Bernstein set-sequence construction. 
Let $X$ and $Y$ be sets. Let $f : X \to Y$ and $g : Y \to X$ be injective functions. Define the sequence 
$(X_i)_{i=0}^\infty \in \mathbb{P}(X)^\omega$ of subsets of $X$ inductively by $X_0 = g(Y \setminus f(X))$ and $X_{i+1} = g(f(X_i))$ for $i \in \omega$. Then 
$X_i \cap X_j = \emptyset$ for all $i, j \in \omega$ with $i \ne j$. (That is, the sets $X_i$ are pairwise disjoint for $i \in \omega$.) 


PROOF: Let $X$ and $Y$ be sets. Let $f : X \to Y$ and $g : Y \to X$ be injective functions. Inductively define the 
sequences $(A_i)_{i=0}^\infty \in \mathbb{P}(X)^\omega$ and $(B_i)_{i=0}^\infty \in \mathbb{P}(X)^\omega$ by $A_0 = X$, $B_0 = g(Y)$, $A_{i+1} = g(f(A_i))$ and $B_{i+1} = g(f(B_i))$ 
for all $i \in \omega$. Then $A_0 \supseteq B_0$, and $A_i \supseteq B_i$ implies $A_{i+1} \supseteq B_{i+1}$ for all $i \in \omega$ by Theorem 10.6.7 (ii). Therefore 
by induction, $A_i \supseteq B_i$ for all $i \in \omega$. Similarly, $B_0 \supseteq A_1$, and $B_i \supseteq A_{i+1}$ implies $B_{i+1} \supseteq A_{i+2}$ for all $i \in \omega$ by 
Theorem 10.6.7 (ii). Therefore by induction, $B_i \supseteq A_{i+1}$ for all $i \in \omega$. 

Inductively define $(X_i)_{i=0}^\infty \in \mathbb{P}(X)^\omega$ by $X_0 = g(Y \setminus f(X))$ and $X_{i+1} = g(f(X_i))$ for $i \in \omega$. Then $X_0 = B_0 \setminus A_1$ 
because $g(Y \setminus f(X)) = g(Y) \setminus g(f(X))$ due to the injectivity of $g$, by Theorem 10.6.7 (v). Also 
by Theorem 10.6.7 (v), $X_i = B_i \setminus A_{i+1}$ implies $X_{i+1} = B_{i+1} \setminus A_{i+2}$ for all $i \in \omega$. Therefore by induction, 
$X_i = B_i \setminus A_{i+1}$ for all $i \in \omega$. (This is illustrated in Figure 13.1.1.) 

Let $i, j \in \omega$ with $i \ne j$. Suppose that $i > j$. Then $X_i = B_i \setminus A_{i+1}$ and $X_j = B_j \setminus A_{j+1}$. So $X_i \subseteq B_i$. 
But $B_i \subseteq B_{j+1}$ because the sequence $(B_i)_{i=0}^\infty$ is non-increasing. So $X_i \subseteq B_{j+1}$. But $X_j \cap A_{j+1} = \emptyset$ 
and $A_{j+1} \supseteq B_{j+1}$. So $X_j \cap B_{j+1} = \emptyset$. Therefore $X_i \cap X_j = \emptyset$. The same result holds if $i < j$. 

Figure 13.1.1 Set-sequence construction for the Schröder-Bernstein theorem 


13.1.8 THEOREM: The Schröder-Bernstein theorem. 
Let $X$ and $Y$ be sets. Let $f : X \to Y$ and $g : Y \to X$ be injective functions. Then there exists a 
bijection $h : X \to Y$. 


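PROOF: Let $X$ and $Y$ be sets. Let $f : X \to Y$ and $g : Y \to X$ be injective functions. Inductively define 
$(X_i)_{i=0}^\infty \in \mathbb{P}(X)^\omega$ by $X_0 = g(Y \setminus f(X))$ and $X_{i+1} = g(f(X_i))$ for $i \in \omega$. Then $X_i \cap X_j = \emptyset$ for all $i, j \in \omega$ 
with $i \ne j$, by Theorem 13.1.7. Define $h : X \to Y$ by 

$$\forall x \in X,\quad h(x) = \begin{cases} f(x) & \text{if } x \in X \setminus \bigcup_{i=0}^\infty X_i \\ g^{-1}(x) & \text{if } x \in \bigcup_{i=0}^\infty X_i. \end{cases}$$

Then $h(x)$ is well defined for all $x \in X$ because $g$ is injective and $X_i \subseteq g(Y)$ for all $i \in \omega$, and so $g^{-1}$ is well 
defined on $\bigcup_{i=0}^\infty X_i$. 

To show that $h$ is injective, let $x_1, x_2 \in X$ with $h(x_1) = h(x_2)$. Suppose that $x_1 \ne x_2$. Since $f$ is injective 
on $X \setminus \bigcup_{i=0}^\infty X_i$ and $g^{-1}$ is injective on $\bigcup_{i=0}^\infty X_i$, one of the points $x_1$ and $x_2$ must be in each of these 
subsets of the domain. Suppose (without loss of generality) that $x_1 \in X \setminus \bigcup_{i=0}^\infty X_i$ and $x_2 \in \bigcup_{i=0}^\infty X_i$. 
Then $f(x_1) = g^{-1}(x_2)$. So $g(f(x_1)) = x_2$. So $x_2 \in g(f(X)) = A_1$. So $x_2 \notin X_0$. But $x_2 \in X_k$ for 
some $k \in \omega$, and $g \circ f$ is a bijection from $X$ to $A_1 = g(f(X))$ with the property that $g(f(X_i)) = X_{i+1}$ for 
all $i \in \omega$. Therefore $x_1 = (f^{-1} \circ g^{-1})(x_2) \in X_{k-1}$ because $x_2 \notin X_0$. This contradicts the assumption that 
$x_1 \in X \setminus \bigcup_{i=0}^\infty X_i$. Hence $x_1 = x_2$ and so $h$ is injective. 

To show that $h : X \to Y$ is surjective, note that $h(X) = U \cup V$, where $U = f(X \setminus \bigcup_{i=0}^\infty X_i)$ and 
$V = g^{-1}(\bigcup_{i=0}^\infty X_i)$. Since $f$ is injective on $X$, and $g^{-1}$ is injective on $g(Y)$, it follows that $U = f(X) \setminus \bigcup_{i=0}^\infty f(X_i)$ 
and $V = \bigcup_{i=0}^\infty g^{-1}(X_i)$. But $g^{-1}(X_i) = f(X_{i-1})$ for $i \ge 1$. So $V = (\bigcup_{i=1}^\infty f(X_{i-1})) \cup (Y \setminus f(X))$ because 
$g^{-1}(X_0) = Y \setminus f(X)$. Therefore $U \cup V = (f(X) \setminus \bigcup_{i=0}^\infty f(X_i)) \cup (\bigcup_{i=0}^\infty f(X_i)) \cup (Y \setminus f(X)) = Y$. So $h$ is 
surjective. Hence $h$ is a bijection. 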
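
The explicit construction of the bijection can be illustrated by a small computation. The following is a minimal sketch under stated assumptions, not part of the proof: the sets are taken to be X = Y = the non-negative integers with f(n) = g(n) = n + 1, and the names `f_inv`, `g_inv`, `in_union_of_X` and `h` are hypothetical. Membership of x in the union of the sets X_i is tested by tracing x backwards through g and f, which terminates for this particular example because each backward step decreases x.

```python
from typing import Optional


# Illustrative assumption: X = Y = non-negative integers, f(n) = g(n) = n + 1.
def f(x: int) -> int:
    return x + 1


def g(y: int) -> int:
    return y + 1


def f_inv(y: int) -> Optional[int]:
    """Return the unique x with f(x) = y, or None if y is not in f(X)."""
    return y - 1 if y >= 1 else None


def g_inv(x: int) -> Optional[int]:
    """Return the unique y with g(y) = x, or None if x is not in g(Y)."""
    return x - 1 if x >= 1 else None


def in_union_of_X(x: int) -> bool:
    """Test whether x lies in some X_i by tracing x backwards through g and f."""
    while True:
        y = g_inv(x)
        if y is None:
            return False  # x is not in g(Y), so x lies in no X_i
        x_prev = f_inv(y)
        if x_prev is None:
            return True  # y is in Y \ f(X), so the chain started in X_0
        x = x_prev  # x = g(f(x_prev)); repeat one level down


def h(x: int) -> int:
    """The map h of the proof: g^{-1} on the union of the X_i, f elsewhere."""
    if in_union_of_X(x):
        return g_inv(x)
    return f(x)
```

For this choice of f and g, the union of the sets X_i is the set of odd numbers, and h swaps each even number with its successor.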


13.1.9 DEFINITION: A set X dominates a set Y if there exists an injection from Y to X. 
A set X strictly dominates a set Y if X dominates Y but X and Y are not equinumerous. 


13.1.10 REMARK: Pseudo-notation for numerosity domination. 

Notation 13.1.11 is a pseudo-notation for numerosity domination in the same way that Notation 13.1.5 is a 
pseudo-notation for equinumerosity. (See Remark 13.1.4.) The expressions in Notation 13.1.11 have meanings 
as well-defined set-theoretic formulas given by Definition 13.1.9, but the two apparent sub-expressions 
“#(X)” and “#(Y)” have no meaning in ZF set theory, except in the case of finite sets and countably infinite 
sets, for which “#” can be defined as a function valued in $\omega^+$ by means of Definitions 13.5.2 and 13.7.6. 


13.1.11 NOTATION: 
$\#(X) \ge \#(Y)$, for sets $X$ and $Y$, means that $X$ dominates $Y$. 
$\#(X) \le \#(Y)$, for sets $X$ and $Y$, means that $Y$ dominates $X$. 
$\#(X) > \#(Y)$, for sets $X$ and $Y$, means that $X$ strictly dominates $Y$. 
$\#(X) < \#(Y)$, for sets $X$ and $Y$, means that $Y$ strictly dominates $X$. 


13.1.12 THEOREM: Set-inclusion implies domination. 
Let $X$ and $Y$ be sets such that $X \subseteq Y$. Then $\#(X) \le \#(Y)$. 


PROOF: Let $X$ and $Y$ be sets such that $X \subseteq Y$. Then the identity function $\mathrm{id}_X$ is an injection from $X$ 
to $Y$. (See Definition 10.2.27 and Notation 10.2.29 for the identity function.) Hence $\#(X) \le \#(Y)$ by 
Definition 13.1.9 and Notation 13.1.11. 


13.1.13 THEOREM: Mutual domination implies equinumerosity. 
Let $X$ and $Y$ be sets such that $\#(X) \le \#(Y)$ and $\#(X) \ge \#(Y)$. Then $\#(X) = \#(Y)$. 


PROOF: Let $X$ and $Y$ be sets with $\#(X) \le \#(Y)$ and $\#(X) \ge \#(Y)$. Then by Definition 13.1.9 and 
Notation 13.1.11, there exist injections $f : X \to Y$ and $g : Y \to X$. So there exists a bijection $h : X \to Y$ by 
Theorem 13.1.8. Hence $\#(X) = \#(Y)$ by Definition 13.1.2 and Notation 13.1.5. 


13.1.14 REMARK: Set-cardinality domination and the non-existence of surjections. 

Theorem 13.1.15 (ii) is applicable to the proof of Cantor's diagonalisation procedure in Theorem 13.1.27. 
The converse of Theorem 13.1.15 part (i) is easily proved without the axiom of choice as in part (iii). By 
contrast, the converse of Theorem 13.1.15 (ii) can be proved from the AC-tainted Theorem 10.5.17 (i), but 
only if the axiom of choice is accepted. Therefore no such converse is given here. 


13.1.15 THEOREM: Some relations between injections, surjections and strict domination. 
Let $X$ and $Y$ be sets with $\#(X) \le \#(Y)$. 


(i) If there are no injections from $Y$ to $X$, then $\#(X) < \#(Y)$. 
(ii) If there are no surjections from $X$ to $Y$, then $\#(X) < \#(Y)$. 
(iii) If $\#(X) < \#(Y)$, then there are no injections from $Y$ to $X$. 
(iv) If $\#(X) < \#(Y)$ and $X$ is well-orderable, then there are no surjections from $X$ to $Y$. 


PROOF: For part (i), let $X$ and $Y$ be sets with $\#(X) \le \#(Y)$, such that there are no injections from $Y$ to 
$X$. Suppose that $X$ and $Y$ are equinumerous. Then by Definition 13.1.2, there exists a bijection $f : Y \to X$, 
which is an injection by Definition 10.5.2. This contradicts the assumption. Therefore $X$ and $Y$ are not 
equinumerous. Hence $\#(X) < \#(Y)$ by Definition 13.1.9 and Notation 13.1.11. 


For part (ii), let $X$ and $Y$ be sets with $\#(X) \le \#(Y)$, such that there are no surjections from $X$ to $Y$. 
Suppose that $X$ and $Y$ are equinumerous. Then by Definition 13.1.2, there exists a bijection $f : X \to Y$, 
which is a surjection by Definition 10.5.2. This contradicts the assumption. Therefore $X$ and $Y$ are not 
equinumerous. Hence $\#(X) < \#(Y)$ by Definition 13.1.9 and Notation 13.1.11. 


For part (iii), let $X$ and $Y$ be sets with $\#(X) < \#(Y)$. Suppose that there is an injection from $Y$ to $X$. 
By Notation 13.1.11 and Definition 13.1.9, there is an injection from $X$ to $Y$. So by Theorem 13.1.8, there 
is a bijection from $Y$ to $X$. So by Definition 13.1.2, $X$ and $Y$ are equinumerous. By Definition 13.1.9, this 
contradicts the assumption that $\#(X) < \#(Y)$. Hence there are no injections from $Y$ to $X$. 


For part (iv), let $X$ and $Y$ be sets with $\#(X) < \#(Y)$. Let $R$ be a well-ordering of $X$. (For well-orderings, 
see Definition 11.6.1.) Let $f$ be a surjection from $X$ to $Y$. Define $g : Y \to X$ by 

$$\forall y \in Y,\quad g(y) = \min\nolimits_R(f^{-1}(\{y\})),$$

where $\min_R : \mathbb{P}(X) \setminus \{\emptyset\} \to X$ maps non-empty subsets of $X$ to their unique minima with respect to $R$. 
Then $g$ is a well-defined function from $Y$ to $X$ because $f^{-1}(\{y\})$ is non-empty for all $y \in Y$ by the surjectivity 
of $f$, and $g$ is injective because $f$ is a well-defined function from $X$ to $Y$. Since there is also an injection from 
$X$ to $Y$, by Theorem 13.1.8 there is a bijection from $Y$ to $X$. So by Definition 13.1.2, $X$ and $Y$ are 
equinumerous. By Definition 13.1.9, this contradicts the assumption that $\#(X) < \#(Y)$. Hence there are 
no surjections from $X$ to $Y$. 
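
The choice-free construction of an injection from a surjection in part (iv) can be sketched for a finite example. This is only an illustration under stated assumptions: the well-ordering R is represented by the order of a Python list, and the name `injection_from_surjection` and the example data are hypothetical.

```python
def injection_from_surjection(X_ordered, Y, f):
    """Return g with g[y] = least element (in list order) of the fibre of f over y."""
    # rank[x] is the position of x in the assumed well-ordering of X.
    rank = {x: i for i, x in enumerate(X_ordered)}
    g = {}
    for y in Y:
        fibre = [x for x in X_ordered if f(x) == y]
        if not fibre:
            raise ValueError("f is not a surjection onto Y")
        g[y] = min(fibre, key=rank.__getitem__)
    return g


# Example: X = {0,...,5} well-ordered in reverse, f(x) = x mod 3, Y = {0, 1, 2}.
X_ordered = [5, 4, 3, 2, 1, 0]
Y = [0, 1, 2]
g = injection_from_surjection(X_ordered, Y, lambda x: x % 3)
```

No arbitrary choices are made here: the well-ordering determines each value of g, which is why this step needs no axiom of choice.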


13.1.16 REMARK: Numerosity domination is a set-theoretic relation, but is ZF model-dependent. 
The comments in Remark 13.1.3 about logical expressions for equinumerosity are equally applicable to 
Definition 13.1.9. There is a purely logical expression for the relation of domination between sets. A set $Y$ 
dominates a set $X$ if and only if there exists an injection $f : X \to Y$, and this is true if and only if 

$$\exists f \in \mathbb{P}(X \times Y),\ \big( (\forall x \in X,\ \forall y_1, y_2 \in Y,\ ((x, y_1), (x, y_2) \in f \Rightarrow y_1 = y_2))$$
$$\wedge\ (\forall x \in X,\ \exists y \in Y,\ (x, y) \in f)\ \wedge\ (\forall x_1, x_2 \in X,\ \forall y \in Y,\ ((x_1, y), (x_2, y) \in f \Rightarrow x_1 = x_2)) \big).$$

This is a relation between all sets. Therefore it is not a relation in the sense of Definition 9.5.2. 

Let $B(X,Y) = \{f \in \mathbb{P}(X \times Y);\ f \text{ is a bijection}\}$ and $I(X,Y) = \{f \in \mathbb{P}(X \times Y);\ f \text{ is an injection}\}$ for all 
sets $X$ and $Y$. Then $X$ and $Y$ are equinumerous if and only if $B(X,Y) \ne \emptyset$, and $Y$ dominates $X$ if and 
only if $I(X,Y) \ne \emptyset$. So $Y$ strictly dominates $X$ if and only if $B(X,Y) = \emptyset$ and $I(X,Y) \ne \emptyset$. However, 
the existence of a purely logical expression for a set property does not necessarily imply that the property 
can be determined within ZF set theory. The axiom of choice is invoked precisely because the emptiness or 
otherwise of sets is sometimes impossible to determine within ZF. In the case of cardinality, ZF set theory 
has acquired some "optional extra" axioms for resolving equinumerosity and domination questions. However, 
as in the case of the choice axiom, these optional numerosity axioms are a matter of personal faith. They 
do not construct anything new. They merely state, without any justification, that certain sets are empty or 
non-empty. (It takes courage to admit that one does not know something.) 
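
For small finite sets, the emptiness of the sets of bijections and injections from X to Y can be decided by exhaustive search, which is exactly what cannot be done in general. The following sketch assumes finite sets only; the names `functions`, `injections`, `bijections` and `strictly_dominates` are hypothetical, and functions are represented as Python dictionaries.

```python
from itertools import product


def functions(X, Y):
    """Yield every function from X to Y as a dict (finite sets only)."""
    X, Y = list(X), list(Y)
    for values in product(Y, repeat=len(X)):
        yield dict(zip(X, values))


def injections(X, Y):
    """All injections from X to Y: no value is taken twice."""
    return [f for f in functions(X, Y) if len(set(f.values())) == len(f)]


def bijections(X, Y):
    """All bijections from X to Y: injections whose range is all of Y."""
    return [f for f in injections(X, Y) if set(f.values()) == set(Y)]


def strictly_dominates(Y, X):
    """Y strictly dominates X: some injection X -> Y exists, but no bijection."""
    return bool(injections(X, Y)) and not bijections(X, Y)
```

The search space has size |Y|^|X|, so even this finite test is practical only for very small sets.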


13.1.17 REMARK: Equinumerosity is a kind of equivalence relation. 

The assertion in Theorem 13.1.8 resembles the antisymmetry condition for a partial order in Definition 11.1.2. 
The theorem states that if X is dominated by Y and Y is dominated by X, then X and Y are equinumerous. 
However, there are two important differences. Firstly, Definition 11.1.2 requires a partial order to be a 
subset of a Cartesian set product, which is not true in the case of a relation on the class of all sets. Secondly, 
Theorem 13.1.8 does not assert that X and Y are equal. It states only that they are equinumerous. 


The equinumerosity relation amongst sets apparently satisfies the conditions for an equivalence relation in 
Definition 9.8.2. Since equinumerosity is defined on all sets, it is not a relation in the sense of Definition 9.5.2. 
However, equinumerosity of sets is a genuine equivalence relation on the class of all sets. 


If one wishes, one may think of the class of all sets as being partitioned into equivalence classes which 
have the same numerosity. These classes are not sets (except for the equivalence class of sets which are 
equinumerous to the empty set). So it is dangerous to think of them as sets. However, it is perfectly safe to 
think of equinumerosity as an equivalence relation. One could perhaps refer to general set-equinumerosity 
as a “set-theoretic equivalence relation formula". 


13.1.18 REMARK: Cardinality domination is a kind of order relation. 

Since one cannot refer to the equivalence classes of the equinumerosity relation as sets, one cannot define 
the “is-dominated-by” relation on such equivalence classes. However, one may observe that this relation 
satisfies the three conditions for a partial order in Definition 11.1.2 if one accepts “equinumerosity” as a 
substitute for “equality” in the antisymmetry condition. Those who accept the axiom of choice can say that 
the “is-dominated-by” relation is then a total order. 


Cardinality domination may be thought of as a “set-theoretic order relation”. Theorem 13.1.19 would then 
be asserting the transitivity property for that order relation. 


13.1.19 THEOREM: Transitivity of the cardinality domination relation. 
Let $X$, $Y$ and $Z$ be sets with $\#(X) \le \#(Y)$ and $\#(Y) \le \#(Z)$. Then $\#(X) \le \#(Z)$. 


PROOF: The assertion follows from Theorem 10.5.6 (i). 


13.1.20 REMARK: The trichotomy comparability theorem for cardinality of sets. 

The “trichotomy” theorem in ZF+AC set theory states that all pairs of sets are comparable. In other words, 
for any pair of sets, either one dominates the other or else they are equinumerous. In pure ZF set theory, 
this theorem is not provable. (See for example Smullyan/Fitting [392], pages 41, 113; Wilder [403], pages 
132, 135-138; Moore [371], pages 46-52; E. Mendelson [370], page 198; Lévy [368], page 162; Halmos [357], 
page 89.) 


13.1.21 REMARK: Which ordinal number is $\omega_1$? 

$\omega_1$ is the standard notation for the least uncountable ordinal number. One might be tempted to guess that it 
equals $\omega^\omega$. After all, the ordinal numbers $\omega^k$ for $k \in \omega$ are all clearly countable, and $\omega^\omega$ looks like it should 
be at least as big as $2^\omega$, which is well known to be uncountable. However, $\omega^\omega$ is equivalent to the set of all 
finite sequences of non-negative integers, not the set of all infinite sequences of non-negative integers. Hence 
$\omega^\omega$ is countable. The same observation applies to the powers $\omega^{(\omega^\omega)}$, $\omega^{(\omega^{(\omega^\omega)})}$, and so forth. According to 
Quine [382], page 212, the ordinal number $\omega_1$ lies “far beyond even $\epsilon$”. (For a proof that $\omega_1$ is unreachable by 
limit operations, see for example Willard [165], page 11.) One may conclude from this that a startlingly high 
percentage of the infinite ordinal numbers are useless as “cardinality yardsticks” since only ordinal numbers 
which have different cardinality are required as yardsticks. 

The confusing fact that the set $2^\omega$ is uncountable while the set $\omega^\omega$ is countable may be clarified to some 
extent by using the more strictly correct notation $\mathbb{P}(\omega)$ instead of $2^\omega$. Then if one writes $\bigcup_{\alpha \in \omega} \omega^\alpha$ instead 
of $\omega^\omega$, it is no longer so surprising that $\mathbb{P}(\omega)$ is uncountable while $\bigcup_{\alpha \in \omega} \omega^\alpha$ is countable. 
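
The countability of the set of all finite sequences of non-negative integers can be made concrete by an explicit injection into the positive integers. The following sketch uses the classical prime-power coding, which is a swapped-in illustration and not a construction taken from this book; the names `primes`, `encode` and `decode` are hypothetical.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short sequences)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def encode(seq):
    """Map a finite sequence (a_0, ..., a_{n-1}) to the product of p_i ** (a_i + 1)."""
    code = 1
    for p, a in zip(primes(), seq):
        code *= p ** (a + 1)  # the +1 records the length, including trailing zeros
    return code


def decode(code):
    """Inverse of encode on its range, by unique prime factorisation."""
    seq = []
    for p in primes():
        if code == 1:
            return tuple(seq)
        exponent = 0
        while code % p == 0:
            code //= p
            exponent += 1
        seq.append(exponent - 1)
```

No such coding is possible for infinite sequences, which is consistent with the uncountability of the power set of the non-negative integers.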


13.1.22 REMARK: The parting of the ways between ordinal numbers and cardinality. 

There is a close association between finite ordinal numbers and the cardinality of finite sets. The finite ordinal 
numbers provide an excellent yardstick for the cardinality of finite sets, and the set $\omega$ is the fundamental 
yardstick for countably infinite sets. However, it is clear that this close relationship suffers a severe “parting 
of the ways” beyond $\omega$. As noted in Remark 13.1.21, finding the second infinite cardinal number amongst 
the infinite ordinals is highly problematic. Not only is there a breathtaking “waste of space” between $\omega_0 = \omega$ 
and $\omega_1$. The identity of $\omega_1$ cannot be even approximately determined. It is quite astonishing, or at least 
surprising, that so much attention is paid in the literature to the use of ordinal numbers for measuring 
cardinality. The approach outlined in Remark 13.4.2 is far preferable. 


The cardinality of ordinal numbers far beyond $\omega$ has many other problems apart from sparsity and uselessness. 
The study of this subject depends very much on the axiom of choice to “solve” problems, and even so, the 
subject is plagued by exotic classes of cardinals which fortunately are outside the scope of this book. The 
ordinal numbers in full generality may be safely ignored. 


13.1.23 REMARK: Cardinality of well-orderable sets. 

If a set can be well-ordered, there exists an ordinal number which is equinumerous to it. Since the ordinal 
numbers are well ordered, there is a least ordinal number which is equinumerous to it. This is defined to be 
the cardinality of the set in Definition 13.1.24. 


13.1.24 DEFINITION: 
The cardinality of a well-orderable set X is the least ordinal number which is equinumerous to X. 


13.1.25 NOTATION: card(X), for a well-orderable set X, denotes the cardinality of X. 


13.1.26 REMARK: The Cantor diagonalisation procedure. Every set is smaller than its power set. 
Theorem 13.1.27 shows that every set is smaller than its power set. The proof strategy, known as the Cantor 
diagonalisation procedure, was published in 1891 by Cantor [408]. (See also Cantor [344], pages 278-281; 
Zermelo [444], page 276; Cohen [349], page 67; Suppes [395], pages 97-98; Stoll [393], page 86; Roitman [385], 
page 105; Halmos [357], page 93; E. Mendelson [370], page 183; Lévy [368], page 87; Smullyan/Fitting [392], 
page 8; Bernays [341], pages 117-118; Quine [382], pages 201-202.) 


13.1.27 THEOREM: The Cantor diagonalisation procedure. 
Let $X$ be a set. 


(i) Let $f : X \to \mathbb{P}(X)$ be a function. Then $f$ is not a surjection. 
(ii) The function $g : X \to \mathbb{P}(X)$ defined by $i \mapsto \{i\}$ is a well-defined injection. 
(iii) $\#(X) \le \#(\mathbb{P}(X))$. 
(iv) $\#(X) < \#(\mathbb{P}(X))$. 
(v) $\#(\omega) < \#(\mathbb{P}(\omega))$. 


PROOF: For part (i), let $X$ be a set, and let $f : X \to \mathbb{P}(X)$ be a function. Let $y = \{i \in X;\ i \notin f(i)\}$. Then 
$y \in \mathbb{P}(X)$. Let $j \in X$. If $j \in f(j)$, then $j \notin y$, and so $f(j) \ne y$. If $j \notin f(j)$, then $j \in y$, and so $f(j) \ne y$. 
Therefore $f(j) \ne y$ by Theorem 4.6.4 (iii). So $y \in \mathbb{P}(X) \setminus \mathrm{Range}(f)$, and so $\mathrm{Range}(f) \ne \mathbb{P}(X)$. Hence $f$ is 
not a surjection. 

For part (ii), $\{i\} \in \mathbb{P}(X)$ for all $i \in X$. So $g : X \to \mathbb{P}(X)$ is a well-defined function. Let $i, j \in X$ satisfy 
$g(i) = g(j)$. Then $\{i\} = \{j\}$. So $i = j$ by Notation 7.5.7 and Theorem 7.5.2. Hence $g$ is an injection by 
Definition 10.5.2. 

Part (iii) follows from part (ii), Definition 13.1.9 and Notation 13.1.11. 


Part (iv) follows from parts (i) and (iii), and Theorem 13.1.15 (ii). 


Part (v) follows from part (iv) because $\omega$ is a set. 
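
The diagonal argument of part (i) can be checked exhaustively for a small finite set. This is only a sketch: the three-element set, the names `power_set` and `diagonal_set`, and the representation of functions as dictionaries are illustrative assumptions.

```python
from itertools import combinations, product


def power_set(X):
    """Return all subsets of the finite collection X as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(X, f):
    """The set y = {i in X; i not in f(i)} used in the proof of part (i)."""
    return frozenset(i for i in X if i not in f[i])


X = [0, 1, 2]
P = power_set(X)

# Try every one of the 8**3 functions f : X -> P(X) and confirm that the
# diagonal set is never a value of f.
diagonal_always_missed = all(
    diagonal_set(X, dict(zip(X, values))) not in values
    for values in product(P, repeat=len(X))
)
```

For instance, if f(0) = {0}, f(1) = ∅ and f(2) = {0, 1, 2}, then the diagonal set is {1}, which differs from each of the three values of f.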


13.2.1 REMARK: Cardinality using ordinal numbers without the axiom of choice. 

The attempt to build a cardinality scale from ordinal numbers is presented in most presentations of set 
theory. Since the effort required is considerable, and the rewards are so unsatisfactory and disappointing, 
one might reasonably ask why this path has been followed for well over a century. The historical origin of this 
way of thinking in the works of Georg Cantor is clear. What is not clear is why it was not abandoned when 
it was realised how unsatisfactory it is. The set theory literature is extraordinarily focused on properties of 
the ordinal numbers despite the very poor returns on investment. One of the principal reasons for resorting 
to AC in the set theory literature is to try to salvage some kind of benefit from this investment. Thus after 
creating one feeble monster, a greater monster must be created to prop it up. 


Without the axiom of choice, the ordinal numbers as a cardinality scale would surely have been abandoned. 
Some authors have mentioned how unsatisfactory the standard cardinality theory (based on ordinal numbers) 
is without AC. For example, Suppes [395], page 224, wrote the following. 


    Without using the special axiom of cardinals or the axiom of choice, a certain amount of cardinal 
    number theory can be done by defining cardinal numbers as initial ordinals. [....] However, without 
    the axiom of choice we cannot show that every set has a cardinal number and thus cannot develop 
    a reasonable cardinal arithmetic. 

Roitman [385], page 177, made the following remark. 

    Without the axiom of choice there is almost nothing you can say about cardinal arithmetic. 

When it is said that “cardinal arithmetic” cannot be developed without the axiom of choice, this refers to 
the special kind of cardinal arithmetic which is based on ordinal numbers. The successive power sets $\omega$, 
$\mathbb{P}(\omega)$, $\mathbb{P}(\mathbb{P}(\omega))$, and so forth, are ideal cardinality yardsticks which require almost no effort. 
Sections 13.2 and 13.3 present some cardinality theory which is possible without AC. This helps to motivate 
the alternative “beta cardinality” approach in Section 13.4, which is very naive and simple, but has the 
virtue of easily achieving all of the practical objectives of a cardinality scale. Then no monsters are required. 


13.2.2 REMARK: Comparability of cardinality of well-ordered sets and ordinal numbers. 

In ZF set theory, it is not always possible to “compare” the cardinalities of pairs of sets. In other words, it 
is not always possible to determine whether they have the same cardinality or if one has higher cardinality 
than the other. In the case of well-orderable sets, comparability of cardinality is guaranteed, and since all 
ordinal numbers are well-ordered sets, all ordinal numbers have comparable cardinality also. 


13.2.3 THEOREM: Comparability of well-orderable sets and ordinal numbers. 

(i) Let $X$ and $Y$ be well-orderable sets. Then $\#(X) \le \#(Y)$ or $\#(X) \ge \#(Y)$. 
Hence $\#(X) < \#(Y)$ or $\#(X) = \#(Y)$ or $\#(X) > \#(Y)$. 

(ii) Let $\alpha$ and $\beta$ be ordinal numbers. Then $\#(\alpha) \le \#(\beta)$ or $\#(\alpha) \ge \#(\beta)$. 
Hence $\#(\alpha) < \#(\beta)$ or $\#(\alpha) = \#(\beta)$ or $\#(\alpha) > \#(\beta)$. 

PROOF: For part (i), let $X$ and $Y$ be well-orderable sets. Then $(X, \le_X)$ and $(Y, \le_Y)$ are well-ordered sets 
for some $\le_X \in \mathbb{P}(X \times X)$ and $\le_Y \in \mathbb{P}(Y \times Y)$. Then by Theorem 11.7.7, either 

(1) $(X, \le_X)$ is order isomorphic to $(Y, \le_Y)$, or 
(2) $(X, \le_X)$ is order isomorphic to a proper lower section of $(Y, \le_Y)$, or 
(3) $(Y, \le_Y)$ is order isomorphic to a proper lower section of $(X, \le_X)$. 

In case (1), $\#(X) = \#(Y)$ by Notation 13.1.5. In case (2), $\#(X) \le \#(Y)$ by Notation 13.1.11. In case (3), 
$\#(X) \ge \#(Y)$ by Notation 13.1.11. So $\#(X) \le \#(Y)$ or $\#(X) \ge \#(Y)$ by Notation 13.1.11. Hence 
$\#(X) < \#(Y)$ or $\#(X) = \#(Y)$ or $\#(X) > \#(Y)$ by Notation 13.1.11. 

Part (ii) follows from Theorem 12.5.19 (xxi), Theorem 13.1.12 and Notation 13.1.5. Alternatively, use part (i) 
and the fact that $\alpha$ and $\beta$ are well-orderable by Definition 12.5.7 (i). 


13.2.4 REMARK: Order isomorphisms between well-ordered sets and ordinal numbers. 
The well-ordered set comparability theorem states that for any pair of well-ordered sets, one of the sets is 
order isomorphic to a lower section of the other. (See Theorem 11.7.7.) This includes the special case that 
the sets themselves could be order isomorphic. 


Since the ordinal numbers form a kind of fine-grained sliding scale (or “yardstick”) of well-ordered sets, it 
seems reasonable that by gradually moving along the scale, one should be able to find an ordinal number which 
is order isomorphic to any given well-ordered set. This is in fact provable in ZF set theory. Theorem 13.2.5 
is called “the counting theorem” by Smullyan/Fitting [392], page 83. It provides a way of “counting” the 
elements of well-ordered sets. Suppes [395], page 234, called it “a representation theorem for well-ordered 
sets in terms of ordinal numbers”. Quine [382], page 186, called it an “enumeration theorem”. Lévy [368], 
page 54, attributes this theorem to a 1917 paper by Mirimanoff [428]. 


Proofs of Theorem 13.2.5 in ZF set theory which use transfinite recursion to construct ordinal numbers are 
given by Suppes [395], pages 234-236; Roitman [385], page 113; Halmos [357], page 80; E. Mendelson [370], 
pages 179-180; Quine [382], pages 184-191; Bernays [341], pages 125-128. 

Proofs of an NBG set theory version of Theorem 13.2.5 based on a comparability theorem for well-ordered 
NBG classes are given by Smullyan/Fitting [392], page 83; Lévy [368], page 54; Pinter [377], page 185. 

The proof given here for Theorem 13.2.5 follows the proof strategy of Cohen [349], page 61, which is effectively 
transfinite inductive, but does not use the “heavy machinery" of an explicit transfinite recursive construction 
as many other authors do. This proof strategy has the advantage that it is quite elementary. (The excessive 
length of the proof given here is due to over-zealous gap-filling.) 


13.2.5 THEOREM:  Well-ordered set enumeration theorem. 
Every well-ordered set is order isomorphic to a unique ordinal number. 


PROOF: For any well-ordered set $(S, \le)$ and ordinal number $\alpha$, let $S \cong \alpha$ mean that there exists a bijection 
$\phi : S \to \alpha$ which satisfies $\forall x_1, x_2 \in S,\ (x_1 \le x_2 \Leftrightarrow \phi(x_1) \subseteq \phi(x_2))$. In other words, there exists an order 
isomorphism from $(S, \le)$ to $(\alpha, \subseteq)$. By Theorem 12.5.25 (ix), both $\alpha$ and $\phi$ are uniquely determined by $(S, \le)$ 
if such $\alpha$ and $\phi$ exist. 


Let $(X, \le)$ be a well-ordered set. Let $I_a = \{x \in X;\ x \le a\}$ for $a \in X$. Then $I_a$ is a lower section of $(X, \le)$ 
for all $a \in X$ by Definition 11.7.4. Define $A \in \mathbb{P}(X)$ by 

$$A = \{a \in X;\ \exists \alpha,\ (\mathrm{Ord}(\alpha) \text{ and } I_a \cong \alpha)\}.$$

If $X = \emptyset$, then $(X, \le)$ is order isomorphic to the ordinal number $\emptyset$. So assume that $X \ne \emptyset$. Then $X$ must 
have a minimum element $x_0$, and then $x_0 \in A$ because $I_{x_0} = \{x_0\} \cong \{\emptyset\}$. Thus $A \ne \emptyset$ whenever $X \ne \emptyset$. 


Define the two-parameter set-theoretic predicate $\psi$ by $\psi(a, \alpha)$ = “$\mathrm{Ord}(\alpha)$ and $I_a \cong \alpha$” for $a \in A$. For all 
$a \in A$, the predicate $\psi(a, \alpha)$ is true for at most one ZF set $\alpha$. So $\{\alpha;\ \exists a \in A,\ \psi(a, \alpha)\}$ is a well-defined ZF 
set by the ZF replacement axiom, Definition 7.2.4 (6). 

Let $\beta = \{\alpha;\ \exists a \in A,\ \psi(a, \alpha)\}$. Then $\forall a \in A,\ \exists \alpha \in \beta,\ \psi(a, \alpha)$. Therefore $\forall a \in A,\ \exists' \alpha \in \beta,\ \psi(a, \alpha)$ by 
the uniqueness property of $\psi$. Define $f \in \mathbb{P}(A \times \beta)$ by $f = \{(a, \alpha) \in A \times \beta;\ \psi(a, \alpha)\}$. Then $f : A \to \beta$ 
is a well-defined function with $\mathrm{Dom}(f) = A$ and $\mathrm{Range}(f) = f(A) = \beta$. For all $a \in A$, the value $f(a)$ is 
the unique ordinal number $\alpha$ for which $I_a \cong \alpha$. Consequently $f$ is an injection by Theorem 11.7.5 (xv). 
Therefore it is a bijection onto $\beta$. But $f$ is an order homomorphism because $a, b \in A$ with $a \le b$ implies 
$I_a \subseteq I_b$, which implies $f(a) = \phi(I_a) \subseteq \phi(I_b) = f(b)$ by Theorem 12.5.25 (x). So by Theorem 11.5.33, $f$ is an 
order isomorphism because $(A, \le)$ and $(\beta, \subseteq)$ are totally ordered sets. 


To show that $A$ is a lower section of $(X, \le)$, let $a \in A$ and $b \in X \setminus A$. Then $b \ne a$. Suppose that $b < a$. Then 
$b \in I_a$. There exists an ordinal number $\alpha$ with $I_a \cong \alpha$. So there is an order isomorphism $\phi : I_a \to \alpha$. But $I_b$ 
is a proper lower section of $(I_a, \le|_{I_a \times I_a})$ by Definition 11.7.4. So by Theorem 12.5.25 (x), $\phi|_{I_b} : I_b \to \phi(I_b)$ 
is an order isomorphism from $I_b$ to $\phi(I_b)$, and $\phi(I_b)$ is an ordinal number with $\phi(I_b) \in \alpha$. So $b \in A$, which 
is a contradiction. Therefore $b > a$, and so $A$ is a lower section of $(X, \le)$. 

To show that $A = \bigcup_{a \in A} I_a$, first note that $A \subseteq \bigcup_{a \in A} I_a$ by Theorem 11.7.11 (iii). Now let $x \in \bigcup_{a \in A} I_a$. Then 
$x \in I_a$ for some $a \in A$. So $x \in X$ and $x \le a$. Therefore $\exists \alpha,\ (\mathrm{Ord}(\alpha) \text{ and } I_x \cong \alpha)$ by Theorem 12.5.27 (vi). 
So $x \in A$. Thus $A = \bigcup_{a \in A} I_a$. 

Then from Theorem 10.7.6 (i), it follows that $f(A) = f(\bigcup_{a \in A} I_a) = \bigcup_{a \in A} f(I_a)$. So $\beta = f(A)$ is an ordinal 
number by Theorem 12.5.19 (xxix) because $f(I_a)$ is an ordinal number for all $a \in A$. 

Now suppose that $A \ne X$. Then $X \setminus A \ne \emptyset$. Let $x_0 = \min(X \setminus A)$. Then $x_0 \in X \setminus A$ is well defined by 
Definition 11.6.1, and $A \cup \{x_0\}$ is a well-ordered subset of $X$, and $A \cup \{x_0\} = I_{x_0}$ by Theorem 11.7.11 (iv). 
Let $g = f \cup \{(x_0, \beta)\}$. Then $g$ is a function, and $g : I_{x_0} \to \beta \cup \{\beta\}$ is an order isomorphism. But $\beta \cup \{\beta\}$ is 
an ordinal number by Theorem 12.5.15 (iii). Therefore $x_0 \in A$, which is a contradiction because $x_0 \in X \setminus A$. 
Consequently $\mathrm{Dom}(f) = X$. Hence $X$ is order isomorphic to an ordinal number. Uniqueness follows from 
Theorem 12.5.25 (ix). 
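
For a finite well-ordered set, the enumeration by an ordinal number can be computed directly. The following sketch does not follow the proof strategy above. It uses the standard recursive rule that each element is sent to the set of the ordinals assigned to the strictly smaller elements, with von Neumann ordinals represented as nested `frozenset` objects. The name `enumerate_well_order` and the convention that a Python list gives the well-ordering are assumptions made for illustration.

```python
def enumerate_well_order(elements_in_order):
    """Map each element of a finite well-ordered set to a von Neumann ordinal.

    The list order is taken to be the well-ordering. Returns the pair
    (ordinal_of, alpha), where alpha is the ordinal of the whole set.
    """
    ordinal_of = {}
    below = []  # ordinals already assigned to strictly smaller elements
    for a in elements_in_order:
        ordinal_of[a] = frozenset(below)
        below.append(ordinal_of[a])
    return ordinal_of, frozenset(below)


# Example: the set {a, b, c} well-ordered as c < a < b.
ordinal_of, alpha = enumerate_well_order(["c", "a", "b"])
```

Here `ordinal_of["c"]` is the empty set, and one element precedes another in the list exactly when the ordinal of the first is an element of the ordinal of the second.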


13.2.6 REMARK: The aleph numbers. 

In the axiomatic set theory literature, it is the particular ordinal numbers called “alephs” or “aleph numbers” 
which are most often used as “yardsticks” for cardinality of sets. The alephs are the infinite “initial ordinals”. 
(See for example Suppes [395], pages 224-225; Roitman [385], page 107.) 


13.2.7 DEFINITION: 
An initial ordinal (number) is an ordinal number $\alpha$ which satisfies $\forall \beta \in \alpha,\ \#(\beta) < \#(\alpha)$. 


An aleph (number) is an infinite ordinal number $\alpha$ which satisfies $\forall \beta \in \alpha,\ \#(\beta) < \#(\alpha)$. 
In other words, an aleph is an infinite initial ordinal. 


13.2.8 REMARK: Well-definition of the alephs. 

To convince oneself that the alephs in Definition 13.2.7 are well defined, first note that the ordinal numbers, 
according to Definition 12.5.7, are ZF sets which satisfy two simple set-theoretic predicates which are clearly 
unambiguous within ZF set theory. (See also Notation 12.5.10.) The predicate “$\#(\beta) < \#(\alpha)$” is given 
meaning by Notation 13.1.11 and Definition 13.1.9. Written out slightly more fully, an aleph is an infinite 
ZF set $\alpha$ which satisfies the following two conditions. 


(1) $\mathrm{Ord}(\alpha)$. In other words, $\bigcup \alpha \subseteq \alpha$ and $\forall S \in \mathbb{P}(\alpha) \setminus \{\emptyset\},\ \bigcap S \in S$. 
(2) $\forall \beta \in \alpha,\ \forall f : \alpha \to \beta$, $f$ is not an injection. 
In other words, $\forall \beta \in \alpha,\ \forall f : \alpha \to \beta,\ \exists \gamma_1, \gamma_2 \in \alpha,\ (\gamma_1 \ne \gamma_2 \text{ and } f(\gamma_1) = f(\gamma_2))$. 


Although condition (2) has an unambiguous meaning, this meaning may be different in different ZF models. 
The axiom of choice influences the existence of functions. So whether one set dominates another does depend 
on the axiom of choice and other properties of ZF models. Therefore a set could be an aleph in one model 
but not in another. 


13.3. Hartogs's theorem 


13.3.1 REMARK:  Hartogs's theorem is a kind of ordinal number version of Cantor’s theorem. 

Cantor's theorem, Theorem 13.1.27 (iv), states that every ZF set $X$ has a lower cardinality than its power 
set $\mathbb{P}(X)$. Hence for every ZF set, there exists a ZF set with higher cardinality. Consequently there is no 
set cardinality which is higher than all others. In other words, there is no maximum cardinality for ZF sets. 
Cantor's theorem is concerned with the cardinality of general sets. 


Hartogs's theorem, which was published in 1915, is similar to Cantor's theorem. (For Hartogs's theorem, see 
for example Hartogs [420]; Smullyan/Fitting [392], pages 112-114; Suppes [395], pages 226-227; Lévy [368], 
page 89; Stoll [393], pages 124-125; Mendelson [370], pages 187-188; Jech [364], page 157.) 


Hartogs's theorem, which does not use the axiom of choice, constructs for each ZF set X an ordinal number 
whose cardinality is not less than or equal to the cardinality of any well-orderable subset of X. This does 
not imply the existence of an ordinal number with greater cardinality than X unless X has a well-ordering, 
which can be obtained for all sets only if the axiom of choice is invoked. However, all ordinal numbers are 
well ordered. Therefore for every ordinal number a, Hartogs's theorem constructs an ordinal number which 
has higher cardinality. Consequently there is no maximum cardinality for ordinal numbers. 


13.3.2 THEOREM:  Hartogs's theorem. 
For any set $X$, there is an ordinal number $\alpha$ for which there is no injection from $\alpha$ to $X$. In other words, 

$$\forall X,\ \exists \alpha,\ \mathrm{Ord}(\alpha) \text{ and } \forall \phi : \alpha \to X,\ \phi \text{ is not injective.} \qquad (13.3.1)$$

In other words, 

$$\forall X,\ \exists \alpha,\ \mathrm{Ord}(\alpha) \text{ and } \#(X) \not\ge \#(\alpha). \qquad (13.3.2)$$


PROOF: Let X be a set. Let W be the set of well-ordered subsets of X. In other words,


W = {(A, R) ∈ P(X) × P(X × X); R well-orders A}.


This is a well-defined ZF set by Theorem 7.7.2. By Theorem 13.2.5, for every well-ordered set (A, R) ∈ W
there exists a unique ordinal number β with (A, R) ≅ β, where “≅” denotes the existence of an order
isomorphism. (For simplicity, R is written here instead of its more pedantically correct equi-informational
strong version R \ {(x, x); x ∈ A}. Similarly, β is written here as an abbreviation for the strong well-ordered
set (β, ∈).) Then by the ZF replacement axiom, Definition 7.2.4 (6),


B = {β; Ord(β) and ∃(A, R) ∈ W, (A, R) ≅ β}


is a well-defined ZF set, and so f = {((A, R), β) ∈ W × B; (A, R) ≅ β} is a well-defined ZF set which is a
function from W to B. By Theorem 12.5.19 (xxix), ⋃B is an ordinal number. So α₀ = ⋃B ∪ (B ∩ {⋃B}) is
an ordinal number by Theorem 12.5.15 (iii). In fact by Theorem 12.5.23 (xiv), α₀ is the least ordinal number
which is greater than all elements of B.

Let φ : α₀ → X be an injective function. Let A = φ(α₀) and R = {(a, b) ∈ A × A; φ⁻¹(a) ∈ φ⁻¹(b)}. Then
A ∈ P(X), and by Theorem 11.6.8, R ∈ P(X × X) well-orders A and φ : α₀ → A is an order isomorphism
from α₀ to (A, R). So φ⁻¹ : A → α₀ is an order isomorphism from (A, R) to α₀ by Definition 11.1.21.
Therefore α₀ ∈ B, which contradicts Theorem 12.5.23 (xiv). Thus φ : α₀ → X cannot be injective. This
verifies line (13.3.1). The equivalence of line (13.3.2) to line (13.3.1) follows from Notation 13.1.11 and
Definition 13.1.9.


13.3.3 REMARK: The cardinality of ordinal numbers has no upper bound. 

An immediate consequence of Hartogs’s theorem is that for any given ordinal number, there is an ordinal 
number which has higher cardinality. Therefore the cardinality of ordinal numbers has no upper bound 
amongst ordinal numbers. It follows in particular that there exists at least one uncountable ordinal number 
because ω is an ordinal number which is countably infinite.


13.3.4 THEOREM: Every ordinal number is strictly dominated by some other ordinal number.
∀α, (Ord(α) ⇒ ∃β, (Ord(β) and #(α) < #(β))). In other words, for any given ordinal number α, there
exists an ordinal number β which has greater cardinality than α.


PROOF: Let α be an ordinal number. Then by Theorem 13.3.2, there exists an ordinal number β with
#(β) ≰ #(α). But #(β) ≤ #(α) or #(β) ≥ #(α) by Theorem 13.2.3 (ii). Hence #(α) < #(β).


13.3.5 REMARK: The Hartogs number of a set. 

The ordinal number α₀ which is defined in the proof of Hartogs’s theorem is known as the “Hartogs number”
of the set X. The great advantage of the Hartogs number is that it is well defined for all sets in pure ZF 
set theory without the axiom of choice. The great disadvantage is that it only yields a set with greater 
cardinality than the given set when the axiom of choice is adopted. 


In ZF+AC set theory, the Hartogs number would always have greater cardinality than X. Then this least 
ordinal number with cardinality greater than X could be used as a measure of the cardinality of X. This
would set up a cardinality scale for all sets. Sadly, with the axiom of choice it is difficult to know much 
about the Hartogs number of a general set except that it has greater cardinality and is an aleph. So the 
resulting cardinality scale is very unsatisfying. Consequently beta-sets are used in practice as cardinality 
yardsticks. (See Definition 13.4.13 for beta-sets.) 

The condition “#(X) ≥ #(ω)” in Theorem 13.3.6 (vi, vii) is equivalent to the Dedekind infinite property
for X in Definition 13.10.2 by Theorem 13.10.6 (ii). So Theorem 13.3.6 (vi) asserts that α₀ is uncountably
infinite if X is Dedekind infinite, and Theorem 13.3.6 (vii) asserts that α₀ is an aleph if X is Dedekind
infinite. (See Section 13.7 for countable and uncountable infinity. See Section 13.10 for Dedekind infinity.)


13.3.6 THEOREM: Fundamental properties of the Hartogs number construction for general sets. 
Let X be a ZF set. Let W = {(A, R) ∈ P(X) × P(X × X); R well-orders A}, let B = {β; Ord(β) and
∃(A, R) ∈ W, (A, R) ≅ β}, and let α₀ = ⋃B ∪ (B ∩ {⋃B}).

(i) B is a well-defined ZF set.
(ii) α₀ is a well-defined ordinal number.
(iii) #(α₀) ≰ #(X).
(iv) ∀α, (Ord(α) ⇒ (#(α) ≰ #(X) ⇔ α₀ ⊆ α)).
In other words, α₀ is the least ordinal number which is not dominated by X.
(v) α₀ is an initial ordinal.
(vi) If #(X) ≥ #(ω), then #(α₀) > #(ω).
(vii) If #(X) ≥ #(ω), then α₀ is an aleph.
(viii) B is an ordinal number.
(ix) α₀ = B.


PROOF: For part (i), let X be a set. Then by Theorem 13.3.2, there is an ordinal number α for which all
functions φ : α → X are non-injective. Let β be an ordinal number with (A, R) ≅ β for some (A, R) ∈ W.
Then there exists an injection from β to X. So β ⊉ α because otherwise an injection would exist from α
to X. Therefore β ∈ α by Theorem 12.5.19 (xii). So B = {β ∈ α; Ord(β) and ∃(A, R) ∈ W, (A, R) ≅ β}.
Hence B is a well-defined ZF set by Theorem 7.7.2, the ZF specification theorem. (Alternatively, B is a
well-defined ZF set by the argument in the proof of Theorem 13.3.2, using the ZF replacement axiom.)


For part (ii), α₀ is a well-defined ordinal number by part (i) and Theorems 12.5.19 (xxix) and 12.5.15 (iii).

For part (iii), suppose that #(α₀) ≤ #(X). Then there exists an injection φ : α₀ → X by Notation 13.1.11.
Let A = φ(α₀) and R = {(a, b) ∈ A × A; φ⁻¹(a) ∈ φ⁻¹(b)}. Then A ∈ P(X), and by Theorem 11.6.8,
R ∈ P(X × X) is a well-ordering of A and φ : α₀ → A is an order isomorphism from α₀ to (A, R). So
φ⁻¹ : A → α₀ is an order isomorphism from (A, R) to α₀ by Definition 11.1.21. Therefore α₀ ∈ B. So
α₀ ∈ α₀ by Theorem 12.5.23 (xiv), which contradicts Theorem 7.8.4 (i). Hence #(α₀) ≰ #(X).

For part (iv), let α be an ordinal number with α₀ ⊆ α. Suppose that #(α) ≤ #(X). Then #(α₀) ≤ #(α)
by Theorem 13.1.12. So #(α₀) ≤ #(X) by Theorem 13.1.19. This contradicts part (iii). So #(α) ≰ #(X).
Now suppose that α₀ ⊈ α. Then α ∈ α₀ by Theorem 12.5.19 (xxv), and so α ∈ ⋃B ∪ (B ∩ {⋃B}). Then
α ⊆ β for some β ∈ B by Theorem 12.5.23 (xv). For such β, there exists a well-ordered set (A, R) with
A ⊆ X and an order isomorphism φ : A → β from (A, R) to β. Let A′ = φ⁻¹(α). Then φ|_{A′} : A′ → α is a
bijection. So (φ|_{A′})⁻¹ : α → X is an injection. Therefore #(α) ≤ #(X). Hence #(α) ≰ #(X) ⇔ α₀ ⊆ α.

For part (v), let α ∈ α₀. Then α₀ ⊈ α. So #(α) ≤ #(X) by part (iv). Suppose that #(α) ≥ #(α₀). Then
#(α₀) ≤ #(X), which contradicts part (iii). Thus #(α) ≱ #(α₀). So #(α) < #(α₀) by Theorem 13.2.3 (ii).
Hence α₀ is an initial ordinal.

For part (vi), let #(X) ≥ #(ω). Suppose that #(α₀) ≤ #(ω). Then #(α₀) ≤ #(X) by Theorem 13.1.19,
which contradicts part (iii). Therefore #(α₀) ≰ #(ω). Hence #(α₀) > #(ω) by Theorem 13.2.3 (ii).

For part (vii), let #(X) ≥ #(ω). Then α₀ is an infinite ordinal number by parts (ii) and (vi), and α₀ is an
initial ordinal by part (v). Hence α₀ is an aleph.


For part (viii), B is a well-ordered set by Theorem 12.5.19 (xxxii) because it is a set of ordinal numbers. Let
β ∈ B and γ ∈ β. Then there is a well-ordered set (A, R) with A ⊆ X and an order isomorphism φ : A → β
from (A, R) to β. Let A′ = φ⁻¹(γ) and R′ = R|_{A′×A′}. Then (A′, R′) is a well-ordered set and φ|_{A′} : A′ → γ
is an order isomorphism from (A′, R′) to γ by Theorem 11.6.7. Therefore γ ∈ B. Thus ⋃B ⊆ B. Hence B
is an ordinal number by Theorem 12.5.19 (xxxiii).


Part (ix) follows from part (viii) and Theorem 12.5.19 (xlvi). 


13.3.7 REMARK: Simplification of the formula for the Hartogs number. 

The assertion α₀ = B in Theorem 13.3.6 (ix) allows the slightly confusing formula α₀ = ⋃B ∪ (B ∩ {⋃B})
for the least ordinal greater than all elements of B to be replaced by B itself. This considerably simplifies
Definition 13.3.8. The Hartogs number H(X) is the set α₀ = B in Theorem 13.3.6. So Theorems 13.3.9
and 13.3.10 are obtained from Definition 13.3.8 and Theorem 13.3.6 by substituting H(X) for α₀ and B.


13.3.8 DEFINITION: 
The Hartogs number H(X) of a set X is the least ordinal number α which satisfies #(α) ≰ #(X). That is,


∀X, H(X) = {β; Ord(β) and ∃(A, R) ∈ P(X) × P(X × X), (R well-orders A and (A, R) ≅ β)},


where ≅ denotes the order isomorphism relation.


In other words, H(X) is the least ordinal number which is not dominated by X. 
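
For a finite set X, the set of ordinals in Definition 13.3.8 can be computed by brute force, which may help to make the formula concrete. The following Python sketch is an illustration only and is not part of the formal development. It assumes that finite ordinals are modelled by Python integers, that a well-ordering of a finite subset is modelled by a tuple listing its elements in increasing order, and that the order type of such a tuple is its length. All function names are hypothetical.

```python
from itertools import combinations, permutations

def well_ordered_subsets(X):
    """Yield (A, order) for every subset A of X and every total order on A.

    On a finite set every total order is a well-ordering, so a well-ordering
    of A is modelled as a tuple listing A in increasing order.
    """
    elems = list(X)
    for k in range(len(elems) + 1):
        for A in combinations(elems, k):
            for order in permutations(A):
                yield frozenset(A), order

def order_type(order):
    """Finite ordinal (as an int) which is order-isomorphic to the listing."""
    return len(order)

def hartogs_number(X):
    """Set of finite ordinals isomorphic to some well-ordered subset of X."""
    return {order_type(order) for _, order in well_ordered_subsets(X)}

def injects(n, X):
    """True if some injection exists from n = {0, ..., n-1} into X (by search)."""
    return next(permutations(list(X), n), None) is not None

X = {"a", "b", "c"}
H = hartogs_number(X)
least_non_injecting = min(n for n in range(len(X) + 2) if not injects(n, X))
print(sorted(H))             # [0, 1, 2, 3]
print(least_non_injecting)   # 4
```

In this finite case the computed set {0, 1, 2, 3} is the von Neumann ordinal 4, which agrees with the least ordinal that does not inject into X. The sketch says nothing about infinite sets, where no such enumeration is possible.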


13.3.9 THEOREM: Fundamental properties of the Hartogs number of a general set. 
Let X be a ZF set. 


(i) H(X) is an initial ordinal. 
(ii) #(H(X)) ≰ #(X). In other words, H(X) is not dominated by X.
(iii) ∀α, (Ord(α) ⇒ (#(α) ≰ #(X) ⇔ H(X) ⊆ α)).
In other words, H(X) is the least ordinal number which is not dominated by X.
(iv) If #(X) ≥ #(ω), then #(H(X)) > #(ω).
(v) If #(X) ≥ #(ω), then H(X) is an aleph.


PROOF: Part (i) follows from Theorem 13.3.6 (v, ix). 
Part (ii) follows from Theorem 13.3.6 (iii, ix). 
Part (iii) follows from Theorem 13.3.6 (iv, ix). 
Part (iv) follows from Theorem 13.3.6 (vi, ix). 


Part (v) follows from Theorem 13.3.6 (vii, ix). 


13.3.10 THEOREM: Basic properties of the Hartogs number of an ordinal number.
Let α be an ordinal number.


(i) H(α) is an initial ordinal.
(ii) #(H(α)) > #(α).
(iii) ∀β, (Ord(β) ⇒ (#(α) < #(β) ⇔ H(α) ⊆ β)).
In other words, H(α) is the least ordinal number which strictly dominates α.
(iv) If α ⊇ ω, then #(H(α)) > #(ω).
(v) If α ⊇ ω, then H(α) is an aleph.


PROOF: Part (i) follows from Theorem 13.3.9 (i).

Part (ii) follows from Theorems 13.3.9 (ii) and 13.2.3 (ii). 
Part (iii) follows from Theorems 13.3.9 (iii) and 13.2.3 (ii). 
Part (iv) follows from Theorems 13.3.9 (iv) and 13.1.12. 
Part (v) follows from Theorems 13.3.9 (v) and 13.1.12. 


13.3.11 REMARK: The failure of ordinal number based cardinality theory. 

Without the mystical axiom of choice, not much more can be said about existence of ordinal number based 
cardinality yardsticks than the assertions of Theorems 13.2.5 and 13.3.2. Thus it is known that well-ordered 
sets are well represented by ordinal numbers, and for every ordinal number, there exists an ordinal which 
has greater cardinality. 


The two big gaps in ordinal number based cardinality are as follows. 
(1) Without a well-ordering, a set’s cardinality cannot be represented by an ordinal number. 
(2) Without a well-ordering, it cannot be shown that an ordinal number of greater cardinality exists. 


One might hopefully imagine that providing a well-ordering for “ordinary sets” would be fairly simple, and 
that only pathological sets would be difficult to well-order. After all, the task of finding a choice function for 
a set is difficult only for scenarios like Lebesgue unmeasurable sets, which do not arise naturally in practice. 
So sets which are difficult to well-order should be as uncommon as sets for which choice functions cannot 
be constructed. (This seems logical because the well-ordering theorem, Theorem 11.6.23, is known to be 
equivalent to AC, modulo ZF.) However, it is extraordinarily difficult to construct even a single well-ordering 
for an uncountable set. It is known that no well-ordering can be constructed for even such an “ordinary” 
uncountable set as the real numbers. (See Remark 7.12.5.) 


Uncountable well-ordered sets can be “constructed” by applying the method used to prove Hartogs’s theorem.
But such a “construction” yields only the set of all ordinal numbers which are equinumerous to a given set.
It is difficult to know much about the structure of such a set, except that it must be much more complicated
than the first few ordinal number stages which are sketched in Remark 12.6.4 and Table 12.6.2.


As a concrete example, consider the set P(ω), which is equinumerous to 2^ω by Theorem 14.7.5, both of
which are equinumerous to R. One naturally seeks an ordinal number which is equinumerous to P(ω). By
Theorem 13.2.5, there exists an ordinal number which is equinumerous to any well-ordered subset of P(ω).
Certainly there is an abundance of countable well-ordered subsets of P(ω), but uncountable well-ordered
subsets require a choice axiom. So it is easy to find ordinals which are smaller than R, but not of equal
cardinality. By Theorem 13.3.2, there exist ordinals which are of neither equal nor smaller cardinality than R,
but a larger ordinal is not guaranteed by this theorem. Ordinals which are greater than ω are certainly
guaranteed, and arbitrarily large ordinals are in fact guaranteed by induction. But it is not guaranteed that
any of these ordinals are larger than R. If the real numbers cannot be well ordered, they cannot be fitted into
the ordinal number cardinality scale at all. A cardinality scale which cannot even count the elements of the 
most important number system in mathematics does not appear to be of much value at all. As mentioned 
in Remark 13.2.1, the only way to make the ordinal number monster useful for measuring cardinality is to 
invite and welcome the axiom of choice monster. (One effective way to remove most of the ordinal numbers 
is to abandon Fraenkel’s replacement axiom as suggested in Remark 12.5.5.) 


It is thus not at all surprising that the aleph which measures the cardinality of the real numbers is so 
mysterious and unknowable. Anything which requires the axiom of choice to grant it existence is necessarily 
mystical. By contrast, the measuring-stick P(w) is concrete and knowable. 


13.3.12 REMARK: The mysterious ordinal number ω₁.
The set ω₁ is defined to be the least uncountable ordinal number. By Theorem 13.3.10 (iii), ω₁ equals H(ω),
which is the set of all countable ordinal numbers by Definition 13.3.8. This may seem to be an objective
definition. However, countability of a set S is defined as the existence of a surjection from ω to S. So the
countability of sets depends on the choice of ZF model.


A related difficulty arises when trying to prove that a union of a countable set of finite sets is countable. (See
for example Figure 13.8.1 in Remark 13.8.8.) The choice of ZF model influences which sets are countable. 


Thus the meaning of ω₁ is subjective (i.e. model-dependent) because the meaning of countability of sets is
subjective. This difficulty arises even before one considers whether ω₁ has the same cardinality as P(ω).
This continuum hypothesis is well known to be independent of the ZF axioms. (See Cohen [349].) 


13.4. Beta-cardinality 


13.4.1 REMARK:  Cardinality is problematic, but not a problem in practice. 

Cardinality is a problematic topic. Often it is not known whether two sets are equinumerous or if one of 
them dominates the other. (This is the “trichotomy” issue. See Remark 13.1.20.) Sometimes it is not 
even knowable whether suitable injections, surjections or bijections exist to establish cardinality relations 
between sets. And sometimes it is not even known whether it is knowable or not. (The axiom of choice 
merely muddies the waters by claiming that suitable functions exist, which seem to fill the gaps in knowledge, 
but it is an empty claim. No axiom of choice ever produced a choice function. Promises, promises, ...) 
Thus one must live with some uncertainty and ignorance. On the positive side, sets which arise in applicable 
mathematics are almost always straightforward to assign a cardinality to. Sets which are not finite are usually 
provably equinumerous to some von Neumann universe stage V_α, where α is almost never beyond ω + ω.
The cardinality of absurdly infinite sets is certainly intriguing, but has little or no practical applicability. 


13.4.2 REMARK: The beta-cardinality of a set. 

Many common sets are not well-orderable. For example, the real numbers cannot be well-ordered. (See 
Remark 7.12.5.) Sets which have no well-ordering cannot be assigned a cardinality using general ordinal 
numbers as yardsticks. Consequently, the ordinal numbers are of limited use for measuring cardinality. 


The “beta-cardinality” of a set in Definition 13.4.9 is a hybrid of the standard cardinality using elements of 
ω ∪ {ω} as yardsticks for finite and countably infinite sets, combined with infinite von Neumann universe
stages V_α for uncountable sets. The two ranges of this hybrid definition overlap (and agree) in the case of
countably infinite sets. 


The “concept of operation” of beta-cardinality for infinite sets is to determine the smallest α for which a
given set X is equinumerous to a subset of V_α. In other words, X would have beta-cardinality V_α if there
exists an injection from X to V_α, but no injection exists to a smaller universe stage V_β. (It is provable in ZF
that #(V_α) < #(V_{α+1}). So there is no danger of assigning X to two different universe stages.) For example,
ω has beta-cardinality ω, R has beta-cardinality P(ω), and P(R) has beta-cardinality P(P(ω)).

There is a theoretical danger that a set could have “mediate cardinality” between the cardinality of two 
von Neumann universe stages, but no explicit expression for a mediate cardinal can be constructed in ZF 
set theory. Their existence requires the negation of the continuum hypothesis. Even in model theory, it is 
not easy to demonstrate existence of mediate cardinals. In the worst-case scenario, it may be unknown what 
the precise beta-cardinality of a set is. This is not a disaster. There are many unknowns and unknowables 
in mathematics which do not hinder the progress of the subject at all. 


13.4.3 DEFINITION: The rank of a set X is the smallest ordinal number α such that the von Neumann
universe stage V_α includes X.


13.4.4 NOTATION: rank(X), for a ZF set X, denotes the rank of X. 


13.4.5 REMARK: The rank of a set in the von Neumann universe. 

The word “includes” in Definition 13.4.3 is easily confused with the word “contains” in this situation. Let
α = rank(X) be the smallest ordinal number such that X ⊆ V_α. (The existence of such a smallest ordinal
number is guaranteed by the fact that the class of ordinal numbers is well-ordered.) If α = β + 1 for some
ordinal number β, then X ⊈ V_β. But if X ∈ V_α, then X ∈ P(V_β), which means that X ⊆ V_β, which
contradicts the definition of α = rank(X). Therefore X ∉ V_α. If there is no ordinal number β such that
α = β + 1, then α = ⋃{β; On(β), β < α} and V_α = ⋃{V_β; On(β), β < α}, where On(β) means that β is
an ordinal number. So if X ∈ V_α, then X ∈ V_β for some ordinal number β with β < α, which once again
contradicts the assumption α = rank(X). Hence X ∉ V_{rank(X)} for all sets X in the von Neumann hierarchy.


13.4.6 DEFINITION: The cardinality-rank of a set X is the smallest ordinal number α such that stage V_α
of the von Neumann universe dominates X.


13.4.7 NOTATION: ρ(X), for a ZF set X, denotes the cardinality-rank of X.


13.4.8 REMARK: Rank versus cardinality-rank. 
The following are some examples of the rank and cardinality-rank of sets.


X             rank(X)   ρ(X)
∅             0         0
{∅}           1         1
{{∅}}         2         1
{∅, {∅}}      2         2
{{{∅}}}       3         1
ω ∪ {ω}       ω+1       ω
ω+2           ω+2       ω
ω+ω           ω+ω       ω

For finite sets, the cardinality-rank is almost useless. As mentioned in Remark 12.6.2, #(V₀) = 0, #(V₁) =
2⁰ = 1, #(V₂) = 2¹ = 2, #(V₃) = 2² = 4, #(V₄) = 2⁴ = 16, #(V₅) = 2¹⁶ = 65536, and so forth. This is why
the elements of ω, the finite ordinal numbers, are used instead.


The rank is even less useful for measuring cardinality than the cardinality-rank because the rank indicates the
maximum membership-depth of a set. (This is related to the ZF axiom of regularity discussed in Remark 7.8.1.
Roughly speaking, the rank of a set is the ordinal number of the maximum-length path from the set down
to the empty set by membership relations “on the left”.) Thus {{{∅}}} has depth 3, and therefore rank 3,
although it contains only one element, which gives it cardinality-rank equal to 1.
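
The finite rows of the table of ranks and cardinality-ranks can be checked mechanically. The following Python sketch is offered as an illustration under stated assumptions: hereditarily finite sets are modelled by nested frozensets, only the stages V₀ to V₄ are built (V₅ already has 65536 elements), and for finite sets “dominates” is taken to mean “has at least as many elements”. The function names are hypothetical. In the printed labels, "{}" stands for the empty set.

```python
from itertools import chain, combinations

def powerset(s):
    """Return the set of all subsets of s, each represented as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return frozenset(frozenset(c) for c in subsets)

def stages(n):
    """Return [V_0, ..., V_n], where V_0 is empty and V_{k+1} = P(V_k)."""
    V = [frozenset()]
    for _ in range(n):
        V.append(powerset(V[-1]))
    return V

def rank(X, V):
    """Least k such that X is a subset of V_k (that is, V_k includes X)."""
    return next(k for k, Vk in enumerate(V) if X <= Vk)

def cardinality_rank(X, V):
    """Least k such that V_k has at least as many elements as X."""
    return next(k for k, Vk in enumerate(V) if len(X) <= len(Vk))

E = frozenset()  # the empty set
V = stages(4)
examples = {
    "{}": E,
    "{{}}": frozenset({E}),
    "{{{}}}": frozenset({frozenset({E})}),
    "{{}, {{}}}": frozenset({E, frozenset({E})}),
    "{{{{}}}}": frozenset({frozenset({frozenset({E})})}),
}
print([len(Vk) for Vk in V])  # [0, 1, 2, 4, 16]
for name, X in examples.items():
    print(name, rank(X, V), cardinality_rank(X, V))
```

The infinite rows of the table are outside the reach of such a computation and have to be argued directly from the definitions.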


13.4.9 DEFINITION: The beta-cardinality of a set X is


card(X) if X is finite
V_{ρ(X)−ω}(ω) otherwise,


where ρ(X) denotes the smallest rank of the von Neumann stages which dominate X.


13.4.10 NOTATION: β̄(X), for a ZF set X, denotes the beta-cardinality of X.


13.4.11 REMARK: Equal beta-cardinality does not imply equinumerosity. 

It is generally assumed that when one says that two sets A and B have the same cardinality, then A 
and B must be equinumerous. In other words, there exists a bijection within ZF set theory between A 
and B. Since the “#” symbol generally signifies cardinality, it seems reasonable to assume that the equality 
“#(A) = #(B)” means that A and B have the same cardinality, and that a ZF bijection therefore exists 
between them, as is done in Notation 13.1.5. This is not necessarily the case for the beta-cardinality concept. 
In the case of infinite beta-cardinalities, the equality “β̄(A) = β̄(B)” does not necessarily imply that A and
B are equinumerous. (The bar on β̄ indicates that the cardinality has been “rounded up” to the nearest
exact beta-cardinality.) The existence of “mediate cardinals” in some ZF models implies that two
non-equinumerous sets could have the same beta-cardinality. Therefore the cardinality symbol “#” is used here
as a kind of pseudo-notation which does not associate a particular yardstick set with its argument. Thus the
expression “#(A)” is effectively meaningless in isolation if A is not known to be countable. But the
pseudo-equality expression “#(A) = #(B)” is meaningful. (See also Remarks 13.1.4 and 13.1.10 for set-numerosity
pseudo-notations.)


Except in the case of sets with “mediate cardinality” (which are very difficult to concoct within ZF set
theory), it is generally found that sets will have a kind of “exact beta-cardinality”. In other words, they are
equinumerous to either a finite set or some infinite von Neumann stage V_α. Two such sets will clearly be
equinumerous. For example, both of the sets R and 2^ω have exact beta-cardinality equal to V₁(ω) = P(ω).
Therefore they are equinumerous.


13.4.12 REMARK: The beth numbers and beta-sets. 

The beth number with index α is equal to the cardinality (i.e. the smallest equinumerous ordinal number)
of the von Neumann universe V_α(ω). In other words, it equals card(V_α(ω)). Thus the beth numbers with
indices 0, 1 and 2 are respectively the ordinal numbers card(ω), card(P(ω)) and card(P(P(ω))). (All except
the first of these ordinals require the axiom of choice to prove their existence.)


The beta-sets B_α in Definition 13.4.13 have the transfinite recursion rules B₀ = ω, and B_{α+1} = P(B_α)
for any ordinal number α, and B_α = ⋃{B_γ; γ ∈ α} for any limit ordinal α. Replacing each power-set
B_α = V_α(ω) with the minimal equinumerous ordinal number card(B_α) = card(V_α(ω)) seems to have no
benefit at all, but it does have significant disadvantages. Such ordinal numbers are not defined without the
axiom of choice, and even when they are defined, their locations in the ordinal number sequence cannot be
determined in general. Hence it is highly desirable to ignore the beth numbers and use the well-defined
beta-sets as cardinality yardsticks instead. (For beth numbers, see Lévy [368], pages 139, 327; Bell/Slomson [339],
pages 6, 204-205; Roitman [385], page 109.) 


13.4.13 DEFINITION: The beta-set with index α, for any ordinal number α, is the set B_α defined recursively
by B₀ = ω and B_{α+1} = P(B_α) for all ordinals α, and B_α = ⋃_{γ∈α} B_γ for all limit ordinals α.


13.4.14 REMARK: A better notation for infinite cardinalities. 

For practical purposes, the sets ω, P(ω), P(P(ω)), and so forth, are ideal yardsticks for cardinality of
infinite sets. The most obvious notation for such sets would be P^k(ω), defined inductively by P^0(ω) = ω
and P^{k+1}(ω) = P(P^k(ω)) for k ∈ ω. (This notation is used by Smullyan/Fitting [392], pages 113-114, 124.)
Transfinite recursion extends this to P^α(ω) for ordinal numbers α.


In this book, there would be a clash between this P^α(ω) notation and the non-standard (and not enormously
useful) notation P^m(X) = {S ∈ P(X); #(S) ≤ m} for m ∈ ω in Notation 13.12.2.


13.5. Finite sets 


13.5.1 REMARK: The “ordinal number yardstick” definition for finiteness of sets. 

The “ordinal number equinumerosity” criterion in Definition 13.5.2 is fairly generally agreed to be the basis 
of the concept of finiteness for sets. Despite the inconvenience of needing to first carefully construct the finite 
ordinal numbers as a “cardinality yardstick”, it has the great advantage of being obviously correct. Since 
we can construct the finite ordinal numbers almost “by hand”, this “hands-on” experience gives a high level 
of confidence in the definition.

Theorem 13.11.4 gives an equivalent definition for finite sets which does not use the finite ordinal numbers
for equinumerosity tests. The Kuratowski finite set definition which is mentioned in Remark 13.11.7 also 
does not require ordinal number equinumerosity tests. (There are also various other non-equinumerosity 
definitions for finite sets.) In practice, however, it is most often finite ordinal number equinumerosity which 
is easiest to demonstrate and apply. 


Definitions of finite sets essentially the same as Definition 13.5.2 are given for example by Halmos [357], 
page 53; Roitman [385], page 103; S. Warner [155], page 138; Shoenfield [390], page 255; Stoll [393], page 83; 
Pinter [377], page 144; Suppes [395], page 149; Bernays [341], page 97; Lang [108], page 876; MacLane/ 
Birkhoff [110], pages 156-157; Curtis [65], page 14; Smullyan/Fitting [392], page 4. 


13.5.2 DEFINITION: A finite set is a set S which satisfies ∃n ∈ ω, ∃f : n → S, f is bijective.


13.5.3 THEOREM: Unique existence of a finite ordinal number equinumerous to a given finite set.
Let S be a finite set. Then ∃′n ∈ ω, ∃f : n → S, f is bijective. In other words, there is one and only one
finite ordinal number n ∈ ω for which S and n are equinumerous.


PROOF: Let S be a finite set. Then ∃n ∈ ω, ∃f : n → S, f is bijective by Definition 13.5.2. Suppose that
n₁, n₂ ∈ ω satisfy ∃f : n_k → S, f is bijective, for k = 1, 2. Then there exist f₁ : n₁ → S such that f₁ is
bijective and f₂ : n₂ → S such that f₂ is bijective. So f₂⁻¹ ∘ f₁ : n₁ → n₂ is a bijection by Theorem 10.5.6 (iii)
because f₂⁻¹ is a bijection by Theorem 10.5.11. Therefore n₁ = n₂ by Theorem 12.4.18 (ii). Thus n is unique.
Hence ∃′n ∈ ω, ∃f : n → S, f is bijective.


13.5.4 REMARK: Definition and notation for the numerosity of a finite set. 

Since the finite ordinal number n which exists for any given finite set S according to Definition 13.5.2 is 
unique by Theorem 13.5.3, it is possible to give this number a name and notation as in Definition 13.5.5 and 
Notation 13.5.6. 


Note that Notation 13.5.6 is compatible with Notation 13.1.5 for equinumerosity of sets because if two 
finite sets S₁ and S₂ have the same value of numerosity according to Definition 13.5.5, then they must be
equinumerous since they will then be equinumerous to the same finite ordinal number. (This may seem 
intuitively obvious, but intuition is not a proof!) It is easily verified that Notation 13.5.6 is also compatible 
with Notation 13.1.11 in the case of finite sets. 


13.5.5 DEFINITION: The numerosity or cardinality of a finite set S is the unique number n € w for which 
a bijection exists from n to S. 


13.5.6 NOTATION: #(S), for a finite set S, denotes the numerosity (or cardinality) of S.


13.5.7 REMARK:  Pseudo-notation for finiteness of sets. 

Notation 13.5.8 is a frequently used pseudo-notation. In terms of Notations 13.1.11 and 13.7.10, the finiteness 
of a set S may also be indicated by writing “#(S) < #(ω)” or “#(S) < ω”. This means, by Definition 13.1.9,
that there exists an injection from S to ω, and that S and ω are not equinumerous. It then follows from
Theorem 12.4.8 that S is equinumerous to an element of ω. In other words, S is finite. (Luckily there is no
danger of “mediate cardinals” between the finite sets and ω as there is between ω and P(ω).)


The apparent inequality in the expression “#(S) < ∞” has limited validity because the symbol “∞” has no
specific meaning. However, it may be safely interpreted to mean “#(S) < #(ω) = ω”. (The corresponding
pseudo-notation #(S) = ∞ for infinite sets in Notation 13.7.4 may then be interpreted as the logical negation,
“¬(#(S) < #(ω))”.)


13.5.8 NOTATION: #(S) < ∞, for a set S, means that S is a finite set.


13.5.9 THEOREM: The cardinality of every finite ordinal number is the number itself.
∀N ∈ ω, #(N) = N.


PROOF: Let N ∈ ω. Then the identity function id_N : N → N is a bijection. (See Notation 10.2.29 for id_N.)
Therefore #(N) = N.


13.5.10 REMARK: Ad-hoc notation for set of possible enumerations of a given set.
In terms of Notation 13.5.11, a set S is finite when Enum(ω, S) ≠ ∅. In other words, S is a finite set if and
only if there exists an enumeration of S by some element of ω.


Enum(K, Y) is a ZF set for any ZF sets K and Y. To see this, note that Bij(X, Y) ⊆ P(X × Y) for
all sets X and Y. (See Notation 10.5.25 for Bij(X, Y), the set of bijections from X to Y.) Therefore
Enum(K, Y) ⊆ ⋃_{X∈K} P(X × Y) ⊆ P((⋃K) × Y) because ∀X ∈ K, X ⊆ ⋃K.


13.5.11 NOTATION: Enum(K, Y), for sets K and Y, denotes the set of bijections from X to Y for X ∈ K.
In other words,


Enum(K, Y) = ⋃_{X∈K} Bij(X, Y).
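
For small finite sets, Enum(K, Y) can be listed exhaustively. The following Python sketch is an illustration only. It assumes that a finite ordinal n is modelled by the integer n standing for {0, ..., n−1}, that a function from n to Y is modelled by a tuple of length n, and that an initial segment range(6) stands in for ω, which cannot itself be represented. The names bij and enum are hypothetical.

```python
from itertools import product

def bij(n, Y):
    """All bijections from {0, ..., n-1} to the finite set Y, found by search.

    A function f is a tuple with f[i] the image of i. It is kept if it is
    injective (no repeated values) and surjective (its range is all of Y).
    """
    Y = set(Y)
    return [
        f for f in product(sorted(Y), repeat=n)
        if len(set(f)) == n and set(f) == Y
    ]

def enum(K, Y):
    """Union of bij(n, Y) over all n in K."""
    return [f for n in K for f in bij(n, Y)]

S = {"p", "q", "r"}
E = enum(range(6), S)
print(len(E))                       # 6
print(sorted({len(f) for f in E}))  # [3]
```

All six enumerations found have domain 3, which is consistent with the unique existence of the finite ordinal equinumerous to S.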


13.5.12 REMARK: Inconvenient measures for the size of a finite set. 

Definition 13.5.2 means that a finite set is a set which is equinumerous to a finite ordinal number. This is 
a very inconvenient definition for practical applications. One must potentially test a set against an infinite 
collection of test-sets, namely the elements of ω. The set ω is specified in Definition 12.1.28 in terms of an
“interior equation” and “boundary conditions”. So this set is in effect constructed by “solving a boundary 
value problem”, which is a somewhat indirect method of specification. The individual finite sets can be 
specified as in Definition 12.1.3 as follows. 


∃N, ∃f, ((f ⊆ X × N) ∧ (∀x, ∀y, (f(x) = f(y) ⇔ x = y)) ∧ ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a})).


This has the advantage of not referring to the set ω, which itself would need to be incorporated into the
logical proposition if it was written out fully, but this form of definition still requires the “solution” of a 
“finite differences” formula. The inconvenience of definitions of finiteness based on the “standard ruler” of 
the set of ordinal numbers motivates the search for simpler characterisations. 


13.5.13 THEOREM: Some cardinality properties of subsets of finite sets. 
(i) If S is a finite set and A ⊆ S, then A is finite.
(ii) If S is a finite set and A ⊊ S, then #(A) < #(S).
(iii) If S is a finite set and A ⊆ S with #(A) = #(S), then A = S.
(iv) There are no injections from ω to a finite set.


PROOF: For part (i), let S be a finite set. Then by Definition 13.5.2, there exists a bijection f : n → S
for some n ∈ ω. Let A be a subset of S. Let A′ = f⁻¹(A). Then A′ ⊆ n. Let g : X → A′ be the
standard enumeration for A′. Then X ∈ ω⁺ and g is a bijection by Definition 12.4.5. But X ⊆ n by
Theorem 12.4.11 (i). So X ∈ ω by Theorem 12.2.20 (iv). But f|_{A′} : A′ → A is a bijection. (It is injective
because f is injective, and it is surjective because f(A′) = f(f⁻¹(A)) = A by Theorem 10.7.1 (i′′).) Therefore
f ∘ g : X → A is a bijection. Hence A is a finite set.


For part (ii), let S be a finite set and A ⊊ S. Let N = #(S). Then N ∈ ω, and there is a bijection f_S :
N → S. Let X = f_S⁻¹(A). Then X ⊊ N by Theorem 10.6.7 (vii). Now suppose that #(A) = #(S) = N.
Then there is a bijection f_A : N → A. So f_S⁻¹ ∘ f_A : N → X is a bijection by Theorem 10.5.6 (iii).
Therefore Range(f_S⁻¹ ∘ f_A) = N by Theorem 12.4.13 (i), which is a contradiction. Hence #(A) < #(S) by
Notation 13.1.11 and Definition 13.1.9.


For part (iii), let S be finite, A C S and #(A) = #(S). Suppose that A # S. Then A & S. So #(A) < #(S) 


by part (ii), which is a contradiction. Therefore A — S. 


For part (iv), let S be a finite set. Then there is a bijection f : n — S for some n € w by Definition 13.5.2. Let 
g:uw-— S bean injection. Then f^! o g: w — n is an injection. This is impossible by Theorem 12.4.13 (iii). 
Hence there are no injections from w to S. 
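
The construction in the proof of part (i) can be imitated computationally. The following Python sketch is only an illustration and is not part of the formal development. It assumes that the bijection f : n → S is supplied as a list with distinct entries, so that f[i] plays the role of f(i). The function name enumerate_subset is hypothetical.

```python
def enumerate_subset(f, a):
    """Enumerate the subset a of S = set(f) without repetition.

    f is a list with distinct entries, standing for a bijection f : n -> S.
    The list 'preimage' is the increasing enumeration g of A' = f^{-1}(A),
    and the result is the composite of f with g, as in the proof of part (i).
    """
    if len(set(f)) != len(f):
        raise ValueError("f must be injective")
    if not set(a) <= set(f):
        raise ValueError("a must be a subset of the range of f")
    preimage = [i for i in range(len(f)) if f[i] in a]
    return [f[i] for i in preimage]


f = ["p", "q", "r", "s"]            # a bijection from 4 = {0, 1, 2, 3} to S
h = enumerate_subset(f, {"s", "q"})
print(h)                            # the subset, listed in the order induced by f
```

The length of the returned list is the number X in the proof. It never exceeds len(f), and it is strictly smaller when the subset is proper, in agreement with parts (i) and (ii).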


13.5.14 THEOREM: Enumerations of finite totally ordered sets.
Let S be a finite totally ordered set. Then there exists a unique increasing sequence x with Range(x) = S.
(See Definition 12.3.1 for sequences.)


PROOF: Let S be finite and totally ordered. Let n = #(S). Then by Definition 13.5.5 and Theorem 13.5.3,
n is the unique element of ω for which there is a bijection f : n → S. Define x = (x_i)_{i∈n} ∈ Sⁿ inductively by

    ∀i ∈ n,  x_i = min{t ∈ S; ∀j ∈ n, (j < i ⇒ t ≠ x_j)}
                 = min(S \ Range((x_j)_{j∈i})).

To verify that x is a well-defined sequence, it suffices to show that {t ∈ S; ∀j ∈ n, (j < i ⇒ t ≠ x_j)} ≠ ∅
for all i ∈ n. Suppose not. Then let i₀ be the least i ∈ n for which {t ∈ S; ∀j ∈ n, (j < i ⇒ t ≠ x_j)} = ∅.
(Here i₀ is well defined because n is well ordered by Theorems 12.2.7 and 11.6.6.) Then such i₀ will satisfy
∀t ∈ S, ∃j ∈ n, (j < i₀ and t = x_j) by Theorem 6.6.10 (viii) and Definition 4.7.2 (ii). So S ⊆ Range((x_j)_{j∈i₀}).
But #(Range((x_j)_{j∈i₀})) ≤ #(i₀) = i₀ by Notation 13.1.11, Definition 13.1.9 and Theorem 13.5.9. Therefore
#(S) ≤ i₀ by Theorem 13.1.12. Thus n ≤ i₀, and so n ⊆ i₀ by Definition 12.2.5, which then contradicts the
assumption i₀ ∈ n by Theorem 7.8.4 (i). Consequently x is a well-defined sequence.

The sequence x is injective because for each i ∈ n, the definition of x implies ∀j ∈ n, (j < i ⇒ x_i ≠ x_j). So
∀i, j ∈ n, (j < i ⇒ x_i ≠ x_j), and so ∀i, j ∈ n, (i < j ⇒ x_i ≠ x_j) by swapping i and j. But i ≠ j implies
i < j or j < i by Theorem 12.2.6. Therefore ∀i, j ∈ n, (i ≠ j ⇒ x_i ≠ x_j). In other words, x is injective.

To show that x is increasing, suppose that i, k ∈ n satisfy i < k and x_k ≤ x_i. Then x_k < x_i because x is
injective. Since x_k ∉ Range((x_j)_{j∈i}) because k ∉ i and x is injective, it follows that x_k ∈ S \ Range((x_j)_{j∈i}).
Therefore x_k < x_i = min(S \ Range((x_j)_{j∈i})) by the definition of x, which is a contradiction since an element
of a set cannot be less than its minimum. Consequently x is an increasing sequence.

To show that Range(x) = S, first note that Range(x) ⊆ S by the definition of x. Suppose that Range(x) ≠ S.
Then #(Range(x)) < #(S) by Theorem 13.5.13 (ii). But #(Range(x)) = n because x is injective, which
implies that x : n → Range(x) is a bijection. So n < n, which is impossible. Therefore Range(x) = S. Hence
there exists an increasing sequence x with Range(x) = S.

To show uniqueness, let x = (x_i)_{i∈n} and x′ = (x′_i)_{i∈n′} be increasing sequences with Range(x) = S and
Range(x′) = S. Then n′ = n because x⁻¹ ∘ x′ : n′ → n is a bijection. Suppose that x ≠ x′. Then
i₀ = min{i ∈ n; x_i ≠ x′_i} is well defined, and either x_{i₀} < x′_{i₀} or x_{i₀} > x′_{i₀}. Assume (without loss of
generality) that x_{i₀} < x′_{i₀}. Since x′ : n → S is a bijection, and x_{i₀} ∈ S, there must be some k ∈ n \ {i₀} such
that x′_k = x_{i₀}. If k < i₀, then x_k = x′_k because x_i = x′_i for all i < i₀ by the definition of i₀. Then x_k = x_{i₀},
which implies that x is non-injective. Therefore k is not less than i₀. If k > i₀, then x′_k = x_{i₀} < x′_{i₀}, which
implies that x′ is not an increasing sequence, which contradicts an assumption. Consequently x ≠ x′ is not
possible. Hence the increasing sequence x with Range(x) = S is unique.
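
The inductive definition of the increasing sequence in this proof, where each term is the minimum of the elements of S not yet used, translates directly into a short program. The Python sketch below is an informal illustration under the assumption that S is a finite Python set whose elements are comparable with the built-in ordering. The function name increasing_enumeration is hypothetical.

```python
def increasing_enumeration(s):
    """Return the list x with x[i] = min(S minus {x[j] : j < i})."""
    remaining = set(s)
    x = []
    while remaining:
        t = min(remaining)      # least element of S not used by earlier terms
        x.append(t)
        remaining.remove(t)
    return x


S = {7, 2, 9, 4}
x = increasing_enumeration(S)
print(x)
```

The loop runs exactly #(S) times, which mirrors the argument that the set of unused elements is non-empty for every index i ∈ n.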


13.6. Addition of finite ordinal numbers 


13.6.1 REMARK: Inductive definition of addition of finite ordinal numbers. 

Theorem 13.6.2 is a starting point for defining addition of finite ordinal numbers. Ironically, subtraction is
not at all difficult to define. If N₁ ≤ N₂, then N₂ − N₁ may be defined as #(N₂ \ N₁). (Theorem 13.5.13 (i)
guarantees that the set-complement N₂ \ N₁ is a finite set.) Addition, by contrast, has no such simple
construction as set-complement to define it. It seems straightforward to define the sum of two finite ordinal
numbers by using ordered pairs to differently “tag” elements of N₁ and N₂, and then define the sum to be
the cardinality of the disjoint union. However, one must first prove that this union is finite. This is where
induction seems to be inescapable. The construction in Theorem 13.6.2 exploits the well-defined subtraction
operation to inductively prove the existence of N₃ such that N₂ = N₃ − N₁, since N₃ − N₁ is apparently so
straightforward to define.


To put it another way, if the solution N₃ of the equation N₃ − N₁ = N₂ is known for given N₁ and N₂, then
it is simple to verify that the equation is satisfied. The more difficult task is to solve the “inverse problem”.
(The majority of difficult problems in mathematics are such inverse problems.) In other words, N₁ + N₂
must somehow be constructed from given N₁ and N₂. Some clues can be gleaned from Figure 13.6.1.


It can be seen in Figure 13.6.1 that the ordinal number 3 may be constructed by replacing each instance of the
empty set (i.e. the number 0) in the number 2 with the elements of the number 2. In other words, wherever
there is an empty box, replace this with ∅ and {∅}, which are the elements of 2. After the replacement, there
are 4 empty boxes instead of 2.

Figure 13.6.1 Addition of finite ordinal numbers


To construct the number 4, one may likewise substitute the elements of 2 for each empty box in 3, and to
construct 5, one may substitute the elements of 3 for each empty box in 3. The general rule is that N₁ + N₂
is obtained by substituting N₁⁺ for each empty box in N₂. This suggests that there should be a set-theoretic
formula which constructs the sum of two finite ordinal numbers. However, this would require a “recursive
descent” through the boxes in the diagrams (i.e. the sets and sets within sets recursively), somehow replacing
the empty sets during the recursion. Such recursion is certainly no simpler than induction. There is a simple
set-theoretic formula to construct N⁺ from N, but apparently no non-recursive set-theoretic formula for
adding general N₁, N₂ ∈ ω.


The simplest way to define addition seems to be to exploit the explicit construction for successor sets,
combined with induction, to prove the existence of a solution N₃ for the equation N₃ − N₁ = N₂ for any
given N₁ and N₂ in ω. An existence proof by induction is less satisfying than an explicit set-theoretic
formula, but infinitely more satisfying than appealing to an axiom of choice.


13.6.2 THEOREM: Existence and uniqueness of the sum of two finite ordinal numbers.
∀N₁, N₂ ∈ ω, ∃′N₃ ∈ ω, #(N₃ \ N₁) = N₂.


PROOF: Let N₁ ∈ ω. Let X = {N₂ ∈ ω; ∃N₃ ∈ ω \ N₁, #(N₃ \ N₁) = N₂}. (See Notation 13.5.6 for #(S) for
finite sets S.) Clearly X ⊆ ω, and ∅ ∈ X because #(N₁ \ N₁) = #(∅) = ∅ and N₁ ∉ N₁ by Theorem 7.8.4 (i).
Let N ∈ X. Then there exists N₃ ∈ ω such that #(N₃ \ N₁) = N and N₃ ∉ N₁. For such N₃, there exists a
bijection φ : N → N₃ \ N₁. Using Notation 12.2.18 for successor sets, define φ⁺ : N⁺ → N₃⁺ \ N₁ by

    ∀x ∈ N⁺,  φ⁺(x) = φ(x)   if x ∈ N
                      N₃     if x = N.

Then φ⁺ is a well-defined function from N⁺ to N₃⁺ \ N₁ because N⁺ is the disjoint union of N and {N},
and N₃ ∈ N₃⁺ \ N₁ because N₃ ∈ N₃⁺ = N₃ ∪ {N₃} and N₃ ∉ N₁. To show that φ⁺ : N⁺ → N₃⁺ \ N₁ is a
bijection, let x₁, x₂ ∈ N⁺ satisfy φ⁺(x₁) = φ⁺(x₂). If x₁, x₂ ∈ N, then x₁ = x₂ because φ : N → N₃ \ N₁ is
a bijection. If x₁ ∈ N and x₂ ∉ N, then x₂ = N and φ⁺(x₂) = N₃, which cannot be an element of N₃ \ N₁.
So this case is impossible because φ⁺(x₁) ≠ φ⁺(x₂). The only remaining case to check is x₁ ∉ N and
x₂ ∉ N, which implies x₁ = N = x₂. Thus φ⁺ is injective. But Range(φ) = N₃ \ N₁ and so Range(φ⁺) =
Range(φ) ∪ {N₃} = (N₃ \ N₁) ∪ {N₃}. But N₃ ∉ N₁. So Range(φ⁺) = (N₃ ∪ {N₃}) \ N₁ = N₃⁺ \ N₁. Therefore
φ⁺ : N⁺ → N₃⁺ \ N₁ is a bijection. Thus #(N₃⁺ \ N₁) = N⁺ for N₃⁺ ∈ ω. Therefore N⁺ ∈ X. It then follows
by induction, Theorem 12.2.12, that X = ω. Hence ∀N₁, N₂ ∈ ω, ∃N₃ ∈ ω, #(N₃ \ N₁) = N₂.

To show uniqueness of N₃, let N₁, N₂ ∈ ω and let N₃, Ñ₃ ∈ ω satisfy #(N₃ \ N₁) = N₂ and #(Ñ₃ \ N₁) = N₂.
Then there exist bijections φ : N₂ → N₃ \ N₁ and φ̃ : N₂ → Ñ₃ \ N₁. Then φ̃ ∘ φ⁻¹ : N₃ \ N₁ → Ñ₃ \ N₁ is
a bijection by Theorems 10.5.11 and 10.5.6 (iii). Define ψ : N₃ → Ñ₃ by ψ = (φ̃ ∘ φ⁻¹) ∪ id_{N₁}. Then ψ : N₃ → Ñ₃
is a well-defined function because (N₃ \ N₁) ∩ N₁ = ∅, and then ψ : N₃ → Ñ₃ is a bijection because
φ̃ ∘ φ⁻¹ : N₃ \ N₁ → Ñ₃ \ N₁ and id_{N₁} : N₁ → N₁ are both bijections and (Ñ₃ \ N₁) ∩ N₁ = ∅. Therefore N₃ = Ñ₃
by Theorem 12.4.18 (ii). Hence ∀N₁, N₂ ∈ ω, ∃′N₃ ∈ ω, #(N₃ \ N₁) = N₂.


13.6.3 DEFINITION: The addition operation on the finite ordinal numbers is the map σ : ω × ω → ω for
which σ(N₁, N₂) is the unique N₃ ∈ ω which satisfies #(N₃ \ N₁) = N₂, for all N₁, N₂ ∈ ω.


The sum of N₁ and N₂ is the finite ordinal number σ(N₁, N₂), for N₁, N₂ ∈ ω.


13.6.4 DEFINITION: The subtraction operation on the finite ordinal numbers is the map
δ : {(N₁, N₂) ∈ ω × ω; N₂ ≤ N₁} → ω given by δ(N₁, N₂) = #(N₁ \ N₂) for all N₁, N₂ ∈ ω with N₂ ≤ N₁.


The difference between N₁ and N₂ is the finite ordinal number δ(N₁, N₂), for N₁, N₂ ∈ ω with N₂ ≤ N₁.


13.6.5 NOTATION: Sum and difference of finite ordinal numbers.
N₁ + N₂, for N₁, N₂ ∈ ω, denotes the sum of N₁ and N₂.


N₁ − N₂, for N₁, N₂ ∈ ω with N₂ ≤ N₁, denotes the difference between N₁ and N₂.
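
The sum and difference of finite ordinal numbers can be explored with a small model in which finite ordinal numbers are represented as nested Python frozensets. This is only a sketch under several assumptions. The names successor, ordinal, cardinality, add and subtract are hypothetical. The built-in function len is used as a shortcut for the cardinality #(S), which the formal development obtains from bijections instead. The sum is found by searching upwards through successors until #(N₃ \ N₁) = N₂, which imitates the inductive existence proof without being an explicit set-theoretic formula.

```python
def successor(n):
    """Successor set N+ = N ∪ {N} of a finite ordinal number N."""
    return n | frozenset([n])


def ordinal(k):
    """Build the finite ordinal number for the Python integer k by induction."""
    n = frozenset()                 # the ordinal number 0 is the empty set
    for _ in range(k):
        n = successor(n)
    return n


def cardinality(s):
    """#(S) for a finite set S, returned as a finite ordinal number."""
    return ordinal(len(s))


def subtract(n1, n2):
    """N1 - N2 = #(N1 minus N2), defined only when N2 <= N1, i.e. N2 ⊆ N1."""
    if not n2 <= n1:
        raise ValueError("the difference requires N2 <= N1")
    return cardinality(n1 - n2)


def add(n1, n2):
    """Search for the N3 >= N1 which satisfies #(N3 minus N1) = N2."""
    n3 = n1
    while cardinality(n3 - n1) != n2:
        n3 = successor(n3)          # each step adds one element outside N1
    return n3


print(add(ordinal(2), ordinal(3)) == ordinal(5))
```

The search in add terminates because each successor step enlarges N₃ \ N₁ by exactly one element, so the cardinality passes through every finite ordinal number in turn.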


13.6.6 THEOREM: Some properties of addition and subtraction of finite ordinal numbers.
(i) ∀N₁, N₂ ∈ ω, N₁ + N₂ = N₂ + N₁.
(ii) ∀N₁, N₂ ∈ ω, ∃φ : N₂ → (N₁ + N₂) \ N₁, φ is a bijection.
In other words, ∀N₁, N₂ ∈ ω, Bij(N₂, (N₁ + N₂) \ N₁) ≠ ∅.
(iii) ∀N₁, N₂ ∈ ω, ∃φ : N₂ → (N₂ + N₁) \ N₁, φ is a bijection.
In other words, ∀N₁, N₂ ∈ ω, Bij(N₂, (N₂ + N₁) \ N₁) ≠ ∅.
(iv) ∀N₁, N₂ ∈ ω, (N₁ + N₂) − N₁ = N₂ = (N₂ + N₁) − N₁.
(v) ∀N₁ ∈ ω, ∀N₂ ∈ ω \ N₁, (N₂ − N₁) + N₁ = N₂.
(vi) ∀N₀, N₁, N₂ ∈ ω, (N₁ ≤ N₂ ⇒ N₀ + N₁ ≤ N₀ + N₂).


PROOF: For part (i), let N₁, N₂ ∈ ω. Let N₃ = N₁ + N₂ and Ñ₃ = N₂ + N₁. Then by Definition 13.6.3,
there exist bijections φ : N₂ → N₃ \ N₁ and φ̃ : N₁ → Ñ₃ \ N₂. Define ψ : N₃ → Ñ₃ by

    ∀x ∈ N₃,  ψ(x) = φ̃(x)     if x ∈ N₁
                     φ⁻¹(x)    if x ∈ N₃ \ N₁.

Then ψ : N₃ → Ñ₃ is a well-defined function because N₃ is the disjoint union of N₁ and N₃ \ N₁, and
Dom(φ̃) = N₁ and Dom(φ⁻¹) = N₃ \ N₁. Moreover, ψ : N₃ → Ñ₃ is a bijection because Ñ₃ is the disjoint
union of Range(φ̃) = Ñ₃ \ N₂ and Range(φ⁻¹) = N₂, and φ̃ and φ⁻¹ are both injections on their disjoint
domains. Hence N₃ = Ñ₃ by Theorem 12.4.18 (ii).

For part (ii), let N₁, N₂ ∈ ω. Then #((N₁ + N₂) \ N₁) = N₂ by Definition 13.6.3. So there exists a bijection
from N₂ to (N₁ + N₂) \ N₁ by Definition 13.5.5 and Notation 13.5.6.

Part (iii) follows from parts (i) and (ii).

Part (iv) follows from parts (ii) and (iii) and Definition 13.6.4.

For part (v), let N₁ ∈ ω and N₂ ∈ ω \ N₁. Then N₁ ⊆ N₂ by Theorem 12.1.23 (viii). So N₃ = N₂ − N₁ is well
defined by Definition 13.6.4, and N₃ = #(N₂ \ N₁). Therefore there exists a bijection φ₃ : N₃ → N₂ \ N₁.
Also, there is a bijection φ₁ : N₁ → (N₃ + N₁) \ N₃ by part (ii). Let φ = φ₁ ∪ φ₃⁻¹. Then φ : N₂ → N₃ + N₁ is a
well-defined function because N₂ is the disjoint union of Dom(φ₁) = N₁ and Dom(φ₃⁻¹) = N₂ \ N₁, and φ₁ and
φ₃⁻¹ are well-defined functions. Moreover, N₃ + N₁ is the disjoint union of Range(φ₁) = (N₃ + N₁) \ N₃ and
Range(φ₃⁻¹) = N₃, and φ₁ and φ₃⁻¹ are injective on their disjoint domains. Therefore φ : N₂ → N₃ + N₁ =
(N₂ − N₁) + N₁ is a bijection. Hence (N₂ − N₁) + N₁ = N₂ by Theorem 12.4.18 (ii).

For part (vi), let N₀, N₁, N₂ ∈ ω with N₁ ≤ N₂. Then N₁ ⊆ N₂ by Definition 12.2.5. By part (ii), there are
bijections φ_k : N_k → (N₀ + N_k) \ N₀ for k = 1, 2. Then φ₂ ∘ φ₁⁻¹ : (N₀ + N₁) \ N₀ → (N₀ + N₂) \ N₀ is a
well-defined function because φ₁⁻¹ and φ₂ are functions and Dom(φ₂ ∘ φ₁⁻¹) = Dom(φ₁⁻¹) = (N₀ + N₁) \ N₀ and
Range(φ₂ ∘ φ₁⁻¹) ⊆ Range(φ₂) = (N₀ + N₂) \ N₀, and φ₂ ∘ φ₁⁻¹ is an injection by Theorem 10.5.6 (i) because
φ₁ and φ₂ are injective. Let φ = id_{N₀} ∪ (φ₂ ∘ φ₁⁻¹). Then φ : N₀ + N₁ → N₀ + N₂ is a well-defined injection
because Dom(id_{N₀}) = N₀ and Dom(φ₂ ∘ φ₁⁻¹) are disjoint and Range(id_{N₀}) = N₀ and Range(φ₂ ∘ φ₁⁻¹) are
disjoint. Hence N₀ + N₁ ≤ N₀ + N₂ by Theorem 12.4.18 (ii).


13.6.7 THEOREM: Additivity of cardinality of unions of disjoint finite sets.
Let A and B be disjoint finite sets. Then #(A ∪ B) = #(A) + #(B).


PROOF: Let A and B be disjoint finite sets. Then N₁ = #(A) and N₂ = #(B) are well-defined elements
of ω by Theorem 13.5.3 and Notation 13.5.6, and there exist bijections φ₁ : N₁ → A and φ₂ : N₂ → B.
There exists a bijection φ₃ : N₂ → (N₁ + N₂) \ N₁ by Theorem 13.6.6 (ii). Let φ = φ₁ ∪ (φ₂ ∘ φ₃⁻¹). Then
φ : N₁ + N₂ → A ∪ B is a well-defined function because N₁ + N₂ is the disjoint union of Dom(φ₁) = N₁
and Dom(φ₂ ∘ φ₃⁻¹) = Dom(φ₃⁻¹) = (N₁ + N₂) \ N₁, and φ₁ and φ₂ ∘ φ₃⁻¹ are well-defined functions. Since
Range(φ₁) = A and Range(φ₂ ∘ φ₃⁻¹) = φ₂(Range(φ₃⁻¹)) = φ₂(N₂) = B, it follows that Range(φ) = A ∪ B.
The injectivity of φ follows from the injectivity of φ₁ and φ₂ ∘ φ₃⁻¹, together with the disjointness of their
ranges A and B. Therefore φ : N₁ + N₂ → A ∪ B is a bijection. Hence #(A ∪ B) = N₁ + N₂ = #(A) + #(B)
by Definition 13.5.5.


13.6.8 REMARK: Application of addition and subtraction of cardinalities of finite sets. 

Theorem 13.6.9 is a fairly basic assertion regarding the cardinality of the symmetric difference of two finite 
sets, but it demonstrates the application of the basic addition and subtraction arithmetic for finite ordinal 
numbers. Since time does not permit the presentation of multiplication and division for finite ordinal 
numbers, it is tacitly assumed that the subtraction of 2 #(A ∩ B) is the same as subtracting #(A ∩ B) twice.
(Theorem 13.6.9 is applied in the proof of Theorem 14.8.19 (v).)


13.6.9 THEOREM: Formula for cardinality of symmetric difference of two finite sets.
#(A △ B) = #(A) + #(B) − 2 #(A ∩ B) for any finite sets A and B.


PROOF: (A △ B) ∪ (A ∩ B) = A ∪ (B \ (A ∩ B)) by Theorems 8.3.4 (x) and 8.2.5 (viii, xvii) for general sets
A and B. The unions in this equation are disjoint by Theorems 8.3.4 (viii) and 8.2.5 (vi, xvii). Therefore for
finite sets A and B, #(A △ B) + #(A ∩ B) = #(A) + #(B \ (A ∩ B)) by Theorem 13.6.7. But B is the disjoint
union of B \ (A ∩ B) and A ∩ B because A ∩ B ⊆ B. So #(B) = #(B \ (A ∩ B)) + #(A ∩ B) by Theorem 13.6.7.
Therefore #(B \ (A ∩ B)) = (#(B \ (A ∩ B)) + #(A ∩ B)) − #(A ∩ B) = #(B) − #(A ∩ B) by Theorem 13.6.6 (iv).

Hence #(A △ B) = (#(A △ B) + #(A ∩ B)) − #(A ∩ B) = #(A) + #(B \ (A ∩ B)) − #(A ∩ B)
= #(A) + #(B) − #(A ∩ B) − #(A ∩ B) by Theorem 13.6.6 (iv).


13.6.10 THEOREM: Subadditivity of cardinality of unions of finite sets.
Let A and B be finite sets. Then #(A ∪ B) ≤ #(A) + #(B).


PROOF: Let A and B be finite sets. Let C = B \ A. Then A and C are disjoint sets. But C is a
finite set by Theorem 13.5.13 (i). So A and C are disjoint finite sets. So #(A ∪ C) = #(A) + #(C)
by Theorem 13.6.7. But C ⊆ B. So #(C) ≤ #(B) by Theorem 13.1.12, and A ∪ C = A ∪ B. Hence
#(A ∪ B) = #(A ∪ C) = #(A) + #(C) ≤ #(A) + #(B) by Theorem 13.6.6 (vi).
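
The three cardinality formulas in Theorems 13.6.7, 13.6.9 and 13.6.10 can be checked exhaustively for all pairs of subsets of a small set. The Python sketch below is a sanity check only, not a proof. It assumes that ordinary integer arithmetic and the built-in function len may stand in for ordinal arithmetic and #(S). The function names subsets and check_cardinality_formulas are hypothetical.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of a finite collection as a frozenset."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def check_cardinality_formulas(universe):
    """Return True if the three formulas hold for all pairs of subsets."""
    for a in subsets(universe):
        for b in subsets(universe):
            common = len(a & b)
            # Theorem 13.6.7: additivity for disjoint sets.
            if common == 0 and len(a | b) != len(a) + len(b):
                return False
            # Theorem 13.6.9: #(A ∩ B) is subtracted twice, as in Remark 13.6.8.
            if len(a ^ b) != len(a) + len(b) - common - common:
                return False
            # Theorem 13.6.10: subadditivity for arbitrary finite sets.
            if len(a | b) > len(a) + len(b):
                return False
    return True


print(check_cardinality_formulas(range(4)))
```

The symmetric difference is computed with the operator ^ on frozensets, and the subtraction of 2 #(A ∩ B) is written as two separate subtractions to match the form used in the proof of Theorem 13.6.9.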


13.6.11 REMARK: Powers of 2. 

It is convenient to introduce here the concept of “powers of 2” for finite ordinal numbers. These are useful for 
proving a form of the Archimedean order property for rational numbers in Theorem 15.1.16. Theorem 13.6.12 
justifies Definition 13.6.13. The proof of the theorem contains some tedious technicalities. Usually inductive 
definitions are not justified in such detail. (Theorem 14.1.11 uses a similar form of inductive definition proof.) 


13.6.12 THEOREM: Unique existence of the power-of-two function.
There is a unique function φ : ω → ω with φ(0) = 1 and ∀i ∈ ω, φ(i + 1) = φ(i) + φ(i).


PROOF: Let S = {n ∈ ω; ∃′φ : n⁺ → ω, (φ(0) = 1 and ∀i ∈ n, φ(i + 1) = φ(i) + φ(i))}. For all n ∈ ω, let
F_n = {φ : n⁺ → ω; φ(0) = 1 and ∀i ∈ n, φ(i + 1) = φ(i) + φ(i)}. Then S = {n ∈ ω; #(F_n) = 1}.

#(F₀) = 1 because F₀ contains only the function φ : 1 = {0} → ω with φ(0) = 1. Assume that #(F_n) = 1
for some n ∈ ω. Then F_n contains a unique φ : n⁺ → ω such that φ(0) = 1 and ∀i ∈ n, φ(i + 1) = φ(i) + φ(i).
Define φ⁺ : (n⁺)⁺ = n + 2 → ω by φ⁺(n + 1) = φ(n) + φ(n) and ∀i ∈ n⁺, φ⁺(i) = φ(i). Then φ⁺ ∈ F_{n+1}.
The uniqueness of φ⁺ in F_{n+1} follows from the fact that the restrictions to n⁺ of any two functions in F_{n+1}
must be elements of F_n, for which uniqueness is assumed by inductive hypothesis, and the value of φ(n + 1) is
then uniquely determined by the value of φ(n). Therefore #(F_{n+1}) = 1. Thus 0 ∈ S and n + 1 ∈ S whenever
n ∈ S. So by Theorem 12.2.12, S = ω.

Now let φ = ⋃{ψ; ∃n ∈ ω, ψ ∈ F_n} = ⋃ ⋃_{n∈ω} F_n. Then φ : ω → ω is a well-defined function because
φ_n ⊆ φ_{n+1} for all n ∈ ω, where φ_n denotes the unique function in F_n. The uniqueness of φ : ω → ω
satisfying φ(0) = 1 and ∀i ∈ ω, φ(i + 1) = φ(i) + φ(i) follows easily by induction.


13.6.13 DEFINITION: The nth power of 2, for n ∈ ω, is the value of φ(n), where φ : ω → ω is the function
which satisfies φ(0) = 1 and ∀i ∈ ω, φ(i + 1) = φ(i) + φ(i).


13.6.14 NOTATION: 2ⁿ, for n ∈ ω, denotes the nth power of 2, as in Definition 13.6.13.


[www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




13.6.15 THEOREM: The power-of-two function is never less than its argument.
∀n ∈ ω, n ≤ 2ⁿ.


PROOF: Let S = {n ∈ ω; n ≤ 2ⁿ}. Then 0 ∈ S because 0 < 1, and 1 ∈ S because 1 = 2⁰ ≤ 2⁰ + 2⁰ = 2¹ by
Theorem 13.6.6 (vi). Suppose that n ∈ S and 1 ≤ n. Then n ≤ 2ⁿ, and 2ⁿ⁺¹ = 2ⁿ + 2ⁿ by Definition 13.6.13.
So by Theorem 13.6.6 (vi), n + 1 ≤ n + n ≤ n + 2ⁿ ≤ 2ⁿ + 2ⁿ = 2ⁿ⁺¹. Therefore n + 1 ∈ S. So S = ω by
Theorem 12.2.12. Hence n ≤ 2ⁿ for all n ∈ ω.
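
The recursion φ(0) = 1 and φ(i + 1) = φ(i) + φ(i), which defines the powers of 2 using addition only, can be written as a short loop. The Python sketch below is an informal illustration which assumes that Python integers may stand in for finite ordinal numbers. The function name power_of_two is hypothetical. The final line checks the inequality n ≤ 2ⁿ for a range of small n, which is of course no substitute for the inductive proof.

```python
def power_of_two(n):
    """Compute the nth power of 2 by repeated doubling, using addition only."""
    value = 1                       # phi(0) = 1
    for _ in range(n):
        value = value + value       # phi(i + 1) = phi(i) + phi(i)
    return value


print([power_of_two(n) for n in range(6)])
print(all(n <= power_of_two(n) for n in range(50)))
```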


13.7. Infinite and countably infinite sets 


13.7.1 REMARK: The untidiness of infinity concepts. 

Compared to the aesthetically pleasing simplicity and clarity of finite sets (if one ignores the serpentine 
development of some of the definitions and theorems), infinite sets are truly a dog's breakfast. It is difficult 
to feel any kind of security or confidence in some of the ideas in Sections 13.7, 13.8, 13.10 and 13.11. The 
simple-looking Definition 13.7.2 for infinite sets leads very quickly to a labyrinth of dubious classifications 
and implications where it is difficult to find a foothold. Unfortunately, any attempt to untangle the gordian 
knots of infinity concepts with magic axioms leads to a kind of undecidability denial. Some things just can't 
be decided. There is some short-term benefit in denying the non-existence of bijections when they are sorely 
wanted. But in the longer term, learning to live with undecidability is best. 


13.7.2 DEFINITION: An infinite set is a set which is not finite. 


13.7.3 REMARK: Pseudo-notation for infiniteness of sets. 
Notation 13.7.4 is a pseudo-notation, based on Notation 13.5.8, which means that a set is not finite. The 
apparent equality in this expression has very limited validity. 


13.7.4 NOTATION: #(S) = ∞, for a set S, means that S is an infinite set.


13.7.5 REMARK: The ambiguous boundary between finite and infinite. 

It cannot be proved within ZF set theory that every infinite set (according to Definition 13.7.2) includes
a subset which is equinumerous to the set of ordinal numbers ω. This is an extremely inconvenient fact.
Definition 13.7.2 says what an infinite set is not, not what it is. The existence of an infinite sequence of
distinct elements in an infinite set can be proved with the assistance of the axiom of countable choice, but
this is useless because axioms of choice never deliver concrete examples of the choice functions which are
claimed to exist.


One way to resolve this issue would be to simply define an “infinite set” to mean a set which includes a set
equinumerous to ω, but a non-standard meaning for a standard definition generally creates more problems
than it solves, in this case because there would still remain a mystery as to what lies in between the finite
and the infinite. (With the standard definition of “infinite”, the mystery relates to what lies between the
finite and the ω-infinite.)

Definition 13.7.6 introduces the non-standard class of “ω-infinite” sets, which is fairly close to an intuitive
idea of an infinite set, and also the standard countable and uncountable cardinality classes. (The relations
between these classes are illustrated in Figure 13.7.1 with rough indications of their meanings.)


By Theorem 13.10.6 (i), a ZF set is ω-infinite if and only if it is Dedekind-infinite. However, these concepts
have an important philosophical difference. The test for the ω-infinite class is comparative or “extrinsic”,
in the sense that the set must be shown to be equinumerous to a given “yardstick set” ω. The test for
the Dedekind-infinite class is intrinsic because it stipulates only the existence of a particular kind of set
endomorphism. (See Definition 10.5.21 for set endomorphisms.) Moore [371], page 24, says the following.
(See Definition 13.10.3 for Dedekind-finite sets.)

In 1882 Cantor still did not believe that a simple definition of finite set was necessary or even 

possible. Therefore he was surprised when Dedekind communicated his own definition [...], the 

first adequate definition of finite set which did not presuppose the natural numbers. 


13.7.6 DEFINITION: An ω-infinite set is a set S which satisfies ∃f : ω → S, f is injective.


A countably infinite set is a set S which satisfies ∃f : ω → S, f is bijective.


A countable set is a set S which satisfies ∃N ∈ ω⁺, ∃f : N → S, f is bijective.
In other words, a countable set is a set which is either finite or countably infinite.


An uncountable set is a set which is neither finite nor countably infinite.


An uncountable ω-infinite set is a set which is ω-infinite and uncountable.


Figure 13.7.1 Relations between ω-infinite, countable and uncountable cardinality classes


13.7.7 THEOREM: Some basic properties of infinite, countable and ω-infinite sets.
(i) ω is an infinite set.
(ii) Every ω-infinite set is infinite.
(iii) Every subset of a countable set is countable.
(iv) P(ω) is an uncountable ω-infinite set.
(v) A set X is ω-infinite if and only if #(X) ≥ #(ω).


PROOF: For part (i), suppose that ω is a finite set. Then there exists a bijection f : n → ω for some n ∈ ω
by Definition 13.5.2. Then f⁻¹ : ω → n is a bijection. So f⁻¹ : ω → n is an injection. But this is impossible
by Theorem 13.5.13 (iv). Therefore ω is not a finite set. Hence ω is an infinite set by Definition 13.7.2.

For part (ii), let S be an ω-infinite set. Suppose that S is finite. Then by Definition 13.5.2, there exists a
bijection g : n → S for some n ∈ ω. But by Definition 13.7.6, there exists an injection f : ω → S. Then
g⁻¹ ∘ f : ω → n is an injection. But this is impossible by Theorem 13.5.13 (iv). Therefore S is not a finite
set. Hence S is an infinite set by Definition 13.7.2.

Part (iii) follows from Theorem 12.4.8.

Part (iv) follows from Theorem 13.1.27 (v), Notation 13.1.11 and Definitions 13.1.9 and 13.7.6.

For part (v), it follows from Notation 13.1.11 and Definition 13.1.9 that #(X) ≥ #(ω) if and only if there
exists an injection from ω to X. Hence by Definition 13.7.6, #(X) ≥ #(ω) if and only if X is ω-infinite.


13.7.8 REMARK: Extending the finite set cardinality notation to countably infinite sets. 

Definition 13.5.5 and Notation 13.5.6 may now be extended from finite sets to include countably infinite sets 
as in Definition 13.7.9 and Notation 13.7.10. It is easy to verify that these definitions and notations are 
compatible with each other, and also with Notations 13.1.5 and 13.1.11. 


13.7.9 DEFINITION: The numerosity or cardinality of a countably infinite set S is the set ω.


13.7.10 NOTATION: #(S), for a countably infinite set S, denotes the numerosity (or cardinality) of S. In
other words, #(S) = ω if and only if S is a countably infinite set.


13.7.11 REMARK: Terminology for sets with subsets equinumerous to ω.

One may also write “ω-infinite” as “omega-infinite”. It might seem reasonable to refer to an ω-infinite set
as “countably infinite”, since the attribute which distinguishes such a set from a merely non-finite set is
the existence of a subset which can in some sense be counted. However, one uses the word “countable” to
indicate an upper bound on the cardinality, whereas in the case of an ω-infinite set, the intention is to place
a lower bound on the set’s cardinality.


13.7.12 REMARK: The difference between “countable” and “denumerable”. 

In much of the mathematics literature, the word “denumerable” is given the same meaning as “countably 
infinite”. This is yet another unfortunate case of a standard English word being given a mathematical 
meaning which seems to contradict its meaning in common usage. (The word “denumerable” is essentially
the same in many European languages because of its Latin origin.) One would expect “denumerable” to 
have the same meaning as “countable”. To avoid confusion, it is probably best to use the term “countably 
infinite”, even though it uses more space and contains an extra syllable. 


13.7.13 THEOREM: Some useful alternative tests for finiteness and countability of sets.
(i) A set S is finite if and only if ∃n ∈ ω, ∃f : S → n, f is injective.
(ii) A set S is finite if and only if ∃n ∈ ω, ∃f : n → S, f is surjective.
(iii) A set S is countable if and only if ∃f : S → ω, f is injective.
(iv) A set S is countable if and only if S = ∅ or ∃f : ω → S, f is surjective.


PROOF: For part (i), let S be a set which satisfies ∃n ∈ ω, ∃f : S → n, f is injective. Then there exists an
injective function g : S → n for some n ∈ ω. The set g(S) is a well-defined subset of ω. Suppose that g(S) = ∅.
(In other words, n = 0.) Then S = ∅. So the empty function from n to S with n = ∅ satisfies Definition 13.5.2
(because the empty function is a bijection from ∅ to ∅). Now suppose that g(S) ≠ ∅. Then n ≥ 1 and S ≠ ∅.
Define a function h : n → ω inductively by h(0) = min(g(S)) and h(i) = min{j ∈ g(S) ∪ (ω \ n); j > h(i − 1)}
for all i ∈ n \ {0}. This function is well defined because {j ∈ g(S) ∪ (ω \ n); j > h(i − 1)} is a well-defined
non-empty subset of ω if h(i − 1) is well defined. Let m = {k ∈ n; h(k) ∈ n}. Define φ : m → S by φ = g⁻¹ ∘ h.
Then m ∈ ω and φ is a bijection. Therefore S is finite by Definition 13.5.2. The converse follows immediately
from Definition 13.5.2.

For part (ii), let S be a set which satisfies ∃n ∈ ω, ∃f : n → S, f is surjective. Then there exists a surjective
function f : n → S for some n ∈ ω. Define g : S → ω by g(x) = min{i ∈ ω; f(i) = x} for all x ∈ S. This
is well defined because the set f⁻¹({x}) = {i ∈ ω; f(i) = x} is a non-empty subset of ω, which contains a
well-defined minimum because the standard order on ω is a well-ordering. To see that the function g is injective,
let x, y ∈ S satisfy g(x) = g(y). Then g(x) = min(f⁻¹({x})) = min(f⁻¹({y})) = g(y). So g(x) ∈ f⁻¹({y})
and g(y) ∈ f⁻¹({x}). Therefore f(g(x)) = y and f(g(y)) = x. So x = f(g(y)) = f(g(x)) = y. Hence g is
injective. But Range(g) ⊆ n. So g : S → n. Hence S is finite by part (i). The converse assertion follows
directly from Definition 13.5.2.

For part (iii), let S be a set which satisfies ∃f : S → ω, f is injective. Let Y = Range(f). Then Y ⊆ ω.
So there exists a bijection g : N → Y for some set N ∈ ω⁺ by Theorem 12.4.8. Then g⁻¹ ∘ f : S → N
is a bijection. Therefore S is a countable set by Definition 13.7.6. Now suppose that S is countable. Then
∃N ∈ ω⁺, ∃g : N → S, g is bijective by Definition 13.7.6. Let f = g⁻¹. If N = ω, then f : S → ω is a
bijection which satisfies the proposition “∃f : S → ω, f is injective”. Otherwise, N ∈ ω and f : S → N is
an injection which satisfies the proposition.

For part (iv), let S satisfy ∃f : ω → S, f is surjective. Let Y = {j ∈ ω; ∀i ∈ ω, (i < j ⇒ f(i) ≠ f(j))}.
Then f|_Y : Y → S is a bijection. There exists a bijection g : N → Y for some set N ∈ ω⁺ by Theorem 12.4.8.
Then f|_Y ∘ g : N → S is a bijection. Therefore S is a countable set by Definition 13.7.6. Now suppose that
S is countable. Then ∃N ∈ ω⁺, ∃g : N → S, g is bijective by Definition 13.7.6. If N = ∅, then S = ∅. So
assume that N ≠ ∅. Define f : ω → S by f(i) = g(i) for all i ∈ N, and f(i) = g(0) for all i ∈ ω \ N. Then f
is surjective.
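
The key step in parts (ii) and (iv) is to replace a surjection f by the map which sends each value x to the least index i with f(i) = x. The Python sketch below illustrates this for a finite surjection. It assumes that f is given as a list of length n, so that f[i] plays the role of f(i). The function name injection_from_surjection is hypothetical.

```python
def injection_from_surjection(f):
    """Return the dict g with g[x] = min{i : f[i] == x} for each value x of f."""
    g = {}
    for i, value in enumerate(f):
        g.setdefault(value, i)      # only the least index is kept for each value
    return g


f = ["a", "b", "a", "c", "b"]       # a surjection from 5 onto S = {a, b, c}
g = injection_from_surjection(f)
print(g)
```

The values of g are exactly the indices j for which f(i) ≠ f(j) whenever i < j. This is the set Y in the proof of part (iv), restricted to the finite case.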


13.7.14 REMARK: The relations between some definitions of “infinite”. 
Definition 13.7.6 means that an ω-infinite set is a set which contains a subset which is equinumerous to ω.
Every ω-infinite set is infinite. The converse follows with the assistance of the axiom of countable choice.
(See Definition 13.7.21 and Theorem 13.10.10.) It is not easy to construct a set which is infinite but not
ω-infinite. There are ZF models in which all infinite sets are ω-infinite, and other ZF models in which some
infinite sets are not ω-infinite. (See particularly Lévy [426] for the latter.)


The following table expresses some classes of finite and infinite sets in terms of Notations 10.5.25 and 13.5.11. 


class                 definition                    enumeration space
finite                ∃n ∈ ω, Bij(n, S) ≠ ∅         Enum(ω, S) ≠ ∅
infinite              ∀n ∈ ω, Bij(n, S) = ∅         Enum(ω, S) = ∅
ω-infinite            Inj(ω, S) ≠ ∅                 -
countably infinite    Bij(ω, S) ≠ ∅                 Enum({ω}, S) ≠ ∅
countable             ∃n ∈ ω⁺, Bij(n, S) ≠ ∅        Enum(ω⁺, S) ≠ ∅


The relations between these classes of finite and infinite sets are illustrated in Figure 13.7.2. 


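[Diagram: a set is either a finite set (∃n ∈ ω, Bij(n, S) ≠ ∅) or an infinite set (∀n ∈ ω, Bij(n, S) = ∅).
Within the infinite sets lie the ω-infinite sets (Inj(ω, S) ≠ ∅), and within those the countably infinite sets
(Bij(ω, S) ≠ ∅). The finite sets and the countably infinite sets together form the countable sets
(∃n ∈ ω⁺, Bij(n, S) ≠ ∅).]

Figure 13.7.2 Relations between classes of finite and infinite sets (in ZF set theory)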


13.7.15 REMARK: The need to be able to “produce” the maps which verify infinite cardinality. 
Merely knowing that a set S satisfies the proposition “∃n ∈ ω, ∃f : n → S, f is bijective” is not sufficient
for some practical purposes. Often one must be “in possession” of the number n and the map f if one wishes
to use the finiteness of the set in a proof or calculation.


This is a general ontological issue regarding propositions which contain existential quantifiers. It is not
enough to merely know that something exists. One must be able to produce an explicit example on demand.
One may think of a universal quantifier as meaning that “no matter what example you produce, I can show
that my assertion about it is true”. An existential quantifier, by contrast, may be thought of as meaning:
“If you require an explicit example, I can produce one for you.” This is a kind of constructivism. Or maybe
it is “productivism”, or “production on demand”.


13.7.16 REMARK: Cartesian products of finite and countable families of non-empty sets. 

The non-emptiness of Cartesian products of general families of non-empty sets requires the axiom of choice. 
(See Theorem 10.11.10.) For finite families, Zermelo-Fraenkel set theory suffices. For general countably 
infinite families, the axiom of countable choice is required. Of course, there are many explicit means of 
showing non-emptiness of Cartesian products of infinite set-families which are not too unusual. For example, 
if a well-ordering is defined on the union of the range of the set-family, an element of the Cartesian product 
may be chosen as the map from each index value to the minimum of the corresponding non-empty set. 
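
The well-ordering device can be sketched in Python. This is only an illustration under stated assumptions: the sets contain integers, so that the usual order of the integers plays the role of the well-ordering, and the names `choose_by_minimum` and `lazy_choice` are hypothetical.

```python
from typing import Callable, Dict, Hashable, Set


def choose_by_minimum(family: Dict[Hashable, Set[int]]) -> Dict[Hashable, int]:
    """Map each index to the minimum of its set (an element of the Cartesian product)."""
    for index, members in family.items():
        if not members:
            raise ValueError(f"the set at index {index!r} is empty")
    return {index: min(members) for index, members in family.items()}


def lazy_choice(family: Callable[[int], Set[int]]) -> Callable[[int], int]:
    """Same rule for a family indexed by all natural numbers, evaluated on demand."""
    return lambda index: min(family(index))


finite_family = {"a": {5, 3, 9}, "b": {7}, "c": {2, 8}}
finite_choice = choose_by_minimum(finite_family)

# Hypothetical infinite family: S_i = {i, i + 1, i + 2}.
infinite_choice = lazy_choice(lambda i: {i, i + 1, i + 2})
```

No choice axiom is mimicked here. The rule “take the minimum” is a single explicit definition which works for every index at once.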


13.7.17 THEOREM: Finite choice theorem for Cartesian products. 
The Cartesian product of a finite family of non-empty sets is non-empty. 


PROOF: Let $(S_i)_{i \in n}$ be a family of non-empty sets for some $n \in \omega$. If $n = 0$, then $\times_{i \in n} S_i = \{\emptyset\} \ne \emptyset$ by
Theorem 10.11.6 (i). Assume that $\times_{i \in k} S_i \ne \emptyset$ for some $k \in n$. Then $\{f : k \to \bigcup_{i \in k} S_i;\ \forall i \in k,\ f_i \in S_i\} \ne \emptyset$
by Definition 10.11.2. So there exists a function $f : k \to \bigcup_{i \in k} S_i$ such that $\forall i \in k$, $f_i \in S_i$. But $k + 1 \subseteq n$.
So $S_k \ne \emptyset$. So $\exists x$, $x \in S_k$. Then $f' = f \cup \{(k, x)\}$ is a function $f' : k + 1 \to \bigcup_{i \in k+1} S_i$ which satisfies
$\forall i \in k + 1$, $f'_i \in S_i$. So $f' \in \times_{i \in k+1} S_i$. Therefore $\times_{i \in k+1} S_i \ne \emptyset$. Hence by induction, $\times_{i \in n} S_i \ne \emptyset$.
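
The induction in this proof can be traced in a few lines of Python. This is a sketch, not part of the formal development: the function name `choice_by_induction` is hypothetical, and a Python dictionary stands in for the function $f$.

```python
from typing import Dict, List, Set


def choice_by_induction(sets: List[Set[str]]) -> Dict[int, str]:
    """Build f with f[i] in sets[i], extending f one index at a time."""
    f: Dict[int, str] = {}              # case n = 0: the empty function
    for k, s_k in enumerate(sets):      # inductive step from k to k + 1
        if not s_k:
            raise ValueError(f"S_{k} is empty")
        x = next(iter(s_k))             # some x in S_k, which is non-empty
        f = {**f, k: x}                 # f' = f ∪ {(k, x)}
    return f


sets = [{"p", "q"}, {"r"}, {"s", "t", "u"}]
f = choice_by_induction(sets)
```

Only finitely many extension steps are performed, which is why no choice axiom is needed for a finite family.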


13.7.18 REMARK: Finite choice theorem with the same form as the axiom of choice. 

Theorem 13.7.19 asserts that the axiom of choice on line (7.11.1) in Definition 7.11.10 is a valid theorem 
of ZF set theory if the collection of sets X is finite. (This theorem is also proved by Lévy [368], page 160; 
Whitehead/Russell, Volume II [401], page 229.) So this could be called an “axiom of finite choice”, although
it is a theorem, not an axiom. If X is countably infinite, the assertion does not follow from the ZF axioms.
Consequently an “axiom of countable choice” must be stated as a genuine axiom, as in Definition 13.7.21.


13.7.19 THEOREM: Finite choice theorem for pairwise-disjoint collections of sets. 
Let X be a finite set. Then 


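$$(\emptyset \notin X \text{ and } \forall A, B \in X,\ (A = B \text{ or } A \cap B = \emptyset)) \;\Rightarrow\; (\exists C,\ \forall A \in X,\ \exists' z \in C,\ z \in A).$$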


PROOF: Let $X$ be a finite set which satisfies $\emptyset \notin X$ and $\forall A, B \in X$, $(A = B$ or $A \cap B = \emptyset)$. In other
words, let $X$ be a pairwise-disjoint finite collection of non-empty sets. Then by Theorem 13.5.3, there exists
a bijection $S : n \to X$ for some $n \in \omega$. Thus $S = (S_i)_{i \in n}$ is a finite family of non-empty sets because $\emptyset \notin X$.
So by Theorem 13.7.17, $\times_{i \in n} S_i \ne \emptyset$. This means that there exists a function $f : n \to \bigcup_{i \in n} S_i$ such that
$f_i \in S_i$ for all $i \in n$. (See Definition 10.11.2.)


Let $C = \mathrm{Range}(f) = \{f_i;\ i \in n\}$. Let $A \in X$. Then $A = S_i$ for some unique $i \in n$ because $S : n \to X$ is
a bijection. So $z = f_i$ satisfies $z \in C$ and $z \in S_i = A$. Consequently $\exists C$, $\forall A \in X$, $\exists z \in C$, $z \in A$. To show the
uniqueness of $z$, let $y \in C$ with $y \ne z$ satisfy $y \in A$. Then $y = f_j$ for some $j \in n$ by the definition of $C$. So $j \ne i$
because $f$ is a function. But $f_j \in S_j$ and $S_j \ne S_i$ because $S$ is a bijection. So $S_i \cap S_j = \emptyset$ because $X$ is a pairwise
disjoint collection of sets. Therefore $y = f_j \notin S_i = A$, which contradicts $y \in A$. So $y = z$. Hence $z$ is unique.
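
As a concrete check of the statement, the following Python sketch builds a set $C$ for a small pairwise disjoint collection and confirms that $C$ meets each member in exactly one element. The name `choice_set` is hypothetical, the members are assumed to be sets of integers, and the choice $f_i = \min(S_i)$ is one convenient selection, not the one used in the proof.

```python
from typing import FrozenSet, Set


def choice_set(collection: Set[FrozenSet[int]]) -> Set[int]:
    """Return C containing exactly one element of each member of the collection."""
    members = sorted(collection, key=sorted)   # an explicit listing S_0, ..., S_{n-1}
    for a in members:
        if not a:
            raise ValueError("the collection contains the empty set")
    for index, a in enumerate(members):
        for b in members[index + 1:]:
            if a & b:
                raise ValueError("the collection is not pairwise disjoint")
    return {min(a) for a in members}           # C = Range(f) with f_i = min(S_i)


X = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})}
C = choice_set(X)
```

Pairwise disjointness is what makes the intersection of $C$ with each member a singleton, as in the uniqueness argument above.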




13.7.20 REMARK: The countable axiom of choice. 

The apparently very basic Theorem 13.7.19 may seem to belong together with the basic set theory definitions 
in Chapter 7. Its much later appearance is an inevitable consequence of the need to define concepts before 
they are used. Finiteness is a fairly high-level concept, not part of the ZF axiom framework. 


The unfundamental character of the “finite choice theorem” also exposes how unfundamental the countable
choice axiom is. One would usually define all axioms up front, and then start developing the theory from
there. But the countable choice axiom requires a fair amount of theoretical development before countability
of sets can be defined as in Definition 13.7.6.


A not-too-complicated set-theoretic expression for elements of $\omega^+$ is given in Remark 12.1.6 line (12.1.2).
This can be combined with Definition 13.7.6 to give meaning to a “countable set” in Definition 13.7.21.
Clearly, however, this does add another two lines to the two-line set-theoretic expression for the countable
choice axiom in Definition 13.7.21. (See Remark 14.2.8 for an equivalent countable choice axiom which does
not make any use of ordinal numbers.)


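13.7.21 DEFINITION [MM]: Zermelo-Fraenkel set theory with countable axiom of choice is Zermelo-Fraenkel
set theory as in Definition 7.2.4, together with the additional Axiom (9′).


(9′) The countable choice axiom: For every countable set $X$,


$$\bigl((\forall A,\ (A \in X \Rightarrow \exists x,\ x \in A)) \wedge (\forall A,\ \forall B,\ ((A \in X \wedge B \in X) \Rightarrow (A = B \vee \neg \exists x,\ (x \in A \wedge x \in B))))\bigr)$$
$$\Rightarrow\; \exists C,\ \forall A,\ (A \in X \Rightarrow \exists z,\ (z \in A \wedge z \in C \wedge \forall y,\ ((y \in A \wedge y \in C) \Rightarrow y = z))).$$


In other words, for every countable set $X$,


$$(\emptyset \notin X \text{ and } \forall A, B \in X,\ (A = B \text{ or } A \cap B = \emptyset)) \;\Rightarrow\; (\exists C,\ \forall A \in X,\ \exists' z \in C,\ z \in A). \qquad (13.7.1)$$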


In other words, every countable pairwise disjoint collection of non-empty sets is choosable. 


13.7.22 THEOREM [ZF+CC]: The “countable multiplicative axiom”.
The Cartesian product of a countable family of non-empty sets is non-empty.


PROOF: The assertion may be proved as for Theorem 10.11.10, with the difference that the family of
non-empty sets is required to be countable.


13.7.23 REMARK: Application of Cantor diagonalisation to the set of constructible real numbers. 
Since model theory is outside the scope of this book, only some brief informal comments can be made here
on the applicability of the Cantor diagonalisation procedure in Theorem 13.1.27 (i) to the set of constructible
real numbers. (The set of real numbers is identified here with $\mathbb{P}(\omega)$ to avoid minor technicalities.)


The cardinality of the set of constructible real numbers is related to Skolem’s paradox. (See Remark 7.12.6 for
some literature references for Skolem’s paradox.) It is Cantor’s diagonalisation procedure which demonstrates
that $\mathbb{P}(\omega)$ is uncountable in the ZF language. One might reasonably ask why this procedure does not also
demonstrate uncountability for the interpretation of $\mathbb{P}(\omega)$ within a countable constructible model for ZF.
(See Remark 12.6.8 for some literature references for constructible ZF universes.)


The universe $L$ of constructible ZF sets is generally held to be countable. Therefore the set $Y = \mathbb{P}(\omega) \cap L$
must also be countable by Theorem 12.4.8. So by Theorem 13.7.13 (iv), there must be a surjection $f : \omega \to Y$.
Then $f$ will be a function from $\omega$ to $\mathbb{P}(\omega)$. Let $y = \{i \in \omega;\ i \notin f(i)\}$. Then $y \in \mathbb{P}(\omega) \setminus \mathrm{Range}(f)$, as argued
in the proof of Theorem 13.1.27 (i). It follows that $y \notin Y$. So $y \in \mathbb{P}(\omega) \setminus L$. Thus no contradiction is
obtained by asserting that $Y$ is countable. The diagonalisation procedure yields a set which represents a
non-constructible real number. This does not contradict the countability of $\mathbb{P}(\omega) \cap L$.
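
The diagonal set can be computed explicitly in a finite analogue. In the following Python sketch, which is only an illustration, a list `f` of subsets of `range(len(f))` stands in for the map $f : \omega \to \mathbb{P}(\omega)$, and the name `diagonal_set` is hypothetical.

```python
from typing import FrozenSet, List


def diagonal_set(f: List[FrozenSet[int]]) -> FrozenSet[int]:
    """Return y = {i ; i not in f(i)} for a finite list f of subsets of range(len(f))."""
    return frozenset(i for i in range(len(f)) if i not in f[i])


f = [frozenset({0, 1}), frozenset(), frozenset({2}), frozenset({0, 3})]
y = diagonal_set(f)
```

For each index `i`, the sets `y` and `f[i]` disagree about the membership of `i`. So `y` is not in the range of `f`, whatever list was supplied.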


The expression for $y$ is certainly a ZF “construction”. So $L$ is not closed under all ZF constructions. When
ZF constructions are applied to sets in particular stages in the cumulative hierarchy which defines $L$, the
“outputs” from such constructions lie in the next stage, but this does not imply that ZF constructions applied
to elements of the limit set $L$ must be constructible. The set $y$ is constructed from the limit set $\mathbb{P}(\omega) \cap L$ of
all constructible elements of $\mathbb{P}(\omega)$, not from a particular constructible set in $L$, which would be an element
of a particular stage of the construction of $L$. Any ZF construction applied to a single element of $L$ will yield
another element of $L$, but $L$ is not an element of $L$. So it is perfectly possible that $\mathbb{P}(\omega) \cap L$ is not an element
of $L$. In the situation described here, a ZF construction applied to $f : \omega \to \mathbb{P}(\omega) \cap L$ yields an element of
$\mathbb{P}(\omega) \setminus L$, which is not constructible. This prevents the diagonalisation procedure from demonstrating the
uncountability of $\mathbb{P}(\omega) \cap L$ within a constructible ZF model, and there is no logical contradiction between the
countability of the constructible real numbers and the uncountability of the real numbers in ZF set theory.


13.8. Countable unions of finite sets 


13.8.1 REMARK: The difference between implicit and explicit existence. 

It seems plausible that the tendency of mathematicians to apply axioms of choice without being aware of 
doing so may be attributed to the confusion between implicit and explicit existence. For example, in 1970, 
Willard [165], page 8, gave a proof that "the union of countably many countable sets is countable", a theorem 
which requires a choice axiom which is slightly weaker than the countable choice axiom. (See for example 
Howard/Rubin [362], page 85, form 10 A.) Then on page 9, he stated that “it is customary to mention the
axiom of choice whenever it is used.” But there is no mention on the previous page that it is required.


Theorem 13.8.5 requires enumeration maps for a countable family of sets to be given explicitly as a single
function $\phi$. Most often when one thinks of a countable family of finite sets, one has in mind finite sets which
have some known concrete structure which is easily enumerated without choice functions. If one only knows 
that each set in a family is finite, one only knows that at least one suitable enumeration exists, without 
knowing how to choose that enumeration from amongst all possible enumerations. Therefore some kind of 
choice function is required. 


Thus the mental concept of a countable family of finite sets may include explicit enumeration maps, or 
else the structures on the sets may permit them to be explicitly enumerated without difficulty. Therefore 
the often-mentioned naturality and intuitive basis of axioms of choice may in fact be illusions stemming
from confusion between implicit and explicit families of enumerations. In typical experience, the majority 
of set families are easily provided with enumeration functions, but generalisation from the familiar to the 
unfamiliar is not always justified. 


If the axiom of choice is required in pure mathematics only because of the failure to consciously distinguish 
implicit from explicit existence assumptions, then one could almost say that the axiom of choice issue is 
psychological rather than strictly logical. Invocation of the axiom of choice converts an individual existence
assertion for an element of each set in a family to an explicit assertion of the existence of a family of individual
element choices. This may be simply making the mathematician’s unconscious assumption conscious.


The difference between implicit and explicit existence is very closely related to the “quantifier swapping” 
concept which is described in Remark 45.2.1. Another reason to believe that the axiom of choice is a non- 
issue is the “shim theorem” concept described in Remark 13.8.14. On the other hand, in some situations, 
choice functions remain permanently in the unconscious because there are no constraints on a class of 
problems which can produce them. For example, well-orderings of the real numbers will remain eternally 
unconscious. If one restricts one’s attention to those real number systems for which one can provide an 
explicit well-ordering, that class of real number systems will be empty. 


13.8.2 THEOREM: Inheritance of countability by subsets. 
Every subset of a countable set is countable. 


PROOF: Let $S$ be a countable set. Then there exists a bijection $f : N \to S$ for some $N \in \omega^+ = \omega \cup \{\omega\}$
by Definition 13.7.6. Let $T$ be a subset of $S$. Let $Y = f^{-1}(T)$, and let $g = f|_Y$ be the restriction of $f$ to $Y$.
Then $g : Y \to T$ is a bijection. But $Y \subseteq \omega$ because $N \in \omega^+$ and so $N \subseteq \omega$ by Theorem 12.2.20 (ii). It then
follows from Theorem 12.4.8 that there exists a bijection $h : X \to Y$ for some $X \in \omega^+$. Let $\phi = g \circ h$. Then
$\phi : X \to T$ is a bijection by Theorem 10.5.6 (iii). Hence $T$ is countable by Definition 13.7.6.


13.8.3 REMARK: The union of an explicit countably infinite family of finite sets is countable. 

It is fairly simple to show that the union of an explicit countably infinite family of finite sets is countable by
first defining a natural injection from the union of the family to the set $\omega \times \omega$ (e.g. by mapping each finite set
to a row of the doubly infinite array), and then exploiting an explicit bijection between $\omega$ and $\omega \times \omega$ such as
is presented in Theorem 13.9.2. This shows that there exists a surjection from a subset of $\omega$ to the union of
the countably infinite family. From this, it is possible to deduce that the union is either finite or countably
infinite. However, there is no need to use such abstract methods when concrete methods are available. The
proof of Theorem 13.8.5 inductively constructs an explicit bijection for a given countably infinite family of
finite sets.


13.8.4 DEFINITION: An explicit countably infinite family of finite sets is a family of sets $S = (S_i)_{i \in \omega}$ such
that there exist a function $n : \omega \to \omega$ and a function $\phi : \omega \to \bigcup_{i \in \omega} \mathrm{Bij}(n_i, S_i)$ such that $\phi_i : n_i \to S_i$ is a bijection
for all $i \in \omega$.


13.8.5 THEOREM: The union of an explicit countably infinite family of finite sets is countable.
Let $S$ be an explicit countably infinite family of finite sets. Then $\bigcup \mathrm{Range}(S)$ is a finite or countably infinite
set. (In other words, $\mathrm{Enum}(\omega^+, \bigcup \mathrm{Range}(S)) \ne \emptyset$.)


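PROOF: Let $S = (S_i)_{i \in \omega}$ be an explicit countably infinite family of finite sets. Then there are functions
$n : \omega \to \omega$ and $\phi : \omega \to \bigcup_{i \in \omega} \mathrm{Bij}(n_i, S_i)$ such that $\phi_i : n_i \to S_i$ is a bijection for all $i \in \omega$. Inductively define functions
$I : \omega \to \omega^+$, $J : \omega \to \omega^+$ and $g : \omega \to (\omega \mathrel{\mathring{\to}} \bigcup \mathrm{Range}(S))$ (so that each $g_k$ is a partial function on $\omega$) by $g(0) = \emptyset$ with the recursion rules:


$$\forall k \in \omega,\quad I(k) = \min\{i \in \omega;\ \exists j \in n_i,\ \phi_i(j) \notin \mathrm{Range}(g_k)\}$$
$$\forall k \in \omega,\quad J(k) = \begin{cases} \min\{j \in n_{I(k)};\ \phi_{I(k)}(j) \notin \mathrm{Range}(g_k)\} & \text{if } I(k) \ne \omega \\ \omega & \text{if } I(k) = \omega \end{cases}$$
$$\forall k \in \omega,\quad g_{k+1} = \begin{cases} g_k \cup \{(k, \phi_{I(k)}(J(k)))\} & \text{if } I(k) \ne \omega \\ g_k & \text{if } I(k) = \omega, \end{cases}$$


where the function $\min : \mathbb{P}(\omega) \to \omega^+$ is defined by


$$\forall A \in \mathbb{P}(\omega),\quad \min(A) = \begin{cases} \bigcap A & \text{if } A \ne \emptyset \\ \omega & \text{if } A = \emptyset. \end{cases}$$


Let $f = \bigcup_{k \in \omega} g_k$ and $N = \mathrm{Dom}(f)$. Then $N \in \omega^+$, and it is easily verified that $f : N \to \bigcup \mathrm{Range}(S)$ is a
bijection.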
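
The recursion in this proof can be imitated by a Python generator. This is a sketch under stated assumptions: the callables `n` and `phi` are hypothetical stand-ins for the functions $n$ and $\phi$, with `phi(i, j)` meaning $\phi_i(j)$, and the set `seen` plays the role of $\mathrm{Range}(g_k)$.

```python
from itertools import count, islice
from typing import Callable, Hashable, Iterator


def enumerate_union(n: Callable[[int], int],
                    phi: Callable[[int, int], Hashable]) -> Iterator[Hashable]:
    """Yield f(0), f(1), ... for the union of the sets S_i, without repetition.

    S_i is assumed to be {phi(i, j) ; j < n(i)}.
    """
    seen = set()                       # plays the role of Range(g_k)
    for i in count():                  # indices i in increasing order
        for j in range(n(i)):          # positions j in increasing order
            x = phi(i, j)
            if x not in seen:          # (i, j) is the pair (I(k), J(k)) for the next k
                seen.add(x)
                yield x


# Hypothetical example family: S_i = {i*i, i*i + 1}, so S_0 and S_1 overlap.
first = list(islice(enumerate_union(lambda i: 2, lambda i, j: i * i + j), 7))
```

If the union is finite, the generator yields its last value and then searches forever, which corresponds to the case $I(k) = \omega$ in the recursion. A program cannot in general detect that case, whereas the set-theoretic definition simply records it.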


13.8.6 REMARK: Theorems about countable unions of finite sets.
Theorem 13.8.7 is an occasionally useful corollary of Theorem 13.8.5. (It is used to prove Theorem 33.4.19.)
The total order guarantees the existence of an explicit bijection for each set in the family.


13.8.7 THEOREM: Countability of a countable union of finite subsets of a totally ordered set.
The union of a countable family of finite subsets of a totally ordered set is countable.


PROOF: Let $X$ be a totally ordered set with order “<”. Let $S : \omega \to \mathbb{P}(X)$ be such that $S_i$ is finite for
all $i \in \omega$. Then there exists a unique function $n : \omega \to \omega$ such that there exists a bijection from $n_i$ to $S_i$ for
all $i \in \omega$. From amongst the possible bijections for each $i \in \omega$, “choose” the unique bijection $\phi_i : n_i \to S_i$
which preserves order. In other words, $\forall i \in \omega$, $\forall j, k \in n_i$, $(j < k \Leftrightarrow \phi_i(j) < \phi_i(k))$. (Thus $\phi_i(0) = \min(S_i)$,
$\phi_i(1) = \min(S_i \setminus \{\min(S_i)\})$, and so forth by induction.) Then $S$ is an explicit countably infinite family of
finite sets by Definition 13.8.4. Hence $\bigcup \mathrm{Range}(S)$ is a countable set by Theorem 13.8.5.
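
The order-preserving bijection in this proof is obtained by repeatedly removing the minimum. A minimal Python sketch follows, assuming that the elements are integers with their usual order. The name `order_enumeration` is hypothetical.

```python
from typing import List, Set


def order_enumeration(finite_set: Set[int]) -> List[int]:
    """Return [phi(0), phi(1), ...] for the order-preserving bijection phi onto finite_set."""
    remaining = set(finite_set)
    values: List[int] = []
    while remaining:
        m = min(remaining)      # phi(j) = min(S \ {phi(0), ..., phi(j-1)})
        values.append(m)
        remaining.remove(m)
    return values


values = order_enumeration({7, 2, 5})
```

The result is determined by the set and the order alone, so the same rule serves every set in the family without any choice being made.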


13.8.8 REMARK:  Weaker conditions to obtain countable union of finite sets theorem. 

Whereas Theorem 13.8.5 requires an explicit countably infinite family of sets in order to avoid invoking 
a choice axiom, Theorem 13.8.7 requires the target set (the union of the range of the set map) to be 
totally ordered, which is a stronger requirement. (The countable choice axiom version of the theorem is 
Theorem 13.8.13.) The morbidly curious reader might like to know that the result can be obtained from 
a choice axiom, denoted $C(\omega, {<}\omega)$, which is weaker than either the countable choice axiom or the total
order requirement. The relations between these requirements are illustrated in Figure 13.8.1. (A much more 
detailed diagram is given by Moore [371], page 324.) 


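[Diagram relating the following statements, with Howard/Rubin form numbers in square brackets:
  axiom of choice
  linear ordering, LO [30]
  choice of WO sets, $C(\infty, \mathrm{WO})$ [60]
  countable choice, $C(\omega, \infty)$ [8]
  choice(finite), $C(\infty, {<}\omega)$ [62]
  countable(countable), $C(\omega, \omega)$ [32]
  finite = Dedekind finite, I-finite = IV-finite [9]
  choice(finite), $\forall n \ge 2$, $C(\infty, n)$ [61]
  $C(\omega, {<}\omega)$ [10], $\mathrm{UT}(\omega, {<}\omega, \omega)$ [10A]]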


Figure 13.8.1 Implications and impossible implications between some ZF choice axioms 


The arrows in Figure 13.8.1 mean that the implication can be proved in ZF set theory. The crossed arrows 
mean that there is a known ZF model in which the implication is false. (The numbers in square brackets 
are the “form” numbers in Howard/Rubin [362], pages 9-64.) The notation “$\mathrm{UT}(\omega, {<}\omega, \omega)$” means that the
union of a countable family of finite sets is countable. (This is essentially the same as Theorem 13.8.13.)


It can be seen in Figure 13.8.1 that form 30, the existence of total (i.e. linear) orderings on all sets, implies
form 10, the choice axiom $C(\omega, {<}\omega)$, which means that there exists a choice function for any countable
family of finite sets. But this is equivalent to $\mathrm{UT}(\omega, {<}\omega, \omega)$ (form 10A), which is the desired theorem. The
diagram also shows that the countable choice axiom (CC), denoted here as “$C(\omega, \infty)$” (form 8), implies the
required weaker choice axiom $C(\omega, {<}\omega)$. (This topic is also mentioned in Remark 33.4.20.)


The reader who desires further gory details may consult Howard/Rubin [362]. One point of possible further 
interest is that their “form 14” implies “form 49”, which implies “form 10”. Thus Theorem 13.8.13 can be 
proved in ZF with the additional axiom known as the “boolean prime ideal theorem” (form 14), which states 
that every boolean algebra has a prime ideal. This subject is, luckily, outside the scope of this book. (See 
Remark 1.6.3 item (2) for this scope constraint.) 


13.8.9 REMARK: Implicit versus explicit countable families of finite sets. “Quantifier swapping.” 

A countably infinite family of finite sets has the form $S : \omega \to \mathbb{P}(X)$ for some map $S$ and set $X$. (By the
replacement axiom, the set $X$ can always be chosen as $\bigcup \mathrm{Range}(S)$, which is necessarily a ZF set.) The finite
set condition in the implicit case in Theorem 13.8.13 may be written as


$$\forall i \in \omega,\ \exists n \in \omega,\ \mathrm{Bij}(n, S_i) \ne \emptyset. \qquad (13.8.1)$$


In terms of Notation 13.5.11, this condition may be written as $\forall i \in \omega$, $\mathrm{Enum}(\omega, S_i) \ne \emptyset$. The explicit case
in Definition 13.8.4 and Theorem 13.8.5 may be written as


$$\exists n : \omega \to \omega,\ \exists \phi : \omega \to \bigcup_{i \in \omega} \mathrm{Bij}(n_i, S_i),\ \forall i \in \omega,\ \phi_i \in \mathrm{Bij}(n_i, S_i). \qquad (13.8.2)$$


Thus the implicit form in line (13.8.1) specifies only that the set of bijections $\mathrm{Bij}(n, S_i)$ is non-empty, whereas
the explicit form in line (13.8.2) makes a choice $\phi_i$ for the bijection in each set of bijections $\mathrm{Bij}(n_i, S_i)$.


In a very rough way, one may say that the provision of a choice function enables the quantifiers “$\forall i$” and “$\exists n$”
to be swapped. When a choice function is available, one may write “$\exists n, \forall i$” instead of “$\forall i, \exists n$”. Slightly
more precisely, a proposition of the form “$\forall i, \exists n, \exists \phi, \phi \in \mathrm{Bij}(n, S_i)$” can be replaced with a proposition of
the form “$\exists n, \exists \phi, \forall i, \phi_i \in \mathrm{Bij}(n_i, S_i)$”. Thus the existential variables “$n$” and “$\phi$” are replaced by explicit
choice functions “$n_i$” and “$\phi_i$” when the quantifiers are swapped. (See Theorem 10.5.17 (ii) for justification
of quantifier swapping as an application of the axiom of choice.)


13.8.10 REMARK: Choice functions and cross-sections of fibrations. 

There is a similarity between the choice function $\phi$ in Remark 13.8.9 line (13.8.2) and the concept of a
cross-section of a non-uniform fibration in Definition 21.1.2. It is perhaps a cruel irony that the choice axiom
issue arises because of the existence of multiple choices for an infinite number of index values, whereas the
problem does not occur when there is only one choice for each index value. One would intuitively expect
that with more choices, the existence of an enumeration of $\bigcup \mathrm{Range}(S)$ would be even easier to demonstrate.


13.8.11 REMARK: The union of a countably infinite family of finite sets is countable. 

Some kind of axiom of countable choice is required to be added to the ZF axioms in order to prove that the
union of a countably infinite family of finite sets is countable. A succinct and definitive account of this issue
is given by Howard/Rubin [362], pages 19-20. (See under “Form 10”.) Theorem 13.8.13 is in fact equivalent
(modulo the ZF axioms) to the axiom that every countably infinite family of non-empty finite sets has a
choice function. (This is weaker than the full axiom of countable choice.)


It is assumed in Theorem 13.8.13 that all of the sets in the family are subsets of a single given set. This is 
unnecessary because the ZF replacement axiom guarantees that the union of the range of the family is a set. 
(See Section 7.7 for the replacement axiom.) 


13.8.12 THEOREM [ZF+CC]: Every countably infinite family of finite sets is an explicit family. 
Let S be a countably infinite family of finite sets. Then S is an explicit countably infinite family of finite 
sets. 


PROOF: Let $S$ be a countably infinite family of finite sets. Then as discussed in Remark 13.8.9, there is a
set $X$ such that $S : \omega \to \mathbb{P}(X)$, and $\forall i \in \omega$, $\exists n \in \omega$, $\mathrm{Bij}(n, S_i) \ne \emptyset$. There is a unique function $n : \omega \to \omega$
such that $\forall i \in \omega$, $\mathrm{Bij}(n_i, S_i) \ne \emptyset$ because $\mathrm{Bij}(n, S_i)$ can be non-empty for at most one value of $n \in \omega$. Then
by the axiom of countable choice, there is a map $\phi : \omega \to \bigcup_{i \in \omega} \mathrm{Bij}(n_i, S_i)$ such that $\forall i \in \omega$, $\phi_i \in \mathrm{Bij}(n_i, S_i)$.
Hence $S$ is an explicit countably infinite family of finite sets by Definition 13.8.4.


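13.8.13 THEOREM [ZF+CC]: A countable union of finite sets is countable.
For some set $X$, let $S : \omega \to \mathbb{P}(X)$ be such that $S(i)$ is a finite set for all $i \in \omega$. Then $\bigcup \mathrm{Range}(S)$ is a finite
or countably infinite set. In other words, for any set $X$, for any $S : \omega \to \mathbb{P}(X)$,


$$(\forall i \in \omega,\ \exists n \in \omega,\ \exists g : n \to S(i),\ g \text{ is a bijection}) \;\Rightarrow\; (\exists m \in \omega^+,\ \exists h : m \to \bigcup \mathrm{Range}(S),\ h \text{ is a bijection}).$$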


In other words, a countable union of finite sets is countable. 


PROOF: The result follows from Theorems 13.8.5 and 13.8.12. 


13.8.14 REMARK: Insulation of ZF theorems from ZF+AC theorems. 

The way in which the axiom of countable choice is invoked for Theorems 13.8.12 and 13.8.13 demonstrates 
a general pattern which may be applied to isolate and insulate ZF theorems from ZF+AC theorems. First 
an AC-free theorem is proved, such as Theorem 13.8.5, which is where the real work is done. Any required 
choice functions must be explicitly provided by the user of an AC-free theorem, and the proof of the theorem
is written in terms of explicitly provided choice functions. Then an AC-invoking “shim theorem”, such as
Theorem 13.8.12, makes the claim that some required choice function will be available with the assistance of a 
suitable choice axiom. Finally, the AC-free and AC-invoking theorems can be joined, as in Theorem 13.8.13, 
to give the same result as the AC-free theorem, but using an axiom-provided choice function instead of a 
user-provided choice function. (This is illustrated in Figure 13.8.2.) 
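
The same split can be imitated in software, which may make the pattern easier to see. The following Python sketch is only an analogy: the names `product_element` and `product_element_front_end` are hypothetical, and no program can play the part of a choice axiom, so the front end can do no more than forward a choice function which somebody has supplied.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Optional

Family = Dict[Hashable, FrozenSet[int]]
Chooser = Callable[[FrozenSet[int]], int]


def product_element(family: Family, choose: Chooser) -> Dict[Hashable, int]:
    """Choice-free core: the caller supplies the choice function explicitly."""
    element = {index: choose(members) for index, members in family.items()}
    if any(element[index] not in family[index] for index in family):
        raise ValueError("the supplied function does not choose from each set")
    return element


def product_element_front_end(family: Family,
                              choose: Optional[Chooser] = None) -> Dict[Hashable, int]:
    """Front end: forwards to the core; a choice function cannot be conjured here."""
    if choose is None:
        raise NotImplementedError("no choice function was provided")
    return product_element(family, choose)


family = {0: frozenset({4, 2}), 1: frozenset({9}), 2: frozenset({6, 5, 7})}
element = product_element_front_end(family, choose=min)
```

All of the work is done in `product_element`. The front end adds nothing except the claim that a choice function will be available.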


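[Diagram, two panels.
  ZF set theory: the inputs “ZF axioms”, “input conditions” and “choice function” feed a “ZF theorem”, which yields the “output assertion”.
  ZF+AC set theory: the inputs “ZF axioms”, “input conditions” and “choice axiom” feed a “ZF+AC theorem (shim)”, which yields a “choice function”. This choice function, together with the ZF axioms and the input conditions, feeds the same “ZF theorem”, which yields the “output assertion”.]

Figure 13.8.2 ZF+AC theorem as a “front end” for a ZF theorem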


In reality, choice axioms never provide anything at all, but most mathematicians currently invoke choice 
axioms as if they were a valid part of mathematics. By splitting theorems into AC-free and AC-invoking parts, 
the same theorems are obtained, but a “clean interface” (to use the engineering term) is created between
ZF and ZF+AC theorems so that mathematicians who prefer a more constructive style of mathematics can 
avoid the most non-constructive theorems. 


Even if ZF and ZF+AC theorems are not split in this way, one may regard all AC-invoking theorems as
being “virtually split”. In other words, one may interpret all AC-invoking theorems as having an invisible
extra condition which states that the user must provide whatever choice function is required, and an invisible
“shim theorem” which invokes an AC axiom to provide the choice function. When an AC axiom is invoked
in a proof with language such as “by the axiom of choice, X is true”, one may replace this with “using
the provided choice function, X is true”. Each invocation of an AC axiom creates an implicit IOU which
must be paid by the user of the theorem. Such invocations are “place-holders” which must be filled with
user-provided choice functions. Thus choice axiom invocation becomes a matter of linguistic style which does not
materially affect the structure of mathematical proofs. Belief in choice axioms then becomes a matter of
personal interpretation and faith. Non-believers understand that choice functions must really be provided.


Further clear examples of “shim theorems” are Theorems 23.5.9 and 23.5.10, which are “front ends” for
Theorems 23.5.3 and 23.5.7 respectively.


13.8.15 REMARK: Tooth fairies, Father Christmas, the Easter Bunny, and the axiom of choice. 

The axiom of choice is comparable to the “tooth fairy” myth. Parents used to tell young children in Britain
in the olden days that when they lost a tooth, they could put it under their pillow at night and a tooth 
fairy would replace the tooth with a sixpence. Of course, with the benefit of hindsight, it was the parents 
who had to remove the tooth and put a sixpence under the pillow while the child slept. No tooth fairy ever 
delivered a sixpence. And no axiom of choice ever delivered a choice function. 


Similarly, Father Christmas supposedly put presents under the Christmas tree, but actually the parents 
had to do this while the child was asleep. Another such myth was the Easter Bunny, which supposedly hid 
coloured eggs in scattered locations around the back garden each Easter for the children to find. In each case, 
the child perceived this as magic or supernatural, but ultimately it was the adults who had to provide the 
sixpence, the presents and the Easter eggs. The axiom of choice is entirely analogous because it supposedly 
delivers choice functions, but in every case, the concrete choice function which is found under the pillow, 
under the tree, or in the garden, is put there by an adult. This is the moral of the story in Remark 13.8.14.


Moore [371], pages 200-201, makes the following comment about the controversy related to this issue in
about 1918. (Here “the Axiom” means the axiom of choice.) 


Much of the controversy surrounding the Axiom, Sierpiński contended, arose from divergent
interpretations of its meaning. In this vein he considered the name “Axiom of Choice” to be
inappropriate. For the Axiom did not enable a mathematician to choose an element effectively even from
a single non-empty set. In fact there existed sets, proven non-empty by the Axiom, in which it was
not known how to determine a single element uniquely, that is, to give a characteristic property
distinguishing this element from all others in the set. One such set was that of all non-measurable
subsets of $\mathbb{R}$. On the other hand he knew of no set, shown to be non-empty without the Axiom, in
which a particular element could not be determined.


Exactly! When a set is defined without the axiom of choice, the same human being who specified the set 
can generally choose at least one element of the set. But when the set is delivered by the axiom of choice, 
there is no such information. Effectively the ZF axioms (except extension and regularity) merely anoint sets 
whose identity is already known, accepting such sets into the arena of ZF set theory. The axiom of choice 
does not anoint a human-made set, but rather claims that a set exists without anyone knowing what it is. 


13.9. Cardinality of the set of finite ordinal number pairs 


13.9.1 REMARK: Cardinality of $\omega \times \omega$.

Although it is intuitively obvious that the set $\omega \times \omega$ is countably infinite, it is desirable to construct a
bijection from $\omega$ to $\omega \times \omega$ without first defining addition or multiplication. The usual enumeration of the
elements of $\omega \times \omega$ proceeds along successive diagonals, giving the sequence (0, 0), (1, 0), (0, 1), (2, 0), (1, 1),
(0, 2), and so forth. (This is illustrated on the left in Figure 13.9.1.) Such a sequence can be constructed in
terms of the definitions of addition and multiplication on the finite ordinal numbers, which require inductive
constructions. Theorem 13.9.2 uses a different enumeration sequence which avoids the need for arithmetic
operations. (This is illustrated on the right in Figure 13.9.1.)
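
For comparison, the usual diagonal enumeration is easily written with arithmetic. The following Python sketch is an illustration only. The names are hypothetical, and the closed formula in `triangular_index` is not taken from the text. It is derived here by counting the pairs on the earlier diagonals, and agrees with the values shown on the left in Figure 13.9.1.

```python
from typing import List, Tuple


def triangular_sequence(count: int) -> List[Tuple[int, int]]:
    """First `count` pairs (i, j), listed diagonal by diagonal."""
    pairs: List[Tuple[int, int]] = []
    d = 0                                   # d = i + j is constant on a diagonal
    while len(pairs) < count:
        for j in range(d + 1):
            pairs.append((d - j, j))
        d += 1
    return pairs[:count]


def triangular_index(i: int, j: int) -> int:
    """Position of (i, j) in the sequence: the full diagonals before it, then j."""
    d = i + j
    return d * (d + 1) // 2 + j


pairs = triangular_sequence(6)
```

Both functions depend on addition and multiplication, which is exactly the dependence that the rectangular enumeration is designed to avoid.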


A triangular enumeration:

        j = 0    1    2    3    4    5    ⋯
i = 0       0    2    5    9   14   20   27
    1       1    4    8   13   19   26   34
    2       3    7   12   18   25   33   42
    3       6   11   17   24   32   41   51
    4      10   16   23   31   40   50   61
    5      15   22   30   39   49   60   72
    ⋮      21   29   38   48   59   71   84

A rectangular enumeration:

        j = 0    1    2    3    4    5    ⋯
i = 0       0    3    8   15   24   35   48
    1       1    2    7   14   23   34   47
    2       4    5    6   13   22   33   46
    3       9   10   11   12   21   32   45
    4      16   17   18   19   20   31   44
    5      25   26   27   28   29   30   43
    ⋮      36   37   38   39   40   41   42

Figure 13.9.1 Two enumeration styles for w x w 


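13.9.2 THEOREM: A rectangular enumeration of $\omega \times \omega$.


(i) Define a relation “<” on $\omega \times \omega$ by


$$\forall (i_1, j_1), (i_2, j_2) \in \omega \times \omega,$$
$$(i_1, j_1) < (i_2, j_2) \;\Leftrightarrow\; (i_1 \cup j_1 \subsetneq i_2 \cup j_2) \text{ or } \bigl((i_1 \cup j_1 = i_2 \cup j_2) \text{ and } (i_1 \supsetneq i_2 \text{ or } j_1 \subsetneq j_2)\bigr). \qquad (13.9.1)$$


Then “<” is a strict total order on $\omega \times \omega$.

(ii) Inductively define a function $h : \omega \to \omega \times \omega$ by $h(0) = (0, 0)$ and $\forall n \in \omega$, $h(n + 1) = \phi(h(n))$, where
$\phi : \omega \times \omega \to \omega \times \omega$ is defined by

$$\forall (i, j) \in \omega \times \omega,\quad \phi((i, j)) = \begin{cases} (i, j + 1) & \text{if } i \supsetneq j \\ (i - 1, j) & \text{if } i \subseteq j \text{ and } i \ne 0 \\ (j + 1, 0) & \text{if } i = 0. \end{cases} \qquad (13.9.2)$$

Then $h : \omega \to \omega \times \omega$ is a well-defined function. (Note that the operations “+1” and “−1” do not need
the full definition of addition for ordinal numbers because $j + 1$ means $j \cup \{j\}$ and $i - 1$ means $\bigcup i$.)

(iii) $\forall n \in \omega$, $h(n) < h(n + 1)$, where “<” and $h$ are as in parts (i) and (ii).

(iv) $\forall n_1, n_2 \in \omega$, $(n_1 < n_2 \Rightarrow h(n_1) < h(n_2))$, where “<” and $h$ are as in parts (i) and (ii).

(v) $h : \omega \to \omega \times \omega$ is an injection, where $h$ is as in part (ii).

(vi) $h : \omega \to \omega \times \omega$ is a bijection, where $h$ is as in part (ii).

(vii) $\omega \times \omega$ is a countably infinite set.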
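
The maps in parts (i) and (ii) are easily computed. In the following Python sketch, which is an illustration and not part of the formal development, finite ordinal numbers are modelled as Python integers, so that $i \cup j$ becomes `max(i, j)` and proper inclusion becomes `<`. The names `phi`, `h_values` and `less` are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def phi(p: Pair) -> Pair:
    """Successor map of line (13.9.2), with ordinals modelled as integers."""
    i, j = p
    if i > j:                 # case i ⊋ j
        return (i, j + 1)
    if i != 0:                # case i ⊆ j and i ≠ 0
        return (i - 1, j)
    return (j + 1, 0)         # case i = 0


def h_values(count: int) -> List[Pair]:
    """Return [h(0), ..., h(count - 1)], where h(0) = (0, 0) and h(n + 1) = phi(h(n))."""
    values, p = [], (0, 0)
    for _ in range(count):
        values.append(p)
        p = phi(p)
    return values


def less(p: Pair, q: Pair) -> bool:
    """The strict order of line (13.9.1); i ∪ j is max(i, j) for finite ordinals."""
    mp, mq = max(p), max(q)
    return mp < mq or (mp == mq and (p[0] > q[0] or p[1] < q[1]))


values = h_values(10)
```

The first ten values trace the numbers 0 to 9 on the right in Figure 13.9.1, and each step is an increase in the order of part (i), as part (iii) asserts.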


PROOF: For part (i), define a relation “<” on $\omega \times \omega$ as in line (13.9.1). Let $(i_1, j_1), (i_2, j_2) \in \omega \times \omega$. Then
either $i_1 \cup j_1 \subsetneq i_2 \cup j_2$ or $i_1 \cup j_1 = i_2 \cup j_2$ or $i_1 \cup j_1 \supsetneq i_2 \cup j_2$, and only one of these cases can occur, because
set inclusion is a total order on $\omega$. If $i_1 \cup j_1 \subsetneq i_2 \cup j_2$, then $(i_1, j_1) < (i_2, j_2)$ and $(i_2, j_2) \not< (i_1, j_1)$. Similarly,
if $i_1 \cup j_1 \supsetneq i_2 \cup j_2$, then $(i_2, j_2) < (i_1, j_1)$ and $(i_1, j_1) \not< (i_2, j_2)$. Let $k = i_1 \cup j_1 = i_2 \cup j_2$. Then $k \in \omega$,
and $i_1 \subseteq k$, $j_1 \subseteq k$, $i_2 \subseteq k$ and $j_2 \subseteq k$. Since set inclusion is a total order on $\omega$, either $i_1 \subsetneq i_2$ or $i_1 = i_2$
or $i_1 \supsetneq i_2$, and only one of these cases can occur. Suppose that $i_1 \subsetneq i_2$. Then $i_1 \ne k$, and so $j_1 = k \supseteq j_2$. So $(i_2, j_2) < (i_1, j_1)$
and $(i_1, j_1) \not< (i_2, j_2)$. Suppose that $i_1 = i_2$. Then either $j_1 \subsetneq j_2$ or $j_1 = j_2$ or $j_1 \supsetneq j_2$, and only one of these
cases can occur. If $j_1 \subsetneq j_2$, then $(i_1, j_1) < (i_2, j_2)$ and $(i_2, j_2) \not< (i_1, j_1)$. If $j_1 = j_2$, then $(i_1, j_1) = (i_2, j_2)$
and $(i_1, j_1) \not< (i_2, j_2)$ and $(i_2, j_2) \not< (i_1, j_1)$. If $j_1 \supsetneq j_2$, then $(i_1, j_1) \not< (i_2, j_2)$ and $(i_2, j_2) < (i_1, j_1)$. Finally,
suppose that $i_1 \supsetneq i_2$. Then $i_2 \ne k$, and so $j_2 = k \supseteq j_1$. So $(i_1, j_1) < (i_2, j_2)$ and $(i_2, j_2) \not< (i_1, j_1)$. In all cases, exactly one of
the three propositions $(i_1, j_1) < (i_2, j_2)$, $(i_1, j_1) = (i_2, j_2)$ and $(i_2, j_2) < (i_1, j_1)$ holds. Hence “<” is a strict
total order on $\omega \times \omega$.


For part (ii), first note that the case conditions “$i \supsetneq j$”, “$i \subseteq j$ and $i \ne 0$”, and “$i = 0$” constitute a partition
of the elements of $\omega \times \omega$. (In other words, one and only one of them is true for each element of $\omega \times \omega$.)
The ordinal number $j + 1$ is well defined as $j \cup \{j\} \in \omega$ for all $j \in \omega$ by Theorem 12.1.37 (iii). The ordinal
number $i - 1$ is well defined as $\bigcup i \in \omega$ for all $i \in \omega \setminus \{0\}$ by Theorem 12.1.37 (iv). So $\phi((i, j)) \in \omega \times \omega$ for all
$(i, j) \in \omega \times \omega$. Since $h(0) \in \omega \times \omega$, it follows by induction that $h(n) \in \omega \times \omega$ for all $n \in \omega$. So $h : \omega \to \omega \times \omega$
is a well-defined function.


For part (iii), let $n \in \omega$ and $h(n) = (i, j)$, and suppose that $i \supsetneq j$. Then $i \cup j = i = i \cup (j + 1)$ because
$j + 1 \subseteq i$, and $h(n + 1) = (i, j + 1)$ by equation (13.9.2). Since $j \subsetneq j + 1$, it follows from equation (13.9.1)
that $(i, j) < (i, j + 1)$. Suppose that $i \subseteq j$ and $i \ne 0$. Then $i - 1 \subsetneq i$, and so $i \cup j = j = (i - 1) \cup j$. So
$(i, j) < (i - 1, j)$ by equation (13.9.1). Suppose that $i = 0$. Then $i \cup j = j$ and $(j + 1) \cup 0 = j + 1$. So
$(i, j) < (j + 1, 0)$ by equation (13.9.1). Thus $h(n) < h(n + 1)$ for all of the cases in equation (13.9.2).

For part (iv), suppose that $n_1, n_2 \in \omega$ satisfy $n_1 < n_2$ and $h(n_1) \ge h(n_2)$.
Let $n_3 = \bigcap \{n \in \omega;\ n_1 < n \text{ and } h(n_1) \ge h(n)\}$.
(In other words, $n_3$ is the minimum $n \in \omega$ with $n_1 < n$ and $h(n_1) \ge h(n)$.) Then $n_3$ is a well-defined
element of $\omega$ because this set contains $n_2$, and is therefore non-empty, and $\omega$ is well ordered.
Clearly $n_1 < n_3$, and $h(n_1) \ge h(n_3)$ and $h(n_1) \le h(n_3 - 1)$. (Note that $n_3 - 1$ might equal $n_1$.
So equality cannot be excluded here.) So $h(n_3 - 1) \ge h(n_3)$. But this contradicts part (iii).
Therefore $h(n_1) < h(n_2)$. Hence $\forall n_1, n_2 \in \omega,\ (n_1 < n_2 \Rightarrow h(n_1) < h(n_2))$.


Part (v) follows from parts (ii) and (iii). 

For part (vi), it is clear that $\mathrm{Range}(h) \supseteq 1 \times 1 = \{(0,0)\}$. Suppose that
$\mathrm{Range}(h) \supseteq (k+1) \times (k+1)$ for some $k \in \omega$. Then $(0,k) \in \mathrm{Range}(h)$.
So $h(n) = (0,k)$ for some $n \in \omega$, and so $h(n+1) = (k+1,0) \in \mathrm{Range}(h)$. Therefore
$(k+1,j) \in \mathrm{Range}(h)$ for all $j$ with $j \le k+1$ by induction on $j$. (To see this, note that if
$(k+1,j) \in \mathrm{Range}(h)$ and $j < k+1$, then $(k+1,j+1) \in \mathrm{Range}(h)$ by the case “$i \supsetneq j$”
in line (13.9.2).) So $(k+1,k+1) \in \mathrm{Range}(h)$. But then $(i,k+1) \in \mathrm{Range}(h)$ for all $i \in \omega$
with $i \le k+1$. This follows by induction on $i$. (To see this, note that if $(i,k+1) \in \mathrm{Range}(h)$ and
$0 < i \le k+1$, then $(i-1,k+1) \in \mathrm{Range}(h)$ by the case “$i \subseteq j$ and $i \ne 0$” in line (13.9.2).)
Therefore $(0,k+1) \in \mathrm{Range}(h)$. But all elements $(i,j)$ of $\omega \times \omega$ which satisfy $i \cup j = k+1$
are either of the form $(i,k+1)$ with $i \subseteq k+1$, or $(k+1,j)$ with $j \subseteq k+1$, and all of these elements
are in $\mathrm{Range}(h)$. So $\mathrm{Range}(h) \supseteq \{(i,j) \in \omega \times \omega;\ i \cup j = k+1\}$. Then since
$\mathrm{Range}(h) \supseteq (k+1) \times (k+1) = \{(i,j) \in \omega \times \omega;\ i \cup j \subseteq k\}$, it follows that
$\mathrm{Range}(h) \supseteq \{(i,j) \in \omega \times \omega;\ i \cup j \subseteq k+1\} = (k+2) \times (k+2)$. By induction
on $k \in \omega$, it follows that $\mathrm{Range}(h) \supseteq (k+1) \times (k+1)$ for all $k \in \omega$. Therefore
$\mathrm{Range}(h) \supseteq \omega \times \omega$. So $h : \omega \to \omega \times \omega$ is a surjection.
Hence $h : \omega \to \omega \times \omega$ is a bijection by part (v).


Part (vii) follows from part (vi) and Definition 13.7.6. 
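
The case analysis in the proof above can be tried out numerically. The following Python sketch is an
illustration only, not part of the formal development. It models finite ordinals as non-negative integers
(so strict inclusion becomes `>`), and it assumes that the three cases of line (13.9.2) are exactly the ones
quoted in the proof of part (ii). The names `phi_rect`, `h_rect` and `shell_key` are hypothetical, and
`shell_key` is only an assumed numerical stand-in for the order in line (13.9.1).

```python
def phi_rect(i, j):
    """One step of the rectangular state machine (cases as quoted in the proof)."""
    if i > j:              # case "i strictly contains j"
        return (i, j + 1)
    if i != 0:             # case "i is contained in j and i != 0"
        return (i - 1, j)
    return (j + 1, 0)      # case "i = 0"


def h_rect(count):
    """Return the list [h(0), ..., h(count - 1)], starting from h(0) = (0, 0)."""
    values = [(0, 0)]
    while len(values) < count:
        values.append(phi_rect(*values[-1]))
    return values[:count]


def shell_key(pair):
    """Assumed sort key: the shell max(i, j) first, then the position within the shell."""
    i, j = pair
    k = max(i, j)
    position = j if i == k else 2 * k - i
    return (k, position)


if __name__ == "__main__":
    print(h_rect(9))
```

In this sketch, the first $(k+1)^2$ values fill the square $(k+1) \times (k+1)$, which is the numerical
counterpart of the induction step used for part (vi).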


13.9.3 REMARK: The “state machine” for a triangular enumeration of $\omega \times \omega$.
If one inductively defines the function $h : \omega \to \omega \times \omega$ by $h(0) = (0,0)$ and
$\forall n \in \omega,\ h(n+1) = \phi(h(n))$, where $\phi : \omega \times \omega \to \omega \times \omega$ is defined by

  \[ \forall (i,j) \in \omega \times \omega, \quad
     \phi(i,j) = \begin{cases} (i-1,\, j+1) & \text{if } i \ne 0 \\ (j+1,\, 0) & \text{if } i = 0, \end{cases} \]

then $h : \omega \to \omega \times \omega$ is a well-defined bijection which corresponds to the triangular enumeration
of $\omega \times \omega$ in Figure 13.9.1. The function $\phi$ may be thought of as the “state machine” for the
enumeration. Although it is intuitively clear that this state machine induces a bijection
$h : \omega \to \omega \times \omega$, the most obvious way to prove this requires the prior definition of the addition
operation on $\omega$.
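
As an informal check of the triangular state machine, the following Python sketch iterates it on
non-negative integers. It assumes that the state machine sends $(i,j)$ to $(i-1,j+1)$ when $i \ne 0$ and to
$(j+1,0)$ when $i = 0$, which is the reading consistent with the explicit formulas in Remark 13.9.4. The
function names `phi_tri` and `h_tri` are hypothetical.

```python
def phi_tri(i, j):
    """One step of the (assumed) triangular state machine."""
    if i != 0:
        return (i - 1, j + 1)   # move along the current anti-diagonal
    return (j + 1, 0)           # start the next anti-diagonal


def h_tri(count):
    """Return the list [h(0), ..., h(count - 1)], starting from h(0) = (0, 0)."""
    values = [(0, 0)]
    while len(values) < count:
        values.append(phi_tri(*values[-1]))
    return values[:count]


if __name__ == "__main__":
    print(h_tri(10))
```

The check that the first $(k+1)(k+2)/2$ values fill the triangle $i + j \le k$ uses integer addition, which
illustrates why a proof along these lines needs addition on $\omega$ to be defined first.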


13.9.4 REMARK: Explicit higher-level enumerations for $\omega \times \omega$.

The triangular enumeration for $\omega \times \omega$ which is illustrated on the left of Figure 13.9.1, and has the
state machine described in Remark 13.9.3, can be defined explicitly as the map $n \mapsto (I(n), J(n))$, where the
functions $f, g, I, J : \omega \to \omega$ are given by

  \[ \begin{aligned}
     \forall n \in \omega, \quad g(n) &= \mathrm{floor}\bigl(-1/2 + (2n + 1/4)^{1/2}\bigr) \\
     \forall n \in \omega, \quad f(n) &= g(n)(g(n) + 1)/2 \\
     \forall n \in \omega, \quad I(n) &= f(n) + g(n) - n \\
     \forall n \in \omega, \quad J(n) &= n - f(n).
     \end{aligned} \]

This explicit enumeration is exploited in the proofs of Theorems 45.3.3 and 45.4.2.

A corresponding explicit enumeration for the rectangular enumeration of $\omega \times \omega$ which is illustrated
on the right of Figure 13.9.1, and has the state machine described in Theorem 13.9.2 (ii), can be defined
explicitly as the map $n \mapsto (I(n), J(n))$, where the functions $f, g, I, J : \omega \to \omega$ are given by

  \[ \begin{aligned}
     \forall n \in \omega, \quad g(n) &= \mathrm{floor}(n^{1/2}) \\
     \forall n \in \omega, \quad f(n) &= g(n)^2 \\
     \forall n \in \omega, \quad I(n) &= g(n) + \min(0,\, f(n) + g(n) - n) \\
     \forall n \in \omega, \quad J(n) &= \min(g(n),\, n - f(n)).
     \end{aligned} \]

Unfortunately, this explicit enumeration cannot be used for the proof of Theorem 13.9.2 because it is based on
higher-level real-number algebraic functions and operations. The floor of the square root could be replaced
with an integer function, for example by defining $g(n) = \max\{k \in \omega;\ k^2 \le n\}$, but this requires the
definition of the integer multiplication operation. Similarly, the complicated expression for $g$ in the
triangular enumeration case may be replaced with the formula $g(n) = \max\{k \in \omega;\ k(k+1) \le 2n\}$.


It could be argued that as soon as addition and multiplication have been defined (using countable induction)
for the ordinal numbers, these explicit formulas for bijections from $\omega$ to $\omega \times \omega$ are at least as
concrete and explicit as the bijections presented in Theorem 13.9.2 (ii) and Remark 13.9.3. However, the
expressions for $g(n)$ require the solution of a kind of maximisation problem, which is not fully concrete
and explicit.
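
The integer-only variants of these formulas can be sketched in Python as follows. This is an illustrative
sketch under stated assumptions, not the book's own construction: `math.isqrt` is used as a stand-in for the
maximisation $\max\{k;\ k^2 \le n\}$, the triangular $g(n)$ is computed as $\max\{k;\ k(k+1) \le 2n\}$, and the
function names `triangular` and `rectangular` are hypothetical.

```python
from math import isqrt


def triangular(n):
    """Return (I(n), J(n)) for the triangular enumeration."""
    g = (isqrt(8 * n + 1) - 1) // 2    # largest k with k*(k+1) <= 2*n
    f = g * (g + 1) // 2               # index at which anti-diagonal g starts
    return (f + g - n, n - f)


def rectangular(n):
    """Return (I(n), J(n)) for the rectangular enumeration."""
    g = isqrt(n)                       # largest k with k*k <= n
    f = g * g                          # index at which shell g starts
    return (g + min(0, f + g - n), min(g, n - f))


if __name__ == "__main__":
    print([triangular(n) for n in range(6)])
    print([rectangular(n) for n in range(9)])
```

The computation of `g` is exactly the “maximisation problem” mentioned above, delegated here to a library
integer square root.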


13.9.5 REMARK: Explicit enumerations of finite Cartesian products of countable sets. 
The method of enumeration for $\omega \times \omega$ may be applied inductively to define enumerations for general
Cartesian products $\omega^n$ for $n \in \mathbb{Z}^+$. Consequently finite products of countable sets are countable.


13.9.6 THEOREM: All finite Cartesian products of the set of finite ordinals are countably infinite. 
$\omega^n$ is a countably infinite set for all $n \in \mathbb{Z}^+$.


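PROOF: The case $n = 1$ is almost self-evident. The case $n = 2$ follows from Theorem 13.9.2 (vii). Suppose
that $\omega^n$ is countably infinite for some $n \in \mathbb{Z}^+$. Define the concatenation map
$\nu : \omega^n \times \omega \to \omega^{n+1}$ by $\nu : ((i_0, \ldots, i_{n-1}), j) \mapsto (i_0, \ldots, i_{n-1}, j)$.
Since $\omega^n$ is assumed countably infinite, there exists a bijection $f : \omega \to \omega^n$ by Definition 13.7.6.
By Theorem 13.9.2 (vi), there exists a bijection $h : \omega \to \omega \times \omega$. Let $h_0 : \omega \to \omega$ and
$h_1 : \omega \to \omega$ denote the two components of $h$ so that $h(i) = (h_0(i), h_1(i))$ for all $i \in \omega$. Define
$\tilde{h} : \omega \to \omega^{n+1}$ by $\tilde{h}(i) = \nu(f(h_0(i)), h_1(i))$ for all $i \in \omega$. Then $\tilde{h}$ is
a bijection because it is the composition of the bijections $h$, $f \times \mathrm{id}_\omega$ and $\nu$. Therefore
$\omega^{n+1}$ is countably infinite. Hence $\omega^n$ is a countably infinite set for all $n \in \mathbb{Z}^+$ by induction.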
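
The inductive step of this proof translates directly into a recursive procedure. The following Python
sketch is an illustration under stated assumptions: the pairing bijection is taken to be the rectangular
enumeration computed with integer square roots (any bijection from $\omega$ to $\omega \times \omega$ would do), and
the names `pair` and `enum_tuple` are hypothetical.

```python
from math import isqrt


def pair(n):
    """A bijection from the non-negative integers to pairs (rectangular enumeration)."""
    g = isqrt(n)
    f = g * g
    return (g + min(0, f + g - n), min(g, n - f))


def enum_tuple(n, i):
    """Return the i-th element of an enumeration of n-tuples, for n >= 1."""
    if n == 1:
        return (i,)
    a, b = pair(i)                      # split the index into two components
    return enum_tuple(n - 1, a) + (b,)  # concatenate, as in the inductive step


if __name__ == "__main__":
    print([enum_tuple(3, i) for i in range(10)])
```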


13.9.7 THEOREM: Finite Cartesian products of countably infinite sets are countably infinite.
Let $(S_i)_{i=1}^n$ be a family of countably infinite sets for some $n \in \mathbb{Z}^+$. Then the Cartesian
set-product $\times_{i=1}^n S_i$ is countably infinite.


PROOF: The assertion follows from Theorem 13.9.6 and Definition 13.7.6.


13.9.8 REMARK: A countable union of countable sets could be uncountable. 

In contrast to Theorem 13.9.9 in ZF+CC, there is a ZF model in which a countable union of countable
sets can have the same cardinality as $2^\omega$. In other words, the set of real numbers can be expressed as a
countable union of countable sets in such a model. (See Howard/Rubin [362], page 154, Jech [364], page 142,
Moore [371], pages 9, 103, 203, for the Feferman/Lévy model.) Such examples show that intuition is not to
be trusted on this subject.


13.9.9 THEOREM [ZF+CC]: Countable union theorem. 
The union of a countable set of countable sets is countable. 


PROOF: For a proof of Theorem 13.9.9, see Pinter [377], page 148; Halmos [357], pages 91-92. See also
Moore [371], page 9, for a brief outline proof. (See Remark 13.9.10 for the excuse for the absence of a proof.)


13.9.10 REMARK:  Non-necessity of providing proofs for AC theorems. 

As a general rule, proofs are provided for all theorems in this book. However, theorems which require 
some kind of infinite choice axiom are not regarded as “true theorems” here. So references to proofs in 
the literature may sometimes be given, as for example for Theorem 13.9.9. Since choice functions whose 
existence is granted by axioms of choice reside in some kind of metaphysical astral plane, it seems fair enough 
that AC theorem proofs should reside there also! 


13.9.11 REMARK: Well-orderings of $\omega \times \omega$ induce enumerations.
For any well-ordering “$<$” on $\omega \times \omega$, inductively define a function $h : \omega \to \omega \times \omega$ by

  \[ \forall n \in \omega, \quad h(n) = \min\{ (i,j) \in \omega \times \omega;\ \forall k \in n,\ h(k) < (i,j) \}. \]

This is well defined because $h(n)$ is defined for each $n \in \omega$ in terms of the values of $h(k)$ for $k \in n$.
Then the set $\{(i,j) \in \omega \times \omega;\ \forall k \in n,\ h(k) < (i,j)\}$ is a non-empty subset of
$\omega \times \omega$, which has a well-defined minimum. Thus any well-ordering induces an enumeration
on $\omega \times \omega$.


13.10. Dedekind-infinite sets and mediate cardinals 


13.10.1 REMARK: The Dedekind definition of finite and infinite sets. 

Definition 13.10.3 is essentially the same as the characterisation of finite sets which was introduced by 
Dedekind. (The original presentation of the concepts of Dedekind finite and infinite sets may be found 
in an 1888 essay by Dedekind [352], pages 63-67.) Every finite set (i.e. set equinumerous with a finite 
ordinal number) is Dedekind-finite, but the axiom of countable choice implies that all Dedekind-finite sets 
are finite. (See E. Mendelson [370], pages 184-185; Halmos [357], page 61; Wilder [403], pages 63-72.) The
Dedekind finite set definition is attractive because it uses a simple intrinsic rule instead of the extrinsic 
machinery of finite ordinal numbers, but the “magic wand” of the axiom of choice is the greater evil. (See 
Remark 7.11.13 (15) for further comments on Dedekind-finite sets. See Definition 13.5.2 for finite sets.) 


13.10.2 DEFINITION: A Dedekind-infinite set is a set $S$ such that $f(S) \ne S$ for some injection $f : S \to S$.


13.10.3 DEFINITION: A Dedekind-finite set is a set S such that every injection from S to S is a surjection. 


13.10.4 THEOREM: Finite implies Dedekind-finite. 
If a set S is finite, then S is Dedekind-finite. 


PROOF: Let $S$ be a finite set. Then there is a bijection $f : N \to S$ for some ordinal number $N \in \omega$. Let
$g : S \to S$ be an injection. Then $g' = f^{-1} \circ g \circ f : N \to N$ is an injection. So $g' : N \to N$ is a
bijection by Theorem 12.4.13 (i). Therefore $g : S \to S$ is a bijection. Hence $S$ is Dedekind-finite.
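
For very small sets, the Dedekind-finite property can be checked by exhaustive search. The following Python
sketch is purely illustrative: it enumerates all self-maps of a small finite set, filters the injective
ones, and checks that each is surjective. The names `injections` and `is_dedekind_finite` are hypothetical,
and the search is only feasible for sets with a handful of elements.

```python
from itertools import product


def injections(s):
    """Yield every injective map from the finite set s to itself, as a dict."""
    elems = list(s)
    for images in product(elems, repeat=len(elems)):
        if len(set(images)) == len(elems):   # no two elements share an image
            yield dict(zip(elems, images))


def is_dedekind_finite(s):
    """Check by brute force that every injection from s to s is a surjection."""
    return all(set(g.values()) == set(s) for g in injections(s))


if __name__ == "__main__":
    print(is_dedekind_finite({"a", "b", "c"}))
```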


13.10.5 REMARK: ZF equivalence of Dedekind-infinite and $\omega$-infinite sets.

It turns out that the definitions of Dedekind-infinite and $\omega$-infinite sets are equivalent, using only the
Zermelo-Fraenkel axioms. However, it is the more practical $\omega$-infinite property which is typically used in
analysis, not the more abstract Dedekind-infinite property. The proof of Theorem 13.10.6 (i) does not require
an axiom of choice because the sequence $(x_k)_{k \in \omega}$ is constructed explicitly from a single “chosen”
element $x_0$ in the non-empty set $S \setminus f(S)$. (A similar but shorter proof of Theorem 13.10.6 (i) is
given by Jech [364], page 25.)


13.10.6 THEOREM: Equivalence of Dedekind-infinite and $\omega$-infinite properties.
(i) A set is Dedekind-infinite if and only if it is $\omega$-infinite.
(ii) A set $X$ is Dedekind-infinite if and only if $\#(X) \ge \#(\omega)$.


PROOF: For part (i), let $S$ be a Dedekind-infinite set. Then there exists an injection $f : S \to S$ with
$f(S) \ne S$. So $S \setminus f(S) \ne \emptyset$. Therefore there exists $x_0$ such that $x_0 \in S \setminus f(S)$.
Let $x_1 = f(x_0)$. Then $x_1 \in f(S)$ because $x_0 \in S$, and $x_1 \notin f(f(S))$ because $x_0 \notin f(S)$ and $f$
is injective. So $x_1 \in f(S) \setminus f(f(S))$. Define the sequence of functions $(f^k)_{k \in \omega}$ inductively
by $f^0 = \mathrm{id}_S$ and $f^{k+1} = f \circ f^k$ for all $k \in \omega$. Define the sequence $(x_k)_{k \in \omega}$
inductively by $x_{k+1} = f(x_k)$ for $k \in \omega$, where $x_0$ is as stated. Then it follows by induction that
$x_k \in f^k(S) \setminus f^{k+1}(S)$ for all $k \in \omega$. Therefore $(x_k)_{k \in \omega}$ is an infinite sequence of
distinct points in $S$. Hence $S$ is $\omega$-infinite.

To show the converse, let $S$ be an $\omega$-infinite set. Then there is an infinite sequence
$x = (x_k)_{k \in \omega}$ with $x_k \in S$ for all $k \in \omega$, and $x_k \ne x_\ell$ for all $k, \ell \in \omega$ with
$k \ne \ell$. Define $f : S \to S$ by $f(x_k) = x_{k+1}$ for all $k \in \omega$, and $f(z) = z$ for all
$z \in S \setminus \mathrm{Range}(x)$. Then $f$ is a well-defined injection from $S$ to $S$, and
$\mathrm{Range}(f) = S \setminus \{x_0\}$. Therefore $f(S) \ne S$. Hence $S$ is Dedekind-infinite.


Part (ii) follows from part (i) and Theorem 13.7.7 (v). 
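
Both directions of the proof of part (i) are constructive, so they can be imitated on a computer for a
concrete set. The following Python sketch is an illustration only. It takes the non-negative integers as
the set, uses the map $n \mapsto 2n + 1$ as a sample non-surjective injection, and uses the sequence
$k \mapsto 3k$ as a sample injective sequence. The names `orbit` and `make_shift` are hypothetical, and the
argument `index_of` (a partial inverse of the sequence) is an assumption of the sketch which has no
counterpart in the proof.

```python
def orbit(f, x0, count):
    """Return [x_0, f(x_0), ..., f^(count-1)(x_0)], the sequence built in the proof."""
    xs = [x0]
    while len(xs) < count:
        xs.append(f(xs[-1]))
    return xs


def make_shift(x, index_of):
    """Build the map which shifts along the sequence x and fixes all other points.

    x(k) is the k-th term of an injective sequence; index_of(z) returns k when
    z == x(k), and None when z is not a term of the sequence.
    """
    def f(z):
        k = index_of(z)
        return z if k is None else x(k + 1)
    return f


if __name__ == "__main__":
    # x_0 = 0 is not in the range of n -> 2n + 1, so the orbit has distinct terms.
    print(orbit(lambda n: 2 * n + 1, 0, 6))
    shift = make_shift(lambda k: 3 * k, lambda z: z // 3 if z % 3 == 0 else None)
    print([shift(z) for z in range(10)])
```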


13.10.7 REMARK: The mysterious “mediate cardinals”. 

Sets which are neither finite nor Dedekind-infinite were referred to as “mediate cardinals” by
Whitehead/Russell, Volume II [401], page 288, where “inductive” means the same as “finite”, and “reflexive”
means the same as “Dedekind-infinite”.


The following properties of cardinals which are neither inductive nor reflexive (supposing there are
such) are easily proved. Let us put

NC med = N₀C − NC induct − NC refl    Df,

Cls med = − Cls induct − Cls refl    Df,


where “med” stands for “mediate.”


The authors then give some properties of these “mediate cardinals”. (The abbreviations in the quotation
have the following meaning. NC means the class of cardinal numbers [401, page 5]. N₀C means the class
of homogeneous cardinals [401, page 8]. Cls means “class” [400, page 198]. “NC med” means the class of
mediate cardinals. “Cls med” means the class of mediate sets. The symbol “−” indicates the unary or binary
class complement operation. “Df” means that the left term is defined to mean the right-hand expression.)


13.10.8 REMARK: Mediate cardinals versus finite and Dedekind-infinite sets. 
The relations between mediate cardinals and the two most commonly used definitions of finite and infinite 
sets are illustrated in Figure 13.10.1. 


13.10.9 REMARK: ZF+CC proof that Dedekind-finite sets are finite. 

Theorems 13.10.10 and 13.10.11 are made possible by the axiom of countable choice, as indicated. Therefore 
they cannot be used in the proof of pure ZF theorems which are not CC-tainted. (The impossibility of 
proving Theorem 13.10.10 within unalloyed ZF set theory is demonstrated using model theory by Jech [364], 
page 123, Theorem 8.3 (iii).) 

A careful proof of Theorem 13.10.10 is given by Halmos [357], page 61, to demonstrate how to use the axiom 
of choice correctly, but the method requires the full axiom of choice. A different proof strategy is used by 
E. Mendelson [370], pages 199-200, but it also uses the full axiom of choice. An unsatisfactory informal
proof (in a “dependent choice” style) is given by Wilder [403], pages 70-72, intentionally making implicit
use of the axiom of choice so as to demonstrate the issues that arise. A satisfactory short proof is given by 
Jech [364], page 20, using only countable choice. This approach is followed here. 


Figure 13.10.1 Finite and Dedekind-infinite sets versus mediate cardinals


13.10.10 THEOREM [ZF+CC]: Every infinite set has a countably infinite subset.
Every infinite set has a subset which is equinumerous to the set $\omega$ of finite ordinal numbers. That is,
if $X$ is an infinite set, then $\mathrm{Inj}(\omega, X) \ne \emptyset$. In other words, if $X$ is infinite, then

  \[ \exists f : \omega \to X,\ \forall i, j \in \omega,\ (i \ne j \Rightarrow f(i) \ne f(j)). \tag{13.10.1} \]


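PROOF: Let $X$ be an infinite set. For $N \in \omega$, define $S_N = \mathrm{Inj}(N+1, X)$. Then $S_N \ne \emptyset$ for
all $N \in \omega$ because $X$ is infinite. So $\times_{N \in \omega} S_N \ne \emptyset$ by the axiom of countable choice.
Let $\phi \in \times_{N \in \omega} S_N$. In other words, $\phi : \omega \to \bigcup_{N \in \omega} S_N$ satisfies
$\phi(N) \in \mathrm{Inj}(N+1, X)$ for all $N \in \omega$. Now define a function $h : \omega \to X$ as the
concatenation of all of the sequences $\phi(N)$. In other words, $h$ is defined as follows.

  \[ \begin{aligned}
     \forall k \in \omega, \quad N(k) &= \max\{i \in \omega;\ 2k \ge i(i+1)\} \\
                                      &= \min\{i \in \omega;\ (i+1)(i+2) > 2k\} \\
     \forall k \in \omega, \quad I(k) &= k - N(k)(N(k)+1)/2 \\
     \forall k \in \omega, \quad h(k) &= \phi(N(k))(I(k)).
     \end{aligned} \]

Thus $h(k)$ is the $I(k)$-th element in the finite sequence $\phi(N(k))$ for each $k \in \omega$. Then $h$ is a
countably infinite sequence of elements of $X$, and a countably infinite subsequence of $h$ is injective.
Duplicates may be removed from $h$ to yield an injective countably infinite sequence $f : \omega \to X$ as follows.

  \[ \begin{aligned}
     \forall k \in \omega, \quad J(k) &= \min\{j \in \omega;\ (\forall \ell \in k,\ J(\ell) < j) \text{ and }
                                          \forall i \in \omega,\ (i < j \Rightarrow h(i) \ne h(j))\} \\
     \forall k \in \omega, \quad f(k) &= h(J(k)).
     \end{aligned} \]

The set $\{j \in \omega;\ (\forall \ell \in k,\ J(\ell) < j) \text{ and } \forall i \in \omega,\ (i < j \Rightarrow h(i) \ne h(j))\}$
is guaranteed to be non-empty for all $k \in \omega$ because the range of $h$ is infinite. Thus the pair
$(J(k), f(k))$ is defined inductively for all $k \in \omega$. Clearly $f$ satisfies line (13.10.1).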
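
Everything in this proof after the single application of countable choice is explicit, and can be sketched
in Python. The following is an illustration under stated assumptions only: the choice function is replaced
by a concrete stand-in `phi`, which returns the injective sequence $(N, N-1, \ldots, 0)$ of length $N + 1$ in
the set of integers, and the names `phi`, `h` and `f_prefix` are hypothetical. Duplicates are removed by
keeping the first occurrence of each value of the concatenated sequence.

```python
from math import isqrt


def phi(N):
    """Stand-in for the choice function: an injective sequence of length N + 1."""
    return tuple(range(N, -1, -1))


def h(k):
    """The k-th term of the concatenation of phi(0), phi(1), phi(2), ..."""
    N = (isqrt(8 * k + 1) - 1) // 2    # largest i with i*(i+1) <= 2*k
    I = k - N * (N + 1) // 2           # position within the sequence phi(N)
    return phi(N)[I]


def f_prefix(count):
    """Return [f(0), ..., f(count - 1)], where f is h with duplicates removed."""
    seen = set()
    out = []
    j = 0
    while len(out) < count:
        value = h(j)
        if value not in seen:          # keep only first occurrences
            seen.add(value)
            out.append(value)
        j += 1
    return out


if __name__ == "__main__":
    print([h(k) for k in range(10)])
    print(f_prefix(5))
```

The loop in `f_prefix` terminates because each `phi(N)` contributes $N + 1$ distinct values, so new values
keep appearing in the concatenated sequence.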


13.10.11 THEOREM [ZF+CC]: Equivalence of infinite and Dedekind-infinite concepts.
(i) If a set is infinite, then it is $\omega$-infinite.
(ii) If a set is infinite, then it is Dedekind-infinite.
(iii) If a set is Dedekind-finite, then it is finite.


PROOF: Part (i) follows from Theorem 13.10.10 and Definition 13.7.6.
Part (ii) follows from part (i) and Theorem 13.10.6 (i). 
Part (iii) follows as the contrapositive of part (ii). 


13.10.12 REMARK: Comparison of infinity definitions with linear spaces. 
There is an analogy between definitions of infinite sets and definitions of linear spaces.
(1) An $\omega$-infinite set is infinite, but the converse requires an axiom of choice.
(2) A “linear space with a basis” is a linear space, but the converse requires an axiom of choice.

In a sense, an injective map $f : \omega \to S$ from the non-negative integers to a set $S$ may be regarded as
providing a kind of “basis” for a subset of $S$. If the set $S$ is countably infinite, there is a bijection
$f : \omega \to S$, which looks even more like a “basis” for $S$. Such a bijection gives a kind of “handle” on the
elements of $S$ by mapping the elements of $S$ to $\omega$. In the same way, a linear space basis $B$ gives a
“handle” on the elements of a linear space $V$ over a field $K$ by mapping the elements of $V$ to the set of
component families $K^B$.


The definition of a finite set $S$ in Definition 13.5.2 is written in terms of a specific “handle function”
$f : n \to S$ for some finite ordinal number $n \in \omega$. So the definition of a finite set is stronger than the
negation of the $\omega$-infinite property in Definition 13.7.6. Thus every finite set is not $\omega$-infinite, but
the axiom of choice is required to prove in general that all sets which are not $\omega$-infinite are finite.
Both the definitions of finite sets and $\omega$-infinite sets specify “handle functions”, whereas their negatives
merely specify the non-existence of such functions, which helps explain why they are so weak. The relative
strength and weakness of the definitions is summarised in the following table.


style of infinity        finite     infinite

standard definition      strong     weak
ω definition             weak       strong
Dedekind definition      weak       strong


13.10.13 REMARK: The benefits of substituting Dedekind infinity for traditional infinity. 

In practice, when one demonstrates that a set is infinite, the core of the demonstration is typically a
construction of an injective map from the ordinal numbers to the set. This clearly implies that the set is
non-finite, and is therefore infinite according to Definition 13.7.2. But such a construction also has the
stronger consequence that the set is thereby proved to be $\omega$-infinite.


It is difficult to think of any practical situation where a set is demonstrated to not be finite while
simultaneously there is no clear possibility for showing that the set is $\omega$-infinite. Consequently, it
seems reasonable to replace the term “infinite” with either “$\omega$-infinite” or the equivalent term
“Dedekind-infinite”.


13.10.14 REMARK: Relations between various definitions of infinite cardinality. 
The relations between some notions of infinity are illustrated in Figure 13.10.2. (The arrows in Figure 13.10.2
indicate specialisation of definitions. See Definition 13.7.6 for $\omega$-infinite, countable and uncountable sets.)


[Figure 13.10.2: two diagrams. ZF set theory: “infinite = non-finite” specialises to “ω-infinite =
Dedekind-infinite”, which specialises to “countably infinite” and “uncountably ω-infinite”. ZF+AC set theory:
“infinite = non-finite = ω-infinite = Dedekind-infinite” specialises to “countably infinite” and
“uncountably infinite”.]
Figure 13.10.2 Relations between definitions of infinity


Here an uncountable $\omega$-infinite set means an $\omega$-infinite set which is not countably infinite, whereas an
uncountable infinite set means an infinite set which is not countable. Thus an uncountable infinite set in ZF
set theory could have mediate cardinality. This conundrum is illustrated in Figure 13.10.3.


It is not possible to say whether mediate cardinal sets are smaller or larger than countably infinite sets
because they are not comparable. A related issue arises with uncountable $\omega$-infinite sets because these may
be comparable to standard uncountable “yardstick” sets, but they might not be comparable. In other words,
sets which are $\omega$-infinite and not countably infinite are not necessarily comparable to some standard
yardstick set such as a general ordinal number or beta-set. (See Definition 13.4.13 and Section 13.4 for
beta-sets and beta-cardinality.)


13.10.15 REMARK: Avoiding higher-order mediate cardinals. 
Although an uncountable $\omega$-infinite set is at least “as big” as $\omega$, but not of the “same size” as $\omega$,
this leads to further uncertainties which are similar to the mediate cardinality issue. (The existence of
mediate cardinals of higher degree was investigated in 1923 by Wrinch [442]. See Moore [371], pages 211-212.)
To be really sure, in a positive way, that a set is “bigger than” $\omega$, a safe way to do this is to show that
the set has a subset which is equinumerous to $\mathbb{P}(\omega)$. Similarly, to ensure that a set is “bigger than”
$\mathbb{P}(\omega)$, it is safest to demonstrate that it contains a subset equinumerous to
$\mathbb{P}(\mathbb{P}(\omega))$, and so forth.


[Figure 13.10.3: classification tree for sets, with successive rows “finite, infinite”, then “finite,
mediate, ω-infinite”, then “finite, mediate, countably infinite, uncountably ω-infinite”, and a bottom row
labelled “countable” and “uncountable”.]
Figure 13.10.3 Classes of finite, mediate and infinite sets


13.11. Variant definitions for finite sets 


13.11.1 REMARK: The plethora of concepts of finiteness. 

The eight variant finite-set definitions in Table 13.11.1 are given by Howard/Rubin [362], page 278. The
word “maximal” in this table means maximal with respect to the class inclusion relation “$\subseteq$”. The word
“monotone” means non-decreasing with respect to the class inclusion relation “$\subseteq$”.


name          definition

I-finite      Every non-empty set of subsets of A has a maximal element.
Ia-finite     In every two-set partition for A, at least one of the two sets is I-finite.
II-finite     Every non-empty monotone set of subsets of A has a maximal element.
III-finite    The power set of A is Dedekind finite.
IV-finite     A is Dedekind finite.
V-finite      |A| = 0, or 2·|A| > |A|.
VI-finite     |A| = 0, or |A| = 1, or |A|² > |A|.
VII-finite    A is I-finite or not well-orderable.

Table 13.11.1 Variant definitions of finiteness


All of the definitions in Table 13.11.1 are “set-theoretic”. In other words, they are logical predicates in terms
of the equality and set-membership relations, not referring in any way to the set of ordinal numbers $\omega$.


The “I-finite set” definition is equivalent to the standard $\omega$-based finite set concept in Definition 13.5.2.
(This is shown in Theorem 13.11.4.) Only two of these finite set definitions feature prominently in applicable
analysis, namely the I-finite and IV-finite definitions.


These eight finite-set classes are distinct in ZF set theory. In other words, they define independent concepts. 
They are not merely alternative definitions for a single concept. The definitions in Table 13.11.1 are sorted 
from strongest to weakest. (This progression is illustrated in Figure 13.11.2.) Thus, for example, a set which 
is I-finite is necessarily IV-finite in all ZF models. No pair of distinct concepts amongst these eight finite-set 
concepts can be proven to be equivalent in ZF because for each pair, there exists a ZF model in which there 
exist sets which satisfy one definition but not the other. 


All eight of these finite-set definitions are equivalent if the axiom of choice is adopted. (The relations between
these definitions are discussed in particular by Lévy [426], pages 2-3; Howard/Rubin [362], pages 278-280;
Moore [371], pages 22-30, 130, 209-213, 276-277; Suppes [395], pages 98-108, 149. Some exercises for I-finite,
II-finite and IV-finite sets are given by Jech [364], page 52, for two Fraenkel models and one Mostowski model.
These are permutation models for ZF with atoms, which are called models N1, N2 and N3 respectively by
Howard/Rubin [362], pages 175-183. Note that the II-finite definition is also referred to as “T-finite”.)


[Figure 13.11.2: the classes I-finite, Ia-finite, II-finite, III-finite, IV-finite, V-finite, VI-finite and
VII-finite, each shown against the corresponding class of infinite sets (I-infinite to VII-infinite), with the
label “mediate” at the IV level.]
Figure 13.11.2 Finite and infinite classes of sets


The finiteness definitions I, II, III, IV and V were published in 1924 by Tarski [434], pages 49, 93, and the 
forward implications were proved there. Proofs of all of the forward implications and reverse non-implications 
in ZF set theory (with ur-elements) were given in 1958 by Lévy [426], who attributed most of the results to 
earlier papers by Mostowski and Lindenbaum. (The reverse non-implications are proved by counterexamples 
which are based on countably infinite sets of ur-elements in Mostowski models.) 


The model theory underlying these results has strong relevance to this book because the “mediate cardinal” 
gap between I-finite and IV-finite sets (which is mentioned in Remark 13.11.6) has a pervasive influence on 
topology, measure and integration, and mathematical analysis generally. (See Remark 7.11.13 for a sample 
of the consequences of not adopting the countable axiom of choice.) However, a presentation of model theory 
would very significantly expand the size of this book and unacceptably delay its publication. So it is out of 
scope. (See Remark 1.6.3 item (2).) 


13.11.2 THEOREM: Every non-empty set of subsets of a finite ordinal contains a minimal subset. 
Let $S$ be a non-empty set of subsets of $n$ for some $n \in \omega$. Then $S$ has a minimal element. In other words,
$\forall n \in \omega,\ \forall S \in \mathbb{P}(\mathbb{P}(n)) \setminus \{\emptyset\},\ \exists y \in S,\ \forall x \in S,\ (x \subseteq y \Rightarrow x = y)$.


PROOF: Let $n = 0 = \emptyset$. Then $\mathbb{P}(n) = \{\emptyset\}$ and $\mathbb{P}(\mathbb{P}(n)) = \{\emptyset, \{\emptyset\}\}$.
So $\mathbb{P}(\mathbb{P}(n)) \setminus \{\emptyset\} = \{\{\emptyset\}\}$. So the only possibility for $S$ is $\{\emptyset\}$.
Then $\emptyset$ is clearly the minimal element of $S$.

Assume that the proposition is true for $n = k \in \omega$. Let $n = k + 1 = k \cup \{k\} \in \omega$.
Let $S \in \mathbb{P}(\mathbb{P}(k+1)) \setminus \{\emptyset\}$. Then $T \in \mathbb{P}(k+1)$ for all $T \in S$. Suppose that
$T \in \mathbb{P}(k)$ for some $T \in S$. Let $S' = \{U \in S;\ U \subseteq k\}$. Then
$S' \in \mathbb{P}(\mathbb{P}(k)) \setminus \{\emptyset\}$. So $S'$ has a minimal element $U_0$ by the inductive hypothesis
for $n = k$. This is also minimal for all of $S$ because all elements $U$ of $S \setminus S'$ contain the
element $k$, which is not itself an element of $k$. So $U_0 \not\supseteq U$ for such $U$.


Now suppose $T \notin \mathbb{P}(k)$ for all $T \in S$. In other words, $k \in T$ for all $T \in S$.
Let $S' = \{U \cap k;\ U \in S\}$. Then $S' \in \mathbb{P}(\mathbb{P}(k)) \setminus \{\emptyset\}$. So $S'$ has a minimal
element $U_0$ by the inductive hypothesis for $n = k$. But then $U_0 \cup \{k\}$ is a minimal element of $S$
because $U \subseteq U_0 \cup \{k\}$ if and only if $U \cap k \subseteq U_0$ for all $U \in S$, since $k \in U$
for all such $U$.


Therefore in both cases, the proposition for $n = k + 1$ follows from the proposition for $n = k$. Hence the
proposition is true for all $n \in \omega$ by Theorem 12.2.12.


13.11.3 THEOREM: Every non-empty set of subsets of a finite ordinal contains a maximal subset. 
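Let $S$ be a non-empty set of subsets of $n$ for some $n \in \omega$. Then $S$ has a maximal element. In other words,
$\forall n \in \omega,\ \forall S \in \mathbb{P}(\mathbb{P}(n)) \setminus \{\emptyset\},\ \exists y \in S,\ \forall x \in S,\ (y \subseteq x \Rightarrow y = x)$.


PROOF: Let $n \in \omega$, and let $S \in \mathbb{P}(\mathbb{P}(n)) \setminus \{\emptyset\}$. Let $S' = \{n \setminus U;\ U \in S\}$.
Then $S' \in \mathbb{P}(\mathbb{P}(n)) \setminus \{\emptyset\}$. So $S'$ has a minimal element $U_0' \in S'$ by
Theorem 13.11.2. Let $U_0 = n \setminus U_0'$. Then $U_0$ is a maximal element of $S$ because $U_0 \subseteq U$ for
$U, U_0 \in \mathbb{P}(n)$ if and only if $(n \setminus U_0) \supseteq (n \setminus U)$.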
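
For very small finite ordinals, Theorems 13.11.2 and 13.11.3 can be confirmed by exhaustive search. The
following Python sketch is illustrative only. It models the ordinal $n$ as `range(n)`, represents subsets as
`frozenset` objects (for which `<` means proper inclusion), and examines every non-empty family of subsets.
The function names are hypothetical, and the search is only practical for $n \le 3$ or so, since there are
$2^{2^n} - 1$ families to examine.

```python
from itertools import combinations


def subsets(n):
    """All subsets of the finite ordinal n = {0, ..., n-1}, as frozensets."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def has_minimal(family):
    """True if some y in family has no proper subset x in family."""
    return any(all(not (x < y) for x in family) for y in family)


def has_maximal(family):
    """True if some y in family has no proper superset x in family."""
    return any(all(not (y < x) for x in family) for y in family)


def check(n):
    """Check every non-empty family of subsets of n for minimal and maximal elements."""
    subs = subsets(n)
    for mask in range(1, 2 ** len(subs)):
        family = [s for b, s in enumerate(subs) if (mask >> b) & 1]
        if not (has_minimal(family) and has_maximal(family)):
            return False
    return True


if __name__ == "__main__":
    print(all(check(n) for n in range(4)))
```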


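13.11.4 THEOREM: Equivalent test for finiteness which does not refer to ordinal numbers.
A set $A$ is equinumerous to a finite ordinal number if and only if every non-empty set of subsets of $A$ has a
maximal element with respect to the relation of set inclusion. In other words,

  \[ (\exists n \in \omega,\ \exists f : n \to A,\ f \text{ is a bijection})
     \Leftrightarrow
     (\forall S \in \mathbb{P}(\mathbb{P}(A)) \setminus \{\emptyset\},\ \exists y \in S,\ \forall x \in S,\ (y \subseteq x \Rightarrow y = x)). \]


PROOF: Suppose that $f : n \to A$ is a bijection for some $n \in \omega$. Let $S$ be a non-empty set of subsets of $A$.
Let $S' = \{f^{-1}(U);\ U \in S\}$. Then $S' \in \mathbb{P}(\mathbb{P}(n)) \setminus \{\emptyset\}$. So $S'$ has a maximal
element $U_0$ by Theorem 13.11.3. Therefore $f(U_0)$ is a maximal element of $S$ because $f$ is a bijection.


Now suppose that a set $A$ satisfies
$\forall S \in \mathbb{P}(\mathbb{P}(A)) \setminus \{\emptyset\},\ \exists y \in S,\ \forall x \in S,\ (y \subseteq x \Rightarrow y = x)$.
Define a set $S = \{y \in \mathbb{P}(A);\ \exists n \in \omega,\ \exists f : n \to y,\ f \text{ is a bijection}\}$. This is
the set of all subsets of $A$ which are equinumerous to some element $n$ of $\omega$. Then
$S \in \mathbb{P}(\mathbb{P}(A)) \setminus \{\emptyset\}$ because $\emptyset \in S$ and so $S \ne \emptyset$. Therefore
$\exists y \in S,\ \forall x \in S,\ (y \subseteq x \Rightarrow y = x)$. Let $y \in S$ be such an element, namely such that
$\forall x \in S,\ (y \subseteq x \Rightarrow y = x)$. Suppose that $A \setminus y \ne \emptyset$, and let $z \in A \setminus y$.
Let $x = y \cup \{z\}$. Then $x \in S$ because $x$ is equinumerous to $n + 1$ if $y$ is equinumerous to $n \in \omega$.
Also $y \subseteq x$. So $y = x$. This is a contradiction because $z \in x \setminus y$. Therefore
$A \setminus y = \emptyset$. But $y \in S$ and $S \subseteq \mathbb{P}(A)$. So $y \in \mathbb{P}(A)$, and so $y \subseteq A$.
Therefore $y = A$. So $A \in S$. So by the definition of $S$,
$\exists n \in \omega,\ \exists f : n \to A,\ f \text{ is a bijection}$.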


13.11.5 REMARK: A ZF set is finite if and only if it is I-finite. 

The particular advantage of Theorem 13.11.4 is that it gives a set-theoretic definition of finite sets without
referring to the very complicated set $\omega$ of finite ordinal numbers as a “yardstick”. It shows that the
yardstick style of definition of finite sets is equivalent to the I-finite definition in Remark 13.11.1. This
permits finiteness to be written as a relatively simple logical predicate, namely $A$ is finite if and only if

  \[ \forall S \in \mathbb{P}(\mathbb{P}(A)) \setminus \{\emptyset\},\ \exists y \in S,\ \forall x \in S,\ (y \subseteq x \Rightarrow y = x). \]

If the power-set expressions are expanded, this predicate becomes:

  \[ \forall S,\ \bigl(((S \ne \emptyset) \wedge (\forall z \in S,\ z \subseteq A)) \Rightarrow
     \exists y \in S,\ \forall x \in S,\ (y \subseteq x \Rightarrow y = x)\bigr). \]

This may appear somewhat complicated, but it does not require the existence of the infinite set $\omega$ to make
it work, and it does not require a definition of $\omega$ to be incorporated into the predicate.


13.11.6 REMARK: Mediate cardinals lie between finite sets and countably infinite sets. 

A large number of situations in topology where the axiom of choice is unconsciously used are cases where one 
assumes that all infinite sets contain an infinite sequence of distinct elements. This assumption is equivalent 
to saying that Dedekind-finiteness is the same as ordinary finiteness. In other words, it is the same as saying 
that all IV-finite sets are I-finite. It is known that the axiom of countable choice implies the equivalence of 
I-finite and IV-finite sets, but that the reverse implication cannot be proved in ZF set theory. ZF sets are 
therefore of three kinds as follows. 


(1) Finite sets. These are equinumerous to elements of $\omega$.
(2) Dedekind-infinite sets. These have subsets which are equinumerous to $\omega$.
(3) Mediate-cardinal sets. These are not finite and not Dedekind-infinite.


The terminology “mediate cardinal” is possibly confusing because in ZF+AC set theory, every cardinal 
number corresponds to some ordinal number, whereas in plain ZF set theory, there is no ordinal number 
with the same cardinality as a “mediate cardinal” set. This is in fact the origin of the name. (See Whitehead/ 
Russell, Volume II [401], page 288, and Remark 13.10.7.) 


Considering how important the notions of finiteness and infinity are to almost all areas of mathematics, it 
is highly inconvenient that the mediate cardinals exist in the shadows between the finite sets and $\omega$-infinite
sets which are most convenient to use in practice. It is small wonder, then, that so many mathematicians are 
willing to sweep these inconvenient sets under the carpet using the metaphysical instrument known as the 
axiom of choice. The non-AC alternative is to adopt the Dedekind definition for infinite cardinality, and to 
adopt the beta-cardinality concept in Section 13.4 in place of the ordinal numbers as cardinality yardsticks. 


13.11.7 REMARK: The Kuratowski definition of finite sets. 
In a 1920 paper, Kuratowski [424] defined a set $A$ to be finite when

  \[ \mathbb{P}(A) \setminus \{\emptyset\} = \bigcap \{ S \in \mathbb{P}(\mathbb{P}(A) \setminus \{\emptyset\});\
     (\forall x \in A,\ \{x\} \in S) \text{ and } (\forall X, Y \in S,\ X \cup Y \in S) \}. \tag{13.11.1} \]

In other words, $A$ is Kuratowski-finite when all of its non-empty subsets are generated from singletons by
pairwise union operations. (To be precise, Kuratowski actually expressed this by requiring
$\mathbb{P}(A) \setminus \{\emptyset\}$ to be the unique subset $S$ of $\mathbb{P}(A) \setminus \{\emptyset\}$ which satisfies the
conditions $\forall x \in A,\ \{x\} \in S$ and $\forall X, Y \in S,\ X \cup Y \in S$.
But this is clearly equivalent to line (13.11.1).)
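
For a small finite set, the Kuratowski condition can be checked directly by generating the family of sets
obtained from singletons by repeated pairwise unions. The following Python sketch is an illustration only.
It computes this family by iterating to a fixed point, which for a finite set gives the smallest family
closed under the two conditions, and compares it with the family of all non-empty subsets. The names
`generated_by_unions` and `nonempty_subsets` are hypothetical.

```python
from itertools import combinations


def generated_by_unions(a):
    """Smallest family containing the singletons of a and closed under pairwise union."""
    family = {frozenset([x]) for x in a}
    while True:
        new = {x | y for x in family for y in family} - family
        if not new:                    # nothing new: the family is closed
            return family
        family |= new


def nonempty_subsets(a):
    """All non-empty subsets of the finite set a, as frozensets."""
    items = list(a)
    return {frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)}


if __name__ == "__main__":
    a = {1, 2, 3}
    print(generated_by_unions(a) == nonempty_subsets(a))
```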


Kuratowski showed that this definition is equivalent to the standard definition (i.e. equinumerosity to an
element of ω) by a simple induction argument to show that finite sets are Kuratowski-finite (similarly to the
proof given here for Theorem 13.11.2), and by examining the set of all finite subsets of A to show that
Kuratowski-finite sets are finite (similarly to the second half of the proof given here for Theorem 13.11.4).


Finite sets are equivalent in ZF to I-finite and Kuratowski-finite sets. However, Tarski’s I-finite definition 
seems to be easier to interpret and manipulate than Kuratowski's. Therefore the Kuratowski definition is 
not explained further here. 


In 1924, Tarski [434], pages 48-58, showed that I-finiteness is equivalent to Kuratowski-finiteness, which 
consequently implied that I-finiteness is equivalent to the standard numerical finiteness. 


13.11.8 REMARK: A set is finite if and only if the power set of its power set is Dedekind-finite.
It was shown in 1912 by Whitehead/Russell, Volume II [401], page 288, that a set is finite if and only if the
power set of its power set is Dedekind-finite. (This equivalence was stated in more modern language in 1924
by Tarski [434], pages 73-74.) This gives an interesting progression of finiteness concepts as follows.


name         definition or equivalent criterion

I-finite     ℙ(ℙ(A)) is Dedekind-finite.
III-finite   ℙ(A) is Dedekind-finite.
IV-finite    A is Dedekind-finite.


13.12. Cardinality-constrained power sets 


13.12.1 REMARK: Some special subsets of general power sets with finite bounds.
Notations 13.12.2, 13.12.5 and 13.12.6 are refinements of the power set notation ℙ(X) in Definition 7.2.4 (5).
(Note that ℙ_n^m(X) = ∅ if m < n.)

A constraint of the form “#(S) ≤ m”, for m ∈ ω, is shorthand for “∃i ∈ ω, (i ≤ m and Bij(i, S) ≠ ∅)”,
where the set-of-bijections notation “Bij” is introduced in Notation 10.5.25 (iii). This is equivalent to the
proposition “∃i ∈ m⁺, Bij(i, S) ≠ ∅”.


13.12.2 NOTATION:
ℙ^m(X), for a set X and m ∈ ω, denotes the set {S ∈ ℙ(X); #(S) ≤ m}.
ℙ_n(X), for a set X and n ∈ ω, denotes the set {S ∈ ℙ(X); n ≤ #(S)}.
ℙ_n^m(X), for a set X and m, n ∈ ω, denotes the set {S ∈ ℙ(X); n ≤ #(S) and #(S) ≤ m}.
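
For a finite set X, the three notations above can be enumerated directly. The Python sketch below is an illustration only; the function name and its parameters are hypothetical, and the value m=None is an assumed convention for "no upper bound".

```python
from itertools import combinations

def constrained_power_set(x, n=0, m=None):
    """Subsets S of the finite set x with n <= #(S) <= m (m=None means no upper bound)."""
    items = list(x)
    upper = len(items) if m is None else min(m, len(items))
    # range(n, upper + 1) is empty when m < n, so the result is then the empty family.
    return {frozenset(c) for r in range(n, upper + 1)
            for c in combinations(items, r)}

X = {1, 2, 3}
at_most_two = constrained_power_set(X, m=2)        # subsets with #(S) <= 2
at_least_one = constrained_power_set(X, n=1)       # subsets with 1 <= #(S)
one_or_two = constrained_power_set(X, n=1, m=2)    # subsets with 1 <= #(S) <= 2
```

The case m < n yields the empty family, in agreement with the note in Remark 13.12.1.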


13.12.3 REMARK: Some special subsets of general power sets with infinite bounds.
Notation 13.12.5 uses the infinity symbol “∞” to extend Notation 13.12.2 to subsets S of X which have
unbounded finite cardinality. Notation 13.12.6 uses the symbol “ω” to include subsets S of X which are
countably infinite. (See Definition 13.7.6 for countably infinite sets.)

A constraint of the form “#(S) < ∞” is shorthand which means “∃i ∈ ω, Bij(i, S) ≠ ∅”. A constraint of
the form “#(S) ≤ #(ω)” is shorthand which means “∃i ∈ ω⁺, Bij(i, S) ≠ ∅”. A constraint of the form
“#(S) = #(ω)” is shorthand which means “Bij(ω, S) ≠ ∅”.


13.12.4 REMARK: Distinction between “potential infinite” ∞ and “actual infinite” ω.
The distinction between the symbols “∞” and “ω” in Notations 13.12.5 and 13.12.6 is in accordance with
Cantor’s 1883 explanation of the difference between them. (See Remark 12.1.32.) Cantor used “∞” to
mean “potentially infinite” and “ω” to mean “actually infinite”. (These terms have their origin in Aristotle’s
philosophical doctrines on the infinite and infinitesimal, which were accepted throughout the European Dark
Ages and even into the 19th century, which Cantor then had to contend with. Aristotle said that potential
infinity was permissible, but actual infinity was not. See for example Boyer [235], pages 40, 295-298.)

Thus the cardinality upper bound “∞” in Notation 13.12.5 indicates that the cardinality runs over all finite
values, whereas the cardinality upper bound “ω” in Notation 13.12.6 indicates that the cardinality includes
all finite values and also the “actually infinite” value #(ω). A “potential infinity” is never reached, as for
example in a limit such as lim_{i→∞} a_i or a sum such as ∑_{n=0}^∞ x^n, but an “actual infinity” can be reached.


13.12.5 NOTATION:
ℙ^∞(X), for a set X, denotes the set {S ∈ ℙ(X); #(S) < ∞}.
ℙ_n^∞(X), for a set X and n ∈ ω, denotes the set {S ∈ ℙ(X); n ≤ #(S) and #(S) < ∞}.


13.12.6 NOTATION:
ℙ^ω(X), for a set X, denotes the set {S ∈ ℙ(X); #(S) ≤ #(ω)}.
ℙ_ω(X), for a set X, denotes the set {S ∈ ℙ(X); #(ω) ≤ #(S)}.
ℙ_n^ω(X), for a set X and n ∈ ω, denotes the set {S ∈ ℙ(X); n ≤ #(S) and #(S) ≤ #(ω)}.
ℙ_ω^ω(X), for a set X, denotes the set {S ∈ ℙ(X); #(S) = #(ω)}.


13.12.7 REMARK: Applications and properties of cardinality-constrained power sets. 
Constrained power sets of the form ℙ_1^∞(X) are useful for topology. Some elementary general properties are
given in Theorem 13.12.8. 


13.12.8 THEOREM: Some properties of the set of all non-empty finite subsets of a given set.
(i) ℙ_1^∞(∅) = ∅.
(ii) ℙ_1^∞({A}) = {{A}} for any set A.
(iii) ∀A ∈ X, {A} ∈ ℙ_1^∞(X) for any set X.
(iv) ℙ_1^∞(X) = ℙ(X) \ {∅} for any finite set X.
(v) X ⊆ Y ⇒ ℙ_1^∞(X) ⊆ ℙ_1^∞(Y) for any sets X, Y.
(vi) ⋃(ℙ_1^∞(X)) = X for any set X.
(vii) X ⊆ ℙ_1^∞(Y) ⇒ ⋃X ⊆ Y for any sets X, Y.
(viii) ∀C ∈ ℙ_1^∞(ℙ_1^∞(X)), ⋃C ∈ ℙ_1^∞(X) for any set X.
(ix) ∀C ∈ ℙ_1^∞(ℙ_1^∞(X)), ⋂C ∈ ℙ^∞(X) for any set X.
(x) {⋃C; C ∈ ℙ_1^∞(ℙ_1^∞(X))} = ℙ_1^∞(X) for any set X.
(xi) {⋂C; C ∈ ℙ_1^∞(ℙ_1^∞(X))} = ℙ^∞(X) for any set X with #(X) ≥ 2.
(xii) {⋂C; C ∈ ℙ_1^∞(ℙ_1^∞(X))} = ℙ_1^∞(X) for any set X with #(X) ≤ 1.

PROOF: To show part (i), let A ∈ ℙ_1^∞(∅). Then A ⊆ ∅. So A = ∅. So #(A) = 0. This contradicts 1 ≤ #(A).
Hence ℙ_1^∞(∅) = ∅.

To show part (ii), let C ∈ ℙ_1^∞({A}). Then C ⊆ {A}. So C = ∅ or C = {A}. But #(C) ≥ 1. So C = {A}.
Hence ℙ_1^∞({A}) = {{A}}.

To show part (iii), let X be a set and let A ∈ X. Then {A} ⊆ X and #({A}) = 1. Therefore {A} ∈ ℙ_1^∞(X).

To show part (iv), let X be a finite set. Then A ∈ ℙ_1^∞(X) if and only if A ∈ ℙ(X) and 1 ≤ #(A) < ∞. So
A ∈ ℙ_1^∞(X) if and only if A ∈ ℙ(X) and A ≠ ∅. Hence A ∈ ℙ_1^∞(X) if and only if A ∈ ℙ(X) \ {∅}.

To show part (v), let X, Y be sets with X ⊆ Y. Let A ∈ ℙ_1^∞(X). Then A ⊆ X and 1 ≤ #(A) < ∞.
Therefore A ⊆ Y and 1 ≤ #(A) < ∞. Hence A ∈ ℙ_1^∞(Y).

For part (vi), let X be a set. Then ⋃(ℙ_1^∞(X)) ⊆ ⋃(ℙ(X)) by Theorem 8.5.2 (iii) because ℙ_1^∞(X) ⊆ ℙ(X).
But ⋃(ℙ(X)) = X by Theorem 8.5.2 (v). So ⋃(ℙ_1^∞(X)) ⊆ X. Now let A ∈ X. Then {A} ∈ ℙ_1^∞(X) by
part (iii). So {A} ⊆ ⋃(ℙ_1^∞(X)) by Theorem 8.4.8 (xiv). So A ∈ ⋃(ℙ_1^∞(X)). Therefore X ⊆ ⋃(ℙ_1^∞(X)).
Hence ⋃(ℙ_1^∞(X)) = X.

To show part (vii), let X, Y be sets such that X ⊆ ℙ_1^∞(Y). Let A ∈ ⋃X. Then ∃C, (C ∈ X ∧ A ∈ C).
So ∃C, (C ∈ ℙ_1^∞(Y) ∧ A ∈ C). So ∃C, (C ⊆ Y ∧ A ∈ C). So ∃C, A ∈ Y. So A ∈ Y. Hence ⋃X ⊆ Y.
(Alternatively, part (vii) follows from Theorem 8.5.2 (ix) since ℙ_1^∞(Y) ⊆ ℙ(Y).)

To show part (viii), let X be a set. Let C ∈ ℙ_1^∞(ℙ_1^∞(X)). Then C ⊆ ℙ_1^∞(X) and so ⋃C ⊆ X by part (vii).
But 1 ≤ #(C) < ∞, and 1 ≤ #(A) < ∞ for all A ∈ C. So 1 ≤ #(⋃C) < ∞. Therefore ⋃C ∈ ℙ_1^∞(X).

To show part (ix), let X be a set. Let C ∈ ℙ_1^∞(ℙ_1^∞(X)). Then C ⊆ ℙ_1^∞(X) and so ⋃C ⊆ X by part (vii).
But ⋂C ⊆ ⋃C because C ≠ ∅. So ⋂C ⊆ X. Now 1 ≤ #(C) < ∞, and 1 ≤ #(A) < ∞ for all A ∈ C.
So #(⋂C) < ∞. Therefore ⋂C ∈ ℙ^∞(X).

To show part (x), let X be a set. Let Z = {⋃C; C ∈ ℙ_1^∞(ℙ_1^∞(X))}. Then Z ⊆ ℙ_1^∞(X) by part (viii).
Let A ∈ ℙ_1^∞(X). Then {A} ∈ ℙ_1^∞(ℙ_1^∞(X)). So A = ⋃{A} ∈ Z. So Z ⊇ ℙ_1^∞(X). Hence Z = ℙ_1^∞(X).

To show part (xi), let X be a set with #(X) ≥ 2. Let Z = {⋂C; C ∈ ℙ_1^∞(ℙ_1^∞(X))}. Then Z ⊆ ℙ^∞(X) by
part (ix). Let D ∈ ℙ_1^∞(X). Then {D} ∈ ℙ_1^∞(ℙ_1^∞(X)). So D = ⋂{D} ∈ Z. So Z ⊇ ℙ_1^∞(X). Let A, B ∈ X
with A ≠ B. Then {A}, {B} ∈ ℙ_1^∞(X) and so {{A}, {B}} ∈ ℙ_1^∞(ℙ_1^∞(X)). So ∅ = ⋂{{A}, {B}} ∈ Z. Therefore
Z ⊇ ℙ_1^∞(X) ∪ {∅} = ℙ^∞(X). Hence Z = ℙ^∞(X).

To show part (xii), let X be a set with #(X) ≤ 1. Let Z = {⋂C; C ∈ ℙ_1^∞(ℙ_1^∞(X))}. If #(X) = 0, then
X = ∅ and ℙ_1^∞(X) = ∅ and ℙ_1^∞(ℙ_1^∞(X)) = ∅ and Z = ∅ = ℙ_1^∞(X). If #(X) = 1, then X = {A} for
some A. So ℙ_1^∞(X) = {{A}} and ℙ_1^∞(ℙ_1^∞(X)) = {{{A}}} and Z = {{A}} = ℙ_1^∞(X).
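
Parts (x), (xi) and (xii) of Theorem 13.12.8 can be spot-checked by brute force on very small sets. The Python sketch below is an illustration under the assumption that X is a small finite set, so that ℙ_1^∞(X) is simply the family of non-empty subsets; the function names are hypothetical. It is a sanity check of the statements, not a substitute for the proof.

```python
from functools import reduce
from itertools import combinations

def p1(x):
    """Non-empty subsets of the finite collection x, as frozensets."""
    items = list(x)
    return {frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)}

def unions(x):
    """The family of unions of C, for C a non-empty family of non-empty subsets of x."""
    return {frozenset().union(*c) for c in p1(p1(x))}

def intersections(x):
    """The family of intersections of C, for C a non-empty family of non-empty subsets of x."""
    return {reduce(frozenset.intersection, c) for c in p1(p1(x))}

X = {1, 2, 3}
all_finite_subsets = p1(X) | {frozenset()}   # plays the role of the family of all finite subsets of X
```

With three elements in X there are 127 families C to examine, so the check is immediate. The cost grows doubly exponentially with #(X), which is why only tiny examples are practical.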
14.1. Natural numbers


14.1.1 REMARK: Survey of natural number definitions and notations. 
Some definitions and notations for natural numbers are summarised in Table 14.1.1. 


Table 14.1.1 suggests that natural numbers typically commence with 0 in logic and set theory contexts,
whereas in general mathematics they typically commence with 1. The blackboard-bold notation ℕ seems
to be the most modern. (The use of blackboard bold font for special sets like ℕ, ℤ, ℚ, ℝ and ℂ signifies
that these are fundamental sets. This frees up the corresponding letters N, Z, Q, R and C for general use.)
When an author uses the lower-case omega symbol “ω” for the natural numbers, this is usually because they
are identifying natural numbers with finite ordinal numbers. In this case, the natural numbers automatically
commence with 0 rather than 1.


Natural numbers and finite ordinal numbers are two different systems. Natural numbers have arithmetic 
operations such as addition and multiplication, whereas the main purpose of ordinal numbers is to represent 
equivalence classes of well-ordered sets. Ordinal numbers have been represented as ZF sets only since 
von Neumann’s 1923 paper. (See von Neumann [436].) Earlier, ordinal numbers were some kind of abstract 
essence of the equivalence classes of well-orderings. It is probably best to keep the two concepts distinct. 


14.1.2 REMARK: Arguments in favour of starting the natural numbers with 1. 

Some authors define ℕ to be the same as ω. This would be a redundancy in both a definition and a
notation. The concept of “natural numbers” is arguably the set of counting numbers which have been used 
for thousands of years, which does not include zero because until the last few centuries, zero seemed like an 
unnatural and unnecessary concept. One does not count three cows in a field as “zero cows, one cow, two 
cows”. One does not number three bullet points in a presentation as points 0, 1 and 2. One does not number 
the houses on a street starting from zero. The years of the Gregorian and Julian calendars start with 1, 
omitting 0 so that 1BC is followed by 1AD. The months of a year and the days of a month start with 1. The 
hours of the 12-hour clock start with 1 (or 12), not 0. Even in modern mathematics, one does not generally
refer to the coordinates of a three-dimensional vector v as (v_0, v_1, v_2). One generally numbers the rows and
columns of matrices starting with 1. Therefore the natural numbers are defined to be the positive integers
in this book.

year  reference                               definition       notation

1888  Dedekind [352], page 68                 {1,2,3,...}      N
1918  Weyl [156], page 15                     {1,2,3,...}
1919  Fraenkel [354], page 4                  {1,2,3,...}
1919  Russell [389], page 3                   {0,1,2,3,...}
1931  Gödel [356], page 63                    {0,1,2,3,...}
1940  Quine [380], pages 237, 241             {0,1,2,3,...}    Nn
1941  Tarski [398], page 80                   {0,1,2,3,...}
1944  Church [348], page 176                  {0,1,2,3,...}
1946  Graves [85], pages 18-19                {1,2,3,...}      M
1952  Kleene [365], page 19                   {0,1,2,3,...}
1952  Wilder [403], pages 46, 64              {1,2,3,...}      N
1953  Rosser [387], page 275                  {1,2,3,...}
1953  Tarski [399], page 41                   {0,1,2,3,...}
1954  Carnap [345], page 163                  {0,1,2,3,...}
1955  Allendoerfer/Oakley [48], page 58       {1,2,3,...}
1957  Suppes [394], page 166                  {1,2,3,...}
1958  Bernays [341], pages 89, 148            {0,1,2,3,...}    ω
1958  Eves [353], page 184                    {1,2,3,...}      N
1960  Halmos [357], page 44                   {0,1,2,3,...}    ω
1960  Körner [461], page 30                   {0,1,2,3,...}
1960  Körner [461], page 80                   {1,2,3,...}
1960  Suppes [395], page 135                  {0,1,2,3,...}    ω
1963  Quine [382], page 74                    {0,1,2,3,...}    N
1963  Simmons [137], page 33                  {1,2,3,...}      N
1963  Stoll [393], pages 56, 57               {0,1,2,3,...}    N
1964  Gelbaum/Olmsted [78], page 10           {1,2,3,...}      N
1964  E. Mendelson [370], page 102            {0,1,2,3,...}
1965  MacLane/Birkhoff [110], page 15         {0,1,2,3,...}    N
1965  S. Warner [155], pages 3, 117           {0,1,2,3,...}    N
1966  Cohen [349], pages 56, 62               {0,1,2,3,...}    ω
1967  Kleene [366], page 176                  {0,1,2,3,...}
1967  Shoenfield [390], page 249              {0,1,2,3,...}    ω
1967  Spivak [140], page 21                   {1,2,3,...}      N
1968  Curtis [65], page 11                    {1,2,3,...}      N
1968  Rosenlicht [128], pages 10, 21          {1,2,3,...}
1969  Bell/Slomson [339], page 7              {0,1,2,3,...}    N
1969  Robbin [384], page 66                   {0,1,2,3,...}    N
1971  Friedman [74], pages 3-4                {1,2,3,...}
1971  Kane et alii [98], page 3               {1,2,3,...}      N
1971  Pinter [377], page 125                  {0,1,2,3,...}    ω
1971  Shilov [134], page 2                    {1,2,3,...}
1973  Chang/Keisler [347], page 582           {0,1,2,3,...}    ω
1973  Jech [364], page 20                     {0,1,2,3,...}    ω
1979  Lévy [368], page 56                     {0,1,2,3,...}    ω
1982  Moore [371], page xxiii                 {0,1,2,3,...}    ℕ
1990  Roitman [385], page 2                   {0,1,2,3,...}    N
1993  Ben-Ari [340], page 327                 {0,1,2,3,...}    𝒩
1993  EDM2 [113], 294.B, page 1100            {1,2,3,...}      N
1996  Smullyan/Fitting [392], page 32         {0,1,2,3,...}    ω
1998  Mattuck [114], page 399                 {1,2,3,...}      N
2004  Szekeres [305], page 13                 {1,2,3,...}      ℕ
2005  Penrose [297], page xxvii               {0,1,2,3,...}    N
      Kennington                              {1,2,3,...}      ℕ

Table 14.1.1 Survey of definitions and notations for “natural numbers”


Probably a better term for the positive integers would be the “counting numbers”, which is essentially
how they are named in German. The German noun “Zahl” literally means “count”, closely related to the
verb “zählen” meaning “to count”. These words are derived from a word which meant a notch or incision,
suggesting the use of notches to keep count. (See Wahrig [484], page 1452.) According to White [485],
page 409, the Latin word “numerus”, from which the English word “number” is derived, is akin to the Greek
word “νέμω”, which means “to divide” or “to distribute”. (See Morwood/Taylor [480], page 219.) One
cannot divide a thing into zero parts! Some people claim that zero is a natural number. If zero is so natural,
why did it take so long to invent?


14.1.3 REMARK: The nostalgic value of the Peano axioms for natural numbers. 

The Peano natural number system axioms were introduced before the axiomatisation of set theory in the 
early 20th century. (Cantor did not formally axiomatise his set theory.) Although Peano did give a list of 
axioms for the foundations of mathematics, his axiomatisation of classes, functions and numbers expressed 
a naive form of set theory without existence axioms, in particular no axiom of infinity. Although it may 
appear at first sight that arithmetic may be axiomatised without an underlying set theory, the language of 
sets must be used explicitly or implicitly in any such axiomatisation, and when the naive approach to sets 
encounters contradictions, as it did, replacement of naive sets with a more formal theory becomes inevitable. 
Definition 14.1.5 is stated within the framework of ZF set theory. This may seem like a historical anachronism,
but the alternative is to use a naive set theory instead. The idea that the Peano axioms form a kind of 
alternative approach to numbers not based on set theory is illusory. 


Nowadays the main value of the Peano axioms is nostalgia for a simpler time when set theory was naive 
and informal, and the paradoxes and urgency to resolve issues of consistency and independence through 
axiomatisation and model theory were over the horizon. It would be easy to declare that the natural numbers 
are nothing other than the positive finite ordinal numbers, but there is some historical and theoretical interest 
in approaching natural numbers from the perspective of the Peano axioms. This approach is demonstrated 
in Section 14.1, but it is not an essential part of the modern approach to numbers. 


14.1.4 REMARK: Origin of the Peano axioms for the natural numbers. 

Peano presented 9 axioms (in Latin) for the natural numbers in 1889. (See Peano [375], page 1.) In 1891, he 
presented the better-known set of 5 axioms (in Italian) for the natural numbers. (See Peano [431], page 90.) 
In more modern notation, these 5 axioms are presented in Definition 14.1.5. (These axioms are also presented,
amongst many others, by Graves [85], pages 18-23; Halmos [357], page 46; E. Mendelson [370], page 102;
EDM2 [113], 294.B, page 1104; Stoll [393], pages 58-59; Russell [389], pages 5-6.)

Peano possibly had in mind Newton’s “Philosophiae naturalis principia mathematica” [294] when he chose the
language and title for his own “Arithmetices principia” [375] where he presented in 36 pages an axiomatisation
of propositional calculus, classes (i.e. set theory), functions, natural number arithmetic, rational number 
arithmetic, real numbers and limits. This seems to have been an attempt to repeat in the area of mathematics 
what had been achieved 200 years earlier in physics and 2200 years earlier in geometry. 


14.1.5 DEFINITION: The Peano axioms for the natural numbers.
A system of natural numbers is a tuple (ℕ, s, 1) such that the set ℕ and function s satisfy the following.
(1) 1 ∈ ℕ.
(2) s(ℕ) ⊆ ℕ.
(3) ∀x, y ∈ ℕ, (s(x) = s(y) ⇒ x = y).
(4) 1 ∉ s(ℕ).
(5) ∀X, ((1 ∈ X and s(X) ⊆ X) ⇒ ℕ ⊆ X).

The function s is called the successor function of the system of natural numbers ℕ.
For any element n ∈ ℕ, the element s(n) is called the successor of n in ℕ.


14.1.6 REMARK: The choice of initial number for the Peano axioms. 
Peano used 1 as the first integer in his axiomatic system for the natural numbers. He apparently did not use 
the term “natural numbers”. He called them merely positive whole numbers (i.e. “numeri interi e positivi”
in Italian, and “numerus integer positivus” in Latin). Nowadays, zero is often used as the first integer. If no 
arithmetic operations are applied, the choice of initial integer makes no real difference to Definition 14.1.5. 


14.1.7 REMARK: The principle of mathematical induction is one of the Peano axioms. 
Condition (5) in Definition 14.1.5 is known as the “principle of mathematical induction”. The property of 
the same name for finite ordinal numbers is proved in Theorem 12.2.12. 


14.1.8 REMARK: All systems of natural numbers are isomorphic. At least one “model” exists in ZF. 

In order to speak of the (system of) natural numbers, it must be shown that all systems of natural numbers 
are isomorphic and that there is only one isomorphism between each pair of systems of natural numbers. 
Then each element of any such system may be unambiguously identified with a unique element of each other 
system. This is shown in Theorem 14.1.16 (i). But first it must be shown that at least one system of natural 
numbers exists within ZF set theory. This is shown in Theorem 14.1.9. 


14.1.9 THEOREM: Existence of a ZF “model” for the Peano natural number system axioms.
Let ℕ = ω \ {∅} and 1 = {∅}, and define s : ℕ → ℕ by s(x) = x ∪ {x} for all x ∈ ℕ. Then (ℕ, s, 1) is a
system of natural numbers.


PROOF: For Definition 14.1.5 (1), ∅ ∈ ω by Theorem 12.1.30 (ii). Therefore {∅} ∈ ω by Theorem 12.1.30 (iii).
But {∅} ≠ ∅ because ∅ ∉ ∅. Hence {∅} ∈ ω \ {∅} = ℕ.

For Definition 14.1.5 (2), s(ℕ) ⊆ ℕ by Theorem 12.2.20 (i).

For Definition 14.1.5 (3), ∀x, y ∈ ℕ, (s(x) = s(y) ⇒ x = y) by Theorem 12.2.20 (v).

For Definition 14.1.5 (4), suppose that 1 ∈ s(ℕ). Then {∅} = x ∪ {x} for some x ∈ ℕ = ω \ {∅}. So x ∈ {∅}
because x ∈ x ∪ {x}. Therefore x = ∅, which contradicts the assumption that x ∈ ω \ {∅}. Hence 1 ∉ s(ℕ).

For Definition 14.1.5 (5), let X be a set which satisfies 1 ∈ X and s(X) ⊆ X. Let S = (X ∪ {∅}) ∩ ω. Then
S ⊆ ω, and ∅ ∈ S because ∅ ∈ ω. Now let x ∈ S. Then x ∈ ω and so x ∪ {x} ∈ ω by Theorem 12.1.30 (iii).
If x = ∅, then x ∪ {x} = 1 ∈ X, and so x ∪ {x} ∈ S. If x ≠ ∅, then x ∈ X ∩ ω and so x ∪ {x} ∈ ω and
x ∪ {x} ∈ X since s(X) ⊆ X. So x ∪ {x} ∈ S. Thus ∀x ∈ S, x ∪ {x} ∈ S. Therefore S = ω by the principle
of mathematical induction, Theorem 12.2.12. So ω ⊆ X ∪ {∅}. Therefore ω \ {∅} ⊆ X by Theorem 8.2.6 (v).
Thus ℕ ⊆ X. Hence ∀X, ((1 ∈ X ∧ s(X) ⊆ X) ⇒ ℕ ⊆ X).
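
The model in Theorem 14.1.9 can be made concrete for a finite fragment of ω by representing finite ordinals as nested frozensets. The following Python sketch is an illustration only: the names `succ`, `omega_fragment` and the bound `K` are hypothetical, and a finite fragment can exhibit conditions (1), (3) and (4) of Definition 14.1.5 but cannot, of course, establish the induction condition (5).

```python
def succ(x):
    """Von Neumann successor: x ∪ {x}."""
    return x | frozenset([x])

EMPTY = frozenset()     # the ordinal 0
ONE = succ(EMPTY)       # the set {∅}, which plays the role of 1

def omega_fragment(k):
    """The first k finite ordinals 0, 1, ..., k-1 as nested frozensets."""
    out, x = [], EMPTY
    for _ in range(k):
        out.append(x)
        x = succ(x)
    return out

K = 8                                             # assumed size of the fragment
ordinals = omega_fragment(K)
naturals = [x for x in ordinals if x != EMPTY]    # fragment of ω \ {∅}
successors = [succ(x) for x in naturals]          # fragment of s(ℕ)
```

In this fragment, `ONE` lies in `naturals` but not in `successors`, and the list `successors` has no repeated entries, which matches conditions (1), (4) and (3) respectively.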


14.1.10 REMARK: Defining “leading intervals” of a natural number system. 
Since no order relation is defined on a natural number system, it is not easy to define “leading intervals”,
meaning the intervals [1, x] for x ∈ ℕ. These intervals must be defined inductively, as in Theorem 14.1.11.


14.1.11 THEOREM: Inductive construction of “leading intervals”, based on the Peano axioms.
Let (ℕ, s, 1) be a system of natural numbers.

(i) ∀x ∈ ℕ, s(x) ≠ x.

(ii) ∀x ∈ ℕ, ∃I ∈ ℙ(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)).
In other words, for all x ∈ ℕ, there exists at least one I ∈ ℙ(ℕ) which satisfies x ∈ I and s(x) ∉ I and
∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).

(iii) Let I ⊆ ℕ satisfy 1 ∈ I and s(1) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Then I = {1}.

(iv) Let x ∈ ℕ and I ⊆ ℕ satisfy s(x) ∈ I and s(s(x)) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Then
I⁻ = I \ {s(x)} satisfies x ∈ I⁻ and s(x) ∉ I⁻ and ∀y ∈ I⁻ \ {1}, ∃z ∈ I⁻, y = s(z).

(v) ∀x ∈ ℕ, ∃′I ∈ ℙ(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)).
In other words, for all x ∈ ℕ, there is one and only one set I ∈ ℙ(ℕ) which satisfies x ∈ I and s(x) ∉ I
and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).


PROOF: For part (i), let X = {x ∈ ℕ; s(x) ≠ x}. Then 1 ∈ X by Definition 14.1.5 (4) because s(1) ∈ s(ℕ).
Let x ∈ X. Then s(x) ≠ x. So s(s(x)) ≠ s(x) by Definition 14.1.5 (3). So s(x) ∈ X. Thus s(X) ⊆ X.
So ℕ ⊆ X by Definition 14.1.5 (5). Therefore X = ℕ. Hence ∀x ∈ ℕ, s(x) ≠ x.

For part (ii), let X = {x ∈ ℕ; ∃I ∈ ℙ(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z))}. Let
x = 1 and I = {1}. Then I ∈ ℙ(ℕ) and x ∈ I and s(x) ∉ I because 1 ∉ s(ℕ) by Definition 14.1.5 (4), and
∀y ∈ I \ {1}, ∃z ∈ I, y = s(z) is satisfied because I \ {1} = ∅. Therefore 1 ∈ X.

Let x ∈ X. Then there exists I ∈ ℙ(ℕ) such that x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).
Let I⁺ = I ∪ {s(x)}. Then s(x) ∈ I⁺. Suppose that s(s(x)) ∈ I⁺. Then s(s(x)) ∈ I since s(s(x)) ≠ s(x)
by part (i). Let y = s(s(x)). Then y = s(z) for some z ∈ I. But z ≠ s(x) because s(x) ∉ I. By
Definition 14.1.5 (3), s(x) = z because s(s(x)) = s(z), which is a contradiction. So s(s(x)) ∉ I⁺. Let
y ∈ I⁺ \ {1}. Then y ∈ I \ {1} or y = s(x). If y ∈ I \ {1}, then ∃z ∈ I, y = s(z). If y = s(x), then the
proposition ∃z ∈ I, y = s(z) is validated by z = x. Thus s(x) ∈ X. So ℕ ⊆ X by Definition 14.1.5 (5).
Therefore X = ℕ. Hence for all x ∈ ℕ, there is at least one set I ∈ ℙ(ℕ) which satisfies x ∈ I and s(x) ∉ I
and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).

For part (iii), assume that I ⊆ ℕ and 1 ∈ I and s(1) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Define
X = {y ∈ ℕ; y = 1 or y ∉ I}. Then X ⊆ ℕ and 1 ∈ X. (It must now be shown that ∀y ∈ X, s(y) ∈ X.)
Let y ∈ X. If y = 1, then s(y) ∈ X because s(1) ∉ I. So suppose that y ∈ X and y ≠ 1. Then y ∉ I by
the definition of X. Suppose that s(y) ∈ I. Then s(y) = s(z) for some z ∈ I by the assumption on I. But
then y = z by Definition 14.1.5 (3). So y ∈ I, which is a contradiction. Therefore s(y) ∉ I. So s(y) ∈ X.
Therefore s(X) ⊆ X. So ℕ ⊆ X by Definition 14.1.5 (5). So ∀y ∈ I, y = 1 because I ⊆ ℕ. Hence I = {1}.

For part (iv), let x ∈ ℕ, I ⊆ ℕ, s(x) ∈ I, s(s(x)) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Let I⁻ = I \ {s(x)}.
By Definition 14.1.5 (4), s(x) ≠ 1. So s(x) ∈ I \ {1}. Therefore ∃z ∈ I, s(x) = s(z). So x = z by
Definition 14.1.5 (3). So x ∈ I. But x ≠ s(x) by part (i). Therefore x ∈ I⁻. Note also that s(x) ∉ I⁻.

To show that ∀y ∈ I⁻ \ {1}, ∃z ∈ I⁻, y = s(z), let y ∈ I⁻ \ {1}. Then y ∈ I \ {1} and y ≠ s(x). So y = s(z)
for some z ∈ I. To show that z ∈ I⁻, it must be shown that z ≠ s(x). So suppose that z = s(x). Then
y = s(s(x)), and so s(s(x)) ∈ I because y ∈ I. But this contradicts the assumption s(s(x)) ∉ I. So z ≠ s(x),
and so z ∈ I⁻. Hence ∀y ∈ I⁻ \ {1}, ∃z ∈ I⁻, y = s(z).

For part (v), let Y = {x ∈ ℕ; ∃′I ∈ ℙ(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z))}. (In
other words, let Y be the set of x ∈ ℕ for which the relevant set I ∈ ℙ(ℕ) is unique.) For x ∈ ℕ, let
Z_x = {I ∈ ℙ(ℕ); x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)}. Then ∀x ∈ ℕ, Z_x ≠ ∅ by part (ii).
Let I, I′ ∈ Z_1. Then I = {1} = I′ by part (iii). So 1 ∈ Y.

Let x ∈ Y. Let I, Ī ∈ Z_{s(x)}. Let I⁻ = I \ {s(x)} and Ī⁻ = Ī \ {s(x)}. Then I⁻, Ī⁻ ∈ Z_x by part (iv). So
I⁻ = Ī⁻ because of the inductive assumption that Z_x contains only one set. So I = Ī because s(x) is an
element of both I and Ī. Therefore Z_{s(x)} contains only one set. Thus ∀x ∈ ℕ, (x ∈ Y ⇒ s(x) ∈ Y).
Since 1 ∈ Y, it follows from Definition 14.1.5 (5) that ℕ ⊆ Y.
Hence ∀x ∈ ℕ, ∃′I ∈ ℙ(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)).

14.1.12 REMARK: Definition of leading intervals of natural numbers.
Since the set of natural numbers I in Theorem 14.1.11 (v) exists and is unique for each x ∈ ℕ, it can be
given a name and notation, parametrised by x, as in Definition 14.1.13 and Notation 14.1.14.


14.1.13 DEFINITION: The leading interval up to x in a natural number system (ℕ, s, 1), for x ∈ ℕ, is the
subset I of ℕ which satisfies x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).


14.1.14 NOTATION:
ℕ[1, x], for x ∈ ℕ, for a natural number system (ℕ, s, 1), denotes the leading interval up to x in ℕ.
[1, x] is an abbreviation for ℕ[1, x] when ℕ is indicated in the context.
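
The three conditions in Definition 14.1.13 can be tested by exhaustive search in a small truncated system. The Python sketch below is an illustration under stated assumptions: the successor is taken to be n ↦ n + 1 on the positive integers, candidate sets I are searched only among subsets of {1, …, n_max}, and the function name is hypothetical.

```python
from itertools import combinations

def leading_interval_candidates(x, n_max):
    """All subsets I of {1..n_max} with x in I, s(x) not in I,
    and every y in I other than 1 equal to s(z) for some z in I."""
    def s(n):
        return n + 1   # assumed successor function on the positive integers

    universe = range(1, n_max + 1)
    found = []
    for r in range(1, n_max + 1):
        for c in combinations(universe, r):
            i = set(c)
            if (x in i and s(x) not in i
                    and all(any(y == s(z) for z in i) for y in i - {1})):
                found.append(i)
    return found

candidates = leading_interval_candidates(3, 6)
```

For x = 3 the search finds exactly one set, {1, 2, 3}, in agreement with the existence and uniqueness assertion of Theorem 14.1.11 (v) within this truncated search space.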


14.1.15 THEOREM: Some basic properties of leading intervals in a Peano natural number system.
Let (ℕ, s, 1) be a system of natural numbers.

(i) ℕ[1, 1] = {1}.
(ii) ∀x ∈ ℕ, ℕ[1, s(x)] = ℕ[1, x] ∪ {s(x)}.
(iii) ℕ[1, s(1)] = {1, s(1)}.
(iv) ⋃_{x∈ℕ} ℕ[1, x] = ℕ.


PROOF: Part (i) follows from Theorem 14.1.11 (iii) and Definition 14.1.13.

For part (ii), let x ∈ ℕ and I = ℕ[1, s(x)]. Then by Definition 14.1.13 for s(x) and Theorem 14.1.11 (iv),
I⁻ = ℕ[1, s(x)] \ {s(x)} satisfies Definition 14.1.13 for x. Therefore ℕ[1, s(x)] \ {s(x)} = ℕ[1, x] by the
uniqueness property in Theorem 14.1.11 (v). Hence ℕ[1, s(x)] = ℕ[1, x] ∪ {s(x)} by Theorem 8.2.5 (ix) because
s(x) ∈ ℕ[1, s(x)].

Part (iii) follows from parts (i) and (ii).

Part (iv) follows from the observation that ℕ[1, x] ⊆ ℕ for all x ∈ ℕ, which implies ⋃_{x∈ℕ} ℕ[1, x] ⊆ ℕ, and
the observation that x ∈ ℕ[1, x] for all x ∈ ℕ, which implies ⋃_{x∈ℕ} ℕ[1, x] ⊇ ℕ.


14.1.16 THEOREM: Uniqueness up to isomorphism of a Peano natural number system.
Let (ℕ, s, 1) and (ℕ′, s′, 1′) be systems of natural numbers.

(i) There is a unique bijection f : ℕ → ℕ′ such that f ∘ s = s′ ∘ f and f(1) = 1′.
(ii) #(ℕ) = #(ℕ′) = #(ω).


PROOF: For part (i), let X be the set of x ∈ ℕ such that there is a unique function f : ℕ[1, s(x)] → ℕ′
which maps 1 to 1′ and maps s-successors to s′-successors. In other words, let

X = {x ∈ ℕ; ∃′f : ℕ[1, s(x)] → ℕ′, (f(1) = 1′ and ∀y ∈ ℕ[1, x], f(s(y)) = s′(f(y)))}.

To show that 1 ∈ X, define f : ℕ[1, s(1)] → ℕ′ by f(1) = 1′ and f(s(1)) = s′(1′). This is a well-defined
function because ℕ[1, s(1)] = {1, s(1)} by Theorem 14.1.15 (iii), and f is a bijection because ℕ′[1′, s′(1′)] =
{1′, s′(1′)} similarly. Clearly ∀y ∈ ℕ[1, 1], f(s(y)) = s′(f(y)) since ℕ[1, 1] = {1} by Theorem 14.1.15 (i). The
uniqueness of f follows from f(1) = 1′ and ∀y ∈ ℕ[1, 1], f(s(y)) = s′(f(y)), which implies f(s(1)) = s′(1′).

Let x ∈ X. Then there is a unique f : ℕ[1, s(x)] → ℕ′ with f(1) = 1′ and ∀y ∈ ℕ[1, x], f(s(y)) = s′(f(y)).
Define f⁺ : ℕ[1, s(s(x))] → ℕ′ by f⁺(y) = f(y) for y ∈ ℕ[1, s(x)] and f⁺(s(s(x))) = s′(f(s(x))). Then
f⁺ : ℕ[1, s(s(x))] → ℕ′ satisfies f⁺(1) = 1′ and ∀y ∈ ℕ[1, s(x)], f⁺(s(y)) = s′(f⁺(y)). To see that
f⁺ is unique, let f₁⁺ and f₂⁺ both satisfy these conditions. Then the restrictions of f₁⁺ and f₂⁺ to
ℕ[1, s(x)] must be equal because they satisfy the conditions of the inductive hypothesis, and then
f₁⁺(s(s(x))) = f₂⁺(s(s(x))) follows from the requirement ∀y ∈ ℕ[1, s(x)], f⁺(s(y)) = s′(f⁺(y)), which implies
f₁⁺(s(s(x))) = s′(f₁⁺(s(x))) = s′(f(s(x))) = s′(f₂⁺(s(x))) = f₂⁺(s(s(x))). So f₁⁺ = f₂⁺. Thus s(x) ∈ X.
So ℕ ⊆ X by Definition 14.1.5 (5).
Therefore ∀x ∈ ℕ, ∃′f : ℕ[1, s(x)] → ℕ′, (f(1) = 1′ and ∀y ∈ ℕ[1, x], f(s(y)) = s′(f(y))).

For each x ∈ ℕ, denote by f_x the unique function f : ℕ[1, s(x)] → ℕ′ which satisfies f(1) = 1′ and
∀y ∈ ℕ[1, x], f(s(y)) = s′(f(y)). Then f_x ⊆ f_{s(x)} for all x ∈ ℕ. So f = ⋃_{x∈ℕ} f_x is a well-defined function
from ℕ to ℕ′ which satisfies f(1) = 1′ and ∀y ∈ ℕ, f(s(y)) = s′(f(y)). The uniqueness of f follows from
the uniqueness of the restrictions of f to the leading intervals ℕ[1, x] for x ∈ ℕ, using Theorem 14.1.15 (iv).
The injectivity and surjectivity of f : ℕ → ℕ′ follow from the injectivity and surjectivity of the restriction
of f to ℕ[1, x], as a map onto the leading interval ℕ′[1′, f(x)], for each x ∈ ℕ. Hence f is the unique
bijection from ℕ to ℕ′ which satisfies f ∘ s = s′ ∘ f and f(1) = 1′.

Part (ii) follows from part (i) and Theorem 14.1.9, Definition 13.1.2 and Notation 13.1.5.
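
The inductive construction of f on leading intervals can be imitated step by step for two concrete systems. In the Python sketch below, which is only an illustration, the first system is assumed to be the positive integers with successor n ↦ n + 1, and the second is a hypothetical system of tally-mark strings with 1′ = "|" and s′(t) = t + "|". The function name `build_iso` and its parameters are likewise hypothetical.

```python
def build_iso(one, succ, one_p, succ_p, steps):
    """Extend f over a leading interval using f(1) = 1' and f(s(y)) = s'(f(y))."""
    f = {one: one_p}
    y = one
    for _ in range(steps):
        f[succ(y)] = succ_p(f[y])   # the only value compatible with f(s(y)) = s'(f(y))
        y = succ(y)
    return f

# Positive integers versus tally-mark strings (both are assumed example systems).
f = build_iso(1, lambda n: n + 1, "|", lambda t: t + "|", steps=5)
```

Each new value of f is forced by the previous one, which is the computational content of the uniqueness argument. The dictionary `f` here represents the function on the leading interval [1, 6] only.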


14.1.17 REMARK: The identification of natural numbers with positive finite ordinal numbers. 

Since all “models” for the natural number system are isomorphic in the sense defined by Theorem 14.1.16 (i), 
one is free to either choose one particular model to call the model or else to say that the model is of no 
importance. The main advantage of being “model-free” is that no artefacts of the model can introduce 
extraneous side-effects. This issue arises with every number system and algebraic system in mathematics. 


The approach taken here is to present one concrete model, but then to ignore the chosen ZF model unless 
it is convenient to use it. Thus the symbol “ℕ” in Notation 14.1.19 refers, from now on, to the particular
natural number system in Definition 14.1.18. This implementation conveniently inherits the order relation 
from w in Definition 12.2.5 and the (suitably restricted) addition and subtraction operations in Definitions 
13.6.3 and 13.6.4. 


14.1.18 DEFINITION: The standard system of natural numbers is the tuple ℕ −< (ℕ, s, 1) with ℕ = ω \ {∅},
s : ℕ → ℕ defined by ∀x ∈ ℕ, s(x) = x ∪ {x}, and 1 = ∅ ∪ {∅}.


14.1.19 NOTATION: ℕ denotes the standard system of natural numbers.


14.1.20 REMARK: Convenient notation for “leading intervals” of natural numbers.
Notation 14.1.21 uses a subscript n which is an element of ω so that ℕ_0 = ∅. Otherwise each set ℕ_n is
the same as the corresponding leading interval ℕ[1, n] in Notation 14.1.14. The form of notation ℕ_n for
(possibly empty) leading intervals of natural numbers is frequently convenient for specifying the domains
of finite sequences which start with 1 instead of 0. (An essentially identical notation N_n is defined by
Wilder [403], page 64, with the same meaning.)

Ellipsis dots are very widely used for index ranges instead of notations such as ℕ_n. Thus i = 1, … n is often
written instead of i ∈ ℕ_n. But in many circumstances, ellipsis dots are unsuitable, particularly as ranges
for logical quantifiers.


14.1.21 NOTATION: ℕ_n, for n ∈ ω, denotes the set {i ∈ ℕ; i ⊆ n}. That is, ℕ_n = {1, 2, … n}.


14.1.22 THEOREM: Very basic properties of natural number leading intervals.
(i) ℕ_0 = ∅.
(ii) ℕ_1 = {1} = {{∅}}.


PROOF: For part (i), let x ∈ ℕ_0. Then x ∈ ℕ and x ⊆ 0 = ∅ by Notation 14.1.21. Therefore x = ∅ by
Theorem 7.6.5 (ii). But this contradicts Definition 14.1.18 for the natural numbers. So ℕ_0 has no elements.
Hence ℕ_0 = ∅ by Definition 7.6.3.

For part (ii), 1 = ∅ ∪ {∅} = {∅} by Definition 14.1.18. Therefore 1 ∈ ℕ by Theorem 12.1.30 (ii, iii). But
1 ⊆ 1 by Theorem 7.3.5 (i). So 1 ∈ ℕ_1 by Notation 14.1.21.

Now let x ∈ ℕ_1. Then x ∈ ℕ and x ⊆ 1 by Notation 14.1.21. So x ≠ ∅ by Definition 14.1.18. Therefore by
Definition 7.6.3, y ∈ x for some y. But by Definition 7.3.2, y ∈ 1 for all y ∈ x. So y = ∅ for all y ∈ x by
Notation 7.5.7. Therefore x = {∅} = 1. Hence ℕ_1 = {1} = {{∅}}.
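
The condition i ⊆ n in Notation 14.1.21 can be evaluated literally when finite ordinals are represented as nested frozensets. The Python sketch below is an illustration only: the function names are hypothetical, and the search for i is restricted to the ordinals 1, …, search_bound, which is an assumption made to keep the example finite.

```python
def succ(x):
    """Von Neumann successor: x ∪ {x}."""
    return x | frozenset([x])

def ordinal(n):
    """The finite ordinal n as a nested frozenset, built by n applications of succ."""
    x = frozenset()
    for _ in range(n):
        x = succ(x)
    return x

def leading_set(n, search_bound=8):
    """The set {i; i ⊆ n} for non-zero ordinals i, searched among 1..search_bound."""
    target = ordinal(n)
    # For frozensets, the operator <= is the subset relation.
    return {ordinal(i) for i in range(1, search_bound + 1) if ordinal(i) <= target}
```

With this representation, `leading_set(0)` is empty and `leading_set(1)` contains only the set {∅}, which is what Theorem 14.1.22 asserts.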


14.2. Peano-axioms-style countability tests 


14.2.1 REMARK: Applications of the Peano axioms to cardinality testing. 

For the purposes of cardinality testing, Definition 13.7.6 is quite unsatisfactory. Logical predicates which
state that sets are countable, for example, must refer to a fixed “yardstick set” ω which requires a fairly
complex specification as in Definition 12.1.28. Instead of first constructing a “yardstick set” ω and showing
that it exists and is unique, and then requiring the existence of a bijection between ω and a set X, it is far
preferable to use some predicate which refers only to the set X, without any “yardsticks”. The non-standard
Definition 14.2.2 presents such a predicate.


14.2.2 DEFINITION: Peano-style infinite sets.
A Peano-infinite set is a set X for which there exist an element z ∈ X and an injection s : X → X \ {z}
satisfying the induction property ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y). In other words,

    ∃z ∈ X, ∃s : X → X \ {z},
        s is injective and ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y).


14.2.3 THEOREM: Equivalence of Peano-infinite and countably infinite set properties. 
(i) Every Peano-infinite set is countably infinite. 


(ii) Every countably infinite set is Peano-infinite. 


PROOF: Part (i) follows from Theorem 14.1.16 (ii) and Definitions 14.1.5 and 14.2.2.

For part (ii), let X be a countably infinite set. Then by Definition 13.7.6, there exists a bijection f : ω → X.
Let z = f(∅) and define s : X → X \ {z} by s(x) = f(f⁻¹(x) ∪ {f⁻¹(x)}) for all x ∈ X.

Let Y be a set which satisfies z ∈ Y and s(Y) ⊆ Y. Let S = f⁻¹(Y). Then S satisfies S ⊆ ω, 0 ∈ S, and
∀x ∈ S, x ∪ {x} ∈ S because x ∪ {x} = f⁻¹(s(f(x))) for all x ∈ S, which implies x ∪ {x} ∈ f⁻¹(s(f(S))) ⊆
f⁻¹(s(Y)) ⊆ f⁻¹(Y) = S since f(S) ⊆ Y and s(Y) ⊆ Y. Therefore S = ω by Theorem 12.2.12. So
X = f(ω) = f(S) = f(f⁻¹(Y)) ⊆ Y by Theorem 10.7.1 (ii). Thus X satisfies the induction property
∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y). Hence X is a Peano-infinite set by Definition 14.2.2.


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13 




14.2.4 REMARK: Comparison of Peano-infinite sets and Dedekind-infinite sets. 

If the induction property requirement were omitted, Definition 14.2.2 would be equivalent to Definition 13.10.2
for Dedekind-infinite sets. (Consequently all Peano-infinite sets are infinite by Theorem 13.10.4, with or
without the induction property.) It is thus the induction property which restricts Peano-infinite sets from
Dedekind-infinite to countably infinite.

It is shown in Theorem 13.10.6 (i) that a set X is Dedekind-infinite if and only if it is ω-infinite, which means
by Definition 13.7.6 (or Theorem 13.10.6 (ii)) that there exists a bijection from ω to some subset of X. Thus
the induction property in Definition 14.2.2 has the effect of replacing the inequality #(X) ≥ #(ω) with the
equality #(X) = #(ω). Then relaxing the range of the injection s from X \ {z} to X in Definition 14.2.6
yields #(X) ≤ #(ω).


14.2.5 REMARK: Peano-style definition for countable sets. 
Definition 14.2.6 relaxes the target set for the injection s in Definition 14.2.2 from X \ {z} to X. It follows 
immediately that every Peano-infinite set is Peano-countable. 


The proof of Theorem 14.2.7 (i) helps to give more meaning to the concept of a “countable set”. The existence
of a starting point z ∈ X and a “successor function” s : X → X with the induction property means that the
set X may be covered or “exhausted” by inductively covering each point in X with some element of ω. One
may say that in this way, all of the points in X are “counted” by elements of ω. It is the inclusion relation
“X ⊆ Y” which guarantees that there are no elements of X which are not counted. Thus countability and
the induction property are very closely related.


14.2.6 DEFINITION:  Peano-style countable sets. 

A Peano-countable set is a set X which is either empty or such that there exist an element z ∈ X and an
injection s : X → X satisfying the induction property ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y). In other
words, either X = ∅ or

    ∃s : X → X, s is injective and ∃z ∈ X, ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y).
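
For a finite non-empty set X, the induction property can be tested mechanically, because it holds for a pair (z, s) exactly when the set reached from z by repeated application of s is all of X. The following Python sketch relies on that observation. The function names `closure` and `is_peano_countable_witness`, and the encoding of s as a dictionary, are illustrative assumptions and not part of the text.

```python
def closure(z, s):
    """Return the smallest set which contains z and is closed under the map s."""
    seen = set()
    x = z
    while x not in seen:
        seen.add(x)
        x = s[x]
    return seen


def is_peano_countable_witness(X, z, s):
    """Check that (z, s) satisfies Definition 14.2.6 for a finite non-empty set X.

    The map s is a dict. It must be defined exactly on X, take values in X,
    be injective, and have the induction property relative to z.
    """
    X = set(X)
    maps_into_X = set(s) == X and all(s[x] in X for x in X)
    injective = len(set(s.values())) == len(X)
    return z in X and maps_into_X and injective and closure(z, s) == X
```

A single cycle through all of X passes the test. A permutation which splits X into two cycles fails it, because the cycle containing z is then a set Y with z ∈ Y and s(Y) ⊆ Y which does not contain X.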


14.2.7 THEOREM: Equivalence of Peano-countable and countable set properties. 
(i) Every Peano-countable set is countable. 


(ii) Every countable set is Peano-countable. 


PROOF: For part (i), let X be a Peano-countable set. If X = ∅ then X is countable by Definition 13.7.6.
So assume that X ≠ ∅. Then there exist an element z ∈ X and an injection s : X → X which satisfy
∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y). For such z and s, define f : ω → X inductively by f(0) = z and
f(n⁺) = s(f(n)) for all n ∈ ω. Suppose that f(n) = f(0) for some n ∈ ω \ {∅}. Let ñ be the least such n.
(This is well defined and ñ ∈ ω \ {∅} because ω is well ordered by Theorem 12.2.7. However, the minimality
of ñ is not used in this proof.)

Let Y = {f(i); i ∈ ñ}. Let x ∈ Y. Then x = f(i) for some i ∈ ñ. So s(x) = s(f(i)) = f(i⁺) ∈ Y if i⁺ ∈ ñ.
If i⁺ ∉ ñ, then i⁺ = ñ by Theorem 12.1.21 (iv), and so s(x) = f(ñ) = f(0) ∈ Y. Therefore s(Y) ⊆ Y.
Since z ∈ Y and s(Y) ⊆ Y, it follows that X ⊆ Y. So Y = X because Y ⊆ X. Therefore f|_ñ : ñ → X is
surjective. So X is finite by Theorem 13.7.13 (ii).

Now suppose that f(n) ≠ f(0) for all n ∈ ω \ {∅}. Let Y = Range(f). Since z ∈ Y and s(Y) ⊆ Y (because
x ∈ s(Y) implies x = s(f(i)) for some i ∈ ω, which implies x = f(i⁺) ∈ Y), it follows that X ⊆ Y. Therefore
X = Y, and so Range(f) = X. Hence X is countable by Theorem 13.7.13 (iv). Thus in both cases, X is
countable by Definition 13.7.6.


For part (ii), let X be a countable set. Then by Definition 13.7.6, X is either finite or countably infinite.
Suppose that X is finite. Then there is a bijection f : n → X for some n ∈ ω. Let s = f ∘ s_n ∘ f⁻¹,
where s_n : n → n is the wrapped successor function in Definition 12.2.22. Then s : X → X is a well-defined
bijection because s_n : n → n is a well-defined bijection by Theorem 12.2.23 (i, ii). If n = ∅, then X = ∅ is
Peano-countable. So assume that n ≠ ∅. Then f(∅) is well defined. So

    s(x) = f(f⁻¹(x)⁺)   if f⁻¹(x)⁺ ∈ n,
    s(x) = f(∅)          otherwise.

Let z = f(0). Let Y be a set which satisfies z ∈ Y and s(Y) ⊆ Y. Let Y′ = f⁻¹(Y). Then 0 ∈ Y′
and s_n(Y′) = (f⁻¹ ∘ s ∘ f)(Y′) = f⁻¹(s(Y)) ⊆ f⁻¹(Y) = Y′. Therefore Y′ = n by Theorem 12.2.24.
So by Theorem 10.7.1 (i), Y ⊇ f(f⁻¹(Y)) = f(n) = X. Thus X satisfies the induction property in
Definition 14.2.6. So X is a Peano-countable set.

Now suppose that X is countably infinite. Then there is a bijection f : ω → X. Let s = f ∘ s_ω ∘ f⁻¹, where
s_ω : ω → ω \ {0} is the successor function s_ω : x ↦ x⁺ shown to be a bijection in Theorem 12.2.23 (iii).
Let z = f(0). Then s : X → X \ {z} is a bijection. So s : X → X is an injection. Let Y be a
set which satisfies z ∈ Y and s(Y) ⊆ Y. Let Y′ = f⁻¹(Y). Then 0 ∈ Y′ and s_ω(Y′) = (f⁻¹ ∘ s ∘
f)(Y′) = f⁻¹(s(Y)) ⊆ f⁻¹(Y) = Y′. Therefore Y′ = ω by Theorem 12.2.12. So by Theorem 10.7.1 (i′),
Y ⊇ f(f⁻¹(Y)) = f(ω) = X. Thus X satisfies the induction property in Definition 14.2.6. So X is a
Peano-countable set. (As an alternative proof that a countably infinite set X is Peano-countable, note that X is
Peano-infinite by Theorem 14.2.3 (ii), which implies that X is Peano-countable by comparing Definitions
14.2.2 and 14.2.6.)


14.2.8 REMARK: Expression for the axiom of countable choice without using ordinal numbers. 

As suggested in Remark 13.7.20, for the countable choice axiom in Definition 13.7.21, a set-theoretic expression
may be written for the countability of a set in terms of a definition for ω⁺, the set of extended finite ordinals,
and the notion of countability in Definition 13.7.6. It seems somewhat unnatural for a set theory axiom to
refer to a particular constructed “yardstick set” ω⁺ for testing cardinality. This implies that substantial
theoretical work is required before the supposedly fundamental axiom can be stated. Hence the Peano
countability in Definition 14.2.6 is of some interest.

By Theorem 14.2.7 (i, ii), “countability” can be replaced by “Peano countability”, which makes no reference
to ordinal numbers. The core concept of the ordinal number construction is the successor map, which is
replaced in Definition 14.2.6 by an injection satisfying an abstract induction property. So the countable
choice axiom (9′) in Definition 13.7.21 may be rewritten as follows. (The vacuous case X = ∅ is ignorable.)

    ∀X, (∃z ∈ X, ∃s : X → X, (s is injective and ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y)))
        ⇒ ((∅ ∉ X and ∀A, B ∈ X, (A = B or A ∩ B = ∅)) ⇒ (∃C, ∀A ∈ X, ∃w ∈ C, w ∈ A)).

This expression will double in size if fully expanded to a set-theoretic predicate. However, it has the advantage
that ω is not referred to. Therefore in principle, it could be stated alongside the ZF axioms to define ZF+CC
set theory before commencing the first theorems and definitions.


14.2.9 REMARK:  Peano-style definition for finite sets. 

Definition 14.2.10 differs from Definition 14.2.6 only by requiring the injection s : X → X to be a bijection.
Definition 14.2.2 for Peano-infinite sets requires the successor map s : X → X to not be a surjection, since
the element z is excluded from its range. So it seems like a reasonable guess that a set should be Peano-finite
if and only if it is finite. The proof of Theorem 14.2.11 (i, ii) duplicates much of the proof of
Theorem 14.2.7 (i, ii).


14.2.10 DEFINITION: Peano-style finite sets. 

A Peano-finite set is a set X which is either empty or such that there exist an element z ∈ X and a bijection
s : X → X satisfying the induction property ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y). In other words, either
X = ∅ or

    ∃s : X → X, s is bijective and ∃z ∈ X, ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y).


14.2.11 THEOREM: Equivalence of Peano-finite and finite set properties. 
(i) Every Peano-finite set is finite. 


(ii) Every finite set is Peano-finite. 


PROOF: For part (i), let X be a Peano-finite set. If X = ∅ then X is clearly finite. So assume that X ≠ ∅.
Then there exist z ∈ X and a bijection s : X → X such that ∀Y, ((z ∈ Y and s(Y) ⊆ Y) ⇒ X ⊆ Y). For
such z and s, define f : ω → X inductively by f(0) = z and f(n⁺) = s(f(n)) for all n ∈ ω.

Suppose that f(n) ≠ f(0) for all n ∈ ω \ {∅}. Since s : X → X is bijective, there is a unique x ∈ X such
that s(x) = z, and there exists i ∈ ω such that f(i) = x because Range(f) = X, as in the proof of
Theorem 14.2.7 (i). So f(i⁺) = s(f(i)) = z. However, i⁺ ≠ ∅ because i ∈ i⁺, and f(i⁺) = f(0), which
contradicts the assumption.

So it may be assumed that f(n) = f(∅) for some n ∈ ω \ {∅}. As in the proof of the finite case in
Theorem 14.2.7 (i), let ñ be the least such n. (This is well defined and ñ ∈ ω \ {∅} because ω is well ordered
by Theorem 12.2.7. However, the minimality of ñ is not used in this proof.)

Let Y = {f(i); i ∈ ñ}. Let x ∈ Y. Then x = f(i) for some i ∈ ñ. So s(x) = s(f(i)) = f(i⁺) ∈ Y if i⁺ ∈ ñ.
If i⁺ ∉ ñ, then i⁺ = ñ by Theorem 12.1.21 (iv), and so s(x) = f(ñ) = f(0) ∈ Y. Therefore s(Y) ⊆ Y.
Since z ∈ Y and s(Y) ⊆ Y, it follows that X ⊆ Y. So Y = X because Y ⊆ X. Therefore f|_ñ : ñ → X is
surjective. So X is finite by Theorem 13.7.13 (ii).

For part (ii), let X be a finite set. Then there is a bijection f : n → X for some n ∈ ω. Let s = f ∘ s_n ∘ f⁻¹,
where s_n : n → n is the wrapped successor function in Definition 12.2.22. Then s : X → X is a well-defined
bijection because s_n : n → n is a well-defined bijection by Theorem 12.2.23 (i, ii). If n = ∅, then X = ∅ is
Peano-finite. So assume that n ≠ ∅. Then f(∅) is well defined. So

    s(x) = f(f⁻¹(x)⁺)   if f⁻¹(x)⁺ ∈ n,
    s(x) = f(∅)          otherwise.

Let z = f(0). Let Y be a set which satisfies z ∈ Y and s(Y) ⊆ Y. Let Y′ = f⁻¹(Y). Then 0 ∈ Y′
and s_n(Y′) = (f⁻¹ ∘ s ∘ f)(Y′) = f⁻¹(s(Y)) ⊆ f⁻¹(Y) = Y′. Therefore Y′ = n by Theorem 12.2.24.
So by Theorem 10.7.1 (i′), Y ⊇ f(f⁻¹(Y)) = f(n) = X. Thus X satisfies the induction property in
Definition 14.2.10. So X is a Peano-finite set.
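
The construction s = f ∘ s_n ∘ f⁻¹ in part (ii) can be illustrated concretely. The following Python sketch assumes that the wrapped successor function sends each i to i + 1, except that the last element n − 1 is sent back to 0. The names `wrapped_successor` and `transported_successor`, and the encoding of the bijection f as a list, are hypothetical choices made for this illustration.

```python
def wrapped_successor(n):
    """Return s_n on {0, ..., n-1} as a dict: i -> i+1, with n-1 wrapping to 0."""
    return {i: (i + 1) % n for i in range(n)}


def transported_successor(f):
    """Return s = f ∘ s_n ∘ f⁻¹ as a dict.

    The list f encodes a bijection from {0, ..., n-1} to X via i -> f[i],
    so its entries must be distinct.
    """
    n = len(f)
    f_inv = {x: i for i, x in enumerate(f)}
    s_n = wrapped_successor(n)
    return {x: f[s_n[f_inv[x]]] for x in f}
```

For example, with f = ["red", "green", "blue"], the transported map sends red to green, green to blue, and blue back to red. Starting from z = f[0], repeated application of s visits every element of X, which is the induction property in concrete form.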


14.3. Natural number arithmetic 


14.3.1 REMARK: Addition and multiplication of natural numbers. 

The addition operator for natural numbers can be defined abstractly for the abstract natural number system
in Definition 14.1.5, or it can be imported from the concrete addition operator for the finite ordinal numbers as
in Definition 14.1.18. To save duplication of effort, it is most efficient to import the addition and subtraction
operations in Definitions 13.6.3 and 13.6.4. The multiplication operation, whose construction is not presented
here for the integers, is a fairly straightforward application of induction to the addition operation. Thus
m⁺ × n = m × n + n and m × n⁺ = m × n + m, and so forth, for all m, n ∈ ℕ. (See for example Graves [85],
pages 18-23, for an axiomatic treatment of natural number addition, multiplication and order.)
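
The inductive pattern mentioned above can be sketched in Python as follows. The base cases m + 1 = m⁺ and m × 1 = m are assumptions of this sketch, chosen because the natural numbers here start at 1. The function names are illustrative only, and Python integers stand in for the natural numbers.

```python
def succ(n):
    """Successor map n -> n+ on the positive integers."""
    return n + 1


def add(m, n):
    """Addition by induction on n: m + 1 = m+ and m + n+ = (m + n)+."""
    result = succ(m)
    for _ in range(n - 1):
        result = succ(result)
    return result


def mul(m, n):
    """Multiplication by induction on n: m × 1 = m and m × n+ = m × n + m."""
    result = m
    for _ in range(n - 1):
        result = add(result, m)
    return result
```

Both identities quoted in the remark can then be checked for small values, for instance `mul(succ(3), 4) == add(mul(3, 4), 4)`.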


14.3.2 REMARK:  Consistency of the axioms of arithmetic. 

It is often mentioned that Gödel's 1931 incompleteness theorem implied that the consistency of the axioms
of arithmetic cannot be proved within the axiomatic system for arithmetic. (See Gödel [356] and
Nagel/Newman [373].) However, it was shown in 1936/38 by Gentzen [415, 416] that arithmetic is consistent by
going outside the arithmetic system. This development in the consistency of arithmetic may be claimed as
a valuable application of the ordinal numbers up to ε₀.


Kleene [365], page 478, wrote the following in 1950. 


Gentzen's discovery is that the Gödel obstacle to proving the consistency of number theory can be
overcome by using transfinite induction up to a sufficiently great ordinal. His transfinite induction
is up to the ordinal called ε₀ by Cantor, which is the first ordinal greater than all the ordinals in the
infinite sequence ω, ω^ω, ω^(ω^ω), .... It figures in Cantor's theory as the least of the solutions (called
ε-numbers) for ξ of the equation ω^ξ = ξ.


Kleene [365], page 479, wrote the following regarding the philosophical acceptability of Gentzen’s result. 


The original proposals of the formalists to make classical mathematics secure by a consistency proof
([...]) did not contemplate that such a method as transfinite induction up to ε₀ would have to be
used. To what extent the Gentzen proof can be accepted as securing classical number theory in the
sense of that problem formulation is in the present state of affairs a matter for individual judgement,
depending on how ready one is to accept induction up to ε₀ as a finitary method.


Presentations of Gentzen's consistency proof for number theory are given by Takeuti [397], pages 102-114,
and Kleene [365], pages 476-499. The issue was summarised in 1946 by Weyl [157], page 144, as follows.

It is likely that all mathematicians ultimately would have accepted Hilbert’s approach had he been
able to carry it out successfully. The first steps were inspiring and promising. But then Gödel
dealt it a terrific blow (1931), from which it has not yet recovered. Gödel enumerated the symbols,
formulas, and sequences of formulas in Hilbert’s formalism in a certain way, and thus transformed 
the assertion of consistency into an arithmetic proposition. He could show that this proposition 
can neither be proved nor disproved within the formalism. This can mean only two things: either 
the reasoning by which a proof of consistency is given must contain some argument that has no 
formal counterpart within the system, i.e., we have not succeeded in completely formalizing the 
procedure of mathematical induction; or hope for a strictly “finitistic” proof of consistency must be 
given up altogether. When G. Gentzen finally succeeded in proving the consistency of arithmetic 
he trespassed those limits indeed by claiming as evident a type of reasoning that penetrates into 
Cantor’s “second class of ordinal numbers.” 


A slightly modified proof of Gentzen's consistency result for arithmetic was given in 1939 by
Hilbert/Bernays [360], pages 372-387. They mention the relation between the Gödel incompleteness results and
Gentzen's consistency theorem, citing (in the 1970 edition) a 1943 paper by Gentzen [417].


To prove consistency of integer arithmetic, it is not sufficient to provide a demonstration for the standard
constructions within ZF set theory because ZF itself has not been (and reputedly cannot be) shown to be
consistent. The consistency must be shown for axioms which do not rely on ZF set theory. If one accepts the
consistency of ZF set theory, which most mathematicians do, then natural number arithmetic would appear to be
safe. But unexpected contradictions in mathematics have arisen often enough in the past to make it seem
unreliable for the future. It is difficult to be totally certain that no new antinomies will be uncovered in the
next million years. In the meantime, our arithmetic seems to be reliable enough for sending robots to Mars
and Pluto.


14.4. Integers 


14.4.1 REMARK: Approaches to the definition of number systems. 
Although the ordinal numbers are essentially always defined directly in terms of ZF sets as in Section 12.1, 
other number systems may be defined in multiple ways. 


(1) Axiomatically, as abstract objects defined only by predicate logic.
(2) Direct ZF sets, identifying each number in the system with a particular ZF set.
(3) By construction, constructing larger number systems from smaller systems.
(4) By restriction, restricting a larger number system to a smaller sub-system.


In the restriction approach, one may commence with complex numbers and define the real numbers, rational
numbers, signed integers, unsigned integers and natural numbers by successive restrictions. In the
constructive approach, one successively builds up number systems in the opposite order, for example, first
natural numbers, then unsigned integers, signed integers, rational numbers, real numbers and complex numbers.
Alternatively, one may give a direct representation of numbers in terms of ZF sets, which is generally only
done for the smaller systems. Or one may define a number system purely abstractly in terms of axioms
written either formally in terms of a predicate logic or informally in terms of rules for existence and the
properties of arithmetic operations.


A substantial advantage of the restriction approach is that the numbers in all systems may be freely mixed,
as they often must be. One could, in principle, use an “extension approach”, whereby a lower-level system
is extended as required for a higher-level system. This has the advantage of mixability, but then there
are burdensome technicalities which arise with arithmetic operations, which must frequently swap between
multiple styles of representation.


In practice, yet another approach is actually followed, namely the “identification approach”, whereby the 
differences in representations are ignored most of the time so that all systems can be regarded as restrictions 
from one big number system (such as the complex numbers), but specific representations can be invoked 
whenever convenient. However, when presenting the foundations of mathematics, one generally chooses some 
combination of the pure axiomatic approach, the direct ZF-set-based approach, and construction of larger 
systems from smaller systems. As mentioned in Remark 12.0.2, neither logic, sets nor numbers have clear 
primacy over the others. They have “shared primacy”. Therefore the choice of approach to defining number 
systems is a matter of personal taste and convenience.

A fairly sensible approach here may be as follows.

(1) Define the set ω of finite ordinal numbers directly in terms of ZF sets. (See Section 12.1.)
(2) Define natural numbers both axiomatically and directly as ZF sets. (See Section 14.1.)
(3) Define signed integers both axiomatically and as ZF sets (equivalence classes of ordinal number pairs).
(4) Define unsigned integers by restriction of the signed integers.
(5) Define rational numbers as equivalence classes of pairs of integers, and optionally axiomatically also.
(6) Define real numbers as equivalence classes of rational number sequences, and optionally axiomatically.
(7) Define complex numbers as pairs of real numbers.
(8) Identify the elements of all number systems with the corresponding elements of higher number systems.
(9) Define extended natural numbers, signed integers, unsigned integers, rational numbers, real numbers
    and complex numbers by adding one or more “points at infinity” to these number systems.


14.4.2 REMARK: Definition of signed integers as equivalence classes of unsigned integer pairs. 
Figure 14.4.1 illustrates the equivalence classes which are used in one popular construction of the signed 
integers from unsigned integers. This is formalised in Definition 14.4.3. 


[Figure omitted: equivalence classes of pairs of unsigned integers, labelled by the signed integers −8 to 8.]

Figure 14.4.1 Construction of signed integers from pairs of unsigned integers 


14.4.3 DEFINITION: A space of (signed) integers is the set of equivalence classes of ℕ × ℕ for any system
ℕ of natural numbers, where (m₁, n₁) ~ (m₂, n₂) whenever m₁ + n₂ = m₂ + n₁.
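
The equivalence relation in Definition 14.4.3 can be explored with a short Python sketch. Several choices here are assumptions made only for illustration: pairs of non-negative Python integers are used as the unsigned integers, each class is represented by its member with smallest components, and the operations `add_pairs` and `negate` are defined componentwise and by swapping, which is one common convention and is not stated in the definition.

```python
def equivalent(p, q):
    """(m1, n1) ~ (m2, n2) whenever m1 + n2 = m2 + n1."""
    (m1, n1), (m2, n2) = p, q
    return m1 + n2 == m2 + n1


def normalize(p):
    """Return the representative of the class of p with at least one zero component."""
    m, n = p
    k = min(m, n)
    return (m - k, n - k)


def add_pairs(p, q):
    """Componentwise addition of representatives, followed by normalisation."""
    return normalize((p[0] + q[0], p[1] + q[1]))


def negate(p):
    """Swap the two components, followed by normalisation."""
    return normalize((p[1], p[0]))
```

Under these conventions, adding any pair to its negation gives the class of (0, 0), and two pairs are equivalent exactly when they have the same normalised representative.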


14.4.4 NOTATION: ℤ denotes the set of (signed) integers.


14.4.5 NOTATION: 

ℤ⁺ = {n ∈ ℤ; n > 0} = {1, ...} denotes the set of positive integers.

ℤ₀⁺ = {n ∈ ℤ; n ≥ 0} = {0, 1, ...} denotes the set of non-negative integers.

ℤ⁻ = {n ∈ ℤ; n < 0} = {... −3, −2, −1} denotes the set of negative integers.

ℤ₀⁻ = {n ∈ ℤ; n ≤ 0} = {... −3, −2, −1, 0} denotes the set of non-positive integers.


14.4.6 REMARK: Choice of notations for sets of integers. 

Reinhardt/Soeder [124], page 8, gives notations ℤ⁺ = {1, 2, 3, ...} and ℤ₀⁺ = {0, 1, 2, 3, ...}, which agree with
Notation 14.4.5. These do not seem to be in common use in the English-language literature, but they are
used in this book for the greatest overall simplicity and consistency.


14.4.7 NOTATION: ℤ_n denotes the set {i ∈ ℤ; 0 ≤ i < n} = {0, 1, ... n − 1} for n ∈ ℤ₀⁺.


14.4.8 REMARK: Choice of notation for finite stretches of non-negative integers. 

Since the elements of the set ℤ_n in Notation 14.4.7 are all non-negative, one could define ℤ_n as a subset of
the ordinal numbers ω. In fact, the set ℤ_n is essentially identical to the ordinal number n = {0, 1, ... n − 1}.
However, the notation ℤ_n is commonly used. It is also useful to be able to handle negative numbers in the
same context as such sets since the modulo function (Definition 16.5.19) is defined for signed numbers and
is closely associated with the sets ℤ_n.


14.4.9 REMARK: Notations for general sub-intervals of the integers.
General bounded sub-intervals of the integers may be denoted as in Notation 14.4.10. General unbounded
sub-intervals of the integers may be denoted as in Notation 14.4.11.


14.4.10 NOTATION: 

ℤ[m, n] denotes the set {i ∈ ℤ; m ≤ i ≤ n} = {m, m + 1, ... n} for m, n ∈ ℤ.
ℤ[m, n) denotes the set {i ∈ ℤ; m ≤ i < n} = {m, m + 1, ... n − 1} for m, n ∈ ℤ.
ℤ(m, n] denotes the set {i ∈ ℤ; m < i ≤ n} = {m + 1, m + 2, ... n} for m, n ∈ ℤ.
ℤ(m, n) denotes the set {i ∈ ℤ; m < i < n} = {m + 1, m + 2, ... n − 1} for m, n ∈ ℤ.
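
The four bounded interval notations correspond closely to Python `range` objects, whose upper end is always excluded. The following sketch makes the correspondence explicit. The function names are hypothetical and serve only to label the four cases.

```python
def closed(m, n):
    """Z[m, n] = {m, m+1, ..., n}."""
    return range(m, n + 1)


def closed_open(m, n):
    """Z[m, n) = {m, m+1, ..., n-1}."""
    return range(m, n)


def open_closed(m, n):
    """Z(m, n] = {m+1, m+2, ..., n}."""
    return range(m + 1, n + 1)


def open_open(m, n):
    """Z(m, n) = {m+1, m+2, ..., n-1}."""
    return range(m + 1, n)
```

Each of these is empty when the bounds leave no integers to include, for example `closed(5, 2)` or `open_open(3, 4)`.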


14.4.11 NOTATION: 

ℤ(−∞, n] denotes the set {i ∈ ℤ; i ≤ n} for n ∈ ℤ.
ℤ(−∞, n) denotes the set {i ∈ ℤ; i < n} for n ∈ ℤ.
ℤ[m, ∞) denotes the set {i ∈ ℤ; m ≤ i} for m ∈ ℤ.
ℤ(m, ∞) denotes the set {i ∈ ℤ; m < i} for m ∈ ℤ.


14.4.12 REMARK: Computer representations of integers. 

The “two's complement” representation of negative integers in computers, which is the most popular by far in
personal computers, is of the form (−2^n, p), where p ∈ ℤ₀⁺ is a non-negative integer satisfying
2^(n−1) ≤ p < 2^n, where n is the number of bits. The bit-patterns for 0 ≤ p < 2^(n−1) represent themselves.
So the value represented by a two's complement binary number p is ((p + 2^(n−1)) mod 2^n) − 2^(n−1) in terms
of the modulo operation in Definition 16.5.19.
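
The decoding formula above can be tried out directly. The following Python sketch assumes that Python's `%` operator, which returns a non-negative result for a positive modulus, plays the role of the modulo operation. The function name `twos_complement_value` is chosen here for illustration.

```python
def twos_complement_value(p, n):
    """Return the value represented by the n-bit pattern p, where 0 <= p < 2**n."""
    if not 0 <= p < 2 ** n:
        raise ValueError("bit pattern out of range for the given number of bits")
    half = 2 ** (n - 1)
    return ((p + half) % 2 ** n) - half
```

For 8 bits, the pattern 0x7F decodes to 127, the pattern 0x80 decodes to −128, and the pattern 0xFF decodes to −1. The result agrees with the standard library conversion `int.from_bytes(bytes([p]), "big", signed=True)` for every 8-bit pattern p.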


14.4.13 REMARK: Construction of signed integers from unsigned integers with a “sign bit”. 
Figure 14.4.2 illustrates another popular way of representing signed integers. 


[Figure omitted: the non-negative values v = 0, 1, ... 16 on one row and the negative integers −1, −2, ... −16 on a second row.]


Figure 14.4.2 Construction of signed integers from unsigned integers and sign 


This kind of representation is sometimes used in computers. (For example, the n-bit “sign-magnitude”
representation has one sign bit followed by n − 1 value bits. The closely related “one's complement”
representation, which inverts the value bits of negative numbers, is used for checksums within Internet packets.)
In this case, each signed integer corresponds to a unique value-sign pair (v, s), where v is a non-negative
integer and s = 0 or 1. The pair (0, 1) (meaning “−0”) may or may not be excluded from the set. If “−0” is
included in the set, it is defined to be equal to 0 anyway.
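
The value-sign pair representation can be sketched in a few lines of Python. The function names are hypothetical, and the convention that s = 1 marks a negative number is an assumption which matches the reading of (0, 1) as “−0” above.

```python
def to_value_sign(k):
    """Map an integer k to the pair (v, s) with v = |k|, and s = 1 only if k < 0."""
    return (abs(k), 1 if k < 0 else 0)


def from_value_sign(pair):
    """Map a value-sign pair back to an integer. The pair (0, 1) is read as 0."""
    v, s = pair
    return -v if s == 1 else v
```

The conversion never produces the pair (0, 1), but the inverse map accepts it and returns 0, which corresponds to defining “−0” to be equal to 0.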


14.4.14 REMARK: Sequences valued on integer intervals. 
a = (a_α)_{α∈A} denotes a function with domain A by Notation 10.2.10. If A is a subset ℤ[m, n] of ℤ, then a
may be regarded as a sequence and written as (a_i)_{i=m}^n. If A is a cross-product of two suitable subsets
m and n of ℤ, then a may be regarded as a matrix and written as [a_ij].

General finite and infinite sequences are introduced in Definition 12.3.1. Sequences whose domains are totally
ordered sets are introduced in Definition 11.5.20.


14.4.15 REMARK: How to avoid the technicalities of signed integers, rationals and real numbers. 

The number definition strategy adopted by Graves [85], pages 17-39, is to first define positive integers, then
positive rational numbers, then positive real numbers, and then finally include non-positive numbers in the
real number system. (See Graves [85], page 31, for this final step.) This strategy has the substantial advantage
of avoiding numerous tedious technicalities in the theoretical development. According to Graves [85], page 26:
“The second step in the historical development of the concept of number was the introduction of fractions.”
That is true. The algebraic structure of positive rational and real numbers is actually quite satisfying, and
the technical overheads of including negative numbers “at the end” are not very great. It is then possible
to either define signed integers and rational numbers as sub-systems of the real numbers, or one can include
negative numbers in both the integers and the rational numbers in the same way as for real numbers.

Although this “positive numbers first” strategy has many advantages, it is not the way in which numbers
are generally developed in modern introductory courses. All in all, the arguments on each side are about
equal. Therefore the strategy adopted here is to include negative numbers in the integers, rationals and real
numbers as soon as possible rather than leaving it until the last moment. The cost of this strategy is some
avoidable, distracting technicalities.


14.4.16 REMARK: Sums and products of finite sequences of integers. 

There is more than one way to define the sum of a sequence if the first index of the sum is greater than the last.
It is generally safest to let the value equal zero in such cases. Similarly, the multiplicative identity 1 is assumed
as the value if the index interval is empty. The sequence to be summed or multiplied in Notations 14.4.17
and 14.4.18 must at least be well defined on the integer interval ℤ[m, n] for the specified m and n. In practice,
the sequence will typically be well defined on some superset of this interval. (See also Notation 17.1.18 for
the sum of a finite sequence of elements of a general commutative semigroup.)


14.4.17 NOTATION: ∑_{i=m}^n a_i, for m, n ∈ ℤ and any function a : I → ℤ with I ⊇ ℤ[m, n], denotes the
sum defined inductively with respect to n − m by

(1) ∑_{i=m}^n a_i = 0 if n − m < 0,
(2) ∑_{i=m}^n a_i = a_m if n − m = 0,
(3) ∑_{i=m}^n a_i = (∑_{i=m}^{n−1} a_i) + a_n if n − m > 0.


14.4.18 NOTATION: ∏_{i=m}^n a_i, for m, n ∈ ℤ and any function a : I → ℤ with I ⊇ ℤ[m, n], denotes the
product defined inductively with respect to n − m by

(1) ∏_{i=m}^n a_i = 1 if n − m < 0,
(2) ∏_{i=m}^n a_i = a_m if n − m = 0,
(3) ∏_{i=m}^n a_i = (∏_{i=m}^{n−1} a_i) a_n if n − m > 0.

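
The three cases in each of Notations 14.4.17 and 14.4.18 translate directly into recursive functions. The following Python sketch is one such translation. The names `finite_sum` and `finite_product` are illustrative, and the sequence is passed as a Python function a which is assumed to be defined at least on the indices from m to n.

```python
def finite_sum(a, m, n):
    """Sum of a(i) for i in Z[m, n], defined by induction on n - m."""
    if n - m < 0:
        return 0                      # empty index interval
    if n - m == 0:
        return a(m)
    return finite_sum(a, m, n - 1) + a(n)


def finite_product(a, m, n):
    """Product of a(i) for i in Z[m, n], defined by induction on n - m."""
    if n - m < 0:
        return 1                      # empty index interval
    if n - m == 0:
        return a(m)
    return finite_product(a, m, n - 1) * a(n)
```

For instance, `finite_sum(lambda i: i, 5, 3)` is 0 and `finite_product(lambda i: i, 2, 1)` is 1, which are the empty-interval conventions described in Remark 14.4.16.
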
14.5. Extended integers 


14.5.1 REMARK: Pseudo-numbers for “points at infinity” of number systems. 
It does not matter how the pseudo-numbers ∞ and −∞ are represented. The only thing that really matters
is the order relations ∀n ∈ ℤ, n < ∞ and ∀n ∈ ℤ, −∞ < n, and of course −∞ < ∞.


14.5.2 DEFINITION: The set of extended integers is the set ℤ ∪ {−∞, ∞}.


14.5.3 NOTATION: ℤ̄ denotes the set of extended integers ℤ ∪ {−∞, ∞}.


14.5.4 NOTATION: 
ℤ̄⁺ = {n ∈ ℤ̄; n > 0} = {1, 2, ..., ∞} denotes the set of positive extended integers.
ℤ̄₀⁺ = {n ∈ ℤ̄; n ≥ 0} = {0, 1, ..., ∞} denotes the set of non-negative extended integers.
ℤ̄⁻ = {n ∈ ℤ̄; n < 0} = {−∞, ... −3, −2, −1} denotes the set of negative extended integers.
ℤ̄₀⁻ = {n ∈ ℤ̄; n ≤ 0} = {−∞, ... −3, −2, −1, 0} denotes the set of non-positive extended integers.
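
Since only the order relations matter, any representation of the two pseudo-numbers will do. The following Python sketch uses the floating-point infinities from the standard `math` module as stand-ins, which is purely an assumption of convenience. Python compares integers of any size exactly against these values. The function names are illustrative.

```python
import math

POS_INF = math.inf     # stand-in for the pseudo-number ∞ (assumed representation)
NEG_INF = -math.inf    # stand-in for the pseudo-number −∞


def is_extended_integer(x):
    """True for ordinary integers and for the two points at infinity."""
    if x == POS_INF or x == NEG_INF:
        return True
    return isinstance(x, int) and not isinstance(x, bool)


def is_positive_extended(x):
    """Membership test for the set of positive extended integers."""
    return is_extended_integer(x) and x > 0


def is_non_positive_extended(x):
    """Membership test for the set of non-positive extended integers."""
    return is_extended_integer(x) and x <= 0
```

With this choice, every integer n satisfies `NEG_INF < n < POS_INF`, which is all that the order relations for the extended integers require.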


14.5.5 REMARK: Notations for sets of extended integers. 

The bar over the symbol for the set of integers in Notation 14.5.3 is inspired by the usual notation for the
closure of a set in topology. The rough idea here is that by including the pseudo-numbers ∞ and −∞, the
set of numbers is made in some sense “complete” by including the limits of numbers as they get arbitrarily
large in either direction. Of course, the notion of “limit” requires topology, which is not defined for a purely
algebraic system. (Example 31.11.2 gives a hint of how to give topological meaning to the idea that ∞ is
the limit of arbitrarily large integers.) The same over-bar convention is followed consistently (in this book)
for the rational and real number systems also, as summarised in Remark 1.5.1, Table 1.5.1.


14.5.6 REMARK: The extended natural numbers. 

Since the natural numbers are often identified with the positive integers ℤ⁺ (although not always, as noted
in Remark 14.1.1, Table 14.1.1), it is somewhat superfluous to introduce a definition and notation for the
extended natural numbers. Nevertheless, these are given as Definition 14.5.7 and Notation 14.5.8.


14.5.7 DEFINITION: The set of extended natural numbers is the set ℕ ∪ {∞}.


14.5.8 NOTATION: ℕ̄ denotes the set of extended natural numbers.
Thus ℕ̄ = ℕ ∪ {∞} = ℤ̄⁺ = {1, 2, ..., ∞}.


14.6. Cartesian products of sequences of sets 

14.6.1 NOTATION: X^n, for a set X and n ∈ ℤ₀⁺, denotes the set of functions ℕ_n → X.

14.6.2 REMARK: Application of non-negative integers to the definition of set sequences. 

The set X^n in Notation 14.6.1 is the same as the Cartesian product ×_{i∈ℕ_n} X_i with X_i = X for all i ∈ ℕ_n.
(See Remark 14.1.20 for “leading interval” sets of natural numbers ℕ_n = {1, ... n}. See Definition 10.11.2
and Notation 10.11.3 for Cartesian set-products ×_{i∈I} X_i for arbitrary set families (X_i)_{i∈I}.)

One constant inconvenience with the Cartesian-product integral powers of sets in Notation 14.6.1 is the fact
that “X^n” is customarily defined to equal X^(ℕ_n) = (ℕ_n → X) = ({1, ... n} → X), but the literal meaning of
the set-expression “X^n” is X^n = (n → X) = ({0, ... n − 1} → X). These sets are not equal when n ≠ 0
because n ≠ ℕ_n. (This issue is also mentioned in Remark 14.12.1.)

Theorem 14.6.3 is essentially the same as Theorem 10.11.6 (i) because X⁰ = ×_{i∈I} S_i, where I = ∅ and
(S_i)_{i∈I} = ∅ is the empty function.


14.6.3 THEOREM: The zeroth power of any set contains the empty set, and only the empty set. 
X⁰ = {∅} for any set X.


PROOF: Let X be a set. Then X⁰ is the set of functions from ℕ₀ to X by Notation 14.6.1, where ℕ₀ = ∅
by Theorem 14.1.22 (i). Therefore X⁰ = {∅} by Theorem 10.2.26 (ii).
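
Theorem 14.6.3 has a direct analogue in Python, where a function on {1, ... n} can be encoded as a dictionary and the empty function as the empty dictionary. The sketch below enumerates all such functions for a finite set X. The name `power` and the dictionary encoding are assumptions made for this illustration.

```python
from itertools import product


def power(X, n):
    """Return all functions from {1, ..., n} to X, each encoded as a dict."""
    keys = range(1, n + 1)
    return [dict(zip(keys, values)) for values in product(list(X), repeat=n)]
```

For n = 0 the result is a list containing only the empty dictionary, whatever the set X is, including the case where X is empty. For n > 0 and X empty, the result is the empty list.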


14.6.4 REMARK: The redundant case of zero-length sequences of sets. 

Notation 14.6.1 has the interesting consequence that $X^0 = Y^0$ for any sets $X$ and $Y$, which follows from
Theorem 14.6.3. Thus the evaluation of a set expression often throws away any traces of the original classes
of objects in the expression. So $\mathbb{R}^0$ and $\mathbb{Z}^0$ are the same set, for example, even though they might be thought
of as being in different classes of sets in some sense.


Preserving the “nature” of a set could be achieved by attaching a class tag to each set, which is suggested 
in Remark 8.8.1 because the set-value of an expression does not contain its entire meaning. Such tagging is 
done in computer languages because computers can't make reliable guesses of the class membership of data 
objects. Humans use context to guess meanings and class membership. 


Theorem 14.6.3 has the interesting further consequence that $\#(\emptyset^\emptyset) = 1$, as shown in Theorem 14.7.9. More
generally, $\#(X^0) = 1$ for every set $X$.


14.6.5 DEFINITION: An (ordered) n-tuple of elements of a set $X$ for $n \in \mathbb{Z}_0^+$ means an element of $X^n$.


A 2-tuple may be referred to as an ordered pair or (very rarely) as a duple. A triple means a 3-tuple. A 
quadruple means a 4-tuple. A quintuple means a 5-tuple. A sextuple means a 6-tuple. A septuple means a 
7-tuple. An octuple means an 8-tuple. 


14.6.6 REMARK: Two representations for the ordered-pair concept. 
The phrase “ordered pair" for a 2-tuple should be used with care because it can also mean Definition 9.2.2. 


14.6.7 REMARK: Latin versus Greek number-names. 
Since the other kinds of tuple use Latin names, a “quintuple” should not be called a “pentuple”. Thus a 
6-tuple is called a “sextuple”, not a “hexuple” for example. 


14.6.8 NOTATION: The expressions $(a,b)$, $(a,b,c)$, $(a,b,c,d)$ and $(a,b,c,d,e)$ for $a,b,c,d,e \in X$ denote
respectively a duple, triple, quadruple and quintuple of elements of $X$. (The ordering on the printed line
represents the order within each tuple.) Notations for general n-tuples are defined inductively on $n$.


14.6.9 REMARK: “Standard” concatenation and subsequence map template.

Definitions 14.6.10 and 14.6.11 assume that sequences start with 1. This is the de-facto “standard” initial 
index value, but in some contexts, the sequences may start with 0 as in Definition 14.12.6 (iii, ix). Note also 
that the notation in Definition 14.6.11 clashes with Notation 11.5.26, which does not “shift the index”. 


The concatenation and subsequence maps are in some sense inverses of each other. This is asserted in 
Theorem 14.6.12, where part (i) is essentially equivalent to Theorem 14.12.10. 


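14.6.10 DEFINITION: The standard concatenation map for a set $X$ and $m,n \in \mathbb{Z}_0^+$ is the map
$\mathrm{concat} : X^m \times X^n \to X^{m+n}$ which is defined for $a \in X^m$ and $b \in X^n$ by $\mathrm{concat} : (a,b) \mapsto c$, where $c \in X^{m+n}$ is
defined by $c_i = a_i$ for $i \in \mathbb{N}_m$ and $c_{i+m} = b_i$ for $i \in \mathbb{N}_n$. In other words,

\[
\forall a \in X^m,\ \forall b \in X^n,\ \forall i \in \mathbb{N}_{m+n},\quad
\mathrm{concat}(a,b)(i) =
\begin{cases}
a_i & \text{if } i \in \mathbb{N}_m \\
b_{i-m} & \text{if } i \in \mathbb{N}_{m+n} \setminus \mathbb{N}_m.
\end{cases}
\]


14.6.11 DEFINITION: The standard subsequence map for a set $X$ and $p \in \mathbb{Z}_0^+$ and $m,n \in \mathbb{Z}^+$ with
$m \le n \le p$ is the map $\Pi_m^n : X^p \to X^{n-m+1}$ which is defined by

\[
\forall a \in X^p,\ \forall i \in \mathbb{N}_{n-m+1},\quad \Pi_m^n(a)(i) = a_{m+i-1}.
\]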


14.6.12 THEOREM: Concatenation and subsequencing are inverse operations. 


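Let $X$ be a set. Let $m,n \in \mathbb{Z}_0^+$.
(i) $\forall x \in X^{m+n}$, $x = \mathrm{concat}(\Pi_1^m(x), \Pi_{m+1}^{m+n}(x))$.
(ii) $\forall y \in X^m$, $\forall z \in X^n$, $y = \Pi_1^m(\mathrm{concat}(y,z))$.
(iii) $\forall y \in X^m$, $\forall z \in X^n$, $z = \Pi_{m+1}^{m+n}(\mathrm{concat}(y,z))$.


PROOF: For part (i), let $x \in X^{m+n}$. Let $i \in \mathbb{N}_{m+n}$. If $i \in \mathbb{N}_m$, then $\mathrm{concat}(\Pi_1^m(x), \Pi_{m+1}^{m+n}(x))(i) =
\Pi_1^m(x)(i) = x_i$ by Definitions 14.6.10 and 14.6.11. If $i \in \mathbb{N}_{m+n} \setminus \mathbb{N}_m$, then $\mathrm{concat}(\Pi_1^m(x), \Pi_{m+1}^{m+n}(x))(i) =
\Pi_{m+1}^{m+n}(x)(i-m) = x_{(m+1)+(i-m)-1} = x_i$. Thus $\mathrm{concat}(\Pi_1^m(x), \Pi_{m+1}^{m+n}(x))(i) = x_i$ for all $i \in \mathbb{N}_{m+n}$. Hence
$x = \mathrm{concat}(\Pi_1^m(x), \Pi_{m+1}^{m+n}(x))$.

For part (ii), let $y \in X^m$ and $z \in X^n$. Let $i \in \mathbb{N}_m$. Then $\mathrm{concat}(y,z)(i) = y_i$ by Definition 14.6.10.
So $\Pi_1^m(\mathrm{concat}(y,z))(i) = \mathrm{concat}(y,z)(i) = y_i$ by Definition 14.6.11. Thus $\Pi_1^m(\mathrm{concat}(y,z))(i) = y_i$ for all
$i \in \mathbb{N}_m$. Hence $y = \Pi_1^m(\mathrm{concat}(y,z))$.

For part (iii), let $y \in X^m$ and $z \in X^n$. Let $i \in \mathbb{N}_n$. Then $m+i \in \mathbb{N}_{m+n} \setminus \mathbb{N}_m$. So $\mathrm{concat}(y,z)(m+i) = z_i$
by Definition 14.6.10. Then $\Pi_{m+1}^{m+n}(\mathrm{concat}(y,z))(i) = \mathrm{concat}(y,z)((m+1)+i-1) = \mathrm{concat}(y,z)(m+i) = z_i$
by Definition 14.6.11. Thus $\Pi_{m+1}^{m+n}(\mathrm{concat}(y,z))(i) = z_i$ for all $i \in \mathbb{N}_n$. Hence $z = \Pi_{m+1}^{m+n}(\mathrm{concat}(y,z))$.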
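
The three assertions of Theorem 14.6.12 can be checked on a small example. The following Python sketch is only an illustration, not part of the formal development. It assumes that an element of X^n is modelled by a Python tuple of length n, and the names concat and subseq are chosen here for illustration. Python tuples are indexed from 0, so the 1-based indices of Definitions 14.6.10 and 14.6.11 are shifted inside subseq.

```python
def concat(a, b):
    """Standard concatenation: (a_1..a_m), (b_1..b_n) -> (a_1..a_m, b_1..b_n)."""
    return tuple(a) + tuple(b)


def subseq(a, m, n):
    """Standard subsequence map: returns (a_m, ..., a_n) with 1-based indices.

    The case n == m - 1 is allowed here and yields the empty tuple,
    which models the only element of X^0.
    """
    if m < 1 or n > len(a) or n < m - 1:
        raise ValueError("need 1 <= m, m - 1 <= n and n <= len(a)")
    return tuple(a[m - 1:n])


x = ("a", "b", "c", "d", "e")
m, n = 2, 3
y = subseq(x, 1, m)          # first m entries of x
z = subseq(x, m + 1, m + n)  # remaining n entries of x

print(concat(y, z) == x)                      # part (i)
print(subseq(concat(y, z), 1, m) == y)        # part (ii)
print(subseq(concat(y, z), m + 1, m + n) == z)  # part (iii)
```

Allowing n = m - 1 in subseq is a design choice of this sketch so that the cases m = 0 and n = 0 of the theorem can also be tried.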




14.7. Indicator functions and delta functions 


14.7.1 REMARK: The indicator function is in fact a function template. 

The numbers 0 and 1 in Definition 14.7.2 may be unsigned integers, signed integers, real numbers, complex 
numbers, or elements of any unitary ring or field. The domain and range of the function must be interpreted 
according to application context. The indicator function is thus more accurately described as a “function 
template”. It is a recipe for constructing a relation with domain $S$ and range $V$ for any given sets $S$ and $V$,
and any nominated pair of elements $0_V, 1_V \in V$.


The value set $V$ for indicator functions must contain at least elements $0_V$ and $1_V$ with $0_V \ne 1_V$. Typically
the set $V$ will be the ordinal number set $\omega$ or $2 = \{0,1\} \subseteq \omega$, or a semigroup, unitary ring or field such as
$\mathbb{Z}_0^+$, $\mathbb{Z}$, $\mathbb{Q}$ or $\mathbb{R}$. When $V$ has an addition operation, one expects $0_V = 0_V + 0_V$, $1_V = 0_V + 1_V = 1_V + 0_V$ and
$0_V \ne 1_V + 1_V \ne 1_V$. When $V$ has a multiplication operation, one expects $0_V = 0_V \cdot 0_V = 0_V \cdot 1_V = 1_V \cdot 0_V$
and $1_V \cdot 1_V = 1_V$. The same observations apply to the Kronecker delta function (Definition 14.7.10) and the
Levi-Civita alternating symbol (Definition 14.8.34).
14.7.2 DEFINITION: The indicator function for a subset $A$ of a set $S$, with value set $V$, is the function
$f : S \to V$ defined by

\[
\forall x \in S,\quad f(x) =
\begin{cases}
1_V & \text{if } x \in A \\
0_V & \text{if } x \in S \setminus A.
\end{cases}
\]

14.7.3 NOTATION: $\chi_A$ for a subset $A$ of a set $S$, with value set $V$, denotes the indicator function of $A$ in $S$.
Thus the function $\chi_A : S \to V$ satisfies

\[
\forall x \in S,\quad \chi_A(x) =
\begin{cases}
1_V & \text{if } x \in A \\
0_V & \text{if } x \in S \setminus A,
\end{cases}
\]

for any subset $A$ of $S$.

14.7.4 REMARK: An alternative name for an indicator function is “characteristic function". 


Figure 14.7.1 illustrates an indicator function $\chi_A$ where $A = \{a\}$ is a subset of $S = \mathbb{Z}^2$. The domain $S$ is
usually clear from the context.


Figure 14.7.1 Function $\chi_{\{a\}}$ for $S = \mathbb{Z}^2$


An indicator function is often called a characteristic function, but this can be confusing, especially in
probability contexts. The customary use of the Greek letter $\chi$ for the indicator function is no doubt derived
from the first letter of χαρακτήρ (which means “stamp, mark, characteristic trait, character, token”).


14.7.5 THEOREM: Equinumerosity of power sets and sets of indicator functions.
For any set $S$, the sets $\mathbb{P}(S)$ and $2^S$ are equinumerous.


PROOF: Define $f : \mathbb{P}(S) \to 2^S$ by $f : A \mapsto \chi_A$ for all $A \subseteq S$, where $\chi_A$ denotes the indicator function for
$A$ as a subset of $S$. Then $f$ is a bijection.


14.7.6 REMARK: Identification of power sets with sets of indicator functions. 

The sets $2^S$ and $\mathbb{P}(S)$ are quite frequently identified with each other. The inverse of the canonical map in
the proof of Theorem 14.7.5 is the map $f^{-1} : g \mapsto \{x \in S;\ g(x) = 1\}$. Thus subsets of sets and indicator
functions on sets may be regarded as “equi-informational”.
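
For a small finite set, the canonical map of Theorem 14.7.5 and its inverse can be enumerated directly. The Python sketch below is only an illustration under stated assumptions: a function S -> 2 is modelled by a dict with values 0 and 1, subsets are modelled by frozenset objects, and the names indicator and subset_of are chosen here and do not appear in the text.

```python
from itertools import combinations, product


def indicator(A, S):
    """The indicator function chi_A : S -> {0, 1}, stored as a dict."""
    return {x: (1 if x in A else 0) for x in S}


def subset_of(g):
    """The inverse map g -> {x in S; g(x) = 1}."""
    return frozenset(x for x, value in g.items() if value == 1)


S = ("p", "q", "r")

# All subsets of S, and all functions from S to {0, 1}.
subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
functions = [dict(zip(S, bits)) for bits in product((0, 1), repeat=len(S))]

# The two maps are mutually inverse, so each is a bijection.
print(all(subset_of(indicator(A, S)) == A for A in subsets))
print(all(indicator(subset_of(g), S) == g for g in functions))
print(len(subsets), len(functions))
```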


In the case $S = \omega$, there is a bijection $\phi : 2^\omega \to [0,1)$ which maps infinite sequences of zeros and ones to the
binary expansions of real numbers in the interval $[0,1)$, namely $\phi : g \mapsto \sum_{n\in\omega} g(n)\, 2^{-n-1}$.


14.7.7 DEFINITION: The (integer) power-of-two function is the function $f : \mathbb{Z}_0^+ \to \mathbb{Z}^+$ which satisfies
$f(0) = 1$ and $f(n+1) = 2f(n)$ for $n \in \mathbb{Z}_0^+$.


14.7.8 NOTATION: $2^n$ for $n \in \mathbb{Z}_0^+$ denotes the value of the integer power-of-two function for argument $n$.


14.7.9 THEOREM: The cardinality of the set of functions from the empty set to the empty set is 1.
$\#(\emptyset^\emptyset) = 1$.


PROOF: By Theorem 10.2.25, $\emptyset^\emptyset = \{\emptyset\}$. Therefore $\#(\emptyset^\emptyset) = 1$.
14.7.10 DEFINITION: The (Kronecker) delta function on a set $S$, for target set $V$, is the function
$f : S \times S \to V$ which is defined by

\[
\forall x,y \in S,\quad f(x,y) =
\begin{cases}
1_V & \text{if } x = y \\
0_V & \text{if } x \ne y.
\end{cases}
\]

14.7.11 NOTATION: $\delta$, for sets $S$ and $V$, denotes the Kronecker delta function on $S$ with any value set $V$.


$\delta_{ij}$, $\delta^{ij}$, $\delta^i_j$ and $\delta_i^j$ denote the value $\delta(i,j)$ of the Kronecker delta function $\delta : S \times S \to V$ for $i,j \in S$, for any
set $S$ with any value set $V$.


14.7.12 REMARK: The Kronecker delta function is a function template. 
The Kronecker delta function in Definition 14.7.10 is often applied to sets $V$ which are subsets or supersets
of the integers. The Kronecker delta function for $V = \mathbb{Z}$ is illustrated in Figure 14.7.2.


Figure 14.7.2 Kronecker delta function on the integers


As mentioned in Remark 14.7.1, the range of the Kronecker delta function must be interpreted according to
context. The symbols “0” and “1” must be defined within the context. Thus the Kronecker delta function
is really a function template.


14.7.13 REMARK: Customary notation for the Kronecker delta function template. 

An irony of the Kronecker delta function notation $\delta$ is that it is apparently a mnemonic for the word
“difference”, whereas the value 1 is generally associated with the meaning “true” and the value 0 is generally
associated with the meaning “false”. But $\delta_{ij}$ equals 1 if and only if $i$ and $j$ are the same!


14.7.14 REMARK: Relation between indicator functions and Kronecker delta functions. 
Theorem 14.7.15 shows that the indicator function and delta function for any set S are closely related. A 
Kronecker delta function may be thought of as an indicator function which indicates a singleton set. 


14.7.15 THEOREM: Expression for indicator functions of singletons in terms of the Kronecker delta.
For any set $S$,

\[
\forall i,j \in S,\quad \chi_{\{i\}}(j) = \delta_{ij}.
\]


PROOF: Let $S$ be a set. Let $i,j \in S$. Let $A = \{i\}$. Then $A \subseteq S$. By Definition 14.7.2 and Notation 14.7.3,
$\chi_{\{i\}}(j) = \chi_A(j) = 1_V$ if $j \in A = \{i\}$, and $\chi_{\{i\}}(j) = 0_V$ if $j \notin A = \{i\}$. So $\chi_{\{i\}}(j) = 1_V$ if $i = j$, and
$\chi_{\{i\}}(j) = 0_V$ if $i \ne j$. So $\chi_{\{i\}}(j) = \delta_{ij}$ by Definition 14.7.10 and Notation 14.7.11.


14.7.16 REMARK: Support of functions. 

In the analysis of partial differential equations, one defines the “support” of a function to be the topological 
closure of the set of points in its domain where the function is non-zero. This concept is also useful in non- 
topological contexts. (See for example Remarks 12.6.5 (8) and 22.2.9.) There is no inconsistency between 
the topological definition and the non-topological Definition 14.7.17 if the discrete topology is assumed on 
the domain set S. Like the indicator and delta functions, the support of a function is a template whose 
details must be determined by the context. 


14.7.17 DEFINITION: The support of a function $f : S \to V$ is the set $\{x \in S;\ f(x) \ne 0_V\}$, where the zero
element $0_V \in V$ is determined by the context.


14.7.18 NOTATION: $\mathrm{supp}(f)$, for a function $f$, denotes the support of $f$.


14.7.19 THEOREM: The support of the indicator function of a set is the set itself.
Let $S$ and $V$ be sets. Let $0_V, 1_V \in V$ be distinct. Then $\forall A \in \mathbb{P}(S)$, $\mathrm{supp}(\chi_A) = A$.


PROOF: The assertion follows directly from Definitions 14.7.2 and 14.7.17.
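
The “template” character of the indicator function, the Kronecker delta function and the support can be made concrete by passing the zero and unit elements as parameters. The Python sketch below is only an illustration. The parameter names one and zero (standing for the context-dependent elements 1_V and 0_V) and the function names are assumptions made here, and a function on a finite set S is modelled by a dict.

```python
def indicator(A, S, one=1, zero=0):
    """Indicator function of A in S with nominated elements one, zero of V."""
    return {x: (one if x in A else zero) for x in S}


def delta(i, j, one=1, zero=0):
    """Kronecker delta with nominated elements one, zero of V."""
    return one if i == j else zero


def supp(f, zero=0):
    """Support {x; f(x) != zero} of a function stored as a dict."""
    return {x for x, value in f.items() if value != zero}


S = range(-3, 4)

# Theorem 14.7.15: the indicator function of {i} evaluated at j equals delta_ij.
print(all(indicator({i}, S)[j] == delta(i, j) for i in S for j in S))

# Theorem 14.7.19: the support of an indicator function is the set itself.
A = {-1, 2}
print(supp(indicator(A, S)) == A)

# The same recipe with a different value set, here floating-point numbers.
print(supp(indicator(A, S, one=1.0, zero=0.0), zero=0.0) == A)
```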


14.7.20 REMARK: Kronecker delta pseudo-notation template for propositional arguments. 
It seems more convenient to write $f(x,y) = \delta($“$y = x^2$”$)$, for example, than to write

\[
\forall (x,y) \in \mathbb{R}^2,\quad f(x,y) =
\begin{cases}
1 & \text{if } y = x^2 \\
0 & \text{otherwise.}
\end{cases}
\]


There are two problems with this. First, the formula $f(x,y) = \delta($“$y = x^2$”$)$ specifies neither the domain nor
the range of $f$. This problem is not too serious because the Kronecker delta function also has an unspecified
domain and range, which are generally indicated in the surrounding context. The second problem is more
serious, namely that the argument of $\delta$ is a proposition. This breaks the rules of ZF set theory, which is a
first-order language. Even worse than this is the fact that, in the example, $x$ and $y$ must be substituted into
the proposition somehow. However, if this form of notation is understood to be a pseudo-notation template,
it can be used reasonably unambiguously as a shorthand.


Despite its customary notation “$\delta$”, the “Kronecker delta function for propositions” is more similar to
an indicator function than a delta function. In any given context, the delta pseudo-notation template in 
Notation 14.7.21 can be converted to valid ZF language by copying the proposition P into the context. Note, 
however, that the string of characters in the text representing the proposition must be significant within the 
context. This is metamathematical enough to require the [MM] tag. 


14.7.21 NOTATION[MM]: Kronecker delta pseudo-notation template for propositions. 
$\delta(P)$, for a proposition $P$ (in ZF set theory), denotes $1_V$ if $P$ is true, and $0_V$ if $P$ is false, where $V$ is a set
(indicated in context) which contains specified zero and unit elements. In other words,

\[
\delta(P) =
\begin{cases}
1_V & \text{if } P \\
0_V & \text{otherwise.}
\end{cases}
\]


14.8. Permutations 


14.8.1 REMARK: Permutations within one set versus permutations between two sets. 
The word “permutation” is used for two different but related concepts. The first kind of permutation, given 
in Definition 14.8.2, is a bijection P from a set X to itself. 


The second kind of permutation is an injection f from a set X to a set Y. This kind of permutation is called 
a “k-permutation” by EDM2 [113], article 330, to distinguish it from the set-bijection style of permutation. 
In the special case that $X = \mathbb{N}_k$, the two concepts are the same. Feller [70], page 28, uses the term “ordered
sample” for an ordered selection and uses the word “permutation” for the special case $\#(X) = k$. The
second kind of permutation is called an “ordered selection” in this book. (This is defined in Section 14.10.)


14.8.2 DEFINITION: A permutation of a set X is a bijection from X to X. 
14.8.3 NOTATION: perm(X) for a set X denotes the set of permutations of X. 


14.8.4 REMARK: Application of permutations to rearrangements of domains or ranges of function. 

A permutation $P$ of a set $X$ is often used in conjunction with a function $g : X \to Y$ or a function $h : Y \to X$
for some set $Y$. Since $P : X \to X$ is a bijection, the functions $g \circ P : X \to Y$ and $P \circ h : Y \to X$ are
well defined. The function $g \circ P$ is injective if and only if $g$ is injective. Similarly, $g \circ P$ is surjective if and
only if $g$ is surjective. Corresponding relations hold between $P \circ h$ and $h$. Thus permutations are useful for
modifying functions by rearranging the elements of the domain and/or range.
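14.8.5 THEOREM: Closure conditions for some operations on sets of permutations.
Let $X$ be a set.


(i) $\mathrm{id}_X \in \mathrm{perm}(X)$.

(ii) $\forall P \in \mathrm{perm}(X)$, $P^{-1} \in \mathrm{perm}(X)$.

(iii) $\forall P, Q \in \mathrm{perm}(X)$, $P \circ Q \in \mathrm{perm}(X)$.

(iv) $\forall Y \in \mathbb{P}(X)$, $\forall P \in \mathrm{perm}(Y)$, $P \cup \mathrm{id}_{X \setminus Y} \in \mathrm{perm}(X)$.


PROOF: Part (i) follows from Theorem 10.5.5 and Definition 14.8.2.
Part (ii) follows from Theorem 10.5.11 and Definition 14.8.2.
Part (iii) follows from Theorem 10.5.6 (iii) and Definition 14.8.2.


For part (iv), let $Y \in \mathbb{P}(X)$ and $P \in \mathrm{perm}(Y)$. Let $\bar{P} = P \cup \mathrm{id}_{X \setminus Y}$. Then $\bar{P}$ is a well-defined function
because $P$ and $\mathrm{id}_{X \setminus Y}$ are well-defined functions and $P \cap \mathrm{id}_{X \setminus Y} = \emptyset$ because $\mathrm{Dom}(P) \cap (X \setminus Y) = \emptyset$, and
then $\bar{P} : X \to X$ is a surjection because $\mathrm{Range}(\bar{P}) = \mathrm{Range}(P) \cup \mathrm{Range}(\mathrm{id}_{X \setminus Y}) = Y \cup (X \setminus Y) = X$. But $\bar{P}$
is injective because $P$ and $\mathrm{id}_{X \setminus Y}$ are injective and $\mathrm{Dom}(P) \cap \mathrm{Dom}(\mathrm{id}_{X \setminus Y}) = \emptyset$. Hence $\bar{P} \in \mathrm{perm}(X)$.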
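
The closure conditions of Theorem 14.8.5 can be tried out on finite sets. The Python sketch below is only an illustration under stated assumptions: a permutation of a finite set is modelled by a dict which maps each element to its image, and the helper names identity, compose, inverse, extend and is_permutation are chosen here and do not appear in the text. The test in is_permutation is valid for finite sets only.

```python
def identity(X):
    """The identity map id_X."""
    return {x: x for x in X}


def compose(p, q):
    """The composite p∘q, which maps x to p(q(x))."""
    return {x: p[q[x]] for x in q}


def inverse(p):
    """The inverse of a bijection stored as a dict."""
    return {y: x for x, y in p.items()}


def extend(p, X):
    """The extension of p by the identity map on the rest of X."""
    return {x: p.get(x, x) for x in X}


def is_permutation(p, X):
    """For finite X: p is a bijection X -> X iff its domain and range equal X."""
    return set(p) == set(X) and set(p.values()) == set(X)


X = {1, 2, 3, 4, 5}
Y = {1, 2, 3}
P = {1: 2, 2: 3, 3: 1}   # a permutation of Y
Q = {1: 2, 2: 1, 3: 3}   # another permutation of Y

print(is_permutation(identity(X), X))    # part (i)
print(is_permutation(inverse(P), Y))     # part (ii)
print(is_permutation(compose(P, Q), Y))  # part (iii)
print(is_permutation(extend(P, X), X))   # part (iv)
```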


14.8.6 REMARK: The set of permutations of a set forms a group. 
The set perm(X) of permutations on a fixed set X constitutes a group if the group operation is function 
composition, the group identity is the identity map on X, and the inverse elements are function inverses. 


Permutations are typically defined on finite sets. Permutations are useful for defining symmetries (e.g. for 
tensor spaces as in Section 30.1) and for constructing examples of finite groups. (It is well known, and very 
easy to show, that every finite group is isomorphic to a group of permutations of a finite set.) When the set 
$X$ is infinite, a useful subgroup of $\mathrm{perm}(X)$ is the set of “finite permutations”, which are permutations for
which only a finite number of elements are altered.

The expression $\mathrm{perm}_0(X)$ in Notation 14.8.11 may be justified by analogy with the standard notation $C_0^\infty(X)$
for smooth functions with bounded support. In this case, the functions $P \in \mathrm{perm}_0(X)$ have finitely many
$i \in X$ with $P(i) \ne i$, which could be thought of as a kind of “finite support”.


14.8.7 DEFINITION: The support of a permutation $P \in \mathrm{perm}(X)$, for a set $X$, is the set $\{x \in X;\ P(x) \ne x\}$.


14.8.8 NOTATION: $\mathrm{supp}(P)$, for a permutation $P$ of a set $X$, denotes the support of $P$. In other words,
$\forall P \in \mathrm{perm}(X)$, $\mathrm{supp}(P) = \{x \in X;\ P(x) \ne x\}$.


14.8.9 THEOREM: Some properties of the support of a permutation. 
Let X be a set. 


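(i) $\mathrm{supp}(\mathrm{id}_X) = \emptyset$.
(ii) $\forall P \in \mathrm{perm}(X)$, $\mathrm{supp}(P^{-1}) = \mathrm{supp}(P)$.
(iii) $\forall P, Q \in \mathrm{perm}(X)$, $\mathrm{supp}(P \circ Q) \subseteq \mathrm{supp}(P) \cup \mathrm{supp}(Q)$.
(iv) $\forall P \in \mathrm{perm}(X)$, $\forall x \in \mathrm{supp}(P)$, $P(x) \in \mathrm{supp}(P)$.
(v) $\forall P \in \mathrm{perm}(X)$, $\forall y \in \mathrm{supp}(P)$, $P^{-1}(y) \in \mathrm{supp}(P)$.
(vi) $\forall P \in \mathrm{perm}(X)$, $P|_{\mathrm{supp}(P)} \in \mathrm{perm}(\mathrm{supp}(P))$.
(vii) $\forall P \in \mathrm{perm}(X)$, $P = P|_{\mathrm{supp}(P)} \cup \mathrm{id}_{X \setminus \mathrm{supp}(P)}$.
(viii) $\forall Y \in \mathbb{P}(X)$, $\forall P \in \mathrm{perm}(Y)$, $P \cup \mathrm{id}_{X \setminus Y} \in \mathrm{perm}(X)$.
(ix) $\forall P \in \mathrm{perm}(X)$, $\forall Y \in \mathbb{P}(X)$, $\mathrm{supp}(P) \subseteq Y \Rightarrow P|_Y \in \mathrm{perm}(Y)$.
(x) $\forall Y \in \mathbb{P}(X)$, $\forall P \in \mathrm{perm}(Y)$, $\mathrm{supp}(P \cup \mathrm{id}_{X \setminus Y}) = \mathrm{supp}(P)$.


PROOF: Part (i) follows from Notations 14.8.8 and 10.2.29.

For part (ii), let $P \in \mathrm{perm}(X)$ and let $x \in \mathrm{supp}(P)$. Then $P(x) \ne x$. So $x = P^{-1}(P(x)) \ne P^{-1}(x)$ because $P : X \to X$ is a
bijection, which implies that $P^{-1} : X \to X$ is a well-defined function with $P^{-1} \circ P = \mathrm{id}_X$. So $x \in \mathrm{supp}(P^{-1})$
by Notation 14.8.8. Therefore $\mathrm{supp}(P) \subseteq \mathrm{supp}(P^{-1})$. Similarly $\mathrm{supp}(P^{-1}) \subseteq \mathrm{supp}(P)$ by substituting $P^{-1}$
for $P$. Hence $\mathrm{supp}(P^{-1}) = \mathrm{supp}(P)$.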
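For part (iii), let $x \in \mathrm{supp}(P \circ Q)$. Then $P(Q(x)) \ne x$. So $Q(x) = P^{-1}(P(Q(x))) \ne P^{-1}(x)$ because
$P^{-1} : X \to X$ is a well-defined bijection. Therefore $Q(x) \ne x$ or $P^{-1}(x) \ne x$ (or both). So $x \in \mathrm{supp}(Q)$ or
$x \in \mathrm{supp}(P^{-1})$. So $x \in \mathrm{supp}(P) \cup \mathrm{supp}(Q)$ by part (ii). Hence $\mathrm{supp}(P \circ Q) \subseteq \mathrm{supp}(P) \cup \mathrm{supp}(Q)$.

For part (iv), let $P \in \mathrm{perm}(X)$. Let $x \in \mathrm{supp}(P)$ and $y = P(x)$. Then $y \in X$. Suppose that $y \notin \mathrm{supp}(P)$.
Then $y \notin \mathrm{supp}(P^{-1})$ by part (ii). So $P^{-1}(y) = y$ by Notation 14.8.8. So $x = P^{-1}(y) = y \notin \mathrm{supp}(P)$. This
is a contradiction. So $y \in \mathrm{supp}(P)$. Thus $P(x) \in \mathrm{supp}(P)$.

For part (v), let $P \in \mathrm{perm}(X)$. Let $y \in \mathrm{supp}(P)$. Then $y \in \mathrm{supp}(P^{-1})$ by part (ii). So $P^{-1}(y) \in \mathrm{supp}(P^{-1})$
by part (iv). Hence $P^{-1}(y) \in \mathrm{supp}(P)$ by part (ii).

For part (vi), let $P \in \mathrm{perm}(X)$. Let $\tilde{P} = P|_{\mathrm{supp}(P)}$. Then $\mathrm{Range}(\tilde{P}) \subseteq \mathrm{supp}(P)$ by part (iv). Now
let $y \in \mathrm{supp}(P)$. Then $P^{-1}(y) \in \mathrm{supp}(P)$ by part (v). So $y \in \mathrm{Range}(\tilde{P})$. Thus $\mathrm{supp}(P) \subseteq \mathrm{Range}(\tilde{P})$.
Consequently $\mathrm{Range}(\tilde{P}) = \mathrm{supp}(P)$. But $\tilde{P}$ is injective because $P$ is injective. So $\tilde{P} : \mathrm{supp}(P) \to \mathrm{supp}(P)$ is
a bijection. Hence $P|_{\mathrm{supp}(P)} \in \mathrm{perm}(\mathrm{supp}(P))$.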


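For part (vii), let $P \in \mathrm{perm}(X)$. Then $P = P|_{\mathrm{supp}(P)} \cup P|_{X \setminus \mathrm{supp}(P)}$ because $X = \mathrm{supp}(P) \cup (X \setminus \mathrm{supp}(P))$
by Theorem 8.2.5 (xi). Therefore $P = P|_{\mathrm{supp}(P)} \cup \mathrm{id}_{X \setminus \mathrm{supp}(P)}$ by Notation 14.8.8.

For part (viii), let $Y \in \mathbb{P}(X)$ and $P \in \mathrm{perm}(Y)$. Let $\bar{P} = P \cup \mathrm{id}_{X \setminus Y}$. Then $\bar{P} : X \to X$ is a well-defined
function because $\mathrm{Dom}(P) \cap (X \setminus Y) = \emptyset$, and $P : Y \to Y$ and $\mathrm{id}_{X \setminus Y} : X \setminus Y \to X \setminus Y$ are well-defined
functions, and $X = Y \cup (X \setminus Y)$. Similarly, $P' = P^{-1} \cup \mathrm{id}_{X \setminus Y} : X \to X$ is a well-defined function because
$P^{-1} \in \mathrm{perm}(Y)$ by Theorem 14.8.5 (ii). But $P' \circ \bar{P} = \mathrm{id}_X = \bar{P} \circ P'$. In other words, $P' = \bar{P}^{-1}$. So
$\bar{P} = P \cup \mathrm{id}_{X \setminus Y} \in \mathrm{perm}(X)$.

For part (ix), let $P \in \mathrm{perm}(X)$ and $Y \in \mathbb{P}(X)$ satisfy $\mathrm{supp}(P) \subseteq Y$. Then $P|_{\mathrm{supp}(P)} \cup \mathrm{id}_{Y \setminus \mathrm{supp}(P)} \in \mathrm{perm}(Y)$
by part (viii) because $P|_{\mathrm{supp}(P)} \in \mathrm{perm}(\mathrm{supp}(P))$ by part (vi). But $P = P|_{\mathrm{supp}(P)} \cup \mathrm{id}_{X \setminus \mathrm{supp}(P)}$ by part (vii).
Therefore $P|_Y = (P|_{\mathrm{supp}(P)} \cup \mathrm{id}_{X \setminus \mathrm{supp}(P)})|_Y = P|_{\mathrm{supp}(P)} \cup \mathrm{id}_{Y \setminus \mathrm{supp}(P)}$ because $\mathrm{supp}(P) \subseteq Y \subseteq X$. Hence
$P|_Y \in \mathrm{perm}(Y)$.

For part (x), let $Y \in \mathbb{P}(X)$ and $P \in \mathrm{perm}(Y)$. Then $\bar{P} = P \cup \mathrm{id}_{X \setminus Y} \in \mathrm{perm}(X)$ by Theorem 14.8.5 (iv).
Let $x \in X$ satisfy $\bar{P}(x) \ne x$. Then $x \in Y$ because $\bar{P}(x) = x$ for all $x \in X \setminus Y$. So $x \in \mathrm{supp}(P)$.
Thus $\mathrm{supp}(\bar{P}) \subseteq \mathrm{supp}(P)$. But if $x \in \mathrm{supp}(P)$ then $\bar{P}(x) = P(x) \ne x$. So $x \in \mathrm{supp}(\bar{P})$. Therefore
$\mathrm{supp}(\bar{P}) \supseteq \mathrm{supp}(P)$. Hence $\mathrm{supp}(P \cup \mathrm{id}_{X \setminus Y}) = \mathrm{supp}(P)$.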
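
Several parts of Theorem 14.8.9 can be checked mechanically for permutations of a finite set. The Python sketch below is only an illustration. As an assumption of the sketch, permutations are modelled by dicts, and the helper names supp, restrict, extend, compose and inverse are chosen here and do not appear in the text.

```python
def compose(p, q):
    """The composite p∘q, which maps x to p(q(x))."""
    return {x: p[q[x]] for x in q}


def inverse(p):
    """The inverse of a bijection stored as a dict."""
    return {y: x for x, y in p.items()}


def extend(p, X):
    """The extension of p by the identity map on the rest of X."""
    return {x: p.get(x, x) for x in X}


def restrict(p, Y):
    """The restriction of p to the subset Y of its domain."""
    return {x: p[x] for x in Y}


def supp(p):
    """The support {x; p(x) != x} of a permutation."""
    return {x for x, y in p.items() if y != x}


X = set(range(6))
P = extend({0: 1, 1: 0}, X)   # swaps 0 and 1
Q = extend({1: 2, 2: 1}, X)   # swaps 1 and 2

print(supp(inverse(P)) == supp(P))                     # part (ii)
print(supp(compose(P, Q)) <= supp(P) | supp(Q))        # part (iii)
print(all(P[x] in supp(P) for x in supp(P)))           # part (iv)
print(extend(restrict(P, supp(P)), X) == P)            # part (vii)
print(supp(extend({0: 1, 1: 0}, X)) == supp({0: 1, 1: 0}))  # part (x)
```

The inclusion in part (iii) can be strict. For example, the composite of P with its own inverse is the identity map, which has empty support.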


14.8.10 DEFINITION: A finite permutation of a set $X$ is a bijection $f : X \to X$ with finite support.


14.8.11 NOTATION: $\mathrm{perm}_0(X)$, for a set $X$, denotes the set of finite permutations of $X$. In other words,

\begin{align*}
\mathrm{perm}_0(X) &= \{P \in \mathrm{perm}(X);\ \#(\mathrm{supp}(P)) < \infty\} \\
&= \{P \in \mathrm{perm}(X);\ \#\{x \in X;\ P(x) \ne x\} < \infty\} \\
&= \{P \in \mathrm{perm}(X);\ P \text{ is a finite permutation of } X\}.
\end{align*}


14.8.12 THEOREM: Closure conditions for some operations on sets of finite permutations. 
Let X be a set. 


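(i) $\mathrm{id}_X \in \mathrm{perm}_0(X)$.

(ii) $\forall P \in \mathrm{perm}_0(X)$, $P^{-1} \in \mathrm{perm}_0(X)$.

(iii) $\forall P, Q \in \mathrm{perm}_0(X)$, $P \circ Q \in \mathrm{perm}_0(X)$.

(iv) $\forall Y \in \mathbb{P}(X)$, $\forall P \in \mathrm{perm}_0(Y)$, $P \cup \mathrm{id}_{X \setminus Y} \in \mathrm{perm}_0(X)$.


PROOF: Part (i) follows from Theorem 14.8.9 (i) and Notation 14.8.11 because $\#(\emptyset) = 0 < \infty$.
Part (ii) follows from Theorem 14.8.9 (ii) and Notation 14.8.11.
Part (iii) follows from Theorem 14.8.9 (iii) and Notation 14.8.11 and Theorem 13.1.12.


For part (iv), let $Y \in \mathbb{P}(X)$ and $P \in \mathrm{perm}_0(Y)$. Then $P \cup \mathrm{id}_{X \setminus Y} \in \mathrm{perm}(X)$ by Theorem 14.8.5 (iv),
and then $\mathrm{supp}(P \cup \mathrm{id}_{X \setminus Y}) = \mathrm{supp}(P)$ by Theorem 14.8.9 (x). Therefore $\#(\mathrm{supp}(P \cup \mathrm{id}_{X \setminus Y})) < \infty$. Hence
$P \cup \mathrm{id}_{X \setminus Y} \in \mathrm{perm}_0(X)$ by Notation 14.8.11.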
14.8.13 REMARK: Transpositions are permutations which swap only two elements.

A permutation $f$ of a set $X$ is a transposition if and only if exactly two elements of the set are swapped; in
other words, $\#\{x \in X;\ f(x) \ne x\} = 2$. If the set $X$ in Definition 14.8.14 has fewer than two elements, there
are no transpositions on $X$ because the only permutation is the identity map. So clearly all transpositions
and the identity map are finite permutations.


Definition 14.8.14 is related to the “swap-item” list operation in Definition 14.12.6. 


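14.8.14 DEFINITION: A transposition of a set $X$ is a bijection $f : X \to X$ such that for some $i,j \in X$
with $i \ne j$,

\[
\forall x \in X,\quad f(x) =
\begin{cases}
j & \text{if } x = i \\
i & \text{if } x = j \\
x & \text{otherwise.}
\end{cases}
\]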


14.8.15 REMARK: Finite permutations are generated by transpositions. 

Theorem 14.8.16 says in essence that the set of all finite permutations is generated from the set of all 
transpositions. (See Example 17.6.7.) This is analogous to the way in which a basis spans a linear space. 
(See Definition 22.7.2.) It is also analogous to the way in which a topological base or subbase spans a 
topology. (See Sections 32.2 and 32.3.) The practical advantage of “spanning-sets” is that concepts may be 
defined and properties may be proved for them and then automatically extended to the whole space. This 
is done in Theorem 30.1.12 for symmetric multilinear functions for example, which uses Theorem 14.8.16. 


14.8.16 THEOREM: Existence of expressions for finite permutations as finite composites of transpositions. 
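Every finite permutation $f$ of a set $X$ may be expressed as $f = T_k \circ T_{k-1} \circ \ldots T_1$ for some sequence $(T_i)_{i=1}^k$
of transpositions of $X$ with $k \in \mathbb{Z}_0^+$.


PROOF: Let $A = \{x \in X;\ f(x) \ne x\}$. Then $n = \#(A) < \infty$ by Definition 14.8.10. Let $\phi : \mathbb{N}_n \to A$ be a
bijection, which exists because $A$ is finite. Let $P = f|_A$. If $n = 0$, then the assertion is valid with $k = 0$.
The case $n = 1$ is impossible. If $n = 2$, then $P = T_1$ with $T_1(\phi(1)) = \phi(2)$ and $T_1(\phi(2)) = \phi(1)$.

Now assume that the assertion is true for $n \le n_0$ for some $n_0 \in \mathbb{Z}^+ \setminus \{1\}$. Suppose that $\#(A) = n_0 + 1$.
(In the following, the elements of $A$ are identified with the elements of $\mathbb{N}_n$ via $\phi$.) Define $T : A \to A$ by

\[
\forall x \in A,\quad T(x) =
\begin{cases}
n & \text{if } x = P(n) \\
P(n) & \text{if } x = n \\
x & \text{otherwise,}
\end{cases}
\]

which is a transposition of $A$ by Definition 14.8.14 because $P(n) \ne n$ since $P(x) \ne x$ for all $x \in A$. Let
$P' = T \circ P$. Then

\[
\forall x \in A,\quad P'(x) = T(P(x)) =
\begin{cases}
n & \text{if } P(x) = P(n) \\
P(n) & \text{if } P(x) = n \\
P(x) & \text{otherwise}
\end{cases}
\;=\;
\begin{cases}
n & \text{if } x = n \\
P(n) & \text{if } x = P^{-1}(n) \\
P(x) & \text{otherwise.}
\end{cases}
\]

Let $A' = \{x \in A;\ P'(x) \ne x\}$. Then $\#(A') < n_0 + 1$ because $P'(n) = n$. So $P' = T_k \circ T_{k-1} \circ \ldots T_1$ for a
sequence $(T_i)_{i=1}^k$ of transpositions of $A'$ with $k \in \mathbb{Z}_0^+$. Therefore $P = T_{k+1} \circ T_k \circ \ldots T_1$ with $T_{k+1} = T$
because $T \circ T = \mathrm{id}_A$, which
verifies the assertion for $n \le n_0 + 1$. Hence by induction, the assertion follows for all finite permutations
of $X$.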
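
The inductive step of this proof suggests a simple procedure for finding a transposition sequence. Choose any moved element n, compose on the left with the transposition of n and P(n) so that n becomes fixed, and repeat until nothing is moved. The Python sketch below is only an illustration of that idea and is not claimed to be the most economical decomposition. The dict model of permutations and the names compose, transposition, decompose and recompose are assumptions made here.

```python
def compose(p, q):
    """The composite p∘q, which maps x to p(q(x))."""
    return {x: p[q[x]] for x in q}


def transposition(X, i, j):
    """The transposition of X which swaps i and j."""
    t = {x: x for x in X}
    t[i], t[j] = j, i
    return t


def decompose(p):
    """Return pairs [(i_1, j_1), ..., (i_k, j_k)] such that
    p = T_k∘...∘T_1, where T_r swaps i_r and j_r."""
    current = dict(p)
    found = []
    while True:
        moved = [x for x in current if current[x] != x]
        if not moved:
            break
        n = moved[0]
        pair = (n, current[n])
        found.append(pair)
        # T∘current fixes n and keeps all previously fixed elements fixed.
        current = compose(transposition(current, *pair), current)
    # The first transposition found is applied last, so reverse the list.
    found.reverse()
    return found


def recompose(X, pairs):
    """Compose the transpositions in order of application."""
    result = {x: x for x in X}
    for i, j in pairs:
        result = compose(transposition(X, i, j), result)
    return result


p = {1: 2, 2: 3, 3: 1, 4: 4}
pairs = decompose(p)
print(pairs)
print(recompose(p, pairs) == p)
```

Each pass through the loop strictly reduces the number of moved elements, which corresponds to the inequality #(A') < #(A) in the proof, so the loop terminates for any permutation of a finite set.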


14.8.17 REMARK: Odd and even permutations. 
Finite permutations may be classified as odd or even according to the number of transpositions which they 
are equivalent to. All finite permutations are either even or odd, but never both. 


14.8.18 DEFINITION: An even permutation of a set $X$ is a finite permutation $P$ of $X$ such that $P = T_k \circ
T_{k-1} \circ \ldots T_1$ for some sequence $(T_i)_{i=1}^k$ of transpositions of $X$, where $k$ is an even integer.


An odd permutation of a set $X$ is a finite permutation $P$ of $X$ such that $P = T_k \circ T_{k-1} \circ \ldots T_1$ for some
sequence $(T_i)_{i=1}^k$ of transpositions of $X$, where $k$ is an odd integer.
14.8.19 THEOREM: Some basic properties of evenness and oddness of finite permutations.
Let $X$ be a finite set with a total order “<”. Define $S : \mathrm{perm}(X) \to \mathbb{P}(X \times X)$, $N : \mathrm{perm}(X) \to \mathbb{Z}$ and
$\pi : \mathrm{perm}(X) \to \{-1, 1\}$ by

\begin{align*}
\forall P \in \mathrm{perm}(X),\quad & S(P) = \{(x,y) \in X \times X;\ x < y \text{ and } P(x) > P(y)\}, \\
\forall P \in \mathrm{perm}(X),\quad & N(P) = \#(S(P)), \\
\forall P \in \mathrm{perm}(X),\quad & \pi(P) = (-1)^{N(P)}.
\end{align*}

Then

(i) $N(P^{-1}) = N(P)$ for all permutations $P$ of $X$.
(ii) $\pi(P^{-1}) = \pi(P)$ for all permutations $P$ of $X$.
(iii) $\#(S(P \circ Q) \mathbin{\triangle} S(Q)) = \#(S(P))$ for all permutations $P$ and $Q$ of $X$. (See Definition 8.3.2 and
Notation 8.3.3 for the symmetric set-difference operator $\triangle$.)
(iv) $N(P \circ Q) = \#(S(P) \mathbin{\triangle} S(Q^{-1}))$ for all permutations $P$ and $Q$ of $X$.
(v) $\pi(P \circ Q) = \pi(P)\,\pi(Q)$ for all permutations $P$ and $Q$ of $X$.
(vi) $\pi(T) = -1$ for all transpositions $T$ of $X$.
(vii) $\pi(P) = 1$ for all even permutations $P$ of $X$ and $\pi(P) = -1$ for all odd permutations $P$ of $X$.


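PROOF: For part (i), let $P$ be a permutation of $X$. Then

\begin{align*}
S(P^{-1}) &= \{(x,y) \in X \times X;\ x < y \text{ and } P^{-1}(x) > P^{-1}(y)\} \\
&= \{(y,x) \in X \times X;\ x > y \text{ and } P^{-1}(x) < P^{-1}(y)\} \tag{14.8.1} \\
&= \{(P(y),P(x)) \in X \times X;\ P(x) > P(y) \text{ and } x < y\} \tag{14.8.2} \\
&= \{(P(y),P(x)) \in X \times X;\ x < y \text{ and } P(x) > P(y)\}.
\end{align*}

Line (14.8.1) follows by relabelling $(x,y)$ as $(y,x)$. Line (14.8.2) follows by replacing $(y,x)$ with
$(P(y),P(x))$, exploiting the assumption that $P$ is a bijection. Hence $S(P^{-1}) = f(S(P))$, where $f : X \times X \to
X \times X$ is the map $f : (x,y) \mapsto (P(y),P(x))$. But $f$ is a bijection. So $N(P^{-1}) = N(P)$.

Part (ii) follows from part (i) and the definition of $\pi$.

For part (iii), let $P$ and $Q$ be permutations of $X$. Then

\begin{align*}
S(P \circ Q) &= \{(x,y) \in X \times X;\ x < y \text{ and } P(Q(x)) > P(Q(y))\} \\
&= \{(Q^{-1}(x),Q^{-1}(y)) \in X \times X;\ Q^{-1}(x) < Q^{-1}(y) \text{ and } P(x) > P(y)\}
\end{align*}

and

\[
S(Q) = \{(Q^{-1}(x),Q^{-1}(y)) \in X \times X;\ Q^{-1}(x) < Q^{-1}(y) \text{ and } x > y\}.
\]

So

\[
S(P \circ Q) \setminus S(Q) = \{(Q^{-1}(x),Q^{-1}(y)) \in X \times X;\ Q^{-1}(x) < Q^{-1}(y) \text{ and } P(x) > P(y) \text{ and } x < y\}
\]

and

\begin{align*}
S(Q) \setminus S(P \circ Q) &= \{(Q^{-1}(x),Q^{-1}(y)) \in X \times X;\ Q^{-1}(x) < Q^{-1}(y) \text{ and } x > y \text{ and } P(x) < P(y)\} \\
&= \{(Q^{-1}(y),Q^{-1}(x)) \in X \times X;\ Q^{-1}(x) > Q^{-1}(y) \text{ and } x < y \text{ and } P(x) > P(y)\} \\
&= g(\{(Q^{-1}(x),Q^{-1}(y)) \in X \times X;\ Q^{-1}(x) > Q^{-1}(y) \text{ and } x < y \text{ and } P(x) > P(y)\}),
\end{align*}

where the bijection $g : X \times X \to X \times X$ is defined by $g : (x,y) \mapsto (y,x)$. Therefore

\begin{align*}
\#(S(P \circ Q) \mathbin{\triangle} S(Q)) &= \#(\{(Q^{-1}(x),Q^{-1}(y)) \in X \times X;\ x < y \text{ and } P(x) > P(y)\}) \\
&= \#(\{(x,y) \in X \times X;\ x < y \text{ and } P(x) > P(y)\}) \\
&= \#(S(P)).
\end{align*}

For part (iv), let $P$ and $Q$ be permutations of $X$. Then $\#(S(P) \mathbin{\triangle} S(Q^{-1})) = \#(S(P \circ Q))$ by replacing $Q$
with $Q^{-1}$ in part (iii) and then replacing $P$ with $P \circ Q$. Thus $N(P \circ Q) = \#(S(P) \mathbin{\triangle} S(Q^{-1}))$.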
For part (v), let $P$ and $Q$ be permutations of $X$. Note that $\#(A \mathbin{\triangle} B) = \#(A) + \#(B) - 2\#(A \cap B)$ for
any finite sets $A$ and $B$ by Theorem 13.6.9. Let $A = S(P)$ and $B = S(Q^{-1})$. Then $\#(S(P) \mathbin{\triangle} S(Q^{-1})) =
\#(S(P)) + \#(S(Q^{-1})) - 2\#(S(P) \cap S(Q^{-1})) = N(P) + N(Q) - 2\#(S(P) \cap S(Q^{-1}))$ by part (i). So
$N(P \circ Q) = N(P) + N(Q) - 2\#(S(P) \cap S(Q^{-1}))$ by part (iv). Hence $\pi(P \circ Q) = \pi(P)\,\pi(Q)$ because
$2\#(S(P) \cap S(Q^{-1}))$ is an even integer.

For part (vi), let $X$ be a finite set with total order “<”. Let $T$ be a transposition of $X$ with $T(i) = j$ and
$T(j) = i$ for $i,j \in X$ with $i < j$. Then $T(x) > T(y)$ for $x < y$ if and only if

\[
\big((i < x) \wedge (x < j) \wedge (y = j)\big) \vee \big((x = i) \wedge (i < y) \wedge (y < j)\big) \vee \big((x = i) \wedge (y = j)\big).
\]

So $N(T) = 2m + 1$, where $m = \#(\{x \in X;\ i < x \text{ and } x < j\})$. Therefore $\pi(T) = -1$.
Part (vii) follows from parts (v) and (vi) and Definition 14.8.18.
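
The assertions of Theorem 14.8.19 can be verified exhaustively for a small ordered set. The Python sketch below is only an illustration. It assumes X = {0, 1, 2, 3} with the usual order, models a permutation P by the tuple (P(0), ..., P(3)), and uses the names inversion_set, inversions and sign for the maps S, N and the sign map of the theorem. These names are chosen here and do not appear in the text.

```python
from itertools import permutations


def inversion_set(p):
    """S(P): the pairs (x, y) with x < y and P(x) > P(y)."""
    n = len(p)
    return {(x, y) for x in range(n) for y in range(x + 1, n) if p[x] > p[y]}


def inversions(p):
    """N(P): the number of elements of S(P)."""
    return len(inversion_set(p))


def sign(p):
    """(-1) to the power N(P)."""
    return (-1) ** inversions(p)


def compose(p, q):
    """The composite p∘q, which maps x to p(q(x))."""
    return tuple(p[q[x]] for x in range(len(q)))


def inverse(p):
    """The inverse permutation."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


perms = list(permutations(range(4)))

# Parts (i), (iii) and (v), checked for every permutation or pair of permutations.
print(all(inversions(inverse(p)) == inversions(p) for p in perms))
print(all(len(inversion_set(compose(p, q)) ^ inversion_set(q)) == inversions(p)
          for p in perms for q in perms))
print(all(sign(compose(p, q)) == sign(p) * sign(q) for p in perms for q in perms))

# Part (vi): the transposition of 0 and 3 has N(T) = 2m + 1 with m = 2.
T = (3, 1, 2, 0)
print(inversions(T), sign(T))
```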


14.8.20 THEOREM: Finite permutations are always either odd or even, but never both. 
Let $X$ be a set. Let $X^+ = \{P \in \mathrm{perm}(X);\ P \text{ is even}\}$ and $X^- = \{P \in \mathrm{perm}(X);\ P \text{ is odd}\}$. Then

(i) $X^+ \cap X^- = \emptyset$.
(ii) $X^+ \cup X^-$ equals the set of finite permutations of $X$.
(iii) $\{X^+, X^-\}$ is a partition of the set of finite permutations of $X$.


PROOF: For part (i), let $X$ be a set. Suppose $P \in X^+ \cap X^-$. Then $P = S_k \circ S_{k-1} \circ \ldots S_1$ for some
sequence $(S_i)_{i=1}^k$ of transpositions of $X$, where $k$ is an even integer, and $P = T_\ell \circ T_{\ell-1} \circ \ldots T_1$ for some
sequence $(T_i)_{i=1}^\ell$ of transpositions of $X$, where $\ell$ is an odd integer. So $\mathrm{id}_X = P \circ P^{-1} = U_m \circ U_{m-1} \circ \ldots U_1$
for a sequence $(U_i)_{i=1}^m$ of transpositions of $X$, where $m = k + \ell$ is an odd integer. Thus $\mathrm{id}_X \in X^+ \cap X^-$.

Let $Y = \{x \in X;\ \exists i \in \mathbb{N}_m,\ U_i(x) \ne x\}$. Then $Y$ is a finite set. Every finite set has a total order. So let “<”
be a total order on $Y$. Define $N(Q) = \#\{(x,y) \in Y \times Y;\ x < y \text{ and } Q(x) > Q(y)\}$ and $\pi(Q) = (-1)^{N(Q)}$ for
all permutations $Q$ of $X$ which satisfy $\forall x \in X \setminus Y$, $Q(x) = x$. Then by Theorem 14.8.19 (vii) applied to the
restrictions of such permutations to $Y$, $\pi(P) = 1$ and $\pi(P) = -1$, which is impossible. Hence $X^+ \cap X^- = \emptyset$.


Part (ii) may be proved by induction on $\#(X)$. The assertion is trivially true for $\#(X) = 1$. Suppose that the
assertion is true for sets $Y$ with $\#(Y) \le n \in \mathbb{N}$. Let $X$ be a set with $\#(X) = n + 1$. Let $P$ be a permutation
of $X$. Let $x \in X$. Define $Q : X \to X$ to equal $P$ if $P(x) = x$. Otherwise define $Q = T_{x,P(x)} \circ P$, where
$T_{x,P(x)}$ is the transposition of $x$ and $P(x)$. Then $Q(x) = x$, and so $Q|_{X \setminus \{x\}}$ is a permutation of $X \setminus \{x\}$.
But $\#(X \setminus \{x\}) = n$. So $Q|_{X \setminus \{x\}}$ is a product of an even or odd number of transpositions. Therefore $P$ is a
product of an even or odd number of transpositions. Hence $X^+ \cup X^- = \mathrm{perm}(X)$.
Part (iii) follows from parts (i) and (ii), and Definition 8.7.12.


14.8.21 REMARK: Partition of permutations into odd and even does not require an ordered set. 

Theorem 14.8.20 applies to all finite permutations on an arbitrary set $X$. Even though a total order is defined
in the proof of part (i), the assertion does not require any order to be given on $X$, or on any subset of $X$, and
the assertions of Theorem 14.8.20 are order-independent because Definition 14.8.18 is order-independent.
Thus the parity of finite permutations in Definition 14.8.22 is a well-defined function from the set $\mathrm{perm}_0(X)$
of all finite permutations on an arbitrary set $X$ to the set $\{-1, 1\}$.


14.8.22 DEFINITION: The parity function for a set X is the integer valued function on the set of finite 
permutations on X with value —1 for odd permutations and value 1 for even permutations. 


Alternative names: The parity of a permutation is also called the sign or the index of the permutation. 


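14.8.23 NOTATION: $\mathrm{parity} : \mathrm{perm}_0(X) \to \{-1, 1\}$, for a set $X$, denotes the parity function for $X$. In other
words, for any set $X$,

\[
\forall P \in \mathrm{perm}_0(X),\quad \mathrm{parity}(P) =
\begin{cases}
1 & \text{if } P \text{ is an even permutation of } X \\
-1 & \text{if } P \text{ is an odd permutation of } X.
\end{cases}
\]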


14.8.24 REMARK: Basic properties of the parity function for general sets. 

Theorem 14.8.25 applies to general sets $X$, whereas Theorem 14.8.19 applies to finite ordered sets. Note that
the function $N : \mathrm{perm}_0(X) \to \mathbb{Z}_0^+$ in Theorem 14.8.25 (iv) is not well-defined in general if $\mathrm{supp}(P) \times \mathrm{supp}(P)$
is replaced by $X \times X$. (Consider for example the real interval $X = [0,1]$ with $P : X \to X$ defined as the
transposition of 0 and 1. Let $x = 0$ and $y \in (0,1)$.)
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14.8.25 THEOREM: Some basic properties of the parity function. 
Let X be a set. 


(i) VY € P(X), VP € perm (Y), parity(P Uidx\y) = parity(P). 
(ii) VP € perm,(X), parity(P~+) = parity(P). 
(iii) VP,Q € permg(X), parity(P o Q) = parity(P) parity(Q). 
(iv) For any total order “<” on X, 


VP € perm (X), parity(P) = (-1)%(), 
where N : permo(X) — Zi is defined by 


VP € perm (X), 
N(P) = #({(a, y) € supp(P) x supp(P); z < y and P(x) > P(y)}). (14.8.3) 


(v) parity(idx) = 1. 


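PROOF: For part (i), let $Y \in \mathbb{P}(X)$ and $P \in \mathrm{perm}_0(Y)$. Let $\bar{P} = P \cup \mathrm{id}_{X \setminus Y}$. Then $\bar{P} \in \mathrm{perm}_0(X)$ by Theorem 14.8.12 (iv). So $\mathrm{parity}(\bar{P})$ is well defined by Definition 14.8.22. By Theorem 14.8.16, there is a sequence of transpositions $(T_i)_{i=1}^k$ of $Y$ with $k \in \mathbb{Z}_0^+$ such that $P = T_k \circ T_{k-1} \circ \ldots T_1$. Define the sequence $(\bar{T}_i)_{i=1}^k$ by $\bar{T}_i = T_i \cup \mathrm{id}_{X \setminus Y}$ for $i \in \mathbb{N}_k$. Then $\bar{T}_i \in \mathrm{perm}_0(X)$ for all $i \in \mathbb{N}_k$ by Theorem 14.8.12 (iv) because $T_i \in \mathrm{perm}_0(Y)$ for all $i \in \mathbb{N}_k$. But $\bar{P} = \bar{T}_k \circ \bar{T}_{k-1} \circ \ldots \bar{T}_1$. So $\mathrm{parity}(P \cup \mathrm{id}_{X \setminus Y}) = (-1)^k = \mathrm{parity}(P)$ by Definitions 14.8.22 and 14.8.18.

For part (ii), let $P \in \mathrm{perm}_0(X)$. Then $\mathrm{supp}(P)$ is a finite set by Notation 14.8.11, and $\mathrm{supp}(P^{-1}) = \mathrm{supp}(P)$ by Theorem 14.8.9 (ii). Let $Y = \mathrm{supp}(P)$ and $\tilde{P} = P|_Y$. Then $\tilde{P} \in \mathrm{perm}(Y)$ by Theorem 14.8.9 (vi). Therefore $\mathrm{parity}(\tilde{P}^{-1}) = \mathrm{parity}(\tilde{P})$ by Theorem 14.8.19 (ii, vii) and Notation 14.8.23. But $P = \tilde{P} \cup \mathrm{id}_{X \setminus Y}$ by Theorem 14.8.9 (vii), and $P^{-1} = \tilde{P}^{-1} \cup \mathrm{id}_{X \setminus Y}$. So $\mathrm{parity}(P^{-1}) = \mathrm{parity}(P)$ by part (i).

For part (iii), let $P, Q \in \mathrm{perm}_0(X)$. Then $\mathrm{supp}(P)$ and $\mathrm{supp}(Q)$ are finite sets by Notation 14.8.11, and $\mathrm{supp}(P \circ Q) \subseteq \mathrm{supp}(P) \cup \mathrm{supp}(Q)$ by Theorem 14.8.9 (ii). Let $Y = \mathrm{supp}(P) \cup \mathrm{supp}(Q)$, $\tilde{P} = P|_Y$, $\tilde{Q} = Q|_Y$ and $\tilde{R} = (P \circ Q)|_Y$. Then $\#(Y) < \infty$, and $\tilde{P}, \tilde{Q}, \tilde{R} \in \mathrm{perm}(Y)$ by Theorem 14.8.9 (ix). But $\tilde{R} = \tilde{P} \circ \tilde{Q}$. So $\mathrm{parity}(\tilde{R}) = \mathrm{parity}(\tilde{P})\,\mathrm{parity}(\tilde{Q})$ by Notation 14.8.23 and Theorem 14.8.19 (v, vii). Hence $\mathrm{parity}(P \circ Q) = \mathrm{parity}(P)\,\mathrm{parity}(Q)$ by part (i).

For part (iv), let “<” be a total order on $X$ and define $N$ as in line (14.8.3). Then $N : \mathrm{perm}_0(X) \to \mathbb{Z}_0^+$ is a well-defined function because the set $\{(x, y) \in \mathrm{supp}(P) \times \mathrm{supp}(P);\ x < y \text{ and } P(x) > P(y)\}$ is finite for any $P \in \mathrm{perm}_0(X)$ because $\#(\mathrm{supp}(P)) < \infty$. Let $P \in \mathrm{perm}_0(X)$. Let $Y = \mathrm{supp}(P)$. Let $\tilde{P} = P|_Y$. Then $Y$ is a finite set and $\tilde{P} \in \mathrm{perm}(Y)$. So $\mathrm{parity}(P) = (-1)^{N(P)}$ by Theorem 14.8.19 (vii) and Notation 14.8.23.

Part (v) follows from Notation 14.8.23 and Definition 14.8.18 because 0 is an even integer. Alternatively apply part (iii) with $P = Q = \mathrm{id}_X$.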
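
The inversion-count formula in Theorem 14.8.25 (iv) can be tried out numerically. The following Python sketch is only an illustration: the dictionary representation of a finite permutation (points not listed are fixed) and the names `support`, `inversion_count`, `parity` and `compose` are assumptions made here, not notation from this book.

```python
def support(p):
    """Points moved by the permutation p (a dict; unlisted points are fixed)."""
    return {x for x, y in p.items() if x != y}

def inversion_count(p):
    """N(P): number of pairs (x, y) in supp(P) x supp(P) with x < y and P(x) > P(y)."""
    s = sorted(support(p))
    return sum(1 for i, x in enumerate(s) for y in s[i + 1:] if p[x] > p[y])

def parity(p):
    """(-1)**N(P), following the formula in Theorem 14.8.25 (iv)."""
    return -1 if inversion_count(p) % 2 else 1

def compose(p, q):
    """P o Q: apply q first, then p. Points missing from a dict are fixed."""
    keys = set(p) | set(q)
    return {x: p.get(q.get(x, x), q.get(x, x)) for x in keys}

swap01 = {0.0: 1.0, 1.0: 0.0}   # transposition of 0 and 1 in the interval [0, 1]
cycle = {1: 2, 2: 3, 3: 1}      # a 3-cycle
swap12 = {1: 2, 2: 1}           # a transposition
identity = {}                   # the identity has empty support
```

Because only the support is scanned, the count stays finite even when the underlying set is the real interval [0, 1], which is the point of the restriction to $\mathrm{supp}(P) \times \mathrm{supp}(P)$.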


14.8.26 DEFINITION: The factorial function is the map $f : \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ defined inductively by:
(1) $f(0) = 1$,
(2) $\forall n \in \mathbb{Z}^+,\ f(n) = n f(n - 1)$.


14.8.27 NOTATION: n! denotes the value of the factorial function for argument n. 


14.8.28 REMARK: Multiplicative series expression for the factorial function. 
The factorial function in Definition 14.8.26 may be paraphrased as $n! = \prod_{i=1}^n i$, using Notation 14.4.18.


14.8.29 DEFINITION: The Jordan factorial function is the map $f : \mathbb{Z}_0^+ \times \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ defined inductively by
(1) $\forall n \in \mathbb{Z}_0^+,\ f(n, 0) = 1$,
(2) $\forall n, k \in \mathbb{Z}_0^+,\ f(n, k + 1) = (n - k) f(n, k)$.


14.8.30 NOTATION: $(n)_k$ denotes the value of the Jordan factorial function for argument $(n, k)$.

14.8.31 REMARK: Multiplicative series expression for the Jordan factorial function.
The Jordan factorial function may be expressed as $(n)_k = \prod_{i=0}^{k-1} (n - i)$. Thus

\[
\forall n, k \in \mathbb{Z}_0^+, \quad
(n)_k = \begin{cases} n! & \text{if } k = n \\ n!/(n-k)! & \text{if } k < n \\ 0 & \text{if } k > n. \end{cases}
\]


The Jordan factorial function is also defined and useful for general real or complex n. Notation 14.8.30 is 
potentially ambiguous in differential geometry, which is mired in parentheses and subscripts. Luckily it is 
rarely difficult to avoid using it in practice. 
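
The inductive rules in Definitions 14.8.26 and 14.8.29 translate directly into recursive code. The following Python sketch is an illustration only; the function names `factorial`, `jordan_factorial` and `jordan_closed_form` are chosen here and are not part of the text.

```python
def factorial(n):
    """Definition 14.8.26: f(0) = 1 and f(n) = n * f(n - 1) for n >= 1."""
    return 1 if n == 0 else n * factorial(n - 1)

def jordan_factorial(n, k):
    """Definition 14.8.29: f(n, 0) = 1 and f(n, k + 1) = (n - k) * f(n, k)."""
    return 1 if k == 0 else (n - (k - 1)) * jordan_factorial(n, k - 1)

def jordan_closed_form(n, k):
    """The case formula of Remark 14.8.31: n!/(n-k)! for k <= n, and 0 for k > n."""
    return 0 if k > n else factorial(n) // factorial(n - k)
```

For $k > n$ the recursion passes through the factor $n - n = 0$, which is why the value is 0 there without any special case in the inductive definition.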


The name “Jordan factorial” for the concept in Definition 14.8.29 is given by EDM2 [113], page 1233. It is called the “factorial power” by Whittle [162], page 59. According to Feller [70], page 29: “The notation $(n)_r$ is not standard”. Some authors use the notation ${}_nP_r$. (For example, Allendoerfer/Oakley [48], page 592.) The notations $n^{(r)}$ and ${}^nP_r$ are used by Hogben [94], pages 197-198.


14.8.32 REMARK: The cardinality of the set of permutations of a finite set. 
The number of permutations of a set with $n$ elements is $n!$ for $n \in \mathbb{Z}_0^+$.


14.8.33 THEOREM: The number of permutations of a finite set equals the factorial of its cardinality. 
$\forall n \in \mathbb{Z}_0^+,\ \#(\mathrm{perm}(\mathbb{N}_n)) = n!$.


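PROOF: Let $n = 0$. Then $\mathrm{perm}(\mathbb{N}_n) = \mathrm{perm}(\emptyset) = \{\emptyset\}$. So $\#(\mathrm{perm}(\mathbb{N}_n)) = \#(\{\emptyset\}) = 1 = 0! = n!$.

Now assume that the assertion is valid for some $n \in \mathbb{Z}_0^+$. Then $\#(\mathrm{perm}(\mathbb{N}_n)) = n!$. Define the map $\phi : \mathrm{perm}(\mathbb{N}_{n+1}) \to \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}$ by $\phi : P \mapsto ((T_{P(n+1),n+1} \circ P)|_{\mathbb{N}_n}, P(n+1))$ for all $P \in \mathrm{perm}(\mathbb{N}_{n+1})$. In other words, map each $P \in \mathrm{perm}(\mathbb{N}_{n+1})$ to the pair $\phi(P) = (P'|_{\mathbb{N}_n}, P(n+1))$, where $P' = T_{P(n+1),n+1} \circ P$ is obtained from $P$ by swapping the elements $P(n+1)$ and $n+1$ of $\mathbb{N}_{n+1}$ after $P$ has been applied. Then $P'(n+1) = T_{P(n+1),n+1}(P(n+1)) = n+1$. Therefore $P'|_{\mathbb{N}_n} \in \mathrm{perm}(\mathbb{N}_n)$ because $P' \in \mathrm{perm}(\mathbb{N}_{n+1})$. Thus $\phi : \mathrm{perm}(\mathbb{N}_{n+1}) \to \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}$ is a well-defined function.

Define $\psi : \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1} \to \mathrm{perm}(\mathbb{N}_{n+1})$ by $\psi(Q, j) = T_{j,n+1} \circ (Q \cup \{(n+1, n+1)\})$ for all $Q \in \mathrm{perm}(\mathbb{N}_n)$ and $j \in \mathbb{N}_{n+1}$. Then $\psi(Q, j) \in \mathrm{perm}(\mathbb{N}_{n+1})$ for all $Q \in \mathrm{perm}(\mathbb{N}_n)$ and $j \in \mathbb{N}_{n+1}$ because $T_{j,n+1}$ and $Q \cup \{(n+1, n+1)\}$ are both permutations of $\mathbb{N}_{n+1}$. So $\psi(\phi(P)) \in \mathrm{perm}(\mathbb{N}_{n+1})$ for all $P \in \mathrm{perm}(\mathbb{N}_{n+1})$. Summary:

\begin{align*}
\phi &: \mathrm{perm}(\mathbb{N}_{n+1}) \to \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1} \\
\psi &: \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1} \to \mathrm{perm}(\mathbb{N}_{n+1}) \\
\phi &: P \mapsto ((T_{P(n+1),n+1} \circ P)|_{\mathbb{N}_n}, P(n+1)) \\
\psi &: (Q, j) \mapsto T_{j,n+1} \circ (Q \cup \{(n+1, n+1)\}).
\end{align*}

Let $P \in \mathrm{perm}(\mathbb{N}_{n+1})$. Then $\phi(P) = (Q, j)$, where $Q = (T_{P(n+1),n+1} \circ P)|_{\mathbb{N}_n}$ and $j = P(n+1)$. So $\psi(\phi(P)) = \psi(Q, j)$. Therefore $Q \cup \{(n+1, n+1)\} = T_{P(n+1),n+1} \circ P = T_{j,n+1} \circ P$. So $\psi(Q, j) = T_{j,n+1} \circ T_{j,n+1} \circ P = P$. Thus $\psi(\phi(P)) = P$ for all $P \in \mathrm{perm}(\mathbb{N}_{n+1})$. In other words, $\psi \circ \phi = \mathrm{id}_{\mathrm{perm}(\mathbb{N}_{n+1})}$.

Let $(Q, j) \in \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}$. Let $P = \psi(Q, j)$. Then $P \in \mathrm{perm}(\mathbb{N}_{n+1})$. Therefore $\phi(\psi(Q, j)) = \phi(P) \in \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}$. Thus $\phi \circ \psi : \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1} \to \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}$ is a well-defined function. Since $P = T_{j,n+1} \circ (Q \cup \{(n+1, n+1)\})$, it follows that $P(n+1) = T_{j,n+1}(n+1) = j$ and

\begin{align*}
(T_{P(n+1),n+1} \circ P)|_{\mathbb{N}_n}
&= (T_{j,n+1} \circ (T_{j,n+1} \circ (Q \cup \{(n+1, n+1)\})))|_{\mathbb{N}_n} \\
&= (Q \cup \{(n+1, n+1)\})|_{\mathbb{N}_n} \\
&= Q.
\end{align*}

Thus $\phi(\psi(Q, j)) = (Q, j)$ for all $(Q, j) \in \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}$. In other words, $\phi \circ \psi = \mathrm{id}_{\mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}}$. Therefore $\psi$ is the inverse of $\phi$ by Theorem 10.5.14 (i). So $\phi : \mathrm{perm}(\mathbb{N}_{n+1}) \to \mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1}$ is a bijection by Theorem 10.5.14 (iv). Therefore $\#(\mathrm{perm}(\mathbb{N}_{n+1})) = \#(\mathrm{perm}(\mathbb{N}_n) \times \mathbb{N}_{n+1})$ by Definition 13.1.2 and Notation 13.1.5. So $\#(\mathrm{perm}(\mathbb{N}_{n+1})) = \#(\mathrm{perm}(\mathbb{N}_n)) \cdot \#(\mathbb{N}_{n+1}) = (n+1)\,n! = (n+1)!$ by Definition 14.8.26. Hence $\#(\mathrm{perm}(\mathbb{N}_n)) = n!$ for all $n \in \mathbb{Z}_0^+$ by induction on $n$.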
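
The two maps used in this proof can be checked by brute force for small $n$. In the following Python sketch, a permutation of $\mathbb{N}_m$ is represented as a tuple `p` with `p[i - 1]` equal to $P(i)$; this representation and the names `transpose`, `phi` and `psi` are assumptions made for the illustration.

```python
from itertools import permutations
from math import factorial

def transpose(a, b, x):
    """T_{a,b}(x): swap a and b, and fix every other point."""
    return b if x == a else a if x == b else x

def phi(p):
    """phi(P) = ((T_{P(n+1),n+1} o P) restricted to N_n, P(n+1)) for a tuple p of length n+1."""
    m = len(p)          # m = n + 1
    j = p[-1]           # j = P(n+1)
    q = tuple(transpose(j, m, x) for x in p[:-1])
    return q, j

def psi(q, j):
    """psi(Q, j) = T_{j,n+1} o (Q u {(n+1, n+1)}) for a tuple q of length n."""
    m = len(q) + 1
    return tuple(transpose(j, m, x) for x in q + (m,))
```

Enumerating `itertools.permutations(range(1, n + 2))` and applying `phi` then `psi` returns each permutation unchanged, and the set of images of `phi` has $(n+1)\,n!$ elements, in line with the induction step.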
14.8.34 DEFINITION: The Levi-Civita (alternating) symbol or Levi-Civita tensor (with $n$ indices) is the function $\epsilon : (\mathbb{N}_n \to \mathbb{N}_n) \to \{-1, 0, 1\}$, for some $n \in \mathbb{Z}_0^+$, defined by

\[
\epsilon(f) = \begin{cases} -1 & \text{if $f$ is an odd permutation of $\mathbb{N}_n$} \\ 0 & \text{if $f$ is not a permutation of $\mathbb{N}_n$} \\ +1 & \text{if $f$ is an even permutation of $\mathbb{N}_n$} \end{cases}
\]
for all $f : \mathbb{N}_n \to \mathbb{N}_n$.

14.8.35 NOTATION: $\epsilon_{i_1 \ldots i_n}$ for $n \in \mathbb{Z}_0^+$ denotes the value $\epsilon(i)$ of the Levi-Civita symbol for $i = (i_j)_{j=1}^n$.
$\epsilon^{i_1 \ldots i_n}$ is an alternative notation for $\epsilon_{i_1 \ldots i_n}$.


14.8.36 REMARK: The Levi-Civita alternating symbol is related to the parity function. 
See Definitions 14.8.2 and 14.8.18 for permutations. The Levi-Civita alternating symbol is an extension of the parity function in Definition 14.8.22 for sets $\mathbb{N}_n$.
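
As a minimal sketch of Definition 14.8.34, the following Python function evaluates the Levi-Civita symbol for a map $f : \mathbb{N}_n \to \mathbb{N}_n$ given as a tuple of values. It computes the parity by counting inversions, which agrees with the parity function by Theorem 14.8.25 (iv). The name `levi_civita` and the tuple representation are assumptions made here.

```python
def levi_civita(f):
    """epsilon(f) for f given as the tuple (f(1), ..., f(n)) with entries in 1..n."""
    n = len(f)
    if sorted(f) != list(range(1, n + 1)):
        return 0                      # f is not a permutation of N_n
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if f[a] > f[b])
    return -1 if inversions % 2 else 1
```

For example, `levi_civita((2, 1, 3))` is $-1$ and `levi_civita((1, 1, 3))` is $0$, since a repeated index means that the map is not a permutation.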


14.8.37 REMARK: Multi-index-list and multi-index-degree-tuple definitions and notations.

Let $n \in \mathbb{Z}_0^+$. There are two main ways to abbreviate monomial expressions which are products of $n$ variables $x_1, \ldots x_n$ or compositions of $n$ partial derivative operators $\partial_1, \ldots \partial_n$. (There are two opposite conventions for the order of application of partial derivatives, as mentioned in Remark 42.2.1.)

(1) Give a multi-index-list $I \in \mathrm{List}(\mathbb{N}_n) = \bigcup_{\ell=0}^\infty \mathbb{N}_n^\ell$.
Algebra example: $x_1 x_2 x_4 x_1 x_4 x_4 = x_I$ where $I = (1,2,4,1,4,4)$.
Calculus example: $\partial_1 \partial_2 \partial_4 \partial_1 \partial_4 \partial_4 = \partial_I$ where $I = (1,2,4,1,4,4)$ or $I = (4,4,1,4,2,1)$.
(2) Give a multi-index-degree-tuple $\alpha \in (\mathbb{Z}_0^+)^n$.
Algebra example: $x_1 x_2 x_4 x_1 x_4 x_4 = x_1^2 x_2^1 x_3^0 x_4^3 = x^\alpha$, where $\alpha = (2,1,0,3)$ with $n = 4$.
Calculus example: $\partial_1 \partial_2 \partial_4 \partial_1 \partial_4 \partial_4 = \partial_1^2 \partial_2^1 \partial_3^0 \partial_4^3 = \partial^\alpha$, where $\alpha = (2,1,0,3)$ with $n = 4$.


The multi-index-list abbreviation style in (1) has the advantage of greater generality and detail because each 
appearance of a variable or operator is listed individually in an explicit order. This style is also useful for 
notating the multiple indices of general tensors. It has the disadvantage of being less brief. 


The multi-index-degree-tuple abbreviation style in (2) has the advantage of brevity and fixed length when 
the number of variables is fixed, but it has the serious disadvantage that it assumes commutativity and 
associativity of the operation which is represented. This constraint is acceptable if $k$ partial derivatives are
acting on a $C^k$ function because the value of the expression is then order-independent by Theorem 42.3.6. But
this style of notation is then of no value for discussing scenarios where the order-independence is questionable 
and must be proved, or possibly disproved. 

The two styles are distinguished here by notating (1) as a subscript and (2) as a superscript. They will both 
be referred to in this book by the name “multi-index”, although it is case (2) which is given this name in 
most literature. The two styles of multi-indices are introduced in Definitions 14.8.38 and 14.8.39.


14.8.38 DEFINITION: Multi-index-list. A list of indices in a specific order.
A multi-index or multi-index-list is an element of $\mathrm{List}(\mathbb{N}_n) = \bigcup_{\ell=0}^\infty \mathbb{N}_n^\ell$, for some $n \in \mathbb{Z}_0^+$.

14.8.39 DEFINITION: Multi-index-degree-tuple. A fixed-length tuple of degrees for each variable.
A multi-index or multi-index-degree-tuple is an element of $(\mathbb{Z}_0^+)^k$ for some $k \in \mathbb{Z}_0^+$.

The addition operation is defined on multi-indices in $(\mathbb{Z}_0^+)^k$ by $\alpha + \beta = (\alpha_1 + \beta_1, \ldots \alpha_k + \beta_k)$, where $\alpha = (\alpha_1, \ldots \alpha_k)$ and $\beta = (\beta_1, \ldots \beta_k)$.

The (multi-index) length function is defined on $(\mathbb{Z}_0^+)^k$ as $|\alpha| = \sum_{i=1}^k \alpha_i$ for all $\alpha \in (\mathbb{Z}_0^+)^k$.

The (multi-index) factorial function is defined on $(\mathbb{Z}_0^+)^k$ by $\alpha! = \prod_{i=1}^k \alpha_i!$ for all $\alpha \in (\mathbb{Z}_0^+)^k$.
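
The relation between the two multi-index styles, and the three operations just defined, can be sketched in Python as follows. The function names (`degree_tuple`, `mi_add`, `mi_length`, `mi_factorial`) are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import factorial, prod

def degree_tuple(index_list, n):
    """Convert a multi-index-list over N_n to the multi-index-degree-tuple in (Z_0^+)^n."""
    counts = Counter(index_list)
    return tuple(counts[i] for i in range(1, n + 1))

def mi_add(a, b):
    """Componentwise sum alpha + beta of two degree tuples of equal length."""
    return tuple(x + y for x, y in zip(a, b))

def mi_length(a):
    """|alpha| = sum of the components of alpha."""
    return sum(a)

def mi_factorial(a):
    """alpha! = product of the factorials of the components of alpha."""
    return prod(factorial(x) for x in a)
```

Both lists $(1,2,4,1,4,4)$ and $(4,4,1,4,2,1)$ are sent to the same degree tuple $(2,1,0,3)$, which illustrates that the conversion discards the order of application.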


14.8.40 REMARK: Equality of inflow and outflow for finite permutations. 

Theorem 14.8.41 (i) is a kind of cardinality conservation equation for bijections, although it is little more than the definition of cardinality. Theorem 14.8.41 (ii) is a kind of flow-balance equation for permutations because it states that the number of elements of $S$ which “flow” into the complement $X \setminus S$ is equal to the number of elements of $X \setminus S$ which “flow” into $S$. The “flow function” $f$ may be thought of as a very minimal kind of evolution transform for $X$. (This “flow-balance equation” is useful for the proof of Theorem 14.11.7 (xiv).)

14.8.41 THEOREM: Cardinality conservation and flow-balance equations for finite permutations.
Let $X$ be a finite set.

(i) $\forall f \in \mathrm{perm}(X),\ \forall S \in \mathbb{P}(X),\ \#(f(S)) = \#(S)$.
(ii) $\forall f \in \mathrm{perm}(X),\ \forall S \in \mathbb{P}(X),\ \#(S \cap f(X \setminus S)) = \#(f(S) \cap (X \setminus S))$.


PROOF: Part (i) follows from Notation 13.1.5 because $f$ is a bijection.

For part (ii), let $S' = X \setminus S$. Then $\#(S) = \#((S \cap f(S)) \cup (S \setminus f(S))) = \#(S \cap f(S)) + \#(S \cap f(S'))$ because $S \cap f(S)$ and $S \setminus f(S) = S \cap f(S')$ are disjoint. Similarly, $\#(f(S)) = \#(f(S) \cap S) + \#(f(S) \cap S')$. Therefore

\begin{align*}
\#(S \cap f(S')) &= \#(S) - \#(S \cap f(S)) \\
&= \#(f(S) \cap S) + \#(f(S) \cap S') - \#(S \cap f(S)) \\
&= \#(f(S) \cap S')
\end{align*}

because $\#(S) = \#(f(S))$ by part (i).
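
The flow-balance equation in Theorem 14.8.41 (ii) is easy to test exhaustively on small sets. The following Python sketch does this, with a permutation represented as a dictionary; the names `flow_balance_holds` and `check_all` are assumptions made for the illustration.

```python
from itertools import combinations, permutations

def flow_balance_holds(f, subset):
    """Check #(S n f(X \\ S)) == #(f(S) n (X \\ S)) for a map f (dict on X) and S = subset."""
    x = set(f)
    s = set(subset)
    complement = x - s
    image_of_s = {f[e] for e in s}
    image_of_complement = {f[e] for e in complement}
    return len(s & image_of_complement) == len(image_of_s & complement)

def check_all(n):
    """Check the equation for every permutation of {0, ..., n-1} and every subset S."""
    points = list(range(n))
    return all(flow_balance_holds(dict(zip(points, p)), s)
               for p in permutations(points)
               for k in range(n + 1)
               for s in combinations(points, k))
```

The bijection hypothesis matters: for the non-injective map `{0: 0, 1: 0}` with $S = \{0\}$, one element flows into $S$ but none flows out, so the check fails.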


14.9. Combinations 


14.9.1 REMARK: Alternative names for combinations. 

A combination is called a “k-combination” by EDM2 [113], article 330, but it could also be referred to 
as an “unordered selection" or “unordered sample". (See Remark 14.10.1 for k-permutations and ordered 
selections.) Although these concepts are important in probability theory, they are even more important in 
analysis, particularly as coefficients of Taylor series. 


In Definition 14.9.2, $\mathbb{Z}_0^+$ is identified with $\omega$. Therefore $n = \{0, 1, \ldots n - 1\}$ for $n \in \mathbb{Z}_0^+$.


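14.9.2 DEFINITION: The combination symbol is the function $C : \mathbb{Z}_0^+ \times \mathbb{Z} \to \mathbb{Z}_0^+$ defined by

\[
\forall n \in \mathbb{Z}_0^+,\ \forall r \in \mathbb{Z}, \quad C(n, r) = \#\{x \in \mathbb{P}(n);\ \#(x) = r\}.
\]

14.9.3 NOTATION: $C^n_r$, for $n \in \mathbb{Z}_0^+$ and $r \in \mathbb{Z}$, denotes the value of the combination symbol $C(n, r)$.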


14.9.4 REMARK: Alternative notation for the combination function. 
Definition 14.9.2 means that $C^n_r$ is the number of distinct $r$-element subsets of an $n$-element set. The notation $\binom{n}{r}$ is a popular alternative to $C^n_r$. However, when combination symbols are mixed with other areas of mathematics, the notation $\binom{n}{r}$ tends to be ambiguous.


14.9.5 REMARK: Pascal’s triangle and the combination function. 

When the combination symbol values are arranged in a triangle, this is called “Pascal’s triangle”, although 
Pascal called it an “arithmetic triangle” or “triangle arithmétique”. Pascal’s triangle has so many interesting 
properties that whole books have been written about it. In particular, Blaise Pascal wrote a small treatise 
“Traité du triangle arithmétique” [229] in 1654, published posthumously in 1665. He wrote the following 
preface to the chapter on applications. 


DIVERS VSAGES DV TRIANGLE 
ARITHMETIQVE. 
Dont le generateur est l'Vnité 


Apres auoir donné les proportions qui se rencontrent entre les cellules & les rangs des Triangles 
Arithmetiques, ie passe à diuers vsages de ceux dont le generateur est l'vnité; c'est ce qu'on verra 
dans les traictez suiuans. Mais i'en laisse bien plus que ie n'en donne; c'est une chose estrange
combien il est fertile en proprietez, chacun peut s’y exercer; i'auertis seulement icy, que dans toute
la suite, ie n’entends parler que des Triangles Arithmetiques, dont le generateur est l'unité. 


This may be translated into English as follows.

Various uses of the arithmetic triangle with unit generator


After having given the proportions which are encountered between the cells and the rows of
Arithmetic Triangles, I pass to various uses of those of which the generator is unity; that is what
one will see in the following tracts. But I am leaving out many more of them than I am giving; it
is a strange thing how fertile in properties it is, everyone may exercise himself on it; I mention here
only that in all of the following, I intend only to talk of Arithmetic Triangles of which the generator
is unity.


The comment about fertile properties is more often translated as: “It is extraordinary how fertile in properties 
this triangle is. Everyone can try his hand.” The conclusion one may draw from this is that it is pointless 
to try to make a comprehensive list of properties of the combination symbol (i.e. Pascal’s triangle). 


Pascal’s stipulation that “the generator is unity” is equivalent to part (i) of Theorem 14.9.8. 


14.9.6 REMARK: History of Pascal’s triangle. 
Struik [249], page 74, notes that Pascal's triangle was published by Chinese mathematicians Yáng Huī (楊輝, about 1238-1298AD) and Zhū Shì-Jié (朱世傑, about 1260-1320AD) during the Sung dynasty (960-1279AD).


Yang Hui presents us with the earliest extant representation of the Pascal triangle, which we find
again in a book c. 1303 written by Zhū Shìjié (Chu Shi-chieh).


However, it seems these Chinese authors merely wrote out six or eight rows of the triangle without systematically investigating its properties as Pascal did.


Boyer/Merzbach [237], pages 254 and 269, mention that Petrus Apianus published a Pascal’s triangle in
Germany in 1527. They mention also a Pascal’s triangle published in Germany in 1544 by Michael Stifel. 
However, like the Chinese authors, these German authors did not investigate the properties as Pascal did. 


14.9.7 REMARK: Finite difference rule for Pascal’s triangle and the combination function. 
To establish that the “cells” of Pascal’s triangle are equal to the combination symbol in Definition 14.9.2, it is necessary to show that the combination symbol satisfies the induction rule in Theorem 14.9.8 (ii).

\[
\begin{array}{ccc}
C^n_{r-1} & & C^n_r \\
 & C^{n+1}_r &
\end{array}
\]

From this, it is possible to write various formulas such as $C^n_r = \prod_{i=0}^{r-1} (n - i)/(1 + i)$ in Theorem 16.7.4 (iii).

Such a formula requires a definition of division for rational or real numbers, even though the result of this product of ratios is an integer. This can be performed within the integers by noting that each partial product $\prod_{i=0}^{j-1} (n - i)/(1 + i)$ is equal to the integer $C^n_j$, so that the division by $1 + j$ at the next step is exact, but it is simpler to wait until the real numbers have been fully characterised in Section 15.9.


14.9.8 THEOREM: Initial values and “equation of motion” for Pascal’s triangle.
(i) $\forall r \in \mathbb{Z},\ C^0_r = \delta_{r0}$.
(ii) $\forall n \in \mathbb{Z}_0^+,\ \forall r \in \mathbb{Z},\ C^{n+1}_r = C^n_{r-1} + C^n_r$.


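PROOF: For part (i), let $r \in \mathbb{Z}$. Then $C^0_r = \#\{x \in \mathbb{P}(0);\ \#(x) = r\}$. But $\mathbb{P}(0) = \{\emptyset\}$ and $\#(\emptyset) = 0$. So $C^0_r = 1$ if $r = 0$, and $C^0_r = 0$ if $r \ne 0$. In other words, $C^0_r = \delta_{r0}$.

For part (ii), let $n \in \mathbb{Z}_0^+$ and $r \in \mathbb{Z}$. Then

\begin{align*}
C^{n+1}_r &= \#\{x \in \mathbb{P}(n+1);\ \#(x) = r\} \\
&= \#\{x \in \mathbb{P}(n+1);\ \#(x) = r \text{ and } n \in x\} + \#\{x \in \mathbb{P}(n+1);\ \#(x) = r \text{ and } n \notin x\} \\
&= \#\{x' \cup \{n\};\ x' \in \mathbb{P}(n) \text{ and } \#(x') = r - 1\} + \#\{x' \in \mathbb{P}(n);\ \#(x') = r\} \\
&= \#\{x' \in \mathbb{P}(n);\ \#(x') = r - 1\} + \#\{x' \in \mathbb{P}(n);\ \#(x') = r\} \\
&= C^n_{r-1} + C^n_r.
\end{align*}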
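
Definition 14.9.2 and the recurrence in Theorem 14.9.8 can be checked directly by counting subsets. In the following Python sketch, $n$ is identified with $\{0, \ldots n-1\}$ as in the text; the names `C` and `C_product` are assumptions, and `C_product` is a hypothetical integer-only evaluation of the product formula quoted in Remark 14.9.7, valid for $r \ge 0$.

```python
from itertools import combinations

def C(n, r):
    """C(n, r) = #{x in P(n); #(x) = r} for n >= 0 and any integer r."""
    if r < 0:
        return 0        # no subset has negative cardinality
    return sum(1 for _ in combinations(range(n), r))

def C_product(n, r):
    """Product of (n - i)/(1 + i) for i = 0..r-1, using exact integer division (r >= 0)."""
    value = 1
    for i in range(r):
        # Before this step value == C(n, i), and C(n, i) * (n - i) == (i + 1) * C(n, i + 1),
        # so the floor division below is exact.
        value = value * (n - i) // (1 + i)
    return value
```

Counting subsets is exponentially slow, so `C` is meant only as an executable restatement of the definition for small arguments.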
14.10. Ordered selections


14.10.1 REMARK: Alternative names for ordered selections. 
An ordered selection is also known as an “ordered sample”, a “k-permutation” or simply a “permutation”. 
(This is explained in Remark 14.8.1. See also Remark 14.9.1 for k-combinations and unordered selections.) 


14.10.2 REMARK: Notations for sets of strongly and weakly ordered selections. 

The notations $I^n_r$ and $J^n_r$ in Notation 14.10.3 are non-standard. (Federer [69], page 15, gives the notation $\Lambda(n, r)$ for $I^n_r$.) $I^n_r$ could be thought of as the set of strongly ordered selections of $r$ elements from $n$, and then $J^n_r$ would be the corresponding set of weakly ordered selections. $J^n_r$ corresponds to the set of possible independent partial derivatives of order $r$ of a $C^r$ function of $n$ variables. (See Feller [70], page 39.)


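14.10.3 NOTATION: Sets of increasing and non-decreasing selections.
$I^n_r$ for $n, r \in \mathbb{Z}_0^+$ denotes the set $\{f : \mathbb{N}_r \to \mathbb{N}_n;\ \forall j, k \in \mathbb{N}_r,\ j < k \Rightarrow f(j) < f(k)\}$.
$J^n_r$ for $n, r \in \mathbb{Z}_0^+$ denotes the set $\{f : \mathbb{N}_r \to \mathbb{N}_n;\ \forall j, k \in \mathbb{N}_r,\ j < k \Rightarrow f(j) \le f(k)\}$.
In other words, $I^n_r = \mathrm{Inc}(\mathbb{N}_r, \mathbb{N}_n)$ and $J^n_r = \mathrm{NonDec}(\mathbb{N}_r, \mathbb{N}_n)$. (See Notation 11.1.32 for Inc and NonDec.)

14.10.4 REMARK: Special cases of ordered selections.
It is easily seen that $I^n_r = \emptyset$ if $r > n$, and $I^n_r = \{\mathrm{id}_{\mathbb{N}_n}\}$ if $r = n$.

14.10.5 THEOREM: Cardinality of sets of increasing and non-decreasing selections.
(i) $\forall n \in \mathbb{Z}_0^+,\ \forall r \in \mathbb{Z}_0^+,\ \#(I^n_r) = C^n_r$.
(ii) $\forall n \in \mathbb{Z}^+,\ \forall r \in \mathbb{Z}_0^+,\ \#(J^n_r) = C^{n+r-1}_r$.
(iii) $\forall r \in \mathbb{Z}^+,\ \#(J^0_r) = C^{r-1}_r$.
(iv) $\forall n \in \mathbb{Z}_0^+,\ \forall r \in \mathbb{Z}^+,\ \#(J^n_r) = C^{n+r-1}_r$.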
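PROOF: For part (i), let $n, r \in \mathbb{Z}_0^+$. Then for all $x \in \mathbb{P}(\mathbb{N}_n)$ such that $\#(x) = r$, define $\phi(x)$ to be the standard enumeration of $x$ from $r \in \omega$ to $x$, as described in Definition 12.4.5. Then $\phi(x) : r \to x$ is an increasing bijection. Define $\phi'(x) : \mathbb{N}_r \to x$ by $\phi'(x) : i \mapsto \phi(x)(i - 1)$. (This converts the start-at-zero domain $r$ to the start-at-one domain $\mathbb{N}_r$.) Then $\phi'(x) \in I^n_r$. Since $\phi' : \{x \in \mathbb{P}(\mathbb{N}_n);\ \#(x) = r\} \to I^n_r$ is a bijection, it follows from Definition 14.9.2 that $\#(I^n_r) = C^n_r$.

For part (ii), let $n = 1$ and $r \in \mathbb{Z}_0^+$. Then $J^1_r = \{\phi\}$, where $\phi : \mathbb{N}_r \to \mathbb{N}_1$ is defined by $\phi : i \mapsto 1$. So $\#(J^1_r) = 1 = C^{1+r-1}_r$. Now assume the validity of the assertion for some $n \in \mathbb{Z}^+$, for all $r \in \mathbb{Z}_0^+$. Then

\begin{align*}
\forall r \in \mathbb{Z}^+, \quad
\#(J^{n+1}_r) &= \#\{f \in J^{n+1}_r;\ f(r) \le n\} + \#\{f \in J^{n+1}_r;\ f(r) = n + 1\} \\
&= \#(J^n_r) + \#(J^{n+1}_{r-1}) \\
&= C^{n+r-1}_r + C^{n+r-1}_{r-1} \\
&= C^{(n+1)+r-1}_r
\end{align*}

by Theorem 14.9.8 (ii). Here the second term is evaluated by induction on $r$, starting from $\#(J^{n+1}_0) = \#(\{\emptyset\}) = 1 = C^n_0$. Hence the general assertion follows by induction on $n$.

For part (iii), let $r \in \mathbb{Z}^+$. Then $J^0_r = \emptyset$, because there are no functions from a non-empty set to the empty set. So $\#(J^0_r) = 0 = C^{r-1}_r$ by Definition 14.9.2.

Part (iv) follows from parts (ii) and (iii).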
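
The cardinalities in Theorem 14.10.5 can be confirmed by enumeration for small $n$ and $r$. The sketch below relies on the fact that `itertools.combinations` and `itertools.combinations_with_replacement`, applied to a sorted input, yield exactly the increasing and the non-decreasing tuples respectively. The function names are assumptions made here, and each selection $f$ is represented by the tuple $(f(1), \ldots f(r))$.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def increasing_selections(n, r):
    """I^n_r: increasing maps from N_r to N_n, as tuples (f(1), ..., f(r))."""
    return list(combinations(range(1, n + 1), r))

def nondecreasing_selections(n, r):
    """J^n_r: non-decreasing maps from N_r to N_n, as tuples (f(1), ..., f(r))."""
    return list(combinations_with_replacement(range(1, n + 1), r))
```

For instance, `nondecreasing_selections(2, 2)` is `[(1, 1), (1, 2), (2, 2)]`, which has $C^{2+2-1}_2 = 3$ elements.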


14.10.6 REMARK: Applications of ordered selections. 

Sets of increasing and non-decreasing sequences of indices $(i_1, \ldots i_r)$ selected from a complete set $(1, \ldots n)$ are frequently encountered in tensor algebra. One often wishes to sum an expression for index sequence values which have a unique representative for each subset of distinct index values.

It is perhaps noteworthy that Theorem 14.10.5 is equivalent to the following assertions.

\begin{align*}
\forall n, r \in \mathbb{Z}_0^+, \quad \#(I^n_r) &= \prod_{i=0}^{r-1} \frac{n - i}{1 + i} \\
\forall n, r \in \mathbb{Z}_0^+, \quad \#(J^n_r) &= \prod_{i=0}^{r-1} \frac{n + i}{1 + i}
\end{align*}

These are more similar to each other than is suggested by parts (i) and (ii) of Theorem 14.10.5.

14.10.7 NOTATION: $f_\alpha$ for a function $f : \mathbb{N}_n \to A$ and $\alpha \in I^n_m$ for any set $A$ means the sequence $f \circ \alpha = (f(\alpha(j)))_{j=1}^m = (f_{\alpha_j})_{j=1}^m = (f_{\alpha_1}, \ldots f_{\alpha_m}) : \mathbb{N}_m \to A$.

14.10.8 NOTATION: $f^\alpha$ for a function $f : \mathbb{N}_n \to A$ and $\alpha \in I^n_m$, for any set $A$ means the sequence $f \circ \alpha = (f(\alpha(j)))_{j=1}^m = (f^{\alpha_j})_{j=1}^m = (f^{\alpha_1}, \ldots f^{\alpha_m}) : \mathbb{N}_m \to A$.


14.10.9 REMARK: Notations for ordered selections. 

Notations 14.10.7 and 14.10.8 give alternatives $f_\alpha$ and $f^\alpha$ for writing $f \circ \alpha$. Notation 14.10.8 may be useful
in contexts where raised indices are a mnemonic for contravariance, whereas lowered indices are a mnemonic 
for covariance. 


The set A in Notations 14.10.7 and 14.10.8 is typically the field of numbers which is used for coordinates 
and coefficients of vectors or a set of vectors in a linear space. The function f is then a tuple of coordinates 
and $\alpha$ is an ordered selection from the index set $\mathbb{N}_n$.


14.10.10 REMARK: Alternative expressions for sets of strongly and weakly ordered selections. 
Using Notation 11.1.32, one may write $I^n_r = \mathrm{Inc}(\mathbb{N}_r, \mathbb{N}_n)$ and $J^n_r = \mathrm{NonDec}(\mathbb{N}_r, \mathbb{N}_n)$.


14.11. Rearrangements for sorting ordered selections 


14.11.1 REMARK: How to construct a permutation to sort a finite sequence. 

In the theory of antisymmetric and symmetric tensors, it is often necessary to be able to sort a given sequence 
of indices into increasing or non-decreasing order respectively. Theorem 14.11.2 gives some basic properties 
of a simple construction for a permutation which may be applied (on the right) to a finite sequence to map 
it to a suitable ordered sequence. This construction is applied in Theorem 14.11.7. 


The permutation constructed in Theorem 14.11.2 has the property that if a value is repeated in a sequence, 
then the permutation maintains the order of the indices which have those values. The method of construction 
is similar to the algorithm in computer science known as a “selection sort”. (See for example Orwant/ 
Hietaniemi/Macdonald [498], pages 120-122.) 


14.11.2 THEOREM: Permutations for sorting finite sequences. 
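Let $S$ be a totally ordered set. Let $r \in \mathbb{Z}_0^+$. Let $g : \mathbb{N}_r \to S$. Define $P : \mathbb{N}_r \to \mathbb{N}_r$ by

\begin{align*}
\forall i \in \mathbb{N}_r, \quad P(i) &= \#\{j \in \mathbb{N}_r;\ g(j) < g(i) \text{ or } (g(j) = g(i) \text{ and } j \le i)\} \\
&= \#(J_i),
\end{align*}

where $J_i = \{j \in \mathbb{N}_r;\ g(j) < g(i) \text{ or } (g(j) = g(i) \text{ and } j \le i)\}$ for all $i \in \mathbb{N}_r$.
(i) $\forall i \in \mathbb{N}_r,\ i \in J_i$.
(ii) $\forall i, i' \in \mathbb{N}_r,\ g(i) < g(i') \Rightarrow J_i \subsetneq J_{i'}$.
(iii) $\forall i, i' \in \mathbb{N}_r,\ (g(i) \le g(i') \text{ and } i < i') \Rightarrow J_i \subsetneq J_{i'}$.
(iv) $P \in \mathrm{perm}(\mathbb{N}_r)$.
(v) $g \circ P^{-1} \in \mathrm{NonDec}(\mathbb{N}_r, S)$.
(vi) If $g \in \mathrm{Inj}(\mathbb{N}_r, S)$, then $g \circ P^{-1} \in \mathrm{Inc}(\mathbb{N}_r, S)$.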

PROOF: For part (i), let $i \in \mathbb{N}_r$. Let $j = i$. Then $g(j) = g(i)$ and $j \le i$. So $i \in J_i$.

For part (ii), let $i, i' \in \mathbb{N}_r$ with $g(i) < g(i')$. Let $j \in J_i$. Then $j \in \mathbb{N}_r$, and either $g(j) < g(i)$ or else $g(j) = g(i)$ and $j \le i$. If $g(j) < g(i)$ then $g(j) < g(i')$ and so $j \in J_{i'}$. Also, if $g(j) = g(i)$ then $g(j) < g(i')$ and so $j \in J_{i'}$. Therefore $J_i \subseteq J_{i'}$. By part (i), $i' \in J_{i'}$. Suppose that $i' \in J_i$. Then either $g(i') < g(i)$ or else $g(i') = g(i)$ and $i' \le i$. These alternatives both contradict the assumption $g(i) < g(i')$. So $i' \notin J_i$. Hence $J_i \subsetneq J_{i'}$.

For part (iii), let $i, i' \in \mathbb{N}_r$ with $g(i) \le g(i')$ and $i < i'$. Let $j \in J_i$. Then $j \in \mathbb{N}_r$, and either $g(j) < g(i)$ or else $g(j) = g(i)$ and $j \le i$. If $g(j) < g(i)$ then $g(j) < g(i')$ and so $j \in J_{i'}$. If $g(j) = g(i)$ and $j \le i$, then $g(j) \le g(i')$ and $j < i'$, and so $j \in J_{i'}$. Therefore $J_i \subseteq J_{i'}$. By part (i), $i' \in J_{i'}$. Suppose that $i' \in J_i$. Then either $g(i') < g(i)$ or else $g(i') = g(i)$ and $i' \le i$. The alternative $g(i') < g(i)$ contradicts the assumption $g(i) \le g(i')$. The alternative $g(i') = g(i)$ and $i' \le i$ contradicts the assumption $i < i'$. So $i' \notin J_i$. Hence $J_i \subsetneq J_{i'}$.

For part (iv), if $r = 0$ then $\mathbb{N}_r = \emptyset$ and $P = \emptyset$, which implies $P \in \{\emptyset\} = \mathrm{perm}(\mathbb{N}_r)$. So assume $r \in \mathbb{Z}^+$. Let $i \in \mathbb{N}_r$. Then $J_i \ne \emptyset$ by part (i). So $P(i) \in \mathbb{N}_r$. Thus $P$ is a well-defined function from $\mathbb{N}_r$ to $\mathbb{N}_r$.

To show that $P$ is injective, let $i, i' \in \mathbb{N}_r$ with $i < i'$. If $g(i) \le g(i')$, then $P(i) < P(i')$ by part (iii). If $g(i) > g(i')$, then $P(i) > P(i')$ by part (ii). Similarly $P(i) \ne P(i')$ if $i > i'$. So $\forall i, i' \in \mathbb{N}_r,\ i \ne i' \Rightarrow P(i) \ne P(i')$. Thus $P$ is injective. So $P : \mathbb{N}_r \to \mathbb{N}_r$ is a bijection by Theorem 12.4.13 (i). Hence $P \in \mathrm{perm}(\mathbb{N}_r)$.

For part (v), let $k, k' \in \mathbb{N}_r$ with $k < k'$. Suppose that $g(P^{-1}(k)) > g(P^{-1}(k'))$. Let $i = P^{-1}(k)$ and $i' = P^{-1}(k')$. Then $g(i) > g(i')$ and $P(i) = k < k' = P(i')$. But by part (ii), $g(i) > g(i')$ implies that $P(i) = \#(J_i) > \#(J_{i'}) = P(i')$, which is a contradiction. So $g(P^{-1}(k)) \le g(P^{-1}(k'))$. Thus $g \circ P^{-1}$ is non-decreasing. In other words, $g \circ P^{-1} \in \mathrm{NonDec}(\mathbb{N}_r, S)$ by Notation 11.1.32.

For part (vi), let $g \in \mathrm{Inj}(\mathbb{N}_r, S)$. By part (v), $g \circ P^{-1} \in \mathrm{NonDec}(\mathbb{N}_r, S)$. Suppose that $g \circ P^{-1} \notin \mathrm{Inc}(\mathbb{N}_r, S)$. Then $g(P^{-1}(k)) = g(P^{-1}(k'))$ for some $k, k' \in \mathbb{N}_r$ with $k < k'$. Let $i = P^{-1}(k)$ and $i' = P^{-1}(k')$. Then $g(i) = g(i')$. So $i = i'$ because $g \in \mathrm{Inj}(\mathbb{N}_r, S)$. So $k = P(i) = P(i') = k'$, which is a contradiction. Hence $g \circ P^{-1} \in \mathrm{Inc}(\mathbb{N}_r, S)$.


14.11.3 DEFINITION: The standard sorting-permutation for a finite sequence $g : \mathbb{N}_r \to S$, for $r \in \mathbb{Z}_0^+$ and a totally ordered set $S$, is the permutation $P_g \in \mathrm{perm}(\mathbb{N}_r)$ defined by

\[
\forall i \in \mathbb{N}_r, \quad P_g(i) = \#\{j \in \mathbb{N}_r;\ g(j) < g(i) \text{ or } (g(j) = g(i) \text{ and } j \le i)\}.
\]

The sorted rearrangement of a finite sequence $g : \mathbb{N}_r \to S$, for $r \in \mathbb{Z}_0^+$ and a totally ordered set $S$, is the function $g \circ P_g^{-1} : \mathbb{N}_r \to S$, where $P_g$ is the standard sorting-permutation for $g$.


14.11.4 REMARK: Basic properties of sorting-permutations. 

The target space S in Definition 14.11.3 is a completely general totally ordered set, but for differential 
geometry, the most useful case is S = Nn, which is assumed for Theorem 14.11.5 and for the proof of 
Theorem 14.11.7 (iv, xii). 


The immediate motivation for Definition 14.11.3 is shown in Theorem 14.11.5 (i, ii, iii), where it is shown that $P_g$ is a permutation of $\mathbb{N}_r$ which “normalises” the map $g \in \mathbb{N}_n^r$, to be an ordered selection $g \circ P_g^{-1}$ which is non-decreasing in general, and increasing in case $g$ is injective.

Theorem 14.11.5 (iv) gives a special form for $P_g$ when $g$ is injective. The purpose of the extra condition “and $j \le i$” in Definition 14.11.3 is to sort indices $i, j \in \mathbb{N}_r$ for which $j \ne i$ and $g(j) = g(i)$. Then $P_g$ maps each such $i$ to a different value.


Consider for example the function $g = \{(1, 5), (2, 4), (3, 4)\}$ for $r = 3$ and $n = 5$. Then

\begin{align*}
P_g(1) &= \#\{j \in \mathbb{N}_r;\ g(j) < 5 \text{ or } (g(j) = 5 \text{ and } j \le 1)\} = 3, \\
P_g(2) &= \#\{j \in \mathbb{N}_r;\ g(j) < 4 \text{ or } (g(j) = 4 \text{ and } j \le 2)\} = 1, \\
\text{and} \quad P_g(3) &= \#\{j \in \mathbb{N}_r;\ g(j) < 4 \text{ or } (g(j) = 4 \text{ and } j \le 3)\} = 2.
\end{align*}

Thus $P_g = \{(1,3), (2,1), (3,2)\} \in \mathrm{perm}(\mathbb{N}_3)$. The condition “and $j \le i$” ensures that $i = 2$ and $i = 3$ are mapped to different values even though $g(2) = g(3)$. This “symmetry-breaker” is not required if $g$ is injective. When $P_g$ is applied to $g$, the result is $g \circ P_g^{-1} = \{(1,4), (2,4), (3,5)\} \in J^5_3$, which is non-decreasing. This result $g \circ P_g^{-1}$ is independent of whether the condition “and $j \le i$” or “and $j \ge i$” is used, but the permutation $P_g$ does depend on the choice of this condition.
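
The worked example for $g = (5, 4, 4)$ can be reproduced with a direct transcription of Definition 14.11.3. In this Python sketch, $g$ is given as the tuple $(g(1), \ldots g(r))$ and $P_g$ is returned as a dictionary on $\{1, \ldots r\}$; the names `sorting_permutation` and `sorted_rearrangement` are assumptions made for the illustration.

```python
def sorting_permutation(g):
    """P_g(i) = #{j; g(j) < g(i) or (g(j) = g(i) and j <= i)} for i in 1..r."""
    r = len(g)
    return {i: sum(1 for j in range(1, r + 1)
                   if g[j - 1] < g[i - 1] or (g[j - 1] == g[i - 1] and j <= i))
            for i in range(1, r + 1)}

def sorted_rearrangement(g):
    """The sequence g o P_g^{-1} as a tuple."""
    p = sorting_permutation(g)
    inverse = {k: i for i, k in p.items()}
    return tuple(g[inverse[k] - 1] for k in range(1, len(g) + 1))

example = (5, 4, 4)
p_example = sorting_permutation(example)    # {1: 3, 2: 1, 3: 2}
sorted_example = sorted_rearrangement(example)   # (4, 4, 5)
```

The tie-break “and $j \le i$” makes the rearrangement stable, so that `sorted_rearrangement(g)` agrees with Python's built-in `sorted(g)`, which is also a stable sort.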

To be specific, $P_g$ is chosen so that $P_g(i_1) < P_g(i_2)$ whenever $i_1 < i_2$ and $g(i_1) = g(i_2)$. In other words, $P_g$ is chosen to be increasing on the inverse image of any element of the range of $g$. Consequently, $P_g$ has an artificial dependency on the choice of a total order for the domain $\mathbb{N}_r$ of $g$. If this order is reversed, for example, the order of $P_g$ on inverse images of values of $g$ will be reversed. This becomes significant when the domain of $g$ is permuted as it is in Theorem 14.11.5 (v), where $g$ is replaced by $g \circ Q$. This is why the assertion is restricted to injective maps $g$.

Theorem 14.11.5 (iv) shows that the condition “and $j \le i$” in Definition 14.11.3 has no effect on the sorting rule, which is not surprising if $g$ is injective.

Theorem 14.11.5 (v) is applicable to the proof of Theorem 56.5.12. 


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




14.11.5 THEOREM: Some basic properties of standard sorting-permutations. 

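For $n, r \in \mathbb{Z}_0^+$ and $g \in \mathbb{N}_n^r$, let $P_g$ be the standard sorting-permutation for the totally ordered set $S = \mathbb{N}_n$.
(i) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in \mathbb{N}_n^r,\ P_g \in \mathrm{perm}(\mathbb{N}_r)$.
(ii) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in \mathbb{N}_n^r,\ g \circ P_g^{-1} \in J^n_r$.
(iii) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n),\ g \circ P_g^{-1} \in I^n_r$.
(iv) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n),\ \forall i \in \mathbb{N}_r,\ P_g(i) = \#\{j \in \mathbb{N}_r;\ g(j) \le g(i)\}$.
(v) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n),\ \forall Q \in \mathrm{perm}(\mathbb{N}_r),\ P_{g \circ Q} = P_g \circ Q$.
(vi) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n),\ \forall Q \in \mathrm{perm}(\mathbb{N}_r),\ (g \circ Q) \circ P_{g \circ Q}^{-1} = g \circ P_g^{-1}$.
(vii) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in J^n_r,\ P_g = \mathrm{id}_{\mathbb{N}_r}$.
(viii) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in J^n_r,\ g \circ P_g^{-1} = g$.
(ix) $\forall n, r \in \mathbb{Z}_0^+,\ \forall g \in I^n_r,\ g \circ P_g^{-1} = g$.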


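PROOF: For part (i), let $n, r \in \mathbb{Z}_0^+$ and $g \in \mathbb{N}_n^r$. Let $S = \mathbb{N}_n$. Then $P_g \in \mathrm{perm}(\mathbb{N}_r)$ by Theorem 14.11.2 (iv).

For part (ii), let $n, r \in \mathbb{Z}_0^+$ and $g \in \mathbb{N}_n^r$. Let $S = \mathbb{N}_n$. Then $g \circ P_g^{-1} \in J^n_r$ by Theorem 14.11.2 (v) and Notation 14.10.3.

For part (iii), let $n, r \in \mathbb{Z}_0^+$ and $g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Let $S = \mathbb{N}_n$. Then $g \circ P_g^{-1} \in I^n_r$ by Theorem 14.11.2 (vi) and Notation 14.10.3.

For part (iv), let $n, r \in \mathbb{Z}_0^+$ and $g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Then $g(j) = g(i)$ implies $j = i$. Therefore the proposition “$g(j) = g(i)$ and $j \le i$” is true if and only if $g(j) = g(i)$. Hence $P_g(i) = \#\{j \in \mathbb{N}_r;\ g(j) \le g(i)\}$ for all $i \in \mathbb{N}_r$.

For part (v), let $n, r \in \mathbb{Z}_0^+$ and $g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Let $Q \in \mathrm{perm}(\mathbb{N}_r)$. Then by part (iv),

\begin{align*}
\forall i \in \mathbb{N}_r, \quad P_{g \circ Q}(i) &= \#\{j \in \mathbb{N}_r;\ g(Q(j)) \le g(Q(i))\} \\
&= \#\{k \in \mathbb{N}_r;\ g(k) \le g(Q(i))\} \tag{14.11.1} \\
&= P_g(Q(i)),
\end{align*}

where the substitution $j = Q^{-1}(k)$ is made in line (14.11.1) for $k \in \mathbb{N}_r$, which exists and is unique because $Q \in \mathrm{perm}(\mathbb{N}_r)$. Hence $P_{g \circ Q} = P_g \circ Q$.

For part (vi), $(g \circ Q) \circ P_{g \circ Q}^{-1} = g \circ Q \circ (P_g \circ Q)^{-1} = g \circ Q \circ Q^{-1} \circ P_g^{-1} = g \circ P_g^{-1}$ by part (v).

For part (vii), let $n, r \in \mathbb{Z}_0^+$ and $g \in J^n_r$. Then the proposition “$g(j) < g(i)$ or ($g(j) = g(i)$ and $j \le i$)” is equivalent to $j \le i$ because $g$ is non-decreasing. So $P_g(i) = \#\{j \in \mathbb{N}_r;\ j \le i\} = i$ for all $i \in \mathbb{N}_r$ by Definition 14.11.3. Hence $P_g = \mathrm{id}_{\mathbb{N}_r}$.

Part (viii) follows from part (vii).

Part (ix) follows from part (viii) because $I^n_r \subseteq J^n_r$.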
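
Parts (v) and (vii) of Theorem 14.11.5 can be checked by enumeration. The following Python sketch is an illustration under the assumption that maps on $\mathbb{N}_r$ are stored as tuples of values; the names `sorting_permutation`, `compose` and `equivariant` are hypothetical.

```python
from itertools import permutations

def sorting_permutation(g):
    """P_g as the tuple (P_g(1), ..., P_g(r)) for g = (g(1), ..., g(r))."""
    r = len(g)
    return tuple(sum(1 for j in range(r) if g[j] < g[i] or (g[j] == g[i] and j <= i))
                 for i in range(r))

def compose(a, b):
    """(a o b)(i) = a(b(i)), where b takes values in 1..len(a)."""
    return tuple(a[b_i - 1] for b_i in b)

def equivariant(g):
    """True if P_{g o Q} == P_g o Q for every permutation Q of N_r."""
    pg = sorting_permutation(g)
    return all(sorting_permutation(compose(g, q)) == compose(pg, q)
               for q in permutations(range(1, len(g) + 1)))
```

The injective sequence $(3, 1, 4)$ passes this check, whereas the non-injective sequence $(5, 4, 4)$ fails it (take $Q$ to be the transposition of 2 and 3), which illustrates why part (v) is restricted to injective maps.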


14.11.6 REMARK: Partitioning of multi-index sets according to ordered-selection representatives. 
Theorem 14.11.7 is applicable to the coordinatisation of spaces of antisymmetric and symmetric multilinear maps. (See Notation 10.5.25 for sets $\mathrm{Inj}(X, Y)$ of injections between sets $X$ and $Y$.)

Theorem 14.11.7 (ix, xi), together with Theorem 14.10.5 (i), implies that the set $\mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ of selections of $r$ elements from a set of $n$ elements may be partitioned into $C^n_r$ equinumerous equivalence classes $\rho(f)$ with representative elements $f \in I^n_r$. Thus $\#(\mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)) = r!\,\#(I^n_r) = r!\,C^n_r = (n)_r$. This then implies that the set of antisymmetric arrays of the form $a : \mathbb{N}_n^r \to \mathbb{R}$ can be coordinatised by $C^n_r$ real parameters.

The corresponding symmetric arrays can be coordinatised by $C^{n+r-1}_r$ real parameters, but the equivalence classes $\rho(f)$ then do not in general have the same cardinality.

The map $\sigma : J^n_r \times \mathrm{perm}(\mathbb{N}_r) \to \mathbb{N}_n^r$ defined by $\sigma : (f, P) \mapsto f \circ P$ is not in general a bijection. This is why Theorem 14.11.7 (x) cannot be generalised from injective maps to general maps in the same way that part (ix) is generalised to part (xvii).
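
The counting claims in this remark can be verified for small cases. The sketch below computes the rearrangement orbit of a tuple and compares orbit sizes for increasing and non-decreasing representatives; the names `orbit`, `injective_count` and `orbit_sizes` are assumptions made for the illustration.

```python
from itertools import combinations, combinations_with_replacement, permutations, product
from math import comb, factorial

def orbit(f):
    """{f o P; P in perm(N_r)} for a tuple f = (f(1), ..., f(r))."""
    return {tuple(f[i] for i in p) for p in permutations(range(len(f)))}

def injective_count(n, r):
    """#(Inj(N_r, N_n)), by enumerating all maps from N_r to N_n."""
    return sum(1 for g in product(range(1, n + 1), repeat=r) if len(set(g)) == r)

def orbit_sizes(n, r, selections):
    """Sorted orbit sizes over the representatives produced by `selections`."""
    return sorted(len(orbit(f)) for f in selections(range(1, n + 1), r))
```

With `combinations` as the source of representatives (increasing selections), every orbit for $n = 4$, $r = 2$ has $2! = 2$ elements. With `combinations_with_replacement` (non-decreasing selections) for $n = 3$, $r = 2$, the orbits have sizes 1 and 2, so the classes are not equinumerous.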


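14.11.7 THEOREM: Partitions of $\mathbb{N}_n^r$ according to rearrangement orbits.
Let $n, r \in \mathbb{Z}_0^+$. Define $\rho(f) = \{f \circ P;\ P \in \mathrm{perm}(\mathbb{N}_r)\}$ for all $f \in \mathbb{N}_n^r$.
(i) $\forall f \in \mathbb{N}_n^r,\ \rho(f) \subseteq \mathbb{N}_n^r$.
(ii) $\forall f \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n),\ \rho(f) \subseteq \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$.
(iii) $\forall f \in \mathbb{N}_n^r,\ f \in \rho(f)$.
(iv) $\forall g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n),\ \exists f \in I^n_r,\ g \in \rho(f)$.
(v) $\mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n) = \bigcup \{\rho(f);\ f \in I^n_r\}$.
(vi) $\forall f, g \in I^n_r,\ g \in \rho(f) \Rightarrow g = f$.
(vii) $\forall f \in I^n_r,\ \rho(f) \cap I^n_r = \{f\}$.
(viii) $\forall f, f' \in I^n_r,\ f \ne f' \Rightarrow \rho(f) \cap \rho(f') = \emptyset$.
(ix) $\{\rho(f);\ f \in I^n_r\}$ is a partition of $\mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$.
(x) The map $\sigma : I^n_r \times \mathrm{perm}(\mathbb{N}_r) \to \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ defined by $\sigma : (f, P) \mapsto f \circ P$ is a bijection.
(xi) $\forall f \in I^n_r,\ \#(\rho(f)) = r!$.
(xii) $\forall g \in \mathbb{N}_n^r,\ \exists f \in J^n_r,\ g \in \rho(f)$.
(xiii) $\mathbb{N}_n^r = \bigcup \{\rho(f);\ f \in J^n_r\}$.
(xiv) $\forall f, g \in J^n_r,\ g \in \rho(f) \Rightarrow g = f$.
(xv) $\forall f \in J^n_r,\ \rho(f) \cap J^n_r = \{f\}$.
(xvi) $\forall f, f' \in J^n_r,\ f \ne f' \Rightarrow \rho(f) \cap \rho(f') = \emptyset$.
(xvii) $\{\rho(f);\ f \in J^n_r\}$ is a partition of $\mathbb{N}_n^r$.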


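PROOF: For part (i), let $f \in \mathbb{N}_n^r$. Let $P \in \mathrm{perm}(\mathbb{N}_r)$. Then $\mathrm{Dom}(P) = \mathbb{N}_r$ and $\mathrm{Range}(P) = \mathbb{N}_r$. So
$\mathrm{Dom}(f \circ P) = \mathrm{Dom}(P) = \mathbb{N}_r$ and $\mathrm{Range}(f \circ P) = f(\mathrm{Range}(P)) = \mathrm{Range}(f) \subseteq \mathbb{N}_n$. Therefore $f \circ P \in \mathbb{N}_n^r$.
Hence $\rho(f) \subseteq \mathbb{N}_n^r$.

For part (ii), let $f \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Let $P \in \mathrm{perm}(\mathbb{N}_r)$. Then $f \circ P \in \mathbb{N}_n^r$ by part (i). But $f \circ P$ is injective
by Theorem 10.5.6 (i). Hence $\rho(f) \subseteq \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$.

For part (iii), let $f \in \mathbb{N}_n^r$. Then $\mathrm{id}_{\mathbb{N}_r} \in \mathrm{perm}(\mathbb{N}_r)$ by Definition 14.8.2 and Theorem 10.5.5. Therefore
$f = f \circ \mathrm{id}_{\mathbb{N}_r} \in \rho(f)$.

For part (iv), let $g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Let $P_g \in \mathrm{perm}(\mathbb{N}_r)$ be the standard sorting-permutation for $g$ as in
Definition 14.11.3 with $S = \mathbb{N}_r$. Let $f = g \circ P_g^{-1}$. Then $f \in \mathrm{Inc}(\mathbb{N}_r, \mathbb{N}_n) = I^n_r$ by Theorem 14.11.2 (vi),
and $g = f \circ P_g \in \rho(f)$. Hence $\forall g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n),\ \exists f \in I^n_r,\ g \in \rho(f)$.

For part (v), $\mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n) \subseteq \bigcup \{\rho(f);\ f \in I^n_r\}$ by part (iv). For the converse, let $g \in \bigcup \{\rho(f);\ f \in I^n_r\}$. Then
$g \in \rho(f)$ for some $f \in I^n_r$. So $g = f \circ P$ for some $P \in \mathrm{perm}(\mathbb{N}_r)$ and $f \in I^n_r$. Therefore $g \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ by
Theorem 10.5.6 (i). Thus $\mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n) \supseteq \bigcup \{\rho(f);\ f \in I^n_r\}$. Hence $\mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n) = \bigcup \{\rho(f);\ f \in I^n_r\}$.

For part (vi), let $f, g \in I^n_r$ with $g \in \rho(f)$. Then $g = f \circ P$ for some $P \in \mathrm{perm}(\mathbb{N}_r)$. Suppose that $P \ne \mathrm{id}_{\mathbb{N}_r}$.
Then by Theorem 12.2.15 (i), $P$ is not an increasing function. So $i < j$ and $P(i) > P(j)$ for some $i, j \in \mathbb{N}_r$.
Then $g(i) = f(P(i)) > f(P(j)) = g(j)$ because $f \in I^n_r$, which contradicts the assumption that $g \in I^n_r$.
Therefore $P = \mathrm{id}_{\mathbb{N}_r}$. So $g = f$.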

Part (vii) follows from parts (iii) and (vi). 

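For part (viii), let $f, f' \in I^n_r$ with $\rho(f) \cap \rho(f') \ne \emptyset$. Then $g \in \rho(f) \cap \rho(f')$ for some $g$. So $g = f \circ P$
and $g = f' \circ P'$ for some $P, P' \in \mathrm{perm}(\mathbb{N}_r)$. Then $f = f' \circ Q$, where $Q = P' \circ P^{-1} \in \mathrm{perm}(\mathbb{N}_r)$ by
Theorem 10.5.6 (iii). So $f \in \rho(f')$. Therefore $f = f'$ by part (vi). Hence $f \ne f' \Rightarrow \rho(f) \cap \rho(f') = \emptyset$ by
Theorem 4.5.7 (xxx).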

Part (ix) follows from parts (v) and (viii), and Definition 8.7.12. 


For part (x), $\phi : I^n_r \times \mathrm{perm}(\mathbb{N}_r) \to \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ is a well-defined function because $f \in I^n_r$ and $P \in \mathrm{perm}(\mathbb{N}_r)$
are both injective, which implies that $f \circ P$ is injective by Theorem 10.5.6 (i). The surjectivity of $\phi$ follows
from part (iv). To show injectivity, suppose that $f \circ P = f' \circ P'$ for $(f, P), (f', P') \in I^n_r \times \mathrm{perm}(\mathbb{N}_r)$. Then
$f \circ P \in \rho(f)$ and $f' \circ P' \in \rho(f')$. So $f = f'$ by part (viii). Therefore $f \circ P = f \circ P'$. So $P = P'$ by
Theorem 10.5.14 (v) because $f$ is injective. Hence $\phi : I^n_r \times \mathrm{perm}(\mathbb{N}_r) \to \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ is a bijection.

For part (xi), let $f \in I^n_r$ and define the map $\phi : \mathrm{perm}(\mathbb{N}_r) \to \rho(f)$ by $\phi : P \mapsto f \circ P$. Then $\phi$ is a
well-defined surjection onto $\rho(f)$. Let $P, P' \in \mathrm{perm}(\mathbb{N}_r)$ with $P \ne P'$. Let $Q = P' \circ P^{-1}$. Suppose
that $Q = \mathrm{id}_{\mathbb{N}_r}$. Then $P = Q \circ P = P' \circ P^{-1} \circ P = P'$, which contradicts $P \ne P'$. Therefore $Q \ne \mathrm{id}_{\mathbb{N}_r}$. So $Q(i) \ne i$ for some
$i \in \mathbb{N}_r$. Therefore $f(Q(i)) \ne f(i)$ for such $i \in \mathbb{N}_r$ because $f$ is increasing and consequently injective. Thus
$f \circ Q \ne f$. So $f \circ P' = f \circ Q \circ P \ne f \circ P$ by Theorem 10.5.14 (vi) because $P : \mathbb{N}_r \to \mathbb{N}_r$ is a surjection.
Thus $\phi(P) \ne \phi(P')$. It follows that $\phi$ is an injection. So $\phi : \mathrm{perm}(\mathbb{N}_r) \to \rho(f)$ is a bijection. Hence
$\#(\rho(f)) = \#(\mathrm{perm}(\mathbb{N}_r)) = r!$ by Theorem 14.8.33, Definition 13.1.2 and Notation 13.1.5.


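For part (xii), let $g \in \mathbb{N}_n^r$. Let $P_g \in \mathrm{perm}(\mathbb{N}_r)$ be the standard sorting-permutation for $g$ with range $S = \mathbb{N}_r$
as in Definition 14.11.3. Let $f = g \circ P_g^{-1}$. Then $f \in \mathrm{NonDec}(\mathbb{N}_r, \mathbb{N}_n) = J^n_r$ by Theorem 14.11.2 (v), and
$g = f \circ P_g \in \rho(f)$. Hence $\forall g \in \mathbb{N}_n^r,\ \exists f \in J^n_r,\ g \in \rho(f)$.


For part (xiii), $\mathbb{N}_n^r \subseteq \bigcup \{\rho(f);\ f \in J^n_r\}$ by part (xii). But $\mathbb{N}_n^r \supseteq \bigcup \{\rho(f);\ f \in J^n_r\}$ by part (i). Hence
$\mathbb{N}_n^r = \bigcup \{\rho(f);\ f \in J^n_r\}$.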

For part (xiv), let $f, g \in J^n_r$ with $g \in \rho(f)$. Then $g = f \circ P$ for some $P \in \mathrm{perm}(\mathbb{N}_r)$. Suppose that $g \ne f$.
Then $g(i) \ne f(i)$ for some $i \in \mathbb{N}_r$. Suppose that $g(i) > f(i)$. Then $f(P(i)) > f(i)$. So $P(i) > i$ because
$f \in J^n_r$. Let $S = \mathbb{Z}[1, i]$. Let $S' = \mathbb{N}_r \setminus S = \mathbb{Z}[i + 1, r]$. Then $\#(S \cap P(S')) = \#(S' \cap P(S))$ by
Theorem 14.8.41 (ii). But $P(i) \in S' \cap P(S)$. So $\#(S' \cap P(S)) \ge 1$. Therefore $\#(S \cap P(S')) \ge 1$. In other
words, $P(j) \le i$ for some $j \in \mathbb{N}_r$ with $j > i$. So $g(j) = f(P(j)) \le f(i)$ since $f \in J^n_r$. Therefore $g(j) < g(i)$,
which contradicts the assumption that $g \in J^n_r$. So $g(i) \le f(i)$.

Now suppose that $g(i) < f(i)$. Then $f(P(i)) < f(i)$. So $P(i) < i$ because $f \in J^n_r$. Let $S = \mathbb{Z}[1, i - 1]$. Let
$S' = \mathbb{N}_r \setminus S = \mathbb{Z}[i, r]$. Then $\#(S \cap P(S')) = \#(S' \cap P(S))$ by Theorem 14.8.41 (ii). But $P(i) \in S \cap P(S')$.
So $\#(S \cap P(S')) \ge 1$. Therefore $\#(S' \cap P(S)) \ge 1$. In other words, $P(j) \ge i$ for some $j \in \mathbb{N}_r$ with $j < i$. So
$g(j) = f(P(j)) \ge f(i)$ since $f \in J^n_r$. Therefore $g(j) > g(i)$, which contradicts the assumption that $g \in J^n_r$.
So $g(i) \ge f(i)$. Thus there is no $i \in \mathbb{N}_r$ with $g(i) \ne f(i)$. Hence $g = f$.


Part (xv) follows from parts (iii) and (xiv). 

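For part (xvi), let $f, f' \in J^n_r$ with $\rho(f) \cap \rho(f') \ne \emptyset$. Then $g \in \rho(f) \cap \rho(f')$ for some $g$. So $g = f \circ P$
and $g = f' \circ P'$ for some $P, P' \in \mathrm{perm}(\mathbb{N}_r)$. Then $f = f' \circ Q$, where $Q = P' \circ P^{-1} \in \mathrm{perm}(\mathbb{N}_r)$ by
Theorem 10.5.6 (iii). So $f \in \rho(f')$. Therefore $f = f'$ by part (xiv). Hence $f \ne f' \Rightarrow \rho(f) \cap \rho(f') = \emptyset$ by
Theorem 4.5.7 (xxx).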




Part (xvii) follows from parts (xiii) and (xvi), and Definition 8.7.12. 
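
The counting content of Theorem 14.11.7 can be checked by brute force for small parameters. The following Python sketch is only an illustration and is not part of the formal development. It assumes that a function $f : \mathbb{N}_r \to \mathbb{N}_n$ is encoded as a tuple of length $r$ with entries in $\{1, \ldots, n\}$. The helper names `all_maps`, `orbit` and `check` are hypothetical. The count of increasing maps as a binomial coefficient is a standard fact which is not stated in the theorem.

```python
from itertools import permutations, product
from math import comb, factorial

def all_maps(n, r):
    """All functions from N_r to N_n, encoded as r-tuples with entries in 1..n."""
    return list(product(range(1, n + 1), repeat=r))

def orbit(f):
    """Rearrangement orbit rho(f) = { f o P ; P in perm(N_r) } (positions are zero-based)."""
    r = len(f)
    return frozenset(tuple(f[p[i]] for i in range(r))
                     for p in permutations(range(r)))

def check(n, r):
    """Verify parts (ix), (xi) and (xvii) for given n, r; return the non-decreasing orbit sizes."""
    maps = all_maps(n, r)
    inj = {f for f in maps if len(set(f)) == r}
    inc = [f for f in maps if all(a < b for a, b in zip(f, f[1:]))]
    nondec = [f for f in maps if all(a <= b for a, b in zip(f, f[1:]))]

    inc_orbits = [orbit(f) for f in inc]
    assert len(inc) == comb(n, r)                           # standard count, assumed here
    assert all(len(o) == factorial(r) for o in inc_orbits)  # part (xi)
    assert set().union(*inc_orbits) == inj                  # part (v)
    assert sum(len(o) for o in inc_orbits) == len(inj)      # disjointness, part (viii)

    nondec_orbits = [orbit(f) for f in nondec]
    assert set().union(*nondec_orbits) == set(maps)          # part (xiii)
    assert sum(len(o) for o in nondec_orbits) == len(maps)   # disjointness, part (xvi)
    return sorted(len(o) for o in nondec_orbits)

sizes = check(3, 2)
```

For $n = 3$ and $r = 2$, the orbits of non-decreasing maps have sizes 1 and 2, which illustrates why the orbits $\rho(f)$ for $f \in J^n_r$ do not in general have the same cardinality.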


14.12. List spaces for general sets 


14.12.1 REMARK: Motivation for introducing a notation for list spaces. 

Lists are finite sequences of unspecified length. There are convenient notations for spaces of sequences with
fixed, specified length, but sometimes a notation is required for spaces of sequences with unspecified length.
A list of items in a set $X$ is an element of the set union $\bigcup_{i=0}^\infty X^i$. The set $X^i$ may be interpreted as the set
of functions $f : i = \{0, 1, \ldots, i - 1\} \to X$ or as the set of functions $f : \mathbb{N}_i = \{1, 2, \ldots, i\} \to X$. The initial
index 0 is used in Section 14.12 for maximum convenience.


14.12.2 DEFINITION: The list space on a set $X$ is the set $\bigcup_{i \in \mathbb{Z}^+_0} X^i$.


14.12.3 NOTATION: List(X) denotes the list space on a set X. 


14.12.4 REMARK: The choice of initial integer for list domains. 

One may interpret $\mathrm{List}(X)$ either as $\bigcup_{i \in \mathbb{Z}^+_0} X^{\mathbb{Z}_i}$ or as $\bigcup_{i \in \mathbb{Z}^+_0} X^{\mathbb{N}_i}$. The intended interpretation is the former,
which means that list indices start at 0. It is sometimes convenient to use the initial index 1 instead of 0 in
Definition 14.12.2. Then all operations in Definition 14.12.6 are the same except that the indices are shifted.


14.12.5 THEOREM: Some enveloping sets for list spaces. 
Let X be a set. Then List(X) C P(X0) C P(Zt x X). 


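PROOF: Let $X$ be a set. Let $\ell \in \mathrm{List}(X)$. Then $\exists i \in \mathbb{Z}^+_0,\ \ell \in X^i$. So $\mathrm{Dom}(\ell) \subseteq \mathbb{Z}^+_0$ and $\mathrm{Range}(\ell) \subseteq X$.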
Therefore / C X76 . Hence List(X) C P(X20). But XZ C Zt x X. So P(XZ5) C P(Zt x X). 


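14.12.6 DEFINITION: The following operations are defined on the list space $\mathrm{List}(X)$ for any set $X$.

(i) The length function $\mathrm{length} : \mathrm{List}(X) \to \mathbb{Z}^+_0$ defined by $\mathrm{length} : x \mapsto \#(\mathrm{Dom}(x))$.

(ii) The canonical injection $i : X \to \mathrm{List}(X)$ defined by $i : x \mapsto (x)$, where $(x)$ is the one-element list
containing the single element $x$.

(iii) The concatenation function $\mathrm{concat} : \mathrm{List}(X) \times \mathrm{List}(X) \to \mathrm{List}(X)$ is defined by

$$\forall m, n \in \mathbb{Z}^+_0,\ \forall x \in X^m,\ \forall y \in X^n,\quad \mathrm{Dom}(\mathrm{concat}(x, y)) = \mathbb{Z}_{m+n}$$

and

$$\forall m, n \in \mathbb{Z}^+_0,\ \forall x \in X^m,\ \forall y \in X^n,\ \forall i \in \mathbb{Z}_{m+n},\quad
\mathrm{concat}(x, y)_i = \begin{cases} x_i & \text{if } i < m \\ y_{i-m} & \text{if } i \ge m. \end{cases}$$

In other words, $\mathrm{concat}(x, y) = (x_0, \ldots x_{m-1}, y_0, \ldots y_{n-1})$. (This is a start-at-zero version of the
start-at-one concatenation map template in Definition 14.6.10.)

(iv) The “omit item $j$” function $\mathrm{omit}_j : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for $j \in \mathbb{Z}^+_0$ by

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\quad
\mathrm{Dom}(\mathrm{omit}_j(\ell)) = \begin{cases} \mathbb{Z}_{n-1} & \text{if } j < n \\ \mathbb{Z}_n & \text{if } j \ge n \end{cases}$$

and

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\ \forall i \in \mathrm{Dom}(\mathrm{omit}_j(\ell)),\quad
\mathrm{omit}_j(\ell)(i) = \begin{cases} \ell_i & \text{if } i < j \\ \ell_{i+1} & \text{if } i \ge j. \end{cases}$$

In other words, $\mathrm{omit}_j(\ell) = (\ell_0, \ldots, \ell_{j-1}, \ell_{j+1}, \ldots, \ell_{n-1})$ if $j < n$, and $\mathrm{omit}_j(\ell) = \ell$ if $j \ge n$.

(v) The “omit items $j$ and $k$” function $\mathrm{omit}_{j,k} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for $j, k \in \mathbb{Z}^+_0$ with $j \ne k$ by

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\quad
\mathrm{Dom}(\mathrm{omit}_{j,k}(\ell)) = \begin{cases} \mathbb{Z}_{n-2} & \text{if } \max(j, k) < n \\ \mathbb{Z}_{n-1} & \text{if } \min(j, k) < n \le \max(j, k) \\ \mathbb{Z}_n & \text{if } \min(j, k) \ge n \end{cases}$$

and

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\ \forall i \in \mathrm{Dom}(\mathrm{omit}_{j,k}(\ell)),\quad
\mathrm{omit}_{j,k}(\ell)(i) = \begin{cases} \ell_i & \text{if } i < \min(j, k) \\ \ell_{i+1} & \text{if } \min(j, k) \le i < \max(j, k) - 1 \\ \ell_{i+2} & \text{if } \max(j, k) - 1 \le i. \end{cases}$$

In other words, $\mathrm{omit}_{j,k}(\ell) = (\ell_0, \ldots, \ell_{\alpha-1}, \ell_{\alpha+1}, \ldots, \ell_{\beta-1}, \ell_{\beta+1}, \ldots, \ell_{n-1})$ if $\max(j, k) < n$, where $\alpha =
\min(j, k)$ and $\beta = \max(j, k)$. (Note that $\mathrm{length}(\mathrm{omit}_{j,k}(\ell)) = \#(\mathbb{Z}_n \setminus \{j, k\})$ in general.)

(vi) The “swap items $j$ and $k$” operator $\mathrm{swap}_{j,k} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for all $j, k \in \mathbb{Z}^+_0$ by
$\mathrm{Dom}(\mathrm{swap}_{j,k}(\ell)) = \mathrm{Dom}(\ell)$ for all $\ell \in \mathrm{List}(X)$, and

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\ \forall i \in \mathbb{Z}_n,\quad
\mathrm{swap}_{j,k}(\ell)(i) = \begin{cases} \ell_k & \text{if } i = j \text{ and } k < n \\ \ell_j & \text{if } i = k \text{ and } j < n \\ \ell_i & \text{otherwise.} \end{cases}$$

In other words,

$$\forall n, j, k \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\quad
\mathrm{swap}_{j,k}(\ell) = \begin{cases} (\ell \setminus \{(j, \ell_j), (k, \ell_k)\}) \cup \{(j, \ell_k), (k, \ell_j)\} & \text{if } \max(j, k) < n \\ \ell & \text{otherwise.} \end{cases}$$

(See also Definition 14.12.23 (i) for the closely related “swap values” operator.)

(vii) The “substitute at position $j$ the value $x$” operator $\mathrm{subs}_{j,x} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for $j \in \mathbb{Z}^+_0$
and $x \in X$ by $\mathrm{Dom}(\mathrm{subs}_{j,x}(\ell)) = \mathrm{Dom}(\ell)$ for all $\ell \in \mathrm{List}(X)$, and

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\ \forall i \in \mathbb{Z}_n,\quad
\mathrm{subs}_{j,x}(\ell)(i) = \begin{cases} x & \text{if } i = j \\ \ell_i & \text{if } i \ne j. \end{cases}$$

In other words,

$$\forall n, j \in \mathbb{Z}^+_0,\ \forall x \in X,\ \forall \ell \in X^n,\quad
\mathrm{subs}_{j,x}(\ell) = \begin{cases} (\ell \setminus \{(j, \ell_j)\}) \cup \{(j, x)\} & \text{if } j < n \\ \ell & \text{otherwise.} \end{cases}$$

(See also Definition 14.12.23 (ii) for the closely related “substitute value” operator.)

(viii) The “insert at position $j$ the value $x$” function $\mathrm{insert}_{j,x} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for $j \in \mathbb{Z}^+_0$ and
$x \in X$ by

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\quad
\mathrm{Dom}(\mathrm{insert}_{j,x}(\ell)) = \begin{cases} \mathbb{Z}_{n+1} & \text{if } j \le n \\ \mathbb{Z}_n & \text{if } j > n \end{cases}$$

and

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\ \forall i \in \mathrm{Dom}(\mathrm{insert}_{j,x}(\ell)),\quad
\mathrm{insert}_{j,x}(\ell)(i) = \begin{cases} \ell_i & \text{if } i < j \\ x & \text{if } i = j \\ \ell_{i-1} & \text{if } i > j. \end{cases}$$

In other words, $\mathrm{insert}_{j,x}(\ell) = (\ell_0, \ldots, \ell_{j-1}, x, \ell_j, \ldots, \ell_{n-1})$ if $j \le n$, and $\mathrm{insert}_{j,x}(\ell) = \ell$ if $j > n$.

(ix) The “subsequence starting at $j$ with length $k$” function $\mathrm{subseq}_{j,k} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for
$j, k \in \mathbb{Z}^+_0$ by

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\quad
\mathrm{Dom}(\mathrm{subseq}_{j,k}(\ell)) = \begin{cases} \mathbb{Z}_k & \text{if } k \le n - j \\ \mathbb{Z}_{n-j} & \text{if } 0 \le n - j < k \end{cases}$$

and

$$\forall n \in \mathbb{Z}^+_0,\ \forall \ell \in X^n,\ \forall i \in \mathrm{Dom}(\mathrm{subseq}_{j,k}(\ell)),\quad
\mathrm{subseq}_{j,k}(\ell)(i) = \ell_{j+i}.$$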
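
The operations in Definition 14.12.6 may be sketched in Python as follows. This is an informal illustration under stated assumptions, not a formal model. Lists are encoded as zero-indexed tuples, and the function names, including the hypothetical name `omit2` for $\mathrm{omit}_{j,k}$, are chosen here for convenience. The behaviour of `subseq` when $j$ exceeds the list length comes from Python slicing and is an assumption, because that case is not stated above.

```python
def length(l):
    return len(l)

def concat(x, y):
    """concat(x, y) = (x_0, ..., x_{m-1}, y_0, ..., y_{n-1})."""
    return x + y

def omit(j, l):
    """omit_j: drop the item at index j; a list with no index j is unchanged."""
    return l[:j] + l[j + 1:] if j < len(l) else l

def omit2(j, k, l):
    """omit_{j,k} for j != k: drop whichever of the indices j and k exist."""
    return tuple(v for i, v in enumerate(l) if i not in (j, k))

def swap(j, k, l):
    """swap_{j,k}: exchange items j and k if both indices exist."""
    if max(j, k) >= len(l):
        return l
    out = list(l)
    out[j], out[k] = out[k], out[j]
    return tuple(out)

def subs(j, x, l):
    """subs_{j,x}: replace the item at index j by x if index j exists."""
    return l[:j] + (x,) + l[j + 1:] if j < len(l) else l

def insert(j, x, l):
    """insert_{j,x}: insert x at index j if j <= length, else unchanged."""
    return l[:j] + (x,) + l[j:] if j <= len(l) else l

def subseq(j, k, l):
    """subseq_{j,k}: items j, ..., j+k-1, truncated at the end of the list."""
    return l[j:j + k]

l = ("a", "b", "c", "d")
```

For example, `omit2(1, 3, l)` keeps the items at indices 0 and 2, and `insert(4, "x", l)` appends at the end because $j = n$ is permitted for insertion.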

14.12.7 REMARK: Substitution operator applications. 

Although Definition 14.12.6 (vii) may seem overly complicated for such a simple task, the substitution
operator is in fact often required. Substitution in tuples is very often explained using “horizontal ellipsis”
dots, which do not give full confidence in the precision of a definition. For example, when an author
wishes to substitute $s$ for $x_i$ and $t$ for $x_j$ in a tuple $x \in \mathbb{R}^n$, this is often denoted in some such way as
“$x_1, \ldots x_{i-1}, s, x_{i+1}, \ldots x_{j-1}, t, x_{j+1}, \ldots x_n$”, even though $i$ might not be less than $j$. This can be expressed
much more precisely as $\mathrm{subs}_{j,t}(\mathrm{subs}_{i,s}(x))$.
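
A small Python sketch may make the point concrete. It assumes zero-based tuples and a hypothetical helper `subs` which mirrors Definition 14.12.6 (vii). The composition gives the intended result without any assumption on the order of the two positions, here with $i > j$.

```python
def subs(j, x, l):
    """subs_{j,x}: replace the item at index j by x if index j exists."""
    return l[:j] + (x,) + l[j + 1:] if j < len(l) else l

x = (10, 20, 30, 40, 50)
i, s = 3, "s"   # substitute s at position i
j, t = 1, "t"   # substitute t at position j, where j < i

y = subs(j, t, subs(i, s, x))
```

When $i \ne j$, the two substitutions commute, so the order of composition does not matter.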


14.12.8 REMARK: Lists of elements of algebraic systems have extended operations. 
See Definitions 18.10.2 and 18.10.3 for algebraic operations on list spaces. 


14.12.9 REMARK: The concatenation operator for general lists. 
Theorem 14.12.10 is essentially equivalent to Theorem 14.6.12 (i). Conversely, lists can be extracted as 


14.12.10 THEOREM: Concatenation of two sublists of a list to reconstruct the original list. 
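Let $X$ be a set. Let $\ell \in \mathrm{List}(X)$ with $n = \mathrm{length}(\ell) \in \mathbb{Z}^+_0$. Then

$$\forall m \in \mathbb{Z}_{n+1},\quad \ell = \mathrm{concat}(\mathrm{subseq}_{0,m}(\ell), \mathrm{subseq}_{m,n-m}(\ell)).$$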


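PROOF: Let $X$ be a set. Let $\ell \in \mathrm{List}(X)$ with $n = \mathrm{length}(\ell) \in \mathbb{Z}^+_0$. Let $m \in \mathbb{Z}_{n+1} = \{0, 1, \ldots n\}$. Let
$\ell_1 = \mathrm{subseq}_{0,m}(\ell)$. Let $\ell_2 = \mathrm{subseq}_{m,n-m}(\ell)$. Then $\ell_1 \in X^m$ and $\ell_2 \in X^{n-m}$. Let $\ell_0 = \mathrm{concat}(\ell_1, \ell_2)$.
Then $\ell_0 \in X^n$. For $i \in \mathbb{Z}_m$, $\ell_0(i) = \ell_1(i) = \mathrm{subseq}_{0,m}(\ell)(i) = \ell(i)$. For $i \in \mathbb{Z}_n \setminus \mathbb{Z}_m$, $\ell_0(i) = \ell_2(i - m) =
\mathrm{subseq}_{m,n-m}(\ell)(i - m) = \ell(m + i - m) = \ell(i)$. So $\ell_0(i) = \ell(i)$ for all $i \in \mathbb{Z}_n$. Hence $\mathrm{concat}(\ell_1, \ell_2) = \ell$. That
is, $\ell = \mathrm{concat}(\mathrm{subseq}_{0,m}(\ell), \mathrm{subseq}_{m,n-m}(\ell))$.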
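
Theorem 14.12.10 can be tested mechanically for any particular list. The following Python sketch assumes zero-based tuples and uses hypothetical helpers `concat`, `subseq` and `check_split` which mirror Definition 14.12.6 (iii) and (ix). It is an illustration of the statement, not a substitute for the proof.

```python
def concat(x, y):
    return x + y

def subseq(j, k, l):
    """Items j, ..., j+k-1 of l, truncated at the end of the list."""
    return l[j:j + k]

def check_split(l):
    """True if l = concat(subseq_{0,m}(l), subseq_{m,n-m}(l)) for every m in Z_{n+1}."""
    n = len(l)
    return all(concat(subseq(0, m, l), subseq(m, n - m, l)) == l
               for m in range(n + 1))

ok = check_split(("a", "b", "c", "d"))
```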
14.12.11 REMARK: Extended list spaces contain also infinite-length sequences.
It is sometimes useful to combine finite and infinite sequences in extended list spaces as in Definition 14.12.12. 


14.12.12 DEFINITION: The extended list space on a set $X$ is the set $\mathrm{List}(X) \cup X^\omega$.


14.12.13 NOTATION: $\overline{\mathrm{List}}(X)$ denotes the extended list space on a set $X$.


14.12.14 REMARK: Some expressions for extended list spaces. 
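For any set $X$,

$$\begin{aligned}
\overline{\mathrm{List}}(X) &= X^\omega \cup \bigcup_{i=0}^\infty X^i \\
&= \bigcup \{X^i;\ i \in \overline{\mathbb{Z}}^+_0\} \\
&= \bigcup \{X^i;\ i \in \omega^+\} \\
&= \bigcup_{i \in \omega^+} X^i.
\end{aligned}$$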


14.12.15 REMARK: Choice of notation for extended list spaces. 

The bar-notation List( X) for an extended list space is supposed to suggest an analogy with Zj = Zg U {oo} 
and other such extended number sets. (See Notation 14.5.4 for Zg.) A possible alternative would be List" (X), 
which suggests the one-point completion of a topological space such as Rj. 


14.12.16 REMARK: Spaces of lists with specified maximum length. 

For the spaces of multinomial coefficients in Definition 14.12.20 and Notation 14.12.21, it is convenient to 
introduce a definition and notation for spaces of lists with a specified maximum length in Definition 14.12.17 
and Notation 14.12.18. 


14.12.17 DEFINITION: The list space with maximum length $k$ on a set $X$, for $k \in \mathbb{Z}^+_0$, is the set $\bigcup_{i=0}^k X^i$.


14.12.18 NOTATION: $\mathrm{List}_k(X)$, for $k \in \mathbb{Z}^+_0$, denotes the list space with maximum length $k$ on a set $X$.


14.12.19 REMARK: Spaces of multinomial coefficients with specified maximum degree. 

Symmetric multinomial coefficients are relevant to definitions of multinomials over commutative rings. For 
such multinomials, “multiple copies” of coefficients appear in the coefficient maps in spaces Mult? (I ,Y). 
Thus $2x_1x_2$ would appear as $c(1,2)x_1x_2 + c(2,1)x_2x_1$ with $c(1,2) = c(2,1) = 1$.

The spaces Mult? (Nn, IR) are particularly applicable to higher-order partial derivatives of functions on 
differentiable manifolds. 


14.12.20 DEFINITION: The multinomial coefficient space of degree $k$ with index set $I$ and value space $Y$,
for $k \in \mathbb{Z}^+_0$, is the set $Y^{\mathrm{List}_k(I)}$ of functions from $\mathrm{List}_k(I)$ to $Y$.


The symmetric multinomial coefficient space of degree k with index set J and value space Y , for k € Z ,1s 
the set of functions {c : List; (I) > Y; VE < k, Vo € I*, Vi, j € I, c(swap; ;(o)) = c(a)]. 


14.12.21 NOTATION: $\mathrm{Mult}_k(I, Y)$, for $k \in \mathbb{Z}^+_0$, denotes the multinomial coefficient space of degree $k$ with
index set $I$ and value space $Y$. In other words, $\mathrm{Mult}_k(I, Y) = Y^{\mathrm{List}_k(I)}$.


Mult? (I, Y), for k € Zt, denotes the symmetric multinomial coefficient space of degree k with index set I and 
value space Y. That is, Mult? (I, Y) = (c: List, (I) > Y; V£ € k, Vo € I’, Vi, j € I, c(swap; ;(o)) = c(a)}. 


14.12.22 REMARK:  Generalisations of swap and substitution operators to general sets of functions. 
Parts (vi) and (vii) of Definition 14.12.6 have useful generalisations to general sets of functions. These are 
presented in Definition 14.12.23 parts (i) and (ii) respectively. 


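14.12.23 DEFINITION: The following operations are defined on sets of functions $Y^X$ for sets $X$ and $Y$.


(i) The “swap values at $s$ and $t$” operator $\mathrm{swap}_{s,t} : Y^X \to Y^X$ is defined for $s, t \in X$ by

$$\forall s, t \in X,\ \forall f \in Y^X,\ \forall x \in X,\quad
\mathrm{swap}_{s,t}(f)(x) = \begin{cases} f(t) & \text{if } x = s \\ f(s) & \text{if } x = t \\ f(x) & \text{otherwise.} \end{cases}$$

In other words,

$$\forall s, t \in X,\ \forall f \in Y^X,\quad \mathrm{swap}_{s,t}(f) = (f \setminus \{(s, f(s)), (t, f(t))\}) \cup \{(s, f(t)), (t, f(s))\}.$$


(ii) The “substitute value at $x$ with $y$” operator $\mathrm{subs}_{x,y} : Y^X \to Y^X$ is defined for $x \in X$ and $y \in Y$ by

$$\forall f \in Y^X,\ \forall t \in X,\quad
\mathrm{subs}_{x,y}(f)(t) = \begin{cases} y & \text{if } t = x \\ f(t) & \text{if } t \ne x. \end{cases}$$

In other words,

$$\forall x \in X,\ \forall y \in Y,\ \forall f \in Y^X,\quad \mathrm{subs}_{x,y}(f) = (f \setminus \{(x, f(x))\}) \cup \{(x, y)\}.$$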
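
The two forms of each operator in Definition 14.12.23, case-by-case and set-theoretic, can be compared in a short Python sketch. It assumes that a function $f \in Y^X$ with finite domain is encoded as a `dict`, so that its graph is the set of `(key, value)` pairs. The names `swap_values`, `subs_value` and `graph` are hypothetical.

```python
def swap_values(s, t, f):
    """swap_{s,t}(f): exchange the values of f at s and t."""
    g = dict(f)
    g[s], g[t] = f[t], f[s]
    return g

def subs_value(x, y, f):
    """subs_{x,y}(f): replace the value of f at x by y."""
    g = dict(f)
    g[x] = y
    return g

def graph(f):
    """The function as a set of ordered pairs."""
    return set(f.items())

f = {"a": 1, "b": 2, "c": 3}

# Set-theoretic forms of the same two operations.
swap_as_set = (graph(f) - {("a", f["a"]), ("c", f["c"])}) | {("a", f["c"]), ("c", f["a"])}
subs_as_set = (graph(f) - {("b", f["b"])}) | {("b", 9)}
```

The original `f` is left unmodified because each helper copies the dictionary before changing it.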
Chapter 15


RATIONAL AND REAL NUMBERS 
15.0.1 REMARK: The historical motivation for the formalisation of real numbers.

In the 21st century, the real numbers seem so natural, it is difficult to understand why there was ever a need 
to formalise the real numbers and establish the validity of the concept. The following quote from Bell [233], 
pages 519—520, helps to clarify the problem which needs to be solved. 


If two rational numbers are equal, it is no doubt obvious that their square roots are equal. Thus
$2 \times 3$ and $6$ are equal; so also then are $\sqrt{2 \times 3}$ and $\sqrt{6}$. But it is not obvious that $\sqrt{2} \times \sqrt{3} = \sqrt{2 \times 3}$,
and hence $\sqrt{2} \times \sqrt{3} = \sqrt{6}$. The un-obviousness of this simple assumed equality, $\sqrt{2} \times \sqrt{3} = \sqrt{6}$,
taken for granted in school arithmetic, is evident if we visualize what the equality implies: the
“lawless” square roots of 2, 3, 6 are to be extracted, the first two of these are then to be multiplied
together, and the result is to come out equal to the third. As no one of these three roots can be
extracted exactly, no matter to how many decimal places the computation is carried, it is clear
that the verification by multiplication as just described will never be complete. The whole human
race toiling incessantly through all its existence could never prove in this way that $\sqrt{2} \times \sqrt{3} = \sqrt{6}$.
Closer and closer approximations to equality would be attained as time went on, but finality would 
continue to recede. To make these concepts of “approximation” and “equality” precise, or to replace 
our first crude conceptions of irrationals by sharper descriptions which will obviate the difficulties 
indicated, was the task Dedekind set himself in the early 1870's—his work on Continuity and 
Irrational Numbers was published in 1872. 


(See Dedekind [351]; Dedekind [352], pages 1-27.) The word “lawless” here doubtless refers to a number 
whose decimal expansion follows no apparent pattern, whereas the decimal expansion of a rational number 
will always repeat itself indefinitely after some number of digits. Thus the product of the “lawless” decimal 
expansions for $\sqrt{2}$ and $\sqrt{3}$ should somehow be proved to be equal to the “lawless” decimal expansion for $\sqrt{6}$.
If even such elementary relations of equality amongst algebraic numbers cannot be satisfactorily resolved 
with decimal expansions, then surely the totality of analysis for general transcendental numbers will be out of 
reach. At the very minimum, a formalisation of the real numbers must be able to securely define the equality 
relation amongst the full range of formulas which specify individual numbers. This requires a “model” 
containing numbers which can be associated with symbolic formulas in such a way that equality within 
the model corresponds exactly to our intuitive sense of equality. When this bare minimum requirement to 
be able to settle questions of equality and inequality is met, one may then proceed to establish the other 
requirements for a real number system, such as addition, multiplication and order. 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
15.0.2 REMARK: The rational numbers provide a convenient “base camp” for climbing the real numbers.
One generally builds the real number system on the assumed consistency and general well-definition of the 
rational numbers. The rational numbers have the convenience that they are countable, and their decimal 
expansions consist of infinite repetitions of a finite string of digits after some point. Thus each rational 
number is finitely specifiable as a ratio of integers or in terms of a repeating digit-string in a decimal 
expansion (or an expansion with any base). One could in principle base real numbers on algebraic numbers, 
which are at least finitely specifiable, and are therefore countable. It is also possible to build the real numbers 
from a smaller space than the full rational numbers. For example, the real numbers may be built from the 
set of all rational numbers of the form $i \cdot 2^{-n}$ for $i \in \mathbb{Z}$ and $n \in \mathbb{Z}^+_0$, which is dense in the real numbers.
(This corresponds to the set of all finite binary expansions, which are typically used in computer processors.)
However, the rational numbers have the weight of history in their favour, and they have various convenient
arithmetic closure properties which can be used for defining real numbers instead of having to prove these
closure properties as an additional task. The rational numbers thus provide a convenient “base camp” for 
the attack on the real numbers. (Sections 15.4, 15.5, 15.6, 15.7 and 15.8 give some idea of how difficult it is 
to build real numbers even from this relatively convenient starting point.) 


15.0.3 REMARK: Finite resolution of real-world measurements. 

Real numbers arise from measurements of length, area, volume, angles, weight, temperature, time and other 
physical observables. In ancient times, real numbers were known as “magnitudes”. In fact, most observables 
are really lengths. For example, areas and volumes are not measured directly. They are calculated by 
multiplying lengths. Angles are measured as ratios of lengths. Weight may be measured by the length of 
extension of a spring. Temperature may be measured as the length of a column of liquid. Time may be 
measured as the distance that a clock-hand moves. 


Real numbers are the standard mathematical model for physical measurements. This does not mean that 
physical things themselves have parameters or evolution equations which use real numbers. It is only the 
measurements which are modelled with real numbers. Measurements are “phenomena”, as opposed to 
the underlying entities (the “noumena”) which produce the phenomena. The fact that the coordinates of 
measurements of physical phenomena depend on the observer’s frame of reference gives the clue that they are 
not intrinsic attributes of objects-in-themselves. Physical measurements result from interactions between 
observer frames and observed physical systems. Real numbers are used to record those measurements. 
Consequently, one need not be too concerned as to whether real numbers are a suitable model for the 
physical world itself. Real numbers are an extrapolation of the model which humans currently use for 
measurements of the physical world. Lanczos [280], page 7, said something similar as follows. 


The physical world is translated into mathematical relations. This translation occurs with the 
help of coordinates. The coordinates establish a one-to-one correspondence between the points 
of physical space and numbers. After establishing this correspondence, we can operate with the 
coordinates as algebraic quantities and forget about their physical meaning. 


15.1. Rational numbers 


15.1.1 REMARK: Representation and ontology for rational numbers. 

It may at first sight seem obvious that a rational number such as two-thirds may be represented by an 
ordered pair (2,3), but the ordered pair (4,6) would presumably give the same result. One could achieve 
uniqueness for this ordered pair representation by always reducing to relatively prime integers by dividing 
by the greatest common divisor. But this can be somewhat onerous when the numerator and denominator 
are both very large integers. Therefore it is desirable to find a more natural representation for rational 
numbers. Pairs such as (2,3) and (4,6) may be thought of as parameters for a procedure to construct a
ratio or proportion. There is an elementary construction for this in Euclidean geometry. The arithmetic 
of proportions is very ancient, and yet there is no standard notation for proportions other than pairs of 
relatively prime integers. In the case of multiplication, we do not think of the number 12 as the products 
2 x 6 or 3 x 4. There is a standard written representation “12”. 


A rational number is generally thought of in terms of the metaphor of divided geometric lengths. Thus 
the ratios 4/7 and 57/100 are thought of as being very close to each other, although the numerators and 
denominators are quite different. In other words, we think of ratios as magnitudes with a well-defined metric 
and order, not merely as abstract pairs of integers or equivalence classes of integer-pairs. Nevertheless, there 
seems to be no representation of rational numbers which closely corresponds to their ontology and is also
convenient to manipulate. Therefore integer-pair-equivalence-classes are generally adopted for the theoretical 
development of their properties. 


A more abstract approach to the rational number system would be to characterise it as an ordered field Q 
which contains the integers, and which contains no smaller ordered field which contains the integers. (See 
MacLane/Birkhoff [110], pages 101—104, 265-267, for the construction of Q as the “field of quotients” which 
is generated by $\mathbb{Z}$ regarded as an integral domain.) Then every map $f : \mathbb{Z} \to K$ from the integers $\mathbb{Z}$ to
an ordered field $K$ which preserves the ordered additive group and multiplicative semigroup structure of
integers would be factorisable into a canonical map $u : \mathbb{Z} \to \mathbb{Q}$ and a structure-preserving map $g : \mathbb{Q} \to K$.
(This would be along the same lines as the unique factorisation property for tensor products in Section 28.3, 
and for multilinear maps in Section 27.8.) Such a high-level abstract characterisation of the rational number 
system still does not answer the fundamental question of what rational numbers are. 


Perhaps the closest approximation to a credible ontology for rational numbers would be to say that integer 
pairs of the form (p,q) are parameters for a well-known Euclidean geometry construction to cut a given 
line in a specified ratio by first drawing a line adjacent to the given line, marking off q equal segments on 
the constructed line, and then cutting the given line with parallel lines from the constructed line. (This 
is justified by Euclid's propositions V.9 and V.10. See for example Euclid/Heath [214], pages 211-214; 
Euclid [216], pages 132-133; Kuchel [106], page 157.) This construction is illustrated in Figure 15.1.1. 


Figure 15.1.1 Euclidean construction to divide a given line segment in a given ratio


The “ratio p/q" may then be interpreted to mean the result of such a geometric construction, including its 
metric and order properties. Thus, as in the case of most concepts in mathematics, the “true meaning" of the 
concept is extra-mathematical. The pure mathematical construction, which is an integer-pair equivalence 
class in this case, is merely a parameter-set which “points to" an extra-mathematical object. (For another 
example of extra-mathematical objects, see Remark 53.1.14 for tangent vectors.) One may think of an 
integer-ratio pair (p,q) as a “coordinate tuple” for a rational number “point” in an extra-mathematical 
number space. The existence of infinitely many choices for this coordinate tuple should then be no more (or 
less) troubling than the existence of infinitely many choices for charts for manifolds. (In fact, the ratios p/q 
in Figure 15.1.2 in Remark 15.1.7 are literally coordinates, which parametrise lines by specifying a single 
point on each line.) 


15.1.2 REMARK: Rational numbers can be constructed from equivalence classes of integer pairs. 
The rational numbers are a natural extension of the signed integers Z in Section 14.4. Definition 15.1.3 
permits the representation Q of the set of rational numbers to be chosen freely, although a correspondence 
function p between integers and rational numbers must be provided. Definition 15.1.3 allows the set Q to 
include the integers as a subset. But typical representations do not literally include the integers. 


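15.1.3 DEFINITION: A rational number representation is a pair $(Q, \rho)$ where $\rho : \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}) \to Q$ is a
surjective map to a set $Q$ such that $\forall (n_1, d_1), (n_2, d_2) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}),\ \rho(n_1, d_1) = \rho(n_2, d_2) \Leftrightarrow n_1 d_2 = n_2 d_1$.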
15.1.4 REMARK: Representations of rational numbers.

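The most popular representation of the rational numbers is the pair $(Q, \rho)$ where $Q$ is the set of equivalence
classes $\{[(n, d)];\ (n, d) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})\}$, where $[(n, d)] = \{(n', d') \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\});\ nd' = n'd\}$ for integer pairs
$(n, d) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$, and $\rho : \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}) \to Q$ is defined by $\rho : (n, d) \mapsto [(n, d)]$.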

Sets of equivalence classes are often used as representations of mathematical classes when the first choice 
for the representation suffers from non-uniqueness. One way to avoid the inconvenience of equivalence 
classes is to choose a canonical element from each equivalence class to represent the whole class. In the 
case of the rational numbers, the lowest common denominator can be used to construct a well-defined 
unique representative for each rational number. (This would be a suitable representation in computer 
software.) For each equivalence class $[(n, d)]$ in the above representation, a suitable representative would be
the pair $(n', d')$ where $d' = |d| / \gcd(|n|, |d|)$ and $n' = nd'/d$ when $n \ne 0$, and $(n', d') = (0, 1)$ when $n = 0$.
This gives a second representation $(Q', \rho')$ of the rational numbers with the obvious map $\rho'$. A third
representation $(Q'', \rho'')$ may be constructed by substituting $n \in \mathbb{Z}$ for each pair $(n, d) \in Q'$ with $d = 1$.
Thus $Q'' = \{(n, d) \in Q';\ d \ne 1\} \cup \mathbb{Z}$. Such a representation has the advantage that $\mathbb{Z} \subseteq Q''$, but then the
definition of arithmetic operations becomes more difficult because the representation is non-uniform. The
representation $(Q', \rho')$ which “simplifies” the ratio-pairs $(n, d)$ using the greatest common divisor $\gcd(|n|, |d|)$
of the absolute values of n and d is also difficult to work with because after each operation, the resulting 
representative must be normalised by “simplifying the fraction". Thus for practical purposes, it may be 
most convenient to use the full equivalence classes, although other representations are more attractive for 
explaining the meaning of the rational numbers. All representations are, of course, interchangeable. 
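
The canonical representative described in Remark 15.1.4 can be computed directly. The following Python sketch is an illustration only. The function names `canonical` and `equivalent` are hypothetical, and the comparison with the standard-library `fractions.Fraction` type is included merely as a cross-check, since that type also stores a fraction in lowest terms with positive denominator.

```python
from fractions import Fraction
from math import gcd

def equivalent(a, b):
    """(n1, d1) and (n2, d2) represent the same rational iff n1*d2 == n2*d1."""
    return a[0] * b[1] == b[0] * a[1]

def canonical(n, d):
    """Representative (n', d') with d' = |d|/gcd(|n|,|d|) and n' = n*d'/d, or (0, 1) if n = 0."""
    if d == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    if n == 0:
        return (0, 1)
    d1 = abs(d) // gcd(abs(n), abs(d))
    n1 = n * d1 // d   # exact division, because d divides n * d1
    return (n1, d1)

rep = canonical(4, -6)
```

Two pairs are equivalent exactly when their canonical representatives coincide, which is what makes $(Q', \rho')$ a representation in the sense of Definition 15.1.3.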


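15.1.5 DEFINITION: The set of rational numbers is the set $\{[(n, d)];\ n \in \mathbb{Z},\ d \in \mathbb{Z} \setminus \{0\}\}$, where $[(n, d)]$
denotes the equivalence class $\{(n', d') \in \mathbb{Z}^2;\ d' \ne 0 \text{ and } nd' = n'd\}$ for all pairs $(n, d) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$.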


15.1.6 NOTATION: Q denotes the set of rational numbers. 


15.1.7 REMARK: A standard representation for rational numbers. 

Definition 15.1.5 is the popular representation for the set of rational numbers in terms of equivalence classes 
of pairs of integers with equal ratios. This is illustrated in Figure 15.1.2. (This diagram is reminiscent of 
some constructions in projective geometry.) 


Figure 15.1.2 Construction of rational numbers from integers


15.1.8 REMARK: Rational number arithmetic. 

Rational number systems are defined abstractly in Definition 15.1.9 in terms of the abstract rational number 
representation in Definition 15.1.3. These have the algebraic structure of fields. (See Section 18.7 for fields.) 
Ordered rational number systems are defined similarly in Definition 15.1.10. These have the algebraic 
structure of ordered fields. (See Section 18.8 for ordered fields.) 
15.1.9 DEFINITION: A rational number system is a tuple $(Q, \rho, \sigma, \tau)$ where $(Q, \rho)$ is a rational number
representation and $\sigma : Q \times Q \to Q$ and $\tau : Q \times Q \to Q$ satisfy the following conditions.
(1) $\forall (n_1, d_1), (n_2, d_2) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}),\ \sigma(\rho(n_1, d_1), \rho(n_2, d_2)) = \rho(n_1 d_2 + n_2 d_1, d_1 d_2)$.
(2) $\forall (n_1, d_1), (n_2, d_2) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}),\ \tau(\rho(n_1, d_1), \rho(n_2, d_2)) = \rho(n_1 n_2, d_1 d_2)$.

15.1.10 DEFINITION: An ordered rational number system is a tuple $(Q, \rho, \sigma, \tau, R)$ where $(Q, \rho, \sigma, \tau)$ is a
rational number system and the relation $R \in \mathbb{P}(Q \times Q)$, denoted as “$<$”, satisfies the following condition.
(3) $\forall (n_1, d_1), (n_2, d_2) \in \mathbb{Z} \times \mathbb{Z}^+,\ \rho(n_1, d_1) < \rho(n_2, d_2) \Leftrightarrow n_1 d_2 < n_2 d_1$.
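
Conditions (1), (2) and (3) can be exercised on integer pairs directly. The Python sketch below is a hedged illustration which works with representatives $(n, d)$ rather than with equivalence classes. The names `equiv`, `add`, `mul` and `less` are hypothetical stand-ins for equality of $\rho$-values, $\sigma$, $\tau$ and the order relation. The point is that replacing a pair by an equivalent pair changes the result only up to equivalence.

```python
def equiv(a, b):
    """rho(a) = rho(b) iff n1*d2 == n2*d1 (Definition 15.1.3)."""
    return a[0] * b[1] == b[0] * a[1]

def add(a, b):
    """Condition (1): (n1, d1) + (n2, d2) is represented by (n1*d2 + n2*d1, d1*d2)."""
    return (a[0] * b[1] + b[0] * a[1], a[1] * b[1])

def mul(a, b):
    """Condition (2): (n1, d1) * (n2, d2) is represented by (n1*n2, d1*d2)."""
    return (a[0] * b[0], a[1] * b[1])

def less(a, b):
    """Condition (3): valid only for representatives with positive denominators."""
    if a[1] <= 0 or b[1] <= 0:
        raise ValueError("condition (3) requires positive denominators")
    return a[0] * b[1] < b[0] * a[1]

half, third = (1, 2), (1, 3)
half_alt, third_alt = (2, 4), (-1, -3)   # equivalent representatives

sum1 = add(half, third)
sum2 = add(half_alt, third_alt)
```

Here `sum1` and `sum2` are different pairs which represent the same rational number. Note that the order test is restricted to positive denominators, as in condition (3), because a negative denominator would reverse the inequality.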


15.1.11 NOTATION: 

$\mathbb{Q}^+$ denotes the set of positive rational numbers $\{q \in \mathbb{Q};\ q > 0\}$.

$\mathbb{Q}^+_0$ denotes the set of non-negative rational numbers $\{q \in \mathbb{Q};\ q \ge 0\}$.
$\mathbb{Q}^-$ denotes the set of negative rational numbers $\{q \in \mathbb{Q};\ q < 0\}$.

$\mathbb{Q}^-_0$ denotes the set of non-positive rational numbers $\{q \in \mathbb{Q};\ q \le 0\}$.


15.1.12 REMARK: Notations for bounded and unbounded intervals of rational numbers. 
General bounded sub-intervals of Q may be denoted as in Notation 15.1.13. General unbounded sub-intervals 
of Q may be denoted as in Notation 15.1.14. 


15.1.13 NOTATION: 

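$\mathbb{Q}[a, b]$ denotes the set $\{q \in \mathbb{Q};\ a \le q \le b\}$ for $a, b \in \mathbb{Q}$.
$\mathbb{Q}[a, b)$ denotes the set $\{q \in \mathbb{Q};\ a \le q < b\}$ for $a, b \in \mathbb{Q}$.
$\mathbb{Q}(a, b]$ denotes the set $\{q \in \mathbb{Q};\ a < q \le b\}$ for $a, b \in \mathbb{Q}$.
$\mathbb{Q}(a, b)$ denotes the set $\{q \in \mathbb{Q};\ a < q < b\}$ for $a, b \in \mathbb{Q}$.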


15.1.14 NOTATION: 

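$\mathbb{Q}(-\infty, b]$ denotes the set $\{q \in \mathbb{Q};\ q \le b\}$ for $b \in \mathbb{Q}$.
$\mathbb{Q}(-\infty, b)$ denotes the set $\{q \in \mathbb{Q};\ q < b\}$ for $b \in \mathbb{Q}$.
$\mathbb{Q}[a, \infty)$ denotes the set $\{q \in \mathbb{Q};\ a \le q\}$ for $a \in \mathbb{Q}$.
$\mathbb{Q}(a, \infty)$ denotes the set $\{q \in \mathbb{Q};\ a < q\}$ for $a \in \mathbb{Q}$.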


15.1.15 REMARK: Order, addition and multiplication for the rational numbers. 

The definitions and properties of order, addition and multiplication for the rational numbers, based on the 
corresponding definitions for the integers, are well known. They do not need to be repeated here in full. 
Theorem 15.1.16, which is related to the Archimedean property of the rational numbers, is used in the proof 
of Theorem 15.6.15 (i). 


15.1.16 THEOREM: Archimedean property for the rational numbers. 
$\forall q \in \mathbb{Q}^+,\ \exists i \in \mathbb{Z},\ q < 2^i$.


PROOF: Let $q \in \mathbb{Q}^+$. Then $q = [(n, d)]$ for some $n, d \in \mathbb{Z}^+$. By Theorem 13.6.15, $n < 2^n$. Therefore
$n < 2^n d$ because $d \ge 1$. So $q < 2^n$ by Definition 15.1.10.
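
The bound used in this proof is easy to test numerically. The sketch below assumes the strict form $q < 2^n$ of the bound, where $n$ is the numerator of any representative with positive numerator and denominator. The function name `bound_exponent` is hypothetical, and `fractions.Fraction` from the Python standard library is used only to compare rational values exactly.

```python
from fractions import Fraction

def bound_exponent(n, d):
    """For q = n/d with n, d positive integers, return i = n, which satisfies q < 2**i."""
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive integers")
    # n < 2**n and d >= 1 give n < 2**n * d, which is the pair form of q < 2**n.
    assert n < 2**n * d
    return n

i = bound_exponent(7, 3)
holds = Fraction(7, 3) < 2**i
```

The exponent obtained this way is far from optimal. It depends on the chosen representative, which is all that the existence statement requires.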


15.1.17 REMARK: Uneven distribution (alleged) of rational numbers on the real line. 

One generally thinks of the rational numbers as being evenly distributed on the real line, possibly because the 
set Q is invariant under all rational translations. However, Figure 15.1.3 gives some idea of the unevenness 
of the distribution of rational numbers with bounded denominator. 


[Figure 15.1.3: Uneven distribution of rational numbers p/q with denominator q < 25]


The unevenness of distributions of rational numbers with bounded denominator affects the rate of conver- 
gence of rational approximations to real numbers. In particular, convergence is much slower in the case of 
transcendental real numbers. (See for example EDM2 [113], pages 1630-1631.) 
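
The unevenness can be quantified by listing the rationals with bounded denominator in the unit interval and comparing the gaps between neighbours. The following is a minimal sketch, assuming Python's `fractions` module. The function names and the bound 9 are hypothetical choices made for illustration, and are not the bound used in Figure 15.1.3.

```python
from fractions import Fraction

def bounded_denominator_rationals(max_den: int) -> list[Fraction]:
    """All rationals p/q in [0, 1] with 1 <= q <= max_den, sorted, without duplicates."""
    values = {Fraction(p, q) for q in range(1, max_den + 1) for p in range(q + 1)}
    return sorted(values)

def gap_extremes(max_den: int) -> tuple[Fraction, Fraction]:
    """Smallest and largest gap between neighbouring points of the list."""
    pts = bounded_denominator_rationals(max_den)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    return min(gaps), max(gaps)

smallest, largest = gap_extremes(9)
print(smallest, largest)
```

For the bound 9, the largest gap is 1/9 (next to 0 and next to 1) and the smallest is 1/72 (between 1/9 and 1/8), so the gaps differ by a factor of 8 within a single unit interval.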
15.1.18 REMARK: The unevenness of the rational number distribution is illusory.

The apparent unevenness of the rational number distribution is to a great extent illusory. There is no obvious 
reason why an upper bound on the denominator should be the criterion for truncating the infinite set of 
rational numbers to a finite number in each bounded interval. For example, such a truncation is not closed 
under addition or multiplication, whereas the set of rational numbers with a specified denominator is at least 
closed under addition. For example, if one restricts rational numbers to those with denominator $q = 2^n$ for
some $n \in \mathbb{Z}^+_0$, this locally finite set is closed under addition for each $n$, and the union of such locally finite
sets over all $n \in \mathbb{Z}^+_0$ is closed under multiplication and is dense in the set of real numbers.


To obtain the set of all rational numbers as the union of locally finite subsets, one could consider the sets 
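$\mathbb{Q}_n = \{p/n!;\ p \in \mathbb{Z}\}$ for $n \in \mathbb{Z}^+$. Then $\mathbb{Q}_2 \cap [0, 1) = \{0, 1/2\}$, $\mathbb{Q}_3 \cap [0, 1) = \{0, 1/6, 1/3, 1/2, 2/3, 5/6\}$,
and so forth. (In other words, each element of $\mathbb{Q}_n$ can be written with denominator $n!$.) Clearly each set $\mathbb{Q}_n$ is
closed under addition, and $\bigcup_{n \in \mathbb{Z}^+} \mathbb{Q}_n = \mathbb{Q}$. Each stage of this sequence has an even distribution.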
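
The stages described above can be listed explicitly. The following is a minimal sketch, assuming that stage n consists of the integer multiples of 1/n!, as the remark about denominators indicates. The function names `q_stage` and `first_stage_containing` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def q_stage(n: int) -> list[Fraction]:
    """Elements of stage n (integer multiples of 1/n!) lying in [0, 1)."""
    den = factorial(n)
    return [Fraction(i, den) for i in range(den)]

def first_stage_containing(q: Fraction) -> int:
    """Smallest n such that q is an integer multiple of 1/n!."""
    n = 1
    while factorial(n) % q.denominator != 0:
        n += 1
    return n

assert q_stage(2) == [Fraction(0), Fraction(1, 2)]
assert q_stage(3) == [Fraction(k, 6) for k in range(6)]
# Every gap within one stage equals 1/n!, so each stage is evenly spaced.
stage = q_stage(4)
assert {b - a for a, b in zip(stage, stage[1:])} == {Fraction(1, 24)}
```

Since every positive integer q divides q!, the loop in `first_stage_containing` always terminates, which reflects the fact that the union of all stages is the whole set of rational numbers.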


15.2. Cardinality and enumerations of the rational numbers 


15.2.1 REMARK: Applications of enumerations of the rational numbers. 

The cardinality of the rational numbers is established by demonstrating the existence of a bijection from a
“standard yardstick set”, which is $\omega$ in this case, to the set $\mathbb{Q}$. However, such a bijection or “enumeration”
of the rational numbers is useful for other purposes also. Of particular importance is the construction of
bijections from $\omega$, or some element of $\omega$, to the set of connected components of a general open set of real
numbers. (See Theorem 32.7.8.) The fact that enumerations of the rational numbers can be constructed, 
not just pulled out of an axiom-of-choice hat, has the advantage that connected components of open sets of 
real numbers can be dealt with in a concrete manner. This is particularly so when proving the Lebesgue 
differentiation theorem. (See Section 45.7.) 


15.2.2 REMARK: A bijection from Q to the rational unit interval Q(0, 1). 
Figure 15.2.1 illustrates the bijection $f : \mathbb{Q} \to \mathbb{Q}(0, 1)$ defined by $f : x \mapsto (1 + x/(1 + |x|))/2$, where
$\mathbb{Q}(0, 1) = \{q \in \mathbb{Q};\ 0 < q < 1\}$ is an open rational unit interval.


[Figure 15.2.1: The map $x \mapsto (1 + x/(1 + |x|))/2$ from $\mathbb{Q}$ to $\mathbb{Q}(0, 1)$, with $x \in \mathbb{Q}$ on a horizontal axis from $-5$ to $5$]


It is easily verified that this function maps rational numbers to rational numbers. 


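$$\forall n \in \mathbb{Z},\ \forall d \in \mathbb{Z}^+,\quad f(n/d) = \begin{cases} \dfrac{d + 2n}{2d + 2n} & \text{if } n \in \mathbb{Z}^+_0 \\[1ex] \dfrac{d}{2d - 2n} & \text{if } n \in \mathbb{Z}^-_0. \end{cases}$$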

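To show that this map is a bijection, it is sufficient to show that $f^{-1} : \mathbb{Q}(0, 1) \to \mathbb{Q}$ is a well-defined function.
The inverse map may be written as:


$$\forall y \in \mathbb{Q}(0, 1),\quad f^{-1}(y) = \begin{cases} \dfrac{2y - 1}{2 - 2y} & \text{if } y \ge \frac12 \\[1ex] \dfrac{2y - 1}{2y} & \text{if } y \le \frac12. \end{cases}$$

The fact that rationals are mapped to rationals follows from the following formula.

$$\forall d \in \mathbb{Z}^+,\ \forall n \in \mathbb{Z}(0, d),\quad f^{-1}(n/d) = \begin{cases} \dfrac{2n - d}{2d - 2n} & \text{if } 2n \ge d \\[1ex] \dfrac{2n - d}{2n} & \text{if } 2n \le d, \end{cases}$$

where $\mathbb{Z}(0, d)$ denotes the integer interval $\{i \in \mathbb{Z};\ 0 < i < d\}$. It follows that the cardinality of $\mathbb{Q}$ is the same
as the cardinality of the unit rational interval $\mathbb{Q}(0, 1)$.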
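
The map f and its inverse can be exercised with exact rational arithmetic. The following is a minimal sketch, assuming Python's `fractions` module. The names `f` and `f_inv` mirror the symbols in the text, and the finite grid used for the round-trip check is an arbitrary illustrative choice.

```python
from fractions import Fraction

def f(x: Fraction) -> Fraction:
    """Map a rational x to (1 + x/(1 + |x|))/2, which lies strictly between 0 and 1."""
    return (1 + x / (1 + abs(x))) / 2

def f_inv(y: Fraction) -> Fraction:
    """Inverse map, using the two-case formula for y >= 1/2 and y <= 1/2."""
    if not 0 < y < 1:
        raise ValueError("y must lie strictly between 0 and 1")
    if 2 * y >= 1:
        return (2 * y - 1) / (2 - 2 * y)
    return (2 * y - 1) / (2 * y)

# Round-trip check on a finite grid of rationals n/d.
grid = [Fraction(n, d) for n in range(-20, 21) for d in range(1, 21)]
assert all(0 < f(x) < 1 and f_inv(f(x)) == x for x in grid)
```

Because `Fraction` arithmetic is exact, the equality test in the round-trip check is a genuine equality of rational numbers, not a floating-point comparison.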


15.2.3 THEOREM: Well-definition of bijection from Q to the rational unit interval Q(0, 1). 
Define a map $g : \mathbb{Q}(0, 1) \to \mathbb{Q}$ by

$$\forall d \in \mathbb{Z}^+,\ \forall n \in \mathbb{Z}(0, d),\quad g([(n, d)]) = [(2n - d, \min(2n, 2d - 2n))]. \tag{15.2.1}$$

(i) g is a well-defined function from Q(0, 1) to Q. 

(ii) g is an injection. 
(iii) g is a bijection. 
PROOF: For part (i), let $q \in \mathbb{Q}(0, 1)$. By Definition 15.1.5, $q = [(n, d)]$ for some $n \in \mathbb{Z}$ and $d \in \mathbb{Z} \setminus \{0\}$.
Then $q \in \mathbb{Q}(0, 1)$ if and only if either $d > 0$ and $0 < n < d$, or else $d < 0$ and $d < n < 0$. Suppose
that $d > 0$. Then $g(q) = [(2n - d, \min(2n, 2d - 2n))]$ by line (15.2.1). Clearly $\min(2n, 2d - 2n) \in \mathbb{Z}^+$. So
$g(q)$ is a well-defined element of $\mathbb{Q}$. Suppose that $[(n_1, d_1)] = [(n, d)]$ for some $(n_1, d_1) \in \mathbb{Z} \times \mathbb{Z}^+$. Then
$g([(n_1, d_1)]) = [(2n_1 - d_1, \min(2n_1, 2d_1 - 2n_1))]$. This equals $[(2n - d, \min(2n, 2d - 2n))]$ because $n d_1 = n_1 d$
implies $(2n - d) \min(2n_1, 2d_1 - 2n_1) = (2n_1 - d_1) \min(2n, 2d - 2n)$. (See Definition 15.1.3.) Therefore the
value of $g(q)$ is a rational number which is independent of the representation of $q$.


For part (ii), let $q_1, q_2 \in \mathbb{Q}(0, 1)$ with $g(q_1) = g(q_2)$. Let $q_1 = [(n_1, d_1)]$ and $q_2 = [(n_2, d_2)]$ with denominators
$d_1 > 0$ and $d_2 > 0$. Then $[(2n_1 - d_1, \min(2n_1, 2d_1 - 2n_1))] = [(2n_2 - d_2, \min(2n_2, 2d_2 - 2n_2))]$. Therefore
$(2n_1 - d_1) \min(2n_2, 2d_2 - 2n_2) = (2n_2 - d_2) \min(2n_1, 2d_1 - 2n_1)$.
Suppose that $2n_1 \le d_1$ and $2n_2 \ge d_2$. Then $\min(2n_1, 2d_1 - 2n_1) = 2n_1$ and $\min(2n_2, 2d_2 - 2n_2) = 2d_2 - 2n_2$.
So $(2n_1 - d_1)(2d_2 - 2n_2) = (2n_2 - d_2)(2n_1)$. So $2(2n_1 - d_1)(2n_2 - d_2) = d_2(2n_1 - d_1) - d_1(2n_2 - d_2)$.
But $2(2n_1 - d_1)(2n_2 - d_2) \le 0$ and $d_2(2n_1 - d_1) - d_1(2n_2 - d_2) \le 0$. If $2n_1 - d_1 < 0$ and $2n_2 - d_2 > 0$, then
$|2n_1 - d_1| < d_1$ and $2n_2 - d_2 < d_2$ (because $0 < n_1 < d_1$ and $0 < n_2 < d_2$) would give
$2(2n_1 - d_1)(2n_2 - d_2) > d_2(2n_1 - d_1) - d_1(2n_2 - d_2)$, which contradicts the equality. So $(2n_1 - d_1)(2n_2 - d_2) = 0$,
and hence $d_2(2n_1 - d_1) - d_1(2n_2 - d_2) = 0$.
Therefore $2n_1 - d_1 = 0$ and $2n_2 - d_2 = 0$ because $d_1 > 0$, $d_2 > 0$, $2n_1 - d_1 \le 0$ and $2n_2 - d_2 \ge 0$. So
$q_1 = [(n_1, 2n_1)] = [(1, 2)] = [(n_2, 2n_2)] = q_2$. (In other words, $q_1 = q_2 = 1/2$.) Similarly, if $2n_1 \ge d_1$ and
$2n_2 \le d_2$, then $q_1 = q_2 = [(1, 2)]$.
Now suppose that $2n_1 \le d_1$ and $2n_2 \le d_2$. Then $\min(2n_1, 2d_1 - 2n_1) = 2n_1$ and $\min(2n_2, 2d_2 - 2n_2) = 2n_2$.
So $(2n_1 - d_1)(2n_2) = (2n_2 - d_2)(2n_1)$. So $2n_1 d_2 = 2n_2 d_1$. So $q_1 = q_2$ by Definition 15.1.5. Suppose that
$2n_1 \ge d_1$ and $2n_2 \ge d_2$. Then $\min(2n_1, 2d_1 - 2n_1) = 2d_1 - 2n_1$ and $\min(2n_2, 2d_2 - 2n_2) = 2d_2 - 2n_2$. So
$(2n_1 - d_1)(2d_2 - 2n_2) = (2n_2 - d_2)(2d_1 - 2n_1)$. Thus $2n_1 d_2 = 2n_2 d_1$. So $q_1 = q_2$. Hence $g$ is an injection.
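For part (iii), define $f : \mathbb{Q} \to \mathbb{Q}$ by

$$\forall d \in \mathbb{Z}^+,\ \forall n \in \mathbb{Z},\quad f([(n, d)]) = \begin{cases} [(d + 2n, 2d + 2n)] & \text{if } n \ge 0 \\ [(d, 2d - 2n)] & \text{if } n \le 0. \end{cases}$$

Then $f : \mathbb{Q} \to \mathbb{Q}$ is a well-defined function because $f([(n, d)])$ is independent of the choice of the pair $(n, d)$
to represent $[(n, d)]$, and $\mathrm{Range}(f) \subseteq \mathbb{Q}(0, 1)$ because $0 < d + 2n < 2d + 2n$ when $d > 0$ and $n \ge 0$, and
$0 < d < 2d - 2n$ when $d > 0$ and $n \le 0$. Let $q = [(n, d)] \in \mathbb{Q}$. Then $g(f(q)) = q$. So $g : \mathbb{Q}(0, 1) \to \mathbb{Q}$ is
surjective. Hence $g$ is a bijection by part (ii).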
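
The three parts of this proof can be spot-checked on integer pairs. The following is a minimal sketch, assuming Python's `fractions` module. The function names `g_pair` and `f_pair` are hypothetical. They operate on representative pairs (n, d) rather than on equivalence classes, and the ranges tested are arbitrary finite samples.

```python
from fractions import Fraction

def g_pair(n: int, d: int) -> Fraction:
    """g([(n, d)]) as in line (15.2.1), for d > 0 and 0 < n < d."""
    if not (d > 0 and 0 < n < d):
        raise ValueError("need d > 0 and 0 < n < d")
    return Fraction(2 * n - d, min(2 * n, 2 * d - 2 * n))

def f_pair(n: int, d: int) -> Fraction:
    """f([(n, d)]) as in part (iii) of the proof, for d > 0."""
    if d <= 0:
        raise ValueError("need d > 0")
    if n >= 0:
        return Fraction(d + 2 * n, 2 * d + 2 * n)
    return Fraction(d, 2 * d - 2 * n)

# Part (i): the value does not depend on the representative pair.
assert g_pair(1, 3) == g_pair(2, 6) == g_pair(5, 15)

# Part (ii): distinct inputs give distinct images (finite sample).
inputs = {Fraction(n, d) for d in range(2, 31) for n in range(1, d)}
images = {g_pair(q.numerator, q.denominator) for q in inputs}
assert len(images) == len(inputs)

# Part (iii): g(f(q)) = q (finite sample).
for n in range(-15, 16):
    for d in range(1, 16):
        y = f_pair(n, d)
        assert g_pair(y.numerator, y.denominator) == Fraction(n, d)
```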


15.2.4 REMARK: The cardinality of the rational numbers. 

It is straightforward to construct an explicit bijection from $\omega = \mathbb{Z}^+_0$ to $\mathbb{Q}(0, 1)$ by countable induction.
Every element of $\mathbb{Q}(0, 1)$ may be written as $p/q$ for some $p, q \in \mathbb{Z}^+$ with $p < q$. The pairs $(p, q)$ may be
right-to-left lexicographically ordered so that the lowest few pairs are $(1,2)$, $(1,3)$, $(2,3)$, $(1,4)$, $(2,4)$, $(3,4)$,
and so forth. Equivalent ratios may be eliminated inductively. Let $Y = \{(p, q) \in \mathbb{Z}^+ \times \mathbb{Z}^+;\ p < q\}$ with
the order $(p_1, q_1) < (p_2, q_2)$ whenever $q_1 < q_2$ or else $q_1 = q_2$ and $p_1 < p_2$. Define $\phi : Y \to \mathbb{Q}(0, 1)$ by
$\phi : (p, q) \mapsto [(p, q)] = p/q$. (In other words, $\phi$ converts each pair $(p, q)$ to the corresponding equivalence class
$[(p, q)]$ in $\mathbb{Q}$.) Define $g : \omega \to Y$ inductively by


$$\begin{aligned} \forall n \in \omega,\quad g(n) &= \min \{y \in Y;\ \forall i \in \omega,\ (i < n \Rightarrow \phi(y) \ne \phi(g(i)))\} \\ &= \min \{y \in Y;\ \phi(y) \notin \{\phi(g(i));\ i \in n\}\} \\ &= \min \big(Y \setminus \phi^{-1}(\{\phi(g(i));\ i < n\})\big). \end{aligned}$$

This generates the sequence $g(0) = (1,2)$, $g(1) = (1,3)$, $g(2) = (2,3)$, $g(3) = (1,4)$, $g(4) = (3,4)$, and so
forth. Define $h : \omega \to \mathbb{Q}$ by $h(n) = f^{-1}(\phi(g(n)))$ for all $n \in \omega$, where $f : \mathbb{Q} \to \mathbb{Q}(0, 1)$ and $f^{-1} : \mathbb{Q}(0, 1) \to \mathbb{Q}$
are as in Remark 15.2.2. Then $h$ is an explicit bijection. Hence $\mathbb{Q}$ has the same cardinality as $\omega$. In other
words, $\mathbb{Q}$ is a countably infinite set. The first few elements of the sequence are as follows.
  n    g(n)    h(n)
  0    1/2      0
  1    1/3     -1/2
  2    2/3      1/2
  3    1/4     -1
  4    3/4      1
  5    1/5     -3/2
  6    2/5     -1/4
  7    3/5      1/4
  8    4/5      3/2


This sequence simultaneously increases the density and breadth of its range at each stage, where “stage” 
here means the denominator of g(n). This enumeration of all rational numbers is illustrated in Figure 15.2.2. 
(Clearly a much more “even” enumeration could easily be designed.) 


[Figure 15.2.2: An explicit enumeration of all rational numbers, plotted against an axis from $-6$ to $6$]


The explicitness of this sequence has advantages for applications in real-number topology and measure theory.
For example, given any open subset $\Omega$ of the real numbers, one may construct a fairly explicit map from $\omega$
to the set of connected components of $\Omega$ by associating each such component with the first $n \in \omega$ such that
$h(n)$ is an element of that component.
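
The enumeration h, and the way it labels connected components, can be sketched in code. The following is a minimal illustration, assuming Python's `fractions` and `itertools` modules. The function names `h` and `first_index_in`, the search limit, and the three sample intervals are hypothetical choices. Open intervals with rational endpoints stand in for the connected components of an open set.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def f_inv(y: Fraction) -> Fraction:
    """Inverse of x -> (1 + x/(1 + |x|))/2 on the open unit interval."""
    if 2 * y >= 1:
        return (2 * y - 1) / (2 - 2 * y)
    return (2 * y - 1) / (2 * y)

def h():
    """Yield h(0), h(1), ...: pairs (p, q) ordered by q then p, with repeated ratios skipped."""
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:  # a non-reduced pair repeats an earlier ratio
                yield f_inv(Fraction(p, q))

def first_index_in(a: Fraction, b: Fraction, limit: int = 100_000) -> int:
    """Smallest n < limit with a < h(n) < b."""
    for n, value in enumerate(islice(h(), limit)):
        if a < value < b:
            return n
    raise LookupError("no element found within the search limit")

first_values = list(islice(h(), 9))
components = [(Fraction(-2), Fraction(-1)), (Fraction(0), Fraction(1)), (Fraction(5, 4), Fraction(2))]
labels = [first_index_in(a, b) for a, b in components]
```

Disjoint intervals necessarily receive distinct labels, because each label is the index of a rational number lying in exactly one of them.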


15.2.5 THEOREM: Countability of the set of rational numbers. 
Q is a countably infinite set. 


PROOF: Define $h : \omega \to \mathbb{Q}$ as in Remark 15.2.4. Then $h$ is a bijection. Therefore $\mathbb{Q}$ is a countably infinite
set by Definition 13.7.6. 


15.2.6 THEOREM: Countability of finite Cartesian products of the set of rational numbers. 
$\mathbb{Q}^n$ is a countably infinite set for all $n \in \mathbb{Z}^+$.


PROOF: The assertion follows from Theorems 15.2.5 and 13.9.7. 


15.2.7 REMARK: Alternative enumerations of the rational numbers. 

Explicit enumerations of the rational numbers which are more even than the enumeration in Remark 15.2.4 
can be obtained by building successive stages with specified breadth and depth. For example, in stage $k \in \mathbb{Z}^+$,
one could include all rational numbers of the form $i/j$ with $j = k!$ and $i \in \mathbb{Z}$ with $-j^2 \le i \le j^2$. Thus
stage 1 would include the numbers $-1$, $0$ and $1$, stage 2 would include the numbers $-2$, $-3/2$, $-1$, $-1/2$, $0$, $1/2$,
$1$, $3/2$ and $2$, and so forth. A bijection from $\omega$ to all elements of all stages may be constructed by countable
induction, appending each stage to the cumulative enumeration of previous stages and skipping over elements 
which have already appeared. This kind of procedure can be modified to give more rapid coverage of the 
breadth of rational numbers, or to more rapidly fill in the gaps to give greater depth. For pure mathematics, 
however, such details are regarded as immaterial. 
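
The staged procedure described above can be sketched as a generator. This is a minimal illustration, assuming that stage k consists of the numbers i/k! with i ranging over the integers from -(k!)^2 to (k!)^2 inclusive, and that elements within a stage are appended in increasing order. The function name `staged_enumeration` is hypothetical.

```python
from fractions import Fraction
from itertools import islice
from math import factorial

def staged_enumeration():
    """Yield each rational number exactly once, stage by stage."""
    seen = set()
    k = 1
    while True:
        j = factorial(k)
        for i in range(-j * j, j * j + 1):
            q = Fraction(i, j)
            if q not in seen:  # skip elements of earlier stages
                seen.add(q)
                yield q
        k += 1

first_nine = list(islice(staged_enumeration(), 9))
```

Stage 1 contributes -1, 0 and 1. Stage 2 then contributes the six new values -2, -3/2, -1/2, 1/2, 3/2 and 2, because -1, 0 and 1 have already appeared.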
15.3. Real numbers


15.3.1 REMARK: Computers do not implement real numbers. 

The “floating-point numbers” implemented in digital computer hardware represent rational numbers, not 
real numbers, although the term “real numbers” is often used incorrectly for computer numbers. Rational 
numbers are often used in practice to represent intervals of real numbers. For example, a decimal floating- 
point number like 3.141592654 is understood to indicate an interval such as [3.1415926535, 3.1415926545]. 
Whether computer hardware number implementations are interpreted as rational numbers or intervals of real 
numbers, their operations are generally only approximate at best. The purpose of mathematical definitions 
of real numbers is to remove the ambiguity and imprecision of the informal every-day usage. The idea 
of number intervals as an interpretation for the common usage of numbers provides a natural basis for a 
rigorous definition of real numbers. 
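
The claim that floating-point numbers are rational numbers can be observed directly. The following is a minimal sketch, assuming a Python implementation whose `float` type is an IEEE 754 binary double, which is the usual case. It uses only the standard `fractions` module and the built-in `float.as_integer_ratio` method.

```python
from fractions import Fraction

x = 0.1
exact = Fraction(x)          # the exact rational value stored for the literal 0.1
n, d = x.as_integer_ratio()  # the same value as a pair of integers

assert exact == Fraction(n, d)
assert d & (d - 1) == 0          # the denominator is a power of two
assert exact != Fraction(1, 10)  # so the stored value is not exactly 1/10
assert 0.1 + 0.2 != 0.3          # operations are only approximate
```

The stored value is a rational number with a power-of-two denominator which lies close to 1/10. This is consistent with reading a floating-point number as a label for a small interval of real numbers.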


15.3.2 REMARK: The relation of real numbers to the real world. 

The real numbers sometimes seem to have some surprising properties, and one may wonder whether they 
are an accurate model for the universe. However, one must always remember that mathematical systems 
only give conclusions if the assumptions are true. In the case of real numbers, it is highly unlikely that any 
physical system would satisfy the axioms. It is dubious that even the rational numbers could be a correct 
model for any physical system. Certainly measurements in physics do not provide unbounded significant 
figures of accuracy. All physical measurements have a limited resolution. 


Although physical phenomena (the appearances of things) do not offer unbounded resolution, physical 
noumena (the underlying true natures of things) could possibly offer unbounded or infinite resolution in 
some systems. But there is no way of knowing the true natures of things. When real numbers are used to 
model noumena, one can only say that the finite-resolution approximations which are provided by phenom- 
ena seem to match very well with many of the models in physics. If the underlying things behind phenomena 
are not in fact modelled correctly by the real numbers, this does not really matter. Physics never says what 
a thing is, only how it appears. 


The real number system only tells us the properties of a modelled system if it precisely matches the speci- 
fication. So if the properties of real numbers seem a little odd, and do not match exactly our experience of 
the phenomena of the universe, it doesn’t matter because we will never be able to check. Therefore the real 
number system should not be taken too seriously. It’s only a model. 


It is only our measurements and observations of the physical world which use the real number system, not 
the physical world itself. A falling object does not have any kind of internal state variable which is a real 
number corresponding to its velocity or inertia or temperature or mass. Numbers exist in human minds after 
observations have been made. (A similar comment is made in Remark 53.1.7 regarding Cartesian charts for 
the physical world.) This is very roughly illustrated in Figure 15.3.1. 


[Figure 15.3.1: Real numbers are found in models of observations, not in the real world]


Something similar is stated by Lawden [284], page 6, as follows. 
Suppose that at some instant t, a physical system is allowed to interact with certain observing
instruments. The information so obtained is said to refer to the state of the system at the instant t. 
Such interactions with instruments usually result in values being assigned to certain physical 
quantities associated with the system, which the instruments are said to measure. Thus, standard 
instrumental procedures for measuring the linear momentum, angular momentum, energy and 
position of a particle have been devised by physicists and, if these procedures are followed through 
in respect of a particular particle, the resulting numbers are, by definition, the values taken by 
these quantities for the particle in its state at the instant of observation. It is to be emphasised that 
the meaning to be assigned to a physical quantity is entirely contained in the procedure commonly 
employed to determine its value in any physical situation. 


Since the results of physical measurements reside in the instruments, and the measurements made with those 
instruments are interpreted by human beings, it is ultimately in human minds that the real numbers and 
other mathematical concepts reside, not in the systems under observation. As mentioned in Remark 2.2.5, 
although mathematics may seem to be located in the observed universe, all mathematics is ultimately located 
in human minds. Hence the many conundrums, paradoxes and peculiarities of the real numbers in particular 
do not necessarily imply that the physical universe outside the human mind has the same issues. 


15.3.3 REMARK: The inaccessible infinitude of the so-called “real” numbers. 

The difficulties encountered when attempting to provide a rigorous logical basis for the real numbers should, 
in retrospect, not be surprising. Humans (and computers) are finite, whereas real numbers are even more 
infinite than the integers. The difficulties arise from the ultimate impossibility of representing an infinite 
amount of information in a finite-state system. So the attempt is doomed to ultimate failure. The best that 
can be realistically claimed is that real numbers are some kind of extrapolation of our finite approximations 
into some kind of inaccessible metaphysical number-universe which we can never directly perceive. Axioma- 
tising such a metaphysical universe, and convincing ourselves that the axiomatisation is logically consistent, 
does not make that universe real. The more we rely upon abstract axiomatisation, the more we are tacitly 
admitting that we are dealing with something which is beyond our direct intellectual grasp. Ironically, the 
word “real” in the term “real numbers” is arguably the opposite of the truth. 


The unreality of the “real numbers” may provide at least some solace when one trudges through the tedious 
mire of the Dedekind cut representation for the real numbers in Sections 15.4-15.8. All logical formalisations 
of the real numbers seem to be exceedingly tedious and unnatural, bearing little relation to our relatively 
simple intuitive notions. 


15.3.4 REMARK: Alternative representations of the real numbers. 

Representations of the real numbers include the following. (See Definition 16.5.11 for the floor of a real 
number, Definition 16.5.16 for the fractional part, Definition 16.5.4 for the sign, and Definition 16.5.2 for 
the absolute value.) 


(1) $\mathbb{Z} \times U$, where $U = \{x \in 2^\omega;\ \#\{i \in \omega;\ x(i) = 0\} = \infty\}$. The first component $\mathbb{Z}$ represents the floor of
the real number. The second component $U$ represents the fractional part of the number. There is no
need to apply an equivalence class to this representation. (However, the set of signed integers $\mathbb{Z}$ may
be represented as equivalence classes of pairs of unsigned integers.) In this representation, a pair $(k, x)$
represents $k + \sum_{i=0}^\infty x(i)\, 2^{-i-1}$. (A version of this representation with base 10 instead of 2 is outlined
by Spivak [140], pages 505-506.)

(2) $2 \times \omega \times U$, where $U = \{x \in 2^\omega;\ \#\{i \in \omega;\ x(i) = 0\} = \infty\}$. The first component $2 = \{0, 1\}$ is the
“sign bit” of the real number. The second component $\omega$ represents the floor of the absolute value of
the number. The third component $U$ represents the fractional part of the absolute value of the number.
This representation has the disadvantage that there are two representatives for the number zero, but
otherwise all real numbers have a unique representative. (Any representation which uses a “sign bit”
will have two representations for $0_\mathbb{R}$.) A triple $(s, k, x)$ represents $(-1)^s (k + \sum_{i=0}^\infty x(i)\, 2^{-i-1})$.

(3) $2 \times V \times U$ where $V = \{y \in 2^\omega;\ \#\{i \in \omega;\ y(i) = 1\} < \infty\}$ and $U = \{x \in 2^\omega;\ \#\{i \in \omega;\ x(i) = 0\} = \infty\}$.
This is the same as option (1) except that the integer part of each real number is replaced by a “sign
bit” followed by the absolute value of the integer part. (This has the same very minor non-uniqueness
as style (2).) A triple $(s, y, x)$ represents $(-1)^s (\sum_{i=0}^\infty y(i)\, 2^i + \sum_{i=0}^\infty x(i)\, 2^{-i-1})$.

(4) $2 \times \{z \in 2^\mathbb{Z};\ \#\{i \in \mathbb{Z}^+_0;\ z(i) = 1\} < \infty \text{ and } \#\{i \in \mathbb{Z}^-;\ z(i) = 0\} = \infty\}$. This has a sign bit followed
by the integer and fractional parts of the absolute value of the real number combined into a single
map $z : \mathbb{Z} \to \{0, 1\}$. A pair $(s, z)$ is mapped to $(-1)^s \sum_{i=-\infty}^\infty z(i)\, 2^i$. As in styles (2) and (3), the only
redundancy here is the double representation of the number $0_\mathbb{R}$.

(5) $2 \times \mathbb{Z} \times U$ where $U = \{x \in 2^\omega;\ \#\{i \in \omega;\ x(i) = 0\} = \infty\}$. (Floating point representation.) The
first component represents the sign of the real number. The second component represents the binary
“left shift” of the absolute value of the number (before the sign bit is applied). The third component
represents the full binary expansion of the number, including both the integer and fractional parts.
A triple $(s, k, x)$ represents $(-1)^s \sum_{i=0}^\infty x(i)\, 2^{k-i-1}$. The redundancy here is quite substantial because
the sequence $x$ may commence with one or more zeros. (There are some untidy ways to reduce this
redundancy which are used in practical floating-point computer representations, but they are outside
the scope of this book.)


(6) Dedekind cuts. (This representation has a unique representative for each real number, but each represen- 
tative is a countably infinite set of rational numbers, which are themselves countably infinite equivalence 
classes.) Each semi-infinite interval I of Q represents sup I. 


(7) Cantor representation: Equivalence classes of Cauchy sequences. (The representative of each real number
is an equivalence class of sequences of rational numbers. Each such equivalence class has the cardinality
of the power set of the set of integers.) Each equivalence class $[x]$ of sequences $x \in \mathbb{Q}^\omega$ represents
$\lim_{i \to \infty} x(i)$. (This representation is outlined by Spivak [140], page 505.)

(8) Axiomatic system: Complete ordered field. (See Section 18.9.) 


There are apparently no real-number representations which combine all desirable characteristics. Many 
virtues must be traded off against each other. Any ad-hoc removal of redundancy by “normalisation” of the 
representation leads to the need to “renormalise” after every operation. For example, one generally replaces 
a sign bit $s$ with $1 - s$ when negating a number, but this must be reset to zero if the result is “$-0$”. Similarly,
negation applied to the unit interval U requires renormalisation afterwards. 


Styles (6) and (7) are the most important in pure mathematics, but are useless for computer implementation. 
Their redundancy is enormous, but they have no ad-hoc normalisation requirements.
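
Style (1) can be illustrated for rational inputs, where the binary digits are computable exactly. The following is a minimal sketch, assuming Python's `fractions` module. The function names `binary_digits` and `partial_value` are hypothetical. Only finitely many digits x(0), x(1), ... are produced, so the result is a truncation of the representation and not the infinite sequence itself.

```python
from fractions import Fraction
from math import floor

def binary_digits(q: Fraction, digit_count: int) -> tuple[int, list[int]]:
    """Return (k, [x(0), ..., x(digit_count - 1)]) where k is the floor of q
    and the x(i) are the leading binary digits of the fractional part."""
    k = floor(q)
    frac = q - k  # fractional part, 0 <= frac < 1
    digits = []
    for _ in range(digit_count):
        frac *= 2
        bit = 1 if frac >= 1 else 0
        digits.append(bit)
        frac -= bit
    return k, digits

def partial_value(k: int, digits: list[int]) -> Fraction:
    """Evaluate k plus the sum of x(i) * 2**(-i-1) over the given digits."""
    return k + sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(digits))

k, digits = binary_digits(Fraction(-5, 4), 4)
```

For the input -5/4 the floor is -2 and the fractional part is 3/4, so no sign bit is needed and no renormalisation arises. The digit rule used here never produces a tail consisting entirely of ones, which matches the requirement that each sequence in U has infinitely many zeros.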


15.3.5 REMARK: The decimal representation for the real numbers. 

The most popular practical real-world representation of real numbers is a sequence of decimal digits prefixed
with a sign and infixed with a decimal point. This is effectively equivalent to style (5) in Remark 15.3.4 with 10
as the base instead of 2, and with a signed “decimal point shift” value. This is equivalent to a representation-set
$2 \times \mathbb{Z} \times \{x \in 10^\omega;\ \#\{i \in \omega;\ x(i) \ne 9\} = \infty\}$, where a triple $(s, k, x)$ represents $(-1)^s \sum_{i=0}^\infty 10^{k-i-1} x(i)$.
A large amount of the redundancy in this representation can be removed by requiring $x(0) \ne 0$, except that
to represent $0_\mathbb{R}$ one lets $x(i) = 0$ for all $i \in \omega$, with $s = 0$ and $k = 0$.


15.3.6 REMARK: The Dedekind and Cantor representations of the real numbers. 

The most popular representations for the real numbers in pure mathematics are the Dedekind representation
and the Cantor representation. The former defines real numbers as semi-infinite intervals of rational numbers 
called Dedekind cuts. The latter defines real numbers as equivalence classes of Cauchy sequences. 


The Dedekind representation exploits the total order on the rationals to “fill the gaps”. The Cantor representation
exploits the distance function on the rational numbers to “fill the gaps”. The total order and the
distance function on the rational numbers are not unrelated. However, the distance function approach is 
much more generally applicable. Any metric space may be completed (i.e. the “gaps” can be filled in) using 
the Cauchy sequence approach. On the other hand, the total order approach requires much less convoluted 
logic for its construction. 


In applied mathematics and the sciences, the most popular representations of the real numbers are as finite 
and infinite decimal or binary expansions. However, irrational numbers cannot be represented precisely by 
such expansions, since they require an infinite amount of information. 


15.3.7 DEFINITION: A Cauchy sequence of rational numbers is a sequence $x : \omega \to \mathbb{Q}$ such that


$$\forall \varepsilon \in \mathbb{Q}^+,\ \exists n \in \omega,\ \forall i \ge n,\quad |x_i - x_n| < \varepsilon.$$


15.3.8 DEFINITION: The Cantor real number system is the set of equivalence classes of all Cauchy sequences
of rational numbers, where two Cauchy sequences $x = (x_i)_{i \in \omega}$ and $y = (y_i)_{i \in \omega}$ are considered equivalent if
the combined sequence $(z_i)_{i \in \omega}$ defined by $z_{2i} = x_i$ and $z_{2i+1} = y_i$ for all $i \in \omega$ is a Cauchy sequence.
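
The interleaving test for equivalence can be explored numerically on finite windows. The following is a minimal sketch, assuming Python's `fractions` and `itertools` modules. The Newton iteration for the square root of 2, the function names, and the window parameters are all hypothetical choices for illustration. A finite window can only suggest, and never establish, that a sequence is a Cauchy sequence.

```python
from fractions import Fraction
from itertools import islice, repeat

def newton_sqrt2(start: Fraction):
    """Yield the Newton iterates x -> (x + 2/x)/2, a sequence of rational numbers."""
    x = start
    while True:
        yield x
        x = (x + 2 / x) / 2

def interleave(xs, ys):
    """Combined sequence z with z(2i) = x(i) and z(2i+1) = y(i)."""
    for x, y in zip(xs, ys):
        yield x
        yield y

def tail_spread(seq, n: int, length: int) -> Fraction:
    """Largest |z(i) - z(n)| over the finite window n <= i < n + length."""
    window = list(islice(seq, n, n + length))
    return max(abs(z - window[0]) for z in window)

# Two Newton sequences with different starting points, interleaved.
same_limit = interleave(newton_sqrt2(Fraction(1)), newton_sqrt2(Fraction(2)))
# A Newton sequence interleaved with the constant sequence 1.
different = interleave(newton_sqrt2(Fraction(1)), repeat(Fraction(1)))

small = tail_spread(same_limit, 8, 4)
large = tail_spread(different, 8, 4)
```

In the first case the spread over the window is tiny, which is what one expects if the two sequences are equivalent. In the second case the spread stays above 2/5, so the combined sequence cannot satisfy the Cauchy condition for any smaller tolerance.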
15.4. Real number representation by Dedekind cuts


15.4.1 REMARK: The advantages of the Dedekind-cut representation for the real numbers. 

Although Dedekind cuts are impractical for computer representations of the real numbers, they provide 
probably the most convenient model for the formal development of the basic properties of the real numbers. 
At the minimum, it must be shown that any model of the real numbers satisfies the conditions for a complete 
ordered field, as described in Section 18.9. All other properties follow from those conditions, which uniquely 
characterise the real number system, and the particular choice of concrete representation may then be safely 
forgotten. But first, one must show that at least one representation exists. 


One of the greatest benefits of the Dedekind-cut representation for real numbers is that it “leverages” the 
general confidence in the logical consistency of the rational numbers, which have been closely scrutinised by 
mathematicians since the times of Pythagoras and Euclid. So one does not need to base one’s confidence 
on the consistency of ZF set theory, which is a relative newcomer to mathematics. The Dedekind-cut 
representation predates the development of ZF and other modern axiomatic set theories. However, nowadays 
it is customary to represent integers and rational numbers within ZF set theory, although the consistency of 
numbers may be studied in terms of logical systems outside any axiomatic set theory framework. 


The Dedekind-cut real-number representation is presented by Spivak [140], pages 494-505; Rudin [129], 
pages 3-12; Schramm [133], pages 350-359; Friedman [74], pages 10-15; Penrose [297], page 59; Eves [353], 
pages 196-199; Dedekind [351]; Dedekind [352], pages 1-27. (Dedekind cuts are presented very much more 
generally for partially ordered groups by Fuchs [75], pages 92-95.) 


15.4.2 REMARK: Computer implementation of Dedekind cuts. 

The Dedekind cut for the number $\sqrt{2}$ may be written as $\sqrt{2} = \{q \in \mathbb{Q};\ q < 0 \text{ or } q^2 < 2\}$. The number
$\pi$ may be written as $\pi = \{q \in \mathbb{Q};\ \forall j \in \mathbb{Z}^+,\ q < 4 \sum_{i=0}^{2j} (-1)^i (2i + 1)^{-1}\}$. In terms of mathematical logic
(and countable induction), such expressions are perfectly meaningful, although infinite sets like this are not
directly implementable in computer software or hardware. On the other hand, the computation of binary or
decimal expansions of real numbers such as $\sqrt{2}$ and $\pi$ is not an entirely trivial task in numerical analysis, and
the exact expansions are in principle infinite. One only ever works with rational approximations in standard 
floating point representations, and one could argue that these are in fact approximations to Dedekind cuts 
by computing rational numbers which are, or are not, inside a given cut. 
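
Although a Dedekind cut is an infinite set, its membership test can be implemented directly, and rational approximations can then be obtained by testing which rationals lie inside the cut. The following is a minimal sketch for the cut of the square root of 2, assuming Python's `fractions` module. The function names `in_sqrt2_cut` and `approximate_sup` and the use of bisection are hypothetical choices for illustration.

```python
from fractions import Fraction

def in_sqrt2_cut(q: Fraction) -> bool:
    """Membership test for the cut {q; q < 0 or q*q < 2}."""
    return q < 0 or q * q < 2

def approximate_sup(in_cut, lo: Fraction, hi: Fraction, steps: int):
    """Bisection which keeps lo inside the cut and hi outside it."""
    if not in_cut(lo) or in_cut(hi):
        raise ValueError("need lo inside the cut and hi outside it")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_cut(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi

lo, hi = approximate_sup(in_sqrt2_cut, Fraction(1), Fraction(2), 20)
```

After 20 steps the bracket has width 2^(-20), with `lo` inside the cut and `hi` outside it. Each step uses only the order and arithmetic of rational numbers, which is the sense in which such approximations refer to the cut and not to a pre-existing real number.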


15.4.3 REMARK: Interpretation of formal definition of a Dedekind cut. 

In Definition 15.4.4, condition (15.4.1) means that all rational numbers to the “left” of a given element y 
are required to be in the Dedekind cut. Condition (15.4.2) means that at least one rational number to the 
“right” of a given element y is required to be in the Dedekind cut. The first condition guarantees that there 
are no “gaps” to the left of any given element. The second condition guarantees that the Dedekind cut 
contains no maximum element. Thus all elements are “interior points”. In the language of topology, one 
could therefore say that a Dedekind cut is a “connected open subset” of Q. 


[Figure 15.4.1: Definition of a Dedekind cut $S$, showing points $x$, $y$ and $z$ on the rational number line $\mathbb{Q}$]


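15.4.4 DEFINITION: A Dedekind cut is a subset $S$ of $\mathbb{Q}$ which satisfies (15.4.1) and (15.4.2).


$$\forall y \in S,\ \forall x \in \mathbb{Q},\quad x < y \Rightarrow x \in S \tag{15.4.1}$$
$$\forall y \in S,\ \exists z \in S,\quad y < z. \tag{15.4.2}$$


15.4.5 DEFINITION: A finite Dedekind cut is a set $S \in \mathbb{P}(\mathbb{Q}) \setminus \{\emptyset, \mathbb{Q}\}$ which satisfies (15.4.3) and (15.4.4).


$$\forall y \in S,\ \forall x \in \mathbb{Q},\quad x < y \Rightarrow x \in S \tag{15.4.3}$$
$$\forall y \in S,\ \exists z \in S,\quad y < z. \tag{15.4.4}$$


In other words, a finite Dedekind cut is a Dedekind cut $S$ which satisfies $\emptyset \ne S \ne \mathbb{Q}$.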
15.4.6 REMARK: Intuition assisting the development of Dedekind cuts to define real numbers.

When one must develop technical properties of a definition for many pages, it is important to have some 
intuition to guide the stages of that development, both to choose which assertions to prove and to prove 
them, and also to choose which constructions to define for particular concepts. In the case of Dedekind cuts, 
one may think of any Dedekind cut as a set of the form $\{q \in \mathbb{Q};\ q < r\}$ for some real number $r$. The great
misfortune here is the circular definition of real numbers in terms of Dedekind cuts and Dedekind cuts in
terms of real numbers. Therefore Definitions 15.4.5 and 15.4.8 must use “intrinsic constructions” which refer
only to rational numbers. This is the principal insight of Dedekind’s real-number construction method, that
the real numbers can be defined in a non-circular fashion in terms of rational numbers only. Nevertheless,
one may use the mental picture of semi-infinite open intervals $(-\infty, r) = \{q \in \mathbb{Q};\ q < r\}$ for $r \in \mathbb{R}$ to guide
the theoretical development until the real numbers are solidly established and ready to use. (Of course, this 
“solidity” seems somewhat precarious if one looks too closely at the lower supporting layers, as outlined in 
Section 2.1 and Remarks 3.6.7, 3.14.3, 6.4.6 and elsewhere.) 


15.4.7 REMARK: Dedekind cuts conditions. 

Condition (15.4.1) implies that every Dedekind cut is an interval. Condition (15.4.2) means that no Dedekind
cut has a maximum element. The non-existence of a maximum element is a technical constraint which ensures
that there is only one Dedekind cut for each rational number. (This constraint is redundant in the case of
irrational numbers because their Dedekind cuts cannot have a maximum element which is a rational number.)
Definition 15.4.4 does not exclude the possibility that the sets $\emptyset$ and $\mathbb{Q}$ could be Dedekind cuts. These could
plausibly represent the infinite elements $-\infty$ and $+\infty$ respectively of the extended real number system.
(This is not done here, as explained in Remark 16.2.1.) They must be excluded when one wishes to represent 
only the finite real numbers. 


Conditions (15.4.1) and (15.4.2) in Definition 15.4.4 may be combined into an equivalent single condition. 


$$\forall y \in S,\ \exists z \in S,\quad \big(y < z \text{ and } \forall x \in \mathbb{Q},\ (x < z \Rightarrow x \in S)\big).$$


This does not make the Dedekind cut definitions easier to understand, but it does allow Definition 15.4.8 to 
be stated more compactly. 


15.4.8 DEFINITION: The set of real numbers is the set 


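$$\{S \in \mathbb{P}(\mathbb{Q}) \setminus \{\emptyset, \mathbb{Q}\};\ \forall y \in S,\ \exists z \in S,\ (y < z \text{ and } \forall x \in \mathbb{Q},\ (x < z \Rightarrow x \in S))\}.$$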


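15.4.9 NOTATION: $\mathbb{R}$ denotes the set of real numbers. In other words,


$$\mathbb{R} = \{S \in \mathbb{P}(\mathbb{Q}) \setminus \{\emptyset, \mathbb{Q}\};\ \forall y \in S,\ \exists z \in S,\ (y < z \text{ and } \forall x \in \mathbb{Q},\ (x < z \Rightarrow x \in S))\}.$$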


15.4.10 REMARK: Arbitrary choice of representation of the real numbers. 

Although according to Definition 15.4.8 and Notation 15.4.9, the real numbers are represented by Dedekind 
cuts for the purpose of developing the basic properties, the choice of representation is intended to be ignored 
in practical applications. 


15.4.11 THEOREM: Real numbers and finite Dedekind cuts are the same thing. 
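A set is an element of the set of real numbers if and only if it is a finite Dedekind cut. In other words,
$S \in \mathbb{R}$ if and only if $S \in \mathbb{P}(\mathbb{Q}) \setminus \{\emptyset, \mathbb{Q}\}$ and $S$ satisfies (15.4.5) and (15.4.6).


$$\forall y \in S,\ \forall x \in \mathbb{Q},\quad x < y \Rightarrow x \in S \tag{15.4.5}$$
$$\forall y \in S,\ \exists z \in S,\quad y < z. \tag{15.4.6}$$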


PROOF: Let $S$ be an element of $\mathbb{R}$. Then $S \in \mathbb{P}(\mathbb{Q}) \setminus \{\emptyset, \mathbb{Q}\}$ by Definition 15.4.8. Let $y \in S$. Then there
exists $z \in S$ such that $y < z$ and $\forall x \in \mathbb{Q},\ (x < z \Rightarrow x \in S)$. Therefore $\forall x \in \mathbb{Q},\ (x < y \Rightarrow x \in S)$. So
$S$ satisfies line (15.4.3) of Definition 15.4.5, and line (15.4.4) is satisfied because $y < z$. Hence $S$ is a finite
Dedekind cut by Definition 15.4.5.

Now suppose that $S$ is a finite Dedekind cut. Then $S \in \mathbb{P}(\mathbb{Q}) \setminus \{\emptyset, \mathbb{Q}\}$ by Definition 15.4.5. Let $y \in S$. Then
$y < z$ for some $z \in S$ by line (15.4.4), and this implies that $z \in \mathbb{Q}$. So by replacing $y$ with $z$ in line (15.4.3),
one obtains $\forall x \in \mathbb{Q},\ x < z \Rightarrow x \in S$. Therefore $\forall y \in S,\ \exists z \in S,\ (y < z \text{ and } \forall x \in \mathbb{Q},\ (x < z \Rightarrow x \in S))$.
Hence $S$ is an element of $\mathbb{R}$ by Definition 15.4.8.
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15.4.12 REMARK: Definition of a real number. 

Slightly redundantly, Definition 15.4.13 defines individual real numbers. Definition 15.4.8 defines the set of
real numbers in a very technical manner. Definition 15.4.13 defines real numbers more meaningfully in terms
of finite Dedekind cuts. The consistency of these definitions is guaranteed by Theorem 15.4.11. 


15.4.13 DEFINITION: A real number is a finite Dedekind cut. 


15.4.14 REMARK: Characterisation of the complement of a finite Dedekind cut. 

The complement of a finite Dedekind cut may be defined by properties which mirror the two conditions in 
Theorem 15.4.11. Line (15.4.8) in Theorem 15.4.15 (ii) may be interpreted to mean that if a rational number 
u is the left-hand limit of elements v of Q \ S, then u must be in Q \ S. This is a kind of “closed on the
left” condition which mirrors the “open on the right” condition in line (15.4.6) of Theorem 15.4.11.


Theorem 15.4.15 (ii) implies that one could use the complements of open semi-infinite rational intervals 
instead of the intervals themselves, but with different conditions which correspond to the “closed” left side 
of the interval. Alternatively, one could use intervals which are infinite on the left and closed on the right, 
or infinite on the right and open on the left. Each of these four options contains the same information. 


15.4.15 THEOREM: Alternative conditions for a set to be a real number. 

(i) ∀S ∈ R, ∀y ∈ Q \ S, ∀z ∈ Q, (y ≤ z ⇒ z ∈ Q \ S).

(ii) A set S is a real number if and only if S ⊆ Q and T = Q \ S satisfies T ∉ {∅, Q} and conditions (15.4.7)
and (15.4.8).


∀u ∈ T, ∀v ∈ Q, (u < v ⇒ v ∈ T) (15.4.7)
∀u ∈ Q, ((∀v ∈ Q, (u < v ⇒ v ∈ T)) ⇒ u ∈ T). (15.4.8)


PROOF: For part (i), let S ∈ R, y ∈ Q \ S, and z ∈ Q with y ≤ z. If y = z then z ∈ Q \ S. So let y < z.
Suppose that z ∈ S. Then y ∈ S by Theorem 15.4.11 line (15.4.5). This is a contradiction. So z ∈ Q \ S.


For part (ii), let S ∈ R. Then S ⊆ Q and S ∉ {∅, Q} by Definition 15.4.8. So T ∉ {∅, Q}. Let u ∈ T. Then
u ∈ Q \ S. Let v ∈ Q with u < v. Then v ∈ Q \ S by part (i). Thus ∀u ∈ T, ∀v ∈ Q, (u < v ⇒ v ∈ T),
which verifies line (15.4.7). Now suppose that u ∈ Q satisfies u ∉ T. Then u ∈ Q \ T = S. So ∃v ∈ S, u < v,
by Theorem 15.4.11 line (15.4.6). By predicate logic, this is equivalent to ¬(∀v ∈ S, u ≥ v), which is
equivalent to ¬(∀v ∈ Q, (v ∉ T ⇒ u ≥ v)), which is equivalent to ¬(∀v ∈ Q, (u < v ⇒ v ∈ T)). Thus
u ∉ T ⇒ ¬(∀v ∈ Q, (u < v ⇒ v ∈ T)), which is logically equivalent to its contrapositive, which is
(∀v ∈ Q, (u < v ⇒ v ∈ T)) ⇒ u ∈ T, which verifies line (15.4.8).


For the converse of part (ii), suppose that S ⊆ Q and that T = Q \ S satisfies T ∉ {∅, Q} and lines (15.4.7)
and (15.4.8). Then S ∈ P(Q) \ {∅, Q}. Let u ∈ S and v ∈ Q with v < u. Suppose that v ∉ S. Then v ∈ T. So
u ∈ T by line (15.4.7), which is a contradiction. So v ∈ S. Therefore ∀u ∈ S, ∀v ∈ Q, (v < u ⇒ v ∈ S), which
verifies line (15.4.5) of Theorem 15.4.11. Now let y ∈ S, and suppose that the proposition ∃z ∈ S, y < z
is false. Then ∀z ∈ S, y ≥ z. This is equivalent to ∀z ∈ Q, (z ∈ S ⇒ y ≥ z), which is equivalent to
∀z ∈ Q, (y < z ⇒ z ∉ S), which is equivalent to ∀z ∈ Q, (y < z ⇒ z ∈ T). Therefore y ∈ T by line (15.4.8),
which contradicts the assumption that y ∈ S. Thus ∀y ∈ S, ∃z ∈ S, y < z, which verifies line (15.4.6) of
Theorem 15.4.11. Hence S is a real number.


15.4.16 REMARK: Equivalence of single-set Dedekind-cut representation to a two-set partition. 

Theorem 15.4.17 states that the form of Dedekind cut representation for the real numbers in Definition 15.4.8 
is equivalent to the older-style two-set cut, where the rational numbers were partitioned into two semi-infinite 
intervals, S and Q \ S. It is clearly sufficient to specify only one of the sets of a two-set partition of a fixed 
set. Therefore the one-set specification is adopted here. 


15.4.17 THEOREM: Equivalence of finite Dedekind cuts to the traditional set-and-complement definition. 
A set S ∈ P(Q) \ {∅, Q} is a real number if and only if (15.4.9) and (15.4.10) are satisfied.


∀a ∈ S, ∀b ∈ Q \ S, a < b (15.4.9)
∀a ∈ S, ∃b ∈ S, a < b. (15.4.10)


PROOF: Let S ∈ P(Q) \ {∅, Q} be a real number. Then S is a finite Dedekind cut by Theorem 15.4.11.
Let a ∈ S and b ∈ Q \ S. Suppose that b ≤ a. Then b ∈ S (trivially if b = a, and by Definition 15.4.5 if b < a),
which is a contradiction. Therefore a < b. So line (15.4.9) is satisfied. Line (15.4.10) follows immediately
from line (15.4.4).


Now suppose that S ∈ P(Q) \ {∅, Q} and that lines (15.4.9) and (15.4.10) are satisfied. Let y ∈ S. Then
y < z for some z ∈ S by line (15.4.10). For such z, let x ∈ Q with x < z. Suppose that x ∉ S. Then z < x
by line (15.4.9), which is a contradiction. So x ∈ S. Hence S ∈ R.


15.5. Embedding rational numbers in the real numbers 


15.5.1 REMARK: Embedding the rational numbers inside the real numbers. 

The very easy Theorem 15.5.2 asserts that the standard embedding of the rational numbers inside the real 
number system in Definition 15.5.3 is well defined. This embedding is given the temporary Notation 15.5.4. 
An incidental consequence of Theorem 15.5.2 is that R is a non-empty set. 


The standard embedding of the rationals inside the real numbers is an ordered-field monomorphism according 
to Definition 18.8.17. Since all complete ordered fields are isomorphic (as mentioned in Remark 18.9.5), and 
the real numbers constitute a complete ordered field, it follows that every complete ordered field contains 
an embedded set of rational numbers. The monomorphism is uniquely determined by the requirement that
0_Q must be mapped to 0_R, and 1_Q must be mapped to 1_R.


15.5.2 THEOREM: Open intervals of rationals bounded above by a rational number are real numbers. 
∀q ∈ Q, {x ∈ Q; x < q} ∈ R. That is, {x ∈ Q; x < q} is a real number for all q ∈ Q.


PROOF: Let q ∈ Q, and let S_q = {x ∈ Q; x < q}. Then S_q ≠ ∅ because q − 1 ∈ S_q, and S_q ≠ Q
because q + 1 ∈ Q \ S_q. Let y ∈ S_q. Let z = (y + q)/2. Then z ∈ Q and y < z < q. So z ∈ S_q and y < z.
Let x ∈ Q with x < z. Then x < q. So x ∈ S_q. Hence S_q ∈ R by Definition 15.4.8.
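
The midpoint construction in this proof is easy to try out with exact rational arithmetic. The following Python sketch is an illustration only. The function names rho and witness and the sample values 3/4 and 1/2 are assumptions made here, and a cut is modelled by its membership test rather than as a set.

```python
from fractions import Fraction

def rho(q):
    """Return the membership test of the set {x in Q; x < q}."""
    return lambda x: x < q

def witness(y, q):
    """Midpoint z = (y + q)/2, which satisfies y < z < q whenever y < q."""
    return (y + q) / 2

q = Fraction(3, 4)
S_q = rho(q)
y = Fraction(1, 2)        # an element of S_q
z = witness(y, q)         # a larger element of S_q

nonempty = S_q(q - 1)     # q - 1 lies in S_q
proper = not S_q(q + 1)   # q + 1 lies outside S_q
```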


15.5.3 DEFINITION: The standard embedding of the rational numbers inside the real numbers is the map 
from Q to R, which maps each rational number q ∈ Q to the corresponding real number {x ∈ Q; x < q} ∈ R.


15.5.4 NOTATION: Temporary notation for standard embedding of rational numbers in real numbers. 
ρ denotes the standard embedding of the rational numbers inside the real numbers. In other words, ρ : Q → R
is the map ρ : q ↦ {x ∈ Q; x < q}.


15.5.5 REMARK: Rational and irrational real numbers. 
Definition 15.5.6 distinguishes between a “rational number” q ∈ Q and the corresponding “rational real
number” ρ(q) ∈ R.


15.5.6 DEFINITION: A rational real number is a real number S ∈ R such that S = ρ(q) for some q ∈ Q.


An irrational real number is a real number S ∈ R which is not a rational real number.


15.5.7 REMARK: Notation for individual embedded rational numbers in the real number system. 

Individual rational real numbers of the form ρ(q) for q ∈ Q may be notated as in Notation 15.5.8. Thus
for example, 0_R = ρ(0), 1_R = ρ(1), 2_R = ρ(2), and so forth. Usually it is clear from context whether an
apparent rational number q denotes the rational number itself or the embedded rational number q_R = ρ(q).
However, it is often helpful to distinguish explicitly between 0_Q and 0_R, between 1_Q and 1_R, and so forth.


15.5.8 NOTATION: q_R, for q ∈ Q, denotes the rational real number ρ(q).


15.5.9 REMARK: The set of real numbers is infinite and Dedekind-infinite. 

An incidental consequence of Theorem 15.5.10 is that the set of real numbers is Dedekind-infinite, and 
is therefore infinite. (See Definition 13.10.2 for Dedekind-infinite sets. See Theorem 13.10.6 (i) for the
equivalence of Dedekind-infinite and ω-infinite sets.)


15.5.10 THEOREM: Injectivity of the standard embedding of rationals in the set of real numbers.
The standard embedding of the rational numbers inside the real numbers is injective.


PROOF: Let q1, q2 ∈ Q. If q1 < q2, then (q1 + q2)/2 ∈ ρ(q2) \ ρ(q1). If q1 > q2, then (q1 + q2)/2 ∈ ρ(q1) \ ρ(q2).
So q1 ≠ q2 implies ρ(q1) ≠ ρ(q2). Hence the standard embedding of the rational numbers inside the real
numbers is injective.
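
The separating midpoint used in this proof can be computed directly. The following Python sketch is an illustration under the assumption that a cut is modelled by its membership test. The names rho and separating_point and the sample values 1/3 and 1/2 are invented for the example.

```python
from fractions import Fraction

def rho(q):
    """Return the membership test of the set {x in Q; x < q}."""
    return lambda x: x < q

def separating_point(q1, q2):
    """For q1 != q2, return a rational which lies in exactly one of rho(q1), rho(q2)."""
    if q1 == q2:
        raise ValueError("rho(q1) and rho(q2) coincide, so no separating point exists")
    return (q1 + q2) / 2

q1, q2 = Fraction(1, 3), Fraction(1, 2)
m = separating_point(q1, q2)
in_first = rho(q1)(m)    # membership of m in rho(q1)
in_second = rho(q2)(m)   # membership of m in rho(q2)
```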


15.5.11 THEOREM: Order isomorphism between rationals and rational real numbers using set-inclusion. 
(i) ∀q1, q2 ∈ Q, (q1 = q2 ⇔ ρ(q1) = ρ(q2)).
(ii) ∀q1, q2 ∈ Q, (q1 ≤ q2 ⇔ ρ(q1) ⊆ ρ(q2)).
(iii) ∀q1, q2 ∈ Q, (q1 < q2 ⇔ ρ(q1) ⊊ ρ(q2)).


PROOF: For part (i), it is obvious that ρ(q1) = ρ(q2) if q1 = q2. So suppose that ρ(q1) = ρ(q2) and q1 ≠ q2.
If q1 < q2, then q1 ∈ ρ(q2) \ ρ(q1). So ρ(q1) ≠ ρ(q2); and similarly if q1 > q2. This is a contradiction.
Hence q1 = q2.

For part (ii), let q1, q2 ∈ Q. Then ρ(q1) = {x ∈ Q; x < q1} and ρ(q2) = {y ∈ Q; y < q2} by Notation 15.5.4.
Let q1 ≤ q2. Let x ∈ ρ(q1). Then x ∈ Q and x < q1. Therefore x ∈ Q and x < q2. So x ∈ ρ(q2).
Hence ∀q1, q2 ∈ Q, (q1 ≤ q2 ⇒ ρ(q1) ⊆ ρ(q2)).

Now suppose that ρ(q1) ⊆ ρ(q2). Then {x ∈ Q; x < q1} ⊆ {y ∈ Q; y < q2}. Suppose that q2 < q1.
Then q2 ∈ {x ∈ Q; x < q1} \ {y ∈ Q; y < q2}. This contradicts the set-inclusion. Therefore q1 ≤ q2.
Hence ∀q1, q2 ∈ Q, (q1 ≤ q2 ⇔ ρ(q1) ⊆ ρ(q2)).


Part (iii) follows from parts (i) and (ii).


15.5.12 THEOREM: Some basic properties of the standard embedding of rationals in the real numbers. 

(i) ∀S ∈ R, ∀q ∈ S, ρ(q) ⊊ S.

(ii) ∀S ∈ R, ∀q ∈ Q \ S, S ⊆ ρ(q).

(iii) ∀S ∈ R, S = ⋃{ρ(q); q ∈ S}.

(iv) ∀S ∈ R, ∀q ∈ Q, (q ∈ S ⇔ ρ(q) ⊊ S).

(v) ∀S ∈ R, ∀q ∈ Q, ((q ∈ S or ρ(q) = S) ⇔ ρ(q) ⊆ S).

(vi) ∀S ∈ R, S = ⋃{ρ(q); q ∈ Q and ρ(q) ⊆ S}.

(vii) ∀S ∈ R, S = {z ∈ Q; ∃w ∈ Q, (z ∈ ρ(w) and ρ(w) ⊆ S)}.

(viii) ∀S ∈ R, S = {z ∈ Q; ∃w ∈ Q, (w < −z and ρ(−w) ⊆ S)}.

(ix) ∀X ∈ P(Q) \ {∅}, (⋃{ρ(q); q ∈ X} ≠ Q ⇒ ⋃{ρ(q); q ∈ X} ∈ R).

(x) ∀S ∈ P(Q) \ {∅, Q}, (S ∈ R ⇔ S = ⋃{ρ(q); q ∈ S}).


PROOF: For part (i), let S ∈ R and q ∈ S. Then by Definition 15.4.8, there exists q′ ∈ S with q < q′ and
x ∈ S for all x ∈ Q with x < q′. So ρ(q′) ⊆ S for some q′ ∈ S with q < q′. But ρ(q) ⊊ ρ(q′) because q < q′
by Theorem 15.5.11 (iii). Therefore ρ(q) ⊊ S.

For part (ii), let S ∈ R and q ∈ Q \ S. Let q′ ∈ S. Then q′ < q by Theorem 15.4.17 line (15.4.9). So
q′ ∈ ρ(q) by Definition 15.5.3. Hence S ⊆ ρ(q).

For part (iii), let S ∈ R. If q′ ∈ S, then q′ < q for some q ∈ S, and so q′ ∈ ρ(q) by Definition 15.5.3.
Then ρ(q) ⊊ S by part (i), and so q′ ∈ ⋃{ρ(q); q ∈ S}. Now let q′ ∈ ⋃{ρ(q); q ∈ S}. Then q′ ∈ ρ(q) for
some q ∈ S. But then q′ < q by Definition 15.5.3. So q′ ∈ S by Theorem 15.4.11 line (15.4.5).


For part (iv), let S ∈ R and q ∈ Q. If q ∈ S, then ρ(q) ⊊ S by part (i). So suppose that ρ(q) ⊊ S. Then there
exists some x ∈ S \ ρ(q). But then q ≤ x for this x because x ∈ Q and x ∉ ρ(q). So q ∈ S because x ∈ S.
Hence q ∈ S ⇔ ρ(q) ⊊ S.


Part (v) follows by pure logic from part (iv).


For part (vi), let S ∈ R. Let x ∈ S. Then x ∈ ρ(q) for some q ∈ S by part (iii). Therefore x ∈ ρ(q) for some
q ∈ Q with ρ(q) ⊆ S by part (iv), because q ∈ Q by Definition 15.4.8. So x ∈ ⋃{ρ(q); q ∈ Q and ρ(q) ⊆ S}.
Hence S ⊆ ⋃{ρ(q); q ∈ Q and ρ(q) ⊆ S}. Now suppose that x ∈ ⋃{ρ(q); q ∈ Q and ρ(q) ⊆ S}. Then
x ∈ ρ(q) ⊆ S for some q ∈ Q. So x ∈ S. Hence S = ⋃{ρ(q); q ∈ Q and ρ(q) ⊆ S}.


For part (vii), let S ∈ R. Let z ∈ S. Then z ∈ Q and z ∈ ⋃{ρ(w); w ∈ S} by part (iii). So z ∈ ρ(w)
for some w ∈ S. But then ρ(w) ⊆ S by part (i). So S ⊆ {z ∈ Q; ∃w ∈ Q, (z ∈ ρ(w) and ρ(w) ⊆ S)}.
Now suppose that x ∈ {z ∈ Q; ∃w ∈ Q, (z ∈ ρ(w) and ρ(w) ⊆ S)}. Then x ∈ ρ(w) for some w ∈ Q which
satisfies ρ(w) ⊆ S. Let y = (x + w)/2. Then x < y and y < w because x < w by Definition 15.5.3. But
then v ∈ S for all v ∈ Q with v < y because v ∈ ρ(y) ⊆ ρ(w) ⊆ S. So x ∈ S by Definition 15.4.8. Hence
{z ∈ Q; ∃w ∈ Q, (z ∈ ρ(w) and ρ(w) ⊆ S)} ⊆ S.

For part (viii), let S ∈ R. Then by part (vii),


∀z, z ∈ S ⇔ ∃w ∈ Q, (z ∈ ρ(w) and ρ(w) ⊆ S)
⇔ ∃w ∈ Q, (z < w and ρ(w) ⊆ S)
⇔ ∃w ∈ Q, (z < −w and ρ(−w) ⊆ S)
⇔ ∃w ∈ Q, (w < −z and ρ(−w) ⊆ S).


Hence S = {z ∈ Q; ∃w ∈ Q, (w < −z and ρ(−w) ⊆ S)}.

For part (ix), let X ∈ P(Q) \ {∅} satisfy ⋃{ρ(q); q ∈ X} ≠ Q. Let S = ⋃{ρ(q); q ∈ X}. Then
S ∈ P(Q) \ {∅, Q} because X ≠ ∅. Let x ∈ S. Then x ∈ ρ(q) for some q ∈ X. Let y = (x + q)/2. Then
x < y ∈ ρ(q) because x < q. But then z ∈ S for all z ∈ Q with z < y. Therefore ∀x ∈ S, ∃y ∈ S,
(x < y and ∀z ∈ Q, (z < y ⇒ z ∈ S)). Hence S ∈ R by Definition 15.4.8.

For part (x), let S ∈ P(Q) \ {∅, Q}. If S ∈ R, then S = ⋃{ρ(q); q ∈ S} by part (iii). Now suppose that
S = ⋃{ρ(q); q ∈ S}. Then S ∈ R by part (ix).


15.6. Real number order 


15.6.1 REMARK: Standard order on the real numbers. 

The standard order on the real numbers is defined to be the same as the set-inclusion relation for the ZF
sets which represent them. Therefore it is a partial order on R because the set-inclusion relation for any set
of subsets of a given set is a partial order on that set. (See Definition 11.1.2 for partial order.) It must be
shown that it is a total order on R. (See Definition 11.5.1 for total order.) The standard order on the real
numbers is intended to correspond to the customary inequality relation “≤”.


15.6.2 DEFINITION: The standard order on the real numbers is the set {(S1, S2) ∈ R × R; S1 ⊆ S2}.


15.6.3 NOTATION: ≤ denotes the standard order on the real numbers. In other words,
∀(S1, S2) ∈ R × R, (S1 ≤ S2 ⇔ S1 ⊆ S2).


15.6.4 THEOREM: The real number standard order is a total order. 
The standard order on the real numbers is a total order on the set R. 


PROOF: Let S1, S2 ∈ R. Then S1, S2 ∈ P(Q). Suppose that x ∈ S1 \ S2 and y ∈ S2 \ S1 for some x, y ∈ Q.
Then x ≠ y because x ∉ S2 and y ∈ S2. If x < y, then x ∈ S2 (because y ∈ S2), which is a contradiction.
Similarly, if x > y, then y ∈ S1 (because x ∈ S1), which is also a contradiction. So either S1 ⊆ S2 or
S2 ⊆ S1. In other words, either S1 ≤ S2 or S2 ≤ S1.

Suppose that S1 ≤ S2 and S2 ≤ S1. Then S1 ⊆ S2 and S2 ⊆ S1. So S1 = S2 by the ZF axiom of extension.
(See Definition 7.2.4 (1).)

Let S1, S2, S3 ∈ R with S1 ≤ S2 and S2 ≤ S3. Then S1 ⊆ S2 and S2 ⊆ S3. So S1 ⊆ S3. Therefore S1 ≤ S3.
Hence the standard order on the real numbers is a total order on R by Definition 11.5.1.


15.6.5 THEOREM: Order isomorphism between rationals and rational real numbers. 
(i) ∀q1, q2 ∈ Q, (q1 ≤ q2 ⇔ ρ(q1) ≤ ρ(q2)).
(ii) ∀q1, q2 ∈ Q, (q1 < q2 ⇔ ρ(q1) < ρ(q2)).


PROOF: Part (i) follows from Theorem 15.5.11 (ii) and Definition 15.6.2 and Notation 15.6.3.
Part (ii) follows from Theorem 15.5.11 (iii) and Definition 15.6.2 and Notation 15.6.3.


15.6.6 REMARK: Derived definitions for the total order on the real numbers. 
The order relations “<”, “≥” and “>” are defined for real numbers in terms of the primary relation “≤” in
the usual way, as in Notation 11.1.14.


15.6.7 NOTATION:

R^+ denotes the set of positive real numbers {r ∈ R; r > 0}.

R_0^+ denotes the set of non-negative real numbers {r ∈ R; r ≥ 0}.

R^− denotes the set of negative real numbers {r ∈ R; r < 0}.

R_0^− denotes the set of non-positive real numbers {r ∈ R; r ≤ 0}.


15.6.8 REMARK: Alternative notations for positive and non-negative real numbers. 
Some authors use R^+ for non-negative real numbers, but then there would be no obvious notation for positive
reals. It would be tedious to have to write R^+ \ {0} for the positive reals.


Some authors use notations such as R_+ and Z_+ instead of the superscript versions. The advantage of this
would be that you could write R_+^n for the Cartesian product of n copies of R_+. However, this would cause
ambiguity for notations like R_0^+ for the non-negative real numbers.


15.6.9 REMARK: Completeness of the real numbers. 

Lower bounds, upper bounds, greatest lower bound, least upper bound, infimum, supremum,
minimum and maximum are defined for real numbers in terms of the order relation “≤” in the usual way,
as in Definition 11.2.4.


To show that R is a complete ordered field in Definition 18.9.3, it will be necessary to know that non-empty 
bounded sets of real numbers have a least upper bound. This completeness property for ordered fields is 
asserted in Theorem 15.6.10. 


It is generally true that the supremum of a non-empty bounded-above set A for the set-inclusion relation on
some set of sets X will be ⋃A if ⋃A ∈ X. So the non-trivial part of Theorem 15.6.10 is part (i).


15.6.10 THEOREM: Every bounded-above set of real numbers has a supremum, which equals its union. 
Let A be a non-empty set of real numbers which is bounded above. 


(i) ⋃A ∈ R.
(ii) ⋃A is the least upper bound of A in R. In other words, sup(A) = ⋃A.


PROOF: For part (i), let A be a non-empty set of real numbers which is bounded above. Then ∅ ≠ A ⊆ R
and ∃K ∈ R, ∀a ∈ A, a ≤ K by Definition 11.2.5. Let S = ⋃A. Then S ≠ ∅ by Theorem 8.4.8 (xiv) because
each element of A is non-empty by Definition 15.4.8. Let x ∈ S. Then x ∈ a for some a ∈ A. But a ∈ A
implies a ≤ K, which implies a ⊆ K. So x ∈ a for some a ⊆ K. Therefore x ∈ K. Thus S ⊆ K. So S ≠ Q
because K ≠ Q by Definition 15.4.8. Consequently S ∈ P(Q) \ {∅, Q}.

Let y ∈ S. Then y ∈ a for some a ∈ A. But a ∈ A implies a ∈ R. So by Definition 15.4.8, there exists
z ∈ a such that y < z and ∀x ∈ Q, (x < z ⇒ x ∈ a). However, a ⊆ S. So z ∈ S and
∀x ∈ Q, (x < z ⇒ x ∈ S). Consequently
∀y ∈ S, ∃z ∈ S, (y < z and ∀x ∈ Q, (x < z ⇒ x ∈ S)). Hence S ∈ R by Definition 15.4.8.

For part (ii), let S = ⋃A. Then a ⊆ S for all a ∈ A by Theorem 8.4.8 (xiv). So S is an upper bound for A.
Now let S′ be an upper bound for A with S′ ≤ S. Then a ⊆ S′ for all a ∈ A. So ⋃A ⊆ S′ ⊆ S = ⋃A. So
S′ = S by Theorem 7.3.5 (iii). Hence ⋃A is the least upper bound of A by Definition 11.2.4.
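
As an informal illustration of the union as supremum, consider the hypothetical family A = {ρ(1 − 1/i); i = 1, 2, 3, ...}, which is not taken from the text. Its union should be ρ(1), and the following Python sketch finds, for a given rational x < 1, an index i with x ∈ ρ(1 − 1/i). The names rho and index_containing are assumptions made for this example.

```python
from fractions import Fraction
import math

def rho(q):
    """Return the membership test of the set {x in Q; x < q}."""
    return lambda x: x < q

def index_containing(x):
    """For a rational x < 1, return an integer i >= 1 with x in rho(1 - 1/i).

    Any i > 1/(1 - x) works, because then 1/i < 1 - x."""
    if x >= 1:
        raise ValueError("x does not lie in the union of the family")
    return math.floor(1 / (1 - x)) + 1

x = Fraction(99, 100)
i = index_containing(x)
found = rho(1 - Fraction(1, i))(x)
```

No member of the family contains the rational number 1 itself, which is consistent with the union being ρ(1).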


15.6.11 REMARK: Greatest lower bounds of non-empty bounded-below sets of real numbers. 

Analogous to Theorem 15.6.10, one might expect to obtain a theorem such as “inf(A) = ⋂A ∈ R” for a
non-empty set A of real numbers which is bounded below. This does not work in the case A = {1 + 1/i; i ∈ N},
for example, because ⋂A is then the set {q ∈ Q; q ≤ 1}, which is not a real number. (The correct infimum
would be {q ∈ Q; q < 1}.) Thus ⋂A is the correct infimum of A as a subset of Q, but it is not an element
of R. Thus R is not “complete” with respect to intersection operations. Every non-empty bounded-below
subset of R does have an infimum in R, but it is not always equal to the intersection of the set.
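
The failure of the intersection can be seen concretely at the single rational number 1. The following Python sketch is an illustration only. It assumes that the elements 1 + 1/i are read as the embedded cuts ρ(1 + 1/i) with i ≥ 1, and it tests only finitely many indices. The names rho and in_all_up_to are invented here.

```python
from fractions import Fraction

def rho(q):
    """Return the membership test of the set {x in Q; x < q}."""
    return lambda x: x < q

def in_all_up_to(x, n):
    """True if x lies in rho(1 + 1/i) for every i = 1, ..., n."""
    return all(rho(1 + Fraction(1, i))(x) for i in range(1, n + 1))

one = Fraction(1)
one_in_every_cut = in_all_up_to(one, 1000)       # 1 < 1 + 1/i for each i
one_in_rho_one = rho(one)(one)                   # 1 < 1 fails
slightly_more = in_all_up_to(Fraction(501, 500), 1000)
```

So the rational 1 survives every intersection step but is not an element of {q ∈ Q; q < 1}, whereas 1 + 1/500 is excluded as soon as i reaches 500.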


15.6.12 REMARK: Some technical properties of order on the real numbers. 
Theorem 15.6.13 gives some properties of real number order which are useful for proofs of other theorems. 


15.6.13 THEOREM: Technical lemma for rational real numbers and order on the real numbers. 
(i) ∀S ∈ R, ∀q ∈ S, ρ(q) < S.

(ii) ∀S ∈ R, ∀q ∈ Q \ S, S ≤ ρ(q).

(iii) ∀S ∈ R, ∀q ∈ Q, (q ∈ S ⇔ ρ(q) < S).


PROOF: For part (i), let S ∈ R and q ∈ S. Then ρ(q) ⊊ S by Theorem 15.5.12 (i). Hence ρ(q) < S by
Definition 15.6.2.


Part (ii) follows from Theorem 15.5.12 (ii) and Definition 15.6.2.
Part (iii) follows from Theorem 15.5.12 (iv) and Definition 15.6.2.


15.6.14 REMARK: Limit-like properties of the real numbers.
Theorem 15.6.15 has some similarity to limit concepts. Part (i) of Theorem 15.6.15 relies on the property
of rational numbers that every number K ∈ Q^+ is less than or equal to some power of 2. This originates in
the Archimedean property of the set of rational numbers.


Theorem 15.6.15 (iii) may be interpreted to mean that the rational numbers are a “dense” subset of the real 
numbers. Between every pair of distinct real numbers, there is at least one rational number. In fact, there 
are clearly a countably infinite number of rational numbers between every pair of distinct real numbers. 
Intuition based on finite sets of points could perhaps suggest that there must therefore be infinitely more 
rational numbers than irrational numbers. The reverse conclusion was published in 1891 by Cantor [408] 
using the well-known diagonalisation argument in Theorem 13.1.27. (See Cantor [344], pages 278-281.) This 
is yet another indication of the importance of rigorous logic to filter faulty intuition out of mathematics, 
particularly in the analysis of infinite sets. 


15.6.15 THEOREM: Some real-number order properties related to the Archimedean property. 
(i) ∀S ∈ R, ∀ε ∈ Q^+, ∃x ∈ S, ∃y ∈ Q \ S, y − x < ε.

(ii) ∀S1, S2 ∈ R, ∀q ∈ S2 \ S1, S1 ≤ ρ(q) < S2.

(iii) ∀S1, S2 ∈ R, (S1 < S2 ⇒ ∃q ∈ Q, S1 < ρ(q) < S2).


PROOF: For part (i), let S ∈ R and ε ∈ Q^+. Since S ≠ ∅ and S ≠ Q, there exist x_0 ∈ S and y_0 ∈ Q \ S.
Let ε_0 = y_0 − x_0. Then ε_0 ∈ Q^+ by Theorem 15.4.17 line (15.4.9). For i ∈ Z_0^+, let P(i) be the proposition
that there exist x_i ∈ S and y_i ∈ Q \ S with y_i − x_i = 2^(−i) ε_0. Then P(0) is true.

Suppose that P(i) is true for some i ∈ Z_0^+. Then there exist x_i ∈ S and y_i ∈ Q \ S with y_i − x_i = 2^(−i) ε_0. Let
z = (x_i + y_i)/2. If z ∈ S, let x_(i+1) = z and y_(i+1) = y_i. Otherwise let x_(i+1) = x_i and y_(i+1) = z. Then x_(i+1) ∈ S
and y_(i+1) ∈ Q \ S, and y_(i+1) − x_(i+1) = 2^(−i−1) ε_0. Thus P(i + 1) is true. So by induction, for all i ∈ Z_0^+, there
exist x_i ∈ S and y_i ∈ Q \ S with y_i − x_i = 2^(−i) ε_0.

Since ε_0/ε ∈ Q^+, for some i_0 ∈ Z_0^+, ε_0/ε < 2^(i_0) by Theorem 15.1.16. Then 2^(−i_0) ε_0 < ε. Let x = x_(i_0) and
y = y_(i_0). Then x ∈ S, y ∈ Q \ S and y − x < ε.

Part (ii) follows from Theorem 15.6.13 (ii, i).

For part (iii), let S1, S2 ∈ R with S1 < S2. Let X = S2 \ S1. Then X ⊆ Q, and X ≠ ∅ by Definition 15.6.2.
So q ∈ X for some q ∈ Q. Then S1 ≤ ρ(q) < S2 by part (ii). But q < q′ for some q′ ∈ S2, and so ρ(q′) < S2
by part (ii), and S1 ≤ ρ(q) < ρ(q′) < S2 by Theorem 15.6.5 (ii). Hence S1 < ρ(q′) < S2.
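
The induction in part (i) is a bisection procedure, which can be run with exact rational arithmetic. The following Python sketch is an illustration under stated assumptions: a cut is modelled by its membership test, the example cut {x ∈ Q; x < 0 or x·x < 2} is hypothetical, and the names in_sqrt2_cut and narrow_gap are invented here.

```python
from fractions import Fraction

def in_sqrt2_cut(x):
    """Membership test for the hypothetical example cut {x in Q; x < 0 or x*x < 2}."""
    return x < 0 or x * x < 2

def narrow_gap(member, x0, y0, eps):
    """Given x0 in S and y0 in Q \\ S, return (x, y) with x in S, y in Q \\ S, y - x < eps.

    Each step halves the gap y - x, as in the induction step of the proof."""
    if not member(x0) or member(y0) or eps <= 0:
        raise ValueError("need x0 in S, y0 outside S and eps > 0")
    x, y = x0, y0
    while y - x >= eps:
        z = (x + y) / 2
        if member(z):
            x = z
        else:
            y = z
    return x, y

x, y = narrow_gap(in_sqrt2_cut, Fraction(1), Fraction(2), Fraction(1, 1000))
gap = y - x
```

Starting from a gap of 1, ten halvings are needed before the gap drops below 1/1000, which matches the choice of i_0 with ε_0/ε < 2^(i_0) in the proof.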


15.7. Real number addition and negation 


15.7.1 REMARK: Addition operation for the real numbers. 

The additive group structure for the real numbers is easy to define. Groups are introduced in Definition 17.3.2 
as a tuple (G, σ) where G is a set and σ : G × G → G is an additive operation. At this point, the addition
operation for R may be introduced without formally declaring it to be a group, and some of its basic 
properties can be demonstrated. 


15.7.2 DEFINITION: The addition operation for the real numbers is the map σ : R × R → R defined by
σ(S1, S2) = {x ∈ Q; ∃y ∈ S1, ∃z ∈ S2, x = y + z} = {y + z; y ∈ S1, z ∈ S2}.


15.7.3 THEOREM: Some basic algebraic properties of real-number addition. 
The addition operation σ for the real numbers has the following properties.


(i) ∀S1, S2 ∈ R, σ(S1, S2) ∈ R. [closure of addition]
(ii) ∀S1, S2 ∈ R, σ(S1, S2) = σ(S2, S1). [commutativity of addition]
(iii) ∀S1, S2, S3 ∈ R, σ(σ(S1, S2), S3) = σ(S1, σ(S2, S3)). [associativity of addition]
(iv) ∀S ∈ R, σ(ρ(0), S) = S = σ(S, ρ(0)). [additive identity]


PROOF: For part (i), let S1, S2 ∈ R. Let S3 = σ(S1, S2). Then S3 = {y + z; y ∈ S1, z ∈ S2}. Since
S1 ≠ ∅ and S2 ≠ ∅, it follows that S3 ≠ ∅. Since S1 ≠ Q and S2 ≠ Q, there exist x1 ∈ Q \ S1 and
x2 ∈ Q \ S2. Suppose that S3 = Q. Then x1 + x2 ∈ S3. So x1 + x2 = y + z for some y ∈ S1 and z ∈ S2.
But (x1 − y) + (x2 − z) = 0, and x1 − y ≠ 0 ≠ x2 − z by the assumptions
for x1, x2, y and z. So either x1 < y or x2 < z. Suppose that x1 < y. Then x1 ∈ S1 by Theorem 15.4.11
and Definition 15.4.5, line (15.4.3), which is a contradiction. Similarly the assumption x2 < z yields a
contradiction. Therefore S3 ≠ Q.


Now let y3 ∈ S3. Then y3 = y1 + y2 for some y1 ∈ S1 and y2 ∈ S2. So there exist z1 ∈ S1 and z2 ∈ S2
such that y1 < z1 and y2 < z2 and ∀x1 ∈ Q, (x1 < z1 ⇒ x1 ∈ S1) and ∀x2 ∈ Q, (x2 < z2 ⇒ x2 ∈ S2). Let
z3 = z1 + z2. Then z3 ∈ S3 and y3 < z3. Let x3 ∈ Q with x3 < z3. Let x1 = z1 − (z3 − x3)/2 and
x2 = z2 − (z3 − x3)/2. Then x1, x2 ∈ Q with x1 < z1 and x2 < z2. So x1 ∈ S1 and x2 ∈ S2 by
Definition 15.4.4. But x3 = x1 + x2. So x3 ∈ S3 by the definition of σ. Hence S3 ∈ R by Definition 15.4.8.

For part (ii), let S1, S2 ∈ R. Let S3 = σ(S1, S2) and S3′ = σ(S2, S1). Let x ∈ S3. Then x = y + z
for some y ∈ S1 and z ∈ S2. But y + z = z + y by the commutativity of addition of rational numbers. So
x = z + y for some z ∈ S2 and y ∈ S1. So x ∈ S3′. Therefore S3 ⊆ S3′. Similarly, S3′ ⊆ S3. Hence
∀S1, S2 ∈ R, σ(S1, S2) = σ(S2, S1).

For part (iii), let S1, S2, S3 ∈ R. Let a ∈ σ(σ(S1, S2), S3). Then a = w + z for some w ∈ σ(S1, S2)
and z ∈ S3, and w = x + y for some x ∈ S1 and y ∈ S2. So a = (x + y) + z = x + (y + z). But
y + z ∈ σ(S2, S3) and so x + (y + z) ∈ σ(S1, σ(S2, S3)). So σ(σ(S1, S2), S3) ⊆ σ(S1, σ(S2, S3)). Similarly,
σ(σ(S1, S2), S3) ⊇ σ(S1, σ(S2, S3)). Hence ∀S1, S2, S3 ∈ R, σ(σ(S1, S2), S3) = σ(S1, σ(S2, S3)).

For part (iv), let S ∈ R. Then σ(ρ(0), S) = σ({x ∈ Q; x < 0}, S) = {x + y; x ∈ Q and x < 0 and y ∈ S}.
Let w ∈ σ(ρ(0), S). Then w = x + y for some x, y ∈ Q with x < 0 and y ∈ S. So w ∈ Q and w < y.
Therefore w ∈ S by Theorem 15.4.11 line (15.4.5). So σ(ρ(0), S) ⊆ S. Now suppose that w ∈ S. Then for
some z ∈ S, w < z and ∀x ∈ Q, (x < z ⇒ x ∈ S) by Definition 15.4.8. Let y = w − z. Then y ∈ Q and y < 0. So
y ∈ ρ(0). Therefore w = y + z with y ∈ ρ(0) and z ∈ S. So w ∈ σ(ρ(0), S). Therefore S ⊆ σ(ρ(0), S). Hence
S = σ(ρ(0), S). The equality S = σ(S, ρ(0)) then follows from part (ii).
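
The splitting step in the proof of part (i) is constructive: any rational below z1 + z2 can be written as a sum of a rational below z1 and a rational below z2. The following Python sketch is an illustration only. The function name split_below and the sample values are assumptions made here.

```python
from fractions import Fraction

def split_below(x3, z1, z2):
    """Given x3 < z1 + z2, return (x1, x2) with x1 < z1, x2 < z2 and x1 + x2 == x3.

    Both bounds are lowered by the same amount d = (z1 + z2 - x3)/2."""
    if not x3 < z1 + z2:
        raise ValueError("need x3 < z1 + z2")
    d = (z1 + z2 - x3) / 2
    return z1 - d, z2 - d

z1, z2 = Fraction(1, 2), Fraction(1, 4)
x3 = Fraction(1, 3)
x1, x2 = split_below(x3, z1, z2)
```

For embedded rationals this shows that every element of ρ(z1 + z2) is a sum of an element of ρ(z1) and an element of ρ(z2). The reverse inclusion is immediate because x1 < z1 and x2 < z2 imply x1 + x2 < z1 + z2.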


15.7.4 REMARK: Explicit constructions for additive and multiplicative inverse maps. 

The axioms for an additive group require only that an additive inverse (i.e. a negative) of each element of the
group should exist. A function which maps each element to its additive inverse is not required
to exist. However, since the additive inverse is always unique if it does exist, one may always define the map
from elements to inverses as the set of ordered pairs {(x, y) ∈ G × G; x + y = 0_G}, where G is the additive
group and 0_G is the additive identity of the group. In the case of the real numbers, it is straightforward to
give an explicit construction for the additive inverse map. This is given in Definition 15.7.6. Similarly, an 
explicit construction for the multiplicative inverse (i.e. reciprocal) is given in Definition 15.8.11. 


15.7.5 REMARK: The negation operation for the real numbers. 
A naive guess for the negative of a real number S would be S′ = {x ∈ Q; ∀y ∈ S, x + y < 0}. This doesn’t
work for embedded rational numbers. Let w ∈ Q and S = ρ(w). Then S = {y ∈ Q; y < w} and


S′ = {x ∈ Q; ∀y ∈ Q, (y < w ⇒ x + y < 0)}
= {x ∈ Q; ∀y ∈ Q, (y < 0 ⇒ x < −w − y)}
= {x ∈ Q; ∀y ∈ Q, (y > 0 ⇒ x < −w + y)}
= ⋂{ρ(−w + y); y ∈ Q and y > 0}

which is not a real number. The problem with S′ is that it equals an intersection of real numbers ρ(−w + y),
which does not yield an open interval of rational numbers when w is rational. It makes sense, then, to try
a union of open intervals, converging from the left instead of the right. This suggests using something like
⋃{ρ(−y); y ∈ Q \ S}, which is at least guaranteed to yield a real number.
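
The difference between the two candidate constructions shows up at the single point x = −w. The following Python sketch is an illustration only, for the case S = ρ(w). The function names are invented here. The naive condition is tested on the finite sample y = w − 1/k for k = 1, ..., n, so it is only a necessary check. The union-based condition uses the fact that w is the least element of Q \ ρ(w), so y = w is the best possible witness.

```python
from fractions import Fraction

def in_naive_negative(x, w, n=1000):
    """Sampled test of the naive condition: x + y < 0 for all y < w.

    Only the sample points y = w - 1/k, k = 1..n, are tested (an assumption)."""
    return all(x + (w - Fraction(1, k)) < 0 for k in range(1, n + 1))

def in_negative(x, w):
    """Test of the condition: x + y < 0 for some y in Q \\ rho(w).

    Since Q \\ rho(w) = {y in Q; y >= w}, this holds exactly when x + w < 0."""
    return x + w < 0

w = Fraction(3, 4)
naive_has_endpoint = in_naive_negative(-w, w)
union_has_endpoint = in_negative(-w, w)
```

The naive set contains the endpoint −w and so has a greatest element, whereas the union-based set excludes −w and agrees with ρ(−w).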


15.7.6 DEFINITION: The negative of a real number S is the set {x ∈ Q; ∃y ∈ Q \ S, x + y < 0}.


15.7.7 NOTATION: —S, for a real number S, denotes the negative of S. In other words, 


∀S ∈ R, −S = {x ∈ Q; ∃y ∈ Q \ S, x + y < 0}.


15.7.8 REMARK: Use of the Archimedean property of rational number order for real number negation.
Theorem 15.7.9 gives some basic properties of the negative of a real number to justify the name. Part (i)
resembles Theorem 15.5.12 (iii), which states that S = ⋃{ρ(y); y ∈ S} for all real numbers S.

A notable aspect of the proof for Theorem 15.7.9 (iii) that the negative of a real number is its additive inverse 
is the use of the Archimedean property of the total order on the rational numbers. This property has an 
analytic character since it uses a kind of limit operation. This is surprising because negation of a real number 
seems at first sight to be a purely algebraic operation. 


15.7.9 THEOREM: Some basic algebraic properties of real-number negation. 

(i) ∀S ∈ R, −S = ⋃{ρ(−y); y ∈ Q \ S}.

(ii) ∀S ∈ R, −S ∈ R. [closure of negation]

(iii) ∀S ∈ R, σ(S, −S) = ρ(0) = σ(−S, S). [additive inverse]

(iv) ∀S1, S2 ∈ R, (S1 = S2 ⇔ −S1 = −S2).

(v) ∀S ∈ R, Q \ −S = {w ∈ Q; ρ(−w) ⊆ S}.

(vi) ∀S ∈ R, ∀w ∈ Q, (w ∈ Q \ −S ⇔ ρ(−w) ⊆ S).

(vii) ∀S ∈ R, ∀w ∈ Q, (w ∈ Q \ −S ⇔ (ρ(−w) = S or −w ∈ S)).

(viii) ∀S ∈ R, S = {x ∈ Q; ∃y ∈ Q \ −S, x + y < 0}.

(ix) ∀S ∈ R, S = ⋃{ρ(−w); w ∈ Q \ −S}.

(x) ∀S ∈ R, S = −(−S). [double negation]


PROOF: For part (i), let S be a real number. Then by Definition 15.7.6,


∀z, z ∈ −S ⇔ z ∈ Q and ∃y ∈ Q \ S, z + y < 0
⇔ ∃y ∈ Q \ S, (z ∈ Q and z + y < 0)
⇔ ∃y ∈ Q \ S, z ∈ {x ∈ Q; x + y < 0}
⇔ ∃y ∈ Q \ S, z ∈ ρ(−y)
⇔ z ∈ ⋃{ρ(−y); y ∈ Q \ S}.


Hence −S = ⋃{ρ(−y); y ∈ Q \ S}.

For part (ii), let S be a real number. Then −S = {x ∈ Q; ∃y ∈ Q \ S, x + y < 0}. Clearly −S ⊆ Q. Since
S ≠ Q, it follows that y ∈ Q \ S for some y, and then x = −y − 1 satisfies x ∈ Q and x + y = −1 < 0. So
−S ≠ ∅. Since S ≠ ∅, there is some z ∈ S. Suppose that −z ∈ −S. Then −z + y < 0 for some y ∈ Q \ S. But
then y < z, and so y ∈ S, which is a contradiction. So −z ∉ −S although −z ∈ Q. Therefore −S ≠ Q. In
fact, if z ∈ S and y ∈ Q \ S, then z < y by Theorem 15.4.17, line (15.4.9). So ρ(−y) ⊆ ρ(−z) for all y ∈ Q \ S
by Theorem 15.5.11 (ii). Therefore −S ⊆ ρ(−z). So ρ(−z) is an upper bound for A = {ρ(−y); y ∈ Q \ S}.
Since A ≠ ∅, it follows from Theorem 15.6.10 (i) that −S = ⋃A ∈ R.


For part (iii), let S be a real number. Let z ∈ σ(S, −S). Then z = x + y for some x ∈ S and y ∈ −S. Then
y ∈ Q and ∃w ∈ Q \ S, y + w < 0. But then x < w by Theorem 15.4.17 line (15.4.9). So x + y < x − w < 0.
Therefore z ∈ ρ(0), and so σ(S, −S) ⊆ ρ(0).


To show that σ(S, −S) ⊇ ρ(0), let z ∈ ρ(0). Then z ∈ Q and z < 0. Let ε = −z. By Theorem 15.6.15 (i),
there exist x ∈ S and y ∈ Q \ S such that y − x < ε. Then ρ(−y) ⊆ −S by part (i). By Theorem 15.4.11
line (15.4.6), there exists x′ ∈ S with x < x′. Let y′ = y + x′ − x. Then y′ ∈ Q \ S by Theorem 15.4.15 (i), and
y′ − x′ = y − x < ε. Since y′ > y, it follows that −y′ ∈ ρ(−y) ⊆ −S. Therefore z = −ε < x′ − y′ = x′ + (−y′)
for some x′ ∈ S and −y′ ∈ −S. So z ∈ σ(S, −S) by Theorem 15.4.11 line (15.4.5), because
x′ + (−y′) ∈ σ(S, −S) by the definition of σ, and σ(S, −S) ∈ R by Theorem 15.7.3 (i) and part (ii).
Therefore σ(S, −S) ⊇ ρ(0). Hence σ(S, −S) = ρ(0). Consequently σ(−S, S) = ρ(0) by Theorem 15.7.3 (ii).


For part (iv), let S1, S2 ∈ R. It is obvious that S1 = S2 ⇒ −S1 = −S2. Suppose that −S1 = −S2.
Let x ∈ S1. Then x < −y for some y ∈ Q \ −S1 by part (viii). So x < −y for some y ∈ Q \ −S2. Therefore
x ∈ S2 by part (viii). Hence S1 ⊆ S2. The result follows from the symmetry of the equality relation for sets.


For part (v), let S be a real number. It follows from Definition 15.7.6 that


∀w, w ∈ Q \ −S ⇔ w ∈ Q and w ∉ −S
⇔ w ∈ Q and ¬(∃y ∈ Q \ S, w + y < 0)
⇔ w ∈ Q and ∀y ∈ Q \ S, w + y ≥ 0
⇔ w ∈ Q and ∀y ∈ Q, (y ∉ S ⇒ w + y ≥ 0)
⇔ w ∈ Q and ∀y ∈ Q, (w + y < 0 ⇒ y ∈ S)
⇔ w ∈ Q and ∀y ∈ Q, (y < −w ⇒ y ∈ S)
⇔ w ∈ Q and ρ(−w) ⊆ S.


Hence Q \ −S = {w ∈ Q; ρ(−w) ⊆ S}.

Part (vi) follows directly from part (v). 

Part (vii) follows directly from part (vi) and Theorem 15.5.12 (v). 

For part (viii), let S be a real number. Then it follows from Theorem 15.5.12 (viii) and part (vi) that


∀z, z ∈ S ⇔ z ∈ Q and ∃w ∈ Q, (w < −z and ρ(−w) ⊆ S)
⇔ z ∈ Q and ∃w ∈ Q, (w < −z and w ∈ Q \ −S)
⇔ z ∈ Q and ∃w ∈ Q \ −S, w + z < 0.


Hence S = {z ∈ Q; ∃w ∈ Q \ −S, w + z < 0}.

For part (ix), let S be a real number. Then by part (viii),


∀z, z ∈ S ⇔ z ∈ Q and ∃w ∈ Q \ −S, w + z < 0
⇔ ∃w ∈ Q \ −S, (z ∈ Q and z < −w)
⇔ ∃w ∈ Q \ −S, z ∈ ρ(−w)
⇔ z ∈ ⋃{ρ(−w); w ∈ Q \ −S}.


Hence S = ⋃{ρ(−w); w ∈ Q \ −S}.

Part (x) follows from part (viii) and Definition 15.7.6.


15.7.10 REMARK: Commutative group properties of real number addition and negation. 

It follows from Theorems 15.7.3 and 15.7.9 that the pair (R, σ) is a commutative group by Definitions 17.3.2
and 17.3.23. Since those definitions appear in a later chapter, the term “commutative group” is not used for 
the properties asserted in Theorem 15.7.11. 


15.7.11 THEOREM: The real numbers form a commutative group with the addition operation. 
The pair (ℝ, σ), where σ is the addition operation in Definition 15.7.2, has the following properties.

(i) ∀S₁, S₂ ∈ ℝ, σ(S₁, S₂) ∈ ℝ. [closure]
(ii) ∀S₁, S₂, S₃ ∈ ℝ, σ(σ(S₁, S₂), S₃) = σ(S₁, σ(S₂, S₃)). [associativity]
(iii) ∀S ∈ ℝ, σ(ρ(0), S) = S = σ(S, ρ(0)). [existence of identity]
(iv) ∀S ∈ ℝ, σ(S, −S) = ρ(0) = σ(−S, S). [existence of inverses]
(v) ∀S₁, S₂ ∈ ℝ, σ(S₁, S₂) = σ(S₂, S₁). [commutativity]


PROOF: Part (i) is the same as Theorem 15.7.3 (i).
Part (ii) is the same as Theorem 15.7.3 (iii).
Part (iii) is the same as Theorem 15.7.3 (iv).
Part (iv) is the same as Theorem 15.7.9 (iii).
Part (v) is the same as Theorem 15.7.3 (ii).


15.7.12 REMARK: Using group properties to prove properties of real number addition. 

Many properties of real number addition can be more effortlessly derived from the abstract commutative 
group properties in Theorem 15.7.11. For example, the identity —(—S) = S in Theorem 15.7.9 (x) follows 
abstractly in Theorem 15.7.13 (v) from the associativity and inverse properties in Theorem 15.7.11 by the 
same proof steps as for completely abstract groups. 
The real number system is in practice regarded as no more and no less than an abstract complete ordered
field. Any other properties are regarded as mere artefacts of particular “models” for the abstract system
of axioms for a complete ordered field. Consequently, there is little benefit in proving any properties of the 
Dedekind cut representation beyond what is required to verify the complete ordered field axioms, since all 
other properties will be discarded in applications. 

15.7.13 THEOREM: Some basic properties of the commutative additive group of real numbers. 

(i) ∀S, T ∈ ℝ, (σ(S, T) = ρ(0) ⇒ T = −S). [unique right inverse]
(ii) ∀S, T ∈ ℝ, (σ(T, S) = ρ(0) ⇒ T = −S). [unique left inverse]
(iii) ∀S, T₁, T₂ ∈ ℝ, (σ(S, T₁) = σ(S, T₂) ⇒ T₁ = T₂).
(iv) ∀S, T₁, T₂ ∈ ℝ, (σ(T₁, S) = σ(T₂, S) ⇒ T₁ = T₂).
(v) ∀S ∈ ℝ, −(−S) = S. [double negation]
(vi) ∀A, B, C ∈ ℝ, σ(−A, −B) = −σ(A, B). [commutativity of negation and addition]


PROOF: For part (i), let S, T ∈ ℝ satisfy σ(S, T) = ρ(0). Then σ(−S, σ(S, T)) = σ(−S, ρ(0)) because σ is
a well-defined function. So σ(σ(−S, S), T) = −S by Theorem 15.7.11 (ii, iii). Therefore σ(ρ(0), T) = −S by
Theorem 15.7.11 (iv). Hence T = −S by Theorem 15.7.11 (iii).

For part (ii), let S, T ∈ ℝ satisfy σ(T, S) = ρ(0). Then σ(σ(T, S), −S) = σ(ρ(0), −S) because σ is a
well-defined function. So σ(T, σ(S, −S)) = −S by Theorem 15.7.11 (ii, iii). Therefore σ(T, ρ(0)) = −S by
Theorem 15.7.11 (iv). Hence T = −S by Theorem 15.7.11 (iii).

For part (iii), let S, T₁, T₂ ∈ ℝ satisfy σ(S, T₁) = σ(S, T₂). Then σ(−S, σ(S, T₁)) = σ(−S, σ(S, T₂)). But
σ(−S, σ(S, T₁)) = σ(σ(−S, S), T₁) = σ(ρ(0), T₁) = T₁ follows from Theorem 15.7.11 (ii, iv, iii). Similarly
σ(−S, σ(S, T₂)) = σ(σ(−S, S), T₂) = σ(ρ(0), T₂) = T₂. Hence T₁ = T₂.

Part (iv) may be proved as for part (iii). 


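For part (v), let S ∈ ℝ. Then σ(−(−S), σ(−S, S)) = σ(σ(−(−S), −S), S) = σ(ρ(0), S) = S by Theorem 15.7.11 (ii, iv, iii).
Therefore σ(−(−S), ρ(0)) = S by Theorem 15.7.11 (iv). Hence −(−S) = S by Theorem 15.7.11 (iii).
Alternatively, for part (v), Theorem 15.7.11 (iv) implies both σ(−(−S), −S) = ρ(0) and σ(S, −S) = ρ(0).
Hence −(−S) = S by part (iv).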

For part (vi), let A, B, C ∈ ℝ. Then

    σ(σ(A, B), σ(−A, −B)) = σ(σ(B, A), σ(−A, −B))
                          = σ(B, σ(A, σ(−A, −B)))
                          = σ(B, σ(σ(A, −A), −B))
                          = σ(B, σ(ρ(0), −B))
                          = σ(B, −B)
                          = ρ(0)

by Theorem 15.7.11. Hence σ(−A, −B) = −σ(A, B) by part (i).


15.7.14 REMARK: Density of rational numbers implies lack of supremum for an irrational number. 

It is noteworthy that the proof of Theorem 15.7.15 (iii) uses the analytic-looking property of the real numbers 
in Theorem 15.6.15 (iii), which concerns the “density” of the rational numbers within the real numbers. 
Theorem 15.7.15 (iii) may be interpreted to mean that the supremum of the set of rational numbers which 
are less than a given irrational number is not equal to an embedded rational number. In other words, the
reflection in the origin of the complement of an irrational real number S equals the negative of S. It is
unnecessary to remove the rational supremum from the reflected complement {−y; y ∈ ℚ \ S} because it
does not contain one.


Theorem 15.7.15 implies that the negative of a real number has one of two simple forms, one for embedded
rational numbers, and the other form for other real numbers. Thus a “rational real number” ρ(q), for
some q ∈ ℚ, has a negative of the form {−y; y ∈ ℚ \ ρ(q) and y ≠ q}, whereas an “irrational real number” S
has a negative of the form {−y; y ∈ ℚ \ S}. The difference between these two forms is only a single point
which must be excised in the case of rational real numbers. This is more or less intuitively obvious, but of
course it must be verified, like all intuition. The proof is onerous, but it must be done!
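
As a concrete illustration of the rational case, the single excised point can be seen by writing out the sets for one embedded rational. This is only a sketch: the value q = 2 is an arbitrary example which does not appear in the text, and the only fact used is that the negative of S consists of the rationals x with x < −y for some y ∈ ℚ \ S (Definition 15.7.6).

```latex
% Illustrative example with q = 2 (an assumed value, chosen only for concreteness).
\begin{align*}
\rho(2) &= \{x \in \mathbb{Q};\ x < 2\}, \\
\mathbb{Q} \setminus \rho(2) &= \{y \in \mathbb{Q};\ y \ge 2\}, \\
\{-y;\ y \in \mathbb{Q} \setminus \rho(2)\} &= \{x \in \mathbb{Q};\ x \le -2\}, \\
\{-y;\ y \in \mathbb{Q} \setminus \rho(2) \text{ and } y \ne 2\}
  &= \{x \in \mathbb{Q};\ x < -2\} = \rho(-2).
\end{align*}
% The third set has the maximum element -2, so it is not a Dedekind cut.
% Excising that one point yields the cut \rho(-2), which is the negative of \rho(2).
```

By contrast, for an irrational real number such as S = {x ∈ ℚ; x < 0 or x² < 2}, the complement ℚ \ S has no least element, so the reflected complement has no greatest element and nothing needs to be excised.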
15.7.15 THEOREM: Structure of Dedekind cuts representing negatives of rationals and irrationals.
(i) ∀S ∈ ℝ, −S ⊆ {−y; y ∈ ℚ \ S}.
(ii) ∀q ∈ ℚ, −ρ(q) = {−y; y ∈ ℚ \ ρ(q) and y ≠ q}.
(iii) ∀S ∈ ℝ \ ρ(ℚ), −S = {−y; y ∈ ℚ \ S}.
(iv) ∀S ∈ ℝ, (S ∈ ρ(ℚ) ⇔ −S ≠ {−y; y ∈ ℚ \ S}).


PROOF: For part (i), let S ∈ ℝ. Let x ∈ −S. Then by Definition 15.7.6, x ∈ ℚ and x < −z for some
z ∈ ℚ \ S. So −x ∈ ℚ and −x > z for some z ∈ ℚ \ S. Therefore −x ∈ ℚ \ S by Theorem 15.4.15 (i). Hence
x ∈ {z; −z ∈ ℚ \ S} = {−y; y ∈ ℚ \ S}.

For part (ii), let q ∈ ℚ. Let x ∈ −ρ(q). Then x ∈ {−y; y ∈ ℚ \ ρ(q)} by part (i). By Definition 15.7.6,
x ∈ ℚ and x < −z for some z ∈ ℚ \ ρ(q). But ℚ \ ρ(q) = {z ∈ ℚ; q ≤ z}. So −x > z for some
z ∈ ℚ with z ≥ q. Therefore −x > q, and so −x ≠ q. So x ∈ {−y; y ∈ ℚ \ ρ(q) and y ≠ q}. Hence
−ρ(q) ⊆ {−y; y ∈ ℚ \ ρ(q) and y ≠ q}.


To show the reverse inclusion for part (ii), let x ∈ {−y; y ∈ ℚ \ ρ(q) and y ≠ q}. Then x = −y for some
y ∈ ℚ \ ρ(q) with y ≠ q. So x = −y for some y ∈ ℚ with y ≥ q and y ≠ q. That is, x = −y for some
y ∈ ℚ with y > q. Let y′ = (y + q)/2. Then y′ ∈ ℚ, −y < −y′ and y′ > q, and so −y ∈ ρ(−y′) and
y′ ∈ ℚ \ ρ(q). Therefore x = −y ∈ ⋃{ρ(−y′); y′ ∈ ℚ \ ρ(q)}. So x ∈ −ρ(q) by Theorem 15.7.9 (i). Therefore
{−y; y ∈ ℚ \ ρ(q) and y ≠ q} ⊆ −ρ(q). Hence −ρ(q) = {−y; y ∈ ℚ \ ρ(q) and y ≠ q}.

For part (iii), let S ∈ ℝ \ ρ(ℚ). Then −S ⊆ {−y; y ∈ ℚ \ S} by part (i). Suppose that −S ≠ {−y; y ∈ ℚ \ S}.
Then −y ∉ −S for some y ∈ ℚ \ S. So by Definition 15.7.6, ∀z ∈ ℚ \ S, −y + z ≥ 0 for some y ∈ ℚ \ S. In
other words, ∃y ∈ ℚ \ S, ∀z ∈ ℚ \ S, z ≥ y. For such y, suppose that S < ρ(y). Then by Theorem 15.6.15 (iii),
there exists w ∈ ℚ with S < ρ(w) < ρ(y). So w ∈ ℚ \ S with w < y by Theorem 15.6.5 (ii). This is a contradiction.
So S = ρ(y). Therefore S ∈ ρ(ℚ) = {ρ(q); q ∈ ℚ}. This is a contradiction. Hence −S = {−y; y ∈ ℚ \ S}.


For part (iv), let S ∈ ℝ. By part (iii), if S ∉ ρ(ℚ) then −S = {−y; y ∈ ℚ \ S}. If S ∈ ρ(ℚ) then S = ρ(q)
for some q ∈ ℚ, and so −S = {−y; y ∈ ℚ \ S and y ≠ q} by part (ii). But q ∈ ℚ \ ρ(q) = ℚ \ S since q ∉ ρ(q).
So {−y; y ∈ ℚ \ S and y ≠ q} ≠ {−y; y ∈ ℚ \ S}. Hence −S ≠ {−y; y ∈ ℚ \ S}.


15.7.16 REMARK: Some minor properties of the negation and order of real numbers. 

The assertions in Theorem 15.7.17 are easy to prove and are fairly shallow. However, it is important to 
verify them because they are often required as small steps in later proofs. (For the meaning of 0_ℝ, see
Notation 15.5.8.) 


15.7.17 THEOREM: Some properties of negation and order of real numbers. 

(i) ∀q ∈ ℚ, −ρ(q) = ρ(−q).
(ii) 0_ℝ = −0_ℝ.
(iii) ∀S₁, S₂ ∈ ℝ, (S₁ ≤ S₂ ⇔ −S₂ ≤ −S₁).
(iv) ∀S ∈ ℝ, (S ≥ 0_ℝ ⇔ −S ≤ 0_ℝ).
(v) ∀S ∈ ℝ, (S > 0_ℝ ⇒ S ⊇ ℚ₀⁻).
(vi) ∀S₁, S₂, S₃ ∈ ℝ, (S₁ < S₂ ⇒ σ(S₁, S₃) < σ(S₂, S₃)). [order additivity]


PROOF: For part (i), let q ∈ ℚ. Then −ρ(q) = {x ∈ ℚ; ∃y ∈ ℚ \ ρ(q), y < −x} by Definition 15.7.6. But
ℚ \ ρ(q) = {z ∈ ℚ; z ≥ q}. Therefore −ρ(q) = {x ∈ ℚ; ∃y ∈ ℚ, (q ≤ y and y < −x)} = {x ∈ ℚ; q < −x} =
{x ∈ ℚ; x < −q} = ρ(−q).

For part (ii), 0_ℝ = ρ(0) = ρ(−0) = −ρ(0) = −0_ℝ by part (i).

For part (iii), let S₁, S₂ ∈ ℝ with S₁ ≤ S₂. Let x ∈ −S₂. Then x ∈ ℚ and x < −y for some y ∈ ℚ \ S₂
by Definition 15.7.6. So x ∈ ℚ and x < −y for some y ∈ ℚ \ S₁. Therefore x ∈ −S₁. Hence −S₂ ≤ −S₁.
The converse follows from the fact that “≤” is a total order, and S₁ = S₂ if and only if −S₁ = −S₂ by
Theorem 15.7.9 (iv).

For part (iv), let S ∈ ℝ. Then S ≥ 0_ℝ ⇔ −S ≤ −0_ℝ by part (iii). Hence S ≥ 0_ℝ ⇔ −S ≤ 0_ℝ by part (ii).


For part (v), let S ∈ ℝ with S > 0_ℝ. Then ρ(0) < S. So 0_ℚ ∈ S by Theorem 15.6.13 (iii). Therefore x ∈ S
for all x ≤ 0_ℚ by Theorem 15.4.11 line (15.4.5). In other words, ℚ₀⁻ ⊆ S.

For part (vi), let S₁, S₂, S₃ ∈ ℝ with S₁ < S₂. Then S₁ ≠ S₂. So σ(S₁, S₃) ≠ σ(S₂, S₃) by Theorem 15.7.13 (iv).
Also S₁ ≤ S₂, and so S₁ ⊆ S₂. Suppose z ∈ σ(S₁, S₃). Then z = x + y for some x ∈ S₁ and
y ∈ S₃. But then x ∈ S₂. So z = x + y for some x ∈ S₂ and y ∈ S₃, which implies that z ∈ σ(S₂, S₃).
Therefore σ(S₁, S₃) ⊆ σ(S₂, S₃), and so σ(S₁, S₃) ≤ σ(S₂, S₃). Hence σ(S₁, S₃) < σ(S₂, S₃).


15.8. Real number multiplication and reciprocals 


15.8.1 REMARK: Construction of products of real numbers. 

Multiplication of real numbers is more complicated than addition of real numbers. An attempt to define 
multiplication as for addition gives almost the right answer in the case of mixed non-positive and positive 
real numbers, but in other cases, adjustments must be made using constructions which resemble the negation 
operation in Definition 15.7.6. (For the meanings of ℝ⁺, ℝ₀⁺, ℝ⁻ and ℝ₀⁻, see Notation 15.6.7.)


The reason for the untidy definition of real number multiplication is that the Dedekind cut concept has two 
unnatural asymmetries. First, the semi-infinite interval is arbitrarily chosen to have an infinite negative “tail” 
rather than an infinite positive “tail”. Secondly, when a real number is an embedded rational number, that 
rational number is arbitrarily excluded from the negative tail so as to make the embedding unique. In each 
case in Definition 15.8.4, ad-hoc adjustments are made to ensure that the two “unnatural asymmetries” of 
the representation are preserved, sometimes to ensure that the “tail” points in the right direction, sometimes 
to ensure that the “open boundary condition” is satisfied, and sometimes both kinds of adjustments must 
be made. 


The untidiness of the multiplication of the standard form of Dedekind cuts can be removed by introducing a 
kind of “multiplicative Dedekind cut” as in Definition 15.8.2, which contrasts with the “additive Dedekind 
cut” in Definitions 15.4.5 and 15.4.8. (See Notations 15.1.13 and 15.1.14 for the rational number intervals 
in Definition 15.8.2.) 


15.8.2 DEFINITION: Non-standard definition of multiplicative Dedekind cuts. 
A (finite) multiplicative Dedekind cut is a set U ∈ ℙ(ℚ) \ {∅, ℚ, ℚ₀⁻, ℚ₀⁺} which satisfies

    ∀y ∈ U, ∀λ ∈ ℚ[0, 1], λy ∈ U                (15.8.1)
    ∀y ∈ U, ∃μ ∈ ℚ(1, ∞), μy ∈ U.               (15.8.2)


15.8.3 REMARK: Multiplication of multiplicative Dedekind cuts. 

The reason for the use of multipliers λ and μ in Definition 15.8.2 is to avoid having to give separate definitions
for positive and negative multiplicative Dedekind cuts. Line (15.8.1) means that the set U contains all of
the rational numbers from the origin to each element of the set. Line (15.8.2) is a “boundary condition”
which forces the interval U to be “open” at the end which is furthest from the origin. Intuitively speaking,
each finite multiplicative Dedekind cut corresponds to an interval ℚ[0, r) for a non-negative real number r,
or the interval ℚ(r, 0] for a non-positive real number r.


To multiply two multiplicative Dedekind cuts U₁ and U₂, one defines τ(U₁, U₂) = {yz; y ∈ U₁, z ∈ U₂}.
If U₁ and U₂ correspond intuitively to the intervals [0, r₁) and [0, r₂), then τ(U₁, U₂) corresponds to the
interval [0, r₁r₂). In the negative cases, the result is as expected. So the formulation here is as simple as
for addition in Definition 15.7.2. The fly in the ointment here is that addition works very poorly for these 
finite multiplicative Dedekind cuts. There is no difficulty for non-negative multiplicative Dedekind cuts, but 
the addition of a positive number and a negative number is significantly less simple. 
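
The contrast between multiplication and addition for these cuts can be sketched with a small example. The two intervals below, with bounds 2 and −3, are illustrative assumptions and are not taken from the text.

```latex
% Illustrative multiplicative cuts (the bounds 2 and -3 are assumed example values).
\begin{align*}
U_1 &= \{y \in \mathbb{Q};\ 0 \le y < 2\}, \qquad
U_2 = \{z \in \mathbb{Q};\ -3 < z \le 0\}, \\
\{yz;\ y \in U_1,\ z \in U_2\} &= \{x \in \mathbb{Q};\ -6 < x \le 0\}, \\
\{y + z;\ y \in U_1,\ z \in U_2\} &= \{x \in \mathbb{Q};\ -3 < x < 2\}.
\end{align*}
% The product set is again a multiplicative cut, corresponding to 2 \cdot (-3) = -6.
% The element-wise sum is not: the sum 2 + (-3) = -1 should correspond to
% \{x \in \mathbb{Q};\ -1 < x \le 0\}, so a different construction would be needed.
```

The element-wise sum straddles the origin, which is why mixed-sign addition cannot use the same simple pattern as multiplication in this representation.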


One could consider converting between the additive and multiplicative representations as required before 
each arithmetic operation is applied, but this is even more untidy than either of the two representations. 


It seems that any kind of representation of the real numbers is unsatisfyingly untidy when both addition 
and multiplication operations must be supported. This suggests that there is something intrinsically difficult 
about multiplication in particular, and certainly the combination of addition and multiplication. Since the 
weight of tradition is in favour of the additive representation, that is the representation which continues to 
be pursued here as the basis of Definition 15.8.4. 
15.8.4 DEFINITION: The multiplication operation for real numbers is the map τ : ℝ × ℝ → ℝ defined by

    τ(S₁, S₂) =
        −{−yz; y ∈ S₁, z ∈ S₂}                      if S₁, S₂ ∈ ℝ₀⁻
        {yz; y ∈ S₁, z ∈ ℚ \ S₂}                    if S₁ ∈ ℝ₀⁻ and S₂ ∈ ℝ⁺
        {yz; y ∈ ℚ \ S₁, z ∈ S₂}                    if S₁ ∈ ℝ⁺ and S₂ ∈ ℝ₀⁻
        ℚ⁻ ∪ {yz; y ∈ ℚ₀⁺ ∩ S₁, z ∈ ℚ₀⁺ ∩ S₂}       if S₁, S₂ ∈ ℝ⁺.
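
The four cases can be checked on embedded rational numbers. The following is a sketch under stated assumptions: the numerical values ±2 and ±3 are arbitrary examples, the variables y and z range over ℚ, and the four set constructions are written out in the form which Theorem 15.8.8 verifies.

```latex
% Illustrative instances of the four cases (all numerical values are assumed examples).
\begin{align*}
\tau(\rho(-2), \rho(-3)) &= -\{-yz;\ y < -2,\ z < -3\}
  = -\{x \in \mathbb{Q};\ x < -6\} = -\rho(-6) = \rho(6), \\
\tau(\rho(-2), \rho(3)) &= \{yz;\ y < -2,\ z \ge 3\}
  = \{x \in \mathbb{Q};\ x < -6\} = \rho(-6), \\
\tau(\rho(2), \rho(-3)) &= \{yz;\ y \ge 2,\ z < -3\}
  = \{x \in \mathbb{Q};\ x < -6\} = \rho(-6), \\
\tau(\rho(2), \rho(3)) &= \mathbb{Q}^- \cup \{yz;\ 0 \le y < 2,\ 0 \le z < 3\}
  = \{x \in \mathbb{Q};\ x < 6\} = \rho(6).
\end{align*}
```

In the two mixed-sign cases the closed end of the complement (z ≥ 3 or y ≥ 2) is multiplied by an open interval, so the resulting set has no greatest element, as a Dedekind cut requires.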


15.8.5 REMARK: Asymmetries in the definition of real number multiplication. 

In the first three cases for τ(S₁, S₂) in Definition 15.8.4, the multiplicands S₁ and S₂ are required to lie in
ℝ₀⁻ or ℝ⁺. The reason for this apparent asymmetry is that Dedekind cuts effectively represent semi-infinite
open intervals of the real number line. Therefore the product of two Dedekind cuts must not include the
supremum of the interval if it represents an embedded rational number. To see what can go wrong here,
let S₁ ∈ ℝ₀⁻ and S₂ ∈ ℝ₀⁺. In the special case S₂ = ρ(0), the set ℚ \ S₂ contains the element z = 0_ℚ. So
the product yz for y ∈ S₁ and z ∈ ℚ \ S₂ will equal 0_ℚ if z = 0_ℚ. Then {yz; y ∈ S₁, z ∈ ℚ \ S₂} will
equal {x ∈ ℚ; x ≤ 0}, which is not a Dedekind cut. By contrast, when S₂ ∈ ℝ⁺, the products of elements
of the open semi-infinite interval S₁ with the closed semi-infinite interval ℚ \ S₂ is an open semi-infinite interval,
which is a valid finite Dedekind cut. This kind of asymmetry is an unavoidable result of the somewhat 
unnatural design of the Dedekind cut representation. Both the rational numbers and signed integers have 
similar apparent asymmetries. These are inherited and amplified by any representation for the real numbers 
which is based upon them. 


15.8.6 REMARK: Application of multiplicative Dedekind cuts to multiplication of positive real numbers. 
The construction in the case S₁, S₂ ∈ ℝ⁺ in Definition 15.8.4 is the same as first converting S₁ and S₂ to the
multiplicative style of Dedekind cut as in Definition 15.8.2, then multiplying them according to the simple
rule mentioned in Remark 15.8.3, and finally converting the result back to an additive Dedekind cut. For
non-negative real numbers, the two forward conversions are achieved by truncation to ℚ₀⁺, and the single
reverse conversion is achieved by extending the resulting bounded interval to negative infinity. 


15.8.7 REMARK: Verification that real number multiplication is well defined. 
Theorem 15.8.8 verifies that the multiplication operation in Definition 15.8.4 is well defined.


15.8.8 THEOREM:  Well-definition of the real number multiplication operation. 
(i) {−yz; y ∈ S₁, z ∈ S₂} ∈ ℝ₀⁻ for all S₁, S₂ ∈ ℝ₀⁻.
(ii) −{−yz; y ∈ S₁, z ∈ S₂} ∈ ℝ₀⁺ for all S₁, S₂ ∈ ℝ₀⁻.
(iii) {yz; y ∈ S₁, z ∈ ℚ \ S₂} ∈ ℝ₀⁻ for all S₁ ∈ ℝ₀⁻ and S₂ ∈ ℝ⁺.
(iv) {yz; y ∈ ℚ \ S₁, z ∈ S₂} ∈ ℝ₀⁻ for all S₁ ∈ ℝ⁺ and S₂ ∈ ℝ₀⁻.
(v) ℚ⁻ ∪ {yz; y ∈ ℚ₀⁺ ∩ S₁, z ∈ ℚ₀⁺ ∩ S₂} ∈ ℝ⁺ for all S₁, S₂ ∈ ℝ⁺.
(vi) ∀S₁, S₂ ∈ ℝ, τ(S₁, S₂) ∈ ℝ. [closure of multiplication]


PROOF: For part (i), let S₁, S₂ ∈ ℝ with S₁, S₂ ≤ 0_ℝ. Then by Definition 15.6.2 and Notation 15.5.4,
S₁ ⊆ ρ(0) = {q ∈ ℚ; q < 0}. Similarly, S₂ ⊆ {q ∈ ℚ; q < 0}. Let S₃ = {−yz; y ∈ S₁, z ∈ S₂}. Then
S₃ ⊆ {q ∈ ℚ; q < 0}. Thus S₃ ⊆ ℚ and ∅ ≠ S₃ ≠ ℚ. Let x ∈ S₃. Then x = −yz for some y ∈ S₁
and z ∈ S₂. So y < y′ and z < z′ for some y′ ∈ S₁ and z′ ∈ S₂ by Theorem 15.4.11 line (15.4.6). But
then y′ < 0 and z′ < 0, and so y′z′ < yz. So −y′z′ > −yz and −y′z′ ∈ S₃. Let w ∈ ℚ with w < −y′z′.
Let y″ = −w/z′. This is well defined because z′ ≠ 0_ℚ. Then y″ < y′ because z′ < 0_ℚ. So y″ ∈ S₁ by
Theorem 15.4.11 line (15.4.5). So w = −y″z′ ∈ S₃. Therefore S₃ ∈ ℝ by Definition 15.4.8. The inequality
S₃ ≤ 0_ℝ follows from S₃ ⊆ {q ∈ ℚ; q < 0}.

For part (ii), let S₁, S₂ ∈ ℝ with S₁, S₂ ≤ 0_ℝ. Then −{−yz; y ∈ S₁, z ∈ S₂} ∈ ℝ by part (i) and
Theorem 15.7.9 (ii), and −{−yz; y ∈ S₁, z ∈ S₂} ≥ 0_ℝ follows from part (i) and Theorem 15.7.17 (ii).

For part (iii), let S₁, S₂ ∈ ℝ with S₁ ≤ 0_ℝ and S₂ > 0_ℝ. Then S₁ ⊆ ρ(0) = ℚ⁻, and S₂ ⊇ ℚ₀⁻ by
Theorem 15.7.17 (v). Therefore ℚ \ S₂ ⊆ ℚ \ ℚ₀⁻ = ℚ⁺. So yz < 0_ℚ for all y ∈ S₁ and z ∈ ℚ \ S₂. Let
S₃ = {yz; y ∈ S₁, z ∈ ℚ \ S₂}. Then S₃ ⊆ ℚ⁻ = 0_ℝ. Let w ∈ S₃. Then w = yz for some y ∈ S₁ and
z ∈ ℚ \ S₂. But then y < y′ for some y′ ∈ S₁ by Theorem 15.4.11 line (15.4.6), and y′ < 0_ℚ for such y′.
Let w′ = y′z. Then w′ ∈ S₃ and w′ > w because z > 0_ℚ. Let w″ ∈ ℚ with w″ < w′. Let y″ = w″/z, which
is well defined because z ∈ ℚ⁺. Then y″ < w′/z = y′, and so y″ ∈ S₁. So w″ = y″z ∈ S₃. Therefore S₃ ∈ ℝ
by Definition 15.4.8. Hence S₃ ∈ ℝ₀⁻ because S₃ ⊆ 0_ℝ.


Part (iv) may be proved as for part (iii) with S₁ and S₂ swapped.


For part (v), let S₁, S₂ ∈ ℝ with S₁, S₂ > 0_ℝ. Then S₁ ⊇ ℚ₀⁻ and S₂ ⊇ ℚ₀⁻ by Theorem 15.7.17 (v). So
0 ∈ ℚ₀⁺ ∩ S₁ ≠ ∅ and 0 ∈ ℚ₀⁺ ∩ S₂ ≠ ∅. Let S₃ = ℚ⁻ ∪ {yz; y ∈ ℚ₀⁺ ∩ S₁, z ∈ ℚ₀⁺ ∩ S₂}. Then ℚ₀⁻ ⊆ S₃
because 0 ∈ S₃. Let x ∈ S₃, and let w ∈ ℚ with w < x. If x ≤ 0_ℚ, then w ∈ ℚ⁻ ⊆ S₃. If x > 0_ℚ, then
x ∈ {yz; y ∈ ℚ⁺ ∩ S₁, z ∈ ℚ⁺ ∩ S₂}. If w ≤ 0_ℚ, then w ∈ ℚ₀⁻ ⊆ S₃. If w > 0_ℚ, first note that x = yz for
some y ∈ ℚ⁺ ∩ S₁ and z ∈ ℚ⁺ ∩ S₂. Then y′ = w/z is well defined and y′ ∈ ℚ⁺ because w, z > 0_ℚ. But
y′ < y because w < x. So y′ ∈ S₁. Therefore w = y′z ∈ S₃. So line (15.4.5) of Theorem 15.4.11 is satisfied.
To verify line (15.4.6), let x ∈ S₃. If x < 0_ℚ, then x < w and w ∈ S₃ with w = x/2. If x = 0_ℚ, then x < yz
for any y ∈ ℚ⁺ ∩ S₁ and z ∈ ℚ⁺ ∩ S₂. Therefore x < w for some w ∈ S₃. If x > 0_ℚ, then x = yz for some
y ∈ ℚ⁺ ∩ S₁ and z ∈ ℚ⁺ ∩ S₂. Since S₁ satisfies line (15.4.6), there exists y′ ∈ S₁ with y < y′. Let w = y′z.
Then x < w because z > 0_ℚ. So S₃ satisfies line (15.4.6). Therefore S₃ ∈ ℝ by Theorem 15.4.11. Hence
S₃ ∈ ℝ⁺ by Theorem 15.6.13 (i) because 0 ∈ S₃.


Part (vi) follows from parts (ii), (iii), (iv) and (v) and Definition 15.8.4. 


15.8.9 THEOREM: Commutativity of negation and multiplication of real numbers. 
(i) ∀S₁ ∈ ℝ₀⁻, ∀S₂ ∈ ℝ⁺, τ(S₁, S₂) = {−yz; y ∈ S₁, z ∈ −S₂}.
(ii) ∀S₁ ∈ ℝ⁺, ∀S₂ ∈ ℝ₀⁻, τ(S₁, S₂) = {−yz; y ∈ −S₁, z ∈ S₂}.
(iii) ∀S ∈ ℝ, τ(S, ρ(0)) = ρ(0) = τ(ρ(0), S).
(iv) ∀S₁, S₂ ∈ ℝ, −τ(S₁, S₂) = τ(−S₁, S₂) = τ(S₁, −S₂). [commutativity of negation and multiplication]


PROOF: For part (i), let S₁ ∈ ℝ₀⁻ and S₂ ∈ ℝ⁺. Then τ(S₁, S₂) = {yz; y ∈ S₁, z ∈ ℚ \ S₂} follows from
Definition 15.8.4. Let w ∈ ℚ. Then w ∈ ℚ \ S₂ ⇔ (ρ(−w) = −S₂ or −w ∈ −S₂) by Theorem 15.7.9 (vii)
and Theorem 15.7.13 (v). So τ(S₁, S₂) = {−yz; y ∈ S₁ and z ∈ ℚ and (ρ(z) = −S₂ or z ∈ −S₂)}. Suppose
that y ∈ S₁ and ρ(z) = −S₂. Then y < y′ for some y′ ∈ S₁ by Theorem 15.4.11 line (15.4.6), and then
z′ = zy/y′ < z because y, z ∈ ℚ⁻. So z′ ∈ ρ(z) = −S₂, and so −yz = −y′z′ ∈ {−yz; y ∈ S₁, z ∈ −S₂}.
Hence τ(S₁, S₂) = {−yz; y ∈ S₁, z ∈ −S₂}.


Part (ii) follows from part (i) by swapping S₁ and S₂.


For part (iii), let S ∈ ℝ. If S ∈ ℝ₀⁻, then τ(S, ρ(0)) = −{−yz; y ∈ S, z ∈ ρ(0)} by Definition 15.8.4. But
{−yz; y ∈ S, z ∈ ρ(0)} = {−yz; y ∈ S, z ∈ ℚ⁻} = ℚ⁻ because S ⊆ ρ(0) = ℚ⁻, and any element x ∈ ℚ⁻
may be written as −yz for any y ∈ S ⊆ ℚ⁻ by letting z = −x/y. Therefore τ(S, ρ(0)) = ρ(0). Now suppose
that S ∈ ℝ⁺. Then τ(S, ρ(0)) = {yz; y ∈ ℚ \ S, z ∈ ρ(0)} = {yz; y ∈ ℚ \ S, z ∈ ℚ⁻}. But ℚ \ S ⊆ ℚ⁺. So
τ(S, ρ(0)) = ℚ⁻ because every x ∈ ℚ⁻ may be written as yz for any y ∈ ℚ⁺ by setting z = x/y. Therefore
τ(S, ρ(0)) = ρ(0). Similarly, ρ(0) = τ(ρ(0), S).

For part (iv), let S₁, S₂ ∈ ℝ. If S₁ = ρ(0) or S₂ = ρ(0), then −τ(S₁, S₂) = −ρ(0) = ρ(0) = τ(−S₁, S₂) =
τ(S₁, −S₂) by part (iii) and Theorem 15.7.17 (ii). If S₁, S₂ ∈ ℝ⁻, then −τ(S₁, S₂) = {−yz; y ∈ S₁, z ∈ S₂}
by Definition 15.8.4 and Theorem 15.7.13 (v), and τ(−S₁, S₂) = {−yz; y ∈ S₁, z ∈ S₂} by part (ii) and
Theorem 15.7.13 (v). So −τ(S₁, S₂) = τ(−S₁, S₂). Similarly, −τ(S₁, S₂) = τ(S₁, −S₂).


15.8.10 THEOREM: Real number multiplication commutativity, associativity and identity. 


(i) ∀S₁, S₂ ∈ ℝ, τ(S₁, S₂) = τ(S₂, S₁). [commutativity of multiplication]
(ii) ∀S₁, S₂, S₃ ∈ ℝ, τ(τ(S₁, S₂), S₃) = τ(S₁, τ(S₂, S₃)). [associativity of multiplication]
(iii) ∀S ∈ ℝ, τ(S, ρ(1)) = S = τ(ρ(1), S). [multiplicative identity]


PROOF: For part (i), let S₁, S₂ ∈ ℝ. If S₁, S₂ ∈ ℝ₀⁻ or S₁, S₂ ∈ ℝ⁺, the commutativity of the product of
S₁ and S₂ follows from Definition 15.8.4 and the commutativity of multiplication of rational numbers. If
S₁ ∈ ℝ₀⁻ and S₂ ∈ ℝ⁺, then τ(S₁, S₂) = {yz; y ∈ S₁, z ∈ ℚ \ S₂} = {zy; z ∈ ℚ \ S₂, y ∈ S₁} = τ(S₂, S₁).
The result for S₁ ∈ ℝ⁺ and S₂ ∈ ℝ₀⁻ follows similarly. Hence the result is valid for all S₁, S₂ ∈ ℝ.
For part (ii), let S₁, S₂, S₃ ∈ ℝ. First consider the case S₁, S₂, S₃ ∈ ℝ₀⁻. Then

    τ(τ(S₁, S₂), S₃) = τ(−{−xy; x ∈ S₁, y ∈ S₂}, S₃)
                     = −τ({−xy; x ∈ S₁, y ∈ S₂}, S₃)
                     = −(−{xyz; x ∈ S₁, y ∈ S₂, z ∈ S₃})
                     = {xyz; x ∈ S₁, y ∈ S₂, z ∈ S₃}

by Definition 15.8.4 and Theorems 15.8.9 (iv) and 15.7.13 (v). Similarly, τ(S₁, τ(S₂, S₃)) = {xyz; x ∈ S₁,
y ∈ S₂, z ∈ S₃}. So τ(τ(S₁, S₂), S₃) = τ(S₁, τ(S₂, S₃)). For the case S₁ ∈ ℝ⁺ and S₂, S₃ ∈ ℝ₀⁻, one
obtains τ(τ(S₁, S₂), S₃) = τ(−τ(−S₁, S₂), S₃) = −τ(τ(−S₁, S₂), S₃) by Theorem 15.8.9 (iv). But then
−S₁, S₂, S₃ ∈ ℝ₀⁻. So by the first case, τ(τ(−S₁, S₂), S₃) = τ(−S₁, τ(S₂, S₃)), which equals −τ(S₁, τ(S₂, S₃))
by Theorem 15.8.9 (iv). Therefore τ(τ(S₁, S₂), S₃) = τ(S₁, τ(S₂, S₃)). The other six cases for the signs of
S₁, S₂ and S₃ follow similarly.


For part (iii), let S ∈ ℝ. First consider the case S ∈ ℝ₀⁻. Then τ(S, ρ(1)) = {−yz; y ∈ S, z ∈ −ρ(1)}
by Theorem 15.8.9 (i). But −ρ(1) = ρ(−1) by Theorem 15.7.17 (i), and ρ(−1) = {z ∈ ℚ; z < −1}. So
{−yz; y ∈ S, z ∈ −ρ(1)} ⊆ {y; y ∈ S} = S. Let y ∈ S. Then y < y′ for some y′ ∈ S. Let z′ = −y/y′.
Then z′ < −1_ℚ. So z′ ∈ ρ(−1) and −y′z′ = y ∈ {−yz; y ∈ S, z ∈ −ρ(1)}. So τ(S, ρ(1)) = S. Similarly,
S = τ(ρ(1), S). The case S ∈ ℝ⁺ may be converted to the case S ∈ ℝ⁻ by applying Theorem 15.8.9 (iv).


15.8.11 DEFINITION: The reciprocal of a non-zero real number S is the set {x ∈ ℚ⁻; ∃y ∈ ℚ⁻ \ S, x < 1/y}
for S ∈ ℝ⁻, and is the set ℚ₀⁻ ∪ {x ∈ ℚ⁺; ∃y ∈ ℚ \ S, x < 1/y} for S ∈ ℝ⁺.


15.8.12 NOTATION: S⁻¹, for non-zero real numbers S, denotes the reciprocal of S. In other words,

    S⁻¹ =
        {x ∈ ℚ⁻; ∃y ∈ ℚ⁻ \ S, x < 1/y}              if S ∈ ℝ⁻
        ℚ₀⁻ ∪ {x ∈ ℚ⁺; ∃y ∈ ℚ \ S, x < 1/y}         if S ∈ ℝ⁺.
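
The two cases of the reciprocal can be illustrated on embedded rational numbers. This is a sketch only: the values −2 and 2 are assumed examples, and the sets are those of Definition 15.8.11 with ℚ⁻ \ ρ(−2) and ℚ \ ρ(2) written out explicitly.

```latex
% Illustrative reciprocals of embedded rationals (the values -2 and 2 are assumed examples).
\begin{align*}
\rho(-2)^{-1}
  &= \{x \in \mathbb{Q}^-;\ \exists y \in \mathbb{Q},\ (-2 \le y < 0 \text{ and } x < 1/y)\}
   = \{x \in \mathbb{Q};\ x < -\tfrac12\} = \rho(-\tfrac12), \\
\rho(2)^{-1}
  &= \mathbb{Q}_0^- \cup \{x \in \mathbb{Q}^+;\ \exists y \in \mathbb{Q},\ (y \ge 2 \text{ and } x < 1/y)\}
   = \{x \in \mathbb{Q};\ x < \tfrac12\} = \rho(\tfrac12).
\end{align*}
```

In both cases the largest value of 1/y is attained at the least element of the complement (y = −2 or y = 2), and the strict inequality x < 1/y keeps the resulting interval open at its upper end.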


15.8.13 REMARK: Analysis-like proof that the reciprocal of a real number is its multiplicative inverse. 
The second half of the proof of Theorem 15.8.14 (xiii) uses an analysis-like argument which is based on 
Theorem 15.6.15 (i), which is itself based on the Archimedean property of rational number order. A similar 
situation is encountered in the corresponding proof that the negative of a real number is its additive inverse 
in Theorem 15.7.9 (iii), as mentioned in Remark 15.7.8. 


15.8.14 THEOREM: Basic algebraic properties of the real number multiplicative inverse. 
(i) ∀S ∈ ℝ⁻, S⁻¹ = ⋃{ρ(1/y); y ∈ ℚ⁻ \ S}.
(ii) ∀S ∈ ℝ⁺, S⁻¹ = ⋃{ρ(1/y); y ∈ ℚ \ S}.
(iii) ∀S ∈ ℝ⁻, S⁻¹ ∈ ℝ⁻.
(iv) ∀S ∈ ℝ⁺, S⁻¹ ∈ ℝ⁺.
(v) ∀S ∈ ℝ \ {0_ℝ}, S⁻¹ ∈ ℝ \ {0_ℝ}. [closure of reciprocal]
(vi) ∀S ∈ ℝ⁻, ℚ₀⁻ ∪ {−1/z; z ∈ S} = ℚ \ {−1/z; z ∈ ℚ⁻ \ S} = ⋃{ρ(−1/z); z ∈ S} ∈ ℝ⁺.
(vii) ∀S ∈ ℝ⁺, {−1/z; z ∈ S ∩ ℚ⁺} = ℚ⁻ \ {−1/z; z ∈ ℚ \ S} = ⋃{ρ(−1/z); z ∈ S ∩ ℚ⁺} ∈ ℝ⁻.
(viii) ∀S ∈ ℝ⁻, −(S⁻¹) = ℚ₀⁻ ∪ {−1/z; z ∈ S}.
(ix) ∀S ∈ ℝ⁻, (−S)⁻¹ = ℚ₀⁻ ∪ {−1/z; z ∈ S}.
(x) ∀S ∈ ℝ⁺, −(S⁻¹) = {−1/z; z ∈ S ∩ ℚ⁺}.
(xi) ∀S ∈ ℝ⁺, (−S)⁻¹ = {−1/z; z ∈ S ∩ ℚ⁺}.
(xii) ∀S ∈ ℝ \ {0_ℝ}, (−S)⁻¹ = −(S⁻¹). [commutativity of negative and reciprocal]
(xiii) ∀S ∈ ℝ \ {0_ℝ}, τ(S, S⁻¹) = ρ(1) = τ(S⁻¹, S). [the reciprocal is the multiplicative inverse]
(xiv) ∀S, T₁, T₂ ∈ ℝ, τ(S, σ(T₁, T₂)) = σ(τ(S, T₁), τ(S, T₂)). [distributivity]


PROOF: For part (i), let S ∈ ℝ⁻. Then x ∈ S⁻¹ if and only if x ∈ ℚ⁻ and x < 1/y for some y ∈ ℚ⁻ \ S. In
other words, x ∈ S⁻¹ if and only if x ∈ {x ∈ ℚ⁻; x < 1/y} = ρ(1/y) for some y ∈ ℚ⁻ \ S because y ∈ ℚ⁻.
Therefore x ∈ S⁻¹ if and only if x ∈ ⋃{ρ(1/y); y ∈ ℚ⁻ \ S}. Hence S⁻¹ = ⋃{ρ(1/y); y ∈ ℚ⁻ \ S}.
For part (ii), let S ∈ ℝ⁺. Then x ∈ S⁻¹ if and only if x ∈ ℚ₀⁻ or x ∈ {x ∈ ℚ⁺; x < 1/y} for some y ∈ ℚ \ S.
But {x ∈ ℚ⁺; x < 1/y} = ℚ⁺ ∩ ρ(1/y) because y ∈ ℚ⁺. Therefore x ∈ S⁻¹ if and only if x ∈ ρ(1/y) for
some y ∈ ℚ \ S. Hence S⁻¹ = ⋃{ρ(1/y); y ∈ ℚ \ S}.
For part (iii), let S ∈ ℝ⁻. Then S⁻¹ ⊆ ℚ⁻ by Definition 15.8.11. Let X = {z ∈ ℚ; 1/z ∈ ℚ⁻ \ S} =
{1/y; y ∈ ℚ⁻ \ S}. Then S⁻¹ = ⋃{ρ(y); y ∈ X} by part (i). Therefore S⁻¹ ∈ ℝ by Theorem 15.5.12 (ix)
because X ≠ ∅. Hence S⁻¹ ∈ ℝ⁻ because S⁻¹ ⊊ ℚ⁻ = ρ(0).
For part (iv), let S ∈ ℝ⁺. Let X = {z ∈ ℚ; 1/z ∈ ℚ \ S} = {1/y; y ∈ ℚ \ S}. Then X ≠ ∅ and
S⁻¹ = ⋃{ρ(y); y ∈ X} ≠ ℚ by part (ii). So S⁻¹ ∈ ℝ by Theorem 15.5.12 (ix). Hence S⁻¹ ∈ ℝ⁺ because
S⁻¹ ⊋ ℚ⁻ = ρ(0).
Part (v) follows from parts (iii) and (iv).
For part (vi), let S ∈ ℝ⁻. Then x ∈ ℚ₀⁻ ∪ {−1/z; z ∈ S} if and only if x ∈ ℚ and x ∉ ℚ⁺ \ {−1/z; z ∈ S}.
But ℚ⁺ \ {−1/z; z ∈ S} = {−1/z; z ∈ ℚ⁻ \ S} by Theorem 10.6.7 (v′) because the map z ↦ −1/z from ℚ⁻
to ℚ⁺ is a bijection. Hence ℚ₀⁻ ∪ {−1/z; z ∈ S} = ℚ \ {−1/z; z ∈ ℚ⁻ \ S}. Now let A = ℚ₀⁻ ∪ {−1/z; z ∈ S}
and B = ⋃{ρ(−1/z); z ∈ S}. Let y ∈ A. Then y ∈ ℚ₀⁻ or y = −1/z for some z ∈ S. If y ∈ ℚ₀⁻, then
y ∈ ρ(−1/z) for some z ∈ S because S ∩ ℚ⁻ ≠ ∅. So y ∈ B. If y ∉ ℚ₀⁻, then y ∈ ℚ⁺ and y = −1/z for
some z ∈ S. There exists z′ ∈ S with z < z′ by Theorem 15.4.11 line (15.4.6). Then −1/z ∈ ρ(−1/z′)
because z′ ∈ ℚ⁻ and −1/z < −1/z′. So y ∈ B. Therefore A ⊆ B. Let y ∈ B. Then y < −1/z for some
z ∈ S. If y ∈ ℚ₀⁻, then y ∈ A. If y ∉ ℚ₀⁻, then y ∈ ℚ⁺ and y < −1/z for some z ∈ S. Let z′ = −1/y.
Then z′ < z. But then y = −1/z′ and z′ ∈ S. So y ∈ A. Therefore A = B. So ℚ₀⁻ ∪ {−1/z; z ∈ S} ∈ ℝ
because ⋃{ρ(−1/z); z ∈ S} ∈ ℝ by Theorem 15.5.12 (ix). Hence ⋃{ρ(−1/z); z ∈ S} ∈ ℝ⁺ because
⋃{ρ(−1/z); z ∈ S} ⊇ ℚ₀⁻.
For part (vii), let S ∈ ℝ⁺. Then x ∈ {−1/z; z ∈ S ∩ ℚ⁺} if and only if x ∈ ℚ⁻ and x ∉ ℚ⁻ \ {−1/z;
z ∈ S ∩ ℚ⁺}. But ℚ⁻ \ {−1/z; z ∈ S ∩ ℚ⁺} = {−1/z; z ∈ ℚ⁺ \ S} by Theorem 10.6.7 (v′) because the map
z ↦ −1/z from ℚ⁺ to ℚ⁻ is a bijection. Hence {−1/z; z ∈ S ∩ ℚ⁺} = ℚ⁻ \ {−1/z; z ∈ ℚ \ S}. Now let
A = {−1/z; z ∈ S ∩ ℚ⁺} and B = ⋃{ρ(−1/z); z ∈ S ∩ ℚ⁺}. Let y ∈ A. Then y ∈ ℚ⁻ and y = −1/z for
some z ∈ S ∩ ℚ⁺. There exists z′ ∈ S with z < z′ by Theorem 15.4.11 line (15.4.6). Then −1/z ∈ ρ(−1/z′)
because −1/z < −1/z′. So y ∈ B. Therefore A ⊆ B. Let y ∈ B. Then y ∈ ℚ⁻ and y < −1/z for some
z ∈ S ∩ ℚ⁺. Let z′ = −1/y. Then z′ < z. So z′ ∈ S ∩ ℚ⁺. Therefore y = −1/z′ for some z′ ∈ S ∩ ℚ⁺.
So y ∈ A. Therefore B ⊆ A. Hence {−1/z; z ∈ S ∩ ℚ⁺} = ⋃{ρ(−1/z); z ∈ S ∩ ℚ⁺}. But it follows from
Theorem 15.5.12 (ix) that ⋃{ρ(−1/z); z ∈ S ∩ ℚ⁺} ∈ ℝ. Hence ⋃{ρ(−1/z); z ∈ S ∩ ℚ⁺} ∈ ℝ⁻ because
⋃{ρ(−1/z); z ∈ S ∩ ℚ⁺} ⊊ ℚ⁻.
For part (viii), let S ∈ ℝ⁻. Let T = ℚ₀⁻ ∪ {−1/z; z ∈ S}. Then T ∈ ℝ⁺ by part (vi). So −T = ⋃{ρ(−x);
x ∈ ℚ \ T} by Theorem 15.7.9 (i). But ℚ \ T = {−1/z; z ∈ ℚ⁻ \ S} because S ⊆ ℚ⁻. So −T = ⋃{ρ(−x);
∃z ∈ ℚ⁻ \ S, x = −1/z} = ⋃{ρ(x); ∃z ∈ ℚ⁻ \ S, x = 1/z} = ⋃{ρ(1/z); z ∈ ℚ⁻ \ S}. Therefore −T = S⁻¹
by part (i). Hence −(S⁻¹) = T by Theorem 15.7.9 (x).
For part (ix), let S ∈ ℝ⁻. Then −S = ⋃{ρ(−y); y ∈ ℚ \ S} by Theorem 15.7.9 (i). By Definition 15.8.11,
(−S)⁻¹ = ℚ₀⁻ ∪ {x ∈ ℚ⁺; ∃y ∈ ℚ \ −S, x < 1/y}. But y ∈ ℚ \ −S if and only if ρ(−y) ⊆ S, by
Theorem 15.7.9 (vi), and x < 1/y if and only if −1/x < −y because x, y ∈ ℚ⁺. So ∃y ∈ ℚ \ −S, x < 1/y
is true if and only if −1/x ∈ ρ(−y) for some y ∈ ℚ⁺ with ρ(−y) ⊆ S, and this is true if and only if
−1/x ∈ ⋃{ρ(y); y ∈ ℚ and ρ(y) ⊆ S} because S ∈ ℝ⁻. But this is true if and only if −1/x ∈ S by
Theorem 15.5.12 (vi). So (−S)⁻¹ = ℚ₀⁻ ∪ {x ∈ ℚ⁺; −1/x ∈ S}. Hence (−S)⁻¹ = ℚ₀⁻ ∪ {−1/z; z ∈ S}.
For part (x), let S ∈ ℝ⁺. Let T = {−1/z; z ∈ S ∩ ℚ⁺}. Then T ∈ ℝ⁻ by part (vii), and −T = {x ∈ ℚ;
∃y ∈ ℚ \ T, x < −y} ∈ ℝ⁺ by Definition 15.7.6 and Theorem 15.7.17 (i). But ℚ \ T = {−1/z; z ∈ ℚ \ S} ∪ ℚ₀⁺.
So −T = {x ∈ ℚ; ∃z ∈ ℚ \ S, x < 1/z} = ⋃{ρ(1/z); z ∈ ℚ \ S} = S⁻¹ by part (ii). Hence −(S⁻¹) = T by
Theorem 15.7.13 (v).
For part (xi), let S ∈ ℝ⁺. Then −S = {x ∈ ℚ; ∃y ∈ ℚ \ S, x < −y} ∈ ℝ⁻ by Definition 15.7.6 and
Theorem 15.7.17 (iv). Therefore (−S)⁻¹ = {x ∈ ℚ⁻; ∃y ∈ ℚ⁻ \ −S, x < 1/y} by Definition 15.8.11. But
y ∈ ℚ⁻ \ −S if and only if y ∈ ℚ⁻ and ρ(−y) ⊆ S by Theorem 15.7.9 (vi). Consequently (−S)⁻¹ =
{x ∈ ℚ⁻; ∃y ∈ ℚ⁻, (ρ(−y) ⊆ S and x < 1/y)}. But x < 1/y if and only if −1/x < −y since x, y ∈ ℚ⁻.
Therefore (−S)⁻¹ = {x ∈ ℚ⁻; ∃y ∈ ℚ⁻, (−1/x < −y and ρ(−y) ⊆ S)} = {x ∈ ℚ⁻; −1/x ∈ S} by
Theorem 15.5.12 (viii). But {x ∈ ℚ⁻; −1/x ∈ S} = {x ∈ ℚ⁻; −1/x ∈ S ∩ ℚ⁺} = {x ∈ ℚ; −1/x ∈ S ∩ ℚ⁺}.
So (−S)⁻¹ = {−1/z; z ∈ S ∩ ℚ⁺}.
Part (xii) follows from parts (viii), (ix), (x) and (xi). 


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




For part (xiii), let S ∈ ℝ⁻. Then S⁻¹ ∈ ℝ⁻ by part (iii), and so τ(S, S⁻¹) ∈ ℝ₀⁺ by Theorem 15.8.8 (ii) and
Definition 15.8.4. Let T = {−yz; y ∈ S, z ∈ S⁻¹}. Let x ∈ T. Then x = −yz for some y ∈ S and z ∈ S⁻¹.
But z ∈ S⁻¹ if and only if z ∈ ℚ⁻ and zw > 1 for some w ∈ ℚ⁻ \ S, by Definition 15.8.11. From y ∈ S and
w ∈ ℚ⁻ \ S, it follows that y < w, by Theorem 15.4.17 line (15.4.9). So zy > zw > 1. Therefore −yz < −1.
So x ∈ ρ(−1). Therefore T ⊆ ρ(−1).


Now suppose that x ∈ ρ(−1). Then x ∈ ℚ and x < −1. Since S ∈ ℝ⁻, ℚ⁻ \ S ≠ ∅. Let w₀ ∈ ℚ⁻ \ S.
Let ε = (x + 1)w₀. Then ε ∈ ℚ⁺ because 1 + x ∈ ℚ⁻ and w₀ ∈ ℚ⁻. So by Theorem 15.6.15 (i), there
exist y ∈ S and w ∈ ℚ \ S such that w − y < ε. Let w′ = min(w, w₀). Then w′ ∈ ℚ⁻ \ S and w′ − y ≤
w − y < ε. So y/w′ − 1 = (w′ − y)/(−w′) < ε/(−w′) ≤ ε/(−w₀) = −x − 1. Therefore −y/w′ > x.
But −y/w′ = (w′ − y)/w′ − 1 < −1 because y < w′ and w′ < 0. Let z = −x/y. Then z ∈ ℚ⁻ and
zw′ = −xw′/y = x/(−y/w′) > 1, where w′ ∈ ℚ⁻ \ S. So x ∈ T. Therefore ρ(−1) ⊆ T, and so T = ρ(−1).
However, τ(S, S⁻¹) = −T by Definition 15.8.4. Hence τ(S, S⁻¹) = −ρ(−1) = ρ(1) by Theorem 15.7.17 (i).
The equality ρ(1) = τ(S⁻¹, S) then follows from Theorem 15.8.10 (i). In the case S ∈ ℝ⁺, part (xii) and
Theorem 15.8.9 (iv) may be applied to obtain τ(S, S⁻¹) = τ((−S), −(S⁻¹)) = τ((−S), (−S)⁻¹) = ρ(1), and
hence ρ(1) = τ(S⁻¹, S).

For part (xiv), let T₁, T₂ ∈ ℝ. Then σ(T₁, T₂) = {y + z; y ∈ T₁, z ∈ T₂}. Suppose that S, T₁, T₂ ∈ ℝ₀⁻. Then
σ(T₁, T₂) ∈ ℝ₀⁻. So τ(S, σ(T₁, T₂)) = −{−xw; x ∈ S, w ∈ σ(T₁, T₂)} = −{−x(y + z); x ∈ S, y ∈ T₁, z ∈ T₂}
by Definition 15.8.4. Similarly, τ(S, T₁) = −{−xy; x ∈ S, y ∈ T₁} and τ(S, T₂) = −{−xz; x ∈ S, z ∈ T₂}. So
σ(τ(S, T₁), τ(S, T₂)) = −σ({−xy; x ∈ S, y ∈ T₁}, {−xz; x ∈ S, z ∈ T₂}) by Theorem 15.7.13 (vi). Therefore
σ(τ(S, T₁), τ(S, T₂)) = −{−x₁y − x₂z; x₁, x₂ ∈ S, y ∈ T₁, z ∈ T₂} = −{−x(y + z); x ∈ S, y ∈ T₁, z ∈ T₂},
which equals τ(S, σ(T₁, T₂)).

Now for part (xiv), let S ∈ ℝ₀⁺ and T₁, T₂ ∈ ℝ₀⁻. Then τ(−S, σ(T₁, T₂)) = σ(τ(−S, T₁), τ(−S, T₂)) by the case
just considered because −S ∈ ℝ₀⁻. But τ(−S, σ(T₁, T₂)) = −τ(S, σ(T₁, T₂)) by Theorem 15.8.9 (iv). Similarly
τ(−S, T₁) = −τ(S, T₁) and τ(−S, T₂) = −τ(S, T₂). So σ(τ(−S, T₁), τ(−S, T₂)) = −σ(τ(S, T₁), τ(S, T₂))
by Theorem 15.7.13 (vi). Therefore τ(S, σ(T₁, T₂)) = σ(τ(S, T₁), τ(S, T₂)) by Theorem 15.7.13 (v). The
cases with T₁, T₂ ∈ ℝ₀⁺ follow similarly. The remaining cases have mixed sign for the terms T₁ and T₂.
Suppose that T₁ ∈ ℝ₀⁻ and T₂ ∈ ℝ₀⁺. Then either σ(T₁, T₂) ∈ ℝ₀⁻ or σ(T₁, T₂) ∈ ℝ₀⁺. Suppose that
σ(T₁, T₂) ∈ ℝ₀⁻. Then τ(S, σ(σ(T₁, T₂), −T₂)) = σ(τ(S, σ(T₁, T₂)), τ(S, −T₂)) because −T₂ ∈ ℝ₀⁻. But
τ(S, σ(σ(T₁, T₂), −T₂)) = τ(S, T₁). So

    σ(τ(S, T₁), τ(S, T₂)) = σ(σ(τ(S, σ(T₁, T₂)), τ(S, −T₂)), τ(S, T₂))
                          = σ(τ(S, σ(T₁, T₂)), σ(τ(S, −T₂), τ(S, T₂)))
                          = σ(τ(S, σ(T₁, T₂)), σ(−τ(S, T₂), τ(S, T₂)))
                          = σ(τ(S, σ(T₁, T₂)), ρ(0))
                          = τ(S, σ(T₁, T₂)).

The other mixed-sign cases for T₁ and T₂ follow similarly.


15.8.15 THEOREM: Order properties of the real number multiplication operation.
(i) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ \ {0_ℝ}, (τ(S₁, S₃) = τ(S₂, S₃) ⇒ S₁ = S₂).

(ii) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ \ {0_ℝ}, (S₁ ≠ S₂ ⇒ τ(S₁, S₃) ≠ τ(S₂, S₃)).

(iii) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ⁺, (S₁ ≤ S₂ ⇒ τ(S₁, S₃) ≤ τ(S₂, S₃)).

(iv) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ⁺, (S₁ < S₂ ⇒ τ(S₁, S₃) < τ(S₂, S₃)).

PROOF: For part (i), let S₁, S₂ ∈ ℝ and S₃ ∈ ℝ \ {0_ℝ} with τ(S₁, S₃) = τ(S₂, S₃). Then S₁ = τ(S₁, ρ(1)) =
τ(S₁, τ(S₃, S₃⁻¹)) = τ(τ(S₁, S₃), S₃⁻¹) = τ(τ(S₂, S₃), S₃⁻¹) = τ(S₂, τ(S₃, S₃⁻¹)) = τ(S₂, ρ(1)) = S₂.

Part (ii) follows logically from part (i).

For part (iii), let S₁, S₂ ∈ ℝ₀⁻ with S₁ ≤ S₂, and S₃ ∈ ℝ⁺. Let x ∈ τ(S₁, S₃). Then x = yz for some
y ∈ S₁ and z ∈ ℚ \ S₃. So x = yz for some y ∈ S₂ and z ∈ ℚ \ S₃ because S₁ ⊆ S₂. So x ∈ τ(S₂, S₃).
Therefore τ(S₁, S₃) ⊆ τ(S₂, S₃), and so τ(S₁, S₃) ≤ τ(S₂, S₃). Suppose now that S₁ ∈ ℝ₀⁻ and S₂ ∈ ℝ⁺.
Then τ(S₁, S₃) ⊆ ρ(0) by Theorem 15.7.15 (iii) and ρ(0) ⊆ τ(S₂, S₃) by Theorem 15.7.15 (v). So τ(S₁, S₃) ⊆
τ(S₂, S₃), and so τ(S₁, S₃) ≤ τ(S₂, S₃). Finally suppose that S₁, S₂ ∈ ℝ⁺ with S₁ ≤ S₂. Let x ∈ τ(S₁, S₃).
Then x ∈ ℚ⁻ or x = yz for some y ∈ ℚ₀⁺ ∩ S₁ and z ∈ ℚ₀⁺ ∩ S₃. Therefore x ∈ ℚ⁻ or x = yz for some
y ∈ ℚ₀⁺ ∩ S₂ and z ∈ ℚ₀⁺ ∩ S₃ because S₁ ⊆ S₂. So x ∈ τ(S₂, S₃). Therefore τ(S₁, S₃) ⊆ τ(S₂, S₃). Thus
τ(S₁, S₃) ≤ τ(S₂, S₃) in all cases.

Part (iv) follows from parts (iii) and (ii).


15.9. Axiomatic system of real numbers 


15.9.1 REMARK: The real number system is a complete ordered field.

The properties required for a complete ordered field in Definition 18.9.3 are asserted in Theorem 15.9.3 for
the Dedekind-cut real number system in Definition 15.9.2. As mentioned in Remark 18.9.5, it can be shown
that all complete ordered fields are essentially identical to this real number system.


All properties of the real numbers which are not listed in Theorem 15.9.3 should be regarded as mere artefacts 
of the Dedekind cut representation, not true properties of the real number system abstraction. Thus the only 
substantial achievement of Sections 15.4-15.8 is to show that a complete ordered field can be represented 
in terms of the rational numbers, which in turn can be represented in terms of the finite ordinal numbers, 
which in turn can be represented in terms of Zermelo-Fraenkel sets. 


15.9.2 DEFINITION: The Dedekind-cut real number system is the tuple (ℝ, σ, τ, ≤) with
(i) ℝ as in Definition 15.4.8 and Notation 15.4.9,
(ii) σ as in Definition 15.7.2,
(iii) τ as in Definition 15.8.4, and
(iv) ≤ as in Definition 15.6.2 and Notation 15.6.3.


15.9.3 THEOREM: The Dedekind-cut real number system is a complete ordered field.
The Dedekind-cut real number system (ℝ, σ, τ, ≤) has the following properties.

(i) ∀S₁, S₂ ∈ ℝ, σ(S₁, S₂) ∈ ℝ. [closure of addition]
(ii) ∀S₁, S₂, S₃ ∈ ℝ, σ(σ(S₁, S₂), S₃) = σ(S₁, σ(S₂, S₃)). [associativity of addition]
(iii) ∀S ∈ ℝ, σ(ρ(0), S) = S = σ(S, ρ(0)). [existence of additive identity]
(iv) ∀S ∈ ℝ, σ(S, −S) = ρ(0) = σ(−S, S). [existence of additive inverse]
(v) ∀S₁, S₂ ∈ ℝ, σ(S₁, S₂) = σ(S₂, S₁). [commutativity of addition]
(vi) ∀S₁, S₂ ∈ ℝ, τ(S₁, S₂) ∈ ℝ. [closure of multiplication]
(vii) ∀S₁, S₂, S₃ ∈ ℝ, τ(τ(S₁, S₂), S₃) = τ(S₁, τ(S₂, S₃)). [associativity of multiplication]
(viii) ∀S₁, S₂ ∈ ℝ, τ(S₁, S₂) = τ(S₂, S₁). [commutativity of multiplication]
(ix) ∀S ∈ ℝ, τ(S, ρ(1)) = S = τ(ρ(1), S). [multiplicative identity]
(x) ∀S ∈ ℝ \ {0_ℝ}, τ(S, S⁻¹) = ρ(1) = τ(S⁻¹, S). [multiplicative inverse]
(xi) ∀S, T₁, T₂ ∈ ℝ, τ(S, σ(T₁, T₂)) = σ(τ(S, T₁), τ(S, T₂)). [distributivity]
(xii) ≤ is a total order on ℝ. [total order]
(xiii) ∀S₁, S₂, S₃ ∈ ℝ, (S₁ < S₂ ⇒ σ(S₁, S₃) < σ(S₂, S₃)). [order additivity]
(xiv) ∀S₁, S₂, S₃ ∈ ℝ, ((S₁ < S₂ and S₃ > ρ(0)) ⇒ τ(S₁, S₃) < τ(S₂, S₃)). [order multiplicativity]
(xv) Every non-empty subset of ℝ which is bounded above has a least upper bound. [completeness]
PROOF: Parts (i), (ii), (iii), (iv) and (v) follow from Theorem 15.7.11.
Part (vi) follows from Theorem 15.8.8 (vi).
Part (vii) follows from Theorem 15.8.10 (ii).
Part (viii) follows from Theorem 15.8.10 (i).
Part (ix) follows from Theorem 15.8.10 (iii).
Part (x) follows from Theorem 15.8.14 (xiii).
Part (xi) follows from Theorem 15.8.14 (xiv).
Part (xii) follows from Theorem 15.6.4.
Part (xiii) follows from Theorem 15.7.17 (vi).
Part (xiv) follows from Theorem 15.8.15 (iv).
Part (xv) follows from Theorem 15.6.10 (i, ii).


15.9.4 REMARK: Derivation of real-number analytic properties from axioms. 

The real number system properties in Theorem 15.9.5 are important in analysis. These properties are 
direct consequences of the axioms for a complete ordered field, which are verified in Theorem 15.9.3. In 
principle, all properties of the real numbers should be proved from these axioms, not from the Dedekind cut 
representation. 


15.9.5 THEOREM: Some analysis-related properties of the real number system. 
The set of real numbers has the following properties. 


(i) ∀S ∈ ℙ(ℝ) \ {∅}, ((∃x ∈ ℝ, ∀y ∈ S, x ≤ y) ⇒
(∃′a ∈ ℝ, ((∀y ∈ S, a ≤ y) and (∀a′ ∈ ℝ, (a′ > a ⇒ ∃x ∈ S, x < a′))))).
In other words, if a set of real numbers has a lower bound, then it has a unique greatest lower bound.
(ii) ∀S ∈ ℙ(ℝ) \ {∅}, ((∃x ∈ ℝ, ∀y ∈ S, x ≥ y) ⇒
(∃′b ∈ ℝ, ((∀y ∈ S, b ≥ y) and (∀b′ ∈ ℝ, (b′ < b ⇒ ∃x ∈ S, x > b′))))).
In other words, if a set of real numbers has an upper bound, then it has a unique least upper bound.
(iii) ∀x ∈ ℝ, ∃j ∈ ℤ, x ≤ j.
(iv) ∀x ∈ ℝ, ∃i ∈ ℤ, i ≤ x.


PROOF: Part (i) is a restatement of Theorem 15.9.3 (xv). 

Part (ii) follows from part (i) by substituting {x ∈ ℝ; −x ∈ S} for S.

For part (iii), let x ∈ ℝ. If x ∈ ℝ₀⁻, then x ≤ j with j = 0. If x ∈ ℝ⁺, let X = {j ∈ ℤ; j ≤ x}. Then
X ≠ ∅ because 0 ∈ X, and X is bounded above because ∀z ∈ X, z ≤ x. So by Theorem 15.9.3 (xv), X has
a least upper bound. In other words, ∃j₀ ∈ X, ∀j ∈ X, j ≤ j₀. Let j₁ = j₀ + 1. Then x < j₁ because
otherwise j₁ ∈ X, which would imply that j₀ is not the least upper bound of X. So x ≤ j₁ ∈ ℤ.

For part (iv), let x ∈ ℝ. If x ∈ ℝ₀⁺, then i ≤ x with i = 0. If x ∈ ℝ⁻, let y = −x. Then y ≤ j for some
j ∈ ℤ by part (iii). So x ≥ −j ∈ ℤ by Theorem 15.9.3 (xiii).
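
The integer bounds in parts (iii) and (iv) can be illustrated for rational inputs. The following Python sketch is not part of the proof. It uses exact Fraction arithmetic as a stand-in for real numbers, and the function names integer_upper_bound and integer_lower_bound are hypothetical names chosen for this illustration. The case-split mirrors the one in the proof above.

```python
from fractions import Fraction
import math

def integer_upper_bound(x: Fraction) -> int:
    """Return an integer j with x <= j, following the case-split for part (iii)."""
    if x <= 0:
        return 0                 # non-positive case: j = 0 works
    j0 = math.floor(x)           # greatest integer j0 with j0 <= x
    return j0 + 1                # j1 = j0 + 1 satisfies x < j1

def integer_lower_bound(x: Fraction) -> int:
    """Return an integer i with i <= x, following the argument for part (iv)."""
    if x >= 0:
        return 0                 # non-negative case: i = 0 works
    return -integer_upper_bound(-x)   # apply part (iii) to y = -x

x = Fraction(-7, 2)
bounds = (integer_lower_bound(x), integer_upper_bound(x))
```

For x = −7/2 this yields the pair (−4, 0), which brackets x as required, although neither bound is claimed to be the tightest possible.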
16.1. Real-number intervals


16.1.1 REMARK:  Real-number interval notations. 

The numbers a, b ∈ ℝ in Notation 16.1.2 are not required to satisfy a ≤ b. If a > b, then [a, b] = (a, b) =
[a, b) = (a, b] = ∅. This is often the intended interpretation.

An alternative interpretation, which would be consistent with a natural notation for lines and line segments
in Cartesian spaces, would satisfy [a, b] = [b, a], (a, b) = (b, a) and [a, b) = (b, a] for all a, b ∈ ℝ. These
relations are definitely not consistent with Notation 16.1.2 when a ≠ b.


16.1.2 NOTATION: Let a, b ∈ ℝ.

(i) [a, b] denotes the set {x ∈ ℝ; a ≤ x and x ≤ b}.
(ii) (a, b) denotes the set {x ∈ ℝ; a < x and x < b}.
(iii) [a, b) denotes the set {x ∈ ℝ; a ≤ x and x < b}.
(iv) (a, b] denotes the set {x ∈ ℝ; a < x and x ≤ b}.
(v) [a, ∞) denotes the set {x ∈ ℝ; a ≤ x}.
(vi) (a, ∞) denotes the set {x ∈ ℝ; a < x}.
(vii) (−∞, b] denotes the set {x ∈ ℝ; x ≤ b}.
(viii) (−∞, b) denotes the set {x ∈ ℝ; x < b}.
(ix) (−∞, ∞) denotes the set ℝ.


16.1.3 REMARK: Real-number intervals are “gapless” or convex.
Definition 16.1.4 may be interpreted to mean that intervals are those sets which have no “gaps”. Another
way of viewing this is that real intervals are the convex subsets of ℝ.
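
The bounded interval notations in Notation 16.1.2 (i) to (iv) translate directly into membership tests. The following Python sketch is only an illustration on a finite grid of rational sample points. The function name in_interval and the sample grid are assumptions made for this example. It checks the observation in Remark 16.1.1 that all four kinds of bounded interval are empty when a > b.

```python
from fractions import Fraction

def in_interval(x, a, b, left_closed, right_closed):
    """Test membership of x in [a,b], (a,b), [a,b) or (a,b], chosen by the two flags."""
    left_ok = (a <= x) if left_closed else (a < x)
    right_ok = (x <= b) if right_closed else (x < b)
    return left_ok and right_ok

# Sample points -2, -7/4, ..., 4 (a finite grid, not all of the real numbers).
samples = [Fraction(k, 4) for k in range(-8, 17)]

# With a = 3 > b = 1, no sample point lies in any of the four bounded intervals.
reversed_hits = [x for x in samples
                 for lc in (True, False) for rc in (True, False)
                 if in_interval(x, 3, 1, lc, rc)]
```

The degenerate case a = b behaves as the notation suggests: [a, a] contains exactly a, whereas (a, a), [a, a) and (a, a] are empty.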


16.1.4 DEFINITION: A real-number interval is a subset I of R which satisfies 


∀x, y ∈ I, ∀t ∈ ℝ, ((x < t and t < y) ⇒ t ∈ I). (16.1.1)


16.1.5 DEFINITION: An open interval of the real numbers is any set of the form (a, b), (a, ∞), (−∞, b)
or ℝ, for some a, b ∈ ℝ with a ≤ b.

A closed interval of the real numbers is any set of the form [a, b], [a, ∞), (−∞, b] or ℝ, for some a, b ∈ ℝ
with a ≤ b.


A closed-open interval of the real numbers is any set of the form [a, b) for some a, b ∈ ℝ with a < b.


An open-closed interval of the real numbers is any set of the form (a, b] for some a, b ∈ ℝ with a < b.


A semi-closed interval or semi-open interval of the real numbers is any open-closed or closed-open interval
of the real numbers.


16.1.6 THEOREM: Classification of real-number intervals. 
A set of real numbers is a real-number interval if and only if it is either an open, closed or semi-closed interval 
of the real numbers. 


PROOF: Let I be a real-number interval. If I = ∅, then I = (a, a) for any a ∈ ℝ. So I is an open interval.
Now assume that I ≠ ∅. Then x₀ ∈ I for some x₀ ∈ ℝ. If I is not bounded above, then x ∈ I for all x ∈ ℝ
with x₀ ≤ x by Definition 16.1.4. In other words, [x₀, ∞) ⊆ I. Similarly, if I is not bounded below, then
(−∞, x₀] ⊆ I. So in the case of a doubly unbounded interval, I = ℝ = (−∞, ∞). If I is bounded below
and not bounded above, let a = inf{x ∈ I; x ≤ x₀}. Then a ∈ ℝ by Theorem 15.9.5 (i), and (a, x₀] ⊆ I by
Definition 16.1.4. If a ∈ I, then I = [a, ∞). If a ∉ I, then I = (a, ∞). Similarly, if I is bounded above and
not bounded below, then I = (−∞, b] or I = (−∞, b), where b = sup{x ∈ I; x₀ ≤ x} ∈ ℝ. If I is bounded
both above and below, then I = [a, b], I = [a, b), I = (a, b] or I = (a, b).


16.1.7 DEFINITION: The closed unit interval of the real numbers is the set [0, 1]. 


The open unit interval of the real numbers is the set (0,1). 


16.1.8 REMARK: Connected sets of real numbers. 

In terms of the standard topology on the real numbers, a subset of ℝ is connected if and only if it is an
interval. (See Theorem 34.9.3.) Note that an open interval may be the empty set. In this case, inf I = ∞
and sup I = −∞. So (inf I, sup I) = ∅.


16.1.9 THEOREM: Equivalent connectedness condition for a set to be an interval.
For all I ∈ ℙ(ℝ), I is a real-number interval if and only if ∀x₁, x₂ ∈ I, [x₁, x₂] ⊆ I.


PROOF: Let I be a real-number interval. Let x₁, x₂ ∈ I. Let x ∈ [x₁, x₂]. Then x₁ ≤ x and x ≤ x₂ by
Notation 16.1.2 (i). If x = x₁ or x = x₂, then x ∈ I by assumption. Otherwise x ∈ I by Definition 16.1.4.
Hence [x₁, x₂] ⊆ I. Conversely, suppose that I ∈ ℙ(ℝ) satisfies ∀x₁, x₂ ∈ I, [x₁, x₂] ⊆ I. Let x, y ∈ I
and t ∈ ℝ satisfy x < t and t < y. Then t ∈ [x, y] by Notation 16.1.2 (i). So t ∈ I by the assumption on I.
Therefore I satisfies line (16.1.1). Hence I is a real-number interval by Definition 16.1.4.


16.1.10 DEFINITION:
The interval containing an element x of a subset S of ℝ is the set {y ∈ ℝ; [x, y] ∪ [y, x] ⊆ S}.


16.1.11 THEOREM: Sets of real numbers are disjoint unions of intervals.
Let S be a subset of ℝ. For x ∈ S, let I_x denote the interval of S containing x.

(i) For all x ∈ S, I_x is a real-number interval.
(ii) S = ⋃_{x∈S} I_x.
(iii) ∀x₁, x₂ ∈ S, (I_{x₁} ∩ I_{x₂} = ∅ or I_{x₁} = I_{x₂}).


PROOF: For part (i), let x ∈ S. Let y₁, y₂ ∈ I_x. Let t ∈ ℝ satisfy y₁ < t and t < y₂. Either t ≤ x or t > x.
Suppose that t ≤ x. Then y₁ < x. But [y₁, x] ⊆ I_x by Definition 16.1.10. So t ∈ I_x because t ∈ [y₁, x]. Now
suppose that t > x. Then x < y₂. But [x, y₂] ⊆ I_x by Definition 16.1.10. So t ∈ I_x because t ∈ [x, y₂]. Thus
t ∈ I_x in either case. Hence I_x is a real-number interval by Definition 16.1.4.

For part (ii), let x ∈ S. Then x ∈ I_x by Definition 16.1.10. So S ⊆ ⋃_{x∈S} I_x. Now let y ∈ ⋃_{x∈S} I_x. Then
y ∈ I_x for some x ∈ S. If y ≤ x, then y ∈ [y, x] ⊆ S, and so y ∈ S. If y > x, then y ∈ [x, y] ⊆ S, and
so y ∈ S. Thus ⋃_{x∈S} I_x ⊆ S. Hence S = ⋃_{x∈S} I_x.

For part (iii), let x₁, x₂ ∈ S with x₁ ≤ x₂. If x₁ = x₂, obviously I_{x₁} = I_{x₂}. So assume that x₁ < x₂. Suppose
that I_{x₁} ∩ I_{x₂} ≠ ∅. Let z ∈ I_{x₁} ∩ I_{x₂}. If z ≤ x₁, then [z, x₂] ⊆ I_{x₂} because I_{x₂} is an interval by part (i). So
[x₁, x₂] ⊆ S. Similarly, [x₁, x₂] ⊆ S if x₂ ≤ z. Suppose that x₁ ≤ z and z ≤ x₂. Then [x₁, z] ⊆ S because
I_{x₁} is an interval, and [z, x₂] ⊆ S because I_{x₂} is an interval. So [x₁, x₂] ⊆ S. Thus [x₁, x₂] ⊆ S in all cases.

Let y ∈ I_{x₁}. Suppose that y ≤ x₁. Then [y, x₁] ⊆ S by Definition 16.1.10 for I_{x₁}. So [y, x₂] = [y, x₁] ∪
[x₁, x₂] ⊆ S. Therefore y ∈ I_{x₂} by Definition 16.1.10 for I_{x₂} because [x₂, y] = ∅. Suppose that x₁ < y < x₂.
Then [y, x₂] ⊆ [x₁, x₂] ⊆ S. So y ∈ I_{x₂} by Definition 16.1.10 for I_{x₂}. Suppose that x₂ ≤ y. Then [x₁, y] ⊆ S
by Definition 16.1.10 for I_{x₁}. So [x₂, y] ⊆ [x₁, y] ⊆ S. Therefore y ∈ I_{x₂} by Definition 16.1.10 for I_{x₂}. In
all cases, y ∈ I_{x₁} implies y ∈ I_{x₂}. Thus I_{x₁} ⊆ I_{x₂}. Similarly, I_{x₂} ⊆ I_{x₁}. Therefore I_{x₁} = I_{x₂}. Hence either
I_{x₁} ∩ I_{x₂} = ∅ or I_{x₁} = I_{x₂}.
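
The decomposition in Theorem 16.1.11 can be computed explicitly in a very special case. The following Python sketch assumes that the set S is given as a finite union of closed bounded intervals [a, b] with a ≤ b, which is a strong restriction not made by the theorem. The function name interval_components is hypothetical. Under this assumption, the distinct intervals I_x are obtained by merging the pieces which overlap or touch.

```python
def interval_components(pieces):
    """Merge closed intervals (a, b), each with a <= b, into the disjoint maximal
    closed intervals whose union equals the union of the given pieces."""
    merged = []
    for a, b in sorted(pieces):
        if merged and a <= merged[-1][1]:
            # The new piece meets the current component, so they lie in the same I_x.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            # The new piece starts strictly after the current component ends.
            merged.append((a, b))
    return merged

# S = [3,4] ∪ [0,1] ∪ [1,2] ∪ [3.5,5] has exactly two interval components.
components = interval_components([(3, 4), (0, 1), (1, 2), (3.5, 5)])
```

The pieces [0, 1] and [1, 2] merge because they share the point 1, so that [0, 2] ⊆ S, in agreement with Definition 16.1.10.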


16.1.12 DEFINITION: A finite real-number interval is a real-number interval of the form (a, b), (a, b], [a, b)
or [a, b] for some a, b ∈ ℝ with a ≤ b.


An infinite real-number interval is a real-number interval which is not a finite real-number interval.


A semi-infinite real-number interval is a real-number interval of the form (a, ∞), [a, ∞), (−∞, b) or (−∞, b]
for some a, b ∈ ℝ.


16.1.13 DEFINITION: The length of a real-number interval I is b − a if I is a finite interval (a, b), (a, b],
[a, b) or [a, b] for some a, b ∈ ℝ with a ≤ b, and is ∞ if I is an infinite interval.


16.1.14 REMARK: Symmetrised notation for a finite closed interval of real numbers. 

Notation 16.1.15 is occasionally useful for various operations on real-number intervals. (See for example
Theorem 32.7.4.) This way of denoting a finite closed interval can be expressed in terms of Notation 22.11.12 for
the “convex span” of a subset of a linear space as [[x, y]] = conv({x, y}), if ℝ is regarded as a linear space.


16.1.15 NOTATION: [[a, b]], for a, b ∈ ℝ, denotes the set [a, b] ∪ [b, a].
In other words, [[a, b]] = [min(a, b), max(a, b)].


16.1.16 THEOREM: Equality of bounded closed real-number interval to convex span of the end-points.
Let a, b ∈ ℝ. Then [[a, b]] = {(1 − λ)a + λb; λ ∈ [0, 1]}.


PROOF: Let S = {(1 − λ)a + λb; λ ∈ [0, 1]}. Let c ∈ [[a, b]]. Suppose that a = b. Then c = a = b. Let
λ = 0. Then λ ∈ [0, 1] and (1 − λ)a + λb = a = c. So c ∈ S. Now suppose that a ≠ b. Let λ = (c − a)/(b − a).
Then λ ∈ [0, 1] because if a < b then c − a ≥ 0, b − a > 0 and c − a ≤ b − a, whereas if b < a then c − a ≤ 0,
b − a < 0 and c − a ≥ b − a. Also, (1 − λ)a + λb = a + λ(b − a) = a + c − a = c. So c ∈ S. Thus [[a, b]] ⊆ S
in all cases.


Now assume that c ∈ S. Then c = (1 − λ)a + λb for some λ ∈ [0, 1]. Suppose that a ≤ b. Then c = a + λ(b − a),
which implies that a ≤ c, and c = b − (1 − λ)(b − a), which implies that c ≤ b. So c ∈ [a, b] by Notation 16.1.2.
Thus c ∈ [[a, b]] by Notation 16.1.15. Similarly, if a > b then c ≤ a and c ≥ b, which implies c ∈ [b, a] = [[a, b]].
Thus S ⊆ [[a, b]]. Hence [[a, b]] = {(1 − λ)a + λb; λ ∈ [0, 1]}.
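
The choice of λ in the first half of the proof can be checked numerically. The following Python sketch uses exact rational arithmetic. The function names sym_interval_contains and convex_parameter are hypothetical names for this illustration, and the example values are arbitrary.

```python
from fractions import Fraction

def sym_interval_contains(c, a, b):
    """Test c ∈ [[a, b]] = [min(a, b), max(a, b)] as in Notation 16.1.15."""
    return min(a, b) <= c <= max(a, b)

def convex_parameter(c, a, b):
    """Return the λ used in the proof, so that c = (1 − λ)a + λb.
    The caller is assumed to have checked that c ∈ [[a, b]]."""
    if a == b:
        return Fraction(0)                      # case a = b: take λ = 0
    return Fraction(c - a) / Fraction(b - a)    # case a ≠ b: λ = (c − a)/(b − a)

# A reversed pair of end-points, a > b, which the symmetrised notation allows.
a, b, c = Fraction(5), Fraction(2), Fraction(3)
lam = convex_parameter(c, a, b)
reconstructed = (1 - lam) * a + lam * b
```

Here λ = 2/3, which lies in [0, 1] even though a > b, and the convex combination reproduces c exactly.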


16.1.17 REMARK: Approximation of the supremum or infimum of a set of real numbers. 

Theorem 16.1.18 (iii) is often useful in situations where it cannot be easily shown, for example, that a ≥ b,
but it can be shown that a > b − ε for all ε ∈ ℝ⁺. It is obvious that this implies that a ≥ b, but it is useful
to have a formal theorem which justifies it.


16.1.18 THEOREM: Some properties of the infimum and supremum of sets of real numbers.
Let X be a set of real numbers.

(i) If b = sup X ∈ ℝ, then X ∩ (b, ∞) = ∅.
(ii) If b = sup X ∈ ℝ, then ∀ε ∈ ℝ⁺, X ∩ (b − ε, b] ≠ ∅.
(iii) ∀b ∈ ℝ, (b = sup X ⇔ (X ∩ (b, ∞) = ∅ and ∀ε ∈ ℝ⁺, X ∩ (b − ε, b] ≠ ∅)).
(iv) If b = inf X ∈ ℝ, then X ∩ (−∞, b) = ∅.
(v) If b = inf X ∈ ℝ, then ∀ε ∈ ℝ⁺, X ∩ [b, b + ε) ≠ ∅.
(vi) ∀b ∈ ℝ, (b = inf X ⇔ (X ∩ (−∞, b) = ∅ and ∀ε ∈ ℝ⁺, X ∩ [b, b + ε) ≠ ∅)).

PROOF: For part (i), suppose that X ∩ (b, ∞) ≠ ∅. Then b is not an upper bound for X. Therefore
b ≠ sup X by Definition 11.2.4. Hence b = sup X ∈ ℝ implies X ∩ (b, ∞) = ∅.

For part (ii), let b = sup X ∈ ℝ and suppose that ε ∈ ℝ⁺ satisfies X ∩ (b − ε, b] = ∅. Then X ∩ (b − ε, ∞) =
(X ∩ (b − ε, b]) ∪ (X ∩ (b, ∞)) = ∅ by part (i). So x ≤ b − ε for all x ∈ X. Thus b − ε is an upper bound for
X which is less than b. So b ≠ sup X. Hence b = sup X ∈ ℝ implies ∀ε ∈ ℝ⁺, X ∩ (b − ε, b] ≠ ∅.

For part (iii), the forward implication follows from parts (i) and (ii). For the reverse implication, first note
that the assumption X ∩ (b, ∞) = ∅ implies that b is an upper bound for X. Suppose that b is not the least
upper bound for X. Then there exists an upper bound b′ ∈ ℝ for X with b′ < b. Let ε = b − b′. Then ε ∈ ℝ⁺ and
X ∩ (b − ε, b] = X ∩ (b′, b] ⊆ X ∩ (b′, ∞) = ∅, which contradicts the assumption ∀ε ∈ ℝ⁺, X ∩ (b − ε, b] ≠ ∅.
Hence the assertion is valid.

Part (iv) may be proved as for part (i).

Part (v) may be proved as for part (ii).

Part (vi) may be proved as for part (iii).
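
The two conditions in part (iii) can be spot-checked on finite data. The following Python sketch is only a partial test, not a decision procedure, because it examines a finite truncation of the set X and finitely many values of ε. The function name satisfies_sup_conditions, the truncated set and the chosen values of ε are assumptions made for this illustration.

```python
from fractions import Fraction

def satisfies_sup_conditions(X, b, epsilons):
    """Check X ∩ (b, ∞) = ∅, and X ∩ (b − ε, b] ≠ ∅ for each listed ε."""
    no_element_above = all(not (x > b) for x in X)
    elements_nearby = all(any(b - e < x <= b for x in X) for e in epsilons)
    return no_element_above and elements_nearby

# Finite truncation of the set {1 − 1/n; n = 1, 2, 3, ...}, whose supremum is 1.
X = [1 - Fraction(1, n) for n in range(1, 1001)]
eps_values = [Fraction(1, 10), Fraction(1, 100)]

passes_for_one = satisfies_sup_conditions(X, Fraction(1), eps_values)
passes_for_two = satisfies_sup_conditions(X, Fraction(2), eps_values)
passes_for_half = satisfies_sup_conditions(X, Fraction(1, 2), eps_values)
```

The candidate b = 1 passes both conditions for the listed ε. The candidate b = 2 fails the second condition, and b = 1/2 fails the first. A smaller ε such as 1/2000 would fail for b = 1 only because the set has been truncated at n = 1000.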


16.2. Extended real numbers 


16.2.1 REMARK:  Plausible representation of infinities as Dedekind cuts. 

The system of extended real numbers could be quite conveniently implemented by extending the notion of
Dedekind cuts. Thus instead of excluding the special cases ∅ and ℚ as in Definition 15.4.8, the more natural
set of general Dedekind cuts in Definition 15.4.4 could be developed in much the same way as in Sections
15.4, 15.5, 15.6, 15.7 and 15.8. This seems to be even more natural than the class of finite Dedekind cuts
initially, but the development of addition and multiplication properties would be untidy. Moreover, such
a treatment would be implementation-dependent. So it is best to add the two pseudo-numbers “∞” and
“−∞” to an abstract real number system ℝ and adapt the axioms accordingly.


Another reason to develop the extended real numbers axiomatically rather than in terms of Dedekind cuts 
is that the extended real numbers have very few of the basic properties that are generally associated with 
well-behaved algebraic systems. Even the addition operation is not algebraically closed. 


16.2.2 REMARK: Interpretation of the infinity-symbol as a well-defined concept. 

The symbol for “infinity” ∞ in Definition 16.2.3 is known as “Bernoulli's lemniscate”. (It was introduced
a few decades earlier by John Wallis in 1655. See Cajori [242], Volume 1, page 214, Volume 2, page 44.)
This symbol can be drawn as the graph of a formula like (x² + y²)² = x² − y². (This is r² = cos 2θ in polar
coordinates.) The meaning of the symbol is not so easy to specify. It is nearly always a pseudo-notation.
That is, it does not represent a particular set or any precise logical concept. (Thus a pseudo-notation is even
worse than a pseudo-number.) Propositions which include the symbol ∞ generally need to be rewritten to
make them logically meaningful.

In the case of the infinity for extended non-negative integers, one may append the set ω to the set of integers
to obtain ω⁺ = ω ∪ {ω}. Thus ∞ could denote the set ω in the case of the non-negative integers.


In the case of the real numbers, it does not make much sense to define ∞ to mean the set ℝ since then −∞
would mean −ℝ, which is either meaningless or else it would mean the same set as ℝ. The easy way out is to
say that “it just doesn't matter”. One plausible solution is to define +∞ to mean ℝ⁺ and −∞ to mean ℝ⁻.
Another alternative is to define all elements of ℝ̄ to be equivalence classes of ordered pairs in ℝ × ℝ. Pairs
(x, y) with x ≠ 0 could represent the number y/x, and pairs (x, y) with x = 0 could represent +∞ if y > 0 or
−∞ if y < 0. (The pair (0, 0) may be held in reserve to mean “undefined” for another kind of extended real
number definition.) Ultimately, the representation of the set of extended real numbers is not as important
as the representation of the ordinary real numbers. In fact, it just doesn't matter!


16.2.3 DEFINITION: The set of extended real numbers is the set ℝ ∪ {−∞, ∞}.


16.2.4 NOTATION: ℝ̄ denotes the set of extended real numbers.


16.2.5 NOTATION:

ℝ̄⁺ denotes the set of positive extended real numbers {r ∈ ℝ̄; r > 0}. Thus ℝ̄⁺ = ℝ⁺ ∪ {+∞}.

ℝ̄₀⁺ denotes the set of non-negative extended real numbers {r ∈ ℝ̄; r ≥ 0}. Thus ℝ̄₀⁺ = ℝ₀⁺ ∪ {+∞}.

ℝ̄⁻ denotes the set of negative extended real numbers {r ∈ ℝ̄; r < 0}. Thus ℝ̄⁻ = ℝ⁻ ∪ {−∞}.

ℝ̄₀⁻ denotes the set of non-positive extended real numbers {r ∈ ℝ̄; r ≤ 0}. Thus ℝ̄₀⁻ = ℝ₀⁻ ∪ {−∞}.
16.2.6 REMARK: The choice of notations for sets of extended real numbers.

The notation ℝ̄ for the extended real numbers is not used by many authors. (Federer [69], page 51, does use
this over-bar notation.) The principal advantage of the over-bar notation is the ability to add superscripts,
which is difficult for the more popular asterisk superscript notation ℝ*.


The asterisk superscript has the advantage of suggesting the usual notation for topological compactification
of a set. (See for example S.J. Taylor [147], page 34.) However, the topological space asterisk usually
denotes a one-point compactification, whereas ℝ̄ is a two-point compactification. So there is not much loss
in abandoning the asterisk superscript.


16.2.7 DEFINITION: The extended real number system is the number system (ℝ̄, σ, τ), where ℝ̄ is the set
of extended real numbers, σ : (ℝ̄ × ℝ̄) \ {(−∞, +∞), (+∞, −∞)} → ℝ̄ is the addition operation on ℝ̄, and
τ : (ℝ̄ × ℝ̄) \ {(0, +∞), (0, −∞), (+∞, 0), (−∞, 0)} → ℝ̄ is the multiplication operation on ℝ̄, defined by

            ⎧ a + b   if a, b ∈ ℝ
σ(a, b) =   ⎨ +∞      if a, b ∈ ℝ ∪ {+∞} and +∞ ∈ {a, b}
            ⎩ −∞      if a, b ∈ ℝ ∪ {−∞} and −∞ ∈ {a, b}

and

            ⎧ ab      if a, b ∈ ℝ
τ(a, b) =   ⎨ +∞      if (a, b) ∈ (ℝ̄⁺ × ℝ̄⁺) ∪ (ℝ̄⁻ × ℝ̄⁻) and {−∞, +∞} ∩ {a, b} ≠ ∅
            ⎩ −∞      if (a, b) ∈ (ℝ̄⁻ × ℝ̄⁺) ∪ (ℝ̄⁺ × ℝ̄⁻) and {−∞, +∞} ∩ {a, b} ≠ ∅.


16.2.8 REMARK: Arithmetic operations on the extended real numbers.

The extended real number system in Definition 16.2.7 is clearly not a field. Neither is it a ring. Nor is it
a group or even a semigroup with respect to the addition or multiplication operation. Nevertheless, it is a
useful number system for measure and integration theory and some other applications.


Since both the addition and multiplication on ℝ̄ are defective, it is not possible to define subtraction and
division in the usual way. However, it is possible to define a defective subtraction operation δ : ℝ̄ × ℝ̄ → ℝ̄ and
a defective division operation ρ : ℝ̄ × (ℝ̄ \ {0}) → ℝ̄ which have some level of consistency with the defective
addition and multiplication operations. At the least, one would hope for relations such as δ(σ(a, b), b) = a,
σ(δ(a, b), b) = a, ρ(τ(a, b), b) = a and τ(ρ(a, b), b) = a. Similarly, one may hopefully construct unary additive
and multiplicative inverse operations which are only slightly defective.


A negation operation ν : ℝ̄ → ℝ̄ and a multiplicative inverse β : ℝ̄ \ {0} → ℝ̄ may be defined as follows.

          ⎧ −a    if a ∈ ℝ
ν(a) =    ⎨ +∞    if a = −∞
          ⎩ −∞    if a = +∞

and

          ⎧ a⁻¹   if a ∈ ℝ \ {0}
β(a) =    ⎩ 0     if a ∈ {−∞, +∞}.


A subtraction operation δ : (ℝ̄ × ℝ̄) \ {(−∞, −∞), (+∞, +∞)} → ℝ̄ may be defined by δ(a, b) = σ(a, ν(b))
for (a, b) in the domain of δ, and a quotient operation ρ : ℝ̄ × (ℝ̄ \ {0}) → ℝ̄ may be defined by:

            ⎧ a/b    if a ∈ ℝ and b ∈ ℝ̄ \ {0}
ρ(a, b) =   ⎨ a      if a ∈ {−∞, +∞} and b ∈ ℝ̄⁺
            ⎩ ν(a)   if a ∈ {−∞, +∞} and b ∈ ℝ̄⁻.
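
The partial domains of the addition, multiplication, negation and subtraction operations can be made concrete with a small program. The following Python sketch is an illustration only. It assumes that IEEE floating-point values, including math.inf, are an adequate stand-in for extended real numbers, which ignores rounding and overflow. The function names ext_add, ext_mul, ext_neg and ext_sub are hypothetical. Pairs outside the stated domains raise an exception instead of returning a value.

```python
import math

INF = math.inf

def ext_add(a, b):
    """Addition σ(a, b), undefined for the pairs (−∞, +∞) and (+∞, −∞)."""
    if {a, b} == {INF, -INF}:
        raise ValueError("sum of infinities of opposite sign is undefined")
    return a + b

def ext_mul(a, b):
    """Multiplication τ(a, b), undefined when one factor is 0 and the other is infinite."""
    if (a == 0 and math.isinf(b)) or (math.isinf(a) and b == 0):
        raise ValueError("product of zero and an infinity is undefined")
    return a * b

def ext_neg(a):
    """Negation ν(a), which swaps +∞ and −∞."""
    return -a

def ext_sub(a, b):
    """Subtraction δ(a, b) = σ(a, ν(b)), undefined for (−∞, −∞) and (+∞, +∞)."""
    return ext_add(a, ext_neg(b))

examples = (ext_add(3, INF), ext_mul(-2, INF), ext_mul(-INF, -INF), ext_sub(INF, 5))
```

Raising an exception for excluded pairs is one way to model a partially defined operation. Plain floating-point arithmetic would instead return a "not a number" value for these pairs.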


16.2.9 REMARK: The set of non-negative extended real numbers. 

The non-negative extended real number system in Definition 16.2.10 is not exactly a simple restriction of
the extended real number system in Definition 16.2.7. For example, it is possible to write 0⁻¹ = ∞ in a
meaningful way, which is not possible when there are two infinities.
16.2.10 DEFINITION: The non-negative extended real number system is the number system (ℝ̄₀⁺, σ, τ),
where ℝ̄₀⁺ is the set of non-negative extended real numbers, σ : ℝ̄₀⁺ × ℝ̄₀⁺ → ℝ̄₀⁺ is the addition operation
on ℝ̄₀⁺, and τ : (ℝ̄₀⁺ × ℝ̄₀⁺) \ {(0, +∞), (+∞, 0)} → ℝ̄₀⁺ is the multiplication operation on ℝ̄₀⁺, defined
by

            ⎧ a + b   if a, b ∈ ℝ₀⁺
σ(a, b) =   ⎩ +∞      if +∞ ∈ {a, b}

and

            ⎧ ab      if a, b ∈ ℝ₀⁺
τ(a, b) =   ⎩ +∞      if (a, b) ∈ ℝ̄⁺ × ℝ̄⁺ and +∞ ∈ {a, b}.


16.2.11 REMARK: Multiplicative inverse, subtraction and quotients for extended real numbers.
A multiplicative inverse β : ℝ̄₀⁺ → ℝ̄₀⁺ may be defined by

          ⎧ a⁻¹   if a ∈ ℝ⁺
β(a) =    ⎨ 0     if a = +∞
          ⎩ +∞    if a = 0.

A subtraction operation δ : {(a, b) ∈ ℝ₀⁺ × ℝ₀⁺; a ≥ b} ∪ ({+∞} × ℝ₀⁺) → ℝ̄₀⁺ may be defined by

            ⎧ a − b   if a, b ∈ ℝ₀⁺ and a ≥ b
δ(a, b) =   ⎩ +∞      if a = +∞ and b ∈ ℝ₀⁺,

and a quotient operation ρ : (ℝ̄₀⁺ × ℝ̄₀⁺) \ {(0, 0), (+∞, +∞)} → ℝ̄₀⁺ may be defined by:

            ⎧ a/b   if a ∈ ℝ₀⁺ and b ∈ ℝ⁺
ρ(a, b) =   ⎨ 0     if a ∈ ℝ₀⁺ and b = +∞
            ⎩ +∞    if a = +∞ and b ∈ ℝ₀⁺.
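
The multiplicative inverse on the non-negative extended real numbers is defined everywhere, including at zero. The following Python sketch illustrates this with floating-point values standing in for real numbers. The function name nn_inverse is hypothetical, and only the inverse operation is sketched here.

```python
import math

INF = math.inf

def nn_inverse(a):
    """Multiplicative inverse on the non-negative extended reals, with 1/0 = +∞ and 1/+∞ = 0."""
    if a < 0:
        raise ValueError("argument must be non-negative")
    if a == 0:
        return INF        # 0 has inverse +∞, since there is only one infinity
    if a == INF:
        return 0.0        # +∞ has inverse 0
    return 1 / a          # ordinary reciprocal for positive finite numbers

inverse_of_zero = nn_inverse(0)
round_trip = nn_inverse(nn_inverse(4))
```

With this definition the inverse is an involution on the sample values tried here: applying it twice to 0, to +∞ or to 4 returns the starting value.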


16.2.12 REMARK: Intuitive interpretations of the real-number infinity concept. 
The “number” infinity may be thought of as the solution of the equation: 


x = x + 1. (16.2.1)


This is not much more absurd than defining the imaginary number i as the solution of x² = −1. The
equation x = x + 1 may be “solved” by iteration. Start with x = 1 and substitute this into the right hand
side. This gives x = 2. Now substitute again to obtain x = 3. By continuing this an infinite number of
times, the solution x converges to infinity! Another way to solve equation (16.2.1) is to divide both sides
by x, which gives 1 = (x + 1)/x, which means that the ratio of x + 1 to x equals 1, which is a more and
more accurate approximation as x tends to infinity.


Let x = 10¹⁰. Then equation (16.2.1) is accurate to 10 decimal places. This is as accurate as any measurement
in physics. So from this point of view, the equality may be regarded as exact. If x = 10¹⁰⁰, then the
overwhelming majority of floating-point hardware on personal computers is unable to distinguish the left
and right sides of the equation. So in this sense, x = 10¹⁰⁰ is an exact solution of the equation. If x = 10^(10¹⁰⁰),
then equation (16.2.1) is accurate to 10¹⁰⁰ decimal places. If 2500 decimal places are printed on each side
of an A4 sheet of paper with standard 80 grams per square metre paper density, the total weight of paper
required to print out these decimal places is 10⁹¹ tons, which is approximately 3 × 10⁴¹ times the mass of
the observable universe. Therefore one may consider 10^(10¹⁰⁰) to be a pretty good approximation to infinity.
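
Both numerical claims in this remark can be checked by a short computation. The following Python sketch assumes IEEE double-precision floating-point arithmetic, which is what Python floats use on common platforms. It also assumes the nominal A4 area of 1/16 square metre, so that one sheet at 80 grams per square metre weighs 5 grams, and it assumes metric tons. The names indistinguishable and total_tonnes are hypothetical.

```python
def indistinguishable(x: float) -> bool:
    """Return True if floating-point arithmetic cannot tell x + 1 from x."""
    return x + 1.0 == x

small_case = indistinguishable(1e10)    # x = 10^10: the two sides still differ
large_case = indistinguishable(1e100)   # x = 10^100: the two sides coincide

# Weight of paper needed to print 10^100 decimal places, using exact integers.
decimal_places = 10**100
places_per_sheet = 2 * 2500             # 2500 decimal places on each side
sheets = decimal_places // places_per_sheet
grams_per_sheet = 5                     # assumed: 80 g/m² times 1/16 m² per A4 sheet
total_tonnes = sheets * grams_per_sheet // 10**6   # 10^6 grams per metric ton
```

The paper-weight calculation gives 2 × 10⁹⁶ sheets and a total of 10⁹¹ metric tons, in agreement with the figure quoted in the text.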


16.2.13 DEFINITION: The ordered extended real number system is the ordered number system (ℝ̄, σ, τ, ≤),
where (ℝ̄, σ, τ) is the extended real number system in Definition 16.2.7, and “≤” is the total order on ℝ in
Definition 15.6.2, extended by the relations

∀r ∈ ℝ̄, −∞ ≤ r and r ≤ ∞.

In other words, the order on ℝ̄ is the relation R ∪ {(−∞, r); r ∈ ℝ̄} ∪ {(r, ∞); r ∈ ℝ̄}, where R is the
standard order on ℝ. (See Definition 15.6.2.)
16.2.14 THEOREM: Basic properties of order on the extended real numbers.

(i) The order on ℝ̄ is a total order.
(ii) In the order on ℝ̄, inf(∅) = ∞ and sup(∅) = −∞.
(iii) If S ∈ ℙ(ℝ) is not bounded below in ℝ, then inf(S) = −∞ with respect to the order on ℝ̄.
(iv) If S ∈ ℙ(ℝ) is not bounded above in ℝ, then sup(S) = ∞ with respect to the order on ℝ̄.
(v) Every subset of ℝ has a well-defined infimum and supremum in ℝ̄.


PROOF: For part (i), let a, b ∈ ℝ̄. Let R be the standard order on ℝ. Then R is a total order on ℝ by
Theorem 15.6.4. Let R̄ be the order on ℝ̄. If a, b ∈ ℝ, then a ≤ b or b ≤ a by Definition 11.5.1 (i). If
a = −∞ and b ∈ ℝ, then a ≤ b. If a = ∞ and b ∈ ℝ, then b ≤ a. If a ∈ ℝ and b = −∞, then b ≤ a. If
a ∈ ℝ and b = ∞, then a ≤ b. Thus a ≤ b or b ≤ a for all (a, b) ∈ ℝ̄ × ℝ̄. So R̄ satisfies Definition 11.5.1 (i).
Let a, b ∈ ℝ̄ satisfy a ≤ b and b ≤ a. If a, b ∈ ℝ, then a = b by Definition 11.5.1 (ii). If a = −∞, then b ≤ a
implies that b = −∞ because the pair (b, −∞) is not in R̄ for other values of b. Similarly, a = b if a = ∞,
b = −∞ or b = ∞. Thus R̄ satisfies Definition 11.5.1 (ii).

Let a, b, c ∈ ℝ̄ satisfy a ≤ b and b ≤ c. If a, b, c ∈ ℝ, then a ≤ c by Definition 11.5.1 (iii). If a = −∞,
then a ≤ c. If a = ∞, then b = ∞, and so c = ∞. Therefore a ≤ c. If b = −∞, then a = −∞. So a ≤ c. If
b = ∞, then c = ∞. So a ≤ c. If c = −∞, then b = −∞ and so a = −∞. So a ≤ c. If c = ∞, then a ≤ c.
Thus for all a, b, c ∈ ℝ̄, a ≤ b and b ≤ c implies a ≤ c. So R̄ satisfies Definition 11.5.1 (iii). Hence R̄ is a
total order on ℝ̄.

For part (ii), ∞ ≤ x for all x ∈ ∅. So ∞ is a lower bound for ∅ in the order on ℝ̄. But y ≤ ∞ for all y ∈ ℝ̄.
So y ≤ ∞ for all lower bounds y for ∅. Therefore inf(∅) = ∞ by Definition 11.2.4. Similarly, sup(∅) = −∞.

For part (iii), let S be a subset of ℝ which is not bounded below. Then S has no lower bound in ℝ by
Definition 11.2.5. But −∞ is a lower bound for S in ℝ̄ by Definition 16.2.13. So −∞ is the greatest lower
bound for S. Thus inf(S) = −∞.

Part (iv) may be proved as for part (iii).

For part (v), let S be a subset of ℝ. If S = ∅, then S has a well-defined infimum and supremum by part (ii).
So suppose that S ≠ ∅. If S is bounded below in ℝ, then S has a greatest lower bound in ℝ. (This exists
by Theorem 15.6.10 (ii) or by Theorem 15.9.3 (xv).) Thus inf(S) ∈ ℝ̄. If S is not bounded below in ℝ, then
inf(S) = −∞ ∈ ℝ̄ by part (iii). Hence S always has a well-defined infimum in ℝ̄. Similarly, S always has a
well-defined supremum in ℝ̄.
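
The conventions inf(∅) = ∞ and sup(∅) = −∞ in part (ii) have a direct computational counterpart. The following Python sketch handles only finite collections of numbers, so it cannot illustrate the unbounded cases in parts (iii) and (iv). The function names ext_inf and ext_sup are hypothetical.

```python
import math

def ext_inf(S):
    """Infimum of a finite collection in the extended reals, with inf(∅) = +∞."""
    return min(S, default=math.inf)

def ext_sup(S):
    """Supremum of a finite collection in the extended reals, with sup(∅) = −∞."""
    return max(S, default=-math.inf)

empty_bounds = (ext_inf([]), ext_sup([]))
finite_bounds = (ext_inf([3, 1, 2]), ext_sup([3, 1, 2]))
```

One practical consequence of these conventions is that inf(A ∪ B) = min(inf A, inf B) holds even when A or B is empty, because +∞ is the identity element for the minimum operation.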


16.3. Extended rational numbers 


16.3.1 REMARK: Sets of extended rational numbers.

The extended rational number “system” may be defined by extending the rational numbers in Section 15.1
by both or either of the pseudo-numbers “∞” and “−∞”. Alternatively it may be defined as the restriction
of the extended real numbers in Section 16.2 to the rational numbers. Strictly speaking, it is best to define
them by extending the rational numbers, but then again, the extended rationals are almost always regarded
as a subset of the extended real numbers in practice. Definition 16.3.2 follows the extension approach, but
ℚ̄ is assumed to be embedded in the usual way inside ℝ̄.


16.3.2 DEFINITION: The set of extended rational numbers is the set ℚ ∪ {−∞, ∞}.


16.3.3 NOTATION: ℚ̄ denotes the set of extended rational numbers ℚ ∪ {−∞, ∞}.


16.3.4 REMARK: Representation of infinite rational numbers.

By analogy with the ordered pair equivalence classes in Definition 15.1.5, the “infinite rational numbers” −∞
and ∞ could be represented as −∞ = {(n, 0); n ∈ ℤ⁻} and ∞ = {(n, 0); n ∈ ℤ⁺}. However, the arithmetic
rules will not then be consistent with the finite rationals. Since the rules for order and arithmetic must be
specified separately for the infinite rationals anyway, it is best to leave their representation unspecified.


16.3.5 NOTATION:
ℚ̄⁺ denotes the set of positive extended rational numbers {q ∈ ℚ̄; q > 0}.

ℚ̄₀⁺ denotes the set of non-negative extended rational numbers {q ∈ ℚ̄; q ≥ 0}.

ℚ̄⁻ denotes the set of negative extended rational numbers {q ∈ ℚ̄; q < 0}.

ℚ̄₀⁻ denotes the set of non-positive extended rational numbers {q ∈ ℚ̄; q ≤ 0}.


16.4. Real number tuples 


16.4.1 DEFINITION: The Cartesian tuple space with dimension n ∈ ℤ₀⁺ is the set ℝⁿ with index set ℕₙ.


16.4.2 REMARK: Choice of index set for real-number tuples.

See Notation 14.6.1 for the definition of Xⁿ for general sets X. The usual index set for n-tuples in ℝⁿ
is the set ℕₙ = {1, 2, ..., n} as in Definition 16.4.1, although there are arguments in favour of the index
set n = {0, 1, ..., n − 1}. (Note that Definition 16.4.3 is more or less superfluous in view of Definition 14.6.10.)


16.4.3 DEFINITION: The m, n-concatenation operation for real (number) tuples for m, n ∈ ℤ₀⁺ is the function
Q_{m,n} : ℝᵐ × ℝⁿ → ℝ^(m+n) defined by

Q_{m,n} : ((x₁, ..., x_m), (y₁, ..., y_n)) ↦ (x₁, ..., x_m, y₁, ..., y_n).


16.5. Some basic real-valued functions 


16.5.1 REMARK:  Case-split construction procedures for real-valued function definitions. 

The real-valued functions in Section 16.5 are mostly constructed from simple algebraic operations combined 
with logical case-splits. In other words, the independent variable is tested to determine which rule will be 
applied to obtain the function's value. 


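16.5.2 DEFINITION: The absolute value (function) for real numbers is the function $|\cdot| : \mathbb{R} \to \mathbb{R}$ defined
by
\[
\forall x \in \mathbb{R}, \quad |x| =
\begin{cases}
x & \text{if } x \ge 0 \\
-x & \text{if } x < 0.
\end{cases}
\]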


16.5.3 THEOREM: Bounds for the absolute value function for real numbers.
(i) $\forall x \in \mathbb{R},\ |x| \ge 0$.
(ii) $\forall x \in \mathbb{R},\ -|x| \le x \le |x|$.


PROOF: For part (i), let $x \in \mathbb{R}$. Then either $x < 0$ or $x \ge 0$. If $x < 0$, then $|x| = -x > 0$. Otherwise $x \ge 0$
and $|x| = x \ge 0$. Hence $|x| \ge 0$.

For part (ii), let $x \in \mathbb{R}$. Then either $x < 0$ or $x \ge 0$. If $x < 0$, then $|x| = -x$. So $-|x| = x < |x|$. Therefore
$-|x| \le x \le |x|$. If $x \ge 0$, then $|x| = x$. So $-|x| = -x \le x = |x|$. Therefore $-|x| \le x \le |x|$. Hence

\[
\forall x \in \mathbb{R}, \quad -|x| \le x \le |x|.
\]


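16.5.4 DEFINITION: The sign function of a real variable is defined by

\[
\forall x \in \mathbb{R}, \quad \mathrm{sign}(x) =
\begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x = 0 \\
-1 & \text{if } x < 0.
\end{cases}
\]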


16.5.5 REMARK: Alternative names for the signum and absolute value functions. 

The sign function is also called the signum function (from the Latin word “signum”). Some authors use
the notation sgn(x) for sign(x). The absolute value function is sometimes known as the “modulus”. The
absolute value and sign functions are illustrated in Figure 16.5.1.


16.5.6 THEOREM: Every real number equals its sign multiplied by its absolute value.
$\forall x \in \mathbb{R},\ \mathrm{sign}(x) \cdot |x| = x$.


PROOF: $\mathrm{sign}(x) \cdot |x|$ equals $1 \cdot x = x$ for $x > 0$, $(-1) \cdot (-x) = x$ for $x < 0$, and $0 \cdot x = x$ for $x = 0$.

Figure 16.5.1 Absolute value and sign functions (graphs of $|x|$ and $\mathrm{sign}(x)$)
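
The case-split constructions of Definitions 16.5.2 and 16.5.4 translate directly into conditional code. The following Python fragment is only an illustrative sketch: the names `abs_value` and `sign` are chosen for this example, and floating-point numbers merely stand in for real numbers. It spot-checks Theorems 16.5.3 and 16.5.6 on a few sample values, which is of course not a proof.

```python
def abs_value(x: float) -> float:
    """Absolute value via the two-way case-split of Definition 16.5.2."""
    return x if x >= 0 else -x


def sign(x: float) -> int:
    """Sign function via the three-way case-split of Definition 16.5.4."""
    if x > 0:
        return 1
    if x == 0:
        return 0
    return -1


samples = [-2.5, -1.0, 0.0, 0.5, 3.0]

# Theorem 16.5.3: |x| >= 0 and -|x| <= x <= |x|.
bounds_ok = all(
    abs_value(x) >= 0 and -abs_value(x) <= x <= abs_value(x) for x in samples
)

# Theorem 16.5.6: sign(x) * |x| == x.
product_ok = all(sign(x) * abs_value(x) == x for x in samples)
```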


16.5.7 REMARK: The Heaviside unit step function. 

The Heaviside function $H : \mathbb{R} \to \mathbb{R}$ is variously defined with $H(0)$ equal to $0$, $1$ or $\frac12$. In generalised function
and transform contexts where it usually appears, the value of $H(0)$ often has no significance. For Fourier
transforms, it is usually best to set $H(0) = \frac12$, but for simplicity in calculations, $H(0)$ equal to 0 or 1 is more
convenient. It is often advantageous for functions to be right-continuous, which suggests that $H(0)$ should
equal 1. Generally the value is chosen to suit the application.
EDM2 [113] gives H(0) = 1 in section 125.E and appendix A, 12.II, but H(0) = 1 in section 306.B. 


ind 2 


Rudin [130], page 180, exercise 24 gives H(0) = 0. Treves [150], pages 26 and 240, leaves H(0) undefined. 


Definition 16.5.8 gives the Heaviside function in terms of the signum function, which forces $H(0) = \frac12$.


Some authors give the notation e(t) for the Heaviside function of t. (E.g. see CRC [261], page F-180.) The 
Heaviside function is also called the unit step function. 


16.5.8 DEFINITION: The Heaviside function $H : \mathbb{R} \to \mathbb{R}$ is defined by $\forall x \in \mathbb{R},\ H(x) = (1 + \mathrm{sign}(x))/2$.
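
As a hedged illustration of Definition 16.5.8, the sketch below builds the Heaviside function from the sign function in Python. The function names are assumptions made for this example, and floats stand in for real numbers. It shows that this definition gives $H(0) = \frac12$, that the reflection formula $H(x) = 1 - H(-x)$ holds on the samples, and that $H(0)^2 \ne H(0)$ for this choice of $H(0)$.

```python
def sign(x: float) -> int:
    """Three-way case-split sign function (Definition 16.5.4)."""
    return (x > 0) - (x < 0)


def heaviside(x: float) -> float:
    """Heaviside function H(x) = (1 + sign(x)) / 2 (Definition 16.5.8)."""
    return (1 + sign(x)) / 2


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
values = [heaviside(x) for x in samples]

# Reflection formula H(x) = 1 - H(-x), which relies on H(0) = 1/2.
reflection_ok = all(heaviside(x) == 1 - heaviside(-x) for x in samples)

# H(0)^2 equals H(0) only when H(0) is 0 or 1, so this is False here.
square_equals_value_at_zero = heaviside(0.0) ** 2 == heaviside(0.0)
```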


16.5.9 REMARK: The value of the Heaviside function at the discontinuity. 

As illustrated in Figure 16.5.2, the Heaviside function value $H(0)$ equals $H(0)\,0^0$ because $0^0 = 1$. (See
Remark 16.6.2 for $0^0$.) However, $H(x)^p \ne H(x)$ for $x = 0$ and integers $p > 1$. The equality $H(x)^p = H(x)$
does hold if $H(0)$ equals 0 or 1, but not for any other choices for $H(0)$.


Figure 16.5.2 Heaviside function multiplied by monomial (graphs of $H(x) = H(x)x^0$, $H(x)x^1$ and $H(x)x^2$)


The value $H(0) = 0$ would have the advantage that $H(x) = \lim_{p \to 0^+} x^p$ for all $x \ge 0$. But the value $H(0) = \frac12$
has the advantage that $H(x) = 1 - H(-x)$ for all $x \in \mathbb{R}$, and it is also the right choice for Fourier analysis.
All in all, the inconsistencies between the definitions of the Heaviside function for its various applications 
make it difficult to provide a single “best” definition. So it is best to explicitly define in each context which 
value is intended for x = 0. 


16.5.10 REMARK: Floor, ceiling, fraction and round functions. 
It may not be immediately obvious that the floor, ceiling, fraction and round functions are constructed by 
logical case-splitting. However, one may write the floor function in Definition 16.5.11 as 


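\[
\forall x \in \mathbb{R}, \quad \mathrm{floor}(x) =
\begin{cases}
\ \vdots \\
0 & \text{if } 0 \le x < 1 \\
1 & \text{if } 1 \le x < 2 \\
\ \vdots
\end{cases}
\;=\;
\begin{cases}
i & \text{if } i \in \mathbb{Z} \text{ and } i \le x < i + 1.
\end{cases}
\]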
This is admittedly not a fixed finite case-split rule. In fact, it makes the construction of the floor function look
more like the inductive definitions in Section 16.6. The constructions $\mathrm{floor} = \{(x,i) \in \mathbb{R} \times \mathbb{Z};\ i \le x < i + 1\}$
and $\mathrm{ceiling} = \{(x,i) \in \mathbb{R} \times \mathbb{Z};\ i - 1 < x \le i\}$ show that induction is not required for Definitions 16.5.11
and 16.5.12.


Any definition which requires an infimum or supremum has an analytical appearance, although these are 
basic set-theoretic order concepts as in Chapter 11, which are apparently more fundamental than algebra or 
analysis. It is generally possible to remove “inf” and “sup” operations from definitions by replacing them 
with equivalent logical formulas. Therefore the boundary between logical and analytical constructions is 
somewhat subjective. 


Definitions 16.5.11 and 16.5.12 are illustrated in Figure 16.5.3. 


Figure 16.5.3 The floor and ceiling functions (graphs of $\mathrm{floor}(x)$ and $\mathrm{ceiling}(x)$)


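16.5.11 DEFINITION: The floor function $\mathrm{floor} : \mathbb{R} \to \mathbb{Z}$ is defined by
\[
\begin{aligned}
\mathrm{floor} &= \{(x,i) \in \mathbb{R} \times \mathbb{Z};\ x - 1 < i \le x\} \\
&= \{(x,i) \in \mathbb{R} \times \mathbb{Z};\ i \le x < i + 1\}.
\end{aligned}
\]
In other words,
\[
\forall x \in \mathbb{R}, \quad \mathrm{floor}(x) = \sup\{i \in \mathbb{Z};\ i \le x\}.
\]
An alternative notation for $\mathrm{floor}(x)$ is $\lfloor x \rfloor$.

16.5.12 DEFINITION: The ceiling function $\mathrm{ceiling} : \mathbb{R} \to \mathbb{Z}$ is defined by
\[
\begin{aligned}
\mathrm{ceiling} &= \{(x,i) \in \mathbb{R} \times \mathbb{Z};\ x \le i < x + 1\} \\
&= \{(x,i) \in \mathbb{R} \times \mathbb{Z};\ i - 1 < x \le i\}.
\end{aligned}
\]
In other words,
\[
\forall x \in \mathbb{R}, \quad \mathrm{ceiling}(x) = \inf\{i \in \mathbb{Z};\ i \ge x\}.
\]
An alternative notation for $\mathrm{ceiling}(x)$ is $\lceil x \rceil$.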


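16.5.13 THEOREM: Formula for the ceiling function in terms of the floor function.
$\forall x \in \mathbb{R},\ \mathrm{ceiling}(x) = -\mathrm{floor}(-x)$.

PROOF: From Definitions 16.5.11 and 16.5.12, it follows that:
\[
\begin{aligned}
\forall x \in \mathbb{R}, \quad \mathrm{ceiling}(x) &= \inf\{i \in \mathbb{Z};\ i \ge x\} \\
&= -\sup\{-i;\ i \in \mathbb{Z} \text{ and } i \ge x\} \\
&= -\sup\{i;\ -i \in \mathbb{Z} \text{ and } -i \ge x\} \\
&= -\sup\{i \in \mathbb{Z};\ i \le -x\} \\
&= -\mathrm{floor}(-x).
\end{aligned}
\]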
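
The identity in Theorem 16.5.13 and the defining inequalities of Definitions 16.5.11 and 16.5.12 can be spot-checked numerically. The sketch below uses Python's standard `math.floor` and `math.ceil` on exact `Fraction` values; the sample list is an arbitrary choice made for this illustration.

```python
import math
from fractions import Fraction

samples = [
    Fraction(-7, 2), Fraction(-2), Fraction(-1, 3),
    Fraction(0), Fraction(5, 4), Fraction(3),
]

# Theorem 16.5.13: ceiling(x) = -floor(-x).
identity_ok = all(math.ceil(x) == -math.floor(-x) for x in samples)

# Definition 16.5.11: floor(x) is the integer i with i <= x < i + 1.
floor_ok = all(math.floor(x) <= x < math.floor(x) + 1 for x in samples)

# Definition 16.5.12: ceiling(x) is the integer i with i - 1 < x <= i.
ceiling_ok = all(math.ceil(x) - 1 < x <= math.ceil(x) for x in samples)
```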


16.5.14 REMARK: Alternative notation for the floor function. 
Some authors use the notation ent(x) for floor(x). (E.g. see CRC [261], page F-180.)

16.5.15 REMARK: The fractional-part and round-to-integer functions.
Two useful functions which can be easily expressed in terms of the floor function are the “fractional part” 
and “round” functions. 


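16.5.16 DEFINITION: The fractional part function $\mathrm{frac} : \mathbb{R} \to [0,1)$ is defined by

\[
\forall x \in \mathbb{R}, \quad \mathrm{frac}(x) = x - \mathrm{floor}(x).
\]

16.5.17 DEFINITION: The round function $\mathrm{round} : \mathbb{R} \to \mathbb{Z}$ is defined by
\[
\forall x \in \mathbb{R}, \quad \mathrm{round}(x) = \mathrm{sign}(x) \cdot \mathrm{floor}(|x| + \tfrac12).
\]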


This may also be called the nearest integer function. 


Figure 16.5.4 Fractional part and rounding functions (graphs of $\mathrm{frac}(x)$ and the rounding function)


16.5.18 REMARK: Variant styles of round-to-integer functions. 
There are several common variants of the rounding function in Definition 16.5.17. Probably the most popular 
version rounds to the nearest integer “away from zero”, as defined here.
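
To make the difference between rounding variants concrete, the following Python sketch implements the fractional part of Definition 16.5.16 and the “away from zero” rounding of Definition 16.5.17, and compares the latter with Python's built-in `round`, which rounds half-integers to the nearest even integer. The names `frac` and `round_away` are assumptions of this example.

```python
import math


def sign(x: float) -> int:
    """Three-way case-split sign function (Definition 16.5.4)."""
    return (x > 0) - (x < 0)


def frac(x: float) -> float:
    """Fractional part x - floor(x), which lies in [0, 1)."""
    return x - math.floor(x)


def round_away(x: float) -> int:
    """Nearest integer with ties rounded away from zero (Definition 16.5.17)."""
    return sign(x) * math.floor(abs(x) + 0.5)


half_integers = [-2.5, -0.5, 0.5, 1.5, 2.5]
away = [round_away(x) for x in half_integers]   # ties move away from zero
even = [round(x) for x in half_integers]        # built-in: ties move to even
```

The two variants agree except at half-integers, which is exactly where the choice of convention matters.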


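16.5.19 DEFINITION: The modulo function $\mathrm{mod} : \mathbb{R} \times \mathbb{R}^+ \to \mathbb{R}$ is defined by

\[
\forall x \in \mathbb{R},\ \forall m \in \mathbb{R}^+, \quad x \bmod m = \inf\{y \in \mathbb{R}_0^+;\ \exists i \in \mathbb{Z},\ x = i \cdot m + y\}.
\]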


16.5.20 REMARK: Expressions for the modulo function. 
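The modulo function satisfies
\[
\begin{aligned}
\forall x \in \mathbb{R},\ \forall m \in \mathbb{R}^+, \quad x \bmod m &= \inf(\{x + i \cdot m;\ i \in \mathbb{Z}\} \cap \mathbb{R}_0^+) \\
&= \inf\{x - km;\ k \in \mathbb{Z} \text{ and } x \ge km\} \\
&= x - m \lfloor x/m \rfloor \\
&= m\,\mathrm{frac}(x/m).
\end{aligned}
\]

Figure 16.5.5 illustrates the modulo function and a shifted modulo function $x \mapsto (x + m) \bmod 2m - m$. Such
functions are familiar in electronics as “relaxation oscillator” waveforms.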


Figure 16.5.5 Modulo functions (graphs of $x \bmod m$ and $(x + m) \bmod 2m - m$)
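
The closed-form expressions in Remark 16.5.20 can be compared with each other in a few lines of Python. This is a sketch under stated assumptions: the function names are chosen for this example, exact `Fraction` values are used to avoid floating-point rounding, and Python's `%` operator is used only as an independent cross-check for a positive modulus.

```python
import math
from fractions import Fraction


def mod(x, m):
    """x mod m computed as x - m * floor(x / m), for m > 0."""
    if m <= 0:
        raise ValueError("modulus m must be positive")
    return x - m * math.floor(x / m)


def mod_via_frac(x, m):
    """x mod m computed as m * frac(x / m)."""
    q = x / m
    return m * (q - math.floor(q))


def shifted_mod(x, m):
    """Shifted modulo function (x + m) mod 2m - m, with values in [-m, m)."""
    return mod(x + m, 2 * m) - m


m = Fraction(2)
samples = [Fraction(-7, 2), Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(29, 4)]

forms_agree = all(mod(x, m) == mod_via_frac(x, m) == x % m for x in samples)
bounds_ok = all(0 <= mod(x, m) < m for x in samples)
```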
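16.5.21 THEOREM: Lower and upper bounds for the value of the modulo function.
(i) $\forall x \in \mathbb{R},\ \forall m \in \mathbb{R}^+,\ 0 \le x \bmod m < m$.


PROOF: For part (i), let $x \in \mathbb{R}$ and $m \in \mathbb{R}^+$. Let $S = \{y \in \mathbb{R}_0^+;\ \exists i \in \mathbb{Z},\ x = i \cdot m + y\}$. Then $x \bmod m = \inf S$ by
Definition 16.5.19. Suppose that $x \ge 0$. Then $x \in S$ because $x = i \cdot m + y$ with
$i = 0$ and $y = x$. So $S \ne \emptyset$. Therefore $x \bmod m \in \mathbb{R}_0^+$, and so $x \bmod m \ge 0$.
Now suppose that $x < 0$. Then $x/m \in \mathbb{R}^-$, and so $\exists i \in \mathbb{Z},\ i < x/m$ by Theorem 15.9.5 (iv). For some such $i$,
let $y_0 = x - im$. Then $y_0 \in \mathbb{R}_0^+$ and $x = im + y_0$. So $y_0 \in S \ne \emptyset$. Therefore
$x \bmod m \ge 0$. For the second inequality in part (i), assume $x \bmod m \ge m$. Then $\inf S \ge m$.
Let $y \in S$. Then $x = i \cdot m + y$ for some $i \in \mathbb{Z}$, and $y \ge \inf S \ge m$. So $y - m \in \mathbb{R}_0^+$ and
$x = (i + 1) \cdot m + (y - m)$, which means that $y - m \in S$. Therefore $\inf S \le y - m$ for all $y \in S$. So $\inf S + m$
is a lower bound for $S$, which implies $\inf S + m \le \inf S$. This contradicts $m > 0$. Hence $x \bmod m < m$.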


16.5.22 REMARK: Sawtooth functions. 
The sawtooth functions shown in Figure 16.5.6 are useful for defining the sine and cosine functions in 
Section 44.2. 


Figure 16.5.6 Sawtooth functions (graphs of $|(x + a) \bmod 2a - a|$ and $|(x + 3a) \bmod 4a - 2a| - a$)


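A particular sawtooth function is the “distance to nearest integer”, defined by $\inf\{|x - k|;\ k \in \mathbb{Z}\}$. The
difference between $x$ and the rounded value $\mathrm{round}(x) = \mathrm{ceiling}(x - \frac12)$ is

\[
\begin{aligned}
x - \mathrm{sign}(x) \cdot \mathrm{floor}(|x| + \tfrac12) &= \mathrm{sign}(x) \cdot (|x| - \mathrm{floor}(|x| + \tfrac12)) \\
&= \mathrm{sign}(x) \cdot (\mathrm{frac}(|x| + \tfrac12) - \tfrac12).
\end{aligned}
\]

Therefore the distance from a real number $x$ to the nearest integer equals the absolute value of this, which
is

\[
\begin{aligned}
|\mathrm{frac}(|x| + \tfrac12) - \tfrac12| &= |(|x| + \tfrac12) \bmod 1 - \tfrac12| \\
&= |(x + \tfrac12) \bmod 1 - \tfrac12| \\
&= |(x + a) \bmod 2a - a|
\end{aligned}
\]

with $a = \frac12$, which is illustrated in Figure 16.5.6.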
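
The sawtooth formula for the distance to the nearest integer can be checked against the direct definition $\inf\{|x - k|;\ k \in \mathbb{Z}\}$. The Python sketch below is illustrative only: the function names are assumptions of this example, and exact `Fraction` inputs are used so that the comparison is not disturbed by rounding.

```python
import math
from fractions import Fraction


def mod(x, m):
    """x mod m computed as x - m * floor(x / m), for m > 0."""
    return x - m * math.floor(x / m)


def dist_sawtooth(x, a=Fraction(1, 2)):
    """Sawtooth |(x + a) mod 2a - a|; a = 1/2 gives distance to nearest integer."""
    return abs(mod(x + a, 2 * a) - a)


def dist_direct(x):
    """Distance to the nearest integer from the two neighbouring integers."""
    k = math.floor(x)
    return min(x - k, k + 1 - x)


samples = [Fraction(-5, 2), Fraction(-1, 3), Fraction(0), Fraction(1, 2), Fraction(7, 4)]
formulas_agree = all(dist_sawtooth(x) == dist_direct(x) for x in samples)
```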
16.5.23 REMARK: Square functions. 
Figure 16.5.7 shows some basic square functions. The functions $\mathrm{floor}(x)$ and $\mathrm{floor}(x/a) - 2\,\mathrm{floor}(x/(2a))$ are
right-continuous. The functions $\mathrm{ceiling}(x)$ and $2\,\mathrm{ceiling}(x/(2a)) - \mathrm{ceiling}(x/a)$ are left-continuous.


Figure 16.5.7 (graphs of $\mathrm{floor}(x/a) - 2\,\mathrm{floor}(x/(2a))$ and $2\,\mathrm{ceiling}(x/(2a)) - \mathrm{ceiling}(x/a)$)

16.6. Powers and roots of real numbers


16.6.1 REMARK: Inductive definitions of some basic real-valued functions. 

Mathematical induction is inherent in the definition and basic properties of the real numbers, as for example 
Theorem 15.6.15. (See Theorem 12.2.12 for the mathematical induction principle.) The real-valued functions 
in Section 16.6 use mathematical induction explicitly in their definitions. 


16.6.2 REMARK: Non-negative integral power functions.

Any finite set of integral power functions $x \mapsto x^p$ for $x \in \mathbb{R}$ and $p \in \mathbb{Z}_0^+$ may be defined as individual functions
without using induction. By using induction, one may define the general map $(x,p) \mapsto x^p$ for $(x,p) \in \mathbb{R} \times \mathbb{Z}_0^+$.
The cases $p = 0$ to $p = 5$ are sketched in Figure 16.6.1.


Figure 16.6.1 Non-negative integral powers 


The only case requiring careful thought is $(x,p) = (0,0)$. It is best to define $0^0 = 1$ in accordance with
Theorem 14.7.9, which states that $\#(0^0) = 1$ because $\emptyset^\emptyset = \{\emptyset\}$ by Theorem 10.2.25. Consequently the
function $x \mapsto x^0$ is continuous. Another way to look at $0^0$ is analytically as the limit expression $\lim_{x \to 0^+} x^x =
\lim_{x \to 0^+} \exp(x \ln x) = \exp(\lim_{x \to 0^+} x \ln x) = \exp(0) = 1$.

The notation $x^p$ for the non-negative integer power in Definition 16.6.3 is consistent with the power-of-two
function for integers in Notation 14.7.8. Both Definitions 16.6.3 and 16.6.4 are consistent with the integral
powers of group elements in Notation 17.4.9.


16.6.3 DEFINITION: The (non-negative integer) power $x^p$ of a real number $x \in \mathbb{R}$ with non-negative integer
exponent $p \in \mathbb{Z}_0^+$ is defined inductively by


(1) $\forall x \in \mathbb{R},\ x^0 = 1$,
(2) $\forall x \in \mathbb{R},\ \forall p \in \mathbb{Z}^+,\ x^p = x^{p-1} x$.


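16.6.4 DEFINITION: The (negative integer) power $x^p$ of a non-zero real number $x \in \mathbb{R} \setminus \{0\}$ with negative
integer exponent $p \in \mathbb{Z}^-$ is defined inductively by


(1) $\forall x \in \mathbb{R} \setminus \{0\},\ x^{-1} = 1/x$,
(2) $\forall x \in \mathbb{R} \setminus \{0\},\ \forall p \in \mathbb{Z}^+,\ x^{-p-1} = x^{-p}/x$.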
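
The inductive rules of Definitions 16.6.3 and 16.6.4 correspond to simple recursive functions. The Python sketch below is an illustration only: the names `power_nonneg` and `power_neg` are assumptions of this example, and the downward induction step is taken in the form $x^{-p-1} = x^{-p}/x$, which is one natural reading of the second rule. Note that the base case gives $0^0 = 1$, as recommended in Remark 16.6.2.

```python
from fractions import Fraction


def power_nonneg(x, p: int):
    """x**p for integer p >= 0 by induction on p (Definition 16.6.3)."""
    if p < 0:
        raise ValueError("exponent must be non-negative")
    if p == 0:
        return 1                       # rule (1): x^0 = 1, including 0^0 = 1
    return power_nonneg(x, p - 1) * x  # rule (2): x^p = x^(p-1) * x


def power_neg(x, p: int):
    """x**p for x != 0 and integer p <= -1, by induction downwards from p = -1."""
    if x == 0 or p >= 0:
        raise ValueError("requires non-zero x and negative exponent")
    if p == -1:
        return 1 / x                   # base case: x^(-1) = 1/x
    return power_neg(x, p + 1) / x     # assumed step: x^(p) = x^(p+1) / x


two = Fraction(2)
inverse_ok = power_nonneg(two, 3) * power_neg(two, -3) == 1
```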


16.6.5 REMARK: Positive integral roots of non-negative real numbers. 

The existence of positive integral roots of non-negative real numbers is demonstrated in Theorem 16.6.6 (x) 
in an essentially analytic fashion. As mentioned in Remark 16.5.10, this blurs the dividing line between 
arithmetic and analysis. Any definition which uses an infimum or supremum is effectively using a limit, 
which is a topological or analytical concept. 
The supremum $\sup\{z \in \mathbb{R}_0^+;\ z^q \le x\}$ in Theorem 16.6.6 (xi) must be shown to be a solution for $y$ in the
equation $y^q = x$, which in essence requires a proof of the continuity of the function $y \mapsto y^q$, which is clearly
a topological concept. In fact, the bounds for $z^p - y^p$ in Theorem 16.6.6 (viii, ix) are closely related to the
Lipschitz continuity parameter for the function $y \mapsto y^p$.


It should not be surprising that some kind of analysis is required for the definition and properties of integer
roots of real numbers because when these are combined with the power functions in Definition 16.6.3, general
positive rational powers of non-negative real numbers may be defined. By “filling the gaps” between the
rational numbers using an infimum or supremum construction, general exponential functions such as $10^x$ or
$e^x$ may be defined as in Definition 16.6.15.


Since square roots and cube roots have been computed since very ancient times, it may seem that integer 
roots are part of elementary arithmetic. It is therefore annoying, perhaps, to discover that they require 
substantial work to demonstrate their well-definition. On the other hand, the basic properties of reciprocals 
of real numbers require substantial work in Theorem 15.8.14 because multiplicative inverses are also defined 
as solutions of equations, although reciprocals are even more ancient than square roots and cube roots. 


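16.6.6 THEOREM: Well-definition of positive integral roots of non-negative real numbers.
(i) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}_0^+,\ (x_1 < x_2 \Rightarrow x_1^p < x_2^p)$.
(ii) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}_0^+,\ (x_1 \le x_2 \Rightarrow x_1^p \le x_2^p)$.
(iii) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}_0^+,\ (x_1 < x_2 \Leftrightarrow x_1^p < x_2^p)$.
(iv) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}_0^+,\ (x_1 = x_2 \Leftrightarrow x_1^p = x_2^p)$.
(v) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}_0^+,\ (x_1 \le x_2 \Leftrightarrow x_1^p \le x_2^p)$.
(vi) $\forall p \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}_0^+,\ (x \ge 1 \Rightarrow x^p \ge x)$.
(vii) $\forall p \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}_0^+,\ (x \le 1 \Rightarrow x^p \le x)$.
(viii) $\forall p \in \mathbb{Z}^+,\ \forall y, z \in \mathbb{R}_0^+,\ ((p \ge 2 \text{ and } y < z) \Rightarrow z^p - y^p < p(z - y)z^{p-1})$.
(ix) $\forall p \in \mathbb{Z}^+,\ \forall y, z \in \mathbb{R}_0^+,\ (y \le z \Rightarrow z^p - y^p \le p(z - y)z^{p-1})$.
(x) $\forall q \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}_0^+,\ \exists' y \in \mathbb{R}_0^+,\ y^q = x$.
(xi) $\forall q \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}_0^+,\ \forall y \in \mathbb{R}_0^+,\ (y^q = x \Leftrightarrow y = \sup\{z \in \mathbb{R}_0^+;\ z^q \le x\})$.
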


PROOF: For part (i), let $p \in \mathbb{Z}^+$ and $x_1, x_2 \in \mathbb{R}_0^+$ with $x_1 < x_2$. The assertion $x_1^p < x_2^p$ is obvious
for $p = 1$. So assume that the assertion is true for some $p \in \mathbb{Z}^+$. Then $0 \le x_1 < x_2$ and $0 \le x_1^p < x_2^p$.
So $x_1^{p+1} = x_1^p x_1 \le x_2^p x_1 < x_2^p x_2 = x_2^{p+1}$ by Theorem 15.9.3 (xii, xiii, xiv). Hence the assertion follows for
all $p \in \mathbb{Z}^+$ by induction on $p$.


Part (ii) follows from part (i) and the observation that $x_1 = x_2$ implies $x_1^p = x_2^p$.


For part (iii), let $p \in \mathbb{Z}^+$ and $x_1, x_2 \in \mathbb{R}_0^+$ with $x_1^p < x_2^p$. Then $x_2^p > x_1^p$. So $x_2 > x_1$ by the contrapositive of
part (ii). Therefore $x_1 < x_2$. The converse follows from part (i).

Part (iv) follows from part (iii) because “<” is a total order on R by Theorem 15.9.3 (xii). 

Part (v) follows from parts (iii) and (iv). 

For part (vi), let $p \in \mathbb{Z}^+$ and $x \in \mathbb{R}_0^+$ with $x \ge 1$. The case $p = 1$ is obvious. So assume that the assertion
is true for some $p \in \mathbb{Z}^+$. Then $x^{p+1} = x^p x \ge x^p \cdot 1 = x^p \ge x$ by Theorem 15.9.3 (xii, xiii, xiv, ix). Hence the
assertion follows for all $p \in \mathbb{Z}^+$ by induction on $p$.

For part (vii), let $p \in \mathbb{Z}^+$ and $x \in \mathbb{R}_0^+$ with $x \le 1$. The case $p = 1$ is obvious. So assume that the assertion
is true for some $p \in \mathbb{Z}^+$. Then $x^{p+1} = x^p x \le x^p \cdot 1 = x^p \le x$ by Theorem 15.9.3 (xii, xiii, xiv, ix). Hence the
assertion follows for all $p \in \mathbb{Z}^+$ by induction on $p$.

For part (viii), let $p \in \mathbb{Z}^+$ and $y, z \in \mathbb{R}_0^+$ with $y < z$. Let $p = 2$. Then $z^p - y^p = z^2 - y^2 = (z - y)(z + y) <
2(z - y)z = p(z - y)z^{p-1}$, as claimed. Now assume that the assertion is valid for some $p \ge 2$. Then
$z^{p+1} - y^{p+1} = z(z^p - y^p) + (z - y)y^p < zp(z - y)z^{p-1} + (z - y)z^p = (p + 1)(z - y)z^p$, which verifies the
assertion for the case $p + 1$. Hence the assertion is valid for all $p \ge 2$ by induction on $p$.


For part (ix), let $p \in \mathbb{Z}^+$ and $y, z \in \mathbb{R}_0^+$ with $y \le z$. Let $p = 1$. Then $z^p - y^p = z - y = p(z - y)z^{p-1}$ because
$z^0 = 1$ by Definition 16.6.3 (1). This verifies the case $p = 1$. Now suppose that $p \ge 2$ and $y = z$. Then
$z^p - y^p = 0 = p(z - y)z^{p-1}$, which verifies the assertion for this case. The remaining case $p \ge 2$ with $y < z$
follows from part (viii).

For part (x), let $q \in \mathbb{Z}^+$ and $x \in \mathbb{R}_0^+$. If $q = 1$, then clearly $y = x$ is the unique solution of the equation
$y^q = x$ by Definition 16.6.3. So it may be assumed that $q \ge 2$. But if $x = 0$, then $y = 0$ is the unique
solution of the equation $y^q = x$ by part (iv). So it may also be assumed that $x > 0$.

Let $S = \{z \in \mathbb{R}_0^+;\ z^q \le x\}$. Then $S \ne \emptyset$ because $0 \in S$. Let $z_0 = \mathrm{ceiling}(x)$. (See Definition 16.5.12.) Then
$z_0 \ge 1$ because $x > 0$. So $z_0^q \ge z_0$ by part (vi). Therefore $(z_0 + 1)^q > z_0^q \ge z_0 \ge x$ by part (i), which then
implies that $z_0 + 1 \notin S$. It follows that $z_1 \notin S$ for all $z_1 \in \mathbb{R}_0^+$ with $z_1 > z_0 + 1$ by part (i). Thus $z_0 + 1$ is
an upper bound for $S$. So $S$ has a well-defined supremum $\sup(S) \in \mathbb{R}_0^+$ by Theorem 15.9.3 (xv).

Let $y = \sup(S)$. Suppose that $y = 0$. Then $S = \{0\}$. To show that this implies $x = 0$, assume that $x > 0$.
Let $z = \min(1, x)$. Then $0 < z$. If $x \ge 1$, then $z^q = 1^q = 1 \le x$. So $z \in S$, which contradicts the assumption.
But if $x < 1$, then $z^q = x^q \le x$ by part (vii). So $z \in S$, which contradicts the assumption again. Therefore
$x = 0$, which implies that $y = 0$ is the unique solution of the equation $y^q = x$ by part (iv). Thus it may be
assumed that $y > 0$.

Now suppose that $y > 0$ and $y^q < x$. Let $x' = \min(x, 2y^q)$. Then $y^q < x'$. Let $\delta = (x' - y^q)/(2qy^{q-1})$. Then
$\delta > 0$. Let $z = y + \delta$. Then $y < z$. So $z^q - y^q < q(z - y)z^{q-1}$ by part (viii) and the assumption $q \ge 2$. Therefore
$z^q < y^q + q\delta z^{q-1} = y^q + \frac12 (x' - y^q) z^{q-1}/y^{q-1}$. From $x' \le 2y^q$, it follows that $z^q < y^q + \frac12 (2y^q - y^q) z^{q-1}/y^{q-1} =
y^q + \frac12 y z^{q-1} < y^q + \frac12 z^q$. So $z^q < 2y^q$. Then $x' \le x$ implies $z^q < y^q + \frac12 (x - y^q) z^{q-1}/y^{q-1} < y^q + (x - y^q)y/z <
y^q + (x - y^q) = x$. So $z \in S$. But $y < z$. So this contradicts the assumption $y = \sup(S)$. Consequently $y^q \ge x$.

Now suppose that $y > 0$ and $y^q > x$. Let $\delta = (y^q - x)/(qy^{q-1})$. Then $\delta > 0$. Let $\tilde{y} = y - \delta$. Then
$\tilde{y} < y$. So $y^q - \tilde{y}^q < q(y - \tilde{y})y^{q-1}$ by part (viii) and the assumption $q \ge 2$. Therefore $y^q - \tilde{y}^q < q\delta y^{q-1} =
qy^{q-1}(y^q - x)/(qy^{q-1}) = y^q - x$. So $\tilde{y}^q > x$. Therefore $z < \tilde{y}$ for all $z \in S$ by part (iii), and $\tilde{y} \notin S$. So $\tilde{y}$ is an
upper bound for $S$ which is less than $y$, which contradicts the assumption $y = \sup(S)$. Consequently $y^q \le x$.
Hence $y^q = x$. The uniqueness of $y$ follows from part (iv).


Part (xi) follows directly from the proof of part (x). 
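
The supremum characterisation in Theorem 16.6.6 (xi) suggests a simple numerical procedure: repeatedly halve an interval whose left end lies in $S = \{z \in \mathbb{R}_0^+;\ z^q \le x\}$ and whose right end is an upper bound for $S$. The Python sketch below is a hedged illustration of this idea with floating-point numbers, not the construction used in the proof; the name `integral_root`, the iteration count, and the initial upper bound $\max(1, x)$ are choices made for this example.

```python
def integral_root(x: float, q: int, iterations: int = 100) -> float:
    """Approximate sup{z >= 0; z**q <= x} by bisection."""
    if x < 0 or q < 1:
        raise ValueError("requires x >= 0 and q >= 1")
    # lo always satisfies lo**q <= x, and hi is always an upper bound for S:
    # if x >= 1 then x**q >= x, and if x < 1 then 1**q = 1 > x.
    lo, hi = 0.0, max(1.0, x)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid ** q <= x:
            lo = mid    # mid belongs to S, so the supremum is at least mid
        else:
            hi = mid    # mid is an upper bound for S
    return lo


square_root_of_two = integral_root(2.0, 2)
cube_root_of_27 = integral_root(27.0, 3)
```

The monotonicity in Theorem 16.6.6 (i) is what justifies discarding half of the interval at each step.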


16.6.7 REMARK: Definition of positive integral root of a non-negative real number. 

Strictly speaking, positive integral roots of non-negative real numbers are solutions $y \in \mathbb{R}_0^+$ of the equation
$y^q = x$ for $x \in \mathbb{R}_0^+$ and $q \in \mathbb{Z}^+$ as in Theorem 16.6.6 (x). However, Definition 16.6.8 is stated slightly more
concretely in terms of the set-supremum $\sup\{z \in \mathbb{R}_0^+;\ z^q \le x\}$ in Theorem 16.6.6 (xi).


16.6.8 DEFINITION: The (positive integral) root of a non-negative real number $x \in \mathbb{R}_0^+$ with index $q \in \mathbb{Z}^+$
is the real number $\sup\{z \in \mathbb{R}_0^+;\ z^q \le x\}$.


16.6.9 NOTATION: $\sqrt[q]{x}$, for $x \in \mathbb{R}_0^+$ and $q \in \mathbb{Z}^+$, denotes the positive integral root of $x$ with index $q$. In
other words, $\sqrt[q]{x} = \sup\{z \in \mathbb{R}_0^+;\ z^q \le x\}$ is the unique solution $y$ of the equation $y^q = x$.

$x^{1/q}$ is an alternative notation for $\sqrt[q]{x}$.


16.6.10 REMARK: Rational powers of non-negative real numbers.

The rational power $x^{p/q}$ in Definition 16.6.11 and Notation 16.6.13 may be computed as either $(x^p)^{1/q}$
or $(x^{1/q})^p$, and multiplication or division of $p$ and $q$ by the same positive integer makes no difference to
the value obtained. In other words, $x^{p/q}$ depends only on the rational number equivalence class $[(p,q)]$ in
Definition 15.1.5. This follows from the fact that $x^{p/q}$ is the unique solution $y$ of the equation $y^q = x^p$ by
applying various identities such as $x^{mn} = x^{nm} = (x^m)^n = (x^n)^m$ for $m, n \in \mathbb{Z}^+$. If $q < 0$, it is assumed
that $x^{p/q}$ means $x^{(-p)/(-q)}$. (Definitions 16.6.11 and 16.6.12 are illustrated in Figure 16.6.2.)

For convenience, the undefined expression $(x^p)^{1/q}$ in Definition 16.6.12 for $x = 0$ and $p < 0$ may be written
as the pseudo-value $+\infty$. This choice of value is occasionally useful.


16.6.11 DEFINITION: The (non-negative rational) power of a non-negative real number $x \in \mathbb{R}_0^+$ with rational
exponent $p/q \in \mathbb{Q}$, where $p \in \mathbb{Z}_0^+$ and $q \in \mathbb{Z}^+$, is the real number $(x^p)^{1/q}$.


16.6.12 DEFINITION: The (negative rational) power of a positive real number $x \in \mathbb{R}^+$ with rational
exponent $p/q \in \mathbb{Q}$, where $p \in \mathbb{Z}^-$ and $q \in \mathbb{Z}^+$, is the real number $(x^p)^{1/q}$. (Optional value $+\infty$ for $x = 0$.)


16.6.13 NOTATION: $x^{p/q}$, for $x \in \mathbb{R}_0^+$, $p \in \mathbb{Z}_0^+$ and $q \in \mathbb{Z}^+$, denotes the power of a non-negative real
number $x$ with non-negative rational exponent $p/q \in \mathbb{Q}$.

$x^{p/q}$, for $x \in \mathbb{R}^+$, $p \in \mathbb{Z}^-$ and $q \in \mathbb{Z}^+$, denotes the power of a positive real number $x$ with negative rational
exponent $p/q \in \mathbb{Q}$.

In other words, $x^{p/q} = (x^p)^{1/q} = (x^{1/q})^p = \sqrt[q]{x^p}$. In other words, $x^{p/q}$ is the unique solution $y$ for the
equation $y^q = x^p$.


Figure 16.6.2 Rational powers of non-negative real numbers


16.6.14 REMARK: Real powers of non-negative real numbers. 

Powers of non-negative real numbers may be extended from rational exponents as in Definition 16.6.11 to 
real exponents as in Definition 16.6.15 by forming limits. The general limit concept, as in Definition 35.3.7, 
requires the higher-level topological framework. However, it is possible to cobble together a “toy limit” 
construction for the narrow purpose of defining real powers in terms of rational powers. 


The special case $x = 0$ in Definition 16.6.15 may be computed directly from the formula
$\inf\{x^s;\ s \in \mathbb{Q} \text{ and } s \le r\}$ which is specified for $0 < x < 1$. If $r < 0$, then the set $S_{x,r} = \{x^s;\ s \in \mathbb{Q} \text{ and } s \le r\}$
is empty because $x^s$ is undefined for $s < 0$. So $\inf(S_{x,r}) = +\infty$ by Theorem 16.2.14 (ii). (Alternatively,
$S_{x,r} = \{+\infty\}$, which also yields $\inf(S_{x,r}) = \infty$.) If $r = 0$, then $S_{x,r} = S_{0,0} = \{1\}$, which implies $\inf(S_{x,r}) = 1$.
If $r > 0$, then $S_{x,r} = \{0, 1\}$, which implies $\inf(S_{x,r}) = 0$. However, these three cases are written out explicitly
for convenience.


Strictly speaking, the extended real number value $+\infty$ for $x = 0$ and $r < 0$ in Definition 16.6.15 means
“undefined”. It is included here for the convenience of defining real powers for a simple Cartesian set-product
domain $(x,r) \in \mathbb{R}_0^+ \times \mathbb{R}$ instead of splitting the cases according to what is and is not defined. On
the other hand, the extended real value $+\infty$ is occasionally useful.


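16.6.15 DEFINITION: The (real) power of a non-negative real number $x \in \mathbb{R}_0^+$ with exponent $r \in \mathbb{R}$ is the
extended real number

\[
\begin{cases}
+\infty & \text{if } x = 0 \text{ and } r < 0 \\
1 & \text{if } x = 0 \text{ and } r = 0 \\
0 & \text{if } x = 0 \text{ and } r > 0 \\
\inf\{x^s;\ s \in \mathbb{Q} \text{ and } s \le r\} & \text{if } 0 < x < 1 \\
\inf\{x^s;\ s \in \mathbb{Q} \text{ and } s \ge r\} & \text{if } 1 \le x.
\end{cases}
\]


16.6.16 NOTATION: $x^r$, for $x \in \mathbb{R}_0^+$ and $r \in \mathbb{R}$, denotes the power of $x$ with real exponent $r$.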
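
The case-split in Definition 16.6.15 can be imitated numerically by sampling a single rational exponent close to $r$ on the appropriate side. The Python sketch below is a rough illustration under several assumptions: floats stand in for real numbers, the rational power is evaluated as $(x^{1/q})^p$ with the built-in `**` operator rather than by the supremum construction, and the names `rational_power` and `real_power` and the denominator bound `n` are hypothetical choices for this example. For $x > 0$ the returned value is (up to rounding) one element of the set whose infimum defines $x^r$, so it only approximates that infimum.

```python
import math
from fractions import Fraction


def rational_power(x: float, s: Fraction) -> float:
    """x**(p/q) computed as (x**(1/q))**p, one of the forms in Remark 16.6.10."""
    return (x ** (1.0 / s.denominator)) ** s.numerator


def real_power(x: float, r: float, n: int = 10**6) -> float:
    """Estimate x**r following the case-split of Definition 16.6.15."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        if r < 0:
            return math.inf
        return 1.0 if r == 0 else 0.0
    if x < 1:
        s = Fraction(math.floor(r * n), n)   # rational exponent s <= r
    else:
        s = Fraction(math.ceil(r * n), n)    # rational exponent s >= r
    return rational_power(x, s)


estimate = real_power(2.0, math.sqrt(2.0))
reference = 2.0 ** math.sqrt(2.0)
```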


16.6.17 REMARK: Elementary inequalities for real numbers involving square roots. 

The inequalities in Theorem 16.6.18 may be very broadly generalised. (The Cauchy-Schwarz inequality 
in Theorem 24.9.14 is a relatively minor generalisation of Theorem 16.6.18 (ii).) But the assertions in 
Theorem 16.6.18 may be proved in an elementary way using only some basic properties of the real numbers 
and the square root function. 


16.6.18 THEOREM: Some basic inequalities for real numbers and square roots.
(i) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ (x_1 x_2 + y_1 y_2)^2 \le (x_1^2 + y_1^2)(x_2^2 + y_2^2)$.
(ii) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ |x_1 x_2 + y_1 y_2| \le (x_1^2 + y_1^2)^{1/2} (x_2^2 + y_2^2)^{1/2}$.
(iii) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ (x_1 + x_2)^2 + (y_1 + y_2)^2 \le ((x_1^2 + y_1^2)^{1/2} + (x_2^2 + y_2^2)^{1/2})^2$.
(iv) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ ((x_1 + x_2)^2 + (y_1 + y_2)^2)^{1/2} \le (x_1^2 + y_1^2)^{1/2} + (x_2^2 + y_2^2)^{1/2}$.

PROOF: For part (i), $(x_1^2 + y_1^2)(x_2^2 + y_2^2) - (x_1 x_2 + y_1 y_2)^2 = x_1^2 y_2^2 + y_1^2 x_2^2 - 2 x_1 x_2 y_1 y_2 = (x_1 y_2 - y_1 x_2)^2 \ge 0$.
The assertion follows directly from this.

Part (ii) follows from part (i) and Theorem 16.6.6 (iii).

For part (iii), $((x_1^2 + y_1^2)^{1/2} + (x_2^2 + y_2^2)^{1/2})^2 - (x_1 + x_2)^2 - (y_1 + y_2)^2 = 2(x_1^2 + y_1^2)^{1/2}(x_2^2 + y_2^2)^{1/2} - 2 x_1 x_2 - 2 y_1 y_2$.
This is non-negative by part (ii). The assertion follows directly from this.

Part (iv) follows from part (iii) and Theorem 16.6.6 (iii).


16.6.19 THEOREM: Some basic inequalities for differences between square roots.
(i) $\forall x, y \in \mathbb{R}_0^+,\ (x - y)^2 \le |x^2 - y^2|$.
(ii) $\forall a, b \in \mathbb{R}_0^+,\ (a^{1/2} - b^{1/2})^2 \le |a - b|$.
(iii) $\forall a, b \in \mathbb{R}_0^+,\ |a^{1/2} - b^{1/2}| \le |a - b|^{1/2}$.
(iv) $\forall x \in \mathbb{R}_0^+,\ (1 + x)^{1/2} - x^{1/2} \le 1$.


PROOF: For part (i), let $x, y \in \mathbb{R}_0^+$ with $y \le x$. Then $(x - y)^2 - |x^2 - y^2| = x^2 - 2xy + y^2 - x^2 + y^2 =
2y^2 - 2xy = 2y(y - x) \le 0$. So $(x - y)^2 \le |x^2 - y^2|$. The same holds for $x \le y$ by swapping $x$ and $y$.

Part (ii) follows from part (i) with $x = a^{1/2}$ and $y = b^{1/2}$.

Part (iii) follows from part (ii) and Theorem 16.6.6 (ii, iv).

Part (iv) follows from part (iii) with $a = 1 + x$ and $b = x$.


16.7. Sums and products of finite sequences 


16.7.1 REMARK: Sums and products of finite sequences of real numbers. 

Notations 16.7.2 and 16.7.3 are clones of Notations 14.4.17 and 14.4.18, inserting real numbers into the 
formula templates instead of integers. It is then possible to state and prove Theorem 16.7.4 (iii) for the 
combination symbol in Section 14.9, which is defined purely in terms of integers. 


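16.7.2 NOTATION: $\sum_{i=m}^n a_i$, for $m, n \in \mathbb{Z}_0^+$ and any function $a : I \to \mathbb{R}$ with $I \subseteq \mathbb{Z}[m,n]$, denotes the sum
defined inductively with respect to $n - m$ by

(1) $\sum_{i=m}^n a_i = 0$ if $n - m < 0$,
(2) $\sum_{i=m}^n a_i = a_m$ if $n - m = 0$,
(3) $\sum_{i=m}^n a_i = \bigl(\sum_{i=m}^{n-1} a_i\bigr) + a_n$ if $n - m > 0$.


16.7.3 NOTATION: $\prod_{i=m}^n a_i$, for $m, n \in \mathbb{Z}_0^+$ and any function $a : I \to \mathbb{R}$ with $I \subseteq \mathbb{Z}[m,n]$, denotes the
product defined inductively with respect to $n - m$ by

(1) $\prod_{i=m}^n a_i = 1$ if $n - m < 0$,
(2) $\prod_{i=m}^n a_i = a_m$ if $n - m = 0$,
(3) $\prod_{i=m}^n a_i = \bigl(\prod_{i=m}^{n-1} a_i\bigr)\, a_n$ if $n - m > 0$.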
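
The three-rule inductive definitions in Notations 16.7.2 and 16.7.3 map directly onto recursive functions. The following Python sketch is illustrative only: the names `seq_sum` and `seq_prod` are assumptions of this example, and the sequence is passed as a function of the index. It shows in particular the conventions that an empty sum is 0 and an empty product is 1.

```python
from typing import Callable


def seq_sum(a: Callable[[int], float], m: int, n: int) -> float:
    """Sum of a(i) for i = m, ..., n, defined by induction on n - m."""
    if n - m < 0:
        return 0                        # rule (1): empty sum
    if n - m == 0:
        return a(m)                     # rule (2): single term
    return seq_sum(a, m, n - 1) + a(n)  # rule (3): add the last term


def seq_prod(a: Callable[[int], float], m: int, n: int) -> float:
    """Product of a(i) for i = m, ..., n, defined by induction on n - m."""
    if n - m < 0:
        return 1                         # rule (1): empty product
    if n - m == 0:
        return a(m)                      # rule (2): single factor
    return seq_prod(a, m, n - 1) * a(n)  # rule (3): multiply by the last factor


triangular = seq_sum(lambda i: i, 1, 4)     # 1 + 2 + 3 + 4
factorial_5 = seq_prod(lambda i: i, 1, 5)   # 1 * 2 * 3 * 4 * 5
```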

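16.7.4 THEOREM: Basic formulas for powers and combinations as products of sequences.
(i) $\forall x \in \mathbb{R},\ \forall p \in \mathbb{Z}_0^+,\ x^p = \prod_{i=1}^p x$.
(ii) $\forall x \in \mathbb{R} \setminus \{0\},\ \forall p \in \mathbb{Z}_0^+,\ x^{-p} = \prod_{i=1}^p 1/x$.
(iii) $\forall n, r \in \mathbb{Z}_0^+,\ C^n_r = \prod_{i=0}^{r-1} (n - i)/(1 + i)$.


PROOF: Part (i) follows from Definition 16.6.3 and Notation 16.7.3.
Part (ii) follows from Definition 16.6.4 and Notation 16.7.3.
For part (iii), let $n = 0$. Then $C^n_r = \delta_{r0}$ by Theorem 14.9.8 (i), and also $\prod_{i=0}^{r-1} (n - i)/(1 + i) = \delta_{r0}$ by
Notation 16.7.3. Therefore $C^n_r = \prod_{i=0}^{r-1} (n - i)/(1 + i)$ for all $r \in \mathbb{Z}_0^+$ when $n = 0$.
Assume the formula for $n = k \in \mathbb{Z}_0^+$. Let $n = k + 1$. Then $C^{k+1}_0 = 1$ by Definition 14.9.2. Therefore
$C^{k+1}_r = \prod_{i=0}^{r-1} (n - i)/(1 + i)$ for $r = 0$. So let $r \in \mathbb{Z}^+$. Then $C^{k+1}_r = C^k_{r-1} + C^k_r$ by Theorem 14.9.8 (ii). So
the case $n = k$ implies

\[
\begin{aligned}
C^{k+1}_r &= \prod_{i=0}^{r-2} (k - i)/(1 + i) + \prod_{i=0}^{r-1} (k - i)/(1 + i) \\
&= \Bigl(1 + \frac{k - r + 1}{r}\Bigr) \prod_{i=0}^{r-2} (k - i)/(1 + i) \\
&= \prod_{i=0}^{r-1} (k + 1 - i)/(1 + i).
\end{aligned}
\]

Then the assertion follows for all $n \in \mathbb{Z}_0^+$ by induction.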
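
The product formula of Theorem 16.7.4 (iii) can be compared with a library implementation of binomial coefficients. The Python sketch below is an illustrative check, not a proof: the name `comb_product` is an assumption of this example, exact `Fraction` arithmetic avoids rounding, and `math.comb` (available from Python 3.8) serves as the reference.

```python
from fractions import Fraction
from math import comb


def comb_product(n: int, r: int) -> Fraction:
    """Product of (n - i)/(1 + i) for i = 0, ..., r - 1 (empty product is 1)."""
    result = Fraction(1)
    for i in range(r):
        result *= Fraction(n - i, 1 + i)
    return result


# Includes cases r > n, where a zero factor (n - n) makes the product vanish.
formula_ok = all(
    comb_product(n, r) == comb(n, r) for n in range(8) for r in range(10)
)
```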


16.7.5 REMARK: Formula for the product of r sums of n real numbers. 

Theorem 16.7.6 states that the product of any r sums of n real numbers is equal to the sum of all products 
of selections of one element from each of the $r$ finite sequences of real numbers. Each multi-index $i \in \mathbb{N}_n^r$ in
line (16.7.1) is a selection $i = (i_1, \ldots, i_r)$ of one element $i_k$ from $\mathbb{N}_n$ for each $k \in \mathbb{N}_r$.


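16.7.6 THEOREM: Formula to swap the product and sum of a double family.
Let $r \in \mathbb{Z}^+$ and $n \in \mathbb{Z}_0^+$. Let $f : \mathbb{N}_r \times \mathbb{N}_n \to \mathbb{R}$. Then

\[
\prod_{k=1}^r \sum_{j=1}^n f(k,j) = \sum_{i \in \mathbb{N}_n^r} \prod_{k=1}^r f(k, i_k). \tag{16.7.1}
\]


PROOF: In the case $r = 1$, line (16.7.1) asserts that $\sum_{j=1}^n f(1,j) = \sum_{i \in \mathbb{N}_n^1} f(1, i_1)$, which is clearly true.
So assume that line (16.7.1) is true for some $r \in \mathbb{Z}^+$. Then

\[
\begin{aligned}
\prod_{k=1}^{r+1} \sum_{j=1}^n f(k,j) &= \Bigl(\sum_{j=1}^n f(r+1, j)\Bigr) \prod_{k=1}^r \sum_{j=1}^n f(k,j) \\
&= \Bigl(\sum_{i_{r+1} \in \mathbb{N}_n} f(r+1, i_{r+1})\Bigr) \sum_{i \in \mathbb{N}_n^r} \prod_{k=1}^r f(k, i_k) \\
&= \sum_{i_{r+1} \in \mathbb{N}_n} \Bigl(f(r+1, i_{r+1}) \sum_{i \in \mathbb{N}_n^r} \prod_{k=1}^r f(k, i_k)\Bigr) \\
&= \sum_{i_{r+1} \in \mathbb{N}_n} \sum_{i \in \mathbb{N}_n^r} f(r+1, i_{r+1}) \prod_{k=1}^r f(k, i_k) \\
&= \sum_{i \in \mathbb{N}_n^{r+1}} \prod_{k=1}^{r+1} f(k, i_k),
\end{aligned}
\]

which is the assertion of line (16.7.1) for the case $r + 1$. Hence the assertion follows for all $r \in \mathbb{Z}^+$ by
induction on $r$.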
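
Line (16.7.1) can be verified by brute force for small $r$ and $n$, since the multi-indices $i \in \mathbb{N}_n^r$ are exactly the $r$-tuples produced by a Cartesian product. The Python sketch below is an illustration with an arbitrarily chosen integer-valued $f$; the function names are assumptions of this example, and `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod


def product_of_sums(f, r: int, n: int):
    """Left side of (16.7.1): product over k of the sum over j of f(k, j)."""
    return prod(sum(f(k, j) for j in range(1, n + 1)) for k in range(1, r + 1))


def sum_of_products(f, r: int, n: int):
    """Right side of (16.7.1): sum over multi-indices i of the product of f(k, i_k)."""
    return sum(
        prod(f(k, i[k - 1]) for k in range(1, r + 1))
        for i in product(range(1, n + 1), repeat=r)
    )


def f(k: int, j: int) -> int:
    """Arbitrary sample double family used only for this check."""
    return k + 2 * j


sides_agree = product_of_sums(f, 3, 4) == sum_of_products(f, 3, 4)
```

The right side has $n^r$ terms, so this direct enumeration is practical only for small values.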


16.7.7 REMARK: The most extreme sum is not more extreme than the sum of the extremes. 
Theorem 16.7.8 (i, ii) is fairly obvious. It is a fact of everyday life that when variable quantities are averaged 
(or summed), the result is never more extreme than the average (or sum) of the individual extremes. This 
is applicable in particular to the proof of Theorem 43.9.5 (vi), related to Darboux integration. 


16.7.8 THEOREM: Bounds for supremum and infimum of finite sums of functions.
Let $n \in \mathbb{Z}^+$. Let $f_i : X \to \mathbb{R}$ be bounded functions on a non-empty set $X$ for $i \in \mathbb{N}_n$.
(i) $\sup\{\sum_{i=1}^n f_i(t);\ t \in X\} \le \sum_{i=1}^n \sup f_i$.
(ii) $\inf\{\sum_{i=1}^n f_i(t);\ t \in X\} \ge \sum_{i=1}^n \inf f_i$.

PROOF: For part (i), $\sum_{i=1}^n f_i$ is bounded because each $f_i$ is bounded. So $\sup\{\sum_{i=1}^n f_i(t);\ t \in X\}$ is
well defined. Let $t \in X$. Then $f_i(t) \le \sup f_i$ for all $i \in \mathbb{N}_n$. So $\sum_{i=1}^n f_i(t) \le \sum_{i=1}^n \sup f_i$. Hence
$\sup\{\sum_{i=1}^n f_i(t);\ t \in X\} \le \sum_{i=1}^n \sup f_i$.
Part (ii) may be proved as for part (i).


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13 




16.8. Complex numbers 


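16.8.1 DEFINITION: The complex number system is the tuple $(\mathbb{R} \times \mathbb{R}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}})$ where


(i) $\forall (x_1, y_1), (x_2, y_2) \in \mathbb{R} \times \mathbb{R},\ \sigma_{\mathbb{C}}((x_1, y_1), (x_2, y_2)) = (x_1 + x_2, y_1 + y_2)$,
(ii) $\forall (x_1, y_1), (x_2, y_2) \in \mathbb{R} \times \mathbb{R},\ \tau_{\mathbb{C}}((x_1, y_1), (x_2, y_2)) = (x_1 x_2 - y_1 y_2, x_1 y_2 + y_1 x_2)$.


16.8.2 NOTATION: $\mathbb{C}$ denotes both the complex number system $(\mathbb{R} \times \mathbb{R}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}})$ and the set $\mathbb{R} \times \mathbb{R}$ in the
context of the complex number system.

In other words, $\mathbb{C} \mathrel{-\!\!<} (\mathbb{C}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}) = (\mathbb{R} \times \mathbb{R}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}})$.

$i$ denotes the complex number $(0, 1)$.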


16.8.3 REMARK: Notation for imaginary numbers. 
The letter i is not universally used for the complex number (0, 1). In electronics engineering, the letter j is 
generally preferred so as to avoid confusion with the standard letter i or J denoting an electric current. 


16.8.4 REMARK: The real and imaginary part operators for complex numbers. 

The real and imaginary parts of a complex number z = (x,y) are defined to be the left and right elements x 
and y respectively of the ordered pair (x,y). The left and right element extraction operations are described 
in Definition 9.2.13 and Notation 9.2.14. 


16.8.5 DEFINITION: The real part of a complex number $z \in \mathbb{C}$ is the left element of $z$.

The imaginary part of a complex number $z \in \mathbb{C}$ is the right element of $z$.


16.8.6 NOTATION: $\mathrm{Re}(z)$, for a complex number $z$, denotes the real part of $z$.
$\mathrm{Im}(z)$, for a complex number $z$, denotes the imaginary part of $z$.
In other words, $\mathrm{Re}(z) = \mathrm{Left}(z)$ and $\mathrm{Im}(z) = \mathrm{Right}(z)$ for all $z \in \mathbb{C}$.


16.8.7 REMARK: The absolute value function for the complex numbers. 
In compliance with Remark 1.6.3 item (4), complex numbers are avoided wherever possible in this book. 
However, some basic definitions are inescapable if some basic facts about connections on differentiable fibre 
bundles with unitary structure groups are to be presented. In particular, the absolute value function in 
Definition 16.8.8 is required in order to describe the topology on the complex numbers in Definition 39.4.4. 
(See Notation 16.6.9 for the square root expression in Definition 16.8.8.) 


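16.8.8 DEFINITION: The absolute value (function) for complex numbers is the function $|\cdot| : \mathbb{C} \to \mathbb{R}_0^+$
defined by
\[
\forall z \in \mathbb{C}, \quad |z| = (\mathrm{Re}(z)^2 + \mathrm{Im}(z)^2)^{1/2}.
\]

In other words,
\[
\forall x, y \in \mathbb{R}, \quad |(x,y)| = (x^2 + y^2)^{1/2}.
\]

16.8.9 THEOREM: Some basic properties of the absolute value function for complex numbers.
(i) $\forall z_1, z_2 \in \mathbb{C},\ |z_1 z_2| = |z_1|\,|z_2|$.
(ii) $\forall z_1, z_2 \in \mathbb{C},\ |z_1 + z_2| \le |z_1| + |z_2|$.

PROOF: For part (i), let $z_1 = (x_1, y_1)$ and $z_2 = (x_2, y_2)$. Then

\[
\begin{aligned}
|z_1 z_2| &= |(x_1 x_2 - y_1 y_2,\ x_1 y_2 + y_1 x_2)| \\
&= ((x_1 x_2 - y_1 y_2)^2 + (x_1 y_2 + y_1 x_2)^2)^{1/2} \\
&= ((x_1^2 x_2^2 + y_1^2 y_2^2 - 2 x_1 x_2 y_1 y_2) + (x_1^2 y_2^2 + y_1^2 x_2^2 + 2 x_1 y_2 y_1 x_2))^{1/2} \\
&= (x_1^2 x_2^2 + y_1^2 y_2^2 + x_1^2 y_2^2 + y_1^2 x_2^2)^{1/2} \\
&= ((x_1^2 + y_1^2)(x_2^2 + y_2^2))^{1/2} \\
&= (x_1^2 + y_1^2)^{1/2} (x_2^2 + y_2^2)^{1/2} \\
&= |z_1|\,|z_2|.
\end{aligned}
\]

Part (ii) follows from Theorem 16.6.18 (iv) by letting $z_1 = (x_1, y_1)$ and $z_2 = (x_2, y_2)$.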
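
Definition 16.8.1 treats complex numbers as ordered pairs of real numbers, which is easy to mirror in code. The Python sketch below is an illustration only: the names `c_add`, `c_mul` and `c_abs` are assumptions standing in for $\sigma_{\mathbb{C}}$, $\tau_{\mathbb{C}}$ and the absolute value function, and floats stand in for real numbers. The sample values are chosen so that the square roots are exact, which lets Theorem 16.8.9 be spot-checked without rounding concerns.

```python
import math

Pair = tuple  # a complex number is represented as a pair (x, y) of reals


def c_add(z1: Pair, z2: Pair) -> Pair:
    """Addition of pairs: (x1 + x2, y1 + y2)."""
    return (z1[0] + z2[0], z1[1] + z2[1])


def c_mul(z1: Pair, z2: Pair) -> Pair:
    """Multiplication of pairs: (x1*x2 - y1*y2, x1*y2 + y1*x2)."""
    return (z1[0] * z2[0] - z1[1] * z2[1], z1[0] * z2[1] + z1[1] * z2[0])


def c_abs(z: Pair) -> float:
    """Absolute value (x^2 + y^2)^(1/2) as in Definition 16.8.8."""
    return math.sqrt(z[0] ** 2 + z[1] ** 2)


i = (0, 1)
z1, z2 = (3, 4), (5, 12)

i_squared = c_mul(i, i)                                           # the pair (-1, 0)
multiplicative = c_abs(c_mul(z1, z2)) == c_abs(z1) * c_abs(z2)    # Theorem 16.8.9 (i)
triangle = c_abs(c_add(z1, z2)) <= c_abs(z1) + c_abs(z2)          # Theorem 16.8.9 (ii)
```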
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16.8.10 REMARK: The inevitability of the complex numbers. 

There is a strong argument for the “reality” of complex numbers by considering the properties of Taylor
series. For example, consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f : x \mapsto (1 + x^2)^{-1}$. This function is
clearly analytic, but the Taylor series for $f$ at a point $x_0 \in \mathbb{R}$ has convergence radius $(1 + x_0^2)^{1/2}$, which is
coincidentally the distance from the point $(x_0, 0)$ to $(0, \pm 1)$ in the complex plane, and the two pairs $(0, \pm 1)$
happen to be the poles of the analytic extension of $f$ to the complex numbers. It also just happens that the
two complex numbers $(0, \pm 1)$ are the solutions of $z^2 + 1 = 0$. (See Figure 16.8.1.)


[Figure: radius of convergence of $(1 + x^2)^{-1}$ at points of the real line]


Figure 16.8.1 Convergence radii suggesting complex pole for real-analytic function
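The convergence radius claim can be explored using real arithmetic only. The sketch below is an illustration under stated assumptions, not part of the original argument. Writing $x = x_0 + h$ in $(1 + x^2) f(x) = 1$ gives the real recurrence $(1 + x_0^2) c_n + 2 x_0 c_{n-1} + c_{n-2} = 0$ for the Taylor coefficients $c_n$ of $f$ at $x_0$, and the radius is then estimated by a finite root test. The function names and the cut-off of 400 terms are arbitrary choices made for this example.

```python
import math

def taylor_coefficients(x0, count):
    """Taylor coefficients c_n of f(x) = 1/(1 + x^2) about x0, for n < count.

    Only real arithmetic is used: (1 + x0^2) c_n + 2 x0 c_{n-1} + c_{n-2} = 0.
    """
    a = 1.0 + x0 * x0
    c = [1.0 / a, -2.0 * x0 / (a * a)]
    for n in range(2, count):
        c.append(-(2.0 * x0 * c[n - 1] + c[n - 2]) / a)
    return c[:count]

def estimated_radius(x0, count=400):
    """Finite root-test estimate of the radius: 1 / max |c_n|^(1/n).

    The maximum is taken over the upper half of the computed range as a
    rough stand-in for the limit superior in the Cauchy-Hadamard formula.
    """
    c = taylor_coefficients(x0, count)
    largest = max(abs(c[n]) ** (1.0 / n) for n in range(count // 2, count))
    return 1.0 / largest
```

Comparing `estimated_radius(x0)` with `math.sqrt(1 + x0 * x0)` for a few values of `x0` shows the estimate tracking the distance to $(0, \pm 1)$, although no complex number enters the computation. The estimate is approximate because only finitely many coefficients are used.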


A similar observation may be made generally for algebraic real-analytic functions, which suggests that even 
if the complex numbers are ignored, they still have disruptive effects on the real line. This is reminiscent 
of the way in which geophysicists detect minerals a long distance under the ground using gravitometric, 
electromagnetic and other detection methods. The existence and nature of minerals deep underground 
may be inferred from evidence obtained entirely above ground. In the same way, the poles of real-analytic 
functions “deep in the complex plane” may be detected by their influence on the radius of convergence of
Taylor series for real-valued functions evaluated purely within the real line. 


This kind of argument only shows the “reality” of the complex numbers if one accepts the “reality” of power 
series. There is no reason in physics why power series should have a special significance, apart from the 
fact that addition and multiplication are programmed into calculators. It is very unusual for a physical 
system to have state-functions which are truly real-analytic except in mathematical models, because an 
analytic function is determined everywhere from any infinitesimal region. At the very microscopic level at 
which quantum theory is effective, probably the real functions are indeed real analytic. So one would expect 
complex numbers to be relevant in that context. The usefulness in quantum theory of complex numbers 
is therefore not surprising. (There seems to be no argument for the “reality” of quaternions similar to the 
argument for the reality of complex numbers.) 


A more accurate name for “imaginary numbers” would be “hidden numbers”. They are just as real as the
“real numbers”, but you don't see them immediately unless you know where to look.


Although the most obvious application of imaginary numbers is to quadratic equations, this does not compel
one to believe that imaginary numbers are valid. The idea that “$i$ is the square root of $-1$” may be regarded
as merely a clever way of saying that the square root of $-1$ does not exist. However, cubic polynomial
equations such as $x^3 = px + q$ for real $p$ and $q$ have a general formula which gives the correct answer for
$p > 3(|q|/2)^{2/3}$ only if one accepts $\sqrt{-1}$ as a valid number. (See Boyer/Merzbach [237], pages 258-259.)


When the rule is applied to $x^3 = 15x + 4$, for example, the result is $x = \sqrt[3]{2 + \sqrt{-121}} + \sqrt[3]{2 - \sqrt{-121}}$.
Cardan knew that there was no square root of a negative number, yet he knew $x = 4$ to be a root.
He was unable to understand how his rule could make sense in this situation. [...] Cardan referred
to these square roots of negative numbers as “sophistic” and concluded that his result in this case
was “as subtile as it is useless”.


Thus even in the 16th century, imaginary numbers were becoming difficult to ignore. 
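The example $x^3 = 15x + 4$ can be reproduced numerically. The following sketch assumes the standard form of the rule for $x^3 = px + q$, namely $x = \sqrt[3]{q/2 + \sqrt{(q/2)^2 - (p/3)^3}} + \sqrt[3]{q/2 - \sqrt{(q/2)^2 - (p/3)^3}}$, and uses principal complex roots. The function name `cardano_root` is hypothetical, and the function is restricted to the case $p > 3(|q|/2)^{2/3}$ where the quantity under the square root is negative.

```python
def cardano_root(p, q):
    """One real root of x^3 = p*x + q when (q/2)^2 - (p/3)^3 < 0.

    The intermediate values are complex, but the two cube roots are
    complex conjugates, so their sum is real up to rounding error.
    """
    disc = (q / 2) ** 2 - (p / 3) ** 3
    if disc >= 0:
        raise ValueError("this sketch only covers the case of negative discriminant")
    s = complex(disc) ** 0.5        # principal square root, purely imaginary here
    u = (q / 2 + s) ** (1 / 3)      # principal cube root
    v = (q / 2 - s) ** (1 / 3)      # principal cube root of the conjugate
    return u + v
```

For $p = 15$ and $q = 4$ the square root is $\sqrt{-121} = 11i$, the cube roots are $2 + i$ and $2 - i$ because $(2 + i)^3 = 2 + 11i$, and the sum is the root $x = 4$ which Cardan knew by other means.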
16.8.11 REMARK: The “magic properties” of the complex numbers.

At least one author has noted that the complex number system is somewhat “magical”. (See Penrose [297], 
pages 71-85.) The complex numbers seem to yield many surprising properties for free. However, this 
“magic” is a consequence of the holomorphic function constraint in Definition 42.8.7, which follows fairly 
automatically from the analytic function constraint in Definition 42.8.5, which extends real analytic functions 
to the complex plane as term-by-term holomorphic extensions. The “magic properties” then follow from the 
particular holomorphic form of the product rule in Definition 16.8.1 (ii), which was originally motivated by 
the desire to extend the algebraic theory of real polynomial equations. 


As in all magic, the conservative laws of physics are not disobeyed. Rabbits are neither created nor destroyed. 
Magic tricks are achieved by distracting the audience so that they do not see how the rabbit gets into the 
hat. In the case of complex numbers, the trick is to downplay the very strong constraints imposed by the 
complex product rule and the holomorphic function definition. Together, these lead to conjugate pairs of 
harmonic functions which satisfy the highly conservative Laplace equation. (See for example Ahlfors [45], 
pages 25-28; Shilov [135], page 381; Lang [109], pages 237-275.) 
Part II


Algebra 


The informal outlines at the beginning of Parts I, II, III and IV of this book are not intended to be read. 
However, these outlines could possibly be of some value as an adjunct to the table of contents and index.
The reader is recommended to skip immediately to Chapter 17.


Part II: Algebra


(1) Part II defines some algebraic systems which are required for differential geometry. Table 17.0.2 in 
Remark 17.0.3 is an overview of the structure of many of these algebraic systems. 


CHAPTER 17: Semigroups and groups 

(1) Chapter 17 defines single-set algebraic classes with a single operation, including semigroups, monoids
and groups.

(2) Section 17.1 introduces semigroups, which are the simplest algebraic structures that are applicable to
differential geometry.

(3) Monoids in Section 17.2 are semigroups which have an identity. They are not used in the differential
geometry in this book.

[...] subgroups, including concepts such as group morphisms, commutative groups, Archimedean order on
a group, left and right cosets, normal subgroups, quotient groups, direct products of groups, simple
groups, left and right conjugates of subsets, and conjugation maps.


CHAPTER 18: Rings and fields 
(1) Chapter 18 defines single-set algebraic classes with two operations [...]

[...] ordered rings, absolute value functions on rings, and ordered unitary rings. Concepts defined for rings
include subrings, ideals, zero divisors, cancellative rings, commutative rings, unital morphisms, division
rings, integral domains, ordered integral domains and integral systems. It turns out that an “integral
system” is always ordered-ring-isomorphic to the ordered ring of integers. In other words, it is a complete
axiomatic specification for the ordered ring of integers.


(3) Sections 18.7, 18.8 and 18.9 introduce fields, ordered fields and complete ordered fields respectively. A 
field is essentially the same thing as a commutative division ring. Most of the definitions for fields are 
directly inherited from rings. It turns out that a complete ordered field is always isomorphic to the 
ordered field of real numbers. Thus the axioms of a complete ordered field are a complete specification 
of the ordered field of real numbers. 


(4) Section 18.10 extends the list spaces in Section 14.12 by using the algebraic structure on the space of 
objects that the list elements are chosen from. 


(5) Section 18.11 defines some “algebraic classes of sets” which are required for measure theory. These
classes have only a superficial resemblance to genuine algebraic structures. The classes are given names
such as semi-rings, rings, algebras and fields of sets. As shown in Table 18.11.2 in Remark 18.11.18,
there is much inconsistency in the naming, apparently because the analogy with authentic algebra is
weak. Section 18.12 extends these classes of sets to σ-rings, σ-algebras and σ-fields, which are defined
in terms of closure under countably infinite union and intersection operations.


CHAPTER 19: Modules and algebras


(1) Chapter 19 is concerned with two-set algebraic structures, which have two sets and at least one operation
by one set acting on the other. Roughly speaking, modules are passive commutative groups which are
acted on by some other algebraic structure such as a semigroup, group or ring, whereas an algebra has
two sets which both have addition and multiplication operations.

(2) [...] and rings.

(3) The most useful kind of module is a unitary left module over a commutative division ring, which is
also known as a “linear space” (or “vector space”). These are so important that they are dealt with
separately in Chapters 22-24. (In fact, the multilinear and tensor spaces in Chapters 27-30 are also
linear spaces. So they are also unitary left modules over commutative division rings. A commutative
division ring is also known as a “field”.)

(4) Most of the definitions and properties of modules are familiar from higher-level algebraic classes such
as linear spaces. It is an interesting exercise to determine how little structure is required to define a
concept or prove a property. The minimalist norms and inner products in Sections 19.5, 19.6 and 19.7
are defined in this spirit for modules over rings, leading to some properties in Theorems 19.7.4 and
19.7.6 which resemble linear space properties. In Section 19.8, some slightly weaker-than-usual
Cauchy-Schwarz and triangle inequalities are proved for modules over ordered rings, and in Theorem 19.8.9, the
full Cauchy-Schwarz inequality is shown for Archimedean ordered rings.

(5) Section 19.9 defines associative algebras, which are not much used in this book.

(6) Section 19.10 defines Lie algebras, which are closely associated with Lie groups, and are thereby of great
significance for general connections on differentiable fibre bundles, and for affine connections on tangent
bundles in particular.


CHAPTER 20: Transformation groups


(1) Chapter 20 introduces transformation groups, which are more reality-oriented than the abstract groups
in Section 17.3. One could say that all groups are groups of transformations, although often the same
group acts on multiple passive spaces simultaneously, which justifies abstracting groups as an individual
algebraic structure. (For example, the linear group acting on the tangent space at a point simultaneously
transforms all of the associated tensor spaces at that point.) Transformation groups are the algebraic
foundation for the Lie transformation groups in Section 63.4. Parallelism on general fibre bundles
is defined in terms of structure groups, which are groups of transformations of a fibre space. Hence
transformation groups are fundamental to the definition of parallelism, from which curvature is defined.

(2) Transformation groups are introduced in Definition 20.1.2.

(3) In Sections 20.2, 20.3 and 20.4, transformation groups are classified as effective, free or transitive.
“Effective” means that group elements are uniquely determined by their action on the passive set,
which implies that the group really is a group of transformations. In other words, each group element
is its action on the passive set. “Free” means that only the action of the identity element of the group has fixed
points, which is clearly not true for the group of rotations of a two-sphere for example. “Transitive”
means that for every pair of points in the passive set, there is a group element which transforms one to
the other. Thus the group of rotations of a two-sphere is transitive but the group of rotations of 3-space
is not. A non-trivial group acting on itself is always effective, transitive and free.

(4) Section 20.5 introduces orbits and stabilisers of transformation groups. The orbits of a transformation
group are trivial if and only if it is transitive. The stabilisers are trivial if and only if it acts freely.
Therefore a non-trivial group acting on itself has both trivial orbits and trivial stabilisers.

(5) Section 20.7 gives the definitions for right transformation groups which correspond to the more usual
style of left transformation group. In principle, this is redundant. However, it is easy to get confused
when a group simultaneously acts both on the left and the right on different passive sets.

(6) Section 20.9 is about figures and invariants of transformation groups. This topic is often presented
informally as and where required, and the presentation here is also quite informal. The basic idea
here is that the action of a group on the set of points of a given passive set X is closely associated
with its action on sets of “figures” in the set X. For example, the set of rotations of the points in a
Euclidean space is closely associated with the set of rotations of the lines in that space, or the triangles
or circles, or the real-valued functions on that space, or the set of functionals of those real-valued
functions. Remark 20.9.2 describes six kinds of “figure spaces”, which may be applied recursively in any
combination to produce a huge network of associated transformation groups, all using the same group.
A particular application of these concepts is to the definition of associated connections on tensor bundles
and general fibre bundles for a given connection on the tangent bundle of a differentiable manifold.

(7) Section 20.10 introduces “baseless figure/frame bundles”. These are implicit in the fibre bundle
literature, but are presented here explicitly in Definition 20.10.8. In the context of fibre bundles, every
ordinary fibre bundle may be paired with an associated principal fibre bundle. The OFB represents
observations of objects, whereas the associated PFB represents reference frames for the observations.
Whenever the reference frame is adjusted, the observations are adjusted also, but the underlying
object is unchanged. The intention in Section 20.10 is to clarify the relations between observations and
reference frames by removing the underlying base-point space from consideration. The result is the
“baseless figure/frame bundle” in Definition 20.10.8. Study of such a minimal associated OFB and PFB
is intended to clarify some of the mysteries of fibre bundles, which are the underlying substrate for all
parallelism, which is the underlying concept for all curvature.

(8) Section 20.11 presents associated figure bundles. These are like associated OFBs, not like the OFB/PFB
associations in Section 20.10. Examples of associated OFBs are the tensor bundles on a differentiable
manifold. When the coordinate chart for a differentiable manifold is changed, all of the different types
of tensor bundles are changed in an associated way. Section 20.11 in effect considers associated OFBs
in the absence of a base-point manifold.

(9) Section 20.12 presents construction methods for associated figure bundles.
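The effective, free and transitive conditions described in the outline of Sections 20.2, 20.3 and 20.4 can be illustrated on small finite examples. The following sketch is an informal illustration, not the formalism of Chapter 20. The function names, the argument `act` (a function giving the action of a group element on a point) and the three example actions are all assumptions made for this example.

```python
from itertools import permutations

def is_effective(group, points, act, identity):
    """Only the identity acts as the identity map on the passive set."""
    return all(g == identity or any(act(g, x) != x for x in points)
               for g in group)

def is_free(group, points, act, identity):
    """Only the identity has an action with at least one fixed point."""
    return all(g == identity or all(act(g, x) != x for x in points)
               for g in group)

def is_transitive(group, points, act):
    """Every point is mapped to every other point by some group element."""
    return all(any(act(g, x) == y for g in group)
               for x in points for y in points)

# Example 1: the cyclic group Z_3 acting on itself by addition modulo 3.
z3 = list(range(3))
add_mod3 = lambda g, x: (g + x) % 3

# Example 2: all permutations of {0, 1, 2}, acting by g : x -> g[x].
s3 = list(permutations(range(3)))
permute = lambda g, x: g[x]

# Example 3: Z_6 acting on {0, 1, 2} by x -> (g + x) mod 3.
# Here g = 3 acts trivially, so the action is not effective.
z6 = list(range(6))
```

In these examples, the group acting on itself is effective, free and transitive, the permutation action is effective and transitive but not free (a transposition fixes a point), and the third action is transitive but neither effective nor free.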


CHAPTER 21: Non-topological fibre bundles 


(1) Chapter 21 presents non-topological fibre bundles, and Chapter 47 presents topological fibre bundles.
Chapter 54 presents tangent bundles. Chapters 64-66 present differentiable fibre bundles. (Sections
20.10 and 20.11 present baseless figure/frame bundles.) Fibre bundles provide a unifying framework for
differential geometry, while also extending the concept of parallelism.

(2) Section 21.1 introduces non-uniform non-topological fibrations, which are the same as uniform
topological fibrations except that they do not necessarily have uniform fibre sets.

(3) Section 21.2 introduces (uniform) non-topological fibrations, which are the same as topological fibrations
except that they have no topology. Presenting fibrations without topology first gives some clues as to
why the topology is needed. A fibration is the same as a fibre bundle except that it lacks a fibre atlas.

(4) Section 21.3 introduces cross-sections on non-topological fibrations.

(5) Section 21.4 introduces “cross-section short-cuts” for “form-style” non-topological fibrations.

(6) Sections 21.5, 21.6 and 21.7 introduce “fibre charts” and “fibre atlases” for non-topological fibrations.
A fibre chart is intended to indicate how the fibration is structured relative to a fixed “fibre space”.
This is less meaningful without topology, but several low-level technical aspects of fibration structure
can be more easily understood in the absence of topology.

(7) Section 21.8 defines non-topological ordinary fibre bundles to be non-topological fibrations which have
a fibre atlas for a specified structure group. This does place a real constraint on the fibre bundle chart
transition maps.

(8) Section 21.9 defines non-topological principal fibre bundles, which are like non-topological ordinary fibre
bundles except that the fibre space is the same as the structure group. Non-topological PFBs may be
associated with non-topological OFBs in much the same way as in the topological and differentiable
fibre bundle cases.

(9) Section 21.10 defines identity cross-sections and identity charts on principal bundles. These are useful
for applications to gauge theory.

(10) Section 21.11 defines the right action map and right transformation group of a non-topological principal
bundle, including construction of a principal bundle from a given free right transformation group.

(11) Section 21.12 introduces associated non-topological fibre bundles.

(12) Section 21.13 defines patchwork associated non-topological fibre bundles.

(13) Section 21.14 defines orbit-space associated non-topological fibre bundles.

(14) Section 21.15 introduces the idea of parallelism on a non-topological fibration. In the absence of a
topology, continuous curves are not defined in the base space. So definitions of pathwise parallelism are
almost meaningless.

(15) Section 21.16 introduces parallelism for non-topological fibre bundles. This differs from parallelism on
fibrations in that the structure group now plays a role.


CHAPTER 22: Linear spaces 


(1) The very substantial subject of linear algebra has been arbitrarily split into linear spaces in Chapter 22,
linear maps and dual spaces in Chapter 23, and linear space constructions in Chapter 24 so as to limit
the lengths of chapters.

(2) The definitions for linear spaces in Chapter 22 are essentially the same as in any textbook on linear
algebra. A linear space is defined in Definition 22.1.1 in the standard way. (A linear space is also known
rather inconveniently as a “unitary left module over a commutative division ring”.) As much as possible,
the fields for linear spaces are assumed to be general, not necessarily the real numbers for example, and
no assumptions are made about the dimensionality.

(3) Section 22.2 defines the very general classes of “free linear spaces” and “unrestricted linear spaces”. The
free linear space on a set S consists of all functions on S which are zero outside a finite subset of S.
The unrestricted linear space on S consists of all functions on S, without any restrictions. (The values
are assumed to lie in a specified field.)

(4) Section 22.3 introduces linear combinations in the standard way, but with some notational innovation.

(5) Section 22.4 introduces the linear span of any subset of a linear space. Theorem 22.4.7 gives some useful
properties of the linear span.

(6) Section 22.5 defines the dimension of a linear space as the least cardinality of its spanning sets. This is
slightly non-standard. It is more usual to define the dimension to be the cardinality of any basis. This
does not work so well if the axiom of choice is rejected, because then the existence of a basis is not
guaranteed. Since the usual definition of cardinality in terms of equinumerosity to ordinal numbers is
extremely clumsy and problematic and impractical (except for finite and countably infinite cardinality),
Definition 22.5.2 uses the much better defined “beta-cardinality” in Section 13.4. This avoids both
mediate cardinals and the mysticism of general ordinal numbers. Thus for example, $\omega$ is the dimension
of the linear space of real-valued functions on the integers, and $2^\omega$ is the dimension of the linear space
of real-valued functions on the real numbers.

(7) Section 22.6 introduces linear independence of sets of vectors. According to Definition 22.6.3, a set
of vectors is linearly independent when none of them lies in the span of the others. Theorem 22.6.8
converts this definition to equivalent conditions which are better suited to proofs.

(8) Section 22.7 introduces the concept of a basis of a linear space as a linearly independent subset which
spans the whole space. This is uncontroversial, but it is not assumed here that every linear space has a
basis. One of the most fundamental facts about finite-dimensional spaces is that every basis contains the
same number of vectors. This is shown in Theorem 22.7.16, which relies for its proof on Theorem 22.7.14,
which uses “elementary row operations” on bases to show that any linearly independent subset of the
span of a finite set S of linearly independent vectors is not more numerous than S.

(9) In Remark 22.7.22, it is proposed that the issue of the existence (or non-existence) of a basis for a
linear space should be “managed” by simply assuming such existence whenever it is required. In terms
of the anthropological “facts on the ground”, this is what mathematicians really do, although they
profess undying faith in the axiom of choice and all its works. Whenever an explicit basis is needed,
like the sixpence provided by the tooth fairy while the child is asleep, it must be provided by an adult.
No explicit basis was ever provided by any axiom of choice, despite what anyone might believe at a
metaphysical level. Furthermore, it is proposed that theorems which need a well-ordered basis should
assume that also. (For example, the free linear space of real-valued functions on $\mathbb{R}$ which equal zero
outside a finite set has a well-known explicit basis, but no well-ordered basis because $\mathbb{R}$ cannot be
well-ordered.)

(10) Section 22.8 is about the components of vectors with respect to a basis. The “component map” of
a given basis is a bijection between the vectors in a linear space and their components with respect
to the basis. A component map thus provides a “numericisation” of a linear space in terms of the
much-maligned “coordinates”.

(11) Section 22.9 is concerned with the formulas for the changes in components of vectors when a basis
is changed. Special difficulties arise if the linear space is infinite-dimensional. Changes of basis, and
the accompanying changes of vector components, are the fundamental idea underlying ordinary and
principal fibre bundles. The basis corresponds to a PFB, while the components correspond to an OFB.
The basic algebra of basis changes thus underlies most of the concepts of differential geometry. There
is in effect a kind of “dual action” or “contragredient action” of the group of basis changes on the
components of vectors.

(12) Section 22.10 defines Minkowski sums and scalar products of sets.

(13) Section 22.11 gives some basic definitions of convex combinations of vectors and convex subsets of a
linear space.
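The “contragredient action” of basis changes on components, mentioned in the outline of Section 22.9, can be seen in a two-dimensional example over the rationals. This is a minimal sketch under stated assumptions, not the notation of Chapter 22. It is assumed that column $j$ of the matrix `A` lists the old-basis components of the $j$-th new basis vector, so that components transform by the inverse matrix. The names `mat_vec`, `inverse_2x2`, `A`, `old_components`, `new_components` and `reassembled` are hypothetical.

```python
from fractions import Fraction as F

def mat_vec(A, v):
    """Matrix-vector product for a list-of-rows matrix A."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix with exact rational entries."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: the new vectors are not a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

# Column j of A holds the old-basis components of the j-th new basis vector.
A = [[F(2), F(1)],
     [F(1), F(1)]]

# Components of a fixed vector with respect to the old basis.
old_components = [F(3), F(5)]

# The basis changes by A, so the components change by the inverse of A.
new_components = mat_vec(inverse_2x2(A), old_components)

# Reassembling the vector from the new basis and new components
# recovers the original old-basis components, so the vector is unchanged.
reassembled = mat_vec(A, new_components)
```

Here the new components are $(-2, 7)$, and $-2 \cdot (2,1) + 7 \cdot (1,1) = (3,5)$, so the vector itself is unchanged although the basis and the components have both changed.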


CHAPTER 23: Linear maps and dual spaces 


(1) Chapter 23 is concerned with linear maps, linear functionals, and dual linear spaces.

(2) Section 23.1 defines linear maps, and the kernel and nullity of linear maps. Section 23.2 defines the
component map of a linear map with respect to bases on the source and target spaces.

(3) Section 23.3 shows that the trace of a linear space endomorphism is basis-independent.

(4) Section 23.4 introduces linear functionals, which are linear maps from a linear space to its own field.
Section 23.5 considers the existence of non-zero linear functionals. Section 23.6 defines the dual of a
linear space V to be the linear space of linear functionals on V with pointwise vector addition and scalar
multiplication operations. Section 23.7 discusses the thorny topic of bases for dual spaces. Sections 23.8
and 23.9 are concerned with the much simpler subject of dual spaces for finite-dimensional linear spaces.
Section 23.10 is about the “double dual” of a linear space, i.e. the dual of the dual of a linear space,
which is quite simple for finite-dimensional spaces.

(5) Section 23.11 is about the “linear space transpose map” construction, which defines a reverse map
$\phi^T : W^* \to V^*$ between dual spaces $W^*$ and $V^*$ from a given linear map $\phi : V \to W$. (The properties
of this transpose map are useful for proving isomorphisms between some kinds of tensor spaces.)

(6) Section 23.12 defines convex and concave real-valued functions on convex subsets of finite-dimensional
real linear spaces.


CHAPTER 24: Linear space constructions 


(1) Chapter 24 is concerned with direct sums of linear spaces (which are really direct products), quotients
of linear spaces, seminorms, norms and inner products.

(2) Section 24.1 defines “direct sums” of linear spaces, which should be called “direct products” to be
consistent with other categories such as groups and manifolds. However, the dimension of a direct sum
of linear spaces equals the sum of their dimensions, which justifies the name. (By contrast, the dimension
of a tensor product of linear spaces equals the product of the dimensions, as shown in Theorem 28.1.22.)

(3) Section 24.2 defines quotients of linear spaces in terms of cosets, which are linear translates of a given
subspace of a linear space. These translates are the elements of the quotient space. Quotient spaces
arise naturally from linear maps, where the subspace in question is the kernel of the map. This leads
to the important isomorphism theorem, stated here as Theorem 24.2.16, which leads directly to
Theorem 24.2.18, the rank-nullity theorem.

(4) Section 24.3 defines natural isomorphisms for duals of subspaces of linear spaces.

(5) Section 24.4 is about affine transformation groups.

(6) Section 24.5 defines exact sequences of linear maps.

(7) Section 24.6 defines seminorms on linear spaces.

(8) Section 24.7 defines norms on linear spaces. A metric may be defined in terms of a norm. The most
important example of a norm is the Cartesian space root-mean-square norm, called the “Euclidean
norm”, which provides the metric space structure for Euclidean space.

(9) Section 24.8 presents various bounds for seminorms and norms on linear spaces.

(10) Section 24.9 defines inner products on linear spaces, which are closely related to Euclidean space.

(11) Section 24.10 defines hyperbolic and Minkowskian inner products on linear spaces, which are closely
related to Minkowski and pseudo-Riemannian spaces.

(12) Section 24.11 defines non-topological vector bundles.
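The rank-nullity theorem mentioned in the outline of Section 24.2 can be checked by brute force for a linear map between finite linear spaces. The sketch below is an illustration only. It assumes the two-element field GF(2), chosen so that the kernel and image can be enumerated completely, and it uses the fact that a subspace of dimension $d$ over GF(2) has exactly $2^d$ elements. The names `apply_matrix`, `rank_and_nullity` and `M` are hypothetical.

```python
from itertools import product

def apply_matrix(M, v):
    """Apply a matrix M (list of rows) to a vector v, with arithmetic modulo 2."""
    return tuple(sum(m * x for m, x in zip(row, v)) % 2 for row in M)

def rank_and_nullity(M, n):
    """Dimensions of the image and kernel of M : GF(2)^n -> GF(2)^k, by enumeration."""
    vectors = list(product((0, 1), repeat=n))
    image = {apply_matrix(M, v) for v in vectors}
    kernel = [v for v in vectors if not any(apply_matrix(M, v))]
    # A subspace with 2**d elements has dimension d = bit_length - 1.
    rank = len(image).bit_length() - 1
    nullity = len(kernel).bit_length() - 1
    return rank, nullity

# The third row is the sum of the first two rows modulo 2, so the rank is 2.
M = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 1]]
```

For this matrix the image has 4 elements and the kernel has 4 elements, giving rank 2 and nullity 2, which sum to the dimension 4 of the source space.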


CHAPTER 25: Matrix algebra 


(1) Chapter 25 is about matrix algebra, but there is more than one kind of matrix algebra according to
what the matrices signify. The range of meanings of matrices, according to their application contexts,
is discussed in Section 25.1. The most important contexts for differential geometry are linear maps and
bilinear forms. The operations which are meaningful for these two contexts, and the meanings of those
operations in the two contexts, are different. All of the matrices in Chapter 25 are finite.

(2) Section 25.2 introduces general rectangular matrices and defines elements, rows and columns of matrices.

(3) Section 25.3 defines matrix addition, multiplication of matrices by scalars (i.e. field elements), and
multiplication of matrices by matrices (called simply “matrix multiplication”). Matrix addition and
scalar multiplication operations, which are defined as element-wise addition and scalar multiplication,
are meaningful in many application contexts, but matrix multiplication is fairly specific to the linear
map context.

(4) Section 25.4 introduces identity matrices and left and right inverses with respect to matrix multiplication,
and some of the basic properties of these concepts.

(5) Section 25.5 introduces row span, column span, row rank and column rank of matrices. These concepts
are important for analysis of the invertibility of matrices. They are related to the row and column null
spaces and nullity which are defined in Section 25.6.

(6) Section 25.7 defines the component matrices for linear maps with respect to given bases. This application
context justifies the form of the definition of matrix multiplication.

(7) Section 25.8 narrows the focus to square matrix algebra. This makes matrix inverses more meaningful.

(8) Section 25.9 is about the trace of a matrix. Section 25.10 is about the determinant of a matrix.

(9) Section 25.11 concerns the “upper modulus” and “lower modulus” of a square matrix, which are
particularly relevant to the bilinear form context for matrices, particularly in regard to second-order differential
operators for real-valued functions on manifolds.

(10) Section 25.12 concerns “upper bounds” and “lower bounds” for real square matrices. These have a
particular importance for the proof of Theorem 41.10.4, the inverse function theorem.

(11) Section 25.13 is about real symmetric matrices. Section 25.14 is about classical matrix groups.

(12) Section 25.15 is concerned with “multidimensional matrices” or “higher-degree arrays”.


CHAPTER 26: Affine spaces 


(1) 


Chapter 26 is mostly of low relevance for the differential geometry in this book. Apart from Sections 
26.12, 26.13, 26.14, 26.15 and 26.16 on “tangent-line bundles" and “tangent velocity bundles", which 
are used to define tangent bundles on differentiable manifolds in Chapter 54, most of Chapter 26 can 
be safely ignored. (Tangent-line bundles are not related to “line bundles".) 


Section 26.1 discusses the many meanings of the word "affine". 


Section 26.2 defines a very general kind of affine space over a group. Section 26.3 defines “tangent-line 
bundles" on affine spaces over groups, which are minimalist precursors to the more useful tangent-line 
bundles on Cartesian spaces in Sections 26.13 and 26.14. 


Section 26.4 is about affine spaces over modules over sets. Section 26.5 is about tangent-line bundles on 
affine spaces over modules over sets. These are minimalist, but not as minimalist as in Section 26.3. 


Section 26.6 is about affine spaces over modules over groups and rings. Section 26.7 is about affine 
spaces over unitary modules over ordered rings. These are progressively less and less minimalist, but 
still not very useful. 

Section 26.8 is about tangent velocity spaces and tangent velocity bundles on affine spaces over modules 


over sets. These are a minimalist version of the tangent spaces and tangent bundles on Cartesian spaces 
in Sections 26.13 and 26.14, which are actually useful. 
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Section 26.9 is about parametrised line segments and hyperplane segments, which don’t seem to be 
really useful for anything. (It may be removed.) 


Section 26.10 is about affine spaces over linear spaces. These are the standard affine spaces which are 
useful for demonstrating how to remove the arbitrary origin from linear spaces. However, it is easier to 
manage the arbitrariness of the origin of a linear space than to manage the definitions of affine spaces. 
So they have more philosophical than practical interest. 


Section 26.11 is a discussion of the many different meanings for the term “Euclidean space” in the 
mathematics literature. It is proposed that the word “Euclidean” should be reserved for a space which 
has the standard Euclidean distance function, whereas the word “Cartesian” should be used for the 
coordinate n-tuple spaces $\mathbb{R}^n$ for integers n.


Sections 26.12, 26.13 and 26.14 are about tangent-line bundles on Cartesian spaces. These are used as 
the fundamental representation for tangent bundles on differentiable manifolds in Chapter 54. Therefore 
these tangent-line bundles are incorporated into all of the differential geometry in this book. It took 
the author about two decades to finally conclude that the best representation for tangent vectors is a 
parametrised line. These are defined in Sections 26.13 and 26.14 for Cartesian spaces, and are then 
incorporated into tangent vectors in Definition 54.1.2. 


Section 26.15 defines direct products of tangent spaces of Cartesian spaces. 


Section 26.16 is about tangent velocity bundles on Cartesian spaces. These are roughly equivalent to 
the tangent-line bundles in Sections 26.13 and 26.14, but a tangent line vector is a linear function L
which looks like $L : t \mapsto p + tv$ for a point $p \in \mathbb{R}^n$ and a velocity $v \in \mathbb{R}^n$, whereas a tangent velocity
vector is just the ordered pair (p,v). This is useful for computations, and is incorporated into tangent 
velocity vectors on differentiable manifolds in Definition 54.10.4. 


Section 26.17 defines tangent covector bundles on Cartesian spaces, and Section 26.18 defines tangent 
field covector bundles on Cartesian spaces, but despite their initial plausibility as candidates for the 
representation of tangent covectors on differentiable manifolds, they are rejected for this purpose in 
Sections 55.1 and 55.2. 


Section 26.19 is a philosophical discussion of ancient Greek precedents for the representation of tangent 
vectors as parametrised lines. 


Section 26.20 on line-to-line transformation groups has some philosophical interest for indicating ways 
of generalising the tangent vector concept, but it may be removed. 


CHAPTER 27: Multilinear algebra 


The substantial subject of tensor algebra has been split into multilinear algebra in Chapter 27, tensor
spaces in Chapter 28, tensor space constructions in Chapter 29, and multilinear map spaces with
symmetries in Chapter 30, so as to limit the lengths of chapters. Tensor analysis in Chapter 46 is tensor
algebra plus analysis, which includes differential forms and the exterior derivative. Tensor calculus is
any kind of differential geometry which is expressed principally in terms of coordinates. (Apparently
Einstein chose the name “tensor calculus” in 1913 in preference to the earlier name, “absolute
differential calculus”.) As emphasised in Remark 27.0.3, tensors are unpleasant. The more precisely they are
described, the more unpleasant they become. Consequently Chapters 27-30 are more unpleasant than
a typical treatment. One has the choice between treating tensors superficially, which is confusing, or
treating them in depth, which is painful. The painful path has been chosen here, probably more painful
than most treatments.


Chapter 27 makes much more sense if it is known in advance that tensor spaces are the duals of linear 
spaces of multilinear functions. Thus most concepts are defined twice, once for multilinear functions, 
and once for tensors in Chapter 28, because they are duals of each other. 

Section 27.1 makes some general observations about multilinear algebra and tensor algebra. 

Section 27.2 introduces “multilinear maps”, which are a generalisation of the linear maps in Section 23.1.

Section 27.3 gives some minor additional notes on scalar-valued multilinear maps, which are called
“multilinear functions”. These may be regarded as a generalisation of the linear functionals in Section 23.4.

Section 27.4 defines “simple multilinear functions” which are products of linear functions. These are
exactly analogous to the simple tensors in Section 28.4. The sum of any two simple multilinear functions
is a multilinear function. The purpose of this definition becomes clearer when the corresponding simple
tensors are introduced.

Section 27.5 expresses multilinear maps in terms of components with respect to a basis for each of the 
“input spaces". Then multilinear maps may be reconstructed from components in the same way that 
linear maps are. 

Section 27.6 introduces linear spaces of multilinear maps, which are analogous to linear spaces of linear 
maps. It is shown in Theorem 27.6.14 that the dimension of a linear space of multilinear functions is 
equal to the product of the dimensions of the individual “input spaces". 


Section 27.7 presents linear spaces of multilinear maps for dual spaces. 


Section 27.8 presents the “universal factorisation property” for multilinear functions in Theorem 27.8.7.
This property is often presented for tensors in textbooks, but not so often for multilinear functions, which 
are the duals of tensors. A “universal factorisation property” says, in essence, that any multilinear map 
from a direct product of linear spaces to a given linear space U can be factorised into a fixed “canonical 
map” from the direct product to the space of multilinear functions and a linear function from that space 
to U. In other words, the multilinear function space contains exactly the “right amount of information” 
to describe the multilinear product quality of the input spaces. This makes more sense in Section 28.3 
where it is shown that the tensor product of linear spaces contains the “right amount of information”
for a tensor product. 


CHAPTER 28: Tensor spaces 


Section 28.1 introduces a tensor (product) space in Definition 28.1.2 as the dual of the linear space of 
multilinear functions on a given family of linear spaces. A tensor is defined to be any element of such 
a space. In other words, a tensor is a linear functional on a linear space of multilinear functions. It is 
shown in Theorem 28.1.22 that the dimension of a tensor product of spaces equals the product of the 
dimensions of the spaces. 


Sections 28.2 and 28.3 present the “canonical tensor map” and the “universal factorisation property” 
for tensor products. These are the dual versions of the corresponding map and property for multilinear 
functions in Definition 27.4.13 and Section 27.8 respectively. This kind of “universal property” has 
greater significance in category theory than it has in differential geometry. (It is presented here because 
some books on tensor algebra define tensor products up to isomorphism by this property, which is 
intended to make tensor algebra seem simpler than it really is, although such an approach suffers from 
excessive abstraction. This approach is outlined in Section 28.6.) 


Section 28.4 introduces “simple tensors”, which are often used in elementary introductions to give 
some intuitive basis to tensor algebra. Commencing with simple tensors typically leads to substantial 
confusion later. Therefore simple tensors are introduced very late in this presentation. Combining 
simple tensors to make general tensors is often achieved by defining a tensor space to be the quotient 
of the free linear space over all simple tensors with respect to a suitable equivalence relation, which is 
perhaps the least satisfactory way to define tensors. (This approach is outlined in Section 28.7.) 


Section 28.5 is a discussion of various philosophical aspects of tensor algebra, including a survey in
Table 28.5.1 of the various ways of representing unmixed covariant and contravariant tensors. (Here
“covariant tensor” means a multilinear function, “contravariant tensor” means a tensor, and “unmixed” 
means that covariance and contravariance are not combined to construct a “mixed tensor”.) 


Section 28.6 sketches the alternative approach to tensors where a tensor space is metadefined in terms 
of a “universal factorisation property”. 

Section 28.7 sketches the alternative approach to tensors where a tensor space is defined to be the 
quotient space of the free linear space over simple tensors with respect to a suitable equivalence relation. 


CHAPTER 29: Tensor space constructions 


Chapters 29 and 30 are a continuation of Chapters 27-28. The basic definitions of the linear spaces 
of multilinear maps, multilinear functions and tensors are applied to construct further dual spaces, 
isomorphisms between many different kinds of tensor-related dual and double-dual spaces, mixed spaces 
of various kinds of tensors and multilinear functions, and spaces which are constrained to have various 
symmetry or antisymmetry properties. 


Section 29.1 describes various kinds of duals and double-duals of multilinear function and tensor spaces,
and the relations between them. 


Section 29.2 presents numerous isomorphisms between duals and double-duals of unmixed multilinear 
function and tensor spaces. These are “unmixed” in the sense of not mixing primal and dual “input” 
spaces. However, the input spaces may be any mixture of finite-dimensional linear spaces. (Of course, 
some of those input spaces could be dual spaces, but that is not explicitly taken into account in
Section 29.2.) 


Section 29.3 introduces mixed covariant and contravariant tensor product spaces with heterogeneous 
“input spaces”. This is a very general class of constructions where the sequence of “inputs” to a tensor 
product construction consists of two lists, which are a list of primal linear spaces, which may all be 
different, and a list of duals of linear spaces, which may all be different. 


Section 29.4 is concerned with components with respect to linear space bases for mixed covariant and 
contravariant tensor spaces with heterogeneous input spaces. Since these tensor spaces are very general, 
the management of tensor components is very untidy because so many levels of indices are required. 


Section 29.5 constrains the “input spaces” for mixed covariant and contravariant tensor spaces to all be 
the same. These are the kinds of tensor spaces which are applicable to the vast majority of contexts in 
differential geometry. 


Section 29.6 is concerned with components with respect to a linear space basis for mixed covariant and 
contravariant tensor spaces with a single input space. The management of tensor components for such 
tensors is much easier than for heterogeneous input spaces, but it is still onerous. These components 
are the famous “coordinates” of tensor calculus. 


Section 29.7 presents tensor contraction operations, which are well known from old-style tensor calculus, 
and tensor juxtaposition products, which are formed by a very naive kind of multiplication. 


CHAPTER 30: Multilinear map and tensor symmetries 


Section 30.1 defines spaces of symmetric and antisymmetric multilinear maps. In most situations in
differential geometry, general tensors without symmetry constraints are not in fact required. The majority
of situations require either symmetric tensors or antisymmetric tensors. 


Sections 30.2 and 30.3 define bases and coordinate maps for general, antisymmetric and symmetric 
multilinear maps. 


Section 30.4 defines antisymmetric tensor spaces. A subtlety here is that so-called antisymmetric tensors 
are in fact elements of the dual space of the space of antisymmetric multilinear functions. (This subtlety 
is lost in tensor calculus expressed in terms of coordinates.) 


Section 30.5 is concerned with symmetric tensor spaces. (This section may be removed.) 


Section 30.6 builds tensor bundles on Cartesian spaces from the tangent bundles on Cartesian spaces 
in Sections 26.13 and 26.14. These are not really needed for defining tensor bundles on differentiable 
manifolds because the approach taken in Sections 55.4 and 56.1 is to define tensors on differentiable 
manifolds in terms of the tangent spaces on those manifolds instead of “importing” pre-fabricated tensor 
bundles from Cartesian coordinate charts, which is the alternative approach taken in Section 56.2. 


Chapter 17


SEMIGROUPS AND GROUPS 


17.0.1 REMARK: Family tree of algebraic systems.

Figure 17.0.1 is a family tree for some classes of algebraic systems which are presented in Chapters 17, 18, 
19 and 20. (Further details of the family tree for modules are shown in Figure 19.0.1. See Figure 20.0.1 for 
further family tree details for transformation groups.) 


[Diagram not reproduced. Recoverable node labels: semigroup, group, transformation group, module,
commutative group, ring, unitary ring, commutative ring, commutative unitary ring, field.]

Figure 17.0.1 Family tree of semigroups, groups, rings and fields


Chapters 17 and 18 present some basic definitions for single-set algebraic systems. These include semigroups, 
monoids and groups, which have only one operation, and rings and fields, which have two operations. 


Two-set algebraic systems include modules, associative algebras and Lie algebras in Chapter 19,
transformation groups in Chapter 20, linear spaces in Chapters 22-24, matrix algebra in Chapter 25, and tensor algebra
in Chapters 27-30. Three-set algebraic systems include affine spaces, which are presented in Chapter 26.


17.0.2 REMARK: The plethora of algebraic systems.
On the subject of the plethora of algebraic systems, Bell [234], page 213, wrote the following. 


There are 4,096 (perhaps more) possible generalizations of a field. To develop them all without 
some definite object in view would be slightly silly. Only those that experience has suggested have 
been worked out in any detail. The rest will keep till they are needed; the apparatus for developing 
them is available. 


And Pinter [122], page 10, wrote: 


Today it is estimated that over 200 different kinds of algebraic systems have been studied, each of 
which arose in connection with some application or specific need. 


In this book only those algebraic systems which have some relevance to differential geometry, direct or 
indirect, are presented in any detail, and only relevant definitions and theorems are given. Nevertheless, 
they may seem excessive. One may safely skip those which seem less interesting, and return to them if 
cross-references suggest that they are in fact core topics after all. 


17.0.3 REMARK: Overview of algebraic systems. 

Table 17.0.2 summarises the operations and maps of some basic algebraic systems. When there are two sets 
in one of these systems, there is a map whose domain is the cross product of the two sets and the range is 
one or other of the two sets. Here the range set is called the “passive set” and the other set is called the 
“active set”. In all cases in the table, the active set is the one on the left. 


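system name                        active set  passive set  operation in active set          operation in passive set         action map on passive set

semigroup, monoid, group           $G$         -            $\sigma : G \times G \to G$      -                                -
ring                               $R$         -            $\sigma : R \times R \to R$      -                                -
                                                            $\tau : R \times R \to R$
field                              $K$         -            $\sigma : K \times K \to K$      -                                -
                                                            $\tau : K \times K \to K$
transformation group               $G$         $X$          $\sigma : G \times G \to G$      -                                $\mu : G \times X \to X$
left $A$-module                    $A$         $M$          -                                $\sigma : M \times M \to M$      $\mu : A \times M \to M$
left module over a group           $G$         $M$          $\sigma_G : G \times G \to G$    $\sigma_M : M \times M \to M$    $\mu : G \times M \to M$
(unitary) left module over a ring  $R$         $M$          $\sigma_R : R \times R \to R$    $\sigma_M : M \times M \to M$    $\mu : R \times M \to M$
                                                            $\tau_R : R \times R \to R$
linear space                       $K$         $V$          $\sigma_K : K \times K \to K$    $\sigma_V : V \times V \to V$    $\mu : K \times V \to V$
                                                            $\tau_K : K \times K \to K$
associative algebra, Lie algebra   $K$         $A$          $\sigma_K : K \times K \to K$    $\sigma_A : A \times A \to A$    $\mu : K \times A \to A$
                                                            $\tau_K : K \times K \to K$      $\tau_A : A \times A \to A$

Table 17.0.2 Summary of sets and operations of algebraic systems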


17.0.4 REMARK: Mnemonics for addition-like and multiplication-like algebraic operations. 

A mnemonic for the algebraic operation symbols in Table 17.0.2 is that $\sigma$ means addition (“sums”), $\tau$ means
multiplication (“times”) and $\mu$ means a map. The $\sigma$ symbol is hopefully also reminiscent of the $\circ$ “operation”
symbol. The algebraic structures in Table 17.0.2 are illustrated in Figure 17.0.3.


17.0.5 REMARK:  Abbreviation of specification tuples. 

Specification tuples for mathematical systems are usually abbreviated, as discussed in Remark 1.5.7 and 
Section 8.8. For example, a group may be fully specified as a tuple $(G,\sigma)$, where $G$ is the set and
$\sigma : G \times G \to G$ is the operation on $G$, or the group may be abbreviated to simply $G$. The definition of such
an abbreviation may be indicated by the (non-standard) “chicken-foot” notation “$-\!\!<$” as $G -\!\!< (G,\sigma)$, which
means that “$G$ is an abbreviation for $(G,\sigma)$”. The meaning of $G$ is ambiguous because it represents both
the basic set and the full tuple, which cannot be equal, but this kind of symbol re-use is standard practice. 


[Diagram not reproduced. Panel titles: group, monoid, semigroup; ring, field; transformation group/semigroup;
module over a set; module over a group; module over a ring, linear space; associative algebra, Lie algebra.]

Figure 17.0.3 Basic algebraic structure classes


17.1. Semigroups 
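17.1.1 DEFINITION: A semigroup is a pair $\Gamma -\!\!< (\Gamma,\sigma)$ where $\Gamma \neq \emptyset$ and $\sigma : \Gamma \times \Gamma \to \Gamma$ satisfies
(i) $\forall g_1, g_2, g_3 \in \Gamma,\ \sigma(g_1, \sigma(g_2, g_3)) = \sigma(\sigma(g_1, g_2), g_3)$. [associativity]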


17.1.2 EXAMPLE: A semigroup with non-injective left and right actions. Maximum “information loss”.
Let $\Gamma$ be any set, and let $0 \in \Gamma$. Define $\sigma : \Gamma \times \Gamma \to \Gamma$ by $\sigma(g_1, g_2) = 0$ for all $g_1, g_2 \in \Gamma$. Then $\sigma$ is clearly an
associative closed operation on $\Gamma$. Therefore $(\Gamma, \sigma)$ is a semigroup. However, the non-injectivity of the maps
$g_1 \mapsto \sigma(g_1, g_2)$ and $g_2 \mapsto \sigma(g_1, g_2)$, when $\Gamma \neq \{0\}$, implies that the usual kind of cancellation operation to
solve algebraic equations is not available. This is an extreme example of “information loss”.
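
The claims of Example 17.1.2 can be checked by brute force on a small finite set. The following Python
sketch is only an illustration. The carrier set `gamma` and the helper names `is_associative` and `sigma`
are assumptions made here and do not appear elsewhere in this book.

```python
from itertools import product

def is_associative(elements, op):
    """Check op(a, op(b, c)) == op(op(a, b), c) for all triples."""
    return all(op(a, op(b, c)) == op(op(a, b), c)
               for a, b, c in product(elements, repeat=3))

# Hypothetical finite carrier set containing the distinguished element 0.
gamma = [0, 1, 2]

def sigma(g1, g2):
    """The constant operation of Example 17.1.2: every product equals 0."""
    return 0

associative = is_associative(gamma, sigma)

# The left action g2 -> sigma(g1, g2) of a fixed g1 takes a single value,
# so it is non-injective whenever gamma has more than one element.
left_action_values = {sigma(1, g2) for g2 in gamma}
```

The same `is_associative` helper may be applied to any closed operation on a finite set.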


17.1.3 EXAMPLE: Semigroups of non-negative integers. 

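Let $\Gamma_k = \{i \in \mathbb{Z};\ i \geq k\}$ for some $k \in \mathbb{Z}_0^+$. Define $\sigma_k : \Gamma_k \times \Gamma_k \to \Gamma_k$ by $\sigma_k(x, y) = x + y$, where “+” is
the usual addition operation on the integers. Then $\mathrm{Range}(\sigma_k) \subseteq \Gamma_k$ and $\sigma_k$ is associative. So $(\Gamma_k, \sigma_k)$ is a
semigroup for all $k \in \mathbb{Z}_0^+$.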


17.1.4 EXAMPLE: A non-associative closed operation does not qualify as a semigroup operation. 

The well-known cross product (or vector product) for the vectors $i, -i, j, -j, k, -k, 0 \in \mathbb{R}^3$, with i = (1,0,0),
j = (0,1,0) and k = (0,0, 1), is a closed operation which is not associative. (See for example Szekeres [305], 
page 220; Struik [39], pages 11, 206; Lanczos [280], page 355; Corben/Stehle [258], page 18; Schutz [36], 
pages 125-126; Joos/Freeman [278], pages 12-13; Gregory [272], pages 13-14; Bishop/Goldberg [3], page 95; 
Guggenheimer [16], page 163; Kay [18], page 10; Kreyszig [105], pages 377-381; Kreyszig [22], pages 13-14; 
Frankel [12], page 92; Stillwell [143], pages 12-13; Kaplan/Lewis [99], pages 819-824.) The “Cayley table”
for this operation is as in Table 17.1.1. (See Cullen [64], page 8, for Cayley tables. It is called an “operation
table” by Pinter [122], page 27. It is called a “multiplication table” by MacLane/Birkhoff [110], page 44.)


 ×     0     i     j     k    -i    -j    -k
 0     0     0     0     0     0     0     0
 i     0     0     k    -j     0    -k     j
 j     0    -k     0     i     k     0    -i
 k     0     j    -i     0    -j     i     0
-i     0     0    -k     j     0     k    -j
-j     0     k     0    -i    -k     0     i
-k     0    -j     i     0     j    -i     0

Table 17.1.1 Multiplication table for the cross product.

For example, $(i \times j) \times j = k \times j = -i \neq 0 = i \times 0 = i \times (j \times j)$. So this is not a semigroup.
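
The closure and the failure of associativity in Example 17.1.4 can be verified mechanically. The
following Python sketch is an illustration only. The names `cross`, `neg` and `vectors` are assumptions
introduced here.

```python
from itertools import product

def cross(a, b):
    """Cross product of two vectors in R^3 given as 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def neg(a):
    """Componentwise negation of a vector."""
    return tuple(-x for x in a)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
zero = (0, 0, 0)
vectors = [zero, i, j, k, neg(i), neg(j), neg(k)]

# Closure: every product of two of the seven vectors is again one of them.
closed = all(cross(a, b) in vectors for a, b in product(vectors, repeat=2))

# The counterexample from the text: (i x j) x j versus i x (j x j).
lhs = cross(cross(i, j), j)
rhs = cross(i, cross(j, j))
```

Here `lhs` is the vector -i whereas `rhs` is the zero vector, so the operation is closed but not associative.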


17.1.5 REMARK: Semigroups are useful for modelling non-reversible processes. 

A group is required to have an inverse element, which suggests that in some sense, no “information” is lost 
in the transformations corresponding to group elements. In other words, all actions are reversible. If an 
action has no inverse, this suggests that “information” has been lost. The original state cannot be recovered. 
So semigroups would typically be useful for modelling irreversible operations while groups are useful for 
modelling reversible operations. 


In the case of the operator solutions of the heat equation, the operators are injective but not surjective. 
(See Example 17.1.6, and Treves [150], section 6.1, page 41; Friedman [73], page 92.) They are only non- 
invertible because they are not surjective. The impossibility of defining general negative time-step operators 
as solutions to the heat equation arises from the fact that arbitrary time-zero functions are generally not 
smooth enough. One may construct negative time-step operator solutions using Fourier transforms, but 
they do not yield convergent integrals for general functions. Therefore, in a strict sense, one may say that 
information is not lost in this situation. But in practice, we know that noise tends to obscure smoothed heat 
distributions, which makes it impossible to make the reverse-time transformation to recover the initial heat 
distribution. In other words, the inverse operator is extremely unstable. Therefore, in practice, information 
is lost. In fact, heat redistribution does correspond to an increase in entropy, which is a loss of information. 


So the conclusion remains that groups tend to be associated with reversible, information-preserving systems 
whereas semigroups (which are not groups) are associated with irreversible, information-losing systems. 


17.1.6 EXAMPLE: A semigroup of forward-time operators for the heat equation. 

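The transformations which solve the heat equation for initial values in the space X of bounded continuous
functions on $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$ are the maps $\psi_t : X \to X$ for $t \in \mathbb{R}_0^+$, where $\psi_0 = \mathrm{id}_X$, and
$\psi_t : f \mapsto (x \mapsto \int_{\mathbb{R}^n} E(x - y, t) f(y)\,dy)$ for $t \in \mathbb{R}^+$, with $E : \mathbb{R}^n \times \mathbb{R}^+ \to \mathbb{R}$ defined by
$E : (x, t) \mapsto (2\sqrt{\pi t})^{-n} \exp(-|x|^2/4t)$. (See for example Treves [150], page 42; Friedman [72], page 1.)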

$\Gamma = \{\psi_t;\ t \in \mathbb{R}_0^+\}$ is a semigroup of transformations of X with multiplication operation $\sigma : \Gamma \times \Gamma \to \Gamma$ defined
by $\sigma : (\psi_s, \psi_t) \mapsto \psi_{s+t}$. (This coincides with the transformation composition operation.) The elements of $\Gamma$
are non-invertible except for $\psi_0$. This follows from the observation that $\psi_t(f) \in C^\infty(\mathbb{R}^n)$ for all $f \in C^0(\mathbb{R}^n)$.
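
The semigroup property of the forward-time operators in Example 17.1.6 amounts to the statement that the
heat kernel for time s convolved with the heat kernel for time t equals the heat kernel for time s + t.
The following Python sketch checks this numerically in dimension n = 1, assuming the standard
normalisation $(4\pi t)^{-1/2}$ of the kernel. The function names, the sample values of s, t and x, and
the integration parameters are assumptions chosen here for illustration.

```python
import math

def heat_kernel(x, t):
    """One-dimensional heat kernel (4*pi*t)^(-1/2) * exp(-x^2 / (4*t))."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def convolve_kernels(x, s, t, half_width=20.0, step=0.01):
    """Approximate the integral of E(x - y, s) * E(y, t) dy by a Riemann sum."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for m in range(n + 1):
        y = -half_width + m * step
        total += heat_kernel(x - y, s) * heat_kernel(y, t)
    return total * step

s, t, x = 0.3, 0.5, 0.7
composed = convolve_kernels(x, s, t)   # kernel of the s-step applied after the t-step
direct = heat_kernel(x, s + t)         # kernel of the single (s + t)-step
```

The two values `composed` and `direct` agree to within the accuracy of the numerical integration.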


17.1.7 REMARK: Embedding the non-negative integers inside a general semigroup. 

According to Theorem 17.1.9, the semigroup of positive integers may be mapped homomorphically to any
non-empty semigroup. The sequence $\phi_x$ cannot commence with $0 \in \mathbb{Z}_0^+$ because this would require an
element $\phi_x(0) \in \Gamma$ which satisfies $\phi_x(0) = \phi_x(0 + 0) = \sigma(\phi_x(0), \phi_x(0))$. Such an element is not guaranteed
to exist in $\Gamma$.


Although Theorem 17.1.9 is expressed in terms of successive multiplication on the right, multiplication on 
the left would clearly give the same result. 


17.1.8 DEFINITION: A semigroup homomorphism from a semigroup $(\Gamma_1,\sigma_1)$ to a semigroup $(\Gamma_2,\sigma_2)$ is a
map $\phi : \Gamma_1 \to \Gamma_2$ which satisfies $\forall g_1, g_2 \in \Gamma_1,\ \phi(\sigma_1(g_1, g_2)) = \sigma_2(\phi(g_1), \phi(g_2))$.


17.1.9 THEOREM: Unital semigroup morphisms.
Let $(\Gamma,\sigma)$ be a semigroup. For $x \in \Gamma$, define $\phi_x : \mathbb{Z}^+ \to \Gamma$ inductively by


(i) $\phi_x(1) = x$,
(ii) $\forall n \in \mathbb{Z}^+,\ \phi_x(n + 1) = \sigma(\phi_x(n), x)$.


Then $\phi_x$ is a semigroup homomorphism for all $x \in \Gamma$.


PROOF: The pair $(\mathbb{Z}^+, +)$ is a semigroup, as mentioned in Example 17.1.3. For all $j \in \mathbb{Z}^+$, define the
proposition $P(j)$ = “$\forall i \in \mathbb{Z}^+,\ \phi_x(i + j) = \sigma(\phi_x(i), \phi_x(j))$”. Let $j = 1$. Then $\phi_x(i + j) = \sigma(\phi_x(i), x)$ for all
$i \in \mathbb{Z}^+$ by (ii). So $\forall i \in \mathbb{Z}^+,\ \phi_x(i + j) = \sigma(\phi_x(i), \phi_x(j))$. Therefore $P(1)$ is true.

Assume that $P(j)$ is true for some $j \in \mathbb{Z}^+$. Let $i \in \mathbb{Z}^+$. Then $\phi_x(i + (j + 1)) = \phi_x((i + 1) + j) =
\sigma(\phi_x(i + 1), \phi_x(j))$ by $P(j)$. But $\phi_x(i + 1) = \sigma(\phi_x(i), x)$ by (ii). So $\phi_x(i + (j + 1)) = \sigma(\sigma(\phi_x(i), x), \phi_x(j)) =
\sigma(\phi_x(i), \sigma(x, \phi_x(j)))$ by the associativity of $\sigma$. But $\sigma(x, \phi_x(j)) = \sigma(\phi_x(1), \phi_x(j)) = \phi_x(1 + j) = \phi_x(j + 1)$
by $P(j)$. So $\phi_x(i + (j + 1)) = \sigma(\phi_x(i), \phi_x(j + 1))$. Therefore $P(j + 1)$ is true. So $\forall i, j \in \mathbb{Z}^+,\ \phi_x(i + j) =
\sigma(\phi_x(i), \phi_x(j))$ by induction. Hence $\phi_x$ is a semigroup homomorphism by Definition 17.1.8.
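
The inductive construction in Theorem 17.1.9 translates directly into a loop. The following Python
sketch builds the sequence from rules (i) and (ii) and spot-checks the homomorphism property for small
indices. The function name `unital_sequence` and the choice of strings under concatenation as the example
semigroup are assumptions made here for illustration.

```python
def unital_sequence(sigma, x, n_max):
    """Return {n: phi_x(n)} for n = 1..n_max, built by rules (i) and (ii)."""
    phi = {1: x}                          # (i)  phi_x(1) = x
    for n in range(1, n_max):
        phi[n + 1] = sigma(phi[n], x)     # (ii) phi_x(n+1) = sigma(phi_x(n), x)
    return phi

# Hypothetical example semigroup: non-empty strings under concatenation.
def concat(g1, g2):
    return g1 + g2

phi = unital_sequence(concat, "ab", 10)

# Homomorphism property: phi_x(i + j) == sigma(phi_x(i), phi_x(j)).
is_homomorphism = all(phi[i + j] == concat(phi[i], phi[j])
                      for i in range(1, 6) for j in range(1, 6))
```

A finite check of this kind is of course no substitute for the induction argument, but it is a useful
sanity test for a particular semigroup.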


17.1.10 REMARK: Unital morphisms and algebraic system structure.

The name “unital semigroup morphism” for the sequences defined in Theorem 17.1.9 is perhaps not fully 
justified. The standard “unital morphism” in the literature is for unitary rings as in Definition 18.2.11. 
Various other “unital morphisms” are summarised in the following table. 


system                map                            unital morphism property             theorem / definition / remark
semigroup             $\mathbb{Z}^+ \to \Gamma$      semigroup homomorphism               17.1.9
monoid                $\mathbb{Z}_0^+ \to \Gamma$    monoid homomorphism                  -
group                 $\mathbb{Z} \to G$             group homomorphism                   17.4.7, 17.4.8
ordered group         $\mathbb{Z} \to G$             group monomorphism                   17.5.12
ring                  $\mathbb{Z} \to R$             group homomorphism                   18.1.21
unitary ring          $\mathbb{Z} \to R$             unitary ring homomorphism            18.2.10, 18.2.11
ordered ring          $\mathbb{Z} \to R$             group monomorphism                   18.3.15
ordered unitary ring  $\mathbb{Z} \to R$             ordered unitary ring monomorphism    18.6.6
integral system       $\mathbb{Z} \to R$             ordered unitary ring isomorphism     18.6.15


Homomorphisms which embed the integers in algebraic systems according to the general pattern of a unital 
morphism give some basic insight into the structure of these systems. (The family relations between the 
systems discussed here are illustrated in Figure 17.1.2.) 


[Diagram not reproduced. Recoverable node labels: semigroup, monoid, ordered group, unitary ring,
ordered ring, ordered unitary ring, integral system.]

Figure 17.1.2 Algebraic systems with unital morphisms


17.1.11 EXAMPLE: Set-endomorphism semigroups. 

Let S be a set. Let $\Gamma = S^S$ be the set of functions from S to S. In other words, $\Gamma$ is the set of
set-endomorphisms of S. Define $\sigma : \Gamma \times \Gamma \to \Gamma$ by $\sigma : (f, g) \mapsto f \circ g$. Then $\sigma$ is a closed operation on $\Gamma$ by
Theorem 10.4.21 (i), and is an associative operation by Theorem 10.4.21 (ii). Hence $(\Gamma, \sigma)$ is a semigroup.
Clearly $(\Gamma, \sigma')$ is also a semigroup with the reverse-order operation $\sigma' : (f, g) \mapsto g \circ f$.
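
For a finite set S, the semigroup of Example 17.1.11 can be enumerated explicitly. In the following
Python sketch, a function from S to S is stored as a tuple of its values. The three-element set `S` and the
names `gamma` and `compose` are assumptions made here for illustration.

```python
from itertools import product

S = (0, 1, 2)

# A function f : S -> S is stored as the tuple (f(0), f(1), f(2)).
gamma = list(product(S, repeat=len(S)))

def compose(f, g):
    """sigma(f, g) = f o g, that is, x -> f(g(x))."""
    return tuple(f[g[x]] for x in S)

closed = all(compose(f, g) in gamma for f in gamma for g in gamma)
associative = all(compose(f, compose(g, h)) == compose(compose(f, g), h)
                  for f, g, h in product(gamma, repeat=3))
```

The reverse-order operation is obtained by swapping the arguments, as in `compose(g, f)`.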


17.1.12 REMARK: Additive versus multiplicative notation for semigroup operations. 

When the operation $\sigma$ of a semigroup $(\Gamma,\sigma)$ is implicit in the context, it may be notated multiplicatively
or additively. Multiplicative notation is mostly used for noncommutative semigroups. Additive notation is 
usually used for commutative semigroups. 


In the additive notation, $\sigma(g_1, g_2)$ is written as $g_1 + g_2$. In the multiplicative notation, $\sigma(g_1, g_2)$ is written
as $g_1 g_2$ (simple juxtaposition) or $g_1 \cdot g_2$, or sometimes as $g_1 \circ g_2$. The circle symbol $\circ$ is also used for the
composition of arbitrary functions. To avoid ambiguity, it is better to not use the circle-symbol “$\circ$” for
semigroup operations unless the semigroup operation is in fact a function composition.


Since a semigroup operation is associative, parentheses may be omitted because $(g_1 g_2) g_3 = g_1 (g_2 g_3)$ in
multiplicative notation, and $(g_1 + g_2) + g_3 = g_1 + (g_2 + g_3)$ in additive notation.


17.1.13 REMARK:  Cancellative semigroups. 
A cancellative semigroup is a semigroup where multiplication on the left or right by any fixed element never
causes “information loss”. In other words, all left and right actions within a cancellative semigroup can
be reversed. This does not imply the existence of left and right inverses, but the existence of left and 
right inverses would imply that the semigroup is cancellative. For example, the multiplicative semigroup 
of positive integers is cancellative, but no element except for 1 has a left or right inverse. In other words, 
left and right multiplication of positive integers by a fixed positive integer k can be reversed (because such 
maps are injective), but there is no positive integer $k^{-1}$ which can achieve this reversal by left or right
multiplication respectively (unless k = 1). 


17.1.14 DEFINITION: A cancellative semigroup is a semigroup $\Gamma -\!\!< (\Gamma, \sigma)$ which satisfies:


(i) $\forall g_1, g_2, h \in \Gamma,\ (h g_1 = h g_2 \Rightarrow g_1 = g_2)$, [left cancellation]
(ii) $\forall g_1, g_2, h \in \Gamma,\ (g_1 h = g_2 h \Rightarrow g_1 = g_2)$. [right cancellation]
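
The two cancellation conditions of Definition 17.1.14 can be tested on a finite sample of elements.
The following Python sketch contrasts multiplication of positive integers with a constant operation.
The helper name `is_cancellative` and the sample range are assumptions made here, and a test on a finite
sample can only refute cancellativity in general, not prove it.

```python
def is_cancellative(elements, op):
    """Test left and right cancellation on a finite sample of elements."""
    left = all(g1 == g2
               for h in elements for g1 in elements for g2 in elements
               if op(h, g1) == op(h, g2))
    right = all(g1 == g2
                for h in elements for g1 in elements for g2 in elements
                if op(g1, h) == op(g2, h))
    return left and right

sample = range(1, 21)   # a finite sample of the positive integers

times_ok = is_cancellative(sample, lambda a, b: a * b)
constant_ok = is_cancellative(sample, lambda a, b: 1)   # constant operation
```

Multiplication passes the test on this sample, whereas the constant operation fails it.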


17.1.15 DEFINITION:
A commutative semigroup is a semigroup $(\Gamma, \sigma)$ which satisfies $\forall g_1, g_2 \in \Gamma,\ \sigma(g_1, g_2) = \sigma(g_2, g_1)$.


17.1.16 REMARK: Order-independence of sequences of operations in a commutative semigroup. 

In a commutative semigroup $\Gamma$, sequences of operations may be performed in an arbitrary order. Thus
$g_{j_1} + g_{j_2} + \cdots + g_{j_k} = g_1 + g_2 + \cdots + g_k$ for any finite sequence $(g_i)_{i=1}^k$ of elements of $\Gamma$, for any permutation
$j : \mathbb{N}_k \to \mathbb{N}_k$, for any integer $k \in \mathbb{Z}^+$. (See Section 14.8 for permutations.) Consequently, the summation
notation $\sum$ has an unambiguous meaning for general commutative semigroups.


The sum in Definition 17.1.17 is well defined for a general semigroup. (In fact, the sum is well defined for
any closed operation $\sigma$ on $\Gamma$, whether it is associative or not.) But independence with respect to order of
summation requires the semigroup to be commutative. It is dangerous to employ the summation notation
for noncommutative semigroups because users will inevitably swap the order without realising their error.


Definition 17.1.17 and Notation 17.1.18 have obvious generalisations to sequences $f : J \to \Gamma$ with general
totally ordered finite index sets $J$. In the case of unordered index sets, sums of finite sequences may be
defined unambiguously if the semigroup is commutative.


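17.1.17 DEFINITION: The sum of a finite sequence $f : \mathbb{N}_n \to \Gamma$, for a semigroup $(\Gamma, \sigma)$ and $n \in \mathbb{Z}^+$, is the
element $s_n \in \Gamma$ defined inductively by $s_1 = f_1$ and $\forall j \in \mathbb{N}_n \setminus \{n\},\ s_{j+1} = \sigma(s_j, f_{j+1})$.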


17.1.18 NOTATION: $\sum f$, for a finite sequence $f : \mathbb{N}_n \to \Gamma$, where $n \in \mathbb{Z}^+$ and $\Gamma$ is a commutative
semigroup, denotes the sum of f.


$\sum_{i=1}^n f_i$ and $\sum_{i \in \mathbb{N}_n} f_i$ are alternative notations for $\sum f$.
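
The inductive sum of Definition 17.1.17 is a left fold of the operation over the sequence. The following
Python sketch computes it and illustrates why the summation notation is reserved for commutative
semigroups: reordering the terms changes nothing for integer addition, but changes the result for string
concatenation. The function name `semigroup_sum` and the example sequences are assumptions made here.

```python
from itertools import permutations

def semigroup_sum(f, sigma):
    """s_1 = f_1 and s_{j+1} = sigma(s_j, f_{j+1}), as in Definition 17.1.17."""
    s = f[0]
    for term in f[1:]:
        s = sigma(s, term)
    return s

def add(a, b):
    return a + b

# Commutative case: integers under addition give one value for every order.
int_sums = {semigroup_sum(p, add) for p in permutations((3, 5, 7))}

# Noncommutative case: strings under concatenation depend on the order.
str_sums = {semigroup_sum(p, add) for p in permutations(("a", "b", "c"))}
```

The set `int_sums` has a single element, whereas `str_sums` has one element for each of the six orderings.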


17.1.19 REMARK: Historical origin of addition and subtraction symbols. 

The first publication of the addition and subtraction symbols “+” and “−” is attributed by Cajori [242],
Volume 1, pages 128, 130, 230-237, to the 1489 publication, “Behede vnd hubsche Rechenung auff allen 
Kauffmanschafft", by Johann Widman (1462-1498). Evidently the plus sign “+” originated in a shorthand 
for the Latin word “et” (“and”). (See also Bell [234], page 97; Ball [232], page 206.) 


17.2. Monoids 


17.2.1 REMARK: A monoid is a semigroup which has an identity element. 

The monoid is an algebraic structure category between a semigroup and a group. (For more on monoids, 
see MacLane/Birkhoff [110], page 39; Lang [108], page 3; Grove [88], page 1.) Since a monoid is a semigroup 
which has an identity element, it is convenient to first give the definition and basic properties of identity 
elements in semigroups. 


17.2.2 DEFINITION: An identity (element) in a semigroup $(\Gamma, \sigma)$ is an element $e \in \Gamma$ which satisfies
$\forall g \in \Gamma,\ \sigma(e, g) = \sigma(g, e) = g$.


17.2.3 THEOREM: Uniqueness of the identity of a semigroup. 
A semigroup has at most one identity. 


PROOF: Let $(\Gamma,\sigma)$ be a semigroup. Let $e_1, e_2 \in \Gamma$ be identity elements in $(\Gamma,\sigma)$. Then $\sigma(e_1, e_2) =
\sigma(e_2, e_1) = e_2$ and $\sigma(e_2, e_1) = \sigma(e_1, e_2) = e_1$. So $e_1 = e_2$.


17.2.4 REMARK: Consequences of the existence of left and right identities in a semigroup.

One may define abstract left and right identities $e_L, e_R \in \Gamma$ for a semigroup $(\Gamma, \sigma)$ to satisfy $\forall g \in \Gamma,\ \sigma(e_L, g) =
g$ and $\forall g \in \Gamma,\ \sigma(g, e_R) = g$. Then $\sigma(e_L, e_R) = e_R$ and $\sigma(e_L, e_R) = e_L$. So the left and right identities, if
they exist, must be the same unique identity element.

If a semigroup is only known to have a left identity, then this left identity is not necessarily a right identity. 
Nor is it necessarily unique. (See Example 17.2.5.) 


17.2.5 EXAMPLE: Smallest semigroup which has a left identity but no right identity. 

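Let $\Gamma = \{e, g\}$ such that $e \neq g$. Define $\sigma : \Gamma \times \Gamma \to \Gamma$ by $\sigma(x, y) = y$ for all $x, y \in \Gamma$. Then $\sigma(\sigma(x, y), z) = z$
for all $x, y, z \in \Gamma$. Similarly, $\sigma(x, \sigma(y, z)) = \sigma(y, z) = z$ for all $x, y, z \in \Gamma$. Therefore $\sigma$ is an associative
operation on $\Gamma$.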


Both e and g are left identities for $(\Gamma, \sigma)$, but neither e nor g is a right identity. Since a semigroup with one
element necessarily has a two-sided identity, the smallest semigroup with a left identity but no right identity 
has two elements. It is also the smallest semigroup with two different left identities. 
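
The two-element semigroup of Example 17.2.5 is small enough to check exhaustively. The following Python
sketch lists its left and right identities. Representing the elements e and g by the strings "e" and "g"
is an assumption made here for illustration.

```python
from itertools import product

gamma = ["e", "g"]

def sigma(x, y):
    """Right projection: sigma(x, y) = y."""
    return y

associative = all(sigma(sigma(x, y), z) == sigma(x, sigma(y, z))
                  for x, y, z in product(gamma, repeat=3))
left_identities = [e for e in gamma if all(sigma(e, g) == g for g in gamma)]
right_identities = [e for e in gamma if all(sigma(g, e) == g for g in gamma)]
```

Both elements appear in `left_identities`, and `right_identities` is empty.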


17.2.6 DEFINITION: A monoid is a pair Γ −< (Γ, σ) where σ : Γ × Γ → Γ satisfies


(i) ∀g₁, g₂, g₃ ∈ Γ, σ(g₁, σ(g₂, g₃)) = σ(σ(g₁, g₂), g₃). [associativity]
(ii) ∃e ∈ Γ, ∀g ∈ Γ, σ(e, g) = σ(g, e) = g. [existence of identity]


17.2.7 REMARK: The uniqueness of the identity element in a monoid. 
Definition 17.2.6 part (ii) says that a monoid has at least one identity element. Since a monoid is necessarily 
a semigroup, it follows by Theorem 17.2.3 that a monoid has at most one identity element. 


17.2.8 REMARK: Examples of monoids. 

Examples of generic systems which are monoids but not groups are the sets of endomorphisms on a given 
structure, for any structure category. Let X be a set. Let Γ = X^X be the set of functions from X to X. Let
σ : Γ × Γ → Γ be the function composition operation on Γ. Then σ is a well-defined associative operation.
(See Example 17.1.11.) The identity function id_X : X → X satisfies condition (ii) of Definition 17.2.6. So
(Γ, σ) is a monoid. However, if X has two or more elements, (Γ, σ) is not a group.


Another example of a monoid is the set ℤ₀⁺ of non-negative integers with the standard addition operation.
The set (ℤ₀⁺)ⁿ of non-negative integer n-tuples is also a monoid with componentwise addition for n ∈ ℤ₀⁺.
Similarly, the sets (ℚ₀⁺)ⁿ and (ℝ₀⁺)ⁿ of non-negative rational and real number n-tuples respectively are
monoids with componentwise addition.


17.2.9 REMARK: Limited applicability of monoids relative to semigroups. 

It is not entirely clear that monoids offer any “added value” relative to a semigroup. If a semigroup (Γ, σ)
does not have an identity element, it may be trivially added. Thus if e ∉ Γ, then (Γ′, σ′) is a monoid, where
Γ′ = Γ ∪ {e} and σ′ = σ ∪ {((e, g), g); g ∈ Γ′} ∪ {((g, e), g); g ∈ Γ′}. In this construction, there are no elements
g₁, g₂ ∈ Γ with g₁g₂ = e. So there are no left or right inverses. An identity element is only non-trivial if such
pairs of elements exist in Γ. In a monoid where such non-trivial left and right inverses do exist, this leads
naturally to the question of whether all elements have a left inverse, or a right inverse, or both, and if both 
a left and right inverse exist, whether these inverses are the same. And this leads to the definition of groups. 


The forward-time operator solutions of the heat equation in Example 17.1.6 constitute a monoid with a
natural identity element which is the zero-time-step operator. There are no non-identity elements in this 
monoid whose composition equals the identity. Nevertheless, the identity element is much used in the study 
of such semigroups. Similarly, the zero element of the non-negative integers is much used, although there 
are no positive integers which sum to zero. 
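
The construction of (Γ′, σ′) in Remark 17.2.9 can be sketched for a finite semigroup stored as a multiplication table. This is only an illustration under stated assumptions: the table is a Python dict keyed by pairs, the base semigroup is the two-element semigroup of Example 17.2.5 relabelled "a", "b", the new identity is the string "1", and the function names adjoin_identity and is_monoid are hypothetical.

```python
from itertools import product

def adjoin_identity(elems, table, e):
    """Return (elems', table') with a new two-sided identity e added."""
    if e in elems:
        raise ValueError("e must not already belong to the semigroup")
    new_elems = list(elems) + [e]
    new_table = dict(table)
    for g in new_elems:
        new_table[(e, g)] = g   # the pairs ((e, g), g)
        new_table[(g, e)] = g   # the pairs ((g, e), g)
    return new_elems, new_table

def is_monoid(elems, table, e):
    """Check associativity and that e is a two-sided identity."""
    assoc = all(table[(table[(x, y)], z)] == table[(x, table[(y, z)])]
                for x, y, z in product(elems, repeat=3))
    ident = all(table[(e, g)] == g and table[(g, e)] == g for g in elems)
    return assoc and ident

# Base semigroup with operation (x, y) -> y, which has no two-sided identity.
base = ["a", "b"]
base_table = {(x, y): y for x, y in product(base, repeat=2)}

mon_elems, mon_table = adjoin_identity(base, base_table, "1")
monoid_ok = is_monoid(mon_elems, mon_table, "1")

# No product of two original elements equals the adjoined identity.
pairs_giving_identity = [(x, y) for x, y in product(base, repeat=2)
                         if mon_table[(x, y)] == "1"]
print(monoid_ok, pairs_giving_identity)
```

The empty list of pairs corresponds to the observation that the adjoined identity creates no left or right inverses among the original elements.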


17.3. Groups 


17.3.1 REMARK: Historical origins of groups. 

According to Bell [234], page 234, the term group as an algebraic structure was invented by Galois. However,
Bell [233], page 278, says the following, apparently attributing the early development of group theory to
Cauchy (1789-1857).


[...] the theory of substitutions, begun systematically by Cauchy, and elaborated by him in a long
series of papers in the middle 1840’s, which developed into the theory of finite groups [...]


According to Szekeres [305], page 27: 


The concept of a group has its origins in the work of Evariste Galois (1811-1832) and Niels Henrik 
Abel (1802-1829) on the solution of algebraic equations by radicals. 


Bell [233], page 282, states that the transition from a concrete group (such as a group of permutations) to 
an abstract group (defined only by labels for its elements and the “multiplication table” for the elements) 
was made by Cayley (1821-1895). 


This abstract point of view is that now current. It was not Cauchy's, but was introduced by Cayley 
in 1854. Nor were completely satisfactory sets of postulates for groups stated till the first decade 
of the twentieth century. 

Bell [233], page 518, attributes the axiomatic approach to group theory to Dedekind (1831-1916).


Dedekind was one of the first to appreciate the fundamental importance of the concept of a group
in algebra and arithmetic. In this early work Dedekind already exhibited two of the leading
characteristics of his later thought, abstractness and generality. Instead of regarding a finite group from
the standpoint offered by its representation in terms of substitutions [...], Dedekind defined groups
by means of their postulates [...] and sought to derive their properties from this distillation of their
essence. This is in the modern manner: abstractness and therefore generality.


Hence it is not entirely clear who deserves credit for the invention and early development of group theory. 


17.3.2 DEFINITION: A group is a pair G −< (G, σ) such that σ : G × G → G satisfies the following.


(i) ∀g₁, g₂, g₃ ∈ G, σ(g₁, σ(g₂, g₃)) = σ(σ(g₁, g₂), g₃). [associativity]
(ii) ∃e ∈ G, ∀g ∈ G, σ(e, g) = σ(g, e) = g. [existence of identity]
(iii) ∀g ∈ G, ∃g′ ∈ G, σ(g′, g) = σ(g, g′) = e. [existence of inverses]


17.3.3 REMARK: Additive versus multiplicative notation for group operations.

The group operation σ of a group (G, σ) may be notated additively or multiplicatively, as for semigroups.
(See Remark 17.1.12.) Multiplicative notation is generally used within group theory. Additive notation is 
usually used when the group is embedded in a two-operation algebraic structure such as a ring or field. 


17.3.4 DEFINITION: A trivial group is a group (G, σ) with G = {e} for some e.


17.3.5 REMARK: Properties of trivial groups.
By Definition 17.3.2 (ii), the operation σ of a trivial group (G, σ) with G = {e} must satisfy σ : (e, e) ↦ e.


17.3.6 NOTATION: 

L_g^G, for g in a group (G, σ_G), denotes the function L_g^G : G → G defined by L_g^G : h ↦ σ_G(g, h).

R_g^G, for g in a group (G, σ_G), denotes the function R_g^G : G → G defined by R_g^G : h ↦ σ_G(h, g).

L_g and R_g are abbreviations for L_g^G and R_g^G respectively when the group G is implicit in the context. Then
L_g : G → G and R_g : G → G are defined by L_g : h ↦ gh and R_g : h ↦ hg respectively.


17.3.7 DEFINITION: The function L_g in Notation 17.3.6 may be referred to as the left action by g (on G).
The function R_g may be referred to as the right action by g (on G).


17.3.8 REMARK: Associativity is a kind of commutativity of left and right actions. 

Associativity may be thought of as a kind of commutativity because associativity means that the left and 
right actions of group elements commute. In other words, L_{g₁} ∘ R_{g₂} = R_{g₂} ∘ L_{g₁} for all g₁, g₂ ∈ G. Group
elements h ∈ G may be thought of as “two-port objects” which can be multiplied from the left or the
right, and the actions on these two ports commute. Thus (L_{g₁} ∘ R_{g₂})(h) = L_{g₁}(R_{g₂}(h)) = L_{g₁}(hg₂) =
g₁(hg₂) = (g₁h)g₂ = (L_{g₁}(h))g₂ = R_{g₂}(L_{g₁}(h)) = (R_{g₂} ∘ L_{g₁})(h) for all g₁, g₂, h ∈ G. This contrasts
with transformation groups and linear spaces, for example, where the elements of the “passive set” may be 
multiplied only on one side (conventionally on the left) by elements of the “active set”. In other words, the 
elements of the passive sets are “single-port objects”. 
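
The commutation of left and right actions in Remark 17.3.8 can be illustrated on a small non-commutative group. The sketch below assumes the group of permutations of {0, 1, 2}, stored as tuples, with composition as the operation; the names S3, mul, L and R are illustrative choices.

```python
from itertools import permutations, product

# All permutations of {0, 1, 2}; the tuple p represents the map i -> p[i].
S3 = list(permutations(range(3)))

def mul(a, b):
    """Group operation: the composition a∘b, so (a∘b)(i) = a[b[i]]."""
    return tuple(a[b[i]] for i in range(len(b)))

def L(g):
    """Left action by g: h -> gh."""
    return lambda h: mul(g, h)

def R(g):
    """Right action by g: h -> hg."""
    return lambda h: mul(h, g)

# L_{g1} ∘ R_{g2} = R_{g2} ∘ L_{g1} for all g1, g2 (this is associativity).
left_right_commute = all(L(g1)(R(g2)(h)) == R(g2)(L(g1)(h))
                         for g1, g2, h in product(S3, repeat=3))

# By contrast, two left actions need not commute in a non-commutative group.
left_left_commute = all(L(g1)(L(g2)(h)) == L(g2)(L(g1)(h))
                        for g1, g2, h in product(S3, repeat=3))

print(left_right_commute, left_left_commute)
```

The first check succeeds for every triple, whereas the second fails because this group is not commutative.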


17.3.9 DEFINITION: An identity for a group G is any element e ∈ G such that eg = ge = g for all g ∈ G.


17.3.10 REMARK: Possible origin of the notation “e” for group identity elements.

Probably the customary notation “e” for the identity in Definition 17.3.9 is a mnemonic for the German 
word “Einheit”, which means “unit” or “unity”, because the identity in a multiplicatively notated group is 
also called the unit element or unity. 


17.3.11 REMARK: Uniqueness of the identity in a group. 
Definition 17.3.9 for a group identity is the same as Definition 17.2.2 for a semigroup identity. Therefore by 
Theorem 17.2.3, the identity in a group is unique. (Very literally, “Einheit” means “one-hood” or “one-ness”.)


17.3.12 DEFINITION: A left inverse of an element g in a group G is an element h ∈ G such that hg = e.
A right inverse of an element g in a group G is an element h ∈ G such that gh = e.
An inverse of an element g in a group G is an element h ∈ G such that hg = gh = e.


17.3.13 REMARK: Some conditions which imply that group elements are the identity or inverses. 


In Theorem 17.3.14, some relatively easy-to-prove conditions imply some quite strong conclusions.


17.3.14 THEOREM: Some basic tests for the identity and inverses of a group. 
Let G be a group with identity e. 


(i) Each element in G has at most one inverse.
(ii) If g, h ∈ G and gh = h, then g = e.
(iii) If g, h ∈ G and hg = h, then g = e.
(iv) If g, h ∈ G and hg = e, then g is the inverse of h and h is the inverse of g.
(v) e is the inverse of e.
(vi) If g ∈ G and gg = g, then g = e.


PROOF: For part (i), suppose g ∈ G has two inverses h₁, h₂ ∈ G. Then h₁g = gh₁ = e and h₂g = gh₂ = e.
Therefore h₂ = eh₂ = (h₁g)h₂ = h₁(gh₂) = h₁e = h₁. (Alternatively apply Theorem 17.2.3.)

For part (ii), suppose g, h ∈ G satisfy gh = h. By Definition 17.3.2 (iii), there is an element h′ ∈ G with
hh′ = e. Then g = g(hh′) = (gh)h′ = hh′ = e by Definition 17.3.2 (i). So g = e.

For part (iii), suppose g, h ∈ G satisfy hg = h. By Definition 17.3.2 (iii), there is an element h′ ∈ G with
h′h = e. Then g = (h′h)g = h′(hg) = h′h = e by Definition 17.3.2 (i). So g = e.

For part (iv), let g, h ∈ G satisfy hg = e. By Definition 17.3.2 (iii), there is an element g′ ∈ G with
g′g = e = gg′. Then h = he = h(gg′) = (hg)g′ = eg′ = g′. So h is an inverse of g, and this inverse is unique by
part (i). Similarly, g is the inverse of h.

Part (v) follows from part (iv) and the observation that ee = e by Definition 17.3.2 (ii).

Part (vi) follows from part (ii).




17.3.15 NOTATION: g⁻¹, for any element g in a group G, denotes the inverse of g in G.


17.3.16 REMARK: The existence and properties of inverses distinguish groups from semigroups. 
Although left and right inverses always exist and are the same for a group, they may be different, non-unique 
or even non-existent for semigroups. (See Remark 17.2.4.) 


17.3.17 THEOREM: Inverse rule for a product of group elements. 
Let G be a group. Then (g₁g₂)⁻¹ = g₂⁻¹g₁⁻¹ for all g₁, g₂ ∈ G.


PROOF: Let g₁, g₂ ∈ G. Then (g₂⁻¹g₁⁻¹)(g₁g₂) = g₂⁻¹(g₁⁻¹g₁)g₂ = g₂⁻¹eg₂ = g₂⁻¹g₂ = e. Similarly,
(g₁g₂)(g₂⁻¹g₁⁻¹) = e. Hence (g₁g₂)⁻¹ = g₂⁻¹g₁⁻¹.


17.3.18 REMARK: The set of set-automorphisms of a set is a group. 

As noted in Example 17.1.11, the set of all functions from a set S to itself (i.e. the set End(S) of set- 
endomorphisms of S) is a semigroup with the operation of function composition. In Remark 17.2.8, it is 
noted that End(S) is a monoid. However, End(S) is not a group in general because some elements will
be non-invertible if #(S) ≥ 2. But by restricting these maps to the set-automorphisms in Aut(S), every
element is then invertible, and the result is a group. This is also known as the “permutation group” for S.
(See Definition 14.8.2 for permutations.)

The group operation may be defined on the left as σ_L : Aut(S) × Aut(S) → Aut(S) with σ_L : (f, g) ↦ f ∘ g, or on
the right as σ_R : Aut(S) × Aut(S) → Aut(S) with σ_R : (f, g) ↦ g ∘ f. These are presented in Theorem 17.3.19.
Similarly, the set of automorphisms for any algebraic or analytic category forms a group under composition, 
in the same way that the set of endomorphisms forms a monoid. Thus in general, Aut(S) may be referred 
to as an “automorphism group", whereas End(S) may be referred to as an “endomorphism monoid”. 


17.3.19 THEOREM: The set of bijections of a set is a group with the composition operation. 
Let S be a set. Let G = {f : S → S; f is a bijection}.


(i) (G, σ_L) is a group with σ_L : (f, g) ↦ f ∘ g.
(ii) (G, σ_R) is a group with σ_R : (f, g) ↦ g ∘ f.


PROOF: For part (i), closure follows from Theorem 10.5.6 (iii). Let f, g, h ∈ G. Then σ_L(σ_L(f, g), h) =
(f ∘ g) ∘ h = f ∘ (g ∘ h) = σ_L(f, σ_L(g, h)) by Theorem 10.4.21 (ii), which verifies associativity. Let e = id_S.
Then e satisfies Definition 17.3.2 (ii). Let f ∈ G. Then the function inverse f⁻¹ is in G by Theorem 10.5.11,
and σ_L(f⁻¹, f) = f⁻¹ ∘ f = id_S = e and σ_L(f, f⁻¹) = f ∘ f⁻¹ = id_S = e by the definition of the inverse
function. This verifies Definition 17.3.2 (iii). Hence (G, σ_L) is a group.

For part (ii), closure follows from Theorem 10.5.6 (iii). Let f, g, h ∈ G. Then σ_R(σ_R(f, g), h) = h ∘ (g ∘
f) = (h ∘ g) ∘ f = σ_R(f, σ_R(g, h)) by Theorem 10.4.21 (ii), which verifies associativity. Definition 17.3.2
conditions (ii) and (iii) follow as in part (i). Hence (G, σ_R) is a group.
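
For a finite set S, the three conditions of Definition 17.3.2 can be tested exhaustively. The following sketch assumes S = {0, 1, 2}, with maps S → S stored as tuples of values; the helper is_group and the names bijections and all_maps are illustrative. It checks both operations of Theorem 17.3.19 and, for contrast, the full set of maps discussed in Remark 17.3.18.

```python
from itertools import permutations, product

def is_group(elems, op):
    """Brute-force test of closure and Definition 17.3.2 (i), (ii), (iii)."""
    elems = list(elems)
    elem_set = set(elems)
    if not all(op(a, b) in elem_set for a, b in product(elems, repeat=2)):
        return False  # not closed
    if not all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(elems, repeat=3)):
        return False  # not associative
    ids = [e for e in elems if all(op(e, g) == g == op(g, e) for g in elems)]
    if not ids:
        return False  # no identity
    e = ids[0]
    return all(any(op(h, g) == e == op(g, h) for h in elems) for g in elems)

S = range(3)

def compose(f, g):
    """The composition f∘g of maps stored as tuples: (f∘g)(i) = f[g[i]]."""
    return tuple(f[g[i]] for i in S)

def sigma_L(f, g):
    return compose(f, g)   # (f, g) -> f∘g

def sigma_R(f, g):
    return compose(g, f)   # (f, g) -> g∘f

bijections = list(permutations(S))          # the set G of the theorem
all_maps = list(product(S, repeat=len(S)))  # End(S), all maps S -> S

print(is_group(bijections, sigma_L))  # part (i)
print(is_group(bijections, sigma_R))  # part (ii)
print(is_group(all_maps, sigma_L))    # End(S) is a monoid but not a group
```

The last check fails only at the inverse condition, since constant maps on a set with at least two elements have no inverse.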


17.3.20 THEOREM: The left and right actions by group elements are bijections without fixed points. 
Let G −< (G, σ) be a group.
(i) L_g : G → G is a bijection for all g ∈ G.
(ii) R_g : G → G is a bijection for all g ∈ G.
(iii) ∀g ∈ G \ {e_G}, ∀h ∈ G, L_g(h) ≠ h. In other words, L_g has no fixed points unless g = e_G.
(iv) ∀g ∈ G \ {e_G}, ∀h ∈ G, R_g(h) ≠ h. In other words, R_g has no fixed points unless g = e_G.


PROOF: For part (i), let g, h ∈ G. Then (L_{g⁻¹} ∘ L_g)(h) = g⁻¹gh = h. So L_{g⁻¹} ∘ L_g = id_G. Similarly,
L_g ∘ L_{g⁻¹} = id_G. So L_g : G → G is a bijection by Theorem 10.5.14 (iv).
Part (ii) may be proved similarly to part (i).
Part (iii) follows from Theorem 17.3.14 (ii).
Part (iv) follows from Theorem 17.3.14 (iii).


17.3.21 REMARK: The set of all translations of a group by a group element is a group. 

Let G be a group. For h ∈ G, define the map L_h : G → G by L_h : g ↦ hg. Then {L_h; h ∈ G} is a group
under the operation of composition of maps. In other words, L_{h₁}L_{h₂} = L_{h₁} ∘ L_{h₂} = L_{h₁h₂}.

Similarly, {R_h; h ∈ G} is a group with R_h : G → G defined by R_h : g ↦ gh and the reverse map-composition
operation. In other words, R_{h₁}R_{h₂} = R_{h₂} ∘ R_{h₁} = R_{h₁h₂}. Both left and right translation groups are presented
in Theorem 17.3.22.

Since these groups are sets of set-automorphisms of G, they are subgroups of the group of set-automorphisms Aut(S) with S = G.
(See Section 17.6 for subgroups.) 


17.3.22 THEOREM: The sets of left and right actions of a group are groups with the composition operation. 
Let G −< (G, σ) be a group.

(i) (G_L, σ_L) is a group with G_L = {L_g; g ∈ G} and σ_L : (L_g, L_h) ↦ L_g ∘ L_h.

(ii) (G_R, σ_R) is a group with G_R = {R_g; g ∈ G} and σ_R : (R_g, R_h) ↦ R_h ∘ R_g.


PROOF: For part (i), closure follows from the equality σ_L(L_g, L_h)(x) = ghx = L_{gh}(x) for all x ∈ G.
Associativity follows from σ_L(L_f, σ_L(L_g, L_h)) = L_f ∘ (L_g ∘ L_h) = (L_f ∘ L_g) ∘ L_h = σ_L(σ_L(L_f, L_g), L_h) for
all f, g, h ∈ G by Theorem 10.4.21 (ii). Let e_L = id_G = L_e, where e is the identity for G. Then e_L ∈ G_L and
σ_L(e_L, L_g) = σ_L(L_g, e_L) = L_g for all g ∈ G, which verifies Definition 17.3.2 (ii). For all g ∈ G, L_{g⁻¹} ∈ G_L,
and σ_L(L_{g⁻¹}, L_g) = L_{g⁻¹} ∘ L_g = e_L = L_g ∘ L_{g⁻¹} = σ_L(L_g, L_{g⁻¹}), which verifies Definition 17.3.2 (iii). Hence
(G_L, σ_L) is a group.

For part (ii), closure follows from σ_R(R_g, R_h)(x) = xgh = R_{gh}(x) for all x ∈ G. Associativity follows from
σ_R(R_f, σ_R(R_g, R_h)) = (R_h ∘ R_g) ∘ R_f = R_h ∘ (R_g ∘ R_f) = σ_R(σ_R(R_f, R_g), R_h) for all f, g, h ∈ G by
Theorem 10.4.21 (ii). Let e_R = id_G = R_e, where e is the identity for G. Then e_R ∈ G_R and σ_R(e_R, R_g) =
σ_R(R_g, e_R) = R_g for all g ∈ G, which verifies Definition 17.3.2 (ii). For all g ∈ G, R_{g⁻¹} ∈ G_R and
σ_R(R_{g⁻¹}, R_g) = R_g ∘ R_{g⁻¹} = e_R = R_{g⁻¹} ∘ R_g = σ_R(R_g, R_{g⁻¹}), which verifies Definition 17.3.2 (iii). Hence
(G_R, σ_R) is a group.


17.3.23 DEFINITION: A commutative group is a group (G, σ) such that
σ(g₁, g₂) = σ(g₂, g₁) for all g₁, g₂ ∈ G.


17.3.24 REMARK: A commutative group is also known as a “module without operator domain". 

The operation of a commutative group is typically notated additively to distinguish it from a multiplication 
operation on the same set. Commutative groups are also known as “Abelian groups", named after Niels 
Henrik Abel (1802-1829). 

The conditions for a commutative group are logically equivalent to the conditions for a “module without 
operator domain" in Definition 19.1.2. In the module role, a commutative group is thought of as a “passive 
set" which can be acted on by "active sets". 


17.4. Group morphisms 


17.4.1 DEFINITION: Group morphisms. 
A group homomorphism from a group G₁ −< (G₁, σ₁) to a group G₂ −< (G₂, σ₂) is a map φ : G₁ → G₂ such
that φ(σ₁(g, h)) = σ₂(φ(g), φ(h)) for all g, h ∈ G₁. In other words, φ(gh) = φ(g)φ(h) for g, h ∈ G₁.


A group monomorphism from a group G₁ to a group G₂ is an injective group homomorphism φ : G₁ → G₂.


A group epimorphism from a group G₁ to a group G₂ is a surjective group homomorphism φ : G₁ → G₂.


A group isomorphism from a group G₁ to a group G₂ is a bijective group homomorphism φ : G₁ → G₂.


A group endomorphism of a group G is a group homomorphism φ : G → G.


A group automorphism of a group G is a group isomorphism φ : G → G.


17.4.2 NOTATION: Let G, G₁, G₂ be groups.

Hom(G₁, G₂) denotes the set of group homomorphisms from G₁ to G₂.
Mon(G₁, G₂) denotes the set of group monomorphisms from G₁ to G₂.
Epi(G₁, G₂) denotes the set of group epimorphisms from G₁ to G₂.
Iso(G₁, G₂) denotes the set of group isomorphisms from G₁ to G₂.
End(G) denotes the set of group endomorphisms of G.
Aut(G) denotes the set of group automorphisms of G.


17.4.3 REMARK: Comparison of group morphisms with semigroup morphisms. 

Definition 17.4.1 appears to be the same as the corresponding Definition 17.1.8 for semigroups. However, it 
is not necessary to explicitly require that the identity be mapped to the identity, and inverses be mapped 
to inverses. The extra conditions (ii) and (iii) in Definition 17.3.2 guarantee these extra properties of group 
morphisms, as shown in Theorem 17.4.4. 


17.4.4 THEOREM: Mapping of identity and inverses by group homomorphisms. 
Let G₁, G₂ be groups. Let φ : G₁ → G₂ be a group homomorphism.


(i) φ(e_{G₁}) = e_{G₂}.

(ii) ∀g ∈ G₁, φ(g⁻¹) = φ(g)⁻¹.


PROOF: For part (i), let g ∈ G₁. Then φ(g) = φ(g e_{G₁}) = φ(g)φ(e_{G₁}) by Definition 17.4.1. So φ(e_{G₁}) = e_{G₂}
by Theorem 17.3.14 (iii).

For part (ii), let g ∈ G₁. Then e_{G₂} = φ(e_{G₁}) = φ(g⁻¹g) = φ(g⁻¹)φ(g) by part (i). So φ(g⁻¹) = φ(g)⁻¹ by
Theorem 17.3.14 (iv).


17.4.5 REMARK: The inverse of a group isomorphism is necessarily a group homomorphism.

Definition 17.4.1 does not explicitly require the inverse φ⁻¹ of a group isomorphism φ to be a homomorphism.
But it is readily verified that φ⁻¹(g₂h₂) = φ⁻¹(φ(g₁)φ(h₁)) = φ⁻¹(φ(g₁h₁)) = g₁h₁ = φ⁻¹(g₂)φ⁻¹(h₂) for
any g₂, h₂ ∈ G₂, where g₁ = φ⁻¹(g₂) and h₁ = φ⁻¹(h₂).
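
Theorem 17.4.4 and Remark 17.4.5 can be checked on small examples. The sketch below assumes the additive groups of integers modulo 6, 3 and 4 as test groups; the maps phi, bad and iso and the helper names are illustrative choices, not part of the text.

```python
def add_mod(n):
    """Addition modulo n as a binary operation."""
    return lambda a, b: (a + b) % n

def neg_mod(n):
    """Additive inverse modulo n."""
    return lambda a: (-a) % n

def is_homomorphism(phi, G1, op1, op2):
    """Check phi(op1(g, h)) == op2(phi(g), phi(h)) for all g, h in G1."""
    return all(phi(op1(g, h)) == op2(phi(g), phi(h)) for g in G1 for h in G1)

Z6 = range(6)
op6, inv6 = add_mod(6), neg_mod(6)
op3, inv3 = add_mod(3), neg_mod(3)

phi = lambda g: g % 3    # a homomorphism from Z6 to Z3
bad = lambda g: g % 4    # not a homomorphism from Z6 to Z4

phi_is_hom = is_homomorphism(phi, Z6, op6, op3)
bad_is_hom = is_homomorphism(bad, Z6, op6, add_mod(4))

# Theorem 17.4.4: identity maps to identity, inverses map to inverses.
maps_identity = phi(0) == 0
maps_inverses = all(phi(inv6(g)) == inv3(phi(g)) for g in Z6)

# Remark 17.4.5: the inverse of an isomorphism is a homomorphism.
iso = lambda g: (5 * g) % 6                # a bijective homomorphism Z6 -> Z6
iso_inverse_table = {iso(g): g for g in Z6}
iso_inverse = lambda x: iso_inverse_table[x]
inverse_is_hom = is_homomorphism(iso_inverse, Z6, op6, op6)

print(phi_is_hom, bad_is_hom, maps_identity, maps_inverses, inverse_is_hom)
```

The properties in Theorem 17.4.4 are never tested as separate hypotheses here. They hold automatically once the homomorphism condition holds.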


17.4.6 REMARK: Unital group morphisms.

Theorem 17.4.7 defines the unique group morphism from the additive group ℤ to a group G which maps 1_ℤ
to g ∈ G. (An almost identical theorem is given by MacLane/Birkhoff [110], page 52.) These are analogous to
the unital morphisms for rings in Definition 18.2.11. The name “unital group morphism” in Definition 17.4.8
is not entirely justified because the unit of ℤ is mapped to an arbitrary element of G, not to the unique unit
element of a ring. The term “nth power of g” in Definition 18.2.11 is equally unjustified because the word
“power” suggests a multiplicative group whereas ℤ is here an additive group. However, the industry-standard
notation gⁿ does suggest the word “power”.


17.4.7 THEOREM: Inductive construction for unital group morphisms. 
Let (G, σ) be a group with identity e ∈ G. For g ∈ G, define ψ_g : ℤ → G inductively as follows.


(i) ψ_g(0_ℤ) = e.
(ii) ∀n ∈ ℤ₀⁺, ψ_g(n + 1) = σ(ψ_g(n), g).
(iii) ∀n ∈ ℤ₀⁻, ψ_g(n − 1) = σ(ψ_g(n), g⁻¹).

Then for all g ∈ G, ψ_g is the unique group homomorphism from the additive group ℤ to G with ψ_g(1) = g.


PROOF: For g ∈ G and ψ_g : ℤ → G as defined, ψ_g(n₁ + 0) = ψ_g(n₁) = ψ_g(n₁)ψ_g(0) for all n₁ ∈ ℤ by
assumption (i). Suppose (as inductive hypothesis on n₂ ∈ ℤ₀⁺) that ψ_g(n₁ + n₂) = ψ_g(n₁)ψ_g(n₂) for all
n₁ ∈ ℤ₀⁺. Then ψ_g(n₁ + (n₂ + 1)) = ψ_g((n₁ + n₂) + 1) = ψ_g(n₁ + n₂)ψ_g(1) = (ψ_g(n₁)ψ_g(n₂))ψ_g(1) =
ψ_g(n₁)(ψ_g(n₂)ψ_g(1)) = ψ_g(n₁)ψ_g(n₂ + 1) for all n₁ ∈ ℤ₀⁺ by the associativity of multiplication in G and
assumption (ii). Hence by induction on n₂ ∈ ℤ₀⁺, ψ_g(n₁ + n₂) = ψ_g(n₁)ψ_g(n₂) for all n₁, n₂ ∈ ℤ₀⁺. The
other cases follow similarly. So ψ_g : ℤ → G is a group homomorphism by Definition 17.4.1.

To show uniqueness, let ψ⁽¹⁾ : ℤ → G and ψ⁽²⁾ : ℤ → G be group homomorphisms with ψ⁽¹⁾(1) = g = ψ⁽²⁾(1).
Then ψ⁽¹⁾(0) = e = ψ⁽²⁾(0). If ψ⁽¹⁾(n) = ψ⁽²⁾(n) for some n ∈ ℤ₀⁺, then ψ⁽¹⁾(n + 1) = ψ⁽¹⁾(n)g = ψ⁽²⁾(n)g = ψ⁽²⁾(n + 1). So by
induction on n ∈ ℤ₀⁺, ψ⁽¹⁾(n) = ψ⁽²⁾(n) for all n ∈ ℤ₀⁺, and similarly for n ∈ ℤ₀⁻. Hence ψ⁽¹⁾ = ψ⁽²⁾.


17.4.8 DEFINITION: The unital group morphism generated by an element g of a group G is the map
ψ_g : ℤ → G which is defined in Theorem 17.4.7.


The nth power of g means the value ψ_g(n) of the unital group morphism generated by g.


17.4.9 NOTATION: gⁿ, for g in a group G and n ∈ ℤ, denotes the nth power of g.
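
The inductive rules of Theorem 17.4.7 translate directly into a loop. The following sketch assumes a group given by its operation, identity and inverse function, and uses permutations of {0, 1, 2} as the test group; the function name unital_morphism and the sample element g are illustrative.

```python
from itertools import product

def unital_morphism(op, e, inv, g):
    """Return psi_g : Z -> G built from rules (i), (ii), (iii) of Theorem 17.4.7."""
    def psi(n):
        value = e                        # rule (i): psi_g(0) = e
        step = g if n >= 0 else inv(g)   # rule (ii) uses g, rule (iii) uses g^-1
        for _ in range(abs(n)):
            value = op(value, step)      # multiply on the right, one step at a time
        return value
    return psi

def mul(a, b):
    """Composition a∘b of permutations stored as tuples."""
    return tuple(a[b[i]] for i in range(len(b)))

def inv(a):
    """Inverse permutation q, which satisfies q[a[i]] = i."""
    return tuple(sorted(range(len(a)), key=lambda i: a[i]))

e = (0, 1, 2)
g = (1, 2, 0)    # a 3-cycle, chosen only as an example
psi = unital_morphism(mul, e, inv, g)

# psi is a homomorphism from (Z, +): psi(m + n) = psi(m) psi(n).
is_hom = all(psi(m + n) == mul(psi(m), psi(n))
             for m, n in product(range(-6, 7), repeat=2))

print(psi(1) == g, psi(3) == e, psi(-1) == inv(g), is_hom)
```

In the notation of 17.4.9, psi(n) is gⁿ. The loop costs |n| multiplications, which mirrors the inductive definition rather than aiming for efficiency.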


17.4.10 REMARK: The reverse-operation group of a group. 
Any group is group-isomorphic to its reverse. In other words, if the group operation is reversed, then the 
group with the reversed operation is group-isomorphic to the original group. 


17.4.11 THEOREM: Group isomorphism between a group and its reverse-operation group. 
Let (G, σ) be a group.

(i) Define σ̄ : G × G → G by σ̄ : (g, h) ↦ σ(h, g). Then (G, σ̄) is a group.

(ii) Define φ : G → G by φ : g ↦ g⁻¹. Then φ is a group isomorphism from (G, σ) to (G, σ̄).


PROOF: For part (i), let f, g, h ∈ G. Then σ̄(f, σ̄(g, h)) = σ(σ(h, g), f) = σ(h, σ(g, f)) = σ̄(σ̄(f, g), h) by
Definition 17.3.2 (i). So σ̄ is associative. The identity e for (G, σ) is clearly also an identity for (G, σ̄). For
g ∈ G, let g⁻¹ be the inverse of g in (G, σ). Then σ̄(g⁻¹, g) = σ(g, g⁻¹) = e and σ̄(g, g⁻¹) = σ(g⁻¹, g) = e.
So g⁻¹ is also an inverse of g in (G, σ̄). Hence (G, σ̄) is a group.

For part (ii), let g, h ∈ G. Then φ(σ(g, h)) = σ(g, h)⁻¹ = σ̄(h, g)⁻¹ = σ̄(g⁻¹, h⁻¹) by Theorem 17.3.17
because the inverses of g and h are the same in (G, σ̄) as they are in (G, σ). So φ(σ(g, h)) = σ̄(φ(g), φ(h)).
Therefore φ is a group homomorphism by Definition 17.4.1. But φ : G → G is a bijection (and it equals its
own inverse). Hence φ is a group isomorphism.


17.4.12 DEFINITION: The reverse-operation group of a group (G, σ) is the group (G, σ̄) with σ̄ : G × G → G
defined by σ̄ : (g, h) ↦ σ(h, g).


17.4.13 REMARK: Four basic kinds of isomorphisms from a group to its translation groups. 

Any group G is isomorphic to the group of left translations of G, with essentially the same group operation. 
This is because L_{gh} = L_g ∘ L_h for all g, h ∈ G. If each g ∈ G is mapped to R_g, the isomorphism must reverse
the operation order because R_{gh} = R_h ∘ R_g for g, h ∈ G. It follows from Theorem 17.4.11 that g may be
mapped to L_{g⁻¹} if the operation order is reversed. This is due to the order reversal in Theorem 17.3.17.
Consequently, if g is mapped to R_{g⁻¹}, there is a double reversal. Thus R_{(gh)⁻¹} = R_{g⁻¹} ∘ R_{h⁻¹}. These four
cases are described in Theorem 17.4.14. 


17.4.14 THEOREM: Isomorphisms from a group to its four operation reversal and inversion groups. 
Let (G, σ) be a group.
(i) Define φ_L : G → G_L by φ_L : g ↦ L_g, where (G_L, σ_L) is as in Theorem 17.3.22 (i).
Then φ_L is a group isomorphism from (G, σ) to (G_L, σ_L).

(ii) Define φ_R : G → G_R by φ_R : g ↦ R_g, where (G_R, σ_R) is as in Theorem 17.3.22 (ii).
Then φ_R is a group isomorphism from (G, σ) to (G_R, σ_R).

(iii) Define φ̄_L : G → G_L by φ̄_L : g ↦ L_{g⁻¹}, where (G_L, σ̄_L) is the reverse-operation group of (G_L, σ_L).
Then φ̄_L is a group isomorphism from (G, σ) to (G_L, σ̄_L).

(iv) Define φ̄_R : G → G_R by φ̄_R : g ↦ R_{g⁻¹}, where (G_R, σ̄_R) is the reverse-operation group of (G_R, σ_R).
Then φ̄_R is a group isomorphism from (G, σ) to (G_R, σ̄_R).


PROOF: For part (i), let g, h ∈ G. Then φ_L(gh) = L_{gh} = L_g ∘ L_h = σ_L(L_g, L_h) = σ_L(φ_L(g), φ_L(h)). Hence
φ_L is a group homomorphism from (G, σ) to (G_L, σ_L) by Definition 17.4.1. The surjectivity of φ_L follows
from the definition of G_L. Let φ_L(g) = φ_L(h). Then L_g = L_h. So g = ge_G = L_g(e_G) = L_h(e_G) = he_G = h,
where e_G is the identity of G. So φ_L is injective. Hence φ_L is an isomorphism.

For part (ii), let g, h ∈ G. Then φ_R(gh) = R_{gh} = R_h ∘ R_g = σ_R(R_g, R_h) = σ_R(φ_R(g), φ_R(h)). Hence φ_R is
a group homomorphism from (G, σ) to (G_R, σ_R) by Definition 17.4.1. The proof that φ_R is an isomorphism
follows as for part (i).

For part (iii), define φ : G_L → G_L by φ : L_g ↦ L_{g⁻¹} for all g ∈ G. Then φ is a group isomorphism from
(G_L, σ_L) to (G_L, σ̄_L) by Theorem 17.4.11 (ii) because (L_g)⁻¹ = L_{g⁻¹} for all g ∈ G. But φ̄_L = φ ∘ φ_L. So φ̄_L is
a group isomorphism by part (i).

For part (iv), define φ : G_R → G_R by φ : R_g ↦ R_{g⁻¹} for all g ∈ G. Then φ is a group isomorphism from
(G_R, σ_R) to (G_R, σ̄_R) by Theorem 17.4.11 (ii) because (R_g)⁻¹ = R_{g⁻¹} for all g ∈ G. But φ̄_R = φ ∘ φ_R. So φ̄_R
is a group isomorphism by part (ii).
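
The four maps of Theorem 17.4.14 can be verified exhaustively for a small non-commutative group. The sketch below assumes the group of permutations of {0, 1, 2} and stores each translation as the tuple of its values on a fixed listing of G, so that translations can be compared for equality; all function names are illustrative.

```python
from itertools import permutations, product

G = list(permutations(range(3)))
index = {x: i for i, x in enumerate(G)}

def mul(a, b):
    """Group operation: composition a∘b of permutations."""
    return tuple(a[b[i]] for i in range(3))

def inv(a):
    """Inverse permutation."""
    return tuple(sorted(range(3), key=lambda i: a[i]))

def L(g):
    """Left translation x -> gx, stored as its table of values on G."""
    return tuple(mul(g, x) for x in G)

def R(g):
    """Right translation x -> xg, stored as its table of values on G."""
    return tuple(mul(x, g) for x in G)

def compose(F, H):
    """Composition F∘H of two maps G -> G stored as value tables."""
    return tuple(F[index[H[i]]] for i in range(len(G)))

def sigma_L(F, H): return compose(F, H)      # (L_g, L_h) -> L_g ∘ L_h
def sigma_R(F, H): return compose(H, F)      # (R_g, R_h) -> R_h ∘ R_g
def sigma_L_bar(F, H): return sigma_L(H, F)  # reverse-operation group of (G_L, sigma_L)
def sigma_R_bar(F, H): return sigma_R(H, F)  # reverse-operation group of (G_R, sigma_R)

def is_isomorphism_onto_image(phi, target_op):
    """Homomorphism condition plus injectivity (surjectivity holds by construction)."""
    hom = all(phi(mul(g, h)) == target_op(phi(g), phi(h))
              for g, h in product(G, repeat=2))
    injective = len({phi(g) for g in G}) == len(G)
    return hom and injective

case_i = is_isomorphism_onto_image(L, sigma_L)
case_ii = is_isomorphism_onto_image(R, sigma_R)
case_iii = is_isomorphism_onto_image(lambda g: L(inv(g)), sigma_L_bar)
case_iv = is_isomorphism_onto_image(lambda g: R(inv(g)), sigma_R_bar)

# Without the operation reversal, g -> R_g fails to be a homomorphism here.
unreversed = is_isomorphism_onto_image(R, sigma_L)

print(case_i, case_ii, case_iii, case_iv, unreversed)
```

The final check shows why the reversal in part (ii) is needed for a non-commutative group.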


17.5. Ordered groups 


17.5.1 DEFINITION: Orderings of groups. 
A left-ordering of a group (G, σ) is a total order “<” on G such that ∀a, b, c ∈ G, (a < b ⇒ ca < cb).


A right-ordering of a group (G, σ) is a total order “<” on G such that ∀a, b, c ∈ G, (a < b ⇒ ac < bc).


A bi-ordering of a group (G, σ) is a total order “<” on G which is both a left-ordering and right-ordering.


17.5.2 DEFINITION: Ordered groups.
A left-ordered group is a tuple (G, σ, <) such that “<” is a left-ordering on the group (G, σ).


A right-ordered group is a tuple (G, σ, <) such that “<” is a right-ordering on the group (G, σ).


An ordered group is a tuple (G, σ, <) such that “<” is a bi-ordering on the group (G, σ).


17.5.3 REMARK: Archimedean orderings of groups. 

An Archimedean order on a group is an order which meets the requirements of the lemma underlying the 
“method of exhaustion” which was introduced by Eudoxus of Cnidus in about the middle of the fourth 
century BC. Boyer/Merzbach [237], page 81, says the following. 


According to Archimedes, it was Eudoxus who provided the lemma that now bears Archimedes’ 
name—sometimes known as the axiom of continuity—which served as the basis for the method of 
exhaustion, the Greek equivalent of the integral calculus. The lemma or axiom states that given
two magnitudes having a ratio (that is, neither being zero), one can find a multiple of either one 
that will exceed the other. 


Definition 17.5.4 is not useful in isolation because it does not require the order to be compatible with the 
group structure. The necessary formalities are spelled out in Definitions 17.5.5 and 17.5.9. 


17.5.4 DEFINITION: An Archimedean order on a group (G, σ) is a total order “<” on G which satisfies
∀g ∈ G, ∀h ∈ G⁺, ∃n ∈ ℤ₀⁺, g < hⁿ, where G⁺ = {x ∈ G; e_G < x} and e_G is the identity of G.


17.5.5 DEFINITION: An Archimedean left-ordering of a group (G, σ) is a left-ordering “<” of (G, σ) such
that “<” is an Archimedean order on (G, σ).


An Archimedean right-ordering of a group (G, σ) is a right-ordering “<” of (G, σ) such that “<” is an
Archimedean order on (G, σ).


An Archimedean bi-ordering of a group (G, σ) is a bi-ordering “<” of (G, σ) such that “<” is an Archimedean
order on (G, σ).


17.5.6 REMARK: Application of ordered commutative groups to a minimalist metric function definition. 
The minimalist definition of a point-to-point metric function in Section 37.1 requires the values of the metric 
function to have a commutative addition operation and a total order. This is the motivation for Definitions 
17.5.7 and 17.5.8. In the case of a standard non-negative metric, only the non-negative values of the ordered 
commutative group are required. 


17.5.7 DEFINITION: An ordering of a commutative group (G, σ) is a total order “<” on G which satisfies
∀a, b, c ∈ G, (a < b ⇒ ac < bc).


17.5.8 DEFINITION: An ordered commutative group is a tuple (G, σ, R) such that (G, σ) is a commutative
group and R is an ordering of (G, σ).


17.5.9 DEFINITION: An Archimedean ordering of a commutative group (G, σ) is an ordering “<” of (G, σ)
such that “<” is an Archimedean order on (G, σ).


An Archimedean ordered commutative group is an ordered commutative group (G, σ, <) such that “<” is an
Archimedean order on (G, σ).


17.5.10 REMARK: Ordered group morphisms. 

For brevity, only ordered group homomorphisms are defined in Definition 17.5.11. The other five kinds of 
morphisms are almost always assumed to be automatically defined in terms of homomorphisms following the 
pattern of Definition 17.4.1. 


17.5.11 DEFINITION: An ordered group homomorphism from an ordered group (G₁, σ₁, <₁) to an ordered
group (G₂, σ₂, <₂) is a group homomorphism φ : G₁ → G₂ such that ∀g, h ∈ G₁, (g <₁ h ⇒ φ(g) <₂ φ(h)).


17.5.12 THEOREM: Some ordered morphism properties of unital group morphisms. 
Let G be an ordered group. For x ∈ G, let ψ_x : ℤ → G be the unital group morphism generated by x.


(i) ψ_x : ℤ → G is a group homomorphism for all x ∈ G.
(ii) If x > e_G, then ∀n ∈ ℤ⁺, ψ_x(n) > e_G and ∀n ∈ ℤ⁻, ψ_x(n) < e_G.
(iii) If x < e_G, then ∀n ∈ ℤ⁺, ψ_x(n) < e_G and ∀n ∈ ℤ⁻, ψ_x(n) > e_G.
(iv) ψ_x : ℤ → G is a group monomorphism for all x ∈ G \ {e_G}.
(v) ψ_x : ℤ → G is an ordered group monomorphism for all x ∈ G with x > e_G.


PROOF: Part (i) follows from Theorem 17.4.7.

For part (ii), let x > e_G. Then ψ_x(1) = x > e_G = ψ_x(0). Suppose (as inductive hypothesis) that ψ_x(n) > e_G
for some n ∈ ℤ⁺. Then ψ_x(n + 1) = ψ_x(n)x > ψ_x(n)e_G = ψ_x(n) because “<” is a left-ordering. So
ψ_x(n + 1) > e_G. Therefore by induction, ψ_x(n) > e_G for all n ∈ ℤ⁺. Similarly, ∀n ∈ ℤ⁻, ψ_x(n) < e_G.

Part (iii) may be proved as for part (ii).

For part (iv), let x ∈ G \ {e_G}. Then either x < e_G or x > e_G. Suppose that x > e_G. Let n₁, n₂ ∈ ℤ with
n₁ ≠ n₂. Then either n₁ < n₂ or n₁ > n₂. Suppose that n₁ < n₂. Then n₁ − n₂ < 0. So ψ_x(n₁ − n₂) < e_G
by part (ii). Therefore ψ_x(n₁) = ψ_x(n₂ + (n₁ − n₂)) = ψ_x(n₂)ψ_x(n₁ − n₂) < ψ_x(n₂)e_G = ψ_x(n₂) because
“<” is a left-ordering. So ψ_x(n₁) ≠ ψ_x(n₂), and similarly if x < e_G or n₁ > n₂. Therefore ψ_x is injective.
Hence ψ_x : ℤ → G is a group monomorphism by Definition 17.4.1.


Part (v) follows from Definition 17.5.11 and the proof of part (iv). 


17.6. Subgroups 
17.6.1 DEFINITION: A subgroup of a group (G, σ_G) is a group (H, σ_H) such that H ⊆ G and σ_H ⊆ σ_G.


17.6.2 REMARK: The operation for a subgroup is a restriction of the operation of the group.
If (H, σ_H) is a subgroup of a group (G, σ_G) according to Definition 17.6.1, then σ_H = σ_G|_{H×H}. This follows
from the fact that the domain of σ_H must be H × H.


In practice, it is usual to refer to the set H as a subgroup rather than the pair (H, σ_G|_{H×H}) because the only
possible choice for the subgroup operation can be so easily determined from the parent group’s operation. 
Definition 17.6.3 is thus an informal alternative to Definition 17.6.1. Then Theorem 17.6.4 gives some fairly 
obvious (but useful) conditions for a subset of a group to be a subgroup in the sense of Definition 17.6.3. 


17.6.3 DEFINITION: A subgroup-set of a group (G, σ_G) is a set H such that (H, σ_H) is a subgroup of
(G, σ_G), where σ_H = σ_G|_{H×H}.


A subgroup-set may be referred to simply as a “subgroup”. 


17.6.4 THEOREM: Equivalent set of conditions for a subset to be a subgroup. 
Let G < (G, og) be a group. Let H be a set. Then H is a subgroup-set of G if and only if 


(i) HCG, 

(ii) Vg, go € H, oc(g1,92) € H, 
(iii) eg € H, and 
(iv) Vg € H, gq’ € H, where gc! denotes the inverse of g in G. 
Hence (H, re MEME is a subgroup of (G, øq) if and only if (i), (ii), (iii) and (iv) hold. 
PROOF: Assume that $H$ is a subgroup-set of $G$ according to Definition 17.6.3. Then $H$ is a subset of $G$,
which verifies (i), and $(H,\sigma_H)$ is a subgroup of $(G,\sigma_G)$, where $\sigma_H = \sigma_G|_{H\times H}$. Since $(H,\sigma_H)$ is a subgroup
of $(G,\sigma_G)$, it follows from Definition 17.6.1 that $(H,\sigma_H)$ is a group.

Let $g_1,g_2 \in H$. Then $\sigma_H(g_1,g_2) \in H$ because $(H,\sigma_H)$ is a group. So $\sigma_G(g_1,g_2) \in H$ because $\sigma_H \subseteq \sigma_G$. This
verifies (ii).

$H$ has an identity element $e_H$. Then $e_H \in G$ and $\sigma_H(e_H,e_H) = e_H$. So $\sigma_G(e_H,e_H) = e_H$ because $\sigma_H \subseteq \sigma_G$.
Therefore $e_H = e_G$ by Theorem 17.3.14 (ii). This verifies (iii).

Let $g \in H$. Let $g_G^{-1} \in G$ be the inverse of $g$ in $G$, which is well defined because $G$ is a group. Since $(H,\sigma_H)$
is a group, $g$ has an inverse $g_H^{-1}$ in $H$. Therefore

\begin{align*}
g_G^{-1} &= \sigma_G(g_G^{-1}, e_G) \\
         &= \sigma_G(g_G^{-1}, e_H) \\
         &= \sigma_G(g_G^{-1}, \sigma_H(g, g_H^{-1})) \\
         &= \sigma_G(g_G^{-1}, \sigma_G(g, g_H^{-1})) \\
         &= \sigma_G(\sigma_G(g_G^{-1}, g), g_H^{-1}) \\
         &= \sigma_G(e_G, g_H^{-1}) \\
         &= g_H^{-1},
\end{align*}

which implies that $g_G^{-1} \in H$. This verifies (iv).

Now suppose that $H$ is a set which satisfies (i), (ii), (iii) and (iv). Then $H \subseteq G$. Let $\sigma_H = \sigma_G|_{H\times H}$. Then
$\sigma_H \subseteq \sigma_G$. To satisfy Definitions 17.6.3 and 17.6.1, it remains to show that $(H,\sigma_H)$ is a group. The algebraic
closure of $\sigma_H$ follows from (ii). The associativity of $\sigma_H$ follows from the associativity of $\sigma_G$. The identity
$e_G$ for $G$ in (iii) is also an identity for $H$. And the inverse $g_G^{-1} \in H$ in (iv) is an inverse for $g$ within $H$.
Therefore $(H,\sigma_H)$ is a group by Definition 17.3.2. Hence $H$ is a subgroup-set of $G$ by Definitions 17.6.3
and 17.6.1.


It then follows directly from Definition 17.6.3 that $(H,\sigma_G|_{H\times H})$ is a subgroup of $(G,\sigma_G)$ if and only if (i),
(ii), (iii) and (iv) hold.
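
The four conditions of Theorem 17.6.4 can be tested mechanically when the group is finite. The following is a minimal sketch, not part of the original text. The function name `is_subgroup_set` and the choice of the integers modulo 6 as the parent group are assumptions made purely for illustration.

```python
from itertools import product

def is_subgroup_set(H, G, op, e, inv):
    """Check conditions (i)-(iv) of Theorem 17.6.4 for finite sets H and G."""
    H, G = set(H), set(G)
    return (
        H <= G                                            # (i)   H is a subset of G
        and all(op(a, b) in H for a, b in product(H, H))  # (ii)  closure under the operation of G
        and e in H                                        # (iii) the identity of G lies in H
        and all(inv(a) in H for a in H)                   # (iv)  inverses taken in G lie in H
    )

# Hypothetical example: the additive group of integers modulo 6.
n = 6
G = set(range(n))
op = lambda a, b: (a + b) % n
inv = lambda a: (-a) % n

print(is_subgroup_set({0, 2, 4}, G, op, 0, inv))  # True
print(is_subgroup_set({0, 1, 2}, G, op, 0, inv))  # False, since 1 + 2 = 3 is not in the set
```

Note that the empty set fails condition (iii), which is why that condition cannot be dropped from the list.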


17.6.5 DEFINITION: The group generated by a subset $S$ of a group $G$ is the subgroup of $G$ whose set of
elements is $\bigcap\{G' \in \mathbb{P}(G);\ S \subseteq G' \text{ and } G' \text{ is a subgroup of } G\}$.


17.6.6 THEOREM: The group generated by a subset is a subgroup covering the subset.
The group generated by a subset $S$ of a group $G$ is a subgroup of $G$ which includes $S$.


PROOF: Let $G$ be a group. Let $S \in \mathbb{P}(G)$. Let $G_0 = \bigcap\{G' \in \mathbb{P}(G);\ S \subseteq G' \text{ and } G' \text{ is a subgroup of } G\}$.
To show closure, let $g_1,g_2 \in G_0$. Then $g_1,g_2 \in G'$ for all subgroups $G' \in \mathbb{P}(G)$ with $S \subseteq G'$. So $\sigma_{G'}(g_1,g_2) =
\sigma_G(g_1,g_2) \in G'$ for all subgroups $G'$ of $G$ with $S \subseteq G'$. Therefore $\sigma_G(g_1,g_2) \in G_0$. Associativity follows
from the fact that $\sigma_{G_0} = \sigma_G|_{G_0\times G_0}$. All subgroups of $G$ contain the identity $e_G \in G$. Therefore $e_G \in G_0$.
Then $e_G$ is also the identity for $G_0$ because $\sigma_{G_0} = \sigma_G|_{G_0\times G_0}$. So one may write $e_{G_0} = e_G$. To show the
existence of inverses, let $g \in G_0$. Then $g$ has an inverse $g^{-1} \in G$. But $g$ also has an inverse in all subgroups
$G'$ of $G$ such that $S \subseteq G'$, and these inverses are all the same element of $G$ because $\sigma_{G'} = \sigma_G|_{G'\times G'}$ for all
such $G'$. Therefore $g^{-1} \in G_0$. Hence $G_0$ is a subgroup of $G$ which includes $S$.


17.6.7 EXAMPLE: The set of all permutations of a set X is clearly a group with map composition as 
the group operation. (See Definition 14.8.2 for general permutations.) The set of all finite permutations 
of X is clearly a subgroup of the set of all permutations. (See Definition 14.8.10 for finite permutations.) 
All finite permutations are expressible as the composition of a finite sequence of two-point transpositions. 
(See Theorem 14.8.16.) Thus the subgroup of finite permutations is generated by the set of all two-point 
transpositions of any set X. 
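
For a finite set $X$, the claim that the two-point transpositions generate the permutation group can be checked by brute force. The sketch below is an illustration under stated assumptions, not the construction of Definition 17.6.5 itself. Instead of intersecting all subgroups which include the generating set, it closes the set under composition, which yields the same subgroup in a finite group because every inverse is then a positive power. The names `compose`, `generated_subgroup` and `transposition` are hypothetical helpers, and permutations are represented as tuples acting on `range(n)`.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first), with permutations as tuples on range(n)."""
    return tuple(p[q[i]] for i in range(len(p)))

def generated_subgroup(gens, identity):
    """Close {identity} | gens under composition (sufficient in a finite group)."""
    elements = {identity} | set(gens)
    frontier = set(elements)
    while frontier:
        # Multiply every newly found element with every known element on both sides.
        new = {compose(a, b) for a in frontier for b in elements}
        new |= {compose(b, a) for a in frontier for b in elements}
        frontier = new - elements
        elements |= frontier
    return elements

def transposition(n, i, j):
    """The permutation of range(n) which swaps i and j and fixes all other points."""
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)

n = 4
identity = tuple(range(n))
transpositions = [transposition(n, i, j) for i in range(n) for j in range(i + 1, n)]
generated = generated_subgroup(transpositions, identity)

print(len(generated))                            # 24
print(generated == set(permutations(range(n))))  # True
```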


17.7. Cosets, quotient groups and product groups 


17.7.1 DEFINITION: The left coset of a subgroup $H$ of a group $G$ by an element $g \in G$ is the set $\{gh;\ h \in H\}$.
The right coset of a subgroup $H$ of a group $G$ by an element $g \in G$ is the set $\{hg;\ h \in H\}$.


17.7.2 NOTATION: Let $H$ be a subgroup of a group $G$ and let $g \in G$. Then $gH$ denotes the left coset
$\{gh;\ h \in H\}$ and $Hg$ denotes the right coset $\{hg;\ h \in H\}$.


17.7.3 REMARK: Relation between notations for left/right cosets and left/right action maps.
In terms of Notations 17.3.6 and 17.7.2, $gH = L_g(H)$ and $Hg = R_g(H)$.


17.7.4 THEOREM: Sets of left and right cosets form partitions of a group. 
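The set $\{gH;\ g \in G\}$ of left cosets of a subgroup $H$ of a group $G$ is a partition of $G$.
Similarly, the set $\{Hg;\ g \in G\}$ of right cosets of $H$ is a partition of $G$.


PROOF: For left cosets, every $g \in G$ lies in $gH$ because $e_G \in H$, so the left cosets are non-empty and cover $G$.
Now suppose $g_1H \cap g_2H \ne \emptyset$. Then $g_1h_1 = g_2h_2$ for some $h_1,h_2 \in H$. For any $h \in H$,
$g_1h = g_1h_1(h_1^{-1}h) = g_2h_2h_1^{-1}h \in g_2H$ because $H$ is a subgroup of $G$. So $g_1H \subseteq g_2H$. Similarly, $g_1H \supseteq g_2H$.
So $g_1H = g_2H$. The proof for right cosets is the same.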
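
Theorem 17.7.4 can be illustrated on a small non-commutative group. The following sketch is only an illustration. It assumes the group of all permutations of a three-element set, represented as tuples, and a hypothetical two-element subgroup `H`. It checks that the left cosets and the right cosets each partition the group, and it also shows that the two partitions need not coincide.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first), with permutations as tuples on range(n)."""
    return tuple(p[q[i]] for i in range(len(p)))

def left_cosets(G, H):
    """The set of left cosets gH, each stored as a frozenset."""
    return {frozenset(compose(g, h) for h in H) for g in G}

def right_cosets(G, H):
    """The set of right cosets Hg, each stored as a frozenset."""
    return {frozenset(compose(h, g) for h in H) for g in G}

def is_partition(blocks, G):
    """True if the blocks are non-empty, pairwise disjoint, and cover G."""
    blocks = list(blocks)
    covered = set().union(*blocks)
    return all(blocks) and covered == set(G) and sum(map(len, blocks)) == len(covered)

G = set(permutations(range(3)))   # all permutations of {0, 1, 2}
H = {(0, 1, 2), (1, 0, 2)}        # the identity and the transposition of 0 and 1

left, right = left_cosets(G, H), right_cosets(G, H)
print(is_partition(left, G), is_partition(right, G))  # True True
print(len(left), len(right))                          # 3 3
print(left == right)                                  # False
```

The final line shows that this particular subgroup yields two different partitions, which is the situation addressed by the definition of a normal subgroup.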


17.7.5 REMARK: Motivation for the definitions of normal subgroups and quotient groups. 

A simple kind of left action $\mu : G \times X \to X$ may be defined on the set $X = \{gH;\ g \in G\}$ of left cosets of $H$
by $\mu : (g', gH) \mapsto (g'g)H$. This is clearly associative in the sense that $g_1(g_2H) = (g_1g_2)H$ for all $g_1,g_2 \in G$.
Therefore one may write simply $g_1g_2H$ instead of $\mu(g_1, g_2H)$ or $g_1(g_2H)$ or $(g_1g_2)H$.

An attempt to define right translations of left cosets by $\nu : X \times G \to X$ with $\nu : (gH, g') \mapsto \{ghg';\ h \in H\}$
does not necessarily yield left or right cosets. If one attempts instead to define $\nu : (gH, g') \mapsto (gg')H$, the left
cosets $(gg')H$ may depend on the choice of the representative $g$ of the coset $gH$. If $g_1H = g_2H$, it is not guaranteed that $(g_1g')H = (g_2g')H$.
The map $\nu$ would be well-defined if the left and right cosets of $H$ yielded the same partition of $G$. Such
a subgroup $H$ is called a “normal subgroup” of $G$. For such a subgroup, the maps $\mu : G \times X \to X$
and $\nu : X \times G \to X$ are well-defined, and the values $\mu(g',gH)$ and $\nu(gH,g')$ depend only on the coset
of $H$ which $g'$ belongs to. Therefore one may construct a well-defined operation $\sigma' : X \times X \to X$ with
$\sigma' : (g_1H, g_2H) \mapsto (g_1g_2)H$. It turns out that $(X,\sigma')$ is a group. This is called the quotient group of $G$
modulo $H$, and is given the notation $G/H$. This is presented more formally in Definitions 17.7.7 and 17.7.9.


17.7.6 REMARK: The analogy of cosets to parallel translation in Euclidean space. 

Cosets of subgroups have much in common with the parallel translation of lines and planes in Euclidean 
space. The sets gH, for a subgroup H of a group G, may be thought of as “parallel translates” of H within G. 
In particular, the non-intersection of distinct cosets resembles the notion in Euclidean space that parallel 
lines never meet. 


17.7.7 DEFINITION: A normal subgroup of a group $G$ is a subgroup $H$ of $G$ such that:

\[ \forall g \in G,\ \forall h \in H,\ \exists h' \in H,\quad gh = h'g. \tag{17.7.1} \]


17.7.8 REMARK: The condition for a normal subgroup is equivalent to its mirror image.

Definition 17.7.7 for a normal subgroup $H$ of $G$ is equivalent to requiring that $gH = Hg$ for all $g \in G$. To
show that the mirror image of condition (17.7.1) in Definition 17.7.7 holds, note that for all $g \in G$ and $h \in H$,
$hg = (g^{-1}h^{-1})^{-1} = (h'g^{-1})^{-1} = g(h')^{-1}$ for some $h' \in H$, and so $(h')^{-1} \in H$. Hence

\[ \forall g \in G,\ \forall h \in H,\ \exists h' \in H,\quad hg = gh'. \]


17.7.9 DEFINITION: The quotient group of $G$ modulo $H$ for any normal subgroup $H \mathrel{-\!<} (H,\sigma_H)$ of a group
$G \mathrel{-\!<} (G,\sigma_G)$ is the group $(G/H,\sigma_{G/H})$, where $G/H = \{gH;\ g \in G\}$ and $\sigma_{G/H} : (g_1H, g_2H) \mapsto g_1g_2H$ for
all $g_1,g_2 \in G$.


17.7.10 REMARK: Simple groups cannot be factored further. 

Quotient groups are also known as “factor groups”. This suggests the possibility of decomposing groups into 
factors in a manner analogous to the decomposition of integers into prime factors. A simple group is then 
analogous to a prime number. 


17.7.11 DEFINITION: A simple group is a group which has no non-trivial normal subgroups. 


17.7.12 REMARK: Quotient groups are well defined. 

To show that quotient groups in Definition 17.7.9 are well-defined, it must first be checked that the operation
$\sigma_{G/H}$ is well-defined. It must be shown that if $g_1H = g_1'H$ and $g_2H = g_2'H$, then $g_1g_2H = g_1'g_2'H$.
Suppose that $g_1H = g_1'H$ and $g_2H = g_2'H$. Then $g_1' = g_1h_1$ and $g_2' = g_2h_2$ for some $h_1,h_2 \in H$. So
$g_1'g_2'H = (g_1h_1)(g_2h_2)H = g_1h_1g_2H$. But $h_1g_2 = g_2h_1'$ for some $h_1' \in H$ because $H$ is a normal subgroup
of $G$. So $g_1'g_2'H = g_1g_2h_1'H = g_1g_2H$. The identity of $G/H$ is $e_GH = H$, and the inverse of $gH$ is $g^{-1}H$.
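
The well-definedness argument of Remark 17.7.12 can be checked by exhaustive search on a small group. The sketch below is an illustration only. It assumes the permutation group of a three-element set with two hypothetical subgroups, the rotation subgroup `A3` and a two-element subgroup `H`. It tests both the normality condition $gH = Hg$ and whether the coset $(g_1g_2)H$ depends only on the cosets $g_1H$ and $g_2H$.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first), with permutations as tuples on range(n)."""
    return tuple(p[q[i]] for i in range(len(p)))

def left_coset(g, H):
    return frozenset(compose(g, h) for h in H)

def right_coset(g, H):
    return frozenset(compose(h, g) for h in H)

def is_normal(H, G):
    """True if gH = Hg for all g in G."""
    return all(left_coset(g, H) == right_coset(g, H) for g in G)

def product_is_well_defined(H, G):
    """True if the coset (g1 g2)H depends only on the cosets g1 H and g2 H."""
    cosets = {left_coset(g, H) for g in G}
    return all(
        len({left_coset(compose(a, b), H) for a in A for b in B}) == 1
        for A in cosets for B in cosets
    )

G = set(permutations(range(3)))
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}   # the three rotations: a normal subgroup
H = {(0, 1, 2), (1, 0, 2)}               # the identity and one transposition: not normal

print(is_normal(A3, G), product_is_well_defined(A3, G))  # True True
print(is_normal(H, G), product_is_well_defined(H, G))    # False False
```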


17.7.13 REMARK: A normal subgroup combines the left and right coset groups into a single group. 
Remark 17.7.5 and Definitions 17.7.7 and 17.7.9 may be summarised as follows from the perspective of
Sections 20.1 and 20.7. For an arbitrary subgroup $H$ of a group $G$, define the maps $\mu_L : G \times X_L \to X_L$ by
$\mu_L : (g', gH) \mapsto (g'g)H$ and $\mu_R : X_R \times G \to X_R$ by $\mu_R : (Hg, g') \mapsto H(gg')$, where $X_L = \{gH;\ g \in G\}$ and
$X_R = \{Hg;\ g \in G\}$. For any subgroup $H$ of $G$, $\mu_L$ satisfies Definition 20.1.2 for a left transformation group
and $\mu_R$ satisfies Definition 20.7.2 for a right transformation group. But when $H$ is a normal subgroup of $G$,
$X_L = X_R = G/H$ becomes a group which combines the left and right actions $\mu_L$ and $\mu_R$. (Something similar
happens with fibre bundles in Chapter 21. An ordinary fibre bundle defines a left transformation operation
on the fibre space whereas a principal fibre bundle defines a two-sided group action on the fibre space.)


17.7.14 DEFINITION: The direct product of groups $(G_1,\sigma_1)$ and $(G_2,\sigma_2)$ is the group $(G,\sigma)$ where $G =
G_1 \times G_2$ and $\sigma : G \times G \to G$ satisfies $\sigma : ((g_1,g_2),(g_1',g_2')) \mapsto (g_1g_1', g_2g_2')$.

The direct product of $(G_1,\sigma_1)$ and $(G_2,\sigma_2)$ may be denoted as $(G_1,\sigma_1) \times (G_2,\sigma_2)$.

The direct product of a finite family of groups $(G_i,\sigma_i)_{i\in I}$ is the group $(G,\sigma)$ where $G = \times_{i\in I} G_i$ and
$\sigma : G \times G \to G$ satisfies $\sigma : ((g_i)_{i\in I}, (g_i')_{i\in I}) \mapsto (g_ig_i')_{i\in I}$.


17.7.15 REMARK: Identity and inverses for direct product groups.

The identity of the direct product group in Definition 17.7.14 is the element $e = (e_1,e_2)$, where $e_1$ and $e_2$
are the identities of $G_1$ and $G_2$ respectively. The inverse of $(g_1,g_2) \in G$ is $(g_1^{-1}, g_2^{-1})$. Each of the groups
$G_1$ and $G_2$ is called a “direct factor” of $G$.
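
The componentwise operation of Definition 17.7.14 and the identity and inverses of Remark 17.7.15 can be written out directly. The following is a minimal sketch under the assumption that the two factors are the additive groups of integers modulo 2 and modulo 3. The helper name `direct_product` is hypothetical.

```python
from itertools import product

def direct_product(G1, op1, G2, op2):
    """Return the element set and the componentwise operation of the direct product."""
    G = set(product(G1, G2))
    def op(a, b):
        return (op1(a[0], b[0]), op2(a[1], b[1]))
    return G, op

# Hypothetical factors: integers modulo 2 and modulo 3 under addition.
Z2, add2 = range(2), lambda a, b: (a + b) % 2
Z3, add3 = range(3), lambda a, b: (a + b) % 3

G, op = direct_product(Z2, add2, Z3, add3)
e = (0, 0)                                        # pair of the two identities
inverse = lambda g: ((-g[0]) % 2, (-g[1]) % 3)    # pair of the two inverses

print(len(G))                                     # 6
print(all(op(e, g) == g == op(g, e) for g in G))  # True
print(all(op(g, inverse(g)) == e for g in G))     # True
```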


17.8. Conjugates and conjugation maps 


17.8.1 REMARK: The arbitrary choice of left conjugate versus right conjugate. 

Most authors define conjugation in the form $gSg^{-1}$, while some others define it as $g^{-1}Sg$. Definition 17.8.2
gives the more popular left conjugate $gSg^{-1}$. Definition 17.8.3 gives the less popular right conjugate $g^{-1}Sg$.
The left conjugate is defined by MacLane/Birkhoff [110], page 70; Lang [108], page 26; Szekeres [305], page 43;
Pinter [122], pages 133, 140; Ash [50], page 124; Gilmore [82], page 72; Bump [57], page 103; Fulton/Harris [76],
page 13; S. Warner [155], page 581. The right conjugate is defined by Grove [88], page 13;
Rose [127], page 22; Stillwell [143], pages 14, 28; EDM2 [113], 190.C, page 705.

If $S$ is a normal subgroup of $G$, then $gSg^{-1} = g^{-1}Sg$ for all $g \in G$. The converse almost follows if $S$ is a
subgroup of $G$, but not quite. (See Example 17.8.5.) The same issue arises in Definition 17.8.12.


17.8.2 DEFINITION: The (left) conjugate of a subset $S$ of a group $G$ by an element $g \in G$ is the subset
$gSg^{-1} = \{gxg^{-1};\ x \in S\}$ of $G$.
The (left) conjugate of an element $h$ of a group $G$ by an element $g \in G$ is the element $ghg^{-1} \in G$.


17.8.3 DEFINITION: The right conjugate of a subset $S$ of a group $G$ by an element $g \in G$ is the subset
$g^{-1}Sg = \{g^{-1}xg;\ x \in S\}$ of $G$.
The right conjugate of an element $h$ of a group $G$ by an element $g \in G$ is the element $g^{-1}hg \in G$.


17.8.4 DEFINITION: Two conjugate subsets of a group $G$ are subsets $S_1$ and $S_2$ of $G$ such that $S_1$ is the
conjugate of $S_2$ by some element of $G$.


17.8.5 EXAMPLE: Subgroup with equal left and right conjugates but different left and right cosets. 

It is possible for a subgroup $S$ of a group $G$ to satisfy $\forall g \in G,\ gSg^{-1} = g^{-1}Sg$, but $\exists h \in G,\ hS \ne Sh$. In
other words, it is possible for a subgroup to have all left and right conjugates equal, but not all left and
right cosets equal. In particular, such a subgroup is not a normal subgroup. These two propositions are
equivalent to $\forall g \in G,\ g^2Sg^{-2} = S$ and $\exists h \in G,\ hSh^{-1} \ne S$ respectively. So an element $h$ of order 2 would
seem to be required for this example.

For $X = \mathbb{Z}^2$, define the bijections $\tau_{i,j} : X \to X$ and $\sigma : X \to X$ for $i,j \in \mathbb{Z}$ by $\tau_{i,j} : (x,y) \mapsto (x+i, y+j)$
and $\sigma : (x,y) \mapsto (y,x)$. These are translation and coordinate-swap actions respectively. Define the group $G$
as the set of actions $\{\tau_{i,j};\ i,j \in \mathbb{Z}\} \cup \{\sigma\tau_{i,j};\ i,j \in \mathbb{Z}\}$ with the operation of function composition, where
$\sigma\tau_{i,j}$ denotes the composition $\sigma \circ \tau_{i,j}$.


Let $H = \{\tau_{n,n};\ n \in \mathbb{Z}\} \cup \{\sigma\tau_{n,n};\ n \in \mathbb{Z}\}$. This is a subgroup of $G$. Let $g = \sigma\tau_{a,b}$ for any $a,b \in \mathbb{Z}$. Then
$gHg^{-1} = g^{-1}Hg$ for all $a,b \in \mathbb{Z}$, but $gH = Hg$ if and only if $a = b$.


17.8.6 REMARK: Interpretation of left conjugate as “push-forth” and right conjugate as “pull-back”. 

If $G$ is a left transformation group (defined in Section 20.1), the left conjugate $ghg^{-1}$ can be interpreted as a
“push-forth” operation because points are first pulled back from near $g$ to near $e$ by the operation $g^{-1}$, then
the operation $h$ is applied to the points, and finally the transformed points are “pushed forth” again to near
their original location. By contrast, the right conjugate $g^{-1}hg$ may be interpreted as a “pull-back” operation
because points are first pushed from near $e$ to near $g$ by the operation $g$, then the operation $h$ is applied
to the points, and finally the transformed points are “pulled back” again to near their original location. In
other words, left conjugation pushes the operation $h$ from $e$ to the point $g$, whereas right conjugation pulls
the operation $h$ from $g$ back to $e$.

Figure 17.8.1 shows the idea that a left and right conjugate of an element $h$ by an element $g$ of a group $G$
may be interpreted respectively as a push-forth of $h$ from $e$ to $g$, or a pull-back of $h$ from $g$ to $e$.

This is related to the concept of parallel transport. The element $h$ is transported between the points $e$ and
$g$ in the group. In some sense, the conjugate $ghg^{-1}$ is the result of transporting the operation $h$ from $e$ to $g$.
If the group is commutative, all conjugates of $h$ are equal to $h$, in which case one could say that there is an
“absolute parallelism”. In other words, the space is “flat”.


Figure 17.8.1   Left and right conjugates as pull-back and push-forth
[Diagram: the left conjugate $ghg^{-1}$ is the push-forth of $h$ from $e$ to $g$; the right conjugate $g^{-1}hg$ is the pull-back of $h$ from $g$ to $e$.]


17.8.7 EXAMPLE: The right conjugate as a “pull-back” and the left conjugate as a “push-forth”.
Illustrated in Figure 17.8.2 is the example $G = GL(2)$, the group of invertible linear transformations of $\mathbb{R}^2$,
with $g,h \in G$ defined by the left action of the matrices $R_\theta$ and $S_0$ respectively on column vectors in $\mathbb{R}^2$,
where for all $\theta \in \mathbb{R}$,

\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},
\qquad
S_\theta = \begin{bmatrix} -\cos 2\theta & -\sin 2\theta \\ -\sin 2\theta & \cos 2\theta \end{bmatrix}.
\]

That is, $R_\theta$ rotates points of $\mathbb{R}^2$ by angle $\theta$ in an anti-clockwise direction and $S_\theta$ reflects points through a
line normal to the vector $(\cos\theta, \sin\theta)$.


Figure 17.8.2   Left conjugate is a push-forth. Right conjugate is a pull-back.
[Diagram: $h = S_0$; the left conjugate $R_\theta S_0 R_{-\theta} = S_\theta$ is the push-forth of $h$ by $g$; the right conjugate $R_{-\theta} S_0 R_\theta = S_{-\theta}$ is the pull-back of $h$ by $g$.]


The effect of the left conjugate $ghg^{-1}$, which has matrix $R_\theta S_0 R_{-\theta}$, is to push forth the direction of reflection
of $S_0$ by the angle $\theta$ to $S_\theta$. In other words, the left conjugate by $g$ rotates $h$ forward in the direction of $g$.
The effect of the right conjugate $g^{-1}hg$, which has matrix $R_{-\theta} S_0 R_\theta$, is to pull back the direction of reflection
of $S_0$ by the angle $\theta$ to $S_{-\theta}$.
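
The two matrix identities in Example 17.8.7 can be confirmed numerically. The sketch below is an illustration only. It assumes that $R_\theta$ is the standard anti-clockwise rotation matrix and that $S_\theta$ is the reflection matrix $I - 2nn^T$ for the unit normal $n = (\cos\theta, \sin\theta)$, which has entries $-\cos 2\theta$, $-\sin 2\theta$, $-\sin 2\theta$, $\cos 2\theta$. The angle 0.7 and the helper names are arbitrary choices.

```python
import math

def R(theta):
    """Anti-clockwise rotation of the plane by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def S(theta):
    """Reflection through the line normal to (cos theta, sin theta)."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return [[-c2, -s2], [-s2, c2]]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

theta = 0.7
left = matmul(matmul(R(theta), S(0.0)), R(-theta))    # matrix of the left conjugate g h g^{-1}
right = matmul(matmul(R(-theta), S(0.0)), R(theta))   # matrix of the right conjugate g^{-1} h g

print(close(left, S(theta)))    # True
print(close(right, S(-theta)))  # True
```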


17.8.8 NOTATION: ${}^gS$, for a subset $S$ of a group $G$ and $g \in G$, denotes the conjugate of $S$ by $g$.


17.8.9 REMARK: Properties of left and right conjugates.
From Notation 17.3.6, ${}^gS = gSg^{-1} = L_g(R_{g^{-1}}(S)) = R_{g^{-1}}(L_g(S)) = (L_g \circ R_{g^{-1}})(S) = (R_{g^{-1}} \circ L_g)(S)$.


Similarly for the right conjugate, $g^{-1}Sg = L_{g^{-1}}(R_g(S)) = R_g(L_{g^{-1}}(S)) = (L_{g^{-1}} \circ R_g)(S) = (R_g \circ L_{g^{-1}})(S)$.


17.8.10 THEOREM: A left conjugate of a subgroup is a subgroup.
Let $H$ be a subgroup of a group $G$. Then for all $g \in G$, $gHg^{-1}$ is a subgroup of $G$.


PROOF: Let $g_1,g_2 \in gHg^{-1}$. Then $g_1 = gh_1g^{-1}$ and $g_2 = gh_2g^{-1}$ for some $h_1,h_2 \in H$. But then
$g_1g_2 = gh_1g^{-1}gh_2g^{-1} = gh_1h_2g^{-1}$, where $h_1h_2 \in H$ because $H$ is a group by Definition 17.6.1. So $gHg^{-1}$
is closed under multiplication. Let $g_1 \in gHg^{-1}$. Then $g_1 = gh_1g^{-1}$ for some $h_1 \in H$. Let $h_2 = h_1^{-1}$.
Then $h_2 \in H$ because $H$ is a group. Let $g_2 = gh_2g^{-1}$. Then $g_2 \in gHg^{-1}$. But $g_1g_2 = gh_1g^{-1}gh_2g^{-1} =
gh_1h_2g^{-1} = geg^{-1} = e$, and similarly $g_2g_1 = e$. So every element of $gHg^{-1}$ has an inverse in $gHg^{-1}$, and the
identity of $gHg^{-1}$ is $geg^{-1} = e_G$.
Hence $gHg^{-1}$ is a subgroup of $G$.


17.8.11 REMARK: Conjugation maps on groups. 

Conjugation maps are automorphisms which are related to adjoint maps for Lie group elements acting on
the Lie algebra of the group in Definition 62.10.2. For an adjoint map, the group element $h$ is replaced by
an “infinitesimal group element”, which means an element of the Lie algebra of the group. In effect, $ghg^{-1}$
is differentiated with respect to $h$.


17.8.12 DEFINITION: 
The (left) conjugation map of a group $G$ by an element $g \in G$ is the map $h \mapsto ghg^{-1}$ for $h \in G$.


An inner automorphism of a group $G$ is the same thing as a left conjugation map.


The right conjugation map of a group $G$ by an element $g \in G$ is the map $h \mapsto g^{-1}hg$ for $h \in G$.


17.8.13 REMARK: Properties of left and right conjugation maps. 
In terms of Notation 17.3.6, the left conjugation map by $g \in G$ is $L_g \circ R_{g^{-1}} = R_{g^{-1}} \circ L_g$. The right
conjugation map by $g \in G$ is $L_{g^{-1}} \circ R_g = R_g \circ L_{g^{-1}}$.
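
The conjugation maps of Definition 17.8.12 can be explored by brute force on a small group. The sketch below is an illustration only, assuming the permutation group of a three-element set with permutations stored as tuples. It checks that every left and right conjugation map is a bijection which preserves products, which is the automorphism property stated in Theorem 17.8.14, and that the left and right conjugation maps by the same element are mutually inverse. All helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first), with permutations as tuples on range(n)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """Return the inverse permutation of p."""
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def left_conjugation(g):
    """The map h -> g h g^{-1}."""
    g_inv = inverse(g)
    return lambda h: compose(compose(g, h), g_inv)

def right_conjugation(g):
    """The map h -> g^{-1} h g."""
    g_inv = inverse(g)
    return lambda h: compose(compose(g_inv, h), g)

G = list(permutations(range(3)))

def is_automorphism(phi):
    """True if phi preserves products and is a bijection of G."""
    multiplicative = all(phi(compose(a, b)) == compose(phi(a), phi(b)) for a in G for b in G)
    bijective = {phi(h) for h in G} == set(G)
    return multiplicative and bijective

print(all(is_automorphism(left_conjugation(g)) for g in G))   # True
print(all(is_automorphism(right_conjugation(g)) for g in G))  # True
```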


17.8.14 THEOREM: Conjugation maps are automorphisms. 
All conjugation maps of a group are automorphisms. 


PROOF: Let $\phi$ be the left conjugation map of a group $G$ by $g \in G$. Then $\phi : h \mapsto ghg^{-1}$ maps $h_1h_2$ to
$gh_1h_2g^{-1} = gh_1g^{-1}gh_2g^{-1} = \phi(h_1)\phi(h_2)$. So $\phi$ is a group endomorphism by Definition 17.4.1. To see that
$\phi : G \to G$ is a bijection, let $h \in G$, and let $h' = g^{-1}hg$. Then $\phi(h') = gh'g^{-1} = gg^{-1}hgg^{-1} = h$. So
$\phi$ is surjective. Suppose that $\phi(h_1) = \phi(h_2)$. Then $gh_1g^{-1} = gh_2g^{-1}$. So $g^{-1}(gh_1g^{-1})g = g^{-1}(gh_2g^{-1})g$.
Therefore $h_1 = h_2$. So $\phi$ is injective and is therefore a bijection. Hence $\phi$ is a group automorphism by
Definition 17.4.1, and similarly for right conjugation maps.


RINGS AND FIELDS


18.1. Rings


18.1.1 REMARK: Rings are algebraic structures with both addition and multiplication. 

A ring is a two-operation algebraic system, in contrast to semigroups and groups, which have only one 
operation. A ring is a group with respect to the first operation, and a semigroup with respect to the second 
operation. The first operation is thought of (and notated) as an addition operation. The second operation 
is thought of (and notated) as a multiplication or product operation. The multiplication operation satisfies 
a distributive condition with respect to the addition operation. 


Figure 18.1.1 illustrates some relations between various categories of rings, including some relations to fields 
and the real numbers and integers. (Not shown are the relations for Archimedean ordered rings and various 
combinations of axioms which are not presented in this book.) 


18.1.2 DEFINITION: A ring is a tuple $R \mathrel{-\!<} (R,\sigma,\tau)$ such that


(i) $(R,\sigma)$ is a commutative group (written additively), [additive group]
(ii) $(R,\tau)$ is a semigroup (written multiplicatively), [multiplicative semigroup]
(iii) $\forall a,b,c \in R,\ a(b+c) = ab + ac$ and $(a+b)c = ac + bc$. [distributivity]


18.1.3 THEOREM: Some very basic properties of rings.
Let $(R,\sigma,\tau)$ be a ring. Let $0_R$ be the identity of the additive group $(R,\sigma)$. Then


(i) $\forall x \in R,\ \tau(0_R,x) = \tau(x,0_R) = 0_R$. (I.e. $\forall x \in R,\ 0_Rx = x0_R = 0_R$.)
(ii) $\forall x,y \in R,\ \tau(-x,-y) = \tau(x,y)$. (I.e. $\forall x,y \in R,\ (-x)\cdot(-y) = x\cdot y$.)
(iii) $\forall x,y \in R,\ \tau(-x,y) = \tau(x,-y) = -\tau(x,y)$. (I.e. $\forall x,y \in R,\ (-x)\cdot y = x\cdot(-y) = -(x\cdot y)$.)


PROOF: To prove part (i), let $x \in R$. Then $0_R\cdot x = (0_R + 0_R)\cdot x$ since $0_R$ is the identity of the group $(R,\sigma)$.
But $(0_R + 0_R)\cdot x = 0_R\cdot x + 0_R\cdot x$ by Definition 18.1.2 (iii). So $0_R\cdot x = 0_R\cdot x + 0_R\cdot x$. Hence $0_R = 0_R\cdot x$
(by adding the additive inverse $-(0_R\cdot x)$ of $0_R\cdot x$ to both sides of the equation and using the associativity
of addition). It follows similarly that $0_R = x\cdot 0_R$.

To prove part (ii), let $x,y \in R$. Then $0_R = x\cdot 0_R = x\cdot(y + (-y)) = x\cdot y + x\cdot(-y)$ by Definition 18.1.2 (iii).
Similarly, $0_R = 0_R\cdot(-y) = ((-x) + x)\cdot(-y) = (-x)\cdot(-y) + x\cdot(-y)$. Therefore $x\cdot y = (-x)\cdot(-y)$.

For part (iii), let $x,y \in R$. Then $0_R = x\cdot 0_R = x\cdot(y + (-y)) = x\cdot y + x\cdot(-y)$ by Definition 18.1.2 (iii).
Therefore $x\cdot(-y) = -(x\cdot y)$. Similarly $0_R = 0_R\cdot y = (x + (-x))\cdot y = x\cdot y + (-x)\cdot y$ by Definition 18.1.2 (iii).
Therefore $(-x)\cdot y = -(x\cdot y)$.


Figure 18.1.1   Family tree for rings


18.1.4 REMARK: The zero ring. 

Every ring $R \mathrel{-\!<} (R,\sigma,\tau)$ must contain an additive identity element. Therefore the set $R$ is always non-empty.
In fact, the single-element group $(R,\sigma)$ with $R = \{0_R\}$ and $\sigma = \{((0_R,0_R),0_R)\}$ becomes a ring if a trivial
multiplication operation $\tau$ is added to the specification tuple. Thus the triple $(R,\sigma,\tau)$ with $R = \{0_R\}$,
$\sigma = \{((0_R,0_R),0_R)\}$ and $\tau = \{((0_R,0_R),0_R)\}$ is a ring. In other words, $0_R + 0_R = 0_R$ and $0_R\cdot 0_R = 0_R$. It
is easily verified that this satisfies Definition 18.1.2 (iii).


18.1.5 DEFINITION: A zero ring or trivial ring is a ring $(R,\sigma,\tau)$ such that $R$ has only one element.
Thus $R = \{0_R\}$, $\sigma = \{((0_R,0_R),0_R)\}$ and $\tau = \{((0_R,0_R),0_R)\}$.


A non-zero ring or non-trivial ring is a ring which is not a zero ring. 


18.1.6 REMARK: Some basic constructions for rings. 

Analysis of the structure of rings uses concepts such as subrings, ideals, and subrings generated by subsets.
Construction methods for rings include free rings (generated by abstract elements), quotient rings,
polynomials over rings, and matrices over rings.


18.1.7 DEFINITION: A subring of a ring $R \mathrel{-\!<} (R,\sigma,\tau)$ is a ring $S \mathrel{-\!<} (S,\sigma_S,\tau_S)$ such that $S \subseteq R$,
$\sigma_S = \sigma|_{S\times S}$ and $\tau_S = \tau|_{S\times S}$.


18.1.8 DEFINITION: The subring generated by a subset $S$ of a ring $R$ is the subring of $R$ whose set of
elements is $\bigcap\{A \in \mathbb{P}(R);\ S \subseteq A \text{ and } A \text{ is a subring of } R\}$.


18.1.9 DEFINITION: A left ideal of a ring $R$ is a subring $A$ of $R$ such that $\forall r \in R,\ \forall a \in A,\ ra \in A$.
A right ideal of a ring $R$ is a subring $A$ of $R$ such that $\forall r \in R,\ \forall a \in A,\ ar \in A$.
A (two-sided) ideal of a ring $R$ is a subring $A$ of $R$ such that $\forall r \in R,\ \forall a \in A,\ (ra \in A \wedge ar \in A)$.


18.1.10 REMARK: History of ideals. 

The origin of the perplexing name “ideal” for the invariant subrings in Definition 18.1.9 lies in the work by
Ernst Eduard Kummer in the 19th century on certain problems in number theory, where he introduced the
term “ideal complex number”, nowadays known simply as an “ideal number”. Dedekind generalised this
ideal number concept to ideals of general rings. (See Boyer/Merzbach [237], pages 521-522, 594; Cajori [241],
pages 442-445; Bell [233], pages 473-474, 513-514; Bell [234], pages 218, 223-224.)


18.1.11 REMARK: Ideals and subrings.

Some texts define an ideal of a ring to be a subset rather than a subring, apparently because they
define all rings to be unitary rings. (See Section 18.2 for unitary rings. Ideals are defined as subsets rather
than subrings by MacLane/Birkhoff [110], page 95; Lang [108], page 83.) An ideal of a unitary ring is not
necessarily a unitary ring. In fact, the only ideal of a unitary ring which contains the multiplicative identity
of that ring is the whole ring, which would be of no interest at all.


18.1.12 EXAMPLE: Examples of ideals. 
The set of even integers with the usual addition and multiplication operations is a ring which is not a unitary 
ring. The ring of even numbers is an ideal of the ring of integers. 


18.1.13 REMARK: Some basic classifications of rings. 
The numerous criteria used in the classification of rings include requirements such as commutativity and the
non-existence of zero divisors.


The cancellation properties in Definition 18.1.15 lines (18.1.1) and (18.1.2) are essentially the same as for
cancellative semigroups in Definition 17.1.14. (A tiny difference is that Definition 17.1.14 requires the
elements $h$, $g_1$ and $g_2$ to be in the same semigroup.) These cancellation properties are also equivalent to the
non-existence of zero divisors.


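18.1.14 DEFINITION: A zero divisor in a ring $R$ is an element $a \in R \setminus \{0_R\}$ such that

\[ \exists b \in R \setminus \{0_R\},\quad ab = 0_R \text{ or } ba = 0_R. \]


18.1.15 DEFINITION: A cancellative ring or cancellation ring is a ring $R$ which has both the left and right
cancellation properties. In other words,

\[ \forall h \in R \setminus \{0_R\},\ \forall g_1,g_2 \in R,\quad hg_1 = hg_2 \Rightarrow g_1 = g_2 \tag{18.1.1} \]
\[ \forall h \in R \setminus \{0_R\},\ \forall g_1,g_2 \in R,\quad g_1h = g_2h \Rightarrow g_1 = g_2. \tag{18.1.2} \]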


18.1.16 THEOREM: Cancellative rings are the same as rings without zero divisors. 
A ring is a cancellative ring if and only if it has no zero divisors. 


PROOF: Let $R$ be a cancellative ring. Suppose that $a \in R$ is a zero divisor. Then $a \in R \setminus \{0_R\}$ and either
$ab = 0_R$ or $ba = 0_R$ for some $b \in R \setminus \{0_R\}$. If $ba = 0_R$, then $ba = b0_R$ although $a \ne 0_R$, which contradicts
line (18.1.1). Similarly, if $ab = 0_R$, then $ab = 0_Rb$ although $a \ne 0_R$, which contradicts line (18.1.2).
So no zero divisors can exist.

Conversely, suppose that $R$ contains no zero divisors. Let $h \in R \setminus \{0_R\}$ and $g_1,g_2 \in R$. Suppose that
$hg_1 = hg_2$ with $g_1 \ne g_2$. Then $h(g_1 - g_2) = 0_R$, but $h \ne 0_R$ and $g_1 - g_2 \ne 0_R$, which implies that $h$ is a
zero divisor. Similarly, if $g_1h = g_2h$ and $g_1 \ne g_2$, then $h$ is a zero divisor. Therefore both lines (18.1.1) and
(18.1.2) are valid, and so $R$ is a cancellative ring.
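
Theorem 18.1.16 can be observed concretely in the rings of integers modulo $n$. The sketch below is an illustration only, with hypothetical helper names. It lists the zero divisors according to Definition 18.1.14 and tests the cancellation property (18.1.1) by exhaustive search. Since these rings are commutative, property (18.1.2) is equivalent to (18.1.1).

```python
def zero_divisors(n):
    """Zero divisors of the ring of integers modulo n (Definition 18.1.14)."""
    nonzero = range(1, n)
    return {a for a in nonzero if any((a * b) % n == 0 for b in nonzero)}

def is_cancellative(n):
    """Left cancellation (18.1.1) for the integers modulo n, by exhaustive search."""
    return all(
        g1 == g2
        for h in range(1, n) for g1 in range(n) for g2 in range(n)
        if (h * g1) % n == (h * g2) % n
    )

print(sorted(zero_divisors(6)), is_cancellative(6))  # [2, 3, 4] False
print(sorted(zero_divisors(7)), is_cancellative(7))  # [] True
```

For every modulus tried, the ring is cancellative exactly when the set of zero divisors is empty, as the theorem asserts.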


18.1.17 DEFINITION: A commutative ring is a ring $R$ such that $\forall a,b \in R,\ ab = ba$.


18.1.18 REMARK: Ring morphisms. 

Definitions 18.1.19 and 18.1.20 are based on Definition 17.4.1. For every category, there are such definitions 
for morphisms. (Category theory, which has bulk handling facilities for morphisms, is outside the scope 
of this book. See Remark 1.6.3 (3). So morphisms must be presented here individually for each category.) 
In category theory, some authors define morphisms in a different way, which gives definitions which are 
not always equivalent to the definitions given here, for example in the case of ring epimorphisms. (See 
Remark 10.5.23 for comments on the terminology diversity in the literature.) 


18.1.19 DEFINITION: A ring homomorphism from a ring $R_1 \mathrel{-\!<} (R_1,\sigma_1,\tau_1)$ to a ring $R_2 \mathrel{-\!<} (R_2,\sigma_2,\tau_2)$ is a
map $\phi : R_1 \to R_2$ such that


(i) $\forall x,y \in R_1,\ \phi(\sigma_1(x,y)) = \sigma_2(\phi(x),\phi(y))$,
(ii) $\forall x,y \in R_1,\ \phi(\tau_1(x,y)) = \tau_2(\phi(x),\phi(y))$.


18.1.20 DEFINITION: Ring morphisms.
A ring monomorphism from a ring $R_1$ to a ring $R_2$ is an injective ring homomorphism $\phi : R_1 \to R_2$.


A ring epimorphism from a ring $R_1$ to a ring $R_2$ is a surjective ring homomorphism $\phi : R_1 \to R_2$.


A ring isomorphism from a ring $R_1$ to a ring $R_2$ is a bijective ring homomorphism $\phi : R_1 \to R_2$.


A ring endomorphism of a ring $R$ is a ring homomorphism $\phi : R \to R$.


A ring automorphism of a ring $R$ is a ring isomorphism $\phi : R \to R$.


18.1.21 REMARK: Unital morphisms for non-unitary rings are not well defined. 

The unital group morphisms in Definition 17.4.8, applied to the additive group of a ring $R$, are not ring
homomorphisms between $\mathbb{Z}$ and $R$ in general. Let $\psi_x : \mathbb{Z} \to R$ be the unital group morphism generated
by $x \in R$ for the additive group of $R$. Let $n_1 \in \mathbb{Z}$. Then $\psi_x(n_10_{\mathbb{Z}}) = \psi_x(0_{\mathbb{Z}}) = 0_R$ by Theorem 17.4.7 (i).
Suppose that $n_2 \in \mathbb{Z}_0^+$ satisfies $\psi_x(n_1n_2) = \psi_x(n_1)\psi_x(n_2)$. Then $\psi_x(n_1(n_2 + 1)) = \psi_x(n_1n_2 + n_1) =
\psi_x(n_1n_2) + \psi_x(n_1) = \psi_x(n_1)\psi_x(n_2) + \psi_x(n_1)$. However, $\psi_x(n_1)\psi_x(n_2 + 1) = \psi_x(n_1)(\psi_x(n_2) + \psi_x(1)) =
\psi_x(n_1)\psi_x(n_2) + \psi_x(n_1)\psi_x(1)$. Therefore $\psi_x(n_1(n_2 + 1)) = \psi_x(n_1)\psi_x(n_2 + 1)$ only if $\psi_x(n_1)\psi_x(1) = \psi_x(n_1)$.
This would be true for all $n_1 \in \mathbb{Z}_0^+$ if $\psi_x(n_1) = 0_R$ for all $n_1 \in \mathbb{Z}_0^+$ or if $\psi_x(1)$ is a multiplicative identity
for $R$, which is not available in a non-unitary ring. (See Theorem 18.2.10 for a more positive outcome.)


Notation 18.1.22 is related to unital morphisms for rings. If $1_R$ is the multiplicative identity element of a
unitary ring $R$, then the map $\psi : \mathbb{Z} \to R$ defined by $\psi : n \mapsto n1_R$ will be the unital morphism according to
Definition 18.2.11. If $R$ contains integers, Notation 18.1.22 will be ambiguous.


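18.1.22 NOTATION: $nr$, for $n \in \mathbb{Z}$ and $r \in R$, for a ring $R$, denotes the element of $R$ which is defined
inductively by $0_{\mathbb{Z}}r = 0_R$ and

\[ \forall n \in \mathbb{Z}^+,\quad nr = (n - 1_{\mathbb{Z}})r + r \]
and
\[ \forall n \in \mathbb{Z}^-,\quad nr = (n + 1_{\mathbb{Z}})r - r. \]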
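
The inductive rule of Notation 18.1.22 uses only the additive structure of the ring. The following sketch is an illustration only. It unrolls the recursion into a loop, and it assumes a ring supplied as a zero element together with addition and negation functions. The function name `int_multiple` and the example ring of integers modulo 12 are hypothetical choices.

```python
def int_multiple(n, r, zero, add, neg):
    """Compute the integer multiple n r from the additive structure of a ring.

    For n > 0 this adds r to the running total n times, and for n < 0 it adds -r
    a total of |n| times, which unrolls the two inductive rules of the notation.
    """
    result = zero
    step = r if n > 0 else neg(r)
    for _ in range(abs(n)):
        result = add(result, step)
    return result

# Hypothetical example ring: the integers modulo 12.
add = lambda a, b: (a + b) % 12
neg = lambda a: (-a) % 12

print(int_multiple(5, 7, 0, add, neg))   # 11
print(int_multiple(-3, 7, 0, add, neg))  # 3
print(int_multiple(0, 7, 0, add, neg))   # 0
```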


18.2. Unitary rings 


18.2.1 REMARK: Variant definitions of rings and unitary rings. 

Some texts do not require rings to have a multiplicative identity. Then they define unitary rings to be those 
rings which do have a multiplicative identity. (See for example Pinter [122], page 172; Hartley/Hawkes [90],
pages 3, 12; EDM2 [113], 368.A, page 1369; Allendoerfer/Oakley [48], page 132.) This agrees with the 
approach taken here. Some other texts do require general rings to have a multiplicative identity. (See for 
example Lang [108], page 83; MacLane/Birkhoff [110], page 85; Ash [50], page 29.) Needless to say, this can 
cause confusion when comparing multiple sources. 


18.2.2 DEFINITION: A unitary ring or ring with unity is a ring $R \mathrel{-\!<} (R,\sigma,\tau)$ such that


(i) $\exists e \in R,\ \forall a \in R,\ (ea = a \text{ and } ae = a)$.


The element $e$ is called the multiplicative identity (element), unit element, unity or unity element of $R$.


18.2.3 REMARK: Uniqueness of the unit in a ring.
The unity in Definition 18.2.2 is unique. If $e$ and $e'$ both satisfy condition (i), then $e = ee' = e'$.


18.2.4 NOTATION: $1_R$, for a ring $R$, denotes the multiplicative identity of $R$.


The multiplicative identity $1_R$ may be denoted as 1 when the containing ring is implied by the context.


18.2.5 REMARK: The inconvenient fact that a zero ring is a unitary ring. 

Unfortunately, a zero ring is a unitary ring. (See Definition 18.1.5.) It is easily verified that $0_R$ is a
multiplicative identity for a zero ring. Therefore the zero ring case must be explicitly excluded whenever the
inequality $0_R \ne 1_R$ is required, which is almost always.


18.2.6 THEOREM: Some basic properties of unitary rings.
Let $(R,\sigma,\tau)$ be a non-trivial unitary ring.
(i) $0_R \ne 1_R$.
(ii) $\#(R) \ge 2$.
(iii) $(-1_R)\cdot(-1_R) = 1_R$.
(iv) $\forall x \in R,\ x\cdot(-1_R) = (-1_R)\cdot x = -x$.
(v) $\forall x \in R,\ (-x)\cdot(-x) = x\cdot x$.

PROOF: For part (i), note that since $R$ is a unitary ring, it contains a unique additive identity $0_R$ and a
unique multiplicative identity $1_R$. Suppose that $0_R = 1_R$. Since $R$ is assumed to not be a zero ring, there is
an element $x$ of $R$ such that $x \ne 0_R$. Then $1_R\cdot x = x\cdot 1_R = x$ by Definition 18.2.2 (i), and $0_R\cdot x = x\cdot 0_R = 0_R$
by Theorem 18.1.3 (i). Hence $x = 0_R$, which is a contradiction. Therefore $0_R \ne 1_R$.

Part (ii) follows trivially from part (i).

To show part (iii), note that $(1_R + (-1_R))(1_R + (-1_R)) = 0_R$ by definition of $-1_R$ and $0_R$. Applying
distributivity three times gives $1_R1_R + 1_R(-1_R) + (-1_R)1_R + (-1_R)(-1_R) = 0_R$. So $1_R + (-1_R) +
(-1_R) + (-1_R)(-1_R) = 0_R$ by definition of $1_R$. Then $(-1_R) + (-1_R)(-1_R) = 0_R$ by definition of $-1_R$.
Hence $(-1_R)(-1_R) = 1_R$.

To show part (iv), let $x \in R$. Then $0_R = 0_R\cdot x = (1_R + (-1_R))\cdot x = 1_R\cdot x + (-1_R)\cdot x = x + (-1_R)\cdot x$.
So $-x = (-1_R)\cdot x$. Similarly, $0_R = x\cdot 0_R = x\cdot(1_R + (-1_R)) = x\cdot 1_R + x\cdot(-1_R) = x + x\cdot(-1_R)$. So
$-x = x\cdot(-1_R)$. Therefore $x\cdot(-1_R) = (-1_R)\cdot x = -x$ as claimed.

To show part (v), let $x \in R$. Then $(-x)\cdot(-x) = (x\cdot(-1_R))\cdot((-1_R)\cdot x)$ by a double application of part (iv).
So $(-x)\cdot(-x) = x\cdot((-1_R)\cdot(-1_R))\cdot x$ by the associativity of multiplication. So $(-x)\cdot(-x) = x\cdot(1_R)\cdot x$
by part (iii). Hence $(-x)\cdot(-x) = x\cdot x$ as claimed.


18.2.7 REMARK: Unitary ring morphisms. 
Definition 18.2.8 is a straightforward adaptation of Definition 18.1.19 to unitary rings. The corresponding 
adaptation of Definition 18.1.20 to unitary ring morphisms is too obvious to state. 


18.2.8 DEFINITION: A unitary ring homomorphism from a unitary ring $R_1 \mathrel{-\!<} (R_1,\sigma_1,\tau_1)$ to a unitary ring
$R_2 \mathrel{-\!<} (R_2,\sigma_2,\tau_2)$ is a ring homomorphism $\phi : R_1 \to R_2$ which satisfies


(iii) $\phi(1_{R_1}) = 1_{R_2}$.


18.2.9 REMARK: An immersion of the integers inside any unitary ring. 
The unique unitary ring homomorphism from the unitary ring of integers to a unitary ring R is called the 
“unital morphism” for R. (See Theorem 17.4.7 and Definition 17.4.8 for the corresponding non-unique unital 
group morphisms.) The unital ring morphism is not necessarily injective. 


18.2.10 THEOREM: Uniqueness of the unital ring morphism for a unitary ring. 
Let $(R,\sigma,\tau)$ be a unitary ring with additive and multiplicative identities $0_R, 1_R \in R$. Define the function
$\psi : \mathbb{Z} \to R$ inductively as follows.


(i) $\psi(0_{\mathbb{Z}}) = 0_R$.
(ii) $\forall n \in \mathbb{Z}_0^+,\ \psi(n + 1) = \sigma(\psi(n), 1_R)$.
(iii) $\forall n \in \mathbb{Z}_0^-,\ \psi(n - 1) = \sigma(\psi(n), -1_R)$.

Then $\psi$ is the unique unitary ring homomorphism from the unitary ring $\mathbb{Z}$ to $R$.


PROOF: Let $(R,\sigma,\tau)$ be a unitary ring. Then $(R,\sigma)$ is a group by Definition 18.1.2 (i). So it follows from
Theorem 17.4.7 that there is a unique group homomorphism $\psi : \mathbb{Z} \to R$ from the additive group of $\mathbb{Z}$ to
$(R,\sigma)$ which satisfies $\psi(1) = 1_R$. Then $\psi$ is the unique map from $\mathbb{Z}$ to $R$ which satisfies Definition 18.1.19 (i)
and Definition 18.2.8 (iii).
Let $n_1 \in \mathbb{Z}$. Then $\psi(n_10_{\mathbb{Z}}) = \psi(0_{\mathbb{Z}}) = 0_R$. Now suppose (as inductive hypothesis) that $n_2 \in \mathbb{Z}_0^+$ satisfies
$\psi(n_1n_2) = \psi(n_1)\psi(n_2)$. Then $\psi(n_1(n_2 + 1)) = \psi(n_1n_2 + n_1) = \psi(n_1n_2) + \psi(n_1) = \psi(n_1)\psi(n_2) + \psi(n_1)1_R =
\psi(n_1)(\psi(n_2) + 1_R) = \psi(n_1)\psi(n_2 + 1)$ by the ring properties of $\mathbb{Z}$ and $R$ and the definition of $\psi$. So by
induction, $\psi(n_1n_2) = \psi(n_1)\psi(n_2)$ for all $n_2 \in \mathbb{Z}_0^+$, and similarly for $n_2 \in \mathbb{Z}_0^-$. Thus Definition 18.1.19 (ii) is
satisfied. Hence $\psi : \mathbb{Z} \to R$ is a unitary ring homomorphism, and it is unique by Theorem 17.4.7.


18.2.11 DEFINITION: The unital (ring) morphism of a unitary ring $R$ is the unique map $\psi : \mathbb{Z} \to R$ which
is defined inductively by
(i) $\psi(0_{\mathbb{Z}}) = 0_R$,
(ii) $\forall n \in \mathbb{Z}_0^+,\ \psi(n + 1) = \psi(n) + 1_R$,
(iii) $\forall n \in \mathbb{Z}_0^-,\ \psi(n - 1) = \psi(n) - 1_R$.


18.2.12 REMARK: The characteristic of a non-trivial unitary ring.


Definition 18.2.13 is a generalisation of the characteristic of a field to non-trivial unitary rings. (For the 
characteristic of a field, see Definition 18.7.10.) 


18.2.13 DEFINITION: The characteristic of a non-trivial unitary ring R is the least positive integer n ∈ Z^+
such that ψ(n) = 0_R, if such an n exists, or 0 otherwise, where ψ is the unital morphism for R.
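
The inductive rules for the unital morphism and the definition of the characteristic can be tried out numerically. The following is a minimal sketch, not part of the formal development. It assumes that ring elements are represented by Python values with a caller-supplied addition and negation, and that the search for the characteristic is cut off at an arbitrary bound. The names unital_morphism, characteristic, mod_add, mod_neg and search_limit are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def unital_morphism(n: int, zero: T, one: T,
                    add: Callable[[T, T], T],
                    neg: Callable[[T], T]) -> T:
    """Compute psi(n) from psi(0) = 0_R and psi(n +/- 1) = psi(n) +/- 1_R."""
    value = zero
    step = one if n >= 0 else neg(one)  # add 1_R going up, -1_R going down
    for _ in range(abs(n)):
        value = add(value, step)
    return value


def characteristic(zero: T, one: T, add: Callable[[T, T], T],
                   search_limit: int = 10_000) -> int:
    """Least n >= 1 with psi(n) == 0_R, or 0 if none is found.

    Assumption: the search stops at search_limit, so a result of 0 only
    means "no such n up to the bound", not a proof that none exists.
    """
    value = zero
    for n in range(1, search_limit + 1):
        value = add(value, one)
        if value == zero:
            return n
    return 0


def mod_add(m: int) -> Callable[[int, int], int]:
    """Addition in Z/mZ, with elements represented as 0, ..., m - 1."""
    return lambda a, b: (a + b) % m


def mod_neg(m: int) -> Callable[[int], int]:
    """Additive inverse in Z/mZ."""
    return lambda a: (-a) % m


if __name__ == "__main__":
    print(characteristic(0, 1, mod_add(6)))                     # 6
    print(unital_morphism(-4, 0, 1, mod_add(6), mod_neg(6)))    # 2
```

For the ring Z/6Z the unital morphism is reduction modulo 6, which is not injective, in line with Remark 18.2.9.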


18.2.14 DEFINITION: A commutative unitary ring is a unitary ring which is also a commutative ring. 


18.2.15 DEFINITION: An integral domain is a non-trivial cancellative commutative unitary ring. 


18.2.16 EXAMPLE: The integers and Gaußian integers are integral domains.

The ring of Gaußian integers is a commutative unitary ring with unit (1,0). This is a subring of the ring of
complex numbers. Since the Gaußian integers satisfy the cancellation properties in Definition 18.1.15, they
also constitute an integral domain. (The Gaußian integers contain no zero divisors because the complex
numbers contain no zero divisors. So the ring is a cancellative ring by Theorem 18.1.16.)


Another well-known integral domain is the ring of integers. (See Section 14.4.) This is also easily proved to 
have the cancellative property since it contains no zero divisors because it is a subring of the real numbers, 
which also has no zero divisors (because it is a field). However, neither the integers nor the Gaußian integers 
are division rings. 


18.2.17 DEFINITION: The ring of Gaußian integers is the ring R which is defined by
(i) R = Z^2,

(ii) ∀x,y ∈ R, x + y = (x_1 + y_1, x_2 + y_2),

(iii) ∀x,y ∈ R, xy = (x_1 y_1 − x_2 y_2, x_1 y_2 + x_2 y_1).
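
The operations in Definition 18.2.17 are easy to experiment with. The sketch below is an illustration only. It assumes that Gaußian integers are represented as Python pairs of ints, and the function names (g_add, g_mul, sample, has_zero_divisor_in) are hypothetical. A finite search cannot prove the absence of zero divisors, but it shows what the property asserts.

```python
from itertools import product

ZERO = (0, 0)
ONE = (1, 0)


def g_add(x, y):
    """Componentwise addition, as in Definition 18.2.17 (ii)."""
    return (x[0] + y[0], x[1] + y[1])


def g_mul(x, y):
    """Multiplication, as in Definition 18.2.17 (iii)."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def sample(radius):
    """All Gaussian integers (a, b) with |a| <= radius and |b| <= radius."""
    return list(product(range(-radius, radius + 1), repeat=2))


def has_zero_divisor_in(elements):
    """True if two non-zero elements of the given finite sample multiply to zero."""
    nonzero = [x for x in elements if x != ZERO]
    return any(g_mul(x, y) == ZERO for x in nonzero for y in nonzero)


if __name__ == "__main__":
    print(g_mul((0, 1), (0, 1)))             # (-1, 0)
    print(has_zero_divisor_in(sample(3)))    # False
```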


18.2.18 REMARK: The complex number system is an integral domain. 
The complex number system C −< (C,σ_C,τ_C) in Definition 16.8.1 is a field according to Definition 18.7.3,
but in Theorem 18.2.19 it is shown only that it is an integral domain.


18.2.19 THEOREM: The complex numbers form an integral domain.
The complex number system C −< (C,σ_C,τ_C) is an integral domain.


PROOF: The pair (C,σ_C) is a group in which the identity is 0_C = (0,0) and the additive inverse map is
(x,y) ↦ (−x,−y). The addition operation σ_C is commutative because addition in the real numbers is
commutative. Thus Definition 18.1.2 (i) is satisfied.


The multiplication operation τ_C : C × C → C is associative because by Definition 16.8.1 (ii),


∀(x_1,y_1), (x_2,y_2), (x_3,y_3) ∈ C,
τ_C((x_1,y_1), τ_C((x_2,y_2), (x_3,y_3)))
  = τ_C((x_1,y_1), (x_2 x_3 − y_2 y_3, x_2 y_3 + y_2 x_3))
  = (x_1(x_2 x_3 − y_2 y_3) − y_1(x_2 y_3 + x_3 y_2), x_1(x_2 y_3 + x_3 y_2) + y_1(x_2 x_3 − y_2 y_3))
  = ((x_1 x_2 − y_1 y_2)x_3 − (x_1 y_2 + y_1 x_2)y_3, (x_1 y_2 + y_1 x_2)x_3 + (x_1 x_2 − y_1 y_2)y_3)
  = τ_C((x_1 x_2 − y_1 y_2, x_1 y_2 + y_1 x_2), (x_3,y_3))
  = τ_C(τ_C((x_1,y_1), (x_2,y_2)), (x_3,y_3)).


Thus (C,τ_C) is a semigroup. So Definition 18.1.2 (ii) is satisfied. The distributivity for Definition 18.1.2 (iii)
is similarly easy (and tedious) to show. So C is a ring by Definition 18.1.2. Then (1,0) is a multiplicative
identity for C. So C is a unitary ring by Definition 18.2.2. The commutativity of τ_C is evident from
Definition 16.8.1 (ii). So C is a commutative unitary ring by Definition 18.2.14. And finally C is non-trivial
by Definition 18.1.5 because it contains more than one element, and C is a cancellative ring because every
non-zero element (x,y) has a multiplicative inverse (x′,y′) with x′ = x/(x^2 + y^2) and y′ = −y/(x^2 + y^2).
Hence C is an integral domain by Definition 18.2.15.


18.3. Ordered rings 


18.3.1 REMARK: An ordering of a ring must be compatible with both addition and multiplication. 
Definition 18.3.2 means that an ordering of a ring is a total order which is compatible with addition and 
multiplication. (See Definition 11.5.1 for total order.) MacLane/Birkhoff [110], page 261, restricts ordered 
rings to unitary rings with 0 ≠ 1.


18.3.2 DEFINITION: An ordering of a ring (R,σ,τ) is a total order O on R such that


(i) ∀x,y,a ∈ R, (x < y ⇒ a + x < a + y),
(ii) ∀x,y,c ∈ R, ((x < y ∧ c > 0) ⇒ xc < yc),


where O is denoted as “<” and O^{−1} is denoted as “>”.


18.3.3 DEFINITION: An ordered ring is a tuple (R,σ,τ,O) where (R,σ,τ) is a ring and O is an ordering
of the ring (R,σ,τ).


18.3.4 REMARK: The choice of direction for the notation for the order relation of a ring. 

It is convenient to denote the order O in Definition 18.3.2 as “<” and denote O^{−1} as “>”, together with the
usual notations “≤” and “≥” for the corresponding weak order and its inverse respectively. In principle, one
could denote O as “>”, but if R is a non-zero unitary ring, then 0_R O 1_R. Since one is accustomed to think
of 0_R as being less than 1_R in a ring, it is best to denote O as “<”.


18.3.5 REMARK: Equivalent conditions for the order condition for an ordered ring. 

Condition (ii) in Definition 18.3.2 is expressed in terms of multiplication on the right by a fixed element c > 0. 
This is equivalent to the same condition with multiplication on the left instead. (This is obvious if the ring 
is commutative, but not so obvious if it is noncommutative.) It is also equivalent to the more symmetric 
condition: ∀x,y ∈ R, ((x > 0 ∧ y > 0) ⇒ xy > 0). These equivalences are shown in Theorem 18.3.6.


18.3.6 THEOREM: Equivalent conditions for an additive total order on a ring to be a ring ordering.
Let R be a ring with a total order “<” such that ∀x,y,a ∈ R, (x < y ⇒ a + x < a + y). Then the following
conditions are equivalent.


(i) ∀x,y,c ∈ R, ((x < y ∧ c > 0) ⇒ xc < yc).
(ii) ∀x,y,c ∈ R, ((x < y ∧ c > 0) ⇒ cx < cy).
(iii) ∀x,y ∈ R, ((x > 0 ∧ y > 0) ⇒ xy > 0).


PROOF: Let R be a ring with a total order “<” which satisfies the indicated additive translation invariance.
Suppose that “<” satisfies condition (i). Let y,c ∈ R satisfy y > 0 and c > 0. Then 0 = 0c < yc. So yc > 0.
Hence ∀y,c ∈ R, ((y > 0 ∧ c > 0) ⇒ yc > 0), which is the same as condition (iii) under an exchange of
dummy variables. Similarly, condition (ii) yields cy > 0 for all y,c ∈ R which satisfy y > 0 and c > 0, from
which condition (iii) follows.


Now suppose that “<” satisfies (iii). Let x,y,c ∈ R satisfy x < y and c > 0. Let z = y − x. Then
0 = x − x < y − x = z by the addition-invariance of “<”. So z > 0 and c > 0. Therefore zc > 0 by (iii).
So yc = xc + zc > xc by addition-invariance. Therefore ∀x,y,c ∈ R, ((x < y ∧ c > 0) ⇒ xc < yc),
which is the same as condition (i). Similarly, to show that (iii) implies (ii), swapping the dummy variables
c and z in the application of condition (iii) gives cz > 0 and cy = cx + cz > cx, from which it follows that
∀x,y,c ∈ R, ((x < y ∧ c > 0) ⇒ cx < cy), which is the same as condition (ii).


Thus (iii) implies both (i) and (ii), while conditions (i) and (ii) both individually imply (iii). Hence the three
conditions are pairwise equivalent as claimed.


18.3.7 REMARK: The term “isotonic” for compatibility of an order with ring operations.

The term “isotonic” is used to describe conditions (i) and (ii) in Definition 18.3.2 by MacLane/Birkhoff [110],
page 263, and Fuchs [75], pages 2, 4. Thus the total order is isotonic for addition, and isotonic for
multiplication by positive factors.


The isotonic conditions are a strong constraint on the rings themselves. For example, an ordered unitary 
ring which is not a zero ring must be order-isomorphic to the ring of integers if the positive elements are 
well ordered. (See Remark 18.6.12.) 


18.3.8 REMARK: Examples of ordered rings. 

An example of an ordered ring (which is not a unitary ring) is the ring of even integers with the usual total 
order. An example of an ordered unitary ring (which is not a field) is the ring of integers with the usual 
total order. 


18.3.9 REMARK: Not all ordered rings are commutative. 
It is very difficult to prove that all ordered rings are commutative because it isn’t true. (The hardest theorems 
to prove are the ones which aren’t true!) 


18.3.10 THEOREM: Some basic properties of ordered rings. 
Let (R,σ,τ,<) be an ordered ring.


(i) ∀x,y ∈ R, (x > 0_R ∧ y > 0_R ⇒ x + y > 0_R).
(ii) ∀x,y ∈ R, x < y ⇔ −y < −x.
(iii) ∀x ∈ R, x < 0_R ⇔ −x > 0_R.
(iv) ∀x ∈ R, x > 0_R ⇒ x + x > 0_R.
(v) ∀x ∈ R, x ≠ 0_R ⇒ x + x ≠ 0_R.
(vi) ∀x ∈ R, x ≠ 0_R ⇒ x·x > 0_R.
(vii) ∀x,y ∈ R, (x ≥ 0_R ∧ x < y) ⇒ x·x < y·y.
(viii) ∀x,y ∈ R, (x ≥ 0_R ∧ y ≥ 0_R ∧ x·x = y·y) ⇒ x = y.
(ix) ∀x,y ∈ R, (x ≥ 0_R ∧ y ≤ 0_R ∧ x·x = y·y) ⇒ x = −y.
(x) ∀x,y ∈ R, ((x·x = y·y) ⇒ (x = y ∨ x = −y)).
(xi) ∀x,y ∈ R \ {0_R}, x·y ≠ 0_R. (That is, R has no zero divisors.)
(xii) ∀x,y,c ∈ R, ((xc < yc ∧ c > 0) ⇒ x < y).
(xiii) ∀x,y,c ∈ R, ((xc = yc ∧ c ≠ 0) ⇒ x = y).
(xiv) ∀x,y ∈ R, ∀c ∈ R \ {0_R}, (xc = yc ⇔ x = y ⇔ cx = cy).
(xv) ∀x,y ∈ R, (x > 0_R ∧ y > 0_R ⇒ xy > 0_R).
(xvi) ∀x,y ∈ R, (x > 0_R ∧ y < 0_R ⇒ xy < 0_R).
(xvii) ∀x,y ∈ R, (x < 0_R ∧ y < 0_R ⇒ xy > 0_R).


PROOF: For part (i), let x > 0 and y > 0. Then 0 < y. So x = x + 0 < x + y by Definition 18.3.2 (i).
Therefore x + y > 0 by Definition 11.5.1 (iii).

To show part (ii), let x,y ∈ R satisfy x < y. Then (−x) + (−y) + x < (−x) + (−y) + y by a double
application of Definition 18.3.2 (i). So −y < −x by Definition 18.1.2 (i). Now suppose that −y < −x. Then
−(−x) < −(−y) by the already-proved forward implication. Hence x < y.


Part (iii) follows from part (ii) with y = 0_R and Theorem 17.3.14 (v) for −0_R = 0_R.

To show part (iv), let x ∈ R with x > 0_R. Then x + x > x by Definition 18.3.2 (i). So x + x > 0_R because
“<” is a total order.

To show part (v), let x ∈ R with x ≠ 0_R. Then x > 0_R or x < 0_R. Suppose x > 0_R. Then x + x > 0_R
by part (iv). Hence x + x ≠ 0_R. Suppose x < 0_R. Then −x > 0_R by part (ii). So (−x) + (−x) > 0_R by
part (iv). So −((−x) + (−x)) < 0_R. Hence x + x = −((−x) + (−x)) ≠ 0_R.

To show part (vi), let x ∈ R \ {0_R}. Then x > 0_R or x < 0_R. Suppose x > 0_R. Then 0_R·x < x·x by
Definition 18.3.2 (ii). Hence x·x > 0_R. Suppose x < 0_R. Then −x > 0_R by part (ii). So (−x)·(−x) > 0_R
similarly. But (−x)·(−x) = x·x by Theorem 18.1.3 (ii). So x·x > 0_R.

To show part (vii), let x,y ∈ R satisfy x ≥ 0_R and x < y. Then x·x ≤ x·y by Definition 18.3.2 (ii). Similarly,
x·y < y·y by Definition 18.3.2 (ii) since y > 0_R because “<” is a total order. Therefore x·x < y·y because
“<” is a total order.


To show part (viii), let x,y ∈ R satisfy x ≥ 0_R, y ≥ 0_R and x·x = y·y. Suppose that x ≠ y. Then x < y
or x > y. Suppose that x < y. Then x·x < y·y by part (vii). This contradicts the assumption. Similarly
x > y yields a contradiction x·x > y·y. Therefore x = y.


To show part (ix), let x,y ∈ R satisfy x ≥ 0_R, y ≤ 0_R and x·x = y·y. Let z = −y. Then z ≥ 0_R by
part (ii), and z·z = y·y by Theorem 18.2.6 (v). So x ≥ 0_R, z ≥ 0_R and x·x = z·z. Therefore x = z by
part (viii). Hence x = −y.

To show part (x), let x,y ∈ R satisfy x·x = y·y. If x = 0_R, then 0_R = 0_R·0_R = y·y, and so y = 0_R by
part (vi). Hence x = y (and incidentally x = −y). Similarly, x = y (and x = −y) if y = 0_R. Now suppose
that x > 0_R and y > 0_R. Then x = y by part (viii). If x > 0_R and y < 0_R, then x = −y by part (ix).
Similarly, if x < 0_R and y > 0_R, then x = −y by part (ix). Lastly, if x < 0_R and y < 0_R, then −x > 0_R
and −y > 0_R by part (ii), and so −x = −y by part (viii) because x·x = (−x)·(−x) and y·y = (−y)·(−y)
by Theorem 18.2.6 (v). Hence x = y.

To show part (xi), let x,y ∈ R \ {0_R}. Suppose that x > 0_R and y > 0_R. Then xy > 0_R by
Definition 18.3.2 (ii). So xy ≠ 0_R because “<” is a total order. Similarly, if x < 0_R or y < 0_R, then xy < 0_R
or xy > 0_R. In both cases, xy ≠ 0_R. So there are no zero divisors.


For part (xii), let x,y,c ∈ R with xc < yc and c > 0. Then either x < y, x = y or x > y because “<”
is a total order on R. Suppose that x = y. Then xc = yc, which contradicts the assumption xc < yc.
Suppose that x > y. Then xc > yc by Definition 18.3.2 (ii), which also contradicts the assumption xc < yc.
Hence x < y.


For part (xiii), let x,y,c ∈ R with xc = yc and c > 0. Suppose that x < y. Then Definition 18.3.2 (ii)
implies xc < yc, which contradicts the assumption xc = yc. Similarly, if y < x, then yc < xc, which also
contradicts the assumption xc = yc. Therefore x = y. Now assume that xc = yc and c < 0. Let d = −c.
Then xd = x(−c) = −(xc) = −(yc) = y(−c) = yd and d > 0. So x = y again. Hence x = y always follows
from xc = yc and c ≠ 0.


Part (xiv) follows from a double application of part (xiii).

Part (xv) follows from Definition 18.3.2 (ii) and Theorem 18.3.6 (i, iii).

For part (xvi), let x > 0_R and y < 0_R. Then −y > 0_R by part (iii). So x(−y) > 0_R by part (xv). But
x(−y) = −(xy) by Theorem 18.1.3 (iii). So −(xy) > 0_R. Therefore xy < 0_R by part (iii).

For part (xvii), let x < 0_R and y < 0_R. Then −x > 0_R and −y > 0_R by part (iii). So (−x)(−y) > 0_R by
part (xv). But (−x)(−y) = xy by Theorem 18.1.3 (ii). Hence xy > 0_R.


18.3.11 REMARK: The multiplicative semigroup of an ordered ring is cancellative. 
Part (xiv) of Theorem 18.3.10 means that the multiplicative semigroup (R_0, τ|_{R_0 × R_0}) of a ring (R,σ,τ),
where R_0 = R \ {0_R}, is “cancellative” if the ring is ordered. (See Definition 17.1.14 for cancellative
semigroups.)


18.3.12 REMARK: Ordered ring morphisms. 

It is assumed that the usual cohort of morphisms as in Definition 18.1.20 for rings is defined also for ordered
rings in accordance with Definition 18.3.13. Note that Definition 18.3.13 does not assume condition (iii) in 
Definition 18.2.8. Definition 18.3.13 is derived directly from Definition 18.1.19 because the existence of a 
unit element cannot be assumed. 


18.3.13 DEFINITION: An ordered ring homomorphism from an ordered ring R_1 −< (R_1,σ_1,τ_1,<_1) to an
ordered ring R_2 −< (R_2,σ_2,τ_2,<_2) is a ring homomorphism φ : R_1 → R_2 which satisfies


(iv) ∀x,y ∈ R_1, x <_1 y ⇒ φ(x) <_2 φ(y).


18.3.14 REMARK: Embedding the integers in an ordered ring as a line of points through the origin. 
Theorem 18.3.15 for ordered rings is a strengthened form of Theorem 17.4.7 for groups, which in turn is a 
strengthened form of Theorem 17.1.9 for semigroups. A consequence of the order on the ring is that the 
unital group homomorphism in Definition 17.4.8 is promoted here to a monomorphism.


18.3.15 THEOREM: Injectivity of unital group morphisms in an ordered ring.
Let R be an ordered ring. For x ∈ R define ψ_x : Z → R inductively by


(i) ψ_x(0_Z) = 0_R,
(ii) ∀n ∈ Z_0^+, ψ_x(n + 1) = ψ_x(n) + x,
(iii) ∀n ∈ Z_0^−, ψ_x(n − 1) = ψ_x(n) − x.


Then ψ_x is a group monomorphism with respect to the additive group operation on R for all x ∈ R \ {0_R}.
In other words, the unital group morphism generated by x ∈ R \ {0_R} is injective.


PROOF: The assertion follows from Definition 18.3.2 (i) and Theorem 17.5.12 (iv). 


18.3.16 REMARK: Positive and negative elements in an ordered ring. 

The elements of an ordered ring may be partitioned into three sets, the positive elements, the negative 
elements and the zero element. This forms the basis of an alternative approach to defining ordered rings. 
This is done in Definition 18.3.17. 


18.3.17 DEFINITION: A positive cone or positive subset for a ring R is a subset P of R which satisfies the 
following. 
(i) 0_R ∉ P.
(ii) ∀x ∈ R \ {0_R}, (x ∈ P ⇔ −x ∉ P).
(iii) ∀x,y ∈ P, x + y ∈ P.
(iv) ∀x,y ∈ P, xy ∈ P.


18.3.18 REMARK: Alternative methods of specifying the structure of an ordered ring. 

It follows from Theorems 18.3.19 and 18.3.20 that an ordered ring R may be specified either by a total
order “<” as in Definition 18.3.2, or by a positive cone or subset P as in Definition 18.3.17. The relation
∀x,y ∈ R, x < y ⇔ y − x ∈ P defines the equivalence map between the two styles of specification.
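
The passage from a positive cone to an order relation can be illustrated concretely. The sketch below is an assumption-laden illustration, not a proof. The ring is the integers with Python's built-in arithmetic, the candidate cone is given as a membership predicate, and the conditions of Definition 18.3.17 are checked only on a finite sample. The names order_from_cone and satisfies_cone_axioms are hypothetical.

```python
def order_from_cone(in_cone, sub):
    """Return the relation x < y defined by 'y - x lies in the cone P'."""
    def less_than(x, y):
        return in_cone(sub(y, x))
    return less_than


def satisfies_cone_axioms(in_cone, sample, zero, add, mul, neg):
    """Check conditions (i)-(iv) of a positive cone on a finite sample only."""
    if in_cone(zero):                                   # (i) 0_R is not in P
        return False
    for x in sample:
        if x != zero and in_cone(x) == in_cone(neg(x)):  # (ii) exactly one of x, -x
            return False
    for x in sample:
        for y in sample:
            if in_cone(x) and in_cone(y):
                if not in_cone(add(x, y)):              # (iii) closed under addition
                    return False
                if not in_cone(mul(x, y)):              # (iv) closed under multiplication
                    return False
    return True


if __name__ == "__main__":
    ints = list(range(-5, 6))
    ops = dict(zero=0, add=lambda a, b: a + b,
               mul=lambda a, b: a * b, neg=lambda a: -a)
    print(satisfies_cone_axioms(lambda z: z > 0, ints, **ops))   # True
    print(satisfies_cone_axioms(lambda z: z < 0, ints, **ops))   # False
    lt = order_from_cone(lambda z: z > 0, lambda a, b: a - b)
    print(lt(2, 5), lt(5, 2), lt(3, 3))                          # True False False
```

The negative integers fail condition (iv) because (−1)(−1) = 1 is not negative, which is why the direction of the order cannot simply be reversed.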


18.3.19 THEOREM: The set of positive elements of an ordered ring is a positive cone. 
Let (R,σ,τ,<) be an ordered ring. Let P = {x ∈ R; 0_R < x}. Then P is a positive cone for the ring (R,σ,τ).


PROOF: Let (R,σ,τ,<) be an ordered ring. Let P = {x ∈ R; 0_R < x}. Then 0_R ∉ P because “<” is a
total order on R. Let x ∈ R \ {0_R}. If x ∈ P, then 0_R < x. So −x < −0_R = 0_R by Theorem 18.3.10 (ii).
Hence 0_R ≮ −x, by the antisymmetry of the total order “<”. So −x ∉ P. Similarly, if x ≠ 0_R and −x ∉ P,
then x ∈ P.


Let x,y ∈ P. Then 0_R < x. So y < x + y by Definition 18.3.2 (i). So 0_R < x + y, by the transitivity of a
total order, because 0_R < y. By Definition 18.3.2 (ii), 0_R y < xy. So xy > 0_R. Hence P satisfies all of the
conditions of Definition 18.3.17.


18.3.20 THEOREM: Construction of a ring ordering from a positive cone. 
Let (R,σ,τ) be a ring with a positive cone P. Let the relation “<” be defined on R by x < y ⇔ y − x ∈ P
for all x,y ∈ R. Then “<” is an ordering of the ring (R,σ,τ).


PROOF: Let (R,σ,τ) be a ring with a positive subset P. Let “<” be the relation on R defined by x < y ⇔
y − x ∈ P for all x,y ∈ R. To show that “<” is a total order on R, let x,y ∈ R satisfy x ≮ y. Then y − x ∉ P.
So by Definition 18.3.17 (ii), either y − x = 0_R or −(y − x) ∈ P. That is, either x = y or x − y ∈ P. So
either x = y or y < x. But if x = y, then y − x ∉ P and x − y ∉ P by Definition 18.3.17 (i). So x ≮ y
and y ≮ x. Hence “<” satisfies the antisymmetry and reflexivity conditions for a total order. To show
transitivity, let x,y,z ∈ R with x < y and y < z. Then y − x ∈ P and z − y ∈ P. So (y − x) + (z − y) ∈ P
by Definition 18.3.17 (iii). Thus z − x ∈ P and therefore x < z. Hence “<” is a total order on R.


Let a,b,c ∈ R with a < b. Then b − a ∈ P. So (b + c) − (a + c) ∈ P. So a + c < b + c.


Let a,b,c ∈ R with a < b and c > 0. Then b − a ∈ P and c ∈ P. So (b − a)c ∈ P. So bc − ac ∈ P. So ac < bc.
Hence all of the conditions of Definition 18.3.2 are satisfied. So “<” is an ordering of the ring (R,σ,τ).


18.3.21 REMARK: Notations for positive and negative cones in ordered rings.

One may generalise the notations in Remark 1.5.1 for positive and negative subsets of the integers, rational 
numbers and real numbers to general ordered rings. (See also Notation 14.4.5.) Thus for any ordered ring R,
one may introduce the following notations, where P is the positive cone of R.


R^+ = {x ∈ R; x > 0_R} = P.
R_0^+ = {x ∈ R; x ≥ 0_R} = P ∪ {0_R}.
R^− = {x ∈ R; x < 0_R} = R \ P \ {0_R}.
R_0^− = {x ∈ R; x ≤ 0_R} = R \ P.


18.4. Archimedean ordered rings 


18.4.1 REMARK: Archimedean ordered rings. 

The significance of Archimedean orderings for the “method of exhaustion” is mentioned in Remark 17.5.3. 
The Archimedean property for orderings is extended in Definition 18.4.2 from groups (as in Definition 17.5.5) 
to rings. An ordered ring is said to have the Archimedean property if its additive group is Archimedean. In 
essence, this property means that the ring contains no infinities and no infinitesimals. An “infinity” would 
be an element which is larger than any positive integer multiple of some positive element. An “infinitesimal” 
would be a positive element which cannot be made larger than a given positive element by multiplying it 
by positive integers. Here “positive integer multiples” of a ring element x ∈ R are elements of the form nx
for n ∈ Z^+, where nx = ψ_x(n) is as defined in Theorem 18.3.15. (Alternatively see Notation 18.1.22.)


18.4.2 DEFINITION: An Archimedean ordering of a ring (R,σ,τ) is an ordering “<” of (R,σ,τ) such that
“<” is an Archimedean order on the additive group (R,σ). In other words,


∀x ∈ R, ∀ε ∈ R^+, ∃n ∈ Z^+, x < nε.


An Archimedean ordered ring is an ordered ring (R,σ,τ,<) such that “<” is an Archimedean ordering
of (R,σ,τ).


18.4.3 REMARK: Infinities and infinitesimals in non-Archimedean ordered rings.

Example 18.4.4 shows how an ordered commutative unitary ring may contain infinities. (See Section 18.6
for ordered unitary rings.) Infinities are ring elements which are greater than all positive integer multiples
of the unit element.


Example 18.4.4 is inverted in Example 18.4.5 to define elements ∞^k for k ∈ Z_0^−. Then the element ∞^{−1}
is an infinitesimal because 0 < n.∞^{−1} < 1 for all n ∈ Z^+. Note that the ring (R,σ,τ) in Example 18.4.5
is isomorphic to the corresponding ring in Example 18.4.4, but they are not isomorphic as ordered rings
because the condition “j > k” in the order definition for Example 18.4.4 has not been changed to “j < k”
in Example 18.4.5.


These two examples may be combined to permit elements ∞^k for all k ∈ Z by changing the domain for ring
elements to Z. This gives a non-Archimedean ordered ring containing both infinities and infinitesimals.


18.4.4 EXAMPLE: A non-Archimedean ordered ring with infinities. 

Let R = {x ∈ Z^{Z_0^+}; ∃k ∈ Z_0^+, ∀j > k, x_j = 0}. Define σ : R × R → R by σ(x,y)_i = x_i + y_i for all
x,y ∈ R and i ∈ Z_0^+. Define τ : R × R → R by τ(x,y)_i = Σ_{j=0}^{i} x_{i−j} y_j for all x,y ∈ R and
i ∈ Z_0^+. Define a total order “<” on R by x < y if and only if ∃k ∈ Z_0^+, (x_k < y_k ∧ ∀j > k, x_j = y_j).
Then (R,σ,τ,<) is an ordered commutative unitary ring.

For k ∈ Z_0^+, let ∞^k denote the element y of this ring which satisfies y_k = 1 and y_j = 0 for j ∈ Z_0^+ \ {k}.
Then one may write x = Σ_{i=0}^{∞} x_i ∞^i for all x ∈ R. This ordered ring is non-Archimedean because
n.1_R < ∞ for all n ∈ Z^+. (See Definition 17.5.4 for the Archimedean property.) Thus ∞ is an infinity in this
ring. Similarly, −∞ is a “negative infinity” for this ring because −∞ < −n.1_R < 0 for all n ∈ Z^+.


To verify that (R,σ,τ) is a unitary commutative ring, note that σ(x,y) ∈ R and τ(x,y) ∈ R for all x,y ∈ R
because the ring Z is closed under addition and multiplication, and the sum Σ_{j=0}^{i} x_{i−j} y_j has only a
finite number of non-zero terms for all i ∈ Z_0^+. The proofs of associativity, commutativity and distributivity
are elementary. It is clear that the element 1_R = ∞^0 is the unit element of R.

To verify that (R,σ,τ,<) is an ordered ring, suppose that x,y ∈ R satisfy x < y, and that a ∈ R. Then
∃k ∈ Z_0^+, (x_k < y_k ∧ ∀j > k, x_j = y_j). But a_i + x_i < a_i + y_i if and only if x_i < y_i, and
a_i + x_i = a_i + y_i if and only if x_i = y_i, for all i ∈ Z_0^+. So
∃k ∈ Z_0^+, (a_k + x_k < a_k + y_k ∧ ∀j > k, a_j + x_j = a_j + y_j).
Therefore a + x < a + y. Now let x,y ∈ R satisfy x > 0 and y > 0. Then ∃k ∈ Z_0^+, (x_k > 0 ∧ ∀j > k, x_j = 0)
and ∃m ∈ Z_0^+, (y_m > 0 ∧ ∀ℓ > m, y_ℓ = 0). For such k and m (which are uniquely determined by x and y),
it follows that (xy)_{k+m} = Σ_{j=0}^{k+m} x_{k+m−j} y_j = x_k y_m > 0, and (xy)_i = Σ_{j=0}^{i} x_{i−j} y_j = 0
for all i > k + m. Therefore xy > 0. Hence (R,σ,τ,<) is an ordered ring by Definition 18.3.2 and
Theorem 18.3.6.
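
The ring in Example 18.4.4 can be modelled directly to see the infinity at work. The following sketch makes several assumptions. An element is stored as a tuple of its coefficients (x_0, x_1, ...) with trailing zeros removed, and the names trim, add, mul, less_than, scalar and INF are hypothetical. Checking n.1_R < ∞ for finitely many n is of course only an illustration of the non-Archimedean property, not a proof.

```python
def trim(coeffs):
    """Drop trailing zero coefficients so that each element has one representation."""
    c = list(coeffs)
    while c and c[-1] == 0:
        c.pop()
    return tuple(c)


def coeff(x, i):
    """Coefficient x_i, which is zero beyond the stored length."""
    return x[i] if i < len(x) else 0


def add(x, y):
    """Componentwise sum: (x + y)_i = x_i + y_i."""
    n = max(len(x), len(y))
    return trim(coeff(x, i) + coeff(y, i) for i in range(n))


def mul(x, y):
    """Convolution product: (xy)_i = sum over j of x_{i-j} y_j."""
    if not x or not y:
        return ()
    out = [0] * (len(x) + len(y) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            out[i + j] += a * b
    return trim(out)


def less_than(x, y):
    """x < y iff x_k < y_k at the highest index k where x and y differ."""
    for k in reversed(range(max(len(x), len(y)))):
        if coeff(x, k) != coeff(y, k):
            return coeff(x, k) < coeff(y, k)
    return False  # x == y


def scalar(n):
    """The element n.1_R, that is, n times the unit element."""
    return trim((n,))


INF = (0, 1)  # the element written as "infinity to the power 1" in the example

if __name__ == "__main__":
    print(all(less_than(scalar(n), INF) for n in range(1, 1000)))   # True
    print(mul(INF, INF))                                            # (0, 0, 1)
```

Every integer multiple of the unit element stays below INF because the comparison is decided at index 1, where INF has coefficient 1 and every multiple of the unit has coefficient 0.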


18.4.5 EXAMPLE: A non-Archimedean ordered ring with infinitesimals. 

Let R = {x ∈ Z^{Z_0^−}; ∃k ∈ Z_0^−, ∀j < k, x_j = 0}. Define σ : R × R → R by σ(x,y)_i = x_i + y_i for all
x,y ∈ R and i ∈ Z_0^−. Define τ : R × R → R by τ(x,y)_i = Σ_{j=i}^{0} x_{i−j} y_j for all x,y ∈ R and
i ∈ Z_0^−. Define a total order “<” on R by x < y if and only if ∃k ∈ Z_0^−, (x_k < y_k ∧ ∀j > k, x_j = y_j).
Then (R,σ,τ,<) is an ordered commutative unitary ring.


For k ∈ Z_0^−, let ∞^k denote the element y of this ring which satisfies y_k = 1 and y_j = 0 for j ∈ Z_0^− \ {k}.
Then one may write x = Σ_{i=−∞}^{0} x_i ∞^i for all x ∈ R. This ordered ring is non-Archimedean because
0 < n.∞^{−1} < 1_R for all n ∈ Z^+. (See Definition 17.5.4 for the Archimedean property.) Thus ∞^{−1} is an
infinitesimal in this ring. Similarly, −∞^{−1} is a “negative infinitesimal” for this ring because
−1_R < −n.∞^{−1} < 0 for all n ∈ Z^+.


To verify that (R,σ,τ) is a unitary commutative ring, note that σ(x,y) ∈ R and τ(x,y) ∈ R for all x,y ∈ R
because the ring Z is closed under addition and multiplication, and the sum Σ_{j=i}^{0} x_{i−j} y_j has only a
finite number of non-zero terms for all i ∈ Z_0^−. The proofs of associativity, commutativity and distributivity
are elementary. It is clear that the element 1_R = ∞^0 is the unit element of R.


To verify that (R,σ,τ,<) is an ordered ring, suppose that x,y ∈ R satisfy x < y, and that a ∈ R. Then
∃k ∈ Z_0^−, (x_k < y_k ∧ ∀j > k, x_j = y_j). But a_i + x_i < a_i + y_i if and only if x_i < y_i, and
a_i + x_i = a_i + y_i if and only if x_i = y_i, for all i ∈ Z_0^−. So
∃k ∈ Z_0^−, (a_k + x_k < a_k + y_k ∧ ∀j > k, a_j + x_j = a_j + y_j).
Therefore a + x < a + y. Now let x,y ∈ R satisfy x > 0 and y > 0. Then ∃k ∈ Z_0^−, (x_k > 0 ∧ ∀j > k, x_j = 0)
and ∃m ∈ Z_0^−, (y_m > 0 ∧ ∀ℓ > m, y_ℓ = 0). For such k and m (which are uniquely determined by x and y),
it follows that (xy)_{k+m} = Σ_{j=k+m}^{0} x_{k+m−j} y_j = x_k y_m > 0, and (xy)_i = Σ_{j=i}^{0} x_{i−j} y_j = 0
for all i > k + m. Therefore xy > 0. Hence (R,σ,τ,<) is an ordered ring by Definition 18.3.2 and
Theorem 18.3.6.


18.4.6 REMARK: A ring where some elements have no square upper bound. 

Example 18.4.5 can be made non-unitary by excluding the elements n.1_R for n ∈ Z \ {0_Z}. The ring elements
x ∈ R in Example 18.4.7 have the form x = Σ_{i=−∞}^{−1} x_i ∞^i. An interesting property of this pathological
ring is that one cannot validly assert that ∀x ∈ R, ∃y ∈ R, x < y^2. A counterexample to this assertion is the
ring element x = ∞^{−1}. The square y^2 for any y ∈ R has the form y^2 = Σ_{i=−∞}^{−2} (y^2)_i ∞^i. The fact
that the coefficient (y^2)_{−1} = 0 follows from the equality ∞^{−1}.∞^{−1} = ∞^{−2}. Consequently
∀y ∈ R, y^2 < ∞^{−1}.
The existence of square upper bounds for all ring elements is shown in Theorem 18.4.8 for non-zero ordered
rings which are either unitary or Archimedean.


18.4.7 EXAMPLE: A non-unitary non-Archimedean ordered ring with infinitesimals. 

Let R = {x ∈ Z^{Z^−}; ∃k ∈ Z^−, ∀j < k, x_j = 0}. Define σ : R × R → R by σ(x,y)_i = x_i + y_i for all
x,y ∈ R and i ∈ Z^−. Define τ : R × R → R by τ(x,y)_i = Σ_{j=i+1}^{−1} x_{i−j} y_j for all x,y ∈ R and
i ∈ Z^−. Define a total order “<” on R by x < y if and only if ∃k ∈ Z^−, (x_k < y_k ∧ ∀j > k, x_j = y_j).
Then (R,σ,τ,<) is an ordered commutative ring.

For k ∈ Z^−, let ∞^k denote the element y of this ring which satisfies y_k = 1 and y_j = 0 for j ∈ Z^− \ {k}.
Then one may write x = Σ_{i=−∞}^{−1} x_i ∞^i for all x ∈ R. This ordered ring is non-Archimedean because
0 < n.∞^{−2} < ∞^{−1} for all n ∈ Z^+. (See Definition 17.5.4 for the Archimedean property.) Similarly,
−∞^{−1} < −n.∞^{−2} < 0 for all n ∈ Z^+.


To verify that (R,σ,τ) is a commutative ring, note that σ(x,y) ∈ R and τ(x,y) ∈ R for all x,y ∈ R because
the ring Z is closed under addition and multiplication, and the sum Σ_{j=i+1}^{−1} x_{i−j} y_j has only a finite
number of non-zero terms for all i ∈ Z^−. The proofs of associativity, commutativity and distributivity are
elementary.

To verify that (R,σ,τ,<) is an ordered ring, suppose that x,y ∈ R satisfy x < y, and that a ∈ R. Then
∃k ∈ Z^−, (x_k < y_k ∧ ∀j > k, x_j = y_j). But a_i + x_i < a_i + y_i if and only if x_i < y_i, and
a_i + x_i = a_i + y_i if and only if x_i = y_i, for all i ∈ Z^−. So
∃k ∈ Z^−, (a_k + x_k < a_k + y_k ∧ ∀j > k, a_j + x_j = a_j + y_j).
Therefore a + x < a + y. Now let x,y ∈ R satisfy x > 0 and y > 0. Then ∃k ∈ Z^−, (x_k > 0 ∧ ∀j > k, x_j = 0)
and ∃m ∈ Z^−, (y_m > 0 ∧ ∀ℓ > m, y_ℓ = 0). For such k and m (which are uniquely determined by x and y), it
follows that (xy)_{k+m} = Σ_{j=k+m+1}^{−1} x_{k+m−j} y_j = x_k y_m > 0, and
(xy)_i = Σ_{j=i+1}^{−1} x_{i−j} y_j = 0 for all i > k + m.
Therefore xy > 0. Hence (R,σ,τ,<) is an ordered ring by Definition 18.3.2 and Theorem 18.3.6.


18.4.8 THEOREM: In unitary and Archimedean ordered rings, every element has a square upper bound. 
Let (R,σ,τ,<) be an ordered ring which is unitary or Archimedean, and which is not the zero ring.
Then ∀x ∈ R, ∃y ∈ R, x < y^2.


PROOF: Let (R,σ,τ,<) be a non-zero unitary ordered ring. Let x ∈ R. If x < 0, then x < y^2 with y = 0.
If 0 ≤ x < 1, then x < y^2 with y = 1. If x = 1, then x < 4 = (1 + 1)^2. If x > 1, then x < x^2 by
Definition 18.3.2 (ii) because 1 < x and x > 0. So x = 1.x < x.x = x^2.


Let (R,σ,τ,<) be a non-zero Archimedean ordered ring. Let x ∈ R. Let z ∈ R \ {0}. Then z^2 > 0 by
Theorem 18.3.10 (vi). It follows from Definition 17.5.4 that x < n.z^2 for some n ∈ Z^+. But n.z^2 ≤ n^2 z^2
because n ≤ n^2. Hence x < (n.z)^2, which is of the required form.
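
The case analysis in the unitary part of the proof translates directly into a procedure for choosing y. The sketch below is an illustration under an assumption: the unitary ordered ring is taken to be the rational numbers, represented by Python's fractions.Fraction. The function name square_upper_bound is hypothetical.

```python
from fractions import Fraction


def square_upper_bound(x: Fraction) -> Fraction:
    """Return some y with x < y*y, following the four cases of the proof."""
    if x < 0:
        return Fraction(0)      # x < 0 = 0^2
    if x < 1:
        return Fraction(1)      # 0 <= x < 1 = 1^2
    if x == 1:
        return Fraction(2)      # 1 < 4 = (1 + 1)^2
    return x                    # x > 1 gives x = 1.x < x.x


if __name__ == "__main__":
    for x in [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(7, 2)]:
        y = square_upper_bound(x)
        print(x, y, x < y * y)
```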


18.5. Absolute value functions on rings 


18.5.1 REMARK: A pseudo-absolute value function is analogous to a pseudo-Riemannian metric. 

A pseudo-absolute value function on a ring has a value which lies in an ordered ring. It is the same as an 
absolute value function except that non-negativity is not required. This is similar to the way that a pseudo- 
Riemannian metric tensor field is not required to be non-negative. The purpose of defining pseudo-absolute 
value functions is to motivate some details of Definition 18.5.6 for a minimalist absolute value function. 


In Definition 18.5.2, the order “≤” is the weak order corresponding to the order “<” on S.


18.5.2 DEFINITION: A pseudo-absolute value (function) on a ring (R,σ_R,τ_R), valued in an ordered ring
(S,σ_S,τ_S,<), is a function φ : R → S which satisfies the following.


(i) ∀x,y ∈ R, φ(τ_R(x,y)) = τ_S(φ(x),φ(y)). (That is, ∀x,y ∈ R, φ(xy) = φ(x)φ(y).)
(ii) ∀x,y ∈ R, φ(σ_R(x,y)) ≤ σ_S(φ(x),φ(y)). (That is, ∀x,y ∈ R, φ(x + y) ≤ φ(x) + φ(y).)


18.5.3 REMARK: Necessary conditions for the definition of an absolute value function. 

Let R be any ring, and let S be the integers Z with the usual ordering. Then φ : R → S defined by
φ(x) = 1_Z for all x ∈ R is a pseudo-absolute value function on R, valued in S. This dashes any hopes of
proving that φ(0_R) = 0_S in general.

Let R = S be any ordered ring, and define φ : R → S by φ(x) = x for all x ∈ R. Then φ is a pseudo-absolute
value function on R, valued in S. This dashes any hopes of proving that φ(x) ≥ 0_S in general.


These examples show that non-negativity does not guarantee φ(0_R) = 0_S, and that φ(0_R) = 0_S does
not guarantee non-negativity. Therefore these must both be assumed in Definition 18.5.6. Even if both
non-negativity and φ(0_R) = 0_S are assumed, the pseudo-absolute value function could be trivial because
φ(x) = 0_S for all x ∈ R defines a valid pseudo-absolute value function. Therefore positivity for non-zero
elements of R must also be specified if that is desired.
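
The three degenerate examples above can be checked mechanically on a finite sample. The sketch below assumes R = S = Z with Python's built-in integer arithmetic and order, and only tests the two conditions of Definition 18.5.2 on a finite range, so it illustrates the claims rather than proving them. The function name is_pseudo_absolute_value is hypothetical.

```python
def is_pseudo_absolute_value(phi, sample):
    """Check phi(xy) == phi(x)phi(y) and phi(x+y) <= phi(x)+phi(y) on a sample."""
    return all(
        phi(x * y) == phi(x) * phi(y) and phi(x + y) <= phi(x) + phi(y)
        for x in sample
        for y in sample
    )


if __name__ == "__main__":
    ints = range(-6, 7)
    print(is_pseudo_absolute_value(lambda x: 1, ints))    # True: constant 1
    print(is_pseudo_absolute_value(lambda x: x, ints))    # True: identity map
    print(is_pseudo_absolute_value(lambda x: 0, ints))    # True: constant 0
    print(is_pseudo_absolute_value(abs, ints))            # True: usual absolute value
    print(is_pseudo_absolute_value(lambda x: 2, ints))    # False: 2 != 2 * 2
```

The constant function 2 fails the multiplicative condition, which shows that the constant 1 example works only because 1 is idempotent.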


Pseudo-absolute value functions do have some useful properties, as shown in Theorem 18.5.4. 


18.5.4 THEOREM: Some basic properties of pseudo-absolute value functions on rings.
Let φ : R → S be a pseudo-absolute value function on a ring R, valued in an ordered ring S.


(i) φ(0_R) ≥ 0_S.
(ii) ∀x ∈ R, (φ(−x) = φ(x) or φ(−x) = −φ(x)).
(iii) If ∀x ∈ R, φ(x) ≥ 0_S, then ∀y ∈ R, φ(y) = φ(−y).
(iv) If φ(0_R) > 0_S, then ∀x ∈ R, φ(x) ≥ 0_S.
(v) ∀x,y ∈ R, (φ(xy) = 0_S ⇒ (φ(x) = 0_S ∨ φ(y) = 0_S)).
(vi) ∀x,y ∈ R, ((φ(−x) = φ(x) ∧ φ(−y) = −φ(y)) ⇒ (φ(x) = 0_S ∨ φ(y) = 0_S)).
(vii) If ∃y ∈ R, φ(y) < 0_S, then ∀x ∈ R, φ(x) = −φ(−x).
(viii) Either (A) ∀x ∈ R, φ(x) = φ(−x) ≥ 0_S, or (B) ∀x ∈ R, φ(x) = −φ(−x).


PROOF: Let φ : R → S be a pseudo-absolute value function on a ring R, valued in an ordered ring S. To
show part (i), note that φ(0_R) = φ(0_R + 0_R) ≤ φ(0_R) + φ(0_R). So 0_S ≤ φ(0_R) by Definition 18.3.2 (i).

To show part (ii), let x ∈ R. Then φ(x)φ(x) = φ(x·x) = φ((−x)·(−x)) = φ(−x)φ(−x) by Theorem 18.2.6 (v).
Therefore φ(x) = φ(−x) or φ(x) = −φ(−x) by Theorem 18.3.10 (x).

To show part (iii), assume ∀x ∈ R, φ(x) ≥ 0_S. Let y = 0_R. Then φ(y) = φ(−y) because −0_R = 0_R
by Definition 18.1.2 (i) and Theorem 17.3.14 (v). Now let y ≠ 0_R. Then φ(y)φ(y) = φ(−y)φ(−y), and so
φ(y) = φ(−y) by Theorem 18.3.10 (viii).

To show part (iv), let (x) < 0s for some z € R. Then 0g < $(0n) = ó(x--(—2)) € ó(x)--6(—2) by part (i). 
So ¢(—x) > —ó(x) > 0s. Therefore ¢(x) = —¢(—2x) by Theorem 18.3.10 (ix). So ¢(0r) € ¢(a)+¢(—x) = 0s. 
Hence $(05) = Og. In other words, the assumption Jx € R, d(x) < Os implies (0g) = Og. Therefore the 
assumption (05) > Og implies Vr € R, $(x) > 0s by modus tollens tollendo. (See Remark 4.8.15.) 

To show part (v), let x,y € R satisfy ó(zy) = 0s. Then $(x)ó(y) = 0s. Hence (x) = 0s or (y) = 0s by 
Theorem 18.3.10 (xi). 


To show part (vi), let z,y € R satisfy ó(— m g(a) and ¢(—y) = —¢(y). Then $((—z): y) = é(—x)ó(y) = 
a(e) ou), and dle a) = Hoa) = (096). But (Cz) vc 2 (Cy) by Theorem 18.13). So 
o(x)o(y) = —¢(x)o(y). Therefore (x) ¢(y) = 0s by Theorem 18.3.10 (v). So (xy) = 0s. Hence (a) = 0s 
or $(y) = 0s by part (v). 

To show part (vii), let y € R satisfy ¢(y) < Os. Then ¢(—y) = —¢(y) by part (ii). So ¢(-y) > 0s by 
Theorem 18.3.10 (ii). Now assume that x € R satisfies d(x) # —ó(x). Then ¢(x) = ¢(—2) by part (ii). 
Therefore (x) = 0s or ¢(y) = 0s by part (vi). So (x) = 0s by the assumption ¢(y) < 0s. So ó(—z) = 0s 
by part (ii). Therefore ¢(x) = —¢(x), which contradicts the assumption. Hence Vz € R, ¢(x) = —¢(—2). 
Part (viii) follows directly from parts (iii) and (vii). 


18.5.5 REMARK: A minimalist absolute value function has its value in an ordered ring. 
Absolute value functions on rings may be defined very generally in terms of orderings on rings. The order 
relation “$\le$” in Definition 18.5.6 is the non-strict order corresponding to the strict order “$<$” on $S$.


18.5.6 DEFINITION: An absolute value (function) on a ring $(R,\sigma_R,\tau_R)$, valued in an ordered ring
$(S,\sigma_S,\tau_S,<)$, is a function $\phi: R \to S$ which satisfies the following.

(i) $\forall x,y \in R$, $\phi(\tau_R(x,y)) = \tau_S(\phi(x),\phi(y))$. (That is, $\forall x,y \in R$, $\phi(xy) = \phi(x)\phi(y)$.)
(ii) $\forall x,y \in R$, $\phi(\sigma_R(x,y)) \le \sigma_S(\phi(x),\phi(y))$. (That is, $\forall x,y \in R$, $\phi(x+y) \le \phi(x) + \phi(y)$.)
(iii) $\phi(0_R) = 0_S$.
(iv) $\forall x \in R \setminus \{0_R\}$, $\phi(x) > 0_S$.


18.5.7 REMARK: Absolute value function definitions for special domains and ranges. 

Absolute values are defined by MacLane/Birkhoff [110], page 264, for general non-trivial unitary rings, but 
they require the ring R and the ordered ring S to be the same ring, and they specifically define the absolute 
value of $z$ to be $\max(z, -z)$. Absolute value functions are defined as real-valued functions on fields satisfying
the rules in Definition 18.5.6 by Lang [108], page 465, and Ash [50], page 220. 


The independent choice of rings $R$ and $S$ is motivated by some quite familiar kinds of scenarios, such as
Example 18.5.8, where R is not a subset of S, and S is not a subset of R. 


18.5.8 EXAMPLE: An absolute value function on the ring of Gaussian integers.

The ring of Gaussian integers $R$ in Definition 18.2.17 has an absolute value function $\phi: R \to S$ where $S$ is
the subring of $\mathbb{R}$ which is generated by the set of all square roots $\sqrt{n}$ for $n \in \mathbb{Z}^+$ (together with the usual
ordering of $\mathbb{R}$, restricted to $S$), and $\phi(x) = \sqrt{x_1^2 + x_2^2}$ for all $x = (x_1,x_2) \in R$.
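
The product rule and triangle inequality for this example can be tested with exact integer arithmetic. The sketch below assumes that a Gaussian integer is represented as a pair $(x_1,x_2)$ of integers, and it works with the squared value $x_1^2 + x_2^2$ to avoid floating-point square roots. The helper names (`norm_sq`, `triangle_ok`, and so on) are hypothetical.

```python
from itertools import product

def norm_sq(z):
    """Squared absolute value x1^2 + x2^2 of a Gaussian integer z = (x1, x2)."""
    return z[0] ** 2 + z[1] ** 2

def add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def mul(z, w):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def triangle_ok(z, w):
    """Exact test of sqrt(N(z+w)) <= sqrt(N(z)) + sqrt(N(w)) using integers only."""
    d = norm_sq(add(z, w)) - norm_sq(z) - norm_sq(w)
    # sqrt(A) <= sqrt(B) + sqrt(C)  iff  A - B - C <= 2 sqrt(BC)
    return d <= 0 or d * d <= 4 * norm_sq(z) * norm_sq(w)

sample = list(product(range(-4, 5), repeat=2))
product_rule = all(norm_sq(mul(z, w)) == norm_sq(z) * norm_sq(w)
                   for z in sample for w in sample)
triangle_rule = all(triangle_ok(z, w) for z in sample for w in sample)
print(product_rule, triangle_rule)
```

Since the absolute value is non-negative, equality of the squared values is equivalent to the product rule itself.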


18.5.9 EXAMPLE: Absolute value on a zero ring. 
Let $R = \{0_R\}$ be a zero ring as in Definition 18.1.5. Let $(S,<)$ be any ordered ring. Define $\phi: R \to S$ by
$\phi(0_R) = 0_S$. Then $\phi$ is an absolute value function on $R$ by Definition 18.5.6.


18.5.10 THEOREM: Some basic properties of absolute value functions on rings. 
Let $\phi: R \to S$ be an absolute value function on a ring $R$, valued in an ordered ring $S$.


(i) $\phi$ is a pseudo-absolute value function on $R$, valued in $S$.
(ii) $\forall x \in R$, $\phi(x) \ge 0_S$.
(iii) $\forall x \in R$, $\phi(-x) = \phi(x)$.


PROOF: Part (i) follows from Definitions 18.5.2 and 18.5.6.


Part (ii) follows from Definition 18.5.6 (iii, iv).


Part (iii) follows from parts (i) and (ii) and Theorem 18.5.4 (iii).


18.5.11 REMARK: Application of general absolute value function to define a general norm. 
The general absolute value function in Definition 18.5.6 is used for the general norm on modules over rings 
in Definition 19.6.2. 


18.5.12 REMARK: The relation of absolute value functions to metric functions. 

Conditions (ii), (iii) and (iv) in Definition 18.5.6 imply that the function $d: R \times R \to S$ which is defined by
$d: (x,y) \mapsto \phi(y - x)$ is a metric function on $R$ according to Definition 37.1.2. The symmetry of $d$ follows
from Theorem 18.5.10 (iii). 
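
As a quick illustration, the induced distance can be tested on a finite sample of rational numbers. This sketch assumes that the metric conditions of Definition 37.1.2 are the usual ones (non-negativity, zero distance exactly for equal points, symmetry and the triangle inequality), and it uses Python's built-in `abs` on `Fraction` values as the absolute value function. The names `d` and `is_metric_on` are illustrative.

```python
from fractions import Fraction
from itertools import product

def d(x, y):
    """Distance induced by the absolute value: d(x, y) = |y - x|."""
    return abs(y - x)

def is_metric_on(sample):
    for x, y, z in product(sample, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False          # non-negativity, and zero only for equal points
        if d(x, y) != d(y, x):
            return False          # symmetry
        if d(x, z) > d(x, y) + d(y, z):
            return False          # triangle inequality
    return True

sample = [Fraction(p, q) for p in range(-4, 5) for q in (1, 2, 3)]
print(is_metric_on(sample))
```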


18.5.13 REMARK: The “standard” absolute value function on an ordered ring. 

On an ordered ring, the domain and range of an absolute value function can be the same ring. In this special 
case, the familiar style of definition for an absolute value function on the integers, rational numbers or real 
numbers satisfies the requirements for an abstract absolute value function. This is shown in Theorem 18.5.15. 


18.5.14 DEFINITION: The standard absolute value (function) on an ordered ring $(R,\sigma,\tau,<)$ is the function
$\phi: R \to R$ which satisfies

$$\forall x \in R,\quad \phi(x) = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0. \end{cases}$$


18.5.15 THEOREM: The standard absolute value function on an ordered ring is an absolute value function. 
The standard absolute value function on an ordered ring R is an absolute value function on R, valued in R. 


PROOF: Let $\phi: R \to R$ be the standard absolute value function on an ordered ring $(R,\sigma,\tau,<)$. Then
Definition 18.5.6 (iii) is satisfied because $\phi(0) = 0$.


Let $x \in R \setminus \{0\}$. If $x > 0$, then $\phi(x) = x > 0$. If $x < 0$, then $\phi(x) = -x > 0$ by Theorem 18.3.10 (iii).
Therefore Definition 18.5.6 (iv) is satisfied.


Let $x,y \in R$ with $x \ge 0$ and $y \ge 0$. Then $xy \ge 0$ by Theorem 18.3.10 (xv). So $\phi(xy) = xy = \phi(x)\phi(y)$.
Now let $x \ge 0$ and $y < 0$. Then $xy \le 0$ by Theorem 18.3.10 (xvi). So $\phi(xy) = -(xy) = x(-y) = \phi(x)\phi(y)$.
Similarly if $x < 0$ and $y \ge 0$ then $\phi(xy) = \phi(x)\phi(y)$. Finally let $x < 0$ and $y < 0$. Then $xy > 0$ by
Theorem 18.3.10 (xvii). So $\phi(xy) = xy = (-x)(-y) = \phi(x)\phi(y)$. Thus Definition 18.5.6 (i) is satisfied.

Let $x,y \in R$ with $x \ge 0$ and $y \ge 0$. Then $x + y \ge 0$. So $\phi(x+y) = x + y = \phi(x) + \phi(y)$. Therefore $\phi(x+y) \le
\phi(x) + \phi(y)$. Now suppose that $x \ge 0$ and $y < 0$. If $x + y \ge 0$, then $\phi(x+y) = x + y \le x - y = \phi(x) + \phi(y)$.
If $x + y < 0$, then $\phi(x+y) = -(x+y) \le x - y = \phi(x) + \phi(y)$. Thus $\phi(x+y) \le \phi(x) + \phi(y)$ in either case.
The case $x < 0$ and $y \ge 0$ follows by exchanging $x$ and $y$.
Lastly let $x < 0$ and $y < 0$. Then $\phi(x+y) = -(x+y) = \phi(x) + \phi(y)$. Thus $\phi(x+y) \le \phi(x) + \phi(y)$ in
every case. So Definition 18.5.6 (ii) is satisfied. Hence $\phi$ is an absolute value function on $R$, valued in $R$, by
Definition 18.5.6.


18.5.16 THEOREM: Some basic properties of the standard absolute value function on an ordered ring. 
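Let $\phi: R \to R$ be the standard absolute value function on an ordered ring $R$.


(i) $\forall x \in R$, $-\phi(x) \le x \le \phi(x)$. (I.e. $\forall x \in R$, $-|x| \le x \le |x|$.)
(ii) $\forall x \in R$, $-\phi(x) \le -x \le \phi(x)$. (I.e. $\forall x \in R$, $-|x| \le -x \le |x|$.)
(iii) $\forall x,y \in R$, $\phi(\phi(x) - \phi(y)) \le \phi(x - y)$. (I.e. $\forall x,y \in R$, $\bigl||x| - |y|\bigr| \le |x - y|$.)
(iv) $\forall x,y \in R$, $\phi(x) + \phi(y) = \max(\phi(x+y), \phi(x-y))$. (I.e. $\forall x,y \in R$, $|x| + |y| = \max(|x+y|, |x-y|)$.)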


PROOF: For part (i), let $x \in R$ with $x \ge 0$. Then $\phi(x) = x$ and so $x \le \phi(x)$. Also, $-\phi(x) = -x \le x$ by
Theorem 18.3.10 (iii). So $-\phi(x) \le x \le \phi(x)$. Now suppose that $x \le 0$. Then $-x \ge 0$. So $-\phi(-x) \le -x \le
\phi(-x)$ by the case $x \ge 0$. So $\phi(-x) \ge x \ge -\phi(-x)$ by Theorem 18.3.10 (ii). Therefore $-\phi(x) \le x \le \phi(x)$
by Theorem 18.5.10 (iii). Thus $-\phi(x) \le x \le \phi(x)$ in both cases.

Part (ii) follows from part (i) and Theorem 18.5.10 (iii). (Alternatively apply Theorem 18.3.10 (ii).)

For part (iii), let $x,y \in R$ with $x \ge 0$ and $y \ge 0$. If $x \ge y$ then $\phi(\phi(x) - \phi(y)) = \phi(x - y) = x - y \le \phi(x - y)$
by part (i). If $x \le y$ then $\phi(\phi(x) - \phi(y)) = \phi(x - y) = -(x - y) \le \phi(x - y)$ by part (ii).

Now suppose that $x \ge 0$ and $y \le 0$. If $x + y \ge 0$ then $\phi(\phi(x) - \phi(y)) = \phi(x + y) = x + y \le x - y \le \phi(x - y)$
by part (i). If $x + y \le 0$ then $\phi(\phi(x) - \phi(y)) = \phi(x + y) = -(x + y) \le x - y \le \phi(x - y)$ by part (i).
Suppose that $x \le 0$ and $y \ge 0$. If $x + y \ge 0$ then $\phi(\phi(x) - \phi(y)) = \phi(-x - y) = x + y \le -x + y \le \phi(x - y)$
by part (ii). If $x + y \le 0$ then $\phi(\phi(x) - \phi(y)) = \phi(-x - y) = -x - y \le -x + y \le \phi(x - y)$ by part (ii).
Suppose that $x \le 0$ and $y \le 0$. If $-x + y \ge 0$ then $\phi(\phi(x) - \phi(y)) = \phi(-x + y) = -x + y \le \phi(x - y)$
by part (ii). If $-x + y \le 0$ then $\phi(\phi(x) - \phi(y)) = \phi(-x + y) = x - y \le \phi(x - y)$ by part (i). Thus
$\phi(\phi(x) - \phi(y)) \le \phi(x - y)$ in all cases.

For part (iv), let $x,y \ge 0$. Then $\phi(x) + \phi(y) = x + y = \max(x + y, \phi(x - y)) = \max(\phi(x + y), \phi(x - y))$
because $x + y \ge \phi(x - y)$ since $x + y \ge x - y$ and $x + y \ge -(x - y)$. Let $x \ge 0$ and $y \le 0$. Then
$\phi(x) + \phi(y) = x - y = \max(\phi(x + y), x - y) = \max(\phi(x + y), \phi(x - y))$ because $\phi(x + y) \le x - y$ since
$x + y \le x - y$ and $-(x + y) \le x - y$. Similarly if $x \le 0$ and $y \ge 0$. Let $x,y \le 0$. Then $\phi(x) + \phi(y) =
-x - y = \max(-x - y, \phi(x - y)) = \max(\phi(x + y), \phi(x - y))$ because $-x - y \ge \phi(x - y)$ since $-x - y \ge x - y$
and $-x - y \ge -(x - y)$.
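
The four assertions of Theorem 18.5.16 can be spot-checked on finite samples of the ordered rings $\mathbb{Z}$ and $\mathbb{Q}$. The following is a minimal sketch, not a substitute for the proof. The function names `phi` and `check_properties` are illustrative, and the choice of samples is arbitrary.

```python
from fractions import Fraction
from itertools import product

def phi(x):
    """Standard absolute value: x if x >= 0, otherwise -x."""
    return x if x >= 0 else -x

def check_properties(sample):
    """Return True if parts (i)-(iv) hold for all pairs in the sample."""
    for x, y in product(sample, repeat=2):
        if not (-phi(x) <= x <= phi(x)):                        # part (i)
            return False
        if not (-phi(x) <= -x <= phi(x)):                       # part (ii)
            return False
        if not (phi(phi(x) - phi(y)) <= phi(x - y)):            # part (iii)
            return False
        if phi(x) + phi(y) != max(phi(x + y), phi(x - y)):      # part (iv)
            return False
    return True

integers = range(-8, 9)
rationals = [Fraction(p, q) for p in range(-5, 6) for q in (1, 2, 3)]
print(check_properties(integers), check_properties(rationals))
```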


18.5.17 REMARK: The standard absolute value functions for the real and complex numbers. 

Since the real number system $\mathbb{R}$ is an ordered ring as a consequence of the basic properties in Theorem 15.9.3,
and the “absolute value function for $\mathbb{R}$” in Definition 16.5.2 agrees exactly with Definition 18.5.14, it follows
from Theorem 18.5.15 that the “absolute value function for $\mathbb{R}$” is a valid absolute value function for $\mathbb{R}$, and
the assertions of Theorem 18.5.16 are therefore applicable.


Not so straightforward is the absolute value function for complex numbers in Definition 16.8.8. A particular 
difference is that the complex number system $(\mathbb{C},\sigma_{\mathbb{C}},\tau_{\mathbb{C}})$ in Definition 16.8.1 is not an ordered ring, although
it is a unitary ring according to Definition 18.2.2. However, the value of an absolute value function is required 
to lie in an ordered ring, which in this case is the real number system. This shows the motivation for defining 
absolute value functions in Definition 18.5.6 in terms of two rings, one for the domain and one for the range. 


18.5.18 THEOREM: The complex-number standard absolute value function is an absolute value function.
The standard absolute value function for the complex numbers is an absolute value function on the ring of
complex numbers, valued in the ordered ring $\mathbb{R}$.


PROOF: The complex number system $\mathbb{C} -\!\!< (\mathbb{C},\sigma_{\mathbb{C}},\tau_{\mathbb{C}})$ in Definition 16.8.1 is a ring by Theorem 18.2.19 and
Definition 18.2.11. The standard absolute value function $|\cdot|$ for $\mathbb{C}$ in Definition 16.8.8 is valued in $\mathbb{R}$, which is
an ordered ring by Theorem 15.9.3 and Definition 18.3.3. The product rule in Definition 18.5.6 (i) follows from
Theorem 16.8.9 (i). The triangle inequality in Definition 18.5.6 (ii) follows from Theorem 16.8.9 (ii). The
non-negativity and positivity conditions in Definition 18.5.6 (iii, iv) are evident from the form of Definition 16.8.8.
Hence $|\cdot|$ is an absolute value function on the ring $\mathbb{C}$.


18.6. Ordered unitary rings 


18.6.1 REMARK: Many definitions for general rings are immediately applicable to ordered rings. 
It is not necessary to re-define orderings, positive cones and absolute value functions for unitary rings. 
Definitions 18.3.2, 18.3.3, 18.3.17 and 18.5.6 for rings are immediately applicable to unitary rings. 


18.6.2 REMARK: Consequences of the existence of an ordering on a ring. 
The existence of an ordering is a strong constraint on a ring. For example, any non-zero element $x$ of an
ordered ring $R$ which satisfies $xx = x$ is an identity for that ring.


18.6.3 THEOREM: A simple condition which guarantees existence of an identity for an ordered ring. 
Let $(R,\sigma,\tau,<)$ be an ordered ring. Let $x \in R \setminus \{0_R\}$ satisfy $xx = x$. Then $x$ is an identity for $R$. Hence $R$
is an ordered non-zero unitary ring.


PROOF: Let $(R,\sigma,\tau,<)$ be an ordered ring. Let $x \in R \setminus \{0_R\}$ satisfy $xx = x$. Suppose $y \in R$ satisfies $xy \ne y$.
Then $xy < y$ or $xy > y$. Suppose $xy < y$. If $x > 0$ then $xxy < xy$ by Definition 18.3.2 (ii). So $xy < xy$, which
is a contradiction. In the case that $x < 0$, note that $xy < y$ implies that $-y < -(xy)$ by Theorem 18.3.10 (ii).
So $(-x)(-y) < (-x)(-(xy))$. So $xy < xxy$ by Theorem 18.1.3 (ii). So $xy < xy$, which is a contradiction.
Similar contradictions follow if it is assumed that $xy > y$. Therefore $xy = y$. Similarly, $yx = y$ for all $y \in R$.
Hence $x$ is a multiplicative identity for $R$.


18.6.4 THEOREM: Some basic properties of ordered unitary rings. 

Let $R -\!\!< (R,\sigma,\tau,<)$ be an ordered unitary ring which is not a zero ring.

(i) $0_R < 1_R$.
(ii) $\forall x \in R \setminus \{0_R\}$, $\forall \lambda \in R$, ($\lambda x = x \Rightarrow \lambda = 1_R$).


PROOF: To show part (i), note that $0_R \ne 1_R$ by Theorem 18.2.6. So either $0_R < 1_R$ or $1_R < 0_R$. Suppose
that $1_R < 0_R$. Then $0_R < -1_R$ by Definition 18.3.2 (i) with $c = -1_R$. So $-1_R < 0_R$ by Definition 18.3.2 (ii)
with $a = 1_R$, $b = 0_R$ and $c = -1_R$. This contradicts the definition of a total order. Therefore $0_R < 1_R$.

For part (ii), let $x \in R \setminus \{0_R\}$ and $\lambda \in R$. Suppose that $\lambda x = x$. Then $\lambda x = 1_R x$. Therefore $\lambda = 1_R$ by
Theorem 18.3.10 (xiii).


18.6.5 REMARK: Embedding the integers as a subring in a unitary ring. 

The integers may be embedded in any non-zero ordered unitary ring, as mentioned in Remark 18.2.9. That
is, every non-zero ordered unitary ring contains a subring which is order-isomorphic to the ordered ring of 
integers. This is shown in Theorem 18.6.6. It follows that there are no finite non-zero ordered unitary rings. 


18.6.6 THEOREM: Unital morphism for ordered unitary ring is an ordered unitary ring monomorphism.
Let $R$ be a non-zero ordered unitary ring. Then the unital morphism for $R$ is an ordered unitary ring
monomorphism.


PROOF: Let $\psi: \mathbb{Z} \to R$ be the unital morphism for $R$. (See Definition 18.2.11.) Then $\psi: \mathbb{Z} \to R$ is a
unitary ring homomorphism by Theorem 18.2.10. But $1_R \ne 0_R$ because $R$ is a non-zero unitary ring. So $\psi$
is injective by Theorem 18.3.15. Therefore $\psi: \mathbb{Z} \to R$ is a unitary ring monomorphism.


For the additive ordered group structure of $R$, note that $\psi: \mathbb{Z} \to R$ is an ordered group monomorphism
by Theorem 17.5.12 (v) because $1_R > 0_R$ by Theorem 18.6.4 (i). Hence $\psi: \mathbb{Z} \to R$ is an ordered unitary ring
monomorphism.
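
The embedding of the integers can be sketched concretely for the ordered unitary ring $\mathbb{Q}$. The code below assumes that the unital morphism of Definition 18.2.11 sends an integer $n$ to the $n$-fold sum of the identity (or of its negative when $n < 0$). The names `unital_morphism` and `psi` are hypothetical, and only a finite range of integers is tested.

```python
from fractions import Fraction

def unital_morphism(n, one, zero):
    """Return the n-fold sum of `one` (or of -one for negative n)."""
    total = zero
    step = one if n >= 0 else -one
    for _ in range(abs(n)):
        total = total + step
    return total

def psi(n):
    # Unital morphism into the ordered unitary ring of rational numbers.
    return unital_morphism(n, Fraction(1), Fraction(0))

ns = range(-10, 11)
images = [psi(n) for n in ns]
strictly_increasing = all(a < b for a, b in zip(images, images[1:]))
additive = all(psi(m + n) == psi(m) + psi(n) for m in ns for n in ns)
multiplicative = all(psi(m * n) == psi(m) * psi(n) for m in ns for n in ns)
print(strictly_increasing, additive, multiplicative)
```

Strict monotonicity on the sample implies injectivity on the sample, which is the finite shadow of the monomorphism property.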


18.6.7 REMARK: Consequences of unitary ring structure for absolute value functions. 
Absolute value functions on rings have some additional properties for unitary rings. 


18.6.8 THEOREM: Absolute value functions for unitary rings map identity to identity. 
Let $\phi: R \to S$ be an absolute value function on a non-zero unitary ring $R$, valued in an ordered ring $S$.
Then $S$ is a non-zero unitary ring and $\phi(1_R) = 1_S$.


PROOF: Let $\phi: R \to S$ be an absolute value function, where $R$ is a non-zero unitary ring and $S$ is
an ordered ring. Suppose $\phi(1_R) \cdot y \ne y$ for some $y \in S$. Then either $\phi(1_R) \cdot y < y$ or $\phi(1_R) \cdot y > y$.
Suppose $\phi(1_R) \cdot y < y$. Then $\phi(1_R) \cdot \phi(1_R) \cdot y < \phi(1_R) \cdot y$ by Definition 18.3.2 (ii) and Definition 18.5.6 (iv).
But $\phi(1_R) \cdot \phi(1_R) = \phi(1_R)$ by Definition 18.5.6 (i). So $\phi(1_R) \cdot y < \phi(1_R) \cdot y$, which contradicts the total order
on $S$. Similarly, the assumption $\phi(1_R) \cdot y > y$ yields a contradiction. Therefore $\phi(1_R) \cdot y = y$. Similarly,
$y \cdot \phi(1_R) = y$ for all $y \in S$. Hence $\phi(1_R)$ is a multiplicative identity for $S$. So $S$ is a non-zero unitary ring
and $\phi(1_R) = 1_S$.


18.6.9 REMARK: The trivial absolute value function. 

Absolute value functions are typically real-valued, but the value may lie in any ordered ring S, as presented 
in Definition 18.5.6. If a ring R has no zero divisors and the value space S is a unitary ring, then a trivial 
absolute value function is always a well-defined absolute value function on R. 


18.6.10 DEFINITION: The trivial absolute value (function) on a ring $(R,\sigma_R,\tau_R)$ with no zero divisors,
valued in an ordered unitary ring $S -\!\!< (S,\sigma_S,\tau_S,<)$, is the function $\phi: R \to S$ which satisfies

$$\forall x \in R,\quad \phi(x) = \begin{cases} 0_S & \text{if } x = 0_R \\ 1_S & \text{otherwise.} \end{cases}$$


18.6.11 THEOREM: The trivial absolute value function on a cancellative ring is an absolute value function.
Let $\phi: R \to S$ be the trivial absolute value function on a ring $(R,\sigma_R,\tau_R)$ with no zero divisors, valued in
an ordered unitary ring $(S,\sigma_S,\tau_S,<)$. Then $\phi$ is an absolute value function from $R$ to $S$.


PROOF: Let $\phi: R \to S$ be the trivial absolute value function on a ring $R -\!\!< (R,\sigma_R,\tau_R)$, valued in an
ordered unitary ring $S -\!\!< (S,\sigma_S,\tau_S,<)$. Then $\phi(0_R) = 0_S$, which is part (iii) of Definition 18.5.6. Part (iv)
follows from Theorem 18.6.4 (i).

To show part (i) of Definition 18.5.6, let $x,y \in R$. If $x = 0_R$ or $y = 0_R$, then $\phi(xy) = 0_S = \phi(x)\phi(y)$. If
$x \ne 0_R$ and $y \ne 0_R$, then $xy \ne 0_R$ because $R$ has no zero divisors. So $\phi(xy) = 1_S = \phi(x)\phi(y)$.

To show part (ii) of Definition 18.5.6, let $x,y \in R$. If $x = 0_R$, then $\phi(x+y) = \phi(y) = \phi(x) + \phi(y)$, and
similarly if $y = 0_R$. If $x \ne 0_R$ and $y \ne 0_R$, then $\phi(x+y) \le 1_S \le 1_S + 1_S = \phi(x) + \phi(y)$ because $0_S < 1_S$.
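
The role of the "no zero divisors" hypothesis can be seen in a small computation. The sketch below takes the value space to be $\mathbb{Z}$ and the domain to be the ring of integers modulo $n$, which is only one convenient choice of test rings. The helpers `trivial_abs`, `satisfies_axioms` and `mod_ring` are hypothetical names.

```python
from itertools import product

def trivial_abs(x):
    """Trivial absolute value into Z: 0 at zero, 1 elsewhere."""
    return 0 if x == 0 else 1

def satisfies_axioms(elements, add, mul, phi):
    """Check Definition 18.5.6 (i)-(iv) for phi on a finite list of ring elements."""
    zero_ok = phi(0) == 0 and all(phi(x) > 0 for x in elements if x != 0)
    product_ok = all(phi(mul(x, y)) == phi(x) * phi(y)
                     for x, y in product(elements, repeat=2))
    triangle_ok = all(phi(add(x, y)) <= phi(x) + phi(y)
                      for x, y in product(elements, repeat=2))
    return zero_ok and product_ok and triangle_ok

def mod_ring(n):
    """Elements, addition and multiplication of the ring of integers modulo n."""
    return (list(range(n)),
            lambda x, y: (x + y) % n,
            lambda x, y: (x * y) % n)

print(satisfies_axioms(*mod_ring(5), trivial_abs))   # Z/5 has no zero divisors
print(satisfies_axioms(*mod_ring(6), trivial_abs))   # 2 * 3 = 0 in Z/6
```

In the second case the product rule fails for the pair $(2,3)$, since the product is zero while both factors have value 1.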


18.6.12 REMARK: Ordered integral domains. 

Integral domains are defined as non-trivial cancellative commutative unitary rings in Definition 18.2.15. 
So the ordered integral domains in Definition 18.6.13 are ordered rings which are non-trivial cancellative 
commutative unitary rings if the order structure is removed. A prime example of an ordered integral domain 
is the integer number system $\mathbb{Z}$. The Gaussian integers in Definition 18.2.17 cannot be given a compatible
ordering.


A non-trivial ordered integral domain whose positive cone is well ordered is called an “integral system” in 
Definition 18.6.14. The name is well justified by Theorem 18.6.15, which states that all integral systems 
are ordered-unitary-ring-isomorphic to the ordered unitary ring of integers. (See also Pinter [122], page 210; 
MacLane/Birkhoff [110], page 264.) Commutativity is not required in order to prove this isomorphism. (The
relation of integral systems to other ring categories is illustrated in Figure 18.1.1 in Remark 18.1.1.)


18.6.13 DEFINITION: An ordered integral domain is an ordered ring $(R,\sigma,\tau,<)$ such that the underlying
ring $(R,\sigma,\tau)$ is an integral domain.


18.6.14 DEFINITION: 
An integral system is a non-trivial ordered unitary ring whose positive cone is well ordered. 


18.6.15 THEOREM: All integral systems are ordered-unitary-ring-isomorphic to the integers. 
For every integral system R, there is an ordered unitary ring isomorphism from the ordered unitary ring Z 
to R. 


PROOF: Let $R$ be an integral system. Let $\psi: \mathbb{Z} \to R$ be the unital morphism in Definition 18.2.11. Then
$\psi$ is an ordered unitary ring monomorphism by Theorem 18.6.6. Let $X = \mathrm{Range}(\psi)$.


Let $x \in R \setminus X$. Then $-x \in R \setminus X$ because otherwise $\psi(n) = -x$ for some $n \in \mathbb{Z}$ and so $\psi(-n) = -(-x) =
x \in X$. Let $P = \{y \in R;\ y > 0_R\}$. Then either $x \in P$ or $-x \in P$ because “$<$” is a total order. Therefore
$P \setminus X \ne \emptyset$. But $P$ is well-ordered. So $P \setminus X$ contains a minimum element $z$. But $z \ne 1_R$ because $z \notin X$
and $1_R \in X$. So either $z < 1_R$ or $z > 1_R$. Suppose that $z > 1_R$. Then $z - 1_R > 0_R$. So $z - 1_R \in P$, and
$z - 1_R \notin X$ because otherwise $z = (z - 1_R) + 1_R \in X$. But
$z - 1_R < z$, which then contradicts the assumption that $z = \min(P \setminus X)$. So suppose that $z < 1_R$. Then
$z \cdot z > 0_R$ by Theorem 18.3.10 (vi) because $z \ne 0_R$. But $z \cdot z < 1_R$ because $z > 0_R$ and $z < 1_R$ implies $z \cdot z < z$
by Definition 18.3.2 (ii). Suppose that $z \cdot z \in X$. Then $0_R < \psi(n) < 1_R$ for some $n \in \mathbb{Z}$. This is impossible
because $\psi$ is an ordered ring homomorphism. (So $n \le 0$ implies $\psi(n) \le 0_R$ and $n \ge 1$ implies $\psi(n) \ge 1_R$.)
Therefore $z \cdot z \in P \setminus X$ and $z \cdot z < z$, which contradicts the assumption that $z = \min(P \setminus X)$. So $R \setminus X = \emptyset$.
Therefore $\psi$ is surjective. Hence $\psi$ is an ordered unitary ring isomorphism from $\mathbb{Z}$ to $R$.


18.6.16 THEOREM: All integral systems are ordered integral domains. 
Every integral system is an ordered integral domain. 


PROOF: By Theorem 18.6.15, every integral system is ordered-unitary-ring-isomorphic to $\mathbb{Z}$. But $\mathbb{Z}$ is an
ordered integral domain. The assertion follows. 


18.7. Fields 


18.7.1 REMARK: Division rings are almost fields, but lack commutativity. 

The division rings in Definition 18.7.2 probably have more in common with fields than with general rings.
So it seems reasonable to present them together. The classic example of a non-commutative division ring is
the system of quaternions.


18.7.2 DEFINITION: A division ring is a non-trivial unitary ring in which every non-zero element has a
multiplicative inverse.


18.7.3 DEFINITION: A field is a tuple $K -\!\!< (K,\sigma,\tau)$ which satisfies:


(i) $(K,\sigma)$ is a commutative group (written additively).
(ii) $(K,\tau)$ is a commutative semigroup (written multiplicatively).
(iii) $(K_0,\tau_0)$ is a group, where $K_0 = K \setminus \{0_K\}$ and $\tau_0 = \tau|_{K_0 \times K_0}$.
(iv) $\forall a,b,c \in K$, $a(b + c) = ab + ac$ and $(a + b)c = ac + bc$.


18.7.4 REMARK: The conditions which define a field. 

Condition (iii) in Definition 18.7.3 excludes the possibility that the zero ring could be a field. (As mentioned 
in Remark 18.2.5, the zero ring is a unitary ring.) If the multiplicative semigroup $(K,\tau)$ in condition (ii) is
not required to be commutative, the resulting system is a division ring. Thus a field is the same thing as a 
commutative division ring. A field is also the same thing as a non-trivial commutative unitary ring in which 
every non-zero element has a multiplicative inverse. 


18.7.5 THEOREM: A field is the same thing as a commutative division ring. 
A tuple $K -\!\!< (K,\sigma,\tau)$ is a field if and only if it is a commutative division ring.


PROOF: Let $K -\!\!< (K,\sigma,\tau)$ be a field. Then $K$ is a ring by Definition 18.7.3 (i, ii, iv) and Definition 18.1.2,
and $K$ is a non-trivial ring by Definition 18.7.3 (iii) since the group $K_0$ must be non-empty. So $\#(K) \ge 2$.
Then K is a unitary ring by Definition 18.7.3 (iii) and Definition 18.2.2. Every non-zero element of K has 
a multiplicative inverse by Definition 18.7.3 (iii). So K is a division ring by Definition 18.7.2. Hence K is a 
commutative division ring. The proof of the converse is equally unsurprising. 


18.7.6 REMARK: The choice of name for fields. 

The choice of the word “field” in English for commutative division rings is unfortunate because it clashes 
badly with the use of the word “field” in physics for a function of space or space-time. A better name for a 
field in algebra would have been an “arithmetic”, since the Greek word ἀριθμός means “number”, and fields
in algebra are systems which behave essentially as we expect numbers to behave. Alternatively, the word 
“numeric” could be borrowed from Latin. 


Bell [233], page 355, writing in 1937, said that a field was sometimes called a “corpus” in English, which 
corresponds to the German “Körper” and the French “corps”. 


18.7.7 EXAMPLE: Some well-known fields. 

The set of rational numbers Q (Section 15.1) with the usual addition and multiplication operations is a field. 
The set of real numbers R (Section 15.3) and the set of complex numbers C (Section 16.8) with the usual 
addition and multiplication operations are also fields. 


18.7.8 THEOREM: Some very basic properties of fields. 
Let K be a field. Then the following properties hold. 


(i) $\forall a \in K$, $a 0_K = 0_K a = 0_K$.
(ii) $\forall a,b \in K$, ($ab = 0_K \Rightarrow (a = 0_K \lor b = 0_K)$). In other words, there are no zero divisors in a field.
(iii) $\#(K) \ge 2$.


PROOF: For part (i), let $a \in K$. Then by Definition 18.7.3 (iv), $a(0_K + 0_K) = a0_K + a0_K$. But $0_K + 0_K = 0_K$
by definition of the zero of an additive group. So $a0_K = a(0_K + 0_K) = a0_K + a0_K$. So $a0_K + (-(a0_K)) =
a0_K + a0_K + (-(a0_K))$. So $0_K = a0_K + 0_K = a0_K$. Hence $a0_K = 0_K$. Similarly, $0_K a = 0_K$.

For part (ii), note that $K \setminus \{0_K\}$ is closed under multiplication by Definition 18.7.3 (iii). So if $a \ne 0_K$ and
$b \ne 0_K$, then $ab \ne 0_K$.

Part (iii) follows from the fact that $K$ is a unitary ring by Theorem 18.7.5 and Definition 18.7.2, and so
$\#(K) \ge 2$ by Theorem 18.2.6 (ii).


18.7.9 REMARK: The characteristic of a field.

For any field $K$, one may inductively construct the sequence $(a_i)_{i=0}^\infty$ satisfying $a_0 = 0_K$ and $a_{n+1} = a_n + 1_K$
for $n \in \mathbb{Z}_0^+$, using the addition operation of $K$. Using the total order on $\mathbb{Z}^+$, one may define an extended
integer $N \in \mathbb{Z}^+ \cup \{\infty\}$ as the minimum $n \in \mathbb{Z}^+$ for which $a_n = 0_K$. If no such sequence element exists, one may
say that this minimum is “infinity”. Alternatively, one may define $N = \#(\mathrm{Range}(a))$. This is called the
“characteristic” of the field, but to avoid talking about “infinity”, this pseudo-value is replaced with the
integer 0. (More accurately, the value 0 signifies “none of the above”.)


Since every field is a non-trivial unitary ring, the unital morphism in Definition 18.2.11 is well defined for
all fields. The informal expression $\sum_{i=1}^n 1_K$ in Definition 18.7.10 is a convenient shorthand for $\psi(n)$, where
$\psi$ is the unique unital morphism for $K$. (See Definition 18.2.13 for the corresponding definition of the
characteristic of a non-trivial unitary ring, which is a generalisation of Definition 18.7.10.)


The characteristic of a field cannot equal 1 because that would imply $1_K = 0_K$, which is inconsistent with
Definition 18.7.3 (iii). However, the two-element field $K = \{0_K, 1_K\}$ is valid and has characteristic 2. Or
to put it another way, the characteristic of any non-trivial unitary ring cannot equal 1, and a field is a
non-trivial unitary ring. So the consequence is clear.


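18.7.10 DEFINITION: The characteristic of a field $K$ is the positive integer $\min\{n \in \mathbb{Z}^+;\ \sum_{i=1}^n 1_K = 0_K\}$
if $\{n \in \mathbb{Z}^+;\ \sum_{i=1}^n 1_K = 0_K\} \ne \emptyset$, or the integer 0 otherwise.


18.7.11 NOTATION: $\mathrm{char}(K)$, for a field $K$, denotes the characteristic of $K$. In other words,

$$\mathrm{char}(K) = \begin{cases} \min\{n \in \mathbb{Z}^+;\ \sum_{i=1}^n 1_K = 0_K\} & \text{if } \exists n \in \mathbb{Z}^+,\ \sum_{i=1}^n 1_K = 0_K \\ 0 & \text{otherwise.} \end{cases}$$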
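
The inductive sequence $a_0 = 0_K$, $a_{n+1} = a_n + 1_K$ translates directly into a loop. The sketch below is illustrative only: the function name `characteristic` and the cut-off parameter `max_steps` are assumptions of this example. In particular, a return value of 0 only means that no repetition was found within the cut-off, which is not a proof of characteristic 0.

```python
from fractions import Fraction

def characteristic(one, zero, add, max_steps=1000):
    """Return the least n >= 1 with a_n == zero, where a_0 = zero and a_{n+1} = a_n + one.

    Returns 0 if no such n is found within max_steps iterations (a search cut-off).
    """
    a = zero
    for n in range(1, max_steps + 1):
        a = add(a, one)
        if a == zero:
            return n
    return 0

def mod_add(p):
    """Addition in the field of integers modulo a prime p."""
    return lambda a, b: (a + b) % p

print(characteristic(1, 0, mod_add(2)))                           # two-element field
print(characteristic(1, 0, mod_add(7)))                           # integers modulo 7
print(characteristic(Fraction(1), Fraction(0), lambda a, b: a + b))  # rationals, up to cut-off
```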


18.7.12 REMARK:  Subfields. 

The structure of subfields of the real numbers and complex numbers is remarkably complicated. Most of the 
time, this complexity can be ignored in differential geometry. Theorem 18.7.14 asserts a minimal set of tests 
for a subset of a field (together with the corresponding subsets of its operators) to be a subfield. Most fields 
of interest are subfields of the complex number field. 


18.7.13 DEFINITION: A subfield of a field $K -\!\!< (K,\sigma,\tau)$ is a field $K' -\!\!< (K',\sigma',\tau')$ such that $K' \subseteq K$,
$\sigma' \subseteq \sigma$ and $\tau' \subseteq \tau$.


18.7.14 THEOREM: Alternative conditions for a sub-system of a field to be a subfield. 
Let $K -\!\!< (K,\sigma,\tau)$ be a field. Then $K' -\!\!< (K',\sigma',\tau')$ is a subfield of $K$ if and only if

(i) $K' \subseteq K$, $\sigma' = \sigma|_{K' \times K'}$ and $\tau' = \tau|_{K' \times K'}$,
(ii) $\#(K') \ge 2$,
(iii) $\forall a,b \in K'$, $\sigma(a,-b) \in K'$, and
(iv) $\forall a,b \in K' \setminus \{0_K\}$, $\tau(a,b^{-1}) \in K'$,

where $-b$ and $b^{-1}$ denote respectively the additive and multiplicative inverses within $K$.
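
The subfield test can be tried out on a small finite field. The following sketch uses the four-element field, with elements encoded as the integers 0 to 3, addition given by bitwise exclusive-or, and a hard-coded multiplication table for the polynomials over the two-element field modulo $x^2 + x + 1$. This encoding and the helper names are assumptions of the example. Condition (i) holds automatically because the operations of the subset are taken to be restrictions of the field operations.

```python
# Multiplication table of the four-element field: 2 stands for x, 3 for x + 1.
MUL = [[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]

def add(a, b):
    return a ^ b              # addition of coefficient vectors modulo 2

def neg(b):
    return b                  # every element is its own additive inverse

def mul(a, b):
    return MUL[a][b]

def inv(b):
    """Multiplicative inverse of a non-zero element, found by table search."""
    return next(c for c in range(1, 4) if MUL[b][c] == 1)

def is_subfield(subset):
    s = set(subset)
    if len(s) < 2:                                              # condition (ii)
        return False
    if any(add(a, neg(b)) not in s for a in s for b in s):      # condition (iii)
        return False
    nonzero = s - {0}
    return all(mul(a, inv(b)) in s for a in nonzero for b in nonzero)  # condition (iv)

for candidate in ({0, 1}, {0, 1, 2}, {0, 2}, {0, 1, 2, 3}):
    print(sorted(candidate), is_subfield(candidate))
```

Only the two-element prime field and the whole field pass the test, in line with the fact that the four-element field has no other subfields.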


PROOF: Let $K -\!\!< (K,\sigma,\tau)$ be a field, and let $K' -\!\!< (K',\sigma',\tau')$ be a subfield of $K$. By Definition 18.7.13,
$K' \subseteq K$, $\sigma' \subseteq \sigma$ and $\tau' \subseteq \tau$. By Definition 18.7.3 (i, ii) and Definitions 17.1.1 and 17.3.2, $\mathrm{Dom}(\sigma') = K' \times K'$
and $\mathrm{Dom}(\tau') = K' \times K'$. Therefore $\sigma' = \sigma|_{K' \times K'}$ and $\tau' = \tau|_{K' \times K'}$. This verifies (i), while (ii) follows from
Theorem 18.7.8 (iii).

To verify (iii) and (iv), first note that the additive and multiplicative identities $0_K$ and $1_K$ within $K$ are equal
to the respective additive and multiplicative identities $0_{K'}$ and $1_{K'}$ within $K'$ because $\sigma' \subseteq \sigma$ and $\tau' \subseteq \tau$.
Similarly the additive and multiplicative inverses $-b$ and $b^{-1}$ within $K$ are the respective additive and
multiplicative inverses within $K'$. Let $a,b \in K'$. Then $\sigma(a,-b)$ is a well-defined element of $K$. But
$\sigma'(a,-b) = \sigma(a,-b)$ because $\sigma' = \sigma|_{K' \times K'}$. Then $\sigma(a,-b) \in K'$ because $\mathrm{Range}(\sigma') \subseteq K'$ by Definitions
18.7.13 and 17.3.2. This verifies (iii). Similarly, if $a \in K'$ and $b \in K' \setminus \{0_K\}$, then $\tau(a,b^{-1})$ is a well-defined
element of $K$ which is also an element of $K'$ because $\tau' = \tau|_{K' \times K'}$. Then $\tau(a,b^{-1}) = \tau'(a,b^{-1}) \in K'$ by
Definitions 18.7.13 and 17.1.1. This verifies (iv).

Conversely, assume that the tuple $(K',\sigma',\tau')$ satisfies (i), (ii), (iii) and (iv). The inclusions $K' \subseteq K$,
$\sigma' \subseteq \sigma$ and $\tau' \subseteq \tau$ follow from (i). It follows from (ii) that $K'$ contains at least one element $a$. Then
$0_K = \sigma(a,-a) \in K'$ by (iii). By (ii), it then follows that there exists $b \in K'$ with $b \ne 0_K$, and by (iv), it
follows that $1_K = \tau(b,b^{-1}) \in K'$. Then $\sigma'(a,0_K) = \sigma(a,0_K) = a$ and $\sigma'(0_K,a) = \sigma(0_K,a) = a$ for all
$a \in K'$ by (i). So $0_K$ has the properties required for an additive identity for $(K',\sigma')$.

Let $a \in K'$. Then $-a \in K$ satisfies $\sigma(a,-a) = \sigma(-a,a) = 0_K = 0_{K'}$. But $-a = \sigma(0_K,-a) \in K'$ by (iii).
Therefore $\sigma'(a,-a) = \sigma'(-a,a) = 0_{K'}$ by (i). Thus every element of $K'$ has an additive inverse. The closure
condition $\mathrm{Range}(\sigma') \subseteq K'$ follows by noting that $\sigma(a,b) = \sigma(a,-(-b)) \in K'$ by (iii). Therefore $(K',\sigma')$
is a commutative group by Definitions 17.3.2 and 17.3.23. (Note that commutativity and associativity are
inherited automatically by $\sigma'$ from $\sigma$.) This verifies Definition 18.7.3 (i).

Let $a,b \in K'$. If $b = 0_K$, then $\tau'(a,b) = \tau'(a,0_K) = \tau(a,0_K) = 0_K$ by (i) and Theorem 18.1.3 (i).
So $\tau'(a,b) \in K'$. If $b \ne 0_K$, then $b$ has a unique multiplicative inverse $b^{-1}$ in $K$. Then $b^{-1} = \tau(1_K,b^{-1}) \in K'$
by (iv). So $\tau'(a,b) = \tau'(a,(b^{-1})^{-1}) = \tau(a,(b^{-1})^{-1}) \in K'$ by (iv). Thus $\mathrm{Range}(\tau') \subseteq K'$. The commutativity
and associativity of $\tau'$ are inherited automatically from $\tau$. Therefore $(K',\tau')$ is a commutative semigroup
by Definitions 17.1.1 and 17.1.15. This verifies Definition 18.7.3 (ii).

Let $K'_0 = K' \setminus \{0_K\} = K' \setminus \{0_{K'}\}$ and $\tau'_0 = \tau'|_{K'_0 \times K'_0}$. Then $\tau'_0 = \tau|_{K'_0 \times K'_0}$. Let $a,b \in K'_0$. Then
$\tau'_0(a,b) = \tau'(a,b) = \tau(a,b) \ne 0_K$ by Theorem 18.7.8 (ii). So $\mathrm{Range}(\tau'_0) \subseteq K'_0$. Therefore $(K'_0,\tau'_0)$ is a
semigroup by Definition 17.1.1 because $\tau'_0$ inherits associativity from $\tau$. Since $1_K \ne 0_K$ and $1_K \in K'$, it
follows that $1_K \in K'_0$. The multiplicative identity property of $1_K$ within $K'_0$ is inherited from $K$. Let $a \in K'_0$.
Then $a$ has a unique multiplicative inverse $a^{-1}$ in $K$, and $a^{-1} = \tau(1_K,a^{-1}) \in K'$ by (iv). So $\tau'_0(a,a^{-1}) =
\tau'(a,a^{-1}) = \tau(a,a^{-1}) = 1_K$. Therefore $\tau'_0(a^{-1},a) = 1_K$ by the commutativity of $\tau'_0$, which is inherited from
the commutativity of $\tau$. So every element of $K'_0$ has an inverse within $K'_0$. Therefore $(K'_0,\tau'_0)$ is a group by
Definition 17.3.2. This verifies Definition 18.7.3 (iii).

Since the distributivity of $(K',\sigma',\tau')$ is inherited from the distributivity of $(K,\sigma,\tau)$, Definition 18.7.3 (iv) is
verified. Hence $(K',\sigma',\tau')$ is a field by Definition 18.7.3.


18.8. Ordered fields 


18.8.1 REMARK: Applications of ordered fields. 

Ordered fields are applied in Definition 26.9.7 for line segments in affine spaces, and in Section 22.11 to 
define convexity in linear spaces. (For ordered fields, see also MacLane/Birkhoff [110], page 261; Lang [108], 
page 449; Curtis [65], page 4; Spivak [140], page 489; EDM2 [113], 149.N, page 581.) 


18.8.2 REMARK: The non-necessity of defining some ring concepts for fields. 

Strictly speaking, it is not necessary to re-define orderings and positive cones of fields because every field 
is a ring. So Definitions 18.3.2, 18.3.3 and 18.3.17 effectively also define an ordering on a field, an ordered 
field, and a positive cone of a field respectively. Nevertheless, the corresponding Definitions 18.8.3, 18.8.4 
and 18.8.9 are rewritten here for fields. 


18.8.3 DEFINITION: An ordering of a field $(K,\sigma,\tau)$ is a total order $R$ on $K$ such that


(i) $\forall a,b,c \in K$, ($a < b \Rightarrow a + c < b + c$),
(ii) $\forall a,b,c \in K$, ($(a < b \land c > 0) \Rightarrow ac < bc$),


where $R$ is denoted as “$<$” and $R^{-1}$ is denoted as “$>$”.


18.8.4 DEFINITION: An ordered field is a tuple $(K,\sigma,\tau,R)$ where $(K,\sigma,\tau)$ is a field and $R$ is an ordering
of the field $(K,\sigma,\tau)$.


18.8.5 REMARK: Notation for orderings on fields. 
Definition 18.8.3 means that an ordering of a field is a total order which obeys two distributive rules with 
respect to the addition and multiplication. (See Definition 11.5.1 for total order.) 


It is convenient to denote $R$ as “$<$” and $R^{-1}$ as “$>$”, together with the usual notations “$\le$” and “$\ge$” for
the corresponding weak relation and its inverse respectively. In principle, one could denote $R$ as “$>$”, but
Theorem 18.8.7 (i) guarantees that $0_K \mathrel{R} 1_K$, and one is accustomed to think of $0_K$ as being less than $1_K$ for
any field (or ring).


18.8.6 REMARK: Examples of ordered fields.
Familiar examples of ordered fields are the real numbers $\mathbb{R}$ and the rational numbers $\mathbb{Q}$. By
Theorem 18.8.7 (iv), a finite field cannot be an ordered field. Therefore any ordered field will contain at least a
subfield which is isomorphic to the rational numbers. As may be guessed from Example 18.8.11, there are
infinitely many non-isomorphic ordered fields.


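18.8.7 THEOREM: Some basic properties of ordered fields.
Let $K \mathrel{-\!\!<} (K,\sigma,\tau,<)$ be an ordered field.
(i) $0_K < 1_K$.
(ii) $\forall x,y \in K,\ x < y \Rightarrow -y < -x$.
(iii) $\forall x \in K,\ x > 1_K \Rightarrow (0_K < x^{-1} \wedge x^{-1} < 1_K)$.
(iv) The field $K \mathrel{-\!\!<} (K,\sigma,\tau)$ has characteristic 0.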


PROOF: The proof of part (i) is identical to the proof of Theorem 18.6.4 (i) because every field is a unitary 
ring which is not a zero ring, by Remark 18.7.4. 

The proof of part (ii) is identical to the proof of Theorem 18.3.10 (ii) because every field is a ring. 

To prove part (iii), note first that if $x > 1_K$, then $x > 0_K$. So $x \neq 0_K$, and therefore the multiplicative
inverse $x^{-1}$ is well defined and $x^{-1} \neq 0_K$. Suppose that $x^{-1} < 0_K$. Then $x^{-1} \cdot x < 0_K \cdot x$ by Definition 18.8.3 (ii).
So $1_K < 0_K$, which contradicts part (i). So $x^{-1} > 0_K$. Therefore $1_K < x$ implies $1_K \cdot x^{-1} < x \cdot x^{-1}$ by
Definition 18.8.3 (ii). Hence $x^{-1} < 1_K$.


To prove part (iv), define the sequence $(a_n)_{n=0}^\infty$ by $a_0 = 0_K$ and $a_{n+1} = a_n + 1_K$ for $n \in \mathbb{Z}_0^+$. Then
$a_1 = 1_K > 0_K$ by part (i). Suppose $\{i \in \mathbb{Z}^+;\ a_i \le 0_K\} \neq \emptyset$. Then $n = \min\{i \in \mathbb{Z}^+;\ a_i \le 0_K\}$ is well defined
and $n \ge 2$. So $a_{n-1} > 0_K$ and $a_n \le 0_K$. Hence $a_{n-1} > a_n = a_{n-1} + 1_K$. So $0_K > 1_K$, which contradicts
part (i). Therefore $\{i \in \mathbb{Z}^+;\ a_i \le 0_K\} = \emptyset$. Hence the characteristic of $K$ is 0 by Definition 18.7.10.
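

The properties in Theorem 18.8.7 can be spot-checked on the ordered field of rational numbers. The following is a minimal sketch, assuming that Python's `fractions.Fraction` with its built-in comparison is an acceptable model of $\mathbb{Q}$ with its usual order. The helper names `sample_rationals`, `check_theorem_18_8_7` and `partial_sums_of_one` are hypothetical and are not part of the text. A finite sample is only a sanity check, not a proof.

```python
from fractions import Fraction
from itertools import product


def sample_rationals(max_num=4, max_den=3):
    """Return a small finite sample of rationals p/q with |p| <= max_num and 1 <= q <= max_den."""
    return sorted({Fraction(p, q)
                   for p in range(-max_num, max_num + 1)
                   for q in range(1, max_den + 1)})


def check_theorem_18_8_7(sample):
    """Spot-check Definition 18.8.3 (i), (ii) and Theorem 18.8.7 (i)-(iii) on a finite sample."""
    zero, one = Fraction(0), Fraction(1)
    # Definition 18.8.3 (i) and (ii) on all triples from the sample.
    for a, b, c in product(sample, repeat=3):
        if a < b:
            if not a + c < b + c:
                return False
            if c > zero and not a * c < b * c:
                return False
    # Part (i): 0 < 1.
    if not zero < one:
        return False
    # Part (ii): x < y implies -y < -x.
    for x, y in product(sample, repeat=2):
        if x < y and not -y < -x:
            return False
    # Part (iii): x > 1 implies 0 < 1/x and 1/x < 1.
    for x in sample:
        if x > one and not (zero < 1 / x < one):
            return False
    return True


def partial_sums_of_one(n):
    """Return [a_1, ..., a_n] where a_0 = 0 and a_{k+1} = a_k + 1, as in the proof of part (iv)."""
    sums, a = [], Fraction(0)
    for _ in range(n):
        a += 1
        sums.append(a)
    return sums
```

For part (iv), `partial_sums_of_one(n)` lists the first `n` terms of the sequence used in the proof. In this model none of them is zero or negative, which is what characteristic 0 requires.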


18.8.8 REMARK: Alternative definition of ordering of a field by positive cones. 

The elements of an ordered field may be partitioned into three sets, the positive elements, the negative 
elements and the zero element. This forms the basis of an alternative approach to defining ordered fields. 
This is done in Definition 18.8.9. 


18.8.9 DEFINITION: A positive cone or positive subset for a field $K$ is a subset $P$ of $K$ which satisfies the
following.
(i) $0_K \notin P$.
(ii) $\forall x \in K \setminus \{0_K\},\ (x \in P \Leftrightarrow -x \notin P)$.
(iii) $\forall x,y \in P,\ x + y \in P$.
(iv) $\forall x,y \in P,\ xy \in P$.


18.8.10 REMARK:  Non-uniqueness of orderings of a field. 

Example 18.8.11 shows that the positive cone of a field is not necessarily unique. (This example is presented 
in a different way by MacLane/Birkhoff [110], page 263.) By Remark 18.8.12, this implies also that the 
ordering of a field is not necessarily unique. Therefore the presence of the order relation (or the positive 
cone) in the specification of an ordered field is not superfluous. Since every field is a unitary ring, these 
comments apply equally to ordered rings and ordered unitary rings. 


18.8.11 EXAMPLE: Let $K$ be the field $(\mathbb{Q}^2,\sigma,\tau)$ with the following operations.


(1) $\forall x,y \in \mathbb{Q}^2,\ \sigma(x,y) = (x_1 + y_1,\ x_2 + y_2)$.

(2) $\forall x,y \in \mathbb{Q}^2,\ \tau(x,y) = (x_1 y_1 + 2 x_2 y_2,\ x_1 y_2 + x_2 y_1)$.

The element $0_K = (0_\mathbb{Q}, 0_\mathbb{Q})$ of $K$ satisfies $0_K + x = x + 0_K = x$ for all $x \in K$. So this is the additive
identity of $K$. The element $1_K = (1_\mathbb{Q}, 0_\mathbb{Q})$ of $K$ satisfies $1_K \cdot x = x \cdot 1_K = x$ for all $x \in K$. So this is the
multiplicative identity of $K$. The element $x^{-1} = (x_1/(x_1^2 - 2x_2^2),\ -x_2/(x_1^2 - 2x_2^2))$ is the multiplicative
inverse of $x \in K \setminus \{0_K\}$. (The remaining conditions for a field in Definition 18.7.3 may be easily verified.)
So $K$ is a field.
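
Define $P_1 = \{x \in \mathbb{Q}^2;\ \varphi_1(x) = 1\}$, where, for all $x \in \mathbb{Q}^2$,

$\varphi_1(x) = 0$ if $x = 0_K$,
$\varphi_1(x) = 1$ if $x_1 \ge 0_\mathbb{Q}$, $x_2 \ge 0_\mathbb{Q}$ and $x \neq 0_K$,
$\varphi_1(x) = 1$ if $x_1 > 0_\mathbb{Q}$, $x_2 < 0_\mathbb{Q}$ and $x_1^2 > 2x_2^2$,
$\varphi_1(x) = 1$ if $x_1 < 0_\mathbb{Q}$, $x_2 > 0_\mathbb{Q}$ and $x_1^2 < 2x_2^2$,
$\varphi_1(x) = -1$ otherwise.


Then $P_1$ is a positive cone for $K$. However, let $P_2 = \{x \in \mathbb{Q}^2;\ \varphi_2(x) = 1\}$, where $\varphi_2(x) = \varphi_1((x_1, -x_2))$ for
all $x \in \mathbb{Q}^2$. Then $P_2$ is also a positive cone for $K$. But $P_1 \neq P_2$.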
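

The two positive cones in Example 18.8.11 can be explored numerically. The sketch below assumes that the sign function of the example is the sign of $x_1 + x_2\sqrt{2}$, written out case by case with rational arithmetic only, and that the second cone is obtained by replacing $x_2$ with $-x_2$. The names `phi1`, `phi2`, `is_positive_cone_on_sample` and the sample grid are hypothetical choices for illustration. The check covers only a finite sample of $\mathbb{Q}^2$.

```python
from fractions import Fraction
from itertools import product

ZERO = (Fraction(0), Fraction(0))


def add(x, y):
    """Addition sigma on Q^2."""
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    """Multiplication tau on Q^2, so that (0, 1) behaves like a square root of 2."""
    return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def neg(x):
    return (-x[0], -x[1])


def phi1(x):
    """Assumed sign function for the first cone: the sign of x1 + x2*sqrt(2)."""
    x1, x2 = x
    if x1 == 0 and x2 == 0:
        return 0
    if x1 >= 0 and x2 >= 0:
        return 1
    if x1 > 0 and x2 < 0 and x1 * x1 > 2 * x2 * x2:
        return 1
    if x1 < 0 and x2 > 0 and x1 * x1 < 2 * x2 * x2:
        return 1
    return -1


def phi2(x):
    """Assumed sign function for the second cone: phi1 applied to (x1, -x2)."""
    return phi1((x[0], -x[1]))


def is_positive_cone_on_sample(phi, sample):
    """Check conditions (i)-(iv) of Definition 18.8.9 for {x; phi(x) = 1} on a finite sample."""
    if phi(ZERO) == 1:
        return False
    for x in sample:
        if x != ZERO and (phi(x) == 1) == (phi(neg(x)) == 1):
            return False
    positive = [x for x in sample if phi(x) == 1]
    for x, y in product(positive, repeat=2):
        if phi(add(x, y)) != 1 or phi(mul(x, y)) != 1:
            return False
    return True


SAMPLE = [(Fraction(p), Fraction(q)) for p in range(-4, 5) for q in range(-4, 5)]
SQRT2 = (Fraction(0), Fraction(1))  # element whose square is (2, 0)
```

The element `SQRT2` lies in the first cone but not in the second, which is one concrete witness that the two cones differ.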


18.8.12 REMARK: Equivalence between two methods of specifying an ordering on a field. 

It follows from Theorems 18.8.13 and 18.8.14 that an ordered field $K$ may be specified either by a total order
“<” as in Definition 18.8.3, or by a positive cone $P$ as in Definition 18.8.9. The relation $\forall x,y \in K,\ x < y \Leftrightarrow
y - x \in P$ defines the equivalence map between the two styles of specification.


18.8.13 THEOREM: The set of positive elements of a field is a positive cone.
Let $(K,\sigma,\tau,<)$ be an ordered field. Let $P = \{x \in K;\ 0_K < x\}$. Then $P$ is a positive cone for the
field $(K,\sigma,\tau)$.


PROOF: The proof of Theorem 18.8.13 is identical to the proof of Theorem 18.3.19 with the word “ring”
replaced by the word “field”. Alternatively, merely note that a field is a ring, and the definitions of an 
ordering and a positive cone are identical for fields and rings. 


18.8.14 THEOREM: Construction of a field ordering from a positive cone.
Let $(K,\sigma,\tau)$ be a field with a positive cone $P$. Let the relation “<” be defined on $K$ by $x < y \Leftrightarrow y - x \in P$
for all $x,y \in K$. Then “<” is an ordering of the field $(K,\sigma,\tau)$.


PROOF: The proof of Theorem 18.8.14 is identical to the proof of Theorem 18.3.20 with the word “ring” 
replaced by the word “field”. Alternatively, merely note that a field is a ring, and the definitions of an 
ordering and a positive cone are identical for fields and rings. 
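

The construction in Theorem 18.8.14 can be tried out on the rational numbers. This is a minimal sketch, assuming the positive cone of $\mathbb{Q}$ is described through the integer numerator of a normalised fraction, so that the order on $\mathbb{Q}$ itself is not used to define the cone. The names `in_cone`, `less_than` and `is_field_ordering_on` are hypothetical, and the check is restricted to a finite sample.

```python
from fractions import Fraction
from itertools import product


def in_cone(x):
    """Positive cone P of Q: a normalised fraction p/q (with q > 0) is in P when p > 0."""
    return x.numerator > 0


def less_than(x, y):
    """Relation of Theorem 18.8.14: x < y if and only if y - x is in P."""
    return in_cone(y - x)


def is_field_ordering_on(sample):
    """Check trichotomy, transitivity and Definition 18.8.3 (i), (ii) on a finite sample."""
    zero = Fraction(0)
    for x, y in product(sample, repeat=2):
        # Exactly one of x < y, x = y, y < x must hold.
        if [less_than(x, y), x == y, less_than(y, x)].count(True) != 1:
            return False
    for a, b, c in product(sample, repeat=3):
        if less_than(a, b):
            if less_than(b, c) and not less_than(a, c):
                return False
            if not less_than(a + c, b + c):
                return False
            if less_than(zero, c) and not less_than(a * c, b * c):
                return False
    return True


SAMPLE = sorted({Fraction(p, q) for p in range(-3, 4) for q in range(1, 4)})
```

Going back the other way, as in Theorem 18.8.13, the set of sample elements `x` with `less_than(Fraction(0), x)` coincides with the sample elements of the cone.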


18.8.15 REMARK: Order-preserving ordered field morphisms. 

Fuchs [75], pages 20-21, uses the terms “o-homomorphism” and “order-homomorphism” for homomorphisms
which preserve order between algebraic categories such as groups, rings and fields in addition to the algebraic
operations. In Definitions 18.8.16 and 18.8.17, the full category name “ordered field” is used to avoid
ambiguity as to exactly how much structure is preserved. The definition of an ordered field isomorphism is 
used in Remark 18.9.5, where it is stated that all complete ordered fields are ordered-field-isomorphic to the 
real numbers. 


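18.8.16 DEFINITION: An ordered field homomorphism from an ordered field $K_1 \mathrel{-\!\!<} (K_1,\sigma_1,\tau_1,<_1)$ to an
ordered field $K_2 \mathrel{-\!\!<} (K_2,\sigma_2,\tau_2,<_2)$ is a map $\phi : K_1 \to K_2$ such that
(i) $\forall x,y \in K_1,\ \phi(\sigma_1(x,y)) = \sigma_2(\phi(x),\phi(y))$,
(ii) $\forall x,y \in K_1,\ \phi(\tau_1(x,y)) = \tau_2(\phi(x),\phi(y))$,
(iii) $\forall x,y \in K_1,\ x <_1 y \Rightarrow \phi(x) <_2 \phi(y)$. [isotonic property]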


18.8.17 DEFINITION: An ordered field monomorphism from an ordered field $K_1$ to an ordered field $K_2$ is
an injective ordered field homomorphism $\phi : K_1 \to K_2$.


An ordered field epimorphism from an ordered field $K_1$ to an ordered field $K_2$ is a surjective ordered field
homomorphism $\phi : K_1 \to K_2$.


An ordered field isomorphism from an ordered field $K_1$ to an ordered field $K_2$ is a bijective ordered field
homomorphism $\phi : K_1 \to K_2$.


An ordered field endomorphism of an ordered field $K$ is an ordered field homomorphism $\phi : K \to K$.


An ordered field automorphism of an ordered field $K$ is an ordered field isomorphism $\phi : K \to K$.


18.9. Complete ordered fields


18.9.1 REMARK: Complete ordered fields, real numbers, order-completeness and topology. 
The most important fact about complete ordered fields is that they are all isomorphic (as ordered fields) to 
the real numbers. (See Sections 15.3, 15.4 and 15.9 for the basic definitions and properties of real numbers.) 


The definition of a complete ordered field requires the definitions of infimum and supremum, which are 
semi-analytical concepts. (See Section 11.2 for bounds and extrema for general partially ordered sets.) The 
infimum and supremum concepts do not require the general framework of topology, although they do have 
much in common with topology. (See Chapter 31 for general topology.) 


18.9.2 REMARK: Completeness of ordered fields. 

Definition 18.9.3 is stated somewhat asymmetrically. Since negation reverses the order of any ordered
field (Theorem 18.8.7 (ii)), the non-empty bounded-above subsets of an ordered field all have a least upper bound if
and only if the non-empty bounded-below subsets all have a greatest lower bound.


The concept of completeness in Definition 18.9.3, which is expressed in terms of the order on a field, is closely 
related to the topological concept of completeness, which is expressed in terms of a metric. (See Section 37.8 
for completeness of metric spaces.) 


18.9.3 DEFINITION: A complete ordered field is an ordered field in which every non-empty set which is 
bounded above has a least upper bound. 


18.9.4 THEOREM: The real number system is a complete ordered field.
The real number system $(\mathbb{R},\sigma,\tau,<)$ is a complete ordered field.


PROOF: The assertion follows directly from Theorem 15.9.3.


18.9.5 REMARK: All complete ordered fields are isomorphic to the real numbers. 

It is not too difficult to show that every complete ordered field is ordered-field-isomorphic to the field of
real numbers. (See Shilov [135], pages 36-38; Spivak [140], pages 509-511; Johnsonbaugh/Pfaffenberger [97], 
pages 9-16, 24-25, 441, exercise 7.9. See Definition 18.8.17 for ordered field isomorphisms.) Therefore 
one may say that there is only one complete ordered field. So there is no need to classify them! In other 
words, the system of real numbers is fully axiomatised by the conditions for a complete ordered field. The 
axiomatic approach to the real numbers, and also the Dedekind cut and Cauchy sequence approaches, are 
presented, for example, by Eves [353], pages 180-181; Shilov [135], pages 3-21; Spivak [140], pages 487—512; 
Thomson/Bruckner/Bruckner [149], pages 2-13; A.E. Taylor [145], pages 1-7; Rudin [129], pages 3-13. It is 
the unnaturalness of the various representations of the real numbers, and their extraneous artefacts, which 
make the axiomatic approach seem more attractive. (See Remark 15.3.4 for some alternative representations.) 
However, the axiomatic approach lacks concreteness of meaning. 


18.10. List spaces for sets with algebraic structure 


18.10.1 REMARK: Additional operations on lists over fields. 
The operations available on a list space depend on the operations defined on the base set $X$. See Section 14.12
for list spaces for general sets. 


18.10.2 DEFINITION: If X is a semigroup whose operation is written additively, the following operations 
are defined for $\mathrm{List}(X)$ in addition to the operations in Definition 14.12.6:


(i) The projection functions $\Pi_i : \mathrm{List}(X) \to X$, defined by...
on fG tt € length) 
IL (0) = T if i > length(¢) ' 


(ii) The sum function $\Sigma : \mathrm{List}(X) \to X$ defined for $\ell \in \mathrm{List}(X)$ by

$\Sigma(\ell) = \sum_{i=0}^{\mathrm{length}(\ell)-1} \ell_i$.

This is well defined because addition in $X$ is associative. The sum is understood to be “left-to-right”.
That is, the sum is $\ell_0 + \ell_1 + \ldots$. (See Definition 17.1.17.)


18.10.3 DEFINITION: If X is a ring, then the following additional operations are defined for List(X) in
addition to the operations for a semigroup in Definition 18.10.2: 


(i) The product function from (X, List(X)) to List(X) defined by 


for $x \in X$, $\ell \in \mathrm{List}(X)$ and $0 \le i < \mathrm{length}(\ell)$.


18.11. Algebraic classes of sets 


18.11.1 REMARK: The tenuous connection between set-algebras and serious algebra. 
Despite the lack of substantial commonality between so-called rings and algebras of sets and serious algebra, 
Sections 18.11 and 18.12 are located near the structures which have similar names in this book. 


18.11.2 REMARK: Classes of sets closed under various set operations. 

Definitions 18.11.7, 18.11.12, 18.11.15, 18.12.2 and 18.12.5 are useful in the development of measure theory. 
Sets of sets which satisfy these definitions define their algebraic structures in terms of standard set operations
such as set union, set intersection and set difference.


It is difficult to find any two authors who agree on the definitions of the classes of sets which appear in 
Section 18.11. They disagree both on the naming and on the precise details of their definitions. The purpose 
of these definitions is to provide a foundation for the definition of measure and integration. The definitions 
differ according to the following criteria for a class of subsets S of a set X. 


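(1) Whether the empty set and parent set are elements of $S$. Thus: $\emptyset \in S$ and/or $X \in S$ and/or $S \neq \emptyset$.

(2) Whether $S$ is closed under simple set operations. Examples (for $A,B \in S$): $A \cap B \in S$. $A \cup B \in S$.
$A \setminus B \in S$. $A \triangle B \in S$. $X \setminus A \in S$.

(3) Whether $S$ is closed under countable unions and intersections. Thus: $\bigcap_{i=1}^\infty A_i \in S$ and/or $\bigcup_{i=1}^\infty A_i \in S$
for families $(A_i)_{i=1}^\infty \in S^{\mathbb{Z}^+}$.

(4) Whether $S$ is closed under countable monotone unions and intersections. Thus: $\bigcap_{i=1}^\infty A_i \in S$ for families
$(A_i)_{i=1}^\infty \in S^{\mathbb{Z}^+}$ which satisfy $A_i \supseteq A_{i+1}$ for all $i \in \mathbb{Z}^+$ and/or $\bigcup_{i=1}^\infty A_i \in S$ for families $(A_i)_{i=1}^\infty \in S^{\mathbb{Z}^+}$
which satisfy $A_i \subseteq A_{i+1}$ for all $i \in \mathbb{Z}^+$.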

As in all algebra, one may pick any combination of axioms and investigate the systems which satisfy them. 

Some of the systems are sometimes somewhat useful for something in some context somewhere for somebody. 


18.11.3 REMARK: Family tree of some classes of sets closed under some kinds of set operations. 
A family tree for the finitely defined classes of sets in Definitions 18.11.4, 18.11.7, 18.11.12 and 18.11.15 is 
illustrated in Figure 18.11.1. 


[Diagram, top to bottom: multiplicative class; semi-ring; ring; algebra (or field).]

Figure 18.11.1 Family tree of finitely defined classes of sets


18.11.4 DEFINITION: A multiplicative class of subsets of a set $X$ is a set $S \in \mathbb{P}(\mathbb{P}(X))$ such that


(i) $S \neq \emptyset$,
(ii) $\forall A,B \in S,\ A \cap B \in S$.


18.11.5 REMARK: A possibly useless class of sets which is closed under intersection.

Definition 18.11.4 is given by EDM2 [113], 270.B, page 999, but it is not clear that it is useful for anything.
The requirement that $S \neq \emptyset$ does not imply that $\emptyset \in S$. The set $\{A\}$ satisfies Definition 18.11.4 for any
$A \in \mathbb{P}(X)$. At the opposite extreme, it is clear that $\mathbb{P}(X)$ is a multiplicative class of subsets of $X$, for any
set $X$. It is equally clear that the set of all subsets of $X$ which have fewer than $n$ elements, for any $n \in \mathbb{Z}^+$,
is also a multiplicative class of subsets of $X$.


One may show by induction that a multiplicative class $S$ of subsets of a set $X$ includes all intersections
of non-empty finite families of elements of $S$. In other words, $\forall n \in \mathbb{Z}^+,\ \forall (A_i)_{i=1}^n \in S^n,\ \bigcap_{i=1}^n A_i \in S$.


A related kind of set-class is a class S of subsets of a set X which is closed under pairwise unions and 
contains every singleton {x} for x € X. This concept is used in the Kuratowski definition for finite sets in 
Remark 13.11.7. This is a kind of “pairwise-union span" of the class of singletons. In general, one obtains a 
useful set-class by at least insisting that some set of simple sets is included in it. 


18.11.6 EXAMPLE: Multiplicative class of complements of finite non-empty subsets of a set. 

For any non-empty set $X$, let $S = \{A \in \mathbb{P}(X);\ 1 \le \#(X \setminus A) < \infty\}$. Then $S$ is a multiplicative class of
subsets of $X$. If $X$ is a singleton, then $S = \{\emptyset\}$. If $X$ is finite and non-empty, then $S = \mathbb{P}(X) \setminus \{X\}$. If $X$
is infinite, then $S$ contains only the complements in $X$ of finite non-empty subsets of $X$. The set of finite
non-empty subsets of $X$ is closed under pairwise unions. So $S$ is closed under pairwise intersections.


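18.11.7 DEFINITION: A semi-ring of sets is a set $S$ such that


(i) $\emptyset \in S$,
(ii) $\forall A,B \in S,\ A \cap B \in S$,
(iii) $\forall A,B \in S,\ \exists n \in \mathbb{Z}_0^+,\ \exists (E_i)_{i=1}^n \in S^n$,
$(A \setminus B = \bigcup_{i=1}^n E_i$ and $\forall i,j \in \mathbb{N}_n,\ i \neq j \Rightarrow E_i \cap E_j = \emptyset)$.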


18.11.8 EXAMPLE: Multiplicative classes which are not semi-rings. 

For any infinite set $X$, the set $S$ in Example 18.11.6 is a multiplicative class of subsets of $X$ which is
not a semi-ring of sets. Such a class does not contain the empty set. If the set is modified by defining
$S = \{A \in \mathbb{P}(X);\ 1 \le \#(X \setminus A) < \infty\} \cup \{\emptyset\}$, then conditions (i) and (ii) of Definition 18.11.7 will be satisfied,
but condition (iii) will not. To see this, let $A = X \setminus \{a\}$ and $B = X \setminus \{b\}$ for some $a,b \in X$ with $a \neq b$. Then
$A,B \in S$ and $A \setminus B = \{b\}$. But this cannot be expressed as the union of a finite family of complements of
finite sets (i.e. elements of $S$) because $X$ is infinite. Therefore $S$ is not a semi-ring.


18.11.9 THEOREM: Condition for a semi-ring of sets to be a multiplicative class of sets. 
Let $S$ be a semi-ring of sets. Then $S$ is a multiplicative class of subsets of a set $X$ if and only if $\bigcup S \subseteq X$.


PROOF: Let $S$ be a semi-ring of sets. Let $X$ be a set which satisfies $\bigcup S \subseteq X$. Then $S \subseteq \mathbb{P}(X)$ by
Theorem 8.5.2 (x). In other words, $S \in \mathbb{P}(\mathbb{P}(X))$. Conditions (i) and (ii) of Definition 18.11.4 follow directly
from conditions (i) and (ii) of Definition 18.11.7. Hence $S$ is a multiplicative class of subsets of $X$.


To show the converse, let $S$ be a semi-ring of sets, and suppose that $S$ is a multiplicative class of subsets
of $X$. Then $S \in \mathbb{P}(\mathbb{P}(X))$ by Definition 18.11.4. Hence $\bigcup S \subseteq X$ by Theorem 8.5.2 (ix).


18.11.10 REMARK: A variant definition of a semi-ring of sets. 
According to S.J. Taylor [147], page 15, many authors replace condition (iii) in Definition 18.11.7 with a 
stronger condition which is equivalent to condition (iii) as follows. 


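(iii′) $\forall A,B \in S,\ \exists n \in \mathbb{Z}_0^+,\ \exists (E_i)_{i=1}^n \in S^n$,
$(A \setminus B = \bigcup_{i=1}^n E_i$ and $\forall i,j \in \mathbb{N}_n,\ i \neq j \Rightarrow E_i \cap E_j = \emptyset$ and $\forall m \in \mathbb{N}_n,\ (A \cap B) \cup \bigcup_{i=1}^m E_i \in S)$.

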
18.11.11 EXAMPLE: Semi-rings of Cartesian space bounded semi-open intervals.
The set $S = \mathbb{P}(X)$ is a semi-ring for any set $X$. In particular, $\{\emptyset\}$ is a semi-ring. A non-trivial example
of a set which satisfies Definition 18.11.7 is the set of semi-open intervals $I(a,b) \subseteq \mathbb{R}^n$ for $n \in \mathbb{Z}^+$, defined
by $I(a,b) = \{x \in \mathbb{R}^n;\ \forall i \in \mathbb{N}_n,\ a_i \le x_i < b_i\}$ for all $a,b \in \mathbb{R}^n$ such that $\forall i \in \mathbb{N}_n,\ a_i \le b_i$. Thus $S =
\{I(a,b);\ a,b \in \mathbb{R}^n$ and $\forall i \in \mathbb{N}_n,\ a_i \le b_i\}$. This also satisfies the stronger condition (iii′) in Remark 18.11.10.
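

For the one-dimensional case $n = 1$ of Example 18.11.11, conditions (ii) and (iii) of Definition 18.11.7 can be made concrete. The sketch below assumes that a semi-open interval $[a,b)$ is represented as a pair `(a, b)` with `a <= b`, where `(a, a)` stands for the empty set. The function names are hypothetical, and the check runs over a small grid of intervals and test points only.

```python
def contains(interval, x):
    """Membership of x in the semi-open interval [a, b)."""
    a, b = interval
    return a <= x < b


def intersection(first, second):
    """Intersection of two semi-open intervals, again a semi-open interval (possibly empty)."""
    lo = max(first[0], second[0])
    hi = min(first[1], second[1])
    return (lo, hi) if lo < hi else (lo, lo)


def difference(first, second):
    """Return first minus second as a list of at most two disjoint non-empty semi-open intervals."""
    a, b = first
    c, d = second
    pieces = []
    if a < min(b, c):            # part of [a, b) lying to the left of [c, d)
        pieces.append((a, min(b, c)))
    if max(a, d) < b:            # part of [a, b) lying to the right of [c, d)
        pieces.append((max(a, d), b))
    return pieces


def check_semi_ring_conditions(intervals, points):
    """Check conditions (ii) and (iii) of Definition 18.11.7 pointwise on the given test points."""
    for first in intervals:
        for second in intervals:
            meet = intersection(first, second)
            pieces = difference(first, second)
            if len(pieces) > 2:
                return False
            for x in points:
                in_both = contains(first, x) and contains(second, x)
                in_diff = contains(first, x) and not contains(second, x)
                if contains(meet, x) != in_both:
                    return False
                # Disjointness: x lies in exactly one piece when it is in the difference.
                if sum(contains(p, x) for p in pieces) != (1 if in_diff else 0):
                    return False
    return True


ENDPOINTS = range(0, 9, 2)
INTERVALS = [(a, b) for a in ENDPOINTS for b in ENDPOINTS if a <= b]
POINTS = range(-1, 10)
```

For example, `difference((0, 8), (2, 4))` returns `[(0, 2), (4, 8)]`, a disjoint family of two semi-open intervals.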


18.11.12 DEFINITION: A ring of sets is a set $S$ such that


(i) $\emptyset \in S$,
(ii) $\forall A,B \in S,\ A \cap B \in S$,
(iii) $\forall A,B \in S,\ A \triangle B \in S$.


18.11.13 REMARK: Some comments on rings of sets. 

Definition 18.11.12 could have defined a ring of sets to be a non-empty set $S$ while omitting condition (i). This
is because if $S$ is non-empty, then there is at least one set $A \in S$, and then by condition (iii), $\emptyset = A \triangle A \in S$.
This apparently eliminates one condition in the definition. However, this condition is hidden in the
“non-empty set $S$” condition, and then the user must add a proof that $\emptyset \in S$. This is a false economy. It
is simpler to state condition (i) explicitly, which makes direct comparison with Definition 18.11.7 much 
simpler. A similar observation applies to many textbook definitions which minimise the apparent number 
of conditions by hiding one or two of them. This kind of minimalism in the design of definitions is often 
counter-productive and confusing. 

From conditions (ii) and (iii) in Definition 18.11.12, it follows that $A \cup B \in S$ and $A \setminus B \in S$ for all $A,B \in S$
because of the formulas $A \cup B = (A \triangle B) \triangle (A \cap B)$ and $A \setminus B = A \triangle (A \cap B)$ for general sets $A$ and $B$. It
follows that if $S$ is a ring of sets, then $S$ is a semi-ring of sets.


The set S = P(X) is a ring for any set X. In particular, {Ø} is a ring. 

Rudin [129], page 227, gives a slightly weaker definition of a ring of sets which does not require $\emptyset \in S$.
However, the only ring which is excluded by this condition is the empty set S, which is not of very great 
interest. Therefore the definitions are essentially identical. 

MacLane/Birkhoff [110], page 485, requires a ring of sets to be closed under binary union and intersection. 
They define a field of sets to be closed under binary union and intersection, and also the complement with 
respect to the containing set. This is equivalent to Definition 18.11.15. 


18.11.14 THEOREM: Alternative set of conditions for a ring of sets. 

A collection of sets $S$ is a ring of sets if and only if the following conditions hold.
(i) $\emptyset \in S$,

(ii) $\forall A,B \in S,\ A \cup B \in S$, and

(iii) $\forall A,B \in S,\ A \setminus B \in S$.


PROOF: Let $S$ be a ring of sets. Let $A,B \in S$. Then $A \cup B = (A \triangle B) \triangle (A \cap B)$ by Theorem 8.3.4 (xi), and
$A \setminus B = A \triangle (A \cap B)$ by Theorem 8.3.7 (vi). So $A \cup B \in S$ and $A \setminus B \in S$ by Definition 18.11.12 conditions
(ii) and (iii). Hence Theorem 18.11.14 conditions (i), (ii) and (iii) hold.

Suppose that $S$ is a collection of sets which satisfies Theorem 18.11.14 conditions (i), (ii) and (iii). Let $A,B \in
S$. Then $A \cap B = (A \cup B) \setminus ((A \setminus B) \cup (B \setminus A))$ by Theorem 8.2.4 (i), and $A \triangle B = (A \setminus B) \cup (B \setminus A)$ by
Definition 8.3.2. So $A \cap B \in S$ and $A \triangle B \in S$. Hence $S$ is a ring of sets.
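

The equivalence in Theorem 18.11.14 can be confirmed exhaustively on a small universe. The following sketch assumes a three-element set `{0, 1, 2}` and enumerates all 256 families of its subsets, comparing Definition 18.11.12 with the alternative conditions. The helper names are hypothetical. Python's `frozenset` operators `&`, `|`, `-` and `^` stand for intersection, union, difference and symmetric difference.

```python
from itertools import combinations


def powerset(elements):
    """Return all subsets of the given finite collection as frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_ring_of_sets(family):
    """Definition 18.11.12: contains the empty set, closed under intersection and symmetric difference."""
    return (frozenset() in family
            and all(a & b in family for a in family for b in family)
            and all(a ^ b in family for a in family for b in family))


def is_ring_alternative(family):
    """Theorem 18.11.14: contains the empty set, closed under union and set difference."""
    return (frozenset() in family
            and all(a | b in family for a in family for b in family)
            and all(a - b in family for a in family for b in family))


def all_families(universe):
    """Return every family of subsets of the universe, each as a frozenset of frozensets."""
    return [frozenset(f) for f in powerset(powerset(universe))]


def definitions_agree(universe):
    """Check that the two characterisations select exactly the same families."""
    return all(is_ring_of_sets(f) == is_ring_alternative(f) for f in all_families(universe))
```

Both predicates reject the empty family, since it does not contain the empty set, and both accept the full power set.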


18.11.15 DEFINITION: An algebra of subsets (or field of subsets) of a set $X$ is a set $S \subseteq \mathbb{P}(X)$ such that


(i) $S$ is a ring of sets,
(ii) $X \in S$.


18.11.16 REMARK: Examples of algebras of sets, and some comments. 

Given any ring of sets $S$, let $X = \bigcup S$. Then $S$ is an algebra of subsets of $X$ if and only if $\bigcup S \in S$. Hence a
ring of sets cannot always be upgraded to an algebra of subsets of a set simply by choosing the right superset.
However, any ring of sets which does contain its union is automatically upgradable.

The set $S = \mathbb{P}(X)$ is an algebra of subsets of $X$ for any set $X$. In particular, the singleton $\{\emptyset\}$ is an algebra
of subsets of $\emptyset$.

Definition 18.11.15 is given by both S.J. Taylor [147], page 15, and EDM2 [113], 270.B, page 999. They both
give the names “field” and “algebra”, but EDM2 [113] gives additionally the name “finitely additive class”.
To be precise, EDM2 [113], 270.B, page 999, defines an “algebra” to be a non-empty subset $S \subseteq \mathbb{P}(X)$
which is closed under finite unions and complements. Theorem 18.11.17 shows that this is equivalent to
Definition 18.11.15. The extension from binary unions to finite unions follows very easily by induction.
(Rudin [129], page 227, gives the name “ring” for a set-class which is closed under finite unions and
complements, but does not explicitly require a non-empty class.)


18.11.17 THEOREM: Alternative set of conditions for an algebra of subsets. 
Let $X$ be a set. Let $S \subseteq \mathbb{P}(X)$. Then $S$ is an algebra of subsets of $X$ if and only if the following conditions
all hold.


(i) $S \neq \emptyset$.
(ii) $\forall A,B \in S,\ A \cup B \in S$.
(iii) $\forall A \in S,\ X \setminus A \in S$.


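PROOF: Let $X$ be a set. Let $S \subseteq \mathbb{P}(X)$. Let $S$ be an algebra of subsets of $X$. Then $\emptyset \in S$. So (i) is
satisfied. Let $A,B \in S$. Then $X \setminus A = X \triangle A \in S$ and $X \setminus B = X \triangle B \in S$ by Definition 18.11.15 (ii) and
Definition 18.11.12 (iii). So (iii) is satisfied. Then $A \cup B = X \setminus ((X \setminus A) \cap (X \setminus B)) \in S$ by
Definition 18.11.12 (ii). So (ii) is satisfied.


To show the converse, let $X$ be a set and let $S \subseteq \mathbb{P}(X)$ satisfy conditions (i), (ii) and (iii). To show that
$S$ satisfies Definition 18.11.12 (i), note that $\emptyset = A \setminus A = X \setminus ((X \setminus A) \cup A) \in S$ for some $A \in S$ by parts (i), (ii)
and (iii). To show Definition 18.11.15 (ii), note that $X = X \setminus \emptyset$, which is in $S$ by part (iii). To show
Definition 18.11.12 (ii), let $A,B \in S$. Then $A \cap B = X \setminus ((X \setminus A) \cup (X \setminus B))$ is in $S$ by parts (ii) and (iii). To show
Definition 18.11.12 (iii), let $A,B \in S$. Then $A \triangle B = (A \setminus B) \cup (B \setminus A)$ is in $S$ by parts (ii) and (iii),
because $A \setminus B = A \cap (X \setminus B)$.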
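

Theorem 18.11.17 can likewise be confirmed by brute force on a small parent set. The sketch below assumes $X = \{0, 1, 2\}$ and compares Definition 18.11.15 with the three conditions of the theorem over every family $S \subseteq \mathbb{P}(X)$. All function names are hypothetical.

```python
from itertools import combinations


def powerset(elements):
    """Return all subsets of the given finite collection as frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_ring_of_sets(family):
    """Definition 18.11.12: contains the empty set, closed under intersection and symmetric difference."""
    return (frozenset() in family
            and all(a & b in family for a in family for b in family)
            and all(a ^ b in family for a in family for b in family))


def is_algebra_of_subsets(family, parent):
    """Definition 18.11.15: a ring of sets which contains the parent set."""
    return is_ring_of_sets(family) and parent in family


def satisfies_theorem_conditions(family, parent):
    """Theorem 18.11.17: non-empty, closed under binary union and complement in the parent set."""
    return (len(family) > 0
            and all(a | b in family for a in family for b in family)
            and all(parent - a in family for a in family))


X = frozenset({0, 1, 2})
FAMILIES = [frozenset(f) for f in powerset(powerset(X))]


def characterisations_agree():
    """Compare the two characterisations over all families of subsets of X."""
    return all(is_algebra_of_subsets(f, X) == satisfies_theorem_conditions(f, X)
               for f in FAMILIES)
```

The family `{∅}` is a ring of sets but is not an algebra of subsets of this `X`, because it does not contain `X`.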


18.11.18 REMARK: Survey of names for various classes of sets closed under various kinds of operations. 
Table 18.11.2 shows some of the terminology in the literature for the set-class algebras in this book. The
σ-versions of these set-class algebras require closure under countably infinite operations, which are presented in
Section 18.12. It is not entirely clear how some authors are defining their conditions because they sometimes
don't state explicitly if their classes are required to be non-empty, and whether the empty set and the whole
set $X$ are required to be in their classes.


year  reference  mult. class  semi-ring  ring  algebra  σ-ring  σ-algebra
1933  Saks [131]  additive class
1953  Rudin [129]  ring  σ-ring
1963  Pitt [123]  ring  σ-ring
1963  Simmons [137]  ring  algebra
1964  Gelbaum/Olmsted [78]  σ-ring
1965  MacLane/Birkhoff [110]  [ring]  field
1965  A.E. Taylor [145]  semiring  ring  algebra  σ-ring  σ-algebra
1965  Yosida [167]  σ-ring
1966  Shilov/Gurevich [136]  semiring  ring  σ-ring  δ-ring
1966  S.J. Taylor [147]  semi-ring  ring  field, algebra  σ-ring  σ-field, Borel field, σ-algebra
1973  Jech [364]  set-algebra
1975  Adams [44]  σ-algebra
1976  Hayman/Kennedy [91]  ring  σ-ring
1978  Wilcox/Myers [164]  σ-algebra
1988  Daley/Vere-Jones [66]  semiring  ring  algebra  σ-ring  σ-algebra
1993  EDM2 [113]  multiplicative class  algebra, field  σ-algebra, Borel field
1997  Bruckner/Bruckner/Thomson [56]  algebra  σ-algebra
2004  Szekeres [305]  σ-algebra
2011  Bass [53]  algebra  σ-algebra, σ-field
      Kennington  mult. class  semi-ring  ring  algebra  σ-ring  σ-algebra
Table 18.11.2 Survey of algebraic set-class definition terminology


18.11.19 REMARK: Generation of closed classes of sets from given sets of sets.

As with most algebraic and other structure categories, it is possible to define rings and algebras of sets which 
are generated by a given starting set. As usual, the method is to first show that a structure category is closed 
under arbitrary intersections, and then define the structure generated by a given set as the intersection of 
all structures which include that set. 


18.11.20 DEFINITION: The ring of sets generated by a set $S_0$ is the intersection of all rings of sets $S$ such
that $S \subseteq \mathbb{P}(\bigcup S_0)$ and $S_0 \subseteq S$.


18.11.21 REMARK: The ring of sets generated by a set of sets is well defined. 

It must be checked that the ring of sets generated by a set $S_0$ in Definition 18.11.20 is well defined. For any
set $S_0$, the union $\bigcup S_0$ and the power set $\mathbb{P}(\bigcup S_0)$ are well defined by the ZF axioms. Since $\mathbb{P}(\bigcup S_0)$ is a
ring of sets and $S_0 \subseteq \mathbb{P}(\bigcup S_0)$, the set of rings of sets $S$ such that $S \subseteq \mathbb{P}(\bigcup S_0)$ and $S_0 \subseteq S$ is non-empty.
Therefore the ring of sets generated by $S_0$ is a well-defined set. The fact that this is a ring of sets follows
from the closure of rings of sets under arbitrary intersections.
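

For a finite starting set, Definition 18.11.20 can be evaluated literally and compared with a direct closure computation. The sketch below assumes a generating set $S_0$ of subsets of a universe with at most three elements, so that all families in $\mathbb{P}(\mathbb{P}(\bigcup S_0))$ can be enumerated. The closure procedure `generated_ring_by_closure` is an illustrative alternative, not the method of the text, and all names are hypothetical.

```python
from itertools import combinations


def powerset(elements):
    """Return all subsets of the given finite collection as frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_ring_of_sets(family):
    """Definition 18.11.12: contains the empty set, closed under intersection and symmetric difference."""
    return (frozenset() in family
            and all(a & b in family for a in family for b in family)
            and all(a ^ b in family for a in family for b in family))


def generated_ring_by_intersection(generators):
    """Definition 18.11.20: intersect all rings S with S inside P(union of S0) and S0 inside S."""
    s0 = frozenset(generators)
    universe = frozenset().union(*s0)
    families = [frozenset(f) for f in powerset(powerset(universe))]
    rings = [f for f in families if s0 <= f and is_ring_of_sets(f)]
    result = set(rings[0])  # non-empty, because the power set of the universe is such a ring
    for ring in rings[1:]:
        result &= ring
    return frozenset(result)


def generated_ring_by_closure(generators):
    """Repeatedly add intersections and symmetric differences until nothing new appears."""
    family = set(generators) | {frozenset()}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            for b in list(family):
                for c in (a & b, a ^ b):
                    if c not in family:
                        family.add(c)
                        changed = True
    return frozenset(family)


S0_EXAMPLE = [frozenset({0, 1}), frozenset({1, 2})]
S1_EXAMPLE = [frozenset({0}), frozenset({0, 1, 2})]
```

For `S0_EXAMPLE` the generated ring is the whole power set of `{0, 1, 2}`, since the singletons `{0}`, `{1}` and `{2}` arise as differences and intersections of the two generators. For `S1_EXAMPLE` it has only four elements.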


18.11.22 DEFINITION: The algebra of sets generated by a subset $S_0$ of a set $X$ is the intersection of all
algebras $S$ of subsets of $X$ such that $S_0 \subseteq S$.


18.12. Classes of sets with countably infinite closure conditions 


18.12.1 REMARK: Extension of set-class algebra to classes with countable closure conditions. 
Section 18.12 differs from Section 18.11 by replacing finite operations with countably infinite operations. It 
is these infinite operations which are most relevant to Lebesgue measure theory. 


18.12.2 DEFINITION: A σ-ring of sets is a set $S$ such that


(1) $\emptyset \in S$,
(2) $\forall A,B \in S,\ A \setminus B \in S$,
(3) $\forall (A_i)_{i=0}^\infty \in S^\omega,\ \bigcup_{i=0}^\infty A_i \in S$.


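18.12.3 REMARK: Consequences of the definition of a σ-ring.

Definition 18.12.2 implies $\forall (A_i)_{i=0}^\infty \in S^\omega,\ \bigcap_{i=0}^\infty A_i \in S$. This follows from conditions (2) and (3) by letting
$B = \bigcup_{i=0}^\infty A_i$ and observing that $\bigcap_{i=0}^\infty A_i = B \setminus \bigcup_{i=0}^\infty (B \setminus A_i)$. By combining these countable union and
intersection closure properties, one obtains also $\limsup_{i\to\infty} A_i = \bigcap_{i=0}^\infty \bigcup_{j=i}^\infty A_j \in S$ and $\liminf_{i\to\infty} A_i =
\bigcup_{i=0}^\infty \bigcap_{j=i}^\infty A_j \in S$ for all $(A_i)_{i=0}^\infty \in S^\omega$.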


18.12.4 REMARK: Other definitions of σ-rings.
S.J. Taylor [147], page 16, and Gelbaum/Olmsted [78], page 83, give σ-ring definitions which are equivalent to
Definition 18.12.2. The σ-ring definition by Rudin [129], page 227, is the same except that it permits $S = \emptyset$.


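18.12.5 DEFINITION: A σ-algebra (or σ-field) of subsets of a set $X$ is a set $S \subseteq \mathbb{P}(X)$ such that


(1) $X \in S$,
(2) $\forall A,B \in S,\ A \setminus B \in S$,
(3) $\forall (A_i)_{i=0}^\infty \in S^\omega,\ \bigcup_{i=0}^\infty A_i \in S$.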


18.12.6 REMARK: A σ-algebra is necessarily a σ-ring.
Any σ-algebra according to Definition 18.12.5 is also a σ-ring according to Definition 18.12.2. This follows
from the identity $\emptyset = X \setminus X$.


A family tree for the infinitely defined classes of sets in Definitions 18.12.2 and 18.12.5 is illustrated in
Figure 18.12.1, together with the finitely defined classes of sets in Definitions 18.11.4, 18.11.7, 18.11.12 and
18.11.15.

[Diagram, top to bottom: multiplicative class; semi-ring; ring; algebra. The σ-ring is drawn beside the ring, and the σ-algebra is drawn beside the algebra and below the σ-ring.]

Figure 18.12.1 Family tree of finitely and infinitely defined classes of sets


18.12.7 REMARK: Other definitions of σ-algebras.

The σ-algebra definitions of EDM2 [113], 270.B, page 999, Szekeres [305], page 287, and S.J. Taylor [147],
page 16, are equivalent to Definition 18.12.5. EDM2 [113] gives the additional names: “countably additive
class”, “completely additive class” and “Borel field”. S.J. Taylor [147] gives the additional names
“sigma-field” and “Borel field” for a σ-algebra. Yosida [167], page 15, gives the names “σ-ring” and “σ-additive
family” instead of σ-algebra.


A survey of the variety of terminology for σ-rings and σ-algebras is given in Table 18.11.2 for Remark 18.11.18.


18.12.8 REMARK: The difference between a σ-algebra and a topology.
A σ-algebra may be contrasted with a topology, which requires closure under arbitrary unions and finite
intersections. (See Definition 31.3.2.)


18.12.9 REMARK: Some trivial σ-rings and σ-algebras.
The set $\mathbb{P}(X)$ is a σ-algebra of subsets of $X$ for any set $X$. Therefore $\mathbb{P}(X)$ is also a σ-ring for any set $X$.
In particular, $\{\emptyset\}$ is a σ-algebra of subsets of $\emptyset$, and $\{\emptyset\}$ is a σ-ring of sets.


18.12.10 REMARK: σ-rings and σ-algebras generated by sets.
Some useful classes of σ-rings and σ-algebras are given in Definitions 18.12.11 and 18.12.12.


18.12.11 DEFINITION: The σ-ring of sets generated by a set $S_0$ is the intersection of all σ-rings of sets $S$
such that $S \subseteq \mathbb{P}(\bigcup S_0)$ and $S_0 \subseteq S$.


18.12.12 DEFINITION: The σ-algebra of sets generated by a subset $S_0$ of a set $X$ is the intersection of all
σ-algebras $S$ of subsets of $X$ such that $S_0 \subseteq S$.


Chapter 19


MODULES AND ALGEBRAS


Sections 19.1 to 19.10 cover:
Modules over unstructured operator domains
Modules over semigroups and groups
Modules over rings
Morphisms between modules over rings
Seminorms on modules over rings
Norms on modules over rings
Inner products on modules over rings
Cauchy-Schwarz and triangle inequalities for modules over rings
Associative algebras
Lie algebras


19.0.1 REMARK: Family tree for modules and algebras.
The family tree in Figure 19.0.1 summarises the relations between the classes of algebraic structures in
Chapter 19. These structures may be broadly classified as modules and algebras.


[Diagram nodes: group; module; left A-module; left module over semigroup; transformation group; left module over ring; left module over group; unitary left module over ring; associative algebra; Lie algebra; linear space; real Lie algebra.]

Figure 19.0.1 Family tree of modules and algebras


19.0.2 REMARK: Modules and algebras are algebraic structures whose passive set is a commutative group.
Single-set algebraic structures such as semigroups, groups, rings and fields are presented in Chapters 17 
and 18. Chapter 19 presents modules, which are two-set algebraic structures. Linear spaces, associative 
algebras and Lie algebras are derived from basic modules by adding further algebraic operations. Linear
spaces are presented separately in Chapters 22-24 because of their extensive definitions and properties which 
are required for differential geometry. Tensor spaces, which are linear spaces of maps which are constructed 
from other linear spaces, are presented separately in Chapters 27—30. 


Modules differ from transformation groups, which are also two-set algebraic structures, in that the passive 
set of a module is required to be a commutative group and the action of a module's operator domain must be 
linear in some sense. General transformation groups are treated separately in Chapter 20. (See Table 17.0.2 
in Remark 17.0.3 for an overview of active and passive sets of algebraic systems.) 


19.1. Modules over unstructured operator domains 


19.1.1 REMARK: A minimalist module has no operator domain. 

There is such a thing as a module without an operator domain. (For example, see EDM2 [113], 277.B.) 
It is nothing more than a commutative group written additively. A module may be considered to have an 
operator domain which is the empty set. (This allows it to be regarded, technically speaking, as a two-set 
algebraic structure.) 

To such a minimalist module may be added an operator domain which becomes the active set, while the 
module is the passive set. The operator domain may be an abstract set, a group, a ring or a field. 


19.1.2 DEFINITION: A module (without operator domain) is a commutative group, whose operation is 
written additively. 


19.1.3 REMARK: An example class of modules. 
The module of homomorphisms in Definition 19.1.4 is a module because it is a commutative group, since
the operation of $M_2$ is commutative. Thus $(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x)$ for all
$f, g \in \mathrm{Hom}(M_1, M_2)$ and $x \in M_1$.


19.1.4 DEFINITION: The module of homomorphisms from a module $M_1$ to a module $M_2$ is the group
$(\mathrm{Hom}(M_1, M_2), \sigma)$, where $\mathrm{Hom}(M_1, M_2)$ is the set of group homomorphisms from $M_1$ to $M_2$,
and $\sigma$ is the operation of pointwise addition on $\mathrm{Hom}(M_1, M_2)$.


19.1.5 DEFINITION: The ring of endomorphisms of a module $M$ is the ring $(\mathrm{End}(M), \sigma, \tau)$, where $\mathrm{End}(M)$
is the set of group endomorphisms of $M$, $\sigma$ is the operation of pointwise addition on $\mathrm{End}(M)$, and $\tau$ is the
composition operation of $\mathrm{End}(M)$.


19.1.6 REMARK: Modules over sets are a generalisation of modules without operator domain. 

Since Definition 19.1.7 requires no structure at all on the operator domain of a module, Definition 19.1.2 is the 
special case of Definition 19.1.7 where the operator domain is the empty set. In this sense, Definition 19.1.2 
is superfluous. 


19.1.7 DEFINITION: A (left) module over a set $A$ or (left) $A$-module or (left) module with operator domain $A$
is a tuple $M \mathrel{-\!<} (A, M, \sigma_M, \mu)$ such that:


(i) $M \mathrel{-\!<} (M, \sigma_M)$ is a commutative group (written additively).


(ii) $\mu : A \times M \to M$ satisfies $\forall a \in A$, $\forall x, y \in M$, $\mu(a, \sigma_M(x, y)) = \sigma_M(\mu(a, x), \mu(a, y))$.
In other words, $\forall a \in A$, $\forall x, y \in M$, $a(x + y) = ax + ay$.


$A$ is said to be an operator domain of the module $M$.


19.1.8 REMARK: Terminology for left A-modules. 
A better name for a “left A-module” would be a “module over a set”, but the terminology “A-module” is 
widely accepted in the literature. 


A left $A$-module may be thought of as a parametrised set of endomorphisms. Let $M \mathrel{-\!<} (A, M, \sigma_M, \mu)$
be a left $A$-module. For $a \in A$, define the map $\mu_a : M \to M$ by $\mu_a : x \mapsto \mu(a, x)$. Then $\mu_a$ satisfies
$\forall x, y \in M$, $\mu_a(x + y) = \mu_a(x) + \mu_a(y)$. So $\mu_a$ is an endomorphism of the group $M$ by Definition 17.4.1. Thus
$\mu$ may be thought of as a map from $A$ to $\mathrm{End}(M)$. Since $A$ has no structure, it plays the role of a set of
parameters for a subset of $\mathrm{End}(M)$, assuming that $\mu$ is an “effective” map in the sense of Definition 20.2.1
for transformation groups.
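
As a concrete illustration of Definition 19.1.7 and of the maps $\mu_a$, the following Python sketch checks the distributivity condition by brute force on a small example. Everything specific in it is an assumption chosen for illustration: the passive set is the additive group $\mathbb{Z}_6$, and the operator domain is a hypothetical three-element set of labels, each acting as multiplication by a fixed integer. The word "effective" is taken here to mean that distinct labels give distinct maps $\mu_a$, which is an assumption about the content of Definition 20.2.1.

```python
from itertools import product

N = 6
M = range(N)                                    # passive set: Z_6 under addition mod 6
A = {"double": 2, "triple": 3, "negate": -1}    # hypothetical unstructured label set

def add(x, y):
    """sigma_M: addition in Z_6."""
    return (x + y) % N

def act(a, x):
    """mu(a, x): the label a acts as multiplication by the integer A[a]."""
    return (A[a] * x) % N

def mu_table(a):
    """The map mu_a : M -> M, recorded as a tuple of its values."""
    return tuple(act(a, x) for x in M)

# Definition 19.1.7 (ii): a(x + y) = ax + ay for all a in A and x, y in M.
distributive = all(act(a, add(x, y)) == add(act(a, x), act(a, y))
                   for a, x, y in product(A, M, M))

# Distinct labels give distinct endomorphisms (the assumed meaning of "effective").
tables = [mu_table(a) for a in A]
effective = len(set(tables)) == len(tables)

print(distributive, effective)
print(mu_table("triple"))
```

The label set carries no operation of its own, so nothing beyond distributivity over the addition of $M$ can be tested here.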

19.1.9 REMARK: A consequence of the definition of a module over a set.
Since $a.0 = a(0 + 0) = a.0 + a.0$ in Definition 19.1.7, it follows that $a.0 = 0$ for all $a \in A$.


19.1.10 DEFINITION: An $A$-homomorphism between modules $M_1$ and $M_2$ over a set $A$ is a map $f : M_1 \to M_2$
such that


(i) $f$ is a group homomorphism from $M_1$ to $M_2$;
(ii) $\forall a \in A$, $\forall x \in M_1$, $f(ax) = af(x)$.

An $A$-homomorphism between modules is also called an operator homomorphism or allowed homomorphism.


An $A$-endomorphism of a module $M$ over a set $A$ is an $A$-homomorphism from $M$ to $M$.
An $A$-automorphism of a module $M$ over a set $A$ is an invertible $A$-endomorphism of $M$.


19.1.11 REMARK: Applicability of module homomorphism definitions and notations to the general case. 
Definition 19.1.10 and Notation 19.1.12 apply not only to modules over abstract sets, but also to modules 
over groups, rings and fields. 


19.1.12 NOTATION: $\mathrm{Hom}_A(M_1, M_2)$ denotes the set of all $A$-homomorphisms from module $M_1$ to module
$M_2$ over the same set $A$.

$\mathrm{End}_A(M)$ denotes the set of all $A$-endomorphisms of a module $M$ over a set $A$.

$\mathrm{Aut}_A(M)$ denotes the set of all $A$-automorphisms of a module $M$ over a set $A$.

$\mathrm{GL}(M)$ is an alternative notation for $\mathrm{Aut}_A(M)$.


19.1.13 REMARK: Superfluity in the notation for spaces of module homomorphisms. 
The subscript A in Notation 19.1.12 is superfluous because Definition 19.1.7 requires the set A to be part of 
the specification of a module. 


19.1.14 DEFINITION: The module of $A$-homomorphisms from a module $M_1$ to a module $M_2$ over the same
set $A$ is the group $(\mathrm{Hom}_A(M_1, M_2), \sigma)$, where $\mathrm{Hom}_A(M_1, M_2)$ is the set of $A$-homomorphisms from $M_1$ to
$M_2$ and $\sigma$ is the operation of pointwise addition on $\mathrm{Hom}_A(M_1, M_2)$.


19.1.15 DEFINITION: The ring of $A$-endomorphisms of an $A$-module $M$ is the ring $(\mathrm{End}_A(M), \sigma, \tau)$, where
$\mathrm{End}_A(M)$ is the set of $A$-endomorphisms of $M$, $\sigma$ is the operation of pointwise addition on $\mathrm{End}_A(M)$, and
$\tau$ is the composition operation of $\mathrm{End}_A(M)$.


19.1.16 REMARK: Properties of endomorphism rings. 

The tuple $(\mathrm{End}_A(M), \sigma, \tau)$ in Definition 19.1.15 is a ring. As noted in Remark 19.1.3, the tuple $(\mathrm{End}_A(M), \sigma)$
is a commutative group. The pair $(\mathrm{End}_A(M), \tau)$ is a semigroup by the associativity of composition
in general. (See Theorem 10.4.21 (ii).) The distributivity follows pointwise from the distributivity of the
operation of $A$ on $M$ in Definition 19.1.7 (ii).


19.2. Modules over semigroups and groups 


19.2.1 REMARK: Overview of modules acted on by various kinds of systems. 
Each algebraic system in Sections 19.2, 19.3, 19.9 and 19.10 has two sets and three or more operations. Each
set has one or more internal operations, and there is an action operation of one set on the other.


General algebra textbooks seem to show little interest in modules over groups (and even less in modules over 
semigroups or monoids). This is not very surprising. The imposition of a commutative group structure on 
the module restricts the possibilities there. The group acting on the module is free to be any kind of group 
except that it must act on the module as a left transformation group (as in Definition 20.1.2) and its action 
on the module must be distributive over the module's addition operation. Thus a module over a group is 
the same as a left transformation group with the additional requirement that the passive set (the module) 
must be a commutative group and the left action on it must satisfy distributivity. It seems to make sense, 
therefore, to consider the class of modules over groups to be derived from the class of transformation groups. 


The study of modules focuses largely on the decomposition of the module into submodules by the action of
the active set, and the submodule mapping properties of homomorphisms between modules. The study of
transformation groups is focused more on the group structure. Clearly there is some overlap. But generally
it seems that modules over rings are studied in the “modules department” and modules over groups are
studied in the “transformation groups department”.


[www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




19.2.2 REMARK: The relation between general groups and transformation groups. 

It could be argued that groups and transformation groups are the same thing. Any group may be regarded as 
acting on itself as a transformation group. Since such a “group acting on itself” is an effective transformation 
group (by Definition 20.2.1), the group itself may be eliminated from the specification. That is, the set of 
transformations is the same as the group itself. These transformations are a subgroup of the group of all 
bijections on the set of group elements. Thus all groups and transformation groups are essentially identical 
to subgroups of the group of bijections of a set. 


In the case of a module over a group, the transformation group is constrained to act distributively with 
respect to the module’s commutative group structure. This is essentially linearity, except that no scalar 
multiplication is involved. In this case, the transformation group may be identified with a subgroup of 
the set of all automorphisms of the module with respect to its additive (i.e. commutative group) structure. 
Thus one may regard all modules over groups as subgroups of the group of module automorphisms. Hence
it is more natural to study modules over groups as belonging to the department of “additive-structure-preserving
automorphism groups”.
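
The claim that a group acting on itself is an effective transformation group, and hence a subgroup of the bijections of its own element set, can be checked by brute force on a small example. The sketch below uses the symmetric group $S_3$, which is an arbitrary illustrative choice, and takes "effective" to mean that distinct group elements give distinct transformations (an assumption about Definition 20.2.1).

```python
from itertools import permutations

# The group S_3, with each element stored as a tuple p where p[i] is the image of i.
G = list(permutations(range(3)))

def compose(p, q):
    """Group operation: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def left_translation(g):
    """L_g : G -> G, h -> g h, recorded as a tuple of values indexed like G."""
    return tuple(compose(g, h) for h in G)

translations = {g: left_translation(g) for g in G}

# Each L_g is a bijection of the underlying set of G.
all_bijective = all(sorted(t) == sorted(G) for t in translations.values())

# Effective: distinct group elements give distinct translations.
effective = len(set(translations.values())) == len(G)

# The map g -> L_g respects the group operation: L_{gh} = L_g o L_h.
homomorphic = all(
    translations[compose(g, h)] == tuple(compose(g, compose(h, k)) for k in G)
    for g in G for h in G)

print(all_bijective, effective, homomorphic)
```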


19.2.3 REMARK: Modules over semigroups. 
A left module over a semigroup $\Gamma$ is the same thing as a left $\Gamma$-module $M$ (i.e. a module over the set $\Gamma$) such
that $\Gamma$ is a semigroup and the additional condition (iv) of Definition 19.2.4 holds.


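19.2.4 DEFINITION: A (left) module over a semigroup $\Gamma$ is a tuple $M \mathrel{-\!<} (\Gamma, M, \sigma_\Gamma, \sigma_M, \mu)$ such that


(i) $\Gamma \mathrel{-\!<} (\Gamma, \sigma_\Gamma)$ is a semigroup (written multiplicatively);
(ii) $M \mathrel{-\!<} (M, \sigma_M)$ is a commutative group (written additively);
(iii) $\mu : \Gamma \times M \to M$ (written multiplicatively) satisfies $\forall a \in \Gamma$, $\forall x, y \in M$, $a(x + y) = ax + ay$;
(iv) $\forall a, b \in \Gamma$, $\forall x \in M$, $(ab)x = a(bx)$.
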

19.2.5 REMARK: Modules over monoids. 

Presumably one could define a “module over a monoid”. It would be the same as Definition 19.2.6 with the 
word “group” replaced by “monoid”, and with the additional condition that the identity of the monoid acts 
as the identity map on the module. It is difficult to see any application for such a definition which would 
justify the extra ink. 


19.2.6 DEFINITION: A (left) module over a group $G$ is a tuple $M \mathrel{-\!<} (G, M, \sigma_G, \sigma_M, \mu)$ such that


(i) $G \mathrel{-\!<} (G, \sigma_G)$ is a group (written multiplicatively);
(ii) $M \mathrel{-\!<} (M, \sigma_M)$ is a commutative group (written additively);
(iii) $\mu : G \times M \to M$ (written multiplicatively) satisfies $\forall a \in G$, $\forall x, y \in M$, $a(x + y) = ax + ay$;
(iv) $\forall a, b \in G$, $\forall x \in M$, $(ab)x = a(bx)$;
(v) $\forall x \in M$, $1_G x = x$.


19.2.7 REMARK: Modules over groups. 

A left module over a group G is the same thing as a left G-module M such that G is a group and the 
additional conditions (iv) and (v) of Definition 19.2.6 hold. A left module over a group G is also the same 
thing as a module M over a semigroup G such that G is additionally a group and the additional condition (v) 
of Definition 19.2.6 holds. These relations are illustrated in Figure 19.0.1. 
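
The layering described in Remark 19.2.7 can be seen in a small example. In the following sketch, which is an illustration under assumed choices rather than anything taken from the text, the passive set is the additive group $\mathbb{Z}_9$. The multiplicative units of $\mathbb{Z}_9$ form a group acting by multiplication, giving a module over a group, while all of $\mathbb{Z}_9$ under multiplication is only a semigroup, giving a module over a semigroup.

```python
from itertools import product
from math import gcd

N = 9
M = range(N)                                      # passive set: Z_9 under addition
G = [a for a in range(1, N) if gcd(a, N) == 1]    # units of Z_9 under multiplication
S = list(range(N))                                # all of Z_9 under multiplication

def act(a, x):
    """mu(a, x) = a x mod 9."""
    return (a * x) % N

def distributive(T):
    """Condition (iii): a(x + y) = ax + ay."""
    return all(act(a, (x + y) % N) == (act(a, x) + act(a, y)) % N
               for a, x, y in product(T, M, M))

def compatible(T):
    """Condition (iv): (ab)x = a(bx)."""
    return all(act((a * b) % N, x) == act(a, act(b, x))
               for a, b, x in product(T, T, M))

def identity_acts_trivially():
    """Condition (v) of Definition 19.2.6, with identity element 1."""
    return all(act(1, x) == x for x in M)

def is_group(T):
    """Closure, identity and inverses under multiplication mod 9."""
    closed = all((a * b) % N in T for a, b in product(T, T))
    inverses = all(any((a * b) % N == 1 for b in T) for a in T)
    return closed and 1 in T and inverses

print(is_group(G), distributive(G), compatible(G), identity_acts_trivially())
print(is_group(S), distributive(S), compatible(S))
```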


19.3. Modules over rings 


19.3.1 DEFINITION: 
A (left) module over a ring $R$ or (left) ring-module over $R$ is a tuple $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu)$ such that


(i) $R \mathrel{-\!<} (R, \sigma_R, \tau_R)$ is a ring;
(ii) $M \mathrel{-\!<} (M, \sigma_M)$ is a commutative group (written additively);
(iii) $\mu : R \times M \to M$ (written multiplicatively) satisfies $\forall a \in R$, $\forall x, y \in M$, $a(x + y) = ax + ay$;
(iv) $\forall a, b \in R$, $\forall x \in M$, $(a + b)x = ax + bx$;
(v) $\forall a, b \in R$, $\forall x \in M$, $(ab)x = a(bx)$.

19.3.2 THEOREM: Some very basic properties of modules over rings.
Let $M$ be a module over a ring $R$.


(i) $\forall m \in M$, $0_R m = 0_M$.
(ii) $\forall r \in R$, $\forall m \in M$, $(-r)m = -rm$.


PROOF: For part (i), let $m \in M$. Then $0_R m = (0_R + 0_R)m = 0_R m + 0_R m$ by Definition 19.3.1 (iv). Therefore
$0_M = 0_R m + (-(0_R m)) = (0_R m + 0_R m) + (-(0_R m)) = 0_R m + (0_R m + (-(0_R m))) = 0_R m + 0_M = 0_R m$.

For part (ii), let $r \in R$ and $m \in M$. Then $(-r)m = (-r)m + (rm + (-rm)) = ((-r)m + rm) + (-rm) =
(-r + r)m + (-rm) = 0_R m + (-rm) = 0_M + (-rm) = -rm$ by part (i).
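
Both parts of Theorem 19.3.2 hold without any assumption that the ring identity acts as the identity map. The following sketch checks this on a deliberately non-unitary example. The example is an assumption made for illustration and does not come from the text: the ring is $\mathbb{Z}_6$, the passive set is the additive group $\mathbb{Z}_6$, and the action is $r.m = 3rm \bmod 6$, which satisfies condition (v) of Definition 19.3.1 because $3 \cdot 3 \equiv 3 \pmod 6$.

```python
from itertools import product

N = 6
R = range(N)    # the ring Z_6
M = range(N)    # the additive group Z_6

def act(r, m):
    """Hypothetical action r.m = 3rm mod 6 (valid because 3*3 = 3 mod 6)."""
    return (3 * r * m) % N

def neg(x):
    """Additive inverse in Z_6."""
    return (-x) % N

# Conditions (iii), (iv) and (v) of Definition 19.3.1.
axiom_iii = all(act(a, (x + y) % N) == (act(a, x) + act(a, y)) % N
                for a, x, y in product(R, M, M))
axiom_iv = all(act((a + b) % N, x) == (act(a, x) + act(b, x)) % N
               for a, b, x in product(R, R, M))
axiom_v = all(act((a * b) % N, x) == act(a, act(b, x))
              for a, b, x in product(R, R, M))

# Theorem 19.3.2 (i) and (ii).
zero_annihilates = all(act(0, m) == 0 for m in M)
negation_rule = all(act(neg(r), m) == neg(act(r, m)) for r, m in product(R, M))

# The ring identity 1 does not act as the identity map here.
unitary = all(act(1, m) == m for m in M)

print(axiom_iii, axiom_iv, axiom_v, zero_annihilates, negation_rule, unitary)
```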


19.3.3 REMARK: The relation of modules over rings to other kinds of modules and linear spaces. 

A left module over a ring R is the same thing as a left module M over the set R, together with the extra 
conditions (i), (iv) and (v) of Definition 19.3.1. The sets and operations of the basic classes of modules are 
illustrated in Figure 19.3.1. 


[Figure 19.3.1 Sets and operations of modules. Panels: module over a set, module over a group, module over a ring.]


Definition 19.3.1 is not a specialisation or extension of Definition 19.2.6. Therefore a left module over a ring 
cannot be a left module over a group. This is illustrated by the forking of the tree in Figure 19.0.1 between 
modules over rings and modules over groups. 


Condition (v) of Definition 19.3.1 refers to the multiplication operation of a ring $R$, which is not a group
operation, whereas condition (iv) of Definition 19.2.6 does refer to a group operation, even though it is written
multiplicatively. If the tuple $(R, M, \tau_R, \sigma_M, \mu)$ in Definition 19.3.1 is substituted into Definition 19.2.6, it will
satisfy all conditions except (i) and (v). The tuple $(R, M, \tau_R, \sigma_M, \mu)$ is in fact a module over a semigroup.


Consequently everything said about modules over rings is applicable to linear spaces, whereas modules over
groups are a kind of “dead end”. (See Chapter 22 for linear spaces.) From the perspective of a linear space
$V$ over a field $K$, Definition 19.2.6 (iv) would require $\forall \lambda, \mu \in K$, $\forall v \in V$, $(\lambda + \mu)v = \lambda(\mu v)$ because addition
is the group operation of a field. Even worse is Definition 19.2.6 (v), which would require $\forall v \in V$, $0v = v$.
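
The two failures just described can be confirmed on a small finite field. In this sketch, the field $K = \mathbb{Z}_5$ acting on $V = \mathbb{Z}_5$ by multiplication is an assumed example chosen only for illustration. The additive group of $K$ is substituted for the group $G$ of Definition 19.2.6, so the "group product" of two scalars is their sum and the group identity is $0$.

```python
P = 5
K = range(P)    # the field Z_5
V = range(P)    # Z_5 regarded as a one-dimensional linear space over itself

def act(lam, v):
    """Scalar multiplication in Z_5."""
    return (lam * v) % P

# Definition 19.2.6 (iv) with the additive group of K: (lam + mu)v = lam(mu v)?
cond_iv = all(act((lam + mu) % P, v) == act(lam, act(mu, v))
              for lam in K for mu in K for v in V)

# Definition 19.2.6 (v) with the additive identity 0: 0v = v?
cond_v = all(act(0, v) == v for v in V)

# By contrast, the ring-module conditions (iv) and (v) of Definition 19.3.1 hold.
ring_iv = all(act((lam + mu) % P, v) == (act(lam, v) + act(mu, v)) % P
              for lam in K for mu in K for v in V)
ring_v = all(act((lam * mu) % P, v) == act(lam, act(mu, v))
             for lam in K for mu in K for v in V)

print(cond_iv, cond_v, ring_iv, ring_v)
```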


19.3.4 REMARK: Why modules over rings are interesting. 

From the perspective of differential geometry, the proliferation of definitions and theorems for modules over
rings may seem dry, abstract and irrelevant. To see the relevance, one may substitute “real linear space”
or “real matrix group” for “(unitary) (left) module over a (commutative) (ordered) (unitary) ring”, and
similarly substitute “complex linear space” or “complex matrix group” for “(unitary) (left) module over a
(commutative) (unitary) ring”. The terminology of pure algebra disguises interesting applications as dry
abstractions in the hope, often justified, of obtaining broader applicability.


19.3.5 REMARK: A unitary module over a ring requires a unitary ring. 

When the ring $R$ in a module over a ring has a multiplicative identity $1_R$, one naturally expects that this
identity will act as the identity map on the module. Perhaps a more logical name for Definition 19.3.6 would
be a “unitary left module over a unitary ring”, but the ring requirement is implicit.


19.3.6 DEFINITION: A unitary (left) module over a ring $R$ or unitary (left) ring-module over $R$ is a left
module $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu)$ over a ring $R$ such that


(i) $R \mathrel{-\!<} (R, \sigma_R, \tau_R)$ is a unitary ring;
(ii) $\forall x \in M$, $1_R x = x$, where $1_R$ is the unity of $R$.

19.3.7 REMARK: Choice of definitions for modules over rings and unitary modules over rings.

Hartley/Hawkes [90], definition 5.1, page 70, use Definition 19.3.6 for left modules over rings, but they
comment that some authors use Definition 19.3.1 instead. However, it seems safest in a general context to 
always state the unitary requirement explicitly. 


19.3.8 REMARK: Submodules of modules over rings. 

There is a substantial history of investigation of the decomposition of spaces of matrices into subspaces. 
The natural abstraction of matrices over fields is the study of modules over rings. This gives submodules of 
modules over rings some importance. 


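19.3.9 DEFINITION: A submodule of a (left) module over a ring $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu_M)$ is a left
module over a ring $M' \mathrel{-\!<} (R, M', \sigma_R, \tau_R, \sigma_{M'}, \mu_{M'})$ such that $M' \subseteq M$, $\sigma_{M'} \subseteq \sigma_M$ and $\mu_{M'} \subseteq \mu_M$.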


19.3.10 THEOREM: Some basic properties of submodules of modules over rings.
Let $M' \mathrel{-\!<} (R, M', \sigma_R, \tau_R, \sigma_{M'}, \mu_{M'})$ be a submodule of a module over a ring $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu_M)$.


(i) $\sigma_{M'} = \sigma_M|_{M' \times M'}$.
(ii) $\mu_{M'} = \mu_M|_{R \times M'}$.
(iii) $0_M \in M'$.


PROOF: For part (i), $M'$ is a module over a ring by Definition 19.3.9. So Definition 19.3.1 (ii) implies that
$\mathrm{Dom}(\sigma_{M'}) = M' \times M'$. Hence $\sigma_{M'} = \sigma_M|_{M' \times M'}$ by Theorem 10.4.10 because $M' \subseteq M$ and $\sigma_{M'} \subseteq \sigma_M$.


For part (ii), $M'$ is a module over a ring. So Definition 19.3.1 (iii) implies that $\mathrm{Dom}(\mu_{M'}) = R \times M'$. Hence
$\mu_{M'} = \mu_M|_{R \times M'}$ by Theorem 10.4.10 because $M' \subseteq M$ and $\mu_{M'} \subseteq \mu_M$.


For part (iii), $0_{M'} \in M'$ because $M'$ is a group (written additively) by Definition 19.3.1 (ii), since $M'$ is a
module over a ring by Definition 19.3.9. Then $0_{M'} + 0_{M'} = 0_{M'}$, where $0_{M'} \in M$ because $M' \subseteq M$. So
$0_{M'} = 0_M$ by Theorem 17.3.14 (vi). Hence $0_M \in M'$.


19.3.11 REMARK: Closure conditions to guarantee that a subset is a submodule. 

If a module over a ring is not unitary, a subset of the module requires closure under the additive inverse 
operation as in Theorem 19.3.12 (iii) in order to guarantee that the subset is a submodule. In the case of a 
unitary module over a ring, closure under the inverse operation follows from the action of the element $-1$ of
the ring.
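
A small example indicates why the extra closure condition is needed in the non-unitary case. The sketch below is an illustration under assumed choices, not an example from the text: the ring $\mathbb{Z}$ acts on the additive group $\mathbb{Z}$ by the zero action $r.m = 0$, which satisfies conditions (iii), (iv) and (v) of Definition 19.3.1 but is not unitary, and the candidate subset is the set of non-negative integers. Since the sets are infinite, the checks are run only over a finite window, so they are spot-checks rather than proofs.

```python
WINDOW = range(-20, 21)    # finite sample of Z used for spot-checking

def act(r, m):
    """Zero action r.m = 0 of the ring Z on the additive group Z (non-unitary)."""
    return 0

def in_subset(m):
    """Candidate subset M': the non-negative integers."""
    return m >= 0

members = [x for x in WINDOW if in_subset(x)]

# Closure under addition and under the action, and membership of zero.
closed_add = all(in_subset(x + y) for x in members for y in members)
closed_act = all(in_subset(act(r, x)) for r in WINDOW for x in members)
contains_zero = in_subset(0)

# Closure under additive inverses fails, so M' is not a subgroup.
closed_neg = all(in_subset(-x) for x in members)

print(closed_add, closed_act, contains_zero, closed_neg)
```

In a unitary module the action of $-1$ would force $-x$ into the subset, which is exactly what the zero action fails to do.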


19.3.12 THEOREM: Conditions for a sub-system of a module over a ring to be a submodule.
Let $(R, M, \sigma_R, \tau_R, \sigma_M, \mu_M)$ be a module over a ring. Let $M' \subseteq M$ satisfy


(i) $\sigma_M(M' \times M') \subseteq M'$ and $\mu_M(R \times M') \subseteq M'$,
(ii) $0_M \in M'$, and
(iii) $\forall x \in M'$, $-x \in M'$.

Then $M' \mathrel{-\!<} (R, M', \sigma_R, \tau_R, \sigma_{M'}, \mu_{M'})$, where $\sigma_{M'} = \sigma_M|_{M' \times M'}$ and $\mu_{M'} = \mu_M|_{R \times M'}$, is a submodule
of $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu_M)$.


PROOF: $\sigma_{M'}$ is a well-defined function with domain $M' \times M'$ by Theorems 10.4.5 line (10.4.2) and 9.4.6 (ii),
and clearly $\sigma_{M'} \subseteq \sigma_M$. Similarly, $\mu_{M'}$ is well defined with domain $R \times M'$ and $\mu_{M'} \subseteq \mu_M$.
$(M', \sigma_{M'})$ is a group by Theorem 17.6.4 (because it is closed and contains $0_M$ and inverses), and it is
commutative because $(M, \sigma_M)$ is commutative. Properties (iii), (iv) and (v) in Definition 19.3.1 are inherited
by $M'$ from $M$. So $M'$ is a module over a ring. Hence $M'$ is a submodule of $M$ by Definition 19.3.9.


19.3.13 REMARK: Submodules of unitary modules over rings. 

In practice, most of the useful modules over rings are unitary modules over rings. (In particular, linear spaces 
are unitary modules over rings, and Lie algebras are defined as unitary modules over rings with an additional 
multiplication operation.) Definition 19.3.14 is almost identical to Definition 19.3.9. Consequently it follows
almost trivially that Theorem 19.3.10 is also valid for unitary modules over rings. The unitary version of
Theorem 19.3.12, namely Theorem 19.3.15, differs from it, however, in that condition (iii) is not required.

19.3.14 DEFINITION: A submodule of a unitary (left) module over a ring $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu_M)$ is
a unitary left module over a ring $M' \mathrel{-\!<} (R, M', \sigma_R, \tau_R, \sigma_{M'}, \mu_{M'})$ with $M' \subseteq M$, $\sigma_{M'} \subseteq \sigma_M$ and $\mu_{M'} \subseteq \mu_M$.


19.3.15 THEOREM: Conditions for a sub-system of a unitary module over a ring to be a submodule.
Let $(R, M, \sigma_R, \tau_R, \sigma_M, \mu_M)$ be a unitary module over a ring. Let $M' \subseteq M$ satisfy


(i) $\sigma_M(M' \times M') \subseteq M'$ and $\mu_M(R \times M') \subseteq M'$,
(ii) $0_M \in M'$.

Then $M' \mathrel{-\!<} (R, M', \sigma_R, \tau_R, \sigma_{M'}, \mu_{M'})$, where $\sigma_{M'} = \sigma_M|_{M' \times M'}$ and $\mu_{M'} = \mu_M|_{R \times M'}$, is a submodule
of $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu_M)$.

PROOF: Let $x \in M'$. Then $-x = -\mu_M(1_R, x) = \mu_M(-1_R, x)$ by Definition 19.3.6 (ii) and Theorem 19.3.2 (ii),
because $R$ has a multiplicative identity $1_R$. Therefore $-x \in M'$ because $\mu_M(R \times M') \subseteq M'$. Thus
$\forall x \in M'$, $-x \in M'$. Hence the assertion follows by Theorem 19.3.12.


19.3.16 REMARK: The relation between linear spaces and unitary modules over rings. 

If the ring $R$ of a unitary left module $M$ over a ring $R$ is a field, then $M$ is called a “linear space” over $R$.
(See Definition 22.1.1.) In other words, a linear space is the same thing as a unitary left module over a field. 
(See Remark 22.1.2 for more details.) 


Linear spaces are deferred until Chapters 22-24. Their properties are presented in much greater detail than 
most other algebraic systems because of their fundamental importance for differential geometry. 


Since linear spaces are so central to mathematics and all of the sciences, it is not surprising that modules 
over rings, which are so closely related to linear spaces, also have great importance. Modules over rings 
may be thought of as the maximal generalisation of linear spaces which preserves a large proportion of the 
virtues of linear spaces. In practice, it turns out that modules over rings are very frequently the right choice 
of algebraic structure. However, modules have an unfortunate choice of name which makes them sound dull 
and lifeless. Perhaps they should have been called “linear spaces over rings” or “vector spaces over rings”
to benefit from the name recognition of linear spaces and vector spaces. (See also Remark 19.3.4 for this
“branding” issue.)


19.4. Morphisms between modules over rings 


19.4.1 REMARK: How to skip Section 19.4. 

Section 19.4 is intensely tedious. Theorem 19.4.13 is applied in Theorem 19.9.6 to show that the algebra of 
endomorphisms of a unitary left module over a commutative unitary ring is an associative algebra over that 
ring. This is in fact the primary motivation for Section 19.4, and the result does help illuminate associative 
algebras. But associative algebras are themselves not centrally important to differential geometry in the 
way that Lie algebras are. On the other hand, the very important Lie algebras are often constructed from 
associative algebras. So the result is not entirely uninteresting. Nevertheless, Section 19.4 would be a good 
section to skip by readers who have something better to do. 


19.4.2 REMARK: Morphisms between modules over the same ring. 
Definition 19.4.3 and Notation 19.4.4 for morphisms between modules over the same ring are derived from 
the corresponding Definition 19.1.10 and Notation 19.1.12 for modules over sets. 


19.4.3 DEFINITION: Ring-module morphisms.
A ring-module homomorphism from a module $M_1 \mathrel{-\!<} (R, M_1, \sigma_R, \tau_R, \sigma_1, \mu_1)$ to a module
$M_2 \mathrel{-\!<} (R, M_2, \sigma_R, \tau_R, \sigma_2, \mu_2)$ over a ring $R \mathrel{-\!<} (R, \sigma_R, \tau_R)$ is a map $\phi : M_1 \to M_2$ such that:

(i) $\phi$ is a group homomorphism from $M_1$ to $M_2$. That is, $\forall x, y \in M_1$, $\phi(\sigma_1(x, y)) = \sigma_2(\phi(x), \phi(y))$.
In other words, $\forall x, y \in M_1$, $\phi(x + y) = \phi(x) + \phi(y)$.


(ii) $\forall \lambda \in R$, $\forall x \in M_1$, $\phi(\mu_1(\lambda, x)) = \mu_2(\lambda, \phi(x))$.
In other words, $\forall \lambda \in R$, $\forall x \in M_1$, $\phi(\lambda x) = \lambda \phi(x)$.


A ring-module monomorphism from a module $M_1$ to a module $M_2$ over the same ring $R$ is an injective
ring-module homomorphism $\phi : M_1 \to M_2$.


A ring-module epimorphism from a module $M_1$ to a module $M_2$ over the same ring $R$ is a surjective
ring-module homomorphism $\phi : M_1 \to M_2$.


A ring-module isomorphism from a module $M_1$ to a module $M_2$ over the same ring $R$ is a bijective
ring-module homomorphism $\phi : M_1 \to M_2$.


A ring-module endomorphism of a module $M$ over a ring $R$ is a ring-module homomorphism $\phi : M \to M$.


A ring-module automorphism of a module $M$ over a ring $R$ is a ring-module isomorphism $\phi : M \to M$.


19.4.4 NOTATION: Let $M$, $M_1$, $M_2$ be modules over the same ring $R$.

$\mathrm{Hom}_R(M_1, M_2)$ denotes the set of ring-module homomorphisms from $M_1$ to $M_2$ over $R$.
$\mathrm{Mon}_R(M_1, M_2)$ denotes the set of ring-module monomorphisms from $M_1$ to $M_2$ over $R$.
$\mathrm{Epi}_R(M_1, M_2)$ denotes the set of ring-module epimorphisms from $M_1$ to $M_2$ over $R$.
$\mathrm{Iso}_R(M_1, M_2)$ denotes the set of ring-module isomorphisms from $M_1$ to $M_2$ over $R$.
$\mathrm{End}_R(M)$ denotes the set of ring-module endomorphisms of $M$ over $R$.
$\mathrm{Aut}_R(M)$ denotes the set of ring-module automorphisms of $M$ over $R$.


19.4.5 REMARK: Sets of module homomorphisms are linear spaces with pointwise operations. 

In the linear space language in Chapter 22, all of the sets of morphisms in Definition 19.4.3 are linear spaces.
Theorem 19.4.6 states that $\mathrm{Hom}_R(M_1, M_2)$ is a linear space, but without using the term “linear space”.
(Because of the close connection between unitary modules over rings and linear spaces, homomorphisms
between modules over rings are sometimes referred to as linear maps.)


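19.4.6 THEOREM: Closure of sets of ring-module homomorphisms.
$\mathrm{Hom}_R(M_1, M_2)$ is closed under pointwise addition for any modules $M_1$ and $M_2$ over a ring $R$, and is
closed under pointwise scalar multiplication if $R$ is commutative.


PROOF: Let $\phi_1, \phi_2 \in \mathrm{Hom}_R(M_1, M_2)$. Then using the notation of Definition 19.4.3,

\begin{align*}
\forall x, y \in M_1,\quad (\phi_1 + \phi_2)(x + y) &= \phi_1(x + y) + \phi_2(x + y) \\
&= (\phi_1(x) + \phi_1(y)) + (\phi_2(x) + \phi_2(y)) \\
&= (\phi_1(x) + \phi_2(x)) + (\phi_1(y) + \phi_2(y)) \\
&= (\phi_1 + \phi_2)(x) + (\phi_1 + \phi_2)(y),
\end{align*}

which shows that the pointwise sum $\phi_1 + \phi_2$ satisfies Definition 19.4.3 (i). Condition (ii) follows from
$(\phi_1 + \phi_2)(\lambda x) = \lambda\phi_1(x) + \lambda\phi_2(x) = \lambda(\phi_1 + \phi_2)(x)$ by Definition 19.3.1 (iii). Thus $\mathrm{Hom}_R(M_1, M_2)$ is closed under
pointwise addition. Similarly, if $R$ is commutative, the pointwise scalar product $\lambda\phi$ is in $\mathrm{Hom}_R(M_1, M_2)$ for any
$\lambda \in R$ and $\phi \in \mathrm{Hom}_R(M_1, M_2)$. (Commutativity is used in line (19.4.8) in the proof of Theorem 19.4.9.)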


19.4.7 REMARK: The module of homomorphisms between two modules over the same ring. 
The set $\mathrm{Hom}_R(M_1, M_2)$ of all homomorphisms between modules $M_1$ and $M_2$ over the same commutative
ring $R$ has the natural structure of a module over the same ring, as in Definition 19.4.8. The fact that this is
a valid module is verified in Theorem 19.4.9. The commutativity of the ring is applied in line (19.4.8). (The
slightly surprising necessity for commutativity is pointed out by Ash [50], page 95. This shows that checking 
all of the conditions is not entirely superfluous! See also MacLane/Birkhoff [110], page 166.) 
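
The need for commutativity can be seen in a small non-commutative example. The following sketch is an illustration under assumed choices that do not come from the text: $R$ is the ring of $2 \times 2$ matrices over $\mathbb{Z}_2$, $M = R$ is regarded as a left module over itself by left multiplication, and $\phi$ is the identity map. The pointwise scalar product $a\phi$ is additive, but it fails the scalarity condition of Definition 19.4.3 (ii) for a suitable matrix $a$.

```python
from itertools import product

# R = 2x2 matrices over Z_2, stored as ((a, b), (c, d)). This ring is non-commutative.
R = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]

def mul(p, q):
    """Matrix product mod 2."""
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

def phi(x):
    """The identity map on M = R, regarded as a left module over itself."""
    return x

def scaled(a, f):
    """Pointwise scalar product: (a f)(x) = a f(x)."""
    return lambda x: mul(a, f(x))

def respects_scalars(f):
    """Definition 19.4.3 (ii): f(lam x) = lam f(x) for all lam in R and x in M."""
    return all(f(mul(lam, x)) == mul(lam, f(x)) for lam in R for x in R)

a = ((0, 1), (0, 0))    # a matrix which does not commute with every element of R

phi_ok = respects_scalars(phi)
scaled_ok = respects_scalars(scaled(a, phi))
print(phi_ok, scaled_ok)
```

The failure occurs because $(a\phi)(\lambda x) = a\lambda x$ whereas $\lambda (a\phi)(x) = \lambda a x$, and these differ whenever $a\lambda \ne \lambda a$.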


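19.4.8 DEFINITION: The ring-module of ring-module homomorphisms from a module
$M_1 \mathrel{-\!<} (R, M_1, \sigma_R, \tau_R, \sigma_1, \mu_1)$ to a module $M_2 \mathrel{-\!<} (R, M_2, \sigma_R, \tau_R, \sigma_2, \mu_2)$ over a commutative ring $R$ is the
tuple $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma, \mu)$, where:


(i) $M = \mathrm{Hom}_R(M_1, M_2)$ is the set of ring-module homomorphisms from $M_1$ to $M_2$ over $R$.
(ii) $\sigma : M \times M \to M$ is defined by $\forall \phi_1, \phi_2 \in M$, $\forall x \in M_1$, $\sigma(\phi_1, \phi_2)(x) = \sigma_2(\phi_1(x), \phi_2(x))$.
In other words, $\forall \phi_1, \phi_2 \in M$, $\forall x \in M_1$, $(\phi_1 + \phi_2)(x) = \phi_1(x) + \phi_2(x)$.
(iii) $\mu : R \times M \to M$ is defined by $\forall a \in R$, $\forall \phi \in M$, $\forall x \in M_1$, $\mu(a, \phi)(x) = \mu_2(a, \phi(x))$.
In other words, $\forall a \in R$, $\forall \phi \in M$, $\forall x \in M_1$, $(a\phi)(x) = a\phi(x)$.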


19.4.9 THEOREM:  Ring-modules of ring-module homomorphisms over commutative rings are ring-modules. 
The ring-module of ring-module homomorphisms between two modules over the same commutative ring R 
is a ring-module over R. 

PROOF: Let $M_1 \mathrel{-\!<} (R, M_1, \sigma_R, \tau_R, \sigma_1, \mu_1)$ and $M_2 \mathrel{-\!<} (R, M_2, \sigma_R, \tau_R, \sigma_2, \mu_2)$ be modules over a commutative
ring $R \mathrel{-\!<} (R, \sigma_R, \tau_R)$. Let $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma, \mu)$ be the ring-module of ring-module homomorphisms from
$M_1$ to $M_2$. To show that $M$ is a ring-module over $R$, note first that Definition 19.3.1 (i) is satisfied because
$R$ is a ring by Definition 19.4.8.


For Definition 19.3.1 (ii), it must be shown that $(M, \sigma)$ is a commutative group. The algebraic closure of
$\sigma : M \times M \to M$ follows from Theorem 19.4.6. The associativity and commutativity of $\sigma$ follow from
Definition 19.4.8 (ii) and the associativity and commutativity of $\sigma_2$. Define $\phi_0 : M_1 \to M_2$ by $\phi_0 : x \mapsto 0_{M_2}$.
Then $\phi_0 \in M$ and $\phi_0 + \phi = \phi + \phi_0 = \phi$ for all $\phi \in M$. So $\phi_0$ is an identity for $(M, \sigma)$. Similarly, for all
$\phi \in M$ the map $-\phi : x \mapsto -\phi(x)$ is the additive inverse of $\phi$. So $(M, \sigma)$ is a commutative group.

For Definition 19.3.1 (iii), it must first be shown that $\mu : R \times M \to M$ is well defined. For every $a \in R$ and
$\phi \in M$, it must be shown that $\mu(a, \phi)$ is a ring-module homomorphism from $M_1$ to $M_2$ over $R$. Let $x \in M_1$.
Then $\mu(a, \phi)(x) = \mu_2(a, \phi(x)) \in M_2$. Thus $\mu(a, \phi) : M_1 \to M_2$. Then


\begin{align}
\forall x, y \in M_1,\quad \mu(a, \phi)(\sigma_1(x, y)) &= \mu_2(a, \phi(\sigma_1(x, y))) \tag{19.4.1} \\
&= \mu_2(a, \sigma_2(\phi(x), \phi(y))) \tag{19.4.2} \\
&= \sigma_2(\mu_2(a, \phi(x)), \mu_2(a, \phi(y))) \tag{19.4.3} \\
&= \sigma_2(\mu(a, \phi)(x), \mu(a, \phi)(y)), \tag{19.4.4}
\end{align}


where line (19.4.1) is by Definition 19.4.8 (iii) (pointwise product), line (19.4.2) is by Definition 19.4.3 (i)
(homomorphism additivity), line (19.4.3) is by Definition 19.3.1 (iii) (module additive distributivity), and
line (19.4.4) is by Definition 19.4.8 (iii) (pointwise product). To put it more colloquially, $(a\phi)(x + y) =
a\phi(x + y) = a(\phi(x) + \phi(y)) = a\phi(x) + a\phi(y) = (a\phi)(x) + (a\phi)(y)$. Thus $\mu(a, \phi)$ satisfies Definition 19.4.3 (i)
for homomorphism additivity from $M_1$ to $M_2$. The corresponding homomorphism scalarity condition may
be checked as follows.


\begin{align}
\forall \lambda \in R,\ \forall x \in M_1,\quad \mu(a, \phi)(\mu_1(\lambda, x)) &= \mu_2(a, \phi(\mu_1(\lambda, x))) \tag{19.4.5} \\
&= \mu_2(a, \mu_2(\lambda, \phi(x))) \tag{19.4.6} \\
&= \mu_2(\tau_R(a, \lambda), \phi(x)) \tag{19.4.7} \\
&= \mu_2(\tau_R(\lambda, a), \phi(x)) \tag{19.4.8} \\
&= \mu_2(\lambda, \mu_2(a, \phi(x))) \tag{19.4.9} \\
&= \mu_2(\lambda, \mu(a, \phi)(x)), \tag{19.4.10}
\end{align}


where line (19.4.5) is by Definition 19.4.8 (iii) (pointwise product), line (19.4.6) is by Definition 19.4.3 (ii)
(homomorphism scalarity), line (19.4.7) is by Definition 19.3.1 (v) (module multiplicative distributivity),
line (19.4.8) is by Definition 19.4.8 (ring commutativity), line (19.4.9) is by Definition 19.3.1 (v) (module
multiplicative distributivity), and line (19.4.10) is by Definition 19.4.8 (iii) (pointwise product). To put it
more colloquially, $(a\phi)(\lambda x) = a\phi(\lambda x) = a(\lambda\phi(x)) = (a\lambda)\phi(x) = (\lambda a)\phi(x) = \lambda(a\phi(x)) = \lambda((a\phi)(x))$. Thus
$\mu(a, \phi)$ satisfies Definition 19.4.3 (ii) for homomorphism scalarity from $M_1$ to $M_2$. So $\mu : R \times M \to M$ is a
well-defined map.


Let $\lambda \in R$ and $\phi_1, \phi_2 \in M$. Let $x \in M_1$. Then $(\lambda(\phi_1 + \phi_2))(x) = \lambda((\phi_1 + \phi_2)(x))$ by Definition 19.4.8 (iii).
This equals $\lambda(\phi_1(x) + \phi_2(x))$ by Definition 19.4.8 (ii). This equals $\lambda\phi_1(x) + \lambda\phi_2(x)$ by Definition 18.1.2 (iii).
This then equals $(\lambda\phi_1)(x) + (\lambda\phi_2)(x)$ by Definition 19.4.8 (iii). And this then equals $(\lambda\phi_1 + \lambda\phi_2)(x)$ by
Definition 19.4.8 (ii). Thus $\forall \lambda \in R$, $\forall \phi_1, \phi_2 \in M$, $\lambda(\phi_1 + \phi_2) = \lambda\phi_1 + \lambda\phi_2$. Therefore $\mu : R \times M \to M$
satisfies Definition 19.3.1 (iii).

Let $\lambda_1, \lambda_2 \in R$ and $\phi \in M$. Let $x \in M_1$. Then $((\lambda_1 + \lambda_2)\phi)(x) = (\lambda_1 + \lambda_2)\phi(x) = \lambda_1\phi(x) + \lambda_2\phi(x) =
(\lambda_1\phi)(x) + (\lambda_2\phi)(x) = (\lambda_1\phi + \lambda_2\phi)(x)$ by Definition 19.4.8 (iii), Definition 18.1.2 (iii), Definition 19.4.8 (iii)
and Definition 19.4.8 (ii) respectively. So $\mu$ and $\sigma$ satisfy Definition 19.3.1 (iv).

Let $\lambda_1, \lambda_2 \in R$ and $\phi \in M$. Let $x \in M_1$. Then $((\lambda_1\lambda_2)\phi)(x) = (\lambda_1\lambda_2)\phi(x) = \lambda_1(\lambda_2\phi(x)) = \lambda_1((\lambda_2\phi)(x)) =
(\lambda_1(\lambda_2\phi))(x)$ by Definition 19.4.8 (iii), Definition 19.3.1 (v), Definition 19.4.8 (iii) and Definition 19.4.8 (iii)
respectively. So $\mu$ satisfies Definition 19.3.1 (v). Hence $M$ is a ring-module over $R$ by Definition 19.3.1.


19.4.10 DEFINITION: The ring-module of ring-module endomorphisms of a module M over a commutative 
ring R is the ring-module of ring-module homomorphisms from M to M over R. 

19.4.11 REMARK: Unitary ring-module morphisms.

There is no need to define unitary ring-module morphisms. Definition 19.4.3 and Notation 19.4.4 for general
ring-module morphisms are adequate. If $R$ is a unitary ring with unit $1_R$, and $M_1$ is a unitary module
over $R$, then $\forall \lambda \in R$, $1_R \lambda = \lambda 1_R = \lambda$ by Definition 18.2.2, and $\forall x \in M_1$, $1_R x = x$ by Definition 19.3.6.
Therefore $\forall x \in M_1$, $1_R \phi(x) = \phi(1_R x) = \phi(x)$ by Definition 19.4.3 for a ring-module homomorphism
$\phi : M_1 \to M_2$ over $R$.
In other words, $1_R$ acts at least as a unit element on the left for all elements of $\mathrm{Range}(\phi)$. This is not strong
enough to imply that $M_2$ is a unitary module over $R$. (Consider for example the zero map $\phi : x \mapsto 0_{M_2}$, which
is a ring-module homomorphism. Knowing that $1_R 0_{M_2} = 0_{M_2}$ certainly does not show that $M_2$ is a unitary
module.) However, the conditions for a module homomorphism are clearly always compatible with $M_1$ and
$M_2$ both being unitary modules over $R$. Thus it is redundant to say that a ring-module homomorphism
between unitary modules is a unitary ring-module homomorphism. However, it is not entirely obvious that
the ring-module of ring-module homomorphisms between modules $M_1$ and $M_2$ in Definition 19.4.8 will be a
unitary ring-module if $M_1$ and $M_2$ are unitary modules over a unitary ring $R$.


19.4.12 THEOREM: Modules of module homomorphisms (of a certain kind) are unitary ring-modules. 
The ring-module of ring-module homomorphisms between two unitary modules over a commutative unitary 
ring R is a unitary ring-module over R. 


PROOF: Let $M_1 \mathrel{-\!<} (R, M_1, \sigma_R, \tau_R, \sigma_1, \mu_1)$ and $M_2 \mathrel{-\!<} (R, M_2, \sigma_R, \tau_R, \sigma_2, \mu_2)$ be unitary modules over a
commutative unitary ring $R \mathrel{-\!<} (R, \sigma_R, \tau_R)$. Let $M \mathrel{-\!<} (R, M, \sigma_R, \tau_R, \sigma, \mu)$ be the ring-module of ring-module
homomorphisms from $M_1$ to $M_2$. Then $M$ is a ring-module over $R$ by Theorem 19.4.9.

Let $\phi \in M$. Then $\forall x \in M_1$, $(1_R \phi)(x) = 1_R \phi(x) = \phi(x)$ by Definition 19.4.8 (iii) and Definition 19.3.6 (ii).
Thus $1_R \phi = \phi$ for all $\phi \in M$. Hence $M$ is a unitary ring-module over $R$ by Definition 19.3.6.


19.4.13 THEOREM: Properties of the ring-module of endomorphisms of a commutative ring-module. 

Let A be the ring-module of ring-module endomorphisms of a module M over a commutative ring R. 
(i) A is a module over the ring R. 

(ii) If M is a unitary module over R, then A is a unitary module over R. 


(iii) $\mathrm{End}_R(M)$ is closed under function composition.


PROOF: Part (i) follows from Theorem 19.4.9.

Part (ii) follows from Theorem 19.4.12.

For part (iii), let $\phi_1, \phi_2 \in \mathrm{End}_R(M)$. Let $x, y \in M$. Then $(\phi_1 \circ \phi_2)(x + y) = \phi_1(\phi_2(x + y)) = \phi_1(\phi_2(x) +
\phi_2(y)) = \phi_1(\phi_2(x)) + \phi_1(\phi_2(y)) = (\phi_1 \circ \phi_2)(x) + (\phi_1 \circ \phi_2)(y)$. So $\phi_1 \circ \phi_2$ satisfies Definition 19.4.3 (i).
Similarly, $\phi_1 \circ \phi_2$ satisfies Definition 19.4.3 (ii). So $\phi_1 \circ \phi_2 \in \mathrm{End}_R(M)$.
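
The closure properties in Theorems 19.4.6 and 19.4.13 (iii) can be verified exhaustively for a very small module. The sketch below is a brute-force illustration under assumed choices that are not from the text: the ring is the field $\mathbb{Z}_2$, which is commutative, and the module is $\mathbb{Z}_2 \times \mathbb{Z}_2$. All 256 maps from the module to itself are enumerated, the ring-module endomorphisms are filtered out using Definition 19.4.3, and closure under pointwise sum, pointwise scalar product and composition is then tested.

```python
from itertools import product

R = (0, 1)                        # the field Z_2, a commutative ring
M = list(product(R, repeat=2))    # the module Z_2 x Z_2

def add(u, v):
    """Componentwise addition mod 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def scale(r, u):
    """Componentwise scalar multiplication mod 2."""
    return tuple((r * a) % 2 for a in u)

def is_hom(f):
    """Check Definition 19.4.3 (i) and (ii) for a map f stored as a dict."""
    additive = all(f[add(u, v)] == add(f[u], f[v]) for u in M for v in M)
    scalar = all(f[scale(r, u)] == scale(r, f[u]) for r in R for u in M)
    return additive and scalar

def key(f):
    """Hashable fingerprint of a map: its tuple of values on M."""
    return tuple(f[u] for u in M)

all_maps = [dict(zip(M, values)) for values in product(M, repeat=len(M))]
homs = [f for f in all_maps if is_hom(f)]
hom_keys = {key(f) for f in homs}

closed_sum = all(key({u: add(f[u], g[u]) for u in M}) in hom_keys
                 for f in homs for g in homs)
closed_scale = all(key({u: scale(r, f[u]) for u in M}) in hom_keys
                   for r in R for f in homs)
closed_comp = all(key({u: f[g[u]] for u in M}) in hom_keys
                  for f in homs for g in homs)

print(len(homs), closed_sum, closed_scale, closed_comp)
```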


19.5. Seminorms on modules over rings 


19.5.1 REMARK: The role of the positivity requirement for norms. 

The motivation for introducing seminorms in Definition 19.5.3 is to determine how much can be proved for 
the norm in Definition 19.6.2 without the positivity condition. In the case of finite-dimensional linear spaces, 
the positivity condition can be shown to imply a kind of uniform positivity condition, which is useful for 
constraining the topology which is induced by the norm. 


In the case of infinite-dimensional linear spaces, the use of infinitely many seminorms in place of a single norm 
permits a wider range of topological linear spaces to be defined, in particular those spaces which are related 
to Schwartz distributions and other kinds of generalised functions. This topic is outside the scope of this 
book, as mentioned in Remark 1.6.3 (5). (For presentations of seminorms in the context of topological linear
spaces, see Robertson/Robertson [126], pages 12-22; Rudin [130], pages 25-30; Yosida [167], pages 23-30.)


19.5.2 EXAMPLE: Seminorms on a module over a zero ring. 

Let R = {0_R} be a zero ring as in Definition 18.1.5. Let M = ℝ be the additive group of real numbers. Define
the left action of R on M by 0_R x = 0_M for all x ∈ M. Then M is a left module over the ring R according
to Definition 19.3.1. Let S = ℝ be the ring of real numbers with the usual order as in Definition 15.6.2.
Define φ : R → S by φ(0_R) = 0_S. Then φ is an absolute value function on R by Definition 18.5.6. (See also
Example 18.5.9 for absolute value functions on the zero ring.) Define ψ : M → S by


∀x ∈ M, ψ(x) = αx if x ≥ 0_ℝ, and ψ(x) = βx if x ≤ 0_ℝ,


where α, β ∈ ℝ satisfy α ≥ β. Let λ ∈ R and x ∈ M. Then λ = 0_R. So ψ(λx) = ψ(0_R x) = ψ(0_M) = 0_S =
0_S ψ(x) = φ(0_R)ψ(x) = φ(λ)ψ(x). So φ and ψ satisfy Definition 19.5.3 condition (i). Suppose that x ≥ 0_ℝ
and y ≥ 0_ℝ. Then ψ(x + y) = α(x + y) = αx + αy = ψ(x) + ψ(y). Similarly ψ(x + y) = ψ(x) + ψ(y) if x ≤ 0_ℝ
and y ≤ 0_ℝ. Suppose that x ≤ 0_ℝ and y ≥ 0_ℝ and x + y ≥ 0_ℝ. Then ψ(x + y) = α(x + y) = αx + αy =
βx + αy + (α − β)x ≤ βx + αy = ψ(x) + ψ(y). Suppose that x ≤ 0_ℝ and y ≥ 0_ℝ and x + y ≤ 0_ℝ. Then
ψ(x + y) = β(x + y) = βx + βy = βx + αy + (β − α)y ≤ βx + αy = ψ(x) + ψ(y). Thus ψ(x + y) ≤ ψ(x) + ψ(y)
for all x, y ∈ M. So ψ satisfies Definition 19.5.3 condition (ii).


This example shows that a seminorm on a module over a zero ring is not necessarily symmetric with respect
to negation of its argument, and is not even necessarily non-negative. In fact, in the case of a module over 
a zero ring, a seminorm is only required to satisfy a limited kind of subadditivity. This is not the kind of 
behaviour which is typically associated with the words “seminorm” or “norm”. Therefore Definitions 19.5.3 
and 19.6.2 require the ring R to be a non-zero ring. 
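
The behaviour described in Example 19.5.2 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the values α = 2 and β = 1 are chosen here and do not come from the text, and the test runs over a small grid of integers regarded as real numbers.

```python
from itertools import product

ALPHA, BETA = 2, 1  # assumed sample values with ALPHA >= BETA


def psi(x):
    """Candidate map on M = R: alpha*x for x >= 0, beta*x for x < 0."""
    return ALPHA * x if x >= 0 else BETA * x


grid = range(-5, 6)

# Subadditivity: psi(x + y) <= psi(x) + psi(y) for every pair on the grid.
subadditive = all(psi(x + y) <= psi(x) + psi(y) for x, y in product(grid, grid))

# Symmetry under negation and non-negativity both fail for these sample values.
symmetric = all(psi(-x) == psi(x) for x in grid)
non_negative = all(psi(x) >= 0 for x in grid)

print(subadditive, symmetric, non_negative)
```

For these sample values the map is subadditive, but ψ(−1) = −1 differs from ψ(1) = 2 and is negative, which is the pathology that motivates the non-zero ring requirement.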


19.5.3 DEFINITION: A seminorm on a (left) module M over a non-zero ring R, compatible with an absolute
value function φ : R → S, for an ordered ring S, is a map ψ : M → S which satisfies


(i) ∀λ ∈ R, ∀x ∈ M, ψ(λx) = φ(λ)ψ(x), [scalarity]
(ii) ∀x, y ∈ M, ψ(x + y) ≤ ψ(x) + ψ(y). [subadditivity]


19.5.4 THEOREM: Some basic properties of seminorms on left modules over rings.
Let ψ : M → S be a seminorm on a left module M over a non-zero ring R, compatible with an absolute
value function φ : R → S, for an ordered ring S.
(i) ψ(0_M) = 0_S.
(ii) ∀x ∈ M, ψ(−x) = ψ(x).
(iii) ∀x ∈ M, ψ(x) ≥ 0_S.
(iv) ∀x, y ∈ M, ψ(x) ≥ ψ(y) − ψ(x − y).


PROOF: For part (i), the equality ψ(0_M) = ψ(0_R 0_M) = φ(0_R)ψ(0_M) = 0_S ψ(0_M) = 0_S follows from
Theorem 19.3.2 (i), Definitions 19.5.3 (i) and 18.5.6 (iii), and Theorem 18.1.3 (i).

For part (ii), let x ∈ M and λ ∈ R \ {0_R}. Then −λ ∈ R \ {0_R} and ψ((−λ)x) = φ(−λ)ψ(x) = φ(λ)ψ(x)
by Definition 19.5.3 (i) and Theorem 18.5.10 (iii). But ψ((−λ)x) = ψ(λ(−x)) = φ(λ)ψ(−x) by Definitions
19.3.1 (v) and 19.5.3 (i). So φ(λ)ψ(x) = φ(λ)ψ(−x). But φ(λ) ≠ 0_S by Definition 18.5.6, and S is a
cancellative ring by Theorem 18.3.10 (xiv) because it is an ordered ring. Hence ψ(x) = ψ(−x).

For part (iii), let x ∈ M. Then 0_S = ψ(0_M) = ψ(x + (−x)) ≤ ψ(x) + ψ(−x) = ψ(x) + ψ(x) by parts (i)
and (ii) and Definition 19.5.3 (ii). If ψ(x) < 0_S, then ψ(x) + ψ(x) < 0_S, which contradicts this
inequality. Therefore ψ(x) ≥ 0_S.

For part (iv), let x, y ∈ M. Then ψ(y) = ψ(x + (y − x)) ≤ ψ(x) + ψ(y − x) by Definition 19.5.3 (ii). But
ψ(y − x) = ψ(x − y) by part (ii). So ψ(x) ≥ ψ(y) − ψ(x − y).


19.6. Norms on modules over rings 


19.6.1 REMARK: General norms for modules over rings. 

The scalarity condition for the minimalist norm on modules over rings in Definition 19.6.2 is expressed in 
terms of the minimalist absolute value function in Definition 18.5.6. (See Definition 18.3.3 for ordered rings. 
Some motivation for the apparently excessively weak requirements in Definition 19.6.2 may possibly be found 
in Examples 18.5.8, 19.6.5 and 19.6.6.) 


Definition 19.6.2 differs from the seminorms in Definition 19.5.3 by the addition of condition (iii). The
non-negativity of a seminorm is asserted in Theorem 19.5.4 (iii).


19.6.2 DEFINITION: A norm on a (left) module M < (R, M, σ_R, τ_R, σ_M, μ) over a non-zero ring
R < (R, σ_R, τ_R), compatible with absolute value function φ : R → S, for an ordered ring S < (S, σ_S, τ_S, <),
is a map ψ : M → S which satisfies


(i) ∀λ ∈ R, ∀x ∈ M, ψ(λx) = φ(λ)ψ(x), [scalarity]
(ii) ∀x, y ∈ M, ψ(x + y) ≤ ψ(x) + ψ(y), [subadditivity]
(iii) ∀x ∈ M \ {0_M}, ψ(x) > 0_S. [positivity]


19.6.3 DEFINITION: A normed (left) module over a non-zero ring R < (R, σ_R, τ_R) compatible with absolute
value function φ : R → S, for an ordered ring S, is a tuple M < (M, ψ) < (R, M, σ_R, τ_R, σ_M, μ, ψ) where
M < (R, M, σ_R, τ_R, σ_M, μ) is a left module over R and ψ : M → S is a norm on M compatible with φ.


19.6.4 THEOREM: Some basic properties of norms on left modules over rings.
Let ψ : M → S be a norm on a left module M over a non-zero ring R, compatible with an absolute value
function φ : R → S, for an ordered ring S.


(i) ψ is a seminorm on the left module M over the non-zero ring R, compatible with φ : R → S.
(ii) ψ(0_M) = 0_S.
(iii) ∀x ∈ M, ψ(−x) = ψ(x).
(iv) ∀x ∈ M, ψ(x) ≥ 0_S.
(v) ∀x, y ∈ M, ψ(x) ≥ ψ(y) − ψ(x − y).


PROOF: Part (i) follows from Definitions 19.5.3 and 19.6.2.

Parts (ii) and (iii) follow from part (i) and Theorem 19.5.4 parts (i) and (ii) respectively. 
Part (iv) follows from part (ii) and Definition 19.6.2 (iii). 

Part (v) follows from part (i) and Theorem 19.5.4 part (iv). 


19.6.5 EXAMPLE: Let R be the ring 2ℤ = {2n; n ∈ ℤ} with the usual addition and multiplication operations
from ℤ. The set M = ℤ² with componentwise addition is a commutative group. Define the action of R on M
componentwise. Then M is a module over the ring R. Let S = ℝ. Define ψ : M → S by ψ : (x, y) ↦ (x² + y²)^{1/2}.
Define φ : R → S by φ : z ↦ max(z, −z), identifying integers with the corresponding real numbers. Let λ ∈ R and
x ∈ M. Then ψ(λx) = φ(λ)ψ(x). Thus ψ satisfies Definition 19.6.2 for the given M, R, S and φ. (The other
conditions are easily verified.) Note that R is not a unitary ring.
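
The three conditions of Definition 19.6.2 can be spot-checked for Example 19.6.5. The sketch below is an illustration only: the finite ranges of scalars and vectors are arbitrary choices made here, scalarity is tested exactly on squared values, and subadditivity is tested in floating point with a small tolerance.

```python
import math
from itertools import product


def psi(v):
    """Euclidean norm on M = Z^2, valued in S = R."""
    return math.hypot(v[0], v[1])


def phi(lam):
    """Absolute value function on R = 2Z, valued in S = R."""
    return max(lam, -lam)


scalars = range(-6, 7, 2)                       # a few elements of 2Z
vectors = list(product(range(-3, 4), repeat=2))  # a small box in Z^2

# Scalarity, checked exactly on squares: psi(lam*v)^2 == phi(lam)^2 * psi(v)^2.
scalarity = all(
    (lam * x) ** 2 + (lam * y) ** 2 == phi(lam) ** 2 * (x * x + y * y)
    for lam in scalars
    for (x, y) in vectors
)

# Subadditivity, checked in floating point with a small tolerance.
subadditivity = all(
    psi((a + c, b + d)) <= psi((a, b)) + psi((c, d)) + 1e-12
    for (a, b) in vectors
    for (c, d) in vectors
)

# Positivity on non-zero vectors.
positivity = all(psi(v) > 0 for v in vectors if v != (0, 0))

print(scalarity, subadditivity, positivity)
```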


19.6.6 EXAMPLE: Let M = ℂ^k, the set of k-tuples of complex numbers for some k ∈ ℤ^+. Let R = ℂ.
Then R is a ring with the usual addition and multiplication operations. Define the action of R on M
componentwise. Let S = ℝ. Define φ : R → S by φ(λ) = |λ|, the usual absolute value function for the
complex numbers. Define ψ : M → S by ψ : z ↦ (Σ_{i=1}^k |z_i|²)^{1/2}. Then ψ is a norm on M over R with
absolute value function φ for the ordered ring S. Note that R ≠ S and R is not an ordered ring.


19.6.7 REMARK: The interactions between sets and functions for a norm on a module over a ring. 
The spaces and maps in Definition 19.6.2 (i) are illustrated in Figure 19.6.1. 


[Commutative diagram: L_λ : M → M, ψ : M → S and L_φ(λ) : S → S, with ψ ∘ L_λ = L_φ(λ) ∘ ψ]
Figure 19.6.1 Scalarity of a norm on a module over a ring


There are two rings in Definition 19.6.2. The role of the first ring R is to scale the elements of M. That
is, elements of R act on the elements of M without changing their direction, or by reversing their direction.
The role of the second ring S is to represent the size of elements of M. The value of a norm function ψ
is an element of S, which is equipped with a total order by which relative size can be represented.


When the left action L_λ acts on M for some λ ∈ R, the effect on the norm of elements of M is the same as
the action of φ(λ) on S. Thus the absolute value function φ indicates the scaling effect of elements of R. In
other words, φ(λ) is the scaling effect of λ ∈ R. Definition 19.6.2 (i) is similar to a linearity condition. One
could perhaps refer to it as a “scalarity” condition.


In the case of linear spaces M over the field of complex numbers R = ℂ, as in Example 19.6.6, norms on M
are generally valued in the ordered ring of real numbers S = ℝ. Then the absolute value function φ : ℂ → ℝ
indicates the scaling effect of complex numbers. The ring R = ℂ (which is not an ordered ring) scales
elements of M, but the ordered ring S = ℝ represents lengths of elements of M.


In the case of the module M = ℤ^n over the ring R = ℤ for n ∈ ℤ^+ (with the usual componentwise module
addition and scalar multiplication), the norm ψ : M → ℝ with ψ : z ↦ (Σ_{i=1}^n z_i²)^{1/2} uses ℤ to scale the
module elements, but uses ℝ to represent lengths of module elements.


19.6.8 REMARK: Typical notations for norms and absolute value functions. 

To distinguish norms from absolute value functions, the norm is often denoted by double vertical bars, as in
“ψ(x) = ‖x‖”, while the absolute value function is denoted by single vertical bars, as in “φ(λ) = |λ|”. Then
the scalarity condition, Definition 19.6.2 (i), becomes ∀λ ∈ R, ∀x ∈ M, ‖λx‖ = |λ| ‖x‖.


19.7. Inner products on modules over rings 


19.7.1 REMARK: Square-norms induced by inner products on general modules over general rings. 
Definition 19.7.2 seems meaningful enough for any module over any ordered ring, but as mentioned in
Remark 19.8.1, it does not seem to immediately yield a corresponding norm of the form u ↦ ‖u‖ = η(u, u)^{1/2}
for module elements u. Section 19.8 gives some idea of how the usual properties of norms which are thus
derived from inner products can be demonstrated without being able to define square roots. The
“square-norm” ‖u‖² is generally well defined in terms of any inner product because the value η(u, u) is well
defined for any u in a module, even when its square root is meaningless.


19.7.2 DEFINITION: An inner product on a (left) module M < (R, M, σ_R, τ_R, σ_M, μ) over an ordered ring
R < (R, σ_R, τ_R, <) is a map η : M × M → R which satisfies


(i) ∀u, v, w ∈ M, η(u + v, w) = η(u, w) + η(v, w), [left additivity]
(ii) ∀u, v, w ∈ M, η(u, v + w) = η(u, v) + η(u, w), [right additivity]
(iii) ∀u, v ∈ M, ∀α ∈ R, η(αu, v) = η(u, αv) = αη(u, v), [scalarity]
(iv) ∀u, v ∈ M, η(u, v) = η(v, u), [symmetry]
(v) ∀u ∈ M \ {0_M}, 0_R < η(u, u). [positive definiteness]


19.7.3 REMARK: Positive definiteness of an inner product. 

Definition 19.7.2 contains many redundancies. Conditions (i), (ii) and (iii) together mean that an inner
product is bilinear. It is not necessary to assume that η(u, u) equals zero for u = 0_M because this is implied
by the other assumptions, as shown in Theorem 19.7.4 (ii).


19.7.4 THEOREM: Some basic properties of inner products on left modules over ordered rings.
Let η be an inner product on a left module M < (R, M, σ_R, τ_R, σ_M, μ) over an ordered ring (R, σ_R, τ_R, <).


(i) ∀v ∈ M, η(0_M, v) = η(v, 0_M) = 0_R.
(ii) ∀u ∈ M, η(u, u) = 0_R ⇔ u = 0_M.
(iii) ∀u ∈ M, 0_R ≤ η(u, u).
(iv) ∀u, v ∈ M, η(−u, v) = −η(u, v) = η(u, −v).
(v) ∀u, v ∈ M, η(u, v) + η(v, u) ≤ η(u, u) + η(v, v).
(vi) ∀u, v ∈ M, η(u, v) + η(u, v) ≤ η(u, u) + η(v, v).
(vii) ∀u, v ∈ M, ∀α, β ∈ R, (αβ + βα)η(u, v) ≤ α²η(u, u) + β²η(v, v).
(viii) ∀u, v ∈ M, ∀α, β ∈ R, (αβ + αβ)η(u, v) ≤ α²η(u, u) + β²η(v, v).
(ix) ∀u, v ∈ M, ∀α, β ∈ R, (αβ − βα)η(u, v) = 0_R.


PROOF: To prove part (i), note that η(0_M, v) = η(μ(0_R, 0_M), v) = τ_R(0_R, η(0_M, v)) = 0_R follows from
Definition 19.7.2 (iii) for any v ∈ M. The equality η(v, 0_M) = 0_R follows similarly.

Part (ii) follows from part (i) and Definition 19.7.2 (v).

Part (iii) follows from part (ii) and Definition 19.7.2 (v).

For part (iv), let u, v ∈ M. Then η(u − u, v) = η(0_M, v) = 0_R by part (i). Therefore η(u, v) + η(−u, v) =
η(u − u, v) = 0_R by Definition 19.7.2 (i). Hence η(−u, v) = −η(u, v) by the additive group structure of R.
Similarly, η(u, −v) = −η(u, v).

For part (v), note that 0_R ≤ η(u − v, u − v) for any u, v ∈ M by Definition 19.7.2 (v) and part (iii). Then
0_R ≤ η(u, u) − η(u, v) − η(v, u) + η(v, v) by the left and right additivity of η and part (iv). It then follows
by Definition 18.3.2 (i) that η(u, v) + η(v, u) ≤ η(u, u) + η(v, v).

Part (vi) follows from part (v) and Definition 19.7.2 (iv).

For part (vii), let u, v ∈ M and α, β ∈ R. Then η(αu, βv) + η(βv, αu) ≤ η(αu, αu) + η(βv, βv) by part (v).
Hence (αβ + βα)η(u, v) ≤ α²η(u, u) + β²η(v, v) by Definition 19.7.2 (iii, iv).

For part (viii), let u, v ∈ M and α, β ∈ R. Then η(αu, βv) + η(αu, βv) ≤ η(αu, αu) + η(βv, βv) by part (vi).
Hence (αβ + αβ)η(u, v) ≤ α²η(u, u) + β²η(v, v) by Definition 19.7.2 (iii).

For part (ix), let u, v ∈ M and α, β ∈ R. Then η(αu, βv) = αη(u, βv) = αβη(u, v) by Definition 19.7.2 (iii).
But by applying the rule in a different order, one obtains η(αu, βv) = βη(αu, v) = βαη(u, v). Hence
(αβ − βα)η(u, v) = 0_R.


19.7.5 REMARK: Inner products effectively entail commutativity of the ring. 

It is clear from Theorem 19.7.4 (ix) that the form of an inner product in Definition 19.7.2 implicitly assumes
that the ring is commutative. If the product of the commutator of all α, β ∈ R with every inner product
η(u, v) equals 0_R, then as far as the inner product η is concerned, the ring does appear to be commutative.


19.7.6 THEOREM: Parallelogram laws for inner products.
Let η be an inner product on a unitary left module M over a commutative ordered ring R.


(i) ∀u, v ∈ M, 2η(u, v) = η(u, u) + η(v, v) − η(u − v, u − v).
(ii) ∀u, v ∈ M, 2η(u, v) = η(u + v, u + v) − η(u, u) − η(v, v).
(iii) ∀u, v ∈ M, 4η(u, v) = η(u + v, u + v) − η(u − v, u − v).


PROOF: For part (i), let u, v ∈ M. Then η(u − v, u − v) = η(u, u) + η(v, v) − 2η(u, v) by Definition 19.7.2,
where 2 denotes the element 1_R + 1_R of R. So 2η(u, v) = η(u, u) + η(v, v) − η(u − v, u − v).

For part (ii), let u, v ∈ M. Then η(u + v, u + v) = η(u, u) + η(v, v) + 2η(u, v) by Definition 19.7.2. So
2η(u, v) = η(u + v, u + v) − η(u, u) − η(v, v).

Part (iii) follows by summing parts (i) and (ii).
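
The three parallelogram laws can be verified exhaustively on a small sample. The sketch below assumes, purely for illustration, the module M = ℤ² over R = ℤ with an inner product determined by the Gram values η(e_1, e_1) = 2, η(e_2, e_2) = 3 and η(e_1, e_2) = 1. These values and the helper names are choices made here, not taken from the text.

```python
from itertools import product

# Assumed Gram values: eta(e1,e1) = 2, eta(e2,e2) = 3, eta(e1,e2) = 1.
G11, G22, G12 = 2, 3, 1


def eta(u, v):
    """Symmetric bilinear form on Z^2 determined by the assumed Gram values."""
    return G11 * u[0] * v[0] + G22 * u[1] * v[1] + G12 * (u[0] * v[1] + u[1] * v[0])


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])


vectors = list(product(range(-3, 4), repeat=2))
pairs = list(product(vectors, vectors))

# Theorem 19.7.6 (i), (ii) and (iii), tested on every pair in the sample.
law_i = all(
    2 * eta(u, v) == eta(u, u) + eta(v, v) - eta(sub(u, v), sub(u, v)) for u, v in pairs
)
law_ii = all(
    2 * eta(u, v) == eta(add(u, v), add(u, v)) - eta(u, u) - eta(v, v) for u, v in pairs
)
law_iii = all(
    4 * eta(u, v) == eta(add(u, v), add(u, v)) - eta(sub(u, v), sub(u, v)) for u, v in pairs
)

print(law_i, law_ii, law_iii)
```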


19.7.7 REMARK: Geometrical interpretation of parallelogram laws for inner products. 
In ordinary Euclidean geometry, if the inner product of two vectors u and v equals zero, then the two vectors
are orthogonal. From Theorem 19.7.6 (ii), it follows from η(u, v) = 0 that η(u, u) + η(v, v) = η(u + v, u + v).
This may be interpreted in terms of the “square-norm” mentioned in Remark 19.7.1 as ‖u‖² + ‖v‖² = ‖u + v‖²,
which is the Pythagoras law for right-angled triangles.


If η(u, v) = 0, Theorem 19.7.6 (iii) may be interpreted to mean that the mid-point of vectors u and v is the
centre of a circle on which the “origin” 0 lies. This is because η(u + v, u + v) = η(u − v, u − v) if η(u, v) = 0.
This equation may be interpreted as ‖½(u + v)‖ = ‖½(u − v)‖, where ½(u + v) is the mid-point of u and v,
and ½(u − v) is half the diagonal of the parallelogram bounded by u and v. The mid-point of this diagonal
is the same as the mid-point of u and v. This is illustrated in Figure 19.7.1.


Figure 19.7.1 Interpretation of parallelogram law


Therefore the square-distance from 0 to ½(u + v) equals the square-distance from ½(u + v) to both u and v.
Hence this point ½(u + v) is apparently the mid-point of the diameter of a circle which has the origin 0
on its circumference. This matches very well with the Euclidean theorem that the angle subtended by the
diameter of a circle at a point on the circumference is a right angle. Perhaps a simple interpretation of
Theorem 19.7.6 (iii) when η(u, v) = 0 is that the diagonals of a parallelogram are equal when one of the
angles is a right angle.


19.8. Cauchy-Schwarz and triangle inequalities for modules over rings 


19.8.1 REMARK: No norm for inner products on general modules over rings. 

At this point, it is difficult to produce a Cauchy-Schwarz inequality because square roots are not well defined
for general elements of general rings, and nor is division well defined in general. Therefore one cannot even
construct a norm from an inner product η of the form ‖v‖ = η(v, v)^{1/2}.


One might attempt to prove something like η(u, v)² ≤ η(u, u)η(v, v) instead. The first step in such a
proof would be to derive an inequality such as 2αβη(u, v) ≤ α²η(u, u) + β²η(v, v) from the generally valid
inequality η(αu − βv, αu − βv) ≥ 0. However, even this simple derivation requires the ring to be commutative
so that αβ = βα. Therefore this line of development is best abandoned. The scalar space R needs to
be upgraded from a ring to an ordered commutative ring which has at least multiplicative inverses, and
preferably also contains square roots for all non-negative elements. Thus one requires at least an ordered
field and preferably the existence of square roots.


As mentioned in Remark 19.7.5, one loses almost nothing by assuming that the ordered ring is commutative.
It also makes good sense to assume that the module is a unitary module over a ring as in Definition 19.3.6
so as to simplify expressions such as η(u, v) + η(u, v) and αβ + αβ in Theorem 19.7.4 (vi, vii, viii) with the
help of the ring element 2_R = 1_R + 1_R for example. Then one may obtain a generalised Cauchy-Schwarz
inequality for modules over rings as in Theorem 19.8.2. (The notation R_0^+ = {x ∈ R; 0_R ≤ x} for the
“non-negative cone” of an ordered ring is defined in Remark 18.3.21. Similarly, R^+ = {x ∈ R; 0_R < x}
denotes the “positive cone” of R.)


For each element u in a module M over an ordered ring R, the existence of α ∈ R such that η(u, u) ≤ α²
is guaranteed by Theorem 18.4.8 if R is unitary or Archimedean, and R is not the zero ring. (Ordered rings
where some elements have no square upper bound are discussed in Remark 18.4.6.)


19.8.2 THEOREM: Cauchy-Schwarz inequality for modules over rings.
Let η be an inner product on a unitary left module M over a commutative ordered ring R. Then


∀u, v ∈ M, ∀α, β ∈ R_0^+, (η(u, u) ≤ α² ∧ η(v, v) ≤ β²) ⇒ η(u, v) ≤ αβ.


PROOF: Let η be an inner product on a unitary left module M over a commutative ordered ring R. Let
u, v ∈ M and α, β ∈ R_0^+ satisfy η(u, u) ≤ α² and η(v, v) ≤ β². If α = 0_R, then η(u, u) = 0_R, and
therefore u = 0_M and η(u, v) = 0_R, and so η(u, v) ≤ αβ. Similarly, if β = 0_R, then η(u, v) ≤ αβ. Suppose
then that α, β ∈ R^+. It follows from Theorem 19.7.4 (viii) that 2αβη(u, v) ≤ β²η(u, u) + α²η(v, v). So
2αβη(u, v) ≤ β²α² + α²β² by Definition 18.3.2 and Theorem 18.3.10 (vi). Therefore 2αβη(u, v) ≤ 2α²β².
Hence η(u, v) ≤ αβ by Theorem 18.3.10 (xii) because 2αβ > 0.
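
Theorem 19.8.2 can be tested entirely within the integers. The following sketch assumes the sample module M = ℤ² over R = ℤ with hypothetical Gram values 2, 3 and 1 (chosen here for illustration). For each pair of vectors it uses the smallest admissible non-negative integers α and β, since any larger choices only weaken the conclusion.

```python
import math
from itertools import product

# Assumed Gram values: eta(e1,e1) = 2, eta(e2,e2) = 3, eta(e1,e2) = 1.
G11, G22, G12 = 2, 3, 1


def eta(u, v):
    """Symmetric bilinear form on Z^2 determined by the assumed Gram values."""
    return G11 * u[0] * v[0] + G22 * u[1] * v[1] + G12 * (u[0] * v[1] + u[1] * v[0])


def least_bound(n):
    """Smallest non-negative integer a with n <= a*a, for n >= 0."""
    r = math.isqrt(n)
    return r if r * r == n else r + 1


vectors = list(product(range(-4, 5), repeat=2))

# eta(u,u) <= alpha^2 and eta(v,v) <= beta^2 should imply eta(u,v) <= alpha*beta.
cauchy_schwarz = all(
    eta(u, v) <= least_bound(eta(u, u)) * least_bound(eta(v, v))
    for u in vectors
    for v in vectors
)

print(cauchy_schwarz)
```

No square roots of ring elements are formed: `math.isqrt` is only used to locate the smallest integer whose square bounds η(u, u).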


19.8.3 REMARK: The Cauchy-Schwarz inequality for real linear spaces. 
When applied to real linear spaces, Theorem 19.8.2 immediately yields the standard Cauchy-Schwarz
inequality in Theorem 24.9.14.


19.8.4 REMARK: Work-arounds for the non-existence of square roots in Cauchy-Schwarz inequalities.
The form of the assertion in Theorem 19.8.2 carefully avoids assuming the existence of square roots. If one
lets α = η(u, u)^{1/2} and β = η(v, v)^{1/2}, one obtains η(u, u) ≤ α² and η(v, v) ≤ β², from which Theorem 19.8.2
gives η(u, v) ≤ η(u, u)^{1/2} η(v, v)^{1/2}, which is precisely the form of the standard Cauchy-Schwarz inequality.
If one does not assume the existence of square roots, the inequality η(u, v) ≤ η(u, u)^{1/2} η(v, v)^{1/2} may be
expressed in the formally equivalent form η(u, v)² ≤ η(u, u)η(v, v), which does not pre-suppose the existence
of square roots. However, Theorem 19.8.2 does not imply this assertion, but it does imply the assertion that
η(u, v)² ≤ α²β² whenever it is assumed that η(u, u) ≤ α² and η(v, v) ≤ β². From this, one might conjecture
that η(u, v) ≤ inf{α ∈ R_0^+; η(u, u) ≤ α²} · inf{β ∈ R_0^+; η(v, v) ≤ β²}, from which one might conclude that
η(u, v) ≤ ‖u‖ ‖v‖, where ‖w‖ = inf{γ ∈ R_0^+; η(w, w) ≤ γ²} for all w ∈ M. Of course, such an infimum may
not exist, but if one adds such infimums to the ring as some kind of formal completion extension, this kind
of generalised Cauchy-Schwarz inequality could be viable.


19.8.5 REMARK:  Cauchy-Schwarz equality for parallel vectors. 

Theorem 19.8.2 tends to suggest that the Cauchy-Schwarz inequality η(u, v)² ≤ η(u, u)η(v, v) should be
true. This impression is strengthened by Theorem 19.8.6, which implies that in the extremal case of two
“parallel vectors”, the inequality becomes an exact equation. This extremal case suggests that in the more
general case, η(u, v)² will be no greater than η(u, u)η(v, v). However, this kind of geometric intuition proves
absolutely nothing. A logical deduction is required!


19.8.6 THEOREM: Cauchy-Schwarz equality for parallel vectors in a unitary module.
Let η be an inner product on a unitary left module M over a commutative ordered ring R. Let u, v ∈ M
and α, β ∈ R \ {0} satisfy αu + βv = 0. Then η(u, v)² = η(u, u)η(v, v).


PROOF: Let u, v ∈ M and α, β ∈ R \ {0} satisfy αu + βv = 0. Then 0 = η(αu + βv, v) = αη(u, v) + βη(v, v).
So αη(u, v) = −βη(v, v). Similarly, 0 = η(u, αu + βv) = αη(u, u) + βη(u, v). So βη(u, v) = −αη(u, u).
Therefore αβη(u, v)² = (αη(u, v)).(βη(u, v)) = (−βη(v, v)).(−αη(u, u)) = αβη(u, u)η(v, v). Hence
η(u, v)² = η(u, u)η(v, v) by the cancellative property for R because αβ ≠ 0.
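
The equality case can be illustrated with integer vectors. In the sketch below, which assumes the sample module ℤ² over ℤ with hypothetical Gram values 2, 3 and 1, parallel pairs are generated as u = −b w and v = a w for non-zero integers a and b, so that a u + b v = 0 holds by construction.

```python
from itertools import product

# Assumed Gram values: eta(e1,e1) = 2, eta(e2,e2) = 3, eta(e1,e2) = 1.
G11, G22, G12 = 2, 3, 1


def eta(u, v):
    """Symmetric bilinear form on Z^2 determined by the assumed Gram values."""
    return G11 * u[0] * v[0] + G22 * u[1] * v[1] + G12 * (u[0] * v[1] + u[1] * v[0])


def scale(k, w):
    """Action of the integer k on the vector w."""
    return (k * w[0], k * w[1])


directions = list(product(range(-3, 4), repeat=2))
coefficients = [k for k in range(-3, 4) if k != 0]

# With u = -b*w and v = a*w, the relation a*u + b*v = 0 holds for non-zero a, b.
equality = all(
    eta(scale(-b, w), scale(a, w)) ** 2
    == eta(scale(-b, w), scale(-b, w)) * eta(scale(a, w), scale(a, w))
    for w in directions
    for a in coefficients
    for b in coefficients
)

print(equality)
```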


19.8.7 EXAMPLE: Let R = ℤ and M = ℤ², with the usual pointwise scalar multiplication and vector
addition operations. Let η : M × M → R be an inner product on M. Let e_1 = (1, 0) and e_2 = (0, 1). Let
α = η(e_1, e_1), β = η(e_2, e_2) and γ = η(e_1, e_2). Then α > 0 and β > 0. The bilinearity of η implies


∀s_1, s_2, t_1, t_2 ∈ R, η(s_1 e_1 + s_2 e_2, t_1 e_1 + t_2 e_2) = s_1 t_1 α + s_2 t_2 β + (s_1 t_2 + s_2 t_1)γ.


Thus η is completely determined by α, β and γ. All non-negativity constraints on η are as follows.


∀s_1, s_2 ∈ R, s_1²α + s_2²β + 2 s_1 s_2 γ ≥ 0.


By temporarily working outside the integers, one may exploit these inequalities more easily using some
elementary calculus. For any given s_1 ∈ ℤ^+, let s_2 = ceiling(s_1 (α/β)^{1/2}). (Note that s_1 and s_2 are
both integers.) Then s_2 < s_1 (α/β)^{1/2} + 1, and so s_2/s_1 < (α/β)^{1/2} + s_1^{−1}. The case s_2 = 0 may be
avoided by requiring s_1 ≥ (β/α)^{1/2}. Then s_2 ≥ s_1 (α/β)^{1/2}, and so s_1/s_2 ≤ (β/α)^{1/2}. Consequently
2|γ| ≤ (s_1/s_2)α + (s_2/s_1)β < (αβ)^{1/2} + (αβ)^{1/2} + β/s_1. Since this inequality holds for all s_1 ∈ ℤ^+ with
s_1 ≥ (β/α)^{1/2}, it follows that |γ| ≤ (αβ)^{1/2}. Therefore γ² ≤ αβ. This consequence is well defined within the
ordered ring R = ℤ, although the method of demonstration uses the real numbers.


Let u = s_1 e_1 + s_2 e_2 and v = t_1 e_1 + t_2 e_2. Then


η(u, u) = s_1²α + s_2²β + 2 s_1 s_2 γ
η(v, v) = t_1²α + t_2²β + 2 t_1 t_2 γ
η(u, v) = s_1 t_1 α + s_2 t_2 β + (s_1 t_2 + s_2 t_1)γ.


A direct expansion gives η(u, u)η(v, v) − η(u, v)² = (αβ − γ²)(s_1 t_2 − s_2 t_1)².
Therefore the exact, standard Cauchy-Schwarz inequality η(u, v)² ≤ η(u, u)η(v, v) holds, although the proof
here goes beyond the ring R into the real number extension of the integers to perform the analysis.


It is also perhaps of some interest that the stronger inequality γ² < αβ must hold if α and β are both equal
to squares of positive integers. Suppose that δ, ε ∈ ℤ^+ satisfy δ² = α and ε² = β. Let u = εe_1 − δe_2.
Then 0 < η(u, u) = ε²α + δ²β − 2δεγ = 2αβ − 2δεγ. So δεγ < αβ. So αβγ = δ²ε²γ < δεαβ. So γ < δε.
So γ² < δ²ε² = αβ. (This argument is entirely within the integers.) One then obtains η(u, v)² < η(u, u)η(v, v)
whenever s_1 t_2 − s_2 t_1 ≠ 0 for u and v as above, which holds whenever u and v are not parallel. In the special
case α = β = 1, the only possibility for γ is γ = 0.
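
Two of the claims in Example 19.8.7 lend themselves to a direct integer computation. The sketch below is illustrative only. It checks the Gram determinant identity η(u, u)η(v, v) − η(u, v)² = (αβ − γ²)(s_1 t_2 − s_2 t_1)², a standard identity supplied here as the link between γ² ≤ αβ and the Cauchy-Schwarz inequality, and it searches for the values of γ compatible with α = β = 1. The parameter ranges and function names are assumptions made for this sketch.

```python
from itertools import product


def eta(alpha, beta, gamma, u, v):
    """Bilinear form on Z^2 with eta(e1,e1)=alpha, eta(e2,e2)=beta, eta(e1,e2)=gamma."""
    return alpha * u[0] * v[0] + beta * u[1] * v[1] + gamma * (u[0] * v[1] + u[1] * v[0])


vectors = list(product(range(-3, 4), repeat=2))


def gram_identity_holds(alpha, beta, gamma):
    """Check eta(u,u)eta(v,v) - eta(u,v)^2 == (alpha*beta - gamma^2)(s1*t2 - s2*t1)^2."""
    return all(
        eta(alpha, beta, gamma, u, u) * eta(alpha, beta, gamma, v, v)
        - eta(alpha, beta, gamma, u, v) ** 2
        == (alpha * beta - gamma ** 2) * (u[0] * v[1] - u[1] * v[0]) ** 2
        for u in vectors
        for v in vectors
    )


def positive_on_sample(alpha, beta, gamma):
    """Necessary condition for positive definiteness, tested on the sample vectors."""
    return all(eta(alpha, beta, gamma, u, u) > 0 for u in vectors if u != (0, 0))


identity_ok = all(
    gram_identity_holds(a, b, c)
    for a in range(1, 4)
    for b in range(1, 4)
    for c in range(-3, 4)
)

# For alpha = beta = 1, only gamma = 0 survives the positivity test.
admissible_gammas = [c for c in range(-3, 4) if positive_on_sample(1, 1, c)]

print(identity_ok, admissible_gammas)
```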


19.8.8 REMARK: The standard Cauchy-Schwarz inequality for modules over Archimedean rings. 

Example 19.8.7 gives a hint of how to prove the standard Cauchy-Schwarz inequality η(u, v)² ≤ η(u, u)η(v, v)
for modules over rings in general. The proof of Theorem 19.8.9 uses the assumption that the ordered ring is 
Archimedean. (See Definition 17.5.4 for Archimedean order on a group. See Section 18.4 for Archimedean 
ordered rings.) This proof has essentially the same “reductio ad absurdum” form which Archimedes felt 
was necessary to validate his formulas for areas and volumes to other mathematicians at the time. The 
proof of Theorem 19.8.9 has an analytical flavour although the result is purely algebraic. No square roots 
or multiplicative inverses were used in the making of this proof. 


19.8.9 THEOREM: Cauchy-Schwarz inequality for modules over Archimedean rings.
Let η be an inner product on a unitary module M over an Archimedean commutative ordered ring. Then


∀u, v ∈ M, η(u, v)² ≤ η(u, u)η(v, v). (19.8.1)


PROOF: Let η be an inner product on a unitary module M over an Archimedean commutative ordered
ring R. If u = 0_M or v = 0_M, then η(u, v) = 0_R, and so line (19.8.1) holds.


Suppose that η(u_0, v_0)² > η(u_0, u_0)η(v_0, v_0) for some u_0, v_0 ∈ M \ {0_M}. For all s, t ∈ R^+, define δ(s, t) =
η(su_0, tv_0)² − η(su_0, su_0)η(tv_0, tv_0) ∈ R. Let γ_0 = δ(1_R, 1_R) = η(u_0, v_0)² − η(u_0, u_0)η(v_0, v_0) ∈ R^+. Then
δ(s, t) = s²t²γ_0 ∈ R^+ for all s, t ∈ R^+.

For any s, t ∈ R^+, there exist α, β ∈ ℤ^+ such that η(su_0, su_0) ≤ α².1_R and η(tv_0, tv_0) ≤ β².1_R. The existence
of such α and β is guaranteed by Theorem 18.4.8 because R is assumed to be Archimedean and unitary. Let
α(s) = min{α ∈ ℤ^+; η(su_0, su_0) ≤ α².1_R} and β(t) = min{β ∈ ℤ^+; η(tv_0, tv_0) ≤ β².1_R} for s, t ∈ ℤ^+. Then
η(su_0, su_0) > (α(s) − 1)².1_R and η(tv_0, tv_0) > (β(t) − 1)².1_R for all s, t ∈ ℤ^+. But η(su_0, tv_0) ≤ α(s)β(t).1_R
for all s, t ∈ ℤ^+ by Theorem 19.8.2. Thus for all s, t ∈ ℤ^+,


s²η(u_0, u_0) > (α(s) − 1)².1_R
t²η(v_0, v_0) > (β(t) − 1)².1_R
stη(u_0, v_0) ≤ α(s)β(t).1_R.


Therefore, suppressing the multiplication by 1_R, one has


∀s, t ∈ ℤ^+, δ(s, t) < α(s)²β(t)² − (α(s) − 1)²(β(t) − 1)²
= (2α(s) − 1)β(t)² + (2β(t) − 1)α(s)² − (2α(s) − 1)(2β(t) − 1)
< 2α(s)β(t)² + 2β(t)α(s)². (19.8.2)


It is more or less clear that the expression on line (19.8.2) is of order st² + s²t, whereas δ(s, t) is of order s²t²,
and that therefore by making both s and t large enough, it may be shown that δ(s, t) < s²t²γ_0, which
contradicts the assumption, but this must be demonstrated more carefully. Note that η(u_0, u_0) ≤ α(1)², and
so (α(s) − 1)² < s²η(u_0, u_0) ≤ s²α(1)². Therefore α(s) < 1 + sα(1) for all s ∈ ℤ^+. Similarly, β(t) < 1 + tβ(1)
for all t ∈ ℤ^+. But from α(1) ≥ 1 and β(1) ≥ 1, it follows that 1 + sα(1) ≤ 2sα(1) and 1 + tβ(1) ≤ 2tβ(1)
for all s, t ∈ ℤ^+. Therefore


∀s, t ∈ ℤ^+, δ(s, t) < 2(1 + sα(1))(1 + tβ(1))² + 2(1 + tβ(1))(1 + sα(1))²
≤ 16st²α(1)β(1)² + 16s²tβ(1)α(1)².


By the Archimedean property of R, there exists s_0 ∈ ℤ^+ such that s_0γ_0 > 32α(1)β(1)² because γ_0 > 0. For
such s_0, one has s_0²t²γ_0 > 32s_0t²α(1)β(1)². Similarly, there exists t_0 ∈ ℤ^+ such that t_0γ_0 > 32α(1)²β(1).
For such t_0, one has s²t_0²γ_0 > 32s²t_0α(1)²β(1). Consequently


2δ(s_0, t_0) < 32s_0t_0²α(1)β(1)² + 32s_0²t_0β(1)α(1)²
< s_0²t_0²γ_0 + s_0²t_0²γ_0
= 2s_0²t_0²γ_0
= 2δ(s_0, t_0).


From this contradiction, one infers that η(u_0, v_0)² ≤ η(u_0, u_0)η(v_0, v_0) for all u_0, v_0 ∈ M \ {0_M}.


19.8.10 REMARK: A triangle inequality for inner products on modules over rings. 

The generalised triangle inequality in Theorem 19.8.11 for the “virtual norm” corresponding to an inner
product on a module over a ring is obtained from Theorem 19.8.2. If square roots of values of norms are
well defined, this inequality may be written as ‖u + v‖ ≤ ‖u‖ + ‖v‖ by setting α = ‖u‖ = η(u, u)^{1/2} and
β = ‖v‖ = η(v, v)^{1/2}.


19.8.11 THEOREM: Triangle inequality for modules over rings.
Let η be an inner product on a unitary left module M over a commutative ordered ring R. Then


∀u, v ∈ M, ∀α, β ∈ R_0^+, (η(u, u) ≤ α² ∧ η(v, v) ≤ β²) ⇒ η(u + v, u + v) ≤ (α + β)².


PROOF: Let η be an inner product on a unitary left module M over a commutative ordered ring R. Let
u, v ∈ M and α, β ∈ R_0^+ satisfy η(u, u) ≤ α² and η(v, v) ≤ β². Then η(u + v, u + v) = η(u, u) + η(v, v) + 2η(u, v)
by Definition 19.7.2 (i, ii). Hence η(u + v, u + v) ≤ η(u, u) + η(v, v) + 2αβ ≤ α² + β² + 2αβ = (α + β)² by
Theorem 19.8.2.
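
The triangle inequality of Theorem 19.8.11 can likewise be checked without leaving the integers. The sketch below assumes the sample module ℤ² over ℤ with hypothetical Gram values 2, 3 and 1, and uses the smallest non-negative integers α and β whose squares bound η(u, u) and η(v, v).

```python
import math
from itertools import product

# Assumed Gram values: eta(e1,e1) = 2, eta(e2,e2) = 3, eta(e1,e2) = 1.
G11, G22, G12 = 2, 3, 1


def eta(u, v):
    """Symmetric bilinear form on Z^2 determined by the assumed Gram values."""
    return G11 * u[0] * v[0] + G22 * u[1] * v[1] + G12 * (u[0] * v[1] + u[1] * v[0])


def least_bound(n):
    """Smallest non-negative integer a with n <= a*a, for n >= 0."""
    r = math.isqrt(n)
    return r if r * r == n else r + 1


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


vectors = list(product(range(-4, 5), repeat=2))

# eta(u+v, u+v) <= (alpha + beta)^2 for the least admissible alpha and beta.
triangle = all(
    eta(add(u, v), add(u, v)) <= (least_bound(eta(u, u)) + least_bound(eta(v, v))) ** 2
    for u in vectors
    for v in vectors
)

print(triangle)
```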


19.9. Associative algebras 


19.9.1 REMARK: Algebras have two sets and five operations. 

Sections 19.9 and 19.10 define associative algebras and Lie algebras. (Tensor algebras are delayed until 
Chapters 27-30.) Algebras are distinguished from modules by having five operations, namely an addition 
and multiplication operation within each of the active and passive sets, and an action operation by the active 
set on the passive set. (See Table 17.0.2 in Remark 17.0.3.) The passive set is the module of the algebra. 
The active set is a commutative unitary ring of scalars.


An algebra may be thought of as a module over a ring which has an additional multiplication operation for 
the module. Typical modules over rings are linear spaces and spaces of square matrices over a ring or field. 
In the case of a linear space, the additional multiplication operation is some kind of “vector multiplication” 
such as is often encountered in physics and Euclidean geometry. In the case of matrices over a ring, the 
multiplication operation is the standard matrix product. The space of endomorphisms of any module over 
a ring is also an algebra with composition of endomorphisms as the multiplication operation. 


The theory of associative algebras is not so prominent in differential geometry as the theory of Lie algebras. 
Associative algebras can be converted to Lie algebras, according to Theorem 19.10.10, by replacing the 
product on the module by its commutator. 


19.9.2 DEFINITION: An associative algebra over R, for a commutative unitary ring R < (R, σ_R, τ_R), is a
tuple A < (R, A, σ_R, τ_R, σ_A, τ_A, μ) such that:


(i) A < (R, A, σ_R, τ_R, σ_A, μ) is a unitary left module over the ring R,
(ii) A < (A, σ_A, τ_A) is a ring, [τ_A associativity, (τ_A, σ_A) distributivity]
(iii) ∀λ ∈ R, ∀a, b ∈ A, μ(λ, τ_A(a, b)) = τ_A(μ(λ, a), b) = τ_A(a, μ(λ, b)). [(τ_A, μ) scalarity]
In other words, ∀λ ∈ R, ∀a, b ∈ A, λ(ab) = (λa)b = a(λb).


19.9.3 REMARK: Bilinearity of vector multiplication for associative algebras.
Definitions which are essentially identical to Definition 19.9.2 are given by MacLane/Birkhoff [110], page 334;
S. Warner [155], page 356; Ash [50], page 86; Grove [88], page 163; Simmons [137], page 208.
The distributivity and scalarity conditions in Definition 19.9.2 are equivalent
to the bilinearity of the vector product operation τ_A : A × A → A. (This bilinearity is required for Lie
algebras in Definition 19.10.3 (iii, vi) as two separate conditions.) Consequently an associative algebra may
also be described as a unitary module over a ring with a bilinear vector product. This is done for example
by Lang [108], page 121. It is described as a linear space with a vector product satisfying the bilinearity
conditions by Szekeres [305], page 149.


19.9.4 REMARK: The associative algebra of endomorphisms of a unitary module over a ring. 

The unitary ring-module of endomorphisms End_R(M) of a unitary ring-module M over a commutative
unitary ring R, with addition defined pointwise and multiplication defined as composition of maps, is shown
in Theorem 19.9.6 to be an associative algebra over a commutative unitary ring R. (See Notation 19.1.12
for End_R(M).) This is the same as the module of endomorphisms in Definition 19.4.10 (which is shown to
be a unitary ring-module in Theorem 19.4.9), but a multiplication operation is now also defined.


The multiplication operation for the endomorphism algebra in Definition 19.9.5 (iii) satisfies f_1 f_2 = f_1 ∘ f_2,
but it could be argued that the order should be reversed so that f_1 f_2 = f_2 ∘ f_1. (The order f_1 f_2 = f_1 ∘ f_2
agrees with Pinter [122], page 177, and S. Warner [155], page 188. This order is also given by Ash [50],
page 96, together with the opposite order as an alternative.)


19.9.5 DEFINITION: The algebra of endomorphisms of a unitary left module M < (R, M, σ_R, τ_R, σ_M, μ_M)
over a commutative unitary ring R < (R, σ_R, τ_R) is the tuple A < (R, A, σ_R, τ_R, σ_A, τ_A, μ_A), where:
(i) A = End_R(M) is the set of R-endomorphisms of M.
(ii) σ_A : A × A → A is defined by ∀f_1, f_2 ∈ A, ∀x ∈ M, σ_A(f_1, f_2)(x) = σ_M(f_1(x), f_2(x)).
In other words, ∀f_1, f_2 ∈ A, ∀x ∈ M, (f_1 + f_2)(x) = f_1(x) + f_2(x).
(iii) τ_A : A × A → A is defined by ∀f_1, f_2 ∈ A, ∀x ∈ M, τ_A(f_1, f_2)(x) = f_1(f_2(x)).
In other words, ∀f_1, f_2 ∈ A, f_1 f_2 = f_1 ∘ f_2.
(iv) μ_A : R × A → A is defined by ∀λ ∈ R, ∀f ∈ A, ∀x ∈ M, μ_A(λ, f)(x) = μ_M(λ, f(x)).
In other words, ∀λ ∈ R, ∀f ∈ A, ∀x ∈ M, (λf)(x) = λf(x).


19.9.6 THEOREM: The algebra of endomorphisms of a unitary module is an associative algebra.
The algebra of endomorphisms of a unitary left module over a commutative unitary ring R is an associative 
algebra over R. 


PROOF: Let R −< (R, σ_R, τ_R) be a commutative unitary ring. Let M −< (R, M, σ_R, τ_R, σ_M, μ_M) be a unitary
left module over R. Let A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ_A) be the algebra of endomorphisms of M over R.

The tuple (R, A, σ_R, τ_R, σ_A, μ_A) is a unitary module over R by Theorem 19.4.13 (ii). So Definition 19.9.2 (i)
is satisfied by A.

For Definition 19.9.2 (ii), note that (A, σ_A) is a commutative group by Definition 19.3.1 (ii) because A is
a module over R, and (A, τ_A) is a semigroup because End_R(M) is closed under function composition by
Theorem 19.4.13 (iii), and this composition is associative by Theorem 10.4.21 (ii). To show distributivity of
τ_A over σ_A, let φ, φ_1, φ_2 ∈ A and let x ∈ M. Then (φ(φ_1 + φ_2))(x) = φ(φ_1(x)) + φ(φ_2(x)) = (φφ_1 + φφ_2)(x).
So φ(φ_1 + φ_2) = φφ_1 + φφ_2. Similarly (φ_1 + φ_2)φ = φ_1φ + φ_2φ. Thus Definition 18.1.2 (iii) (ring distributivity)
is verified. Therefore (A, σ_A, τ_A) is a ring, which verifies Definition 19.9.2 (ii).

Let λ ∈ R and φ_1, φ_2 ∈ A. Let x ∈ M. Then (λ(φ_1φ_2))(x) = λ(φ_1φ_2)(x) = λφ_1(φ_2(x)) = (λφ_1)(φ_2(x)) =
((λφ_1)φ_2)(x). So λ(φ_1φ_2) = (λφ_1)φ_2. But (λφ_1)(φ_2(x)) = φ_1(λφ_2(x)) = φ_1((λφ_2)(x)) = (φ_1(λφ_2))(x). So
λ(φ_1φ_2) = φ_1(λφ_2). Thus Definition 19.9.2 (iii) is verified. Hence A is an associative algebra over R.


19.9.7 REMARK: Associative algebras applied to linear spaces. 

In the language of linear spaces in Chapter 22, the associative algebra of endomorphisms in Definition 19.9.5
is the space Lin(V, V) of linear maps from a linear space V over a field K to itself, because linear spaces are
a particular kind of unitary module over a commutative ring. With respect to a particular basis of a
finite-dimensional space V, the space Lin(V, V) may be identified with the algebra of square matrices over
the field K.
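
As a concrete check of Theorem 19.9.6 in the matrix picture of Remark 19.9.7, the following Python sketch
tests the ring and scalarity conditions by brute force for 2 × 2 matrices with entries in {0, 1}, regarded as
endomorphisms of the module ℤ² over the ring ℤ. The function names (add, mul, scale,
check_associative_algebra) and the sample entries are illustrative assumptions, not notation from this book,
and a finite sample is evidence rather than proof.

```python
from itertools import product

def add(f, g):
    """Pointwise sum sigma_A(f, g) of two 2x2 matrices stored as tuples of rows."""
    return tuple(tuple(a + b for a, b in zip(rf, rg)) for rf, rg in zip(f, g))

def mul(f, g):
    """Composition tau_A(f, g) = f o g, computed as the matrix product f g."""
    n = len(f)
    return tuple(
        tuple(sum(f[i][k] * g[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def scale(lam, f):
    """Scalar multiple mu_A(lam, f)."""
    return tuple(tuple(lam * a for a in row) for row in f)

def check_associative_algebra(entries=(0, 1), scalars=(-2, 0, 3)):
    """Check associativity, two-sided distributivity and scalarity on a sample."""
    sample = [((a, b), (c, d)) for a, b, c, d in product(entries, repeat=4)]
    for f, g, h in product(sample, repeat=3):
        if mul(f, mul(g, h)) != mul(mul(f, g), h):
            return False
        if mul(f, add(g, h)) != add(mul(f, g), mul(f, h)):
            return False
        if mul(add(f, g), h) != add(mul(f, h), mul(g, h)):
            return False
    for lam, f, g in product(scalars, sample, sample):
        if not (scale(lam, mul(f, g)) == mul(scale(lam, f), g) == mul(f, scale(lam, g))):
            return False
    return True

print(check_associative_algebra())
```

The product mul is not commutative on this sample, which is why the order convention f_1 f_2 = f_1 ∘ f_2 has
to be fixed.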


19.9.8 REMARK: Automorphism spaces are not associative algebras. 
Usually one would expect that whatever properties are possessed by a space of endomorphisms would be
possessed in even greater abundance by the corresponding space of automorphisms because automorphisms
have the advantage of being invertible. For example, the set of all automorphisms of a space (for just about
any category) is a group with the composition operation whereas the corresponding set of all endomorphisms
is not. So it is somewhat disappointing to discover that Theorem 19.9.6 apparently does not generalise
to automorphisms. Thus there is no associative algebra of automorphisms of a unitary module.


The reason for this is that the pointwise addition and scalar multiplication operations in Definition 19.9.5
are not closed when the space is restricted to automorphisms. In the special case of linear maps and linear
spaces, for example, the set of automorphisms would be GL(V), the set of all invertible linear maps from
V to V for some linear space V. Both I = id_V and −I (the inversion map) are invertible linear maps,
but I + (−I) = 0 is the zero map, which is not invertible. Similarly, λφ is not invertible if φ is invertible
and λ = 0.
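
The failure of closure can be seen numerically in a minimal Python sketch for V = ℝ², where a matrix is
invertible exactly when its determinant is non-zero. The helper names det2, add and scale are hypothetical
and chosen only for this illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as a tuple of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add(f, g):
    """Pointwise sum of two matrices."""
    return tuple(tuple(a + b for a, b in zip(rf, rg)) for rf, rg in zip(f, g))

def scale(lam, f):
    """Scalar multiple of a matrix."""
    return tuple(tuple(lam * a for a in row) for row in f)

identity = ((1, 0), (0, 1))        # I = id_V
inversion = scale(-1, identity)    # -I

# Both maps are invertible, but their sum and the zero multiple are not.
print(det2(identity), det2(inversion))
print(det2(add(identity, inversion)))
print(det2(scale(0, identity)))
```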


19.10. Lie algebras 


19.10.1 REMARK: Definition styles for Lie algebras. 
In the literature, there are four main styles of primary definitions for a Lie algebra. 


(1) Abstract algebra. The elements of the Lie algebra obey antisymmetry and the Jacobi identity. (See 
for example Fulton/Harris [76], page 108; Bump [57], page 30; Cahn [58], page 3; Jacobson [96], page 3; 
Winter [166], page 18; Gilmore [82], page 10; Lipkin [285], page 10; Lang [108], page 548; Szekeres [305], 
pages 167-176; Hall [89], page 49; Cheeger/Ebin [5], page 47; Gallot/Hulin/Lafontaine [13], page 27; 
Penrose [297], pages 267-270; Stillwell [143], page 82; Peskin/Schroeder [298], pages 495-496.)

(2) Tangent vectors. The elements of the Lie algebra are vectors in the tangent space T_e(G) of a Lie group G
with identity e. (See for example Fulton/Harris [76], page 109; Lang [23], page 166; Szekeres [305],
page 561; Frankel [12], page 402; Sternberg [38], pages 151-152; Gallot/Hulin/Lafontaine [13], page 27;
Bleecker [254], page 18; Stillwell [143], pages 74, 104.) 


(3) Vector fields. The elements of the Lie algebra are left-invariant vector fields on a Lie group. (See 
for example Lang [23], page 166; Szekeres [305], pages 559-561; Frankel [12], page 403; Sternberg [38], 
pages 151-152; Gallot/Hulin/Lafontaine [13], page 27; Daniel/Viallet [317], page 182; Penrose [297], 
pages 312-313; Kobayashi/Nomizu [19], page 38; Drechsler/Mayer [262], page 202.) 


(4) Matrices. The elements of the Lie algebra are matrix "generators" of a Lie matrix group. (See for 
example Hall [89], page 56; Peskin/Schroeder [298], pages 495-496; Moriyasu [293], pages 165-166; 
Cahn [58], page 3; Stillwell [143], page 105.) 


Styles (2) and (3) are very closely related and both require the definition of a differentiable manifold. (See 
Section 62.8.) Style (1) is an abstraction from styles (2) and (3) which can be defined algebraically without 
any analysis or geometry. 


Style (4) is closely related to the tangent vector style (2) because a “generator” of a Lie matrix group 
generates a curve through the identity with the generator as the derivative at the identity. So it is a tangent 
vector of a group of matrices. 


19.10.2 REMARK: Similarities and differences between associative algebras and Lie algebras. 

The specification tuples for associative algebras and Lie algebras have the same number and types of sets and
operations. Both kinds of algebras have a passive set A and an active set R, and addition and multiplication
operations on each of these sets, plus an action map μ by the active set on the passive set. However, the
passive set A for an associative algebra is a ring, with the product operation τ_A associative and distributive
over the addition operation σ_A, whereas the passive set A for a Lie algebra has anticommutativity and the
Jacobi identity instead of an associativity rule.

The distributivity of τ_A over σ_A is the same for both kinds of algebra, but in the Lie algebra case, it must
be stated as a separate condition because (A, σ_A, τ_A) is not assumed to be a ring as in Definition 19.9.2 (ii).
Direct comparison of Definitions 19.9.2 and 19.10.3 is made more difficult by the fact that Definition 19.9.2
part (ii) combines two conditions, namely the associativity of τ_A and the distributivity of τ_A over σ_A.


19.10.3 DEFINITION: A Lie algebra over R, for a commutative unitary ring R −< (R, σ_R, τ_R), is a tuple
A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ) which satisfies the following conditions.

(i) A −< (R, A, σ_R, τ_R, σ_A, μ) is a unitary left module over the ring R.

(ii) τ_A : A × A → A.

(iii) ∀X_1, X_2, Y ∈ A, τ_A(σ_A(X_1, X_2), Y) = σ_A(τ_A(X_1, Y), τ_A(X_2, Y)) [(τ_A, σ_A) distributivity]
and ∀X, Y_1, Y_2 ∈ A, τ_A(X, σ_A(Y_1, Y_2)) = σ_A(τ_A(X, Y_1), τ_A(X, Y_2)).
In other words, ∀X_1, X_2, Y ∈ A, [X_1 + X_2, Y] = [X_1, Y] + [X_2, Y]
and ∀X, Y_1, Y_2 ∈ A, [X, Y_1 + Y_2] = [X, Y_1] + [X, Y_2].

(iv) ∀X, Y ∈ A, σ_A(τ_A(X, Y), τ_A(Y, X)) = 0_A. [anticommutativity]
In other words, ∀X, Y ∈ A, [X, Y] = −[Y, X].

(v) ∀X, Y, Z ∈ A, τ_A(X, τ_A(Y, Z)) = σ_A(τ_A(Y, τ_A(X, Z)), τ_A(τ_A(X, Y), Z)). [Jacobi identity]
In other words, ∀X, Y, Z ∈ A, [[X, Y], Z] = [X, [Y, Z]] − [Y, [X, Z]].

(vi) ∀λ ∈ R, ∀X, Y ∈ A, μ(λ, τ_A(X, Y)) = τ_A(μ(λ, X), Y) = τ_A(X, μ(λ, Y)). [(τ_A, μ) scalarity]
In other words, ∀λ ∈ R, ∀X, Y ∈ A, λ[X, Y] = [λX, Y] = [X, λY].


19.10.4 REMARK: The bracket notation for the product operation of a Lie algebra. 
The main benefit of the brackets in Notation 19.10.5 is to remind the user that the product is anticommutative
and non-associative, thus discouraging incorrect changes in the order of terms for example.


If a Lie algebra product is defined as the commutator of an associative algebra product as in Theorem 19.10.10 
and Definition 19.10.11, the product in Notation 19.10.5 can be correctly called the “commutator” of X 
and Y. When the product is defined abstractly, this is not strictly correct. For example, Jacobson [96], 
page 6, uses the terms “Lie product” and “commutator” and the bracket notation only when the product is 
in fact a commutator of elements of an associative algebra. The bracket notation is widely used to denote 
commutators of linear operators. (See Lipkin [285], page 10.) But it seems reasonable to use it for abstract 
Lie algebra products also, and “Lie product” seems like a reasonable name. (See Winter [166], page 18.) 


19.10.5 NOTATION: [X, Y], for X and Y in a Lie algebra A, denotes the product τ_A(X, Y).


19.10.6 REMARK: Linear spaces with antisymmetric bilinear product satisfying Jacobi identity. 
Conditions (iii) and (vi) of Definition 19.10.3 mean that the product τ_A : A × A → A is bilinear. (In other
words, it is linear with respect to each of its two parameters for fixed values of the other parameter. See
Definition 27.2.16 for bilinear maps.) So a Lie algebra is sometimes defined very succinctly as a linear space
with an antisymmetric (or “skew”) bilinear product which satisfies the Jacobi identity. Since a linear space
is a unitary left module over a ring, this is a special case of Definition 19.10.3.


Theorem 19.10.7 is a straightforward consequence of the bilinearity of the Lie algebra product. In the special
case that X and Y in Theorem 19.10.7 are equal to the same basis of A, it can be seen that the product
of general elements of A may be computed by multiplying the component sequences α and β by the
basis-dependent factors [X_i, Y_j] as indicated. These factors are called the “structure constants” because they
completely determine the structure of the Lie algebra. (This is true for associative algebras also.)


19.10.7 THEOREM: Formula for the bracket of two finite linear combinations. 
Let A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ) be a Lie algebra over R. Then


∀m, n ∈ ℤ^+, ∀α ∈ R^m, ∀β ∈ R^n, ∀X ∈ A^m, ∀Y ∈ A^n,

[Σ_{i=1}^m α_i X_i, Σ_{j=1}^n β_j Y_j] = Σ_{i=1}^m Σ_{j=1}^n α_i β_j [X_i, Y_j].


PROOF: The assertion follows by induction on m and n from Definition 19.10.3 (iii, vi).
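
The formula can be tried out on a small example. The sketch below assumes the standard fact that ℝ³ with
the cross product is a real Lie algebra; this example and the helper names cross, combine and
bracket_by_bilinearity are not taken from this book. With X and Y both equal to the standard basis, the
factors [X_i, Y_j] computed inside the double loop play the role of the structure constants.

```python
def cross(x, y):
    """Cross product on R^3, used here as the Lie bracket [x, y]."""
    return (
        x[1] * y[2] - x[2] * y[1],
        x[2] * y[0] - x[0] * y[2],
        x[0] * y[1] - x[1] * y[0],
    )

BASIS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def combine(coeffs, vectors):
    """Finite linear combination sum_i coeffs[i] * vectors[i] in R^3."""
    return tuple(sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(3))

def bracket_by_bilinearity(alpha, xs, beta, ys):
    """Right-hand side of the theorem: sum_{i,j} alpha_i beta_j [X_i, Y_j]."""
    total = (0, 0, 0)
    for a, x in zip(alpha, xs):
        for b, y in zip(beta, ys):
            term = cross(x, y)
            total = tuple(t + a * b * c for t, c in zip(total, term))
    return total

alpha, beta = (1, 2, 3), (4, 5, 6)
lhs = cross(combine(alpha, BASIS), combine(beta, BASIS))
rhs = bracket_by_bilinearity(alpha, BASIS, beta, BASIS)
print(lhs == rhs)
```

The sequences need not have equal length: the same function accepts m = 2 vectors X and n = 1 vector Y.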


19.10.8 REMARK: Alternative formulation for anticommutativity of the Lie algebra product. 

Definition 19.10.3 (iv) may be replaced by ∀Z ∈ A, [Z, Z] = 0. This is clever but less meaningful. It implies
[X, Y] + [Y, X] = [X, X] + [X, Y] + [Y, X] + [Y, Y] = [X, X + Y] + [Y, X + Y] = [X + Y, X + Y] = 0 for
any X, Y ∈ A, assuming conditions (i) and (iii). Thus it implies condition (iv). (The converse follows from
[Z, Z] = −[Z, Z] when 2 is invertible in R, as for real Lie algebras.)
This implication requires both the left and right distributivity of τ_A over σ_A in condition (iii). There is no
obvious benefit in replacing a meaningful condition with a riddle.


19.10.9 REMARK: Interpretations of the Jacobi identity.
The formula “[[X, Y], Z] = [X, [Y, Z]] − [Y, [X, Z]]” in Definition 19.10.3 (v) may be rearranged in many ways
to try to interpret it.


(1) Associativity. The identity is equivalent to [X, [Y, Z]] − [[X, Y], Z] = [Y, [X, Z]], which looks like some
kind of “non-associativity” rule. An associative operation would satisfy [X, [Y, Z]] − [[X, Y], Z] = 0. So
[Y, [X, Z]] could possibly be thought of as a measure of deviation from associativity.

(2) Derivation. The identity is equivalent to [X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]], which closely resembles
the rule for a derivation because the operation of X on [Y, Z] is computed by first making X act on Y
in [[X, Y], Z], and then making X act on Z in [Y, [X, Z]]. This is clearly related to its meaning.

(3) Lie derivatives. The identity [[X, Y], Z] = [X, [Y, Z]] − [Y, [X, Z]] as stated resembles formulas such as
L_{[X,Y]} Z = (L_X L_Y − L_Y L_X) Z for Lie derivatives of tensor fields. This comes close to identifying its
origin and meaning.

(4) Rotational symmetry. The identity is equivalent to [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0. Despite its
mnemonic value and aesthetic appeal, there seems to be no interpretative value in this, even though it
is very popular in the literature.


19.10.10 THEOREM: Construction of Lie algebra using commutator operation of an associative algebra. 
Let A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ) be an associative algebra. Define the operation τ̄_A : A × A → A by
τ̄_A(X, Y) = τ_A(X, Y) − τ_A(Y, X). Then (R, A, σ_R, τ_R, σ_A, τ̄_A, μ) is a Lie algebra.


PROOF: Definition 19.10.3 condition (i) follows directly from Definition 19.9.2 condition (i) because these
do not depend on τ_A or τ̄_A. Definition 19.10.3 conditions (iii) and (vi) for τ̄_A follow almost immediately
from Definition 19.9.2 conditions (ii) and (iii) for τ_A. Definition 19.10.3 (iv) follows from the formula for τ̄_A.


For Definition 19.10.3 (v), let X, Y, Z ∈ A. Then, writing XY for τ_A(X, Y) and [X, Y] for τ̄_A(X, Y),


[X, [Y, Z]] − [Y, [X, Z]] = (XYZ − XZY − YZX + ZYX) − (YXZ − YZX − XZY + ZXY)
                          = (XYZ + ZYX) − (YXZ + ZXY)
                          = [X, Y]Z − Z[X, Y]
                          = [[X, Y], Z].


(The terms XZY and YZX cancel to zero.) Hence A is a Lie algebra with τ̄_A in place of τ_A.
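
The commutator construction can be spot-checked by brute force. This Python sketch uses 2 × 2 integer
matrices with entries in {0, 1} as a sample associative algebra and tests anticommutativity and the Jacobi
identity in the form [X, [Y, Z]] = [Y, [X, Z]] + [[X, Y], Z] of Definition 19.10.3 (v). The function names
and the sample are assumptions made for this illustration only.

```python
from itertools import product

def add(f, g):
    """Matrix sum."""
    return tuple(tuple(a + b for a, b in zip(rf, rg)) for rf, rg in zip(f, g))

def subtract(f, g):
    """Matrix difference."""
    return tuple(tuple(a - b for a, b in zip(rf, rg)) for rf, rg in zip(f, g))

def mul(f, g):
    """Associative product XY (matrix multiplication)."""
    n = len(f)
    return tuple(
        tuple(sum(f[i][k] * g[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def bracket(x, y):
    """Commutator [X, Y] = XY - YX."""
    return subtract(mul(x, y), mul(y, x))

def check_commutator_lie_axioms(entries=(0, 1)):
    """Test anticommutativity and the Jacobi identity on a finite sample."""
    sample = [((a, b), (c, d)) for a, b, c, d in product(entries, repeat=4)]
    zero = ((0, 0), (0, 0))
    for x, y in product(sample, repeat=2):
        if add(bracket(x, y), bracket(y, x)) != zero:
            return False
    for x, y, z in product(sample, repeat=3):
        lhs = bracket(x, bracket(y, z))
        rhs = add(bracket(y, bracket(x, z)), bracket(bracket(x, y), z))
        if lhs != rhs:
            return False
    return True

print(check_commutator_lie_axioms())
```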


19.10.11 DEFINITION: The Lie algebra associated with an associative algebra A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ)
is the Lie algebra (R, A, σ_R, τ_R, σ_A, τ̄_A, μ), where τ̄_A : A × A → A is defined by


∀X, Y ∈ A, τ̄_A(X, Y) = τ_A(X, Y) − τ_A(Y, X).


19.10.12 REMARK: The Lie algebra associated with an associative algebra of ring-module endomorphisms. 
Since the set End_R(M) of endomorphisms of a unitary module M over a commutative ring R is an associative
algebra (with multiplication defined as composition) by Theorem 19.9.6, and every associative algebra can 
be converted to a Lie algebra by Theorem 19.10.10, it follows immediately that a Lie algebra is associated 
with the set of endomorphisms of any unitary module over a commutative ring. This is the Lie algebra in 
Notation 19.10.13. 


19.10.13 NOTATION: gl(M) for a ring-module M over a commutative unitary ring R denotes the Lie
algebra associated with the associative algebra End_R(M) of R-endomorphisms of M.


19.10.14 DEFINITION: A real Lie algebra is a Lie algebra over ℝ.


19.10.15 REMARK: The algebra of vector fields on a smooth manifold is a real Lie algebra. 
The algebra X^∞(T(M)) of C^∞ vector fields on a C^∞ manifold M with the Lie bracket as the product
operation is a real Lie algebra. (See Definition 61.5.7 for the Lie bracket.)


19.10.16 EXAMPLE: A class of Lie algebras consisting of matrices.
A useful example of a Lie algebra gl(V) is the case that V is the linear space ℝ^n for some n ∈ ℤ^+. Then the
elements of End(ℝ^n) are the linear transformations of ℝ^n. With respect to a basis for ℝ^n, these correspond
to n × n matrices. Therefore gl(V) corresponds to the set of n × n matrices together with the operations of
matrix addition, scalar multiplication and the commutator product.


19.10.17 DEFINITION: A Lie subalgebra of a Lie algebra A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ_A) is a Lie algebra
A' −< (R, A', σ_R, τ_R, σ_{A'}, τ_{A'}, μ_{A'}) such that A' ⊆ A, σ_{A'} ⊆ σ_A, τ_{A'} ⊆ τ_A and μ_{A'} ⊆ μ_A.


19.10.18 THEOREM: Some basic properties of Lie subalgebras. 
Let A' −< (R, A', σ_R, τ_R, σ_{A'}, τ_{A'}, μ_{A'}) be a Lie subalgebra of a Lie algebra A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ_A).


(i) σ_{A'} = σ_A|_{A'×A'}, τ_{A'} = τ_A|_{A'×A'} and μ_{A'} = μ_A|_{R×A'}.
(ii) (R, A', σ_R, τ_R, σ_{A'}, μ_{A'}) is a submodule of the unitary module (R, A, σ_R, τ_R, σ_A, μ_A).


PROOF: For part (i), A' is a Lie algebra by Definition 19.10.17. So (R, A', σ_R, τ_R, σ_{A'}, μ_{A'}) is a unitary left
module over a ring by Definition 19.10.3 (i). Consequently Dom(σ_{A'}) = A' × A' and Dom(μ_{A'}) = R × A' by
Definition 19.3.1 (ii, iii). Moreover, Dom(τ_{A'}) = A' × A' by Definition 19.10.3 (ii). Hence by Theorem 10.4.10,
σ_{A'} = σ_A|_{A'×A'}, τ_{A'} = τ_A|_{A'×A'} and μ_{A'} = μ_A|_{R×A'}.

Part (ii) follows from Definitions 19.10.17, 19.10.3 (i) and 19.3.9.


19.10.19 THEOREM: Conditions for a sub-system of a Lie algebra to be a Lie subalgebra. 
Let (R, A, σ_R, τ_R, σ_A, τ_A, μ_A) be a Lie algebra. Let A' ⊆ A satisfy


(i) σ_A(A' × A') ⊆ A', τ_A(A' × A') ⊆ A' and μ_A(R × A') ⊆ A',
(ii) 0_A ∈ A'.


Then A' −< (R, A', σ_R, τ_R, σ_{A'}, τ_{A'}, μ_{A'}), where σ_{A'} = σ_A|_{A'×A'}, τ_{A'} = τ_A|_{A'×A'} and
μ_{A'} = μ_A|_{R×A'}, is a Lie subalgebra of A −< (R, A, σ_R, τ_R, σ_A, τ_A, μ_A).


PROOF: Clearly σ_{A'} ⊆ σ_A, and σ_{A'} is a well-defined function with domain A' × A' by Theorem 10.4.5
line (10.4.2) and Theorem 9.4.6 (ii). Similarly, τ_{A'} has domain A' × A', and τ_{A'} ⊆ τ_A, and μ_{A'} has domain
R × A', and μ_{A'} ⊆ μ_A.


To show that A' is a Lie algebra, first note that assumptions (i) and (ii) imply that (R, A', σ_R, τ_R, σ_{A'}, μ_{A'})
is a unitary module over the ring R by Theorem 19.3.15 because (R, A, σ_R, τ_R, σ_A, μ_A) is a unitary module
over R. By assumption (i), properties (iii), (iv), (v) and (vi) for Definition 19.10.3 are inherited by A'
directly from A. Thus A' is a Lie algebra. Hence A' is a Lie subalgebra of A by Definition 19.10.17.
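
The closure conditions of Theorem 19.10.19 can be illustrated with the trace-free 2 × 2 matrices inside the
matrix Lie algebra of Example 19.10.16. That the trace-free matrices form a Lie subalgebra is a standard
fact which is assumed here rather than taken from this book, and the helper names are hypothetical. The
sketch checks conditions (i) and (ii) on a finite sample of integer matrices.

```python
from itertools import product

def add(f, g):
    return tuple(tuple(a + b for a, b in zip(rf, rg)) for rf, rg in zip(f, g))

def subtract(f, g):
    return tuple(tuple(a - b for a, b in zip(rf, rg)) for rf, rg in zip(f, g))

def scale(lam, f):
    return tuple(tuple(lam * a for a in row) for row in f)

def mul(f, g):
    n = len(f)
    return tuple(
        tuple(sum(f[i][k] * g[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def bracket(x, y):
    """Commutator product of the matrix Lie algebra."""
    return subtract(mul(x, y), mul(y, x))

def trace(m):
    return m[0][0] + m[1][1]

def check_traceless_subalgebra(entries=(-1, 0, 1), scalars=(-2, 0, 5)):
    """Check that sums, brackets and scalar multiples of trace-free matrices are trace-free."""
    candidates = [((a, b), (c, -a)) for a, b, c in product(entries, repeat=3)]
    if ((0, 0), (0, 0)) not in candidates:       # condition (ii): zero element
        return False
    for x, y in product(candidates, repeat=2):   # condition (i): sums and brackets
        if trace(add(x, y)) != 0 or trace(bracket(x, y)) != 0:
            return False
    for lam, x in product(scalars, candidates):  # condition (i): scalar multiples
        if trace(scale(lam, x)) != 0:
            return False
    return True

print(check_traceless_subalgebra())
```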
Chapter 20


TRANSFORMATION GROUPS


20.1 Left transformation groups
20.2 Effective left transformation groups
20.3 Free left transformation groups
20.4 Transitive left transformation groups
20.5 Orbits and stabilisers
20.6 Left transformation group homomorphisms
20.7 Right transformation groups
20.8 Right transformation group homomorphisms
20.9 Figures and invariants of transformation groups
20.10 Baseless figure/frame bundles
20.11 Associated baseless figure bundles
20.12 Associated baseless figure bundle constructions

20.0.1 REMARK: Groups and transformation groups.
Transformation groups are presented in a separate chapter from groups so as to avoid discontinuity in the
presentation of the basic algebraic structures in Chapter 17.


Figure 20.0.1 shows some relations between the algebraic classes which are presented in Chapter 20. (See
Figure 17.0.1 for further details of relations between group-related classes.)


Figure 20.0.1  Family tree of transformation groups [diagram not reproduced; classes shown: transformation
group, effective transformation group, transitive transformation group, free transformation group, free and
transitive transformation group]


20.1. Left transformation groups 


20.1.1 REMARK: Transformation groups are more useful than abstract groups.


Abstract groups are of little use in themselves, particularly in practical subjects like differential geometry 
and physics. The useful groups are the ones which actually do something by acting on a space of some kind, 
which is exactly what transformation groups are. 


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg.html
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3]
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.

Transformation groups are used throughout differential geometry, for example in the definition of connections
on fibre bundles. See Section 36.9 for topological transformation groups and Section 63.4 for differentiable 
transformation groups. 


A transformation group is an algebraic system consisting of two sets and two operations. The active set
G −< (G, σ) is a group. The passive set X has no operation of its own. There is a binary operation
μ : G × X → X called the “action” of the group G on X.

By default, a transformation group is a left transformation group, which is presented in Definition 20.1.2. 
Right transformation groups are presented in Section 20.7. 


20.1.2 DEFINITION: A (left) transformation group on a set X is a tuple (G, X) −< (G, X, σ, μ) where (G, σ)
is a group, and the map μ : G × X → X satisfies the following conditions.


(i) ∀g_1, g_2 ∈ G, ∀x ∈ X, μ(σ(g_1, g_2), x) = μ(g_1, μ(g_2, x)).
(ii) ∀x ∈ X, μ(e, x) = x, where e is the identity of G.


The map μ may be referred to as the (left) action map or the (left) action of G on X.
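
For finite examples, conditions (i) and (ii) can be checked exhaustively. The following Python sketch does
this for the cyclic group ℤ_4 acting on X = {0, 1, 2, 3} by rotation, and for a map which is deliberately
not an action. The function is_left_action and both example maps are hypothetical names introduced only
for this illustration.

```python
from itertools import product

def is_left_action(G, X, op, e, act):
    """Check conditions (i) and (ii) of a left transformation group by brute force."""
    compatible = all(
        act(op(g1, g2), x) == act(g1, act(g2, x))
        for g1, g2, x in product(G, G, X)
    )
    identity_fixes = all(act(e, x) == x for x in X)
    return compatible and identity_fixes

Z4 = range(4)

def add_mod4(g1, g2):
    """Group operation sigma of Z_4."""
    return (g1 + g2) % 4

def rotate(g, x):
    """Action mu(g, x): rotate the point x by g steps."""
    return (g + x) % 4

def not_an_action(g, x):
    """Fixes every point when g = 0, but violates condition (i)."""
    return (g * g + x) % 4

print(is_left_action(Z4, Z4, add_mod4, 0, rotate))
print(is_left_action(Z4, Z4, add_mod4, 0, not_an_action))
```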


20.1.3 REMARK: The passive set of a transformation group may be empty. 
Definition 20.1.2 does not exclude the possibility that X is the empty set. 


20.1.4 NOTATION: gx, for g ∈ G and x ∈ X, for a left transformation group (G, X), denotes the value
μ(g, x) of the action of g on x.


20.1.5 REMARK: Avoidance of ambiguity in the notation for the group action on the point set.
Notation 20.1.4 does not result in ambiguity because (g_1 g_2)x = g_1(g_2 x) for all g_1, g_2 ∈ G and x ∈ X by the
associative condition (i) in Definition 20.1.2.


20.1.6 NOTATION: L_g^μ denotes the function {(x, μ(g, x)); (g, x) ∈ Dom(μ)}, where μ is the left action map
of a left transformation group (G, X, σ, μ), for each g ∈ G.


L_g denotes the function L_g^μ when the left action map μ is implicit in the context.


20.1.7 REMARK: The left action map. 

The function L_g^μ in Notation 20.1.6 is a well-defined function with domain and range equal to X for any
left transformation group (G, X, σ, μ), for all g ∈ G. The function L_g = L_g^μ may also be referred to as
the “left action map” of the left transformation group. The left action map L_g = L_g^μ : X → X satisfies
L_g(x) = μ(g, x) = gx for all g ∈ G and x ∈ X.


20.1.8 REMARK: Each left action map is a bijection on the point space. 
For any fixed g ∈ G in Definition 20.1.2, the map L_g : X → X must be a bijection, because L_g(L_{g^{-1}}(x)) =
L_{g^{-1}}(L_g(x)) = x for all x ∈ X. This is formalised as Theorem 20.1.9.


20.1.9 THEOREM: Left action maps are bijections of the passive set. 
Let (G, X, σ, μ) be a left transformation group. Then the left action map L_g : X → X is a bijection for
all g ∈ G, and L_g^{-1} = L_{g^{-1}}.


PROOF: Let (G, X, σ, μ) be a left transformation group. Let g ∈ G. Then the inverse g^{-1} ∈ G of g is well
defined, and L_g(L_{g^{-1}}(x)) = L_{g^{-1}}(L_g(x)) = x for all x ∈ X. Thus L_g ∘ L_{g^{-1}} = L_{g^{-1}} ∘ L_g = id_X. So L_g is a
bijection from X to X by Theorem 10.5.14 (iv) because it has an inverse.
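
A small numerical illustration of this inverse relation: the Python sketch below takes the multiplicative
group of units {1, 2, 3, 4} of ℤ/5 acting on X = {0, 1, 2, 3, 4} by multiplication modulo 5. This example
and the names left_map and compose are assumptions chosen for illustration. Each L_g is stored as a
dictionary, and the inverse group element is obtained from the three-argument pow (Python 3.8 or later).

```python
MODULUS = 5
G = [1, 2, 3, 4]            # units of Z/5 under multiplication
X = list(range(MODULUS))    # passive set

def act(g, x):
    """Action mu(g, x) = g x mod 5."""
    return (g * x) % MODULUS

def left_map(g):
    """The left action map L_g as a dictionary x -> mu(g, x)."""
    return {x: act(g, x) for x in X}

def compose(f, h):
    """Composition f o h of two maps stored as dictionaries."""
    return {x: f[h[x]] for x in h}

identity = {x: x for x in X}

for g in G:
    g_inv = pow(g, -1, MODULUS)   # inverse of g in the group of units
    # L_g and L_{g^-1} are mutually inverse, so L_g is a bijection of X.
    assert compose(left_map(g), left_map(g_inv)) == identity
    assert compose(left_map(g_inv), left_map(g)) == identity

print(left_map(2))
```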


20.1.10 REMARK: Transformation groups as observer frame transformations or point transformations. 
Transformation groups enter into science and engineering in (at least) two main roles, namely as coordinate 
transformations and as point transformations. In a formal sense, these two kinds of transformations may be 
thought of as almost the same thing. They are very easily confused, both in theory and in practice. For
example, if one is seated in a train, one may easily have the illusion that one is moving when really it is the 
neighbouring train which is moving. In special relativity, it is held that all inertial frames are equivalent. So 
it is not clear whether the observer’s view is being transformed by relative motion, or the observed system 
is undergoing a point transformation which a static observer is viewing. 


As discussed in more detail in Section 20.9, transformation groups are often defined so that some given
properties of their point set are preserved. This includes the following kinds of transformation groups.

(1) Linear transformations preserve linear combinations.
(2) Affine transformations preserve convex combinations.
(3) Euclidean transformations (rotations and translations) preserve distances and angles.
(4) Conformal transformations preserve angles.
(5) The Lorentz group preserves space-time intervals.
(6) Strictly monotone increasing maps preserve total order.
(7) Homeomorphisms preserve connectivity of sets and continuity of functions.
(8) Diffeomorphisms preserve tangent spaces (modulo a linear transformation).


One naturally expects that a model of the physical world should preserve the basic facts of the system 
being described, independent of the observer’s viewpoint or frame. In the macroscopic world, it is generally 
assumed that the laws of physics are independent of the observer. It is not quite so obvious that a physical 
system should be invariant under point transformations. It is not even obvious that a transformation of the 
points of a system should be invertible in the sense of leaving the system unchanged. However, there are 
many instances of reversible physics where transformations of a system may be said to form a group. 


20.1.11 REMARK: Realisations of groups. 

A left transformation group (G, X, σ, μ) is equivalent to a group homomorphism φ : G → Aut(X) with
G −< (G, σ), where Aut(X) is the group of bijections of X. The map φ is defined by φ : g ↦ (x ↦ μ(g, x)).
This group homomorphism is called a “realisation” of the group G.

When the same group (G, σ) appears in multiple left transformation groups (G, X_k, σ, μ_k), one may think
of the multiple realisations φ_k : g ↦ (x ↦ μ_k(g, x)) for x ∈ X_k as “associated” realisations, by analogy with
associated fibre bundles. (See Definition 20.9.17 for “associated transformation groups”. See Section 20.11
for associated baseless figure bundles. See Section 47.9 for associated topological fibre bundles.)


20.2. Effective left transformation groups 


20.2.1 DEFINITION: An effective (left) transformation group is a left transformation group G −< (G, X) −<
(G, X, σ, μ) which satisfies


∀g ∈ G \ {e}, ∃x ∈ X, μ(g, x) ≠ x.


In other words, L_g = L_e only if g = e. In other words,
∀g ∈ G, (∀x ∈ X, μ(g, x) = x) ⇒ g = e.
Such a left transformation group (G, X, σ, μ) is said to act effectively on X.


20.2.2 REMARK: No two group elements produce the same action in an effective transformation group. 

If (G, X, σ, μ) is an effective left transformation group and g, g' ∈ G are such that L_g = L_{g'}, then g = g'.
In other words, no two different group elements produce the same action. That is, the group element is 
uniquely determined by its group action. Conversely, a left group action which is uniquely determined by 
the group element must be an effective left action. 
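
The effectiveness condition is easy to test for finite examples. In the following Python sketch, the cyclic
group ℤ_4 acts effectively on {0, 1, 2, 3} by rotation, but not effectively on {0, 1} by parity, because
the elements 0 and 2 then produce the same map. The function is_effective and the two actions are
hypothetical examples, not constructions from this book.

```python
def is_effective(G, X, e, act):
    """Every group element other than e moves at least one point of X."""
    return all(any(act(g, x) != x for x in X) for g in G if g != e)

Z4 = range(4)

def rotate(g, x):
    """Z_4 acting on {0, 1, 2, 3} by rotation."""
    return (g + x) % 4

def parity(g, x):
    """Z_4 acting on {0, 1}: odd elements swap the two points."""
    return (g + x) % 2

print(is_effective(Z4, range(4), 0, rotate))
print(is_effective(Z4, range(2), 0, parity))

# Group elements which produce the same map as the identity under the parity action.
same_as_identity = [g for g in Z4 if all(parity(g, x) == x for x in range(2))]
print(same_as_identity)
```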


20.2.3 EXAMPLE: The group of all permutations of a set is an effective group. 

A simple example of an effective left transformation group is the group (G, X) of all bijections from an
arbitrary set X to itself. (This group is called the “symmetric group” of X if X is finite. See Section 14.8
for permutations.) The action of each group element g ∈ G is a bijection L_g : X → X. The group operation
of G is determined by μ as function composition so that L_{gg'} = L_g ∘ L_{g'} for all g, g' ∈ G. This is the largest
possible effective left transformation group of a given set X.


At the other extreme is the trivial group G containing only the identity operation on any set X. 


20.2.4 EXAMPLE: The group of actions of a group on cosets is typically not an effective group. 

A fairly general example of a non-effective left transformation group is given by the tuple (G, X_ℓ, σ, μ_ℓ) which
is the group of left translations of the set X_ℓ of left cosets of an arbitrary subgroup H of an arbitrary group
G −< (G, σ) as described in Remark 17.7.13. If H contains a non-trivial normal subgroup of G, then
(G, X_ℓ, σ, μ_ℓ) is not effective, because every element of such a normal subgroup maps each left coset to itself.
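
Whether the action on left cosets is effective depends on whether H contains a non-trivial normal subgroup
of G. The following Python sketch compares two subgroups of the group of all permutations of {0, 1, 2}.
Permutations are stored as tuples of images, and all names in the code are illustrative assumptions.

```python
from itertools import permutations

S3 = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def compose(p, q):
    """(p o q)(i) = p(q(i)) for permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def left_cosets(H):
    """The set of left cosets gH for g in S3."""
    return {frozenset(compose(g, h) for h in H) for g in S3}

def translate(g, coset):
    """Left translation of a coset by g."""
    return frozenset(compose(g, c) for c in coset)

def coset_action_is_effective(H):
    """True if every non-identity g moves at least one left coset of H."""
    cosets = left_cosets(H)
    return all(
        any(translate(g, c) != c for c in cosets)
        for g in S3 if g != IDENTITY
    )

H_two = [IDENTITY, (1, 0, 2)]               # generated by one transposition; not normal
H_rot = [IDENTITY, (1, 2, 0), (2, 0, 1)]    # the cyclic rotations; a normal subgroup

print(coset_action_is_effective(H_two))
print(coset_action_is_effective(H_rot))
```

In the first case no non-identity element fixes all three cosets. In the second case each rotation maps
both cosets to themselves.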


20.2.5 REMARK: Effective transformation groups are isomorphic to subgroups of the autobijection group.
Since the set {L_g; g ∈ G} ⊆ (X → X) of all left transformations of a set X must be a subgroup of the group
Bij(X, X) of all bijections from X to X, it is clear that the group G of an effective left transformation group
(G, X) must be isomorphic to a subgroup of this group Bij(X, X) of all bijections from X to X.


20.2.6 REMARK: Effective non-trivial transformation groups cannot act on empty or singleton sets.

If the point set X of an effective left transformation group (G, X, σ, μ) is the empty set or a singleton,
the only possible choice for G is the trivial group {e}. This is fairly clear from Remark 20.2.5 because
Bij(X, X) = {id_X} if #(X) = 0 or 1.


20.2.7 THEOREM: Effective left transformation group elements are uniquely determined by their action. 
Group elements of an effective left transformation group (G, X) are uniquely determined by their action on
the passive set X. In other words, ∀g_1, g_2 ∈ G, (L_{g_1} = L_{g_2} ⇒ g_1 = g_2).


PROOF: Let (G, X, σ, μ) be an effective left transformation group. Let g_1, g_2 ∈ G. Then μ(g_1 g_2, x) =
μ(g_1, μ(g_2, x)) for all x ∈ X. Suppose that there are two group elements g_3, g_4 ∈ G such that μ(g_3, x) =
μ(g_4, x) = μ(g_1 g_2, x) for all x ∈ X. Then for all x ∈ X, μ(g_3 g_4^{-1}, x) = μ(g_3, μ(g_4^{-1}, x)) = μ(g_4, μ(g_4^{-1}, x)) =
μ(g_4 g_4^{-1}, x) = μ(e, x) = x. This implies that g_3 g_4^{-1} = e since the group action is effective. So the group
element σ(g_1, g_2) = g_1 g_2 is uniquely determined by the action map μ.


20.2.8 REMARK: Group elements may be identified with group actions if the action is effective. 

Theorem 20.2.7 implies that the group operation σ of an effective left transformation group (G, X, σ, μ) does
not need to be specified because all of the information is in the action map. An effective transformation group
is no more than the set of left transformations L_g : X → X of the set X. If the group action is not effective,
then there are at least two distinct group elements g, g' ∈ G which specify the same action L_g = L_{g'} : X → X.
Any group which is explicitly constructed as a set of transformations of a set will automatically be effective.


If a left transformation group is effective, the group elements g and the corresponding left translations L_g
may be used interchangeably. There is no danger of real ambiguity in this.


In concrete applications of transformation groups, it is generally the transformations L_g : X → X which are
defined, not the group elements g ∈ G. The group is then defined as a subgroup of the group of all bijections
from X to X. In other words, the transformations are concrete, and the group is an abstraction constructed
from the concrete transformations.


It follows that in concrete applications, transformation groups are almost always effective. This fails to be
true, however, when a group is first constructed from one set of transformations, and is then applied to a
different passive set. For example, the group G of invertible linear transformations of the Cartesian space
X_1 = ℝ^n may be applied to the set X_2 of lines {λx; λ ∈ ℝ \ {0}} ⊆ ℝ^n with x ∈ ℝ^n \ {0}. In this case, the original
group G may be retained for convenience, even though it is not effective on the substituted passive set X_2.


20.2.9 THEOREM: Embedding of a left transformation group in the group of passive-set bijections. 

Let (G, X, σ, μ) be a left transformation group. Let (G_0, σ_0) be the group of all (left) bijections from X
to X. That is, G_0 = {f : X → X; f is a bijection} and the (left) composition operation σ_0 : G_0 × G_0 → G_0
is defined by ∀f_1, f_2 ∈ G_0, σ_0(f_1, f_2) = f_1 ∘ f_2. Define Φ : G → G_0 by Φ(g) : X → X with Φ(g)(x) = μ(g, x)
for all x ∈ X, for all g ∈ G.


(i) Φ : G → G_0 is a group homomorphism.
(ii) Φ : G → G_0 is a group monomorphism if and only if (G, X, σ, μ) is effective.


PROOF: Let (G, X, σ, μ) be a left transformation group, and let (G₀, σ₀) be the group of all (left) bijections
from X to X. Define Φ : G → G₀ by Φ(g) : X → X with Φ(g)(x) = μ(g, x) for all x ∈ X, for
all g ∈ G. To verify that Φ is a well-defined map, note that Φ(g⁻¹)(Φ(g)(x)) = μ(g⁻¹, μ(g, x)) = x by
Definition 20.1.2 (i, ii), and similarly Φ(g)(Φ(g⁻¹)(x)) = x. So Φ(g) is a bijection. So Φ(g) ∈ G₀.

Let g₁, g₂ ∈ G. Then σ₀(Φ(g₁), Φ(g₂))(x) = (Φ(g₁) ∘ Φ(g₂))(x) = Φ(g₁)(Φ(g₂)(x)) = μ(g₁, μ(g₂, x)) =
μ(σ(g₁, g₂), x) = Φ(σ(g₁, g₂))(x) for all x ∈ X, by Definition 20.1.2 (i). So σ₀(Φ(g₁), Φ(g₂)) = Φ(σ(g₁, g₂)).
Hence Φ is a group homomorphism by Definition 17.4.1.

For part (ii), Φ : G → G₀ is a group monomorphism if and only if Φ(g) ≠ Φ(e_G) for all g ∈ G \ {e_G}, if and
only if ∃x ∈ X, Φ(g)(x) ≠ Φ(e_G)(x), for all g ∈ G \ {e_G}, if and only if ∀g ∈ G \ {e_G}, ∃x ∈ X, μ(g, x) ≠ x,
if and only if (G, X, σ, μ) is effective, by Definition 20.2.1.
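
The two parts of Theorem 20.2.9 can be checked by brute force on a small finite example. The following is only a sketch under stated assumptions: the group Z_4, the two-point passive set and the names `sigma`, `mu`, `Phi` and `compose` are illustrative choices which do not appear in the text. The action is deliberately not effective, so the map Φ should be a homomorphism which fails to be injective.

```python
from itertools import product

# Hypothetical example: Z_4 = {0, 1, 2, 3} under addition mod 4, acting on
# the passive set X = {0, 1} by mu(g, x) = (g + x) mod 2.
# This action is not effective because g = 2 fixes every point.
G = range(4)
X = range(2)

def sigma(g, h):
    """Group operation of Z_4."""
    return (g + h) % 4

def mu(g, x):
    """Left action of Z_4 on X."""
    return (g + x) % 2

def Phi(g):
    """The bijection x -> mu(g, x), stored as a tuple indexed by x."""
    return tuple(mu(g, x) for x in X)

def compose(f1, f2):
    """Left composition f1 o f2 of maps stored as tuples."""
    return tuple(f1[f2[x]] for x in X)

# Part (i): Phi(sigma(g, h)) equals Phi(g) o Phi(h) for all g, h.
is_homomorphism = all(Phi(sigma(g, h)) == compose(Phi(g), Phi(h))
                      for g, h in product(G, G))

# Part (ii): Phi is injective if and only if the action is effective.
is_injective = len({Phi(g) for g in G}) == len(G)
is_effective = all(any(mu(g, x) != x for x in X) for g in G if g != 0)

print(is_homomorphism, is_injective, is_effective)
```

Replacing X by Z_4 itself, with mu(g, x) = (g + x) mod 4, gives an effective action, and then both `is_injective` and `is_effective` become true.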


20.2.10 REMARK: The map from group elements to the corresponding action maps.

Theorem 20.2.9 maps effective left transformation groups on X to subgroups of bijections on X. So all effective
left transformation groups are essentially subgroups of the symmetric group on X. The elements of the group
G may be thought of as mere “tags” or “labels” for bijections on X.


20.3. Free left transformation groups 


20.3.1 REMARK: Groups which “act freely” are those which act without fixed points. 

The free transformation groups in Definitions 20.3.2 and 20.7.11 must be distinguished from “free groups”, 
which are groups (not transformation groups) which are generated freely from a set of abstract elements. 
The meaning would be clearer if one used the expression “fixed-point free” or “free from fixed points”. 
(Spivak [37], Volume II, page 309, says “G acts without fixed point” rather than “G acts freely”.) The 
movement of a solid object such that one point is fixed could be thought of as constrained at that point. 
In other words, the object is not moving freely at a fixed point. In this sense, the word “free” does seem 
suitable for transformations without fixed points. 


The concept of a group “acting freely” on a set is applicable to affine spaces, as in Definitions 26.2.2, 26.4.10, 
26.6.2 and 26.6.6, and also to fibre bundles, as in Definitions 20.10.8 and 47.13.2. 


20.3.2 DEFINITION: A left transformation group G < (G, X, σ, μ) is said to act freely on the set X if
∀g ∈ G \ {e}, ∀x ∈ X, μ(g, x) ≠ x.


That is, the only group element with a fixed point is the identity e. In other words,


∀g ∈ G, (∃x ∈ X, μ(g, x) = x) ⇒ g = e.


A free left transformation group is a left transformation group (G, X, σ, μ) which acts freely on X.


20.3.3 REMARK: Acting freely on the empty set.

In the special case that X = ∅ in Definition 20.3.2, the group G is completely arbitrary. Therefore every
group acts freely on the empty set, but only the trivial group {e} acts effectively on it. This
observation and the assertion of Theorem 20.3.4 are summarised in the following table.


G        X      properties

= {e}    any    free and effective
≠ {e}    = ∅    free, but not effective
any      ≠ ∅    if free, then effective


20.3.4 THEOREM: A group acting freely on a non-empty set is an effective transformation group. 
Let (G, X, σ, μ) be a left transformation group. If X ≠ ∅ and G acts freely on X, then (G, X, σ, μ) is an
effective left transformation group.


PROOF: Let (G, X, σ, μ) be a left transformation group with X ≠ ∅. Assume that G acts freely on X.
Let g ∈ G \ {e}. Let x ∈ X. Then μ(g, x) ≠ x by Definition 20.3.2. Therefore ∀g ∈ G \ {e}, ∃x ∈ X, gx ≠ x.
So G acts effectively on X by Definition 20.2.1.


20.3.5 THEOREM: Equivalent injectivity condition for a transformation group to act freely. 
Let (G, X, σ, μ) be a left transformation group. Then G acts freely on X if and only if the map μ_p : G → X
defined by μ_p : g ↦ μ(g, p) is injective for all p ∈ X. In other words, G acts freely on X if and only if


∀p ∈ X, ∀g₁, g₂ ∈ G, (μ(g₁, p) = μ(g₂, p) ⇒ g₁ = g₂).


PROOF: Let (G, X, σ, μ) be a left transformation group which acts freely on X. Let p ∈ X and g₁, g₂ ∈
G satisfy μ(g₁, p) = μ(g₂, p). Then p = μ(e, p) = μ(g₁⁻¹, μ(g₁, p)) = μ(g₁⁻¹, μ(g₂, p)) = μ(g₁⁻¹g₂, p) by
Definition 20.1.2, where e is the identity of G. Therefore g₁⁻¹g₂ = e by Definition 20.3.2. So g₁ = g₂.

Now suppose that (G, X, σ, μ) satisfies ∀p ∈ X, ∀g₁, g₂ ∈ G, (μ(g₁, p) = μ(g₂, p) ⇒ g₁ = g₂). Let g ∈ G and
p ∈ X satisfy μ(g, p) = p. Then μ(g, p) = μ(e, p). So g = e. Hence G acts freely on X.
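
The equivalence in Theorem 20.3.5 can be illustrated by comparing two actions of the same finite group. The sketch below is an illustration only: the choice of the symmetric group S_3 and the helper names `mul`, `acts_freely`, `all_mu_p_injective`, `on_points` and `on_itself` are assumptions introduced here, not notation from the text.

```python
from itertools import permutations

# S_3 as tuples: g[i] is the image of the point i under g.
S3 = list(permutations(range(3)))
E = (0, 1, 2)  # identity permutation

def mul(g, h):
    """Group product: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def acts_freely(G, X, mu, e):
    """Definition 20.3.2: no non-identity element has a fixed point."""
    return all(mu(g, x) != x for g in G if g != e for x in X)

def all_mu_p_injective(G, X, mu):
    """Theorem 20.3.5: the map g -> mu(g, p) is injective for every p in X."""
    return all(len({mu(g, p) for g in G}) == len(G) for p in X)

def on_points(g, x):
    """S_3 permuting the points 0, 1, 2 (transpositions have fixed points)."""
    return g[x]

def on_itself(g, x):
    """S_3 acting on itself by left translation."""
    return mul(g, x)

points_free = acts_freely(S3, range(3), on_points, E)
points_injective = all_mu_p_injective(S3, range(3), on_points)
self_free = acts_freely(S3, S3, on_itself, E)
self_injective = all_mu_p_injective(S3, S3, on_itself)

print(points_free, points_injective)  # both conditions fail together
print(self_free, self_injective)      # both conditions hold together
```

In each case the two tests agree, as the theorem requires. The action on three points fails both tests, while the action by left translation passes both.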


20.3.6 REMARK: Applicability of groups which act freely to principal fibre bundles.

A canonical example of a transformation group which acts freely is a group acting on itself by left or right 
translation as in Theorem 20.3.7. This is the kind of transformation group which is needed for principal fibre 
bundles in Sections 47.8 and 66.1, which play an important role in defining parallelism and connections. 


20.3.7 THEOREM: A group acting on itself is effective and free. 
Let (G, σ) be a group. Define the action map μ : G × G → G by μ : (g₁, g₂) ↦ σ(g₁, g₂). Then the tuple
(G, G, σ, μ) = (G, G, σ, σ) is an effective, free left transformation group of G.


PROOF: For a left transformation group, the action map μ : G × X → X must satisfy the associativity
rule μ(σ(g₁, g₂), x) = μ(g₁, μ(g₂, x)) for all g₁, g₂ ∈ G and x ∈ X. If the formula for μ in the theorem is
substituted into this rule with X = G, it follows easily from the associativity of σ. So (G, G, σ, μ) is a left
transformation group.


This left transformation group acts freely by Theorem 17.3.20 (iii) and Definition 20.3.2. Therefore it is
effective by Theorem 20.3.4 because G ≠ ∅.


20.3.8 DEFINITION: The (left) transformation group of G acting on G by left translation, or the left
translation group of G, is the left transformation group (G, G, σ, σ).


20.4. Transitive left transformation groups 


20.4.1 DEFINITION: A transitive (left) transformation group is a left transformation group G < (G, X) <
(G, X, σ, μ) such that


∀x, y ∈ X, ∃g ∈ G, μ(g, x) = y.
In other words,
∀x ∈ X, {μ(g, x); g ∈ G} = X.


A left transformation group (G, X, σ, μ) is said to act transitively on X if it is a transitive left transformation
group.


20.4.2 THEOREM: All points reachable from all points if and only if reachable from a single point. 
A left transformation group (G, X, σ, μ) acts transitively on a non-empty set X if and only if


∃x ∈ X, {μ(g, x); g ∈ G} = X.


PROOF: Let (G, X, σ, μ) be a transitive left transformation group. Then ∀x ∈ X, {μ(g, x); g ∈ G} = X.
So clearly ∃x ∈ X, {μ(g, x); g ∈ G} = X if X is non-empty.

Let (G, X, σ, μ) be a left transformation group. Let x ∈ X satisfy {μ(g, x); g ∈ G} = X. Let y ∈ X. Then
μ(g_y, x) = y for some g_y ∈ G. So μ(g_y⁻¹, y) = x by Theorem 20.1.9. Therefore μ(g, x) = μ(g, μ(g_y⁻¹, y)) =
μ(g g_y⁻¹, y) for all g ∈ G. In other words, X = {μ(g, x); g ∈ G} = {μ(g g_y⁻¹, y); g ∈ G} = {μ(g′, y); g′ ∈ G},
substituting g′ for g g_y⁻¹. In other words, {μ(g, y); g ∈ G} = X.


20.4.3 THEOREM: Conditions for a left transformation group to act freely and/or transitively. 
Let (G, X, σ, μ) be a left transformation group. Define μ_p : G → X by μ_p : g ↦ μ(g, p) for all p ∈ X.


(i) G acts freely on X if and only if μ_p is injective for all p ∈ X.
(ii) G acts transitively on X if and only if μ_p is surjective for all p ∈ X.
(iii) G acts freely and transitively on X if and only if μ_p is a bijection for all p ∈ X.


PROOF: Part (i) is a restatement of Theorem 20.3.5. Part (ii) follows directly from Definition 20.4.1. 
Part (iii) follows from parts (i) and (ii). 


20.4.4 REMARK: A group acting on itself is effective, free and transitive.

By Theorems 20.3.7 and 20.4.5, a group acting on itself is effective, free and transitive. Theorem 20.4.6 is a 
converse which asserts that any effective, free and transitive left transformation group is isomorphic to the 
group acting on itself. Any point in the passive set may be chosen to be its identity element. (Note that 
by Theorem 20.3.4, the transformation group in Theorem 20.4.6 must be effective because it is free and the 
passive set is assumed non-empty.) 


20.4.5 THEOREM: A group acting on itself is transitive. 
Let (G, σ) be a group. Define the left action map μ : G × G → G by μ : (g₁, g₂) ↦ σ(g₁, g₂). Then the tuple
(G, G, σ, μ) = (G, G, σ, σ) is a transitive left transformation group.


PROOF: Let x, y ∈ G. Let g = yx⁻¹. Then μ(g, x) = y. Hence (G, G, σ, μ) is transitive.


20.4.6 THEOREM: Free transitive transformation group of non-empty set is isomorphic to self-acting group. 
Let (G, X, σ_G, μ) be a left transformation group which is free and transitive. Let p ∈ X. Define the bijection
μ_p : G → X by μ_p : g ↦ gp. Define σ_X : X × X → X by


∀x, y ∈ X, σ_X(x, y) = μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(y))). (20.4.1)


Then (X, σ_X) is a group and μ_p : G → X is a group isomorphism.


PROOF: Theorem 20.4.3 (iii) implies that μ_p : G → X is a bijection. So σ_X : X × X → X is a well-defined
function, and


∀x, y, z ∈ X, σ_X(σ_X(x, y), z)
    = μ_p(σ_G(μ_p⁻¹(μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(y)))), μ_p⁻¹(z)))
    = μ_p(σ_G(σ_G(μ_p⁻¹(x), μ_p⁻¹(y)), μ_p⁻¹(z)))
    = μ_p(σ_G(μ_p⁻¹(x), σ_G(μ_p⁻¹(y), μ_p⁻¹(z))))
    = μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(μ_p(σ_G(μ_p⁻¹(y), μ_p⁻¹(z))))))
    = σ_X(x, σ_X(y, z)),


which verifies the associativity of σ_X. Let e_X = p. Then μ_p(e_G) = e_X. So μ_p⁻¹(e_X) = e_G. Consequently
σ_X(x, e_X) = μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(e_X))) = μ_p(σ_G(μ_p⁻¹(x), e_G)) = μ_p(μ_p⁻¹(x)) = x for all x ∈ X. Similarly,
σ_X(e_X, x) = x for all x ∈ X. Therefore e_X is an identity element for X.


Now let x ∈ X. Let y = μ_p((μ_p⁻¹(x))⁻¹). Then y ∈ X and


σ_X(x, y) = μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(μ_p((μ_p⁻¹(x))⁻¹))))
    = μ_p(σ_G(μ_p⁻¹(x), (μ_p⁻¹(x))⁻¹))
    = μ_p(e_G)
    = e_X.


Similarly, σ_X(y, x) = e_X. So y is an inverse for x with respect to σ_X in X. Hence (X, σ_X) is a group.


Let g, h ∈ G. Then μ_p(σ_G(g, h)) = μ_p(σ_G(μ_p⁻¹(μ_p(g)), μ_p⁻¹(μ_p(h)))) = σ_X(μ_p(g), μ_p(h)) by line (20.4.1).
So μ_p : G → X is a group isomorphism by Definition 17.4.1 because it is a bijection. (As mentioned in
Remark 17.4.5, it follows that μ_p⁻¹ : X → G is a group homomorphism.)
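
The transport of the group structure from G to X in line (20.4.1) can be made concrete with a small free and transitive action. The following sketch rests on assumptions which are not in the text: the group is Z_5, the passive set is the arbitrary list of labels 10, ..., 14, the base point is p = 12, and the names `sigma_G`, `mu`, `mu_p`, `mu_p_inv` and `sigma_X` are illustrative.

```python
from itertools import product

N = 5
G = list(range(N))           # Z_5 under addition mod 5
X = [10, 11, 12, 13, 14]     # hypothetical passive set

def sigma_G(g, h):
    """Group operation of Z_5."""
    return (g + h) % N

def mu(g, x):
    """Cyclic shift of X by g steps; this action is free and transitive."""
    return 10 + (x - 10 + g) % N

p = 12                                        # arbitrary base point
mu_p = {g: mu(g, p) for g in G}               # the bijection g -> gp
mu_p_inv = {x: g for g, x in mu_p.items()}    # its inverse

def sigma_X(x, y):
    """Line (20.4.1): pull back to G, multiply there, push forward to X."""
    return mu_p[sigma_G(mu_p_inv[x], mu_p_inv[y])]

# The base point p plays the role of the identity element e_X.
identity_is_p = all(sigma_X(x, p) == x == sigma_X(p, x) for x in X)

# sigma_X is associative.
associative = all(sigma_X(sigma_X(x, y), z) == sigma_X(x, sigma_X(y, z))
                  for x, y, z in product(X, repeat=3))

# mu_p is a group homomorphism, and hence an isomorphism because it is bijective.
isomorphism = all(mu_p[sigma_G(g, h)] == sigma_X(mu_p[g], mu_p[h])
                  for g, h in product(G, G))

print(identity_is_p, associative, isomorphism)
```

Changing `p` to any other element of X gives a different but isomorphic group structure on X, which reflects the observation that any point of the passive set may be chosen as the identity.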


20.5. Orbits and stabilisers 


20.5.1 REMARK: Applicability of orbit spaces to fibre bundles. 
Orbit spaces are important for defining fibre bundles. (See for example Sections 47.11 and 66.7.) 


20.5.2 DEFINITION: The orbit of a left transformation group (G, X) passing through the point x ∈ X is
the set Gx = {gx; g ∈ G}.


Figure 20.5.1 Orbits of rotations of S² around the North pole


20.5.3 EXAMPLE: The orbits of the group of rotations of the sphere S² about its North pole are the sets
of points of constant latitude. The group action causes translations within each orbit but not between them.
(This is illustrated in Figure 20.5.1.)


By contrast, the orbits of the group of all rotations of S² are all equal to S² itself because from any given
point x ∈ S², it is always possible to find a rotation which will translate x to any other y ∈ S².


20.5.4 REMARK: The relation between orbits and transitivity of a transformation group. 

By Definition 20.4.1, a left transformation group (G, X) is transitive if and only if all of its orbits equal the 
whole point set X. By Theorem 20.4.2, one orbit equals X if and only if all orbits equal X. Hence the set 
of all orbits in Definition 20.5.6 is the uninteresting singleton {X} if the group action is transitive. 


Roughly speaking, the “bigger” the group, the fewer and larger the orbits, whereas a bigger passive space
typically has more orbits. As noted in Example 20.5.3, a subgroup of SO(3) consisting of rotations around the “north
pole” has infinitely many orbits within the passive set S², whereas the larger group of all elements of SO(3)
has only one orbit. However, if the passive space S² is replaced with ℝ³, the whole group of rotations once
again has infinitely many orbits, namely the spheres of any radius centred on the origin 0 ∈ ℝ³.


20.5.5 THEOREM: All orbits cover the passive set if and only if one orbit covers the passive set. 
A left transformation group (G, X, σ, μ) acts transitively on a non-empty set X if and only if X is the orbit
of some element of X.


PROOF: The assertion follows directly from Theorem 20.4.2.


20.5.6 DEFINITION: The orbit space of a left transformation group (G, X) is the set {Gx; x ∈ X} of orbits
of (G, X) passing through points x ∈ X.


20.5.7 NOTATION: X/G denotes the orbit space of a left transformation group (G, X). 


20.5.8 REMARK: The orbit space may be regarded as a point space for the group. 
Theorem 20.5.9 effectively implies that the elements of the orbit space are acted on by elements of the group
in a well-defined manner which permits the orbit space to be regarded as a point space for the group.


The proof of Theorem 20.5.9 has the same form as the proof of Theorem 17.7.4. The orbit space can
also be shown to be a partition of the passive set X by noting that the relation (R, X, X) defined by
R = {(x₁, x₂) ∈ X × X; ∃g ∈ G, x₁ = gx₂} is an equivalence relation whose equivalence classes are of the
form Gx for x ∈ X.


20.5.9 THEOREM: The set of orbits of a left transformation group partitions the passive set.
The orbit space of a left transformation group (G, X) is a partition of X.


PROOF: Let (G, X, σ, μ) be a left transformation group. Every x ∈ X satisfies x = ex ∈ Gx, so the orbits
are non-empty and cover X. Let x₁, x₂ ∈ X be such that Gx₁ ∩ Gx₂ ≠ ∅. Then
g₁x₁ = g₂x₂ for some g₁, g₂ ∈ G. So for any g ∈ G, gx₁ = (g g₁⁻¹)g₁x₁ = g g₁⁻¹g₂x₂ ∈ Gx₂. Hence Gx₁ ⊆ Gx₂.
Similarly Gx₁ ⊇ Gx₂. So Gx₁ = Gx₂ and it follows that X/G is a partition of X.


20.5.10 DEFINITION: The stabiliser of a point x ∈ X for a left transformation group (G, X, σ, μ) is the
group (G_x, σ_x) with G_x = {g ∈ G; gx = x} and σ_x = σ|_{G_x × G_x}.


20.5.11 REMARK: Relation between stabilisers and the free-action property of a transformation group.
A left transformation group (G, X, σ, μ) acts freely on X if and only if G_x = {e} for all x ∈ X. This follows
immediately from Definitions 20.3.2 and 20.5.10.


Thus the group acts freely if and only if every stabiliser is a singleton, whereas the group acts transitively if
and only if the orbit space is a singleton, as mentioned in Remark 20.5.4.
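
Orbits, the orbit space and stabilisers can all be computed directly for a finite action. The sketch below assumes a hypothetical example which is not taken from the text: the group Z_3 acts on the six points 0, ..., 5 by shifting in steps of two, and the names `mu`, `orbit`, `orbit_space` and `stabiliser` are illustrative.

```python
# Hypothetical example: Z_3 = {0, 1, 2} acting on X = {0, ..., 5}
# by mu(g, x) = (x + 2g) mod 6.
G = range(3)
X = range(6)

def mu(g, x):
    """Left action of Z_3 on X."""
    return (x + 2 * g) % 6

def orbit(x):
    """Definition 20.5.2: the orbit Gx = {gx; g in G}."""
    return frozenset(mu(g, x) for g in G)

def stabiliser(x):
    """Definition 20.5.10: the elements of G which fix x."""
    return {g for g in G if mu(g, x) == x}

# Definition 20.5.6: the orbit space X/G.
orbit_space = {orbit(x) for x in X}

# Theorem 20.5.9: the orbits cover X and are pairwise disjoint.
covers = set().union(*orbit_space) == set(X)
disjoint = sum(len(o) for o in orbit_space) == len(X)

# Remark 20.5.11: free if every stabiliser is {e}, transitive if X/G is a singleton.
acts_freely = all(stabiliser(x) == {0} for x in X)
acts_transitively = len(orbit_space) == 1

print(sorted(sorted(o) for o in orbit_space))
print(covers, disjoint, acts_freely, acts_transitively)
```

In this example the action is free but not transitive: every stabiliser is trivial, and the orbit space consists of the even points and the odd points.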


20.5.12 REMARK: Orbits of mixed right and left actions on frame/measurement pairs. 

Transformation groups are ubiquitous in physics for changes of coordinates. To state what one has observed, 
one must state both the measurement and the coordinate frame which was used to obtain that measurement. 
The coordinate frame could be as simple as a temperature scale such as Centigrade or Fahrenheit, or a length
scale such as light-years or parsecs, or it could be as complicated as a time-dependent curvilinear coordinate
system. The measurement is of little value if the reference frame is not known.


A good way to deliver the outcome of an observation is to give a pair (g, x), where x is the observation and 
g is the transformation which can be applied to x to bring it back to some standard known reference frame. 
This is somewhat superfluous because one could simply deliver the transformed measurement gx, which is 
how the measurement would look in some standard frame e ∈ G. However, sometimes a reference frame g
might make all of the measurements much easier to manage. It also might not be clear which reference
frame is the universal standard. What is standard to one group of researchers might not be standard to
another. If there are multiple standard references g₁, g₂ and g₃, then one would effectively have to present
the measurements as pairs (g₁, x₁), (g₂, x₂) and so forth.


Since each measurement presumably refers to some underlying reality, one may say that two measurements 
(g₁, x₁) and (g₂, x₂) refer to the same underlying object or event if g₁x₁ = g₂x₂. This gives a trivial
partitioning of the set G × X into equivalence classes. Despite this triviality, it is useful to understand
this simple case first before considering the more complicated situations which occur in the construction of
associated fibre bundles using orbit spaces. Theorem 20.5.13 gives some basic properties of this partition.
Most important is the relation {(g, x) ∈ G × X; gx = g₀x₀} = {(g₀h⁻¹, hx₀); h ∈ G} in part (i). (Line (20.5.1)
is a higher-level way of saying the same thing.) This helps to explain why the orbits of this kind of mixed
right and left action on pairs of group elements and points appear so often in the context of associated fibre
bundles. The left action by a group element h ∈ G on a pair (g₀, x₀) may be thought of as a “change of
coordinates” operation. The reference frame g₀ and measurement x₀ are simultaneously modified in a kind
of “covariant” manner so that the underlying object being observed does not change. (A well-known example
of this kind of “covariance” is multiplication of column vectors by matrices from the left, where Ax = y if
and only if A′x′ = y, where A′ = AC⁻¹ and x′ = Cx for invertible C.)


Despite the triviality of Theorem 20.5.13, if the product G × X is replaced by a product P × X, where G
acts on the right on P, then the set-construction {(g₀h⁻¹, hx₀); h ∈ G} becomes much more interesting, as
in Theorem 20.12.3 and Definitions 47.11.5 and 66.7.12. 


20.5.13 THEOREM: Frame/measurement pair equivalence classes are equal to mixed-action orbits. 
Let (G, X, σ_G, μ) be a left transformation group.


(i) ∀g₀ ∈ G, ∀x₀ ∈ X, {(g, x) ∈ G × X; gx = g₀x₀} = {(g₀h⁻¹, hx₀); h ∈ G}.
Let Q = {[(g₀, x₀)]; (g₀, x₀) ∈ G × X}, where [(g₀, x₀)] = {(g, x) ∈ G × X; gx = g₀x₀} for (g₀, x₀) ∈ G × X.
(ii) Q is a partition of G × X.
Define μ₀ : G × (G × X) → G × X by ∀h ∈ G, ∀(g₀, x₀) ∈ G × X, μ₀(h, (g₀, x₀)) = (g₀h⁻¹, hx₀).
(iii) (G, G × X, σ_G, μ₀) is a left transformation group.
(iv) The orbit space of (G, G × X, σ_G, μ₀) is equal to Q. In other words, (G × X)/G = Q. Moreover,


∀(g₀, x₀) ∈ G × X, [(g₀, x₀)] = G(g₀, x₀). (20.5.1)


For x₀ ∈ X, define φ_{x₀} : G → Q by φ_{x₀} : g ↦ [(g, x₀)].
(v) φ_{x₀} : G → Q is injective for all x₀ ∈ X if and only if G acts freely on X.
(vi) φ_{x₀} : G → Q is surjective for all x₀ ∈ X if and only if G acts transitively on X.
(vii) φ_{x₀} : G → Q is a bijection for all x₀ ∈ X if and only if G acts freely and transitively on X.


PROOF: For part (i), let g₀ ∈ G and x₀ ∈ X. Let (g, x) ∈ G × X with gx = g₀x₀. Let h = g⁻¹g₀.
Then (g₀h⁻¹, hx₀) = (g₀g₀⁻¹g, g⁻¹g₀x₀) = (g, x). Therefore (g, x) ∈ {(g₀h⁻¹, hx₀); h ∈ G}. Consequently
{(g, x) ∈ G × X; gx = g₀x₀} ⊆ {(g₀h⁻¹, hx₀); h ∈ G}.

For the reverse inclusion, suppose that (g, x) ∈ {(g₀h⁻¹, hx₀); h ∈ G}. Then (g, x) = (g₀h⁻¹, hx₀) for some
h ∈ G. Therefore gx = g₀h⁻¹hx₀ = g₀x₀. Consequently (g, x) ∈ {(g, x) ∈ G × X; gx = g₀x₀}. Hence
{(g, x) ∈ G × X; gx = g₀x₀} = {(g₀h⁻¹, hx₀); h ∈ G}.

For part (ii), let (g₁, x₁) ∈ G × X. Then (g₁, x₁) ∈ [(g₁, x₁)]. So (g₁, x₁) ∈ ⋃Q. Thus G × X ⊆ ⋃Q. Now
suppose that y ∈ ⋃Q. Then y ∈ [(g₀, x₀)] for some (g₀, x₀) ∈ G × X. So y ∈ G × X. Thus G × X ⊇ ⋃Q.
Hence G × X = ⋃Q.


To show pairwise disjointness of the elements of Q, let [(g₁, x₁)], [(g₂, x₂)] ∈ Q with [(g₁, x₁)] ∩ [(g₂, x₂)] ≠ ∅.
Then there exists (g, x) ∈ G × X with gx = g₁x₁ = g₂x₂. Let (g₃, x₃) ∈ [(g₁, x₁)]. Then g₃x₃ = g₁x₁ = g₂x₂.
Therefore (g₃, x₃) ∈ [(g₂, x₂)]. Thus [(g₁, x₁)] ⊆ [(g₂, x₂)]. Similarly [(g₁, x₁)] ⊇ [(g₂, x₂)]. Therefore
[(g₁, x₁)] = [(g₂, x₂)]. Hence Q is a partition of G × X by Definition 8.7.12.


For part (iii), let g₁, g₂ ∈ G and (g₀, x₀) ∈ G × X. Then μ₀(g₁, μ₀(g₂, (g₀, x₀))) = μ₀(g₁, (g₀g₂⁻¹, g₂x₀)) =
(g₀g₂⁻¹g₁⁻¹, g₁g₂x₀) = μ₀(g₁g₂, (g₀, x₀)). So μ₀ satisfies Definition 20.1.2 (i). Let e be the identity of G, and
(g₀, x₀) ∈ G × X. Then μ₀(e, (g₀, x₀)) = (g₀e⁻¹, ex₀) = (g₀, x₀). Thus μ₀ satisfies Definition 20.1.2 (ii) also.
Hence (G, G × X, σ_G, μ₀) is a left transformation group by Definition 20.1.2.


For part (iv), G(g₀, x₀) = {μ₀(h, (g₀, x₀)); h ∈ G} = {(g₀h⁻¹, hx₀); h ∈ G} for all (g₀, x₀) ∈ G × X by
Definition 20.5.2. Therefore line (20.5.1) follows from parts (i), (ii) and (iii). Hence (G × X)/G = Q by
Notation 20.5.7 and part (ii).


For part (v), suppose that G acts freely on X, and let x₀ ∈ X. Let g₁, g₂ ∈ G with φ_{x₀}(g₁) = φ_{x₀}(g₂).
Then (g₁, x₀) ∈ [(g₂, x₀)] by the definition
of [(g₁, x₀)]. So g₁x₀ = g₂x₀. Therefore x₀ = g₁⁻¹g₂x₀. So g₁ = g₂ by Theorem 20.3.5 because G acts freely
on X. Thus φ_{x₀} is injective for all x₀ ∈ X. For the converse, suppose that φ_{x₀} is injective for all x₀ ∈ X.
Let x₀ ∈ X and g₁, g₂ ∈ G with g₁x₀ = g₂x₀. Then (g₁, x₀) ∈ [(g₁, x₀)] ∩ [(g₂, x₀)]. So [(g₁, x₀)] = [(g₂, x₀)]
by part (ii). Therefore g₁ = g₂ because φ_{x₀} is injective. Thus the map g ↦ gx₀ is injective for all x₀ ∈ X.
Hence G acts freely on X by Theorem 20.3.5.

For part (vi), let x₀ ∈ X. Suppose that [(g, x)] ∈ Q. Then x₀ = g₁(gx) for some g₁ ∈ G by Definition 20.4.1
because G acts transitively on X. Let g₂ = g₁⁻¹. Then g₂x₀ = gx. So (g₂, x₀) ∈ [(g, x)]. So [(g₂, x₀)] = [(g, x)]
by part (ii). Thus φ_{x₀}(g₂) = [(g, x)]. Therefore Range(φ_{x₀}) = Q for all x₀ ∈ X. For the converse, suppose
that Range(φ_{x₀}) = Q for all x₀ ∈ X. Let x₁, x₂ ∈ X. Then Range(φ_{x₁}) = Q and [(e, x₂)] ∈ Q, where e is the
identity of G. So φ_{x₁}(g₁) = [(e, x₂)] for some g₁ ∈ G. So [(g₁, x₁)] = [(e, x₂)]. Therefore g₁x₁ = ex₂ = x₂.
Hence G acts transitively on X by Definition 20.4.1.


Part (vii) follows from parts (v) and (vi). 
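
Parts (i), (iii), (v) and (vi) of Theorem 20.5.13 can be verified exhaustively for a small group. The sketch below is only an illustration under assumptions: the group is S_3 acting on three points, and the names `mul`, `inv`, `act`, `mu0`, `eq_class` and `mixed_orbit` are hypothetical helpers introduced here.

```python
from itertools import permutations, product

S3 = list(permutations(range(3)))   # g[i] is the image of the point i
X = range(3)
PAIRS = list(product(S3, X))        # the set G x X

def mul(g, h):
    """Group product: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    """Inverse permutation."""
    r = [0, 0, 0]
    for i, gi in enumerate(g):
        r[gi] = i
    return tuple(r)

def act(g, x):
    """Left action of S_3 on the points 0, 1, 2."""
    return g[x]

def mu0(h, pair):
    """Mixed action: h sends (g0, x0) to (g0 h^-1, h x0)."""
    g0, x0 = pair
    return (mul(g0, inv(h)), act(h, x0))

def eq_class(g0, x0):
    """[(g0, x0)]: all pairs (g, x) with gx = g0 x0."""
    return frozenset((g, x) for g, x in PAIRS if act(g, x) == act(g0, x0))

def mixed_orbit(g0, x0):
    """Orbit of (g0, x0) under the mixed action mu0."""
    return frozenset(mu0(h, (g0, x0)) for h in S3)

part_i = all(eq_class(g0, x0) == mixed_orbit(g0, x0) for g0, x0 in PAIRS)
part_iii = all(mu0(g1, mu0(g2, pr)) == mu0(mul(g1, g2), pr)
               for g1, g2 in product(S3, S3) for pr in PAIRS)
Q = {eq_class(g0, x0) for g0, x0 in PAIRS}

# S_3 acts transitively but not freely on three points, so each map
# g -> [(g, x0)] should be surjective onto Q but not injective.
surjective = all({eq_class(g, x0) for g in S3} == Q for x0 in X)
injective = all(len({eq_class(g, x0) for g in S3}) == len(S3) for x0 in X)

print(part_i, part_iii, len(Q), surjective, injective)
```

Each equivalence class here corresponds to one value of the product gx, so Q has three elements, one for each point of X.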


20.6. Left transformation group homomorphisms 


20.6.1 REMARK: Applicability of transformation group homomorphisms to fibre bundles. 

Definition 20.6.2 is related to the definition of fibre bundle homomorphisms, which are in turn related to 
the export and import of parallelism between associated fibre bundles in Section 48.4. Definition 20.6.2 is 
illustrated in Figure 20.6.1.


20.6.2 DEFINITION: A (left) transformation group homomorphism from a left transformation group
(G₁, X₁) < (G₁, X₁, σ₁, μ₁) to a left transformation group (G₂, X₂) < (G₂, X₂, σ₂, μ₂) is a pair of maps (φ̂, φ)
with φ̂ : G₁ → G₂ and φ : X₁ → X₂ such that


(i) φ̂(σ₁(g, h)) = σ₂(φ̂(g), φ̂(h)) for all g, h ∈ G₁. That is, φ̂(gh) = φ̂(g)φ̂(h) for g, h ∈ G₁.


(ii) φ(μ₁(g, x)) = μ₂(φ̂(g), φ(x)) for all g ∈ G₁, x ∈ X₁. That is, φ(gx) = φ̂(g)φ(x) for g ∈ G₁, x ∈ X₁.


A (left) transformation group monomorphism from (G₁, X₁) to (G₂, X₂) is a left transformation group
homomorphism (φ̂, φ) such that φ̂ : G₁ → G₂ and φ : X₁ → X₂ are injective.


Figure 20.6.1 Transformation group homomorphism maps and spaces


A (left) transformation group epimorphism from (G₁, X₁) to (G₂, X₂) is a left transformation group
homomorphism (φ̂, φ) such that φ̂ : G₁ → G₂ and φ : X₁ → X₂ are surjective.


A (left) transformation group isomorphism from (G₁, X₁) to (G₂, X₂) is a left transformation group
homomorphism (φ̂, φ) such that φ̂ : G₁ → G₂ and φ : X₁ → X₂ are bijections.

A (left) transformation group endomorphism of a left transformation group (G, X) is a left transformation
group homomorphism (φ̂, φ) from (G, X) to (G, X).

A (left) transformation group automorphism of a left transformation group (G, X) is a left transformation
group isomorphism (φ̂, φ) from (G, X) to (G, X).


20.6.3 EXAMPLE: Neither the group-map nor the point-space-map uniquely determines the other. 

One naturally asks whether the pair of maps in Definition 20.6.2 may be replaced with a single map. 
Unfortunately, neither φ̂ nor φ determines the other. Consider the example of embedding the transformation
group (G₁, X₁) = (SO(2), S¹) inside (G₂, X₂) = (GL(2), ℝ²). The pair of identity maps φ̂ : G₁ → G₂ and
φ : X₁ → X₂ constitutes a transformation group homomorphism. (See Figure 20.6.2.)


Figure 20.6.2 Transformation group homomorphism in Example 20.6.3


In fact, it is a monomorphism. But for a fixed choice of φ̂, the map ψ ∘ φ is a transformation group
homomorphism for any conformal map ψ : ℝ² → ℝ². For a fixed choice of φ, if the target group G₂ is not
specified, then any supergroup of G₂ with the same map φ̂ also yields a transformation group homomorphism. If
the target group G₂ is given, the map φ̂ may or may not be uniquely specified by the map φ. In the current
example, φ̂ is uniquely determined by φ. But if the group G₂ is replaced with the group of all bijections
of ℝ² (the permutation group of ℝ²), then, for example, φ̂ may be replaced by the map which sends group
elements R_θ ∈ SO(2) to bijections of ℝ² which rotate the points of S¹ in the same way but leave all other
points of ℝ² fixed. More generally, any map φ determines only the action of group elements φ̂(g) ∈ G₂ on
points in the image of φ.


It follows from this example that transformation group homomorphisms must specify both maps in general.


20.6.4 EXAMPLE: Freedom of choice of the point-space map for a given group map.


Let X₁ = X₂ = ℝ² with G₁ = G₂ = SO(2). Define φ̂ : G₁ → G₂ by φ̂ = id_{G₁} and φ : x ↦ λ(|x|) R(|x|)x
with λ : ℝ₀⁺ → ℝ and R : ℝ₀⁺ → SO(2). Then φ̂ clearly satisfies Definition 20.6.2 (i) for a transformation
group homomorphism. Condition (ii) stipulates ∀S ∈ SO(2), ∀x ∈ ℝ², φ(Sx) = Sφ(x). This means


∀S ∈ SO(2), ∀x ∈ ℝ², λ(|Sx|) R(|Sx|)Sx = Sλ(|x|) R(|x|)x
                                        = λ(|x|) R(|x|)Sx


because S commutes with both the scaling transformations x ↦ λ(|x|)x and the rotations x ↦ R(|x|)x. But
|Sx| = |x| for all x ∈ ℝ². So this requirement is satisfied for completely arbitrary functions λ : ℝ₀⁺ → ℝ
and R : ℝ₀⁺ → SO(2). Another way to think of this is that the value of φ(x) can be completely freely
chosen for x ∈ ℝ₀⁺ × {0_ℝ}, but then φ(x) is determined for all other x ∈ ℝ² by rotating φ((|x|, 0)) by the
angle arctan(x₁, x₂). (See Definition 44.2.9 for the two-parameter inverse tangent function.)


Thus for x within each orbit of G₁, the value of φ(x) is completely determined by the choice of φ(x₀) for
any single point x₀ in that orbit. Between the orbits, there is complete freedom.
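
The freedom described in Example 20.6.4 can be checked numerically by identifying ℝ² with the complex plane, so that a rotation in SO(2) becomes multiplication by a unit complex number. The following sketch makes assumptions which are not in the text: the radial functions `lam` and `rho` are arbitrary illustrative choices for the scaling factor and the rotation angle, and `phi` and `rotate` are hypothetical names.

```python
import cmath
import math

def lam(r):
    """Arbitrary radial scaling factor (an assumption for illustration)."""
    return 1.0 + r * r

def rho(r):
    """Arbitrary radial rotation angle (an assumption for illustration)."""
    return math.sin(3.0 * r)

def phi(z):
    """The point-space map x -> lam(|x|) R(|x|) x, with R(|x|) a rotation by rho(|x|)."""
    r = abs(z)
    return lam(r) * cmath.exp(1j * rho(r)) * z

def rotate(theta, z):
    """The element of SO(2) with angle theta acting on the plane."""
    return cmath.exp(1j * theta) * z

samples = [complex(0.3, -1.2), complex(2.0, 0.5), complex(0.0, 0.0)]
angles = [0.1, 1.0, 2.5, math.pi]

# Condition (ii) of Definition 20.6.2 with the identity group map: phi(Sx) = S phi(x).
equivariant = all(
    cmath.isclose(phi(rotate(t, z)), rotate(t, phi(z)), rel_tol=1e-9, abs_tol=1e-12)
    for t in angles for z in samples
)

print(equivariant)
```

Replacing `lam` and `rho` by any other functions of the radius should leave the check satisfied up to rounding, since only the dependence on |x| matters.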


20.6.5 REMARK: An equivariant map is a special case of a transformation group homomorphism.
An equivariant map in Definition 20.6.6 is the map φ of a left transformation group homomorphism (φ̂, φ)
in Definition 20.6.2 for which G₁ = G₂ = G and φ̂ : G → G is the identity map.


20.6.6 DEFINITION: An equivariant map between left transformation groups (G, X₁) < (G, X₁, σ, μ₁) and
(G, X₂) < (G, X₂, σ, μ₂) is a map φ : X₁ → X₂ which satisfies


(i) ∀g ∈ G, ∀x ∈ X₁, φ(μ₁(g, x)) = μ₂(g, φ(x)). That is, φ(gx) = gφ(x) for g ∈ G, x ∈ X₁.


20.7. Right transformation groups 


20.7.1 REMARK: Both left and right transformation groups are required in some contexts. 

Section 20.7 is not an exact mirror image of Section 20.1. It is true that left and right transformation groups 
are essentially mirror images of each other, but there are some subtleties which need to be checked. More 
importantly, the definitions, notations and theorems for right transformation groups must be given here so 
that later chapters can refer to them directly rather than referring to left transformation groups and stating 
informally what kinds of adjustments are required. 


An example of the need for both left and right transformation groups is the definition of a principal fibre 
bundle. (For example, see Section 47.8.) The total space of a principal fibre bundle is acted on by fibre chart 
transition maps on the left, and by structure group actions on the right. The fact that the actions are on 
opposite sides makes the notation more convenient. 


20.7.2 DEFINITION: A right transformation group on a set X is a tuple (G, X) < (G, X, σ, μ) where (G, σ)
is a group, and the map μ : X × G → X satisfies


(i) ∀g₁, g₂ ∈ G, ∀x ∈ X, μ(x, σ(g₁, g₂)) = μ(μ(x, g₁), g₂).
(ii) ∀x ∈ X, μ(x, e) = x, where e is the identity of G.


The map μ may be referred to as the (right) action map or the (right) action of G on X.


20.7.3 REMARK: Notation for transformation group actions.
Notation 20.7.4 does not result in ambiguity because x(g₁g₂) = (xg₁)g₂ for all g₁, g₂ ∈ G and x ∈ X by the
associative condition, Definition 20.7.2 (i).


20.7.4 NOTATION: xg, for g ∈ G and x ∈ X, for a right transformation group (G, X), denotes the action
μ(x, g) of g on x.


20.7.5 REMARK: The associativity rules of left and right transformation groups are different. 
The difference between left and right transformation groups lies in the associativity rule, condition (i) in 
Definition 20.7.2, rather than in the order of G and X in the domain G × X or X × G of the action μ.
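
The point of Remark 20.7.5 can be seen in a small computation: swapping the order of the arguments of a right action does not by itself produce a left action. The sketch below assumes a hypothetical example which is not in the text, namely S_3 acting on three-letter words by permuting positions. The names `mu_right`, `swapped` and `left_via_inverse` are illustrative.

```python
from itertools import permutations, product

S3 = list(permutations(range(3)))     # g[i] is the image of i
X = list(product("ab", repeat=3))     # hypothetical passive set: words of length 3

def mul(g, h):
    """Group product: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    """Inverse permutation."""
    r = [0, 0, 0]
    for i, gi in enumerate(g):
        r[gi] = i
    return tuple(r)

def mu_right(x, g):
    """Right action: position i of the result holds letter g[i] of x."""
    return tuple(x[g[i]] for i in range(3))

def swapped(g, x):
    """The same map with the arguments written in the order (g, x)."""
    return mu_right(x, g)

def left_via_inverse(g, x):
    """A genuine left action obtained by composing with the group inverse."""
    return mu_right(x, inv(g))

triples = [(g, h, x) for g in S3 for h in S3 for x in X]

# Definition 20.7.2 (i): mu(x, gh) = mu(mu(x, g), h).
right_rule = all(mu_right(x, mul(g, h)) == mu_right(mu_right(x, g), h)
                 for g, h, x in triples)

# Left associativity rule mu(gh, x) = mu(g, mu(h, x)) tested on both candidates.
swapped_is_left = all(swapped(mul(g, h), x) == swapped(g, swapped(h, x))
                      for g, h, x in triples)
inverse_is_left = all(left_via_inverse(mul(g, h), x)
                      == left_via_inverse(g, left_via_inverse(h, x))
                      for g, h, x in triples)

print(right_rule, swapped_is_left, inverse_is_left)
```

The argument swap fails the left associativity rule because S_3 is not commutative, whereas composing with the inverse converts the right action into a left action.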


20.7.6 NOTATION: R_g^μ denotes the function {(x, μ(x, g)); (x, g) ∈ Dom(μ)}, where μ is the right action
map of a right transformation group (G, X, σ, μ), for each g ∈ G.


R_g denotes the function R_g^μ when the right action map μ is implicit in the context.


20.7.7 REMARK: Some comments on right action maps.

The function R_g^μ in Notation 20.7.6 is a well-defined function with domain and range equal to X for any
right transformation group (G, X, σ, μ), for all g ∈ G. The function R_g = R_g^μ may also be referred to as the
“right action map” of the right transformation group. The right action map R_g = R_g^μ : X → X satisfies
R_g(x) = μ(x, g) = xg for all g ∈ G and x ∈ X.


20.7.8 REMARK: The right action by each group element is a bijection.
The right action map R_g : X → X is a bijection for each g ∈ G because R_g(R_{g⁻¹}(x)) = R_{g⁻¹}(R_g(x)) = x
for all x ∈ X.


20.7.9 DEFINITION: An effective right transformation group is a right transformation group (G, X, σ, μ)
which satisfies


∀g ∈ G \ {e}, ∃x ∈ X, μ(x, g) ≠ x.
In other words, R_g = R_e only if g = e. In other words,
∀g ∈ G, (∀x ∈ X, μ(x, g) = x) ⇒ g = e.
Such a right transformation group (G, X, σ, μ) is said to act effectively on X.


20.7.10 THEOREM: An effective group action uniquely determines the group operation.
The group operation σ in Definition 20.7.2 is uniquely determined by the action map μ if the group action
is effective.


PROOF: The argument is essentially the same as for left transformation groups in Theorem 20.2.7.


20.7.11 DEFINITION: A right transformation group G < (G, X, σ, μ) is said to act freely on the set X if
∀g ∈ G \ {e}, ∀x ∈ X, μ(x, g) ≠ x.


That is, the only group element with a fixed point is the identity e. In other words,


∀g ∈ G, (∃x ∈ X, μ(x, g) = x) ⇒ g = e.


A free right transformation group is a right transformation group (G, X, σ, μ) which acts freely on X.


20.7.12 THEOREM: A freely acting transformation group acts effectively. 
Let (G, X, σ, μ) be a right transformation group. If X ≠ ∅ and G acts freely on X, then (G, X, σ, μ) is an
effective right transformation group.


PROOF: Let (G, X, σ, μ) be a right transformation group with X ≠ ∅. Assume that G acts freely on X.
Let g ∈ G \ {e}. Let x ∈ X. Then μ(x, g) ≠ x by Definition 20.7.11. Therefore ∀g ∈ G \ {e}, ∃x ∈ X, xg ≠ x.
So G acts effectively on X by Definition 20.7.9.


20.7.13 THEOREM: Equivalent injectivity condition for a transformation group to act freely. 

Let (G, X, σ, μ) be a right transformation group. Then G acts freely on X if and only if the map μ_y : G → X
defined by μ_y : g ↦ μ(y, g) is injective for all y ∈ X. In other words, G acts freely on X if and only if
∀y ∈ X, ∀g₁, g₂ ∈ G, (μ(y, g₁) = μ(y, g₂) ⇒ g₁ = g₂).


PROOF: Let (G, X, σ, μ) be a right transformation group which acts freely on X. Let y ∈ X and g₁, g₂ ∈
G satisfy μ(y, g₁) = μ(y, g₂). Then y = μ(y, e) = μ(μ(y, g₁), g₁⁻¹) = μ(μ(y, g₂), g₁⁻¹) = μ(y, g₂g₁⁻¹) by
Definition 20.7.2, where e is the identity of G. Therefore g₂g₁⁻¹ = e by Definition 20.7.11. So g₁ = g₂.

Now suppose that (G, X, σ, μ) satisfies ∀y ∈ X, ∀g₁, g₂ ∈ G, (μ(y, g₁) = μ(y, g₂) ⇒ g₁ = g₂). Let g ∈ G and
y ∈ X satisfy μ(y, g) = y. Then μ(y, g) = μ(y, e). So g = e. Hence G acts freely on X.


20.7.14 DEFINITION: A transitive right transformation group is a right transformation group (G, X, σ, μ)
such that


∀x, y ∈ X, ∃g ∈ G, μ(x, g) = y.
In other words,
∀x ∈ X, {μ(x, g); g ∈ G} = X.


A right transformation group G < (G, X, σ, μ) is said to act transitively on X if it is a transitive right
transformation group.


20.7.15 THEOREM: All points reachable from all points if and only if reachable from a single point. 
A right transformation group (G, X,o, u) acts transitively on a non-empty set X if and only if 


Jx € X, {u(x, g); ge G} — X. 
That is, G acts transitively on a non-empty set X if and only if X is the orbit of some element of X. 


PROOF: Let (G, X, σ, μ) be a transitive right transformation group. Then ∀x ∈ X, {μ(x, g); g ∈ G} = X.
So clearly ∃x ∈ X, {μ(x, g); g ∈ G} = X if X is non-empty.

Let (G, X, σ, μ) be a right transformation group. Let x ∈ X satisfy {μ(x, g); g ∈ G} = X. Let y ∈ X. Then
μ(x, g_y) = y for some g_y ∈ G. So μ(y, g_y^{-1}) = x by Theorem 20.1.9. Therefore μ(x, g) = μ(μ(y, g_y^{-1}), g) =
μ(y, g_y^{-1} g) for all g ∈ G. In other words, X = {μ(x, g); g ∈ G} = {μ(y, g_y^{-1} g); g ∈ G} = {μ(y, g′); g′ ∈ G},
substituting g′ for g_y^{-1} g. In other words, {μ(y, g); g ∈ G} = X.
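
The equivalence in Theorem 20.7.15 can be spot-checked on finite examples. The sketch below is only an
illustration under assumed names: orbit, is_transitive and some_orbit_is_everything are hypothetical
helpers, and the two actions are chosen for convenience.

```python
# Illustrative sketch only. All names here are hypothetical.

def orbit(x, G, mu):
    # The set {mu(x, g); g in G}.
    return frozenset(mu(x, g) for g in G)

def is_transitive(G, X, mu):
    # Every orbit equals the whole point set X.
    return all(orbit(x, G, mu) == frozenset(X) for x in X)

def some_orbit_is_everything(G, X, mu):
    # At least one orbit equals the whole point set X.
    return any(orbit(x, G, mu) == frozenset(X) for x in X)

# Z_5 acting on itself by addition modulo 5.
Z5 = list(range(5))
def mu_add(x, g):
    return (x + g) % 5

# Z_2 acting on {0, 1, 2}; g = 1 swaps 0 and 1 and fixes 2.
Z2 = [0, 1]
X3 = [0, 1, 2]
def mu_swap(x, g):
    return {0: 1, 1: 0, 2: 2}[x] if g == 1 else x

add_pair = (is_transitive(Z5, Z5, mu_add), some_orbit_is_everything(Z5, Z5, mu_add))
swap_pair = (is_transitive(Z2, X3, mu_swap), some_orbit_is_everything(Z2, X3, mu_swap))
print(add_pair, swap_pair)
```

In both examples the two tests agree, as the theorem requires for a non-empty set X.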


20.7.16 THEOREM: Injectivity/surjectivity conditions for transformation groups to act freely/transitively. 
Let (G, X, σ, μ) be a right transformation group. For all y ∈ X, define μ_y : G → X by μ_y : g ↦ μ(y, g).

(i) G acts freely on X if and only if μ_y is injective for all y ∈ X.
(ii) G acts transitively on X if and only if μ_y is surjective for all y ∈ X.
(iii) G acts freely and transitively on X if and only if μ_y is a bijection for all y ∈ X.


PROOF: Part (i) is a restatement of Theorem 20.7.13. Part (ii) follows directly from Definition 20.7.14. 
Part (iii) follows from parts (i) and (ii). 


20.7.17 THEOREM: A group acting on itself acts effectively, freely and transitively. 
Let G −< (G, σ) be a group. Define a right action map μ : G × G → G by μ : (g_1, g_2) ↦ σ(g_1, g_2). Then the
tuple (G, G, σ, μ) = (G, G, σ, σ) is an effective, free, transitive right transformation group.


PROOF: For a right transformation group, the right action map μ : X × G → X must satisfy the associativity
rule μ(x, σ(g_1, g_2)) = μ(μ(x, g_1), g_2) for all g_1, g_2 ∈ G and x ∈ X. If the formula for μ in the theorem is
substituted into this rule with X = G, it follows easily from the associativity of σ.

This right transformation group acts freely by Theorem 17.3.20(iv) and Definition 20.7.11. Therefore it
is effective by Theorem 20.7.12 because G ≠ ∅. To show that (G, G, σ, μ) is transitive, let x, y ∈ G. Let
g = x^{-1}y. Then μ(x, g) = x x^{-1} y = y. So (G, G, σ, μ) is a transitive right transformation group by
Definition 20.7.14.
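
The claims of Theorem 20.7.17 may be checked by brute force for a small non-commutative group. The sketch
below uses the six permutations of three symbols as a stand-in for G. The names mul and inv, and the choice
of the product convention "apply g1 first, then g2", are assumptions made only for this illustration.

```python
# Illustrative sketch only. All names here are hypothetical.
from itertools import permutations

G = list(permutations(range(3)))   # the symmetric group on 3 symbols
e = (0, 1, 2)                      # identity permutation

def mul(g1, g2):
    # Group operation sigma: apply g1 first, then g2.
    return tuple(g2[g1[i]] for i in range(3))

def inv(g):
    # Inverse permutation.
    r = [0] * len(g)
    for i, gi in enumerate(g):
        r[gi] = i
    return tuple(r)

mu = mul   # right translation: mu(x, g) = sigma(x, g)

# Associativity rule for a right action: mu(x, g1 g2) = mu(mu(x, g1), g2).
action_rule = all(mu(x, mul(g1, g2)) == mu(mu(x, g1), g2)
                  for x in G for g1 in G for g2 in G)
# Free: no non-identity element fixes any point.
free = all(mu(x, g) != x for x in G for g in G if g != e)
# Transitive: g = x^{-1} y carries x to y.
transitive = all(mu(x, mul(inv(x), y)) == y for x in G for y in G)
print(action_rule, free, transitive)
```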


20.7.18 DEFINITION: The right transformation group of G acting on G by right translation, or the right
translation group of G, is the right transformation group (G, G, σ, σ).


20.7.19 REMARK: Insufficiency of specification tuples to distinguish left from right transformation groups. 
Oddly enough, the tuple (G, G, σ, σ) is identical for the left and right transformation groups where G acts
on G. This is yet another example which demonstrates that definitions of object classes are more than just 
the sets in the specification tuples. The same tuple is used to parametrise two different classes of objects: 
both the left and right transformation group classes. 


20.7.20 DEFINITION: The orbit of a right transformation group (G, X) passing through the point x ∈ X
is the set xG = {xg; g ∈ G}.


20.7.21 REMARK: The relation of transitivity to orbits.

In terms of the definition of an orbit, Definition 20.7.14 defines a right transformation group (G, X) to be 
transitive if and only if all of its orbits are equal to the whole point set X, and Theorem 20.7.15 means that 
if one of the orbits equals X, then all of the orbits equal X. 


20.7.22 THEOREM: All orbits cover the passive set if and only if one orbit covers the passive set. 
A right transformation group (G, X, σ, μ) acts transitively on a non-empty set X if and only if X is the orbit
of some element of X.


PROOF: The assertion follows directly from Theorem 20.7.15.


20.7.23 DEFINITION: The orbit space of a right transformation group (G, X) is the set {xG; x ∈ X} of
orbits of (G, X) passing through points x ∈ X.


20.7.24 NOTATION: X/G denotes the orbit space of a right transformation group (G, X). 
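
For a concrete instance of Definitions 20.7.20 and 20.7.23, the following sketch computes the orbit space X/G
for an assumed example in which the cyclic group of order 3 acts on six points by x ↦ x + 2g modulo 6. The
example and the names orbit and orbit_space are hypothetical and serve only as an illustration.

```python
# Illustrative sketch only. All names here are hypothetical.

G = [0, 1, 2]            # Z_3 under addition modulo 3
X = list(range(6))       # six points

def mu(x, g):
    # Right action of Z_3 on X; well defined because 2 * 3 = 6.
    return (x + 2 * g) % 6

def orbit(x):
    # The orbit xG = {xg; g in G}.
    return frozenset(mu(x, g) for g in G)

# The orbit space X/G is the set of all orbits.
orbit_space = {orbit(x) for x in X}

# The orbits are disjoint and cover X.
covers_X = set().union(*orbit_space) == set(X)
disjoint = sum(len(o) for o in orbit_space) == len(X)
print(sorted(sorted(o) for o in orbit_space), covers_X, disjoint)
```

This action is free but not transitive, so the orbit space has more than one element.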


20.8. Right transformation group homomorphisms 


20.8.1 DEFINITION: A right transformation group homomorphism from a right transformation group
(G_1, X_1) −< (G_1, X_1, σ_1, μ_1) to a right transformation group (G_2, X_2) −< (G_2, X_2, σ_2, μ_2) is a pair of maps
(φ̂, φ) with φ̂ : G_1 → G_2 and φ : X_1 → X_2 such that

(i) φ̂(σ_1(g, h)) = σ_2(φ̂(g), φ̂(h)) for all g, h ∈ G_1. That is, φ̂(gh) = φ̂(g)φ̂(h) for g, h ∈ G_1.
(ii) φ(μ_1(x, g)) = μ_2(φ(x), φ̂(g)) for all g ∈ G_1, x ∈ X_1. That is, φ(xg) = φ(x)φ̂(g) for g ∈ G_1, x ∈ X_1.

A right transformation group monomorphism from (G_1, X_1) to (G_2, X_2) is a right transformation group
homomorphism (φ̂, φ) such that φ̂ : G_1 → G_2 and φ : X_1 → X_2 are injective.

A right transformation group epimorphism from (G_1, X_1) to (G_2, X_2) is a right transformation group
homomorphism (φ̂, φ) such that φ̂ : G_1 → G_2 and φ : X_1 → X_2 are surjective.

A right transformation group isomorphism from (G_1, X_1) to (G_2, X_2) is a right transformation group
homomorphism (φ̂, φ) such that φ̂ : G_1 → G_2 and φ : X_1 → X_2 are bijections.

A right transformation group endomorphism of a right transformation group (G, X) is a right transformation
group homomorphism (φ̂, φ) from (G, X) to (G, X).

A right transformation group automorphism of a right transformation group (G, X) is a right transformation
group isomorphism (φ̂, φ) from (G, X) to (G, X).


20.8.2 REMARK: Relevance of groups of automorphisms of right transformation groups to affine spaces. 
The set of all automorphisms of a right transformation group constitutes a group with respect to the
composition operation. Groups of automorphisms of right transformation groups are relevant to affine spaces. In
the case of an affine space X over a group G, for example, a transformation of the point set X corresponds 
to a transformation of the group G. (See Section 26.2 for affine spaces over groups.) The group action is 
transformed in accordance with Definition 20.8.1. 


20.8.3 REMARK: Relevance of equivariant maps between right transformation groups to fibre bundles. 
Definition 20.8.4 is relevant to the definition of principal fibre bundles. (See Remark 47.8.20.) The fibre 
charts of a principal fibre bundle are equivariant maps with respect to the right action of a structure group 
on the total space of a principal fibre bundle. 


20.8.4 DEFINITION: An equivariant map between right transformation groups (G, X_1) −< (G, X_1, σ, μ_1) and
(G, X_2) −< (G, X_2, σ, μ_2) is a map φ : X_1 → X_2 which satisfies

(i) ∀g ∈ G, ∀x ∈ X_1, φ(μ_1(x, g)) = μ_2(φ(x), g). That is, φ(xg) = φ(x)g for g ∈ G, x ∈ X_1.
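
The equivariance condition in Definition 20.8.4 can be tested exhaustively when the sets are finite. In the
sketch below, the group, the two actions and the candidate maps phi and psi are all assumptions chosen for
illustration: the cyclic group of order 6 acts on six points and on three points by addition.

```python
# Illustrative sketch only. All names here are hypothetical.

G = list(range(6))        # Z_6 under addition modulo 6
X1 = list(range(6))
X2 = list(range(3))

def mu1(x, g):
    return (x + g) % 6    # right action of G on X1

def mu2(x, g):
    return (x + g) % 3    # right action of G on X2

def is_equivariant(f):
    # f(mu1(x, g)) = mu2(f(x), g) for all g in G and x in X1.
    return all(f(mu1(x, g)) == mu2(f(x), g) for g in G for x in X1)

def phi(x):
    return x % 3          # reduction modulo 3

def psi(x):
    return (2 * x) % 3    # doubles before reducing

phi_ok = is_equivariant(phi)
psi_ok = is_equivariant(psi)
print(phi_ok, psi_ok)
```

The map psi fails the test because it rescales the points without rescaling the group action.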


20.9. Figures and invariants of transformation groups


20.9.1 REMARK: Associated transformation groups. 

A wide variety of transformation groups (G, F, σ, μ_F) may be constructed from a single given transformation
group (G, X, σ, μ_X). These “associated transformation groups” have the same group G as the given
point-transformation group, but they have different passive sets F which are constructed by various methods from
the given passive point-set X.


The objects in each constructed passive set F may be referred to as “figures”. These figures may be subsets
of X, curves or trajectories in X, functions on the domain X, families of curves in X, equivalence classes of
trajectories in X, generalised functions (distributions) on X, and so forth. Various kinds of figures are listed
in Remark 20.9.2. Each transformation rule μ_F : G × F → F is constructed from the given transformation
rule μ_X : G × X → X by a specified method.


In differential geometry, the transformation rules for sets of objects such as tangent vectors, differential 
forms, tensors and distributions are determined by the transformation of the points of the manifold which 
those objects “inhabit”. Therefore for any group of point-transformations, there is an associated group of 
object-transformations for each of the different kinds of objects which “live” in the manifold. 


20.9.2 REMARK: Some methods for constructing figures associated with a given transformation group. 
Some methods for constructing transformation groups (G, F, σ, μ_F) associated with a given transformation
group (G, X, σ, μ_X) are outlined in the following list. For each set of figures F, the map μ_F : G × F → F
denotes the action of G on F. The constructed left transformation groups are “associated” with the given left
transformation group in the sense that μ_F(σ(g_1, g_2), f) = μ_F(g_1, μ_F(g_2, f)) for all f ∈ F, for all g_1, g_2 ∈ G.
In each construction method, g may be replaced by g^{-1} to give an alternative associated construction. In
each case, the set F is indicated as a subset of some set. This subset must be chosen so that the operation
μ_F is closed.
(1) The figures are individual points of the base set X. This is the trivial case.

F ⊆ X

∀g ∈ G, ∀f ∈ F, μ_F(g, f) = μ_X(g, f).
(2) The figures are subsets of X.

F ⊆ ℙ(X)

∀g ∈ G, ∀S ∈ F, μ_F(g, S) = {μ_X(g, f); f ∈ S}.
(3) The figures are maps γ : I → X for some set I.

F ⊆ X^I

∀g ∈ G, ∀γ ∈ F, ∀t ∈ I, μ_F(g, γ)(t) = μ_X(g, γ(t)).
(4) Each figure is a map from X to some set Y.

F ⊆ Y^X

∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, μ_F(g, φ)(x) = φ(μ_X(g, x)).
(5) Each figure f : E → K is a map from some set of “test functions” E ⊆ Y^X to a set K.

F ⊆ K^E for some E ⊆ Y^X

∀g ∈ G, ∀f ∈ F, ∀φ ∈ E, μ_F(g, f)(φ) = f(x ↦ φ(μ_X(g, x))).
(6) The figures are maps from an index set I with values which are subsets of X.

F ⊆ ℙ(X)^I

∀g ∈ G, ∀f ∈ F, ∀t ∈ I, μ_F(g, f)(t) = {μ_X(g, x); x ∈ f(t)}.
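
The constructions for subsets and for maps into X can be tried out on a small left transformation group. The
sketch below assumes that G is the group of all permutations of X = {0, 1, 2}, with σ(g_1, g_2) meaning
"apply g_2 first, then g_1". The names mu_sets, mu_curves, is_left_action and is_closed are hypothetical. The
check covers only the cases of subsets and of maps γ : I → X with I = {0, 1}.

```python
# Illustrative sketch only. All names here are hypothetical.
from itertools import combinations, permutations

X = (0, 1, 2)
G = list(permutations(X))

def sigma(g1, g2):
    # Group operation: g2 is applied first, then g1.
    return tuple(g1[g2[i]] for i in X)

def mu_X(g, x):
    # Left action of G on the point set X.
    return g[x]

def mu_sets(g, S):
    # Figures are subsets of X: move every element of S.
    return frozenset(mu_X(g, f) for f in S)

def mu_curves(g, gamma):
    # Figures are maps from I = {0, 1} to X, stored as tuples of values.
    return tuple(mu_X(g, p) for p in gamma)

F_sets = {frozenset(c) for c in combinations(X, 2)}    # two-point subsets
F_curves = {(x0, x1) for x0 in X for x1 in X}          # all maps I -> X

def is_left_action(mu_F, F):
    # mu_F(sigma(g1, g2), f) = mu_F(g1, mu_F(g2, f)) for all g1, g2, f.
    return all(mu_F(sigma(g1, g2), f) == mu_F(g1, mu_F(g2, f))
               for g1 in G for g2 in G for f in F)

def is_closed(mu_F, F):
    # The action never leaves the chosen figure set F.
    return all(mu_F(g, f) in F for g in G for f in F)

sets_ok = is_left_action(mu_sets, F_sets) and is_closed(mu_sets, F_sets)
curves_ok = is_left_action(mu_curves, F_curves) and is_closed(mu_curves, F_curves)
print(sets_ok, curves_ok)
```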


20.9.3 REMARK: Example of figures which are subsets of a given point space. 

An example of case (2) in Remark 20.9.2 would be the set of images of injective continuous maps f : C_k → X,
where C_k = {x ∈ ℝ^k; Σ_{i=1}^k x_i ≤ 1 and ∀i ∈ ℕ_k, x_i ≥ 0} is a closed k-dimensional “cell”. Such images might
be referred to as “disoriented cells”. Closed cells could be replaced with open cells or boundaries of cells. 
By fixing k, the set of figures could be restricted to just line segments or triangles etc. 


20.9.4 REMARK: Figures which are sets of subsets of the given point space.
One obvious generalisation of case (2) is to sets of subsets.

F ⊆ ℙ(ℙ(X))

∀g ∈ G, ∀C ∈ F, μ_F(g, C) = {{μ_X(g, f); f ∈ S}; S ∈ C}.

This would be applicable, for example, to sets of neighbourhoods of points in a topological space X. (For
indexed sets of neighbourhoods, this would correspond to case (6), as mentioned in Remark 20.9.9.) Arbitrary
hierarchies of sets of sets of sets, and so forth, may be defined in this way by considering F = ℙ(ℙ(X)),
F = ℙ(ℙ(ℙ(X))), and so forth.

Case (2) may also be generalised from unordered subsets of X to ordered subsets of X. For example, ordered 
pairs, ordered triples, etc. Ordered sets could be thought of as equivalent to case (3), where the (possibly 
variable) index set I has an order which is imported onto the point set. 


20.9.5 REMARK: Examples of maps with the given point set as the range. 
Examples of case (3) in Remark 20.9.2 are sequences of elements of X such as a basis of a linear space (where
I is a finite or countable set), curves in X (where I is an interval of the real line), and families of curves
in X (where I is a subset of ℝ² such as [0, 1] × [0, 1]). The natural operation L_g : F → F is defined by
L_g(γ)(t) = μ(g, γ(t)) for γ ∈ F and t ∈ I. A special case would be the set of functions f : C_k → X, where
C_k is defined as in Remark 20.9.3. Such functions could be called “oriented cells”.


20.9.6 REMARK: Covariant and contravariant transformation of functions on the point set.

In case (4) in Remark 20.9.2, there are two natural operations L_g : F → F. The action could be defined
either so that L_g(f)(x) = f(μ(g, x)) or L_g(f)(x) = f(μ(g^{-1}, x)). The second choice has the effect that the
function is moved in the direction of the action of g on X because the points x are moved in the reverse 
direction. This has the advantage that if both f and x are transformed by g € G, then the value f(x) is 
unchanged. 
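
The two choices in Remark 20.9.6 can be compared numerically. The sketch below assumes a left action of the
permutations of X = {0, 1, 2} on X, with functions f : X → Y stored as tuples of values. The names L_first and
L_second are hypothetical labels for the two choices. Under these assumptions, the second choice preserves
the value f(x) when both f and x are transformed, and it composes in the same order as the group operation,
whereas the first choice composes in the reverse order in this example.

```python
# Illustrative sketch only. All names here are hypothetical.
from itertools import permutations

X = (0, 1, 2)
G = list(permutations(X))

def sigma(g1, g2):
    # Group operation: g2 is applied first, then g1.
    return tuple(g1[g2[i]] for i in X)

def inv(g):
    r = [0] * len(g)
    for i, gi in enumerate(g):
        r[gi] = i
    return tuple(r)

def mu(g, x):
    return g[x]

def L_first(g, f):
    # First choice: L_g(f)(x) = f(mu(g, x)).
    return tuple(f[mu(g, x)] for x in X)

def L_second(g, f):
    # Second choice: L_g(f)(x) = f(mu(g^{-1}, x)).
    return tuple(f[mu(inv(g), x)] for x in X)

f = ("a", "b", "c")

# Transforming both f and x leaves the value unchanged (second choice).
value_preserved = all(L_second(g, f)[mu(g, x)] == f[x] for g in G for x in X)
# The second choice composes in the same order as sigma.
second_same_order = all(L_second(sigma(g1, g2), f) == L_second(g1, L_second(g2, f))
                        for g1 in G for g2 in G)
# The first choice composes in the reverse order.
first_reverse_order = all(L_first(sigma(g1, g2), f) == L_first(g2, L_first(g1, f))
                          for g1 in G for g2 in G)
print(value_preserved, second_same_order, first_reverse_order)
```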


20.9.7 REMARK: Functions to and from the point set are not natural figures. 
It might seem reasonable to combine cases (3) and (4) in Remark 20.9.2 by considering maps φ : X → X.
These could be endomorphisms, automorphisms, or arbitrary partial function maps (defined only on subsets
of X). The action of G on such figures could affect only the domain of each map as in case (4) as follows.

F ⊆ X^X

∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, μ_F(g, φ)(x) = φ(μ_X(g, x)).

Alternatively only the range of each map could be affected as in case (3) as follows.

∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, μ_F(g, φ)(x) = μ_X(g, φ(x)).

Or the action could be applied to both the domain and range as follows.

∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, μ_F(g, φ)(x) = μ_X(g, φ(μ_X(g, x))).


A more important question is whether point transformation maps can be validly considered as “figures” in 
the point space. A figure should be an object with a location in the point space, and the object should be 
movable within the space, more or less in the same way as individual points may be moved. 


A point transformation could be regarded as a figure if the pairs (x, φ(x)) were regarded as ordered pairs
of points in X. In this case, the map would be thought of as a set of ordered pairs, which is more or less
a set of sets as in Remark 20.9.4. This is still not very convincing. A transformation group acting on the
“figures” of a point set must act on more than merely some set which is connected in some way with the
set X. The figures must be structures which are based in some way on the points of X, and they must have a
natural motion when those points are acted on by the group G.


20.9.8 REMARK: Distributions are functions whose range is a set of functions on the point set. 

An example of case (5) in Remark 20.9.2 is the set F of continuous linear functionals f : C_0^∞(ℝ^n) → ℝ
with X = ℝ^n, Y = K = ℝ, and E = C_0^∞(ℝ^n). In this case, the figures are functions of functions. A double
reversal of the sense of the L_g action in case (4) yields the forward action defined by L_g(f)(φ) = f(L_{g^{-1}}(φ))
for f ∈ F, φ : X → Y, where L_g(φ)(x) = φ(μ(g^{-1}, x)) for φ : X → Y and x ∈ X.


20.9.9 REMARK: Sequences of families of sets can be figures. 
An example of case (6) in Remark 20.9.2 would be an infinite sequence of neighbourhoods of a point in a
topological space X. The natural operation L_g : F → F here is L_g(f)(t) = {μ(g, x); x ∈ f(t)} for f ∈ F
and t ∈ I.


A kind of inverse of case (6) is a set-measure. Measures map sets to numbers. This gives something similar
to the following formulation for some set of numbers K.

F ⊆ K^{ℙ(X)}

∀g ∈ G, ∀m ∈ F, ∀S ∈ ℙ(X), μ_F(g, m)(S) = m({μ_X(g, x); x ∈ S}).

One would normally apply g^{-1} rather than g because this makes the measure follow the transformed points.


20.9.10 REMARK: Examples of figures on a differentiable manifold as the point set. 

If G in Remark 20.9.2 is the group of C^1 diffeomorphisms of some C^1 manifold X, an example of case (3)
would be C^1 curves in X, an example of case (4) would be C^1 real-valued functions on X, and an example
of case (5) would be C^1 real-valued distributions on X.


20.9.11 REMARK: It is difficult to give a definitive characterisation of all classes of figures. 

All of the genuine “figures” in Remark 20.9.2 seem to fit into the general pattern of a subset F of a cross
product X × Y, where the elements of X are transformed by group elements while the elements of Y are
unchanged. The set F_x = {y; (x, y) ∈ F} describes the structure which is “attached” at each point x of X.


In case (2), the characteristic function of the set S determines whether the set is or is not present at a given
point. This is a boolean function of the points. So Y is the set {0, 1} or something similar. In the more
general case of a set of sets, the structure F_x is then the set of sets which contain each point x. These
formulations may seem to be unnatural rearrangements of the definitions, but they do correctly determine
how to transform each set S in the figure space F.


In case (3), Y = I and F_x = {t ∈ I; γ(t) = x}. This is effectively the inverse of the relation γ. Case (4) is
the same except that the relation does not need to be inverted. 


Case (5) is a fly in the ointment here because F ⊆ ℙ(X × Y) × K. In this case, it is seemingly impossible
to localise the figures to points in any way. The value of the figure depends on all of the points of the
point set X. Nevertheless, it is possible to make elements of G act on the set X while leaving the other
sets unchanged, and figures of this kind are genuine figures. For example, if E = C_0^∞(ℝ^n) and K = ℝ,
then the space of continuous linear functionals from E to K (with a suitable topology on E) is a set of 
distributions on X. Examples include Dirac delta functions. Distributions have well-defined transformation 
rules in accordance with transformations of the underlying point set. Distributions are similar to measures, 
which can be transformed around the point set along with the points, as mentioned in Remark 20.9.9. 


All in all, it seems that figures in point spaces are too diverse to unify in a single framework without including 
the complete generality of transformation groups over a given group G. 


It does seem reasonably clear whether the group element g € G or its inverse should be applied to points in 
the figure constructions to make the figure move in the correct direction relative to points, but this is also 
difficult to formulate in a precise way. Thus intuition seems to determine what a figure is, and also how 
precisely it should be transformed. Unless a reliable, systematic framework can be found, it seems reasonable 
to simply codify the basic operations and relations. 


20.9.12 REMARK: Arbitrary combinations of the classes of figures. 

Definition 20.9.13 defines figure spaces for transformation groups recursively in terms of a set of methods 
called “generation rules”. The application of any sequence of generation rules defines a figure space for the 
initial transformation group. 


Definition 20.9.13 permits the generation of all of the figure spaces in Remarks 20.9.2, 20.9.4 and 20.9.9, but
not the dubious double transformation rule μ_F(g, φ)(x) = μ_X(g, φ(μ_X(g, x))) in Remark 20.9.7. (The trivial
case (1) in Remark 20.9.2 is not generated by Definition 20.9.13 because it is too trivial.)


Definition 20.9.13 does not add structures such as an order, a topology or an atlas to the generated figure
spaces. Nor does this definition generate the contravariant transformations which may be obtained by
replacing g with g^{-1} in the generation rules for the action maps μ_{F_2}.


20.9.13 DEFINITION: A figure space for a transformation group (G, X, σ, μ_X) is a transformation group
(G, F, σ, μ_F) which is generated by some sequence of the following generation rules.
(1) For figure space (G, F_1, σ, μ_{F_1}) and predicate P_1 on ℙ(F_1), define figure space (G, F_2, σ, μ_{F_2}) by
F_2 = {S ∈ ℙ(F_1); P_1(S)} and
∀g ∈ G, ∀S ∈ F_2, μ_{F_2}(g, S) = {μ_{F_1}(g, f_1); f_1 ∈ S},
where P_1 satisfies μ_{F_2}(G × F_2) ⊆ F_2.
(2) For figure space (G, F_1, σ, μ_{F_1}), set I, and predicate P_1 on F_1^I, define figure space (G, F_2, σ, μ_{F_2}) by
F_2 = {γ : I → F_1; P_1(γ)} and
∀g ∈ G, ∀γ ∈ F_2, μ_{F_2}(g, γ) = {(t, μ_{F_1}(g, γ(t))); t ∈ I},
where P_1 satisfies μ_{F_2}(G × F_2) ⊆ F_2.
(3) For figure space (G, F_1, σ, μ_{F_1}), set Y, and predicate P_1 on Y^{F_1}, define figure space (G, F_2, σ, μ_{F_2}) by
F_2 = {φ : F_1 → Y; P_1(φ)} and
∀g ∈ G, ∀φ ∈ F_2, μ_{F_2}(g, φ) = {(μ_{F_1}(g, f_1), φ(f_1)); f_1 ∈ F_1},
where P_1 satisfies μ_{F_2}(G × F_2) ⊆ F_2.
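
The recursive character of these generation rules can be sketched with higher-order functions. In the
following illustration, each hypothetical function rule_1, rule_2, rule_3 takes an action map and returns the
action map on the generated figure space, with maps stored as sets of ordered pairs. The predicates P_1 and
the closure condition are not modelled here, and the base transformation group (the permutations of three
points acting on the left) is an assumption made only for the example.

```python
# Illustrative sketch only. All names here are hypothetical.
from itertools import permutations

X = (0, 1, 2)
G = list(permutations(X))

def sigma(g1, g2):
    # Group operation: g2 is applied first, then g1.
    return tuple(g1[g2[i]] for i in X)

def mu_X(g, x):
    return g[x]

def rule_1(mu_1):
    # Figures are sets of figures.
    return lambda g, S: frozenset(mu_1(g, f) for f in S)

def rule_2(mu_1):
    # Figures are maps gamma : I -> F_1, stored as sets of pairs (t, gamma(t)).
    return lambda g, gamma: frozenset((t, mu_1(g, v)) for t, v in gamma)

def rule_3(mu_1):
    # Figures are maps phi : F_1 -> Y, stored as sets of pairs (f_1, phi(f_1)).
    return lambda g, phi: frozenset((mu_1(g, f), y) for f, y in phi)

def is_left_action(mu_F, f):
    return all(mu_F(sigma(g1, g2), f) == mu_F(g1, mu_F(g2, f))
               for g1 in G for g2 in G)

# A sequence of two rules: sets of maps from I = {0, 1} to X.
mu_sets_of_curves = rule_1(rule_2(mu_X))
curve_a = frozenset({(0, 0), (1, 1)})
curve_b = frozenset({(0, 2), (1, 2)})
figure = frozenset({curve_a, curve_b})

# A single application of rule (3): colourings of the points of X.
mu_colourings = rule_3(mu_X)
phi = frozenset({(0, "red"), (1, "blue"), (2, "red")})

g = (1, 2, 0)   # sends 0 to 1, 1 to 2, 2 to 0
moved_phi = mu_colourings(g, phi)
checks = (is_left_action(mu_sets_of_curves, figure), is_left_action(mu_colourings, phi))
print(sorted(moved_phi), checks)
```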


20.9.14 REMARK: A minor technicality in the definition of figure spaces. 
In Definition 20.9.13, part (1), the closure pre-condition for P_1 is

∀g ∈ G, ∀S ∈ ℙ(F_1), P_1(S) ⇒ P_1({μ_{F_1}(g, f_1); f_1 ∈ S}).

This should, strictly speaking, be placed on P_1 before F_2 and μ_{F_2} are defined. It is tedious to write out such
pre-conditions in full. They are therefore expressed more simply as the post-condition μ_{F_2}(G × F_2) ⊆ F_2 in
terms of F_2 and μ_{F_2}. (Better late than never!)


20.9.15 EXAMPLE: Examples of figures using classical groups. 
As an example of Definition 20.9.13, let G be the classical group SO(n) for some n ∈ ℤ^+, with X = ℝ^n. (See
Section 25.14 for classical groups.) Denote this transformation group by (G, X, σ, μ_X). Using method (2)
with I = [0, 1] ⊆ ℝ and defining P_1(γ) to be true if the map γ : I → ℝ^n is linear in the parameter (that is,
γ(t) = (1 − t)γ(0) + tγ(1) for all t ∈ I), one obtains the figure space (G, F, σ, μ_F) of linearly parametrised
line segments in ℝ^n.


For each line segment γ ∈ F, the point γ(1/2) is an element of X = ℝ^n. This defines a map α from F to X.
This map is covariant in the sense that α(μ_F(g, γ)) = μ_X(g, α(γ)) for all g ∈ G. Of course, this is valid also
for G = GL(n).
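
A numerical illustration of this example for n = 2 is sketched below. It assumes that a rotation is
represented by its angle and that a line segment is represented by a Python function of the parameter t. The
names rotate, segment, mu_F, alpha and length are hypothetical. The sketch checks that the midpoint map is
covariant, that a rotated segment is again a linearly parametrised segment, and, as an extra observation,
that the length of a segment is unchanged by rotations.

```python
# Illustrative sketch only. All names here are hypothetical.
import math

def rotate(theta, p):
    # Action of the rotation by angle theta on a point p of the plane.
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def segment(a, b):
    # Linearly parametrised segment from a to b on I = [0, 1].
    return lambda t: ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])

def mu_F(theta, gamma):
    # Induced action on segments: rotate each point of the curve.
    return lambda t: rotate(theta, gamma(t))

def alpha(gamma):
    # Midpoint map from segments to points.
    return gamma(0.5)

def length(gamma):
    return math.dist(gamma(0.0), gamma(1.0))

def close(p, q, tol=1e-9):
    return all(abs(pi - qi) <= tol for pi, qi in zip(p, q))

a, b, theta = (1.0, 0.0), (3.0, 2.0), 0.7
gamma = segment(a, b)
moved = mu_F(theta, gamma)

covariant = close(alpha(moved), rotate(theta, alpha(gamma)))
still_a_segment = all(close(moved(t), segment(rotate(theta, a), rotate(theta, b))(t))
                      for t in (0.0, 0.25, 0.5, 0.75, 1.0))
length_invariant = abs(length(moved) - length(gamma)) <= 1e-9
print(covariant, still_a_segment, length_invariant)
```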


20.9.16 REMARK: Associated transformation groups are generated from a common transformation group. 
If two transformation groups can be obtained as figure spaces via Definition 20.9.13 from a common transfor- 
mation group, then those groups may be referred to as associated transformation groups. This is formalised 
as Definition 20.9.17. 


20.9.17 DEFINITION: Two associated transformation groups are transformation groups which are figure 
spaces of a common transformation group. 


20.9.18 REMARK: Group invariants and figures. 
The definition of group invariants is based on the notion of “figures”. A group action on a space X may be 
extended to a wide variety of spaces of figures derived from the basic space X. 


20.9.19 REMARK: Invariant, covariant and contravariant attributes of a transformation group. 

An “invariant attribute” of an associated transformation group (G, F, σ, μ_F) of a given transformation group
(G, X, σ, μ_X) is a function h : F → A for a set A, such that h(μ_F(g, f)) = h(f) for all g ∈ G and f ∈ F.

A “covariant attribute” of an associated transformation group (G, F, σ, μ_F) of a given transformation group
(G, X, σ, μ_X) is a map h : F → A for an associated transformation group (G, A, σ, μ_A) which satisfies
h(μ_F(g, f)) = μ_A(g, h(f)) for all g ∈ G and f ∈ F.

A “contravariant attribute” of an associated transformation group (G, F, σ, μ_F) of a given transformation
group (G, X, σ, μ_X) is a map h : F → A for an associated transformation group (G, A, σ, μ_A) which satisfies
h(μ_F(g, f)) = μ_A(g^{-1}, h(f)) for all g ∈ G and f ∈ F.

Clearly a contravariant attribute is the same as a covariant attribute with the inverse associated
transformation group (G, A, σ, μ̄_A) replacing (G, A, σ, μ_A), where μ̄_A is defined by μ̄_A : (g, f) ↦ μ_A(g^{-1}, f). It is
also clear that an invariant attribute is the same as a covariant attribute (G, A, σ, μ_A) for which the group
action is the identity map on the set A.


20.9.20 REMARK: The transformation rules for particular figure spaces are not uniquely defined.

There is clearly a very broad range of classes of objects which could be regarded as “figures” for a set X. 
In each case, there is no well-defined rule for generating the map Lg from the figure space F, although it 
is “obvious” which map to choose in each case, more or less. The point made here is that the concepts of 
figures and invariants seem to be some sort of meta-concept which cannot be codified easily. 


Suppose now that a figure space F_1 has been chosen for a transformation group (G, X, σ, μ), and a
transformation operator L_g : F_1 → F_1 for each g ∈ G. Then any function θ : F_1 → F_2 for any figure space F_2 for X
(with action operator also denoted as L_g) is said to be an invariant of the group G if θ(L_g f) = L_g θ(f) for
all f ∈ F_1 and g ∈ G. (This is equivalent to requiring θ to be an equivariant map from (G, F_1) to (G, F_2).
See Definition 20.6.6 for equivariance.)

Typical examples of invariant functions θ would be the length of a curve, the centroid of a finite set of vectors,
the maximum of a real-valued function, the angle between two vectors, and the mass of a Radon measure.
The θ functions may be boolean-valued attribute functions such as the collinearity of a set of points. The
attribute value set F_2 may not be modified at all by the L_g operations. But in the case of linear combinations,
for instance, the linear combination of a sequence of vectors is not invariant under transformations of the
base space X, but the linear combination is transformed in the same way as the sequence of vectors. Thus
linear-combination operations are invariants of the group of linear transformations.


20.9.21 REMARK: Motivation for systematising figures and invariants. 

Section 20.9 is partly motivated by the observation that convex combinations of vectors are preserved under 
affine transformations. This is not strictly invariance, because the midpoint of two points, for instance, is 
transformed to a different point by an affine transformation, but the midpoint of two transformed points is 
the same as the transformed midpoint of the original points. This kind of preservation of relations could 
be described as a kind of “covariance”. In other words, the transformation group preserves relations rather
than points or functions of points. However, it is arguable that the same word "invariant" could be applied 
both to fixed points, fixed function values, and fixed relations under transformation groups. 


When parallelism is defined on fibre bundles, it is often required that some structure is preserved. Thus, 
for instance, if two objects attached to one point of a set satisfy some relation, then the same two objects 
transported in a parallel fashion to another point may be required to maintain that relation. A property 
or relation which is maintained under a transformation is called an "invariant" of the group of transforma- 
tions. So when defining parallelism for manifolds, it is convenient to define a “structure group" so that the 
properties and relations which are supposed to be maintained are invariants of that group. This motivates 
the presentation here of transformation group invariants. 


20.9.22 REMARK: Invariance of linear combinations under parallel transport of tangent spaces. 

An important example of a group invariant is the linear combination operation on linear spaces such as the 
tangent vector spaces at points of a differentiable manifold. Consider the linear space ℝ^n. For any sequence
λ = (λ_i)_{i=1}^m ∈ ℝ^m, define the linear combination operator S_λ : (ℝ^n)^m → ℝ^n by S_λ : (v_i)_{i=1}^m ↦ Σ_{i=1}^m λ_i v_i.
This maps m-tuples of vectors in ℝ^n to a linear combination of those vectors. Then S_λ(L_g v) = g S_λ(v) for any
g ∈ GL(n) and v = (v_i)_{i=1}^m ∈ (ℝ^n)^m, where L_g : (ℝ^n)^m → (ℝ^n)^m is defined so that L_g : (v_i)_{i=1}^m ↦ (g v_i)_{i=1}^m.
So every linear combination operator S_λ is an invariant (or covariant?) function of the set of m-frames in ℝ^n.
When defining a connection on a differentiable fibre bundle in Chapters 67, 68, 69 and 71, the connection will 
be required to preserve these invariants. This will play a role in the transition from connections on ordinary 
fibre bundles to connections on principal fibre bundles. More specifically, the invariance of a connection on 
a principal fibre bundle under group action on the right will arise from the invariance of linear combinations 
of n-frames under linear transformations. 
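
The identity S_λ(L_g v) = g S_λ(v) in this remark is easy to verify with exact integer arithmetic. The sketch
below takes n = 2 and m = 3. The particular matrix g, the coefficients and the vectors are arbitrary choices
made for illustration, and the names matvec, S and L are hypothetical.

```python
# Illustrative sketch only. All names here are hypothetical.

def matvec(g, v):
    # Apply the matrix g (a tuple of rows) to the vector v.
    return tuple(sum(g[r][c] * v[c] for c in range(len(v))) for r in range(len(g)))

def S(lam, vs):
    # Linear combination of the vectors vs with coefficients lam.
    n = len(vs[0])
    return tuple(sum(l * v[k] for l, v in zip(lam, vs)) for k in range(n))

def L(g, vs):
    # Apply g to every vector of the tuple vs.
    return tuple(matvec(g, v) for v in vs)

g = ((2, 1), (1, 1))              # determinant 1, so g is invertible
lam = (2, -1, 3)
vs = ((1, 0), (2, 5), (-1, 4))

lhs = S(lam, L(g, vs))            # combine the transformed vectors
rhs = matvec(g, S(lam, vs))       # transform the combined vector
print(lhs, rhs)
```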


20.10. Baseless figure/frame bundles 


20.10.1 REMARK: The advantages of studying fibre bundles without base points. 

Fibre bundles are generally understood to have a base point for every fibre. (See Chapters 21, 47 and 64-66 
for fibre bundles.) However, geometrical figures in general do not necessarily have a well-defined base point. 
A tangent vector in a differentiable manifold has a well-defined base point, but a line segment, an infinite 
line and a polygon have no well-defined single base point. Therefore the base-point/fibre-coordinate model 
in the standard definitions of fibre bundles is unable to describe all geometrical figures. Even when the 
base/fibre split is the correct model, it is easier to understand the concepts by first ignoring base points.
The base/fibre split is a significant distraction from the fundamental concepts of ordinary and principal fibre 
bundles. Topology and differentiable structure related to the base space are likewise significant distractions. 
Therefore it is convenient to first define “baseless fibre bundles”. When there is no natural base point, it is 
more accurate to talk of “baseless figure bundles”. 


20.10.2 REMARK: Relativity of observations may be modelled by a transformation group. 

In the application of transformation groups in physical models, there is a kind of duality between object and 
observer. Let X be the space of configurations of some object, like an asteroid in space for example. Let ω be
a single “standard observer”. This observer may observe a configuration x ∈ X for the object at some point
in time. Each general observer ω′ will observe a different configuration x′ ∈ X relative to their viewpoint or
frame of reference. The observed configurations are typically related by some transformation g in a space G
of possible observed configurations. Thus x′ = g(ω, ω′)x, and the set of such transformations constitutes a
left transformation group (since one has g(ω, ω) = id_X, g(ω, ω′)g(ω′, ω) = id_X and transitivity).

As mentioned in Remark 20.1.10, it very often happens that the group of transformations of viewpoints 
matches the group of the object being observed. That is, the coordinate transformation group matches 
the point transformation group. Then one may describe object transformations over time in terms of left 
transformation group elements. Thus the state at time t could be described as x(t) = h_{0,t} x(0), where h_{s,t} ∈ G
for all s, t ∈ ℝ, and x(t) ∈ X for t ∈ ℝ.


If “everything is relative”, then the observers cannot distinguish between changes in the object and changes
in the observer viewpoints. If the object is acted on by a group element h on the left, this is the same
mathematically as acting on the observers with h on the right. The effect of h on the object from the left
is the map x ↦ hx. This results in the observation g(hx) by an observer with viewpoint g ∈ G. This has
the same effect on the observation as replacing the observer viewpoint g with gh. The net observation ghx
is the same in each case. This is often experienced when sitting in a train at a station, waiting for the train 
to move. When the neighbouring train moves, one may have the illusion that one's own train is moving. 
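
The equality of the two observations can be confirmed by enumeration for a small group. In the sketch below,
viewpoints g and object transformations h are both taken from the permutations of three points, which is an
assumption made only for this illustration, and the names sigma and act are hypothetical.

```python
# Illustrative sketch only. All names here are hypothetical.
from itertools import permutations

X = (0, 1, 2)
G = list(permutations(X))

def sigma(g, h):
    # Group product gh: h is applied first, then g.
    return tuple(g[h[i]] for i in X)

def act(g, x):
    # Left action of a group element on a configuration x.
    return g[x]

# Moving the object by h and then observing from viewpoint g ...
object_moved = {(g, h, x): act(g, act(h, x)) for g in G for h in G for x in X}
# ... gives the same observation as moving the viewpoint from g to gh.
observer_moved = {(g, h, x): act(sigma(g, h), x) for g in G for h in G for x in X}

indistinguishable = object_moved == observer_moved
print(indistinguishable)
```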


It follows from these comments that there is a natural isomorphism between the left transformation group 
(G, X, σ, μ), regarded as a subset of the (bijection) automorphism group Aut(X), and the right
transformation group (G, G, σ, σ) of G acting on itself, regarded as the group-automorphism group Aut(G).


20.10.3 THEOREM: Isomorphism between left actions on a point set and right actions on the group. 

Let (G, X, σ, μ) be a left transformation group. Let (G, G, σ, σ) be the corresponding right transformation
group of G acting on itself. For g ∈ G, define L_g^X : X → X by L_g^X : x ↦ μ(g, x), and R_g^G : G → G
by R_g^G : h ↦ σ(h, g).


(i) The map φ : A_X → A_G defined by φ : L_g^X ↦ R_g^G is a group isomorphism between the
function-composition groups A_X = {L_g^X; g ∈ G} and A_G = {R_g^G; g ∈ G}.


PROOF: For (G, X, σ, μ), (G, G, σ, σ) and φ : A_X → A_G as in the statement of the theorem, let g_1, g_2 ∈ G.
Then φ(L_{g_1}^X ∘ L_{g_2}^X) = φ(L_{g_1 g_2}^X) = R_{g_1 g_2}^G = R_{g_2}^G ∘ R_{g_1}^G. Part (i) follows.


20.10.4 REMARK: The relation between group actions on a point set and group actions on the group. 
The maps and spaces in Theorem 20.10.3 are illustrated in Figure 20.10.1.


Figure 20.10.1 Left actions on the point set and right actions on the group


It may seem that Remark 20.10.2 and Theorem 20.10.3 are vacuous. The “isomorphism” in Theorem 20.10.3 
is apparently none other than the identity map id_G : G → G. However, a group of transformations is usually
specified as a subgroup of the automorphism group of some set. Theorem 20.2.9 shows that each effective
left transformation group on X is essentially the same thing as a subgroup of the symmetric group on X,
which is the group of bijections from X to X. The group G in Theorem 20.10.3 has two roles. In (G, X, σ, μ),
G is a subset of the symmetric group on X. In (G, G, σ, σ), G is the right automorphism group on G.


20.10.5 REMARK: The relation of principal fibre bundles to ordinary fibre bundles. 

The purpose of Example 20.10.6 is to show how principal fibre bundles are related to ordinary fibre bundles 
in the case of “baseless figure charts”, but where the maps are not all linear. Principal fibre bundles are 
usually explained in the context of tangent bundles, where the fibre maps and transition maps are linear, 
and the structure group is a general linear group. 


20.10.6 EXAMPLE: Camera images, covariance and contravariance. 

Consider multiple cameras $i \in I$ viewing a two-dimensional region E through distorted lenses so that each
camera i maps the region E to an imaging region F. For simplicity, suppose that the imaging region is
the same set for each camera, and that the map between E and F is a diffeomorphism $\phi_i : E \to F$. Then
$\phi_{i,j} = \phi_j \circ \phi_i^{-1} : F \to F$ is a diffeomorphism for all $i, j \in I$. So $\phi_{i,j} \in \mathrm{Aut}(F)$, where Aut(F) means the
diffeomorphism group on F. (See Definition 52.2.2 for diffeomorphisms.) Let w be a subset of E. Then
camera i will perceive the set $f_i = \phi_i(w)$ on its imaging region F. The images seen by cameras i and j are
related by $f_j = \phi_{i,j}(f_i)$ for all $i, j \in I$.


Now suppose that the functions $\phi_i$ are not individually knowable. In other words, suppose that only the
transition maps $\phi_{i,j}$ are known, which can be established by comparing the images in different cameras. The
“objectivity” of the imaging may be confirmed (more or less) by verifying that $\phi_{j,k} \circ \phi_{i,j}(w) = \phi_{i,k}(w)$ for
all $i, j, k \in I$, at least when these maps are compared for a range of presumed sets $w \in \mathbb{P}(E)$. In the absence
of direct information about the “real world” (and the maps between the “real world” and the camera imaging
regions), one cannot reconstruct the sets $w \in \mathbb{P}(E)$, but one may construct equivalence classes which are a
(poor) substitute for the “real thing”. If cameras $i, j \in I$ produce images $f_i$ and $f_j$ respectively, one may say
that they are looking at the same object $w \in \mathbb{P}(E)$ if $f_j = \phi_{i,j}(f_i)$. Thus the pairs $(\phi_i, f_i)$ and $(\phi_j, f_j)$ are
equivalent (i.e. produced by the same “real world” object) if and only if $f_j = \phi_{i,j}(f_i)$. So one may construct
an equivalence class $[(\phi_i, f)] = \{(\phi_j, \phi_{i,j}(f));\ j \in I\}$, for any $i \in I$ and $f \in \mathbb{P}(F)$. Then there is a natural
bijection $\psi : \{[(\phi_i, f)];\ i \in I,\ f \in \mathbb{P}(F)\} \to \mathbb{P}(E)$. Unfortunately, this bijection is not knowable, but one can
at least perform the thought experiment of inverting the unknown $\psi$ to obtain $\psi^{-1}([(\phi_i, f)]) = w$ for some
unique $w \in \mathbb{P}(E)$, for all $(i, f) \in I \times \mathbb{P}(F)$.

Corresponding to the group $G = \mathrm{Aut}(F)$ is the group $\bar{G} = \{\bar{g} : \mathbb{P}(F) \to \mathbb{P}(F);\ g \in G\}$, where $\bar{g}$ denotes the
set-map corresponding to $g \in G$. Thus $\bar{g} : f \mapsto \{g(x);\ x \in f\}$ for all $f \in \mathbb{P}(F)$. The tuple $(\bar{G}, \mathbb{P}(F), \sigma, \mu)$
is a left transformation group, where $\sigma$ denotes the function composition operation and $\mu : (\bar{g}, f) \mapsto \bar{g}(f)$
denotes the action of $\bar{G}$ on $\mathbb{P}(F)$. Let P be the set of all possible camera imaging diffeomorphisms. That
is, let $P = \{\phi : E \to F;\ \phi \text{ is a diffeomorphism}\}$. Then G has a natural action $\mu^P_G$ on P on the left. Thus
$\mu^P_G : (g, \phi) \mapsto g \circ \phi$ for $g \in G$ and $\phi \in P$.

If $g \in G$ is applied to $\phi_i$ in the equivalence class $[(\phi_i, f)]$, the result is $[(g\phi_i, f)]$, which equals $[(\phi_i, g^{-1}f)]$. (To
see this, let $\phi_j = g\phi_i$. Then $(g\phi_i, f)$ equals $(\phi_j, f)$, which is equivalent to $(\phi_i, \phi_{j,i}f)$, which equals $(\phi_i, g^{-1}f)$
because $\phi_{j,i} = \phi_i \circ \phi_j^{-1} = g^{-1}$.) In terms of the physical interpretation, this means that applying g to the
distortion of the camera lens is equivalent to applying g to the object which is being imaged. This seems
intuitively clear. If the object is made twice as big, this has the same effect as increasing the magnification
of the imaging.
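
The equivalence of the pairs $(g\phi_i, f)$ and $(\phi_i, g^{-1}f)$ can be checked mechanically in a finite toy model. The
following is only a sketch under stated assumptions: the regions E and F are replaced by three-element sets,
diffeomorphisms are replaced by bijections, and the names `cameras`, `observed_object` and `check` are
hypothetical helpers which do not appear in the text.

```python
from itertools import permutations

E = ("p", "q", "r")   # hypothetical finite stand-in for the region E
F = (0, 1, 2)         # hypothetical finite stand-in for the imaging region F

def compose(g, h):
    """Return the composite map g o h (apply h first), for maps stored as dicts."""
    return {x: g[h[x]] for x in h}

def inverse(g):
    """Return the inverse of a bijection stored as a dict."""
    return {v: k for k, v in g.items()}

def image(g, s):
    """Set-map: apply g to every element of the set s."""
    return frozenset(g[x] for x in s)

cameras = [dict(zip(E, p)) for p in permutations(F)]   # all bijections E -> F
G = [dict(zip(F, p)) for p in permutations(F)]         # all bijections F -> F

def observed_object(phi, f):
    """The subset w of E which camera phi images as f."""
    return image(inverse(phi), f)

def check():
    """True if (g o phi, f) and (phi, g^-1 f) always describe the same subset of E."""
    samples = (frozenset({0}), frozenset({0, 2}))
    return all(
        observed_object(compose(g, phi), f)
        == observed_object(phi, image(inverse(g), f))
        for phi in cameras for g in G for f in samples
    )

print(check())
```

Two pairs are treated as equivalent here exactly when they are produced by the same subset of E, which is the
criterion used in the example above.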


In the modern digital camera, the apparent size of an object in a photo depends on the size of the imaging
element, typically a charge-coupled device (CCD). One can imagine, for the sake of this mathematical
example, that the CCD could be distorted in some way, for example by a diffeomorphism $\psi : F \to F$. If
the array of pixels in the image surface suffers such a distortion, then when the image is displayed as the
rectangle which it is (falsely) assumed to be, the resulting image will suffer the distortion $\phi = \psi^{-1}$. Then
one will have the equivalence $((\psi \circ g)^{-1}, f) = (g^{-1}\psi^{-1}, f) \equiv (\psi^{-1}, gf)$. So in terms of pairs $(\psi, f)$, one will
have $(\psi g, f) \equiv (\psi, gf)$. This resembles a formula which is often seen in definitions of principal fibre bundles.


In the typical textbook example of principal fibre bundles for the general linear group GL(n) acting on
the set $\mathbb{R}^n$ of real n-tuples, the real n-tuples represent the components of vectors (in a tangent space on a
differentiable manifold) with respect to a given basis. The group has two roles. In the OFB (ordinary fibre
bundle), the group acts on the real n-tuples of vector coordinates. In the PFB, the group acts on the linear
space basis at each point. In terms of the usual conventions for matrix multiplication, one says that the
basis is acted upon by a matrix on the right, whereas the vector components are acted upon by a matrix on
the left. Thus $R_a : (e_i)_{i=1}^n \mapsto (\bar{e}_j)_{j=1}^n = (e_i a^i{}_j)_{j=1}^n$ and $L_b : (v^i)_{i=1}^n \mapsto (\bar{v}^j)_{j=1}^n = (b^j{}_i v^i)_{j=1}^n$, with the implicit
summation convention in Remark 22.3.10. The reconstructed vector $w = \sum_{i=1}^n v^i e_i$ is the same as the
reconstructed vector $\bar{w} = \sum_{j=1}^n \bar{v}^j \bar{e}_j$ if and only if
$\sum_{i=1}^n v^i e_i = \sum_{j=1}^n \bar{v}^j \bar{e}_j = \sum_{j=1}^n \sum_{k=1}^n \sum_{\ell=1}^n (b^j{}_k v^k)(e_\ell a^\ell{}_j) = \sum_{k=1}^n \sum_{\ell=1}^n \bigl(\sum_{j=1}^n b^j{}_k a^\ell{}_j\bigr) v^k e_\ell$.
This holds if and only if $\sum_{j=1}^n b^j{}_k a^\ell{}_j = \delta^\ell_k$ for all $k, \ell \in \mathbb{N}_n$, which means that
the matrices b and a are inverses of each other. Therefore the pairs $(e, v)$ and $(R_{g^{-1}}e, L_g v)$ are equivalent
for all $g \in G$.
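
A small numerical check may make the inverse relation between the two matrices concrete. This is a sketch
only, assuming n = 2 and exact rational arithmetic. The particular matrices `basis`, `a` and `v`, and the
helper names `matmul` and `inverse2`, are hypothetical choices made for illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse2(a):
    """Inverse of an invertible 2x2 matrix with exact rational entries."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Columns of `basis` are the basis vectors e_1, e_2 in some fixed background coordinates.
basis = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(2)]]
v = [[Fraction(3)], [Fraction(5)]]                       # components as a column
a = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(4)]]

new_basis = matmul(basis, a)        # matrix a acts on the basis on the right
new_v = matmul(inverse2(a), v)      # matrix b = a^-1 acts on the components on the left

w_old = matmul(basis, v)            # reconstructed vector from (e, v)
w_new = matmul(new_basis, new_v)    # reconstructed vector from the transformed pair
w_bad = matmul(new_basis, matmul(a, v))   # using b = a instead of b = a^-1

print(w_old == w_new, w_old == w_bad)
```

The reconstructed vector is unchanged only when the component matrix is the inverse of the basis matrix.
Using the same matrix for both (the `w_bad` case) changes the vector.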


Thus the vector components are transformed contravariantly to the basis if the imaged vector is unchanged. 
The basis vectors are “measuring sticks” for a fixed “real vector". If a measuring stick is doubled in size, 
the corresponding measurement (the “coordinates” ) must be halved. This is the underlying reason for the 
contravariance. (This is not at all surprising when one looks at Figure 22.8.1 in Remark 22.8.4, which shows 
that the “vector components calculation” operation is the inverse of the “linear combinations construction” 
operation. The former is contravariant whereas the latter is covariant relative to the choice of basis.) 


In the camera example, if the CCD is made larger, the image produced from a fixed object will be smaller 
relative to the size of the CCD. (This is why low to medium price digital SLR cameras produce larger 
images than expected with lenses from old film cameras. The smaller imaging area makes objects appear 
relatively larger.) The photographic and linear space situations are analogous in that the transformation 
applied to the measuring sticks (i.e. imaging area or basis vectors) must be inverse to the transformation of 
the measurements (i.e. relative image size or vector components) if the “real” object is fixed (namely the 
photography subject or the vector). 


The moral of this story is that one should think of a PFB total space as a set of “frames of reference” 
or “cameras” or “observers” or “linear space bases”, while the fibre space is the set of “measurements” or 
“images” or “vector components”. An OFB total space is the set of “real objects” or “photograph subjects” 
or “vectors”. Each measurement is produced by combining a frame of reference with a real object. This 
idea is fundamental to fibre bundles, tensor algebra, tensor calculus, parallelism and connections. A related 
comment was made by Bleecker [254], page 24, as follows. 


Since all measurements are made relative to a choice of frame, and the measurement process can 
never be completely divorced from the aspect of the universe being measured, we are led to the 
conclusion that the bundle of reference frames should play a part in the very structure of the 
universe as we perceive it. 


An analogue of the above “covariant” relation between object size and image size (where the diffeomorphism
maps have the form $\phi : E \to F$) can be manufactured by providing an inner product for a linear space. Then
the inner product tuple $(v_i)_{i=1}^n = (v \cdot e_i)_{i=1}^n$ is covariant with respect to the basis vector tuple $(e_i)_{i=1}^n$.


20.10.7 REMARK: Contravariant baseless figure/frame bundles. 

The concept of a combined baseless figure/frame bundle may now be formalised as Definition 20.10.8. This 
is the contravariant version where the coordinates vary inversely with respect to the observation frames. 
The spaces and maps of baseless figure/frame bundles are illustrated in Figure 20.10.2. For comparison, the 
corresponding fibre/frame bundle spaces and maps are shown on the right. (See Remark 20.10.20 for the 
corresponding conditions for a covariant baseless figure/frame bundle.) 


20.10.8 DEFINITION: A (contravariant) baseless figure/frame bundle is a tuple $(G, P, E, F, \sigma, \mu^P_G, \mu^F_G, \eta)$
which satisfies the following conditions.


(i) $(G, \sigma)$ is a group.
(ii) $(G, P, \sigma, \mu^P_G)$ is a right transformation group which acts freely and transitively on P.
(iii) $(G, F, \sigma, \mu^F_G)$ is an effective left transformation group.
(iv) $\eta : P \times E \to F$ satisfies $\forall \phi \in P,\ \forall z \in E,\ \forall g \in G,\ \eta(\mu^P_G(\phi, g^{-1}), z) = \mu^F_G(g, \eta(\phi, z))$. [contravariance]
(v) $\forall \phi \in P,\ \forall f \in F,\ \exists' z \in E,\ \eta(\phi, z) = f$.
(I.e. the map $z \mapsto \eta(\phi, z)$ is a bijection from E to F for all $\phi \in P$.)


G is the structure group.
P is the principal bundle or frame bundle or observer bundle or viewpoint bundle or perspective bundle.
E is the ordinary bundle or figure bundle or state bundle or entity bundle or object bundle.
F is the observation space or measurement space or component space or coordinate space.
$\mu^P_G$ is the right action (map) of G on P.
$\mu^F_G$ is the left action (map) of G on F.
$(G, P, \sigma, \mu^P_G)$ is the frame (transformation) group.
$(G, F, \sigma, \mu^F_G)$ is the measurement (transformation) group.
$\eta$ is the observation map or measurement map.
$\phi$ is a frame or observer or viewpoint or perspective for $\phi \in P$.
z is a figure or state or entity or object for $z \in E$.
f is an observation or a measurement or a view or the components or the coordinates for $f \in F$.
$\eta(\phi, \cdot)$ is a chart or component map or coordinate map for $\phi \in P$.


Figure 20.10.2 Spaces and maps for baseless figure/frame and fibre/frame bundles


20.10.9 REMARK: Principal fibre bundles and ordinary fibre bundles make more sense when combined. 

In a combined fibre/frame bundle which does have a base space, the set P is the total space of the principal 
fibre bundle, and the set E is the total space of the ordinary fibre bundle. The standard textbooks on this 
subject generally present principal (i.e. frame) bundle and ordinary fibre bundles as separate classes which 
may be associated, but they probably make more sense when combined. 


20.10.10 REMARK: Interpretation of the definition of contravariant baseless figure/frame bundles. 
Condition (iv) in Definition 20.10.8 may be written as $\eta(R_{g^{-1}}\phi, z) = L_g\eta(\phi, z)$ in terms of $R_g : \phi \mapsto \mu^P_G(\phi, g)$
and $L_g : f \mapsto \mu^F_G(g, f)$. It may be simplified further to $(R_{g^{-1}}\phi)(z) = L_g\phi(z)$ by abbreviating charts $\eta(\phi, \cdot)$
to $\phi$. It may be simplified even further to $(\phi g^{-1})(z) = g\phi(z)$ by writing $R_g\phi$ as $\phi g$, and $L_g f$ as $gf$. And
yet further, one may write $\phi g^{-1} = g\phi$ by suppressing the variable z. One must remember, however, that
the right and left multiplications by $g^{-1}$ and g respectively are different group transformation operations.
Moreover, the right operation transforms the function $\phi$, whereas the left operation transforms the function’s
value $\phi(z)$.

The self-consistency of condition (iv) may be verified by letting $g = g_1 g_2$ for any $g_1, g_2 \in G$, and noting that
$(R_{g^{-1}}\phi)(z) = (R_{(g_1 g_2)^{-1}}\phi)(z) = (R_{g_2^{-1}g_1^{-1}}\phi)(z) = (R_{g_1^{-1}}R_{g_2^{-1}}\phi)(z) = (R_{g_1^{-1}}(R_{g_2^{-1}}\phi))(z) = L_{g_1}((R_{g_2^{-1}}\phi)(z))$
$= L_{g_1}(L_{g_2}(\phi(z))) = L_{g_1 g_2}(\phi(z)) = L_g(\phi(z))$ by a double application of condition (iv) and the definition of a
right transformation group.
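
The conditions of Definition 20.10.8 can be tested exhaustively in a small finite model. The sketch below is
an illustration under stated assumptions, not a construction taken from the text: E and F are three-element
sets, G is the full permutation group of F with composition as the group operation, P is the set of all
bijections from E to F, the right action is chosen as $R_g\phi = g^{-1} \circ \phi$, and all function names are hypothetical.

```python
from itertools import permutations

E = ("a", "b", "c")   # hypothetical figure space
F = (0, 1, 2)         # hypothetical coordinate space

def compose(g, h):
    """Return the composite map g o h (apply h first), for maps stored as dicts."""
    return {x: g[h[x]] for x in h}

def inverse(g):
    """Return the inverse of a bijection stored as a dict."""
    return {v: k for k, v in g.items()}

G = [dict(zip(F, p)) for p in permutations(F)]   # structure group: Sym(F)
P = [dict(zip(E, p)) for p in permutations(F)]   # frames: all bijections E -> F

def R(g, phi):
    """Assumed right action on frames: R_g phi = g^-1 o phi."""
    return compose(inverse(g), phi)

def L(g, f):
    """Left action on coordinates: L_g f = g(f)."""
    return g[f]

def eta(phi, z):
    """Observation map: the coordinate of figure z in frame phi."""
    return phi[z]

def is_right_action():
    # R_{g1 g2} phi = R_{g2}(R_{g1} phi), where g1 g2 means g1 o g2.
    return all(R(compose(g1, g2), phi) == R(g2, R(g1, phi))
               for g1 in G for g2 in G for phi in P)

def is_free_and_transitive():
    # For each pair of frames there is exactly one g with R_g phi1 = phi2.
    return all(sum(R(g, p1) == p2 for g in G) == 1 for p1 in P for p2 in P)

def is_contravariant():
    # Condition (iv): eta(R_{g^-1} phi, z) = L_g eta(phi, z).
    return all(eta(R(inverse(g), phi), z) == L(g, eta(phi, z))
               for g in G for phi in P for z in E)

def has_bijective_charts():
    # Condition (v): z -> eta(phi, z) is a bijection from E to F.
    return all(sorted(eta(phi, z) for z in E) == list(F) for phi in P)

print(is_right_action(), is_free_and_transitive(),
      is_contravariant(), has_bijective_charts())
```

With this choice of right action, $R_{g^{-1}}\phi$ is simply $g \circ \phi$, which is the abbreviated form $\phi g^{-1} = g\phi$ of
condition (iv).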


20.10.11 REMARK: The arbitrary terminology for covariant and contravariant transformations. 

In Definition 20.10.8, one could substitute $g^{-1}$ for g (and g for $g^{-1}$) without changing the meaning of the
definition. Then one would act on frames $\phi$ with actions $R_g$, and on observations $f = \phi(z)$ with actions $L_{g^{-1}}$.
This alternative convention would have some advantages. The terminology in differential geometry refers to
the frames as transforming “covariantly”, while the measurements are said to transform “contravariantly”,
which is a great mystery to many people. If the actions $R_g$ and $L_{g^{-1}}$ were adopted, this could make
the transformations align better with the terminology. However, the observables are nearly always in the
foreground in differential geometry. One is generally concerned, particularly in tensor calculus, with the
transformations of coordinates, not of bases (or reference frames). Therefore the convention followed here
is to write $R_{g^{-1}}$ and $L_g$. This arbitrary convention is a difficult trade-off between tradition, common sense
and practicality. (See also Section 28.5.7 for discussion of this issue. See Remark 20.10.20 for the conditions
for a covariant baseless figure/frame bundle.)


20.10.12 REMARK: Arguments for and against identifying charts with frames. 

Abbreviating charts $\eta(\phi, \cdot)$ to $\phi$ in Remark 20.10.10 is not necessarily an abuse of notation. The principal
bundle P is introduced in Definition 20.10.8 (ii) as an abstract set. But P could easily be defined as a set of
functions from E to F without modifying Definition 20.10.8. That is, one could choose P to be a subset of
the set of functions $F^E$, and one could choose these functions $\phi \in P$ so that $\phi = \eta(\phi, \cdot)$. Then one would
have $\forall z \in E,\ \phi(z) = \eta(\phi, z)$. One could in fact take this as the definition of $\eta$. There is no logical difficulty
with this in principle.


An argument against identifying $\eta(\phi, \cdot)$ with $\phi$ is that one generally wishes to admit multiple “associated”
ordinary bundles $E_k$ for a single common principal bundle P. (See Remark 20.11.1.) In fact, this is the
main benefit of defining principal bundles. To take care of this situation, one could tag each chart $\phi$ to
show which ordinary bundle it is associated with. Thus $\phi^{E_k}$ or $\phi^k$ would mean $\eta_k(\phi, \cdot)$ for example. This
is somewhat clumsy, but not necessarily a bad idea. Another approach would be to arrange things so that
the sets representing the ordinary bundles $E_k$ are disjoint. Then $\phi$ could be applied to all of the relevant
ordinary fibre bundles without tagging.


20.10.13 REMARK: The structure group is the group of all transitions between reference frames. 
Condition (ii) in Definition 20.10.8 implies that the map $\mu^P_G(\phi, \cdot) : G \to P$ is a bijection. This is intentional.
The free and transitive requirements follow from the fact that the purpose of G is to model the group of all
transitions between the reference frames in P.


20.10.14 REMARK: The fibre chart atlas for a baseless figure/frame bundle. 

Every non-topological, topological or differentiable fibre bundle requires a fibre chart atlas. In the case of
baseless figure/frame bundles, this atlas is implicitly given by Definition 20.10.8. The only choice for the fibre
atlas $A^F_E$ for a baseless figure/frame bundle $(G, P, E, F, \sigma, \mu^P_G, \mu^F_G, \eta)$ is $A^F_E = \{\eta(\phi, \cdot);\ \phi \in P\}$. Then one
may think of the tuple $(E, \emptyset, \emptyset, A^F_E)$ as a non-topological (G, F) fibre bundle according to Definition 21.8.3.
Technically, this is valid, although the correspondence of definitions would be more accurate if the base space
of a baseless figure/frame bundle were a singleton rather than the empty set. (In fact, Definition 20.10.8
would have more accurately reflected its intended meaning if a singleton base space had been required. But
it's too late now. Maybe in the second edition!)


If the empty base space of a baseless figure/frame bundle is replaced by a singleton, Definition 21.8.3 (v)
would require the property $\forall \phi_1, \phi_2 \in A^F_E,\ \exists g \in G,\ \forall z \in E,\ \phi_2(z) = \mu(g, \phi_1(z))$. This follows immediately
from Definition 20.10.8 (iv). Thus $A^F_E = \{\eta(\phi, \cdot);\ \phi \in P\}$ satisfies the requirements for a fibre chart atlas.


20.10.15 THEOREM: Frames are uniquely determined by their action on the ordinary bundle. 

Let $(G, P, E, F, \sigma, \mu^P_G, \mu^F_G, \eta)$ be a baseless figure/frame bundle. Let $\phi_1, \phi_2 \in P$ satisfy $\eta(\phi_1, \cdot) = \eta(\phi_2, \cdot)$.
Then $\phi_1 = \phi_2$.

Hence the map $\phi \mapsto \eta(\phi, \cdot)$ is a bijection from P to $A^F_E = \{\eta(\phi, \cdot);\ \phi \in P\}$.


PROOF: Let $\phi_1, \phi_2 \in P$. Suppose that $\eta(\phi_1, \cdot) = \eta(\phi_2, \cdot)$. Since $\mu^P_G$ is free and transitive, there is a unique
$g' \in G$ such that $\phi_2 = \mu^P_G(\phi_1, g')$ by Theorem 20.4.3 (iii). Let $g = (g')^{-1}$. Then Definition 20.10.8 (iv)
implies $\forall z \in E,\ \eta(\phi_2, z) = \mu^F_G(g, \eta(\phi_1, z))$. So $\forall z \in E,\ \eta(\phi_1, z) = \mu^F_G(g, \eta(\phi_1, z))$ by the assumption on $\eta$.
Let $f \in F$. Then $f = \eta(\phi_1, z)$ for some $z \in E$ since $\eta(\phi_1, \cdot) : E \to F$ is a bijection by Definition 20.10.8 (v).
So $f = \mu^F_G(g, f)$. Thus $\forall f \in F,\ f = \mu^F_G(g, f)$. Therefore $g = e_G$ by Definition 20.2.1 because $\mu^F_G$ acts
effectively on F by Definition 20.10.8 (iii). So $g' = e_G$ and $\phi_2 = \mu^P_G(\phi_1, e_G)$. Hence $\phi_1 = \phi_2$.

Thus the map $\phi \mapsto \eta(\phi, \cdot)$ from P to $A^F_E$ is an injection, which is therefore a bijection because it is a
surjection by the definition of $A^F_E$.


20.10.16 THEOREM: Some basic properties of contravariant baseless figure/frame bundles 
Let $(G, P, E, F, \sigma, \mu^P_G, \mu^F_G, \eta)$ be a contravariant baseless figure/frame bundle.


(i) $\forall \phi_1, \phi_2 \in P,\ \exists' g \in G,\ \forall z \in E,\ \phi_2(z) = L_g\phi_1(z)$.
(ii) There is a unique function $\hat{g} : P \times P \to G$ which satisfies $\phi_2 = L_{\hat{g}(\phi_1,\phi_2)}\phi_1$ for all $\phi_1, \phi_2 \in P$.
(iii) $\forall \phi_1, \phi_2 \in P,\ L_{\hat{g}(\phi_1,\phi_2)} = \phi_2 \circ \phi_1^{-1} : F \to F$.

PROOF: For part (i), let $\phi_1, \phi_2 \in P$. By Definition 20.10.8 (ii) and Theorem 20.7.16 (iii), there is a unique
$g \in G$ such that $\phi_2 = R_{g^{-1}}\phi_1$. Then $\phi_2 = L_g\phi_1$ by Definition 20.10.8 (iv). The uniqueness of g for a given
map $L_g : F \to F$ follows from the requirement that G act effectively on F by Definition 20.10.8 (iii).


The existence of the function $\hat{g}$ in part (ii) follows directly from part (i) because the group element g exists
and is unique for each $\phi_1, \phi_2 \in P$. The uniqueness of the function $\hat{g}$ follows from the effective action of G
on F by Definition 20.10.8 (iii).

For part (iii), the existence of the inverse map $\phi_1^{-1} : F \to E$ for all $\phi_1 \in P$ follows from Definition 20.10.8 (v).
Then the formula $L_{\hat{g}(\phi_1,\phi_2)} = \phi_2\phi_1^{-1}$ follows from part (ii).


20.10.17 REMARK: Interpretation of some properties of contravariant baseless figure/frame bundles. 
Theorem 20.10.16 (i) means that all frames are related via the left action by a unique element of the
structure group G. The map $L_{\hat{g}(\phi_1,\phi_2)} : F \to F$ in Theorem 20.10.16 (ii) is called the “transition map” from
frame $\phi_1$ to frame $\phi_2$ for $\phi_1, \phi_2 \in P$. This map clearly satisfies $\hat{g}(\phi_1, \phi_1) = e_G$, $\hat{g}(\phi_1, \phi_2) = \hat{g}(\phi_2, \phi_1)^{-1}$ and
$\hat{g}(\phi_2, \phi_3)\hat{g}(\phi_1, \phi_2) = \hat{g}(\phi_1, \phi_3)$ for all $\phi_1, \phi_2, \phi_3 \in P$.

It is noteworthy that the map $L_{g,\phi} : E \to E$ defined by $L_{g,\phi} : z \mapsto \phi^{-1}L_g\phi(z)$ is not independent of the choice
of $\phi \in P$. To see this, let $\phi_1, \phi_2 \in P$. Then $L_{g,\phi_2}L_{g,\phi_1}^{-1} = (\phi_2^{-1}L_g\phi_2)(\phi_1^{-1}L_g\phi_1)^{-1} = \phi_2^{-1}L_g\phi_2\phi_1^{-1}L_g^{-1}\phi_1$
$= \phi_2^{-1}L_g L_{\hat{g}(\phi_1,\phi_2)}L_g^{-1}\phi_1\phi_2^{-1}\phi_2 = \phi_2^{-1}L_{g\hat{g}(\phi_1,\phi_2)g^{-1}\hat{g}(\phi_2,\phi_1)}\phi_2$. This equals $\mathrm{id}_E$ for all $\phi_1, \phi_2 \in P$ if and only
if $g\hat{g}(\phi_1, \phi_2)g^{-1}\hat{g}(\phi_1, \phi_2)^{-1} = e_G$ for all $\phi_1, \phi_2 \in P$. In other words, g must commute with $\hat{g}(\phi_1, \phi_2)$ for
all $\phi_1, \phi_2 \in P$. This can only hold for all $g \in G$ if G is a commutative group. The physical interpretation of
this is that in general, changes to the state (or orientation or location) of an object may appear differently
to different observers. This is what makes figure bundles and fibre bundles “interesting”.
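
The transition rules and the frame dependence of the induced map on E can both be observed in a finite toy
model. The following sketch rests on assumptions which are not in the text: E and F are three-element sets, G
is the permutation group of F, frames are bijections from E to F, and the names `transition` and `induced_on_E`
are hypothetical. The cyclic subgroup `C3` is included only to contrast the commutative case.

```python
from itertools import permutations

E = ("a", "b", "c")   # hypothetical figure space
F = (0, 1, 2)         # hypothetical coordinate space

def compose(g, h):
    """Return the composite map g o h (apply h first), for maps stored as dicts."""
    return {x: g[h[x]] for x in h}

def inverse(g):
    """Return the inverse of a bijection stored as a dict."""
    return {v: k for k, v in g.items()}

G = [dict(zip(F, p)) for p in permutations(F)]   # structure group Sym(F)
P = [dict(zip(E, p)) for p in permutations(F)]   # frames: all bijections E -> F
identity = {x: x for x in F}

def transition(phi1, phi2):
    """The group element g with phi2 = g o phi1, namely phi2 o phi1^-1."""
    return compose(phi2, inverse(phi1))

def induced_on_E(g, phi):
    """The map z -> phi^-1(g(phi(z))) induced on E by g via the frame phi."""
    return compose(inverse(phi), compose(g, phi))

# The three transition rules stated in the remark.
cocycle_ok = all(
    transition(p1, p1) == identity
    and transition(p1, p2) == inverse(transition(p2, p1))
    and compose(transition(p2, p3), transition(p1, p2)) == transition(p1, p3)
    for p1 in P for p2 in P for p3 in P
)

# Non-commutative G: the induced map on E depends on the frame.
frame_dependent = any(
    induced_on_E(g, p1) != induced_on_E(g, p2) for g in G for p1 in P for p2 in P
)

# Commutative subgroup C3 acting on a single orbit of frames: no frame dependence.
r = {0: 1, 1: 2, 2: 0}
C3 = [identity, r, compose(r, r)]
P_c = [compose(g, P[0]) for g in C3]
frame_independent_c3 = all(
    induced_on_E(g, p) == induced_on_E(g, P[0]) for g in C3 for p in P_c
)

print(cocycle_ok, frame_dependent, frame_independent_c3)
```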


20.10.18 REMARK: Relativity of observations of an object relative to the observer frame. 

Condition (v) in Definition 20.10.8 means that the measurement value space F fully describes the state z € E. 
In other words, there is a one-to-one relation between the elements of F and E. In general, observations of a 
physical object do not always fully describe the object state. However, it is straightforward to restrict one's 
attention to state variables which one can fully observe. The prime example of a physical object is a single 
n-dimensional vector. This can be fully parametrised by the coordinate space $F = \mathbb{R}^n$. The immediate
application to differential geometry is the description of tangent spaces on differentiable manifolds. 


Condition (iv) in Definition 20.10.8 may be interpreted as a kind of “relativity condition”. A change $R_g$ in
the state of the observed object is equivalent to a change $L_g$ of the observer frame. The only kinds of state
transformations which are permitted in the figure bundle (or fibre bundle) model are those which correspond 
to changes of frame of the observer. 


In the case of parallelism on a differentiable manifold, the end result of parallel transport of a tangent vector 
around a closed path is assumed to be related to the initial vector by a linear transformation. This happens 
to correspond to the set of transition maps for manifold charts at a point, which are also related by a linear 
transformation. Therefore fibre bundles are able to use the same structure group for two roles, namely 
transitions between different observer frames and description of state changes. Frame changes and state 
changes do not commute. 


20.10.19 REMARK:  Baseless figure/frame bundles are a special case of standard fibre bundles. 

A “baseless figure/frame bundle” is essentially equivalent to the usual definition of a fibre bundle where the
base set B is a singleton. This corresponds to a 0-dimensional manifold as the base set. Thus one may
say that baseless figure/frame bundles are a special case of standard fibre bundles. (The observation that a 
figure/frame bundle with a singleton base space has the right kind of fibre atlas for a nontopological fibre 
bundle is made in Remark 20.10.14.) 


20.10.20 REMARK: Covariant baseless figure/frame bundles. 

The definition of combined covariant baseless figure bundles is the same as Definition 20.10.8 except that
$(G, P, \sigma, \mu^P_G)$ must be a left transformation group in condition (ii), and the contravariance condition (iv) is
replaced by the covariance condition (iv′).


(ii′) $(G, P, \sigma, \mu^P_G)$ is a left transformation group which acts freely and transitively on P.
(iv′) $\eta : P \times E \to F$ satisfies $\forall \phi \in P,\ \forall z \in E,\ \forall g \in G,\ \eta(\mu^P_G(g, \phi), z) = \mu^F_G(g, \eta(\phi, z))$. [covariance]
20.11. Associated baseless figure bundles


20.11.1 REMARK: One frame bundle may be applied to many figure spaces. 
A single frame bundle (i.e. principal bundle) may be applied to multiple “associated” baseless figure spaces. 
This is illustrated in Figure 20.11.1. 


[Figure: left panel, baseless figure bundles with observation maps $\eta_k$; right panel, baseless figure bundles with charts $\eta_k(\phi, \cdot)$ for frames $\phi$]
Figure 20.11.1 Common frame bundle for associated baseless figure spaces


The frame transformation group $(G, P, \sigma, \mu^P_G)$ is common to associated baseless figure spaces. For each figure
space $E_k$, a measurement transformation group $(G, F_k, \sigma, \mu^{F_k}_G)$ and measurement map $\eta_k : P \times E_k \to F_k$
must be constructed or specified so that Definition 20.10.8 (iv) is satisfied. In other words, $\eta_k$ must satisfy


$\forall \phi \in P,\ \forall z \in E_k,\ \forall g \in G,\quad \eta_k(\mu^P_G(\phi, g^{-1}), z) = \mu^{F_k}_G(g, \eta_k(\phi, z))$. (20.11.1)


Then the charts $\eta_k(\phi, \cdot) : E_k \to F_k$ will commute with the right action $\mu^P_G$ in the sense described in
Remark 20.10.10. In other words, $R_{g^{-1}}\phi = L_g\phi$, where $\phi$ is an abbreviation for the chart $\eta_k(\phi, \cdot)$.

Line (20.11.1) may be interpreted to mean that the measurement of a figure z using reference frame $R_{g^{-1}}\phi$
gives the same result as applying $L_g$ to the measurement of z using reference frame $\phi$. In other words, it is
impossible to distinguish left actions on measurements from right inverse actions on reference frames.


20.11.2 DEFINITION: Associated (contravariant) baseless figure/frame bundles are contravariant baseless
figure/frame bundles $(G, P, E_k, F_k, \sigma, \mu^P_G, \mu^{F_k}_G, \eta_k)$ for k = 1, 2.


20.11.3 REMARK: No chart association map is required for associated baseless figure/frame bundles. 

In the case of a topological fibre bundle association in Definition 47.9.5, the association is defined to be a 
bijection between fibre atlases, and this bijection must satisfy some conditions. However, in the case of a 
baseless figure/frame bundle association in Definition 20.11.2, no bijection is required between fibre atlases. 
This is because the atlases are automatically harmonised by Definition 20.10.8. 


As mentioned in Remark 20.10.14, the natural choices of fibre atlases for baseless figure/frame bundles
$(G, P, E_k, F_k, \sigma, \mu^P_G, \mu^{F_k}_G, \eta_k)$, for k = 1, 2, are the sets of charts $A^{F_k}_{E_k} = \{\eta_k(\phi, \cdot);\ \phi \in P\}$. Then the natural
choice of an association map between these atlases is $h : A^{F_1}_{E_1} \to A^{F_2}_{E_2}$ defined by $h : \eta_1(\phi, \cdot) \mapsto \eta_2(\phi, \cdot)$. This
is a bijection because Theorem 20.10.15 implies that the maps $\eta_1(\phi, \cdot) \mapsto \phi$ from $A^{F_1}_{E_1}$ to P, and $\phi \mapsto \eta_2(\phi, \cdot)$
from P to $A^{F_2}_{E_2}$, are bijections.


To verify Definition 47.9.5 (ii), let $\psi^1_1, \psi^1_2 \in A^{F_1}_{E_1}$. Then $\psi^1_1 = \eta_1(\phi_1, \cdot)$ and $\psi^1_2 = \eta_1(\phi_2, \cdot)$ for some unique
$\phi_1, \phi_2 \in P$ by Theorem 20.10.15. Since $\mu^P_G$ is free and transitive by Definition 20.10.8 (ii), there is a
unique $g \in G$ such that $\phi_2 = \mu^P_G(\phi_1, g^{-1})$ by Theorem 20.4.3 (iii). Then Definition 20.10.8 (iv) implies
$\forall z \in E_1,\ \eta_1(\phi_2, z) = \mu^{F_1}_G(g, \eta_1(\phi_1, z))$. Thus $\forall z \in E_1,\ \psi^1_2(z) = \mu^{F_1}_G(g, \psi^1_1(z))$. In other words, $\psi^1_2 = L_g \circ \psi^1_1$.


Now define $\psi^2_1, \psi^2_2 \in A^{F_2}_{E_2}$ by $\psi^2_1 = h(\psi^1_1)$ and $\psi^2_2 = h(\psi^1_2)$. Then $\psi^2_1 = \eta_2(\phi_1, \cdot)$ and $\psi^2_2 = \eta_2(\phi_2, \cdot)$ for
the same unique $\phi_1, \phi_2 \in P$ as for $\psi^1_1$ and $\psi^1_2$. Therefore $\psi^2_2 = L_g \circ \psi^2_1$ for the same $g \in G$ because g
depends only on $\phi_1$, $\phi_2$ and $\mu^P_G$. Thus h satisfies Definition 47.9.5 (ii). Hence h satisfies Definition 47.9.5.
In other words, h meets the nontopological criteria for a (G, F) fibre bundle. Consequently all associated
baseless figure/frame bundles in accordance with Definition 20.11.2 are automatically associated in the sense
of nontopological ordinary fibre bundles.


20.11.4 REMARK: Interpretation of associations between baseless figure/frame bundles. 

Associations between fibre bundles may seem at first sight to be unimportant, uninteresting and unnecessarily 
technical, with arbitrary conditions which are difficult to attach intuitive meaning to. However, fibre bundle 
associations are one of the core concepts of the fibre bundle framework. These associations permit parallelism 
(or connections) to be transferred or copied from simple fibre bundles, such as vector bundles, to more 
complicated fibre bundles, such as tensor bundles. They also allow parallelism (or connections) to be copied 
from principal bundles to ordinary bundles. 


The concept of an association between two baseless figure bundles has the same meaning as between the 
better known fibre bundles which are at the core of differential geometry. The construction methods which 
are generally given for associated fibre bundles are sometimes quite difficult to interpret. So figure bundles 
provide an ideal minimalist environment in which to attempt to attach meaning to such construction methods. 


A key question which arises in the construction of associated fibre bundles, or associated baseless figure 
bundles, is why the association should be defined as it is, and not otherwise. Associations may be abstract or 
concrete. A purely abstract association may satisfy all of the algebraic requirements of the abstract definition, 
but without any evident meaning. A concrete association is derived from some underlying relationship 
between the figures in two associated bundles. Examples of concrete relationships between figures which are 
acted on by transformation groups are given in Remark 20.9.2 and Definition 20.9.13. Some purely abstract 
associations are given by Definitions 20.9.17, 20.11.2 and 47.9.5. 


A typical association for tensor algebra or differentiable manifolds enables copying linear group actions 
from a linear space to a tensor space. For example, the action on a multilinear map space may be defined 
by applying the linear space action to all of the arguments. Thus $(L_g w)(v_1, \ldots v_k)$ would be defined to
equal $w(L_g v_1, \ldots L_g v_k)$ for example, for a multilinear map w. (This is rule (3) followed by rule (4) in
Remark 20.9.2.) When the tensor space charts are then correctly associated with the linear space charts, 
the requirements for an abstract association will be satisfied. (In the case of differentiable fibre bundles, see 
Definitions 66.7.6, 66.7.10 (iii) and 66.7.12 (iii) for abstract associations of charts.) The meaning of bundle 
associations cannot be obtained from abstract association definitions. The meaning must be obtained from 
the concrete definition of each association. 


The patchwork and orbit-space associated fibre bundle construction methods in Definitions 47.10.4, 47.11.5, 
66.7.10 and 66.7.12 are identification spaces where the patches, which are identified on their overlaps, are in 
essence nothing more than local trivialisations. Thus effectively, they construct associated bundles from
coordinate spaces, grouped together in equivalence classes, in the same way that topologically interesting manifolds
may be constructed from overlapping charts.


As shown in Remark 20.11.3, baseless figure/frame bundles with the same structure group and principal 
bundle are automatically associated. Thus a non-trivial base space is required for non-trivial bundle associa- 
tions. Local trivialisations are used in the construction of associated bundles, but all fibre charts for baseless 
bundles are trivial. The conclusion to be drawn is then that base spaces and fibre charts (also known as
“local trivialisations”) are the underlying structures which give meaning to bundle associations.


20.12. Associated baseless figure bundle constructions 


20.12.1 REMARK: Construction methods for associated figure spaces. 

A very wide range of associated baseless figure bundles may be constructed via the methods described in 
Section 20.9. (See particularly Definitions 20.9.13 and 20.9.17.) Other well-known associated figure bundles 
are the spaces of vectors, covectors and tensors over a fixed linear space, as described in Chapters 27–30, for
which the principal bundle may be defined in terms of the right action of matrices on the set of linear space 
bases for the linear space. 


20.12.2 REMARK: Ordinary bundles may be reconstructed as equivalence classes. 
The ordinary bundle of a baseless figure/frame bundle may be reconstructed from the two transformation
group actions and the coordinate space F. Theorem 20.12.3 reconstructs the elements of an ordinary bundle
E as a set $\bar{E}$ of equivalence classes of $P \times F$, with a natural bijection $\bar{Q} : E \to \bar{E}$.


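20.12.3 THEOREM: Reconstruction of an ordinary bundle as a quotient space.
Let $(G, P, E, F, \sigma, \mu^P_G, \mu^F_G, \eta)$ be a contravariant baseless figure/frame bundle. Define a map
$Q : P \times F \to \mathbb{P}(P \times F)$ by


$\forall \phi \in P,\ \forall f \in F,\quad Q(\phi, f) = \{(\phi', f');\ \exists g \in G,\ (\phi', f') = (R_{g^{-1}}\phi, L_g f)\}$
$\qquad = \{(R_{g^{-1}}\phi, L_g f);\ g \in G\}$.
Let $\bar{E} = \mathrm{Range}(Q) = \{Q(\phi, f);\ (\phi, f) \in P \times F\}$. Define $\bar{Q} : E \to \mathbb{P}(P \times F)$ by
$\forall z \in E,\quad \bar{Q}(z) = \{(\phi, f) \in P \times F;\ \eta(\phi, z) = f\}$.
(i) $\mathrm{Range}(Q)$ is a partition of $P \times F$.
(ii) $\mathrm{Range}(\bar{Q}) = \bar{E}$.
(iii) $\bar{Q} : E \to \bar{E}$ is a bijection.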


PROOF: For part (i), it must be shown that the sets $Q(\phi, f)$ are pairwise disjoint and that they cover $P \times F$.
Let $(\phi_1, f_1), (\phi_2, f_2) \in P \times F$ be such that $Q(\phi_1, f_1) \cap Q(\phi_2, f_2) \ne \emptyset$. Then there is a pair $(\phi', f') \in P \times F$
such that $(\phi', f') = (R_{g_1^{-1}}\phi_1, L_{g_1} f_1) = (R_{g_2^{-1}}\phi_2, L_{g_2} f_2)$ for some $g_1, g_2 \in G$. Then $R_{g_1^{-1}}\phi_1 = R_{g_2^{-1}}\phi_2$
and $L_{g_1} f_1 = L_{g_2} f_2$. So $\phi_1 = R_{g^{-1}}\phi_2$ and $f_1 = L_g f_2$, where $g = g_1^{-1} g_2$. Therefore $(\phi_1, f_1) \in Q(\phi_2, f_2)$.
Similarly $(\phi_2, f_2) \in Q(\phi_1, f_1)$. So $Q(\phi_1, f_1) = Q(\phi_2, f_2)$. $\mathrm{Range}(Q)$ clearly covers $P \times F$ because $(\phi, f) \in Q(\phi, f)$
for all $(\phi, f) \in P \times F$.

For part (ii), let $z \in E$. Let $(\phi_1, f_1), (\phi_2, f_2) \in \bar{Q}(z)$. Then $\eta(\phi_1, z) = f_1$ and $\eta(\phi_2, z) = f_2$. By
Definition 20.10.8 (ii), there is a unique $g \in G$ such that $\phi_2 = R_{g^{-1}}\phi_1$. Then $\eta(R_{g^{-1}}\phi_1, z) = L_g \eta(\phi_1, z)$ by
Definition 20.10.8 (iv). So $f_2 = \eta(R_{g^{-1}}\phi_1, z) = L_g f_1$. So $(\phi_2, f_2) = (R_{g^{-1}}\phi_1, L_g f_1)$ for some $g \in G$.
Therefore $(\phi_2, f_2) \in Q(\phi_1, f_1)$. So $\bar{Q}(z) \subseteq Q(\phi_1, f_1)$ for some $(\phi_1, f_1) \in P \times F$.

To show that $\bar{Q}(z) = Q(\phi_1, f_1)$, let $(\phi, f) \in Q(\phi_1, f_1)$. Then $(\phi, f) = (R_{g^{-1}}\phi_1, L_g f_1)$ for some $g \in G$. So
$\phi(z) = R_{g^{-1}}\phi_1(z) = L_g \phi_1(z) = L_g f_1 = f$. So $(\phi, f) \in \bar{Q}(z)$. Therefore $\bar{Q}(z) = Q(\phi_1, f_1)$. So $\bar{Q}(z) \in \bar{E}$.
Hence $\mathrm{Range}(\bar{Q}) \subseteq \bar{E}$.

To show that $\mathrm{Range}(\bar{Q}) = \bar{E}$, let $y \in \bar{E}$. Then $y = Q(\phi, f)$ for some $(\phi, f) \in P \times F$. By Definition 20.10.8 (v),
there is a (unique) $z \in E$ with $\phi(z) = f$. Then $(\phi, f) \in \bar{Q}(z)$ by the definition of $\bar{Q}$. So $\bar{Q}(z) = Q(\phi, f)$ (as
shown in the previous paragraph). Therefore $y \in \mathrm{Range}(\bar{Q})$. Hence $\mathrm{Range}(\bar{Q}) = \bar{E}$.

For part (iii), $\bar{Q} : E \to \bar{E}$ is a surjection by part (ii). Let $z_1, z_2 \in E$ satisfy $\bar{Q}(z_1) = \bar{Q}(z_2)$. Then for all
$(\phi_1, f_1) \in P \times F$, $(\eta(\phi_1, z_1) = f_1) \Leftrightarrow (\eta(\phi_1, z_2) = f_1)$. So $z_1 = z_2$ by Definition 20.10.8 (v). So $\bar{Q}$ is injective.
Hence it is a bijection.
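
The construction in Theorem 20.12.3 can be checked by brute force on a small finite case. The following
Python sketch is only an illustration under assumed data which do not appear in the text: a three-element
figure space, the coordinate space {0, 1, 2}, the group of all permutations of the coordinate space, and
frames which are bijections from the figure space to the coordinate space, with the right action taken
to be R_g(phi) = L_{g^-1} o phi and with eta(phi, z) = phi(z). The names `Q`, `Q_bar` and `E_bar` are
hypothetical labels for the two maps and the quotient set of the theorem.

```python
from itertools import permutations

# Assumed toy data (not from the text).
E = ("a", "b", "c")            # figure space
F = (0, 1, 2)                  # coordinate space
G = list(permutations(F))      # all permutations g of F, with g[f] the image of f
P = list(permutations(F))      # frames: phi[i] is the coordinate of E[i]


def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)


def L(g, f):
    """Left action of g on a coordinate f."""
    return g[f]


def R(g, phi):
    """Assumed right action of g on a frame: R_g(phi) = L_{g^-1} o phi."""
    g_inv = inverse(g)
    return tuple(g_inv[c] for c in phi)


def eta(phi, z):
    """Coordinate of the figure z as seen by the frame phi."""
    return phi[E.index(z)]


def Q(phi, f):
    """Class of (phi, f): all pairs (R_{g^-1} phi, L_g f) for g in G."""
    return frozenset((R(inverse(g), phi), L(g, f)) for g in G)


def Q_bar(z):
    """All pairs (phi, f) in P x F with eta(phi, z) = f."""
    return frozenset((phi, f) for phi in P for f in F if eta(phi, z) == f)


E_bar = {Q(phi, f) for phi in P for f in F}

# (i) The classes are pairwise disjoint and cover P x F.
assert sum(len(c) for c in E_bar) == len(P) * len(F)
assert frozenset().union(*E_bar) == frozenset((phi, f) for phi in P for f in F)
# (ii) The range of Q_bar equals E_bar.
assert {Q_bar(z) for z in E} == E_bar
# (iii) Q_bar is injective, hence a bijection from E to E_bar.
assert len({Q_bar(z) for z in E}) == len(E)
```

A non-commutative group is used here so that the inverse in the first component is not hidden by
commutativity. A finite check of this kind illustrates the statement but is of course no substitute for the proof.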


20.12.4 REMARK: The quotient space in a reconstructed ordinary bundle. 

The reconstructed ordinary bundle $\bar{E}$ in Theorem 20.12.3 is denoted by some authors (in the context of
fibre bundles which do have a base set) as $(P \times F)/G$ or $P \times_G F$. The notation $(P \times F)/G$ is obtained by
defining a left group action by $G$ on $P \times F$ with $L_g(\phi, f) = (R_{g^{-1}}\phi, L_g f)$ for $(\phi, f) \in P \times F$ and $g \in G$,
and then defining $(P \times F)/G$ to be the orbit space of $G$ under this action. (See Definition 20.5.6 for the
orbit space of a left transformation group. According to Kobayashi/Nomizu [19], page 54, and Drechsler/
Mayer [262], page 80, $(P \times F)/G$ is the orbit space of a right transformation group because they define
$R_g(\phi, f) = (R_g\phi, L_{g^{-1}}f)$.) The action of $G$ on $P \times F$ is a simultaneous change of observer frame and object
coordinates corresponding to no change in the state of the observed object.


It should be remembered that the group G in the quotient space (P x F)/G is really two group actions, one 
on P and one on F. The action of G on P x F is a combination of these two actions. 


20.12.5 REMARK: Construction of new ordinary bundles by the quotient method. 

If Theorem 20.12.3 only demonstrated a bijection from $(P \times F)/G$ to $E$ for a given ordinary bundle $E$, it
would be of limited value. The true value of this construction method lies in the ability to construct new
ordinary bundles $\bar{E} = (P \times F)/G$ for any given effective left transformation group $(G, F, \sigma, \mu_L^F)$. Then group
actions may be induced onto the constructed ordinary bundle via the transformation group $(G, F)$. Most
importantly, parallelism and curvature may be ported between associated ordinary bundles and the principal 
bundle via the charts and component spaces. 


20.12.6 REMARK: Peer-to-peer versus principal-to-ordinary bundle associations.

There are two kinds of associations between baseless figure bundles. The first kind is peer-to-peer associations 
between ordinary bundles $E_k$ which have a common principal bundle as in Remark 20.11.1. The second kind
is associations between the principal bundle and any individual ordinary bundle, which could be thought 
of as “server/client” associations. Parallelism is often defined first on principal bundles and then ported to 
ordinary bundles. 


The constructed ordinary bundle $\bar{E}_k = (P \times F_k)/G$ referred to in Remark 20.12.4 can be given an action by
elements of $G$ of the form $g : [(\phi, f)] \mapsto [(R_g\phi, f)] = [(\phi, L_g f)]$. Thus when the observer bundle $P$ is acted
upon by $R_g$, a corresponding action $L_g$ is induced on each ordinary bundle $\bar{E}_k$.


(( 2018-12-11. Maybe present a very primitive version of Definition 47.12.3 near here. )) 


Chapter 21


NON-TOPOLOGICAL FIBRE BUNDLES 


21.1 Non-uniform non-topological fibrations
21.2 Uniform non-topological fibrations
21.3 Cross-sections of non-topological fibrations
21.4 Cross-section short-cuts for form-style fibrations
21.5 Fibre charts for non-topological fibrations
21.6 Cross-section localisations and locally constant cross-sections
21.7 Fibre atlases for non-topological fibrations
21.8 Non-topological ordinary fibre bundles
21.9 Non-topological principal fibre bundles
21.10 Identity cross-sections and identity charts on principal bundles
21.11 The right action map of a non-topological principal bundle
21.12 Associated non-topological fibre bundles
21.13 Patchwork associated non-topological fibre bundles
21.14 Orbit-space associated non-topological fibre bundles
21.15 Parallelism on non-topological fibrations
21.16 Parallelism on non-topological ordinary fibre bundles


21.0.1 REMARK: The geometric nature of fibre bundles, and their three structural layers. 

Fibre bundles are an application of algebra to the geometry of “objects at locations”. More specifically, 
fibre bundles are an application of transformation groups to the combined geometry of objects and their 
point-locations. 


Traditional geometry was concerned with points in physical space or space-time, or various kinds of non- 
physical spaces. Fibre bundles are also concerned with the “actors on the stage”, not just the stage itself. 
A typical example of a cross-section of a fibre bundle is a vector field, which has one “actor” at each point 
in the point-space. For example, this could be an electro-magnetic field, gravitation field, or the stress or 
strain of a material at each point. 


Transformation groups, which are algebraic, describe the structure of the set of fibres based at each point of 
a fibre bundle, but the base space is geometric. So although non-topological fibre bundles are presented here 
shortly after the transformation groups in Chapter 20, they are really geometric objects which just happen 
to be a useful application of transformation groups. 


As in the case of pure point-space geometry, it is possible and desirable to add topological and differentiable 
structure to the geometry of fibre bundles. These additions are introduced in Chapters 47 and 64 respectively. 
Fibre bundles are fundamentally geometric objects, but they progressively acquire algebraic, topological and
differentiable structure in Chapters 21, 47 and 64. To present these three structural layers with maximum 
clarity, they are presented in three different parts of this book. 


Thus “non-topological fibre bundles” are not an exercise in pointless minimalism. They are the first stage 
in the build-up of the three layers of structure of fibre bundles. The purpose of Chapter 21 is to first clearly 
present the very substantial algebraic aspects of fibre bundle geometry before progressing to their topological 
and analytical aspects. A surprisingly large proportion of the key concepts of fibre bundles are meaningful
and non-trivial in the total absence of topology and analysis. 


Some authors claim that fibre bundles are a branch of topology. The view taken here is that, at least in purely 
structural terms, topology is the least important of the three layers of structure on fibre bundles. The pure 
mathematical research literature on fibre bundles may be predominantly concerned with their combinatorial
topology properties, but from the structural point of view, the topology is quite simple and unremarkable. 
From the point of view of the geometry of physics, the topology is almost always a side-effect of locally 
Cartesian differentiable structure. So from the physics perspective, it is the differentiable layer which is 
of most interest, particularly for the definition of tangent bundles, vector bundles, connections and metric 
tensor fields. The topology is merely a property of the differentiable structure. 


The term “non-topological fibre bundles” is intended here to counteract the common assumption that all 
fibre bundles are topological. A more accurate and informative term might be “fibre bundle algebra”, and 
then the topological and differentiable fibre bundle topics could be renamed to “fibre bundle topology” and 
“fibre bundle analysis” respectively. 


21.0.2 REMARK: The removal of topology and differentiable structure from fibre bundle definitions. 
Chapter 21 introduces non-topological fibre bundles and non-topological parallelism as a minimalist prelude 
to Chapters 47 and 48, which deal with topological fibre bundles and parallelism. Differentiable fibre bundles 
are defined in Chapters 64-66, and parallelism is defined differentially as “connections” on differentiable fibre 
bundles in Chapters 67-69. 


The following table summarises the distribution of fibre bundle topics amongst the chapters in this book. 
(Tangent bundles are a sub-species of differentiable vector bundles, which are a sub-species of differentiable 
fibre bundles, as illustrated in Figure 65.0.1.) 


bundle class        fibre bundles     parallelism
non-topological     Chapter 21        Chapter 21
topological         Chapter 47        Chapter 48
differentiable      Chapters 64-66    Chapters 67-69
[tangent bundle]    Chapter 54        Chapter 71


The non-topological fibre bundle definitions in this chapter are not presented by most authors, but it is
useful to initially ignore topological questions in order to concentrate on the algebraic fundamentals. Most 
texts assume that fibre bundles have topological or differentiable structure. A surprisingly large proportion 
of fibre bundle concepts are meaningful and non-trivial in the absence of topology, for example: 

> cross-sections, localisation of cross-sections, constant cross-sections,
> structure groups,
> identity cross-sections and identity charts for principal bundles,
> the right action map and right transformation group of a principal bundle, and
> the construction of a principal bundle from an arbitrary freely-acting right transformation group.

Also meaningful without topology are associated fibre bundles and the “porting” of parallelism between
associated bundles, although these are introduced in the context of topological fibre bundles in Chapter 47. 


Even more “minimalist” than non-topological fibre bundles are the “baseless figure bundles” which are 
described in Sections 20.10 and 20.11. Baseless figure bundles are fibre bundles with no base space. It is 
possibly beneficial to first understand baseless figure bundles before reading this chapter. 


21.0.3 REMARK: The relation between fibre bundles and tangent bundles. 

Fibre bundles seem to be natural generalisations of tangent bundles and tensor bundles on differentiable 
manifolds. The wide variety of tensor bundle structures which can be defined in association with the basic 
tangent bundle structure on a manifold can be organised and understood within the broader context of 
general fibre bundles. 


Although tangent bundles are quite likely the original inspiration for general fibre bundles, they are regarded 
as a distinct class of structure here. Tangent bundles fit uncomfortably within the framework of abstract fibre 
bundles. When the fibre bundle framework is applied to tangent bundles, various identifications of strictly 
distinct objects are required, for example via the vertical drop functions in Section 59.2, the horizontal swap
functions in Section 59.6, and the “informal short-cut versions of differential forms” in Remark 57.7.1.


21.0.4 REMARK: The key role of fibre bundles in definitions of parallelism. 

The pure mathematical literature on fibre bundles focuses on topological connectivity classification questions 
which are of little interest for single-chart manifolds. The main value of fibre bundles in differential geometry 
for general relativity is to provide a substrate for defining parallelism and curvature. 


Parallelism specifies associations between objects which are attached to different points of a base set. For 
example, the parallel lines of everyday geometry are objects which are in different locations in space. In 
differential geometry, the objects are typically tangent vectors or tensors at different points in a manifold. 
Thus for defining parallelism, one needs both a point space (called the “base space”) and a space of objects 
(called the “total space”), each of which is located at a unique point of the point space. 


The core topic of differential geometry is curvature, which is a kind of differential parallelism. Fundamental 
physics is largely concerned with fields, which are formalised as “cross-sections” within the framework of 
fibre bundles. Hence fibre bundles are inevitably invoked in a presentation of differential geometry. 


21.0.5 REMARK: Summary of fibre bundle classes and their specification tuples. 

Table 21.0.1 is a preview of various classes of fibre bundles and their specification tuples. The letters have 
the following significance: $E$ is a total space, $B$ is a base space, $\pi$ is a projection map, $F$ is a fibre space,
$A_E^F$ is a fibre atlas for fibre space $F$, $G$ is a structure group, $T_E$ is a topology on $E$, $T_B$ is a topology on $B$,
$A_E$ is a manifold atlas on $E$, $A_B$ is a manifold atlas on $B$, $P$ is the total space of a principal fibre bundle,
$T_P$ is a topology on $P$, $A_P^G$ is a fibre atlas for structure group $G$, and $M$ is a base space which has
differentiable manifold structure (i.e. a manifold atlas).


definition   specification tuple                  fibre bundle class

non-topological
21.2.1       $(E, \pi, B)$                        non-topological fibration (with intrinsic fibre space)
21.2.10      $(E, \pi, B)$                        non-topological fibration with fibre space $F$
21.7.5       $(E, \pi, B, A_E^F)$                 non-topological fibre bundle for fibre space $F$
21.8.3       $(E, \pi, B, A_E^F)$                 non-topological $(G, F)$ fibre bundle
21.9.4       $(P, \pi, B, A_P^G)$                 non-topological principal $G$-bundle

topological
47.2.2       $(E, T_E, \pi, B, T_B)$              topological fibration (with intrinsic fibre space)
             $(E, T_E, \pi, B, T_B)$              topological fibration with fibre space $F$
47.5.5       $(E, T_E, \pi, B, T_B, A_E^F)$       topological fibre bundle for fibre space $F$
             $(E, T_E, \pi, B, T_B, A_E^F)$       topological $(G, F)$ fibre bundle
             $(P, T_P, \pi, B, T_B, A_P^G)$       topological principal fibre bundle with structure group $G$

differentiable
64.4.4       $(E, A_E, \pi, M, A_M, A_E^F)$       $C^k$ fibration with fibre atlas for fibre space $F$
             $(E, A_E, \pi, M, A_M, A_E^F)$       $C^k$ $(G, F)$ fibre bundle
64.10.1      $(E, A_E, \pi, M, A_M, A_E^F)$       analytic $(G, F)$ fibre bundle
             $(P, A_P, \pi, M, A_M, A_P^G)$       $C^k$ principal fibre bundle with structure group $G$


Table 21.0.1 Summary of fibre bundle specification tuples


21.0.6 REMARK: Family tree of fibrations, ordinary fibre bundles and principal fibre bundles. 
Figure 21.0.2 is a family tree of non-topological fibrations, ordinary fibre bundles and principal fibre bundles.


non-uniform fibration
        ↓
uniform fibration
        ↓
fibre bundle
        ↓
ordinary fibre bundle
        ↓
principal fibre bundle

Figure 21.0.2 Family tree of fibrations, ordinary fibre bundles and principal fibre bundles


21.0.7 REMARK: Fibre bundles are locally product-structured spaces. The Universe is a fibre bundle.

The quintessence of fibre bundles is their local direct product structure. This is usually referred to as the
“local triviality” of fibre bundles. The first component of the product is generally space or space-time, and
the second is the space of possible field values at each point. If this first component is simple Euclidean space
or Minkowskian space-time, then the direct product structure is global, and then there is no need for “local
trivialisations”. In such cases, the “non-triviality” of fibre bundles is redundant. Product-structured spaces
are then adequate. However, the general machinery of fibre bundles, which emphasises the arbitrariness
of local fibre charts, forces any theory expressed in this framework to be robust with respect to fibre chart
transitions. This greatly increases the mental effort required, but hopefully has some benefits such as insights
into the relations between geometric concepts.


Product-structured spaces are the mathematical frameworks for physical fields. There had been no need for 
physical fields when “action at a distance” was accepted as in Newton's original gravity theory. In fact even
in 1873, Maxwell [290], Volume II, page 437, described strong resistance to the idea of fields. 


There appears to be, in the minds of these eminent men, some prejudice, or à priori objection,
against the hypothesis of a medium in which the phenomena of radiation of light and heat, and 
the electric actions at a distance take place. It is true that at one time those who speculated as 
to the causes of physical phenomena, were in the habit of accounting for each kind of action at a 
distance by means of a special aethereal fluid, whose function and property it was to produce these 
actions. They filled all space three and four times over with ethers of different kinds, the properties 
of which were invented merely to ‘save appearances,’ so that more rational enquirers were willing 
rather to accept not only Newton's definite law of attraction at a distance, but even the dogma of 
Cotes, that action at a distance is one of the primary properties of matter, and that no explanation 
can be more intelligible than this fact. Hence the undulatory theory of light has met with much 
opposition, directed not against its failure to explain the phenomena, but against its assumption of 
the existence of a medium in which light is propagated. 


Maxwell [290], Volume II, page 438, concluded that nevertheless a field theory was best. 


Hence all these theories lead to the conception of a medium in which the propagation takes place, 
and if we admit this medium as an hypothesis, I think it ought to occupy a prominent place in 
our investigations, and that we ought to endeavour to construct a mental representation of all the 
details of its action, and this has been my constant aim in this treatise. 


The fibre bundle concept is the modern mathematical framework for the “medium” of electro-magnetism
and other field theories. Despite the danger that fibre bundles “fill all space three and four times over with
æthers of different kinds”, such a framework appears to give the right answers. (In other words, it “saves
appearances”.) Fibre bundles may then be regarded as a “mental representation” of each “medium”.


Since space appears to be an empty vacuum, it is difficult to believe that at each infinitesimal point of 
space-time, there is a choice of fibre value for each kind of field, potentially representing an infinite amount 
of information per cubic metre. Field theories are enormously successful, particularly in comparison with 
the old action-at-a-distance theories. They are indispensable to physics, which makes them a prime subject 
of study for mathematicians. The Universe is a fibre bundle. This is why fibre bundles are important. 


Action-at-a-distance theories, which are still taught and are still valid in most practical situations, relied on 
the possibility of universal simultaneity of events, which lost credibility when the finite speed of light was 
measured. This led inevitably to relativity theory long before Lorentz-Fitzgerald-Poincaré-Einstein relativity 
was developed. This compelled the replacement of force-at-a-distance with field theories, and field theories 
require fibre bundles to mathematically represent them. 


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]


21.1. Non-uniform non-topological fibrations 


21.1.1 REMARK: All functions are non-uniform non-topological fibrations. 
According to the minimalist non-uniform non-topological fibration concept in Definition 21.1.2, any function
$\pi : E \to B$ is a fibration, with its range as the base space, and its domain as the total space.


Definition 21.1.2 and Notation 21.1.3 are illustrated in Figure 21.1.1. 


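[Figure: the total space $E$ drawn above the base space $B$, showing the fibre sets $E_{b_1} = \pi^{-1}(\{b_1\})$ and
$E_{b_2} = \pi^{-1}(\{b_2\})$ over the base points $b_1$ and $b_2$.]
Figure 21.1.1 Non-uniform non-topological fibration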


21.1.2 DEFINITION: A non-uniform non-topological fibration is a tuple $(E, \pi, B)$ such that $E$ and $B$ are
sets and $\pi$ is a function $\pi : E \to B$.


The set $E$ is called the total space of $(E, \pi, B)$.

The set $B$ is called the base space of $(E, \pi, B)$.

The function $\pi$ is called the projection map of $(E, \pi, B)$.

The set $\pi^{-1}(\{b\})$ is called the fibre set at $b$ of $(E, \pi, B)$, for each $b \in B$.
A total space element $z \in E$ is called a fibre of $(E, \pi, B)$.


21.1.3 NOTATION: $E_b$, for a total space $E$ of a fibration $(E, \pi, B)$ and a point $b \in B$, denotes the fibre set
of $(E, \pi, B)$ at $b$. In other words, $E_b = \pi^{-1}(\{b\})$.


21.1.4 EXAMPLE: Some fibrations on the unit circle. 

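Let $S^1 = \{x \in \mathbb{R}^2;\ |x| = 1\} = \{e_\theta;\ \theta \in \mathbb{R}\}$, where $e_\theta = (\cos\theta, \sin\theta)$ for all $\theta \in \mathbb{R}$. Then $(E, \pi, B)$ is a
fibration in the following cases.

(1) Normal vectors. $B = S^1$, $E = \{(e_\theta, t e_\theta);\ \theta \in \mathbb{R},\ t \in \mathbb{R}\}$, and $\pi : (e_\theta, t e_\theta) \mapsto e_\theta$.

(2) Tangent vectors. $B = S^1$, $E = \{(e_\theta, t e_{\theta+\pi/2});\ \theta \in \mathbb{R},\ t \in \mathbb{R}\}$, and $\pi : (e_\theta, t e_{\theta+\pi/2}) \mapsto e_\theta$.

(3) Ambient space vectors. $B = S^1$, $E = B \times \mathbb{R}^2$, and $\pi : (x, y) \mapsto x$.

(4) Exterior normal vectors. $B = S^1$, $E = \{(e_\theta, t e_\theta);\ \theta \in \mathbb{R},\ t \in \mathbb{R}_0^+\}$, and $\pi : (e_\theta, t e_\theta) \mapsto e_\theta$.

(5) Clockwise tangent vectors. $B = S^1$, $E = \{(e_\theta, t e_{\theta+\pi/2});\ \theta \in \mathbb{R},\ t \in \mathbb{R}_0^-\}$, and $\pi : (e_\theta, t e_{\theta+\pi/2}) \mapsto e_\theta$.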

It is important to distinguish between fibration total spaces and cross-sections. (See Section 21.3 for cross-
sections.) A fibration total space consists of any number of fibres at each base point, and its elements do not
necessarily have an ordered pair structure as in these examples. A cross-section, on the other hand, always
has an ordered pair structure because it is a function which chooses one element from $E_b$ for each $b \in B$.
Thus the fibration total space is the totality of all fibres at all points, not just one fibre at each point.
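
Cases (1) and (2) of Example 21.1.4 can be sketched numerically as follows. This Python fragment is only an
illustration: the function names `e`, `scale`, `normal_fibre`, `tangent_fibre` and `projection` are hypothetical
and do not appear in the text, and the real numbers are approximated by floating-point values.

```python
import math


def e(theta):
    """Unit vector e_theta = (cos theta, sin theta)."""
    return (math.cos(theta), math.sin(theta))


def scale(t, v):
    """Multiply the vector v by the real number t."""
    return (t * v[0], t * v[1])


def normal_fibre(theta, t):
    """Element (e_theta, t e_theta) of the total space in case (1)."""
    return (e(theta), scale(t, e(theta)))


def tangent_fibre(theta, t):
    """Element (e_theta, t e_{theta + pi/2}) of the total space in case (2)."""
    return (e(theta), scale(t, e(theta + math.pi / 2)))


def projection(element):
    """Projection map: keep the base point and discard the vector."""
    base_point, _vector = element
    return base_point


element = tangent_fibre(0.3, 2.0)
base_point, vector = element

# The projection map recovers the base point of the element.
assert projection(element) == e(0.3)

# The tangent vector is orthogonal to the base point, up to rounding error.
dot = base_point[0] * vector[0] + base_point[1] * vector[1]
assert math.isclose(dot, 0.0, abs_tol=1e-12)
```

Each element of the total space is stored as an ordered pair here only because the example is written that way.
As noted above, the elements of a general total space need not have this structure.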


21.1.5 REMARK: Degenerate cases. Empty non-uniform fibrations. 

Let $(E, \pi, B)$ be a non-uniform non-topological fibration. If $E = \emptyset$, then $\pi = \emptyset$, but then $B$ can be any set.
This is the case of “everywhere-empty fibre sets” where $E_b = \emptyset$ for all $b \in B$.

If $E \ne \emptyset$, then $\pi \ne \emptyset$ and $B \ne \emptyset$. Conversely, if $B = \emptyset$, then $E = \emptyset$ and $\pi = \emptyset$. (See Remarks 21.2.5, 21.2.6
and 21.2.11 for “degenerate cases” for a uniform non-topological fibration.)


21.1.6 REMARK: Alternative name for the projection map.

In the case of fibre bundles, some authors refer to the projection map as the “fibration” of the fibre bundle. 
(See for example Poor [32], page 2.) Many authors also write something like “the fibration $\pi : E \to B$” or
“the fibre bundle $\pi : E \to B$”. Thus the projection map is often used as a shorthand for the entire structure
which is built around it. Most importantly, the projection map partitions the total space into fibre sets, as 
shown in Theorem 21.1.7. 


21.1.7 THEOREM: The total space is partitioned by its projection map. 
Let $(E, \pi, B)$ be a non-uniform non-topological fibration. Then $\{\pi^{-1}(\{b\});\ b \in B\}$ is a partition of $E$. In
other words, $E = \bigcup_{b \in B} E_b$ and $\forall b_1, b_2 \in B,\ (b_1 \ne b_2 \Rightarrow E_{b_1} \cap E_{b_2} = \emptyset)$.


PROOF: The assertion follows from Theorem 10.16.3 and Definition 8.7.12.
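
For a finite function, the covering and disjointness properties in Theorem 21.1.7 can be verified directly.
The following Python sketch assumes that the projection map is stored as a dictionary from total space
elements to base points. The helper names `fibre_set` and `fibre_sets` and the sample data are hypothetical.

```python
def fibre_set(pi, b):
    """Return the fibre set pi^{-1}({b}) of a finite function stored as a dict."""
    return frozenset(e for e, image in pi.items() if image == b)


def fibre_sets(pi, B):
    """Map each base point b in B to its fibre set E_b."""
    return {b: fibre_set(pi, b) for b in B}


# Assumed example: a non-uniform fibration with fibre sets of sizes 4, 1 and 1.
pi = {0: "x", 1: "x", 2: "y", 3: "x", 4: "x", 5: "z"}
B = {"x", "y", "z"}
E = set(pi)

fibres = fibre_sets(pi, B)

# The fibre sets cover the total space E.
assert set().union(*fibres.values()) == E

# Fibre sets at distinct base points are disjoint.
assert all(
    fibres[b1].isdisjoint(fibres[b2])
    for b1 in B
    for b2 in B
    if b1 != b2
)
```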


21.1.8 REMARK: A fibration is a function which is viewed as a “partition tag-map”. 
The non-uniform fibration in Definition 21.1.2 is clearly nothing more or less than a completely general
function $\pi : E \to B$ bundled up into a triple $(E, \pi, B)$ which includes its domain $E$ and a target space $B$.
The set $E$ can always be determined from $\pi$ as $E = \mathrm{Dom}(\pi)$, and the set $B$ is any set which satisfies
$B \supseteq \mathrm{Range}(\pi)$, where $\mathrm{Range}(\pi)$ is also very easily determined from $\pi$.


However, a non-uniform fibration is more than just a repackaged arbitrary general function. It is a function
which is viewed as an injective “tagging map” for a partition of a given set $E$. (See Definition 8.7.12 for
partitions.) In other words, the partitioned set $E$ is the primary focus, and the projection map provides an
injective map which tags the elements of the partition. Thus the base space $B$ provides a set of tags for the
elements of the partition of $E$. This “change of perspective” is the essence of Theorem 21.1.7, which is valid
for all functions.


Theorem 21.1.9 shows that the fibration projection-map perspective and the “partition tag-map” perspective 
are equi-informational. 


21.1.9 THEOREM: Partition tag-maps and fibration projection maps are equi-informational. 
Let $E$ and $B$ be sets. Let $P$ be a partition of $E$. Let $f : P \to B$ be an injective function.
(i) The set $\pi = \bigcup_{A \in P} (A \times \{f(A)\})$ is a well-defined function from $E$ to $B$ with $\mathrm{Range}(\pi) = \mathrm{Range}(f)$.
(ii) $\forall A \in P,\ \pi^{-1}(\{f(A)\}) = A$.
(iii) $\forall b \in \mathrm{Range}(f),\ \exists' A \in P,\ \pi^{-1}(\{b\}) = A$.
(iv) $P = \{\pi^{-1}(\{b\});\ b \in \mathrm{Range}(\pi)\}$.
(v) $f = \{(\pi^{-1}(\{b\}), b);\ b \in \mathrm{Range}(\pi)\}$.
Thus a fibration projection map $\pi = \bigcup_{A \in P} (A \times \{f(A)\})$ may be constructed from a partition tag-map $f$,
and a partition tag-map $f = \{(\pi^{-1}(\{b\}), b);\ b \in \mathrm{Range}(\pi)\}$ may be constructed from a fibration projection
map $\pi$, and these constructions are inverses of each other.


PROOF: For part (i), $\pi$ is a well-defined set by the ZF union axiom and the well-definition of Cartesian
set-products as discussed in Remark 9.4.5. All of the elements of $\pi$ are ordered pairs because for all $A \in P$,
all elements of $A \times \{f(A)\}$ are ordered pairs. Therefore $\pi$ is a relation by Definition 9.5.2.


The domain of $\pi$ is $\bigcup_{A \in P} \mathrm{Dom}(A \times \{f(A)\}) = \bigcup_{A \in P} A$, which equals $E$ by Definition 8.7.12 (i). The range
of $\pi$ is $\bigcup_{A \in P} \mathrm{Range}(A \times \{f(A)\}) = \bigcup_{A \in P} \{f(A)\}$, which equals $\mathrm{Range}(f)$. Thus $\mathrm{Range}(\pi) = \mathrm{Range}(f) \subseteq B$.
Let $b_1, b_2 \in B$ with $(e, b_1), (e, b_2) \in \pi$ for some $e \in E$. Then $(e, b_1) \in A_1 \times \{f(A_1)\}$ and $(e, b_2) \in A_2 \times \{f(A_2)\}$
for some $A_1, A_2 \in P$ by the definition of $\pi$. But then $e \in A_1 \cap A_2$, which implies that $A_1 = A_2$ by
Definition 8.7.12 (ii) because $P$ is a partition. So $b_1 = f(A_1) = f(A_2) = b_2$. In other words, $\pi(e)$ has a
unique value for all $e \in E$. Hence $\pi$ is a well-defined function from $E$ to $B$ with $\mathrm{Range}(\pi) = \mathrm{Range}(f)$.

For part (ii), let $A \in P$. Then $\pi^{-1}(\{f(A)\}) = \{e \in E;\ \pi(e) = f(A)\} = \{e \in E;\ (e, f(A)) \in \pi\}$. By part (i),
$(e, f(A)) \in \pi$ if and only if $(e, f(A)) \in A' \times \{f(A')\}$ for some $A' \in P$. If $A' = A$ then $(e, f(A)) \in A \times \{f(A)\}$
if and only if $e \in A$. If $A' \ne A$ then $(e, f(A)) \notin A' \times \{f(A')\}$ because $f$ is injective. Therefore $(e, f(A)) \in \pi$
if and only if $e \in A$. Hence $\pi^{-1}(\{f(A)\}) = \{e \in E;\ (e, f(A)) \in A \times \{f(A)\}\} = A$.

For part (iii), let $b \in \mathrm{Range}(f)$. Then $b = f(A)$ for some $A \in P$. So $\pi^{-1}(\{b\}) = A$ by part (ii). The
uniqueness of $A$ follows from the injectivity of $f$.

For part (iv), let $A \in P$. Let $b = f(A)$. Then $\pi^{-1}(\{b\}) = A$ by part (ii). Thus $A \in \{\pi^{-1}(\{b\});\ b \in \mathrm{Range}(\pi)\}$.
So $P \subseteq \{\pi^{-1}(\{b\});\ b \in \mathrm{Range}(\pi)\}$.

Now suppose that $A \in \{\pi^{-1}(\{b\});\ b \in \mathrm{Range}(\pi)\}$. Then $A = \pi^{-1}(\{b\})$ for some $b \in \mathrm{Range}(\pi)$. So
$A = \pi^{-1}(\{b\})$ for some $b \in \mathrm{Range}(f)$ by part (i). Then by part (iii), $\pi^{-1}(\{b\}) = A'$ for some unique
$A' \in P$. But then $A' = \pi^{-1}(\{b\}) = A$. So $A = A' \in P$. Thus $P \supseteq \{\pi^{-1}(\{b\});\ b \in \mathrm{Range}(\pi)\}$. Hence
$P = \{\pi^{-1}(\{b\});\ b \in \mathrm{Range}(\pi)\}$.

For part (v), let $z \in f$. Then $z = (A, f(A))$ for some $A \in P$ by Theorem 10.2.11 (ii). Let $b = f(A)$.
Then $b \in \mathrm{Range}(f)$, and $A = \pi^{-1}(\{b\})$ by part (ii). So $z = (\pi^{-1}(\{b\}), b)$ for some $b \in \mathrm{Range}(f)$. Thus
$z \in \{(\pi^{-1}(\{b\}), b);\ b \in \mathrm{Range}(\pi)\}$. Therefore $f \subseteq \{(\pi^{-1}(\{b\}), b);\ b \in \mathrm{Range}(\pi)\}$.

Now let $z \in \{(\pi^{-1}(\{b\}), b);\ b \in \mathrm{Range}(\pi)\}$. Then $z = (\pi^{-1}(\{b\}), b)$ for some $b \in \mathrm{Range}(\pi)$ by Notation 7.7.18.
But $\mathrm{Range}(\pi) = \mathrm{Range}(f)$ by part (i). So $b = f(A)$ for some $A \in P$. Then $\pi^{-1}(\{b\}) = \pi^{-1}(\{f(A)\}) = A$
by part (ii). So $z = (A, f(A))$. Therefore $z \in f$. Consequently $f \supseteq \{(\pi^{-1}(\{b\}), b);\ b \in \mathrm{Range}(\pi)\}$. Hence
$f = \{(\pi^{-1}(\{b\}), b);\ b \in \mathrm{Range}(\pi)\}$.
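
The two constructions in Theorem 21.1.9 can be sketched for finite sets as follows. In this Python fragment,
a partition tag-map is assumed to be stored as a dictionary from frozensets (the elements of the partition)
to tags, and a projection map as a dictionary from total space elements to tags. The function names
`projection_from_tag_map` and `tag_map_from_projection` and the sample data are hypothetical.

```python
def projection_from_tag_map(f):
    """Build pi as the union of the sets A x {f(A)} for an injective tag-map f."""
    pi = {}
    for A, tag in f.items():
        for e in A:
            pi[e] = tag
    return pi


def tag_map_from_projection(pi):
    """Build f = {(pi^{-1}({b}), b); b in Range(pi)} from a projection map pi."""
    fibres = {}
    for e, b in pi.items():
        fibres.setdefault(b, set()).add(e)
    return {frozenset(A): b for b, A in fibres.items()}


# Assumed example: a partition of E = {1, ..., 6} tagged injectively by strings.
f = {
    frozenset({1, 2}): "red",
    frozenset({3}): "green",
    frozenset({4, 5, 6}): "blue",
}

pi = projection_from_tag_map(f)

# Part (i): pi and f have the same range.
assert set(pi.values()) == set(f.values())
# Part (v): the tag-map f is recovered from pi.
assert tag_map_from_projection(pi) == f
# The constructions are inverses of each other in the other direction also.
assert projection_from_tag_map(tag_map_from_projection(pi)) == pi
```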


21.1.10 REMARK: The theory of fibrations focuses on surjective, nowhere-injective functions. 

Although, in a technical sense, every function is a (non-uniform) fibration, the study of fibrations focuses on 
the inverses of functions. More specifically, the theoretical framework of fibrations is principally concerned 
with the inverses of surjective, nowhere-injective functions. In other words, a function $\pi : E \to B$ is only of
serious interest as a fibration when $\pi^{-1}(\{b\}) \ne \emptyset$ for all $b \in B$, and moreover $\pi^{-1}(\{b\})$ contains more than
one element for all $b \in B$.


21.1.11 REMARK: A fibration represents sets of states at the points of a space. 

The elements of the base space of a fibration (or fibre space) typically represent locations in some geometrical 
or abstract space or space-time. The elements of the fibre set at a single point typically represent possible 
states of some kind at that point. 

Definition 21.1.2 describes a minimalist fibration which is useful for introducing the basic vocabulary of
fibrations without being distracted by additional structures such as topologies, atlases, fibre spaces and 
structure groups. 


To be thought of as a fibration, the base space should be a set of points in a geometrical space of some kind,
and the fibre set $E_b$ should be the set of possible states of an object which is located at each point $b$ of the
base space $B$. The purpose of fibrations (and fibre bundles) is to provide a “place to live” for system states
which may be decomposed into a distinct, but essentially equivalent, state at each point of a point set. For
example, a global temperature map of Earth surface temperatures consists of one and only one temperature
at each point of the Earth. The measurement of global Earth temperature is an aggregate of measurements
at each point, like for example, “the temperature in Warsaw is −22°C.” The location of this measurement
is Warsaw and the state is −22°C. The set of all such measurements would be the total space $E$, and $\pi(e)$
denotes the location of each measurement $e \in E$.


An important capability of fibrations is the ability to compare states at different points. Even if it is true 
that the sets of possible temperatures in Warsaw and Cairo are different, it is more convenient to have a 
uniform set of states across the whole point space. So at the very least, one expects the fibre sets to be 
equinumerous, as in Definition 21.2.1. 


21.2. Uniform non-topological fibrations 


21.2.1 DEFINITION: A (uniform) non-topological fibration is a tuple $(E, \pi, B)$ such that $E$ and $B$ are sets,
$\pi$ is a function $\pi : E \to B$, and


(i) $\forall b_1, b_2 \in B,\ \exists \phi : \pi^{-1}(\{b_1\}) \to \pi^{-1}(\{b_2\}),\ \phi$ is a bijection. [uniformity]
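
For finite sets, condition (i) of Definition 21.2.1 holds if and only if all fibre sets have the same number
of elements, so it can be tested by counting. The following Python sketch assumes that the projection map
is stored as a dictionary. The names `fibre_sets`, `is_uniform` and `fibre_bijection` are hypothetical, and
the bijection which is returned is an arbitrary one with no geometric significance.

```python
from collections import defaultdict


def fibre_sets(pi):
    """Group the elements of the total space by their base point."""
    fibres = defaultdict(list)
    for e, b in pi.items():
        fibres[b].append(e)
    return dict(fibres)


def is_uniform(pi, B):
    """Check condition (i) for finite sets: all fibre sets over B are equinumerous."""
    fibres = fibre_sets(pi)
    sizes = {len(fibres.get(b, [])) for b in B}
    return len(sizes) <= 1


def fibre_bijection(pi, b1, b2):
    """Return one (arbitrary) bijection between the fibre sets at b1 and b2."""
    fibres = fibre_sets(pi)
    A1, A2 = fibres.get(b1, []), fibres.get(b2, [])
    if len(A1) != len(A2):
        raise ValueError("fibre sets are not equinumerous")
    return dict(zip(A1, A2))


# Assumed example: the product E = B x C with projection (b, c) -> b is uniform.
B = {"p", "q"}
C = {0, 1, 2}
pi = {(b, c): b for b in B for c in C}
assert is_uniform(pi, B)

# A base point outside the range of a non-empty fibration breaks uniformity.
assert not is_uniform(pi, B | {"r"})
```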


21.2.2 REMARK: The equinumerosity of fibre sets. 

The equinumerosity of fibre sets $\pi^{-1}(\{b\})$ is necessary but not sufficient for a fibration to meet its intended
purpose. In the case of temperature, completely arbitrary one-to-one associations between temperatures in
Warsaw and Cairo are of no value. More structure is required on the total space so as to make meaningful
comparisons between base points. For example, a total order on each set $E_b$ would make possible the
definition of monotonic bijections between these sets, which would help to make comparisons between temperature
measurements at different points more meaningful.


21.2.3 REMARK: All fibre sets of a non-empty total space are non-empty.
Theorem 21.2.4 is trivial, but trivial assertions are sometimes false. Proving trivial assertions is a good
exercise in the application of a definition, and it helps to validate “boundary cases”.


21.2.4 THEOREM: Some trivial properties of non-topological fibrations. 
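Let $(E, \pi, B)$ be a uniform non-topological fibration.


(i) If $E \ne \emptyset$, then $\pi^{-1}(\{b\}) \ne \emptyset$ for all $b \in B$.

(ii) If $E \ne \emptyset$, then $\pi : E \to B$ is surjective.

(iii) If $E \ne \emptyset$, then $\forall U \in \mathbb{P}(B),\ \pi(\pi^{-1}(U)) = U$.

(iv) If $\#(E) = 1$, then $(E, \pi, B) = (\{z\}, \{(z, b)\}, \{b\})$ for some $z$ and $b$.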


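PROOF: For part (i), if $E \ne \emptyset$, then $z \in E$ for some $z$. Let $b = \pi(z)$. Then $z \in \pi^{-1}(\{b\})$. So $\pi^{-1}(\{b\}) \ne \emptyset$.
Let $b' \in B$. Then by Definition 21.2.1 (i), there exists a bijection $\phi : \pi^{-1}(\{b\}) \to \pi^{-1}(\{b'\})$. Let $z' = \phi(z)$.
Then $z' \in \pi^{-1}(\{b'\})$. So $\pi^{-1}(\{b'\}) \ne \emptyset$. Thus $\pi^{-1}(\{b\}) \ne \emptyset$ for all $b \in B$.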

Part (ii) follows from part (i) and Theorem 10.6.10 (i”, ii). 

Part (iii) follows from part (ii) and Theorem 10.7.1 (i"). 

For part (iv), if $\#(E) = 1$, then $E = \{z\}$ for some $z$. Let $b = \pi(z)$. Then $b \in B$, and $\pi^{-1}(\{b\}) = E$ because
$z \in \pi^{-1}(\{b\})$ and $\pi^{-1}(\{b\}) \subseteq \mathrm{Dom}(\pi) = E$. Suppose that $b' \in B$ with $b' \ne b$. Then $\pi^{-1}(\{b'\}) \ne \emptyset$ by part (i), and
$\pi^{-1}(\{b\}) \cap \pi^{-1}(\{b'\}) = \emptyset$ by Theorem 21.1.7. But $\#(\pi^{-1}(\{b'\})) = 1$ by Definition 21.2.1 (i). So there exists
$z' \ne z$ for which $z' \in \pi^{-1}(\{b'\}) \subseteq E$. Therefore $\#(E) \ge 2$, which gives a contradiction. Hence $B = \{b\}$, and
then $\pi = \{(z, b)\}$.


21.2.5 REMARK: Some trivial kinds of non-topological fibrations. 

If $B = \emptyset$ in Definition 21.2.1, then $E = \emptyset$ and $\pi = \emptyset$. The tuple $(\emptyset, \emptyset, \emptyset)$ satisfies Definition 21.2.1. One
may refer to this as “the empty fibration”. However, if $E = \emptyset$ then $B$ can be any set. Thus $(\emptyset, \emptyset, B)$ is a
non-topological fibration for any set $B$. (See Remark 21.2.6 for “everywhere-empty fibre sets”.)

If $B = \{b\}$ for some $b$, then $E$ may be any set, and there is only one choice for the projection map $\pi : E \to B$,
namely the map $\pi : e \mapsto b$ for all $e \in E$. The identity map $\phi = \mathrm{id}_E$ on $E = \pi^{-1}(\{b\})$ satisfies condition (i)
of Definition 21.2.1. So the tuple $(E, \pi, \{b\})$ is a non-topological fibration. One may refer to this as a
“single-base-point fibration”.

For any set $B$, one may construct a trivial kind of fibration with $E = B$ and $\pi = \mathrm{id}_B$. Then for any
$b_1, b_2 \in B$, one may define a bijection $\phi : \pi^{-1}(\{b_1\}) \to \pi^{-1}(\{b_2\})$ by $\phi : b_1 \mapsto b_2$. One may refer to this as a
“self-projection fibration”.

For any sets $B$ and $C$, one may define $E = B \times C$, and then define $\pi : E \to B$ by the rule $\pi : (b, c) \mapsto b$. (See
Definition 10.13.2 for Cartesian product projection maps.) Then $E_b = \pi^{-1}(\{b\}) = \{b\} \times C$ for all $b \in B$.
One may define the bijection $\phi : (b_1, c) \mapsto (b_2, c)$ from $E_{b_1}$ to $E_{b_2}$ for any $b_1, b_2 \in B$. So $(B \times C, \pi, B)$ is a
fibration, with $\pi$ as indicated, for any sets $B$ and $C$. This may be referred to as a “trivial fibration”.
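
The “trivial fibration” construction can be sketched for finite sets as follows. This is only an illustration under the assumption that $B$ and $C$ are finite. The function names `trivial_fibration` and `fibre_bijection` are hypothetical.

```python
from itertools import product


def trivial_fibration(B, C):
    """Return (E, pi) with E = B x C and pi : (b, c) -> b."""
    E = set(product(B, C))
    pi = {(b, c): b for (b, c) in E}
    return E, pi


def fibre_bijection(b1, b2, C):
    """The map phi : (b1, c) -> (b2, c) from the fibre set over b1 to the fibre set over b2."""
    return {(b1, c): (b2, c) for c in C}


B = {1, 2, 3}
C = {"x", "y"}
E, pi = trivial_fibration(B, C)
phi = fibre_bijection(1, 3, C)
```

The dict `phi` has the fibre set over 1 as its domain and the fibre set over 3 as its range, which is the uniformity condition of Definition 21.2.1 for this pair of base points.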


21.2.6 REMARK: The “undesirability” of non-surjective projection maps. 

By Theorem 21.2.4 (ii), all non-topological fibrations with a non-empty total space must have a surjective 
projection map, which means that the projection map “covers” the whole base space. This is highly desirable. 
A base space which consists of “covered points” and “uncovered points”, which is certainly possible with the 
non-uniform non-topological fibration in Definition 21.1.2, introduces irrelevant “junk points” into the base 
space, which do nothing at all. (This possibility is also mentioned in Remark 21.2.5.) 


The technical inconvenience of continually needing to treat the degenerate case where the projection map is 
not surjective seems to outweigh any possible increase in generality. On the other hand, empty fibrations are 
meaningful. Empty structures often occur in a meaningful way in many classes of structures. So requiring 
the total space to be non-empty is not desirable. However, the surjectivity of the projection map does seem 
to be a desirable condition. It would be tedious to continually require fibrations (and fibre bundles) to be 
surjective. Therefore it seems that surjectivity should be part of the definition of every projection map. 


On the other hand, a non-empty base space $B$ with an empty fibre set $\pi^{-1}(\{b\})$ at every point $b \in B$ is
certainly uniform, and it would correspond to an empty fibre space $F$ in Definition 21.2.8. Thus one may
easily exclude empty fibre sets by choosing the fibre space in Definition 21.2.10 to be non-empty.

Conclusion: Since base spaces with an empty fibre set at every point can be easily excluded by specifying
a non-empty fibre space, and a base space with everywhere-empty fibre sets could be meaningful in some
contexts, there seems to be no reason to require all projection maps to be surjections.


21.2.7 REMARK: Non-uniqueness of the fibre space for a non-topological fibration. 

Definition 21.2.1 (i) means that the sets $\pi^{-1}(\{b\})$ are pairwise equinumerous for $b \in B$. The sets $\pi^{-1}(\{b\})$
are pairwise equinumerous if and only if they are all individually equinumerous to some fixed set. Such a
fixed set is called a “fibre space” for the non-topological fibration. Clearly such a fibre space is highly non-unique. If a
fibre space exists for a given fibration, then that fibration is necessarily uniform.

If the fibre sets $E_b$ are pairwise equinumerous, one may choose a fixed set $F$ such that for all $b \in B$, there is
a bijection $\phi : E_b \to F$.


21.2.8 DEFINITION: A fibre space for a non-topological fibration $(E, \pi, B)$ is a set $F$ such that
(i) $\forall b \in B,\ \exists \phi : \pi^{-1}(\{b\}) \to F$, $\phi$ is a bijection. [uniformity]


21.2.9 REMARK: Interpretation of the fibre space as a space of measurement values. 

Measurements are sometimes made in different ways by different observers. For example, temperature may
be measured in Celsius or Fahrenheit. The temperature at $b \in B$ is independent of the measurement scale
being used. The elements of $E$ are regarded as the “real temperature”, while separate spaces are required
for the measurement values. It often happens that the definition of the “real state” of a system at a given
point is a difficult philosophical question, while the measurement values are well-defined numbers about which there
is little argument. Thus one may make conversions between Celsius and Fahrenheit without knowing
what temperature “is”. Hence the total space $E$ is generally more abstract, and possibly extra-mathematical,
whereas the measurement values are more concrete.

In the case of temperature, the fibre space for both Celsius and Fahrenheit may be taken as the set $\mathbb{R}$ of real
numbers, but the maps $\phi : \pi^{-1}(\{b\}) \to F$ are different. Definition 21.2.10 requires only the existence of such
maps, but this is only a necessary condition, not sufficient to capture the full meaning of a fibration.
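
The point that two observers use different maps into the same fibre space $F = \mathbb{R}$ can be sketched as follows. As a loud assumption, the “real temperature” states of one fibre set are modelled here by floating-point kelvin values, and the idealisation that every real number is a possible state is accepted. The text itself makes no such commitment, and all function names are hypothetical.

```python
def celsius(state_k):
    """One map from the fibre set (modelled as kelvin values) to F = R."""
    return state_k - 273.15


def fahrenheit(state_k):
    """A different map from the same fibre set to the same F = R."""
    return state_k * 9.0 / 5.0 - 459.67


def celsius_inverse(value_c):
    """Inverse of the Celsius map."""
    return value_c + 273.15


def celsius_to_fahrenheit(value_c):
    """Transition map fahrenheit o celsius^{-1} on F = R."""
    return fahrenheit(celsius_inverse(value_c))
```

The transition map acts only on measurement values, which reflects the remark that conversions can be made without knowing what temperature “is”.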


21.2.10 DEFINITION: A non-topological fibration with fibre space $F$ is a fibration $(E, \pi, B)$ such that $F$ is
a fibre space for $(E, \pi, B)$.


21.2.11 REMARK: Empty and almost-empty fibrations. 

If $B = \emptyset$ in Definition 21.2.8, then $E = \emptyset$ and $\pi = \emptyset$, as mentioned in Remark 21.2.5. Then $F$ may be any
set. Thus the empty fibration $(E, \pi, B) = (\emptyset, \emptyset, \emptyset)$ is a non-topological fibration with fibre space $F$ for any
set $F$. For any set $B$, an “almost-empty fibration” $(E, \pi, B)$ with $E = \emptyset$ and $\pi = \emptyset$ is a non-topological
fibration with fibre space $F = \emptyset$. These almost-useless comments are summarised in Theorem 21.2.12.


21.2.12 THEOREM: Degenerate non-topological fibrations with specified fibre space. 
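Let $(E, \pi, B)$ be a non-topological fibration with fibre space $F$.
(i) If $B = \emptyset$, then $E = \emptyset$ and $\pi = \emptyset$, and $F$ may be any set.
(ii) If $E = \emptyset$, then $\pi = \emptyset$.
(iii) If $E = \emptyset$ and $B \ne \emptyset$, then $\pi = \emptyset$ and $F = \emptyset$.
(iv) If $F = \emptyset$, then $E = \emptyset$ and $\pi = \emptyset$, and $B$ may be any set.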


PROOF: For part (i), let $B = \emptyset$. Then $E = \emptyset$ and $\pi = \emptyset$ by Theorem 10.2.23 (iii). Let $F$ be a set. Then by
Theorem 7.6.9, Definition 21.2.8 (i) is satisfied because $B = \emptyset$. Thus $F$ may be any set.

For part (ii), let $E = \emptyset$. Then $\pi = \emptyset$ by Theorem 10.2.23 (i).

For part (iii), let $E = \emptyset$ and $B \ne \emptyset$. Then $\pi = \emptyset$ by part (ii). Let $b \in B$. Then $\pi^{-1}(\{b\}) = \emptyset$ because $\pi = \emptyset$.
(The emptiness of fibre sets also follows from $E = \emptyset$.) Therefore $F = \emptyset$ by Definition 21.2.8 (i). Consequently
$(E = \emptyset \text{ and } B \ne \emptyset) \Rightarrow F = \emptyset$.

For part (iv), let $F = \emptyset$. If $B = \emptyset$, then $E = \emptyset$ and $\pi = \emptyset$ by part (i). So suppose that $B \ne \emptyset$. Let $b \in B$.
Then $\pi^{-1}(\{b\}) = \emptyset$ by Definition 21.2.8 (i). Thus $\forall b \in B,\ \pi^{-1}(\{b\}) = \emptyset$ by Definition 6.3.9 (UI). Therefore
$E = \emptyset$ because $\mathrm{Dom}(\pi) = \bigcup_{b \in B} \pi^{-1}(\{b\})$ by Theorem 10.6.13 (ii). So $\pi = \emptyset$ by part (ii). Thus $F = \emptyset$ implies
$E = \emptyset$ and $\pi = \emptyset$, whether $B$ is empty or not.

Now let $B$ be any set. Let $F = \emptyset$. Then there exists a non-topological fibration $(E, \pi, B)$ with fibre space $F$
because one may choose $E = \emptyset$ and $\pi = \emptyset$, which gives a valid non-topological fibration $(\emptyset, \emptyset, B)$ with fibre
space $F = \emptyset$. Thus the condition $F = \emptyset$ does not exclude any choice of $B$.


21.3. Cross-sections of non-topological fibrations 


21.3.1 REMARK: Survey of terminology for cross-sections. 
Terminology for cross-sections includes hyphenated “cross-section”, unhyphenated “cross section”, and just 
“section”. Some authors use more than one term. These are summarised in Table 21.3.1. 


year  reference                             terminology
1951  Steenrod [142], page 3                “cross-section”
1963  Auslander/MacKenzie [1]               “cross section”
1963  Kobayashi/Nomizu [19]                 “cross section”
1964  Bishop/Crittenden [2]                 “cross section”
1968  Choquet-Bruhat [6]                    “section” [fr]
1970  Spivak [37]                           “section”; sometimes “cross section”
1972  Malliavin [28], pages 69, 112         “section” [fr]
1972  Sulanke/Wintgen [40]                  “Schnitt” [de]
1977  Drechsler/Mayer [262], pages 17, 150  “cross section”, “section”
1980  Daniel/Viallet [317], page 178        “cross section”, “section”
1980  EDM2 [113]                            “cross section”
1980  Schutz [36]                           “cross-section”
1981  Bleecker [254], page 27               “section”
1981  Poor [32]                             “section”
1983  Nash/Sen [30]                         “cross-section”, “section”
1986  Crampin/Pirani [7]                    “cross-section”; alternative “section”
1987  Gallot/Hulin/Lafontaine [13]          “section”
1994  Darling [8]                           “section”
1997  Frankel [12]                          “section”; sometimes “cross section”
1997  Lee [24]                              “section”
1999  Lang [23]                             “section”
2004  Szekeres [305]                        “vector field”
2007  Morgan/Tian [29], page 3              “section”
2012  Sternberg [38]                        “section”
2015  Gómez-Ruiz [14], page 65              “sección” [es]
      Kennington                            “cross-section”
Table 21.3.1 Survey of terminology for cross-sections


The best practical choice of name may be “cross-section”. This avoids confusion with the word “section” as 
in “chapters and sections". In colloquial English, probably “section” has no disadvantages. The terminology 
“vector field” reveals the true significance of cross-sections. They are simply generalisations of the vector and 
tensor fields which are familiar from physics. Probably the best terminology of all would be “fibre fields”! 
The term “cross section” has a long history of meaning something completely different in the context of 
scattering theory. The term “section” does not suffer from this confusion. 


21.3.2 REMARK: Cross-sections of a non-topological fibration are right inverses of its projection map.

For uniform non-topological fibrations as in Definition 21.2.1, a global cross-section in Definition 21.3.3 and
Notation 21.3.4 is any right inverse of the projection map. (See Definition 47.4.2 and Notation 47.4.3 for
cross-sections of topological fibrations. See Definition 47.4.6 and Notation 47.4.7 for continuous cross-sections
of topological fibrations.)


21.3.3 DEFINITION: Cross-sections of uniform non-topological fibrations.
A (global) cross-section of a non-topological fibration $(E, \pi, B)$ is a map $X : B \to E$ such that $\pi \circ X = \mathrm{id}_B$.

A (local) cross-section of a non-topological fibration $(E, \pi, B)$ on a set $U \subseteq B$ is a map $X : U \to E$ such
that $\pi \circ X = \mathrm{id}_U$.


21.3.4 NOTATION: Sets of cross-sections of uniform non-topological fibrations. 
$X(E, \pi, B)$, for a non-topological fibration $(E, \pi, B)$, denotes the set of all global cross-sections of $(E, \pi, B)$.
In other words, $X(E, \pi, B) = \{X : B \to E;\ \pi \circ X = \mathrm{id}_B\}$.

$X(E, \pi, B \,|\, U)$, for a non-topological fibration $(E, \pi, B)$, denotes the set of all local cross-sections of $(E, \pi, B)$
on a set $U \subseteq B$. In other words, $X(E, \pi, B \,|\, U) = \{X : U \to E;\ \pi \circ X = \mathrm{id}_U\}$.

$\mathring{X}(E, \pi, B)$, for a non-topological fibration $(E, \pi, B)$, denotes the set of all local cross-sections of $(E, \pi, B)$.
In other words, $\mathring{X}(E, \pi, B) = \{X : U \to E;\ U \in \mathbb{P}(B) \text{ and } \pi \circ X = \mathrm{id}_U\}$.
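
For a finite fibration, the set of global cross-sections can be enumerated directly, since a cross-section is just one choice of element from each fibre set. The sketch below assumes finite sets whose elements can be sorted, and represents each cross-section as a dict. The name `global_cross_sections` is hypothetical.

```python
from itertools import product


def global_cross_sections(E, pi, B):
    """List all maps X : B -> E with pi(X(b)) = b for all b in B, as dicts."""
    base = sorted(B)
    # One list of candidates per base point: the fibre set over that point.
    fibres = [sorted(e for e in E if pi[e] == b) for b in base]
    # Each element of the Cartesian product picks one element from every fibre set.
    return [dict(zip(base, choice)) for choice in product(*fibres)]


E = {("a", 0), ("a", 1), ("b", 0), ("b", 1)}
pi = {e: e[0] for e in E}
sections = global_cross_sections(E, pi, {"a", "b"})
```

With an empty total space and a non-empty base space the list is empty, while for the empty fibration it contains exactly the empty map, in line with Theorem 21.3.7.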


21.3.5 THEOREM: Cross-sections of non-topological fibrations are injective. 
Let X be a cross-section of a non-topological fibration. Then X is injective. 


PROOF: Let $X$ be a (local or global) cross-section of a non-topological fibration $(E, \pi, B)$. Then $X : U \to E$
satisfies $\pi \circ X = \mathrm{id}_U$ for some $U \in \mathbb{P}(B)$, which implies that $X$ has a left inverse. Hence $X$ is injective by
Theorem 10.5.14 (ii).


21.3.6 REMARK: Existence of cross-sections of a non-topological fibration.

Theorem 10.5.14 (iii) states that if a function has a right inverse, then it is surjective. By Theorem 21.3.7,
a uniform non-topological fibration as in Definition 21.2.1 has a surjective projection map if and only if the
total space is non-empty or the base space is empty. Therefore a uniform non-topological fibration $(E, \pi, B)$
has no global cross-sections if $E = \emptyset$ and $B \ne \emptyset$. (This observation is trivial. It is the converse which is
problematic, as discussed in Remark 21.3.8.) Moreover, if $E = \emptyset$ and $B \ne \emptyset$, then the only possible local
cross-section would be the “empty cross-section”.


21.3.7 THEOREM: Condition for uniform non-topological fibration projection map to be surjective.
Let $(E, \pi, B)$ be a uniform non-topological fibration. Then $\pi : E \to B$ is surjective if and only if $E \ne \emptyset$
or $B = \emptyset$. Hence $(E = \emptyset \text{ and } B \ne \emptyset) \Rightarrow X(E, \pi, B) = \emptyset$.


PROOF: If $E \ne \emptyset$, then $\pi \ne \emptyset$ and $B \ne \emptyset$. So $\pi^{-1}(\{b\}) \ne \emptyset$ for some $b \in B$. Therefore $\pi^{-1}(\{b\}) \ne \emptyset$ for
all $b \in B$ by Definition 21.2.1 (i). So $\pi$ is surjective by Definition 10.5.2. If $B = \emptyset$, then $\pi = \emptyset$ and $E = \emptyset$.
So $\pi$ is surjective.

If it is not true that $E \ne \emptyset$ or $B = \emptyset$, then $E = \emptyset$, $\pi = \emptyset$ and $B \ne \emptyset$, which implies that $\pi$ is not surjective.

Therefore $\pi : E \to B$ is surjective if and only if $E \ne \emptyset$ or $B = \emptyset$. Hence by Theorem 10.5.14 (iii) and
Notation 21.3.4, if $E = \emptyset$ and $B \ne \emptyset$, then $X(E, \pi, B) = \emptyset$.

21.3.8 REMARK: Choice functions are required for the existence of cross-sections. 

As mentioned in Remark 10.5.16, choice functions are required in order to prove the existence of right
inverses of surjections in general. So Theorem 21.3.7 does not guarantee the existence of cross-sections if
$E \ne \emptyset$ or $B = \emptyset$, even though it does guarantee surjectivity of the projection map. Somehow an element of
$\pi^{-1}(\{b\})$ must be chosen for each $b \in B$. If $B$ is finite, this is guaranteed by the “axiom of finite choice” in
Theorem 13.7.19.

If $\#(\pi^{-1}(\{b\})) \ge 2$ for an infinite number of base points $b \in B$, then a choice function is required. However,
this is typically not problematic in practical scenarios because most fibrations in applications have an atlas
with finitely many charts. (See Section 21.5 for fibre charts.) Then choices of fibre set elements can be made
by inverting those charts. In fact, if there is a fibre atlas which has a choice function, this can be used to
choose points of fibre sets. (See Section 21.7 for fibre atlases.) For example, any well-orderable atlas would
have a choice function. So in practice, the existence of cross-sections is almost always trivial to prove.
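
The idea of choosing fibre set elements “by inverting charts” can be sketched for finite sets as follows. The sketch assumes that `phi` is a fibre chart in the sense of Section 21.5, represented as a dict on its domain, so that the pair map $e \mapsto (\pi(e), \phi(e))$ is injective. The name `section_from_chart` and the sample data are hypothetical.

```python
def section_from_chart(pi, phi, f0):
    """Local cross-section X(b) = (pi x phi)^{-1}(b, f0) on the set pi(Dom(phi))."""
    # Invert the pair map e -> (pi(e), phi(e)); this is injective when phi is a fibre chart.
    inverse = {(pi[e], phi[e]): e for e in phi}
    # For each base point b, pick the unique element whose chart value is f0.
    return {b: e for (b, f), e in inverse.items() if f == f0}


# Hypothetical data: two base points, fibre space F = {0, 1}.
pi = {("W", -22): "W", ("W", 5): "W", ("C", 30): "C", ("C", 41): "C"}
phi = {("W", -22): 0, ("W", 5): 1, ("C", 30): 0, ("C", 41): 1}
X = section_from_chart(pi, phi, 0)
```

No choice function is needed here because the single element $f_0$ of the fibre space determines one element of every fibre set in the chart domain.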


21.4. Cross-section short-cuts for form-style fibrations


21.4.1 REMARK: Short-cut versions of cross-sections of “form-style” non-topological fibrations.
Differential forms on tangent bundles are particular kinds of cross-sections. (See Section 57.6.) In practice,
they are very often replaced by a “short-cut” version which makes some analysis simpler. (See Section 57.7
for short-cut versions of differential forms on differentiable manifolds.) These “short-cuts” are possible when
for all $p \in B$, all elements $z$ of $E_p$ have the functional form $z : \bar{E}_p \to W$, where $(E, \pi, B)$ and $(\bar{E}, \bar{\pi}, B)$ are
non-topological fibrations and $W$ is a set. The total space $\bar{E}$ could be thought of as a kind of “form domain”
for the total space elements $z \in E$. Then $(\bar{E}, \bar{\pi}, B)$ could be called the “platform fibration” for $(E, \pi, B)$.

Definition 21.4.2 defines a form-style non-topological fibration to be the special kind of non-topological
fibration $(E, \pi, B)$ for which every fibre set element $z \in E_p$ is a $W$-valued function on a corresponding fibre
set $\bar{E}_p$ of a “platform” non-topological fibration $(\bar{E}, \bar{\pi}, B)$. (The case $E = \emptyset$ is excluded to avoid worthless
“platform fibrations”.) A simple practical example of this concept would be $B = M$ and $E = T^*(M)$ with
$\bar{E} = T(M)$ and $W = \mathbb{R}$ for some $C^1$ manifold $M$ as in Definition 55.4.4, where each element $w \in T^*_p(M) =
\mathrm{Lin}(T_p(M), \mathbb{R})$ is a function $w : T_p(M) \to \mathbb{R}$. (Definition 21.4.2 is illustrated in Figure 21.4.1. For the
corresponding diagram for short-cut versions of differential forms on manifolds, see Figure 57.7.1.)


Figure 21.4.1 A form-style non-topological fibration


21.4.2 DEFINITION: A form-style non-topological fibration is a non-topological fibration $(E, \pi, B)$ for which
$E \ne \emptyset$ and there exist a non-topological fibration $(\bar{E}, \bar{\pi}, B)$ and a set $W$ satisfying (i).

(i) $\forall p \in B,\ \forall z \in E_p,\ z : \bar{E}_p \to W$. In other words, every $z \in E_p$ is a function from $\bar{E}_p$ to $W$.


21.4.3 THEOREM: Some basic properties of form-style non-topological fibrations. 
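Let $(E, \pi, B)$ be a form-style non-topological fibration.

(i) $\forall z, z' \in E,\ (\pi(z) = \pi(z') \Rightarrow \mathrm{Dom}(z) = \mathrm{Dom}(z'))$.
(ii) $\forall z, z' \in E,\ (\pi(z) \ne \pi(z') \Rightarrow \mathrm{Dom}(z) \cap \mathrm{Dom}(z') = \emptyset)$.
(iii) $\forall z, z' \in E,\ (\mathrm{Dom}(z) \cap \mathrm{Dom}(z') \ne \emptyset \Rightarrow \mathrm{Dom}(z) = \mathrm{Dom}(z'))$.
(iv) If $E \ne \{\emptyset\}$, then $\forall z \in E,\ \mathrm{Dom}(z) \ne \emptyset$.
(v) If $E \ne \{\emptyset\}$, then $\forall z, z' \in E,\ (\pi(z) = \pi(z') \Leftrightarrow \mathrm{Dom}(z) = \mathrm{Dom}(z'))$.
(vi) If $E \ne \{\emptyset\}$, then $\forall z, z' \in E,\ (\pi(z) \ne \pi(z') \Leftrightarrow \mathrm{Dom}(z) \cap \mathrm{Dom}(z') = \emptyset)$.
(vii) If $E \ne \{\emptyset\}$, then $\forall z, z' \in E,\ (\mathrm{Dom}(z) \cap \mathrm{Dom}(z') \ne \emptyset \Leftrightarrow \mathrm{Dom}(z) = \mathrm{Dom}(z'))$.

PROOF: For part (i), let $z, z' \in E$ with $\pi(z) = \pi(z')$. Let $p = \pi(z)$. Then $p \in B$ and $z, z' \in E_p$. So
$\mathrm{Dom}(z) = \bar{E}_p$ and $\mathrm{Dom}(z') = \bar{E}_p$ by Definition 21.4.2 (i). Hence $\mathrm{Dom}(z) = \mathrm{Dom}(z')$.

For part (ii), let $z, z' \in E$ with $\pi(z) \ne \pi(z')$. Let $p = \pi(z)$ and $p' = \pi(z')$. Then $p, p' \in B$ and $z \in E_p$
and $z' \in E_{p'}$. So $\mathrm{Dom}(z) = \bar{E}_p$ and $\mathrm{Dom}(z') = \bar{E}_{p'}$ by Definition 21.4.2 (i). But $p \ne p'$. Therefore $\bar{E}_p \cap \bar{E}_{p'} = \emptyset$
by Theorem 21.1.7 because $(\bar{E}, \bar{\pi}, B)$ is a non-topological fibration. (Note that uniform non-topological
fibrations are special cases of non-uniform non-topological fibrations.) Hence $\mathrm{Dom}(z) \cap \mathrm{Dom}(z') = \emptyset$.

Part (iii) follows from parts (i) and (ii).

For part (iv), let $E \ne \{\emptyset\}$. Then since $E \ne \emptyset$ by Definition 21.4.2, there must be at least one $z_0 \in E$ with $z_0 \ne \emptyset$. So
$\mathrm{Dom}(z_0) \ne \emptyset$. Therefore $\mathrm{Dom}(z) \ne \emptyset$ for all $z \in E$ by the equinumerosity of fibre sets $\bar{E}_p$ of the total space
$\bar{E}$ in Definition 21.4.2.

For part (v), let $E \ne \{\emptyset\}$. Let $z, z' \in E$ satisfy $\mathrm{Dom}(z) = \mathrm{Dom}(z')$. Then $\mathrm{Dom}(z) \cap \mathrm{Dom}(z') = \mathrm{Dom}(z) \ne \emptyset$
by part (iv). So $\pi(z) = \pi(z')$ by part (ii). Thus $\forall z, z' \in E,\ (\pi(z) = \pi(z') \Leftarrow \mathrm{Dom}(z) = \mathrm{Dom}(z'))$. Hence it
follows from part (i) that $\forall z, z' \in E,\ (\pi(z) = \pi(z') \Leftrightarrow \mathrm{Dom}(z) = \mathrm{Dom}(z'))$.

For part (vi), let $E \ne \{\emptyset\}$. Let $z, z' \in E$ satisfy $\mathrm{Dom}(z) \cap \mathrm{Dom}(z') = \emptyset$. Then $\mathrm{Dom}(z) \ne \mathrm{Dom}(z')$ by
part (iv). So $\pi(z) \ne \pi(z')$ by part (i). Thus $\forall z, z' \in E,\ (\pi(z) \ne \pi(z') \Leftarrow \mathrm{Dom}(z) \cap \mathrm{Dom}(z') = \emptyset)$. Hence it
follows from part (ii) that $\forall z, z' \in E,\ (\pi(z) \ne \pi(z') \Leftrightarrow \mathrm{Dom}(z) \cap \mathrm{Dom}(z') = \emptyset)$.

Part (vii) follows from parts (iii) and (iv).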


21.4.4 REMARK: Constructing the platform fibration for a form-style non-topological fibration. 

Theorem 21.4.5 shows how to construct a fibration $(\bar{E}, \bar{\pi}, B)$ satisfying Definition 21.4.2 (i), which is required to at least
exist in Definition 21.4.2. Theorem 21.4.5 (vi) shows that this fibration is unique. Objects which exist and
are unique can be given a name! This is done in Definition 21.4.6.


21.4.5 THEOREM: Explicit construction for the platform fibration of a form-style fibration. 
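Let $(E, \pi, B)$ be a form-style non-topological fibration with $E \ne \{\emptyset\}$. Let $\hat{E} = \bigcup_{z \in E} \mathrm{Dom}(z)$ and define
$\hat{\pi} : \hat{E} \to B$ by $\forall z \in E,\ \forall \hat{z} \in \mathrm{Dom}(z),\ \hat{\pi}(\hat{z}) = \pi(z)$.

(i) $\hat{\pi} : \hat{E} \to B$ is a well-defined function.
(ii) $(\hat{E}, \hat{\pi}, B)$ is a non-topological fibration.
(iii) $\forall p \in B,\ \hat{\pi}^{-1}(\{p\}) = \bigcup_{z \in E_p} \mathrm{Dom}(z)$.
(iv) $\forall p \in B,\ \forall z_0 \in E_p,\ \mathrm{Dom}(z_0) = \bigcup_{z \in E_p} \mathrm{Dom}(z)$.
(v) $\forall p \in B,\ \forall z \in E_p,\ \hat{\pi}^{-1}(\{p\}) = \mathrm{Dom}(z)$.
(vi) $(\hat{E}, \hat{\pi}, B)$ is the unique non-topological fibration which satisfies Definition 21.4.2 (i).

PROOF: For part (i), $\hat{E}$ is a well-defined set by the ZF union axiom, Definition 7.2.4 (4), because $\mathrm{Dom}(z)$
is a well-defined set for all $z \in E$. Then $\mathrm{Dom}(z) \subseteq \hat{E}$ for all $z \in E$ by Theorem 8.4.8 (xiv).

To show that $\hat{\pi} : \hat{E} \to B$ is well-defined, let $\hat{z} \in \hat{E}$. Then the definition of $\hat{E}$ implies that $\hat{z} \in \mathrm{Dom}(z)$ for
some $z \in E$, where $z$ is not necessarily unique. So $\hat{\pi}(\hat{z}) = \pi(z)$, but this could potentially be non-unique.
Suppose that $\hat{z} \in \mathrm{Dom}(z)$ and $\hat{z} \in \mathrm{Dom}(z')$ for $z, z' \in E$. Then $\mathrm{Dom}(z) \cap \mathrm{Dom}(z') \ne \emptyset$. So $\pi(z) = \pi(z')$ by
Theorem 21.4.3 (vi). Thus $\pi(z)$ is uniquely determined by $\hat{z} \in \hat{E}$. So $\hat{\pi} : \hat{E} \to B$ is a well-defined function.

For part (ii), $E \ne \emptyset$ implies $E_p \ne \emptyset$ for all $p \in B$ by Theorem 21.2.4 (i). Let $p_1, p_2 \in B$. Then $\hat{E}_{p_1} = \mathrm{Dom}(z_1)$
for all $z_1 \in E_{p_1}$. Therefore $\hat{E}_{p_1} = \mathrm{Dom}(z_1)$ for some $z_1 \in E_{p_1}$ because $E_{p_1} \ne \emptyset$. Similarly $\hat{E}_{p_2} = \mathrm{Dom}(z_2)$
for some $z_2 \in E_{p_2}$. But then by Definition 21.4.2, there exists a non-topological fibration $(\bar{E}, \bar{\pi}, B)$ for
which $\mathrm{Dom}(z_1) = \bar{E}_{p_1}$ and $\mathrm{Dom}(z_2) = \bar{E}_{p_2}$. Consequently, by Definition 21.2.1 (i), there exists a bijection
$\phi : \bar{E}_{p_1} \to \bar{E}_{p_2}$. But $\bar{E}_{p_1} = \hat{E}_{p_1}$ and $\bar{E}_{p_2} = \hat{E}_{p_2}$. So $\phi : \hat{E}_{p_1} \to \hat{E}_{p_2}$. Thus $(\hat{E}, \hat{\pi}, B)$ also satisfies
Definition 21.2.1 (i). Hence $(\hat{E}, \hat{\pi}, B)$ is a non-topological fibration.

For part (iii), let $p \in B$. Let $\hat{z} \in \hat{\pi}^{-1}(\{p\})$. Then $\hat{z} \in \hat{E}$ and $\hat{\pi}(\hat{z}) = p$. So $\hat{z} \in \mathrm{Dom}(z)$ for some $z \in E$,
and then $\pi(z) = \hat{\pi}(\hat{z}) = p$ by the definition of $\hat{\pi}$. Therefore $z \in E_p$. Thus $\hat{z} \in \mathrm{Dom}(z)$ for some $z \in E_p$.
So $\hat{\pi}^{-1}(\{p\}) \subseteq \bigcup_{z \in E_p} \mathrm{Dom}(z)$. Now let $\hat{z} \in \bigcup_{z \in E_p} \mathrm{Dom}(z)$. Then $\hat{z} \in \mathrm{Dom}(z)$ for some $z \in E_p$. So
$\hat{\pi}(\hat{z}) = \pi(z) = p$ by the definition of $\hat{\pi}$. Therefore $\hat{z} \in \hat{\pi}^{-1}(\{p\})$. Consequently $\hat{\pi}^{-1}(\{p\}) = \bigcup_{z \in E_p} \mathrm{Dom}(z)$.

For part (iv), let $p \in B$ and $z_0 \in E_p$. Then $\mathrm{Dom}(z_0) \subseteq \bigcup_{z \in E_p} \mathrm{Dom}(z)$ by Theorem 8.4.8 (xiv). Now suppose
that $\hat{z} \in \bigcup_{z \in E_p} \mathrm{Dom}(z)$. Then $\hat{z} \in \mathrm{Dom}(z)$ for some $z \in E_p$. For such $z$, suppose that $\mathrm{Dom}(z) \ne \mathrm{Dom}(z_0)$.
Then $\pi(z) \ne \pi(z_0)$ by Theorem 21.4.3 (i). So $p \ne p$, which contradicts Theorem 6.7.9 (ix). Therefore
$\mathrm{Dom}(z) = \mathrm{Dom}(z_0)$. So $\hat{z} \in \mathrm{Dom}(z_0)$. Thus $\bigcup_{z \in E_p} \mathrm{Dom}(z) \subseteq \mathrm{Dom}(z_0)$. Hence $\mathrm{Dom}(z_0) = \bigcup_{z \in E_p} \mathrm{Dom}(z)$.

Part (v) follows from parts (iii) and (iv).

For part (vi), let $p \in B$ and $z \in E_p$. Then $\mathrm{Dom}(z) = \hat{E}_p$ by part (v), and $\mathrm{Range}(z) \subseteq W$ because
$(\bar{E}, \bar{\pi}, B)$ satisfies Definition 21.4.2 (i). Thus by part (ii), $(\hat{E}, \hat{\pi}, B)$ is a non-topological fibration which
satisfies Definition 21.4.2 (i).

To show uniqueness, let $(\bar{E}, \bar{\pi}, B)$ satisfy Definition 21.4.2 (i). Let $p \in B$. Then $E_p \ne \emptyset$ by Theorem 21.2.4 (i)
because $E \ne \emptyset$ by Definition 21.4.2. So $z \in E_p$ exists. Then $\mathrm{Dom}(z) = \bar{E}_p$ by Definition 21.4.2 (i). But
$\mathrm{Dom}(z) = \hat{E}_p$ by part (v). So $\bar{E}_p = \hat{E}_p$. Thus $\bar{E} = \bigcup_{p \in B} \bar{E}_p = \bigcup_{p \in B} \hat{E}_p = \hat{E}$. To show that $\bar{\pi} = \hat{\pi}$, first note
that $\mathrm{Dom}(\bar{\pi}) = \bar{E} = \hat{E} = \mathrm{Dom}(\hat{\pi})$. Let $\bar{z} \in \bar{E} = \hat{E}$. Suppose that $\bar{\pi}(\bar{z}) \ne \hat{\pi}(\bar{z})$. Let $p_1 = \bar{\pi}(\bar{z})$ and $p_2 = \hat{\pi}(\bar{z})$.
Then $\bar{z} \in \bar{E}_{p_1}$ and $\bar{z} \in \hat{E}_{p_2}$. So $\bar{E}_{p_1} \cap \hat{E}_{p_2} \ne \emptyset$. But $\bar{E}_{p_1} = \mathrm{Dom}(z_1)$ for all $z_1 \in E_{p_1}$ by Definition 21.4.2 (i),
and $\hat{E}_{p_2} = \mathrm{Dom}(z_2)$ for all $z_2 \in E_{p_2}$ by part (v). Then $\mathrm{Dom}(z_1) \cap \mathrm{Dom}(z_2) \ne \emptyset$. So $\pi(z_1) = \pi(z_2)$ by
Theorem 21.4.3 (vi). Therefore $p_1 = p_2$, which contradicts the assumption. Thus $\bar{\pi} = \hat{\pi}$. So $(\bar{E}, \bar{\pi}, B) = (\hat{E}, \hat{\pi}, B)$. In other words, $(\hat{E}, \hat{\pi}, B)$ is
the unique non-topological fibration which satisfies Definition 21.4.2 (i).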
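
The construction of the total space as the union of the domains $\mathrm{Dom}(z)$, together with the induced projection map, can be sketched for finite data as follows. As an assumption made only for this illustration, each function $z$ is represented as a frozenset of (argument, value) pairs so that it can itself be an element of a Python set. The names `dom` and `platform_fibration` are hypothetical.

```python
def dom(z):
    """Domain of a function z given as a frozenset of (argument, value) pairs."""
    return {arg for arg, _ in z}


def platform_fibration(E, pi):
    """Return (E_hat, pi_hat): the union of the domains Dom(z), and the map z_hat -> pi(z)."""
    E_hat = set()
    pi_hat = {}
    for z in E:
        for z_hat in dom(z):
            # Well-definedness check: z_hat must not lie over two different base points.
            if pi_hat.setdefault(z_hat, pi[z]) != pi[z]:
                raise ValueError("domains overlap across different base points")
            E_hat.add(z_hat)
    return E_hat, pi_hat


# Hypothetical form-style fibration over B = {"p", "q"} with W = {10, 20, 30, 40}.
z1 = frozenset({(("p", 0), 10), (("p", 1), 20)})
z2 = frozenset({(("p", 0), 30), (("p", 1), 40)})
z3 = frozenset({(("q", 0), 10), (("q", 1), 20)})
z4 = frozenset({(("q", 0), 30), (("q", 1), 40)})
pi = {z1: "p", z2: "p", z3: "q", z4: "q"}
E_hat, pi_hat = platform_fibration(set(pi), pi)
```

The `ValueError` branch corresponds to a failure of the well-definedness argument in part (i), which cannot occur for a genuine form-style fibration.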


21.4.6 DEFINITION: The platform fibration of a form-style non-topological fibration $(E, \pi, B)$ is the unique
non-topological fibration $(\bar{E}, \bar{\pi}, B)$ which satisfies Definition 21.4.2 (i).


21.4.7 THEOREM: Some very basic properties of platform fibrations of form-style fibrations. 
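Let $(E, \pi, B)$ be a form-style non-topological fibration. Let $(\bar{E}, \bar{\pi}, B)$ be the platform fibration of $(E, \pi, B)$.

(i) $\bar{E} = \bigcup_{z \in E} \mathrm{Dom}(z)$.
(ii) $\bar{\pi} : \bar{E} \to B$ satisfies $\forall z \in E,\ \forall \bar{z} \in \mathrm{Dom}(z),\ \bar{\pi}(\bar{z}) = \pi(z)$.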


PROOF: The assertions follow from Theorem 21.4.5 (vi). 


21.4.8 REMARK: The short-cut map for a form-style non-topological fibration. 

The platform fibration in Definition 21.4.6 may now be applied to the construction of “short-cuts” for
cross-sections in Definition 21.4.9. The short-cut construction map converts a given cross-section $X \in X(E, \pi, B)$
to a map $\bar{X} : \bar{E} \to W$, which is defined by $\bar{X} = \rho(X) : \bar{z} \mapsto X(\bar{\pi}(\bar{z}))(\bar{z})$. (This is illustrated in Figure 21.4.2.)


Figure 21.4.2 Short-cut map for a form-style non-topological fibration 


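21.4.9 DEFINITION: The short-cut map for a form-style non-topological fibration $(E, \pi, B)$ with platform
fibration $(\bar{E}, \bar{\pi}, B)$ with target space $W$ is the map $\rho : X(E, \pi, B) \to (\bar{E} \to W)$ defined by

$\forall X \in X(E, \pi, B),\ \forall \bar{z} \in \bar{E},\quad \rho(X)(\bar{z}) = X(\bar{\pi}(\bar{z}))(\bar{z})$.

The (local) short-cut map for a form-style non-topological fibration $(E, \pi, B)$ with platform fibration $(\bar{E}, \bar{\pi}, B)$
with target space $W$, on a set $U \in \mathbb{P}(B)$, is the map $\rho : X(E, \pi, B \,|\, U) \to (\bar{U} \to W)$ defined by

$\forall X \in X(E, \pi, B \,|\, U),\ \forall \bar{z} \in \bar{U},\quad \rho(X)(\bar{z}) = X(\bar{\pi}(\bar{z}))(\bar{z})$,

where $\bar{U} = \bigcup_{z \in \pi^{-1}(U)} \mathrm{Dom}(z)$.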

21.4.10 NOTATION: $\bar{X}(E, \pi, B)$, for a form-style non-topological fibration $(E, \pi, B)$, denotes the range of
the short-cut map for $(E, \pi, B)$. In other words, $\bar{X}(E, \pi, B) = \{\rho(X);\ X \in X(E, \pi, B)\}$, where $\rho$ is the
short-cut map for $(E, \pi, B)$.

$\bar{X}(E, \pi, B \,|\, U)$, for a form-style non-topological fibration $(E, \pi, B)$, denotes the range of the short-cut map for
local cross-sections of $(E, \pi, B)$ on $U \in \mathbb{P}(B)$. In other words, $\bar{X}(E, \pi, B \,|\, U) = \{\rho(X);\ X \in X(E, \pi, B \,|\, U)\}$.


21.4.11 REMARK: Constraints on short-cuts of cross-sections.

The set $\bar{X}(E, \pi, B)$ in Notation 21.4.10 is a subset of $W^{\bar{E}}$, the set of all functions from $\bar{E}$ to $W$, because
$\bar{X}(E, \pi, B) = \rho(X(E, \pi, B))$ is the range of $\rho$. Although this map is clearly injective, it is typically not
surjective. The elements of X(E,7, B) inherit specific constraints from E. For example, if E = T^" W = 
Ao (T(M), W) as in Definition 56.5.17, then E = T"(M), where M is a C! manifold, m € Zf, and W isa 
real linear space. In this case, the maps p(X) for X € X(E,m, B) are maps from T" (M) to W for which 


the restriction X m (M) is both linear and antisymmetric for all p € M. These constraints are inherited 
from As (T(M),W). 


21.4.12 REMARK: Simple expression for the short-cut map. 

Theorem 21.4.13 (ii) means that each short-cut $\rho(X)$ is the disjoint union of the functions $X(p) : \bar{E}_p \to W$
on the fibre sets $\bar{E}_p$, which constitute a disjoint union of $\bar{E}$. In other words, $\rho(X)$ is equal to the aggregate
of the fibre-set-wise partial functions $\rho(X)|_{\bar{E}_p} = X(p)$ for $p \in B$.

Theorem 21.4.13 (iv) shows that each original cross-section $X$ can be reconstructed from the corresponding
“short-cut cross-section” $\rho(X)$. The existence of this inverse map from $\rho(X)$ back to $X$ implies that the
short-cut map $\rho : X(E, \pi, B) \to \bar{X}(E, \pi, B)$ is a bijection. This is important because it means that the use
of “short-cuts” for cross-sections of form-style fibrations never loses information. This is usually intuitively
obvious in practical applications, but it is comforting to have a general proof.


21.4.13 THEOREM: Some basic properties of short-cut maps for form-style fibrations. 
Let $\rho : X(E, \pi, B) \to (\bar{E} \to W)$ be the short-cut map for a form-style non-topological fibration $(E, \pi, B)$
with platform fibration $(\bar{E}, \bar{\pi}, B)$.
(i) $\forall X \in X(E, \pi, B),\ \forall p \in B,\ \rho(X)|_{\bar{E}_p} = X(p)$.
(ii) $\forall X \in X(E, \pi, B),\ \rho(X) = \bigcup_{p \in B} X(p) = \bigcup \mathrm{Range}(X)$.
(iii) $\bar{X}(E, \pi, B) = \mathrm{Range}(\rho) = \{\bigcup_{p \in B} X(p);\ X \in X(E, \pi, B)\} = \{\bigcup \mathrm{Range}(X);\ X \in X(E, \pi, B)\}$.
(iv) $\forall X \in X(E, \pi, B),\ X = \{(p, X(p));\ p \in B\} = \{(p, \rho(X)|_{\bar{E}_p});\ p \in B\}$.
(v) $\rho : X(E, \pi, B) \to \bar{X}(E, \pi, B)$ is a bijection.
(vi) $\forall Y \in \bar{X}(E, \pi, B),\ \{(p, Y|_{\bar{E}_p});\ p \in B\} \in X(E, \pi, B)$.


PROOF: For part (i), let $X \in X(E, \pi, B)$ and $p \in B$. By Definition 21.4.9, $\rho(X)(\bar{z}) = X(p)(\bar{z})$ for all
$\bar{z} \in \bar{E}_p$. Therefore $\rho(X)|_{\bar{E}_p} = X(p)$ by Theorem 10.2.13.

Part (ii) follows from part (i) because $\mathrm{Dom}(\rho(X)) = \bar{E}$ is equal to the disjoint union of the sets $\bar{E}_p$ for $p \in B$,
for all $X \in X(E, \pi, B)$.

Part (iii) follows from parts (i) and (ii) and Definition 21.4.9 and Notation 21.4.10.

Part (iv) follows from part (i) and Theorem 10.4.5 line (10.4.1).

For part (v), $\rho : X(E, \pi, B) \to \bar{X}(E, \pi, B)$ is a well-defined surjection by part (iii) and Notation 21.4.10.
Let $X_1, X_2 \in X(E, \pi, B)$ satisfy $\rho(X_1) = \rho(X_2)$. Then $X_1(\bar{\pi}(\bar{z}))(\bar{z}) = X_2(\bar{\pi}(\bar{z}))(\bar{z})$ for all $\bar{z} \in \bar{E}$ by
Definition 21.4.9. So $X_1(p) = X_2(p)$ for all $p \in B$. Therefore $X_1 = X_2$. Hence $\rho : X(E, \pi, B) \to \bar{X}(E, \pi, B)$
is a bijection.

For part (vi), let $Y \in \bar{X}(E, \pi, B)$. Then by Notation 21.4.10, $Y = \rho(X)$ for some $X \in X(E, \pi, B)$. This value
of $X \in X(E, \pi, B)$ is unique by part (v). Then by part (iv), $X = \{(p, \rho(X)|_{\bar{E}_p});\ p \in B\} = \{(p, Y|_{\bar{E}_p});\ p \in B\}$.
Hence $\{(p, Y|_{\bar{E}_p});\ p \in B\} \in X(E, \pi, B)$.
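
The short-cut map and its inverse can be sketched for finite data as follows. The sketch assumes that the platform projection map is given as a dict `pi_bar` on the platform total space and that a cross-section `X` is a dict sending each base point to a function, itself represented as a dict on the platform fibre set over that point. The names `shortcut` and `unshortcut` are hypothetical.

```python
def shortcut(X, pi_bar):
    """rho(X)(z_bar) = X(pi_bar(z_bar))(z_bar) for every z_bar in the platform total space."""
    return {z_bar: X[p][z_bar] for z_bar, p in pi_bar.items()}


def unshortcut(Y, pi_bar, B):
    """Rebuild X from Y = rho(X): X(p) is Y restricted to the platform fibre set over p."""
    return {p: {z_bar: w for z_bar, w in Y.items() if pi_bar[z_bar] == p} for p in B}


# Hypothetical platform fibration over B = {"p", "q"} and a cross-section X with real values.
pi_bar = {("p", 0): "p", ("p", 1): "p", ("q", 0): "q", ("q", 1): "q"}
X = {
    "p": {("p", 0): 1.5, ("p", 1): -2.0},
    "q": {("q", 0): 0.0, ("q", 1): 7.0},
}
Y = shortcut(X, pi_bar)
X_again = unshortcut(Y, pi_bar, {"p", "q"})
```

Recovering `X` from `Y` in this way is the finite counterpart of the reconstruction formula in part (iv), which is what makes the short-cut map injective.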


21.5. Fibre charts for non-topological fibrations 


21.5.1 REMARK: Fibre charts are aggregations of per-point fibre space bijections. 

A “fibre chart” is the union of one fibre space bijection per point for any number of base-space points in a
fibration. Hence Definition 21.5.2 (i) specifies that the domain of a fibre chart must be of the form $\pi^{-1}(U)$ for
some $U \in \mathbb{P}(B)$, which is the same as $\bigcup_{b \in U} \pi^{-1}(\{b\})$. (This is a disjoint union by Theorem 10.6.10 (iii, iv).)
So an element $e \in E$ is in the domain of $\phi$ if and only if $\exists b \in U,\ e \in \pi^{-1}(\{b\})$. Therefore $b \in \pi(\mathrm{Dom}(\phi))$ if
and only if $\pi^{-1}(\{b\}) \subseteq \mathrm{Dom}(\phi)$. Thus if at least one element of a fibre set $\pi^{-1}(\{b\})$ is in the domain of $\phi$,
then all elements of that fibre set are in the domain. In other words: “One in, all in.”

[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]
716 21. Non-topological fibre bundles

Condition (i) could be written more briefly as $\mathrm{Dom}(\phi) = \pi^{-1}(\pi(\mathrm{Dom}(\phi)))$, which avoids the need to specify
a set $U$, but the requirement that $\mathrm{Dom}(\phi) = \pi^{-1}(U)$ for some $U \in \mathbb{P}(B)$ is probably easier to interpret.
Definition 21.5.2 is illustrated in Figure 21.5.1. (See Notation 21.1.3 for the abbreviation $E_b = \pi^{-1}(\{b\})$.)


Figure 21.5.1 Uniform non-topological fibration with fibre chart
(Figure labels: $F$ fibre space, $\phi$ fibre chart, $E$ total space, $\pi$ projection map, $B$ base space.)


21.5.2 DEFINITION: A fibre chart with fibre (space) $F$ for a non-topological fibration $(E, \pi, B)$ is a partial
function $\phi : E \to F$ such that

(i) $\exists U \in \mathbb{P}(B),\ \mathrm{Dom}(\phi) = \pi^{-1}(U)$, [fibre set “one in, all in”]
(ii) $\pi \times \phi : \mathrm{Dom}(\phi) \to \pi(\mathrm{Dom}(\phi)) \times F$ is a bijection. [local trivialisation]

In other words, $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a bijection for some $U \subseteq B$.
A fibre chart with fibre $F$ for $(E, \pi, B)$ may also be called an $F$-fibre chart for $(E, \pi, B)$.
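
For finite sets, both conditions of Definition 21.5.2 can be tested mechanically. The sketch below assumes that the partial function is represented as a dict whose keys form its domain. Condition (i) is checked with the particular choice of $U$ equal to the projection of the domain, which is sufficient because any other valid $U$ has the same inverse image. The name `is_fibre_chart` is hypothetical.

```python
def is_fibre_chart(phi, E, pi, F):
    """Finite-set check of Definition 21.5.2 for a partial function phi given as a dict."""
    dom_phi = set(phi)
    U = {pi[e] for e in dom_phi}
    # (i) "One in, all in": Dom(phi) must equal the inverse image of U.
    if dom_phi != {e for e in E if pi[e] in U}:
        return False
    # (ii) The pair map e -> (pi(e), phi(e)) must be a bijection from Dom(phi) onto U x F.
    image = {(pi[e], phi[e]) for e in dom_phi}
    injective = len(image) == len(dom_phi)
    surjective = image == {(b, f) for b in U for f in F}
    return injective and surjective


E = {("a", 0), ("a", 1), ("b", 0), ("b", 1)}
pi = {e: e[0] for e in E}
F = {"lo", "hi"}
chart = {("a", 0): "lo", ("a", 1): "hi"}
ok = is_fibre_chart(chart, E, pi, F)
```

A map which is defined on only part of a fibre set, or which sends two elements of one fibre set to the same element of $F$, is rejected by this check.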


21.5.3 REMARK: Inverse projections of projections of domains of fibre charts. 

The almost-trivial Theorem 21.5.4 is often required in proofs of fibre bundle theorems, but is almost never
invoked explicitly. It is not entirely obvious because although a surjective function $\pi : E \to B$ obeys
$\pi \circ \pi^{-1} = \mathrm{id}_{\mathbb{P}(B)}$ by Theorem 21.2.4 (iii), the assertion “$\pi^{-1} \circ \pi = \mathrm{id}_{\mathbb{P}(E)}$” is not valid in general. It is the special
form of the sets $\mathrm{Dom}(\phi)$ in Definition 21.5.2 (i) which makes Theorem 21.5.4 valid.

Note that Theorem 21.5.4 does not utilise Definition 21.5.2 condition (ii). In fact, it is actually a theorem
about general sets of the form $\pi^{-1}(U)$ for $U \in \mathbb{P}(B)$, where $U = \emptyset$ or $\mathrm{Range}(\pi) = B$.


21.5.4 THEOREM: Projections of domains of fibre charts. 
Let (E, n, B) be a uniform non-topological fibration. Let F be a set. 


(i) If à is a fibre chart for (E, x, B) with fibre space F, then 7~!(m(Dom(¢))) = Dom(¢). 
(ii) If ¢ is a fibre chart for (E, v, B) with fibre space F and U = v(Dom(9)), then a^! (U) = Dom(¢). 
(iii) If Q4 and $» are fibre charts for (E, n, B) with fibre space F’, then 
1 l(x(Dom(Qi))) = 7~!(a(Dom(¢2))) implies Dom(¢,) = Dom(¢z). 
(iv) If $4 and ¢2 are fibre charts for (E,7, B) with fibre space F and Ug = 7(Dom(¢x)) for k = 1,2, then 
1 1(Ui) = 4-!(Us) implies Dom(¢1) = Dom(9»). 
PROOF: For part (i), let φ be a fibre chart for (E, π, B) with fibre space F. Suppose that Dom(φ) = ∅.
Then π⁻¹(π(Dom(φ))) = π⁻¹(π(∅)) = ∅ = Dom(φ).
Now suppose that Dom(φ) ≠ ∅. By Definition 21.5.2 (i), Dom(φ) = π⁻¹(U) ⊆ E for some U ∈ P(B). So
E ≠ ∅ because E ⊇ Dom(φ) ≠ ∅. Therefore π : E → B is surjective by Theorem 21.2.4 (ii). Consequently
π(Dom(φ)) = π(π⁻¹(U)) = U by Theorem 10.7.1 (i″). Therefore π⁻¹(π(Dom(φ))) = π⁻¹(U) = Dom(φ).
Thus π⁻¹(π(Dom(φ))) = Dom(φ) in either case.
Part (ii) follows directly from part (i).
For part (iii), let φ₁, φ₂ be fibre charts for (E, π, B) with fibre space F. Then π⁻¹(π(Dom(φ_k))) = Dom(φ_k)
for k = 1, 2 by part (i). Hence π⁻¹(π(Dom(φ₁))) = π⁻¹(π(Dom(φ₂))) implies Dom(φ₁) = Dom(φ₂).


Part (iv) follows directly from part (iii).
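
The content of Theorem 21.5.4 (i) can be checked by hand on a small finite example. The following Python sketch is only an illustration under stated assumptions: the toy fibration and the names image and preimage are hypothetical and are not part of this book's notation. It shows that π⁻¹(π(S)) = S holds for a set S of the form π⁻¹(U), but fails for a set which is not of that form.

```python
# Hypothetical toy fibration: E = {(b, k)}, B = {0, 1, 2}, pi((b, k)) = b.
pi = {(b, k): b for b in (0, 1, 2) for k in (0, 1)}

def image(pi, S):
    """Image pi(S) of a subset S of the total space."""
    return {pi[e] for e in S}

def preimage(pi, U):
    """Inverse image pi^{-1}(U) of a subset U of the base space."""
    return {e for e, b in pi.items() if b in U}

# A set of the special form pi^{-1}(U), as in Definition 21.5.2 (i).
saturated = preimage(pi, {0, 1})
# A set which is not an inverse image of any subset of the base.
unsaturated = {(0, 0), (1, 1)}

saturated_is_fixed = preimage(pi, image(pi, saturated)) == saturated
unsaturated_grows = preimage(pi, image(pi, unsaturated)) > unsaturated
```

The second comparison uses the strict superset test for Python sets, which reflects the fact that π⁻¹(π(S)) always contains S.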
21.5.5 REMARK: A fibre chart can be decomposed into per-point fibre space bijections.

The product π × φ in Definition 21.5.2 (ii) denotes the common-domain product of the functions π and φ.
(See Definition 10.15.2 and Notation 10.15.3.) The projection map π and a fibre chart φ are similar in
that they project the total space to a component space, either B or F respectively. In fact, the functions
π ∘ (π × φ)⁻¹ : U × F → U and φ ∘ (π × φ)⁻¹ : U × F → F are the Cartesian product set projection maps
for U × F, as presented in Definition 10.13.2.


Definition 21.5.2 (ii) is equivalent to requiring the map φ|_{π⁻¹({b})} : π⁻¹({b}) → F to be a bijection for
all b ∈ U. This is shown in Theorem 21.5.6.


21.5.6 THEOREM: Restrictions of fibre charts to fibre sets are bijections. 
Let (E, π, B) be a uniform non-topological fibration. Let U ⊆ B. Let φ : π⁻¹(U) → F for some set F.


(i) If φ is an F-fibre chart for (E, π, B), then φ|_{π⁻¹({b})} : π⁻¹({b}) → F is a bijection for all b ∈ U.

(ii) If φ|_{π⁻¹({b})} : π⁻¹({b}) → F is a bijection for all b ∈ U, then φ is an F-fibre chart for (E, π, B).
PROOF: For part (i), let φ be an F-fibre chart for (E, π, B). Then π × φ : π⁻¹(U) → U × F is a bijection by
Definition 21.5.2 (ii). Hence φ|_{π⁻¹({b})} : π⁻¹({b}) → F is a bijection for all b ∈ U by Theorem 10.15.10 (i).


For part (ii), suppose that φ|_{π⁻¹({b})} is a bijection for all b ∈ U. Then π × φ : π⁻¹(U) → U × F is a bijection
by Theorem 10.15.10 (iii). Hence φ is an F-fibre chart for (E, π, B) by Definition 21.5.2 (ii).
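
The equivalence in Theorem 21.5.6 can be tested on a finite example. The following Python sketch is an illustration only, assuming a hypothetical toy fibration with two-element fibre sets; the function names are invented for this example. It checks that the product π × φ is a bijection exactly when every restriction of φ to a fibre set is a bijection.

```python
def is_bijection(f, domain, codomain):
    """Check that the dict f maps domain one-to-one onto codomain."""
    values = [f[x] for x in domain]
    return len(set(values)) == len(values) and set(values) == set(codomain)

# Hypothetical toy data: base B, fibre space F, total space E, projection pi.
B = (0, 1, 2)
F = ('a', 'b')
E = [(b, k) for b in B for k in (0, 1)]
pi = {e: e[0] for e in E}
U = (0, 1)
W = [e for e in E if pi[e] in U]          # W = pi^{-1}(U)

# A map which is bijective on each fibre set over U.
phi = {(0, 0): 'a', (0, 1): 'b', (1, 0): 'b', (1, 1): 'a'}
# A map which is surjective onto F, but constant on each fibre set.
bad = {(0, 0): 'a', (0, 1): 'a', (1, 0): 'b', (1, 1): 'b'}

def product_is_bijection(chart):
    """Test whether pi x chart : W -> U x F is a bijection."""
    prod = {e: (pi[e], chart[e]) for e in W}
    return is_bijection(prod, W, [(b, f) for b in U for f in F])

def fibrewise_bijections(chart):
    """Test whether chart restricted to pi^{-1}({b}) is a bijection for all b in U."""
    return all(
        is_bijection(chart, [e for e in W if pi[e] == b], F) for b in U
    )
```

The map bad shows that surjectivity of φ onto F is not enough. The bijection must hold on each fibre set separately.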


21.5.7 REMARK: Local trivialisation. 

In the context of topological fibrations and fibre bundles, a condition resembling Definition 21.5.2 (ii) is
often referred to as a “local trivialisation” condition. In the case of non-topological fibrations and fibre
bundles, the existence of a bijection π × φ : π⁻¹(U) → U × F is equivalent to the existence of only pointwise
bijections, whereas in the topological case, this bijection is required to be a homeomorphism, which is stronger
than the corresponding pointwise homeomorphism condition. Thus for Definition 21.5.2 (ii), the label “local
trivialisation” is unjustified. It has the right form, but there is nothing “local” about it because there is no
topology. (As mentioned in Remark 31.1.1, topology is “the study of locality”.)


Although non-topological local trivialisations do not bestow any locality on “neighbouring” fibre sets, they 
do deliver substantial trivialisation. Theorem 21.5.6 (i) implies the equinumerosity of all fibre sets to the 
nominal fibre space F by Definition 13.1.2. So they are “cardinality-equivalent” subsets of the total space. 
This is a very humble harbinger of the approaching topological and differentiable fibre bundles, where each 
fibre set will have isomorphic topological or differentiable submanifold structure within the total space, and 
the restrictions of trivialisations to fibre sets will be topological or differentiable isomorphisms. (The relevant 
assertions are in Theorems 47.3.11 (i) and 64.3.9, and in Section 64.11.) 


21.5.8 REMARK: Interpretation of fibre charts as observation/measurement maps. 

Despite the lack of real “locality” in the “local trivialisation” condition, a fibre chart does have some
real-world significance if one considers that a single function φ aggregates into a single mathematical structure
the measurement maps φ|_{E_b} at multiple points b of the base set B. Each pointwise fibre chart φ|_{E_b} maps
a real-world fibre set E_b to the observer’s measurement space F. In other words, they map from noumena
(the real things in the world) to phenomena (observations of real things). The chart φ is an aggregate of
such pointwise measurement maps. This suggests that an observer is able to observe multiple points as
part of a single measurement framework or process. In the case of topological or differentiable fibrations,
there is topological or differentiable structure which connects observations at different base points. In the
mathematically minimalist non-topological case, the only “value added” by a fibre chart is aggregation.
Questions about continuity and differentiability are left for later consideration. But in ontological terms, the 
aggregation generally signifies an observer reference frame, or some such thing. 


21.5.9 REMARK: Objectivity of “measurement” of fibration base-space points.

In the temperature example, as described in Remarks 21.1.11 and 21.2.9, one might define a Celsius fibre
chart φ₁ : π⁻¹(U₁) → F for the subset U₁ of the Earth's surface B where Celsius is accepted as the temperature
scale, and a second fibre chart φ₂ : π⁻¹(U₂) → F for the subset U₂ where Fahrenheit is accepted. However,
according to Definition 21.5.2, there is no subjectivity regarding the map π : E → B. Each observer frame
φ makes the same “measurement” of the base point π(e) for a given observed object state e ∈ E.

Objectivity regarding the base point, or “location”, of an object is clearly not logically necessary. Different
observers could have different systems for coordinatising the base space. However, in the kinds of systems 
which are intended to be modelled by fibrations and fibre bundles, the measurement subjectivity of base 
space points is not the main concern. It is true that if the sets E, B and F are locally Cartesian, they do 
all have multiple choices for locally Cartesian charts, but the chief purpose of topological fibration and fibre 
bundle definitions is to model the relations between the structures on E, B and F which cannot be removed 
by changes in the choices of locally Cartesian charts. 


It is assumed that the base space B is an abstract point space whose measurement issues are separate, to 
be dealt with independently of the total subjectivity of the fibre charts. In practice, the reconciliation of 
base point coordinates is performed before comparing measurements of the fibre sets. There is no strictly 
a-priori logical reason why simultaneous measurements of point and state coordinates should be separable 
in this simple way. It is merely a fact of experience that location and state can be neatly separated. 


21.5.10 REMARK: The fibre chart for a fibration with an empty base space. 

If B = ∅ in Definition 21.5.2, then E = ∅ and π = ∅, as mentioned in Remark 21.2.5, and any set F is a fibre
space for (E, π, B) = (∅, ∅, ∅), as mentioned in Remark 21.2.11. Then the empty function φ = ∅ : ∅ → F is
a fibre chart with fibre F (since the function product ∅ × ∅ : ∅ → ∅ × F is a bijection because ∅ × F = ∅ by
Theorem 9.4.6 (i)).


21.5.11 REMARK: Specification styles for fibre charts. 

There are at least three equi-informational styles of specification for fibre charts. These are itemised as styles 
(1), (2) and (3) in Theorem 21.5.13. Style (1) is adopted in this book because it requires the least redundant 
information. The other two styles are easily constructed from it. Theorem 21.5.13 shows how each of the 
three styles may be converted to the others. Table 21.5.2 summarises the styles presented by some authors. 


year  reference                                fibre chart style

1951  Steenrod [142], page 8                   (3)  ρ : U × F → π⁻¹(U)
1963  Auslander/MacKenzie [1], page 161        (3)  ρ : U × F → π⁻¹(U)
1963  Kobayashi/Nomizu [19], page 50           [illegible]
1964  Bishop/Crittenden [2], page 42           (1)  φ : π⁻¹(U) → F
1968  Choquet-Bruhat [6], page 22              (1)  φ : π⁻¹(U) → F
1970  Spivak [37], Volume 2, page 307          (2)  ψ : π⁻¹(U) → U × F
1972  Sulanke/Wintgen [40], page 78            (2)  ψ : π⁻¹(U) → U × F
1980  Daniel/Viallet [317], page 177           (1)  φ : π⁻¹(U) → F
1980  EDM2 [113], 147.B, page 568              (3)  ρ : U × F → π⁻¹(U)
1980  Schutz [36], pages 40-41                 (1)  φ : π⁻¹(U) → F
1981  Poor [32], page 2                        (1)  φ : π⁻¹(U) → F
1983  Nash/Sen [30], page 142                  (2)  ψ : π⁻¹(U) → U × F
1986  Crampin/Pirani [7], page 354             (2)  ψ : π⁻¹(U) → U × F
1987  Gallot/Hulin/Lafontaine [13], page 31    (2)  ψ : π⁻¹(U) → U × F
1994  Darling [8], page 125                    (2)  ψ : π⁻¹(U) → U × F
1995  O’Neill [295], page 7                    (3)  ρ : U × F → π⁻¹(U)
1997  Frankel [12], page 416                   (3)  ρ : U × F → π⁻¹(U)
1997  Lee [24], page 16                        (2)  ψ : π⁻¹(U) → U × F
1999  Lang [23], page 103                      (2)  ψ : π⁻¹(U) → U × F
2004  Szekeres [305], page 424                 (2)  ψ : π⁻¹(U) → U × F
2012  Sternberg [38], page 327                 (2)  ψ : π⁻¹(U) → U × F

      Kennington                               (1)  φ : π⁻¹(U) → F

Table 21.5.2 Survey of fibre chart styles


Poor [32], page 2, calls the combined map ψ = π × φ : π⁻¹(U) → U × F a “bundle chart” on E, and calls φ
the “principal part of the bundle chart”. In practice, both φ and ψ are extensively used. In this book, φ is
generally called a “fibre chart”, whereas for topological and differentiable fibrations and fibre bundles, ψ is
typically called a “manifold chart” for the total space, or something similar.


21.5.12 REMARK: Illustration of relations between three styles of fibre chart specification. 
Figure 21.5.3 illustrates the relations between the three styles of fibre chart specification which are mentioned
in Theorem 21.5.13.

[Diagram: the sets W = π⁻¹(U) ⊆ E, U ⊆ B, F and U × F ⊆ B × F, connected by the maps π, φ, ψ and ρ.]

Figure 21.5.3 Relations between three styles of fibre chart: φ, ψ and ρ


21.5.13 THEOREM: Conversion formulas between the three most popular fibre chart styles. 

Let U, F and W be sets. Let π : W → U be a surjective map. Define the Cartesian product projection
maps Π₁ : U × F → U and Π₂ : U × F → F by Π₁ : (b, f) ↦ b and Π₂ : (b, f) ↦ f. Define single-variable
predicates X₁, X₂ and X₃ as follows.


(1) X₁(φ) = “φ : W → F and π × φ : W → U × F is a bijection”.
(2) X₂(ψ) = “ψ : W → U × F is a bijection and Π₁ ∘ ψ = π”.
(3) X₃(ρ) = “ρ : U × F → W is a bijection and Π₁ = π ∘ ρ”.


Then:

(i) X₁(φ) ⇔ X₂(π × φ).
(ii) X₂(ψ) ⇒ X₁(Π₂ ∘ ψ).
(iii) X₂(ψ) ⇔ X₃(ψ⁻¹).
(iv) X₃(ρ) ⇔ X₂(ρ⁻¹).
(v) X₁(φ) ⇒ X₃((π × φ)⁻¹).
(vi) X₃(ρ) ⇒ X₁(Π₂ ∘ ρ⁻¹).

PROOF: To show part (i), assume X₁(φ). Then φ : W → F and π × φ : W → U × F is a bijection.
Let ψ = π × φ. Then ψ : W → U × F is a bijection and Π₁ ∘ ψ = Π₁ ∘ (π × φ) = π by the definitions of Π₁
and the pointwise Cartesian function product. (See Definition 10.15.2.) Hence X₂(π × φ). For the converse
of (i), assume X₂(π × φ). Then π × φ : W → U × F is a bijection. Therefore φ : W → F. Hence X₁(φ).

For part (ii), assume X₂(ψ). Then ψ : W → U × F is a bijection and Π₁ ∘ ψ = π. Let φ = Π₂ ∘ ψ. Then
φ = Π₂ ∘ ψ : W → F and π × φ = π × (Π₂ ∘ ψ) = (Π₁ ∘ ψ) × (Π₂ ∘ ψ) = ψ. Therefore π × φ : W → U × F
is a bijection. Hence X₁(Π₂ ∘ ψ).

For part (iii), assume X₂(ψ). Then ψ : W → U × F is a bijection and Π₁ ∘ ψ = π. So ψ⁻¹ : U × F → W is a
bijection and ∀w ∈ W, Π₁(ψ(w)) = π(w). Let ρ = ψ⁻¹. Let (b, f) ∈ U × F. Let w = ρ((b, f)). Then ψ(w) =
(b, f). So Π₁(ψ(w)) = Π₁((b, f)) = b. So π(ρ((b, f))) = π(w) = b. Therefore ∀(b, f) ∈ U × F, π(ρ((b, f))) =
b = Π₁((b, f)). In other words, π ∘ ρ = Π₁. Hence X₃(ψ⁻¹). For the converse of (iii), assume X₃(ψ⁻¹).
That is, ψ⁻¹ : U × F → W is a bijection and Π₁ = π ∘ ψ⁻¹. Then ψ : W → U × F is a bijection and
∀(b, f) ∈ U × F, Π₁((b, f)) = π(ψ⁻¹((b, f))). Therefore ∀(b, f) ∈ U × F, π(ψ⁻¹((b, f))) = b. Let w ∈ W.
Let (b, f) = ψ(w). Then w = ψ⁻¹((b, f)). So π(w) = π(ψ⁻¹((b, f))) = b = Π₁((b, f)) = Π₁(ψ(w)). Therefore
∀w ∈ W, π(w) = Π₁(ψ(w)). That is, π = Π₁ ∘ ψ. Hence X₂(ψ).

The proof of part (iv) follows exactly as for part (iii) by substituting ρ⁻¹ for ψ and ψ⁻¹ for ρ, and noting
that (ρ⁻¹)⁻¹ = ρ and (ψ⁻¹)⁻¹ = ψ.

Part (v) follows from parts (i) and (iii).


Part (vi) follows from parts (ii) and (iv).
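
The conversions in Theorem 21.5.13 are mechanical enough to be written out for finite sets. The following Python sketch is an illustration only. The toy sets and the function names style1_to_style2, style2_to_style1 and invert are assumptions made for this example. Functions are represented as Python dictionaries, so a style (1) chart φ maps W to F, a style (2) chart ψ maps W to U × F, and a style (3) chart ρ maps U × F to W.

```python
# Hypothetical toy data: W = {(b, k)}, U = {0, 1}, F = {'a', 'b'}.
pi = {(b, k): b for b in (0, 1) for k in (0, 1)}            # pi : W -> U
phi = {(0, 0): 'a', (0, 1): 'b', (1, 0): 'b', (1, 1): 'a'}  # style (1)

def style1_to_style2(pi, phi):
    """Part (i): psi = pi x phi, the common-domain product."""
    return {w: (pi[w], phi[w]) for w in pi}

def style2_to_style1(psi):
    """Part (ii): phi = Pi_2 o psi, the second component of psi."""
    return {w: bf[1] for w, bf in psi.items()}

def invert(g):
    """Inverse of an injective dict; used for parts (iii) and (iv)."""
    inverse = {value: key for key, value in g.items()}
    if len(inverse) != len(g):
        raise ValueError("function is not injective")
    return inverse

psi = style1_to_style2(pi, phi)   # style (2)
rho = invert(psi)                 # style (3)

# The defining side conditions Pi_1 o psi = pi and Pi_1 = pi o rho.
psi_respects_pi = all(psi[w][0] == pi[w] for w in pi)
rho_respects_pi = all(pi[rho[(b, f)]] == b for (b, f) in rho)
```

Converting back with style2_to_style1(invert(rho)) recovers the original φ, which reflects the claim in Remark 21.5.11 that the three styles carry the same information.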
21.6. Cross-section localisations and locally constant cross-sections


21.6.1 REMARK: “Localisation” of cross-sections via fibre charts. 

The “localisation” of a cross-section via a fibre chart in Definition 21.6.2 uses the function-composite for
partial functions in Definition 10.10.6, which does not require Range(X) ⊆ Dom(φ). This composite function
is “local” in the sense that it uses a particular choice of fibre chart. This is not the same as “local” in the
topological sense of an open neighbourhood.

The purpose of “localising” a cross-section is to enable various kinds of analysis, such as differentiation, to be 
applied to the cross-section. Formulas tend to be written in physics within particular choices of coordinate 
systems, and “localisation” converts abstract fibre set values to more concrete fibre space values. 


21.6.2 DEFINITION: The localisation of a cross-section X ∈ X(E, π, B) via a fibre chart φ with fibre space
F for a non-topological fibration (E, π, B) is the map φ ∘ X : Dom(X) ∩ π(Dom(φ)) → F.


21.6.3 REMARK: The use of the word “local” to refer to chart-specific concepts.

Part of the reason for the custom of using the word “local” in the names of chart-specific concepts, as
opposed to topological locality, may be the archaic practice, which still persists, of defining manifold and
fibre charts to be ordered pairs (U_i, φ_i), where U_i is an open set and φ_i maps U_i to some coordinate space.
Then each such pair was thought of as a “coordinate neighbourhood”, although the set is really superfluous.
(Similarly, the set X of a topological space (X, T) is superfluous, as mentioned in Remark 31.3.5.)


Clearly a single set U_i may have many different charts φ_j with U_j = U_i. So this practice is ambiguous and
confusing. Nevertheless, the customary terminology is retained here in Definition 21.6.2.


Since fibre charts are often referred to as “local trivialisations”, and the “localisation” in Definition 21.6.2
is achieved by means of a fibre chart, this does seem to justify the term “localisation” here.


The following table lists some definitions which use some kind of localisation concept. 


localisation concept:   cross-section        horizontal lift function   connection form

non-topological         Definition 21.6.2
differentiable OFB                           Definition 67.7.2
differentiable PFB                           Definition 69.1.12         Definitions 69.11.3, 69.13.3


These localisation concepts have something in common, but are significantly different. The localisation of a 
cross-section uses a fibre chart to map the total-space output from a cross-section to the fibre space. 


The localisation of a horizontal lift function uses a fibre chart to convert its domain from a complicated
structure to a simple Cartesian product of manifolds so that differentiability can be defined. (For the
“complicated structure” of a horizontal lift function θ, see Definition 67.5.4.) This is in keeping with the
idea that all fibre bundles are “locally trivial”, which means that they are cross-products of the base space
and the fibre space “locally”.


The localisation of a connection form on a principal bundle uses a local cross-section to “pull back” the 
connection form from the total space domain to the base space domain. This is a different kind of concept 
because a local cross-section is used for localisation instead of a fibre chart. However, it amounts to much 
the same thing because local cross-sections and fibre charts have a close one-to-one relation which is shown 
in Theorem 21.10.14. (See also Remark 21.6.5 and Definition 21.6.6.) So it may be concluded that all of the 
above localisation concepts are based on local trivialisations. 


21.6.4 REMARK: Reglobalisation of localised cross-sections.

The localisation procedure in Definition 21.6.2 may be reversed since the maps π × φ : π⁻¹(U) → U × F in
Definition 21.5.2 are bijections. Let U_X = Dom(X) and U_φ = π(Dom(φ)). Then X(b) = (π × φ)⁻¹(b, (φ ∘ X)(b))
for all b ∈ U_X ∩ U_φ. Thus X can be “reglobalised” wherever the available charts φ cover Dom(X).
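
The localisation in Definition 21.6.2 and its reversal in Remark 21.6.4 can be traced through on a finite example. The following Python sketch is an illustration only, with a hypothetical toy fibration, chart and cross-section. The dictionary inv_product plays the role of (π × φ)⁻¹.

```python
# Hypothetical toy fibration with base {0, 1, 2} and two-element fibre sets.
pi = {(b, k): b for b in (0, 1, 2) for k in (0, 1)}
# A fibre chart with domain pi^{-1}({0, 1}).
phi = {(0, 0): 'a', (0, 1): 'b', (1, 0): 'b', (1, 1): 'a'}
# A cross-section X on all of the base, so pi(X(b)) = b for all b.
X = {0: (0, 1), 1: (1, 1), 2: (2, 0)}

# Localisation phi o X as a composite of partial functions.
# Its domain is Dom(X) intersected with pi(Dom(phi)).
local = {b: phi[X[b]] for b in X if X[b] in phi}

# Reglobalisation: X(b) = (pi x phi)^{-1}(b, (phi o X)(b)).
inv_product = {(pi[e], phi[e]): e for e in phi}
recovered = {b: inv_product[(b, f)] for b, f in local.items()}
```

The base point 2 is not covered by this single chart, so the cross-section is recovered only on {0, 1}. A second chart covering the base point 2 would be needed to recover X everywhere.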


21.6.5 REMARK: Constant cross-sections constructed from fibre charts. 

The fibre charts in Definition 21.5.2 can be used to construct fibration cross sections as in Definition 21.3.3.
A fibre chart is a local bijection φ : E → F whereas a cross-section is a local injection X : B → E by
Theorem 21.3.5. If Range(X) ⊆ Dom(φ), then the composite function φ ∘ X : B → F is well defined. (See
Definition 21.6.2.) A kind of reversal of this procedure is to fix an element f₀ ∈ F and construct a function
X : B → E so that φ ∘ X has the constant value f₀ for some given local chart φ : E → F. This kind
of construction of a local cross-section from a fibre chart is seen quite frequently in applications to gauge
theory. (See for example Daniel/Viallet [317], page 178.)


Let φ : π⁻¹(U) → F be a fibre chart for a non-topological fibration (E, π, B) with fibre space F. Let f₀ ∈ F.
Define X_{φ,f₀} : U → π⁻¹(U) by


∀b ∈ U, X_{φ,f₀}(b) = (π × φ)⁻¹(b, f₀).


Then X_{φ,f₀} : U → π⁻¹(U) is a well-defined function because π × φ : π⁻¹(U) → U × F is a bijection by
Definition 21.5.2. Let b ∈ U. Then π(X_{φ,f₀}(b)) = π((π × φ)⁻¹(b, f₀)) = b. Thus X_{φ,f₀}(b) ∈ π⁻¹({b}). Therefore
X_{φ,f₀} is a local cross-section of (E, π, B) by Definition 21.3.3. Such a local cross-section, constructed directly
from a local chart, may be referred to as a “constant cross-section” relative to a given chart. Clearly the
fact that it is constant with respect to one chart does not in general imply that it is constant relative to
other charts. However, in applications, one often works much of the time in a single chart. Then the notion
of a constant cross-section is meaningful. (See Definition 21.10.3 for the related “identity cross-sections” of
principal bundles.)


One could refer to constant cross-sections as being “locally constant”, where “local” means “relative to the
given fibre chart”. In terms of Definition 21.6.2, one may say that the localisation of the cross-section is
constant via some chart.


21.6.6 DEFINITION: The constant cross-section of a non-topological fibration (E, π, B) with fibre space F,
for fibre chart φ and fibre f₀ ∈ F, is the function X_{φ,f₀} : π(Dom(φ)) → Dom(φ) defined by


∀b ∈ π(Dom(φ)), X_{φ,f₀}(b) = (π × φ)⁻¹(b, f₀).


In other words, X_{φ,f₀} = (π × φ)⁻¹(·, f₀).


21.6.7 REMARK: Constant cross-section extensions. 

When the value f₀ ∈ F in Definition 21.6.6 is replaced by φ(z) for some z ∈ E_b, the resulting function X_{φ,φ(z)}
has the value X_{φ,φ(z)}(b) = (π × φ)⁻¹(b, φ(z)) = z at b ∈ B. Then the constant cross-section X_{φ,φ(z)} may be
thought of as the constant extension of the total space element choice z ∈ E_b to all of Dom(φ), although
this constancy is of course only valid with respect to the particular choice of fibre chart φ. This concept is
given a name in Definition 21.6.8. (See Definition 57.1.20 and Notation 57.1.21 for the tangent bundle version
of Definition 21.6.8.)


21.6.8 DEFINITION: The constant cross-section extension via a fibre chart φ of a total space element
z ∈ Dom(φ), for a non-topological fibration (E, π, B), is the function Extn_φ(z) : π(Dom(φ)) → Dom(φ)
defined by


∀b ∈ π(Dom(φ)), Extn_φ(z)(b) = (π × φ)⁻¹(b, φ(z)).
In other words, Extn_φ(z) = (π × φ)⁻¹(·, φ(z)) = X_{φ,φ(z)}.
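
Definitions 21.6.6 and 21.6.8 can be illustrated with finite sets. The following Python sketch is only an illustration under stated assumptions: the toy fibration, the two charts phi1 and phi2, and the function names are hypothetical. It constructs a constant cross-section for one chart, and then shows that the same cross-section is not constant when viewed through a second chart, as noted in Remark 21.6.5.

```python
# Hypothetical toy fibration with base {0, 1} and two-element fibre sets.
pi = {(b, k): b for b in (0, 1) for k in (0, 1)}
# Two fibre charts with the same domain and the same fibre space {'a', 'b'}.
phi1 = {(0, 0): 'a', (0, 1): 'b', (1, 0): 'b', (1, 1): 'a'}
phi2 = {(0, 0): 'a', (0, 1): 'b', (1, 0): 'a', (1, 1): 'b'}

def constant_cross_section(pi, phi, f0):
    """X(b) = (pi x phi)^{-1}(b, f0) for all b in pi(Dom(phi))."""
    inv_product = {(pi[e], phi[e]): e for e in phi}
    return {b: inv_product[(b, f0)] for b in {pi[e] for e in phi}}

def constant_extension(pi, phi, z):
    """The extension of z, namely the constant cross-section for f0 = phi(z)."""
    return constant_cross_section(pi, phi, phi[z])

X = constant_cross_section(pi, phi1, 'a')
via_phi1 = {b: phi1[X[b]] for b in X}   # localisation via phi1
via_phi2 = {b: phi2[X[b]] for b in X}   # localisation via phi2

extended = constant_extension(pi, phi1, (0, 1))
```

Here via_phi1 takes the single value 'a', whereas via_phi2 takes two different values. The extension of z = (0, 1) passes through z itself at the base point π(z) = 0.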


21.7. Fibre atlases for non-topological fibrations 


21.7.1 REMARK: A fibre atlas is a set of charts which cover the total space. 

A fibre atlas A for a non-topological fibration (E, π, B) is a set of charts which cover all of E using the same
fibre F. In the example of temperature on the Earth's surface, this means that the combined coverage of
measurements in Celsius and Fahrenheit (and other systems such as Kelvin, Réaumur, Rankine and Rømer)
is the whole of the Earth’s surface.


21.7.2 DEFINITION: A fibre atlas with fibre space F for a non-topological fibration (E, π, B) is a set A of
partial functions φ : E → F such that
(i) ∀φ ∈ A, Dom(φ) = π⁻¹(π(Dom(φ))),
(ii) ∀φ ∈ A, π × φ : Dom(φ) → π(Dom(φ)) × F is a bijection,
(iii) ⋃{π(Dom(φ)); φ ∈ A} = B.
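
The three conditions of Definition 21.7.2 can be verified mechanically when all sets are finite. The following Python sketch is an illustration only. The function is_fibre_atlas and the toy data are assumptions made for this example. Charts are represented as dictionaries whose key sets are their domains.

```python
def is_fibre_atlas(pi, B, F, atlas):
    """Check conditions (i), (ii) and (iii) of Definition 21.7.2 for finite sets."""
    covered = set()
    for phi in atlas:
        dom = set(phi)
        base = {pi[e] for e in dom}
        # (i) Dom(phi) equals the inverse image of its own projection.
        if dom != {e for e in pi if pi[e] in base}:
            return False
        # (ii) pi x phi is a bijection from Dom(phi) to pi(Dom(phi)) x F.
        pairs = [(pi[e], phi[e]) for e in dom]
        if len(set(pairs)) != len(pairs):
            return False
        if set(pairs) != {(b, f) for b in base for f in F}:
            return False
        covered |= base
    # (iii) The projected chart domains cover the base space.
    return covered == set(B)

# Hypothetical toy fibration and charts.
B = (0, 1, 2)
F = ('a', 'b')
pi = {(b, k): b for b in B for k in (0, 1)}
chart1 = {(0, 0): 'a', (0, 1): 'b', (1, 0): 'b', (1, 1): 'a'}
chart2 = {(1, 0): 'a', (1, 1): 'b', (2, 0): 'a', (2, 1): 'b'}
partial_chart = {(0, 0): 'a'}   # domain is not of the form pi^{-1}(U)

two_charts_ok = is_fibre_atlas(pi, B, F, [chart1, chart2])
one_chart_ok = is_fibre_atlas(pi, B, F, [chart1])
partial_ok = is_fibre_atlas(pi, B, F, [partial_chart, chart2])
```

The single chart chart1 fails only because of the covering condition (iii), whereas partial_chart already fails condition (i).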
21.7.3 REMARK: Fibre atlases for empty fibrations.

If B = ∅ for a fibration (E, π, B) in Definition 21.7.2, then any set F is a fibre space for (E, π, B) = (∅, ∅, ∅),
and the empty function φ = ∅ is the only possible fibre chart. Therefore a fibre atlas with fibre F for the
empty fibration (E, π, B) may be either ∅ or {∅}.


21.7.4 REMARK: For non-topological fibre bundles, fibre charts contain no continuity information. 

Since the base set B in Definition 21.7.2 is unstructured, there is no difference between requiring a single
combined chart φ : π⁻¹(U) → F and requiring an individual per-fibre chart φ_b : π⁻¹({b}) → F for each b ∈ U.
The combined chart does make a significant difference when topological or differentiable structure is defined
on E, B and F, and the combined chart is then required to be continuous or differentiable respectively.


A set with no topology is effectively equivalent to a topological space with the discrete topology. (See 
Definition 31.3.19 for discrete topology.) Since the fibre space F is also non-topological, even pointwise 
fibre charts contain no information, because all bijections between the fibre space and a given fibre set are 
equivalent. The value of defining structures such as charts and atlases for unstructured fibrations and fibre 
bundles is to highlight the information that is contained in them when structure is present. And besides, a 
single chart is easier to work with than one chart per base-space point. 


21.7.5 DEFINITION: A non-topological fibre bundle with fibre space F is a tuple (E, π, B, A_E^F) such that


(i) (E, π, B) is a uniform non-topological fibration;
(ii) A_E^F is a fibre atlas with fibre space F for (E, π, B).


Alternative name: non-topological F-fibration.


21.7.6 REMARK: The lack of real added information in the specification of an explicit fibre atlas. 

The concept of a fibration with fibre atlas for fibre space F in Definition 21.7.5 specifies an explicit atlas
A_E^F containing fibre charts for the fibre F. This is given the name “fibre bundle”. Not many authors define
fibrations (which have no atlas) as a class of objects distinct from fibre bundles (which do have an atlas). (See
however Crampin/Pirani [7], pages 353-354; Sternberg [38], page 327; Poor [32], pages 1-2.) An immediate
question which arises is whether the addition of an atlas adds real information to the structure.


When the explicit fibre space F is chosen for the uniform fibration in Definition 21.2.1, and one asserts that
for all fibre sets E_b = π⁻¹({b}) for b ∈ B, there exists a bijection φ : E_b → F, one is saying as much about
the set F as one is saying about the fibration tuple (E, π, B). Any set which is equinumerous to F may be
substituted for F. Therefore one is not really adding much information by asserting that “F is a fibre space
for the fibration (E, π, B)” as in Definition 21.2.8.


The same observation applies to the addition of fibre charts to a fibration. For any family φ = (φ_b)_{b∈U} of
bijections φ_b : E_b → F, for b in a subset U of B, the union ⋃φ = ⋃{φ_b; b ∈ U} of (the graphs of) the
bijections is necessarily (the graph of) a function from π⁻¹(U) to F whose product with π is a bijection
from π⁻¹(U) to U × F. The existence of such a chart does
not in any way constrain the tuple (E, π, B). Therefore it does not add information about the fibration. For
similar reasons, the addition of an atlas of fibre charts to a fibration also adds no real information because
all charts are automatically compatible in the sense that the composition φ₁|_{E_b} ∘ (φ₂|_{E_b})⁻¹ of the
restrictions of any two charts φ₁ and φ₂ to a single fibre set E_b = π⁻¹({b}) is necessarily a bijection from F to F.


Consequently Definition 21.7.5 may be referred to as a “non-topological F-fibration”, since it carries only
the same amount of information as a fibration with fibre space F as in Definition 21.2.10.


Since Definition 21.7.5 contributes, in the indicated senses, no additional “information” to the bare-bones
uniform fibration in Definition 21.2.1, one naturally asks whether one should delete the definition. The
main purpose of Definition 21.7.5 is “political” or “rhetorical”. It demonstrates that without any further
structures on the sets E, B and F, the charts and atlases are of no real value. This observation motivates
the introduction of further structures. (A similar situation occurs for locally Cartesian topological spaces 
in Section 49.7, where the addition of an atlas is at first superfluous, but the choice of a particular atlas is 
later seen to be important for specifying a particular choice of differentiable structure on such spaces.) 


21.7.7 REMARK: Minimal structure groups for non-topological fibre bundles with explicit fibre atlas. 
Without adding algebraic, topological or differentiable structures on a non-topological fibre bundle, one may 
build at least one construction with the aid of an atlas which one cannot build non-trivially without it. This 
construction is a “minimal structure group”. 
Let (E, π, B, A_E^F) be a non-topological fibre bundle with a non-empty fibre atlas for fibre space F. Let
S₀ = {φ₁|_{E_b} ∘ (φ₂|_{E_b})⁻¹; φ₁, φ₂ ∈ A_E^F, b ∈ π(Dom(φ₁) ∩ Dom(φ₂))}. Then S₀ is a subset of the group G₁ of
bijections from F to F. The group G₀ generated by S₀ is the minimal structure group for the particular
choice of atlas. (See Definition 17.6.5 for subgroups generated by subsets.) Any subgroup of G₁ which is a
supergroup of G₀ may be asserted to be a “structure group” for the fibre bundle.


This shows how arbitrary the structure group is. This arbitrariness may be seen in exactly the same way in
the context of general transformation groups. Given any set S₀ of bijections for any set X, one may construct
the group G₀ which is generated by S₀, and then observe that S₀ is a subset of any group between G₀
and the group G₁ of all bijections of X.


The structure group is limited only by the choice of atlas. One may choose the atlas to consist of a single
fibre chart φ : E → F. Then any subgroup of G₁ may be chosen as the structure group. This situation
changes when the charts are required to be compatible with algebraic, topological or differentiable structure
on the total space E.
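
The minimal structure group in Remark 21.7.7 can be computed explicitly for a finite fibre space. The following Python sketch is an illustration only, assuming the fibre space F = {0, 1, 2} and a hypothetical atlas of two charts over a single base point. A bijection of F is represented by a tuple p, where p[i] is the image of i. For a finite set of bijections, closing under composition with the generators is enough to obtain the generated group, because the inverse of each element is one of its own powers.

```python
from itertools import product

def transition_maps(pi, atlas, n):
    """The set S0 of maps phi1|E_b o (phi2|E_b)^{-1} on F = {0, ..., n-1}."""
    S0 = set()
    for phi1, phi2 in product(atlas, repeat=2):
        overlap = {pi[e] for e in phi1 if e in phi2}
        for b in overlap:
            fibre = [e for e in pi if pi[e] == b]
            inv2 = {phi2[e]: e for e in fibre}
            S0.add(tuple(phi1[inv2[f]] for f in range(n)))
    return S0

def generated_group(generators, n):
    """Closure of the identity under left composition with the generators."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = tuple(s[g[i]] for i in range(n))   # h = s o g
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

# Hypothetical toy bundle: one base point, three-element fibre set.
n = 3
pi = {(0, k): 0 for k in range(n)}
phi_a = {(0, k): k for k in range(n)}
phi_b = {(0, k): (k + 1) % n for k in range(n)}

S0 = transition_maps(pi, [phi_a, phi_b], n)
G0 = generated_group(S0, n)
G0_single_chart = generated_group(transition_maps(pi, [phi_a], n), n)
```

For this atlas, G₀ is the cyclic group of order 3, which is a proper subgroup of the group G₁ of all 6 bijections of F. With the single chart phi_a, the minimal structure group is trivial.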


21.7.8 REMARK: Fibre atlases which are compatible with a given structure group. 

Definition 21.7.9 differs from Definition 21.7.2 by the addition of condition (iv). If A_E^F is a (G, F) atlas for a
non-topological fibration (E, π, B) as in Definition 21.7.9, then the tuple (E, π, B, A_E^F) is a non-topological
(G, F) fibre bundle as in Definition 21.8.3.


21.7.9 DEFINITION: A (non-topological) (G, F) (fibre) atlas for a non-topological fibration (E, π, B), where
(G, F) is an effective left transformation group, is a set A of partial functions φ : E → F such that

(i) ∀φ ∈ A, Dom(φ) = π⁻¹(π(Dom(φ))),
(ii) ∀φ ∈ A, π × φ : Dom(φ) → π(Dom(φ)) × F is a bijection,
(iii) ⋃{π(Dom(φ)); φ ∈ A} = B,
(iv) ∀φ₁, φ₂ ∈ A, ∀b ∈ π(Dom(φ₁) ∩ Dom(φ₂)), ∃g ∈ G, ∀z ∈ π⁻¹({b}), φ₂(z) = μ(g, φ₁(z)).


21.8. Non-topological ordinary fibre bundles 


21.8.1 REMARK: Abbreviation for “ordinary fibre bundle”.
In this book, OFB is an abbreviation for “ordinary fibre bundle”. 


21.8.2 REMARK: The addition of a specified structure group to a fibre bundle. 

When a fibre bundle has a specified structure group, it is called an “ordinary fibre bundle”. The term 
“structure group” is slightly misleading. It is in fact a structure transformation group because it is always 
required to act on the fibre space in a specified way. Without this specified group action, part (v) of 
Definition 21.8.3 would be meaningless. 


The Euclidean group constrains the relations between valid reference frames for Euclidean geometry. The 
structure group of a fibre bundle similarly places pointwise constraints on the relations between fibre charts 
to ensure that transformations between fibre charts preserve the validity of structures which are viewed 
through these charts. (This is also mentioned in Remarks 21.8.18 and 20.10.2.) Thus the structure group 
and the atlas are closely linked in a kind of duality relationship. (This is also mentioned in Remark 47.6.14.) 


21.8.3 DEFINITION: A non-topological (G, F) (ordinary) (fibre) bundle for an effective left transformation
group (G, F, σ, μ) is a tuple (E, π, B, A_E^F) satisfying the following.


(i) E and B are sets and π is a function π : E → B.
(ii) ∀φ ∈ A_E^F, ∃U ∈ P(B), φ : π⁻¹(U) → F.
(iii) ⋃{π(Dom(φ)); φ ∈ A_E^F} = B.
(iv) ∀φ ∈ A_E^F, ∃U ∈ P(B), π × φ : π⁻¹(U) → U × F is a bijection.
(v) ∀φ₁, φ₂ ∈ A_E^F, ∀b ∈ π(Dom(φ₁) ∩ Dom(φ₂)), ∃g ∈ G, ∀z ∈ π⁻¹({b}), φ₂(z) = μ(g, φ₁(z)).

E is called the total space.
π is called the projection map.
B is called the base space.
A_E^F is called the fibre atlas.
G is called the structure group.
F is called the fibre space.
The functions φ ∈ A_E^F are called the fibre charts of the fibre bundle.
The sets π⁻¹({b}) are called the fibre sets of the fibre bundle.
Each element of F is called a fibre of the fibre bundle.
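
Condition (v) of Definition 21.8.3 can be tested directly for finite sets. The following Python sketch is an illustration only. The toy bundle, the two candidate groups and the function names are assumptions made for this example. The fibre space is F = {0, 1, 2}, a group element g is represented by a tuple, and the action is μ(g, f) = g[f].

```python
def transition_elements(pi, phi1, phi2, b, G):
    """All g in G with phi2(z) = mu(g, phi1(z)) for every z in the fibre set over b."""
    fibre = [z for z in pi if pi[z] == b]
    return [g for g in G if all(phi2[z] == g[phi1[z]] for z in fibre)]

def satisfies_condition_v(pi, atlas, G):
    """Check the chart overlap condition (v) for every pair of charts in the atlas."""
    for phi1 in atlas:
        for phi2 in atlas:
            overlap = {pi[z] for z in phi1 if z in phi2}
            for b in overlap:
                if not transition_elements(pi, phi1, phi2, b, G):
                    return False
    return True

# Hypothetical toy bundle: one base point, three-element fibre set.
n = 3
pi = {(0, k): 0 for k in range(n)}
phi_a = {(0, k): k for k in range(n)}
phi_b = {(0, k): (k + 1) % n for k in range(n)}
atlas = [phi_a, phi_b]

# Two candidate structure groups acting on F by permutation.
cyclic = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
trivial = [(0, 1, 2)]

cyclic_ok = satisfies_condition_v(pi, atlas, cyclic)
trivial_ok = satisfies_condition_v(pi, atlas, trivial)
g_ab = transition_elements(pi, phi_a, phi_b, 0, cyclic)
```

The cyclic group is large enough to contain the transition from phi_a to phi_b, whereas the trivial group is not. The list g_ab contains exactly one element here, which agrees with the expectation that g is unique for an effective action.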


21.8.4 NOTATION: Fibre bundle global and local atlases.
atlas(E, π, B), for a (G, F) fibre bundle (E, π, B, A_E^F), denotes the fibre atlas A_E^F.


atlas_b(E, π, B), for b ∈ B, denotes the set {φ ∈ atlas(E, π, B); b ∈ π(Dom(φ))}.
A_{E,b}^F, for b ∈ B and a (G, F) fibre bundle (E, π, B, A_E^F), denotes the set {φ ∈ A_E^F; b ∈ π(Dom(φ))}.


21.8.5 REMARK: Ordinary fibre bundles with empty fibre spaces. 

The possibility that F = ∅ in Definition 21.8.3 is not excluded. This is because an effective left transformation
group can act on an empty set if the group contains only a unit element, as discussed in Remark 20.3.3.
However, if F is empty, then Definition 21.8.3 condition (ii) implies that A_E^F ⊆ {∅}, and then by condition (iii),
B must be empty, which implies by condition (i) that E must be empty. Unfortunately this combination of
choices does satisfy Definition 21.8.3. To avoid this circumstance, one must state explicitly that the fibre
space is non-empty.


21.8.6 REMARK: Non-topological fibre bundles contain uniform fibrations.

The triple (E, π, B) is a uniform non-topological fibration whenever (E, π, B, A_E^F) is a non-topological (G, F)
fibre bundle. This is shown as Theorem 21.8.7 (i). So all of the properties of uniform fibrations are inherited
(tax-free) by fibre bundles.


21.8.7 THEOREM: Properties of non-topological fibre bundles inherited from uniform fibrations. 
Let (E, π, B, A_E^F) be a non-topological (G, F) fibre bundle.
(i) (E, π, B) is a uniform non-topological fibration.
(ii) If E ≠ ∅, then π⁻¹({b}) ≠ ∅ for all b ∈ B.
(iii) If E ≠ ∅, then π : E → B is surjective.
(iv) If E ≠ ∅, then ∀U ∈ P(B), π(π⁻¹(U)) = U.
(v) If #(E) = 1, then (E, π, B) = ({z}, {(z, b)}, {b}) for some z and b.
PROOF: For part (i), let b₁, b₂ ∈ B. Then Definition 21.8.3 (iii) implies that there exist φ₁, φ₂ ∈ A_E^F such
that b₁ ∈ π(Dom(φ₁)) and b₂ ∈ π(Dom(φ₂)). By Definition 21.8.3 (iv), there exist U₁, U₂ ∈ P(B)
such that π × φ₁ : π⁻¹(U₁) → U₁ × F and π × φ₂ : π⁻¹(U₂) → U₂ × F are bijections.
Then φ₁|_{π⁻¹({b₁})} : π⁻¹({b₁}) → F and φ₂|_{π⁻¹({b₂})} : π⁻¹({b₂}) → F are bijections by Theorem 10.15.10 (i).
So (φ₂|_{π⁻¹({b₂})})⁻¹ ∘ φ₁|_{π⁻¹({b₁})} : π⁻¹({b₁}) → π⁻¹({b₂}) is a bijection by Theorem 10.5.6 (iii). Thus (E, π, B)
satisfies Definition 21.2.1 (i). Hence (E, π, B) is a uniform non-topological fibration by Definition 21.2.1.
Part (ii) follows from part (i) and Theorem 21.2.4 (i).
Part (iii) follows from part (i) and Theorem 21.2.4 (ii).
Part (iv) follows from part (i) and Theorem 21.2.4 (iii).
Part (v) follows from part (i) and Theorem 21.2.4 (iv).


21.8.8 THEOREM: Fibre chart restrictions to fibre sets are bijections. 
Let (E, π, B, A_E^F) be a non-topological (G, F) fibre bundle. Then φ|_{E_b} : E_b → F is a bijection for all b ∈ B
and φ ∈ A_{E,b}^F.


PROOF: The assertion follows from Theorems 21.5.6 (i) and 21.8.7 (i).


21.8.9 REMARK: Alternative expressions for the fibre chart overlap condition. 
The function values φ₁(z) and φ₂(z) in Definition 21.8.3 (v) are guaranteed to be well defined for all z ∈
π⁻¹({b}) in the condition because b is required to be an element of π(Dom(φ₁) ∩ Dom(φ₂)), which implies that
π⁻¹({b}) ⊆ Dom(φ₁) ∩ Dom(φ₂) by condition (ii). (This is discussed in Remark 21.5.1.) Thus condition (v)
may be expressed equivalently as follows.


(v′) ∀φ₁, φ₂ ∈ A_E^F, ∀b ∈ B, ∃g ∈ G, ∀z ∈ π⁻¹({b}) ∩ Dom(φ₁) ∩ Dom(φ₂), φ₂(z) = gφ₁(z).


Condition (v′) has the advantage that the test for the total space element z very clearly guarantees that the
following expressions φ₁(z) and φ₂(z) are well defined. However, condition (v) is preferred here because in
later definitions, the group element g ∈ G will be formalised as a function of φ₁, φ₂ and b. (The uniqueness
of g will follow from Definition 20.2.1 for an effective left transformation group.)

In condition (v′), the set π⁻¹({b}) ∩ Dom(φ₁) ∩ Dom(φ₂) is either equal to π⁻¹({b}) or ∅, depending on
whether b ∈ π(Dom(φ₁) ∩ Dom(φ₂)) or not. This is perfectly valid, but it does not most clearly reveal the
meaning of the condition.


Condition (v) in Definition 21.8.3 may also be expressed as follows.


(v″) ∀φ₁, φ₂ ∈ A_E^F, ∀b ∈ π(Dom(φ₁) ∩ Dom(φ₂)), ∃g ∈ G, φ_{2,b} ∘ φ_{1,b}⁻¹ = L_g, where φ_{i,b} denotes the function
φ_i|_{π⁻¹({b})} for b ∈ π(Dom(φ_i)), for i = 1, 2.


21.8.10 REMARK: Fibre chart transition maps for non-topological ordinary fibre bundles. 

Since the group element g in Definition 21.8.3 (v) is unique for any given φ₁, φ₂ and b, it is a well-defined
function of those variables. (The uniqueness ensures that choice-axiom issues do not arise.) This function is
given a name in Definition 21.8.12.


21.8.11 THEOREM: Uniqueness of the structure group element for fibre chart transition maps. 
Let (E, π, B, A_E^F) be a non-topological (G, F) fibre bundle.
(i) ∀φ₁, φ₂ ∈ A_E^F, ∀b ∈ π(Dom(φ₁) ∩ Dom(φ₂)), ∃′g ∈ G, ∀z ∈ π⁻¹({b}), φ₂(z) = μ(g, φ₁(z)). In other
words, there is only one choice for g in Definition 21.8.3 (v).
(ii) For all φ₁, φ₂ ∈ A_E^F, there is a unique function g_{φ₂,φ₁} : π(Dom(φ₁)) ∩ π(Dom(φ₂)) → G for which
∀b ∈ π(Dom(φ₁)) ∩ π(Dom(φ₂)), ∀z ∈ π⁻¹({b}), φ₂(z) = g_{φ₂,φ₁}(b)φ₁(z).


PROOF: For part (i), existence of g is Definition 21.8.3 (v). Uniqueness follows from the assumption that the
left transformation group (G, F) is effective: since φ₁ maps π⁻¹({b}) onto F by Theorem 21.8.8, any two
such group elements act identically on all of F, and are therefore equal.


Part (ii) follows from part (i).


21.8.12 DEFINITION: Fibre chart transition maps. 
The fibre chart transition map for fibre charts φ₁, φ₂ ∈ A_E^F, for a non-topological (G, F) fibre bundle
(E, π, B, A_E^F), is the unique map g_{φ₂,φ₁} : π(Dom(φ₁)) ∩ π(Dom(φ₂)) → G which satisfies


∀b ∈ π(Dom(φ₁)) ∩ π(Dom(φ₂)), ∀z ∈ π⁻¹({b}),
φ₂(z) = g_{φ₂,φ₁}(b)φ₁(z).
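
The uniqueness assertion of Theorem 21.8.11 and the map of Definition 21.8.12 can be checked by brute force on a
small finite example. The following Python sketch is only an illustration under stated assumptions. The base
space B = {a, b}, the fibre space F = {0, 1, 2}, the group S₃ acting on F by permutation, the total space
E = B × F, the charts phi1 and phi2, and the table TWIST are all hypothetical choices made for this example.
None of them comes from the text.

```python
from itertools import permutations

# Toy data (assumptions for illustration only).
F = (0, 1, 2)                         # fibre space
G = list(permutations(F))             # S_3; g acts on F by f -> g[f], which is effective
B = ("a", "b")                        # base space
E = [(b, f) for b in B for f in F]    # total space, here simply B x F

def proj(z):
    """Projection map pi : E -> B."""
    return z[0]

def act(g, f):
    """Left action mu(g, f) of the permutation g on the fibre-space point f."""
    return g[f]

# Hypothetical per-base-point permutations used to build a second chart.
TWIST = {"a": (1, 0, 2), "b": (2, 0, 1)}

def phi1(z):
    """First fibre chart: read off the F-component."""
    return z[1]

def phi2(z):
    """Second fibre chart: the first chart followed by a base-point-dependent permutation."""
    return act(TWIST[proj(z)], phi1(z))

def fibre(b):
    """The fibre set pi^-1({b})."""
    return [z for z in E if proj(z) == b]

def transition_candidates(chart_to, chart_from, b):
    """All g in G with chart_to(z) = g.chart_from(z) for every z in the fibre over b."""
    return [g for g in G
            if all(chart_to(z) == act(g, chart_from(z)) for z in fibre(b))]

def transition_map(chart_to, chart_from):
    """The map b -> g_{chart_to,chart_from}(b); fails if g is missing or not unique."""
    result = {}
    for b in B:
        candidates = transition_candidates(chart_to, chart_from, b)
        if len(candidates) != 1:
            raise ValueError(f"expected exactly one group element over {b!r}")
        result[b] = candidates[0]
    return result

g21 = transition_map(phi2, phi1)   # transition from phi1 to phi2
g12 = transition_map(phi1, phi2)   # transition from phi2 to phi1
print(g21, g12)
```

In this toy model the search finds exactly one group element over each base point, which is the content of
Theorem 21.8.11 (i) here. If the action were not effective, the candidate list could contain more than one
element and transition_map would raise an error.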


21.8.13 REMARK: A (G, F) fibre bundle is the same as an F-fibration with a (G, F) atlas.


Definition 21.8.3 for a non-topological (G, F) fibre bundle is equivalent to the combination of the F-fibration 
in Definition 21.2.10 with a (G, F) fibre atlas as in Definition 21.7.9. 


21.8.14 THEOREM: Fibre bundles are formed from fibrations by the addition of a fibre atlas. 
Let (G, F) be an effective left transformation group. 


(i) If (E, π, B) is a non-topological fibration with fibre space F, and A is a non-topological (G, F) fibre
atlas for (E, π, B), then (E, π, B, A) is a non-topological (G, F) fibre bundle.
(ii) If (E, π, B, A) is a non-topological (G, F) fibre bundle, then (E, π, B) is a non-topological fibration with
fibre space F, and A is a non-topological (G, F) fibre atlas for (E, π, B).
PROOF: Part (i) follows from Definitions 21.2.10, 21.7.9 and 21.8.3.
Part (ii) follows from Theorem 21.8.7 (i) and Definitions 21.8.3, 21.2.10 and 21.7.9.


21.8.15 REMARK: Empty fibre bundles. 
It is always a good idea to check that definitions and theorems are valid in extreme cases, for example
when one or more sets are empty. As mentioned in Remark 21.2.5, the empty fibration (E, π, B) =
(∅, ∅, ∅) satisfies Definition 21.2.1. Let (G, F, σ, μ) be any left transformation group. Let A_E^F = ∅. Then
Definition 21.8.3 parts (ii), (iv) and (v) are satisfied vacuously, and for part (iii), ⋃{π(Dom(φ)); φ ∈ A_E^F} =
⋃∅ = ∅ = B by Theorem 8.4.6 (i). Therefore (E, π, B, A_E^F) = (∅, ∅, ∅, ∅) satisfies Definition 21.8.3 for any
effective left transformation group (G, F).


21.8.16 REMARK: Objects which should not be in specification tuples for fibre bundles. 
Many texts include the structure group G and the fibre space F in specification tuples for fibre bundles. 
This is not necessary, and is in fact undesirable. 


For differentiable manifolds, it is undesirable to include the integer n and the order of differentiability k in
the specification tuple for an n-dimensional C^k differentiable manifold. The dimension n is not included
because it is evident and unambiguous in the charts of the differentiable atlas for the manifold. One does not
include the differentiability parameter k in the specification tuple because one wishes to be able to say that
every C^{k+1} manifold is necessarily a C^k manifold for any integer k ∈ ℤ⁺, for example. The set-construction
for a C^{k+1} manifold should not need to be modified in order to assert that it is a C^k manifold. Some authors
do indeed make such an assertion impossible by insisting that a C^k manifold should have a “complete C^k
atlas” (which includes all possible charts which are C^k-consistent with the atlas). To assert that a C^{k+1}
manifold is C^k, one must then add all of the C^k-compatible charts to the atlas first. One is accustomed to
stating that a C² real-valued function of a real variable is necessarily a C¹ function. This should be equally
possible in the case of manifolds.


For fibre bundles, it is similarly desirable to be able to say that every (G₁, F) fibre bundle is a (G₂, F) fibre
bundle if G₁ is a subgroup of G₂. For example, every (SO(n), ℝⁿ) fibre bundle is a (GL(n), ℝⁿ) fibre bundle.
One cannot easily say this if one incorporates the group G into the specification tuple for the fibre bundle.
The fibre space F is evident from the fibre bundle atlas since it is the range of each chart. (Similarly, the
dimension n of a manifold is evident from the range ℝⁿ of every chart.) Thus F is necessarily a fixed and
immutable part of the structure of a fibre bundle, whereas G is a changeable attribute of the fibre bundle.
In the case of F, it is evident and does not need to be in the specification tuple. In the case of G, it is
changeable and therefore should not be in the specification tuple. Thus including F is superfluous and
including G is undesirable.


These comments about specifications need to be made because it is clear, for example from Figure 21.11.1,
that the structure group G and fibre space F are an integral part of a fibre bundle. However, according
to this kind of argument, a linear space ℝⁿ should be part of every manifold definition, and the domain and
range should be part of every function definition, and full algebraic field structure should be part of every
differentiable real-valued function definition, and so forth. The temptation to include G and F in each fibre
bundle is understandable, but must be resisted.


21.8.17 REMARK: Advantages of specifying structure groups of fibre bundles. 

Definition 21.8.3 is an extension of Definition 21.7.5. In fact, Definition 21.7.5 is equivalent to the special 
case that the group (G, F) in Definition 21.8.3 is the permutation group of F. (See Example 20.2.3 for 
permutation groups.) 


In contradiction to Remark 21.8.16, one may regard the structure group as primary, while the fibre bundle 
charts are required to fit with the structure group. This is related to the structure groups of geometries. 
One may define, for example, “the” structure group for Euclidean geometry to be the minimal group which 
preserves the facts of the geometry. This would be the group of translations and rotations (and possibly 
reflections). Alternatively, one may fix the structure group and determine the set of all facts which are 
invariant under this group. 


In the case of fibre bundles, one constrains the fibre chart transition maps so that a set of “facts” in the
fibre space may be induced onto each fibre set of the total space. The smaller the group, the more properties
may be thus induced. For example, if (G, F) is the pair (GL(n), ℝⁿ), then linear structure may be induced,
whereas if (G, F) is the pair (SO(n), ℝⁿ), then special orthogonal structure may be induced onto each fibre set.


21.8.18 REMARK: Interpretation of structure groups in terms of observers and reference frames. 

In the pure mathematical literature, the structure group of a fibre bundle is an abstract group. In practical
applications, the structure group is the group of transition maps between “frames of reference” for fibre
sets. The fibre space F is the set of components or coordinates for fibre sets E_b = π⁻¹({b}), and the
fibre charts φ (partial functions from E to F) map the “real objects” in sets E_b to the component space F.
Each fibre chart φ ∈ A_E^F defines a frame of reference at each point b ∈ B for making measurements on E_b.
As discussed in Section 20.10, particularly with reference to Definition 20.10.8, the fibre charts φ|_{E_b} may
be regarded as “observers”, and φ(z) represents the combination of an observer frame φ|_{E_b} with an object
z ∈ E_b to produce a measurement φ(z) ∈ F. The set of observer frames φ|_{E_b} constitutes a second fibre
bundle, namely the “principal fibre bundle” which is discussed in Section 21.9.


21.9. Non-topological principal fibre bundles 


21.9.1 REMARK: Principal fibrations are not meaningful. 

The notion of a “principal fibration” makes little sense because a fibration lacks a fibre atlas, and a structure 
group only plays a role when a fibre atlas is specified so that Definition 21.8.3(v) is meaningful. Thus 
without a fibre atlas, the (G, G) fibre bundle mentioned in Definition 21.9.4 would be meaningless. 


21.9.2 REMARK: Abbreviation for “principal fibre bundle”.
In this book, PFB is an abbreviation for “principal fibre bundle”.


21.9.3 REMARK: Principal fibre bundles are bundles of reference frames. 
Principal fibre bundles are bundles of reference frames for making observer-dependent measurements of the 
objects in an ordinary fibre bundle. 


Principal fibre bundles have the advantage that parallelism may be defined on their total spaces directly, 
independently of the choice of fibre charts, and then the parallelism definition may be “ported” to all ordinary 
fibre bundles which are “associated” with a given PFB. 


Principal fibre bundles have both right and left group actions on their total space. The left action corresponds 
to fibre chart transition maps, while the right action corresponds to point transformations. 


21.9.4 DEFINITION: A non-topological principal (fibre) bundle with structure group G is a non-topological
(G, G) fibre bundle (P, π, B, A_P^G).


Alternative names: non-topological principal G-bundle or non-topological G-bundle. 


21.9.5 REMARK: Empty principal fibre bundles. 

As mentioned in Remark 21.8.15, the tuple (P, π, B, A_P^G) = (∅, ∅, ∅, ∅) satisfies Definition 21.8.3 for a (G, F)
fibre bundle for any effective left transformation group (G, F). Therefore by Definition 21.9.4, this empty
fibre bundle (∅, ∅, ∅, ∅) is also a non-topological principal fibre bundle with structure group G for any group G.
(By Theorem 20.7.17, the transformation group (G, G, σ, σ) where G acts on itself is always effective.)


21.9.6 REMARK: The value of principal fibre bundles lies in associations with ordinary fibre bundles. 

The left transformation group (G, G) −< (G, G, σ, σ) of a group G −< (G, σ) acting on itself is defined in
Definition 20.3.8. This is always an effective left transformation group by Theorem 20.3.7. Therefore the
left transformation group (G, G) may be used as the group/fibre pair (G, F) in Definition 21.8.3 for any
group G.


Definition 21.9.4 may appear at first sight to add very little value relative to Definition 21.8.3 for ordinary
fibre bundles. The usefulness of principal fibre bundles lies in the relations between principal fibre bundles
and ordinary fibre bundles, particularly in the porting of parallelism between fibre bundles which have a
common structure group and matching atlases. Fibre bundles with the same base space and structure group,
and matching atlases, are called “associated fibre bundles”.


Since a principal bundle P and associated ordinary fibre bundle E are often discussed in the same context,
it is sometimes convenient to distinguish their projection maps as π : E → B and π_P : P → B for example,
as in Definition 47.11.5. Principal fibre bundles are also called “principal G-bundles” or just “G-bundles”.


21.9.7 REMARK: Fibre chart transition maps for principal bundles are “quotients” of chart values. 

The group element product φ₂(z)φ₁(z)⁻¹ in Theorem 21.9.8 may be thought of as a “quotient” of φ₂(z)
“divided by” φ₁(z) on the right. This expression for g_{φ₂,φ₁}(b) is only possible because the fibre space of a
principal bundle is the same as the structure group.


21.9.8 THEOREM: Fibre chart transition maps of principal bundles look like quotients.
Let (P, π, B, A_P^G) be a non-topological principal bundle with structure group G. Then


∀φ₁, φ₂ ∈ A_P^G, ∀b ∈ π(Dom(φ₁)) ∩ π(Dom(φ₂)), ∀z ∈ π⁻¹({b}),
g_{φ₂,φ₁}(b) = φ₂(z)φ₁(z)⁻¹.


PROOF: The assertion follows from Definitions 21.8.12 and 21.9.4. 
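
The quotient formula can be tried out on a finite example. The sketch below is a hypothetical toy model, not
a construction from the text: the structure group is taken to be S₃ (permutations of {0, 1, 2} under
composition), the total space is P = B × G with B = {a, b}, and the charts phi1, phi2 and the table TWIST
are assumptions chosen so that the transition map is non-trivial. It computes the right quotient
φ₂(z)φ₁(z)⁻¹ at every point of each fibre, and for contrast the product φ₁(z)⁻¹φ₂(z) in the other order.

```python
from itertools import permutations

IDENTITY = (0, 1, 2)
G = list(permutations(IDENTITY))      # structure group S_3 (hypothetical choice)
B = ("a", "b")                        # base space
P = [(b, h) for b in B for h in G]    # total space of the toy principal bundle

def mul(p, q):
    """Group product: the composition 'p after q' of two permutations."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    result = [0, 0, 0]
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)

def proj(z):
    """Projection map pi : P -> B."""
    return z[0]

# Hypothetical group elements, one per base point, used to build a second chart.
TWIST = {"a": (1, 0, 2), "b": (2, 0, 1)}

def phi1(z):
    """First fibre chart: read off the group component."""
    return z[1]

def phi2(z):
    """Second fibre chart: left-multiply the first chart by TWIST[pi(z)]."""
    return mul(TWIST[proj(z)], phi1(z))

def fibre(b):
    """The fibre set pi^-1({b})."""
    return [z for z in P if proj(z) == b]

def right_quotients(b):
    """All values of phi2(z) phi1(z)^-1 as z ranges over the fibre over b."""
    return {mul(phi2(z), inv(phi1(z))) for z in fibre(b)}

def reversed_products(b):
    """All values of phi1(z)^-1 phi2(z) as z ranges over the fibre over b."""
    return {mul(inv(phi1(z)), phi2(z)) for z in fibre(b)}

for b in B:
    print(b, right_quotients(b), len(reversed_products(b)))
```

In this model the right quotient takes a single value on each fibre, namely the transition element, whereas
the product in the reversed order varies with z because S₃ is not commutative. This is why the quotient is
taken “on the right”.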


21.10. Identity cross-sections and identity charts on principal bundles 


21.10.1 REMARK: Identity cross-sections, identity charts, and gauge theory. 
Identity cross-sections and identity charts on principal bundles are applicable to gauge theory. (See for 
example Bleecker [254], page 27; Daniel/Viallet [317], page 178.) 


21.10.2 REMARK: Identity cross-sections for non-topological principal bundles. 

The constant cross-sections of non-topological fibrations in Definition 21.6.6 are well defined for principal
bundles also, but whereas an arbitrary choice of fibre space value f₀ ∈ F must be specified in the construction
of cross-sections of fibrations, groups always have a well-defined unique identity element which can act in
this role. Definition 21.10.3 specialises Definition 21.6.6 by choosing the identity e ∈ G as the fibre space
value. Thus effectively every fibre chart for a principal bundle is associated with a unique local cross-section
via the identity element e. (The converse is true also, as shown by Definition 21.10.7 and Theorem 21.10.8.)


Although only fibre charts φ in the given atlas A_P^G are used as input for Definition 21.10.3, any chart which is
compatible with A_P^G may be used for the construction, where the meaning of “compatible” varies according
to the class of principal bundle, which may be topological or differentiable for example. (Definition 21.10.3
is illustrated in Figure 21.10.1.)


Figure 21.10.1 Identity cross-section for a non-topological principal bundle. (Input: φ. Output: X_φ.)


21.10.3 DEFINITION: The identity cross-section of a non-topological principal bundle (P, π, B, A_P^G) with
structure group G, for a fibre chart φ ∈ A_P^G, is the function X_φ : U → π⁻¹(U) defined by


∀b ∈ U, X_φ(b) = (π × φ)⁻¹(b, e),


where U = π(Dom(φ)), and e is the identity of G.


21.10.4 REMARK: Fibre chart transition formula for identity cross-sections. 

Theorem 21.10.5 (v) becomes more meaningful when the arbitrary chart φ′ is removed by applying the right
action map μ_P^G : P × G → P in Definition 21.11.4. (See Theorem 21.11.14 (i).) Then the formula becomes
∀z ∈ P_b, X_φ(b) = μ_P^G(z, φ(z)⁻¹), which looks simpler when written as ∀z ∈ P_b, X_φ(b) = zφ(z)⁻¹. (This
could be used as a definition for X_φ, but it seems better to regard it as an incidental property.)


21.10.5 THEOREM: Basic properties of identity cross-sections of principal bundles.
Let (P, π, B, A_P^G) be a non-topological principal G-bundle. Let X_φ denote the identity cross-section for fibre
charts φ ∈ A_P^G as in Definition 21.10.3.


(i) ∀φ ∈ A_P^G, ∀b ∈ π(Dom(φ)), π(X_φ(b)) = b. In other words, ∀φ ∈ A_P^G, π ∘ X_φ = id_{π(Dom(φ))}.
(ii) ∀φ ∈ A_P^G, X_φ ∈ X(P, π, B | π(Dom(φ))) ⊆ X(P, π, B). (That is, X_φ is a local cross-section of (P, π, B).)
(iii) ∀φ ∈ A_P^G, ∀b ∈ π(Dom(φ)), φ(X_φ(b)) = e.
(iv) ∀φ, φ′ ∈ A_P^G, ∀b ∈ π(Dom(φ) ∩ Dom(φ′)), φ′(X_φ(b)) = g_{φ′,φ}(b), where g_{φ′,φ}(b) denotes the fibre chart
transition map for P from φ to φ′ in Definition 21.8.12.
(v) ∀φ, φ′ ∈ A_P^G, ∀b ∈ π(Dom(φ) ∩ Dom(φ′)), ∀z ∈ π⁻¹({b}), φ′(X_φ(b)) = φ′(z)φ(z)⁻¹.
PROOF: For part (i), let φ ∈ A_P^G and b ∈ π(Dom(φ)). Then X_φ(b) = (π × φ)⁻¹(b, e) by Definition 21.10.3.
So (π × φ)(X_φ(b)) = (b, e). Hence π(X_φ(b)) = b.

For part (ii), let φ ∈ A_P^G. Let U = Dom(X_φ). Then U = π(Dom(φ)) by Definition 21.10.3. Let b ∈ U.
Then π(X_φ(b)) = b by part (i). So X_φ ∈ X(P, π, B | π(Dom(φ))) by Notation 21.3.4. (The inclusion
X(P, π, B | π(Dom(φ))) ⊆ X(P, π, B) is automatic by Notation 21.3.4.)

For part (iii), let φ ∈ A_P^G and b ∈ π(Dom(φ)). Then φ(X_φ(b)) = φ((π × φ)⁻¹(b, e)) = φ((φ|_{P_b})⁻¹(e)) = e.

For part (iv), let φ, φ′ ∈ A_P^G and b ∈ π(Dom(φ) ∩ Dom(φ′)). Let z′ = (φ|_{P_b})⁻¹(e) with P_b = π⁻¹({b}). Then
φ′(X_φ(b)) = φ′((π × φ)⁻¹(b, e)) = φ′((φ|_{P_b})⁻¹(e)) = φ′(z′) = g_{φ′,φ}(b)φ(z′) = g_{φ′,φ}(b) because φ(z′) = e.
Part (v) follows from part (iv) and Theorem 21.9.8.
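
Parts (i), (iii) and (iv) of Theorem 21.10.5 can be verified exhaustively in a finite toy model. The Python
sketch below makes the following assumptions, none of which is taken from the text: the structure group is
S₃, the base space is B = {a, b}, the total space is P = B × G, and the two charts phi1 and phi2 differ by
left multiplication by the hypothetical elements in TWIST. The inverse (π × φ)⁻¹ is computed by a plain search
through the fibre, and the transition element is computed from the quotient formula of Theorem 21.9.8.

```python
from itertools import permutations

IDENTITY = (0, 1, 2)
G = list(permutations(IDENTITY))      # structure group S_3 (hypothetical choice)
B = ("a", "b")                        # base space
P = [(b, h) for b in B for h in G]    # total space of the toy principal bundle
TWIST = {"a": (1, 0, 2), "b": (2, 0, 1)}   # hypothetical chart-twisting elements

def mul(p, q):
    """Group product: the composition 'p after q' of two permutations."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    result = [0, 0, 0]
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)

def proj(z):
    """Projection map pi : P -> B."""
    return z[0]

def phi1(z):
    """First fibre chart: read off the group component."""
    return z[1]

def phi2(z):
    """Second fibre chart: left-multiply the first chart by TWIST[pi(z)]."""
    return mul(TWIST[proj(z)], phi1(z))

def fibre(b):
    """The fibre set pi^-1({b})."""
    return [z for z in P if proj(z) == b]

def chart_inverse(chart, b, g):
    """(pi x chart)^-1 (b, g): the unique point over b whose chart value is g."""
    matches = [z for z in fibre(b) if chart(z) == g]
    if len(matches) != 1:
        raise ValueError("chart is not a bijection on this fibre")
    return matches[0]

def identity_cross_section(chart):
    """The identity cross-section X_chart, stored as a dict b -> X_chart(b)."""
    return {b: chart_inverse(chart, b, IDENTITY) for b in B}

def transition(chart_to, chart_from, b):
    """g_{chart_to,chart_from}(b), computed as a right quotient at one fibre point."""
    z = fibre(b)[0]
    return mul(chart_to(z), inv(chart_from(z)))

CHARTS = (phi1, phi2)
part_i = all(proj(identity_cross_section(c)[b]) == b for c in CHARTS for b in B)
part_iii = all(c(identity_cross_section(c)[b]) == IDENTITY for c in CHARTS for b in B)
part_iv = all(c2(identity_cross_section(c1)[b]) == transition(c2, c1, b)
              for c1 in CHARTS for c2 in CHARTS for b in B)
print(part_i, part_iii, part_iv)
```

The three flags correspond to parts (i), (iii) and (iv) respectively. In this model X_φ for the chart phi2
picks out the point of each fibre whose group component is the inverse of the TWIST element over that base
point.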


21.10.6 REMARK: The identity chart corresponding to a given principal bundle cross-section. 

Definition 21.10.7 is more or less a converse to Definition 21.10.3. One difference is that it is not at all
guaranteed that the constructed “identity chart” will be in the given fibre atlas A_P^G for the principal bundle,
but Theorem 21.10.8 (ii) asserts that it is at least compatible with A_P^G.


Definitions 21.10.3 and 21.10.7 are not exact inverses or converses of each other. The identity cross-section in
Definition 21.10.3 uses only a small amount of the information in the given chart φ to construct X_φ because
it samples the inverse chart’s value only for the identity element e ∈ G. By contrast, the identity chart
in Definition 21.10.7 uses the information in the cross-section X to construct a chart φ_X not only for the
image X(U) of X, where the chart’s value is e (as shown in Theorem 21.10.8 (iv)), but also for the entire
set π⁻¹(U). (Definition 21.10.7 is illustrated in Figure 21.10.2.)


Figure 21.10.2 Identity chart for a non-topological principal bundle. (Input: X. Output: φ_X. The diagram
also labels the set U × G.)


21.10.7 DEFINITION: The identity chart for a (local) cross-section X of a non-topological principal bundle
(P, π, B, A_P^G) with structure group G is the function φ_X : π⁻¹(U) → G defined by


∀z ∈ π⁻¹(U), ∀φ ∈ A_{P,π(z)}^G, φ_X(z) = φ(X(π(z)))⁻¹φ(z), (21.10.1)
where U = Dom(X).


21.10.8 THEOREM: Basic properties of identity charts for cross-sections of principal bundles.
Let φ_X be the identity chart for a local cross-section X of a non-topological principal bundle (P, π, B, A_P^G).


(i) φ_X is well-defined. That is, φ_X is independent of the choice of fibre chart φ in line (21.10.1).
(ii) A_P^G ∪ {φ_X} is a (G, G) fibre atlas for the non-topological fibration (P, π, B).
(iii) (P, π, B, A_P^G ∪ {φ_X}) is a non-topological principal bundle with structure group G.
(iv) ∀p ∈ Dom(X), φ_X(X(p)) = e.


PROOF: For part (i), let z ∈ π⁻¹(U). Let φ¹, φ² ∈ A_P^G with z ∈ Dom(φ¹) ∩ Dom(φ²). Then there is a
g₀ ∈ G such that φ²(z′) = g₀φ¹(z′) for all z′ ∈ π⁻¹({π(z)}) by Definition 21.8.3 (v). Then


φ²(X(π(z)))⁻¹φ²(z) = (g₀φ¹(X(π(z))))⁻¹ g₀φ¹(z)
= φ¹(X(π(z)))⁻¹ g₀⁻¹ g₀ φ¹(z)
= φ¹(X(π(z)))⁻¹φ¹(z).


Hence line (21.10.1) gives the same value for φ_X(z), independent of the choice of φ.


For part (ii), let P −< (P, π, B, A_P^G) be a non-topological principal bundle. Let X : U → P with U ⊆ B
be a local cross-section of P as in Definition 21.3.3. Let φ_X : π⁻¹(U) → G be the identity chart for X as
in Definition 21.10.7. Then π(Dom(φ_X)) = π(π⁻¹(U)) = U by Theorem 10.7.1 (i) because π is surjective
(onto B). So Dom(φ_X) = π⁻¹(U) = π⁻¹(π(Dom(φ_X))). So φ_X satisfies Definition 21.7.9 (i). But A_P^G satisfies
Definition 21.7.9 (i) because it satisfies Definition 21.8.3 (ii). So A_P^G ∪ {φ_X} satisfies Definition 21.7.9 (i).

To show that π × φ_X : Dom(φ_X) → π(Dom(φ_X)) × G is a bijection, first note that it is a well-defined
function because Range(π|_{Dom(φ_X)}) = π(Dom(φ_X)) and Range(φ_X) ⊆ G. To show that it has an inverse, let
(b, g) ∈ π(Dom(φ_X)) × G. Let φ ∈ A_P^G with b ∈ π(Dom(φ)). Let z = (π × φ)⁻¹(b, φ(X(b))g). Then π(z) = b
and φ_X(z) = φ(X(b))⁻¹φ(z) = φ(X(b))⁻¹φ(X(b))g = g. Thus (π × φ_X)(z) = (b, g), which verifies surjectivity.
But injectivity follows from the fact that φ|_{P_b} is injective for all b ∈ π(Dom(φ)) by Definition 21.8.3 (iv).


Therefore π × φ_X : Dom(φ_X) → π(Dom(φ_X)) × G is a bijection. So since π × φ : Dom(φ) → π(Dom(φ)) × G
is a bijection for all charts φ ∈ A_P^G, it follows that A_P^G ∪ {φ_X} satisfies Definition 21.7.9 (ii) also.


It is clear that the base-space coverage condition, Definition 21.7.9 (iii), must hold for A_P^G ∪ {φ_X} because
the charts of A_P^G already cover the base space.

The fibre chart transition condition, Definition 21.7.9 (iv), holds for all φ₁, φ₂ ∈ A_P^G by Definition 21.8.3 (v).
So it remains to show that the transition rules are correct for φ ∈ A_P^G and φ_X.

Let φ ∈ A_P^G. Let b ∈ π(Dom(φ) ∩ Dom(φ_X)) = π(Dom(φ)) ∩ U. Let g = φ(X(b))⁻¹. Let z ∈ π⁻¹({b}). Then
φ_X(z) = φ(X(π(z)))⁻¹φ(z) = gφ(z). Thus ∃g ∈ G, ∀z ∈ π⁻¹({b}), φ_X(z) = gφ(z). So Definition 21.7.9 (iv)
is satisfied for φ ∈ A_P^G and φ_X. Hence Definition 21.7.9 for a (G, G) fibre atlas for a non-topological fibration
is satisfied.


Part (iii) follows from part (ii) and Theorem 21.8.14 (i) and Definition 21.9.4. 


For part (iv), let p ∈ Dom(X). Then φ_X(X(p)) = φ(X(π(X(p))))⁻¹φ(X(p)) = φ(X(p))⁻¹φ(X(p)) = e for
all φ ∈ A_{P,p}^G, because π ∘ X = id_{Dom(X)} by Definition 21.3.3.
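
The chart-independence in Theorem 21.10.8 (i) and the properties in parts (ii) and (iv) can be observed in a
finite toy model. The following Python sketch rests on assumptions which are not part of the text: the
structure group is S₃, the total space is P = B × G over B = {a, b}, the charts phi1 and phi2 differ by the
hypothetical TWIST elements, and SECTION is an arbitrarily chosen cross-section X. The function
identity_chart evaluates line (21.10.1) with a chart supplied by the caller.

```python
from itertools import permutations

IDENTITY = (0, 1, 2)
G = list(permutations(IDENTITY))      # structure group S_3 (hypothetical choice)
B = ("a", "b")                        # base space
P = [(b, h) for b in B for h in G]    # total space of the toy principal bundle
TWIST = {"a": (1, 0, 2), "b": (2, 0, 1)}   # hypothetical chart-twisting elements

def mul(p, q):
    """Group product: the composition 'p after q' of two permutations."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    result = [0, 0, 0]
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)

def proj(z):
    """Projection map pi : P -> B."""
    return z[0]

def phi1(z):
    """First fibre chart: read off the group component."""
    return z[1]

def phi2(z):
    """Second fibre chart: left-multiply the first chart by TWIST[pi(z)]."""
    return mul(TWIST[proj(z)], phi1(z))

# An arbitrary cross-section X, stored as a dict b -> X(b) with pi(X(b)) = b.
SECTION = {"a": ("a", (0, 2, 1)), "b": ("b", (1, 2, 0))}

def identity_chart(section, chart):
    """phi_X, evaluated as chart(X(pi(z)))^-1 chart(z) with the given fibre chart."""
    def phi_x(z):
        return mul(inv(chart(section[proj(z)])), chart(z))
    return phi_x

via_phi1 = identity_chart(SECTION, phi1)
via_phi2 = identity_chart(SECTION, phi2)

chart_independent = all(via_phi1(z) == via_phi2(z) for z in P)
identity_on_section = all(via_phi1(SECTION[b]) == IDENTITY for b in B)
image = {(proj(z), via_phi1(z)) for z in P}
is_bijection = len(image) == len(P) and image == {(b, g) for b in B for g in G}
print(chart_independent, identity_on_section, is_bijection)
```

The first flag reflects part (i), the second reflects part (iv), and the third reflects the bijection
condition which is needed for part (ii).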


21.10.9 REMARK: Transition map for identity charts on principal bundles. 

Theorem 21.10.10 states that the quotient φ_X(z)φ_Y(z)⁻¹ is independent of z ∈ π⁻¹({b}) for any b in the
common domain of two identity charts. This implies that a transition map can be defined which depends
only on base points b, not on the choice of z. This is given a name in Definition 21.10.11.


21.10.10 THEOREM: Transition map formula between identity charts for different cross-sections. 
Let φ_X and φ_Y be the identity charts for local cross-sections X and Y respectively of a non-topological
principal bundle (P, π, B, A_P^G). Then


∀b ∈ Dom(X) ∩ Dom(Y), ∀z, z′ ∈ π⁻¹({b}),
φ_X(z)φ_Y(z)⁻¹ = φ_X(z′)φ_Y(z′)⁻¹.


PROOF: Let b ∈ Dom(X) ∩ Dom(Y), z, z′ ∈ π⁻¹({b}) and φ ∈ A_{P,b}^G. Then


φ_X(z)φ_Y(z)⁻¹ = φ(X(b))⁻¹φ(z)(φ(Y(b))⁻¹φ(z))⁻¹
= φ(X(b))⁻¹φ(z)φ(z)⁻¹φ(Y(b))
= φ(X(b))⁻¹φ(Y(b))
= φ_X(z′)φ_Y(z′)⁻¹


by Definition 21.10.7 and Theorem 17.3.17 because π(z) = π(z′).


21.10.11 DEFINITION: The identity chart transition map for cross-sections X and Y of a non-topological
principal G-bundle (P, π, B, A_P^G) is the map ψ_{X,Y} : Dom(X) ∩ Dom(Y) → G defined by


∀p ∈ Dom(X) ∩ Dom(Y), ∀z ∈ π⁻¹({p}),
ψ_{X,Y}(p) = φ_X(z)φ_Y(z)⁻¹.


21.10.12 THEOREM: Relation between identity chart transition maps and structure group elements.
Let (P, π, B, A_P^G) be a non-topological principal G-bundle. Let X, Y ∈ X(P, π, B). Let U = Dom(X) ∩
Dom(Y). Let g : U → G. Suppose that ∀b ∈ U, Y(b) = X(b)g(b). Then ψ_{X,Y} = g.


PROOF: Let p ∈ U. Let z ∈ π⁻¹({p}). Let φ′ ∈ atlas_p(P, π, B). Then ψ_{X,Y}(p) = φ_X(z)φ_Y(z)⁻¹ =
φ′(X(p))⁻¹φ′(z)(φ′(Y(p))⁻¹φ′(z))⁻¹ = φ′(X(p))⁻¹φ′(Y(p)) = φ′(X(p))⁻¹φ′(X(p)g(p)) = φ′(X(p))⁻¹φ′(X(p))g(p) = g(p).


21.10.13 REMARK:  Equi-informationality of principal bundle fibre charts and identity cross-sections. 

A cross-section of a principal bundle can be converted to a specific chart, and then back again into the 
original cross-section. Moreover, a fibre chart can be converted to a specific cross-section, and then back 
again to the original fibre chart. This implies that these structures are “equi-informational” because no 
information is lost or gained in these conversions. (A minor technical subtlety here is that Definition 21.10.3 
for an identity cross-section must be extended to all compatible fibre charts, not just charts in a given fibre 
atlas. This extension is easy to do, but not interesting enough to merit the effort.) 


21.10.14 THEOREM: Inter-convertibility of fibre charts and identity cross-sections. 
Let (P, π, B, A_P^G) be a non-topological principal G-bundle.


(i) ∀φ ∈ A_P^G, φ_{X_φ} = φ.
(ii) ∀X ∈ X(P, π, B), X_{φ_X} = X.


PROOF: For part (i), let φ ∈ A_P^G. Let U = π(Dom(φ)). Then Dom(X_φ) = U and X_φ(b) = (π × φ)⁻¹(b, e)
for all b ∈ U by Definition 21.10.3, and X_φ ∈ X(P, π, B | U) by Theorem 21.10.5 (ii). Let z ∈ Dom(φ_{X_φ}) =
π⁻¹(U). Let φ′ ∈ A_{P,π(z)}^G = A_{P,b}^G, where b = π(z). Then φ_{X_φ}(z) = φ′(X_φ(b))⁻¹φ′(z) by Definition 21.10.7.
But φ′(X_φ(b)) = φ′(z)φ(z)⁻¹ by Theorem 21.10.5 (v). So φ_{X_φ}(z) = (φ′(z)φ(z)⁻¹)⁻¹φ′(z) = φ(z). Thus
φ_{X_φ}(z) = φ(z) for all z ∈ Dom(φ_{X_φ}) = Dom(φ). Hence φ_{X_φ} = φ.

For part (ii), let X ∈ X(P, π, B). Then X ∈ X(P, π, B | U) for some U ∈ P(B) by Notation 21.3.4. So
φ_X : π⁻¹(U) → G satisfying Definition 21.10.7 line (21.10.1) is well defined. Consequently the identity
cross-section X_{φ_X} ∈ X(P, π, B | U) is well defined by Definition 21.10.3. (This very marginally extends
Definition 21.10.3 to fibre charts which are not in the given atlas A_P^G.)


Let b ∈ U. Then φ_X(X_{φ_X}(b)) = e by Theorem 21.10.5 (iii). Let z = X_{φ_X}(b). Then by Definition 21.10.7,
e = φ_X(z) = φ′(X(b))⁻¹φ′(z) for any fibre chart φ′ ∈ A_{P,b}^G. So φ′(z) = φ′(X(b)). But π(z) = b = π(X(b)).
So (π × φ′)(z) = (b, φ′(z)) = (b, φ′(X(b))) = (π × φ′)(X(b)). So z = X(b) because π × φ′ is injective. Thus
X_{φ_X}(b) = X(b) for all b ∈ U. Hence X_{φ_X} = X.
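
Both conversions in Theorem 21.10.14 can be run in each direction on a finite example. The Python sketch
below is a toy model under assumptions which do not come from the text: the structure group is S₃, the total
space is P = B × G over B = {a, b}, the charts phi1 and phi2 differ by the hypothetical TWIST elements, and
SECTION is an arbitrary cross-section. As in the proof of part (ii), the identity cross-section construction
is applied to a chart φ_X which need not belong to the given atlas.

```python
from itertools import permutations

IDENTITY = (0, 1, 2)
G = list(permutations(IDENTITY))      # structure group S_3 (hypothetical choice)
B = ("a", "b")                        # base space
P = [(b, h) for b in B for h in G]    # total space of the toy principal bundle
TWIST = {"a": (1, 0, 2), "b": (2, 0, 1)}   # hypothetical chart-twisting elements

def mul(p, q):
    """Group product: the composition 'p after q' of two permutations."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    result = [0, 0, 0]
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)

def proj(z):
    """Projection map pi : P -> B."""
    return z[0]

def phi1(z):
    """First fibre chart: read off the group component."""
    return z[1]

def phi2(z):
    """Second fibre chart: left-multiply the first chart by TWIST[pi(z)]."""
    return mul(TWIST[proj(z)], phi1(z))

def chart_inverse(chart, b, g):
    """(pi x chart)^-1 (b, g): the unique point over b whose chart value is g."""
    matches = [z for z in P if proj(z) == b and chart(z) == g]
    if len(matches) != 1:
        raise ValueError("chart is not a bijection on this fibre")
    return matches[0]

def identity_cross_section(chart):
    """X_chart as a dict b -> (pi x chart)^-1 (b, e)."""
    return {b: chart_inverse(chart, b, IDENTITY) for b in B}

def identity_chart(section, aux_chart):
    """phi_X, evaluated as aux_chart(X(pi(z)))^-1 aux_chart(z)."""
    def phi_x(z):
        return mul(inv(aux_chart(section[proj(z)])), aux_chart(z))
    return phi_x

# An arbitrary cross-section X, stored as a dict b -> X(b).
SECTION = {"a": ("a", (0, 2, 1)), "b": ("b", (1, 2, 0))}

# Part (i): chart -> identity cross-section -> identity chart returns the chart.
charts_recovered = all(
    identity_chart(identity_cross_section(chart), aux)(z) == chart(z)
    for chart in (phi1, phi2) for aux in (phi1, phi2) for z in P
)

# Part (ii): cross-section -> identity chart -> identity cross-section returns the section.
section_recovered = identity_cross_section(identity_chart(SECTION, phi1)) == SECTION
print(charts_recovered, section_recovered)
```

The auxiliary chart aux plays the role of φ′ in the proof of part (i). The check runs over both choices of
aux to show that the recovered chart does not depend on it.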


21.11. The right action map of a non-topological principal bundle


21.11.1 REMARK: The right action map of a principal bundle is implicit in its definition. 

Principal fibre bundles permit the construction of a right action map by the structure group on the total 
space which commutes with the chart transition maps. The action map does not need to be specified as part 
of the definition of a principal fibre bundle because it is implicit in the standard specification tuple. The right 
action map for a principal bundle is important for the definition of connections on differentiable fibre bundles 
in particular. Theorem 21.11.2 establishes that the right action map is well defined and chart-independent. 


21.11.2 THEOREM:  Well-definition and chart-independence of the right action map. 

Let (P, π, B, A_P^G) be a non-topological principal fibre bundle with structure group (G, σ). For all g ∈ G
and φ ∈ A_P^G, define R_{g,φ} : Dom(φ) → Dom(φ) by R_{g,φ}(z) = (π × φ)⁻¹(π(z), σ(φ(z), g)) for all z ∈ Dom(φ).
(i) R_{g,φ} : Dom(φ) → Dom(φ) is a well-defined bijection for all g ∈ G and φ ∈ A_P^G.
(ii) ∀φ ∈ A_P^G, ∀z ∈ Dom(φ), ∀g₁, g₂ ∈ G, R_{g₁,φ}(R_{g₂,φ}(z)) = R_{σ(g₂,g₁),φ}(z).
(iii) ∀φ ∈ A_P^G, ∀z ∈ Dom(φ), R_{e_G,φ}(z) = z.
(iv) ∀φ₁, φ₂ ∈ A_P^G, ∀z ∈ Dom(φ₁) ∩ Dom(φ₂), ∀g ∈ G, R_{g,φ₁}(z) = R_{g,φ₂}(z).


PROOF: Part (i) follows directly from the definition of a fibre chart and the fact that the map R_g : G → G
defined by R_g : f ↦ σ(f, g) is a bijection on G by the definition of a group.


For part (ii), let (P, π, B, A_P^G) be a non-topological principal fibre bundle with structure group (G, σ). Let
φ ∈ A_P^G, z ∈ Dom(φ) and g₁, g₂ ∈ G. Let b = π(z), and let z₂ = R_{g₂,φ}(z) and z₁ = R_{g₁,φ}(z₂). Then z₂ =
(π × φ)⁻¹(π(z), σ(φ(z), g₂)). So (π(z₂), φ(z₂)) = (π(z), σ(φ(z), g₂)). Therefore z₂ ∈ π⁻¹({b}) and φ(z₂) =
σ(φ(z), g₂). Similarly, z₁ ∈ π⁻¹({b}) and φ(z₁) = σ(φ(z₂), g₁). Therefore φ(z₁) = σ(σ(φ(z), g₂), g₁) =
σ(φ(z), σ(g₂, g₁)). Let z₃ = R_{σ(g₂,g₁),φ}(z). Then (π(z₃), φ(z₃)) = (π(z), σ(φ(z), σ(g₂, g₁))). So z₃ ∈ π⁻¹({b})
and φ(z₃) = σ(φ(z), σ(g₂, g₁)) = φ(z₁). Therefore z₃ = z₁. That is, R_{g₁,φ}(R_{g₂,φ}(z)) = R_{σ(g₂,g₁),φ}(z).
For part (iii), let φ ∈ A_P^G and z ∈ Dom(φ). Then
R_{e_G,φ}(z) = (π × φ)⁻¹(π(z), σ(φ(z), e_G))
= (π × φ)⁻¹(π(z), φ(z))
= z.


For part (iv), let φ₁, φ₂ ∈ A_P^G, z ∈ Dom(φ₁) ∩ Dom(φ₂) and g ∈ G. Let z₁ = R_{g,φ₁}(z) and z₂ = R_{g,φ₂}(z). Then
(π(z₁), φ₁(z₁)) = (π(z), σ(φ₁(z), g)) and (π(z₂), φ₂(z₂)) = (π(z), σ(φ₂(z), g)). So π(z₁) = π(z₂) = π(z) and
φ₁(z₁) = σ(φ₁(z), g) and φ₂(z₂) = σ(φ₂(z), g). By Definition 21.8.3 (v), there is a group element g′ ∈ G such
that ∀z′ ∈ π⁻¹({b}), φ₁(z′) = σ(g′, φ₂(z′)), where b = π(z). So from φ₁(z₁) = σ(φ₁(z), g), it follows that
σ(g′, φ₂(z₁)) = σ(σ(g′, φ₂(z)), g) = σ(g′, σ(φ₂(z), g)) by the associativity of σ. So φ₂(z₁) = σ(φ₂(z), g) (by
pre-multiplying both sides with (g′)⁻¹). So φ₂(z₂) = φ₂(z₁). Therefore z₂ = z₁ because π × φ₂ is a bijection
by Definition 21.8.3 (iv). Hence R_{g,φ₁}(z) = R_{g,φ₂}(z), as claimed.
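
The chart-independence in Theorem 21.11.2 (iv) depends on multiplying on the right. This can be seen by
exhaustive checking in a finite toy model. The Python sketch below makes assumptions which are not in the
text: the structure group is S₃, the total space is P = B × G over B = {a, b}, and the charts phi1 and phi2
differ by left multiplication by the hypothetical TWIST elements. The function left_variant is a deliberately
wrong variant of R_{g,φ} which multiplies on the left. It is included only for contrast.

```python
from itertools import permutations

IDENTITY = (0, 1, 2)
G = list(permutations(IDENTITY))      # structure group S_3 (hypothetical choice)
B = ("a", "b")                        # base space
P = [(b, h) for b in B for h in G]    # total space of the toy principal bundle
TWIST = {"a": (1, 0, 2), "b": (2, 0, 1)}   # hypothetical chart-twisting elements

def mul(p, q):
    """Group product sigma(p, q): the composition 'p after q' of two permutations."""
    return tuple(p[q[i]] for i in range(3))

def proj(z):
    """Projection map pi : P -> B."""
    return z[0]

def phi1(z):
    """First fibre chart: read off the group component."""
    return z[1]

def phi2(z):
    """Second fibre chart: left-multiply the first chart by TWIST[pi(z)]."""
    return mul(TWIST[proj(z)], phi1(z))

def chart_inverse(chart, b, g):
    """(pi x chart)^-1 (b, g): the unique point over b whose chart value is g."""
    matches = [z for z in P if proj(z) == b and chart(z) == g]
    if len(matches) != 1:
        raise ValueError("chart is not a bijection on this fibre")
    return matches[0]

def right_action(chart, z, g):
    """R_{g,chart}(z) = (pi x chart)^-1 (pi(z), chart(z) g)."""
    return chart_inverse(chart, proj(z), mul(chart(z), g))

def left_variant(chart, z, g):
    """Contrast only: (pi x chart)^-1 (pi(z), g chart(z)), multiplying on the left."""
    return chart_inverse(chart, proj(z), mul(g, chart(z)))

# Part (iv): the right action does not depend on the chart.
right_is_chart_independent = all(
    right_action(phi1, z, g) == right_action(phi2, z, g) for z in P for g in G
)

# The left variant does depend on the chart, because S_3 is not commutative.
left_is_chart_dependent = any(
    left_variant(phi1, z, g) != left_variant(phi2, z, g) for z in P for g in G
)

# Part (ii): acting by g2 and then by g1 equals acting by sigma(g2, g1).
composition_rule = all(
    right_action(phi2, right_action(phi2, z, g2), g1) == right_action(phi2, z, mul(g2, g1))
    for z in P for g1 in G for g2 in G
)
print(right_is_chart_independent, left_is_chart_dependent, composition_rule)
```

In this model both charts give R_g(b, h) = (b, hg), whereas the left variant gives (b, gh) in the first chart
and a conjugated result in the second.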


21.11.3 REMARK: Action by structure groups on total spaces may be defined in a chart-independent way. 
Theorem 21.11.2 (iv) is the core reason for defining principal fibre bundles. It says that an action of the
structure group G on the total space P may be defined in a chart-independent manner. The reason for this is
that right and left actions by G on G commute with each other. The fibre chart transition maps are equivalent
to left actions. Therefore the right action commutes with fibre chart transition maps. Consequently, the
right action map μ_P^G in Definition 21.11.4 is well defined although it may at first sight seem to depend on
the choice of chart φ.


21.11.4 DEFINITION: The right action (map) (on the total space) of a non-topological principal fibre bundle
(P, π, B, A_P^G) with structure group (G, σ) is the map μ_P^G : P × G → P defined by


∀z ∈ P, ∀g ∈ G, ∀φ ∈ A_{P,π(z)}^G, μ_P^G(z, g) = (π × φ)⁻¹(π(z), σ(φ(z), g)),
where A_{P,π(z)}^G = {φ ∈ A_P^G; z ∈ Dom(φ)} as in Notation 21.8.4.


Figure 21.11.1 Non-topological fibrations and fibre bundles. (Panels: fibration; F-fibration; (G, F) fibre
bundle; principal G-bundle.)


21.11.5 REMARK: Diagram for the right action map on a principal fibre bundle. 
Figure 21.11.1 illustrates the sets and functions in Definition 21.11.4 for principal fibre bundles in comparison 
with fibrations and ordinary fibre bundles, which do not have a right action map. 


The right action map is called the “principal map” by Steenrod [142], pages 37-38. 

Definition 21.11.4 is independent of the choice of the fibre chart φ by Theorem 21.11.2. This
fibre-chart-independence follows from Definition 21.8.3 (v), which implies that there is a unique group element
which effects the group transitions for all elements of a fixed fibre π⁻¹({b}), for a fixed pair of charts φ₁ and φ₂.


21.11.6 NOTATION: R_g^P, for a non-topological principal G-bundle (P, π, B, A_P^G) and g ∈ G, denotes the map
from P to P defined by z ↦ μ_P^G(z, g). In other words, R_g^P : P → P is defined by ∀z ∈ P, R_g^P(z) = μ_P^G(z, g).
Equivalently, R_g^P = μ_P^G(·, g).

zg, for z ∈ P and g ∈ G, denotes R_g^P(z).

21.11.7 THEOREM: Some basic properties of the right action map of a non-topological principal bundle. 

Let (P, π, B, A_P^G) be a non-topological principal G-bundle with group G −< (G, σ_G).

(i) ∀g ∈ G, Dom(R_g^P) = P.
(ii) ∀z ∈ P, ∀g₁, g₂ ∈ G, z(g₁g₂) = (zg₁)g₂.
(That is, ∀z ∈ P, ∀g₁, g₂ ∈ G, R_{g₁g₂}^P(z) = R_{g₂}^P(R_{g₁}^P(z)). Thus ∀g₁, g₂ ∈ G, R_{g₁g₂}^P = R_{g₂}^P ∘ R_{g₁}^P.)
(iii) ∀z ∈ P, ze_G = z.
(iv) (G, P, σ_G, μ_P^G) is a right transformation group.
(v) ∀z ∈ P, ∀g ∈ G, π(zg) = π(z). (That is, π(μ_P^G(z, g)) = π(z).)
(vi) ∀φ ∈ A_P^G, ∀z ∈ Dom(φ), ∀g ∈ G, φ(zg) = φ(z)g. (That is, φ(μ_P^G(z, g)) = σ_G(φ(z), g).)
(vii) Let μ : P × G → P be a function which satisfies (v) ∀z ∈ P, ∀g ∈ G, π(μ(z, g)) = π(z) and (vi) ∀φ ∈ A_P^G,
∀z ∈ Dom(φ), ∀g ∈ G, φ(μ(z, g)) = σ_G(φ(z), g). Then μ = μ_P^G. Hence μ_P^G is the unique function from
P × G to P which satisfies parts (v) and (vi).
(viii) ∀g ∈ G, π ∘ R_g^P = π.
(ix) ∀φ ∈ A_P^G, ∀g ∈ G, φ ∘ R_g^P = R_g^G ∘ φ. (Here R_g^G denotes right multiplication by g on G.)
(x) ∀b ∈ B, ∀z ∈ P_b, P_b = {zg; g ∈ G}.
(xi) ∀b ∈ B, ∀z, z′ ∈ P_b, ∃′g ∈ G, z′ = zg.
(xii) ∀b ∈ B, ∀φ ∈ A_{P,b}^G, ∀g₁, g₂ ∈ G, (φ|_{P_b})⁻¹(g₁g₂) = (φ|_{P_b})⁻¹(g₁)g₂.
(xiii) ∀b ∈ B, ∀φ ∈ A_{P,b}^G, ∀g ∈ G, (φ|_{P_b})⁻¹(g) = (φ|_{P_b})⁻¹(e_G)g.
PROOF: Part (i) follows from Definition 21.11.4 and Notation 21.11.6. 
Part (ii) follows from Theorem 21.11.2 (ii), Definition 21.11.4 and Notation 21.11.6. 


Part (iii) follows from Theorem 21.11.2 (iii), Definition 21.11.4 and Notation 21.11.6. 
Part (iv) follows from Definition 20.7.2 and parts (ii) and (iii). 
For part (v), let $z \in P$, $g \in G$ and $\phi \in A^G_{P,\pi(z)}$. Then by Definition 21.11.4 and Notation 10.14.4, $(\pi(zg), \phi(zg)) = (\pi \times \phi)(zg) = (\pi \times \phi)((\pi \times \phi)^{-1}(\pi(z), \phi(z)g)) = (\pi(z), \phi(z)g)$. Hence $\pi(zg) = \pi(z)$.
For part (vi), let $\phi \in A^G_P$, $z \in \mathrm{Dom}(\phi)$ and $g \in G$. Then $(\pi(zg), \phi(zg)) = (\pi(z), \phi(z)g)$ as in part (v). Hence $\phi(zg) = \phi(z)g$.
For part (vii), suppose that $\mu : P \times G \to P$ satisfies (v) and (vi). Let $z \in P$, $g \in G$ and $\phi \in A^G_{P,\pi(z)}$. Then by (v), $\pi(\mu(z, g)) = \pi(z)$, and by (vi), $\phi(\mu(z, g)) = \sigma_G(\phi(z), g)$. So $(\pi \times \phi)(\mu(z, g)) = (\pi(z), \sigma_G(\phi(z), g))$. Therefore $\mu(z, g) = (\pi \times \phi)^{-1}((\pi(z), \sigma_G(\phi(z), g))) = \mu^P_G(z, g)$. Thus $\mu = \mu^P_G$. Hence $\mu^P_G$ is the unique map from $P \times G$ to $P$ which satisfies (v) and (vi).


Part (viii) follows from part (v). 

Part (ix) follows from part (vi). (See Definition 10.10.6 for composition of partially defined functions.) 
For part (x), let $b \in B$ and $z \in P_b$. Let $z' \in P_b$. Let $\phi \in A^G_{P,b}$. Let $g = \phi(z)^{-1}\phi(z')$. Then by part (vi) and Definition 21.11.4, $zg = (\pi \times \phi)^{-1}(b, \phi(z)g) = (\pi \times \phi)^{-1}(\pi(z'), \phi(z')) = z'$. Thus $P_b \subseteq \{zg;\ g \in G\}$. Now let $z' \in \{zg;\ g \in G\}$. Then $z' = zg$ for some $g \in G$. So $z' \in P_b$ by part (v). Hence $P_b = \{zg;\ g \in G\}$.

For part (xi), existence follows from part (x), and uniqueness follows from Definition 21.11.4 and Theorems 
21.8.8 and 20.3.7. 

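For part (xii), $\phi|_{P_b} : P_b \to G$ is a bijection by Theorem 21.8.8. Consequently $\phi|_{P_b}^{-1} : G \to P_b$ is a bijection by Theorem 10.5.11. Let $g_1, g_2 \in G$. Then $g_1 g_2 = \phi(\phi|_{P_b}^{-1}(g_1)) g_2 = \phi(\phi|_{P_b}^{-1}(g_1) g_2)$ by part (vi). Hence $\phi|_{P_b}^{-1}(g_1 g_2) = \phi|_{P_b}^{-1}(\phi(\phi|_{P_b}^{-1}(g_1) g_2)) = \phi|_{P_b}^{-1}(g_1) g_2$.
Part (xiii) follows from part (xii) with $g_1 = e_G$ and $g_2 = g$.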
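
The properties in Theorem 21.11.7 can be checked by brute force on a small finite example. The following is a minimal sketch, not part of the book's development: it assumes a trivial bundle $P = B \times G$ with $G = S_3$, and two hypothetical charts `phi1` and `phi2` which differ by a left translation. All names in the code are illustrative assumptions.

```python
from itertools import permutations

# Hypothetical finite data: structure group G = S_3 as permutation tuples.
G = list(permutations(range(3)))
e = (0, 1, 2)

def mul(g, h):
    """Group product gh: the permutation i -> g[h[i]]."""
    return tuple(g[h[i]] for i in range(3))

B = [0, 1]                                  # base set
P = [(b, k) for b in B for k in G]          # total space of a trivial bundle

def proj(z):
    return z[0]

c = (1, 0, 2)                               # arbitrary fixed group element

def phi1(z):
    return z[1]

def phi2(z):
    return mul(c, z[1])                     # phi2 = (left translation by c) o phi1

def right_action(phi, z, g):
    """mu(z, g) = (pi x phi)^{-1}(pi(z), phi(z) g), inverted by brute-force search."""
    target = (proj(z), mul(phi(z), g))
    matches = [w for w in P if (proj(w), phi(w)) == target]
    assert len(matches) == 1                # pi x phi is a bijection onto B x G
    return matches[0]

def mu(z, g):
    return right_action(phi1, z, g)

pairs = [(z, g) for z in P for g in G]
# The right action does not depend on which chart is used to define it.
chart_independent = all(right_action(phi1, z, g) == right_action(phi2, z, g)
                        for z, g in pairs)
part_ii = all(mu(z, mul(g1, g2)) == mu(mu(z, g1), g2)
              for z in P for g1 in G for g2 in G)
part_iii = all(mu(z, e) == z for z in P)
part_v = all(proj(mu(z, g)) == proj(z) for z, g in pairs)
part_vi = all(phi(mu(z, g)) == mul(phi(z), g)
              for phi in (phi1, phi2) for z, g in pairs)
# Free action: only the identity has a fixed point.
acts_freely = all(g == e for z, g in pairs if mu(z, g) == z)
```

The chart transition acts on the left of the fibre space $G$, whereas the action acts on the right, which is why the two charts give the same map in this sketch.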


21.11.8 DEFINITION: The right transformation group of a non-topological principal fibre bundle $(P, \pi, B, A^G_P)$ with structure group $(G, \sigma_G)$ is the right transformation group $(G, P, \sigma_G, \mu^P_G)$, where $\mu^P_G$ is the right action map on $(P, \pi, B, A^G_P)$.


21.11.9 REMARK: Basic properties of the right transformation group of a principal bundle. 

Theorem 21.11.10 asserts that right transformation groups of principal bundles act freely. In other words, the 
only right action which has a fixed point is the identity action. Note that by Remark 21.9.5, the possibility 
that the total space is empty cannot be excluded. 


21.11.10 THEOREM: Free and effective action by right transformation group of a principal bundle. 
The right transformation group of a non-topological principal bundle acts freely on the total space. In other words, for any non-topological principal bundle $(P, \pi, B, A^G_P)$ with structure group $G$,

$\forall g \in G$, $(\exists z \in P,\ \mu^P_G(z, g) = z) \Rightarrow g = e$.

Hence the action is effective if the total space is non-empty. In other words, if $P \ne \emptyset$ then

$\forall g \in G$, $(\forall z \in P,\ \mu^P_G(z, g) = z) \Rightarrow g = e$.


PROOF: Let $P -\!\!< (P, \pi, B, A^G_P)$ be a non-topological principal bundle with structure group $G -\!\!< (G, \sigma_G)$. Then the right transformation group of $P$ is $(G, P) -\!\!< (G, P, \sigma_G, \mu^P_G)$, where $\mu^P_G$ is the right action map on $P$. Let $g \in G \setminus \{e\}$. Let $z \in P$. Suppose that $\mu^P_G(z, g) = z$. Let $\phi \in A^G_{P,\pi(z)}$. Then $(\pi \times \phi)^{-1}(\pi(z), \sigma_G(\phi(z), g)) = z$ by Definition 21.11.4. So $(\pi(z), \sigma_G(\phi(z), g)) = (\pi(z), \phi(z))$. Therefore $\sigma_G(\phi(z), g) = \phi(z)$. But $G$ acts freely on $G$ by Theorem 20.7.17. Therefore $g = e$. So $G$ acts freely on $P$ by Definition 20.7.11. Hence if $P \ne \emptyset$, then $G$ acts effectively on $P$ by Theorem 20.7.12.


21.11.11 REMARK: Construction of a principal bundle from a right transformation group. 

A principal bundle may be constructed from any right transformation group where the group acts freely on 
the set. (See for example Bleecker [254], page 26; Daniel/Viallet [317], page 177.) Theorem 21.11.12 uses the 
orbit space in Definition 20.7.23 as the base-point space, and the projection map simply maps each element 
of the passive set to its orbit in Definition 20.7.20. This construction is the reverse of Definition 21.11.8, 
which constructs a right transformation group from a principal bundle. 
Since the projection map of a principal bundle which has been constructed from a right transformation group is the quotient map $q : P \to P/G$ for the quotient group $P/G$ as in Definition 17.7.9, it is not surprising that some authors use the notation “q” for this map. (See for example EDM2 [113], page 569.)


It is not possible in general to construct a fibre chart for an infinite subset of $B$ by combining the pointwise fibre charts constructed in Theorem 21.11.12 (i) because an element $z \in P$ must be chosen for each base point $b$. This may be impossible without an axiom of choice, although in typical concrete applications there will usually be obvious choice-rules. The choice of $z_0 \in P$ for each $b \in B$ is effectively the choice of a cross-section of the fibration. (See Definition 21.3.3 for cross-sections of fibrations.) So the existence of a cross-section for a given subset of $B$ is equivalent to the existence of a fibre chart for that subset.


Theorem 21.11.12 (iv) implies that one may start with either the right transformation group or the principal 
bundle, and construct one from the other. The main advantage of starting with the principal bundle is that 
it has a concrete base-point set B rather than an abstract orbit space P/G. 


21.11.12 THEOREM: Construction of a principal bundle from a right transformation group. 
Let $(G, P, \sigma, \mu)$ be a right transformation group where $G$ acts freely on $P$. Let $B = P/G = \{zG;\ z \in P\}$, where $zG = \{zg;\ g \in G\}$ for $z \in P$. Define $\pi : P \to B$ by $\pi(z) = zG$ for $z \in P$.

(i) $(P, \pi, B)$ is a non-topological fibration with fibre space $G$.

(ii) $(P, \pi, B, A^G_P)$ is a non-topological $(G, G)$ fibre bundle with $A^G_P = \{\phi_{z_0};\ z_0 \in P\}$, where $\phi_{z_0} : P_{\pi(z_0)} \to G$ is defined for $z_0 \in P$ as the unique map from $P_{\pi(z_0)}$ to $G$ which satisfies

$\forall z \in P_{\pi(z_0)}$, $z_0 \phi_{z_0}(z) = z$, (21.11.1)

where $P_b = \pi^{-1}(\{b\})$ for $b \in B$. In other words, $P_b = \{z \in P;\ b = zG\}$.

(iii) $(P, \pi, B, A^G_P)$ is a non-topological principal fibre bundle with structure group $G$.

(iv) The right action map of the non-topological principal bundle $(P, \pi, B, A^G_P)$ is $\mu$.


PROOF: For part (i), let $b \in B$. Choose $z_0 \in P$ such that $b = z_0 G$. Let $z \in P_b$. Then $z \in P$ and $b = zG$. So $z = z e_G \in zG = z_0 G$. Therefore $z = z_0 g$ for some $g \in G$, which is unique by Definition 20.7.11 because $G$ acts freely on $P$. Thus $\forall z \in P_b$, $\exists' g \in G$, $z = z_0 g$. Therefore there is a unique function $\phi_{z_0} : P_b \to G$ which satisfies line (21.11.1).

To show that $\phi_{z_0} : P_b \to G$ is a bijection, define $L_{z_0} : G \to P_b$ by $L_{z_0}(g) = z_0 g$ for all $g \in G$. Let $z \in P_b$. Then $L_{z_0}(\phi_{z_0}(z)) = z_0 \phi_{z_0}(z) = z$ by line (21.11.1). Let $g \in G$. Then $\phi_{z_0}(L_{z_0}(g)) = \phi_{z_0}(z_0 g)$. Therefore $z_0 \phi_{z_0}(L_{z_0}(g)) = z_0 \phi_{z_0}(z_0 g) = z_0 g$ by line (21.11.1). So $\phi_{z_0}(L_{z_0}(g)) = g$ because $G$ acts freely on $P$. Thus $L_{z_0} \circ \phi_{z_0} = \mathrm{id}_{P_b}$ and $\phi_{z_0} \circ L_{z_0} = \mathrm{id}_G$. Therefore $\phi_{z_0} : P_b \to G$ is a bijection by Theorem 10.5.14 (iv). Hence $(P, \pi, B)$ is a non-topological fibration with fibre space $G$ by Definition 21.2.10.

For part (ii), Theorem 20.7.17 implies that $G$ acts effectively on $G$, as required by Definition 21.8.3 for a non-topological $(G, G)$ fibre bundle. To show that $(P, \pi, B, A^G_P)$ satisfies Definition 21.8.3 (ii), let $\phi \in A^G_P$. Then $\phi = \phi_{z_0}$ for some $z_0 \in P$. Let $b = z_0 G$. Then $\pi(z_0) = b$. Let $U = \{b\}$. Then $U \in \mathbb{P}(B)$ and $\pi^{-1}(U) = P_b$, and so $\phi_{z_0} : \pi^{-1}(U) \to G$ is a well-defined function. Thus Definition 21.8.3 (ii) is satisfied. In fact, $\phi_{z_0} : \pi^{-1}(U) \to G$ is a bijection. So $\pi \times \phi_{z_0} : P_b \to U \times G$ is a bijection. Thus Definition 21.8.3 (iv) is satisfied. For all $\phi_{z_0}$ in $A^G_P$, $\pi(\mathrm{Dom}(\phi_{z_0})) = \pi(P_b) = \{b\}$. So $\bigcup\{\pi(\mathrm{Dom}(\phi));\ \phi \in A^G_P\} = \bigcup\{\{b\};\ b \in B\} = B$. Therefore Definition 21.8.3 (iii) is satisfied.

To verify Definition 21.8.3 (v), let $\phi^1, \phi^2 \in A^G_P$. Then $\phi^1 = \phi_{z_1}$ and $\phi^2 = \phi_{z_2}$ for some $z_1, z_2 \in P$, and then $\mathrm{Dom}(\phi^1) = P_{b_1}$ and $\mathrm{Dom}(\phi^2) = P_{b_2}$, where $b_1 = z_1 G$ and $b_2 = z_2 G$. Let $b \in \pi(\mathrm{Dom}(\phi^1) \cap \mathrm{Dom}(\phi^2))$. Then $b \in \pi(P_{b_1} \cap P_{b_2})$, which implies that $b = b_1 = b_2$. Let $g = \phi_{z_2}(z_1)$. Let $z \in P_b$. Then $g \phi^1(z) = \phi_{z_2}(z_1) \phi_{z_1}(z)$. So $z_2 g \phi^1(z) = z_2 \phi_{z_2}(z_1) \phi_{z_1}(z) = z_1 \phi_{z_1}(z) = z = z_2 \phi_{z_2}(z) = z_2 \phi^2(z)$. Therefore $g \phi^1(z) = \phi^2(z)$ because $G$ acts freely on $P$. Thus Definition 21.8.3 (v) is satisfied. Hence $(P, \pi, B, A^G_P)$ is a non-topological $(G, G)$ fibre bundle by Definition 21.8.3.

Part (iii) follows from part (ii) and Definition 21.9.4.
For part (iv), let $\mu^P_G : P \times G \to P$ be the right action for $(P, \pi, B, A^G_P)$ as in Definition 21.11.4. Then

$\forall g \in G$, $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $\mu^P_G(z, g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g)$.

Let $g \in G$ and $z \in P$. Let $\phi \in A^G_{P,\pi(z)}$. Then $\phi = \phi_{z_0}$ for some $z_0 \in P$, with $b = \pi(z) = \pi(z_0)$. It follows that $\mu^P_G(z, g) = (\pi \times \phi_{z_0})^{-1}(\pi(z), \phi_{z_0}(z)g)$. But $z_0 \phi_{z_0}(z) g = zg$ by line (21.11.1) and $z_0 \phi_{z_0}(zg) = zg$ by line (21.11.1). So $\phi_{z_0}(z)g = \phi_{z_0}(zg)$ because $G$ acts freely on $P$. Also, $\pi(zg) = \pi(z)$ by the definition of $\pi$. Therefore $\mu^P_G(z, g) = (\pi \times \phi_{z_0})^{-1}(\pi(zg), \phi_{z_0}(zg)) = zg = \mu(z, g)$. Hence $\mu^P_G = \mu$.
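
The construction in Theorem 21.11.12 can be traced on a very small example. The sketch below is an illustration under stated assumptions, not the book's own example: the group $Z_2$ acts freely on the set $Z_6$ by $z \cdot g = z + 3g \bmod 6$. The names `mu`, `chart`, `atlas` and `bundle_action` are hypothetical. It builds the orbit space, the pointwise charts $\phi_{z_0}$ of line (21.11.1), and then recovers the original action from the charts as in part (iv).

```python
# Hypothetical data: G = Z_2 acting on the right of P = Z_6.
G = [0, 1]
P = list(range(6))

def sigma(g, h):
    """Group operation of Z_2."""
    return (g + h) % 2

def mu(z, g):
    """Right action z.g = z + 3g mod 6."""
    return (z + 3 * g) % 6

# Free action: only the identity element 0 fixes any point.
acts_freely = all(g == 0 for z in P for g in G if mu(z, g) == z)

def proj(z):
    """pi(z) = zG, the orbit of z."""
    return frozenset(mu(z, g) for g in G)

B = {proj(z) for z in P}                    # base set = orbit space P/G

def fibre(b):
    return [z for z in P if proj(z) == b]

def chart(z0):
    """phi_{z0}: the unique g with z0.g = z, for each z in the fibre of z0."""
    table = {}
    for z in fibre(proj(z0)):
        (g,) = [g for g in G if mu(z0, g) == z]   # exactly one solution (free action)
        table[z] = g
    return table

atlas = {z0: chart(z0) for z0 in P}

def bundle_action(z, g, z0):
    """Right action computed from the chart phi_{z0}: invert phi at phi(z) g."""
    phi = atlas[z0]
    (w,) = [w for w in phi if phi[w] == sigma(phi[z], g)]
    return w

line_21_11_1 = all(mu(z0, atlas[z0][z]) == z
                   for z0 in P for z in fibre(proj(z0)))
charts_bijective = all(sorted(atlas[z0].values()) == G for z0 in P)
recovers_mu = all(bundle_action(z, g, z0) == mu(z, g)
                  for z in P for g in G for z0 in fibre(proj(z)))
```

Each chart here has a single fibre as its domain, so no choice of one base point per fibre is needed to build the atlas; every $z_0 \in P$ contributes its own chart.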


21.11.13 REMARK: Identity cross-sections expressed in terms of the right action map. 
Theorem 21.11.14 applies the right action map of a principal bundle to express identity cross-sections in a 
slightly simpler form. 


21.11.14 THEOREM: Formula for identity cross-sections in terms of the right action map. 
Let $(P, \pi, B, A^G_P)$ be a non-topological principal $G$-bundle. Let $X_\phi$ denote the identity cross-section for fibre charts $\phi \in A^G_P$ as in Definition 21.10.3.

(i) $\forall \phi \in A^G_P$, $\forall b \in \pi(\mathrm{Dom}(\phi))$, $\forall z \in \pi^{-1}(\{b\})$, $X_\phi(b) = z \phi(z)^{-1}$.

PROOF: Part (i) follows from Theorem 21.9.8 (v) and Definition 21.11.4.


21.11.15 REMARK: Compatibility of identity charts with principal bundle atlases. 
Theorem 21.11.16 shows that the identity charts in Definition 21.10.7 obey the same rule for the right action 
map as the fibre charts in the principal bundle's own atlas. 


21.11.16 THEOREM: Identity charts commute with the right action map like fibre atlas charts. 
Let $(P, \pi, B, A^G_P)$ be a non-topological principal $G$-bundle.
(i) $\forall X \in X(P, \pi, B)$, $\forall z \in \pi^{-1}(\mathrm{Dom}(X))$, $\forall g \in G$, $\phi_X(zg) = \phi_X(z)g$.

(ii) $\forall X \in X(P, \pi, B)$, $\forall g \in G$, $\phi_X \circ R^P_g|_{\pi^{-1}(\mathrm{Dom}(X))} = R^G_g \circ \phi_X$.

PROOF: For part (i), let $z \in \pi^{-1}(\mathrm{Dom}(X))$ and $g \in G$. Then $\pi(zg) = \pi(z)$ and $\phi(zg) = \phi(z)g$ by Theorem 21.11.7 (v, vi). It follows that $\phi_X(zg) = \phi(X(\pi(zg)))^{-1}\phi(zg) = \phi(X(\pi(z)))^{-1}\phi(z)g = \phi_X(z)g$ by Definition 21.10.7.

Part (ii) is equivalent to part (i).


21.11.17 REMARK: Left action maps by principal bundle elements “acting on” group elements. 

The left action map $L_z$, which is defined for temporary convenience in the proof of Theorem 21.11.12 (i), is in
fact of great importance because when it is defined for differentiable principal bundles as in Definition 66.4.2, 
it is the map which must be differentiated to obtain the infinitesimal action map of Lie algebra elements 
acting on the principal bundle in Definition 66.5.2, which is then transposed to construct the “fundamental 
vertical vector field” in Definition 66.6.2, which is a central concept used in the development of properties of 
connection forms in Definition 69.5.4. The basic properties of the left action map given in Theorem 21.11.19 
are useful for the definitions of the infinitesimal action map and fundamental vertical vector field. 


The left action map in Definition 21.11.18 is constructed from the right action of the structure group of a 
principal bundle, which can only be defined for principal bundles, not for ordinary fibre bundles. It should 
perhaps also be noted that the fibre atlas is required for this construction since the right action map is 
constructed from this atlas. Therefore a principal fibration structure is not sufficient. 


21.11.18 DEFINITION: The left action (map) by a principal bundle element $z \in P$, for a non-topological principal $G$-bundle $P -\!\!< (P, \pi, M, A^G_P)$ is the map $L^P_z : G \to P$ defined by $L^P_z : g \mapsto zg$. In other words,

$\forall z \in P$, $\forall g \in G$, $L^P_z(g) = R^P_g(z)$.

In other words, $\forall z \in P$, $L^P_z = \mu^P_G(z, \cdot)$. (See Definition 21.11.4 for $\mu^P_G$.)


21.11.19 THEOREM: Basic properties of left action map by a principal bundle on its structure group. 
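Let $(P, \pi, M, A^G_P)$ be a non-topological principal $G$-bundle.

(i) $\forall z \in P$, $\forall g \in G$, $\forall \phi \in A^G_{P,\pi(z)}$, $L^P_z(g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g)$.
(ii) $\mathrm{Range}(L^P_z) = P_{\pi(z)}$ for all $z \in P$.
(iii) $L^P_z : G \to P_{\pi(z)}$ is a bijection for all $z \in P$.
(iv) $\forall z \in P$, $\forall g \in G$, $L^P_z \circ R^G_g = R^P_g \circ L^P_z$.
(v) $\forall z \in P$, $(L^P_z)^{-1}(z) = e$. (Note that $\forall z \in P$, $\mathrm{Dom}((L^P_z)^{-1}) = P_{\pi(z)}$.)
(vi) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $\forall z' \in P_{\pi(z)}$, $(L^P_z)^{-1}(z') = \phi(z)^{-1}\phi(z')$.
In other words, $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $(L^P_z)^{-1} = L^G_{\phi(z)^{-1}} \circ \phi|_{P_{\pi(z)}}$.
(Note that $(L^P_z)^{-1}$ is the inverse of a map $L^P_z$, whereas $\phi(z)^{-1}$ is the inverse of a group element $\phi(z) \in G$.)
(vii) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $(L^P_z)^{-1} \circ \phi|_{P_{\pi(z)}}^{-1} = L^G_{\phi(z)^{-1}}$.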


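PROOF: Part (i) follows from Definitions 21.11.18 and 21.11.4.
For part (ii), let $z \in P$ and $\phi \in A^G_{P,\pi(z)}$. Then $\pi(L^P_z(g)) = \pi(z)$ for all $g \in G$ by part (i). So $L^P_z(g) \in P_{\pi(z)}$. Therefore $\mathrm{Range}(L^P_z) \subseteq P_{\pi(z)}$.
Let $z' \in P_{\pi(z)}$. Let $g = \phi(z)^{-1}\phi(z')$. Then $L^P_z(g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g) = (\pi \times \phi)^{-1}(\pi(z'), \phi(z')) = z'$. Hence $\mathrm{Range}(L^P_z) = P_{\pi(z)}$.
For part (iii), $\mathrm{Range}(L^P_z) = P_{\pi(z)}$ by part (ii). To show that $L^P_z$ is injective, let $g, g' \in G$ with $L^P_z(g) = L^P_z(g')$. Then $zg = zg'$. So $z = zg'g^{-1} = R^P_{g'g^{-1}}(z)$. So $\phi(z) = \phi(R^P_{g'g^{-1}}(z)) = \phi(z)g'g^{-1}$ by Theorem 21.11.7 (ix). Therefore $g = \phi(z)^{-1}\phi(z)g' = g'$. Hence $L^P_z : G \to P_{\pi(z)}$ is a bijection.
For part (iv), let $z \in P$ and $g \in G$. Then $\mathrm{Dom}(L^P_z) = G = \mathrm{Range}(R^G_g)$. So $L^P_z \circ R^G_g : G \to P$ is well defined. Similarly, $\mathrm{Dom}(R^P_g) = P \supseteq \mathrm{Range}(L^P_z)$ by Theorem 21.11.7 (i). So $R^P_g \circ L^P_z : G \to P$ is well defined by Theorem 10.10.13 (i). Let $h \in G$. Then $(L^P_z \circ R^G_g)(h) = L^P_z(R^G_g(h)) = z(hg)$, and $(R^P_g \circ L^P_z)(h) = R^P_g(L^P_z(h)) = (zh)g = z(hg)$ by Theorem 21.11.7 (ii). Hence $L^P_z \circ R^G_g = R^P_g \circ L^P_z$.
For part (v), let $z \in P$. Then $L^P_z(e) = R^P_e(z) = z$ by Definition 21.11.18. So $e = (L^P_z)^{-1}(z)$ by part (iii).
For part (vi), let $\phi \in A^G_P$ and $z, z' \in \mathrm{Dom}(\phi)$ with $\pi(z) = \pi(z')$. Then by Definitions 21.11.18 and 21.11.4,

$L^P_z(\phi(z)^{-1}\phi(z')) = R^P_{\phi(z)^{-1}\phi(z')}(z)$
$= (\pi \times \phi)^{-1}(\pi(z), \phi(z)\phi(z)^{-1}\phi(z'))$
$= (\pi \times \phi)^{-1}(\pi(z'), \phi(z'))$
$= z'$.

Hence $(L^P_z)^{-1}(z') = \phi(z)^{-1}\phi(z')$ by part (iii). So $(L^P_z)^{-1} = L^G_{\phi(z)^{-1}} \circ \phi|_{P_{\pi(z)}}$ for all $z \in P$ and $\phi \in A^G_{P,\pi(z)}$ because $\mathrm{Dom}((L^P_z)^{-1}) = P_{\pi(z)}$ by part (iii).
For part (vii), first note that $(L^P_z)^{-1} : P_{\pi(z)} \to G$, $L^G_{\phi(z)^{-1}} : G \to G$ and $\phi|_{P_{\pi(z)}} : P_{\pi(z)} \to G$ are all bijections. Hence $(L^P_z)^{-1} \circ \phi|_{P_{\pi(z)}}^{-1} = L^G_{\phi(z)^{-1}}$ by part (vi).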
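
The formulas in Theorem 21.11.19 can be illustrated numerically. The following sketch makes several assumptions which are not in the text: the bundle is $P = B \times S_3$ with a single hypothetical chart `phi` which twists each fibre by a fixed group element, and the right action is right multiplication on the second component. It checks parts (ii), (iii), (iv) and (vi) by exhaustive search.

```python
from itertools import permutations

G = list(permutations(range(3)))            # S_3 as permutation tuples
e = (0, 1, 2)

def mul(g, h):
    """Group product gh: the permutation i -> g[h[i]]."""
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    """Inverse permutation: position g[i] holds i."""
    r = [0, 0, 0]
    for i in range(3):
        r[g[i]] = i
    return tuple(r)

B = [0, 1]
P = [(b, k) for b in B for k in G]
twist = {0: e, 1: (1, 2, 0)}                # hypothetical chart twist per base point

def proj(z):
    return z[0]

def phi(z):
    """A single fibre chart with domain P."""
    return mul(twist[z[0]], z[1])

def right(z, g):
    """Right action zg; it satisfies phi(zg) = phi(z) g (checked below)."""
    return (z[0], mul(z[1], g))

def left(z):
    """Left action map L_z : g -> zg."""
    return lambda g: right(z, g)

def fibre(z):
    return {w for w in P if proj(w) == proj(z)}

chart_rule = all(phi(right(z, g)) == mul(phi(z), g) for z in P for g in G)
range_is_fibre = all({left(z)(g) for g in G} == fibre(z) for z in P)
bijective = all(len({left(z)(g) for g in G}) == len(G) for z in P)
# Part (vi): the preimage of z' under L_z is phi(z)^{-1} phi(z').
inverse_formula = all(left(z)(mul(inv(phi(z)), phi(w))) == w
                      for z in P for w in fibre(z))
# Part (iv): L_z o R_g = R_g o L_z.
commutes = all(left(z)(mul(h, g)) == right(left(z)(h), g)
               for z in P for g in G for h in G)
```

The target of `left(z)` is all of `P` in this sketch, while its range is only one fibre, which mirrors the distinction between target space and range discussed for $L^P_z$.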


21.11.20 REMARK: Inversion of the non-surjective left action map. 

The map $L^P_z : G \to P$ in Definition 21.11.18 has target space $P$, whereas its range is $P_{\pi(z)}$, as stated in Theorem 21.11.19 (ii). So if $L^P_z$ is given the alternative target space $P_{\pi(z)}$, it becomes a bijection, as stated in Theorem 21.11.19 (iii). A difficulty arises here when $L^P_z$ is claimed to have an inverse $(L^P_z)^{-1}$ as in Theorem 21.11.19 (v, vi, vii). The interpretation which is intended here is that $(L^P_z)^{-1}$ is the inverse relation of $L^P_z$, and its domain and target space are adjusted to make it a map of the form $(L^P_z)^{-1} : P_{\pi(z)} \to G$. But this implies that $L^P_z$ is a map of the form $L^P_z : G \to P_{\pi(z)}$, which subtly contradicts the form $L^P_z : G \to P$ in Definition 21.11.18.

These subtle distinctions may be ignored in the case of non-topological principal bundles, but then various ambiguities and contradictions arise when defining the differentials of $L^P_z$ and $(L^P_z)^{-1}$. (See Remark 66.4.4.) The ambiguities in Theorem 21.11.19 may be (at least partly) resolved by assuming that $(L^P_z)^{-1}$ has domain and target space $P_{\pi(z)}$ and $G$ respectively, whereas $L^P_z$ has domain and target space $G$ and $P$ respectively.

An alternative resolution of the issue would be to let $P_{\pi(z)}$ be the target space of $L^P_z$. However, this leads to another issue because the main purpose of defining $L^P_z$ is to define its differential and transpose it as in Definition 66.6.2. The differential of $L^P_z : G \to P_{\pi(z)}$ would then be a bijection from $T(G)$ to $T(P_{\pi(z)})$ for each $z \in P$, and the transpose of this would be a map from $P$ to $P_{\pi(z)}$ for each $g \in G$, which is meaningless because then for each $g \in G$, the target space of the transposed function would depend on the argument of the transposed function. It is for this reason that $L^P_z$ has target space $P$. This makes the differential of $L^P_z$ have target space $T(P)$, the tangent bundle of the total space $P$, instead of target space $T(P_{\pi(z)})$, which is the tangent bundle of a single fibre set.


The conclusion, then, is that the ambiguities in Theorem 21.11.19 are the lesser evil. 
21.12. Associated non-topological fibre bundles


21.12.1 REMARK: History of associated fibre bundles. 
The introduction of the concepts of associated principal bundles and general associated fibre bundles is 
attributed by Steenrod [142], page 36, to a 1941 paper by Ehresmann [177]. 


21.12.2 REMARK: The Universe is a set of associated fibre bundles. 

It is suggested in Remark 21.0.7 that the Universe is a fibre bundle. In fact there are many fibre bundles, 
one for each kind of field. Each field theory needs its own Universe-wide field, and each field needs a fibre 
bundle to live in. 


When fibre bundles have the same base space and structure group, and they have consistent topological 
structure, they are associated fibre bundles. If all of the structure groups in physics could be regarded as 
subgroups of a single group, the Universe could then be regarded as having a single principal bundle, and 
then all of the fields of physics would live in fibre bundles which are associated with this “grand unified 
principal bundle”. Thus associated fibre bundles have great significance in physics. 


21.12.3 REMARK: Fibre bundles can be associated via a common structure group and base manifold. 
Fibre bundles are said to be associated if they have the same fibre chart transition maps at each point 
of a common base manifold. The fibre spaces of associated fibre bundles are typically different, but their 
structure groups are required to be the same because each associated pair of chart transition maps must be 
equal to actions by the same group element acting on the respective fibre spaces. 


Definitions 21.12.4 and 21.12.5 are illustrated in Figure 47.9.1. 


21.12.4 DEFINITION: Non-topological fibre bundle association maps.
A (non-topological) fibre bundle association (map) between non-topological $(G, F)$ and $(G, \hat F)$ fibre bundles $(E, \pi, B, A^F_E)$ and $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ respectively is a bijection $h : A^F_E \to A^{\hat F}_{\hat E}$ which satisfies

(i) $\forall \phi \in A^F_E$, $\pi(\mathrm{Dom}(\phi)) = \hat\pi(\mathrm{Dom}(h(\phi)))$,

(ii) $\forall \phi_1, \phi_2 \in A^F_E$, $\forall b \in \pi(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$, $g_{\phi_2,\phi_1}(b) = \hat g_{h(\phi_2),h(\phi_1)}(b)$,
where $g$, $\hat g$ denote the fibre chart transition functions for the respective fibre bundle atlases.

(See Definition 21.8.12 for $g_{\phi_2,\phi_1}(b)$ and $\hat g_{h(\phi_2),h(\phi_1)}(b)$.)


21.12.5 DEFINITION: Associated non-topological fibre bundles.
Associated non-topological fibre bundles are non-topological $(G, F)$ and $(G, \hat F)$ fibre bundles $(E, \pi, B, A^F_E)$ and $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ for which a non-topological fibre bundle association $h : A^F_E \to A^{\hat F}_{\hat E}$ is specified.


21.12.6 REMARK:  Tensor bundles of different type are associated fibre bundles. 

Prime examples of associated fibre bundles are tensor bundles $T^{r,s}(M)$ of different types $(r, s)$ on the same differentiable manifold $M$. (See Section 56.3 for tensor bundles on differentiable manifolds.) These have the same structure group $GL(n)$ for an $n$-dimensional manifold, and associated fibre chart transition rules. Tensors of different type have different fibre spaces $F$ and different transformation rules $\mu : G \times F \to F$, but the coordinate charts themselves are related to each other by the same transition rules $g_{\phi_2,\phi_1}$ for each type
of tensor. Thus the group elements are the same, but the actions of the group elements on the fibre spaces 
are different because the fibre spaces are different. 


21.12.7 REMARK: Parallelism-related expression for fibre chart association maps. 
Perhaps a clearer way of presenting Definition 21.12.4 (ii) is condition (21.12.1) in Theorem 21.12.8. This 
resembles the associated parallelism rule in Definition 48.4.2 condition (48.4.1). 


21.12.8 THEOREM: Alternative expression for fibre-chart-independence condition for OFB associations. 
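The fibre chart association map rule, Definition 21.12.4 (ii), is equivalent to condition (21.12.1).

$\forall \phi_1, \phi_2 \in A^F_E$, $\forall b \in \pi(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$, $\forall g \in G$,
$\phi_2|_{E_b} = L_g \circ \phi_1|_{E_b} \ \Leftrightarrow\ h(\phi_2)|_{\hat E_b} = L_g \circ h(\phi_1)|_{\hat E_b}$. (21.12.1)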


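PROOF: Assume Definition 21.12.4 (ii). Let $\phi_1, \phi_2 \in A^F_E$, $b \in \pi(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$ and $g \in G$, and suppose that $\phi_2|_{E_b} = L_g \circ \phi_1|_{E_b}$. By Definition 21.8.12, $\phi_2|_{E_b} = L_{g_{\phi_2,\phi_1}(b)} \circ \phi_1|_{E_b}$. So $g = g_{\phi_2,\phi_1}(b)$ by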

[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




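Theorem 20.2.7 because $\mathrm{Range}(\phi_1|_{E_b}) = F = \mathrm{Dom}(L_g) = \mathrm{Dom}(L_{g_{\phi_2,\phi_1}(b)})$ and the action of $G$ on $F$ is effective by Definition 21.8.3. So by Definition 21.12.4 (ii),

$\forall \hat z \in \hat E_b$, $h(\phi_2)(\hat z) = \hat g_{h(\phi_2),h(\phi_1)}(b)\, h(\phi_1)(\hat z)$
$= g_{\phi_2,\phi_1}(b)\, h(\phi_1)(\hat z)$
$= g\, h(\phi_1)(\hat z)$.

Therefore $h(\phi_2)|_{\hat E_b} = L_g \circ h(\phi_1)|_{\hat E_b}$, which verifies the forward implication on line (21.12.1). The reverse implication follows similarly because $h$ is a bijection. Thus Definition 21.12.4 (ii) implies line (21.12.1).

To show the converse, assume condition (21.12.1). Let $\phi_1, \phi_2 \in A^F_E$, $b \in \pi(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$ and $g \in G$. Then $\phi_2|_{E_b} = L_{g_{\phi_2,\phi_1}(b)} \circ \phi_1|_{E_b}$ by Definition 21.8.12. So $h(\phi_2)|_{\hat E_b} = L_{g_{\phi_2,\phi_1}(b)} \circ h(\phi_1)|_{\hat E_b}$ by line (21.12.1). But $h(\phi_2)|_{\hat E_b} = L_{\hat g_{h(\phi_2),h(\phi_1)}(b)} \circ h(\phi_1)|_{\hat E_b}$ by Definition 21.8.12. So $g_{\phi_2,\phi_1}(b) = \hat g_{h(\phi_2),h(\phi_1)}(b)$ because $\mathrm{Range}(h(\phi_1)|_{\hat E_b}) = \hat F = \mathrm{Dom}(L_{g_{\phi_2,\phi_1}(b)}) = \mathrm{Dom}(L_{\hat g_{h(\phi_2),h(\phi_1)}(b)})$ and the action of $G$ on $\hat F$ is effective by Definition 21.8.3. Thus line (21.12.1) implies Definition 21.12.4 (ii).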


21.12.9 REMARK: Associated fibre bundles in gauge theory. 

As alluded to by Daniel/Viallet [317], page 180, principal bundles are used in gauge theory to represent 
(bosonic) radiation fields, whereas associated ordinary fibre bundles (which are vector bundles) represent 
associated (fermionic) matter fields. Thus associated fibre bundles are of some importance in elementary 
particle theory to link bosons with the fermions which they interact with. 


The patchwork style of associated fibre bundle construction in Definitions 21.13.2, 47.10.4 and 66.7.10 has the advantage of great generality. The patchwork style can associate general fibre bundles which have the same base space and structure group. Given any fibre bundle and the action of its structure group on a different fibre space, the patchwork style constructs an associated fibre bundle with that fibre space.


The orbit-space style of associated fibre bundle construction in Definitions 21.14.2, 47.11.5 and 66.7.12 has the advantage that it is more concrete, has stronger algebraic appeal, and is most often used in applications, especially in gauge theory. It has the disadvantage that the input must be a principal bundle. Thus in the language of mobile communications, the patchwork style is a kind of peer-to-peer construction, from one mobile handset to another, whereas the orbit-space style creates a mobile handset (i.e. ordinary bundle) from a base station (i.e. principal bundle), which is a kind of server/client construction. (See Remarks 20.12.6 and 47.10.1 for similar comments.)


21.13. Patchwork associated non-topological fibre bundles 


21.13.1 REMARK: Construction of an associated fibre bundle with a different fibre space. 

Definition 21.13.2 constructs an associated $(G, \hat F)$ fibre bundle from a given $(G, F)$ fibre bundle. The only information that the associated bundle inherits from the given bundle is the set of transition maps $g_{\phi_2,\phi_1}$ of Definition 21.8.3 for non-topological fibre bundles.


The inputs and outputs of the patchwork construction method may be summarised as follows. (The input 
and output lists for topological fibre bundles for the patchwork and orbit-space construction methods are 
given in Remarks 47.10.3 and 47.11.4 respectively.) 


Inputs: 

(1) $(G, F) -\!\!< (G, F, \sigma_G, \mu^F_G)$, an effective left transformation group. (The source fibre space.)
(2) $(E, \pi, B, A^F_E)$, a non-topological $(G, F)$ fibre bundle. (The source OFB.)
(3) $(G, \hat F) -\!\!< (G, \hat F, \sigma_G, \mu^{\hat F}_G)$, an effective left transformation group. (The target fibre space.)


Outputs:

(1) $\hat E$, the new total space.
(2) $\hat\pi$, the new projection map.
(3) $A^{\hat F}_{\hat E}$, the new fibre atlas.
(4) $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$, the new non-topological $(G, \hat F)$ fibre bundle. (The target OFB.)
Composed of the new structures $\hat E$, $\hat\pi$ and $A^{\hat F}_{\hat E}$, and the old base space $B$.
(5) $h : A^F_E \to A^{\hat F}_{\hat E}$, the fibre chart association map. (From source OFB to target OFB.)


It is significant that only a small amount of information is used from the old fibre bundle to construct the new fibre bundle. For example, the old fibre charts in $A^F_E$ are only used to compute the old transition maps $g_{\phi_2,\phi_1}$, which are partially defined functions from $B$ to $G$. The fibre charts themselves are used only as index-tags. In fact, no index set is required. It is only the set of transition maps which is used as input.


The old base space $B$ and group $G$ are fully re-used in the new fibre bundle, but the old fibre space $F$ is not used. Nor are the old total space $E$ and projection map $\pi$ used in the new fibre bundle. Therefore the new fibre bundle can be constructed using only $B$, $G$ and $\{g_{\phi_2,\phi_1};\ \phi_1, \phi_2 \in A^F_E\}$, where $A^F_E$ is used only as an index set (which could typically be replaced by a set of integers), and the functions $g_{\phi_2,\phi_1}$ use only $B$ and $G$. (See also Remark 47.10.2 for construction of associated fibre bundles using only $B$, $G$ and $\{g_{\phi_2,\phi_1};\ \phi_1, \phi_2 \in A^F_E\}$ as minimal “abstract” inputs.)


Since the sets $B$ and $G$ can be obtained from the domains and ranges of the transition maps, one may even say that only the set of transition maps is required as an input. Not even the sets $B$ and $G$ are required! (On the other hand, the group operation $\sigma_G : G \times G \to G$ would not usually be obtainable.)


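21.13.2 DEFINITION: The patchwork associated non-topological $(G, \hat F)$ fibre bundle of a non-topological $(G, F)$ fibre bundle $(E, \pi, B, A^F_E)$, for effective left transformation groups $(G, F)$ and $(G, \hat F)$, is the non-topological $(G, \hat F)$ fibre bundle $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ with chart association map $h : A^F_E \to A^{\hat F}_{\hat E}$ defined as follows.

(i) $\hat E = \{[(b, f, \phi)];\ b \in B,\ f \in \hat F,\ \phi \in A^F_{E,b}\}$, where $[(b, f, \phi)] = \{(b, g_{\phi',\phi}(b) f, \phi');\ \phi' \in A^F_{E,b}\}$, and the transition maps $g_{\phi_2,\phi_1} : U_{\phi_1} \cap U_{\phi_2} \to G$ are defined by $L_{g_{\phi_2,\phi_1}(b)} = \phi_2 \circ \phi_1|_{E_b}^{-1}$ for all $\phi_1, \phi_2 \in A^F_E$ and $b \in U_{\phi_1} \cap U_{\phi_2}$, where $U_\phi = \pi(\mathrm{Dom}(\phi))$ for $\phi \in A^F_E$.
(ii) $\hat\pi : \hat E \to B$ is defined by $\hat\pi : [(b, f, \phi)] \mapsto b$.
(iii) $A^{\hat F}_{\hat E} = \{h(\phi);\ \phi \in A^F_E\}$, where $h(\phi) : \hat\pi^{-1}(U_\phi) \to \hat F$ is defined for $\phi \in A^F_E$ by $h(\phi) : [(b, f, \phi)] \mapsto f$.

The non-topological fibre bundle association map is $h : A^F_E \to A^{\hat F}_{\hat E}$ as in part (iii). (See Figure 47.10.1.)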


21.13.3 THEOREM: Validation of the total space construction for patchwork associated fibre bundles. 

Let $(G, F)$ and $(G, \hat F)$ be effective left transformation groups. Let $(E, \pi, B, A^F_E)$ be a non-topological $(G, F)$ fibre bundle. Let $X = \{(b, f, \phi);\ b \in B,\ f \in \hat F,\ \phi \in A^F_{E,b}\}$. Define the relation “$\equiv$” on $X$ by $(b', f', \phi') \equiv (b, f, \phi)$ whenever $b' = b$ and $f' = g_{\phi',\phi}(b) f$.

(i) “$\equiv$” is an equivalence relation on $X$.
(ii) The set $\{[(b, f, \phi)];\ b \in B,\ f \in \hat F,\ \phi \in A^F_{E,b}\}$ of non-empty equivalence classes of “$\equiv$” is a partition of $X$.
(iii) $\forall b \in B$, $\forall f \in \hat F$, $\forall \phi_1, \phi_2 \in A^F_{E,b}$, $[(b, f, \phi_1)] = [(b, g_{\phi_2,\phi_1}(b) f, \phi_2)]$.

PROOF: For part (i), let $(b, f, \phi) \in X$. Then $g_{\phi,\phi}(b) = e_G$ because $L_{g_{\phi,\phi}(b)} = \phi \circ \phi|_{E_b}^{-1} = \mathrm{id}_F$ and $G$ acts effectively on $F$. So $f = g_{\phi,\phi}(b) f$. Therefore $(b, f, \phi) \equiv (b, f, \phi)$.

To verify that “$\equiv$” is symmetric, suppose that $(b_1, f_1, \phi_1) \equiv (b_2, f_2, \phi_2)$. Let $b = b_1$. Then $b_2 = b_1$ and $f_1 = g_{\phi_1,\phi_2}(b) f_2$. So $f_2 = g_{\phi_2,\phi_1}(b) f_1$ by Definition 21.8.12. Therefore $(b_2, f_2, \phi_2) \equiv (b_1, f_1, \phi_1)$.

To verify transitivity, suppose that $(b_1, f_1, \phi_1) \equiv (b_2, f_2, \phi_2)$ and $(b_2, f_2, \phi_2) \equiv (b_3, f_3, \phi_3)$. Let $b = b_1$. Then $b_3 = b_2 = b_1$ and $f_1 = g_{\phi_1,\phi_2}(b) f_2$ and $f_2 = g_{\phi_2,\phi_3}(b) f_3$. So $f_1 = g_{\phi_1,\phi_2}(b) g_{\phi_2,\phi_3}(b) f_3 = g_{\phi_1,\phi_3}(b) f_3$ by Definition 21.8.12. Therefore $(b_1, f_1, \phi_1) \equiv (b_3, f_3, \phi_3)$. Thus “$\equiv$” is an equivalence relation on $X$.

Part (ii) follows from part (i) and Theorem 9.8.4 (ii). (See Definition 8.7.12 for partitions.)

For part (iii), let $b \in B$, $f \in \hat F$ and $\phi_1, \phi_2 \in A^F_{E,b}$. Let $(b', f', \phi') \in [(b, f, \phi_1)]$. Then by Definition 21.13.2 (i), $b' = b$ and $f' = g_{\phi',\phi_1}(b) f$. So $f' = g_{\phi',\phi_2}(b) (g_{\phi_2,\phi_1}(b) f)$. Therefore $(b', f', \phi') \in [(b, g_{\phi_2,\phi_1}(b) f, \phi_2)]$. Thus $[(b, f, \phi_1)] \subseteq [(b, g_{\phi_2,\phi_1}(b) f, \phi_2)]$. Now suppose that $(b', f', \phi') \in [(b, g_{\phi_2,\phi_1}(b) f, \phi_2)]$. Then $b' = b$ and $f' = g_{\phi',\phi_2}(b) g_{\phi_2,\phi_1}(b) f = g_{\phi',\phi_1}(b) f$. So $(b', f', \phi') \in [(b, f, \phi_1)]$ by Definition 21.13.2 (i). Thus $[(b, f, \phi_1)] \supseteq [(b, g_{\phi_2,\phi_1}(b) f, \phi_2)]$. Hence $[(b, f, \phi_1)] = [(b, g_{\phi_2,\phi_1}(b) f, \phi_2)]$.
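
The patchwork total space can be computed explicitly when all of the sets are finite. The sketch below is an illustration under assumptions which are not in the text: three hypothetical chart tags `"a"`, `"b"`, `"c"` over a two-point base set, structure group $Z_3$, transition maps given by differences of a made-up table `k` (so the cocycle rule holds automatically), and target fibre space $Z_6$ with the effective action $g \cdot f = f + 2g \bmod 6$. As observed in Remark 21.13.1, only the transition maps and the action on the target fibre space are used.

```python
B = [0, 1]
# Chart tags with their base-set domains U_phi (hypothetical data).
domains = {"a": {0, 1}, "b": {1}, "c": {0, 1}}
# Made-up table; transition maps are differences, so the cocycle rule holds.
k = {"a": {0: 0, 1: 0}, "b": {1: 1}, "c": {0: 2, 1: 1}}
F_hat = range(6)                            # target fibre space Z_6

def g(phi2, phi1, b):
    """Transition map g_{phi2,phi1}(b) with values in G = Z_3."""
    return (k[phi2][b] - k[phi1][b]) % 3

def act(gg, f):
    """Effective left action of Z_3 on Z_6."""
    return (f + 2 * gg) % 6

def charts_at(b):
    return [t for t in domains if b in domains[t]]

X = {(b, f, t) for b in B for f in F_hat for t in charts_at(b)}

def cls(b, f, phi):
    """Equivalence class [(b, f, phi)] as in Definition 21.13.2 (i)."""
    return frozenset((b, act(g(t, phi, b), f), t) for t in charts_at(b))

E_hat = {cls(*x) for x in X}                # new total space

def proj_hat(z):
    """New projection map: every triple in a class has the same base point."""
    (b,) = {x[0] for x in z}
    return b

def h(phi):
    """New fibre chart h(phi): [(b, f, phi)] -> f."""
    return lambda z: next(f for (_, f, t) in z if t == phi)

triples = [(b, f, p1, p2) for b in B for f in F_hat
           for p1 in charts_at(b) for p2 in charts_at(b)]
is_partition = (set().union(*E_hat) == X
                and sum(len(c) for c in E_hat) == len(X))
part_iii = all(cls(b, f, p1) == cls(b, act(g(p2, p1, b), f), p2)
               for b, f, p1, p2 in triples)
# New charts are related by the same group elements as the old charts.
same_transitions = all(h(p2)(cls(b, f, p1)) == act(g(p2, p1, b), h(p1)(cls(b, f, p1)))
                       for b, f, p1, p2 in triples)
```

The chart tags play no role other than indexing, so they could be replaced by integers without changing the result.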
21.13.4 REMARK: Validation of patchwork associated non-topological fibre bundles.
To show that Definition 21.13.2 constructs valid associated fibre bundles from valid fibre bundles, satisfying Definition 21.12.4, one must show the following.

(1) The constructed sets and maps $\hat E$, $\hat\pi$ and $A^{\hat F}_{\hat E}$ are well-defined ZF sets and maps.
(2) The constructed tuple $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ is a valid non-topological $(G, \hat F)$ fibre bundle.
(3) The association map $h : A^F_E \to A^{\hat F}_{\hat E}$ is a valid bijection.
(4) The fibre charts of the old and new fibre bundles have corresponding domains as in Definition 21.12.4 (i).
(5) The fibre-chart pairs of the old and new fibre bundles have chart transition maps related by the same structure group elements, as in Definition 21.12.4 (ii).


21.13.5 THEOREM: Patchwork associated fibre bundles are associated fibre bundles. 

Let $(G, F) -\!\!< (G, F, \sigma_G, \mu^F_G)$ and $(G, \hat F) -\!\!< (G, \hat F, \sigma_G, \mu^{\hat F}_G)$ be effective left transformation groups, and let $(E, \pi, B, A^F_E)$ be a non-topological $(G, F)$ fibre bundle. Assume that $F \ne \emptyset$ and $\hat F \ne \emptyset$.

Let $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ be the patchwork associated non-topological $(G, \hat F)$ fibre bundle of $(E, \pi, B, A^F_E)$ with chart association map $h : A^F_E \to A^{\hat F}_{\hat E}$ as in Definition 21.13.2. Then $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ has the following properties.


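(i) $\hat E$ is a well-defined partition of $\{(b, f, \phi);\ b \in B,\ f \in \hat F,\ \phi \in A^F_{E,b}\} = \bigcup_{\phi \in A^F_E} (\pi(\mathrm{Dom}(\phi)) \times \hat F \times \{\phi\})$.

(ii) $\hat\pi : \hat E \to B$ is a well-defined function, and $\hat\pi : \hat E \to B$ is a surjection.

(iii) $\forall U \in \mathbb{P}(B)$, $\hat\pi(\hat\pi^{-1}(U)) = U$.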

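(iv) $\forall b \in B$, $\hat E_b \ne \emptyset$, where $\hat E_b$ denotes $\hat\pi^{-1}(\{b\})$ for $b \in B$.

(v) $\mathrm{Range}(\hat\pi) = B$.
(vi) $\forall \phi \in A^F_E$, $\mathrm{Dom}(h(\phi)) = \hat\pi^{-1}(\pi(\mathrm{Dom}(\phi))) = \{[(b, f, \phi)];\ b \in \pi(\mathrm{Dom}(\phi)),\ f \in \hat F\}$.
(vii) $h : A^F_E \to A^{\hat F}_{\hat E}$ is a well-defined bijection.

(viii) $\forall \phi \in A^F_E$, $\pi(\mathrm{Dom}(\phi)) = \hat\pi(\mathrm{Dom}(h(\phi)))$. (See Definition 21.12.4 (i).)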

(ix) $\forall \hat\phi \in A^{\hat F}_{\hat E}$, $\exists U \in \mathbb{P}(B)$, $\hat\phi : \hat\pi^{-1}(U) \to \hat F$.

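(x) $\bigcup_{\hat\phi \in A^{\hat F}_{\hat E}} \hat\pi(\mathrm{Dom}(\hat\phi)) = B$. In other words, $\bigcup\{\hat\pi(\mathrm{Dom}(\hat\phi));\ \hat\phi \in A^{\hat F}_{\hat E}\} = B$.

(xi) $\forall \hat\phi \in A^{\hat F}_{\hat E}$, $\hat\pi \times \hat\phi : \hat\pi^{-1}(U_{\hat\phi}) \to U_{\hat\phi} \times \hat F$ is a bijection, where $\forall \hat\phi \in A^{\hat F}_{\hat E}$, $U_{\hat\phi} = \hat\pi(\mathrm{Dom}(\hat\phi))$.

(xii) $\forall \hat\phi_1, \hat\phi_2 \in A^{\hat F}_{\hat E}$, $\exists \hat g_{\hat\phi_2,\hat\phi_1} : U_{\hat\phi_1} \cap U_{\hat\phi_2} \to G$, $\forall \hat z \in \hat\pi^{-1}(U_{\hat\phi_1} \cap U_{\hat\phi_2})$, $\hat\phi_2(\hat z) = \hat g_{\hat\phi_2,\hat\phi_1}(\hat\pi(\hat z))\, \hat\phi_1(\hat z)$.

(xiii) $\forall \phi_1, \phi_2 \in A^F_E$, $\forall b \in U_{\phi_1} \cap U_{\phi_2}$, $g_{\phi_2,\phi_1}(b) = \hat g_{h(\phi_2),h(\phi_1)}(b)$, where $U_\phi$ denotes $\pi(\mathrm{Dom}(\phi))$, and $g$, $\hat g$ denote the fibre chart transition functions for the respective fibre bundle atlases. (See Definition 21.12.4 (ii).)

(xiv) $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ is a non-topological $(G, \hat F)$ fibre bundle. (See Definition 21.8.3.)

(xv) $(\hat E, \hat\pi, B, A^{\hat F}_{\hat E})$ is an associated non-topological $(G, \hat F)$ fibre bundle of $(E, \pi, B, A^F_E)$ with chart association map $h : A^F_E \to A^{\hat F}_{\hat E}$.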


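PROOF: Part (i) follows from Theorem 21.13.3 (ii) and Definition 21.13.2 (i).

For part (ii), let $[(b_1, f_1, \phi_1)]$ and $[(b_2, f_2, \phi_2)]$ be elements of $\hat E$ with $[(b_1, f_1, \phi_1)] = [(b_2, f_2, \phi_2)]$. Then $b_1 = b_2$ by Definition 21.13.2 (i). So the map $\hat\pi : \hat E \to B$ in Definition 21.13.2 (ii) is well defined. (In other words, the rule $\hat\pi : [(b, f, \phi)] \mapsto b$ maps each element of $\hat E$ to one and only one element of $B$.) Thus $\hat\pi : \hat E \to B$ is a well-defined function.

Let $b \in B$, $f \in \hat F$ and $\phi \in A^F_{E,b}$. Then $[(b, f, \phi)] \in \hat E$ by Definition 21.13.2 (i), and then $\hat\pi([(b, f, \phi)]) = b$ by Definition 21.13.2 (ii). Hence $\hat\pi : \hat E \to B$ is surjective.

Part (iii) follows from part (ii) and Theorem 10.7.1 (i).

For part (iv), let $b \in B$. Let $f \in \hat F$ and $\phi \in A^F_{E,b}$. Then $[(b, f, \phi)] \in \hat E$ by Definition 21.13.2 (i), and $\hat\pi([(b, f, \phi)]) = b$ by Definition 21.13.2 (ii). So $[(b, f, \phi)] \in \hat\pi^{-1}(\{b\})$. Thus $\hat E_b \ne \emptyset$. Hence $\forall b \in B$, $\hat E_b \ne \emptyset$.

For part (v), suppose that $B = \emptyset$. Then $\hat E = \emptyset$ and $\hat\pi = \emptyset$ by Definition 21.13.2 (i, ii). So $\mathrm{Range}(\hat\pi) = \emptyset = B$.
Now suppose that $B \ne \emptyset$. Let $b \in B$. Let $f \in \hat F$ and $\phi \in A^F_{E,b}$. Then $[(b, f, \phi)] \in \hat E$ and $\hat\pi([(b, f, \phi)]) = b$ by Definition 21.13.2 (i, ii). So $b \in \mathrm{Range}(\hat\pi)$. Thus $\mathrm{Range}(\hat\pi) \supseteq B$.

Now let $b \in \mathrm{Range}(\hat\pi)$. Then $b \in B$ by Definition 21.13.2 (ii). Thus $\mathrm{Range}(\hat\pi) \subseteq B$. Hence $\mathrm{Range}(\hat\pi) = B$.
For part (vi), let $\phi \in A^F_E$. Then $\mathrm{Dom}(h(\phi)) = \hat\pi^{-1}(\pi(\mathrm{Dom}(\phi)))$ by Definition 21.13.2 (iii).

Let $U_\phi = \pi(\mathrm{Dom}(\phi))$. Then $\hat\pi^{-1}(U_\phi) = \{[(b, f, \phi')];\ b \in U_\phi,\ f \in \hat F,\ \phi' \in A^F_{E,b}\}$ by Definition 21.13.2 (i, ii). So $\hat\pi^{-1}(U_\phi) \supseteq \{[(b, f, \phi)];\ b \in U_\phi,\ f \in \hat F\}$ because $\phi \in A^F_{E,b}$ whenever $b \in U_\phi$.

Now let $\hat z \in \hat\pi^{-1}(U_\phi)$. Then $\hat z = [(b, f, \phi')]$ for some $b \in U_\phi$, $f \in \hat F$ and $\phi' \in A^F_{E,b}$. So $\hat z = [(b, g_{\phi,\phi'}(b) f, \phi)]$ by Theorem 21.13.3 (iii). So $\hat z = [(b, f', \phi)]$ with $b \in U_\phi$, $f' = g_{\phi,\phi'}(b) f \in \hat F$. So $\hat z \in \{[(b, f, \phi)];\ b \in U_\phi,\ f \in \hat F\}$. Therefore $\hat\pi^{-1}(U_\phi) \subseteq \{[(b, f, \phi)];\ b \in U_\phi,\ f \in \hat F\}$. Hence $\hat\pi^{-1}(U_\phi) = \{[(b, f, \phi)];\ b \in U_\phi,\ f \in \hat F\}$.

For part (vii), let $\phi \in A^F_E$. Let $U_\phi = \pi(\mathrm{Dom}(\phi))$. Then $\hat\pi^{-1}(U_\phi) = \{[(b, f, \phi)];\ b \in U_\phi,\ f \in \hat F\}$ by part (vi). So the map $h(\phi) : \hat\pi^{-1}(U_\phi) \to \hat F$ defined by $h(\phi) : [(b, f, \phi)] \mapsto f$ in Definition 21.13.2 (iii) is a well-defined function, and $h(\phi) \in A^{\hat F}_{\hat E}$ because $A^{\hat F}_{\hat E}$ is defined as $\mathrm{Range}(h)$. Thus $h : A^F_E \to A^{\hat F}_{\hat E}$ is a well-defined surjection.
To show that $h$ is injective, let $\phi_1, \phi_2 \in A^F_E$ with $h(\phi_1) = h(\phi_2)$. Then $\mathrm{Dom}(h(\phi_1)) = \mathrm{Dom}(h(\phi_2))$. So $\hat\pi^{-1}(U_{\phi_1}) = \hat\pi^{-1}(U_{\phi_2})$ by Definition 21.13.2 (iii). Therefore $U_{\phi_1} = \hat\pi(\hat\pi^{-1}(U_{\phi_1})) = \hat\pi(\hat\pi^{-1}(U_{\phi_2})) = U_{\phi_2}$ by part (iii). So $\pi(\mathrm{Dom}(\phi_1)) = \pi(\mathrm{Dom}(\phi_2))$ by Definition 21.13.2 (i). Therefore $\mathrm{Dom}(\phi_1) = \mathrm{Dom}(\phi_2)$.

Let $\hat z \in \mathrm{Dom}(h(\phi_1)) = \mathrm{Dom}(h(\phi_2))$. Let $b = \hat\pi(\hat z)$. Then $\hat z = [(b, f_1, \phi_1)] = [(b, f_2, \phi_2)]$ for some $f_1, f_2 \in \hat F$, and then $f_1 = h(\phi_1)([(b, f_1, \phi_1)]) = h(\phi_1)(\hat z) = h(\phi_2)(\hat z) = h(\phi_2)([(b, f_2, \phi_2)]) = f_2$ by Definition 21.13.2 (iii). But $f_2 = L_{g_{\phi_2,\phi_1}(b)}(f_1)$ by Definition 21.13.2 (i). Thus $L_{g_{\phi_2,\phi_1}(b)} = \mathrm{id}_{\hat F}$. Therefore $g_{\phi_2,\phi_1}(b) = e_G$ because $(G, \hat F)$ is effective. So $\phi_2 \circ \phi_1|_{E_b}^{-1} = L_{g_{\phi_2,\phi_1}(b)} = \mathrm{id}_F$. So $\phi_2 = \phi_1$. Consequently $h$ is injective. Hence $h : A^F_E \to A^{\hat F}_{\hat E}$ is a bijection.

For part (viii), let $\phi \in A^F_E$. Then $\mathrm{Dom}(h(\phi)) = \hat\pi^{-1}(\pi(\mathrm{Dom}(\phi)))$ by part (vi). So $\hat\pi(\mathrm{Dom}(h(\phi))) = \pi(\mathrm{Dom}(\phi))$ by part (iii).

For part (ix), let $\hat\phi \in A^{\hat F}_{\hat E}$. Let $\phi = h^{-1}(\hat\phi)$, which is well defined by part (vii). Let $U = \pi(\mathrm{Dom}(\phi))$. Then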
h(9) :#-1(U) > F by Definition 21.13.2 (iii). Hence Vó € AE, JU € P(B), 6: 3 *(U) 5 P. 

For part (x), let ¢ € AT. Let ó = h! ($) € AE, which is well defined by part (vii). Then #(Dom(¢)) = 
x(Dom(¢)) by part (vili). So Uzes &(Dom(9)) = Uze ar v(Dom(h- (8) = Ugeag ™(Dom(¢)) = B by 
Definition 21.8.3 (iii) applied to the non-topological fibre bundle (E, r, B, A). 

For part (xi), let 9 € AZ. Then Dom(#) = E by Definition 21.13.2 (ii). Let ¢ = h-'(), which is well 


defined by part (vii). Then Dom(¢) = &-'(z(Dom($))) by Definition 21.13.2 (ili). So Dom(z x 3$ s 
Dom(s)nDom(9) = 4-!(x(Dom(9))) by Notation 10.15.3. Therefore by part (viii), Dom(3 x p= U3), 
where Ü; = = 7(Dom(¢)). 
Then Range( x à) C T (U5)) x $(1-! (U;)) by Theorem 10.15.6 (ii). So Range(i x $) C Uz x Range(4) 
because 1(4—1(U;)) = Uz by part (iii), and 4-1 (U5) = #1 (7#(Dom(¢))) = Dom(¢) by NOE 21.5.4 (1). 
By Definition 21.13.2 (iii), Range() C F. So Range( x d) C U; x P. 

Let (b, f) € U; x F. Let Z = [(b, f, d)]. Then 2 € E and #(2) = b € Uj. So Z € $1 (U;). But (2) = f. 
So (x x 6)(2) = (b,2). Thus (b, 2) € Range(4 x à). Therefore Range(t x 9) = Uz x F 

ax: & (E — Ü; x F is a surjection. 

Let 4,,29 € 7 ES). Then Z, = [(b., fis, O) for k = 1,2 for some 5,55 € Us and fi, f2 € F. Suppose 
that 21, 22 satisfy (T x o) (2); = (T x o) (21). Then by = (2) 1(4) bo and fi o(21) (21) fe: 
Therefore 2, = Žv. Thus 4 x $ is injective. Hence 4 x à : 7 CE — Uj x F is a bijection. 


For part (xii), let $1, 02 € AE, and let p = h-! (dx) for k = 1,2. Then by Theorem 21.8.11 (ii), there is a 
(unique) function gg,.4, : t(Dom(¢1)) N 7(Dom(¢2)) + G which satisfies 


993,0, (6) = 
= idp. Therefore pole, = tila, for all b € Ug, = Us,. 


3 X F. Consequently 


Vb € m(Dom(¢1)) n r(Dom(¢2)), Vz € 71 ({d}), 
Q»(z) = ó»,di (b)oi (z). 
By part (viii), ™(Dom(¢,)) = #(Dom(¢x)) = U5, for k = 1,2. So Dom(go,,4,) = Uz, N U;,.
Let 2€ a (05, NU;,). Then Z = [(b, fi, d1)] = [(b, f2, 92)] for some b € U5, nÜ;, and fi, fo € F, where 
fa = 96,4. (b) fi by Definition 21.13.2 (i). Define d; 5, : Uz, Uz, — G for 5,01 € AZ by 

V», b1 € AZ, Vb c Üz, CHR 95, 4, (0) = 94-1($),h-1(d1) ©): (21.13.1) 
Then since ¢;(2) = fy for k = 1,2 and (Z) = b, one obtains the desired assertion: 

Yó, 1 € Ag, YZ e t  (U5 nO; ), — $2(2) = 85,5, (8 (2) 1). 
Part (xiii) follows from line (21.13.1) in the proof of part (xii). 


Part (xiv) follows from Definition 21.8.3 and parts (ii), (ix), (x), (xi) and (xii). 
Part (xv) follows from part (xiv), Definition 21.12.4, and parts (vii), (viii) and (xiii). 


21.14. Orbit-space associated non-topological fibre bundles 


21.14.1 REMARK: Importance of orbit-space associated fibre bundles to gauge theory. 
In gauge theory, when a connection form has been defined on a principal bundle to represent the gauge 
potential for a bosonic radiation field, the next step is to somehow represent fermionic matter fields. This is 
done using associated vector bundles, which are typically constructed using the orbit space method. After 
this, a matter field can be defined in terms of the associated connection on the associated vector bundle 
using the contravariant principal bundle function construction as in Definition 47.12.3. 


Most of the useful properties of associated vector bundles and contravariant principal bundle functions are 
purely algebraic. So they can be defined and proved for non-topological fibre bundles. (See Section 47.11 
for orbit-space associated topological fibre bundles. See Definition 66.7.12 and Section 66.8 respectively 
for orbit-space associated differentiable fibre bundles and the very closely related differentiable short-cut 
orbit-space associated cross-sections.) 


For some discussion of the meaning and importance of the orbit-space method of construction of general 
associated fibre bundles (non-topological, topological and differentiable), see Remarks 47.11.1 and 47.11.2. 
For comparison of the orbit-space method with the patchwork method, see Remark 47.11.3. 


21.14.2 DEFINITION:  Orbit-space construction of non-topological fibre bundle from a principal bundle. 
The orbit-space associated non-topological $(G,F)$ fibre bundle for a given non-topological principal bundle
$(P, \pi_P, B, A^G_P)$ and an effective left transformation group $(G,F) \mathrel{-\!\!<} (G, F, \sigma, \mu)$ is the non-topological $(G,F)$
fibre bundle $(E, \pi, B, A^F_E)$ which is constructed as follows.

(i) $E = \{[(z,f)];\ z \in P,\ f \in F\}$, where for all $(z,f) \in P \times F$,

$[(z,f)] = \{(z',f') \in P \times F;\ \pi_P(z') = \pi_P(z) \text{ and } \exists \phi \in A^G_P,\ \phi(z')f' = \phi(z)f\}$.

(ii) $\pi : E \to B$ is defined by $\pi : [(z,f)] \mapsto \pi_P(z)$.

(iii) $A^F_E = \{h(\phi);\ \phi \in A^G_P\}$, where $h(\phi) : \pi^{-1}(U_\phi) \to F$ is defined for $\phi \in A^G_P$ by $h(\phi) : [(z,f)] \mapsto \phi(z)f$.


The non-topological fibre bundle association map is then $h : A^G_P \to A^F_E$ as in part (iii). (See Figure 47.11.1.)
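
The construction in Definition 21.14.2 can be tried out on a small finite model. The following is only a sketch under stated assumptions: the group Z_3, the two-point base set, the trivial principal bundle and the two charts are all hypothetical choices made for illustration, and none of the names come from the text. It builds the equivalence classes [(z, f)], the projection, and the charts h(φ), and then checks that each pair (π, h(φ)) gives a bijection from E to B × F.

```python
from itertools import product

# Hypothetical finite model; every name here is an illustrative assumption.
N = 3
G = range(N)                 # structure group Z_3 under addition mod 3
F = range(N)                 # fibre space, acted on by G
B = ["b0", "b1"]             # two-point base set

def act(g, f):
    """Effective left action of G on F."""
    return (g + f) % N

# Trivial principal bundle P = B x G with projection pi_P(b, g) = b.
P = [(b, g) for b in B for g in G]

def pi_P(z):
    return z[0]

def make_chart(offset):
    """Principal chart P -> G given by (b, g) -> offset[b] + g."""
    return lambda z: (offset[z[0]] + z[1]) % N

atlas_P = [make_chart({"b0": 0, "b1": 0}), make_chart({"b0": 1, "b1": 2})]

def equivalent(w1, w2):
    """(z', f') ~ (z, f): same base point, and phi(z')f' = phi(z)f for some chart."""
    (z1, f1), (z2, f2) = w1, w2
    return pi_P(z1) == pi_P(z2) and any(
        act(phi(z1), f1) == act(phi(z2), f2) for phi in atlas_P)

pairs = list(product(P, F))
# Total space E: the set of equivalence classes [(z, f)].
E = {frozenset(w for w in pairs if equivalent(w, v)) for v in pairs}

def pi(cls):
    """Projection [(z, f)] -> pi_P(z)."""
    return pi_P(next(iter(cls))[0])

def h(phi):
    """Associated fibre chart h(phi): [(z, f)] -> phi(z) f."""
    def chart(cls):
        values = {act(phi(z), f) for (z, f) in cls}
        assert len(values) == 1, "h(phi) must not depend on the representative"
        return values.pop()
    return chart

for phi in atlas_P:
    image = {(pi(c), h(phi)(c)) for c in E}
    print(len(E), image == set(product(B, F)))
```

In this model both charts induce the same equivalence relation, because the chart offsets cancel within each fibre. This is a property of the chosen example and is not a proof of the general case.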


((2022-12-13. Show that the orbit-space construction satisfies the requirements for associated fibre bundles. 
Also present a non-topological version of the contravariant principal bundle functions in Definition 47.12.3 
in a new section. ))


21.15. Parallelism on non-topological fibrations 
((2018-11-9. Sections 21.15 and 21.16 need to be totally rewritten. Please ignore them for now. )) 


21.15.1 REMARK: Parallelism for non-topological fibrations and fibre bundles. 

Sections 21.15 and 21.16 are of dubious value, and should probably be skipped. However, some authors do 
define parallelism on paths, as opposed to pointwise differential parallelism (i.e. connections), in a way which 
resembles the non-topological path-dependent parallelism mentioned in Remark 21.15.5. (See for example 
Poor [32], pages 275-276.) In the case of topological fibre bundles, a pathwise parallelism concept is presented 
in Definition 48.3.2.


21.15.2 REMARK: Absolute parallelism specifications determine bijections between fibre sets.
Parallelism associates elements of fibre sets $E_b = \pi^{-1}(\{b\})$ for pairs of points $b$ in the base space $B$ of a
fibration $(E, \pi, B)$. A point-to-point parallelism may be specified as a map $\theta : (B \times B) \to (E \to E)$ such
that $\theta_{b_2,b_1} : \pi^{-1}(\{b_1\}) \to \pi^{-1}(\{b_2\})$ is a bijection for all $b_1, b_2 \in B$. It is reasonable to expect the following
equivalence relation properties for parallelism. (See Definition 9.8.2 for equivalence relations.)


(i) $\forall b \in B,\ \theta_{b,b} = \mathrm{id}_{E_b}$. [reflexivity]
(ii) $\forall b_1, b_2 \in B,\ \theta_{b_1,b_2} = \theta_{b_2,b_1}^{-1}$. [symmetry]
(iii) $\forall b_1, b_2, b_3 \in B,\ \theta_{b_3,b_2} \circ \theta_{b_2,b_1} = \theta_{b_3,b_1}$. [transitivity]

For $z_1, z_2 \in E$, one may write $z_1 \parallel z_2$ to mean that $\theta_{\pi(z_2),\pi(z_1)}(z_1) = z_2$. Then the relation “$\parallel$” satisfies:

(i) $\forall z \in E,\ z \parallel z$; [reflexivity]
(ii) $\forall z_1, z_2 \in E,\ ((z_1 \parallel z_2) \Leftrightarrow (z_2 \parallel z_1))$; [symmetry]
(iii) $\forall z_1, z_2, z_3 \in E,\ ((z_1 \parallel z_2 \text{ and } z_2 \parallel z_3) \Rightarrow z_1 \parallel z_3)$. [transitivity]


This is clearly a simple equivalence relation on E, whose equivalence classes contain one and only one element 
of each fibre set $E_b$. This may be referred to as an “absolute parallelism” or a “flat parallelism”. Absolute
parallelism is illustrated in Figure 21.15.1. 
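
The three conditions for an absolute parallelism can be checked mechanically on a small example. The sketch below assumes a hypothetical product fibration with three base points and a two-element fibre, and it builds the parallelism from a single global chart (here called `phi`, an illustrative name). It then tests reflexivity, symmetry and transitivity, and confirms that each equivalence class meets every fibre set exactly once.

```python
B = [0, 1, 2]
F = ["x", "y"]
E = [(b, f) for b in B for f in F]          # product fibration, pi(b, f) = b

def pi(z):
    return z[0]

def fibre(b):
    return [z for z in E if pi(z) == b]

# Hypothetical global chart phi: E -> F which twists the fibre over b = 1.
swap = {"x": "y", "y": "x"}

def phi(z):
    return swap[z[1]] if z[0] == 1 else z[1]

def theta(b2, b1):
    """Parallelism bijection E_{b1} -> E_{b2} preserving the chart value."""
    return {z: next(w for w in fibre(b2) if phi(w) == phi(z)) for z in fibre(b1)}

def parallel(z1, z2):
    """z1 || z2 means theta_{pi(z2), pi(z1)}(z1) = z2."""
    return theta(pi(z2), pi(z1))[z1] == z2

reflexive = all(theta(b, b)[z] == z for b in B for z in fibre(b))
symmetric = all(theta(b1, b2)[theta(b2, b1)[z]] == z
                for b1 in B for b2 in B for z in fibre(b1))
transitive = all(theta(b3, b2)[theta(b2, b1)[z]] == theta(b3, b1)[z]
                 for b1 in B for b2 in B for b3 in B for z in fibre(b1))

# Equivalence classes of ||; each should contain one element per fibre set.
classes = {frozenset(w for w in E if parallel(z, w)) for z in E}
one_per_fibre = all(sum(1 for w in c if pi(w) == b) == 1
                    for c in classes for b in B)
print(reflexive, symmetric, transitive, one_per_fibre, len(classes))
```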


[Figure: fibre space $F$, fibre chart $\phi$, total space $E$ with parallelism maps $\theta$ between fibre sets, projection map $\pi$, and base space $B$.]


Figure 21.15.1 Non-topological fibration with absolute parallelism


If $F$ is a fibre space for the fibration $(E, \pi, B)$, and $\phi_1, \phi_2$ are fibre charts for $(E, \pi, B)$ with fibre $F$, such
that $b_k \in \pi(\mathrm{Dom}(\phi_k))$ for $k = 1,2$, then $(\phi_2|_{E_{b_2}})^{-1} \circ \phi_1|_{E_{b_1}} : E_{b_1} \to E_{b_2}$ is a bijection. So this map could be
regarded as a kind of parallelism relation between $E_{b_1}$ and $E_{b_2}$. However, it is chart-dependent. So this is
not a feasible way of constructing an absolute parallelism relation unless one uses only a single global chart.


For examples of such a simple point-to-point parallelism, see Peskin/Schroeder [298], pages 482, 487. 


21.15.3 REMARK: Parallelism may be thought of as a transformation group action on the fibre space. 

If $\theta : (B \times B) \to (E \to E)$ is an absolute parallelism relation for a fibration $(E, \pi, B)$, and $\phi_1, \phi_2$ are
fibre charts for $(E, \pi, B)$ with $b_k \in \pi(\mathrm{Dom}(\phi_k))$ for $k = 1,2$, then the map $g : F \to F$ defined by
$g = \phi_2 \circ \theta_{b_2,b_1} \circ (\phi_1|_{E_{b_1}})^{-1}$ is a bijection on $F$ for all $b_1, b_2 \in B$. Thus $g$ is an element of the group of
bijections on $F$. Therefore viewed “through the charts”, parallelism may be thought of as a transformation
group action on the fibre space $F$.


If one considers point-to-point absolute parallelism on a fibration in the context of an example such as
temperature distributions on the Earth, parallelism is not a “conversion of units”. In other words, it does
not correspond to a change of observer “coordinate frame”. An absolute parallelism map $\theta_{b_2,b_1}$ corresponds
to an equivalence relation between the “real temperatures” at points $b_1$ and $b_2$ on the Earth. Then the maps
on $F$ obtained by viewing $\theta_{b_2,b_1}$ through the charts $\phi_1$ and $\phi_2$ correspond to equivalence relations between
temperature measurements at points $b_1$ and $b_2$ which are made via “coordinate frames” $\phi_1$ and $\phi_2$.


21.15.4 REMARK:  Non-conservative parallelism. 
If a parallelism relation does not obey the transitivity rule, one could obtain different parallelism bijections
by following different paths through the base set. This would be a “non-conservative” parallelism relation,
by analogy with force fields in physics.


An example of a non-conservative parallelism relation would be a set of exchange rates between all of the
currencies in the world. Assuming that there are no conversion fees, the exchange of sums of money between
currencies $b_1$ and $b_2$ would be reversible. So the reflexivity and symmetry conditions would be satisfied.
However, it is not unthinkable that the product of exchange rates for currency pairs $(b_1, b_2)$ and $(b_2, b_3)$
might not equal the exchange rate for currency pair $(b_1, b_3)$. Then one could make a profit (or loss) by
routing money around the loop $(b_1, b_2, b_3)$. Such loops do happen in practice, although they obviously do
not persist for very long. 
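
The exchange-rate example can be made concrete with a few lines of arithmetic. The rates below are hypothetical numbers chosen only so that transitivity fails; they are not real market data. The sketch enforces reflexivity and symmetry by construction and then compares the direct conversion with the two-hop conversion and with the round trips in both directions.

```python
from fractions import Fraction

# Hypothetical fee-free exchange rates (illustrative numbers only).
quoted = {
    ("USD", "EUR"): Fraction(9, 10),
    ("EUR", "GBP"): Fraction(5, 6),
    ("USD", "GBP"): Fraction(4, 5),
}

rate = {}
for (a, b), r in quoted.items():
    rate[(a, b)] = r
    rate[(b, a)] = 1 / r              # symmetry: the reverse rate is the inverse
for c in ("USD", "EUR", "GBP"):
    rate[(c, c)] = Fraction(1)        # reflexivity

def convert(amount, path):
    """Convert an amount along a path of currencies, one hop at a time."""
    for a, b in zip(path, path[1:]):
        amount *= rate[(a, b)]
    return amount

direct = convert(Fraction(1), ["USD", "GBP"])
two_hops = convert(Fraction(1), ["USD", "EUR", "GBP"])
loop = convert(Fraction(1), ["USD", "EUR", "GBP", "USD"])
reverse_loop = convert(Fraction(1), ["USD", "GBP", "EUR", "USD"])
print(direct, two_hops, loop, reverse_loop)
```

With these assumed rates the direct and two-hop conversions differ, so one direction around the loop loses money and the opposite direction gains it.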


Another example of a non-conservative parallelism would be an erroneous set of height difference measurements
by survey teams. If three teams were responsible for measuring the height differences between the
point pairs $(b_1, b_2)$, $(b_2, b_3)$ and $(b_3, b_1)$, the sum of the differences might not equal zero. In this case, the
height difference between two of the points would depend on whether one used a single hop or two hops. In
this case, the non-conservative relation is caused by measurement errors, not physical reality.


21.15.5 REMARK:  Path-dependent parallelism. 

In the case of the absolute parallelism in Remark 21.15.2, there is one and only one way to establish the
parallelism bijection between each pair of base points. This is a “point-to-point parallelism”. For each pair
of base points, there is one and only one parallelism bijection. Such a parallelism has the form of bijections
$\theta_{b_2,b_1} : E_{b_1} \to E_{b_2}$ for $b_1, b_2 \in B$.

For “pathwise parallelism”, one may establish multiple parallelism bijections for each base space point pair
by making multiple “hops” between one point and another. Then for each pair of points, there is one
parallelism bijection for each path between the pair of points. Such a parallelism may be specified as
maps $\Theta : \mathcal{P}_{b_1,b_2} \to (E_{b_1} \to E_{b_2})$ for $b_1, b_2 \in B$ whose values are bijections. In other words, for all $b_1, b_2 \in B$, there is a set $\mathcal{P}_{b_1,b_2}$
of permitted paths between $b_1$ and $b_2$, and for each path $Q \in \mathcal{P}_{b_1,b_2}$, there is an end-to-end parallelism
bijection $\Theta(Q) : E_{b_1} \to E_{b_2}$. (See Figure 21.15.2.)


[Figure: fibre space $F$, fibre chart, total space, projection map, and base space $B$ with several paths $Q$ between base points.]


Figure 21.15.2 Non-topological fibration with pathwise parallelism


The reflexivity and symmetry conditions in Remark 21.15.2 are assumed to hold for point-to-point bijections. 
(This means that the constant path maps to the identity map, and the reversal of a path gives the inverse 
bijection.) However, the transitivity condition is not required. The choice of path determines the end-to-end 
parallelism relation. 


One would expect a pathwise parallelism to obey transitivity for concatenations of paths. In other words,
if a path $Q_0 \in \mathcal{P}_{b_1,b_3}$ is equal to the concatenation of paths $Q_1 \in \mathcal{P}_{b_1,b_2}$ and $Q_2 \in \mathcal{P}_{b_2,b_3}$, then one would
expect $\Theta(Q_0) = \Theta(Q_2) \circ \Theta(Q_1)$.

The canonical example of non-conservative parallelism is the parallel transport of tangent vectors on a curved
manifold. However, this example is best studied in the context of differentiable fibre bundles.


21.15.6 REMARK: The special case of pathwise parallelism on a finite base space. 

If B is finite, the end-to-end parallelism relations may be formed from any concatenation of point-to-point 
paths. In the finite case, transitivity under concatenation implies that only the point-to-point bijections 
need to be specified. In fact, one could define the point-to-point bijections for a subset of the full cross
product $B \times B$. If all points of $B$ are reachable by some concatenation of links, then one or more end-to-end
parallelism relations will be implied.


Paths may be defined as equivalence classes of parametrised curves for some suitable equivalence relation. 
One may also define paths as totally ordered subsets of the base space. If these subsets are finite, the 
end-to-end parallelism will be well defined. 


21.15.7 REMARK: Specification of permitted paths in terms of “forward cones”. 

A credible procedure for defining “traversable paths” on an otherwise unstructured base space would be to
specify a “forward cone” $C_b \subseteq B$ at all points $b \in B$, from which paths may be defined as finite sequences
$Q = (b_i)_{i=0}^n$ for $n \in \mathbb{Z}_0^+$, such that $b_{i+1} \in C_{b_i}$ for all $i \in \mathbb{Z}[0, n-1]$. Then the end-to-end parallel transport
map from $E_{b_0}$ to $E_{b_n}$ along the path $Q$ would be the bijection $\Theta(Q) : E_{b_0} \to E_{b_n}$ defined as $\Theta(Q) = \Psi_n$,
where the sequence $(\Psi_i)_{i=0}^n$ is defined inductively by $\Psi_0 = \mathrm{id}_{E_{b_0}}$, and $\Psi_{i+1} = \theta_{b_{i+1},b_i} \circ \Psi_i$ for $i \in \mathbb{Z}[0, n-1]$,
where $\theta_{b',b} : E_b \to E_{b'}$ is the local parallelism map for all $b' \in C_b$, for all $b \in B$. (This model has some
relevance to communication networks, where data packets are transformed by each link traversal, and the 
end-to-end transformation depends on the path.) 
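
The inductive definition of the transport map along a path of forward-cone hops translates directly into a loop. The sketch below assumes a hypothetical four-point base set, a three-element fibre, and link maps which are cyclic shifts; the names `cone`, `shift` and `transport` are illustrative and do not come from the text. Two different permitted paths from "a" to "d" give different end-to-end bijections, which is the path dependence described above.

```python
F = (0, 1, 2)                                   # fibre space
# Hypothetical forward cones C_b on the base set {a, b, c, d}.
cone = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
# Hypothetical link data: theta_{b', b} shifts the fibre coordinate by shift[(b', b)].
shift = {("b", "a"): 0, ("c", "a"): 1, ("d", "b"): 1, ("d", "c"): 1}

def fibre(b):
    return [(b, f) for f in F]

def is_path(Q):
    """A permitted path satisfies b_{i+1} in C_{b_i} for every hop."""
    return len(Q) >= 1 and all(Q[i + 1] in cone[Q[i]] for i in range(len(Q) - 1))

def theta(b_next, b):
    """Local parallelism map E_b -> E_{b_next} for b_next in C_b."""
    k = shift[(b_next, b)]
    return lambda z: (b_next, (z[1] + k) % len(F))

def transport(Q):
    """End-to-end map Theta(Q) = Psi_n, built inductively from Psi_0 = identity."""
    if not is_path(Q):
        raise ValueError("not a permitted path")
    psi = {z: z for z in fibre(Q[0])}                 # Psi_0 = id on E_{b_0}
    for i in range(len(Q) - 1):
        step = theta(Q[i + 1], Q[i])
        psi = {z: step(w) for z, w in psi.items()}    # Psi_{i+1} = theta o Psi_i
    return psi

via_b = transport(["a", "b", "d"])
via_c = transport(["a", "c", "d"])
print(via_b[("a", 0)], via_c[("a", 0)])
```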


21.16. Parallelism on non-topological ordinary fibre bundles 
((2018-11-9. Sections 21.15 and 21.16 need to be totally rewritten. Please ignore them for now. )) 


21.16.1 REMARK: The role of structure groups to constrain reference frames for defining parallelism. 
The primary practical purpose of defining fibre bundles for differential geometry is to define parallelism and 
curvature on them. The role of the structure group in defining parallelism is to constrain the set of “frames 
of reference” at each point of the fibre bundle. (See Section 20.10 for frames of reference and their relation 
to structure groups.) 


Parallelism between fibre sets, when viewed through a fibre chart, must have the form of a left action by the 
structure group on the fibre space. In the case of differentiable fibre bundles, a connection (i.e. differential 
parallelism) is required to be an infinitesimal left action of the structure group on the fibre space. (See 
Definitions 67.5.4 and 63.6.5.) 


21.16.2 REMARK:  Parallelism viewed through fibre charts. 

Parallelism may be defined either as an absolute parallelism as in Remark 21.15.2, or as a pathwise parallelism
via alternative paths which transport parallelism between points of the base space, or in any other way.
But the end result of these parallelism constructions is a bijection $\theta_{b_2,b_1} : E_{b_1} \to E_{b_2}$ between the fibre sets
$E_{b_k} = \pi^{-1}(\{b_k\})$ at base points $b_k \in B$ for $k = 1,2$. This is illustrated in Figure 21.16.1.


[Figure: fibre space $F$, fibre charts, total space $E$ with the parallelism map $\theta_{b_2,b_1}$ between fibre sets, projection map $\pi$, and base space $B$ with points $b_1$, $b_2$.]


Figure 21.16.1 Parallelism between fibre sets of a fibre bundle

It is reasonable to expect that if the frame $\phi_1|_{E_{b_1}}$ at $b_1 \in B$ and the frame $\phi_2|_{E_{b_2}}$ at $b_2 \in B$ are parallel,
and if the objects $z_1 \in E_{b_1}$ and $z_2 \in E_{b_2}$ are parallel, then the measurements $\phi_1(z_1)$ and $\phi_2(z_2)$ should be
equal. This implies that $\phi_1(z_1) = \phi_2(\theta_{b_2,b_1}(z_1))$. So if the frames of reference $\phi_1|_{E_{b_1}}$ and $\phi_2|_{E_{b_2}}$ are known
to be parallel, then the fibre set parallelism bijection $\theta_{b_2,b_1}$ may be calculated as $\theta_{b_2,b_1} = (\phi_2|_{E_{b_2}})^{-1} \circ \phi_1|_{E_{b_1}}$.
Conversely, the relation between the frames of reference may be calculated as $\phi_1|_{E_{b_1}} = \phi_2|_{E_{b_2}} \circ \theta_{b_2,b_1}$.


Since fibre charts are identified with principal fibre bundle total spaces in Section 21.9, these calculations 
show that there are two ways to specify parallelism on a fibre bundle. One may either specify the parallelism
directly on fibre sets of the ordinary fibre bundle, or one may specify parallelism on the associated principal
fibre bundle, which then induces parallelism on the ordinary fibre bundle.


21.16.3 REMARK: The parallelism relations look like left actions on the structure group. 

When the parallelism bijection $\theta_{b_2,b_1}$ is viewed through charts $\phi_1$ and $\phi_2$ which are not necessarily parallel,
where $b_k \in \pi(\mathrm{Dom}(\phi_k))$ for $k = 1,2$, this bijection must be equal to the action of an element of the structure
group $G$ on the fibre space $F$. This follows from Definition 21.8.3 because all chart transitions must be
structure group elements. In other words,


$\forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists g \in G,\quad \phi_2|_{E_{b_2}} \circ \theta_{b_2,b_1} = L_g \circ \phi_1|_{E_{b_1}}$, (21.16.1)


where $L_g : F \to F$ denotes the left action of $g$ on $F$. Thus when viewed “through the charts”, the parallelism
relation looks like the left action of an element of the structure group. This is formalised as Theorem 21.16.4. 


Since structure groups preserve some sort of structure (such as linearity or orthogonality), this means that 
any specification of parallelism must preserve structure. If one associates each fibre chart with an observer 
viewpoint, then line (21.16.1) means that each observer sees the same reality because their observations are 
related through a transformation Lg under which the “laws of physics” are invariant. (See Remark 20.1.10 
regarding the role of transformation groups for describing observer-independence of the “laws of physics” .) 


21.16.4 THEOREM: Parallelism transformations are equivalent to left actions by the structure group. 

Let (G,F,o,) be an effective left transformation group. Let (E,7,B,A) be a non-topological (G, F) 
ordinary fibre bundle. Let w : Ey, — Ep, be a bijection for some 51,05 € B, where Ey = 7~'({b}) for b € B. 
Suppose that there are charts $4, 9 € AL with bj € t(Dom(¢/,)) for k = 1,2, such that Pal m, = ou. 


Then 


Vài € Af. , VÀ» € ALL, 3g € G, Yz € Ey, 
6a(2) = ng: (o2) (21.16.2) 

where AZ = (6 € AE; b € r(Dom(¢))} for b € B. 
PROOF: Let (G,F,co,u), (E, m, B, AZ), b1,b2 € B, w : Ey, — Ep, and $4, 95 € AE be as in the assumptions 
of the theorem, with VAR = ow. Let ó1€ Ag and $3 € Af. Then by Definition 21.8.3 (v), there exist 

s LIUM 

91,92 € G such that Vz, € Ey, dn (Ze) = (gk, >, (2x)), for k = 1,2. Let g = gog, |. Let z € Ey,. Then 
(9, ¢1(w(2))) = ug, (gi. 91 (09(2))) = HG, ulg, d5(2))) = HG, ulg, (aa ^. ó2(2)))) = éx(2)- 


21.16.5 REMARK: Parallelism bijections do not in general commute with structure group left actions. 
The fact that parallelism is observer-dependent in line (21.16.1) implies that the desirable condition (21.16.3) 
is not valid. 


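$\forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \forall h \in G,\ \forall z_1, z_2 \in E_{b_1}$,
$\phi_1(z_1) = h \cdot \phi_1(z_2) \ \Rightarrow\ \phi_2(\theta_{b_2,b_1}(z_1)) = h \cdot \phi_2(\theta_{b_2,b_1}(z_2))$. (21.16.3)


To see how this fails, suppose that line (21.16.1) holds. Fix $\phi_1$ and $\phi_2$. Then for some $g \in G$, for all $z \in E_{b_1}$,
$\phi_2(\theta_{b_2,b_1}(z)) = g \cdot \phi_1(z)$. So $\phi_2(\theta_{b_2,b_1}(z_1)) = g \cdot \phi_1(z_1)$ and $\phi_2(\theta_{b_2,b_1}(z_2)) = g \cdot \phi_1(z_2)$. Then from the
assumption $\phi_1(z_1) = h \cdot \phi_1(z_2)$, it follows that $\phi_2(\theta_{b_2,b_1}(z_1)) = ghg^{-1} \cdot \phi_2(\theta_{b_2,b_1}(z_2))$. Thus even with a fixed
pair of charts, the bijection $\theta_{b_2,b_1}$, viewed through the charts, does not commute with the action of $G$ on $F$,
unless $G$ happens to be a commutative group.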


The physical reason that condition (21.16.3) fails is that the action $L_h$ looks different from two viewpoints
or coordinate frames. The algebraic reason that condition (21.16.3) fails is that the multiplication by h is 
on the left, which is the same side as the multiplication by g in line (21.16.1). The two left actions do 
not commute in general. What is really wanted here is an action on the right in condition (21.16.3). (In 
fact, we really only want the actions to be on opposite sides, but the convention is to make (G, F) a left 
transformation group. And this implies that the invariance condition (21.16.3) must use a right action.) 
Unfortunately, there is no right action of $G$ on $F$. So condition (21.16.3) cannot be valid in general.

A right action of $G$ on $F$ is in fact available if $F$ is the group $G$, and $G$ acts on $G$ from the left in
Definition 21.8.3 (v), and $G$ acts on $G$ on the right in condition (21.16.3). This works. The left and right
actions commute. Then one obtains condition (21.16.4).


$\forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \forall h \in G,\ \forall z_1, z_2 \in E_{b_1}$,
$\phi_1(z_1) = \phi_1(z_2) \cdot h \ \Rightarrow\ \phi_2(\theta_{b_2,b_1}(z_1)) = \phi_2(\theta_{b_2,b_1}(z_2)) \cdot h$. (21.16.4)


This is part of the motivation for defining principal fibre bundles. 
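
The algebraic point of Remark 21.16.5 can be checked on the smallest non-commutative group. The sketch below takes the fibre space to be the group itself, with the symmetric group on three letters as a hypothetical choice, and models the chart view of a parallelism bijection as left multiplication by g. It tests the left-action invariance condition (21.16.3) and the right-action invariance condition (21.16.4) over all triples of group elements.

```python
from itertools import permutations, product

G = list(permutations(range(3)))          # the symmetric group S_3 as tuples

def mul(p, q):
    """Group product p*q, i.e. the permutation i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(3))

# Chart view of the parallelism bijection: x -> g*x (left action on F = G).
# Condition (21.16.3) with a left action by h: g*(h*x) == h*(g*x).
left_invariant = all(mul(g, mul(h, x)) == mul(h, mul(g, x))
                     for g, h, x in product(G, repeat=3))

# Condition (21.16.4) with a right action by h: g*(x*h) == (g*x)*h.
right_invariant = all(mul(g, mul(x, h)) == mul(mul(g, x), h)
                      for g, h, x in product(G, repeat=3))

print(left_invariant, right_invariant)
```

The right-action condition holds by associativity alone, whereas the left-action condition would require every g and h to commute.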


Chapter 22


LINEAR SPACES


22.0.1 REMARK: The historical origin of linear algebra.
Bell [233], page 379, writing in 1937, attributes the initial development of linear spaces of general dimensions 
to Arthur Cayley (1821-1895). 


Another of the ideas originated by Cayley, that of the geometry of “higher space” (space of n 
dimensions) is likewise of present scientific significance but of incomparably greater importance as 
pure mathematics. Similarly for the theory of matrices, again an invention of Cayley’s. 


Linear algebra is also attributed to Hermann Graßmann. Eves [68], page 10, says the following.


Algebras of ordered arrays of numbers can perhaps be said to have originated with the Irish mathe- 
matician and physicist Sir William Rowan Hamilton (1805-1865) when, in 1837 and 1843, he devised 
his treatment of complex numbers as ordered pairs of real numbers (...) and his real quaternion 
algebra as an algebra of ordered quadruples of real numbers. More general algebras of ordered 
n-tuples of real numbers were considered by Hermann Günther Grassman (...) in 1844 and 1862. 


22.0.2 REMARK: Why vector spaces should be called “linear spaces”.

The term “linear space” is used in Definition 22.1.1 and elsewhere in this book in preference to “vector
space” to try to avoid suggesting the image of a vector which is presented in elementary texts as an arrow
with a shaft and arrowhead. The terms “linear space” and “vector space” are often used interchangeably,
but the term “vector space” is suggestive of old-fashioned introductions to physics and elementary geometry
whereas the term “linear space” suggests the abstract mathematical definition. The “vector spaces” with
two-ended arrows are better modelled by the tangent-line bundles in Section 26.14 or the vector bundles in
Section 54.5. In a linear space, the “blunt end of the arrow” is always the zero vector.


22.0.3 REMARK: The role of linear algebra in differential geometry. 

It is a quite astonishing fact that the geometry of space and the surface of the Earth can be arithmetised. 
For 2000 years from the time of the classical Greek geometers to the time of René Descartes, geometry 
was done in the intuitively obvious coordinate-independent way using points and lines and measuring-sticks. 
Geometry in those 2000 years related points and lines to each other, not to a rectangular grid. Maps did 
not have longitude and latitude grids. But since the 17th century, the education system and science and
technology have caused us to think of the universe as a rectangular grid which provides the arena in which
matter moves. After almost 400 years of ubiquitous rectangular grids and the arithmetic coordinatisation of
space, the idea of curved space which is not a simple grid stretching to infinity in all directions seems alien.


It should come as a surprise to us that the arithmetic system which was created for counting goats and cattle 
and obols and denarii could usurp the traditional role of Euclidean ruler-and-compass geometry, which was 
created for the triangulation of fields and the design of houses, roads, bridges and carts. For 2000 years, these 
two systems, arithmetic and geometry, were distinct and separate. The numericisation of space arrived very 
conveniently at a time when the spherical form of the Earth began to have real consequences for navigation, 
when the ruler and compass could no longer cope with the curvature of the Earth.


Linear algebra is the algebra of the Cartesian grid. Galileo recognised that the motion of objects on Earth 
follows a square law in terms of an arithmetic grid for space and time. Newton’s second law F = ma 
may be thought of geometrically in terms of parallel vectors F and a, or in terms of coordinates F; = ma; 
for i = 1,2,3. This formula arises from the linearisation of space and time. The path of the particle is 
curved and the force field is curved, but when they are both differentiated, the differential formula is linear. 
Thus curved global reality is reduced to a linearised local relation, which is then integrated to determine the 
global behaviour. It took thousands of years to recognise that arithmetic could be applied to the world with 
great success, whereas the old Euclidean point-and-line system was difficult to apply to a curved world. 


Line recognition is built into the brain's visual processing. Lines are recognised at a low level on the neural 
path from the eyes to the visual cortex. Arithmetic is more closely related to verbal processing in the human 
brain. So linear algebra and Cartesian coordinatisation bring together two distinct functionalities of the 
human brain: the visual and the verbal. 


Linear algebra has a central role in differential geometry as the limit of curved manifolds. Curved space and 
time are locally linearised so that linear (and multilinear) algebra may be applied. Then global solutions are 
obtained by integration. 


One should not be perplexed that the spherical Earth requires more than one Cartesian patch to cover it 
so that we can arithmetise the Earth's surface. One should not be annoyed that the curved universe of 
general relativity can only be expressed in terms of simple Cartesian coordinates locally instead of a globally 
flat coordinate-free Euclidean point-and-line geometry. One should be astonished that the arithmetic which 
humans developed for counting goats and cattle and money is applicable to a curved universe by locally 
linearising it. The methodology of linearisation and coordinatisation has been one of the huge successes of 
mathematics in applications to science, engineering, economics and other fields. 


22.1. Linear spaces 


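22.1.1 DEFINITION: A linear space over a field $K \mathrel{-\!\!<} (K, \sigma_K, \tau_K)$ is a tuple $V \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu)$
such that $(V, \sigma_V)$ is a commutative group written additively, and the operation $\mu : K \times V \to V$, written
multiplicatively, satisfies the following conditions.


(i) $\forall \lambda \in K,\ \forall v_1, v_2 \in V,\ \lambda(v_1 + v_2) = \lambda v_1 + \lambda v_2$, [$(\sigma_V, \mu)$ distributivity]
(ii) $\forall \lambda_1, \lambda_2 \in K,\ \forall v \in V,\ (\lambda_1 + \lambda_2)v = \lambda_1 v + \lambda_2 v$, [$(\sigma_K, \mu)$ distributivity]
(iii) $\forall \lambda_1, \lambda_2 \in K,\ \forall v \in V,\ (\lambda_1 \lambda_2)v = \lambda_1(\lambda_2 v)$, [$(\tau_K, \mu)$ associativity]
(iv) $\forall v \in V,\ 1_K v = v$. [unit scalarity]


The operation $\sigma_V : V \times V \to V$ is called vector addition. The operation $\mu$ is called scalar multiplication.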
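
The four conditions of Definition 22.1.1 can be verified exhaustively for a small finite example. The sketch below assumes the field of integers modulo 5 and the space of pairs over that field; the prime, the dimension, and the names `add` and `smul` are illustrative choices and are not taken from the text.

```python
from itertools import product

p = 5                                    # hypothetical choice of prime
K = range(p)                             # the field of integers modulo p
V = list(product(K, repeat=2))           # vectors: pairs of field elements

def add(v, w):
    """Vector addition sigma_V, componentwise modulo p."""
    return tuple((a + b) % p for a, b in zip(v, w))

def smul(lam, v):
    """Scalar multiplication mu, componentwise modulo p."""
    return tuple((lam * a) % p for a in v)

axioms = {
    "(i)": all(smul(l, add(v, w)) == add(smul(l, v), smul(l, w))
               for l in K for v in V for w in V),
    "(ii)": all(smul((l1 + l2) % p, v) == add(smul(l1, v), smul(l2, v))
                for l1 in K for l2 in K for v in V),
    "(iii)": all(smul((l1 * l2) % p, v) == smul(l1, smul(l2, v))
                 for l1 in K for l2 in K for v in V),
    "(iv)": all(smul(1, v) == v for v in V),
}
print(axioms)
```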


22.1.2 REMARK: A linear space is a unitary left module over a commutative division ring. 

A linear space V over a field K is the same thing as a unitary left module V over the ring structure of K. 
(See Definition 19.3.6 and Remark 19.3.16.) In other words, a linear space is a unitary left module over a ring 
with the additional requirement that all elements of the ring must have two-sided multiplicative inverses. 


Figure 22.1.1 illustrates some relations between unitary left modules over rings and various kinds of linear 
spaces. (The existence of bases for linear spaces is discussed in Section 22.7. See Definition 22.5.7 for 
finite-dimensional linear spaces.) 


A field is known more technically as a commutative division ring. (See Remark 18.7.4.) Therefore the full 
technical name for a linear space would be a “unitary left module over a commutative division ring”. Clearly 
the shorter term “linear space” has advantages.


[Figure: diagram relating unitary left modules over rings to linear spaces and finite-dimensional linear spaces.]


Figure 22.1.1 Relations between linear space definitions


22.1.3 THEOREM: Every linear space is a unitary module over a ring. 
Let $(K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ be a linear space. Then $(K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ is a unitary module over a ring.


PROOF: Since $K \mathrel{-\!\!<} (K, \sigma_K, \tau_K)$ is a field, $K$ is a ring by Theorem 18.7.5. Hence by Definitions 19.3.1,
19.3.6 and 22.1.1, $(K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ is a unitary module over the ring $K$.


22.1.4 THEOREM: Every unitary module over a ring which is a field is a linear space. 
Let $(K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ be a unitary module over a ring $K$. Let $K$ be a field. Then $(K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ is
a linear space over the field $K$.


PROOF: Let $V \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ be a unitary module over a ring $K$. Then $V$ satisfies Definitions
19.3.1 and 19.3.6, from which all of the conditions of Definition 22.1.1 follow, except that $K$ is a field. Hence
when $K$ is a field, $V$ is a linear space over the field $K$.


22.1.5 THEOREM: Some very basic properties of linear spaces. 
Let V be a linear space over a field K. 


(i) $\forall v \in V,\ 0_K v = 0_V$.
(ii) $\forall \lambda \in K,\ \forall v \in V,\ (-\lambda)v = -\lambda v$.
(iii) $\forall v \in V,\ (-1_K)v = -v$.


PROOF: Part (i) follows from Theorems 22.1.3 and 19.3.2 (i). 
Part (ii) follows from Theorems 22.1.3 and 19.3.2 (ii). 
Part (iii) follows from part (ii) and Definition 22.1.1 (iv). 


22.1.6 REMARK: Some partial redundancy in the axioms for a linear space. 

Linear spaces are defined in Weyl [310], section 2, page 15, in a classical fashion for K = ℝ. Weyl makes
the observation that the scalar multiplication rules (i), (ii) and (iii) in Definition 22.1.1 follow from the
vector addition rules for scalars in the rational subfield ℚ of the field K = ℝ. Therefore the full scalar
multiplication rules for K = ℝ follow from the rules on the dense subset ℚ of ℝ by requiring continuity.


22.1.7 REMARK: Order independence of finite sums of vectors in a linear space. 

Sums of finite sequences (v_i)_{i=1}^n ∈ V^n of vectors in a linear space V are defined inductively by
∑_{i=1}^m v_i = (∑_{i=1}^{m−1} v_i) + v_m for m ≥ 1 and ∑_{i=1}^0 v_i = 0_V. Such sums are clearly
order-independent because vector addition is commutative and associative. Therefore sums of families
(v_α)_{α∈A} ∈ V^A are likewise well defined for finite sets A, whether the index set A is totally ordered
or not. Finite sets are easily ordered via a bijection to/from the corresponding set of integers
ℕ_n = {1, ..., n} or n = {0, 1, ..., n − 1}.
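
The inductive definition of a finite sum and its order-independence can be checked on a small example. The following is only a sketch under stated assumptions: the linear space is taken to be ℚ^2 with vectors stored as tuples of exact fractions, and the names add, finite_sum and ZERO are hypothetical helpers which do not appear elsewhere in this book.

```python
from fractions import Fraction
from itertools import permutations

def add(u, v):
    """Componentwise vector addition in Q^2 (assumed example space)."""
    return tuple(a + b for a, b in zip(u, v))

def finite_sum(vectors, zero):
    """Inductive sum: the empty sum is `zero`; each step adds the next term
    to the sum of the preceding terms."""
    total = zero
    for v in vectors:
        total = add(total, v)
    return total

ZERO = (Fraction(0), Fraction(0))
vs = [(Fraction(1, 2), Fraction(3)),
      (Fraction(-2), Fraction(1, 3)),
      (Fraction(5), Fraction(-1))]

# Collect the sums obtained from every ordering of the three vectors.
sums = {finite_sum(p, ZERO) for p in permutations(vs)}
print(sums)  # a single value, because the sum does not depend on the order
```

The set sums has exactly one element because all six orderings give the same vector, and the empty sequence sums to the zero vector.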


22.1.8 REMARK: A field acting on itself may be regarded as a linear space. 

Every field may be regarded as a linear space because fields have a natural linear space structure where the 
field K is both the active and passive set as in Definition 22.1.9. The addition operation σ_K on K may be
used as the addition operation σ_V for the vectors in the set V = K. The multiplication operation τ_K on K
may be used as the action μ : K × K → K of elements of K acting on K.


Consequently, whenever a linear space is required in any definition or theorem, a field (regarded as a linear 
space) may be used in its place. Note, however, that this does not mean that every field is a linear space. 
Linear spaces are a different kind of structure to a field, but a linear space may be constructed in a natural 
manner from every field.


22.1.9 DEFINITION: The field K regarded as a linear space is the linear space which has the specification
tuple (K, K, σ_K, τ_K, σ_K, τ_K), where (K, σ_K, τ_K) is the full specification tuple for K.


22.1.10 DEFINITION: A (linear) subspace of a linear space V < (K, V, σ_K, τ_K, σ_V, μ_V) over a field K is a
linear space U < (K, U, σ_K, τ_K, σ_U, μ_U) over K such that U ⊆ V, σ_U ⊆ σ_V and μ_U ⊆ μ_V.


22.1.11 THEOREM: Some very basic properties of subspaces of linear spaces. 
Let U be a subspace of a linear space V. 


(i) U is a submodule of V, regarded as a unitary module over a ring. 
(ii) σ_U = σ_V|_{U×U}.
(iii) μ_U = μ_V|_{K×U}.
(iv) 0_V ∈ U.


PROOF: Part (i) follows from Theorem 22.1.3 and Definition 19.3.14.
Parts (ii), (iii) and (iv) follow from part (i) and Theorem 19.3.10 parts (i), (ii) and (iii) respectively. 


22.1.12 THEOREM: A unitary-module-over-a-ring submodule of a linear space is a linear subspace. 
Let V be a linear space over a field K. Let U be a submodule of V, regarded as a unitary module over the 
ring K. Then U is a linear subspace of V. 


PROOF: From Definition 19.3.14, it follows that U < (K, U, σ_K, τ_K, σ_U, μ_U) is a unitary module over the
ring K such that U ⊆ V, σ_U ⊆ σ_V and μ_U ⊆ μ_V. But then U is a linear space over the field K by
Theorem 22.1.4. Hence U is a linear subspace of V by Definition 22.1.10.


22.1.13 THEOREM: Equivalent set of conditions for a sub-system of a linear space to be a linear subspace.
A subset U of a linear space V < (K, V, σ_K, τ_K, σ_V, μ_V) is a linear subspace of V if and only if U ≠ ∅ and
σ_V(U × U) ⊆ U and μ_V(K × U) ⊆ U.


PROOF: If U is a linear subspace of V, then σ_V(U × U) ⊆ U and μ_V(K × U) ⊆ U follow from the closure
of the vector addition and scalar multiplication operations for U according to Definition 22.1.1, and U ≠ ∅
because 0_V ∈ U by Theorem 22.1.11 (iv).

Now assume U ≠ ∅ and σ_V(U × U) ⊆ U and μ_V(K × U) ⊆ U. Let u ∈ U. Then 0_V = 0_K u ∈ U. So U is a
submodule of the unitary module V over the ring K by Theorem 19.3.15. Hence U is a linear subspace of V
by Theorem 22.1.12.
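
The three conditions of Theorem 22.1.13 are easy to test exhaustively in a very small linear space. The following sketch assumes the two-element field GF(2), represented by the integers 0 and 1 with arithmetic modulo 2, and the linear space GF(2)^2. The names P, V, add, scale and is_subspace are hypothetical and are introduced only for this illustration.

```python
from itertools import combinations, product

P = 2                                   # the field K = GF(2), as integers modulo 2
V = list(product(range(P), repeat=2))   # the four vectors of K^2

def add(u, v):
    """Componentwise addition modulo P."""
    return tuple((a + b) % P for a, b in zip(u, v))

def scale(lam, v):
    """Componentwise scalar multiplication modulo P."""
    return tuple((lam * a) % P for a in v)

def is_subspace(U):
    """The three conditions: non-empty, closed under addition,
    closed under scalar multiplication."""
    U = set(U)
    return (len(U) > 0
            and all(add(u, v) in U for u in U for v in U)
            and all(scale(lam, u) in U for lam in range(P) for u in U))

# Test all 16 subsets of V and keep those which satisfy the conditions.
subspaces = [set(c) for r in range(len(V) + 1)
             for c in combinations(V, r) if is_subspace(c)]
print(len(subspaces))
```

Every subset which passes the test contains the zero vector, as in the second half of the proof, where 0_V = 0_K u for any u ∈ U.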


22.1.14 REMARK: Trivial subspaces of every linear space. 
The trivial subspace {0_V} of a linear space V is a subspace of all subspaces of V. The space V is a subspace
of V. For any vector v ∈ V, the subset {λv; λ ∈ K} is a subspace of V.


22.1.15 DEFINITION: A real linear space is a linear space over the field of real numbers. In other words, a 
real linear space is a linear space (ℝ, V, σ_ℝ, τ_ℝ, σ_V, μ) such that (ℝ, σ_ℝ, τ_ℝ) is the field of real numbers.


22.1.16 DEFINITION: A complex linear space is a linear space over the field of complex numbers. In other 
words, a complex linear space is a linear space (ℂ, V, σ_ℂ, τ_ℂ, σ_V, μ) such that (ℂ, σ_ℂ, τ_ℂ) is the field of
complex numbers. 


22.2. Free and unrestricted linear spaces 


22.2.1 REMARK: Most practical linear spaces are subspaces of free and unrestricted linear spaces. 

In practice, most linear spaces are based on spaces of functions for which the domain is a fixed set and the 
range is a field or a linear space, and the vector addition and scalar multiplication operations are defined 
pointwise. As mentioned in Remark 22.1.8, a field may always be regarded as a linear space. So in this 
sense, one may speak here simply of functions whose range is a linear space. 


For any set S, pointwise addition and multiplication operations may be defined on the set V^S of functions
from S to a linear space (or field) V over a field K. Such an unrestricted function space provides a valid
linear space. (The range V may, of course, be constructed itself as a space of functions with pointwise
addition and multiplication operations on a different domain.) 


Unrestricted function spaces are sometimes useful, but in the case of an infinite domain, linear subspaces of 
such function spaces are usually more useful. 


Free linear spaces are a particular class of useful subspaces of completely unrestricted function spaces. Other 
kinds of useful function space subspaces are constructed by imposing linear constraints on unrestricted 
function spaces or free linear spaces. (Other methods for constructing linear spaces include quotients, direct 
sums, tensor products and duals.) 


22.2.2 REMARK: Applications for free and unrestricted linear spaces. 
Coordinate maps for linear spaces use a free linear space as target space in Definitions 22.8.6 and 22.8.7.
According to Theorem 23.1.15, linear spaces which have a basis (as defined in Section 22.7) are naturally
isomorphic to a free linear space. Thus free linear spaces are of fundamental importance. They are not
merely special examples of linear spaces.


As mentioned in Remark 23.7.4, the algebraic dual (as defined in Section 23.6) of a linear space which has 
a basis has a natural isomorphism to an unrestricted linear space which is constructed from the basis. So 
unrestricted linear spaces also play a fundamental role in the theory of linear spaces. 


22.2.3 REMARK: The meaning of the term “natural isomorphism”. 

The term “natural isomorphism” is meta-mathematical. Every mathematician knows what it means, but it 
is difficult to define. Bishop/Goldberg [3], page 80, makes the following brief comment on this subject. (The 
word “reinterpretations” means representations in this context.) 


These reinterpretations are natural since no choices were made to define them. 


22.2.4 REMARK: Well-known examples of unrestricted linear spaces. 
Examples of unrestricted linear spaces of functions according to Definition 22.2.5 include ℝ^ℤ and ℝ^ℝ.


22.2.5 DEFINITION: The unrestricted linear space on a set S over a field K < (K, σ_K, τ_K) is the tuple
V < (K, V, σ_K, τ_K, σ_V, μ), where V = K^S, σ_V : V × V → V is the pointwise addition operation in V, and
μ : K × V → V is the pointwise scalar multiplication operation by elements of K on V. In other words:

(i) ∀f_1, f_2 ∈ K^S, ∀x ∈ S, σ_V(f_1, f_2)(x) = σ_K(f_1(x), f_2(x)). [pointwise addition]
(ii) ∀λ ∈ K, ∀f ∈ K^S, ∀x ∈ S, μ(λ, f)(x) = τ_K(λ, f(x)). [pointwise scalar multiplication]
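
The pointwise operations can be imitated in software by treating elements of K^S as functions. The following sketch assumes S = ℤ and K = ℚ, with Python callables standing in for elements of K^S. The names ul_add and ul_scale are hypothetical and are used only for this illustration.

```python
from fractions import Fraction

def ul_add(f1, f2):
    """Pointwise addition: the function x -> f1(x) + f2(x)."""
    return lambda x: f1(x) + f2(x)

def ul_scale(lam, f):
    """Pointwise scalar multiplication: the function x -> lam * f(x)."""
    return lambda x: lam * f(x)

f = lambda x: Fraction(x * x)      # f(x) = x^2 on S = Z
g = lambda x: Fraction(x, 2)       # g(x) = x/2 on S = Z

h = ul_add(f, ul_scale(Fraction(2), g))   # h = f + 2g, so h(x) = x^2 + x
print(h(3), h(5))
```

No finiteness constraint is imposed here. The functions f, g and h are non-zero at infinitely many points of S, which is permitted in an unrestricted linear space.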


22.2.6 NOTATION: UL(S, K) denotes the unrestricted linear space on a set S over a field K. In other
words, UL(S, K) = K^S together with the associated pointwise addition and scalar multiplication operations.


22.2.7 DEFINITION: The standard immersion (function) for the unrestricted linear space V on a set S over
a field K is the map i : S → V defined as the indicator function i(x) = χ_{{x}} : S → K for x ∈ S. In other
words, i : x ↦ (y ↦ δ_{xy}).


22.2.8 REMARK: Indicator functions and standard immersion functions. 

See Definition 14.7.2 for the indicator function χ in Definition 22.2.7. The standard immersion function
i : S → K^S in Definition 22.2.7 satisfies i(x)(y) = δ_{xy} for all x, y ∈ S. The immersion function value i(x)
may be written informally as x, and i(x) may be informally identified with x.


22.2.9 REMARK: Functions with finite support. 

The difference between unrestricted and free linear spaces is that the free linear spaces in Definition 22.2.10
require all functions to be zero outside a finite subset of the domain. Such functions may be referred to as
“functions with finite support”, where the “support” supp(f) of a function f means the subset of its domain
where it is non-zero. (See Definition 14.7.17 and Notation 14.7.18.)


22.2.10 DEFINITION: The free linear space on a set S over a field K < (K, σ_K, τ_K) is the tuple
V < (K, V, σ_K, τ_K, σ_V, μ), where


V = {f : S → K; #{x ∈ S; f(x) ≠ 0} < ∞}
  = {f : S → K; #(supp(f)) < ∞},


σ_V : V × V → V is the pointwise addition operation in V, and μ : K × V → V is the pointwise scalar
multiplication operation by elements of K on V.


22.2.11 NOTATION: FL(S, K) denotes the free linear space on a set S over a field K. In other words,

FL(S, K) = {f : S → K; #{x ∈ S; f(x) ≠ 0} < ∞}
         = {f : S → K; #(supp(f)) < ∞},

together with the associated pointwise addition and scalar multiplication operations.


22.2.12 DEFINITION: The standard immersion (function) for the free linear space V on a set S over a field
K is the map i : S → V defined as the indicator function i(x) = χ_{{x}} : S → K for x ∈ S.


22.2.13 REMARK:  Unsatisfactory multi-letter notations. 

Notations 22.2.6 and 22.2.11 are non-standard and not entirely satisfactory. Such multi-letter abbreviations 
are not often used in mathematics because they may be confused with the product of the individual letters
in the abbreviation. However, the fonts available to mathematicians, although large and numerous,
are insufficient for the full range of concepts. The use of numerous font styles, such as upper and lower 
case roman, bold, italic, bold-italic, blackboard-bold, calligraphic, Greek, bold-Greek, Fraktur, and many 
mathematical symbols, eventually results in confusion because of the inevitable similarities between many 
of these. Therefore multiple-letter notations are sometimes the least-bad solution. 


22.2.14 THEOREM: Basic linear space properties of unrestricted and free linear spaces. 
Let S be a set and let K be a field. 


(i) UL(S, K) is a linear space. 
(ii) FL(S, K) is a linear space. 
(iii) FL(S, K) is a linear subspace of UL(S, K). 
(iv) If S is finite, then FL(S, K) = UL(S, K) = K^S, together with the associated pointwise addition and
scalar multiplication operations. 


PROOF: For part (i), let V = UL(S, K). Then (V, σ_V) is a commutative group whose zero element is
0_V : S → K, where 0_V : x ↦ 0_K for x ∈ S, because (K, σ_K) is a commutative group. Let λ ∈ K and
v_1, v_2 ∈ UL(S, K) = K^S. Then λ(v_1 + v_2) is the algebraic expression for the value μ(λ, σ_V(v_1, v_2)) with
μ : K × V → V and σ_V : V × V → V as in Definition 22.2.5. Thus this is a well-defined element of V. Similarly
λv_1 + λv_2 = σ_V(μ(λ, v_1), μ(λ, v_2)) is a well-defined element of V. The equality of these two expressions is
clear from the distributivity of multiplication over addition for the field K. This verifies Definition 22.1.1 (i).
Definition 22.1.1 conditions (ii), (iii) and (iv) may be verified similarly from the distributivity, associativity
and multiplicative identity properties of a field. Hence UL(S, K) is a linear space.


Part (ii) may be proved as for part (i), except that the closure of the vector addition operation σ_V
and scalar multiplication operation μ in Definition 22.2.10 must be verified within FL(S, K). The closure
of vector addition follows from supp(v_1 + v_2) ⊆ supp(v_1) ∪ supp(v_2) and the fact that any subset of the
union of two finite sets is a finite set. The “support” of the product λv for λ ∈ K and v ∈ FL(S, K) is
either the same as the support of v, if λ ≠ 0_K, or is equal to the empty set (which is clearly finite)
if λ = 0_K. Hence FL(S, K) is a linear space.


For part (iii), clearly FL(S, K) ⊆ UL(S, K) by Definitions 22.2.5 and 22.2.10. Hence FL(S, K) is a linear
subspace of UL(S, K) by Definition 59.2.9 and parts (i) and (ii). 

For part (iv), let S be finite. Let f : S → K. Then {x ∈ S; f(x) ≠ 0} ⊆ S. So #{x ∈ S; f(x) ≠ 0} ≤
#(S) < ∞. Thus {f : S → K; #{x ∈ S; f(x) ≠ 0} < ∞} = K^S. Hence FL(S, K) = UL(S, K) = K^S.


22.2.15 REMARK: Component-graph diagram for vectors in a free linear space. 
Pointwise addition is illustrated in Figure 22.2.1 for the free linear space FL(ℝ, ℝ) on ℝ over the field ℝ.
This kind of vector-addition picture is in contrast to the usual shaft-and-arrowhead picture of vector addition. 


22.2.16 REMARK: The range of the standard immersion function for a linear space. 

The standard immersion function i : S → FL(S, K) in Definition 22.2.12 is the same as that which is
discussed in Remark 22.2.8. The range of this immersion function is a basis for the free linear space. (See 
Definition 22.7.2 for linear space bases.) 


22.2.17 REMARK: Free linear spaces may be thought of as external direct sums. 
Free linear spaces may be thought of as the external direct sum of a copy of a field K for each element of an 
arbitrary set. (See Section 24.1 for direct sums of linear spaces.)


Figure 22.2.1 Vector addition in the free linear space over the real numbers


22.2.18 REMARK: Common examples of free linear spaces. 

If the set S is finite, then the unrestricted linear space K^S and the free linear space FL(S, K) are the same.
Of particular interest are the spaces FL(ℕ_n, K) = K^{ℕ_n} for n ∈ ℤ_0^+. These are usually denoted as K^n.
Examples of particular interest are ℝ^n and ℂ^n, the linear spaces of real and complex n-tuples respectively.
Also, the rational n-tuples ℚ^n constitute a linear space over the field ℚ. The Cartesian linear space in
Definition 22.2.19 is the free linear space FL(ℕ_n, ℝ) on ℕ_n over the real numbers.


22.2.19 DEFINITION: A Cartesian linear space is a set ℝ^n for some n ∈ ℤ_0^+, together with the operations
of componentwise addition and scalar multiplication over the field ℝ.


22.2.20 REMARK: The difference between Cartesian n-tuple space and Euclidean point space. 

The Cartesian linear space in Definition 22.2.19 is also known as “Cartesian coordinate space”, “Euclidean
space”, “Euclidean linear space” and by other names. It is probably best to reserve the adjective Euclidean
for the normed space which is defined in Section 24.7 because the term “Euclidean” is often used in contrast 
with non-Euclidean, which implies that the metric is a defining feature. 


The bare coordinate space in Definition 22.2.19 is referred to as Cartesian to emphasise the role of the real
n-tuples as a chart for some underlying point space, which distinguishes Cartesian geometry from the more
ancient geometry of point, line and circle constructions. The name “Cartesian” is also apt because the set
ℝ^n is the Cartesian set-product of n copies of the set ℝ.


22.2.21 REMARK: Computational convenience of the sparse representation of a free linear space. 

From the computational point of view, the representation in Definition 22.2.10 is easy to operate on. Each 
element of the free linear space has only a finite set of non-zero components. So it is possible to represent 
an element of the space by a finite set of ordered pairs. Any vector f in the free linear space V on the set S 
over K may be represented by a finite set of tuples: 


{(x, f(x)) ∈ f; f(x) ≠ 0}.


Such a representation is easy to store and manipulate if the elements of S and K are easy to store and 
manipulate. This is close to the way symbolic algebra is implemented in computer software. Figure 22.2.2
illustrates an element χ_{{u}} + 2χ_{{v}} + (−3/2)χ_{{w}} with u = (3, 3), v = (1, −4) and w = (−4, −1) in the free
linear space on the Cartesian product set S = ℝ^2 over the field K = ℝ.
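
The sparse representation can be sketched in software as a dictionary which maps each point of the support to its non-zero coefficient. This is only an illustration under stated assumptions: the field is taken to be ℚ (exact fractions) rather than ℝ, and the names normalise, indicator, fl_add and fl_scale are hypothetical helpers, not part of any library.

```python
from fractions import Fraction

def normalise(f):
    """Drop zero coefficients so that the keys are exactly supp(f)."""
    return {x: c for x, c in f.items() if c != 0}

def indicator(x):
    """The indicator function of the singleton {x}, i.e. the standard immersion of x."""
    return {x: Fraction(1)}

def fl_add(f, g):
    """Pointwise addition of two finitely supported functions."""
    h = dict(f)
    for x, c in g.items():
        h[x] = h.get(x, Fraction(0)) + c
    return normalise(h)

def fl_scale(lam, f):
    """Pointwise scalar multiplication of a finitely supported function."""
    return normalise({x: lam * c for x, c in f.items()})

u, v, w = (3, 3), (1, -4), (-4, -1)
element = fl_add(fl_add(indicator(u), fl_scale(Fraction(2), indicator(v))),
                 fl_scale(Fraction(-3, 2), indicator(w)))
print(element)
```

The dictionary element stores only three ordered pairs, although the domain S is infinite. Adding element to its own negative gives the empty dictionary, which represents the zero vector.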


Figure 22.2.2 χ_{{(3,3)}} + 2χ_{{(1,−4)}} + (−3/2)χ_{{(−4,−1)}} in the free linear space on ℝ^2 over ℝ


22.2.22 REMARK: Free linear spaces can give concrete set-representations for symbolic algebras.

Free spaces of any kind may be thought of as a kind of “symbolic algebra” such as is offered by some 
computer software. Each element of a “free space” of any class represents a grouping of symbols rather 
than a mathematical object. The intended mathematical object is obtained from the “symbolic algebra” 
by imposing an equivalence relation and adding algebraic operations. In the example of tensors, a symbolic 
expression such as “λ_1(v_1 ⊗ w_1) + λ_2(v_2 ⊗ w_2)” may be represented as a sum of indicator functions such as
λ_1 χ_{{(v_1,w_1)}} + λ_2 χ_{{(v_2,w_2)}}.

Whenever mathematicians or physicists write down juxtapositions of symbols without working out what 
they represent mathematically, the concept of a “free” space can often save the situation. For instance, 
when tensor products are written, they are usually written as v ⊗ w. To formalise this mathematically, one
may first create the “uncut space” or “free space” of all formal juxtapositions of vectors v and w in some
linear space, and then reduce the size of this free space by introducing equivalence relations. This approach
is also used for “free groups”, “free R-modules”, and so forth.
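
The distinction between the free space of formal juxtapositions and the intended quotient can be seen in a small sketch. The assumptions here are that vectors are pairs of integers, that coefficients lie in ℚ, and that a formal sum is stored as a dictionary from pairs (v, w) to non-zero coefficients. The name formal_sum is hypothetical.

```python
from fractions import Fraction

def formal_sum(*terms):
    """Build the formal sum of coeff * chi_{(v, w)} over the given
    (coeff, v, w) triples, keeping only non-zero coefficients."""
    result = {}
    for coeff, v, w in terms:
        key = (v, w)
        result[key] = result.get(key, Fraction(0)) + Fraction(coeff)
    return {k: c for k, c in result.items() if c != 0}

v, w = (1, 0), (0, 1)
two_v = (2, 0)

lhs = formal_sum((1, two_v, w))   # the symbol (2v) ⊗ w
rhs = formal_sum((2, v, w))       # the symbol 2(v ⊗ w)
print(lhs == rhs)
```

In the free space, the two symbols are different elements because they are indicator functions of different pairs. They become equal only after the equivalence relation for tensor products has been imposed, which this sketch deliberately does not do.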


An example of interest to differential geometry is a free R-module of singular q-chains over a commutative 
unitary ring R such as R = Z. In this case, such a free R-module is referred to as the space of “formal 
linear combinations” of elements of some space. The base space for the free space of q-chains is the space of 
q-simplices. (See Greenberg/Harper [86], page 44.) 


22.2.23 REMARK:  Sparse arrays and sparse representations of algebraic systems. 

Sparse arrays may be represented by lists in a similar way to free linear spaces. If an array has very few 
non-zero entries, it is more efficient in computer software to represent arrays as lists of the non-zero terms. 
Thus the representation of algebras (such as tensor products of linear spaces) in terms of free linear spaces 
may be thought of as “sparse representations”.


22.2.24 REMARK: Convenient notation for sets of functions with finite support. 

Sets of the form “{f : S → K; #{x ∈ S; f(x) ≠ 0_K} < ∞}”, where S is a set and K is a field or a linear
space, are tedious to write and tedious to read. Since such sets are frequently required, it is convenient to 
define Notation 22.2.25 for them. This is the same as the set of vectors of the free linear space FL(S, K) in 
Notation 22.2.11, but Fin(S, K) has no linear space operations, and the range K may be either a field or a 
linear space. The notation Fin(S, K) is typically used when S is a subset of a linear space. 


22.2.25 NOTATION: Set of functions with finite support. 
Fin(S, K) denotes the set {f : S → K; #{x ∈ S; f(x) ≠ 0_K} < ∞} for any set S, where K is any field or
linear space. In other words, Fin(S, K) = {f ∈ K^S; #(supp(f)) < ∞}.


22.2.26 REMARK:  Vector-valued free and unrestricted linear spaces. 

As discussed in Remark 22.2.1, unrestricted and free linear spaces may be defined with a linear space U 
instead of a field K as the target space for functions on an arbitrary set S. The vector-valued unrestricted 
and free linear spaces in Definitions 22.2.27 and 22.2.28 are essentially the same as direct products of multiple 
copies of the linear space U, without or with a finite support constraint respectively. 


22.2.27 DEFINITION: The unrestricted (vector-valued) linear space on a set S, with values in a linear
space U < (K, U, σ_K, τ_K, σ_U, μ_U) over a field K < (K, σ_K, τ_K), is the tuple V < (K, V, σ_K, τ_K, σ_V, μ_V),
where V = U^S, σ_V : V × V → V is the pointwise addition operation in V, and μ_V : K × V → V is the
pointwise scalar multiplication operation by elements of K on V. In other words:


(i) ∀f_1, f_2 ∈ U^S, ∀x ∈ S, σ_V(f_1, f_2)(x) = σ_U(f_1(x), f_2(x)). [pointwise addition]
(ii) ∀λ ∈ K, ∀f ∈ U^S, ∀x ∈ S, μ_V(λ, f)(x) = μ_U(λ, f(x)). [pointwise scalar multiplication]


22.2.28 DEFINITION: The free (vector-valued) linear space on a set S to a linear space U over a field
K < (K, σ_K, τ_K) is the tuple V < (K, V, σ_K, τ_K, σ_V, μ_V), where


V = {f : S → U; #{x ∈ S; f(x) ≠ 0_U} < ∞}
  = {f ∈ U^S; #(supp(f)) < ∞},


σ_V : V × V → V is the pointwise addition operation in V, and μ_V : K × V → V is the pointwise scalar
multiplication operation by elements of K on V (as in Definition 22.2.27).


22.2.29 REMARK: No standard immersion for vector-valued free and unrestricted linear spaces.

For Definitions 22.2.27 and 22.2.28, there is no “standard immersion” map for S in V following the pattern
of Definitions 22.2.7 and 22.2.12 because a linear space U does not generally have a multiplicative identity
element 1_U. However, it is possible to define a “standard immersion” of U in V for each x ∈ S as the map
i_x : U → V with i_x : u ↦ (y ↦ δ_{xy} u). In other words, i_x(u)(y) = u if y = x and i_x(u)(y) = 0_U if y ≠ x. The
standard projection map π_x : V → U defined by π_x : (u_y)_{y∈S} ↦ u_x is a left inverse for this immersion map
for each x ∈ S.
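
The immersion and projection maps for vector-valued function spaces can be sketched as follows. The assumptions are that U = ℚ^2 with vectors stored as tuples, that S is a set of strings, and that a finitely supported function is stored as a dictionary from points of S to non-zero vectors. The names immersion, projection and ZERO_U are hypothetical.

```python
from fractions import Fraction

ZERO_U = (Fraction(0), Fraction(0))   # the zero vector of U = Q^2

def immersion(x, u):
    """The function which has the value u at x and the zero vector elsewhere."""
    return {} if u == ZERO_U else {x: u}

def projection(x, f):
    """The value of f at x, which is the zero vector outside the stored support."""
    return f.get(x, ZERO_U)

u = (Fraction(1, 2), Fraction(-3))
f = immersion("a", u)

print(projection("a", f) == u)       # projection at the same point recovers u
print(projection("b", f) == ZERO_U)  # projection at any other point gives zero
```

The first comparison expresses the left-inverse property. The composition of the immersion at x followed by the projection at x is the identity map on U.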


22.3. Linear combinations 


22.3.1 REMARK: Infinite sums of non-zero vectors are undefined. 

Since there is no topology for defining convergence, the sum of any infinite set of vectors is undefined in pure 
algebra, but the sum of any finite set of vectors is well defined by induction on the number of elements. An 
infinite set of vectors may be summed if all but a finite number of the vectors are equal to the zero vector. 
Thus it is assumed that the sum of any number, finite or infinite, of zeros is equal to zero. 


22.3.2 DEFINITION: A linear combination of (elements of) a subset S of a linear space V over a field K is 
any element v € V which satisfies 


∃n ∈ ℤ_0^+, ∃w ∈ S^n, ∃λ ∈ K^n, ∑_{i=1}^n λ_i w_i = v. (22.3.1)


22.3.3 REMARK: Comments on the definition of a linear combination. 

The sum ∑_{i=1}^n λ_i w_i in Definition 22.3.2 is well defined, as discussed in Remarks 17.1.16 and 22.1.7. The set
S may be infinite, but the sum is always over a finite subset of S. Note also that the vectors w_i for i ∈ ℕ_n
are not necessarily distinct. The index set does not need to be the set of integers ℕ_n. Any finite index set
is acceptable, as discussed in Remark 22.1.7. Thus condition (22.3.1) may be replaced by (22.3.2).


∃A, #(A) < ∞ and ∃(w_α)_{α∈A} ∈ S^A, ∃(λ_α)_{α∈A} ∈ K^A, ∑_{α∈A} λ_α w_α = v. (22.3.2)


Unfortunately, the “set of all finite sets” is not a ZF set. So line (22.3.2) cannot be written with a
set-constrained quantifier such as “∃A ∈ FinSets, ...”. However, lines (22.3.1) and (22.3.2) may be written in
terms of Notation 22.2.25 as follows.


∃λ ∈ Fin(S, K), ∑_{w∈S} λ_w w = v.


Likewise, line (22.3.3) may be rewritten as ∀λ ∈ Fin(S, K), ∑_{w∈S} λ_w w ∈ S. Such expressions using a set
Fin(S, K) of functions with finite support exploit the understanding in Remark 22.3.1 that only the non-zero 
elements of a family need to be summed. 


In the special case n = 0 in Definition 22.3.2, both w and λ are empty lists, and ∑_{i=1}^n λ_i w_i becomes
∑_{i=1}^0 λ_i w_i, which equals 0_V, as noted in Remark 22.1.7. (It is a generally accepted convention that the sum
of zero terms in an additive group is equal to the additive identity of the group.) 
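
The formulation in terms of Fin(S, K) translates directly into a computation where the coefficient family is stored by its finite support. The following sketch assumes V = ℚ^2 with vectors stored as tuples, and a coefficient family stored as a dictionary from vectors of S to scalars. The names linear_combination and ZERO are hypothetical.

```python
from fractions import Fraction

ZERO = (Fraction(0), Fraction(0))   # the zero vector of V = Q^2

def linear_combination(coeffs):
    """The sum of lam_w * w over the vectors w in the support of the
    coefficient family `coeffs`, which maps vectors w to scalars lam_w."""
    total = ZERO
    for w, lam in coeffs.items():
        total = tuple(t + lam * a for t, a in zip(total, w))
    return total

lam = {(1, 0): Fraction(3), (1, 1): Fraction(-1, 2)}
v = linear_combination(lam)
print(v)   # 3*(1,0) - (1/2)*(1,1)

empty = linear_combination({})   # the case n = 0
print(empty == ZERO)
```

A dictionary cannot list the same vector twice, whereas the sequence (w_i) in line (22.3.1) may contain repeated vectors. Repetitions in the sequence form correspond to adding the coefficients of equal vectors in the dictionary form.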


22.3.4 DEFINITION: A subset S of a linear space V over a field K is closed under linear combinations if 


∀n ∈ ℤ_0^+, ∀w ∈ S^n, ∀λ ∈ K^n, ∑_{i=1}^n λ_i w_i ∈ S. (22.3.3)


In other words, ∀λ ∈ Fin(S, K), ∑_{w∈S} λ_w w ∈ S.


22.3.5 THEOREM: Linear-combination closure is the same as addition-and-scalar-product closure. 
A subset S of a linear space V is closed under linear combinations if and only if it is non-empty and closed 
under vector addition and scalar multiplication.


PROOF: Let V be a linear space over a field K. Let S be a subset of V. Suppose first that S is closed under
linear combinations. Let v_1, v_2 ∈ S. Then v_1 = ∑_{i=1}^{n_1} λ^1_i w^1_i and v_2 = ∑_{j=1}^{n_2} λ^2_j w^2_j for some
n_1, n_2 ∈ ℤ_0^+, (w^1_i)_{i=1}^{n_1} ∈ S^{n_1}, (w^2_j)_{j=1}^{n_2} ∈ S^{n_2}, (λ^1_i)_{i=1}^{n_1} ∈ K^{n_1}, (λ^2_j)_{j=1}^{n_2} ∈ K^{n_2}.
Then v_1 + v_2 = ∑_{k=1}^m μ_k u_k, where (u_k)_{k=1}^m ∈ S^m and (μ_k)_{k=1}^m ∈ K^m are defined by m = n_1 + n_2,
u_k = w^1_k for k = 1, ..., n_1, u_{n_1+k} = w^2_k for k = 1, ..., n_2, μ_k = λ^1_k for k = 1, ..., n_1 and
μ_{n_1+k} = λ^2_k for k = 1, ..., n_2. So v_1 + v_2 ∈ S. Thus S is closed under addition.

Suppose again that S is closed under linear combinations. Let v ∈ S and μ ∈ K. Then v = ∑_{i=1}^n λ_i w_i for
some n ∈ ℤ_0^+, (w_i)_{i=1}^n ∈ S^n and (λ_i)_{i=1}^n ∈ K^n. Then μv = ∑_{i=1}^n μ_i w_i, where (μ_i)_{i=1}^n ∈ K^n is
defined by μ_i = μλ_i for i ∈ ℕ_n. So μv ∈ S. Thus S is closed under scalar multiplication by elements in K.

To show that S ≠ ∅, let n = 0. Then (w_i)_{i=1}^0 = ∅ ∈ S^0 and (λ_i)_{i=1}^0 = ∅ ∈ K^0. So 0_V = ∑_{i=1}^0 λ_i w_i ∈ S.

Now suppose that S is non-empty and closed under vector addition and scalar multiplication. Let v be a
linear combination of elements of S. Then v = ∑_{i=1}^n λ_i w_i for some n ∈ ℤ_0^+, (w_i)_{i=1}^n ∈ S^n and
(λ_i)_{i=1}^n ∈ K^n. Such v is defined by induction on n ∈ ℤ_0^+ as in Remark 22.1.7. That is,
∑_{i=1}^m λ_i w_i = (∑_{i=1}^{m−1} λ_i w_i) + λ_m w_m for m ≥ 1 and ∑_{i=1}^0 λ_i w_i = 0_V. Since S ≠ ∅, there is at
least one vector u ∈ S. Then 0_V = 0_K u ∈ S because S is closed under scalar multiplication. So
∑_{i=1}^m λ_i w_i ∈ S for m = 0. The validity for general m ≥ 1 follows from the validity for m − 1 by the
closure of S under vector addition and scalar multiplication. So by induction, ∑_{i=1}^m λ_i w_i ∈ S for all
m ∈ ℕ_n. So v ∈ S. Hence S is closed under linear combinations.


22.3.6 THEOREM: The linear combination closure of a set is closed under linear combinations. 
Let S̄ be the set of all linear combinations of a subset S of a linear space V. Then S̄ is closed under linear
combinations.


PROOF: The assertion follows from an explicit expression for a linear combination of a finite set of linear
combinations. Let u = ∑_{i=1}^n λ_i v_i be a linear combination of vectors v_i ∈ S̄ for i ∈ ℕ_n. Then
v_i = ∑_{j=1}^{m_i} μ_{i,j} w_{i,j} for some sequences (μ_{i,j})_{j=1}^{m_i} in K^{m_i} and (w_{i,j})_{j=1}^{m_i} in S^{m_i}
for m_i ∈ ℤ_0^+ for all i ∈ ℕ_n. By substitution, it follows that

u = ∑_{i=1}^n λ_i ∑_{j=1}^{m_i} μ_{i,j} w_{i,j} = ∑_{i=1}^n ∑_{j=1}^{m_i} λ_i μ_{i,j} w_{i,j} = ∑_{ℓ=1}^M λ_{i_ℓ} μ_{i_ℓ,j_ℓ} w_{i_ℓ,j_ℓ},

where M = ∑_{i=1}^n m_i and i_ℓ = min{i′ ∈ ℤ^+; ∑_{i=1}^{i′} m_i ≥ ℓ} and j_ℓ = ℓ − ∑_{i=1}^{i_ℓ−1} m_i for all ℓ ∈ ℕ_M.
Thus u is a linear combination of the vectors (w_{i_ℓ,j_ℓ})_{ℓ=1}^M, with scalar multipliers
(λ_{i_ℓ} μ_{i_ℓ,j_ℓ})_{ℓ=1}^M.
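
The re-indexing of a double sum as a single sum may be checked mechanically. The following sketch computes the pairs (i_ℓ, j_ℓ) from the block sizes m_1, ..., m_n by the two formulas in the proof and compares them with the obvious nested enumeration. The function name flatten_indices is hypothetical.

```python
def flatten_indices(m):
    """For block sizes m = [m_1, ..., m_n], return the list of pairs
    (i_l, j_l) for l = 1, ..., M, where M = m_1 + ... + m_n."""
    M = sum(m)
    pairs = []
    for l in range(1, M + 1):
        # i_l is the least i' >= 1 such that m_1 + ... + m_i' >= l.
        i = 1
        while sum(m[:i]) < l:
            i += 1
        # j_l is l minus the total size of the blocks before block i_l.
        j = l - sum(m[:i - 1])
        pairs.append((i, j))
    return pairs

m = [2, 0, 3]
pairs = flatten_indices(m)
nested = [(i, j) for i in range(1, len(m) + 1) for j in range(1, m[i - 1] + 1)]
print(pairs)
print(pairs == nested)
```

The example includes a block of size zero to show that an empty inner sum contributes no terms to the single sum.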


22.3.7 REMARK: The set of linear combinations of a set of vectors is closed under linear combinations. 
Theorem 22.3.6 effectively says that a linear combination of a linear combination is a linear combination. So
by Theorem 22.3.5, the set of linear combinations of any subset S of a linear space V is a non-empty subset 
of V which is closed under vector addition and scalar multiplication. The set of linear combinations of a set 
S contains every element of S and also every element of V which can be constructed from elements of S by 
the recursive application of any number of vector addition and scalar multiplication operations. 


22.3.8 THEOREM:  Linear-combination-closed sets are the same as linear subspaces. 
A subset S of a linear space V is non-empty and closed under linear combinations if and only if S is a linear 
subspace of V. 


PROOF: This follows from Definition 22.1.10 and Theorem 22.3.5.


22.3.9 REMARK: A subset of a linear space is assumed to have an induced linear space structure. 

The second mention of the subset S in Theorem 22.3.8 is implicitly an abbreviation for the induced tuple
S < (K, S, σ_K, τ_K, σ_S, μ_S), where K < (K, σ_K, τ_K) is the field of V, σ_S = σ_V|_{S×S} and μ_S = μ_V|_{K×S}. Then
if the subset is closed under linear combinations, it satisfies Definition 22.1.10.


22.3.10 REMARK: The implicit summation notation for linear combinations. 

Einstein is widely blamed for the convention of omitting the summation symbol for paired indices of n-tuples 
such as occur in Definition 22.3.2 line (22.3.1). During the 19th century, the original developers of tensor 
calculus adopted the convention that the components of covariant vectors and tensors should use subscripts, 
while the components of contravariant vectors and tensors should use superscripts. Since the operation of 
summing over paired subscripts and superscripts was so common, and the index set ℕ_n was typically fixed
within a given context, the summation symbol could be omitted without introducing ambiguity. 


Despite the convenience of this implicit summation convention, it has numerous difficulties when extended 
too far. Sometimes the summation is not intended at all, and sometimes the index set ℕ_n is variable, which
often occurs in pseudo-Riemannian manifold contexts where n may be 3 or 4 depending on whether the time
parameter is included. Another issue is that sometimes the implicit sum is over a sub-expression of the entire 
expression. In other words, the position of the sum symbol in a long expression may be ambiguous. For 
further arguments against the “Einstein summation convention”, see also Spivak [37], Volume 1, page 39. 


22.4. Linear span of a set of vectors 


22.4.1 REMARK: Terminology and notation for the span of a set or family of vectors. 

As a very minor abuse of terminology and notation, the term “linear span” and the notation “span” are 
applied to both subsets of a linear space and families of elements of a linear space. In the literature, sets 
and families are often used interchangeably, adding and dropping the family-subscripts as convenient. Thus 
in Notation 22.4.5, the set-theoretic function “span” has technically different meanings in its two instances 
in the equality “span(v) = span(Range(v))”. 


22.4.2 DEFINITION: The (linear) span of a subset S of a linear space V is the set of all linear combinations 
of elements of S in V. 


22.4.3 DEFINITION: The (linear) span of a family v = (v_i)_{i∈I} of vectors in a linear space V is the set of
all linear combinations of vectors v_i such that i ∈ I.


22.4.4 NOTATION: span(S) denotes the linear span of a subset S of a linear space V. 


22.4.5 NOTATION: span(v) denotes the linear span of a family v = (v_i)_{i∈I} of vectors in a linear space V.
In other words, span(v) = span(Range(v)) = span({v_i; i ∈ I}).


22.4.6 REMARK: Trivial properties of linear spans. 
Although the properties asserted in Theorem 22.4.7 are almost obvious, it is prudent to provide proofs to 
ensure that there are no borderline cases which contradict the obvious general rules. 


22.4.7 THEOREM: Basic properties of the linear span of a subset of a linear space. 
Let V be a linear space over a field K. 
(i) span(S) is a linear subspace of V for any subset S of V.
(ii) span(∅) = {0_V}.
(iii) S ⊆ span(S) for every subset S of V.
(iv) span(W) = W for every linear subspace W of V.
(v) For any subset S of V, span(S) = S if and only if S is a linear subspace of V.
(vi) Let S_1 and S_2 be subsets of V. Then S_1 ⊆ S_2 ⇒ span(S_1) ⊆ span(S_2).
(vii) 0_V ∈ span(S) for any subset S of V.
(viii) For any v ∈ V and any subset S of V, v ∈ span(S) ⇒ span(S ∪ {v}) = span(S).
(ix) For any subsets S_1 and S_2 of V, S_1 ⊆ span(S_2) ⇒ span(S_1) ⊆ span(S_2).
(x) S ⊆ W ⇒ span(S) ⊆ W for every linear subspace W of V and every subset S of V.
(xi) For any subset S of V, span(span(S)) = span(S).
(xii) For any subset S of V, for any μ ∈ K \ {0_K}, span({μv; v ∈ S}) = span(S).


PROOF: Part (i) follows from Definitions 22.4.2 and 22.1.10, and Theorem 22.3.6.

For part (ii), a vector $v \in V$ is a linear combination of elements in $\emptyset$ if and only if $v = \sum_{i=1}^n u_i$ for some
$n \in \mathbb{Z}_0^+$ and $u_i \in \emptyset$ for all $i \in \mathbb{N}_n$. The only possibility for n is zero, and so $v = \sum_{i=1}^0 u_i = 0_V$. Hence
$\mathrm{span}(\emptyset) = \{0_V\}$.

For part (iii), let S be a subset of V. Let $v \in S$. Then $\sum_{i=1}^n u_i$ is a linear combination of elements of S,
where n = 1 and $u_1 = v$. So $v \in \mathrm{span}(S)$. Hence $S \subseteq \mathrm{span}(S)$.

For part (iv), let W be a linear subspace of V. Then $W \subseteq \mathrm{span}(W)$ by part (iii). Let $v \in \mathrm{span}(W)$.
Then $v = \sum_{i=1}^n w_i$ for some family $(w_i)_{i=1}^n$ of elements of W. So $v \in W$ by Theorem 22.3.8. Therefore
$\mathrm{span}(W) \subseteq W$. Hence span(W) = W.

For part (v), let S be a subset of V. If S is a linear subspace of V, then span(S) = S by part (iv).
Now suppose that span(S) = S. Let v be a linear combination of elements of S. Then $v \in \mathrm{span}(S)$ by
Definition 22.4.2. So $v \in S$. Thus S is closed under linear combinations. Therefore S is a linear subspace
of V by Theorem 22.3.8. Hence span(S) = S if and only if S is a linear subspace of V.

For part (vi), let $S_1$ and $S_2$ be subsets of V such that $S_1 \subseteq S_2$. Let $v \in \mathrm{span}(S_1)$. Then by Definition 22.3.2,
$v = \sum_{i=1}^n \lambda_i w_i$ for some $n \in \mathbb{Z}_0^+$, $w \in S_1^n$ and $\lambda \in K^n$. But $S_1^n \subseteq S_2^n$. So $v = \sum_{i=1}^n \lambda_i w_i$ for some $n \in \mathbb{Z}_0^+$,
$w \in S_2^n$ and $\lambda \in K^n$. Therefore $v \in \mathrm{span}(S_2)$ by Definition 22.3.2. Hence $\mathrm{span}(S_1) \subseteq \mathrm{span}(S_2)$.

For part (vii), let S be a subset of V. Then $0_V = \sum_{i=1}^0 \lambda_i w_i$, where $(\lambda_i)_{i=1}^0 = () = \emptyset$ is the empty (zero-
length) tuple of elements of K, and $(w_i)_{i=1}^0 = () = \emptyset$ is the empty (zero-length) tuple of elements of V.
Hence $0_V \in \mathrm{span}(S)$. (See Remarks 22.1.7 and 22.3.3 for the convention that $\sum_{i=1}^0 \lambda_i w_i = 0_V$.)

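For part (viii), let S be a subset of V. Let $v \in \mathrm{span}(S)$. Then $\mathrm{span}(S) \subseteq \mathrm{span}(S \cup \{v\})$ by part (vi).
Let $u \in \mathrm{span}(S \cup \{v\})$. Then by Definition 22.3.2, $u = \sum_{i=1}^n \lambda_i w_i$ for some $n \in \mathbb{Z}_0^+$, $w \in (S \cup \{v\})^n$
and $\lambda \in K^n$. If $w_i \ne v$ for all $i \in \mathbb{N}_n$, then $w \in S^n$, and so $u \in \mathrm{span}(S)$. So suppose that $w_j = v$
for some $j \in \mathbb{N}_n$. Since $v = \sum_{i=1}^{n'} \mu_i w'_i$ for some $n' \in \mathbb{Z}_0^+$, $w' \in S^{n'}$ and $\mu \in K^{n'}$, it follows that

$u = \sum_{i=1}^{j-1} \lambda_i w_i + \sum_{i=j+1}^{n} \lambda_i w_i + \sum_{i=1}^{n'} (\lambda_j \mu_i) w'_i$,

which is a linear combination of elements of S. (If v occurs at several indices j, the same substitution is
applied at each of them.) So $u \in \mathrm{span}(S)$.
Therefore $\mathrm{span}(S \cup \{v\}) \subseteq \mathrm{span}(S)$. Hence $\mathrm{span}(S \cup \{v\}) = \mathrm{span}(S)$.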

For part (ix), let $S_1$ and $S_2$ be subsets of V, and suppose that $S_1 \subseteq \mathrm{span}(S_2)$. Let $v \in \mathrm{span}(S_1)$. Then v is
a linear combination of elements of $S_1$. So v is a linear combination of elements of $\mathrm{span}(S_2)$. But $\mathrm{span}(S_2)$
is a linear subspace of V by part (i). So $v \in \mathrm{span}(S_2)$ by Theorem 22.3.8. Hence $\mathrm{span}(S_1) \subseteq \mathrm{span}(S_2)$.

For part (x), let W be a linear subspace of V and let S be a subset of V. Suppose that $S \subseteq W$. Then
$\mathrm{span}(S) \subseteq \mathrm{span}(W)$ by part (ix). But span(W) = W by part (iv). So $\mathrm{span}(S) \subseteq W$.

For part (xi), let S be a subset of V. Then span(S) is a linear subspace of V by part (i). So span(span(S)) =
span(S) by part (iv).

For part (xii), let S be a subset of V, and let $\mu \in K \setminus \{0_K\}$. Suppose that $u \in \mathrm{span}(S)$. Then by
Definition 22.3.2, $u = \sum_{i=1}^n \lambda_i w_i$ for some $n \in \mathbb{Z}_0^+$, $w \in S^n$ and $\lambda \in K^n$. So $u = \sum_{i=1}^n \lambda'_i (\mu w_i)$
with $\lambda'_i = \mu^{-1} \lambda_i$ for all $i \in \mathbb{N}_n$. So $u \in \mathrm{span}(\{\mu v;\ v \in S\})$. So $\mathrm{span}(\{\mu v;\ v \in S\}) \supseteq \mathrm{span}(S)$. Similarly
$\mathrm{span}(\{\mu v;\ v \in S\}) \subseteq \mathrm{span}(S)$. Hence $\mathrm{span}(\{\mu v;\ v \in S\}) = \mathrm{span}(S)$.
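Several parts of Theorem 22.4.7 can be checked by brute force in a small example. The following Python sketch is only an illustration and is not part of the formal development. It assumes the finite field K = GF(3) and the linear space V = K^2 (both hypothetical choices), so that a span can be computed by enumerating every coefficient tuple. The names `span`, `P` and `N` are illustrative.

```python
from itertools import product

P = 3  # assumed for illustration: the prime field K = GF(3)
N = 2  # assumed for illustration: the linear space V = K^2

def span(vectors, p=P, n=N):
    """Return the set of all linear combinations of `vectors` in GF(p)^n."""
    vectors = list(vectors)
    result = set()
    # Enumerate every coefficient tuple; the empty tuple gives the empty sum 0_V.
    for coeffs in product(range(p), repeat=len(vectors)):
        combo = tuple(
            sum(c * v[k] for c, v in zip(coeffs, vectors)) % p for k in range(n)
        )
        result.add(combo)
    return result

zero = (0,) * N
S = {(1, 2)}

assert span(set()) == {zero}                 # part (ii)
assert S <= span(S)                          # part (iii)
assert zero in span(S)                       # part (vii)
assert span(S | {(2, 1)}) == span(S)         # part (viii): (2, 1) = 2 * (1, 2)
assert span(span(S)) == span(S)              # part (xi)
scaled = {tuple(2 * x % P for x in v) for v in S}
assert span(scaled) == span(S)               # part (xii) with mu = 2
```

A finite field is used so that the enumeration terminates. The checks confirm the stated properties only for this one example and are no substitute for the proofs above.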


22.4.8 THEOREM: The linear span of a subset is a linear subspace which covers it.
The linear span of a subset S of a linear space V is a linear subspace of V which includes S.


PROOF: Let S be a subset of a linear space V over a field K. Let U be the set of all linear combinations
of elements of S. For every vector $v \in S$, the linear combination $\sum_{i=1}^n k_i w_i$ equals v if n = 1, $k_1 = 1_K$
and $w_1 = v$. So $S \subseteq U$. The fact that U is a subspace of V follows directly from Definition 22.4.2,
Theorem 22.3.6 and Theorem 22.3.8.


22.4.9 REMARK: The free linear space on a given set equals the span of that set.
The set of vectors in the free linear space FL(S, K) in Section 22.2 may be expressed in terms of linear spans
as follows.

$FL(S, K) = \{f \in K^S;\ \#(\mathrm{supp}(f)) < \infty\}$
$\phantom{FL(S, K)} = \mathrm{span}(\{\chi_{\{a\}};\ a \in S\})$
$\phantom{FL(S, K)} = \mathrm{span}(\mathrm{Range}(i))$,

where $i : S \to FL(S, K)$ is the standard immersion function for FL(S, K). (See Definition 22.2.12.) The
vectors $\chi_{\{a\}}$ for $a \in S$ are “unit vectors” in a free linear space.


22.4.10 REMARK: The span of a set equals the smallest linear subspace containing the set. 

The above approach to defining linear spans is explicitly constructive, building up the span of a set from linear 
combinations. Linear spans can also be defined more abstractly by the “lim-sup” method as the intersection 
of all linear subspaces which contain a given set. This is often preferred for its apparent simplicity. The 
equivalence of the two definitions is stated as Theorem 22.4.12. 


22.4.11 THEOREM: Every intersection of subspaces is a subspace.
The intersection of a non-empty set of subspaces of a given linear space is a subspace.


PROOF: Let V be a linear space with specification tuple $(K, V, \sigma_K, \tau_K, \sigma_V, \mu_V)$, and let C be a non-empty
set of subspaces of V. Define the tuple $(K, W, \sigma_K, \tau_K, \sigma_W, \mu_W)$ by $W = \bigcap C$, $\sigma_W = \sigma_V|_{W \times W}$ and
$\mu_W = \mu_V|_{K \times W}$. Then $\mathrm{Range}(\sigma_W) \subseteq W$ and $\mathrm{Range}(\mu_W) \subseteq W$. So W is a linear space. Therefore it is a
subspace of V.


22.4.12 THEOREM: The linear span of a subset equals the intersection of subspaces which cover it. 
The linear span of a subset S of a linear space V is the intersection of all subspaces of V which include S. 
In other words, 

$\mathrm{span}(S) = \bigcap \{U \in \mathbb{P}(V);\ U \text{ is a subspace of } V \text{ and } S \subseteq U\}$.


PROOF: Let S be a subset of a linear space V. Let W be the intersection of the subspaces of V which
include S. By Theorem 22.4.8, span(S) is a linear subspace of V, and $S \subseteq \mathrm{span}(S)$. Therefore $\mathrm{span}(S) \supseteq W$
by Theorem 8.4.8 (xv). But W is a subspace of V by Theorem 22.4.11, and $S \subseteq W$ by Theorem 8.4.8 (xviii).
So all linear combinations of elements of S are also elements of W by Theorem 22.3.8. So $\mathrm{span}(S) \subseteq W$ by
Definition 7.3.2 and Notation 7.3.3. Hence span(S) = W.
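The equality in Theorem 22.4.12 can be observed directly in a space small enough to list all of its subspaces. The sketch below is an illustration only. It assumes V = GF(2)^3 (a hypothetical choice with 8 vectors and 256 subsets), and the helper names are invented for the example.

```python
from itertools import combinations, product

P, N = 2, 3  # assumed for illustration: V = GF(2)^3, which has 8 vectors
V = list(product(range(P), repeat=N))

def add(u, v):
    return tuple((a + b) % P for a, b in zip(u, v))

def scale(c, v):
    return tuple((c * a) % P for a in v)

def is_subspace(U):
    """Non-empty and closed under addition and scalar multiplication."""
    return (
        len(U) > 0
        and all(add(u, v) in U for u in U for v in U)
        and all(scale(c, u) in U for c in range(P) for u in U)
    )

def span(S):
    """Constructive span: all linear combinations of elements of S."""
    S = list(S)
    out = set()
    for coeffs in product(range(P), repeat=len(S)):
        v = (0,) * N
        for c, s in zip(coeffs, S):
            v = add(v, scale(c, s))
        out.add(v)
    return out

# All 2^8 subsets of V, filtered down to the linear subspaces.
subspaces = [
    set(U)
    for r in range(len(V) + 1)
    for U in combinations(V, r)
    if is_subspace(set(U))
]

def span_by_intersection(S):
    """Intersection of all subspaces of V which include S."""
    covering = [U for U in subspaces if set(S) <= U]
    return set.intersection(*covering)

S = {(1, 0, 0), (1, 1, 0)}
assert span(S) == span_by_intersection(S)
assert span(set()) == span_by_intersection(set()) == {(0, 0, 0)}
```

The list `covering` is never empty because V itself is a subspace which includes S, which mirrors the role of the non-emptiness condition in Theorem 22.4.11.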


22.4.13 REMARK:  Meanings of the verb "to span". 

It seems reasonable to say that “a set $S_1$ spans a set $S_2$” if every element of $S_2$ is a linear combination of
elements of $S_1$. In other words, $S_2 \subseteq \mathrm{span}(S_1)$, which means that the span of $S_1$ covers $S_2$ in some sense.
But in practice, the phrase “$S_1$ spans $S_2$” usually means that $S_2$ is exactly equal to the span of $S_1$. In other
words, $S_2 = \mathrm{span}(S_1)$.

Definition 22.4.14 interprets the verb “to span” in the strict sense of equality because most authors define
it that way. Therefore any set U which is spanned by a subset S of a linear space V can be assumed
a-priori to be a linear subspace of V. It can also be assumed a-priori that $S \subseteq U$. (This follows from
Theorem 22.4.7 (iii).)


22.4.14 DEFINITION: A linear subspace U of a linear space V is said to be spanned by a subset S of U if 
U = span(S). Then the set S is said to span the subset U, and the set S is said to be a spanning set for U. 


22.5. Linear space dimension 


22.5.1 REMARK: The dimension of a general linear space uses “beta-cardinality”.
The dimension of general linear spaces is defined in terms of the “beta-cardinality” $\beta(X)$ of sets X. (See
Definition 13.4.9 and Notation 13.4.10 for beta-cardinality.) The standard cardinality of uncountable sets
is ill-defined in general. (See Remark 13.4.11 for some discussion of the difference between beta-cardinality
and the customary style of ordinal-number-based cardinality found in many books.)


If a linear space has a countable spanning set, its dimension is the same as the standard minimum cardinality
of the spanning set, which is either a non-negative integer in $\mathbb{Z}_0^+$ (or finite ordinal number in $\omega$) in the case
of a finite set, or the set $\omega$ in the case of a countably infinite set. Therefore in the case of linear spaces with
countable dimension, the standard cardinality symbol “#” may be used instead of “$\beta$”.


If the “smallest” spanning set is equinumerous to $\mathbb{R}$, then the dimension is $\beta(\mathbb{R}) = 2^\omega$. To be more precise,
if a linear space has no countable spanning sets, but it does have a spanning set which is equinumerous to
a subset of $\mathbb{R}$, then the dimension of the space according to Definition 22.5.2 is $2^\omega$. (This is essentially the
same as $\mathbb{P}(\omega)$.)


Similarly, if a linear space has no spanning set which is equinumerous to a subset of $\mathbb{R}$, but it does have a
spanning set which is equinumerous to a subset of $\mathbb{P}(\mathbb{R})$, then the dimension of the space is $2^{2^\omega}$ (which is
essentially the same as $\mathbb{P}(\mathbb{P}(\omega))$).

Some indications of why beta-cardinality is preferred here to the more customary ordinal-number-based
cardinality are given in Section 13.4 and elsewhere. (In particular, note that there is no explicit ordinal
number which is equinumerous to $\mathbb{R}$ because that would imply that $\mathbb{R}$ can be explicitly well-ordered, which
is not possible without assuming the axiom of choice.)


22.5.2 DEFINITION: The dimension of a linear space is the least beta-cardinality of its spanning subsets. 


22.5.3 NOTATION: dim(V) denotes the dimension of a linear space V. In other words,

$\dim(V) = \min\{\beta(S);\ S \in \mathbb{P}(V) \text{ and } \mathrm{span}(S) = V\}$.


22.5.4 REMARK: The dimension of some trivial spaces.
The trivial linear space $V = \{0_V\}$ is spanned by the empty set. So $\dim(\{0_V\}) = 0$. A field regarded as a
linear space, as in Definition 22.1.9, has dimension 1.


22.5.5 REMARK: The beta-cardinality is always well defined. 

The set of spanning sets $\{S \in \mathbb{P}(V);\ \mathrm{span}(S) = V\}$ in Definition 22.5.2 is always a well-defined ZF set
because it is specified as a subset of $\mathbb{P}(V)$, which is always a well-defined set if V is. This set is always
non-empty because it always contains the set S = V. The set of beta-cardinalities of sets $S \in \mathbb{P}(V)$ such
that span(S) = V is well ordered. (The finite beta-cardinalities are non-negative integers, and the infinite
beta-cardinalities are defined by transfinite recursion with a parameter which is an ordinal number, and the
ordinal numbers are well ordered.) Therefore the set $\{\beta(S);\ S \in \mathbb{P}(V) \text{ and } \mathrm{span}(S) = V\}$ always has a
unique minimum.


22.5.6 REMARK: Upper dimension and lower dimension of a linear space. 

The dimensionality in Definition 22.5.2 is a kind of “upper dimension" by analogy with the “outer measure” 
of a set. (The outer measure is the infimum of the sums of measures of sets of simple sets which cover a given 
set. See Chapter 45 for Lebesgue measure.) One may also define a “lower dimension” of a linear space by 
analogy with the “inner measure” of a set. (The inner measure is the supremum of the sums of measures of 
sets of simple sets which are covered by a given set.) The lower dimension of a linear space may be defined 
as the supremum of the beta-cardinality of linearly independent subsets of the space. (See Section 22.6 for 
linearly independent sets of vectors.) The dimensionality of a linear space is defined as this kind of lower 
dimension by Mirsky [117], page 50. 


It is fairly clear that if a linear space contains at least n linearly independent vectors, for some kind of 
cardinality n, then the dimension of the space must be at least n. If it is possible to span the space with 
n vectors, then the dimension must be at most n. In the case of measure theory, the measure of a set is 
considered to have a definite value when the inner and outer measures are equal. In the case of integration, 
the equality of upper and lower integrals is considered to give the integral a definite value. The same kind 
of thinking is applicable to the dimension of a linear space. 

The upper dimension in Definition 22.5.2 has the form $\dim^+(V) = \inf\{\beta(B);\ B \subseteq V \text{ and } \mathrm{span}(B) = V\}$. The
lower dimension would have the form $\dim^-(V) = \sup\{\beta(B);\ B \subseteq V \text{ and } B \text{ is linearly independent}\}$. The
dimension of a linear space is then considered to have a definite value when the upper and lower dimensions
are equal. In the case of finite-dimensional linear spaces, this is in fact always true.
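The agreement of the upper and lower dimensions can be observed by exhaustive search in a very small space. The following sketch is an illustration under stated assumptions: the ambient space is GF(2)^3 (a hypothetical choice), cardinality is ordinary finite cardinality, and linear independence is tested as in Definition 22.6.3. The function names are invented for the example.

```python
from itertools import combinations, product

P, N = 2, 3  # assumed for illustration: ambient space GF(2)^3

def span(S):
    """All linear combinations of the vectors in S."""
    S = list(S)
    return {
        tuple(sum(c * s[k] for c, s in zip(coeffs, S)) % P for k in range(N))
        for coeffs in product(range(P), repeat=len(S))
    }

def is_independent(S):
    """Definition 22.6.3: no v in S lies in the span of S minus {v}."""
    S = set(S)
    return all(v not in span(S - {v}) for v in S)

def upper_dim(W):
    """Least cardinality of a subset B of W with span(B) = W."""
    W = set(W)
    return min(r for r in range(len(W) + 1)
               for B in combinations(sorted(W), r) if span(B) == W)

def lower_dim(W):
    """Greatest cardinality of a linearly independent subset B of W."""
    W = set(W)
    return max(r for r in range(len(W) + 1)
               for B in combinations(sorted(W), r) if is_independent(B))

trivial = span([])  # the trivial space {0_V}, spanned by the empty set
line = span([(1, 1, 0)])
plane = span([(1, 0, 0), (0, 1, 1)])
whole = span([(1, 0, 0), (0, 1, 0), (0, 0, 1)])

for W, expected in [(trivial, 0), (line, 1), (plane, 2), (whole, 3)]:
    assert upper_dim(W) == lower_dim(W) == expected
```

The search is exponential in the number of vectors, so it is usable only for toy examples of this size.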


22.5.7 DEFINITION: 
A finite-dimensional linear space is a linear space which is spanned by a finite set. 


An infinite-dimensional linear space is a linear space which is not finite-dimensional. 


22.5.8 REMARK: The dimension of a finite-dimensional space is finite. 

Unsurprisingly, the combination of Definitions 13.5.2, 22.5.2 and 22.5.7 implies that the dimension of a finite-dimensional
linear space is a non-negative integer. In other words, $\dim(V) \in \mathbb{Z}_0^+$ for any finite-dimensional
linear space V. This means that $\exists n \in \mathbb{Z}_0^+,\ \exists S \in \mathbb{P}(V),\ (\#(S) = n \text{ and } \mathrm{span}(S) = V)$.


22.5.9 REMARK: Linear subspaces of finite-dimensional linear spaces are finite-dimensional. 

It seems intuitively clear that any linear subspace of a finite-dimensional linear space should itself be finite-dimensional.
But this must be proved. It is not entirely a-priori obvious in general that the dimension of a
linear subspace U must be equal to or lower than the dimension of a linear space V of which it is a subspace.
If V has dimension $\alpha$, this means that the least beta-cardinality of spanning sets for V is $\alpha$. Thus there exists
a spanning set S for V with $\beta(S) = \alpha$ because the beta-cardinals are well ordered. But then a spanning
set $S_U$ must be constructed for U such that $\beta(S_U) \le \alpha$. In general, it is not obvious how to construct
such a spanning set. In the case of finite-dimensional spaces, the induction principle may be applied as in
Theorem 22.5.10 (ii) to explicitly construct a basis for any given linear subspace of a finite-dimensional linear
space. (Questions regarding linear independence and bases are avoided in the proof of Theorems 22.5.10
and 22.5.11 because these topics are first introduced in Sections 22.6 and 22.7 respectively, and linear maps
are not introduced until Section 23.1.)


22.5.10 THEOREM: Projection of linear subspaces of tuple spaces to restricted-tuple spaces.
Let K be a field. For any finite subsets $S' \subseteq S$ of $\mathbb{Z}^+$, define $\pi_{S'} : K^S \to K^{S'}$ by $\pi_{S'} : v \mapsto (v_i)_{i \in S'} = v|_{S'}$.

(i) If S and S' are finite subsets of $\mathbb{Z}^+$ with $S' \subseteq S$, and W is a linear subspace of $K^S$, then $\pi_{S'}(W)$ is a
linear subspace of $K^{S'}$.

(ii) For any finite subset S of $\mathbb{Z}^+$, for any linear subspace W of $K^S$, the restricted map $\pi_{S'}|_W : W \to K^{S'}$ is a
bijection for some subset S' of S. In other words,

$\forall S \in \mathbb{P}^\infty(\mathbb{Z}^+),\ \forall W \in \mathbb{P}(K^S)$,

W is a linear subspace of $K^S$ $\Rightarrow$ $\exists S' \in \mathbb{P}(S)$, $\pi_{S'}|_W : W \to K^{S'}$ is a bijection.

(See Notation 13.12.5 for $\mathbb{P}^\infty(\mathbb{Z}^+) = \{S \in \mathbb{P}(\mathbb{Z}^+);\ \#(S) < \infty\}$.)


PROOF: For part (i), let $W' = \pi_{S'}(W)$, and let $\lambda_1, \lambda_2 \in K$ and $w'_1, w'_2 \in W'$. Then $w'_1 = \pi_{S'}(w_1)$ and
$w'_2 = \pi_{S'}(w_2)$ for some $w_1, w_2 \in W$. So $\lambda_1 w_1 + \lambda_2 w_2 \in W$ because W is a linear subspace of $K^S$, and so
$\pi_{S'}(\lambda_1 w_1 + \lambda_2 w_2) \in W'$. But $\pi_{S'}(\lambda_1 w_1 + \lambda_2 w_2) = (\lambda_1 w_{1,i} + \lambda_2 w_{2,i})_{i \in S'} = \lambda_1 (w_{1,i})_{i \in S'} + \lambda_2 (w_{2,i})_{i \in S'}
= \lambda_1 \pi_{S'}(w_1) + \lambda_2 \pi_{S'}(w_2) = \lambda_1 w'_1 + \lambda_2 w'_2$. So $\lambda_1 w'_1 + \lambda_2 w'_2 \in W'$. Hence W' is a linear subspace of $K^{S'}$
by Theorem 22.1.13.

Part (ii) may be proved by induction on n = #(S). Let V denote the linear space $K^S$. The assertion
is clearly true for n = 0 because $W = \{0_V\}$ is the only possibility for W, and then $S' = \emptyset$ satisfies the
requirement. So let $n \in \mathbb{Z}_0^+$ and assume that the assertion is valid for all linear subspaces of linear spaces
$K^S$ with $S \subseteq \mathbb{Z}^+$ and #(S) = n.

Let W be a linear subspace of $V = K^S$ with $S \subseteq \mathbb{Z}^+$ and #(S) = n + 1. If W = V, then S' = S meets the
requirement. So suppose that $W \ne V$.

Let $(\delta_i)_{i \in S}$ be the family of vectors $\delta_i = (\delta_{ij})_{j \in S} \in K^S$. Then $\delta_{i'} \in V \setminus W$ for some $i' \in S$. (Otherwise W
would include $\mathrm{span}((\delta_i)_{i \in S}) = K^S$, and so W would equal V.) Let $S' = S \setminus \{i'\}$. Let $V' = K^{S'}$. Define
$\pi_{S'} : W \to V'$ by $\pi_{S'} : v \mapsto v|_{S'}$. To show that $\pi_{S'}$ is injective, let $v^1, v^2 \in W$ satisfy $\pi_{S'}(v^1) = \pi_{S'}(v^2)$. Then
$v^1_i = v^2_i$ for all $i \in S'$. Suppose that $v^1_{i'} \ne v^2_{i'}$. Let $\lambda = (v^1_{i'} - v^2_{i'})^{-1}$. Then $\lambda v^1 - \lambda v^2 = \lambda (v^1_{i'} - v^2_{i'}) \delta_{i'} = \delta_{i'} \in W$
because W is a linear subspace of V. This is a contradiction. So $v^1_{i'} = v^2_{i'}$. Therefore $\pi_{S'}$ is injective. But
$W' = \mathrm{Range}(\pi_{S'})$ is a linear subspace of V' by part (i). So by the inductive assumption, there is a bijection
$\pi_{S''} : W' \to K^{S''}$ for some subset S'' of S' of the form $\pi_{S''} : v \mapsto v|_{S''}$. Then $\pi_{S''} \circ \pi_{S'} : W \to K^{S''}$ is a
bijection of the required form. Therefore by induction, the assertion is valid for all $n \in \mathbb{Z}_0^+$.
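The assertion of Theorem 22.5.10 (ii) can be explored numerically. The sketch below is an illustration and is not the inductive construction used in the proof: it simply searches all index subsets S' and tests whether the restriction map is a bijection from W onto K^{S'}. It assumes K = GF(3) and the index set {0, 1, 2} (0-based for Python, whereas the text uses subsets of the positive integers). All names are hypothetical.

```python
from itertools import combinations, product

P, N = 3, 3  # assumed for illustration: K = GF(3), index set S = {0, 1, 2}

def span(vectors):
    """All linear combinations of `vectors` in GF(P)^N."""
    vectors = list(vectors)
    return {
        tuple(sum(c * v[k] for c, v in zip(coeffs, vectors)) % P for k in range(N))
        for coeffs in product(range(P), repeat=len(vectors))
    }

def restrict(v, idx):
    """The restricted tuple v|S' for an index subset idx = S'."""
    return tuple(v[i] for i in idx)

def bijective_index_sets(W):
    """Index subsets S' for which v -> v|S' maps W bijectively onto K^{S'}."""
    good = []
    for r in range(N + 1):
        for idx in combinations(range(N), r):
            images = {restrict(v, idx) for v in W}
            # Injective: as many images as vectors in W.
            # Surjective: as many images as tuples in K^{S'}.
            if len(images) == len(W) == P ** r:
                good.append(idx)
    return good

W = span([(1, 1, 0), (0, 0, 1)])  # the subspace {(a, a, b); a, b in K}
print(bijective_index_sets(W))    # [(0, 2), (1, 2)]

# Existence check for every subspace of K^3 generated by two distinct vectors.
vectors = list(product(range(P), repeat=N))
for gens in combinations(vectors, 2):
    assert bijective_index_sets(span(gens))
```

For the example subspace, the coordinates 0 and 1 always agree, so the index set (0, 1) fails to give an injective map, while (0, 2) and (1, 2) both succeed. This shows that the subset S' in the theorem need not be unique.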


22.5.11 THEOREM: Some basic properties of linear subspaces of finite Cartesian product spaces. 

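Let W be a linear subspace of the linear space $K^n$ over a field K for some $n \in \mathbb{Z}_0^+$.

(i) The map $v \mapsto (v_i)_{i \in I}$ from W to $K^I$ is a bijection for some subset I of $\mathbb{N}_n$.

(ii) $W = \mathrm{span}(\{\pi_I^{-1}(\delta_i);\ i \in I\})$ for any subset I of $\mathbb{N}_n$ for which the map $\pi_I : v \mapsto (v_i)_{i \in I}$ from W to $K^I$
is a bijection, where $\delta_i = (\delta_{ij})_{j \in I} \in K^I$ for $i \in I$.

(iii) $\dim(W) \le n$.

(iv) $K^n = \mathrm{span}(S_I \cup T_I)$, where $S_I = \{\pi_I^{-1}(\delta_i);\ i \in I\}$ and $T_I = \{\delta_i;\ i \in \mathbb{N}_n \setminus I\}$, for any subset I of $\mathbb{N}_n$ for
which the map $\pi_I : v \mapsto (v_i)_{i \in I}$ from W to $K^I$ is a bijection, where $\delta_i = (\delta_{ij})_{j \in I} \in K^I$ for $i \in I$.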


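PROOF: Part (i) follows from Theorem 22.5.10 (ii) by letting $S = \mathbb{N}_n$ and then letting I = S'.

For part (ii), let $\pi_I : v \mapsto (v_i)_{i \in I}$ be a bijection from W to $K^I$. Let $S_I = \{\pi_I^{-1}(\delta_i);\ i \in I\}$. Then
$S_I \subseteq W$. So $\mathrm{span}(S_I) \subseteq W$ by Theorem 22.4.7 (x). Let $w \in W$. Let $(\lambda_i)_{i \in I} = \pi_I(w) \in K^I$. Then
$w = \pi_I^{-1}(\pi_I(w)) = \pi_I^{-1}((\lambda_i)_{i \in I}) = \sum_{i \in I} \lambda_i \pi_I^{-1}(\delta_i) \in \mathrm{span}(S_I)$. Therefore $W \subseteq \mathrm{span}(S_I)$. So $W = \mathrm{span}(S_I)$.

For part (iii), $\pi_I : v \mapsto (v_i)_{i \in I}$ is a bijection from W to $K^I$ for some subset I of $\mathbb{N}_n$, by part (i), and then
$W = \mathrm{span}(\{\pi_I^{-1}(\delta_i);\ i \in I\})$ by part (ii). So by Notation 22.5.3, $\dim(W) \le \#(I) \le n$.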

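For part (iv), let $v \in K^n$. Then $v = (\lambda_i)_{i=1}^n = \sum_{i=1}^n \lambda_i \delta_i$ for some $(\lambda_i)_{i=1}^n \in K^n$. But for $i \in I$,
$\delta_i = \pi_I^{-1}(\delta_i) + (\delta_i - \pi_I^{-1}(\delta_i))$, where for $j \in I$, $\pi_I^{-1}(\delta_i)_j = \pi_I(\pi_I^{-1}(\delta_i))_j = \delta_{ij}$. So for $i, j \in I$,
$(\delta_i - \pi_I^{-1}(\delta_i))_j = \delta_{ij} - \delta_{ij} = 0$.
(In other words, $\pi_I^{-1}$ has no effect on component j of $\delta_i$ if $i, j \in I$, which follows from the definition of $\pi_I$.)
Let $I' = \mathbb{N}_n \setminus I$. Then for all $i \in I$, $\delta_i - \pi_I^{-1}(\delta_i) = \sum_{j \in I'} a_{ij} \delta_j$ for some $(a_{ij})_{j \in I'} \in K^{I'}$. (In other words, the
component $a_{ij} = (\delta_i - \pi_I^{-1}(\delta_i))_j$ of $\delta_i - \pi_I^{-1}(\delta_i)$ equals zero for $j \in I$.) Therefore

$v = \sum_{i \in I} \lambda_i \delta_i + \sum_{i \in I'} \lambda_i \delta_i$
$\phantom{v} = \sum_{i \in I} \lambda_i \pi_I^{-1}(\delta_i) + \sum_{i \in I} \lambda_i \sum_{j \in I'} a_{ij} \delta_j + \sum_{i \in I'} \lambda_i \delta_i$
$\phantom{v} = \sum_{i \in I} \lambda_i \pi_I^{-1}(\delta_i) + \sum_{i \in I'} \lambda'_i \delta_i$,

where $\lambda'_i = \lambda_i + \sum_{j \in I} \lambda_j a_{ji}$ for all $i \in I'$. Thus $v \in \mathrm{span}(S_I \cup T_I)$. Hence $K^n = \mathrm{span}(S_I \cup T_I)$.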


22.5.12 REMARK: Spanning $K^n$ with n vectors so that a subset spans a given subspace.

The key point in the proof of Theorem 22.5.11 (iv) is the recognition that the structure of the projection map
$\pi_I$ guarantees that $\pi_I^{-1}(\delta_i)$ has the same j-component as $\delta_i$ for $j \in I$, which implies that $\delta_i$ can be expressed
as a linear combination of $\pi_I^{-1}(\delta_i)$ and only the vectors $\delta_j$ for $j \notin I$. Therefore the total number of spanning
vectors $S_I \cup T_I$ is equal to n. In other words, the #(I) vectors $\delta_i$ for $i \in I$ may be replaced with the #(I)
vectors $\pi_I^{-1}(\delta_i) \in W$ without needing to use any of the vectors $\delta_i$ for $i \in I$ in the linear combination. In the
language of bases, this means that one basis has been replaced by another. The difficulties in the proof arise
mostly because bases, coordinate maps and linear maps have not been introduced at this point.

The purpose of Theorem 22.5.11 (iv) is to show that the linear space $K^n$ can be spanned by n vectors $S_I \cup T_I$
such that a subset $S_I$ spans a given subspace. This is a kind of converse to the basis extension problem
where a minimal spanning set for a subspace is given and this set must be extended to a minimal spanning
set for the whole space.


22.5.13 REMARK: Linear subspaces of finite-dimensional linear spaces are finite-dimensional. 

The strategy of the proof of Theorem 22.5.14 is to use a finite spanning set for the linear space V to map
vectors in V to their coordinates $(\lambda_i)_{i=1}^n$ in the linear space $K^n$. It is not guaranteed that this map is
one-to-one, and it doesn't need to be. Then Theorem 22.5.11 (iii) is applied to the image $W \subseteq K^n$ under
this coordinate map of the linear subspace U to show that it has lower or equal dimension. Then finally
the linear subspace W of the coordinate space $K^n$ is mapped back to V to show that the reverse-mapped
vector-family $(u_j)$ corresponding to the finite spanning set for W is a spanning set for U. (This idea is
illustrated in Figure 22.5.1.) Hence U is finite-dimensional. This kind of proof may seem clumsy or untidy,
but it avoids any discussion of linear independence or bases, which are not yet defined at this point.

Figure 22.5.1 Coordinate space for a linear subspace of a finite-dimensional linear space


22.5.14 THEOREM: Finite-dimensional linear spaces have subspaces of less-or-equal dimension.
Let U be a linear subspace of a finite-dimensional linear space V.

(i) U is finite-dimensional.

(ii) $\dim(U) \le \dim(V)$.


PROOF: Let U be a linear subspace of a finite-dimensional linear space V over a field K. Then dim(V) = n
for some $n \in \mathbb{Z}_0^+$. So there is a subset S of V such that #(S) = n and span(S) = V. Then $S = \{v_i;\ i \in \mathbb{N}_n\}$
for some $(v_i)_{i=1}^n \in V^n$, and $V = \{\sum_{i=1}^n \lambda_i v_i;\ (\lambda_i)_{i=1}^n \in K^n\}$.

Let $W = \{(\lambda_i)_{i=1}^n \in K^n;\ \sum_{i=1}^n \lambda_i v_i \in U\}$. Then W is a linear subspace of $K^n$ because U is a linear subspace
of V. So $\dim(W) \le n$ by Theorem 22.5.11 (iii). Therefore there is a sequence $w = (w_j)_{j=1}^m \in (K^n)^m$ for
some $m \le n$ such that W = span(w) = span(Range(w)). In other words, $W = \{\sum_{j=1}^m \mu_j w_j;\ (\mu_j)_{j=1}^m \in K^m\}$.
Let $u_j = \sum_{k=1}^n w_{j,k} v_k$ for $j \in \mathbb{N}_m$. Then $u_j \in U$ because $w_j \in W$.

Let $u \in U$. Then $u = \sum_{i=1}^n \lambda_i v_i$ for some $(\lambda_i)_{i=1}^n \in W$ by the definition of W. But all elements of W can be
expressed as $\sum_{j=1}^m \mu_j w_j$ for some $(\mu_j)_{j=1}^m \in K^m$ because W = span(w). So $(\lambda_i)_{i=1}^n = \sum_{j=1}^m \mu_j w_j$ for some
$(\mu_j)_{j=1}^m \in K^m$. Therefore $u = \sum_{i=1}^n \lambda_i v_i = \sum_{i=1}^n \sum_{j=1}^m \mu_j w_{j,i} v_i = \sum_{j=1}^m \mu_j \sum_{i=1}^n w_{j,i} v_i = \sum_{j=1}^m \mu_j u_j$. So
$U = \{\sum_{j=1}^m \mu_j u_j;\ (\mu_j)_{j=1}^m \in K^m\}$. Thus U = span(u), where $\#(\mathrm{Range}(u)) \le m$. This verifies part (i). But
since U has a spanning set of cardinality less than or equal to the cardinality of any spanning set for V, it
follows that $\dim(U) \le \dim(V)$, which verifies part (ii).


22.5.15 REMARK: Finite-dimensional linear spaces have no proper subspaces with the same dimension. 
The assertions of Theorem 22.5.16 are perhaps intuitively obvious, but as always, intuition is no substitute 
for proof. In the case of infinite-dimensional linear spaces, the assertion is false. Therefore the special 
properties of finite-dimensional linear spaces must be invoked in the proof. 


Theorem 22.5.16 would be easier to prove with the assistance of the concept of a basis and the fact that the 
cardinality of every basis for a finite-dimensional linear space equals the dimension of the space, but this is 
not proved until Theorem 22.7.16. 


22.5.16 THEOREM: Proper subspaces of finite-dimensional linear spaces have smaller dimension. 
Let U be a linear subspace of a finite-dimensional linear space V. 


(i) If $U \ne V$, then dim(U) < dim(V).
(ii) If dim(U) = dim(V), then U = V.
(iii) $U \ne V$ if and only if dim(U) < dim(V).
(iv) dim(U) = dim(V) if and only if U = V.


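PROOF: For part (i), let V be a finite-dimensional linear space over a field K. Then dim(V) = n for some
$n \in \mathbb{Z}_0^+$, and there is a sequence $(e_j)_{j=1}^n \in V^n$ such that $\forall v \in V,\ \exists (\lambda_j)_{j=1}^n \in K^n,\ v = \sum_{j=1}^n \lambda_j e_j$.

Let U be a linear subspace of V. Then U is finite-dimensional by Theorem 22.5.14 (i). So dim(U) = m for
some $m \in \mathbb{Z}_0^+$, and there is a sequence $(u_i)_{i=1}^m \in U^m$ such that $\forall v \in U,\ \exists (\mu_i)_{i=1}^m \in K^m,\ v = \sum_{i=1}^m \mu_i u_i$.
Since $U \subseteq V$, for each $i \in \mathbb{N}_m$ there is a sequence $(\lambda_{i,j})_{j=1}^n \in K^n$ such that $u_i = \sum_{j=1}^n \lambda_{i,j} e_j$.

Suppose that $U \ne V$. Then there is a vector $w \in V \setminus U$, and there is a sequence $(\alpha_j)_{j=1}^n \in K^n$ such that
$w = \sum_{j=1}^n \alpha_j e_j$. If $\alpha_j = 0_K$ for all $j \in \mathbb{N}_n$, then $w = 0_V = 0_U \in U$, which contradicts the assumption.
So $\alpha_k \ne 0_K$ for some $k \in \mathbb{N}_n$. For some choice of k with $\alpha_k \ne 0_K$, define the sequence $(u'_i)_{i=1}^m \in V^m$ by
$u'_i = u_i - \alpha_k^{-1} \lambda_{i,k} w$ for all $i \in \mathbb{N}_m$. Let $U' = \mathrm{span}(\{u'_i;\ i \in \mathbb{N}_m\})$. Then U' is a linear subspace of V. For
$i \in \mathbb{N}_m$, $u'_i = \sum_{j=1}^n \lambda_{i,j} e_j - \alpha_k^{-1} \lambda_{i,k} \sum_{j=1}^n \alpha_j e_j = \sum_{j=1}^n (\lambda_{i,j} - \alpha_k^{-1} \lambda_{i,k} \alpha_j) e_j$, but $\lambda_{i,j} - \alpha_k^{-1} \lambda_{i,k} \alpha_j = 0_K$
for j = k. So $u'_i \in \mathrm{span}(\{e_j;\ j \in \mathbb{N}_n \setminus \{k\}\})$ for all $i \in \mathbb{N}_m$. Let $U_k = \mathrm{span}(\{e_j;\ j \in \mathbb{N}_n \setminus \{k\}\})$.

U' is a linear subspace of $U_k$. But $\dim(U_k) \le n - 1$ by Definition 22.5.2. Therefore $\dim(U') \le n - 1$ by
Theorem 22.5.14 (ii).

To show that $U \subseteq U'$, let $v \in U$. Then $v = \sum_{i=1}^m \mu_i u_i$ for some $(\mu_i)_{i=1}^m \in K^m$. But $u_i = u'_i + \alpha_k^{-1} \lambda_{i,k} w$
for all $i \in \mathbb{N}_m$. So $v = \sum_{i=1}^m \mu_i (u'_i + \alpha_k^{-1} \lambda_{i,k} w) = \sum_{i=1}^m \mu_i u'_i + \alpha_k^{-1} (\sum_{i=1}^m \mu_i \lambda_{i,k}) w$. If $\sum_{i=1}^m \mu_i \lambda_{i,k} \ne 0_K$,
then $w = \alpha_k (\sum_{i=1}^m \mu_i \lambda_{i,k})^{-1} (v - \sum_{i=1}^m \mu_i u'_i) \in U$, which contradicts the assumption $w \in V \setminus U$. Therefore
$v = \sum_{i=1}^m \mu_i u'_i \in U'$. So $U \subseteq U'$. Hence U = U', and so $\dim(U) \le n - 1 < n = \dim(V)$, which verifies
part (i).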

Part (ii) follows immediately as the contrapositive of part (i). 


Parts (iii) and (iv) follow from parts (i) and (ii) because U = V implies dim(U) = dim(V). 


22.5.17 DEFINITION: The codimension of a linear subspace W of a linear space V is the least beta-cardinality
of sets $S \subseteq V$ such that $\mathrm{span}(W \cup S) = V$.


22.5.18 NOTATION: codim(W, V) denotes the codimension of a linear subspace W of a linear space V.


22.5.19 DEFINITION: A finite-codimensional (linear) subspace of a linear space V is a linear subspace W
of V for which there exists a finite set $S \subseteq V$ such that $\mathrm{span}(W \cup S) = V$.


An infinite-codimensional (linear) subspace of a linear space V is a linear subspace of V which is not
finite-codimensional.


22.5.20 REMARK: Issues which arise for infinite-dimensional linear spaces.

Chapter 22 intentionally permits linear spaces to be arbitrarily infinite-dimensional. It is a useful exercise 
to determine which features of finite-dimensional spaces may be generalised to arbitrary infinite-dimensional 
linear spaces, and which concepts need to be restricted. 


The definition of a basis is perfectly meaningful for an infinite-dimensional linear space. However, the infinite 
series which appear in calculus are not linear combinations of vectors in the sense of Definition 22.3.2. For 
example, a Fourier series for a square function may look like an infinite linear combination of basis vectors 
which are sine and cosine functions. But not all series converge. 


The subject of topological linear spaces is where this topic is addressed. Banach spaces and Hilbert spaces 
are special kinds of topological linear spaces. Without a topology, it is difficult to give meaning to infinite 
series of vectors. Therefore Chapter 22 is limited to linear combinations of finite sets of vectors. 


22.6. Linear independence of vectors 


22.6.1 REMARK: Motivation for the concept of linear independence of sets of vectors. 

As mentioned in Remark 22.5.4, the trivial linear space $V = \{0_V\}$ containing only the zero vector has
dimension 0. For any linear space V over a field K, for any $v \in V \setminus \{0_V\}$, the subspace $W = \{\lambda v;\ \lambda \in K\}$
has dim(W) = 1 because it can be spanned by a single vector $v \in W$, but it cannot be spanned by less than
one vector.


For any linear space V over a field K, for any $v_1, v_2 \in V \setminus \{0_V\}$, the subspace $W = \{\lambda_1 v_1 + \lambda_2 v_2;\ \lambda_1, \lambda_2 \in K\}$
has dim(W) = 1 or dim(W) = 2 because it can be spanned by the two vectors $v_1, v_2 \in W$, but it can be
spanned by only one vector if either vector $v_1$ or $v_2$ is a linear combination of the other. That is, the set
$\{v_1, v_2\}$ is adequate to span all of W, but one of these two vectors may be eliminated from the spanning set
if it is redundant, i.e. if it is spanned by the other vector.


For three non-zero vectors $v_1, v_2, v_3 \in V \setminus \{0_V\}$, the subspace $W = \{\lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3;\ \lambda_1, \lambda_2, \lambda_3 \in K\}$ has
dim(W) = 1, dim(W) = 2 or dim(W) = 3. This is because it may be possible to eliminate one vector which
is spanned by the other two. And then one of the two remaining vectors may be spanned by the other.

These examples motivate the definition of linear independence of vectors. A set of vectors in a linear space
is said to be a linearly independent set of vectors if there is no redundancy, i.e. if none of the vectors is
spanned by the others. When a set of vectors is linearly independent, the dimension of the spanned set is
equal to the number of vectors in the set. This is the pay-off for linearly independent vectors. Since none
of the vectors in a linearly independent set S is a linear combination of the others, the spanned set span(S)
cannot be spanned by fewer vectors. Therefore $\dim(\mathrm{span}(S)) = \beta(S)$. Definition 22.6.3 may be interpreted
by saying: “If you remove v, you don't get it back again.” In other words, if you remove any vector v from
the spanning set S, the vector v cannot be reconstructed from the remaining vectors in $S \setminus \{v\}$.


There is one little fly in the ointment here. Although there may be no more redundancy that can be removed 
from a spanning set S for a linear subspace W of a linear space V, it is not immediately obvious that there 
is no other set with fewer vectors which can span the same subspace W. For example, all integers greater 
than 1 may be expressed as sums of integer multiples of the integers 2 and 3. So in this sense, the set 
{2,3} spans the integers. But the smaller set {1} also spans the integers greater than 1 in this way. So 
the spanning set {2, 3} is minimal but not minimum. Nevertheless, removing redundancy from a spanning
set is an adequate motivation for Definition 22.6.3, and it will be shown that a spanning set of minimal 
beta-cardinality is also a spanning set of minimum beta-cardinality. 
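The integer example can be checked mechanically. The sketch below is an illustration which makes two assumptions not stated in the text: “integer multiples” is read as non-negative integer multiples, and the check is run only up to a hypothetical bound `LIMIT`. The function name `generated` is invented for the example.

```python
LIMIT = 50  # hypothetical bound; the check covers only the integers 2..LIMIT

def generated(gens, limit=LIMIT):
    """Integers in 2..limit which are sums of non-negative multiples of `gens`."""
    reachable = {0}
    for g in gens:
        reachable = {
            r + k * g
            for r in reachable
            for k in range(limit // g + 1)
            if r + k * g <= limit
        }
    return {x for x in reachable if x >= 2}

target = set(range(2, LIMIT + 1))
assert generated([2, 3]) == target   # {2, 3} spans the integers greater than 1
assert generated([2]) != target      # dropping 3 loses the odd numbers
assert generated([3]) != target      # dropping 2 loses 2, 4, 5, ...
assert generated([1]) == target      # a spanning set with fewer elements exists
```

Neither element of {2, 3} can be removed, so the set is minimal, while {1} shows that it is not of minimum size. This is exactly the gap which cannot occur for spanning sets in linear spaces.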


22.6.2 REMARK: All vectors in a linearly independent set of vectors contribute to the span. 
Figure 22.6.1 illustrates spans and independence of vectors. 


In the abstract view on the left of Figure 22.6.1, one knows merely that the span of a subset S of a linear
space V is a subset span(S) of V such that $S \subseteq \mathrm{span}(S)$. Therefore for any two vectors $v_1$ and $v_2$, the
subset $\mathrm{span}(\{v_1, v_2\})$ must satisfy $v_1, v_2 \in \mathrm{span}(\{v_1, v_2\})$. Consequently, the intersection $\mathrm{span}(\{v_1, v_2\}) \cap
\mathrm{span}(\{v_1, v_3\})$ must contain at least the vector $v_1$. Definition 22.6.3 says that the linear independence of the
set $S = \{v_1, v_2, v_3\}$ means that $v \notin \mathrm{span}(S \setminus \{v\})$ for $v = v_1, v_2, v_3$. In the diagram, it is clear that no two
points in $S = \{v_1, v_2, v_3\}$ span all three points in S. However, it is not a-priori obvious in this abstract view
that there are no two points in the whole space V which could span all of S. This is a specific property of
linear spaces.

Figure 22.6.1 Spans and independence of vectors


The projective space view on the right in Figure 22.6.1 gives a useful way of mapping out the concepts of 
spanning sets and linear independence. 


22.6.3 DEFINITION: A linearly independent set of vectors in a linear space V is a subset S of V such that

$\forall v \in S,\ v \notin \mathrm{span}(S \setminus \{v\})$. (22.6.1)


22.6.4 REMARK: Linearly independent families of vectors must be injective.

The image set $\{v_i;\ i \in I\}$ of a family $(v_i)_{i \in I}$ in Definition 22.6.5 will contain only one “copy” of any vector
which is repeated in the family. To prevent this, a linearly independent family is required to be injective.
That is, it consists of distinct vectors. Otherwise a family of one hundred copies of a single vector would
have a linearly independent range-set consisting of a single vector.


22.6.5 DEFINITION: A linearly independent family of vectors in a linear space V is a family $(v_i)_{i \in I}$ of
distinct vectors in V such that $\{v_i;\ i \in I\}$ is a linearly independent set of vectors in V.


22.6.6 THEOREM: All subsets of a linearly independent set are linearly independent. 
Let S be a linearly independent set of vectors in a linear space V. 


(i) $0_V \notin S$.
(ii) Every subset of S is a linearly independent set of vectors in V.


PROOF: For part (i), let S be a linearly independent set of vectors in V. Then $0_V \in \mathrm{span}(S \setminus \{0_V\})$ by
Theorem 22.4.7 (vii). If $0_V \in S$, this contradicts Definition 22.6.3 line (22.6.1). Hence $0_V \notin S$.

For part (ii), let S' be a subset of a linearly independent subset S of V. Then $\forall v \in S,\ v \notin \mathrm{span}(S \setminus \{v\})$ by
Definition 22.6.3. So $\forall v \in S',\ v \notin \mathrm{span}(S \setminus \{v\})$. But $\mathrm{span}(S' \setminus \{v\}) \subseteq \mathrm{span}(S \setminus \{v\})$ by Theorem 22.4.7 (vi).
So $\forall v \in S',\ v \notin \mathrm{span}(S' \setminus \{v\})$. Hence S' is a linearly independent subset of V by Definition 22.6.3.


22.6.7 REMARK: Fin(S, K) denotes the set of linear combinations of vectors in S. 
The statement of Theorem 22.6.8 makes use of Notation 22.2.25.


22.6.8 THEOREM: Equivalent conditions for linear independence. 
The linear independence of a subset S of a linear space V is equivalent to each of the following conditions. 
(i) ∀v ∈ S, {v} ∩ span(S \ {v}) = ∅.
(ii) ∀λ ∈ Fin(S, K), (Σ_{v∈S} λ_v v = 0 ⇒ ∀v ∈ S, λ_v = 0).
(iii) ∀w ∈ span(S), ∃′λ ∈ Fin(S, K), w = Σ_{v∈S} λ_v v.
(iv) ∀w ∈ S, ∀λ ∈ Fin(S, K), (w = Σ_{v∈S} λ_v v ⇒ ∀v ∈ S, λ_v = δ_{v,w}).

PROOF: Let S be a subset of a linear space V. Then the equivalence of condition (i) and Definition 22.6.3
line (22.6.1) follows from Notation 7.5.7 for {v}.

Assume that (i) is true. Then ∀v ∈ S, {v} ∩ span(S \ {v}) = ∅. To show (ii), let λ = (λ_v)_{v∈S} ∈ Fin(S, K) and
Σ_{v∈S} λ_v v = 0. Suppose that λ_v ≠ 0_K for some v ∈ S. Then λ_v v = −Σ_{w∈S\{v}} λ_w w. So v = Σ_{w∈S\{v}} (−λ_w/λ_v) w.
So {v} ∩ span(S \ {v}) = {v} ≠ ∅, which contradicts assumption (i). Therefore ∀v ∈ S, λ_v = 0. Thus (ii)
follows from (i).

Assume that (i) is false. Then ∃v ∈ S, {v} ∩ span(S \ {v}) ≠ ∅. So v = Σ_{w∈S} λ_w w for some λ ∈ Fin(S, K)
with λ_v = 0. Then Σ_{w∈S} μ_w w = 0, where (μ_w)_{w∈S} ∈ K^S is defined by μ_v = −1_K and μ_w = λ_w for
w ∈ S \ {v}. But it is not true that ∀w ∈ S, μ_w = 0 (because μ_v = −1_K). So (ii) is false. Hence (i) is true
if and only if (ii) is true.


Condition (iii) is true without pre-conditions if the unique existence quantifier “∃′” is replaced with the
ordinary existence quantifier “∃”. This is because span(S) is defined by


w ∈ span(S) ⇔ ∃λ ∈ Fin(S, K), w = Σ_{v∈S} λ_v v.

The uniqueness part of condition (iii) has the following meaning.

∀w ∈ span(S), ∀λ, μ ∈ Fin(S, K), ((w = Σ_{u∈S} λ_u u and w = Σ_{v∈S} μ_v v) ⇒ λ = μ). (22.6.2)


Suppose that (22.6.2) is true. Let λ ∈ Fin(S, K) satisfy Σ_{v∈S} λ_v v = 0. Define μ ∈ Fin(S, K) by μ_u = 0_K
for all u ∈ S. Then Σ_{u∈S} μ_u u = 0. So 0 = Σ_{v∈S} λ_v v and 0 = Σ_{u∈S} μ_u u. Therefore λ = μ by (22.6.2). So
∀v ∈ S, λ_v = 0. Thus condition (iii) implies condition (ii).

Suppose that condition (ii) is true. Let w ∈ span(S) and λ, μ ∈ Fin(S, K). Let w = Σ_{u∈S} λ_u u and w =
Σ_{v∈S} μ_v v. Then 0_V = Σ_{u∈S} λ_u u − Σ_{v∈S} μ_v v = Σ_{v∈S} (λ_v − μ_v) v. So by (ii), ∀v ∈ S, λ_v − μ_v = 0_K.
So λ = μ. Therefore (22.6.2) is true. Hence (ii) implies (iii). So (ii) is equivalent to (iii).

Assume that (iii) is true. Then (22.6.2) follows from the uniqueness part of (iii). Let w ∈ S and λ ∈ Fin(S, K)
be such that w = Σ_{v∈S} λ_v v. For any w ∈ S, w = Σ_{v∈S} δ_{v,w} v. So w = Σ_{v∈S} λ_v v = Σ_{v∈S} δ_{v,w} v. So by
(22.6.2), λ_v = δ_{v,w} for all v ∈ S. Thus (iv) is true.

Assume that (iv) is true. Let w ∈ S and suppose that w ∈ span(S \ {w}). Then for some λ ∈ Fin(S, K),
λ_w = 0_K and w = Σ_{v∈S} λ_v v. So by (iv), ∀v ∈ S, λ_v = δ_{v,w}. So λ_w = 1_K. This is a contradiction. So
w ∉ span(S \ {w}) for all w ∈ S. This is the same as (22.6.1). So (iii) follows from this. Hence the conditions
(i), (ii), (iii) and (iv) are all equivalent to (22.6.1).
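
The equivalence of Definition 22.6.3 and condition (ii) can be checked numerically for finite sets of vectors. The following is a minimal sketch, assuming that vectors are tuples of rational numbers in ℚ^n and that S is a list of distinct tuples. The function names `rank`, `in_span`, `independent_by_definition` and `independent_by_condition_ii` are illustrative and do not appear elsewhere in this book. Condition (ii) is tested via the fact that the homogeneous system Σ_{v∈S} λ_v v = 0 has only the zero solution exactly when the rank equals #(S).

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length tuples, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def in_span(v, vectors):
    """True if v is in span(vectors); the span of the empty list is {0}."""
    return rank(list(vectors) + [v]) == rank(list(vectors))


def independent_by_definition(S):
    """Definition 22.6.3: no v in S lies in span(S minus {v})."""
    return all(not in_span(v, S[:i] + S[i + 1:]) for i, v in enumerate(S))


def independent_by_condition_ii(S):
    """Condition (ii): only the zero coefficient family gives the zero vector."""
    return rank(S) == len(S)


if __name__ == "__main__":
    examples = [
        [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
        [(1, 0, 0), (0, 1, 0), (1, 1, 0)],
        [(0, 0, 0)],
    ]
    for S in examples:
        print(S, independent_by_definition(S), independent_by_condition_ii(S))
```

The third example set illustrates Theorem 22.6.6 (i): a set containing the zero vector fails both tests.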


22.6.9 REMARK: Alternative proof of the unique linear combination condition for linear independence. 
Theorem 22.6.8 part (iii) may also be proved from part (iv) as follows. 

Assume condition (iv) in Theorem 22.6.8. Let w ∈ span(S) and λ, μ ∈ Fin(S, K) be such that w = Σ_{v∈S} λ_v v
and w = Σ_{v∈S} μ_v v and λ ≠ μ. Then λ_u ≠ μ_u for some u ∈ S. Then (λ_u − μ_u)u = −Σ_{v∈S\{u}} (λ_v − μ_v)v.
So u = Σ_{v∈S} ν_v v, where ν_v = −(λ_v − μ_v)/(λ_u − μ_u) for all v ∈ S \ {u} and ν_u = 0_K. Then by (iv),
∀v ∈ S, ν_v = δ_{v,u}. So ν_u = 1_K. This is a contradiction. So λ = μ. That is, w is a unique linear combination
of the elements of S. So condition (iii) is true as claimed.


22.6.10 REMARK: Alternative lower-level notation for conditions for linear independence. 
If the conditions in Theorem 22.6.8 are written out explicitly, without the benefit of the non-standard 
Notation 22.2.25 for Fin(S, K), they have the following appearance. 


(i′) ∀v ∈ S, {v} ∩ span(S \ {v}) = ∅.
(ii′) ∀λ ∈ K^S, #(λ^{-1}(K \ {0_K})) < ∞ ⇒ (Σ_{v∈S} λ_v v = 0 ⇒ ∀v ∈ S, λ_v = 0).
(iii′) ∀w ∈ span(S), ∃′λ ∈ K^S, (#(λ^{-1}(K \ {0_K})) < ∞ and w = Σ_{v∈S} λ_v v).
(iv′) ∀w ∈ S, ∀λ ∈ K^S, #(λ^{-1}(K \ {0_K})) < ∞ ⇒ (w = Σ_{v∈S} λ_v v ⇒ ∀v ∈ S, λ_v = δ_{v,w}).


22.6.11 REMARK: A popular linear independence condition is less intuitive than the spanning-set definition. 
Condition (ii) in Theorem 22.6.8 is often given as the definition of linear independence of vectors. It has 
an attractive symmetry and simplicity, but it is not close to one's intuition of the linear independence of 
vectors. Condition (22.6.1), on the other hand, clearly signifies that each vector is independent of the others. 


Definition 22.6.3 has been chosen to correspond closely to the meaning of the word “independent”. To see 
this, note that the logical negative of line (22.6.1) in Definition 22.6.3 is line (22.6.3) as follows. 


∃v ∈ S, v ∈ span(S \ {v}). (22.6.3)


This expresses quite clearly the idea that v depends on the other vectors of S. The logical negative of
Theorem 22.6.8 condition (ii) is line (22.6.4).


∃λ ∈ Fin(S, K), Σ_{v∈S} λ_v v = 0 and ∃v ∈ S, λ_v ≠ 0. (22.6.4)


This means that there is some linear combination of vectors in S whose value is zero, but whose coefficients 
are not all zero. It is not instantly intuitively clear that this encapsulates the idea of dependence. 


The formulation of the idea of independence in Definition 22.6.3 corresponds well to the independence concept 
in some other subjects. For example, in mathematical logic one seeks independent sets of axioms, which 
are axiom sets for which no axiom may be proved from the others. One might say that no axiom is in the
“span” of the other axioms. Similarly, in set theory one seeks axioms which are independent in this same 
sense. Thus there has been a considerable literature concerning the independence of the axiom of choice, 
the continuum hypothesis, and various other axioms, from the Zermelo-Fraenkel axioms. One may think of 
the set of all ZF theorems as the “span” of the ZF axiom system. Independence of the axiom of choice then 
means that it is not in the span of the ZF axioms. 


22.6.12 REMARK: Linear charts for linear spaces using coordinates with respect to a basis. 

Condition (iii) in Theorem 22.6.8 may be interpreted to mean that every vector w in span(S) has a unique 
representation as a linear combination of vectors of S. So there is a bijection between the vectors in span(S)
and the elements of Fin(S, K). But Fin(S, K) is the set of vectors in the linear space FL(S, K). The map
w ↦ λ, where λ ∈ FL(S, K) satisfies w = Σ_{v∈S} λ_v v, is a linear map. (See Section 23.1 for linear maps.)
Therefore linearly independent sets of vectors can be applied to map vectors in linear spaces to scalar tuples 
in free linear spaces. This provides a very general mechanism for coordinatising linear spaces. The map from 
vectors to tuples may be thought of as a “chart”. A set of linearly independent vectors which is used in this 
way is called a “basis”. 


22.6.13 REMARK: Transforming linearly independent spanning sets. 

The map f in Theorem 22.6.14 line (22.6.5) is in essence a “row operation” as used in the well-known
Gaussian elimination procedure for matrices. Theorem 22.6.14 is used in the proof of Theorem 22.7.14. (For
more conventional versions of Theorems 22.6.14 and 22.7.14, see Kaplan/Lewis [99], pages 676-677.) 


22.6.14 THEOREM: Effects of adding multiples of a vector to other vectors of a linearly independent set. 
Let V be a linear space over a field K. Let S be a linearly independent subset of V. Let u′ ∈ S. Let
S′ = S \ {u′}. Let (λ_u)_{u∈S′} ∈ K^{S′} be a family of elements of K. Define f : S′ → V by


∀u ∈ S′, f(u) = u + λ_u u′. (22.6.5)


(i) f is injective.
(ii) u′ ∉ span(f(S′)). Hence u′ ∉ f(S′).
(iii) f(S′) ∪ {u′} is a linearly independent subset of V.
(iv) span(f(S′) ∪ {u′}) = span(S).


PROOF: For part (i), let u1, u2 ∈ S′ with f(u1) = f(u2). Then u1 + λ_{u1} u′ = u2 + λ_{u2} u′. Therefore
u1 = u2 + (λ_{u2} − λ_{u1}) u′ ∈ span(S \ {u1}) if u1 ≠ u2 and u1 ≠ u′. This would contradict the linear
independence of S. So u1 = u2. Hence f is injective.

For part (ii), suppose that u′ ∈ span(f(S′)). Then u′ = Σ_{u∈S′} μ_u f(u) for some μ ∈ Fin(S′, K). Therefore
u′ = Σ_{u∈S′} μ_u (u + λ_u u′) = (Σ_{u∈S′} μ_u u) + (Σ_{u∈S′} μ_u λ_u) u′. So Σ_{u∈S′} μ_u u + (Σ_{u∈S′} μ_u λ_u − 1) u′ = 0. So
by Theorem 22.6.8 (ii) and the linear independence of S, μ_u = 0 for all u ∈ S′ and 1 = Σ_{u∈S′} μ_u λ_u = 0,
which is impossible. Therefore u′ ∉ span(f(S′)). Hence u′ ∉ f(S′).

For part (iii), suppose that f(S′) ∪ {u′} is not a linearly independent set of vectors. Let S̃ = f(S′) ∪ {u′}.
Then by Theorem 22.6.8 (ii), there exists μ ∈ Fin(S̃, K) with Σ_{v∈S̃} μ_v v = 0 and μ_v ≠ 0 for some v ∈ S̃.
If μ_{u′} ≠ 0, then u′ ∈ span(f(S′)), which would contradict part (ii). Therefore Σ_{v∈f(S′)} μ_v v = 0 and
μ_v ≠ 0 for some v ∈ f(S′). In other words, Σ_{u∈S′} μ_{f(u)} f(u) = 0 and μ_{f(u)} ≠ 0 for some u ∈ S′. Thus
0 = Σ_{u∈S′} μ_{f(u)} (u + λ_u u′) = Σ_{u∈S′} μ_{f(u)} u + (Σ_{u∈S′} μ_{f(u)} λ_u) u′. If Σ_{u∈S′} μ_{f(u)} λ_u ≠ 0, then u′ ∈ span(S′),
which would contradict the assumed linear independence of S. Therefore Σ_{u∈S′} μ_{f(u)} (u + λ_u u′) = Σ_{u∈S′} μ_{f(u)} u = 0,
which implies by Theorem 22.6.8 (ii) that S′ is not linearly independent, which by Theorem 22.6.6 (ii) would
contradict the linear independence of S. Hence f(S′) ∪ {u′} is a linearly independent subset of V.

For part (iv), f(S′) ⊆ span(S) and u′ ∈ S. So f(S′) ∪ {u′} ⊆ span(S). So span(f(S′) ∪ {u′}) ⊆ span(S).
Now let u ∈ S. If u = u′, then u ∈ f(S′) ∪ {u′}. Otherwise u ∈ S′ and u = f(u) − λ_u u′ ∈ span(f(S′) ∪ {u′}).
So span(S) ⊆ span(f(S′) ∪ {u′}). Hence span(f(S′) ∪ {u′}) = span(S).
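
The four assertions of Theorem 22.6.14 can be observed on a small example. The following is a minimal sketch, assuming vectors are tuples of rationals in ℚ^4 and that the coefficients λ_u are supplied as a dictionary. The names `rank` and `shear` are illustrative only. Equality of spans is tested by comparing the rank of each set with the rank of their union.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length tuples, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def shear(S, u_prime, lam):
    """Return the list f(S') followed by u', where f(u) = u + lam[u] * u'."""
    S_prime = [u for u in S if u != u_prime]
    image = [tuple(a + lam[u] * b for a, b in zip(u, u_prime)) for u in S_prime]
    return image + [u_prime]


if __name__ == "__main__":
    # A linearly independent set S spanning a proper subspace of Q^4.
    S = [(1, 2, 0, 0), (0, 1, 1, 0), (1, 0, 1, 0)]
    u_prime = S[2]
    lam = {S[0]: 5, S[1]: -2}  # arbitrary illustrative coefficients
    T = shear(S, u_prime, lam)
    print("f is injective:", len(set(T[:-1])) == len(S) - 1)
    print("u' not in f(S'):", u_prime not in T[:-1])
    print("f(S') ∪ {u'} independent:", rank(T) == len(T))
    print("same span:", rank(S) == rank(T) == rank(S + T))
```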


22.7. Linear space basis sets and basis families 


22.7.1 REMARK: Historical note on Cartesian coordinates.

The idea of a basis and coordinates for a linear space is essentially the same as Cartesian coordinatisation, 
which was originated in about 1637 by René Descartes. (See Boyer [236], page 76; Boyer/Merzbach [237],
pages 239, 319-320; Cajori [241], page 175.) 


22.7.2 DEFINITION: A basis is a linearly independent spanning set. 
A basis (set) for a linear space V is a linearly independent subset S of V such that span(S) = V. 


22.7.3 REMARK: Immediate extension of the basis concept to linear subspaces. 

Since every subspace W of a linear space V is a linear space, Definition 22.7.2 applies to subspaces of linear 
spaces also. Thus a basis for a linear subspace W of a linear space V is a linearly independent subset of W 
which spans W. 


22.7.4 THEOREM: Every linearly independent set of vectors is a basis for its span. 
Every linearly independent set of vectors is a basis for its span. 


PROOF: Let S be a linearly independent set of vectors in a linear space V. Then span(S) is a linear subspace
of V by Theorem 22.4.7 (i), and S ⊆ span(S) by Theorem 22.4.7 (iii). Hence S is a basis for span(S) by
Definition 22.7.2. 


22.7.5 REMARK: The added convenience of indexed families of basis vectors. 

In order to traverse the vectors in a basis, it is useful for the basis vectors to be ordered in some way, and 
for many purposes a well-ordering is best. (For example, see Theorem 22.7.26.) It is often convenient to 
indicate the ordering on a basis by indexing the basis with an ordered set. Thus instead of a basis set S ⊆ V
for a linear space V, one would employ a family (v_α)_{α∈A} ∈ V^A, where A is an ordered set.


Even if the index set is not ordered, it is often convenient to specify a family of vectors rather than a set. 
For example, two bases expressed as families (v_α)_{α∈A} and (w_α)_{α∈A} can be more easily compared when they
are indexed. For example, one might state that v_α = 2w_α for all α ∈ A, which is difficult to express without
an index set. Indexed families are especially useful for defining vector coordinates relative to a basis. 


It is convenient, therefore, to also define a basis as a family of vectors. Definition 22.7.6 requires a basis 
family to consist of unique vectors. Definition 22.7.7 defines basis families for which the index set is ordered. 


22.7.6 DEFINITION: A basis (family) for a linear space V is an injective family v = (v_α)_{α∈A} ∈ V^A such
that Range(v) is a basis set for V.


22.7.7 DEFINITION: Order properties of basis families. 
An ordered basis (family) for a linear space V is a family v = (v_α)_{α∈A} ∈ V^A such that A is an ordered set,
Range(v) is a basis set for V and v : A → V is injective.


A totally ordered basis (family) for a linear space V is a family v = (v_α)_{α∈A} ∈ V^A such that A is a totally
ordered set, Range(v) is a basis set for V and v : A → V is injective.


A well-ordered basis (family) for a linear space V is a family v = (v_α)_{α∈A} ∈ V^A such that A is a
well-ordered set, Range(v) is a basis set for V and v : A → V is injective.


22.7.8 REMARK: Indexing finite basis sets. 

In the case of a finite-dimensional linear space V, it is always possible to index a basis set with the integers 
1, 2, ..., n for some n ∈ ℤ_0^+. Thus one may use a well-ordered finite basis family of the form (v_i)_{i=1}^n ∈ V^n
for some n ∈ ℤ_0^+, where v_i ≠ v_j for i ≠ j and Range(v) is a basis for V. The injectivity of the family v is
essential. In the case of a finite family, the word “sequence” may be used instead of “family”.

22.7.9 DEFINITION: The standard basis of a Cartesian linear space ℝ^n for n ∈ ℤ_0^+ is the sequence (e_i)_{i=1}^n
defined by e_i = (δ_{ij})_{j=1}^n ∈ ℝ^n for i ∈ ℕ_n.


The unit vector e_i in a Cartesian linear space ℝ^n, for n ∈ ℤ_0^+ and i ∈ ℕ_n, is the n-tuple e_i ∈ ℝ^n defined by
e_i = (δ_{ij})_{j=1}^n.


22.7.10 REMARK: The standard basis of the Cartesian linear space is a basis. 

Theorem 22.7.11 is effectively a special case of Theorem 22.8.3. (See Definition 22.2.19 for Cartesian linear 
spaces.) The triviality of Theorem 22.7.11 makes it slightly more difficult to prove. One must distinguish 
between scalar tuples λ ∈ ℝ^n and vectors v ∈ ℝ^n, which are the same set, but with different meaning.


22.7.11 THEOREM: Basic properties of the standard basis of the Cartesian linear space. 
Let n ∈ ℤ_0^+. Let (e_i)_{i=1}^n be the standard basis of the Cartesian linear space ℝ^n.


(i) ∀v ∈ ℝ^n, v = Σ_{i=1}^n v^i e_i.
(ii) (e_i)_{i=1}^n is a linearly independent family of vectors in ℝ^n.
(iii) (e_i)_{i=1}^n is a basis for ℝ^n.


PROOF: For part (i), let v ∈ ℝ^n. Then


∀j ∈ ℕ_n, (Σ_{i=1}^n v^i e_i)_j = Σ_{i=1}^n v^i (e_i)_j (22.7.1)
= Σ_{i=1}^n v^i δ_{ij} (22.7.2)
= v^j,


where line (22.7.1) follows from Definition 22.2.19, and line (22.7.2) follows from Definition 22.7.9. Hence
v = Σ_{i=1}^n v^i e_i.

For part (ii), let λ ∈ Fin(ℕ_n, ℝ). Then λ = (λ_i)_{i=1}^n ∈ ℝ^n. Suppose that Σ_{i=1}^n λ_i e_i = 0. Then for all j ∈ ℕ_n,
0 = Σ_{i=1}^n λ_i δ_{ij} = λ_j. Hence (e_i)_{i=1}^n is a linearly independent family of vectors in ℝ^n by Theorem 22.6.8 (ii).
Part (iii) follows from parts (i) and (ii) by Definition 22.7.6.


22.7.12 THEOREM: Every finite-dimensional linear space has a finite basis. 
If a linear space V is finite-dimensional, then V has a finite basis. In other words, if a linear space has a 
finite spanning set, then it has a linearly independent spanning set. 


PROOF: Let V be a finite-dimensional linear space over a field K. Then V is spanned by a finite set
S ∈ ℙ(V \ {0_V}) by Definition 22.5.7. (The zero vector can be omitted from S because it makes no difference
to the span of S.) Let n = #(S). Then n ∈ ℤ_0^+. If n ≤ 1, then S is a linearly independent subset of V.
Therefore S is a finite basis for V. So suppose that n ≥ 2. If S is not a linearly independent subset of V, then
by Definition 22.6.3, there is a vector v ∈ S such that v ∈ span(S \ {v}). It follows from Theorem 22.4.7 (viii)
that span(S) = span(S \ {v}). So S may be replaced with S \ {v}. But #(S \ {v}) = n − 1. Thus if the
assertion is true for #(S) = n − 1, then it is true for #(S) = n. Consequently by a standard induction
argument, the assertion is true for any finite spanning set S.
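
The induction in this proof amounts to a pruning procedure: repeatedly discard a vector which lies in the span of the others. The following is a minimal sketch of that procedure, assuming a finite spanning list of tuples of rationals in ℚ^n. The names `rank`, `in_span` and `prune_to_basis` are illustrative only. Which basis is obtained depends on the order in which redundant vectors are found.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length tuples, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def in_span(v, vectors):
    """True if v is in span(vectors); the span of the empty list is {0}."""
    return rank(list(vectors) + [v]) == rank(list(vectors))


def prune_to_basis(spanning):
    """Remove vectors one at a time until the remaining list is independent."""
    # The zero vector makes no difference to the span, so drop it first.
    S = [v for v in spanning if any(x != 0 for x in v)]
    while True:
        for i, v in enumerate(S):
            rest = S[:i] + S[i + 1:]
            if in_span(v, rest):
                S = rest  # span(S) is unchanged by removing v
                break
        else:
            return S  # no vector is in the span of the others


if __name__ == "__main__":
    spanning = [(1, 0), (0, 1), (1, 1), (2, 2), (0, 0)]
    basis = prune_to_basis(spanning)
    print(basis, rank(basis) == rank(spanning))
```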


22.7.13 THEOREM:  Finite-dimensional linear spaces have a basis with cardinality equal to dimension. 
If a linear space V is finite-dimensional, then V has a basis whose cardinality is equal to the dimension of V. 
In other words, #(S) = dim(V) for some basis S for V. 


PROOF: Let V be a linear space with finite dimension n. Then n ∈ ℤ_0^+ is the least cardinality of a spanning
set for V. Let X be the set of all spanning sets for V which have cardinality not exceeding n. Then X
is non-empty because otherwise the least cardinality of a spanning set would exceed n. But there are no
spanning sets with cardinality less than n. So X is a non-empty set of spanning sets for V with cardinality
equal to n. (Note that 0_V cannot be an element of S for any S ∈ X because removing this element would
give a spanning set with cardinality less than n.)


Let S € X. If n = 0, then S is a linearly independent subset of V. So V has a basis S with #(S) = 
dim(V) = 0. If n = 1, then S = {v}, for some v ≠ 0_V, is a linearly independent subset of V. Therefore S is a
basis for V with #(S) = dim(V) = 1. 
So suppose that n ≥ 2. If S is not a linearly independent subset of V, then by Definition 22.6.3, there is a
vector v ∈ S such that v ∈ span(S \ {v}). It follows from Theorem 22.4.7 (viii) that span(S) = span(S \ {v}).
So span(S \ {v}) = V. But #(S \ {v}) = n − 1, which contradicts the dimensionality of V. Therefore S is a
linearly independent spanning set for V with cardinality n. 


22.7.14 THEOREM:  Linearly independent sets within the span of a finite set have not-more elements. 
Let V be a linear space. Let S1 and S2 be linearly independent subsets of V such that S2 is finite and
S1 ⊆ span(S2). Then #(S1) ≤ #(S2).


PROOF: Let S1 and S2 be linearly independent subsets of a linear space V over a field K. Then 0_V ∉ S1
and 0_V ∉ S2 by Theorem 22.6.6 (i). Assume that S1 ⊆ span(S2). If #(S2) = 0, then span(S2) = {0_V}, and so
S1 = ∅. Therefore #(S1) = 0 ≤ #(S2) = 0.

If #(S2) = 1, then S2 = {v1} for some v1 ∈ V \ {0_V}. So span(S2) = {λ1 v1; λ1 ∈ K}. Suppose that
u1, u2 ∈ S1. Then u1, u2 ∈ span(S2). So u1 = μ1 v1 and u2 = μ2 v1 for some μ1, μ2 ∈ K \ {0_K}. But if
u1 ≠ u2, the linear independence of S1 implies that u1 ∉ span(S1 \ {u1}) ⊇ span({u2}) = {λ u2; λ ∈ K} =
{λ μ2 v1; λ ∈ K} = {λ v1; λ ∈ K} since μ2 ≠ 0_K. Clearly this is false because u1 = μ1 v1 ∈ {λ v1; λ ∈ K}.
This contradiction implies that u1 = u2. Therefore #(S1) ≤ 1 = #(S2).

To prove the assertion for a general linearly independent finite set S2, let n ∈ ℤ^+ and assume that the
assertion is true for all linearly independent subsets S2 of V such that #(S2) ≤ n − 1. Let S2 be a linearly
independent subset of V such that #(S2) = n. Then S2 = {v_j; j ∈ ℕ_n} for some family (v_j)_{j=1}^n of distinct
non-zero elements of V. Assume that S1 is a linearly independent subset of V such that S1 ⊆ span(S2).
Then S1 = {u_i; i ∈ I} for some (not necessarily finite) family (u_i)_{i∈I}, where for all i ∈ I, u_i = Σ_{j=1}^n λ_{ij} v_j for
some family (λ_{ij})_{j=1}^n of elements of K. In the special case that λ_{in} = 0 for all i ∈ I, all elements of S1 would
be contained in span(S2 \ {v_n}). But #(S2 \ {v_n}) = n − 1. So in this case, the assertion #(S1) ≤ #(S2)
follows from the inductive hypothesis for S2 satisfying #(S2) ≤ n − 1. Therefore suppose that λ_{kn} ≠ 0_K for
some k ∈ I. Let I′ = I \ {k}, and define the family (w_i)_{i∈I′} by


∀i ∈ I′, w_i = u_i − λ_{in} λ_{kn}^{-1} u_k
= Σ_{j=1}^n λ_{ij} v_j − λ_{in} λ_{kn}^{-1} Σ_{j=1}^n λ_{kj} v_j
= Σ_{j=1}^{n−1} (λ_{ij} − λ_{in} λ_{kn}^{-1} λ_{kj}) v_j


because λ_{ij} − λ_{in} λ_{kn}^{-1} λ_{kj} = 0 for j = n. So w_i ∈ span(S2 \ {v_n}) for all i ∈ I′. Let S1′ = {w_i; i ∈ I′}.
Then S1′ ⊆ span(S2 \ {v_n}). But S1′ is a linearly independent subset of V by Theorem 22.6.14 (iii) and
Theorem 22.6.6 (ii). So #(S1′) ≤ n − 1 by the inductive hypothesis because #(S2 \ {v_n}) = n − 1. But
#(S1 \ {u_k}) = #(S1′) by Theorem 22.6.14 (i), and u_k ∉ S1′ by Theorem 22.6.14 (ii). Therefore #(S1) =
#(S1′) + 1 ≤ n = #(S2).


22.7.15 THEOREM: Linearly independent set cardinality is bounded by the finite dimension of a space. 
Let S be a linearly independent subset of a finite-dimensional linear space V. Then #(S) ≤ dim(V).


PROOF: Let S be a linearly independent subset of a finite-dimensional linear space V. By Theorem 22.7.13,
V has a basis B with #(B) = dim(V). Then by Definition 22.7.2, B is a linearly independent subset of V
and span(B) = V. Then S ⊆ span(B). So by Theorem 22.7.14, #(S) ≤ #(B) = dim(V).


22.7.16 THEOREM: The dimension of a finite-dimensional space equals the cardinality of every basis. 
Every basis of a finite-dimensional linear space has cardinality equal to the dimension of the space. In other 
words, every basis S for V satisfies #(S) = dim(V). 


PROOF: Let V be a linear space with finite dimension n ∈ ℤ_0^+. By Theorem 22.7.13, there is at least one
basis S for V which has #(S) = dim(V). Let S′ be any basis for V. Then S and S′ are linearly independent
subsets of V such that S ⊆ span(S′) = V and S′ ⊆ span(S) = V. So #(S) ≤ #(S′) and #(S′) ≤ #(S)
by Theorem 22.7.14. Therefore #(S) = #(S′) by Theorem 13.1.8. Hence every basis S for V satisfies
#(S) = dim(V). 


22.7.17 REMARK: Uniqueness of the cardinality of minimal spanning sets. 
Theorem 22.7.16 may seem unsurprising because one speaks so often of the dimension of a space. But 
mathematics has many kinds of algebraic and analytical structures which may be “spanned” in some sense
by a relatively small subset, using only a restricted set of operations. An analytical example of this is the
open bases in Section 32.2 and open subbases in Section 32.3 for topological spaces. It is not always obvious
that all minimal spanning sets should have the same cardinality. Consider for example the additive group
ℤ_6000 = {x ∈ ℤ_0^+; x < 6000} with addition defined modulo 6000. (See Definition 16.5.19 for the modulo
function.) This group is spanned by the subset {2, 3} because every element may be written as a finite sum
of copies of these two elements. This is also a minimal spanning set because clearly {2} and {3} do not span ℤ_6000.
However, the subset {1} is also a minimal spanning set for ℤ_6000, and obviously #({1}) ≠ #({2, 3}).
Therefore it should perhaps be seen as slightly surprising that absolutely all minimal spanning sets of a
linear space have exactly the same cardinality. (In fact, the dimension of a finite-dimensional linear space
over a given field completely determines the space up to isomorphism, as mentioned in Remark 23.1.18.) This would help to
explain why the proof of this fact is not entirely trivial. 
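
The claims about ℤ_6000 in this remark can be verified by brute force. The following is a minimal sketch, assuming that “spanned” means closure under finite sums of the generators modulo 6000, with the empty sum equal to 0. The function names `generated_subset` and `spans` are illustrative only.

```python
def generated_subset(generators, modulus):
    """All finite sums (modulo `modulus`) of elements of `generators`."""
    reached = {0}   # the empty sum
    frontier = [0]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = (x + g) % modulus
            if y not in reached:
                reached.add(y)
                frontier.append(y)
    return reached


def spans(generators, modulus):
    """True if every element of Z_modulus is a finite sum of generators."""
    return len(generated_subset(generators, modulus)) == modulus


if __name__ == "__main__":
    for gens in ({2, 3}, {2}, {3}, {1}):
        print(sorted(gens), spans(gens, 6000))
```

Since 6000 is divisible by both 2 and 3, the sets {2} and {3} reach only the multiples of 2 and of 3 respectively, which is why neither spans the group on its own.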


For some slightly different versions of Theorem 22.7.16, see MacLane/Birkhoff [110], page 200; Lang [108], 
pages 140-141; Pinter [122], page 287; Ash [50], pages 93-94; Cullen [64], page 85; S. Warner [155], page 640; 
Franklin [71], pages 35-36; Shilov [134], pages 40-41; Schneider/Barker [132], pages 125-126; Stoll [144], 
page 40; Curtis [65], page 37; Hartley/Hawkes [90], page 101; Kaplan/Lewis [99], pages 117, 677. 


22.7.18 THEOREM: The dimension of a free linear space on a finite set equals the set’s cardinality. 
Let S be a finite set. Let K be a field. Then dim(FL(S, K)) = #(S). 


PROOF: For any finite set S and field K, FL(S, K) = K^S by Theorem 22.2.14 (iv). Let n = #(S). Then
n ∈ ℤ_0^+ and the n elements e_i = (δ_{ij})_{j∈S} of K^S for i ∈ S constitute a spanning set for K^S because
every element v ∈ K^S may be written as v = Σ_{i∈S} v_i e_i. But the vectors e_i are linearly independent
by Theorem 22.6.8 (ii) because if Σ_{i∈S} λ_i e_i = 0_{K^S} for some (λ_i)_{i∈S}, then (λ_i)_{i∈S} = Σ_{i∈S} λ_i e_i = 0_{K^S},
which means that λ_i = 0_K for all i ∈ S. Therefore (e_i)_{i∈S} is a basis for K^S by Definition 22.7.2. Hence
dim(K^S) = #(S) by Theorem 22.7.16.


22.7.19 REMARK: Every free linear space has an obvious basis, but it might not be well-ordered. 

Free linear spaces on infinite sets provide examples of infinite-dimensional linear spaces. (See Section 22.2 
for free linear spaces.) For any set S and field K, let V be the free linear space on S with field K. 
Then B = {f : S → {0_K, 1_K}; #({x ∈ S; f(x) ≠ 0}) = 1} is a basis for V. For example, the free
linear space FL(ℝ, ℝ) has a basis consisting of all functions of the form f_a ∈ V defined by f_a(a) = 1 and
∀x ∈ ℝ \ {a}, f_a(x) = 0. However, this basis cannot be well-ordered without the assistance of the axiom of
choice. Thus there is no explicit well-ordering of this basis. 


22.7.20 REMARK: A well-ordered basis always exists if you have faith. 

It is not immediately obvious that every linear space has a basis. In fact, the axiom of choice is required to 
make this assertion in full generality in Theorem 22.7.21. As a bonus, the basis can always be well-ordered. 
The disadvantage is that you need to have faith in the AC pixies when you are told that a well-ordered basis 
exists because it will never be revealed to any human being. This book is based on scepticism, not faith. 
AC delivers nothing but vain hope. (Proofs which use the axiom of choice are typically short and sweet, but 
they leave the stomach feeling empty as if the meal was just a dream. A theorem which uses AC is like an 
IOU for a two-week holiday in the Andromeda Galaxy. It’s a nice thought, but a waste of paper.) 


22.7.21 THEOREM [ZF+AC]: Every linear space has a well-ordered basis. 
Every linear space has a well-ordered basis. 


PROOF: Let V be a linear space over a field K. By the well-ordering theorem (Remark 7.11.12 part (2)), 
the elements of V \ {0_V} may be well-ordered. That is, there is a family (v_α)_{α∈A} such that A is a well-ordered
set and v : A → V \ {0_V} is a bijection.


Define f : A → {0, 1} inductively for all α ∈ A by


f(α) = 1 if v_α ∉ span({v_β; β < α and f(β) = 1}),
f(α) = 0 otherwise.


Then by transfinite induction (Theorem 11.8.2), {v_α; f(α) = 1} is a well-ordered basis for V.

22.7.22 REMARK: How to develop linear spaces without the axiom of choice.

The AC-tainted Theorem 22.7.21 is not used in this book except for occasional theorems which are presented 
for comparison with the literature. All theorems derived from AC-tainted theorems such as Theorem 22.7.21 
are, of course, also AC-tainted. 


When a basis is required for an infinite-dimensional linear space, the existence of a basis will be specified as 
a pre-condition in any theorem or definition which requires it. Then the reader who accepts the axiom of 
choice will understand “for any linear space with a basis” as “for any linear space”.


The following conditions may be placed on linear spaces in theorems and definitions. 


(1) “For all linear spaces ...” 
(2) “For all linear spaces which have a basis ...” 
(3) “For all linear spaces which have a well-ordered basis ...”


The reader who accepts the axiom of choice will think of all three of these pre-conditions as meaning the
same thing: “For all linear spaces ...”. 


It would be unreasonable to restrict linear algebra to the study of linear spaces which have a well-ordered 
basis. For example, the unrestricted linear space ℝ^ℝ with pointwise addition and multiplication apparently
has no definable basis. This is by no means a pathological space. But no basis can be constructed for ℝ^ℝ.
So all theorems and definitions for ℝ^ℝ must be valid in the absence of a basis. The axiom of choice is a
“faith axiom”. It does not deliver a basis for ℝ^ℝ which human eyes can gaze upon. The AC pixie reveals to
the joyous faithful that a basis exists in some astral plane, but non-believers are doomed to walk eternally in 
the valley of darkness, bitter and forever athirst for the metaphysical sets which they spurn. But seriously 
though, any theorem or definition which constructs something out of an unknowable set is merely creating 
a fiction from a fiction. Garbage in, garbage out! 


22.7.23 REMARK: The real numbers cannot be well ordered. 

The real numbers cannot be explicitly well-ordered. This must mean that 2^ω also cannot be explicitly
well-ordered. AC says that they can be well-ordered, but no such well-ordering can ever be presented. It
follows that the linear space FL(ℝ, ℝ) with pointwise addition and multiplication does not have a “definable”
well-ordered basis. According to Feferman [411], page 325: 


No set-theoretically definable well-ordering of the continuum can be proved to exist from the 
Zermelo-Fraenkel axioms together with the axiom of choice and the generalized continuum 
hypothesis. 


(This quotation is also discussed in Remark 7.12.5.) For the linear space FL(ℝ, ℝ), there is a basis {e_x; x ∈
ℝ}, where e_x : ℝ → ℝ is defined by e_x(y) = δ_{x,y} for all x, y ∈ ℝ. But well-ordering this basis requires a
well-ordering for ℝ, which is not available in plain ZF set theory.


22.7.24 REMARK: Examples of linear spaces with and without a basis. 

Some infinite-dimensional linear spaces, such as the free linear space FL(ℤ_0^+, ℝ) on the non-negative integers
over the real number field, clearly have a well-ordered basis. The free linear space FL(ℝ, ℝ) clearly has
a basis, but without the axiom of choice, the basis cannot be well ordered. The linear space ℝ^ℝ of all
real-valued functions on the real numbers apparently does not even have a basis in the absence of the axiom 
of choice. This is summarised in the following table. 


linear space        basis    well-ordered basis

FL(ℤ_0^+, ℝ)        yes      yes
FL(ℝ, ℝ)            yes      -
ℝ^ℝ                 -        -


For the unrestricted linear space ℝ^ℝ = UL(ℝ, ℝ) the set {e_x; x ∈ ℝ} in Remark 22.7.23 is not a basis
because it only spans functions in ℝ^ℝ which have finite support. (In other words, it spans only the subspace
FL(ℝ, ℝ) of UL(ℝ, ℝ).)


22.7.25 REMARK: Application of the axiom of choice to the extension of a linearly independent set to a basis.
It often happens that one requires an extension of a linearly independent set S of vectors in a linear space 
to a basis for the whole space. The existence of a basis for the whole space guarantees this extension in 
the case that the set S is empty. But if the set S is infinite, one must assume in general the existence of
a well-ordered basis for the whole space as in Theorem 22.7.26. With the axiom of choice, the existence of 
basis extensions is guaranteed for all linear spaces as in Theorem 22.7.27. 


22.7.26 THEOREM: Construction of a well-ordered basis including a given linearly independent set. 
Let S be a linearly independent subset of a linear space V which has a well-ordered basis. Then there is a 
well-ordered basis for V which includes S. 


PROOF: Let V be a linear space with basis (va)aea, where A is a well-ordered set. Let S be a linearly 
independent subset of V. Define f : A → {0, 1} inductively for α ∈ A by


f(α) = 1 if v_α ∉ span(S ∪ {v_β; β < α and f(β) = 1}),
f(α) = 0 otherwise.


Then by transfinite induction (Theorem 11.8.2), S ∪ {v_α; f(α) = 1} is a well-ordered basis for V which
includes S. 
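
In the finite-dimensional case, the inductive definition of f in this proof reduces to a single pass through the given basis in index order. The following is a minimal sketch of that finite analogue only, assuming vectors are tuples of rationals in ℚ^n and that the well-ordered basis is given as a Python list. The names `rank`, `in_span` and `extend_to_basis` are illustrative only. The transfinite case is not captured by this code.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length tuples, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def in_span(v, vectors):
    """True if v is in span(vectors); the span of the empty list is {0}."""
    return rank(list(vectors) + [v]) == rank(list(vectors))


def extend_to_basis(S, ordered_basis):
    """Extend the independent list S to a basis using vectors of ordered_basis."""
    chosen = []  # the vectors v_alpha with f(alpha) = 1, in index order
    for v in ordered_basis:
        # f(alpha) = 1 exactly when v_alpha is outside the span of S
        # together with the earlier chosen basis vectors.
        if not in_span(v, list(S) + chosen):
            chosen.append(v)
    return list(S) + chosen


if __name__ == "__main__":
    standard = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    extended = extend_to_basis([(1, 1, 0)], standard)
    print(extended, rank(extended) == len(extended) == 3)
```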


22.7.27 THEOREM [ZF+AC]: Every linearly independent set is included in a well-ordered basis for the space. 
Let S be a linearly independent subset of a linear space V. Then there is a well-ordered basis for V which 
includes S. 


PROOF: This follows immediately from Theorems 22.7.21 and 22.7.26.


22.8. Vector components and component maps 


22.8.1 REMARK: The numericisation of space. 

The motivation for defining bases for linear spaces is Theorem 22.6.8 (iii). This states that for a fixed basis, 
every vector in the linear space can be equivalently represented in terms of coefficients with respect to the 
basis. As mentioned in Remark 22.7.1, the original model for this idea is Cartesian coordinatisation, which 
locates all points in two or three dimensions according to a rectangular grid based on an origin and two or 
three basis vectors. In other words, space can be numericised. 


22.8.2 THEOREM: Unique linear combination condition for a basis of a linear space. 
A subset B of a linear space V over a field K is a basis for V if and only if

$$\forall v \in V,\ \exists' k : B \to K,\quad \#(\{e \in B;\ k(e) \neq 0\}) < \infty \ \text{ and } \ \sum_{e\in B} k(e)e = v.$$

In other words, for all $v \in V$, there is a unique function $k : B \to K$ such that $k(e) = 0$ for all but a finite
number of vectors $e \in B$ and $v = \sum_{e\in B} k(e)e$.


PROOF: This follows from Theorem 22.6.8 (iii).


22.8.3 THEOREM: Unique linear combinations for vectors in a finite-dimensional linear space. 
A vector family $(e_i)_{i=1}^n \in V^n$ in a linear space V over a field K is a basis for V if and only if

$$\forall v \in V,\ \exists' k \in K^n,\quad \sum_{i=1}^n k_i e_i = v.$$

In other words, for all $v \in V$, there is a unique n-tuple $k \in K^n$ such that $v = \sum_{i=1}^n k_i e_i$.


PROOF: This follows from Theorem 22.6.8 (iii). Alternatively, Theorem 22.8.3 follows as a special case of
Theorem 22.8.2.


22.8.4 REMARK: The component map for a linear space with a given basis. 
For any set of vectors $B \subseteq V$ in a linear space V over a field K, the map $\phi_B : \mathrm{Fin}(B, K) \to V$ defined by
$\phi_B(k) = \sum_{e\in B} k(e)e$ for $k \in \mathrm{Fin}(B, K)$ is well defined. Theorem 22.8.2 implies that $\phi_B$ is a bijection if and
only if B is a basis. So B is a basis if and only if the inverse map $\kappa_B = \phi_B^{-1} : V \to \mathrm{Fin}(B, K)$ is well defined.
This inverse map assigns unique components relative to the basis B for all vectors in the linear space. (See
Figure 22.8.1.)

[Figure 22.8.1 Component maps and linear combinations: the linear combination map $\phi_B : \mathrm{Fin}(B, K) \to V$ and the vector component map $\kappa_B = \phi_B^{-1} : V \to \mathrm{Fin}(B, K)$.]


The map $\kappa_B : V \to \mathrm{Fin}(B, K)$ may be thought of as a linear chart for V. Each basis for a linear space yields
a different linear chart. These linear charts are the “component maps” in Definition 22.8.6. They are more
or less analogous to the charts for locally Cartesian spaces in Section 49.6.


In the figure/frame bundle context (as in Section 20.10), the linear space V would be considered to be the
“total space”, while the coordinate space $\mathrm{Fin}(B, K)$ or $K^n$ would be the “fibre space”, and the coordinate
map $\kappa_B : V \to \mathrm{Fin}(B, K)$ would be the “fibre map” or “measurement map”. The basis B would be the
“frame” which is used for making measurements of vectors expressed in terms of coordinates. Multiple bases
are thought of as alternative “frames of reference” for making observations.


22.8.5 REMARK: The component map for a finite-dimensional linear space with a given basis. 

Remark 22.8.4 may be expressed more simply for finite-dimensional linear spaces. For any vector family
$B = (e_i)_{i=1}^n \in V^n$ in a linear space V over a field K, the map $\phi_B : K^n \to V$ defined by $\phi_B(k) = \sum_{i=1}^n k_i e_i$
for $k \in K^n$ is a well-defined map. Theorem 22.8.3 implies that $\phi_B$ is a bijection if and only if B is a basis.
So B is a basis if and only if the inverse map $\kappa_B = \phi_B^{-1} : V \to K^n$ is well defined. This inverse map assigns
unique components relative to the basis B for all vectors in the linear space.


The map $\kappa_B : V \to K^n$ may be thought of as a linear chart for V. Each basis for a linear space yields a
different linear chart. These linear charts are the “component maps” in Definition 22.8.8. They are analogous
to the charts for locally Cartesian spaces in Section 49.6.


22.8.6 DEFINITION: The component map for a basis set $B \subseteq V$ of a linear space V over a field K is the
map $\kappa_B : V \to \mathrm{Fin}(B, K)$ defined by $\forall v \in V,\ v = \sum_{e\in B} \kappa_B(v)(e)e$.


22.8.7 DEFINITION: The component map for a basis family $B = (e_i)_{i\in I} \in V^I$ of a linear space V over a
field K is the map $\kappa_B : V \to \mathrm{Fin}(I, K)$ defined by $\forall v \in V,\ v = \sum_{i\in I} \kappa_B(v)(i)e_i$.


22.8.8 DEFINITION: The component map for a basis $B = (e_i)_{i=1}^n \in V^n$ of an n-dimensional linear space V
over a field K for $n \in \mathbb{Z}_0^+$ is the map $\kappa_B : V \to K^n$ defined by $\forall v \in V,\ v = \sum_{i=1}^n \kappa_B(v)_i e_i$.
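
For a concrete finite-dimensional case, the component map of Definition 22.8.8 can be computed by solving a linear system. The sketch below is an illustration only, under assumptions which are not in the text: the field is $\mathbb{Q}$, the space is $\mathbb{Q}^n$ with vectors stored as tuples, and the function names `components` and `combine` are hypothetical stand-ins for $\kappa_B$ and $\phi_B$.

```python
from fractions import Fraction

def components(basis, v):
    """Return the tuple k with sum_i k[i] * basis[i] == v, using Gauss-Jordan
    elimination over the rationals. Assumes `basis` really is a basis of Q^n."""
    n = len(basis)
    # Augmented matrix: column i holds basis[i], the last column holds v.
    M = [[Fraction(basis[i][r]) for i in range(n)] + [Fraction(v[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]                 # normalise pivot to 1
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return tuple(M[r][n] for r in range(n))

def combine(basis, k):
    """Linear combination map: k -> sum_i k[i] * basis[i]."""
    n = len(basis[0])
    return tuple(sum(k[i] * basis[i][r] for i in range(len(basis))) for r in range(n))

B = [(1, 1), (1, -1)]
v = (3, 1)
k = components(B, v)
print(k)                       # the components of v are 2 and 1
print(combine(B, k) == v)      # the combination map inverts the component map
print(components(B, B[0]))     # components of a basis vector: 1 and 0
```

The last line illustrates the Kronecker delta property of the component map for basis vectors.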


22.8.9 THEOREM: Some basic properties of the component map. 
Let V be a linear space. 

(i) Let B be a basis set for V. Then $\forall v, e \in B,\ \kappa_B(v)(e) = \delta_{v,e}$.

(ii) Let $(e_i)_{i\in I}$ be a basis family for V. Then $\forall i, j \in I,\ \kappa_B(e_i)(j) = \delta_{ij}$.

(iii) Let $(e_i)_{i=1}^n$ be a basis family for V for some $n \in \mathbb{Z}_0^+$. Then $\forall i, j \in \mathbb{N}_n,\ \kappa_B(e_i)_j = \delta_{i,j}$.


PROOF: For part (i), let B be a basis set for V. Then by Definition 22.7.2, B is a linearly independent
subset of V. So $\forall e \in B,\ \forall \lambda \in \mathrm{Fin}(B, K),\ (e = \sum_{e'\in B} \lambda_{e'} e' \Rightarrow \forall v \in B,\ \lambda_v = \delta_{v,e})$ by Theorem 22.6.8 (iv).
But $e = \sum_{e'\in B} \kappa_B(e)(e')e'$ for all $e \in B$ by Definition 22.8.6. So $\forall v \in B,\ \kappa_B(e)(v) = \delta_{v,e}$ for all $e \in B$.
Hence $\forall v, e \in B,\ \kappa_B(v)(e) = \delta_{v,e}$.
Part (ii) follows from part (i) with $B = \{e_i;\ i \in I\}$.
Part (iii) follows from part (i) with $B = \{e_i;\ i \in \mathbb{N}_n\}$.


22.8.10 REMARK: The subtle difference between a free linear space and its set of vectors. 

The set Fin(B, K) in Definition 22.8.6 is the underlying set of vectors for the free linear space FL(B, K)
in Definition 22.2.10. This free linear space is the natural linear space structure for Fin(B, K). This set
and linear space may be used interchangeably, as in Theorem 22.8.11. Thus the component map $\kappa_B : V \to \mathrm{Fin}(B, K)$
is essentially the same as the map $\kappa_B : V \to \mathrm{FL}(B, K)$.


22.8.11 THEOREM: The component map is a bijection to the free linear space on its basis.
Let B be a basis set for a linear space V over a field K. Then the component map $\kappa_B : V \to \mathrm{FL}(B, K)$ is a
well-defined bijection.


PROOF: Let B be a basis set for a linear space V over a field K. Define the linear combination map
$\phi_B : \mathrm{Fin}(B, K) \to V$ by $\phi_B(k) = \sum_{e\in B} k(e)e$ for $k \in \mathrm{Fin}(B, K)$. By Theorem 22.8.2, this map is a
bijection if and only if B is a basis, so $\phi_B$ is a bijection. But Definition 22.8.6 says that $\kappa_B : V \to \mathrm{Fin}(B, K)$ is defined by
$\forall v \in V,\ v = \sum_{e\in B} \kappa_B(v)(e)e$, which is equivalent to $\forall v \in V,\ v = \phi_B(\kappa_B(v))$. In other words, $\mathrm{id}_V = \phi_B \circ \kappa_B$.
(That is, $\kappa_B$ is a right inverse of $\phi_B$.) So $\kappa_B$ is the unique inverse of $\phi_B$ because $\phi_B$ is a bijection, and
$\kappa_B : V \to \mathrm{Fin}(B, K)$ is also a bijection.


22.8.12 THEOREM:  Linearity of the component map. 
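Let B be a basis set for a linear space V over a field K. Then the component map $\kappa_B : V \to \mathrm{FL}(B, K)$
satisfies

$$\forall \lambda \in \mathrm{Fin}(V, K),\quad \kappa_B\Big(\sum_{v\in V} \lambda_v v\Big) = \sum_{v\in V} \lambda_v \kappa_B(v), \tag{22.8.1}$$

where linear combinations in FL(B, K) are defined in terms of the pointwise linear space operations specified
in Definition 22.2.10.


PROOF: Let $\lambda \in \mathrm{Fin}(V, K)$. Then $\sum_{v\in V} \lambda_v v$ is a well-defined element of V. Therefore
$\sum_{v\in V} \lambda_v v = \sum_{e\in B} \kappa_B(\sum_{v\in V} \lambda_v v)(e)e$ by Definition 22.8.6. But $v = \sum_{e\in B} \kappa_B(v)(e)e$ for all $v \in V$ by Definition 22.8.6.
So

$$\sum_{v\in V} \lambda_v v = \sum_{v\in V} \lambda_v \sum_{e\in B} \kappa_B(v)(e)e = \sum_{e\in B} \sum_{v\in V} \lambda_v \kappa_B(v)(e)e.$$

(Swapping summations is permissible because the sums are finite.) So $\kappa_B(\sum_{v\in V} \lambda_v v)(e) = \sum_{v\in V} \lambda_v \kappa_B(v)(e)$
for all $e \in B$ since the components of $\sum_{v\in V} \lambda_v v$ with respect to B are unique by Theorem 22.6.8 (iii). Hence

$$\forall \lambda \in \mathrm{Fin}(V, K),\ \forall e \in B,\quad \kappa_B\Big(\sum_{v\in V} \lambda_v v\Big)(e) = \sum_{v\in V} \lambda_v \kappa_B(v)(e),$$

which is equivalent to line (22.8.1) because linear operations on FL(B, K) are defined pointwise.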


22.8.13 THEOREM: Images of linear subspaces by a component map are free linear space subspaces. 
Let B be a basis set for a linear space V over a field K. Let U be a linear subspace of V. Then $\kappa_B(U)$ is a
linear subspace of FL(B, K).


PROOF: Let $\kappa_B : V \to \mathrm{FL}(B, K)$ be the component map for B. Let U be a linear subspace of V. Let $u'_1, u'_2 \in \kappa_B(U)$
and $\lambda_1, \lambda_2 \in K$. Then $u'_1 = \kappa_B(u_1)$ and $u'_2 = \kappa_B(u_2)$ for some $u_1, u_2 \in U$. By Theorem 22.8.12,
$\kappa_B(\lambda_1 u_1 + \lambda_2 u_2) = \lambda_1 u'_1 + \lambda_2 u'_2$. So $\lambda_1 u'_1 + \lambda_2 u'_2 \in \kappa_B(U)$ because $\lambda_1 u_1 + \lambda_2 u_2 \in U$. Hence $\kappa_B(U)$ is a linear subspace of FL(B, K) by
Theorem 22.1.13.


22.8.14 REMARK: Avoidance of the axiom of choice when defining component maps. 

No axiom of choice is required for definitions of component maps. In Remark 22.8.4 and Theorem 22.8.11,
for example, the existence and uniqueness of the map $\kappa_B : V \to \mathrm{Fin}(B, K)$ as the inverse of the linear
combination map $\phi_B : \mathrm{Fin}(B, K) \to V$ is guaranteed as follows. Every function has a unique well-defined
inverse relation by Definition 9.6.13, and this relation is then a function by Theorem 22.6.8 (iii),
which follows from Definition 22.6.3 for linear independence of sets of vectors. Definition 22.7.2 defines a
basis to be a linearly independent spanning set for a linear space, which yields both existence and uniqueness
for values of the inverse relation $\kappa_B = \phi_B^{-1}$.


An axiom of choice issue does arise if one wishes to claim that every linear space has a basis, as in Theorem 22.7.21, but
when the existence of a basis is assumed, the well-definition of the component map for each basis follows.


[ www. geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13] 




22.8.15 REMARK: The intentional inconvenience of the explicit component map notations. 

Instead of clumsy component notations such as $\kappa_B(v)(e)$, $\kappa_B(v)(i)$ or $\kappa_B(v)_i$, simpler notations such as $v_i$
or $v^i$ for vectors v and basis index elements i are more commonly used. The simpler kinds of notation hide
the complexity of the inversion of the linear combination map. Although there is no axiom of choice issue
here, as explained in Remark 22.8.14, the inversion of the linear combination map can be computationally
non-trivial. Roughly speaking, this inversion is equivalent to solving n simultaneous linear equations for n
variables in the case of a linear space V with $\dim(V) = n \in \mathbb{Z}_0^+$. The solution of these equations can be
achieved by standard matrix inversion algorithms, but the computation time can be quite large if n equals
$10^8$ or more, for example. And of course the case of infinite-dimensional spaces is even more difficult than that.
Thus clumsy notations such as $\kappa_B(v)(i)$ are intended to communicate the idea that a non-trivial procedure
is required here.


The difficulty of computing vector components should not be confused with the relative simplicity of computing
“covariant components”, which are expressed in terms of an inner product. For example, if $(e_i)_{i=1}^n$
is a basis for an n-dimensional linear space V, each covariant component $\eta(v, e_i)$ of $v \in V$ for $i \in \mathbb{N}_n$ is
computed with reference to only two vectors, v and $e_i$. But the “contravariant components” $\kappa_B(v)(i)$ depend
on v and all of the basis vectors. To compute even a single component, it is necessary to solve for all of the
components simultaneously.
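
The contrast between the two kinds of component can be seen in a small computation. The following sketch is an illustration only, under assumptions which are not in the text: the space is $\mathbb{Q}^2$, the inner product is the standard dot product, and the function names `dot` and `contravariant_2d` are hypothetical. Only the second basis vector is changed between the two runs.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on Q^2 (an assumed choice of inner product)."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def contravariant_2d(e1, e2, v):
    """Solve k1*e1 + k2*e2 = v by Cramer's rule (two dimensions only)."""
    det = Fraction(e1[0] * e2[1] - e1[1] * e2[0])
    k1 = (v[0] * e2[1] - v[1] * e2[0]) / det
    k2 = (e1[0] * v[1] - e1[1] * v[0]) / det
    return (k1, k2)

v = (2, 3)
e1 = (1, 0)
for e2 in [(0, 1), (1, 1)]:
    # The covariant component for e1 uses only v and e1.
    # The contravariant component for e1 also depends on e2.
    print(dot(v, e1), contravariant_2d(e1, e2, v))
```

The covariant component with respect to the first basis vector is the same in both runs, whereas the first contravariant component changes from 2 to -1 when the second basis vector is replaced.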


22.8.16 DEFINITION: The component function for a vector v with respect to a basis $B \subseteq V$ for a linear space
V over a field K is the function $\kappa_B(v) \in \mathrm{Fin}(B, K)$, where $\kappa_B$ is the component map in Definition 22.8.6.


22.8.17 DEFINITION: The component tuple for a vector v with respect to a basis $B = (e_i)_{i=1}^n$ for an n-dimensional
linear space V over a field K for $n \in \mathbb{Z}_0^+$ is the tuple $\kappa_B(v) \in K^n$, where $\kappa_B$ is the component
map in Definition 22.8.8.


22.8.18 THEOREM: Existence of Cartesian space basis with a subset spanning a given subspace. 
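Let W be a linear subspace of $K^n$ for some $n \in \mathbb{Z}_0^+$.

(i) There is a basis $(e_i)_{i=1}^n$ for $K^n$ such that $(e_i)_{i\in I}$ is a basis for W for some subset I of $\mathbb{N}_n$.

(ii) There is a basis $(e_i)_{i=1}^n$ for $K^n$ such that $(e_i)_{i=1}^m$ is a basis for W for some $m \in \mathbb{Z}_0^+$ with $m \leq n$.


PROOF: For part (i), let W be a linear subspace of $K^n$ for some $n \in \mathbb{Z}_0^+$. Let I be a subset of $\mathbb{N}_n$ such
that the map $\pi_I : K^n \to K^I$ defined by $\pi_I : v \mapsto (v_i)_{i\in I} = v|_I$ restricts to a bijection from W to $K^I$. Such a set I exists by
Theorem 22.5.11 (i). In the following, $\pi_I^{-1}$ denotes the inverse of this restricted bijection. Let $S_I = \{\pi_I^{-1}(\delta_i);\ i \in I\}$ and $T_{I'} = \{\delta'_i;\ i \in I'\}$, where $\delta_i = (\delta_{ij})_{j\in I} \in K^I$ for $i \in I$,
$\delta'_i = (\delta_{ij})_{j=1}^n \in K^n$ for $i \in \mathbb{N}_n$, and $I' = \mathbb{N}_n \setminus I$. Then $K^n = \mathrm{span}(S_I \cup T_{I'})$ by Theorem 22.5.11 (iv). Let $B = S_I \cup T_{I'}$.
Define $(e_i)_{i=1}^n$ by $e_i = \pi_I^{-1}(\delta_i)$ for $i \in I$ and $e_i = \delta'_i$ for $i \in I'$. Then $B = \{e_i;\ i \in \mathbb{N}_n\}$.

To show that B is a linearly independent subset of $K^n$, suppose that $\sum_{i=1}^n \lambda_i e_i = 0$ for some $(\lambda_i)_{i=1}^n \in K^n$.
Then $0 = \sum_{i\in I} \lambda_i \pi_I^{-1}(\delta_i) + \sum_{i\in I'} \lambda_i \delta'_i = \sum_{i\in I} \lambda_i(\pi_I^{-1}(\delta_i) - \delta'_i) + \sum_{i=1}^n \lambda_i \delta'_i$. But $(\pi_I^{-1}(\delta_i) - \delta'_i)_j = 0$ for $j \in I$
because $\pi_I$ and therefore $\pi_I^{-1}$ does not alter the jth component of $\delta_i$. So $\pi_I^{-1}(\delta_i) - \delta'_i = \sum_{j\in I'} b_{ij}\delta'_j$ for some
$(b_{ij})_{j\in I'} \in K^{I'}$ for all $i \in I$. Let $\lambda'_i = \lambda_i$ for $i \in I$ and $\lambda'_i = \lambda_i + \sum_{j\in I} \lambda_j b_{ji}$ for $i \in I'$. Then $0 = \sum_{i=1}^n \lambda'_i \delta'_i$.
Therefore $\lambda'_i = 0$ for all $i \in \mathbb{N}_n$. So $\lambda_i = 0$ for all $i \in I$, and so $\lambda_i = \lambda'_i - \sum_{j\in I} \lambda_j b_{ji} = 0$ for all $i \in I'$. Thus
$\lambda_i = 0$ for all $i \in \mathbb{N}_n$. So $(e_i)_{i=1}^n$ is a linearly independent family, which is therefore a basis for $K^n$.


The sub-family $(e_i)_{i\in I}$ consists of elements of W which span W by Theorem 22.5.11 (ii). Since they are
linearly independent, $(e_i)_{i\in I}$ is a basis for W by Definition 22.7.6, which verifies part (i).

For part (ii), let $m = \#(I)$, where I is as in part (i). Then $m \in \mathbb{Z}_0^+$ and $m \leq n$. Let $\phi : \mathbb{N}_n \to \mathbb{N}_n$ be
a permutation of $\mathbb{N}_n$ such that $\phi(\mathbb{N}_m) = I$. (Such a permutation may be constructed using the standard
enumeration for I in Definition 12.4.5.) Then $\phi|_{\mathbb{N}_m} : \mathbb{N}_m \to I$ is a bijection. Let $(\tilde{e}_i)_{i=1}^n$ be the basis for $K^n$
which is constructed for I in part (i). Define $(e_i)_{i=1}^n$ by $e_i = \tilde{e}_{\phi(i)}$ for all $i \in \mathbb{N}_n$. Then by part (i), $(e_i)_{i=1}^n$ is
a basis for $K^n$ and $(e_i)_{i=1}^m = (\tilde{e}_{\phi(i)})_{i=1}^m$, which is a re-indexing of $(\tilde{e}_j)_{j\in I}$, is a basis for W.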


22.8.19 THEOREM: Existence of a basis for which a subset of vectors spans a given subspace. 
Let U be a linear subspace of a finite-dimensional linear space V. Then there is a basis $(e_i)_{i=1}^n$ for V such
that $(e_i)_{i=1}^m$ is a basis for U, where $m = \dim(U)$ and $n = \dim(V)$.


PROOF: Let $B = (\bar{e}_i)_{i=1}^n$ be a basis for V. (Such a basis exists by Theorem 22.7.13.) The map $\kappa_B : V \to K^n$
is a bijection by Theorem 22.8.11. Let $W = \kappa_B(U)$. Then W is a linear subspace of $K^n$ by Theorem 22.8.13,
and $\kappa_B|_U : U \to W$ is a bijection because $\kappa_B$ is injective. By Theorem 22.8.18 (ii), $K^n$ has a basis $(e'_i)_{i=1}^n$
such that $(e'_i)_{i=1}^m$ is a basis for W for some $m \in \mathbb{Z}_0^+$ with $m \leq n$. Define $(e_i)_{i=1}^n$ by $e_i = \kappa_B^{-1}(e'_i)$ for all $i \in \mathbb{N}_n$.
Then $(e_i)_{i=1}^n$ is a basis for V and $(e_i)_{i=1}^m$ is a basis for U, since $\kappa_B^{-1}$ is a linear bijection by Theorems 22.8.11 and 22.8.12.


22.8.20 REMARK: Standard notations for vector components. 

It is customary to employ an abbreviated notation such as $v_i$ in place of expressions like $\kappa_B(v)(i)$, where $\kappa_B$
is the component map for a basis $B = (e_i)_{i\in I}$ for a linear space V, with $v \in V$ and $i \in I$. Sometimes the
superscript notation $v^i$ is used.


22.8.21 REMARK: Conventions for upper and lower indices for vector components. 

The tensor calculus convention of using a superscript index like $v^i$ for vector components instead of subscripts
like $v_i$ is unnecessary in linear algebra. In tensor calculus for Riemannian spaces, vector (and tensor)
indices may be raised and lowered to indicate that the components have been combined (or “contracted”)
with the metric tensor or its inverse. This is not a feature of simple linear algebra.


Upper and lower indices are also used in tensor calculus as a mnemonic for covariance or contravariance 
with respect to change-of-basis transformations. This mnemonic device is of some relevance to general 
linear algebra. The upper and lower index conventions are often useful for guessing correct expressions and 
equations, but they prove nothing. Avoiding these conventions forces one to make more careful checks while 
developing the basic theory. (See Remark 23.8.4 for further comment on this topic.) 


22.9. Vector component transition maps 


22.9.1 REMARK: Transition map things which do not work for infinite-dimensional linear spaces. 

The purpose of Examples 22.9.2 and 22.9.3 is to show what can go wrong in the case of infinite-dimensional
linear spaces. Example 22.9.2 gives the obvious coordinatisation of the linear space $\mathrm{FL}(\mathbb{Z}_0^+, \mathbb{R})$ of real-valued
sequences which have only a finite number of elements with a non-zero value. Example 22.9.3 gives a different
basis for the same linear space $\mathrm{FL}(\mathbb{Z}_0^+, \mathbb{R})$. But the component transition array between the two bases does
not have a finite number of non-zero elements in all rows and columns.

After linear functionals and dual bases have been introduced in Sections 23.4 and 23.6, Examples 22.9.2 
and 22.9.3 are used in Remark 23.7.4 to observe that algebraic dual spaces of infinite-dimensional linear 
spaces have serious problems which make them unsuitable for many applications. 


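22.9.2 EXAMPLE: The most obvious basis for real-valued vectors on the non-negative integers.
Let $V = \mathrm{FL}(\mathbb{Z}_0^+, \mathbb{R})$. Then $B = (e_i)_{i\in\mathbb{Z}_0^+}$ is a basis for V, where $e_i \in V$ is defined for $i \in \mathbb{Z}_0^+$ by $e_i(x) = \delta_{i,x}$
for all $x \in \mathbb{Z}_0^+$. The component map $\kappa_B : V \to \mathrm{Fin}(\mathbb{Z}_0^+, \mathbb{R})$ satisfies

$$\forall v \in V,\ \forall i \in \mathbb{Z}_0^+,\quad \kappa_B(v)(i) = v(i).$$

Then $(\sum_{i\in\mathbb{Z}_0^+} \kappa_B(v)(i)e_i)(j) = (\sum_{i\in\mathbb{Z}_0^+} v(i)e_i)(j) = \sum_{i\in\mathbb{Z}_0^+} v(i)e_i(j) = \sum_{i\in\mathbb{Z}_0^+} v(i)\delta_{i,j} = v(j)$ for all $j \in \mathbb{Z}_0^+$,
for all $v \in V$. So $\sum_{i\in\mathbb{Z}_0^+} \kappa_B(v)(i)e_i = v$ for all $v \in V$. This verifies that $\kappa_B$ is the component map for B.


The expression $\kappa_B(v)(i)$ may be written as $\sum_{j\in\mathbb{Z}_0^+} c_{ij}v(j)$ where $c_{ij} = \delta_{i,j}$ for all $i, j \in \mathbb{Z}_0^+$. This means
that the ith component $\kappa_B(v)(i)$ of v depends only on the value of v at i.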

22.9.3 EXAMPLE: Non-finite component transition map for an infinite-dimensional linear space. 

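Let $V = \mathrm{FL}(\mathbb{Z}_0^+, \mathbb{R})$. Then $\bar{B} = (\bar{e}_i)_{i\in\mathbb{Z}_0^+}$ is a basis for V, where $\bar{e}_i \in V$ is defined by $\bar{e}_0(x) = \delta_{0,x}$ for
all $x \in \mathbb{Z}_0^+$ and $\bar{e}_i(x) = \delta_{i,x} - \delta_{i-1,x}$ for all $x \in \mathbb{Z}_0^+$, for all $i \in \mathbb{Z}^+$. A comparison of this basis $\bar{B}$ with the
basis B in Example 22.9.2 is illustrated in Figure 22.9.1.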


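[Figure 22.9.1 Comparison of basis examples for $\mathrm{FL}(\mathbb{Z}_0^+, \mathbb{R})$: graphs of the basis vectors $e_0, \ldots, e_5$ of B and $\bar{e}_0, \ldots, \bar{e}_5$ of $\bar{B}$ as real-valued functions on $\mathbb{Z}_0^+$.]

The component map $\kappa_{\bar{B}} : V \to \mathrm{Fin}(\mathbb{Z}_0^+, \mathbb{R})$ satisfies

$$\forall v \in V,\ \forall i \in \mathbb{Z}_0^+,\quad \kappa_{\bar{B}}(v)(i) = \sum_{k=0}^\infty v(i+k).$$

Then for all $v \in V$, for all $j \in \mathbb{Z}_0^+$,

$$\begin{aligned}
\Big(\sum_{i\in\mathbb{Z}_0^+} \kappa_{\bar{B}}(v)(i)\bar{e}_i\Big)(j)
&= \Big(\sum_{i\in\mathbb{Z}_0^+} \sum_{k\in\mathbb{Z}_0^+} v(i+k)\bar{e}_i\Big)(j) \\
&= \Big(\sum_{k\in\mathbb{Z}_0^+} v(k)\Big)\bar{e}_0(j) + \sum_{i\in\mathbb{Z}^+} \Big(\sum_{k\in\mathbb{Z}_0^+} v(i+k)\Big)\bar{e}_i(j) \\
&= \sum_{k\in\mathbb{Z}_0^+} v(k)\,\delta_{0,j} + \sum_{i\in\mathbb{Z}^+} \Big(\sum_{k\in\mathbb{Z}_0^+} v(i+k)\Big)(\delta_{i,j} - \delta_{i-1,j}) \\
&= \delta_{0,j}\sum_{k\in\mathbb{Z}_0^+} v(k) + (1 - \delta_{0,j})\sum_{k\in\mathbb{Z}_0^+} v(j+k) - \sum_{k\in\mathbb{Z}_0^+} v(j+1+k) \\
&= \sum_{k\in\mathbb{Z}_0^+} v(j+k) - \sum_{k\in\mathbb{Z}_0^+} v(j+1+k) \\
&= v(j).
\end{aligned}$$

So $\sum_{i\in\mathbb{Z}_0^+} \kappa_{\bar{B}}(v)(i)\bar{e}_i = v$ for all $v \in V$. This verifies that $\kappa_{\bar{B}}$ is the component map for $\bar{B}$ in V.


The expression $\kappa_{\bar{B}}(v)(i)$ may be written as $\sum_{j\in\mathbb{Z}_0^+} c_{ij}v(j)$ where $c_{ij} = \sum_{k\in\mathbb{Z}_0^+} \delta_{i+k,j}$ for all $i, j \in \mathbb{Z}_0^+$. This
means that the ith component $\kappa_{\bar{B}}(v)(i)$ of v depends on all of the values of v at $i + k$ for $k \geq 0$. A comparison
of this component map $\kappa_{\bar{B}}$ with the component map $\kappa_B$ in Example 22.9.2 is illustrated in Figure 22.9.2.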


[Figure 22.9.2 Comparison of component map examples for $\mathrm{FL}(\mathbb{Z}_0^+, \mathbb{R})$: graphs of the coefficients $c_{2,j}$ against $j \in \mathbb{Z}_0^+$ for the component maps $\kappa_B$ and $\kappa_{\bar{B}}$.]


It is straightforward to recover the basis B from $\bar{B}$ because $e_i = \sum_{j=0}^i \bar{e}_j$ for all $i \in \mathbb{Z}_0^+$. Conversely,
$\bar{e}_i = e_i - e_{i-1}$ for all $i \in \mathbb{Z}^+$ and $\bar{e}_0 = e_0$.
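
The tail-sum formula for the component map in Example 22.9.3 can be checked on a finitely supported sequence. The sketch below is an illustration only. It assumes a representation which is not in the text: an element of the free linear space is stored as a Python dict from indices to non-zero integer values, and the function names `kappa_bar`, `e_bar` and `combine` are hypothetical.

```python
def kappa_bar(v):
    """Components of v for the second basis: tail sums of v(i), v(i+1), ...
    The argument is a dict {index: value} with finite support."""
    if not v:
        return {}
    out, tail = {}, 0
    for i in range(max(v), -1, -1):      # accumulate the tail from the top index down
        tail += v.get(i, 0)
        if tail != 0:
            out[i] = tail
    return out

def e_bar(i):
    """Second basis: e_bar(0) = e_0 and e_bar(i) = e_i - e_(i-1) for i >= 1."""
    return {0: 1} if i == 0 else {i: 1, i - 1: -1}

def combine(coeffs, basis_vector):
    """Finite linear combination: sum over i of coeffs[i] * basis_vector(i)."""
    out = {}
    for i, c in coeffs.items():
        for x, val in basis_vector(i).items():
            out[x] = out.get(x, 0) + c * val
    return {x: val for x, val in out.items() if val != 0}

v = {0: 5, 2: -1, 3: 4}
k = kappa_bar(v)
print(k)                          # tail sums: 8 at 0, 3 at 1, 3 at 2, 4 at 3
print(combine(k, e_bar) == v)     # the combination recovers v
print(kappa_bar({5: 1}))          # e_5 has component 1 for every index from 0 to 5
```

The last line shows that a single basis vector of the first basis already needs six non-zero components with respect to the second basis, and the count grows with the index.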


22.9.4 DEFINITION: The (vector) component transition map for a linear space V over a field K, with
indexed bases $B_1 = (e^1_i)_{i\in I}$ and $B_2 = (e^2_j)_{j\in J}$, is the map $\psi_{B_1,B_2} = \kappa_{B_2} \circ \phi_{B_1} : \mathrm{Fin}(I, K) \to \mathrm{Fin}(J, K)$,
where the component map $\kappa_{B_2} : V \to \mathrm{Fin}(J, K)$ is as in Definition 22.8.7 and the linear combination map
$\phi_{B_1} : \mathrm{Fin}(I, K) \to V$ is defined by $\phi_{B_1} : k \mapsto \sum_{i\in I} k_i e^1_i$.


22.9.5 REMARK: The vector component transition map is always well defined. 
The component transition maps $\psi_{B_1,B_2}$ and $\psi_{B_2,B_1}$ in Definition 22.9.4 for indexed bases $B_1$ and $B_2$ for a
linear space V are illustrated in Figure 22.9.3.


The map $\phi_{B_1} : \mathrm{Fin}(I, K) \to V$ always yields a well-defined element of V. Then the map $\kappa_{B_2} : V \to \mathrm{Fin}(J, K)$
always yields a well-defined finite linear combination of elements of the indexed basis $B_2$ because a basis for
V is defined to be such that each element of V is a unique (finite) linear combination of elements of the basis.
One might suppose that therefore the component array of the transition map must have a finite number of
non-zero elements in every row and column. It turns out that this is not true.


22.9.6 DEFINITION: The basis-transition (component) array for a linear space V over a field K, from
indexed basis $B_1 = (e^1_i)_{i\in I}$ to indexed basis $B_2 = (e^2_j)_{j\in J}$, is the double family $(a_{ji})_{j\in J, i\in I}$ where
$a_{ji} = \kappa_{B_2}(e^1_i)_j$ for all $i \in I$ and $j \in J$.

[Figure 22.9.3 Component transition maps for linear spaces: the linear combination maps $\phi_{B_1} : \mathrm{Fin}(I, K) \to V$ and $\phi_{B_2} : \mathrm{Fin}(J, K) \to V$, the component maps $\kappa_{B_1}$ and $\kappa_{B_2}$, and the component transition maps $\psi_{B_1,B_2}$ and $\psi_{B_2,B_1}$.]


22.9.7 REMARK:  Basis-transition component array rows may have infinitely many non-zero entries. 

The double family $A_{B_2,B_1} = (a_{ji})_{j\in J, i\in I}$ in Definition 22.9.6 is well defined, and for each $i \in I$, there are only a finite
number of $j \in J$ such that $a_{ji} \neq 0$. However, there is no guarantee that the number of non-zero $a_{ji}$ for a
fixed $j \in J$ is finite. Nevertheless, the two basis-transition component arrays are inverses of each other, as
shown in Theorem 22.9.8.


The convention followed here for the words “row” and “column” is that row j of a double family $(a_{ji})_{j\in J, i\in I}$
is the single-index family $(a_{ji})_{i\in I}$, whereas column i is the single-index family $(a_{ji})_{j\in J}$. Then a change of
basis is effected by “multiplying on the right” by the array, whereas the transformation of components is
effected by “multiplying on the left” by the array. (See Remark 22.9.10 and Theorem 22.9.11.)

It is not strictly correct to refer to the array operations in Theorem 22.9.8 as “matrix multiplication”
or “matrix inversion” because multiplication and inversion of infinite matrices require either topology or
sparseness constraints to make infinite sums meaningful. This is why the word “array” is used here.


22.9.8 THEOREM: Forward and reverse basis-transition component arrays are “inverses” of each other. 
Let $B_1 = (e^1_i)_{i\in I}$ and $B_2 = (e^2_j)_{j\in J}$ be bases for a linear space V over a field K. Let $A_{B_2,B_1} = (a_{ji})_{j\in J, i\in I}$
and $A_{B_1,B_2} = (\bar{a}_{ij})_{i\in I, j\in J}$ be the basis-transition component arrays for $B_1$ and $B_2$. Then

$$\forall i, j \in J,\quad \sum_{\ell\in I} a_{i\ell}\bar{a}_{\ell j} = \delta_{ij} \tag{22.9.1}$$

$$\forall i, j \in I,\quad \sum_{\ell\in J} \bar{a}_{i\ell}a_{\ell j} = \delta_{ij}. \tag{22.9.2}$$


PROOF: For indexed bases $B_1 = (e^1_i)_{i\in I}$ and $B_2 = (e^2_j)_{j\in J}$ for a linear space V over a field K, define the
arrays $(a_{ji})_{j\in J, i\in I}$ and $(\bar{a}_{ij})_{i\in I, j\in J}$ by $a_{ji} = \kappa_{B_2}(e^1_i)_j$ and $\bar{a}_{ij} = \kappa_{B_1}(e^2_j)_i$ for all $i \in I$ and $j \in J$. These
formulas mean that $e^1_i = \sum_{j\in J} a_{ji}e^2_j$ for all $i \in I$, and $e^2_j = \sum_{i\in I} \bar{a}_{ij}e^1_i$ for all $j \in J$. One may insert the
expression for $e^1_i$ into the expression for $e^2_j$ to obtain $e^2_j = \sum_{i\in I} \bar{a}_{ij}(\sum_{k\in J} a_{ki}e^2_k) = \sum_{i\in I}\sum_{k\in J} \bar{a}_{ij}a_{ki}e^2_k =
\sum_{k\in J}(\sum_{i\in I} a_{ki}\bar{a}_{ij})e^2_k$. (The manipulations of the summation symbols are justified by the fact that all of the
sums have only a finite number of non-zero terms.) But $e^2_j = \sum_{k\in J} \delta_{jk}e^2_k$, and the components of a vector
with respect to $B_2$ are unique by Theorem 22.8.2. Therefore $\sum_{i\in I} a_{ki}\bar{a}_{ij} = \delta_{jk}$ for all $j, k \in J$.
This is equivalent to line (22.9.1). Line (22.9.2) follows similarly by inserting the expression for $e^2_j$ into the
expression for $e^1_i$.


22.9.9 EXAMPLE: A basis-transition component array with infinitely many non-zero entries in all rows. 
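In Example 22.9.3, the bases have index sets $I = \mathbb{Z}_0^+$ and $J = \mathbb{Z}_0^+$. The double family $A_{\bar{B},B} = (a_{ji})_{j,i=0}^\infty$
satisfies $a_{ji} = \kappa_{\bar{B}}(e_i)_j = \sum_{k=0}^\infty \delta_{i,j+k}$ for $i, j \in \mathbb{Z}_0^+$. So $a_{ji} = 1$ for $i \geq j$ and $a_{ji} = 0$ for $i < j$. This is
illustrated in Figure 22.9.4.

Clearly $\#\{j \in \mathbb{Z}_0^+;\ a_{ji} \neq 0\} = \#\{j \in \mathbb{Z}_0^+;\ \exists k \in \mathbb{Z}_0^+,\ i = j+k\} = \#\{j \in \mathbb{Z}_0^+;\ i \geq j\} = \#(\mathbb{Z}[0,i]) = i+1 < \infty$,
as expected. However, $\#\{i \in \mathbb{Z}_0^+;\ a_{ji} \neq 0\} = \#(\mathbb{Z}[j,\infty]) = \infty$.

One may easily verify that the two basis-transition component arrays are “inverses” of each other in the
sense of Theorem 22.9.8. Questions of convergence of infinite series do not arise because the number of
non-zero elements is finite in every column. The index sets for the bases may be finite, countably infinite,
or arbitrarily uncountably infinite. There is no requirement for any order on the index sets.

The first six rows and columns of the two arrays are as follows. In the first array, rows are indexed by $i = 0, \ldots, 5$ and columns by $j = 0, \ldots, 5$. In the second array, rows are indexed by $j = 0, \ldots, 5$ and columns by $i = 0, \ldots, 5$.

$$A_{B,\bar{B}} = (\bar{a}_{ij}) = \begin{pmatrix}
1 & -1 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad
A_{\bar{B},B} = (a_{ji}) = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

Figure 22.9.4 Basis-transition component arrays for example bases B and $\bar{B}$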
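
The inverse relation between the two arrays of Example 22.9.9 can be checked numerically on a finite truncation. The sketch below is an illustration only. The truncation size `N` and the helper `matmul` are assumptions which are not in the text, and truncation happens to be harmless here only because both arrays are upper triangular. The infinite rows of the second array are of course not captured by any finite truncation.

```python
N = 6  # truncation size (the true arrays are infinite)

# a[j][i] = 1 if i >= j, else 0: components of e_i for the second basis.
a = [[1 if i >= j else 0 for i in range(N)] for j in range(N)]

# a_bar[i][j] = delta(i, j) - delta(i + 1, j): components of e_bar_j for the first basis.
a_bar = [[(1 if i == j else 0) - (1 if i + 1 == j else 0) for j in range(N)]
         for i in range(N)]

def matmul(X, Y):
    """Product of two N x N arrays stored as lists of rows."""
    return [[sum(X[r][m] * Y[m][c] for m in range(N)) for c in range(N)]
            for r in range(N)]

identity = [[1 if r == c else 0 for c in range(N)] for r in range(N)]
print(matmul(a, a_bar) == identity)      # line (22.9.1) on the truncation
print(matmul(a_bar, a) == identity)      # line (22.9.2) on the truncation

# Number of non-zero entries in column i of a is i + 1, which is finite.
column_counts = [sum(1 for j in range(N) if a[j][i] != 0) for i in range(N)]
print(column_counts)
```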


22.9.10 REMARK:  Basis-transition component arrays on the right of bases, on the left of components. 
The basis-transition array $A_{B_2,B_1} = (a_{ji})_{j\in J, i\in I}$ seems to be “backwards”. This is both intentional and
unfortunate. In the theory of matrices, one generally prefers matrices to be “on the left”, which implies
that the tuple-vectors which are multiplied by matrices must be “column vectors” which are “on the right”.
However, this convention generally assumes that the entries in the tuples are components of vectors in some
linear space. When one changes the basis of a linear space, the tuple of basis vectors must be multiplied by
a transition matrix “on the right” if the components are multiplied by a transition matrix “on the left”. In
other words, the transition array for the basis vectors must be “backwards” if the transition array for the
components is “forwards”.


The basis-transition component array in Definition 22.9.6 is “backwards” also in a second sense. The
formula $a_{ji} = \kappa_{B_2}(e^1_i)_j$ expresses the basis vectors of $B_1$ in terms of the basis vectors in $B_2$. Thus
$e^1_i = \sum_{j\in J} \kappa_{B_2}(e^1_i)_j e^2_j = \sum_{j\in J} e^2_j a_{ji}$. One would expect this to be called the transition array from $B_2$
to $B_1$. Therefore it would seem more logical to define the array as $A_{B_1,B_2} = (a_{ij})_{i\in I, j\in J}$ with $a_{ij} = \kappa_{B_1}(e^2_j)_i$
for all $i \in I$ and $j \in J$. Then one would obtain $e^2_j = \sum_{i\in I} \kappa_{B_1}(e^2_j)_i e^1_i = \sum_{i\in I} e^1_i a_{ij}$. However, the name of
the array is the basis-transition component array, not the basis-transition basis array. It turns out that it is
an inevitable consequence of the definition of coordinatisation of vectors by bases that the transition array
for basis vectors is the transposed inverse of the transition array for the components. This is demonstrated
by lines (22.9.3) and (22.9.5) in Theorem 22.9.11.


22.9.11 THEOREM: Transformation formulas for bases and vector components. 
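Let V be a linear space over a field K. Let $B_1 = (e^1_i)_{i\in I}$ and $B_2 = (e^2_j)_{j\in J}$ be indexed bases for V. Define
basis-transition component arrays $A_{B_2,B_1} = (a_{ji})_{j\in J, i\in I}$ and $A_{B_1,B_2} = (\bar{a}_{ij})_{i\in I, j\in J}$ by

$$\forall i \in I,\ \forall j \in J,\quad a_{ji} = \kappa_{B_2}(e^1_i)_j$$

$$\forall i \in I,\ \forall j \in J,\quad \bar{a}_{ij} = \kappa_{B_1}(e^2_j)_i.$$

Then

$$\forall j \in J,\quad e^2_j = \sum_{i\in I} e^1_i\bar{a}_{ij} \tag{22.9.3}$$

$$\forall i \in I,\quad e^1_i = \sum_{j\in J} e^2_j a_{ji}. \tag{22.9.4}$$

Denote the components $(v^1_i)_{i\in I}$ and $(v^2_j)_{j\in J}$ of vectors $v \in V$ with respect to $B_1$ and $B_2$ respectively by

$$\forall i \in I,\quad v^1_i = \kappa_{B_1}(v)_i$$

$$\forall j \in J,\quad v^2_j = \kappa_{B_2}(v)_j.$$

Then

$$\forall v \in V,\ \forall j \in J,\quad v^2_j = \sum_{i\in I} a_{ji}v^1_i \tag{22.9.5}$$

$$\forall v \in V,\ \forall i \in I,\quad v^1_i = \sum_{j\in J} \bar{a}_{ij}v^2_j. \tag{22.9.6}$$


PROOF: Lines (22.9.3) and (22.9.4) follow directly from the definitions of the component maps $\kappa_{B_1}$ and $\kappa_{B_2}$.
Let $(v^1_i)_{i\in I}$ and $(v^2_j)_{j\in J}$ be the component families for a vector $v \in V$ with respect to $B_1$ and $B_2$ as
indicated. Then $v = \sum_{i\in I} \kappa_{B_1}(v)_i e^1_i = \sum_{i\in I} v^1_i e^1_i$ and $v = \sum_{j\in J} \kappa_{B_2}(v)_j e^2_j = \sum_{j\in J} v^2_j e^2_j$ by the definition
of the component maps. So $\sum_{i\in I} v^1_i e^1_i = \sum_{j\in J} v^2_j e^2_j$. Substituting $e^1_i$ from line (22.9.4) into this equation
gives $\sum_{i\in I} v^1_i(\sum_{j\in J} e^2_j a_{ji}) = \sum_{j\in J}(\sum_{i\in I} a_{ji}v^1_i)e^2_j = \sum_{j\in J} v^2_j e^2_j$. Hence $\sum_{i\in I} a_{ji}v^1_i = v^2_j$ by the uniqueness
of vector components with respect to $B_2$. This verifies line (22.9.5). Similarly, line (22.9.6) follows by
substituting $e^2_j$ into $\sum_{i\in I} v^1_i e^1_i = \sum_{j\in J} v^2_j e^2_j$ from line (22.9.3).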
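
The transformation rules of Theorem 22.9.11 can be verified in a small finite-dimensional case. The sketch below is an illustration only, under assumptions which are not in the text: the space is $\mathbb{Q}^2$, the two bases are chosen arbitrarily, and the helper `comps` (a hypothetical stand-in for the component map) uses Cramer's rule.

```python
from fractions import Fraction

def comps(e1, e2, v):
    """Components of v for the basis (e1, e2) of Q^2, by Cramer's rule."""
    det = Fraction(e1[0] * e2[1] - e1[1] * e2[0])
    return ((v[0] * e2[1] - v[1] * e2[0]) / det,
            (e1[0] * v[1] - e1[1] * v[0]) / det)

B1 = [(1, 0), (1, 1)]     # first basis (assumed example)
B2 = [(2, 0), (0, 1)]     # second basis (assumed example)

# a[j][i] is the j-th component of the i-th vector of B1 with respect to B2.
a = [[comps(*B2, B1[i])[j] for i in range(2)] for j in range(2)]
# a_bar[i][j] is the i-th component of the j-th vector of B2 with respect to B1.
a_bar = [[comps(*B1, B2[j])[i] for j in range(2)] for i in range(2)]

v = (3, 5)
v1 = comps(*B1, v)        # components with respect to B1
v2 = comps(*B2, v)        # components with respect to B2

# Line (22.9.5): components transform by multiplication on the left by a.
ok_components = all(v2[j] == sum(a[j][i] * v1[i] for i in range(2)) for j in range(2))
# Line (22.9.3): basis vectors transform by multiplication on the right by a_bar.
ok_basis = all(
    tuple(sum(B1[i][r] * a_bar[i][j] for i in range(2)) for r in range(2)) == B2[j]
    for j in range(2)
)
print(ok_components, ok_basis)
```

Both checks succeed for this choice of bases, which is consistent with the array for basis vectors being the inverse of the array for components, applied on the opposite side.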


22.9.12 REMARK: The figure/frame bundle interpretation for changes of linear space basis. 

Theorem 22.9.11 may be interpreted in the context of figure/frame bundles (as described in Section 20.10)
by regarding multiplication on the left by the array $(a_{ji})_{j\in J, i\in I}$ as the left action $L_g$ of a group element g
acting on a fibre space element $(v^1_i)_{i\in I}$, which is an observation of an object v in the total space V. The
bases $B_1$ and $B_2$ are regarded as “frames of reference” for observers. Then line (22.9.3) may be interpreted
as the right action $R_{g^{-1}}$ of $g^{-1}$ on the basis $(e^1_i)_{i\in I}$ in the principal bundle (or “frame bundle”). Thus the
combination of the actions $L_g$ and $R_{g^{-1}}$ “keeps everything the same”. In other words, the observations
$(v^1_i)_{i\in I}$ and $(v^2_j)_{j\in J}$ correspond to the same observed object $v \in V$ if the basis and coordinates are acted on
by $R_{g^{-1}}$ and $L_g$ respectively.


22.10. Minkowski sums and scalar products of sets 


22.10.1 REMARK: Minkowski sums and scalar products. 
Since the vector addition operation of a linear space is associative and commutative, the order of addition in 
the Minkowski sum of a finite family of sets in Definition 22.10.2 is immaterial. The sum of a finite set-family 
is constructed inductively in the usual way. The definitions and notations for set-sums and scalar products 
(or “multiples”) of sets may be combined to define linear combinations and convex combinations of sets.


22.10.2 DEFINITION: The Minkowski sum of subsets $S_1$ and $S_2$ of a linear space is the set
$S_1 + S_2 = \{v_1 + v_2;\ v_1 \in S_1 \text{ and } v_2 \in S_2\}$.

The Minkowski sum of a finite family of subsets $(S_i)_{i=1}^n$ of a linear space is the set

$$\sum_{i=1}^n S_i = \Big\{\sum_{i=1}^n v_i;\ \forall i \in \mathbb{N}_n,\ v_i \in S_i\Big\}.$$

The Minkowski scalar multiple by $\lambda \in K$ of a subset S of a linear space V over a field K is the set
$\lambda S = \{\lambda v;\ v \in S\}$.


22.10.3 REMARK: Minkowski set-sum notations for singletons are equivalent to coset notation. 

Definition 22.10.4 defines the notations u + S and S + u for vectors u and subsets S to mean the same as 
{u}+S. This kind of notation is equivalent to the left and right cosets for general groups in Definition 17.7.1, 
except that the operation is notated additively rather than multiplicatively. 


It is also convenient to introduce notations for negatives and differences of sets in Notation 22.10.5. (For an 
application of Theorem 22.10.6 (i), see the proof of Theorem 39.1.8 (i).) 


22.10.4 DEFINITION: The Minkowski sum of a vector u and a subset S of a linear space is the set
$u + S = S + u = \{u + v;\ v \in S\} = \{u\} + S$.


22.10.5 NOTATION: $-S$, for a subset S of a linear space V over a field K, denotes the Minkowski scalar
multiple $(-1_K)S$ of S, where $1_K$ is the unit element of K.


$S_1 - S_2$, for subsets $S_1$ and $S_2$ of a linear space V, denotes the Minkowski sum $S_1 + (-S_2)$ of $S_1$ and $-S_2$.
In other words, $S_1 - S_2 = \{v_1 - v_2;\ v_1 \in S_1 \text{ and } v_2 \in S_2\}$.


22.10.6 THEOREM: Invariance of disjoint property of sets with respect to Minkowski sums. 
Let V be a linear space. 


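(i) $\forall S_1, S_2, A \in \mathbb{P}(V),\ (S_1 + A) \cap S_2 = \emptyset \Leftrightarrow S_1 \cap (S_2 - A) = \emptyset$.


PROOF: For part (i), let $S_1$, $S_2$ and A be subsets of V. Suppose that $(S_1 + A) \cap S_2 \neq \emptyset$. Then there are
$x_1 \in S_1$, $x_2 \in S_2$ and $a \in A$ such that $x_1 + a = x_2$. Then $x_1 = x_2 - a$. So $S_1 \cap (S_2 - A) \neq \emptyset$. Similarly,
$S_1 \cap (S_2 - A) \neq \emptyset$ implies $(S_1 + A) \cap S_2 \neq \emptyset$. Hence $(S_1 + A) \cap S_2 = \emptyset \Leftrightarrow S_1 \cap (S_2 - A) = \emptyset$.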
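
The Minkowski operations and the equivalence in Theorem 22.10.6 (i) can be tried out on small finite sets. The sketch below is an illustration only. It assumes finite sets of integer pairs as stand-ins for subsets of a linear space, and the function names `msum`, `mscale` and `mdiff` are hypothetical.

```python
def msum(S, T):
    """Minkowski sum S + T of finite sets of integer tuples."""
    return {tuple(a + b for a, b in zip(s, t)) for s in S for t in T}

def mscale(lam, S):
    """Minkowski scalar multiple lam * S."""
    return {tuple(lam * a for a in s) for s in S}

def mdiff(S, T):
    """Minkowski difference S - T, defined as S + (-1)T."""
    return msum(S, mscale(-1, T))

S1 = {(0, 0), (1, 0)}
A = {(0, 1), (0, 2)}
results = []
for S2 in [{(1, 2), (5, 5)}, {(3, 3)}]:
    lhs = msum(S1, A).isdisjoint(S2)       # is (S1 + A) disjoint from S2?
    rhs = S1.isdisjoint(mdiff(S2, A))      # is S1 disjoint from (S2 - A)?
    results.append((lhs, rhs))
print(results)   # the two truth values agree for each choice of S2
```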
22.11. Convexity in linear spaces 


22.11.1 REMARK: Convex combinations may be regarded as a special case of linear combinations. 

The concept of a convex combination in Definition 22.11.2 may be thought of as a special case of the linear 
combination concept which was introduced in Definition 22.3.2. (See Section 18.8 for ordered fields.) This 
is a very convenient form of definition, although it has important differences from the intuitive notion of a 
convex combination, and it does not generalise very well to curved space. 


22.11.2 DEFINITION: A convex combination of (elements of) a subset S of a linear space V over an ordered
field K is any element $v \in V$ which satisfies

$$\exists n \in \mathbb{Z}_0^+,\ \exists k \in (K_0^+)^n,\ \exists w \in S^n,\quad \sum_{i=1}^n k_i = 1_K \ \text{ and } \ \sum_{i=1}^n k_i w_i = v, \tag{22.11.1}$$

where $K_0^+$ denotes the non-negative elements of K.


22.11.3 REMARK: Geometrical interpretation and order relations for convex combinations. 

The linear constraint $\sum_{i=1}^n k_i = 1_K$ in Definition 22.11.2 restricts vectors to a hyperplane. (This hyperplane
could be the whole linear space as a special case.) The requirement that all coefficients of convex combinations
must be non-negative constrains the combinations to lie “between” the points which are combined.


It is easiest to see the intuitive meaning of convex combinations in the case of two vectors $w_1$ and $w_2$. In this
case, $v = w_1 + \lambda(w_2 - w_1)$ for some $\lambda \in K$ such that $0 \le \lambda \le 1$. Thus $\lambda$ lies “between” 0 and 1. This notion
of “lying between two numbers” 0 and 1 is transferred to the linear space $V$ so that $v$ lies between the two
vectors $w_1$ and $w_2$. A total order is induced on the line $\{w_1 + \lambda(w_2 - w_1);\ \lambda \in K\}$ (for $w_1 \ne w_2$). This total order makes
it possible to define an “interval” (or line segment) of vectors $\{w_1 + \lambda(w_2 - w_1);\ \lambda \in K \text{ and } 0 \le \lambda \le 1\}$ on
that line. The map $\lambda \mapsto w_1 + \lambda(w_2 - w_1)$ is a kind of (inverse) chart on the line. In the same way that linear
space component maps provide a chart for a linear space or subspace (defining linear structure on them),
and a differentiable manifold atlas provides charts for manifolds (defining differentiable structure on them),
the map $\lambda \mapsto w_1 + \lambda(w_2 - w_1)$ provides a chart for a line of vectors, and this defines a total order structure
on the line.


For $n$ points, one may write $v_n = w_n + \lambda_n(v_{n-1} - w_n)$ with $0 \le \lambda_n \le 1$, defined inductively with respect to $n$.
Thus a convex combination can be built inductively from between-operations. Another way of viewing this
is to consider the map $k \mapsto \sum_{i=1}^n k_i w_i$ from $K^n$ to $V$. This may be thought of as a linear chart which induces
order relations. Then the image under this map of the set $\{k \in K^n;\ \sum_{i=1}^n k_i = 1 \text{ and } \forall i \in \mathbb{N}_n,\ k_i \ge 0\}$ is
the set of convex combinations of a family $(w_i)_{i=1}^n \in S^n$.
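
The two viewpoints above (a direct weighted sum, and an inductive chain of between-operations) can be compared in a small sketch. It assumes the ordered field is the rational numbers, represented by Python's `Fraction`, and that vectors are coordinate tuples; the function names `convex_combination` and `between` are illustrative and do not come from the text.

```python
from fractions import Fraction

def convex_combination(coeffs, points):
    """Return sum_i coeffs[i] * points[i] after checking the constraints of (22.11.1)."""
    if len(coeffs) != len(points):
        raise ValueError("need one coefficient per point")
    if any(k < 0 for k in coeffs):
        raise ValueError("coefficients must be non-negative")
    if sum(coeffs) != 1:
        raise ValueError("coefficients must sum to 1")
    dim = len(points[0])
    return tuple(sum(k * w[d] for k, w in zip(coeffs, points)) for d in range(dim))

def between(lam, v_prev, w):
    """The between-operation v = w + lam * (v_prev - w) with 0 <= lam <= 1."""
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie between 0 and 1")
    return tuple(wd + lam * (vd - wd) for vd, wd in zip(v_prev, w))

w1 = (Fraction(0), Fraction(0))
w2 = (Fraction(4), Fraction(0))
w3 = (Fraction(0), Fraction(4))

# Direct form: one weighted sum with weights 1/4, 1/4, 1/2.
direct = convex_combination([Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)], [w1, w2, w3])

# Inductive form: v1 = w1, v2 = w2 + lam2 (v1 - w2), v3 = w3 + lam3 (v2 - w3).
v2 = between(Fraction(1, 2), w1, w2)
v3 = between(Fraction(1, 2), v2, w3)
```

Here both constructions give the same point of the triangle with vertices $w_1$, $w_2$, $w_3$. Exact rational arithmetic is used so that the constraint $\sum_i k_i = 1_K$ can be tested with equality.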


22.11.4 REMARK: Some basic properties of convex combinations. 
Convex combinations have properties which make them very useful. For example, a convex combination of 
convex combinations of a fixed set of vectors is also a convex combination of those vectors. 


Convex combinations have properties which are related to the properties of linear combinations. For example, 
the set of convex combinations of a subset of a linear space is closed under convex combinations. 


22.11.5 REMARK: Ad-hoc notation for the set of convex multipliers in a free linear space. 

As in the case of linear combinations, the lack of suitable notation is a hindrance to the presentation of
convex combinations for infinite-dimensional spaces. Notation 22.2.25 introduced $\mathrm{Fin}(S, K)$ to mean the set
$\{f : S \to K;\ \#\{x \in S;\ f(x) \ne 0_K\} < \infty\}$ of $K$-valued functions which are non-zero only on a finite subset
of the domain $S$. Notation 22.11.6 is a corresponding non-standard notation for discussing convexity. The
superscript 1 is supposed to suggest that the sum of the function values equals 1.


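22.11.6 NOTATION: $\mathrm{Fin}^1_0(S, K)$ denotes the set $\{f : S \to K^+_0;\ \#(\mathrm{supp}(f)) < \infty \text{ and } \sum_{x \in S} f(x) = 1_K\}$ for
any set $S$ and ordered field $K$. In other words,

\[ \mathrm{Fin}^1_0(S, K) = \{f \in \mathrm{Fin}(S, K^+_0);\ \sum_{x \in S} f(x) = 1_K\}. \]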


22.11.7 REMARK: Simplified expression for the definition of convex combinations. 
Using Notation 22.11.6, line (22.11.1) of Definition 22.11.2 may be written as: 


\[ \exists k \in \mathrm{Fin}^1_0(S, K),\quad \sum_{w \in S} k(w) w = v. \]


22.11.8 DEFINITION: A convex set $S$ in a linear space $V$ over an ordered field $K$ is a subset $S$ of $V$ which is
closed under convex combinations. In other words,

\[ \forall n \in \mathbb{Z}^+_0,\ \forall w \in S^n,\ \forall k \in (K^+_0)^n,\quad \sum_{i=1}^n k_i = 1_K \ \Rightarrow\ \sum_{i=1}^n k_i w_i \in S. \tag{22.11.2} \]


22.11.9 REMARK: Simplified expression for the definition of convex sets. 
Using Notation 22.11.6, line (22.11.2) of Definition 22.11.8 may be written as: 


\[ \forall k \in \mathrm{Fin}^1_0(S, K),\quad \sum_{w \in S} k(w) w \in S. \]


Convex sets are frequently defined to be sets which are closed under convex combinations of pairs of points. 
Then it is easily shown by induction that Definition 22.11.8 is satisfied. The well-known method for this is 
exemplified in the proofs of Theorems 23.1.3 and 23.4.4. 


22.11.10 DEFINITION: 
The convex span of a subset $S$ of a linear space $V$ is the set of all convex combinations of $S$ in $V$.


22.11.11 REMARK: The notation for the convex span of a set has an implied linear space. 
The linear space V in Notation 22.11.12 must be provided by the context. 


22.11.12 NOTATION: conv(S) denotes the convex span of a subset S of a linear space. 


22.11.13 REMARK: The convex span of a set is convex. 
Theorem 22.11.14 is very similar in form to Theorem 22.3.6. The form of the proof is also very similar. 


22.11.14 THEOREM: The convex span of a set is a convex set. 
For any subset S of a linear space V, the convex span of S is a convex set in V. 


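PROOF: Let $S$ be a subset of a linear space $V$. If $S = \emptyset$, then $\mathrm{conv}(S) = \emptyset$, which is a convex set. In
general, if $v$ is a convex combination of elements of $\mathrm{conv}(S)$, then $v = \sum_{i=1}^n k_i w_i$ for some $(w_i)_{i=1}^n \in \mathrm{conv}(S)^n$
and $(k_i)_{i=1}^n \in (K^+_0)^n$ with $\sum_{i=1}^n k_i = 1_K$. But for all $i \in \mathbb{N}_n$, $w_i = \sum_{j=1}^{n_i} \lambda_{i,j} y_{i,j}$ for some $(y_{i,j})_{j=1}^{n_i} \in S^{n_i}$ and
$(\lambda_{i,j})_{j=1}^{n_i} \in (K^+_0)^{n_i}$ with $\sum_{j=1}^{n_i} \lambda_{i,j} = 1_K$. Therefore
$v = \sum_{i=1}^n k_i \bigl( \sum_{j=1}^{n_i} \lambda_{i,j} y_{i,j} \bigr) = \sum_{i=1}^n \sum_{j=1}^{n_i} k_i \lambda_{i,j} y_{i,j} = \sum_{\ell=1}^N k_{i_\ell} \lambda_{i_\ell, j_\ell} y_{i_\ell, j_\ell}$,
where $N = \sum_{i=1}^n n_i$, and $i_\ell = \min\{i' \in \mathbb{Z}^+_0;\ \sum_{i=1}^{i'} n_i \ge \ell\}$ and $j_\ell = \ell - \sum_{i=1}^{i_\ell - 1} n_i$ for
all $\ell \in \mathbb{N}_N$. The coefficients $k_{i_\ell} \lambda_{i_\ell, j_\ell}$ are non-negative, and their sum is $\sum_{i=1}^n k_i \sum_{j=1}^{n_i} \lambda_{i,j} = \sum_{i=1}^n k_i = 1_K$.
So $v \in \mathrm{conv}(S)$. Hence $\mathrm{conv}(S)$ is a convex set by Definition 22.11.8.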
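
The re-indexing step in this proof can be sketched as follows. The sketch assumes rational coefficients (Python's `Fraction`) and uses string labels in place of the vectors $y_{i,j}$; the names `flatten` and `index_pair` are illustrative only.

```python
from fractions import Fraction

def flatten(outer, inner):
    """outer: the coefficients k_i.  inner: for each i, a list of pairs (lambda_ij, y_ij).
    Return the single list of pairs (k_i * lambda_ij, y_ij), in the order used in the proof."""
    return [(k * lam, y) for k, terms in zip(outer, inner) for lam, y in terms]

def index_pair(ell, sizes):
    """Map the 1-based flat index ell to the 1-based pair (i_ell, j_ell), where
    i_ell is the least i with n_1 + ... + n_i >= ell and j_ell = ell - (n_1 + ... + n_{i_ell - 1})."""
    total = 0
    for i, n_i in enumerate(sizes, start=1):
        if total + n_i >= ell:
            return i, ell - total
        total += n_i
    raise IndexError("ell exceeds N")

outer = [Fraction(1, 3), Fraction(2, 3)]
inner = [
    [(Fraction(1, 2), "a"), (Fraction(1, 2), "b")],
    [(Fraction(1, 4), "a"), (Fraction(1, 4), "c"), (Fraction(1, 2), "d")],
]
flat = flatten(outer, inner)
pairs = [index_pair(ell, [2, 3]) for ell in range(1, 6)]
```

The flattened coefficients remain non-negative and still sum to 1, which is what makes the combined expression a convex combination of elements of $S$.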


22.11.15 THEOREM: A set is convex if and only if it equals its own span. 
A subset S of a linear space is a convex set if and only if conv(S) = S. 


PROOF: Let $S$ be a subset of a linear space $V$. If $S$ is a convex set, then by Definition 22.11.8, every convex
combination of elements of $S$ is in $S$. Therefore $\mathrm{conv}(S) = S$ by Definition 22.11.10. Now suppose that
$\mathrm{conv}(S) = S$. Then $S$ is a convex set by Theorem 22.11.14. Hence the assertion is proved.


22.11.16 REMARK: Some simple cases of convex spans. 

For any linear space $V$, $\mathrm{conv}(\emptyset) = \emptyset$. This contrasts with the situation for linear combinations, where
$\mathrm{span}(\emptyset) = \{0_V\}$ by Theorem 22.4.7 (ii).

For any vector $v_0$ in a linear space $V$, $\mathrm{conv}(\{v_0\}) = \{v_0\}$. (In the case of linear combinations, the linear span
of a non-zero singleton set is a one-dimensional subspace of $V$.) In the case of two points $v_0, v_1 \in V$ with $v_0 \ne v_1$,
$\mathrm{conv}(\{v_0, v_1\})$ forms a line segment in the linear space. Similarly, the convex span of three non-colinear
points is a filled triangle, and the convex span of four non-coplanar points is a filled tetrahedron.


22.11.17 THEOREM: The convex span of a two-point set is a line segment. 
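Let $V$ be a linear space over an ordered field $K$. Then

\[ \forall a, b \in V,\quad \mathrm{conv}(\{a, b\}) = \{(1_K - \lambda)a + \lambda b;\ \lambda \in K[0_K, 1_K]\}. \]


PROOF: Let $c \in \{(1_K - \lambda)a + \lambda b;\ \lambda \in K[0_K, 1_K]\}$. Then $c = (1_K - \lambda)a + \lambda b$ for some $\lambda \in K[0_K, 1_K]$, which
implies $\lambda \in K$ and $0_K \le \lambda \le 1_K$ by Notation 11.5.13. Let $k_1 = 1_K - \lambda$ and $k_2 = \lambda$. Then $k_1 \ge 0_K$, $k_2 \ge 0_K$
and $k_1 + k_2 = (1_K - \lambda) + \lambda = 1_K$. So $(1_K - \lambda)a + \lambda b = k_1 a + k_2 b \in \mathrm{conv}(\{a, b\})$ for all $\lambda \in K[0_K, 1_K]$. Thus
$\{(1_K - \lambda)a + \lambda b;\ \lambda \in K[0_K, 1_K]\} \subseteq \mathrm{conv}(\{a, b\})$.

Now suppose that $c \in \mathrm{conv}(\{a, b\})$. Then $c = \sum_{i=1}^n k_i w_i$ for some $n \in \mathbb{Z}^+$, $w \in \{a, b\}^n$ and $k \in (K^+_0)^n$ with
$\sum_{i=1}^n k_i = 1_K$. If $a = b$, then $c = a = (1_K - 0_K)a + 0_K b$. So assume that $a \ne b$. Let $A = \{i \in \mathbb{N}_n;\ w_i = a\}$
and $B = \{i \in \mathbb{N}_n;\ w_i = b\}$. Let $\lambda = \sum_{i \in B} k_i$. Then $\lambda \ge 0_K$,
$A \cup B = \mathbb{N}_n$, $A \cap B = \emptyset$, $\lambda = 1_K - \sum_{i \in A} k_i \le 1_K$ and $c = (1_K - \lambda)a + \lambda b$. So $c \in \{(1_K - \lambda)a + \lambda b;\ \lambda \in
K[0_K, 1_K]\}$. Hence $\mathrm{conv}(\{a, b\}) = \{(1_K - \lambda)a + \lambda b;\ \lambda \in K[0_K, 1_K]\}$.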
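
The second half of this proof collects the weights attached to $b$ into a single parameter $\lambda$. A minimal sketch of that step, assuming rational weights, real-number stand-ins for the two vectors, and $a \ne b$ (the function name `segment_parameter` is illustrative):

```python
from fractions import Fraction

def segment_parameter(coeffs, labels):
    """For a convex combination of points labelled "a" or "b", return the
    parameter lam for which the combination equals (1 - lam) * a + lam * b."""
    if any(k < 0 for k in coeffs) or sum(coeffs) != 1:
        raise ValueError("not a convex combination")
    # lam is the total weight carried by the index set B = {i; w_i = b}.
    return sum((k for k, w in zip(coeffs, labels) if w == "b"), Fraction(0))

a, b = Fraction(2), Fraction(10)
coeffs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
labels = ["a", "b", "a"]

lam = segment_parameter(coeffs, labels)
c = sum(k * (a if w == "a" else b) for k, w in zip(coeffs, labels))
```

With these sample values the three-term combination collapses to the two-term form $(1_K - \lambda)a + \lambda b$ with $0_K \le \lambda \le 1_K$.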


22.11.18 DEFINITION: The convex generator of a subset S of a linear space V is the minimum subset T 
of V such that the convex span of T equals the convex span of S. 


22.11.19 REMARK: Alternative terminology for convex spans and convex generators. 

The convex span of a set S in Definition 22.11.10 is generally referred to as the “convex hull” of S. However, 
EDM2 [113], 89.A, page 331, uses the term “convex hull” for Definition 22.11.18 and gives the notation [S] 
for this. Yosida [167], page 363, calls Definition 22.11.10 the “convex hull” and gives the notation Conv(S). 
Rudin [130], page 72, also calls Definition 22.11.10 the “convex hull”, and gives the notation co(S). See 
Table 22.11.1 for further options. 

Guggenheimer [16], page 250, uses the term “convex hull” for the “boundary of the smallest convex domain 
whose interior contains the interior” of S, and gives the notation S* for this. This corresponds roughly to 
Definition 22.11.18, but requires a topological context. 


year  reference                            convex span              convex generator
1963  Guggenheimer [16], page 250          -                        [convex hull:] S*
1964  Gelbaum/Olmsted [78], page 130       convex hull              -
1964  Marcus/Minc [111], page 96           convex hull: H(S)        -
1965  Yosida [167], page 363               convex hull: Conv(S)     -
1971  Trustrum [151], page 1               convex hull: (S)         -
1973  Rudin [130], page 72                 convex hull: co(S)       -
1981  Greenberg/Harper [86], page 42       convex hull              -
1993  EDM2 [113], 89.A, page 331           -                        convex hull: [S]
      Kennington                           convex span: conv(S)     convex generator

Table 22.11.1 Survey of convex hull terminology and notation


The term "convex hull" does suggest the boundary of a convex set, in other words a minimal (convex) 
spanning set for a given convex set. The English word “hull” means a shell, pod, husk, outer covering or 
envelope, or the body or frame of a ship. This strongly suggests Definition 22.11.18. The literature may be 
evenly split between Definitions 22.11.10 and 22.11.18 for the “convex hull”. Therefore it is safest to avoid 
the term. 


22.11.20 REMARK: The convex envelope of a subset of a linear space. 

Robertson/Robertson [126], page 5, defines the “convex envelope” of a subset $S$ of a linear space $V$ to be
the set $\{\sum_{i=1}^n \lambda_i v_i;\ n \in \mathbb{Z}^+_0,\ \lambda \in K^n,\ v \in S^n,\ \sum_{i=1}^n |\lambda_i| \le 1\}$, which “balances” the set about the zero
vector $0_V$. This is clearly not the same as the convex hull.


22.11.21 REMARK: Convex combinations in affine spaces. 

The concept of an affine space is often argued to be closer to the common intuition of space than a linear 
space, since affine spaces have no specified origin. (See Chapter 26 for affine spaces.) The concept of a 
hyperplane segment in Definition 26.9.12 is applicable to a more general affine space over a unitary module 
over an ordered ring. 


22.11.22 REMARK: Convexity of Minkowski sums of convex sets. 

It is occasionally useful to know that the Minkowski sum of two convex sets in Definition 22.10.2 is a 
convex set. Theorem 22.11.23 implies in particular that the Minkowski sum of two real-number intervals is a 
real-number interval by Theorem 22.11.24. Minkowski set-sums are also convenient for some other purposes. 


22.11.23 THEOREM: Minkowski sums of convex sets are convex.
The Minkowski sum of convex subsets of a linear space over an ordered field is convex.


PROOF: Let $S_1, S_2$ be convex subsets of a linear space $V$ over an ordered field $K$. Let $w$ be a convex
combination of elements of the Minkowski sum $S_1 + S_2$. By the induction argument mentioned in Remark 22.11.9,
it suffices to consider combinations of pairs of points. Then $w = \lambda_1 w_1 + \lambda_2 w_2$ for some $w_1, w_2 \in S_1 + S_2$
and $\lambda_1, \lambda_2 \in K^+_0$ with $\lambda_1 + \lambda_2 = 1_K$ by Definition 22.11.2. But $w_j = v_{j,1} + v_{j,2}$ for some $v_{j,i} \in S_i$ for $i = 1, 2$
and $j = 1, 2$. So $w = \lambda_1(v_{1,1} + v_{1,2}) + \lambda_2(v_{2,1} + v_{2,2}) = (\lambda_1 v_{1,1} + \lambda_2 v_{2,1}) + (\lambda_1 v_{1,2} + \lambda_2 v_{2,2}) \in S_1 + S_2$ by
Definitions 22.11.8 and 22.10.2. Hence $S_1 + S_2$ is a convex set by Definition 22.11.8. The case of Minkowski
sums of general finite families of convex sets follows similarly.


22.11.24 THEOREM: Convexity of real-number intervals.
(i) A set of real numbers is convex if and only if it is an interval.

(ii) The Minkowski sum of real-number intervals is a real-number interval.


PROOF: For part (i), let $I$ be a real-number interval. Let $x, y \in I$ and $\lambda, \mu \in \mathbb{R}^+_0$ with $\lambda + \mu = 1$. If $x \le y$
then $\lambda x + \mu y = x + \mu(y - x) \ge x$. Similarly $\lambda x + \mu y \le y$. So $\lambda x + \mu y \in I$ by Definition 16.1.4, and similarly
if $x > y$. Therefore $I$ is convex. Conversely, if $I$ is convex then $I$ satisfies Definition 16.1.4 for a real-number
interval.


Part (ii) follows from part (i) and Theorem 22.11.23. 
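
Part (ii) can be illustrated for the special case of closed bounded intervals, where the Minkowski sum of $[a, b]$ and $[c, d]$ is $[a + c, b + d]$. The sketch below is restricted to that case, uses rational endpoints, and checks the claim only on finite samples of the intervals; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def minkowski_sum(s1, s2):
    """Return {x + y; x in s1, y in s2} for finite sets of numbers."""
    return {x + y for x, y in product(s1, s2)}

def interval_sum(i1, i2):
    """Endpoints of the Minkowski sum of closed bounded intervals [a, b] and [c, d]."""
    (a, b), (c, d) = i1, i2
    return (a + c, b + d)

def grid(interval, steps):
    """A finite sample of equally spaced points of a closed bounded interval."""
    a, b = interval
    return {a + (b - a) * Fraction(i, steps) for i in range(steps + 1)}

i1 = (Fraction(0), Fraction(1))
i2 = (Fraction(2), Fraction(3))
lo, hi = interval_sum(i1, i2)
samples = minkowski_sum(grid(i1, 4), grid(i2, 4))
```

Every sampled sum lies in the interval with endpoints `lo` and `hi`, and both endpoints are attained. This is a consistency check on samples, not a proof.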


22.11.25 THEOREM: The convex span of a set of two real numbers is a non-empty bounded closed interval. 
$\forall a, b \in \mathbb{R},\ \mathrm{conv}(\{a, b\}) = [[a, b]]$.


PROOF: The assertion follows from Theorems 22.11.17 and 16.1.16. 


Chapter 23


LINEAR MAPS AND DUAL SPACES


23.0.1 REMARK: Linear maps for infinite-dimensional linear spaces.

Linear maps are quite simple and clear-cut for linear spaces which have a finite basis. The situation is more 
complicated for infinite-dimensional linear spaces which have a well-ordered basis. But for spaces with a 
non-well-ordered basis, or with no basis at all, linear maps and linear functionals become quite problematic. 
The difficulties and mysteries of dual spaces for infinite-dimensional linear spaces typically go unnoticed 
because topology rescues the situation before it gets out of hand. By constraining linear maps and linear 
functionals to those which are bounded (i.e. continuous), the fathomless depths of infinitude of dual spaces 
are removed from consideration. (For example, the algebraic duals of $\mathbb{R}^{\mathbb{N}}$ and $\mathbb{R}^{\mathbb{R}}$ are rarely mentioned.)
In Chapter 23, some of the difficulties of unbounded algebraic duals are considered. Even a relatively 
benign space such as $\mathrm{FL}(\omega, \mathbb{R})$, which is the linear space of real sequences with finite support, has the dual
$\mathrm{UL}(\omega, \mathbb{R}) = \mathbb{R}^\omega$, which is the linear space of completely general real sequences, and the double dual of
$\mathrm{FL}(\omega, \mathbb{R})$ is difficult to describe because of choice-function issues. These give some idea of why algebraic
duals and double duals of infinite-dimensional spaces should be avoided. 


23.1. Linear maps 


23.1.1 DEFINITION: A linear map from a linear space $V$ over a field $K$ to a linear space $W$ over $K$ is a
map $\phi : V \to W$ such that

(i) $\forall v_1, v_2 \in V,\ \phi(v_1 + v_2) = \phi(v_1) + \phi(v_2)$, [vector additivity]
(ii) $\forall \lambda \in K,\ \forall v \in V,\ \phi(\lambda v) = \lambda \phi(v)$. [scalarity]


23.1.2 REMARK: Linear maps preserve linear combinations.

The combination of conditions (i) and (ii) in Definition 23.1.1 is equivalent to the following single linearity
condition.

\[ \forall \lambda_1, \lambda_2 \in K,\ \forall v_1, v_2 \in V,\quad \phi(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 \phi(v_1) + \lambda_2 \phi(v_2). \]


Definition 23.1.1 is equivalent to linearity for arbitrary linear combinations, as shown in Theorem 23.1.3. 


23.1.3 THEOREM: Linear maps commute with general finite linear combinations.
Let $V$ and $W$ be linear spaces over a field $K$. Let $\phi : V \to W$. Then $\phi$ is a linear map from $V$ to $W$ if and
only if

\[ \forall n \in \mathbb{Z}^+_0,\ \forall \lambda \in K^n,\ \forall v \in V^n,\quad \phi\Bigl( \sum_{i=1}^n \lambda_i v_i \Bigr) = \sum_{i=1}^n \lambda_i \phi(v_i). \tag{23.1.1} \]

Hence

\[ \forall \lambda \in \mathrm{Fin}(V, K),\quad \phi\Bigl( \sum_{v \in V} \lambda_v v \Bigr) = \sum_{v \in V} \lambda_v \phi(v). \tag{23.1.2} \]


PROOF: Let $V$ and $W$ be linear spaces over a field $K$. Let $\phi : V \to W$. Suppose $\phi$ satisfies line (23.1.1).
Let $v_1, v_2 \in V$. Then with $n = 2$, $\lambda_1 = \lambda_2 = 1_K$, line (23.1.1) implies that $\phi(v_1 + v_2) = \phi(v_1) + \phi(v_2)$, which
verifies Definition 23.1.1 (i). Now suppose that $\mu \in K$ and $w \in V$. Then with $n = 1$, $\lambda_1 = \mu$ and $v_1 = w$,
line (23.1.1) implies that $\phi(\mu w) = \mu \phi(w)$, which verifies Definition 23.1.1 (ii).

To show the converse, suppose that $\phi$ is a linear map according to Definition 23.1.1. Let $n = 0$ in line (23.1.1).
Then $\lambda$ and $v$ are empty sequences. So $\sum_{i=1}^n \lambda_i v_i = 0_V$. Therefore $\phi(\sum_{i=1}^n \lambda_i v_i) = \phi(0_V) = \phi(0_K 0_V) =
0_K \phi(0_V) = 0_W$ by Definition 23.1.1 (ii). Similarly, $\sum_{i=1}^n \lambda_i \phi(v_i) = 0_W$. So line (23.1.1) is verified for $n = 0$.
Suppose that line (23.1.1) is valid for some $n \in \mathbb{Z}^+_0$. Let $\mu \in K^{n+1}$ and $w \in V^{n+1}$. Then $\phi(\mu_{n+1} w_{n+1}) =
\mu_{n+1} \phi(w_{n+1})$ by Definition 23.1.1 (ii). So $\phi(\sum_{i=1}^{n+1} \mu_i w_i) = \phi(\mu_{n+1} w_{n+1} + \sum_{i=1}^n \mu_i w_i) = \phi(\mu_{n+1} w_{n+1}) +
\phi(\sum_{i=1}^n \mu_i w_i) = \mu_{n+1} \phi(w_{n+1}) + \sum_{i=1}^n \mu_i \phi(w_i) = \sum_{i=1}^{n+1} \mu_i \phi(w_i)$ by Definition 23.1.1 (i) and the inductive
assumption. Hence line (23.1.1) holds for all $n \in \mathbb{Z}^+_0$ by induction.
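
Line (23.1.1) can be checked numerically for a concrete linear map. The sketch below assumes $K = \mathbb{Q}$ (Python's `Fraction`), $V = K^3$, $W = K^2$, and a map given by a sample matrix; these choices, and the helper names, are illustrative only.

```python
from fractions import Fraction

def apply(matrix, v):
    """Apply the linear map phi(v) = matrix * v."""
    return tuple(sum((a * x for a, x in zip(row, v)), Fraction(0)) for row in matrix)

def lincomb(coeffs, vectors, dim):
    """Return sum_i coeffs[i] * vectors[i] in K^dim; the empty sum is the zero vector."""
    return tuple(
        sum((c * v[d] for c, v in zip(coeffs, vectors)), Fraction(0))
        for d in range(dim)
    )

def commutes(matrix, coeffs, vectors):
    """Test phi(sum_i lam_i v_i) == sum_i lam_i phi(v_i) for the given data."""
    dim_in, dim_out = len(matrix[0]), len(matrix)
    lhs = apply(matrix, lincomb(coeffs, vectors, dim_in))
    rhs = lincomb(coeffs, [apply(matrix, v) for v in vectors], dim_out)
    return lhs == rhs

phi = [[1, 2, 0], [0, -1, 3]]  # a sample linear map from K^3 to K^2
lam = [Fraction(1, 2), Fraction(-3), Fraction(5, 7)]
vs = [(1, 0, 2), (0, 4, -1), (3, 3, 3)]
```

The case of empty sequences corresponds to $n = 0$ in the proof, where both sides are zero vectors.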


23.1.4 NOTATION: Lin(V,W) denotes the set of linear maps from the linear space V to the linear space W. 


23.1.5 REMARK: Alternative notations for the space of linear maps between two spaces. 

The set of linear maps Lin(V, W) is denoted as L(V, W) by Rudin [129], page 184, and as Hom(V,W) by 
Federer [69], page 11. The “Hom” notation is unnecessarily unspecific. The concept of a homomorphism is 
very broad indeed. It seems therefore preferable to save the “Hom” notation for when it is really needed. 


The notation $\mathrm{Hom}_K(V, W)$ is used by some authors. (For example EDM2 [113], section 256.D.) But the
field $K$ is implied in the specification tuples for $V$ and $W$. It is best to avoid tacking “useful hints” onto
notations. (A similar kind of “useful hint” is the annoying practice of writing a manifold $M$ as $M^n$ to remind
the reader of the dimension of the manifold.)


23.1.6 DEFINITION: The linear space of linear maps from a linear space V to a linear space W is the set 
Lin(V, W) of linear maps from V to W, together with the operations of pointwise vector addition and scalar 
multiplication given by 


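\[ \forall f_1, f_2 \in \mathrm{Lin}(V, W),\ \forall v \in V,\quad (f_1 + f_2)(v) = f_1(v) + f_2(v) \]
and
\[ \forall \lambda \in K,\ \forall f \in \mathrm{Lin}(V, W),\ \forall v \in V,\quad (\lambda f)(v) = \lambda f(v). \]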


23.1.7 REMARK: The possible difficulty of constructing non-trivial linear maps. 

For any two linear spaces $V$ and $W$, the space $\mathrm{Lin}(V, W)$ contains at least the zero map $f : V \to W$ defined
by $f(v) = 0_W$ for all $v \in V$. If $V$ is equipped with a basis, linear maps are readily constructed from
the component map, which is well-defined by Theorem 22.8.11. Otherwise, it is not guaranteed that there 
are linear maps other than the zero maps in general. (This issue is also mentioned in Remark 23.5.1 and 
elsewhere in Section 23.5.) 


23.1.8 DEFINITION: Linear space morphisms. 
Let $V_1$, $V_2$ and $V$ be linear spaces over a field $K$.

A linear space homomorphism from $V_1$ to $V_2$ is a linear map $\phi : V_1 \to V_2$.

A linear space monomorphism from $V_1$ to $V_2$ is an injective linear space homomorphism $\phi : V_1 \to V_2$.

A linear space epimorphism from $V_1$ to $V_2$ is a surjective linear space homomorphism $\phi : V_1 \to V_2$.

A linear space isomorphism from $V_1$ to $V_2$ is a bijective linear space homomorphism $\phi : V_1 \to V_2$.

A linear space endomorphism on $V$ is a linear space homomorphism $\phi : V \to V$.

A linear space automorphism on $V$ is a linear space isomorphism $\phi : V \to V$.


23.1.9 NOTATION: Let $V_1$, $V_2$, $V$ be linear spaces over a field $K$.
$\mathrm{Hom}(V_1, V_2)$ denotes the set of linear space homomorphisms from $V_1$ to $V_2$.
$\mathrm{Mon}(V_1, V_2)$ denotes the set of linear space monomorphisms from $V_1$ to $V_2$.
$\mathrm{Epi}(V_1, V_2)$ denotes the set of linear space epimorphisms from $V_1$ to $V_2$.
$\mathrm{Iso}(V_1, V_2)$ denotes the set of linear space isomorphisms from $V_1$ to $V_2$.
$\mathrm{End}(V)$ denotes the set of linear space endomorphisms on $V$.
$\mathrm{Aut}(V)$ denotes the set of linear space automorphisms on $V$.


23.1.10 REMARK: Summary of linear space morphisms. 
The definitions of linear space morphisms are summarised in the following table.

morphisms        injective  surjective  V_1 = V_2
Hom(V_1, V_2)    -          -           -
Mon(V_1, V_2)    yes        -           -
Epi(V_1, V_2)    -          yes         -
Iso(V_1, V_2)    yes        yes         -
End(V)           -          -           yes
Aut(V)           yes        yes         yes


23.1.11 REMARK: The general linear group of a linear space. 

In the group-theoretic context, the set Aut(V) of linear space automorphisms of a linear space V is also 
denoted GL(V) as in Notation 23.1.12. (See also Notation 19.1.12 for the notation GL(M) for the set of 
automorphisms of a module M.) This has the advantage that it specifies that the structure which is preserved 
by maps is the linear space structure. However, GL(V) usually denotes not only the set of automorphisms, 
but also the group structure, and possibly also a standard topological or differentiable structure. As always, 
the context should clarify the precise meaning. 


23.1.12 NOTATION: $\mathrm{GL}(V)$, for a linear space $V$, denotes the set of linear space automorphisms on $V$. In
other words, $\mathrm{GL}(V) = \mathrm{Aut}(V)$.


23.1.13 THEOREM: The composite of two linear maps is a linear map. 
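Let $U$, $V$ and $W$ be linear spaces over a field $K$. Let $\phi_1 : U \to V$ and $\phi_2 : V \to W$ be linear maps. Then
$\phi_2 \circ \phi_1 : U \to W$ is a linear map.


PROOF: Let $v_1, v_2 \in U$. Then $(\phi_2 \circ \phi_1)(v_1 + v_2) = \phi_2(\phi_1(v_1 + v_2)) = \phi_2(\phi_1(v_1) + \phi_1(v_2)) = \phi_2(\phi_1(v_1)) +
\phi_2(\phi_1(v_2)) = (\phi_2 \circ \phi_1)(v_1) + (\phi_2 \circ \phi_1)(v_2)$. So Definition 23.1.1 (i) is satisfied for $\phi_2 \circ \phi_1 : U \to W$. Similarly
Definition 23.1.1 (ii) is satisfied because $(\phi_2 \circ \phi_1)(\lambda v) = \phi_2(\phi_1(\lambda v)) = \phi_2(\lambda \phi_1(v)) = \lambda \phi_2(\phi_1(v)) = \lambda (\phi_2 \circ
\phi_1)(v)$ for all $\lambda \in K$ and $v \in U$. Hence $\phi_2 \circ \phi_1 : U \to W$ is a linear map.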


23.1.14 THEOREM: The inverse of a linear space isomorphism is a linear space isomorphism. 
Let $\phi : V \to W$ be a linear space isomorphism between linear spaces $V$ and $W$. Then $\phi^{-1} : W \to V$ is a
linear space isomorphism.


PROOF: Let $\phi : V \to W$ be a linear space isomorphism between linear spaces $V$ and $W$. Then $\phi^{-1} : W \to V$
is a well-defined bijection because $\phi$ is a bijection.

Let $\lambda \in \mathrm{Fin}(W, K)$ and let $u = \phi^{-1}(\sum_{w \in W} \lambda_w w)$. Then $\phi(u) = \sum_{w \in W} \lambda_w w = \sum_{w \in W} \lambda_w \phi(\phi^{-1}(w)) =
\phi(\sum_{w \in W} \lambda_w \phi^{-1}(w))$ because $\phi$ is a linear map. So $u = \sum_{w \in W} \lambda_w \phi^{-1}(w)$ because $\phi$ is injective. Therefore
$\phi^{-1}$ is linear.


23.1.15 THEOREM: The component map of a linear space with a basis is an isomorphism. 
Let $B$ be a basis set for a linear space $V$ over a field $K$. Then the component map $\kappa_B : V \to \mathrm{FL}(B, K)$ is a
linear space isomorphism.


PROOF: Let $B$ be a basis set for a linear space $V$ over a field $K$. Then by Theorem 22.8.11, the component
map $\kappa_B : V \to \mathrm{FL}(B, K)$ is a well-defined bijection. The inverse of $\kappa_B$ is the linear combination map
$\phi_B : \mathrm{FL}(B, K) \to V$ defined by $\phi_B(\lambda) = \sum_{e \in B} \lambda_e e$ for all $\lambda \in \mathrm{Fin}(B, K)$. Clearly $\phi_B$ is a linear map. So
$\phi_B$ is a linear space isomorphism. Hence $\kappa_B$ is a linear space isomorphism by Theorem 23.1.14.


23.1.16 REMARK: The general linear group of a linear space is a left transformation group. 

The group operation $\sigma_G$ in Theorem 23.1.17 uses the order $g_1 \circ g_2$ so that the left transformation group
action rule will be satisfied. If the order is reversed to $g_2 \circ g_1$, then the tuple becomes a right transformation
group as in Definition 20.7.2, assuming that the action map $\mu_G$ is transposed.


23.1.17 THEOREM: The general linear group of a linear space is an effective left transformation group. 
Let $V$ be a linear space over a field $K$. Let $G = \mathrm{GL}(V)$. Define $\sigma_G : G \times G \to G$ by $\sigma_G : (g_1, g_2) \mapsto g_1 \circ g_2$.
Define $\mu_G : G \times V \to V$ by $\mu_G : (g, v) \mapsto g(v)$. Then the tuple $(\mathrm{GL}(V), V, \sigma_G, \mu_G)$ is an effective left
transformation group on $V$.


PROOF: The closure of $G$ under the composition operation $\sigma_G$ follows from Theorem 23.1.13. Existence
of inverses follows from Theorem 23.1.14. The identity map $\mathrm{id}_V : V \to V$ is an identity element for $G$. So
$(G, \sigma_G)$ is a group by Definition 17.3.2.

Let $g_1, g_2 \in G$ and $v \in V$. Then $\mu_G(\sigma_G(g_1, g_2), v) = \mu_G(g_1 \circ g_2, v) = (g_1 \circ g_2)(v) = g_1(g_2(v)) =
\mu_G(g_1, \mu_G(g_2, v))$. So $\mu_G$ and $\sigma_G$ satisfy Definition 20.1.2 (i). Let $v \in V$. Then $\mu_G(\mathrm{id}_V, v) = \mathrm{id}_V(v) = v$. So
$\mu_G$ and $\mathrm{id}_V$ satisfy Definition 20.1.2 (ii). Hence $(\mathrm{GL}(V), V, \sigma_G, \mu_G)$ is a left transformation group on $V$ by
Definition 20.1.2.

$G$ acts effectively on $V$ by Definition 20.2.1 because if $\mu_G(g, v) = v$ for all $v \in V$, then $g : V \to V$ is the
identity map, which is the identity element of $G$. Hence $(\mathrm{GL}(V), V, \sigma_G, \mu_G)$ is an effective left transformation
group on $V$ by Definition 20.2.1.


23.1.18 REMARK: Finite-dimensional linear spaces are fully classified by their dimension. 
The following comment, which is related to Theorem 23.1.20, is made by Gilmore [82], page 12. 


Because of the profound result that all N-dimensional vector spaces over the same field are 
equivalent (this is not true of groups with n elements or of algebras with the same dimension), 
it is necessary to study only one vector space of any dimension N in any detail. 


Probably much more profound than this is the fact that all bases for a finite-dimensional linear space 
have the same cardinality by Theorem 22.7.16. So the dimension of a linear space is uniquely determined
by the cardinality of any basis. (All other bases have the same cardinality.) Some authors define the 
dimension of a linear space to be the cardinality of a linearly independent spanning set for that space. (In 
Definition 22.5.2, the dimension is defined as the minimum cardinality of spanning sets without mentioning 
linear independence.) When the dimension is defined in terms of a basis, the “profound” observation is that
the cardinality of a single finite basis fully determines the space up to isomorphism because all other bases 
have the same cardinality. Therefore the nature of the profound observation depends on exactly how one 
defines dimension. (The motivation for the minimum spanning set cardinality style of definition is to ensure 
that every linear space has a well-defined dimension even if no basis can be proved to exist.) 


Theorems 23.1.19 and 23.1.20 extend the “profound result” for finite-dimensional spaces to general linear 
spaces with bases which are equinumerous in the sense of Definition 13.1.2. 


23.1.19 THEOREM: Permutation of basis elements yields a linear space isomorphism. 
Let $\psi : B_U \to B_V$ be a bijection for sets $B_U$ and $B_V$. Let $K$ be a field. Define $\Psi : \mathrm{FL}(B_U, K) \to \mathrm{FL}(B_V, K)$
by

\[ \forall f \in \mathrm{FL}(B_U, K),\ \forall j \in B_V,\quad \Psi(f)(j) = f(\psi^{-1}(j)). \tag{23.1.3} \]

Then $\Psi$ is a linear space isomorphism between $\mathrm{FL}(B_U, K)$ and $\mathrm{FL}(B_V, K)$.


PROOF: The map $\Psi$ on line (23.1.3) is well defined because $\mathrm{Dom}(f) = B_U = \mathrm{Dom}(\psi) = \mathrm{Range}(\psi^{-1})$ and
$\mathrm{Dom}(\psi^{-1}) = \mathrm{Range}(\psi) = B_V$. To show the linearity of $\Psi$, let $f, g \in \mathrm{FL}(B_U, K)$ and $\lambda, \mu \in K$. Then for
all $j \in B_V$, it follows from the definitions of vector addition and scalar multiplication on $\mathrm{FL}(B_V, K)$ that
$\Psi(\lambda f + \mu g)(j) = (\lambda f + \mu g)(\psi^{-1}(j)) = \lambda f(\psi^{-1}(j)) + \mu g(\psi^{-1}(j)) = \lambda \Psi(f)(j) + \mu \Psi(g)(j)$. Hence $\Psi(\lambda f + \mu g) =
\lambda \Psi(f) + \mu \Psi(g)$. So $\Psi$ is a linear map by Definition 23.1.1. The fact that $\Psi$ is an isomorphism follows from
the observation that $\Phi \circ \Psi = \mathrm{id}_{\mathrm{FL}(B_U, K)}$ and $\Psi \circ \Phi = \mathrm{id}_{\mathrm{FL}(B_V, K)}$, where $\Phi : \mathrm{FL}(B_V, K) \to \mathrm{FL}(B_U, K)$ is the
map defined according to line (23.1.3) for the inverse map $\phi = \psi^{-1} : B_V \to B_U$.
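
The induced map of line (23.1.3) can be sketched by storing a finite-support function as a Python dictionary which holds only its non-zero values. The bijection is a dictionary too, and the field elements are integers standing in for rationals. The names `push_forward`, `add` and `scale` are illustrative only.

```python
def push_forward(psi, f):
    """Return the function j -> f(psi^{-1}(j)), as in line (23.1.3).
    psi: a bijection given as a dict; f: a dict of non-zero values."""
    return {psi[i]: c for i, c in f.items() if c != 0}

def add(f, g):
    """Pointwise sum of two finite-support functions."""
    total = {x: f.get(x, 0) + g.get(x, 0) for x in set(f) | set(g)}
    return {x: c for x, c in total.items() if c != 0}

def scale(lam, f):
    """Pointwise scalar multiple of a finite-support function."""
    return {x: lam * c for x, c in f.items() if lam * c != 0}

psi = {"u1": "v2", "u2": "v3", "u3": "v1"}   # a sample bijection from B_U to B_V
psi_inv = {j: i for i, j in psi.items()}     # the inverse bijection from B_V to B_U

f = {"u1": 2, "u3": -1}
g = {"u1": 1, "u2": 5}

lhs = push_forward(psi, add(scale(3, f), scale(2, g)))
rhs = add(scale(3, push_forward(psi, f)), scale(2, push_forward(psi, g)))
round_trip = push_forward(psi_inv, push_forward(psi, f))
```

The equality of `lhs` and `rhs` is the linearity computation in the proof, and `round_trip` shows the map induced by the inverse bijection undoing the map induced by `psi`.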


23.1.20 THEOREM: Linear spaces with equinumerous bases are isomorphic. 
Let $U$ and $V$ be linear spaces with bases $B_U$ and $B_V$ which are equinumerous. (In other words, there exists
a bijection between $B_U$ and $B_V$.) Then $U$ and $V$ are isomorphic linear spaces.


PROOF: Let $U$ and $V$ be linear spaces over a field $K$. Let $B_U = (e^U_i)_{i \in I}$ and $B_V = (e^V_j)_{j \in J}$ be bases
for $U$ and $V$ respectively. (As mentioned in Remark 22.7.5, a basis which does not already have an index
may be self-indexed.) Let $\psi : I \to J$ be a bijection. (If a bijection is given between the ranges of indexed
basis families, the bijection may be induced onto the index sets because a basis family is always injective.)
Let $\kappa_{B_U} : U \to \mathrm{FL}(B_U, K)$ be the component map for $B_U$. (See Definition 22.8.7.) Define $\phi : U \to V$ by
$\forall u \in U,\ \phi(u) = \sum_{i \in I} \kappa_{B_U}(u)(i)\, e^V_{\psi(i)}$. Then $\phi(u)$ is a well-defined vector in $V$ for all $u \in U$ because $V$ is closed
under finite linear combinations and $\#(\{i \in I;\ \kappa_{B_U}(u)(i) \ne 0_K\}) < \infty$ for all $u \in U$. By Theorem 23.1.15,
$\kappa_{B_U} : U \to \mathrm{FL}(B_U, K)$ is a linear space isomorphism. Similarly, $\kappa_{B_V} : V \to \mathrm{FL}(B_V, K)$ is a linear space
isomorphism, and so $\kappa_{B_V}^{-1} : \mathrm{FL}(B_V, K) \to V$ is a linear space isomorphism. But $\phi = \kappa_{B_V}^{-1} \circ \Psi \circ \kappa_{B_U}$, where
$\Psi : \mathrm{FL}(B_U, K) \to \mathrm{FL}(B_V, K)$ is the linear space isomorphism which is induced between the linear spaces
$\mathrm{FL}(B_U, K)$ and $\mathrm{FL}(B_V, K)$ by the map $\psi : I \to J$ as in Theorem 23.1.19. So $\phi : U \to V$ is a linear space
isomorphism. Hence $U$ and $V$ are isomorphic linear spaces.


23.1.21 REMARK: The kernel of a linear map. 

For any linear map $\phi : V \to W$, for linear spaces $V$ and $W$ over the same field, the set $\{v \in V;\ \phi(v) = 0_W\}$ is
closed under linear combinations. Therefore this is a subspace of $V$ (with the algebraic operations inherited
from $V$). This subspace is an important attribute of a linear map. So it has a standard name and notation
as in Definition 23.1.22 and Notation 23.1.23.


23.1.22 DEFINITION: The kernel of a linear map $\phi : V \to W$ for linear spaces $V$ and $W$ over the same
field is the subspace $\{v \in V;\ \phi(v) = 0_W\}$ of $V$.


23.1.23 NOTATION: $\ker(\phi)$ denotes the kernel of a linear map $\phi : V \to W$ for linear spaces $V$ and $W$ over
the same field. In other words, $\ker(\phi) = \phi^{-1}(\{0_W\})$.


23.1.24 THEOREM: The kernel of a linear map is trivial if and only if the map is injective. 
Let $\phi : V \to W$ be a linear map for linear spaces $V$ and $W$ over the same field. Then $\ker(\phi) = \{0_V\}$ if and
only if $\phi$ is injective.


PROOF: Suppose that $\phi$ is injective. Then $\phi(v) \ne \phi(0_V) = 0_W$ for all $v \in V \setminus \{0_V\}$. So $\ker(\phi) = \{0_V\}$.


To show the converse, assume that $\ker(\phi) = \{0_V\}$. Suppose that $\phi$ is not injective. Then there are vectors
$v_1, v_2 \in V$ with $v_1 \ne v_2$ and $\phi(v_1) = \phi(v_2)$. So $\phi(v_2 - v_1) = \phi(v_2) - \phi(v_1) = 0_W$ by the linearity of $\phi$. Thus
$v_2 - v_1 \in \ker(\phi)$ and $v_2 - v_1 \ne 0_V$, which contradicts the assumption. Therefore $\phi$ is injective.
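
Over a finite field, the equivalence in Theorem 23.1.24 can be checked by brute force. The sketch below assumes the two-element field (arithmetic modulo 2) and linear maps given by sample 0/1 matrices; these choices and the function names are illustrative only.

```python
from itertools import product

def apply_mod2(matrix, v):
    """Apply the linear map v -> matrix * v over the two-element field."""
    return tuple(sum(a * x for a, x in zip(row, v)) % 2 for row in matrix)

def kernel(matrix, n):
    """List every vector of the n-dimensional domain which maps to the zero vector."""
    return [v for v in product((0, 1), repeat=n) if not any(apply_mod2(matrix, v))]

def is_injective(matrix, n):
    """Check injectivity by comparing the images of all 2^n domain vectors."""
    images = [apply_mod2(matrix, v) for v in product((0, 1), repeat=n)]
    return len(set(images)) == len(images)

m_injective = [[1, 0], [1, 1], [0, 1]]   # a sample map from a 2-dimensional to a 3-dimensional space
m_collapsing = [[1, 1], [1, 1]]          # a sample map which sends (1, 1) to zero
```

For the first matrix the kernel contains only the zero vector and the map is injective. For the second matrix the kernel contains a non-zero vector and injectivity fails.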


23.1.25 REMARK: The nullity of a linear map is the dimension of its kernel. 
The dimension of the kernel of a linear map is well defined because the dimension of any linear space is well 
defined, as mentioned in Remark 22.5.5. This well-defined concept is given a name in Definition 23.1.26.


23.1.26 DEFINITION: The nullity of a linear map $\phi : V \to W$ for linear spaces $V$ and $W$ over the same
field is the dimension of the kernel of $\phi$.


23.1.27 NOTATION: $\mathrm{nullity}(\phi)$ denotes the nullity of a linear map $\phi : V \to W$ for linear spaces $V$ and $W$
over the same field. In other words, $\mathrm{nullity}(\phi) = \dim(\ker(\phi))$.


23.1.28 THEOREM: A linear map is injective if and only if it has zero nullity.
Let $\phi : V \to W$ be a linear map for linear spaces $V$ and $W$ over the same field. Then $\phi$ is injective if and
only if $\mathrm{nullity}(\phi) = 0$.


PROOF: The assertion follows from Theorem 23.1.24 and Definition 23.1.26. 


23.1.29 REMARK: Relation between linear map injectivity and the rank-nullity theorem.

Theorem 23.1.28 is related to the rank-nullity formula which follows directly from the “first isomorphism
theorem” for linear spaces. (See Remark 24.2.15 for the three isomorphism theorems which are valid for a
wide range of algebraic categories.) Theorem 24.2.18 states that $\dim(\mathrm{Range}(\phi)) + \mathrm{nullity}(\phi) = \dim(V)$ for any
linear map $\phi : V \to W$ between linear spaces $V$ and $W$ over the same field such that $V$ is finite-dimensional.
The special case $\mathrm{nullity}(\phi) = 0$ implies that $\phi$ is injective, which implies that $\dim(\mathrm{Range}(\phi)) = \dim(V)$
because $\mathrm{Range}(\phi)$ and $V$ are then isomorphic.


23.1.30 REMARK: Expression for a linear map in terms of a basis for the domain space. 

Section 23.2 is concerned with component maps for linear maps, which require a basis for both the domain 
and target spaces. But with only a basis for the domain space, it is possible to express linear maps as linear 
combinations of values for the domain basis. 


23.1.31 THEOREM: Linear map expression in terms of action of the map on a basis. 
Let $V$ and $W$ be linear spaces over a field $K$. Let $B = (e_i)_{i \in I}$ be a basis for $V$. Let $\phi : V \to W$ be a linear
map. Then

\[ \forall v \in V,\quad \phi(v) = \sum_{i \in I} \kappa_B(v)_i\, \phi(e_i). \]


PROOF: Let $v \in V$. Then $v = \sum_{i \in I} \kappa_B(v)_i e_i$ by Definition 22.8.7. Therefore $\phi(v) = \sum_{i \in I} \kappa_B(v)_i \phi(e_i)$ by
Theorem 23.1.3.
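
Theorem 23.1.31 says that a linear map is determined by its values on a basis. As a sketch, take the space of polynomials of degree at most 2 with basis $e_i = x^i$ for $i = 0, 1, 2$, and let the map be differentiation into the polynomials of degree at most 1 with basis $(1, x)$. This example and the name `extend_linearly` are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def extend_linearly(images, dim_w):
    """images: dict sending each basis index i to the coordinate tuple of phi(e_i).
    Return phi acting on component dicts {i: kappa_B(v)_i} with finite support."""
    def phi(components):
        return tuple(
            sum((c * images[i][d] for i, c in components.items()), Fraction(0))
            for d in range(dim_w)
        )
    return phi

# Differentiation on the basis 1, x, x^2: the images are 0, 1 and 2x.
derivative = extend_linearly({0: (0, 0), 1: (1, 0), 2: (0, 2)}, dim_w=2)

# The polynomial 3 + 5x + 7x^2 has components {0: 3, 1: 5, 2: 7}.
result = derivative({0: 3, 1: 5, 2: 7})
```

The result is the coordinate tuple of $5 + 14x$, obtained purely from the components of the argument and the images of the basis vectors.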


23.2. Component maps for linear maps 


23.2.1 REMARK: Viewing linear maps through bases-and-components on the domain and range. 
Figure 23.2.1 illustrates the forward and reverse component maps for a linear map $\psi$ between two linear
spaces which each have a basis. (The inverse map $\psi^{-1}$ may or may not exist.)


[Diagram: the vector component maps $\kappa_{B_\ell}$ and the linear combination maps $\phi_{B_\ell}$ between $V_\ell$ and $\mathrm{Fin}(B_\ell, K)$
or $\mathrm{Fin}(I_\ell, K)$ for $\ell = 1, 2$, together with the linear map $\psi$ and its linear-map component maps.]

Figure 23.2.1 Component maps for a linear map


Let $\psi : V_1 \to V_2$ be a linear map between linear spaces $V_1$ and $V_2$ which have respective bases $B_\ell$, or
indexed bases $B_\ell = (e^\ell_i)_{i \in I_\ell}$, for $\ell = 1, 2$. By the definition of a linear space basis, the component maps
$\kappa_{B_\ell} : V_\ell \to \mathrm{Fin}(B_\ell, K)$ (or $\kappa_{B_\ell} : V_\ell \to \mathrm{Fin}(I_\ell, K)$) are well defined for $\ell = 1, 2$. Therefore the component
map $\psi_{B_1, B_2} = \kappa_{B_2} \circ \psi \circ \phi_{B_1} : \mathrm{Fin}(B_1, K) \to \mathrm{Fin}(B_2, K)$ is well defined, where the vector component map
$\kappa_{B_2} : V_2 \to \mathrm{Fin}(B_2, K)$ is as in Definition 22.8.6, and the linear combination map $\phi_{B_1} : \mathrm{Fin}(B_1, K) \to V_1$ is
defined by $\phi_{B_1} : k \mapsto \sum_{v \in B_1} k(v) v$.


It is more convenient to use an indexed basis. Then the component map is $\psi_{B_1, B_2} = \kappa_{B_2} \circ \psi \circ \phi_{B_1} :
\mathrm{Fin}(I_1, K) \to \mathrm{Fin}(I_2, K)$, where the vector component map $\kappa_{B_2} : V_2 \to \mathrm{Fin}(I_2, K)$ is as in Definition 22.8.7,
and the linear combination map $\phi_{B_1} : \mathrm{Fin}(I_1, K) \to V_1$ is defined by $\phi_{B_1} : k \mapsto \sum_{i \in I_1} k_i e^1_i$. This is now
formalised as Definition 23.2.2.


23.2.2 DEFINITION: The component map for a linear map $\psi : V_1 \to V_2$, for linear spaces $V_\ell$ over a field $K$
with indexed bases $B_\ell = (e^\ell_i)_{i \in I_\ell}$ for $\ell = 1, 2$, is the map $\psi_{B_1, B_2} = \kappa_{B_2} \circ \psi \circ \phi_{B_1} : \mathrm{Fin}(I_1, K) \to \mathrm{Fin}(I_2, K)$,
where the component map $\kappa_{B_2} : V_2 \to \mathrm{Fin}(I_2, K)$ is as in Definition 22.8.7, and the linear combination map
$\phi_{B_1} : \mathrm{Fin}(I_1, K) \to V_1$ is defined by $\phi_{B_1} : k \mapsto \sum_{i \in I_1} k_i e^1_i$.


23.2.3 REMARK: Linear-map component maps in the special case that the domain and range are equal. 

If $V_1 = V_2 = V$ in Definition 23.2.2 and $\psi : V_1 \to V_2$ is the identity map on $V$, then the component map
for $\psi$ is equivalent to the component transition map in Definition 22.9.4 (which is the component transition
map for a change of basis).

When $V_1 = V_2 = V$ and $\psi \in \mathrm{Aut}(V)$, the component maps $\psi_{B_1, B_2}$ and $\psi_{B_2, B_1}$ are both well defined.
If $\psi = \mathrm{id}_V$, then these component maps are inverses of each other.


23.2.4 REMARK: The fount of all matrices. 

The matrix algebra in Chapter 25 is quintessentially the algebra of the component matrices of linear maps. 
Wherever there are linear maps between finite-dimensional linear spaces, there are component matrices for 
these maps, and the composition rule for these matrices is a direct consequence of the composition of linear 
maps. Therefore one could say that Definition 23.2.5 is the fount of all matrices. 


It is true that matrices are also defined for bilinear functions, such as Riemannian metric tensors for example, 
but such matrices are merely the components of individual tensor objects in the same way that vector 
component tuples are the components of vector objects. The multiplication of such object component 
matrices has no real meaning. The quintessential operation on a matrix is matrix multiplication, and the 
origin of this operation is the composition of linear maps. 


It is also true that matrix multiplication arises naturally from the composition of basis-transition component 
arrays as in Theorem 22.9.8. But the component arrays for basis transitions are effectively the same as 
linear-map component arrays for the identity map. In other words, Definition 22.9.6 may be thought of as 
a special case of Definition 23.2.5. 
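
The claim that matrix multiplication originates in the composition of linear maps can be checked on a small case. The following is a minimal sketch, assuming coordinate spaces with their standard bases (so that the component maps are identity maps); the function names and the sample maps are hypothetical illustrations, not notation from this book.

```python
def component_array(psi, n):
    """Return a[j][i] = j-th component of psi(e_i) for the standard basis of K^n."""
    columns = [psi([1 if k == i else 0 for k in range(n)]) for i in range(n)]
    m = len(columns[0])
    return [[columns[i][j] for i in range(n)] for j in range(m)]

def mat_mul(a, b):
    """Usual matrix product: (ab)[j][i] = sum over m of a[j][m] * b[m][i]."""
    return [[sum(a[j][m] * b[m][i] for m in range(len(b)))
             for i in range(len(b[0]))] for j in range(len(a))]

def psi1(v):                       # a sample linear map K^2 -> K^3
    x, y = v
    return [x + y, 2 * y, x - y]

def psi2(w):                       # a sample linear map K^3 -> K^2
    x, y, z = w
    return [x + z, y - 3 * z]

def composite(v):                  # the composition psi2 o psi1 : K^2 -> K^2
    return psi2(psi1(v))

a1 = component_array(psi1, 2)      # 3 x 2 array
a2 = component_array(psi2, 3)      # 2 x 3 array
a21 = component_array(composite, 2)

# The array of the composite equals the product of the arrays.
print(a21 == mat_mul(a2, a1))
```

The index order a[j][i], with the range index first, is what makes the array of the composite equal to the product of the arrays in the same order as the composition.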


23.2.5 DEFINITION: The component array for a linear map $\psi : V_1 \to V_2$ between linear spaces $V_\ell$ over a
field $K$ with respect to indexed bases $B_\ell = (e^\ell_i)_{i \in I_\ell}$ for $\ell = 1, 2$ is the double family $(a_{ji})_{j \in I_2, i \in I_1}$, where


\[ \forall i \in I_1,\ \forall j \in I_2, \quad a_{ji} = \kappa_{B_2}(\psi(e^1_i))_j. \]


23.2.6 THEOREM: Components of a linear map between basis elements. 
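Let $B_\ell = (e^\ell_i)_{i \in I_\ell}$ be bases for linear spaces $V_\ell$ over a field $K$ for $\ell = 1, 2$. Let $\psi : V_1 \to V_2$ be a linear map.
Let $(a_{ji})_{j \in I_2, i \in I_1}$ be the linear-map component array for $\psi$ with respect to $B_1$ and $B_2$. Then


\[ \forall i \in I_1, \quad \psi(e^1_i) = \sum_{j \in I_2} a_{ji} e^2_j \tag{23.2.1} \]
and
\begin{align*}
\forall v \in V_1, \quad \psi(v) &= \sum_{i \in I_1} \kappa_{B_1}(v)_i \sum_{j \in I_2} a_{ji} e^2_j \tag{23.2.2} \\
&= \sum_{j \in I_2} \sum_{i \in I_1} a_{ji} \kappa_{B_1}(v)_i e^2_j.
\end{align*}


PROOF: Line (23.2.1) follows directly from Definitions 23.2.5 and 22.8.7.
Line (23.2.2) follows from line (23.2.1), Theorem 23.1.3 and Definition 22.8.7.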
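
As an illustration of Definition 23.2.5 and line (23.2.2), consider differentiation of polynomials. This is only a sketch under stated assumptions: polynomials of degree at most 3 are represented by coefficient lists with respect to the monomial basis, so the component maps are trivial, and the names `derivative` and `apply_array` are hypothetical.

```python
def derivative(coeffs):
    """Differentiate c0 + c1 x + c2 x^2 + c3 x^3, given as [c0, c1, c2, c3]."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

n, m = 4, 3                        # dimensions of the domain and range spaces
basis1 = [[1 if k == i else 0 for k in range(n)] for i in range(n)]

# a[j][i] = j-th component of psi(e^1_i), as in Definition 23.2.5.
a = [[derivative(basis1[i])[j] for i in range(n)] for j in range(m)]

def apply_array(a, v):
    """Components of psi(v) computed from the array, as in line (23.2.2)."""
    return [sum(a[j][i] * v[i] for i in range(len(v))) for j in range(len(a))]

v = [5, -1, 4, 2]                  # the polynomial 5 - x + 4x^2 + 2x^3
print(apply_array(a, v), derivative(v))
```

Both expressions give the coefficient list of the derivative, so the array alone determines the action of the map on every vector.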


23.2.7 REMARK: Component maps for maps between linear spaces. 

In the same way that component maps are defined for linear spaces in Definitions 22.8.6, 22.8.7 and 22.8.8,
component maps may be defined also for the linear space $\mathrm{Lin}(V_1, V_2)$ of all linear maps between spaces $V_1$
and $V_2$ in terms of bases for those spaces. The reverse index order in Definitions 23.2.8, 23.2.9 and 23.2.10
ensures that the usual conventions for matrix multiplication will be valid.


23.2.8 DEFINITION: The linear map component map for basis sets $B_1$ and $B_2$ of linear spaces $V_1$ and $V_2$
respectively over a field $K$ is the map $\kappa_{B_1,B_2} : \mathrm{Lin}(V_1, V_2) \to \mathrm{Fin}(B_2 \times B_1, K)$ with


\[ \forall \psi \in \mathrm{Lin}(V_1, V_2),\ \forall e_1 \in B_1,\ \forall e_2 \in B_2, \quad
\kappa_{B_1,B_2}(\psi)(e_2, e_1) = \kappa_{B_2}(\psi(e_1))(e_2), \]
where $\kappa_{B_2} : V_2 \to \mathrm{Fin}(B_2, K)$ is as in Definition 22.8.6.


23.2.9 DEFINITION: The linear map component map for basis families $B_1 = (e^1_i)_{i \in I_1}$ and $B_2 = (e^2_j)_{j \in I_2}$ for
linear spaces $V_1$ and $V_2$ respectively over a field $K$ is the map $\kappa_{B_1,B_2} : \mathrm{Lin}(V_1, V_2) \to \mathrm{Fin}(I_2 \times I_1, K)$ with

\[ \forall \psi \in \mathrm{Lin}(V_1, V_2),\ \forall i \in I_1,\ \forall j \in I_2, \quad \kappa_{B_1,B_2}(\psi)(j, i) = \kappa_{B_2}(\psi(e^1_i))_j, \]

where $\kappa_{B_2} : V_2 \to \mathrm{Fin}(I_2, K)$ is as in Definition 22.8.7.


23.2.10 DEFINITION: The linear map component map for finite bases $B_1 = (e^1_i)_{i=1}^n$ and $B_2 = (e^2_j)_{j=1}^m$ for
linear spaces $V_1$ and $V_2$ respectively over a field $K$ is the map $\kappa_{B_1,B_2} : \mathrm{Lin}(V_1, V_2) \to K^{m \times n}$ with


\[ \forall \psi \in \mathrm{Lin}(V_1, V_2),\ \forall i \in \mathbb{N}_n,\ \forall j \in \mathbb{N}_m, \quad
\kappa_{B_1,B_2}(\psi)(j, i) = \kappa_{B_2}(\psi(e^1_i))_j, \]


where $\kappa_{B_2} : V_2 \to K^m$ is as in Definition 22.8.8.


23.2.11 REMARK: Comparison of proofs of inverse linear-map and reverse basis-transition theorems. 
The proof of Theorem 23.2.12 is, unsurprisingly, very similar to the proof of Theorem 22.9.8, which is
effectively the special case of Theorem 23.2.12 for $V = V_1 = V_2$ and $\psi = \mathrm{id}_V$.


23.2.12 THEOREM: The component arrays of a linear map and its inverse are "inverse matrices". 

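Let $B_\ell = (e^\ell_i)_{i \in I_\ell}$ be bases for linear spaces $V_\ell$ over a field $K$ for $\ell = 1, 2$. Let $\psi : V_1 \to V_2$ be a linear space
isomorphism. Let $(a_{ji})_{j \in I_2, i \in I_1}$ and $(\bar{a}_{ij})_{i \in I_1, j \in I_2}$ be the linear-map component arrays with respect to $B_1$
and $B_2$ for $\psi$ and $\psi^{-1}$ respectively. Then


\[ \forall i, j \in I_2, \quad \sum_{m \in I_1} a_{im} \bar{a}_{mj} = \delta_{ij} \tag{23.2.3} \]
and
\[ \forall i, j \in I_1, \quad \sum_{m \in I_2} \bar{a}_{im} a_{mj} = \delta_{ij}. \tag{23.2.4} \]


PROOF: For line (23.2.3), let $v_j = \psi^{-1}(e^2_j)$ for $j \in I_2$. Then $v_j \in V_1$ for all $j \in I_2$, and $e^2_j = \psi(v_j) =
\sum_{i \in I_2} (\sum_{m \in I_1} a_{im} \kappa_{B_1}(v_j)_m) e^2_i$ by Theorem 23.2.6 line (23.2.2). But $e^2_j = \sum_{i \in I_2} \delta_{ij} e^2_i$ for all $j \in I_2$, and by
Definition 22.7.2 and Theorem 22.6.8 (iii), $e^2_j$ can be expressed in at most one way as a linear combination of
elements of $B_2$. So $\delta_{ij} = \sum_{m \in I_1} a_{im} \kappa_{B_1}(v_j)_m$ for all $i, j \in I_2$. However, $\kappa_{B_1}(v_j)_m = \kappa_{B_1}(\psi^{-1}(e^2_j))_m = \bar{a}_{mj}$
for all $j \in I_2$ and $m \in I_1$ by Definition 23.2.5. Hence $\delta_{ij} = \sum_{m \in I_1} a_{im} \bar{a}_{mj}$ for all $i, j \in I_2$.

Line (23.2.4) follows exactly as for line (23.2.3) by swapping $\psi$ and $\psi^{-1}$.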
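
Lines (23.2.3) and (23.2.4) can be confirmed numerically for a particular isomorphism. The sketch below assumes the standard basis of a two-dimensional coordinate space for both $B_1$ and $B_2$; the map `psi`, its hand-computed inverse `psi_inv` and the helper names are hypothetical.

```python
def component_array(psi, n):
    """a[j][i] = j-th component of psi(e_i) for the standard basis of K^n."""
    cols = [psi([1 if k == i else 0 for k in range(n)]) for i in range(n)]
    return [[cols[i][j] for i in range(n)] for j in range(len(cols[0]))]

def mat_mul(a, b):
    """Sum over the shared middle index m, as in lines (23.2.3) and (23.2.4)."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def psi(v):                        # an isomorphism of K^2 (determinant 1)
    return [2 * v[0] + v[1], v[0] + v[1]]

def psi_inv(w):                    # its inverse, written out by hand
    return [w[0] - w[1], -w[0] + 2 * w[1]]

a = component_array(psi, 2)
a_bar = component_array(psi_inv, 2)
identity = [[1, 0], [0, 1]]        # the Kronecker delta as an array

print(mat_mul(a, a_bar) == identity, mat_mul(a_bar, a) == identity)
```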




23.3. Trace of a linear map 


23.3.1 REMARK: Invariance of the trace of a linear space endomorphism. 

The most important fact about the trace of a linear space endomorphism in Definition 23.3.3 is that its value 
is independent of the choice of basis. (For the component map $\kappa_B : V \to K^n$, see Definition 22.8.8.) This
invariance is probably also the most annoying fact about the trace. Although the diagonal elements which 
are added to compute the trace all individually change when the basis changes, somehow the sum of these 
elements is invariant. 


The trace of linear space endomorphisms finds extensive applications in differential geometry, especially 
for the “contraction of indices” of tensors. The requirement of a basis for the definition of the trace is a 
particular embarrassment for anti-components campaigners, some of whom would probably ban coordinates, 
bases and indices outright if they could. Since the trace is basis-independent, one can at least hide the role 
of a basis in its definition by using an abstract notation such as $\mathrm{Tr}(\psi)$.


Nostalgia for the good old days of synthetic geometry is somewhat misplaced. Anyone who has experienced 
the rigours of traditional Euclidean geometry knows that it was infested with non-obvious constructions and 
plethoras of tedious labels for points, lines, angles, triangles and other figures, and its scope was limited 
to straight lines and a few conic sections in flat space. The arithmetisation of geometry by Fermat and 
Descartes achieved substantial simplification and a huge broadening of scope which more than compensate 
for the lack of geometric immediacy. (However, neither Fermat nor Descartes used a coordinate system with 
two axes. See Boyer [236], page 76; Cajori [241], page 175.) It is analogous to the replacement of analogue 
computers with digital computers, which replace voltages with bits. Anyone who has tried to build a basic 
analogue computer with op amps (i.e. operational amplifier chips) would certainly not feel nostalgia for the 
good old days before digital computers.


23.3.2 THEOREM: Basis-independence of the trace of a linear space endomorphism.
Let $\psi \in \mathrm{Lin}(V, V)$ be a linear map for a finite-dimensional linear space $V$. Then $\sum_{i=1}^n \kappa_{B_1}(\psi(e^1_i))_i =
\sum_{i=1}^n \kappa_{B_2}(\psi(e^2_i))_i$ for any two bases $B_1 = (e^1_i)_{i=1}^n$ and $B_2 = (e^2_i)_{i=1}^n$ for $V$.


PROOF: Let $\psi \in \mathrm{Lin}(V, V)$ for a finite-dimensional linear space $V$. Let $B_1 = (e^1_i)_{i=1}^n$ and $B_2 = (e^2_i)_{i=1}^n$ be
bases for $V$. Let $a_{ji} = \kappa_{B_2}(e^1_i)_j$ and $\bar{a}_{ji} = \kappa_{B_1}(e^2_i)_j$ for all $i, j \in \mathbb{N}_n$. Then


\begin{align*}
\sum_{i=1}^n \kappa_{B_2}(\psi(e^2_i))_i
&= \sum_{i=1}^n \kappa_{B_2}\Big(\psi\Big(\sum_{j=1}^n \bar{a}_{ji} e^1_j\Big)\Big)_i \tag{23.3.1} \\
&= \sum_{i=1}^n \kappa_{B_2}\Big(\sum_{j=1}^n \bar{a}_{ji} \psi(e^1_j)\Big)_i \tag{23.3.2} \\
&= \sum_{i=1}^n \sum_{j=1}^n \bar{a}_{ji} \kappa_{B_2}(\psi(e^1_j))_i \tag{23.3.3} \\
&= \sum_{j=1}^n \kappa_{B_1}(\psi(e^1_j))_j, \tag{23.3.4}
\end{align*}


where line (23.3.1) follows from Theorem 22.9.11 line (22.9.3), line (23.3.2) follows from the linearity of $\psi$,
line (23.3.3) follows from Theorem 22.8.12, and line (23.3.4) follows from Theorem 22.9.11 line (22.9.6).
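
The basis-independence of the sum of diagonal components can be observed directly in a small case. The following is a sketch only, assuming a two-dimensional space over the rational numbers with vectors given as pairs; the helper `components` (which inverts a basis by Cramer's rule), the sample map and the sample bases are hypothetical.

```python
from fractions import Fraction

def components(basis, v):
    """Components of v with respect to a basis (e1, e2) of Q^2, by Cramer's rule."""
    (a, c), (b, d) = basis         # e1 = (a, c), e2 = (b, d)
    det = Fraction(a * d - b * c)
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - v[0] * c) / det]

def psi(v):                        # a sample endomorphism of Q^2
    return (v[0] + 2 * v[1], 3 * v[0] + 4 * v[1])

def trace(psi, basis):
    """Sum over i of the i-th component of psi(e_i), computed in the given basis."""
    return sum(components(basis, psi(e))[i] for i, e in enumerate(basis))

standard = [(1, 0), (0, 1)]
skewed = [(1, 1), (1, 2)]

print(trace(psi, standard), trace(psi, skewed))
```

The individual diagonal entries differ between the two bases (1 and 4 in the first, -1 and 6 in the second), but both sums equal 5.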


23.3.3 DEFINITION: The trace of a linear space endomorphism $\psi \in \mathrm{Lin}(V, V)$, for a finite-dimensional
linear space $V$ over a field $K$, is the element $\sum_{i=1}^n \kappa_B(\psi(e_i))_i$ of $K$, where $B = (e_i)_{i=1}^n$ is any basis for $V$.


23.3.4 NOTATION: $\mathrm{Tr}(\psi)$, for a linear space endomorphism $\psi \in \mathrm{Lin}(V, V)$ on a finite-dimensional linear
space $V$, denotes the trace of $\psi$.


23.3.5 REMARK: Geometric interpretation of the trace of a linear map as its divergence. 

Without resorting to analytical concepts, it is possible to give a geometric kind of interpretation for the trace
of a linear map. Consider the family of transformations $\phi : \mathbb{R} \to \mathrm{Lin}(V, V)$ given by $\phi(t)(v) = v + tL(v)$ for
a linear map $L \in \mathrm{Lin}(V, V)$ for some finite-dimensional linear space $V$. Given any figure $S \subseteq V$, such as a
rectangular or ellipsoidal region with finite positive volume, the image $\phi(t)(S)$ of $S$ under the map $\phi(t)$ will
have a volume which is a polynomial function with respect to $t$ for small $t$. The constant term in this polynomial is
the initial volume of $S = \phi(0)(S)$. The first-order term has the form $t \, \mathrm{Tr}(L) \, \mathrm{vol}(S)$. This follows from the
well-known properties of the characteristic polynomial for a linear map. Thus the trace of a linear map may
be identified with the divergence of the linear map $v \mapsto Lv$. (See Section 71.8 for the divergence of a vector
field, which employs the trace for its definition.)
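
In two dimensions the polynomial can be written out in full. The sketch below assumes that the volume of the image of a region under a linear map is scaled by the absolute value of the determinant, which is positive for small $t$ here; the matrix `L` and the function names are hypothetical.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2 x 2 array."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def identity_plus(t, L):
    """Array of phi(t) = id + t L on a two-dimensional space."""
    return [[(1 if i == j else 0) + t * L[i][j] for j in range(2)] for i in range(2)]

L = [[1, 2], [3, 4]]
trace_L = L[0][0] + L[1][1]
det_L = det2(L)

def volume_factor(t):
    """Factor by which phi(t) scales volumes, valid while the value is positive."""
    return det2(identity_plus(t, L))

t = Fraction(1, 1000)
first_order_estimate = (volume_factor(t) - 1) / t   # equals Tr(L) + t det(L)
print(trace_L, first_order_estimate)
```

Here the volume factor is exactly $1 + t\,\mathrm{Tr}(L) + t^2 \det(L)$, so the difference quotient approaches the trace as $t$ tends to zero.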


23.4. Linear functionals 


23.4.1 REMARK: Specialisation of linear maps to linear functionals. 

Linear functionals are linear maps from a linear space to the field of the linear space. As mentioned in 
Remark 22.1.8 and Definition 22.1.9, a field may be regarded as a linear space whose field is itself. Hence 
Definition 23.1.1 for a linear map applies immediately to the case where the range of the map is a field 
(regarded as a linear space), but this field must be the same as the field of the domain space. 


A linear map where the range is the field of the domain space is called a “linear functional”. The use of a 
different name is justified by the large number of special properties of linear functionals which are not valid 
for general linear maps. However, the form of the definition for a linear functional in Definition 23.4.2 is 
essentially identical to that of linear maps in Definition 23.1.1. 


23.4.2 DEFINITION: A linear functional on a linear space $V$ over a field $K$ is a function $f : V \to K$ which
satisfies the following conditions.


(i) $\forall v_1, v_2 \in V$, $f(v_1 + v_2) = f(v_1) + f(v_2)$. [vector additivity]
(ii) $\forall \lambda \in K$, $\forall v \in V$, $f(\lambda v) = \lambda f(v)$. [scalarity]


23.4.3 REMARK: Equivalence of additivity and scalarity to linear-combination linearity.
In the same way as is mentioned in Remark 23.1.2 for the case of linear maps, the two conditions in
Definition 23.4.2 are together equivalent to the following combined linearity condition.


\[ \forall \lambda_1, \lambda_2 \in K,\ \forall v_1, v_2 \in V, \quad f(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 f(v_1) + \lambda_2 f(v_2). \]


Theorem 23.4.4 asserts that Definition 23.4.2 is equivalent to linearity for arbitrary linear combinations.


23.4.4 THEOREM: Equivalent linear-combination linearity condition for linear functionals.
Let $V$ be a linear space over a field $K$. Let $f : V \to K$. Then $f$ is a linear functional on $V$ if and only if


\[ \forall n \in \mathbb{Z}^+_0,\ \forall \lambda \in K^n,\ \forall v \in V^n, \quad
f\Big(\sum_{i=1}^n \lambda_i v_i\Big) = \sum_{i=1}^n \lambda_i f(v_i). \]


PROOF: The proof by induction is the same as for Theorem 23.1.3.


23.4.5 THEOREM: Unique determination of linear functionals by their action on basis elements. 
On a linear space which has a basis, all linear functionals are uniquely determined by their action on the 
elements of the basis. 


PROOF: Let $V$ be a linear space which has a basis family $B = (e_i)_{i \in I}$ over a field $K$. Let $f$ be a linear
functional on $V$. Define the family $c = (c_i)_{i \in I} \in K^I$ by $c_i = f(e_i)$ for all $i \in I$. Let $v \in V$. Then $v =
\sum_{i \in I} \lambda_i e_i$ for some $\lambda \in \mathrm{Fin}(I, K)$ by Definition 22.7.6. So $f(v) = f(\sum_{i \in I} \lambda_i e_i) = \sum_{i \in I} \lambda_i f(e_i) = \sum_{i \in I} \lambda_i c_i$.
But by Theorem 22.8.2, $\lambda$ is uniquely determined by $v$. So $f(v)$ is uniquely determined by $v$ and $c$. Hence
the action of $f$ on all of $V$ is uniquely determined by the action of $f$ on the elements of the basis $B$.


23.4.6 THEOREM: Expression of linear functional in terms of its action on a basis.
Let $f$ be a linear functional on a linear space $V$ with basis $B$. Then $\forall v \in V$, $f(v) = \sum_{e \in B} \kappa_B(v)(e) f(e)$.


PROOF: Let $v \in V$. Then $v = \sum_{e \in B} \kappa_B(v)(e) e$ by Definition 22.8.6, where $\kappa_B : V \to \mathrm{Fin}(B, K)$ is the
component map for $B$. Thus $\#\{e \in B;\ \kappa_B(v)(e) \ne 0\} < \infty$. So $f(v) = \sum_{e \in B} \kappa_B(v)(e) f(e)$ by Theorem 23.4.4.
The assertion follows from this.
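
The formula in Theorem 23.4.6 can be tried out on a non-standard basis. The following is a minimal sketch, assuming a two-dimensional space over the rational numbers; the helper `components`, the functional `f` and the basis are hypothetical choices made for illustration.

```python
from fractions import Fraction

def components(basis, v):
    """Components of v with respect to a basis (e1, e2) of Q^2, by Cramer's rule."""
    (a, c), (b, d) = basis         # e1 = (a, c), e2 = (b, d)
    det = Fraction(a * d - b * c)
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - v[0] * c) / det]

def f(v):                          # a sample linear functional on Q^2
    return 3 * v[0] - 2 * v[1]

basis = [(1, 1), (1, 2)]
v = (4, 7)

kappa = components(basis, v)       # here v = 1*(1, 1) + 3*(1, 2)
expanded = sum(k * f(e) for k, e in zip(kappa, basis))

print(expanded, f(v))
```

Only the two values of `f` on the basis vectors are used in `expanded`, which is the sense in which the functional is determined by its action on the basis.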


23.4.7 REMARK: Guaranteed existence of linear functionals on a linear space with a basis. 
Linear functionals are guaranteed to exist in a linear space which has a basis. As shown in Theorem 23.4.8, 
each individual component of the component map with respect to a basis is a linear functional. 


23.4.8 THEOREM: Each component of a component map is a linear functional. 

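Let $V$ be a linear space over a field $K$. Let $B = (e_i)_{i \in I}$ be a basis family for $V$. Let $\kappa_B : V \to \mathrm{Fin}(I, K)$ be
the component map for $B$ in Definition 22.8.7. Then the map $h_i : V \to K$ defined by $h_i : v \mapsto \kappa_B(v)(i)$ is a
linear functional on $V$ for all $i \in I$.


PROOF: For $V$, $K$, $B$, $\kappa_B$ and $h_i$ as in the statement of the theorem, let $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$.
By Theorem 23.1.15, the component map $\kappa_B : V \to \mathrm{Fin}(I, K)$ is a linear space isomorphism. Therefore
$h_i(\lambda_1 v_1 + \lambda_2 v_2) = \kappa_B(\lambda_1 v_1 + \lambda_2 v_2)(i) = (\lambda_1 \kappa_B(v_1) + \lambda_2 \kappa_B(v_2))(i) = \lambda_1 \kappa_B(v_1)(i) + \lambda_2 \kappa_B(v_2)(i)$ by the
pointwise definition of operations on $\mathrm{Fin}(I, K)$. This equals $\lambda_1 h_i(v_1) + \lambda_2 h_i(v_2)$. Hence $h_i$ is a linear
functional on $V$.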


23.4.9 REMARK: Ad-hoc “juxtaposition products” of linear functionals by vectors. 

In the expression in Theorem 23.1.31 for a linear map $\phi : V \to W$ in terms of a basis $B$ for $V$, the maps
$v \mapsto \kappa_B(v)_i$ for $v \in V$ are linear functionals in $V^*$ by Theorem 23.4.8. This suggests replacing the formula
$\forall v \in V$, $\phi(v) = \sum_{i \in I} \kappa_B(v)_i \phi(e_i)$ with the simpler-looking $\phi = \sum_{i \in I} \phi(e_i) (\kappa_B)_i$, where $\phi(e_i) \in W$ and
$(\kappa_B)_i \in V^*$ for each $i \in I$. This looks like a linear combination of the linear functionals $(\kappa_B)_i$ except that
the multipliers in the “linear combination” are vectors in $W$, not scalars in $K$. This raises the question of
what kind of product is intended here for elements of $W$ and $V^*$. (Such “juxtaposition products” are seen
in the literature for vector-valued differential forms for example.)


The juxtaposition product in Definition 23.4.10 is clearly well defined. If $V$ and $W$ have the same field, the
product $f \cdot w$ is a well-defined linear map from $V$ to $W$. This is very general, requiring no bases nor any
relation between the spaces apart from the common field.


The juxtaposition product is quite limited. Only linear maps $g \in \mathrm{Lin}(V, W)$ which have a one-dimensional
range can be written as $g = f \cdot w$ for some $f \in V^*$ and $w \in W$. However, if $V$ has a basis, Theorem 23.1.31
implies that $g$ may be written as a sum of such products.


23.4.10 DEFINITION: The juxtaposition product of a linear functional $f$ by a vector $w$, where $f \in V^*$,
$w \in W$, and $V$ and $W$ are linear spaces over the same field, is the map $h : V^* \times W \to \mathrm{Lin}(V, W)$ defined by


\[ \forall f \in V^*,\ \forall w \in W,\ \forall v \in V, \quad h(f, w)(v) = f(v) w. \]
The expression $h(f, w)$ may be denoted as $f \cdot w$ for $f \in V^*$ and $w \in W$, or alternatively as $w \cdot f$.


23.4.11 REMARK: Formula for linear maps in terms of juxtaposition products. 
In terms of the juxtaposition product in Definition 23.4.10, Theorem 23.1.31 may be written as 


\[ \forall \phi \in \mathrm{Lin}(V, W), \quad \phi = \sum_{i \in I} (\kappa_B)_i \cdot \phi(e_i), \tag{23.4.1} \]


for any linear spaces $V$ and $W$ over the same field $K$, where $B = (e_i)_{i \in I}$ is a basis for $V$. Since each term
in the sum is a linear map from $V$ to $W$, the summation uses the addition operation for the linear space
$\mathrm{Lin}(V, W)$ in Definition 23.1.6.

The fact that the index set $I$ could be infinite might seem to make the infinite sum perilous. However,
each term $(\kappa_B)_i \cdot \phi(e_i)$ in the sum has “finite support” in the sense that $((\kappa_B)_i \cdot \phi(e_i))(e_j) \ne 0$ for at most
one $j \in I$ (namely $j = i$), and any vector $v \in V$ has a non-zero component $\kappa_B(v)_i$ with respect to $B$ for only a finite
number of $i \in I$. So for any fixed $v \in V$, only a finite number of terms in the sum are non-zero. In this sense,
the sum “converges” for each $v \in V$. Moreover, the sum “converges” to $\phi(v)$. This justifies line (23.4.1).
Although all of the terms may be non-zero linear maps, only a finite number of terms are non-zero for any
fixed argument $v \in V$.
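
For a finite basis, line (23.4.1) can be checked by building each juxtaposition product as a function and adding the results pointwise. This is a sketch under stated assumptions: the domain is a two-dimensional coordinate space with its standard basis, the range is three-dimensional, and the names `juxtapose`, `add_maps` and the sample map `phi` are hypothetical.

```python
def juxtapose(f, w):
    """The map v -> f(v) w of Definition 23.4.10, for a functional f and vector w."""
    return lambda v: [f(v) * wk for wk in w]

def add_maps(g1, g2):
    """Pointwise sum of two maps with values in K^3."""
    return lambda v: [x + y for x, y in zip(g1(v), g2(v))]

def phi(v):                        # a sample linear map K^2 -> K^3
    return [v[0] + v[1], 2 * v[1], v[0] - v[1]]

basis = [[1, 0], [0, 1]]
coordinate = [lambda v: v[0], lambda v: v[1]]     # the functionals (kappa_B)_i

# The right-hand side of line (23.4.1) for the two-element index set.
rebuilt = add_maps(juxtapose(coordinate[0], phi(basis[0])),
                   juxtapose(coordinate[1], phi(basis[1])))

print(rebuilt([3, -2]), phi([3, -2]))
```

Each summand has a range of dimension at most one, yet the sum reproduces the whole map.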


If the linear space $W$ is replaced by the field $K$, line (23.4.1) seems to state that the family $B^* = ((\kappa_B)_i)_{i \in I}$ is a
basis for $V^*$ because then any linear functional $\phi \in \mathrm{Lin}(V, K) = V^*$ may be written as a “linear combination”
of members of $B^*$. This apparent “dual basis” is given the name “canonical dual basis” in Definition 23.7.3,
and line (23.4.1) is then equivalent to Theorem 23.7.7. However, by the comments in Remark 23.7.4,
$B^*$ does not span $V^*$ if $V$ is infinite-dimensional. This follows from the observation that $\mathrm{Lin}(V, K)$ can only
be “spanned” by $B^*$ if infinite subfamilies are permitted, which contradicts Definition 22.7.6 since the span
of a family in Definition 22.4.3 permits only finite linear combinations.


It is the special linear-functional structure of the vectors $(\kappa_B)_i \in V^*$ which makes it possible to reduce
infinite linear combinations to finite linear combinations by restricting these linear functionals to particular
fixed vectors $v \in V$. Without knowledge of this special structure, infinite linear combinations in general
are meaningless because the meaningfulness of linear combinations in Definition 22.3.2 relies upon
mathematical induction to ensure that all finite sums lie within the linear space. It should also be noted that
the “convergence” of the “infinite linear combinations” here depends on the presence of only one copy of
the linear functional $(\kappa_B)_i$ in $B^*$ for each $i \in I$. If infinitely many copies were permitted, then even for a
fixed vector $v \in V$, the number of terms in the sum in line (23.4.1) would be infinite and therefore undefined
(because there is no topology to define convergence with). Thus it is both the special nature of the individual
members of $B^*$ and the fact that there is only one copy of each $(\kappa_B)_i$ which makes the pseudo-convergence
in line (23.4.1) possible.


23.5. Existence of non-zero linear functionals 


23.5.1 REMARK:  Non-zero linear functionals cannot necessarily always be constructed. 

For any linear space $V$ over a field $K$, the zero linear functional $f : V \to K$ defined by $f(v) = 0_K$ for
all $v \in V$ is well defined. So every linear space has at least one linear functional. Perhaps surprisingly, there
is no guarantee for a general linear space that there are any linear functionals apart from the zero functional.


It might seem obvious that a non-zero linear functional can be constructed by hand for any linear space. So
let $V$ be a linear space over a field $K$, and suppose that nothing is known about this linear space apart from
the fact that it is not spanned by any finite set of vectors. To construct a non-zero linear functional $f$ on $V$,
one may presumably choose a vector $v_1 \in V \setminus \{0_V\}$ and specify that $f(v_1) = 1_K$. The linearity of $f$ implies
that $f(\lambda_1 v_1) = \lambda_1$ for all $\lambda_1 \in K$. So $f$ is a perfectly valid non-zero linear functional on the one-dimensional
subspace $W_1 = \{\lambda_1 v_1;\ \lambda_1 \in K\}$ of $V$.

Since it does not matter how the linear functional is defined on $V \setminus W_1$, it is tempting to specify simply
that $f(u) = 0_K$ for all $u \in V \setminus W_1$. However, this cannot be linear. To see why, note that $u + v_1 \in V \setminus W_1$
for all $u \in V \setminus W_1$. (Otherwise $u$ would equal the difference between two elements of $W_1$, which would
imply that $u \in W_1$.) Therefore $f(u + v_1) = 0_K$ for all $u \in V \setminus W_1$. But linearity implies that $f(u + v_1) =
f(u) + f(v_1) = 1_K \ne 0_K$.

This method can be improved a little by choosing $v_2 \in V \setminus W_1$ and specifying $f(v_2) = 0_K$, from which it
follows by linearity that $f(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1$ for all $\lambda_1, \lambda_2 \in K$. This is a valid non-zero linear functional
on $W_2 = \mathrm{span}(v_1, v_2)$. This process can be continued for any particular finite linearly independent set of
vectors by hand if one has sufficient patience. By countable induction, this procedure can be extended to
an arbitrary finite or countably infinite linearly independent set of vectors. Using transfinite induction, the
same procedure yields a linear functional if a well-ordered basis is available for the linear space. In fact,
the well-ordering of the basis is not necessary. But without any basis at all, it is not possible within ZF set
theory to construct a non-zero linear functional.
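
The failure of the tempting definition, and its repair on a two-dimensional subspace, can be seen in the smallest possible case. The sketch below assumes vectors given as pairs of integers with $v_1 = (1, 0)$ and $v_2 = (0, 1)$; the function names are hypothetical.

```python
v1 = (1, 0)

def naive(v):
    """Equal to l on W1 = {l * v1}, and zero everywhere else. Not linear."""
    return v[0] if v[1] == 0 else 0

def repaired(v):
    """f(l1 * v1 + l2 * v2) = l1 with v2 = (0, 1). This one is linear."""
    return v[0]

def add(u, v):
    """Vector addition for pairs."""
    return (u[0] + v[0], u[1] + v[1])

u = (0, 1)                         # a vector outside W1

print(naive(add(u, v1)), naive(u) + naive(v1))            # additivity fails
print(repaired(add(u, v1)), repaired(u) + repaired(v1))   # additivity holds
```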


23.5.2 REMARK: Methods for guaranteeing the existence of non-zero linear functionals. 

Theorem 23.5.3 guarantees the existence of a non-zero linear functional on a linear space if the space has a 
basis. If no basis is available, Theorem 23.5.5 will usually guarantee the existence of constructible non-zero 
linear functionals. 


23.5.3 THEOREM: Construction of non-zero linear functionals on a non-trivial linear space with a basis.
Let $V$ be a linear space over a field $K$, which has a basis. Then for any non-zero $v \in V$, there is a linear
functional $f : V \to K$ such that $f(v) \ne 0$.


PROOF: Let $B$ be a basis for $V$. Let $v \in V \setminus \{0\}$. Denote the component map for the basis $B$ by
$\kappa : V \to K^B$. Then $v = \sum_{e \in B} \kappa(v)(e) e$. The set $B' = \{e \in B;\ \kappa(v)(e) \ne 0\}$ is finite. Let $\bar{e} \in B$ be a
basis vector such that $\kappa(v)(\bar{e}) \ne 0$. (There must be such a vector $\bar{e}$ because $v \ne 0$.) Define $f : V \to K$ by
$f(w) = \kappa(v)(\bar{e})^{-1} \kappa(w)(\bar{e})$ for all $w \in V$. Then $f$ is a well-defined linear functional on $V$ with $f(v) = 1_K$.
This is clearly non-zero.
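
The recipe in this proof is completely explicit once a basis is given. The following sketch assumes that vectors are finitely supported families of rational coefficients indexed by basis labels, stored as Python dictionaries, so that the component map is the dictionary itself; the function name and sample data are hypothetical.

```python
from fractions import Fraction

def nonzero_functional(v):
    """Return a linear functional f with f(v) = 1, following the proof's recipe."""
    support = sorted(e for e, c in v.items() if c != 0)    # the finite set B'
    if not support:
        raise ValueError("v must be non-zero")
    e_bar = support[0]                                     # any element of B' would do
    scale = Fraction(1) / Fraction(v[e_bar])
    return lambda w: scale * Fraction(w.get(e_bar, 0))

v = {"x": 0, "y": 5, "z": 2}       # components of v; only "y" and "z" are non-zero
f = nonzero_functional(v)

w1 = {"x": 1, "y": 3}
w2 = {"y": -1, "z": 4}
combo = {"x": 2, "y": 3, "z": 12}  # the linear combination 2 * w1 + 3 * w2

print(f(v), f(combo), 2 * f(w1) + 3 * f(w2))
```

The choice of the first label in sorted order is arbitrary. It stands in for the unspecified choice of a basis vector in the proof.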


23.5.4 REMARK: Existence of non-zero linear functionals on subspaces of unrestricted linear spaces. 
Theorem 23.5.5 is shallow, but surprisingly effective. The majority of linear spaces are either subspaces of 
unrestricted linear spaces $K^S$ (according to Definition 22.2.5), or are easily identifiable with such subspaces.
(For example, the linear space of all real or complex Cauchy sequences or the linear space $C^\infty(\mathbb{R}^n, \mathbb{R})$ in
Notation 42.1.10.) Therefore in practice, finding non-zero linear functionals is usually not difficult. (One
needs only to find a $v \in V$ and $x \in S$ with $v(x) \ne 0_K$, which should be easy if $V \ne \{0_V\}$.)


23.5.5 THEOREM: Existence of non-zero linear functionals on subspaces of unrestricted linear spaces. 
Let $K^S$ be the unrestricted linear space on a set $S$ with field $K$ (with vector addition and scalar multiplication
defined pointwise). Let $V$ be a linear subspace of $K^S$.


(i) For any $x \in S$, the map $\phi : V \to K$ defined by $\phi : v \mapsto v(x)$ is a linear functional on $V$.


(ii) If $\exists v \in V$, $v(x) \ne 0_K$, then $\phi : v \mapsto v(x)$ is a non-zero linear functional on $V$.


PROOF: For part (i), let $x \in S$, and define $\phi : V \to K$ by $\phi : v \mapsto v(x)$. Then $\phi(\lambda_1 v_1 + \lambda_2 v_2) =
(\lambda_1 v_1 + \lambda_2 v_2)(x) = \lambda_1 v_1(x) + \lambda_2 v_2(x) = \lambda_1 \phi(v_1) + \lambda_2 \phi(v_2)$ for any $v_1, v_2 \in V$ and $\lambda_1, \lambda_2 \in K$. Hence $\phi$ is a
linear functional on $V$.

For part (ii), let $x \in S$, and define $\phi : V \to K$ by $\phi : v \mapsto v(x)$. Let $v_0 \in V$ satisfy $v_0(x) \ne 0_K$. Then
$\phi(v_0) = v_0(x) \ne 0_K$. So $\phi$ is a non-zero linear functional on $V$.
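
Evaluation functionals of this kind are easily expressed with functions as vectors. This is a sketch only, assuming that elements of $K^S$ are Python functions on the integers; the names `evaluate_at`, `combine`, `square` and `affine` are hypothetical.

```python
def evaluate_at(x):
    """The functional v -> v(x) of part (i)."""
    return lambda v: v(x)

def combine(l1, v1, l2, v2):
    """Pointwise linear combination l1 * v1 + l2 * v2 in K^S."""
    return lambda s: l1 * v1(s) + l2 * v2(s)

def square(s):                     # a sample element of K^S
    return s * s

def affine(s):                     # another sample element of K^S
    return 2 * s + 1

phi = evaluate_at(3)

print(phi(combine(2, square, -1, affine)), 2 * phi(square) - phi(affine))
```

Since `phi(square)` is non-zero, this `phi` is a non-zero functional on any subspace which contains `square`, as in part (ii).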


23.5.6 REMARK: Extendability of linear functionals from a subspace to the full space.

It is frequently required to know that there exists a linear extension of a linear functional from a subspace 
to the whole linear space. This is guaranteed by Theorem 23.5.7 in the case of linear spaces which have a 
well-ordered basis. (In ZF+AC set theory, every linear space has a well-ordered basis by Theorem 22.7.21, 
which yields Theorems 23.5.9 and 23.5.10.) 


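23.5.7 THEOREM: Explicit extendability of linear functionals from a subspace to the full space.
Let $V$ be a linear space over a field $K$ which has a well-ordered basis. Let $f$ be a linear functional on a
subspace $W$ of $V$. Then $f$ may be extended to a linear functional $\bar{f} : V \to K$.


PROOF: A proof may be constructed following the pattern of the proof of Theorem 22.7.26. Let $W$ be a
subspace of a linear space $V$ over a field $K$. Let $V$ have a basis $B = (v_\alpha)_{\alpha \in A}$, where $A$ is a well-ordered set.
Let $f : W \to K$ be a linear functional on $W$.


Define $g : A \to \{0, 1\}$ inductively for $\alpha \in A$ by


\[ g(\alpha) = \begin{cases} 1 & \text{if } v_\alpha \notin \mathrm{span}(W \cup \{v_\beta;\ \beta < \alpha \text{ and } g(\beta) = 1\}) \\ 0 & \text{otherwise.} \end{cases} \]


Then by transfinite induction (Theorem 11.8.2), $\mathrm{span}(W \cup \{v_\alpha;\ g(\alpha) = 1\}) = V$ and $\forall \alpha \in A$, $(g(\alpha) = 1 \Rightarrow
v_\alpha \notin W)$. So $W \cap \mathrm{span}(S) = \{0_V\}$, where $S = \{v_\alpha;\ g(\alpha) = 1\}$.
Let $u \in V = \mathrm{span}(W \cup S)$. Then by the definition of the linear span, $u =
\sum_{e \in W \cup S} \lambda_e e$ for some $\lambda \in \mathrm{Fin}(W \cup S, K)$. Since $W$ and $S$ are disjoint sets, $u = \sum_{e \in W} \lambda_e e + \sum_{e \in S} \lambda_e e =
w + \sum_{e \in S} \lambda_e e$, where $w = \sum_{e \in W} \lambda_e e \in W$. Now $f : W \to K$ can be extended to $\bar{f} : V \to K$ by defining
$\bar{f}(u) = f(w)$ for all $u \in V$. It must be shown that this extension is well defined and linear on $V$.
Let $u \in V$ and suppose that $u = w_1 + \sum_{e \in S} \mu^1_e e = w_2 + \sum_{e \in S} \mu^2_e e$ for some $w_1, w_2 \in W$ and $\mu^1, \mu^2 \in
\mathrm{Fin}(S, K)$. Then $w_1 - w_2 = \sum_{e \in S} (\mu^2_e - \mu^1_e) e$. But $w_1 - w_2 \in W$ and $\sum_{e \in S} (\mu^2_e - \mu^1_e) e \in \mathrm{span}(S)$.
So $w_1 = w_2$ because $W \cap \mathrm{span}(S) = \{0_V\}$. Thus the choice of $w$ is unique for a given $u \in V$.
To show the linearity of $\bar{f}$, let $v = \sum_{i=1}^n \rho_i u_i$ with $n \in \mathbb{Z}^+_0$, $\rho \in K^n$ and $u \in V^n$. Then $u_i = w_i + \sum_{e \in S} \mu^i_e e$
for some $w \in W^n$ and $\mu^i \in \mathrm{Fin}(S, K)$ for each $i \in \mathbb{N}_n$. So $v = \sum_{i=1}^n \rho_i (w_i + \sum_{e \in S} \mu^i_e e) = \sum_{i=1}^n \rho_i w_i +
\sum_{i=1}^n \sum_{e \in S} \rho_i \mu^i_e e$. Therefore

\begin{align*}
\bar{f}(v) &= \bar{f}\Big(\sum_{i=1}^n \rho_i w_i + \sum_{i=1}^n \sum_{e \in S} \rho_i \mu^i_e e\Big) \\
&= f\Big(\sum_{i=1}^n \rho_i w_i\Big) \\
&= \sum_{i=1}^n \rho_i f(w_i) \\
&= \sum_{i=1}^n \rho_i \bar{f}(u_i).
\end{align*}
Hence $\bar{f}$ is linear.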


23.5.8 REMARK: Existence of linear functionals if the axiom of choice is accepted. 

A general linear space is guaranteed to have a well-ordered basis if the axiom of choice is assumed. (See 
AC-tainted Theorem 22.7.21 for this.) The AC versions of Theorems 23.5.3 and 23.5.7 are Theorems 23.5.9 
and 23.5.10 respectively. Acceptance of the axiom of choice does not reduce the workload. The proofs are 
the same. Only the claimed generality differs. 


23.5.9 THEOREM [ZF+AC]: Implicit extendability of linear functionals from a vector to the full space.
Let $V$ be a linear space over a field $K$. Then for any non-zero $v \in V$, there is a linear functional $f : V \to K$
such that $f(v) \ne 0$.


PROOF: This follows immediately from Theorems 22.7.21 and 23.5.3.


23.5.10 THEOREM [ZF+AC]: Implicit extendability of linear functionals from a subspace to the full space. 
Let $f$ be a linear functional on a subspace $W$ of a linear space $V$ over a field $K$. Then there is a linear
functional $\bar{f} : V \to K$ which extends $f$ from $W$ to $V$.


PROOF: This follows immediately from Theorems 22.7.21 and 23.5.7.


23.6. Dual linear spaces 


23.6.1 REMARK: The algebraic dual of a linear space. 

The set of linear functionals on a linear space has a natural linear space structure by defining pointwise vector 
addition and scalar multiplication operations. The space of linear functionals is called the “dual space”. The
original space may then be referred to as the “primal space”, although the word “primal” is rarely used in
this way in the literature.


23.6.2 NOTATION: $V^*$ denotes the set of linear functionals on a linear space $V$.


23.6.3 REMARK: The set of linear functionals on a linear space.
Notation 23.6.2 means the following for linear spaces $V$.


\[ V^* = \Big\{ f : V \to K;\ \forall \lambda \in \mathrm{Fin}(V, K),\ f\Big(\sum_{v \in V} \lambda_v v\Big) = \sum_{v \in V} \lambda_v f(v) \Big\}, \]


where $K$ is the field for $V$. This may be written in terms of finite-tuple linear combinations as follows.


\[ V^* = \Big\{ f : V \to K;\ \forall n \in \mathbb{Z}^+_0,\ \forall \lambda \in K^n,\ \forall w \in V^n,\
f\Big(\sum_{i=1}^n \lambda_i w_i\Big) = \sum_{i=1}^n \lambda_i f(w_i) \Big\}. \]


The set $V^*$ in Notation 23.6.2 is given a linear space structure in Definition 23.6.4.


23.6.4 DEFINITION: The dual (linear) space of a linear space $V \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu_V)$ is the linear space
$V^* \mathrel{-\!\!<} (K, V^*, \sigma_K, \tau_K, \sigma_{V^*}, \mu_{V^*})$ where the set $V^*$ is the set of all linear functionals on $V$, and $\sigma_{V^*}$, $\mu_{V^*}$ are
the pointwise vector addition and scalar multiplication operations on $V^*$ respectively. In other words,


(i) $\forall w_1, w_2 \in V^*$, $\forall v \in V$, $\sigma_{V^*}(w_1, w_2)(v) = \sigma_K(w_1(v), w_2(v))$, [pointwise vector addition]
(ii) $\forall \lambda \in K$, $\forall w \in V^*$, $\forall v \in V$, $\mu_{V^*}(\lambda, w)(v) = \tau_K(\lambda, w(v))$. [pointwise scalar multiplication]


23.6.5 NOTATION: V* denotes the dual linear space of a linear space V. 


23.6.6 REMARK: The fine distinction between the dual linear space and the set of linear functionals. 
The notation V* is used both for the linear space V* in Notation 23.6.5 and the set V* in Notation 23.6.2. 
This ambiguity is almost always harmless. 


23.6.7 REMARK: Alternative names for the dual linear space. 
Robertson/Robertson [126], page 25, calls Definition 23.6.4 the “algebraic dual" or “algebraic conjugate” to 
distinguish it from the “topological dual”. 


23.6.8 THEOREM: If all linear functionals have zero value on a vector, then the vector is zero. 
Let $V$ be a linear space which has a basis. Then $(\forall w \in V^*,\ w(v) = 0) \Rightarrow v = 0$ for all $v \in V$. In other words, the only
vector in $V$ for which all linear functionals on $V$ have the zero value is the zero vector.


PROOF: This theorem is the logical contrapositive of Theorem 23.5.3.


23.6.9 THEOREM: Vectors are uniquely determined by the action of linear functionals on them. 
Let V be a linear space which has a basis. Then 


\[ \forall v_1, v_2 \in V, \quad (\forall w \in V^*,\ w(v_1) = w(v_2)) \Rightarrow v_1 = v_2. \]


In other words, if all linear functionals on $V$ have the same value on two vectors in $V$, then the two vectors
are the same.


PROOF: Let $v_1, v_2 \in V$ have the property that $\forall w \in V^*$, $w(v_1) = w(v_2)$, where $V$ is a linear space which
has a basis. Then $\forall w \in V^*$, $w(v_1 - v_2) = 0$ by the linearity of $w$ for $w \in V^*$. So $v_1 - v_2 = 0$ by
Theorem 23.6.8. So $v_1 = v_2$. Hence $(\forall w \in V^*,\ w(v_1) = w(v_2)) \Rightarrow v_1 = v_2$.


23.6.10 THEOREM: Scalar multiple relation between linear functionals implies the same for two vectors. 
Let $V$ be a linear space over a field $K$, which has a basis. Let $t \in K$. Then


\[ \forall v_1, v_2 \in V, \quad (\forall w \in V^*,\ w(v_1) = t w(v_2)) \Rightarrow v_1 = t v_2. \]


PROOF: Let $V$ be a linear space over a field $K$, which has a basis. Let $t \in K$. Let $v_1, v_2 \in V$, and suppose
that $\forall w \in V^*$, $w(v_1) = t w(v_2)$. Then $\forall w \in V^*$, $w(v_1) = w(t v_2)$. So by Theorem 23.6.9, $v_1 = t v_2$.


23.6.11 REMARK: The dual of a proper subspace is not a subspace of the dual.

The dual of a proper linear subspace of a linear space is not a subspace of the dual of the primal space.
To see why this is so, let $U$ be a linear subspace of a linear space $V$ such that $U \ne V$. The elements of
$V^*$ are linear functionals $f : V \to K$, where $K$ is the scalar field of $V$. The elements of $U^*$ are linear
functionals $g : U \to K$. But $\mathrm{Dom}(g) = U \ne V = \mathrm{Dom}(f)$. Therefore $g \notin V^*$. In fact, $U^* \cap V^* = \emptyset$.


This observation regarding duals of subspaces has particular relevance for the definition of antisymmetric 
tensors in Section 30.4. In the tensor algebra context, one defines subspaces of tensor spaces which are 
defined by a symmetry or antisymmetry constraint of some kind. Then the formation of the dual of such a 
subspace has an untidy relationship with the dual of the unconstrained space. 


It seems straightforward, perhaps, to extend linear functionals g : U — K to the whole space V, thereby 
identifying elements of U* with elements of V*. The existence of such linear extensions is sometimes in 
some doubt, as noted in Remarks 23.5.1 and 23.5.6, although in essentially all practical cases, the existence 
of (algebraic) extensions to (algebraic) linear functionals is easily proved (without invoking the axiom of 
choice). The uniqueness issue is more serious. In general, there are infinitely many linear extensions from 
a proper subspace to the whole space, if at least one is known to exist. With the aid of a basis, one may 
standardise such extensions to some extent, but the artificiality of such a construction is very unsatisfying 
and unsatisfactory. 
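
The non-uniqueness of extensions is visible already in two dimensions. The following sketch assumes that $U$ is the first coordinate axis of a two-dimensional coordinate space; the names `g` and `extension` are hypothetical.

```python
def g(u):
    """A linear functional on U = {(x, 0)}, defined by g(x, 0) = x."""
    assert u[1] == 0
    return u[0]

def extension(c):
    """One linear extension of g to the whole plane for each value c = f(0, 1)."""
    return lambda v: v[0] + c * v[1]

f0, f1 = extension(0), extension(1)

# Both extensions agree with g on U, but they differ off U.
print([f0((x, 0)) == g((x, 0)) == f1((x, 0)) for x in range(-3, 4)])
print(f0((0, 1)), f1((0, 1)))
```

Choosing $c = 0$ corresponds to standardising the extension with the aid of the basis $((1, 0), (0, 1))$. A different complementary basis vector would single out a different extension.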


23.6.12 REMARK: The dual of a subspace is easier to define than the dual of the full space. 

The fact that the dual $U^*$ of a subspace $U$ is not a subspace of the dual $V^*$ of the full space $V$, but is instead
a space of restricted linear functionals with domain $U$, makes the definition of $U^*$ easier. Instead of needing
to define the value of $f(v)$ for all $v \in V$, it is only necessary to define $f(v)$ for $v \in U$. If $V$ is a huge space
such as $\mathbb{R}^{\omega}$ or $\mathbb{R}^{\mathbb{R}}$, axiom-of-choice issues arise in the definition of $V^*$, as mentioned in Remark 23.7.16. But
if $U$ is so constrained that it has an identifiable basis, then the construction of the dual is straightforward,
as indicated in Theorem 23.7.14.


For a wide range of infinite-dimensional linear spaces which are of practical interest, axiom-of-choice issues 
do not arise in the formation of dual spaces. This is partly because the primal spaces are so restricted that a 
basis, possibly well-ordered, can be readily identified. It is also partly because the spaces of linear functionals 
are themselves constrained to be bounded with respect to some topology. Thus even the double dual, under 
such circumstances, may be well-defined in ZF set theory. 


23.7. Dual linear space bases 
23.7.1 REMARK: A “canonical dual basis” is not a basis in general. 


The “canonical dual basis” sets and families defined in Definitions 23.7.2 and 23.7.3 are not bases for the
algebraic dual linear space if the primal space is infinite-dimensional. This is discussed in Remark 23.7.4.


23.7.2 DEFINITION: The canonical dual basis of a basis set $E$ for a linear space $V$ is the set $\{h_e;\ e \in E\}$,
where $h_e : V \to K$ is defined for all $e \in E$ by $h_e(v) = \kappa_E(v)(e)$ for all $v \in V$, where $K$ is the field of $V$ and
$\kappa_E : V \to \mathrm{Fin}(E, K)$ is the component map for $E$ in Definition 22.8.6.


23.7.3 DEFINITION: The canonical dual basis of a basis family $B = (e_i)_{i\in I}$ for a linear space $V$ is the
family $(h_i)_{i\in I}$, where $h_i : V \to K$ is defined for all $i \in I$ by $h_i(v) = \kappa_B(v)(i)$ for all $v \in V$, where $K$ is the
field of $V$ and $\kappa_B : V \to \mathrm{Fin}(I, K)$ is the component map for $B$ in Definition 22.8.7.


23.7.4 REMARK: Difficulties of constructing an algebraic dual basis for an infinite-dimensional space. 
Definitions 23.7.2 and 23.7.3 only define bases if the primal space is finite-dimensional. These “canonical
dual basis” constructions are intuitively appealing, but they do not span the dual linear space if the primal
space is infinite-dimensional.

The family $(h_i)_{i\in I}$ in Definition 23.7.3 certainly does define a family of linear functionals $h_i$ on $V$, and the
span of these linear functionals is a linear space with the same dimension as $V$. However, this span is a
proper subspace of the dual space $V^*$ if $\dim(V) = \infty$.


It is an appealing thought that a useful restricted dual space may be constructed as the span of the “canonical
dual basis” in Definition 23.7.3. This would be analogous to the way a topological dual consisting of only
the bounded linear functionals is constructed for topological linear spaces. However, the span of the family
of linear functionals $(h_i)_{i\in I}$ depends on the primal basis $B$.

To see how the primal basis may affect the span of the “canonical dual basis”, consider Examples 22.9.2
and 22.9.3. These examples present well-ordered bases $B$ and $\tilde{B}$ respectively for the free linear space
$V = \mathrm{FL}(\mathbb{Z}_0^+, \mathbb{R})$. Consider the linear functional $f : V \to \mathbb{R}$ defined by $f(v) = \sum_{i\in\mathbb{Z}_0^+} v(i)$ for all $v \in V$. This
function is well defined because $v(i) = 0$ for all except a finite number of values of $i$. So $f \in V^*$, but $f$ is not
in the span of the elements of $B^*$, where $B^* = (h_i)_{i\in\mathbb{Z}_0^+}$ is constructed from the basis family $B = (e_i)_{i\in\mathbb{Z}_0^+}$
as above. On the other hand, $f$ is in the span of the elements of the family $\tilde{B}^*$, where $\tilde{B}^* = (\tilde{h}_i)_{i\in\mathbb{Z}_0^+}$ is
constructed from the basis $\tilde{B} = (\tilde{e}_i)_{i\in\mathbb{Z}_0^+}$ as above. In fact, $f = \tilde{h}_0$. So
$\mathrm{span}(\mathrm{Range}(B^*)) \ne \mathrm{span}(\mathrm{Range}(\tilde{B}^*))$.
This is a little surprising. The bases $B$ and $\tilde{B}$ span the same space $V$, but the dual families do not. In this
example, $\mathrm{span}(\mathrm{Range}(B^*)) \subseteq \mathrm{span}(\mathrm{Range}(\tilde{B}^*))$, but $\mathrm{span}(\mathrm{Range}(B^*)) \cap \mathrm{Range}(\tilde{B}^*) = \emptyset$.
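
The basis dependence just described can be checked on a small computational model. The following Python sketch makes several assumptions which are not taken from the text: rational coefficients stand in for real coefficients, vectors of finite support are stored as dictionaries, and the second basis is taken to be $t_0 = e_0$, $t_k = e_k - e_{k-1}$, which has the stated property $f = \tilde{h}_0$ but may or may not coincide with the basis of Example 22.9.3. All function names are hypothetical.

```python
from fractions import Fraction

# A vector of FL(Z_0^+, Q) is stored as a dict {index: coefficient} with finite support.

def unit(i):
    """The standard basis vector e_i."""
    return {i: Fraction(1)}

def h(i, v):
    """Canonical dual functional h_i for the standard basis (e_i)."""
    return v.get(i, Fraction(0))

def shifted(k):
    """Assumed second basis: t_0 = e_0 and t_k = e_k - e_{k-1} for k >= 1."""
    return {0: Fraction(1)} if k == 0 else {k: Fraction(1), k - 1: Fraction(-1)}

def h_shifted(k, v):
    """Canonical dual functional for (t_k): the tail sum of v(i) over i >= k."""
    return sum((c for i, c in v.items() if i >= k), Fraction(0))

def f(v):
    """The functional f(v) = sum of the finitely many coefficients of v."""
    return sum(v.values(), Fraction(0))

v = {0: Fraction(1, 2), 3: Fraction(-2), 7: Fraction(5)}
agrees = f(v) == h_shifted(0, v)  # f coincides with the 0th shifted dual functional

# A finite combination of the h_i vanishes on e_j beyond its support, whereas f(e_j) = 1.
coefficients = {0: Fraction(3), 4: Fraction(-1)}
j = max(coefficients) + 1
combination_at_ej = sum(c * h(i, unit(j)) for i, c in coefficients.items())
```

The last three lines illustrate, for one sample combination only, why $f$ cannot be a finite linear combination of the $h_i$: any such combination vanishes on $e_j$ for every $j$ beyond its support, but $f(e_j) = 1$ for all $j$.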

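The dual of any free linear space $V = \mathrm{FL}(S, K)$ may be identified with the unrestricted linear space $K^S$. (See
Definition 22.2.5 for unrestricted linear spaces.) To see this, define $\phi_\lambda(f) = \sum_{x\in S} \lambda(x) f(x)$ for all $\lambda \in K^S$
and $f \in \mathrm{FL}(S, K)$. Then $\phi_\lambda(f)$ is well-defined because $f(x) = 0$ for all but a finite number of elements $x \in S$,
and $\phi_\lambda(f)$ is clearly a linear function of $f$. So $\phi : K^S \to V^*$ maps the unrestricted linear space to $V^*$.


To see that $\phi : K^S \to V^*$ is a bijection, let $g \in V^*$ and define $\lambda_g \in K^S$ by $\lambda_g(x) = g(e_x)$ for all $x \in S$, where
$e_x \in \mathrm{FL}(S, K)$ is defined by $e_x : y \mapsto \delta_{x,y}$ for $x \in S$. Then
$\phi_{\lambda_g}(f) = \sum_{x\in S} \lambda_g(x) f(x) = \sum_{x\in S} g(e_x) f(x) = \sum_{x\in S} g(f(x) e_x) = g\bigl(\sum_{x\in S} f(x) e_x\bigr) = g(f)$
because $g$ is linear and $f = \sum_{x\in S} f(x) e_x$ for all $f \in \mathrm{FL}(S, K)$.
Hence $\phi_{\lambda_g} = g$ for all $g \in V^*$, so $\phi$ is surjective. It is injective because $\phi_\lambda(e_x) = \lambda(x)$ for all
$x \in S$. So $\phi$ is a bijection.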


It follows that all of the issues which are mentioned for unrestricted linear spaces on an infinite set in
Remark 22.7.22 apply to the unrestricted dual $V^*$ of any free linear space on an infinite set. Moreover, every
infinite-dimensional linear space which has a basis is isomorphic to a free linear space on an infinite set. (See
Theorem 23.1.15.)


It may be concluded that the unrestricted algebraic dual of any infinite-dimensional linear space is seriously 
problematic. Even if one restricts the dual space to those linear functionals which are in the span of a “dual
basis” constructed as in Definitions 23.7.2 and 23.7.3, the resulting restricted dual space is basis-dependent,
which is clearly highly unsatisfactory. Examples 22.9.2 and 22.9.3 are easily extended to any infinite free 
linear space FL(S, K), which implies that the basis dependence applies to all infinite-dimensional primal 
linear spaces which have a basis. 


It is possible to restrict the algebraic dual of an infinite-dimensional linear space in a basis-independent 
way by making use of a topology on the primal space. But that is another story. (See Sections 39.1, 39.3 
and 39.5.) 


23.7.5 THEOREM: The canonical dual basis is uniquely determined by values on basis vectors. 
The “canonical dual basis” in Definition 23.7.3 is the unique family of linear functionals $(h_i)_{i\in I}$ on $V$ such
that $h_i(e_j) = \delta_{ij}$ for all $i, j \in I$.


PROOF: The family $(h_i)_{i\in I}$ in Definition 23.7.3 satisfies $h_i(e_j) = \kappa_B(e_j)(i)$ for all $i, j \in I$. But
$e_j = \sum_{k\in I} \delta_{jk} e_k$. Since the components with respect to $B$ are unique by Theorem 22.8.11, $\kappa_B(e_j)(k) = \delta_{jk}$
for all $j, k \in I$. So $h_i(e_j) = \delta_{ij}$. The uniqueness of $(h_i)_{i\in I}$ follows from the fact that a linear functional is uniquely
determined by its action on a basis (by Theorem 23.4.5).


23.7.6 THEOREM: The canonical dual basis set is a linearly independent set of functionals. 
Let V be a linear space which has a basis. Then the “canonical dual basis set” for V is a well-defined linearly 
independent set of linear functionals on V. 


PROOF: Let $V$ be a linear space over a field $K$, with a basis $E \subseteq V$. Let $H = \{h_e;\ e \in E\}$ be the “canonical
dual basis set” of $E$. (See Definition 23.7.2.) It must be shown first that $h_e : V \to K$ is well defined and
linear for all $e \in E$. Let $e \in E$. By Theorem 22.8.11, the component map $\kappa_E : V \to \mathrm{Fin}(E, K)$ is a
well-defined bijection. So $h_e(v) = \kappa_E(v)(e)$ is well defined for all $v \in V$. By Theorem 23.1.15, the component
map $\kappa_E : V \to \mathrm{FL}(E, K)$ is a linear space isomorphism. ($\mathrm{FL}(E, K)$ is the standard linear space structure
on $\mathrm{Fin}(E, K)$. See Notation 22.2.11.) So $h_e$ is linear. Hence $h_e \in V^*$ for all $e \in E$.


To show the linear independence of the linear functionals $h_e \in H$ for $e \in E$, let $\lambda \in \mathrm{Fin}(E, K)$ be such
that $\sum_{e\in E} \lambda_e h_e = 0_{V^*}$. Then for all $v \in V$, $0_K = \sum_{e\in E} \lambda_e h_e(v) = \sum_{e\in E} \lambda_e \kappa_E(v)(e)$. Let $v = e' \in E$.
Then $0_K = \sum_{e\in E} \lambda_e \kappa_E(e')(e) = \sum_{e\in E} \lambda_e \delta_{e',e} = \lambda_{e'}$. So $\lambda_{e'} = 0$ for all $e' \in E$. So the functionals in $H$ are
linearly independent by Theorem 22.6.8 (ii).


23.7.7 THEOREM: Reconstruction of linear functionals from their action on unit vectors. 
Let $V$ be a linear space with a finite basis $E$. Let $H = \{h_e;\ e \in E\}$ be the canonical dual basis of $E$. Let
$f \in V^*$. Then $f = \sum_{e\in E} f(e) h_e$.


PROOF: Let $f \in V^*$. Let $\kappa_E : V \to \mathrm{Fin}(E, K)$ be the component map for $E$. Then for all $v \in V$,
$f(v) = f\bigl(\sum_{e\in E} \kappa_E(v)(e)\, e\bigr) = \sum_{e\in E} \kappa_E(v)(e) f(e) = \sum_{e\in E} f(e) h_e(v)$. So $f = \sum_{e\in E} f(e) h_e$.


23.7.8 THEOREM: The canonical dual basis for a linear space is a basis for its dual. 
Let V be a linear space which has a finite basis. Then the canonical dual basis for V is a basis for V*. 


PROOF: Let $V$ be a linear space over a field $K$, with a finite basis $E \subseteq V$. Let $H = \{h_e;\ e \in E\}$ be the “canonical
dual basis set” of $E$. (See Definition 23.7.2.) Then $H$ is well-defined and the linear functionals $h_e$ are linearly
independent by Theorem 23.7.6.

Let $f \in V^*$. Then $f = \sum_{e\in E} f(e) h_e$ by Theorem 23.7.7. So $f \in \mathrm{span}(H)$ because $E$ is a finite set. So
$\mathrm{span}(H) = V^*$. Hence $H$ is a basis for $V^*$ by Definition 22.7.2.
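
For a concrete finite-dimensional check of Theorems 23.7.5, 23.7.7 and 23.7.8, the following Python sketch computes a canonical dual basis in $\mathbb{Q}^3$ with exact rational arithmetic. It assumes that vectors and functionals are both stored as coordinate triples relative to the standard basis, so that the functional $h_i$ is the $i$th row of the inverse of the matrix whose columns are the basis vectors. The helper names `invert`, `dual_basis` and `apply` are hypothetical.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot (fails with StopIteration if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dual_basis(basis):
    """Rows of the inverse of the matrix whose columns are the basis vectors."""
    n = len(basis)
    columns = [[Fraction(basis[c][r]) for c in range(n)] for r in range(n)]
    return invert(columns)

def apply(functional, v):
    """Evaluate a functional (coefficient row) on a vector (coordinate tuple)."""
    return sum(a * x for a, x in zip(functional, v))

basis = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
duals = dual_basis(basis)

# Theorem 23.7.5: h_i(e_j) = delta_ij.
kronecker = [[apply(h, e) for e in basis] for h in duals]

# Theorem 23.7.7: f = sum over e of f(e) h_e, compared coefficient by coefficient.
f = (Fraction(2), Fraction(-1), Fraction(3))
rebuilt = [sum(apply(f, e) * h[k] for e, h in zip(basis, duals)) for k in range(3)]
```

Here `kronecker` should be the identity matrix and `rebuilt` should reproduce the coefficients of `f`. The finiteness of the basis is what makes the sum in `rebuilt` a genuine linear combination.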


23.7.9 REMARK: The “canonical dual basis” is a valid basis for a finite-dimensional space. 

In the proof of Theorem 23.7.8, the finite dimension of $V$ is essential to guarantee that $f$ can be expressed
as a finite linear combination of the elements $h_e$ of $H$. The expression $f = \sum_{e\in E} f(e) h_e$ is not a
linear combination if there are an infinite number of non-zero terms $f(e) h_e$ in the summation.


For each $v \in V$, the expression $f(v) = \sum_{e\in E} \kappa_E(v)(e) f(e)$ is a summation over a finite number of non-zero
terms $\kappa_E(v)(e) f(e)$ because $v$ is a linear combination of elements of the primal basis $E$. Thus one may
write $f(v) = \sum_{e\in E_v} \kappa_E(v)(e) f(e)$, where $E_v \subseteq E$ is a finite set depending on $v$. What is happening here
is that only a finite number of values $f(e)$ are brought into action in the expression $\sum_{e\in E} f(e) h_e(v)$ for
each particular vector $v$. But $\bigcup_{v\in V} E_v$ may be an infinite set. In fact, it is not at all difficult to construct
examples of this. Remark 23.7.4 discusses an example of an infinite-dimensional space (with a well-ordered
basis) where a very simple linear functional shows this behaviour.


23.7.10 THEOREM: Reconstruction of linear functionals from their action on unit vectors. 
Let $V$ be a linear space with a finite basis $B = (e_i)_{i\in I} \in V^I$. Let $(h_i)_{i\in I} \in (V^*)^I$ be the canonical dual basis
family to $B$. Let $f \in V^*$. Then $f = \sum_{i\in I} f(e_i) h_i$.


PROOF: The assertion is the family version of Theorem 23.7.7. So it may be proved in the same way.


23.7.11 THEOREM: The dual of a finite-dimensional space has the same dimension. 
The dimension of the dual of a finite-dimensional linear space equals the dimension of the primal space.


PROOF: Let $V$ be a finite-dimensional linear space with basis $E$. The canonical dual basis linear functionals
$h_e \in H$ in Definition 23.7.2 for $e \in E$ are different. To see this, let $e, e' \in E$ with $e \ne e'$. Then $h_e(e) = 1_K \ne 0_K = h_{e'}(e)$.
So $h_e \ne h_{e'}$. Therefore the map $h : E \to H$ defined by $h : e \mapsto h_e$ is a bijection. Therefore $\#(E) = \#(H)$.
So $\dim(V) = \dim(V^*)$ because $H$ is a basis for $V^*$ by Theorem 23.7.8.


23.7.12 THEOREM: The dual of an infinite-dimensional space is infinite-dimensional, and conversely. 
A linear space which has a basis is infinite-dimensional if and only if its algebraic dual space is
infinite-dimensional.


PROOF: This follows immediately from Theorem 23.7.11. 


23.7.13 REMARK: The algebraic dual is isomorphic to the unrestricted linear space on the primal space. 
Theorem 23.1.15 states that if a linear space $V$ over a field $K$ has a basis set $B$, then the component map
$\kappa_B : V \to \mathrm{FL}(B, K)$ is a linear space isomorphism. Thus the primal space $V$ is isomorphic to the free linear
space on $B$. Theorem 23.7.14 asserts an analogous property for dual spaces. The dual space is isomorphic
to the unrestricted linear space $\mathrm{UL}(B, K) = K^B$ when the primal space has basis $B$. Therefore if the primal
basis is infinite, the dual space is hugely bigger than the primal space.


23.7.14 THEOREM: Linear space isomorphism between a dual space and an unrestricted linear space.
Let $V$ be a linear space over a field $K$ with a basis set $B$. Then $V^*$ is isomorphic to the unrestricted linear
space $K^B$, and the map $f \mapsto (f(v))_{v\in B}$ is a linear space isomorphism from $V^*$ to $K^B$.


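PROOF: Let $V$ be a linear space over a field $K$ with a basis set $B$. Let $f \in V^*$. Then $f : V \to K$ is a linear
functional by Notation 23.6.2. Define $\phi : V^* \to K^B$ by $\phi : f \mapsto (f(v))_{v\in B} \in K^B$. Then $\phi$ is a linear map
because $\phi(\lambda_1 f_1 + \lambda_2 f_2) = (\lambda_1 f_1(v) + \lambda_2 f_2(v))_{v\in B} = \lambda_1 (f_1(v))_{v\in B} + \lambda_2 (f_2(v))_{v\in B} = \lambda_1 \phi(f_1) + \lambda_2 \phi(f_2)$ for
all $\lambda_1, \lambda_2 \in K$ and $f_1, f_2 \in V^*$ by Definition 22.2.5.

For $h \in K^B$, define $\psi(h) : V \to K$ by $\psi(h)(v) = \sum_{e\in B} h(e) \kappa_B(v)(e)$ for all $v \in V$. Then $\psi(h)$ is well
defined because $\#\{e \in B;\ \kappa_B(v)(e) \ne 0\} < \infty$ by Definition 22.8.6, which is justified by Theorem 22.8.2.
But $\psi(h)(\lambda_1 v_1 + \lambda_2 v_2) = \sum_{e\in B} h(e) \kappa_B(\lambda_1 v_1 + \lambda_2 v_2)(e) = \lambda_1 \sum_{e\in B} h(e) \kappa_B(v_1)(e) + \lambda_2 \sum_{e\in B} h(e) \kappa_B(v_2)(e) =
\lambda_1 \psi(h)(v_1) + \lambda_2 \psi(h)(v_2)$ for all $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$ by Theorem 22.8.12. Therefore $\psi(h) \in V^*$ by
Notation 23.6.2. Thus $\psi(h) \in V^*$ for all $h \in K^B$. So $\psi : K^B \to V^*$ is a well-defined map. However,

$$\begin{aligned}
\forall h \in K^B, \quad \phi(\psi(h)) &= \bigl(\psi(h)(v)\bigr)_{v\in B} \\
&= \Bigl(\sum_{e\in B} h(e) \kappa_B(v)(e)\Bigr)_{v\in B} \\
&= \Bigl(\sum_{e\in B} h(e) \delta_{e,v}\Bigr)_{v\in B} \qquad (23.7.1) \\
&= \bigl(h(v)\bigr)_{v\in B} = h,
\end{aligned}$$

where line (23.7.1) follows from Theorem 22.8.9 (i). So $\phi \circ \psi = \mathrm{id}_{K^B}$. Therefore by Theorem 10.5.14 (iii),
$\phi$ is surjective. Conversely,

$$\begin{aligned}
\forall f \in V^*,\ \forall v \in V, \quad \psi(\phi(f))(v) &= \sum_{e\in B} \phi(f)(e) \kappa_B(v)(e) \\
&= \sum_{e\in B} f(e) \kappa_B(v)(e) \\
&= f(v)
\end{aligned}$$

by Definition 22.8.6. Thus $\psi \circ \phi = \mathrm{id}_{V^*}$. So $\phi$ is an injection by Theorem 10.5.14 (ii). Hence $\phi : V^* \to K^B$
is a linear space isomorphism.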
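
The two maps in this proof can be imitated computationally in the special case where $V$ is a free linear space with its standard basis, so that the component map is trivial. The following Python sketch assumes $B$ is the set of non-negative integers, uses integer coefficients, and stores vectors of finite support as dictionaries. The names `psi`, `phi` and `h` are chosen to mirror the proof, and `h` is an arbitrary sample function with infinite support.

```python
def psi(h):
    """Send a function h : B -> K to the functional v |-> sum_e h(e) * v(e)."""
    def functional(v):
        # The sum is finite because v has only finitely many stored components.
        return sum(h(e) * c for e, c in v.items())
    return functional

def phi(f, indices):
    """Sample the family phi(f) = (f(e))_{e in B} at the requested indices."""
    return [f({e: 1}) for e in indices]

def h(e):
    """A sample element of K^B which is non-zero everywhere."""
    return e * e + 1

f = psi(h)
sampled = phi(f, range(10))          # should reproduce h on these indices
value = f({2: 3, 5: -1})             # 3*h(2) + (-1)*h(5)
```

The point of the sketch is that `h` need not have finite support for `psi(h)` to be a well-defined functional, which is why the dual is as large as the unrestricted space $K^B$. Only finitely many indices can be sampled, so this illustrates the identity $\phi \circ \psi = \mathrm{id}$ rather than proving it.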


23.7.15 THEOREM: Isomorphisms between a free linear space’s dual and an unrestricted linear space. 
The dual linear space of FL(B, K) is canonically isomorphic to UL(B, K) for any set B and field K. 


PROOF: The assertion follows from Theorem 23.7.14.


23.7.16 REMARK: The double dual of the free linear space on an infinite basis. 

Regrettably, the double dual linear space of a free linear space raises choice function issues if the basis is
infinite. For example, let $B = \omega$ and $K = \mathbb{R}$. Let $W = \mathrm{UL}(\omega, \mathbb{R}) = \mathbb{R}^\omega$. Then one might expect that
there exists a linear map $\eta : W \to \mathbb{R}$ for which $\eta(1_W) = 1$, where $1_W : \omega \to \mathbb{R}$ is the function defined by
$\forall i \in \omega,\ 1_W(i) = 1$. Since this vector $1_W \in W$ is linearly independent of all of the “atoms” $\delta_j : \omega \to \mathbb{R}$ which
are defined by $\forall i \in \omega,\ \delta_j(i) = \delta_{ij}$, the value of $\eta$ for all such atomic functions may be chosen to equal zero.
Thus $\eta(g) = 0$ for all $g \in \mathrm{UL}(\omega, \mathbb{R})$ with compact support. In other words, $\eta(g) = 0$ for all $g \in \mathrm{FL}(\omega, \mathbb{R})$.


The next task is to choose a value of $\eta(g)$ for all $g \in W$ which are not spanned by the atoms. One could
perhaps hope that the definition $\eta(g) = \lim_{i\to\infty} g(i)$ would be suitable, because it does satisfy $\eta(1_W) = 1$
and $\eta(g) = 0$ for all $g \in \mathrm{FL}(\omega, \mathbb{R})$. However, this is not defined for all real-valued sequences. Nor are the
inferior and superior limits, which are not even linear on their domains. It is not possible to set $\eta(g) = 0$ for
all $g \in W$ which are not in the span of $1_W$ and not in $\mathrm{FL}(\omega, \mathbb{R})$ because this is not a linear subspace.

Now define $g_{m,j} \in W$ by $\mathrm{Range}(g_{m,j}) = \{0_{\mathbb{R}}, 1_{\mathbb{R}}\}$ and $\forall i \in \omega,\ (g_{m,j}(i) = 1_{\mathbb{R}} \Leftrightarrow (i \bmod m) = j)$ for $m \in \mathbb{Z}^+$
and $j \in \mathbb{Z}_0^+$ with $j < m$. Then $g_{2,0} + g_{2,1} = 1_W$. Therefore $\eta(g_{2,0}) + \eta(g_{2,1}) = 1_{\mathbb{R}}$, but $\eta(g_{2,0})$ and $\eta(g_{2,1})$
may be freely chosen within this constraint. Similarly, $\sum_{j=0}^{m-1} \eta(g_{m,j}) = 1_{\mathbb{R}}$ for all $m \in \mathbb{Z}^+$, but whenever
$m_1, m_2 \in \mathbb{Z}^+$ have a common factor, there will be additional constraints. It is not too difficult to choose
values of $\eta$ for this well-ordered, countably infinite subspace of $W$. The difficulties lie in the not-well-ordered,
uncountably infinite remainder of the space.


There is no obvious well-ordering of $W$ which could be used here because this would imply a well-ordering
of $\mathcal{P}(\omega)$, which would imply a well-ordering for $\mathbb{R}$, which is not possible, as mentioned in Remarks 7.12.5
and 22.7.23. Consequently the possibility apparently cannot be excluded that $\mathrm{UL}(\omega, \mathbb{R})^* = \mathrm{FL}(\omega, \mathbb{R})$ in
some model of ZF set theory. Since it is not possible to demonstrate the existence of linear functionals on
$\mathrm{UL}(\omega, \mathbb{R})$ which do not have finite support, it seems reasonable to suppose that they do not exist. But this
apparently cannot be proved within ZF set theory. The same comments apply to any infinite basis set $B$,
whether it is Dedekind-infinite or not. Thus Theorem 23.7.17 can only demonstrate the existence of an
embedding of $\mathrm{FL}(B, K)$ inside $\mathrm{UL}(B, K)^*$ for a general set $B$.


23.7.17 THEOREM: Monomorphism from a free linear space to the dual of an unrestricted linear space. 
For any set $B$ and ordered field $K$, the map $\phi : \mathrm{FL}(B, K) \to \mathrm{UL}(B, K)^*$ defined by

$$\forall f \in \mathrm{FL}(B, K),\ \forall g \in \mathrm{UL}(B, K), \quad \phi(f)(g) = \sum_{e\in B} f(e) g(e) \qquad (23.7.2)$$

is a linear space monomorphism from $\mathrm{FL}(B, K)$ to the dual linear space of $\mathrm{UL}(B, K)$.


PROOF: Let $W = \mathrm{UL}(B, K)$. The sum in line (23.7.2) is finite because $\#(\mathrm{supp}(f)) < \infty$. The linearity of
$\phi(f) : W \to K$ is clear for $f \in \mathrm{FL}(B, K)$. So $\phi : \mathrm{FL}(B, K) \to W^*$ is a well-defined map. The linearity of
$\phi(f)$ with respect to $f$ is also clear. Therefore $\phi : \mathrm{FL}(B, K) \to W^*$ is a linear map. To show injectivity, let
$f_1, f_2 \in \mathrm{FL}(B, K)$ satisfy $f_1 \ne f_2$. Then $f_1(e) \ne f_2(e)$ for some $e \in B$. Define $g \in W$ by $g = \chi_{\{e\}}$. Then
$\phi(f_1)(g) = f_1(e) \ne f_2(e) = \phi(f_2)(g)$. So $\phi(f_1) \ne \phi(f_2)$. Therefore $\phi$ is injective.


23.7.18 REMARK: Algebraic double duals of infinite-dimensional spaces are problematic. 

The difficulties described in Remark 23.7.16 are related to the comments in Remark 23.6.12, where it is noted 
that in practice, dual linear spaces are typically constrained to be bounded with respect to some topology, 
and likewise with the double dual. Therefore axiom-of-choice issues are avoided. This helps to explain why 
purely algebraic duals and double duals of infinite-dimensional spaces are rarely utilised. 




23.8. Duals of finite-dimensional linear spaces 


23.8.1 REMARK: The canonical dual basis for a finite-dimensional linear space. 

Definition 23.8.2 is a finite-dimensional version of Definition 23.7.2. It follows from Theorem 23.7.5 that the
canonical dual basis $(h_i)_{i=1}^n$ of any given basis $(e_i)_{i=1}^n$ for a linear space $V$ with $n = \dim(V) \in \mathbb{Z}_0^+$ is the
unique sequence of $n$ vectors in $V^*$ which satisfy

$$\forall i, j \in \mathbb{N}_n, \quad h_i(e_j) = \delta_{ij}.$$

The fact that the sequence $(h_i)_{i=1}^n$ is a basis for $V^*$ follows from the linearity of any $f \in V^*$ because for
any $v \in V$, $f(v) = f\bigl(\sum_{i=1}^n v_i e_i\bigr) = \sum_{i=1}^n v_i f(e_i) = \sum_{i=1}^n f(e_i) h_i(v)$, so that $f = \sum_{i=1}^n f(e_i) h_i$. This
shows that $(h_i)_{i=1}^n$ spans $V^*$. To show that the coefficients of the $h_i$ are unique, let $f = \sum_{i=1}^n f_i h_i = 0$
for $(f_i)_{i=1}^n \in K^n$. Then $f(v) = 0$ for all $v \in V$. In particular, $0 = f(e_j) = \sum_{i=1}^n f_i h_i(e_j) = f_j$ for all $j \in \mathbb{N}_n$.
So the coefficients are unique and $(h_i)_{i=1}^n$ is a basis for $V^*$. (Theorem 23.7.8 says much the same thing.)


23.8.2 DEFINITION: The canonical dual basis of a basis $B = (e_i)_{i=1}^n$ for a finite-dimensional linear space $V$
is the basis $(h_i)_{i=1}^n$ for the dual space $V^*$ defined by $h_i(v) = v_i$ for all $i \in \mathbb{N}_n$, for all $v = \sum_{j=1}^n v_j e_j \in V$. In
other words, $h_i(v) = \kappa_B(v)(i)$ for all $v \in V$.


23.8.3 REMARK: Visualisation of the canonical dual basis as families of level sets. 

Each canonical dual basis functional $h_i$ in Definition 23.8.2 depends on all of the vectors in the primal
basis $(e_i)_{i=1}^n$. This is illustrated in Figure 23.8.1 in the case $n = 2$. For the same primal basis vector $e_1$, the
dual basis vector $h_1$ depends on the choice of the other primal basis vector $e_2$.


In the case of an orthonormal basis in a linear space with an inner product, each canonical dual basis
functional $h_i$ depends only on the matching vector $e_i$ because orthonormality removes the freedom of the
other vectors to influence the dual basis functionals. Since one's intuition for linear algebra is largely built
through examples with orthonormal bases, it is tempting to apply this intuition to linear spaces in general.
This intuition must be resisted!


Figure 23.8.1 Level sets of canonical dual basis vector depending on primal space basis


The fact that the dual basis depends on all of the primal basis vectors helps to explain why the existence of 
a basis for the primal space is helpful for constructing the space of linear functionals. 
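
The dependence of $h_1$ on $e_2$ is easy to observe numerically in the case $n = 2$. The following Python sketch assumes that vectors in $\mathbb{Q}^2$ and functionals on $\mathbb{Q}^2$ are both stored as coordinate pairs relative to the standard basis, and it uses the explicit inverse of a $2 \times 2$ matrix. The function name `dual_basis_2d` is hypothetical.

```python
from fractions import Fraction

def dual_basis_2d(e1, e2):
    """Return (h1, h2) as coefficient pairs satisfying h_i(e_j) = delta_ij."""
    det = Fraction(e1[0] * e2[1] - e1[1] * e2[0])
    if det == 0:
        raise ValueError("e1 and e2 are linearly dependent")
    # Rows of the inverse of the matrix whose columns are e1 and e2.
    h1 = (e2[1] / det, -e2[0] / det)
    h2 = (-e1[1] / det, e1[0] / det)
    return h1, h2

e1 = (1, 0)
h1_orthonormal, _ = dual_basis_2d(e1, (0, 1))  # second basis vector orthogonal to e1
h1_skewed, _ = dual_basis_2d(e1, (1, 1))       # same e1, different second basis vector
```

With the same $e_1$ in both cases, the first call should give $h_1(v) = v_1$ and the second should give $h_1(v) = v_1 - v_2$ in standard coordinates, so the level sets of $h_1$ are tilted by the change in $e_2$ alone.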


23.8.4 REMARK: Confusion caused by index raising and lowering conventions. 

Bases for the primal and dual spaces are often notated with the same letter, such as $e$, distinguishing the
two meanings by using subscripts for the primal space basis (such as $(e_\alpha)_{\alpha\in A}$ or $(e_i)_{i=1}^n$) and superscripts for
the dual space basis (such as $(e^\alpha)_{\alpha\in A}$ or $(e^i)_{i=1}^n$). This convention is standard practice for tensor calculus
on manifolds. It is often a useful mnemonic for checking calculations, but it sometimes leads to erroneous
assumptions. (See Remark 22.8.21 for further comment on this.)


In tensor calculus for Riemannian or pseudo-Riemannian geometry, the raising and lowering of indices often
signifies that a tensor has been contracted with the metric tensor or its inverse. In simple linear algebra,
there is no metric tensor (or inner product). In linear algebra, upper and lower indices serve only to hint
that a particular object is either covariant or contravariant. When a single letter, such as $e$, denotes a
basis $(e_i)_{i=1}^n$ of a linear space, the raised-index version $(e^i)_{i=1}^n$ denotes the dual basis. This raising of the
index is not achieved by contracting with a metric tensor in the linear algebra context.


To see why the dual basis of a linear space is not obtained from the primal space basis by contracting with a
metric tensor, suppose to the contrary that $\sum_{i=1}^n g_{ij} e^i = e_j$ for some (invertible) matrix $(g_{ij})_{i,j=1}^n \in \mathbb{R}^{n\times n}$.
(See Chapter 25 for matrix algebra.) This equation does not make sense because the left-hand side is a linear
combination of linear functionals $e^i \in V^*$ and the right-hand side is a vector $e_j \in V$. For two objects to be
equal, they should at least live in the same space! But suppose we temporarily ignore this show-stopper and
optimistically try to force the equality. Maybe the problem can be fixed somehow!


Suppose $\sum_{i=1}^n g_{ij} e^i = e_j$, as above. The action of the left-hand side on a vector $v \in V$ is $\bigl(\sum_{i=1}^n g_{ij} e^i\bigr)(v) =
\sum_{i=1}^n g_{ij} (e^i(v)) = \sum_{i=1}^n g_{ij} v^i$, where $(v^i)_{i=1}^n \in K^n$ is the tuple of vector components satisfying $v = \sum_{i=1}^n v^i e_i$.
In other words, $\sum_{i=1}^n g_{ij} e^i \in V^*$ is the linear functional which maps $v \in V$ to $\sum_{i=1}^n g_{ij} v^i \in K$. This is well
defined. If the right-hand side $e_j$ of the equation is applied to $v$, the result is $e_j(v) = e_j\bigl(\sum_{i=1}^n v^i e_i\bigr) =
\sum_{i=1}^n v^i e_j(e_i)$, assuming that the ill-defined function $e_j : V \to K$ is at least linear! The left and right hand
sides will now be equal if $g_{ij} = e_j(e_i)$ for all $i, j \in \mathbb{N}_n$. The product of basis vectors is not defined. But if
an inner product is defined on $V$ as a bilinear function $h : V \times V \to K$ (satisfying certain criteria), then the
expression $e_j(e_i)$ may be interpreted as such an inner product. (See Section 24.9 for inner products.) An
inner product is, in fact, exactly what a metric tensor is at each point of a manifold. An inner product is an
additional structure which is not available for a linear space according to Definition 22.1.1.


It may be concluded that the use of subscripts and superscripts to denote two different objects, for example
the primal and dual space bases, is confusing in linear algebra because this convention has a different meaning
in linear algebra to what it has in tensor calculus.


23.9. Dual space component maps for finite-dimensional spaces


23.9.1 DEFINITION: Canonical component map for the dual of a finite-dimensional linear space. 
The dual component map for the dual $V^*$ of a linear space $V$ over a field $K$ with a finite basis $(e_i)_{i\in I}$ for $V$
is the map $\kappa^* : V^* \to \mathrm{Fin}(I, K)$ defined by $\kappa^* : f \mapsto (f(e_i))_{i\in I}$ for $f \in V^*$.


23.9.2 REMARK: Necessity of defining canonical dual component maps only for finite-dimensional spaces.
For a general linear space $V$ with a basis in Definition 23.9.1, it is not self-evident that $\mathrm{Range}(\kappa^*) \subseteq \mathrm{Fin}(I, K)$.
It must be shown that only a finite number of components of $\kappa^*(f)$ are non-zero for any $f \in V^*$. As discussed
in Remark 23.7.4, if $V$ is infinite-dimensional, there are always linear functionals $f$ in $V^*$ for which $\kappa^*(f)$ has
an infinite number of non-zero values. So $\mathrm{Range}(\kappa^*) \not\subseteq \mathrm{Fin}(I, K)$ in that case. This is why the definition
must be restricted to finite-dimensional linear spaces.


23.9.3 THEOREM: Reconstructing a linear functional from its components. 

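Let $B = (e_i)_{i\in I} \in V^I$ be a finite basis for a linear space $V$ over a field $K$. Let $\kappa^* : V^* \to \mathrm{Fin}(I, K)$ be the
dual component map for $V^*$. Let $f \in V^*$. Then $f = \sum_{i\in I} \kappa^*(f)(i) h_i$, where $(h_i)_{i\in I}$ is the canonical dual
basis of $B$.


PROOF: Let $(h_i)_{i\in I}$ be the canonical dual basis of $(e_i)_{i\in I}$. Let $f \in V^*$. Then

$$\begin{aligned}
\forall v \in V, \quad \Bigl(\sum_{i\in I} \kappa^*(f)(i) h_i\Bigr)(v) &= \sum_{i\in I} \kappa^*(f)(i) h_i(v) \\
&= \sum_{i\in I} f(e_i) \kappa_B(v)(i) \\
&= f(v)
\end{aligned}$$

by Definitions 23.9.1 and 23.8.2 and Theorem 23.4.6. Hence $f = \sum_{i\in I} \kappa^*(f)(i) h_i$.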


23.9.4 DEFINITION: Canonical component map (tuple version) for dual of finite-dimensional linear space. 
The dual component map for the dual $V^*$ of an $n$-dimensional linear space $V$ over a field $K$ with $n \in \mathbb{Z}_0^+$ for
a basis $(e_i)_{i=1}^n$ for $V$ is the map $\kappa^* : V^* \to K^n$ defined by $\kappa^* : f \mapsto (f(e_i))_{i=1}^n$ for $f \in V^*$.


23.9.5 REMARK: Reconstructing a linear functional from its component tuple. 
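In terms of the dual component map $\kappa^*$ in Definition 23.9.4, any linear functional $f \in V^*$ satisfies $f =
\sum_{i=1}^n \kappa^*(f)_i h_i$, where $(h_i)_{i=1}^n$ is the canonical dual basis of $(e_i)_{i=1}^n$. This follows from the calculation
$\bigl(\sum_{i=1}^n \kappa^*(f)_i h_i\bigr)(e_j) = \sum_{i=1}^n \kappa^*(f)_i h_i(e_j) = \sum_{i=1}^n \kappa^*(f)_i \delta_{ij} = \kappa^*(f)_j = f(e_j)$ for all $j \in \mathbb{N}_n$,
because a linear functional is determined by its values on a basis.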


23.9.6 REMARK: Change-of-basis rules for canonical dual bases. 

Bases and components for primal and dual spaces are useful because they compress the information in the 
specification of a vector, linear functional or map. The information compression usually more than
compensates for the effort to choose and use a basis and components. However, the effort increases substantially
when one wishes to change basis. The rules for changing basis are tedious, but often necessary. Luckily, 
though, the change-of-basis rule for the canonical dual of a finite basis has a straightforward relation to the 
rule for the primal space basis. This is presented in Theorem 23.9.7. 


23.9.7 THEOREM: Change-of-basis rules between two canonical dual bases. 

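Let $V$ be a finite-dimensional linear space over a field $K$. Let $B_1 = (e^1_i)_{i\in I}$ and $B_2 = (e^2_j)_{j\in J}$ be indexed
bases for $V$. Let $B_1^* = (h^1_i)_{i\in I}$ and $B_2^* = (h^2_j)_{j\in J}$ be the canonical dual bases to $B_1$ and $B_2$ respectively.
Define the basis transition component arrays $A_{B_1,B_2} = (a_{ji})_{j\in J, i\in I}$ and $A_{B_2,B_1} = (\bar{a}_{ij})_{i\in I, j\in J}$ by

$$\begin{aligned}
\forall i \in I,\ \forall j \in J, \quad a_{ji} &= \kappa_{B_2}(e^1_i)_j \\
\forall i \in I,\ \forall j \in J, \quad \bar{a}_{ij} &= \kappa_{B_1}(e^2_j)_i.
\end{aligned}$$

Then

$$\begin{aligned}
\forall j \in J, \quad h^2_j &= \sum_{i\in I} a_{ji} h^1_i \qquad (23.9.1) \\
\forall i \in I, \quad h^1_i &= \sum_{j\in J} \bar{a}_{ij} h^2_j. \qquad (23.9.2)
\end{aligned}$$

Denote the components $(f^1_i)_{i\in I}$ and $(f^2_j)_{j\in J}$ of dual vectors $f \in V^*$ with respect to the bases $B_1$ and $B_2$
respectively by

$$\begin{aligned}
\forall i \in I, \quad f^1_i &= f(e^1_i) \\
\forall j \in J, \quad f^2_j &= f(e^2_j).
\end{aligned}$$

Then

$$\begin{aligned}
\forall f \in V^*,\ \forall j \in J, \quad f^2_j &= \sum_{i\in I} f^1_i \bar{a}_{ij} \qquad (23.9.3) \\
\forall f \in V^*,\ \forall i \in I, \quad f^1_i &= \sum_{j\in J} f^2_j a_{ji}. \qquad (23.9.4)
\end{aligned}$$


PROOF: By Definition 23.8.2, the dual bases $(h^1_i)_{i\in I}$ and $(h^2_j)_{j\in J}$ satisfy $h^1_i(v) = \kappa_{B_1}(v)(i)$ for $i \in I$ and
$h^2_j(v) = \kappa_{B_2}(v)(j)$ for $j \in J$. So for all $v \in V$ and $j \in J$, $h^2_j(v) = \kappa_{B_2}(v)(j) = \kappa_{B_2}\bigl(\sum_{i\in I} \kappa_{B_1}(v)(i) e^1_i\bigr)(j) =
\sum_{i\in I} \kappa_{B_1}(v)(i) \kappa_{B_2}(e^1_i)(j) = \sum_{i\in I} a_{ji} h^1_i(v)$, which verifies line (23.9.1). Line (23.9.2) follows similarly.

For all $f \in V^*$ and $j \in J$, $f^2_j = f(e^2_j) = f\bigl(\sum_{i\in I} e^1_i \bar{a}_{ij}\bigr)$ by Theorem 22.9.11 line (22.9.3). So $f^2_j =
\sum_{i\in I} f(e^1_i) \bar{a}_{ij} = \sum_{i\in I} f^1_i \bar{a}_{ij}$, which verifies line (23.9.3). Line (23.9.4) follows similarly.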
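
The four change-of-basis rules can be spot-checked numerically. The following Python sketch assumes $V = \mathbb{Q}^2$ with two arbitrarily chosen bases, computes components by Cramer's rule, and uses zero-based indices in place of the index sets $I$ and $J$. The names `components`, `a`, `a_bar` and the sample functional `f` are hypothetical.

```python
from fractions import Fraction

def components(basis, v):
    """Components of v with respect to a basis (b1, b2) of Q^2, by Cramer's rule."""
    (a, c), (b, d) = basis
    det = Fraction(a * d - b * c)
    return ((v[0] * d - b * v[1]) / det, (a * v[1] - c * v[0]) / det)

B1 = ((1, 0), (1, 1))
B2 = ((2, 1), (1, 1))

# a[j][i] is the j-th B2-component of the i-th vector of B1, and conversely for a_bar[i][j].
a = [[components(B2, e)[j] for e in B1] for j in range(2)]
a_bar = [[components(B1, e)[i] for e in B2] for i in range(2)]

def f(v):
    """A sample linear functional on Q^2."""
    return 3 * v[0] - 2 * v[1]

f1 = [f(e) for e in B1]   # components of f relative to the dual basis of B1
f2 = [f(e) for e in B2]   # components of f relative to the dual basis of B2

v = (Fraction(5), Fraction(-7))
h1 = components(B1, v)    # (h^1_i(v)) for i = 0, 1
h2 = components(B2, v)    # (h^2_j(v)) for j = 0, 1

line_1 = all(h2[j] == sum(a[j][i] * h1[i] for i in range(2)) for j in range(2))
line_2 = all(h1[i] == sum(a_bar[i][j] * h2[j] for j in range(2)) for i in range(2))
line_3 = all(f2[j] == sum(f1[i] * a_bar[i][j] for i in range(2)) for j in range(2))
line_4 = all(f1[i] == sum(f2[j] * a[j][i] for j in range(2)) for i in range(2))
```

The flags `line_1` to `line_4` correspond to lines (23.9.1) to (23.9.4). The check is made for one vector and one functional only, so it is an illustration, not a proof.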


23.10. Double dual linear spaces 


23.10.1 REMARK: Terminology for the algebraic dual of the algebraic dual of a linear space. 
The second dual could also be called the “double dual”. Robertson/Robertson [126], page 70, use the word
“bidual” for the dual of the dual.


23.10.2 DEFINITION: The second dual of a linear space V is the dual of the dual of V. 


23.10.3 REMARK: The canonical immersion of a linear space inside its double dual. 
The double-map-rule notation “$\phi : v \mapsto (f \mapsto f(v))$” in Definition 23.10.4 is described in Remark 10.19.3.


23.10.4 DEFINITION: The canonical map from a linear space $V$ to its second dual is the map $\phi : V \to V^{**}$
defined by $\phi : v \mapsto (f \mapsto f(v))$. In other words, $\phi(v)(f) = f(v)$ for all $v \in V$ and $f \in V^*$.


23.10.5 THEOREM: Monomorphism from a linear space to its double dual. 
Let $V$ be a linear space which has a basis. Then the canonical map from $V$ to its second dual is a well-defined
linear space monomorphism.


PROOF: The canonical map $\phi$ in Definition 23.10.4 is well-defined because for all $v \in V$, $\phi(v)$ is a linear
map from $V^*$ to the field $K$ of the linear space $V$. (This follows from the calculation $\phi(v)(\lambda_1 f_1 + \lambda_2 f_2) =
(\lambda_1 f_1 + \lambda_2 f_2)(v) = \lambda_1 f_1(v) + \lambda_2 f_2(v) = \lambda_1 \phi(v)(f_1) + \lambda_2 \phi(v)(f_2)$ for all $v \in V$, $\lambda_1, \lambda_2 \in K$ and $f_1, f_2 \in V^*$.)
Therefore $\phi(v) \in V^{**}$.

For all $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$, $\phi(\lambda_1 v_1 + \lambda_2 v_2)(f) = f(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 f(v_1) + \lambda_2 f(v_2) = \lambda_1 \phi(v_1)(f) +
\lambda_2 \phi(v_2)(f)$ for all $f \in V^*$. So $\phi(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 \phi(v_1) + \lambda_2 \phi(v_2)$. Thus $\phi$ is a linear space homomorphism.
To show that $\phi$ is injective, suppose that $v_1, v_2 \in V$ are vectors such that $\phi(v_1) = \phi(v_2)$ and $v_1 \ne v_2$. By
Theorem 23.5.3, there is a linear functional $f \in V^*$ such that $f(v_1 - v_2) \ne 0$. Then $0 = \phi(v_1)(f) - \phi(v_2)(f) =
f(v_1) - f(v_2) = f(v_1 - v_2) \ne 0$. This is a contradiction. So $\phi$ is injective.


23.10.6 REMARK: The dual of the dual is isomorphic to the primal space if it is finite-dimensional. 

The second dual V** of a finite-dimensional linear space V is canonically isomorphic to the primal space V. 
Theorem 23.10.7 does not hold for infinite-dimensional linear spaces which have a basis. (The dual of an 
infinite-dimensional linear space with a basis is “larger” than the primal space.) 


23.10.7 THEOREM: Isomorphism from a finite-dimensional space to its double dual. 
Let $V$ be a finite-dimensional linear space. Then the canonical map from $V$ to its second dual is a linear
space isomorphism.


PROOF: Let $V$ be a finite-dimensional linear space. It follows from Theorem 23.10.5 that the canonical
map $\phi : V \to V^{**}$ is a monomorphism.


To show that $\phi$ is surjective, let $g \in V^{**}$. It must be shown that there is a vector $v \in V$ such that
$g(f) = f(v)$ for all $f \in V^*$. Let $\kappa : V \to \mathrm{Fin}(B, K)$ denote the component map relative to a basis
$B$ for $V$. Then any vector $w \in V$ may be written as $w = \sum_{e\in B} \kappa(w)(e)\, e$. Define $\psi \in \mathrm{Fin}(B, V^*)$ by
$\psi(e)(w) = \kappa(w)(e)$ for all $e \in B$ and $w \in V$. Then for any $f \in V^*$ and $w \in V$, $f(w) = f\bigl(\sum_{e\in B} \kappa(w)(e)\, e\bigr) =
\sum_{e\in B} \kappa(w)(e) f(e) = \sum_{e\in B} \psi(e)(w) f(e)$. So $f = \sum_{e\in B} f(e) \psi(e)$ because $B$ is finite. Therefore for all $f \in
V^*$, $g(f) = g\bigl(\sum_{e\in B} f(e) \psi(e)\bigr) = \sum_{e\in B} f(e) g(\psi(e)) = f\bigl(\sum_{e\in B} g(\psi(e))\, e\bigr) = f(v)$, where $v = \sum_{e\in B} g(\psi(e))\, e$.
Then $v$ is a well-defined vector in $V$ because $(g(\psi(e)))_{e\in B} \in \mathrm{Fin}(B, K)$. So $\phi$ is surjective. Hence $\phi$ is a
linear space isomorphism.
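
The surjectivity construction in this proof can be traced on a small example. The following Python sketch assumes $V = \mathbb{Q}^2$ with integer sample data, stores vectors and functionals as coordinate pairs, and takes the basis $e_1 = (1, 0)$, $e_2 = (1, 1)$, whose canonical dual basis is written out by hand. The element `g` of the second dual is an arbitrary sample, and all names are hypothetical.

```python
basis = [(1, 0), (1, 1)]
duals = [(1, -1), (0, 1)]   # psi(e) for each basis vector e, written out by hand

def pair(f, v):
    """Evaluate the functional f (coefficient pair) on the vector v."""
    return f[0] * v[0] + f[1] * v[1]

# Confirm that duals really is the canonical dual basis: psi(e_i)(e_j) = delta_ij.
is_dual = all(pair(h, e) == int(i == j)
              for i, h in enumerate(duals) for j, e in enumerate(basis))

def g(f):
    """A sample element of V**: a linear function of the functional f."""
    return 4 * f[0] + 9 * f[1]

# The vector v = sum over e of g(psi(e)) e from the proof.
coefficients = [g(h) for h in duals]
v = tuple(sum(c * e[k] for c, e in zip(coefficients, basis)) for k in range(2))

# g(f) should equal f(v) for every functional f; a few samples are checked here.
samples = [(1, 0), (0, 1), (2, -3), (7, 5)]
represents = all(g(f) == pair(f, v) for f in samples)
```

In this example the reconstructed vector is independent of the basis used to find it, which reflects the fact that the canonical map $\phi$ is defined without reference to any basis.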


23.10.8 REMARK: Space of linear maps in the special case that the domain or range is a field. 

The space of linear maps $\mathrm{Lin}(V, W)$ for linear spaces $V$ and $W$ over a field $K$ has a special form when $V$ or
$W$ is substituted with $K$. If $W$ is substituted with $K$, then $\mathrm{Lin}(V, K) = V^*$. If $V$ is substituted with $K$,
then there is a natural isomorphism $\mathrm{Lin}(K, W) \cong W$ as shown in Theorem 23.10.9. It is usual to identify
$\mathrm{Lin}(K, W)$ with $W$.


23.10.9 THEOREM: Isomorphism from space of linear maps with scalar domain to the range space. 
Let $W$ be a linear space over a field $K$. Then the map $\phi : \mathrm{Lin}(K, W) \to W$ defined by $\phi(f) = f(1_K)$ for
all $f \in \mathrm{Lin}(K, W)$ is a linear space isomorphism.


PROOF: Clearly $\phi$ is a linear map. To show surjectivity, let $g \in W$ and note that the map $f : K \to W$
defined by $f(k) = kg$ is well defined and linear. So $f \in \mathrm{Lin}(K, W)$. But $\phi(f) = f(1_K) = g$. So $\phi$ is surjective.
To show injectivity, let $f_1, f_2 \in \mathrm{Lin}(K, W)$ satisfy $\phi(f_1) = \phi(f_2)$. Then $0_W = \phi(f_1) - \phi(f_2) = \phi(f_1 - f_2) =
(f_1 - f_2)(1_K)$. By the linearity of $f_1 - f_2$, $(f_1 - f_2)(k) = k (f_1 - f_2)(1_K) = 0_W$ for all $k \in K$. So $f_1 = f_2$. Hence
$\phi$ is injective.


23.10.10 REMARK: The application of primal and dual linear spaces as tangent vectors and covectors. 
In the differentiable manifolds context, if the primal linear space is the tangent space at a point, then the 
dual space is the set of all differentials of differentiable real-valued functions at that point. (See Section 55.2.) 


A tangent vector $v$ on a differentiable manifold $M$ may be thought of as an equivalence class of $C^1$ curves
which have the same velocity at a point $p$. Such a vector may be identified with the common velocity of the
curves. This is then pictured as an arrow as in Figure 23.8.1. Similarly, a tangent covector may be thought of
as an equivalence class of $C^1$ real-valued functions on $M$ which have the same differential at $p$. The covector
is identified with the common differential of the functions. This is then pictured as a set of contour lines as
in Figure 23.8.1. Thus one may think of the arrows and contour lines of linear algebra as limiting pictures
for differentiable manifold concepts. (Note that the word “gradient” cannot be substituted for “differential”
here because, technically speaking, the gradient requires a metric tensor. See Definition 74.6.2.)


23.11. Linear space transpose maps 


23.11.1 REMARK: The canonical immersion in the double dual is the transpose of the identity map. 

If the expression $f(v)$ in Definition 23.10.4 is regarded as a function of two variables $f \in V^*$ and $v \in V$, the
canonical map $h : V \to V^{**}$ is the transpose of this function. (See Remark 10.19.4 for the transpose of a
function of a function.) More precisely, if $\phi : V^* \to (V \to K)$ denotes the map $\phi : f \mapsto (v \mapsto f(v))$, then $h$
may be thought of as the transposed function $h : V \to (V^* \to K)$ defined by $h : v \mapsto (f \mapsto f(v))$. But $\phi$ is
the identity map $\mathrm{id}_{V^*}$ on $V^*$. So the second dual canonical map $h$ equals the function transpose of $\mathrm{id}_{V^*}$.


23.11.2 DEFINITION: The transpose of a general linear map. 

The transpose (map) of a linear map $\phi : V \to W$ for linear spaces $V$ and $W$ over the same field $K$ is the
map $\phi^T : W^* \to V^*$ defined by $\phi^T : f \mapsto (v \mapsto f(\phi(v)))$.

In other words, $\phi^T(f)(v) = f(\phi(v))$ for all $f \in W^*$ and $v \in V$. Equivalently, $\phi^T(f) = f \circ \phi$ for all $f \in W^*$.
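
The following Python fragment is a minimal sketch of Definition 23.11.2 for the special case $V = \mathbb{R}^2$, $W = \mathbb{R}^3$. It is only an illustration. The helper names (apply_matrix, functional, transpose_map) and the sample matrix are assumptions made here and are not part of the text. It suggests that composing a functional with $\phi$ gives the same coefficients as applying the transposed matrix to the coefficients of the functional.

```python
# Hypothetical helpers: tuples stand for vectors, callables for functionals.

def apply_matrix(a, v):
    """Return the tuple a*v for a matrix a (a list of rows) and a tuple v."""
    return tuple(sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a)

def functional(coeffs):
    """Return the linear functional w -> sum_j coeffs[j] * w[j]."""
    return lambda w: sum(c * w_j for c, w_j in zip(coeffs, w))

def transpose_map(phi):
    """Return phi^T, which sends a functional f on W to f o phi on V."""
    return lambda f: (lambda v: f(phi(v)))

A = [[1, 2], [0, 1], [3, -1]]           # assumed matrix of phi : R^2 -> R^3
phi = lambda v: apply_matrix(A, v)
f = functional((2, -1, 4))              # a sample functional on W = R^3
pulled_back = transpose_map(phi)(f)     # phi^T(f), a functional on V = R^2

# Coefficients of phi^T(f) relative to the standard dual basis of R^2.
coeffs = tuple(pulled_back(e) for e in [(1, 0), (0, 1)])

# The same coefficients, computed with the transposed matrix of A.
A_transposed = [list(col) for col in zip(*A)]
matrix_coeffs = apply_matrix(A_transposed, (2, -1, 4))
```

In this sketch both coeffs and matrix_coeffs come out as the same pair, which is the familiar reason for the word “transpose”.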


23.11.3 REMARK: Alternative terminology for the transpose of a linear map. 
Frankel [12], page 640, notes that the transpose $\phi^T$ of a linear map $\phi : V \to W$ in Definition 23.11.2 is the
same as the “pull-back operator” for functionals on $W$. EDM2 [113], 256.G, calls $\phi^T$ the “dual mapping” or
“transposed mapping”. Yosida [167], VII.1, calls $\phi^T$ the “dual operator” or “conjugate operator”. Robertson/
Robertson [126], page 38, call $\phi^T$ the transpose, but they note the alternative terms “adjoint”, “conjugate”
and “dual”.

Crampin/Pirani [7], page 27, call $\phi^T$ the “adjoint” of $\phi$. This use of the word “adjoint” is (hopefully) a
minority opinion. The adjoint of $\phi$ is usually defined as a quite different map $\phi^* : W \to V$ which uses inner
products on the linear spaces $V$ and $W$. The term “adjoint” is also used for $\phi^T$ by Gilbarg/Trudinger [81],
page 79, in the context of Banach spaces, and by Spivak [37], page 1.103, for linear maps $\phi : \mathbb{R}^n \to \mathbb{R}^m$.


All in all, it seems best to use the terms “transpose” or “transposed map” for the concept in Definition 23.11.2
to minimise confusion.


23.11.4 EXAMPLE: Transpose map for immersions and submersions of finite-dimensional spaces. 

Define $\phi \in \mathrm{Lin}(V, W)$ for $V = \mathbb{R}^m$, $W = \mathbb{R}^n$, for $m, n \in \mathbb{Z}_0^+$ with $m \le n$, by $\phi : v \mapsto (v_1, \ldots v_m, 0, \ldots 0)$.
Then $\phi^T : W^* \to V^*$ is defined by $\phi^T(f) : v \mapsto f(\phi(v)) = f(v_1, \ldots v_m, 0, \ldots 0)$ for all $f \in W^*$. Thus
$\phi^T(f) : V \to \mathbb{R}$ is defined on $\mathbb{R}^m$ by evaluating $f$ on the subspace $\mathrm{Range}(\phi)$ of $W$ and attributing the value
to the input tuple $(v_1, \ldots v_m)$. Then $\phi$ is injective and $\phi^T$ is surjective, but if $m < n$, then $\phi$ is not surjective
and $\phi^T$ is not injective. (This is related to Theorems 23.11.9 and 23.11.10.)


Now let $m \ge n$ and define $\phi \in \mathrm{Lin}(V, W)$ by $\phi : v \mapsto (v_1, \ldots v_n)$. Then $\phi^T : W^* \to V^*$ is defined by
$\phi^T(f) : v \mapsto f(\phi(v)) = f(v_1, \ldots v_n)$ for all $f \in W^*$. So $\phi$ is surjective and $\phi^T$ is injective, but if $m > n$, then
$\phi$ is not injective and $\phi^T$ is not surjective.
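
As a small sanity check of the first half of Example 23.11.4, the sketch below takes $m = 2$ and $n = 3$ and represents functionals by coefficient tuples. The function names and sample values are assumptions chosen for this illustration only. In coefficients, $\phi^T$ simply discards the last $n - m$ coefficients of $f$, which makes it plain why two different functionals on $W$ can have the same image.

```python
def immersion(v, n):
    """phi : R^m -> R^n for m <= n, padding the tuple v with zeros."""
    return tuple(v) + (0,) * (n - len(v))

def evaluate(coeffs, w):
    """Value at w of the functional with the given coefficient tuple."""
    return sum(c * x for c, x in zip(coeffs, w))

def transpose_of_immersion(f_coeffs, m):
    """Coefficients of phi^T(f): keep the first m coefficients of f."""
    return tuple(f_coeffs[:m])

m, n = 2, 3
f1 = (5, 7, 1)      # two functionals on W = R^3 differing only off Range(phi)
f2 = (5, 7, 9)
v = (3, -2)

# phi^T(f)(v) agrees with f(phi(v)).
agrees = evaluate(transpose_of_immersion(f1, m), v) == evaluate(f1, immersion(v, n))

# phi^T is not injective when m < n: f1 != f2 but their images coincide.
collides = f1 != f2 and transpose_of_immersion(f1, m) == transpose_of_immersion(f2, m)
```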


23.11.5 EXAMPLE: Transpose map for infinite-dimensional spaces. 

Let $V = \mathbb{R}^m$ and $W = \mathrm{FL}(\mathbb{R}, \mathbb{R})$. (See Definition 22.2.10 for free linear spaces.) Define $\phi \in \mathrm{Lin}(V, W)$ by
$\phi(v) = \sum_{i=1}^m v_i \delta_i$. Then $\phi^T(f)(v) = f(\phi(v)) = f(\sum_{i=1}^m v_i \delta_i) = \sum_{i=1}^m v_i f(\delta_i)$ for all $f \in W^*$ and $v \in V$,
where $\delta_i = \chi_{\{i\}} : \mathbb{R} \to \mathbb{R}$ denotes the function $\delta_i : x \mapsto \delta_{i,x}$. (See Definition 14.7.2 for indicator functions.)
Thus, as expected, $\phi : V \to W$ is an injection which is not a surjection, and $\phi^T : W^* \to V^*$ is a surjection
which is not an injection.

Linear functionals $f \in W^*$ map functions in $W = \mathrm{FL}(\mathbb{R}, \mathbb{R}) = \{g : \mathbb{R} \to \mathbb{R};\ \#(\mathrm{supp}(g)) < \infty\}$ to $\mathbb{R}$.
Elements of $W^*$ assign values independently to the subspaces $\{\lambda \delta_y;\ \lambda \in \mathbb{R}\}$ for $y \in \mathbb{R}$. Each functional
$f \in W^*$ has the form $f : g \mapsto \sum_{x \in \mathbb{R}} F(x) g(x)$ for some arbitrary function $F : \mathbb{R} \to \mathbb{R}$. (The sum is always
finite because $g$ has finite support.) Consequently $W^*$ is canonically isomorphic to $\mathrm{UL}(\mathbb{R}, \mathbb{R})$. (This is
asserted in general by Theorem 23.7.15.) The expression for $\phi^T(f)$ uses only $m$ of the infinitely many values
of $f(\delta_i)$ to construct a linear functional on $\mathbb{R}^m$.


In regard to Theorems 23.11.14, 23.11.15 and 23.11.17, it would be interesting to know whether the map
$\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$ defined by $\psi : \phi \mapsto \phi^T$ is surjective in this example. All linear maps from $V$ to
$W$ have the form $\phi : v \mapsto \sum_{i=1}^m v_i g_i$ for some $(g_i)_{i=1}^m \in W^m$. Then $\phi^T$ has the form $\phi^T(f)(v) = \sum_{i=1}^m v_i f(g_i)$
for $f \in W^*$ and $v \in \mathbb{R}^m$. But the functions $g_i : \mathbb{R} \to \mathbb{R}$ have finite support for $i \in \mathbb{N}_m$. Therefore only
those linear functionals in $\mathrm{Lin}(W^*, V^*)$ which have finite support are represented in $\mathrm{Range}(\psi)$. It cannot
be guaranteed in pure ZF set theory that there are no linear functionals in $W^*$ which do not have finite
support. In fact, in ZF models where the axiom of choice is valid, linear functionals with infinite support do
exist in $W^*$. Therefore it cannot be claimed that $\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$ is surjective.


23.11.6 REMARK: The relation of the transpose of a linear map to the “pull-back” concept. 

If $V = W$ and $\phi = \mathrm{id}_V$ in Definition 23.11.2, then $\phi^T = \mathrm{id}_{V^*}$. In the general case, the formula $\phi^T(f) = f \circ \phi$
may be written as $\phi^T = R_\phi$, where $R_\phi$ is the right action on linear functionals $f \in W^*$ by the map $\phi$. This
is actually a special case of the concept of the “pull-back” of the function $f : W \to K$ by the map $\phi$ to a
function $\phi^T(f) : V \to K$. The pull-back concept is used frequently for functions on manifolds.


The natural generalisation of this to the set $W \to K$ of all (not necessarily linear) $K$-valued functions on $W$
would have the same definition $\phi^T : f \mapsto f \circ \phi$. Another useful generalisation would be to linear maps from
$V$ and $W$ to a third linear space $U$. Thus a transposed map $\phi^T : \mathrm{Lin}(W, U) \to \mathrm{Lin}(V, U)$ may be defined by
the same rule $\phi^T(f) = f \circ \phi$ for all $f \in \mathrm{Lin}(W, U)$.


23.11.7 REMARK: Representation of dual space immersions as transposed maps. 

There is an eerie similarity between the transposed map in Definition 23.11.2 and the canonical map for the
second dual in Definition 23.10.4. The formula for the transposed map is $\phi^T : f \mapsto f \circ \phi$, whereas for the
second dual canonical map, it is $h(v) : f \mapsto f(v)$. In each case, the formula “multiplies” the linear functional
$f$ on the right by something. In the former case, the argument of $f$ is the value of $\phi$. In the latter case, the
argument of $f$ is the vector $v \in V$.


By some tortuous thinking, it is possible to represent the second dual map $h : V \to V^{**}$ in terms of a
transposed map. For a given $v \in V$, define the linear map $\phi_v : K \to V$ by $\phi_v(t) = tv$ for all $t \in K$, regarding
the field $K$ as a linear space over $K$. Then the transpose of $\phi_v$ is $\phi_v^T : V^* \to K^*$ defined by $\phi_v^T : f \mapsto f \circ \phi_v$,
so that $\phi_v^T(f) : t \mapsto f(tv)$. If $K^*$ is identified with $K$ via another canonical map, the map $t \mapsto f(tv)$ could
be identified with $f(v)$. Thus effectively $\phi_v^T : f \mapsto f(v)$. So $\phi_v^T$ is (roughly) the same thing as $h(v)$.


23.11.8 REMARK: Properties of transposes of linear space maps. 

Theorems 23.11.9, 23.11.10 and 23.11.12 give some basic properties of the transpose of a linear space map. 
The terms “monomorphism” and “epimorphism” were introduced in Definition 23.1.8 to mean injective and 
surjective homomorphisms respectively. It is noteworthy that Theorem 23.11.10 requires the target space 
W to have a well-ordered basis. This is required to ensure the existence of a linear extension of a linear 
functional on a subspace of W to the whole of W. 


23.11.9 THEOREM: The transpose of a linear space epimorphism is a linear space monomorphism. 
Let $\phi : V \to W$ be a linear space epimorphism between linear spaces $V$ and $W$ over a field $K$. Then the
transpose map $\phi^T : W^* \to V^*$ is a linear space monomorphism from $W^*$ to $V^*$.


PROOF: Let $\phi : V \to W$ be a linear space epimorphism for linear spaces $V$ and $W$ over a field $K$. Suppose
$g_1, g_2 \in W^*$ satisfy $\phi^T(g_1) = \phi^T(g_2)$. Then $\forall v \in V,\ g_1(\phi(v)) = g_2(\phi(v))$ by Definition 23.11.2. Since $\phi$ is
surjective, $\forall w \in W,\ \exists v \in V,\ \phi(v) = w$. Therefore $\forall w \in W,\ \exists v \in V,\ (\phi(v) = w$ and $g_1(\phi(v)) = g_2(\phi(v)))$.
This is equivalent to $\forall w \in W,\ \exists v \in V,\ (\phi(v) = w$ and $g_1(w) = g_2(w))$. So $\forall w \in W,\ \exists v \in V,\ (g_1(w) = g_2(w))$.
So $\forall w \in W,\ (g_1(w) = g_2(w))$. Therefore $g_1 = g_2$. Hence $\phi^T$ is a monomorphism.


23.11.10 THEOREM: The transpose of a linear space monomorphism is a linear space epimorphism. 
Let $\phi : V \to W$ be a linear space monomorphism between linear spaces $V$ and $W$ over a field $K$. If $W$ has
a well-ordered basis, then the transpose map $\phi^T : W^* \to V^*$ is a linear space epimorphism from $W^*$ to $V^*$.


PROOF: Let $\phi : V \to W$ be a linear space monomorphism for linear spaces $V$ and $W$ over a field $K$.
Suppose $W$ has a well-ordered basis. Let $\phi^T : W^* \to V^*$ denote the transpose of $\phi$. Since $\phi$ is injective, it
has a unique well-defined left inverse $\phi^{-1} : \phi(V) \to V$ such that $\phi^{-1} \circ \phi = \mathrm{id}_V$. (See Figure 23.11.1.)


Figure 23.11.1 Transpose map of linear space monomorphism


For $f \in V^*$, define the linear functional $g_f : \phi(V) \to K$ by $g_f(w) = f(\phi^{-1}(w))$ for $w$ in the linear subspace
$\phi(V)$ of $W$. By Theorem 23.5.7, since $W$ has a well-ordered basis, $g_f$ can be extended to a linear functional
$\bar{g}_f : W \to K$. Then for all $v \in V$, $\phi^T(\bar{g}_f)(v) = \bar{g}_f(\phi(v)) = f(\phi^{-1}(\phi(v))) = f(v)$. So $\phi^T(\bar{g}_f) = f$.
Hence $\forall f \in V^*,\ \exists g \in W^*,\ \phi^T(g) = f$. In other words, $\phi^T$ is an epimorphism.


23.11.11 REMARK:  Applicability of dual isomorphism theorem. 

Theorem 23.11.12 is applied in the proofs of Theorems 29.2.9 and 29.2.10 to show that certain kinds of dual 
tensor spaces are isomorphic. The proof is not a simple matter of combining Theorems 23.11.9 and 23.11.10 
because Theorem 23.11.10 requires a well-ordered basis for W. 


23.11.12 THEOREM: The transpose of a linear space isomorphism is a linear space isomorphism.

Let $\phi : V \to W$ be an isomorphism between linear spaces $V$ and $W$ over a field $K$. Then the transpose map
$\phi^T : W^* \to V^*$ is an isomorphism from $W^*$ to $V^*$. Hence if $V$ and $W$ are isomorphic spaces over a field $K$,
then $V^*$ and $W^*$ are isomorphic spaces over $K$.


PROOF: Let $V$ and $W$ be linear spaces over a field $K$. Let $\phi : V \to W$ be a linear space isomorphism.
Then the inverse map $\phi^{-1} : W \to V$ is a well-defined linear space isomorphism.

For any $f \in V^*$, let $h_f = f \circ \phi^{-1}$. Then $h_f \in W^*$. So $\phi^T(h_f) = h_f \circ \phi = f \circ \phi^{-1} \circ \phi = f$. Hence
$\forall f \in V^*,\ \exists h \in W^*,\ \phi^T(h) = f$. That is, $\phi^T$ is surjective.

Let $g_1, g_2 \in W^*$ and suppose $\phi^T(g_1) = \phi^T(g_2) \in V^*$. This means $g_1 \circ \phi = g_2 \circ \phi \in V^*$. So
$g_1 \circ \phi \circ \phi^{-1} = g_2 \circ \phi \circ \phi^{-1} \in W^*$. That is, $g_1 = g_2$. So $\phi^T$ is injective. Hence $\phi^T$ is a bijection. Hence $\phi^T$ is a linear space
isomorphism.


23.11.13 REMARK: Linear space morphism properties of linear space transpose maps. 

It is not at all surprising that the linear space transpose map is a linear map between two spaces of linear 
maps in Theorem 23.11.14. The transpose map cannot be expected to be surjective in general because 
the algebraic dual of an infinite-dimensional linear space is typically larger than the primal space. (See 
Theorem 23.7.14.) 


23.11.14 THEOREM: The transpose map between linear space maps and dual space maps is linear. 
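Let $V$ and $W$ be linear spaces over a field $K$. Then $\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$ defined by $\psi : \phi \mapsto \phi^T$ is
a linear map.


PROOF: Let $V$ and $W$ be linear spaces over $K$. Define $\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$ by $\psi : \phi \mapsto \phi^T$. Then
$\psi$ is a well-defined map by Definition 23.11.2. (The linear space structure on $\mathrm{Lin}(V, W)$ and $\mathrm{Lin}(W^*, V^*)$ is
given in Definition 23.1.6.) Let $\lambda_1, \lambda_2 \in K$ and $\phi_1, \phi_2 \in \mathrm{Lin}(V, W)$. Then by Definition 23.11.2,

\begin{align*}
\forall f \in W^*,\ \forall v \in V,\quad \psi(\lambda_1\phi_1 + \lambda_2\phi_2)(f)(v) &= (\lambda_1\phi_1 + \lambda_2\phi_2)^T(f)(v) \\
&= f((\lambda_1\phi_1 + \lambda_2\phi_2)(v)) \\
&= f(\lambda_1\phi_1(v) + \lambda_2\phi_2(v)) \\
&= \lambda_1 f(\phi_1(v)) + \lambda_2 f(\phi_2(v)) \\
&= \lambda_1 \phi_1^T(f)(v) + \lambda_2 \phi_2^T(f)(v) \\
&= \lambda_1 \psi(\phi_1)(f)(v) + \lambda_2 \psi(\phi_2)(f)(v).
\end{align*}

Thus $\forall \lambda_1, \lambda_2 \in K,\ \forall \phi_1, \phi_2 \in \mathrm{Lin}(V, W),\ \psi(\lambda_1\phi_1 + \lambda_2\phi_2) = \lambda_1\psi(\phi_1) + \lambda_2\psi(\phi_2)$. So $\psi$ is a linear map.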


23.11.15 THEOREM:  T7ranspose map between linear space maps and dual space maps is a monomorphism. 
Let $V$ and $W$ be linear spaces over a field $K$. Suppose that $W$ has a basis. Then the map $\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$
defined by $\psi : \phi \mapsto \phi^T$ is a linear space monomorphism.


PROOF: Let $V$ and $W$ be linear spaces over $K$ such that $W$ has a basis. Then $\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$
is a linear map by Theorem 23.11.14.


Suppose that $\phi_1, \phi_2 \in \mathrm{Lin}(V, W)$ satisfy $\psi(\phi_1) = \psi(\phi_2)$. Then $\phi_1^T = \phi_2^T$. So $f(\phi_1(v)) = f(\phi_2(v))$ for all
$f \in W^*$ and $v \in V$. Therefore $\phi_1(v) = \phi_2(v)$ for all $v \in V$ by Theorem 23.6.9 since $W$ has a basis. So
$\phi_1 = \phi_2$. Hence $\psi$ is a linear space monomorphism.


23.11.16 REMARK: The isomorphism from linear maps to their transposes for finite-dimensional spaces. 
Theorem 23.11.17 shows that for the special case of finite-dimensional linear spaces over the same field, the
map from linear maps to their transposes is an isomorphism. This is useful for establishing isomorphisms
between certain kinds of mixed tensor spaces. (See Remark 29.3.1.)


The generalisation of Theorem 23.11.17 to infinite-dimensional spaces is not valid, even if those spaces have
bases, because not all elements of the dual of the dual of a linear space have finite support. (This issue is
discussed in Remark 23.7.16 and Example 23.11.5.)


23.11.17 THEOREM:  Transpose map between linear space maps and dual space maps is an isomorphism. 
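Let $V$ and $W$ be finite-dimensional linear spaces over a field $K$. Then the map $\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$
defined by $\psi : \phi \mapsto \phi^T$ is a linear space isomorphism.


PROOF: Let $V$ and $W$ be finite-dimensional linear spaces over $K$. Then $\psi$ is a linear space monomorphism
by Theorem 23.11.15. Let $\eta \in \mathrm{Lin}(W^*, V^*)$. Let $B_V = (e_i^V)_{i \in I}$ and $B_W = (e_j^W)_{j \in J}$ be finite bases for $V$
and $W$ respectively, and let $B_V^* = (h_i^V)_{i \in I}$ and $B_W^* = (h_j^W)_{j \in J}$ be the corresponding canonical dual bases
as in Definition 23.7.3. Then $B_V^*$ and $B_W^*$ are bases for $V^*$ and $W^*$ respectively by Theorem 23.7.8. Define
$\phi \in \mathrm{Lin}(V, W)$ by $\phi = \sum_{j \in J} \sum_{i \in I} \eta(h_j^W)(e_i^V)\, e_j^W h_i^V$. Then by Theorem 23.7.10,

\begin{align*}
\forall f \in W^*,\ \forall v \in V,\quad \psi(\phi)(f)(v) &= \phi^T(f)(v) \\
&= f(\phi(v)) \\
&= f\Bigl(\sum_{j \in J} \sum_{i \in I} \eta(h_j^W)(e_i^V)\, e_j^W h_i^V(v)\Bigr) \\
&= \sum_{j \in J} f(e_j^W) \sum_{i \in I} \eta(h_j^W)(e_i^V)\, h_i^V(v) \\
&= \sum_{j \in J} f(e_j^W)\, \eta(h_j^W)\Bigl(\sum_{i \in I} h_i^V(v) e_i^V\Bigr) \\
&= \sum_{j \in J} f(e_j^W)\, \eta(h_j^W)(v) \\
&= \eta\Bigl(\sum_{j \in J} f(e_j^W) h_j^W\Bigr)(v) \\
&= \eta(f)(v).
\end{align*}

Thus $\psi(\phi) = \eta$. So $\psi : \mathrm{Lin}(V, W) \to \mathrm{Lin}(W^*, V^*)$ is surjective. Hence $\psi$ is a linear space isomorphism.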
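
The construction in the proof of Theorem 23.11.17 can be tried out on tuple spaces. The sketch below assumes $V = \mathbb{R}^2$ and $W = \mathbb{R}^3$ with standard bases, represents functionals by coefficient tuples, and represents a given map from $W^*$ to $V^*$ by a hypothetical sample matrix E. It builds the matrix of $\phi$ entry by entry from the values $\eta(h_j^W)(e_i^V)$ and then spot-checks that $f(\phi(v)) = \eta(f)(v)$. All names here are illustrative assumptions, not notation from the text.

```python
def mat_vec(a, v):
    """Apply the matrix a (a list of rows) to the tuple v."""
    return tuple(sum(x * y for x, y in zip(row, v)) for row in a)

def pair(f, v):
    """Value f(v) of the functional with coefficient tuple f at the vector v."""
    return sum(x * y for x, y in zip(f, v))

def std(k, size):
    """k-th standard basis tuple; also the k-th canonical dual basis functional."""
    return tuple(1 if i == k else 0 for i in range(size))

m, n = 2, 3
E = [[1, 0, 2], [4, -1, 3]]   # assumed sample eta : W* -> V* acting on coefficients

# Matrix of phi : V -> W with entries phi_matrix[j][i] = eta(h_j^W)(e_i^V).
phi_matrix = [[pair(mat_vec(E, std(j, n)), std(i, m)) for i in range(m)]
              for j in range(n)]

# Spot check of psi(phi) = eta on sample data.
v = (3, 7)
checks = [pair(f, mat_vec(phi_matrix, v)) == pair(mat_vec(E, f), v)
          for f in [(1, 1, 1), (2, 5, -1), (0, 1, 0)]]
```

With these conventions phi_matrix turns out to be the matrix transpose of E, as one would expect in the finite-dimensional case.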


23.11.18 REMARK: The dual or contragredient of a linear space automorphism. 

The contragredient or dual representation of a linear space automorphism is closely related to the transpose
of a linear transformation. The main difference is that instead of mapping $\phi : V \to W$ to $\phi^T : W^* \to V^*$
by the rule $\phi^T : f \mapsto (v \mapsto f(\phi(v)))$, the rule $\phi^* : f \mapsto (v \mapsto f(\phi^{-1}(v)))$ is used. For this to
be meaningful, $V$ and $W$ must be the same space and $\phi$ must be invertible. Consequently this concept is
restricted to groups of automorphisms.


For differential geometry, groups of linear automorphisms play an important role as structure groups for 
vector bundles. The linear action of the structure group on the fibre space of the vector bundle must often 
be extended to the dual of the fibre space or some tensor product composed of copies of the primal and dual 
space. The standard choice of action map by the group on the dual space is the “dual representation” or 
“contragredient representation” of the group. 


Definitions 23.11.19 and 23.11.20 are restricted to finite-dimensional linear spaces because this avoids the 
need for topology for the definition of the dual space. 


23.11.19 DEFINITION: The dual representation or contragredient representation of an invertible linear map
$L$ on a finite-dimensional linear space $V$ is the map $L^* : V^* \to V^*$ defined by

\[ \forall \omega \in V^*,\ \forall v \in V,\quad L^*(\omega)(v) = \omega(L^{-1}v), \]

where $V^*$ is the dual of $V$. In other words, $L^*(\omega) = \omega \circ L^{-1}$ for all $\omega \in V^*$.


23.11.20 DEFINITION: The dual representation or contragredient representation of a group $G$ of invertible
linear maps on a finite-dimensional linear space $V$ is the action $\mu : G \to \mathrm{Lin}(V^*, V^*)$ defined by

\[ \forall L \in G,\ \forall \omega \in V^*,\ \forall v \in V,\quad \mu(L)(\omega)(v) = \omega(L^{-1}v). \]


In other words, each group element is represented by its dual (or contragredient) representation.
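
One reason for using the inverse in Definitions 23.11.19 and 23.11.20 is that it keeps the order of composition, whereas the plain transpose reverses it. The following sketch checks this for two sample invertible $2 \times 2$ rational matrices. The function names and the sample matrices are assumptions made for illustration. Functionals are represented by coefficient tuples, so the contragredient map acts on coefficients by the transpose of the inverse matrix.

```python
from fractions import Fraction

def frac_matrix(rows):
    """Convert a list of integer rows to a matrix of exact fractions."""
    return [[Fraction(x) for x in row] for row in rows]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def mat_vec(a, v):
    return tuple(sum(x * y for x, y in zip(row, v)) for row in a)

def pair(omega, v):
    """Value omega(v) for a coefficient tuple omega and a vector v."""
    return sum(x * y for x, y in zip(omega, v))

def contragredient(l):
    """Matrix of omega -> omega o L^{-1} on coefficients: transpose of L^{-1}."""
    return [list(col) for col in zip(*mat_inv(l))]

L1 = frac_matrix([[2, 1], [1, 1]])
L2 = frac_matrix([[1, 3], [0, 1]])
omega = (Fraction(3), Fraction(-2))
v = (Fraction(5), Fraction(4))

# Composition order is preserved: mu(L1 L2) = mu(L1) mu(L2).
is_homomorphic = contragredient(mat_mul(L1, L2)) == mat_mul(contragredient(L1), contragredient(L2))

# The pairing is invariant: mu(L)(omega)(L v) = omega(v).
pairing_before = pair(omega, v)
pairing_after = pair(mat_vec(contragredient(L1), omega), mat_vec(L1, v))
```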


23.12. Convex functions on linear spaces


23.12.1 REMARK: Definitions of convex and concave functions.

Definition 23.12.2 could be extended to linear spaces over general ordered fields $K$ by replacing the real
interval $[0, 1]$ with the interval $K[0, 1] = \{\lambda \in K;\ 0_K \le \lambda \le 1_K\}$ as in Notation 11.5.13. Definition 23.12.2
could also be extended to general subsets $S$ of $V$ by replacing $[0, 1]$ with $\{\lambda \in K[0, 1];\ (1 - \lambda)u + \lambda v \in S\}$.
However, such generalisations are not required for this book. (See Definition 22.11.8 for convex sets.)


23.12.2 DEFINITION: Convex and concave real-valued functions on real linear spaces. 
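A convex function on a real linear space $V$ is a function $f : S \to \mathbb{R}$ on a convex subset $S$ of $V$ which satisfies

\[ \forall u, v \in S,\ \forall \lambda \in [0, 1],\quad f((1 - \lambda)u + \lambda v) \le (1 - \lambda) f(u) + \lambda f(v). \]

A concave function on a real linear space $V$ is a function $f : S \to \mathbb{R}$ on a convex subset $S$ of $V$ which satisfies

\[ \forall u, v \in S,\ \forall \lambda \in [0, 1],\quad f((1 - \lambda)u + \lambda v) \ge (1 - \lambda) f(u) + \lambda f(v). \]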


23.12.3 THEOREM: Convexity/concavity for general convex combinations. 
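Let $V$ be a real linear space. Let $S$ be a convex subset of $V$. Let $f : S \to \mathbb{R}$.


(i) $f$ is convex if and only if
\[ \forall m \in \mathbb{Z}^+,\ \forall w \in S^m,\ \forall \mu \in (\mathbb{R}_0^+)^m,\quad \sum_{i=1}^m \mu_i = 1 \ \Rightarrow\ f\Bigl(\sum_{i=1}^m \mu_i w_i\Bigr) \le \sum_{i=1}^m \mu_i f(w_i). \]

(ii) $f$ is concave if and only if
\[ \forall m \in \mathbb{Z}^+,\ \forall w \in S^m,\ \forall \mu \in (\mathbb{R}_0^+)^m,\quad \sum_{i=1}^m \mu_i = 1 \ \Rightarrow\ f\Bigl(\sum_{i=1}^m \mu_i w_i\Bigr) \ge \sum_{i=1}^m \mu_i f(w_i). \]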


PROOF: For part (i), suppose that $f$ is a convex function. If $m = 1$ then $\mu_1 = 1$ and the inequality follows.
The case $m = 2$ is equivalent to Definition 23.12.2 with $\lambda = \mu_1$. So assume that the condition holds for
some $m \ge 2$. Let $w \in S^{m+1}$ and $\nu \in (\mathbb{R}_0^+)^{m+1}$ with $\sum_{i=1}^{m+1} \nu_i = 1$. If $\nu_{m+1} = 1$, then $\nu_i = 0$ for all
$i \in \mathbb{N}_m$, and the inequality follows as in the case $m = 1$. So suppose that $\nu_{m+1} \ne 1$. Define $\mu \in (\mathbb{R}_0^+)^m$ by
$\mu_i = \nu_i / (1 - \nu_{m+1})$ for $i \in \mathbb{N}_m$. Then $\sum_{i=1}^m \mu_i = 1$. So

\begin{align*}
f\Bigl(\sum_{i=1}^{m+1} \nu_i w_i\Bigr) &= f\Bigl((1 - \nu_{m+1}) \sum_{i=1}^m \mu_i w_i + \nu_{m+1} w_{m+1}\Bigr) \\
&\le (1 - \nu_{m+1}) f\Bigl(\sum_{i=1}^m \mu_i w_i\Bigr) + \nu_{m+1} f(w_{m+1}) \tag{23.12.1} \\
&\le (1 - \nu_{m+1}) \sum_{i=1}^m \mu_i f(w_i) + \nu_{m+1} f(w_{m+1}) \tag{23.12.2} \\
&= \sum_{i=1}^{m+1} \nu_i f(w_i),
\end{align*}

where line (23.12.1) follows from Definition 23.12.2, and line (23.12.2) follows from the assumption. Thus
the case $m + 1$ follows from the case $m$. So the inequality follows for all $m \in \mathbb{Z}^+$. Conversely, the case $m = 2$
is equivalent to Definition 23.12.2 with $\lambda = \mu_1$. So the condition implies that $f$ is a convex function.


Part (ii) may be proved as for part (i).
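
The inequality in Theorem 23.12.3 (i) and the renormalisation step used in the proof can be checked numerically. The sketch below assumes the convex function $x \mapsto x^2$ on $V = \mathbb{R}$ with three sample points and rational weights. The function names and sample data are hypothetical choices for illustration. Exact fractions are used so that the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction as Fr

def convex_combination(weights, points):
    """Return sum_i weights[i] * points[i] for non-negative weights summing to 1."""
    assert sum(weights) == 1 and all(w >= 0 for w in weights)
    return sum(w * p for w, p in zip(weights, points))

def jensen_gap(f, weights, points):
    """Return sum_i mu_i f(w_i) - f(sum_i mu_i w_i), non-negative for convex f."""
    return (convex_combination(weights, [f(p) for p in points])
            - f(convex_combination(weights, points)))

square = lambda x: x * x                 # a sample convex function on R

nu = (Fr(1, 2), Fr(1, 3), Fr(1, 6))      # weights nu_1, nu_2, nu_3
w = (Fr(-1), Fr(2), Fr(5))               # points w_1, w_2, w_3
gap = jensen_gap(square, nu, w)

# Renormalisation from the inductive step: mu_i = nu_i / (1 - nu_{m+1}).
mu = tuple(x / (1 - nu[-1]) for x in nu[:-1])
regrouped = (1 - nu[-1]) * convex_combination(mu, w[:-1]) + nu[-1] * w[-1]
```

Here regrouped is the same point as the full three-term combination, which is the identity used in the first line of the displayed calculation in the proof.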


LINEAR SPACE CONSTRUCTIONS


24.1. Direct sums of linear spaces


24.1.1 REMARK: The difference between “direct sum” and “direct product”.

The terms “direct sum” and “direct product” are often used interchangeably. A set $\mathbb{R}^{m+n}$ of $(m + n)$-tuples
may be identified with the set of pairs $(x, y)$ with $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$. In other words, $\mathbb{R}^{m+n}$ may be
identified with the Cartesian set product $\mathbb{R}^m \times \mathbb{R}^n$. This set is called a “product” because for finite sets, the
cardinality of a Cartesian product of two sets is equal to the arithmetic product of the cardinalities of the two
sets. However, the dimension of the Cartesian product of two linear spaces (as applied in Definition 24.1.2)
is equal to the sum of the two dimensions. This is not at all surprising because a space of $n$-tuples may
be thought of as a power of multiple copies of a one-dimensional space, and arithmetic powers of numbers
can be multiplied by summing the powers. Thus one defines direct products of groups (Definition 17.7.14),
but in the case of linear spaces (and modules over rings), one generally defines direct sums.


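24.1.2 DEFINITION: The external direct sum of a family $(V_\alpha)_{\alpha \in A}$ of linear spaces over a field $K$ is the
set

\[ \bigoplus_{\alpha \in A} V_\alpha = \Bigl\{ (v_\alpha)_{\alpha \in A} \in \mathop{\times}_{\alpha \in A} V_\alpha;\ \#\{\alpha \in A;\ v_\alpha \ne 0\} < \infty \Bigr\} \]

together with the operations of componentwise addition and scalar multiplication defined by

\[ \forall (v_\alpha)_{\alpha \in A}, (w_\alpha)_{\alpha \in A} \in \bigoplus_{\alpha \in A} V_\alpha,\quad (v_\alpha)_{\alpha \in A} + (w_\alpha)_{\alpha \in A} = (v_\alpha + w_\alpha)_{\alpha \in A} \]
and
\[ \forall \lambda \in K,\ \forall (v_\alpha)_{\alpha \in A} \in \bigoplus_{\alpha \in A} V_\alpha,\quad \lambda (v_\alpha)_{\alpha \in A} = (\lambda v_\alpha)_{\alpha \in A}. \]


The external direct sum of linear spaces may also be called a formal direct sum.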
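
The finite-support condition in Definition 24.1.2 suggests a simple data representation. The sketch below is an illustration under stated assumptions: every component space is taken to be the same numeric field, elements are stored as dictionaries holding only their non-zero components, and the function names are hypothetical. The index set may then be infinite (for example, all strings) even though each element is a finite object.

```python
def normalise(x):
    """Drop zero components so that each element has a unique representation."""
    return {a: v for a, v in x.items() if v != 0}

def add(x, y):
    """Componentwise addition of two finitely supported families."""
    total = dict(x)
    for a, v in y.items():
        total[a] = total.get(a, 0) + v
    return normalise(total)

def scale(lam, x):
    """Componentwise scalar multiplication by lam."""
    return normalise({a: lam * v for a, v in x.items()})

def inject(a, v):
    """The family with value v at index a and zero at every other index."""
    return normalise({a: v})

x = add(inject("a", 2), inject("b", 3))   # support {"a", "b"}
y = add(x, inject("a", -2))               # the "a" component cancels
z = scale(0, x)                           # the zero element has empty support
```

Sums and scalar multiples of finitely supported families are again finitely supported, so these operations stay inside the set defined above.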


24.1.3 NOTATION: External direct sums of linear spaces. 
$\bigoplus_{\alpha \in A} V_\alpha$ denotes the external direct sum of a family $(V_\alpha)_{\alpha \in A}$ of linear spaces.


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 
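
$\bigoplus_{i=0}^{n-1} V_i$, for $n \in \mathbb{Z}^+$, denotes the external direct sum $\bigoplus_{\alpha \in A} V_\alpha$ with $A = n$.

$\bigoplus_{i=1}^{n} V_i$, for $n \in \mathbb{Z}_0^+$, denotes the external direct sum $\bigoplus_{\alpha \in A} V_\alpha$ with $A = \mathbb{N}_n$.

$\bigoplus^n V$, for $n \in \mathbb{Z}^+$, denotes the external direct sum $\bigoplus_{i=1}^{n} V_i$ with $V_i = V$ for all $i \in \mathbb{N}_n$.

$V_1 \oplus V_2$ denotes the external direct sum $\bigoplus_{i=1}^{2} V_i$.

$V_1 \oplus V_2 \oplus V_3$ denotes the external direct sum $\bigoplus_{i=1}^{3} V_i$, and so forth.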


24.1.4 REMARK: The difference between internal and external direct sums of linear spaces. 

The formal or abstract direct sum in Definition 24.1.2 is referred to as an external direct sum by Hartley/
Hawkes [90], definition 3.1, page 34, whereas they call Definition 24.1.6 (corresponding to their definition 3.2)
the “internal direct sum”.


Most authors use the simple term “direct sum” of linear spaces to mean the “internal direct sum” which
is given by Definition 24.1.6. Since the internal and external direct sums of linear spaces are different set
constructions, they should have different notations to distinguish them. But when the internal direct sum is
defined, it is equivalent to the external direct sum. This reduces the motivation to find a separate notation.


The external direct sum of linear spaces has much broader applicability than the internal direct sum. But
the internal direct sum is more often used in applications. It seems inevitable that both the internal and
external constructions should share a single notation.

Hartley/Hawkes [90], page 34, make the following useful observation in regard to external and internal direct
sums of rings.

    The reason for the use of the same notation for external and internal direct sums will appear
    shortly. The external direct sum should be thought of as a way of building up more complicated
    rings from given ones, and the internal direct sum as a way of breaking a given ring down into
    easier components.

This comment is equally valid for direct sums of linear spaces. The “reason” for the use of the same notation
is that the external and internal direct sums are isomorphic, as shown here in Theorem 24.1.8.

It seems fairly safe to use the same notation for external and internal direct sums of linear spaces, since the
interpretation will generally be clear from the context.


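24.1.5 DEFINITION: The standard injection $i_\beta : V_\beta \to \bigoplus_{\alpha \in A} V_\alpha$ is defined by

\[ \forall v \in V_\beta,\quad i_\beta(v) = (\delta_{\alpha\beta} v)_{\alpha \in A}. \]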


24.1.6 DEFINITION: Let $V_1$ and $V_2$ be subspaces of a linear space $V$ over a field $K$ such that $V_1 \cap V_2 = \{0\}$.
Then the (internal) direct sum of $V_1$ and $V_2$ is the subspace of $V$ defined by

\[ \{v_1 + v_2;\ v_1 \in V_1,\ v_2 \in V_2\}. \]


24.1.7 REMARK: The direct sum of intersecting subspaces equals the span of their union.
Definition 24.1.6 means that the direct sum of two subspaces with trivial intersection is the span of the union
of the two subspaces. (See Definition 22.4.2 for the span of a subset of a linear space.) Thus

\[ \{v_1 + v_2;\ v_1 \in V_1,\ v_2 \in V_2\} = \bigcap \{U \in \mathbb{P}(V);\ U \text{ is a subspace of } V \text{ and } V_1 \cup V_2 \subseteq U\}, \]

where $V_1$ and $V_2$ are subspaces of $V$ such that $V_1 \cap V_2 = \{0\}$.


24.1.8 THEOREM: Isomorphism between the internal and external direct sums of two linear spaces. 
Let $V_1, V_2, V_3$ be subspaces of a linear space $V$ over a field $K$ such that $V_3$ is the internal direct sum of $V_1$
and $V_2$. Define a map $\phi : V_1 \oplus V_2 \to V_3$ by

\[ \forall v_1 \in V_1,\ \forall v_2 \in V_2,\quad \phi(v_1, v_2) = v_1 + v_2. \]

Then $\phi$ is a linear space isomorphism.


PROOF: Clearly $\phi$ is a linear map. Suppose that $(v_1, v_2), (v_1', v_2') \in V_1 \times V_2$ and $\phi(v_1, v_2) = \phi(v_1', v_2')$. Then
$v_1 + v_2 = v_1' + v_2'$. So $v_1 - v_1' = v_2' - v_2$. But $v_1 - v_1' \in V_1$ and $v_2' - v_2 \in V_2$. So $v_1 - v_1' = v_2' - v_2 \in V_1 \cap V_2$.
So $v_1 - v_1' = v_2' - v_2 = 0$ by Definition 24.1.6. Therefore $(v_1, v_2) = (v_1', v_2')$. So $\phi$ is injective.

Let $v_3 \in V_3$. Then $v_3 = v_1 + v_2 = \phi(v_1, v_2)$ for some $v_1 \in V_1$, $v_2 \in V_2$. So $v_3 \in \mathrm{Range}(\phi)$. Therefore $\phi$ is
surjective. Hence $\phi$ is a linear space isomorphism.


24.1.9 REMARK: Isomorphism between the internal and external direct sums of linear subspaces. 
Theorem 24.1.8 means that the internal direct sum of two subspaces of a linear space is isomorphic to the 
external direct sum of the spaces. Therefore Notation 24.1.3 is re-used in Notation 24.1.10, as discussed in 
Remark 24.1.4. 


24.1.10 NOTATION: $V_1 \oplus V_2$ denotes the internal direct sum of linear subspaces $V_1$ and $V_2$ which is given
by Definition 24.1.6.


24.1.11 REMARK: Extension of the sum-of-dimensions rule for direct sums to infinite-dimensional spaces. 
Theorem 24.1.12 is restricted to finite-dimensional spaces because the situation for infinite-dimensional spaces 
is much less clear. Whether linear space dimension is defined in terms of beta-cardinality of spanning sets 
(as in Definition 22.5.2) or in terms of conventional cardinality (which encounters difficulties with the axiom 
of choice and some other issues), the cardinal arithmetic must be carefully defined and carefully applied. 
Theorem 24.1.12 can be extended to infinite-dimensional linear spaces, but the effort in the general case is 
not necessarily justified by the benefit. 


The assertion of Theorem 24.1.12 is fairly obviously true. The proof given here is somewhat convoluted 
because of the need to manage doubly indexed families of vectors. The management of multiply indexed 
families of vectors becomes even more tedious in tensor algebra. It is probably best to avoid even reading 
such proofs, which sometimes resemble computer programming more than mathematics. 


24.1.12 THEOREM: The dimension of a direct sum equals the sum of the dimensions. 
The dimension of the direct sum of finitely many finite-dimensional linear spaces is equal to the sum of the 
dimensions of the component spaces. 

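PROOF: Let $(V_i)_{i=1}^m$ be a family of linear spaces with finite bases $(B_i)_{i=1}^m$, where $B_i = (e_{i,j})_{j=1}^{n_i}$ with $n_i \in \mathbb{Z}_0^+$
for all $i \in \mathbb{N}_m$. Let $w \in \bigoplus_{i=1}^m V_i$. Then $w = (v_i)_{i=1}^m$ for some sequence in $\times_{i=1}^m V_i$. So $w = (\sum_{j=1}^{n_i} \lambda_{i,j} e_{i,j})_{i=1}^m$
for some sequences $(\lambda_{i,j})_{j=1}^{n_i} \in K^{n_i}$ for $i \in \mathbb{N}_m$. So $w \in \mathrm{span}(B_0)$, where $B_0 = \{e'_{i,j};\ (i, j) \in I_0\}$, where
$I_0 = \{(i, j) \in \mathbb{N}_m \times \mathbb{Z}^+;\ j \in \mathbb{N}_{n_i}\}$, and $e'_{i,j} = (\delta_{ki} e_{i,j})_{k=1}^m \in \bigoplus_{i=1}^m V_i$ for all $(i, j) \in I_0$. Thus
$\bigoplus_{i=1}^m V_i$ is spanned by $B_0$, where $\#(B_0) = \sum_{i=1}^m n_i$. But the elements of $B_0$ are linearly independent because
each set $B_i$ is linearly independent for $i \in \mathbb{N}_m$ and the $m$ components of $\bigoplus_{i=1}^m V_i$ cannot be non-trivial linear
combinations of each other. Hence $\dim(\bigoplus_{i=1}^m V_i) = \sum_{i=1}^m n_i$.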
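
The doubly indexed basis in the proof of Theorem 24.1.12 is perhaps easier to see in code. The sketch below assumes that each component space is a tuple space $K^{n_i}$ with its standard basis, and that an element of the direct sum is stored as a tuple of component tuples. The function name direct_sum_basis and the sample dimensions are hypothetical. Each basis element has a single standard basis vector in one component and zero vectors in all other components.

```python
def direct_sum_basis(dims):
    """Return the list of basis elements (delta_{ki} e_{i,j})_k, one per pair (i, j)."""
    basis = []
    for i, n_i in enumerate(dims):
        for j in range(n_i):
            element = tuple(
                tuple(1 if (k == i and l == j) else 0 for l in range(n_k))
                for k, n_k in enumerate(dims)
            )
            basis.append(element)
    return basis

dims = (2, 3, 1)                       # sample dimensions n_1, n_2, n_3
basis = direct_sum_basis(dims)

# Flattening each element into one long tuple identifies the direct sum with
# K^(n_1 + n_2 + n_3); the basis elements become the standard basis there.
flattened = [tuple(x for component in e for x in component) for e in basis]
standard = [tuple(1 if p == q else 0 for q in range(sum(dims)))
            for p in range(sum(dims))]
```

The number of basis elements is the sum of the component dimensions, in line with the statement of the theorem.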


24.1.13 REMARK: Sum-of-dimensions rule for direct sums. Product-of-dimensions for tensor products. 
Theorem 24.1.12 may be contrasted with the situation for tensor products of linear spaces. The dimension 
of a tensor product of finite-dimensional linear spaces equals the product of the dimensions of the component 
spaces. (See Theorem 28.1.22 for the dimension of a tensor product of linear spaces.) 


24.1.14 REMARK: Free linear spaces on finite sets are special kinds of direct products. 

The free linear space on a finite set S in Definition 22.2.10 is a direct product of #(S) copies of the field K 
regarded as a linear space over itself. Thus dim(FL(S, K)) = #(S). (This is shown in Theorem 22.7.18, but 
without the interpretation of free linear spaces as direct products.) 


24.2. Quotients of linear spaces 


24.2.1 REMARK:  Partitioning linear spaces into cosets. 

A linear space $V$ may be “divided” by any subspace of $V$. In other words, a linear space may be “subdivided”
or “partitioned” by a subspace. The subdivisions are called “cosets” and the global partition is called a
“quotient space”. This is the same concept as for groups. (See Definition 17.7.1 for cosets of subgroups, and
Definition 17.7.9 for quotients of groups.)


24.2.2 DEFINITION: A coset of a subspace $U$ in a linear space $V$ is a subset $S$ of $V$ which satisfies

\[ \exists x \in S,\ \forall v \in V,\quad (v \in S \Leftrightarrow v - x \in U). \]


24.2.3 REMARK: Left and right translates of subsets of a linear space. 
The left and right translates in Definition 24.2.4 and Notation 24.2.5 are, of course, identical because vector
addition is commutative.


24.2.4 DEFINITION:
The (left) translate of a subset $S$ of a linear space $V$ by an element $x \in V$ is the set $\{x + v;\ v \in S\}$.


The right translate of a subset $S$ of a linear space $V$ by an element $x \in V$ is the set $\{v + x;\ v \in S\}$.


24.2.5 NOTATION:
$x + S$, for an element $x$ and a subset $S$ of a linear space $V$, denotes the left translate of $S$ in $V$ by $x$.


$S + x$, for an element $x$ and a subset $S$ of a linear space $V$, denotes the right translate of $S$ in $V$ by $x$.


24.2.6 THEOREM: Equivalent condition for membership of a left translate of a subset. 
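Let $V$ be a linear space. Let $S$ be a subset of $V$. Let $x \in V$. Then $\forall v \in V,\ (v \in x + S \Leftrightarrow v - x \in S)$.


PROOF: Let $V$ be a linear space. Let $S \subseteq V$ and $x, v \in V$. Then $v \in x + S \Leftrightarrow (\exists w \in S,\ v = x + w)$
by Definition 24.2.4. So $v \in x + S \Leftrightarrow (\exists w \in S,\ w = v - x)$. Therefore $v \in x + S \Leftrightarrow v - x \in S$. Hence
$\forall v \in V,\ (v \in x + S \Leftrightarrow v - x \in S)$.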


24.2.7 THEOREM: Some basic properties of left translates of subspaces. 
Let $V$ be a linear space. Let $U$ be a subspace of $V$.


(i) Every coset of $U$ in $V$ is non-empty.
(ii) For all $x \in V$, the translate $x + U$ is a coset of $U$ in $V$. (Every translate of $U$ is a coset of $U$.)
(iii) For every coset $S$ of $U$ in $V$, for all $x \in S$, $S = x + U$. (Every coset of $U$ is a translate of $U$.)
(iv) $\forall x \in V,\ \forall y \in x + U,\ (x + U = y + U)$.
(v) $\forall x \in V,\ x \in x + U$.
(vi) The set $\{x + U;\ x \in V\}$ is a partition of $V$.
(vii) $\forall x_1, x_2 \in V,\ (x_1 + U = x_2 + U \Leftrightarrow x_1 - x_2 \in U)$.


PROOF: For part (i), let S be a coset of U. Then by Definition 24.2.2, da € S, Vv € V, (v € S & v-r € U). 
This proposition may be rewritten as dz, (x € $ A W E V, (v € S & v—z € U)). Therefore 3z, (x € S). 
Hence S Æ Í. 


For part (ii), let z € V. Then z +U = {xz +w;w € U}. Let v € V. Then v € z + U if and only if 
w€U,v-— z-- w. Let S 2 z -U. Then v € S if and only if dw € U, v = z + w, which holds if and only if 
w € U, w = x — v, which holds if and only if x — v € U. Therefore Vv € V, (v € S & v — x € U). Hence 
x+U is a coset of U. 


For part (iii), let © be a coset of U in V. Then dy € S, vv € V,(v € S & v—y € U). Let yo 
be an element of S such that Vv € V, (v € S & v— y; € U). Let z € S. Then xz — yo € U. So 
v—yo EU & (v—yo) - (x yo) EU & v—x €U for all v € V. Therefore Ww € V, (VES & v—a2 EU). 
But Vv € V, (v € z--U & v-r € U) by Theorem 24.2.6. So Vr € V, (v € S & v € xU). Hence $ = z4U. 


For part (iv), let x ∈ V and y ∈ x + U. Suppose that z ∈ x + U. Then y − x ∈ U and z − x ∈ U by
Theorem 24.2.6. So (z − x) − (y − x) = z − y ∈ U because U is a subspace. So z ∈ y + U by Theorem 24.2.6.
Therefore x + U ⊆ y + U. Similarly, suppose that z ∈ y + U. Then (z − y) + (y − x) = z − x ∈ U. So z ∈ x + U.
Therefore y + U ⊆ x + U. Hence x + U = y + U.

For part (v), let x ∈ V and note that 0_V ∈ U by Theorem 22.1.11 (iv). So x = x + 0_V ∈ x + U by
Definition 24.2.4. Hence ∀x ∈ V, x ∈ x + U.

For part (vi), let x₁, x₂ ∈ V. Suppose that (x₁ + U) ∩ (x₂ + U) ≠ ∅. Then y ∈ x₁ + U and y ∈ x₂ + U for
some y ∈ V. So x₁ + U = y + U = x₂ + U by part (iv). Therefore for all x₁, x₂ ∈ V, either (x₁ + U) ∩ (x₂ + U) = ∅
or x₁ + U = x₂ + U. But by part (v), V ⊆ ⋃_{x∈V} (x + U). So V = ⋃_{x∈V} (x + U). Hence {x + U; x ∈ V} is a
partition of V by Definition 8.7.12.

For part (vii), let x₁, x₂ ∈ V. Suppose that x₁ + U = x₂ + U. Let y ∈ x₁ + U. Then y − x₁ ∈ U by
Theorem 24.2.6. Similarly, y − x₂ ∈ U. So x₁ − x₂ = (y − x₂) − (y − x₁) ∈ U because U is a subspace. Now
suppose that x₁ − x₂ ∈ U. Then x₁ ∈ x₂ + U by Theorem 24.2.6. So x₁ + U = x₂ + U by part (iv).
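
As a concrete illustration of Theorem 24.2.7, the following worked example takes V to be the real plane and U to be its first coordinate axis. The space, the subspace and the sample vectors are choices made here for illustration only.

```latex
% Illustrative assumption (not from the text):
% V = \mathbb{R}^2 and U = \{(t,0) ; t \in \mathbb{R}\}.
\begin{align*}
(a,b) + U &= \{(a+t,\, b) ; t \in \mathbb{R}\} = \{(s,\, b) ; s \in \mathbb{R}\}, \\
(1,2) - (5,2) &= (-4,0) \in U
  \quad\Longrightarrow\quad (1,2) + U = (5,2) + U, \\
(1,2) - (1,3) &= (0,-1) \notin U
  \quad\Longrightarrow\quad \bigl((1,2) + U\bigr) \cap \bigl((1,3) + U\bigr) = \emptyset .
\end{align*}
```

Each coset here is a horizontal line determined by its second coordinate b, so the cosets cover the plane without overlapping, as part (vi) asserts.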


24.2.8 DEFINITION: The quotient (linear) space of a linear space V with respect to a subspace U of V is
the set {v + U; v ∈ V} together with addition and scalar multiplication operations defined by


∀v₁, v₂ ∈ V, (v₁ + U) + (v₂ + U) = (v₁ + v₂) + U (24.2.1)
∀λ ∈ K, ∀v ∈ V, λ(v + U) = (λv) + U, (24.2.2)


where K denotes the field of V.


24.2.9 NOTATION: V/U, for a linear space V and subspace U, denotes the quotient space of V over U.


24.2.10 THEOREM: Well-definition of addition and scalar multiplication of left translates of subspaces.
The operations in Definition 24.2.8 are well-defined.


PROOF: The sets v₁ + U, v₂ + U and (v₁ + v₂) + U in Definition 24.2.8, line (24.2.1), are well defined by
Definition 24.2.4. It must be shown that the defined addition operation for cosets v₁ + U and v₂ + U is
independent of the vectors v₁ and v₂ when the sets are the same. In other words, it must be shown that
(v₁ + U) + (v₂ + U) = (v₁′ + U) + (v₂′ + U) if v₁ + U = v₁′ + U and v₂ + U = v₂′ + U. So suppose that
v₁ + U = v₁′ + U and v₂ + U = v₂′ + U. Then v₁ − v₁′ ∈ U and v₂ − v₂′ ∈ U by Theorem 24.2.7 (vii). So
(v₁ + v₂) − (v₁′ + v₂′) = (v₁ − v₁′) + (v₂ − v₂′) ∈ U. Therefore (v₁ + v₂) + U = (v₁′ + v₂′) + U by
Theorem 24.2.7 (vii). This verifies that line (24.2.1) of Definition 24.2.8 is well defined.

To show the well-definition of line (24.2.2), let v₁, v₁′ ∈ V satisfy v₁ + U = v₁′ + U. Then v₁ − v₁′ ∈ U by
Theorem 24.2.7 (vii). Let λ ∈ K. Then λ(v₁ − v₁′) ∈ U because U is a subspace. So λv₁ − λv₁′ ∈ U. Hence
(λv₁) + U = (λv₁′) + U by Theorem 24.2.7 (vii).


24.2.11 THEOREM: The identity for the addition operation for left translates of subspaces.
Let U be a linear subspace of a linear space V. Then U is the additive identity of the quotient linear
space V/U. In other words, 0_{V/U} = U.


PROOF: Let U be a linear subspace of a linear space V. Then 0_V + U = U by Definition 24.2.4. So it
follows from Definition 24.2.8 that U + (v + U) = (0_V + U) + (v + U) = (0_V + v) + U = v + U for all v ∈ V.
Similarly, (v + U) + U = v + U for all v ∈ V. Hence U is the additive identity of V/U.


24.2.12 DEFINITION: The natural homomorphism from a linear space V to the quotient linear space V/U,
for a subspace U of V, is the map θ : V → V/U defined by ∀v ∈ V, θ(v) = v + U.


24.2.13 REMARK: Natural homomorphisms for quotient linear spaces are epimorphisms. 
The natural homomorphism in Definition 24.2.12 is a linear space epimorphism. This follows easily from 
Definitions 23.1.8 and 24.2.8. The maps and spaces in Definition 24.2.12 are illustrated in Figure 24.2.1. 


[Diagram: the natural homomorphism θ from V to V/U, together with the subspace U included in V.]


Figure 24.2.1 Natural homomorphism between a linear space V and a quotient space V/U


24.2.14 THEOREM: The dimension of a linear space quotient equals the difference of the dimensions.
Let U be a linear subspace of a finite-dimensional linear space V. Then dim(V/U) = dim(V) − dim(U).


PROOF: By Theorem 22.8.19, there is a basis (e_i)_{i=1}^{n} for V such that (e_i)_{i=1}^{m} is a basis for U, where
n = dim(V) and m = dim(U). Then for all v ∈ V, v = Σ_{i=1}^{n} λ_i e_i for some unique λ = (λ_i)_{i=1}^{n} ∈ K^n. So


V/U = {v + U; v ∈ V}
    = {(Σ_{i=1}^{n} λ_i e_i) + U; λ ∈ K^n}
    = {Σ_{i=m+1}^{n} λ_i (e_i + U); λ ∈ K^{n−m}}.


Thus (e_i + U)_{i=m+1}^{n} is a basis for V/U. Hence dim(V/U) = n − m = dim(V) − dim(U).
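
The basis construction in this proof can be traced in a small case. The sketch below assumes V = K³ with a basis (e₁, e₂, e₃) and U spanned by e₁, so that n = 3 and m = 1. These choices are illustrative assumptions, not taken from the text.

```latex
% Assumed for illustration: V = K^3 with basis (e_1, e_2, e_3), U = \operatorname{span}\{e_1\}.
\begin{align*}
v + U &= (\lambda_1 e_1 + \lambda_2 e_2 + \lambda_3 e_3) + U \\
      &= (\lambda_2 e_2 + \lambda_3 e_3) + U
         && \text{because } \lambda_1 e_1 \in U \\
      &= \lambda_2 (e_2 + U) + \lambda_3 (e_3 + U), \\
\dim(V/U) &= 2 = 3 - 1 = \dim(V) - \dim(U).
\end{align*}
```

The two cosets e₂ + U and e₃ + U are linearly independent in V/U because λ₂e₂ + λ₃e₃ ∈ U forces λ₂ = λ₃ = 0.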


24.2.15 REMARK: The image and kernel of a linear map. 

Theorem 24.2.16 asserts that if φ : V → W is a linear map between linear spaces V and W, then there is
a natural isomorphism between the quotient space V/ker(φ) and the image φ(V) of φ. (The kernel ker(φ)
of φ is defined in Definition 23.1.22.)

Theorem 24.2.16 is the first of the famous three isomorphism theorems, which have analogues in many other
categories. (See for example, Ash [50], pages 20-23, 37-38, 89-90; MacLane/Birkhoff [110], pages 409-412;
Lang [108], pages 16-17, 120; Rose [127], pages 50, 56; Hartley/Hawkes [90], pages 79-80.)


24.2.16 THEOREM: Isomorphism theorem for linear spaces. 
Let V and W be linear spaces over a field K, and let φ : V → W be a linear map.


(i) φ(V) is a linear subspace of W.
(ii) φ : V → φ(V) is a linear space epimorphism.
(iii) φ′ : V/ker(φ) → φ(V) is a linear space isomorphism, where φ′ is defined by φ′ : v + ker(φ) ↦ φ(v).


PROOF: Let V and W be linear spaces, and let φ : V → W be a linear map. Then any w₁, w₂ ∈ φ(V) may be
expressed as w₁ = φ(v₁) and w₂ = φ(v₂) for some v₁, v₂ ∈ V. Then w₁ + w₂ = φ(v₁) + φ(v₂) = φ(v₁ + v₂) ∈ W.
So w₁ + w₂ ∈ φ(V). Therefore φ(V) is closed under vector addition. Similarly, if w ∈ φ(V) and λ ∈ K,
then λw = φ(λv) ∈ φ(V) for some v ∈ V for which φ(v) = w. So φ(V) is closed under scalar multiplication.
Hence φ(V) is a linear subspace of W. This verifies part (i).

Part (ii) follows from the linearity and surjectivity of φ : V → φ(V). (See Definition 23.1.8 for linear space
epimorphisms.)

For part (iii), let U = ker(φ), and define φ′ : V/U → φ(V) by φ′(v + U) = φ(v) for all v ∈ V. Then φ′ is well
defined because if v₁ + U = v₂ + U for some v₁, v₂ ∈ V, then v₁ − v₂ ∈ U = ker(φ) by Theorem 24.2.7 (vii),
and so φ(v₁ − v₂) = 0_W, and therefore φ(v₁) = φ(v₂ + (v₁ − v₂)) = φ(v₂) + φ(v₁ − v₂) = φ(v₂). Now let
v₁, v₂ ∈ V be arbitrary elements of V. Then (v₁ + U) + (v₂ + U) = (v₁ + v₂) + U by Definition 24.2.8. So
φ′((v₁ + U) + (v₂ + U)) = φ′((v₁ + v₂) + U) = φ(v₁ + v₂) = φ(v₁) + φ(v₂) = φ′(v₁ + U) + φ′(v₂ + U), which verifies
Definition 23.1.1 (i), the vector additivity of φ′. Similarly, let λ ∈ K and v ∈ V. Then λ(v + U) = (λv) + U
by Definition 24.2.8. So φ′(λ(v + U)) = φ′((λv) + U) = φ(λv) = λφ(v) = λφ′(v + U), which verifies
Definition 23.1.1 (ii), the scalarity of φ′. So φ′ is a linear map between V/U and W. But the image φ′(V/U)
of φ′ equals the image φ(V) of φ. So φ′ is surjective. The injectivity of φ′ follows by letting v₁, v₂ ∈ V satisfy
φ′(v₁ + U) = φ′(v₂ + U), and noting that then 0_W = φ′(v₁ + U) − φ′(v₂ + U) = φ′((v₁ − v₂) + U) = φ(v₁ − v₂), and
so v₁ − v₂ ∈ ker(φ) = U, and therefore v₁ + U = v₂ + U by Theorem 24.2.7 (vii). Hence φ′ : V/ker(φ) → φ(V)
is a linear space isomorphism by Definition 23.1.8.


24.2.17 REMARK:  Rank-nullity theorem. 

Theorem 24.2.18, which follows from Theorem 24.2.16, is sometimes known as the “rank-nullity theorem”
because it relates the rank, dim(Range(φ)), of φ to its nullity, dim(ker(φ)). (See Definition 23.1.26 for the
nullity of a linear map.) Theorems 24.2.16 and 24.2.18 are illustrated in Figure 24.2.2.


The rank-nullity theorem may seem, at first sight, to be provable in terms of some kind of orthogonal
complement (or linearly independent complement) to the kernel of a linear map. However, the space which
has dimension dim(V) − dim(ker(φ)) is not a subspace of Dom(φ). It is in fact a set of cosets. If the space
has an inner product, one could possibly make some kind of “orthogonal complement” interpretation, but
Theorem 24.2.18 is “metric-free”.


V/ker(φ) ≅ φ(V)
dim(V) − dim(ker(φ)) = dim(φ(V))


Figure 24.2.2 Isomorphism theorem and rank-nullity theorem 


24.2.18 THEOREM: Dimension of finite-dimensional space equals the sum of range and kernel dimensions. 
Let φ : V → W be a linear map for linear spaces V and W over the same field. If V is finite-dimensional,
then dim(Range(φ)) + dim(ker(φ)) = dim(V).


PROOF: The assertion follows from Theorems 24.2.16 (iii) and 24.2.14. 
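
A small worked instance may help to fix the statement of Theorem 24.2.18. The map φ below is a hypothetical example chosen here for illustration. It is not taken from the text.

```latex
% Assumed example: \phi : \mathbb{R}^3 \to \mathbb{R}^3, \phi(x,y,z) = (x + 2y,\; 2x + 4y,\; z).
\begin{align*}
\ker(\phi) &= \{(x,y,z) ; x + 2y = 0,\ z = 0\} = \{(-2t,\, t,\, 0) ; t \in \mathbb{R}\},
  & \dim(\ker(\phi)) &= 1, \\
\operatorname{Range}(\phi) &= \operatorname{span}\{(1,2,0),\ (0,0,1)\},
  & \dim(\operatorname{Range}(\phi)) &= 2, \\
\dim(\operatorname{Range}(\phi)) + \dim(\ker(\phi)) &= 2 + 1 = 3 = \dim(\mathbb{R}^3).
\end{align*}
```

Here φ(1,0,0) = (1,2,0), φ(0,1,0) = (2,4,0) = 2(1,2,0) and φ(0,0,1) = (0,0,1), so the range is spanned by two independent vectors.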


24.3. Natural isomorphisms for duals of subspaces 


24.3.1 REMARK: Immersion of dual of a subspace within the dual of the whole space. 

As noted in Remark 23.6.11, the dual of a proper subspace of a linear space is not a subspace of the dual of 
the primal space. This raises the question of how one may somehow immerse the dual of a proper subspace 
within the dual of the primal space in a natural way. This kind of immersion may be constructed using 
suitable quotient spaces as in Theorems 24.3.4 and 24.3.7. The requirement for such constructions arises very 
naturally when one attempts to find constructions to represent antisymmetric covariant and contravariant 
tensors as in Section 30.4. (Theorem 24.3.4 is used in the proof of Theorem 30.4.16.) 


Despite the relevance of natural isomorphisms for duals of subspaces to the construction of duals of spaces 
of antisymmetric and symmetric tensors, this topic is often ignored in tensor algebra presentations. There 
is no great harm in this. Since there are so many identifications of different kinds of tensor spaces, it is 
tempting to ignore the underlying algebraic structures and define tensors purely in terms of transformation 
rules under changes of basis. In other words, “if it transforms like a tensor, it is a tensor”, and if two objects 
have the same transformation rules, they are the same kind of tensor. This can lead to some perplexing 
difficulties in the case of duals of antisymmetric tensor spaces. This is the motivation for Section 24.3. But 
the cure may be more painful than the malady. So skipping Section 24.3 might be a good idea. 


24.3.2 REMARK: Illustration for isomorphism for the dual of a subspace of a linear space. 

The maps and spaces in Theorem 24.3.4 are illustrated in Figure 24.3.1. The map θ in this diagram is the
natural homomorphism in Definition 24.2.12 for the quotient V*/U^⊥. The numbers at the bottom left of
each space are the dimensions of the spaces if dim(V) = n and dim(U) = m for some m, n ∈ ℤ₀⁺.


[Diagram: the spaces V, V* and V*/U^⊥ (dimensions n, n and m) above the spaces U, U^⊥ and U*
(dimensions m, n − m and m), with the natural homomorphism θ : V* → V*/U^⊥ and the map ψ : U* → V*/U^⊥.]

Figure 24.3.1 Natural isomorphism for the dual of a subspace of a primal space


24.3.3 REMARK: A related theorem for duals of linear spaces.

Theorems 24.3.4 and 24.3.7 may be obtained as consequences of a related theorem concerning bilinear maps,
dual spaces and quotient spaces by Lang [108], pages 145, 523, where the proof has a different form. There
the spaces corresponding to U^⊥ and W^⊥ are called the “kernel on the left” and “kernel on the right”
respectively of the map (φ, v) ↦ φ(v). Lang [108], page 49, also gives such a theorem for commutative
groups.


Theorems 24.3.4 and 24.3.7 may also be obtained as consequences of a similar, but different, related
theorem by MacLane/Birkhoff [110], page 210, where the spaces corresponding to U^⊥ and W^⊥ are called the
“annihilators” of the sets U and W respectively.


24.3.4 THEOREM: Natural isomorphism for the dual of a linear subspace of a primal linear space. 
Let U be a subspace of a linear space V with a well-ordered basis. Let U^⊥ = {φ ∈ V*; ∀v ∈ U, φ(v) = 0}.
Define ψ : U* → ℙ(V*) by ψ(f) = {φ ∈ V*; ∀v ∈ U, φ(v) = f(v)} for all f ∈ U*.


(i) U^⊥ is a linear subspace of V*.
(ii) ∀f ∈ U*, ψ(f) ∈ V*/U^⊥.
(iii) ∀f ∈ U*, ∀φ ∈ V*, (φ ∈ ψ(f) ⇔ ψ(f) = φ + U^⊥).
(iv) ψ is a linear map.
(v) ψ : U* → V*/U^⊥ is a linear space isomorphism.


PROOF: Let V be a linear space with a well-ordered basis. Let U be a linear subspace of V. Define the
subset U^⊥ of the algebraic dual space V* by U^⊥ = {φ ∈ V*; ∀v ∈ U, φ(v) = 0}. Then clearly 0_{V*} ∈ U^⊥. Let
φ₁, φ₂ ∈ U^⊥. Then ∀v ∈ U, φ₁(v) = 0 and ∀v ∈ U, φ₂(v) = 0. So ∀v ∈ U, (φ₁ + φ₂)(v) = 0. So φ₁ + φ₂ ∈ U^⊥.
Similarly, λφ ∈ U^⊥ for all λ ∈ K and φ ∈ U^⊥, where K is the field of V. Hence U^⊥ is a linear subspace
of V*, which proves part (i).

For part (ii), it must be shown that for all f ∈ U*, the subset ψ(f) = {φ ∈ V*; ∀v ∈ U, φ(v) = f(v)} of V*
is a coset of the form φ₀ + U^⊥ ∈ V*/U^⊥ for some φ₀ ∈ V*, and that this coset is uniquely determined by f.
Let f ∈ U*. Then ψ(f) ≠ ∅ by Theorem 23.5.7 because V has a well-ordered basis. Let φ₀ ∈ ψ(f). Then
∀v ∈ U, φ₀(v) = f(v). Let φ ∈ V*. Then


φ ∈ ψ(f) ⇔ ∀v ∈ U, φ(v) = f(v)
         ⇔ ∀v ∈ U, φ(v) = φ₀(v)
         ⇔ ∀v ∈ U, (φ − φ₀)(v) = 0_K
         ⇔ φ − φ₀ ∈ U^⊥
         ⇔ φ ∈ φ₀ + U^⊥,


by Theorem 24.2.6. So ψ(f) = φ₀ + U^⊥. It follows from Theorem 24.2.7 (vii) that φ₁ + U^⊥ = φ₀ + U^⊥ for
any alternative choice φ₁ in place of φ₀. Hence ψ(f) is a well-defined element of V*/U^⊥.


For part (iii), let f ∈ U* and φ ∈ V*. Then by part (ii), ψ(f) = φ₀ + U^⊥ for some φ₀ ∈ V*. So φ ∈ ψ(f) if and
only if φ ∈ φ₀ + U^⊥, which holds if and only if φ + U^⊥ = φ₀ + U^⊥, by Theorem 24.2.6 and Theorem 24.2.7 (vii).
Hence φ ∈ ψ(f) ⇔ ψ(f) = φ + U^⊥.

For part (iv), let f₁, f₂ ∈ U*. By part (ii), ψ(f₁) = φ₁ + U^⊥ and ψ(f₂) = φ₂ + U^⊥ for some φ₁, φ₂ ∈ V*.
Then φ₁ ∈ ψ(f₁) and φ₂ ∈ ψ(f₂) for such φ₁, φ₂. So φ₁(v) = f₁(v) and φ₂(v) = f₂(v) for all v ∈ U. Therefore
∀v ∈ U, (φ₁ + φ₂)(v) = φ₁(v) + φ₂(v) = f₁(v) + f₂(v) = (f₁ + f₂)(v). So φ₁ + φ₂ ∈ {φ ∈ V*; ∀v ∈ U, φ(v) =
(f₁ + f₂)(v)} = ψ(f₁ + f₂). So by part (iii), ψ(f₁ + f₂) = (φ₁ + φ₂) + U^⊥ = (φ₁ + U^⊥) + (φ₂ + U^⊥) = ψ(f₁) + ψ(f₂).
Therefore ψ satisfies the condition of vector additivity. To prove scalarity of ψ, let λ ∈ K and f ∈ U*. By
part (ii), ψ(f) = φ₀ + U^⊥ for some φ₀ ∈ V*. Then φ₀(v) = f(v) for all v ∈ U. So (λφ₀)(v) = λφ₀(v) =
λf(v) = (λf)(v) for all v ∈ U. Therefore λφ₀ ∈ {φ ∈ V*; ∀v ∈ U, φ(v) = (λf)(v)} = ψ(λf). So by part (iii),
ψ(λf) = (λφ₀) + U^⊥ = λ(φ₀ + U^⊥) = λψ(f), which proves the scalarity of ψ. Hence ψ is a linear map.

For part (v), to show surjectivity of ψ, let φ₀ + U^⊥ ∈ V*/U^⊥ for some φ₀ ∈ V*. Let f = φ₀|_U. Then
f ∈ U* because the restriction of a linear functional to a linear subspace is a linear functional on the
subspace. So φ₀ ∈ {φ ∈ V*; ∀v ∈ U, φ(v) = f(v)} = ψ(f). So ψ(f) = φ₀ + U^⊥ by part (iii). Therefore
φ₀ + U^⊥ ∈ Range(ψ). Hence ψ is surjective. To show injectivity, let f₁, f₂ ∈ U* satisfy ψ(f₁) = ψ(f₂). Then
ψ(f₁) = φ₁ + U^⊥ and ψ(f₂) = φ₂ + U^⊥ for some φ₁, φ₂ ∈ V* by part (ii). So φ₁ + U^⊥ = φ₂ + U^⊥. Therefore
φ₁ − φ₂ ∈ U^⊥ by Theorem 24.2.7 (vii). But φ₁ ∈ ψ(f₁) and φ₂ ∈ ψ(f₂) by part (iii). So for all v ∈ U,
0 = (φ₁ − φ₂)(v) = φ₁(v) − φ₂(v) = f₁(v) − f₂(v). So f₁ = f₂. Therefore ψ is injective. Hence ψ is a linear
space isomorphism.
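
The construction in Theorem 24.3.4 can be followed by hand in the smallest non-trivial case. The sketch below assumes V is the real plane with basis (e₁, e₂) and dual basis (ε¹, ε²), and U is the span of e₁. These names and choices are illustrative assumptions only.

```latex
% Assumed for illustration: V = \mathbb{R}^2 with basis (e_1, e_2),
% dual basis (\varepsilon^1, \varepsilon^2), and U = \operatorname{span}\{e_1\}.
\begin{align*}
U^\perp &= \{a\varepsilon^1 + b\varepsilon^2 ; a = 0\}
         = \operatorname{span}\{\varepsilon^2\}, \\
f \in U^*,\ f(e_1) = c \quad\Longrightarrow\quad
\psi(f) &= \{a\varepsilon^1 + b\varepsilon^2 ; a = c\}
         = c\,\varepsilon^1 + U^\perp, \\
\dim(V^*/U^\perp) &= 2 - 1 = 1 = \dim(U^*).
\end{align*}
```

Each functional f on U is thus matched with the whole family of its linear extensions to V, and this family is a single coset of U^⊥.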


24.3.5 REMARK: Illustration for isomorphism for the dual of a subspace of a dual space. 

The maps and spaces in Theorem 24.3.7 are illustrated in Figure 24.3.2. The map θ in this diagram is the
natural homomorphism in Definition 24.2.12 for the quotient V/W^⊥. The numbers at the bottom left of
each space are the dimensions of the spaces if dim(V) = n and dim(W) = m for some m, n ∈ ℤ₀⁺.


[Diagram: the spaces V*, V and V/W^⊥ (dimensions n, n and m) above the spaces W, W^⊥ and W*
(dimensions m, n − m and m), with the natural homomorphism θ : V → V/W^⊥ and the map ψ : W* → V/W^⊥.]

Figure 24.3.2 Natural isomorphism for the dual of a subspace of a dual space


24.3.6 REMARK: The double dual of a finite-dimensional space is isomorphic to the primal space. 

It may not be immediately clear why ψ(f) = {v ∈ V; ∀φ ∈ W, φ(v) = f(φ)} should be non-empty for
any f ∈ W* in part (ii) of the proof of Theorem 24.3.7. Since V is finite-dimensional, the double dual of
V is canonically isomorphic to V. (See Theorem 23.10.7.) The map f_v : φ ↦ φ(v) for φ ∈ V* defines a
double-dual element f_v ∈ (V*)* for all v ∈ V. But the map from v ∈ V to f_v ∈ (V*)* is a bijection. In
other words, every element of (V*)* is of this form. Therefore the expression f(φ) may be written as φ(v_f)
for some v_f ∈ V. So ψ(f) = {v ∈ V; ∀φ ∈ W, φ(v) = φ(v_f)}. Then clearly v_f ∈ ψ(f). So ψ(f) ≠ ∅.


24.3.7 THEOREM: Natural isomorphism for the dual of a linear subspace of a dual linear space. 
Let V be a finite-dimensional linear space, and W a subspace of V*. Let W^⊥ = {v ∈ V; ∀φ ∈ W, φ(v) = 0}.
Define ψ : W* → ℙ(V) by ψ(f) = {v ∈ V; ∀φ ∈ W, φ(v) = f(φ)} for all f ∈ W*.


(i) W^⊥ is a linear subspace of V.
(ii) ∀f ∈ W*, ψ(f) ∈ V/W^⊥.
(iii) ∀f ∈ W*, ∀v ∈ V, (v ∈ ψ(f) ⇔ ψ(f) = v + W^⊥).
(iv) ψ is a linear map.
(v) ψ : W* → V/W^⊥ is a linear space isomorphism.


PROOF: Let V be a finite-dimensional linear space. Let W be a linear subspace of V*, the algebraic dual
of V. Define the subset W^⊥ of V by W^⊥ = {v ∈ V; ∀φ ∈ W, φ(v) = 0}. Then clearly 0_V ∈ W^⊥. Let
v₁, v₂ ∈ W^⊥. Then ∀φ ∈ W, φ(v₁) = 0 and ∀φ ∈ W, φ(v₂) = 0. So ∀φ ∈ W, φ(v₁ + v₂) = 0. So v₁ + v₂ ∈ W^⊥.
Similarly, λv ∈ W^⊥ for all λ ∈ K and v ∈ W^⊥, where K is the field of V. Hence W^⊥ is a linear subspace
of V, which proves part (i).


For part (ii), it must be shown that for all f ∈ W*, the subset ψ(f) = {v ∈ V; ∀φ ∈ W, φ(v) = f(φ)}
of V is a coset of the form v₀ + W^⊥ ∈ V/W^⊥ for some v₀ ∈ V, and that this coset is uniquely determined
by f. Let f ∈ W*. Then ψ(f) ≠ ∅ by Theorem 23.5.7 because V has a well-ordered basis (because V is
finite-dimensional). Let v₀ ∈ ψ(f). Then ∀φ ∈ W, φ(v₀) = f(φ). Let v ∈ V. Then


v ∈ ψ(f) ⇔ ∀φ ∈ W, φ(v) = f(φ)
         ⇔ ∀φ ∈ W, φ(v) = φ(v₀)
         ⇔ ∀φ ∈ W, φ(v − v₀) = 0_K
         ⇔ v − v₀ ∈ W^⊥
         ⇔ v ∈ v₀ + W^⊥,


by Theorem 24.2.6. So ψ(f) = v₀ + W^⊥. It follows from Theorem 24.2.7 (vii) that v₁ + W^⊥ = v₀ + W^⊥ for
any alternative choice v₁ in place of v₀. Hence ψ(f) is a well-defined element of V/W^⊥.


For part (iii), let f ∈ W* and v ∈ V. Then by part (ii), ψ(f) = v₀ + W^⊥ for some v₀ ∈ V. So v ∈ ψ(f)
if and only if v ∈ v₀ + W^⊥, which holds if and only if v + W^⊥ = v₀ + W^⊥, by Theorem 24.2.6 and
Theorem 24.2.7 (vii). Hence v ∈ ψ(f) ⇔ ψ(f) = v + W^⊥.

For part (iv), let f₁, f₂ ∈ W*. By part (ii), ψ(f₁) = v₁ + W^⊥ and ψ(f₂) = v₂ + W^⊥ for some v₁, v₂ ∈ V. Then
v₁ ∈ ψ(f₁) and v₂ ∈ ψ(f₂). So φ(v₁) = f₁(φ) and φ(v₂) = f₂(φ) for all φ ∈ W. Therefore φ(v₁ + v₂) = φ(v₁) +
φ(v₂) = f₁(φ) + f₂(φ) = (f₁ + f₂)(φ) for all φ ∈ W. So v₁ + v₂ ∈ {v ∈ V; ∀φ ∈ W, φ(v) = (f₁ + f₂)(φ)} =
ψ(f₁ + f₂). So by part (iii), ψ(f₁ + f₂) = (v₁ + v₂) + W^⊥ = (v₁ + W^⊥) + (v₂ + W^⊥) = ψ(f₁) + ψ(f₂). Therefore
ψ satisfies vector additivity. To prove scalarity of ψ, let λ ∈ K and f ∈ W*. By part (ii), ψ(f) = v₀ + W^⊥
for some v₀ ∈ V. Then φ(v₀) = f(φ) for all φ ∈ W (because v₀ ∈ ψ(f)). So φ(λv₀) = λφ(v₀) = λf(φ) =
(λf)(φ) for all φ ∈ W. Therefore λv₀ ∈ {v ∈ V; ∀φ ∈ W, φ(v) = (λf)(φ)} = ψ(λf). So by part (iii),
ψ(λf) = (λv₀) + W^⊥ = λ(v₀ + W^⊥) = λψ(f), which proves the scalarity of ψ. Hence ψ is a linear map.

For part (v), to show surjectivity of ψ, let v₀ + W^⊥ ∈ V/W^⊥. Then v₀ ∈ V. Define f : W → K by
f : φ ↦ φ(v₀). Then f ∈ W* and v₀ ∈ {v ∈ V; ∀φ ∈ W, φ(v) = f(φ)} = ψ(f). So ψ(f) = v₀ + W^⊥
by part (iii). Therefore v₀ + W^⊥ ∈ Range(ψ). Hence ψ is surjective. To show injectivity, let f₁, f₂ ∈ W*
satisfy ψ(f₁) = ψ(f₂). Then ψ(f₁) = v₁ + W^⊥ and ψ(f₂) = v₂ + W^⊥ for some v₁, v₂ ∈ V by part (ii). So
v₁ + W^⊥ = v₂ + W^⊥. Then v₁ − v₂ ∈ W^⊥ by Theorem 24.2.7 (vii). But v₁ ∈ ψ(f₁) and v₂ ∈ ψ(f₂) by
part (iii). So for all φ ∈ W, 0 = φ(v₁ − v₂) = φ(v₁) − φ(v₂) = f₁(φ) − f₂(φ) = (f₁ − f₂)(φ). So f₁ − f₂ = 0_{W*},
and so f₁ = f₂. Therefore ψ is injective. Hence ψ is a linear space isomorphism.


24.4. Affine transformations 


24.4.1 REMARK: The historical origins of affine transformations. 

The concept of affine transformations was apparently first published by Euler [217] in 1748. Although 
Euler used the word “affine”, he was describing uni-axial scaling operations, which do not form a group. 
Affine transformations were first dealt with in a serious manner by Möbius [226] in 1827. Euler chose the 
name “affine” for the concept, but it was Möbius who correctly defined the general affine transformation 
and proved some non-trivial properties and invariants of affine transformations, such as the invariance of 
convex combinations of points. Euler needed a term to describe a geometric relation which was weaker than 
similarity, and since “affine” means “related”, he evidently found this word suitable to his purpose. (See 
Section 77.2 for details.) 


24.4.2 THEOREM: Equivalent conditions for affine transformations. 
Let V be a linear space over a field K. Let φ : V → V be a bijection. Then the following three conditions
on φ are equivalent if the characteristic of K does not equal 2.

(i) ∀p, u, v ∈ V, ∀λ, μ ∈ K, φ(p + λu + μv) − φ(p) = λ(φ(p + u) − φ(p)) + μ(φ(p + v) − φ(p)).
(ii) ∀p, v ∈ V, ∀λ ∈ K, φ(p + λv) − φ(p) = λ(φ(p + v) − φ(p)).
(iii) For some b ∈ V, the map φ_b : V → V with φ_b : v ↦ φ(v) − b is linear.


If K is an ordered field, then conditions (i), (ii) and (iii) are equivalent to condition (iv), where K[0, 1]
denotes the set {λ ∈ K; 0_K ≤ λ ≤ 1_K}.


(iv) ∀p, v ∈ V, ∀λ ∈ K[0, 1], φ(p + λv) − φ(p) = λ(φ(p + v) − φ(p)).


PROOF: Let V be a linear space over a field K. Let φ : V → V be a bijection which satisfies condition (i).
Substituting 0_K for μ gives condition (ii).


Let V be a linear space over a field K whose characteristic does not equal 2. Let φ : V → V be a bijection
which satisfies condition (ii). Let p, u, v ∈ V and λ, μ ∈ K. Substitution of u for v and 2λ for λ in (ii) gives


φ(p + 2λu) − φ(p) = 2λ(φ(p + u) − φ(p)).

Substitution of 2μ for λ in (ii) gives

φ(p + 2μv) − φ(p) = 2μ(φ(p + v) − φ(p)).

Substitute p + 2λu for p, and 2μv − 2λu for v, and 2⁻¹ for λ in (ii). Then

φ(p + 2λu + 2⁻¹(2μv − 2λu)) − φ(p + 2λu) = 2⁻¹(φ(p + 2λu + (2μv − 2λu)) − φ(p + 2λu)).

So

φ(p + λu + μv) − φ(p + 2λu) = 2⁻¹(φ(p + 2μv) − φ(p + 2λu)).

Hence

φ(p + λu + μv) − φ(p) = 2⁻¹(φ(p + 2μv) + φ(p + 2λu)) − φ(p)
                      = 2⁻¹(φ(p + 2μv) − φ(p)) + 2⁻¹(φ(p + 2λu) − φ(p))
                      = 2⁻¹(2μ(φ(p + v) − φ(p))) + 2⁻¹(2λ(φ(p + u) − φ(p)))
                      = λ(φ(p + u) − φ(p)) + μ(φ(p + v) − φ(p)),


by a double application of condition (ii). This verifies condition (i). Hence (i) and (ii) are equivalent.


Now assume condition (iii). Then there is a b ∈ V such that the map φ_b : V → V with φ_b : v ↦ φ(v) − b is
linear. Then

φ(p + λu + μv) − b = 1_K(φ(p) − b) + 1_K(φ(λu + μv) − b)
                   = (φ(p) − b) + λ(φ(u) − b) + μ(φ(v) − b).

So

φ(p + λu + μv) − φ(p) = λ(φ(u) − b) + μ(φ(v) − b).


But by linearity of φ_b again, φ(p + u) − b = (φ(p) − b) + (φ(u) − b). So φ(u) − b = φ(p + u) − φ(p). Similarly,
φ(v) − b = φ(p + v) − φ(p). So


φ(p + λu + μv) − φ(p) = λ(φ(p + u) − φ(p)) + μ(φ(p + v) − φ(p)).


This verifies condition (i). So now assume condition (i) and let b = φ(0). Then by substituting 0 for p in (i),


φ_b(λu + μv) = φ(λu + μv) − φ(0)
             = λ(φ(u) − φ(0)) + μ(φ(v) − φ(0))
             = λφ_b(u) + μφ_b(v).


So φ_b is linear, which verifies (iii). So conditions (i), (ii) and (iii) are equivalent.

Now let K be an ordered field. Then K has characteristic 0 by Theorem 18.8.7 (iv). Condition (iv) is a
restriction of condition (ii). So (iv) follows immediately from (ii).

Now assume condition (iv). Let p, v ∈ V and λ ∈ K. If λ ∈ K[0, 1], then (ii) follows immediately from (iv).
Suppose λ > 1. Then λ⁻¹ ∈ K[0, 1] by Theorem 18.8.7 (iii). Substituting λ⁻¹ for λ and λv for v in part (iv)
gives φ(p + λ⁻¹(λv)) − φ(p) = λ⁻¹(φ(p + λv) − φ(p)). Hence φ(p + λv) − φ(p) = λ(φ(p + v) − φ(p)), which
verifies part (ii) for all λ ≥ 0.

So suppose λ < 0. Then 1 − λ > 1. Substituting p + v for p, and −v for v, and 1 − λ for λ in condition (ii),
which has just been verified for all λ ≥ 0, gives


φ(p + v + (1 − λ)(−v)) − φ(p + v) = (1 − λ)(φ(p + v − v) − φ(p + v)).


So φ(p + λv) − φ(p + v) = (1 − λ)(φ(p) − φ(p + v)). So φ(p + λv) − φ(p) = λ(φ(p + v) − φ(p)). Hence condition (ii)
is verified for all λ ∈ K. Therefore condition (iv) is equivalent to all of the other three conditions if K is an
ordered field.


24.4.3 REMARK: Affine combinations of points.

The point p + λv in Theorem 24.4.2 (iv) is a convex combination of p and p + v because λ ∈ K[0, 1]. It seems
reasonable to call the point p + λv in Theorem 24.4.2 (ii) with unrestricted λ ∈ K an “affine combination”
of p and p + v.


24.4.4 REMARK: Affine transformations have no fixed base point. 

It is perhaps surprising that the equivalence of one-scalar and two-scalar conditions for affine transformations 
in Theorem 24.4.2 does not work for linear transformations. However, in the affine case, the base point p is 
portable. So a triangle of points can be linked and related to each other. In the linear case, the single-scalar 
condition φ(λv) = λφ(v) has a fixed base point at the origin. So only pairs of points on radial lines from the
origin can be related. 


24.4.5 REMARK: Affine transformations on linear spaces. 

Definition 24.4.6 is expressed in terms of Theorem 24.4.2 condition (iii) instead of the slightly more general
condition (i) because linear spaces over fields with characteristic 2 are (probably) of limited value for
differential geometry, and the existence of a single “point of linearity” is a more convenient assumption than the
assumption that all points are “points of linearity”.


24.4.6 DEFINITION: An affine transformation of a linear space V over a field K whose characteristic is not
equal to 2 is a bijection φ : V → V which satisfies


∃p ∈ V, ∀u, v ∈ V, ∀λ, μ ∈ K,
φ(p + λu + μv) − φ(p) = λ(φ(p + u) − φ(p)) + μ(φ(p + v) − φ(p)).
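
A minimal worked check may clarify what the condition in Definition 24.4.6 demands. It assumes a map of the form φ(v) = A(v) + b, where A : V → V is a linear bijection and b ∈ V. The names A and b are introduced here for illustration and do not appear in the text.

```latex
% Assumption for illustration: \phi(v) = A(v) + b, with A linear and bijective, b \in V.
\begin{align*}
\phi(p + \lambda u + \mu v) - \phi(p)
  &= A(p + \lambda u + \mu v) + b - A(p) - b \\
  &= \lambda A(u) + \mu A(v), \\
\lambda\bigl(\phi(p+u) - \phi(p)\bigr) + \mu\bigl(\phi(p+v) - \phi(p)\bigr)
  &= \lambda A(u) + \mu A(v).
\end{align*}
```

The two sides agree for every base point p, and φ(v) − b = A(v) is linear, which is condition (iii) of Theorem 24.4.2.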


24.5. Exact sequences of linear maps 


24.5.1 REMARK: Applications of exact sequences of linear maps. 

Exact sequences of linear maps have some relevance to the properties of connection forms on principal fibre 
bundles in Theorem 69.5.13. In algebraic topology, exact sequences of homomorphisms between modules 
over a ring are used extensively. 


According to a foreword to Hurewicz [95], page xii, exact sequences were introduced in 1941 by Hurewicz. 
(For exact sequences in general algebra, see MacLane/Birkhoff [110], pages 326-328; Lang [108], pages 15-18; 
Ash [50], pages 105-108. For exact sequences in topology contexts, see Hocking/Young [93], pages 284-288;
Greenberg/Harper [86], pages 75-81; Wallace [152], pages 150-157; Wallace [154], pages 17-23; Steenrod [142], 
pages 72-79; Nash/Sen [30], pages 99-104; Lang [23], pages 52-58.) 


24.5.2 DEFINITION: An exact sequence of linear maps (over a field K) is a pair ((V_i)_{i=1}^{n+1}, (f_i)_{i=1}^{n}) of
sequences such that each V_i is a linear space (over K), each f_i is a linear map from V_i to V_{i+1}, and
Img f_i = ker f_{i+1} for 1 ≤ i ≤ n − 1; that is, the pair of sequences must satisfy


(i) ∀i ∈ ℕ_{n+1}, V_i is a linear space (over K),
(ii) ∀i ∈ ℕ_n, f_i ∈ Lin(V_i, V_{i+1}),
(iii) ∀i ∈ ℕ_{n−1}, Img f_i = ker f_{i+1}.


If K is not mentioned, it may be arbitrary or else equal to ℝ, depending on context.


24.5.3 NOTATION: The pair of sequences ((V_i)_{i=1}^{n+1}, (f_i)_{i=1}^{n}) may be denoted


V₁ --f₁--> V₂ --f₂--> ... --f_n--> V_{n+1}.


The trivial linear space {0} may be denoted 0 in such diagrams.


24.5.4 THEOREM: Relations between dimensions of images and kernels in an exact sequence. 


Suppose V₁ --f₁--> V₂ --f₂--> ... --f_n--> V_{n+1} is an exact sequence of linear maps between finite-dimensional
linear spaces. Then the following hold:

(i) For 1 ≤ k ≤ l ≤ n,

    dim Img f_l = Σ_{i=k}^{l} (−1)^{l−i} dim V_i − (−1)^{l−k} dim ker f_k.

(ii) If V₁ = 0, then for 1 ≤ l ≤ n,

    dim Img f_l = Σ_{i=2}^{l} (−1)^{l−i} dim V_i.

(iii) If V₁ = 0 and V_{n+1} = 0, then

    Σ_{i=2}^{n} (−1)^{i} dim V_i = 0.

That is, the sum of the odd dimensions equals the sum of the even dimensions.


PROOF: Let 1 ≤ k ≤ n. Then by Theorem 24.2.18, dim V_k = dim ker f_k + dim Img f_k. That is, dim Img f_k =
dim V_k − dim ker f_k. This establishes case l = k of (i).


If 1 ≤ k < n, then


dim Img f_{k+1} = dim V_{k+1} − dim ker f_{k+1}
                = dim V_{k+1} − dim Img f_k
                = dim V_{k+1} − (dim V_k − dim ker f_k),


where the second line uses the exactness condition ker f_{k+1} = Img f_k. This is case l = k + 1 of (i). The
result for general k and l follows by induction from Theorem 24.2.18 and the exactness of the sequence.


Parts (ii) and (iii) follow directly from (i).


24.5.5 THEOREM: Injectivity and surjectivity of linear maps in an exact sequence. 


(i) If V₁ --f--> V₂ --> 0 is an exact sequence of linear maps, then f is surjective, and hence dim V₁ ≥ dim V₂.
(ii) If 0 --> V₁ --g--> V₂ is an exact sequence of linear maps, then g is injective, and hence dim V₁ ≤ dim V₂.
(iii) If 0 --> V₁ --f--> V₂ --> 0 is an exact sequence of linear maps, then f is a bijection. Hence dim V₁ = dim V₂.
(iv) If 0 --> V₁ --g--> V₂ --f--> V₃ --> 0 is an exact sequence of linear maps, then f is surjective, g is injective,
f ∘ g is the zero map, and dim V₂ = dim V₁ + dim V₃.


PROOF: Assertions (i), (ii) and (iii) follow easily from Definition 24.5.2.
Part (iv) follows from Theorem 24.5.4 (iii).
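
Part (iv) can be checked on a small example. The maps g and f below, and the choice of real coordinate spaces, are illustrative assumptions made here rather than anything taken from the text.

```latex
% Assumed example: 0 \to \mathbb{R} \xrightarrow{\,g\,} \mathbb{R}^3 \xrightarrow{\,f\,} \mathbb{R}^2 \to 0.
\begin{align*}
g(t) &= (t,\, -t,\, t), \qquad f(x,y,z) = (x + y,\; y + z), \\
f(g(t)) &= (t - t,\; -t + t) = (0, 0), \\
\ker f &= \{(x,y,z) ; x = -y,\ z = -y\}
        = \{(t,\, -t,\, t) ; t \in \mathbb{R}\} = \operatorname{Img} g, \\
\dim \mathbb{R}^3 &= 3 = 1 + 2 = \dim \mathbb{R} + \dim \mathbb{R}^2 .
\end{align*}
```

The map g is injective because g(t) = 0 forces t = 0, and f is surjective because f(1,0,0) = (1,0) and f(0,0,1) = (0,1).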


24.5.6 REMARK: Properties of right inverses of surjective linear maps. 
Theorem 24.5.7 is concerned with right inverses of surjective linear maps. A one-to-one correspondence is 
demonstrated between such right inverses and the complementary subspaces of such maps. 


[Diagram: the sequence V --f--> W --> 0, with the right inverse ρ : W → V drawn as a return arrow from W to V.]


24.5.7 THEOREM: Dimension of space equals sum of linear map nullity and right inverse range dimension. 
Let V and W be linear spaces over the same field. Let f : V → W be a surjective linear map. Let ρ : W → V
be a linear map such that f ∘ ρ = id_W. Then V = (ker f) ⊕ ρ(W).


PROOF: Clearly ρ(W) is a linear subspace of V. To show that (ker f) ∪ ρ(W) spans V, let v ∈ V and put
v′ = ρ ∘ f(v). Then

f(v − v′) = f(v) − f ∘ ρ ∘ f(v) = f(v) − id_W(f(v)) = 0.

So v − v′ ∈ ker f. But v′ ∈ ρ(W). Hence v = (v − v′) + v′ ∈ (ker f) ⊕ ρ(W). To show that (ker f) ∩ ρ(W) = {0},
suppose v ∈ (ker f) ∩ ρ(W). Then v = ρ(w) for some w ∈ W. But w = id_W(w) = f ∘ ρ(w) = f(v) = 0,
because v ∈ ker f. So v = ρ(0) = 0. So V = (ker f) ⊕ ρ(W).
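
The decomposition in Theorem 24.5.7 can be seen explicitly in a two-dimensional sketch. The maps f and ρ below are hypothetical choices made for illustration.

```latex
% Assumed example: V = \mathbb{R}^2, W = \mathbb{R}, f(x,y) = x + y, \rho(w) = (w, 0).
\begin{align*}
f(\rho(w)) &= w + 0 = w, \\
\ker f &= \{(t,\, -t) ; t \in \mathbb{R}\}, \qquad
\rho(W) = \{(w,\, 0) ; w \in \mathbb{R}\}, \\
(x,y) &= \underbrace{(-y,\, y)}_{v - v' \,\in\, \ker f}
       + \underbrace{(x + y,\, 0)}_{v' \,=\, \rho(f(v))} .
\end{align*}
```

A different right inverse, for instance w ↦ (0, w), would give a different complement of ker f, which is consistent with the correspondence described in Remark 24.5.6.
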


24.6. Seminorms on linear spaces


24.6.1 REMARK: Seminorms and norms are formally algebraic, but have topological significance. 

Seminorms and norms are apparently algebraic structures on linear spaces. (See Definitions 19.5.3 and 19.6.2 
respectively for the general definitions of seminorms and norms on modules over non-zero rings.) But a 
norm may be regarded as a topological concept because it induces a metric, and a metric induces a topology. 
Similarly, a set of seminorms may induce a topology on a linear space. So topological concepts such as limits, 
convergence and continuity may be defined in terms of seminorms or norms on linear spaces. However, even 
the absolute value of a real or complex number may be regarded as a metric, which therefore has a topological 
character. Seminorms and norms only acquire an explicitly topological character when limits, convergence, 
continuity and other topological concepts are applied to them. In Sections 24.6, 24.7 and 24.8, only the 
algebraic aspects of seminorms and norms are presented.


24.6.2 REMARK: Specialisation of general seminorms and norms from modules to linear spaces. 
Definitions 19.5.3 and 19.6.2 define respectively a seminorm and norm on a general module over a non-zero 
ring, compatible with a general absolute value function valued in a general ordered ring. Definitions 24.6.3 
and 24.7.2 specialise these to linear spaces with real numbers as the ordered ring. The spaces and functions 
in Definitions 24.6.3 and 24.7.2 are illustrated in Figure 24.6.1. (L_λ denotes left multiplication by λ.)


[Diagram: the absolute value function | · |_K from K to ℝ, the seminorm or norm ψ from V to ℝ, and the left
multiplication maps L_λ on V for λ ∈ K.]


Figure 24.6.1 Seminorm or norm on a linear space V over a field K


The use of an abstract field K is not superfluous formalism. It could be that K equals ℂ or is a subfield of
ℂ which is not a subfield of ℝ.


24.6.3 DEFINITION: A seminorm on a linear space V over a field K, compatible with an absolute value
function | · |_K : K → ℝ, is a function ψ : V → ℝ which satisfies:


(i) ∀λ ∈ K, ∀v ∈ V, ψ(λv) = |λ|_K ψ(v), [scalarity]
(ii) ∀v, w ∈ V, ψ(v + w) ≤ ψ(v) + ψ(w). [subadditivity]


24.6.4 THEOREM: Some basic properties of seminorms. 
Let V be a linear space over a field K. Let ψ : V → ℝ be a seminorm on V which is compatible with an
absolute value function | · |_K : K → ℝ.


(i) ψ is a seminorm on the left module V over the non-zero ring K, compatible with | · |_K.
(ii) ψ(0_V) = 0_ℝ.
(iii) ∀v ∈ V, ψ(−v) = ψ(v).
(iv) ∀v ∈ V, ψ(v) ≥ 0_ℝ.


PROOF: Part (i) follows from Definitions 19.5.3 and 24.6.3 because any field K is a non-zero ring by
Theorem 18.7.8 (iii), and ℝ is an ordered ring.


Parts (ii), (iii) and (iv) follow from part (i) and Theorem 19.5.4 parts (i), (ii) and (iii) respectively.


24.6.5 REMARK: Continuity of seminorms. 

Theorem 24.6.6 (iv) may be thought of as a kind of “continuity” of a seminorm with respect to itself. (See Theorem 39.3.6 for a direct application of this.) In other words, if $x - y$ is “small” as measured by $\psi$, then $\psi(x) - \psi(y)$ is also small as measured in $\mathbb{R}$ by the standard absolute value function $|\cdot|_{\mathbb{R}}$ on $\mathbb{R}$.
24.6.6 THEOREM: Some inequalities for seminorms on linear spaces.
Let $\psi : V \to \mathbb{R}$ be a seminorm on a linear space $V$ over a field $K$, with absolute value function $|\cdot|_K : K \to \mathbb{R}$.

(i) If $\psi(v) \ne 0_{\mathbb{R}}$ for some $v \in V$, then $|1_K|_K = 1_{\mathbb{R}}$.
(ii) $\forall x, y \in V,\ \psi(x) \ge \psi(y) - \psi(x - y)$.
(iii) $\forall x, y \in V,\ \psi(x) \le \psi(y) + \psi(x - y)$.
(iv) $\forall x, y \in V,\ |\psi(x) - \psi(y)|_{\mathbb{R}} \le \psi(x - y)$.

PROOF: For part (i), let $v \in V$ with $\psi(v) \ne 0_{\mathbb{R}}$. By Definition 24.6.3 (i), $\psi(v) = \psi(1_K v) = |1_K|_K\,\psi(v)$. Therefore $|1_K|_K = 1_{\mathbb{R}}$ by Theorem 18.6.4 (ii).

Part (ii) follows as for Theorem 19.5.4 (iv).

Part (iii) follows from part (ii) and Theorem 24.6.4 (iii) by swapping $x$ and $y$.

For part (iv), let $x, y \in V$. By part (ii), $\psi(y) - \psi(x) \le \psi(x - y)$. By part (iii), $\psi(x) - \psi(y) \le \psi(x - y)$. Therefore $|\psi(x) - \psi(y)|_{\mathbb{R}} \le \psi(x - y)$ by Definition 18.5.14.
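The inequality in part (iv) can be spot-checked numerically. The following is a minimal sketch, assuming the sample seminorm psi(x) = |x_1| on R^2 (an illustrative choice which is not taken from the text, and which is a seminorm but not a norm). The helper names are hypothetical, and a small tolerance absorbs floating-point rounding.

```python
import random

def psi(x):
    """Sample seminorm on R^2 (assumed for illustration): psi(x) = |x_1|.
    It is not a norm because psi((0, 1)) = 0."""
    return abs(x[0])

def sub(x, y):
    """Componentwise difference x - y of two pairs."""
    return (x[0] - y[0], x[1] - y[1])

random.seed(0)
pairs = [((random.uniform(-5, 5), random.uniform(-5, 5)),
          (random.uniform(-5, 5), random.uniform(-5, 5))) for _ in range(1000)]

# Theorem 24.6.6 (iv): |psi(x) - psi(y)| <= psi(x - y).
violations = [(x, y) for x, y in pairs
              if abs(psi(x) - psi(y)) > psi(sub(x, y)) + 1e-12]
print(len(violations))
```

A random search of this kind cannot prove the inequality. It only fails to find a counterexample among the sampled pairs.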


24.6.7 REMARK: The convex, balanced and absorbing properties for seminorms. 

Theorem 24.6.8 shows that the open and closed “unit balls” $S_O$ and $S_C$ for a seminorm $\psi$ are convex in parts (ii) and (iii), “balanced” in parts (iv) and (v), and “absorbing” in parts (vi) and (vii). (See Definition 24.6.9 for balanced and absorbing sets.) The functions $\psi_O$ and $\psi_C$ are known as the “Minkowski functionals” of $S_O$ and $S_C$ respectively. (See Yosida [167], page 25.)


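24.6.8 THEOREM: Some properties of open and closed “unit balls” for seminorms.

Let $\psi$ be a seminorm on a real linear space $V$ with the usual absolute value function on $\mathbb{R}$. Define open and closed “unit ball” sets $S_O = \{v \in V;\ \psi(v) < 1\}$ and $S_C = \{v \in V;\ \psi(v) \le 1\}$, and functions $\psi_O, \psi_C : V \to \mathbb{R}$ by $\psi_O : v \mapsto \inf\{t \in \mathbb{R}^+;\ t^{-1}v \in S_O\}$ and $\psi_C : v \mapsto \inf\{t \in \mathbb{R}^+;\ t^{-1}v \in S_C\}$.

(i) $0_V \in S_O \subseteq S_C$.
(ii) $S_O$ is a convex subset of $V$. [$S_O$ is convex]
(iii) $S_C$ is a convex subset of $V$. [$S_C$ is convex]
(iv) $\forall v \in V,\ (v \in S_O \Leftrightarrow -v \in S_O)$. [$S_O$ is balanced]
(v) $\forall v \in V,\ (v \in S_C \Leftrightarrow -v \in S_C)$. [$S_C$ is balanced]
(vi) $\forall v \in V,\ \{t \in \mathbb{R}^+;\ t^{-1}v \in S_O\} \ne \emptyset$. [$S_O$ is absorbing]
(vii) $\forall v \in V,\ \{t \in \mathbb{R}^+;\ t^{-1}v \in S_C\} \ne \emptyset$. [$S_C$ is absorbing]
(viii) $\forall v \in V,\ \psi_O(v) = \psi(v)$.
(ix) $\forall v \in V,\ \psi_C(v) = \psi(v)$.

PROOF: Part (i) follows from Theorem 24.6.4 (ii).

For part (ii), let $S_O = \{v \in V;\ \psi(v) < 1\}$ and let $v_1, v_2 \in S_O$. Let $\mu \in [0,1]$. Then $1 - \mu \ge 0$. So $\psi((1-\mu)v_1 + \mu v_2) \le \psi((1-\mu)v_1) + \psi(\mu v_2) = |1-\mu|\,\psi(v_1) + |\mu|\,\psi(v_2) = (1-\mu)\psi(v_1) + \mu\psi(v_2) < 1$ by Definition 24.6.3 (i, ii). So $(1-\mu)v_1 + \mu v_2 \in S_O$. Therefore $S_O$ is a convex subset of $V$ by Definition 22.11.8.

Part (iii) may be proved as for part (ii).

Parts (iv) and (v) follow from Theorem 24.6.4 (iii).

For part (vi), let $v \in V$. Let $t \in \mathbb{R}^+$ satisfy $t > \psi(v)$. Then $\psi(t^{-1}v) = t^{-1}\psi(v) < 1$. So $t^{-1}v \in S_O$. Therefore $\{t \in \mathbb{R}^+;\ t^{-1}v \in S_O\} \supseteq (\psi(v), \infty) \ne \emptyset$.

For part (vii), let $v \in V$. Let $t \in \mathbb{R}^+$ satisfy $t \ge \psi(v)$. Then $\psi(t^{-1}v) = t^{-1}\psi(v) \le 1$. So $t^{-1}v \in S_C$. Therefore $\{t \in \mathbb{R}^+;\ t^{-1}v \in S_C\} \supseteq [\psi(v), \infty) \ne \emptyset$ if $\psi(v) > 0$, and $\{t \in \mathbb{R}^+;\ t^{-1}v \in S_C\} \supseteq \mathbb{R}^+ \ne \emptyset$ if $\psi(v) = 0$.

For part (viii), let $v \in V$. Let $t \in \mathbb{R}^+$ with $t > \psi(v)$. Then $\psi(t^{-1}v) = t^{-1}\psi(v) < 1$. So $t^{-1}v \in S_O$. But if $0 < t \le \psi(v)$, then $\psi(t^{-1}v) = t^{-1}\psi(v) \ge 1$ and so $t^{-1}v \notin S_O$. Therefore $\psi_O(v) = \inf\{t \in \mathbb{R}^+;\ t > \psi(v)\} = \psi(v)$.

For part (ix), let $v \in V$. Let $t \in \mathbb{R}^+$ with $t \ge \psi(v)$. Then $\psi(t^{-1}v) = t^{-1}\psi(v) \le 1$. So $t^{-1}v \in S_C$. But if $0 < t < \psi(v)$, then $\psi(t^{-1}v) = t^{-1}\psi(v) > 1$ and so $t^{-1}v \notin S_C$. Therefore $\psi_C(v) = \inf\{t \in \mathbb{R}^+;\ t \ge \psi(v)\} = \psi(v)$.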


24.6.9 DEFINITION:
A balanced set in a linear space $V$ is a subset $S$ of $V$ which satisfies $\forall v \in V,\ (v \in S \Leftrightarrow -v \in S)$.

An absorbing set in a real linear space $V$ is a subset $S$ of $V$ which satisfies $\forall v \in V,\ \exists t \in \mathbb{R}^+,\ t^{-1}v \in S$.
24.6.10 REMARK: Unit balls contain the same information as seminorms.

It follows from Theorem 24.6.8 (viii, ix) that the “information” in a seminorm may be recovered from either the “open unit ball” $S_O$ or the “closed unit ball” $S_C$. This is useful in particular for visualisation of seminorms in terms of these “unit balls”.


Theorem 24.6.11 shows that “unit balls” can be, more or less, recovered from Minkowski functionals which 
are constructed from given convex, balanced, absorbing sets. (The “open” and “closed” character of these 
sets requires some topology. So these concepts are not formally defined here.) Consequently there is a kind 
of information equivalence between seminorms and convex, balanced, absorbing sets. They are in some sense 
inverses of each other. 


This equivalence is useful for the graphical representation of seminorms and norms because one may visualise 
them either in terms of their value for a variable vector or else in terms of their “unit balls” $S_O$ or $S_C$. (See
for example the unit level sets for p-norms in Figure 24.7.1.) Some concepts are easier to understand, and 
some kinds of theorems are easier to prove, in terms of the unit balls. In particular, when a seminorm or 
norm is constrained in some way, for example by specifying its value on the elements of a basis, the knowledge 
that the unit ball must be convex, balanced and absorbing helps to narrow down the possibilities. 


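24.6.11 THEOREM: Construction of a seminorm from a “unit ball” set.
Let $V$ be a real linear space. Let $S$ be a subset of $V$ which is convex, balanced and absorbing. Define $\psi : V \to \mathbb{R}$ by $\psi : v \mapsto \inf\{t \in \mathbb{R}^+;\ t^{-1}v \in S\}$.

(i) $0_V \in S$.
(ii) $\forall v \in V,\ \forall t \in \mathbb{R}^+,\ (\psi(v) < t \Rightarrow t^{-1}v \in S)$.
(iii) $\psi$ is a seminorm on $V$.
(iv) $\{v \in V;\ \psi(v) < 1\} \subseteq S \subseteq \{v \in V;\ \psi(v) \le 1\}$.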


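PROOF: For part (i), the absorbing property implies $t^{-1}0_V \in S$ for some $t \in \mathbb{R}^+$. Then $0_V = t^{-1}0_V \in S$.

For part (ii), let $v \in V$ and $t \in \mathbb{R}^+$ satisfy $\psi(v) < t$. Then $\inf\{t' \in \mathbb{R}^+;\ t'^{-1}v \in S\} < t$. So $t'^{-1}v \in S$ for some $t' \in \mathbb{R}^+$ satisfying $t' < t$. But then $t^{-1}v = (1-\mu)0_V + \mu\,t'^{-1}v$ with $\mu = t'/t \in [0,1]$. So $t^{-1}v \in S$ by the convexity of $S$ because $0_V \in S$ by part (i).

For part (iii), $\{t \in \mathbb{R}^+;\ t^{-1}v \in S\} \ne \emptyset$ for all $v \in V$ because $S$ is absorbing. So $\psi : V \to \mathbb{R}$ is well defined. Let $\lambda \in \mathbb{R}$ and $v \in V$. If $\lambda = 0$ then $\psi(\lambda v) = \inf\{t \in \mathbb{R}^+;\ t^{-1}\lambda v \in S\} = \inf\{t \in \mathbb{R}^+;\ 0_V \in S\} = \inf \mathbb{R}^+ = 0 = |\lambda|\,\psi(v)$ by part (i), which verifies Definition 24.6.3 (i) for $\lambda = 0$.

Now suppose that $\lambda \ne 0$. Since $S$ is balanced, $\psi(\lambda v) = \inf\{t \in \mathbb{R}^+;\ t^{-1}\lambda v \in S\} = \inf\{t \in \mathbb{R}^+;\ t^{-1}|\lambda| v \in S\}$. Substituting $t = |\lambda| t'$ gives $\psi(\lambda v) = \inf\{|\lambda| t';\ t' \in \mathbb{R}^+,\ t'^{-1}v \in S\} = |\lambda| \inf\{t' \in \mathbb{R}^+;\ t'^{-1}v \in S\} = |\lambda|\,\psi(v)$. So $\psi$ also satisfies Definition 24.6.3 (i) for all $\lambda \in \mathbb{R} \setminus \{0\}$. Thus $\psi$ satisfies the scalarity condition.

To show the subadditivity condition, Definition 24.6.3 (ii), let $v, w \in V$. Suppose that $\psi(v) = 0$. Let $\varepsilon \in \mathbb{R}^+$. Then $\varepsilon^{-1}v \in S$ by part (ii). Let $t \in \mathbb{R}^+$ satisfy $t^{-1}w \in S$. Let $\mu = t/(t+\varepsilon)$. Then $\mu \in [0,1]$. So $(1-\mu)\varepsilon^{-1}v + \mu t^{-1}w \in S$ by the convexity of $S$. Thus $(t+\varepsilon)^{-1}(v+w) \in S$. So $t + \varepsilon \in \{t' \in \mathbb{R}^+;\ t'^{-1}(v+w) \in S\}$. Therefore $\psi(v+w) = \inf\{t' \in \mathbb{R}^+;\ t'^{-1}(v+w) \in S\} \le t + \varepsilon$ for all $\varepsilon \in \mathbb{R}^+$. Consequently $\psi(v+w) \le t$ for all $t \in \mathbb{R}^+$ which satisfy $\psi(w) < t$. Hence $\psi(v+w) \le \psi(w) = \psi(v) + \psi(w)$. Similarly, $\psi(v+w) \le \psi(v) + \psi(w)$ if $\psi(w) = 0$.

Now assume that $\psi(v) > 0$ and $\psi(w) > 0$. Let $\varepsilon \in \mathbb{R}^+$. Let $s = \psi(v) + \varepsilon/2$ and let $t = \psi(w) + \varepsilon/2$. Then $s, t \in \mathbb{R}^+$ satisfy $\psi(v) < s$ and $\psi(w) < t$. So $s^{-1}v \in S$ and $t^{-1}w \in S$ by part (ii). Let $\mu = t/(s+t)$. Then $\mu \in [0,1]$. So $(1-\mu)s^{-1}v + \mu t^{-1}w \in S$ by the convexity of $S$. Thus $(s+t)^{-1}(v+w) \in S$. So $\psi(v+w) \le s + t = \psi(v) + \psi(w) + \varepsilon$. Since this holds for all $\varepsilon \in \mathbb{R}^+$, it follows that $\psi(v+w) \le \psi(v) + \psi(w)$. Hence $\psi$ is a seminorm on $V$ by Definition 24.6.3.

For part (iv), let $S_O = \{v \in V;\ \psi(v) < 1\}$. Let $v \in S_O$. Then $\psi(v) < 1$. So $v \in S$ by part (ii). Thus $S_O \subseteq S$.

Now let $S_C = \{v \in V;\ \psi(v) \le 1\}$. Let $v \in S$. Then $1 \in \{t \in \mathbb{R}^+;\ t^{-1}v \in S\}$. So $\inf\{t \in \mathbb{R}^+;\ t^{-1}v \in S\} \le 1$. So $\psi(v) \le 1$. So $v \in S_C$. Thus $S \subseteq S_C$.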
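The construction in Theorem 24.6.11 can be imitated numerically. The following is a minimal sketch, assuming that the set S is supplied as a membership predicate on tuples (a hypothetical interface, not part of the text). Part (ii) of the theorem says that membership of v/t in S is monotone in t above the infimum, which is what makes bisection applicable.

```python
def minkowski_functional(in_set, v, iterations=60):
    """Approximate inf{t > 0 : v/t in S} for a convex, balanced, absorbing
    set S given by the predicate in_set (an assumed, illustrative interface)."""
    def scale(t):
        return tuple(c / t for c in v)

    # S is absorbing, so repeated doubling eventually finds t with v/t in S.
    t_hi = 1.0
    while not in_set(scale(t_hi)):
        t_hi *= 2.0

    # Bisection: v/t_hi is always in S, and t_lo never exceeds the infimum.
    t_lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (t_lo + t_hi)
        if in_set(scale(mid)):
            t_hi = mid
        else:
            t_lo = mid
    return t_hi

def diamond(x):
    """Closed unit ball of the 1-norm on R^2."""
    return abs(x[0]) + abs(x[1]) <= 1.0

def strip(x):
    """Convex, balanced, absorbing, but unbounded in the x_2 direction."""
    return abs(x[0]) <= 1.0

print(minkowski_functional(diamond, (3.0, -4.0)))  # close to |3| + |-4| = 7
print(minkowski_functional(strip, (0.0, 5.0)))     # close to 0
```

The strip example shows why the theorem only promises a seminorm: the set contains a whole line through the origin, and the functional vanishes on the non-zero vectors of that line.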
24.7. Norms on linear spaces


24.7.1 REMARK: The positivity condition distinguishes norms from seminorms. 
Seminorms on linear spaces are non-negative by Theorem 24.6.4(iv). Definition 24.7.2 condition (iii) 
strengthens this to a strict inequality for non-zero vectors. 


A norm defines the lengths of vectors in a scalable fashion. In other words, if a vector is multiplied by 
a scalar, then the norm is multiplied by the absolute value of the scalar. (It is not necessary to take the 
absolute value if the scalar is non-negative and the field of scalars K is the same as the field of absolute 
values, which is R in Definition 24.7.2.) 


24.7.2 DEFINITION: A norm on a linear space $V$ over a field $K$, compatible with an absolute value function $|\cdot|_K : K \to \mathbb{R}$, is a function $\psi : V \to \mathbb{R}$ which satisfies:

(i) $\forall \lambda \in K,\ \forall v \in V,\ \psi(\lambda v) = |\lambda|_K\,\psi(v)$, [scalarity]
(ii) $\forall v, w \in V,\ \psi(v + w) \le \psi(v) + \psi(w)$, [subadditivity]
(iii) $\forall v \in V \setminus \{0_V\},\ \psi(v) > 0_{\mathbb{R}}$. [positivity]


24.7.3 THEOREM: Some basic properties of norms on linear spaces.
Let $V$ be a linear space over a field $K$. Let $\psi : V \to \mathbb{R}$ be a norm on $V$ which is compatible with an absolute value function $|\cdot|_K : K \to \mathbb{R}$.

(i) $\psi$ is a norm on the left module $V$ over the non-zero ring $K$, compatible with $|\cdot|_K$.
(ii) $\psi$ is a seminorm on the linear space $V$ over the field $K$, compatible with $|\cdot|_K$.
(iii) $\psi(0_V) = 0_{\mathbb{R}}$.
(iv) $\forall v \in V,\ \psi(-v) = \psi(v)$.

PROOF: Part (i) follows from Definitions 19.6.2 and 24.7.2 because any field $K$ is a non-zero ring by Theorem 18.7.8 (iii), and $\mathbb{R}$ is an ordered ring.

Part (ii) follows from Definitions 24.6.3 and 24.7.2.

Parts (iii) and (iv) follow from part (ii) and Theorem 24.6.4 parts (ii) and (iii) respectively.


24.7.4 REMARK:  Normed linear spaces. 
Linear spaces which have a single seminorm are typically not very useful. Therefore “seminormed linear 
spaces" are not defined here. Definitions 24.7.5 and 24.7.7 are concerned only with norms. 


Seminorms are useful for defining topological structure on infinite-dimensional linear spaces which cannot be 
induced by a single norm. Then infinitely many seminorms are required. However, the seminorm concept in 
Definition 24.6.3 is useful for highlighting the role of the positivity condition in establishing the equivalence 
of pairs of norms on finite-dimensional linear spaces modulo a fixed scaling factor which depends only on 
the pair of norms. 


24.7.5 DEFINITION: A normed linear space is a linear space with a norm. In other words, a normed linear space is a tuple $V \mathrel{-\!\!<} (V, \psi) \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu, |\cdot|_K, \psi)$ where

(i) $(K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ is a linear space over $K$,
(ii) $|\cdot|_K : K \to \mathbb{R}$ is an absolute value function for $K$,
(iii) $\psi : V \to \mathbb{R}$ is a norm on the linear space $V \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu)$.


24.7.6 REMARK: Notation for standard absolute value function for the real numbers.
In the case of norms on real linear spaces as in Definition 24.7.7, the absolute value function is always the standard absolute value function on $\mathbb{R}$ as in Definition 18.5.14. Therefore it is unnecessary to add a subscript “$\mathbb{R}$” as in “$|\cdot|_{\mathbb{R}}$”.


24.7.7 DEFINITION: A normed real linear space is a linear space over the field $\mathbb{R}$ with a norm which is compatible with the standard absolute value function $|\cdot| : \mathbb{R} \to \mathbb{R}$ given by $x \mapsto |x| = \max(x, -x)$ for $x \in \mathbb{R}$.


24.7.8 NOTATION: $\|v\|$, for a vector $v$ in a normed linear space $(V, \psi)$, denotes the norm $\psi(v)$. In other words, $\forall v \in V,\ \|v\| = \psi(v)$.


$|\cdot|$ is an alternative notation for $\|\cdot\|$.
24.7.9 REMARK: Alternative notation for a norm.

The distinct notations || - || and | - | often distinguish the norm of a vector from the absolute value of a scalar 
respectively. This notational convention can be a useful hint, but the class of object which the function 
applies to determines which function is meant. In the special case that a linear space is a field over itself as 
in Definition 22.1.9, the norm and absolute value are usually the same function. In the unlikely event that 
the norm and absolute value differ for the special case of a field regarded as a linear space, the notational 
distinction can be helpful. But in the overwhelming majority of practical situations, the double bar notation 
|| - || is little more than a “helpful hint”, which may or may not be a waste of ink. 


Oddly, in the case of matrices, the typical notation for the determinant uses single vertical bars, whereas 
one might logically expect triple bars because matrices are “higher-order objects” than vectors. 


24.7.10 REMARK: Power norms on Cartesian spaces. 

The power norms in Definition 24.7.11 use real powers of non-negative real numbers as in Definition 16.6.15. 
These have an analytical character, as mentioned in Remarks 16.6.5 and 16.6.14. Many of the properties 
of power norms also have an analytical character. For example, the p-norm is continuous with respect to 
$p$ for a fixed argument. In other words, for any $x \in \mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$, the map $p \mapsto |x|_p$ is a continuous function on $[1, \infty]$. Even the 2-norm utilises an analytical construction, namely the square root function,
which requires some analysis to demonstrate existence. (See Remark 16.6.5 and Theorem 16.6.6 (x) for proof 
of well-definition of positive integral roots of non-negative real numbers.) 


Some of the unit level sets $\{x \in \mathbb{R}^2;\ |x|_p = 1\}$ are illustrated in Figure 24.7.1.


Figure 24.7.1 Unit level sets of p-norms $|\cdot|_p$ on $\mathbb{R}^2$


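24.7.11 DEFINITION: The p-norm on a Cartesian product $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ and $p \in [1, \infty]$ is the map $x \mapsto \big(\sum_{i=1}^n |x_i|^p\big)^{1/p}$ for $1 \le p < \infty$, and $x \mapsto \max_{i=1}^n |x_i|$ for $p = \infty$.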


24.7.12 REMARK: The maximum of an empty set of non-negative real numbers. 

In the special case $n = 0$ in Definition 24.7.11, it is assumed that the expression $\max_{i=1}^n |x_i|$ is interpreted as $\sup\{|x_i|;\ i \in \mathbb{N}_n\}$, where the supremum is defined with respect to the ordered set $\mathbb{R}_0^+$. By Theorem 11.2.15 (v), $\sup(\emptyset) = \min(\mathbb{R}_0^+) = 0_{\mathbb{R}}$ with respect to this ordered set. Consequently $\max_{i=1}^n |x_i| = 0$ for $n = 0$.


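24.7.13 NOTATION: $|x|_p$, for $x \in \mathbb{R}^n$, $n \in \mathbb{Z}_0^+$ and $p \in [1, \infty]$, denotes the p-norm of $x \in \mathbb{R}^n$. Thus

$$
|x|_p =
\begin{cases}
\big(\sum_{i=1}^n |x_i|^p\big)^{1/p} & \text{if } 1 \le p < \infty \\
\max_{i=1}^n |x_i| & \text{if } p = \infty \text{ and } n \ge 1 \\
0 & \text{if } p = \infty \text{ and } n = 0.
\end{cases}
$$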
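The three cases of Notation 24.7.13 translate directly into a short function. The following is a minimal sketch, assuming that tuples of floats stand in for elements of R^n and that math.inf stands in for p = infinity. The name p_norm is illustrative only.

```python
import math

def p_norm(x, p):
    """p-norm of a tuple x for 1 <= p <= math.inf, following Notation 24.7.13."""
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    if p == math.inf:
        # The maximum over an empty tuple is read as sup(empty set) = 0,
        # as in Remark 24.7.12.
        return max((abs(c) for c in x), default=0.0)
    # For n = 0 the empty sum is 0, and 0 ** (1/p) = 0.
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

x = (3.0, -4.0)
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))
print(p_norm((), 2), p_norm((), math.inf))
```

For the sample vector, the printed values are non-increasing in p, which is consistent with the comparison of the 1-norm, the 2-norm and the infinity-norm.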


24.7.14 REMARK: Abbreviated notation for the 2-norm. 
The vector norm $|v|_p$ in Notation 24.7.13 is usually abbreviated to $|v|$ when $p = 2$.


24.7.15 DEFINITION: The Euclidean norm on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the 2-norm $x \mapsto |x|_2$ on $\mathbb{R}^n$.


24.7.16 DEFINITION: The Euclidean normed (Cartesian) (coordinate) space with dimension $n \in \mathbb{Z}_0^+$ is the pair $(\mathbb{R}^n, |\cdot|_2)$, where $\mathbb{R}^n$ has the usual linear space structure and $|\cdot|_2$ is the Euclidean norm on $\mathbb{R}^n$.


24.7.17 REMARK:  Equivalence of all norms on a finite-dimensional linear space. 

From the point of view of the topology on a linear space which is induced by the metric which is obtained 
from a norm as in Theorem 37.1.8, two norms on the space are equivalent if each norm is bounded with 
respect to the other. It is useful to know that any two norms on a finite-dimensional linear space are 
equivalent from this point of view. In other words, they induce the same topology. (The same is not true 
for infinite-dimensional linear spaces.) 


The equivalence of norms on a finite-dimensional linear space may be shown by means of topological concepts 
such as Cauchy sequences. (See for example Bachman/Narici [51], pages 122-125; Lang [108], pages 470-471.) 
However, a norm is an apparently algebraic concept, whose properties should presumably be demonstrable 
by algebraic means alone. (Certainly not all algebraic theorems can be proved without analysis, but it is 
generally preferable to at least seek an algebraic proof before resorting to analysis.) 


The small selection of basic properties for p-norms in Theorem 24.7.18 is limited to integer and infinite values 
of p because the properties for non-integer real p values have not been presented or proved here. 


24.7.18 THEOREM: Some basic properties for p-norms on real Cartesian spaces. 
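(i) $\forall n \in \mathbb{Z}_0^+,\ \forall x \in \mathbb{R}^n,\ |x|_\infty \le |x|_2$.
(ii) $\forall n \in \mathbb{Z}_0^+,\ \forall x \in \mathbb{R}^n,\ |x|_2 \le |x|_1$.
(iii) $\forall n \in \mathbb{Z}_0^+,\ \forall p \in \mathbb{Z}^+ \cup \{\infty\},\ \forall x, y \in \mathbb{R}^n,\ \big( (\forall i \in \mathbb{N}_n,\ |x_i| \le |y_i|) \Rightarrow |x|_p \le |y|_p \big)$.


PROOF: For part (i), let $x \in \mathbb{R}^n$. The case $n = 0$ is trivial. So assume that $n \ge 1$. Let $j$ be the least $j \in \mathbb{N}_n$ with $|x_j| = \max_{i=1}^n |x_i|$. Then $|x_j|^2 \le \sum_{i=1}^n |x_i|^2$. So $|x|_\infty^2 \le |x|_2^2$ by Notation 24.7.13. Hence $|x|_\infty \le |x|_2$ by Theorem 16.6.6 (v).

For part (ii), let $x \in \mathbb{R}^n$. Then $|x|_1^2 = \big(\sum_{i=1}^n |x_i|\big)^2 = \sum_{i=1}^n |x_i|^2 + \sum_{i=1}^n \sum_{j \ne i} |x_i|\,|x_j| \ge |x|_2^2$. Hence $|x|_2 \le |x|_1$ by Theorem 16.6.6 (v).

For part (iii), let $p \in \mathbb{Z}^+$ and let $x, y \in \mathbb{R}^n$ satisfy $\forall i \in \mathbb{N}_n,\ |x_i| \le |y_i|$. Then $|x_i|^p \le |y_i|^p$ for all $i \in \mathbb{N}_n$ by Theorem 16.6.6 (ii). So $\sum_{i=1}^n |x_i|^p \le \sum_{i=1}^n |y_i|^p$ by induction on $n$. Therefore $\big(\sum_{i=1}^n |x_i|^p\big)^{1/p} \le \big(\sum_{i=1}^n |y_i|^p\big)^{1/p}$ by Theorem 16.6.6 (v) and Notation 16.6.9. Hence $|x|_p \le |y|_p$ by Notation 24.7.13.

Now let $p = \infty$ and $x, y \in \mathbb{R}^n$ with $\forall i \in \mathbb{N}_n,\ |x_i| \le |y_i|$. Then $|x_i| \le \max_{j=1}^n |y_j|$ for all $i \in \mathbb{N}_n$. Therefore $\max_{i=1}^n |x_i| \le \max_{j=1}^n |y_j|$. Hence $|x|_\infty \le |y|_\infty$ by Notation 24.7.13.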


24.7.19 REMARK: The most extreme tuple is not more extreme than the tuple of extremes of components. 
Theorem 24.7.20 is similar to Theorem 16.7.8. In both cases, summing the extremes gives a not-less- 
extreme result than the extreme sum. In Theorem 24.7.20, the “sum” is a 2-norm, but the result can be 
extended to general norms on R” because of the monotonicity properties of norms. (For an application of 
Theorem 24.7.20 (ii), see the proof of Theorem 43.9.5 (vii).) 


24.7.20 THEOREM:  Bounds for the supremum and infimum of the 2-norm of a tuple-valued function. 
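Let $n \in \mathbb{Z}_0^+$. Let $f_i : X \to \mathbb{R}$ be bounded functions on a non-empty set $X$ for $i \in \mathbb{N}_n$. Define $f : X \to \mathbb{R}^n$ by $f : t \mapsto (f_i(t))_{i=1}^n$.

(i) $\sup\{|f(t)|_2;\ t \in X\} \le \big|(\sup |f_i|)_{i=1}^n\big|_2$.
(ii) $\inf\{|f(t)|_2;\ t \in X\} \ge \big|(\inf |f_i|)_{i=1}^n\big|_2$.

PROOF: For part (i), $\sup\{|f(t)|_2;\ t \in X\}$ is a well-defined element of $\mathbb{R}_0^+$ because each $f_i$ is bounded on $X$. Let $t \in X$. Then $|f_i(t)| \le \sup |f_i|$ for all $i \in \mathbb{N}_n$. So $\big|(|f_i(t)|)_{i=1}^n\big|_2 \le \big|(\sup |f_i|)_{i=1}^n\big|_2$ by Theorem 24.7.18 (iii). Thus $|f(t)|_2 = \big|(f_i(t))_{i=1}^n\big|_2 = \big|(|f_i(t)|)_{i=1}^n\big|_2 \le \big|(\sup |f_i|)_{i=1}^n\big|_2$. Hence $\sup\{|f(t)|_2;\ t \in X\} \le \big|(\sup |f_i|)_{i=1}^n\big|_2$.

For part (ii), let $t \in X$. Then $|f_i(t)| \ge \inf |f_i|$ for all $i \in \mathbb{N}_n$. So $\big|(|f_i(t)|)_{i=1}^n\big|_2 \ge \big|(\inf |f_i|)_{i=1}^n\big|_2$ by Theorem 24.7.18 (iii). Thus $|f(t)|_2 \ge \big|(\inf |f_i|)_{i=1}^n\big|_2$. Hence $\inf\{|f(t)|_2;\ t \in X\} \ge \big|(\inf |f_i|)_{i=1}^n\big|_2$.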
24.8. Bounds for seminorms and norms on linear spaces


24.8.1 REMARK: Bounds for seminorms and norms on finite-dimensional linear spaces. 

Theorem 24.8.2 gives an easy upper bound for seminorms, and therefore for norms also, in terms of the value 
of the seminorm (or norm) on the finite set of vectors in a basis. (If a space is known to be finite-dimensional, 
it will typically not be very difficult to construct or otherwise produce a basis.) The much more difficult 
task is to demonstrate the existence of a lower bound for a norm in terms of a finite set of “sample points”. 
Certainly this is not possible for a general seminorm. It is the positivity condition which makes it possible, 
but a general lower bound for a norm in terms of its values only on a basis is not possible. 


24.8.2 THEOREM: Bounds for seminorms on finite-dimensional linear spaces in terms of components. 
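Let $\psi : V \to \mathbb{R}$ be a seminorm on a finite-dimensional linear space $V$. Let $B = (e_i)_{i=1}^n$ be a basis for $V$. Then

\begin{align}
\forall \lambda \in \mathbb{R}^n,\quad \psi\Big(\sum_{i=1}^n \lambda_i e_i\Big) &\le \sum_{i=1}^n |\lambda_i|\,\psi(e_i) \tag{24.8.1} \\
&\le \sum_{i=1}^n |\lambda_i| \cdot \max_{i=1}^n \psi(e_i) \tag{24.8.2}
\end{align}

and

\begin{align}
\forall \lambda \in \mathbb{R}^n,\quad \psi\Big(\sum_{i=1}^n \lambda_i e_i\Big) &\le \max_{i=1}^n |\lambda_i| \sum_{i=1}^n \psi(e_i) \tag{24.8.3} \\
&= |\lambda|_\infty \sum_{i=1}^n \psi(e_i).
\end{align}


PROOF: The inequality on line (24.8.1) follows from Definition 24.6.3 (i, ii) by induction on $n$. Then lines (24.8.2) and (24.8.3) follow from line (24.8.1) by Theorem 24.6.4 (iv). (As explained in Remark 24.7.12, the expressions $\max_{i=1}^n \psi(e_i)$ and $\max_{i=1}^n |\lambda_i|$ are interpreted to equal zero when $n = 0$.)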
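The three upper bounds on lines (24.8.1), (24.8.2) and (24.8.3) can be compared numerically. The following is a minimal sketch, assuming a sample seminorm on R^3 and a sample basis, both chosen purely for illustration and not taken from the text.

```python
import random

def psi(x):
    """Sample seminorm on R^3 (assumed for illustration; not a norm):
    psi(x) = |x_1 - x_2| + 2|x_3|."""
    return abs(x[0] - x[1]) + 2.0 * abs(x[2])

# A sample basis of R^3 (the determinant of these rows is 2).
basis = [(1.0, 1.0, 0.0), (0.0, 1.0, 1.0), (1.0, 0.0, 1.0)]

def combine(lam):
    """The vector sum_i lam_i e_i."""
    return tuple(sum(l * e[j] for l, e in zip(lam, basis)) for j in range(3))

def bounds(lam):
    """Return psi(sum_i lam_i e_i) and the three upper bounds."""
    vals = [psi(e) for e in basis]
    value = psi(combine(lam))
    b1 = sum(abs(l) * v for l, v in zip(lam, vals))   # line (24.8.1)
    b2 = sum(abs(l) for l in lam) * max(vals)          # line (24.8.2)
    b3 = max(abs(l) for l in lam) * sum(vals)          # line (24.8.3)
    return value, b1, b2, b3

random.seed(2)
all_ok = True
for _ in range(500):
    lam = tuple(random.uniform(-4, 4) for _ in range(3))
    value, b1, b2, b3 = bounds(lam)
    all_ok = all_ok and value <= b1 + 1e-9 and b1 <= b2 + 1e-9 and b1 <= b3 + 1e-9
print(all_ok)
```

In this example the first basis vector has seminorm zero, so the bounds are far from sharp for coefficient tuples concentrated on that vector. This is consistent with the later observation that no lower bound of the same kind is available for a general seminorm.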


24.8.3 REMARK: Lower bounds for norms in terms of values at finite sets of sample points. 

Theorem 24.8.2 gives upper bounds for the value of a seminorm $\psi$ for a vector $v$ in terms of the vector’s coordinate tuple $\lambda$ with respect to some basis $B$ and the value of $\psi$ at the sample points $e_i$ in the basis. If
the seminorm is not a norm, no such lower bound is possible because of the existence of non-zero vectors 
whose seminorm equals zero. Therefore a lower bound of this kind may only be obtained for norms. 


General lower bounds for norms, analogous to the upper bounds for seminorms in Theorem 24.8.2, cannot 
in general be obtained in terms of the values for the elements of a basis alone. This is demonstrated by 
Example 24.8.4. However, one may hope to express such lower bounds in terms of a finite set of sample 
points. The sample points should preferably be chosen by some simple algorithm in terms of $\psi$ and $B$, not in terms of some kind of topological procedure based on convergence of Cauchy sequences for example.


24.8.4 EXAMPLE: Let $V = \mathbb{R}^2$ with the usual linear space structure. Then $B = (e_i)_{i=1}^2$ is a basis for $V$ with $e_1 = (1, 0)$ and $e_2 = (0, 1)$. Define $\psi_k : V \to \mathbb{R}$ for $k \in \mathbb{R}_0^+$ by

$$\forall x \in V,\quad \psi_k(x) = \max(k|x_1 + x_2|,\ |x_1 - x_2|). \tag{24.8.4}$$

Then $\psi_k$ is a norm on $V$ for $k > 0$. The values of $\psi_k$ on $B$ are $\psi_k(e_1) = \psi_k(e_2) = \max(k, 1)$. The unit level sets of this norm for some values of $k \in (0, 1]$ are illustrated in Figure 24.8.1.


As $k$ is made arbitrarily small but positive, the value of $\psi_k$ for the fixed vector $(1, 1)$ has the arbitrarily small value $2k$, while the closed unit ball $S_C^k = \{x \in \mathbb{R}^2;\ \psi_k(x) \le 1\}$ becomes arbitrarily large. The unit vectors $\pm e_1$ and $\pm e_2$ are in $S_C^k$ for all $k \in (0, 1]$ because $\psi_k(\pm e_1) = \psi_k(\pm e_2) = 1$ for $k \in (0, 1]$. The sets $S_C^k$ are required to be convex. Therefore the convex span $\mathrm{conv}(\{e_1, e_2, -e_1, -e_2\}) = S_C^1$ must be a subset of $S_C^k$ for all $k \in (0, 1]$. (See Definition 22.11.10 for the convex span of a set.)


Figure 24.8.1 Unit level sets of norms $\psi_k : x \mapsto \max(k|x_1 + x_2|,\ |x_1 - x_2|)$ on $\mathbb{R}^2$


On the other hand, the open unit ball $S_O^k = \{x \in \mathbb{R}^2;\ \psi_k(x) < 1\}$ cannot contain the vectors $e_1$, $e_2$, $-e_1$ and $-e_2$. So $S_O^k$ must be a convex set which has these four vectors on its boundary. (See Definition 31.9.5 for the formal definition of a boundary.) Any convex, balanced, absorbing set $S$ with the vectors $e_1$, $e_2$, $-e_1$ and $-e_2$ on its boundary may be converted to a norm $\psi : x \mapsto \inf\{t \in \mathbb{R}^+;\ t^{-1}x \in S\}$ which satisfies

$$\psi(e_1) = \psi(e_2) = \psi(-e_1) = \psi(-e_2) = 1.$$
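For $k \in (0, 1]$, the norm $\psi_k$ defined on line (24.8.4) satisfies

$$\forall \lambda \in \mathbb{R}^n,\quad \psi_k\Big(\sum_{i=1}^n \lambda_i e_i\Big) \ge k|\lambda|_1 \cdot \max_{i=1}^n \psi_k(e_i), \tag{24.8.5}$$

where $n = 2$. To verify line (24.8.5), note that $k|\lambda|_1 \max_{i=1}^n \psi_k(e_i) = k(|\lambda_1| + |\lambda_2|)$, where $|\lambda_1| + |\lambda_2| = \max(|\lambda_1 + \lambda_2|,\ |\lambda_1 - \lambda_2|)$ by Theorem 18.5.16 (iv). So $k|\lambda|_1 \max_{i=1}^n \psi_k(e_i) = k \max(|\lambda_1 + \lambda_2|,\ |\lambda_1 - \lambda_2|) \le \psi_k(\lambda)$. The sharpness of this bound is verified by $\lambda = (1, 1)$.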
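The values quoted in Example 24.8.4 can be reproduced directly. The following is a minimal sketch of the norm on line (24.8.4) and a grid test of line (24.8.5). The function names, the grid and the tolerance are illustrative assumptions.

```python
def psi_k(k, x):
    """The norm on line (24.8.4): max(k|x_1 + x_2|, |x_1 - x_2|) for k > 0."""
    return max(k * abs(x[0] + x[1]), abs(x[0] - x[1]))

def lower_bound_holds(k, lam, tol=1e-12):
    """Check line (24.8.5) for one coefficient tuple lam with the standard basis."""
    max_on_basis = max(psi_k(k, (1.0, 0.0)), psi_k(k, (0.0, 1.0)))
    one_norm = abs(lam[0]) + abs(lam[1])
    return psi_k(k, lam) >= k * one_norm * max_on_basis - tol

grid = [i / 4.0 for i in range(-8, 9)]
for k in (1.0, 0.5, 0.1, 0.01):
    all_ok = all(lower_bound_holds(k, (a, b)) for a in grid for b in grid)
    # Columns: k, psi_k(e_1), psi_k((1, 1)), whether (24.8.5) held on the grid.
    print(k, psi_k(k, (1.0, 0.0)), psi_k(k, (1.0, 1.0)), all_ok)
```

The third column shrinks with k while the second stays fixed, which illustrates why values on the basis alone cannot control a norm from below.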


24.8.5 REMARK: Strategies for proving lower bounds for norms relative to values on a basis. 

Whereas the upper bound $\psi(\sum_{i=1}^n \lambda_i e_i) \le |\lambda|_1 \max_{i=1}^n \psi(e_i)$ in Theorem 24.8.2 is universal for all norms $\psi$ on $V$, the lower bound $\psi_k(\sum_{i=1}^n \lambda_i e_i) \ge k|\lambda|_1 \max_{i=1}^n \psi_k(e_i)$ in Example 24.8.4 line (24.8.5) depends on the norm $\psi_k$. The factor $k$ can be arbitrarily small. So the bound can be arbitrarily weak. However, it can be hoped that for any fixed norm $\psi$, the lower bound in line (24.8.5) will hold for some positive factor $k$.


It follows from Theorems 24.6.8 and 24.6.11 that seeking a lower bound for a norm is equivalent to seeking 
an upper bound for its unit ball. Thus the algebraic lower-bound task here is equivalent to the geometric 
task of finding an outer bound for a convex, balanced, absorbing set S which has the elements of a basis on 
its boundary. From Figure 24.8.1, it is abundantly clear that such a set S can be arbitrarily large. 


The positivity property of a norm requires the intersection of S with any ray through the origin to be finite. 
It is intuitively clear that if all such ray-intersections have finite length, then S must be bounded by some 
finite cube centred at the origin because the intersection of S with a plane perpendicular to the ray must be 
convex, and this convex intersection must be non-decreasing as the plane moves away from the origin. In 
fact, this plane-intersection must be sublinear, which implies that it must contract to the empty set at some 
finite distance from the origin. This intuitive picture must be converted into a logically rigorous proof. One 
may either approach such a proof via norms directly or via convex, balanced, absorbing sets. 


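24.8.6 THEOREM: Lower bound for the seminorm of linear combinations of vectors.
Let $\psi$ be a seminorm on a real linear space $V$. Then

$$\forall v_1, v_2 \in V,\ \forall \mu_1, \mu_2 \in \mathbb{R}_0^+,\quad \psi(\mu_1 v_1 + \mu_2 v_2) \ge |\mu_1 \psi(v_1) - \mu_2 \psi(v_2)|.$$

PROOF: By Theorems 24.6.4 (i) and 19.5.4 (iv), and Definition 24.6.3 (i), $\psi(\mu_1 v_1 + \mu_2 v_2) \ge \psi(\mu_1 v_1) - \psi(\mu_2 v_2) = \mu_1 \psi(v_1) - \mu_2 \psi(v_2)$. Similarly $\psi(\mu_1 v_1 + \mu_2 v_2) \ge \psi(\mu_2 v_2) - \psi(\mu_1 v_1) = \mu_2 \psi(v_2) - \mu_1 \psi(v_1)$. So $\psi(\mu_1 v_1 + \mu_2 v_2) \ge \max(\mu_1 \psi(v_1) - \mu_2 \psi(v_2),\ \mu_2 \psi(v_2) - \mu_1 \psi(v_1)) = |\mu_1 \psi(v_1) - \mu_2 \psi(v_2)|$.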
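The role of the non-negative coefficients in Theorem 24.8.6 can be seen in a small experiment. The following is a minimal sketch, assuming the 1-norm on R^2 as the sample seminorm (an illustrative choice). It tests the bound for random non-negative coefficients and then evaluates the same expression with one negative coefficient.

```python
import random

def psi(x):
    """The 1-norm on R^2, used here as a sample seminorm."""
    return abs(x[0]) + abs(x[1])

def comb(m1, v1, m2, v2):
    """The linear combination m1*v1 + m2*v2 in R^2."""
    return (m1 * v1[0] + m2 * v2[0], m1 * v1[1] + m2 * v2[1])

def bound_holds(m1, v1, m2, v2, tol=1e-12):
    """Test psi(m1*v1 + m2*v2) >= |m1*psi(v1) - m2*psi(v2)|."""
    lhs = psi(comb(m1, v1, m2, v2))
    rhs = abs(m1 * psi(v1) - m2 * psi(v2))
    return lhs >= rhs - tol

def random_vector():
    return (random.uniform(-3, 3), random.uniform(-3, 3))

random.seed(1)
ok_nonnegative = all(
    bound_holds(random.uniform(0, 3), random_vector(),
                random.uniform(0, 3), random_vector())
    for _ in range(1000))
print(ok_nonnegative)

# With mu_2 = -1 and v_1 = v_2 = e_1, the left side is psi(0) = 0
# and the right side is |1 + 1| = 2, so the inequality fails.
print(bound_holds(1.0, (1.0, 0.0), -1.0, (1.0, 0.0)))
```

The second test is why the inequality is only applied with non-negative coefficients in the proofs which use it.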
Figure 24.8.2 Lower bound for seminorm of convex combinations of two vectors


24.8.7 EXAMPLE: Additional lower bounds for the seminorm of convex combination of two vectors. 
Let $\psi$ be a seminorm on a real linear space $V$. Let $v_0, v_1 \in V \setminus \{0_V\}$. Let $v_\mu = (1-\mu)v_0 + \mu v_1$ for $\mu \in [0, 1]$. Then by Theorem 24.8.6, $\forall \mu \in [0, 1],\ \psi(v_\mu) \ge |(1-\mu)\psi(v_0) - \mu\psi(v_1)|$. This is illustrated in Figure 24.8.2.


In the case of a norm, it is known that $\psi(v_0)$ and $\psi(v_1)$ are positive, and it is also known that $\psi(v_{\mu'})$ is positive for $\mu' = \psi(v_0)/(\psi(v_0) + \psi(v_1))$, which is the value of $\mu$ where the lower bound above equals zero. Therefore Theorem 24.8.6 may be applied to the vector pairs $(v_0, v_{\mu'})$ and $(v_{\mu'}, v_1)$ to obtain two additional lower bounds to “fill the gap”. This yields an explicit positive lower bound which depends only on three “sample points” for the norm, where two of the points come from a basis, and the third point is computed from the value of the norm at the first two “sample points”.


24.8.8 REMARK: Obtaining additional lower bounds for norms using an intermediate point. 
The idea in Example 24.8.7 is applied in the proof of Theorem 24.8.9 (iii). (An immediate application of 
Theorem 24.8.9 is the proof of Theorem 24.8.12.) 


24.8.9 THEOREM: Additional lower bound for the norm of a linear combination of vectors. 
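For any finite-dimensional real linear space $V$, and any norm $\psi$ on $V$, and any basis $B = (e_i)_{i=1}^n$ for $V$, let $P(V, \psi, B)$ denote the proposition:

$$\text{“}\exists k \in \mathbb{R}^+,\ \forall \lambda \in \mathbb{R}^n,\quad \psi\Big(\sum_{i=1}^n \lambda_i e_i\Big) \ge k|\lambda|_1 \max_{i=1}^n \psi(e_i)\text{”}, \tag{24.8.6}$$

where $n = \dim(V)$. Then

(i) If $\dim(V) = 0$, then $P(V, \psi, B)$ is true for every norm $\psi$ on $V$ and basis $B$ for $V$.
(ii) If $\dim(V) = 1$, then $P(V, \psi, B)$ is true for every norm $\psi$ on $V$ and basis $B$ for $V$.
(iii) If $\dim(V) = 2$, then $P(V, \psi, B)$ is true for every norm $\psi$ on $V$ and basis $B$ for $V$.


PROOF: For part (i), note that $B = \emptyset$ and $\sum_{i=1}^n \lambda_i e_i = 0_V$ and $|\lambda|_1 = |0_{\mathbb{R}^0}|_1 = 0_{\mathbb{R}}$ for all $\lambda \in \mathbb{R}^0 = \{0_{\mathbb{R}^0}\}$, and $\max_{i=1}^n \psi(e_i) = 0_{\mathbb{R}}$ as discussed in Remark 24.7.12. Therefore line (24.8.6) is valid for all $k \in \mathbb{R}^+$ because $\psi(0_V) = 0_{\mathbb{R}}$ by Theorem 24.7.3 (iii).

Part (ii) follows from Definition 24.7.2 (i) by choosing $k = 1$.

For part (iii), $\psi(\lambda_1 e_1 + \lambda_2 e_2) \ge |\lambda_1 \psi(e_1) - \lambda_2 \psi(e_2)|$ by Theorem 24.8.6 for all $\lambda_1, \lambda_2 \ge 0$. Consequently $\psi(\lambda_1 e_1 + \lambda_2 e_2) \ge \lambda_1 \psi(e_1) - \lambda_2 \psi(e_2)$ for all $\lambda_1 \ge 0$ and $0 \le \lambda_2 \le (\psi(e_1)/\psi(e_2))\lambda_1$. Let $q_0 = \psi(w)$, where $w = \psi(e_2)e_1 + \psi(e_1)e_2$. Then $q_0 > 0$ and $\lambda_1 e_1 + \lambda_2 e_2 = \mu_1 w + \mu_2 e_1$ with $\mu_1 = \lambda_2/\psi(e_1)$ and $\mu_2 = \lambda_1 - \lambda_2 \psi(e_2)/\psi(e_1)$ for all $\lambda \in \mathbb{R}^2$. So

\begin{align*}
\psi(\lambda_1 e_1 + \lambda_2 e_2) &= \psi(\mu_1 w + \mu_2 e_1) \\
&\ge |\mu_1 \psi(w) - \mu_2 \psi(e_1)| \\
&= |\lambda_2 q_0/\psi(e_1) - \lambda_1 \psi(e_1) + \lambda_2 \psi(e_2)|
\end{align*}

whenever $\mu_1, \mu_2 \ge 0$ by Theorem 24.8.6. Combining these two lower bounds gives

$$\psi(\lambda_1 e_1 + \lambda_2 e_2) \ge \max\big(\lambda_1 \psi(e_1) - \lambda_2 \psi(e_2),\ -\lambda_1 \psi(e_1) + \lambda_2(\psi(e_2) + q_0/\psi(e_1))\big)$$

for $\lambda_1 \ge 0$ and $0 \le \lambda_2 \le (\psi(e_1)/\psi(e_2))\lambda_1$. Restricting this to $\lambda$ satisfying $|\lambda|_1 = 1$ gives

$$\psi((1-\lambda_2)e_1 + \lambda_2 e_2) \ge \max\big(\psi(e_1) - \lambda_2(\psi(e_1) + \psi(e_2)),\ -\psi(e_1) + \lambda_2(\psi(e_1) + \psi(e_2) + q_0/\psi(e_1))\big)$$

for $0 \le \lambda_2 \le (\psi(e_1)/\psi(e_2))(1 - \lambda_2)$, which means that $0 \le \lambda_2 \le \psi(e_1)/(\psi(e_1) + \psi(e_2))$. The two bounds in this maximum-expression overlap when $\lambda_2(\psi(e_1) + \psi(e_2) + \tfrac12 q_0/\psi(e_1)) = \psi(e_1)$, which gives a minimum value equal to $q_0\psi(e_1)/(q_0 + 2\psi(e_1)(\psi(e_1) + \psi(e_2)))$. Dividing this by $\max(\psi(e_1), \psi(e_2))$ gives

\begin{align*}
k_1 &= \frac{q_0 \min(1,\ \psi(e_1)/\psi(e_2))}{q_0 + 2\psi(e_1)(\psi(e_1) + \psi(e_2))} \\
&= \frac{\min(1,\ \psi(e_1)/\psi(e_2))}{1 + 2\psi(e_1)(\psi(e_1) + \psi(e_2))/\psi(\psi(e_2)e_1 + \psi(e_1)e_2)} \\
&= \frac{\min(1,\ \psi(e_1)/\psi(e_2))}{1 + 2(1 + \psi(e_1)/\psi(e_2))/\psi(\psi(e_1)^{-1}e_1 + \psi(e_2)^{-1}e_2)}
\end{align*}

as the constant $k$ in line (24.8.6) for $\lambda \in \mathbb{R}^2$ satisfying $0 \le \lambda_1$ and $0 \le \lambda_2 \le (\psi(e_1)/\psi(e_2))\lambda_1$. A similar constant $k_2$ may be computed for $0 \le \lambda_1$ and $\lambda_2 \ge (\psi(e_1)/\psi(e_2))\lambda_1$ in terms of the same $q_0$ value. Then by letting $q_1 = \psi(\psi(e_2)e_1 - \psi(e_1)e_2)$, a further two similar constants $k_3$ and $k_4$ may be computed for $\lambda_1 \le 0$ for two similar ranges of $\lambda_2$. (The computation of $k_2$, $k_3$ and $k_4$ is left as an exercise for the curious reader!) Then line (24.8.6) is satisfied with $k = \min(k_1, k_2, k_3, k_4)$.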
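The constant k_1 obtained in the proof of part (iii) can be evaluated and tested on its sector of coefficients. The following is a minimal sketch, assuming a sample norm on R^2 of the same form as in Example 24.8.4 and a sample basis, both chosen for illustration. Only the sector 0 <= lambda_2 <= (psi(e_1)/psi(e_2)) lambda_1 is sampled, and the function names are hypothetical.

```python
def k1_constant(psi, e1, e2):
    """The constant k_1 from the proof of Theorem 24.8.9 (iii), for vectors in R^2."""
    a, b = psi(e1), psi(e2)
    # u = psi(e1)^(-1) e1 + psi(e2)^(-1) e2
    u = (e1[0] / a + e2[0] / b, e1[1] / a + e2[1] / b)
    return min(1.0, a / b) / (1.0 + 2.0 * (1.0 + a / b) / psi(u))

def sector_bound_holds(psi, e1, e2, steps=50, tol=1e-12):
    """Test line (24.8.6) with k = k_1 on samples with lam1 >= 0 and
    0 <= lam2 <= (psi(e1)/psi(e2)) * lam1."""
    a, b = psi(e1), psi(e2)
    k1 = k1_constant(psi, e1, e2)
    for i in range(steps + 1):
        lam1 = 3.0 * i / steps
        for j in range(steps + 1):
            lam2 = (a / b) * lam1 * j / steps
            v = (lam1 * e1[0] + lam2 * e2[0], lam1 * e1[1] + lam2 * e2[1])
            if psi(v) < k1 * (lam1 + lam2) * max(a, b) - tol:
                return False
    return True

def psi(x):
    """Sample norm (assumed): max(0.1|x_1 + x_2|, |x_1 - x_2|)."""
    return max(0.1 * abs(x[0] + x[1]), abs(x[0] - x[1]))

e1, e2 = (1.0, 0.0), (0.0, 2.0)
print(k1_constant(psi, e1, e2), sector_bound_holds(psi, e1, e2))
```

For this sample, psi(e_1) = 1, psi(e_2) = 2 and psi(e_1 + e_2/2) = 0.2, so the formula gives k_1 = 0.5/16 up to rounding. The constants k_2, k_3 and k_4 for the other sectors are not computed here.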


24.8.10 REMARK: Difficulties in computing explicit lower bounds for norms in higher-dimensional spaces. 
In Theorem 24.8.9, the lower bound k may be freely chosen as any positive real number when n = 0, is 
equal to at most 1 when n = 1, and has a complicated dependence on w and B when n = 2. When n = 3, 
it is not entirely clear how to compute an explicit expression for k. It is relatively straightforward to prove 
the existence of $k \in \mathbb{R}^+$ for general $n \in \mathbb{Z}_0^+$ using topological concepts such as continuity, compactness and
completeness. Therefore this task is delayed to Theorem 39.5.2. 


The principal motivation for Theorem 24.8.6 is to show that all norms on finite-dimensional linear spaces 
are equivalent in the sense that their ratio is bounded above and below by some positive constant. This 
then implies that they have the same topological structure. Based on the limited bounds in Theorem 24.8.6, 
obtained using algebraic arguments, it is possible to state the limited Theorem 24.8.12 regarding equivalence 
of norms. The more general theorem for dimension greater than 2 requires some topological analysis. 


24.8.11 DEFINITION: Equivalent norms on a linear space $V$ over a field $K$, for an absolute value function $|\cdot|_K : K \to \mathbb{R}$, are norms $\psi_1$ and $\psi_2$ on $V$ which satisfy

$$\exists c, C \in \mathbb{R}^+,\ \forall v \in V,\quad c\,\psi_1(v) \le \psi_2(v) \le C\,\psi_1(v).$$


24.8.12 THEOREM: All norms are equivalent on a two-dimensional real linear space. 
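Let $\psi_1, \psi_2$ be norms on a real linear space $V$ with $\dim(V) \le 2$. Then $\psi_1$ and $\psi_2$ are equivalent norms on $V$.


PROOF: Let $\psi_1, \psi_2$ be norms on a real linear space $V$ with $n = \dim(V) \le 2$. If $n = 0$, then $V = \{0_V\}$ and the assertion is trivial. So assume that $n \ge 1$. Let $B = (e_i)_{i=1}^n$ be a basis for $V$. Let $v \in V$. Then $v = \sum_{i=1}^n \lambda_i e_i$ for some unique $\lambda = (\lambda_i)_{i=1}^n \in \mathbb{R}^n$. So $\psi_1(v) \le |\lambda|_1 \max_{i=1}^n \psi_1(e_i)$ by Theorem 24.8.2. But by Theorem 24.8.9, $\psi_2(v) \ge k|\lambda|_1 \max_{i=1}^n \psi_2(e_i)$ for some constant $k \in \mathbb{R}^+$ which is independent of $v$. Let $c = k \max_{i=1}^n \psi_2(e_i) / \max_{i=1}^n \psi_1(e_i)$. Then $c \in \mathbb{R}^+$ and

\begin{align*}
\forall v \in V,\quad c\,\psi_1(v) &\le c|\lambda|_1 \max_{i=1}^n \psi_1(e_i) \\
&= k|\lambda|_1 \max_{i=1}^n \psi_2(e_i) \\
&\le \psi_2(v).
\end{align*}

Swapping $\psi_1$ and $\psi_2$ in this argument gives a constant $c' \in \mathbb{R}^+$ with $\forall v \in V,\ c'\psi_2(v) \le \psi_1(v)$. Let $C = c'^{-1}$. Then $\forall v \in V,\ \psi_2(v) \le C\,\psi_1(v)$.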
24.9. Inner products


24.9.1 REMARK: An inner product is a bilinear function from a linear space to its field. 
An inner product defines both distances and angles. Hence orthogonality and spectral decomposition of 
linear space endomorphisms can be defined in terms of an inner product. 


An inner product on a linear space is a bilinear map from pairs of vectors in the linear space to its field.
(See Definition 27.2.16 for bilinear maps.) The square root of the inner product of a vector with itself is a
norm on the linear space. In this way, an inner product induces a metric, and therefore a topology, on the
linear space. (See Definition 39.3.2.) For two different non-zero vectors, the inner product is equal to the
product of the induced norms of the two vectors multiplied by the cosine of the angle between them, which
effectively defines the angle.


24.9.2 REMARK: Definition of inner product for real linear spaces. 

Definition 24.9.4 is essentially identical to Definition 19.7.2. The combination of conditions (i), (ii) and (iii) 
means that the inner product must be bilinear. The combination of conditions (iii) and (v) implies that 
$\eta(v, v) = 0$ if and only if $v = 0$, as shown in Theorem 19.7.4 (ii).


Bilinear functions are a special case of the multilinear maps in Definition 27.2.3. However, they are introduced 
early here as Definition 24.9.3 because of their importance for general linear algebra (as distinct from tensor 
algebra). In differential geometry, and in analytic geometry in general, bilinear maps have a non-tensorial 
meaning. Tensors are objects which are defined on manifolds, but inner products are part of the infrastructure 
of Riemannian manifolds and linear spaces. The role of an inner product is to specify distances and angles 
between vectors. This is part of the geometrical structure of a space. Tensors and tensor fields are merely 
inhabitants in the space. 


Bilinear functions are often called “bilinear forms” because when written in terms of vector components, 
they are sums of products of terms which resemble quadratic forms, polynomials or multinomials. 


24.9.3 DEFINITION: A bilinear function or bilinear form on a linear space V over a field K is a function
$\eta : V \times V \to K$ which satisfies:


(i) $\forall u, v, w \in V$, $\eta(u + v, w) = \eta(u, w) + \eta(v, w)$, [left additivity]
(ii) $\forall u, v, w \in V$, $\eta(u, v + w) = \eta(u, v) + \eta(u, w)$, [right additivity]
(iii) $\forall u, v \in V$, $\forall \alpha \in K$, $\eta(\alpha u, v) = \eta(u, \alpha v) = \alpha\eta(u, v)$. [scalarity]


In other words, the maps $\eta(v, \cdot) : V \to K$ and $\eta(\cdot, v) : V \to K$ are both linear maps for all $v \in V$.


24.9.4 DEFINITION: An inner product on a real linear space V is a positive definite symmetric bilinear
function on V. In other words, it is a function $\eta : V \times V \to \mathbb{R}$ which satisfies the following conditions.


(i) $\forall u, v, w \in V$, $\eta(u + v, w) = \eta(u, w) + \eta(v, w)$, [left additivity]
(ii) $\forall u, v, w \in V$, $\eta(u, v + w) = \eta(u, v) + \eta(u, w)$, [right additivity]
(iii) $\forall u, v \in V$, $\forall \alpha \in \mathbb{R}$, $\eta(\alpha u, v) = \eta(u, \alpha v) = \alpha\eta(u, v)$, [scalarity]
(iv) $\forall u, v \in V$, $\eta(u, v) = \eta(v, u)$, [symmetry]
(v) $\forall u \in V \setminus \{0_V\}$, $\eta(u, u) > 0$. [positive definiteness]


24.9.5 DEFINITION: An inner product (real linear) space is a real linear space with an inner product. In
other words, it is a pair $(V, \eta)$, where V is a real linear space and $\eta$ is an inner product on V.


24.9.6 DEFINITION: The (standard) (Euclidean) inner product on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the map $\eta : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$
defined by $\eta : (x, y) \mapsto \sum_{i=1}^n x_i y_i$.


24.9.7 DEFINITION: The Euclidean inner product space or Cartesian inner product space with dimension
$n \in \mathbb{Z}_0^+$ is the inner product space $(\mathbb{R}^n, \eta)$, where $\eta$ is the standard Euclidean inner product on $\mathbb{R}^n$.


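24.9.8 NOTATION: Various notations for an inner product.
$(x, y)$ denotes the inner product of $x, y \in \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$.
$x \cdot y$ denotes the inner product of $x, y \in \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$.
$\langle x, y \rangle$ denotes the inner product of $x, y \in \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$.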

24.9.9 REMARK: The diversity of notations for inner products.

There are many notations for inner products, of which Notation 24.9.8 gives a selection. A norm may be 
defined on any linear space. In physics contexts, the angle bracket notation $\langle x, y \rangle$ is popular. (See for example
Garrity [269], pages 122-124; Hall [274], pages 55, 537.) Also popular is the Dirac-style angle brackets with a 
vertical stroke instead of the comma, particularly for complex Hilbert spaces. (See for example Lawden [284], 
pages 13, 102; Szekeres [305], page 330; Tung [308], pages 302-303.) 


When multiple norms are defined on a single space in a single context, it may be necessary to use subscripts 
to distinguish the norms. The quantum mechanics bra/ket notation is especially ambiguous since the angle 
brackets may denote either the inner product or the action of a linear functional, depending on whether the 
left vector is in the primal or dual space. 


24.9.10 REMARK: Inversion of inner products to “raise indices” in Riemannian geometry. 

Since the metric tensor field on a Riemannian manifold in Section 73.2 is an inner product at each point of 
the manifold, the properties of inner products have great relevance to differential geometry. In particular, the 
ability to “invert” an inner product, in some sense, is often required for “raising indices” in tensor calculus.
(This is also mentioned in Remark 73.3.7. For the index-raising inverse isomorphism, see Definition 73.5.4.) 
One important application of this idea is the gradient concept in Definition 74.6.2. 


In general, the practical inversion of linear maps is not trivial. One often demonstrates the existence of 
an inverse without stating an explicit procedure to compute it. Matrix methods are typically required for 
explicit inversion procedures. Therefore only existence is shown here, not explicit formulas for inverses. 


Although matrix inversion is a procedure, which could require non-trivial numerical methods in practice, it 
should also be considered that even the division of real numbers is a “procedure” if, for example, a decimal or 
binary expansion of the result is required. Even multiplication and addition require some kind of “numerical
algorithm” in binary or decimal. When a ratio such as 2/7 is written, it is only a symbolic formula for
the solution of an equation such as $7x = 2$ which must be solved for $x$ by some procedure. Similarly $A^{-1}$
denotes the solution of a matrix equation $AX = XA = I$, which requires a procedure of some kind. Even
the addition of integers requires numerical procedures. 


24.9.11 THEOREM: Using inner product to construct isomorphism between linear space and its dual.
Let V be a finite-dimensional real linear space. Let $\eta : V \times V \to \mathbb{R}$ be an inner product on V. Define
$\phi : V \to V^*$ by $\forall u, v \in V$, $\phi(u)(v) = \eta(u, v)$.

(i) $\phi \in \mathrm{Lin}(V, V^*)$.

(ii) $\phi$ is a linear space isomorphism.


PROOF: For part (i), let $u \in V$. Then $\phi(u) \in V^*$ by Definition 24.9.4 (ii, iii). Let $u_1, u_2 \in V$. Let $v \in V$.
Then $\phi(u_1 + u_2)(v) = \eta(u_1 + u_2, v) = \eta(u_1, v) + \eta(u_2, v) = \phi(u_1)(v) + \phi(u_2)(v)$ by Definition 24.9.4 (i). So
$\phi(u_1 + u_2) = \phi(u_1) + \phi(u_2)$ by Definition 23.6.4. Similarly, for all $\lambda \in \mathbb{R}$ and $u \in V$, $\phi(\lambda u)(v) = \eta(\lambda u, v) =
\lambda\eta(u, v) = \lambda\phi(u)(v)$ by Definition 24.9.4 (iii). So $\phi(\lambda u) = \lambda\phi(u)$ by Definition 23.6.4. Hence $\phi$ is linear by
Definition 23.1.1.

For part (ii), $\dim(V^*) = \dim(V)$ by Theorem 23.7.11. Suppose that $\phi(u) = 0_{V^*}$ for some $u \in V \setminus \{0_V\}$.
Then $\phi(u)(v) = 0_{\mathbb{R}}$ for all $v \in V$. So $\eta(u, u) = \phi(u)(u) = 0_{\mathbb{R}}$. But this contradicts Definition 24.9.4 (v). Therefore
$\ker(\phi) = \{0_V\}$. Hence $\phi$ is a linear space isomorphism by Theorem 40.5.7 (i).
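
The map $\phi$ of Theorem 24.9.11 and its inverse can be sketched in components. The following example is an illustration only: it assumes $V = \mathbb{R}^2$ with the standard basis, a hypothetical positive definite Gram matrix G with entries $\eta(e_i, e_j)$, and helper names (eta, lower, raise_2d) which do not appear in the text. The inverse is computed by Cramer's rule, which is practical only in very low dimensions.

```python
def eta(u, v, G):
    """Inner product eta(u, v) = sum_ij G[i][j] u_i v_j for a symmetric positive definite G."""
    n = len(u)
    return sum(G[i][j] * u[i] * v[j] for i in range(n) for j in range(n))


def lower(u, G):
    """Components of the covector phi(u) in the dual basis: f_j = sum_i G[i][j] u_i."""
    n = len(u)
    return [sum(G[i][j] * u[i] for i in range(n)) for j in range(n)]


def raise_2d(f, G):
    """Invert phi in dimension 2 by solving G u = f with Cramer's rule."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(G[1][1] * f[0] - G[0][1] * f[1]) / det,
            (G[0][0] * f[1] - G[1][0] * f[0]) / det]


G = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical Gram matrix (symmetric, positive definite)
u = [1.0, -2.0]
v = [3.0, 4.0]

f = lower(u, G)                                  # phi(u) as a component tuple
pairing = sum(fj * vj for fj, vj in zip(f, v))   # phi(u)(v)
u_back = raise_2d(f, G)                          # phi^{-1}(phi(u))
```

In this sketch, pairing agrees with eta(u, v, G), and u_back recovers u, which is the component form of the "index-raising" inverse mentioned in Remark 24.9.10.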


24.9.12 REMARK: Inner products on real linear spaces. 

Inner products may be defined on general modules over rings as in Definition 19.7.2. In such generality they 
are not easily related to norms. As mentioned in Remark 19.8.1, one requires the ring to be upgraded to an 
ordered field where all non-negative elements have square roots. It is possible to prove a generalised
Cauchy-Schwarz inequality for a general unitary module over a commutative ordered ring as in Theorem 19.8.2.
In the case of a real linear space, one obtains the usual square-root norm as in Definition 24.9.13, and a
Cauchy-Schwarz inequality as in Theorem 24.9.14.


24.9.13 DEFINITION: The norm corresponding to an inner product $\eta : V \times V \to \mathbb{R}$ on a real linear space V
is the function $L : V \to \mathbb{R}_0^+$ defined by $L(v) = \eta(v, v)^{1/2}$ for all $v \in V$.

24.9.14 THEOREM: Cauchy-Schwarz inequality for inner products on real linear spaces.
Let $\eta : V \times V \to \mathbb{R}$ be an inner product on a real linear space V. Then


\[ \forall v_1, v_2 \in V,\quad |\eta(v_1, v_2)| \le \|v_1\|\,\|v_2\|, \]

where $\forall v \in V$, $\|v\| = \eta(v, v)^{1/2}$.


PROOF: Let $\eta : V \times V \to \mathbb{R}$ be an inner product on a real linear space V. Let $v_1, v_2 \in V$. Define $\alpha, \beta \in \mathbb{R}_0^+$
by $\alpha = \|v_1\|$ and $\beta = \|v_2\|$. Then Theorem 19.8.2 implies $|\eta(v_1, v_2)| \le \alpha\beta = \|v_1\|\,\|v_2\|$.
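
The inequality in Theorem 24.9.14 can be spot-checked numerically. This is a sketch under assumptions, not a proof: the inner product on $\mathbb{R}^2$ is given by a hypothetical positive definite Gram matrix G, and the names eta, norm and cauchy_schwarz_holds are illustrative.

```python
import math
import random


def eta(u, v, G):
    """Inner product eta(u, v) = sum_ij G[i][j] u_i v_j."""
    n = len(u)
    return sum(G[i][j] * u[i] * v[j] for i in range(n) for j in range(n))


def norm(v, G):
    """Norm induced by the inner product, as in Definition 24.9.13."""
    return math.sqrt(eta(v, v, G))


def cauchy_schwarz_holds(G, samples=1000, seed=1):
    """Check |eta(v1, v2)| <= ||v1|| ||v2|| on random pairs of vectors."""
    rng = random.Random(seed)
    n = len(G)
    for _ in range(samples):
        v1 = [rng.uniform(-5, 5) for _ in range(n)]
        v2 = [rng.uniform(-5, 5) for _ in range(n)]
        # Small tolerance for floating-point rounding.
        if abs(eta(v1, v2, G)) > norm(v1, G) * norm(v2, G) + 1e-9:
            return False
    return True


G = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical positive definite Gram matrix
holds = cauchy_schwarz_holds(G)

# Equality occurs for parallel vectors, for example v2 = 2 * v1.
v1 = [1.0, 2.0]
v2 = [2.0, 4.0]
gap = norm(v1, G) * norm(v2, G) - abs(eta(v1, v2, G))
```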


24.10. Hyperbolic inner products 


24.10.1 REMARK: Hyperbolic generalisations of inner products. 

Whereas the inner product in Definition 24.9.4 is used to define metric fields for Riemannian manifolds, 
the generalised style of inner product in Definition 24.10.6 is used to define pseudo-metric fields for pseudo- 
Riemannian manifolds. An inner product which is not necessarily positive definite is called a “scalar product” 
by O’Neill [295], page 12; Sternberg [38], page 117. It is called an “inner product” by Szekeres [305], page 127. 


24.10.2 DEFINITION: A nondegenerate or non-singular symmetric bilinear function on a linear space V
over a field K is a symmetric bilinear function $\eta : V \times V \to K$ satisfying $\forall v \in V \setminus \{0\}$, $\exists w \in V$, $\eta(v, w) \ne 0$.


24.10.3 DEFINITION: A (hyperbolic) inner product on a linear space V is a nondegenerate symmetric 
bilinear function on V. 


24.10.4 DEFINITION: An orthogonal subset of a linear space V with respect to a hyperbolic inner product
$\eta$ on V is a subset S of $V \setminus \{0\}$ which satisfies $\forall v, w \in S$, $(v \ne w \Rightarrow \eta(v, w) = 0)$.


24.10.5 DEFINITION: The index of a hyperbolic inner product $\eta$ on a linear space V is the extended integer
$\sup\{\#(S);\ S \subseteq V$ and $\forall v \in S$, $\eta(v, v) < 0$ and $\forall v, w \in S$, $(v \ne w \Rightarrow \eta(v, w) = 0)\}$. In other words, the index of $\eta$ is the
largest number of mutually orthogonal vectors $v \in V$ with $\eta(v, v) < 0$.


24.10.6 DEFINITION: A Minkowskian inner product on a real linear space V is a hyperbolic inner product
on V with index 1. In other words, a Minkowskian inner product $\eta$ satisfies


(i) $\exists v \in V$, $\eta(v, v) < 0$, and [index $\ge 1$]
(ii) $\forall v, w \in V$, $((\eta(v, v) < 0$ and $\eta(w, w) < 0) \Rightarrow \eta(v, w) \ne 0)$. [index $< 2$]
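
Conditions (i) and (ii) can be illustrated with the familiar diagonal form on $\mathbb{R}^4$. The following sketch assumes the bilinear function with diagonal $(-1, 1, 1, 1)$ in the standard basis; the function names eta and random_negative are hypothetical. Random sampling can only fail to find a counterexample to condition (ii). It does not establish it.

```python
import random


def eta(v, w):
    """Assumed bilinear function on R^4 with diagonal (-1, 1, 1, 1)."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))


def random_negative(rng):
    """Random vector v with eta(v, v) < 0, found by rejection sampling."""
    while True:
        v = [rng.uniform(-1, 1) for _ in range(4)]
        if eta(v, v) < 0:
            return v


rng = random.Random(2)
pairs = [(random_negative(rng), random_negative(rng)) for _ in range(500)]

# Condition (i): some vector has negative "square".
e1 = [1.0, 0.0, 0.0, 0.0]
condition_i = eta(e1, e1) < 0

# Condition (ii): two vectors with negative "square" are never orthogonal.
condition_ii = all(eta(v, w) != 0 for v, w in pairs)

# By contrast, vectors with positive "square" may be orthogonal.
e2 = [0.0, 1.0, 0.0, 0.0]
e3 = [0.0, 0.0, 1.0, 0.0]
positive_pair_orthogonal = eta(e2, e3) == 0
```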


24.10.7 REMARK:  Equivalence of basis-independent and basis-specific Minkowskian definitions. 

Minkowskian inner products are typically defined in the literature in terms of a diagonalised matrix which 
has a sequence such as (—1,1,...,1) as diagonal elements. This is quite unsatisfying because the inner 
product in Definition 24.9.4 has no such requirement to find a basis for which the matrix has a specific 
diagonal form. For a pseudo-Riemannian manifold, one must find such a basis at each point of the manifold. 
This seems quite unnatural, particularly since the time-space of our universe supposedly has such structure.


The modern viewpoint of differential geometry is antagonistic towards coordinates and special choices of 
bases. So it is desirable to be able to formulate Minkowskian inner products without bases or matrices. This
is the purpose of Theorem 24.10.8, which asserts the equivalence of the typical diagonalised matrix viewpoint
to the basis-independent characterisation in Definition 24.10.6. The proof of this equivalence is not
entirely trivial. (This perhaps explains why it is rarely seen.) It is fairly straightforward to generalise
Definition 24.10.6 to different “signatures” (which are normalised diagonal element sequences for the diagonalised
matrix).


24.10.8 THEOREM: Equivalence of the basis-free and basis-bound Minkowskian inner product definitions.
Let V be a finite-dimensional real linear space with hyperbolic inner product $\eta$. Let $n = \dim(V)$. Then $\eta$ is
Minkowskian if and only if V has a basis $(e_i)_{i=1}^n$ which satisfies


(i) $\eta(e_1, e_1) = -1$,
(ii) $\forall i \in \mathbb{N}_n \setminus \{1\}$, $\eta(e_i, e_i) = 1$,
(iii) $\forall i \in \mathbb{N}_n$, $\forall j \in \mathbb{N}_n \setminus \{i\}$, $\eta(e_i, e_j) = 0$.

PROOF: Assume that $\eta$ is Minkowskian. Then $n \ge 1$ because $\eta(v, v) < 0$ for some $v \in V$. If $n = 1$, then
choose $e_1 = (-\eta(v, v))^{-1/2} v$. This gives $\eta(e_1, e_1) = (-\eta(v, v))^{-1}\eta(v, v) = -1$. So $(e_i)_{i=1}^n$ satisfies (i), (ii)
and (iii). So it may be assumed that $n \ge 2$.


Since V is finite-dimensional, it has a basis $B = (e_i)_{i=1}^n$. This basis B can be modified in stages to make it
satisfy (i), (ii) and (iii). First suppose that $\eta(e_i, e_i) \ge 0$ for all $i \in \mathbb{N}_n$. By Definition 24.10.6 (i), there is at
least one $v \in V$ with $\eta(v, v) < 0$. Since B is a basis, $v = \sum_{i=1}^n v_i e_i$ for some unique $(v_i)_{i=1}^n \in \mathbb{R}^n$. Not all of
the components $v_i$ equal zero because $\eta(v, v) \ne 0$. Let $i_0 = \min\{i \in \mathbb{N}_n;\ v_i \ne 0\}$. Modify B to $B' = (e'_i)_{i=1}^n$
where $e'_{i_0} = v$ and $e'_i = e_i$ for $i \ne i_0$. To see that $B'$ is a basis for V, suppose that $\sum_{i=1}^n \lambda_i e'_i = 0$. Then
$\lambda_{i_0} v + \sum_{i \ne i_0} \lambda_i e_i = 0$. If $\lambda_{i_0} \ne 0$, then $v = \sum_{i \ne i_0} (-\lambda_i/\lambda_{i_0}) e_i$, which would imply that $v_{i_0} = 0$ by the
uniqueness of the component tuple $(v_i)_{i=1}^n$, which contradicts the choice of $i_0$. Therefore $\lambda_{i_0} = 0$, which
implies $\sum_{i \ne i_0} \lambda_i e_i = 0$, which implies that $\lambda_i = 0$ for all $i \in \mathbb{N}_n$ because B is a basis. Thus it may be
assumed that $\eta(e_i, e_i) < 0$ for at least one $i \in \mathbb{N}_n$.


Since $\eta(e_i, e_i) < 0$ for some $i \in \mathbb{N}_n$, the vectors in B may be permuted so that $\eta(e_1, e_1) < 0$, and this can be
scaled so that $\eta(e_1, e_1) = -1$. Thus it may be assumed that $\eta(e_1, e_1) = -1$.


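For the third stage, it is assumed that B is a basis with $\eta(e_1, e_1) = -1$. Define $B' = (e'_i)_{i=1}^n$ by $e'_1 = e_1$ and
$e'_i = e_i + \eta(e_1, e_i) e_1$ for $i \in \mathbb{N}_n \setminus \{1\}$. Then $B'$ is a basis for V and $\eta(e'_1, e'_i) = \eta(e_1, e_i) + \eta(e_1, e_i)\eta(e_1, e_1) = 0$
for all $i \ne 1$. Therefore $\eta(e'_i, e'_i) \ge 0$ for all $i \ne 1$ by Definition 24.10.6 (ii). Note that at the end of the third
stage, the matrix $[\eta(e_i, e_j)]_{i,j=1}^n$, as defined in Section 25.2, has the following general appearance:


\[ \begin{pmatrix}
-1 & 0 & 0 & \cdots & 0 \\
0 & \ge 0 & ? & \cdots & ? \\
0 & ? & \ge 0 & \cdots & ? \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & ? & ? & \cdots & \ge 0
\end{pmatrix} \]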


For the fourth stage, suppose that $\eta(e_i, e_i) = 0$ for some $i \ne 1$. Then by permuting indices, it may be
assumed that $i = 2$ and $\eta(e_2, e_2) = 0$. If $n = 2$, let $v \in V$. Then $v = v_1 e_1 + v_2 e_2$ for some $v_1, v_2 \in \mathbb{R}$. So
$\eta(e_2, v) = v_1\eta(e_2, e_1) + v_2\eta(e_2, e_2) = 0$. Thus $\eta(e_2, v) = 0$ for all $v \in V$, which contradicts Definition 24.10.2
for nondegeneracy because $e_2 \ne 0$ since it is a basis vector. Therefore $\eta(e_2, e_2) > 0$, and $e_2$ may be scaled so
that $\eta(e_2, e_2) = 1$. Thus conditions (i), (ii) and (iii) are verified. So it may be assumed that $n \ge 3$.


So now assume that $n \ge 3$ with $\eta(e_1, e_1) = -1$, and suppose that $\eta(e_2, e_2) = 0$. Suppose that $\eta(e_2, e_i) = 0$
for all $i \in \mathbb{N}_n \setminus \{1, 2\}$. Then $\eta(e_2, v) = 0$ for all $v \in V$, which contradicts the nondegeneracy assumption. So
$\eta(e_2, e_i) \ne 0$ for some $i \in \mathbb{N}_n \setminus \{1, 2\}$. By permuting indices, it may be assumed that $\eta(e_2, e_3) \ne 0$. Then
suppose that $\eta(e_3, e_3) = 0$. Let $e'_2 = e_2 + e_3$ and $e'_3 = e_2 - e_3$, and $e'_i = e_i$ for $i \in \mathbb{N}_n \setminus \{2, 3\}$. Then $B' = (e'_i)_{i=1}^n$
is a basis for V, and $\eta(e'_2, e'_2) = 2\eta(e_2, e_3)$, and $\eta(e'_3, e'_3) = -2\eta(e_2, e_3)$, and $\eta(e'_2, e'_3) = \eta(e'_3, e'_2) = 0$. If
$\eta(e_2, e_3) < 0$, then $\eta(e'_2, e'_2) < 0$ and $\eta(e_1, e'_2) = 0$, which contradicts Definition 24.10.6 (ii). If $\eta(e_2, e_3) > 0$,
then $\eta(e'_3, e'_3) < 0$ and $\eta(e_1, e'_3) = 0$, which contradicts Definition 24.10.6 (ii). Therefore the hypothesis
$\eta(e_3, e_3) = 0$ is excluded. So $\eta(e_3, e_3) > 0$. Now let $e'_2 = e_2 + \mu e_3$ with $\mu = -\eta(e_2, e_3)/\eta(e_3, e_3)$. Then
$\eta(e'_2, e_3) = 0$ and $\eta(e'_2, e'_2) = -\eta(e_2, e_3)^2/\eta(e_3, e_3) < 0$, which contradicts Definition 24.10.6 (ii). Therefore
$\eta(e_2, e_2) > 0$. Consequently $\eta(e_i, e_i) > 0$ for all $i \ne 1$. Then $e_i$ may be scaled so that $\eta(e_i, e_i) = 1$ for
all $i \ne 1$.


For the final stage, $\eta(e_i, e_j)$ may be set to zero for $i \ne j$ by induction on $i \ge 1$. Suppose that $\eta(e_i, e_j) = 0$
for all $i, j \in \mathbb{N}_k$ with $i \ne j$ for some $k \ge 1$ with $k < n$. Define $e'_\ell = e_\ell$ for $\ell \in \mathbb{N}_k$ and $e'_\ell = e_\ell - \eta(e_k, e_\ell) e_k$
for $\ell \in \mathbb{N}_n \setminus \mathbb{N}_k$. Then $\eta(e'_k, e'_\ell) = 0$ for all $\ell \in \mathbb{N}_n$ with $\ell \ne k$. Thus by induction on $k \in \mathbb{N}_n$, there is a basis
B for V which satisfies conditions (i), (ii) and (iii).


For the converse, suppose that V has a basis $(e_i)_{i=1}^n$ which satisfies conditions (i), (ii) and (iii). Then
Definition 24.10.6 (i) is satisfied with $v = e_1$. Let $v, w \in V$ with $\eta(v, v) < 0$ and $\eta(w, w) < 0$. Let
$v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{i=1}^n w_i e_i$.
Then $\eta(v, v) = -v_1^2 + \sum_{i=2}^n v_i^2 < 0$ and $\eta(w, w) = -w_1^2 + \sum_{i=2}^n w_i^2 < 0$. Therefore $v_1^2 > \sum_{i=2}^n v_i^2$ and
$w_1^2 > \sum_{i=2}^n w_i^2$. So $v_1^2 w_1^2 > (\sum_{i=2}^n v_i^2)(\sum_{i=2}^n w_i^2) \ge (\sum_{i=2}^n v_i w_i)^2$ by the Cauchy-Schwarz inequality. (See
Theorem 24.9.14.) Therefore $\eta(v, w) = -v_1 w_1 + \sum_{i=2}^n v_i w_i \ne 0$. Thus Definition 24.10.6 (ii) is satisfied.
Hence $\eta$ is a Minkowskian inner product on V.

24.11. Non-topological vector bundles


24.11.1 REMARK:  Non-topological vector bundles add linear structure to non-topological fibre bundles. 

If the fibre space F of a non-topological ordinary fibre bundle $(E, \pi, B, A_E^F)$ as in Definition 21.8.3 is given a
linear space structure, then the set $X(E, \pi, B)$ of cross-sections of $(E, \pi, B)$ according to Notation 21.3.4 will
have a natural linear space structure with pointwise operations as in Definition 24.11.4. This linear space 
structure on the set of cross-sections is almost universally defined and used for topological and differentiable 
vector bundles. 


It should perhaps be noted that pure “vector fibrations” are worthless. If the relations between fibre charts 
are not constrained by the general linear group of the linear space (which is the fibre space), then any linear 
operations performed on fibre elements will not be preserved by chart transitions. Therefore it will not be 
possible to induce well-defined linear operations onto fibre spaces. The mere existence of bijections between 
fibre sets and the fibre space, as in Definition 21.2.8, is not sufficient. 


In Definition 24.11.2, the word “induced” implies that the linear structure is induced from the fibre space to 
the fibre sets $E_p$. However, as in the case of topological and differentiable structure, the structure on the fibre
sets may be pre-existing, which is then found upon examination to be consistent with the structure on the 
fibre space. There is no difference in practice. The distinction is purely meta-mathematical because the two 
structures uniquely determine each other via fibre charts. Hence Definition 24.11.2 could be reformulated 
to say that the total space has a given linear space structure on each fibre set, and that this is linear-space 
isomorphic to the linear fibre space V. Thus the word “induced” could be replaced by “given”. 


24.11.2 DEFINITION: A (non-topological) vector bundle over a linear space V is a non-topological (G, V) 
ordinary fibre bundle with fibre space V and structure group G = GL(V). (See Definition 21.8.3.) 


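The (induced) linear structure on a fibre set $E_p$ of a non-topological vector bundle $E -\!< (E, \pi, B, A_E^V)$ over
$V -\!< (K, V, \sigma_K, \tau_K, \sigma_V, \mu_V)$ is the linear space $E_p -\!< (K, E_p, \sigma_K, \tau_K, \sigma_p, \mu_p)$ which satisfies
(i) $\forall p \in B$, $\forall z_1, z_2 \in E_p$, $\forall \phi \in A_{E,p}^V$, $\sigma_p(z_1, z_2) = \phi^{-1}(\sigma_V(\phi(z_1), \phi(z_2)))$,
(ii) $\forall p \in B$, $\forall c \in K$, $\forall z \in E_p$, $\forall \phi \in A_{E,p}^V$, $\mu_p(c, z) = \phi^{-1}(\mu_V(c, \phi(z)))$.
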
24.11.3 REMARK: The induced linear structure on fibre sets is well defined.
It is easily verified that conditions (i) and (ii) in Definition 24.11.2 are chart-independent because the
structure group G = GL(V) guarantees it. (See Notation 23.1.12 for GL(V).) Therefore “$\forall\phi$” may be
replaced by “$\exists\phi$” without changing the definition.


24.11.4 DEFINITION: The linear space of cross-sections of a non-topological vector bundle $(E, \pi, B, A_E^V)$
over a linear space $V -\!< (K, V, \sigma_K, \tau_K, \sigma_V, \mu_V)$ over a field $K -\!< (K, \sigma_K, \tau_K)$ is the tuple
$S -\!< (K, S, \sigma_K, \tau_K, \sigma_S, \mu_S)$, where $S = X(E, \pi, B)$, and $\sigma_S : S \times S \to S$ and $\mu_S : K \times S \to S$ are defined by
(i) $\forall X, Y \in S$, $\forall p \in B$, $\sigma_S(X, Y)(p) = \sigma_V(X(p), Y(p))$,
(ii) $\forall c \in K$, $\forall X \in S$, $\forall p \in B$, $\mu_S(c, X)(p) = \mu_V(c, X(p))$.


24.11.5 THEOREM: The set of cross-sections of a non-topological vector bundle is a linear space.
Let $V -\!< (K, V, \sigma_K, \tau_K, \sigma_V, \mu_V)$ be a linear space over a field $K -\!< (K, \sigma_K, \tau_K)$. Let $E -\!< (E, \pi, B, A_E^V)$ be a
non-topological vector bundle over V. Let $S -\!< (K, S, \sigma_K, \tau_K, \sigma_S, \mu_S)$, where $S = X(E, \pi, B)$, be the linear
space of cross-sections of E.

(i) $\forall X, Y \in S$, $\sigma_S(X, Y) \in S$.
(ii) $\forall c \in K$, $\forall X \in S$, $\mu_S(c, X) \in S$.
(iii) S is a well-defined linear space over K.

PROOF: For part (i), let $X, Y \in S$. Then $\forall p \in B$, $\sigma_S(X, Y)(p) = \sigma_V(X(p), Y(p)) \in E_p$ because $E_p$ is a
linear space induced by Definition 24.11.2. Hence $\sigma_S(X, Y) \in S$ by Definition 21.3.3.
For part (ii), let $c \in K$ and $X \in S$. Then $\forall p \in B$, $\mu_S(c, X)(p) = \mu_V(c, X(p)) \in E_p$ because $E_p$ is a linear
space induced by Definition 24.11.2. Hence $\mu_S(c, X) \in S$ by Definition 21.3.3.
For part (iii), S satisfies Definition 22.1.1 for a linear space by parts (i) and (ii), and the fact that $E_p$ is a
linear space for all $p \in B$.
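
The pointwise operations of Definition 24.11.4 can be sketched for a very simple special case. The example below assumes a trivial bundle over a hypothetical finite base set B, with every fibre identified with $\mathbb{R}^2$ by a single chart, so the chart-independence issue of Remark 24.11.3 does not arise. A cross-section is modelled as a dictionary from base points to fibre vectors. The names add_sections and scale_section are illustrative only.

```python
def add_sections(X, Y):
    """Pointwise sum: (X + Y)(p) = X(p) + Y(p), computed in the fibre over p."""
    return {p: tuple(a + b for a, b in zip(X[p], Y[p])) for p in X}


def scale_section(c, X):
    """Pointwise scalar multiple: (c X)(p) = c X(p), computed in the fibre over p."""
    return {p: tuple(c * a for a in X[p]) for p in X}


B = ["p", "q", "r"]   # hypothetical finite base set

# Two cross-sections: each assigns one fibre vector to every base point.
X = {"p": (1.0, 0.0), "q": (0.0, 2.0), "r": (3.0, 3.0)}
Y = {"p": (0.5, 0.5), "q": (1.0, -2.0), "r": (-3.0, 0.0)}

# The linear combination X + 2Y is again a cross-section over the same base set.
Z = add_sections(X, scale_section(2.0, Y))
```

The result Z has exactly one value over each base point, which is the content of parts (i) and (ii) of Theorem 24.11.5 in this special case.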

The family tree in Figure 25.0.1 summarises the main classes of matrices which are described in Chapter 25.


[Figure 25.0.1 Family tree of matrix classes. Rectangular matrices $M_{m,n}(K)$ specialise to square matrices
$M_{n,n}(K)$ and to real rectangular matrices $M_{m,n}(\mathbb{R})$. Both of these specialise to real square matrices $M_{n,n}(\mathbb{R})$,
which specialise to real symmetric matrices $\mathrm{Sym}(n, \mathbb{R})$. These specialise to real positive semi-definite
symmetric matrices $\mathrm{Sym}_0^+(n, \mathbb{R})$ and real negative semi-definite symmetric matrices $\mathrm{Sym}_0^-(n, \mathbb{R})$, which in turn
specialise to real positive definite symmetric matrices $\mathrm{Sym}^+(n, \mathbb{R})$ and real negative definite symmetric
matrices $\mathrm{Sym}^-(n, \mathbb{R})$ respectively.]


25.0.1 REMARK: History of matrix algebra.
According to Bell [233], page 379, matrix algebra was invented by Arthur Cayley. (See also Remark 22.0.1 
regarding the origin of linear algebra.) Mirsky [117], page 72, wrote the following. 

The algebra of matrices was first developed systematically by Cayley in a series of papers which
began to appear in 1857. 


Eves [68], page 10, wrote the following. 


The name matrix was first assigned to a rectangular array of numbers by James Joseph Sylvester 
(1814-1897) in 1850. [...] But it was Arthur Cayley (1821-1895) who, in his paper “A memoir on 
the theory of matrices,” of 1858, first considered general m x n matrices as single entities subject 
to certain laws of combination. 


According to Mirsky [117], page 75: “The term ‘matrix’ is due to Sylvester (1850).” A useful, concise account 
of the origins and wide range of applications of matrix algebra is given by Eves [68], pages 1-5, 10-12. 


25.1. Application contexts for matrices 


25.1.1 REMARK: Contexts in which matrix spaces naturally arise. 

Matrices are doubly indexed families of elements of a field whereas tuple vectors are singly indexed, but 
matrix spaces are not a simple generalisation of linear spaces of tuples of field elements. Matrices have a 
much broader set of operations and properties. Matrices may be thought of as acting on tuple vectors. In 
other words, the vectors are passive and the matrices are active. 


Whereas tuple vectors arise naturally as sequences of components of vectors in a linear space with respect 
to a basis, rectangular matrices arise naturally in two main ways in differential geometry. 


(1) Linear map component matrices. The components of linear maps between linear spaces with respect 
to given bases for the source and target spaces. 


(2) Coordinate transformation component matrices. The components of a change of coordinates for 
a single linear space when the basis for that space is changed. (A matrix also arises from the change of 
basis vectors.) 


In case (1), the points are transformed. In case (2), the points are unchanged, but their coordinates are 
changed because of a change of basis. 


Linear map component matrices can be rectangular with any numbers of rows and columns. Coordinate 
transformation component matrices must be square, i.e. with an equal number of rows and columns. (This 
is because coordinate transformations are generally invertible, i.e. they must not “lose information”.)


A change of vector coordinates due to a linear change of basis may be thought of as a linear map from the 
linear space of coordinate tuples to itself. In this sense, case (2) is a special case of case (1). Thus matrices 
mostly correspond to some kind of linear transformation. 


To avoid confusion between point transformations and coordinate transformations, it is perhaps helpful to 
think of the points as referring to the “state” of some system, whereas the coordinates refer to the “observation”
of the same system from the point of view of a particular “observer”. Thus the points refer to
“the real thing”, whereas the coordinates are more subjective, depending on the viewpoint of the observer.
(This system/observer dichotomy is also mentioned in Remarks 20.1.10 and 20.10.2, Example 20.10.6,
Definition 20.10.8, and Remark 20.10.18.)


There is a third main way in which matrices arise naturally in differential geometry. 


(3) Component matrix for a second-degree tensor. For example, the components of a metric tensor 
on a Riemannian manifold. 


The second-degree tensor components in case (3) resemble vector components more than transformation 
components. A tensor is a kind of generalised vectorial object, not a kind of transformation between vectorial
objects. The component matrices of tensors also do not usually act on vectors in the same way that 
transformation component matrices do. In other words, tensors are generally passive objects on which 
transformation component matrices act, whereas matrices actively transform such passive objects. The 
component matrix for a second-degree tensor is usually square. 


Second-degree tensors are a special case of tensors of general degree. Component matrices are well defined 
for tensors of any degree. It is preferable to discuss algebraic operations on tensor component matrices in 
the context of tensor algebra because the operations of interest are different to the typical operators for the 
component matrices of linear maps and coordinate transformations. (See Chapters 27-30 for tensor algebra.) 

In addition to differential geometry contexts, finite matrices are also applied to purely algebraic systems.


(4) Component matrix for simultaneous linear equations. The main task here is to solve the
equations using row operations which preserve the set of solutions.


(5) Component matrix for a quadratic linear form. The main task here is to determine eigenvalues 
and eigenvectors. 


Case (4) is closely related to the components of linear maps and coordinate transformations in cases (1) 
and (2). Case (5) is closely related to the symmetric second-degree tensor component matrices in case (3), 
such as the components of a metric tensor. Whereas there are bijective relations between general matrices 
and general linear maps, the coefficients of a quadratic form correspond to symmetric matrices only. 


Thus the tasks to be performed with matrices fall into two main groups. In the first group, matrix multiplication
corresponds to composition of maps, whereas in the second group, matrix symmetry and diagonalisation
are more important. In the second group, the matrices represent an object to be studied in itself by transforming
coordinates to reveal its structure, whereas in the first group, the matrices act to transform points,
and inversion and invertibility are of more interest.


The difference between transformation-style matrices and quadratic-form matrices can be seen fairly clearly
by trying to give meaning to the product of the matrices of two metric tensors $(g_{ij})_{i,j=1}^n$ and $(g'_{ij})_{i,j=1}^n$. The
product $(h_{ij})_{i,j=1}^n$ given by $h_{ij} = \sum_{k=1}^n g_{ik} g'_{kj}$ has no meaning. This is because such matrices are components
of tensorial objects, not transformations of tensorial objects. Thus it is important to not assume that all
square arrays of numbers refer to the same underlying class of objects. As always, the operations which
may be applied to a class of mathematical objects depend on the underlying meaning, not just the set of
parameters for the class. (As a more absurd example, consider the cube root of the number of dollars in a
bank account, or the number of prime factors of the number of people who live in a city.)
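
One way to see concretely why the matrix product of two metric component matrices "has no meaning" is to check how it behaves under a change of basis. The sketch below rests on an assumption which is not stated in this remark: that the components of a bilinear form transform as $G' = P^T G P$ when the new basis vectors are the columns of P. The matrices G1, G2 and P are hypothetical examples. If the product were the component matrix of a bilinear form, transforming the factors and then multiplying would agree with multiplying and then transforming.

```python
def matmul(A, B):
    """Ordinary matrix product of two lists-of-rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(row) for row in zip(*A)]


def change_basis(G, P):
    """Components of a bilinear form in a new basis: G' = P^T G P (assumed rule)."""
    return matmul(transpose(P), matmul(G, P))


G1 = [[1, 0], [0, 1]]   # hypothetical metric components
G2 = [[2, 1], [1, 3]]   # a second hypothetical metric
P = [[1, 1], [0, 1]]    # columns are the new basis vectors in the old basis

# Multiply after transforming each factor as a bilinear form.
product_of_transformed = matmul(change_basis(G1, P), change_basis(G2, P))

# Transform the product as if it were itself a bilinear form.
transformed_product = change_basis(matmul(G1, G2), P)

consistent = product_of_transformed == transformed_product
```

For these values the two results differ, and the first is not even symmetric, so the product does not behave like the component matrix of a bilinear form.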


25.1.2 REMARK: The meanings of elements of matrices in various applications. 
In each application, the individual elements of matrices have different meanings. 


(1) Linear map. If linear spaces V and W have finite bases $(e^V_j)_{j=1}^n$ and $(e^W_i)_{i=1}^m$ respectively, then a
linear map $\phi : V \to W$ has component matrix $A = (a_{ij})_{i=1,j=1}^{m,n}$ such that $\phi(e^V_j) = \sum_{i=1}^m a_{ij} e^W_i$ for
all $j \in \mathbb{N}_n$. Then $\phi(v) = \sum_{i=1}^m \sum_{j=1}^n a_{ij} v_j e^W_i$ for general vectors $v = \sum_{j=1}^n v_j e^V_j$ in V. Thus $\phi(v) = w$
where $w = \sum_{i=1}^m w_i e^W_i$, where the component tuple $(w_i)_{i=1}^m$ is given by the formula $w = Av$. In this
context, each matrix element $a_{ij}$ means the component of $\phi(e^V_j)$ with respect to the ith vector $e^W_i$ in
the basis $(e^W_i)_{i=1}^m$ for W.


(2) Coordinate transformation. If a linear space V has finite bases $B^1 = (e^1_i)_{i=1}^n$ and $B^2 = (e^2_i)_{i=1}^n$,
then the component matrix for the change of basis from $B^1$ to $B^2$ is the matrix $A = (a_{ij})_{i,j=1}^n$ given by
$\sum_{i=1}^n v^2_i e^2_i = \sum_{j=1}^n v^1_j e^1_j$, where $v^2_i = \sum_{j=1}^n a_{ij} v^1_j$. In other words, $v^2 = Av^1$. This implies the relation
$\sum_{i=1}^n a_{ij} e^2_i = e^1_j$ for all $j \in \mathbb{N}_n$. So in this context, each matrix element $a_{ij}$ means the component of $e^1_j$
with respect to the ith vector $e^2_i$ in the basis $B^2$ for V. (Note that this is not a point transformation
because the linear map here is effectively the identity map $\mathrm{id}_V$ on V. Points are not transformed. Only
the basis and vector components are changed.)


(3) Simultaneous linear equations. In the equations $\forall i \in \mathbb{N}_m$, $\sum_{j=1}^n a_{ij} v_j = w_i$, each element $a_{ij}$ of the
$m \times n$ matrix $A = (a_{ij})_{i=1,j=1}^{m,n}$ is the coefficient of the contribution of the variable $v_j$ to the ith constraint.
Each “line” of the sequence of simultaneous linear equations is a constraint which must be satisfied. In
this context, it is possible to think of the map $v \mapsto Av$ more abstractly as a linear map from $K^n$ to $K^m$,
where the object of the exercise is to determine the inverse image of the set $\{w\}$ by this map. But in
concrete terms, each equation is an individual constraint which delineates the “feasible space”, which is
the set $\{v \in K^n;\ Av = w\}$. The role of the equations is only to describe the feasible space.


(4) Inner product. Given a finite-dimensional linear space V with a basis $(e_i)_{i=1}^n$, an inner product
$\eta : V \times V \to \mathbb{R}$ is in essence a positive definite symmetric bilinear form on V. (See Sections 19.7 and 19.8
for general inner products.) Then $\eta(v, w) = \sum_{i,j=1}^n a_{ij} v_i w_j$ for all $v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{i=1}^n w_i e_i$
in V, for some positive definite symmetric $n \times n$ matrix $A = (a_{ij})_{i,j=1}^n$. In this context, each matrix
element $a_{ij}$ means the inner product $\eta(e_i, e_j)$ of $e_i$ and $e_j$. This is a property of the unordered pair of
vectors $\{e_i, e_j\}$. Metric tensors for Riemannian spaces are particular examples of inner products.

(5) Area element. The area subtended by two tangent vectors at a point in a differentiable manifold is
an antisymmetric bilinear real-valued function of the ordered vector pair. In terms of a basis $(e_i)_{i=1}^n$
at a given point, this function may be written as $\omega(v, w) = \sum_{i,j=1}^n a_{ij} v_i w_j$ for $v = \sum_{i=1}^n v_i e_i$ and
$w = \sum_{i=1}^n w_i e_i$, where the $n \times n$ matrix $A = (a_{ij})_{i,j=1}^n$ is antisymmetric. In this context, the matrix
element $a_{ij}$ represents the area subtended by vectors $e_i$ and $e_j$.

(6) Markov process. Each matrix element $a_{ij}$ represents the probability of transition from state i to
state j in a single discrete-time step. Thus the state vector is a row vector which is multiplied on the 
right by the transition matrix, which is the opposite to the order in most applications. Markov process 
transition matrices must be square, their elements must lie in the real number interval [0, 1], and the 
sum of each row must equal 1. Matrix multiplication signifies the execution of multiple state transitions. 


(7) Input-output process. Each square matrix element $a_{ij}$ represents the flow of some commodity,
substance or items between sub-systems i and j of a system in some unit of time. The kth power of
such a matrix yields the flow in k units of time if the flow rate is constant.


(8) Network connectivity. Each square matrix element $a_{ij}$ equals 1 if there is a link between node i
and node j, and equals 0 otherwise. Matrix multiplication for network connectivity matrices signifies 
existence of paths of particular lengths. 


(9) Basis frames. Each square matrix element $a_{ij}$ represents the component of a variable basis vector i
with respect to a fixed basis vector j, or vice versa. Such a matrix is a coordinatisation of a variable 
reference frame, for example in the context of a principal fibre bundle. 


It is clear that matrices are typically associated with linear spaces in some way, but there are applications
other than as components of linear maps. 
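
Items (6) and (8) can be illustrated numerically. The following is a minimal sketch only: the two-state transition matrix `P` and the three-node network `C` are hypothetical examples, and the helper names `mat_mul` and `is_transition_matrix` are not taken from this book. Exact rational arithmetic is assumed so that the row sums can be tested for equality.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two non-empty matrices stored as lists of rows."""
    assert len(a[0]) == len(b), "matrices are not conformable"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_transition_matrix(p):
    """Square, entries in [0, 1], and every row sums to 1."""
    return (all(len(row) == len(p) for row in p)
            and all(0 <= x <= 1 for row in p for x in row)
            and all(sum(row) == 1 for row in p))

# Item (6): a hypothetical two-state transition matrix.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
P2 = mat_mul(P, P)   # two state transitions in succession

# Item (8): a hypothetical network with links 1-2 and 2-3 only.
C = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
C2 = mat_mul(C, C)   # entry (i, j) counts the walks of length 2 from i to j
```

In this sketch the product of two transition matrices is again a transition matrix, and a non-zero entry of `C2` signals that some walk of length 2 exists between the corresponding nodes. (A walk may revisit nodes, which is a slightly weaker notion than a path.)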


25.1.3 REMARK: Applicability of various matrix operations and properties to various contexts. 

Arrays of numbers (or more general kinds of ring elements) are often encountered in mathematical contexts 
where none, or almost none, of the typical matrix operations are meaningful. For example, an addition or 
multiplication operation for a finite set may be presented as a “Cayley table”, which is a square array of values
$a_{ij} = x_i \circ x_j$ of elements $x_i$ of some finite algebraic system with an operation “$\circ$”. (See Example 17.1.4, for
instance.) Sums, products, determinants and eigenvalues of such “matrices” have little meaning. As another
example, truth tables in propositional logic are often presented as rectangular arrays of truth values. The 
typical matrix operations have only limited applicability for such arrays. By considering such examples of 
non-matrix arrays, it becomes clear that the concept of a matrix is defined by the set of meaningful operations
which may be applied, not by the static structure of the arrays. Moreover, the set of typical matrix operations 
which are applicable varies according to context. 


25.1.4 REMARK: General matrices. 

In general, a matrix is a map from some Cartesian product X x Y to some ring R. The reason for constraining 
the domain to be a Cartesian product, and the range to be a ring, is that this makes matrix multiplication 
possible. With only an addition operation, there is not much difference between a matrix and a tuple of 
commutative group elements. But if the matrix elements have both an addition and multiplication operation, 
two matrices $A : X \times Y \to R$ and $B : Y \times Z \to R$ can be multiplied according to the usual formula:


$\forall i \in X,\ \forall k \in Z,\quad (AB)(i,k) = \sum_{j \in Y} A(i,j)\,B(j,k).$ (25.1.1)


If the addition operation is commutative, the order of summation is immaterial. The multiplication operation 
does not need to be commutative. (Eves [68], page 20, gives the name “Cayley product" for the matrix 
product rule on line (25.1.1).) 
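
The product rule on line (25.1.1) does not require integer indices. The following minimal sketch assumes that the index sets are finite, that the ring $R$ is the ring of integers, and that a matrix is stored as a Python dictionary keyed by index pairs. The name `general_product` and the index labels are hypothetical.

```python
def general_product(A, B, X, Y, Z):
    """Product of A: X x Y -> R and B: Y x Z -> R as on line (25.1.1).

    Assumes Y is finite and R is the ring of integers, so that the
    built-in sum (which starts from the integer 0) is adequate.
    """
    return {(i, k): sum(A[i, j] * B[j, k] for j in Y) for i in X for k in Z}

X = ["a", "b"]
Y = ["p", "q", "r"]
Z = ["u"]
A = {(i, j): 1 for i in X for j in Y}               # all-ones matrix on X x Y
B = {("p", "u"): 2, ("q", "u"): 3, ("r", "u"): 5}   # one column on Y x Z
AB = general_product(A, B, X, Y, Z)                 # a matrix on X x Z
```

The only structural requirement visible in the sketch is that the second index set of `A` coincides with the first index set of `B`.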


Matrices can have an infinite number of rows, or columns, or both. The numbers of rows and columns can 
even be uncountable. This raises the question of how to define an infinite sum if Y is infinite. There are two 
main ways to resolve this issue. One may require that the rows $\{(j, A(i,j));\ j \in Y\}$ of $A$ and the columns
$\{(j, B(j,k));\ j \in Y\}$ of $B$ contain only finitely many non-zero entries for each $i \in X$ and $k \in Z$. This ensures
that an infinite number of ring elements never need to be summed. Alternatively, one may define a topology 
on R so that convergence is meaningful. Of course, this implies that every multiplication must be checked 
for convergence, and meaningless matrix products must be rejected. In Chapter 25, neither of these methods
of extending matrices to infinite domains is presented. Nor is the range of matrices extended from fields
to general rings. (The properties of inverses of matrices are more straightforward when a field is used.) For 
the purposes of this book, it is sufficient to consider only matrices with domains which are finite and ranges 
which are fields. In fact, fields other than the real numbers are rarely required here. 


The word “matrix” is often generalised to include “infinite matrices”, for which the numbers of rows and 
columns may be infinite, typically countably infinite. Such a generalisation is useful in applications to Hilbert 
space operators, for example, but in this book, the word “matrix” generally means “finite matrix”. 


25.2. Rectangular matrices 


25.2.1 REMARK: Diagrammatic matrices versus pure mathematical matrices. 

Introductory presentations of matrices generally explain them as rectangular arrays of numbers arranged 
in rows and columns. Diagrams are useful for developing intuition, but not suitable for pure mathematical 
definitions. Therefore Definition 25.2.2 is expressed in terms of field-valued functions on Cartesian products 
of finite sets. The distinction between rows and columns corresponds to the distinction between the first and 
second sets in the Cartesian product $\mathbb{N}_m \times \mathbb{N}_n$.


25.2.2 DEFINITION: An $m \times n$ matrix over a field $K$ for $m,n \in \mathbb{Z}_0^+$ is an element of $K^{\mathbb{N}_m \times \mathbb{N}_n}$. In other
words, it is a function with domain $\mathbb{N}_m \times \mathbb{N}_n$ and range $K$.


A rectangular matrix is an $m \times n$ matrix for any $m,n \in \mathbb{Z}_0^+$.


The depth or height or number of rows of a rectangular matrix is the number m. 


The width or number of columns of a rectangular matrix is the number n. 


25.2.3 REMARK: Conventions for index sets for matrices. 

The strictly correct notation $K^{\mathbb{N}_m \times \mathbb{N}_n}$ in Definition 25.2.2 is inconvenient. The notation $K^{m \times n}$ is generally
used instead. The ordinal number $m = \{0,1,\ldots,m-1\}$ is clearly not the same as the set $\mathbb{N}_m = \{1,2,\ldots,m\}$.
Mathematics, computer programming and many other subjects are plagued by lack of standardisation of the 
starting number for sequences: 0 or 1. There seems to be no tidy cure for this problem. One must simply be 
ready to convert between the representations when the need arises. It will be assumed here that the indices 
start at 1 because that is the majority preference. 


25.2.4 NOTATION: $M_{m,n}(K)$, for $m,n \in \mathbb{Z}_0^+$ and a field $K$, denotes the set of $m \times n$ matrices over $K$. In
other words, $M_{m,n}(K) = K^{\mathbb{N}_m \times \mathbb{N}_n}$.


25.2.5 NOTATION: $M_{m,n}$ for $m,n \in \mathbb{Z}_0^+$ denotes the set $M_{m,n}(\mathbb{R})$.


25.2.6 REMARK: The choice of default field for spaces of matrices. 
The default field for matrices is chosen to be the real numbers in Notation 25.2.5, but in some contexts the 
default field would be the complex numbers. 


25.2.7 REMARK: Matrices with no rows or no columns are indistinguishable from the empty set. 

There is a peculiarity which arises when a matrix has either no rows, or no columns, or both. It follows 
from Theorem 9.4.6 (i) that the Cartesian product $\mathbb{N}_m \times \mathbb{N}_n$ equals the empty set if either $m = 0$ or $n = 0$.
Therefore there is no way to distinguish 0 x n matrices and m x 0 matrices from 0 x 0 matrices. They are 
all equal to the empty set. So it is generally preferable to avoid them! 


Since all matrices in $M_{m,n}(K)$ with $m = 0$ or $n = 0$ are functions whose domain is the empty set, it follows that
$M_{m,n}(K) = \{\emptyset\}$ whenever $m = 0$ or $n = 0$. One could refer to the matrix $\emptyset$ as the “empty matrix”. This
is, of course, very different to the “zero matrix” in Definition 25.2.8. But when $m = 0$ or $n = 0$, the zero
matrix in $M_{m,n}(K)$ does in fact equal the empty matrix.
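
The collapse of all $0 \times n$ and $m \times 0$ matrices to a single object can be observed directly if a matrix is modelled as a function stored as a set of key-value pairs. The following minimal sketch assumes that a Python dictionary keyed by the pairs $(i,j)$ is an adequate model of a function on $\mathbb{N}_m \times \mathbb{N}_n$. The name `zero_matrix` is hypothetical.

```python
def zero_matrix(m, n):
    """The m x n zero matrix as a function on N_m x N_n, stored as a dict."""
    return {(i, j): 0 for i in range(1, m + 1) for j in range(1, n + 1)}

z03 = zero_matrix(0, 3)   # no rows
z20 = zero_matrix(2, 0)   # no columns
z00 = zero_matrix(0, 0)   # neither
z23 = zero_matrix(2, 3)   # an ordinary zero matrix, for comparison
```

In this model the first three matrices are all the empty dictionary, so their intended shapes cannot be recovered, whereas `z23` has six entries.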


25.2.8 DEFINITION: The zero matrix in a set of matrices $M_{m,n}(K)$, for some field $K$ and $m,n \in \mathbb{Z}_0^+$, is
the matrix $(a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$ which satisfies $\forall (i,j) \in \mathbb{N}_m \times \mathbb{N}_n,\ a_{ij} = 0_K$.


25.2.9 REMARK: Notational conventions for matrices.
A matrix $A$ in the set $K^{\mathbb{N}_m \times \mathbb{N}_n}$ is a map $A : \mathbb{N}_m \times \mathbb{N}_n \to K$. So strictly speaking, such a matrix should
be denoted as the family $(A(i,j))_{(i,j) \in \mathbb{N}_m \times \mathbb{N}_n}$. Or the function parameter-pair $(i,j)$ may be converted to
subscripts as in the notation $(A_{i,j})_{(i,j) \in \mathbb{N}_m \times \mathbb{N}_n}$. But it is more usual to denote it as a doubly indexed family
as $(A_{ij})_{i=1,j=1}^{m,n}$.

It is customary to use lower-case letters for matrices when they have indices, and upper-case when the 
indices are absent. (This notational policy is made explicit for example by Cullen [64], page 12; Schneider/ 
Barker [132], page 4; Franklin [71], pages xi-xii. However, it is often also convenient to ignore this convention 
and use the same case for both the name and the components of a matrix.) The purpose of the upper-case 
is to contrast matrices and vectors when used in the same expression. (Thus the expression “Av” is a hint 
that A is a matrix and v is a vector or column matrix.) When indices are used, the upper-case hint is not 
required because the reader can count the indices to distinguish matrices from vectors. Thus it is usual to 
write $A = (a_{ij})_{i=1,j=1}^{m,n}$. If $m = n$, it is usual to write $A = (a_{ij})_{i,j=1}^n$.

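Square brackets may also be used. Thus $A = [a_{ij}]_{i=1,j=1}^{m,n}$. (The rectangularity of “square brackets” is perhaps
a useful mnemonic for the rectangularity of the matrix!) Likewise, when the elements are fully written out
in rows and columns, the brackets may be round or rectangular. Thus


$$A = (a_{ij})_{i=1,j=1}^{m,n} = [a_{ij}]_{i=1,j=1}^{m,n}
= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}
= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
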

Rectangular brackets are preferable because they delineate the matrix entries more precisely. As mentioned 
in Remark 25.2.17, it is convenient to reserve round brackets (i.e. parentheses) for vectors, which have a 
single index, so that matrices, which have a double index, can be highlighted with square brackets. 


25.2.10 REMARK: Elements of matrices, and their addresses and row and column indexes. 

To refer to “an element” of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, one typically writes simply “$a_{ij}$”. This
is a kind of pseudo-notation because if the matrix values are all zero, for example, then the value 0 could
refer to any of the $m \times n$ zero elements. The notation “$a_{ij}$” strictly speaking means the value $a(i,j) \in K$
of the function $a : \mathbb{N}_m \times \mathbb{N}_n \to K$, but in this case, the information about the index pair $(i,j)$ must not be
discarded. Therefore in Definition 25.2.11, a matrix element is defined as the ordered pair $((i,j), a_{ij})$ instead
of the more colloquial “$a_{ij}$”. But if one uses a phrase such as “element $(i,j)$ of a matrix $A$”, there is no
ambiguity in defining this to mean the value of $A$ for the pair $(i,j)$.


If one refers to “the address of an element $a_{ij}$ of a matrix $A$”, this suffers from ambiguity in the case of a
zero matrix, for example. Therefore Definition 25.2.11 gives a pedantically correct meaning for the “address
of an element”. The same comment applies to the row index and column index.


25.2.11 DEFINITION: An element of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$ for some field $K$ and $m,n \in \mathbb{Z}_0^+$
is a pair $((i,j), a_{ij})$ for some $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$.

Element $(i,j)$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and $m,n \in \mathbb{Z}_0^+$, for some
$(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the value $a_{ij}$.

The address of an element $((i,j), a_{ij})$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and
$m,n \in \mathbb{Z}_0^+$, for some $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the index pair $(i,j)$.

The row index of an element $((i,j), a_{ij})$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and
$m,n \in \mathbb{Z}_0^+$, for some $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the integer $i$.

The column index of an element $((i,j), a_{ij})$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and
$m,n \in \mathbb{Z}_0^+$, for some $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the integer $j$.


An entry in a matrix is synonymous with an “element of a matrix”.


25.2.12 REMARK: Diagrammatic rows and columns versus pure mathematical rows and columns. 

The distinction between rows and columns corresponds to the order of the finite sets in the Cartesian product. 
Thus the elements in row $i$ of a matrix $A \in M_{m,n}(K)$ are the values $A(i,j)$ for which the first index $i$ is
fixed while $j$ varies, and vice versa for columns. It is customary to think of rows and columns of matrices as
“row vectors” and “column vectors” respectively. Strictly speaking, this should mean that row vector $i$ of
$A \in M_{m,n}(K)$ is the set of ordered pairs $\{((i,j), A(i,j));\ j \in \mathbb{N}_n\}$, which is the restriction of the function $A$
to the set $\{(i,j);\ j \in \mathbb{N}_n\}$, with a corresponding expression for columns. If the double indexing is replaced
by single indexing, the information about whether such a vector is a row or a column is lost. In other words,
there is no way to distinguish a vertical vector from a horizontal vector.


There are (at least) three reasonable pure mathematical interpretations of the concepts of row and column 
vectors. These are presented here as Definitions 25.2.13, 25.2.14 and 25.2.15. The rows and columns in 
Definition 25.2.13 are simple restrictions of the matrix to specific subsets of the domain of the matrix. In 
Definition 25.2.14, the domain index pair is shifted so as to yield 1 x n and m x 1 matrices respectively. 
Definition 25.2.15 fully extracts the rows and columns as vectors which are neither horizontal nor vertical. In 
this last case, the terms “row vector” and “column vector” are misleading because the information about the 
orientation of the vector has been discarded. Vectors do not have an orientation, whereas the row and column 
matrices are matrices, which do have a specific orientation. This question of orientation is important when 
a row or column “vector” is to be multiplied with matrices according to the standard matrix multiplication 
rules. In practice, these three interpretations are freely and informally intermingled. Usually the intended 
meaning can be guessed from the context. 


25.2.13 DEFINITION: Rows and columns of a matrix.
Row $i$ of a matrix $A \in M_{m,n}(K)$ is the set $\{((i,j), A(i,j));\ j \in \mathbb{N}_n\} = A|_{\{i\} \times \mathbb{N}_n} \in K^{\{i\} \times \mathbb{N}_n}$.


Column $j$ of a matrix $A \in M_{m,n}(K)$ is the set $\{((i,j), A(i,j));\ i \in \mathbb{N}_m\} = A|_{\mathbb{N}_m \times \{j\}} \in K^{\mathbb{N}_m \times \{j\}}$.


25.2.14 DEFINITION: Row matrices and column matrices of a matrix.
Row matrix $i$ of a matrix $A \in M_{m,n}(K)$ is the set $\{((1,j), A(i,j));\ j \in \mathbb{N}_n\} \in K^{1 \times n} = K^{\{1\} \times \mathbb{N}_n}$.
Column matrix $j$ of a matrix $A \in M_{m,n}(K)$ is the set $\{((i,1), A(i,j));\ i \in \mathbb{N}_m\} \in K^{m \times 1} = K^{\mathbb{N}_m \times \{1\}}$.


25.2.15 DEFINITION: Row vectors and column vectors of a matrix.
Row vector $i$ of a matrix $A \in M_{m,n}(K)$ is the set $\{(j, A(i,j));\ j \in \mathbb{N}_n\} \in K^n$.


Column vector $j$ of a matrix $A \in M_{m,n}(K)$ is the set $\{(i, A(i,j));\ i \in \mathbb{N}_m\} \in K^m$.


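25.2.16 NOTATION: $\mathrm{Row}_i(A)$, for a matrix $A \in M_{m,n}(K)$ and $i \in \mathbb{N}_m$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$,
denotes row matrix $i$ of $A$.


$\mathrm{Col}_j(A)$, for a matrix $A \in M_{m,n}(K)$ and $j \in \mathbb{N}_n$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$, denotes column matrix $j$
of $A$.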
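
The three interpretations in Definitions 25.2.13, 25.2.14 and 25.2.15 differ only in their domains. The following minimal sketch makes this visible for rows, assuming that a matrix is stored as a Python dictionary keyed by index pairs. The function names and the sample matrix are hypothetical.

```python
def row(A, i, n):
    """Definition 25.2.13: the restriction of A to {i} x N_n."""
    return {(i, j): A[i, j] for j in range(1, n + 1)}

def row_matrix(A, i, n):
    """Definition 25.2.14: the 1 x n matrix, with the first index shifted to 1."""
    return {(1, j): A[i, j] for j in range(1, n + 1)}

def row_vector(A, i, n):
    """Definition 25.2.15: the n-tuple, indexed by j alone."""
    return {j: A[i, j] for j in range(1, n + 1)}

A = {(1, 1): 10, (1, 2): 20,
     (2, 1): 30, (2, 2): 40}    # a hypothetical 2 x 2 matrix
r = row(A, 2, 2)
rm = row_matrix(A, 2, 2)
rv = row_vector(A, 2, 2)
```

All three objects carry the values 30 and 40, but only `r` remembers that they came from row 2, and only `r` and `rm` remember that they form a horizontal array.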


25.2.17 REMARK: Conversion of vectors to row matrices or column matrices. 

There are two obvious natural isomorphisms from the spaces of tuples of elements of a field K to the matrix 
spaces over $K$. These isomorphisms are the maps $\phi_c : K^m \to M_{m,1}(K)$ and $\phi_r : K^n \to M_{1,n}(K)$ in
Definitions 25.2.18 and 25.2.19. These define respectively the column and row matrix conversions for the
spaces $K^m$ and $K^n$. These maps may be thought of as injections, embeddings or immersions. (The column
matrix injection map and row matrix injection map are summarised in Figure 25.2.1.)


$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
\;\xleftarrow{\ \phi_c\ }\; (v_1, v_2, \ldots, v_n) \;\xrightarrow{\ \phi_r\ }\;
\begin{bmatrix} v_1 & v_2 & \ldots & v_n \end{bmatrix}$$

Left: column matrix in $M_{n,1}(K)$. Centre: tuple vector in $K^n$. Right: row matrix in $M_{1,n}(K)$.
Figure 25.2.1 Column matrix injection map and row matrix injection map


Figure 25.2.1 demonstrates how one may conveniently distinguish between row vectors and row matrices by 
using round brackets (i.e. parentheses) for row vectors and square brackets for row matrices. 


25.2.18 DEFINITION: The column matrix injection map for a linear space $K^m$ over a field $K$ for $m \in \mathbb{Z}_0^+$
is the map $\phi_c : K^m \to M_{m,1}(K)$ defined by:


$\forall v \in K^m,\ \forall i \in \mathbb{N}_m,\quad \phi_c(v)(i,1) = v_i.$


The matrix $\phi_c(v)$ is called the column matrix for $v$.


25.2.19 DEFINITION: The row matrix injection map for a linear space $K^n$ over a field $K$ for $n \in \mathbb{Z}_0^+$ is the
map $\phi_r : K^n \to M_{1,n}(K)$ defined by:


$\forall v \in K^n,\ \forall j \in \mathbb{N}_n,\quad \phi_r(v)(1,j) = v_j.$


The matrix $\phi_r(v)$ is called the row matrix for $v$.
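
The two injection maps can be written out in a few lines. The following minimal sketch assumes that tuples are Python tuples and that matrices are dictionaries keyed by 1-based index pairs. The names `phi_c` and `phi_r` merely echo the notation of Definitions 25.2.18 and 25.2.19.

```python
def phi_c(v):
    """Column matrix injection: entry (i, 1) of the result equals v_i."""
    return {(i, 1): x for i, x in enumerate(v, start=1)}

def phi_r(v):
    """Row matrix injection: entry (1, j) of the result equals v_j."""
    return {(1, j): x for j, x in enumerate(v, start=1)}

v = (7, 8, 9)          # a hypothetical 3-tuple
column = phi_c(v)      # a 3 x 1 matrix
row = phi_r(v)         # a 1 x 3 matrix
```

The two images hold the same values but are different functions, because their domains $\mathbb{N}_3 \times \{1\}$ and $\{1\} \times \mathbb{N}_3$ are different.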


25.3. Rectangular matrix addition and multiplication 


25.3.1 REMARK: Operations on general rectangular matrices.

The operations defined on rectangular matrices include matrix addition, scalar multiplication, and a limited 
form of matrix multiplication. Most of the interesting operations, such as matrix inversion, the determinant 
and trace functions and the calculation of eigenvalues and eigenvectors, require square matrices, which are 
defined in Section 25.8. (See Sections 25.9 and 25.10 for the trace and determinant of a matrix.) 


The algebra of general rectangular matrices is somewhat impoverished. Although the set of rectangular 
matrices for a fixed number of rows and columns is a well-defined linear space with respect to element-by- 
element addition and scalar multiplication, as in Definition 25.3.2, the matrix multiplication operation is 
not even closed. Therefore there is no algebraic structure or specification tuple for Definition 25.3.7. The 
main motivation for defining general rectangular matrices is to perform computations for linear maps with 
respect to bases on linear spaces, as described in Section 25.7. 


25.3.2 DEFINITION: The linear space of $m \times n$ matrices over a field $K$ for $m,n \in \mathbb{Z}_0^+$ is the set of matrices
$M_{m,n}(K)$ together with the following operations.


(i) Addition: $\sigma : M_{m,n}(K) \times M_{m,n}(K) \to M_{m,n}(K)$, written as $\sigma(A,B) = A + B$, defined by


$\forall A, B \in M_{m,n}(K),\ \forall (i,j) \in \mathbb{N}_m \times \mathbb{N}_n,$
$(A + B)(i,j) = A(i,j) + B(i,j).$


(ii) Scalar multiplication: $\mu : K \times M_{m,n}(K) \to M_{m,n}(K)$, written as $\mu(\lambda, A) = \lambda A$, defined by


$\forall \lambda \in K,\ \forall A \in M_{m,n}(K),\ \forall (i,j) \in \mathbb{N}_m \times \mathbb{N}_n,$
$(\lambda A)(i,j) = \lambda\, A(i,j).$


25.3.3 REMARK: Matrix operation definitions without formal specification tuples are clearer.
The formal linear space Definition 22.1.1 requires a specification tuple $(K, M_{m,n}(K), \sigma_K, \tau_K, \sigma, \mu)$ with field
operations $\sigma_K$ and $\tau_K$ for $K$. Definition 25.3.2 is clearer without such formality.


25.3.4 REMARK: Linear spaces of matrices are free linear spaces on finite set products. 

For each fixed pair $m,n \in \mathbb{Z}_0^+$, the linear space $M_{m,n}(K)$ in Definition 25.3.2 is no more and no less than
the free linear space $\mathrm{FL}(\mathbb{N}_m \times \mathbb{N}_n, K)$. (See Definition 22.2.10 and Notation 22.2.11 for free linear spaces.)
So there is no need to present the basic algebraic properties of this system here. The real interest lies in the 
multiplication operation. Although the linear spaces Mm,n(K) are distinct for different pairs (m, n) (except 
for the cases noted in Remark 25.2.7), the multiplication operation is not confined to a single such space as 
the addition operation is. Thus the relations between addition and multiplication are not as straightforward 
as in most of the algebraic structures in Chapters 17, 18 and 19. 


25.3.5 THEOREM: The dimension of a linear space of matrices is the product of its depth and width. 
Let $K$ be a field and $m,n \in \mathbb{Z}_0^+$. Then $\dim(M_{m,n}(K)) = mn$.


PROOF: The linear space $M_{m,n}(K)$ is clearly spanned by $\{e_{k\ell};\ (k,\ell) \in \mathbb{N}_m \times \mathbb{N}_n\}$, where $e_{k\ell} \in M_{m,n}(K)$ is
defined by $e_{k\ell}(i,j) = \delta_{ik}\delta_{j\ell}$ for $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$, for all $(k,\ell) \in \mathbb{N}_m \times \mathbb{N}_n$. These $mn$ vectors are also clearly
linearly independent. So they are a basis for $M_{m,n}(K)$ by Definition 22.7.2. Hence $\dim(M_{m,n}(K)) = mn$ by
Theorem 22.7.16. (Alternatively, the assertion follows more directly from Theorem 22.7.18.)


25.3.6 REMARK: Matrix multiplication is defined only for “conformable” matrices.

The multiplication operation $(A, B) \mapsto AB$ for matrices is defined only for matrices $A$ and $B$ such that the
number of columns of $A$ equals the number of rows of $B$. Cullen [64], page 15, calls such ordered pairs of
matrices “conformable for multiplication”.


25.3.7 DEFINITION: The (matrix) product of an ordered pair of matrices $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$
for a field $K$ and $m,n,p \in \mathbb{Z}_0^+$ is the matrix $C \in M_{m,n}(K)$ defined by


$\forall (i,j) \in \mathbb{N}_m \times \mathbb{N}_n,\quad C(i,j) = \sum_{k=1}^p A(i,k)\,B(k,j).$


25.3.8 NOTATION: $AB$ denotes the matrix product of matrices $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$ for a field
$K$ and $m,n,p \in \mathbb{Z}_0^+$.
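
The conformability requirement and the product formula can be sketched as follows. This is an illustration only: it assumes non-empty matrices stored as lists of rows with 0-based list positions standing in for the 1-based indices of Definition 25.3.7, and the names `shape` and `product` are hypothetical.

```python
def shape(A):
    """Number of rows and columns of a non-empty matrix stored as a list of rows."""
    return len(A), len(A[0])

def product(A, B):
    """Definition 25.3.7: C(i, j) is the sum over k of A(i, k) B(k, j)."""
    m, p = shape(A)
    p2, n = shape(B)
    if p != p2:
        raise ValueError("not conformable for multiplication")
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]           # 3 x 2
AB = product(A, B)     # 2 x 2
BA = product(B, A)     # 3 x 3
```

Both `AB` and `BA` exist here, but they lie in different matrix spaces, which illustrates why the product is not a closed operation on any single space of non-square matrices. A call such as `product(A, A)` raises `ValueError` in this sketch because a $2 \times 3$ matrix is not conformable with itself.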


25.3.9 THEOREM: Associativity of matrix multiplication. 
Let $A \in M_{m,p}(K)$, $B \in M_{p,q}(K)$, $C \in M_{q,n}(K)$ for a field $K$ and $m,n,p,q \in \mathbb{Z}_0^+$. Then $(AB)C = A(BC)$.


PROOF: For $A$, $B$ and $C$ as stated, $AB \in M_{m,q}(K)$ by Definition 25.3.7. So $(AB)C$ is a well-defined
matrix in $M_{m,n}(K)$ by Definition 25.3.7. Similarly, $A(BC)$ is a well-defined matrix in $M_{m,n}(K)$. Then
$(AB)(i,\ell) = \sum_{k=1}^p A(i,k)B(k,\ell)$ for all $(i,\ell) \in \mathbb{N}_m \times \mathbb{N}_q$. So $((AB)C)(i,j) = \sum_{\ell=1}^q (AB)(i,\ell)\,C(\ell,j) =
\sum_{\ell=1}^q \bigl(\sum_{k=1}^p A(i,k)B(k,\ell)\bigr) C(\ell,j) = \sum_{k=1}^p \sum_{\ell=1}^q A(i,k)B(k,\ell)C(\ell,j)$ for all $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$. (Removal
of the parentheses around the inner summation expression is justified by distributivity in $K$. Swapping
the summation operators is justified by the commutativity of addition in $K$. Removal of the parentheses
in the expression “$A(i,k)B(k,\ell)C(\ell,j)$” is justified by the associativity of multiplication in $K$.) Similarly
$(A(BC))(i,j) = \sum_{k=1}^p \sum_{\ell=1}^q A(i,k)B(k,\ell)C(\ell,j)$ for all $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$. Hence $(AB)C = A(BC)$.


25.3.10 REMARK: Entries in a matrix product equal products of left rows and right columns. 

Theorem 25.3.11 demonstrates the usual way in which matrix products are taught and thought of. Each 
entry $(AB)(i,j)$ in the product $AB$ of $A$ and $B$ is computed as the sum of products of entries in $\mathrm{Row}_i(A)$
and $\mathrm{Col}_j(B)$. A minor pedantic subtlety here is that since this row-times-column product is a $1 \times 1$ matrix,
its value must be extracted by making it act on the index pair (1, 1). 


25.3.11 THEOREM: Formula for matrix product in terms of row/column products.
Let $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$ for a field $K$ and $m,n,p \in \mathbb{Z}_0^+$. Then


$\forall i \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,\quad (AB)(i,j) = \sum_{k=1}^p \mathrm{Row}_i(A)(1,k)\,\mathrm{Col}_j(B)(k,1)$ (25.3.1)
$\qquad = (\mathrm{Row}_i(A)\,\mathrm{Col}_j(B))(1,1).$ (25.3.2)


PROOF: Line (25.3.1) follows from Definitions 25.3.7 and 25.2.14. Line (25.3.2) follows from line (25.3.1)
and Definition 25.3.7 for the product of the $1 \times p$ and $p \times 1$ matrices $\mathrm{Row}_i(A)$ and $\mathrm{Col}_j(B)$.


25.3.12 REMARK: Rows and columns of matrix products are spanned by right rows and left columns.
Theorem 25.3.13 combines the elements in each row or column of a matrix product to conclude that each 
row of a matrix product AB equals a linear combination of the rows of B, and each column of AB equals a 
linear combination of the columns of A. This elementary observation has important consequences because 
it allows some of the theory of linear spaces to be applied to matrices. 


25.3.13 THEOREM: Formulas for matrix product rows and columns as row/column linear combinations.
Let $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$ for a field $K$ and $m,n,p \in \mathbb{Z}_0^+$. Then


$\forall i \in \mathbb{N}_m,\quad \mathrm{Row}_i(AB) = \sum_{k=1}^p a_{ik}\,\mathrm{Row}_k(B)$ (25.3.3)
and
$\forall j \in \mathbb{N}_n,\quad \mathrm{Col}_j(AB) = \sum_{k=1}^p b_{kj}\,\mathrm{Col}_k(A),$ (25.3.4)


where $A = [a_{ik}]_{i=1,k=1}^{m,p}$ and $B = [b_{kj}]_{k=1,j=1}^{p,n}$.


PROOF: For line (25.3.3), $\mathrm{Row}_i(AB) = \{((1,j), (AB)(i,j));\ j \in \mathbb{N}_n\} \in M_{1,n}(K)$ by Definition 25.2.14 for
all $i \in \mathbb{N}_m$. Thus $\mathrm{Row}_i(AB)(1,j) = (AB)(i,j)$ for all $j \in \mathbb{N}_n$. Then $\mathrm{Row}_i(AB)(1,j) = \sum_{k=1}^p a_{ik} B(k,j)$
by Definition 25.3.7. But $\mathrm{Row}_k(B)(1,j) = B(k,j)$ by Definition 25.2.14. Therefore $\mathrm{Row}_i(AB)(1,j) =
\sum_{k=1}^p a_{ik}\,\mathrm{Row}_k(B)(1,j)$ for all $j \in \mathbb{N}_n$. Hence $\mathrm{Row}_i(AB) = \sum_{k=1}^p a_{ik}\,\mathrm{Row}_k(B)$ by the vector addition and
scalar multiplication operations defined on $M_{1,n}(K)$ in Definition 25.3.2. This verifies line (25.3.3).

For line (25.3.4), $\mathrm{Col}_j(AB) = \{((i,1), (AB)(i,j));\ i \in \mathbb{N}_m\} \in M_{m,1}(K)$ for all $j \in \mathbb{N}_n$ by Definition 25.2.14.
So $\mathrm{Col}_j(AB)(i,1) = (AB)(i,j)$ for all $i \in \mathbb{N}_m$. Then $\mathrm{Col}_j(AB)(i,1) = \sum_{k=1}^p A(i,k)\,b_{kj} = \sum_{k=1}^p b_{kj}\,A(i,k)$
by Definition 25.3.7 and the commutativity of multiplication in $K$. But $\mathrm{Col}_k(A)(i,1) = A(i,k)$. So $\mathrm{Col}_j(AB)(i,1) = \sum_{k=1}^p b_{kj}\,\mathrm{Col}_k(A)(i,1)$ for all $i \in \mathbb{N}_m$.
Hence $\mathrm{Col}_j(AB) = \sum_{k=1}^p b_{kj}\,\mathrm{Col}_k(A)$ by the vector addition and scalar multiplication operations for $M_{m,1}(K)$
in Definition 25.3.2. This verifies line (25.3.4).
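
Lines (25.3.3) and (25.3.4) can be checked on a small case by building each row and column of the product from vector operations alone. The following minimal sketch assumes non-empty integer matrices stored as lists of rows with 0-based list positions. All function names and the sample matrices are hypothetical.

```python
def product(A, B):
    """Definition 25.3.7 for non-empty matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def scale(c, v):
    return [c * x for x in v]

def add(u, v):
    return [x + y for x, y in zip(u, v)]

def row_of_product(A, B, i):
    """Right-hand side of line (25.3.3): the sum over k of a_ik Row_k(B)."""
    total = [0] * len(B[0])
    for k in range(len(B)):
        total = add(total, scale(A[i][k], B[k]))
    return total

def column_of_product(A, B, j):
    """Right-hand side of line (25.3.4): the sum over k of b_kj Col_k(A)."""
    total = [0] * len(A)
    for k in range(len(B)):
        column_k = [A[i][k] for i in range(len(A))]
        total = add(total, scale(B[k][j], column_k))
    return total

A = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]        # 2 x 3
AB = product(A, B)
rows = [row_of_product(A, B, i) for i in range(3)]
columns = [column_of_product(A, B, j) for j in range(3)]
```

In this example every row of `AB` is a combination of the two rows of `B`, and every column of `AB` is a combination of the two columns of `A`.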


25.3.14 REMARK: More expressions for the rows and columns of a matrix product. 
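Lines (25.3.3) and (25.3.4) in Theorem 25.3.13 may also be rewritten as follows.


$\forall i \in \mathbb{N}_m,\quad \mathrm{Row}_i(AB) = \sum_{k=1}^p \mathrm{Row}_i(A)(1,k)\,\mathrm{Row}_k(B)$
$\qquad = \mathrm{Row}_i(A)\,B$
and
$\forall j \in \mathbb{N}_n,\quad \mathrm{Col}_j(AB) = \sum_{k=1}^p \mathrm{Col}_j(B)(k,1)\,\mathrm{Col}_k(A)$
$\qquad = A\,\mathrm{Col}_j(B).$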


25.3.15 REMARK: Layout conventions for matrix products and matrix equations.

Hand-written matrix algebra is most convenient when the vector components are depicted as column vectors. 
In other words, the elements of the n-tuples are listed in increasing order down the page. This gives the best 
layout when such a vector is multiplied by a matrix as in the following. 


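$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$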


The use of vertical (i.e. column) vectors keeps handwritten calculations compact in the horizontal direction. 
A further advantage of column vectors is that the composition of multiple linear transformations gives the 
same order for matrices. Thus if y = f(x) = Ax and z = g(y) = By for matrices A and B, the composition 
g(f(x)) equals BAx. Thus in both notations, the order is the reverse of the temporal (or causal) order which 
intuitively underlies the composition of functions. An $m \times n$ matrix over a field $K$ is used for linear maps
$f : K^n \to K^m$, which has $m$ and $n$ in the reverse order. An advantage of multiplying a vector by a matrix
from the left is that the indices are contiguous. For example, $y_i = \sum_{j=1}^n a_{ij} x_j$. The two instances of the
index $j$ are close to each other.


In Markov chain theory, the reverse convention is adopted. There the vectors are row vectors, which makes the 
matrix multiplication order the reverse of the standard function composition notation order. The advantage 
in this case is that the matrix multiplication order matches the temporal order since the initial state of the 
system is a row vector which is multiplied on the right by state transition matrices. For example, in the 
equation $w_k = \sum_{i=1}^m \sum_{j=1}^n v_i P_{ij} Q_{jk}$, the initial state is $v$ and one or more matrices on the right modify the state,
first P then Q. 
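
The two layout conventions can be compared on a small case. The following minimal sketch assumes exact rational arithmetic and non-empty matrices stored as lists of rows. The matrices `A`, `B`, `P`, `Q` and the vectors `x`, `v` are hypothetical.

```python
from fractions import Fraction

def product(A, B):
    """Matrix product for non-empty, conformable matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Column convention: y = Ax is applied first, then z = By, so z = (BA)x.
A = [[1, 2], [0, 1]]
B = [[0, 1], [1, 0]]
x = [[1], [1]]                                   # a column matrix
z_stepwise = product(B, product(A, x))
z_composed = product(product(B, A), x)

# Row (Markov chain) convention: P acts first, then Q, so w = v(PQ).
half = Fraction(1, 2)
P = [[0, 1], [1, 0]]
Q = [[half, half], [0, 1]]
v = [[1, 0]]                                     # a row matrix
w_stepwise = product(product(v, P), Q)
w_composed = product(v, product(P, Q))
```

In the column convention the matrix applied first stands furthest to the right, whereas in the row convention the written order `P`, `Q` matches the temporal order. Swapping the factors changes the result in both cases for these sample matrices.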


25.3.16 REMARK: A not-very-useful generalisation of rectangular matrix algebra. 

The matrix product is a map $\tau : M_{m,p}(K) \times M_{p,n}(K) \to M_{m,n}(K)$. This operation is not closed in any nice
way, but it can be made closed by embedding the spaces $M_{m,n}(K)$ in the space of sparse infinite matrices
$M^0_{\infty,\infty}(K) = \{C \in K^{\mathbb{N} \times \mathbb{N}};\ \#\{(i,j) \in \mathbb{N} \times \mathbb{N};\ C(i,j) \ne 0\} < \infty\}$, whose product operation is defined by


$\forall A, B \in M^0_{\infty,\infty}(K),\ \forall (i,j) \in \mathbb{N} \times \mathbb{N},$
$(AB)(i,j) = \sum_{k=1}^\infty A(i,k)\,B(k,j).$ (25.3.5)


Accordingly, general matrices $A \in M_{m,p}(K)$ and $B \in M_{q,n}(K)$ for a field $K$ and $m,n,p,q \in \mathbb{Z}_0^+$ may be
multiplied to give $AB \in M_{m,n}(K)$ defined by


$\forall A \in M_{m,p}(K),\ \forall B \in M_{q,n}(K),\ \forall (i,j) \in \mathbb{N}_m \times \mathbb{N}_n,$
$(AB)(i,j) = \sum_{k=1}^{\min(p,q)} A(i,k)\,B(k,j).$


This calculation gives the same result as (25.3.5). It is probably not very useful for differential geometry.


The infinite matrix embedding idea also makes possible the addition of arbitrary matrices $A \in M_{m_1,n_1}(K)$
and $B \in M_{m_2,n_2}(K)$ for a field $K$ and $m_1, n_1, m_2, n_2 \in \mathbb{Z}_0^+$. A matrix sum $A + B \in M_{m_3,n_3}(K)$ may be defined
for $m_3 = \max(m_1, m_2)$ and $n_3 = \max(n_1, n_2)$ by


$\forall A \in M_{m_1,n_1}(K),\ \forall B \in M_{m_2,n_2}(K),\ \forall (i,j) \in \mathbb{N}_{m_3} \times \mathbb{N}_{n_3},$
$(A + B)(i,j) = A(i,j) + B(i,j),$


where $0_K$ is assumed for each right-hand term if it is undefined. (In other words, the infinite embedded
matrices are “zero filled”.) As in the case of the generalised matrix product, this generalised matrix addition 
is also of dubious practical value. 
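
The zero-filled product and sum can be sketched with sparse dictionaries, in which only the non-zero entries are stored. This is an illustration under stated assumptions: integer entries, 1-based index pairs as keys, and the hypothetical names `sparse_product` and `sparse_sum`.

```python
def sparse_product(A, B):
    """Product of finitely supported matrices on N x N, as on line (25.3.5)."""
    C = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k == k2:
                C[i, j] = C.get((i, j), 0) + a * b
    return {key: val for key, val in C.items() if val != 0}

def sparse_sum(A, B):
    """Entry-wise sum, with absent entries treated as zero."""
    C = dict(A)
    for key, b in B.items():
        C[key] = C.get(key, 0) + b
    return {key: val for key, val in C.items() if val != 0}

A = {(1, 1): 1, (1, 2): 2, (1, 3): 3}    # a 1 x 3 matrix
B = {(1, 1): 4, (2, 1): 5}               # a 2 x 1 matrix
AB = sparse_product(A, B)                # only k = 1, 2 = min(3, 2) contribute
A_plus_B = sparse_sum(A, B)              # fits in a 2 x 3 zero-filled array
```

One consequence of this representation is that the original shapes are forgotten: a matrix whose last row or column is zero cannot be distinguished from a smaller matrix.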


25.4. Identity matrices and left and right inverse matrices 


25.4.1 REMARK: Unit matrices. (Also known as identity matrices.) 

The multiplicative identity matrix in Definition 25.4.2 is well defined for any field K because every field has 
a unique zero and unity element. (As mentioned in Remark 18.2.3, the zero and unity element are unique 
even in a unitary ring. Fields are a sub-class of unitary rings.) The Kronecker symbol $\delta_{ij}$ is defined in this
context to have entries which are either the zero or unity element of the field $K$. See Definition 14.7.10 and
Notation 14.7.11 for the Kronecker delta symbol $\delta_{ij}$.


Unit matrices are also often known as identity matrices. There are arguments for and against each name. 
The term “unit matrix" can be confused with the similar-sounding “unitary matrix". The term “identity 
matrix" is ambiguous because it could be an additive or multiplicative identity. Therefore both terms are 
used according to convenience. The additive identity matrix is generally known as the "zero matrix". So 
there is little real danger of ambiguity for the term “identity matrix". 


25.4.2 DEFINITION: The unit matrix or identity matrix in a matrix space $M_{n,n}(K)$ for a field $K$ and
$n \in \mathbb{Z}_0^+$ is the matrix $A \in M_{n,n}(K)$ defined by $A(i,j) = \delta_{ij}$ for all $i,j \in \mathbb{N}_n$.


25.4.3 NOTATION: $I_n$ denotes the identity matrix in $M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. In other words,
$\forall n \in \mathbb{Z}_0^+,\ \forall (i,j) \in \mathbb{N}_n \times \mathbb{N}_n,\ I_n(i,j) = \delta_{ij}$.


25.4.4 THEOREM: Identity matrices act as left and right identities for rectangular matrices. 
Let $A \in M_{m,n}(K)$ be a matrix over a field $K$ with $m,n \in \mathbb{Z}_0^+$. Then $I_m A = A I_n = A$.


PROOF: Let $K$ be a field. Let $m,n \in \mathbb{Z}_0^+$ and $A \in M_{m,n}(K)$. Then $I_m A$ is a well-defined matrix in
$M_{m,n}(K)$ because $I_m \in M_{m,m}(K)$, and $(I_m A)(i,j) = \sum_{k=1}^m I_m(i,k)\,A(k,j) = \sum_{k=1}^m \delta_{ik}\,A(k,j) = A(i,j)$ for
all $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$. Hence $I_m A = A$. Similarly $A I_n = A$.


25.4.5 DEFINITION: A left inverse of a matrix $A \in M_{m,n}(K)$ for a field $K$ and $m,n \in \mathbb{Z}_0^+$ is a matrix
$B \in M_{n,m}(K)$ such that $BA = I_n$.

A right inverse of a matrix $A \in M_{m,n}(K)$ for a field $K$ and $m,n \in \mathbb{Z}_0^+$ is a matrix $B \in M_{n,m}(K)$ such
that $AB = I_m$.
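
A non-square matrix can have a right inverse which is not a left inverse. The following minimal sketch shows this for a hypothetical $2 \times 3$ matrix, assuming non-empty integer matrices stored as lists of rows. The names `product`, `identity`, `A` and `A_prime` are illustrative.

```python
def product(A, B):
    """Matrix product for non-empty, conformable matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 0, 0],
     [0, 1, 0]]               # 2 x 3
A_prime = [[1, 0],
           [0, 1],
           [0, 0]]            # 3 x 2
right = product(A, A_prime)   # 2 x 2
left = product(A_prime, A)    # 3 x 3
```

Here `A_prime` is a right inverse of `A`, because `right` is the $2 \times 2$ identity, but it is not a left inverse of `A`, because `left` has a zero in its last diagonal entry.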


25.4.6 REMARK: Left and right inverses of products of matrices. 
Theorems 25.4.7 and 25.4.8 give some idea of what can be demonstrated for left and right inverses of general 
rectangular matrices. 

25.4.7 THEOREM: Some properties of left and right inverses of products of matrices.
Let $A \in M_{m,n}(K)$, $A' \in M_{n,m}(K)$, $B \in M_{n,p}(K)$ and $B' \in M_{p,n}(K)$ for some field $K$ and $m,n,p \in \mathbb{Z}_0^+$.
(i) If $AA' = I_m$ and $BB' = I_n$, then $ABB'A' = I_m$.
(ii) If $A$ is a left inverse for $A'$ and $B$ is a left inverse for $B'$, then $AB$ is a left inverse for $B'A'$.
(iii) If $A'$ is a right inverse for $A$ and $B'$ is a right inverse for $B$, then $B'A'$ is a right inverse for $AB$.


PROOF: For part (i), let $AA' = I_m$ and $BB' = I_n$. Then $ABB'A' = A(BB')A' = A I_n A' = AA' = I_m$ since
matrix multiplication is associative by Theorem 25.3.9. 
Parts (ii) and (iii) follow from part (i) and Definition 25.4.5. 


25.4.8 THEOREM: More properties of left and right inverses of products of matrices.
Let $A, B' \in M_{m,n}(K)$ and $A', B \in M_{n,m}(K)$ for some field $K$ and $m,n \in \mathbb{Z}_0^+$.
(i) If $AA' = I_m$ and $BB' = I_n$, then $ABB'A' = I_m$ and $BAA'B' = I_n$.


(ii) If $A$ is a left inverse for $A'$ and $B$ is a left inverse for $B'$, then $AB$ is a left inverse for $B'A'$ and $BA$ is
a left inverse for $A'B'$.


(iii) If $A'$ is a right inverse for $A$ and $B'$ is a right inverse for $B$, then $B'A'$ is a right inverse for $AB$ and
$A'B'$ is a right inverse for $BA$.


PROOF: For part (i), let $AA' = I_m$ and $BB' = I_n$. Then $ABB'A' = A(BB')A' = A I_n A' = AA' = I_m$ and
$BAA'B' = B(AA')B' = B I_m B' = BB' = I_n$ since matrix multiplication is associative by Theorem 25.3.9.
Parts (ii) and (iii) follow from part (i) and Definition 25.4.5. 


25.4.9 REMARK: Constraints on matrix sizes for left and right inverses. 

The matrix size parameters $m$, $n$ and $p$ in Theorem 25.4.7 cannot be freely chosen. If $AA' = I_m$ and
$BB' = I_n$, then the constraints $m \le n$ and $n \le p$ follow from Theorem 25.5.14. (The proof of this exploits
some properties of the row rank and column rank of matrices. Therefore this topic is delayed to Section 25.5.)
Similarly, the equalities $AA' = I_m$ and $BB' = I_n$ in Theorem 25.4.8 imply that $m \le n$ and $n \le m$, and
so $m = n$. However, even if $m = n$, it does not follow by pure algebra from $AA' = I_m$ that $A'A = I_m$.
Matrices are not commutative in general, and the case of infinite matrices in Example 25.4.10 shows that
$AA' = I$ is consistent with $A'A \ne I$ for an infinite identity matrix $I$. The peculiar properties of
finite-dimensional linear spaces are required in order to show that $AA' = I_m$ implies $A'A = I_m$ for a square
matrix $A$.


25.4.10 EXAMPLE: Infinite square matrices for which left inverse does not imply right inverse. 

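In the space $M_{\infty,\infty} = K^{\mathbb{Z}^+ \times \mathbb{Z}^+}$ of infinite matrices with positive integer indices and values in a
field $K$, it is possible to have $AA' = I$ and $A'A \ne I$, where $I \in M_{\infty,\infty}$ is defined as the unit matrix
$[\delta_{ij}]_{i,j=1}^\infty$. Consider the matrices $A = [a_{ij}]_{i,j=1}^\infty$ and $A' = [a'_{ij}]_{i,j=1}^\infty$ with $a_{ij} = \delta_{i+1,j}$ and
$a'_{ij} = \delta_{i,j+1}$ for all $i,j \in \mathbb{Z}^+$. These matrices may be visualised as follows.

$$
A = [a_{ij}]_{i,j=1}^\infty =
\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots \\
0 & 0 & 1 & 0 & \cdots \\
0 & 0 & 0 & 1 & \cdots \\
0 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix},
\qquad
A' = [a'_{ij}]_{i,j=1}^\infty =
\begin{bmatrix}
0 & 0 & 0 & 0 & \cdots \\
1 & 0 & 0 & 0 & \cdots \\
0 & 1 & 0 & 0 & \cdots \\
0 & 0 & 1 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}.
$$

Then $\sum_{k=1}^\infty a_{ik} a'_{kj} = \sum_{k=1}^\infty \delta_{i+1,k}\,\delta_{k,j+1} = \delta_{i+1,j+1} = \delta_{ij}$ for all $i,j \in \mathbb{Z}^+$. So $AA' = I$.
But $\sum_{k=1}^\infty a'_{ik} a_{kj} = \sum_{k=1}^\infty \delta_{i,k+1}\,\delta_{k+1,j} = \sum_{k=2}^\infty \delta_{i,k}\,\delta_{k,j} = \delta_{ij}(1 - \delta_{i,1})$ for all $i,j \in \mathbb{Z}^+$.
So $A'A \ne I$ because the entry $(1,1)$ in $I$ is missing in $A'A$.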
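
The index computation in Example 25.4.10 can be checked mechanically. The following Python fragment is only an illustrative sketch, not part of the formal development: the function names are invented here, the field elements $0_K$ and $1_K$ are modelled by the integers 0 and 1, and the infinite sums are truncated to the finitely many terms which can be non-zero.

```python
def a(i, j):
    """Entry a_ij = delta_{i+1,j} of the infinite matrix A (indices start at 1)."""
    return 1 if i + 1 == j else 0


def a_prime(i, j):
    """Entry a'_ij = delta_{i,j+1} of the infinite matrix A'."""
    return 1 if i == j + 1 else 0


def product_entry(x, y, i, j):
    """Entry (i, j) of the product XY for X, Y in {A, A'}.

    Row i of A or A' has at most one non-zero entry, and it lies in a
    column k <= i + 1, so the infinite sum over k reduces to a finite sum.
    """
    return sum(x(i, k) * y(k, j) for k in range(1, i + 2))


N = 6  # size of the top-left corner to inspect
corner_AA = [[product_entry(a, a_prime, i, j) for j in range(1, N + 1)]
             for i in range(1, N + 1)]
corner_AprimeA = [[product_entry(a_prime, a, i, j) for j in range(1, N + 1)]
                  for i in range(1, N + 1)]

for row in corner_AA:
    print(row)        # rows of the identity matrix
print()
for row in corner_AprimeA:
    print(row)        # identity except for a zero in position (1, 1)
```

Only a finite corner of each product can be printed, but the function `product_entry` computes each individual entry of the infinite products exactly.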


25.4.11 DEFINITION: The transpose of a matrix $A \in M_{m,n}(K)$ for a field $K$ and $m,n \in \mathbb{Z}_0^+$ is the matrix
$B \in M_{n,m}(K)$ defined by $B(j,i) = A(i,j)$ for all $i \in \mathbb{N}_m$ and $j \in \mathbb{N}_n$.


25.4.12 NOTATION: $A^T$ denotes the transpose of $A \in M_{m,n}(K)$ for a field $K$ and $m,n \in \mathbb{Z}_0^+$.


25.4.13 REMARK: An alternative notation for the transpose of a matrix.

The notation ${}^tA$ for the transpose of a matrix $A$ is also popular. However, a mixture of prefix and postfix
superscripts (or subscripts) can be confusing. (For example, it might not be clear which order of computation
is implied by expressions such as ${}^tA^2$ and ${}^tA^{-1}$. The fact that they may give the same answer does not
remove the ambiguity as to which order of operations is signified.) Therefore only postfix superscripts and
subscripts are used in this book.


25.4.14 REMARK: Some obvious properties of the transpose of a matrix. 
Clearly the transpose of the transpose of a matrix is the original matrix. That is, $(A^T)^T = A$ for all
$A \in M_{m,n}(K)$ for any field $K$ and $m,n \in \mathbb{Z}_0^+$.


A matrix space $M_{m,n}(K)$ is not closed under the transpose operation unless $m = n$. The transpose of a
column vector $X \in M_{m,1}(K)$ is a row vector $X^T \in M_{1,m}(K)$. The transpose of a row vector $Y \in M_{1,n}(K)$
is a column vector $Y^T \in M_{n,1}(K)$.

It is readily verified that the transpose operation commutes with both addition and scalar multiplication on
a linear space of matrices. In other words, $(A + B)^T = A^T + B^T$ and $(\lambda A)^T = \lambda A^T$ for $A, B \in M_{m,n}(K)$
and $\lambda \in K$.


25.4.15 THEOREM: The transpose of a matrix product equals the reverse product of the transposes. 
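For any field $K$, $\forall m,n,p \in \mathbb{Z}_0^+$, $\forall A \in M_{m,p}(K)$, $\forall B \in M_{p,n}(K)$, $(AB)^T = B^T A^T$.


PROOF: Suppose $m,n,p \in \mathbb{Z}_0^+$, $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$ for some field $K$. Write $A = [a_{jk}]$ and
$B = [b_{ki}]$. Then for all $i \in \mathbb{N}_n$ and $j \in \mathbb{N}_m$,
$(AB)^T_{ij} = (AB)_{ji} = \sum_{k=1}^p a_{jk} b_{ki} = \sum_{k=1}^p b_{ki} a_{jk} = \sum_{k=1}^p B^T_{ik} A^T_{kj} = (B^T A^T)_{ij}$, as claimed.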
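
As a concrete sanity check of Theorem 25.4.15, the following Python sketch compares $(AB)^T$ with $B^TA^T$ for one pair of small integer matrices. The helper functions `transpose` and `matmul` and the sample matrices are assumptions made for this illustration; matrices are modelled as lists of rows with at least one row and one column.

```python
def transpose(a):
    """Return the transpose of a non-empty matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Return the matrix product ab, where len(a[0]) == len(b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 2, 3],
     [4, 5, 6]]      # a 2 x 3 matrix
B = [[1, 0],
     [2, -1],
     [0, 3]]         # a 3 x 2 matrix

lhs = transpose(matmul(A, B))             # (AB)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T
print(lhs == rhs)
```

A single numerical instance proves nothing in general, but it does confirm that the index conventions in the proof have been read correctly.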


25.5. Row and column span and rank 


25.5.1 REMARK: Row matrix span and column matrix span. 

One naturally defines a matrix inverse to be a matrix which is both a left inverse and a right inverse. This 
is complicated by three main issues. The first issue is that the set of all rectangular matrices is not even a 
semigroup, although the multiplication operation is associative whenever it is defined, and the matrices of 
each fixed size constitute a linear space. The second issue is that the identity matrices in Definition 25.4.2 
are not unique. Consequently, the left and right inverses in Definition 25.4.5 are defined with reference to 
an infinite set of identity matrices. The third issue is that a two-sided inverse matrix must be square, but 
to prove this requires some understanding of the span of the set of row matrices and the span of the set 
of column matrices of a given matrix. Therefore in order to prove Theorem 25.5.14, some definitions and 
properties for the row matrix span and the column matrix span are required. 


25.5.2 DEFINITION: The row (matrix) span of a matrix $A \in M_{m,n}(K)$, for some field $K$ and $m,n \in \mathbb{Z}_0^+$,
is the span of the set of $m$ row matrices of $A$ in the linear space $M_{1,n}(K)$.


The column (matrix) span of a matrix $A \in M_{m,n}(K)$, for some field $K$ and $m,n \in \mathbb{Z}_0^+$, is the span of the
set of $n$ column matrices of $A$ in the linear space $M_{m,1}(K)$.


25.5.3 NOTATION: $\mathrm{Rowspan}(A)$, for a matrix $A \in M_{m,n}(K)$ for some field $K$ and $m,n \in \mathbb{Z}_0^+$, denotes the
row matrix span of $A$. In other words,

$$
\mathrm{Rowspan}(A) = \mathrm{span}(\{\mathrm{Row}_i(A);\ i \in \mathbb{N}_m\})
= \Bigl\{ \sum_{i=1}^m \lambda_i \mathrm{Row}_i(A);\ (\lambda_i)_{i=1}^m \in K^m \Bigr\}.
$$

$\mathrm{Colspan}(A)$, for a matrix $A \in M_{m,n}(K)$ for some field $K$ and $m,n \in \mathbb{Z}_0^+$, denotes the column matrix span
of $A$. In other words,

$$
\mathrm{Colspan}(A) = \mathrm{span}(\{\mathrm{Col}_j(A);\ j \in \mathbb{N}_n\})
= \Bigl\{ \sum_{j=1}^n \mu_j \mathrm{Col}_j(A);\ (\mu_j)_{j=1}^n \in K^n \Bigr\}.
$$


25.5.4 REMARK: Row and column spans are subspaces of row and column matrix spaces. 

It is fairly obvious that $\mathrm{Rowspan}(A)$ is a subspace of the linear space of row matrices $M_{1,n}(K)$, and
$\mathrm{Colspan}(A)$ is a subspace of the linear space of column matrices $M_{m,1}(K)$, where $A \in M_{m,n}(K)$ for some
field $K$ and $m,n \in \mathbb{Z}^+$. This is shown in Theorem 25.5.5 because it's better to be safe than sorry!


25.5.5 THEOREM: Row and column spans are subspaces of row and column matrix spaces.
Let $K$ be a field and $m,n \in \mathbb{Z}^+$. Let $A \in M_{m,n}(K)$.

(i) $\mathrm{Rowspan}(A)$ is a linear subspace of $M_{1,n}(K)$.

(ii) $\mathrm{Colspan}(A)$ is a linear subspace of $M_{m,1}(K)$.


PROOF: For part (i), let $v \in \mathrm{Rowspan}(A)$. Then by Definition 25.5.2, $v = \sum_{i=1}^m \lambda_i \mathrm{Row}_i(A)$ for some
$(\lambda_i)_{i=1}^m \in K^m$. But $\mathrm{Row}_i(A) \in M_{1,n}(K)$ for all $i \in \mathbb{N}_m$ by Notation 25.2.16 and Definition 25.2.14. Hence
$v \in M_{1,n}(K)$ by the closure of $M_{1,n}(K)$ under linear combinations by Definition 25.3.2.

For part (ii), let $w \in \mathrm{Colspan}(A)$. Then $w = \sum_{j=1}^n \mu_j \mathrm{Col}_j(A)$ for some $(\mu_j)_{j=1}^n \in K^n$ by Definition 25.5.2.
But $\mathrm{Col}_j(A) \in M_{m,1}(K)$ for all $j \in \mathbb{N}_n$ by Notation 25.2.16 and Definition 25.2.14. Hence $w \in M_{m,1}(K)$ by
the closure of $M_{m,1}(K)$ under linear combinations by Definition 25.3.2.


25.5.6 THEOREM: Relations between matrix product row/column spans and individual row/column spans.
Let $A \in M_{m,n}(K)$ and $B \in M_{n,m}(K)$ for some field $K$ and $m,n \in \mathbb{Z}^+$.


(i) $\forall i \in \mathbb{N}_m$, $\mathrm{Row}_i(AB) \in \mathrm{Rowspan}(B)$.

(ii) $\forall j \in \mathbb{N}_m$, $\mathrm{Col}_j(AB) \in \mathrm{Colspan}(A)$.
(iii) $\mathrm{Rowspan}(AB)$ is a linear subspace of $\mathrm{Rowspan}(B)$.
(iv) $\mathrm{Colspan}(AB)$ is a linear subspace of $\mathrm{Colspan}(A)$.


PROOF: Part (i) follows from Theorem 25.3.13 line (25.3.3) and Definition 25.5.2.
Part (ii) follows from Theorem 25.3.13 line (25.3.4) and Definition 25.5.2.

For part (iii), note that $\mathrm{Rowspan}(AB) = \{\sum_{i=1}^m \lambda_i \mathrm{Row}_i(AB);\ (\lambda_i)_{i=1}^m \in K^m\}$ as in Notation 25.5.3. So
$\mathrm{Rowspan}(AB) \subseteq \mathrm{Rowspan}(B)$ by part (i) because $\mathrm{Rowspan}(B)$ is a linear space. But $\mathrm{Rowspan}(AB)$ is closed
under linear combinations. Hence $\mathrm{Rowspan}(AB)$ is a linear subspace of $\mathrm{Rowspan}(B)$.

For part (iv), note that $\mathrm{Colspan}(AB) = \{\sum_{j=1}^m \mu_j \mathrm{Col}_j(AB);\ (\mu_j)_{j=1}^m \in K^m\}$ as in Notation 25.5.3. So
$\mathrm{Colspan}(AB) \subseteq \mathrm{Colspan}(A)$ by part (ii) because $\mathrm{Colspan}(A)$ is a linear space. But $\mathrm{Colspan}(AB)$ is closed
under linear combinations. Hence $\mathrm{Colspan}(AB)$ is a linear subspace of $\mathrm{Colspan}(A)$.


25.5.7 REMARK: Row rank and column rank of a matrix. 

Since the row span and column span of a matrix are linear subspaces of the corresponding row and column 
matrix linear spaces (if they inherit the associated linear space structures), these spans have a well-defined 
finite dimension as linear spaces. These are defined to be the row rank and column rank respectively in 
Definition 25.5.8. 


25.5.8 DEFINITION: The row rank of a matrix $A \in M_{m,n}(K)$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$, is the dimension
of the row span of $A$.

The column rank of a matrix $A \in M_{m,n}(K)$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$, is the dimension of the column
span of $A$.


25.5.9 NOTATION: 
Rowrank( A), for a matrix A, denotes the row rank of A. In other words, Rowrank(A) = dim(Rowspan(A)). 


Colrank( A), for a matrix A, denotes the column rank of A. In other words, Colrank(A) = dim(Colspan( A)). 


25.5.10 REMARK: Some constraints for row rank and column rank of products of matrices. 
It follows from Theorem 25.3.5 that $\dim(\mathrm{Rowspan}(A)) \le n$ and $\dim(\mathrm{Colspan}(A)) \le m$. This yields the
inequalities for the row rank and column rank of a product of two matrices in Theorem 25.5.11.


25.5.11 THEOREM: Relations between matrix product row/column ranks and individual row/column ranks.
Let $A \in M_{m,n}(K)$ and $B \in M_{n,m}(K)$ for some field $K$ and $m,n \in \mathbb{Z}_0^+$.


(i) $\mathrm{Rowrank}(AB) \le \mathrm{Rowrank}(B)$.
(ii) $\mathrm{Colrank}(AB) \le \mathrm{Colrank}(A)$.


PROOF: Part (i) follows from Theorems 25.5.6 (iii) and 22.5.14 (ii).
Part (ii) follows from Theorems 25.5.6 (iv) and 22.5.14 (ii).


25.5.12 THEOREM: Row and column spans and ranks of identity matrices.
Let $K$ be a field and $n \in \mathbb{Z}_0^+$. Let $I_n$ be the identity matrix in $M_{n,n}(K)$.


(i) $\mathrm{Rowspan}(I_n) = M_{1,n}(K)$.
(ii) $\mathrm{Colspan}(I_n) = M_{n,1}(K)$.
(iii) $\mathrm{Rowrank}(I_n) = \mathrm{Colrank}(I_n) = n$.


PROOF: For part (i), $\mathrm{Rowspan}(I_n) = \{\sum_{j=1}^n \lambda_j e_j;\ (\lambda_j)_{j=1}^n \in K^n\}$, where $e_j \in M_{1,n}(K)$ is defined by
$e_j(1,k) = \delta_{j,k}$ for all $j,k \in \mathbb{N}_n$. Then $\mathrm{Rowspan}(I_n) = \{[\lambda_1\ \cdots\ \lambda_n];\ (\lambda_j)_{j=1}^n \in K^n\} = M_{1,n}(K)$.

For part (ii), $\mathrm{Colspan}(I_n) = \{\sum_{i=1}^n \lambda_i e_i^T;\ (\lambda_i)_{i=1}^n \in K^n\}$, where $e_i^T \in M_{n,1}(K)$ is defined by
$e_i^T(k,1) = \delta_{i,k}$ for all $i,k \in \mathbb{N}_n$. Then $\mathrm{Colspan}(I_n) = \{[\lambda_i]_{i=1}^n;\ (\lambda_i)_{i=1}^n \in K^n\} = M_{n,1}(K)$.


Part (iii) follows from parts (i) and (ii) and Theorem 25.3.5.


25.5.13 THEOREM: Row and column ranks are bounded by both the depth and width of a matrix.
For a field $K$ and $m,n \in \mathbb{Z}^+$, let $A \in M_{m,n}(K)$.

(i) $\mathrm{Rowrank}(A) \le \min(m, n)$.

(ii) $\mathrm{Colrank}(A) \le \min(m, n)$.


PROOF: For part (i), $\mathrm{Rowrank}(A) = \mathrm{Rowrank}(AI_n) \le \mathrm{Rowrank}(I_n) = n$ by Theorem 25.5.11 (i) and
Theorem 25.5.12 (iii). But $\mathrm{Rowspan}(A)$ is spanned by the $m$ row vectors of $A$ by Definition 25.5.2. So
$\mathrm{Rowrank}(A) = \dim(\mathrm{Rowspan}(A)) \le m$ by Definition 22.5.2. Hence $\mathrm{Rowrank}(A) \le \min(m, n)$.

For part (ii), $\mathrm{Colrank}(A) = \mathrm{Colrank}(I_mA) \le \mathrm{Colrank}(I_m) = m$ by Theorems 25.5.11 (ii) and 25.5.12 (iii).
But $\mathrm{Colspan}(A)$ is spanned by the $n$ column vectors of $A$ by Definition 25.5.2. Therefore
$\mathrm{Colrank}(A) = \dim(\mathrm{Colspan}(A)) \le n$ by Definition 22.5.2. Hence $\mathrm{Colrank}(A) \le \min(m, n)$.


25.5.14 THEOREM: Requirements on matrix depth/width to make left/right inverses possible. 
For a field $K$ and $m,n \in \mathbb{Z}^+$, let $A \in M_{m,n}(K)$ and $B \in M_{n,m}(K)$.


(i) If $A$ is a left inverse of $B$, then $m \le n$.
(ii) If $A$ is a right inverse of $B$, then $m \ge n$.


(iii) If $A$ is both a left inverse and right inverse of $B$, then $m = n$.


PROOF: For part (i), let $A$ be a left inverse of $B$. Then $AB = I_m$. So
$m = \mathrm{Rowrank}(I_m) = \mathrm{Rowrank}(AB) \le \mathrm{Rowrank}(B) \le n$ by Theorems 25.5.12 (iii), 25.5.11 (i) and 25.5.13 (i).


For part (ii), let $A$ be a right inverse of $B$. Then $BA = I_n$. So
$n = \mathrm{Colrank}(I_n) = \mathrm{Colrank}(BA) \le \mathrm{Colrank}(B) \le m$ by Theorems 25.5.12 (iii), 25.5.11 (ii) and 25.5.13 (ii).


Part (iii) follows from parts (i) and (ii).


25.5.15 REMARK: Two-sided inverse matrices. 

As mentioned in Remark 25.5.1, and proved in Theorem 25.5.14 (iii), a two-sided inverse matrix must be 
square. The idea that non-square matrices have even single-sided inverses is somewhat illusory. There are 
infinitely many “identity matrices”, which is impossible in any group. Of course, the non-square matrices 
don’t even constitute a multiplicative semigroup. So terms like “left inverse” and “right inverse” do not have 
the usual kind of meaning. 


Since only square matrices have two-sided inverses, and such inverses are themselves necessarily square 
matrices, two-sided inverse matrices are presented in Section 25.8. 


25.5.16 THEOREM: Row/column rank have maximum values when rows/columns span maximum spaces.
Let $A \in M_{m,n}(K)$ for some field $K$ and $m,n \in \mathbb{Z}^+$.


(i) $\mathrm{Rowrank}(A) = n$ if and only if $\mathrm{Rowspan}(A) = M_{1,n}(K)$.
(ii) $\mathrm{Colrank}(A) = m$ if and only if $\mathrm{Colspan}(A) = M_{m,1}(K)$.


PROOF: Let $A \in M_{m,n}(K)$. By Theorem 25.5.6 (iii), $\mathrm{Rowspan}(AI_n)$ is a linear subspace of $\mathrm{Rowspan}(I_n)$,
which equals $M_{1,n}(K)$ by Theorem 25.5.12 (i). Therefore $\mathrm{Rowspan}(A) = \mathrm{Rowspan}(AI_n)$ is a linear subspace
of $M_{1,n}(K)$, which has dimension $n$ by Theorem 25.3.5. Hence if $\mathrm{Rowrank}(A) = n$, then
$\mathrm{Rowspan}(A) = M_{1,n}(K)$ by Theorem 22.5.16 (ii). The converse follows from Theorem 25.3.5. This verifies part (i).


For part (ii), Theorem 25.5.6 (iv) implies that $\mathrm{Colspan}(I_mA)$ is a linear subspace of $\mathrm{Colspan}(I_m)$, which equals
$M_{m,1}(K)$ by Theorem 25.5.12 (ii). Therefore $\mathrm{Colspan}(A) = \mathrm{Colspan}(I_mA)$ is a linear subspace of $M_{m,1}(K)$,
which has dimension $m$ by Theorem 25.3.5. Hence if $\mathrm{Colrank}(A) = m$, then $\mathrm{Colspan}(A) = M_{m,1}(K)$ by
Theorem 22.5.16 (ii). The converse follows from Theorem 25.3.5.


25.5.17 REMARK: Row and column spans of left and right inverse matrices. 

Theorem 25.5.18 shows that a necessary condition for $A$ to be a left inverse of $B$, or equivalently that $B$ is a
right inverse of $A$, is that the rows of $B$ must span the whole row space $M_{1,m}(K)$ and the columns of $A$ must
span the whole column space $M_{m,1}(K)$. An additional necessary condition is given by Theorem 25.5.14 (i),
namely that $m \le n$. Example 25.5.19 shows that these conditions are not sufficient to guarantee that $A$ is
a left inverse of $B$.


25.5.18 THEOREM: Row/column rank and span properties for right/left inverses.
For a field $K$ and $m,n \in \mathbb{Z}^+$, let $A \in M_{m,n}(K)$ and $B \in M_{n,m}(K)$ with $AB = I_m$.


(i) $\mathrm{Rowrank}(B) = m$ and $\mathrm{Rowspan}(B) = M_{1,m}(K)$.
(ii) $\mathrm{Colrank}(A) = m$ and $\mathrm{Colspan}(A) = M_{m,1}(K)$.


PROOF: For part (i), $m = \mathrm{Rowrank}(I_m) = \mathrm{Rowrank}(AB) \le \mathrm{Rowrank}(B)$ by Theorems 25.5.12 (iii)
and 25.5.11 (i). But $\mathrm{Rowrank}(B) \le m$ by Theorem 25.5.13 (i). So $\mathrm{Rowrank}(B) = m$. Hence
$\mathrm{Rowspan}(B) = M_{1,m}(K)$ by Theorem 25.5.16 (i).

For part (ii), $m = \mathrm{Colrank}(I_m) = \mathrm{Colrank}(AB) \le \mathrm{Colrank}(A)$ by Theorems 25.5.12 (iii) and 25.5.11 (ii).
But $\mathrm{Colrank}(A) \le m$ by Theorem 25.5.13 (ii). So $\mathrm{Colrank}(A) = m$. Hence $\mathrm{Colspan}(A) = M_{m,1}(K)$ by
Theorem 25.5.16 (ii).


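25.5.19 EXAMPLE: For a field $K$ and $m = 1$ and $n = 2$, define $A \in M_{m,n}(K)$ and $B \in M_{n,m}(K)$
by $A = \begin{bmatrix} 1_K & 0_K \end{bmatrix}$ and $B = \begin{bmatrix} 0_K \\ 1_K \end{bmatrix}$. Then $AB = [0_K] \ne I_m$. However, $A$ and $B$ satisfy $m \le n$ and
$\mathrm{Rowrank}(B) = 1 = m$ and $\mathrm{Colrank}(A) = 1 = m$.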


25.5.20 REMARK: Left inverse does not imply right inverse. 

Example 25.5.21 shows that even when $AB = I_m$ for $A \in M_{m,n}(K)$ and $B \in M_{n,m}(K)$, it does not follow
that $BA = I_n$. The behaviour seen in Examples 25.5.19 and 25.5.21 is excluded when $m = n$. Square
matrices have many desirable properties which are lacking in general rectangular matrices.


25.5.21 EXAMPLE: For a field $K$ and $m = 1$ and $n = 2$, define $A \in M_{m,n}(K)$ and $B \in M_{n,m}(K)$ by
$A = \begin{bmatrix} 1_K & 0_K \end{bmatrix}$ and $B = \begin{bmatrix} 1_K \\ 0_K \end{bmatrix}$. Then $AB = [1_K] = I_m$. Note also that $A$ and $B$ satisfy $m \le n$ and
$\mathrm{Rowrank}(B) = 1 = m$ and $\mathrm{Colrank}(A) = 1 = m$. However, $BA = \begin{bmatrix} 1_K & 0_K \\ 0_K & 0_K \end{bmatrix} \ne I_n$.
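
Examples 25.5.19 and 25.5.21 are small enough to verify by direct computation. In the following Python sketch, the field elements $0_K$ and $1_K$ are modelled by the integers 0 and 1 (an assumption made for this illustration, which is harmless because only sums and products of 0 and 1 occur), and `matmul` is a helper function introduced here.

```python
def matmul(a, b):
    """Return the matrix product ab for matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 0]]            # the 1 x 2 matrix A of both examples

B_19 = [[0], [1]]       # the 2 x 1 matrix B of Example 25.5.19
B_21 = [[1], [0]]       # the 2 x 1 matrix B of Example 25.5.21

print(matmul(A, B_19))  # the 1 x 1 zero matrix, so A is not a left inverse of B_19
print(matmul(A, B_21))  # the 1 x 1 identity matrix
print(matmul(B_21, A))  # a 2 x 2 matrix which is not the identity
```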


25.5.22 REMARK: Conditions for row rank and column rank of a matrix. 

Theorem 25.5.23 shows the possibly surprising equality of the row rank and column rank of any matrix.
The column rank of a matrix $A \in M_{m,n}(K)$ equals the dimension of the range of the linear map $v \mapsto Av$
for $v \in M_{n,1}(K)$, because this range is the column span of $A$. The row rank of $A$ equals the dimension of the
range of the linear map $w \mapsto A^Tw$ for $w \in M_{m,1}(K)$, because this range consists of the transposes of the
elements of the row span of $A$. (See Definition 25.4.11 for the transpose of a matrix.) It is not immediately,
intuitively obvious that these two ranges should have the same dimension.


It is perhaps of some technical interest that Theorem 25.5.23 makes non-trivial use of matrices with zero 
rows or zero columns, thereby justifying their inclusion in Definition 25.2.2. 


25.5.23 THEOREM: Existence of product decompositions for matrices with small enough row/column rank. 
Let $K$ be a field and $m,n \in \mathbb{Z}^+$. Let $A \in M_{m,n}(K)$.


(i) For $p \in \mathbb{Z}_0^+$, $\mathrm{Rowrank}(A) \le p$ if and only if there are matrices $B \in M_{m,p}(K)$ and $C \in M_{p,n}(K)$ such
that $A = BC$.

(ii) For $p \in \mathbb{Z}_0^+$, $\mathrm{Colrank}(A) \le p$ if and only if there are matrices $B \in M_{m,p}(K)$ and $C \in M_{p,n}(K)$ such
that $A = BC$.


(iii) $\mathrm{Rowrank}(A) = \mathrm{Colrank}(A)$.
(iv) There are matrices $B \in M_{m,p}(K)$ and $C \in M_{p,n}(K)$ such that $A = BC$, where
$p = \mathrm{Rowrank}(A) = \mathrm{Colrank}(A)$.


(v) For $p \in \mathbb{Z}^+$, there is no pair of matrices $B \in M_{m,p-1}(K)$ and $C \in M_{p-1,n}(K)$ such that $A = BC$ if
$p = \mathrm{Rowrank}(A)$ or $p = \mathrm{Colrank}(A)$.

(vi) If $B \in M_{m,p}(K)$ and $C \in M_{p,n}(K)$ satisfy $A = BC$ for $p = \mathrm{Rowrank}(A) = \mathrm{Colrank}(A)$, then
$\mathrm{Rowrank}(B) = \mathrm{Rowrank}(C) = p = \mathrm{Colrank}(B) = \mathrm{Colrank}(C)$.


PROOF: For part (i), let $A \in M_{m,n}(K)$. Suppose that $\mathrm{Rowrank}(A) \le p$ for some $p \in \mathbb{Z}_0^+$. $\mathrm{Rowspan}(A)$ is
a linear subspace of $M_{1,n}(K)$ by Theorem 25.5.5 (i). So there is a sequence of vectors $(c_k)_{k=1}^p \in (M_{1,n}(K))^p$
such that $\mathrm{Rowspan}(A) \subseteq \mathrm{span}(\{c_k;\ k \in \mathbb{N}_p\})$ by Definition 22.5.2. Thus for each $i \in \mathbb{N}_m$, there is a sequence
$(\lambda_{i,k})_{k=1}^p \in K^p$ such that $\mathrm{Row}_i(A) = \sum_{k=1}^p \lambda_{i,k} c_k$. Define the matrix $B \in M_{m,p}(K)$ by $B(i,k) = \lambda_{i,k}$ for
all $(i,k) \in \mathbb{N}_m \times \mathbb{N}_p$. Define the matrix $C \in M_{p,n}(K)$ by $C(k,j) = c_k(1,j)$ for all $(k,j) \in \mathbb{N}_p \times \mathbb{N}_n$. Then
$A(i,j) = \sum_{k=1}^p B(i,k)C(k,j)$ for all $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$. In other words, $A = BC$.

Conversely, suppose that $A = BC$ for some $B \in M_{m,p}(K)$ and $C \in M_{p,n}(K)$, for some $p \in \mathbb{Z}_0^+$. Then
$\mathrm{Rowrank}(A) = \mathrm{Rowrank}(BC) \le \mathrm{Rowrank}(C) \le p$ by Theorems 25.5.11 (i) and 25.5.13 (i).


For part (ii), let $A \in M_{m,n}(K)$. Suppose that $\mathrm{Colrank}(A) \le p$ for some $p \in \mathbb{Z}_0^+$. $\mathrm{Colspan}(A)$ is a linear
subspace of $M_{m,1}(K)$ by Theorem 25.5.5 (ii). So there is a sequence of vectors $(b_k)_{k=1}^p \in (M_{m,1}(K))^p$ such
that $\mathrm{Colspan}(A) \subseteq \mathrm{span}(\{b_k;\ k \in \mathbb{N}_p\})$ by Definition 22.5.2. Thus for each $j \in \mathbb{N}_n$, there is a sequence
$(\mu_{j,k})_{k=1}^p \in K^p$ such that $\mathrm{Col}_j(A) = \sum_{k=1}^p \mu_{j,k} b_k$. Define the matrix $C \in M_{p,n}(K)$ by $C(k,j) = \mu_{j,k}$ for all
$(k,j) \in \mathbb{N}_p \times \mathbb{N}_n$. Define the matrix $B \in M_{m,p}(K)$ by $B(i,k) = b_k(i,1)$ for all $(i,k) \in \mathbb{N}_m \times \mathbb{N}_p$. Then
$A(i,j) = \sum_{k=1}^p B(i,k)C(k,j)$ for all $(i,j) \in \mathbb{N}_m \times \mathbb{N}_n$. In other words, $A = BC$.

Conversely, suppose that $A = BC$ for some $B \in M_{m,p}(K)$ and $C \in M_{p,n}(K)$, for some $p \in \mathbb{Z}_0^+$. Then
$\mathrm{Colrank}(A) = \mathrm{Colrank}(BC) \le \mathrm{Colrank}(B) \le p$ by Theorems 25.5.11 (ii) and 25.5.13 (ii).

Part (iii) follows from parts (i) and (ii).

Part (iv) follows from parts (i) and (iii).

Part (v) follows from parts (i) and (iii).

Part (vi) follows from parts (i), (ii) and (iii), and Theorems 25.5.11 (i, ii) and 25.5.13 (i, ii).
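
The product decomposition in Theorem 25.5.23 (iv) can be computed explicitly for matrices over the rational numbers. The following Python sketch is one standard way to do this and is not the construction used in the proof above: it takes $B$ to be the pivot columns of $A$ and $C$ to be the non-zero rows of the reduced row echelon form of $A$. All function names are invented for this illustration. Because a list of rows cannot faithfully represent a matrix with zero columns, the sketch assumes that $A$ is not the zero matrix, so that $p \ge 1$.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Find a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        # Clear column c in all other rows.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def matmul(a, b):
    """Return the matrix product ab for matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def rank_factorisation(a):
    """Return (B, C) with A = BC, where B is m x p, C is p x n, p = rank of A."""
    reduced, pivots = rref(a)
    b = [[Fraction(row[c]) for c in pivots] for row in a]  # pivot columns of A
    c = reduced[:len(pivots)]                              # non-zero rows of the RREF
    return b, c


A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
B, C = rank_factorisation(A)
row_rank = len(rref(A)[1])
col_rank = len(rref([list(col) for col in zip(*A)])[1])  # row rank of the transpose

print(matmul(B, C) == A)   # the factorisation reproduces A
print(row_rank, col_rank)  # the two ranks agree, as part (iii) asserts
```

Here the column rank of $A$ is computed as the row rank of $A^T$, which is justified because the columns of $A$ are the transposes of the rows of $A^T$.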


25.5.24 REMARK: Conditions for existence of left and right inverses of matrices. 
Theorem 25.5.25 shows the importance of the row and column rank concepts. Existence of inverses is an
important property. It turns out that this existence is directly related to row and column rank.


25.5.25 THEOREM: Row/column rank conditions for existence of left/right matrix inverses.
Let $K$ be a field and $m,n \in \mathbb{Z}_0^+$. Let $A \in M_{m,n}(K)$.


(i) $A$ has a left inverse if and only if $\mathrm{Rowrank}(A) = n$.

(ii) $A$ has a right inverse if and only if $\mathrm{Colrank}(A) = m$.
(iii) $A$ has both a left and right inverse if and only if $\mathrm{Rowrank}(A) = m = n$.
(iv) $A$ has both a left and right inverse if and only if $\mathrm{Colrank}(A) = m = n$.


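PROOF: For part (i), suppose that $A$ has a left inverse. Then $\mathrm{Rowrank}(A) = n$ by Theorem 25.5.18 (i).
Now suppose that $\mathrm{Rowrank}(A) = n$. By Definitions 25.5.2 and 25.5.8, $\dim(\mathrm{span}(\{\mathrm{Row}_k(A);\ k \in \mathbb{N}_m\})) = n$.
So $\mathrm{span}(\{\mathrm{Row}_k(A);\ k \in \mathbb{N}_m\}) = M_{1,n}(K)$ by Theorem 25.5.16 (i). Thus for each $i \in \mathbb{N}_n$, there exists a
sequence $(\lambda_{i,k})_{k=1}^m \in K^m$ such that $\sum_{k=1}^m \lambda_{i,k} \mathrm{Row}_k(A) = e_i$, where $e_i \in M_{1,n}(K)$ is defined by $e_i(1,j) = \delta_{i,j}$
for all $j \in \mathbb{N}_n$. Define $B \in M_{n,m}(K)$ by $B(i,k) = \lambda_{i,k}$ for all $(i,k) \in \mathbb{N}_n \times \mathbb{N}_m$. Then

$$
\begin{aligned}
\forall (i,j) \in \mathbb{N}_n \times \mathbb{N}_n, \quad (BA)(i,j)
&= \sum_{k=1}^m B(i,k) A(k,j) \\
&= \sum_{k=1}^m \lambda_{i,k} A(k,j) \\
&= \sum_{k=1}^m \lambda_{i,k} \bigl(\mathrm{Row}_k(A)(1,j)\bigr) \\
&= \Bigl(\sum_{k=1}^m \lambda_{i,k} \mathrm{Row}_k(A)\Bigr)(1,j) \\
&= e_i(1,j) = \delta_{i,j}.
\end{aligned}
$$

In other words, $BA = I_n$. Hence $A$ has a left inverse.


For part (ii), if $A$ has a right inverse, then $\mathrm{Colrank}(A) = m$ by Theorem 25.5.18 (ii). So suppose that
$\mathrm{Colrank}(A) = m$. Then $\dim(\mathrm{span}(\{\mathrm{Col}_k(A);\ k \in \mathbb{N}_n\})) = m$ by Definitions 25.5.2 and 25.5.8. Therefore
$\mathrm{span}(\{\mathrm{Col}_k(A);\ k \in \mathbb{N}_n\}) = M_{m,1}(K)$ by Theorem 25.5.16 (ii). So for each $j \in \mathbb{N}_m$, there exists a sequence
$(\mu_{j,k})_{k=1}^n \in K^n$ such that $\sum_{k=1}^n \mu_{j,k} \mathrm{Col}_k(A) = e_j$, where $e_j \in M_{m,1}(K)$ is defined by $e_j(i,1) = \delta_{i,j}$ for
all $i \in \mathbb{N}_m$. Define $B \in M_{n,m}(K)$ by $B(k,j) = \mu_{j,k}$ for all $(k,j) \in \mathbb{N}_n \times \mathbb{N}_m$. Then

$$
\begin{aligned}
\forall (i,j) \in \mathbb{N}_m \times \mathbb{N}_m, \quad (AB)(i,j)
&= \sum_{k=1}^n A(i,k) B(k,j) \\
&= \sum_{k=1}^n A(i,k) \mu_{j,k} \\
&= \sum_{k=1}^n \mu_{j,k} \bigl(\mathrm{Col}_k(A)(i,1)\bigr) \\
&= \Bigl(\sum_{k=1}^n \mu_{j,k} \mathrm{Col}_k(A)\Bigr)(i,1) \\
&= e_j(i,1) = \delta_{i,j}.
\end{aligned}
$$

In other words, $AB = I_m$. Hence $A$ has a right inverse.
Part (iii) follows from parts (i) and (ii) and Theorem 25.5.14 (iii).
Part (iv) follows from parts (i) and (ii) and Theorem 25.5.14 (iii).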


25.6. Row and column null spaces and nullity 


25.6.1 REMARK: The row nullity and column nullity of a rectangular matrix.

The row nullity and column nullity of a matrix in Definition 25.6.3 are defined as the dimensions of kernels
of linear maps which are associated with the matrix. The abstract nullity concept in Definition 23.1.26 is
the dimension of the kernel of any linear map. Associated with any matrix $A \in M_{m,n}(K)$ are the matrix
multiplication maps $w \mapsto wA$ and $v \mapsto Av$ for row matrices $w \in M_{1,m}(K)$ and column matrices $v \in M_{n,1}(K)$.
When the abstract kernel and nullity concepts are applied to these two maps, the resulting concepts are called
the "row null space" and "row nullity", and the "column null space" and "column nullity".


In Definition 25.6.2, the map $w \mapsto wA$ for $A \in M_{m,n}(K)$ maps row vectors in $M_{1,m}(K)$ to row vectors
in $M_{1,n}(K)$. So it seems reasonable to call the kernel of this map the "row null space". Similarly, the map
$v \mapsto Av$ maps column vectors in $M_{n,1}(K)$ to column vectors in $M_{m,1}(K)$. So the kernel of this map is called
the "column null space".


By considering matrices $A \in M_{m,n}(K)$ with $m \ne n$, it becomes clear that the row and column null spaces
are not spaces of row matrices or column matrices of the matrix $A$. This is in contrast to the row span and
column span of a matrix, which can be naturally identified as spaces of row and column matrices respectively.
The vector 0 for the row null space and row nullity in Definitions 25.6.2 and 25.6.3 signifies the null matrix in
$M_{1,n}(K)$. The vector 0 for the corresponding column null space and column nullity signifies the null matrix
in $M_{m,1}(K)$.


Theorem 25.6.5 asserts that the row nullity and column nullity have a simple relation to the corresponding
row rank and column rank.


25.6.2 DEFINITION: The row null space of a matrix $A \in M_{m,n}(K)$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$, is the
linear subspace $\{w \in M_{1,m}(K);\ wA = 0\}$ of $M_{1,m}(K)$.


The column null space of a matrix $A \in M_{m,n}(K)$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$, is the linear subspace
$\{v \in M_{n,1}(K);\ Av = 0\}$ of $M_{n,1}(K)$.


25.6.3 DEFINITION:
The row nullity of a matrix $A \in M_{m,n}(K)$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$, is the dimension of the row null
space of $A$. In other words, the row nullity equals $\dim(\{w \in M_{1,m}(K);\ wA = 0\})$.


The column nullity of a matrix $A \in M_{m,n}(K)$, for a field $K$ and $m,n \in \mathbb{Z}_0^+$, is the dimension of the column
null space of $A$. In other words, the column nullity equals $\dim(\{v \in M_{n,1}(K);\ Av = 0\})$.


25.6.4 THEOREM: Relations of row/column nullity to matrix map injectivity.
Let $A \in M_{m,n}(K)$ for a field $K$ and $m,n \in \mathbb{Z}^+$.

(i) The row nullity of $A$ equals 0 if and only if the map $w \mapsto wA$ for $w \in M_{1,m}(K)$ is injective.
(ii) The column nullity of $A$ equals 0 if and only if the map $v \mapsto Av$ for $v \in M_{n,1}(K)$ is injective.


PROOF: Parts (i) and (ii) follow from Theorem 23.1.28 and Definition 25.6.3.


25.6.5 THEOREM: Relations of row/column nullity to row/column rank and matrix depth/height.
Let $A \in M_{m,n}(K)$ for a field $K$ and $m,n \in \mathbb{Z}^+$.

(i) The row nullity of $A$ equals $m - \mathrm{Rowrank}(A)$.
(ii) The column nullity of $A$ equals $n - \mathrm{Colrank}(A)$.


PROOF: Parts (i) and (ii) follow from Theorem 24.2.18, the rank-nullity theorem for linear maps.
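
Over a finite field, the relations in Theorem 25.6.5 can be confirmed by brute-force counting, since a linear space of dimension $d$ over the two-element field has exactly $2^d$ elements. The following Python sketch does this for one sample matrix. The choice of the two-element field (integers modulo 2), the sample matrix and all function names are assumptions made for this illustration.

```python
from itertools import product


def row_times_matrix(w, a):
    """Return the row vector wA over the two-element field (arithmetic modulo 2)."""
    return tuple(sum(w[i] * a[i][j] for i in range(len(a))) % 2
                 for j in range(len(a[0])))


def matrix_times_column(a, v):
    """Return the column vector Av over the two-element field, as a tuple."""
    return tuple(sum(a[i][j] * v[j] for j in range(len(v))) % 2
                 for i in range(len(a)))


def log2_exact(k):
    """Return d such that 2**d == k, where k is the size of a linear space."""
    d = k.bit_length() - 1
    assert 2 ** d == k
    return d


A = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 1]]   # the third row is the sum of the first two rows modulo 2
m, n = len(A), len(A[0])

rows = list(product((0, 1), repeat=m))  # all row vectors w in M_{1,m}
cols = list(product((0, 1), repeat=n))  # all column vectors v in M_{n,1}

row_span = {row_times_matrix(w, A) for w in rows}     # Rowspan(A)
col_span = {matrix_times_column(A, v) for v in cols}  # Colspan(A)
row_null = [w for w in rows if not any(row_times_matrix(w, A))]
col_null = [v for v in cols if not any(matrix_times_column(A, v))]

row_rank, col_rank = log2_exact(len(row_span)), log2_exact(len(col_span))
row_nullity, col_nullity = log2_exact(len(row_null)), log2_exact(len(col_null))

print(row_nullity == m - row_rank)  # Theorem 25.6.5 (i)
print(col_nullity == n - col_rank)  # Theorem 25.6.5 (ii)
```

The counting approach avoids any elimination algorithm, at the cost of being feasible only for very small matrices over small finite fields.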


25.7. Component matrices for linear maps 


25.7.1 REMARK: Motivation for rectangular matrices: Linear maps between linear spaces. 

Linear maps between linear spaces are very broadly applicable in mathematics, and especially in differential 
geometry. In practice, most computations for linear spaces are made with respect to one or more bases, 
particularly for finite-dimensional linear spaces. Thus vectors are, in practical computations, most often 
represented by their coordinates with respect to some basis for each linear space. In differential geometry, 
the maps between differentiable manifolds which are of greatest interest are the differentiable maps. But 
differentiable maps are defined to be approximately linear in a neighbourhood of each point. So for each 
differentiable map, there is a linear map between the (linear) tangent spaces at points of the domain and 
range. In this way, linear maps between finite-dimensional linear spaces arise constantly in differential 
geometry. Since coordinates are used for practical computations, matrices arise very naturally and inevitably 
in every situation where there is a differentiable map. These matrices are not necessarily square. 


A linear transformation from an n-dimensional linear space V to an m-dimensional linear space W may be 
specified by a rectangular m x n matrix of components with respect to a basis on V and a basis on W. For 
a fixed basis for each space, the association between linear maps and matrices is a bijection. 


25.7.2 REMARK: Advantages and disadvantages of component matrices. 

The principal application of rectangular matrix algebra is to linear maps between finite-dimensional linear 
spaces. Section 25.7 presents the correspondence between linear maps and matrix algebra. The principal 
property of the component matrices of linear maps is that the matrix of a composition of linear maps is 
the product of the matrices of the linear maps. Therefore calculations may be performed on linear maps in 
terms of their component matrices. 


Linear maps are often specified in terms of component matrices. The principal drawback of component 
matrices is the fact that they depend on the choice of basis for both the source and target space. Conversion 
between matrices with respect to different bases is tedious and error-prone. 


The decision to work with bases and matrices or directly with linear spaces and linear maps is a trade-off
between their advantages and disadvantages in each application context. Differential geometers who
proclaim the virtues of “coordinate-free” methods are advocating the avoidance of bases and matrices. 
The “coordinates” they refer to are the components of vectors and matrices (and tensors) with respect to 
particular choices of bases. Abstract theorems can mostly be written in a coordinate-free style, but practical 
calculations mostly require the use of vector component tuples and matrices with respect to bases. 


Matrices are also useful for representing bilinear maps with respect to a linear space basis. This is discussed 
in Example 27.5.9. 


25.7.3 DEFINITION: The (component) matrix of a linear map $\phi : V \to W$ with respect to bases
$(e_j)_{j=1}^n \in V^n$ and $(f_i)_{i=1}^m \in W^m$ for linear spaces $V$ and $W$ over a field $K$ is the matrix $a \in M_{m,n}(K)$ such that

$$
\phi(e_j) = \sum_{i=1}^m f_i a_{ij} \quad \text{for all } j \in \mathbb{N}_n.
$$


25.7.4 REMARK: Well-definition of the component matrix of a given linear map.

The matrix $a$ in Definition 25.7.3 is well defined because by definition of a basis, each vector $\phi(e_j)$ has a
unique $m$-tuple $(a_{ij})_{i=1}^m \in K^m$ of components with respect to the basis $(f_i)_{i=1}^m$. In terms of the component
map in Definition 22.8.8, $a_{ij} = \kappa_W(\phi(e_j))_i$, where $\kappa_W : W \to K^m$ is the component map with respect to
the basis $(f_i)_{i=1}^m$.


25.7.5 DEFINITION: The linear map for a (component) matrix $a \in M_{m,n}(K)$ with respect to bases
$(e_j)_{j=1}^n \in V^n$ and $(f_i)_{i=1}^m \in W^m$ for linear spaces $V$ and $W$ over a field $K$ is the linear map given by
$\phi : v \mapsto \sum_{i=1}^m \sum_{j=1}^n f_i a_{ij} \kappa_V(v)_j$ for $v \in V$, where $\kappa_V : V \to K^n$ is the component map for the basis $(e_j)_{j=1}^n$.


25.7.6 REMARK: Well-definition of the linear map for a given component matrix. 

Definition 25.7.5 specifies a unique linear map $\phi \in \mathrm{Lin}(V, W)$ for each matrix $a \in M_{m,n}(K)$ for given bases
for $V$ and $W$. Conversely, Definition 25.7.3 specifies a unique matrix $a \in M_{m,n}(K)$ for each linear map
$\phi \in \mathrm{Lin}(V, W)$ for the same bases. This establishes a bijection between the linear maps and component
matrices. Thus one may freely choose whether to work "coordinate-free", in terms of linear maps, or "using
coordinates", in terms of matrices.


25.7.7 REMARK: Matrices multiply basis vectors on the right, vector components on the left. 

Whereas the basis vectors are transformed by multiplying the sequence of basis vectors on the right by the
matrix $a$ in Definition 25.7.3, component tuples are multiplied on the left by $a$.

Let $v = \sum_{j=1}^n v_j e_j$ be a vector in $V$ with components $(v_j)_{j=1}^n \in K^n$. (Here $v$ denotes both the vector and
its component tuple.) Let $\phi : V \to W$ be the linear map for the component matrix $a \in M_{m,n}(K)$. Then
by Definition 25.7.5, $\phi(v) = \sum_{i=1}^m \sum_{j=1}^n f_i a_{ij} v_j$. So the component tuple for $\phi(v)$ with respect to the basis
$(f_i)_{i=1}^m \in W^m$ is $(w_i)_{i=1}^m$, where $w_i = \sum_{j=1}^n a_{ij} v_j$ for $i \in \mathbb{N}_m$. This is multiplication of the component tuple
of $v$ on the left by the matrix $a$. This is illustrated in Figure 25.7.1.


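[Figure 25.7.1: commutative diagram. Bottom row: the linear map $\phi : V \to W$ between linear spaces.
Vertical arrows: the component maps $\kappa_V : V \to K^n$ and $\kappa_W : W \to K^m$. Top row: matrix multiplication
$K^n \to K^m$ between tuple spaces, given by $w_i = \sum_{j=1}^n a_{ij} v_j$.]

Figure 25.7.1 Matrix multiplication corresponding to a linear map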
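
The passage from a linear map to its component matrix, and the left multiplication of component tuples, can be illustrated with a small hypothetical example which is not taken from this book: let $V$ be the real polynomials of degree at most 2 with basis $(1, x, x^2)$, let $W$ be the real polynomials of degree at most 1 with basis $(1, x)$, and let $\phi$ be differentiation. The following Python sketch uses an invented helper `apply_matrix` to compute $w_i = \sum_j a_{ij} v_j$.

```python
def apply_matrix(a, v):
    """Multiply the component tuple v on the left by a: w_i = sum_j a_ij v_j."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


# Column j of D holds the components of phi(e_j) with respect to the basis (1, x):
#   phi(1) = 0,  phi(x) = 1,  phi(x^2) = 2x.
D = [[0, 1, 0],
     [0, 0, 2]]

v = [5, 3, 4]           # components of 5 + 3x + 4x^2 with respect to (1, x, x^2)
w = apply_matrix(D, v)  # components of the derivative with respect to (1, x)
print(w)                # components of 3 + 8x
```

The matrix has $m = 2$ rows and $n = 3$ columns, which matches the statement that a linear map from an $n$-dimensional space to an $m$-dimensional space has an $m \times n$ component matrix.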


25.7.8 REMARK: Component matrices for linear maps between infinite-dimensional linear spaces. 
Component matrices are defined for general linear spaces, which may be infinite-dimensional, in Section 23.2. 
If the spaces have a topology, questions of convergence place constraints on the definitions and properties of 
such matrices. If there is no topology, infinite sums of vectors must be avoided, which also places constraints 
on definitions and properties. In the non-topological case, even changing basis is quite problematic, as 
discussed in Section 22.9. Apart from some very basic definitions and properties, both topological linear 
spaces and non-topological infinite-dimensional linear spaces are not presented here. They are unnecessary 
for the differentiable manifold definitions which are presented in Part IV of this book. 


25.8. Square matrix algebra


25.8.1 REMARK: Interpretation of square matrices as components of linear space endomorphisms. 
Whereas general rectangular matrices may be interpreted as the components with respect to finite bases 
of general linear homomorphisms $\phi : V \to W$, square matrices arise naturally as the components of linear
endomorphisms $\phi : V \to V$. Invertible square matrices correspond to linear space automorphisms. Hence
invertibility is one of the major concerns in the algebra of square matrices. 


Left and right inverses of general rectangular matrices are given by Definition 25.4.5. As mentioned in 
Remark 25.5.15, it is vain to seek non-square matrices which have equal left and right inverses. Therefore 
the investigation of two-sided inverses of matrices is restricted to square matrices. Two-sided inverses are of 
particular interest because inverses in groups are two-sided. (See Definition 17.3.2.) 


25.8.2 THEOREM: Left and right inverses of square matrices are equal and unique. 

If $B_1A = B_2A = AC_1 = AC_2 = I_n$ for $A, B_1, B_2, C_1, C_2 \in M_{n,n}(K)$, for a field $K$ and $n \in \mathbb{Z}_0^+$, then
$B_1 = B_2 = C_1 = C_2$. In other words, if $A$ has at least one left inverse and at least one right inverse, then
these two inverses are both unique and they are equal to each other.


PROOF: Let $B_1A = B_2A = AC_1 = AC_2 = I_n$. Then $B_k = B_kI_n = B_k(AC_\ell) = (B_kA)C_\ell = I_nC_\ell = C_\ell$ for
all $k,\ell \in \{1,2\}$. Hence $B_1 = B_2 = C_1 = C_2$.


25.8.3 REMARK: A one-sided inverse of a square matrix is a two-sided inverse.

Theorem 25.8.2 delivers left-right equality and uniqueness of inverses if a square matrix has at least one 
left inverse and at least one right inverse. This leaves open the question of whether the existence of a left 
inverse implies the existence of a right inverse, and vice versa. For non-square matrices, the existence of 
a left inverse does not imply the existence of a right inverse. And even if left and right inverses do both 
exist, they might not be the same and they might not be unique. (This is made possible by the fact that 
non-square matrices do not constitute a group.) 


Theorem 25.8.4 shows that there is no such thing as a single-sided inverse of a square matrix which is not 
also a two-sided inverse. Therefore it is not necessary to investigate the properties of single-sided inverses of 
square matrices which are not two-sided inverses. 


25.8.4 THEOREM: Every left inverse of a square matrix is a right inverse, and vice versa. 
Let $AB = I_n$ for some $A, B \in M_{n,n}(K)$, for some field $K$ and $n \in \mathbb{Z}_0^+$. Then $BA = I_n$.


PROOF: Suppose that $AB = I_n$. Then $B$ has a left inverse. So Rowrank($B$) = Colrank($B$) = $n$ by
Theorem 25.5.25 (i) and Theorem 25.5.23 (iii). Therefore $B$ has a right inverse by Theorem 25.5.25 (ii). Let
$C$ be a right inverse for $B$. Then $A = C$ by Theorem 25.8.2 because $A$ is a left inverse of $B$ and $C$ is a
right inverse of $B$. So $BA = BC = I_n$.
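The contrast between the square and non-square cases can be checked on small examples. The following is only an illustrative sketch using exact rational arithmetic. The helper names `matmul` and `identity` and the particular matrices are assumptions made for this example, not notation from the text.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def identity(n):
    """The n x n identity matrix with rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


# Square case: B is a right inverse of A, and it is also a left inverse.
A = [[Fraction(2), Fraction(1)], [Fraction(5), Fraction(3)]]
B = [[Fraction(3), Fraction(-1)], [Fraction(-5), Fraction(2)]]
square_right = matmul(A, B)
square_left = matmul(B, A)

# Non-square case: the 2 x 1 matrix C has two different left inverses,
# and neither of them is a right inverse.
C = [[1], [0]]
L1 = [[1, 0]]
L2 = [[1, 7]]
left_1 = matmul(L1, C)
left_2 = matmul(L2, C)
not_right = matmul(C, L1)
```

Here `square_right` and `square_left` are both the 2 x 2 identity, whereas `left_1` and `left_2` are both the 1 x 1 identity although `L1` and `L2` differ, and `not_right` is not the 2 x 2 identity.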


25.8.5 REMARK: The mysterious commutativity of matrices with their inverses. 

Since square matrices do not commute in general, the fact that a matrix commutes with its inverse requires
some explanation. If one interprets a matrix $A \in M_{n,n}(K)$ as a linear endomorphism $v \mapsto Av$ on the linear
space $M_{n,1}(K)$ of $n \times 1$ column matrices, a left inverse $B$ signifies an endomorphism which can be applied after
$A$ to return the vectors to their original state. If one applies $B$ before $A$, it is not automatically obvious that
the result should once again be a return to the original state. The left inverse is a kind of “Undo” button
which undoes an action. When editing a document, one does not push the “Undo” button before the action
which one wishes to undo. But this works perfectly well for matrix left and right inverses.


A very beneficial consequence of the commutativity of matrices with their inverses is the fact that one may 
define a unital morphism from the integers to the ring of square matrices so that a single-parameter group 
of matrices is associated with each invertible matrix. 


25.8.6 REMARK: Terminology for invertible matrices. 

Invertible matrices are frequently referred to as “nonsingular matrices”, while non-invertible matrices are 
equally frequently referred to as “singular matrices”. The most important fact about invertible matrices is 
that they form a multiplicative group. (The name “singular” is no doubt justified by the fact that invertible 
matrices come in pairs, since every invertible matrix equals the inverse of its own inverse. So the singular 
matrices are those which are “on their own” with no other matrix to form such a pair with.)


25.8.7 DEFINITION: An invertible matrix or nonsingular matrix is a matrix $A \in M_{n,n}(K)$, for some field
$K$ and $n \in \mathbb{Z}_0^+$, such that $AB = BA = I_n$ for some $B \in M_{n,n}(K)$.


A non-invertible matrix or singular matrix is a square matrix which is not invertible.


25.8.8 NOTATION: $M^{\mathrm{inv}}_{n,n}(K)$, for a field $K$ and $n \in \mathbb{Z}_0^+$, denotes the set of invertible $n \times n$ matrices over $K$.


In other words, $M^{\mathrm{inv}}_{n,n}(K) = \{A \in M_{n,n}(K);\ \exists B \in M_{n,n}(K),\ AB = BA = I_n\}$.


25.8.9 REMARK: Well-definition of inverse matrices. 

For invertible square matrices, the inverse exists and is unique. Therefore it is possible to give them a
well-defined name and notation. This is done in Definition 25.8.10 and Notation 25.8.11. (The inverse of the
useless empty matrix in $M_{0,0}(K)$ is the empty matrix itself, a fact which is equally useless.)


25.8.10 DEFINITION: The inverse of a matrix $A \in M^{\mathrm{inv}}_{n,n}(K)$, for a field $K$ and $n \in \mathbb{Z}_0^+$, is the matrix
$B \in M_{n,n}(K)$ such that $AB = BA = I_n$.


25.8.11 NOTATION: $A^{-1}$ denotes the inverse of an invertible square matrix $A$.


25.8.12 THEOREM: Row/column rank conditions for invertibility of square matrices. 
Let $A \in M_{n,n}(K)$ for a field $K$ and $n \in \mathbb{Z}_0^+$.


(i) $A$ is invertible if and only if Rowrank($A$) = $n$.
(ii) $A$ is invertible if and only if Colrank($A$) = $n$.
(iii) $A$ is invertible if and only if $\forall v \in M_{n,1}(K)$, $(Av = 0 \Rightarrow v = 0)$.


PROOF: Part (i) follows from Theorem 25.5.25 (iii).
Part (ii) follows from Theorem 25.5.25 (iv).


For part (iii), assume that $A \in M_{n,n}(K)$ is invertible. Suppose that $v \in M_{n,1}(K)$ and $Av = 0 \in M_{n,1}(K)$.
Then $v = (A^{-1}A)v = A^{-1}(Av) = A^{-1}0 = 0$. Therefore $\forall v \in M_{n,1}(K)$, $(Av = 0 \Rightarrow v = 0)$. For the converse,
assume that $\forall v \in M_{n,1}(K)$, $(Av = 0 \Rightarrow v = 0)$. Let $V = M_{n,1}(K)$. Define $\phi : V \to V$ by $\phi : v \mapsto Av$. Then
nullity($\phi$) = 0 by Definition 23.1.26. So $\phi$ is injective by Theorem 23.1.28. Therefore $\phi : V \to \mathrm{Range}(\phi)$ is
an isomorphism by Definition 23.1.8. So $n = \dim(V) = \dim(\mathrm{Range}(\phi))$. But $\mathrm{Range}(\phi) = \mathrm{Colspan}(A)$. So
Colrank($A$) = $\dim(\mathrm{Range}(\phi)) = n$. Hence $A$ is invertible by part (ii).


25.8.13 THEOREM: Some basic properties of invertible matrices. 
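The set $M^{\mathrm{inv}}_{n,n}(K)$ of invertible $n \times n$ matrices over a field $K$ for $n \in \mathbb{Z}_0^+$ has the following properties.
(i) $\forall \lambda \in K \setminus \{0_K\}$, $\forall A \in M^{\mathrm{inv}}_{n,n}(K)$, $(\lambda A)^{-1} = \lambda^{-1} A^{-1}$.
(ii) $\forall A, B \in M^{\mathrm{inv}}_{n,n}(K)$, $(AB)^{-1} = B^{-1} A^{-1}$.


PROOF: For part (i), let $\lambda \in K \setminus \{0_K\}$ and $A \in M^{\mathrm{inv}}_{n,n}(K)$. Then $(\lambda^{-1} A^{-1})(\lambda A) = \lambda^{-1}\lambda (A^{-1} A) = I_n$.
Hence $(\lambda A)^{-1} = \lambda^{-1} A^{-1}$ by Theorem 25.8.4 and Definition 25.8.10.
For part (ii), let $A, B \in M^{\mathrm{inv}}_{n,n}(K)$. Then $(B^{-1} A^{-1})(AB) = B^{-1}(A^{-1} A)B = B^{-1} I_n B = B^{-1} B = I_n$. Hence
$(AB)^{-1} = B^{-1} A^{-1}$ by Theorem 25.8.4 and Definition 25.8.10.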
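Both parts of Theorem 25.8.13 can be checked on small rational matrices. The sketch below is restricted to the 2 x 2 case and uses the adjugate formula for the inverse, which is a choice made for this illustration. The helper names `matmul`, `inv2` and `scale` are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def inv2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def scale(lam, M):
    """Scalar multiple lam * M."""
    return [[lam * x for x in row] for row in M]


A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
B = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(1)]]
lam = Fraction(3)

inverse_of_product = inv2(matmul(A, B))      # (AB)^{-1}
reversed_product = matmul(inv2(B), inv2(A))  # B^{-1} A^{-1}
same_order_product = matmul(inv2(A), inv2(B))  # A^{-1} B^{-1}
```

For these matrices, `inverse_of_product` agrees with `reversed_product` but not with `same_order_product`, which illustrates why the order of the factors is reversed in part (ii).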


25.8.14 REMARK:  Unital morphisms and integer powers of matrices. 
The unital morphism concept for groups in Definition 17.4.8 may be applied to invertible matrices to define 
integer “powers” of matrices as in Definition 25.8.15. 


25.8.15 DEFINITION: The powers of a matrix $A \in M_{n,n}(K)$, for a field $K$ and $n \in \mathbb{Z}_0^+$, are defined
inductively by
(i) $A^0 = I_n$.
(ii) $\forall p \in \mathbb{Z}_0^+$, $A^{p+1} = A^p A$.
(iii) If $A \in M^{\mathrm{inv}}_{n,n}(K)$, then $\forall p \in \mathbb{Z}_0^-$, $A^{p-1} = A^p A^{-1}$.


25.8.16 THEOREM: Kernel and range conditions for linear maps to be linear space isomorphisms.
Let $\phi : V \to W$ be a linear map between finite-dimensional linear spaces $V$ and $W$ with $\dim(V) = \dim(W)$.


(i) $\phi$ is a linear space isomorphism if and only if $\ker(\phi) = \{0_V\}$.
(ii) $\phi$ is a linear space isomorphism if and only if $\mathrm{Range}(\phi) = W$.


PROOF: For part (i), let $n = \dim(V) = \dim(W)$. Then $V$ and $W$ have bases $(e_j)_{j=1}^n$ and $(e'_i)_{i=1}^n$ respectively
by Theorem 22.7.13. Let $A \in M_{n,n}(K)$ be the matrix for $\phi$ according to Definition 25.7.3. By
Theorem 25.8.12 (iii), $A$ is an invertible matrix if and only if $\forall b \in M_{n,1}(K)$, $(Ab = 0 \Rightarrow b = 0)$. Therefore
$A$ is an invertible matrix if and only if $\ker(\phi) = \{0_V\}$ because
$\phi(v) = \phi(\sum_{j=1}^n b_j e_j) = \sum_{j=1}^n b_j \phi(e_j) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} b_j e'_i$,
where $b = (b_j)_{j=1}^n \in M_{n,1}(K)$ is the tuple of components of $v \in V$ with respect to the basis $(e_j)_{j=1}^n$, and so
“$\phi(v) = 0$ implies $v = 0$ for all $v \in V$” is equivalent to “$Ab = 0$ implies $b = 0$ for all $b \in M_{n,1}(K)$”. But $\phi$ is an invertible linear
transformation if and only if $A$ is an invertible matrix. Hence $\phi$ is a linear space isomorphism if and only if
$\ker(\phi) = \{0_V\}$. (Alternatively, the assertion follows from Theorem 24.2.16 (iii).)


Part (ii) follows from Theorem 24.2.16 (iii). 


25.9. Trace of a matrix 


25.9.1 REMARK: The trace, determinant and eigenvalues of a square matrix.

The trace and determinant of a square matrix are closely related to its spectrum of eigenvalues. (Roughly 
speaking, the trace equals the sum of the eigenvalues and the determinant equals the product of the eigen- 
values.) Both the trace and determinant are invariant under an orthogonal change of basis if the matrix 
is regarded as the component matrix of a bilinear form on a finite-dimensional linear space. However, the 
computational complexity of the trace and of the determinant are very different. The trace is extremely easy 
to compute, whereas the determinant offers some computational “challenges” for large matrices. 


25.9.2 DEFINITION: The trace of a matrix $A = (a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ is the sum $\sum_{i=1}^n a_{ii}$.


25.9.3 NOTATION: $\mathrm{Tr}(A)$ denotes the trace of a matrix $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$.


25.9.4 THEOREM: Properties of the trace of a square matrix. 

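The trace of square matrices has the following properties for all $n \in \mathbb{Z}_0^+$ and fields $K$.
(i) $\forall \lambda \in K$, $\forall A \in M_{n,n}(K)$, $\mathrm{Tr}(\lambda A) = \lambda \,\mathrm{Tr}(A)$.
(ii) $\forall A, B \in M_{n,n}(K)$, $\mathrm{Tr}(A + B) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$.
(iii) $\forall A \in M_{n,n}(K)$, $\mathrm{Tr}(A^T) = \mathrm{Tr}(A)$.


PROOF: For part (i), $\mathrm{Tr}(\lambda A) = \sum_{i=1}^n \lambda a_{ii} = \lambda \sum_{i=1}^n a_{ii} = \lambda \,\mathrm{Tr}(A)$.
For part (ii), $\mathrm{Tr}(A + B) = \sum_{i=1}^n (a_{ii} + b_{ii}) = (\sum_{i=1}^n a_{ii}) + (\sum_{i=1}^n b_{ii}) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$.
For part (iii), $\mathrm{Tr}(A^T) = \sum_{i=1}^n (A^T)_{ii} = \sum_{j=1}^n a_{jj} = \mathrm{Tr}(A)$.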


25.9.5 REMARK: Geometric interpretation of the trace of a matrix.

When a square matrix is interpreted as the component map of a linear space endomorphism, it is shown in 
Theorem 23.3.2 that the trace of the matrix is independent of the choice of basis for the linear space. As 
mentioned in Remark 23.3.5, it is possible to give a geometric interpretation for the trace of a linear map. 
Since the trace is invariant with respect to the general linear group, the “geometry” for this interpretation 
does not use an inner product of any kind. 


Another kind of interpretation for the trace of a matrix pertains to the application of the matrix in the 
context of bilinear maps. In this case, the matrix is a set of components for an object which transforms like 
a second-degree covariant tensor. The trace of such a matrix is a special case of the concept of a contraction 
of a tensor. This kind of interpretation does require an inner product, and the trace is in this case invariant 
under the orthogonal group of transformations. 


More generally, a function $f$ of an endomorphism component matrix $A$ is invariant if $f(B^{-1}AB) = f(A)$
for all $B$ in some specified matrix group, whereas if $A$ is the coefficient matrix for a bilinear form, it is
invariant if $f(B^T AB) = f(A)$ for all $B$ in some specified matrix group. Thus in the linear endomorphism
case, invariance of the trace is obtained if $\mathrm{Tr}(B^{-1}AB) = \mathrm{Tr}(A)$ for all $B$ in some group. This means that
\[ \sum_{i=1}^n a_{ii} = \sum_{i,j,k=1}^n \bar{b}_{ij} a_{jk} b_{ki} = \sum_{j,k=1}^n a_{jk} \sum_{i=1}^n b_{ki} \bar{b}_{ij} = \sum_{j,k=1}^n a_{jk} \delta_{kj} = \sum_{j=1}^n a_{jj}, \]
which holds for all $B \in \mathrm{GL}(n)$, where $B^{-1} = [\bar{b}_{ij}]_{i,j=1}^n$. So $\mathrm{GL}(n)$ is the invariance group for the endomorphism application. If
$A$ corresponds to a bilinear form, one wants
\[ \sum_{i=1}^n a_{ii} = \sum_{i,j,k=1}^n b_{ji} a_{jk} b_{ki} = \sum_{j,k=1}^n a_{jk} \sum_{i=1}^n b_{ki} b_{ji}, \]
which holds if $B$ is orthogonal because then $\sum_{i=1}^n b_{ki} b_{ji} = \delta_{kj}$. Thus $\mathrm{O}(n)$ is the invariance group for the bilinear
form application.


The discrepancy between the invariance groups for the trace of a linear map and the trace of a bilinear 
form is a clue that these are different kinds of traces, although arithmetically they seem identical when the 
different kinds of objects are parametrised by matrices. 
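The difference between the two invariance groups is easy to observe numerically. The following sketch compares the similarity transform $B^{-1}AB$ with the congruence transform $B^TAB$ for a shear matrix, which is invertible but not orthogonal, and for a rotation by a right angle, which is orthogonal. The helper names and the example matrices are assumptions made for this illustration.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def trace(X):
    """Sum of the diagonal elements of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))


A = [[1, 2], [3, 4]]

# A shear: invertible, but not orthogonal. Its inverse is written out by hand.
B = [[1, 1], [0, 1]]
B_inv = [[1, -1], [0, 1]]

# A rotation by a right angle: orthogonal, so its inverse is its transpose.
R = [[0, -1], [1, 0]]

similarity_shear = trace(matmul(B_inv, matmul(A, B)))         # Tr(B^{-1} A B)
congruence_shear = trace(matmul(transpose(B), matmul(A, B)))  # Tr(B^T A B)
congruence_rotation = trace(matmul(transpose(R), matmul(A, R)))  # Tr(R^T A R)
```

The similarity transform preserves the trace for the shear, whereas the congruence transform preserves it only for the orthogonal matrix.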


25.10. Determinant of a matrix 


25.10.1 REMARK: History of determinants. 

The name “determinant” is attributed to Gauß's 1801 book “Disquisitiones arithmeticae” by Mirsky [117],
page 1, who states that determinants were the first topic to be studied intensively in linear algebra, initiated
by Leibniz in 1696.


25.10.2 REMARK: Determinants require permutations and parity. 
Determinants are defined in terms of permutations and parity, which are defined in Section 14.8. The set
$\mathrm{perm}(\mathbb{N}_n)$ is the set of $n!$ permutations of $\mathbb{N}_n = \{1, 2, \ldots, n\}$.


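25.10.3 DEFINITION: The determinant of a matrix $A = (a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and field $K$ is
the element of $K$ given by $\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)}$.


25.10.4 NOTATION: $\det(A)$ denotes the determinant of a matrix $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$.


\[ \det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)}. \]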
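The permutation-sum formula can be transcribed almost literally into code. The sketch below is only practical for small $n$ because the sum has $n!$ terms. It uses 0-based indices instead of the index set $\{1, \ldots, n\}$, and it computes the parity by counting inversions, which is one of several equivalent ways to obtain it. Both of these are choices made for this illustration.

```python
from itertools import permutations


def parity(perm):
    """Parity (+1 or -1) of a permutation given as a tuple of 0-based images."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Determinant of a square matrix as a signed sum over all permutations."""
    n = len(A)
    total = 0
    for P in permutations(range(n)):
        term = parity(P)
        for i in range(n):
            term *= A[i][P[i]]
        total += term
    return total


example_2x2 = det([[1, 2], [3, 4]])
example_3x3 = det([[2, 0, 1], [1, 3, 2], [1, 1, 4]])
example_empty = det([])  # one empty permutation, whose empty product is 1
```

For $n = 0$ there is exactly one (empty) permutation, so the formula assigns the value 1 to the empty matrix.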


25.10.5 REMARK: The special case of determinants for a field with characteristic 2. 

If the field $K$ in Definition 25.10.3 is $\mathbb{Z}_2 = \{0, 1\}$, the parity values $1$ and $-1$ are the same, and the matrix
$A$ has only zeros and ones as elements. (See Definition 18.7.10 for the characteristic of a field.) Since
the determinant can only be zero or one, it is interesting to ask what kinds of matrices have the non-zero
determinant value. The diagonal matrix with $a_{ij} = \delta_{ij}$ for $i, j \in \mathbb{N}_n$ certainly has $\det(A) = 1$. It is also clear
that $\det(A) = 1$ for any lower triangular matrix with $a_{ij} = 1$ for $i = j$ and $a_{ij} = 0$ for $i < j$. (This is true for
any field.)
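For small $n$, the matrices over $\mathbb{Z}_2$ with determinant 1 can simply be enumerated. The sketch below does this by brute force. The function names are hypothetical, and 0-based indices are used.

```python
from itertools import permutations, product


def det_mod2(A):
    """Determinant over Z_2. The parity factor is omitted because -1 = 1 in Z_2."""
    n = len(A)
    total = 0
    for P in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= A[i][P[i]]
        total += term
    return total % 2


def count_det_one(n):
    """Number of n x n matrices over Z_2 whose determinant equals 1."""
    count = 0
    for entries in product((0, 1), repeat=n * n):
        A = [entries[i * n:(i + 1) * n] for i in range(n)]
        count += det_mod2(A)
    return count


count_2 = count_det_one(2)
count_3 = count_det_one(3)
```

The counts agree with the standard formula $(2^n - 1)(2^n - 2) \cdots (2^n - 2^{n-1})$ for the number of invertible $n \times n$ matrices over $\mathbb{Z}_2$, which gives 6 for $n = 2$ and 168 for $n = 3$. This formula is quoted here as a known fact and is not taken from this section.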


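25.10.6 THEOREM: The determinant of the transpose equals the determinant of the original matrix.
Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then $\det(A^T) = \det(A)$.


PROOF: For any $n \in \mathbb{Z}_0^+$, the set $\{P^{-1};\ P \in \mathrm{perm}(\mathbb{N}_n)\}$ is equal to $\mathrm{perm}(\mathbb{N}_n)$ because $P \in \mathrm{perm}(\mathbb{N}_n)$ if
and only if $P^{-1} \in \mathrm{perm}(\mathbb{N}_n)$. Since $\mathrm{parity}(P^{-1}) = \mathrm{parity}(P)$, it follows that


\[ \det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P^{-1}(i)}. \]


But $\prod_{i=1}^n a_{i,P^{-1}(i)} = \prod_{i=1}^n a_{P(P^{-1}(i)),P^{-1}(i)} = \prod_{i=1}^n a_{P(i),i}$ for any $P \in \mathrm{perm}(\mathbb{N}_n)$ because a permutation of
the factors in a product in a field does not affect the value of the product. (According to Definition 18.7.3,
the product operation of a field is commutative.) Therefore


\[ \det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{P(i),i}, \]


which equals $\det(A^T)$.


25.10.7 THEOREM: Formula for the determinant of a scalar multiple of a matrix.
Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then $\det(\lambda A) = \lambda^n \det(A)$ for $\lambda \in K$.


PROOF: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then
$\det(\lambda A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n \lambda a_{i,P(i)} = \lambda^n \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)} = \lambda^n \det(A)$.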


25.10.8 REMARK: A formula for a sum of products of elements of a matrix.
For Theorem 25.10.12, it is useful to first prove Theorem 25.10.9. The parity function is extended by
$\mathrm{parity}(Q) = 0$ for functions $Q : \mathbb{N}_n \to \mathbb{N}_n$ which are not permutations (i.e. not bijections).


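25.10.9 THEOREM: Formula for a sum of signed products of permutations of matrices.
Let $A = [a_{ij}]_{i,j=1}^n \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Let $Q : \mathbb{N}_n \to \mathbb{N}_n$. Then


\[ \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)} = \mathrm{parity}(Q) \det(A). \tag{25.10.1} \]


PROOF: Suppose first that $Q$ is not a permutation of $\mathbb{N}_n$. (This implies, incidentally, that $n \ge 2$.) Then
$Q(k) = Q(\ell)$ for some $k, \ell \in \mathbb{N}_n$ with $k \ne \ell$. Let $S \in \mathrm{perm}(\mathbb{N}_n)$ be the permutation which swaps $k$ and $\ell$.
(That is, $S(k) = \ell$ and $S(\ell) = k$; otherwise $S(i) = i$.) Then $Q \circ S = Q \circ S^{-1} = Q$ and $\mathrm{parity}(S) = -1$.
Substitute $P = T \circ S$ in the left-hand side of (25.10.1) for permutations $T \in \mathrm{perm}(\mathbb{N}_n)$. Then


\begin{align*}
\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)}
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T \circ S) \prod_{i=1}^n a_{Q(i),T(S(i))} \\
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T)\,\mathrm{parity}(S) \prod_{i=1}^n a_{Q(S^{-1}(i)),T(i)} \\
&= \mathrm{parity}(S) \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T) \prod_{i=1}^n a_{Q(i),T(i)} \\
&= 0,
\end{align*}
because $x = -x$ for $x \in K$ implies that $x = 0$ when the characteristic of $K$ is not 2. (For any characteristic, the
same conclusion follows by grouping the permutations into pairs $\{T, T \circ S\}$, whose two terms have equal products
and opposite parities, and therefore cancel.) Since $\mathrm{parity}(Q) = 0$ for a non-permutation, the theorem is
verified in this case.
If $Q$ is a permutation of $\mathbb{N}_n$, substitution of $P = T \circ Q$ in (25.10.1) yields


\begin{align*}
\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)}
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T \circ Q) \prod_{i=1}^n a_{Q(i),T(Q(i))} \\
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T)\,\mathrm{parity}(Q) \prod_{i=1}^n a_{Q(Q^{-1}(i)),T(i)} \\
&= \mathrm{parity}(Q) \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T) \prod_{i=1}^n a_{i,T(i)} \\
&= \mathrm{parity}(Q) \det(A),
\end{align*}


which is as claimed.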
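Formula (25.10.1) can be tested exhaustively for a small matrix by running through all $n^n$ functions $Q$. The following sketch does this for one $3 \times 3$ integer matrix, with 0-based indices. The names `signed_sum` and `holds_for_all_Q`, and the choice of matrix, are assumptions made for this illustration.

```python
from itertools import permutations, product


def parity(Q):
    """Extended parity: +1 or -1 for a permutation, 0 for a non-bijection."""
    n = len(Q)
    if sorted(Q) != list(range(n)):
        return 0
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if Q[i] > Q[j])
    return -1 if inversions % 2 else 1


def signed_sum(A, Q):
    """Left-hand side of (25.10.1): sum over P of parity(P) * prod_i a[Q(i)][P(i)]."""
    n = len(A)
    total = 0
    for P in permutations(range(n)):
        term = parity(P)
        for i in range(n):
            term *= A[Q[i]][P[i]]
        total += term
    return total


def det(A):
    """The determinant is the special case where Q is the identity map."""
    return signed_sum(A, tuple(range(len(A))))


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
holds_for_all_Q = all(signed_sum(A, Q) == parity(Q) * det(A)
                      for Q in product(range(3), repeat=3))
```

When $Q$ repeats a row index, the signed sum is the determinant of a matrix with two equal rows, which is why it vanishes.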


25.10.10 REMARK: A generalised multinomial formula. 

Theorem 25.10.11 is a generalised multinomial formula. In the special case $m = n = 2$, this theorem states
that $(a_{11} + a_{12})(a_{21} + a_{22}) = a_{11}a_{21} + a_{11}a_{22} + a_{12}a_{21} + a_{12}a_{22}$. In other words, the product of the sums of
all rows is equal to the sum of all possible products of a single choice of an element in each row. This fairly
obvious assertion is useful in the proof of Theorem 25.10.12.


25.10.11 THEOREM: Multinomial formula for products of sums of matrix elements.
Let $A = [a_{ij}]_{i=1,j=1}^{m,n} \in M_{m,n}(K)$ be a matrix over a field $K$ with $m, n \in \mathbb{Z}_0^+$. Then


\[ \prod_{i=1}^m \sum_{j=1}^n a_{ij} = \sum_{J \in (\mathbb{N}_n)^{\mathbb{N}_m}} \prod_{i=1}^m a_{i,J(i)}. \tag{25.10.2} \]


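PROOF: Let $N_{m,n} = (\mathbb{N}_n)^{\mathbb{N}_m}$ for $m, n \in \mathbb{Z}_0^+$. Let $m = 0$. Then $\prod_{i=1}^m \sum_{j=1}^n a_{ij} = 1_K$ because the product
of an empty sequence of terms equals $1_K$, and $\sum_{J \in N_{m,n}} \prod_{i=1}^m a_{i,J(i)} = \sum_{J \in \{\emptyset\}} \prod_{i=1}^0 a_{i,J(i)} = \sum_{J \in \{\emptyset\}} 1_K = 1_K$
because $N_{0,n} = (\mathbb{N}_n)^{\emptyset} = \{\emptyset\} \ne \emptyset$ by Theorem 10.2.26 (ii). So line (25.10.2) is valid for $m = 0$ for all $n \in \mathbb{Z}_0^+$.
Let $m = 1$. Then $\prod_{i=1}^m \sum_{j=1}^n a_{ij} = \sum_{j=1}^n a_{1j}$, and $\sum_{J \in N_{1,n}} \prod_{i=1}^1 a_{i,J(i)} = \sum_{J \in N_{1,n}} a_{1,J(1)} = \sum_{j=1}^n a_{1j}$. So
line (25.10.2) is valid for $m = 1$ for all $n \in \mathbb{Z}_0^+$.

Assume that line (25.10.2) is valid for $m \in \mathbb{Z}^+$. Then


\begin{align*}
\prod_{i=1}^{m+1} \sum_{j=1}^n a_{ij}
&= \Bigl( \prod_{i=1}^m \sum_{j=1}^n a_{ij} \Bigr) \sum_{j=1}^n a_{m+1,j} \\
&= \Bigl( \sum_{J \in N_{m,n}} \prod_{i=1}^m a_{i,J(i)} \Bigr) \sum_{j=1}^n a_{m+1,j} \\
&= \sum_{J \in N_{m,n}} \sum_{j=1}^n \Bigl( \prod_{i=1}^m a_{i,J(i)} \Bigr) a_{m+1,j} \\
&= \sum_{J \in N_{m,n}} \sum_{j=1}^n \prod_{i=1}^{m+1} a_{i,(J \cup \{(m+1,j)\})(i)} \\
&= \sum_{J \in N_{m+1,n}} \prod_{i=1}^{m+1} a_{i,J(i)},
\end{align*}


which verifies line (25.10.2) for case $m + 1$ for all $n \in \mathbb{Z}_0^+$. Hence by induction, line (25.10.2) is valid for
all $m, n \in \mathbb{Z}_0^+$.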
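Both sides of line (25.10.2) are straightforward to evaluate directly for a small matrix. In the sketch below, the choice functions $J$ are represented as tuples produced by `itertools.product`, and indices are 0-based. The function names are hypothetical.

```python
from itertools import product
from math import prod


def product_of_row_sums(A):
    """Left-hand side of (25.10.2): the product over rows of the row sums."""
    return prod(sum(row) for row in A)


def sum_over_choices(A):
    """Right-hand side of (25.10.2): sum over all choice functions J of prod_i a[i][J(i)]."""
    m = len(A)
    n = len(A[0]) if m else 0
    return sum(prod(A[i][J[i]] for i in range(m))
               for J in product(range(n), repeat=m))


A = [[1, 2, 3], [4, 5, 6]]
lhs = product_of_row_sums(A)
rhs = sum_over_choices(A)

# The case m = 0: one empty choice function, whose empty product is 1.
lhs_empty = product_of_row_sums([])
rhs_empty = sum_over_choices([])
```

For the $2 \times 3$ example the row sums are 6 and 15, so both sides equal 90, and the right-hand side has $3^2 = 9$ terms.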


25.10.12 THEOREM: The determinant of the product equals the product of the determinants. 
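Let $A, B \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then $\det(AB) = \det(A) \det(B)$.


PROOF: Let $\mathcal{N}_n = (\mathbb{N}_n)^{\mathbb{N}_n}$ and $\mathcal{P}_n = \mathrm{perm}(\mathbb{N}_n) \subseteq \mathcal{N}_n$. Then


\begin{align*}
\det(AB)
&= \sum_{P \in \mathcal{P}_n} \mathrm{parity}(P) \prod_{i=1}^n \sum_{k=1}^n a_{ik} b_{k,P(i)} \\
&= \sum_{P \in \mathcal{P}_n} \mathrm{parity}(P) \sum_{Q \in \mathcal{N}_n} \Bigl( \prod_{i=1}^n a_{i,Q(i)} b_{Q(i),P(i)} \Bigr) \tag{25.10.3} \\
&= \sum_{Q \in \mathcal{N}_n} \sum_{P \in \mathcal{P}_n} \mathrm{parity}(P) \prod_{i=1}^n a_{i,Q(i)} b_{Q(i),P(i)} \\
&= \sum_{Q \in \mathcal{N}_n} \Bigl( \prod_{i=1}^n a_{i,Q(i)} \Bigr) \sum_{P \in \mathcal{P}_n} \mathrm{parity}(P) \prod_{i=1}^n b_{Q(i),P(i)} \\
&= \sum_{Q \in \mathcal{N}_n} \Bigl( \prod_{i=1}^n a_{i,Q(i)} \Bigr) \mathrm{parity}(Q) \sum_{P \in \mathcal{P}_n} \mathrm{parity}(P) \prod_{j=1}^n b_{j,P(j)} \tag{25.10.4} \\
&= \Bigl( \sum_{Q \in \mathcal{P}_n} \mathrm{parity}(Q) \prod_{i=1}^n a_{i,Q(i)} \Bigr) \Bigl( \sum_{P \in \mathcal{P}_n} \mathrm{parity}(P) \prod_{j=1}^n b_{j,P(j)} \Bigr) \\
&= \det(A) \det(B).
\end{align*}


Line (25.10.3) follows from Theorem 25.10.11. Line (25.10.4) follows from Theorem 25.10.9.


25.10.13 THEOREM: The determinant of the inverse equals the reciprocal of the determinant.
Let $A \in M^{\mathrm{inv}}_{n,n}(K)$, for some field $K$ and $n \in \mathbb{Z}^+$. Then $\det(A^{-1}) = \det(A)^{-1}$.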


PROOF: The assertion follows from Theorem 25.10.12 and the definition of an inverse matrix. 


25.10.14 REMARK: Geometric interpretation of the determinant of a matrix.

As in the case of the trace of a matrix, and its eigenvalues and eigenvectors, the determinant of a matrix may 
be interpreted differently according to whether the matrix is the component matrix of a linear transformation 
or the coefficient matrix of a bilinear form, or some other matrix application. 


In the application to linear space endomorphisms, the determinant of a matrix may be interpreted as its
volume multiplier. Thus given a region $S = \{x \in V;\ \exists \lambda \in [0,1]^n,\ x = \sum_{i=1}^n \lambda_i e_i\}$ spanned by a
sequence of $n$ vectors $(e_i)_{i=1}^n$ in a real linear space $V$ with dimension $n$, the image of that region under the linear
map $\phi : x \mapsto Ax$ has volume $\mathrm{vol}(\phi(S)) = \det(A)\,\mathrm{vol}(S)$.


The interpretation of the determinant in the application to bilinear forms is not so simple. Like the trace, the
determinant of a bilinear form is not invariant under all linear transformations. In fact, the “special linear”
transformations are defined to be those linear transformations which preserve the determinant of bilinear
forms. Since the determinant of a symmetric real matrix is equal to the product of its eigenvalues, the
determinant of a positive definite real symmetric matrix may be interpreted as the inverse square of the volume
multiplier for level sets of the bilinear form. Thus if $\Omega = \{x \in V;\ x^T x < 1\}$ and $\Omega' = \{x \in V;\ x^T A x < 1\}$, then
$\mathrm{vol}(\Omega') = \det(A)^{-1/2}\,\mathrm{vol}(\Omega)$ if $A$ is a positive definite real symmetric matrix, because each semi-axis of $\Omega'$
has length equal to the inverse square root of the corresponding eigenvalue of $A$.


25.11. Upper and lower modulus of real square matrices 


25.11.1 REMARK: The upper and lower modulus of a matrix over an ordered field.

The real-number field $\mathbb{R}$ has a standard total order. This makes possible the definition of an upper and lower
“modulus” for matrices over $\mathbb{R}$. This kind of modulus can be negative, unlike the absolute value function,
which is always non-negative.


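25.11.2 DEFINITION: The upper modulus for real square matrices is the function $\lambda^+ : M_{n,n}(\mathbb{R}) \to \mathbb{R}$
defined for $n \in \mathbb{Z}_0^+$ by


\[ \forall A \in M_{n,n}(\mathbb{R}), \quad \lambda^+(A) = \sup \Bigl\{ \sum_{i,j=1}^n a_{ij} v_i v_j;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \Bigr\}. \]


The lower modulus for real square matrices is the function $\lambda^- : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined for $n \in \mathbb{Z}_0^+$ by


\[ \forall A \in M_{n,n}(\mathbb{R}), \quad \lambda^-(A) = \inf \Bigl\{ \sum_{i,j=1}^n a_{ij} v_i v_j;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \Bigr\}. \]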


25.11.3 REMARK: Applications of matrix upper and lower modulus. 

The terms “upper modulus” and “lower modulus” in Definition 25.11.2 are probably non-standard. These
functions are useful for putting upper and lower limits on the coefficients of elliptic second-order partial
derivatives in partial differential equations.


25.11.4 REMARK: The apparently useless upper and lower modulus for zero-dimensional matrices.
In the case $n = 0$ in Definition 25.11.2, $\lambda^+(A) = -\infty$ and $\lambda^-(A) = \infty$ for all $A \in M_{n,n}(\mathbb{R})$. Therefore these
functions are of dubious value for $n = 0$.


25.11.5 THEOREM: Formulas for upper/lower modulus of symmetrants and anti-symmetrants of matrices.
(i) $\forall n \in \mathbb{Z}^+$, $\forall A \in M_{n,n}(\mathbb{R})$, $\lambda^+(\frac{1}{2}(A + A^T)) = \lambda^+(A)$.
(ii) $\forall n \in \mathbb{Z}^+$, $\forall A \in M_{n,n}(\mathbb{R})$, $\lambda^-(\frac{1}{2}(A + A^T)) = \lambda^-(A)$.
(iii) $\forall n \in \mathbb{Z}^+$, $\forall A \in M_{n,n}(\mathbb{R})$, $\lambda^+(\frac{1}{2}(A - A^T)) = \lambda^-(\frac{1}{2}(A - A^T)) = 0$.


PROOF: Parts (i) and (ii) follow from $\sum_{i,j=1}^n (\frac{1}{2}(a_{ij} + a_{ji})) v_i v_j = \sum_{i,j=1}^n a_{ij} v_i v_j$, which follows from
commutativity of real number multiplication. Part (iii) follows from $\sum_{i,j=1}^n (\frac{1}{2}(a_{ij} - a_{ji})) v_i v_j = 0$.


25.11.6 REMARK: Applications of matrix upper and lower modulus to elliptic second-order PDEs.

In the study of second-order partial differential equations, the order properties of the coefficient matrix for the 
second-order derivatives are important for the classification of operators and equations. An elliptic operator, 
for example, requires a second-order coefficient matrix which is positive definite. A weakly elliptic operator 
requires a positive semi-definite second-order coefficient matrix. In curved space, similar classifications are 
applicable, although the second-order derivatives in this case should be covariant and the coefficient matrix 
is replaced by a second-order contravariant tensor. Then the (semi-)definiteness properties are applied to the 
components of this contravariant tensor with respect to a basis. Of course, the (semi-)definiteness properties 
must be chart-independent in order to ensure that the properties are well defined. (See Definition 30.5.3 for 
the bilinear function version of Definition 25.11.7.) 


25.11.7 DEFINITION: A real square matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ for $n \in \mathbb{Z}_0^+$ is said to be


(i) positive semi-definite if $\forall v \in \mathbb{R}^n$, $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$,
(ii) negative semi-definite if $\forall v \in \mathbb{R}^n$, $\sum_{i,j=1}^n a_{ij} v_i v_j \le 0$,
(iii) positive definite if $\forall v \in \mathbb{R}^n \setminus \{0\}$, $\sum_{i,j=1}^n a_{ij} v_i v_j > 0$,
(iv) negative definite if $\forall v \in \mathbb{R}^n \setminus \{0\}$, $\sum_{i,j=1}^n a_{ij} v_i v_j < 0$.


25.11.8 THEOREM: The relation of the upper and lower matrix modulus to eigenvalues.
Let $n \in \mathbb{Z}^+$ and $A \in M_{n,n}(\mathbb{R})$. Then


(i) $A$ is positive semi-definite if and only if $\lambda^-(A) \ge 0$;
(ii) $A$ is negative semi-definite if and only if $\lambda^+(A) \le 0$.


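PROOF: To show part (i), let $n \in \mathbb{Z}^+$ and let $A = [a_{ij}]_{i,j=1}^n$ be positive semi-definite. Then $A \in M_{n,n}(\mathbb{R})$ and
$\forall v \in \mathbb{R}^n$, $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$. By Definition 25.11.2, $\lambda^-(A) = \inf \{ \sum_{i,j=1}^n a_{ij} v_i v_j;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \}$.
But $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$ for any $v \in \mathbb{R}^n$. Therefore $\lambda^-(A) \ge 0$ as claimed.

Now suppose that $A = [a_{ij}]_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ and $\lambda^-(A) \ge 0$. If $v = 0 \in \mathbb{R}^n$, then $\sum_{i,j=1}^n a_{ij} v_i v_j = 0$. So
$\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$. If $v \ne 0$, define $w \in \mathbb{R}^n$ by $w = k^{-1/2} v$, where $k = \sum_{i=1}^n v_i^2$. Then $\sum_{i=1}^n w_i^2 = 1$. So
$\sum_{i,j=1}^n a_{ij} w_i w_j \ge 0$ by the definition of $\lambda^-(A)$. Therefore $\sum_{i,j=1}^n a_{ij} v_i v_j = k \sum_{i,j=1}^n a_{ij} w_i w_j \ge 0$ since $k > 0$.


It follows that $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$ for all $v \in \mathbb{R}^n$. Hence $A$ is positive semi-definite.


The proof of (ii) follows by suitable changes of sign.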
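For $2 \times 2$ matrices, the upper and lower modulus can be estimated by sampling unit vectors on the circle. The sketch below compares such estimates with the largest and smallest eigenvalues of the symmetric part $\frac{1}{2}(A + A^T)$. The identification of the modulus with these eigenvalues is the standard Rayleigh quotient fact, which is assumed here and is not proved in this section. The function names and the example matrices are hypothetical.

```python
import math


def quad_form(A, v):
    """The quadratic form sum_{i,j} a_ij v_i v_j for a 2 x 2 matrix A."""
    return sum(A[i][j] * v[i] * v[j] for i in range(2) for j in range(2))


def modulus_sampled(A, steps=3600):
    """Estimate (upper, lower) modulus by sampling unit vectors on the circle."""
    values = [quad_form(A, (math.cos(2 * math.pi * k / steps),
                            math.sin(2 * math.pi * k / steps)))
              for k in range(steps)]
    return max(values), min(values)


def modulus_closed_form(A):
    """Eigenvalues of the symmetric part [[p, q], [q, r]] of a 2 x 2 matrix."""
    p, r = A[0][0], A[1][1]
    q = (A[0][1] + A[1][0]) / 2
    mid = (p + r) / 2
    rad = math.hypot((p - r) / 2, q)
    return mid + rad, mid - rad


# A is not symmetric, but its symmetric part [[2, 1], [1, 2]] is positive definite.
A = [[2.0, 3.0], [-1.0, 2.0]]
upper, lower = modulus_closed_form(A)
upper_est, lower_est = modulus_sampled(A)

# The anti-symmetric part of A has zero upper and lower modulus.
K = [[0.0, 2.0], [-2.0, 0.0]]
upper_K, lower_K = modulus_closed_form(K)
```

Since the lower modulus of `A` is positive, `A` is positive semi-definite (in fact positive definite) in the sense of Definition 25.11.7, although it is not symmetric.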


25.11.9 REMARK: Applications of definite and semi-definite matrices. 

Definite and semi-definite matrices are generally defined only for the special case of real symmetric or complex
hermitian matrices. (See for example EDM2 [113], 269.1.) Whether a real matrix $A$ has a definiteness or
semi-definiteness property depends only on the symmetric part $\frac{1}{2}(A + A^T)$. (This is stated in Theorem 25.11.5.)
The main applications of (semi-)definiteness are to eigenspaces for real symmetric (or complex hermitian) 
matrices, which are guaranteed to have real eigenvalues. Such matrices therefore have well-defined ordering 
properties when applied to vectors. Nevertheless, Definition 25.11.7 defines (semi-)definiteness for all real 
square matrices. Notation 25.13.9 specialises the concept to real symmetric matrices. 


25.12. Upper and lower bounds for real square matrices 


25.12.1 REMARK: Applications of upper and lower bounds of matrices. 

The upper and lower bounds for matrices in Definition 25.12.2 are useful for the inverse function theorem.
(See Theorem 41.10.4.) The inverse function theorem is useful for defining and proving properties of
regular submanifolds, embeddings, immersions and submersions of differentiable manifolds. (See Sections 52.3
and 52.4 for submanifolds. See Section 52.5 for embeddings, immersions and submersions.)


For simplicity, the column matrix space $M_{n,1}(\mathbb{R})$ is replaced with the linear space $\mathbb{R}^n$ in Section 25.12.
Most of the time, it can be assumed that the action of a square matrix $A \in M_{n,n}(K)$ on a vector $v \in K^n$
is the left action $v \mapsto Av$ as for column matrix vectors. One must always be alert, however, since the right
action $v \mapsto vA$ is also quite often encountered in differential geometry, where the vector $v$ is interpreted as
an element of the row matrix space $M_{1,n}(K)$, typically for the field $K = \mathbb{R}$.


25.12.2 DEFINITION: Upper and lower bound functions for real square matrices.
The upper bound function for real square matrices is the function $\mu^+ : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined for $n \in \mathbb{Z}^+$ by


\[ \forall A \in M_{n,n}(\mathbb{R}), \quad \mu^+(A) = \sup \{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \}, \tag{25.12.1} \]


where $|\cdot| : \mathbb{R}^n \to \mathbb{R}_0^+$ denotes the 2-norm as in Notation 24.7.13.
The lower bound function for real square matrices is the function $\mu^- : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined for $n \in \mathbb{Z}^+$ by


\[ \forall A \in M_{n,n}(\mathbb{R}), \quad \mu^-(A) = \inf \{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \}. \tag{25.12.2} \]


25.12.3 THEOREM: Some basic properties of upper and lower bound functions for square matrices. 
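Let $A \in M_{n,n}(\mathbb{R})$ for some $n \in \mathbb{Z}^+$.


(i) $\mu^-(A) \le \mu^+(A)$.
(ii) $\mu^+(A) \in \mathbb{R}_0^+$.
(iii) $\mu^-(A) \in \mathbb{R}_0^+$.
(iv) If $A = 0$, then $\mu^+(A) = \mu^-(A) = 0$.
(v) If $A \ne 0$, then $\mu^+(A) > 0$.
(vi) $\forall v \in \mathbb{R}^n$, $|Av| \le \mu^+(A)|v|$.
(vii) $\forall v \in \mathbb{R}^n$, $|Av| \ge \mu^-(A)|v|$.
(viii) $\mu^+(A) = \inf \{ \lambda \in \mathbb{R};\ \forall v \in \mathbb{R}^n,\ |Av| \le \lambda|v| \}$.
(ix) $\mu^-(A) = \sup \{ \lambda \in \mathbb{R};\ \forall v \in \mathbb{R}^n,\ |Av| \ge \lambda|v| \}$.
(x) If $\lambda \in \mathbb{R}$ and $\forall v \in \mathbb{R}^n$, $|Av| \le \lambda|v|$, then $\mu^+(A) \le \lambda$.
(xi) If $\lambda \in \mathbb{R}$ and $\forall v \in \mathbb{R}^n$, $|Av| \ge \lambda|v|$, then $\mu^-(A) \ge \lambda$.
(xii) $A$ is invertible if and only if $\mu^-(A) > 0$.
(xiii) If $A$ is invertible, then $\mu^+(A^{-1})^{-1} = \mu^-(A) \le \mu^+(A) = \mu^-(A^{-1})^{-1}$.
(xiv) If $A$ is invertible, then $\mu^+(A)^{-1} = \mu^-(A^{-1}) \le \mu^+(A^{-1}) = \mu^-(A)^{-1}$.
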

PROOF: Part (i) follows directly from Definition 25.12.2 because $\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \ne \emptyset$.

For part (ii), let $n \in \mathbb{Z}^+$ and let $A \in M_{n,n}(\mathbb{R})$. Then $\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \}$ is a non-empty set
of non-negative real numbers. Suppose that $v \in \mathbb{R}^n$ and $|v| = 1$. Then $|Av|^2 = \sum_{i=1}^n (\sum_{j=1}^n a_{ij} v_j)^2$.
So by Theorem 19.8.9 (the Cauchy-Schwarz inequality), $|Av|^2 \le \sum_{i=1}^n (\sum_{j=1}^n a_{ij}^2)(\sum_{j=1}^n v_j^2) = \sum_{i,j=1}^n a_{ij}^2$.
Therefore $0 \le \mu^+(A) < \infty$.

For part (iii), $\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \ne \emptyset$ and $|Av| \ge 0$ for all $v \in \mathbb{R}^n$. So $0 \le \mu^-(A) < \infty$.

Part (iv) follows immediately from Definition 25.12.2.

For part (v), let $A = [a_{ij}]_{i,j=1}^n \ne 0$. Then $a_{k\ell} \ne 0$ for some $k, \ell \in \mathbb{N}_n$. Let $v = (\delta_{j\ell})_{j=1}^n \in \mathbb{R}^n$. Then
$Av = (\sum_{j=1}^n a_{ij} \delta_{j\ell})_{i=1}^n = (a_{i\ell})_{i=1}^n \ne 0$. Therefore $|Av| \ge |a_{k\ell}| > 0$. Hence $\mu^+(A) > 0$.
For part (vi), let $v \in \mathbb{R}^n$. If $v = 0$, then the inequality is an equality, $0 = 0$. Suppose that $v \ne 0$. Let
$w = v/|v|$. Then $|w| = 1$. So by line (25.12.1), $|Aw| \le \mu^+(A)$. Hence $|Av| = |Aw| \cdot |v| \le \mu^+(A)|v|$.
For part (vii), let $v \in \mathbb{R}^n$. If $v = 0$, then the inequality is an equality, $0 = 0$. Suppose that $v \ne 0$. Let
$w = v/|v|$. Then $|w| = 1$. So by line (25.12.2), $|Aw| \ge \mu^-(A)$. Hence $|Av| = |Aw| \cdot |v| \ge \mu^-(A)|v|$.
For part (viii), let $S^+(A) = \{ \lambda \in \mathbb{R};\ \forall v \in \mathbb{R}^n,\ |Av| \le \lambda|v| \}$. Then $\mu^+(A) \in S^+(A)$ by part (vi). Therefore
$S^+(A) \ne \emptyset$. But $S^+(A) \subseteq \mathbb{R}_0^+$ because $\mathbb{R}^n$ contains at least one non-zero vector $v$, for which $|Av| \ge 0$
and $|v| > 0$. So $\lambda \ge |Av|/|v| \ge 0$ for all $\lambda \in S^+(A)$. So $0 \le \inf(S^+(A)) \le \mu^+(A)$ since $\mu^+(A) \in S^+(A)$. Now
let $\lambda \in S^+(A)$. Then $|Av| \le \lambda$ for all $v \in \mathbb{R}^n$ with $|v| = 1$. So $\mu^+(A) \le \lambda$. Therefore $\mu^+(A) \le \inf(S^+(A))$.
Hence $\mu^+(A) = \inf(S^+(A))$ as claimed.

For part (ix), let $S^-(A) = \{ \lambda \in \mathbb{R};\ \forall v \in \mathbb{R}^n,\ |Av| \ge \lambda|v| \}$. Then $0 \in S^-(A)$. So $S^-(A) \ne \emptyset$ and
$\sup(S^-(A)) \ge 0$. Let $\lambda \in S^-(A)$. Then $|Av| \ge \lambda$ for all $v \in \mathbb{R}^n$ with $|v| = 1$. So $\mu^-(A) \ge \lambda$ by
Definition 25.12.2. Therefore $\mu^-(A) \ge \sup(S^-(A))$. So $0 \le \sup(S^-(A)) \le \mu^-(A) < \infty$. But $\mu^-(A) \in S^-(A)$
by part (vii). So $\sup(S^-(A)) \ge \mu^-(A)$. Hence $\mu^-(A) = \sup(S^-(A))$ as claimed.

Parts (x) and (xi) follow directly from parts (viii) and (ix).

For part (xii), let $A$ be invertible. Let $v \in \mathbb{R}^n$. Let $w = Av$. Then $v = A^{-1}w$. So $|v| \le \mu^+(A^{-1})|w|$ by
part (vi), and $\mu^+(A^{-1}) < \infty$ by part (ii). So $|Av| = |w| \ge \mu^+(A^{-1})^{-1}|v|$. Therefore $\mu^-(A) \ge \mu^+(A^{-1})^{-1}$
by part (xi). But $\mu^+(A^{-1})^{-1} > 0$. So $\mu^-(A) > 0$. To show the converse, suppose that $\mu^-(A) > 0$. Then
$Av \ne 0$ for all $v \in \mathbb{R}^n \setminus \{0\}$. Hence $A$ is invertible by Theorem 25.8.12 (iii).

For part (xiii), let $A \in M_{n,n}(\mathbb{R})$ be invertible. Let $v \in \mathbb{R}^n$. Let $w = Av$. Then $v = A^{-1}w$. So $|v| = |A^{-1}w| \le
\mu^+(A^{-1})|w|$ by part (vi). But $\mu^+(A^{-1}) > 0$ by part (v). So $|Av| = |w| \ge \mu^+(A^{-1})^{-1}|v|$. Consequently
$\forall v \in \mathbb{R}^n$, $|Av| \ge \mu^+(A^{-1})^{-1}|v|$. Hence $\mu^-(A) \ge \mu^+(A^{-1})^{-1}$ by part (xi).

Now let $v \in \mathbb{R}^n$. Let $w = A^{-1}v$. Then $v = Aw$. So $|v| = |Aw| \ge \mu^-(A)|w|$ by part (vii). But $\mu^-(A) > 0$
by part (xii). So $|A^{-1}v| = |w| \le \mu^-(A)^{-1}|v|$. Consequently $\forall v \in \mathbb{R}^n$, $|A^{-1}v| \le \mu^-(A)^{-1}|v|$. Therefore
$\mu^+(A^{-1}) \le \mu^-(A)^{-1}$ by part (x). So $\mu^-(A) \le \mu^+(A^{-1})^{-1}$, and hence $\mu^-(A) = \mu^+(A^{-1})^{-1}$.

By substituting $A^{-1}$ for $A$, it follows that $\mu^-(A^{-1}) = \mu^+(A)^{-1}$. Hence $\mu^+(A) = \mu^-(A^{-1})^{-1}$. The inequality
$\mu^-(A) \le \mu^+(A)$ follows from part (i).

Part (xiv) follows from part (xiii) by substituting $A^{-1}$ for $A$.
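For a real $2 \times 2$ matrix, the upper and lower bounds can be computed in closed form. The sketch below assumes the standard fact, not proved in this section, that $\mu^+(A)$ and $\mu^-(A)$ are the largest and smallest singular values of $A$, which are the square roots of the eigenvalues of $A^TA$. The function names `bounds` and `inv2` are hypothetical.

```python
import math


def bounds(A):
    """Return (mu_plus, mu_minus) for a real 2 x 2 matrix via its singular values."""
    (a, b), (c, d) = A
    t = a * a + b * b + c * c + d * d   # trace of A^T A
    det = a * d - b * c                 # det(A^T A) equals det(A)^2
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    return math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))


def inv2(A):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[3.0, 0.0], [4.0, 5.0]]
mu_plus, mu_minus = bounds(A)
mu_plus_inv, mu_minus_inv = bounds(inv2(A))

# A singular matrix has lower bound zero, in line with part (xii).
singular_lower = bounds([[1.0, 2.0], [2.0, 4.0]])[1]
```

For this matrix the eigenvalues of $A^TA$ are 45 and 5, so the reciprocal relations in parts (xiii) and (xiv) can be checked directly against the bounds of the inverse.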


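25.12.4 THEOREM: Properties of upper/lower bounds of matrix sums and products.
Let $n \in \mathbb{Z}^+$.


(i) $\forall \lambda \in \mathbb{R}$, $\forall A \in M_{n,n}(\mathbb{R})$, $\mu^+(\lambda A) = |\lambda|\mu^+(A)$.
(ii) $\forall A, B \in M_{n,n}(\mathbb{R})$, $\mu^+(A + B) \le \mu^+(A) + \mu^+(B)$.
(iii) $\mu^+$ is a norm on the linear space $M_{n,n}(\mathbb{R})$.
(iv) $\forall A, B \in M_{n,n}(\mathbb{R})$, $\mu^+(AB) \le \mu^+(A)\mu^+(B)$.
(v) $\forall A, B \in M_{n,n}(\mathbb{R})$, $\mu^-(AB) \ge \mu^-(A)\mu^-(B)$.


PROOF: For part (i), let $\lambda \in \mathbb{R}$ and $A \in M_{n,n}(\mathbb{R})$. Then it follows from Definition 25.12.2 that $\mu^+(\lambda A) =
\sup \{ |(\lambda A)v|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} = |\lambda| \sup \{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} = |\lambda|\mu^+(A)$.
For part (ii), let $A, B \in M_{n,n}(\mathbb{R})$. Then by Definition 25.12.2,


\begin{align*}
\mu^+(A + B) &= \sup \{ |(A + B)v|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \\
&\le \sup \{ |Av| + |Bv|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \\
&\le \sup \{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} + \sup \{ |Bv|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \\
&= \mu^+(A) + \mu^+(B).
\end{align*}


Part (iii) follows from Definitions 19.6.2 and 25.3.2, Theorem 25.12.3 (iv, v), and parts (i) and (ii).
For part (iv), let $A, B \in M_{n,n}(\mathbb{R})$. Let $v \in \mathbb{R}^n$ with $|v| = 1$. Then $|(AB)v| = |A(Bv)| \le \mu^+(A)|Bv| \le
\mu^+(A)\mu^+(B)|v|$ by Definition 25.12.2. Hence $\mu^+(AB) \le \mu^+(A)\mu^+(B)$ by Definition 25.12.2.


For part (v), let $A, B \in M_{n,n}(\mathbb{R})$. Let $v \in \mathbb{R}^n$ with $|v| = 1$. Then $|(AB)v| = |A(Bv)| \ge \mu^-(A)|Bv| \ge
\mu^-(A)\mu^-(B)|v|$ by Definition 25.12.2. Hence $\mu^-(AB) \ge \mu^-(A)\mu^-(B)$ by Definition 25.12.2.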


25.12.5 REMARK: The benefits of the lower bound function for matrices. 

Clearly the lower bound function $\mu^-$ is not a norm on the real square matrices, but it does have benefits. The
lower bound is closely related to the invertibility of a matrix, and invertibility is one of the key properties of
matrices which are significant for applications. By Theorem 25.12.3 (xii), a matrix $A$ is invertible if and only
if its lower bound $\mu^-(A)$ is positive. Consequently lower bounds for the matrix lower bound are of some
interest. This is the motivation for Theorem 25.12.6.


25.12.6 THEOREM: Relations between matrix invertibility and upper/lower bounds. 
Let $A, B \in M_{n,n}(\mathbb{R})$ for some $n \in \mathbb{Z}^+$.


(i) $\mu^-(B) \ge \mu^-(A) - \mu^+(B - A)$.
(ii) If $A$ is invertible and $\mu^+(B - A) < \mu^-(A)$, then $B$ is invertible.
(iii) $|\mu^-(B) - \mu^-(A)| \le \mu^+(B - A)$.
(iv) $\mu^-(A) - \mu^+(B - A) \le \mu^-(B) \le \mu^-(A) + \mu^+(B - A)$.
(v) If $A$ and $B$ are invertible, $\mu^+(B^{-1} - A^{-1}) \le \mu^+(A^{-1})\mu^+(B^{-1})\mu^+(B - A) = \mu^+(B - A)\mu^-(A)^{-1}\mu^-(B)^{-1}$.


PROOF: For part (i), let $v \in \mathbb{R}^n$ with $|v| = 1$. Then $|Bv| = |(B - A)v + Av| \ge |Av| - |(B - A)v|$ by
the triangle inequality for the norm on $\mathbb{R}^n$. (See Theorem 19.6.4 (v).) So $|Bv| \ge \mu^-(A) - \mu^+(B - A)$ by
Definition 25.12.2. Hence $\mu^-(B) \ge \mu^-(A) - \mu^+(B - A)$ by Definition 25.12.2.

Part (ii) follows from part (i) and Theorem 25.12.3 (xii).

Part (iii) follows from a double application of part (i).

Part (iv) follows from part (iii).

For part (v), for $A, B$ invertible, $\mu^+(B^{-1} - A^{-1}) = \mu^+(B^{-1}(A - B)A^{-1}) \le \mu^+(A^{-1})\mu^+(B^{-1})\mu^+(B - A)$ by
Theorem 25.12.4 (i, iv), and $\mu^+(A^{-1})\mu^+(B^{-1}) = \mu^-(A)^{-1}\mu^-(B)^{-1}$ by Theorem 25.12.3 (xiv).
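The perturbation bounds of Theorem 25.12.6 can be observed for a pair of nearby $2 \times 2$ matrices. As before, the sketch assumes the standard fact that the upper and lower bounds of a $2 \times 2$ matrix are its largest and smallest singular values. The helper names and the matrices `A` and `B` are chosen only for illustration.

```python
import math


def bounds(A):
    """Return (mu_plus, mu_minus) for a real 2 x 2 matrix via its singular values."""
    (a, b), (c, d) = A
    t = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    return math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))


def sub(X, Y):
    """Elementwise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inv2(A):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[3.0, 0.0], [4.0, 5.0]]
B = [[3.5, -0.5], [4.25, 5.5]]      # a small perturbation of A

mu_plus_diff = bounds(sub(B, A))[0]  # mu^+(B - A)
mu_minus_A = bounds(A)[1]
mu_minus_B = bounds(B)[1]

part_i = mu_minus_B >= mu_minus_A - mu_plus_diff
part_ii_hypothesis = mu_plus_diff < mu_minus_A
part_v_lhs = bounds(sub(inv2(B), inv2(A)))[0]           # mu^+(B^{-1} - A^{-1})
part_v_rhs = mu_plus_diff / (mu_minus_A * mu_minus_B)
```

Since the hypothesis of part (ii) holds for this pair, `B` is guaranteed to be invertible, which is consistent with `mu_minus_B` being positive.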


25.12.7 REMARK: Continuity of the map from a matrix to its inverse. 

From Theorem 25.12.6 (v), it can be shown that the inversion map $A \mapsto A^{-1}$ for invertible matrices is
continuous. However, the formal statement of this assertion must wait until continuity on metric spaces has
been defined. (See Section 24.7 for norms on linear spaces.)


25.12.8 REMARK: Alternative fields, norms and absolute value functions. 

The definitions and theorems for lower and upper bounds of real square matrices are mostly directly applicable 
to general square matrices over an ordered field which has an absolute value function. (See Section 18.8 for 
ordered fields. See Section 18.5 for absolute value functions on rings.)

The root-mean square norm $v \mapsto |v| = |v|_2 = (\sum_i v_i^2)^{1/2}$ for $v \in \mathbb{R}^n$, which is implicit in Definition 25.12.2,
could be replaced by the max-norm $v \mapsto |v|_\infty = \max\{|v_i|; i \in \mathbb{N}_n\}$ or the sum-norm $v \mapsto |v|_1 = \sum_i |v_i|$ on
$\mathbb{R}^n$ for $n \in \mathbb{Z}^+$. (See Definition 24.7.11 and Notation 24.7.13 for these norms.) Then the upper bound for a
matrix $A \in M_{n,n}(\mathbb{R})$ has the form $A \mapsto \max_{i=1}^n \sum_{j=1}^n |a_{ij}|$ for the max-norm, and $A \mapsto \max_{j=1}^n \sum_{i=1}^n |a_{ij}|$
for the sum-norm. (The latter of these two upper bounds is defined to be the norm of a rectangular matrix 
by Edwards [67], page 174.) 
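
The two alternative upper bounds can be computed directly from the matrix entries. The following Python sketch assumes that the upper bound of $A$ for a given norm is the supremum of $|Av|$ over vectors with $|v| \le 1$. For the max-norm this is the largest absolute row sum, and for the sum-norm it is the largest absolute column sum. The function names and the sample matrix are hypothetical.

```python
from itertools import product

def upper_bound_max_norm(a):
    """Largest absolute row sum: the upper bound induced by the max-norm."""
    return max(sum(abs(x) for x in row) for row in a)

def upper_bound_sum_norm(a):
    """Largest absolute column sum: the upper bound induced by the sum-norm."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))

def apply(a, v):
    """Matrix-vector product a v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

A = [[1.0, -2.0], [3.0, 0.5]]
n = len(A)

# For the max-norm, the supremum over the unit ball is attained at a sign vector.
brute_max = max(
    max(abs(y) for y in apply(A, v)) for v in product([-1.0, 1.0], repeat=n)
)

# For the sum-norm, the supremum is attained at a standard basis vector.
basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
brute_sum = max(sum(abs(y) for y in apply(A, e)) for e in basis)

print(upper_bound_max_norm(A), brute_max)
print(upper_bound_sum_norm(A), brute_sum)
```

The brute-force values only test the extreme points of each unit ball, which suffices because $v \mapsto |Av|$ is convex.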


25.13. Real symmetric matrices 


25.13.1 REMARK: Applicability of symmetric matrices. 

The basic properties of real symmetric matrices correspond to their role as the coefficients of bilinear forms, 
which are more closely related to the multilinear algebra in Chapter 27 than the linear maps in Chapter 23. 
This comment applies even more to the definite and semi-definite matrices in Notation 25.13.9. Consequently 
it is not possible to use the properties of abstract bilinear algebra at this point in the book. 


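25.13.2 DEFINITION: A symmetric $n \times n$ matrix over a field $K$ for $n \in \mathbb{Z}_0^+$ is a matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(K)$
such that $\forall i, j \in \mathbb{N}_n$, $a_{ij} = a_{ji}$.


25.13.3 DEFINITION: A real symmetric $n \times n$ matrix for $n \in \mathbb{Z}_0^+$ is a matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ such
that $\forall i, j \in \mathbb{N}_n$, $a_{ij} = a_{ji}$.


25.13.4 NOTATION: $\mathrm{Sym}(n, \mathbb{R})$ denotes the set of real symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.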


25.13.5 REMARK: Symmetric matrices are not closed under simple matrix multiplication.

It is not generally true that $AB \in \mathrm{Sym}(n, \mathbb{R})$ if $A, B \in \mathrm{Sym}(n, \mathbb{R})$. By Theorem 25.13.6, $(AB)^T = BA$
if $A, B \in \mathrm{Sym}(n, \mathbb{R})$, which is not quite the same thing. Theorem 25.4.15 implies that $(AB)^T = B^T A^T$ for
any matrices $A, B \in M_{n,n}(\mathbb{R})$. So $(AB)^T = BA = B^T A^T = (A^T B^T)^T$ for all $A, B \in \mathrm{Sym}(n, \mathbb{R})$. It is not
possible to conclude the general equality of $(AB)^T$ and $AB$ from this. However, $AB + BA$ is symmetric if
$A$ and $B$ are symmetric, as shown in Theorem 25.13.7.


25.13.6 THEOREM: The transpose of a real symmetric matrix product equals the reverse product. 
Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then $(AB)^T = BA$.


PROOF: Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then

$$\forall i, j \in \mathbb{N}_n, \quad ((AB)^T)_{ij} = (AB)_{ji} = \sum_{k=1}^n a_{jk} b_{ki} = \sum_{k=1}^n a_{kj} b_{ik} = \sum_{k=1}^n b_{ik} a_{kj} = (BA)_{ij}.$$

Therefore $(AB)^T = BA$.


25.13.7 THEOREM: The anti-commutator of two real symmetric matrices is a real symmetric matrix. 
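Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then $AB + BA \in \mathrm{Sym}(n, \mathbb{R})$.


PROOF: Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then by Theorem 25.13.6 and Remark 25.4.14, $(AB + BA)^T =
(AB)^T + (BA)^T = BA + AB = AB + BA$. So $AB + BA \in \mathrm{Sym}(n, \mathbb{R})$ as claimed.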
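
A small integer example illustrates Remark 25.13.5 and Theorems 25.13.6 and 25.13.7. The following Python sketch uses hypothetical helper functions and two arbitrarily chosen symmetric matrices. It is an illustration, not a proof.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def is_symmetric(a):
    return a == transpose(a)

A = [[1, 2], [2, 3]]   # symmetric
B = [[0, 1], [1, 4]]   # symmetric

AB = matmul(A, B)
BA = matmul(B, A)

print(is_symmetric(AB))            # the product need not be symmetric
print(transpose(AB) == BA)         # Theorem 25.13.6
print(is_symmetric(add(AB, BA)))   # Theorem 25.13.7
```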


25.13.8 REMARK: Semi-definite and definite symmetric matrices. 
The semi-definite and definite concepts for matrices in Definition 25.11.7 are most often applied to real 
symmetric matrices as in Notation 25.13.9. 


25.13.9 NOTATION: 
$\mathrm{Sym}_0^+(n, \mathbb{R})$ denotes the set of real positive semi-definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
$\mathrm{Sym}_0^-(n, \mathbb{R})$ denotes the set of real negative semi-definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
$\mathrm{Sym}^+(n, \mathbb{R})$ denotes the set of real positive definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
$\mathrm{Sym}^-(n, \mathbb{R})$ denotes the set of real negative definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.

25.13.10 REMARK:  Eigenvalues of square matrices. 

Square matrices are useful for at least two kinds of applications, namely as the coefficients of a linear 
endomorphism and as the coefficients of a bilinear map. (See Example 27.5.9 for coefficients of bilinear 
maps.) In both cases, a basis must first be specified for the linear space. 


In the case of a linear map $\phi: V \to V$ for a linear space $V$, the eigenspaces of the matrix of the map are
useful for the study of the properties of the linear map. In the case of a bilinear function $\beta: V \times V \to K$,
where $K$ is the field of the linear space $V$, the eigenvalues of the matrix of the bilinear function are also
useful, but in a different way.


If a matrix $A = (a_{ij})_{i,j=1}^n$ is real and symmetric, its eigenspaces span the whole linear space $V$, and the
eigenvalues are real. Denote by $\lambda^+$ the maximum eigenvalue and by $\lambda^-$ the minimum eigenvalue. Then a
linear map $\phi: V \to V$ never multiplies the length of a vector in $V$ by a multiple which is greater than
$\Lambda = \max(|\lambda^+|, |\lambda^-|)$, no matter how the norm on the linear space is chosen. In fact, $\Lambda = \sup\{|\phi(v)|; |v| \le 1\}$.
(See Definition 25.11.2 for the related matrix norms $\lambda^+$ and $\lambda^-$ which are valid for all square matrices.)


For a bilinear map corresponding to a real symmetric matrix, $\Lambda = \sup\{|\beta(v,v)|; v \in V \text{ and } |v| \le 1\}$ for any
choice of norm.


It would be useful to be able to ignore the complicated algebra of eigenvalues and use the simpler-looking 
norm-based expressions instead. (In practice, the sup and inf expressions would involve complicated algebra 
anyway, but they do look less algebraic.) The norm-based sup/inf expressions are perfectly meaningful for 
asymmetric matrices, where eigenvalue calculations are more difficult. 
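
The agreement between the eigenvalue expression and the norm-based supremum can be observed numerically for a $2 \times 2$ real symmetric matrix. The following Python sketch is restricted to the Euclidean norm, which is an assumption of the example, and it approximates the supremum by sampling the unit circle. The function names and the sample matrix are hypothetical.

```python
import math

def eigenvalues_sym_2x2(a, b, d):
    """Return (minimum, maximum) eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def sampled_sup(a, b, d, samples=3600):
    """Approximate sup{|Av| : |v| = 1} by sampling the Euclidean unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a * x + b * y, b * x + d * y))
    return best

# The matrix [[1, 2], [2, -2]] has a negative eigenvalue of largest magnitude.
lam_minus, lam_plus = eigenvalues_sym_2x2(1.0, 2.0, -2.0)
big_lambda = max(abs(lam_plus), abs(lam_minus))
approx = sampled_sup(1.0, 2.0, -2.0)
print(lam_minus, lam_plus, big_lambda, approx)
```

The example matrix is chosen so that the eigenvalue of largest magnitude is the minimum eigenvalue, which shows why the absolute values must be taken before the maximum.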


25.14. Matrix groups 


25.14.1 REMARK: The classical matrix groups. 
Some classical matrix groups are listed in Table 25.14.1. The field may be $\mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$ (the quaternions).


In applications to physics, Lie groups are often classical matrix groups, but they may also be extended to
include translation operations. Such inhomogeneous or affine groups are Lie groups, but are not matrix
groups because translation operations are not linear actions of matrices on tuples. Even if such group
elements are represented as rectangular arrays of numbers, their composition operations are not matrix
multiplication as in Definition 25.3.2.

notation   name                       definition

GL(n)      general linear group
SL(n)      special linear group       $\det(A) = 1$
O(n)       orthogonal group           $AA^T = I_n$
SO(n)      special orthogonal group   $AA^T = I_n$ and $\det(A) = 1$
U(n)       unitary group              $AA^\dagger = I_n$
SU(n)      special unitary group      $AA^\dagger = I_n$ and $\det(A) = 1$
Sp(n)      symplectic group
USp(n)     unitary symplectic group


Table 25.14.1 Some classical matrix groups 


25.14.2 DEFINITION: An orthogonal matrix in a set of matrices $M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and field $K$ is a
matrix $A \in M_{n,n}(K)$ which satisfies $AA^T = I_n$.
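
Since Definition 25.14.2 applies to any field, the orthogonality condition can be tested with exact arithmetic. The following Python sketch uses the rational numbers via `fractions.Fraction`. The function name `is_orthogonal` and the two sample matrices are hypothetical.

```python
from fractions import Fraction

def is_orthogonal(a):
    """Check A A^T = I_n entry by entry, using exact arithmetic."""
    n = len(a)
    return all(
        sum(a[i][k] * a[j][k] for k in range(n)) == (1 if i == j else 0)
        for i in range(n)
        for j in range(n)
    )

# A rational rotation matrix built from the 3-4-5 Pythagorean triple.
Q = [[Fraction(3, 5), Fraction(4, 5)],
     [Fraction(-4, 5), Fraction(3, 5)]]

# A shear matrix, which is invertible but not orthogonal.
S = [[Fraction(1), Fraction(1)],
     [Fraction(0), Fraction(1)]]

print(is_orthogonal(Q), is_orthogonal(S))
```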


25.14.3 REMARK: Progressive addition of structure to the general linear group. 
The build-up of structure for general linear groups in this book is summarised in Table 25.14.2. (See also 
the “structure build-up” for the general linear group and its linear space in Table 54.9.1 in Remark 54.9.1.) 


reference   structure
19.1.12     set GL(M) of automorphisms of a module M over a set
23.1.12     set GL(V) of linear space automorphisms of a linear space V
23.1.17     left transformation group GL(V) acting on V
23.2.10     component maps for linear maps between finite-dimensional linear spaces
23.11.20    dual (contragredient) representation of a general linear group
25.14.1     table of notations and names for the classical matrix groups
32.6.8      the standard topology for the general linear group GL(F)
39.6.4      topological transformation group structure of a general linear group
49.4.18     locally Cartesian space structure for general linear group GL(F)
49.7.15     the standard atlas for the general linear group GL(F)
51.4.24     standard differentiable manifold structure for general linear group GL(F)
63.4.17     general linear Lie transformation group GL(V)


Table 25.14.2 Overview of build-up of structure on general linear groups 


25.15. Multidimensional matrices 


25.15.1 REMARK: Multidimensional matrices are really higher-degree arrays, not true matrices. 

The usual concept of a matrix is closely tied to concepts such as linear maps, coordinatised with respect to 
bases, but as discussed in Section 25.1, two-dimensional rectangular matrices are also used for the components 
of second-degree tensors, quadratic forms, inner products, and other concepts. These other concepts are mostly
generalisable to “multidimensional matrices”. 


The word “dimension” is ambiguous in the context of arrays. It could refer to the number of rows or the
number of columns, or it could mean the dimensionality of the index set. An $m \times n$ matrix over a field
$K$ is a function $A: \mathbb{N}_m \times \mathbb{N}_n \to K$, whereas an $r$-dimensional array would be a more general function
$A: \times_{k=1}^r \mathbb{N}_{m_k} \to K$ for some $m = (m_k)_{k=1}^r \in (\mathbb{Z}_0^+)^r$. The numbers $m_k$ could be the “dimensions” of the
array, or the number $r$ could be the array’s “dimension”. For clarity, the number $r$ could be called the
“degree” of the array, and the word “array” could be used instead of matrix.


The requirement for $K$ to be a field is only important if addition and multiplication operations are required for
higher-degree arrays. Operations such as the compression and decompression maps in Definitions 25.15.12,
25.15.13, 25.15.15 and 25.15.16, for example, are valid for any set $K$.


25.15.2 DEFINITION: A multidimensional matrix or higher-degree array over a field $K$ is a function
$A: \times_{k=1}^r \mathbb{N}_{m_k} \to K$ for some $m = (m_k)_{k=1}^r \in (\mathbb{Z}_0^+)^r$.

The degree or dimension of $A$ is the number $r$.

The width sequence of $A$ is the sequence $m$.
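
A higher-degree array in the sense of Definition 25.15.2 can be modelled as a finite map from index tuples to values. The following Python sketch stores such a function as a dictionary. It assumes that $\mathbb{N}_m$ means the set $\{1, \ldots, m\}$. The names `index_set` and `make_array` are hypothetical.

```python
from itertools import product

def index_set(widths):
    """All index tuples in the Cartesian product of the sets {1, ..., m_k}."""
    return list(product(*(range(1, m + 1) for m in widths)))

def make_array(widths, entry):
    """Represent the array as a dict from index tuples to values."""
    return {i: entry(i) for i in index_set(widths)}

widths = (2, 3, 2)          # the width sequence m
degree = len(widths)        # the degree r
A = make_array(widths, lambda i: 100 * i[0] + 10 * i[1] + i[2])

print(degree, len(A), A[(2, 3, 1)])
```

With this representation, a width sequence containing a zero gives an empty array, and the empty width sequence (degree zero) gives an array with exactly one entry, indexed by the empty tuple.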


25.15.3 NOTATION: Multidimensional matrices, or higher-degree arrays. 

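$A((\mathbb{N}_{m_k})_{k=1}^r; K)$, for a field $K$ and a sequence $m = (m_k)_{k=1}^r \in (\mathbb{Z}_0^+)^r$ for some $r \in \mathbb{Z}_0^+$, denotes the set of
multidimensional matrices over $K$ with width sequence $m$.

$A((\mathbb{N}_{m_k})_{k=1}^r)$ is an abbreviation for $A((\mathbb{N}_{m_k})_{k=1}^r; \mathbb{R})$.

$A_r(\mathbb{N}_n; K)$, for a field $K$ and $n, r \in \mathbb{Z}_0^+$, denotes the set $A(\mathbb{N}_n^r; K)$ of multidimensional matrices over $K$ with
width sequence $m$ defined by $m_k = n$ for all $k \in \mathbb{N}_r$.

$A_r(\mathbb{N}_n)$ is an abbreviation for $A_r(\mathbb{N}_n; \mathbb{R})$.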


25.15.4 REMARK: Symmetric and antisymmetric multidimensional matrices. 

Symmetry and antisymmetry of multidimensional matrices require the “widths” to be constant to permit
indices to be arbitrarily permuted. (See Section 14.8 for permutations.) Therefore Definitions 25.15.5 
and 25.15.6 are restricted to constant-width matrices. (These could be called “cubic matrices” perhaps.) 


25.15.5 DEFINITION: A symmetric multidimensional matrix or symmetric higher-degree array over a field
$K$ is a function $A: \mathbb{N}_n^r \to K$, for some $n, r \in \mathbb{Z}_0^+$, which satisfies

$$\forall i \in \mathbb{N}_n^r,\ \forall P \in \mathrm{perm}(\mathbb{N}_r),\quad A(i \circ P) = A(i).$$

The degree or dimension of $A$ is $r$. The width of $A$ is $n$.


25.15.6 DEFINITION: An antisymmetric multidimensional matrix or antisymmetric higher-degree array
over a field $K$ is a function $A: \mathbb{N}_n^r \to K$, for some $n, r \in \mathbb{Z}_0^+$, which satisfies

$$\forall i \in \mathbb{N}_n^r,\ \forall P \in \mathrm{perm}(\mathbb{N}_r),\quad A(i \circ P) = \mathrm{parity}(P)\, A(i).$$


The degree or dimension of A is r. The width of A is n. 


25.15.7 REMARK: Antisymmetry applied to non-injective index-tuples. 

The antisymmetry condition in Definition 25.15.6 implies $A(i) = 0$ for all non-injective index-tuples $i \in \mathbb{N}_n^r$.
This is because if distinct $j, k \in \mathbb{N}_r$ are such that $i_j = i_k$, then the permutation $P$ which transposes $j$ and $k$ is
an element of $\mathrm{perm}(\mathbb{N}_r)$. But the parity of any transposition equals $-1$ by Definitions 14.8.18 and 14.8.22.
Therefore $A(i) = -A(i \circ P) = -A(i)$ because $i \circ P = i$. So $A(i) = 0$.


25.15.8 NOTATION: Symmetric multidimensional matrices, or higher-degree arrays. 
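$A_r^+(\mathbb{N}_n; K)$, for a field $K$ and $n, r \in \mathbb{Z}_0^+$, denotes the set of symmetric multidimensional matrices over $K$
with width $n$ and degree $r$. In other words,

$$A_r^+(\mathbb{N}_n; K) = \{A: \mathbb{N}_n^r \to K;\ \forall i \in \mathbb{N}_n^r,\ \forall P \in \mathrm{perm}(\mathbb{N}_r),\ A(i \circ P) = A(i)\}.$$

$A_r^+(\mathbb{N}_n)$ is an abbreviation for $A_r^+(\mathbb{N}_n; \mathbb{R})$.


25.15.9 NOTATION: Antisymmetric multidimensional matrices, or higher-degree arrays.
$A_r^-(\mathbb{N}_n; K)$, for a field $K$ and $n, r \in \mathbb{Z}_0^+$, denotes the set of antisymmetric multidimensional matrices over
$K$ with width $n$ and degree $r$. In other words,

$$A_r^-(\mathbb{N}_n; K) = \{A: \mathbb{N}_n^r \to K;\ \forall i \in \mathbb{N}_n^r,\ \forall P \in \mathrm{perm}(\mathbb{N}_r),\ A(i \circ P) = \mathrm{parity}(P)\, A(i)\}.$$

$A_r^-(\mathbb{N}_n)$ is an abbreviation for $A_r^-(\mathbb{N}_n; \mathbb{R})$.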


25.15.10 REMARK: Relations between multidimensional arrays and multilinear maps. 

The array sets $A_r(\mathbb{N}_n; K)$ have similarities to the multilinear function sets $\mathscr{L}_r(V; K)$ in Definition 30.1.3. If
Definition 25.15.2 and Notation 25.15.3 are generalised by replacing $\mathbb{N}_n$ with a general linear space $V$, then
$\mathscr{L}_r(V; K) \subseteq A_r(V; K)$. Moreover, $\mathscr{L}_r^+(V; K) \subseteq A_r^+(V; K)$ and $\mathscr{L}_r^-(V; K) \subseteq A_r^-(V; K)$. (See Notations
30.1.4 and 30.1.5 for $\mathscr{L}_r^+(V; K)$ and $\mathscr{L}_r^-(V; K)$.)

Furthermore, if Definition 25.15.2 and Notation 25.15.3 are generalised by replacing the field $K$ with a linear
space $U$ over $K$, one may write $\mathscr{L}_r(V; U) \subseteq A_r(V; U)$, $\mathscr{L}_r^+(V; U) \subseteq A_r^+(V; U)$ and $\mathscr{L}_r^-(V; U) \subseteq A_r^-(V; U)$.


25.15.11 REMARK: Compression of symmetric and antisymmetric higher-degree arrays.

For applications to symmetric and antisymmetric tensors and tensor bundles, it is necessary to remove the
redundancy from the corresponding higher-degree arrays so that tensors can be coordinatised with respect
to a basis and tensor bundles can be coordinatised by a Cartesian space.


The “compression algorithm” in Definition 25.15.12 is very simple. The symmetric array $A \in A_r^+(\mathbb{N}_n; K)$ is
compressed by restricting the domain $\mathbb{N}_n^r$ of $A$ to $J^n_r = \mathrm{NonDec}(\mathbb{N}_r, \mathbb{N}_n)$, which is the set of non-decreasing
maps from $\mathbb{N}_r$ to $\mathbb{N}_n$ as in Notation 14.10.3. Similarly, Definition 25.15.13 introduces a “compression
algorithm” for antisymmetric arrays $A \in A_r^-(\mathbb{N}_n; K)$ by restricting the domain $\mathbb{N}_n^r$ of $A$ to $I^n_r = \mathrm{Inc}(\mathbb{N}_r, \mathbb{N}_n)$.
The test of the value of such “compressions” is whether they have left inverses, which would then be
“decompression algorithms”.


25.15.12 DEFINITION: The (standard) compression map for symmetric higher-degree arrays with width
$n \in \mathbb{Z}_0^+$ and degree $r \in \mathbb{Z}_0^+$, over a field $K$, is the map $C_{n,r}^+ : A_r^+(\mathbb{N}_n; K) \to (J^n_r \to K)$ defined by

$$\forall A \in A_r^+(\mathbb{N}_n; K),\ \forall i \in J^n_r,\quad C_{n,r}^+(A)(i) = A(i).$$


25.15.13 DEFINITION: The (standard) compression map for antisymmetric higher-degree arrays with width
$n \in \mathbb{Z}_0^+$ and degree $r \in \mathbb{Z}_0^+$, over a field $K$, is the map $C_{n,r}^- : A_r^-(\mathbb{N}_n; K) \to (I^n_r \to K)$ defined by

$$\forall A \in A_r^-(\mathbb{N}_n; K),\ \forall i \in I^n_r,\quad C_{n,r}^-(A)(i) = A(i).$$


25.15.14 REMARK: Decompression of symmetric and antisymmetric higher-degree arrays.
To decompress compressed higher-degree arrays, it is necessary to copy array entries back to the places where
they have been removed. This is done in Definitions 25.15.15 and 25.15.16.


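25.15.15 DEFINITION: The (standard) decompression map for symmetric higher-degree arrays with width
$n \in \mathbb{Z}_0^+$ and degree $r \in \mathbb{Z}_0^+$, over a field $K$, is the map $D_{n,r}^+ : (J^n_r \to K) \to A_r^+(\mathbb{N}_n; K)$ defined by

$$\forall B: J^n_r \to K,\ \forall i \in \mathbb{N}_n^r,\quad D_{n,r}^+(B)(i) = B(i \circ P_i^{-1}),$$

where $P_i \in \mathrm{perm}(\mathbb{N}_r)$ is the standard sorting-permutation for $i$ as in Definition 14.11.3.


25.15.16 DEFINITION: The (standard) decompression map for antisymmetric higher-degree arrays with
width $n \in \mathbb{Z}_0^+$ and degree $r \in \mathbb{Z}_0^+$, over a field $K$, is the map $D_{n,r}^- : (I^n_r \to K) \to A_r^-(\mathbb{N}_n; K)$ defined by

$$\forall B: I^n_r \to K,\ \forall i \in \mathbb{N}_n^r,\quad D_{n,r}^-(B)(i) = \begin{cases} \mathrm{parity}(P_i)\, B(i \circ P_i^{-1}) & \text{if } i \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n) \\ 0 & \text{if } i \in \mathbb{N}_n^r \setminus \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n), \end{cases}$$

where $P_i \in \mathrm{perm}(\mathbb{N}_r)$ is the standard sorting-permutation for $i$ as in Definition 14.11.3.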


25.15.17 REMARK: Verification that the decompression maps undo the effect of the compression maps. 
Theorem 25.15.18 shows that the decompression maps in Definitions 25.15.15 and 25.15.16 are left inverses 
of the compression maps in Definitions 25.15.12 and 25.15.13 respectively. 


25.15.18 THEOREM: Verification that the decompression maps undo the effect of the compression maps. 
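Let $K$ be a field.

(i) $\forall n, r \in \mathbb{Z}_0^+$, $\forall A \in A_r^+(\mathbb{N}_n; K)$, $D_{n,r}^+(C_{n,r}^+(A)) = A$.
In other words, $\forall n, r \in \mathbb{Z}_0^+$, $D_{n,r}^+ \circ C_{n,r}^+ = \mathrm{id}_{A_r^+(\mathbb{N}_n; K)}$.

(ii) $\forall n, r \in \mathbb{Z}_0^+$, $\forall A \in A_r^-(\mathbb{N}_n; K)$, $D_{n,r}^-(C_{n,r}^-(A)) = A$.
In other words, $\forall n, r \in \mathbb{Z}_0^+$, $D_{n,r}^- \circ C_{n,r}^- = \mathrm{id}_{A_r^-(\mathbb{N}_n; K)}$.
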
PROOF: For part (i), let $n, r \in \mathbb{Z}_0^+$ and $A \in A_r^+(\mathbb{N}_n; K)$. Let $B = C_{n,r}^+(A)$. Let $A' = D_{n,r}^+(B)$. Let $i \in \mathbb{N}_n^r$.
Then $A'(i) = B(i \circ P_i^{-1})$ by Definition 25.15.15. (Note that this is well defined because $i \circ P_i^{-1} \in J^n_r$ by
Theorem 14.11.5 (ii).) But $B(i \circ P_i^{-1}) = A(i \circ P_i^{-1})$ by Definition 25.15.12, and then $A(i \circ P_i^{-1}) = A(i)$ by
Definition 25.15.5. Thus $A'(i) = A(i)$ for all $i \in \mathbb{N}_n^r$. Hence $A' = A$.

For part (ii), let $n, r \in \mathbb{Z}_0^+$ and $A \in A_r^-(\mathbb{N}_n; K)$. Let $B = C_{n,r}^-(A)$. Let $A' = D_{n,r}^-(B)$. Let
$i \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Then $A'(i) = \mathrm{parity}(P_i)\, B(i \circ P_i^{-1})$ by Definition 25.15.16. (This is well defined because
$i \circ P_i^{-1} \in I^n_r$ by Theorem 14.11.5 (iii).) But $B(i \circ P_i^{-1}) = A(i \circ P_i^{-1})$ by Definition 25.15.13, and
$A(i \circ P_i^{-1}) = \mathrm{parity}(P_i^{-1})\, A(i)$ by Definition 25.15.6. Thus $A'(i) = \mathrm{parity}(P_i)\, \mathrm{parity}(P_i^{-1})\, A(i) = A(i)$ for
all $i \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$.
Now suppose that $i \in \mathbb{N}_n^r \setminus \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Then $A'(i) = 0$ by Definition 25.15.16. However, $A(i) = 0$ also,
by Definition 25.15.6, as discussed in Remark 25.15.7. So $A'(i) = A(i)$ for all $i \in \mathbb{N}_n^r \setminus \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Thus
$A'(i) = A(i)$ for all $i \in \mathbb{N}_n^r$. Hence $A' = A$.
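
The round trip asserted by Theorem 25.15.18 can be tried out on small arrays. In the following Python sketch, arrays are dictionaries keyed by index tuples with entries in $\{1, \ldots, n\}$. Sorting a tuple stands in for composition with the inverse of the sorting-permutation. The parity is computed by counting inversions, which is an assumption about the convention of Definition 14.11.3, although the sign is unambiguous for injective tuples. All function names and test arrays are hypothetical.

```python
from itertools import product

def all_indices(n, r):
    """All index tuples of degree r with entries in {1, ..., n}."""
    return product(range(1, n + 1), repeat=r)

def sort_with_parity(i):
    """Return the sorted tuple and the sign (-1) ** (number of inversions of i)."""
    inversions = sum(
        1 for a in range(len(i)) for b in range(a + 1, len(i)) if i[a] > i[b]
    )
    return tuple(sorted(i)), (-1 if inversions % 2 else 1)

def compress_sym(A, n, r):
    """Restrict a symmetric array to the non-decreasing index tuples."""
    return {i: A[i] for i in all_indices(n, r) if list(i) == sorted(i)}

def decompress_sym(B, n, r):
    """Copy each stored entry to every rearrangement of its index tuple."""
    return {i: B[tuple(sorted(i))] for i in all_indices(n, r)}

def compress_antisym(A, n, r):
    """Restrict an antisymmetric array to the strictly increasing index tuples."""
    return {i: A[i] for i in all_indices(n, r)
            if all(i[k] < i[k + 1] for k in range(r - 1))}

def decompress_antisym(B, n, r):
    """Restore signed entries, and zero on tuples with a repeated index."""
    out = {}
    for i in all_indices(n, r):
        if len(set(i)) == r:
            s, sign = sort_with_parity(i)
            out[i] = sign * B[s]
        else:
            out[i] = 0
    return out

# A symmetric test array of width 3 and degree 3.
S = {i: i[0] * i[1] * i[2] + i[0] + i[1] + i[2] for i in all_indices(3, 3)}
# An antisymmetric test array of width 4 and degree 3 (a Vandermonde product).
T = {i: (i[0] - i[1]) * (i[0] - i[2]) * (i[1] - i[2]) for i in all_indices(4, 3)}

S_small = compress_sym(S, 3, 3)
T_small = compress_antisym(T, 4, 3)
print(len(S), len(S_small), decompress_sym(S_small, 3, 3) == S)
print(len(T), len(T_small), decompress_antisym(T_small, 4, 3) == T)
```

The compressed symmetric array keeps one entry per multiset of indices, and the compressed antisymmetric array keeps one entry per set of distinct indices, which is where the saving in storage comes from.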
Chapter 26


AFFINE SPACES


26.1 Affine spaces discussion
26.2 Affine spaces over groups
26.3 Tangent-line bundles on affine spaces over groups
26.4 Affine spaces over modules over sets
26.5 Tangent-line bundles on affine spaces over modules
26.6 Affine spaces over modules over groups and rings
26.7 Affine spaces over unitary modules over ordered rings
26.8 Tangent velocity spaces and velocity bundles
26.9 Parametrised line segments and hyperplane segments
26.10 Affine spaces over linear spaces
26.11 Euclidean spaces and Cartesian spaces
26.12 Cartesian-space tangent-line bundle philosophy
26.13 Tangent-line tangent spaces on Cartesian spaces
26.14 Tangent-line bundles on Cartesian spaces
26.15 Direct products of tangent bundles of Cartesian spaces
26.16 Tangent velocity bundles on Cartesian spaces
26.17 Tangent covector bundles on Cartesian spaces
26.18 Tangent field covector bundles on Cartesian spaces
26.19 Parametrised lines in ancient Greek mathematics
26.20 Line-to-line transformation groups


A family tree of affine spaces defined in Chapter 26 is illustrated in Figure 26.0.1. 


26.0.1 REMARK: How to skip this chapter. 

The definitions in Chapter 26 are much more general than they need to be. In fact, most of this chapter is 
not used in later chapters. As in most chapters in this book, any definitions which are required later can 
easily be found through the cross-references so that one reads only what is necessary. 


26.1. Affine spaces discussion 


26.1.1 REMARK: The relation between affine spaces and linear spaces. 

Roughly speaking, an affine space is a linear space from which the coordinates have been removed. So, for 
example, the points of an affine space have no origin, no angles, no metric and no inner product. However, 
parallelism and convexity are well defined in an affine space. 

In a linear space, points and vectors are the same thing. Every point is a vector and every vector is a point. 
In an affine space, points and vectors are members of two disjoint spaces, namely the point space and the 
vector space. 

Since affine spaces may be constructed by “removing” the coordinates from a linear space, the points of an
affine space may in practice be labelled by coordinates for convenience. Therefore in practical calculations
in affine spaces, points and vectors may be confused. One must exercise some self-discipline to avoid using
concepts such as “origin”, “axes”, “angles” and “metric” in an affine space.


[Figure 26.0.1 Family tree of affine spaces. Nodes: affine space over group; affine space over module;
affine space over module over set; affine space over module over group; affine space over module over ring;
affine space over module over field; affine space over module over ordered ring; affine space over linear space;
affine space over module over ordered field; affine space over linear space over ordered field.]


One may also think of an affine space as a point space in which every point is an origin of a linear space, 
and all compatible sets of coordinates at all such origins are equally acceptable. 


26.1.2 REMARK: Affine spaces are a flat-space “reference model” for affinely connected manifolds.

Affine spaces provide a reference model for understanding parallel transport and affine connections on general
manifolds. An affine space may be thought of as a manifold with absolute (i.e. path-independent) parallelism.
In other words, affine spaces are examples of “flat spaces”.


26.1.3 REMARK: The many meanings and contexts of the word “affine”. 

The word “affine” is used in several ways. Transformation groups may be described as “affine”. Conversely,
spaces which are invariant under affine transformation groups may be described as “affine”. Then
path-dependent parallelism concepts which generalise the absolute parallelism on affine spaces may be described
as “affine”. Spaces which have a path-dependent parallelism may then also be described as “affine”.


(1) An “affine transformation” of a linear space is a combination of a translation and an invertible linear 
transformation. This use of the word “affine” is closest to the original meaning which was introduced 
by Euler in 1748. Euler intended the word to mean scalings along a single axis. (These are a kind of 
uni-axial similarity transformation. Hence the word “affine”, which means “related” or “similar”.) The 
modern meaning of “affine transformation” is an element of the group of transformations generated by 
uni-axial scalings. (See Remark 24.4.1 and Section 77.2.) 


(2) An “affine space” is a space which has both points and vectors, and a difference function which maps
pairs of points to vectors. The difference function satisfies a linear space addition rule. The point-space
transformations which conserve this addition rule are the affine transformations. Thus an affine space
is invariant under the group of affine transformations.


(3) An “affine connection” is a generalisation of affine space parallelism. Whereas an affine space determines 
a path-independent global parallelism on its point set, an affine connection defines a path-dependent 
parallelism. An affine connection is thus a path-dependent analogue of affine space parallelism. Using 
an affine connection, one may define constant-velocity or “self-parallel” lines which are analogous to the 
(affine-parametrised) straight lines in affine spaces. 


(4) An “affine manifold” may refer to a differentiable manifold on which an affine connection is defined.
This term is apparently not used by the majority of authors. Affine manifolds are a generalisation of
affine spaces.

(5) An “affine map” is generally a map $f$ which is locally linear in the sense that the difference $\delta(f(x_1), f(x_2))$
depends linearly on the difference $\delta(x_1, x_2)$. In this sense, an affine transformation in case (1) is an
invertible affine map for which the domain and range are the same set. (The difference functions $\delta$ may
be linear space differences or some more general kinds of difference functions. The domain and range of
$f$ might not be linear spaces, but the $\delta$-values should be in linear spaces.)


(6) An “affine-parametrised” curve (or more general map) is an affine map whose domain is a subset of $\mathbb{R}$
(or $\mathbb{R}^n$ for some $n \in \mathbb{Z}^+$).


Sometimes the word “affine” is also used as a synonym for “linear”. There is therefore ample opportunity 
for confusion in the use of this word. 


26.1.4 REMARK: The invariance constraints which characterise affine spaces and other spaces. 
Table 26.1.1 gives some examples of invariant relations on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$. Each relation
corresponds to a maximal group of transformations which preserve the relation.


name         invariance relation                                                        transformation
orthogonal   $|\phi(v)| = |v|$                                                           $\phi: v \mapsto Av$ with $AA^T = I_n$
isometric    $|\phi(u) - \phi(v)| = |u - v|$                                             $\phi: v \mapsto Av + b$ with $AA^T = I_n$
linear       $\phi(\lambda u + \mu v) = \lambda\phi(u) + \mu\phi(v)$                     $\phi: v \mapsto Av$ with $\det(A) \ne 0$
affine       $\phi(u + \lambda(v - u)) = \phi(u) + \lambda(\phi(v) - \phi(u))$           $\phi: v \mapsto Av + b$ with $\det(A) \ne 0$
Table 26.1.1 Invariance relations and transformation classes


Each of the relations in Table 26.1.1 requires an expression calculated after applying $\phi$ to be equal to the
same expression calculated before applying $\phi$. In the case of linear transformations, let $w = \lambda u + \mu v$. Then
$w - \lambda u - \mu v = 0$ before the transformation, whereas $\phi(w) - \lambda\phi(u) - \mu\phi(v) = 0$ after the transformation. So
the linear combination relation is invariant under the transformation.

In the affine transformation case, let $w = u + \lambda(v - u)$. Then $w - u - \lambda(v - u) = 0$ before the transformation,
whereas $\phi(w) - \phi(u) - \lambda(\phi(v) - \phi(u)) = 0$ after the transformation. So the convex combination relation is
invariant under the transformation. This is a weaker constraint than in the linear case.

The affine transformations may thus be defined as those which preserve convex combination relations between
points. However, history seems to have developed in the opposite direction. First the affine transformations
were defined by Euler. Much later, the invariant properties and relations of affine transformations and affine
spaces were studied by Möbius.
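
The difference between the linear and affine invariance relations can be seen in a small exact computation. The following Python sketch applies a map $v \mapsto Av + b$ with non-zero $b$ to rational vectors. It checks that the affine relation is preserved while the linear relation is not. The helper names `affine` and `comb` and the sample data are hypothetical.

```python
from fractions import Fraction as F

def affine(A, b):
    """Return the map v -> A v + b for a matrix A and a vector b."""
    def phi(v):
        return tuple(
            sum(A[i][j] * v[j] for j in range(len(v))) + b[i]
            for i in range(len(b))
        )
    return phi

def comb(u, v, lam):
    """Return the point u + lam * (v - u)."""
    return tuple(x + lam * (y - x) for x, y in zip(u, v))

phi = affine([[F(2), F(1)], [F(0), F(3)]], (F(5), F(-1)))
u, v, lam = (F(1), F(2)), (F(4), F(-3)), F(7, 3)

# Affine relation: phi(u + lam (v - u)) = phi(u) + lam (phi(v) - phi(u)).
affine_lhs = phi(comb(u, v, lam))
affine_rhs = comb(phi(u), phi(v), lam)

# Linear relation with both scalars equal to 1: phi(u + v) = phi(u) + phi(v).
linear_lhs = phi(tuple(x + y for x, y in zip(u, v)))
linear_rhs = tuple(x + y for x, y in zip(phi(u), phi(v)))

print(affine_lhs == affine_rhs, linear_lhs == linear_rhs)
```

The linear relation fails here only because of the translation $b$. The two sides differ by exactly $b$.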


26.1.5 REMARK: Alternative forms of affine transformation constraints. 

It is shown in Theorem 24.4.2 that the single-scalar constraint on line segments:

$$\forall p, v \in V,\ \forall \lambda \in K[0, 1],\quad \phi(p + \lambda v) - \phi(p) = \lambda(\phi(p + v) - \phi(p))$$

is equivalent to the two-scalar constraint:

$$\forall p, u, v \in V,\ \forall \lambda, \mu \in K,\quad \phi(p + \lambda u + \mu v) - \phi(p) = \lambda(\phi(p + u) - \phi(p)) + \mu(\phi(p + v) - \phi(p))$$

for any ordered field $K$. The two-scalar constraint is more clearly recognisable as a local form of the two-scalar
linearity constraint, relative to a point $p \in V$.


26.1.6 REMARK: The affine transformation group is the automorphism group for affine spaces. 

The invariance relation for affine spaces is essentially the same as the requirement for automorphisms on the
space of right transformation groups on a set. (See Section 20.7 for right transformation groups.) The most
basic kind of affine space is the affine space over a group in Definition 26.2.2, which is essentially the same
as a right transformation group. By Definition 20.8.1, right transformation group automorphisms must map
both the group (i.e. the vector set) and the point set so that the action of the group after the automorphism
corresponds to the action before it. This implies that $\phi(p + g^n) = \phi(p) + (\phi(p + g) - \phi(p))^n$ for all points $p$,
group elements (i.e. vectors) $g$, and $n \in \mathbb{Z}$.


In the case of an affine space over a linear space, the automorphism condition becomes $\phi(p + \lambda v) = \phi(p) +
\lambda(\phi(p + v) - \phi(p))$ for vectors $v$ and scalars $\lambda$. In terms of the difference function $\delta$ on the point space,
this may be written as $\delta(\phi(p + \lambda v), \phi(p)) = \lambda\delta(\phi(p + v), \phi(p))$, which is equivalent to the invariance of the
convex combination relation. This is equivalent to the affine space invariance relation in Table 26.1.1 in
Remark 26.1.4. The affine transformation group is thus the automorphism group for affine spaces.


26.1.7 REMARK: Defining affine transformations in terms of “differences between points”.

The affine map conditions in Remark 26.1.6 may be written more abstractly in terms of $\delta$-values (“differences
between points”) as $\delta(p_1, q_1) = \lambda\delta(p_2, q_2) \Rightarrow \delta(\phi(p_1), \phi(q_1)) = \lambda\delta(\phi(p_2), \phi(q_2))$. In other words, the difference
between two $\phi$-values depends only on the difference between the $\phi$-arguments, and if the $\phi$-arguments are
related by a scalar multiple, the $\phi$-values are related by the same scalar multiple.


26.1.8 REMARK: Flat affine spaces are a useful analogy for understanding affinely connected manifolds. 
It is important to have some understanding of affine spaces in order to better understand the concept of an 
affine connection on a differentiable manifold. (Affine connections are presented in Chapter 71.) An affine 
space exhibits a canonical example of an affine connection in the same sense that a Euclidean metric space 
exhibits a canonical example of a Riemannian metric. In an affine linear space, straight lines and parallelism 
are defined, but not distance or angles. An affine connection defines geodesics and parallel transport, but 
not distance or angles. 


26.1.9 REMARK: Affine spaces have no special origin and no special directions. 

An affine space is a linear space V with no special zero vector and no scalar multiplication or vector addition 
operations, although vector subtraction is permitted on the point space. An affine space is a democratic 
space where all points are equal. Definition 26.10.3 removes unwanted concepts from the linear space V by 
starting with a bare set X and adding only those properties of V which are required. 


An alternative construction to specify an affine space would be an equivalence class of linear maps. This is
the approach taken for defining manifolds and it is just as applicable here. Thus an affine space could be
specified as a set $X$ together with a bijection $\psi: X \to V$. The pair $(X, \psi)$ is then taken to be equivalent to
the pair $(X', \psi')$ if and only if $X = X'$ and $\psi' \circ \psi^{-1}: V \to V$ is an affine transformation, namely an element
of GL(V). This would perhaps be closer to “the truth” than Definition 26.10.3.


There is a third possibility for defining affine spaces which seems to be much better than even the atlas 
approach. Affine spaces can be defined in a natural, intrinsic manner by constructing a four-way equivalence 
relation on the points of a set. Thus given a set $X$, a parallelism relation on the set $X \times X$ is defined to
mean that $(P, Q) \sim (R, S)$ whenever the vector $PQ$ is parallel to $RS$.


26.1.10 REMARK: History of the word “vector”. 

The word “vector” is Latin meaning “carrier”. So a vector carries something from one point to another. 
According to Struik [249], page 175, the word “vector” was introduced into mathematics by William Rowan 
Hamilton (1805-1865) in the context of quaternions. The OED [482], page 2456, gives the date 1865 as the 
first recorded occurrence of the word “vector” in the mathematical sense of a quantity having both magnitude 
and direction although it was used as early as 1796 in the sense of the straight line joining a planet to the 
focus of its orbit. According to Kreyszig [22], page 9: “The concept of a vector was first used by W. Snellius 
(1581-1626) and L. Euler (1707-83).” 

In 1926, Levi-Civita [26], page 102, used the word “versor” to mean a vector with unit length. This word 
is derived from a Latin verb meaning “to turn” in the same way that “vector” is derived from a Latin verb 
meaning “to carry”. However, the word “versor” does not appear in classical Latin dictionaries. 


26.2. Affine spaces over groups 


26.2.1 REMARK: A very general definition of affine spaces. 

An affine space over a group is a fairly minimal version of the more familiar affine spaces over linear spaces. 
(For affine space definitions, see also Szekeres [305], page 231; MacLane/Birkhoff [110], page 564; EDM2 [113], 
article 7.A, page 23; Crampin/Pirani [7], page 9; Greenberg/Harper [86], page 41; Weyl [310], page 14.) 


26.2.2 DEFINITION: An affine space over a group is a tuple $(G, X, \sigma, \mu, \delta)$ such that $(G, X, \sigma, \mu)$ is an
additively notated right transformation group which acts freely and transitively on $X$, and the subtractively
notated operation $\delta : X \times X \to G$ satisfies

(i) $\forall x \in X,\ \forall g \in G,\ \delta(\mu(x, g), x) = g$. (I.e. $\forall x \in X,\ \forall g \in G,\ (x + g) - x = g$.)

The point space of the affine space $(G, X, \sigma, \mu, \delta)$ is the set $X$.
The vector space of the affine space $(G, X, \sigma, \mu, \delta)$ is the group $G < (G, \sigma)$.
The affine structure function of the affine space $(G, X, \sigma, \mu, \delta)$ is the map $\mu : X \times G \to X$.


26.2.3 REMARK: Affine spaces over groups are very similar to right transformation groups.

An affine space $X$ over a group $G$ is essentially identical to a right transformation group which acts freely
and transitively on $X$ because the difference operation $\delta$ adds no new information. (See Definitions 20.7.11
and 20.7.14 for right transformation groups which act freely and transitively respectively.) The operation
$\delta : X \times X \to G$ in Definition 26.2.2 is uniquely determined by the action operation $\mu$ of the group $G$ on the
set $X$. This is shown in Theorem 26.2.4.


26.2.4 THEOREM: Unique existence of group elements which map given points to other given points.
Let $(G, X, \sigma, \mu)$ be a right transformation group which acts freely and transitively on $X$.

(i) There exists a function $\delta : X \times X \to G$ such that $\forall x \in X,\ \forall g \in G,\ \delta(\mu(x, g), x) = g$.
(ii) There is at most one function $\delta : X \times X \to G$ such that $\forall x \in X,\ \forall g \in G,\ \delta(\mu(x, g), x) = g$.


PROOF: For part (i), if $X = \emptyset$, the empty function $\delta = \emptyset$ satisfies $\forall x \in X,\ \forall g \in G,\ \delta(\mu(x, g), x) = g$.
Suppose that $X \ne \emptyset$. For $y \in X$, define $\mu_y : G \to X$ by $\mu_y : g \mapsto \mu(y, g)$. Then $\mu_y$ is a bijection by
Theorem 20.7.16 (iii) because the group action is both free and transitive. So $\mu_y$ has an inverse function
$\mu_y^{-1} : X \to G$. Define $\delta : X \times X \to G$ by $\delta : (x, y) \mapsto \mu_y^{-1}(x)$ for all $(x, y) \in X \times X$. Then $\delta(\mu(x, g), x) =
\mu_x^{-1}(\mu(x, g)) = \mu_x^{-1}(\mu_x(g)) = g$ for all $x \in X$ and $g \in G$.

For part (ii), let $\delta_k : X \times X \to G$ satisfy $\forall x \in X,\ \forall g \in G,\ \delta_k(\mu(x, g), x) = g$ for $k = 1, 2$. Let $x, y \in X$.
Then by transitivity, $\mu(y, g') = x$ for some $g' \in G$. So $\delta_k(x, y) = \delta_k(\mu(y, g'), y) = g'$ for $k = 1, 2$ by
Definition 26.2.2 (i). Hence $\delta_1(x, y) = \delta_2(x, y)$ for all $x, y \in X$. Thus $\delta_1 = \delta_2$.
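
The construction of the difference operation in Theorem 26.2.4 can be checked by brute force on a small example. The following Python sketch is an illustration only. It assumes that G is the symmetric group on three letters acting on X = G by right multiplication, and the names sigma, mu and delta are chosen here to mirror the symbols in Definition 26.2.2; none of these choices come from the text.

```python
from itertools import permutations

# Assumed example: G = S3 (permutations of {0, 1, 2}) and X = G,
# with G acting on X by right multiplication. This action is free and transitive.
G = list(permutations(range(3)))
X = list(G)

def sigma(a, b):
    """Group operation: apply permutation a first, then b."""
    return tuple(b[a[i]] for i in range(3))

def mu(x, g):
    """Right action of G on X (here: right multiplication)."""
    return sigma(x, g)

def delta(x, y):
    """Difference operation: the unique g in G with mu(y, g) == x."""
    candidates = [g for g in G if mu(y, g) == x]
    # Transitivity gives at least one candidate; freeness gives at most one.
    assert len(candidates) == 1
    return candidates[0]

# Right-action rule: mu(mu(x, g1), g2) == mu(x, sigma(g1, g2)).
action_rule_holds = all(
    mu(mu(x, g1), g2) == mu(x, sigma(g1, g2))
    for x in X for g1 in G for g2 in G
)

# Condition (i) of Definition 26.2.2: delta(mu(x, g), x) == g.
difference_rule_holds = all(delta(mu(x, g), x) == g for x in X for g in G)
```

The search inside delta succeeds with exactly one candidate for every pair of points, which is the content of parts (i) and (ii) of the theorem for this particular example.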


26.2.5 REMARK: Optional commutativity of the group acting on an affine space. 

It is usually assumed that the group G in Definition 26.2.2 is commutative. There is no obvious necessity 
for this restriction. However, when a group operation notation is additive, there is an expectation that the 
operation will be commutative. The parenthetical additive and subtractive notations in Theorem 26.2.6
suggest false relations. Therefore the proof is expressed in functional notation to avoid being misled.


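26.2.6 THEOREM: Elementary properties of an affine space over a group.
Let $X < (G, X, \sigma, \mu, \delta)$ be an affine space over a group $G$. Then:

(i) $\forall g \in G,\ \forall x \in X,\ \mu(\mu(x, g^{-1}), g) = \mu(\mu(x, g), g^{-1}) = x$.
(I.e. $\forall g \in G,\ \forall x \in X,\ (x - g) + g = (x + g) - g = x$.)
(ii) $\forall x \in X,\ \delta(x, x) = e_G$.
(I.e. $\forall x \in X,\ x - x = 0_G$.)
(iii) $\forall x_1, x_2, x_3 \in X,\ \delta(x_3, x_1) = \sigma(\delta(x_2, x_1), \delta(x_3, x_2))$.
(I.e. $\forall x_1, x_2, x_3 \in X,\ x_3 - x_1 = (x_2 - x_1) + (x_3 - x_2)$.)
(iv) $\forall x_1, x_2 \in X,\ \delta(x_2, x_1) = \delta(x_1, x_2)^{-1}$.
(I.e. $\forall x_1, x_2 \in X,\ x_2 - x_1 = -(x_1 - x_2)$.)
(v) $\forall x \in X,\ \forall g_1, g_2 \in G,\ \delta(\mu(x, g_2), \mu(x, g_1)) = \sigma(g_1^{-1}, g_2)$.
(I.e. $\forall x \in X,\ \forall g_1, g_2 \in G,\ (x + g_2) - (x + g_1) = -g_1 + g_2$.)

For $g \in G$, define $\mu_g : X \to X$ by $\mu_g : x \mapsto \mu(x, g)$. Then:
(vi) $\forall g \in G,\ \mu_g : X \to X$ is a bijection.
(vii) $\forall g \in G,\ \mu_g \circ \mu_{g^{-1}} = \mu_{g^{-1}} \circ \mu_g = \mathrm{id}_X$.
For $p \in X$, define $\delta_p : X \to G$ by $\delta_p : x \mapsto \delta(x, p)$. Then:
(viii) $\forall p \in X,\ \delta_p : X \to G$ is a bijection.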


PROOF: Part (i) follows directly from Definition 20.7.2 (i).

For part (ii), let $x \in X$. Then $\mu(x, e_G) = x$ by Definition 20.7.2 (ii). So $\delta(x, x) = \delta(\mu(x, e_G), x) = e_G$ by
Definition 26.2.2 (i).

For part (iii), let $x_1, x_2, x_3 \in X$. Then $\mu(x_1, g_1) = x_2$ and $\mu(x_2, g_2) = x_3$ for some $g_1, g_2 \in G$ because
$G$ acts transitively on $X$. Then $\delta(x_2, x_1) = \delta(\mu(x_1, g_1), x_1) = g_1$ and $\delta(x_3, x_2) = \delta(\mu(x_2, g_2), x_2) = g_2$
by Definition 26.2.2 (i). So $\sigma(\delta(x_2, x_1), \delta(x_3, x_2)) = \sigma(g_1, g_2)$. But $\mu(x_1, \sigma(g_1, g_2)) = \mu(\mu(x_1, g_1), g_2)$ by
Definition 20.7.2 (i). So $\mu(x_1, \sigma(g_1, g_2)) = \mu(x_2, g_2) = x_3$. But $\delta(\mu(x_1, \sigma(g_1, g_2)), x_1) = \sigma(g_1, g_2)$ by
Definition 26.2.2 (i). So $\delta(x_3, x_1) = \sigma(g_1, g_2) = \sigma(\delta(x_2, x_1), \delta(x_3, x_2))$ as claimed.

For part (iv), let $x_1, x_2 \in X$. Then $\delta(x_1, x_1) = \sigma(\delta(x_2, x_1), \delta(x_1, x_2))$ by part (iii) with $x_3 = x_1$. So
$\sigma(\delta(x_2, x_1), \delta(x_1, x_2)) = e_G$ by part (ii). Similarly, $\sigma(\delta(x_1, x_2), \delta(x_2, x_1)) = e_G$. So $\delta(x_2, x_1) = \delta(x_1, x_2)^{-1}$.

To show part (v), let $x \in X$ and $g_1, g_2 \in G$. Let $x_1 = \mu(x, g_1)$, $x_2 = x$ and $x_3 = \mu(x, g_2)$. Then by
part (iii), $\delta(\mu(x, g_2), \mu(x, g_1)) = \sigma(\delta(x, \mu(x, g_1)), \delta(\mu(x, g_2), x))$. But $\delta(x, \mu(x, g_1)) = \delta(\mu(x, g_1), x)^{-1} = g_1^{-1}$
by part (iv) and Definition 26.2.2 (i). Hence $\delta(\mu(x, g_2), \mu(x, g_1)) = \sigma(g_1^{-1}, \delta(\mu(x, g_2), x)) = \sigma(g_1^{-1}, g_2)$.

For part (vi), the surjectivity of $\mu_g : X \to X$ for all $g \in G$ follows from part (i), since $\mu_g(\mu(x, g^{-1})) = x$
for all $x \in X$. To show injectivity, let $x_1, x_2 \in X$ and let $y_1 = \mu(x_1, g)$ and $y_2 = \mu(x_2, g)$. Then $\mu(y_1, g^{-1}) =
\mu(\mu(x_1, g), g^{-1}) = \mu(x_1, \sigma(g, g^{-1})) = \mu(x_1, e_G) = x_1$ by Definition 20.7.2. Similarly, $\mu(y_2, g^{-1}) = x_2$. So if
$y_1 = y_2$, then $x_1 = x_2$. Hence $\mu_g$ is injective.

Part (vii) is the same as part (i).

For part (viii), let $p \in X$. Define $\phi_p : G \to X$ by $\phi_p : g \mapsto \mu(p, g)$. Let $x \in X$. By the transitivity of $G$
on $X$, there is a $g \in G$ with $\mu(p, g) = x$. Then $\delta_p(x) = \delta_p(\mu(p, g)) = \delta(\mu(p, g), p) = g$. So $\phi_p(\delta_p(x)) =
\phi_p(g) = \mu(p, g) = x$. Therefore $\phi_p \circ \delta_p = \mathrm{id}_X$. Let $g \in G$. Then $\delta_p(\phi_p(g)) = \delta_p(\mu(p, g)) = \delta(\mu(p, g), p) = g$.
Therefore $\delta_p \circ \phi_p = \mathrm{id}_G$. Hence $\phi_p = \delta_p^{-1}$. So $\delta_p : X \to G$ is a bijection.
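
The order of the factors in parts (iii) and (v) matters when the group is not commutative. The following Python sketch checks parts (ii) to (v) exhaustively on an assumed example, namely the symmetric group on three letters acting on itself by right multiplication. The function names and the choice of group are hypothetical and serve only as an illustration.

```python
from itertools import permutations

# Assumed example: G = S3 acting on X = G by right multiplication.
G = list(permutations(range(3)))
X = list(G)
E = (0, 1, 2)  # identity element e_G

def sigma(a, b):
    """Group operation: apply a first, then b."""
    return tuple(b[a[i]] for i in range(3))

def inv(a):
    """Inverse permutation."""
    result = [0, 0, 0]
    for i, image in enumerate(a):
        result[image] = i
    return tuple(result)

def mu(x, g):
    """Right action of G on X."""
    return sigma(x, g)

def delta(x, y):
    """The unique g with mu(y, g) == x."""
    return next(g for g in G if mu(y, g) == x)

part_ii = all(delta(x, x) == E for x in X)
part_iii = all(
    delta(x3, x1) == sigma(delta(x2, x1), delta(x3, x2))
    for x1 in X for x2 in X for x3 in X
)
part_iv = all(delta(x2, x1) == inv(delta(x1, x2)) for x1 in X for x2 in X)
part_v = all(
    delta(mu(x, g2), mu(x, g1)) == sigma(inv(g1), g2)
    for x in X for g1 in G for g2 in G
)
# Swapping the two factors in part (v) gives a rule which fails in S3.
part_v_swapped = all(
    delta(mu(x, g2), mu(x, g1)) == sigma(g2, inv(g1))
    for x in X for g1 in G for g2 in G
)
```

In this example part_v_swapped is false, which illustrates why the additive shorthand $-g_1 + g_2$ should not be silently rearranged to $g_2 - g_1$.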


26.2.7 REMARK: Local difference maps in an affine space resemble manifold charts. 
The bijections $\delta_p$ in Theorem 26.2.6 may be thought of as manifold charts which satisfy a linearity constraint.


26.3. Tangent-line bundles on affine spaces over groups 


26.3.1 REMARK: Styles of tangent bundles. 
Tangent spaces, fibrations and fibre bundles are fundamental to differential geometry. Some of these are 
listed in Table 26.3.1. 


reference   style of fibration or fibre bundle
26.3        tangent-line bundles on affine spaces over groups
26.5        tangent-line bundles on affine spaces over modules over sets
26.7.9      unidirectional tangent bundles on affine spaces over modules over ordered rings
26.8        tangent velocity spaces on affine spaces over modules
26.13       tangent-line tangent spaces on Cartesian spaces
26.14       tangent-line bundles on Cartesian spaces
26.16       tangent velocity bundles on Cartesian spaces
26.17       tangent covector bundles on Cartesian spaces
54.5.30     tangent vector bundles on differentiable manifolds
54.16       unidirectional tangent bundles on differentiable manifolds

Table 26.3.1 Some fibre bundle definitions


26.3.2 REMARK: Tangent bundles consisting of parametrised “lines”. 
Some primitive kinds of parametrised lines may be defined in an affine space over a group, even if the group 
is not commutative. From these lines, primitive kinds of tangent spaces and tangent bundles may be defined. 


Let $X < (G, X, \sigma, \mu, \delta)$ be an affine space over a group $G$. For $p \in X$ and $g \in G$, define $L_{p,g} : \mathbb{Z} \to X$
inductively by
(i) $L_{p,g}(0) = p$,
(ii) $\forall n \in \mathbb{Z}_0^+,\ L_{p,g}(n + 1) = \mu(L_{p,g}(n), g)$,
(iii) $\forall n \in \mathbb{Z}_0^-,\ L_{p,g}(n - 1) = \mu(L_{p,g}(n), g^{-1})$.
It is easily verified that $\forall n \in \mathbb{Z},\ L_{p,g}(n) = \mu(p, g^n)$, and $\forall n \in \mathbb{Z},\ \delta_p(L_{p,g}(n)) = g^n$, where $\delta_p : X \to G$ is the
bijection in Theorem 26.2.6. By Theorem 17.4.7, $\delta_p \circ L_{p,g} : \mathbb{Z} \to G$ is a group homomorphism for all $p \in X$
and $g \in G$.

For $p \in X$, define $T_p(X) = \{L_{p,g};\ g \in G\}$. Then $T_p(X)$ is a primitive kind of tangent space at $p$ on $X$, and
$T(X) = \bigcup_{p \in X} T_p(X)$ is a primitive kind of tangent bundle on $X$.

Note that such a tangent bundle is not a species of fibre bundle, although it may be considered to be a
non-topological tangent fibration in the sense of Section 21.2. If suitable structure group and fibre atlas
structures are added, it could be considered to be a non-topological fibre bundle in the sense of Section 21.8.

The range of a line $L_{p,g} \in T_p(X)$ for $p \in X$ and $g \in G$ is equal to the set $\{\mu(p, g^n);\ n \in \mathbb{Z}\}$. This is the image
of $H = \{g^n;\ n \in \mathbb{Z}\}$ under the bijection $\phi_p : G \to X$ defined by $\phi_p : g \mapsto \mu(p, g)$. So $\#(\mathrm{Range}(L_{p,g}))$ equals $\#(H)$,
which is the order of the cyclic subgroup $H$ of $G$ which is generated by $g$. It follows that the structure and
relations of these primitive tangent spaces are essentially the same as the structure and relations of cyclic
subgroups and their cosets in the group $G$.
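
The claims in Remark 26.3.2 can be checked on a finite example. The following Python sketch assumes, purely for illustration, that G is the symmetric group on three letters acting on X = G by right multiplication. It builds the lines inductively, compares them with the closed form $\mu(p, g^n)$, and compares the size of each range with the order of the velocity. All function names are hypothetical.

```python
from itertools import permutations

# Assumed example: G = S3 acting on X = G by right multiplication.
G = list(permutations(range(3)))
X = list(G)
E = (0, 1, 2)  # identity element

def sigma(a, b):
    """Group operation: apply a first, then b."""
    return tuple(b[a[i]] for i in range(3))

def inv(a):
    """Inverse permutation."""
    result = [0, 0, 0]
    for i, image in enumerate(a):
        result[image] = i
    return tuple(result)

def mu(x, g):
    """Right action of G on X."""
    return sigma(x, g)

def delta(x, y):
    """The unique g with mu(y, g) == x."""
    return next(g for g in G if mu(y, g) == x)

def power(g, n):
    """The n-th power of g for any integer n."""
    step = g if n >= 0 else inv(g)
    result = E
    for _ in range(abs(n)):
        result = sigma(result, step)
    return result

def line(p, g, n):
    """The value of the line with base point p and velocity g at n, built inductively."""
    step = g if n >= 0 else inv(g)
    x = p
    for _ in range(abs(n)):
        x = mu(x, step)
    return x

def order(g):
    """Order of the cyclic subgroup generated by g."""
    k = 1
    while power(g, k) != E:
        k += 1
    return k

WINDOW = range(-6, 7)  # wide enough to cover every cyclic subgroup of S3
closed_form_ok = all(
    line(p, g, n) == mu(p, power(g, n)) for p in X for g in G for n in WINDOW
)
chart_ok = all(
    delta(line(p, g, n), p) == power(g, n) for p in X for g in G for n in WINDOW
)
range_size_ok = all(
    len({line(p, g, n) for n in WINDOW}) == order(g) for p in X for g in G
)
```

Since every element of this group has order 1, 2 or 3, each line visits at most three points, however far the parameter n runs.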


26.3.3 DEFINITION: A line on an affine space over a group $(G, X, \sigma, \mu, \delta)$ is a map $f : \mathbb{Z} \to X$ which is
defined inductively for some $p \in X$ and $g \in G$ by
(i) $f(0) = p$,
(ii) $\forall n \in \mathbb{Z}_0^+,\ f(n + 1) = \mu(f(n), g)$,
(iii) $\forall n \in \mathbb{Z}_0^-,\ f(n - 1) = \mu(f(n), g^{-1})$.
The point $p$ is called the base point of the line.
The group element $g$ is called the velocity of the line.


26.3.4 NOTATION: $L_{p,g}$, for $p \in X$ and $g \in G$, for an affine space over a group $(G, X) < (G, X, \sigma, \mu, \delta)$,
denotes the line on $(G, X)$ with base point $p$ and velocity $g$.


26.3.5 DEFINITION: The tangent space on an affine space over a group $(G, X, \sigma, \mu, \delta)$ at base point $p \in X$
is the set of lines $\{L_{p,g};\ g \in G\}$.


26.3.6 NOTATION: $T_p(X)$, for $p \in X$, for an affine space over a group $(G, X) < (G, X, \sigma, \mu, \delta)$, denotes the
tangent space on $(G, X)$ at base point $p$. In other words, $T_p(X) = \{L_{p,g};\ g \in G\}$.


26.3.7 DEFINITION: The tangent bundle on an affine space over a group $(G, X, \sigma, \mu, \delta)$ is the set of lines
$\{L_{p,g};\ p \in X,\ g \in G\}$.


26.3.8 NOTATION: $T(X)$, for an affine space over a group $(G, X) < (G, X, \sigma, \mu, \delta)$, denotes the tangent
bundle on $(G, X)$. In other words, $T(X) = \{L_{p,g};\ p \in X,\ g \in G\} = \bigcup_{p \in X} T_p(X)$.


26.3.9 REMARK: Attempting to make lines on affine spaces over groups “coordinate-free”. 

Definition 26.3.3 may be regarded as “coordinate-bound” in some sense. Those who yearn for “coordinate-free”
definitions might prefer that a particular point $p \in X$ is not specified. One way to remove special
points from the definition is to specify only that $f : \mathbb{Z} \to X$ should satisfy part (ii). The rule
$\forall n \in \mathbb{Z},\ f(n + 1) = \mu(f(n), g)$ guarantees that the function will be a line. Then one may recover the base point
$p$ by examining $f(0)$.

One may also avoid specifying the velocity $g$ by changing the rule to $\forall n \in \mathbb{Z},\ f(n + 1) = \mu(f(n), \delta(f(1), f(0)))$.
In other words, $\forall n \in \mathbb{Z},\ \delta(f(n + 1), f(n)) = \delta(f(1), f(0))$. One can be even more “coordinate-free” by
specifying the rule $\forall m, n \in \mathbb{Z},\ \delta(f(m + 1), f(m)) = \delta(f(n + 1), f(n))$. This signifies that the velocity is
constant. This is reminiscent of the analytical definition of straight lines in Euclidean spaces and affine
manifolds (i.e. manifolds which have an affine connection). One may remove one of the dummy variables by
specifying $\forall n \in \mathbb{Z},\ \delta(f(n + 1), f(n)) = \delta(f(n), f(n - 1))$, which is reminiscent of the second-order definition
of lines and geodesics as those curves whose curvature equals zero.
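
The constant-velocity and second-order rules in Remark 26.3.9 can be tested directly on a given map $f : \mathbb{Z} \to X$. The following Python sketch is only an illustration. It assumes the commutative example of the integers modulo 12 acting on themselves by addition, and it checks the rules on a finite window of integers rather than on all of $\mathbb{Z}$. The sample maps f and h and all function names are hypothetical.

```python
# Assumed example: G = X = integers modulo 12, with G acting on X by addition.
N = 12

def mu(x, g):
    """Action of G on X."""
    return (x + g) % N

def delta(x, y):
    """Difference operation: the unique g with mu(y, g) == x."""
    return (x - y) % N

WINDOW = range(-20, 20)  # finite window standing in for all integers

def has_constant_velocity(f):
    """Rule: delta(f(n+1), f(n)) == delta(f(1), f(0)) for all n in the window."""
    return all(delta(f(n + 1), f(n)) == delta(f(1), f(0)) for n in WINDOW)

def has_zero_second_difference(f):
    """Rule: delta(f(n+1), f(n)) == delta(f(n), f(n-1)) for all n in the window."""
    return all(delta(f(n + 1), f(n)) == delta(f(n), f(n - 1)) for n in WINDOW)

def f(n):
    """A line with base point 5 and velocity 3."""
    return (5 + 3 * n) % N

def h(n):
    """A map which is not a line."""
    return (n * n) % N

# The base point and velocity are recovered from the map alone.
base_point = f(0)
velocity = delta(f(1), f(0))
rebuilt_ok = all(f(n + 1) == mu(f(n), velocity) for n in WINDOW)
```

Neither test mentions a base point or a velocity, yet both quantities can be read off afterwards from the values f(0) and f(1).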


26.3.10 REMARK: Interpretation of tangent-line bundles as non-topological fibre bundles. 

The tangent bundle in Definition 26.3.7 may be considered to be a primitive kind of non-topological fibration
(or groupless fibre bundle) $(E, \pi, B)$ according to Definition 21.2.1 by defining $E = T(X)$, $B = X$, and
$\pi : E \to B$ with $\pi : L \mapsto L(0)$ for all $L \in T(X)$.

The tangent bundle in Definition 26.3.7 also meets the requirements for the uniform non-topological fibration
in Definition 21.2.1, since there is a well-defined bijection between the fibres $E_b = \pi^{-1}(\{b\})$ for all $b \in B$.
One may always recover the unique group element $g$ which was used to define a line $f$ on an affine space
$(G, X)$ over a group by applying the difference operation $\delta$ to $f(0)$ and $f(1)$. Thus $g = \delta(f(1), f(0))$. So there
is a natural bijection between all of the fibres $E_b$ and the group $G$. Therefore the tangent bundle also meets
the requirements of Definition 21.7.5 for a non-topological fibre bundle for fibre space $G$. Consequently,
the tangent bundle meets the requirements of Definition 21.8.3 for a non-topological $(G, G)$ fibre bundle if
one adds a fibre atlas $A_E^F$ which consists of the single map $\phi : T(X) \to G$, where $\phi : f \mapsto \delta(f(1), f(0))$
for $f \in T(X)$. (The action of $G$ on $X$ must be a left action, not the right action in Definition 26.2.2, but one
may easily define the left action to equal the right action. Then condition (i) of Definition 26.2.2 yields the
corresponding associativity for the left action.) One could even go further by observing that this tangent
bundle is a principal fibre bundle with this atlas. (See Definition 21.9.4 for non-topological principal fibre
bundles.)


26.3.11 REMARK: Uni-directional tangent lines on affine spaces over groups. 
One may also define half-lines $L^+_{p,g} : \mathbb{Z}_0^+ \to X$ by $L^+_{p,g} : n \mapsto \mu(p, g^n)$. Then one may define unidirectional
tangent spaces $T^+_p(X) = \{L^+_{p,g};\ g \in G\}$.


26.3.12 DEFINITION: A half-line on an affine space over a group $(G, X, \sigma, \mu, \delta)$ is a map $f : \mathbb{Z}_0^+ \to X$ which
is defined inductively for some $p \in X$ and $g \in G$ by
(i) $f(0) = p$,
(ii) $\forall n \in \mathbb{Z}_0^+,\ f(n + 1) = \mu(f(n), g)$.
The point $p$ is called the base point of the half-line.
The group element $g$ is called the velocity of the half-line.


26.3.13 NOTATION: $L^+_{p,g}$, for $p \in X$ and $g \in G$, for an affine space over a group $(G, X) < (G, X, \sigma, \mu, \delta)$,
denotes the half-line on $(G, X)$ with base point $p$ and velocity $g$.


26.3.14 DEFINITION: The unidirectional tangent space on an affine space over a group $(G, X, \sigma, \mu, \delta)$ at a
base point $p \in X$ is the set of half-lines $\{L^+_{p,g};\ g \in G\}$.


26.3.15 NOTATION: $T^+_p(X)$, for $p \in X$, for an affine space over a group $(G, X) < (G, X, \sigma, \mu, \delta)$, denotes
the unidirectional tangent space on $(G, X)$ at base point $p$. In other words, $T^+_p(X) = \{L^+_{p,g};\ g \in G\}$.


26.3.16 DEFINITION: The unidirectional tangent bundle on an affine space over a group $(G, X, \sigma, \mu, \delta)$ is
the set of half-lines $\{L^+_{p,g};\ p \in X,\ g \in G\}$.


26.3.17 NOTATION: $T^+(X)$, for an affine space over a group $(G, X) < (G, X, \sigma, \mu, \delta)$, denotes the unidirectional
tangent bundle on $(G, X)$. In other words, $T^+(X) = \{L^+_{p,g};\ p \in X,\ g \in G\} = \bigcup_{p \in X} T^+_p(X)$.


26.3.18 REMARK: Parallelism on affine spaces over groups. 
A primitive kind of parallelism may be defined on the tangent bundle $T(X)$. The lines $L_{p_1,g_1}, L_{p_2,g_2} \in T(X)$
may be said to be parallel if $g_1 = g_2$.


26.4. Affine spaces over modules over sets 


26.4.1 REMARK: A family of classes of structures derived from affine spaces over groups. 

Affine spaces over modules are derived from affine spaces over groups by adding a set of actions to the group. 
According to Definition 19.1.2, a module without operator domain is the same thing as a commutative group 
$(M, \sigma_M)$ written additively. Therefore an affine space over a module without operator domain is the same as
an affine space over a commutative group. 


Definition 26.4.2 is the root of the family tree of affine spaces over modules. Therefore the properties in 
Theorem 26.2.6 (for affine spaces over groups) apply to affine spaces over modules without operator domain, 
over a set, over a group, and over a ring. The sets and operations for these kinds of affine spaces over modules 
are illustrated in Figure 26.4.1. 


26.4.2 DEFINITION: An affine space over a module (without operator domain) is an affine space over a
group $X < (M, X, \sigma_M, \mu_M, \delta_X)$ such that $M < (M, \sigma_M)$ is a commutative group written additively (i.e. a
module without operator domain).


[Figure 26.4.1: four panels, each with point set $X$, labelled "affine space over a group (or module)", "affine space over a module over a set", "affine space over a module over a group", "affine space over a module over a ring".]

Figure 26.4.1 Sets and operations for affine spaces over modules


26.4.3 REMARK: Affine spaces over commutative groups have “zero curvature”.

The commutativity condition which is added to affine spaces over groups by Definition 26.4.2 causes the
tangent spaces at points of such affine spaces to have a simpler algebraic structure. Tangent vectors at a
point may be added without regard to order. This may be thought of as “zero curvature” in some sense.


Definition 26.3.3 and Notation 26.3.4 for lines on an affine space over a group may be adopted immediately 
for affine spaces over modules. Since modules are commutative groups, the additive notation may be used. 


26.4.4 REMARK: The special case of a module with empty operator domain. 

As mentioned in Remark 19.1.6, a module without operator domain is the special case of a module over 
an unstructured set which happens to be empty. Therefore Definition 26.4.5 is superfluous because it is a 
special case of Definition 26.4.10. 


26.4.5 DEFINITION: A line on an affine space over a module $(M, X, \sigma_M, \mu_M, \delta_X)$ is a map $f : \mathbb{Z} \to X$
which is defined inductively for some $p \in X$ and $m \in M$ by
(i) $f(0) = p$,
(ii) $\forall k \in \mathbb{Z}_0^+,\ f(k + 1) = f(k) + m$,
(iii) $\forall k \in \mathbb{Z}_0^-,\ f(k - 1) = f(k) - m$.

The point $p$ is called the base point of the line.
The module element $m$ is called the velocity of the line.


26.4.6 NOTATION: $L_{p,m}$, for $p \in X$ and $m \in M$, for an affine space $(M, X) < (M, X, \sigma_M, \mu_M, \delta_X)$ over a
module $(M, \sigma_M)$, denotes the line on $(M, X)$ with base point $p$ and velocity $m$.


26.4.7 REMARK: Unital morphisms mapping integers to points in an affine space over a module. 
The unital morphism in Definition 18.2.11 may be applied to Definition 26.4.5 to write $L_{p,m}(k) = p + km$
for $k \in \mathbb{Z}$, where $km$ is defined inductively so that $0_{\mathbb{Z}} m = 0_M$ and $(k + 1)m = km + m$ for all $k \in \mathbb{Z}$.


26.4.8 REMARK: Lines are expected to use scalar multiples when the module is over a field. 

In principle, Definition 26.4.5 is applicable to affine spaces over all kinds of modules, including linear spaces, 
which are the same thing as unitary left modules over fields (and fields are the same thing as commutative 
division rings). However, simply taking integer multiples of the displacement m from the point p is not what 
we generally mean by a “line” in an affine space over a linear space. We expect the line in a space with a 
scalar multiplication operation to contain every scalar multiple of the velocity vector. 


26.4.9 REMARK: Modification of affine spaces over modules to modules with non-empty operator domain. 
A left module $M < (A, M, \sigma_M, \mu_A)$ over a set $A$ is a commutative group $(M, \sigma_M)$ with a left action map
$\mu_A : A \times M \to M$ (notated multiplicatively) which is distributive over the group operation. (See Definition 19.1.7
for left modules over sets.) Thus $\forall a \in A,\ \forall x, y \in M,\ \mu_A(a, \sigma_M(x, y)) = \sigma_M(\mu_A(a, x), \mu_A(a, y))$. In other
words, $\forall a \in A,\ \forall x, y \in M,\ a(x + y) = ax + ay$. This requires a modest extension to Definition 26.2.2 for an
affine space over a group.

Affine spaces over modules are defined in terms of left modules. There is no particular significance in this.
It does not affect the algebra. The module for an affine space is in practice typically a linear space, which is
almost always defined as a left module. Therefore it makes sense to standardise the module as a left module.


26.4.10 DEFINITION: An affine space over a module over a set is a tuple
$X < (A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ such that:

(i) $M < (A, M, \sigma_M, \mu_A)$ is a left module over the set $A$.

(ii) $(M, X) < (M, X, \sigma_M, \mu_M)$ is an additively notated right transformation group which acts freely and
transitively on $X$.

(iii) $\delta_X : X \times X \to M$, written subtractively, satisfies $\forall x \in X,\ \forall m \in M,\ \delta_X(\mu_M(x, m), x) = m$.
(I.e. $\forall x \in X,\ \forall m \in M,\ (x + m) - x = m$.)

The point space of the affine space $(A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ is the set $X$.
The vector space of the affine space $(A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ is the module $M < (A, M, \sigma_M, \mu_A)$.
The scalar space of the affine space $(A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ is the set $A$.


26.4.11 REMARK: An affine space over a module over a set has 3 spaces and 2 action maps. 
The slightly confusing operations in Definition 26.4.10 may be summarised as follows. 


(i) $\mu_A : A \times M \to M$. Top-level set acting on the middle-level set.
(ii) $\sigma_M : M \times M \to M$. Addition operation within the middle-level set.
(iii) $\mu_M : X \times M \to X$. Middle-level set acting on the bottom-level set.
(iv) $\delta_X : X \times X \to M$. Bottom-level operation with middle-level result.


A novel aspect of affine spaces over modules over sets is the fact that there are three sets and two action
operations linking them. The unstructured set $A$ is at the top level, acting on $M$. The commutative group $M$
(in the middle) acts on $X$. The operation $\sigma_M$ keeps elements of $M$ at the same level. The operation $\delta_X$,
unusually, raises elements of $X$ from the bottom level to the middle level.


26.4.12 REMARK: Possibly confusing combination of a left module with a right transformation group. 

It is noteworthy that Definition 26.4.10 combines a left module with a right transformation group. This is 
the standard way of doing things, although it is perhaps not the tidiest way of doing things. This convention 
is seen in affine spaces over linear spaces in expressions such as $p + \lambda v$ for the displacement of a point $p$ by
a vector $\lambda v$, where $\lambda$ is a scalar multiplier for the displacement $v$. There is no confusion because the left
module action is written multiplicatively in such expressions, and the displacement is written additively. So
the “operator binding rules” ensure that the actions take place in the correct order. Thus $v$ is acted on by
the left module action of $\lambda$, and then $\lambda v$ acts on $p$ by a right transformation action.


26.5. Tangent-line bundles on affine spaces over modules 


26.5.1 REMARK: Defining tangent-line bundles for minimalist affine spaces. 

All of the definitions in Section 26.5 apply to affine spaces over modules over sets, groups and rings. The 
definitions are stated in maximum generality for the case of affine spaces over modules over unstructured 
sets. These definitions then apply also to affine spaces over modules over arbitrary structured sets. 


26.5.2 REMARK: Scalar operations can “join the dots” of lines on affine spaces. 

The lines in Definition 26.3.3 for affine spaces over groups are defined in the absence of a scalar multiplication 
operation on the group. Therefore only the most primitive form of line can be defined, by simply iterating 
the group operation on points of the affine space point set forwards and backwards. 


Lines can also be defined using iterated group operations when a scalar operation is available, but a different 
kind of line is possible using scalar multiplication. When the vector space of an affine space is a module 
over a set, the set is thought of as an operator domain. In fact, the unstructured operator domain A in 
Definition 26.4.10 cannot be considered to be an authentic scalar space because it does not have even an 
addition operation. Therefore Definition 26.5.3 should not be taken very seriously. It is probably of technical 
interest only. The gradual build-up of structure helps one to understand which aspects of algebraic structure 
perform which functions.


26.5.3 DEFINITION: A line on an affine space over a module over a set $(A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ is a map
$f : A \to X$ which is defined for some $p \in X$ and $m \in M$ by $f : a \mapsto p + am$.

The point $p$ is called the base point of the line.

The module element $m$ is called the velocity of the line.


26.5.4 NOTATION: $L_{p,m}$, for $p \in X$ and $m \in M$, for an affine space $(M, X) < (A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$
over a module $M < (A, M, \sigma_M, \mu_A)$ over a set $A$, denotes the line on $(M, X)$ with base point $p$ and velocity $m$.
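
Definition 26.5.3 can be made concrete with a small example in which the operator domain really is an unstructured set. The following Python sketch is an illustration under stated assumptions: M and X are both taken to be pairs of integers, and A is a hypothetical set of four labels, each of which acts on M by an additive map. The names ACTIONS, mu_A, mu_M, delta_X and line are chosen here and do not come from the text.

```python
# Assumed example: M = X = pairs of integers, A = an unstructured set of labels.
# Each label acts on M by a map which distributes over addition.
ACTIONS = {
    "zero": lambda m: (0, 0),
    "id": lambda m: m,
    "swap": lambda m: (m[1], m[0]),
    "double": lambda m: (2 * m[0], 2 * m[1]),
}
A = set(ACTIONS)

def sigma_M(u, v):
    """Addition within the module M."""
    return (u[0] + v[0], u[1] + v[1])

def mu_A(a, m):
    """Left action of the operator domain A on M, written 'am' in the text."""
    return ACTIONS[a](m)

def mu_M(x, m):
    """Right action of M on the point space X, written 'x + m' in the text."""
    return sigma_M(x, m)

def delta_X(x, y):
    """Difference of points, written 'x - y' in the text."""
    return (x[0] - y[0], x[1] - y[1])

def line(p, m):
    """The line a -> p + am, represented as a dict whose keys are the elements of A."""
    return {a: mu_M(p, mu_A(a, m)) for a in A}

# Distributivity a(x + y) = ax + ay, checked on a small sample of M.
SAMPLE = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
distributive = all(
    mu_A(a, sigma_M(x, y)) == sigma_M(mu_A(a, x), mu_A(a, y))
    for a in A for x in SAMPLE for y in SAMPLE
)

L = line((1, 1), (2, 5))
```

The parameter space of this line is the four-element set A, not the integers, so the "line" consists of at most four points. This illustrates the warning above that the definition is mainly of technical interest when A has no structure.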


26.5.5 REMARK: Scalars in the operator domain can act on velocities before applying them to points. 
Definition 26.5.3 and Notation 26.5.4 incompatibly redefine Definition 26.3.3 and Notation 26.3.4. (This 
does not matter much because it is unlikely that anyone will use these definitions and notations!) It should 
be noted that the line parameter space in Definition 26.3.3 is the ring of integers whereas the line parameter 
space in Definition 26.5.3 is the scalar set A. 


The line map in Definition 26.5.3 is defined by making the scalar $a \in A$ act on the velocity $m \in M$ first
before applying the product $am \in M$ to the point $p \in X$. The more primitive line map in Definition 26.3.3
effectively applies the unital morphism in Definition 18.2.11 to the velocity $g \in G$ before applying the iterated
result $g^k$ to the point $p \in X$ for $k \in \mathbb{Z}$. This is effectively using the ring of integers $\mathbb{Z}$ as an ad-hoc scalar
space for the group which is in the role of a vector space. Therefore in this sense, Definition 26.3.3 may
be considered to be a special case of Definition 26.5.3. Thus Definition 26.5.3 is the “true definition”. The
domains, ranges and parameters of lines on affine spaces over modules are illustrated in Figure 26.5.1.

Figure 26.5.1 Sets and maps for lines on affine spaces over modules


26.5.6 REMARK: Applicability of tangent concepts for unstructured to structured operator domains. 
Definition 26.5.3 and Notation 26.5.4 are directly applicable to affine spaces over modules over groups and 
rings in Definitions 26.6.2 and 26.6.6, which do have authentic scalar spaces. 


Definitions 26.5.7 and 26.5.9, and Notations 26.5.8 and 26.5.10, are the scalar-multiple versions of the
corresponding unital morphism definitions and notations in Section 26.2. They are defined in terms of
Definition 26.5.3 and Notation 26.5.4, but they are directly applicable to affine spaces over modules over groups
and rings.


26.5.7 DEFINITION: The tangent space on an affine space over a module over a set
$(M, X) < (A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ at base point $p \in X$ is the set of lines $\{L_{p,m};\ m \in M\}$.


26.5.8 NOTATION: $T_p(X)$, for $p \in X$, for an affine space over a module over a set $(M, X)$, denotes the
tangent space on $(M, X)$ at base point $p$. In other words, $T_p(X) = \{L_{p,m};\ m \in M\}$.


26.5.9 DEFINITION: The tangent bundle on an affine space over a module over a set
$(M, X) < (A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ is the set of lines $\{L_{p,m};\ p \in X,\ m \in M\}$.


26.5.10 NOTATION: $T(X)$, for an affine space over a module over a set $(M, X)$, denotes the tangent bundle
on $(M, X)$. In other words, $T(X) = \{L_{p,m};\ p \in X,\ m \in M\} = \bigcup_{p \in X} T_p(X)$.


26.5.11 REMARK: Comparison of tangent bundles on affine spaces over modules with fibre bundles.

The tangent bundle in Definition 26.5.9 may be regarded as a non-topological fibre bundle in the same way 
as for the affine spaces over groups which were discussed in Remark 26.3.10. In the case of an affine space 
$(A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ over a module $M$ over an unstructured set $A$, there is no explicit role for the
set A within the fibre bundle. The set A cannot be the structure group because it is not a group! 


26.5.12 REMARK: Uni-directional tangent lines require an order definition on the operator domain. 

The definitions of half-lines cannot be given if the scalar space lacks a compatible ordering. It would be 
possible to define half-lines here for ordered operator domains A. However, ordered rings are of more interest 
than general ordered operator domains. Therefore the definition of half-lines is deferred until affine spaces 
over modules over rings have been defined. (See Definition 26.7.5.) 


26.6. Affine spaces over modules over groups and rings 


26.6.1 REMARK: Upgrade of affine spaces over modules to operator domains which are groups. 

Definition 26.6.2 upgrades the module $M < (A, M, \sigma_M, \mu_A)$ over an unstructured set $A$ in Definition 26.4.10
to a module $M < (G, M, \sigma_G, \sigma_M, \mu_G)$ over a group $G < (G, \sigma_G)$. (See Section 19.2 for modules over groups.)
One may also define affine spaces over modules over semigroups or monoids, but these seem unlikely to be
useful in differential geometry.


26.6.2 DEFINITION: An affine space over a module over a group is a tuple
$X < (G, M, X, \sigma_G, \sigma_M, \mu_G, \mu_M, \delta_X)$ such that:

(i) $M < (G, M, \sigma_G, \sigma_M, \mu_G)$ is a left module over a group $G < (G, \sigma_G)$.

(ii) $(M, X) < (M, X, \sigma_M, \mu_M)$ is an additively notated right transformation group which acts freely and
transitively on $X$.

(iii) $\delta_X : X \times X \to M$, written subtractively, satisfies $\forall x \in X,\ \forall m \in M,\ \delta_X(\mu_M(x, m), x) = m$.
(I.e. $\forall x \in X,\ \forall m \in M,\ (x + m) - x = m$.)

The point space of the affine space $(G, M, X, \sigma_G, \sigma_M, \mu_G, \mu_M, \delta_X)$ is the set $X$.

The vector space of the affine space $(G, M, X, \sigma_G, \sigma_M, \mu_G, \mu_M, \delta_X)$ is the module $M < (G, M, \sigma_G, \sigma_M, \mu_G)$.

The scalar space of the affine space $(G, M, X, \sigma_G, \sigma_M, \mu_G, \mu_M, \delta_X)$ is the group $G < (G, \sigma_G)$.


26.6.3 REMARK: Summary of spaces and operations for affine spaces over modules over groups. 
The operations in Definition 26.6.2 may be summarised as follows. 


(i) $\sigma_G : G \times G \to G$. Operation within the top-level set.
(ii) $\mu_G : G \times M \to M$. Top-level set acting on the middle-level set.
(iii) $\sigma_M : M \times M \to M$. Addition operation within the middle-level set.
(iv) $\mu_M : X \times M \to X$. Middle-level set acting on the bottom-level set.
(v) $\delta_X : X \times X \to M$. Bottom-level operation with middle-level result.


The space $G$ of actions on the module $M$ is now a group. So there is an extra operation $\sigma_G : G \times G \to G$
within the top-level structure $G$.


26.6.4 REMARK: Interpretation of the operator domain of an affine space over a group as a structure group. 
As mentioned in Remark 26.5.6, the definitions and notations for lines, tangent spaces and tangent bundles
for affine spaces over modules over sets may be directly exported to affine spaces over modules over
groups and rings without change. However, when exporting the fibre bundle ideas which are mentioned in
Remark 26.3.10, the presence of a group acting on the module opens up the possibility of using this group
as the structure group for the fibre bundle.

Let $(G, M, X, \sigma_G, \sigma_M, \mu_G, \mu_M, \delta_X)$ be an affine space over a module over a group. Definition 21.8.3 for a
non-topological $(G, F)$ fibre bundle requires a specification tuple $(E, \pi, B, A_E^F)$ and an effective left transformation
group $(G, F) < (G, F, \sigma_G, \mu)$.

Let $E = T(X)$, $B = X$ and $F = M$, using the same group $G$ for the affine space and the fibre bundle. Let
$A_E^F = \{\phi\}$, where $\phi : T(X) \to M$ is defined by $\phi : L_{p,m} \mapsto m$. This map is well defined, as mentioned in
Remark 26.3.10, because for any line $L \in T(X)$, the value of $m = \phi(L)$ is uniquely determined as $\phi(L) =
\delta_X(L(1), L(0))$. Similarly the map $\pi : T(X) \to X$ with $\pi : L_{p,m} \mapsto p$ is well defined. The action of $G$ on
$F = M$ is $\mu = \mu_G : G \times M \to M$. With these assignments, $(E, \pi, B, A_E^F)$ is a non-topological $(G, F)$ fibre
bundle. In other words, the tuple $(T(X), \pi, X, \{\phi\})$ is a non-topological $(G, M)$ fibre bundle. It happens
that this fibre bundle is trivial, but that is because of the particular choice of fibre atlas $A_E^F = \{\phi\}$.

It is clear from this example and Remark 26.3.10 that affine spaces in general have tangent-line spaces which
satisfy the requirements for a non-topological fibre bundle. In the case of an affine space over a linear space,
the structure group is the field of the linear space. So in the case of the well-known affine space on $\mathbb{R}^n$, the
structure group is $\mathbb{R}$.


26.6.5 REMARK: A module over a ring is not a module over a group. 

Definition 26.6.6 for an affine space over a module over a ring is not derived from Definition 26.6.2 because 
a module over a ring is not a module over a group. (See Remark 19.3.3 for details.) Therefore it must be 
defined separately. 


An affine space over a module over a ring is almost the same thing as an affine space over a linear space, 
which is the most common kind of affine space seen in applications. 


26.6.6 DEFINITION: An affine space over a module over a ring is a tuple
$X < (R, M, X, \sigma_R, \tau_R, \sigma_M, \mu_R, \mu_M, \delta_X)$ such that:

(i) $M < (R, M, \sigma_R, \tau_R, \sigma_M, \mu_R)$ is a left module over a ring $R < (R, \sigma_R, \tau_R)$.

(ii) $(M, X) < (M, X, \sigma_M, \mu_M)$ is an additively notated right transformation group which acts freely and
transitively on $X$.

(iii) $\delta_X : X \times X \to M$, written subtractively, satisfies $\forall x \in X,\ \forall m \in M,\ \delta_X(\mu_M(x, m), x) = m$.
(I.e. $\forall x \in X,\ \forall m \in M,\ (x + m) - x = m$.)

The point space of the affine space $(R, M, X, \sigma_R, \tau_R, \sigma_M, \mu_R, \mu_M, \delta_X)$ is the set $X$.

The vector space of the affine space $(R, M, X, \sigma_R, \tau_R, \sigma_M, \mu_R, \mu_M, \delta_X)$ is the module
$M < (R, M, \sigma_R, \tau_R, \sigma_M, \mu_R)$.

The scalar space of the affine space $(R, M, X, \sigma_R, \tau_R, \sigma_M, \mu_R, \mu_M, \delta_X)$ is the ring $R < (R, \sigma_R, \tau_R)$.


26.6.7 REMARK: Summary of spaces and operations for affine spaces over modules over rings. 
The operations in Definition 26.6.6 may be summarised as follows. 


(i) $\sigma_R : R \times R \to R$. Additive group operation within the top-level set. 
(ii) $\tau_R : R \times R \to R$. Multiplicative semigroup operation within the top-level set. 
(iii) $\mu_R : R \times M \to M$. Top-level set acting on the middle-level set. 
(iv) $\sigma_M : M \times M \to M$. Addition operation within the middle-level set. 
(v) $\mu_M : X \times M \to X$. Middle-level set acting on the bottom-level set. 
(vi) $\delta_X : X \times X \to M$. Bottom-level operation with middle-level result. 
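
The six operations can be made concrete in a small computational sketch. The following Python fragment is only an illustration under stated assumptions: the ring is taken to be the integers, the module is the set of integer pairs, and the point space is a separate class `Point` of integer-coordinate points. The names `Point`, `sigma_R`, `tau_R`, `mu_R`, `sigma_M`, `mu_M` and `delta_X` are hypothetical names chosen to mirror the tuple in Definition 26.6.6.

```python
from dataclasses import dataclass

# Assumption for illustration: R = Z (the integers), M = Z^2 (integer pairs),
# X = points with integer coordinates, kept as a separate type from M.

@dataclass(frozen=True)
class Point:
    a: int
    b: int

def sigma_R(r, s):      # (i) addition in the ring R
    return r + s

def tau_R(r, s):        # (ii) multiplication in the ring R
    return r * s

def mu_R(r, m):         # (iii) R acting on M
    return (r * m[0], r * m[1])

def sigma_M(m, n):      # (iv) addition in M
    return (m[0] + n[0], m[1] + n[1])

def mu_M(x, m):         # (v) M acting on X (right action, written x + m)
    return Point(x.a + m[0], x.b + m[1])

def delta_X(x, y):      # (vi) difference of two points, written x - y
    return (x.a - y.a, x.b - y.b)

x, m, n = Point(3, -1), (2, 5), (-4, 7)

# Definition 26.6.6 (iii): (x + m) - x = m.
assert delta_X(mu_M(x, m), x) == m
# Right action: x + (m + n) = (x + m) + n.
assert mu_M(x, sigma_M(m, n)) == mu_M(mu_M(x, m), n)
```

In this sketch there is deliberately no function which adds two `Point` values or multiplies a `Point` by a ring element. Only the six listed operations are available.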


26.7. Affine spaces over unitary modules over ordered rings 


26.7.1 REMARK: The advantages of upgrading the affine space module to a unitary module. 

Definition 26.6.6 offers almost all of the functionality of the simplest kind of affine space, namely the affine 
space over a group in Definition 26.2.2. There is one thing missing, namely the guarantee that every line 
in the affine space will include displacement by the velocity vector. One naturally expects that a line 
$L_{p,m} : r \mapsto p + rm$ for $r \in R$ should include the point $p + m$ obtained by displacing $p$ by the velocity vector $m$. 
This would be guaranteed if the ring had a unit element $1_R$ such that $1_R m = m$ for all $m \in M$. This requirement 
is met by a unitary left module over a ring. (See Definition 19.3.6.) 


An affine space over a unitary module over a ring is defined as an affine space over a module over a ring for 
which the module over a ring is a unitary module over a ring. (See Definition 26.7.2.) The lines $L_{p,m}$ in 
an affine space $X$ over a unitary module $M$ over a ring $R$ according to Definition 26.5.3 are guaranteed to 
satisfy $L_{p,m}(1_R) = p + m$ for all $p \in X$ and $m \in M$. Therefore the range of $L_{p,m}$ includes the range of the 
much simpler line defined for affine spaces over groups in Section 26.2. 
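
The effect of the unitary requirement can be seen in a minimal sketch. It is assumed here, purely for illustration, that $R = M = X = \mathbb{Z}$, and two actions of $R$ on $M$ are compared: the usual product, and the zero action $r m = 0$, which satisfies the module axioms but is not unitary. The names `line`, `zero_action` and `usual_action` are hypothetical.

```python
def line(p, m, act):
    """Return the map r -> p + act(r, m) on the point space X = Z."""
    return lambda r: p + act(r, m)

def zero_action(r, m):
    # Non-unitary module action: r*m = 0 for all r, so 1*m != m when m != 0.
    return 0

def usual_action(r, m):
    # Unitary module action: ordinary integer multiplication.
    return r * m

p, m = 4, 3
sample = range(-10, 11)          # a finite sample of ring elements

reached_zero = {line(p, m, zero_action)(r) for r in sample}
reached_unit = {line(p, m, usual_action)(r) for r in sample}

assert reached_zero == {p}       # the line never leaves the base point
assert p + m in reached_unit     # L(1) = p + m in the unitary case
assert p + m not in reached_zero
```

With the zero action, the line through $p$ with velocity $m$ is constant, so the displaced point $p + m$ is never reached.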

26.7.2 DEFINITION: An affine space over a unitary module over a ring is an affine space over a module 
over a ring $X \mathrel{-\!\!<} (R, M, X, \sigma_R, \tau_R, \sigma_M, \mu_R, \mu_M, \delta_X)$ such that 


(i) $M \mathrel{-\!\!<} (R, M, \sigma_R, \tau_R, \sigma_M, \mu_R)$ is a unitary module over a ring $R \mathrel{-\!\!<} (R, \sigma_R, \tau_R)$. 


26.7.3 REMARK: Import of tangent definitions to affine spaces over a ring. 

Lines, tangent spaces and tangent bundles are defined and notated on affine spaces over modules over rings 
exactly as for affine spaces over modules over unstructured operator domains. (See Definitions 26.5.3, 26.5.7 
and 26.5.9, and Notations 26.5.4, 26.5.8 and 26.5.10.) One simply ignores the structure on the ring. 


26.7.4 REMARK: Advantages of upgrading an affine space to a module over an ordered ring. 

Definitions 26.7.5, 26.7.7 and 26.7.9, and Notations 26.7.6, 26.7.8 and 26.7.10 are the scalar-multiple versions 
of the corresponding unital morphism definitions and notations in Section 26.2. Half-lines make no sense 
if the scalar space has no order. Therefore these definitions require an ordered ring. (See Section 18.3 for 
ordered rings.) It is possible to define half-lines for affine spaces over modules over unstructured sets or 
groups also, but an ordering must then be provided. The form of the definitions and notations would then 
be as given here. 


An ordered ring has a tuple $R \mathrel{-\!\!<} (R, \sigma_R, \tau_R, <)$, where “$<$” is the ordering on $R$. So the definition of an affine 
space over a module over an ordered ring requires an expanded tuple $(R, M, X, \sigma_R, \tau_R, <, \sigma_M, \mu_R, \mu_M, \delta_X)$. 
A formal definition of this would be tedious and obvious. So it is omitted. 


26.7.5 DEFINITION: A half-line on an affine space over a module over an ordered ring 

$(M, X) \mathrel{-\!\!<} (R, M, X, \sigma_R, \tau_R, <, \sigma_M, \mu_R, \mu_M, \delta_X)$ is a map $f : R_0^+ \to X$ which is defined for some $p \in X$ and 
$m \in M$ by $f : r \mapsto p + rm$, where $R_0^+ = \{r \in R;\ r \ge 0_R\}$. 

The point $p$ is called the base point of the half-line. 

The module element $m$ is called the velocity of the half-line. 


26.7.6 NOTATION: $L^+_{p,m}$, for $p \in X$ and $m \in M$, for an affine space over a module over an ordered ring 
$(M, X)$, denotes the half-line on $(M, X)$ with base point $p$ and velocity $m$. 


26.7.7 DEFINITION: 
The unidirectional tangent space on an affine space over a module over an ordered ring $(M, X)$ at a base 
point $p \in X$ is the set of half-lines $\{L^+_{p,m};\ m \in M\}$. 


26.7.8 NOTATION: $T^+_p(X)$, for $p \in X$, for an affine space over a module over an ordered ring $(M, X)$, 
denotes the unidirectional tangent space on $(M, X)$ at base point $p$. That is, $T^+_p(X) = \{L^+_{p,m};\ m \in M\}$. 


26.7.9 DEFINITION: 
The unidirectional tangent bundle on an affine space over a module over an ordered ring $(M, X)$ is the set 
of half-lines $\{L^+_{p,m};\ p \in X,\ m \in M\}$. 


26.7.10 NOTATION: $T^+(X)$, for an affine space over a module over an ordered ring $(M, X)$, denotes the 
unidirectional tangent bundle on $(M, X)$. In other words, $T^+(X) = \{L^+_{p,m};\ p \in X,\ m \in M\} = \bigcup_{p \in X} T^+_p(X)$. 


26.8. Tangent velocity spaces and velocity bundles 


26.8.1 REMARK: Terminology for tangent velocity spaces and bundles on affine spaces over modules. 
Some bidirectional tangent spaces and tangent bundles have been defined on affine spaces in Definitions 
26.3.5, 26.3.7, 26.5.7 and 26.5.9. These may be referred to as “tangent-line spaces”. They correspond to the 
classical “infinite lines”, although these lines may cycle back to the starting point in some kinds of abstract 
affine spaces. 

Some unidirectional tangent spaces and tangent bundles have been defined on affine spaces in Definitions 
26.3.14, 26.3.16, 26.7.7 and 26.7.9. These may be referred to as “tangent half-line spaces”. They correspond 
to the classical “semi-infinite lines”, although these half-lines may cycle back to the starting point in some 
kinds of abstract affine spaces. 


As mentioned in Remark 53.3.2, many species of tangent bundle may co-exist. Each species may be given 
its own name, and canonical isomorphisms may be defined between them. In the case of affine spaces, one 
may also define “tangent velocity spaces” containing pairs $(p, m)$ rather than lines $L_{p,m}$ or half-lines $L^+_{p,m}$. 
Clearly the “information content” is the same. It is only the meaning which is different! 

26.8.2 DEFINITION: The tangent velocity space on an affine space over a module over a set 
$(M, X) \mathrel{-\!\!<} (A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ at base point $p \in X$ is the set of pairs $\{(p, m);\ m \in M\}$. 


26.8.3 NOTATION: $T_p(X)$, for $p \in X$, for an affine space over a module over a set $(M, X)$, denotes the 
tangent velocity space on $(M, X)$ at base point $p$. In other words, $T_p(X) = \{(p, m);\ m \in M\}$. 


26.8.4 DEFINITION: The tangent velocity bundle on an affine space over a module over a set 
$(M, X) \mathrel{-\!\!<} (A, M, X, \sigma_M, \mu_A, \mu_M, \delta_X)$ is the set of pairs $\{(p, m);\ p \in X,\ m \in M\}$. 


26.8.5 NOTATION: $T(X)$, for an affine space over a module over a set $(M, X)$, denotes the tangent velocity 
bundle on $(M, X)$. In other words, $T(X) = \{(p, m);\ p \in X,\ m \in M\} = \bigcup_{p \in X} T_p(X)$. 


26.8.6 REMARK: The disadvantages of point-velocity-pair parametrisation of tangent vectors. 

Definitions 26.8.2 and 26.8.4, and Notations 26.8.3 and 26.8.5, have the advantage of compactness and 
simplicity. But they have the big disadvantage of ambiguity. Bidirectional and unidirectional tangent lines 
have the same velocity. On differentiable manifolds, numerous other kinds of tangent bundles share the same 
velocities, such as various classes of tangent differential operators and tangent curve classes. The tangent 
velocity bundle $T(X) = X \times M$ is only sufficient to indicate which object within a given tangent bundle is 
meant, but is insufficient to indicate which tangent bundle is meant. 


Tangent velocity spaces have a further disadvantage. They do not indicate how the scalar spaces act on 
them. The line and half-line spaces indicate explicitly the maps from scalar spaces to point spaces. 


A much more serious disadvantage of the tangent velocity bundle is that it is indistinguishable from the 
displacement bundle. Velocities and displacements are not the same thing. In an affine space, there is a 
one-to-one correspondence between displacements and velocities. In differentiable manifolds, these concepts 
are totally different. The confusion between displacements and velocities in affine spaces is the core reason 
for the difficulties in defining tangent spaces on differentiable manifolds. Nostalgia for the good old days 
of flat space makes people want a velocity vector which is as tidy and convenient as the classical flat-space 
point-to-point vector, which is indistinguishable from a velocity vector. 


The tangent line and half-line bundles have the advantage that they correspond very well to the requirements 
of differential calculus. When one differentiates a real-valued function of a single variable, one is approximating 
the function by a line. When one differentiates a real-valued curve in $\mathbb{R}^n$, one is approximating the 
curve by an affine-parametrised line. The real-valued case is also an affine-parametrised line because the 
Y-value is a linear function of the X-value. In other words, $x \mapsto y = p + mx$ for some point $p$ and velocity $m$. 
So differentiation is the art of approximating a curve by an affine-parametrised line. 
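
As a rough numerical illustration of this point of view, the following Python sketch approximates a curve in $\mathbb{R}^2$ near a parameter value $t_0$ by the affine-parametrised line $s \mapsto p + sm$, where $p$ is the point on the curve and $m$ is a central-difference estimate of the velocity. The function name `tangent_line`, the step size `h` and the choice of the unit circle as the test curve are assumptions made only for this example.

```python
import math

def tangent_line(curve, t0, h=1e-6):
    """Return the affine-parametrised line s -> p + s*m approximating curve near t0.

    p is the point curve(t0); m is a central-difference estimate of the velocity.
    """
    p = curve(t0)
    fwd, bwd = curve(t0 + h), curve(t0 - h)
    m = tuple((a - b) / (2 * h) for a, b in zip(fwd, bwd))
    return lambda s: tuple(pi + s * mi for pi, mi in zip(p, m))

def circle(t):
    return (math.cos(t), math.sin(t))

L = tangent_line(circle, 0.0)

# The line passes through the base point and has velocity close to (0, 1).
assert L(0.0) == (1.0, 0.0)
assert abs(L(1.0)[0] - 1.0) < 1e-6 and abs(L(1.0)[1] - 1.0) < 1e-6

# The gap between curve and line shrinks faster than the parameter offset.
for s in (0.1, 0.01):
    gap = math.dist(circle(s), L(s))
    assert gap < s * s
```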

In the case of differentiable manifolds, one approximates differentiable curves by affine-parametrised lines in 
the chart space. That is as good as it gets. There is no such thing as an affine-parametrised line on a general 
differentiable manifold. Therefore the most natural definition of a tangent vector on a general differentiable 
manifold is an affine-parametrised line with respect to each chart in the atlas. That is as good as it gets! 


26.9. Parametrised line segments and hyperplane segments 


26.9.1 REMARK: Parametrisation of lines by point-pairs. 
The lines on affine spaces in Definition 26.5.3 are specified by their base point and velocity. Definition 26.9.2 
specifies the base point and one other point. 


26.9.2 DEFINITION: The line through points $p, q$ in an affine space $X$ over a unitary module $M$ over a 
ring $R$ is the map $L : R \to X$ defined by $L : t \mapsto p + t(q - p)$. 


26.9.3 REMARK: The line through two points might not include the second point in a non-unitary module. 
The expression $p + t(q - p)$ in Definition 26.9.2 means $\mu_M(p, \mu_R(t, \delta_X(q, p)))$ in terms of the affine space tuple 
$(R, M, X, \sigma_R, \tau_R, \sigma_M, \mu_R, \mu_M, \delta_X)$ in Definition 26.6.6. So Definition 26.9.2 is well defined in a general 
affine space over a module over a ring (without requiring the module to be unitary). But one requires the 
module to be unitary so that one can write $L(1_R) = p + 1_R(q - p) = q$. 


In terms of Notation 26.5.4, the line through points $p, q$ is $L_{p,q-p}$. Thus Definition 26.9.2 does not require 
its own specific notation. It is only a different way of specifying or parametrising lines. 

26.9.4 REMARK: Lines through point-pairs may have gaps in them. 

It is not guaranteed that the line in Definition 26.9.2 will cover all points on the line through the two given 
points. For example, if $R = \mathbb{Z}$ and $p, q \in M = \mathbb{Z}$ with $p = 0$ and $q = 2$, the map $t \mapsto p + t(q - p)$ will miss out 
the value $1 \in M$ because $t = 1/2$ is not possible. (A similar situation for the module $M = \mathbb{Z}^2$ over $R = \mathbb{Z}$ 
is illustrated in Figure 40.2.2 in Remark 40.2.3.) This suggests that “lines between two points” should be 
parametrised by determining the closest point to p on the line joining p to q, and using this instead of q. 
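
The gap can be exhibited in a few lines of Python. This is a sketch under the assumption that $R = \mathbb{Z}$ and that points are integer tuples. The helper `primitive_second_point`, which divides $q - p$ by the greatest common divisor of its components, is one possible reading of “the closest point to $p$ on the line joining $p$ to $q$” for the modules $\mathbb{Z}^n$. It is a hypothetical helper for this example only.

```python
from functools import reduce
from math import gcd

def line_through(p, q):
    """The map t -> p + t(q - p) for integer t, with points as integer tuples."""
    return lambda t: tuple(a + t * (b - a) for a, b in zip(p, q))

def primitive_second_point(p, q):
    """Closest lattice point to p on the ray from p towards q (hypothetical helper)."""
    d = tuple(b - a for a, b in zip(p, q))
    g = reduce(gcd, d, 0)          # gcd of all components of q - p
    if g == 0:
        raise ValueError("p and q must be distinct")
    return tuple(a + c // g for a, c in zip(p, d))

p, q = (0,), (2,)
values = {line_through(p, q)(t) for t in range(-5, 6)}
assert (1,) not in values          # the point 1 is skipped when q = 2

q0 = primitive_second_point(p, q)  # replaces q = 2 by the closer point 1
assert (1,) in {line_through(p, q0)(t) for t in range(-5, 6)}
```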


26.9.5 REMARK:  Non-availability of a customary linear space formula for a line through two points. 

The expression (1 — t)p + tq is not well defined because there is no product function from R x X to X, and 
there is no addition function from X x X to X. The pseudo-expression (1—t)p+tq would be well defined if 
X were given a linear space structure (or the structure of a unitary module over a ring) which is consistent 
with the affine space structure, but that would defeat the main purpose of defining affine spaces, which is to 
remove the origin and vector addition operation from the point space. 


26.9.6 REMARK: The definition of line segments requires an ordered ring. 

One requires an ordering on the ring so that line segments will be well defined. Therefore a general affine 
space over a unitary module over an ordered ring seems to be a suitable minimal structure for this definition. 
(See Section 18.8 for ordered fields.) 


The space-specific notation $R[x, y] = \{t \in R;\ x \le t \le y\}$ for ordered rings $R$ is used in Definition 26.9.7. 
The prefix R may be omitted when the space is implicit in the context. 


26.9.7 DEFINITION: The line segment through points $p, q$ in an affine space $X$ over a unitary module $M$ 
over an ordered ring $R$ is the map $L : R[0_R, 1_R] \to X$ defined by $L : t \mapsto p + t(q - p)$. 


26.9.8 REMARK: Using the real-number interval notation for line segments. 
Notations 26.9.9 and 26.9.15 are suggested by the standard closed interval notation in R. 


26.9.9 NOTATION: $X[p, q]$, for points $p, q \in X$ for an affine space $X$ over a unitary module over an ordered 
field, denotes the parametrised line segment through points p and q. 


26.9.10 REMARK: Well-definition of formulas for hyperplanes through given points. 

The expression $p_0 + \sum_{i=1}^k t_i (p_i - p_0)$ in Definitions 26.9.11 and 26.9.12 is a well-defined element of $X$ for 
all point sequences $p \in X^{k+1}$ and number sequences $t \in R^k$. The expression $\sum_{i=1}^k t_i (p_i - p_0)$ is defined 
inductively by the rule $\sum_{i=1}^j t_i (p_i - p_0) = t_j (p_j - p_0) + \sum_{i=1}^{j-1} t_i (p_i - p_0)$. Since vector addition in the 
module $M$ is commutative, the sum expression is independent of the order of addition of the terms. 


26.9.11 DEFINITION: The hyperplane through points $p_0, p_1, \ldots p_k \in X$ for $k \in \mathbb{Z}_0^+$ in an affine space $X$ over 
a unitary module $M$ over an ordered ring $R$ is the map $L : R^k \to X$ defined by $L : t \mapsto p_0 + \sum_{i=1}^k t_i (p_i - p_0)$. 


26.9.12 DEFINITION: The hyperplane segment through points $p_0, p_1, \ldots p_k \in X$ for $k \in \mathbb{Z}_0^+$ in an affine 
space $X$ over a unitary module $M$ over an ordered ring $R$ is the map $L : (R[0_R, 1_R])^k \to X$ defined 
by $L : t \mapsto p_0 + \sum_{i=1}^k t_i (p_i - p_0)$. 
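
The map $t \mapsto p_0 + \sum_{i=1}^k t_i (p_i - p_0)$ can be evaluated directly, as in the following Python sketch. It is assumed for illustration that the scalars are rational numbers (via the standard `fractions` module) and that points and displacements are both stored as tuples. The function name `hyperplane` is hypothetical. The sum of displacements is accumulated term by term, in the inductive manner of Remark 26.9.10.

```python
from fractions import Fraction

def hyperplane(points):
    """Return the map t -> p0 + sum_i t_i (p_i - p0) for points given as tuples."""
    p0, rest = points[0], points[1:]

    def L(t):
        if len(t) != len(rest):
            raise ValueError("need one parameter per point after p0")
        disp = tuple(Fraction(0) for _ in p0)       # running sum in the module M
        for ti, pi in zip(t, rest):
            disp = tuple(d + ti * (a - b) for d, a, b in zip(disp, pi, p0))
        return tuple(a + d for a, d in zip(p0, disp))  # displace the base point

    return L

L = hyperplane([(0, 0), (2, 0), (0, 2)])
half = Fraction(1, 2)

assert L((0, 0)) == (0, 0)          # base point p0
assert L((1, 0)) == (2, 0)          # p1, using the unit element of the ring
assert L((half, half)) == (1, 1)    # midpoint of p1 and p2
assert L((1, 1)) == (2, 2)          # parameters in [0,1]^2 also reach this corner
```

Only differences of points and sums of displacements are used. No point is ever multiplied by a scalar or added to another point.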


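26.9.13 REMARK: Hyperplane segments and convex spans. 
The range of the hyperplane segment through points in Definition 26.9.12 is related to the “(convex) span” 
of the points $p_0, p_1, \ldots p_k$. For $k = 1$ and an ordered field $R$, the two sets coincide. For $k \ge 2$, the convex 
span also requires $\sum_{i=1}^k t_i \le 1_R$, so it is in general a proper subset of the range in Definition 26.9.12. 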


26.9.14 REMARK: Non-availability of a well-known linear-space formula for hyperplanes in affine spaces. 
The expression $p_0 + \sum_{i=1}^k t_i (p_i - p_0)$ in Definitions 26.9.11 and 26.9.12 resembles the expression 
$(1 - \sum_{i=1}^k t_i) p_0 + \sum_{i=1}^k t_i p_i$. As pointed out for the two-point case in Remark 26.9.5, the latter expression is 
ill defined because there is no product function from $R \times X$ to $X$ and no addition function from $X \times X$ to $X$. 
A convex combination of $k + 1$ points $p_0, p_1, \ldots p_k$ is often written as $\sum_{i=0}^k \lambda_i p_i$ when the point space $X$ has 
a linear space structure which is compatible with the affine space structure on $X$. The number sequence 
$\lambda \in R^{k+1}$ is then required to satisfy $\sum_{i=0}^k \lambda_i = 1_R$. 


26.9.15 NOTATION: $X[p_0, p_1, \ldots p_k]$ denotes the hyperplane segment through points $p_0, p_1, \ldots p_k \in X$ for 
$k \in \mathbb{Z}_0^+$ in an affine space $X$ over a unitary module over an ordered field. 


26.10.1 REMARK: The ancient Egyptian understanding of parallel translation. 

One may define affine spaces in two ways. One may define vectors $v$ as displacements $v = q - p$ between points 
$p$ and $q$, or one may define displacements as sums $q = p + v$ of points and vectors. It seems plausible that in 
prehistory, points were best known, and displacements from point to point must have been secondary. The 
idea of vector portability must have been a relatively late development. The concepts of “north”, “south”, 
etc., must have been somewhat fuzzy in pre-farming communities. Only the Sun would have given early 
humanity any sense of portable direction. But by the time of the Egyptian pyramids about 4500 years ago, 
the idea of portable direction was clearly well understood. Romer [468], page 42, says the following about 
the survey of the Khufu pyramid (the “Great Pyramid”) by William Matthew Flinders Petrie in 1880-1883. 


As far as the Great Pyramid's exterior was concerned, Petrie found errors that none of its previous 
measurers had seen, establishing that the ancient builders had laid the Pyramid's surviving casing 
stones within 5 seconds of arc of true north, an extraordinary precision for a building of any period 
of history and of the same order of resolution that in 1973 NASA was proudly employing to map 
the sun from Skylab. Following a series of soundings along the Pyramid's four baselines, Petrie 
also established that when newly built the Pyramid's four sides should have formed a theoretical 
square of 755 feet 8¾ inches (230.346 m) and that, in reality, the fine stone of its casing had 
deviated from that absolute perfection by a little less than 2 inches, a calculation that a century on 
(for the Pyramid's old stones still encourage such precision in its surveyors) advances in surveying 
equipment have tightened by less than half an inch and 1.5 seconds of arc. 


This kind of precision shows that the ancient Egyptians were very fully aware that direction was portable. 
Such awareness leads inevitably to the idea that direction has an existence of its own, independent of location. 
Discussing the stellar alignment of the Khufu pyramid, Romer [468], pages 347—348, says the following. 


The remarkably high level of precision held in various lines of Khufu’s Pyramid, however, tokens 
more than the routine checking of its rising stones. This Pyramid, after all, still holds such a 
constancy of build within it that 4½ millennia later the equally fastidious Flinders Petrie was 
able to detect the tiny twist of just 4 inches (average 9.72 cm) between the Pyramid’s core and its 
Tura casing stones; a precision that required the establishment of a single master orientation at the 
Pyramid to that cosmic abstraction that we now call north. 


Thus the “cosmic abstraction” of portable direction was well established at the dawn of history. So it is 
difficult now to see it as an abstraction. We think of directions as having a meaning or existence of their 
own, independent of location. Therefore one may say that the primacy of points is probably the most ancient 
view, and the primacy of directions is the most modern, but “modern” includes all of recorded history. 


In Definition 26.6.6 for an affine space over a module over a ring, the local displacement is defined in terms 
of a universal, portable vector space. That is, the displacement map $\mu_M : X \times M \to X$ is primary, and 
the subtraction map $\delta_X : X \times X \to M$ is secondary, being defined in terms of $\mu_M$. Some texts make the 
opposite choice, making the difference function $\delta_X$ primary. The choices made by some authors are shown 
in Table 26.10.1. 


year | reference | primary operation | vector space | scalar space 
1918 | Weyl [310], p. 14 | difference $\delta : X \times X \to V$ | linear space, fin. dim. | $\mathbb{R}$ 
1957 | Whitney [161], p. 350 | right action $\mu : X \times V \to X$ | linear space, fin. dim. | $\mathbb{R}$ 
1963 | Auslander/MacKenzie [1], p. 3 | action $\sigma_x : V \to X$, $x \in X$ | linear space $\mathbb{R}^n$ | $\mathbb{R}$ 
1965 | MacLane/Birkhoff [110], p. 564 | left action $\mu : V \times X \to X$ | linear space, fin. dim. | field, char $\ne 2$ 
1981 | Greenberg/Harper [86], p. 41 | difference $\delta : X \times X \to V$ | linear space $\mathbb{R}^n$ | $\mathbb{R}$ 
1986 | Crampin/Pirani [7], p. 9 | right action $\mu : X \times V \to X$ | linear space, fin. dim. | $\mathbb{R}$ 
1993 | EDM2 [113], 7.A, p. 23 | right action $\mu : X \times V \to X$ | linear space | field 
2004 | Szekeres [305], p. 230 | right action $\mu : X \times V \to X$ | linear space | field 
     | Kennington | right action $\mu : X \times V \to X$ | group or module | set/group/ring/field 

Table 26.10.1 Survey of affine space definitions 

26.10.2 REMARK: Upgrading an affine space over a module to an affine space over a linear space. 

Affine spaces over linear spaces are derived from affine spaces over modules over rings (Definition 26.6.6) or 
affine spaces over unitary modules over rings (Definition 26.7.2) by requiring the module to be a unitary left 
module over a field, which is the same thing as a linear space. (See Definition 19.3.6 and Remark 19.3.16 for 
unitary left modules over fields.) 


26.10.3 DEFINITION: An affine space over a linear space is an affine space over a unitary module over a 
ring $X \mathrel{-\!\!<} (K, V, X, \sigma_K, \tau_K, \sigma_V, \mu_K, \mu_V, \delta_X)$ such that 


(i) $V \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu_K)$ is a unitary left module over a field $K \mathrel{-\!\!<} (K, \sigma_K, \tau_K)$ (i.e. a linear space). 


26.10.4 REMARK: An affine space over a linear space can be specified in two equivalent ways. 

Theorem 26.10.5 shows that it is possible to define an affine space $X$ over a linear space $V$ in terms of the 
difference function $\delta_X : X \times X \to V$ instead of the affine structure function $\mu_V : X \times V \to X$. The resulting 
algebraic structure is identical. The equivalence of the two specification methods applies also to the affine 


26.10.5 THEOREM: Construction of an affine space from a linear space and a difference function. 
Let $V \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu_K)$ be a linear space over a field $K \mathrel{-\!\!<} (K, \sigma_K, \tau_K)$. Let $X$ be a set. Let 
$\delta_X : X \times X \to V$ satisfy the following conditions. 


(i) For all $q \in X$, the map $p \mapsto \delta_X(p, q)$ is a bijection from $X$ to $V$. 
(ii) $\forall p, q, r \in X,\ \delta_X(p, r) = \sigma_V(\delta_X(q, r), \delta_X(p, q))$. 


Then $X \mathrel{-\!\!<} (K, V, X, \sigma_K, \tau_K, \sigma_V, \mu_K, \mu_V, \delta_X)$ is an affine space over the linear space $V$, where $\mu_V : X \times V \to X$ 
is defined by $\mu_V : (y, v) \mapsto \delta_y^{-1}(v)$, where $\delta_y : X \to V$ is defined by $\delta_y : x \mapsto \delta_X(x, y)$ for $y \in X$. 


PROOF: The inverse function $\delta_y^{-1} : V \to X$ is well defined by condition (i). So the map $\mu_V : X \times V \to X$ 
defined by $\mu_V : (y, v) \mapsto \delta_y^{-1}(v)$ is well defined. Then $\delta_X(\mu_V(y, v), y) = \delta_X(\delta_y^{-1}(v), y) = \delta_y(\delta_y^{-1}(v)) = v$ for 
all $y \in X$ and $v \in V$. So $\mu_V$ satisfies Definition 26.6.6 (iii). 


It must also be shown that $(V, X, \sigma_V, \mu_V)$ is a right transformation group which acts freely and transitively 
on $X$. The pair $(V, \sigma_V)$ is a group because $V$ is a linear space. 

Let $x \in X$ and $v_1, v_2 \in V$. Let $y = \mu_V(x, v_1)$ and $z = \mu_V(y, v_2)$. Then $\delta_X(y, x) = \delta_X(\mu_V(x, v_1), x) = v_1$ and 
$\delta_X(z, y) = \delta_X(\mu_V(y, v_2), y) = v_2$. So by condition (ii), $\delta_X(z, x) = \delta_X(y, x) + \delta_X(z, y) = v_1 + v_2$. Therefore 
$\mu_V(x, v_1 + v_2) = \mu_V(x, \delta_X(z, x)) = \delta_x^{-1}(\delta_x(z)) = z = \mu_V(y, v_2) = \mu_V(\mu_V(x, v_1), v_2)$. Hence $\mu_V$ satisfies the 
associativity requirement for a group action. (See Definition 20.7.2 (i).) 

Let $x \in X$. Then by condition (ii), $\delta_X(y, x) = \delta_X(x, x) + \delta_X(y, x)$ for all $y \in X$. So $v = \delta_X(x, x) + v$ 
for all $v \in V$ because $\forall v \in V,\ \exists y \in X,\ \delta_X(y, x) = v$ by condition (i). Since $(V, \sigma_V)$ is a group, 
each $v$ has an inverse $-v$. So $0_V = v - v = \delta_X(x, x) + v - v = \delta_X(x, x)$. Hence $\forall x \in X,\ \delta_X(x, x) = 0_V$. 
Therefore $\forall x \in X,\ \mu_V(x, 0_V) = \mu_V(x, \delta_X(x, x)) = \delta_x^{-1}(\delta_x(x)) = x$ by the definition of $\mu_V$. This verifies 
Definition 20.7.2 (ii). So $(V, X, \sigma_V, \mu_V)$ is a right transformation group. 


It follows from condition (i) and Theorem 20.7.16 (iii) that $\mu_V$ acts freely and transitively on $X$. Hence 
$X \mathrel{-\!\!<} (K, V, X, \sigma_K, \tau_K, \sigma_V, \mu_K, \mu_V, \delta_X)$ is an affine space over the linear space $V$ by Definitions 26.10.3, 
26.7.2, 26.6.6 and 20.7.2. 
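
The construction of $\mu_V$ from $\delta_X$ can be checked by brute force on a small finite example. The following Python sketch assumes, purely for illustration, that $K = V = \mathbb{Z}/5\mathbb{Z}$ (a one-dimensional linear space over itself) and that $X$ is a set of five labels with an arbitrarily chosen bijection `coord` to $V$. The labels, the bijection and the function names `delta_X` and `mu_V` are all hypothetical.

```python
# Assumptions for illustration only: K = V = Z/5Z and X is a set of five labels.
Q = 5
V = range(Q)
X = ["a", "b", "c", "d", "e"]
coord = {x: (3 * i + 2) % Q for i, x in enumerate(X)}  # an arbitrary bijection X -> V

def delta_X(p, q):
    """Difference function p - q, taking values in V."""
    return (coord[p] - coord[q]) % Q

def mu_V(y, v):
    """The action y + v: the unique x in X with delta_X(x, y) = v."""
    matches = [x for x in X if delta_X(x, y) == v]
    assert len(matches) == 1      # uniqueness is guaranteed by condition (i)
    return matches[0]

# Condition (i): for each q, the map p -> delta_X(p, q) is a bijection X -> V.
assert all(sorted(delta_X(p, q) for p in X) == list(V) for q in X)
# Condition (ii): delta_X(p, r) = delta_X(q, r) + delta_X(p, q).
assert all(delta_X(p, r) == (delta_X(q, r) + delta_X(p, q)) % Q
           for p in X for q in X for r in X)

# Consequences established in the proof.
assert all(delta_X(mu_V(y, v), y) == v for y in X for v in V)      # 26.6.6 (iii)
assert all(mu_V(x, (v + w) % Q) == mu_V(mu_V(x, v), w)
           for x in X for v in V for w in V)                        # associativity
assert all(mu_V(x, 0) == x for x in X)                              # identity
```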


26.10.6 REMARK: The difference function can be easily reconstructed from the module-action map. 

The basic property Theorem 26.10.5 (ii) for the difference function $\delta_X : X \times X \to V$ follows very easily as 
Theorem 26.10.7 (i) from the basic property Definition 26.6.6 (iii) of the affine structure map $\mu_V : X \times V \to X$. 
This seems to indicate that it is preferable for mathematical purposes to define affine spaces in terms of 
transformation groups rather than in terms of difference functions. 


26.10.7 THEOREM: Some basic properties of affine spaces over linear spaces. 
Let $X \mathrel{-\!\!<} (K, V, X, \sigma_K, \tau_K, \sigma_V, \mu_K, \mu_V, \delta_X)$ be an affine space over a linear space $V$. 


(i) $\forall p, q, r \in X,\ r - p = (q - p) + (r - q)$. 
(ii) $\forall p \in X,\ p - p = 0_V$. 
(iii) $\forall p, q \in X,\ p - q = -(q - p)$. 

PROOF: For part (i), let $p, q, r \in X$. Let $v_1 = \delta_X(q, p)$ and $v_2 = \delta_X(r, q)$. Then $\mu_V(p, v_1 + v_2) = 
\mu_V(\mu_V(p, v_1), v_2) = \mu_V(q, v_2) = r$ by Definition 20.7.2 (i). So $\delta_X(r, p) = \delta_X(\mu_V(p, v_1 + v_2), p) = v_1 + v_2 = 
\delta_X(q, p) + \delta_X(r, q)$ by Definition 26.6.6 (iii). 

For part (ii), let $p \in X$. Then $\delta_X(p, p) = \delta_X(\mu_V(p, 0_V), p) = 0_V$ by Definition 26.6.6 (iii). 


Part (iii) follows from part (i) by substituting $p$ for $r$ and then applying part (ii). 


26.11. Euclidean spaces and Cartesian spaces 


26.11.1 REMARK: The many meanings of a “Euclidean space”. 

There is no single standard definition of a Euclidean space. The majority of so-called “Euclidean spaces” are 
more accurately described as “Cartesian spaces”. The following are lists of some kinds of spaces which are 
commonly referred to as “Euclidean space”. The spaces in the first list have a Euclidean metric (which is 
perhaps more accurately known as a Pythagorean metric, although the Euclidean metric and its Pythagoras 
law were known a thousand years earlier in Mesopotamia and Egypt), or some structure such as a norm or 
inner product from which the metric can be constructed. 


(1) Euclidean plane geometry. The geometrical objects are points, straight lines and circles, including 
various kinds of subsets of these objects. The construction methods use the ruler (i.e. “straight-edge” 
with no distance markings) and the compass. There is no coordinate grid. (See Euclid/Heath [213], 
[214], [215].) 

(2) Euclidean solid geometry. The geometrical objects are points, straight lines, circles, planes, spheres, 
cylinders and cones, including various kinds of subsets of these objects. The construction methods use 
the ruler (i.e. “straight-edge”) and the compass. There is no coordinate grid. (See Euclid/Heath [215].) 


(3) Euclidean metric space. The set $\mathbb{R}^n$ with a metric. This is the set of tuples $\mathbb{R}^n$ together with the 
standard point-to-point metric function $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_0^+$, where $d : (x, y) \mapsto (\sum_{i=1}^n (x_i - y_i)^2)^{1/2}$. 

(4) Euclidean metric linear space. The linear space $\mathbb{R}^n$ with a metric. This is the linear space $\mathbb{R}^n$ with 
the standard point-to-point metric function $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_0^+$, where $d : (x, y) \mapsto (\sum_{i=1}^n (x_i - y_i)^2)^{1/2}$. 


(5) Euclidean normed space. The linear space $\mathbb{R}^n$ with the standard norm $x \mapsto |x|_2 = (\sum_{i=1}^n x_i^2)^{1/2}$ 
for $x \in \mathbb{R}^n$. (See Definition 24.7.16.) 


(6) Euclidean inner product space. The real linear space $\mathbb{R}^n$ with the standard inner product $\eta : 
\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by $\eta : (x, y) \mapsto \sum_{i=1}^n x_i y_i$. (See Definition 24.9.7.) This immediately implies the 
corresponding standard norm as in case (5). 

(7) A linear space of n dimensions together with the usual tangent bundle, Riemannian metric and Levi- 
Civita connection. 
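
The way in which the metric in cases (3) and (4) is constructed from the norm in case (5), and the norm from the inner product in case (6), may be sketched as follows. The Python names `inner`, `norm` and `dist` are hypothetical, and tuples of floating-point numbers are assumed as a stand-in for elements of $\mathbb{R}^n$.

```python
import math

def inner(x, y):
    """Standard inner product on n-tuples: sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Standard norm induced by the inner product."""
    return math.sqrt(inner(x, x))

def dist(x, y):
    """Standard metric induced by the norm: the norm of the difference."""
    return norm(tuple(a - b for a, b in zip(x, y)))

x, y = (0.0, 0.0), (3.0, 4.0)
assert norm(y) == 5.0
assert dist(x, y) == 5.0
assert dist(x, y) == dist(y, x)     # symmetry of the metric
```

The construction runs only in one direction here. Recovering an inner product from a metric requires additional structure on the space.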


The spaces in the following list have no Euclidean metric. So they should be referred to as Cartesian. 
(Of course, Descartes did assume that space always had the Euclidean metric. So this is a misnomer. It 
would be more accurate to call spaces with coordinates “Oresmian spaces” because Oresme introduced the 
idea of plotting functions using coordinates in the 14th century. See Boyer/Merzbach [237], pages 239—241; 
Boyer [235], pages 80-85; Boyer [236], pages 46-51, 76.) 


(8) Cartesian tuple space. The set $\mathbb{R}^n$ with index set $\mathbb{N}_n$. (See Definition 16.4.1.) 


(9) Cartesian linear space. The linear space $\mathbb{R}^n$. (See Definition 22.2.19.) This is the set of tuples $\mathbb{R}^n$, 
for some $n \in \mathbb{Z}_0^+$, together with the operations of componentwise addition and scalar multiplication over 
the field $\mathbb{R}$. 


(10) Cartesian topological space. The set $\mathbb{R}^n$ together with the standard topology as in Definition 32.6.3, 
which is induced by the standard metric in case (3). (But the metric is not part of this structure.) Any 
topological space which is homeomorphic to this may also be called a “Cartesian topological space”, as 
in Definition 49.4.3. 


(11) Cartesian topological linear space. The linear space $\mathbb{R}^n$ with the standard topology, which is 
induced by the standard metric in case (4). (But the metric is not part of this structure.) 


(12) Cartesian abstract linear space. An n-dimensional real linear space. This space has implied linear 
charts. For each basis $B = (e_i)_{i=1}^n$ for an n-dimensional linear space $V$, there is a component map 
$\kappa_B : V \to \mathbb{R}^n$ defined by $\forall v \in V,\ v = \sum_{i=1}^n \kappa_B(v)_i e_i$, where $\mathbb{R}^n$ denotes the linear space of real n-tuples. 
(See Definition 22.8.8 for component maps.) Each such component map is a linear space isomorphism. 
The elements of $V$ are considered to be the points of the space while the maps $\kappa_B$ are linear charts 
for $V$. The point space has no fixed axes as are present in case (9). Therefore this abstract space is 
closer to Euclidean plane geometry than case (9), but the origin is still fixed. 


(13) Affine space of n dimensions. This space has a linear space of displacement vectors at each point. 


(14) A linear space of n dimensions together with a tangent bundle. (See Section 26.14.) This has a space 
of velocity vectors at each point, which is often confused with the affine space in case (13), which has 
displacement vectors at each point. 


(15) The set $\mathbb{R}^n$ together with the usual flat affine connection. 
(16) The set $\mathbb{R}^n$ together with its usual differentiable structure. 
(17) The set $\mathbb{R}^n$ together with its usual Lebesgue measure. 


As a result of the wide variety of meanings in the literature, it is difficult to say exactly how much algebraic, 
geometric or analytic structure is referred to when someone says “Euclidean space”. 


26.11.2 REMARK: The disadvantages of pseudo-notation for Euclidean spaces. 

The pseudo-notation “$E^n$” is frequently used for one or more Euclidean space definitions. When the notation 
“$E^n$” is used in elementary texts it is probably intended to remind the reader to think of $\mathbb{R}^n$ with some 
range of usual “Euclidean” structures attached to it. However, such an “$E^n$” notation is not the nth power 
of anything, and certainly not the nth power of $E$, whatever $E$ is. This kind of pseudo-notation should be 
replaced with meaningful notation wherever it is found. (An even more absurd notation is “$M^n$” for an 
n-dimensional manifold. Teachers can do more harm than good by using such meaningless notation because 
it forces students to abandon logical thinking.) 


26.11.3 REMARK: The coordinatisation of Euclidean space. 

The modern mind is so accustomed to the use of Cartesian coordinates that the majority of authors refer to 
Cartesian coordinates as “Euclidean space”. It is true that the construction of a Cartesian coordinate chart 
for a Euclidean space enables all Euclidean definitions and theorems to be expressed correctly in terms of 
n-tuples of real numbers. But such charts are arbitrary and not “intrinsic”. One may counter-argue that the 
lines and circles of Euclidean geometry are equally arbitrary and non-intrinsic, being imposed on physical 
geometry in the same way as Cartesian grid lines. 


For any set of Euclidean points, lines and circles, one may easily construct a Cartesian grid using ruler and 
compass so as to express the locations, dimensions and orientations of points, lines and circles in terms of 
numerical coordinates. For example, given any straight line segment OA, one may construct another line 
segment OB at a right angle to one of its end-points O. (See Figure 26.11.1.) 


Figure 26.11.1 Euclidean construction of Cartesian coordinates 

According to Boyer [236], page 150, it was Nicolas Guisnée in 1705 (see Guisnée [221], page 15) who first 
published an explicit statement that the coordinates of points in Cartesian geometry may be constructed by 
dropping perpendiculars to two orthogonal axes. 

[...] this book seems to be the first one in which both the x and y coordinates in a rectangular 
Cartesian system are interpreted as the segments cut off on the two axes by perpendiculars from a 
given point. 

Boyer [236], pages 156-157, also mentions that dropping perpendiculars to obtain Cartesian coordinates for 
three dimensions was published by Antoine Parent in “Essais et recherches de mathématique et physique” 
in 1705. (The classical method of “dropping perpendiculars” is given by Euclid’s proposition I-12. See 
Euclid/Heath [213], pages 270-275; Euclid [216], pages 10-11.) 

The common point O is the “origin”. Each of these two lines may be extended indefinitely in both directions, 
thereby constructing axes. For any point P in the plane, there is a unique point on each line which is the 
intersection of the perpendicular from the point to the line. (In Euclidean geometry, there is one and only 
one perpendicular from each point to each axis line. This is not totally obvious. For example, why should the 
perpendicular intersect the axis in the same point for any choices of radii for drawing the circles? Numerous 
other aspects of the construction are non-obvious a-priori. But they are part and parcel of the Euclidean 
geometry tradition.) These two points $P_1$ and $P_2$ are the “coordinates”. The only serious issue is whether the
distance from the origin to each coordinate point is commensurable with the length of the initial line segment OA
used in the construction. The distances of the coordinate points from the origin effectively convert points in
the plane into numbers, but only if the lengths of the lines $OP_1$ and $OP_2$ may be expressed as numbers.
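
The arithmetic behind “dropping perpendiculars” may be sketched as follows. This is only an illustration under stated assumptions: the sketch necessarily starts from a numerical model of the plane, so it shows the projection arithmetic rather than the ruler-and-compass construction itself. The function names and the sample points are hypothetical. Exact rational arithmetic is used so that each coordinate is visibly a ratio to the length of the initial segment, which is the commensurable case.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product of two tuples."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def foot_of_perpendicular(o, a, p):
    """Foot of the perpendicular from p to the line through o and a.

    Also returns the ratio s of the signed distance from o to the foot
    to the length of the segment from o to a.
    """
    d = sub(a, o)
    s = Fraction(dot(sub(p, o), d), dot(d, d))
    foot = tuple(oi + s * di for oi, di in zip(o, d))
    return foot, s

def cartesian_coordinates(o, a, b, p):
    """Coordinates of p relative to the axes OA and OB."""
    _, x = foot_of_perpendicular(o, a, p)
    _, y = foot_of_perpendicular(o, b, p)
    return x, y

# OB is at a right angle to OA at the common end-point O.
O, A, B = (1, 1), (3, 1), (1, 3)
P = (4, 2)
print(cartesian_coordinates(O, A, B, P))
```

The two returned ratios play the role of the lengths of $OP_1$ and $OP_2$ measured in units of OA.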


Pairs or triples of numbers are not the same as geometrical points. The numbers which were known to 
classical and Hellenic geometry were only those which could be constructed geometrically. Thus it was 
impossible, for example, to construct lines in the proportion $\sqrt[3]{2} : 1$ or $\pi : 1$. So the use of the Euclidean
definition of numbers would not permit coordinates to be assigned to all of the points of the plane, as we
understand the Euclidean plane in our time. In the time of Descartes, there were fewer “real numbers” than
were available by the end of the 19th century. Thus the Euclidean plane and Euclidean space have been 
stealthily redefined by virtue of our modern understanding of numbers. 


Euclidean space is commutative in the sense that if you move a kilometre forwards and then a kilometre to 
the left, you arrive at the same place as if you move a kilometre left and then a kilometre forward. So the 
order in which one makes translations does not affect the outcome. Therefore the relative location of a place
may be specified arithmetically as the sum of the X and Y translations, where the order does not matter, 
like on graph paper. The geometry of a sphere is not so simple. The graph-paper model for the geometry of 
space is difficult to unlearn. One should be surprised that space has a simple arithmetic character. In the 
time of Kant, philosophers even tried to show that space must a-priori obey Euclid's axioms. 
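
The contrast between the plane and the sphere may be illustrated numerically. The following is a minimal sketch, assuming that motions of the plane are modelled by translations of coordinate pairs and that motions of the sphere are modelled by quarter-turn rotations of the unit sphere about two perpendicular axes. These modelling choices and all names are assumptions made for the illustration.

```python
def translate(point, offset):
    """Translate a point of the plane by an offset."""
    return tuple(x + d for x, d in zip(point, offset))

def rotate(matrix, point):
    """Apply a 3x3 rotation matrix to a point of the unit sphere."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))

# Unit steps in the plane.
FORWARD, LEFT = (0, 1), (-1, 0)
start = (0, 0)
plane_a = translate(translate(start, FORWARD), LEFT)   # forward, then left
plane_b = translate(translate(start, LEFT), FORWARD)   # left, then forward

# Quarter-turn rotations about the x-axis and about the y-axis.
RX = ((1, 0, 0), (0, 0, -1), (0, 1, 0))
RY = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))
pole = (0, 0, 1)
sphere_a = rotate(RY, rotate(RX, pole))   # first RX, then RY
sphere_b = rotate(RX, rotate(RY, pole))   # first RY, then RX

print(plane_a == plane_b)     # the two orders agree in the plane
print(sphere_a == sphere_b)   # the two orders disagree on the sphere
```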


It is best to regard Cartesian coordinates as a numerical model for Euclidean space. It happens that when 
we specify a linear chart for space or time, the model gives practical benefits. But we cannot be sure that 
the model is free of errors — both at the small and large extremes of scale. 


These observations about Cartesian coordinates need to be said because when differentiable manifolds are 
defined in terms of charts, it is generally assumed that the real-number tuple spaces are safe, secure constructs 
which are familiar and can be taken for granted. But if the concept of a tangent vector on a manifold is 
defined in terms of tangent vectors on a so-called Euclidean space, it is important to know that they are well 
defined. In fact, they are not, because there is no such thing as a vector in classical Euclidean space. There 
are only points, lines and circles in two-dimensional space — and various point-sets such as spheres, cones, 
ellipses, etc., in three-dimensional space. (See Sections 40.1 and 53.1 for further comments on this issue.) 
An ontologically unambiguous definition for tangent vectors in Cartesian spaces is the subject of Sections 
26.12, 26.13 and 26.14. 


26.11.4 REMARK:  Cartesian coordinate frames for Euclidean spaces are charts. 

Cartesian coordinates, and the inhomogeneous Euclidean group of rotations and translations, are surely the 
first example in history of mathematical atlases of charts. (One could argue that any collections of two or 
more overlapping geographical charts of the Earth's surface were the first differentiable manifolds defined in 
terms of atlases of charts.) In this sense, it is wrong to think of the modern approach to differential geometry 
as commencing in the 19th century with Gauß and others. But in another sense, this is correct. The Cartesian
atlases for Euclidean geometry have algebraic transition functions, not topologically or differentially defined 
transition functions. The collections of geographical charts from the time of Ptolemy to the 17th century 
were not mathematically described as differentiable atlases until the necessary mathematical concepts had 
been defined. 

The fact that collections of compatible Cartesian coordinate charts provide atlases for physical (or intuitive)
Euclidean space implies that Euclidean geometry is extra-mathematical. At least, this is so if one assumes 
that modern mathematics consists of all that can be constructed and proved within ZF set theory. In 
ancient Greek times, geometry was regarded as more trustworthy than arithmetic because arithmetic could 
not deal with the “incommensurables” which were so immediately evident in their geometry. In relative 
trustworthiness, arithmetic and geometry have swapped places, probably some time in the 19th century. Now 
we import geometry into set theory and arithmetic and perform calculations within analysis and arithmetic, 
and then we export the answers back to geometry. In ancient Greek times, it was the opposite way around. 


Most of the other concepts of differential geometry are also extra-mathematical. For example, differentiability 
and continuity of curves and functions are evident in the intuitive geometry of curved manifolds, and are 
merely modelled within differential geometry. Likewise, straight lines and flat planes are imported into 
Cartesian atlases from the intuitive Euclidean geometry. 


26.11.5 REMARK: The ancient Greeks used Cartesian coordinates. 

One might ask why the ancient Greeks did not discover Cartesian coordinates. In fact, such orthogonal 
coordinate systems were used frequently by Archimedes and Apollonius in the study of conic sections. The 
“value added” by Descartes seems to lie in the systematic conversion of geometric operations into algebraic 
operations. (In fact, it wasn’t really Descartes who introduced Cartesian coordinates. This was done by later 
mathematicians. But Descartes did introduce the arithmetisation of geometry. See Boyer/Merzbach [237], 
pages 319-320; Boyer [236], page 76; Descartes [212].) 


26.11.6 REMARK: Tangent-line bundles on Cartesian spaces use implicit charts. 

The tangent-line bundles in Sections 26.12, 26.13 and 26.14 are not quite correct. For example, a line in
$n$-dimensional Euclidean space at a point $p$ is not a function $L : \mathbb{R} \to \mathbb{R}^n$ which is affine and satisfies $L(0) = p$
as claimed in Definition 26.13.5 and elsewhere. Some information is missing here. The missing information
is the chart $\psi : \mathbb{R}^n \to X$ for some Euclidean space $X$. The coordinatisation $L$ for a geometric line depends
on the choice of chart $\psi$. Likewise, the coordinatisation $p$ for each point is chart-dependent. Therefore
strictly speaking, one should always think of lines and points as equivalence classes of pairs $(\psi, L)$ and $(\psi, p)$
respectively. If this were always done for flat-space geometry, there would be almost no “culture shock” when
making the transition to the atlases of charts in differential geometry. There is in fact less difference than 
there at first seems between Cartesian analytic geometry and differential geometry. The former supports 
any concept which is invariant under the affine group of chart transitions. The latter supports any concept 
which is invariant under some transformation semigroup of chart transition maps. 
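
The chart-dependence of the coordinatisation of a line may be made concrete with a small sketch. It is assumed here that two Cartesian charts for the same plane are related by an affine transition map $x \mapsto Ax + b$. The particular matrix, the translation and the function names are hypothetical choices for the illustration.

```python
def line(p, v):
    """Coordinate line t -> p + t*v in a Cartesian chart."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def apply_linear(A, x):
    """Matrix-vector product A x for a matrix given as a tuple of rows."""
    return tuple(sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A)))

def transition(A, b):
    """Affine transition map x -> A x + b between two Cartesian charts."""
    return lambda x: tuple(y + bi for y, bi in zip(apply_linear(A, x), b))

A = ((0, -1), (1, 0))   # quarter-turn rotation
b = (5, 7)              # translation
tau = transition(A, b)

p1, v1 = (1, 2), (3, 4)                 # coordinatisation via the first chart
p2, v2 = tau(p1), apply_linear(A, v1)   # coordinatisation via the second chart
L1, L2 = line(p1, v1), line(p2, v2)

# The two coordinate lines describe the same geometric line.
same_line = all(tau(L1(t)) == L2(t) for t in range(-3, 4))
print((p1, v1), (p2, v2), same_line)
```

The pairs (chart, coordinate line) differ as tuples of numbers, but they belong to the same equivalence class because the transition map carries one coordinate line to the other.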


If one could follow a Euclidean space or manifold chart $\psi$ from the coordinate space back to the point space,
one could presumably see “real geometry”. However, this real geometry is highly variable. Thousands or 
millions of real-world systems can be modelled, with varying fidelity, by coordinate charts. The principal value 
of abstract Cartesian spaces and differentiable manifold atlases lies in the enormous range of applications. 


It is counter-productive to ask to see the “real geometry”. Fitting coordinate charts to the real world is the
task of the applied mathematician, the scientist, the engineer, the geographer or anyone else. It must never 
be forgotten that mathematics merely provides toolboxes of intellectual tools. The value of the tools arises 
both from the broad generality of applications and from the austere abstraction which isolates mathematics 
from applications, making mathematics application-independent. 

There was a time in history when mathematics and the real world were not kept distinct from each other.
When mathematics became more abstracted from the real world towards the end of the 19th century, the
desire for meaning caused some people to advocate the Platonic ideal-universe ontology for mathematics,
according to which mathematics was an exact model for a metaphysical universe beyond the senses. That
is a path which some follow in their desperate thirst for meaning, but the meaning of mathematics is not
difficult to find. The “true meaning” can be found by following the implied charts from all mathematical
models back to the real world. In each application context, the chart leads to a different reality which is being 
modelled by someone. (That person might not be human. For example, many interplanetary probes use 
stars to guide them, building up an internal model of their location in the Solar System from observations. 
This is mathematical modelling of real-world geometry. Autonomous and semi-autonomous robots create 
models of the world much as humans do.) 


One may likewise ask for the meaning of the integers. As in the case of Cartesian coordinates, the integers 
are merely a model for real-world systems. Following the implicit application charts from the integers back 
to the real world may lead to cows in a field, which are being counted by a farmer. Or they may lead to
bundles of bank-notes, or beta particles and gamma rays being counted in a Geiger counter. Modelling the 
world is just something which humans, other animals and robots do. Mathematics is merely one class of 
human-made models among many. 


If one does not forget that every applicable mathematical structure has an implicit chart which may attach it 
to the “real world”, then it is harmless to model Euclidean space with Cartesian coordinates in $\mathbb{R}^n$, ignoring
the chart connecting coordinates to a “real geometry”. The harm arises when one begins to think that 
mathematical structures have some reality of their own. 


The pertinent difference in the case of manifolds is that manifold specifications require explicit charts to 
connect them to the modelled geometry because more than one chart is typically required. The linkage 
between charts via the extra-mathematical “real manifold” gives meaning to the inter-chart transition maps. 
For other classes of mathematical systems, we tend to focus on the mathematical set-construction as if it were
the “real thing”, not just a model. The pertinent difference in the case of manifolds is that we are forced to
be aware of the linkage to the “real geometry”. 


It may be counter-argued that manifolds could be exempted from the need to link coordinate charts to an 
extra-mathematical geometry in two different ways. The first way is to link manifold charts to, for example, 
subsets of some higher-dimensional flat Cartesian space. In other words, manifolds could be regarded as 
merely models for embedded sets, for example of the form $\{x \in \mathbb{R}^n;\ F(x) = 0\}$ for some suitably regular
function $F : \mathbb{R}^n \to \mathbb{R}$. Then the extra-mathematical linkage would be from the purely mathematical
ambient space to the “real geometry”. The second kind of evasive action would be simply to link the charts of a
manifold to each other. This is the local transformation semigroup approach, where one defines an abstract 
semigroup of local injective maps with some specific property such as continuity or differentiability. With this 
approach, the transition maps between charts lack motivation and meaning. If one believes that mathematics 
can flourish as a purely formal, axiomatic symbolic algebra devoid of extra-mathematical meaning, then this 
kind of intra-mathematical linkage between charts would be satisfactory. For many mathematicians, however, 
intuition is the main spring of ideas and conjectures. 


One may regard it as misfortune or fortune that manifolds are defined in terms of charts linking them to an 
abstract point space. It is certainly true that differential geometry is made confusing by them, and much 
more difficult to learn and teach. But one is led by the multi-chart style of manifold specification to think 
very carefully about geometry without the traditional assumptions about Euclidean flat spaces. Multi-chart 
manifolds are a bit like walking on a very slippery iceberg. You never know quite where you stand, and the
ground can move under you very quickly. One is compelled to recognise that “everything is relative”.


26.12. Cartesian-space tangent-line bundle philosophy 


26.12.1 REMARK: In a flat Cartesian space, a point-pair vector is equivalent to a point/velocity pair. 
There are two kinds of vectors in Cartesian spaces, namely displacement vectors and velocity vectors. These 
two classes of vectors are often identified with each other, but they have significant differences which become 
evident in non-Euclidean geometries. A displacement vector is specified by its start-point and end-point. A 
velocity vector is a parameter of lines and curves. A velocity vector has a specific start-point, but there is 
no end-point. 


Tangent-line bundles are a flexible replacement for the flat Euclidean point-displacement affine spaces which 
are described in Section 26.10. Tangent-line bundles are described in Section 26.5. Tangent-line bundles 
replace point-to-point displacements with parametrised lines with a specified start-point and velocity. 


26.12.2 REMARK: Velocity is a tangent-line parameter. 

Affine spaces explicitly identify point-pairs with displacement vectors. In a flat affine space, constant-velocity 
lines have a natural one-to-one relation with displacement vectors. One may extract the start-point and the 
unit-time displacement from a constant-velocity line to construct a pair of points, the start-point and
end-point. Thus if $L : K \to X$ is a constant-velocity line in an affine point space $X$ over a linear space $V$ with
field $K$, one may associate each such line $L$ with the point-pair $(p, p + v) = (L(0_K), L(1_K)) = (p_0, p_1)$, where
$p = p_0$ is the start-point and $v$ is the velocity of the line. One may then recover the line from these two
points by defining $L : K \to X$ by $L : t \mapsto p_0 + t(p_1 - p_0)$.

However, the existence of a one-to-one association between two concepts does not mean that they are
identical. In this case, both the set-construction and the underlying ontology are different. A “displacement
vector” means either a start-point/end-point pair or an element of the vector space of a flat affine space.
A “velocity” means a particular parameter of lines and more general curves. The displacement from $L(0_K)$
to $L(1_K)$ is a fairly arbitrary parameter for specifying constant-velocity lines. There are other ways to
parametrise lines. If the curve is not of constant velocity, then the displacement from $L(0_K)$ to $L(1_K)$ is
clearly not generally in a one-to-one association with the velocity. Conversely, the point $L(0_K) + v$ is not
generally equal to $L(1_K)$.
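
Both halves of this observation may be checked with a short sketch. It is assumed here that the field is modelled by exact rational numbers and that points are tuples. The parabola used for the non-constant-velocity case, its hand-written derivative and all function names are hypothetical choices for the illustration.

```python
from fractions import Fraction

def line_from_point_pair(p0, p1):
    """Constant-velocity line L : t -> p0 + t*(p1 - p0)."""
    return lambda t: tuple(a + t * (b - a) for a, b in zip(p0, p1))

def point_pair_from_line(L):
    """Point-pair (L(0), L(1)) associated with a parametrised curve L."""
    return L(0), L(1)

# Constant-velocity case: the line is recovered from its point-pair.
p, v = (1, 2), (3, -1)
L = line_from_point_pair(p, tuple(a + b for a, b in zip(p, v)))
p0, p1 = point_pair_from_line(L)
recovered = line_from_point_pair(p0, p1)
half = Fraction(1, 2)
print(p0, p1, L(half) == recovered(half))

# Non-constant velocity: the parabola t -> (t, t^2) has velocity (1, 2t).
def curve(t):
    return (t, t * t)

def curve_velocity(t):
    return (1, 2 * t)

start, v0 = curve(0), curve_velocity(0)
start_plus_velocity = tuple(a + b for a, b in zip(start, v0))
print(start_plus_velocity, curve(1))   # these two points differ
```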

The “underlying ontology” of mathematical concepts is not vapid philosophy. One of the principal thrusts of
mathematics is generalisation. Illumination through generalisation has been a major thrust for the last two
centuries. Two concepts which may be in a close one-to-one association in a specific context will require quite
different treatment in a generalised context if they are ontologically different. One of the incidental benefits 
of the generalisation from Euclidean geometry to differential geometry is that it forces mathematicians 
to clarify the ontology of traditional geometric concepts so that they may be unambiguously defined and 
represented. Concepts which were formerly regarded as interchangeable often have only a tenuous association 
in the differential geometry context. It should not be forgotten also that there are (at least) three distinct 
differential geometry contexts, namely differentiable manifolds, affinely connected manifolds, and Riemannian 
manifolds. 


26.12.3 REMARK: Avoidance of “coordinates” when specifying affine parametrisation of lines. 

A natural and convenient representation of velocity vectors is by affine-parametrised lines. Thus an affine
function $L : \mathbb{R} \to \mathbb{R}^n$ with $L(0) = p$ may be used as a representation of a velocity vector at $p \in \mathbb{R}^n$. An
advantage of specifying only that these lines be affine rather than of a specific form $t \mapsto p + tv$ is that there
are no parameters which can be thought of as “coordinates”. (Some people don't like coordinates!)


A map $\phi : V \to W$ for general linear spaces $V$ and $W$ over a field $K$ is said to be “affine” if it satisfies

$$
\forall v, v_0, v_1 \in V,\ \forall \lambda \in K,\quad
v - v_0 = \lambda(v_1 - v_0) \;\Rightarrow\; \phi(v) - \phi(v_0) = \lambda(\phi(v_1) - \phi(v_0)).
$$


It follows that an affine map $L : \mathbb{R} \to \mathbb{R}^n$ has the form $L : t \mapsto p + tv$ for some $p, v \in \mathbb{R}^n$. Hence the
pair $(p, v)$ is sufficient to identify an individual velocity vector. Such a coordinates-pair is convenient for
calculations, but is “ontologically wrong”. The affine-parametrised line (i.e. affine map) representation $L$ is
a more “ontologically correct” representation of a velocity vector.
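
The step from the affine condition to the explicit form of the map may be spelled out as follows. This is a sketch which takes both the linear space $V$ and the field $K$ to be $\mathbb{R}$ and the target space $W$ to be $\mathbb{R}^n$. The particular substitution is a choice made here for the illustration.

```latex
% Substitute v_0 = 0, v_1 = 1, v = t and \lambda = t in the affine condition.
\begin{align*}
t - 0 = t\,(1 - 0)
  &\;\Rightarrow\; L(t) - L(0) = t\,\bigl(L(1) - L(0)\bigr) \\
  &\;\Rightarrow\; L(t) = p + t v
  \quad\text{with } p = L(0),\ v = L(1) - L(0).
\end{align*}
```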


26.12.4 REMARK: Support for the concept of vectors as parametrised lines. 
The representation of vectors as parametrised lines instead of point pairs is mentioned by Misner/Thorne/ 
Wheeler [292], page 49. 


26.12.5 REMARK:  Affine spaces are ontologically more correct, but linear spaces are more convenient. 
The following definitions of ontologically correct tangent-line vectors are closely related to the tangent-line 
spaces in Section 26.5 for general affine spaces over modules. The following definitions are, however, for 
Cartesian spaces of n-tuples, not for affine spaces over a linear space. These definitions are applicable also 
to general linear spaces, and to general modules without any affine space structure. 


The purpose of affine structure is to remove the special role of the origin and axes of a coordinate system.
In fact, Definitions 26.13.1, 26.13.2, 26.13.3 and 26.13.5 have a fundamental ontological error because they 
use Cartesian n-tuple spaces as a convenient substitute for the corresponding affine space structures. In 
the intended applications, these definitions will be typically used as charts, for which the origin and axes 
have no special value (because differentiable manifold charts have the same meaning no matter how they are 
translated or rotated). 


Therefore although Cartesian n-tuple spaces are used as charts throughout this book (and in most other 
differential geometry texts), the ontologically more correct chart spaces are affine spaces. The convenience 
of Cartesian spaces, and the weight of tradition, cannot be ignored in this case. However, the representation 
of tangent vectors on linear spaces such as $\mathbb{R}^n$ is as convenient as the velocity-components representation
or any other representation. Therefore the weight of tradition can be ignored when defining ontologically 
correct tangent vectors on the (slightly ontologically incorrect) Cartesian n-tuple spaces. 


26.13.1 DEFINITION: A tangent (line) vector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is an affine map from $\mathbb{R}$
to $\mathbb{R}^n$. In other words, it is a map $L : \mathbb{R} \to \mathbb{R}^n$ with $L : t \mapsto p + tv$ for some $p, v \in \mathbb{R}^n$.


26.13.2 DEFINITION: A tangent (line) vector at a point $p$ in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is a
tangent-line vector $L$ in $\mathbb{R}^n$ such that $L(0) = p$.


26.13.3 DEFINITION: The tangent (line) vector at a point $p$ with velocity $v$ in a Cartesian space $\mathbb{R}^n$ for
$n \in \mathbb{Z}_0^+$ is the tangent-line vector $L$ in $\mathbb{R}^n$ such that $L(0) = p$ and $L(1) - L(0) = v$.


26.13.4 NOTATION: $L_{p,v}$ denotes the tangent-line vector at a point $p$ with velocity $v$ in a Cartesian space
$\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$. Thus

$$
\forall p, v \in \mathbb{R}^n,\ \forall t \in \mathbb{R},\quad L_{p,v}(t) = p + tv.
$$


26.13.5 DEFINITION: The tangent (line) vector set at a point $p$ in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the
set $\{L : \mathbb{R} \to \mathbb{R}^n;\ L(0) = p \text{ and } L \text{ is affine}\}$.


26.13.6 NOTATION: $T_p(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$, denotes the tangent-line vector set at $p$ in $\mathbb{R}^n$.
In other words,

$$
\begin{aligned}
\forall p \in \mathbb{R}^n,\quad T_p(\mathbb{R}^n) &= \{L_{p,v};\ v \in \mathbb{R}^n\} \\
&= \{L : \mathbb{R} \to \mathbb{R}^n;\ \exists v \in \mathbb{R}^n,\ \forall t \in \mathbb{R},\ L(t) = p + tv\}.
\end{aligned}
$$


26.13.7 THEOREM: Bijection between tangent lines and the velocity parameter.
Let $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$. Define $\phi_p : \mathbb{R}^n \to T_p(\mathbb{R}^n)$ by $\phi_p(v) = L_{p,v}$ for all $v \in \mathbb{R}^n$.


(i) $\forall L \in T_p(\mathbb{R}^n),\ \exists v \in \mathbb{R}^n,\ \phi_p(v) = L$.
(ii) $\forall v_1, v_2 \in \mathbb{R}^n,\ (\phi_p(v_1) = \phi_p(v_2) \Rightarrow v_1 = v_2)$.
(iii) $\phi_p : \mathbb{R}^n \to T_p(\mathbb{R}^n)$ is a bijection.


PROOF: Part (i) follows directly from Notation 26.13.6.


For part (ii), let $v_1, v_2 \in \mathbb{R}^n$ satisfy $\phi_p(v_1) = \phi_p(v_2)$. Then $L_{p,v_1} = L_{p,v_2}$.
So $p + v_1 = L_{p,v_1}(1) = L_{p,v_2}(1) = p + v_2$. Therefore $v_1 = v_2$ by Definition 22.2.19 and Theorem 15.7.13 (iii).


Part (iii) follows from parts (i) and (ii) and Definition 10.5.2.
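
The bijection between velocities and tangent lines at a fixed point may be illustrated by a round trip. This is a minimal sketch, assuming that a tangent-line vector is modelled by a Python closure and that tuples of integers stand in for elements of the Cartesian space. Since closures cannot be compared for equality as maps, injectivity is shown indirectly by recovering the velocity as the difference of the values at 1 and at 0. The function names are hypothetical.

```python
def tangent_line(p, v):
    """The tangent-line vector at p with velocity v, as the map t -> p + t*v."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def velocity(L):
    """Recover the velocity parameter of an affine line as L(1) - L(0)."""
    return tuple(b - a for a, b in zip(L(0), L(1)))

p = (2, -1, 5)
velocities = [(0, 0, 0), (1, 0, 0), (1, 2, 3), (-4, 7, 1)]
lines = [tangent_line(p, v) for v in velocities]

round_trip = [velocity(L) for L in lines]   # velocity recovered from each line
base_points = [L(0) for L in lines]         # every line starts at p
print(round_trip == velocities, all(q == p for q in base_points))
```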


26.13.8 REMARK: Extension of affine space tangent-line vector notation to Cartesian spaces. 
Notation 26.4.6 for lines in affine spaces over modules is extended to Cartesian spaces in Notation 26.13.4. 


26.13.9 REMARK: Illustration of tangent-line vectors on a Cartesian space. 

Figure 26.13.1 illustrates two tangent-line vectors in $T_p(\mathbb{R}^n)$. Lines $L_1, L_2 \in T_p(\mathbb{R}^n)$ are functions with
domain $\mathbb{R}$ and range $\mathbb{R}^n$, but for simplicity, the domain values $t = \ldots, -2, -1, 0, 1, 2, 3, \ldots$ are indicated as
tags on the points in the range set.


Figure 26.13.1 Tangent-line vectors in $T_p(\mathbb{R}^n)$


Figure 26.13.2 Tangent-line vectors in $T_p(\mathbb{R}^n)$, space-time view


26.13.10 REMARK:  Space-time and time-space illustration of tangent-line vectors on a Cartesian space. 
Figure 26.13.2 illustrates two tangent-line vectors in $T_p(\mathbb{R}^n)$ which are similar to the vectors in Figure 26.13.1,
but the time parameter is shown explicitly for the lines $L_1, L_2 \in T_p(\mathbb{R}^n)$ in this space-time diagram. The
space-like lines are obtained by projecting $L_1$ and $L_2$ into the space plane $\mathbb{R}^n$. In the special case of the zero
tangent vector $L : t \mapsto p$, the line $L$ would be vertical above and below the point $p$. The projection would
then have range equal to the single point $p \in \mathbb{R}^n$.


Tangent-line vectors may also be illustrated as in Figure 26.13.3, which shows space vectors as a function of
the time parameter. In this case, the zero tangent vector at $p$ is shown as a vertical line $L_0$.


Figure 26.13.3 Tangent-line vectors in $T_p(\mathbb{R}^n)$, time-space view


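26.13.11 DEFINITION: The tangent (line) space at a point $p$ in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the
tangent-line vector set $T_p(\mathbb{R}^n) = \{L : \mathbb{R} \to \mathbb{R}^n;\ L(0) = p \text{ and } L \text{ is affine}\}$ together with


(i) the vector addition operation $\sigma : T_p(\mathbb{R}^n) \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\sigma(L_1, L_2) : t \mapsto L_1(t) + L_2(t) - p$ for
all $L_1, L_2 \in T_p(\mathbb{R}^n)$, and

(ii) the scalar multiplication operation $\mu : \mathbb{R} \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\mu(\lambda, L) : t \mapsto L(\lambda t)$ for all $\lambda \in \mathbb{R}$
and $L \in T_p(\mathbb{R}^n)$.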
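
The two operations in this definition may be tried out numerically. The following is a minimal sketch, assuming that tangent-line vectors are modelled by Python closures on integer tuples. The function names are hypothetical. The last lines show that the tangent-line sum differs from the pointwise sum of coordinate tuples, because the base point must be subtracted once.

```python
def tangent_line(p, v):
    """The tangent-line vector at p with velocity v, as the map t -> p + t*v."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def add_lines(L1, L2):
    """Vector addition: t -> L1(t) + L2(t) - p, where p = L1(0) = L2(0)."""
    p = L1(0)
    assert L2(0) == p, "both lines must have the same base point"
    return lambda t: tuple(a + b - c for a, b, c in zip(L1(t), L2(t), p))

def scale_line(lam, L):
    """Scalar multiplication: t -> L(lam * t)."""
    return lambda t: L(lam * t)

def velocity(L):
    """Velocity of an affine line as L(1) - L(0)."""
    return tuple(b - a for a, b in zip(L(0), L(1)))

p = (1, 1)
L1, L2 = tangent_line(p, (2, 0)), tangent_line(p, (0, 3))
S = add_lines(L1, L2)
M = scale_line(5, L1)

print(S(0), velocity(S))   # base point and velocity of the sum
print(M(0), velocity(M))   # base point and velocity of the scalar multiple

# The pointwise sum of coordinate tuples is not the tangent-line sum.
pointwise = tuple(a + b for a, b in zip(L1(1), L2(1)))
print(pointwise, S(1))
```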


26.13.12 REMARK: Illustration of vector addition for tangent lines. 
Figure 26.13.4 illustrates vector addition for a tangent-line space $T_p(\mathbb{R}^n)$.


26.13.13 THEOREM: Tangent-line spaces on a Cartesian space are real linear spaces.
The tangent-line space in Definition 26.13.11 is a linear space over the field $\mathbb{R}$.


Figure 26.13.4 Vector addition for tangent-line vectors in $T_p(\mathbb{R}^n)$


PROOF: Let $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$. Then clearly $L_1 + L_2$ and $\lambda L$ are affine maps from $\mathbb{R}$ to $\mathbb{R}^n$ for any $\lambda \in \mathbb{R}$
and $L, L_1, L_2 \in T_p(\mathbb{R}^n)$. Since $(L_1 + L_2)(0) = L_1(0) + L_2(0) - p = p$ and $(\lambda L)(0) = L(\lambda \cdot 0) = L(0) = p$, it follows that
$T_p(\mathbb{R}^n)$ is closed under vector addition and scalar multiplication. The linear space axioms are inherited from $\mathbb{R}^n$
because the bijection $v \mapsto L_{p,v}$ in Theorem 26.13.7 maps $v_1 + v_2$ to $L_{p,v_1} + L_{p,v_2}$ and $\lambda v$ to $\lambda L_{p,v}$.
Hence $T_p(\mathbb{R}^n)$ (with these operations) is a linear space.


26.13.14 DEFINITION: The velocity of a tangent-line vector $L \in T_p(\mathbb{R}^n)$, at a point $p$ in a Cartesian space
$\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, is the $n$-tuple $v \in \mathbb{R}^n$ such that $L(t) = p + tv$ for all $t \in \mathbb{R}$.


26.13.15 REMARK: The velocity parameter of a tangent vector is not an element of the point space.

The velocity of a tangent vector in Definition 26.13.14 is well defined because the $n$-tuple $v$ exists and is
unique for any tangent vector $L$. However, the velocity vector is not, strictly speaking, an element of the
point space $\mathbb{R}^n$. The velocity of a vector is merely a parameter for the function $L$ which just happens to be
an element of the set $\mathbb{R}^n$. The affine function $L$ should be thought of as a curve which is parametrised by
an element $v$ of an abstract space of curve-family parameters.


If the parameter space R is transformed in some way, for example by converting the time from seconds to 
minutes, then the velocity parameter will change to a different n-tuple. Most texts show tangent vectors as 
arrows in the point space. This leads to endless confusion, even in flat Cartesian spaces. But in the context of 
differentiable manifolds, the mental image of tangent vectors as small arrows leads to absurdities. The large 
number of different representations for tangent vectors on manifolds may be traced and attributed to the 
false representation of vectors as occupying the point space in the flat Cartesian context. (See Remark 53.3.2 
for a survey of definitions of tangent bundles on differentiable manifolds.) 
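
The effect of a change of time unit on the velocity parameter may be seen in a short sketch. It is assumed here that the original parameter is measured in seconds and the new one in minutes, so that the line is re-parametrised by a factor of 60. The function names and sample values are hypothetical.

```python
def tangent_line(p, v):
    """The tangent-line vector at p with velocity v, as the map t -> p + t*v."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def velocity(L):
    """Velocity of an affine line as L(1) - L(0)."""
    return tuple(b - a for a, b in zip(L(0), L(1)))

def reparametrise(L, factor):
    """New line s -> L(factor * s), e.g. factor = 60 for minutes instead of seconds."""
    return lambda s: L(factor * s)

p, v_per_second = (10, 20), (1, -2)
L_seconds = tangent_line(p, v_per_second)
L_minutes = reparametrise(L_seconds, 60)

# The same points are traversed, but the velocity tuple changes.
print(velocity(L_seconds), velocity(L_minutes))
print(L_seconds(120) == L_minutes(2))   # same point, two minutes after the start
```

The base point is unchanged by the re-parametrisation, whereas the velocity tuple is multiplied by 60. This is one sense in which the velocity is a parameter of the map rather than a point of the space.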


The introduction of Definition 26.13.11 for tangent vectors would be justified if it only removed this confusion 
of points with vectors. But it also has the substantial advantage that it permits a very much generalised 
definition of tangent figures (to figures which are not straight lines) and tangency criteria (to criteria which 
are not simple first-order convergence to affine functions). 


26.13.16 REMARK:  Decomposition of tangent-line vectors into components. 

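In general, $L_1(t) + L_2(t)$ is not the same as $(L_1 + L_2)(t)$, and $(\lambda L)(t)$ is not the same as $\lambda L(t)$, where $\lambda \in \mathbb{R}$
and $L, L_1, L_2 \in T_p(\mathbb{R}^n)$, with $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$. The expression $\sum_{i=1}^n v_i L_{p,e_i}$ in Theorem 26.13.17 uses
the sum and product operations $\sigma$ and $\mu$ in Definition 26.13.11.


26.13.17 THEOREM: Expression for tangent-line vectors as linear combinations of unit tangent-line vectors.
Let $p$ be a point in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$. Let $v = (v_i)_{i=1}^n \in \mathbb{R}^n$. Then $L_{p,v} = \sum_{i=1}^n v_i L_{p,e_i}$,
where $(e_i)_j = \delta_{ij}$ for all $i, j \in \mathbb{N}_n$.


PROOF: Let $p$ be a point in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$. Let $v = (v_i)_{i=1}^n \in \mathbb{R}^n$. Let $i \in \mathbb{N}_n$. Then
$\forall t \in \mathbb{R}$, $(v_i L_{p,e_i})(t) = L_{p,e_i}(v_i t) = p + v_i t e_i = L_{p,v_i e_i}(t)$ by Definition 26.13.11 (ii).
Therefore $v_i L_{p,e_i} = L_{p,v_i e_i}$. If $n = 2$, then

$$
\begin{aligned}
\forall t \in \mathbb{R},\quad (L_{p,v_1 e_1} + L_{p,v_2 e_2})(t)
&= L_{p,v_1 e_1}(t) + L_{p,v_2 e_2}(t) - p \\
&= p + v_1 e_1 t + p + v_2 e_2 t - p \\
&= p + v_1 e_1 t + v_2 e_2 t \\
&= p + (v_1 e_1 + v_2 e_2) t \\
&= L_{p,v_1 e_1 + v_2 e_2}(t)
\end{aligned}
$$

by Definition 26.13.11 (i). Therefore $L_{p,v_1 e_1} + L_{p,v_2 e_2} = L_{p,v_1 e_1 + v_2 e_2}$. From the usual inductive argument
on $n \in \mathbb{Z}_0^+$, it follows that $L_{p,v} = \sum_{i=1}^n v_i L_{p,e_i}$ for general $n \in \mathbb{Z}_0^+$ since $v = \sum_{i=1}^n v_i e_i$.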
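
The decomposition into unit tangent-line vectors may be checked numerically for a sample point and velocity. This is a sketch under stated assumptions: tangent-line vectors are modelled by Python closures, indices run from 0 rather than 1, and the dimension is at least 1 so that the sum is non-empty. All function names are hypothetical. Two affine maps which agree at two parameter values are equal, so agreement on a small range of integers is sufficient here.

```python
from functools import reduce

def tangent_line(p, v):
    """The tangent-line vector at p with velocity v, as the map t -> p + t*v."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def add_lines(L1, L2):
    """Vector addition: t -> L1(t) + L2(t) - p, where p = L1(0) = L2(0)."""
    p = L1(0)
    assert L2(0) == p, "both lines must have the same base point"
    return lambda t: tuple(a + b - c for a, b, c in zip(L1(t), L2(t), p))

def scale_line(lam, L):
    """Scalar multiplication: t -> L(lam * t)."""
    return lambda t: L(lam * t)

def unit_vector(i, n):
    """The i-th unit tuple of length n (zero-based index)."""
    return tuple(1 if j == i else 0 for j in range(n))

def rebuild_from_components(p, v):
    """Sum over i of v[i] times the unit tangent-line vector in direction i."""
    n = len(p)
    terms = [scale_line(v[i], tangent_line(p, unit_vector(i, n))) for i in range(n)]
    return reduce(add_lines, terms)

p, v = (1, 2, 3), (4, -5, 6)
rebuilt = rebuild_from_components(p, v)
direct = tangent_line(p, v)
agree = all(rebuilt(t) == direct(t) for t in range(-3, 4))
print(agree)
```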


26.14. Tangent-line bundles on Cartesian spaces


26.14.1 REMARK: Tangent-line vectors in tangent-line bundles have unambiguous base points. 

In contrast to the situation with numerical tangent vectors $v \in \mathbb{R}^n$ at a point $p \in \mathbb{R}^n$, it is not necessary
to combine tangent-line vectors with “point-tags” to indicate which point each vector belongs to. For
any $L \in T_p(\mathbb{R}^n)$, the point $p$ may be obtained from $L$ as $L(0)$.

A numerical vector $v$ must be tagged as, for example, the ordered pair $(p, v)$ to indicate the base point of
the vector. So one typically uses the untagged form $v$ when only one base point is being discussed, and
the tagged form $(p, v)$ when the whole tangent bundle on $\mathbb{R}^n$ is discussed. This can be confusing. The
tangent-line bundle representation does not suffer from this kind of confusion.


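26.14.2 DEFINITION: The tangent (line) bundle total space on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the
set $\bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$.


26.14.3 NOTATION: $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$, denotes the tangent-line bundle total space on $\mathbb{R}^n$. That is,

$$
\begin{aligned}
T(\mathbb{R}^n) &= \bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n) \\
&= \{L_{p,v};\ p, v \in \mathbb{R}^n\} \\
&= \{L : \mathbb{R} \to \mathbb{R}^n;\ \exists v \in \mathbb{R}^n,\ \forall t \in \mathbb{R},\ L(t) = L(0) + tv\} \\
&= \{L : \mathbb{R} \to \mathbb{R}^n;\ \exists p, v \in \mathbb{R}^n,\ \forall t \in \mathbb{R},\ L(t) = p + tv\}.
\end{aligned}
$$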


26.14.4 REMARK: Parametrisation of Cartesian-space tangent-line bundle total space. 

Like Theorem 26.13.7 for pointwise tangent-line tangent spaces, Theorem 26.14.5 is completely obvious. 
However, totally obvious assertions are sometimes totally wrong. So it’s a good idea to check. This is good 
practice for generalisations to differentiable manifolds and higher-order tensors. 


The most important reason for presenting so many trivial low-level theorems in this book is that they can be 
referred to in proofs of difficult higher-level theorems. This has the benefit of reducing the burden of proving 
many easy low-level steps for a higher-level theorem. So one is then free to concentrate on the difficult steps 
while not being concerned that there could be a hidden “bug” in the low-level steps. 


26.14.5 THEOREM: Bijection between tangent lines and their parameter pairs. 

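Let $n \in \mathbb{Z}_0^+$. Define $\phi : \mathbb{R}^n \times \mathbb{R}^n \to T(\mathbb{R}^n)$ by $\phi(p, v) = L_{p,v}$ for all $p, v \in \mathbb{R}^n$.

(i) $\forall L \in T(\mathbb{R}^n),\ \exists p \in \mathbb{R}^n,\ \exists v \in \mathbb{R}^n,\ \phi(p, v) = L$.

(ii) $\forall p_1, p_2 \in \mathbb{R}^n,\ \forall v_1, v_2 \in \mathbb{R}^n,\ (\phi(p_1, v_1) = \phi(p_2, v_2) \Rightarrow (p_1 = p_2 \text{ and } v_1 = v_2))$.

(iii) $\phi : \mathbb{R}^n \times \mathbb{R}^n \to T(\mathbb{R}^n)$ is a bijection.


PROOF: Part (i) follows directly from Notation 26.14.3.


For part (ii), let $p_1, p_2 \in \mathbb{R}^n$ and $v_1, v_2 \in \mathbb{R}^n$ satisfy $\phi(p_1, v_1) = \phi(p_2, v_2)$. Then $L_{p_1,v_1} = L_{p_2,v_2}$. Therefore
$p_1 = L_{p_1,v_1}(0) = L_{p_2,v_2}(0) = p_2$. So $p_1 = p_2$. Similarly $p_1 + v_1 = L_{p_1,v_1}(1) = L_{p_2,v_2}(1) = p_2 + v_2 = p_1 + v_2$.
So $v_1 = v_2$.


Part (iii) follows from parts (i) and (ii) and Definition 10.5.2.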


26.14.6 REMARK: Projection maps and velocity charts for tangent-line bundle line spaces. 

According to Definition 21.2.10, $T(\mathbb{R}^n)$ is a non-topological fibration with fibre space $\mathbb{R}^n$, and the sets
$T_p(\mathbb{R}^n)$ for $p \in \mathbb{R}^n$ are its fibre sets. Definition 26.14.7 defines the obvious projection map for this fibration,
and also the obvious global chart which associates the velocity component tuples in Definition 26.13.14 with 
tangent vectors. 


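26.14.7 DEFINITION: The projection map for the tangent-line bundle total space $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$, is the
map $\pi : T(\mathbb{R}^n) \to \mathbb{R}^n$ defined by $\forall p \in \mathbb{R}^n,\ \forall L \in T_p(\mathbb{R}^n),\ \pi(L) = p$. That is, $\forall L \in T(\mathbb{R}^n),\ \pi(L) = L(0)$.
The velocity chart for the tangent-line bundle total space $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$, is the map $\theta : T(\mathbb{R}^n) \to \mathbb{R}^n$
defined by $\forall L \in T(\mathbb{R}^n),\ \theta(L) = L(1) - L(0)$. In other words, $\theta(L)$ is the velocity of $L \in T(\mathbb{R}^n)$.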
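
The projection map and the velocity chart together recover the parameter pair of any tangent-line vector, which is the content of the bijection between tangent lines and parameter pairs. The following is a minimal sketch, assuming that tangent-line vectors are modelled by Python closures on integer tuples. The function names are hypothetical.

```python
def tangent_line(p, v):
    """The tangent-line vector at p with velocity v, as the map t -> p + t*v."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def projection(L):
    """Base point of a tangent-line vector: L -> L(0)."""
    return L(0)

def velocity_chart(L):
    """Velocity of a tangent-line vector: L -> L(1) - L(0)."""
    return tuple(b - a for a, b in zip(L(0), L(1)))

def parameter_pair(L):
    """The pair (base point, velocity) which determines L."""
    return projection(L), velocity_chart(L)

# Same velocity at different base points, and different velocities at the same base point.
pairs = [((0, 0), (1, 0)), ((0, 0), (0, 0)), ((3, 4), (1, 0)), ((3, 4), (-2, 5))]
recovered = [parameter_pair(tangent_line(p, v)) for p, v in pairs]
print(recovered == pairs)
```

No separate point-tag is needed in this representation because the base point is read off from the line itself.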


26.14.8 REMARK: Tangent-line bundles satisfy the requirements for a tangent bundle.

Since the sets $T_p(\mathbb{R}^n)$ are pairwise disjoint for points $p \in \mathbb{R}^n$, the tangent bundle set $T(\mathbb{R}^n) = \bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$
is a disjoint union of the tangent sets at the points $p$ of the base space $\mathbb{R}^n$. By analogy with the fibre bundles
defined in Chapter 21, the name “tangent bundle” may be given to the set $T(\mathbb{R}^n)$ if it is given some suitable
additional structures. 


26.14.9 REMARK: Tangent-line bundles are not related to the standard term “line bundle”. 

The term “tangent-line bundle” in this book is different to the term “line bundle” in the fibre bundle 
literature. In this book, a line bundle is a bundle of lines at each point of a base set. The tangent objects 
are one-dimensional in character, but the linear space of such lines at each point in the Cartesian space is 
n-dimensional. In the fibre bundle literature, a “line bundle” is a vector bundle where the linear space of 
objects at each point is one-dimensional. Line bundles, in the sense of 1-dimensional vector bundles, are 
defined by EDM2 [113], 147.F, page 570, for example. (See also Remark 65.2.12.) 


26.15. Direct products of tangent bundles of Cartesian spaces 


26.15.1 REMARK: Direct products of tangent spaces of Cartesian spaces. 

The tangent space $T_p(\mathbb{R}^n)$ for a Cartesian space $\mathbb{R}^n$, with $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$ as in Definition 26.13.11, has
a clear relation to the tangent spaces $T_{p_1}(\mathbb{R}^{n_1})$ and $T_{p_2}(\mathbb{R}^{n_2})$ if $n = n_1 + n_2$ and $p = (p_1, p_2)$. The apparent
ordered pair $(p_1, p_2)$ here signifies the result of concatenating the tuples $p_1 \in \mathbb{R}^{n_1}$ and $p_2 \in \mathbb{R}^{n_2}$ as described
in Definition 14.12.6 (iii). So strictly speaking, one should write $p = \mathrm{concat}(p_1, p_2)$. However, it is customary
to identify $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ with $\mathbb{R}^{n_1+n_2}$ by identifying $(p_1, p_2)$ with $\mathrm{concat}(p_1, p_2)$ for all $(p_1, p_2) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$.
Most of the time, this identification is beneficial and harmless, but when defining some kinds of structures,
the strict differences may be significant. Here it is convenient to use the ordered pair notation “$(p_1, p_2)$” as
a shorthand for $\mathrm{concat}(p_1, p_2)$, but without forgetting what it really means.
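

The difference between an ordered pair of tuples and their concatenation is easy to see in a programming
language where the two are distinct objects. The following Python sketch is only an illustration. The
function name concat mirrors the text, and the sample points are arbitrary.

```python
from typing import Tuple

def concat(p1: Tuple[float, ...], p2: Tuple[float, ...]) -> Tuple[float, ...]:
    """Concatenate two tuples into a single tuple of length n1 + n2."""
    return p1 + p2

p1 = (1.0, 2.0)          # a point in R^2
p2 = (3.0, 4.0, 5.0)     # a point in R^3

pair = (p1, p2)          # ordered pair: an element of R^2 x R^3
flat = concat(p1, p2)    # concatenation: an element of R^5
```

Here pair has two entries, each of which is a tuple, whereas flat has five real entries. The customary
identification treats these two objects as the same point.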


An element of $T(\mathbb{R}^n)$ has the form $L_{p,v} : \mathbb{R} \to \mathbb{R}^n$ where $L_{p,v} : t \mapsto p + tv$ for some velocity tuple $v \in \mathbb{R}^n$.
(See Definitions 26.13.3 and 26.13.14.) Then any tangent vectors $L_{p_k,v_k} \in T_{p_k}(\mathbb{R}^{n_k})$ for $k = 1,2$, may be
combined to produce $L_{p,v} \in T(\mathbb{R}^n)$ with $p = (p_1, p_2)$ and $v = (v_1, v_2)$. This procedure yields an identification
map which is a bijection from $T_{p_1}(\mathbb{R}^{n_1}) \times T_{p_2}(\mathbb{R}^{n_2})$ to $T_p(\mathbb{R}^n)$ as in Definition 26.15.2. This effectively defines
the direct product of the two tangent spaces.


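26.15.2 DEFINITION: The direct product identification map for Cartesian space tangent spaces $T_{p_1}(\mathbb{R}^{n_1})$
and $T_{p_2}(\mathbb{R}^{n_2})$, for $n_1, n_2 \in \mathbb{Z}_0^+$ and $p_k \in \mathbb{R}^{n_k}$ for $k = 1,2$, is the map $i : T_{p_1}(\mathbb{R}^{n_1}) \times T_{p_2}(\mathbb{R}^{n_2}) \to T_p(\mathbb{R}^n)$,
where $n = n_1 + n_2$ and $p = \mathrm{concat}(p_1, p_2)$, which is defined by

$$\forall v_1 \in \mathbb{R}^{n_1},\ \forall v_2 \in \mathbb{R}^{n_2}, \quad i(L_{p_1,v_1}, L_{p_2,v_2}) = L_{p,\mathrm{concat}(v_1,v_2)}.$$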


26.15.3 REMARK: The direct product of two tangent bundles of Cartesian spaces.

Definition 26.15.2 is easily extended to tangent bundles. The concatenation operation is applied in the same 
way to the component point-pairs and velocity-pairs. Although this is admittedly almost too simple to 
present formally, these definitions are applied to direct products of differentiable manifolds. It is best to first 
deal with the low-level formalities here before proceeding to higher-level applications. 

The values of $p_1$ and $v_1$ in Definition 26.15.4 are easily obtained from $L_1$ as $p_1 = L_1(0)$ and $v_1 = L_1(1) - L_1(0)$.
(See for example Definition 26.13.3.) Similarly, $p_2 = L_2(0)$ and $v_2 = L_2(1) - L_2(0)$. But it would be perhaps
too pedantic to write the equation in Definition 26.15.4 more formally as

$$\forall L_1 \in T(\mathbb{R}^{n_1}),\ \forall L_2 \in T(\mathbb{R}^{n_2}), \quad i(L_1, L_2) = L_{\mathrm{concat}(L_1(0),L_2(0)),\,\mathrm{concat}(L_1(1)-L_1(0),\,L_2(1)-L_2(0))}.$$


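26.15.4 DEFINITION: The direct product identification map for Cartesian space tangent bundles $T(\mathbb{R}^{n_1})$
and $T(\mathbb{R}^{n_2})$, for $n_1, n_2 \in \mathbb{Z}_0^+$, is the map $i : T(\mathbb{R}^{n_1}) \times T(\mathbb{R}^{n_2}) \to T(\mathbb{R}^{n_1+n_2})$ which is defined by

$$\forall p_1, v_1 \in \mathbb{R}^{n_1},\ \forall p_2, v_2 \in \mathbb{R}^{n_2}, \quad i(L_{p_1,v_1}, L_{p_2,v_2}) = L_{\mathrm{concat}(p_1,p_2),\,\mathrm{concat}(v_1,v_2)}.$$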
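

The identification map for tangent bundles can be sketched in Python by recovering each base point as
L(0) and each velocity as L(1) - L(0), then concatenating. This is an illustrative sketch, not a definitive
implementation. The names make_line and identify are hypothetical.

```python
from typing import Callable, Tuple

Point = Tuple[float, ...]
Line = Callable[[float], Point]

def make_line(p: Point, v: Point) -> Line:
    """Return the parametrised line t -> p + t*v (hypothetical helper)."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def identify(L1: Line, L2: Line) -> Line:
    """Concatenate the base points and velocities recovered from L1 and L2."""
    p = L1(0.0) + L2(0.0)                                 # concatenated base point
    v1 = tuple(a - b for a, b in zip(L1(1.0), L1(0.0)))   # velocity of L1
    v2 = tuple(a - b for a, b in zip(L2(1.0), L2(0.0)))   # velocity of L2
    return make_line(p, v1 + v2)

L1 = make_line((1.0,), (2.0,))               # a line in R^1
L2 = make_line((0.0, 5.0), (1.0, -1.0))      # a line in R^2
L = identify(L1, L2)                         # a line in R^3
```

For affinely parametrised lines, the result agrees with the pointwise concatenation of L1(t) and L2(t)
for every parameter value t.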


26.15.5 REMARK: Concatenation of tangent vectors in Cartesian space direct products. 
Since so much concatenation is involved in the construction of the vector $i(L_{p_1,v_1}, L_{p_2,v_2})$ in Definitions 26.15.2
and 26.15.4, it is given the name “concatenation of Cartesian space tangent vectors” in Definition 26.15.6.


26.15.6 DEFINITION: The concatenation of (Cartesian space) tangent vectors $L_{p_k,v_k} \in T(\mathbb{R}^{n_k})$ for $k = 1,2$
is the tangent vector $i(L_{p_1,v_1}, L_{p_2,v_2}) \in T(\mathbb{R}^{n_1+n_2})$, where $i$ is the identification map in Definition 26.15.4.


Alternative name: (Cartesian space) tangent vector concatenation. 


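26.15.7 NOTATION: $(L_1, L_2)$, for tangent vectors $L_1 \in T(\mathbb{R}^{n_1})$ and $L_2 \in T(\mathbb{R}^{n_2})$, denotes the tangent
vector concatenation $i(L_1, L_2) \in T(\mathbb{R}^{n_1+n_2})$, where $i$ is the direct product identification map for $T(\mathbb{R}^{n_1})$
and $T(\mathbb{R}^{n_2})$ as in Definition 26.15.4. In other words,

$$\forall p_1, v_1 \in \mathbb{R}^{n_1},\ \forall p_2, v_2 \in \mathbb{R}^{n_2}, \quad (L_{p_1,v_1}, L_{p_2,v_2}) = L_{\mathrm{concat}(p_1,p_2),\,\mathrm{concat}(v_1,v_2)}.$$

Equivalently, in the ordered pair shorthand, $\forall p_1, v_1 \in \mathbb{R}^{n_1}$, $\forall p_2, v_2 \in \mathbb{R}^{n_2}$, $(L_{p_1,v_1}, L_{p_2,v_2}) = L_{(p_1,p_2),(v_1,v_2)}$.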


26.16. Tangent velocity bundles on Cartesian spaces 


26.16.1 REMARK: Definitions and notations for tangent velocity vectors. 
Section 26.16 presents the very obvious definitions and notations for tangent velocity vectors corresponding 
to the tangent-line vectors in Sections 26.13 and 26.14. 


26.16.2 DEFINITION: A tangent velocity vector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is a pair $(p,v) \in \mathbb{R}^n \times \mathbb{R}^n$.


26.16.3 DEFINITION: A tangent velocity vector at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is a tangent
velocity vector $(p, v) \in \mathbb{R}^n \times \mathbb{R}^n$.


26.16.4 DEFINITION: The tangent velocity vector at a point p with velocity v in a Cartesian space $\mathbb{R}^n$ for
$n \in \mathbb{Z}_0^+$ is the tangent velocity vector $(p, v) \in \mathbb{R}^n \times \mathbb{R}^n$.


26.16.5 DEFINITION: The tangent velocity set at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the set
$\{p\} \times \mathbb{R}^n = \{(p,v);\ v \in \mathbb{R}^n\}$.


26.16.6 NOTATION: $T_p(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$, denotes the tangent velocity set at p in $\mathbb{R}^n$.


26.16.7 DEFINITION: The tangent velocity space at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the
tangent velocity set $T_p(\mathbb{R}^n) = \{p\} \times \mathbb{R}^n$ together with


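(1) the vector addition operation $\sigma : T_p(\mathbb{R}^n) \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\sigma : ((p, v_1), (p, v_2)) \mapsto (p, v_1 + v_2)$, and

(2) the scalar multiplication operation $\tau : \mathbb{R} \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\tau : (\lambda, (p,v)) \mapsto (p, \lambda v)$.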
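

The two operations in Definition 26.16.7 act only on the velocity component and leave the base point
unchanged. The following Python sketch illustrates this. The function names add and scale are
hypothetical, and the base-point check is an assumption about how one might enforce the domain of the
addition operation.

```python
from typing import Tuple

Point = Tuple[float, ...]
TangentVelocity = Tuple[Point, Point]   # the pair (p, v)

def add(a: TangentVelocity, b: TangentVelocity) -> TangentVelocity:
    """Vector addition, defined only for pairs with the same base point."""
    (p, v1), (q, v2) = a, b
    if p != q:
        raise ValueError("addition requires a common base point")
    return (p, tuple(x + y for x, y in zip(v1, v2)))

def scale(lam: float, a: TangentVelocity) -> TangentVelocity:
    """Scalar multiplication: scale the velocity, keep the base point."""
    p, v = a
    return (p, tuple(lam * x for x in v))

p = (1.0, 1.0)
s = add((p, (2.0, 0.0)), (p, (0.0, 3.0)))
m = scale(2.0, (p, (2.0, 3.0)))
```

Adding vectors with different base points raises an error, which reflects the fact that each tangent
velocity space is a separate linear space.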


26.16.8 DEFINITION: The velocity of a tangent vector $(p,v) \in T_p(\mathbb{R}^n)$, at a point p in a Cartesian space
$\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, is the n-tuple $v \in \mathbb{R}^n$.


26.16.9 DEFINITION: The tangent velocity bundle (set) on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the
set $\bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$.


26.16.10 NOTATION: $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$, denotes the tangent velocity bundle set on $\mathbb{R}^n$.


26.17. Tangent covector bundles on Cartesian spaces 


26.17.1 REMARK: Mathematical representation of dual vectors for tangent-line vectors. 

There are two obvious ways to define duals for the tangent-line bundles in Section 26.14. The most obvious
way is to use the algebraic dual of the linear space structure $T_p(\mathbb{R}^n)$ in Definition 26.13.11. A slightly less
obvious way is to use an affine function from $\mathbb{R}^n$ to $\mathbb{R}$, which has the superficial look-and-feel of a dual of a
line $L : t \mapsto p + tv$ from $\mathbb{R}$ to $\mathbb{R}^n$. This second option is described in Section 26.18. Section 26.17 presents
the first option.


To decide how best to implement tangent covectors, it is helpful to “stress-test” candidate implementations
in at least two ways. First, the implementation should survive when $C^1$ differentiability is replaced by weaker
regularity classes, particularly on the boundaries of regions or manifolds. Second, the implementation should
be suitable for applications to physics (and other hard sciences).


For non-$C^1$ manifolds, tangent lines and half-lines can be perfectly meaningful. But they no longer have a
simple relation to affine functions which look like $x \mapsto \sum_{i=1}^n w_i (x_i - p_i)$. Considering physics applications, one
may regard a parametrised line $L : t \mapsto p + tv$ as a “test velocity”, which is related to the well-known “test
particle” concept. One may move a particle at various velocities v in a small neighbourhood of a point p,
and one may observe the work required to move the particle.


As discussed in Remark 28.5.8, a useful inspiration for covectors is the (old-fashioned) work formula
$dW = X\,dx + Y\,dy + Z\,dz$, where one tests a force field by moving a point by infinitesimal displacements $dx$, $dy$
and $dz$, and one observes infinitesimal work $dW$, and the corresponding ratios are denoted $X$, $Y$ and $Z$. If
the work rate is given by the linear combination rule for infinitesimal displacements in all directions, then
the work rate is an element of the algebraic dual of the tangent space at the point. However, forces are not
always consistent with such a linear force field rule. Even if the force is linear, it may not be integrable as
a conservative force field. Therefore an affine function is not always a suitable model for the force field. In
general, one may expect some rule for the force as a function of displacement, but it may not be linear.


One must also consider that the v in tangent-line vectors represents velocity, not displacement. In physics,
forces are quite often velocity-dependent. Thus the simple rule of the style “$dW = X\,dx + Y\,dy + Z\,dz$” does
not reflect real physics.
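

The difference between a linear work rule and a velocity-dependent force can be checked numerically. The
following Python sketch compares a linear rule with a hypothetical quadratic-drag rule by testing
additivity on two test velocities. Both functions, the drag model and all coefficients are illustrative
assumptions which do not come from the text.

```python
import math
from typing import Tuple

Vec = Tuple[float, ...]

def linear_work_rate(w: Vec, v: Vec) -> float:
    """Work rate under a linear rule: the sum of w_i * v_i."""
    return sum(wi * vi for wi, vi in zip(w, v))

def drag_work_rate(c: float, v: Vec) -> float:
    """Work rate against a hypothetical drag force of magnitude c * |v|^2."""
    speed = math.sqrt(sum(vi * vi for vi in v))
    return c * speed ** 3

w = (1.0, 2.0, 3.0)
u, v = (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)
uv = tuple(a + b for a, b in zip(u, v))

# Additivity holds for the linear rule but fails for the drag rule.
linear_additive = math.isclose(
    linear_work_rate(w, uv), linear_work_rate(w, u) + linear_work_rate(w, v))
drag_additive = math.isclose(
    drag_work_rate(0.5, uv), drag_work_rate(0.5, u) + drag_work_rate(0.5, v))
```

Only the linear rule can be represented by an element of the algebraic dual of the tangent space. The
drag rule is a function of the test velocity which is not a covector.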


It may be concluded that the affine field notion in Section 26.18 has limited applicability. It is better to adopt
the algebraic linear dual as the definition of tangent covectors, while observing that this is not necessarily
universally applicable. The linear dual may be replaced, if necessary, by something else.


26.17.2 REMARK: Definition of tangent covector spaces. 
The tangent-line space $T_p(\mathbb{R}^n)$ in Definition 26.17.3 is defined in Definition 26.13.11.


26.17.3 DEFINITION: The tangent covector space at a point p in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ is the
algebraic dual $T_p(\mathbb{R}^n)^*$ of the tangent-line space $T_p(\mathbb{R}^n)$.


26.17.4 NOTATION: $T_p^*(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$, denotes the tangent covector space at p.


26.17.5 DEFINITION: A tangent covector at a point p in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ is an element
of the tangent covector space at p.


26.17.6 DEFINITION: A tangent covector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is an element of the tangent
covector space at some point in $\mathbb{R}^n$.


26.17.7 DEFINITION: The tangent covector at a point p with co-velocity w, for a point p in a Cartesian
space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, and $w \in \mathbb{R}^n$, is the tangent covector at p which maps $L$ to $\sum_{i=1}^n w_i (L(1) - L(0))_i \in \mathbb{R}$.


26.17.8 NOTATION: $L^*_{p,w}$, for $p, w \in \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, denotes the tangent covector at p with co-velocity w.


26.17.9 REMARK: The action of tangent covectors on tangent-line vectors.
Combining Notations 26.13.4 and 26.17.8 yields $L^*_{p,w}(L_{p,v}) = \sum_{i=1}^n w_i v_i$ for all $p, v, w \in \mathbb{R}^n$, for $n \in \mathbb{Z}_0^+$.
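

The action of a covector with co-velocity w on a tangent line can be computed from the two values L(0)
and L(1) alone. The following Python sketch illustrates this. The helper names make_line and
make_covector are hypothetical, and the sample tuples are arbitrary.

```python
from typing import Callable, Tuple

Point = Tuple[float, ...]
Line = Callable[[float], Point]

def make_line(p: Point, v: Point) -> Line:
    """Return the parametrised line t -> p + t*v (hypothetical helper)."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def make_covector(w: Point) -> Callable[[Line], float]:
    """Covector with co-velocity w: L -> sum_i w_i * (L(1) - L(0))_i."""
    def action(L: Line) -> float:
        return sum(wi * (a - b) for wi, a, b in zip(w, L(1.0), L(0.0)))
    return action

p, v, w = (1.0, 1.0), (2.0, 3.0), (4.0, -1.0)
value = make_covector(w)(make_line(p, v))   # 4*2 + (-1)*3
```

The result depends only on w and v, which agrees with the formula in Remark 26.17.9.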


26.18. Tangent field covector bundles on Cartesian spaces 


26.18.1 REMARK: An apparent natural dual for affinely parametrised lines. 

An appealing idea for a natural dual of the set of affinely parametrised maps from $\mathbb{R}$ to $\mathbb{R}^n$ is the set of
affine maps from $\mathbb{R}^n$ to $\mathbb{R}$. An affine map $L^* : \mathbb{R}^n \to \mathbb{R}$ satisfies $L^*(v_2) - L^*(v_1) = L^*(v_4) - L^*(v_3)$ for all
$v_1, v_2, v_3, v_4 \in \mathbb{R}^n$ such that $v_2 - v_1 = v_4 - v_3$. In other words, $L^*(p_1 + v) - L^*(p_1) = L^*(p_2 + v) - L^*(p_2)$
for all $p_1, p_2, v \in \mathbb{R}^n$. It is not too difficult to show that all such affine maps have the form
$L^* : x \mapsto \sum_{i=1}^n k_i (x_i - p_i)$ for some $p, k \in \mathbb{R}^n$.


26.18.2 REMARK: Confusion between algebraic fields and fields of vectors. 
As mentioned in Remark 18.7.6, it is a great misfortune that the word “field” is used in algebra to mean a
commutative division ring. The word “field” in Section 26.18 is supposed to suggest a physical field.


26.18.3 DEFINITION: A tangent field covector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is an affine map from
$\mathbb{R}^n$ to $\mathbb{R}$.


26.18.4 DEFINITION: A tangent field covector at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is a tangent
field covector $L^*$ in $\mathbb{R}^n$ such that $L^*(p) = 0$.


26.18.5 DEFINITION: The tangent field covector at a point p with co-velocity k in a Cartesian space $\mathbb{R}^n$
for $n \in \mathbb{Z}_0^+$ is the tangent field covector $L^*$ in $\mathbb{R}^n$ such that $\forall v \in \mathbb{R}^n$, $L^*(p + v) = \sum_{i=1}^n k_i v_i$.


26.18.6 DEFINITION: The tangent field covector set at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the
set $\{L^* : \mathbb{R}^n \to \mathbb{R};\ L^*(p) = 0 \text{ and } L^* \text{ is affine}\}$.


26.18.7 NOTATION: $T_p^*(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$, denotes the tangent field covector set at p in $\mathbb{R}^n$.


26.18.8 DEFINITION: The tangent field covector space at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is
the tangent field covector set $T_p^*(\mathbb{R}^n) = \{L^* : \mathbb{R}^n \to \mathbb{R};\ L^*(p) = 0 \text{ and } L^* \text{ is affine}\}$ together with


(1) the vector addition operation $\sigma^* : T_p^*(\mathbb{R}^n) \times T_p^*(\mathbb{R}^n) \to T_p^*(\mathbb{R}^n)$ with $\sigma^*(L_1^*, L_2^*) : x \mapsto L_1^*(x) + L_2^*(x)$
for all $L_1^*, L_2^* \in T_p^*(\mathbb{R}^n)$, and

(2) the scalar multiplication operation $\tau^* : \mathbb{R} \times T_p^*(\mathbb{R}^n) \to T_p^*(\mathbb{R}^n)$ with $\tau^*(\lambda, L^*) : x \mapsto \lambda L^*(x)$ for all
$\lambda \in \mathbb{R}$ and $L^* \in T_p^*(\mathbb{R}^n)$.


26.18.9 DEFINITION: The co-velocity of a tangent covector $L^* \in T_p^*(\mathbb{R}^n)$, at a point p in a Cartesian space
$\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, is the n-tuple $k \in \mathbb{R}^n$ such that $L^*(x) = \sum_{i=1}^n k_i (x_i - p_i)$ for all $x \in \mathbb{R}^n$.


26.18.10 REMARK:  Affine functionals on Cartesian space lack a well-defined start-point. 

Although the set of linear functionals on a finite-dimensional linear space is in a dual relation to the linear
space, this convenient duality is not available between affine lines and affine functionals. The problem is
that a single affine function $L^* : \mathbb{R}^n \to \mathbb{R}$ is zero for more than one point $p \in \mathbb{R}^n$ when $n \ge 2$. So one
cannot define the tangent field covector bundle total space as the set $\bigcup_{p \in \mathbb{R}^n} T_p^*(\mathbb{R}^n)$ as in the case of the
tangent-line bundle total space $\bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$ in Definition 26.14.2. Given a line $L : \mathbb{R} \to \mathbb{R}^n$, the base point
p is unambiguously $p = L(0)$. There is no such formula for a unique base point for affine maps $L^* : \mathbb{R}^n \to \mathbb{R}$
when $n \ge 2$. Consequently one must add “tags” to the affine maps. Definitions 26.18.11 and 26.18.13, and
Notations 26.18.12 and 26.18.14, are too clumsy to use in practice.
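

The ambiguity of the base point can be seen in a small numerical example. In the following Python sketch,
two affine maps on the plane are built from the same co-velocity at two different points, and they turn
out to be the same function. The helper name affine_at is hypothetical, and the tagged pairs are
represented here by (point, co-velocity) tuples purely for the purpose of comparison.

```python
from typing import Callable, Tuple

Point = Tuple[float, ...]

def affine_at(p: Point, k: Point) -> Callable[[Point], float]:
    """Affine map x -> sum_i k_i * (x_i - p_i), which vanishes at p."""
    return lambda x: sum(ki * (xi - pi) for ki, xi, pi in zip(k, x, p))

k = (1.0, -1.0)
f = affine_at((0.0, 0.0), k)
g = affine_at((2.0, 2.0), k)   # different base point, same co-velocity

samples = [(0.0, 0.0), (2.0, 2.0), (1.0, 5.0), (-3.0, 0.5)]
same_function = all(f(x) == g(x) for x in samples)

# Tagging keeps the two covectors distinct even though the maps agree.
tagged_f = ((0.0, 0.0), k)
tagged_g = ((2.0, 2.0), k)
```

Since f vanishes at both (0, 0) and (2, 2), the map alone does not determine which point it is attached
to. A parametrised line does not have this problem because its base point is L(0).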


26.18.11 DEFINITION: The tagged tangent field covector set at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$
is the set $\{(p, L^*);\ L^* : \mathbb{R}^n \to \mathbb{R},\ L^*(p) = 0 \text{ and } L^* \text{ is affine}\}$.


26.18.12 NOTATION: $T_p^*(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$ and $p \in \mathbb{R}^n$, denotes the tagged tangent field covector set at p
in $\mathbb{R}^n$.


26.18.13 DEFINITION: The tagged tangent field covector bundle (set) on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$
is the set $\bigcup_{p \in \mathbb{R}^n} T_p^*(\mathbb{R}^n)$.


26.18.14 NOTATION: $T^*(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$, denotes the tangent field covector bundle set on $\mathbb{R}^n$.


26.19. Parametrised lines in ancient Greek mathematics 


26.19.1 REMARK: Euclid’s geometry was static. Archimedes’ geometry was dynamic. 

It is perhaps noteworthy that none of the definitions of lines or any other geometrical figures in Euclid's
"Elements" have a time parameter or any other parameter. Euclid's geometry is static. (See Euclid/Heath
[213], pages 153, 370; [214], pages 1, 78, 113, 188, 277; [215], pages 10, 101, 177, 260.) By contrast, Archimedes
freely uses a time parameter in the definition of a spiral for example. (See Archimedes/Heath [200], page 154.)


If a straight line of which one extremity remains fixed be made to revolve at a uniform rate in a plane 
until it returns to the position from which it started, and if, at the same time as the straight line 
revolves, a point move at a uniform rate along the straight line, starting from the fixed extremity, 
the point will describe a spiral in the plane. 


Archimedes also freely uses time parametrisation for straight lines. Following are the first two theorems and
their proofs in the Archimedes treatise “On Spirals”. (See Archimedes/Heath [200], page 155.)


Proposition 1. 


If a point move at a uniform rate along any line, and two lengths be taken on it, they will be 
proportional to the times of describing them. 


Two unequal lengths are taken on a straight line, and two lengths on another straight line repre- 
senting the times; and they are proved to be proportional by taking equimultiples of each length 
and the corresponding time after the manner of Eucl. V. Def. 5. 


Proposition 2. 


If each of two points on different lines respectively move along them each at a uniform rate, and if 
lengths be taken, one on each line, forming pairs, such that each pair are described in equal times, 
the lengths will be proportionals. 


This is proved at once by equating the ratio of the lengths taken on one line to that of the times of 
description, which must also be equal to the ratio of the lengths taken on the other line. 


It seems that the modern definition of a tangent vector in differential geometry is more closely related to 
Euclid’s static definition than to Archimedes’ dynamic definition. Since differential geometry is based entirely 
on tangent vectors as the most fundamental concept upon which all other concepts are built, it seems desirable 
to have a satisfying definition rather than a discordant set of definitions, neither of which is satisfying. (See 
Remark 53.3.2 for a survey of definitions of tangent vectors on differentiable manifolds.) Definition 26.13.11 
is in effect an upgrade of tangent vector definitions from Euclid (about 300BC) to Archimedes (about 250BC). 


Here are the first three formal definitions from “On Spirals” by Archimedes. (See Archimedes/Heath [200], 
pages 165-166.) 


DEFINITIONS. 


1. If a straight line drawn in a plane revolve at a uniform rate about one extremity which remains 
fixed and return to the position from which it started, and if, at the same time as the line revolves, 
a point move at a uniform rate along the straight line beginning from the extremity which remains 
fixed, the point will describe a spiral in the plane. 


2. Let the extremity of the straight line which remains fixed while the straight line revolves be 
called the origin of the spiral. 


3. And let the position of the line from which the straight line began to revolve be called the initial 
line in the revolution. 


The time parameter is clearly in the foreground of the thinking in Archimedes’ definitions of both straight 
lines and curves. This shows clearly the difference between Euclid and Archimedes. Euclid was more of a 
pure geometer while Archimedes was more of a mathematical physicist. The former was more concerned 
with static displacements and angles while the latter was more concerned with velocities. Many difficulties 
in differential geometry disappear when tangent vectors are viewed as velocities rather than displacements. 
Thus Definition 26.13.11 could be thought of as the “Archimedean tangent space” whereas the velocity in 
Definition 26.13.14 could be referred to as the corresponding “Cartesian tangent vector”. 


26.19.2 REMARK: Archimedes’ dynamic definition of a spiral. 
Figure 26.19.1 illustrates Proposition 20 of the Archimedes treatise “On Spirals”. (See Archimedes/Heath [200],
pages 174-176.) 


Proposition 20 in Archimedes’ “On Spirals” shows that the length of the line OT in the diagram is equal to 
the length of the arc KRP. This effectively converts the geometric construction for the tangent line TP to 
the spiral curve OP at P into a number, namely the ratio of the length of OT to the length of the radius OP. 
This ratio is the cotangent of the angle $\alpha$. Thus the tangent line TP may be constructed as the unique line
through P which cuts the curve only once or by measuring the specified distance along the line OT at right 
angles to OP. This is an example of how even in about 250BC, tangents to curves were being thought of in 
the two alternative ways: the geometrical and the numerical. 

Figure 26.19.1 Tangent line for Archimedes spiral
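

The equality of the length OT and the arc length can be checked numerically. The following Python sketch
assumes the modern polar equation r = a*theta for the spiral, which is not stated in the text. It also
assumes that T is the point where the tangent line at P meets the line through O at right angles to OP,
and that the arc is the arc of the circle of radius OP which subtends the angle theta turned from the
initial line. These readings of the figure are assumptions.

```python
import math

def subtangent_length(a: float, theta: float) -> float:
    """Length OT for the spiral r = a*theta at the point P with angle theta."""
    r = a * theta
    px, py = r * math.cos(theta), r * math.sin(theta)       # the point P
    tx = a * (math.cos(theta) - theta * math.sin(theta))    # tangent direction at P
    ty = a * (math.sin(theta) + theta * math.cos(theta))
    nx, ny = -math.sin(theta), math.cos(theta)              # direction of the line OT
    # Solve P + s*T = u*N for (s, u) by Cramer's rule.
    det = tx * (-ny) - (-nx) * ty
    u = (tx * (-py) - ty * (-px)) / det
    return abs(u)

a, theta = 2.0, 1.2
ot = subtangent_length(a, theta)
arc = (a * theta) * theta        # circular arc of radius OP subtending theta
ratio = ot / (a * theta)         # length of OT divided by length of OP
```

Under these assumptions the ratio of OT to OP equals theta itself, so the direction of the tangent line
is determined by a single number.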


26.19.3 REMARK: The dynamic quadratrix curve of Hippias of Elis.

The quadratrix is a “kinematic” or dynamically defined curve which is apparently more ancient than the 
spiral of Archimedes. It is credited to Hippias of Elis in the late 5th century BC. (See for example Heath [244], 
pages 182, 225-230; Boyer/Merzbach [237], pages 62-63, 87-88; Ball [232], pages 34-35; Cajori [241], pages 21, 
49-50; Sarton [470], pages 281-282; Boyer [236], pages 10-12; Bell [234], page 80.) 

The quadratrix used a horizontal line descending at a constant rate, while simultaneously a radial line was
rotating at a constant rate. The intersection of these defined the quadratrix. Obviously this could be used
to trisect angles. But that’s a bit like saying that $\sqrt{2}$ satisfies the equation $x^2 = 2$. The definition of the
Archimedes spiral has strong similarities to the definition of the quadratrix.
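

The trisection procedure can be sketched in Python. The parametrisation below is an assumption. The
height of the horizontal line and the angle of the radial line are both taken to be proportional to a
parameter t, with height t and angle t*pi/2, so that both reach zero together. The function names are
hypothetical.

```python
import math
from typing import Tuple

def quadratrix_point(t: float) -> Tuple[float, float]:
    """Point of the curve at parameter t in (0, 1]: height t, radial angle t*pi/2."""
    y = t
    angle = t * math.pi / 2
    return (y / math.tan(angle), y)   # intersection of the line and the ray

def trisect(angle: float) -> float:
    """Trisect an angle in (0, pi/2] by dividing the corresponding height by three."""
    t0 = angle / (math.pi / 2)        # height at which the ray meets the curve
    x, y = quadratrix_point(t0 / 3)   # curve point at one third of that height
    return math.atan2(y, x)           # angle of the ray through that point

third = trisect(math.pi / 3)
```

The sketch computes the curve with the tangent function, so it needs the same information as it
produces. This mirrors the circularity discussed in this remark.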


Although the kinetic or dynamic curves did not really solve ancient problems because they essentially required 
as "input" the same geometric information as they supposedly produced as “output”, the same circularity 
really applies also to lines and circles. Nevertheless, such curves were mostly rejected by the ancient Greeks 
while lines and circles were regarded as somehow self-existent. One could almost say that the dynamic 
quality of curves such as the spiral and the quadratrix was a very early form of differential geometry since 
the curves were defined by velocities, and velocity is a differential concept. 


The true importance of kinematic curves lies in the very rejection itself. It was because of the rejection of 
kinematic curves that axiomatic Greek geometry was unable to progress to real-world problems for some 
1800 years. Dynamic curves were not unknown to them, but they were not regarded as worthy of serious 
study. Perhaps in accordance with this rejection, definitions of tangent vectors in 21st century differential 
geometry still mostly lack a time parameter. 


26.19.4 REMARK: Generalisations of ancient Greek definitions of tangents. 

The word “tangent” does mean “touching”, but Euclid’s definition of a tangent line to a curve is that it 
intersects the curve only once (even if extended infinitely in both directions). This definition corresponds to 
the modern tangent concept in the case of the boundary of a bounded convex open set. 


The simple ideas of “touching” and “cutting only once” yield some interesting generalisations. For example, 
a triangle may have an infinite number of “tangents” at one of its vertices. Any infinite line which intersects 
a vertex but does not intersect any other point of the triangle may be considered to be a tangent because it 
does indeed touch the triangle. This kind of tangent line is closely related to the tangent cone concept. 


The vertex of a triangle does not seem to have a genuine tangent line because we generally think of tangents 
as being unique, being aligned with the unique infinitesimal direction of the curve. It could be argued, 
however, that a vertex does pass through an infinite number of directions within a zero distance, as if it were 
an arc of a circle with zero radius. 


The idea of circles being tangent to each other is defined by Euclid. (See Remark 53.3.12.) This suggests 
a generalisation from first-order to second-order tangency. That is, whereas a tangent line merely has the 
same direction as the curve where it touches, a tangent circle would have both the same direction and radius 
of curvature where it touches. Of course, this implies that the curve must have a well-defined direction and 
curvature at the tangent point. Second-order tangency may be generalised to higher-dimensional surfaces, 
using ellipsoids as tangential objects. This kind of tangency is well known in the study of the curvature of 
embedded manifolds. 


Another way to generalise tangents is to replace tangent lines with other kinds of geometrical objects.
For example, the infinite line may be replaced by a semi-infinite line. Generally a closed convex region
will have an infinite number of such tangent semi-infinite lines, and the union of these lines will form a
cone. This suggests the definition of exterior tangent cones. These are used in the analysis of boundary
value problems for partial differential equations, where an exterior (or interior) cone condition may help to
guarantee existence of solutions to some classes of problems. Similarly, exterior (or interior) sphere conditions
can be useful.


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]


26.20. Line-to-line transformation groups 


26.20.1 REMARK:  Line-to-line transformations are not necessarily linear. 

Groups of linear transformations are of fundamental importance for tangent spaces of n-dimensional $C^1$
differentiable manifolds because the chart transition maps for tangent spaces are linear maps on n-dimensional
linear spaces. Linear maps are presented in Section 23.1. Section 26.20 is concerned with the kinds of
transformation groups which play the same role for unidirectional and bidirectional non-$C^1$ differentiable
manifolds. These are useful for the tangent spaces of continuous (i.e. topological) manifolds whose transition
maps have well-defined directional derivatives (either unidirectional or bidirectional).


A (bidirectional) line-to-line transformation is required to be invertible so that the transformations will form 
a group. Another constraint is that they must map scalar multiples to the same scalar multiples. This 
property may be called "scalarity". Apart from invertibility and scalarity, there are no other constraints. 
(In the application to tangent spaces for non-$C^1$ manifolds, there is generally an additional constraint of
continuity, and possibly also Lipschitz continuity.)


26.20.2 REMARK:  Halfline-to-halfline transformations. 

As mentioned in Sections 26.8 and 26.16, tangent-line vectors may be parametrised by their velocity. In 
the bidirectional case, two lines whose velocity is related by any non-zero scalar multiple should transform 
similarly. In the unidirectional case, only positive scalar multiples are constrained in this way. 


Line-to-line transformations on the set of Cartesian n-tuples $\mathbb{R}^n$ may therefore be specified as bijections
$\phi : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}^n \setminus \{0\}$ with the constraint that $\forall \lambda \in \mathbb{R} \setminus \{0\}$, $\phi(\lambda v) = \lambda\phi(v)$, whereas halfline-to-halfline
transformations have the constraint $\forall \lambda \in \mathbb{R}^+$, $\phi(\lambda v) = \lambda\phi(v)$. It is clear that in each case, the set of such
transformations forms a group under function composition. The halfline-to-halfline transformations may be
specified equivalently as bijections on the set $B_{0,1}(\mathbb{R}^n) = \{x \in \mathbb{R}^n;\ |x| = 1\}$. Line-to-line transformations
may be specified as bijections on a suitable subset of $B_{0,1}(\mathbb{R}^n)$. It is clear that such specifications require
“too much information”. This information may be reduced by requiring continuity or Lipschitz continuity
for example. Differentiability or analyticity would reduce the “information” in the specification even further.
(For examples of bidirectional line-to-line transformations, see Examples 53.2.4, 53.2.5 and 53.2.6.)
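

The scalarity constraint does not force linearity. The following Python sketch gives a hypothetical
example on the plane, which is not one of the examples cited above. The map keeps the length of each
vector and warps its direction angle by a function with period pi. The warp is strictly increasing,
since its derivative 1 + 0.6*cos(2*alpha) is positive, so the map is a bijection of the punctured plane.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]

def phi(v: Vec2) -> Vec2:
    """Hypothetical nonlinear map on R^2 minus {0} with phi(lam*v) = lam*phi(v)."""
    r = math.hypot(v[0], v[1])
    alpha = math.atan2(v[1], v[0])
    beta = alpha + 0.3 * math.sin(2 * alpha)   # direction warp with period pi
    return (r * math.cos(beta), r * math.sin(beta))

def close(a: Vec2, b: Vec2) -> bool:
    """Componentwise comparison with a small tolerance."""
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b))

v, lam = (1.0, 2.0), -2.5
scalarity_ok = close(phi((lam * v[0], lam * v[1])),
                     (lam * phi(v)[0], lam * phi(v)[1]))

e1, e2 = (1.0, 0.0), (0.0, 1.0)
sum_of_images = (phi(e1)[0] + phi(e2)[0], phi(e1)[1] + phi(e2)[1])
additive = close(phi((1.0, 1.0)), sum_of_images)
```

The map satisfies the scalarity constraint for negative multiples as well as positive ones, but it is
not additive, so it is not a linear map.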


Chapter 27


MULTILINEAR ALGEBRA


27.0.1 REMARK: Tensor algebra is perhaps more accurately called “multilinear algebra”.

The subject of Chapters 27-30 is multilinear and tensor algebra, but for historical reasons, the combined 
topic is often referred to simply as “tensor algebra”. Multilinear and tensor algebra are two sides of the same 
coin. Multilinear algebra is presented first in Chapter 27 because it is the “primal” concept, whereas tensor 
algebra is its “dual”. 


27.0.2 REMARK: How to skip tensor algebra and multilinear algebra. 

Tensor algebra, if it is presented with logical precision, as opposed to diagrams, intuition and wishful thinking, 
costs more pain than it is worth. Most readers should skip Chapters 27-30. Any definitions, notations or 
theorems which are required for the differential geometry in Part IV are cross-referenced. 


27.0.3 REMARK:  Tensors are inevitably unpleasant. There is no Royal Road to tensors! 
Gallot/Hulin/Lafontaine [13], page 36, say the following about tensors.


It is one of the unpleasant tasks for the differential geometers to define them! 


They’re not exaggerating! Defining tensors is unpleasant for both the reader and writer, and Chapters 27-30 
are no exception. The presentation here seeks the deeper truth underlying tensor algebra and attempts to fill 
some of the gaps in typical introductions. This requires a substantial investment of effort when developing 
the basic definitions and properties. But hopefully the result will be greater confidence and certainty in 
the application of tensor algebra to differential geometry. The calculations will be the same, but better 
understanding should yield more accurate decisions about which calculations to make. 


All things considered, the tensor algebra which is presented in Chapters 27-30 is far too tedious for the vast 
majority of readers. Although the gaps and short-cuts in the majority of texts are discomforting to any 
reader who wants to know the whole truth about tensors, the whole truth is an administrative nightmare of 
indices and multi-indices. So all in all, it may not be such a bad thing to think of tensors as arrays which 
obey certain transformation rules (as the elementary tensor calculus texts say), or as multilinear functions 
of tuples of vectors and linear functionals (as many other texts say). 

A logically correct presentation of multilinear and tensor algebra and analysis, including exterior algebra 
and exterior calculus, seems to be unavoidably painful. For practical applications, it is probably better to 
ignore the precise underlying logic and use intuition and guesswork instead. 


27.0.4 REMARK: It is helpful to keep tensor algebra and tensor analysis separate. 
Chapters 27-30 present only the algebra of tensor spaces. The analysis of tensor spaces is delayed until
Chapter 46 because analytical tensor space concepts such as differential forms (Section 46.2) and the exterior
derivative (Section 46.7) require differential and integral calculus.


27.1. Multilinear algebra and tensor algebra 


27.1.1 REMARK:  Tensors quantify the response of particles to multilinear fields. 
Tensors may be defined informally in the physics context as follows. 


A tensor is the property of a particle which quantifies its response to a multilinear field. 


An example is a small antenna in an electric field, or a small loop of wire in a magnetic field, or a small 
lump of matter in a gravitational field, or a particle of an elastic solid subjected to stress. The response of 
a particle to a multilinear field can be quantified as a tensorial state of the particle. The effects of fields on 
larger amounts of matter may be calculated by integrating the response of notional individual particles over 
a volume of matter. (See Definition 27.2.3 for multilinear maps and functions.) 


27.1.2 REMARK: Tensor products of vectors are not really products of vectors. 

Probably the usual method of introducing tensors as products of vectors is the root cause of the complexity 
and incomprehensibility of tensor algebra. Products of vectors are really only incidental to tensor algebra. 
Multilinear functions are the core concept of tensor algebra. Covariant tensors are multilinear functions, no 
more, no less. Contravariant tensors are the duals of multilinear functions. In other words, a contravariant 
tensor is a linear functional on a space of multilinear functions. 


It just happens that a product of vectors yields a particular simple kind of linear functional on a space of 
multilinear functions. Products of vectors only have significance to tensor algebra via their association with 
the special kinds of contravariant tensors called “simple tensors” or “monomial tensors”. 
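

The way in which a product of vectors yields a linear functional on a space of multilinear functions can
be illustrated in the bilinear case. In the following Python sketch, a pair of vectors determines the
functional which evaluates a bilinear function at that pair. The names simple_tensor, dot, det2 and
combo are hypothetical, and the sample vectors are arbitrary.

```python
from typing import Callable, Tuple

Vec = Tuple[float, ...]
Bilinear = Callable[[Vec, Vec], float]

def simple_tensor(v1: Vec, v2: Vec) -> Callable[[Bilinear], float]:
    """The functional f -> f(v1, v2) on a space of bilinear functions."""
    return lambda f: f(v1, v2)

def dot(u: Vec, v: Vec) -> float:
    """A sample bilinear function: the sum of u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))

def det2(u: Vec, v: Vec) -> float:
    """Another bilinear function on R^2: the 2x2 determinant."""
    return u[0] * v[1] - u[1] * v[0]

def combo(u: Vec, v: Vec) -> float:
    """The bilinear function 2*dot + 3*det2."""
    return 2.0 * dot(u, v) + 3.0 * det2(u, v)

T = simple_tensor((1.0, 2.0), (3.0, 4.0))
linear_in_f = T(combo) == 2.0 * T(dot) + 3.0 * T(det2)
```

The functional T is linear in its argument f. The vectors enter only as the point at which the bilinear
function is evaluated.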


Tensor algebra may be thought of as having two kinds of entities: fields and tensors. The fields are multilinear 
functions and the tensors are the responses to those fields. Trying to think of these responses (i.e. linear 
functionals on spaces of multilinear functions) as sums of products of vectors is a bit like subdividing a 
sphere into cubes — which is in fact how practical integration is generally done in Cartesian coordinates. 
The sphere does not easily subdivide into cubes. Nor do tensors naturally subdivide into products of vectors. 


In tensor algebra, it is the covariant tensors which are fundamental. Contravariant tensors are a secondary 
construction, defined as the dual of covariant tensors. This could possibly provide some slender justification 
for the mystifying use of the terms “covariant” and “contravariant” in differential geometry. It is more likely, 
however, that the term “contravariant” refers to the components of vectors with respect to a given basis. The 
components of a vector do vary contravariantly with respect to the sequence of basis vectors. In the tensor 
calculus which was developed in the last decade of the 19th century by Ricci-Curbastro and Levi-Civita, 
which was heavily oriented towards tensor component arrays, there would have been a natural inclination 
to regard the component arrays as being the real tensors, just as in the theory of linear spaces as taught 
at an elementary level, vectors are usually identified with n-tuples of numbers. In terms of coordinates, 
“contravariant” vectors and tensors do vary in a contrary fashion to the linear space basis. Unfortunately, 
this confusion of vectors and tensors with their component arrays has led to somewhat careless usage of 
the terms “covariant” and “contravariant” for the vectors and tensors themselves. (See also Remark 28.5.7 
regarding this terminology issue.) 


27.1.3 REMARK: Tensors arise naturally in many contexts.

The tensor concept includes the vector concept as a special case. Tensors which are not vectors are useful 
in most of physics (including classical and quantum field theories, fluid dynamics, astrophysics and general 
relativity), in civil engineering (especially elastic solids in the construction of structures), and electrical and 
electronics engineering (especially electromagnetic systems). 


Tensors and vectors are almost ubiquitous in physics and engineering. Ever since Newton, science and 
engineering have utilised the reductionist, analytical approach, which is the art of (a) reducing a general 
non-linear system to a linear system by differentiation, (b) studying a problem “in the small” as a linear 
system, and (c) integrating the linear solution to obtain the non-linear solution “in the large”. Differentiation 
approximates a non-linear function by a “best-fit” linear function in the limit of small size. Integration then 
extrapolates this linearised local system to a global system. Vectors arise naturally from the linearisation of 
single-parameter systems. Tensors arise naturally from the linearisation of multi-parameter systems. Thus
tensors are a very direct consequence of the fact that differentiation yields linear mathematical objects and
linear relations between those objects. Since the vast majority of physics is expressed as differential equations, 
multilinearity and tensors arise inevitably from analysis in models for physics, chemistry, engineering and 
numerous other fields. In other words, tensors are not a narrow, special topic. Tensors are at the core of all 
mathematical analytical models. 


27.1.4 REMARK: Tensor algebra requires coordinates, but the tensor concept is coordinate-independent. 
The use of coordinates in differential geometry is much maligned. But many of the most basic steps in 
the development of tensor algebra depend heavily on basis-and-coordinates expressions because multilinear 
functions and tensors have natural decompositions into linear combinations of simple multilinear functions 
(in Section 27.4) and simple tensors (in Section 28.4) respectively. Only those constructions and properties 
which are independent of the choices of bases are regarded as authentic. So when bases and coordinates are 
used in a construction or proof, it must be verified that the result is basis-independent. 


One of the guiding objectives of an introduction to tensor algebra is the need to build up mathematical 
machinery which hides the coordinates as soon as possible. Thus coordinates are used where necessary in 
the early development in the earnest hope that they will be politely hidden in the later development and 
applications. Of course, coordinates inevitably reappear in practical calculations. 


27.1.5 REMARK: The historical origins of tensors. 

The subject of tensors may be thought of as “multilinear algebra”. It seems to have started with Hermann
Günther Grassmann in the middle of the 19th century. Much of the confusion in tensor algebra might
disappear if it were renamed “multilinear algebra”. The topic of antisymmetric (or “alternating”) tensors is
called exterior algebra, exterior calculus or Grassmann algebra. (See Federer [69], page 8 and Frankel [12],
page 66 for notes on Grassmann as the originator of the exterior calculus.)


Struik [249], page 204, says the following about the origins of the word “tensor”.


The term tensor was introduced, in its modern meaning, by Woldemar Voigt, Göttingen physicist
and crystallographer, in 1908.


The word “tensor” was also used with the modern meaning in 1898 in a book about crystallography by
Voigt [309], page 20. On the origin of the term “tensor calculus”, Struik [249], page 204, says the following.


An entirely different and more fundamental approach to hyperspace goes back to Riemann's paper of
1854 with his introduction of a topological manifold, endowing it with a quadratic linear element $ds$,
for which $ds^2 = g_{ij}\,dx^i dx^j$ (to use our present notation). This led, through the works of Christoffel,
Lipschitz, and Beltrami, to the absolute differential calculus of Gregorio Ricci-Curbastro of Padua 
(1883 and later). A summary of his research, with its wide applications to differential invariants, 
differential geometry of S", elasticity, and rational mechanics, was laid down in a paper in the 
Mathematische Annalen, Vol. 54 (1901), entitled “Méthodes de calcul différentiel absolu,” written
together with his pupil Tullio Levi-Civita, who contributed his own research. This paper became 
quite famous after 1913, when Einstein adopted that calculus for his general relativity and gave 
this calcul absolu the name of tensor calculus. 


(See Levi-Civita [222], pages 479—559, for the 1901 paper by Ricci/Levi-Civita [194].) 


27.2. Multilinear maps 


27.2.1 REMARK: Scalar-valued multilinear maps are called “multilinear functions”. 

A multilinear map is a scalar-valued or vector-valued function of vector variables which depends linearly on 
each variable individually. “Scalar-valued” may be thought of as a special case of “vector-valued” because a 
field may be regarded as a one-dimensional linear space over itself. (See Definition 22.1.9.) 


Scalar-valued multilinear maps will be referred to as “multilinear functions”, presented in Section 27.3. 
A multilinear function is not, strictly speaking, a particular kind of multilinear map because a field is 
not, strictly speaking, a particular kind of linear space. But this choice of terminology is convenient and 
unambiguous in most contexts. (Of course, “functions” and “maps” are synonyms. These words are used 
here only as hints that the target space is a field or linear space respectively. This is similar to the way in 
which the synonyms “set” and “collection” are often used as hints.)


Since multilinear functions are effectively multilinear maps, everything in Section 27.2 is applicable to the
multilinear functions in Section 27.3. However, multilinear functions have some special properties not enjoyed 
by general multilinear maps. For example, the duals of multilinear function spaces are tensor spaces, which 
have very broad applications, whereas general multilinear maps do not have such useful dual spaces. On the 
other hand, vector-valued multilinear maps are very useful in their role as vector-valued forms in physics, 
including Lie-algebra-valued forms in particular. 


27.2.2 REMARK: A function of r variables is the same as a function of a single r-tuple variable. 

A function of many variables is (almost always) formalised as a function of a single variable which is a tuple.
For example, a function with variables $v_1, v_2, \ldots v_r$ is formalised as a function of the tuple $(v_1, v_2, \ldots v_r)$. (See
Remark 10.2.31.) Thus a function with domain $\times_{\alpha\in A} V_\alpha$ is, strictly speaking, a function of a single variable,
but it is thought of as a function of $\#(A)$ variables. Context determines which interpretation is intended.


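27.2.3 DEFINITION: A multilinear map from the Cartesian product $\times_{\alpha\in A} V_\alpha$ of a finite family of linear
spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$ to a linear space $U$ over $K$ is a map $f : \times_{\alpha\in A} V_\alpha \to U$ such that


$\forall \beta \in A,\ \forall \lambda, \mu \in K,\ \forall u, v, w \in \times_{\alpha\in A} V_\alpha,$

$\bigl(u_\beta = \lambda v_\beta + \mu w_\beta \text{ and } \forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha = w_\alpha\bigr) \Rightarrow f(u) = \lambda f(v) + \mu f(w).$ (27.2.1)


A multilinear function is a multilinear map whose target space $U$ is the field $K$ regarded as a linear space.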
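
Condition (27.2.1) can be spot-checked on concrete maps. The following Python sketch is only an illustration and is not part of the formal development. The helper name `satisfies_27_2_1` and the two sample maps are hypothetical, vectors with small integer coordinates stand in for elements of $\mathbb{R}^n$, and a finite random sample can refute multilinearity but can never prove it.

```python
import random

def det2(u):
    """Determinant of the 2x2 matrix whose columns are u[0] and u[1]."""
    (a, c), (b, d) = u
    return a * d - b * c

def coord_sum(u):
    """Sum of all coordinates: linear on the product space, but not bilinear."""
    return sum(u[0]) + sum(u[1])

def satisfies_27_2_1(f, dims, trials=200, seed=0):
    """Test condition (27.2.1) on random integer samples (exact arithmetic)."""
    rng = random.Random(seed)
    rand_vec = lambda n: [rng.randint(-5, 5) for _ in range(n)]
    for _ in range(trials):
        beta = rng.randrange(len(dims))
        lam, mu = rng.randint(-5, 5), rng.randint(-5, 5)
        v = [rand_vec(n) for n in dims]
        w = [list(x) for x in v]          # w agrees with v except at index beta
        w[beta] = rand_vec(dims[beta])
        u = [list(x) for x in v]          # u agrees with v and w except at beta
        u[beta] = [lam * a + mu * b for a, b in zip(v[beta], w[beta])]
        if f(u) != lam * f(v) + mu * f(w):
            return False
    return True

print(satisfies_27_2_1(det2, (2, 2)))       # determinant: bilinear in its columns
print(satisfies_27_2_1(coord_sum, (2, 2)))  # coordinate sum: violates (27.2.1)
```

The second map is linear as a function of the whole tuple, which shows that linearity on the product space and multilinearity are different conditions.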


27.2.4 REMARK: Interpretation of the multilinear map concept. 

Definition 27.2.3 says that a function $f : \times_{\alpha\in A} V_\alpha \to U$ is multilinear if $f$ is linear with respect to each
space $V_\beta$ individually for fixed values of the other parameters. Theorem 27.2.27 expresses line (27.2.1) in a
way which is perhaps intuitively clearer.


The linearity of a multilinear map with respect to each component of its domain is illustrated in Figure 27.2.1.
This diagram shows three linear relations. The function value $f(v_1', v_2, v_3) \in U$ is related linearly to $v_1' \in V_1$
for each fixed $v_2 \in V_2$ and $v_3 \in V_3$. Similarly, $f(v_1, v_2', v_3)$ is a linear function of $v_2'$ for fixed $v_1$ and $v_3$, and
$f(v_1, v_2, v_3')$ is a linear function of $v_3'$ for fixed $v_1$ and $v_2$.


[Figure 27.2.1: Linearity of a multilinear map $f : V_1 \times V_2 \times V_3 \to U$ with respect to domain components]


27.2.5 EXAMPLE: Application of the multilinear map condition for two finite-dimensional spaces. 

If one does not know the answer in advance, it is not entirely obvious how much flexibility remains for a 
map after the multilinearity condition in line (27.2.1) has been applied. In particular, it is not immediately 
obvious that the set of multilinear maps for a finite product of finite-dimensional linear spaces has the 
structure of a finite-dimensional linear space. (See Theorem 27.6.14 for a proof of this assertion.) 


Let $A = \mathbb{N}_2$ and $K = \mathbb{R}$. Let $V_1 = \mathbb{R}^2$ and $V_2 = \mathbb{R}^3$. Then $\times_{\alpha\in A} V_\alpha = \mathbb{R}^2 \times \mathbb{R}^3$. (Note the pedantic subtlety
here that strictly speaking, $\times_{\alpha\in A} V_\alpha$ is the set of maps $g : A \to (V_1 \cup V_2)$ such that $g_1 \in V_1$ and $g_2 \in V_2$ by
Definition 10.11.2, whereas here the simple set of ordered pairs in Definition 9.7.2 is used. This observation
is “in the weeds” of mathematical formalism. So it does not need to be considered!)


Let $U$ be the linear space $\mathbb{R}$. By Definition 27.2.3, a multilinear map from $\mathbb{R}^2 \times \mathbb{R}^3$ to $U = \mathbb{R}$ is then a
map $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ which satisfies line (27.2.1). Let $\beta = 1$. Let $\lambda, \mu \in \mathbb{R}$. Let $u, v, w \in \mathbb{R}^2 \times \mathbb{R}^3$. Let
$u_1 = \lambda v_1 + \mu w_1 \in \mathbb{R}^2$ and $u_2 = v_2 = w_2 \in \mathbb{R}^3$. Then line (27.2.1) requires that $f(u) = \lambda f(v) + \mu f(w)$.
Thus for each fixed $u_2 \in \mathbb{R}^3$, the map $f_{1,u_2} : u_1 \mapsto f(u_1, u_2)$ is linear from $\mathbb{R}^2$ to $\mathbb{R}$. So by Theorem 23.7.10,
$\forall u_2 \in \mathbb{R}^3,\ \forall u_1 \in \mathbb{R}^2,\ f_{1,u_2}(u_1) = f_{1,u_2}(e_{1,1}) h^1(u_1) + f_{1,u_2}(e_{1,2}) h^2(u_1)$, where $e_{1,1} = (1, 0)$ and $e_{1,2} = (0, 1)$
are the standard basis vectors for $\mathbb{R}^2$, and $h^1(u_1)$ and $h^2(u_1)$ are the components $u_1^1$ and $u_1^2$ of $u_1 = (u_1^i)_{i=1}^2$
respectively. Consequently, if only this $\beta = 1$ condition is applied, $f$ would only be required to satisfy
$\forall u_1 \in \mathbb{R}^2,\ \forall u_2 \in \mathbb{R}^3,\ f(u_1, u_2) = f(e_{1,1}, u_2) h^1(u_1) + f(e_{1,2}, u_2) h^2(u_1)$. This would allow enormous freedom
for $f$ because the numbers $f(e_{1,1}, u_2)$ and $f(e_{1,2}, u_2)$ would be completely arbitrary functions of $u_2 \in \mathbb{R}^3$. In
fact, for an arbitrary linear-functional-valued function $g : \mathbb{R}^3 \to (\mathbb{R}^2)^*$, the function $f : (u_1, u_2) \mapsto g(u_2)(u_1)$
satisfies Definition 27.2.3 for $\beta = 1$. Moreover, Theorem 23.7.10 implies that any function $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$
which satisfies Definition 27.2.3 for $\beta = 1$ must have this form.


The condition for $\beta = 2$ can now be applied for $u_1 \in \mathbb{R}^2 \setminus \{0\}$ to the function $f : (u_1, u_2) \mapsto g(u_2)(u_1)$ for
any linear-functional-valued function $g : \mathbb{R}^3 \to (\mathbb{R}^2)^*$ to discover that $g$ is a linear linear-functional-valued
function with respect to the linear space structure on the dual space $(\mathbb{R}^2)^*$ given in Definition 23.6.4. Since
$g$ must have the form $g(u_2) : u_1 \mapsto g(u_2)(e_{1,1}) h^1(u_1) + g(u_2)(e_{1,2}) h^2(u_1)$ for all $u_2 \in \mathbb{R}^3$, and this expression
must be linear with respect to $u_2$, it follows that both $u_2 \mapsto g(u_2)(e_{1,1})$ and $u_2 \mapsto g(u_2)(e_{1,2})$ must be linear
functions from $\mathbb{R}^3$ to $\mathbb{R}$, which implies by Theorem 23.7.10 that $g(u_2)(e_{1,i}) = \sum_{j=1}^3 g(e_{2,j})(e_{1,i})\, u_2^j$ for $i \in \mathbb{N}_2$.
Therefore $f(u_1, u_2)$ is determined for all $(u_1, u_2) \in \mathbb{R}^2 \times \mathbb{R}^3$ by the six numbers $g(e_{2,j})(e_{1,i}) = f(e_{1,i}, e_{2,j})$.
This suggests that the set of multilinear functions $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ constitutes a six-dimensional linear
space. In fact, $f(u_1, u_2) = \sum_{i=1}^2 \sum_{j=1}^3 f(e_{1,i}, e_{2,j})\, u_1^i u_2^j$ for all $(u_1, u_2) \in \mathbb{R}^2 \times \mathbb{R}^3$.


In terms of the standard bases and coordinates, every multilinear function $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ has the form
$f(u_1, u_2) = \sum_{i=1}^2 \sum_{j=1}^3 a_{ij}\, u_1^i u_2^j$ for all $(u_1, u_2) \in \mathbb{R}^2 \times \mathbb{R}^3$ for some array $[a_{ij}]_{i=1,j=1}^{2,3} \in \mathbb{R}^{2\times 3}$. This seems much
simpler to understand than Definition 27.2.3. So it is not at all surprising that so many authors prefer 
to think of multilinear functions (and also tensors) as arrays of numbers with suitable manipulation rules. 
Nevertheless, an abstract approach is adopted here, even though this is often painful. 
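
The reduction of a bilinear function on $\mathbb{R}^2 \times \mathbb{R}^3$ to six numbers can be illustrated with a small Python sketch. The sample function `f` below is a hypothetical choice made only for this illustration; the names `E1`, `E2`, `a` and `f_from_array` are likewise not taken from the text.

```python
E1 = [(1, 0), (0, 1)]                    # standard basis e_{1,1}, e_{1,2} of R^2
E2 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # standard basis e_{2,1}, e_{2,2}, e_{2,3} of R^3

def f(u1, u2):
    """A sample bilinear function on R^2 x R^3 (hypothetical choice)."""
    return 2*u1[0]*u2[0] - u1[0]*u2[2] + 5*u1[1]*u2[1] + 7*u1[1]*u2[2]

# The six numbers a[i][j] = f(e_{1,i}, e_{2,j}).
a = [[f(e1, e2) for e2 in E2] for e1 in E1]

def f_from_array(u1, u2):
    """Rebuild f from its 2x3 coefficient array via the double sum."""
    return sum(a[i][j] * u1[i] * u2[j] for i in range(2) for j in range(3))

u1, u2 = (3, -4), (1, 2, -5)
print(a)                                   # [[2, 0, -1], [0, 5, 7]]
print(f(u1, u2), f_from_array(u1, u2))     # 121 121
```

The array `a` plays the role of $[a_{ij}]$ in the coordinate formula, and the agreement of the two printed values is an instance of that formula for one pair of vectors.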


27.2.6 REMARK: Comparison of multilinear maps with partial derivatives of functions. 

Initially it may be difficult to acquire an intuition for multilinear maps. It may be helpful to consider 
the analogy to partial differentiation of real functions of several variables. The partial derivative $\partial_i f(a) =
\partial f(x)/\partial x^i \big|_{x=a}$ of a continuously differentiable real-valued function $f : \mathbb{R}^n \to \mathbb{R}$, with respect to each $i \in \mathbb{N}_n$
for $n \in \mathbb{Z}^+$, is calculated as the limit of the differential quotient of $f$ with respect to the variable $x^i$ while
all other variables are kept constant. Thus $\partial_i f(a) = \lim_{h\to 0} h^{-1}(f(a + he_i) - f(a))$ for the standard unit
vectors $e_i$. The existence of the partial derivatives requires only that the differentials $f(a + he_i) - f(a)$ be
approximately linear in the limit as $h \to 0$.


27.2.7 REMARK: Multilinear maps on infinite families of linear spaces are useless. 

Definition 27.2.3 explicitly excludes an infinite set $A$. This is a necessary assumption. To see why, consider
the consequences of Theorem 27.2.30 (ii) in the case that $A$ is an infinite set. If any element $v_\beta$ of the family
of vectors $v = (v_\alpha)_{\alpha\in A}$ is zero, then every multilinear function $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ has value $f(v) = 0$.
Therefore if $v \in \times_{\alpha\in A} V_\alpha$ satisfies $f(v) \ne 0$, then $v_\alpha \ne 0$ for all $\alpha \in A$. Define $w \in \times_{\alpha\in A} V_\alpha$ by $w_\alpha = 2v_\alpha$.
Then the application of linearity to each index $\beta$ gives a factor of 2 between $f(w)$ and $f(v)$. Therefore
$f(w) = 2^\infty f(v) \notin U$ if the characteristic of $K$ equals 0. (Fields with non-zero characteristic cause other problems
for multilinear functions.) It follows that $\mathscr{L}((V_\alpha)_{\alpha\in A}; U) = \{0\}$. In other words, there is only one possible
multilinear function, namely the function $f$ with value $f(v) = 0 \in U$ for all $v \in \times_{\alpha\in A} V_\alpha$. This is clearly
useless although it does not appear to be logically self-contradictory.
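
The factor of 2 per index can be seen in the finite case with a short Python sketch. This is an illustration under assumptions: each space $V_\alpha$ is taken to be $\mathbb{R}$, and the hypothetical function `product_form` (the product of its arguments) is used as a sample $r$-linear function. Doubling every argument multiplies the value by $2^r$, which has no finite limit as $r$ grows.

```python
from math import prod

def product_form(v):
    """Sample r-linear function on R x ... x R: the product of the r arguments."""
    return prod(v)

for r in (1, 2, 5, 10, 50):
    v = [1] * r                     # a tuple with f(v) = 1, so f(v) is non-zero
    w = [2 * x for x in v]          # w_alpha = 2 * v_alpha for every index
    print(r, product_form(w) // product_form(v))   # the ratio equals 2**r
```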


27.2.8 REMARK: Combined multinomial-style conditions for multilinearity. 

Theorems 27.2.9 and 27.2.10 give equivalent criteria for the multilinearity in Definition 27.2.3. These criteria
have the advantage of resembling the usual rules for linear maps acting on linear combinations of vectors.
(See Remark 23.1.2.) Expressions such as $(\sum_{v_\alpha \in V_\alpha} \lambda_\alpha(v_\alpha) v_\alpha)_{\alpha\in A}$ and $(\lambda^1_\alpha v^1_\alpha + \lambda^2_\alpha v^2_\alpha)_{\alpha\in A}$ may be thought of
as “multilinear combinations”. (The expression $\mathrm{Fin}(S, K)$ denotes the set of functions from a set $S$ to a field
$K$ which are zero outside a finite set. See Notation 22.2.25.) The formulas in Theorems 27.2.9 and 27.2.10
are probably too convoluted for practical use.


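27.2.9 THEOREM: Equivalent condition for multilinearity in terms of action on linear combinations.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$ and let $U$ be a linear space over $K$. Then a
map $f : \times_{\alpha\in A} V_\alpha \to U$ is multilinear if and only if


$\forall \lambda \in \times_{\alpha\in A} \mathrm{Fin}(V_\alpha, K),$

$f\bigl((\sum_{v_\alpha \in V_\alpha} \lambda_\alpha(v_\alpha) v_\alpha)_{\alpha\in A}\bigr) = \sum_{w \in \times_{\alpha\in A} V_\alpha} \bigl(\prod_{\beta\in A} \lambda_\beta(w_\beta)\bigr) f((w_\gamma)_{\gamma\in A}).$


PROOF: The result follows by induction on the elements of $A$.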


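27.2.10 THEOREM: Equivalent condition for multilinearity in terms of action on linear combinations.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$ and let $U$ be a linear space over $K$. Then a
map $f : \times_{\alpha\in A} V_\alpha \to U$ is multilinear if and only if


$\forall \lambda^1, \lambda^2 \in K^A,\ \forall v^1, v^2 \in \times_{\alpha\in A} V_\alpha,$

$f\bigl((\lambda^1_\alpha v^1_\alpha + \lambda^2_\alpha v^2_\alpha)_{\alpha\in A}\bigr) = \sum_{k \in \{1,2\}^A} \bigl(\prod_{\beta\in A} \lambda^{k_\beta}_\beta\bigr) f((v^{k_\gamma}_\gamma)_{\gamma\in A}).$


PROOF: The result follows by substituting suitable values for $\lambda$ in Theorem 27.2.9.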
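
The expansion over $k \in \{1,2\}^A$ can be checked on one example with a short Python sketch. Everything named here is an assumption made for illustration: the sample trilinear function `f` on $\mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R}^2$, the helpers `lhs` and `rhs`, and the particular scalars and vectors. The indices 0 and 1 in the code stand for the superscripts 1 and 2 in the text.

```python
from itertools import product
from math import prod

def f(v):
    """Sample trilinear function on R^2 x R^2 x R^2 (hypothetical choice)."""
    (a, b), (c, d), (p, q) = v
    return (a * d - b * c) * (p + 2 * q)

def lhs(lam1, lam2, v1, v2):
    """f applied to the tuple of combinations lam1[a]*v1[a] + lam2[a]*v2[a]."""
    return f([tuple(l1 * x + l2 * y for x, y in zip(va, wa))
              for l1, l2, va, wa in zip(lam1, lam2, v1, v2)])

def rhs(lam1, lam2, v1, v2):
    """Sum over all k of (product of chosen scalars) * f(tuple of chosen vectors)."""
    lam, vec = (lam1, lam2), (v1, v2)
    r = len(v1)
    return sum(prod(lam[k[b]][b] for b in range(r)) * f([vec[k[g]][g] for g in range(r)])
               for k in product((0, 1), repeat=r))

lam1, lam2 = (2, -1, 3), (5, 4, -2)
v1 = [(1, 2), (0, 3), (4, -1)]
v2 = [(-2, 1), (5, 5), (1, 1)]
print(lhs(lam1, lam2, v1, v2) == rhs(lam1, lam2, v1, v2))
```

For a map of degree $r$ the right-hand side has $2^r$ terms, which is one reason the formula is awkward for practical use.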


27.2.11 REMARK: Multilinearity is not a superficial extension of the linearity concept. 

If one looks long enough and closely enough at the conditions for multilinear maps in Definition 27.2.3 and
Theorems 27.2.9, 27.2.10 and 27.2.27, one may conclude that the concept of a multilinear map is not a
simple generalisation of the linear map concept. As mentioned in Remark 27.2.6, there are some similarities 
between multilinear functions and partial derivatives of functions of several variables. In the same way that 
partial differential equations are not a superficial extension of ordinary differential equations, multilinear 
maps are not a superficial extension of the linear map concept. 


27.2.12 DEFINITION: The degree of a multilinear map $f : \times_{\alpha\in A} V_\alpha \to U$ for a finite family of linear spaces
$(V_\alpha)_{\alpha\in A}$ over a field $K$ to a linear space $U$ over $K$ is the integer $\#(A)$.


27.2.13 REMARK: Well-definition of the degree of a multilinear map. 

The degree of a multilinear map $f : \times_{\alpha\in A} V_\alpha \to U$ in Definition 27.2.12 is well defined because $A$ can always
be explicitly reconstructed from $f$ as $A = \bigcup \{\mathrm{Dom}(z);\ z \in \mathrm{Dom}(f)\}$. The set-product $\mathrm{Dom}(f) = \times_{\alpha\in A} V_\alpha$
is always non-empty because it always contains the all-zeroes tuple. Thus $\bigcup \{\mathrm{Dom}(z);\ z \in \mathrm{Dom}(f)\} =
\bigcup \{A;\ z \in \mathrm{Dom}(f)\} = A$.


The term “degree” is also applied to the whole set $\{f : \times_{\alpha\in A} V_\alpha \to U;\ f \text{ is multilinear}\}$ which appears in
Notation 27.2.17. This is meaningful because every map in the set has the same degree. (This is because
they all have the same domain.)


27.2.14 REMARK: Well-definition of multilinearity for degree zero. 

The Cartesian product $\times_{\alpha\in A} V_\alpha$ of linear spaces $(V_\alpha)_{\alpha\in A}$ in Definition 27.2.3 is always non-empty. If $A = \emptyset$,
then $\times_{\alpha\in A} V_\alpha = \{\emptyset\} \ne \emptyset$. (See Theorem 10.11.6 (i).) The empty function $\emptyset \in \times_{\alpha\in A} V_\alpha$ is an all-zeros tuple.
(Since the tuple is empty, all of its elements are zero!)


In the case $A = \emptyset$, all functions $f : \times_{\alpha\in A} V_\alpha \to U$ are multilinear. Therefore $\mathscr{L}((V_\alpha)_{\alpha\in A}; U) = U^{\{\emptyset\}}$, which
is the set of all functions which map $\emptyset$ to an element of $U$. Thus $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ may be identified with the
set $U$ when $A = \emptyset$. (See also Remark 27.2.22 and Definition 27.2.23.) When $U$ is a field $K$ regarded as a
linear space, the set $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ may be identified with $K$ when $A = \emptyset$. (See also Definition 27.3.7.)


27.2.15 REMARK: Alternative terminology for the degree of a multilinear map. 

The term “degree” is well-chosen because, as mentioned in Remark 27.4.8, multilinear functions whose 
domain component spaces are all finite-dimensional are polynomials with homogeneous degree. As mentioned 
in Remark 28.1.14, some authors use other terms for the degree. (See Table 29.5.1 in Remark 29.5.11 for 
some variant terminology.)


27.2.16 DEFINITION: Let $f : \times_{\alpha\in A} V_\alpha \to U$ be a multilinear map as in Definition 27.2.3.
When $\#(A) = 2$, the map $f$ is said to be bilinear.

When $\#(A) = 3$, the map $f$ is said to be trilinear, and so forth.

When $\#(A) = r$ for $r \in \mathbb{Z}^+_0$, the map $f$ is said to be r-linear.


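27.2.17 NOTATION: $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ for a finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$, and a linear
space $U$ over $K$, denotes the set of multilinear maps from $\times_{\alpha\in A} V_\alpha$ to $U$.


27.2.18 NOTATION: $\mathscr{L}(V_1, \ldots V_m; U)$ for a sequence of $m$ linear spaces $(V_i)_{i=1}^m$ over a field $K$ with $m \in \mathbb{Z}^+_0$,
and a linear space $U$ over $K$, denotes the set $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ with $A = \mathbb{N}_m = \{1, \ldots m\}$.


27.2.19 NOTATION: $\mathscr{L}_m(V; U)$ for $m \in \mathbb{Z}^+_0$ and linear spaces $V$ and $U$ over a field $K$ denotes the set of
multilinear maps $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ with $A = \mathbb{N}_m$ and $V_\alpha = V$ for all $\alpha \in A$.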


27.2.20 REMARK: Schematic diagrams for spaces of multilinear maps. 
Notations 27.2.17, 27.2.18 and 27.2.19 are illustrated in Figure 27.2.2. (These diagrams may be compared 
with the diagrams for tensor spaces in Figure 28.1.1 in Remark 28.1.3.) 


[Figure 27.2.2: Notations for spaces of multilinear maps: $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$, $\mathscr{L}(V_1, \ldots V_m; U)$ and $\mathscr{L}_m(V; U)$]


27.2.21 THEOREM: Properties of multilinear maps of degree zero.
Let $A = \emptyset$. Let $K$ be a field. Let $U$ be a linear space over $K$.


(i) $\times_{\alpha\in A} V_\alpha = \{\emptyset\}$.
(ii) $\mathscr{L}((V_\alpha)_{\alpha\in A}; U) = U^{\{\emptyset\}} = \{\{(\emptyset, u)\};\ u \in U\}$.


PROOF: Part (i) follows from Theorem 10.11.6 (i).
Part (ii) follows from Notation 27.2.17 and Definition 27.2.3.


27.2.22 REMARK: The canonical injection of vectors into a multilinear map space of degree zero. 

By Theorem 27.2.21 (ii), there is clearly a natural bijection $u \mapsto \{(\emptyset, u)\}$ from a linear space $U$ to the space
$\mathscr{L}(\emptyset; U)$ of multilinear maps of degree zero. This map may also be written as $u \mapsto (\emptyset \mapsto u)$. This identifies
each vector $u$ with the corresponding tensor $\emptyset \mapsto u$. Thus the tensor space $\mathscr{L}(\emptyset; U)$ may be identified
with the linear space $U$. This identification is called a “canonical injection” in Definition 27.2.23. It is,
however, clearly an isomorphism with respect to the linear space structure for multilinear map spaces in
Definition 27.6.3. This gives $\mathscr{L}(\emptyset; U)$ linear space dimension equal to $\dim(U)$, but degree 0.


An important point to note here is that the constraint “$f$ is multilinear” for the set $\mathscr{L}(\emptyset; U)$ is actually weaker
than a linearity constraint. Multilinearity means linearity with respect to all components, but here there
are no components! So the multilinearity constraint is vacuous. (This is also mentioned in Remark 27.2.14.)
The set of linear functions from $\{\emptyset\} = \{0\}$ to $U$ would contain only the zero map. So this set would have
dimension 0. But for the multilinearity constraint, the dimension is $\dim(U)$ because there is no constraint.
This is consistent with the pattern seen in the dimensionality formula in Theorem 27.6.18.


27.2.23 DEFINITION: The canonical injection of vectors in a multilinear map space $\mathscr{L}_0(V; U) = \mathscr{L}(\emptyset; U)$,
for linear spaces $V$ and $U$ over a field $K$, is the map $i : U \to \mathscr{L}(\emptyset; U)$ given by


$\forall u \in U,\ \forall v \in \{\emptyset\},\ i(u)(v) = u.$
In other words, $i : u \mapsto (\emptyset \mapsto u)$ for all $u \in U$.


27.2.24 REMARK: Identification of unilinear maps with linear maps.

A set $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ of degree 1 may be identified with $\mathrm{Lin}(V_\beta, U)$, the set of linear maps from $V_\beta$ to $U$,
where $A = \{\beta\}$. (See Section 23.1 for linear maps.) Then for $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$, the domain of $f$ is the
set $\times_{\alpha\in A} V_\alpha = \{(\beta, v);\ v \in V_\beta\}$, which is not exactly the same as the set $V_\beta$. However, it is customary to
identify the set $\times_{\alpha\in\{\beta\}} V_\alpha$ with $V_\beta$.


A set $\mathscr{L}_m(V; U)$ for $m \in \mathbb{Z}^+_0$ has degree $m$. If $U$ is the field $K$ regarded as a linear space, then $\mathscr{L}_1(V; K)$
may be identified with the linear space $\mathrm{Lin}(V, K) = V^*$, the dual of $V$, defined in Section 23.6.


27.2.25 REMARK: Issues that arise for multilinear maps with an infinite-dimensional component space. 
As discussed in Remark 23.5.1, it is not absolutely guaranteed that the dual space $V^*$ has more than a single
element (the zero functional) if $V$ is infinite-dimensional (although in practical situations, the existence
of non-zero functionals is usually very easily demonstrated). Consequently, if any space $V_\alpha$ in the family
$(V_\alpha)_{\alpha\in A}$ is infinite-dimensional, it must be shown that the set $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ contains more than the zero
multilinear function. One way to guarantee this is by providing a basis for each $V_\alpha$.

In the case of an infinite-dimensional linear space $V$ which has a basis, the dual space $V^*$ is very large. (See
Theorem 23.7.14.) In general, there is no constructible basis for such a dual space. Consequently tensor
products with one or more infinite-dimensional component spaces are of limited value.


The algebraic duals of infinite-dimensional linear spaces can be made useful by imposing a topological 
constraint on the dual, for example by requiring linear functionals in the dual space to be bounded or 
continuous. The same can be done for multilinear functions. The set of multilinear maps may be regarded 
as the “multilinear dual”, which can be pruned back from the full algebraic multilinear dual to a topological 
multilinear dual of some kind. So infinite-dimensional multilinear functions and tensors are not a totally 
absurd idea. 


27.2.26 REMARK: Clearer formulation of multilinearity in terms of substitution operators. 

Theorem 27.2.27 re-expresses Definition 27.2.3 in terms of the “substitution operator” notation which defines
$E(x)\big|_{x=V}$ to mean the result of substituting expression $V$ into expression $E(x)$ wherever the variable $x$
occurs. Condition (27.2.2) is probably easier to interpret than (27.2.1). Theorem 27.2.27 says that a function
$f : \times_{\alpha\in A} V_\alpha \to U$ is multilinear if and only if the map $w \mapsto f(v|_{v_\beta = w})$ from $V_\beta$ to $U$ is linear for all $\beta \in A$
and $v \in \times_{\alpha\in A} V_\alpha$.


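27.2.27 THEOREM: Equivalent substitution-operator-based expression for multilinearity.
A map $f$ from the Cartesian product $\times_{\alpha\in A} V_\alpha$ of a finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$ to
a linear space $U$ over $K$ is multilinear if and only if


$\forall v \in \times_{\alpha\in A} V_\alpha,\ \forall \beta \in A,\ \forall w_1, w_2 \in V_\beta,\ \forall \lambda_1, \lambda_2 \in K,$

$f(v|_{v_\beta = \lambda_1 w_1 + \lambda_2 w_2}) = \lambda_1 f(v|_{v_\beta = w_1}) + \lambda_2 f(v|_{v_\beta = w_2}).$ (27.2.2)


PROOF: Let $f : \times_{\alpha\in A} V_\alpha \to U$ be multilinear. Let $v \in \times_{\alpha\in A} V_\alpha$, $\beta \in A$, $w_1, w_2 \in V_\beta$ and $\lambda_1, \lambda_2 \in K$. Let
$\tilde u = v|_{v_\beta = \lambda_1 w_1 + \lambda_2 w_2}$, $\tilde v = v|_{v_\beta = w_1}$ and $\tilde w = v|_{v_\beta = w_2}$. Then $\tilde u_\beta = \lambda_1 \tilde v_\beta + \lambda_2 \tilde w_\beta$ and $\forall \alpha \in A \setminus \{\beta\},\ \tilde u_\alpha = \tilde v_\alpha = \tilde w_\alpha$.
So by Definition 24.1.5, $f(v|_{v_\beta = \lambda_1 w_1 + \lambda_2 w_2}) = f(\tilde u) = \lambda_1 f(\tilde v) + \lambda_2 f(\tilde w) = \lambda_1 f(v|_{v_\beta = w_1}) + \lambda_2 f(v|_{v_\beta = w_2})$, which
verifies line (27.2.2). The converse is equally evident.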


27.2.28 REMARK: Visualisation of multilinear maps in terms of axes and components. 

Figure 27.2.3 gives an alternative style of visualisation to Figure 27.2.1 for the definition of multilinear
functions. It is important to keep in mind that the “axes” for the Cartesian product $V_1 \times V_2 \times V_3$ represent
linear spaces $V_1$, $V_2$ and $V_3$ which are not necessarily one-dimensional.


27.2.29 REMARK: Multilinear maps equal zero if at least one component equals zero. 
It follows from Definition 27.2.3 that the value of a multilinear map is zero if any of its arguments is zero. 
This is expressed in two equivalent ways in parts (i) and (ii) of Theorem 27.2.30. 


27.2.30 THEOREM: A multilinear map equals zero if one or more components equals zero.

Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$, and let $U$ be a linear space over $K$. Then:
(i) $\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U),\ \forall \beta \in A,\ \forall v \in \times_{\alpha\in A} V_\alpha,\ (v_\beta = 0 \Rightarrow f(v) = 0)$.

(ii) $\forall v \in \times_{\alpha\in A} V_\alpha,\ \bigl((\exists \beta \in A,\ v_\beta = 0) \Rightarrow (\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U),\ f(v) = 0)\bigr)$.


[Figure 27.2.3: Definition of multilinear map $f \in \mathscr{L}(V_1, V_2, V_3; U)$, drawn with axes $V_1$, $V_2$ and $V_3$ for the product $V_1 \times V_2 \times V_3$]


PROOF: For part (i), let $\beta \in A$ and let $u = (u_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ satisfy $u_\beta = 0$. In Definition 27.2.3, let
$v = w = u$ and $\lambda = \mu = 0$. Then line (27.2.1) gives $f(u) = 0.f(u) + 0.f(u) = 0$.


Part (ii) follows from part (i) because it is logically equivalent. 


27.2.31 REMARK: Ad-hoc “juxtaposition products” of multilinear functions by vectors. 

In practical applications, one sometimes sees products of multilinear functions by vectors. In other words, a
multilinear function $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is multiplied by a vector $\xi \in U$ (using scalar multiplication in $U$) to
obtain a map $f \cdot \xi : \times_{\alpha\in A} V_\alpha \to U$ which is hopefully a valid multilinear map in $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$. As always,
anything which seems intuitively obvious must be carefully checked, as in Theorem 27.2.32. (In practice, the
multiplication operation “$\cdot$” is not indicated explicitly. The product is indicated only by juxtaposition.)


The output of the “juxtaposition product” $f \cdot \xi$ is limited to the subspace $\{\lambda\xi;\ \lambda \in K\}$ of $U$. So clearly not
all multilinear maps can be represented in this way. However, if $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ and $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ are
finite-dimensional linear spaces, a general multilinear map in $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ can be expressed as a sum of
juxtaposition products $\sum_{v \in B} v \cdot \xi_v$ for a basis $B$ of vector tuples in $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. (See Section 27.6 for
linear spaces of multilinear maps.)


The juxtaposition product for multilinear functions by vectors is an extension of Definition 23.4.10 for the
juxtaposition product for linear functionals by vectors.


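27.2.32 THEOREM: Juxtaposition products of multilinear functions by vectors.

Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$. Let $U$ be a linear space over $K$. Let $f \cdot \xi$
denote the function $f \cdot \xi : \times_{\alpha\in A} V_\alpha \to U$ defined by $f \cdot \xi : (v_\alpha)_{\alpha\in A} \mapsto f((v_\alpha)_{\alpha\in A})\,\xi$ for all $(v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$
and $\xi \in U$. Then


$\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K),\ \forall \xi \in U,\ f \cdot \xi \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U).$


PROOF: Let $\beta \in A$ and $\lambda, \mu \in K$. Let the vector-tuples $u, v, w \in \times_{\alpha\in A} V_\alpha$ satisfy $u_\beta = \lambda v_\beta + \mu w_\beta$ and
$\forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha = w_\alpha$. Then by Definition 27.2.3, $(f \cdot \xi)(u) = f(u)\xi = (\lambda f(v) + \mu f(w))\xi =
\lambda f(v)\xi + \mu f(w)\xi = \lambda (f \cdot \xi)(v) + \mu (f \cdot \xi)(w)$. So $f \cdot \xi : \times_{\alpha\in A} V_\alpha \to U$ is a multilinear map by Definition 27.2.3.
Hence $f \cdot \xi \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$.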
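
The calculation in this proof can be followed on a concrete case with a small Python sketch. The names are assumptions made for illustration: `bilinear` is a sample bilinear function (the dot product on $\mathbb{R}^2 \times \mathbb{R}^2$), `juxtapose` builds the map $f \cdot \xi$, and $U = \mathbb{R}^3$ is represented by tuples of length 3.

```python
def bilinear(v):
    """Sample bilinear function on R^2 x R^2: the dot product."""
    (a, b), (c, d) = v
    return a * c + b * d

def juxtapose(f, xi):
    """Return the map v -> f(v) * xi, where xi is a vector in U = R^3."""
    return lambda v: tuple(f(v) * x for x in xi)

g = juxtapose(bilinear, (1, 0, -2))

# Check (27.2.1) for the first slot: u_1 = lam*v_1 + mu*w_1, second slot shared.
lam, mu = 3, -4
v1, w1, shared = (1, 2), (5, -1), (2, 7)
u1 = tuple(lam * a + mu * b for a, b in zip(v1, w1))
left = g([u1, shared])
right = tuple(lam * a + mu * b for a, b in zip(g([v1, shared]), g([w1, shared])))
print(left, right)   # (36, 0, -72) (36, 0, -72)
```

Every value of `g` is a multiple of the fixed vector `(1, 0, -2)`, which illustrates why a single juxtaposition product cannot represent a general multilinear map into $U$.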


27.2.33 REMARK: Easy extension of multilinearity from linear spaces to modules over rings.

Linearity is defined for a broader class than linear spaces. For example, maps between modules over a 
commutative ring have a well-defined concept of linearity. (See for example Theorem 19.4.6.) Multilinearity 
of maps is well-defined for these more general classes. Therefore tensor spaces are well-defined for these 
classes also. (See for example Bump [57], chapter 9.)


27.3. Multilinear functions


27.3.1 REMARK: Multilinear functions are scalar-valued multilinear maps.


Multilinear functions are defined along with multilinear maps in Definition 27.2.3. Multilinear functions are 
scalar-valued multilinear maps, where the scalar field is regarded as a linear space over itself. 


27.3.2 NOTATION: $\mathscr{L}((V_\alpha)_{\alpha\in A})$ for a finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$ denotes the set
of multilinear maps $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.


27.3.3 NOTATION: $\mathscr{L}(V_1, \ldots V_m)$ for a sequence of $m$ linear spaces $(V_i)_{i=1}^m$ over a field $K$ with $m \in \mathbb{Z}^+_0$
denotes the set of multilinear maps $\mathscr{L}(V_1, \ldots V_m; K)$.


27.3.4 NOTATION: $\mathscr{L}_m(V)$ for $m \in \mathbb{Z}^+_0$ and a linear space $V$ over a field $K$ denotes the set of multilinear
maps $\mathscr{L}_m(V; K)$.


27.3.5 REMARK: Schematic diagrams for spaces of multilinear functions. 

The notations $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, $\mathscr{L}(V_1, \ldots V_m; K)$ and $\mathscr{L}_m(V; K)$ are used in case the linear space $U$ in
Notations 27.2.17, 27.2.18 and 27.2.19 respectively is the field $K$ regarded as a linear space, as defined in
Definition 22.1.9. A case of particular interest for differential geometry is, of course, $K = \mathbb{R}$.


Notations 27.2.17, 27.2.18 and 27.2.19 are abbreviated to Notations 27.3.2, 27.3.3 and 27.3.4 in case $U$ is the
field $K$ regarded as a linear space. These abbreviations are illustrated in Figure 27.3.1.


[Figure 27.3.1: Notations for spaces of multilinear functions: $\mathscr{L}((V_\alpha)_{\alpha\in A})$, $\mathscr{L}(V_1, \ldots V_m)$ and $\mathscr{L}_m(V)$]


27.3.6 REMARK: The canonical injection of scalars into a multilinear function space. 
Definition 27.3.7 is the scalar version of Definition 27.2.23. It is also clearly a linear space isomorphism. 


It follows from Remark 27.2.14 and Notations 27.3.2 and 27.3.4 that $\mathscr{L}_0(V) = \mathscr{L}_0(V; K) = \mathscr{L}(\emptyset; K) = K^{\{0\}}$ 
for a linear space $V$ over a field $K$. This implies $\mathscr{L}_0(V) = \mathrm{Lin}(\{0\}, K)$ with the linear space structure added. 
So the functions in $\mathscr{L}_0(V)$ have the form $0 \mapsto t$ for $t \in K$. 


27.3.7 DEFINITION: Canonical identification of 0-linear functions with scalars. 
The canonical injection of scalars in a multilinear function space $\mathscr{L}_0(V)$, for a linear space $V$ over a field $K$, 
is the map $i : K \to \mathscr{L}_0(V)$ given by 

\[ \forall t \in K,\ \forall v \in \{0\}, \quad i(t)(v) = t. \]

In other words, $i : t \mapsto (0 \mapsto t)$ for all $t \in K$. 


27.3.8 EXAMPLE: Multilinear functions in calculus of several variables. 

Multilinear functions arise naturally in differential calculus for several variables. (See Section 42.2.) Let 
$f \in C^\infty(\mathbb{R}^n)$ be a real-valued function of $n$ variables with continuous partial derivatives of all orders, 
where $n \in \mathbb{Z}^+$. Let $V = \mathbb{R}^n$. By induction on $m \in \mathbb{Z}_0^+$, define $f_m : V^m \to C^\infty(\mathbb{R}^n)$ by 
$f_m : (v^1,\ldots v^m) \mapsto \big(x \mapsto \sum_{i=1}^n v^m_i \, (\partial/\partial x^i) f_{m-1}(v^1,\ldots v^{m-1})(x)\big)$. 
For $m \in \mathbb{Z}^+$ and $x \in \mathbb{R}^n$, define $g_{m,x} : V^m \to \mathbb{R}$ by 
$g_{m,x} : (v^1,\ldots v^m) \mapsto f_m(v^1,\ldots v^m)(x)$. Then $g_{m,x} \in \mathscr{L}_m(V; \mathbb{R})$ for all $m \in \mathbb{Z}^+$ and $x \in \mathbb{R}^n$. In other words, directional 
derivatives of order $m$ are multilinear functions of degree $m$ at each point. 


27.4. Simple multilinear functions 


27.4.1 REMARK: A construction method for a simple class of multilinear functions. 
The simplest way to construct non-trivial r-linear functions is by multiplying r linear functionals. This is 
done in Remark 27.4.3 and Definition 27.4.4. 


27.4.2 REMARK: The possibility of simple multilinear vector-valued maps. 

In principle, one may construct general multilinear vector-valued maps (as opposed to multilinear scalar-valued 
functions) by the same sort of procedure as in Remark 27.4.3. Simple $r$-linear maps whose range is a 
linear space $U$ may be constructed as a product of $r - 1$ linear functionals and one linear map from one of 
the tensor component spaces to $U$. However, this construction method is not discussed here. 


27.4.3 REMARK: Constructing multilinear maps from multiple linear functionals. 

Some simple multilinear functions in the set $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, for a finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ 
over a field $K$, can be constructed from linear functional families $w = (w_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ by applying each 
linear functional to the corresponding vector. That is, one may define the map 

\[ \forall v \in \times_{\alpha\in A} V_\alpha, \quad v \mapsto \prod_{\alpha\in A} w_\alpha(v_\alpha) \]

for any $w \in \times_{\alpha\in A} V_\alpha^*$. Definition 27.4.4 formalises this idea as the “canonical multilinear function map” 
$\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. This is a kind of “upgrade map” from a linear functional tuple to a 
multilinear function. It is not a bijection (except in special cases). 


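27.4.4 DEFINITION: The canonical multilinear function map for the set $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ of multilinear 
functions is the map $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ defined by 

\[ \forall w \in \times_{\alpha\in A} V_\alpha^*,\ \forall v \in \times_{\alpha\in A} V_\alpha, \quad \eta(w)(v) = \prod_{\alpha\in A} w_\alpha(v_\alpha) \tag{27.4.1} \]

for any finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$. 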


27.4.5 REMARK: Multilinearity of the canonical multilinear function map. 
Equation (27.4.1) in Definition 27.4.4 may be written more fully as follows. 


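\[ \forall (w_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*,\ \forall (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha, \quad \eta((w_\alpha)_{\alpha\in A})((v_\alpha)_{\alpha\in A}) = \prod_{\alpha\in A} w_\alpha(v_\alpha). \]

It is easy to show that $\eta(w)(v)$ is multilinear with respect to $v \in \times_{\alpha\in A} V_\alpha$ for all $w \in \times_{\alpha\in A} V_\alpha^*$. (See 
Theorem 27.6.6 for proof.) 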


27.4.6 DEFINITION: A simple multilinear function for a finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$ 
is a multilinear function $\eta(w) \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ for some $w \in \times_{\alpha\in A} V_\alpha^*$, where $\eta$ is the canonical multilinear 
function map in Definition 27.4.4. 
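
The construction in Definitions 27.4.4 and 27.4.6 can be tried out numerically. The following is a minimal sketch, not part of the formal development, which assumes that the component spaces are coordinate spaces over the rationals and that each linear functional is represented by a tuple of coefficients. The helper names `functional` and `eta` are hypothetical and are chosen only for this illustration.

```python
from fractions import Fraction
from math import prod


def functional(coeffs):
    """Hypothetical helper: the linear functional v -> sum_k coeffs[k] * v[k]."""
    return lambda v: sum(c * x for c, x in zip(coeffs, v))


def eta(w):
    """Send a tuple of linear functionals w to the function v -> prod_a w[a](v[a])."""
    return lambda v: prod(w_a(v_a) for w_a, v_a in zip(w, v))


# Two component spaces, assumed here to be Q^2 and Q^3.
w = (functional((1, 2)), functional((0, -1, 3)))
f = eta(w)  # a simple multilinear function in the sense of Definition 27.4.6

v1 = (Fraction(1), Fraction(4))
v1b = (Fraction(-2), Fraction(5))
v2 = (Fraction(2), Fraction(1), Fraction(1))
lam1, lam2 = Fraction(3), Fraction(-1, 2)

# Linearity in the first argument, with the second argument held fixed.
combo = tuple(lam1 * a + lam2 * b for a, b in zip(v1, v1b))
lhs = f((combo, v2))
rhs = lam1 * f((v1, v2)) + lam2 * f((v1b, v2))
print(lhs == rhs)
```

Exact rational arithmetic is used so that the equality test is not affected by rounding. The same check may be repeated for the second argument.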


27.4.7 REMARK: Analogy between simple multilinear functions and simple tensors. 

The simple multilinear functions in Definition 27.4.6 are analogous to the simple tensors in Section 28.4. 
The “canonical multilinear function map” in Definition 27.4.4 is likewise analogous to the “canonical tensor 
map” in Definition 28.2.2. 


27.4.8 REMARK: Multilinear functions are sums of products of monomials of homogeneous degree. 

It may seem that the canonical multilinear function map in Definition 27.4.4 and the simple multilinear 
functions in Definition 27.4.6 are almost trivial and superficial by comparison with the supposed complexity 
of tensor algebra. The products $\prod_{\alpha\in A} w_\alpha(v_\alpha)$ look like algebraic monomials, and that is exactly what they 
are. General multilinear functions for finite-dimensional component spaces $V_\alpha$ are no more and no less than 
sums of products of linear functionals which have homogeneous polynomial degree. 


[Figure 27.4.1: contour plot of $f(x_1, x_2) = x_1 x_2$ in the $(x_1, x_2)$ plane, with hyperbolic contour curves labelled by positive and negative values of $f$, and the zero contour $f(x_1, x_2) = 0$.] 
Figure 27.4.1 Example bilinear function $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}$, $f : (x_1, x_2) \mapsto x_1 x_2$ 


27.4.9 EXAMPLE: Contours of a bilinear function on two copies of the real numbers. 
Figure 27.4.1 illustrates a bilinear function $f \in \mathscr{L}(V_1, V_2; U)$ for the linear spaces $V_1 = V_2 = \mathbb{R}^1$ and $U = \mathbb{R}$. 
This is an example of a simple multilinear function. (See Definition 27.4.6.) 


All multilinear functions from $\mathbb{R}^1 \times \mathbb{R}^1$ to $\mathbb{R}$ are of the form $f(x_1, x_2) = k x_1 x_2$ for some $k \in \mathbb{R}$. The diagram 
uses $k = 1$. It is clear from the contour curves that $f$ is not a linear function on $\mathbb{R}^2$. A linear function of two 
variables would have straight lines for contours. (More generally, the contours of linear functions are always 
parallel hyperplanes, i.e. translations of a subspace.) However, the value of $f(x_1, x_2)$ for a constant value of 
$x_1$ does vary linearly with respect to $x_2$. 
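
The contrast between bilinearity and linearity on $\mathbb{R}^2$ in Example 27.4.9 can be confirmed by a small computation. The sketch below assumes $k = 1$ and uses integer arguments so that all comparisons are exact. The variable names are illustrative only.

```python
def f(x1, x2, k=1):
    """The bilinear function f(x1, x2) = k * x1 * x2 on R^1 x R^1."""
    return k * x1 * x2


# Linear in x2 when x1 is held constant.
x1 = 2
a, b = 5, -3
lam, mu = 3, 4
linear_in_x2 = f(x1, lam * a + mu * b) == lam * f(x1, a) + mu * f(x1, b)

# Not additive as a function of the pair (x1, x2) in R^2:
# f((1, 0) + (0, 1)) = f(1, 1) = 1, whereas f(1, 0) + f(0, 1) = 0.
additive_on_R2 = f(1 + 0, 0 + 1) == f(1, 0) + f(0, 1)

print(linear_in_x2, additive_on_R2)
```

The first test succeeds and the second fails, which is the algebraic counterpart of the curved contours in Figure 27.4.1.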


27.4.10 REMARK: Contrast between multilinear functions and linear functions of a vector variable. 

Example 27.4.9 hints at a general property of multilinear maps which makes them significantly different to 
linear functionals. In the case of linear functionals on a linear space $V$ which has a basis, the only vector in 
$V$ for which all linear functionals have the zero value is the zero vector. In other words, $(\forall w \in V^*,\ w(v) = 0) \Rightarrow v = 0$. 
(This follows from Theorem 23.6.8.) In the case of multilinear functionals of degree 2 or more, 
Theorem 27.2.30 implies that all multilinear functionals on a vector tuple $v = (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ have the 
value zero if one or more of the components $v_\beta$ equals zero. The set of such tuples is $\{v \in \times_{\alpha\in A} V_\alpha;\ \exists \beta \in A,\ v_\beta = 0\}$. 


By Theorem 23.6.9, if all linear functionals on a linear space $V$ which has a basis have the same value for 
two vectors in $V$, then the two vectors are the same. In the case of multilinear maps, equality of multilinear 
map values defines an equivalence relation on the tuple set $\times_{\alpha\in A} V_\alpha$. One of the equivalence classes of this 
relation is $\{v \in \times_{\alpha\in A} V_\alpha;\ \exists \beta \in A,\ v_\beta = 0\}$. All multilinear maps have value 0 on this set. The equivalence 
relation for the rest of $\times_{\alpha\in A} V_\alpha$ is characterised in Theorem 27.4.11. The equivalence classes are clearly 
$(r - 1)$-dimensional hyperbolic manifolds embedded in the product space $\times_{\alpha\in A} V_\alpha$, where $r = \#(A)$. These 
are generalisations of the contour lines in Figure 27.4.1. 


27.4.11 THEOREM: Equivalence of multilinear functions for pairs of linear functional families. 
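Let $(V_\alpha)_{\alpha\in A}$ be a non-empty finite family of linear spaces over a field $K$, which have bases. Define a relation 
“$\equiv$” on $\times_{\alpha\in A} V_\alpha$ by $u \equiv v$ whenever $\forall w \in \times_{\alpha\in A} V_\alpha^*,\ \eta(w)(u) = \eta(w)(v)$. Then 

\[ \forall u, v \in \times_{\alpha\in A} (V_\alpha \setminus \{0\}), \quad u \equiv v \ \Leftrightarrow\ \exists k \in (K \setminus \{0_K\})^A,\ \Big( \prod_{\alpha\in A} k_\alpha = 1 \text{ and } \forall \alpha \in A,\ u_\alpha = k_\alpha v_\alpha \Big). \tag{27.4.2} \]


PROOF: Suppose $u, v \in \times_{\alpha\in A} (V_\alpha \setminus \{0\})$ and $u \equiv v$. Then $\prod_{\alpha\in A} w_\alpha(u_\alpha) = \prod_{\alpha\in A} w_\alpha(v_\alpha)$ for all 
$w \in \times_{\alpha\in A} V_\alpha^*$ by Definition 27.4.4. For $\#(A) = 1$, let $A = \{\beta\}$ and $V_\beta = \{0\}$. Then line (27.4.2) is true because 
$\times_{\alpha\in A} (V_\alpha \setminus \{0\}) = \emptyset$ by Theorem 10.11.6 (ii). 

Suppose $A = \{\beta\}$ and $V_\beta \ne \{0\}$. Then $w(u_\beta) = w(v_\beta)$ for all $w \in V_\beta^*$. So by Theorem 23.6.9, $u_\beta = v_\beta$ 
because $V_\beta$ has a basis. So line (27.4.2) holds with $k_\beta = 1$. 

Suppose $\#(A) \ge 2$. If $V_\alpha = \{0\}$ for some $\alpha \in A$, then $\times_{\alpha\in A} (V_\alpha \setminus \{0\}) = \emptyset$ by Theorem 10.11.6 (ii). So 
line (27.4.2) is true because it is void. 

Suppose $\#(A) \ge 2$ and $V_\alpha \ne \{0\}$ for all $\alpha \in A$. Let $\beta \in A$. Let $A' = A \setminus \{\beta\}$. Then 
$w_\beta(u_\beta) \prod_{\alpha\in A'} w'_\alpha(u_\alpha) = w_\beta(v_\beta) \prod_{\alpha\in A'} w'_\alpha(v_\alpha)$ for all $w_\beta \in V_\beta^*$ and all $w' \in \times_{\alpha\in A'} V_\alpha^*$. 
Since $u_\alpha \ne 0$ for all $\alpha \in A'$, Theorem 23.5.3 implies that 
there is some $w' \in \times_{\alpha\in A'} V_\alpha^*$ for which $\prod_{\alpha\in A'} w'_\alpha(u_\alpha) \ne 0$. Then $k_\beta = \prod_{\alpha\in A'} w'_\alpha(v_\alpha) / \prod_{\alpha\in A'} w'_\alpha(u_\alpha)$ is 
well defined. For such $w'$, $w_\beta(u_\beta) = k_\beta w_\beta(v_\beta)$ for all $w_\beta \in V_\beta^*$. Therefore $u_\beta = k_\beta v_\beta$ by Theorem 23.6.10. 
Hence $\forall \alpha \in A,\ \exists k_\alpha \in K,\ u_\alpha = k_\alpha v_\alpha$. So $\prod_{\alpha\in A} w_\alpha(u_\alpha) = \big(\prod_{\alpha\in A} k_\alpha\big) \prod_{\alpha\in A} w_\alpha(v_\alpha)$ for all $w \in \times_{\alpha\in A} V_\alpha^*$. 
The left hand side equals $\prod_{\alpha\in A} w_\alpha(v_\alpha)$ because $u \equiv v$, and this product is non-zero for some $w$ because $v_\alpha \ne 0$ for all $\alpha \in A$. So 
$\prod_{\alpha\in A} k_\alpha = 1$. Then clearly $k \in (K \setminus \{0_K\})^A$. So line (27.4.2) is verified. 

To show the converse, let $u, v \in \times_{\alpha\in A} (V_\alpha \setminus \{0\})$ and $k \in (K \setminus \{0_K\})^A$ be such that $\prod_{\alpha\in A} k_\alpha = 1_K$ and 
$\forall \alpha \in A,\ u_\alpha = k_\alpha v_\alpha$. Then $\prod_{\alpha\in A} w_\alpha(u_\alpha) = \prod_{\alpha\in A} w_\alpha(k_\alpha v_\alpha) = \big(\prod_{\alpha\in A} k_\alpha\big) \prod_{\alpha\in A} w_\alpha(v_\alpha) = \prod_{\alpha\in A} w_\alpha(v_\alpha)$ 
for all $w \in \times_{\alpha\in A} V_\alpha^*$. So $\eta(w)(u) = \eta(w)(v)$ for all $w \in \times_{\alpha\in A} V_\alpha^*$. So $u \equiv v$. 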
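
The converse direction of Theorem 27.4.11 is easy to observe in a computation: rescaling the components of a tuple by factors whose product is 1 does not change the value of any simple multilinear function. The following sketch assumes three component spaces over the rationals and represents linear functionals by coefficient tuples. The names `eta`, `rescale`, `ws`, `k` and `k_bad` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from math import prod


def eta(w):
    """Product of the values w[a](v[a]), with functionals given as coefficient tuples."""
    return lambda v: prod(
        sum(c * x for c, x in zip(w_a, v_a)) for w_a, v_a in zip(w, v)
    )


def rescale(k, v):
    """Multiply each component v[a] by the scalar k[a]."""
    return tuple(tuple(k_a * x for x in v_a) for k_a, v_a in zip(k, v))


v = ((Fraction(1), Fraction(2)), (Fraction(3), Fraction(-1)), (Fraction(5),))

k = (Fraction(2), Fraction(3), Fraction(1, 6))      # product of factors is 1
k_bad = (Fraction(2), Fraction(3), Fraction(1, 3))  # product of factors is 2

# A few sample families of linear functionals.
ws = [
    ((1, 0), (0, 1), (1,)),
    ((2, -1), (1, 1), (-3,)),
    ((1, 1), (4, 2), (7,)),
]

same_values = all(eta(w)(rescale(k, v)) == eta(w)(v) for w in ws)
some_difference = any(eta(w)(rescale(k_bad, v)) != eta(w)(v) for w in ws)
print(same_values, some_difference)
```

A finite sample of functional families cannot prove equivalence, of course. It only illustrates the mechanism used in the proof.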


27.4.12 REMARK:  Function-transposition of the canonical multilinear function map. 
If the canonical multilinear function map $\eta$ in Definition 27.4.4 is transposed, the result is the map $\eta^T$ in 
Definition 27.4.13. (The map $\eta^T$ is shown to be multilinear in Theorem 27.6.7.) 


27.4.13 DEFINITION: The canonical multilinear function map transpose for a set of multilinear functions 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the map $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ defined by 

\[ \forall v \in \times_{\alpha\in A} V_\alpha,\ \forall w \in \times_{\alpha\in A} V_\alpha^*, \quad \eta^T(v)(w) = \prod_{\alpha\in A} w_\alpha(v_\alpha) \]

for any finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$. 


27.4.14 DEFINITION: A simple dual multilinear function for a finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over 
a field $K$ is a multilinear function $\eta^T(v) \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ for some $v \in \times_{\alpha\in A} V_\alpha$, where $\eta^T$ is the canonical 
multilinear function map transpose in Definition 27.4.13. 


27.4.15 THEOREM: Inverse image of zero vector of a canonical multilinear function map transpose. 
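Let $(V_\alpha)_{\alpha\in A}$ be a non-empty finite family of linear spaces, which have bases, over a field $K$. Then 

\[ (\eta^T)^{-1}(\{0\}) = \{v \in \times_{\alpha\in A} V_\alpha;\ \exists \beta \in A,\ v_\beta = 0\}, \]

where $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is the canonical multilinear function map transpose in Definition 27.4.13. 


PROOF: Let $X = \{v \in \times_{\alpha\in A} V_\alpha;\ \exists \beta \in A,\ v_\beta = 0\}$. Then $\forall v \in X,\ \forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K),\ f(v) = 0$ by 
Theorem 27.2.30. But $\eta(w) \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ for all $w \in \times_{\alpha\in A} V_\alpha^*$ by Definition 27.4.4. So $\eta(w)(v) = 0$ for 
all $v \in X$ and $w \in \times_{\alpha\in A} V_\alpha^*$. So $\eta^T(v)(w) = 0$ for all $v \in X$ and $w \in \times_{\alpha\in A} V_\alpha^*$. So $\eta^T(v) = 0$ for all $v \in X$. 
So $\eta^T(X) = \{0\}$. So $(\eta^T)^{-1}(\{0\}) \supseteq X$. 

Now suppose that $v \in (\eta^T)^{-1}(\{0\})$. Then $\eta^T(v)(w) = 0$ for all $w \in \times_{\alpha\in A} V_\alpha^*$. So $\prod_{\alpha\in A} w_\alpha(v_\alpha) = 0$ for all 
$w \in \times_{\alpha\in A} V_\alpha^*$ by Definition 27.4.13. If $\#(A) = 0$, then $(\eta^T)^{-1}(\{0\}) = \emptyset$ and $X = \emptyset$. So $(\eta^T)^{-1}(\{0\}) = X$ 
as claimed. So suppose that $\#(A) \ge 1$. Suppose $v_\alpha \ne 0$ for all $\alpha \in A$. Then for each $\alpha \in A$, there is 
a $w_\alpha \in V_\alpha^*$ such that $w_\alpha(v_\alpha) \ne 0$ by Theorem 23.5.3. So $\prod_{\alpha\in A} w_\alpha(v_\alpha) \ne 0$. So $v \notin (\eta^T)^{-1}(\{0\})$. Thus 
$v \notin X \Rightarrow v \notin (\eta^T)^{-1}(\{0\})$. So $v \in (\eta^T)^{-1}(\{0\}) \Rightarrow v \in X$. Hence $(\eta^T)^{-1}(\{0\}) \subseteq X$ as claimed. 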


27.4.16 THEOREM: Inverse images of a canonical multilinear function map transpose. 
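Let $(V_\alpha)_{\alpha\in A}$ be a non-empty finite family of linear spaces over a field $K$, which have bases. Then 

\[ \forall \phi \in \eta^T(\times_{\alpha\in A} V_\alpha),\ \forall v \in (\eta^T)^{-1}(\{\phi\}), \quad (\eta^T)^{-1}(\{\phi\}) = \Big\{ (k_\alpha v_\alpha)_{\alpha\in A};\ k \in K^A \text{ and } \prod_{\alpha\in A} k_\alpha = 1 \Big\}, \]

where $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is the canonical multilinear function map transpose which is given 
in Definition 27.4.13. 

PROOF: If $\phi \in \eta^T(\times_{\alpha\in A} V_\alpha)$, then $\phi = \eta^T(v)$ for some $v \in \times_{\alpha\in A} V_\alpha$. So 

\begin{align}
\forall u \in \times_{\alpha\in A} V_\alpha, \quad u \in (\eta^T)^{-1}(\{\phi\}) &\Leftrightarrow \eta^T(u) = \phi \notag\\
&\Leftrightarrow \eta^T(u) = \eta^T(v) \notag\\
&\Leftrightarrow \forall w \in \times_{\alpha\in A} V_\alpha^*,\ \eta^T(u)(w) = \eta^T(v)(w) \notag\\
&\Leftrightarrow \forall w \in \times_{\alpha\in A} V_\alpha^*,\ \eta(w)(u) = \eta(w)(v) \notag\\
&\Leftrightarrow \exists k \in K^A,\ \Big( \prod_{\alpha\in A} k_\alpha = 1_K \text{ and } \forall \alpha \in A,\ u_\alpha = k_\alpha v_\alpha \Big) \tag{27.4.3}\\
&\Leftrightarrow u \in \Big\{ (k_\alpha v_\alpha)_{\alpha\in A};\ k \in K^A \text{ and } \prod_{\alpha\in A} k_\alpha = 1 \Big\}, \notag
\end{align}

where line (27.4.3) follows from Theorem 27.4.11. 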


27.5. Components for multilinear maps 


27.5.1 REMARK: The applicability of components of multilinear maps. 

Linear maps map infinite sets to infinite sets, but the redundancy in linear maps can be exploited. A linear 
map may be represented by its components with respect to bases for the domain and range. Thus a linear 
map may be represented by a matrix. Such coordinatisation offers advantages such as compactness and ease 
of computation. 


In the same way, because of the linearity properties of multilinear maps, the redundancy of information in 
the maps may be exploited by using components. Demonstrations of many basic properties of tensors are 
most easily achieved by expressing tensors and multilinear maps in terms of bases consisting of “simple” 
tensors and multilinear maps. Matrix representation is the most usual way of specifying multilinear maps 
for practical calculations. Hence “coordinates” cannot be easily avoided. 


It could be argued that the use of coordinates is onerous. The following steps are typically followed to 
calculate $f(v_1,\ldots v_r)$ for an $r$-linear map $f$ and a vector tuple $(v_1,\ldots v_r) \in V^r$ using coordinates. 


(i) Choose a basis $(e_i)_{i=1}^n \in V^n$ for $V$. 

(ii) Determine the value $a_{i_1,\ldots i_r} = f(e_{i_1},\ldots e_{i_r})$ for each of the $n^r$ tuples $(e_{i_1},\ldots e_{i_r})$. 
(iii) For each $j \in \mathbb{N}_r$, determine coordinates $(v_{j,i})_{i=1}^n$ such that $v_j = \sum_{i=1}^n v_{j,i} e_i$. 
(iv) Calculate $f(v_1,\ldots v_r)$ as $\sum_{i_1=1}^n \cdots \sum_{i_r=1}^n a_{i_1,\ldots i_r} v_{1,i_1} \cdots v_{r,i_r}$. 


This seems like a lot of hard work for nothing. Why not just calculate $f(v_1,\ldots v_r)$ directly instead of 
converting both the map $f$ and the vectors $v_1,\ldots v_r$ to coordinates? The problem with the expression 
$f(v_1,\ldots v_r)$ is that it requires an infinite number of function values whereas the coordinate approach requires 
a finite number $n^r$ of coefficients to completely specify the map, and each vector $v_j$ requires only $n$ coefficients 
to specify it. On the other hand, the finite set of coefficients $a_{i_1,\ldots i_r} = f(e_{i_1},\ldots e_{i_r})$ is different for each 
basis, and transforming coordinates between bases is tedious work. 


The abstract form $f(v_1,\ldots v_r)$ is best when dealing with abstract relations between multilinear maps, but for 
concrete calculations, it is best to use coordinates. Bases and coordinates cannot be avoided in many of the 
definitions and basic theorems in the initial presentation of tensor algebra. Thereafter one may use abstract 
formulas for abstract theorems, returning to coordinates whenever concrete calculations are required. 
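
Steps (i) to (iv) in this remark can be sketched in a few lines of code. The example below assumes $V = \mathbb{Q}^2$ with its standard basis and $r = 3$, and uses an arbitrarily chosen trilinear function `f` which is purely hypothetical. Indices run from 0 rather than 1, following the usual programming convention.

```python
from itertools import product
from math import prod

n, r = 2, 3


def f(v1, v2, v3):
    """A sample trilinear function on Q^2 (hypothetical, for illustration only)."""
    return (v1[0] + 2 * v1[1]) * (v2[0] * v3[1] - v2[1] * v3[0])


# Step (i): choose a basis. Here it is the standard basis of Q^2.
e = [(1, 0), (0, 1)]

# Step (ii): compute the n**r coefficients a[i] = f(e[i_1], ..., e[i_r]).
a = {i: f(*(e[i_j] for i_j in i)) for i in product(range(n), repeat=r)}

# Step (iii): with the standard basis, the coordinates of a vector are its entries.
vs = [(3, -1), (2, 5), (-4, 7)]

# Step (iv): evaluate the multiple sum over all index tuples.
value = sum(a[i] * prod(vs[j][i[j]] for j in range(r)) for i in a)

print(len(a), value, f(*vs))
```

The table `a` has $n^r = 8$ entries, and this finite table determines the value of `f` on every vector tuple. For a basis other than the standard basis, step (iii) would require solving a linear system for each vector.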


27.5.2 REMARK:  Coordinatisation of multilinear functions and multilinear maps. 

In Definition 27.2.3, the domain of the multilinear map $f$ is a finite Cartesian product $\times_{\alpha\in A} V_\alpha$ of linear 
spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$. To coordinatise the multilinear map $f$, it must be assumed that the spaces 
$V_\alpha$ for $\alpha \in A$ all have bases if one has no faith in the axiom of choice. (See Remark 22.7.20.) 


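Coordinate expressions for multilinear maps are given in Theorems 27.5.3 and 27.5.4. Theorem 27.5.4 gives 
an equivalent expression for multilinear maps in terms of primal bases and canonical dual bases for the 
component spaces. (See Definition 23.7.3 for the canonical dual basis of a basis family.) Theorem 27.5.3 
gives technical expressions for simple multilinear functions in preparation for Theorem 27.5.4. 


27.5.3 THEOREM: Basis-and-coordinates expression for canonical multilinear function map $\eta$. 

Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$. Let $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for $V_\alpha$ 
for each $\alpha \in A$, and let $(h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ be the canonical dual basis to each $B_\alpha$. Let $h_i$ denote the dual 
vector family $(h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for each index family $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Then 

\[ \forall i \in I,\ \forall v \in \times_{\alpha\in A} V_\alpha, \quad \eta(h_i)(v) = \prod_{\alpha\in A} h^\alpha_{i_\alpha}(v_\alpha) = \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha), \]

where $\kappa_{B_\alpha} : V_\alpha \to \mathrm{Fin}(I_\alpha, K)$ is the coordinate map for $B_\alpha$ in Definition 22.8.7, and $\eta : \times_{\alpha\in A} V_\alpha^* \to 
\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map in Definition 27.4.4. 


PROOF: From Definition 27.4.4, it follows that $\eta(h_i)(v) = \prod_{\alpha\in A} (h_i)_\alpha(v_\alpha) = \prod_{\alpha\in A} h^\alpha_{i_\alpha}(v_\alpha)$ for all $i \in I$ 
and $v \in \times_{\alpha\in A} V_\alpha$. From Definition 23.7.3, it follows that $h^\alpha_{i_\alpha}(v_\alpha) = \kappa_{B_\alpha}(v_\alpha)(i_\alpha)$ for all $i \in I$ and $\alpha \in A$. So 
$\prod_{\alpha\in A} h^\alpha_{i_\alpha}(v_\alpha) = \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)$. 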

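27.5.4 THEOREM: Basis-and-coordinates expression for $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$. 

Let $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ be a multilinear map, where $(V_\alpha)_{\alpha\in A}$ is a finite family of linear spaces over a field $K$, 
and $U$ is a linear space over $K$. Let $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for $V_\alpha$ for each $\alpha \in A$. Let $e_i$ denote the 
basis vector family $(e^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ for each index family $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Let $v = (v_\alpha)_{\alpha\in A}$ 
denote an element of $\times_{\alpha\in A} V_\alpha$. Then 

\begin{align}
\forall v \in \times_{\alpha\in A} V_\alpha, \quad f(v) &= \sum_{i\in I} f(e_i) \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha) \tag{27.5.1}\\
&= \sum_{i\in I} f(e_i) \prod_{\alpha\in A} h^\alpha_{i_\alpha}(v_\alpha) \tag{27.5.2}\\
&= \sum_{i\in I} f(e_i)\, \eta(h_i)(v), \tag{27.5.3}
\end{align}

where $\kappa_{B_\alpha} : V_\alpha \to \mathrm{Fin}(I_\alpha, K)$ is the coordinate map defined for $B_\alpha$ in Definition 22.8.7, and the linear 
functional families $h_i = (h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for $i \in I$ are as in Definition 23.7.3, and $\eta : \times_{\alpha\in A} V_\alpha^* \to 
\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map in Definition 27.4.4. Hence 

\[ f = \sum_{i\in I} f(e_i)\, \eta(h_i). \tag{27.5.4} \]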


PROOF: Let $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ be a multilinear map, where $(V_\alpha)_{\alpha\in A}$ is a finite family of linear spaces 
over a field $K$, and $U$ is a linear space over $K$. Let $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for $V_\alpha$ for each $\alpha \in A$. 
Let $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ for each index family $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Then by Definition 22.8.7, 
$v_\alpha = \sum_{i_\alpha\in I_\alpha} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)\, e^\alpha_{i_\alpha}$ for all $\alpha \in A$, for all $v = (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$. Therefore 

\[ \forall v \in \times_{\alpha\in A} V_\alpha, \quad f(v) = f\Big( \Big( \sum_{i_\alpha\in I_\alpha} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)\, e^\alpha_{i_\alpha} \Big)_{\alpha\in A} \Big). \]

This expression for $f(v)$ may be expanded for a particular $\beta \in A$ using the definition of linearity of $f$ with 
respect to $v_\beta$. 

\[ \forall v \in \times_{\alpha\in A} V_\alpha, \quad f(v) = \sum_{i_\beta\in I_\beta} \kappa_{B_\beta}(v_\beta)(i_\beta)\, f\big( v\big|_{v_\beta = e^\beta_{i_\beta}} \big), \]

where the expression $v\big|_{v_\beta = e^\beta_{i_\beta}}$ means the result of substituting $e^\beta_{i_\beta}$ for $v_\beta$ in $v = (v_\alpha)_{\alpha\in A}$. This expansion of 
$f$ may be continued inductively on $\beta \in A$ to obtain line (27.5.1). Lines (27.5.2), (27.5.3) and (27.5.4) follow 
from Theorem 27.5.3. 


27.5.5 REMARK: Coordinatisation of multilinear maps on dual spaces families. 

Theorem 27.5.7 is a dual version of Theorem 27.5.4. It is noteworthy that the spaces $V_\alpha$ are required to be 
finite-dimensional in Theorem 27.5.7 whereas this is not required in Theorem 27.5.4. Infinite-dimensional 
linear spaces do not have a simple canonical dual basis. (See Remark 23.7.4 for discussion of this.) 
Although Theorem 27.5.7 is similar to Theorem 27.5.4, there are significant differences. Theorem 27.5.6 gives 
expressions for the action of some simple multilinear functions as technical preparation for Theorem 27.5.7. 


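27.5.6 THEOREM: Basis-and-coordinates expression for canonical multilinear function map transpose $\eta^T$. 
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ 
be a basis for $V_\alpha$ for $\alpha \in A$. Let $B_\alpha^* = (h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ be the canonical dual basis to each $B_\alpha$. 
Let $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A}$ for each index family $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Then 

\[ \forall i \in I,\ \forall w \in \times_{\alpha\in A} V_\alpha^*, \quad \eta^T(e_i)(w) = \prod_{\alpha\in A} w_\alpha(e^\alpha_{i_\alpha}) = \prod_{\alpha\in A} \kappa_{B_\alpha^*}(w_\alpha)(i_\alpha), \]

where $\kappa_{B_\alpha^*} : V_\alpha^* \to \mathrm{Fin}(I_\alpha, K)$ is the coordinate map for $B_\alpha^*$ in Definition 23.9.1, and $\eta^T : \times_{\alpha\in A} V_\alpha \to 
\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is the canonical multilinear function map transpose in Definition 27.4.13. 


PROOF: From Definition 27.4.13, it follows that $\eta^T(e_i)(w) = \prod_{\alpha\in A} w_\alpha((e_i)_\alpha) = \prod_{\alpha\in A} w_\alpha(e^\alpha_{i_\alpha})$ for all $i \in I$ 
and $w \in \times_{\alpha\in A} V_\alpha^*$. From Definition 23.9.1, it follows that $w_\alpha(e^\alpha_{i_\alpha}) = \kappa_{B_\alpha^*}(w_\alpha)(i_\alpha)$ for all $i \in I$ and $\alpha \in A$. 
So $\prod_{\alpha\in A} w_\alpha(e^\alpha_{i_\alpha}) = \prod_{\alpha\in A} \kappa_{B_\alpha^*}(w_\alpha)(i_\alpha)$. 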


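27.5.7 THEOREM: Basis-and-coordinates expression for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; U)$. 

Let $f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U)$ be a multilinear map, where $(V_\alpha)_{\alpha\in A}$ is a finite family of finite-dimensional linear 
spaces over a field $K$, and $U$ is a linear space over $K$. Let $B_\alpha^* = (h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ denote the canonical 
dual basis for $V_\alpha^*$ corresponding to a primal basis $(e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ for each $\alpha \in A$. Let $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ 
and $h_i = (h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for each index family $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Let $w = (w_\alpha)_{\alpha\in A}$ denote 
an element of $\times_{\alpha\in A} V_\alpha^*$. Then 

\begin{align}
\forall w \in \times_{\alpha\in A} V_\alpha^*, \quad f(w) &= \sum_{i\in I} f(h_i) \prod_{\alpha\in A} \kappa_{B_\alpha^*}(w_\alpha)(i_\alpha) \tag{27.5.5}\\
&= \sum_{i\in I} f(h_i) \prod_{\alpha\in A} w_\alpha(e^\alpha_{i_\alpha}) \tag{27.5.6}\\
&= \sum_{i\in I} f(h_i)\, \eta(w)(e_i), \tag{27.5.7}
\end{align}

where $\kappa_{B_\alpha^*} : V_\alpha^* \to \mathrm{Fin}(I_\alpha, K)$ is the coordinate map defined for $B_\alpha^*$ in Definition 23.9.1, and the families 
of linear functionals $B_\alpha^* = (h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ are defined in Definition 23.7.3, and $\eta : \times_{\alpha\in A} V_\alpha^* \to 
\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map in Definition 27.4.4. Hence 

\[ f = \sum_{i\in I} f(h_i)\, \eta^T(e_i). \tag{27.5.8} \]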


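PROOF: Let $f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U)$ be a multilinear map, where $(V_\alpha)_{\alpha\in A}$ is a finite family of finite-dimensional 
linear spaces over a field $K$, and $U$ is a linear space over $K$. Let $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis 
for $V_\alpha$ for each $\alpha \in A$. Let $e_i$ denote the basis vector family $(e^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ for each index family 
$i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Let $B_\alpha^* = (h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ denote the canonical dual basis for $V_\alpha^*$ 
corresponding to $B_\alpha$ for each $\alpha \in A$. Then $w_\alpha = \sum_{i_\alpha\in I_\alpha} w_\alpha(e^\alpha_{i_\alpha})\, h^\alpha_{i_\alpha}$ for all $\alpha \in A$ by Theorem 23.7.10, for 
all $w = (w_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$. Therefore 

\[ \forall w \in \times_{\alpha\in A} V_\alpha^*, \quad f(w) = f\Big( \Big( \sum_{i_\alpha\in I_\alpha} w_\alpha(e^\alpha_{i_\alpha})\, h^\alpha_{i_\alpha} \Big)_{\alpha\in A} \Big). \]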


[ www. geometry. org/dg. html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




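This expression for $f(w)$ may be expanded for a particular $\beta \in A$ using the definition of linearity of $f$ with 
respect to $w_\beta$. 

\[ \forall w \in \times_{\alpha\in A} V_\alpha^*, \quad f(w) = \sum_{i_\beta\in I_\beta} w_\beta(e^\beta_{i_\beta})\, f\big( w\big|_{w_\beta = h^\beta_{i_\beta}} \big), \]

where $w\big|_{w_\beta = h^\beta_{i_\beta}}$ means the result of substituting $h^\beta_{i_\beta}$ for $w_\beta$ in $w = (w_\alpha)_{\alpha\in A}$. This expansion of $f$ may be 
continued inductively on $\beta \in A$ to obtain line (27.5.6). Then lines (27.5.5), (27.5.7) and (27.5.8) follow from 
Theorem 27.5.6 and the definitions of $\eta$ and its function-transpose, $\eta^T$. 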


27.5.8 REMARK:  Decomposition of multilinear functions into simple multilinear functions. 

In Section 27.6, it will follow from lines (27.5.4) and (27.5.8) in Theorems 27.5.4 and 27.5.7 that general 
multilinear functions may be expressed as linear combinations of simple multilinear functions. This is 
formalised for $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ in Theorem 27.6.11. 
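
The decomposition on line (27.5.4) can be checked numerically for a basis which is not the standard basis. The sketch below assumes $V_1 = V_2 = \mathbb{Q}^2$ with the basis $e_1 = (1,1)$, $e_2 = (1,2)$, whose canonical dual basis consists of the rows of the inverse matrix. The bilinear function `f` and the helper `apply` are hypothetical choices made only for this illustration.

```python
from itertools import product

# A basis of Q^2 and its canonical dual basis (rows of the inverse of [[1, 1], [1, 2]]).
e = [(1, 1), (1, 2)]
h = [(2, -1), (-1, 1)]


def apply(functional, vec):
    """Evaluate a linear functional, given as a coefficient tuple, on a vector."""
    return sum(c * x for c, x in zip(functional, vec))


def f(v, w):
    """A sample bilinear function on Q^2 x Q^2 (hypothetical)."""
    return v[0] * w[1] + 3 * v[1] * w[0] - v[1] * w[1]


def f_reconstructed(v, w):
    """Sum over index pairs (i, j) of f(e_i, e_j) * h_i(v) * h_j(w)."""
    return sum(
        f(e[i], e[j]) * apply(h[i], v) * apply(h[j], w)
        for i, j in product(range(2), repeat=2)
    )


# Duality check: h_i(e_j) is 1 if i == j and 0 otherwise.
dual_ok = all(
    apply(h[i], e[j]) == (1 if i == j else 0) for i in range(2) for j in range(2)
)

v, w = (3, -4), (5, 2)
print(dual_ok, f(v, w), f_reconstructed(v, w))
```

Each summand is the scalar $f(e_i, e_j)$ multiplied by a simple bilinear function, so the sum is a linear combination of simple multilinear functions.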


27.5.9 EXAMPLE: Decomposition of bilinear functions into simple bilinear functions. 
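Let $V$ be a linear space over a field $K$ with basis $(e_i)_{i=1}^n \in V^n$ for some $n \in \mathbb{Z}^+$. Let $f : V \times V \to K$ be a 
bilinear function. Let $a_{ij} = f(e_i, e_j)$ for $i, j \in \mathbb{N}_n$. Then the matrix $a = (a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ satisfies 

\[ \forall v, w \in V, \quad f(v, w) = f\Big( \sum_{i=1}^n v_i e_i,\ \sum_{j=1}^n w_j e_j \Big) = \sum_{i,j=1}^n a_{ij} v_i w_j, \]

where $v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{j=1}^n w_j e_j$. Thus the action of the bilinear function $f$ on $v$ and $w$ is equivalent 
to the action of the matrix $a$ on the products of coordinates of $v$ and $w$. 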


27.5.10 EXAMPLE: Reconstruction of bilinear functions from simple bilinear functions. 

As a converse to Example 27.5.9, let $b \in M_{n,n}(K)$ be a matrix. Then the map $\beta : V \times V \to K$ with 
$\beta : (v, w) \mapsto \sum_{i,j=1}^n b_{ij} v_i w_j$ for $v, w \in V$ defines a bilinear function on $V$. Then 
$\beta(e_i, e_j) = \sum_{k,\ell=1}^n b_{k\ell} (e_i)_k (e_j)_\ell = \sum_{k,\ell=1}^n b_{k\ell} \delta_{ik} \delta_{j\ell} = b_{ij}$ for all $i, j \in \mathbb{N}_n$. 
Thus every bilinear functional corresponds to a matrix and every 
matrix yields a bilinear form. 


The correspondence between matrices and bilinear maps is a bijection. So matrices offer an equivalent way 
of expressing bilinear maps. (This is illustrated in Figure 27.5.1, which is related to Figure 25.7.1.) 
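
The round trip from a matrix to a bilinear function and back to the matrix may be sketched as follows. The example assumes $V = \mathbb{Q}^3$ with the standard basis, so that the coordinates of a vector are simply its entries, and it uses an arbitrary sample matrix `b`. Indices start at 0.

```python
n = 3

# A sample matrix (hypothetical values).
b = [
    [1, 2, 0],
    [0, -1, 4],
    [5, 0, 3],
]


def beta(v, w):
    """The bilinear function (v, w) -> sum_{i,j} b[i][j] * v[i] * w[j]."""
    return sum(b[i][j] * v[i] * w[j] for i in range(n) for j in range(n))


# Standard basis vectors of Q^3.
e = [tuple(1 if k == i else 0 for k in range(n)) for i in range(n)]

# Recover the matrix from the values of beta on pairs of basis vectors.
a = [[beta(e[i], e[j]) for j in range(n)] for i in range(n)]

print(a == b)
```

The recovered matrix `a` agrees with `b`, which is the computation $\beta(e_i, e_j) = b_{ij}$ in Example 27.5.10.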


[Figure 27.5.1: diagram relating the linear space $V \times V$, the tuple space $K^n \times K^n$, the component maps, and the field $K$.] 
Figure 27.5.1 Components of a bilinear map with respect to a basis 


27.5.11 REMARK: Converting between colloquial (covariant) tensors and well-defined multilinear maps. 
In the literature, one frequently sees expressions such as “the tensor $g_{ij}$”. The real meaning of such an 
expression is something like “the bilinear map $g : V \times V \to \mathbb{R}$ with $g(e_i, e_j) = g_{ij}$ for all $i, j \in \mathbb{N}_n$, for some 
basis $(e_i)_{i=1}^n$ for $V$”, where $n \in \mathbb{Z}^+$. Clearly the shorter expression does have advantages, as long as one 
does not forget that the underlying bilinear map is the “real thing”. Examples 27.5.9 and 27.5.10 show how 
to convert from bilinear maps to matrices and back again. 


27.6. Linear spaces of multilinear maps 


27.6.1 REMARK: Definition, bases and dimension of linear spaces of multilinear maps. 

Since tensor spaces will be defined as the linear-space dual of multilinear map spaces in Definition 28.1.2, it is 
necessary to define linear space structure for the sets of multilinear maps $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ in Definition 27.2.3. 
In Section 27.6, these linear spaces are defined using pointwise vector addition and scalar multiplication. 
Then a basis can be constructed for multilinear maps in terms of given bases for the component spaces. 
From this, the linear space dimension of spaces of multilinear maps can be calculated, and component maps 
may be constructed with respect to the bases. (Note that the word “component” has two meanings here, 
namely the component spaces $V_\alpha$ and the component-tuples in the field $K$.) 


27.6.2 THEOREM: Pointwise linear space operations for multilinear maps. 
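The set $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ is closed under pointwise vector addition and scalar multiplication defined by 

\[ \forall \lambda_1, \lambda_2 \in K,\ \forall f_1, f_2 \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U),\ \forall v \in \times_{\alpha\in A} V_\alpha, \quad (\lambda_1 f_1 + \lambda_2 f_2)(v) = \lambda_1 f_1(v) + \lambda_2 f_2(v). \]


PROOF: Let $g, h \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ and let $f = g + h$ be the pointwise sum of $g$ and $h$. The expression $f(u)$ 
in Definition 27.2.3 evaluates to $g(u) + h(u) = \lambda_1 g(v) + \lambda_2 g(w) + \lambda_1 h(v) + \lambda_2 h(w) = \lambda_1 f(v) + \lambda_2 f(w)$, as 
required. Similarly, for $\lambda \in K$ and $f = \lambda g$, $f(u)$ evaluates to $\lambda g(u) = \lambda(\lambda_1 g(v) + \lambda_2 g(w)) = \lambda_1 f(v) + \lambda_2 f(w)$, 
as required. 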
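
The closure property in Theorem 27.6.2 can be illustrated with functions as first-class values. The following sketch assumes two sample bilinear functions `g` and `h` on pairs of integer vectors, and hypothetical helpers `add` and `scale` for the pointwise operations. It checks linearity of a pointwise combination in its first argument.

```python
def add(f1, f2):
    """Pointwise sum of two maps with the same arguments."""
    return lambda *vs: f1(*vs) + f2(*vs)


def scale(lam, f1):
    """Pointwise multiple of a map by the scalar lam."""
    return lambda *vs: lam * f1(*vs)


def g(v, w):
    """A sample bilinear function (hypothetical)."""
    return v[0] * w[1]


def h(v, w):
    """Another sample bilinear function (hypothetical)."""
    return v[1] * w[0] - 2 * v[0] * w[0]


f = add(scale(3, g), scale(-1, h))  # the pointwise combination 3g - h

v, vb, w = (1, 2), (4, -3), (5, 7)
lam1, lam2 = 2, -5
u = tuple(lam1 * x + lam2 * y for x, y in zip(v, vb))

lhs = f(u, w)
rhs = lam1 * f(v, w) + lam2 * f(vb, w)
print(lhs, rhs)
```

The two printed values agree, as the theorem requires for the first argument. The second argument may be tested in the same way.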


27.6.3 DEFINITION: The linear space of multilinear maps from a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over 
a field $K$ to a linear space $U$ over $K$ is the set $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ together with the operations of pointwise 
vector addition and scalar multiplication. 


The linear space of multilinear functions on a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$ is the set 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ together with the operations of pointwise vector addition and scalar multiplication. 


27.6.4 REMARK: Analogy of linear spaces of multilinear functions to “multilinear dual” spaces. 

The multilinear function space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.6.3 may be thought of as a kind of “multilinear 
dual” of the family $(V_\alpha)_{\alpha\in A}$, which is analogous to the dual of a single linear space. It is not truly a “dual” 
because the quintessential property of a dual is that the dual of a dual is isomorphic to the original space. 
The “multilinear dual” of the “multilinear dual” is not isomorphic to the original linear-space-family (except 
in trivial cases). 


The dual of a linear space is the space of linear functionals on that space, whereas $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the 
linear space of multilinear functionals. So the linear space of multilinear functions is only a generalisation of 
the concept of a space of linear functionals, not truly a dual in any genuine sense. Nevertheless, the phrase 
“multilinear dual” is a convenient shorthand for “linear space of multilinear functions”. 


27.6.5 REMARK: The specification tuple for the linear space of multilinear maps. 

More strictly speaking, the linear space of multilinear maps in Definition 27.6.3 is specified as the tuple 
$(K, W, \sigma_K, \tau_K, \sigma_W, \mu_{K,W})$, where $K \mathrel{-\!\!<} (K, \sigma_K, \tau_K)$ is the field, $W = \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ is the set of maps, 
$\sigma_W : W \times W \to W$ denotes pointwise addition on $W$, and $\mu_{K,W} : K \times W \to W$ denotes pointwise 
multiplication of elements of $W$ by elements of $K$. (See Definition 22.1.1 for linear spaces.) 


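27.6.6 THEOREM: The canonical multilinear function map is a multilinear map. 
The canonical multilinear function map $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.4.4 is a multilinear 
map for any finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$. 

PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$. Let $\beta \in A$. Let $\lambda_1, \lambda_2 \in K$. Let 
$w^0, w^1, w^2 \in \times_{\alpha\in A} V_\alpha^*$ satisfy $w^0_\beta = \lambda_1 w^1_\beta + \lambda_2 w^2_\beta$ and $\forall \alpha \in A \setminus \{\beta\},\ w^0_\alpha = w^1_\alpha = w^2_\alpha$. 

Let $v \in \times_{\alpha\in A} V_\alpha$. By Definition 27.4.4, 
$\eta(w^0)(v) = \prod_{\alpha\in A} w^0_\alpha(v_\alpha) = (\lambda_1 w^1_\beta + \lambda_2 w^2_\beta)(v_\beta) \prod_{\alpha\in A\setminus\{\beta\}} w^0_\alpha(v_\alpha) 
= \lambda_1 \prod_{\alpha\in A} w^1_\alpha(v_\alpha) + \lambda_2 \prod_{\alpha\in A} w^2_\alpha(v_\alpha) = \lambda_1 \eta(w^1)(v) + \lambda_2 \eta(w^2)(v)$. 
So $\eta(w^0) = \lambda_1 \eta(w^1) + \lambda_2 \eta(w^2)$. Hence $\eta$ is 
a multilinear map by Definition 27.2.3. 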


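27.6.7 THEOREM: The canonical multilinear function map transpose is a multilinear map. 
The canonical multilinear function map transpose $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.4.13 is 
a multilinear map for any finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$. 


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$. Let $\beta \in A$. Let $\lambda_1, \lambda_2 \in K$. Let 
$v^0, v^1, v^2 \in \times_{\alpha\in A} V_\alpha$ satisfy $v^0_\beta = \lambda_1 v^1_\beta + \lambda_2 v^2_\beta$ and $\forall \alpha \in A \setminus \{\beta\},\ v^0_\alpha = v^1_\alpha = v^2_\alpha$. 
Let $w \in \times_{\alpha\in A} V_\alpha^*$. By Definition 27.4.13, 
$\eta^T(v^0)(w) = \prod_{\alpha\in A} w_\alpha(v^0_\alpha) = w_\beta(\lambda_1 v^1_\beta + \lambda_2 v^2_\beta) \prod_{\alpha\in A\setminus\{\beta\}} w_\alpha(v^0_\alpha) 
= \lambda_1 \prod_{\alpha\in A} w_\alpha(v^1_\alpha) + \lambda_2 \prod_{\alpha\in A} w_\alpha(v^2_\alpha) = \lambda_1 \eta^T(v^1)(w) + \lambda_2 \eta^T(v^2)(w)$. 
So $\eta^T(v^0) = \lambda_1 \eta^T(v^1) + \lambda_2 \eta^T(v^2)$. Hence 
$\eta^T$ is a multilinear map by Definition 27.2.3. 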


27.6.8 REMARK: Properties of the canonical basis for the multilinear function space. 

For the “canonical basis” family $(\phi_i)_{i\in I}$ defined in Definition 27.6.9 for the multilinear function space 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, it is shown in Theorem 27.6.10 that $\phi_i \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ for all $i \in I$, and the family 
is shown to be a basis for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Theorem 27.6.11. 


Regrettably, the family $(\phi_i)_{i\in I}$ does not span $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ if one or more of the component spaces $V_\alpha$ is 
infinite-dimensional. (See Remark 23.7.4.) So the “basis” is only a basis for finite-dimensional spaces. 


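27.6.9 DEFINITION: The canonical basis for the multilinear function space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ corresponding 
to the bases $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ for $V_\alpha$ for $\alpha \in A$ is the family $(\phi_i)_{i\in I}$ where $I = \times_{\alpha\in A} I_\alpha$ and 
$\phi_i : \times_{\alpha\in A} V_\alpha \to K$ is defined for $i = (i_\alpha)_{\alpha\in A} \in I$ by 

\[ \forall v = (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha, \quad \phi_i(v) = \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha), \]

where $\kappa_{B_\alpha}$ is the component map for $B_\alpha$ in Definition 22.8.7. 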


27.6.10 THEOREM: Canonical basis elements are images of linear functional families. 
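Let $(\phi_i)_{i\in I}$ be the “canonical basis” for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.6.9, where $(V_\alpha)_{\alpha\in A}$ is a finite family 
of linear spaces over a field $K$. Then 

\[ \forall i \in I, \quad \phi_i = \eta(h_i) \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K), \]

where $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map in Definition 27.4.4, and $h_i$ 
denotes the family $(h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for all $i \in I$, where $(h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ is the canonical dual basis 
to each primal basis $B_\alpha$. 


PROOF: The equation $\phi_i = \eta(h_i)$ follows from Theorem 27.5.3. Therefore $\phi_i \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. 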


27.6.11 THEOREM: The canonical basis is a basis if the component spaces are finite-dimensional. 

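Let $(\phi_i)_{i\in I}$ be the “canonical basis” for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.6.9, where $(V_\alpha)_{\alpha\in A}$ is a finite family 
of finite-dimensional linear spaces over a field $K$. Then $(\phi_i)_{i\in I}$ is a basis for the linear space of multilinear 
functions $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. 


PROOF: To show that $\{\phi_i;\ i \in I\}$ spans $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, let $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. Then by Theorem 27.5.4 
line (27.5.1), $f(v) = \sum_{i\in I} f(e_i) \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)$ for all $v \in \times_{\alpha\in A} V_\alpha$. Therefore by Definition 27.6.9, 
$f(v) = \sum_{i\in I} f(e_i)\, \phi_i(v)$ for all $v \in \times_{\alpha\in A} V_\alpha$. So $f = \sum_{i\in I} f(e_i)\, \phi_i$. Hence $\{\phi_i;\ i \in I\}$ spans $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. 

To show that the vectors $\phi_i$ for $i \in I$ are linearly independent, suppose that $\sum_{i\in I} k_i \phi_i = 0$ for some $k \in K^I$. 
Let $j \in I$. Then 

\begin{align*}
0 &= \sum_{i\in I} k_i\, \phi_i(e_j) \\
&= \sum_{i\in I} k_i \prod_{\alpha\in A} \kappa_{B_\alpha}(e^\alpha_{j_\alpha})(i_\alpha) \\
&= \sum_{i\in I} k_i \prod_{\alpha\in A} \delta_{i_\alpha j_\alpha} \\
&= k_j.
\end{align*}

So $k_j = 0$ for all $j \in I$. So the vectors $\phi_i$ for $i \in I$ are linearly independent. Hence $(\phi_i)_{i\in I}$ is a basis for 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. 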


27.6.12 REMARK: The canonical basis elucidates the structure of a multilinear function space. 
In view of Theorem 27.6.11, the quotation marks around the term “canonical basis” for families $(\phi_i)_{i\in I}$ for 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ may be removed when the spaces $V_\alpha$ are finite-dimensional. 


The functions $\phi_i$ in Definition 27.6.9 are simple multilinear functions according to Definition 27.4.6. Thus 
every linear space of multilinear functions with finite-dimensional component spaces is spanned by a finite 
set of simple multilinear functions. This elucidates the structure of the space of multilinear functions to a 
great extent. Theorem 27.6.13 gives a simple explicit formula on line (27.6.1) for multilinear functions. 


The method of proof for Theorem 27.6.13 also yields the explicit expression on line (27.6.2) for multilinear 
maps $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ in terms of the basis vector family $(\phi_i)_{i\in I}$ for general linear spaces $U$. The 
pointwise products $f(e_i)\phi_i$ are well-defined maps in $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$. Therefore the sum of these products is 
a well-defined map in the same linear space. 


There is a similar expression for tensor spaces. (See Theorem 28.1.20.) 


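27.6.13 THEOREM: Basis-and-coordinates expression for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ and $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$.
Let $(\phi_i)_{i\in I}$ be the canonical basis for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.6.9, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of finite-dimensional linear spaces over a field $K$. Then

\[ \forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K), \quad f = \sum_{i\in I} f(e_i)\phi_i, \tag{27.6.1} \]

where $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A}$ for all $i \in I$. Let $U$ be a linear space over $K$. Then

\[ \forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U), \quad f = \sum_{i\in I} f(e_i)\phi_i. \tag{27.6.2} \]


PROOF: Line (27.6.2) follows from Theorem 27.5.4 line (27.5.4) and Theorem 27.6.10. To prove the formula
on line (27.6.1) for $K$-valued maps, substitute $K$ for $U$.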


27.6.14 THEOREM: The multilinear function space dimension equals the product of component dimensions. 
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then

\[ \dim(\mathscr{L}((V_\alpha)_{\alpha\in A}; K)) = \prod_{\alpha\in A} \dim(V_\alpha). \]


PROOF: The assertion follows from Definition 27.6.9 and Theorem 27.6.11. 
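
The expansion on line (27.6.1) and the dimension count in Theorem 27.6.14 can be checked numerically in a small case. The following Python sketch is only an illustration under stated assumptions: the component spaces are taken to be $K^2$ and $K^3$ with their standard bases (so that the component map simply reads off coordinates), integers stand in for elements of $K$, and the names `phi`, `e`, `f` and `expansion` are hypothetical helpers which do not appear in the text.

```python
from itertools import product
from math import prod

# Assumption: V_1 = K^2 and V_2 = K^3 with standard bases, so the component
# map kappa_B(v)(i) is just the coordinate v[i].
dims = (2, 3)
index_set = list(product(*(range(n) for n in dims)))  # I = I_1 x I_2

def basis_vector(n, k):
    """k-th standard basis vector of K^n."""
    return tuple(1 if r == k else 0 for r in range(n))

def e(i):
    """Tuple of basis vectors e_i = (e^alpha_{i_alpha})_alpha."""
    return tuple(basis_vector(n, i_a) for n, i_a in zip(dims, i))

def phi(i):
    """Canonical basis element phi_i: v -> prod_alpha v_alpha[i_alpha]."""
    return lambda v: prod(v_a[i_a] for v_a, i_a in zip(v, i))

# A sample bilinear function f(x, y) = sum_{r,s} c[r][s] x[r] y[s].
c = ((1, -2, 0), (4, 3, 5))

def f(v):
    x, y = v
    return sum(c[r][s] * x[r] * y[s] for r in range(2) for s in range(3))

def expansion(v):
    """Right-hand side of line (27.6.1): sum_i f(e_i) phi_i(v)."""
    return sum(f(e(i)) * phi(i)(v) for i in index_set)

v = ((2, -1), (3, 0, 7))
```

In this sketch the number of basis functions `len(index_set)` equals the product of the component dimensions, and `expansion(v)` agrees with `f(v)`.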


27.6.15 REMARK: Canonical basis for linear spaces of multilinear maps.

Definition 27.6.16 is the vector-valued version of Definition 27.6.9. The right hand side of line (27.6.3) is
the scalar product of $e^U_j$ by the element $\prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)$ of the field $K$. This is a kind of “juxtaposition
product” as described in Remark 27.2.31.


27.6.16 DEFINITION: The canonical basis for the multilinear map space $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ corresponding to
the bases $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ for $V_\alpha$ for $\alpha \in A$, and a basis $B_U = (e^U_j)_{j=1}^m$ for the finite-dimensional
linear space $U$ over $K$, is the family $(\phi_{i,j})_{i\in I, j\in\mathbb{N}_m}$, where $I = \times_{\alpha\in A} I_\alpha$ and $\phi_{i,j} : \times_{\alpha\in A} V_\alpha \to U$ is defined
for $i = (i_\alpha)_{\alpha\in A} \in I$ and $j \in \mathbb{N}_m$ by

\[ \forall v = (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha, \quad \phi_{i,j}(v) = e^U_j \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha), \tag{27.6.3} \]

where $\kappa_{B_\alpha}$ is the component map for $B_\alpha$ in Definition 22.8.7.


27.6.17 THEOREM: The multilinear map space canonical basis for finite-dimensional spaces is a basis. 
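Let $(\phi_{i,j})_{i\in I, j\in\mathbb{N}_m}$ be the multilinear map space canonical basis in Definition 27.6.16 for $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$
corresponding to bases $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ for finite-dimensional spaces $V_\alpha$ for $\alpha \in A$, and a basis
$B_U = (e^U_j)_{j=1}^m$ for the finite-dimensional space $U$.

(i) $\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U),\ f = \sum_{i\in I} \sum_{j\in\mathbb{N}_m} \kappa_{B_U}(f(e_i))_j\, \phi_{i,j}$.

(ii) The vectors $\phi_{i,j}$ for $(i, j) \in I \times \mathbb{N}_m$ span $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$.

(iii) The vectors $\phi_{i,j}$ for $(i, j) \in I \times \mathbb{N}_m$ are linearly independent.

(iv) The family $(\phi_{i,j})_{i\in I, j\in\mathbb{N}_m}$ is a basis for $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$.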


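PROOF: For part (i), let $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$. Then $f(v) = \sum_{i\in I} \big( f(e_i) \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha) \big)$ for $v \in \times_{\alpha\in A} V_\alpha$
by Theorem 27.5.4 line (27.5.1). Since $f(e_i) \in U$ for all $i \in I$, it follows from Definition 22.8.8 that
$f(e_i) = \sum_{j=1}^m \kappa_{B_U}(f(e_i))_j e^U_j$ for all $i \in I$. Therefore

\[
\begin{aligned}
\forall v \in \times_{\alpha\in A} V_\alpha, \quad f(v)
  &= \sum_{i\in I} \sum_{j\in\mathbb{N}_m} \Big( \kappa_{B_U}(f(e_i))_j\, e^U_j \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha) \Big) \\
  &= \sum_{i\in I} \sum_{j\in\mathbb{N}_m} \kappa_{B_U}(f(e_i))_j\, \phi_{i,j}(v)
\end{aligned}
\]

by Definition 27.6.16. So $f = \sum_{i\in I} \sum_{j\in\mathbb{N}_m} \kappa_{B_U}(f(e_i))_j\, \phi_{i,j}$.
Part (ii) follows from part (i).


For part (iii), suppose that $\sum_{i\in I} \sum_{j\in\mathbb{N}_m} k_{i,j}\phi_{i,j} = 0$ for some $k \in K^{I\times\mathbb{N}_m}$. Let $i' \in I$. Then

\[
\begin{aligned}
0 &= \sum_{i\in I} \sum_{j=1}^m k_{i,j}\, \phi_{i,j}(e_{i'}) \\
  &= \sum_{i\in I} \sum_{j=1}^m k_{i,j}\, e^U_j \prod_{\alpha\in A} \kappa_{B_\alpha}(e^\alpha_{i'_\alpha})(i_\alpha) \\
  &= \sum_{j=1}^m e^U_j \sum_{i\in I} k_{i,j} \prod_{\alpha\in A} \kappa_{B_\alpha}(e^\alpha_{i'_\alpha})(i_\alpha) \\
  &= \sum_{j=1}^m e^U_j \sum_{i\in I} k_{i,j} \prod_{\alpha\in A} \delta_{i_\alpha i'_\alpha} \qquad (27.6.4) \\
  &= \sum_{j=1}^m e^U_j \sum_{i\in I} k_{i,j}\, \delta_{i i'} \\
  &= \sum_{j=1}^m e^U_j\, k_{i',j},
\end{aligned}
\]

where line (27.6.4) follows from Theorem 22.8.9 (ii). So $k_{i',j} = 0$ for all $(i', j) \in I \times \mathbb{N}_m$ by Theorem 22.6.8 (ii)
because $(e^U_j)_{j=1}^m$ is a linearly independent sequence of vectors in $U$. Thus the vectors $\phi_{i,j}$ for $(i, j) \in I \times \mathbb{N}_m$
are linearly independent.
Part (iv) follows from parts (ii) and (iii) and Definition 22.7.2.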
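
The vector-valued expansion established in the proof above can be illustrated in the same spirit. The following Python sketch is hedged: it assumes $V_1 = V_2 = K^2$ and $U = K^3$ with standard bases, uses integers in place of a general field, and the helper names (`unit`, `phi`, `e`, `f`, `expansion`) are hypothetical. With standard bases, the coordinates $\kappa_{B_U}(f(e_i))_j$ are simply the entries of the tuple `f(e(i))`.

```python
from itertools import product
from math import prod

dims, m = (2, 2), 3  # assumed: dim(V_1), dim(V_2) and dim(U)
I = list(product(*(range(n) for n in dims)))

def unit(n, k):
    """k-th standard basis vector of K^n."""
    return tuple(1 if r == k else 0 for r in range(n))

def e(i):
    """Tuple of basis vectors e_i = (e^alpha_{i_alpha})_alpha."""
    return tuple(unit(n, i_a) for n, i_a in zip(dims, i))

def phi(i, j):
    """phi_{i,j}(v) = e^U_j * prod_alpha v_alpha[i_alpha], a vector in K^m."""
    def value(v):
        scalar = prod(v_a[i_a] for v_a, i_a in zip(v, i))
        return tuple(scalar * t for t in unit(m, j))
    return value

def f(v):
    """A sample bilinear map K^2 x K^2 -> K^3."""
    (x0, x1), (y0, y1) = v
    return (x0 * y0, x0 * y1 - x1 * y0, 2 * x1 * y1)

def expansion(v):
    """sum_i sum_j kappa_{B_U}(f(e_i))_j phi_{i,j}(v)."""
    total = [0] * m
    for i in I:
        coords = f(e(i))  # coordinates of f(e_i) in the standard basis of U
        for j in range(m):
            term = phi(i, j)(v)
            total = [t + coords[j] * s for t, s in zip(total, term)]
    return tuple(total)

basis_size = len(I) * m  # number of pairs (i, j) in I x N_m
```

Here `basis_size` is the number of functions $\phi_{i,j}$, namely $\dim(U)$ times the product of the component dimensions.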


27.6.18 THEOREM: Multilinear map space dimension. 
Let (Va)aea be a finite family of finite-dimensional linear spaces over a field K. Let U be a finite-dimensional 
linear space over K. Then 


\[ \dim(\mathscr{L}((V_\alpha)_{\alpha\in A}; U)) = \dim(U) \prod_{\alpha\in A} \dim(V_\alpha). \]


PROOF: The assertion follows from Definition 27.6.16 and Theorem 27.6.17.


27.7. Linear spaces of multilinear maps on dual spaces


27.7.1 REMARK: Dual versions of canonical basis definitions and theorems. 
The following dual versions of Definition 27.6.9 and Theorems 27.6.10, 27.6.11 and 27.6.13 are similar, but 
have significant differences. 


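27.7.2 DEFINITION: The canonical basis for the multilinear dual function space $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$,
corresponding to bases $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ for $V_\alpha$ for $\alpha \in A$, is the family $(\phi^*_i)_{i\in I}$ where $I = \times_{\alpha\in A} I_\alpha$
and $\phi^*_i : \times_{\alpha\in A} V_\alpha^* \to K$ is defined for $i = (i_\alpha)_{\alpha\in A} \in I$ by

\[ \forall w = (w_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*, \quad \phi^*_i(w) = \prod_{\alpha\in A} w_\alpha(e^\alpha_{i_\alpha}). \]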


27.7.3 THEOREM: Dual canonical basis elements are images of basis-vector families. 
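Let $(\phi^*_i)_{i\in I}$ be the “canonical basis” for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.7.2, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of linear spaces over a field $K$. Then

\[ \forall i \in I, \quad \phi^*_i = \eta^T(e_i) \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K), \]

where $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is the canonical multilinear function map transpose defined in
Definition 27.4.13.


PROOF: The equation $\phi^*_i = \eta^T(e_i)$ follows from Theorem 27.5.6. Therefore $\phi^*_i \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.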


27.7.4 THEOREM: Dual canonical basis elements are multilinear functions over the dual linear spaces. 
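Let $(\phi^*_i)_{i\in I}$ be the “canonical basis” for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.7.2, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of linear spaces over a field $K$. Then $\phi^*_i \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ for all $i \in I$.


PROOF: For all $i \in I$, $\phi^*_i = \eta^T(e_i)$, where $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A}$ and $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is defined in
Definition 27.4.13. Therefore $\phi^*_i \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.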


27.7.5 THEOREM: The dual canonical basis is a basis if the component spaces are finite-dimensional. 
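Let $(\phi^*_i)_{i\in I}$ be the “canonical basis” for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.7.2, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of finite-dimensional linear spaces over a field $K$. Then $(\phi^*_i)_{i\in I}$ is a basis for the linear space of multilinear
functions $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.


PROOF: To show that $\{\phi^*_i;\ i \in I\}$ spans $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, let $f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. By Theorem 27.5.7
line (27.5.6), $f(w) = \sum_{i\in I} f(h_i) \prod_{\alpha\in A} w_\alpha(e^\alpha_{i_\alpha})$ for all $w \in \times_{\alpha\in A} V_\alpha^*$. Therefore by Definition 27.7.2,
$f(w) = \sum_{i\in I} f(h_i)\phi^*_i(w)$ for all $w \in \times_{\alpha\in A} V_\alpha^*$. So $f = \sum_{i\in I} f(h_i)\phi^*_i$. So $\{\phi^*_i;\ i \in I\}$ spans $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.
To show that the vectors $\phi^*_i$ for $i \in I$ are linearly independent, suppose that $\sum_{i\in I} k_i\phi^*_i = 0$ for some $k \in K^I$.
Let $j \in I$. Then

\[
\begin{aligned}
0 &= \sum_{i\in I} k_i \phi^*_i(h_j) \\
  &= \sum_{i\in I} k_i \prod_{\alpha\in A} h^\alpha_{j_\alpha}(e^\alpha_{i_\alpha}) \\
  &= \sum_{i\in I} k_i \prod_{\alpha\in A} \delta_{i_\alpha j_\alpha} \\
  &= k_j.
\end{aligned}
\]

So $k_j = 0$ for all $j \in I$. So the vectors $\phi^*_i$ for $i \in I$ are linearly independent. Hence $(\phi^*_i)_{i\in I}$ is a basis for
$\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.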


27.7.6 THEOREM: Basis-and-coordinates expression for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; U)$.
Let $(\phi^*_i)_{i\in I}$ be the canonical basis for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.7.2, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of finite-dimensional linear spaces over a field $K$. Then

\[ \forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K), \quad f = \sum_{i\in I} f(h_i)\phi^*_i, \]

where $h_i = (h^\alpha_{i_\alpha})_{\alpha\in A}$ is the dual to $e_i$ for $i \in I$. Let $U$ be a linear space over $K$. Then

\[ \forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U), \quad f = \sum_{i\in I} f(h_i)\phi^*_i. \tag{27.7.1} \]


PROOF: Line (27.7.1) follows from Theorem 27.5.7 line (27.5.8) and Theorem 27.7.3. To prove the formula
for $K$, substitute $K$ for $U$.


27.8. Universal factorisation property for multilinear functions 


27.8.1 REMARK:  Kernel-like structure for multilinear maps on dual spaces. 

Theorem 27.8.2 says essentially that the multilinear “kernel” of the canonical multilinear function map $\eta$ is
included in the kernel of any multilinear map with the same domain. (See Definition 10.16.7 for the related
“equivalence kernel” concept for general functions.) In other words, linear functionals $w^1$ and $w^2$ which are
in the same equivalence class with respect to the action of $\eta$ are also equivalent with respect to the action
of $f$. This shows that there is no finer partition of $\times_{\alpha\in A} V_\alpha^*$ which is generated by a multilinear function on
this set. (See Definition 8.7.15 for finer and coarser partitions.) The fact that this partition is the coarsest
possible partition with this property follows from the fact that $\eta$ is a multilinear function. Therefore the
partitioning of $\times_{\alpha\in A} V_\alpha^*$ by $\eta$ is the coarsest partition for which all multilinear functions have equal values on
the elements of the partition. This partition may be thought of as a kind of “multilinear kernel” of $\times_{\alpha\in A} V_\alpha^*$.
(This partition is precisely characterised in Theorems 27.4.11, 27.4.15 and 27.4.16.)


The canonical multilinear function map $\eta$ helps to clarify the nature of multilinear maps in general. The
“kernel” of $\eta$ is also in the kernel of every other multilinear map. Hence there is no such thing (in general)
as an injective multilinear map when $\#(A) \ge 2$. Similar observations apply also to the canonical tensor map
$\mu$ for the Cartesian product $\times_{\alpha\in A} V_\alpha$ in Section 28.2.


27.8.2 THEOREM: Multilinear maps are constant on contours of the canonical multilinear function map. 
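Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \times_{\alpha\in A} V_\alpha^* \to U$,

\[ \forall w^1, w^2 \in \times_{\alpha\in A} V_\alpha^*, \quad \eta(w^1) = \eta(w^2) \;\Rightarrow\; f(w^1) = f(w^2), \]

where $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.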


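PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear
space over $K$. Let $f : \times_{\alpha\in A} V_\alpha^* \to U$ be a multilinear map. Let $(e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ be a basis of $V_\alpha$ for each $\alpha \in A$, and
denote by $(h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ the corresponding dual basis in Definition 23.7.3. Let $w^1, w^2 \in \times_{\alpha\in A} V_\alpha^*$ with $\eta(w^1) = \eta(w^2)$.
Let $I = \times_{\alpha\in A} I_\alpha$. Then by Theorem 27.5.7, $f(w^1) = \sum_{i\in I} f(h_i)\eta(w^1)(e_i) = \sum_{i\in I} f(h_i)\eta(w^2)(e_i) = f(w^2)$,
where $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A}$ and $h_i = (h^\alpha_{i_\alpha})_{\alpha\in A}$ for $i \in I$.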
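
A concrete instance of Theorem 27.8.2 may help. The Python sketch below is an illustration only, under the assumptions that $A$ has two elements, $V_1 = V_2 = \mathbb{Q}^2$, and linear functionals are stored as coefficient tuples relative to the standard dual basis. The tuples `w1` and `w2` differ by rescaling one factor by 2 and the other by 1/2, so they are distinct but have the same image under $\eta$. The helper names (`apply`, `eta`, `table`, `f`) are hypothetical. Two multilinear functions are compared through their values on all basis tuples $e_i$, which suffices by Theorem 27.6.13.

```python
from fractions import Fraction as F
from itertools import product

def apply(w_a, v_a):
    """Value of the linear functional with coefficients w_a on the vector v_a."""
    return sum(c * x for c, x in zip(w_a, v_a))

def eta(w):
    """eta(w): v -> prod_alpha w_alpha(v_alpha)."""
    def value(v):
        result = F(1)
        for w_a, v_a in zip(w, v):
            result *= apply(w_a, v_a)
        return result
    return value

def table(T, dims=(2, 2)):
    """Values of a multilinear function T on all basis tuples e_i."""
    units = [[tuple(F(int(r == k)) for r in range(n)) for k in range(n)]
             for n in dims]
    return [T(e) for e in product(*units)]

def f(w):
    """A sample bilinear map on the dual spaces, with values in Q^2."""
    (a0, a1), (b0, b1) = w
    return (a0 * b1, a1 * b0 + a0 * b0)

w1 = ((F(1), F(3)), (F(4), F(-2)))
w2 = ((F(2), F(6)), (F(2), F(-1)))  # first factor doubled, second halved

same_eta = table(eta(w1)) == table(eta(w2))
same_f = f(w1) == f(w2)
```

Both `same_eta` and `same_f` hold in this example even though `w1 != w2`, which also shows that $\eta$ is not injective here.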


27.8.3 REMARK: “Divisibility” of multilinear functions by the canonical multilinear function map.

Theorem 27.8.2 has the interesting consequence that $f \circ \eta^{-1} : \mathrm{Range}(\eta) \to U$ is a well-defined function. (In
fact, this is the purpose of the theorem!) The map $\eta^{-1} : \mathrm{Range}(\eta) \to \times_{\alpha\in A} V_\alpha^*$ is not well defined because
$\eta^{-1}(\{T\})$ is not in general a singleton for $T \in \mathrm{Range}(\eta)$. However, $f(\eta^{-1}(\{T\})) = \{f(w);\ \eta(w) = T\}$
is guaranteed to be a singleton for $T \in \mathrm{Range}(\eta)$ by Theorem 27.8.2. (This is an alternative way of
expressing Theorem 27.8.2.) Therefore there is a well-defined function $g : \mathrm{Range}(\eta) \to U$ which satisfies
$g(T) = f(w)$ whenever $\eta(w) = T$. In other words, $g \circ \eta = f$. Theorem 27.8.4 asserts the existence and uniqueness
of $g$.


27.8.4 THEOREM: Quotient of multilinear functions by the canonical multilinear function map. 

Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \times_{\alpha\in A} V_\alpha^* \to U$, there is a unique function $g : \mathrm{Range}(\eta) \to U$
which satisfies $g(\eta(w)) = f(w)$ for all $w \in \times_{\alpha\in A} V_\alpha^*$, where $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical
multilinear function map for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.


PROOF: Let $W = \times_{\alpha\in A} V_\alpha^*$. To show existence of a function $g : \mathrm{Range}(\eta) \to U$ with $g \circ \eta = f$, let
$g = \{(T, f(w));\ T \in \mathrm{Range}(\eta),\ w \in W,\ \eta(w) = T\}$. If $(T, f(w^1)) \in g$ and $(T, f(w^2)) \in g$, then $\eta(w^1) = T = \eta(w^2)$.
So by Theorem 27.8.2, $f(w^1) = f(w^2)$. So $g$ is at most single-valued. If $T \in \mathrm{Range}(\eta)$, there is some
$w \in W$ with $\eta(w) = T$. So $g$ is at least single-valued. Therefore $g$ is a function.


To show uniqueness, let $g, g' : \mathrm{Range}(\eta) \to U$ satisfy $g \circ \eta = g' \circ \eta = f$. Let $T \in \mathrm{Dom}(g) = \mathrm{Dom}(g')$. Then
$T = \eta(w)$ for some $w \in W$. So $g'(T) = g'(\eta(w)) = f(w) = g(\eta(w)) = g(T)$. Hence $g = g'$.


27.8.5 REMARK: The “lift” of a multilinear function can be extended to all multilinear functions.

The quotient function $g$ in Theorem 27.8.4 may be thought of as a kind of “lift” of the function $f$ from the domain
$\times_{\alpha\in A} V_\alpha^*$ to the domain $\mathrm{Range}(\eta)$, which is the set of all simple multilinear functions in $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.
This “lift” does not lose any information from the function $f : \times_{\alpha\in A} V_\alpha^* \to U$ because $f$ may be recovered
from $g$ as $f = g \circ \eta$.

The function $g$ in Theorem 27.8.7 is an extension of the function $g$ in Theorem 27.8.4 from $\mathrm{Range}(\eta)$
to $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. In other words, it is an extension of the “lift” of $f$ from the simple multilinear functions
in $\mathrm{Range}(\eta)$ to the set of all multilinear functions in $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. Moreover, Theorem 27.8.7 asserts that
there is a unique linear extension of this kind.


27.8.6 REMARK: Diagram of quotient of multilinear functions by canonical multilinear function map. 
Theorem 27.8.7 is a dual version of Theorem 28.3.7. The functions and spaces in Theorem 27.8.7 are 
illustrated in Figure 27.8.1. 


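[Figure 27.8.1: diagram not reproduced; it shows the maps $\eta$ and $f$ with domain $\times_{\alpha\in A} V_\alpha^*$.]
Figure 27.8.1 Universal factorisation property for multilinear functions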


27.8.7 THEOREM: Universal factorisation property (multilinear functions) 

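Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces, and $U$ be a linear space, over a field $K$.
Then for any multilinear map $f : \times_{\alpha\in A} V_\alpha^* \to U$, there is a unique linear map $g : \mathscr{L}((V_\alpha)_{\alpha\in A}; K) \to U$
such that $f = g \circ \eta$, where $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map
for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.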


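PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$. Let $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis
for $V_\alpha$ for each $\alpha \in A$. Let $(h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ be the dual basis of $B_\alpha$ for $V_\alpha^*$ for each $\alpha \in A$. Let $U$ be a
linear space over $K$. For all $f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U)$, define $g : \mathscr{L}((V_\alpha)_{\alpha\in A}; K) \to U$ by

\[ \forall T \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K), \quad g(T) = \sum_{i\in I} T(e_i) f(h_i), \]

where $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ and $h_i = (h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for all $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Then $g$ is
a linear function of $T$ because $T(e_i)$ is a linear function of $T$ for each $i \in I$. (This follows from the pointwise
definition of vector addition and scalar multiplication for the linear space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.) Substituting
$\eta(w)$ for $T$ gives:

\[
\begin{aligned}
\forall w \in \times_{\alpha\in A} V_\alpha^*, \quad g(\eta(w))
  &= \sum_{i\in I} \eta(w)(e_i)\, f((h^\alpha_{i_\alpha})_{\alpha\in A}) \\
  &= \sum_{i\in I} \Big( \prod_{\alpha\in A} w_\alpha(e^\alpha_{i_\alpha}) \Big) f((h^\alpha_{i_\alpha})_{\alpha\in A}) \\
  &= \sum_{i\in I} f\big( (w_\alpha(e^\alpha_{i_\alpha})\, h^\alpha_{i_\alpha})_{\alpha\in A} \big) \\
  &= f\Big( \Big( \sum_{i_\alpha\in I_\alpha} w_\alpha(e^\alpha_{i_\alpha})\, h^\alpha_{i_\alpha} \Big)_{\alpha\in A} \Big) \\
  &= f(w).
\end{aligned}
\]

The last equality holds because $w_\alpha = \sum_{i_\alpha\in I_\alpha} w_\alpha(e^\alpha_{i_\alpha})\, h^\alpha_{i_\alpha}$ for each $\alpha \in A$. Therefore $f = g \circ \eta$.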


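To show uniqueness of $g$, let $g' : \mathscr{L}((V_\alpha)_{\alpha\in A}; K) \to U$ be a linear function such that $f = g' \circ \eta$. Then
$g'(T) = g(T)$ for all $T \in \mathrm{Range}(\eta)$. By Theorem 27.6.11, every multilinear function $T$ in the linear space
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ may be expressed as a linear combination of the simple multilinear functions $\phi_i = \eta(h_i)$
for $i \in I$. By Theorem 27.6.13, $T = \sum_{i\in I} T(e_i)\phi_i$, where $\phi_i \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical basis element in Definition 27.6.9
for $i \in I$. Then $g'(T) = g'\big(\sum_{i\in I} T(e_i)\phi_i\big) = \sum_{i\in I} T(e_i) g'(\phi_i) = \sum_{i\in I} T(e_i) g(\phi_i) = g(T)$. So $g' = g$.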
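
The construction $g(T) = \sum_{i\in I} T(e_i) f(h_i)$ in the proof above can be tried out on a small example. The Python sketch below is not part of the text's development: it assumes $V_1 = V_2 = K^2$ with standard bases, uses integers in place of a general field, represents each $T \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ by its coordinates $T(e_i)$, and uses hypothetical helper names (`h`, `eta_coords`, `f`, `g`). It checks $g \circ \eta = f$ on one tuple and evaluates $g$ on a multilinear function which is not simple.

```python
from itertools import product
from math import prod

dims = (2, 2)  # assumed dimensions of V_1 and V_2
I = list(product(*(range(n) for n in dims)))

def unit(n, k):
    return tuple(1 if r == k else 0 for r in range(n))

def h(i):
    """h_i: tuple of dual basis functionals, stored as coefficient tuples."""
    return tuple(unit(n, i_a) for n, i_a in zip(dims, i))

def eta_coords(w):
    """Coordinates T(e_i) of T = eta(w): prod_alpha w_alpha[i_alpha]."""
    return {i: prod(w_a[i_a] for w_a, i_a in zip(w, i)) for i in I}

def f(w):
    """A sample bilinear map on the dual spaces, with values in U = K^2."""
    (a0, a1), (b0, b1) = w
    return (a0 * b1 - a1 * b0, 3 * a1 * b1)

def g(T):
    """g(T) = sum_i T(e_i) f(h_i), with T given by its coordinates T(e_i)."""
    total = (0, 0)
    for i in I:
        total = tuple(t + T[i] * u for t, u in zip(total, f(h(i))))
    return total

w = ((2, -1), (5, 3))
w_other = ((1, 0), (0, 1))
# The sum of two simple multilinear functions; in this instance the sum
# is not itself simple, so it lies outside Range(eta).
T_sum = {i: eta_coords(w)[i] + eta_coords(w_other)[i] for i in I}
```

In this example `g(eta_coords(w))` equals `f(w)`, and `g(T_sum)` equals the sum of `f(w)` and `f(w_other)`, as linearity of $g$ requires.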


27.8.8 REMARK: Difficulty of proving universal factorisation property without basis and coordinates. 
Although the proof of Theorem 27.8.7 uses bases and coordinates, the statement of the theorem does not 
seem to suggest a requirement for the use of bases and coordinates. The uniqueness assertion in the statement 
of the theorem implies, in fact, that the map g is independent of the choice of basis which is used in its 
construction. It only depends on f. Therefore it is reasonable to ask whether a proof may be constructed 
without bases and coordinates. 


Theorem 27.8.7 asserts essentially that the map $g$ in Theorem 27.8.4 can be extended from simple multilinear
functions to a linear map on the whole domain $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. In principle, this kind of extension can be
constructed without choosing a basis. To construct $g$, we need to know that each function $T \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
can be expressed as a linear combination of simple multilinear functions $\eta(w)$ for $w \in \times_{\alpha\in A} V_\alpha^*$. (This is
a “spanning property” for simple multilinear functions.) Then we need to show that such a construction
for $g$ is independent of the choice of linear combinations. (This is a “uniqueness property”.) The fly in the
ointment here is the difficulty of proving this spanning property and uniqueness property (Theorem 27.8.9)
without using a basis and coordinates. This intuitively appealing approach would be “coordinate-free” if
Theorem 27.8.9 could be proved without a basis and coordinates.


27.8.9 THEOREM:  Basis-free theorem about multilinear function quotients, requiring bases for proof. 

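Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then

(1) $\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K),\ \exists m \in \mathbb{Z}_0^+,\ \exists \lambda \in K^m,\ \exists w \in (\times_{\alpha\in A} V_\alpha^*)^m,\ f = \sum_{j=1}^m \lambda_j \eta(w_j)$.

(2) $\forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U),\ \forall m \in \mathbb{Z}_0^+,\ \forall \lambda \in K^m,\ \forall w \in (\times_{\alpha\in A} V_\alpha^*)^m$,
    $\sum_{j=1}^m \lambda_j \eta(w_j) = 0 \;\Rightarrow\; \sum_{j=1}^m \lambda_j f(w_j) = 0$.

(3) $\forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U),\ \forall m_1, m_2 \in \mathbb{Z}_0^+,\ \forall \lambda^1 \in K^{m_1},\ \forall \lambda^2 \in K^{m_2},\ \forall w^1 \in (\times_{\alpha\in A} V_\alpha^*)^{m_1},\ \forall w^2 \in (\times_{\alpha\in A} V_\alpha^*)^{m_2}$,
    $\sum_{j=1}^{m_1} \lambda^1_j \eta(w^1_j) = \sum_{k=1}^{m_2} \lambda^2_k \eta(w^2_k) \;\Rightarrow\; \sum_{j=1}^{m_1} \lambda^1_j f(w^1_j) = \sum_{k=1}^{m_2} \lambda^2_k f(w^2_k)$.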

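PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let
$B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for $V_\alpha$ for each $\alpha \in A$. Let $(h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ be the dual basis of $B_\alpha$ for $V_\alpha^*$ for
each $\alpha \in A$. Let $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ and $h_i = (h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for all $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$.

To show part (1), let $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. By Theorem 27.5.4, $f = \sum_{j=1}^m \lambda_j \eta(w_j)$ with $m = \prod_{\alpha\in A} \dim(V_\alpha)$,
where $\lambda \in K^m$ and $w \in (\times_{\alpha\in A} V_\alpha^*)^m$ are defined by $\lambda_j = f(e_{\beta(j)})$ and $w_j = h_{\beta(j)}$ for all $j \in \mathbb{N}_m$, for any
bijection $\beta : \mathbb{N}_m \to I$.

To show part (2), let $f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U)$, $m \in \mathbb{Z}_0^+$, $\lambda \in K^m$, and $w \in (\times_{\alpha\in A} V_\alpha^*)^m$. By Theorem 27.5.7,
$\eta(w_j) = \sum_{i\in I} \eta(w_j)(e_i)\eta(h_i)$ and $f(w_j) = \sum_{i\in I} \eta(w_j)(e_i) f(h_i)$ for all $j \in \mathbb{N}_m$. Suppose $\sum_{j=1}^m \lambda_j \eta(w_j) = 0$.
Then

\[ \sum_{i\in I} \Big( \sum_{j=1}^m \lambda_j \eta(w_j)(e_i) \Big) \eta(h_i) = \sum_{j=1}^m \lambda_j \eta(w_j) = 0. \]

But the vectors $\eta(h_i) \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ for $i \in I$ are linearly independent by Theorem 27.6.11. So
$\sum_{j=1}^m \lambda_j \eta(w_j)(e_i) = 0$ for all $i \in I$. Therefore

\[
\begin{aligned}
\sum_{j=1}^m \lambda_j f(w_j) &= \sum_{j=1}^m \lambda_j \sum_{i\in I} \eta(w_j)(e_i) f(h_i) \\
  &= \sum_{i\in I} f(h_i) \sum_{j=1}^m \lambda_j \eta(w_j)(e_i) \\
  &= 0.
\end{aligned}
\]

Part (3) follows immediately from part (2).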


27.8.10 REMARK: Difficulty of removing “coordinates” from proof of multilinear function properties. 
Theorem 27.8.9 part (3) is a generalisation of Theorem 27.8.2. Parts (1) and (3) of Theorem 27.8.9 imply 
respectively that every multilinear function over a finite family of finite-dimensional linear spaces can be 
expressed as a linear combination of simple multilinear functions and that any multilinear map evaluated 
term-by-term on such a linear combination will give the same value, no matter how the linear combination 
is chosen. 


Theorem 27.8.9 could be used to prove Theorem 27.8.7 without using a basis and coordinates. But clearly 
Theorem 27.8.9 is heavily dependent on a basis and coordinates for its own proof. The statement of this 
theorem avoids the basis and coordinates, but it is very difficult to see how a “coordinate-free” proof of it can 
be constructed. Nevertheless, parts (1) and (3) do give an intuitively appealing notion of how multilinear
functions are related to simple multilinear functions.


27.8.11 REMARK:  Kernel-like structure for multilinear maps on primal spaces. 
Theorem 27.8.12 is a dual version of Theorem 27.8.2. 


27.8.12 THEOREM: Multilinear maps constant on contours of canonical multilinear function map transpose. 
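Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \times_{\alpha\in A} V_\alpha \to U$,

\[ \forall v^1, v^2 \in \times_{\alpha\in A} V_\alpha, \quad \eta^T(v^1) = \eta^T(v^2) \;\Rightarrow\; f(v^1) = f(v^2), \]

where $\eta^T$ is the canonical multilinear function map transpose for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.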


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a
linear space over $K$. Let $f : \times_{\alpha\in A} V_\alpha \to U$ be a multilinear map. Let $(e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ be a basis of $V_\alpha$ for
each $\alpha \in A$, and denote by $(h^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ the corresponding dual basis in Definition 23.7.3. Let $v^1, v^2 \in \times_{\alpha\in A} V_\alpha$
with $\eta^T(v^1) = \eta^T(v^2)$. Let $I = \times_{\alpha\in A} I_\alpha$. Then by Theorem 27.5.4, $f(v^1) = \sum_{i\in I} f(e_i)\eta^T(v^1)(h_i) =
\sum_{i\in I} f(e_i)\eta^T(v^2)(h_i) = f(v^2)$, where $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A}$ and $h_i = (h^\alpha_{i_\alpha})_{\alpha\in A}$ for $i \in I$.


Chapter 28


TENSOR SPACES


28.1. Tensor product spaces


28.1.1 REMARK: The perils of developing tensor algebra by starting with simple tensor products. 

The definition adopted for tensor products in this book is the dual of the linear space of multilinear functions 
on a finite family of linear spaces $(V_\alpha)_{\alpha\in A}$. This definition is simpler, clearer and more economical than
other representations. (See Remark 28.5.3 for some alternative representations.) 


In most applications, the tensor component spaces $V_\alpha$ are identical copies of a single finite-dimensional
linear space, and the index set A consists of integers. However, tensor spaces are defined in this chapter with 
heterogeneous component spaces with general index sets as far as possible, as in Definition 28.1.2. 


It is tempting to commence the presentation of tensor algebra with simple tensors, which are formal products 
of vectors. But this is a trap. A development path which starts with such simple products of vectors leads 
very rapidly to complicated, convoluted representations which are difficult to work with. The approach here 
is to commence with multilinear maps (Section 27.2), then define tensors as the dual of multilinear maps 
(Section 28.1), and then define simple tensors as a special kind of tensor (Section 28.4). 


28.1.2 DEFINITION: The tensor product (space) of a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$
is the dual $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$ of the linear space of multilinear functions $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.


A tensor space is the tensor product of a finite family of linear spaces over the same field. 
A tensor is any element of a tensor space. 


28.1.3 REMARK: Schematic diagrams for tensor spaces. 

Notations 28.1.4, 28.1.5 and 28.1.7 are illustrated in Figure 28.1.1. (These diagrams should be compared 
with the diagrams for spaces of multilinear maps and multilinear functions in Figures 27.2.2 and 27.3.1 in 
Remarks 27.2.20 and 27.3.5.) 


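28.1.4 NOTATION: $\bigotimes_{\alpha\in A} V_\alpha$ denotes the tensor product space of a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces
over a field $K$. Thus

\[
\begin{aligned}
\bigotimes_{\alpha\in A} V_\alpha &= \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^* \\
  &= \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), K).
\end{aligned}
\]


28.1.5 NOTATION: $\bigotimes_{i=1}^m V_i$ for a sequence of linear spaces $(V_i)_{i=1}^m$ over a field $K$, where $m \in \mathbb{Z}_0^+$, denotes
the tensor product space $\bigotimes_{\alpha\in A} V_\alpha$ with index set $A = \mathbb{N}_m$.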


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg. html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.


[Figure 28.1.1: three schematic diagrams, one for each of $\bigotimes_{\alpha\in A} V_\alpha = \mathscr{L}((V_\alpha)_{\alpha\in A})^*$, $\bigotimes_{i=1}^m V_i = \mathscr{L}(V_1, \ldots V_m)^*$ and $\bigotimes^m V = \mathscr{L}_m(V)^*$.]
Figure 28.1.1 Tensor space notations


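28.1.6 NOTATION: $V_1 \otimes \ldots V_m$ for a sequence of linear spaces $(V_i)_{i=1}^m$ over a field $K$, where $m \in \mathbb{Z}_0^+$,
denotes the tensor product space $\bigotimes_{i=1}^m V_i$.


28.1.7 NOTATION: $\bigotimes^m V$ for a linear space $V$ and $m \in \mathbb{Z}_0^+$ denotes the tensor product $\bigotimes_{i=1}^m V_i$ where
$V_i = V$ for $i \in \mathbb{N}_m$.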


28.1.8 REMARK: Alternative tensor space notations. 
See Remark 30.4.13 (Table 30.4.2) for a list of tensor space notations of other authors. 


28.1.9 REMARK: Abbreviated schematic diagrams for tensor spaces 

The diagrams in Figure 28.1.1 for the tensor spaces in Notations 28.1.4, 28.1.5 and 28.1.7 may be
abbreviated as in Figure 28.1.2. The lines with “chicken-feet” tails symbolise multilinear function spaces
from the indicated cross-product sets to $K$. The lines with the single-stroke tails symbolise the dual spaces
of the indicated multilinear function spaces.


[Figure 28.1.2: abbreviated versions of the three diagrams in Figure 28.1.1, for $\bigotimes_{\alpha\in A} V_\alpha$, $\bigotimes_{i=1}^m V_i$ and $\bigotimes^m V$.]
Figure 28.1.2 Abbreviated tensor space notations


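28.1.10 REMARK: Equivalences between notations for tensor spaces for a family of linear spaces.
In terms of Notations 28.1.5 and 28.1.6, one may write

\[ V_1 \otimes \ldots V_m = \bigotimes_{i=1}^m V_i = \mathscr{L}((V_i)_{i=1}^m; K)^* = \mathscr{L}(V_1, \ldots V_m; K)^* \]

for $m \in \mathbb{Z}_0^+$. Notation 28.1.7 implies that $\bigotimes^m V = \mathscr{L}_m(V; K)^*$.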


28.1.11 DEFINITION: The degree of a tensor space $\bigotimes_{\alpha\in A} V_\alpha$ is the number $\#(A)$, where $(V_\alpha)_{\alpha\in A}$ is a finite
family of linear spaces over the same field.


28.1.12 REMARK: The degree of a tensor space with homogeneous linear spaces.
The degree of a tensor space $\bigotimes^m V$ for $m \in \mathbb{Z}_0^+$ and a linear space $V$ is the number $m$.


28.1.13 REMARK: The degree of a tensor is evident in its set-construction. 
Definition 28.1.11 applies the term “degree” to tensor spaces, not the tensors which inhabit those spaces. 
However, the degree of a tensor is well defined because every element of $\bigotimes_{\alpha\in A} V_\alpha$ is a linear functional on
the linear space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. (The domain of a function may be determined unambiguously from any
function, and all functions in a dual space have the same domain.) The domain for this linear space is always 
a non-empty set of m-tuples. (See Remark 27.2.13.) The number of elements in a tuple is unambiguous. 
Therefore Definition 28.1.11 is readily extended to define the degree of each tensor which is an element of a 
tensor space. 


28.1.14 REMARK: Alternative terminologies for the degree of a tensor. 
Some authors use the words “order”, “rank” or “valence” for the tensor degree in Definition 28.1.11. Some 
variations on this terminology are summarised in Table 29.5.1 in Remark 29.5.11. 


The word “rank” strongly suggests the matrix concepts of “row rank" and “column rank”. So this is excluded 
here. The word “order” is required for calculus concepts such as “second order derivatives" etc. The term 
“valence” seems to have no relevance and very little popularity. 

The word “degree” could be confused with polynomial degree, but this is not actually confusion at all. Since 
multilinear functions on finite-dimensional linear spaces are sums of products of linear functionals, they are 
in fact polynomials of homogeneous degree. So it is quite apt to use the same word “degree” for the dual
spaces, namely the tensor spaces. 


28.1.15 REMARK: The meaning of tensor spaces of degree 0. 

A tensor space of degree 0 has the form $\bigotimes_{i=1}^0 V_i = \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in\emptyset}; K), K) = \bigotimes^0 V$ for a sequence of linear
spaces $(V_i)_{i=1}^0$ over a field $K$. By Remark 27.2.14, $\mathscr{L}((V_\alpha)_{\alpha\in\emptyset}; K) = K^{\{\emptyset\}}$ in this case, which may be
identified with the field $K$ regarded as a linear space. (See Definition 27.3.7 for this identification.) Then
$\bigotimes_{i=1}^0 V_i = \mathrm{Lin}(K^{\{\emptyset\}}, K)$ may be identified with $\mathrm{Lin}(K, K)$, which may also be identified with $K$ regarded
as a linear space over $K$. Therefore $\dim(\bigotimes_{i=1}^0 V_i) = 1$. This is consistent with Theorem 27.6.14 combined
with Theorem 23.7.11. Definition 28.1.16 gives a natural linear space isomorphism from $K$ to $\bigotimes^0 V$.


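28.1.16 DEFINITION: The canonical injection of scalars in a tensor space $\bigotimes_{i=1}^0 V_i = \mathscr{L}(\emptyset; K)^*$ over a field
$K$ is the map $i : K \to \mathscr{L}(\emptyset; K)^*$ given by

\[ \forall t \in K,\ \forall \phi \in K^{\{\emptyset\}}, \quad i(t)(\phi) = t\phi(\emptyset). \]

In other words, $i : t \mapsto (\phi \mapsto t\phi(\emptyset))$ for all $t \in K$ and $\phi \in K^{\{\emptyset\}}$.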


28.1.17 REMARK: The meaning of tensor spaces of degree 1. 

A tensor space of degree 1 has the form $\bigotimes_{i=1}^m V_i = \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), K)$ for a sequence of linear spaces
$(V_i)_{i=1}^m$ over a field $K$ with $m = 1$. The sequence $(V_i)_{i=1}^1$ is the set $\{(1, V_1)\}$, which is not exactly the same
as $V_1$, but these sets may be identified. Then $\mathscr{L}((V_\alpha)_{\alpha\in A}; K) = V_1^*$. So $\bigotimes_{i=1}^1 V_i = V_1^{**}$. Therefore if $V_1$ has
a basis, $\bigotimes_{i=1}^1 V_i$ may be identified with $V_1$ by Theorem 23.10.7, and $\dim(\bigotimes_{i=1}^1 V_i) = \dim(V_1)$. The map in
Definition 28.1.18 is essentially the same as the canonical map in Definition 23.10.4.


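28.1.18 DEFINITION: The canonical injection of vectors in a tensor space $\bigotimes^1 V = \mathscr{L}(V; K)^*$ over a field
$K$ is the map $i : V \to \mathscr{L}(V; K)^*$ given by

\[ \forall v \in V,\ \forall \phi \in \mathscr{L}(V; K), \quad i(v)(\phi) = \phi(v). \]

In other words, $i : v \mapsto (\phi \mapsto \phi(v))$ for all $v \in V$ and $\phi \in \mathscr{L}(V; K)$.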


28.1.19 REMARK:  Tensors of degree 1 are not the same as vectors. 

On the subject of “identifications” generally, it should be said that they generally trade meaningfulness 
for convenience. Often some important meaning is lost when similar things are agreed to be identical or 
“identified”. In the case of tensor spaces of degree 1, the loss of meaningfulness might be quite significant. 
In fact, it could be argued that much of the confusion surrounding tensors may be traced to the kind of 
identification of tensors with vectors which is mentioned in Remark 28.1.17. 


A vector is a class of object which may be interpreted physically as a velocity. (See Sections 26.19, 53.1 
and 53.3 for some discussion of the interpretation of vectors as velocities.) By contrast, an element of V**, 
for a finite-dimensional linear space $V$, is a map of the form $w \mapsto w(v)$ for some $v \in V$. In other words, an
element of $V^{**}$ maps linear functionals $w \in V^*$ to values $w(v)$. In other words, it is a map from $V^*$ to $K$,
where $K$ is the field of $V$. This map just happens to be expressible as the action of linear functionals $w \in V^*$
on a fixed vector $v \in V$ (or alternatively as the “response” of vector $v$ to linear functionals $w \in V^*$). If $w$
represents a force field of some kind, then $w(v)$ represents the force (or work) experienced by a particle with
velocity $v$ in that field. By replacing the physically meaningful formula “$w \mapsto w(v)$” with the vector $v$, the
interaction with the force field is eliminated from the mathematical representation.


If one acquiesces in the identification of tensors of degree 1 with vectors, one naturally seeks to identify 
tensors of degree 2 with vectors in some way. Then it may seem reasonable to identify tensors of degree 2 
with sums of pairs of vectors, and the mathematical machinery to handle this becomes very clumsy. But 
if one forgoes the initial “advantages” of identifying tensors of degree 1 with vectors, one may deal with 
all (contravariant) tensors in a consistent way as the duals of multilinear spaces. Then one may physically 
interpret all (contravariant) tensors as interactions between particles and force fields, including tensors of 
degree 1. In short, vectors are not a sub-class of the class of tensors, and tensors are not a generalisation of 
the class of vectors. 


28.1.20 THEOREM:  Basis-and-coordinates formula for the action of a tensor on a multilinear function. 
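Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then

\[ \forall T \in \bigotimes_{\alpha\in A} V_\alpha,\ \forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K), \quad T(f) = \sum_{i\in I} T(\phi_i) f(e_i), \]

where $(\phi_i)_{i\in I}$ is the canonical basis for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.6.9 corresponding to the bases
$(e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ for $V_\alpha$, and $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A}$ for all $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$.


PROOF: Let $T \in \bigotimes_{\alpha\in A} V_\alpha$ and $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. Then $T(f) = T\big(\sum_{i\in I} f(e_i)\phi_i\big)$ by Theorem 27.6.13.
Hence $T(f) = \sum_{i\in I} T(\phi_i) f(e_i)$ by the linearity of $T$.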
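
The formula in Theorem 28.1.20 can be checked on a small example in which tensors are modelled directly as linear functionals on multilinear functions. The Python sketch below is an illustration under assumptions: $V_1 = K^2$ and $V_2 = K^3$ with standard bases, integer scalars, and a sample tensor `T` built as a linear combination of evaluation functionals $f \mapsto f(v)$. All names (`unit`, `e`, `phi`, `evaluate_at`, `T`, `f`) are hypothetical.

```python
from itertools import product
from math import prod

dims = (2, 3)  # assumed dimensions of V_1 and V_2
I = list(product(*(range(n) for n in dims)))

def unit(n, k):
    return tuple(1 if r == k else 0 for r in range(n))

def e(i):
    """Tuple of basis vectors e_i = (e^alpha_{i_alpha})_alpha."""
    return tuple(unit(n, i_a) for n, i_a in zip(dims, i))

def phi(i):
    """Canonical basis element phi_i: v -> prod_alpha v_alpha[i_alpha]."""
    return lambda v: prod(v_a[i_a] for v_a, i_a in zip(v, i))

def evaluate_at(v):
    """The linear functional f -> f(v) on the multilinear function space."""
    return lambda f: f(v)

u = ((1, 2), (0, 1, -1))
v = ((3, -1), (2, 2, 5))

def T(f):
    """A sample tensor: 2 * (evaluation at v) - (evaluation at u)."""
    return 2 * evaluate_at(v)(f) - evaluate_at(u)(f)

def f(x):
    """A sample bilinear function on K^2 x K^3."""
    a, b = x
    return a[0] * b[2] - 4 * a[1] * b[0] + a[1] * b[1]

lhs = T(f)
rhs = sum(T(phi(i)) * f(e(i)) for i in I)
```

In this instance `lhs` and `rhs` coincide, as the theorem predicts for any linear functional `T`.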


28.1.21 REMARK: Extension of formula for action of a tensor on a multilinear function. 
Theorem 28.1.20 is slightly extended in Theorem 28.2.12.


28.1.22 THEOREM: The tensor product space dimension equals the product of the component dimensions. 
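Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces. Then

\[ \dim\Big( \bigotimes_{\alpha\in A} V_\alpha \Big) = \prod_{\alpha\in A} \dim(V_\alpha). \]


PROOF: This follows immediately from Theorem 27.6.14 because $\bigotimes_{\alpha\in A} V_\alpha$ is the linear space dual of
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.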


28.1.23 REMARK: Comparison of tensor product space dimension with direct product space dimension. 
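The dimension $\prod_{\alpha\in A} \dim(V_\alpha)$ of the tensor product $\bigotimes_{\alpha\in A} V_\alpha$ of finite-dimensional linear spaces may be
compared with the dimension $\sum_{\alpha\in A} \dim(V_\alpha)$ of the direct product $\bigoplus_{\alpha\in A} V_\alpha$, which is also known as the
“direct sum” of the family of linear spaces.


space                                   dimension
$\bigoplus_{\alpha\in A} V_\alpha$      $\sum_{\alpha\in A} \dim(V_\alpha)$
$\bigotimes_{\alpha\in A} V_\alpha$     $\prod_{\alpha\in A} \dim(V_\alpha)$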


This shows clearly how different the two concepts are. It also explains why the word “sum” is used for the 
direct sum of linear spaces. 
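
The contrast between the two dimension formulas is easy to see numerically. The following Python sketch simply evaluates both formulas for a tuple of component dimensions. The function names are hypothetical, and the empty family is included because it corresponds to the degree 0 case in Remark 28.1.15.

```python
from math import prod

def tensor_product_dim(dims):
    """Dimension of the tensor product: the product of component dimensions."""
    return prod(dims)

def direct_sum_dim(dims):
    """Dimension of the direct sum: the sum of component dimensions."""
    return sum(dims)

dims = (2, 3, 4)
comparison = (direct_sum_dim(dims), tensor_product_dim(dims))
```

For component dimensions 2, 3 and 4 the direct sum has dimension 9, whereas the tensor product has dimension 24. For the empty family the product formula gives 1 and the sum formula gives 0.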


28.2. The canonical tensor map 


28.2.1 REMARK: Distinguishing the canonical tensor map from the canonical multilinear map. 

The “canonical tensor map” in Definition 28.2.2 is usually referred to as the “canonical multilinear map”. 
However, in this book, there are two canonical multilinear maps. To assist systematic naming, Definition 28.2.2
is called the “canonical tensor map” to distinguish it from the “canonical multilinear function
map” in Definition 27.4.4.


28.2.2 DEFINITION: The canonical tensor map for a tensor space $\bigotimes_{\alpha\in A} V_\alpha$, where $(V_\alpha)_{\alpha\in A}$ is a finite
family of linear spaces, is the map $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ which is defined by

\[ \forall v \in \times_{\alpha\in A} V_\alpha,\ \forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K), \quad \mu(v)(f) = f(v). \]


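28.2.3 THEOREM: The canonical tensor map is multilinear.
The canonical tensor map $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ in Definition 28.2.2 is multilinear.


PROOF: Let $\beta \in A$. Let $\lambda_1, \lambda_2 \in K$. Let $u, v, w \in \times_{\alpha\in A} V_\alpha$ satisfy $u_\beta = \lambda_1 v_\beta + \lambda_2 w_\beta$ and
$\forall \alpha \in A \setminus \{\beta\}$, $u_\alpha = v_\alpha = w_\alpha$. Let $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. Then
$\mu(u)(f) = f(u) = f((u_\alpha)_{\alpha\in A}) = \lambda_1 f((v_\alpha)_{\alpha\in A}) + \lambda_2 f((w_\alpha)_{\alpha\in A}) = \lambda_1 f(v) + \lambda_2 f(w) = \lambda_1 \mu(v)(f) + \lambda_2 \mu(w)(f)$.
So $\mu(u) = \lambda_1 \mu(v) + \lambda_2 \mu(w)$. Therefore $\mu$ is multilinear by Definition 27.2.3.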
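
The multilinearity argument can be checked on a small numerical instance. The sketch below assumes two component spaces with integer coordinates and a bilinear function given by a hypothetical coefficient matrix; the names `bilinear` and `mu` are illustrative only.

```python
from fractions import Fraction

def bilinear(a):
    """Return the bilinear function f((x, y)) = sum_ij a[i][j] * x[i] * y[j]."""
    def f(v):
        x, y = v
        return sum(a[i][j] * x[i] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return f

def mu(v):
    """Canonical tensor map: mu(v) is the functional f |-> f(v)."""
    return lambda f: f(v)

f = bilinear([[1, 2], [3, 4]])       # hypothetical bilinear function
l1, l2 = Fraction(2), Fraction(-3)   # scalars lambda_1, lambda_2
v = ([1, 0], [5, 7])
w = ([4, 1], [5, 7])                 # agrees with v outside the first slot
# u agrees with v and w in the second slot; its first slot is l1*v_0 + l2*w_0
u = ([l1 * v[0][k] + l2 * w[0][k] for k in range(2)], [5, 7])

lhs = mu(u)(f)
rhs = l1 * mu(v)(f) + l2 * mu(w)(f)
print(lhs == rhs)  # True
```

A single numerical instance is of course no substitute for the proof; it only shows how the quantifiers in the proof are instantiated.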


28.2.4 REMARK: Physical interpretation of the canonical tensor map. 

The expression $\mu(v)(f)$ signifies the “response” of a vector tuple $v = (v_\alpha)_{\alpha\in A}$ to a multilinear function $f$.
Thus $\mu(v)$ signifies the “effect” of multilinear functions on the vector-tuple $v$. Hence $\mu$ maps vector-tuples
to their “tensorial response” in the presence of multilinear fields.


Since the interaction between a particle and a field depends on both the particle and the field, one may
represent the interaction as either a function of the particle or a function of the field. The expression $f(v)$
presents the interaction as a function of the particle’s vector-tuple as if the function $f$ is active and the
vector-tuple $v = (v_\alpha)_{\alpha\in A}$ is passive. In the expression $\mu(v)(f)$, the function $\mu(v)$ represents the particle’s
attributes while the function $f$ represents the field as if the particle is in the foreground and the field is in
the background. The map $\mu$ reverses the roles of particle and field by transposing the function as discussed
in Remark 10.19.4.


To put it another way, the expression $f(v)$ means “the effect of $f$ on the vector tuple”, whereas $\mu(v)(f)$
means “the response of the vector tuple to the field”. They are the same idea from two different viewpoints.


28.2.5 REMARK: Diagrams of domains and ranges of the four canonical multilinear maps. 
The domains and ranges of the canonical multilinear function map $\eta$ (and its function-transpose $\eta^T$) and
the canonical tensor map $\mu$ (and its dual version $\mu^*$) are summarised in Figure 28.2.1.


Figure 28.2.1 Canonical multilinear map domains and ranges


28.2.6 REMARK: The relation between the canonical tensor map and canonical multilinear map. 

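The canonical tensor map $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ in Definition 28.2.2 is related to the canonical multilinear
function map $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.4.4. Since $\bigotimes_{\alpha\in A} V_\alpha$ is the linear space dual
of $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, the composition $\mu(v)(\eta(w))$ is well defined and satisfies


$$\forall v \in \times_{\alpha\in A} V_\alpha,\ \forall w \in \times_{\alpha\in A} V_\alpha^*,\quad \mu(v)(\eta(w)) = \eta(w)(v) = \eta^T(v)(w) = \prod_{\alpha\in A} w_\alpha(v_\alpha).$$


Hence $\mu(v) \circ \eta = \eta^T(v) \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ for all $v \in \times_{\alpha\in A} V_\alpha$.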
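
The chain of equalities relating the canonical tensor map to the canonical multilinear function map can be sketched concretely. In the following illustration, linear functionals are represented by coefficient lists acting through a dot product; this representation and all names (`dot`, `eta`, `eta_T`, `mu`) are assumptions made for the example.

```python
from math import prod

def dot(w, v):
    """Value of the linear functional with coefficients w on the vector v."""
    return sum(a * b for a, b in zip(w, v))

def eta(w):
    """Canonical multilinear function map: eta(w) sends v to prod_a w_a(v_a)."""
    return lambda v: prod(dot(wa, va) for wa, va in zip(w, v))

def eta_T(v):
    """Function-transpose of eta: eta_T(v) sends w to prod_a w_a(v_a)."""
    return lambda w: prod(dot(wa, va) for wa, va in zip(w, v))

def mu(v):
    """Canonical tensor map: mu(v) sends f to f(v)."""
    return lambda f: f(v)

v = ([1, 2], [3, 4, 5])     # a vector tuple in K^2 x K^3
w = ([1, -1], [2, 0, 1])    # a tuple of linear functionals on K^2 and K^3

print(mu(v)(eta(w)), eta(w)(v), eta_T(v)(w))  # -11 -11 -11
```

All three expressions evaluate the same product of pairings; only the choice of which argument is regarded as the "function" differs.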


28.2.7 REMARK: The canonical tensor map for a family of dual spaces. 
If Definition 28.2.2 is applied to the dual spaces $(V_\alpha^*)_{\alpha\in A}$, a canonical tensor map $\mu^* : \times_{\alpha\in A} V_\alpha^* \to \bigotimes_{\alpha\in A} V_\alpha^*$
is then defined by


$$\forall w \in \times_{\alpha\in A} V_\alpha^*,\ \forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K),\quad \mu^*(w)(f) = f(w).$$


This is well-defined when the primal component spaces $V_\alpha$ are infinite-dimensional, or even when these spaces
have no basis. However, the dual-space canonical tensor map $\mu^*$ is (probably) of limited applicability for
such primal spaces.

The dual-space canonical tensor map does not require a separate definition, but it is formalised anyway in
Definition 28.2.8, to at least give it a definite name and notation.


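28.2.8 DEFINITION: The dual-space canonical tensor map for a tensor space $\bigotimes_{\alpha\in A} V_\alpha$, where $(V_\alpha)_{\alpha\in A}$ is
a finite family of linear spaces, is the map $\mu^* : \times_{\alpha\in A} V_\alpha^* \to \bigotimes_{\alpha\in A} V_\alpha^*$ which is defined by


$$\forall w \in \times_{\alpha\in A} V_\alpha^*,\ \forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K),\quad \mu^*(w)(f) = f(w).$$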


28.2.9 REMARK: Relation between the dual-space canonical tensor map and canonical multilinear map. 
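The dual-space canonical tensor map $\mu^*$ in Definition 28.2.8 is related to the canonical multilinear function
map $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.4.4. Since $\bigotimes_{\alpha\in A} V_\alpha^*$ is the linear space dual of
$\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, the composition $\mu^*(w)(\eta^T(v))$ is well defined and satisfies


$$\forall v \in \times_{\alpha\in A} V_\alpha,\ \forall w \in \times_{\alpha\in A} V_\alpha^*,\quad \mu^*(w)(\eta^T(v)) = \eta^T(v)(w) = \eta(w)(v) = \prod_{\alpha\in A} w_\alpha(v_\alpha).$$


Hence $\mu^*(w) \circ \eta^T = \eta(w) \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ for all $w \in \times_{\alpha\in A} V_\alpha^*$.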


28.2.10 REMARK: Conditions for injectivity and surjectivity of the canonical tensor map. 

The canonical tensor map in Definition 28.2.2 is almost never injective, and it is only surjective in special
cases. Injectivity holds for $\#(A) = 0$ and $\#(A) = 1$. In the case $\#(A) = 0$, there is only one element
in $\times_{\alpha\in A} V_\alpha$. So $\mu$ is necessarily injective. In the case $\#(A) = 1$, the space $\bigotimes_{\alpha\in A} V_\alpha$ is the double dual of
the linear space $V_\alpha$. So $\mu$ is actually the “canonical map from a linear space to its second dual” which is
presented in Definition 23.10.4. Therefore the map is injective in this case. When $\#(A) > 1$, the map $\mu$ is
non-injective whenever at least one of the spaces $V_\alpha$ is non-trivial, because every vector tuple with a zero
component is mapped to the zero tensor.


The canonical tensor map is surjective when $\dim(V_\alpha) = 0$ for some $\alpha \in A$ because then $\dim(\bigotimes_{\alpha\in A} V_\alpha) = 0$
also. The map is also surjective if $\#(A) = 1$ with a single finite-dimensional linear space $V_\alpha$ because the
“canonical map from a linear space to its second dual” is a linear space isomorphism if the primal linear space
is finite-dimensional. The map is also surjective if $\dim(V_\alpha) > 1$ for at most one index $\alpha \in A$ if this single
tensor component is finite-dimensional. But if $\dim(V_\alpha) > 1$ for two or more indices $\alpha$, and $\dim(V_\alpha) > 0$ for
all indices $\alpha$, then $\mu$ is necessarily non-surjective.


28.2.11 REMARK: Basis-and-coordinates expression for general tensors.

Theorem 28.2.12 expresses tensors in terms of basis vectors (for the tensor component spaces) and the
canonical tensor map $\mu$. (This is a modest extension of Theorem 28.1.20.) Similarly, Theorem 27.5.4
expresses multilinear functions in terms of basis vectors (for the multilinear function component spaces) and
the canonical multilinear function map $\eta$.


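28.2.12 THEOREM: Basis-and-coordinates expression for $\bigotimes_{\alpha\in A} V_\alpha = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$.

Let $T \in \bigotimes_{\alpha\in A} V_\alpha$, where $(V_\alpha)_{\alpha\in A}$ is a finite family of finite-dimensional linear spaces over a field $K$. Let
$B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for $V_\alpha$ for each $\alpha \in A$. Let $e_i = (e^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ for each index family
$i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$. Then


$$\begin{aligned}
\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K),\quad T(f) &= \sum_{i\in I} T(\phi_i)\, f(e_i) && (28.2.1) \\
&= \sum_{i\in I} T(\phi_i)\, \mu(e_i)(f), && (28.2.2)
\end{aligned}$$

where $\phi_i \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is defined by $\phi_i = \eta(h_i)$ for $i \in I$, the linear functional families
$h_i = (h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for $i \in I$ are as in Definition 23.7.3, and $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ is the
canonical tensor map in Definition 28.2.2. Hence

$$T = \sum_{i\in I} T(\phi_i)\, \mu(e_i). \qquad (28.2.3)$$

PROOF: Line (28.2.1) is the same as Theorem 28.1.20. Then line (28.2.2) follows from Definition 28.2.2.
Line (28.2.3) follows from the pointwise definition of equality of tensors.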
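
The basis-and-coordinates expansion can be tested on a small example. The sketch below is an illustration under assumptions: the component spaces are taken to be integer coordinate spaces of dimensions 2 and 3 with their standard bases, the tensor `T` is a hypothetical sum of two simple tensors, and the names `unit`, `phi`, `e` and `mu` are introduced only for this example.

```python
from itertools import product
from math import prod

def unit(n, k):
    """k-th standard basis vector of the n-dimensional coordinate space."""
    return [1 if j == k else 0 for j in range(n)]

def mu(v):
    """Canonical tensor map: mu(v) sends f to f(v)."""
    return lambda f: f(v)

def phi(i):
    """Canonical basis function phi_i: the product of the i_a-th coordinates of v_a."""
    return lambda v: prod(v_a[i_a] for v_a, i_a in zip(v, i))

dims = (2, 3)
index_set = list(product(*(range(n) for n in dims)))

def e(i):
    """The basis vector tuple e_i for the multi-index i."""
    return tuple(unit(n, k) for n, k in zip(dims, i))

a = [[1, 2, 3], [4, 5, 6]]  # hypothetical coefficients of a bilinear function
def f(v):
    x, y = v
    return sum(a[p][q] * x[p] * y[q] for p in range(2) for q in range(3))

# A hypothetical tensor: the sum of two simple tensors.
simple_terms = [([1, 2], [0, 1, 1]), ([3, -1], [2, 0, 5])]
def T(g):
    return sum(mu(v)(g) for v in simple_terms)

direct = T(f)
expanded = sum(T(phi(i)) * f(e(i)) for i in index_set)
print(direct, expanded)  # 40 40
```

The coefficients `T(phi(i))` play the role of the coordinates of the tensor with respect to the simple tensors `mu(e(i))`.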
28.2.13 REMARK: Extension of the canonical tensor map to symbolic sums of vector-tuples.

The canonical tensor map $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ for the finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ may be
extended in a natural way to the list space $\mathrm{List}(\times_{\alpha\in A} V_\alpha)$. There is no natural vector addition operation on
the Cartesian product $\times_{\alpha\in A} V_\alpha$, but if this set is extended to lists, a primitive kind of addition can be defined
by list concatenation. The range of $\mu$ has a well-defined vector addition already. This suggests an extended
map $\bar\mu : \mathrm{List}(\times_{\alpha\in A} V_\alpha) \to \bigotimes_{\alpha\in A} V_\alpha$ defined by $\bar\mu((v^i)_{i\in I}) = \sum_{i\in I} \mu(v^i)$ for all lists (i.e. finite sequences)
$(v^i)_{i\in I}$ in $\mathrm{List}(\times_{\alpha\in A} V_\alpha)$. Then


$$\bar\mu(((v^i_\alpha)_{\alpha\in A})_{i\in I})(f) = \sum_{i\in I} f((v^i_\alpha)_{\alpha\in A}),$$


for all $((v^i_\alpha)_{\alpha\in A})_{i\in I} \in \mathrm{List}(\times_{\alpha\in A} V_\alpha)$ and $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. This map has the property that $\bar\mu$ applied
to the concatenation of two vector-tuple lists yields the sum of $\bar\mu$ applied to each vector-tuple list. In other
words, the concatenation operation maps to the addition operation.
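
The way in which concatenation of lists turns into addition of tensors can be sketched as follows. The name `mu_bar` for the extended map, the sample bilinear function and the sample lists are all hypothetical choices for this illustration.

```python
def mu(v):
    """Canonical tensor map: mu(v) sends f to f(v)."""
    return lambda f: f(v)

def mu_bar(tuples):
    """Extended map on lists of vector tuples: the sum of mu over the list."""
    tuples = list(tuples)
    return lambda f: sum(mu(v)(f) for v in tuples)

def f(v):
    """A hypothetical bilinear function on pairs of 2-component vectors."""
    x, y = v
    return 2 * x[0] * y[0] - x[0] * y[1] + 3 * x[1] * y[1]

first = [([1, 0], [2, 5]), ([3, 1], [0, 4])]
second = [([-2, 7], [1, 1])]

# Python list concatenation (+) corresponds to addition of the resulting values.
print(mu_bar(first + second)(f))             # 18
print(mu_bar(first)(f) + mu_bar(second)(f))  # 18
```

The empty list is mapped to the zero functional, which is consistent with the empty list being the identity for concatenation.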


28.3. Universal factorisation property for tensor products 


28.3.1 REMARK: The name of the universal factorisation property. 
Theorem 28.3.7 is known as the “universal factorisation property” for tensor products. Bump [57], page 50,
calls it simply the “universal property”.


28.3.2 REMARK: Interpretation of the universal factorisation property theorem. 

Theorem 28.3.7 may be thought of as stating that all multilinear maps $f : \times_{\alpha\in A} V_\alpha \to U$ can be “upgraded”
to linear maps $g : \bigotimes_{\alpha\in A} V_\alpha \to U$. The function $f$ is defined only for tuples $(v_\alpha)_{\alpha\in A}$, whereas $g$ is defined for
tensors in $\bigotimes_{\alpha\in A} V_\alpha$. This tensor space typically has a higher dimension than the sum of the tensor component
dimensions. The function $g$ can also be “downgraded” to $f$ again by applying the formula $f = g \circ \mu$.


In many presentations of tensor algebra, the universal factorisation property is actually the definition or
“characterisation” of a tensor space, and any tensor space representation which satisfies this criterion is
accepted as valid. This metadefinition approach saves all the arguments about which representation is best,
but it is unsatisfyingly abstract for people who expect concrete definitions.


28.3.3 REMARK: Comparison of universal factorisation properties for multilinear functions and tensors.
Theorem 27.8.7 is the dual version of Theorem 28.3.7. Theorem 27.8.7 is preceded by Theorem 27.8.2,
which shows that $\eta$ partitions $\times_{\alpha\in A} V_\alpha^*$ no more coarsely than any other multilinear function. Therefore
$g = f \circ \eta^{-1}$ is well defined. Probably this approach could be replicated for $\eta^T$ and $\mu^*$ also.

Theorems 28.3.4 and 28.3.5 are dual analogues of Theorems 27.8.2 and 27.8.4. Theorem 28.3.7 could be
restructured to be more like Theorem 27.8.7, which uses Theorem 27.8.4 to prove uniqueness.


28.3.4 THEOREM: Multilinear maps are constant on contours of the canonical tensor map. 
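Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \times_{\alpha\in A} V_\alpha \to U$,


$$\forall v^1, v^2 \in \times_{\alpha\in A} V_\alpha,\quad \mu(v^1) = \mu(v^2) \Rightarrow f(v^1) = f(v^2),$$


where $\mu$ is the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha$.


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a
linear space over $K$. Let $f : \times_{\alpha\in A} V_\alpha \to U$ be a multilinear map. Let $B_\alpha = (e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ be a basis
of $V_\alpha$ for each $\alpha \in A$. Let $v^1, v^2 \in \times_{\alpha\in A} V_\alpha$ satisfy $\mu(v^1) = \mu(v^2)$. Let $I = \times_{\alpha\in A} I_\alpha$. Then
$f(v^1) = \sum_{i\in I} f(e_i)\phi_i(v^1) = \sum_{i\in I} f(e_i)\mu(v^1)(\phi_i)$ and $f(v^2) = \sum_{i\in I} f(e_i)\phi_i(v^2) = \sum_{i\in I} f(e_i)\mu(v^2)(\phi_i)$ by
Theorem 27.6.13 and Definition 28.2.2. So $f(v^1) = f(v^2)$.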


28.3.5 THEOREM: Quotient of multilinear maps by the canonical tensor map. 

Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \times_{\alpha\in A} V_\alpha \to U$, there is a unique function $g : \mathrm{Range}(\mu) \to U$ which
satisfies $g(\mu(v)) = f(v)$ for all $v \in \times_{\alpha\in A} V_\alpha$, where $\mu$ is the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha$.


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




PROOF: Let $W = \times_{\alpha\in A} V_\alpha$. To show existence of a function $g : \mathrm{Range}(\mu) \to U$ with $g \circ \mu = f$, let $g =
\{(T, f(v));\ T \in \mathrm{Range}(\mu),\ v \in W,\ \mu(v) = T\}$. If $(T, f(v^1)) \in g$ and $(T, f(v^2)) \in g$, then $\mu(v^1) = T = \mu(v^2)$.
So by Theorem 28.3.4, $f(v^1) = f(v^2)$. So $g$ is at most single-valued. If $T \in \mathrm{Range}(\mu)$, there is some $v \in W$
with $\mu(v) = T$. So $g$ is at least single-valued. Therefore $g$ is a function.


To show uniqueness, let $g, g' : \mathrm{Range}(\mu) \to U$ satisfy $g \circ \mu = g' \circ \mu = f$. Let $T \in \mathrm{Dom}(g) = \mathrm{Dom}(g')$. Then
$T = \mu(v)$ for some $v \in W$. So $g'(T) = g'(\mu(v)) = f(v) = g(\mu(v)) = g(T)$. Hence $g = g'$.


28.3.6 REMARK: Arrow diagram for quotient of multilinear maps by the canonical tensor map. 
The functions and spaces in Theorem 28.3.7 are illustrated in Figure 28.3.1. 


Figure 28.3.1 Universal factorisation property for tensor products


28.3.7 THEOREM: Universal factorisation property (tensor product) 

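Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear function $f : \times_{\alpha\in A} V_\alpha \to U$, there exists a unique linear map $g : \bigotimes_{\alpha\in A} V_\alpha \to U$
such that $f = g \circ \mu$, where $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ denotes the canonical tensor map for the tensor
space $\bigotimes_{\alpha\in A} V_\alpha$.


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $(e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for
$V_\alpha$ for each $\alpha \in A$. Let $U$ be a linear space over $K$. For all $f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U)$, define $g : \bigotimes_{\alpha\in A} V_\alpha \to U$ by


$$\forall T \in \bigotimes_{\alpha\in A} V_\alpha,\quad g(T) = \sum_{i\in I} T(\phi_i)\, f((e^\alpha_{i_\alpha})_{\alpha\in A}),$$


where $I = \times_{\alpha\in A} I_\alpha$ and $\phi_i \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is defined by $\phi_i(v) = \prod_{\alpha\in A} v_\alpha^{i_\alpha}$ for all $v = (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$,
for all $i = (i_\alpha)_{\alpha\in A} \in I$, where $v_\alpha = \sum_{i_\alpha\in I_\alpha} v_\alpha^{i_\alpha} e^\alpha_{i_\alpha}$ for all $\alpha \in A$. Then $g$ is a linear function of $T$ because
$T(\phi_i)$ is a linear function of $T$ for each $i \in I$. (This follows from the pointwise definition of vector addition
and scalar multiplication for the linear space $\bigotimes_{\alpha\in A} V_\alpha$.) Substituting $\mu(v)$ for $T$ gives:


$$\begin{aligned}
\forall v \in \times_{\alpha\in A} V_\alpha,\quad g(\mu(v)) &= \sum_{i\in I} \mu(v)(\phi_i)\, f((e^\alpha_{i_\alpha})_{\alpha\in A}) \\
&= \sum_{i\in I} \phi_i(v)\, f((e^\alpha_{i_\alpha})_{\alpha\in A}) \\
&= \sum_{i\in I} \Bigl(\prod_{\alpha\in A} v_\alpha^{i_\alpha}\Bigr) f((e^\alpha_{i_\alpha})_{\alpha\in A}) \\
&= \sum_{i\in I} f((v_\alpha^{i_\alpha} e^\alpha_{i_\alpha})_{\alpha\in A}) \\
&= f\Bigl(\Bigl(\sum_{i_\alpha\in I_\alpha} v_\alpha^{i_\alpha} e^\alpha_{i_\alpha}\Bigr)_{\alpha\in A}\Bigr) \\
&= f(v).
\end{aligned}$$


Therefore $f = g \circ \mu$. This proves existence.


By Theorem 28.2.12, tensors $T \in \bigotimes_{\alpha\in A} V_\alpha$ may be expressed as a linear combination $T = \sum_{i\in I} T(\phi_i)\mu(e_i)$
of the simple tensors $\mu(e_i)$ for $i \in I$. Then $g(T) = g(\sum_{i\in I} T(\phi_i)\mu(e_i)) = \sum_{i\in I} T(\phi_i) g(\mu(e_i))$. Therefore $g$
is uniquely determined by the values of $g$ on $\mathrm{Range}(\mu)$, and $g$ is uniquely determined on $\mathrm{Range}(\mu)$ by the
relation $f = g \circ \mu$ by Theorem 28.3.5. Therefore $g$ is uniquely determined by $f$.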
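
The existence half of the proof is constructive, so it can be sketched in code. The following illustration assumes coordinate spaces with standard bases, a hypothetical vector-valued bilinear map `f` with values in a 2-dimensional space, and a helper `factor` (a name introduced here) which builds the linear map from the values of `f` on basis tuples.

```python
from itertools import product
from math import prod

def unit(n, k):
    """k-th standard basis vector of the n-dimensional coordinate space."""
    return [1 if j == k else 0 for j in range(n)]

def mu(v):
    """Canonical tensor map: mu(v) sends a multilinear function p to p(v)."""
    return lambda p: p(v)

def phi(i):
    """Canonical basis function phi_i: the product of the i_a-th coordinates of v_a."""
    return lambda v: prod(v_a[i_a] for v_a, i_a in zip(v, i))

def factor(f, dims, out_dim):
    """Build g with g(T) = sum_i T(phi_i) f(e_i), as in the existence proof."""
    index_set = list(product(*(range(n) for n in dims)))
    values = {i: f(tuple(unit(n, k) for n, k in zip(dims, i))) for i in index_set}
    def g(T):
        result = [0] * out_dim
        for i in index_set:
            c = T(phi(i))  # coordinate of the tensor T for the multi-index i
            result = [r + c * x for r, x in zip(result, values[i])]
        return result
    return g

def f(v):
    """A hypothetical bilinear map from pairs of 2-vectors to 2-vectors."""
    x, y = v
    return [x[0] * y[1] - x[1] * y[0], x[0] * y[0] + 2 * x[1] * y[1]]

g = factor(f, (2, 2), 2)
v = ([3, -2], [5, 7])
print(g(mu(v)), f(v))  # [31, -13] [31, -13]
```

The map `g` accepts any functional on multilinear functions, not only values of `mu`, which is the sense in which the multilinear map is "upgraded" to a linear map on the whole tensor space.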
28.3.8 REMARK: More interpretation of the universal factorisation property theorem.

Theorem 28.3.7 looks very much like an extension theorem. Suppose that the values of a linear function
$h : \bigotimes_{\alpha\in A} V_\alpha \to U$ are known only for the images $\mu(v) \in \bigotimes_{\alpha\in A} V_\alpha$ of the vector tuples $v \in \times_{\alpha\in A} V_\alpha$. Then
the values $h(\mu(v))$ for such $v$ can be mapped back via $\mu$ so that they seem to be coming from $\times_{\alpha\in A} V_\alpha$.


For $v \in \times_{\alpha\in A} V_\alpha$, define $f(v) \in U$ by $f(v) = h(\mu(v))$. Then $f(v)$ is multilinear with respect to $v$ because $\mu$
is multilinear. Therefore $f$ satisfies the conditions of Theorem 28.3.7. So by the theorem, there is a unique
linear map $g : \bigotimes_{\alpha\in A} V_\alpha \to U$ with $f = g \circ \mu$. This map $g$ must agree with $h$ on the image of $\mu$. Hence $g$
is a linear extension of $h$ from the image $\mu(\times_{\alpha\in A} V_\alpha)$ of $\mu$ to all of the tensor space $\bigotimes_{\alpha\in A} V_\alpha$.


28.3.9 REMARK: Universal factorisation property theorem statement does not need a basis. 

It seems that the proof of Theorem 28.3.7 requires the use of bases for the component spaces of the tensor 
product, although the statement of the theorem does not seem to require bases. The uniqueness part of the 
theorem shows that the constructed map is independent of the basis which is used to construct it. 


28.4. Simple tensors 


28.4.1 REMARK: Untidiness of the old-fashioned sum-of-vector-tuples concept of tensors.

The canonical tensor map $\mu : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$ in Definition 28.2.2 maps vector tuples to tensors
by the rule $\mu : v \mapsto (f \mapsto f(v))$. This map answers the question: “What is the response of each vector tuple
to a given multilinear field?” The idea of “response of a particle to a multilinear field” was mentioned in
Remark 27.1.1. This “response” can be described as the field’s effect on a single vector tuple only in special
cases. In general, the response to a multilinear field may be described as the sum of the effects of one or more
vector tuples. The non-uniqueness of this kind of sum-of-vector-tuples description is frustratingly untidy.
Therefore formalising tensors in this way would lead to massive confusion and complexity.


The untidiness of the sum-of-vector-tuples description is avoided by not including them in the formal
definition of tensor spaces in Section 28.1. Tensors constructed from vector tuples are presented in Section 28.4
as a special topic, not as the bedrock of tensor algebra.


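28.4.2 DEFINITION: A simple tensor in a tensor product space $\bigotimes_{\alpha\in A} V_\alpha$ is any tensor $\mu(v)$ for $v \in \times_{\alpha\in A} V_\alpha$,
where $\mu$ is the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha$.


28.4.3 NOTATION: $\bigotimes_{\alpha\in A} v_\alpha$, for an element $(v_\alpha)_{\alpha\in A}$ of the Cartesian product $\times_{\alpha\in A} V_\alpha$ of a finite family
$(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$, denotes the value $\mu((v_\alpha)_{\alpha\in A})$, where $\mu$ is the canonical tensor map
for the tensor product $\bigotimes_{\alpha\in A} V_\alpha$ in Definition 28.2.2.


28.4.4 NOTATION: $\bigotimes_{i=1}^m v_i$, for a sequence of vectors $(v_i)_{i=1}^m$ in linear spaces $(V_i)_{i=1}^m$ over a field $K$, where
$m \in \mathbb{Z}_0^+$, denotes the simple tensor $\mu((v_i)_{i=1}^m)$.


28.4.5 NOTATION: $v_1 \otimes \ldots \otimes v_m$, for a sequence of vectors $(v_i)_{i=1}^m$ in linear spaces $(V_i)_{i=1}^m$ over a field $K$,
where $m \in \mathbb{Z}_0^+$, denotes the simple tensor $\bigotimes_{i=1}^m v_i$.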


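28.4.6 REMARK: Equivalences between notations for tensor monomials for a family of linear spaces.
In terms of Notations 28.4.4 and 28.4.5, one may write


$$(v_1 \otimes \ldots \otimes v_m)(f) = \Bigl(\bigotimes_{i=1}^m v_i\Bigr)(f) = \mu((v_i)_{i=1}^m)(f) = f((v_i)_{i=1}^m)$$


for all $f \in \mathscr{L}((V_i)_{i=1}^m; K)$.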


28.4.7 REMARK: The relation between simple tensors and simple multilinear maps. 
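Simple tensors $\mu(v) = \bigotimes_{\alpha\in A} v_\alpha$ for $v \in \times_{\alpha\in A} V_\alpha$ are related to the simple multilinear maps $\eta(w)$ in
Definition 27.4.6 as follows.


$$\forall w \in \times_{\alpha\in A} V_\alpha^*,\ \forall v \in \times_{\alpha\in A} V_\alpha,\quad \Bigl(\bigotimes_{\alpha\in A} v_\alpha\Bigr)(\eta(w)) = \eta(w)((v_\alpha)_{\alpha\in A}) = \prod_{\alpha\in A} w_\alpha(v_\alpha).$$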
28.4.8 REMARK: An expression for general tensors in terms of simple tensors.

Suppose $(e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ is a finite basis for $V_\alpha$ for $\alpha \in A$. Using Notation 28.4.3, the simple tensor $\mu((e^\alpha_{i_\alpha})_{\alpha\in A})$
may be written as $\bigotimes_{\alpha\in A} e^\alpha_{i_\alpha}$ for $i = (i_\alpha) \in I = \times_{\alpha\in A} I_\alpha$. Then from Theorem 28.2.12 line (28.2.3), it follows
that $T = \sum_{i\in I} T(\phi_i) \bigotimes_{\alpha\in A} e^\alpha_{i_\alpha}$ for all $T \in \bigotimes_{\alpha\in A} V_\alpha$, where $(\phi_i)_{i\in I}$ is the canonical basis for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
in Definition 27.6.9.


28.4.9 REMARK: A basis for a linear space of tensors. 

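The basis $(e^\alpha_{i_\alpha})_{i_\alpha\in I_\alpha}$ for finite-dimensional $V_\alpha$ for each $\alpha \in A$ yields the basis vectors $e_i = \bigotimes_{\alpha\in A} e^\alpha_{i_\alpha}$
for $\bigotimes_{\alpha\in A} V_\alpha$ for $i = (i_\alpha) \in I = \times_{\alpha\in A} I_\alpha$.

The basis $(e_i)_{i\in I}$ for $\bigotimes_{\alpha\in A} V_\alpha$ is the dual of the basis $(\phi_i)_{i\in I}$ for $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.6.9. These
bases satisfy:


$$\forall i \in I,\ \forall j \in I,\quad e_i(\phi_j) = \phi_j((e^\alpha_{i_\alpha})_{\alpha\in A}) = \prod_{\alpha\in A} \delta_{i_\alpha j_\alpha}.$$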


28.4.10 EXAMPLE: The equivalent “bilinear effect” of vector-pairs. 

Let $v_1 \in V_1$ and $v_2 \in V_2$, where $V_1 = V_2 = \mathbb{R}^n$ with $n \in \mathbb{Z}^+$. Then $\mu((v_1, v_2)) \in V_1 \otimes V_2$ is the linear
functional on $\mathscr{L}(V_1, V_2; \mathbb{R})$ which maps every multilinear function $f \in \mathscr{L}(V_1, V_2; \mathbb{R})$ to $f(v_1, v_2)$. In other
words, $\mu((v_1, v_2)) = v_1 \otimes v_2 : f \mapsto f(v_1, v_2)$ for all $(v_1, v_2) \in V_1 \times V_2$.

If a vector-pair in $\mathbb{R}^n \times \mathbb{R}^n$ is constrained to map to the same tensor in $\mathbb{R}^n \otimes \mathbb{R}^n$, the vectors must be scalar
multiples of a fixed pair of vectors $(v_1, v_2)$. Then the locus of the sum of the vectors in the pair traces out
a hyperbola as illustrated in Figure 28.4.1. Each of these vector-pairs gives the same value for any fixed
bilinear function $f$.


Figure 28.4.1 Equivalent “bilinear effect” of vector pairs


The constraint $f(\lambda_1 v_1, \lambda_2 v_2) = t$ for a bilinear function $f$ implies $\lambda_1 \lambda_2 f(v_1, v_2) = t$, which implies a reciprocal
relation between $\lambda_1$ and $\lambda_2$.

The same monomial degree-2 tensor is determined by each of the vector pairs $(\lambda_1 v_1, \lambda_2 v_2)$ in Figure 28.4.1
because all bilinear functions $f$ must give the same value $f(\lambda_1 v_1, \lambda_2 v_2)$ for each of these vector pairs. Thus
even though we often think of tensors as products of vectors, the tensors are only determined by the vectors.
Many vector pairs determine the same tensor. It is better to think of a tensor as a property of a vector tuple 
rather than as the tuple itself. This is analogous to the common acceptance that 1/2 and 2/4 determine the 
same number. 
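
The reciprocal scaling of the two vectors can be checked with exact rational arithmetic. In this minimal sketch, the coefficient matrix, the vectors and the scaling factors are hypothetical values chosen for illustration.

```python
from fractions import Fraction

A = [[2, -1], [0, 3]]  # hypothetical coefficient matrix of a bilinear function

def f(x, y):
    """Bilinear function f(x, y) = sum_ij A[i][j] * x[i] * y[j]."""
    return sum(A[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

v1, v2 = [1, 2], [3, 1]
scalings = [Fraction(1, 2), Fraction(1), Fraction(3), Fraction(-5, 7)]

# Scale the first vector by lam and the second by 1/lam, so that the product
# of the two scaling factors is always 1.
values = {f([lam * c for c in v1], [c / lam for c in v2]) for lam in scalings}
print(values == {11})  # True
```

Every rescaled pair yields the same value, which is the sense in which the pairs determine the same simple tensor, much as 1/2 and 2/4 determine the same number.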

Let $(a_{ij})_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ be a real $n \times n$ matrix and define $f : V_1 \times V_2 \to \mathbb{R}$ by $f(v_1, v_2) = \sum_{i,j=1}^n a_{ij} v_{1,i} v_{2,j}$
for $(v_1, v_2) \in V_1 \times V_2$, where $(v_{k,i})_{i=1}^n$ are the components of $v_k$ for $k = 1, 2$. Then $f$ is bilinear. Therefore
$(v_1 \otimes v_2)(f) = \mu((v_1, v_2))(f) = \sum_{i,j=1}^n a_{ij} v_{1,i} v_{2,j}$ for all $v_1 \in V_1$ and $v_2 \in V_2$. In this example, $f$ is a general
bilinear map and $v_1 \otimes v_2$ is a simple bilinear tensor.


28.4.11 REMARK: Another way of seeing that not all tensors are simple tensors. 
Not all tensors are simple tensors. This is abundantly clear from comparing the number of parameters available
for simple tensors with the dimension of the space of general tensors. In the case of finite-dimensional spaces
$(V_\alpha)_{\alpha\in A}$ with $n_\alpha = \dim(V_\alpha)$ for each $\alpha \in A$, a simple tensor is determined by a vector tuple, which has only
$\sum_{\alpha\in A} n_\alpha$ scalar coordinates, but $\dim(\bigotimes_{\alpha\in A} V_\alpha) = \prod_{\alpha\in A} n_\alpha$. (See Remark 28.1.23.)


There is a more concrete way to see that not all tensors are simple tensors. If one has a finite basis for each
component space as in Remark 28.4.8, one may write any $T \in \bigotimes_{\alpha\in A} V_\alpha$ as $T = \sum_{i\in I} k_i \bigotimes_{\alpha\in A} e^\alpha_{i_\alpha}$, where
the coefficients $k_i$ are defined by $k_i = T(\phi_i)$ for all $i \in I$. In the case $\#(A) = 2$ with $V_1 = V_2 = V$ with
$\dim(V) = n \in \mathbb{Z}_0^+$ and $K = \mathbb{R}$, the coefficients $k_i$ become an $n \times n$ square matrix $a \in M_{n,n}(\mathbb{R})$. By adjusting
the basis of $V$, the matrix can be diagonalised if it is symmetric. Then the tensor $T$ can be written as the
sum of $n$ simple tensors. Only in very special cases is it possible to choose the basis so that all except one
of the elements of the matrix $a$ are equal to zero. Clearly simple tensors are quite unusual in general.
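
For degree-2 tensors, simplicity can be tested directly on the coefficient matrix: the matrix of a simple tensor is an outer product of two coordinate vectors, so it has rank at most 1, which holds exactly when all of its 2-by-2 minors vanish. The sketch below uses this standard linear algebra criterion; the function names are hypothetical.

```python
def outer(x, y):
    """Coefficient matrix of the simple tensor determined by the pair (x, y)."""
    return [[a * b for b in y] for a in x]

def is_simple(k):
    """True if the coefficient matrix k has rank at most 1 (all 2x2 minors vanish)."""
    rows, cols = len(k), len(k[0])
    return all(k[i][j] * k[p][q] == k[i][q] * k[p][j]
               for i in range(rows) for p in range(i + 1, rows)
               for j in range(cols) for q in range(j + 1, cols))

print(is_simple(outer([1, 2], [3, 4, 5])))  # True
print(is_simple([[1, 0], [0, 1]]))          # False
```

The second matrix corresponds to the sum of two simple tensors built from basis vectors, which is therefore a tensor that is not simple.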


28.4.12 REMARK: Simple tensors can be represented as equivalence classes of vector-tuples. 

Since vector-tuples with the same “multilinear effect” are mapped to the same tensor by the canonical tensor
map, one might conjecture that tensors may be defined as equivalence classes of vector-tuples. Certainly one
may define an equivalence class $\nu(v) = \{v' \in \times_{i=1}^m V_i;\ \forall \phi \in \mathscr{L}_m(V),\ \phi(v') = \phi(v)\}$ for all $v \in \times_{i=1}^m V_i$, but
these classes can represent only simple tensors in $\bigotimes_{i=1}^m V_i$, not general tensors.

Similarly, in the case of antisymmetric tensors in Section 30.4, one might wish to represent such tensors as
equivalence classes $\nu^\wedge(v) = \{v' \in \times_{i=1}^m V_i;\ \forall \phi \in \mathscr{L}^-_m(V),\ \phi(v') = \phi(v)\}$ for $v \in \times_{i=1}^m V_i$. However, only
simple antisymmetric tensors may be represented in this way.

In the case of simple tensors, Example 28.4.10 shows that “equivalent bilinear effect” classes do have a
strong intuitive clarity. Similarly, in Remark 30.4.25, the “equivalent antisymmetric bilinear effect” of vector
pairs gives strong intuitive clarity for simple antisymmetric tensors. Thus the tensor product $v_1 \otimes v_2$ may be
interpreted as the multilinear effect of the pair $(v_1, v_2)$, and the symbolic expression $v_1 \otimes v_2$ may be represented
mathematically as an equivalence class $\nu((v_1, v_2))$. Similarly, the antisymmetric “wedge” product $v_1 \wedge v_2$ may
be interpreted as the antisymmetric multilinear effect of the pair $(v_1, v_2)$, and the symbolic expression $v_1 \wedge v_2$
may be represented mathematically as an equivalence class $\nu^\wedge((v_1, v_2))$. Despite the intuitive advantages,
it is not easy to generalise this kind of representation to general tensors and general antisymmetric tensors.


28.4.13 REMARK: Summary of formulas for canonical multilinear maps 
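It is perhaps helpful to provide here a summary of some formulas. The canonical multilinear function map
$\eta$ (and its transpose $\eta^T$) and the canonical tensor map $\mu$ (and the dual version $\mu^*$) are defined as follows.


$$\begin{aligned}
\eta &: \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K) \\
\eta^T &: \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K) \\
\mu &: \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^* = \bigotimes_{\alpha\in A} V_\alpha \\
\mu^* &: \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)^* = \bigotimes_{\alpha\in A} V_\alpha^* \\
\eta &: w \mapsto \Bigl(v \mapsto \prod_{\alpha\in A} w_\alpha(v_\alpha)\Bigr) && (28.4.1) \\
\eta^T &: v \mapsto \Bigl(w \mapsto \prod_{\alpha\in A} w_\alpha(v_\alpha)\Bigr) && (28.4.2) \\
\mu &: v \mapsto (\phi \mapsto \phi(v)) && (28.4.3) \\
\mu^* &: w \mapsto (\phi \mapsto \phi(w)). && (28.4.4)
\end{aligned}$$

Substituting $\eta(w)$ for $\phi$ in $\mu(v)(\phi)$ and substituting $\eta^T(v)$ for $\phi$ in $\mu^*(w)(\phi)$ gives

$$\begin{aligned}
\forall v \in \times_{\alpha\in A} V_\alpha,\ \forall w \in \times_{\alpha\in A} V_\alpha^*,\quad \mu(v)(\eta(w)) &= \eta(w)(v) = \eta^T(v)(w) \\
\forall v \in \times_{\alpha\in A} V_\alpha,\quad \mu(v) \circ \eta &= \eta^T(v) && (28.4.5) \\
\forall w \in \times_{\alpha\in A} V_\alpha^*,\ \forall v \in \times_{\alpha\in A} V_\alpha,\quad \mu^*(w)(\eta^T(v)) &= \eta^T(v)(w) = \eta(w)(v) \\
\forall w \in \times_{\alpha\in A} V_\alpha^*,\quad \mu^*(w) \circ \eta^T &= \eta(w). && (28.4.6)
\end{aligned}$$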
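Basis-and-coordinates expressions for multilinear functions and tensors are as follows.


$$\begin{aligned}
&\forall i \in I,\quad \phi_i = \eta(h_i) \\
&\forall i \in I,\quad \phi_i^* = \eta^T(e_i) \\
&\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U),\quad f = \sum_{i\in I} f(e_i)\,\eta(h_i) = \sum_{i\in I} f(e_i)\,\phi_i \\
&\forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U),\quad f = \sum_{i\in I} f(h_i)\,\eta^T(e_i) = \sum_{i\in I} f(h_i)\,\phi_i^* \\
&\forall T \in \bigotimes_{\alpha\in A} V_\alpha,\ \forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K),\quad T(f) = \sum_{i\in I} T(\phi_i)\, f(e_i) \\
&\forall T \in \bigotimes_{\alpha\in A} V_\alpha,\quad T = \sum_{i\in I} T(\phi_i)\,\mu(e_i),
\end{aligned}$$


where $U$ is a linear space over the field $K$. The universal factorisation properties are as follows.

$$\begin{aligned}
&\forall f \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; U),\ \exists g \in \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), U),\quad f = g \circ \eta \\
&\forall f \in \mathscr{L}((V_\alpha)_{\alpha\in A}; U),\ \exists g \in \mathrm{Lin}\Bigl(\bigotimes_{\alpha\in A} V_\alpha, U\Bigr),\quad f = g \circ \mu.
\end{aligned}$$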


28.5. Tensor space definition styles and terminology 


28.5.1 REMARK: How to skip this section. 

Section 28.5 is a relaxing general discussion of the representation of tensor concepts in terms of Zermelo- 
Fraenkel sets, the advantages and disadvantages of the numerous alternative styles of definitions, and the 
historical origins of the confusing terms “contravariant” and “covariant”. The reader who prefers hard work 
should skip immediately to Section 29.1. 


28.5.2 REMARK:  Tensorial objects which transform similarly may be very different classes of objects. 
Bishop/Goldberg [3], page 79, make the following comment on the many alternative representations of 
tensor spaces as set-constructions. The term “interpretation” means “representation” in their context. The 
“multilinear function interpretation of a tensor” which they refer to is the style of tensor representation 
presented in Definition 29.3.8. 


Tensors generally admit several interpretations in addition to the definitive one of being a multilinear 
function with values in $\mathbb{R}$. For tensors arising in applications or from mathematical structures it
is rarely the case that the multilinear function interpretation of a tensor is the most meaningful in 
a physical or geometric sense. Thus it is important to be able to pass from one interpretation to 
another. The number of interpretations increases rapidly as a function of the degrees. 


Flanders [11], page 3, makes a related comment. 


In tensor calculations the maze of indices often makes one lose sight of the very great differences 
between various types of quantities which can be represented by tensors, for example, vectors 
tangent to a space, mappings between such vectors, geometric structures on the tangent spaces. 


In other words, the fact that two tensor spaces have the same transformation rules does not necessarily imply 
that they are the same “type of quantity”. 


28.5.3 REMARK: Survey of definition styles for tensor spaces. 

Table 28.5.1 summarises some of the definitions for tensor spaces in the literature. The definitions in the 
table are converted into the notations in this book. Many of the literature definitions do not have the full 
generality of this book. Therefore they are generalised here for ease of comparison. 


The “characterisation” style of definition in Table 28.5.1 means that any tensor space representation is 
accepted if it satisfies Definition 28.6.2. The “quotient” style means that a free linear space is constructed 
on the Cartesian product of the component linear spaces and a quotient space over a set of identities is then 
constructed as in Section 28.7. The “coordinates” style defines tensors in terms of the transformation rules 
for the tensor coefficients. 
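year  reference                        contravariant                                     covariant

1918  Weyl [310]                       coordinates                                       coordinates
1925  Levi-Civita [26]                 coordinates                                       coordinates
1949  Synge/Schild [41]                coordinates                                       coordinates
1950  Corben/Stehle [258]              coordinates                                       coordinates
1951  Steenrod [142]                   coordinates                                       coordinates
1959  Kreyszig [22]                    coordinates                                       coordinates
1959  Willmore [42]                    $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1962  Landau/Lifschitz [282]           coordinates                                       coordinates
1963  Auslander/MacKenzie [1]          characterisation                                  $\bigotimes_{\alpha\in A} V_\alpha^*$, $(\bigotimes_{\alpha\in A} V_\alpha)^*$
1963  Guggenheimer [16]                $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1963  Kobayashi/Nomizu [19]            quotient for $(V_\alpha)_{\alpha\in A}$           quotient for $(V_\alpha^*)_{\alpha\in A}$
1965  Lang [108]                       quotient for $(V_\alpha)_{\alpha\in A}$           -
1965  MacLane/Birkhoff [110]           quotient for $(V_\alpha)_{\alpha\in A}$           quotient for $(V_\alpha^*)_{\alpha\in A}$
1968  Bishop/Goldberg [3]              $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1968  Choquet-Bruhat [6]               $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1969  Federer [69]                     characterisation                                  -
1970  Misner/Thorne/Wheeler [202]      $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1970  Spivak [37]                      $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1972  Malliavin [28]                   -                                                 $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1975  Lovelock/Rund [27]               $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1979  Do Carmo [9]                     -                                                 $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1980  EDM2 [113]                       quotient for $(V_\alpha)_{\alpha\in A}$           quotient for $(V_\alpha^*)_{\alpha\in A}$
1980  Schutz [36]                      $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1981  Bleecker [254]                   $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1983  Marsden/Hughes [289]             $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1986  Crampin/Pirani [7]               $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1987  Gallot/Hulin/Lafontaine [13]     characterisation                                  -
1988  Kay [18]                         $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1991  Fulton/Harris [76]               characterisation                                  $\bigotimes_{\alpha\in A} V_\alpha^*$, $(\bigotimes_{\alpha\in A} V_\alpha)^*$
1994  Darling [8]                      quotient for $(V_\alpha)_{\alpha\in A}$
1995  O'Neill [295]                    $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1996  Goenner [270]                    $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1997  Frankel [12]                     $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1997  Lee [24]                         $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1999  Lang [23]                        -                                                 $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
1999  Rebhan [299]                     coordinates                                       coordinates
2004  Bump [57]                        characterisation                                  -
2004  Szekeres [305]                   quotient for $(V_\alpha)_{\alpha\in A}$           quotient for $(V_\alpha^*)_{\alpha\in A}$
2005  Penrose [297]                    coordinates                                       coordinates
2006  Gregory [272]                    coordinates                                       -
2012  Sternberg [38]                   $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$
      Kennington                       $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$      $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$


Table 28.5.1 Survey of tensor space definitions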


28.5.4 REMARK: The wide diversity of tensor space definition styles. 

The wide range of tensor space definition styles in Table 28.5.1 is a consequence of the difficulty of the 
concept. Tensor spaces do not fit easily into the framework of linear spaces although in components form, 
tensors do superficially resemble vectors or matrices, and in pure algebraic form, tensors do have strong 
relations to linear spaces. 


The “truth” about tensors may be as suggested in Remark 27.1.2, namely that multilinear functions represent
“force fields”, and tensors represent the response to fields, and that response may have a vectorial character,
or a product-of-two-vectors character, or something more complex than that. By starting a presentation of
tensor algebra with the simplest case, the tensors of contravariant degree 1, the subject becomes unnecessarily
complex. Starting with multilinear “force fields” and defining contravariant tensors as responses to such fields
seems to give the subject a more natural structure and development sequence.


28.5.5 REMARK: Criticism of representation of tensors as multilinear functions on linear functionals. 
About a third of the authors in Table 28.5.1 represent contravariant tensors as a space of multilinear functions
on spaces of linear functionals, namely the space $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. The choice of representation in this book
is the space of linear functionals on a space of multilinear functions, namely the space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$.
These may be compared and contrasted as follows.


(A) $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$: Tensors are linear functionals on the multilinear functions. (This book.)
    (i) $u \otimes v$ maps $\phi$ to $\phi(u, v)$.
    (ii) $u \wedge v$ maps $\phi$ to $\phi(u, v)$, where $\phi(u, v) = -\phi(v, u)$.
    (iii) The wedge products are not elements of the general tensor space.

(B) $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$: Tensors are multilinear functions on the linear functionals. (The popular choice.)
    (i) $u \otimes v$ maps $(\phi_1, \phi_2)$ to $\phi_1(u)\phi_2(v)$.
    (ii) $u \wedge v$ maps $(\phi_1, \phi_2)$ to $\frac{1}{2}(\phi_1(u)\phi_2(v) - \phi_2(u)\phi_1(v))$.
    (iii) The wedge products are elements of the general tensor space.

The two representation choices (A) and (B) are naturally isomorphic by Theorem 29.2.4. 


It seems subjectively that contravariant tensors should be a different kind of object to multilinear functions. 
Suppose $u$ and $v$ are vectors in a linear space $W$. In model (A), they form a tensor $u \otimes v$ by acting linearly on
all of the bilinear functions on W. This is a linear response to a bilinear field. In the more popular model (B), 
the response is a bilinear function of pairs of linear functionals, which is more difficult to reconcile with the 
physics. The field in model (B) effectively consists of tuples of linear functionals, which is not how one thinks 
of fields in physics. In model (A), the field is a multilinear function, which is more credible. 


The difference is clearest in the case of antisymmetric tensors. In the dual-of-multilinear-functions case (A), 
the action of the field $\phi$ on a vector pair is the same as in the ordinary tensor case, but $\phi$ just happens to 
be antisymmetric. That is, $\phi(u, v) = -\phi(v, u)$. One can easily implement any kind of symmetry for the field 
by imposing symmetry conditions on the field, which is where the symmetry constraint belongs. 


In the popular multilinear-functions-of-duals representation (B), a simple wedge product $u \wedge v$ must map 
each pair $(\phi_1, \phi_2)$ to the expression $(u \wedge v)(\phi_1, \phi_2) = \frac{1}{2}(u \otimes v - v \otimes u)(\phi_1, \phi_2) = \frac{1}{2}(\phi_1(u)\phi_2(v) - \phi_2(u)\phi_1(v))$. 
In case (B), the particle must perform a more complex calculation on the two field strengths $\phi_1$ and $\phi_2$ to 
decide how to move antisymmetrically. This seems very clumsy and arbitrary. This representation of simple 
wedge products does not put any constraints on the multilinear function pairs $(\phi_1, \phi_2)$. 
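
The contrast between models (A) and (B) can be made concrete with a small computational sketch. The following Python fragment is only an illustration under stated assumptions: vectors in $W = \mathbb{Q}^2$ are tuples, linear functionals and bilinear functions are ordinary Python functions, and the names `tensor_A`, `tensor_B`, `wedge_B`, `det2`, `e1` and `e2` are hypothetical names chosen here, not notation from this book. 

```python
from fractions import Fraction

def tensor_A(u, v):
    """Model (A): u (x) v is a linear functional on bilinear functions phi."""
    return lambda phi: phi(u, v)

def tensor_B(u, v):
    """Model (B): u (x) v is a bilinear function of pairs of linear functionals."""
    return lambda phi1, phi2: phi1(u) * phi2(v)

def wedge_B(u, v):
    """Model (B): u ^ v = (1/2)(u (x) v - v (x) u) acting on functional pairs."""
    half = Fraction(1, 2)
    return lambda phi1, phi2: half * (phi1(u) * phi2(v) - phi2(u) * phi1(v))

def det2(x, y):
    """An antisymmetric bilinear function on Q^2 (the 2x2 determinant)."""
    return x[0] * y[1] - x[1] * y[0]

def e1(x):
    """First coordinate functional."""
    return x[0]

def e2(x):
    """Second coordinate functional."""
    return x[1]

u, v = (1, 2), (3, 5)

# Model (A): the wedge product is the same evaluation as the tensor product,
# but the field is restricted to antisymmetric bilinear functions.
a_uv = tensor_A(u, v)(det2)
a_vu = tensor_A(v, u)(det2)

# Model (B): the antisymmetry is computed by the tensor itself.
b_tensor = tensor_B(u, v)(e1, e2)
b_wedge_uv = wedge_B(u, v)(e1, e2)
b_wedge_vu = wedge_B(v, u)(e1, e2)
```

In this sketch the antisymmetry in model (A) comes entirely from the choice of field `det2`, whereas in model (B) it has to be built into `wedge_B`. 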


The fact that the popular model (B) defines wedge product-tensors to be a subspace of the general tensor 
products might sound like a good idea (because an already defined space can be recycled). But in fact, 
wedge products are a quite different kind of object to general tensors or symmetric tensors. They have a 
different algebra. You shouldn't be able to add an antisymmetric tensor to a symmetric tensor. It is difficult 
to think of a circumstance where you would want to add them. The result would be subject to ambiguous 
multiplication operations. Multiplication of antisymmetric tensors follows different rules to general tensors. 


It seems clear, therefore, that it is best to define antisymmetric contravariant tensors as the dual of the 
antisymmetric multilinear functions. 


The popular representation of tensors as multilinear functions on spaces of linear functionals has the advantage 
of simplicity, especially when dealing with tensors of mixed type, but the advantage of convenience 
must be weighed against the disadvantage of ontological falsity. In other words, the popular representation 
has an obscure meaning, although it is convenient for computations. This is a frequent theme in modern 
differential geometry: trading meaningfulness for computational convenience. One might perhaps speculate 
that part of the reason for the famous incomprehensibility of modern differential geometry is this kind of 
trade-off of meaningfulness for convenience. 


28.5.6 REMARK: Formulas for the wedge product in terms of the tensor product. 

The formula $u \wedge v = \frac{1}{2}(u \otimes v - v \otimes u)$ is given by MacLane/Birkhoff [110], pages 558-560, and Szekeres [305], 
page 209. This must not be confused with the formula $\alpha \wedge \beta = \alpha \otimes \beta - \beta \otimes \alpha$ for differential forms, which is 
given by Frankel [12], page 67, and Darling [8], page 33. 


28.5.7 REMARK: The perplexing terminology for contravariant and covariant tensors. 

It often seems that the words “covariant” and “contravariant” are defined with the reverse meanings to what 
is expected. The terminology may be justified by noting that the coefficients of contravariant vectors vary 
as inverses of the transformations of basis vectors. However, contravariant vectors themselves are the primal 
vectors whereas covariant vectors are the dual vectors. 
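
The inverse variation of contravariant components can be checked numerically. The following Python sketch is an illustration only, under assumptions made here: the space is $\mathbb{Q}^2$, the matrix `P` is a hypothetical change of basis whose columns hold the old-basis components of the new basis vectors, and the helper names `matvec`, `transpose` and `inverse` are not taken from this book. 

```python
from fractions import Fraction as F

def matvec(M, x):
    """Multiply a 2x2 matrix by a length-2 column of components."""
    return [sum(M[i][j] * x[j] for j in range(2)) for i in range(2)]

def transpose(M):
    """Transpose of a 2x2 matrix."""
    return [[M[j][i] for j in range(2)] for i in range(2)]

def inverse(M):
    """Inverse of an invertible 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

# Hypothetical change of basis: column j of P holds the old-basis
# components of the new basis vector e'_j, so e'_j = sum_i e_i P[i][j].
P = [[F(2), F(1)],
     [F(0), F(3)]]

v_old = [F(4), F(6)]  # components of a primal (contravariant) vector
w_old = [F(1), F(2)]  # components of a dual (covariant) vector

v_new = matvec(inverse(P), v_old)    # varies inversely to the basis
w_new = matvec(transpose(P), w_old)  # varies in the same way as the basis

# The pairing w(v) does not depend on the choice of basis.
pairing_old = sum(a * b for a, b in zip(w_old, v_old))
pairing_new = sum(a * b for a, b in zip(w_new, v_new))
```

The vector and the functional themselves do not change here. Only their component tuples change, in opposite senses, so that the pairing is the same in both bases. 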


It seems that the expression of all vectors in terms of components in tensor calculus may be responsible 
for the confusion. If one thinks of the components as being the real vectors and tensors, then certainly the 
term “contravariant components” makes very good sense. But if this adjective is then transferred from the 
components to the vector, the term “contravariant vector” comes into existence, although the vector itself 
does not vary! A more precise term would be a “vector whose components are contravariant”. This would 
remove the confusion but would be tedious to read and write. 


In a comment about the transformation rule for contravariant vector coefficients with respect to a transformed 
basis, Szekeres [305], page 84, says the following. 


This law of transformation of components of a vector v is sometimes called the contravariant 
transformation law of components, a curious and somewhat old-fashioned terminology that 
possibly defies common sense. 


However, as discussed in Remark 27.1.2, the traditional choice of terminology might be completely correct. 
(See also a related discussion of the confusing terminology for the “covariant derivative” in Remark 71.6.2.) 


28.5.8 REMARK: Historical origins of terminology for contravariant and covariant tensors. 
The following passage appears in the 1926 English translation (approved by the author) of a 1923 book by 
Levi-Civita [26], page 20. It is expressed in an old-fashioned way. So it needs deciphering. 


[...] We may therefore interchange $i$ and $j$ in the second part of the preceding formula, so that we 
can now write the equation in the form 

$$\delta\psi_d - d\psi_\delta = \sum_{i,j=1}^{n} \left( \frac{\partial X_i}{\partial x_j} - \frac{\partial X_j}{\partial x_i} \right) \delta x_j \, dx_i. \qquad (11)$$

The expression $\delta\psi_d - d\psi_\delta$ is called the bilinear covariant relative to the given Pfaffian. The use 
of the term “bilinear” is sufficiently justified by the expression just found, which is linear in the 
arguments $dx$ and also in the arguments $\delta x$. The name “covariant” is due to the circumstance that 
the numerical value and formal structure of the two sides of equation (11) always remain the same 
when the independent variables x vary in any way whatever. 


In more modern notation, the quoted equation (11) would look something like: 


$$\partial_Y \omega(X) - \partial_X \omega(Y) = \sum_{i,j=1}^{n} (\partial_j \omega_i - \partial_i \omega_j) X^i Y^j,$$

where $dx$ and $\delta x$ would be written nowadays as vector fields $X$ and $Y$ on a manifold, and Levi-Civita’s Pfaffian 
X would be notated as a 1-form $\omega$. The detailed context of this quotation is not important. The main point is 
that Levi-Civita observes that the exterior derivative (as we now call it) is a second-order covariant tensor, or 
“bilinear covariant” as he calls it, which is chart-independent. The quoted passage suggests that Levi-Civita 
considered chart-independence to mean the same thing as covariance. The key point that he makes, though, 
is that the expression is covariant with respect to the Pfaffian. The expression on line (11) is “bilinear 
covariant relative to the given Pfaffian”. Thus 1-forms are taken to be the fundamental class of objects on 
a manifold, not vector fields. This is possibly the key difference of perspective which explains the quirky 
terminology. Pfaffians are generalisations of differentials of real-valued functions, whereas vector fields are 
generalisations of the derivatives of curves. The early 20th century differential geometry literature places 
more emphasis on solving systems of differential equations than the modern literature. Arrays of differentials 
like $\partial u_i/\partial x^j$ were in the foreground in the earlier literature. (There’s an important clue in the name given 
to the subject, “absolute differential calculus” or “tensor calculus”, not “differential geometry”.) Hence 
any tensor which transformed like a differential was called “covariant”. 


In a later chapter, Levi-Civita [26], pages 61-71, develops the general terminology and definitions for covariant, 
contravariant and mixed tensors. When assigning the name “covariant”, he invokes the examples 
of linear and bilinear forms, force fields and differentials of functions. Then tangent vectors are defined as 
“contravariant” as a consequence. Most particularly, Levi-Civita assumes that physical force is the model 
upon which transformation laws for vectors should be based, whereas the late 20th century view is that 
infinitesimal displacement is the model for vectors. He then notes that the differential of work in a force field 
is given by a formula $dW = X\,dx + Y\,dy + Z\,dz$, and work is an invariant under transformations. Consequently 
the force vector (X, Y, Z) must transform inversely to the coordinates. The text shows some awareness of 
the arbitrariness of the choice of names. 


[...] If in fact, as we may suppose, the vector has a magnitude and a direction which are independent 
of the system of co-ordinates chosen (we shall think of it as being defined physically as a force), 
its components, on the contrary, even when the point considered remains unchanged, change their 
values when the frame of reference is changed. This is obvious in the case of a rotation of Cartesian 
axes. If, however, the transformation considered is not of this particular kind, we do not know 
a priori what to substitute for the projections X, Y, Z of the vector on the axes of the old system 
in order to specify the vector in the new system; i.e. we have to determine the law of transformation 
which will meet the needs of the case in question. The most suitable criterion to take as a guide 
in making our choice is found by introducing, alongside the given vector, a scalar quantity with a 
physical significance which is transformed by invariance. In this case we take two infinitely near 
points whose co-ordinates differ by dx, dy, dz; then the work of the force whose components are 
X, Y, Z, in passing from one of these points to the other, will be 


$$dW = X\,dx + Y\,dy + Z\,dz; \qquad (2)$$


this scalar quantity has a physical significance which is invariant, and it can therefore be concretely 
determined. 


Hence it does seem that the choice of terminology is due to a perspective which is more oriented towards 
algebra, force fields and systems of differential equations, and not oriented so much towards geometry. In 
fact, it may even be erroneous to assume that the developers of tensor calculus identified their research as 
“differential geometry”. The first paragraph of the first chapter of Levi-Civita [26], page 1, seems almost to 
be an apology for introducing any geometrical thinking at all into tensor calculus. 


1. Geometrical terminology 

In analytical geometry it frequently happens that complicated algebraic relationships represent 
simple geometrical properties. In some of these cases, while the algebraic relationships are not easily 
expressed in words, the use of geometrical language, on the contrary, makes it possible to express 
the equivalent geometrical relationships clearly, concisely, and intuitively. Further, geometrical 
relationships are often easier to discover than are the corresponding analytical properties, so that 
geometrical terminology offers not only an illuminating means of exposition, but also a powerful 
instrument of research. We can therefore anticipate that in various questions of analysis it will be 
advantageous to adopt terms taken over from geometry. 


28.6. Tensor product space metadefinition 


28.6.1 REMARK: Metadefinition of tensor spaces via characterisation. 

As can be seen in Table 28.5.1 in Remark 28.5.3, many authors use “characterisation” to meta-define tensor 
spaces. According to this approach, any mathematical system which satisfies the characterisation test is 
acceptable as a tensor space representation, and the choice of representation does not matter. This test is 
presented here as Definition 28.6.2. The test is essentially identical to the “universal factorisation property” 
in Section 28.3. 


28.6.2 DEFINITION: Characterisation of tensor spaces. 
A tensor product space for a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$ is a pair $(W, \mu)$ which 
satisfies the following. 
(i) $W$ is a linear space over $K$. 
(ii) $\mu : \times_{\alpha\in A} V_\alpha \to W$ is multilinear. 
(iii) For any pair $(U, \nu)$, where $U$ is a linear space and $\nu : \times_{\alpha\in A} V_\alpha \to U$ is multilinear, there exists a unique 
linear map $g : W \to U$ such that $\nu = g \circ \mu$. 


The map $\mu$ is referred to as the canonical (multilinear) tensor map. 


28.6.3 REMARK: Universal factorisation property of tensor spaces. 

Definition 28.6.2 is illustrated in Figure 28.6.1. There is a different map $g$ for each multilinear map $\nu$, but 
the map $\mu$ is unique to the particular tensor product space definition and there is a unique function $g$ for 
each pair $(U, \nu)$. 


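[Diagram: the canonical map $\mu$ from $\times_{\alpha\in A} V_\alpha$ to $W$, multilinear maps $\nu_1$ and $\nu_2$ from $\times_{\alpha\in A} V_\alpha$ to $U_1$ and $U_2$, and linear maps $g_1 : W \to U_1$ and $g_2 : W \to U_2$.] 
Figure 28.6.1 Metadefinition of tensor spaces 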
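
The factorisation $\nu = g \circ \mu$ in Definition 28.6.2 can be illustrated in one familiar representation. The following Python sketch assumes, as a standard fact not proved here, that the space of $2 \times 2$ matrices with the outer product map is a tensor product space for two copies of $K^2$. The target space is taken to be $U = K$, and the names `mu`, `make_nu` and `make_g` and the coefficient matrix `M` are hypothetical choices made for this illustration. 

```python
def mu(u, v):
    """Assumed canonical map: the outer product matrix (u_i * v_j)."""
    return [[ui * vj for vj in v] for ui in u]

def make_nu(M):
    """Bilinear map nu(u, v) = sum_ij M[i][j] * u[i] * v[j] with values in K."""
    def nu(u, v):
        return sum(M[i][j] * u[i] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return nu

def make_g(M):
    """Linear map g(T) = sum_ij M[i][j] * T[i][j] on the matrix space W."""
    def g(T):
        return sum(M[i][j] * T[i][j]
                   for i in range(len(T)) for j in range(len(T[0])))
    return g

M = [[1, 2],
     [3, 4]]      # hypothetical coefficients of a bilinear function
nu = make_nu(M)
g = make_g(M)

u, v = (1, 2), (3, 5)
direct = nu(u, v)        # the bilinear map applied to the vector pair
factored = g(mu(u, v))   # the same value obtained via the tensor space
```

Each coefficient matrix `M` gives a different `nu` and a different `g`, but the map `mu` is the same in every case. 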


28.6.4 REMARK: Terminology for the canonical multilinear tensor map. 

EDM2 [113], section 256.1, calls the canonical tensor map $\mu$ in Definition 28.6.2 the “canonical bilinear 
mapping” in the case that #(A) = 2. Both Bump [57], page 50, and Fulton/Harris [76], page 471, call the 
map $\mu$ in Definition 28.6.2 the “universal” bilinear map in the case #(A) = 2. 


28.6.5 REMARK: Isomorphisms between representations of a tensor space. 

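All tensor product definitions are equivalent because any two tensor product definitions will yield isomorphic 
tensor spaces. It follows from condition (iii) that if two tensor product definitions yield pairs $(W_1, \mu_1)$ and 
$(W_2, \mu_2)$ for a finite family $(V_\alpha)_{\alpha\in A}$, then there exist linear maps $g_{21} : W_1 \to W_2$ and $g_{12} : W_2 \to W_1$ such that 
$\mu_2 = g_{21} \circ \mu_1$ and $\mu_1 = g_{12} \circ \mu_2$. Then $g_{12} \circ g_{21} \circ \mu_1 = \mu_1$, so the uniqueness in condition (iii) gives 
$g_{12} \circ g_{21} = \mathrm{id}_{W_1}$, and similarly $g_{21} \circ g_{12} = \mathrm{id}_{W_2}$. Therefore $g_{21}$ and $g_{12}$ are linear isomorphisms between $W_1$ and $W_2$. This is 
illustrated in Figure 28.6.2. 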


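[Diagram: the canonical maps $\mu_1$ and $\mu_2$ from $\times_{\alpha\in A} V_\alpha$ to $W_1$ and $W_2$, with linear maps $g_{21} : W_1 \to W_2$ and $g_{12} : W_2 \to W_1$.] 
Figure 28.6.2 Uniqueness of tensor space up to isomorphism 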


Very importantly, the isomorphism between any two representations of a tensor product is unique. This 
ensures that any individual tensor in one representation may be identified with one and only one particular 
tensor in the other representation. 


28.6.6 REMARK: Alternative representations of tensor product spaces. 
Four ways of defining tensor product spaces are presented in this chapter. 
(i) Characterisation in terms of a multilinear map (Definition 28.6.2). 
(ii) The dual of the linear space of multilinear maps (Definition 28.1.2). 
(iii) The multilinear function space on a mixture of primal and dual spaces (Definition 29.3.8). 
(iv) The quotient of a free linear space by a set of multilinear identities (Definition 28.7.2). 
Definition 28.6.2 defines tensor space representations in general. Definitions 28.1.2 (dual of multilinear 
maps) and 28.7.2 (quotient of free linear space) are particular tensor space representations. All tensor space 
representations (for a particular sequence of linear spaces) are related by a unique isomorphism. Therefore 
calculations in all representations give the same answers. 


28.6.7 REMARK: Disadvantages of the characterisation approach to tensor spaces. 

The characterisation approach to defining tensor spaces is unsatisfactory in many ways. Although mathematical 
objects are often defined “up to isomorphism”, one usually likes to see at least one usable example 
of each equivalence class. It is not too difficult to construct several usable representations of tensor spaces, 
but it is still unsatisfying to be told that tensor spaces are defined by a universal mapping property rather 
than something more concrete. 


Another inconvenience of the characterisation approach is that any modifications to the definition could 
require a lot of hard work to determine a suitable characterisation for the modified definition. For example, 
if one wished to replace a linear space (such as the tangent space at a point in a $C^1$ manifold) with a space 
of unidirectional vectors whose invariance group is not the usual group of invertible linear transformations, 
it would not be so easy to define a universal mapping property for this. The highly abstract framework of 
category theory seems best suited to mathematical classes which are fully described by simple well-defined 
homomorphism classes, particularly when the relations between spaces are more important than the details 
of the classes themselves. 


28.7. Tensor product spaces defined via free linear spaces 


28.7.1 REMARK:  Tensor product spaces defined as quotients of free linear spaces. 

Section 28.7 presents an alternative approach to defining tensor products of linear spaces. A simpler approach 
is presented in Section 28.1. In this section, tensor products are carved out of free linear spaces. Free linear 
spaces are defined in Section 22.2. 


28.7.2 DEFINITION:  Quotient-of-free-linear-space-by-equivalence-relation definition of tensor spaces. 

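The tensor (product) space $\bigotimes_{\alpha\in A} V_\alpha$ of a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$ is the quotient 
linear space $F/G$ of the free linear space $F$ on the set $\times_{\alpha\in A} V_\alpha$ with respect to the subspace $G$ of $F$ generated 
by the set 

$$\{ e_u + e_v - e_w ;\ u, v, w \in \times_{\alpha\in A} V_\alpha \text{ and } \exists \beta \in A,\ (u_\beta + v_\beta = w_\beta \text{ and } \forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha = w_\alpha) \}$$
$$\cup\ \{ e_u - c\,e_v ;\ u, v \in \times_{\alpha\in A} V_\alpha,\ c \in K, \text{ and } \exists \beta \in A,\ (u_\beta = c\,v_\beta \text{ and } \forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha) \},$$

where $e_u \in F$ denotes the function $e_u = \chi_{\{u\}} : \times_{\alpha\in A} V_\alpha \to K$. 


The standard immersion of a product $\times_{\alpha\in A} V_\alpha$ of linear spaces over a field $K$ into the tensor product 
$\bigotimes_{\alpha\in A} V_\alpha$ is the function $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ defined by 

$$\forall v \in \times_{\alpha\in A} V_\alpha, \quad \mu(v) = e_v + G.$$

That is, each element $v$ of $\times_{\alpha\in A} V_\alpha$ is mapped onto the coset in $F/G$ of the indicator function of $\{v\}$. 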


28.7.3 REMARK: Interpretation of vectors in the free-linear-space definition of tensor spaces. 
Definition 28.7.2 may be interpreted as representing each symbol of the form $v$ as $e_v$. (See Figure 28.7.1.) 


Figure 28.7.1 Function $e_u = \chi_{\{u\}}$ for a = 2, $V_1 = V_2 = K = \mathbb{R}$, $u = (3,4)$ 


Definition 28.7.2 is very close to a symbolic algebra representation of sums of products of sequences of 
vectors. The symbolic expression $v_1 \otimes v_2$, for example, may be represented as the function $f : V_1 \times V_2 \to K$ 
with $f(v_1, v_2) = 1_K$ and $f(w_1, w_2) = 0_K$ for all other pairs $(w_1, w_2)$. This close relation to symbolic algebra is possibly the 
motivation underlying it. 


28.7.4 REMARK: Verification that the free-linear-space representation satisfies the metadefinition. 

A metadefinition for the tensor product of linear spaces is given in Section 28.6. Definitions of tensor product 
spaces are “characterised” by a set of properties which any definition of tensor products must satisfy. It is 
straightforward to show that the free linear space tensor product immersion $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ in 
Definition 28.7.2 satisfies Definition 28.6.2 with $W = \bigotimes_{\alpha\in A} V_\alpha$. 


28.7.5 REMARK: Advantages and disadvantages of free-linear-space quotient representations. 

The representation of the tensor product space concept as a quotient of a free linear space with respect to 
an equivalence relation amongst linear combinations of vector-sequences is more concrete than the abstract 
algebraic characterisation approach in Section 28.6. Such quotient spaces are also more flexible, since the 
equivalence relation may be chosen fairly freely. 


Despite the initial intuitive appeal of this extended concept of sums of products of sequences of vectors, it is 
clumsy and unworkable for most analysis. It is difficult to compare vector-sequence-product-sums with each 
other, and algebraic operations are even more difficult, except in the simplest situations. In practice, this 
representation is usually replaced very quickly with component-matrices or some kind of symbolic algebra, 
and thereafter the original definition is forgotten. This contrasts with the dual-of-multilinear-functions 
approach in Section 28.1, where the concrete representation is perfectly usable, and even helpful, in the 
development of the theoretical machinery and in applications. 


Chapter 29 


TENSOR SPACE CONSTRUCTIONS 


29.1 Tensor space duals 
29.2 Unmixed tensor space isomorphisms 
29.5 Mixed tensor product spaces on a single primal space 
29.6 Components for mixed tensors with a single primal space 
29.7 Tensor contractions and juxtaposition products 


29.1. Tensor space duals 


29.1.1 REMARK: Various duals and double duals of multilinear function spaces. 

Section 29.1 focuses on the various tensor spaces which can be constructed from the basic multilinear space 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ by forming the duals of the component tensor spaces $V_\alpha$ or by forming the dual of the whole 
multilinear space. Algebraic dual spaces of infinite-dimensional linear spaces have serious issues whereas the 
duals of finite-dimensional spaces are very well behaved. Therefore the constructions in Section 29.1 are 
generally restricted to finite-dimensional component spaces $V_\alpha$. 


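Figure 29.1.1 illustrates some dual space constructions which start with the basic multilinear function space 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ for finite-dimensional linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$. 


[Diagram: the spaces $\times_{\alpha\in A} V_\alpha^*$, $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, $\bigotimes_{\alpha\in A} V_\alpha^*$ and $(\bigotimes_{\alpha\in A} V_\alpha)^*$ in one row, and the spaces $\times_{\alpha\in A} V_\alpha$, $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, $\bigotimes_{\alpha\in A} V_\alpha$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$ in the other row, connected by the canonical maps $\mu$, $\mu^*$, $\eta$, $\eta^T$ and by arrows marked “dual”.] 
Figure 29.1.1 Tensor space duals 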


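The space $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is constructed from $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ by substituting the dual space $V_\alpha^*$ for each 
component space $V_\alpha$. The space $\bigotimes_{\alpha\in A} V_\alpha = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$ is constructed as the dual of the whole space 
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. Similarly, $\bigotimes_{\alpha\in A} V_\alpha^* = \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)^*$ is constructed from $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. Then the 
double duals $(\bigotimes_{\alpha\in A} V_\alpha)^*$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$ may be constructed similarly. 

It follows from Theorem 23.10.7 that the double duals $(\bigotimes_{\alpha\in A} V_\alpha)^*$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$ have natural 
isomorphisms to their respective primal spaces. Thus $(\bigotimes_{\alpha\in A} V_\alpha)^* \cong \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^* \cong 
\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. Some other natural isomorphisms between spaces in Figure 29.1.1 are not so easy to 
construct. These isomorphisms are the subject of Section 29.2. 


The canonical multilinear maps $\mu$, $\mu^*$, $\eta$ and $\eta^T$ are indicated in Figure 29.1.1. These maps associate simple 
tensors with vector tuples as in Figure 28.2.1 in Remark 28.2.5. 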


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 


29.1.2 REMARK: Alternative unentwined diagram for various duals of multilinear function spaces. 

Figure 29.1.2 contains the same information as Figure 29.1.1, shown differently. Whereas Figure 29.1.1 shows 
duality relations between spaces as diagonal arrows in an “entwined” manner, Figure 29.1.2 “unentwines” 
the spaces to show the duality relations in linear sequences. 


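[Diagram: two linear sequences of spaces, one containing $\times_{\alpha\in A} V_\alpha$, $\bigotimes_{\alpha\in A} V_\alpha$ and $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, the other containing $\times_{\alpha\in A} V_\alpha^*$, $\bigotimes_{\alpha\in A} V_\alpha^*$ and $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, with the canonical maps $\mu$, $\mu^*$, $\eta$ and $\eta^T$.] 
Figure 29.1.2 Tensor spaces with canonical tensor maps 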


29.1.3 REMARK: The meanings of the four multilinear maps. 
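The canonical multilinear maps $\mu$, $\mu^*$, $\eta$ and $\eta^T$ in Figure 29.1.1 in Remark 29.1.1 imply (by Definition 28.6.2) 
that their respective ranges are representations of tensor spaces. 

$$\mu : (v_\alpha)_{\alpha\in A} \mapsto \bigl(\phi \mapsto \phi((v_\alpha)_{\alpha\in A})\bigr) \qquad (29.1.1)$$
$$\eta^T : (v_\alpha)_{\alpha\in A} \mapsto \bigl((w_\alpha)_{\alpha\in A} \mapsto \textstyle\prod_{\alpha\in A} w_\alpha(v_\alpha)\bigr) \qquad (29.1.2)$$
$$\mu^* : (w_\alpha)_{\alpha\in A} \mapsto \bigl(\phi \mapsto \phi((w_\alpha)_{\alpha\in A})\bigr) \qquad (29.1.3)$$
$$\eta : (w_\alpha)_{\alpha\in A} \mapsto \bigl((v_\alpha)_{\alpha\in A} \mapsto \textstyle\prod_{\alpha\in A} w_\alpha(v_\alpha)\bigr) \qquad (29.1.4)$$

The maps $\mu$, $\mu^*$, $\eta$ and $\eta^T$ have the following significance. 

(1) $\mu(v)$ means the representation of the simple contravariant tensor for $v = (v_\alpha)_{\alpha\in A}$ as a linear functional 
acting on $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. 

(2) $\eta^T(v)$ means the representation of the simple contravariant tensor for $v = (v_\alpha)_{\alpha\in A}$ as a multilinear 
functional acting on $\times_{\alpha\in A} V_\alpha^*$. 

(3) $\mu^*(w)$ means the representation of the simple covariant tensor for $w = (w_\alpha)_{\alpha\in A}$ as a linear functional 
acting on $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. 

(4) $\eta(w)$ means the representation of the simple covariant tensor for $w = (w_\alpha)_{\alpha\in A}$ as a multilinear functional 
acting on $\times_{\alpha\in A} V_\alpha$. 


As an example of how the representations $\mu(v)$ and $\eta^T(v)$ work, consider the operation of addition. The sum 
of $\mu(v^1)$ and $\mu(v^2)$ is the map $\phi \mapsto \phi((v^1_\alpha)_{\alpha\in A}) + \phi((v^2_\alpha)_{\alpha\in A})$ for $\phi \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. The sum of $\eta^T(v^1)$ 
and $\eta^T(v^2)$ is the map $(w_\alpha)_{\alpha\in A} \mapsto \prod_{\alpha\in A} w_\alpha(v^1_\alpha) + \prod_{\alpha\in A} w_\alpha(v^2_\alpha)$. 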
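
The addition of simple tensors in the two contravariant representations can be sketched in a few lines of Python. This is an illustration only, under assumptions made here: the index set has two elements, both component spaces are $\mathbb{Q}^2$ with vectors written as tuples, and the names `mu`, `eta_T`, `add`, `phi`, `e1` and `e2` are hypothetical stand-ins for the corresponding maps and test fields. 

```python
from math import prod

def mu(v):
    """mu(v): a linear functional on multilinear functions, phi -> phi(v)."""
    return lambda phi: phi(v)

def eta_T(v):
    """eta^T(v): a multilinear function of functional tuples w."""
    return lambda w: prod(w_a(v_a) for w_a, v_a in zip(w, v))

def add(S, T):
    """Pointwise sum of two tensors in either representation."""
    return lambda x: S(x) + T(x)

# Two vector tuples in Q^2 x Q^2.
v1 = ((1, 0), (0, 1))
v2 = ((0, 1), (1, 0))

def phi(v):
    """A sample bilinear function on Q^2 x Q^2 (hypothetical test field)."""
    x, y = v
    return x[0] * y[1] + 2 * x[1] * y[0]

def e1(x):
    """First coordinate functional."""
    return x[0]

def e2(x):
    """Second coordinate functional."""
    return x[1]

w = (e1, e2)

sum_linear = add(mu(v1), mu(v2))             # acts on multilinear functions
sum_multilinear = add(eta_T(v1), eta_T(v2))  # acts on tuples of functionals

value_linear = sum_linear(phi)        # phi(v1) + phi(v2)
value_multilinear = sum_multilinear(w)
```

In both cases the sum is formed pointwise. The difference lies only in what kind of argument the resulting tensor accepts. 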


29.2. Unmixed tensor space isomorphisms 


29.2.1 REMARK: Isomorphisms and identifications of various tensor space constructions. 

There are many ways to construct new tensor-related spaces by transforming and combining others. These 
construction methods include dual spaces, linear map spaces, multilinear map spaces, tensor products and 
universal factorisation maps. Between these constructed spaces, there are numerous isomorphisms and 
identifications by which they may be grouped together as essentially identical spaces. Most authors do 
identify at least some of these spaces under canonical isomorphisms, often without stating explicitly that 
the identifications are being made. 

It could be claimed that most of these tensor-related constructions are irrelevant and devoid of real interest. 
However, since different authors use different constructions to represent the same concept, it is important 
to define sufficient constructions here to be able to recognise them in the literature. (Representations for 
tensor spaces which are presented by some authors are summarised in Table 28.5.1 in Remark 28.5.3.) It 
is also important to prove sufficient natural isomorphisms to be able to recognise which constructions are 
essentially identical to other constructions. Another objective here is to investigate various tensor space 
constructions in this section to determine which are most suited to particular purposes. 


29.2.2 REMARK: Overview of four basic tensor space isomorphisms. 

Figure 29.2.1 illustrates some relations between various spaces of multilinear maps and tensor products. The 
abbreviation “iso” means “natural isomorphism”, “dual” means “linear space dual”, and “m-dual” means 
the “multilinear dual” or “space of multilinear maps”. 


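[Diagram: the top row contains $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, $\bigotimes_{\alpha\in A} V_\alpha^*$ and $(\bigotimes_{\alpha\in A} V_\alpha)^*$; the bottom row contains $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, $\bigotimes_{\alpha\in A} V_\alpha$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$; arrows are marked “iso”, “dual” and “m-dual”.] 
Figure 29.2.1 Multilinear spaces and tensor products of linear space families 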


The isomorphisms in Figure 29.2.1 help to reconcile the many definitions for covariant and contravariant 
tensor spaces which are adopted by various authors. The spaces in the top row are covariant. The spaces in 
the bottom row are contravariant. 


The syncretistic point of view would be that all three spaces in the top row are valid covariant tensor 
spaces, and the three spaces in the bottom row are valid contravariant tensor spaces. There is no real 
need to identify which is the “correct” representation. They each have their own peculiar virtues which 
are applicable in different circumstances. Therefore they may co-exist as multiple tensor spaces for a single 
family of linear spaces $(V_\alpha)_{\alpha\in A}$. As suggested by Figure 29.2.7 in Remark 29.2.19, each of these tensor space 
representations has its own canonical multilinear map. Therefore they all qualify as tensor spaces according 
to the characterisation test in Definition 28.6.2. 


The following isomorphisms are indicated in Figure 29.2.1. 


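$$\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K) \cong \bigotimes_{\alpha\in A} V_\alpha = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^* \qquad (29.2.1)$$
$$\mathscr{L}((V_\alpha)_{\alpha\in A}; K) \cong \bigotimes_{\alpha\in A} V_\alpha^* = \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)^* \qquad (29.2.2)$$
$$\bigotimes_{\alpha\in A} V_\alpha \cong \Bigl(\bigotimes_{\alpha\in A} V_\alpha^*\Bigr)^* = \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)^{**} \qquad (29.2.3)$$
$$\bigotimes_{\alpha\in A} V_\alpha^* \cong \Bigl(\bigotimes_{\alpha\in A} V_\alpha\Bigr)^* = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^{**} \qquad (29.2.4)$$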


29.2.3 REMARK: Tensors and multilinear functions of dual vectors are isomorphic. 

The isomorphism between $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ and $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$ in Remark 29.2.2 line (29.2.1) demonstrates 
an equivalence between this book's representation for tensor products $\bigotimes_{\alpha\in A} V_\alpha$ and the more popular 
representation. (These two representations are discussed in Remark 28.5.5.) 

The canonical isomorphism which is constructed in Theorem 29.2.4 between tensors and the corresponding 
multilinear functions of dual spaces is analogous to the isomorphism construction which appears in the proof 
of Theorem 29.2.6. The maps and spaces in Theorem 29.2.4 are illustrated in Figure 29.2.2. 


29.2.4 THEOREM: Isomorphism between tensors and multilinear functions of dual vectors. 
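Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a unique 
linear space isomorphism $\psi : \bigotimes_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ which satisfies 

$$\forall v \in \times_{\alpha\in A} V_\alpha,\ \forall w \in \times_{\alpha\in A} V_\alpha^*, \quad \psi(\mu(v))(w) = \prod_{\alpha\in A} w_\alpha(v_\alpha), \qquad (29.2.5)$$

where $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ denotes the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha$. 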


[www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




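[Diagram: the canonical tensor map $\mu$ from $\times_{\alpha\in A} V_\alpha$ to $\bigotimes_{\alpha\in A} V_\alpha$, the map $\eta^T$ from $\times_{\alpha\in A} V_\alpha$ to $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, and the linear map $\psi$ from $\bigotimes_{\alpha\in A} V_\alpha$ to $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.] 
Figure 29.2.2 Application of universal factorisation property 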


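PROOF: The canonical multilinear function map transpose $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is a multilinear 
map by Theorem 27.6.7. So by Theorem 28.3.7 with target space $U = \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, there exists a unique 
linear function $\psi : \bigotimes_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ such that $\eta^T = \psi \circ \mu$. This is the same as line (29.2.5). 

It follows from Theorem 27.5.7 line (27.5.8) that the linear space $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is spanned by $\mathrm{Range}(\eta^T)$. 
But $\mathrm{Range}(\psi) \supseteq \mathrm{Range}(\eta^T)$ because $\eta^T = \psi \circ \mu$, and $\mathrm{span}(\mathrm{Range}(\psi)) = \mathrm{Range}(\psi)$ because $\psi$ is linear. 
Therefore $\mathrm{Range}(\psi) = \mathrm{span}(\mathrm{Range}(\psi)) \supseteq \mathrm{span}(\mathrm{Range}(\eta^T)) = \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. So $\psi$ is surjective. 

To show that $\psi$ is injective, let $T \in \bigotimes_{\alpha\in A} V_\alpha$ satisfy $\psi(T) = 0$. If $T \in \mathrm{Range}(\mu)$, then $T = \mu(v)$ for 
some $v \in \times_{\alpha\in A} V_\alpha$. So $\psi(\mu(v)) = 0$. So $\eta^T(v) = 0$. So $\mu(v) = 0$ by Theorem 27.8.12. Therefore $\mathrm{Range}(\mu) \cap 
\ker(\psi) = \{0\}$. So $\mathrm{span}(\mathrm{Range}(\mu)) \cap \ker(\psi) = \{0\}$. But $\mathrm{span}(\mathrm{Range}(\mu)) = \bigotimes_{\alpha\in A} V_\alpha$. So $\ker(\psi) = \{0\}$. So $\psi$ 
is an injection. Hence $\psi$ is a linear space isomorphism. 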
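
The defining condition (29.2.5) can be checked on examples with a short Python sketch. The explicit formula $\psi(T)(w) = T(\eta(w))$ used below is not stated in the text above. It is an assumption of this sketch, chosen because it is linear in $T$ and reproduces (29.2.5) on simple tensors. The function names and the sample vectors and functionals are likewise hypothetical. 

```python
from math import prod

def mu(v):
    """Canonical tensor map: mu(v) is the functional phi -> phi(v)."""
    return lambda phi: phi(v)

def eta(w):
    """eta(w): the multilinear function v -> product of w_a(v_a)."""
    return lambda v: prod(w_a(v_a) for w_a, v_a in zip(w, v))

def psi(T):
    """Assumed formula for the isomorphism: psi(T)(w) = T(eta(w))."""
    return lambda w: T(eta(w))

def add(S, T):
    """Pointwise sum of two linear functionals."""
    return lambda phi: S(phi) + T(phi)

# Sample vector tuples in Q^2 x Q^2.
v = ((1, 2), (3, 5))
v_other = ((0, 1), (1, 1))

def w1(x):
    """A sample linear functional on Q^2."""
    return x[0] + x[1]

def w2(x):
    """Another sample linear functional on Q^2."""
    return 2 * x[0] - x[1]

w = (w1, w2)

lhs = psi(mu(v))(w)                                  # left side of (29.2.5)
rhs = prod(w_a(v_a) for w_a, v_a in zip(w, v))       # right side of (29.2.5)
lhs_sum = psi(add(mu(v), mu(v_other)))(w)            # psi applied to a sum
```

The value `lhs_sum` shows how the same formula extends from simple tensors to sums of simple tensors without any reference to a basis. 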


29.2.5 REMARK:  Dual-vector tensors and multilinear functions of primal vectors are isomorphic. 

Theorem 29.2.6 presents a canonical isomorphism $\psi$ from $\bigotimes_{\alpha\in A} V_\alpha^*$ to $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ which validates 
line (29.2.2) in Remark 29.2.2. This isomorphism is natural in the sense that it is the unique linear extension 
of the map from $\mu^*(w) = \bigotimes_{\alpha\in A} w_\alpha$ to the multilinear function $\eta(w) : v \mapsto \prod_{\alpha\in A} w_\alpha(v_\alpha)$ for all $w = 
(w_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$, where $\mu^* : \times_{\alpha\in A} V_\alpha^* \to \bigotimes_{\alpha\in A} V_\alpha^*$ is the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha^*$, and 
$\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map defined in Definition 27.4.6. 


The isomorphism $\psi$ is explicitly specified only for simple tensors, not for general tensors in $\bigotimes_{\alpha\in A} V_\alpha^*$. The 
existence and uniqueness of the extension from simple tensors to all tensors is guaranteed by the universal 
factorisation property for tensor products, which uses bases and coordinates for the proof. An explicit 
construction of the isomorphism would therefore require the use of coordinates with respect to bases for 
the spaces $(V_\alpha)_{\alpha\in A}$. The isomorphism is not basis-dependent, but the proof does employ a basis for each 
component space. 


The functions and spaces in Theorem 29.2.6 are illustrated in Figure 29.2.3. 


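[Diagram: the canonical tensor map $\mu^*$ from $\times_{\alpha\in A} V_\alpha^*$ to $\bigotimes_{\alpha\in A} V_\alpha^*$, the map $\eta$ from $\times_{\alpha\in A} V_\alpha^*$ to $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, and the linear map $\psi$ from $\bigotimes_{\alpha\in A} V_\alpha^*$ to $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$.] 
Figure 29.2.3 Application of universal factorisation property (tensor space) 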


29.2.6 THEOREM: Isomorphism between dual-vector tensors and multilinear functions of primal vectors. 
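Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a unique 
linear space isomorphism $\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ which satisfies 

$$\forall w \in \times_{\alpha\in A} V_\alpha^*,\ \forall v \in \times_{\alpha\in A} V_\alpha, \quad \psi(\mu^*(w))(v) = \prod_{\alpha\in A} w_\alpha(v_\alpha), \qquad (29.2.6)$$

where $\mu^* : \times_{\alpha\in A} V_\alpha^* \to \bigotimes_{\alpha\in A} V_\alpha^*$ denotes the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha^*$. 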


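PROOF: The canonical multilinear function map $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is a multilinear map by
Theorem 27.6.6. So by Theorem 28.3.7 with target space $U = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, there exists a unique linear
function $\bar\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ such that $\eta = \bar\psi \circ \mu^*$. This is the same as line (29.2.6).

It follows from line (27.5.4) in Theorem 27.5.4 that the linear space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is spanned by $\mathrm{Range}(\eta)$.
But $\mathrm{Range}(\bar\psi) \supseteq \mathrm{Range}(\eta)$ because $\eta = \bar\psi \circ \mu^*$, and $\mathrm{span}(\mathrm{Range}(\bar\psi)) = \mathrm{Range}(\bar\psi)$ because $\bar\psi$ is linear.
Therefore $\mathrm{Range}(\bar\psi) = \mathrm{span}(\mathrm{Range}(\bar\psi)) \supseteq \mathrm{span}(\mathrm{Range}(\eta)) = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. So $\bar\psi$ is surjective.

To show that $\bar\psi$ is injective, let $T \in \bigotimes_{\alpha\in A} V_\alpha^*$ satisfy $\bar\psi(T) = 0$. If $T \in \mathrm{Range}(\mu^*)$, then $T = \mu^*(w)$ for
some $w \in \times_{\alpha\in A} V_\alpha^*$. So $\bar\psi(\mu^*(w)) = 0$. So $\eta(w) = 0$. So $\mu^*(w) = 0$ by Theorem 27.8.2. Therefore $\mathrm{Range}(\mu^*) \cap
\ker(\bar\psi) = \{0\}$. So $\mathrm{span}(\mathrm{Range}(\mu^*)) \cap \ker(\bar\psi) = \{0\}$. But $\mathrm{span}(\mathrm{Range}(\mu^*)) = \bigotimes_{\alpha\in A} V_\alpha^*$. So $\ker(\bar\psi) = \{0\}$. So
$\bar\psi$ is an injection. (Injectivity also follows directly from surjectivity, since both spaces have the same finite
dimension $\prod_{\alpha\in A} \dim V_\alpha$.) Hence $\bar\psi$ is a linear space isomorphism.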
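
Line (29.2.6) can be checked numerically in coordinates. The following Python sketch is only an illustration under stated assumptions: the field is the rationals, there are two component spaces (of dimensions 2 and 3), and covectors and vectors are given by coordinate lists with respect to a basis and its dual basis. The helper names `pair`, `mu_star` and `psi_bar` are hypothetical and are not part of the book's notation.

```python
from fractions import Fraction
from itertools import product


def pair(w, v):
    """Apply the covector w to the vector v (both given by coordinate lists)."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(w, v))


def mu_star(ws):
    """Coordinates t[i] = prod_a ws[a][i_a] of the simple tensor of a covector family."""
    t = {}
    for i in product(*(range(len(w)) for w in ws)):
        coeff = Fraction(1)
        for a, i_a in enumerate(i):
            coeff *= Fraction(ws[a][i_a])
        t[i] = coeff
    return t


def psi_bar(t):
    """Multilinear function vs -> sum_i t[i] * prod_a vs[a][i_a] with coefficient array t."""
    def f(vs):
        total = Fraction(0)
        for i, coeff in t.items():
            term = coeff
            for a, i_a in enumerate(i):
                term *= Fraction(vs[a][i_a])
            total += term
        return total
    return f


w = [[1, 2], [3, 0, 1]]        # covectors on a 2-dimensional and a 3-dimensional space
v = [[5, 7], [2, 4, 6]]        # vectors in the same two spaces
lhs = psi_bar(mu_star(w))(v)   # left side of line (29.2.6)
rhs = pair(w[0], v[0]) * pair(w[1], v[1])   # right side of line (29.2.6)
```

Because `psi_bar` accepts an arbitrary coefficient array, the same function also evaluates non-simple tensors, which is the extension from simple tensors that the universal factorisation property guarantees.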


29.2.7 REMARK: The universal factorisation property uses bases-and-coordinates for its proof. 

The construction of an isomorphism $\mathscr{L}((V_\alpha)_{\alpha\in A}; K) \cong \bigotimes_{\alpha\in A} V_\alpha^*$ in Theorem 29.2.6 cleverly hides the use
of coordinates. Theorem 28.3.7 (the universal factorisation property for tensor products) requires bases
and coordinates for its proof. Presentations of tensor algebra which characterise (i.e. meta-define) tensor
spaces in terms of a universal factorisation property seem to have no dependence on coordinates, but any
representation of such a metadefinition requires bases and coordinates to boot-strap such an apparently
"coordinate-free" approach. This seems to be true more generally. Essentially all claimed coordinate-free
approaches to differential geometry topics are really "coordinates-hidden".


29.2.8 REMARK: The last two tensor space isomorphisms are direct consequences of the first two. 

A natural isomorphism in line (29.2.3) may be constructed from the natural isomorphism in line (29.2.2)
by observing that $\bigotimes_{\alpha\in A} V_\alpha$ is the dual of $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$ is the dual of $\bigotimes_{\alpha\in A} V_\alpha^*$.
Line (29.2.4) is proved in exactly the same way in Theorem 29.2.10. This completes the demonstration
of the four isomorphisms required to validate Figure 29.2.1.


29.2.9 THEOREM: Isomorphism between tensors and the dual of the dual-space tensors. 
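Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural
isomorphism between $\bigotimes_{\alpha\in A} V_\alpha$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$.


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U = \bigotimes_{\alpha\in A} V_\alpha^*$
and $W = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. Then by Theorem 29.2.6, there is a natural linear space isomorphism $\bar\psi :
U \to W$. By Theorem 23.11.12, the transpose $\bar\psi^T : W^* \to U^*$ of $\bar\psi$ is a linear space isomorphism. Thus
$\bar\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$ is a natural linear space isomorphism.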


29.2.10 THEOREM: Isomorphism between dual-space tensors and the dual of the primal-space tensors. 
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural
isomorphism between $\bigotimes_{\alpha\in A} V_\alpha^*$ and $(\bigotimes_{\alpha\in A} V_\alpha)^*$.


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U = \bigotimes_{\alpha\in A} V_\alpha$
and $W = \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. Then by Theorem 29.2.4, there is a natural linear space isomorphism $\psi :
U \to W$. By Theorem 23.11.12, the transpose $\psi^T : W^* \to U^*$ of $\psi$ is a linear space isomorphism. Thus
$\psi^T : \bigotimes_{\alpha\in A} V_\alpha^* \to (\bigotimes_{\alpha\in A} V_\alpha)^*$ is a natural linear space isomorphism.


29.2.11 REMARK: Tensor space isomorphisms in the context of the canonical multilinear maps. 

Figure 29.2.4 illustrates the four isomorphisms in Theorems 29.2.4, 29.2.6, 29.2.9 and 29.2.10 combined with
the context of the diagram in Figure 29.1.1, which shows the canonical tensor map $\mu$ and the canonical
multilinear function map $\eta$, and the maps $\mu^*$ and $\eta^T$ for the corresponding dual spaces.


Figure 29.2.4 Canonical tensor map relations between tensor spaces


29.2.12 REMARK: Unentwined vertical diagram of tensor space isomorphisms and duals.
Figure 29.2.5 contains the same information as Figure 29.2.4, but shown in the “unentwined” vertical manner 
which is explained in Remark 29.1.2. 


Figure 29.2.5 Tensor spaces with isomorphisms and canonical tensor maps


29.2.13 REMARK: Alternative constructions for some tensor space isomorphisms. 

An alternative way to construct the natural isomorphism in line (29.2.4) of Remark 29.2.2 is to compose the
isomorphism in Theorem 29.2.6 with the standard isomorphism (canonical dual map) for the second dual
in Theorem 23.10.7 and Definition 23.10.4. This standard isomorphism for the second dual is written out
explicitly as Theorem 29.2.14. Then the isomorphism $h : \mathscr{L}((V_\alpha)_{\alpha\in A}; K) \to (\bigotimes_{\alpha\in A} V_\alpha)^*$ in the proof of
Theorem 29.2.14 may be composed with the isomorphism $\bar\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Theorem 29.2.6
to give an isomorphism $h \circ \bar\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to (\bigotimes_{\alpha\in A} V_\alpha)^*$, which satisfies line (29.2.4).


29.2.14 THEOREM: Isomorphism between multilinear functions of primal vectors and the dual of tensors. 
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural
isomorphism between $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ and $(\bigotimes_{\alpha\in A} V_\alpha)^*$.


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then by
Theorem 23.10.7, there is a natural linear space isomorphism $h$ from the linear space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ to its double
dual. Thus $h : \mathscr{L}((V_\alpha)_{\alpha\in A}; K) \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^{**} = (\bigotimes_{\alpha\in A} V_\alpha)^*$.


29.2.15 REMARK: Yet another canonical tensor space isomorphism. 

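The proof of Theorem 29.2.16 is the same as for Theorem 29.2.14 because it is really the same theorem! By
combining the isomorphism $h^* : \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K) \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$ in Theorem 29.2.16 with the isomorphism
$\bar\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$ in the proof of Theorem 29.2.9, one obtains a natural isomorphism
$(h^*)^{-1} \circ \bar\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ which is as required for line (29.2.1) in Remark 29.2.2.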


29.2.16 THEOREM: Isomorphism between multilinear functions of duals and dual of dual-space tensors. 
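Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural
isomorphism between $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$.


PROOF: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then by
Theorem 23.10.7, there is a natural linear space isomorphism $h^*$ from the linear space $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ to its
double dual. Thus $h^* : \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K) \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)^{**} = (\bigotimes_{\alpha\in A} V_\alpha^*)^*$.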


29.2.17 REMARK: Circuitous method of constructing a tensor space isomorphism from four isomorphisms. 
The pathway to construct an isomorphism between $\bigotimes_{\alpha\in A} V_\alpha$ and $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ on line (29.2.1) in
Remark 29.2.2 could be made more circuitous. First, the universal factorisation property for the canonical
tensor map $\mu^*$ can be applied in Theorem 29.2.6 to implicitly construct an isomorphism $\bar\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to
\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ such that $\eta = \bar\psi \circ \mu^*$, where $\eta$ is the canonical multilinear function map transpose. This
construction employs a basis and vector components. Second, Theorem 29.2.9 transposes $\bar\psi$ to obtain the
isomorphism $\bar\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$. Third, Theorem 29.2.16 employs the canonical double dual
map $h^* : \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K) \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$. Fourth, $\bar\psi^T$ may be composed with the inverse of $h^*$ to obtain
the isomorphism $(h^*)^{-1} \circ \bar\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.


29.2.18 REMARK: Circuitous method of constructing a tensor space isomorphism by inversion.

It is easy to imagine from Figure 29.2.3 that one could swap the roles of $\eta$ and $\mu^*$ and apply the universal
factorisation property for multilinear functions (Theorem 27.8.7). (See Figures 27.8.1 and 29.2.6.) Then $\bar\psi^{-1}$
will be the unique linear space isomorphism whose existence is guaranteed by the theorem. This shows that
$\bar\psi$ maps between simple tensors and the corresponding multilinear functions, but $\bar\psi$ also extends this map to
the entire spaces of tensors and multilinear functions.


Figure 29.2.6 Application of universal factorisation property (multilinear functions)


29.2.19 REMARK: Relations between representations of tensors. 

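Both the canonical multilinear function map $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ and the canonical tensor
map $\mu^* : \times_{\alpha\in A} V_\alpha^* \to \bigotimes_{\alpha\in A} V_\alpha^*$ are shown in Figure 29.2.3 for Theorem 29.2.6. Since both of these maps
have universal factorisation properties (Theorems 27.8.7 and 28.3.7 respectively), it seems likely that the
isomorphism $\bar\psi$ of Theorem 29.2.6 has some fundamental significance. To see how these maps are related to each other, and to
the linear map $\psi$ and the multilinear maps $\mu : \times_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ and $\eta^T : \times_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$
in Figure 29.2.2 for Theorem 29.2.4, the basic definitions may be summarised as follows.

$$\mu = \psi^{-1} \circ \eta^T$$
$$\eta^T = \psi \circ \mu \tag{29.2.7}$$
$$\mu^* = \bar\psi^{-1} \circ \eta$$
$$\eta = \bar\psi \circ \mu^* \tag{29.2.8}$$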


These relations are illustrated in Figure 29.2.7. 


Figure 29.2.7 Isomorphisms between representations of tensor spaces


The existence of the linear space isomorphism $\psi$ of Theorem 29.2.4 implies that the contravariant tensor representations $\mu(v)$
and $\eta^T(v)$ contain the same information. However, the $\mu(v)$ style of representation is preferred in this book
because it has clearer motivation and significance in the physics context, it is easier to construct spaces
of alternating tensors from it, and it gives clearer meaning to various kinds of mixed tensor constructions
(including the Riemann curvature tensor). The most popular representation is $\eta^T(v)$.

Similarly, the existence of the linear space isomorphism $\bar\psi$ of Theorem 29.2.6 implies that the covariant tensor representations
$\mu^*(w)$ and $\eta(w)$ contain the same information. The $\eta(w)$ style of representation is preferred in this book.
This is the same as the popular definition.


29.2.20 REMARK: The analogy of multilinear function spaces to distribution "test function" spaces. 

In the contravariant case in Remark 29.2.19, $\mu(v)$ and $\eta^T(v)$ are alternative representations for the tensor
product of the contravariant vector family $(v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$. The linear space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ may be
thought of as the "test space" for $\mu(v)$ in the same way that generalised functions act on $C^\infty$ test spaces.
Such test spaces serve only to "test" the objects which are defined on them. The Cartesian product $\times_{\alpha\in A} V_\alpha^*$
may be thought of as the "test space" for the representation $\eta^T(v)$.


The equation $\mu(v) \circ \eta = \eta^T(v)$ in line (28.4.5) in Remark 28.4.13 gives the conversion rule between the
contravariant tensor representations $\mu(v)$ and $\eta^T(v)$ for $v$. The right-composition of $\mu(v)$ by $\eta$ serves to shift
the test space of the representation from $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ to $\times_{\alpha\in A} V_\alpha^*$. Thus the tensor representation $\mu(v)$
which maps the test space $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ to $K$ is converted into the equivalent tensor representation $\eta^T(v)$,
which maps the test space $\times_{\alpha\in A} V_\alpha^*$ to $K$.


Thus $\eta$ acts on the representation $\mu(v)$ on the right to shift the domain of $\mu(v)$ to convert it to the
representation $\eta^T(v)$. Similarly, the equation $\eta^T = \psi \circ \mu$ in line (29.2.7) means that the map $\psi$ acts on $\mu$ on the
left to shift the range of $\mu$ to convert it to the representation $\eta^T$. The big advantage of this left-composition
with $\psi$ is that it maps all tensors, not just the simple tensors as in the case of the right-composition with $\eta$.


29.2.21 REMARK: Another analogy between multilinear function spaces and "test function" spaces. 

In the covariant case in Remark 29.2.19, $\mu^*(w)$ and $\eta(w)$ are alternative representations for the tensor product
of the covariant vector family $(w_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$. The linear space $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ may be thought of as
the "test space" for $\mu^*(w)$. The Cartesian product $\times_{\alpha\in A} V_\alpha$ may be thought of as the "test space" for the
representation $\eta(w)$.


The equation $\mu^*(w) \circ \eta^T = \eta(w)$ in line (28.4.6) of Remark 28.4.13 gives the conversion rule between the
covariant tensor representations $\mu^*(w)$ and $\eta(w)$ for $w$. The right-composition of $\mu^*(w)$ by $\eta^T$ serves to shift
the test space of the representation from $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ to $\times_{\alpha\in A} V_\alpha$. Thus the tensor representation $\mu^*(w)$
which maps the test space $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ to $K$ is converted into the equivalent tensor representation $\eta(w)$,
which maps the test space $\times_{\alpha\in A} V_\alpha$ to $K$.


Thus $\eta^T$ acts on the representation $\mu^*(w)$ on the right to shift the domain of $\mu^*(w)$ to convert it to the
representation $\eta(w)$. Similarly, the equation $\eta = \bar\psi \circ \mu^*$ on line (29.2.8) means that the map $\bar\psi$ acts on $\mu^*$ on
the left to shift the range of $\mu^*$ to convert it to the representation $\eta$. The big advantage of this left-composition
with $\bar\psi$ is that it maps all tensors, not just the simple tensors as in the case of the right-composition with $\eta^T$.
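
The two conversion rules in lines (28.4.5) and (28.4.6) can be illustrated with a small Python sketch. It assumes rational coordinates, two component spaces, and covectors represented by coefficient lists with respect to dual bases. The function names `eta`, `eta_T`, `mu` and `mu_star` mirror the symbols above, and modelling the canonical tensor maps as evaluation functionals is the sketch's own encoding of the text.

```python
from fractions import Fraction


def pair(w_a, v_a):
    """Apply one covector to one vector (coordinate lists)."""
    return sum(Fraction(c) * Fraction(x) for c, x in zip(w_a, v_a))


def eta(w):
    """eta(w): multilinear function on primal vectors, v -> prod_a w_a(v_a)."""
    def f(v):
        out = Fraction(1)
        for w_a, v_a in zip(w, v):
            out *= pair(w_a, v_a)
        return out
    return f


def eta_T(v):
    """eta^T(v): multilinear function on dual vectors, w -> prod_a w_a(v_a)."""
    def g(w):
        out = Fraction(1)
        for w_a, v_a in zip(w, v):
            out *= pair(w_a, v_a)
        return out
    return g


def mu(v):
    """mu(v): evaluation functional f -> f(v) on multilinear functions of primal vectors."""
    return lambda f: f(v)


def mu_star(w):
    """mu^*(w): evaluation functional g -> g(w) on multilinear functions of dual vectors."""
    return lambda g: g(w)


v = [[1, 2], [0, 3, 1]]
w = [[4, -1], [2, 5, 7]]
contravariant_lhs = mu(v)(eta(w))       # (mu(v) o eta)(w)
contravariant_rhs = eta_T(v)(w)         # eta^T(v)(w)
covariant_lhs = mu_star(w)(eta_T(v))    # (mu^*(w) o eta^T)(v)
covariant_rhs = eta(w)(v)               # eta(w)(v)
```

Both rules reduce to the same product of pairings, which is why the right-compositions only relate simple tensors to simple multilinear functions.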


29.2.22 REMARK: Interpreting the structure of the canonical isomorphisms.

From Remarks 29.2.20 and 29.2.21, it can be seen what the tensor space isomorphisms $\psi$ and $\bar\psi$ look like
when expressed in terms of bases and coordinates. The relation $\eta^T = \psi \circ \mu$ implies that $\psi$ maps the
simple tensor representations $\mu(v)$ and $\eta^T(v)$ to each other for all $v \in \times_{\alpha\in A} V_\alpha$. In particular, for any bases
$B_\alpha = (e^\alpha_i)_{i\in I_\alpha}$ for the component spaces $V_\alpha$, the simple tensor representations $\mu(e_i)$ and $\eta^T(e_i)$ are mapped
to each other by $\psi$, where $e_i$ denotes $(e^\alpha_{i_\alpha})_{\alpha\in A}$ for $i = (i_\alpha)_{\alpha\in A} \in I = \times_{\alpha\in A} I_\alpha$.

Theorem 28.2.12 line (28.2.3) gives the expression $T = \sum_{i\in I} T(\phi_i)\,\mu(e_i)$ for $T \in \bigotimes_{\alpha\in A} V_\alpha$, where $h_i$ denotes
the dual vector family $(h^\alpha_{i_\alpha})_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha^*$ for $i \in I$, and $(\phi_i)_{i\in I} = (\eta(h_i))_{i\in I}$ is the canonical basis for
$\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.6.9 for the bases $B_\alpha$. So each such tensor $T$ has coordinates $t_i = T(\phi_i)$
for $i \in I$ with respect to the basis $(\mu(e_i))_{i\in I}$ for $\bigotimes_{\alpha\in A} V_\alpha$.

Theorem 27.5.7 line (27.5.8) gives the expression $T' = \sum_{i\in I} T'(h_i)\,\eta^T(e_i)$ for $T' \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$. So the
coordinates of $T'$ are $t'_i = T'(h_i)$ for $i \in I$ with respect to the basis $(\eta^T(e_i))_{i\in I}$ for $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$.

If $T' = \psi(T)$, then the coordinates satisfy $t_i = t'_i$ for all $i \in I$ because the bases $(\mu(e_i))_{i\in I}$ and $(\eta^T(e_i))_{i\in I}$
for $\bigotimes_{\alpha\in A} V_\alpha$ and $\mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ respectively are mapped to each other by $\psi$. Hence the map $\psi$ is the map
which preserves coordinates between the two spaces. Therefore $\psi$ is invariant under all changes of bases for
the component spaces $(V_\alpha)_{\alpha\in A}$.

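The isomorphism $\psi$ has the map-rule $\psi : \sum_{i\in I} t_i\,\mu(e_i) \mapsto \sum_{i\in I} t_i\,\eta^T(e_i)$ for all $t = (t_i)_{i\in I} \in K^I$. Hence in
terms of coordinates, the map-rule is $\psi : (t_i)_{i\in I} \mapsto (t_i)_{i\in I}$. If $A = \mathbb{N}_r$ for some $r \in \mathbb{Z}^+$, the map-rule may be
written as $\psi : \sum_{i\in I} t_{i_1 \ldots i_r}\, e^1_{i_1} \otimes e^2_{i_2} \otimes \cdots \otimes e^r_{i_r} \mapsto \sum_{i\in I} t_{i_1 \ldots i_r}\, \eta^T(e^1_{i_1}, e^2_{i_2}, \ldots, e^r_{i_r})$.

These comments apply equally to the map $\bar\psi$, referring to the comments in Remark 29.2.21.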


29.2.23 REMARK: Diagram of tensor space duals and isomorphisms for homogeneous tensors. 

Figure 29.2.8 illustrates the relations between spaces of multilinear maps and tensor product spaces of a 
single linear space. As in Figure 29.2.1, the abbreviation “iso” means “isomorphism”, and “m-dual” means 
the “multilinear dual” or “space of multilinear maps". 


Figure 29.2.8 Linear and multilinear duals of a linear space


In the case that all tensor component linear spaces are copies of a single finite-dimensional linear space, the
isomorphisms listed in Remark 29.2.2 may be written as follows. These are the isomorphisms which are
indicated in Figure 29.2.8.

$$\mathscr{L}_m(V^*; K) \cong \textstyle\bigotimes^m V = \mathscr{L}_m(V; K)^* \tag{29.2.9}$$
$$\mathscr{L}_m(V; K) \cong \textstyle\bigotimes^m V^* = \mathscr{L}_m(V^*; K)^* \tag{29.2.10}$$
$$\textstyle\bigotimes^m V \cong (\bigotimes^m V^*)^* = \mathscr{L}_m(V^*; K)^{**} \tag{29.2.11}$$
$$\textstyle\bigotimes^m V^* \cong (\bigotimes^m V)^* = \mathscr{L}_m(V; K)^{**} \tag{29.2.12}$$


These natural isomorphisms follow immediately from the corresponding isomorphisms in Remark 29.2.2, 
which are proved in Theorems 29.2.6, 29.2.14, 29.2.9 and 29.2.16. 


29.2.24 REMARK: Unentwined diagram of tensor space duals and isomorphisms for homogeneous tensors. 
Figure 29.2.9 contains the same information as Figure 29.2.8, shown “unentwined”. (See Remark 29.1.2 for 
an explanation of such “unentwined” diagrams.) 


Figure 29.2.9 Tensor spaces for a single linear space with canonical tensor maps


29.2.25 REMARK: An isomorphism for spaces of linear maps. 

Amongst the bewildering (and often tedious) array of isomorphisms for various spaces which are constructed 
using multilinear and linear duals, the isomorphism in Theorem 29.2.26 has the virtue of being simple and 
also occasionally useful. This simple isomorphism does not require any canonical multilinear maps. However, 
note that the proof of the canonical isomorphism between W and W** requires a basis and coordinates. 


29.2.26 THEOREM: Isomorphism between linear maps and multilinear maps on a linear space and a dual. 
Let $V$ and $W$ be finite-dimensional linear spaces over a field $K$. Define the map $\psi : \mathrm{Lin}(V, W) \to
\mathscr{L}(V, W^*; K)$ by $\psi(\phi)(v, x) = x(\phi(v))$ for all $\phi \in \mathrm{Lin}(V, W)$ and $(v, x) \in V \times W^*$. Then $\psi$ is a linear
space isomorphism.


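PROOF: Let $V$ and $W$ be finite-dimensional linear spaces over $K$. Let $\psi$ be as defined. Let $\lambda_1, \lambda_2 \in K$
and $\phi_1, \phi_2 \in \mathrm{Lin}(V, W)$. Then $\psi(\lambda_1\phi_1 + \lambda_2\phi_2)(v, x) = x((\lambda_1\phi_1 + \lambda_2\phi_2)(v)) = x(\lambda_1\phi_1(v) + \lambda_2\phi_2(v)) =
\lambda_1 x(\phi_1(v)) + \lambda_2 x(\phi_2(v)) = \lambda_1\psi(\phi_1)(v, x) + \lambda_2\psi(\phi_2)(v, x) = (\lambda_1\psi(\phi_1) + \lambda_2\psi(\phi_2))(v, x)$ for all $(v, x) \in V \times W^*$.
So $\psi(\lambda_1\phi_1 + \lambda_2\phi_2) = \lambda_1\psi(\phi_1) + \lambda_2\psi(\phi_2)$ for all $\lambda_1, \lambda_2 \in K$ and $\phi_1, \phi_2 \in \mathrm{Lin}(V, W)$. So $\psi$ is a linear map.

For $\theta \in \mathscr{L}(V, W^*; K)$, define a map $\bar\psi(\theta) : V \to W^{**}$ by $\bar\psi(\theta)(v)(x) = \theta(v, x)$ for all $(v, x) \in V \times W^*$. It is
clear that $\bar\psi(\theta)(v) : W^* \to K$ is linear for $v \in V$. So $\bar\psi(\theta)(v) \in W^{**}$. It is also clear that $\bar\psi(\theta) : V \to W^{**}$ is
linear. So $\bar\psi(\theta) \in \mathrm{Lin}(V, W^{**})$ for all $\theta \in \mathscr{L}(V, W^*; K)$.

Let $\lambda_1, \lambda_2 \in K$ and $\theta_1, \theta_2 \in \mathscr{L}(V, W^*; K)$. Then $\bar\psi(\lambda_1\theta_1 + \lambda_2\theta_2)(v)(x) = (\lambda_1\theta_1 + \lambda_2\theta_2)(v, x) = \lambda_1\theta_1(v, x) +
\lambda_2\theta_2(v, x) = \lambda_1\bar\psi(\theta_1)(v)(x) + \lambda_2\bar\psi(\theta_2)(v)(x) = (\lambda_1\bar\psi(\theta_1) + \lambda_2\bar\psi(\theta_2))(v)(x)$ for all $(v, x) \in V \times W^*$. So $\bar\psi(\lambda_1\theta_1 +
\lambda_2\theta_2) = \lambda_1\bar\psi(\theta_1) + \lambda_2\bar\psi(\theta_2)$ for all $\lambda_1, \lambda_2 \in K$ and $\theta_1, \theta_2 \in \mathscr{L}(V, W^*; K)$. Therefore $\bar\psi : \mathscr{L}(V, W^*; K) \to
\mathrm{Lin}(V, W^{**})$ is a linear map.

By Theorem 23.10.7, the canonical linear map $\mu : W \to W^{**}$ defined by $\mu : w \mapsto (x \mapsto x(w))$ is a linear space
isomorphism because $W$ is finite-dimensional. Therefore $\mu^{-1} : W^{**} \to W$ is a linear space isomorphism.
Define $\rho : \mathrm{Lin}(V, W) \to \mathrm{Lin}(V, W^{**})$ by $\rho(\phi)(v) = \mu(\phi(v))$ for all $\phi \in \mathrm{Lin}(V, W)$ and $v \in V$. Then it is
easily shown that $\rho$ is a linear map, and that $\bar\rho : \mathrm{Lin}(V, W^{**}) \to \mathrm{Lin}(V, W)$ defined by $\bar\rho(\phi)(v) = \mu^{-1}(\phi(v))$
for all $\phi \in \mathrm{Lin}(V, W^{**})$ and $v \in V$ is the inverse of $\rho$. So $\rho$ is a linear space isomorphism. Therefore
$\bar\rho \circ \bar\psi : \mathscr{L}(V, W^*; K) \to \mathrm{Lin}(V, W)$ is a linear map. It is easily shown that $\bar\rho \circ \bar\psi$ is the
inverse of $\psi$. Hence $\psi$ is a linear space isomorphism.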
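
In coordinates, the isomorphism of Theorem 29.2.26 sends a matrix to the bilinear function with the same coefficient array. The following Python sketch shows this under illustrative assumptions: rational entries, $\dim V = 3$, $\dim W = 2$, and elements of $W^*$ given by coefficient lists with respect to the dual basis. The helper names `psi`, `psi_inverse` and `unit` are hypothetical.

```python
from fractions import Fraction


def psi(M):
    """Bilinear function (v, x) -> x(M v) for the linear map with matrix M (rows index W)."""
    def theta(v, x):
        Mv = [sum(Fraction(M[i][j]) * Fraction(v[j]) for j in range(len(v)))
              for i in range(len(M))]
        return sum(Fraction(x[i]) * Mv[i] for i in range(len(M)))
    return theta


def unit(n, k):
    """k-th standard basis vector (or dual basis covector) of length n."""
    return [1 if t == k else 0 for t in range(n)]


def psi_inverse(theta, dim_v, dim_w):
    """Recover the matrix by evaluating theta on pairs (basis vector, dual basis covector)."""
    return [[theta(unit(dim_v, j), unit(dim_w, i)) for j in range(dim_v)]
            for i in range(dim_w)]


M = [[1, 2, 0], [0, -1, 4]]          # a linear map from V (dim 3) to W (dim 2)
theta = psi(M)
recovered = psi_inverse(theta, 3, 2)
sample = theta([1, 1, 1], [2, 3])    # x(M v) with M v = [3, 3] and x = [2, 3]
```

The round trip through `psi_inverse` uses a basis, which matches the observation in Remark 29.2.25 that the underlying identification of $W$ with $W^{**}$ is proved with bases and coordinates.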


29.3. Mixed tensor product spaces on mixed primal spaces 


29.3.1 REMARK: The need to identify isomorphisms between mixed tensor space constructions. 

In Section 29.2, various natural isomorphisms are demonstrated between the various unmixed tensor space 
constructions which were presented in Section 29.1. It may seem unprofitable to build numerous families of 
tensor space constructions and then show that they are equivalent to each other. But these constructions 
are not redundant. Each family of tensor space constructions arises naturally in differential geometry. They 
may arise as linear maps, or as spaces of differentials or differential operators, or in other ways. Then it is 
necessary to identify these constructions with other constructions via isomorphisms. 


The opportunities for identifying natural isomorphisms between tensor space constructions are even greater
when the tensors are mixed. Mixed tensors are mixtures of covariant and contravariant tensors. Mixed tensor
spaces have natural isomorphisms to multilinear function spaces of the form $\mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$, which
is the space of multilinear functions on a mixture of dual spaces $(V_\alpha^*)_{\alpha\in A}$ and primal spaces $(W_\beta)_{\beta\in B}$ over
a field $K$. (See Definition 29.3.8.) Alternatively, a mixed tensor space may have a natural isomorphism to a
multilinear function dual space of the form $\mathscr{L}((V_\alpha)_{\alpha\in A}, (W_\beta^*)_{\beta\in B}; K)^*$, which has a natural isomorphism to
$\mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$. Many textbooks present only one of these kinds of spaces, but there are numerous
other kinds of constructions which are naturally isomorphic to these two families, and these constructions
do arise naturally in differential geometry. The purpose of Section 29.3 is to present some of the naturally
occurring mixed tensor space constructions and demonstrate natural equivalences between them.


A third general family of mixed tensor spaces has the form $\mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$. (See
Definition 29.3.3.) These are spaces of linear maps between multilinear function spaces. A natural
isomorphism between this family and the multilinear function family $\mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$ is demonstrated
in Theorem 29.3.11. These three families of mixed tensors may be summarised as follows. (For reasons of
aesthetic balance, a fourth isomorphic family is included in this list.)


      family                                construction
(1)   multilinear function space            $\mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$
(2)   multilinear function space dual       $\mathscr{L}((V_\alpha)_{\alpha\in A}, (W_\beta^*)_{\beta\in B}; K)^*$
(3)   linear map space                      $\mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$
(4)   linear map space dual                 $\mathrm{Lin}(\mathscr{L}((W_\beta^*)_{\beta\in B}; K), \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K))$


Whenever these and any other families of mixed tensorial spaces are encountered, it is clearly desirable
to be able to identify the relations between them. The mixed tensor space families require definitions,
notations and isomorphism theorems. Many more spaces could be added to the list in the table above, such
as $\mathrm{Lin}(\mathscr{L}((W_\beta)_{\beta\in B}; K), \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*)$. When the linear space components of a multilinear space are
the same, various symmetries and anti-symmetries may be applied to obtain yet more mixed tensor spaces.
This is not idle definition building. Such combinations of linear spaces, dual spaces, multilinear spaces and
symmetries are often encountered. (See Remark 30.4.14, for example.) However, the techniques required to
construct such natural isomorphisms are reusable. Theorem 29.3.11 gives an example of how these isomorphisms
may be constructed.


It follows from Theorem 23.11.17 that $\mathrm{Lin}(\mathscr{L}((W_\beta)_{\beta\in B}; K)^*, \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*)$ is equivalent to line (3) in
the above table.


29.3.2 REMARK: The order of listing of component spaces for mixed tensor products. 

The order of covariant and contravariant parts of tensor spaces is most often chosen with the contravariant
part first. In the table in Remark 29.3.1, the families of mixed tensors are effectively contravariant with
respect to the linear spaces $V_\alpha$ and covariant with respect to the linear spaces $W_\beta$. The contravariant-first
convention is followed in Notation 29.3.4.


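29.3.3 DEFINITION: The mixed tensor product (space) of finite families $(V_\alpha)_{\alpha\in A}$ and $(W_\beta)_{\beta\in B}$ of linear
spaces over a field $K$ is the linear space of linear maps $\mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$.


29.3.4 NOTATION: $\bigotimes((V_\alpha)_{\alpha\in A}; (W_\beta)_{\beta\in B})$ denotes the mixed tensor product
$\mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$


for finite families $(V_\alpha)_{\alpha\in A}$ and $(W_\beta)_{\beta\in B}$ of linear spaces over a field $K$.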


29.3.5 REMARK: Schematic diagram of a mixed tensor space. 
Definition 29.3.3 and Notation 29.3.4 are illustrated in Figure 29.3.1. 


Figure 29.3.1 Mixed tensor product space


The arrows with “chicken-feet” tails symbolise spaces of multilinear maps from the indicated cross-product 
sets to K. The arrow with the single-stroke tail symbolises the set of linear maps between the spaces in the 
two inner boxes. 


29.3.6 REMARK: Non-standard notation for mixed tensor spaces. 

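Notation 29.3.4 is non-standard, but the author had previously considered the notation $\bigotimes_{\alpha\in A} V_\alpha \bigotimes^*_{\beta\in B} W_\beta$,
which is much worse. One could presumably also define the notation $\bigotimes((V_\alpha)_{\alpha\in A})$ to mean $\bigotimes((V_\alpha)_{\alpha\in A}; \emptyset)$
(which may be identified with $\bigotimes_{\alpha\in A} V_\alpha$), and the notation $\bigotimes^*((W_\beta)_{\beta\in B})$ to mean $\bigotimes(\emptyset; (W_\beta)_{\beta\in B})$ (which
may be identified with $\mathscr{L}((W_\beta)_{\beta\in B}; K)$). Thus:


$\bigotimes((V_\alpha)_{\alpha\in A}) = \bigotimes_{\alpha\in A} V_\alpha = \mathscr{L}((V_\alpha)_{\alpha\in A}; K)^*$
$\bigotimes^*((W_\beta)_{\beta\in B}) = \mathscr{L}((W_\beta)_{\beta\in B}; K)$
$\bigotimes((V_\alpha)_{\alpha\in A}; (W_\beta)_{\beta\in B}) = \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$.


The star-notation $\bigotimes^*((W_\beta)_{\beta\in B})$ for covariant tensors $\mathscr{L}((W_\beta)_{\beta\in B}; K)$ is uncomfortable and unnecessary.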


29.3.7 REMARK: The special case of mixed tensor spaces which are not mixed. 

If $A = \emptyset$ in Definition 29.3.3, then $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ may be identified with $K$ regarded as a linear space. (See
Remark 28.1.15.) Then $\bigotimes((V_\alpha)_{\alpha\in A}; (W_\beta)_{\beta\in B}) = \mathrm{Lin}(K, \mathscr{L}((W_\beta)_{\beta\in B}; K))$, which may be identified with
$\mathscr{L}((W_\beta)_{\beta\in B}; K)$. In other words, the mixed tensor product space in this case may be identified with the
linear space of multilinear functions for the family $(W_\beta)_{\beta\in B}$, which may be thought of as a space of covariant
tensors.

Similarly, if $B = \emptyset$ in Definition 29.3.3, then $\bigotimes((V_\alpha)_{\alpha\in A}; (W_\beta)_{\beta\in B}) = \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), K) = \bigotimes_{\alpha\in A} V_\alpha$.
In other words, the mixed tensor product space in this case may be identified with the tensor product space
$\bigotimes_{\alpha\in A} V_\alpha$ in Definition 28.1.2.

The mixed tensor product space $\bigotimes((V_\alpha)_{\alpha\in A}; (W_\beta)_{\beta\in B}) = \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$ is thus
some kind of mixture of covariant and contravariant tensors. It is not identical to the sets which typically
define mixed tensor product spaces. However, this formalism has various advantages relative to the more
popular definitions.


29.3.8 DEFINITION: The mixed multilinear function space of finite families $(V_\alpha)_{\alpha\in A}$ and $(W_\beta)_{\beta\in B}$ of linear
spaces over a field $K$ is the linear space of multilinear maps $\mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$.


29.3.9 REMARK: Fine points of the notation for mixed multilinear function spaces. 

The comma-separated juxtaposition "$(V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}$" in Definition 29.3.8 signifies the concatenation of
the two finite families of linear spaces $(V_\alpha^*)_{\alpha\in A}$ and $(W_\beta)_{\beta\in B}$. (See Definitions 14.6.10 and 14.12.6 (iii) for
concatenation of tuples and lists.) If $A \cap B \ne \emptyset$, then one or both of the index sets $A$ and $B$ must be modified
so that they do not intersect. If the index sets are both ordered, then the combined index set should have an
order which makes all elements of $A$ less than (or greater than) the elements of $B$. If $A = \mathbb{N}_m$ and $B = \mathbb{N}_n$,
it is customary to add $m$ to each element of $B$ so that $B = \mathbb{N}_{m+n} \setminus \mathbb{N}_m$.
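
The index-shifting convention for $A = \mathbb{N}_m$ and $B = \mathbb{N}_n$ can be sketched in a few lines of Python. The function name `concatenate_families` and the use of strings as stand-ins for linear spaces are assumptions made only for this illustration.

```python
def concatenate_families(vs, ws):
    """Concatenate families indexed by N_m and N_n into one family indexed by N_{m+n}."""
    m = len(vs)
    # The first family keeps its indices 1..m.
    combined = {k: vs[k - 1] for k in range(1, m + 1)}
    # Each index of the second family is shifted by m, giving m+1..m+n.
    combined.update({m + l: ws[l - 1] for l in range(1, len(ws) + 1)})
    return combined


family = concatenate_families(["V1*", "V2*"], ["W1", "W2", "W3"])
shifted_B = sorted(k for k in family if k > 2)
```

With $m = 2$ and $n = 3$, the shifted index set for the second family is $\mathbb{N}_5 \setminus \mathbb{N}_2$.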


29.3.10 REMARK: Schematic diagram of isomorphism between mixed tensor spaces. 
The isomorphism map $\rho$ in Theorem 29.3.11 is illustrated in Figure 29.3.2.


Figure 29.3.2 Isomorphism between mixed tensor spaces


29.3.11 THEOREM: Isomorphism between linear-style and multilinear-style mixed tensor spaces. 

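Let $(V_\alpha)_{\alpha\in A}$ and $(W_\beta)_{\beta\in B}$ be finite families of finite-dimensional linear spaces over a field $K$. Then the map
$\rho$ from $\mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$ (a mixed tensor product space) to $\mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$
(the corresponding mixed multilinear function space) which is defined by

$$\forall \phi \in \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K)),\ \forall v^* \in \times_{\alpha\in A} V_\alpha^*,\ \forall w \in \times_{\beta\in B} W_\beta,$$
$$\rho(\phi)(v^*, w) = \mu(w)(\phi(\eta(v^*)))$$

is a linear space isomorphism, where $\eta : \times_{\alpha\in A} V_\alpha^* \to \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function
map in Definition 27.4.4 and $\mu : \times_{\beta\in B} W_\beta \to \bigotimes_{\beta\in B} W_\beta$ denotes the canonical tensor map in Definition 28.2.2.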


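PROOF: Let $(V_\alpha)_{\alpha\in A}$ and $(W_\beta)_{\beta\in B}$ be finite families of finite-dimensional linear spaces over a field $K$. Let
$X = \mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$ and $Y = \mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$. Let $\phi \in X$. Then

$$\forall v^* \in \times_{\alpha\in A} V_\alpha^*,\ \forall w \in \times_{\beta\in B} W_\beta,\qquad \rho(\phi)(v^*, w) = \phi(\eta(v^*))(w)$$

by Definition 28.2.2. The multilinearity of $\rho(\phi)(v^*, w)$ with respect to $w$ for fixed $v^*$ follows from the
definition of $\phi$. The multilinearity with respect to $v^*$ for fixed $w$ follows from the multilinearity of $\eta(v^*)$
with respect to $v^*$ and the linearity of $\phi(\theta)(w)$ with respect to $\theta \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ for fixed $w$. (To
verify this linearity, define $\phi_w : \mathscr{L}((V_\alpha)_{\alpha\in A}; K) \to K$ by $\phi_w : \theta \mapsto \phi(\theta)(w)$ for $w \in \times_{\beta\in B} W_\beta$. Then for
$\lambda_1, \lambda_2 \in K$ and $\theta_1, \theta_2 \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$, $\phi_w(\lambda_1\theta_1 + \lambda_2\theta_2) = \phi(\lambda_1\theta_1 + \lambda_2\theta_2)(w) = \lambda_1\phi(\theta_1)(w) + \lambda_2\phi(\theta_2)(w) =
\lambda_1\phi_w(\theta_1) + \lambda_2\phi_w(\theta_2)$.) Therefore $\phi(\eta(v^*))(w)$ is multilinear with respect to the concatenation $(v^*, w)$. Hence
$\rho(\phi) \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K) = Y$, as claimed.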

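To show that $\rho(\phi)$ is linear with respect to $\phi$, first note that the linear structure on both $X$ and $Y$ is defined
pointwise. Let $\lambda_1, \lambda_2 \in K$ and $\phi_1, \phi_2 \in X$. Then $\rho(\lambda_1\phi_1 + \lambda_2\phi_2)(v^*, w) = (\lambda_1\phi_1 + \lambda_2\phi_2)(\eta(v^*))(w) =
\lambda_1\phi_1(\eta(v^*))(w) + \lambda_2\phi_2(\eta(v^*))(w) = \lambda_1\rho(\phi_1)(v^*, w) + \lambda_2\rho(\phi_2)(v^*, w) = (\lambda_1\rho(\phi_1) + \lambda_2\rho(\phi_2))(v^*, w)$ for all
$v^* \in \times_{\alpha\in A} V_\alpha^*$ and $w \in \times_{\beta\in B} W_\beta$. Therefore $\rho(\lambda_1\phi_1 + \lambda_2\phi_2) = \lambda_1\rho(\phi_1) + \lambda_2\rho(\phi_2)$. So $\rho$ is linear.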

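To show that $\rho$ is surjective, define $\phi_g \in X$ for $g \in Y$ by

$$\forall \theta \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K),\ \forall w \in \times_{\beta\in B} W_\beta,\qquad \phi_g(\theta)(w) = (\psi^{-1}(g_w))(\theta),$$

where $g_w \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is defined by $g_w : v^* \mapsto g(v^*, w)$ and $\psi : \bigotimes_{\alpha\in A} V_\alpha \to \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$ is
the isomorphism in Theorem 29.2.4. (It is easily verified that $\phi_g \in X$ for all $g \in Y$.) Then for $g \in Y$,
$\rho(\phi_g)(v^*, w) = \phi_g(\eta(v^*))(w) = (\psi^{-1}(g_w))(\eta(v^*))$.
Since $g_w \in \mathscr{L}((V_\alpha^*)_{\alpha\in A}; K)$, it can be expressed as a linear combination of simple multilinear functions
$\eta^T(v)$ for $v \in \times_{\alpha\in A} V_\alpha$. In fact, these vector tuples $v$ can be chosen to be tuples of basis vectors with bases
$(e^\alpha_i)_{i\in I_\alpha}$ for $V_\alpha$. Thus $g_w = \sum_{i\in I} k_i\,\eta^T(e_i)$, where $k_i = g_w(h_i)$, by line (27.5.8) of Theorem 27.5.7. So

$$\begin{aligned}
\rho(\phi_g)(v^*, w) &= \Bigl(\psi^{-1}\Bigl(\sum_{i\in I} k_i\,\eta^T(e_i)\Bigr)\Bigr)(\eta(v^*)) \\
&= \sum_{i\in I} k_i\,\bigl(\psi^{-1}(\eta^T(e_i))\bigr)(\eta(v^*)) \\
&= \sum_{i\in I} k_i\,\mu(e_i)(\eta(v^*)) \\
&= \sum_{i\in I} k_i\,\eta(v^*)(e_i) \\
&= \sum_{i\in I} k_i\,\eta^T(e_i)(v^*) \\
&= g_w(v^*) \\
&= g(v^*, w).
\end{aligned}$$

Thus $\rho(\phi_g) = g$ for all $g \in Y$. This shows that $\rho$ is surjective onto $Y$.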
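To show injectivity of the map $\rho$, suppose $\rho(\phi_1) = \rho(\phi_2)$ for $\phi_1, \phi_2 \in X$. Then $\phi_1(\eta(v^*))(w) = \phi_2(\eta(v^*))(w)$
for all $v^* \in \times_{\alpha\in A} V_\alpha^*$ and $w \in \times_{\beta\in B} W_\beta$. So $\phi_1(\eta(v^*)) = \phi_2(\eta(v^*))$ for all $v^* \in \times_{\alpha\in A} V_\alpha^*$. This implies that
$\phi_1$ and $\phi_2$ are equal on all simple multilinear functions $\eta(v^*) \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. Since $\phi_1$ and $\phi_2$ are linear
and $\mathscr{L}((V_\alpha)_{\alpha\in A}; K)$ is spanned by $\mathrm{Range}(\eta)$ (Theorem 27.5.4), $\phi_1(\theta)$ and $\phi_2(\theta)$
are equal for all multilinear functions $\theta \in \mathscr{L}((V_\alpha)_{\alpha\in A}; K)$. So $\phi_1 = \phi_2$.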
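
The map $\rho$ can be traced in coordinates with a small Python sketch. The assumptions are illustrative only: rational coefficients, two primal spaces $V_1, V_2$ and one space $W_1$, all of dimension 2, with multilinear functions on $V_1 \times V_2$ stored as coefficient arrays and covectors stored as coefficient lists. The array `P` and the helper names `make_phi`, `rho` and `unit` are hypothetical.

```python
from fractions import Fraction


def eta(v_stars):
    """Coefficient array of the simple multilinear function eta(v*) on V_1 x V_2."""
    return [[Fraction(a) * Fraction(b) for b in v_stars[1]] for a in v_stars[0]]


def make_phi(P):
    """Linear map theta -> covector on W_1 with phi(theta)[j] = sum P[j][i1][i2]*theta[i1][i2]."""
    def phi(theta):
        return [sum(Fraction(P[j][i1][i2]) * theta[i1][i2]
                    for i1 in range(len(theta)) for i2 in range(len(theta[0])))
                for j in range(len(P))]
    return phi


def rho(phi):
    """rho(phi)(v*, w) = phi(eta(v*))(w), with w given by its components."""
    def g(v_stars, w):
        covector = phi(eta(v_stars))
        return sum(c * Fraction(x) for c, x in zip(covector, w))
    return g


def unit(n, k):
    """k-th standard basis element of length n."""
    return [1 if t == k else 0 for t in range(n)]


P = [[[1, 2], [0, 3]], [[4, 0], [5, 6]]]      # indexed as P[j][i1][i2]
g = rho(make_phi(P))
# Evaluate g on dual basis covectors and a basis vector to read off its components.
components = [[[g([unit(2, i1), unit(2, i2)], unit(2, j)) for i2 in range(2)]
               for i1 in range(2)] for j in range(2)]
total = g([[1, 1], [1, 1]], [1, 1])
```

The components of $\rho(\phi)$ on basis elements reproduce the array that defines $\phi$, so in this encoding $\rho$ acts as the identity on coefficient arrays.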


29.3.12 REMARK: Strategy for proof discovery by first testing simple tensors and multilinear functions. 
Much of the proof of Theorem 29.3.11 was constructed by first restricting attention to simple tensors or 
simple multilinear functions, and then extending the procedure to general tensors or multilinear functions by 
expressing them as linear combinations of the simple case. This approach is quite useful in general. A useful 
tactic for testing conjectures is to substitute simple tensors and simple multilinear functions where possible. 
If an idea does not work for simple tensors and multilinear functions, then it will certainly not work for the 
general case. But if the simple case is successful, the general case will very often follow by using a basis for 
the relevant space or spaces. 


29.3.13 REMARK: Literature for the mixed tensor space isomorphism theorem. 
A statement similar to Theorem 29.3.11 for the special case #(A) = #(B) = 1 is given by Willmore [42], 
page 175. 


29.3.14 REMARK: An advantage of the linear-map style of mixed tensor space representation. 
With the linear-map representation $\mathrm{Lin}(\mathscr{L}((V_\alpha)_{\alpha\in A}; K), \mathscr{L}((W_\beta)_{\beta\in B}; K))$ for mixed tensors, the traditional
composition operation on tensors may often be achieved by composition of tensors.


29.4. Components for mixed tensors with mixed primal spaces


29.4.1 REMARK: The application of bases-and-components to mixed tensor product spaces. 
The purpose of Section 29.4 is to apply bases and components to the mixed tensor product spaces which are 
defined in Section 29.3 for mixed primal spaces. 


29.4.2 REMARK: The confusion between “components”, “coordinates” and “coefficients”. 
It is probably strictly more correct to refer to the “components” of tensors rather than the “coordinates” or 
“coefficients”. However, these terms are often used interchangeably. 


29.4.3 REMARK: Component-expressions for two representations of mixed tensor spaces. 

Let $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$, with $r, s \in \mathbb{Z}_0^+$, be finite families of finite-dimensional linear spaces over a field $K$.
Then two styles of mixed tensor spaces are defined over these spaces in Section 29.3, namely the mixed tensor
product space $\bigotimes((V_k)_{k=1}^r; (W_\ell)_{\ell=1}^s) = \mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r), \mathscr{L}((W_\ell)_{\ell=1}^s))$ and the mixed multilinear function
space $\mathscr{L}((V_k^*)_{k=1}^r, (W_\ell)_{\ell=1}^s)$. (The field $K$ is suppressed from these expressions for simplicity.)


Theorem 29.4.4 expresses mixed tensors in the space $\mathscr{L}((V_k^*)_{k=1}^r, (W_\ell)_{\ell=1}^s)$ in terms of an array of coefficients
$(a^i{}_j)_{i\in I, j\in J}$. Theorem 29.4.5 expresses mixed tensors in the space $\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r), \mathscr{L}((W_\ell)_{\ell=1}^s))$ in terms of
an array of coefficients $(a^i{}_j)_{i\in I, j\in J}$. These arrays are the same under the map $\rho$ in Theorem 29.3.11. These
coefficient arrays are expressed in terms of canonical dual bases. (See Definition 23.7.3 and Theorem 23.7.5
for canonical dual bases.)


29.4.4 THEOREM: Components for the multilinear-function-space representation of mixed tensors. 

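Let $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$, with $r,s\in\mathbb{Z}_0^+$, be finite families of finite-dimensional linear spaces over a field $K$.
Let $B_k=(e_{k,i_k})_{i_k=1}^{n_k}$ be a basis for $V_k$ for all $k\in\mathbb{N}_r$, and let $\bar B_\ell=(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ be a basis for $W_\ell$ for all $\ell\in\mathbb{N}_s$.
Then

\[
\forall A\in\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s),\quad
\forall(\lambda^k)_{k=1}^r\in\times_{k=1}^r V_k^*,\quad
\forall(w_\ell)_{\ell=1}^s\in\times_{\ell=1}^s W_\ell,
\]
\[
A((\lambda^k)_{k=1}^r,(w_\ell)_{\ell=1}^s)
=\sum_{i\in I}\sum_{j\in J} a^i{}_j \prod_{k=1}^r\lambda^k(e_{k,i_k})\prod_{\ell=1}^s w_\ell^{j_\ell},
\]

where $I=\times_{k=1}^r\mathbb{N}_{n_k}$, $J=\times_{\ell=1}^s\mathbb{N}_{m_\ell}$, and $w_\ell^{j_\ell}=\kappa_{\bar B_\ell}(w_\ell)_{j_\ell}$ is the $j_\ell$-th component of $w_\ell$ with respect to the
basis $\bar B_\ell$ for $\ell\in\mathbb{N}_s$, and

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j = A((e^{k,i_k})_{k=1}^r,(\bar e_{\ell,j_\ell})_{\ell=1}^s),
\]

where $e^{k,i_k}$ is the $i_k$-th element of the canonical dual basis to $B_k$ for $k\in\mathbb{N}_r$.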


PROOF: The assertion follows from Theorem 27.5.4 line (27.5.1). 
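
As an illustration of the component formula for the multilinear-function-style representation, the case $r=s=1$ may be written out in full. This is a sketch under simplifying assumptions: a single space $V$ with basis $(e_i)_{i=1}^n$ and canonical dual basis $(e^i)_{i=1}^n$, and a single space $W$ with basis $(\bar e_j)_{j=1}^m$. The undecorated index names are chosen here for readability only.

```latex
% Special case r = s = 1 (sketch): A is a bilinear function on V^* x W.
% Expand lambda = sum_i lambda(e_i) e^i and w = sum_j w^j \bar e_j, then use bilinearity.
\[
A(\lambda, w)
= \sum_{i=1}^{n}\sum_{j=1}^{m} a^i{}_j\,\lambda(e_i)\,w^j,
\qquad
a^i{}_j = A(e^i,\bar e_j),
\qquad
w = \sum_{j=1}^{m} w^j\,\bar e_j .
\]
```

The general formula has the same shape, with one factor $\lambda^k(e_{k,i_k})$ for each contravariant slot and one factor $w_\ell^{j_\ell}$ for each covariant slot.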


29.4.5 THEOREM: Components for the linear-map-space representation of mixed tensors. 

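Let $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$, with $r,s\in\mathbb{Z}_0^+$, be finite families of finite-dimensional linear spaces over a field $K$.
Let $B_k=(e_{k,i_k})_{i_k=1}^{n_k}$ be a basis for $V_k$ for all $k\in\mathbb{N}_r$, and let $\bar B_\ell=(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ be a basis for $W_\ell$ for all $\ell\in\mathbb{N}_s$.
Then

\[
\forall A\in\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s)),\quad
\forall\lambda\in\mathscr{L}((V_k)_{k=1}^r),\quad
\forall(w_\ell)_{\ell=1}^s\in\times_{\ell=1}^s W_\ell,
\]
\[
A(\lambda)((w_\ell)_{\ell=1}^s)=\sum_{i\in I}\sum_{j\in J} a^i{}_j\,\lambda_i\prod_{\ell=1}^s w_\ell^{j_\ell},
\]

where $I=\times_{k=1}^r\mathbb{N}_{n_k}$, $J=\times_{\ell=1}^s\mathbb{N}_{m_\ell}$, and $\lambda_i=\lambda((e_{k,i_k})_{k=1}^r)=\lambda(e_{1,i_1},\ldots,e_{r,i_r})$ for $i\in I$, and $w_\ell^{j_\ell}=\kappa_{\bar B_\ell}(w_\ell)_{j_\ell}$
is the $j_\ell$-th component of $w_\ell$ with respect to the basis $\bar B_\ell$ for $\ell\in\mathbb{N}_s$, and

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j = A(e^i)((\bar e_{\ell,j_\ell})_{\ell=1}^s),
\]

where $e^i=\bigotimes_{k=1}^r e^{k,i_k}$, and $e^{k,i_k}$ is the $i_k$-th element of the canonical dual basis to $B_k$ for $k\in\mathbb{N}_r$.


PROOF: The assertion follows from the linearity of $A$ and the multilinearity of the elements of $\mathscr{L}((V_k)_{k=1}^r)$ and $\mathscr{L}((W_\ell)_{\ell=1}^s)$.


29.4.6 REMARK: Change-of-basis formulas for tensor spaces.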

Theorems 29.4.4 and 29.4.5 are certainly very tedious to read. However, they are also very general. They are 
more general than is required for most purposes in differential geometry, but even when they are simplified 
for the case of a uniform primal space (as in Section 29.5), they are still very tedious. This helps to explain 
why so many people do not like “coordinates”. 


The change-of-coordinates formula for tensors in Theorem 29.4.7 is twice as tedious as the coordinate
expressions for tensors with respect to a single set of bases.


29.4.7 THEOREM: Change-of-basis formulas for two mixed tensor space representations. 

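Let $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$ be finite families of finite-dimensional linear spaces over a field $K$, where $r,s\in\mathbb{Z}_0^+$.
Let $B_k=(e_{k,i_k})_{i_k=1}^{n_k}$ and $\hat B_k=(\hat e_{k,i_k})_{i_k=1}^{n_k}$ be bases for $V_k$ for all $k\in\mathbb{N}_r$, and let $\bar B_\ell=(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ and
$\hat{\bar B}_\ell=(\hat{\bar e}_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ be bases for $W_\ell$ for all $\ell\in\mathbb{N}_s$. Define the basis transition matrices $(Y_k{}^\beta{}_\alpha)_{\alpha,\beta=1}^{n_k}$ and
$(\tilde Y_k{}^\beta{}_\alpha)_{\alpha,\beta=1}^{n_k}$ for $k\in\mathbb{N}_r$, where $n_k=\dim(V_k)$, by

\[
\forall k\in\mathbb{N}_r,\ \forall\alpha\in\mathbb{N}_{n_k},\quad
e_{k,\alpha}=\sum_{\beta=1}^{n_k}\hat e_{k,\beta}\,Y_k{}^\beta{}_\alpha,
\]
\[
\forall k\in\mathbb{N}_r,\ \forall\alpha\in\mathbb{N}_{n_k},\quad
\hat e_{k,\alpha}=\sum_{\beta=1}^{n_k} e_{k,\beta}\,\tilde Y_k{}^\beta{}_\alpha.
\]

Define the basis transition matrices $(Z_\ell{}^\beta{}_\alpha)_{\alpha,\beta=1}^{m_\ell}$ and $(\tilde Z_\ell{}^\beta{}_\alpha)_{\alpha,\beta=1}^{m_\ell}$ for $\ell\in\mathbb{N}_s$, where $m_\ell=\dim(W_\ell)$, by

\[
\forall\ell\in\mathbb{N}_s,\ \forall\alpha\in\mathbb{N}_{m_\ell},\quad
\bar e_{\ell,\alpha}=\sum_{\beta=1}^{m_\ell}\hat{\bar e}_{\ell,\beta}\,Z_\ell{}^\beta{}_\alpha,
\]
\[
\forall\ell\in\mathbb{N}_s,\ \forall\alpha\in\mathbb{N}_{m_\ell},\quad
\hat{\bar e}_{\ell,\alpha}=\sum_{\beta=1}^{m_\ell}\bar e_{\ell,\beta}\,\tilde Z_\ell{}^\beta{}_\alpha.
\]

Let $I=\times_{k=1}^r\mathbb{N}_{n_k}$ and $J=\times_{\ell=1}^s\mathbb{N}_{m_\ell}$. Let $A\in\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s)$. Then

\[
\forall i\in I,\ \forall j\in J,\quad
\hat a^i{}_j=\sum_{i'\in I}\sum_{j'\in J}\Bigl(\prod_{k=1}^r Y_k{}^{i_k}{}_{i'_k}\Bigr)\Bigl(\prod_{\ell=1}^s\tilde Z_\ell{}^{j'_\ell}{}_{j_\ell}\Bigr)a^{i'}{}_{j'},
\tag{29.4.1}
\]
\[
\forall i\in I,\ \forall j\in J,\quad
a^i{}_j=\sum_{i'\in I}\sum_{j'\in J}\Bigl(\prod_{k=1}^r\tilde Y_k{}^{i_k}{}_{i'_k}\Bigr)\Bigl(\prod_{\ell=1}^s Z_\ell{}^{j'_\ell}{}_{j_\ell}\Bigr)\hat a^{i'}{}_{j'},
\tag{29.4.2}
\]

where

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j=A((e^{k,i_k})_{k=1}^r,(\bar e_{\ell,j_\ell})_{\ell=1}^s),
\tag{29.4.3}
\]
\[
\forall i\in I,\ \forall j\in J,\quad \hat a^i{}_j=A((\hat e^{k,i_k})_{k=1}^r,(\hat{\bar e}_{\ell,j_\ell})_{\ell=1}^s),
\tag{29.4.4}
\]

where $(e^{k,i_k})_{i_k=1}^{n_k}$ and $(\hat e^{k,i_k})_{i_k=1}^{n_k}$ are the canonical dual bases to $B_k$ and $\hat B_k$ respectively for all $k\in\mathbb{N}_r$, and
$(\bar e^{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ and $(\hat{\bar e}^{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ are the canonical dual bases to $\bar B_\ell$ and $\hat{\bar B}_\ell$ respectively for all $\ell\in\mathbb{N}_s$.


Similarly, let $A\in\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s))$. Then lines (29.4.1) and (29.4.2) hold when $(a^i{}_j)_{i\in I,j\in J}$
and $(\hat a^i{}_j)_{i\in I,j\in J}$ are defined instead by (29.4.5) and (29.4.6), where $e^i=\bigotimes_{k=1}^r e^{k,i_k}$ and $\hat e^i=\bigotimes_{k=1}^r\hat e^{k,i_k}$:

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j=A(e^i)((\bar e_{\ell,j_\ell})_{\ell=1}^s),
\tag{29.4.5}
\]
\[
\forall i\in I,\ \forall j\in J,\quad \hat a^i{}_j=A(\hat e^i)((\hat{\bar e}_{\ell,j_\ell})_{\ell=1}^s).
\tag{29.4.6}
\]

PROOF: Let $A\in\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s)$ for linear spaces $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$, for $r,s\in\mathbb{Z}_0^+$. For the bases
and other assumptions as in the statement of the theorem, define $(a^i{}_j)_{i\in I,j\in J}$ and $(\hat a^i{}_j)_{i\in I,j\in J}$ by lines
(29.4.3) and (29.4.4). Then

\begin{align*}
\forall i\in I,\ \forall j\in J,\quad
\hat a^i{}_j
&=A((\hat e^{k,i_k})_{k=1}^r,(\hat{\bar e}_{\ell,j_\ell})_{\ell=1}^s)\\
&=A\Bigl(\Bigl(\sum_{i'_k=1}^{n_k}Y_k{}^{i_k}{}_{i'_k}\,e^{k,i'_k}\Bigr)_{k=1}^r,\Bigl(\sum_{j'_\ell=1}^{m_\ell}\bar e_{\ell,j'_\ell}\,\tilde Z_\ell{}^{j'_\ell}{}_{j_\ell}\Bigr)_{\ell=1}^s\Bigr)
\tag{29.4.7}\\
&=\sum_{i'\in I}\sum_{j'\in J}\Bigl(\prod_{k=1}^r Y_k{}^{i_k}{}_{i'_k}\Bigr)\Bigl(\prod_{\ell=1}^s\tilde Z_\ell{}^{j'_\ell}{}_{j_\ell}\Bigr)A((e^{k,i'_k})_{k=1}^r,(\bar e_{\ell,j'_\ell})_{\ell=1}^s)\\
&=\sum_{i'\in I}\sum_{j'\in J}\Bigl(\prod_{k=1}^r Y_k{}^{i_k}{}_{i'_k}\Bigr)\Bigl(\prod_{\ell=1}^s\tilde Z_\ell{}^{j'_\ell}{}_{j_\ell}\Bigr)a^{i'}{}_{j'}.
\end{align*}

This verifies line (29.4.1) for $A\in\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s)$. (The coordinate transformation rules applied in
line (29.4.7) are proved in Theorems 22.9.11 and 23.9.7.) Line (29.4.2) follows similarly.

Let $A\in\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s))$ for linear spaces $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$ for $r,s\in\mathbb{Z}_0^+$. For the bases and
other assumptions as in the statement of the theorem, define $(a^i{}_j)_{i\in I,j\in J}$ and $(\hat a^i{}_j)_{i\in I,j\in J}$ by lines (29.4.5)
and (29.4.6). Then

\begin{align*}
\forall i\in I,\ \forall j\in J,\quad
\hat a^i{}_j
&=A(\hat e^i)((\hat{\bar e}_{\ell,j_\ell})_{\ell=1}^s)\\
&=A\Bigl(\bigotimes_{k=1}^r\sum_{i'_k=1}^{n_k}Y_k{}^{i_k}{}_{i'_k}\,e^{k,i'_k}\Bigr)\Bigl(\Bigl(\sum_{j'_\ell=1}^{m_\ell}\bar e_{\ell,j'_\ell}\,\tilde Z_\ell{}^{j'_\ell}{}_{j_\ell}\Bigr)_{\ell=1}^s\Bigr)
\tag{29.4.8}\\
&=\sum_{i'\in I}\sum_{j'\in J}\Bigl(\prod_{k=1}^r Y_k{}^{i_k}{}_{i'_k}\Bigr)\Bigl(\prod_{\ell=1}^s\tilde Z_\ell{}^{j'_\ell}{}_{j_\ell}\Bigr)a^{i'}{}_{j'}.
\end{align*}

This verifies line (29.4.1) for $A\in\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s))$. (The coordinate transformation rules
applied in line (29.4.8) are proved in Theorems 22.9.11 and 23.9.7.) Line (29.4.2) follows similarly.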


29.4.8 REMARK: Different classes of tensorial object may correspond to the same coefficient matrix. 

The tensor calculus approach to differential geometry utilises the coefficient matrix $(a^i{}_j)_{i\in I,j\in J}$ in Theorems
29.4.4 and 29.4.5 and often largely ignores the objects which the coefficients coordinatise. In this case, the
coefficient arrays are the same, and they transform in the same way under changes of basis, but they refer 
to different object classes. 


29.4.9 REMARK: Tensor monomial notation meanings are different for vectors and linear functionals. 
Notations 29.4.10 and 29.4.11 give different meanings to the binary operator “$\otimes$” depending on whether
the operands are vectors or linear functionals. The expression $\bigotimes_{k=1}^r v_k$ in Notation 29.4.10 is a simple tensor. The expression
$\bigotimes_{\ell=1}^s\phi^\ell$ in Notation 29.4.11 is a simple multilinear function. (Notation 29.4.10 duplicates Notations 28.4.4
and 28.4.5, but is presented again here, slightly differently, for direct comparison with Notations 29.4.11,
29.4.14 and 29.4.15.)


29.4.10 NOTATION: Tensor monomials composed of contravariant vectors only. 
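$\bigotimes_{k=1}^r v_k$, for a sequence of vectors $(v_k)_{k=1}^r\in\times_{k=1}^r V_k$, for $r\in\mathbb{Z}_0^+$, where $V_k$ is a linear space over $K$ for
all $k\in\mathbb{N}_r$, denotes the tensor in $\mathscr{L}((V_k)_{k=1}^r; K)^*$ defined by $\lambda\mapsto\lambda(v_1,\ldots,v_r)$ for all $\lambda\in\mathscr{L}((V_k)_{k=1}^r; K)$.


$v_1\otimes\ldots\otimes v_r$ means the same as $\bigotimes_{k=1}^r v_k$.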


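29.4.11 NOTATION: Tensor monomials composed of covariant vectors only.

$\bigotimes_{\ell=1}^s\phi^\ell$, for a sequence of linear functionals $(\phi^\ell)_{\ell=1}^s\in\times_{\ell=1}^s(W_\ell)^*$, for $s\in\mathbb{Z}_0^+$, where $W_\ell$ is a linear space
over $K$ for all $\ell\in\mathbb{N}_s$, denotes the tensor in $\mathscr{L}((W_\ell)_{\ell=1}^s; K)$ which is defined by $(w_\ell)_{\ell=1}^s\mapsto\prod_{\ell=1}^s\phi^\ell(w_\ell)$ for
all $(w_\ell)_{\ell=1}^s\in\times_{\ell=1}^s W_\ell$.


$\phi^1\otimes\ldots\otimes\phi^s$ means the same as $\bigotimes_{\ell=1}^s\phi^\ell$.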


29.4.12 REMARK: Contravariant and covariant simple tensors are totally different kinds of objects. 

The definition of mixed tensors is presented in most texts as if the contravariant and covariant simple tensors
$\bigotimes_{k=1}^r v_k$ and $\bigotimes_{\ell=1}^s\phi^\ell$ were equivalent in some sense. In fact, the contravariant simple tensor $\bigotimes_{k=1}^r v_k$ is formed
by applying multilinear functions in $\mathscr{L}((V_k)_{k=1}^r; K)$ to the sequence of vectors $(v_k)_{k=1}^r\in\times_{k=1}^r V_k$. The
covariant simple tensor $\bigotimes_{\ell=1}^s\phi^\ell$, by contrast, is defined as the application of each individual linear functional
in the sequence $(\phi^\ell)_{\ell=1}^s\in\times_{\ell=1}^s W_\ell^*$ to the corresponding element of a sequence of vectors $(w_\ell)_{\ell=1}^s\in\times_{\ell=1}^s W_\ell$.
Consequently, it is not possible to simply string these simple tensors together in a well-defined mixed simple
tensor $(\bigotimes_{k=1}^r v_k)\otimes(\bigotimes_{\ell=1}^s\phi^\ell)$.

An alternative interpretation for the notation for simple contravariant tensors in Notation 29.4.10 would
define $\bigotimes_{k=1}^r v_k$ to mean the multilinear function in $\mathscr{L}((V_k^*)_{k=1}^r; K)$ defined by $(\phi^k)_{k=1}^r\mapsto\prod_{k=1}^r\phi^k(v_k)$ for
all $(\phi^k)_{k=1}^r\in\times_{k=1}^r V_k^*$. The equivalence (i.e. natural isomorphism) of this interpretation to Notation 29.4.10
is shown in Theorem 29.2.4. This multilinear function style of tensor product is closer to the simple mixed
tensors in Notation 29.4.14. As discussed in Remark 28.5.5, the representation $\mathscr{L}((V_k)_{k=1}^r; K)^*$ for
contravariant tensors is much closer to physical applications than $\mathscr{L}((V_k^*)_{k=1}^r; K)$, although the latter does have
some advantages in the construction of mixed tensors.


Notations 29.4.14 and 29.4.15 define two ways of combining contravariant simple tensors with covariant 
simple tensors to form a single mixed simple tensor. Notation 29.4.14 constructs a mixed simple tensor in 
the popular multilinear-map style of mixed tensor space. Notation 29.4.15 constructs a mixed simple tensor 
in the linear-map style of mixed tensor space. For both styles of mixed tensor products, the contravariant 
and covariant components must be treated differently. 


29.4.13 REMARK: The application of the list-concatenation operator to tuples of vectors. 
The list concatenation function template $\mathrm{concat} : \mathrm{List}(S)\times\mathrm{List}(S)\to\mathrm{List}(S)$ for general sets $S$ is defined in
Definition 14.12.6 (iii). For the application in Notation 29.4.14, $S$ may be chosen as $(\bigcup_{k=1}^r V_k^*)\cup(\bigcup_{\ell=1}^s W_\ell)$.


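29.4.14 NOTATION: Simple mixed tensors in the multilinear-function style.

$(\bigotimes_{k=1}^r v_k)\otimes(\bigotimes_{\ell=1}^s\phi^\ell)$, for a sequence $(v_k)_{k=1}^r\in\times_{k=1}^r V_k$, for $r\in\mathbb{Z}_0^+$, where $V_k$ is a linear space over $K$
for all $k\in\mathbb{N}_r$, and a sequence $(\phi^\ell)_{\ell=1}^s\in\times_{\ell=1}^s(W_\ell)^*$, for $s\in\mathbb{Z}_0^+$, where $W_\ell$ is a linear space over $K$ for
all $\ell\in\mathbb{N}_s$, denotes the mixed tensor in $\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s)$ which is defined by

\[
\mathrm{concat}((\lambda^k)_{k=1}^r,(w_\ell)_{\ell=1}^s)\mapsto\prod_{k=1}^r\lambda^k(v_k)\prod_{\ell=1}^s\phi^\ell(w_\ell)
\]

for all $(\lambda^k)_{k=1}^r\in\times_{k=1}^r V_k^*$ and $(w_\ell)_{\ell=1}^s\in\times_{\ell=1}^s W_\ell$.
$(v_1\otimes\ldots\otimes v_r)\otimes(\phi^1\otimes\ldots\otimes\phi^s)$ means the same as $(\bigotimes_{k=1}^r v_k)\otimes(\bigotimes_{\ell=1}^s\phi^\ell)$.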


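29.4.15 NOTATION: Simple mixed tensors in the linear-map style.

$\bigotimes_{k=1}^r v_k\to\bigotimes_{\ell=1}^s\phi^\ell$, for a sequence $(v_k)_{k=1}^r\in\times_{k=1}^r V_k$, for $r\in\mathbb{Z}_0^+$, where $V_k$ is a linear space over $K$ for
all $k\in\mathbb{N}_r$, and a sequence $(\phi^\ell)_{\ell=1}^s\in\times_{\ell=1}^s(W_\ell)^*$, for $s\in\mathbb{Z}_0^+$, where $W_\ell$ is a linear space over $K$ for all $\ell\in\mathbb{N}_s$,
denotes the mixed tensor in $\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s))$ which is defined by

\[
\lambda\mapsto\Bigl((w_\ell)_{\ell=1}^s\mapsto\lambda(v_1,\ldots,v_r)\prod_{\ell=1}^s\phi^\ell(w_\ell)\Bigr)
\]

for all $\lambda\in\mathscr{L}((V_k)_{k=1}^r; K)$ and $(w_\ell)_{\ell=1}^s\in\times_{\ell=1}^s W_\ell$.
$v_1\otimes\ldots\otimes v_r\to\phi^1\otimes\ldots\otimes\phi^s$ means the same as $\bigotimes_{k=1}^r v_k\to\bigotimes_{\ell=1}^s\phi^\ell$.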
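
The difference between the two styles of simple mixed tensor in Notations 29.4.14 and 29.4.15 is easiest to see in the smallest non-trivial case $r=s=1$. The following sketch assumes a vector $v\in V$, a functional $\phi\in W^*$, a test functional $\lambda\in V^*=\mathscr{L}(V;K)$ and a test vector $w\in W$. The arrow symbol used for the linear-map-style product is an assumed rendering of the operator in Notation 29.4.15.

```latex
% Case r = s = 1 (sketch).
% Multilinear-function style: one function of the pair (lambda, w).
\[
(v\otimes\phi)(\lambda,w)=\lambda(v)\,\phi(w).
\]
% Linear-map style: a linear map sending lambda to a function of w.
\[
(v\to\phi)(\lambda)(w)=\lambda(v)\,\phi(w).
\]
```

In this case the two objects carry the same information and differ only by currying: the first takes $\lambda$ and $w$ together, while the second takes $\lambda$ first and returns an element of $W^*$.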


29.4.16 REMARK: Simple mixed tensors composed of basis vectors. 

Notations 29.4.14 and 29.4.15 are abbreviated in the case of mixed tensors composed of basis vectors to Notations
29.4.17 and 29.4.18 respectively. There is some inconvenience in the use of the over-bar to distinguish
the two sets of bases, but this can be easily removed by re-indexing the family $(W_\ell)_{\ell=1}^s$ as $(W_\ell)_{\ell=r+1}^{r+s}$, and
rewriting $(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ as $(e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$.


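29.4.17 NOTATION: Simple multilinear-function-style mixed tensors composed of basis vectors.
$e_i{}^j$, for $i\in\times_{k=1}^r\mathbb{N}_{n_k}$ and $j\in\times_{\ell=1}^s\mathbb{N}_{m_\ell}$, for $r,s\in\mathbb{Z}_0^+$, for finite families of finite-dimensional linear spaces
$(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$ over a field $K$, with bases $B_k=(e_{k,i_k})_{i_k=1}^{n_k}$ for $V_k$ for $k\in\mathbb{N}_r$, and $\bar B_\ell=(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ for $W_\ell$ for $\ell\in\mathbb{N}_s$,
denotes the mixed tensor in $\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s)$ which is defined by

\[
e_i{}^j=\Bigl(\bigotimes_{k=1}^r e_{k,i_k}\Bigr)\otimes\Bigl(\bigotimes_{\ell=1}^s\bar e^{\ell,j_\ell}\Bigr),
\]

where $(\bar e^{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ is the canonical dual basis to $\bar B_\ell$ for $\ell\in\mathbb{N}_s$.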


29.4.18 NOTATION: Simple linear-map-style mixed tensors composed of basis vectors. 
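$\vec e_i{}^j$, for $i\in\times_{k=1}^r\mathbb{N}_{n_k}$ and $j\in\times_{\ell=1}^s\mathbb{N}_{m_\ell}$, for $r,s\in\mathbb{Z}_0^+$, for finite families of finite-dimensional linear spaces
$(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$ over a field $K$, with bases $B_k=(e_{k,i_k})_{i_k=1}^{n_k}$ for $V_k$ for $k\in\mathbb{N}_r$, and $\bar B_\ell=(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ for $W_\ell$ for $\ell\in\mathbb{N}_s$,
denotes the mixed tensor in $\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s))$ which is defined by

\[
\vec e_i{}^j=\Bigl(\bigotimes_{k=1}^r e_{k,i_k}\to\bigotimes_{\ell=1}^s\bar e^{\ell,j_\ell}\Bigr),
\]

where $(\bar e^{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ is the canonical dual basis to $\bar B_\ell$ for $\ell\in\mathbb{N}_s$.


29.4.19 REMARK: Expressions for mixed tensors in terms of bases.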
Theorems 29.4.20 and 29.4.21 express mixed tensors in the multilinear and linear style respectively in terms 
of Notations 29.4.14 and 29.4.15 for simple mixed tensors. 


29.4.20 THEOREM: Expression for multilinear-function-style mixed tensors in terms of bases. 

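Let $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$, with $r,s\in\mathbb{Z}_0^+$, be finite families of finite-dimensional linear spaces over a field $K$.
Let $B_k=(e_{k,i_k})_{i_k=1}^{n_k}$ be a basis for $V_k$ for all $k\in\mathbb{N}_r$, and let $\bar B_\ell=(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ be a basis for $W_\ell$ for all $\ell\in\mathbb{N}_s$.
Then

\begin{align*}
\forall A\in\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s),\quad
A&=\sum_{i\in I}\sum_{j\in J}a^i{}_j\Bigl(\Bigl(\bigotimes_{k=1}^r e_{k,i_k}\Bigr)\otimes\Bigl(\bigotimes_{\ell=1}^s\bar e^{\ell,j_\ell}\Bigr)\Bigr)\\
&=\sum_{i\in I}\sum_{j\in J}a^i{}_j\,e_i{}^j,
\end{align*}

where $I=\times_{k=1}^r\mathbb{N}_{n_k}$, $J=\times_{\ell=1}^s\mathbb{N}_{m_\ell}$, and

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j=A((e^{k,i_k})_{k=1}^r,(\bar e_{\ell,j_\ell})_{\ell=1}^s),
\]

where $(e^{k,i_k})_{i_k=1}^{n_k}$ is the canonical dual basis to $B_k$ for $k\in\mathbb{N}_r$, and $(\bar e^{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ is the canonical dual basis
to $\bar B_\ell$ for $\ell\in\mathbb{N}_s$.


PROOF: The assertion follows straightforwardly from the definition of $\mathscr{L}((V_k^*)_{k=1}^r,(W_\ell)_{\ell=1}^s)$.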


29.4.21 THEOREM: Expression for linear-map-style mixed tensors in terms of bases. 

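Let $(V_k)_{k=1}^r$ and $(W_\ell)_{\ell=1}^s$, with $r,s\in\mathbb{Z}_0^+$, be finite families of finite-dimensional linear spaces over a field $K$.
Let $B_k=(e_{k,i_k})_{i_k=1}^{n_k}$ be a basis for $V_k$ for all $k\in\mathbb{N}_r$, and let $\bar B_\ell=(\bar e_{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ be a basis for $W_\ell$ for all $\ell\in\mathbb{N}_s$.
Then

\begin{align*}
\forall A\in\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s)),\quad
A&=\sum_{i\in I}\sum_{j\in J}a^i{}_j\Bigl(\bigotimes_{k=1}^r e_{k,i_k}\to\bigotimes_{\ell=1}^s\bar e^{\ell,j_\ell}\Bigr)\\
&=\sum_{i\in I}\sum_{j\in J}a^i{}_j\,\vec e_i{}^j,
\end{align*}

where $I=\times_{k=1}^r\mathbb{N}_{n_k}$, $J=\times_{\ell=1}^s\mathbb{N}_{m_\ell}$, and

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j=A(e^i)((\bar e_{\ell,j_\ell})_{\ell=1}^s),
\]

where $e^i=\bigotimes_{k=1}^r e^{k,i_k}$, $(e^{k,i_k})_{i_k=1}^{n_k}$ is the canonical dual basis to $B_k$ for $k\in\mathbb{N}_r$, and $(\bar e^{\ell,j_\ell})_{j_\ell=1}^{m_\ell}$ is the canonical dual basis
to $\bar B_\ell$ for $\ell\in\mathbb{N}_s$.


PROOF: The assertion follows straightforwardly from the definition of $\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r),\mathscr{L}((W_\ell)_{\ell=1}^s))$.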


29.5. Mixed tensor product spaces on a single primal space 


29.5.1 REMARK: Ordering convention. Contravariant degree comes before covariant degree. 
As mentioned in Remark 29.3.2, a contravariant-first order convention is followed in notations for mixed 
tensors. This is applied in Definition 29.5.4 and Notation 29.5.5, where r is the contravariant degree and s is
the covariant degree. A (possible) mnemonic for this order is that the prefix “contra” contains an “r”, and
r comes before s in the alphabet.


29.5.2 DEFINITION: The mixed multilinear function space of type $(r,s)$ on $V$, where $r,s\in\mathbb{Z}_0^+$ and $V$ is
a finite-dimensional linear space over a field $K$, is the mixed multilinear function space $\mathscr{L}((V_\alpha^*)_{\alpha\in A},(W_\beta)_{\beta\in B}; K)$ with
$V_\alpha=V$ for all $\alpha\in A=\mathbb{N}_r$ and $W_\beta=V$ for all $\beta\in B=\mathbb{N}_s$.

A mixed multilinear function of type $(r,s)$ on $V$ is an element of the mixed multilinear function space of
type $(r,s)$ on $V$.


29.5.3 NOTATION: Mixed multilinear-function-style tensor space.
®"* V, for a finite-dimensional linear space V and r,s € Zj, denotes the mixed multilinear function space 
of type (r,s) on V. In other words, ®"* V = ¥((V*)", V5; K). 


29.5.4 DEFINITION: The mixed tensor (product) space of type $(r,s)$ on $V$, where $r,s\in\mathbb{Z}_0^+$ and $V$ is a
finite-dimensional linear space, is the mixed tensor product $\bigotimes((V_\alpha)_{\alpha\in A};(W_\beta)_{\beta\in B})$ with $V_\alpha=V$ for all
$\alpha\in A=\mathbb{N}_r$ and $W_\beta=V$ for all $\beta\in B=\mathbb{N}_s$.


A mixed tensor of type $(r,s)$ on $V$ is an element of the mixed tensor space of type $(r,s)$ on $V$.


29.5.5 NOTATION: Mixed linear-map-style tensor space.
$\bigotimes^{r,s}V$, for a finite-dimensional linear space $V$ and $r,s\in\mathbb{Z}_0^+$, denotes the mixed tensor space of type $(r,s)$
on $V$. In other words, $\bigotimes^{r,s}V=\mathrm{Lin}(\mathscr{L}(V^r;K),\mathscr{L}(V^s;K))$.


29.5.6 REMARK: Simplified notations for mixed tensor spaces on a single primal space. 

The mixed tensor product @((Va)aea; (Wa)aenp) in Definition 29.5.4 may be written as C9(V"; V*). Thus 
G"*y = G(V'; V5) and OV & $^? y e &"? y = &(V'; V9). One may also write Y(V) & QV S 
Q” y = G(v9:ys). 


29.5.7 REMARK: Identifications of mixed tensor spaces in boundary cases. 

The special cases $r=0$ or $s=0$ in Definition 29.5.4 are implicitly discussed in Remark 29.3.7 as the
special cases $A=\emptyset$ or $B=\emptyset$ respectively. In the case $(r,s)=(0,0)$, the mixed tensor product space
$\bigotimes^{0,0}V$ is the space $\bigotimes(\emptyset;\emptyset)$, which has two empty sequences of linear spaces over an implied field $K$. Then
$\bigotimes^{0,0}V=\mathrm{Lin}(\mathscr{L}(\emptyset;K),\mathscr{L}(\emptyset;K))$. As mentioned in Remarks 27.2.14 and 28.1.15, $\mathscr{L}(\emptyset;K)$ is equal to $K^{\{\emptyset\}}=
\{f:\{\emptyset\}\to K\}$ with the operations of pointwise addition and scalar multiplication by elements of $K$. This may
be identified with $K$ (the linear space $K$ over the field $K$), and so $\bigotimes^{0,0}V$ may be identified with $\mathrm{Lin}(K,K)$,
which can also be identified with the linear space $K$ over the field $K$. Hence $\dim(\bigotimes^{0,0}V)=1$.
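
The boundary value $\dim = 1$ for type $(0,0)$ agrees with the general dimension count for the linear-map-style space. The following is a sketch assuming $n=\dim(V)$ is finite and using the standard facts that the space of $r$-linear functions on $V$ has dimension $n^r$ and that $\dim\mathrm{Lin}(X,Y)=\dim X\cdot\dim Y$ for finite-dimensional $X$ and $Y$.

```latex
% Dimension count for the linear-map-style mixed tensor space (sketch, n = dim V finite).
\[
\dim\mathrm{Lin}(\mathscr{L}(V^r;K),\mathscr{L}(V^s;K))
=\dim\mathscr{L}(V^r;K)\cdot\dim\mathscr{L}(V^s;K)
=n^r\cdot n^s
=n^{r+s}.
\]
% Boundary case (r,s) = (0,0): n^0 = 1 for every n, including n = 0.
```

This matches the number of index pairs $(i,j)$ in the component arrays for a single primal space, since there are $n^r$ choices for $i$ and $n^s$ choices for $j$.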


Definition 29.5.8 is closely related to Definitions 27.3.7 and 28.1.18. 


29.5.8 DEFINITION: Some canonical injections into multilinear-function-style tensor spaces.
The canonical injection of scalars in the mixed tensor space $\mathscr{L}(\emptyset;K)$ of type $(0,0)$ for a linear space $V$ over a
field $K$ is the map $i:K\to\mathscr{L}(\emptyset;K)=K^{\{\emptyset\}}$ given by

\[
\forall t\in K,\ \forall z\in\{\emptyset\},\quad i(t)(z)=t.
\]

In other words, $i:t\mapsto(\emptyset\mapsto t)$ for all $t\in K$.


The canonical injection of vectors in the mixed tensor space $\mathscr{L}(V^*;K)$ of type $(1,0)$ for a linear space $V$ over a
field $K$ is the map $i:V\to\mathscr{L}(V^*;K)$ given by

\[
\forall v\in V,\ \forall\lambda\in V^*,\quad i(v)(\lambda)=\lambda(v).
\]

In other words, $i:v\mapsto(\lambda\mapsto\lambda(v))$ for all $v\in V$ and $\lambda\in V^*$.


The canonical injection of dual vectors in the mixed tensor space $\mathscr{L}(V;K)$ of type $(0,1)$ for a linear space $V$ over
a field $K$ is the map $i:V^*\to\mathscr{L}(V;K)$ given by

\[
\forall\lambda\in V^*,\ \forall v\in V,\quad i(\lambda)(v)=\lambda(v).
\]

In other words, $i=\mathrm{id}_{V^*}$.


29.5.9 REMARK: Formulas for canonical injections for linear-map-style tensor spaces. 
The formulas in Definition 29.5.10 are equivalent to the formulas in Definition 29.5.8, but they are clearly 
much less natural in their appearance and handling. 


29.5.10 DEFINITION: Some canonical injections into linear-map-style tensor spaces. 
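The canonical injection of scalars in a mixed tensor space $\bigotimes^{0,0}V=\mathrm{Lin}(\mathscr{L}(\emptyset;K),\mathscr{L}(\emptyset;K))$
$=\mathrm{Lin}(K^{\{\emptyset\}},K^{\{\emptyset\}})$ for a linear space $V$ over a field $K$ is the map $i:K\to\mathrm{Lin}(K^{\{\emptyset\}},K^{\{\emptyset\}})$ given by

\[
\forall t\in K,\ \forall\phi\in K^{\{\emptyset\}},\ \forall z\in\{\emptyset\},\quad i(t)(\phi)(z)=t\,\phi(\emptyset).
\]

In other words, $i:t\mapsto(\phi\mapsto(\emptyset\mapsto t\,\phi(\emptyset)))$ for all $t\in K$ and $\phi\in K^{\{\emptyset\}}$.


The canonical injection of vectors in a mixed tensor space $\bigotimes^{1,0}V=\mathrm{Lin}(\mathscr{L}(V;K),\mathscr{L}(\emptyset;K))$
$=\mathrm{Lin}(V^*,K^{\{\emptyset\}})$ for a linear space $V$ over a field $K$ is the map $i:V\to\mathrm{Lin}(V^*,K^{\{\emptyset\}})$ given by

\[
\forall v\in V,\ \forall\lambda\in V^*,\ \forall z\in\{\emptyset\},\quad i(v)(\lambda)(z)=\lambda(v).
\]

In other words, $i:v\mapsto(\lambda\mapsto(\emptyset\mapsto\lambda(v)))$ for all $v\in V$ and $\lambda\in V^*$.


The canonical injection of dual vectors in a mixed tensor space $\bigotimes^{0,1}V=\mathrm{Lin}(\mathscr{L}(\emptyset;K),\mathscr{L}(V;K))$
$=\mathrm{Lin}(K^{\{\emptyset\}},V^*)$ for a linear space $V$ over a field $K$ is the map $i:V^*\to\mathrm{Lin}(K^{\{\emptyset\}},V^*)$ given by

\[
\forall\lambda\in V^*,\ \forall\phi\in K^{\{\emptyset\}},\ \forall v\in V,\quad i(\lambda)(\phi)(v)=\phi(\emptyset)\,\lambda(v).
\]

In other words, $i:\lambda\mapsto(\phi\mapsto(v\mapsto\phi(\emptyset)\,\lambda(v)))$ for all $\lambda\in V^*$, $\phi\in K^{\{\emptyset\}}$ and $v\in V$.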


29.5.11 REMARK: Survey of tensor space degree and type terminology. 

Some of the terminology for tensor degree and rank is summarised in Table 29.5.1. There seems to be fairly 
general agreement on the order of the contravariant degree r and covariant degree s in the type pair for 
mixed tensors. When the degrees are arranged vertically, the contravariant degree generally appears on top. 
This agrees with the tensor calculus convention that contravariant indices are superscripts and covariant 
indices are subscripts. 


29.5.12 REMARK: Tensor spaces with the same transformation rules are not necessarily the same. 

It could be argued that the mixed tensor products in Definition 29.5.4 are highly inconvenient because this 
approach will spawn numerous spaces of linear maps between numerous tensor spaces. Every application may 
require a new space of linear maps which is ultimately isomorphic to one of the spaces in Definition 29.5.4, 
which is itself isomorphic to the popular definition of mixed tensors as spaces of multilinear maps on mixed 
primal and dual spaces. In this view, all of the naturally isomorphic tensor bundles on a differentiable 
manifold are merely alternative representations of the one-and-only single tensor bundle with each type (r, s). 


There are two main counter-arguments to this view. First, the extra work required to build the right tensor 
space for each application is valuable because it forces one to understand the specific structure of each tensor 
bundle. Second, the view that all naturally isomorphic tensor bundles on a manifold are really the same 
tensor bundle is essentially asserting that the only distinguishing feature of tensor bundles is their coordinate 
transformation rules. There is a very popular view that “coordinates-free” differential geometry is somehow 
superior, but most people who believe this also believe that coordinate transformation rules are the only 
distinguishing feature of tensor bundles. These beliefs seem to be incompatible. The view adopted in this 
book is that multiple transformation-identical tensor bundles may co-exist on a single manifold, but that 
coordinates are the principal structural feature of tensor algebra and differential geometry generally. 


29.5.13 REMARK: The numerous tensor space representations in the literature. 

It is not surprising that there are numerous representations in the literature for mixed tensor spaces because
there are so many representations for unmixed covariant and contravariant tensors. The mixed multilinear
function space of type $(r,s)$ on $V$ is sometimes defined, for linear spaces $V$ and integers $r,s\in\mathbb{Z}_0^+$, as the tensor product
$\bigotimes_{i=1}^{r+s}V_i$, where $V_i=V$ for $i\in\mathbb{N}_r$ and $V_i=V^*$ for $i\in\mathbb{N}_{r+s}\setminus\mathbb{N}_r$. The unmixed tensor product $\bigotimes_{i=1}^{r+s}V_i$
on which such a definition is based can then have the various representations listed in Table 28.5.1 in
Remark 28.5.3. Definition 29.5.4 has the advantage that it is clear to a pure mathematician exactly how the
space is constructed.


29.5.14 REMARK: Permutations of the order of primal and dual spaces in a mixed tensor space. 

'The style of mixed tensor representation mentioned in Remark 29.5.13 uses a sequence of primal spaces 
followed by a sequence of dual spaces, but this could be generalised so that the primal and dual spaces are 
mixed up in any order. 


As an example, a possible notation for the space of tensors whose components are written as K;/* , would be 
699^?! R”, (Something like this is discussed by Petersen [31], page 55.) More generally, the notation would 
be $\bigotimes^{r_1,s_1,r_2,s_2,\ldots}V$ with contravariant and covariant degrees respectively equal to $r_1,r_2,\ldots$ and $s_1,s_2,\ldots$. It
would then be necessary to develop a set of notations for arbitrary contractions and products of such tensors
and spaces. This seems to be an area where the physicists’ component notations are better developed than
the pure mathematical notations.


year reference degree r, s type (r, s)
1925 Levi-Civita [26], page 70 rank r, s rank r 4- s 
1949 Synge/Schild [41], page 11 order r, s — 
1950 Corben/Stehle [258], pages 337-338 order r, s — 
1951 Steenrod [142], page 23 order r, s — 
1959 Willmore [42], page 177 order r, s type (r, s) 
1963 Auslander/MacKenzie [1], page 195 order r, s type (r, s) 
1963 Guggenheimer [16], page 182 — type (r, s) 
1963 Kobayashi/Nomizu [19], page 20 degree r, s type (r, s) 
1965 MacLane/Birkhoff [110], pages 528, 531 degree r type (s,r) 
1965 Postnikov [33], page 26 — type (r, s) 
1968 Bishop/Goldberg [3], page 78 degree r, s type (r, s) 
1968 Choquet-Bruhat [6], pages 35, 236 [fr] Ordre r, s ordre r+ s 
1970 Misner/Thorne/Wheeler [292], page 75 — rank (7) 
1970 Spivak [37], Vol. 1, pages 117, 123 order r, s type (7) 
1975 Lovelock/Rund [27], page 341 — type (r, s) 
1979 Do Carmo [9], page 100 order s — 
1980 Schutz [36], pages 57, 68 order r + s type (7) 
1983 Marsden/Hughes [289], page 65 rank r, s type (7) 
1983 Nash/Sen [30], page 41 rank r, s type (r, s) 
1986 Crampin/Pirani[7], pages 97, 194, 230 degree s, valence s type (r, s) 
1987 Gallot/Hulin/Lafontaine [13], page 37 — type (r, s) 
1988 Kay [18], pages 29, 193 order r, s type-(7) 
1993 EDM2 [113], 256.J, page 949 degree r, s type (r, s) 
1995 O'Neill [295], page 8 — type (r, s) 
1996 Goenner [270], page 87 [de] — Typ (r, s) 
1997 Frankel [12], page 58 rank r, s — 
1997 Lee [24], page 12 rank r + s type (7 
1998 Petersen [31], page 51 — type (r, s) 
1999 Rebhan [299], page 973 [de] — () 
2004 Szekeres [305], page 186 degree r, s type (r, s) 
2005 Penrose [297], page 240 — valence [7] 
2006 Gregory [272], page 501 order r — 
2012 Sternberg [38], pages 94, 98 degree r, s type (r,s) 

Kennington degree r, s type (r,s) 

Table 29.5.1 Survey of tensor space degree and type terminology 


29.5.15 NOTATION: $\mathscr{L}_{r,s}(V;U)$ denotes the mixed multilinear map space $\mathscr{L}((V^*)^r,V^s;U)$, where $V$ is a
finite-dimensional linear space, $U$ is a linear space, and $r,s\in\mathbb{Z}_0^+$.

$\mathscr{L}_{r,s}(V)$ denotes the mixed multilinear map space $\mathscr{L}((V^*)^r,V^s;K)$, where $V$ is a finite-dimensional linear
space over the field $K$, and $r,s\in\mathbb{Z}_0^+$.


29.5.16 REMARK: A natural isomorphism for multilinear-function-style and linear-map-style tensors. 
The mixed multilinear map space $\mathscr{L}_{r,s}(V)=\mathscr{L}((V^*)^r,V^s;K)$ in Notation 29.5.15 is the same as the
mixed multilinear function space of type $(r,s)$ on $V$ in Notation 29.5.3. If Theorem 29.3.11 is applied to the case
that all of the tensor component spaces are the same, the result is the following natural isomorphism:

\[
\bigotimes\nolimits^{r,s}V=\mathrm{Lin}(\mathscr{L}(V^r;K),\mathscr{L}(V^s;K))\cong\mathscr{L}((V^*)^r,V^s;K)=\mathscr{L}_{r,s}(V).
\]

This gives the equivalence of the definition in this book to the more popular style of definition.


29.5.17 REMARK: An unimportant generalisation of a tensor space notation. 
Notation 29.5.18 slightly generalises Notation 29.5.15. As a consequence, $\mathscr{L}_{r,s}(V;U)=\mathscr{L}_{r,s}(V,V;U)$.


29.5.18 NOTATION: $\mathscr{L}_{r,s}(V_1,V_2;U)$ denotes the mixed multilinear map space $\mathscr{L}((V_1^*)^r,V_2^s;U)$, where $V_1$
and $V_2$ are finite-dimensional linear spaces, $U$ is a linear space, and $r,s\in\mathbb{Z}_0^+$.


29.6. Components for mixed tensors with a single primal space 


29.6.1 REMARK: Specialising components for mixed tensors to the homogeneous primal spaces case.

The purpose of Section 29.6 is to apply bases and components to the mixed tensor product spaces which are 
defined in Section 29.5 for a single primal space. Theorems 29.6.2 and 29.6.3 are specialisations of Theorems 
29.4.4 and 29.4.5 respectively to the case of uniform linear spaces. 


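29.6.2 THEOREM: Components for multilinear-function-style homogeneous mixed tensor spaces.
Let $V$ be a finite-dimensional linear space over a field $K$, with $n=\dim(V)$. Let $B=(e_i)_{i=1}^n$ be a basis for $V$. Then

\[
\forall r,s\in\mathbb{Z}_0^+,\ \forall A\in\mathscr{L}((V^*)^r,V^s),\ \forall(\lambda^k)_{k=1}^r\in(V^*)^r,\ \forall(v_\ell)_{\ell=1}^s\in V^s,
\]
\[
A((\lambda^k)_{k=1}^r,(v_\ell)_{\ell=1}^s)
=\sum_{i\in\mathbb{N}_n^r}\sum_{j\in\mathbb{N}_n^s}A((e^{i_k})_{k=1}^r,(e_{j_\ell})_{\ell=1}^s)\prod_{k=1}^r\lambda^k_{i_k}\prod_{\ell=1}^s v_\ell^{j_\ell},
\]

where $v_\ell^{j_\ell}=\kappa_B(v_\ell)_{j_\ell}$ denotes the $j_\ell$-th component of $v_\ell$ with respect to $B$, $\lambda^k_{i_k}=\lambda^k(e_{i_k})$ denotes the $i_k$-th
component of $\lambda^k$ with respect to $B$, and $e^{i_k}$ denotes the $i_k$-th element of the canonical dual basis to $B$.


PROOF: This result is a direct consequence of Theorem 29.4.4 by replacing $V_k$ with $V$ for $k\in\mathbb{N}_r$ and $W_\ell$
with $V$ for $\ell\in\mathbb{N}_s$.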


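29.6.3 THEOREM: Components for linear-map-style homogeneous mixed tensor spaces.
Let $V$ be a finite-dimensional linear space over a field $K$, with $n=\dim(V)$. Let $B=(e_i)_{i=1}^n$ be a basis for $V$. Then

\[
\forall r,s\in\mathbb{Z}_0^+,\ \forall A\in\mathrm{Lin}(\mathscr{L}_r(V),\mathscr{L}_s(V)),\ \forall\lambda\in\mathscr{L}_r(V),\ \forall(v_\ell)_{\ell=1}^s\in V^s,
\]
\[
A(\lambda)((v_\ell)_{\ell=1}^s)
=\sum_{i\in\mathbb{N}_n^r}\sum_{j\in\mathbb{N}_n^s}A(e^i)((e_{j_\ell})_{\ell=1}^s)\,\lambda_i\prod_{\ell=1}^s v_\ell^{j_\ell},
\]

where $\lambda_i=\lambda((e_{i_k})_{k=1}^r)$ for $i\in\mathbb{N}_n^r$, and $v_\ell^{j_\ell}=\kappa_B(v_\ell)_{j_\ell}$ is the $j_\ell$-th component of $v_\ell$ with respect to the
basis $B$, and $e^i=\bigotimes_{k=1}^r e^{i_k}$, where $e^{i_k}$ is the $i_k$-th element of the canonical dual basis to $B$.


PROOF: This result is a direct consequence of Theorem 29.4.5 by replacing $V_k$ with $V$ for $k\in\mathbb{N}_r$ and $W_\ell$
with $V$ for $\ell\in\mathbb{N}_s$.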


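29.6.4 THEOREM: Change-of-basis formulas for two mixed tensor representations.
Let $V$ be a linear space over a field $K$, with $n=\dim(V)\in\mathbb{Z}_0^+$. Let $r,s\in\mathbb{Z}_0^+$. Let $B=(e_i)_{i=1}^n$ and
$\hat B=(\hat e_i)_{i=1}^n$ be bases for $V$. Define the basis transition matrices $(Y^\beta{}_\alpha)_{\alpha,\beta=1}^n$ and $(\tilde Y^\beta{}_\alpha)_{\alpha,\beta=1}^n$ by

\[
\forall\alpha\in\mathbb{N}_n,\quad e_\alpha=\sum_{\beta=1}^n\hat e_\beta\,Y^\beta{}_\alpha,
\]
\[
\forall\alpha\in\mathbb{N}_n,\quad\hat e_\alpha=\sum_{\beta=1}^n e_\beta\,\tilde Y^\beta{}_\alpha.
\]

Let $I=\mathbb{N}_n^r$ and $J=\mathbb{N}_n^s$. Let $A\in\mathscr{L}((V^*)^r,V^s)$. Then

\[
\forall i\in I,\ \forall j\in J,\quad
\hat a^i{}_j=\sum_{i'\in I}\sum_{j'\in J}\Bigl(\prod_{k=1}^r Y^{i_k}{}_{i'_k}\Bigr)\Bigl(\prod_{\ell=1}^s\tilde Y^{j'_\ell}{}_{j_\ell}\Bigr)a^{i'}{}_{j'},
\tag{29.6.1}
\]
\[
\forall i\in I,\ \forall j\in J,\quad
a^i{}_j=\sum_{i'\in I}\sum_{j'\in J}\Bigl(\prod_{k=1}^r\tilde Y^{i_k}{}_{i'_k}\Bigr)\Bigl(\prod_{\ell=1}^s Y^{j'_\ell}{}_{j_\ell}\Bigr)\hat a^{i'}{}_{j'},
\tag{29.6.2}
\]

where

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j=A((e^{i_k})_{k=1}^r,(e_{j_\ell})_{\ell=1}^s),
\tag{29.6.3}
\]
\[
\forall i\in I,\ \forall j\in J,\quad\hat a^i{}_j=A((\hat e^{i_k})_{k=1}^r,(\hat e_{j_\ell})_{\ell=1}^s),
\tag{29.6.4}
\]

where $(e^i)_{i=1}^n$ and $(\hat e^i)_{i=1}^n$ are the canonical dual bases to $B$ and $\hat B$ respectively.


Similarly, let $A\in\mathrm{Lin}(\mathscr{L}(V^r),\mathscr{L}(V^s))$. Then lines (29.6.1) and (29.6.2) hold when $(a^i{}_j)_{i\in I,j\in J}$ and $(\hat a^i{}_j)_{i\in I,j\in J}$ are defined
instead by (29.6.5) and (29.6.6), where $e^i=\bigotimes_{k=1}^r e^{i_k}$ and $\hat e^i=\bigotimes_{k=1}^r\hat e^{i_k}$:

\[
\forall i\in I,\ \forall j\in J,\quad a^i{}_j=A(e^i)((e_{j_\ell})_{\ell=1}^s),
\tag{29.6.5}
\]
\[
\forall i\in I,\ \forall j\in J,\quad\hat a^i{}_j=A(\hat e^i)((\hat e_{j_\ell})_{\ell=1}^s).
\tag{29.6.6}
\]

PROOF: This result is a direct consequence of Theorem 29.4.7 by replacing $V_k$ with $V$ for $k\in\mathbb{N}_r$ and $W_\ell$
with $V$ for $\ell\in\mathbb{N}_s$.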
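
To make the index bookkeeping in the change-of-basis formulas concrete, the following sketch works out the case $r=s=1$ with $n=2$. The particular bases and the array $a$ are illustrative choices which do not appear in the text. The index placement $Y^\beta{}_\alpha$ (row index raised, column index lowered) is an assumed convention, with $e_\alpha=\sum_\beta\hat e_\beta\,Y^\beta{}_\alpha$ and $\hat e_\alpha=\sum_\beta e_\beta\,\tilde Y^\beta{}_\alpha$, so that $Y$ and $\tilde Y$ are mutually inverse.

```latex
% Worked example (assumed data): n = 2, r = s = 1.
% New basis: \hat e_1 = e_1 and \hat e_2 = e_1 + e_2.
\[
\tilde Y=\begin{pmatrix}1&1\\0&1\end{pmatrix},
\qquad
Y=\tilde Y^{-1}=\begin{pmatrix}1&-1\\0&1\end{pmatrix}.
\]
% Components a^i{}_j = A(e^i, e_j), with i as the row index and j as the column index.
\[
a=\begin{pmatrix}1&0\\0&2\end{pmatrix}.
\]
% For r = s = 1 the rule \hat a^i{}_j = \sum_{i',j'} Y^i{}_{i'} a^{i'}{}_{j'} \tilde Y^{j'}{}_j
% is the matrix product Y a \tilde Y.
\[
\hat a=Y\,a\,\tilde Y
=\begin{pmatrix}1&-2\\0&2\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix}
=\begin{pmatrix}1&-1\\0&2\end{pmatrix}.
\]
```

As a check on one entry, the dual basis vector $\hat e^1=e^1-e^2$ and the basis vector $\hat e_2=e_1+e_2$ give $\hat a^1{}_2=A(\hat e^1,\hat e_2)=a^1{}_1+a^1{}_2-a^2{}_1-a^2{}_2=1+0-0-2=-1$, in agreement with the matrix product.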


29.6.5 REMARK: Tensor monomial notation meanings are different for vectors and linear functionals. 

Notations 29.6.6 and 29.6.7 are specialisations of Notations 29.4.10 and 29.4.11 to homogeneous linear spaces.
Note that the binary operator “$\otimes$” has a different meaning for the contravariant vectors in Notation 29.6.6
and the covariant vectors in Notation 29.6.7. In the contravariant case, the r vectors are applied as arguments
of an r-linear “test function” $\lambda\in\mathscr{L}_r(V;K)$. In the covariant case, the s linear functionals are applied to an
s-tuple of “test vectors”.


29.6.6 NOTATION: Homogeneous tensor monomials composed of contravariant vectors only. 
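$\bigotimes_{k=1}^r v_k$, for a sequence of vectors $(v_k)_{k=1}^r\in V^r$, for $r\in\mathbb{Z}_0^+$, where $V$ is a finite-dimensional linear space
over $K$, denotes the tensor in $\mathscr{L}_r(V;K)^*$ defined by $\lambda\mapsto\lambda(v_1,\ldots,v_r)$ for all $\lambda\in\mathscr{L}_r(V;K)$.


$v_1\otimes\ldots\otimes v_r$ means the same as $\bigotimes_{k=1}^r v_k$.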


29.6.7 NOTATION: Homogeneous tensor monomials composed of covariant vectors only. 

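$\bigotimes_{\ell=1}^s\phi^\ell$, for a sequence of linear functionals $(\phi^\ell)_{\ell=1}^s\in(W^*)^s$, for $s\in\mathbb{Z}_0^+$, where $W$ is a finite-dimensional
linear space over $K$, denotes the tensor in $\mathscr{L}_s(W;K)$ which is defined by $(w_\ell)_{\ell=1}^s\mapsto\prod_{\ell=1}^s\phi^\ell(w_\ell)$ for
all $(w_\ell)_{\ell=1}^s\in W^s$.

$\phi^1\otimes\ldots\otimes\phi^s$ means the same as $\bigotimes_{\ell=1}^s\phi^\ell$.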


29.6.8 REMARK: Simple mixed tensors on a uniform linear space.

Notations 29.6.9 and 29.6.10 are specialisations of Notations 29.4.14 and 29.4.15 to uniform primal spaces.
Since the contravariant part and covariant part of the mixed tensor in Notation 29.6.9 are different kinds
of objects (and the binary operator “$\otimes$” has a different meaning in each case, as noted in Remark 29.6.5),
the simple mixed tensor expression which is defined in Notation 29.6.9 is not at all uniform. The rules for
constructing the $(r+s)$-linear function in $\mathscr{L}((V^*)^r,V^s)$ are different for the contravariant and covariant
components. This difference is even more obvious in Notation 29.6.10.


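29.6.9 NOTATION: Simple mixed tensors in the multilinear-function style.

$(\bigotimes_{k=1}^r v_k)\otimes(\bigotimes_{\ell=1}^s\phi^\ell)$, for a sequence $(v_k)_{k=1}^r\in V^r$, for $r\in\mathbb{Z}_0^+$, and a sequence $(\phi^\ell)_{\ell=1}^s\in(V^*)^s$, for $s\in\mathbb{Z}_0^+$,
where $V$ is a finite-dimensional linear space over $K$, denotes the multilinear function in $\mathscr{L}((V^*)^r,V^s)$ which
is defined by

\[
\mathrm{concat}((\lambda^k)_{k=1}^r,(w_\ell)_{\ell=1}^s)\mapsto\prod_{k=1}^r\lambda^k(v_k)\prod_{\ell=1}^s\phi^\ell(w_\ell)
\]

for all $(\lambda^k)_{k=1}^r\in(V^*)^r$ and $(w_\ell)_{\ell=1}^s\in V^s$.
$(v_1\otimes\ldots\otimes v_r)\otimes(\phi^1\otimes\ldots\otimes\phi^s)$ means the same as $(\bigotimes_{k=1}^r v_k)\otimes(\bigotimes_{\ell=1}^s\phi^\ell)$.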


29.6.10 NOTATION: Simple mixed tensors in the linear-map style.

$\bigotimes_{k=1}^r v_k\to\bigotimes_{\ell=1}^s\phi^\ell$, for a sequence $(v_k)_{k=1}^r\in V^r$, for $r\in\mathbb{Z}_0^+$, and a sequence $(\phi^\ell)_{\ell=1}^s\in(V^*)^s$, for $s\in\mathbb{Z}_0^+$,
where $V$ is a finite-dimensional linear space over $K$, denotes the mixed tensor in $\mathrm{Lin}(\mathscr{L}_r(V),\mathscr{L}_s(V))$ which
is defined by

\[
\lambda\mapsto\Bigl((w_\ell)_{\ell=1}^s\mapsto\lambda(v_1,\ldots,v_r)\prod_{\ell=1}^s\phi^\ell(w_\ell)\Bigr)
\]

for all $\lambda\in\mathscr{L}_r(V;K)$ and $(w_\ell)_{\ell=1}^s\in V^s$.
$v_1\otimes\ldots\otimes v_r\to\phi^1\otimes\ldots\otimes\phi^s$ means the same as $\bigotimes_{k=1}^r v_k\to\bigotimes_{\ell=1}^s\phi^\ell$.


29.6.11 REMARK: Simple mixed tensors composed of basis vectors.
Notations 29.6.12 and 29.6.13 are specialisations of Notations 29.4.17 and 29.4.18 to a single primal space. 


29.6.12 NOTATION: Simple multilinear-function-style mixed tensors composed of basis vectors. 
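$e_i^j$, for $i \in \mathbb{N}_n^r$ and $j \in \mathbb{N}_n^s$, for $r, s \in \mathbb{Z}^+_0$, for a finite-dimensional linear space $V$ over a field $K$, with
basis $B = (e_m)_{m=1}^n$, denotes the mixed tensor in $\mathscr{L}((V^*)^r, V^s)$ which is defined by

\[ e_i^j = \Bigl(\bigotimes_{k=1}^r e_{i_k}\Bigr) \otimes \Bigl(\bigotimes_{\ell=1}^s e^{j_\ell}\Bigr), \]

where $(e^m)_{m=1}^n$ is the canonical dual basis to $B$.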


29.6.13 NOTATION: Simple linear-map-style mixed tensors composed of basis vectors. 
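$e_i^{\to j}$, for $i \in \mathbb{N}_n^r$ and $j \in \mathbb{N}_n^s$, for $r, s \in \mathbb{Z}^+_0$, for a finite-dimensional linear space $V$ over a field $K$, with basis
$B = (e_m)_{m=1}^n$, denotes the mixed tensor in $\operatorname{Lin}(\mathscr{L}_r(V), \mathscr{L}_s(V))$ which is defined by

\[ e_i^{\to j} = \Bigl(\bigotimes_{k=1}^r e_{i_k} \to \bigotimes_{\ell=1}^s e^{j_\ell}\Bigr), \]

where $(e^m)_{m=1}^n$ is the canonical dual basis to $B$.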


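29.6.14 REMARK: Expressions for mixed tensors in terms of bases.
Theorems 29.6.15 and 29.6.16 are specialisations of Theorems 29.4.20 and 29.4.21 to a single primal space.


29.6.15 THEOREM: Expression for multilinear-function-style mixed tensors in terms of bases.
Let $V$ be a finite-dimensional linear space over a field $K$. Let $B = (e_m)_{m=1}^n$ be a basis for $V$. Let $r, s \in \mathbb{Z}^+_0$.
Then

\begin{align*}
\forall A \in \mathscr{L}((V^*)^r, V^s), \quad
A &= \sum_{i\in I} \sum_{j\in J} a^i_j \Bigl( \Bigl(\bigotimes_{k=1}^r e_{i_k}\Bigr) \otimes \Bigl(\bigotimes_{\ell=1}^s e^{j_\ell}\Bigr) \Bigr) \\
&= \sum_{i\in I} \sum_{j\in J} a^i_j e_i^j,
\end{align*}

where $I = \mathbb{N}_n^r$, $J = \mathbb{N}_n^s$, and

\[ \forall i \in I,\ \forall j \in J, \quad a^i_j = A\bigl((e^{i_k})_{k=1}^r, (e_{j_\ell})_{\ell=1}^s\bigr), \]

where $(e^m)_{m=1}^n$ is the canonical dual basis to $B$.


PROOF: This result is a direct consequence of Theorem 29.4.20 by replacing $V_k$ with $V$ for $k \in \mathbb{N}_r$ and $W_\ell$
with $V$ for $\ell \in \mathbb{N}_s$.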


29.6.16 THEOREM: Expression for linear-map-style mixed tensors in terms of bases. 
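Let $V$ be a finite-dimensional linear space over a field $K$. Let $B = (e_m)_{m=1}^n$ be a basis for $V$. Let $r, s \in \mathbb{Z}^+_0$.
Then

\begin{align*}
\forall A \in \operatorname{Lin}(\mathscr{L}_r(V), \mathscr{L}_s(V)), \quad
A &= \sum_{i\in I} \sum_{j\in J} a^i_j \Bigl( \bigotimes_{k=1}^r e_{i_k} \to \bigotimes_{\ell=1}^s e^{j_\ell} \Bigr) \\
&= \sum_{i\in I} \sum_{j\in J} a^i_j e_i^{\to j},
\end{align*}

where $I = \mathbb{N}_n^r$, $J = \mathbb{N}_n^s$, and

\[ \forall i \in I,\ \forall j \in J, \quad a^i_j = A(e^i)\bigl((e_{j_\ell})_{\ell=1}^s\bigr), \]

where $e^i = \bigotimes_{k=1}^r e^{i_k}$, and $(e^m)_{m=1}^n$ is the canonical dual basis to $B$.


PROOF: This result is a direct consequence of Theorem 29.4.21 by replacing $V_k$ with $V$ for $k \in \mathbb{N}_r$ and $W_\ell$
with $V$ for $\ell \in \mathbb{N}_s$.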


[www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13 




29.7. Tensor contractions and juxtaposition products 


29.7.1 REMARK: Tensor contraction operations. 

The “contraction of indices” operation on tensors in tensor calculus is one of the most frequently applied 
operations. In this well known operation, one simply makes two of the index letters the same and then sums 
over all values of the index. In the case of a (1,1) tensor, the contraction operation is the same as the trace 
of the coefficient matrix, introduced in Definition 25.9.2. 


It is shown in Theorem 23.3.2 that the trace of a linear map is independent of the basis for which it is 
computed. Since contraction operations are a generalisation of the trace operation, it is not surprising that 
all contraction operations are basis-independent, although a basis is required for their computation. (The 
annoying apparent impossibility of computing the trace of a linear map without resorting to “coordinates” 
is mentioned in Remarks 23.3.1 and 23.3.5.) 

For simplicity, Definition 29.7.2 introduces the contraction operation for a single primal space, although 
in practice contractions are often required for tensors with mixed primal spaces, for example the Riemann 
curvature tensor for a connection on a vector bundle. 
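
The (1,1) case described above can be checked numerically. The following is only a sketch: the coefficient matrix `M`, the basis-change matrix `Q` and the helper functions are hypothetical illustrations, not objects defined in this book. It computes the contraction as the trace of the coefficient matrix, transforms the coefficients to a second basis, and compares the two values.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Contraction of a (1,1) tensor: sum of the diagonal coefficients."""
    return sum(a[q][q] for q in range(len(a)))

def inverse_2x2(a):
    """Inverse of an invertible 2x2 matrix, using exact fractions."""
    det = Fraction(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Coefficient matrix of a (1,1) tensor with respect to a first basis.
M = [[2, 1], [5, 3]]
# Columns of Q are the vectors of a second basis, written in the first basis.
Q = [[1, 1], [1, 2]]

# Coefficients of the same tensor with respect to the second basis.
M_bar = matmul(inverse_2x2(Q), matmul(M, Q))

trace_first = trace(M)
trace_second = trace(M_bar)
```

Both values agree for this choice of `M` and `Q`, which is consistent with the basis-independence asserted for contractions in general.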


29.7.2 DEFINITION: Contraction operation, multilinear-function-style mixed tensor, single primal space. 
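The contraction of a tensor $A \in \bigotimes^{r,s} V = \mathscr{L}((V^*)^r, V^s; K)$ for indices $\alpha \in \mathbb{N}_r$ and $\beta \in \mathbb{N}_s$, for a linear
space $V$ over $K$ and $r, s \in \mathbb{Z}^+$, is the tensor $C \in \bigotimes^{r-1,s-1} V = \mathscr{L}((V^*)^{r-1}, V^{s-1}; K)$ defined by

\begin{align*}
&\forall \lambda = (\lambda_i)_{i=1}^{r-1} \in (V^*)^{r-1},\ \forall v = (v_j)_{j=1}^{s-1} \in V^{s-1}, \\
C(\lambda, v)
&= \sum_{q=1}^n A\bigl(\operatorname{insert}_{\alpha,e^q}(\lambda), \operatorname{insert}_{\beta,e_q}(v)\bigr) \\
&= \sum_{q=1}^n A\bigl((\operatorname{insert}_{\alpha,e^q}(\lambda)(i))_{i=1}^r, (\operatorname{insert}_{\beta,e_q}(v)(j))_{j=1}^s\bigr) \\
&= \sum_{q=1}^n A(\lambda_1, \dots \lambda_{\alpha-1}, e^q, \lambda_\alpha, \dots \lambda_{r-1}, v_1, \dots v_{\beta-1}, e_q, v_\beta, \dots v_{s-1}),
\end{align*}

where $(e_q)_{q=1}^n$ is any basis for $V$, and $(e^q)_{q=1}^n$ is its canonical dual basis as in Definition 23.7.3. (See
Definition 14.12.6 (viii) for “insert”.)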


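29.7.3 NOTATION: $\operatorname{Con}^\alpha_\beta$, for $\alpha \in \mathbb{N}_r$ and $\beta \in \mathbb{N}_s$, for a tensor space $\bigotimes^{r,s} V = \mathscr{L}((V^*)^r, V^s; K)$ over a
finite-dimensional linear space $V$ over $K$, denotes the contraction operator for indices $\alpha$ and $\beta$. In other
words, $\operatorname{Con}^\alpha_\beta : \bigotimes^{r,s} V \to \bigotimes^{r-1,s-1} V$ is defined by

\begin{align*}
&\forall r, s \in \mathbb{Z}^+,\ \forall A \in \textstyle\bigotimes^{r,s} V,\ \forall \alpha \in \mathbb{N}_r,\ \forall \beta \in \mathbb{N}_s,\ \forall \lambda \in (V^*)^{r-1},\ \forall v \in V^{s-1},\ \forall (e_q)_{q=1}^n \in F(V), \\
&\operatorname{Con}^\alpha_\beta(A)(\lambda, v) = \sum_{q=1}^n A\bigl(\operatorname{insert}_{\alpha,e^q}(\lambda), \operatorname{insert}_{\beta,e_q}(v)\bigr),
\end{align*}

where $n = \dim(V)$ and $F(V)$ denotes the set of all bases of $V$.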


29.7.4 THEOREM: Verification of basis-independence of contractions of tensors. 
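Let $r, s \in \mathbb{Z}^+$. Let $\bigotimes^{r,s} V$ be a tensor space over a finite-dimensional linear space $V$ over $K$. Then for all
$\alpha \in \mathbb{N}_r$, $\beta \in \mathbb{N}_s$ and $A \in \bigotimes^{r,s} V$, $\operatorname{Con}^\alpha_\beta(A)$ is independent of the choice of basis for $V$. In other words,

\begin{align*}
&\forall r, s \in \mathbb{Z}^+,\ \forall A \in \textstyle\bigotimes^{r,s} V,\ \forall \alpha \in \mathbb{N}_r,\ \forall \beta \in \mathbb{N}_s,\ \forall \lambda \in (V^*)^{r-1},\ \forall v \in V^{s-1},\ \forall (e_q)_{q=1}^n, (\bar e_p)_{p=1}^n \in F(V), \\
&\sum_{q=1}^n A\bigl(\operatorname{insert}_{\alpha,e^q}(\lambda), \operatorname{insert}_{\beta,e_q}(v)\bigr) = \sum_{p=1}^n A\bigl(\operatorname{insert}_{\alpha,\bar e^p}(\lambda), \operatorname{insert}_{\beta,\bar e_p}(v)\bigr),
\end{align*}

where $n = \dim(V)$ and $F(V)$ denotes the set of all bases of $V$.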


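PROOF: Let $B = (e_q)_{q=1}^n$ and $\bar B = (\bar e_p)_{p=1}^n$ be bases for $V$, and let $(e^q)_{q=1}^n$ and $(\bar e^p)_{p=1}^n$ be their canonical
dual bases. Let $a_{ji} = \kappa_{\bar B}(e_i)_j$ and $\bar a_{ij} = \kappa_B(\bar e_j)_i$ for all $i, j \in \mathbb{N}_n$. Then

\begin{align*}
\sum_{q=1}^n A\bigl(\operatorname{insert}(\alpha, e^q)(\lambda), \operatorname{insert}(\beta, e_q)(v)\bigr)
&= \sum_{q=1}^n A\Bigl(\operatorname{insert}\bigl(\alpha, \sum_{i=1}^n \bar a_{qi} \bar e^i\bigr)(\lambda), \operatorname{insert}\bigl(\beta, \sum_{j=1}^n a_{jq} \bar e_j\bigr)(v)\Bigr) \tag{29.7.1} \\
&= \sum_{q=1}^n \sum_{i=1}^n \sum_{j=1}^n \bar a_{qi} a_{jq} A\bigl(\operatorname{insert}(\alpha, \bar e^i)(\lambda), \operatorname{insert}(\beta, \bar e_j)(v)\bigr) \tag{29.7.2} \\
&= \sum_{i=1}^n \sum_{j=1}^n \delta_{ij} A\bigl(\operatorname{insert}(\alpha, \bar e^i)(\lambda), \operatorname{insert}(\beta, \bar e_j)(v)\bigr) \tag{29.7.3} \\
&= \sum_{p=1}^n A\bigl(\operatorname{insert}(\alpha, \bar e^p)(\lambda), \operatorname{insert}(\beta, \bar e_p)(v)\bigr),
\end{align*}

where line (29.7.1) follows from Theorem 23.9.7 line (23.9.2) and Theorem 22.9.11 line (22.9.4), line (29.7.2)
follows from the multilinearity of $A$, and line (29.7.3) follows from Theorem 22.9.8, from which the assertion
follows.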
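
The basis-independence just proved can be illustrated on a small case. This is a minimal sketch under stated assumptions: the tensor `A` of type (1,2) on a 2-dimensional space, the two bases with their hard-coded dual bases, and the helper names `pair`, `insert` and `contract` are all hypothetical choices made for the illustration. Covectors and vectors are represented by component tuples, and `insert` uses 1-based positions in the same way as the contraction definition.

```python
def pair(covector, vector):
    """Value of a covector on a vector, both given by component tuples."""
    return sum(c * x for c, x in zip(covector, vector))

def insert(position, item, seq):
    """Insert item so that it occupies the given 1-based position."""
    return seq[:position - 1] + (item,) + seq[position - 1:]

def contract(A, alpha, beta, basis, dual_basis):
    """Contraction of A over indices alpha and beta using the given basis."""
    def C(lams, vecs):
        return sum(A(insert(alpha, e_up, lams), insert(beta, e_dn, vecs))
                   for e_dn, e_up in zip(basis, dual_basis))
    return C

def A(lams, vecs):
    """A hypothetical tensor of type (1,2) on a 2-dimensional space."""
    (lam,), (v, w) = lams, vecs
    return pair(lam, v) * (w[0] + 2 * w[1]) + 3 * pair(lam, w) * v[0]

# Two bases with their dual bases (hard-coded illustrative values).
basis_1, dual_1 = [(1, 0), (0, 1)], [(1, 0), (0, 1)]
basis_2, dual_2 = [(1, 1), (1, 2)], [(2, -1), (-1, 1)]

# Sanity check that dual_2 really is dual to basis_2.
duality_ok = all(pair(dual_2[p], basis_2[q]) == (1 if p == q else 0)
                 for p in range(2) for q in range(2))

w = (1, 1)
value_1 = contract(A, 1, 1, basis_1, dual_1)((), (w,))
value_2 = contract(A, 1, 1, basis_2, dual_2)((), (w,))
```

The contracted tensor has type (0,1), so it takes an empty tuple of covectors and a single vector. The two values coincide even though the individual summands differ between the two bases.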


29.7.5 REMARK: Juxtaposition products for general tensors.

A simple kind of “juxtaposition product” of multilinear functions multiplied by vectors is described in 
Remark 27.2.31 and shown to be a valid multilinear map in Theorem 27.2.32. Such products are very 
often seen in differential geometry and are almost ubiquitous in tensor calculus. Juxtaposition products are 
typically combined with tensor contraction operations to construct new tensors. 


The basic principle of the juxtaposition product appears in Definition 23.4.10 for the product of a linear 
functional by a vector. In more complicated cases, the juxtaposition product involves general tensor fields 
or multilinear map fields, often involving symmetries and antisymmetries. 


Since the tensors A and B in Definition 29.7.6 have values in the field K, the product of their values is well 
defined in K. The multilinearity of the product C follows from Notation 27.2.17 and Definition 27.2.3, which 
only require linearity with respect to each component individually, while holding the other components fixed. 
When one of the components of A is being varied, multiplication by B is effectively multiplication by a scalar 
constant, and vice versa when A is fixed while a component of B is varied. 


Unsurprisingly, given the name, the notation chosen here for the juxtaposition product of tensors A and B
in Notation 29.7.7 is a simple juxtaposition “AB” of the letters A and B. 


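29.7.6 DEFINITION: Juxtaposition product, multilinear-function-style mixed tensor, single primal space.
The juxtaposition product of tensors $A \in \bigotimes^{r_1,s_1} V$ and $B \in \bigotimes^{r_2,s_2} V$, for a linear space $V$ over $K$ and
$r_1, s_1, r_2, s_2 \in \mathbb{Z}^+_0$, is the tensor $C \in \bigotimes^{r_1+r_2,s_1+s_2} V = \mathscr{L}((V^*)^{r_1+r_2}, V^{s_1+s_2}; K)$ defined by

\begin{align*}
&\forall \lambda = (\lambda_i)_{i=1}^{r_1+r_2} \in (V^*)^{r_1+r_2},\ \forall v = (v_j)_{j=1}^{s_1+s_2} \in V^{s_1+s_2}, \\
C(\lambda, v)
&= A\bigl(\operatorname{subseq}_{1,r_1}(\lambda), \operatorname{subseq}_{1,s_1}(v)\bigr)\, B\bigl(\operatorname{subseq}_{r_1+1,r_1+r_2}(\lambda), \operatorname{subseq}_{s_1+1,s_1+s_2}(v)\bigr) \\
&= A(\lambda_1, \dots \lambda_{r_1}, v_1, \dots v_{s_1})\, B(\lambda_{r_1+1}, \dots \lambda_{r_1+r_2}, v_{s_1+1}, \dots v_{s_1+s_2}).
\end{align*}

(See Definition 14.12.6 (ix) for the “subseq” list subsequence operator.)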


29.7.7 NOTATION: AB, for multilinear-function-style tensors A and B over a single primal space, denotes 
the juxtaposition product of A and B. 
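
The juxtaposition product can be sketched in a few lines of code. The representation below is an assumption made only for illustration: a tensor is a Python function of a tuple of covectors and a tuple of vectors, and the names `juxtapose`, `pair`, `A` and `B` are hypothetical. The tuple slices play the role of the “subseq” operator.

```python
def pair(covector, vector):
    """Value of a covector on a vector, both given by component tuples."""
    return sum(c * x for c, x in zip(covector, vector))

def juxtapose(A, r1, s1, B):
    """Juxtaposition product C = AB of multilinear-function-style tensors."""
    def C(lams, vecs):
        # A receives the first r1 covectors and first s1 vectors,
        # B receives the remaining arguments.
        return A(lams[:r1], vecs[:s1]) * B(lams[r1:], vecs[s1:])
    return C

def A(lams, vecs):
    """A (1,0) tensor: evaluates its covector argument on the vector (1, 2)."""
    (lam,) = lams
    return pair(lam, (1, 2))

def B(lams, vecs):
    """A (0,1) tensor: evaluates the covector (3, 4) on its vector argument."""
    (v,) = vecs
    return pair((3, 4), v)

C = juxtapose(A, 1, 0, B)            # a tensor of type (1, 1)
value = C(((1, 1),), ((1, 0),))
doubled = C(((2, 2),), ((1, 0),))    # covector argument scaled by 2
```

Scaling the covector argument by 2 doubles the value, as expected from linearity in each component while the others are held fixed.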


((2019-7-18. Define juxtaposition products and contractions for linear-map-style tensors. )) 


((2019-7-18. Combine juxtaposition products and contractions. In the case of a (1,0) and (0,1) tensor, the 
contraction should be equal to the simple application of the latter to the former, like a linear functional 
acting on a vector. )) 


((2019-7-18. Tensor contractions and juxtaposition products are the bread and butter of tensor calculus. (All
summations over contravariant/covariant index pairs are contractions.) So the basic definitions for contractions
and juxtaposition products in terms of tensor components should be given here. This should be easy. ))


((2019-7-18. Define the “interior product”. See Sternberg [38], pages 56-58; Frankel [12], pages 89-92; Bishop/ 
Goldberg [3], pages 170-173. This seems to be only defined for antisymmetric covariant tensors, but it may 
be possible to extend it to more general kinds of tensors. )) 
Chapter 30


MULTILINEAR MAP AND TENSOR SYMMETRIES


30.1 Antisymmetric and symmetric multilinear maps
30.2 Multilinear map space canonical basis
30.3 Antisymmetric multilinear map space canonical basis
30.4 Antisymmetric tensor spaces
30.5 Symmetric tensor spaces
30.6 Tensor bundles on Cartesian spaces


30.0.1 REMARK: Applications of tensors typically require symmetry or antisymmetry constraints. 
Applications for symmetric tensors and multilinear maps include elastic stress and strain tensors, Taylor 
series terms, directional derivatives (see Example 27.3.8), Riemannian metric tensors, and the relativistic 
energy-stress tensor. 


Applications for antisymmetric tensors and multilinear maps include line, area and volume elements, for 
integration, and the relativistic electromagnetic field tensor. Exterior algebra and antisymmetric forms are 
almost ubiquitous in differential geometry. Often these forms are vector-valued, which is the prime focus 
of Sections 30.1, 30.2 and 30.3. In the gauge theory context, the space of vector values is very often a Lie 
algebra. Antisymmetric forms are so predominant relative to symmetric forms that it is almost unnecessary
to say anything about symmetric forms. (Hence the brevity of Section 30.5.) 


Multiplication operations for symmetric tensors and multilinear maps seem to have limited applications. 
By contrast, the multiplication operations for antisymmetric tensors and multilinear functions are applied 
extensively in the theory of integration on manifolds. Even antisymmetric vector-valued multilinear maps 
have extensive applications when a vector multiplication operation is available from a Lie algebra structure 
on the linear space. 


In applications, higher-degree tensors typically have one or more symmetry or antisymmetry properties. Thus 
tensors of degree 2, either doubly covariant or doubly contravariant, are predominantly either fully symmetric 
or fully antisymmetric. The Riemann curvature tensor, in various forms and under various assumptions on 
the connection, usually has some mixture of symmetry and antisymmetry properties. 


In the case of tensors and multilinear maps of degree 3 or higher, there may be partial symmetries, meaning 
symmetries over some proper subset of the set of component linear spaces. However, only fully symmetric and 
fully antisymmetric tensors and multilinear maps are considered here. Partial symmetries and antisymmetries 
must often be dealt with on a case-by-case basis because the component permutations may be, for example, 
only rotations, not simple transpositions. More generally, symmetries may be defined with respect to any 
subgroup of the full group of component permutations. One advantage of the full permutation group is that 
it is generated by the set of binary transpositions or “swaps” of components. 


30.1. Antisymmetric and symmetric multilinear maps 


30.1.1 REMARK: Linear-space-component permutation-symmetries require unmixed linear spaces. 

In Definitions 30.1.2 and 30.1.3, symmetry or antisymmetry under linear-space-component swaps implies
that the linear space components must be the same. (See Definition 14.12.6 (vi) for the list item-swap
operator.) So general mixed multilinear spaces of the form $\mathscr{L}((V_\alpha)_{\alpha\in A}; U)$ must be abandoned here in
favour of unmixed spaces $\mathscr{L}_m(V; U)$ using multiple copies of a single linear space $V$. (See Notation 27.2.19
for $\mathscr{L}_m(V; U)$.)


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg.html
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3]


30.1.2 DEFINITION: A symmetric multilinear map from a Cartesian product $V^m$ of a linear space $V$ over a
field $K$, for $m \in \mathbb{Z}^+_0$, to a linear space $U$ over $K$ is a multilinear map $f \in \mathscr{L}_m(V; U)$ such that $f(\operatorname{swap}_{j,k}(v)) =
f(v)$ for all $v = (v_i)_{i=1}^m \in V^m$ and $j, k \in \mathbb{N}_m$.


30.1.3 DEFINITION: An antisymmetric multilinear map from a Cartesian product $V^m$ of a linear space
$V$ over a field $K$, for $m \in \mathbb{Z}^+_0$, to a linear space $U$ over $K$ is a multilinear map $f \in \mathscr{L}_m(V; U)$ such that
$f(\operatorname{swap}_{j,k}(v)) = -f(v)$ for all $v = (v_i)_{i=1}^m \in V^m$ and $j, k \in \mathbb{N}_m$ with $j \ne k$.


An alternating multilinear map means an antisymmetric multilinear map.


An (alternating) m-form (map) for $m \in \mathbb{Z}^+_0$ means an antisymmetric multilinear map of degree $m$.


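30.1.4 NOTATION: $\mathscr{L}^+_m(V; U)$ for $m \in \mathbb{Z}^+_0$ and linear spaces $U$ and $V$ denotes the set of symmetric
multilinear maps from $V^m$ to $U$. Thus

\[ \mathscr{L}^+_m(V; U) = \{ f \in \mathscr{L}_m(V; U);\ \forall j, k \in \mathbb{N}_m,\ f \circ \operatorname{swap}_{j,k} = f \}. \]


30.1.5 NOTATION: $\mathscr{L}^-_m(V; U)$ for $m \in \mathbb{Z}^+_0$ and linear spaces $U$ and $V$ denotes the set of antisymmetric
multilinear maps from $V^m$ to $U$. Thus

\[ \mathscr{L}^-_m(V; U) = \{ f \in \mathscr{L}_m(V; U);\ \forall j, k \in \mathbb{N}_m,\ j \ne k \Rightarrow f \circ \operatorname{swap}_{j,k} = -f \}. \]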
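
The swap conditions in the definitions above can be tested on sample arguments. The following sketch uses two standard bilinear functions on the plane, the dot product and the 2×2 determinant, as hypothetical examples. The helper names are illustrative, and a check on finitely many samples is of course evidence only, not a proof of symmetry or antisymmetry.

```python
from itertools import combinations

def swap(j, k, v):
    """Exchange the items at 1-based positions j and k of the tuple v."""
    w = list(v)
    w[j - 1], w[k - 1] = w[k - 1], w[j - 1]
    return tuple(w)

def has_swap_sign(f, sign, samples, m):
    """Check f(swap_{j,k}(v)) == sign * f(v) for all j != k on the samples."""
    return all(f(swap(j, k, v)) == sign * f(v)
               for v in samples
               for j, k in combinations(range(1, m + 1), 2))

def dot(v):
    """Symmetric bilinear function on pairs of vectors in the plane."""
    (a, b), (c, d) = v
    return a * c + b * d

def det2(v):
    """Antisymmetric bilinear function on pairs of vectors in the plane."""
    (a, b), (c, d) = v
    return a * d - b * c

samples = [((1, 2), (3, 4)), ((0, 1), (5, -2)), ((2, 2), (2, 2))]
dot_is_symmetric = has_swap_sign(dot, +1, samples, 2)
det_is_antisymmetric = has_swap_sign(det2, -1, samples, 2)
det_is_symmetric = has_swap_sign(det2, +1, samples, 2)
```

On these samples the dot product passes the symmetric test, while the determinant passes the antisymmetric test and fails the symmetric one.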


30.1.6 REMARK: Symmetric and antisymmetric multilinear functions versus maps. 

The linear space U in Definitions 30.1.2 and 30.1.3 may be substituted with the field K regarded as a 
linear space. (See Definition 22.1.9 for fields regarded as linear spaces.) This special case deserves its own 
definitions and simplified notations. 


30.1.7 DEFINITION: A symmetric multilinear function on a Cartesian product $V^m$ of a linear space $V$
over a field $K$, for $m \in \mathbb{Z}^+_0$, is a multilinear function $f \in \mathscr{L}_m(V; K)$ such that $f(\operatorname{swap}_{j,k}(v)) = f(v)$ for all
$v = (v_i)_{i=1}^m \in V^m$ and $j, k \in \mathbb{N}_m$.


30.1.8 DEFINITION: An antisymmetric multilinear function on a Cartesian product $V^m$ of a linear space
$V$ over a field $K$, for $m \in \mathbb{Z}^+_0$, is a multilinear function $f \in \mathscr{L}_m(V; K)$ such that $f(\operatorname{swap}_{j,k}(v)) = -f(v)$ for
all $v = (v_i)_{i=1}^m \in V^m$ and $j, k \in \mathbb{N}_m$ with $j \ne k$.


An alternating multilinear function means an antisymmetric multilinear function.


An (alternating) m-form (function) for $m \in \mathbb{Z}^+_0$ means an antisymmetric multilinear function of degree $m$.


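30.1.9 NOTATION: $\mathscr{L}^+_m(V)$ for $m \in \mathbb{Z}^+_0$ and a linear space $V$ denotes the set of symmetric multilinear
functions on $V^m$. Thus

\[ \mathscr{L}^+_m(V) = \{ f \in \mathscr{L}_m(V; K);\ \forall j, k \in \mathbb{N}_m,\ f \circ \operatorname{swap}_{j,k} = f \}. \]


30.1.10 NOTATION: $\mathscr{L}^-_m(V)$ for $m \in \mathbb{Z}^+_0$ and a linear space $V$ denotes the set of antisymmetric multilinear
functions on $V^m$. Thus

\[ \mathscr{L}^-_m(V) = \{ f \in \mathscr{L}_m(V; K);\ \forall j, k \in \mathbb{N}_m,\ j \ne k \Rightarrow f \circ \operatorname{swap}_{j,k} = -f \}. \]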


30.1.11 REMARK: General linear-space-component permutations and the parity function. 

Symmetry (or antisymmetry) with respect to binary swaps implies symmetry (or antisymmetry) with respect
to general permutations. The factor $+1$ or $-1$ in the symmetry (or antisymmetry) definitions must then
be replaced by the respective factor $+1$ or $\operatorname{parity}(P)$, where $P \in \operatorname{perm}(\mathbb{N}_m)$ is a permutation of $\mathbb{N}_m$. (See
Definition 14.8.2 and Notation 14.8.3 for the set $\operatorname{perm}(X)$ of permutations of a set $X$. See Definition 14.8.22
and Notation 14.8.23 for the parity function $\operatorname{parity} : \operatorname{perm}(\mathbb{N}_m) \to \{-1, 1\} \subseteq \mathbb{Z}$ for $m \in \mathbb{Z}^+_0$.)
30.1.12 THEOREM: Symmetry under all binary swaps implies symmetry under all permutations.
Let $m \in \mathbb{Z}^+_0$, and let $V$ and $U$ be linear spaces over a field $K$. Then

\[ \forall f \in \mathscr{L}^+_m(V; U),\ \forall P \in \operatorname{perm}(\mathbb{N}_m),\ \forall v \in V^m, \quad f(v \circ P) = f(v). \]

In other words, $f((v_{P(i)})_{i=1}^m) = f((v_i)_{i=1}^m)$ for all permutations $P : \mathbb{N}_m \to \mathbb{N}_m$ and $v = (v_i)_{i=1}^m \in V^m$.


PROOF: The assertion follows from Definition 30.1.2 and Theorem 14.8.16.


30.1.13 THEOREM: Antisymmetry under all binary swaps implies antisymmetry under odd permutations. 
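Let $m \in \mathbb{Z}^+_0$, and let $V$ and $U$ be linear spaces over a field $K$. Then

\[ \forall f \in \mathscr{L}^-_m(V; U),\ \forall P \in \operatorname{perm}(\mathbb{N}_m),\ \forall v \in V^m, \quad f(v \circ P) = \operatorname{parity}(P) f(v). \]

That is, $f((v_{P(i)})_{i=1}^m) = \operatorname{parity}(P) f((v_i)_{i=1}^m)$ for all permutations $P : \mathbb{N}_m \to \mathbb{N}_m$ and $v = (v_i)_{i=1}^m \in V^m$.


PROOF: Let $f \in \mathscr{L}^-_m(V; U)$, $P \in \operatorname{perm}(\mathbb{N}_m)$ and $v \in V^m$. All permutations $P : \mathbb{N}_m \to \mathbb{N}_m$ are either
odd or even (and not both) by Theorem 14.8.20 (i, ii). Suppose that $P$ is even. Then by Definition 14.8.18,
$P = T_\ell \circ T_{\ell-1} \circ \dots \circ T_1$ for some sequence $(T_i)_{i=1}^\ell$ of transpositions of $\mathbb{N}_m$, for some even integer $\ell \in \mathbb{Z}^+_0$.
Thus $v \circ P = v \circ \operatorname{swap}_{j_\ell,k_\ell} \circ \operatorname{swap}_{j_{\ell-1},k_{\ell-1}} \circ \dots \circ \operatorname{swap}_{j_1,k_1}$ for some integer-pairs $(j_i, k_i)_{i=1}^\ell \in (\mathbb{N}_m \times \mathbb{N}_m)^\ell$
with $j_i \ne k_i$ for all $i \in \mathbb{N}_\ell$. So $f(v \circ P) = f(v) = \operatorname{parity}(P) f(v)$ by Definitions 30.1.3 and 14.8.22. Similarly,
if $P$ is odd, then $\ell$ is odd. So $f(v \circ P) = -f(v) = \operatorname{parity}(P) f(v)$. Thus the assertion is verified.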
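
The parity rule can be checked on a familiar antisymmetric 3-form. The sketch below is an illustration only: it uses the 3×3 determinant as the form, a hypothetical sample `v`, 0-based positions instead of the 1-based index sets of the text, and a parity function which counts inversions rather than transpositions (the two agree modulo 2).

```python
from itertools import permutations

def parity(P):
    """Parity (+1 or -1) of a permutation P of (0, ..., m-1), by inversions."""
    inversions = sum(1 for a in range(len(P)) for b in range(a + 1, len(P))
                     if P[a] > P[b])
    return -1 if inversions % 2 else 1

def det3(v):
    """Determinant of three vectors in 3-space: an antisymmetric 3-form."""
    (a, b, c), (d, e, f), (g, h, i) = v
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

v = ((1, 2, 3), (0, 1, 4), (5, 6, 0))

# For each permutation P, compare f(v o P) with parity(P) * f(v).
checks = [det3(tuple(v[P[k]] for k in range(3))) == parity(P) * det3(v)
          for P in permutations(range(3))]
```

All six comparisons hold for this sample, and `det3(v)` is non-zero, so the check is not vacuous.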


30.1.14 REMARK: The linear spaces of symmetric or antisymmetric multilinear maps. 
Theorem 27.6.2 asserts the closure of spaces of multilinear maps under linear combinations, or equivalently
under vector addition and scalar multiplication. So to show that the spaces $\mathscr{L}^+_m(V; U)$ and $\mathscr{L}^-_m(V; U)$ are
similarly closed requires only the verification that the symmetries of multilinear maps are inherited by the
sums and scalar products of multilinear maps. This is shown in Theorems 30.1.15 and 30.1.16.


30.1.15 THEOREM: Closure of symmetric multilinear maps under addition and scalar multiplication. 
The set $\mathscr{L}^+_m(V; U)$ is closed under pointwise addition and scalar multiplication.


PROOF: Let $g, h \in \mathscr{L}^+_m(V; U)$ and let $f = g + h$ be the pointwise sum of $g$ and $h$. Then $f \in \mathscr{L}_m(V; U)$
by Theorem 27.6.2. For permutations $P : \mathbb{N}_m \to \mathbb{N}_m$ and vector sequences $(v_i)_{i=1}^m \in V^m$,

\begin{align*}
f((v_{P(i)})_{i=1}^m) &= g((v_{P(i)})_{i=1}^m) + h((v_{P(i)})_{i=1}^m) \\
&= g((v_i)_{i=1}^m) + h((v_i)_{i=1}^m) \\
&= f((v_i)_{i=1}^m).
\end{align*}

The inheritance of symmetry by scalar products of multilinear maps follows similarly.


30.1.16 THEOREM: Closure of antisymmetric multilinear maps under addition and scalar multiplication. 
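The set $\mathscr{L}^-_m(V; U)$ is closed under pointwise addition and scalar multiplication.


PROOF: Let $g, h \in \mathscr{L}^-_m(V; U)$ and let $f = g + h$ be the pointwise sum of $g$ and $h$. Then $f \in \mathscr{L}_m(V; U)$ by
Theorem 27.6.2. For permutations $P : \mathbb{N}_m \to \mathbb{N}_m$ and vector sequences $(v_i)_{i=1}^m \in V^m$,

\begin{align*}
f((v_{P(i)})_{i=1}^m) &= g((v_{P(i)})_{i=1}^m) + h((v_{P(i)})_{i=1}^m) \\
&= \operatorname{parity}(P)\, g((v_i)_{i=1}^m) + \operatorname{parity}(P)\, h((v_i)_{i=1}^m) \\
&= \operatorname{parity}(P)\, f((v_i)_{i=1}^m).
\end{align*}

Therefore $f$ is antisymmetric. So $f \in \mathscr{L}^-_m(V; U)$. The inheritance of antisymmetry by scalar products of
multilinear maps follows similarly.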


30.1.17 REMARK: Duals of constrained linear spaces are not subspaces of the unconstrained duals. 

The symmetric and antisymmetric multilinear map spaces and multilinear function spaces are linear subspaces
of the corresponding unconstrained linear spaces. Thus for linear spaces $V$ and $U$, and $m \in \mathbb{Z}^+_0$,
$\mathscr{L}^+_m(V; U) \subseteq \mathscr{L}_m(V; U)$, $\mathscr{L}^-_m(V; U) \subseteq \mathscr{L}_m(V; U)$, $\mathscr{L}^+_m(V) \subseteq \mathscr{L}_m(V)$ and $\mathscr{L}^-_m(V) \subseteq \mathscr{L}_m(V)$. This is quite
obvious, but the corresponding statements for the duals of these spaces are not valid. (See Remark 23.6.11
for discussion of this issue.)
30.2. Multilinear map space canonical basis


30.2.1 REMARK: Coordinatisation of multilinear map spaces by means of linear space bases. 
Coordinatisation of multilinear map spaces is required for the construction of charts for the differentiable
manifold structure of multilinear map bundles. Such coordinatisation is straightforward in the case of 
general multilinear maps, but becomes less obvious in the case of multilinear maps which are constrained 
by symmetries or antisymmetries, where the “natural choice” of coordinates is non-unique and therefore 
requires somewhat more thought. 


Theorem 30.2.2, with U = K, is a simple application of Theorem 29.6.2 for mixed tensor spaces to the 
special case of unmixed field-valued covariant tensor spaces. (To see this, put r = 0 in Theorem 29.6.2 and 
then substitute r for s.) However, Theorem 30.2.2 is stated for general vector-valued multilinear maps. So 
it is derived from Theorem 27.5.4. 


Theorem 30.2.2 is a useful starting point for coordinatising spaces of multilinear maps, but the component
arrays $(a_i)_{i\in\mathbb{N}_n^r} = (A(e_i))_{i\in\mathbb{N}_n^r}$, where $e_i$ denotes the vector tuple $(e_{i_k})_{k=1}^r$ for $i \in \mathbb{N}_n^r$, have vector values.
The space $U$ could be infinite-dimensional, or might even have no basis. This does not affect the validity
of the construction in the theorem. However, the construction of a basis for $\mathscr{L}_r(V; U)$ does not follow
automatically from Theorem 30.2.2 in general. When $U$ is finite-dimensional, the construction of a basis is
straightforward, as shown in Definition 30.2.7 and Theorems 30.2.5 and 30.2.9.


Theorem 30.2.2 is applied in Theorem 30.3.2 to prove a similar formula for the coordinatisation of spaces of
antisymmetric multilinear maps.


30.2.2 THEOREM:  Vector-valued components for multilinear map spaces. 
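Let $V$ be a linear space over a field $K$, with $n = \dim(V) \in \mathbb{Z}^+_0$. Let $B = (e_i)_{i=1}^n$ be a basis for $V$. Let $U$ be
a linear space over $K$. Then

\begin{align*}
&\forall r \in \mathbb{Z}^+_0,\ \forall A \in \mathscr{L}_r(V; U),\ \forall (v_k)_{k=1}^r \in V^r, \\
A((v_k)_{k=1}^r) &= \sum_{i\in\mathbb{N}_n^r} A((e_{i_k})_{k=1}^r) \prod_{k=1}^r \kappa_B(v_k)_{i_k} \\
&= \sum_{i\in\mathbb{N}_n^r} a_i \prod_{k=1}^r v_k^{i_k}, \tag{30.2.1}
\end{align*}

where $v_k^{i_k} = \kappa_B(v_k)_{i_k}$ denotes the $i_k$-th component of $v_k$ with respect to $B$ as in Definition 22.8.8, and
$a_i = A((e_{i_k})_{k=1}^r)$ for all $i \in \mathbb{N}_n^r$. In other words,

\[ \forall r \in \mathbb{Z}^+_0,\ \forall A \in \mathscr{L}_r(V; U),\ \forall v \in V^r, \quad A(v) = \sum_{i\in\mathbb{N}_n^r} a_i v^i, \]

where $v^i = \prod_{k=1}^r v_k^{i_k} = \prod_{k=1}^r \kappa_B(v_k)_{i_k}$ for all $v = (v_k)_{k=1}^r$ and $i \in \mathbb{N}_n^r$.


PROOF: The assertion follows from Theorem 27.5.4 line (27.5.1).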


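30.2.3 EXAMPLE: Let $n = 3$ and $r = 2$ in Theorem 30.2.2. Then line (30.2.1) may be expressed as

\begin{align*}
&\forall A \in \mathscr{L}_2(V; U),\ \forall (v_1, v_2) \in V^2, \\
A(v_1, v_2) &= A(e_1, e_1) v_1^1 v_2^1 + A(e_1, e_2) v_1^1 v_2^2 + A(e_1, e_3) v_1^1 v_2^3 \\
&\quad + A(e_2, e_1) v_1^2 v_2^1 + A(e_2, e_2) v_1^2 v_2^2 + A(e_2, e_3) v_1^2 v_2^3 \\
&\quad + A(e_3, e_1) v_1^3 v_2^1 + A(e_3, e_2) v_1^3 v_2^2 + A(e_3, e_3) v_1^3 v_2^3 \\
&= \sum_{i=1}^3 \sum_{j=1}^3 A(e_i, e_j) v_1^i v_2^j.
\end{align*}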

30.2.4 REMARK:  Basis-and-coefficients expression for multilinear maps. 

Theorem 30.2.5 utilises a basis for the finite-dimensional space $U$ to produce an expression for vector-valued
multilinear maps in $\mathscr{L}_r(V; U)$ as linear combinations of a finite fixed set of multilinear maps $v \mapsto e^U_j v^i$. This
is a strong hint to define a “canonical basis” for $\mathscr{L}_r(V; U)$ as in Definition 30.2.7, which is asserted to be a
valid basis in Theorem 30.2.9.
30.2.5 THEOREM: Scalar-valued components for multilinear map spaces.
Let $V$ and $U$ be finite-dimensional linear spaces over a field $K$, with basis families $B_V = (e^V_i)_{i=1}^n$ and
$B_U = (e^U_j)_{j=1}^m$ respectively. Let $r \in \mathbb{Z}^+_0$. Then

\[ \forall A \in \mathscr{L}_r(V; U),\ \forall v \in V^r, \quad A(v) = \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_m} a_i^j e^U_j \prod_{k=1}^r \kappa_{B_V}(v_k)_{i_k}, \]

where $\kappa_{B_V}(v_k)_{i_k}$ means the $i_k$-th component of $v_k$ with respect to $B_V$ as in Definition 22.8.8, and

\[ \forall A \in \mathscr{L}_r(V; U),\ \forall i \in \mathbb{N}_n^r,\ \forall j \in \mathbb{N}_m, \quad a_i^j = \kappa_{B_U}\bigl(A((e^V_{i_k})_{k=1}^r)\bigr)_j. \]


PROOF: The assertion follows from Theorem 27.6.17 (i).


30.2.6 REMARK: Multilinear map space canonical basis for unmixed component spaces. 
Theorem 30.2.5 is based on the mixed-space multilinear map space canonical basis in Definition 27.6.16. 
This is specialised to unmixed component spaces in Definition 30.2.7. 


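30.2.7 DEFINITION: The canonical basis for the multilinear map space $\mathscr{L}_r(V; U)$ corresponding to bases
$B_V = (e^V_i)_{i=1}^n$ and $B_U = (e^U_j)_{j=1}^m$, for $r \in \mathbb{Z}^+_0$ and finite-dimensional linear spaces $V$ and $U$ respectively over
a field $K$, is the family $(\phi^i_j)_{i\in\mathbb{N}_n^r,\, j\in\mathbb{N}_m}$, where $\phi^i_j \in \mathscr{L}_r(V; U)$ is defined for $(i, j) \in \mathbb{N}_n^r \times \mathbb{N}_m$ by

\[ \forall v = (v_\alpha)_{\alpha=1}^r \in V^r, \quad \phi^i_j(v) = e^U_j \prod_{\alpha=1}^r \kappa_{B_V}(v_\alpha)_{i_\alpha}, \tag{30.2.2} \]

where $\kappa_{B_V}$ and $\kappa_{B_U}$ are the component maps for $B_V$ and $B_U$ respectively as in Definition 22.8.8.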


30.2.8 REMARK: Coordinatisation of multilinear maps with respect to a canonical basis. 
Theorem 30.2.9 expresses multilinear maps in terms of coordinates with respect to the canonical basis in 
Definition 30.2.7. This is useful for building coordinate charts for multilinear map bundles. 


30.2.9 THEOREM: The multilinear map space canonical basis for unmixed spaces is a basis. 
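Let $(\phi^i_j)_{i\in\mathbb{N}_n^r,\, j\in\mathbb{N}_m}$ be the multilinear map space canonical basis in Definition 30.2.7 for $\mathscr{L}_r(V; U)$ with
$r \in \mathbb{Z}^+_0$, corresponding to bases $B_V = (e^V_i)_{i=1}^n \in V^n$ and $B_U = (e^U_j)_{j=1}^m \in U^m$ for finite-dimensional spaces
$V$ and $U$ respectively over a field $K$.
(i) $(\phi^i_j)_{i\in\mathbb{N}_n^r,\, j\in\mathbb{N}_m}$ is a basis for $\mathscr{L}_r(V; U)$.
(ii) $\forall A \in \mathscr{L}_r(V; U)$, $A = \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_m} \kappa_{B_U}(A(e^V_i))_j\, \phi^i_j$. In other words,
$\forall A \in \mathscr{L}_r(V; U)$, $A = \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_m} a_i^j \phi^i_j$, where $\forall i \in \mathbb{N}_n^r$, $\forall j \in \mathbb{N}_m$, $a_i^j = \kappa_{B_U}(A(e^V_i))_j$.


PROOF: Part (i) follows from Theorem 27.6.17 (iv).
Part (ii) follows from Theorem 27.6.17 (i).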
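
The reconstruction of a multilinear map from its scalar coefficients can be sketched as follows. All concrete choices here are assumptions for the illustration: the dimensions `n = r = m = 2`, the standard bases, the sample bilinear map `A`, 0-based index tuples, and the helper names `phi`, `coeff` and `reconstructed`.

```python
from itertools import product

n, r, m = 2, 2, 2   # dim V, degree, dim U (illustrative values)

def phi(i, j):
    """Canonical basis map: v -> e^U_j times the product of v_k components i_k."""
    def f(v):
        scalar = 1
        for k in range(r):
            scalar *= v[k][i[k]]
        # The vector scalar * e^U_j with respect to the standard basis of U.
        return tuple(scalar if jj == j else 0 for jj in range(m))
    return f

def A(v):
    """A hypothetical bilinear map from pairs of 2-vectors to 2-vectors."""
    (a, b), (c, d) = v
    return (a * d, a * c + 2 * b * d)

std = [(1, 0), (0, 1)]
index_tuples = list(product(range(n), repeat=r))

# Coefficients a_i^j: the j-th component of A(e_{i_1}, ..., e_{i_r}).
coeff = {(i, j): A(tuple(std[ik] for ik in i))[j]
         for i in index_tuples for j in range(m)}

def reconstructed(v):
    """Sum over i and j of a_i^j * phi^i_j(v)."""
    total = [0] * m
    for (i, j), a in coeff.items():
        total = [t + a * x for t, x in zip(total, phi(i, j)(v))]
    return tuple(total)

v = ((1, 2), (3, 4))
```

There are $n^r m = 8$ coefficients in this example, one for each canonical basis map, and `reconstructed(v)` agrees with `A(v)` on the sample argument.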


30.2.10 REMARK: Canonical basis for a multilinear function space. 

Definition 30.2.7 is further specialised from multilinear maps to multilinear functions in Definition 30.2.11. 
(See Definition 27.6.9 for a mixed-space version of Definition 30.2.11. See Theorem 27.6.11 for a mixed-space 
version of Theorem 30.2.12.) 


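30.2.11 DEFINITION: The canonical basis for the multilinear function space $\mathscr{L}_r(V; K)$ corresponding to a
basis $B = (e_i)_{i=1}^n$, for $r \in \mathbb{Z}^+_0$ and a finite-dimensional linear space $V$ over a field $K$, is the family $(\phi^i)_{i\in\mathbb{N}_n^r}$,
where $\phi^i \in \mathscr{L}_r(V; K)$ is defined for $i \in \mathbb{N}_n^r$ by

\[ \forall i \in \mathbb{N}_n^r,\ \forall v = (v_\alpha)_{\alpha=1}^r \in V^r, \quad \phi^i(v) = \prod_{\alpha=1}^r \kappa_B(v_\alpha)_{i_\alpha}, \]

where $\kappa_B$ is the component map for $B$ as in Definition 22.8.8.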
30.2.12 THEOREM: The multilinear function space canonical basis for unmixed spaces is a basis.
Let $(\phi^i)_{i\in\mathbb{N}_n^r}$ be the multilinear function space canonical basis in Definition 30.2.11 for $\mathscr{L}_r(V; K)$ with $r \in \mathbb{Z}^+_0$,
corresponding to a basis $B = (e_i)_{i=1}^n \in V^n$ for a finite-dimensional space $V$ over a field $K$.
(i) $(\phi^i)_{i\in\mathbb{N}_n^r}$ is a basis for $\mathscr{L}_r(V; K)$.
(ii) $\forall A \in \mathscr{L}_r(V; K)$, $A = \sum_{i\in\mathbb{N}_n^r} A(e_i)\, \phi^i$. In other words,
$\forall A \in \mathscr{L}_r(V; K)$, $A = \sum_{i\in\mathbb{N}_n^r} a_i \phi^i$, where $\forall i \in \mathbb{N}_n^r$, $a_i = A(e_i)$.


PROOF: Part (i) follows from Theorem 30.2.9 (i) with $U = K$.
Part (ii) follows from Theorem 30.2.9 (ii) with $U = K$.


30.3. Antisymmetric multilinear map space canonical basis 


30.3.1 REMARK: Bases and coordinate maps for antisymmetric multilinear map spaces. 

Apart from facilitating some kinds of practical computations with multilinear maps, coordinate arrays are also 
required for multilinear map bundle atlases as in Notations 56.4.13 and 56.4.18. In the case of antisymmetric 
multilinear maps, there is some extra difficulty when choosing the coordinatisation and in converting a 
coordinate representation back to abstract multilinear maps. (See Notations 56.5.20 and 56.5.25.) For a 
differentiable manifold atlas, the number of coordinates must match the dimension of the linear space of 
multilinear maps. So the redundancy in the coordinatisation must be removed. (In other words, a basis for 
the space must be chosen.) Then this redundant information must be restored when reconstructing abstract 
antisymmetric multilinear maps from coordinates. 


Theorem 30.3.2 uses an antisymmetry rule to reduce the number of terms in the expression for general
multilinear maps in Theorem 30.2.2. The purpose of this is to determine bases for spaces of antisymmetric
multilinear maps. Line (30.3.1) groups the tuples $i$ in $\mathbb{N}_n^r$, which are selections of $r$ elements of $\mathbb{N}_n$, into
rearrangement classes $p(\ell)$ for $\ell \in \operatorname{Inc}(\mathbb{N}_r, \mathbb{N}_n)$, which form a partition of the injective tuples in $\operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$.
In other words, line (30.3.1) applies the disjoint union expression $\operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n) = \bigcup_{\ell\in I^n_r} p(\ell)$. The non-injective
tuples $i \in \mathbb{N}_n^r \setminus \operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ are ignored because $A((e_{i_k})_{k=1}^r) = 0 \in U$ for such tuples.


Theorem 30.3.2 allows the “output” space $U$ to be any linear space. So the Cartesian-style coordinatisation
of maps $A$ can be applied only to the finite-dimensional “input” space $V$. Therefore the resulting coefficients
$A((e_{\ell_k})_{k=1}^r)$ are computed only for a finite number of input tuples $(e_{\ell_k})_{k=1}^r$, but the output cannot be finitely
coordinatised in general. Theorem 30.3.5 does require the output space to be finite-dimensional. Hence the
coefficients $A((e^V_{\ell_k})_{k=1}^r)$ can be fully coordinatised by finitely many coordinates.
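
The partition of the injective tuples into rearrangement classes can be enumerated directly for small cases. In this sketch the values `n = 4` and `r = 2` and the helper name `rearrangements` are illustrative assumptions. Increasing tuples are produced by `itertools.combinations`, which yields each selection in increasing order.

```python
from itertools import combinations, permutations, product

n, r = 4, 2   # illustrative values

all_tuples = set(product(range(1, n + 1), repeat=r))
injective = {i for i in all_tuples if len(set(i)) == r}
increasing = list(combinations(range(1, n + 1), r))

def rearrangements(l):
    """Rearrangement class of l: all tuples obtained by permuting its items."""
    return set(permutations(l))

classes = [rearrangements(l) for l in increasing]
union = set().union(*classes)
total_size = sum(len(c) for c in classes)

# The classes cover the injective tuples, and they are pairwise disjoint
# exactly when the sizes add up to the size of the union.
covers = (union == injective)
disjoint = (total_size == len(union))
```

For these values there are 6 increasing tuples, each class contains $r! = 2$ tuples, and together the classes account for all 12 injective tuples.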


30.3.2 THEOREM:  Vector-valued components for antisymmetric multilinear map spaces. 
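Let $V$ be a linear space over a field $K$, with $n = \dim(V) \in \mathbb{Z}^+_0$. Let $B_V = (e_i)_{i=1}^n$ be a basis for $V$. Let $U$
be a linear space over $K$. Then

\begin{align*}
&\forall r \in \mathbb{Z}^+_0,\ \forall A \in \mathscr{L}^-_r(V; U),\ \forall (v_k)_{k=1}^r \in V^r, \\
A((v_k)_{k=1}^r) &= \sum_{\ell\in I^n_r} \sum_{i\in p(\ell)} A((e_{i_k})_{k=1}^r) \prod_{k=1}^r v_k^{i_k} \tag{30.3.1} \\
&= \sum_{\ell\in I^n_r} A((e_{\ell_k})_{k=1}^r) \sum_{P\in\operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{k=1}^r v_k^{\ell_{P(k)}}, \tag{30.3.2}
\end{align*}

where $v_k^{i_k} = \kappa_{B_V}(v_k)_{i_k}$ is the $i_k$-th component of $v_k$ with respect to $B_V$, and $p(\ell) = \{\ell \circ P;\ P \in \operatorname{perm}(\mathbb{N}_r)\}$
for all $\ell \in I^n_r$. (See Notation 14.10.3 for $I^n_r = \operatorname{Inc}(\mathbb{N}_r, \mathbb{N}_n)$. See Notation 14.8.23 for $\operatorname{parity}(P)$.)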

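PROOF: Let $i \in \mathbb{N}_n^r \setminus \operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Then $i_{k_1} = i_{k_2}$ for some $k_1, k_2 \in \mathbb{N}_r$ with $k_1 \ne k_2$. So $\operatorname{swap}_{k_1,k_2}(e_i) = e_i$
and then $A(e_i) = A(\operatorname{swap}_{k_1,k_2}(e_i)) = -A(e_i)$ by Definition 30.1.3. Therefore $A(e_i) = 0$. Thus $A(e_i) = 0$ for
all $i \in \mathbb{N}_n^r \setminus \operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. So

\begin{align*}
\sum_{i\in\mathbb{N}_n^r} A((e_{i_k})_{k=1}^r) \prod_{k=1}^r v_k^{i_k}
&= \sum_{i\in\operatorname{Inj}(\mathbb{N}_r,\mathbb{N}_n)} A((e_{i_k})_{k=1}^r) \prod_{k=1}^r v_k^{i_k} \\
&= \sum_{\ell\in I^n_r} \sum_{i\in p(\ell)} A((e_{i_k})_{k=1}^r) \prod_{k=1}^r v_k^{i_k}
\end{align*}

because $\{p(\ell);\ \ell \in I^n_r\}$ is a partition of $\operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ by Theorem 14.11.7 (ix). Then line (30.3.1) follows from
Theorem 30.2.2.
For line (30.3.2), let $\ell \in I^n_r$. Then $i \in p(\ell)$ if and only if $i = \ell \circ P$ for one and only one $P \in \operatorname{perm}(\mathbb{N}_r)$. (The
uniqueness follows from the injectivity of $\ell$.) By Theorem 14.11.7 (vii), there is one and only one element of
$p(\ell)$ which is an element of $I^n_r$, and this element is $\ell$. (In other words, $p(\ell) \cap I^n_r = \{\ell\}$.) Therefore

\begin{align*}
&\forall r \in \mathbb{Z}^+_0,\ \forall A \in \mathscr{L}^-_r(V; U),\ \forall (v_k)_{k=1}^r \in V^r,\ \forall \ell \in I^n_r, \\
\sum_{i\in p(\ell)} A((e_{i_k})_{k=1}^r) \prod_{k=1}^r v_k^{i_k}
&= \sum_{P\in\operatorname{perm}(\mathbb{N}_r)} A((e_{\ell_{P(k)}})_{k=1}^r) \prod_{k=1}^r v_k^{\ell_{P(k)}} \\
&= \sum_{P\in\operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P)\, A((e_{\ell_k})_{k=1}^r) \prod_{k=1}^r v_k^{\ell_{P(k)}}
\end{align*}

by Theorem 30.1.13. This implies line (30.3.2).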


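30.3.3 EXAMPLE: Let $n = 3$ and $r = 2$ in Theorem 30.3.2. Then lines (30.2.1), (30.3.1) and (30.3.2) may
be expressed as lines (30.3.3), (30.3.4) and (30.3.5) respectively. (See also Example 30.2.3.)

\begin{align*}
&\forall A \in \mathscr{L}^-_2(V; U),\ \forall (v_1, v_2) \in V^2, \\
A(v_1, v_2) &= A(e_1, e_1) v_1^1 v_2^1 + A(e_1, e_2) v_1^1 v_2^2 + A(e_1, e_3) v_1^1 v_2^3 \tag{30.3.3} \\
&\quad + A(e_2, e_1) v_1^2 v_2^1 + A(e_2, e_2) v_1^2 v_2^2 + A(e_2, e_3) v_1^2 v_2^3 \\
&\quad + A(e_3, e_1) v_1^3 v_2^1 + A(e_3, e_2) v_1^3 v_2^2 + A(e_3, e_3) v_1^3 v_2^3 \\
&= A(e_1, e_2) v_1^1 v_2^2 + A(e_1, e_3) v_1^1 v_2^3 \tag{30.3.4} \\
&\quad + A(e_2, e_1) v_1^2 v_2^1 + A(e_2, e_3) v_1^2 v_2^3 \\
&\quad + A(e_3, e_1) v_1^3 v_2^1 + A(e_3, e_2) v_1^3 v_2^2 \\
&= A(e_1, e_2)(v_1^1 v_2^2 - v_1^2 v_2^1) + A(e_1, e_3)(v_1^1 v_2^3 - v_1^3 v_2^1) + A(e_2, e_3)(v_1^2 v_2^3 - v_1^3 v_2^2). \tag{30.3.5}
\end{align*}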
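
The three-term form in line (30.3.5) can be verified numerically with an antisymmetric vector-valued bilinear map. As a sketch only, the cross product on 3-component tuples is used below as a hypothetical example of such a map, with the standard basis, 0-based indices, and an inversion-counting parity function. The function name `expand_antisymmetric` is illustrative.

```python
from itertools import combinations, permutations
from math import prod

def cross(v, w):
    """Cross product: an antisymmetric bilinear map on pairs of 3-vectors."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def parity(P):
    """Parity (+1 or -1) of a permutation P of (0, ..., r-1), by inversions."""
    inversions = sum(1 for a in range(len(P)) for b in range(a + 1, len(P))
                     if P[a] > P[b])
    return -1 if inversions % 2 else 1

def expand_antisymmetric(A, vs, n):
    """Sum over increasing tuples only, as in line (30.3.2), standard basis."""
    r = len(vs)
    basis = [tuple(1 if a == b else 0 for b in range(n)) for a in range(n)]
    total = None
    for l in combinations(range(n), r):          # increasing index tuples
        # Signed sum of products of components over all permutations P.
        factor = sum(parity(P) * prod(vs[k][l[P[k]]] for k in range(r))
                     for P in permutations(range(r)))
        term = tuple(factor * x for x in A(*(basis[lk] for lk in l)))
        total = term if total is None else tuple(s + t
                                                 for s, t in zip(total, term))
    return total

v1, v2 = (1, 2, 3), (4, 5, 6)
direct = cross(v1, v2)
expanded = expand_antisymmetric(cross, (v1, v2), 3)
```

Only the three coefficients `A(e_1, e_2)`, `A(e_1, e_3)` and `A(e_2, e_3)` are evaluated, instead of the nine coefficients needed for a general bilinear map.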


30.3.4 REMARK:  Basis-and-coefficients expression for antisymmetric multilinear maps. 

Theorem 30.3.5 is the antisymmetric multilinear map version of Theorem 30.2.5, and Definition 30.3.6 is the 
antisymmetric multilinear map version of Definition 30.2.7. As mentioned in Remark 30.3.1, Theorem 30.3.5 
differs from Theorem 30.3.2 by requiring the output space U to be finite-dimensional, which makes possible 
the full finite coordinatisation of the space of maps. 


In differential geometry, for example for the gauge potentials in Definition 69.11.3, a typical output space U 
would be the Lie algebra of the structure group of a differentiable fibre bundle. Theorem 30.3.5 is applicable 
to such Lie algebras because they are finite-dimensional. Of even more general importance is the case U = K 
which is the subject of Theorem 30.3.10. 


30.3.5 THEOREM:  Scalar-valued components for antisymmetric multilinear map spaces. 
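Let $V$ and $U$ be finite-dimensional linear spaces over a field $K$, with basis families $B_V = (e^V_i)_{i=1}^n$ and
$B_U = (e^U_j)_{j=1}^m$ respectively. Let $r \in \mathbb{Z}^+_0$. Then

\begin{align*}
&\forall A \in \mathscr{L}^-_r(V; U),\ \forall v \in V^r, \\
&A(v) = \sum_{i\in I^n_r} \sum_{j\in\mathbb{N}_m} a_i^j e^U_j \sum_{P\in\operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{k=1}^r \kappa_{B_V}(v_k)_{i_{P(k)}},
\end{align*}

where $\kappa_{B_V}(v_k)_{i_{P(k)}}$ means the $i_{P(k)}$-th component of $v_k$ with respect to $B_V$ as in Definition 22.8.8, and

\[ \forall A \in \mathscr{L}^-_r(V; U),\ \forall i \in I^n_r,\ \forall j \in \mathbb{N}_m, \quad a_i^j = \kappa_{B_U}\bigl(A((e^V_{i_k})_{k=1}^r)\bigr)_j. \]


PROOF: The assertion follows from Theorem 30.3.2 and Definition 22.8.8.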
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30.3.6 DEFINITION: 

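The canonical basis for the antisymmetric multilinear map space $\mathscr{L}^-_r(V;U)$ corresponding to finite bases
$B_V = (e^V_i)_{i=1}^n$ and $B_U = (e^U_j)_{j=1}^m$, for $r \in \mathbb{Z}^+_0$ and linear spaces $V$ and $U$ respectively over a field $K$, is the
family $(\phi^j_i)_{i \in I^n_r,\, j \in \mathbb{N}_m}$, where $\phi^j_i \in \mathscr{L}^-_r(V;U)$ is defined for $(i, j) \in I^n_r \times \mathbb{N}_m$ by

\[
\forall v = (v_a)_{a=1}^r \in V^r,\quad
\phi^j_i(v) = e^U_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{a=1}^r \kappa_{B_V}(v_a)_{i_{P(a)}},
\tag{30.3.6}
\]

where $\kappa_{B_V}$ and $\kappa_{B_U}$ are the component maps for $B_V$ and $B_U$ respectively as in Definition 22.8.8.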


30.3.7 REMARK: The antisymmetric multilinear map space canonical basis. 

Theorem 30.3.8 is the antisymmetric multilinear map version of the mixed component space Theorem 27.6.17,
but antisymmetry requires the component spaces to be unmixed. The assertion in Definition 30.3.6 that
$\phi^j_i \in \mathscr{L}^-_r(V;U)$ for all $(i, j) \in I^n_r \times \mathbb{N}_m$ is verified (slightly retrospectively) in Theorem 30.3.8 (i). The name
“basis” for Definition 30.3.6 is justified (slightly retrospectively) in Theorem 30.3.8 (v).


30.3.8 THEOREM: The antisymmetric multilinear map space canonical basis is a basis. 
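Let $(\phi^j_i)_{i \in I^n_r,\, j \in \mathbb{N}_m}$ be the antisymmetric multilinear map space canonical basis in Definition 30.3.6 for
$\mathscr{L}^-_r(V;U)$ with $r \in \mathbb{Z}^+_0$, corresponding to bases $B_V = (e^V_i)_{i=1}^n \in V^n$ and $B_U = (e^U_j)_{j=1}^m \in U^m$ for
finite-dimensional linear spaces $V$ and $U$ respectively over the same field $K$.

(i) $\forall i \in I^n_r,\ \forall j \in \mathbb{N}_m,\ \phi^j_i \in \mathscr{L}^-_r(V;U)$.

(ii) $\forall A \in \mathscr{L}^-_r(V;U),\ A = \sum_{i \in I^n_r} \sum_{j \in \mathbb{N}_m} \kappa_{B_U}\bigl(A((e^V_{i_k})_{k=1}^r)\bigr)_j \, \phi^j_i$. In other words,

$\forall A \in \mathscr{L}^-_r(V;U),\ A = \sum_{i \in I^n_r} \sum_{j \in \mathbb{N}_m} a^j_i \phi^j_i$, where $\forall i \in I^n_r,\ \forall j \in \mathbb{N}_m,\ a^j_i = \kappa_{B_U}\bigl(A((e^V_{i_k})_{k=1}^r)\bigr)_j$.

(iii) $\{\phi^j_i;\ i \in I^n_r,\ j \in \mathbb{N}_m\}$ spans $\mathscr{L}^-_r(V;U)$.
(iv) The vectors $\phi^j_i$ for $(i, j) \in I^n_r \times \mathbb{N}_m$ are linearly independent.

(v) $(\phi^j_i)_{i \in I^n_r,\, j \in \mathbb{N}_m}$ is a basis for $\mathscr{L}^-_r(V;U)$.
(vi) $\dim(\mathscr{L}^-_r(V;U)) = \#(I^n_r \times \mathbb{N}_m) = m C^n_r$.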


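PROOF: For part (i), let $i \in I^n_r$ and $j \in \mathbb{N}_m$. Let $P \in \mathrm{perm}(\mathbb{N}_r)$. Let $a \in \mathbb{N}_r$. Then the map $g_{a,P} : V \to K$
defined by $g_{a,P} : v_a \mapsto \kappa_{B_V}(v_a)_{i_{P(a)}}$ is a linear functional in $V^* = \mathrm{Lin}(V, K) = \mathscr{L}_1(V; K)$ by Theorem 23.4.8.
Define $f_{i,P} : V^r \to K$ by $f_{i,P}(v) = \prod_{a=1}^r \kappa_{B_V}(v_a)_{i_{P(a)}} = \prod_{a=1}^r g_{a,P}(v_a)$ for $v \in V^r$. Then $f_{i,P} \in \mathscr{L}_r(V; K)$.
(See Remark 27.4.3 and Definition 27.4.4.) So the map $h_{i,j,P} : v \mapsto e^U_j \,\mathrm{parity}(P) \prod_{a=1}^r \kappa_{B_V}(v_a)_{i_{P(a)}}$ satisfies
$h_{i,j,P} \in \mathscr{L}_r(V;U)$ for all $(i,j) \in I^n_r \times \mathbb{N}_m$ and $P \in \mathrm{perm}(\mathbb{N}_r)$ by Theorem 27.2.32 and the closure of
$\mathscr{L}_r(V;U)$ under scalar multiplication. Then $\phi^j_i = \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} h_{i,j,P}$ satisfies $\phi^j_i \in \mathscr{L}_r(V;U)$ by the closure
of $\mathscr{L}_r(V;U)$ under vector addition. To show the antisymmetry of $\phi^j_i$, let $Q \in \mathrm{perm}(\mathbb{N}_r)$. Then

\begin{align*}
\forall v \in V^r,\quad \phi^j_i(v \circ Q)
&= e^U_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{a=1}^r \kappa_{B_V}(v_{Q(a)})_{i_{P(a)}} \\
&= e^U_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{\beta=1}^r \kappa_{B_V}(v_\beta)_{i_{P(Q^{-1}(\beta))}} \tag{30.3.7} \\
&= e^U_j \sum_{P' \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P' \circ Q) \prod_{\beta=1}^r \kappa_{B_V}(v_\beta)_{i_{P'(\beta)}} \tag{30.3.8} \\
&= e^U_j \sum_{P' \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P')\, \mathrm{parity}(Q) \prod_{\beta=1}^r \kappa_{B_V}(v_\beta)_{i_{P'(\beta)}} \tag{30.3.9} \\
&= \mathrm{parity}(Q)\, \phi^j_i(v),
\end{align*}

where the substitution $a = Q^{-1}(\beta)$ is applied in line (30.3.7), the substitution $P = P' \circ Q$ is applied in
line (30.3.8), and line (30.3.9) follows from Theorem 14.8.19 (v). Therefore $\forall v \in V^r,\ \phi^j_i(v \circ Q) = -\phi^j_i(v)$
for all transpositions $Q$ by Theorem 14.8.19 (vi). So $\phi^j_i \circ \mathrm{swap}_{k_1,k_2} = -\phi^j_i$ for all $k_1, k_2 \in \mathbb{N}_r$ with $k_1 \ne k_2$.
Hence $\phi^j_i \in \mathscr{L}^-_r(V;U)$ by Notation 30.1.5.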
Part (ii) follows from Theorem 30.3.5 and Definition 30.3.6.
Part (iii) follows from part (ii). 


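For part (iv), suppose that $\sum_{i \in I^n_r,\, j \in \mathbb{N}_m} k^j_i \phi^j_i = 0$ for some $k = (k^j_i)_{i \in I^n_r,\, j \in \mathbb{N}_m} \in K^{I^n_r \times \mathbb{N}_m}$. Then

\begin{align*}
\forall i' \in \mathbb{N}_n^r,\quad
0 &= \sum_{i \in I^n_r} \sum_{j=1}^m k^j_i \, \phi^j_i\bigl((e^V_{i'_a})_{a=1}^r\bigr) \\
&= \sum_{i \in I^n_r} \sum_{j=1}^m k^j_i \, e^U_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{a=1}^r \kappa_{B_V}(e^V_{i'_a})_{i_{P(a)}} \\
&= \sum_{j=1}^m e^U_j \sum_{i \in I^n_r} k^j_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{a=1}^r \kappa_{B_V}(e^V_{i'_a})_{i_{P(a)}} \\
&= \sum_{j=1}^m e^U_j \sum_{i \in I^n_r} k^j_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{a=1}^r \delta_{i'_a,\, i_{P(a)}} \tag{30.3.10} \\
&= \sum_{j=1}^m e^U_j \sum_{i \in I^n_r} k^j_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P)\, \delta_{i',\, i \circ P} \\
&= \sum_{j=1}^m e^U_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \sum_{i \in I^n_r} k^j_i \, \delta_{i',\, i \circ P} \\
&= \sum_{j=1}^m e^U_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \sum_{i \in I^n_r} k^j_i \, \delta_{i' \circ P,\, i} \tag{30.3.11} \\
&= \sum_{j=1}^m e^U_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P)\, k^j_{i' \circ P}\, \delta(\text{“}i' \circ P \in I^n_r\text{”}) \tag{30.3.12} \\
&= \sum_{j=1}^m e^U_j \, \mathrm{parity}(P_{i'})\, k^j_{i' \circ P_{i'}} \tag{30.3.13} \\
&= \mathrm{parity}(P_{i'}) \sum_{j=1}^m e^U_j \, k^j_{i' \circ P_{i'}}, \tag{30.3.14}
\end{align*}

where line (30.3.10) follows from Theorem 22.8.9 (ii), and line (30.3.11) substitutes $P^{-1}$ for $P$, where the
equality $\mathrm{parity}(P^{-1}) = \mathrm{parity}(P)$ follows from Theorem 14.8.19 (ii). (See Notation 14.7.21 for the
pseudo-notation $\delta(\text{“}i' \circ P \in I^n_r\text{”})$ on line (30.3.12). See Definition 14.11.3 for the standard sorting-permutation $P_{i'}$
for $i'$ on line (30.3.13), where $i' \circ P_{i'} \in I^n_r$ is the “sorted rearrangement” of $i'$.)

Now let $i' \in I^n_r$. Then $P_{i'} = \mathrm{id}_{\mathbb{N}_r}$ by Theorem 14.11.7 (vii). So $i' \circ P_{i'} = i'$ and $\mathrm{parity}(P_{i'}) = 1$. So
$\sum_{j=1}^m e^U_j k^j_{i'} = 0$. Therefore $k^j_i = 0$ for all $(i,j) \in I^n_r \times \mathbb{N}_m$ by Theorem 22.6.8 (ii) because $(e^U_j)_{j=1}^m$ is a
linearly independent vector-family in $U$. Thus the vectors $\phi^j_i$ for $(i, j) \in I^n_r \times \mathbb{N}_m$ are linearly independent.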


Part (v) follows from parts (iii) and (iv). 
Part (vi) follows from part (v) and Theorem 14.10.5 (i).
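The key step in the independence argument, that a canonical basis element evaluated on a sorted tuple of basis vectors gives a Kronecker delta, can be illustrated by brute force. The following Python sketch is an assumption-laden illustration, not part of the proof: it takes the scalar-valued case (m = 1, so the factor e^U_j is dropped), uses V = R^4 with the standard basis, and uses 0-based indices. The function names are hypothetical.

```python
import itertools
from math import comb

def parity(perm):
    """Sign (+1 or -1) of a permutation given as a tuple of 0-based images."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def phi(i):
    """Scalar-valued analogue of the canonical basis element for index tuple i."""
    r = len(i)

    def f(*v):
        total = 0
        for P in itertools.permutations(range(r)):
            term = parity(P)
            for a in range(r):
                term *= v[a][i[P[a]]]  # component i_{P(a)} of the vector v_a
            total += term
        return total

    return f

def basis_vector(n, k):
    """k-th standard basis vector of R^n (0-based)."""
    return tuple(1 if j == k else 0 for j in range(n))

n, r = 4, 2
increasing = list(itertools.combinations(range(n), r))  # plays the role of I^n_r
# Evaluate each phi_i on each sorted tuple of basis vectors.
gram = {(i, ip): phi(i)(*[basis_vector(n, k) for k in ip])
        for i in increasing for ip in increasing}
dimension = len(increasing)
print(dimension, comb(n, r))
```

Under these assumptions the table `gram` is the identity matrix indexed by increasing index tuples, which is the content of the final step of the proof of part (iv) when m = 1.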


30.3.9 REMARK: Antisymmetric multilinear function space coordinatisation. 

When the linear space U in Theorem 30.3.5 is replaced by the field K, regarded as a linear space over itself 
as in Definition 22.1.9, the result is Theorem 30.3.10. Since the field K is effectively self-coordinatised, there 
is no need to apply a coordinatisation function to the output $A((e_{i_k})_{k=1}^r)$ from $A$.


30.3.10 THEOREM: Components for antisymmetric multilinear function spaces. 
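Let $V$ be a finite-dimensional linear space over a field $K$, with basis family $B = (e_i)_{i=1}^n$. Let $r \in \mathbb{Z}^+_0$. Then

\[
\forall A \in \mathscr{L}^-_r(V; K),\ \forall v \in V^r,\quad
A(v) = \sum_{i \in I^n_r} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \kappa_B(v_k)_{i_{P(k)}},
\]

where $\kappa_B(v_k)_{i_{P(k)}}$ means the $i_{P(k)}$-th component of $v_k$ with respect to $B$ as in Definition 22.8.8, and

\[
\forall A \in \mathscr{L}^-_r(V; K),\ \forall i \in I^n_r,\quad a_i = A((e_{i_k})_{k=1}^r).
\]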


PROOF: The assertion follows from Theorem 30.3.5.
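The component formula of Theorem 30.3.10 can be exercised on a concrete antisymmetric bilinear function. The following Python sketch assumes V = R^3 with the standard basis, K = R, and takes A(v, w) = u · (v × w) for a fixed vector u as the test function. These choices and all function names are illustrative assumptions rather than anything defined in this book.

```python
import itertools

def parity(perm):
    """Sign (+1 or -1) of a permutation given as a tuple of 0-based images."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def components(A, n, r):
    """Coefficients a_i = A(e_{i_1}, ..., e_{i_r}) for increasing index tuples i."""
    e = [tuple(1 if j == k else 0 for j in range(n)) for k in range(n)]
    return {i: A(*[e[k] for k in i])
            for i in itertools.combinations(range(n), r)}

def evaluate(coeffs, v):
    """Right-hand side of the component formula for the tuple v = (v_1, ..., v_r)."""
    r = len(v)
    total = 0
    for i, a_i in coeffs.items():
        for P in itertools.permutations(range(r)):
            term = a_i * parity(P)
            for k in range(r):
                term *= v[k][i[P[k]]]  # component i_{P(k)} of v_k
            total += term
    return total

def triple(u):
    """Antisymmetric bilinear test function A(v, w) = u . (v x w) on R^3."""
    def A(v, w):
        cross = (v[1] * w[2] - v[2] * w[1],
                 v[2] * w[0] - v[0] * w[2],
                 v[0] * w[1] - v[1] * w[0])
        return sum(uc * cc for uc, cc in zip(u, cross))
    return A

A = triple((2, -3, 5))
coeffs = components(A, 3, 2)
v = ((1, 2, 3), (4, 0, -2))
direct = A(*v)                     # value computed from the definition of A
from_formula = evaluate(coeffs, v) # value computed from the three coefficients
print(coeffs, direct, from_formula)
```

Only the three coefficients indexed by increasing pairs are stored, yet the formula recovers the value of A on an arbitrary pair of vectors.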
30.3.11 REMARK: Coordinatisation of multilinear map spaces and tensor spaces.

Since spaces of multilinear maps and tensors are linear spaces, a natural kind of coordinatisation is provided 
by the component map with respect to any basis. As for linear spaces in general, such coordinatisation by 
component maps supplies a differentiable structure for the differential calculus in Chapters 40, 41 and 42, 
and also the exterior calculus in Chapter 46. Table 30.3.1 indicates where component maps are defined for 
various kinds of finite-dimensional linear spaces, including spaces of multilinear maps and tensors. 


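reference  class of space                               notation                                                                      dimension
22.8.8     linear space                                 $V$                                                                           $\dim(V)$
23.2.10    linear space of linear maps                  $\mathrm{Lin}(V, W)$                                                          $\dim(V) \cdot \dim(W)$
23.9.4     dual linear space                            $V^*$                                                                         $\dim(V)$
27.5.4     multilinear map space                        $\mathscr{L}((V_\alpha)_{\alpha \in A}; W)$                                   $\dim(W) \prod_{\alpha \in A} \dim(V_\alpha)$
27.5.7     multilinear map space                        $\mathscr{L}((V_\alpha^*)_{\alpha \in A}; W)$                                 $\dim(W) \prod_{\alpha \in A} \dim(V_\alpha)$
28.2.12    tensor space                                 $\bigotimes_{\alpha \in A} V_\alpha = \mathscr{L}((V_\alpha)_{\alpha \in A}; \mathbb{R})^*$   $\prod_{\alpha \in A} \dim(V_\alpha)$
29.4.20    multilinear-style mixed tensor space         $\mathscr{L}((V_k^*)_{k=1}^r, (W_\ell)_{\ell=1}^s)$                           $\prod_{k=1}^r \dim(V_k) \cdot \prod_{\ell=1}^s \dim(W_\ell)$
29.4.21    linear-style mixed tensor space              $\mathrm{Lin}(\mathscr{L}((V_k)_{k=1}^r), \mathscr{L}((W_\ell)_{\ell=1}^s))$  $\prod_{k=1}^r \dim(V_k) \cdot \prod_{\ell=1}^s \dim(W_\ell)$
29.6.15    multilinear-style tensor space               $\mathscr{L}((V^*)^r, V^s)$                                                   $\dim(V)^{r+s}$
29.6.16    linear-style tensor space                    $\mathrm{Lin}(\mathscr{L}_r(V), \mathscr{L}_s(V))$                            $\dim(V)^{r+s}$
30.2.9     multilinear map space                        $\mathscr{L}_r(V; W)$                                                         $\dim(V)^r \dim(W)$
30.3.8     antisymmetric multilinear map space          $\mathscr{L}^-_r(V; W)$                                                       $C^{\dim(V)}_r \dim(W)$

Table 30.3.1 Tensor space coordinatisation and dimension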


30.3.12 THEOREM: Dimensions of general, symmetric and antisymmetric multilinear function spaces. 
Let $V$ be a finite-dimensional linear space over a field $K$, and $m \in \mathbb{Z}^+_0$.


(i) $\dim(\mathscr{L}_m(V; K)) = \dim(V)^m$.
(ii) $\dim(\mathscr{L}^-_m(V; K)) = C^{\dim(V)}_m$.
(iii) $\dim(\mathscr{L}^+_m(V; K)) = C^{\dim(V)+m-1}_m$.


PROOF: For part (i), let $(e_i)_{i=1}^n$ be a basis for $V$. Then $f$ has the form $f : v \mapsto \sum_{I \in \mathbb{N}_n^m} a_I v^I$ for $v \in V^m$ for
all $f \in \mathscr{L}_m(V; K)$, where $v^I$ abbreviates the product of the components of $v_1, \ldots, v_m$ with indices $I_1, \ldots, I_m$ respectively.
So $\dim(\mathscr{L}_m(V; K)) = \dim(V)^m$ because there are no constraints on the coefficients $a_I$.


For part (ii), the coefficients $a_I$ are constrained by the antisymmetry rule so that $a_I = \mathrm{parity}(P)\, a_{I \circ P}$ for all
permutations $P \in \mathrm{perm}(\mathbb{N}_m)$. From this it follows that $a_I = 0$ for index sequences $I$ with any two indices
equal. The remaining index sequences may be partitioned according to the equivalence relation $I \equiv J$ if and
only if $\exists P \in \mathrm{perm}(\mathbb{N}_m),\ I = J \circ P$. A unique representative may be chosen from each equivalence class
by sorting into increasing order. There is one and only one increasing index sequence in each equivalence
class, and there is one and only one equivalence class for each increasing index sequence. It follows that the
number of equivalence classes equals the number of increasing index sequences in $I^n_m$, which is equal to $C^n_m$
by Theorem 14.10.5 (i).


For part (iii), the symmetry rule implies that $a_I = a_J$ whenever $I = J \circ P$ for some $P \in \mathrm{perm}(\mathbb{N}_m)$.
Equivalent index sequences may be partitioned into equivalence classes as in the antisymmetric case, but
coefficients with repeated indices are not set to zero. A unique representative for each equivalence class may
be obtained by sorting into non-increasing order. Since the equality $a_I = a_J$ is forced if and only if the sorted index sequences $I$
and $J$ are equal, it follows that the number of independent coefficients is equal to the cardinality of the set
$J^n_m$ of non-increasing maps from $\mathbb{N}_m$ to $\mathbb{N}_n$. By Theorem 14.10.5 (ii), this equals $C^{n+m-1}_m$.
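The three counts in Theorem 30.3.12 can be confirmed by direct enumeration of index sequences for small n and m. The following Python sketch is a brute-force illustration. The function name is hypothetical, and the comparison with binomial coefficients uses the standard-library `math.comb` in place of the notation $C^n_m$.

```python
import itertools
from math import comb

def count_index_sequences(n, m):
    """Count all, strictly increasing, and non-increasing maps from N_m to N_n."""
    seqs = list(itertools.product(range(1, n + 1), repeat=m))
    total = len(seqs)
    increasing = sum(all(s[k] < s[k + 1] for k in range(m - 1)) for s in seqs)
    non_increasing = sum(all(s[k] >= s[k + 1] for k in range(m - 1)) for s in seqs)
    return total, increasing, non_increasing

# Compare the enumerated counts with the closed formulas for small cases.
for n in range(1, 5):
    for m in range(0, 5):
        expected = (n ** m, comb(n, m), comb(n + m - 1, m))
        print(n, m, count_index_sequences(n, m), expected)
```

For n = 4 and m = 3, for example, there are 64 index sequences in total, 4 increasing ones and 20 non-increasing ones.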
30.4. Antisymmetric tensor spaces


30.4.1 REMARK: Antisymmetric multilinear maps are a subspace of general multilinear maps. 

The set of antisymmetric multilinear maps $\mathscr{L}^-_m(V; W)$ on a linear space $V$, valued in a linear space $W$
over the same field, is closed under pointwise vector addition and scalar multiplication operations by
Theorem 30.1.16. (See Notation 30.1.5 for $\mathscr{L}^-_m(V; W)$.) It is therefore a well-defined linear subspace of $\mathscr{L}_m(V; W)$.
Notation 30.4.2 defines $\Lambda_m(V; W)$ to be essentially a synonym for $\mathscr{L}^-_m(V; W)$.


30.4.2 NOTATION: $\Lambda_m(V; W)$, for linear spaces $V$ and $W$ over the same field, where $m \in \mathbb{Z}^+_0$, denotes the
set $\mathscr{L}^-_m(V; W)$ together with its pointwise vector addition and scalar multiplication operations.


30.4.3 NOTATION: $\Lambda_m V$, for a linear space $V$ over a field $K$, where $m \in \mathbb{Z}^+_0$, is an abbreviation for the
linear space $\Lambda_m(V; K)$, where $K$ is regarded as a linear space over the field $K$.


30.4.4 REMARK: The “effect” on a particle of a symmetric or antisymmetric “multilinear field”. 
A tensor is in essence the “multilinear response” or “multilinear effect” of a particle in the presence of a
“multilinear field”. If the “field” possesses some kind of symmetry or antisymmetry, the “response” or 
“effect” of the particle will be limited because the set of possible “multilinear fields” will be limited to those 
which possess the symmetry or antisymmetry. In a symmetric bilinear field, for example, the “effect” on 
a simple vector-pair will be independent of the order of the vectors, whereas in an antisymmetric bilinear 
field, the “effect” will be reversed if the vector order is swapped. (Some typical contexts where multilinear 
functions are symmetric or antisymmetric are listed in Remark 30.0.1.) 


The mathematical model chosen in this book for the effect on a particle of a general m-linear field is the
dual space $\mathscr{L}_m(V)^*$ of the space $\mathscr{L}_m(V)$ of multilinear functions on a linear space $V$. Therefore a suitable
mathematical model for the effect on a particle of a symmetric m-linear field is the dual space $\mathscr{L}^+_m(V)^*$ of the
space $\mathscr{L}^+_m(V)$ of symmetric multilinear functions on $V$. (The symmetric case is presented in Section 30.5.)
Similarly, a suitable mathematical model for the effect on a particle of an antisymmetric m-linear field is the
dual space $\mathscr{L}^-_m(V)^*$ of the space $\mathscr{L}^-_m(V)$ of antisymmetric multilinear functions on $V$.


30.4.5 REMARK: The imperfect duality of contravariant and covariant tensors. 

General contravariant and covariant tensors are not perfect mirror images of each other. The imperfection
of the duality is even more evident if the tensors are symmetric or antisymmetric. Elements of $\mathscr{L}^+_m(V)^*$ are
not symmetric, and elements of $\mathscr{L}^-_m(V)^*$ are not antisymmetric. The elements of these dual spaces are linear
functionals on spaces of symmetric or antisymmetric multilinear functions respectively. So they are functions
of a single variable. It is not possible to assert symmetry or antisymmetry with respect to a swap of two
variables when there is only one variable! (Of course, if m = 0 or m = 1, there are no variables to swap
anyway. So $\mathscr{L}^+_m(V) = \mathscr{L}^-_m(V) = \mathscr{L}_m(V)$ for $m \le 1$.)

Another imperfection in the duality is the fact that although $\mathscr{L}^+_m(V) \subseteq \mathscr{L}_m(V)$ and $\mathscr{L}^-_m(V) \subseteq \mathscr{L}_m(V)$, it is
not true in general that $\mathscr{L}^+_m(V)^* \subseteq \mathscr{L}_m(V)^*$ and $\mathscr{L}^-_m(V)^* \subseteq \mathscr{L}_m(V)^*$.


30.4.6 REMARK: The inconvenient structure of the duals of antisymmetric multilinear function spaces. 

The significance of the falsity of the inclusion $\mathscr{L}^-_m(V)^* \subseteq \mathscr{L}_m(V)^*$ in general becomes clearer if one considers
that a knowledge of the effect on a particle of all possible antisymmetric multilinear functions is not sufficient
to determine the effect of all general multilinear functions. The antisymmetric multilinear fields constitute
a subspace of the general multilinear fields. Extending the “effect” from a subspace to the full space is
non-unique. In other words, although $\mathscr{L}^-_m(V) \subseteq \mathscr{L}_m(V)$, the linear functionals in the subspace dual $\mathscr{L}^-_m(V)^*$ need
to be extended in some way in order to compare them with linear functionals in the full-space dual $\mathscr{L}_m(V)^*$.


Let $K$ be the field of $V$. Then the elements of $\mathscr{L}_m(V)^*$ are linear functionals of the form $\phi : \mathscr{L}_m(V) \to K$.
In general, $\mathscr{L}^-_m(V)$ is a linear subspace of $\mathscr{L}_m(V)$, but $\mathscr{L}^-_m(V) \ne \mathscr{L}_m(V)$ when $m \ge 2$ and $\dim(V) \ge 1$.
Then $\mathscr{L}^-_m(V)$ is a proper linear subspace of $\mathscr{L}_m(V)$. The elements of $\mathscr{L}^-_m(V)^*$ are linear functionals of the
form $\phi : \mathscr{L}^-_m(V) \to K$. So the dual space $\mathscr{L}^-_m(V)^*$ cannot be considered to be the space of “antisymmetric
contravariant tensors of degree m”. In other words, $\mathscr{L}^-_m(V)^*$ is not the subspace of tensors in $\mathscr{L}_m(V)^*$ which
happen to be antisymmetric. However, the space $\mathscr{L}^-_m(V)$ can be considered to be the space of “antisymmetric
covariant tensors of degree m” because $\mathscr{L}^-_m(V) \subseteq \mathscr{L}_m(V)$.

The issue here is a special case of the fact that the dual of a proper linear subspace is not a subspace of the
dual of the full linear space. In short, the dual of a subspace is not a subspace of the dual. (This more general
issue is discussed in Remark 23.6.11. The issue is partially resolved in Section 24.3 by using equivalence
classes of extensions of linear functionals instead of trying to construct unique extensions.)
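The non-uniqueness of extensions can be seen in the smallest non-trivial case. The following Python sketch assumes V = R^2 and m = 2, and identifies a bilinear function with its 2×2 coefficient matrix a, so that the antisymmetric bilinear functions are the multiples of [[0, 1], [-1, 0]]. The three functionals below are hypothetical examples chosen for illustration. They agree on every antisymmetric coefficient matrix, but differ on general ones.

```python
from fractions import Fraction

# Three linear functionals on the 4-dimensional space of 2x2 coefficient matrices.
def phi_1(a):
    return a[0][1]

def phi_2(a):
    return -a[1][0]

def phi_3(a):
    return Fraction(a[0][1] - a[1][0], 2)

antisym = [[0, 7], [-7, 0]]   # coefficient matrix of an antisymmetric bilinear function
general = [[1, 2], [3, 4]]    # coefficient matrix of a general bilinear function

on_subspace = [phi(antisym) for phi in (phi_1, phi_2, phi_3)]
on_full_space = [phi(general) for phi in (phi_1, phi_2, phi_3)]
print(on_subspace, on_full_space)
```

All three functionals restrict to the same element of the dual of the antisymmetric subspace, so that element has no preferred representative in the dual of the full space.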


30.4.7 REMARK: Terminology for alternating forms. 

The expression “antisymmetric multilinear maps of degree m on V” is long and clumsy. The expression 
“covariant alternating tensors of degree m on V” is only slightly less long and less clumsy. The expression 
“alternating m-forms on V” is certainly shorter. Then the dual space could be referred to as “alternating 
m-tensors on V”. 


30.4.8 DEFINITION: The alternating tensor product space of degree m on V, for $m \in \mathbb{Z}^+_0$ and a linear
space $V$, is the dual of the linear space $\Lambda_m(V)$ of antisymmetric multilinear forms of degree m on V.


30.4.9 NOTATION: $\bigwedge^m V$, for $m \in \mathbb{Z}^+_0$ and a linear space $V$, denotes the alternating tensor product space
of degree m on V. In other words, $\bigwedge^m V = \Lambda_m(V)^*$.


30.4.10 REMARK:  Notational conventions for covariant and contravariant alternating tensors. 

The alert reader will have noticed that the wedge symbol $\bigwedge$ for alternating tensors in Notation 30.4.9 is
not quite the same as the lambda symbol $\Lambda$ for multilinear functions. This is intended as a reminder of the
imperfect duality which is mentioned in Remarks 30.4.5 and 30.4.6. Definition 30.4.8 and Notation 30.4.9
imply that for $m \ge 2$ and $\dim(V) \ge 1$,

\begin{align*}
\textstyle\bigwedge^m V
&= \{\phi|_{\Lambda_m V};\ \phi \in \mathscr{L}_m(V)^*\} \\
&= \{\phi|_{\Lambda_m V};\ \phi \in \textstyle\bigotimes^m V\} \\
&\not\subseteq \textstyle\bigotimes^m V = \mathscr{L}_m(V)^*.
\end{align*}

Individual tensors in $\bigwedge^m V$ are restrictions of individual tensors in $\bigotimes^m V$, but the alternating tensor space
$\bigwedge^m V$ is not a subset of the tensor space $\bigotimes^m V$. The relation between $\bigwedge^m V$ and $\bigotimes^m V$ is clarified in
Remark 30.4.19.


30.4.11 REMARK:  Subscript/superscript conventions for covariant/contravariant antisymmetric tensors. 

The subscript index m in Notations 30.4.2 and 30.4.3 for the degree of antisymmetric multilinear map and
function spaces is inherited from the $\mathscr{L}_m(V; K)$ notation. This matches the superscript/subscript convention
for tensor indices because such maps and functions are covariant. (This convention was used as early as
1900 by Ricci/Levi-Civita [194].) The raised index in the notation $\bigwedge^m V$ for the contravariant space (“wedge
product”) is inherited from the corresponding $\bigotimes^m V$ notation. The contravariant superscript for the degree
also matches the index convention in tensor calculus.


30.4.12 REMARK: Multiple duals of multilinear function spaces. 
Some relations between the antisymmetric tensor spaces which are mentioned in Section 30.4 are illustrated 
in Figure 30.4.1. The spaces in the lower row are contravariant. The spaces in the upper row are covariant. 


Figure 30.4.1 Linear and antisymmetric multilinear duals of a linear space


30.4.13 REMARK: Survey of notations for general and antisymmetric tensor spaces. 

Table 30.4.2 summarises the general and antisymmetric tensor space notations of a selection of authors. 
There is clearly much agreement, but there is also significant diversity. However, there is fairly broad 
agreement that the contravariant degrees should be superscripts and covariant degrees should be subscripts. 
                        general tensor                antisymmetric tensor
year reference          contravariant   covariant     contravariant   covariant

1959 Willmore [42] ym Vin — — 

1963 Auslander/MacKenzie [1] V vo ATV (A™V)*, A™(V*) 

1963 Flanders [11] — — NV — 

1963 Guggenheimer [16] QV G2 V* AV = 

1963 Kobayashi/Nomizu [19] To (V) T? (V) — — 

1965 Lang [108 T”(V) = NV — 

1965 MacLane/Birkhoff [110] T9 (V) To" (V) Aq (V) Alt; (V) 

1967 Henri Cartan [4] — Ln V: R) — Sf (V;IR) 

1968 Bishop/Goldberg [3] Tg (V) T9 (V) NV A S. 

1968 Choquet-Bruhat [6] (V (@)™v* — (A)"V* 

1969 Federer [69] Om V = Nm V NV 

1970 Spivak [37] Tm(V) T™(V) = aQ™ (Vv) 

1972 Malliavin [28] = L(V, R) — Zu a(V, R) 

1972 Sulanke/Wintgen [40] = - — AV* 

1975 Lovelock/Rund [27] a E fe — A” (V) 

1981 Bleecker [254] quur qum = A” (V) 

1983 Nash/Sen [30] TY T°, — A” V 

1986 Crampin/Pirani [7] — — NV Ap 

1987 Gallot/Hulin/Lafontaine[13] — G9" V e" y* — de 

1991 Fulton/Harris [76] pem — NV, At” V A v 

1993 EDM2 [113] Q"Vv,T"(V, G"Vv*,T?(V), NV AUS 

(VSR)  £(I"V.R) 

1994 Darling [8] — — NV N°V*, Am(V > R) 

1995 O’Neill [295 $5 (V) 82 (V) A” (V) — 

1997 Frankel [12] eQ"y e" y* — Ace 

1997 Lee [24] Tm(V) T™(V) — A” (V) 

1999 Lang [23] = L™(V, R) NV A" V^, L(V, R) 

2004 Bump [57] QV — ATV — 

2004 Szekeres [305] — — A” (V) A" (V*), A*"(V) 

2012 Sternberg [38] eV) T? (V) pry — 
Kennington ey Lm V R), GnV NV Am( V; R), Am V 

Table 30.4.2 Survey of general and antisymmetric tensor space notations 


30.4.14 REMARK: Tensor space representations of the Riemann curvature. 

A prime example of a tensor-like object possessing antisymmetry is the Riemann curvature. The Riemann
curvature at a point p in an affinely connected manifold M is a linear map from the linear space of area
elements at p to the linear space of linear automorphisms of the tangent space at p. In other words, the
Riemann curvature inhabits the linear space $\mathrm{Lin}(\bigwedge^2 T_p(M), \mathrm{Aut}(T_p(M)))$.


The properties of the Riemann curvature arise from holonomy groups. (See Section 70.1.) Differentiable
paths close to a point p are assumed to be boundaries of area elements, and the parallel transport of the
tangent space around such paths is assumed to yield a linear automorphism of $T_p(M)$. Furthermore, it
is assumed that this automorphism is differentiable with respect to the area element. In other words, the
automorphism in $\mathrm{Aut}(T_p(M))$ can be approximated as a linear function of the area element which is enclosed
by the curve, plus an error term which converges to zero. (Of course, this informal description needs some
work to make it mathematically meaningful.)


The antisymmetry arises from the area elements which are approximations for closed curves. Since the
curvature is defined in terms of parallel transport around curves, and parallel transport is defined as the
integral of a connection form around the curve, it is inevitable (by the Stokes theorem) that the resulting
automorphism will converge to the application of the exterior derivative of the connection form to the area
form in the limit of small curves. Thus the curvature form is the exterior derivative of a connection form.
(The connection form is an element of $\mathrm{Lin}(T_p(M), \mathrm{Aut}(T_p(M)))$.)


The space $\mathrm{Lin}(\bigwedge^2 T_p(M), \mathrm{Aut}(T_p(M)))$ is almost identical to $\Lambda_2(T_p(M); \mathrm{Aut}(T_p(M)))$, namely the space of
antisymmetric bilinear maps from $T_p(M)$ to $\mathrm{Aut}(T_p(M))$. This is slightly less intuitive because it is a space
of maps from vector-pairs to the automorphism group. Each vector-pair is associated with a parallelogram
bounded by the vectors on two sides, and this parallelogram is associated with a unique, well-defined area
form in $\bigwedge^2 T_p(M)$. However, surfaces may be partitioned in many ways, not necessarily approximating the
surface by parallelograms. So although an antisymmetric bilinear function of vector-pairs does correspond
mathematically to the space of area forms, it is not so immediately intuitive.

Since the set $\mathrm{Aut}(T_p(M))$ of tangent space automorphisms is a linear subspace of $\mathrm{Lin}(T_p(M), T_p(M))$, the set
of all tangent space endomorphisms, the spaces $\mathrm{Lin}(\bigwedge^2 T_p(M), \mathrm{Aut}(T_p(M)))$ and $\Lambda_2(T_p(M); \mathrm{Aut}(T_p(M)))$ are
subspaces of $\mathrm{Lin}(\bigwedge^2 T_p(M), \mathrm{Lin}(T_p(M), T_p(M)))$ and $\Lambda_2(T_p(M); \mathrm{Lin}(T_p(M), T_p(M)))$ respectively. Similarly,
$\Lambda_2(T_p(M); \mathrm{Aut}(T_p(M)))$ and $\Lambda_2(T_p(M); \mathrm{Lin}(T_p(M), T_p(M)))$ are subspaces of $\mathscr{L}_2(T_p(M); \mathrm{Aut}(T_p(M)))$ and
$\mathscr{L}_2(T_p(M); \mathrm{Lin}(T_p(M), T_p(M)))$ respectively.

It follows from Theorem 29.2.26 that $\mathrm{Lin}(T_p(M), T_p(M))$ has a natural isomorphism to $\mathscr{L}(T_p(M), T_p(M)^*)$.
This has a natural isomorphism to $T_p(M)^* \otimes T_p(M)$. So $\mathscr{L}_2(T_p(M); \mathrm{Lin}(T_p(M), T_p(M)))$ has a natural
isomorphism to $\mathscr{L}_2(T_p(M); T_p(M)^* \otimes T_p(M))$. Then, via several further natural isomorphisms, this has
an isomorphism to $T_p(M)^* \otimes T_p(M)^* \otimes T_p(M)^* \otimes T_p(M)$, which may be identified with the mixed tensor
space $\bigotimes^{1,3} T_p(M)$. This slightly ridiculous sequence of isomorphisms gives some idea of the conceptual
distance between the space $\mathrm{Lin}(\bigwedge^2 T_p(M), \mathrm{Aut}(T_p(M)))$ and the standard tensor space $\bigotimes^{1,3} T_p(M)$. (This
topic is also discussed in Remark 71.11.1.)


30.4.15 REMARK: Natural immersion for antisymmetric multilinear maps. 

It is mentioned in Remarks 30.4.5, 30.4.6 and 30.4.10 that the dual $\bigwedge^m V = \mathscr{L}^-_m(V)^*$ of the space $\mathscr{L}^-_m(V)$ of
antisymmetric multilinear functions, which is a subspace of $\mathscr{L}_m(V)$, is not in general a subspace of the dual
$\bigotimes^m V = \mathscr{L}_m(V)^*$ of the whole space $\mathscr{L}_m(V)$. Theorem 30.4.18 shows how to partially remedy this situation
by applying Theorem 24.3.4 to construct an isomorphism from the dual of the subspace to a quotient of the
dual of the whole space. For greater generality, Theorem 30.4.16 is expressed in terms of general multilinear
maps instead of multilinear functions.


The maps and spaces in Theorem 30.4.16 are illustrated in Figure 30.4.3. The expressions at the lower
left of the spaces in the diagram indicate their dimensions if dim(V) = n and dim(W) = 1. The set
$U^\perp = \mathscr{L}^-_m(V;W)^\perp$ is called the “kernel on the left” of the bilinear map $(\phi, f) \mapsto \phi(f)$. It may also be
referred to as the “annihilator” of $\mathscr{L}^-_m(V; W)$ with respect to this bilinear map. (See Remark 24.3.3 for
more comments on this terminology.)


Figure 30.4.3 Maps and spaces for subspace-quotient isomorphism for multilinear map space


30.4.16 THEOREM: Isomorphism of antisymmetric multilinear maps dual to a quotient space. 
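Let $V$ and $W$ be finite-dimensional linear spaces over the same field, and $m \in \mathbb{Z}^+_0$. Let $\mathscr{L}^-_m(V; W)^\perp$
denote the linear subspace $\{\phi \in \mathscr{L}_m(V;W)^*;\ \forall f \in \mathscr{L}^-_m(V; W),\ \phi(f) = 0\}$ of $\mathscr{L}_m(V;W)^*$. Define the map
$\psi : \mathscr{L}^-_m(V;W)^* \to \mathscr{L}_m(V;W)^* / \mathscr{L}^-_m(V;W)^\perp$ by

\[
\forall \phi' \in \mathscr{L}^-_m(V;W)^*,\quad
\psi(\phi') = \{\phi \in \mathscr{L}_m(V;W)^*;\ \forall f \in \mathscr{L}^-_m(V;W),\ \phi(f) = \phi'(f)\}.
\]

Then $\psi$ is a linear space isomorphism.


PROOF: The result follows by substituting $\mathscr{L}_m(V;W)$ for $V$ and $\mathscr{L}^-_m(V; W)$ for $U$ in Theorem 24.3.4.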


30.4.17 REMARK: Natural immersion for antisymmetric multilinear functions. 
Theorem 30.4.18 specialises Theorem 30.4.16 to the case that the linear space W is the field K regarded as 
a linear space over itself. 


30.4.18 THEOREM: Isomorphism of antisymmetric multilinear functions dual to a quotient space. 

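Let $V$ be a finite-dimensional linear space over a field $K$, and $m \in \mathbb{Z}^+_0$. Let $\mathscr{L}^-_m(V; K)^\perp$ denote the linear
subspace $\{\phi \in \mathscr{L}_m(V; K)^*;\ \forall f \in \mathscr{L}^-_m(V; K),\ \phi(f) = 0\}$ of $\mathscr{L}_m(V; K)^*$. Define the map $\psi : \mathscr{L}^-_m(V; K)^* \to
\mathscr{L}_m(V; K)^* / \mathscr{L}^-_m(V; K)^\perp$ by

\[
\forall \phi' \in \mathscr{L}^-_m(V;K)^*,\quad
\psi(\phi') = \{\phi \in \mathscr{L}_m(V;K)^*;\ \forall f \in \mathscr{L}^-_m(V;K),\ \phi(f) = \phi'(f)\}.
\]

Then $\psi$ is a linear space isomorphism.


PROOF: The result follows by substitution of the field $K$ for the linear space $W$ in Theorem 30.4.16.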


30.4.19 REMARK: Natural immersion of contravariant antisymmetric tensors. 
The isomorphism in Theorem 30.4.18 may be written as $\psi : \bigwedge^m V \to \bigotimes^m V / (\Lambda_m V)^\perp$, where $(\Lambda_m V)^\perp$ denotes
the linear subspace $\{\phi \in \bigotimes^m V;\ \forall f \in \Lambda_m V,\ \phi(f) = 0\}$ of $\bigotimes^m V$, and $\psi$ is defined by

\[
\forall \phi' \in \textstyle\bigwedge^m V,\quad
\psi(\phi') = \{\phi \in \textstyle\bigotimes^m V;\ \forall f \in \Lambda_m V,\ \phi(f) = \phi'(f)\}.
\]

This is illustrated in Figure 30.4.4.


Figure 30.4.4 Maps and spaces for subspace-quotient isomorphism for multilinear functions


This has the clear geometric interpretation that $\psi(\phi')$ is the set of all tensors $\phi$ in $\bigotimes^m V$ which have the
same antisymmetric m-linear “effect” as $\phi'$. Hence “contravariant antisymmetric tensors” may be thought
of as equivalence classes of contravariant tensors which have the same antisymmetric m-linear “effect” as
each other. Thus $\bigwedge^m V$ is not a subset of $\bigotimes^m V$, but it is isomorphic to a quotient of $\bigotimes^m V$. This and some
other natural isomorphisms between antisymmetric tensor spaces are illustrated in Figure 30.4.5.


30.4.20 DEFINITION: An m-covector for a linear space $V$ and $m \in \mathbb{Z}^+_0$ is any element of $\Lambda_m V$.
30.4.21 DEFINITION: An m-vector for a linear space $V$ and $m \in \mathbb{Z}^+_0$ is any element of $\bigwedge^m V$.


30.4.22 DEFINITION: A simple m-vector in an alternating tensor product $\bigwedge^m V$ for a linear space $V$ and
$m \in \mathbb{Z}^+_0$ is any $f \in \bigwedge^m V$ of the form $f : A \mapsto A(v)$ for some $v \in V^m$.


Figure 30.4.5 Duals and isomorphisms for antisymmetric tensors


30.4.23 NOTATION: $\bigwedge_{i=1}^m v_i$, for a sequence $v \in V^m$, where $V$ is a linear space and $m \in \mathbb{Z}^+_0$, denotes the
simple m-vector $f \in \bigwedge^m V$ defined by $f(A) = A(v)$.


$v_1 \wedge v_2 \wedge \ldots \wedge v_m$ is an alternative notation for $\bigwedge_{i=1}^m v_i$.


30.4.24 THEOREM: Formula for dimension of antisymmetric tensor and multilinear function spaces. 
Let $V$ be a linear space and $m \in \mathbb{Z}^+_0$. Then $\dim(\bigwedge^m V) = \dim(\Lambda_m V) = C^{\dim(V)}_m$.


PROOF: It follows from Theorem 30.3.8 (vi) and Notation 30.4.3 that $\dim(\Lambda_m V) = C^n_m$ for $n = \dim(V)$.
The dimensionality of $\bigwedge^m V$ then follows from Theorem 23.7.11.


30.4.25 REMARK: The area spanned by a vector-pair is an “antisymmetric bilinear effect”. 

In the same way that a simple general tensor may be thought of as the “multilinear effect” on a sequence 
of vectors at a point in a “multilinear field”, a simple antisymmetric tensor may be thought of as the 
“antisymmetric multilinear effect” on the sequence of vectors. 


The area spanned by a pair of vectors $v_1$ and $v_2$ is a kind of “antisymmetric bilinear effect” of the pair of
vectors. The word “area” here means “directed area”, not just the amount of area. This directed area is
denoted $v_1 \wedge v_2$. (The symbol “$\wedge$” is pronounced “wedge”.) Since this area has a direction, reversing one of
the vectors changes the direction to the opposite: $v_1 \wedge (-v_2) = -(v_1 \wedge v_2) = (-v_1) \wedge v_2$. When the order of
multiplication is swapped, the resulting area has the opposite direction. So $v_2 \wedge v_1 = -(v_1 \wedge v_2)$.

As illustrated in Figure 30.4.6, the antisymmetric bilinear effect of the pair of vectors $v_1$ and $v_1 + v_2$ is the
same as for the pair $v_1$ and $v_2$. This is because $v_1 \wedge (v_1 + v_2) = v_1 \wedge v_1 + v_1 \wedge v_2$ by linearity with respect to
the second factor, and $v_1 \wedge v_1 = -(v_1 \wedge v_1)$ by antisymmetry. So $v_1 \wedge v_1 = 0$. Hence $v_1 \wedge (v_1 + v_2) = v_1 \wedge v_2$.
Similarly,

\begin{align*}
0.5(v_1 - v_2) \wedge (v_1 + v_2)
&= (v_1 - 0.5(v_1 + v_2)) \wedge (v_1 + v_2) \\
&= v_1 \wedge (v_1 + v_2) - 0.5(v_1 + v_2) \wedge (v_1 + v_2) \\
&= v_1 \wedge (v_1 + v_2).
\end{align*}


$v_1 \wedge v_2 = v_1 \wedge (v_1 + v_2) = 0.5(v_1 - v_2) \wedge (v_1 + v_2) = (2v_1) \wedge (0.5v_2)$
Figure 30.4.6 Equivalent antisymmetric multilinear effect of vector pairs


Antisymmetric tensors are used for integration of functions on curves, surfaces and volumes. The line, area 
and volume elements for (standard) integration are “antisymmetric multilinear effects" of vectors. 
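The equalities between the vector pairs in Figure 30.4.6 can be checked in coordinates. The following Python sketch is an illustration under stated assumptions: it takes V = R^2 and represents the directed area of a pair (v, w) by the antisymmetrised coefficient array w_ij = v_i w_j - v_j w_i, which is a convenient coordinate stand-in and not the definition of a simple 2-vector given in Definition 30.4.22. The helper names are hypothetical.

```python
from fractions import Fraction

def wedge(v, w):
    """Antisymmetrised coefficient array v_i w_j - v_j w_i for the pair (v, w)."""
    n = len(v)
    return [[v[i] * w[j] - v[j] * w[i] for j in range(n)] for i in range(n)]

def add(v, w):
    return tuple(a + b for a, b in zip(v, w))

def scale(c, v):
    return tuple(c * a for a in v)

v1, v2 = (3, 1), (1, 2)
half = Fraction(1, 2)

reference = wedge(v1, v2)
sheared = wedge(v1, add(v1, v2))                               # v1 ^ (v1 + v2)
diagonal = wedge(scale(half, add(v1, scale(-1, v2))), add(v1, v2))  # 0.5(v1 - v2) ^ (v1 + v2)
rescaled = wedge(scale(2, v1), scale(half, v2))                # (2 v1) ^ (0.5 v2)
swapped = wedge(v2, v1)                                        # v2 ^ v1
print(reference, sheared, diagonal, rescaled, swapped)
```

The first four arrays coincide, and the last is the negative of the first, in agreement with the antisymmetry of the wedge product.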
30.4.26 REMARK: Interpretation of antisymmetric tensors in the context of integration on manifolds.
The principal motivation for antisymmetric tensors is the theory of integration on manifolds embedded within 
flat spaces or manifolds, especially when the codimension is greater than zero. 


In applications to the pure mathematical study of the global topological implications of the curvature of 
manifolds, it is customary to assume that all manifolds have $C^\infty$ regularity. Such “topological differential
geometry” does not require the sophisticated analytical investigation of integration on regions with extremely 
discontinuous boundaries which is the subject of geometric measure theory. Even the sophistication of 
Lebesgue measure and integration is “surplus to requirements” for such “topological geometry”. 


In applications of antisymmetric tensor spaces to integration on manifolds, regions of space, or patches 
of surfaces, are approximated (in one way or another) by very simple regions such as parallelograms or 
parallelepipeds. The edges of such simple regions are considered to be vectors, and the area or volume may 
be modelled as an antisymmetric tensor product of the edge vectors. Such local approximation of regions 
and boundaries is very straightforward for smooth manifolds, but requires progressively more sophistication 
as the “smoothness parameter” is lowered. 


The use of antisymmetric tensors for integration on manifolds requires a metric tensor field. This requirement 
is often hidden by bundling the metric tensor into the integrand, or by implicitly assuming that the metric is 
Euclidean. The interpretation of antisymmetric tensors as volume, area or line elements requires, in principle, 
multiplication of such tensors by the square root of the determinant of the projection of the metric tensor 
onto the hyperplane of the elements. 
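One common way to make the last statement concrete for a parallelogram is the Gram determinant: the metric is restricted to the span of the two edge vectors, and the area is the square root of the determinant of the resulting 2×2 matrix. The following Python sketch assumes a constant metric given by a symmetric positive definite matrix g on R^3. The function name and the sample metrics are illustrative assumptions, not definitions from this book.

```python
from math import sqrt, isclose

def gram_area(v1, v2, g):
    """Area of the parallelogram spanned by v1 and v2, measured with metric matrix g."""
    def inner(a, b):
        # Metric inner product g(a, b) = sum_ij g_ij a_i b_j.
        return sum(g[i][j] * a[i] * b[j]
                   for i in range(len(a)) for j in range(len(b)))

    g11, g12, g22 = inner(v1, v1), inner(v1, v2), inner(v2, v2)
    # Square root of the determinant of the metric restricted to span(v1, v2).
    return sqrt(g11 * g22 - g12 * g12)

euclidean = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
scaled = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]   # all lengths doubled

area = gram_area((1, 0, 0), (1, 2, 0), euclidean)
area_scaled = gram_area((1, 0, 0), (1, 2, 0), scaled)
print(area, area_scaled)
```

The same pair of edge vectors yields different areas under the two metrics, which illustrates why the antisymmetric tensor alone does not determine a numerical area.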


30.4.27 REMARK: Alternative names for antisymmetric tensor spaces. 
Antisymmetric covariant tensors are also known as exterior forms, skew-symmetric forms, skew-symmetric 
tensors and alternating tensors. 


30.5. Symmetric tensor spaces 


30.5.1 REMARK: Applications of symmetric tensors. 

Symmetric tensors are important in the theory of elasticity and in the dynamics of rigid bodies. In elasticity, 
stress and strain are represented as symmetric tensors. In rigid body dynamics, the moment of inertia of a 
body is represented as a symmetric tensor. In Riemannian and pseudo-Riemannian spaces, the metric tensor 
and Ricci tensor are important symmetric tensors. In general (and special) relativity, the stress-energy tensor 
is an important symmetric tensor. 


Although the construction of symmetric tensor spaces is similar to the construction of antisymmetric tensor 
spaces, symmetric tensors do not have a well-developed algebra comparable to that for antisymmetric tensors. 


30.5.2 REMARK: Classification of bilinear functions as positive and negative definite and semi-definite. 
For the real number field, bilinear functions may be usefully classified according to how positive or negative 
they are, and how definite or non-degenerate they are. In the case of matrices, these properties are discussed 
in Sections 25.11 and 25.13. Since these properties are invariant under changes of basis, the matrix concepts 
may be meaningfully applied to bilinear functions. 


Although Definition 30.5.3 is written in terms of general spaces of bilinear functions $\mathscr{L}_2(V;\mathbb{R})$, the definition
is mostly applied to the symmetric bilinear function spaces $\mathscr{L}_2^+(V;\mathbb{R})$. A particular application of this
definition is to the metric tensor for a Riemannian manifold, which is defined to be a positive definite
bilinear function on the tangent space at each point of the manifold. (See Section 73.2.)


30.5.3 DEFINITION: A bilinear function $f \in \mathscr{L}_2(V;\mathbb{R})$, for a linear space V, is said to be

(i) positive semi-definite if $\forall v \in V,\ f(v,v) \ge 0$,

(ii) negative semi-definite if $\forall v \in V,\ f(v,v) \le 0$,

(iii) positive definite if $\forall v \in V \setminus \{0_V\},\ f(v,v) > 0$,

(iv) negative definite if $\forall v \in V \setminus \{0_V\},\ f(v,v) < 0$.


30.5.4 REMARK: Alternative terminology for positive and negative semidefinite bilinear functions.
Lang [108], pages 583 and 597, uses the term “semipositive” to mean “positive semi-definite”.
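The four conditions in Definition 30.5.3 can be tested numerically in the simplest non-trivial case. The following Python sketch classifies a bilinear function on the Cartesian plane which is given by a 2×2 matrix, using the closed-form eigenvalues of the symmetric part of the matrix. The function name `classify_bilinear_2x2` and the matrix representation are assumptions made for this sketch. Exact comparisons with zero are used, so the sketch is reliable only for inputs where rounding error is not an issue.

```python
import math


def classify_bilinear_2x2(a):
    """List which conditions of Definition 30.5.3 hold for f(u, v) = u^T a v on R^2.

    The value f(v, v) depends only on the symmetric part of the matrix a,
    so the classification uses the eigenvalues of that symmetric part.
    """
    s11 = a[0][0]
    s22 = a[1][1]
    s12 = (a[0][1] + a[1][0]) / 2
    # Eigenvalues of [[s11, s12], [s12, s22]] are mean - radius and mean + radius.
    mean = (s11 + s22) / 2
    radius = math.hypot((s11 - s22) / 2, s12)
    lo, hi = mean - radius, mean + radius
    labels = []
    if lo >= 0:
        labels.append("positive semi-definite")
    if hi <= 0:
        labels.append("negative semi-definite")
    if lo > 0:
        labels.append("positive definite")
    if hi < 0:
        labels.append("negative definite")
    return labels
```

For example, the identity matrix satisfies conditions (i) and (iii), the matrix with rows (1, 0) and (0, -1) satisfies none of the four conditions, and the antisymmetric matrix with rows (0, 1) and (-1, 0) satisfies both (i) and (ii) because f(v, v) = 0 for all v.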
30.6.1 REMARK: Motivation for defining tensor bundles on Cartesian spaces.

The motivation for defining tensor bundles on Cartesian spaces is to provide a standard structure which may 
be imported onto differentiable manifolds. Tensor bundles on Cartesian spaces are also useful in themselves 
if one does not wish to define a full differentiable structure. 


30.6.2 REMARK: Tensor bundles are based on tangent vector bundles and tangent covector bundles. 
Tangent vector bundles on Cartesian spaces are presented in Sections 26.13 and 26.14. Tangent covector 
bundles on Cartesian spaces are presented in Section 26.17. Section 30.6 applies the definitions of tensors to 
construct the corresponding tensor bundles on Cartesian spaces. 


Tensor bundles are not fibre bundles because they do not have an explicit structure group. (See Defini- 
tion 21.8.3 for non-topological ordinary fibre bundles.) 


30.6.3 REMARK: Notations and identifications for tensor spaces on Cartesian spaces. 

When the linear space V in Definition 29.5.4 and Notation 29.5.5 for mixed tensor spaces is the tangent-line 
space T,(R”) at a point p in a Cartesian space R” for n € Zi, as in Section 26.13, the dual space is then 
denoted T7 (IR^), as in Section 26.17. Using the mixed tensor space notation C9'" V in Notation 29.5.5, 
one may identify T,(IR") with e T,(IR"), and T5 (IR^) with a T,(IR"). Conversely, one may write 
G9* T; (IR^) as T7*s(IR"), and ®"* T; (IR^) as T7 (IR^). This is done in Notations 30.6.6 and 30.6.7. 


30.6.4 DEFINITION: The (multilinear-style) tensor space of type (r,s) at a point p in a Cartesian space IR" 
with n € Zf, for p € IR" and r,s € Zg, is the tensor space C9" T, (IR") of the tangent line space T; (IR^). 


30.6.5 DEFINITION: The (linear-style) tensor space of type (r,s) at a point p in a Cartesian space IR" 
with n € Zg , for p € IR" and r,s € Zf, is the tensor space G9" "^ T, (IR") of the tangent line space T;,(IR”). 


30.6.6 NOTATION: 77"(IR"), for p € IR", r,s € Zj and n € Zj; , denotes the tensor space &9^* T, (IR^). 
30.6.7 NOTATION: 77"*(IR"), for p € R”, r,s € Zj and n € Zj, denotes the tensor space ®"”* T, (IR"). 
30.6.8 REMARK: Tangent covector spaces. 

The tensor space T?! (IR^) is the same as the tangent covector space T7 (IR^) in Notation 26.17.4. The tensor 


space 79?! (IR?) is equivalent to the tangent covector space T*(R”). 
p p 


30.6.9 DEFINITION: A (multilinear-style) tensor of type (r,s) at a point p in a Cartesian space $\mathbb{R}^n$ with
$n \in \mathbb{Z}_0^+$, for $p \in \mathbb{R}^n$ and $r,s \in \mathbb{Z}_0^+$, is an element of the multilinear-style tensor space of type (r,s) at p.


30.6.10 DEFINITION: A (linear-style) tensor of type (r,s) at a point p in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$,
for $p \in \mathbb{R}^n$ and $r,s \in \mathbb{Z}_0^+$, is an element of the linear-style tensor space of type (r,s) at p.


30.6.11 REMARK: Tensor bundles on Cartesian spaces. 

Fibre bundles are introduced in Chapter 21. Tangent bundles on manifolds are defined in Chapter 54. 
General tensor bundles on manifolds are defined in Chapter 56. It is convenient to use the language of tensor 
bundle total spaces for the sets in Notation 30.6.12 in anticipation of later definitions. (See Notations 56.5.14 
and 56.7.4 for the corresponding bundle total spaces on differentiable manifolds. See Definition 21.8.3 for 
non-topological fibre bundles.) 


30.6.12 NOTATION: Tensor bundle total space sets for Cartesian spaces. 


T™S(T(R”)), for r,s € Zj and n € Zf, denotes the set enn tp” (R”). 


T5 (T (R?)), for r,s € Zf and n € Zf, denotes the set User» 75 R”). 
As(T(IR?)), for s € Zj and n € Zt, denotes the set Uper» As(Tp(R")). 
A,(T(IR"),W), for s € Z{, n € Z{, and a real linear space W, denotes the set Uper» As(Tp(R”), W). 


£,(T(R"),W), for s € Zg, n € Zi, and a real linear space W, denotes the set UJ, ci ZA (T5 IR"), W). 
30.6.13 REMARK: Sets of cross-sections of tensor bundle total spaces for Cartesian spaces.

For general non-topological fibrations $(E, \pi, B)$, Notation 21.3.4 lets $X(E, \pi, B)$ denote the set of
cross-sections $X : B \to E$ such that $\forall b \in B,\ X(b) \in \pi^{-1}(\{b\})$. In the case of tensor bundles of various kinds, the
base space and projection map are clear from the context and do not need to be included in the notation.
Therefore one generally writes $X(T^{r,s}(T(\mathbb{R}^n)))$ instead of $X(T^{r,s}(T(\mathbb{R}^n)), \pi, \mathbb{R}^n)$, and so forth. This is
stated somewhat informally in Notation 30.6.14.


30.6.14 NOTATION: $X(E)$, for any tensor bundle total space E for a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$,
denotes the set of cross-sections of E. In other words, $X(E) = \{X : \mathbb{R}^n \to E;\ \forall p \in \mathbb{R}^n,\ X(p) \in \pi^{-1}(\{p\})\}$,
where $\pi : E \to \mathbb{R}^n$ is the standard projection map for E.

Then $X(E\,|\,U)$, for a subset U of $\mathbb{R}^n$, denotes the set $\{X : U \to E;\ \forall p \in U,\ X(p) \in \pi^{-1}(\{p\})\}$.


(These notations apply to the spaces in Notations 26.14.3 and 30.6.12 in particular.) 


30.6.15 REMARK:  Tensors parametrised by their components with respect to the standard basis. 

The standard basis (em)% -1 € (IR")" in Notations 30.6.16 and 30.6.17 is the standard basis of the Cartesian 
space R” which is defined in Definition 22.7.9. Tangent vectors Lp, and tangent covectors Ly, for p, v, w € 
R” are defined in Notation 26.13.4 and Notation 26.17.8 respectively. 

Simple tensors of the form (@{_, vg) Q (8$., 4^) € .Z((V*)', V5) for (vk); € V" and (95; , € (V*)5, 
where V is a finite-dimensional linear space, are defined in Notation 29.6.9. Simple tensors of the form 
GL Qv, > 9244! € Lin(G(V),Z(V)) for (vk); € V" and (95); , € (V*)* Lin(G(V),Z%(V)) are 
defined in Notation 29.6.10. 


30.6.16 NOTATION: Lpa for r,s € Zi for p € R” and a: N7, x NS — R for some n € Zt, denotes the 


multilinear-style tensor in 77*(IR") which is defined by 
Ls = i, 9 Lye. & »L* 
p,a ES ;((&, D, in) (@ p.ej, )) 
where J = N}, J = N$, and (em)m=1 € (IR")" is the standard basis of IR^. 


30.6.17 NOTATION: Lpa, for r,s € Z, for p € R” and a: N? x NS > R for some n € Zf, denotes the 


linear-style tensor in 75**(IR") which is defined by 


; T s 
LS = a’ 5 Q L ] = Q L* . 
p,a EP CS, P,ĉik lal Dez)? 


where J = N}, J = N5, and (em)m=1 € (R”)” is the standard basis of R”. 
Part III


Topology and analysis


The informal outlines at the beginning of Parts I, II, III and IV of this book are not intended to be read.
However, these outlines could possibly be of some value as an adjunct to the table of contents and index.
The reader is recommended to skip immediately to Chapter 31.


PART III: Topology and analysis


(1) General topology is divided between Chapters 31-39. This is followed by general analysis in Chapters
40-46. Topological fibre bundles are presented in Chapters 47-48. (See Chapter 21 for non-topological
fibre bundles.) 


(2) All of the topology and analysis in this book is presented in terms of symbolic predicate logic so as 
to maximise precision of expression. In these subjects, where infinities are the rule, not the exception, 
informal language frequently leads to ambiguity, incomprehensibility, or even serious errors. Whenever 
the symbolic logic is difficult to read, it is also explained informally in remarks. (This is like comments 
in computer code. The comments are for humans. The code is for the computer.) 


(3) There are two kinds of topology, sometimes called “point set topology” and “combinatorial topology”.
In this book, only point set topology is required, although in a large proportion of differential geometry 
books, combinatorial topology is developed in some depth. However, combinatorial topology, including 
algebraic topology, is outside the scope of this book. 


CHAPTER 31: Topology 


(1) Chapter 31 presents the initial basic concepts of topology, including topological spaces, open and closed 
sets, relative topology, open covers, the interior, exterior, closure and boundary of sets, limit points and 
isolated points of sets, and some simple examples of topologies on finite and infinite sets. Then the basic 
definitions of continuous functions and homeomorphisms are given. 


(2) Section 31.1 is an informal discussion of the differences between point-set and combinatorial topology, 
and the history and naming of these branches of topology. 


(3) Section 31.2 is an informal discussion of the intuitive basis of topology. 


(4) Section 31.3 commences the more formal presentation of topology with the standard formal definition 
of a topology in Definition 31.3.2. The elements of a topology are called “open sets”.


(5) Section 31.4 defines closed sets to be the complements of open sets. A popular alternative definition for 
closed sets is that they are sets which contain all of their limit points. Then it is shown that such sets 
are complements of open sets. Such an approach is closer to the intuitive meaning of a closed set, but 
defining limit points and demonstrating the equivalence of the two definitions is not completely trivial. 
The equivalence is shown here in Theorem 31.10.12(vi). It seems preferable to introduce closed sets 
very early in the presentation as they are here. 


(6) Section 31.5 presents some topologies on finite sets. Since topology is quintessentially infinite by nature, 
finite-set topologies might seem irrelevant. They do have real applications, but more importantly, they 
provide examples to demonstrate how the basic logic of some theorems works, and counterexamples to 
quickly eliminate some intuitively appealing conjectures which are in fact wrong. 


(7) Section 31.6 defines “relative topology”, which is one of the most common ways of defining a topology.
If a set already has a topology, then every subset inherits a relative topology by restricting the parent 
set’s topology in an obvious way. 
(8) Section 31.7 defines a fundamental analytical tool called an “open cover”.

(9) Sections 31.8 and 31.9 define the interior, closure, exterior and boundary of sets. These four concepts
have strong intuitive meanings which are possibly even more important than the definition of a topology.
In fact, these operators contain the same information as the topology itself. (Some authors axiomatise
topology in terms of the interior or closure instead of open sets.) Table 31.8.1 in Remark 31.8.10 is
a survey of the wide range of notations for these four concepts in the mathematics literature. Numerous
useful properties are given for these concepts, which are frequently used in this book.

(10) Section 31.10 defines limit points of sets (also known as accumulation points or cluster points), and
isolated points. A substantial proportion of analysis is concerned with demonstrating the existence
of limit points of one kind or another. In application contexts, it is easy to make errors by following
intuition. So precise definitions for these concepts are required. Also defined here are oo-limit points,
which are sometimes closer to intuitive notions of limit points than the standard definition. Limit points
must not be confused with sequential limits (although they often are).

(11) Section 31.11 presents some simple topologies on infinite sets. These are very broadly useful as examples
and counterexamples.

(12) Sections 31.12, 31.13 and 31.14 give basic initial definitions for continuous functions and homeomorphisms
(which are “bidirectional” continuous functions). The serious consideration of continuous functions
commences in Chapter 35, but some definitions and properties are required much earlier, for example
for defining some function-induced topologies in Section 32.8, for product topologies in Sections 32.9
and 32.12, and for some separation classes in Section 33.3.
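The finite-set topologies mentioned for Section 31.5 can be checked mechanically. The following Python sketch tests whether a finite collection of subsets of a finite set is a topology. The function name `is_topology` is an assumption made for this sketch. Because the collection is finite, closure under pairwise unions and pairwise intersections is enough to give closure under arbitrary unions and finite intersections.

```python
from itertools import combinations


def is_topology(x, opens):
    """Return True if the finite collection `opens` is a topology on the finite set x."""
    x = frozenset(x)
    t = {frozenset(s) for s in opens}
    # The empty set and the whole set must be open.
    if frozenset() not in t or x not in t:
        return False
    # Every open set must be a subset of x.
    if any(not s <= x for s in t):
        return False
    # For a finite collection, pairwise closure implies closure under
    # arbitrary unions and finite intersections.
    return all((a | b) in t and (a & b) in t for a, b in combinations(t, 2))
```

For instance, the nested collection consisting of the empty set, {1}, {1, 2} and {1, 2, 3} is a topology on {1, 2, 3}, whereas the collection consisting of the empty set, {1}, {2} and {1, 2, 3} is not, because the union {1, 2} is missing.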


CHAPTER 32: Topological space constructions 


(1) Chapter 32 introduces various methods for constructing topological spaces. Typically one or more
topologies are given as inputs, and the methods convert these into new topological spaces as outputs.

(2) Section 32.1 generates a topology on a set from an arbitrary collection of subsets of that set. This is
the smallest topology which contains those subsets. Applications include the definitions of topologies
for the real numbers, Cartesian spaces and general metric spaces.

(3) Sections 32.2 and 32.3 introduce “open bases” and “open subbases”. These are the inverse of the
topology generation method in Section 32.1. In other words, a topology can be re-generated from an
open base or subbase. Open bases and subbases are useful because some properties of topologies can be
demonstrated much more easily by testing the base or subbase instead of the full topology. For example,
if the base or subbase is explicitly definable, or is countably infinite, this often makes the task easier.

(4) Section 32.4 generates a topology on a set from the inverse images of functions on that set. This is a
special case of the method in Section 32.1, and it is important for defining topological manifolds.

(5) Section 32.5 defines standard topologies for the integers, extended integers, rational numbers and real
numbers.

(6) Section 32.6 defines the standard topology for Cartesian spaces.

(7) Section 32.7 demonstrates that every open set of real numbers is equal to a countable disjoint union of
open intervals (without using the axiom of choice). This has applications including the demonstration
in Sections 45.6 and 45.7 that Lipschitz functions are differentiable almost everywhere.

(8) Section 32.8 concerns some topological space constructions which use continuous functions.

(9) Section 32.9 defines the direct product of two topological spaces.

(10) Section 32.10 gives various properties of “slice sets” and projection maps of products of topological
spaces.

(11) Section 32.11 presents “product-structured” topological spaces, which are homeomorphic to products of
topological spaces. This topic is particularly relevant to the definition and properties of the fibre sets
of topological fibre bundles.

(12) Section 32.12 extends topological space products from two spaces to general families of spaces.

(13) Sections 32.13, 32.14 and 32.15 present various ways of combining topologies on overlapping sets to define
a global topology. These include topological quotient and identification spaces, set-union topologies, and
topological “patchwork spaces”.
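The generation of a topology from an arbitrary collection of subsets, as outlined for Section 32.1, can be sketched for finite sets as follows. The Python function name `generate_topology` is an assumption made for this sketch, and the fixed-point iteration is only one of several possible ways to compute the result. Starting from the given subsets together with the empty set and the whole set, the collection is repeatedly closed under pairwise unions and intersections until nothing new appears. For a finite set, the result is the smallest topology which contains the given subsets.

```python
def generate_topology(x, subsets):
    """Smallest topology on the finite set x which contains the given subsets of x."""
    x = frozenset(x)
    opens = {frozenset(), x} | {frozenset(s) for s in subsets}
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so that new sets can be added during the loop.
        for a in list(opens):
            for b in list(opens):
                for c in (a | b, a & b):
                    if c not in opens:
                        opens.add(c)
                        changed = True
    return opens
```

For the set {1, 2, 3} and the generating subsets {1, 2} and {2, 3}, the generated topology consists of the empty set, {2}, {1, 2}, {2, 3} and {1, 2, 3}. The set {2} appears as the intersection of the two generators.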
CHAPTER 33: Topological space classes

(1) General topology is a general framework within which many related subjects are combined. However,
not much can be proved from the very weak assumptions placed on this very general framework. Each
topological space class is defined by one or more conditions which narrow down the range of topological
spaces so that more assertions can be proved. The narrower the class, the more that can be said.
Chapter 33 introduces some of the most important topological space classes. Connectedness classes are
delayed to Chapter 34. The class of metric spaces is delayed to Chapter 37.

(2) Section 33.1 defines the weaker separation classes which are known as $T_0$, $T_1$ and $T_2$. The conditions
imposed on these classes are so weak that they were originally assumed to be true of all topological
spaces. In fact, spaces which are not in these classes typically have a pathological character. The $T_2$
class is also known as the Hausdorff class. This is important for differential geometry because locally
Cartesian spaces do not necessarily have even this very weak Hausdorff property. (Many examples are
given in Section 49.5.)

(3) Before progressing to the strong separation classes, Section 33.2 defines two notions of the separation of
pairs of sets, referred to here as “weak separation” and “strong separation”. Confusion between these
two notions can easily lead to erroneous assumptions. Therefore their properties and differences are
emphasised here. (There is no difference between them in a Ts space or metric space.)

(4) Section 33.3 introduces stronger topological separation classes. Spaces in these classes are known as
T5, T; p Ta, T; and Ts spaces. The relations between the stronger separation classes are not simple
inclusions according to numerical order as in the case of the weaker separation classes. The untidy
details of these relations are summarised in Figure 33.3.7 in Remark 33.3.36.

(5) Section 33.4 presents separable spaces, and first and second countable spaces. (Confusingly, separable
spaces are not directly related to separability classes, despite the very similar name.) These classes are
concerned with the countability of dense subsets, global open bases and pointwise open bases respectively.
These properties are often indirectly useful because they imply other properties which are directly useful.
Since these classes require only the existence of certain kinds of subsets or open bases, the ability to
choose these subsets or open bases often leads to invocations of the axiom of choice to combine infinitely
many of these choices. To combat this issue, explicit first and second countable spaces are defined.

(6) Section 33.5 introduces open covers and compactness. There are many kinds of topological compactness,
which are often confused in the mathematics literature. Literature which uses this concept must always
be examined carefully to determine which definitions and terminology are being used. The style of
compactness defined in Section 33.5 is expressed in terms of subcovers of open covers.

(7) Section 33.6 introduces local compactness, which is implied by the global compactness in Section 33.5.

(8) Section 33.7 presents some variations of the compactness class in Section 33.5. These include the classes
of countably compact, Lindelöf and paracompact spaces. These classes often require the axiom of choice
in applications. So they are defined here more on a “know-thy-enemy” basis than because they are
really applicable. They are often encountered in differential geometry books, although they are rarely
genuinely useful.

(9) Section 33.8 defines “topological dimension”. This is rarely useful in differential geometry, although it
has a very limited relevance for Hilbert’s 5th problem for Lie groups in Section 62.1. The “Lebesgue
dimension” of a topological space is difficult to define and difficult to use.


CHAPTER 34: Topological connectedness 


(1) Chapter 34 defines connected and locally connected sets in topological spaces. (Pathwise connected sets
are defined in Section 36.7.)

(2) Section 34.1 defines connected sets to be sets which cannot be disconnected. This is in contrast to
the definition of pathwise connectedness, which requires the existence of some kind of structure. The
definition of connected sets requires the non-existence of some kind of structure, which in this case is a
disjoint open cover of a pair of sets. This definition of connectedness has some unintuitive properties,
possibly because pathwise connectedness is closer to one’s intuition of connectedness.

(3) Section 34.2 shows that in Ts separated spaces, disconnectedness of a pair of sets is equivalent to weak
separation of the sets. This has some technical benefits.

(4) Section 34.3 is about “disconnections”, which are the disjoint open covers which appear in the definition
of a disconnected set.

(5) Section 34.4 presents and proves several practical methods for proving that a set is connected.

(6) Section 34.5 is on partitioning sets into their connected components. Section 34.6 is on the “connected
component map”, which maps each point in a set to its component in the connected component partition.

(7) Section 34.7 is on local connectedness, which in general neither implies nor is implied by connectedness.

(8) Section 34.8 shows that open sets in locally connected separable spaces have countably many connected
components, which is a generalisation of the same observation for the topological space of real numbers
in Section 32.7.

(9) Section 34.9 gives some topological properties of the real numbers, including the Heine-Borel theorem,
which states that all bounded closed sets of real numbers are compact.


CHAPTER 35: Continuity and limits 


(1) Chapter 35 is principally concerned with continuity, which is often considered to be the central concept
of topology. Applications of the properties of continuous functions are almost ubiquitous in differential
geometry. Chapter 35 is also concerned with the very closely related topic of limits of functions. Since
there are many kinds of limits, this is not as simple as it might seem.

(2) Section 35.1 presents numerous useful set-map properties of continuous functions. These are the set-maps
and inverse set-maps which are presented in Sections 10.6 and 10.7 for general functions.

(3) Section 35.2 presents, for philosophical interest (with no applications in the rest of the book), a
non-standard definition of function continuity in terms of connectedness of function images and inverse
images instead of the usual definition in terms of open sets. It is shown in Theorems 35.2.17 and 35.2.19
to be equivalent to the standard definition if the target space is $T_1$ or $T_3$, depending on the choice of
function connectedness definition.

(4) Section 35.3 is about limits and convergence of functions. These are generalisations of the well-known
corresponding concepts in metric spaces, particularly in introductory analysis. Similarly, Section 35.4
is about limits and convergence of sequences.

(5) Section 35.5 is about definitions of compactness which are based on limit points of sets. As summarised in
Remark 35.5.1, there are several kinds of compactness in each of three broad classes, namely compactness
which is based on open covers, on limit points of sets, and on limit points of sequences. These are often
confused in the literature. So they need to be carefully distinguished here. This topic is closely related
to the Bolzano-Weierstraß property, which also refers to a range of different concepts in the literature.
The subject is further confused by interactions with the axiom of choice. To clear up ambiguities, these
concepts are defined here with some precision, and their interrelations are presented with some care.

(6) Section 35.6 concerns compactness definitions which are based on limits of sequences of points. This
section continues the attempt in Section 35.5 to bring some clarity to the bewildering range of definitions
concerned with compactness.

(7) Section 35.7 is about limit points and limits of sequences of points in sets of real numbers. This includes
the Bolzano-Weierstraß property for real numbers and for Cartesian spaces.

(8) Section 35.8 presents inferior and superior limits of real-valued functions on topological spaces.


CHAPTER 36: Topological curves, paths and groups 


(1) Section 36.1 considers terminology for curves and paths in topological spaces. A curve is defined here
as a continuous map from a real interval to a topological space. A path is defined as an equivalence
class of curves with respect to some equivalence relation. (A survey of terminology for curves, paths
and arcs is given in Table 36.1.1 in Remark 36.1.1.) Since there are many kinds of equivalence relations
for curves, the subject of paths is untidy. Most of the time, it is curves which are used in differential
geometry. It is typically preferable to deal with equivalence according to the needs of each context.

(2) Section 36.2 defines various kinds of continuous curves.

(3) Section 36.3 is on the special topic of “space-filling curves”, which are useful as counterexamples,
particularly in regard to immersions and embeddings of manifolds.

(4) Section 36.4 is concerned with the removal of stationary intervals (or “constant stretches”) from curves.

(5) Section 36.5 considers some equivalence relations for curves. The underlying motivation for considering
this topic is to be able to say that parallel transport in a topological fibre bundle is a function of
an equivalence class of curves instead of each curve individually. However, the benefits of using such
equivalence classes (here called “paths”) are not very great.

(6) Section 36.6 defines concatenation of curves. Again the underlying motivation is in applications to
parallel transport. It is desirable to be able to say that the parallel transport along the concatenation
of two curves equals the composition of the parallel transport maps for the individual curves.

(7) Section 36.7 is about pathwise connected sets. Although this is the standard terminology for this kind
of connectivity, it is more accurate to refer to it as “curvewise connectivity” in terms of the definitions
given here. The most important property is Theorem 36.7.9, which states that pathwise connected sets
are connected.

(8) Section 36.8 defines paths as equivalence classes of curves. It turns out that these are not as useful as
one would expect.

(9) Section 36.9 defines topological groups. Sections 36.10 and 36.11 define topological left and right
transformation groups respectively, which are useful for defining topological fibre bundles.


CHAPTER 37: Metric spaces 


(1) The metric spaces in Chapters 37 and 38 are often presented before general topological spaces in order
to commence with ideas which are more familiar to the reader. However, this has the unfortunate
consequence that it is then difficult to forget the facts regarding metric spaces when general topological
spaces are introduced. It is for this reason that this book is mostly presented in general-before-specific
order. Then the facts from earlier chapters apply to all later chapters. It is easier to remember something
than to unremember it.

(2) Section 37.1 defines very general metric functions, and a metric space is then a set with a metric function.
This is a minimalist algebraic definition where the metric function is valued in an ordered commutative
group, which is too general for differential geometry.

(3) Section 37.2 specialises the very general metric functions in Section 37.1 to have real values, which is
the usual way to define them.

(4) Section 37.3 defines open and closed balls, punctured balls and annuli. Open balls are most important
for their role as an open base for the topology induced by a metric space. Open and closed balls are
also useful for compactly expressing ideas in definitions and theorems.

(5) Section 37.4 defines set distance and set diameter. A bounded set is a set whose diameter is finite.

(6) Section 37.5 defines the topology induced by a metric function in terms of its open balls.

(7) Section 37.6 presents some topological properties of metric spaces, particularly regarding the interior,
closure, exterior and boundary of sets.

(8) Section 37.7 applies the general topological space concepts of compactness and convergence in Sections
33.5 and 35.4 to metric spaces.

(9) Section 37.8 defines Cauchy sequences and completeness of metric spaces.

(10) Section 37.9 presents nested set convergence theorems for complete metric spaces.


CHAPTER 38: Metric space continuity 


(1) Section 38.1 is about continuity of maps between metric spaces. Continuity for metric spaces can be
expressed in terms of open balls. This is more convenient than the general topological space definition
in terms of open sets.

(2) Section 38.2 points out that the dependence of δ on ε in the typical definition of continuity for metric
spaces can be regarded as an explicit function, without using any axiom of choice.

(3) Section 38.3 defines uniform continuity of maps between metric spaces.

(4) Section 38.4 defines uniform convergence of sequences of metric-space-valued functions.

(5) Section 38.5 presents Ascoli’s theorem for convergent subsequences of bounded sequences of functions.

(6) Section 38.6 defines six different variants of Lipschitz continuity.


(7) Section 38.7 very briefly defines Hölder continuity, which is not much used in this book. Hölder continuity
is a generalisation of Lipschitz continuity.


(8) Section 38.8 defines the length of a curve in a metric space and presents some properties. 


(9) Section 38.9 is about rectifiable curves and paths in metric spaces. These are closely related to functions 
of bounded variation. A rectifiable curve has finite length. The importance of rectifiable curves in 
manifolds comes from the fact that they have a well-defined velocity almost everywhere, which makes 
them suitable for defining parallel transport. 


(10) Section 38.10 defines functions of bounded variation for general metric spaces, although the theory for 
such functions is usually restricted to normed linear spaces. 
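The length of a curve in a metric space, as outlined for Sections 38.8 and 38.9, is commonly defined as the supremum of the sums of distances between consecutive points of the curve over all partitions of the parameter interval. The following Python sketch computes this sum for a single uniform partition, which gives a lower bound for the length. The function name `polygonal_length` and its parameters are assumptions made for this sketch, and the precise definition in Section 38.8 may differ in detail.

```python
import math


def polygonal_length(curve, a, b, n, dist):
    """Sum of distances between consecutive points of `curve` on a uniform partition.

    `curve` maps a real parameter in [a, b] to a point of a metric space,
    `dist` is the metric function, and n is the number of sub-intervals.
    The result is a lower bound for the length of the curve.
    """
    points = [curve(a + (b - a) * i / n) for i in range(n + 1)]
    return sum(dist(p, q) for p, q in zip(points, points[1:]))


def circle(t):
    """Unit circle in the Cartesian plane, used here as example data."""
    return (math.cos(t), math.sin(t))
```

With the Euclidean distance `math.dist`, the value of `polygonal_length(circle, 0.0, 2 * math.pi, n, math.dist)` is the perimeter of an inscribed regular polygon with n sides. It increases as n is doubled and stays below 2π, which is consistent with the unit circle being a rectifiable curve of length 2π.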


CHAPTER 39: Topological linear spaces 

(1) Section 39.1 defines topological linear spaces, which provide the necessary structure for defining
convergence of infinite series.

(2) Section 39.2 presents infinite series in topological linear spaces.

(3) Section 39.3 introduces real and complex normed linear spaces.

(4) Section 39.4 introduces real and complex Banach spaces.

(5) Section 39.5 introduces finite-dimensional normed linear spaces, which are Banach spaces.

(6) Section 39.6 defines topological groups of finite-dimensional linear transformations.

(7) Section 39.7 introduces Landau’s order symbols “O” and “o”, which often provide a useful shorthand
for definitions and proofs in analysis.


CHAPTER 40: Differential calculus 


(1) Chapters 40-42 present some differential calculus concepts which are useful for differential geometry. 
Until a hundred years ago, differential geometry could be considered explicitly or implicitly as a topic 
lying within the scope of differential calculus, as shown by the original name “absolute differential 
calculus” for “tensor calculus”. One could still think of differential geometry as “differential calculus 
with a geometric narrative”. 


(2) Section 40.1 is an informal discussion of the nature of “velocity”, the fundamental meaning which under- 
lies differentiation. (For example, derivatives were referred to as “fluxions” in Newton’s terminology.) 
It is emphasised here that velocity is equally related to both space and time, but time is absent from 
static geometry. Archimedes added time to Euclid’s geometry, but later pure geometers unfortunately 
rejected any role for time. In differential geometry, the role of time must be welcomed. 


(3) In Section 40.2, it is proposed that parametrised lines correspond to the fundamental essence underlying 
derivatives, velocities and tangent vectors. Differential calculus is largely concerned with approximation 
of curves by lines. In practical computations, points and velocities are represented by numbers, but 
velocity cannot be defined within the point space unless time is incorporated. This helps to answer the 
question of how to best represent tangent vectors on manifolds. 


(4) Section 40.3 defines differentiability of real-valued functions of a single real variable.

(5) Section 40.4 defines derivatives of differentiable real-valued functions of a single real variable.

(6) Section 40.5 presents the constant multiplication rule, sum rule, product rule, reciprocal rule, quotient
rule and chain rule for differentiation.

(7) Section 40.6 presents the mean value theorem and Rolle’s theorem, and some useful consequences.

(8) Section 40.7 defines differentiable vector-valued functions.

(9) Section 40.8 presents the so-called mean value theorem for several dependent variables. It states that
the average speed of a curve between two points does not exceed the maximum speed at points along
the curve. This is very easy to show with integral calculus, but Theorems 40.8.5 and 40.8.5 show that
it can be proved using only differential calculus.

(10) Section 40.9 presents unidirectional differentiability of real functions of a real variable.

(11) Section 40.10 presents Dini derivatives, which are used in the proof of the Lebesgue differentiation
theorem in Section 45.7.
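
The statement summarised for Section 40.8 can be written as an inequality. The following is only a sketch under assumptions made here, not the book's own formulation: the curve is taken to be a map γ from [a, b] to a Cartesian space which is continuous on [a, b] and differentiable on (a, b), the bars denote the Euclidean norm, and the “maximum speed” is read as a supremum.

```latex
\[
  \frac{|\gamma(b) - \gamma(a)|}{b - a}
  \;\le\;
  \sup_{t \in (a,b)} |\gamma'(t)| .
\]
```

The left side is the average speed between the two endpoints, and the right side is the least upper bound of the speed along the curve.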
CHAPTER 41: Multi-variable differential calculus


(1) Sections 41.1, 41.4 and 41.6 present respectively partial derivatives, directional derivatives and total
derivatives of real-valued functions on Cartesian spaces. These are progressively stronger forms of
derivatives, although if any of them is continuous, it then follows that they are all equivalent.

(2) Section 41.1 defines partial derivatives of real-valued functions on Cartesian spaces. These are very weak
(if they are not assumed to be continuous), but they are computationally the easiest kinds of derivatives
to manage for such functions.

(3) Section 41.2 extends partial derivatives from real-valued functions on Cartesian spaces to maps between
general Cartesian spaces.

(4) Section 41.3 presents some minor differentiability theorems for joint-domain and joint-range maps. This
means that map domains or ranges which are direct products of Cartesian spaces are regarded as
Cartesian spaces by identifying $\mathbb{R}^m \times \mathbb{R}^n$ with $\mathbb{R}^{m+n}$.

(5) Section 41.4 is about directional derivatives of real-valued functions on Cartesian spaces, whereby
derivatives are computed in all directions at a point, not only in the axial directions as in the case of
partial derivatives.

(6) Section 41.5 defines unidirectional derivatives of real-valued functions on Cartesian spaces. These can
be useful at boundaries of regions.

(7) Section 41.6 defines total differentiability of real-valued functions on Cartesian spaces. This is a very
strong form of definition for multiple independent variables. This requires the directional derivatives at a
point to be a linear function of the direction vector, and the convergence must be uniform with respect
to direction. This is stronger than both partial and directional differentiability. It is therefore
the most difficult to verify directly. But without total differentiability, it would not be possible to
express all tangent vectors on differentiable manifolds in terms of a simple n-tuple of real numbers.
Luckily Theorem 7.2.9, here called “the total differential theorem”, shows that a continuously partially
differentiable function is also continuously totally differentiable. This explains why differentiability in
differential geometry is mostly required to be continuous differentiability.

(8) Section 41.7 presents chain rules for $C^1$ maps and an inverse rule for $C^1$ diffeomorphisms.

(9) Section 41.8 extends differentiation of functions on Cartesian spaces from Cartesian-space-valued
functions to general finite-dimensional real-linear-space-valued functions.

(10) Section 41.9 extends differentiation from Cartesian spaces to abstract linear spaces.

(11) Section 41.10 presents the inverse function theorem and implicit function theorem. These are important
for the existence and regularity of immersions and embeddings of differentiable manifolds.
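
The remark that partial derivatives are “very weak” unless they are continuous can be illustrated numerically. The sketch below uses a standard textbook function, f(x, y) = xy/(x² + y²) with f(0, 0) = 0, which is not taken from this book; the names `f` and `difference_quotient` are hypothetical helpers introduced only for this illustration. Both partial derivatives exist at the origin, yet the function is not even continuous there, so it has no directional derivative along the diagonal and is certainly not totally differentiable.

```python
def f(x, y):
    """Standard counterexample: x*y / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


def difference_quotient(g, h):
    """Difference quotient (g(h) - g(0)) / h of a one-variable function g at 0."""
    return (g(h) - g(0.0)) / h


# Step sizes shrinking towards zero.
steps = [10.0 ** (-k) for k in range(1, 8)]

# Along the axes the function vanishes, so both partial derivatives at the origin are 0.
partial_x = [difference_quotient(lambda t: f(t, 0.0), h) for h in steps]
partial_y = [difference_quotient(lambda t: f(0.0, t), h) for h in steps]

# Along the diagonal the function stays at 1/2 arbitrarily close to the origin,
# so it is discontinuous there and the diagonal difference quotients blow up.
diagonal_values = [f(h, h) for h in steps]
diagonal_quotients = [difference_quotient(lambda t: f(t, t), h) for h in steps]

if __name__ == "__main__":
    print("partial_x quotients:", partial_x)
    print("diagonal values:", diagonal_values)
    print("diagonal quotients:", diagonal_quotients)
```

This is consistent with the total differential theorem mentioned above: the partial derivatives of this function are not continuous at the origin, so that theorem does not apply to it.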


CHAPTER 42: Higher-order derivatives 


(1) Section 42.1 presents higher-order derivatives of real functions of a real variable.

(2) Sections 42.2, 42.3 and 42.4 present definitions, notations and basic properties for higher derivatives
of real functions on Cartesian spaces. These are applicable to real-valued functions on differentiable
manifolds.

(3) Section 42.5 presents definitions, notations and basic properties for higher-order derivatives of maps
between Cartesian spaces. These are applicable to maps between differentiable manifolds.

(4) Section 42.6 applies higher-order derivatives of maps between Cartesian spaces to various kinds of
constructions.

(5) Section 42.7 defines Cartesian space diffeomorphisms and related morphisms.

(6) Section 42.8 defines real and complex analytic functions.


CHAPTER 43: Integral calculus 


(1) Chapter 43 is about integral calculus. As mentioned in Remark 43.0.1, integration is typically presented
after differential calculus because integration is a more “advanced technology” than differentiation.
This is perplexing because Archimedes was performing impressive feats of integral calculus more than
2200 years ago, whereas differential calculus is an invention of the 17th century, almost two millennia
later. The 20th century Lebesgue measure and integration “technology” is not presented because it has
numerous technical complexities which are irrelevant and unnecessary for most integration purposes in
differential geometry. The 19th century integration “technology” is sufficient.

(2) Section 43.1 is an overview and brief history of numerous methods of integration.

(3) Section 43.2 makes some comments on the Leibniz-Newton antiderivative style of integration.

(4) Sections 43.3, 43.4 and 43.5 introduce the Cauchy, Cauchy-Riemann and Cauchy-Riemann-Darboux
integrals respectively for real functions of a real variable. These are 19th century integrals, as the
Stieltjes versions of these integrals are also, but they are adequate for most purposes in differential
geometry.

(5) Section 43.6 gives a necessary and sufficient condition for Darboux integrability in terms of the Jordan
content of the set of points where the integrand has positive oscillation.

(6) Section 43.7 presents some basic properties of the Cauchy-Riemann-Darboux integral.

(7) Section 43.8 presents the fundamental theorem of calculus for the Darboux integral.

(8) Section 43.9 applies the Riemann integral to vector-valued integrands.

(9) Section 43.10 introduces the Cauchy-Riemann-Darboux-Stieltjes integral (also known as the
Riemann-Stieltjes integral) for real functions of a real variable. Although this is a 19th century integral, it is
useful and adequate for integration of the velocity of a curve.

(10) Sections 43.11 and 43.12 introduce the Stieltjes integral for vector and linear operator functions of a
real variable. These are the “right” integrals for computing parallel transport for linear connections on
vector bundles along rectifiable curves. (They are used in conjunction with the Picard iteration method
in Definition 44.6.15.)


CHAPTER 44: Differential equations and special functions 


(1) Section 44.1 defines logarithmic and exponential functions of real variables.

(2) Section 44.2 defines trigonometric functions of real variables.

(3) Section 44.3 presents the Peano method for demonstrating existence of solutions of systems of ordinary
differential equations. This has some small advantages relative to the Picard method which has largely
displaced it.

(4) Section 44.4 presents the Peano ODE existence method for systems of equations.

(5) Section 44.5 presents uniqueness theory for ODEs and ODE systems.

(6) Section 44.6 presents the Picard integration-iteration method for demonstrating existence for systems of
ordinary differential equations. Despite the name of this topic, it is actually a generalisation of integral
calculus, which is in essence the solution of equations such as f'(x) = g(x), which is a very simple kind
of ODE. In the ODE subject, the objective is always to solve them, not just compute the derivatives!
ODEs are central to differential geometry because parallel transport along a curve is defined as the
solution of an ODE. The Picard iteration method in Definition 44.6.15 (which is also 19th century
“technology”) is powerful enough to prove existence of solutions for the linear connections on general
vector bundles which appear in differential geometry.

(7) Section 44.7 presents the Picard ODE existence method for systems of equations.

(8) Section 44.8 presents the calculus of variations on Cartesian spaces.
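
The idea of Picard integration-iteration mentioned for Section 44.6 can be sketched on the simplest non-trivial case. The example below is an illustration under assumptions made here, not a rendering of Definition 44.6.15: it takes the scalar equation y'(t) = y(t) with y(0) = 1, represents each iterate as a list of exact polynomial coefficients, and repeatedly applies the map which sends y to 1 plus the integral of y from 0 to t. The function names are hypothetical.

```python
from fractions import Fraction


def integrate_poly(coeffs):
    """Integrate a polynomial from 0 to t; coeffs[k] is the coefficient of t^k."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]


def picard_step(y):
    """One Picard step for y' = y, y(0) = 1: returns 1 + integral from 0 to t of y."""
    next_y = integrate_poly(y)
    next_y[0] += 1  # add the initial value y(0) = 1
    return next_y


def picard_iterates(n):
    """Apply n Picard steps, starting from the constant function y_0(t) = 1."""
    y = [Fraction(1)]
    for _ in range(n):
        y = picard_step(y)
    return y


def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at the point t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    print(picard_iterates(4))
    print(float(evaluate(picard_iterates(15), Fraction(1))))
```

For this particular equation the n-th iterate is the n-th Taylor polynomial of the exponential function, so the iterates converge to the known solution. Each step is an integration, which is the sense in which the method generalises integral calculus.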


CHAPTER 45: Lebesgue measure zero 


(1) Chapter 45 presents, in a cursory manner, Lebesgue measure and integration. Although these are often
used because they are in some sense maximal, they require extensive applications of the axiom of choice
to prove even quite elementary properties. The main difference between the Lebesgue approach to
integration and the earlier Riemann/Stieltjes approach is the summation over horizontal slices in the
Lebesgue case by contrast with the vertical slices in the Riemann/Stieltjes case. Hence set-measure (the
measure of horizontal slices of functions) is a specific characteristic of the Lebesgue theory.

(2) Section 45.1 presents sets of real numbers of measure zero. Although some of the very basic theorems
for sets of measure zero require the axiom of countable choice, it is shown in Sections 45.2 and 45.3 that
these theorems can be replaced with “explicit” versions which do not need any axiom of choice.

(3) Section 45.4 demonstrates in Theorem 45.4.5 that there exist bounded non-decreasing functions which
are non-differentiable at all points of any given set of explicit measure zero. The purpose of this is
to show that the Lebesgue differentiation theorem in Section 45.7 is the best possible constraint on
the subsets of non-decreasing functions where functions are non-differentiable. (In other words, the
Lebesgue differentiation theorem is “sharp”.)

(4) Sections 45.5 and 45.6 are concerned with “shadow sets” and “double shadow sets” respectively. These
are technical constructions which are useful for proving the Lebesgue differentiation theorem.

(5) Section 45.7 presents the Lebesgue differentiation theorem, which states that a non-decreasing real
function (and therefore also a function of bounded variation) is differentiable everywhere except on a set
of measure zero. This is achieved by careful analysis of Dini derivatives with the assistance of shadow
sets and double shadow sets.


CHAPTER 46: Vector field calculus for Cartesian spaces 


(1) Chapter 46 is concerned with vector fields, the Lie bracket, differential forms, the exterior derivative
and the so-called Stokes theorem in Cartesian spaces. (Stokes did not discover it.)

(2) Section 46.1 gives some basic definitions for true vector fields on Cartesian spaces. These are defined in
terms of tangent bundles, whereas the “naive vector fields” in Section 46.3 are not.

(3) Section 46.2 defines differential forms on Cartesian spaces. These are antisymmetric covariant tensor
fields valued in arbitrary finite-dimensional linear spaces.

(4) Section 46.3 defines “naive vector fields” for Cartesian spaces. These fields are defined in the classical
fashion, without the benefit of the fibre bundle perspective.

(5) Section 46.4 defines the Lie bracket for naive vector fields.

(6) Section 46.5 presents the relation between the Lie bracket of two naive vector fields and the deviation
from holonomy of the integral curves of these fields.

(7) Section 46.6 defines “naive differential forms” for Cartesian spaces. These are optimised for computations
related to the exterior derivative.

(8) Section 46.7 introduces the exterior derivative for Cartesian spaces. Since the exterior derivative does
not depend on connections or metrics, it is essentially the same locally for Cartesian spaces as for
differentiable manifolds.

(9) Section 46.8 applies the exterior derivative to general nonholonomic vector-field tuples.

(10) Sections 46.9 and 46.10 introduce the very simple cases of the Stokes theorem for rectangular regions of
$\mathbb{R}^2$ and $\mathbb{R}^3$. These are easy applications of the fundamental theorem of calculus.


CHAPTER 47: Topological fibre bundles 


(1) Chapter 47 introduces topological fibrations (which have no fibre atlas) and topological fibre bundles
(which do have a fibre atlas). Some authors define only differentiable fibre bundles, but there are
good reasons to introduce topological fibre bundles first. The structure of differentiable fibre bundles
is quite complicated, requiring in particular tangent bundles on all constituent spaces. (These spaces
are the base space, the total space, the fibre space and the structure group.) But tangent bundles are
themselves quite complicated. It is helpful to be able to first identify tangent bundles as particular kinds
of topological fibre bundles, and then later incorporate tangent bundles into differentiable fibre bundles.
But a much more important reason for separately defining topological fibre bundles is the fact that they
are significantly more general because they do not require finite-dimensional locally Cartesian structure
on all constituent spaces. (This is why topological fibre bundles can be defined before topological
manifolds.)

(2) Section 47.1 gives an overview of the differences between fibrations, ordinary fibre bundles and principal
fibre bundles.

(3) Section 47.2 defines a minimal kind of topological fibration which has no specific fibre space, but which
does have an “intrinsic fibre space”. (This is analogous to defining a topological manifold without an
atlas.)

(4) Section 47.3 adds a fibre space and fibre charts to a fibration to define the topology of the total space
extrinsically.

(5) Section 47.4 defines cross-sections of fibrations, which must choose a single element from the fibre set
at each base point.

(6) Section 47.5 defines fibre atlases, which are sets of pairwise topologically consistent fibre charts which
cover the whole total space. (These are analogous to the coordinate charts for topological manifolds.)

(7) Section 47.6 defines topological ordinary fibre bundles, which are fibrations with a fibre atlas for a
specified structure group. The structure group constrains the transition maps between overlapping fibre
charts. (This is analogous to the way in which the transition maps of topological and differentiable
manifolds are constrained.) More importantly, the fibre atlas induces structure from the structure
(transformation) group onto the underlying topological fibration.

(8) Section 47.7 introduces topological fibre bundle homomorphisms, isomorphisms and products. These
are similar to the corresponding definitions for other categories.

(9) Section 47.8 defines topological principal fibre bundles. In structural terms, a PFB is like an OFB
except that the fibre space is the same as the structure group. However, the real purpose of a PFB is
to act as the associated frame bundle for an OFB. Then parallelism can be defined on many different
associated OFBs by first defining parallelism on the PFB.

(10) Section 47.9 defines associated topological fibre bundles. A PFB may be associated with one or more
individual OFBs, and the OFBs may be associated with each other.

(11) Section 47.10 presents a “patchwork construction method” for associated topological fibre bundles.

(12) Section 47.11 presents a method of constructing associated topological fibre bundles. This is called here
the “associated topological fibre bundle orbit-space construction method”. Some textbooks give this as
the only way to define associated fibre bundles.

(13) Section 47.12 presents associated contravariant fibre-space-valued functions on principal bundle total
spaces. These have a one-to-one association with cross-sections of an associated ordinary fibre bundle.
So these contravariant functions may be thought of as OFB cross-sections which have been somehow
lifted to the PFB total space so that the PFB connection can act on them. Then this PFB covariant
derivative can be dropped back down to the OFB. (This may sound overly convoluted, but it is effectively
what is done in gauge theory.)

(14) Section 47.13 defines combined topological fibre/frame bundles, which differ from the “baseless
figure/frame bundles” in Section 20.10 by the addition of a base space and topological structure on all
constituent spaces. Both OFBs and PFBs are given greater meaning by being combined.


CHAPTER 48: Parallelism on topological fibre bundles 


(1) Chapter 48 makes a not very successful attempt to define parallelism on topological fibre bundles. The
difficulty lies in the excess of freedom.

(2) Section 48.1 introduces structure-preserving fibre set maps. These are maps between fibre sets at
different base points of a topological fibre bundle which are consistent with the structure group. This
is relevant to the definition of parallelism.

(3) Section 48.2 notes that the objects on which parallelism is defined are equivalence classes of curves.
These are called “parallelism path classes” here. Parallel transport should be the same along two
curves which cover the same path with different parametrisation. Thus a “parallelism path class” is an
equivalence class of curves which are expected to always determine the same parallel transport of fibre
sets along the curves.

(4) Section 48.3 defines a very general notion of parallelism on topological fibre bundles. The concept is
meaningful, but the freedom is excessive. This helps to motivate the notion of a connection, which
defines the differential of parallelism everywhere, so that parallelism can be defined as its integral.

(5) Section 48.4 defines “associated parallelism” on associated topological fibre bundles. For example, this
can be used to define parallelism on an OFB in terms of parallelism on an associated PFB. Alternatively,
parallelism can be defined on OFBs from other associated OFBs. The prime example for parallelism on
affinely connected differentiable manifolds is the method by which parallel transport is defined for all
tensor bundles which are associated with a given tangent bundle. This is why connections and parallel
transport only need to be defined on the tangent bundle, because the tensor bundles are associated with
that tangent bundle. (For example, the Christoffel array is defined for contravariant vectors. Then the
covariant derivative for covariant vectors essentially uses the negative of the Christoffel array, and all
other tensors use mixtures of positive and negative Christoffel arrays.)

31.0.1 REMARK: The most relevant topology topics for differential geometry.

Chapters 31 to 39 are not a full introduction to general topology. Only those aspects of topology which are 
needed for later chapters are presented here. Apart from the basic definitions, the most important aspects 
of topology for differential geometry are topological space construction techniques (such as product and 
quotient topologies), classes of topological spaces, continuous curves and paths, topological transformation 
groups, and metric spaces. 


31.1. History of the two branches of topology 


31.1.1 REMARK: Etymology of topology and analysis situs. The study of locality. 

The word “topology” is generally said to have been introduced in 1847 in a 65-page exposition “Vorstudien
zur Topologie” by Johann Benedict Listing [188]. (See EDM2 [113], article 426.) The word “topology” is
derived from τόπος (topos) meaning “place, spot; passage in a book; region, district; space, locality; position,
rank, opportunity” and λόγος (logos) meaning “thought, reasoning, computation, reckoning, deliberation,
account, consideration, opinion” [and 44 other meanings in my dictionary]. Thus the word “topology”
corresponds roughly to the Latin-derived term “analysis situs”. (The Latin word “situs” is defined by
White [485], page 572, as: “The manner of lying; the situation, local position, site of a thing”.)


Bell [233], page 492, suggested (in 1937) that the word “topology” was earlier than the term “analysis situs”.


[...] topology (now called analysis situs) as first developed bore but little resemblance to the
elaborate theory which today absorbs all the energies of a prolific school [...]


Listing mentions in his 4-page introduction the terms “géométrie de position”, “Geometrie der Lage”,
“problèmes de situation” and “geometria situs” as fore-runners of “Topologie”. However, his examples
of “Topologie” are more closely related to combinatorics than to modern analytical topology. He mentions
Leibniz, Graßmann, Möbius, Carnot, Monge, Euler and Vandermonde as significant names in the historical
development of topology.


According to Struik [249], page 196, 


An important contribution to the further penetration of point-set topology into the main body of 
mathematics was the Grundzüge der Mengenlehre, published in 1914 by Felix Hausdorff, a teacher
in Bonn. It offered an axiomatic definition of what became known as topological space. 


Struik [249], page 199, makes the following comments about the early development of topology. 


R. L. Moore and Veblen were American representatives of that field of mathematics first known as 
analysis situs and after 1900 more and more as that part of topology called combinatory (in contrast 
with point-set topology) as it was explained, for instance, by Hausdorff. Its emergence from a set of 
puzzle-like problems, such as that of Euler on the seven bridges of Königsberg, or the Mobius strip, 
came with the Riemann surfaces in complex function theory, with Jordan’s closed curve theorem, 
and in particular with Poincaré’s publications, between 1895 and 1904, on simplexes, and Betti 
numbers of manifolds. 


It is not immediately obvious why the word “topology” is used for the study of continuity. The connection 
can be seen by thinking about what a set would be without a topology. Then every point is equivalent. 
For example, there would be no concept of a continuous curve. But with a topology, continuous curves are 
constrained to move smoothly from one point to another. If progressively shorter segments of a curve are 
considered, they are progressively more local to their starting point. In other words, a topology gives a set 
a sense of place or locality. This becomes even clearer when considering how neighbourhoods are used in 
defining interior, exterior and boundary points of sets. In fact, the aptness of the word “topology” is most 
clear when a topology is defined in terms of open bases at all points rather than open sets. Thus it may be 
best to think of topology as meaning not “the study of place”, but rather “the study of locality”.


31.1.2 REMARK: The two branches of topology. 

Since the author has encountered some individuals who have strongly asserted the existence of only one 
branch of topology, namely the combinatorial or algebraic branch, sometimes referred to as “rubber sheet 
geometry”, it is perhaps of some value to assert here the existence of two branches by means of some brief 
quotations from the literature. The combinatorial/algebraic branch of topology is concerned almost entirely 
with locally Cartesian spaces, which constitute a rather narrow class of topological spaces. 


Simmons [137], page viii, said the following about the subject matter of topology in 1963. 


Historically speaking, topology has followed two principal lines of development. In homology theory, 
dimension theory, and the study of manifolds, the basic motivation appears to have come from 
geometry. In these fields, topological spaces are looked upon as generalized geometric configurations, 
and the emphasis is placed on the structure of the spaces themselves. In the other direction, the 
main stimulus has been analysis. Continuous functions are the chief objects of interest here, and 
topological spaces are regarded primarily as carriers of such functions and as domains over which 
they can be integrated. These ideas lead naturally into the theory of Banach and Hilbert spaces 
and Banach algebras, the modern theory of integration, and abstract harmonic analysis on locally 
compact groups. 


Topology thus splits into two branches of investigation, one focused on the connectedness of sets, the other 
focused on the continuity of functions. The first branch focuses on sets — typically smooth manifolds — using 
homeomorphism invariants such as homotopy as tools to classify spaces into homeomorphism equivalence 
classes. The second branch focuses on functions, using continuity to demonstrate the existence of limits. 


branch names       main focus                 concepts and agenda

combinatorial,     connectedness of sets      classification of homeomorphism equivalence classes of
or algebraic                                  low-dimensional manifolds; algebraic topology tools for
                                              classifying topologies

point-set, or      continuity of functions    convergence; compactness; continuity; real analysis; analysis on
set-theoretic                                 infinite-dimensional linear spaces; PDE problem existence using
                                              approximation methods


However, there is another view of the subject matter of topology, namely that it consists only of the
connectedness questions. EDM2 [113], section 426, gives the following summary of topology’s subject matter.


The term topology means a branch of mathematics that deals with topological properties of 
geometric figures or point sets. 


That matches the view which is often seen in popular science magazine articles for example. Possibly the 
meaning of “topology” as a mathematical subject has drifted over the decades due to the many undergraduate 
courses which have this name. 


Bynum et alii [238], page 423, make the following clear distinction between the two branches. 


Topology studies properties of geometrical figures invariant under continuous one-to-one
transformations. There are two branches, combinatorial and point-set topology. Certain problems of a
type treated by L. Euler (1707-83) gave rise to combinatorial topology, often called ‘rubber sheet
geometry’ [...]. The first systematic development was given in 1895 by H. Poincaré (1854-1912) in
his Analysis situs (an expression used by G. W. Leibniz (1646-1716) for geometry of situation). [...]


Point-set topology, which regards the geometrical figures as subsets of a structured space, originated 
with G. Cantor (1845-1918) and was perfected by F. Hausdorff (1868-1942), who showed that the 
essential concept of a neighbourhood was possible in a space without a metric. L. E. J. Brouwer 
(1881-1966) combined the two branches and topology now unifies almost all mathematics. 


Their definition of topology seems to focus very much on the homeomorphism-invariant figures side of
topology, while also permitting a second branch dealing with the spaces rather than the figures.


Reinhardt/Soeder [124], page 207, also distinguish the two branches as follows.


Die Topologie ist als selbstständige math. Disziplin noch verhältnismäßig jung. Man unterscheidet
heute die mengentheoretische Topologie und die algebraische Topologie. In der algebraischen
Topologie werden top. Probleme mit algebraischen Hilfsmitteln gelöst.


Die mengentheoretische Topologie (kurz: Topologie) hat ihre Wurzel in der reellen Analysis.
Es zeigte sich nämlich, daß man z. B. die Konvergenztheorie allein durch Eigenschaften von
Punktmengen entwickeln kann, ohne sich der algebr. Struktur oder der Ordnungsstruktur zu bedienen. Es
kristallisierte sich eine dritte Struktur heraus, die topologische Struktur. In ihr werden Begriffe wie
Umgebung, offene Menge, abgeschlossene Menge, Berührungspunkt, Häufungspunkt, Konvergenz,
Zusammenhang und Kompaktheit formuliert und Punktmengen mit Hilfe dieser Begriffe untersucht
und klassifiziert.


This may be translated into English as follows. 


Topology as an independent mathematical discipline is still relatively young. One distinguishes 
nowadays set-theoretic topology and algebraic topology. In algebraic topology, topological problems 
are solved with algebraic tools. 


Set-theoretic topology (in short: topology) has its root in real analysis. Thus it is shown, for 
example, that the theory of convergence may be developed by way of point-set properties alone, 
without using algebraic structure or order structure. A third structure crystallised out of this: 
topological structure. In this, concepts such as neighbourhood, open set, closed set, contact point, 
accumulation point, convergence, connectedness and compactness are formulated, and point-sets 
are investigated and classified with the help of these concepts. 


Boyer/Merzbach [237], pages 552-553, describe Poincaré’s negative view of point-set topology.


Topology is now a broad and fundamental branch of mathematics, with many aspects, but it can 
be subdivided into two fairly distinct sub-branches: combinatorial topology and point-set topology. 
Poincaré had little enthusiasm for the latter, and when, in 1908, he addressed the International 
Mathematical Congress at Rome, he referred to Cantor's Mengenlehre as a disease from which later 
generations would regard themselves as having recovered. Combinatorial topology, or analysis situs, 
as it was then generally called, is the study of intrinsic qualitative aspects of spatial configurations 
that remain invariant under continuous one-to-one transformations. It is often referred to popularly 
as “rubber-sheet geometry,” for deformations of, say, a balloon, without puncturing or tearing it, 
are instances of topological transformations. 


Willard [165], page v, distinguishes the two branches of topology as follows, referring to them as “continuous
topology” and “geometric topology”.


The first, which could be called “continuous topology”, centers on the results about compactness 
and metrization which are the indispensable tools of the modern analyst. [...] The second area, 
which might be called “geometric topology”, is primarily concerned with the connectivity properties
of topological spaces and provides the cores of results from general topology which are necessary 
preparation for later courses in geometry and algebraic topology. 


31.1.3 REMARK: History of topology. 
According to Willard [165], page 15, Fréchet introduced metric spaces in 1906, and Hausdorff introduced 
topological spaces in 1914. (However, see Remark 37.0.5 for a different opinion.) 


31.2. The intuitive basis of topology 


31.2.1 REMARK: Defining topology in a single sentence. 
Topology is the study of the connectedness of sets and the continuity of functions. Therefore: 


Topology is the study of connectedness and continuity. 


Roughly speaking, continuous functions are those which preserve the connectedness of sets. (See Section 35.2 
for details.) So continuity may be defined in terms of connectedness. But connectedness and continuity may 
both be defined in terms of the concepts of interior, exterior and boundary of sets. Thus: 


Topology is the study of the interior, exterior and boundary of sets. 


The interior Int(S) and exterior Ext(S) of a set S in a topological space X may be defined in terms of the
boundary Bdy(S) as Int(S) = S \ Bdy(S) and Ext(S) = (X \ S) \ Bdy(S). So everything in topology may
be defined in terms of boundaries of sets. Hence:


Topology is the study of boundaries.


The modern technical specification of a topology is expressed in terms of open sets. (An open set is a set 
which contains none of its boundary points.) The interior, exterior and boundary of a set are then defined 
in terms of these open sets. However, for the meaning of a topology, boundaries are more fundamental than 
open sets. Both concepts contain the same information in a technical sense, but boundaries have a much 
stronger intuitive appeal than open sets. 


31.2.2 REMARK: Physical metaphors for topology. 
The concepts of the interior, exterior and boundary of sets are familiar from many real-life contexts, including
the following kinds of objects which have a well defined interior, exterior and boundary. 


(1) Biological cells and the bodies of animals and plants. 

(2) Enclosed vehicles such as cars, aeroplanes, spacecraft and ships. 

(3) Oceans, seas and lakes, water droplets, icebergs and glaciers. 

(4) Nations, territories and continents. 

(5) Planets, stars, asteroids. 

It is not totally implausible that the earliest vertebrate animals on Earth implemented the concepts of
interior, exterior and boundary in the world-models by which they navigated their environments. These
topological concepts are arguably more fundamental than numbers or propositional logic.


The logical concept of a class of objects implies both an interior and an exterior of the class. So there is 
apparently a close fundamental relation between set theory and topology. 


Mathematical models of the physical world are very often expressed in terms of the interior, exterior and 
boundary of sets. For example, the Stokes theorem expresses integrals over a region in terms of integrals over 
its boundary. Conservation equations for physical flows are expressed in terms of boundary integrals and 
interior integrals. Solutions of boundary value problems are typically expressed as integrals over boundaries 
and interior regions. A large proportion of complex analysis is concerned with integrals over boundary curves 
and their relations to integrals over interior regions bounded by curves. 


Perhaps most importantly, all of mathematical analysis is expressed in terms of limits, which are effectively 
equivalent to boundaries of sets. Thus analysis is based upon the boundary concept, which is the core 
concept of topology.


31.2.3 REMARK: The topology of a set partitions the whole space into interior, exterior and boundary.

A topology for a set X defines the interior Int(S), exterior Ext(S) and boundary Bdy(S) of every subset S
of X. These three sets form a partition of X for each set S. The technical specification of a topology uses 
open neighbourhoods to define these three sets. 


The interior of a set S consists of the points $x_1$ which have at least one neighbourhood which is entirely
inside the set S. The exterior consists of the points $x_3$ which have at least one neighbourhood which is
entirely outside the set S. The boundary consists of the points $x_2$ which are neither interior nor exterior.
In other words, every neighbourhood of a boundary point contains at least one point inside and one point 
outside S. (See Figure 31.2.1.) 


[Figure labels: $X \setminus S$; neighbourhoods of $x_1 \in \mathrm{Int}(S)$, $x_2 \in \mathrm{Bdy}(S)$ and $x_3 \in \mathrm{Ext}(S)$]

Figure 31.2.1 Interior, boundary and exterior of a set


One way to think about set interiors and boundaries is to recall Zeno's paradox of Achilles and the tortoise. 
Imagine the tortoise walking towards the boundary. Whenever Achilles gets to where the tortoise was, he 
still has the tortoise between him and the boundary. So Achilles always has a neighbour inside the set. 
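
The neighbourhood description of interior, exterior and boundary can be tried out on a small finite space. The following is only a sketch: the three-point set `X`, the topology `T` and the function names are hypothetical choices made for illustration, not taken from the text.

```python
from itertools import combinations

def interior(S, X, T):
    """Points of X with at least one open neighbourhood included in S."""
    return frozenset(x for x in X if any(x in U and U <= S for U in T))

def exterior(S, X, T):
    """Points of X with at least one open neighbourhood disjoint from S."""
    return frozenset(x for x in X if any(x in U and not (U & S) for U in T))

def boundary(S, X, T):
    """Points of X which are neither interior nor exterior."""
    return X - interior(S, X, T) - exterior(S, X, T)

# Hypothetical example space, chosen only for illustration.
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

# For S = {2}: Int(S) is empty, Ext(S) = {1}, Bdy(S) = {2, 3}.
S = frozenset({2})
parts = (interior(S, X, T), exterior(S, X, T), boundary(S, X, T))

# Check Int(A) = A \ Bdy(A) and Ext(A) = (X \ A) \ Bdy(A) for every subset A.
for r in range(len(X) + 1):
    for c in combinations(sorted(X), r):
        A = frozenset(c)
        assert interior(A, X, T) == A - boundary(A, X, T)
        assert exterior(A, X, T) == (X - A) - boundary(A, X, T)
```

Because the boundary is computed as whatever is left over, the three sets form a partition of `X` by construction. The loop checks the two identities from Remark 31.2.1 on this one example only.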


31.2.4 REMARK: The point-set and topology layers in a 5-layer differential geometry structure model. 
The entities in layer 0 of the differential geometry layer-model proposed in Section 1.1 are points, which may 
be locations in space or events in space-time, or any other kind of concrete or abstract “point”. Points are 
represented mathematically as elements of sets. 


There is not necessarily any association at all between points in layer 0. The only necessary relation between 
points is the equality relation. Given any two points P and Q, it is possible to say if the points are equal 
($P = Q$) or not equal ($P \neq Q$). There is no distance relation. So there is no distinction between points
which are close to P and those which are distant from P. 


A set of points can be counted because the equality relation is well defined on any set. Counting a set or 
subset can be achieved by labelling points with numbers as illustrated in Figure 31.2.2. 


[Figure: a scatter of points, some labelled with counting numbers]

Figure 31.2.2 Layer 0: Points (or events)


Although the points in this figure are drawn on 2-dimensional paper, pure points do not have coordinates of 
any kind. All we can do with pure points is to label and group and count them. (The basic theory underlying 
pure point-sets is presented in Chapters 7 to 16.)


Differential geometry layer 1 adds topological structure to simple point sets. Topology may be thought of as
a kind of glue which holds neighbouring points together. Otherwise points would have no association with 
their neighbours at all. 


31.2.5 REMARK: The role of neighbourhoods in the disconnection of pairs of sets. 

Within the topological neighbourhood metaphor, every point x has one or more neighbourhoods $N_x$ of x.
Figure 31.2.3 illustrates two sets of points. Each point $x \in S_1$ and $y \in S_2$ is surrounded by a single circular
neighbourhood $N_x$ or $N_y$ in this diagram, but points may have any number of neighbourhoods of any size
and shape.


The fact that the two sets of points $S_1$ and $S_2$ in Figure 31.2.3 are disconnected from each other is clear from
the fact that each point-set $S_1$ and $S_2$ can be covered by neighbourhoods which exclude the neighbourhoods
of the other set of points. That is, the intersection $N_x \cap N_y$ of neighbourhoods $N_x$ and $N_y$ is empty for all
$x \in S_1$ and $y \in S_2$.


Figure 31.2.3 Disjoint neighbourhoods of individual points of disconnected sets 


The neighbourhoods of the sets $S_1$ and $S_2$ may be joined into combined neighbourhoods $\Omega_1$ and $\Omega_2$ which
contain all of the respective points in the interiors. This is illustrated in Figure 31.2.4. The combined
neighbourhoods are $\Omega_1 = \bigcup_{x \in S_1} N_x$ and $\Omega_2 = \bigcup_{y \in S_2} N_y$ respectively. Clearly $S_1$ is in the interior of $\Omega_1$ and
$S_2$ is in the interior of $\Omega_2$, and the intersection $\Omega_1 \cap \Omega_2$ is empty because all of the neighbourhood pairs $N_x$
and $N_y$ have an empty intersection.


[Figure labels: $\Omega_1 = \bigcup_{x \in S_1} N_x$; $\Omega_1 \cap \Omega_2 = \emptyset$; $\Omega_2 = \bigcup_{y \in S_2} N_y$]

Figure 31.2.4 Disjoint combined neighbourhoods of disconnected sets


Note that the sets $\Omega_1$ and $\Omega_2$ are not associated with any particular individual point. The subject of topology
is much simpler as a logical discipline if the open sets are not associated with particular points. Instead of 
dealing with an infinite number of little neighbourhoods around an infinite number of points, we can deal 
instead with a much smaller number of combined neighbourhoods around combined sets of points. This leads 
to a way of thinking that looks more like Figure 31.2.5. 


31.2.6 REMARK: Neighbourhoods of points versus non-point-specific open sets. 

The “open set” concept is an efficient abstraction of the per-point “open neighbourhood” concept. The gain 
in logical efficiency has the disadvantage of a loss of intuitive immediacy. In fact, in applications one generally 
focuses on per-point neighbourhoods rather than abstract open sets. In this book, $\mathrm{Top}(X)$ denotes the set
of open sets of a set X, and $\mathrm{Top}_x(X) = \{\Omega \in \mathrm{Top}(X);\ x \in \Omega\}$ denotes the set of open neighbourhoods of a
point $x \in X$. (See Notations 31.3.4 and 31.3.12.) In practice, the pointwise sets $\mathrm{Top}_x(X)$ are generally more
useful than the abstract set $\mathrm{Top}(X)$. It is important to be able to change focus easily between the per-point
neighbourhoods in $\mathrm{Top}_x(X)$ and the global, non-point-specific neighbourhoods in $\mathrm{Top}(X)$.


Axiomatisations of topology in terms of neighbourhoods, instead of open sets, are quite rare in the more recent 
literature. Some examples are Wallace [152], page 14; Baum [54], pages 20-21. The set of all associations of 
neighbourhoods with points is sometimes called a “neighbourhood system”. (See for example EDM2 [113],
pages 1606-1607; Baum [54], page 20.)


31.2.7 REMARK: Connectedness and disconnectability of sets.

In terms of abstract open sets, a set is defined to be disconnected if it can be covered by two disjoint open
sets $\Omega_1$ and $\Omega_2$ which each contain at least one point of the set. Otherwise the set is connected. In other
words, a set is connected if and only if it has no “gaps” which separate the set into two or more components.
The set illustrated in Figure 31.2.5 is disconnected. 


[Figure: a set of points covered by two disjoint open sets $\Omega_1$ and $\Omega_2$]

Figure 31.2.5 Definition of disconnectedness in terms of disjoint cover by open sets


This fact is determined by first specifying the set of all open neighbourhoods in the topological structure of the
point space. (A different choice of neighbourhoods would give a different classification of sets into connected
and disconnected.) In Figure 31.2.5, there are two sets $\Omega_1$ and $\Omega_2$ which “cover” the set of points. In other
words, all points of the set are inside at least one of the two sets. In this case, the neighbourhoods are disjoint.
This is the definition of disconnectedness of a set. A set X is disconnected if there are two disjoint neighbourhoods
which cover X and there is at least one point of X in each of the two neighbourhoods. These neighbourhoods
effectively “disconnect” the set into two non-empty components. (See also Figure 34.1.2.)


The ability to disconnect points and sets of points from each other is the fundamental task of open sets. If 
the set of open neighbourhoods is reduced by removing some open neighbourhoods, this tends to reduce the 
ability to disconnect sets into two components. It follows that more sets are then defined to be connected as 
the set of neighbourhoods is reduced. Similarly, if the set of neighbourhoods is augmented by adding new 
neighbourhood sets, this makes it easier to disconnect sets into two or more subsets. Roughly speaking, the
bigger the topology is, the smaller the class of connected sets is. 
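
The claim that a bigger topology leaves fewer connected sets can be checked by brute force on a three-point set. This is a minimal sketch which follows the wording of this remark (two disjoint open sets covering the set, each meeting it). The helper names and the three example topologies are assumptions made for illustration.

```python
from itertools import combinations

def subsets(X):
    """All subsets of a finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_disconnected(S, T):
    """True if two disjoint open sets in T cover S and each contains a point of S."""
    return any(
        not (U & V) and S <= (U | V) and (U & S) and (V & S)
        for U in T for V in T
    )

def connected_sets(X, T):
    """The subsets of X which cannot be disconnected by open sets in T."""
    return {S for S in subsets(X) if not is_disconnected(S, T)}

X = frozenset({1, 2, 3})
trivial = {frozenset(), X}
middle = {frozenset(), frozenset({1}), frozenset({2, 3}), X}
discrete = set(subsets(X))

# Number of connected subsets for each topology, from smallest to largest topology.
counts = [len(connected_sets(X, T)) for T in (trivial, middle, discrete)]
```

On this example the counts decrease as the topology grows: every subset is connected in the trivial topology, whereas only the empty set and the singletons are connected in the discrete topology.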


31.2.8 REMARK: The interdefinability of the connectedness and continuity concepts. 

Continuity of a function may be defined in terms of connectedness. A function is continuous if and only if 
its inverse preserves disconnectedness. (There is a minor technicality that the range of the function must 
be within a $T_1$ or $T_2$ space, but this is a fairly weak requirement.) Most texts do not define continuity of
functions in terms of connectedness, but this style of definition is presented in Section 35.2 as an alternative. 
It is useful to just know that continuity can be defined as the preservation of set disconnectedness by the 
inverse of a function rather than preservation of set openness. 


Continuous functions may be used to define connectedness of sets. The concepts of connectedness and 
continuity are thus essentially “equipotent”. If one knows which sets are connected, one may determine
which functions are continuous. If one knows which functions are continuous, one may determine which sets 
are connected. This could be called the “interdefinability” of these concepts. 


31.2.9 REMARK: Topology is applicable to mathematical models, not to the real physical world. 

Real physical geometry does not correspond very well to the properties of topological spaces. For example, 
if a zero-diameter single point is removed from a physical plane figure, it is impossible to detect this. No 
measurement apparatus has a zero resolution. If a point is removed from the set $\mathbb{R}^2$, the topology is
significantly altered. There is no way to determine whether a physical set is open or closed because a
zero-thickness boundary is undetectable. Physical point-sets have fuzzy boundaries. However, this is not a serious
problem. Topology is applicable to mathematical models, not to real physical systems. Real points have 
positive width. Real boundaries have positive thickness. 




31.3. Topological spaces 


31.3.1 REMARK: The multiple formalisms of topology reached consensus with open-set classes. 

There are numerous equivalent formalisms for topological spaces. A topology may be formalised, for example, 
in terms of open sets (Definition 31.3.2), closed sets (Notation 31.4.4), interior operators (Definition 31.8.2), 
closure operators (Definition 31.8.7) or per-point neighbourhoods. (These are described and formalised 
axiomatically in EDM2 [113], 425.B, page 1606.) 


The most popular formalisation for topological structure defines a topology to be its set of open sets as in 
Definition 31.3.2. Simmons [137], page 98, says the following on this subject. 


A good deal of research was done along these lines in the early days of topology. It was found 
that there are many different ways of defining a topological space, all of which are equivalent to 
one another. Several decades of experience have convinced most mathematicians that the open set 
approach is the simplest, the smoothest, and the most natural. 


From the point of view of doing practical analysis, undoubtedly the open-set formalisation is the best trade-off 
between objectives. However, open-set classes do lack direct intuitive appeal, as mentioned in Remark 31.2.1. 


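31.3.2 DEFINITION: A topology on a set X is a set $T \subseteq \mathbb{P}(X)$ such that


(i) $\{\emptyset, X\} \subseteq T$,
(ii) $\forall \Omega_1, \Omega_2 \in T,\ \Omega_1 \cap \Omega_2 \in T$,
(iii) $\forall C \in \mathbb{P}(T),\ \bigcup C \in T$.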
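
For a finite set, the three conditions of Definition 31.3.2 can be tested mechanically. The function below is a sketch under that finiteness assumption. The name `is_topology` and the example collections are hypothetical.

```python
from itertools import combinations

def is_topology(X, T):
    """Check conditions (i), (ii) and (iii) of Definition 31.3.2 for finite X and T."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    # T must be a set of subsets of X.
    if any(not U <= X for U in T):
        return False
    # (i) The empty set and X are elements of T.
    if frozenset() not in T or X not in T:
        return False
    # (ii) The intersection of any two elements of T is in T.
    if any(U & V not in T for U in T for V in T):
        return False
    # (iii) The union of every subcollection C of T is in T.
    #       The empty subcollection gives the empty union, which is the empty set.
    for r in range(len(T) + 1):
        for C in combinations(T, r):
            if frozenset().union(*C) not in T:
                return False
    return True

ok = is_topology({1, 2, 3}, [set(), {1}, {1, 2}, {1, 2, 3}])
no_union = is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}])
no_intersection = is_topology({1, 2, 3}, [set(), {1, 2}, {2, 3}, {1, 2, 3}])
```

The second collection fails condition (iii) because the union {1, 2} is missing, and the third fails condition (ii) because the intersection {2} is missing. The check of condition (iii) enumerates all subcollections, so it is only practical for very small topologies.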


31.3.3 DEFINITION: 

A topological space is a pair $X \mathrel{-\!\!<} (X, T)$ such that T is a topology for the set X.
The point set of a topological space (X, T) is the set X. 

A point in a topological space (X, T) is any element of X. 

A set in a topological space (X, T) is any subset of X. 

An open set in a topological space (X, T) is any element of T. 


31.3.4 NOTATION: $\mathrm{Top}(X)$, for a topological space $X \mathrel{-\!\!<} (X, T)$, denotes the topology T when the choice
of topology T on X is implicit in a particular context. In other words, $\mathrm{Top}(X) = T$.


31.3.5 REMARK: Topological space specification tuple redundancy.

There is some redundancy in the specification tuple for a topological space in Definition 31.3.3. The set X
in a tuple (X, T) always satisfies $X = \bigcup T$. The set X certainly does not contain the full information in the
pair (X, T), while the set T does contain the full information. But in typical usage, the set X is usually in
the foreground and the topology T is in the background. Therefore the pair (X, T) is often abbreviated to
just X. This seems illogical and perplexing unless one considers that in the history of mathematics, it often 
required a long time to determine which structures on sets were required for their unambiguous formalisation. 


31.3.6 REMARK: The equivalence of the two-set and finite-collection-of-sets intersection rules. 

The two-set intersection rule in Definition 31.3.2 (ii) is chosen to facilitate the proof of validity of topologies. 
By Theorem 31.3.7, the two-set rule implies the general finite intersection rule. Since Theorem 31.3.7 (ii′)
implies Definition 31.3.2 (ii), the finite intersection condition (ii′) could be substituted for the two-set
intersection condition (ii) without changing the definition.


The non-standard set notation $\mathbb{P}^\infty_1(T)$ is defined as $\{C \in \mathbb{P}(T);\ 1 \le \#(C) < \infty\}$, the set of all non-empty
finite subsets of T. (See Notation 13.12.5.)


31.3.7 THEOREM: The two-set intersection rule implies the finite-collection-of-sets intersection rule. 
Let (X, T) be a topological space. Then 


(ii′) $\forall C \in \mathbb{P}^\infty_1(T),\ \bigcap C \in T$.


PROOF: Let (X, T) be a topological space. To show (ii′), let $C \in \mathbb{P}^\infty_1(T)$. If $\#(C) = 1$, then $C = \{\Omega\}$ for
some set $\Omega \in T$. So $\bigcap C = \Omega \in T$ as claimed. If $\#(C) = 2$, then $C = \{\Omega_1, \Omega_2\}$ for some sets $\Omega_1, \Omega_2 \in T$. So
$\bigcap C = \Omega_1 \cap \Omega_2 \in T$ by Definition 31.3.2 (ii).

Let $n = \#(C) \in \mathbb{Z}^+$. Then there is a bijection $f : \mathbb{N}_n \to C$. So $\bigcap C = \bigcap_{i=1}^{n} f(i) = \bigl(\bigcap_{i=1}^{n-1} f(i)\bigr) \cap f(n)$. This
is an element of T by Definition 31.3.2 (ii) if part (ii′) is valid for $\#(C) = n - 1$. Therefore by induction
on n, (ii′) is valid for all $n = \#(C) \in \mathbb{Z}^+$.
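
The induction in this proof corresponds to folding a finite collection with the two-set intersection. The sketch below illustrates this with `functools.reduce` on a hypothetical five-element topology on {1, 2, 3}. It is a check on one example, not a proof.

```python
from functools import reduce
from itertools import combinations

def finite_intersection(C):
    """Intersect a non-empty finite collection by repeated two-set intersections."""
    # For a one-element collection, reduce returns that element unchanged,
    # which matches the base case #(C) = 1 of the induction.
    return reduce(lambda A, B: A & B, C)

# Hypothetical topology on {1, 2, 3}, used only for illustration.
T = [
    frozenset(),
    frozenset({1}),
    frozenset({1, 2}),
    frozenset({1, 3}),
    frozenset({1, 2, 3}),
]

# Every non-empty subcollection of T has its intersection in T.
all_in_T = all(
    finite_intersection(C) in T
    for r in range(1, len(T) + 1)
    for C in combinations(T, r)
)
```

Each step of the fold applies only the two-set rule of Definition 31.3.2 (ii), so the intermediate results stay inside the topology.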


31.3.8 REMARK: Why the intersection rule is finite whereas the union rule is infinite. 

A topology for a set X is a set of subsets of X which contains the empty set, the set X, and any finite
intersection or arbitrary union of sets in the topology. One might reasonably ask why there is an asymmetry between
set intersections and set unions in this definition. One simple answer is that topology would be very boring
if closure under arbitrary intersections was required. In that case, every topology on a set X would be the
set of all unions of a partition of X. For such a topology to have the ability to separate pairs of points at all
(in the sense of the very weak $T_0$ separation class in Definition 33.1.8), the topology would have to be the
power set $\mathbb{P}(X)$. Then the only connected sets would be singletons.


A better way to answer the question is to consider the intuitive idea of the interior of an open set. A set is 
intuitively defined to be “open” if every point in the set is in the interior of the set. 


If a point x is in the interior of an open set $\Omega_1$ and also in the interior of $\Omega_2$, then we would expect x to
be in the interior of $\Omega_1 \cap \Omega_2$ although the “walls” of the set would be a little “closer”. In the case of a
union $\Omega_1 \cup \Omega_2$, the “walls” of the union will be either the same “distance” away or further away. Since the
union operation makes sets bigger, we are guaranteed to always be in the interior of a set no matter how
many open sets are in a union, even an infinite or uncountably infinite number of open sets. The shrinkage
of a set, on the other hand, has the danger that eventually we might not have enough room to move. Being
in the “interior” of a set intuitively means that we have at least some “space” between each point and the
“walls”. The amount of space for the intersection $\Omega_1 \cap \Omega_2$ should be the minimum of the space for the
individual sets $\Omega_1$ and $\Omega_2$. But the minimum of a “small space” and a “small space” is still a “small space”.
So by naive induction, we expect that the intersection of any finite number of sets will leave us at least some
“room” around each point which is still in the intersection.


To put it simply, after any number of enlargements, a neighbourhood is still a neighbourhood. But after 
an infinite number of shrinkages, a neighbourhood might not be a neighbourhood any more. Hence the 
asymmetry. Therefore a topological space specifies closure under finite intersections and arbitrary unions. 
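
A standard illustration of the asymmetry uses the open intervals (-1/k, 1/k) of the real line. The usual topology on the real numbers is assumed here, although it has not yet been introduced at this point in the text. The sketch below uses exact fractions to show that every finite intersection still leaves some room around the point 0. The function names are hypothetical.

```python
from fractions import Fraction

def intersect(intervals):
    """Intersection of finitely many open intervals (a, b), given as pairs.

    Returns the resulting open interval as a pair, or None if it is empty.
    """
    lo = max(a for a, b in intervals)
    hi = min(b for a, b in intervals)
    return (lo, hi) if lo < hi else None

def family(n):
    """The open intervals (-1/k, 1/k) for k = 1, ..., n."""
    return [(Fraction(-1, k), Fraction(1, k)) for k in range(1, n + 1)]

# The room around the point 0 after intersecting the first n intervals is 1/n.
room = [intersect(family(n))[1] for n in range(1, 6)]
```

The room 1/n is positive for every finite n, but it has no positive lower bound. The intersection of the whole infinite family contains only the point 0, which has no room around it at all, so that intersection is not open.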


31.3.9 REMARK: The empty set is not excluded as a topological space. 

Some texts (e.g. Simmons [137], page 92; B. Mendelson [115], page 71) require a topology to be defined on 
a non-empty set. Others (e.g. Kelley [101], page 37; Wallace [154], page 264; Steen/Seebach [141], page 3; 
Ahlfors [45], page 67; Gilmore [82], page 59; Robertson/Robertson [126], page 6; Yosida [167], page 3) do 
not make this requirement, at least not explicitly. The exclusion of the empty set is inconvenient for the 
statement of some kinds of theorems. It is tedious to have to treat separately the cases where a set is empty. 
So {Ø} is accepted as a valid topology on Ø in Definition 31.3.2. Consequently the pair (Ø, {Ø}) is a valid
topological space in Definition 31.3.3. 


31.3.10 REMARK: Defining neighbourhoods in terms of the class of all open sets. 

Definition 31.3.11 is somewhat backwards. A topology on a set is the collection of all neighbourhoods of all 
points in the set. As mentioned in Remark 31.2.6, a more intuitive approach to topology would first define 
the collection of neighbourhoods at each point, and then define a topology as the union of these per-point 
collections. However, the class-of-all-open-sets definition of a topology does have many practical advantages. 


A “neighbourhood” always means an “open neighbourhood” in this book. Some authors also define a “closed
neighbourhood” and a “compact neighbourhood”. (See for example Kelley [101], pages 38, 141, 146.) Closed
and compact neighbourhoods are given here as Definitions 31.8.26 and 33.6.2. 


31.3.11 DEFINITION: An (open) neighbourhood of a point x in a topological space $X \mathrel{-\!\!<} (X, T)$ is any set
$\Omega \in T$ which satisfies $x \in \Omega$.


31.3.12 NOTATION: The set of neighbourhoods of a point. 
$\mathrm{Top}_x(X)$, for a topological space $X \mathrel{-\!\!<} (X, T)$ and $x \in X$, denotes the set of neighbourhoods of x. In other
words, $\mathrm{Top}_x(X) = \{\Omega \in \mathrm{Top}(X);\ x \in \Omega\} = \{\Omega \in T;\ x \in \Omega\}$.


31.3.13 REMARK: Alternative notations for the set of neighbourhoods at a point. 
B. Mendelson [115], page 76, gives the notation $\mathfrak{N}_x$ (using the Fraktur-N character) for the set of
neighbourhoods $\mathrm{Top}_x(X)$ at $x \in X$. Kelley [101], page 56, gives a notation which looks similar to $U_x$.
Robertson/Robertson [126], page 6, gives the notation 7.

The non-standard Notation 31.3.14 denotes the set of all point/neighbourhood pairs for a given topological 
space. This is occasionally useful as a shorthand. 


31.3.14 NOTATION: The set of all point/neighbourhood pairs.
$\mathrm{Nbhd}(X)$, for a topological space $X \mathrel{-\!\!<} (X, T)$, denotes the set of point/neighbourhood pairs $(x, \Omega) \in X \times T$
such that $x \in \Omega$. In other words,


$\mathrm{Nbhd}(X) = \{(x, \Omega) \in X \times \mathrm{Top}(X);\ x \in \Omega\}$
$= \{(x, \Omega) \in X \times \mathrm{Top}(X);\ \Omega \in \mathrm{Top}_x(X)\}$.


31.3.15 REMARK: A set is open if every point has a neighbourhood included in the set. 

Theorem 31.3.16 gives a clue as to why open sets are called “open”. The open sets are those for which every 
point has a neighbourhood which is included in the set. In other words, no point of an open set has “outside
points” in every neighbourhood. Such a point would be incapable of isolating itself from the outside. So it
would be on the boundary of the set, not within the “inside”. 


31.3.16 THEOREM: A set is open if and only if every point has a neighbourhood included in the set. 
Let X be a topological space. 
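(i) $\forall G \in \mathbb{P}(X),\ \bigl((\forall x \in G,\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq G) \Rightarrow G \in \mathrm{Top}(X)\bigr)$.
That is, if every element of G has a neighbourhood which is included in G, then G is an open set.
(ii) $\forall G \in \mathbb{P}(X),\ \bigl(G \in \mathrm{Top}(X) \Leftrightarrow (\forall x \in G,\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq G)\bigr)$.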


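PROOF: For part (i), let G satisfy $\forall x \in G,\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq G$. Let $C = \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq G\}$.
Then $\bigcup C \in \mathrm{Top}(X)$ by Definition 31.3.2 (iii) because $C \subseteq \mathrm{Top}(X)$. Let $x \in G$. Then $\Omega \subseteq G$ for some
$\Omega \in \mathrm{Top}_x(X)$. So $x \in \Omega \in C$. Therefore $x \in \bigcup C$. So $G \subseteq \bigcup C$. But $\bigcup C \subseteq G$. Hence $G = \bigcup C \in \mathrm{Top}(X)$.


For part (ii), let $G \in \mathrm{Top}(X)$. Let $x \in G$. Let $\Omega = G$. Then $\Omega \in \mathrm{Top}_x(X)$ and $\Omega \subseteq G$. Thus $G \in \mathrm{Top}(X) \Rightarrow$
$(\forall x \in G,\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq G)$. The converse follows from part (i).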
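
Theorem 31.3.16 (ii) can be confirmed by exhaustive search on a small example. The sketch below assumes a hypothetical three-point space and the illustrative function names `neighbourhoods` and `locally_open`.

```python
from itertools import combinations

def neighbourhoods(x, T):
    """The set Top_x(X) of open sets in T which contain the point x."""
    return {U for U in T if x in U}

def locally_open(G, T):
    """True if every point of G has a neighbourhood which is included in G."""
    return all(any(U <= G for U in neighbourhoods(x, T)) for x in G)

# Hypothetical example space, chosen only for illustration.
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

all_subsets = [
    frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)
]

# The subsets which pass the per-point test are exactly the open sets.
passing = {G for G in all_subsets if locally_open(G, T)}
```

The empty set passes the test vacuously because it has no points to check, which agrees with the fact that the empty set is always open.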


31.3.17 REMARK: The smallest and largest topologies on a set. 

For any set X, both $\{\emptyset, X\}$ and the power set $\mathbb{P}(X)$ are valid topologies on X. (See Theorem 31.3.20.)
These are given the names “trivial topology” and “discrete topology” respectively in Definitions 31.3.18
and 31.3.19. (The trivial topology is called the “indiscrete topology” by Steen/Seebach [141], page 42.)


It is clear that $\{\emptyset, X\} \subseteq T \subseteq \mathbb{P}(X)$ for any topology T on any set X. So the trivial topology is the smallest
topology on a set, and the discrete topology is the largest topology (in the sense of the partial order on sets 
defined by set inclusion). 


The trivial topology $\{\emptyset, X\}$ contains only one element if $X = \emptyset$. Note also that the trivial topology and the
discrete topology are the same if X is empty or contains only one element. 


31.3.18 DEFINITION: The trivial topology on a set X is the set $\{\emptyset, X\}$.
A trivial topological space is a topological space (X, T) such that T is the trivial topology on X. 


Alternative names: coarse topology and coarse topological space. 


31.3.19 DEFINITION: The discrete topology on a set X is the set $\mathbb{P}(X)$.
A discrete topological space is a topological space (X, T) such that T is the discrete topology on X.


31.3.20 THEOREM: The trivial and discrete topologies are topologies. 
The trivial and discrete topologies are topologies. 


PROOF: To show that the trivial topology is a topology, let X be a set and $T_X = \{\emptyset, X\}$. Then $\{\emptyset, X\} \subseteq T_X$.
Suppose that $\Omega_1, \Omega_2 \in T_X$. Then $\Omega_1 \cap \Omega_2$ equals either $\emptyset$ or X. So $\Omega_1 \cap \Omega_2 \in T_X$. Suppose that $C \in \mathbb{P}(T_X)$.
Then $\bigcup C = \emptyset$ if $X \notin C$, and $\bigcup C = X$ if $X \in C$. So $\bigcup C \in T_X$. Hence $T_X$ is a topology on X.

To show that the discrete topology is a topology, let X be a set and $T_X = \mathbb{P}(X)$. Then $\{\emptyset, X\} \subseteq T_X$.
Suppose that $\Omega_1, \Omega_2 \in T_X$. Then $\Omega_1 \cap \Omega_2 \subseteq \Omega_1 \subseteq X$. So $\Omega_1 \cap \Omega_2 \in \mathbb{P}(X) = T_X$. Suppose that $C \in \mathbb{P}(T_X)$.
Then $C \subseteq \mathbb{P}(X)$. So $\Omega \subseteq X$ for all $\Omega \in C$. So $\bigcup C \subseteq X$. Therefore $\bigcup C \in \mathbb{P}(X) = T_X$. Hence $T_X$ is a
topology on X.


31.3.21 REMARK: Where there is no topology, the discrete topology is the best default.

The best default topology on a set, when nothing is known or claimed about its topology, may be the discrete 
topology. This topology makes every point an island. Then there is no concept of “neighbourhood” at all. 
By contrast, the trivial topology makes every point a neighbour of every other. 


When it is said that a set has no topology, this should be understood to mean that no point is associated in 
any sense with any other point. This is precisely what is achieved by the discrete topology. (See for example 
Remark 14.7.16, where compatibility is achieved between non-topological and topological definitions by 
assuming the discrete topology in the non-topological case.) 


31.3.22 REMARK: Standard terminology for smaller and larger topologies on a given set. 

When more than one topology is being discussed on a single set, one often compares the relative “strength” 
or “weakness” of the topologies. It is often true that if a given topology has a property, then all stronger, or 
weaker, topologies also have that property. Therefore knowing the relative strength or weakness of topologies 
is often useful for proving properties more easily. 


In contradiction to standard English, topologies $T_1$ and $T_2$ such that $T_1 = T_2$ are said to be both weaker and
stronger than each other. It is simpler to adopt this convention than to say, for instance, that “$T_1$ is weaker
than or equal to $T_2$”. For any set X, the trivial topology $\{\emptyset, X\}$ is clearly weaker than all topologies on X,
and the discrete topology $\mathbb{P}(X)$ is stronger than all topologies on X.


31.3.23 DEFINITION: A topology $T_1$ on a set X is said to be weaker than a topology $T_2$ on X if $T_1 \subseteq T_2$.
$T_1$ is said to be stronger than $T_2$ if $T_1 \supseteq T_2$.
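
Since the weaker and stronger relations are just set inclusion, they map directly onto the subset operators of a set type. The following sketch uses hypothetical helper names and a two-point set. It also shows that two topologies on the same set need not be comparable.

```python
from itertools import combinations

def trivial_topology(X):
    """The topology {empty set, X} of Definition 31.3.18."""
    return {frozenset(), frozenset(X)}

def discrete_topology(X):
    """The power set of X, as in Definition 31.3.19 (X is assumed finite)."""
    items = sorted(X)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}

def is_weaker(T1, T2):
    """T1 is weaker than T2 if T1 is included in T2 (equality is allowed)."""
    return T1 <= T2

def is_stronger(T1, T2):
    """T1 is stronger than T2 if T1 includes T2 (equality is allowed)."""
    return T1 >= T2

X = {1, 2}
Ta = {frozenset(), frozenset({1}), frozenset({1, 2})}
Tb = {frozenset(), frozenset({2}), frozenset({1, 2})}

# Ta and Tb are both between the trivial and discrete topologies,
# but neither of them is weaker than the other.
comparable = is_weaker(Ta, Tb) or is_weaker(Tb, Ta)
```

With this convention, a topology is both weaker and stronger than itself, as noted in Remark 31.3.22. On the empty set and on a one-point set, `trivial_topology` and `discrete_topology` return the same set.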


31.3.24 REMARK: Alternative terminology for weak and strong topologies. 
The meanings of the adjectives “weak” and “strong” are not instantly clear in the context of topologies on 
sets. The respective adjectives “coarse” and “fine” would be much clearer. 


31.4. Closed sets


31.4.1 DEFINITION: A closed set in a topological space $(X, T_X)$ is any set $K \subseteq X$ such that $X \setminus K \in T_X$.


31.4.2 REMARK: Popular hint-letter notations for open and closed sets. 
In the English-language literature, the letter G is often used for open sets and F is often used for closed 
sets. It is plausible that F is intended to be a hint for the French word for “closed”, which is “fermé”. It is 
not clear why the letter G is often used as a hint that a set is open, although the first letter of the German 
word “Gebiet” (meaning “region”) seems plausible. (Such ideas for the origins of the conventional letters F
and G are called “folklore” by Bruckner/Bruckner/Thomson [56], page 4.)


The letter K is often used for compact sets (Definition 33.5.10), but is also commonly used for closed sets. 


The letter $\Omega$ is very popular for open sets, probably because “open”, “offen” and “ouvert” all start with “o”.
In Greek, O-mega means “big O". (The corresponding Greek lower-case letter o has the name “o-micron”, 
which means “small O”.) 


When an open set is regarded as a neighbourhood of a particular element of a topological space, the hint- 
letter U is often used. This comes from the German word “Umgebung”, which means “neighbourhood”. 
(See for example Weyl [158], pages 17-20.) 


31.4.3 REMARK: Notation for collections of closed sets and closed neighbourhoods. 

The set of closed sets for a topology is rarely given its own notation. EDM2 [113], 425.B, page 1606, uses 
$\mathfrak{O}$ (Fraktur O) for the set of open sets and $\mathfrak{F}$ (Fraktur F) for the set of closed sets. But the Fraktur font is
difficult to write and read. The non-standard Notation 31.4.4 uses an over-bar to indicate the set of closed 
sets of a topological space (X,T) by analogy with Notations 31.3.4 and 31.8.8. 


31.4.4 NOTATION: The set of closed sets for a given topological space. 
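$\overline{\mathrm{Top}}(X)$, for a topological space $X \mathrel{-\!\!<} (X, T)$, denotes the set of closed sets in X. In other words,


$\overline{\mathrm{Top}}(X) = \{F \in \mathbb{P}(X);\ X \setminus F \in \mathrm{Top}(X)\}$
$= \{X \setminus \Omega;\ \Omega \in \mathrm{Top}(X)\}$
$= \{X \setminus \Omega;\ \Omega \in T\}$.

$\overline{\mathrm{Top}}_x(X)$ denotes the set of closed sets in a topological space $X \mathrel{-\!\!<} (X, T)$ which contain a point $x \in X$. That
is, $\overline{\mathrm{Top}}_x(X) = \{F \in \overline{\mathrm{Top}}(X);\ x \in F\}$.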


31.4.5 THEOREM: Closure of the set of all closed sets under finite unions and arbitrary intersections. 
Let X be a topological space. 


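(i) $\forall C \in \mathbb{P}^\infty(\overline{\mathrm{Top}}(X)),\ \bigcup C \in \overline{\mathrm{Top}}(X)$. (The union of any finite set of closed sets is closed.)
(ii) $\forall C \in \mathbb{P}_1(\overline{\mathrm{Top}}(X)),\ \bigcap C \in \overline{\mathrm{Top}}(X)$. (The intersection of any non-empty set of closed sets is closed.)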


PROOF: To prove part (i), let C be a non-empty finite set of closed sets in a topological space X. Let
$C' = \{X \setminus K;\ K \in C\}$. By Definition 31.4.1, $\forall S \in C',\ S \in \mathrm{Top}(X)$. By Theorem 31.3.7, $\bigcap C' \in \mathrm{Top}(X)$.
But $\bigcup C = X \setminus (\bigcap C')$. So the union of C is closed by Definition 31.4.1. If C is empty, $\bigcup C = \emptyset$, which is a
closed set.


For part (ii), let C be a non-empty set of closed sets in a topological space X. Let $C' = \{X \setminus K;\ K \in C\}$. By
Definition 31.4.1, $\forall S \in C',\ S \in \mathrm{Top}(X)$. By Definition 31.3.2 (iii), $\bigcup C' \in \mathrm{Top}(X)$. But $\bigcap C = X \setminus (\bigcup C')$.
So the intersection of C is closed by Definition 31.4.1.
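
The complement argument in this proof can be replayed on a finite example. In the sketch below, the space `X`, the topology `T` and the function name `closed_sets` are hypothetical. The loop checks the two complement identities used in the proof, and then checks closure of the set of closed sets under unions and non-empty intersections.

```python
from itertools import combinations

def closed_sets(X, T):
    """The set of complements X \\ U of the open sets U in T."""
    return {X - U for U in T}

# Hypothetical example space, chosen only for illustration.
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
K = closed_sets(X, T)

for r in range(1, len(K) + 1):
    for C in combinations(K, r):
        complements = [X - F for F in C]      # These are open sets.
        union = frozenset().union(*C)
        inter = X.intersection(*C)            # Equals the intersection of C, since each F is a subset of X.
        # The identities used in parts (i) and (ii) of the proof.
        assert union == X - X.intersection(*complements)
        assert inter == X - frozenset().union(*complements)
        # Closure of the set of closed sets under these operations.
        assert union in K and inter in K
```

In a finite space every collection of sets is finite, so the distinction between finite unions and arbitrary intersections cannot be seen in this example. It only illustrates the duality between open and closed sets.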


31.4.6 REMARK: Closed sets contain all of their limit points. 

Theorem 31.4.7 gives some idea of why closed sets are called “closed”. In the terminology of “limit points” 
in Definition 31.10.2, Theorem 31.4.7 (i) states in essence that a closed set contains all of its limit points. 
Theorem 31.4.7 (ii) states that closed sets are precisely those which have this property. Theorem 31.4.7 is 
the dual of Theorem 31.3.16. 


31.4.7 THEOREM: A set is closed if and only if it contains all of its limit points. 
Let X be a topological space. 
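(i) $\forall F \in \overline{\mathrm{Top}}(X),\ \forall x \in X,\ \bigl((\forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap F \neq \emptyset) \Rightarrow x \in F\bigr)$. In other words, if all neighbourhoods
of a point $x \in X$ have a non-empty intersection with a closed set F, then $x \in F$.


(ii) $\forall F \in \mathbb{P}(X),\ \bigl(F \in \overline{\mathrm{Top}}(X) \Leftrightarrow (\forall x \in X,\ ((\forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap F \neq \emptyset) \Rightarrow x \in F))\bigr)$.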


PROOF: For part (i), let F € Top(X) and x € X. Let G = X \ F. Then G € Top(X) by Definition 31.4.1. 
Suppose that x ¢ F. Then x € G. Let Q = G. Then QN F = (0. Thus z € F > 30 € Top,(X), ON F =f. 
This is logically equivalent to the contrapositive statement (VO € Top,(X), On F z 0) 2 x € F. Hence 
VF € Top(X), Vz € X, (VO € Top,(X), Qn F Z0) 2 x € F). 

For part (ii), let F € P(X) satisfy Vx € X, (VO € Top,(X), Qn F Z0) 2 x € F). Let G = X \ F. The 
equivalent contrapositive statement is Vr € X, (x € G => IQ € Top, (X), Q € G) because ON F = 0 if 
and only if Q C G. So Va € G, 3€ € Top, (X), Q C G. Therefore G € Top(X) by Theorem 31.3.16 (i). So 
F € Top(X) by Notation 31.4.4. The converse follows from part (i). 
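
The criterion in Theorem 31.4.7 (ii) can be checked exhaustively on a small finite space. The following Python sketch is only an illustration. The test space (a three-point set with a chain topology) and the helper names `powerset`, `is_closed` and `contains_adherent_points` are assumptions made for this example, not constructions from the text.

```python
from itertools import combinations

# Hypothetical test space: X = {1, 2, 3} with the chain topology {∅, {1}, {1,2}, X}.
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def powerset(points):
    """Return all subsets of `points` as frozensets."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def is_closed(F):
    """F is closed when its complement in X is open."""
    return (X - F) in T

def contains_adherent_points(F):
    """Check that every x whose open neighbourhoods all meet F belongs to F."""
    for x in X:
        neighbourhoods = [U for U in T if x in U]
        if all(U & F for U in neighbourhoods) and x not in F:
            return False
    return True

# The two columns agree for every subset F of X.
for F in powerset(X):
    print(sorted(F), is_closed(F), contains_adherent_points(F))
```

The two Boolean columns agree for all eight subsets, as the theorem requires for any topological space.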


31.5. Some topologies on finite sets 


31.5.1 REMARK: Minimalist topological spaces as a familiarisation exercise. 

It is a useful familiarisation exercise to determine in full generality the set of all topologies on the most 
trivial sets. On the other hand, the subject of topology is not directed at finite sets. Topology is principally 
concerned with limiting processes and continuity, and these concepts are only non-trivial for infinite sets. 
However, connectedness does make good sense in a finite set. Connectedness for finite sets is sometimes 
referred to as “network topology". 

Minimalist (i.e. finite) topological spaces do have practical applications. For example, if a continuous function 
has a finite range, the topology on the range-space puts constraints on the topology of the domain-space. 
The smallest topology $\mathrm{Top}(X) = \{\emptyset, X\}$ on any set $X$ is the trivial topology in Definition 31.3.18. The
largest topology $\mathrm{Top}(X) = \mathbb{P}(X) = \{S;\ S \subseteq X\}$ on any set $X$ is the discrete topology in Definition 31.3.19.
The interesting thing is to determine what the other possibilities are for the topology on any given set X. 


31.5.2 EXAMPLE: Only one topology is possible on the empty set. 
The only possible topology on the set $\emptyset$ is $\{\emptyset\}$. This gives the empty-set topological space $(\emptyset, \{\emptyset\})$ which is
mentioned in Remark 31.3.9.


31.5.3 REMARK: Without loss of generality, let the finite sets be sets of integers.
When considering the possible topologies on countable sets of points, it is convenient to consider only sets 
whose “points” are integers. Only the cardinality of the point set matters for this task. 


31.5.4 EXAMPLE: Only one topology is possible on a singleton set. 
On a single-element set $X = \{1\}$, the only possible topology is $\mathrm{Top}(X) = \{\emptyset, X\}$. In this case, the trivial
topology and the discrete topology are the same.


31.5.5 EXAMPLE: Four topologies are possible on a two-element set. 
On a two-element set X = {1,2}, there are four possible topologies. 


        topology                                  abbreviation
2a      $\{\emptyset, X\}$                        $\emptyset$
2b      $\{\emptyset, \{1\}, X\}$                 1
2c      $\{\emptyset, \{2\}, X\}$                 2
2d      $\{\emptyset, \{1\}, \{2\}, X\}$          1, 2


Topology 2a is the trivial topology. Topology 2d is the discrete topology. Topologies 2b and 2c are equivalent
under a permutation of the point set. So there is only one “interesting” topology on a two-point set. The
four possible topologies (amongst the $2^{(2^2)} = 16$ subsets of $\mathbb{P}(\{1,2\})$) are illustrated in Figure 31.5.1.


2a 2b 2c 2d 


Figure 31.5.1 All topologies on the set {1,2} 


Since $\emptyset$ and $X$ are always elements of a topology on a set $X$, it makes sense to ignore them. Similarly, the
set brackets are a distraction. So the topologies may be abbreviated as in the right column of the table.


31.5.6 EXAMPLE: Nine unique topologies are possible on a three-element set, modulo permutations.
On a three-element set X = {1,2,3}, there are 9 unique topologies. The other topologies are obtained by 
permuting the point set. 


        topology                  multiplicity
3a      $\emptyset$               1
3b      1                         3
3c      12                        3
3d      12, 3                     3
3e      1, 12                     6
3f      1, 2, 12                  3
3g      2, 12, 23                 3
3h      1, 2, 12, 23              6
3i      1, 2, 3, 12, 13, 23       1

total                             29

Topology 3a is the trivial topology. Topology 3i is the discrete topology. Including all of the permutations
of the point set, there are 29 possible topologies amongst the $2^{(2^3)} = 256$ subsets of $\mathbb{P}(\{1,2,3\})$. Topologies
3a to 3h are illustrated in Figure 31.5.2.
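
The multiplicities in the table can be cross-checked by brute force. The sketch below is an independent Python illustration. The function names and the canonical-form trick (taking the smallest relabelled copy over all permutations) are choices made for this example. It enumerates every family of subsets of {1, 2, 3} which contains the empty set and the whole set, keeps those closed under pairwise unions and intersections, and groups them by permutation class.

```python
from collections import Counter
from itertools import combinations, permutations

POINTS = (1, 2, 3)
X = frozenset(POINTS)

def powerset(points):
    """Return all subsets of `points` as frozensets."""
    return [frozenset(c) for r in range(len(points) + 1) for c in combinations(points, r)]

def is_topology(family):
    """Finite case: ∅ and X present, closed under pairwise unions and intersections."""
    return (frozenset() in family and X in family
            and all((a | b) in family and (a & b) in family
                    for a in family for b in family))

def all_topologies():
    """Exhaustively list all topologies on POINTS."""
    optional = [s for s in powerset(POINTS) if s and s != X]
    found = []
    for r in range(len(optional) + 1):
        for chosen in combinations(optional, r):
            family = frozenset(chosen) | {frozenset(), X}
            if is_topology(family):
                found.append(family)
    return found

def canonical(family):
    """Smallest relabelled copy of `family` over all permutations of the points."""
    images = []
    for perm in permutations(POINTS):
        relabel = dict(zip(POINTS, perm))
        image = sorted(tuple(sorted(relabel[x] for x in s)) for s in family)
        images.append(tuple(image))
    return min(images)

topologies = all_topologies()
multiplicities = Counter(canonical(t) for t in topologies)
print(len(topologies), len(multiplicities), sorted(multiplicities.values()))
```

The three printed values are the total number of topologies, the number of permutation classes, and the sorted class sizes, which can be compared with the total and the multiplicity column of the table.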


31.5.7 REMARK: Possible topologies on sets with four or more elements. 

The set of all 355 valid topologies on a four-element set (amongst the 65,536 subsets of $\mathbb{P}(\{1,2,3,4\})$) is
demonstrated in Example 31.5.8. There are 6942 valid topologies with five points amongst the 4,294,967,296
subsets of $\mathbb{P}(\{1,2,3,4,5\})$ (assuming that the author's PHP script is correct), which starts to make exhaustive
search difficult.


Since $\emptyset$ and $X$ are always elements of a topology on $X = \mathbb{N}_n$, for $n \in \mathbb{Z}^+$, the number of subsets of $\mathbb{P}(X)$ which must be
checked to find valid topologies is $2^{2^n-2}$ for $n \ge 1$. This number increases rather rapidly with increasing $n$.


[Figure 31.5.2: diagrams of topologies 3a, 3b, 3c, 3d, 3e, 3f, 3g and 3h]
Figure 31.5.2 Unique topologies on the set {1, 2, 3}


For $n = 6$, $2^{2^n-2}$ is equal to 4,611,686,018,427,387,904, which is beyond the capability of current computers
without prior analysis to reduce the search space by many orders of magnitude. If topologies on even such 
small sets are too difficult to search, clearly countably and uncountably infinite sets are beyond exhaustive 
study. However, by imposing uniformity conditions, substantial reductions can be achieved. 
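
The counts quoted in this remark for small $n$ can be reproduced by a direct exhaustive search. The following Python sketch is not the author's PHP script. It is a minimal independent illustration, and the bitmask encoding of subsets and the names `count_topologies` and `search_space` are assumptions made here. For finite families, closure under pairwise unions and intersections is enough to verify the topology axioms.

```python
def count_topologies(n):
    """Count topologies on an n-point set by exhaustive search over bitmask-encoded subsets."""
    full = (1 << n) - 1                     # bitmask of the whole set X
    optional = list(range(1, full))         # proper non-empty subsets of X
    count = 0
    for mask in range(1 << len(optional)):  # every choice of optional subsets
        family = {0, full}                  # ∅ and X always belong to a topology
        family.update(s for i, s in enumerate(optional) if (mask >> i) & 1)
        if all((a | b) in family and (a & b) in family
               for a in family for b in family):
            count += 1
    return count

def search_space(n):
    """Number of candidate families 2^(2^n - 2), valid for n >= 1."""
    return 2 ** (2 ** n - 2)

# Columns: n, number of candidate families, number of valid topologies.
for n in range(1, 5):
    print(n, search_space(n), count_topologies(n))
```

The loop stops at $n = 4$ because the same search for $n = 5$ would already have to visit $2^{30}$ candidate families.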


31.5.8 EXAMPLE: All possible topologies on a four-element set. 
Figure 31.5.3 is a computer-generated tabulation of all possible topologies on a four-element set.


The empty set and the set {1, 2, 3, 4} have been omitted from all 355 topologies in Figure 31.5.3 for brevity.
For example, the 25th topology in the list is $T_{25} = \{\emptyset, \{1,2\}, \{3\}, \{1,2,3\}, \{1,2,3,4\}\}$, which is abbreviated
as “T25 (12, 3, 123)”.


31.5.9 REMARK: The dual of a topology is a valid topology on the same set. 
In the case of finite point sets X, there is a simple duality between the open sets and closed sets because an 
“arbitrary union” of sets in a topology on X means a “finite union” of sets when X is finite. 


Let $T$ be a topology on a finite set $X$. Then $T' = \{X \setminus \Omega;\ \Omega \in T\}$ is a topology on $X$. The topology $T'$ is a kind
of “dual topology” of $T$. The closure of $T'$ under set union follows from the fact that $(X \setminus \Omega_1) \cup (X \setminus \Omega_2) =
X \setminus (\Omega_1 \cap \Omega_2)$ for all $\Omega_1, \Omega_2 \in T$. Closure under intersections follows from $(X \setminus \Omega_1) \cap (X \setminus \Omega_2) = X \setminus (\Omega_1 \cup \Omega_2)$.
Of course the dual of the dual topology $T'$ is the same as the original topology $T$.


In Example 31.5.5, topology 2b is the dual of topology 2c. Both topologies 2a and 2d are self-dual. 


In Example 31.5.6, the dual of topology 3b is the same as a permutation of topology 3c, and the dual of
topology 3f is the same as a permutation of topology 3g. Topologies 3e and 3h are equivalent (under set
permutations) to their own dual topologies. Topologies 3a, 3d and 3i are self-dual.
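
The duality in this remark is easy to experiment with. The Python sketch below is illustrative only. The names `fs`, `dual` and `is_topology` are assumptions made for this example, and the three families are written out by hand from the abbreviations 3b, 3c and 3d in Example 31.5.6. It forms the family of complements and checks that the result is again a topology.

```python
X = frozenset({1, 2, 3})

def fs(*elements):
    """Shorthand for building a frozenset."""
    return frozenset(elements)

def is_topology(family):
    """Finite case: ∅ and X present, closed under pairwise unions and intersections."""
    return (fs() in family and X in family
            and all((a | b) in family and (a & b) in family
                    for a in family for b in family))

def dual(family):
    """Family of complements X \\ Ω of the sets Ω in `family`."""
    return frozenset(X - s for s in family)

T_3b = frozenset({fs(), fs(1), X})            # abbreviation "1"
T_3c = frozenset({fs(), fs(1, 2), X})         # abbreviation "12"
T_3d = frozenset({fs(), fs(1, 2), fs(3), X})  # abbreviation "12, 3"

print(sorted(sorted(s) for s in dual(T_3b)))  # open sets of the dual of 3b
print(dual(T_3d) == T_3d)                     # whether 3d equals its own dual
```

The dual of 3b has the single non-trivial open set {2, 3}, which is topology 3c with the points relabelled.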


31.5.10 EXAMPLE: Permutation-invariance of a finite topology implies the discrete topology. 

In practice, one usually wants topologies which have some sort of uniformity or symmetry. For example,
a topology on sets like $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$ would generally be expected to be invariant under
arbitrary translations. Translation invariance is a very strong constraint which greatly reduces the set of
possible topologies on a set.


If $X$ is a finite set, the only topologies on $X$ which are invariant under all permutations of $X$ are the trivial
and discrete topologies. To see this, suppose $\{x\} \in \mathrm{Top}(X)$ for some $x \in X$. Then permutation invariance
implies that $\{x\} \in \mathrm{Top}(X)$ for all $x \in X$. Since all unions of elements of $\mathrm{Top}(X)$ are elements of $\mathrm{Top}(X)$,
this implies that $\mathrm{Top}(X) = \mathbb{P}(X)$.


Now suppose that $A \in \mathrm{Top}(X)$ is an arbitrary non-empty open set such that $A \ne X$. Then there are elements
$x, y \in X$ such that $x \in A$ and $y \notin A$. So by permutation invariance of $\mathrm{Top}(X)$, the set $B_z = \mathrm{swap}_{y,z}(A)$ is an
element of $\mathrm{Top}(X)$ for all $z \in Z = X \setminus \{x\}$. (See Definition 14.12.23 (i) for the swap-value operator “swap”.)
Therefore the finite intersection $B = \bigcap_{z \in Z} B_z$ must be an element of $\mathrm{Top}(X)$. But $B = \{x\}$. So by the first
argument, $\mathrm{Top}(X)$ is once again the discrete topology. Note that the finiteness of $X$ was an essential step
in this proof.


It follows that there are no interesting topologies on finite sets which are invariant under all permutations 
of the point set. 
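
The conclusion of this example can be confirmed by exhaustive search on two-point and three-point sets. The Python sketch below is an illustration under stated assumptions. The generator `topologies_on` and the test `is_permutation_invariant` are names invented for this example, and the search is a plain brute-force enumeration rather than anything taken from the text.

```python
from itertools import combinations, permutations

def powerset(points):
    """Return all subsets of `points` as frozensets."""
    return [frozenset(c) for r in range(len(points) + 1) for c in combinations(points, r)]

def topologies_on(points):
    """Yield every topology on `points` (finite case, pairwise closure suffices)."""
    X = frozenset(points)
    optional = [s for s in powerset(points) if s and s != X]
    for r in range(len(optional) + 1):
        for chosen in combinations(optional, r):
            family = frozenset(chosen) | {frozenset(), X}
            if all((a | b) in family and (a & b) in family
                   for a in family for b in family):
                yield family

def is_permutation_invariant(family, points):
    """True if every permutation of the points maps the family onto itself."""
    for perm in permutations(points):
        relabel = dict(zip(points, perm))
        image = frozenset(frozenset(relabel[x] for x in s) for s in family)
        if image != family:
            return False
    return True

def invariant_topologies(points):
    """All topologies on `points` which are invariant under all permutations."""
    return [t for t in topologies_on(points) if is_permutation_invariant(t, points)]

# For each point set, print the number of open sets in each invariant topology.
for points in [(1, 2), (1, 2, 3)]:
    print(points, [len(t) for t in invariant_topologies(points)])
```

In both cases only two topologies survive, with 2 open sets (trivial) and $2^n$ open sets (discrete).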
T1() T2(1) T3(2) T4(12) T5(1,12) T6(2,12) T7(1,2,12) T8(3) T9(13) T10(1,13) T11(3,13) T12(1,3,13) T13(23) T14(2,23)
T15(3,23) T16(2,3,23) T17(123) T18(1,123) T19(2,123) T20(12,123) T21(1,12,123) T22(2,12,123) T23(1,2,12,123) T24(3,123)
T25(12,3,123) T26(13,123) T27(1,13,123) T28(2,13,123) T29(1,12,13,123) T30(1,2,12,13,123) T31(3,13,123) T32(1,3,13,123)
T33(1,12,3,13,123) T34(23,123) T35(1,23,123) T36(2,23,123) T37(2,12,23,123) T38(1,2,12,23,123) T39(3,23,123)
T40(2,3,23,123) T41(2,12,3,23,123) T42(3,13,23,123) T43(1,3,13,23,123) T44(2,3,13,23,123) T45(1,2,12,3,13,23,123) T46(4)
T47(123,4) T48(14) T49(1,14) T50(23,14) T51(1,123,14) T52(1,23,123,14) T53(4,14) T54(1,4,14) T55(1,123,4,14) T56(24)
T57(2,24) T58(13,24) T59(2,123,24) T60(2,13,123,24) T61(4,24) T62(2,4,24) T63(2,123,4,24) T64(124) T65(1,124) T66(2,124)
T67(12,124) T68(1,12,124) T69(2,12,124) T70(1,2,12,124) T71(3,124) T72(1,13,124) T73(1,3,13,124) T74(2,23,124)
T75(2,3,23,124) T76(12,123,124) T77(1,12,123,124) T78(2,12,123,124) T79(1,2,12,123,124) T80(12,3,123,124)
T81(1,12,13,123,124) T82(1,2,12,13,123,124) T83(1,12,3,13,123,124) T84(2,12,23,123,124) T85(1,2,12,23,123,124)
T86(2,12,3,23,123,124) T87(1,2,12,3,13,23,123,124) T88(4,124) T89(12,4,124) T90(12,123,4,124) T91(14,124) T92(1,14,124)
T93(2,14,124) T94(1,12,14,124) T95(1,2,12,14,124) T96(2,23,14,124) T97(1,12,123,14,124) T98(1,2,12,123,14,124)
T99(1,2,12,23,123,14,124) T100(4,14,124) T101(1,4,14,124) T102(1,12,4,14,124) T103(1,12,123,4,14,124) T104(24,124)


T105(1,24,124) T106(2,24,124) T107(2,12,24,124) T108(1,2,12,24,124) T109(1,13,24,124) T110(2,12,123,24,124)
T111(1,2,12,123,24,124) T112(1,2,12,13,123,24,124) T113(4,24,124) T114(2,4,24,124) T115(2,12,4,24,124)
T116(2,12,123,4,24,124) T117(4,14,24,124) T118(1,4,14,24,124) T119(2,4,14,24,124) T120(1,2,12,4,14,24,124)
T121(1,2,12,123,4,14,24,124) T122(34) T123(12,34) T124(3,34) T125(3,123,34) T126(12,3,123,34) T127(4,34) T128(3,4,34)
T129(3,123,4,34) T130(4,124,34) T131(12,4,124,34) T132(3,4,124,34) T133(12,3,123,4,124,34) T134(134) T135(1,134)
T136(2,134) T137(1,12,134) T138(1,2,12,134) T139(3,134) T140(13,134) T141(1,13,134) T142(3,13,134) T143(1,3,13,134)
T144(3,23,134) T145(2,3,23,134) T146(13,123,134) T147(1,13,123,134) T148(2,13,123,134) T149(1,12,13,123,134)
T150(1,2,12,13,123,134) T151(3,13,123,134) T152(1,3,13,123,134) T153(1,12,3,13,123,134) T154(3,13,23,123,134)
T155(1,3,13,23,123,134) T156(2,3,13,23,123,134) T157(1,2,12,3,13,23,123,134) T158(4,134) T159(13,4,134) T160(13,123,4,134)
T161(14,134) T162(1,14,134) T163(3,14,134) T164(1,13,14,134) T165(1,3,13,14,134) T166(3,23,14,134) T167(1,13,123,14,134)
T168(1,3,13,123,14,134) T169(1,3,13,23,123,14,134) T170(4,14,134) T171(1,4,14,134) T172(1,13,4,14,134)
T173(1,13,123,4,14,134) T174(4,24,134) T175(2,4,24,134) T176(13,4,24,134) T177(2,13,123,4,24,134) T178(14,124,134)
T179(1,14,124,134) T180(2,14,124,134) T181(1,12,14,124,134) T182(1,2,12,14,124,134) T183(3,14,124,134)
T184(1,13,14,124,134) T185(1,3,13,14,124,134) T186(2,3,23,14,124,134) T187(1,12,13,123,14,124,134)
T188(1,2,12,13,123,14,124,134) T189(1,12,3,13,123,14,124,134) T190(1,2,12,3,13,23,123,14,124,134) T191(4,14,124,134)
T192(1,4,14,124,134) T193(1,12,4,14,124,134) T194(1,13,4,14,124,134) T195(1,12,13,123,4,14,124,134) T196(4,14,24,124,134)
T197(1,4,14,24,124,134) T198(2,4,14,24,124,134) T199(1,2,12,4,14,24,124,134) T200(1,13,4,14,24,124,134)
T201(1,2,12,13,123,4,14,24,124,134) T202(34,134) T203(1,34,134) T204(1,12,34,134) T205(3,34,134) T206(3,13,34,134)
T207(1,3,13,34,134) T208(3,13,123,34,134) T209(1,3,13,123,34,134) T210(1,12,3,13,123,34,134) T211(4,34,134) T212(3,4,34,134)
T213(3,13,4,34,134) T214(3,13,123,4,34,134) T215(4,14,34,134) T216(1,4,14,34,134) T217(3,4,14,34,134)
T218(1,3,13,4,14,34,134) T219(1,3,13,123,4,14,34,134) T220(4,14,124,34,134) T221(1,4,14,124,34,134)
T222(1,12,4,14,124,34,134) T223(3,4,14,124,34,134) T224(1,3,13,4,14,124,34,134) T225(1,12,3,13,123,4,14,124,34,134)

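T226(234) T227(1,234) T228(2,234) T229(2,12,234) T230(1,2,12,234) T231(3,234) T232(3,13,234) T233(1,3,13,234)
T234(23,234) T235(2,23,234) T236(3,23,234) T237(2,3,23,234) T238(23,123,234) T239(1,23,123,234) T240(2,23,123,234)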


T241(2,12,23,123,234) T242(1,2,12,23,123,234) T243(3,23,123,234) T244(2,3,23,123,234) T245(2,12,3,23,123,234)
T246(3,13,23,123,234) T247(1,3,13,23,123,234) T248(2,3,13,23,123,234) T249(1,2,12,3,13,23,123,234) T250(4,234)
T251(23,4,234) T252(23,123,4,234) T253(4,14,234) T254(1,4,14,234) T255(23,4,14,234) T256(1,23,123,4,14,234) T257(24,234)
T258(2,24,234) T259(3,24,234) T260(3,13,24,234) T261(2,23,24,234) T262(2,3,23,24,234) T263(2,23,123,24,234)
T264(2,3,23,123,24,234) T265(2,3,13,23,123,24,234) T266(4,24,234) T267(2,4,24,234) T268(2,23,4,24,234)
T269(2,23,123,4,24,234) T270(24,124,234) T271(1,24,124,234) T272(2,24,124,234) T273(2,12,24,124,234)
T274(1,2,12,24,124,234) T275(3,24,124,234) T276(1,3,13,24,124,234) T277(2,23,24,124,234) T278(2,3,23,24,124,234)
T279(2,12,23,123,24,124,234) T280(1,2,12,23,123,24,124,234) T281(2,12,3,23,123,24,124,234)
T282(1,2,12,3,13,23,123,24,124,234) T283(4,24,124,234) T284(2,4,24,124,234) T285(2,12,4,24,124,234) T286(2,23,4,24,124,234)
T287(2,12,23,123,4,24,124,234) T288(4,14,24,124,234) T289(1,4,14,24,124,234) T290(2,4,14,24,124,234)
T291(1,2,12,4,14,24,124,234) T292(2,23,4,14,24,124,234) T293(1,2,12,23,123,4,14,24,124,234) T294(34,234) T295(2,34,234)
T296(2,12,34,234) T297(3,34,234) T298(3,23,34,234) T299(2,3,23,34,234) T300(3,23,123,34,234) T301(2,3,23,123,34,234)
T302(2,12,3,23,123,34,234) T303(4,34,234) T304(3,4,34,234) T305(3,23,4,34,234) T306(3,23,123,4,34,234) T307(4,24,34,234)
T308(2,4,24,34,234) T309(3,4,24,34,234) T310(2,3,23,4,24,34,234) T311(2,3,23,123,4,24,34,234) T312(4,24,124,34,234)
T313(2,4,24,124,34,234) T314(2,12,4,24,124,34,234) T315(3,4,24,124,34,234) T316(2,3,23,4,24,124,34,234)
T317(2,12,3,23,123,4,24,124,34,234) T318(34,134,234) T319(1,34,134,234) T320(2,34,134,234) T321(1,2,12,34,134,234)
T322(3,34,134,234) T323(3,13,34,134,234) T324(1,3,13,34,134,234) T325(3,23,34,134,234) T326(2,3,23,34,134,234)
T327(3,13,23,123,34,134,234) T328(1,3,13,23,123,34,134,234) T329(2,3,13,23,123,34,134,234)
T330(1,2,12,3,13,23,123,34,134,234) T331(4,34,134,234) T332(3,4,34,134,234) T333(3,13,4,34,134,234) T334(3,23,4,34,134,234)
T335(3,13,23,123,4,34,134,234) T336(4,14,34,134,234) T337(1,4,14,34,134,234) T338(3,4,14,34,134,234)
T339(1,3,13,4,14,34,134,234) T340(3,23,4,14,34,134,234) T341(1,3,13,23,123,4,14,34,134,234) T342(4,24,34,134,234)
T343(2,4,24,34,134,234) T344(3,4,24,34,134,234) T345(3,13,4,24,34,134,234) T346(2,3,23,4,24,34,134,234)
T347(2,3,13,23,123,4,24,34,134,234) T348(4,14,24,124,34,134,234) T349(1,4,14,24,124,34,134,234)
T350(2,4,14,24,124,34,134,234) T351(1,2,12,4,14,24,124,34,134,234) T352(3,4,14,24,124,34,134,234)
T353(1,3,13,4,14,24,124,34,134,234) T354(2,3,23,4,14,24,124,34,134,234) T355(1,2,12,3,13,23,123,4,14,24,124,34,134,234)


Figure 31.5.3 Example 31.5.8: List of all topologies on the point set {1, 2,3, 4} 


31.6. Relative topology on subsets 


31.6.1 REMARK: The topology induced on a subset by the topology on the whole set. 

The standard terminology “relative topology” in Definition 31.6.2 is not as clear as it could be. It would have 
been better to call it “the induced topology by T on S”, or “the topology induced on S by the topology T”, 
or something like that. It is important to know where the topology came from. Therefore the slightly 
non-standard word “induced” is applied in Definition 31.6.2. 


31.6.2 DEFINITION: The relative topology on a subset $S$ of a set $X$, induced by a topology $T$ on $X$, is the
set $\{\Omega \cap S;\ \Omega \in T\}$.


31.6.3 THEOREM: The relative topology is a valid topology. 
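Let $T$ be a topology on a set $X$. Then the relative topology induced by $T$ on a set $S \subseteq X$ is a topology on $S$.


PROOF: Let $T$ be a topology on $X$. Let $S \subseteq X$. Let $T' = \{\Omega \cap S;\ \Omega \in T\} \subseteq \mathbb{P}(S)$ denote the relative
topology induced by $T$ on $S$. Then $\emptyset = \emptyset \cap S \in T'$ because $\emptyset \in T$, and $S = X \cap S \in T'$ because $X \in T$.

Let $\Omega'_1, \Omega'_2 \in T'$. Then $\Omega'_1 = \Omega_1 \cap S$ and $\Omega'_2 = \Omega_2 \cap S$ for some $\Omega_1, \Omega_2 \in T$. So $\Omega_1 \cap \Omega_2 \in T$ since $T$ is a
topology. So $\Omega'_1 \cap \Omega'_2 = (\Omega_1 \cap S) \cap (\Omega_2 \cap S) = (\Omega_1 \cap \Omega_2) \cap S \in T'$. So $T'$ is closed under binary intersections.

Let $C' \in \mathbb{P}(T')$. Then $\forall \Omega' \in C',\ \exists \Omega \in T,\ \Omega' = \Omega \cap S$. (Choosing a single set $\Omega$ for each $\Omega' \in C'$ at this
point in the proof would risk invoking the axiom of choice. This is avoided by using all possible sets $\Omega$.) Let
$C = \{\Omega \in T;\ \exists \Omega' \in C',\ \Omega' = \Omega \cap S\}$. Then $\forall \Omega' \in \mathbb{P}(S),\ (\Omega' \in C' \Leftrightarrow (\exists \Omega \in C,\ \Omega' = \Omega \cap S))$. But $\bigcup C \in T$
because $T$ is a topology. So

$$\begin{aligned}
\bigcup C' &= \bigcup \{\Omega';\ \Omega' \in C'\} \\
&= \bigcup \{\Omega';\ \Omega' \in C' \text{ and } \exists \Omega \in T,\ \Omega' = \Omega \cap S\} \\
&= \bigcup \{\Omega' \in \mathbb{P}(S);\ \Omega' \in C' \text{ and } \exists \Omega \in T,\ \Omega' = \Omega \cap S\} \\
&= \bigcup \{\Omega' \in \mathbb{P}(S);\ \exists \Omega \in C,\ \Omega' = \Omega \cap S\} \\
&= \bigcup \{\Omega \cap S;\ \Omega \in C\} \\
&= \bigcup \{\Omega;\ \Omega \in C\} \cap S \\
&= (\bigcup C) \cap S.
\end{aligned}$$

Therefore $\bigcup C' \in T'$. Hence $T'$ is a topology on $S$ by Definition 31.3.2.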


31.6.4 REMARK: The relative topology on an open subset of a topological space. 

Theorem 31.6.5 implies that the induced topology on an open subset of a topological space consists of 
sets which are open in the whole space. The theorem is shallow, and the proof is a shallow exercise. 
Theorem 31.6.6 is an easy corollary of Theorem 31.6.5. Then Theorem 31.6.7 paraphrases and extends 
Theorem 31.6.6. 


31.6.5 THEOREM: The relative topology on an open set is the simple restriction of the global topology. 
Let $T$ be a topology on a set $X$. Let $S \in T$. Let $T_S$ be the relative topology induced by $T$ on $S$. Then
$T_S = T \cap \mathbb{P}(S)$.


PROOF: Let $T$ be a topology on a set $X$. Let $S \in T$. (In other words, let $S$ be an open set in $(X, T)$.) Let
$T_S$ be the relative topology induced by $T$ on $S$. Then $T_S = \{\Omega \cap S;\ \Omega \in T\}$ by Definition 31.6.2. Suppose
that $G \in T_S$. Then $G = \Omega \cap S$ for some $\Omega \in T$. So $G \in T$ because $T$ is closed under finite intersections. But $G \in \mathbb{P}(S)$
because $G \subseteq S$. So $G \in T \cap \mathbb{P}(S)$. Therefore $T_S \subseteq T \cap \mathbb{P}(S)$. Now suppose that $G \in T \cap \mathbb{P}(S)$. Let $\Omega = G$.
Then $G = \Omega \cap S$ and $\Omega \in T$. So $G \in T_S$. Therefore $T \cap \mathbb{P}(S) \subseteq T_S$. Hence $T_S = T \cap \mathbb{P}(S)$.
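
Definition 31.6.2 and Theorem 31.6.5 can be illustrated on a small finite space. In the Python sketch below, the three-point topology and the helper names `relative_topology` and `restriction` are assumptions chosen for this example. The sketch computes the relative topology on an open subset and on a non-open subset, and compares each with the simple restriction $T \cap \mathbb{P}(S)$.

```python
def fs(*elements):
    """Shorthand for building a frozenset."""
    return frozenset(elements)

# Hypothetical example space: X = {1, 2, 3} with open sets ∅, {1}, {2}, {1,2}, {2,3}, X.
X = fs(1, 2, 3)
T = {fs(), fs(1), fs(2), fs(1, 2), fs(2, 3), X}

def is_topology(family, whole):
    """Finite case: ∅ and the whole set present, closed under pairwise ∪ and ∩."""
    return (fs() in family and whole in family
            and all((a | b) in family and (a & b) in family
                    for a in family for b in family))

def relative_topology(topology, S):
    """The set {Ω ∩ S; Ω ∈ T} of Definition 31.6.2."""
    return {U & S for U in topology}

def restriction(topology, S):
    """The set T ∩ P(S) of open sets which are subsets of S."""
    return {U for U in topology if U <= S}

open_S = fs(1, 2)       # an element of T
non_open_S = fs(1, 3)   # not an element of T

for S in (open_S, non_open_S):
    T_S = relative_topology(T, S)
    print(sorted(S), is_topology(T_S, S), T_S == restriction(T, S))
```

For the open subset the two families coincide, as Theorem 31.6.5 asserts. For the non-open subset the relative topology is still a topology on $S$, but it is strictly larger than $T \cap \mathbb{P}(S)$.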


31.6.6 THEOREM: All relatively open subsets of an open set are open in the global topology. 

Let $T$ be a topology on a set $X$. Let $S \in T$. Let $T_S$ be the relative topology induced by $T$ on $S$. Then $T_S \subseteq T$.
In other words, all sets which are open in the relative topological space $(S, T_S)$ are also open in the topological
space $(X, T)$.


PROOF: The relation $T_S \subseteq T$ follows from the equality $T_S = T \cap \mathbb{P}(S)$ in Theorem 31.6.5.


31.6.7 THEOREM: Some properties of the set of closed sets in the relative topology on a subset. 
Let X be a topological space. 


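(i) If $S \in \mathrm{Top}(X)$ has the relative topology from $X$ on $S$, then $\mathrm{Top}(S) \subseteq \mathrm{Top}(X)$.
(ii) If $S \in \mathbb{P}(X)$ has the relative topology from $X$ on $S$, then $\overline{\mathrm{Top}}(S) = \{F \cap S;\ F \in \overline{\mathrm{Top}}(X)\}$.
(iii) If $S \in \overline{\mathrm{Top}}(X)$ has the relative topology from $X$ on $S$, then $\overline{\mathrm{Top}}(S) \subseteq \overline{\mathrm{Top}}(X)$.


PROOF: Part (i) is a paraphrase of Theorem 31.6.6. So it must be true!


For part (ii), let $X$ be a topological space, $S \in \mathbb{P}(X)$ and $F \in \overline{\mathrm{Top}}(S)$. That is, $F$ is a closed set in
the relative topology $\mathrm{Top}(S)$ on $S$. Then $F = S \setminus G$ for some $G \in \mathrm{Top}(S)$ by Definition 31.4.1. But
$G = \Omega \cap S$ for some $\Omega \in \mathrm{Top}(X)$ by Definition 31.6.2. So $F = S \setminus (\Omega \cap S) = (S \setminus \Omega) \cup (S \setminus S)$ by
Theorem 8.2.3 (ii). Therefore $F = S \setminus \Omega = S \cap (X \setminus \Omega) = F' \cap S$, where $F' = X \setminus \Omega \in \overline{\mathrm{Top}}(X)$. It
follows that $\overline{\mathrm{Top}}(S) \subseteq \{F \cap S;\ F \in \overline{\mathrm{Top}}(X)\}$. Now suppose that $F' \in \{F \cap S;\ F \in \overline{\mathrm{Top}}(X)\}$. Then
$F' = F \cap S$ for some $F \in \overline{\mathrm{Top}}(X)$. So $F' = (X \setminus \Omega) \cap S = S \setminus \Omega = S \setminus (S \cap \Omega)$ for some $\Omega \in \mathrm{Top}(X)$. But
$S \cap \Omega \in \mathrm{Top}(S)$ by Definition 31.6.2. So $F' \in \overline{\mathrm{Top}}(S)$. Therefore $\{F \cap S;\ F \in \overline{\mathrm{Top}}(X)\} \subseteq \overline{\mathrm{Top}}(S)$. Hence
$\overline{\mathrm{Top}}(S) = \{F \cap S;\ F \in \overline{\mathrm{Top}}(X)\}$.

For part (iii), let $X$ be a topological space, $S \in \overline{\mathrm{Top}}(X)$ and $F \in \overline{\mathrm{Top}}(S)$. Then $F = F' \cap S$ for some
$F' \in \overline{\mathrm{Top}}(X)$ by part (ii). So $F \in \overline{\mathrm{Top}}(X)$ by Theorem 31.4.5 (ii) because $F' \in \overline{\mathrm{Top}}(X)$ and $S \in \overline{\mathrm{Top}}(X)$.
Hence $\overline{\mathrm{Top}}(S) \subseteq \overline{\mathrm{Top}}(X)$.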


31.7. Open covers 


31.7.1 REMARK:  Unindexed open covers of sets. 
Definition 31.7.2 is a specialisation of the non-topological set cover concept in Definition 8.7.7. For indexed 
open covers, see Definitions 33.5.4, 33.5.5 and 33.5.7. 


31.7.2 DEFINITION: An open cover of a subset $S$ of a topological space $X \mathrel{-\!\!<} (X, T)$ is a set $C \in \mathbb{P}(T)$ such
that $S \subseteq \bigcup C$.


A finite open cover of a set $S$ is an open cover $C$ of $S$ such that $\#(C) < \infty$.


31.7.3 DEFINITION: A subcover of an open cover $C$ of a set $S$ in a set $X$ is an open cover $C'$ of $S$ in $X$
such that $C' \subseteq C$.


31.7.4 DEFINITION: A refinement of an open cover $C$ of a set $S$ in a set $X$ is an open cover $C'$ of $S$ in $X$
such that $\forall A' \in C',\ \exists A \in C,\ A' \subseteq A$.


31.7.5 REMARK: Open covers of the whole topological space. 
In the special case $S = X$ in Definition 31.7.2, the inclusion condition “$S \subseteq \bigcup C$” can be replaced by the
equality “$S = \bigcup C$” because then $\bigcup C \subseteq \bigcup T = X = S$.


31.7.6 REMARK: Sets are open if and only if they are “locally open” with respect to an open cover. 

The very elementary Theorem 31.7.7 is a fundamental property of open covers which helps to explain the 
ubiquity of open covers in differential geometry. Lines (31.7.1) and (31.7.2) mean that a set is open if and
only if it is “locally open”, where a set $S$ is said to be “locally open” with respect to an open cover $C$ if
$\Omega \cap S$ is open for all $\Omega \in C$. This implies that the “information” in the topology $T$ is equivalent to the
combined “information” in the relative topologies of the sets in any open cover. This guarantees that for
many purposes, global properties can be defined locally. For example, a subset of a locally Cartesian space 
is open if and only if its image in each chart is open. 


31.7.7 THEOREM: A set is open if and only if its intersection with each set of an open cover is open. 
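Let $(X, T)$ be a topological space and let $C$ be an open cover of $X$. Then

\begin{align}
\forall S \in \mathbb{P}(X),\quad S \in T &\Leftrightarrow \forall \Omega \in C,\ \Omega \cap S \in T \tag{31.7.1} \\
&\Leftrightarrow \forall \Omega \in C,\ \Omega \cap S \in T_\Omega, \tag{31.7.2}
\end{align}

where $T_\Omega$ denotes the relative topology on $\Omega$. Hence $T = \{\bigcup_{\Omega \in C} G_\Omega;\ G : C \to T \text{ and } \forall \Omega \in C,\ G_\Omega \in T_\Omega\}$.


PROOF: Let $S \in T$. Then $\Omega \cap S \in T$ for all $\Omega \in C$ by the closure of $T$ under finite intersection.

Let $S \in \mathbb{P}(X)$ and suppose that $\Omega \cap S \in T$ for all $\Omega \in C$. Then $S = X \cap S = (\bigcup C) \cap S = \bigcup_{\Omega \in C} (\Omega \cap S)$ by
Theorem 8.4.8 (iv). So $S \in T$ by the closure of $T$ under arbitrary unions. This verifies line (31.7.1).

To show line (31.7.2), let $\Omega \in C$ with $\Omega \cap S \in T$. Then $\Omega \cap S = \Omega \cap (\Omega \cap S) \in T_\Omega$ by Definitions 31.6.2
and 31.7.2. Now let $\Omega \in C$ with $\Omega \cap S \in T_\Omega$. Then $\Omega \cap S \in T$ by Theorem 31.6.6. Thus the right-hand sides
of lines (31.7.1) and (31.7.2) are equivalent.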


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




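To verify the claimed set-expression for $T$, let $S = \bigcup_{\Omega \in C} G_\Omega$ for some function $G : C \to T$ which satisfies
$\forall \Omega \in C,\ G_\Omega \in T_\Omega$. Then $G_\Omega = G'_\Omega \cap \Omega$ for some $G'_\Omega \in T$, for all $\Omega \in C$, by Definition 31.6.2 for the relative
topology $T_\Omega$. Therefore $G_\Omega \in T$ for all $\Omega \in C$. So $S \in T$ by the closure of $T$ under unions.

Conversely, suppose that $S \in T$. Then $\forall \Omega \in C,\ \Omega \cap S \in T$. Define $G_\Omega = \Omega \cap S$ for all $\Omega \in C$. Then
$S = X \cap S = (\bigcup C) \cap S = \bigcup_{\Omega \in C} (\Omega \cap S)$ by Theorem 8.4.8 (iv). So $S = \bigcup_{\Omega \in C} G_\Omega$. But $\forall \Omega \in C,\ G_\Omega \in T_\Omega$.
Therefore $S \in \{\bigcup_{\Omega \in C} G_\Omega;\ G : C \to T \text{ and } \forall \Omega \in C,\ G_\Omega \in T_\Omega\}$. This verifies the set equality.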
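
The two equivalences of Theorem 31.7.7 can be tested exhaustively on a finite space. The Python sketch below is only an illustration. The example topology, the open cover `C`, and the function names are assumptions made for this example. For every subset $S$ it compares membership of $T$ with the two local conditions in lines (31.7.1) and (31.7.2).

```python
from itertools import combinations

def fs(*elements):
    """Shorthand for building a frozenset."""
    return frozenset(elements)

# Hypothetical example: X = {1, 2, 3} with open sets ∅, {1}, {2}, {1,2}, {2,3}, X.
X = fs(1, 2, 3)
T = {fs(), fs(1), fs(2), fs(1, 2), fs(2, 3), X}
C = [fs(1, 2), fs(2, 3)]   # an open cover of X: both sets are in T and their union is X

def powerset(points):
    """Return all subsets of `points` as frozensets."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def relative_topology(A):
    """Relative topology {Ω ∩ A; Ω ∈ T} on the subset A."""
    return {U & A for U in T}

def locally_open(S):
    """Right-hand side of line (31.7.1): Ω ∩ S is open in X for every Ω in the cover."""
    return all((U & S) in T for U in C)

def relatively_open(S):
    """Right-hand side of line (31.7.2): Ω ∩ S is open in the relative topology on Ω."""
    return all((U & S) in relative_topology(U) for U in C)

for S in powerset(X):
    print(sorted(S), S in T, locally_open(S), relatively_open(S))
```

All three Boolean columns agree for each of the eight subsets, which is what the theorem predicts for an open cover.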


31.7.8 REMARK:  Non-equivalence of openness and local openness for non-open covers. 
As a matter of intellectual curiosity, one might ask how much of Theorem 31.7.7 survives if the cover is not 
required to be open. 


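If $(X, T)$ is a topological space and $C$ is a non-topological set cover of $X$ as in Definition 8.7.7, then

\begin{align}
\forall S \in \mathbb{P}(X),\quad (\forall \Omega \in C,\ \Omega \cap S \in T) &\Rightarrow S \in T \tag{31.7.3} \\
&\Rightarrow \forall \Omega \in C,\ \Omega \cap S \in T_\Omega, \tag{31.7.4}
\end{align}

where $T_\Omega$ denotes the relative topology on $\Omega$. Also $T \subseteq \{\bigcup_{\Omega \in C} G_\Omega;\ G : C \to \mathbb{P}(X) \text{ and } \forall \Omega \in C,\ G_\Omega \in T_\Omega\}$.


To show line (31.7.3), let $S \in \mathbb{P}(X)$ satisfy $\Omega \cap S \in T$ for all $\Omega \in C$. Then $S = X \cap S = (\bigcup_{\Omega \in C} \Omega) \cap S =
\bigcup_{\Omega \in C} (\Omega \cap S) \in T$ by the closure of topologies under the general union operation. To show line (31.7.4), let
$S \in T$. Then $\Omega \cap S \in T_\Omega$ for all $\Omega \in C$ by Definition 31.6.2. To show the set inclusion, let $S \in T$. Define the
function $G : C \to \mathbb{P}(X)$ by $G_\Omega = \Omega \cap S$ for all $\Omega \in C$. Then $G_\Omega \in T_\Omega$ for all $\Omega \in C$ by line (31.7.4). But
$\bigcup_{\Omega \in C} G_\Omega = \bigcup_{\Omega \in C} (\Omega \cap S) = S$. Therefore $T \subseteq \{\bigcup_{\Omega \in C} G_\Omega;\ G : C \to \mathbb{P}(X) \text{ and } \forall \Omega \in C,\ G_\Omega \in T_\Omega\}$.


To disprove the converse of line (31.7.3), let $X = \mathbb{R}$ with the usual topology $T$ on $\mathbb{R}$, and let $C = \{\mathbb{R}^-, \mathbb{R}_0^+\}$
and $S = (-1, 1)$. Then $S \in T$, but $\mathbb{R}_0^+ \cap S = [0, 1) \notin T$. To disprove the converse of line (31.7.4), let
$S = [0, 1)$ for the same $X$, $T$ and $C$. Then $\mathbb{R}^- \cap S = \emptyset \in T_{\mathbb{R}^-}$ and $\mathbb{R}_0^+ \cap S = [0, 1) \in T_{\mathbb{R}_0^+}$, but $S \notin T$.
To disprove the reverse inclusion for $T$, let $S = [0, 1)$ for the same $X$, $T$ and $C$. Define $G : C \to \mathbb{P}(X)$ by
$G_{\mathbb{R}^-} = \emptyset$ and $G_{\mathbb{R}_0^+} = [0, 1)$. Then $G_\Omega \in T_\Omega$ for all $\Omega \in C$ and $\bigcup_{\Omega \in C} G_\Omega = S$. Therefore $S$ is an element of
the set $\{\bigcup_{\Omega \in C} G_\Omega;\ G : C \to \mathbb{P}(X) \text{ and } \forall \Omega \in C,\ G_\Omega \in T_\Omega\}$, but $S \notin T$.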


Open covers are ubiquitous in differential geometry because general set covers lack the most basic local 
openness properties, which makes them useless for defining topological, analytical or geometric concepts in 
terms of local properties. Some such concepts may require additional constraints on open covers, such as 
countability or finiteness. 
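
The failure of local openness tests for non-open covers can also be seen in a two-point space, without any appeal to the real numbers. The Python sketch below is an illustration only. The space (two points with exactly one open singleton), the cover `C` and the function names are assumptions chosen for this example.

```python
def fs(*elements):
    """Shorthand for building a frozenset."""
    return frozenset(elements)

# Hypothetical example: X = {1, 2} with open sets ∅, {1}, X.
X = fs(1, 2)
T = {fs(), fs(1), X}
C = [fs(1), fs(2)]   # a cover of X which is not an open cover, because {2} is not open

def relative_topology(A):
    """Relative topology {Ω ∩ A; Ω ∈ T} on the subset A."""
    return {U & A for U in T}

def locally_open(S):
    """A ∩ S is open in the relative topology on A, for every A in the cover."""
    return all((A & S) in relative_topology(A) for A in C)

S = fs(2)
print(locally_open(S), S in T)   # prints: True False
```

So $S = \{2\}$ is locally open with respect to this cover without being open, which mirrors the counterexample to the converse of line (31.7.4). Similarly $X \in T$ although $\{2\} \cap X = \{2\} \notin T$, which mirrors the counterexample to the converse of line (31.7.3).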


31.7.9 REMARK: A property of a cover which implies that all locally open sets are open. 

Theorem 31.7.10 can be expressed more succinctly by saying that all locally open sets with respect to some
cover $C$ are open in $X$ if the interiors of the elements of $C$ are also a cover of $X$. However, the
interior of a set is not defined until Section 31.8. So Theorem 31.7.10 is expressed in terms of open subsets
of the elements of the cover $C$.


There is a very minor axiom-of-choice issue here because the existence of a function $\phi : C \to T$ as in
line (31.7.6) does not automatically follow from the existence of a subset of $A$ with some required property
for all $A \in C$. However, function existence here will follow from the choice of $\phi(A) = \mathrm{Int}(A)$ for all $A \in C$
because $\mathrm{Int}(A)$ can be expressed as a unique set-theoretic construction which requires no arbitrary choice.
(See Notation 31.8.3 for $\mathrm{Int}(A)$.) Then the map $\phi : A \mapsto \mathrm{Int}(A)$ will be a well-defined ZF set without AC.


On the other hand, in the proof of Theorem 31.7.10, it is only known that for each $A \in C$, there exists an
$\Omega \in T$ such that $S \cap A = \Omega \cap A$. This does not guarantee the existence of a function $\psi : C \to T$ such that
$S \cap A = \psi(A) \cap A$ for all $A \in C$. Therefore a choice must be made according to some fixed set-theoretic
construction. The choice rule $\psi$ which is used in the proof of Theorem 31.7.10 is the map $\psi : C \to \mathbb{P}(X)$
given by $\psi(A) = \bigcup \{\Omega \in T;\ S \cap A = \Omega \cap A\}$ for all $A \in C$.


31.7.10 THEOREM: Locally open sets are globally open if the cover's interiors cover the set.
Let $(X, T)$ be a topological space. Let $C$ be a cover of $X$. Then $C$ satisfies

\begin{align}
\forall S \in \mathbb{P}(X),\quad (\forall A \in C,\ S \cap A \in T_A) \Rightarrow S \in T \tag{31.7.5}
\end{align}
if
\begin{align}
\exists \phi : C \to T,\quad X = \bigcup_{A \in C} \phi(A) \text{ and } \forall A \in C,\ \phi(A) \subseteq A, \tag{31.7.6}
\end{align}
where $T_A$ denotes the relative topology of $T$ on $A$ for all $A \in \mathbb{P}(X)$. In other words, all locally open subsets
with respect to $C$ are open in $X$ if $X$ is covered by some open subsets of the elements of $C$.


Hence line (31.7.6) implies line (31.7.7):
\begin{align}
\forall S \in \mathbb{P}(X),\quad (\forall A \in C,\ S \cap A \in T_A) \Leftrightarrow S \in T. \tag{31.7.7}
\end{align}

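PROOF: Let $C$ be a cover of $X$ which satisfies line (31.7.6). Then there is a function $\phi : C \to T$ which
satisfies $X = \bigcup_{A \in C} \phi(A)$ with $\phi(A) \subseteq A$ for all $A \in C$. If $\phi$ is such a function, then $S = S \cap X =
\bigcup_{A \in C} (S \cap \phi(A))$ for any $S \in \mathbb{P}(X)$ by Theorem 10.8.14 (i).


Let $S \in \mathbb{P}(X)$ satisfy $S \cap A \in T_A$ for all $A \in C$. Then for all $A \in C$, there exists $\Omega \in T$ such that
$S \cap A = \Omega \cap A$. Define $\psi : C \to \mathbb{P}(X)$ by $\psi(A) = \bigcup \{\Omega \in T;\ S \cap A = \Omega \cap A\}$ for $A \in C$. Then

\begin{align}
\forall A \in C,\quad \psi(A) \cap A &= \bigcup \{\Omega \cap A;\ \Omega \in T \text{ and } S \cap A = \Omega \cap A\} \tag{31.7.8} \\
&= \bigcup \{S \cap A;\ \Omega \in T \text{ and } S \cap A = \Omega \cap A\} \notag \\
&= S \cap A, \notag
\end{align}

where line (31.7.8) follows by Theorem 8.4.8 (iv), and $\psi(A) \in T$ for all $A \in C$.
Let $A \in C$. Then $\phi(A) \cap A = \phi(A)$ by Theorem 8.1.7 (v) because $\phi(A) \subseteq A$. Thus

$$\begin{aligned}
\forall A \in C,\quad S \cap \phi(A) &= S \cap A \cap \phi(A) \\
&= \psi(A) \cap A \cap \phi(A) \\
&= \psi(A) \cap \phi(A).
\end{aligned}$$

So $S = \bigcup_{A \in C} (S \cap \phi(A)) = \bigcup_{A \in C} (\psi(A) \cap \phi(A))$. But $\psi(A) \cap \phi(A) \in T$ for all $A \in C$. Therefore $S \in T$. Hence
line (31.7.6) implies line (31.7.5).
Finally, line (31.7.6) implies line (31.7.7) because $S \in T \Rightarrow \forall A \in C,\ S \cap A \in T_A$ by Definition 31.6.2.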


31.7.11 REMARK: Sufficiency and non-necessity of interior-covering property. 

Theorem 31.7.10 shows that if the topological space $X$ is covered by the interiors of the sets in a cover $C$,
then openness of sets is equivalent to local openness with respect to C. Example 31.7.12 shows that this 
interior-covering property is not necessary. 


It follows from Theorem 31.7.10 that the property of open covers which is mentioned in Remark 31.7.6 is 
also shared by covers whose interiors cover the whole topological space. So Theorem 31.7.7 does not truly 
explain the ubiquity of open covers in differential geometry. 


31.7.12 EXAMPLE: Counterexample to converse of cover condition for locally open sets to be open.
Let $X = \mathbb{R}$ with the standard topology $T$ on $\mathbb{R}$. (See Definition 32.5.7.) Let $C = \{A, B\}$, where $A = \mathbb{R}_0^-$
and $B = \mathbb{R}_0^+$. Let $S \in \mathbb{P}(X)$ satisfy $S \cap A \in T_A$ and $S \cap B \in T_B$.


First suppose that $0 \notin S$. Then $S \cap \mathbb{R}^- = S \cap A$. So $S \cap \mathbb{R}^- = \Omega \cap A$ for some $\Omega \in T$. Let $\Omega' = \Omega \cap \mathbb{R}^-$.
Then $\Omega' \in T$ because $\mathbb{R}^- \in T$. But $S \cap \mathbb{R}^- = \Omega \cap A \cap \mathbb{R}^-$ by Theorem 8.1.7 (v). So $S \cap \mathbb{R}^- = \Omega \cap \mathbb{R}^- = \Omega'$.
Therefore $S \cap \mathbb{R}^- \in T$. Similarly $S \cap \mathbb{R}^+ \in T$. Consequently $S = (S \cap \mathbb{R}^-) \cup (S \cap \mathbb{R}^+) \in T$.


Now suppose that $0 \in S$. Then $0 \in S \cap \mathbb{R}_0^-$ and $0 \in S \cap \mathbb{R}_0^+$. From $0 \in S \cap A$ and $S \cap A \in T_A$, it follows that
there exists $\Omega_A \in T$ such that $S \cap A = \Omega_A \cap A$ and $0 \in \Omega_A$. Then by Theorem 32.5.9, $(-a, a) \subseteq \Omega_A$ for
some $a \in \mathbb{R}^+$, which implies that $(-a, 0] \subseteq S \cap A$. Similarly, $[0, b) \subseteq S \cap B$ for some $b \in \mathbb{R}^+$. It follows that
$(-a, b) = (-a, 0] \cup [0, b) \subseteq (S \cap A) \cup (S \cap B) = S \cap (A \cup B) = S$ by Theorems 8.1.9 (iii) and 8.1.6 (xii). So

\begin{align}
S &= S \cup (-a, b) \tag{31.7.9} \\
&= ((S \cap \mathbb{R}^-) \cup (S \cap \{0\}) \cup (S \cap \mathbb{R}^+)) \cup (-a, b) \tag{31.7.10} \\
&= (S \cap \mathbb{R}^-) \cup (S \cap \mathbb{R}^+) \cup (-a, b) \tag{31.7.11} \\
&\in T, \tag{31.7.12}
\end{align}

where lines (31.7.9) and (31.7.11) follow by Theorem 8.1.7 (iv), line (31.7.10) follows by Theorem 8.1.6 (xii),
and line (31.7.12) follows from Definition 31.3.2 (iii). Thus $C$ satisfies line (31.7.5) of Theorem 31.7.10, but
$C$ cannot satisfy line (31.7.6) because $\phi(A) \subseteq A$ would imply $\phi(A) \subseteq \mathbb{R}^- \subsetneq A$, and $\phi(B) \subseteq B$ would imply
$\phi(B) \subseteq \mathbb{R}^+ \subsetneq B$, which would imply that $X \ne \phi(A) \cup \phi(B)$.

Hence the interior covering condition in line (31.7.6) is not a necessary condition for local openness with
respect to $C$ to imply openness in $X$.


31.7.13 REMARK: Alternative test for global openness in terms of local openness.
Theorem 31.7.14 is an alternative interpretation of the idea that a topology can be reconstructed from the 
relative topologies on elements of an open cover, which is mentioned in Remarks 31.7.8, 31.7.9 and 31.7.11. 


The interpretation in Theorem 31.7.7 is that sets can be tested for global openness by testing their
intersections with elements of an open cover. The interpretation in Theorem 31.7.14 is that the open sets are those
which are unions of relatively open sets with respect to the elements of an open cover.


The assertion of Theorem 31.7.14 is almost obvious in the case of an open cover. The slightly less obvious 
assertion in Theorem 31.7.16 is that this kind of equivalence of local and global openness is only valid in the 
case of open covers. 


31.7.14 THEOREM: A set is globally open if and only if it is equal to a union of locally open sets. 
Let (X, T) be a topological space. Let C be an open cover of X. Then 


VS € P(X), SET & 36:C 2 P(X),(S— U (A) and YA € C, (A) € Ta), (31.7.13) 
AEC 


where Ty = {9N A; Qc Ty 2 (Q€ T; QC A} =TNP(A) is the relative topology of T on A for A € C. 
In other words, T = {Uyec 9(4); 6: C 2 P(X) and VA € C, 9(A) € Ty}. 


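PROOF: The equality $T_A = \{\Omega \in T;\ \Omega \subseteq A\} = T \cap \mathbb{P}(A)$ for $A \in T$ follows from Theorem 31.6.5.

Let $S \in T$. Define $\phi : C \to \mathbb{P}(X)$ by $\phi(A) = S \cap A$ for $A \in C$. Then $\phi(A) \in T$ for all $A \in C$. Also,
$\bigcup_{A \in C} \phi(A) = \bigcup_{A \in C} (S \cap A) = S \cap \bigcup_{A \in C} A = S \cap X = S$. Moreover, $\phi(A) \in T_A$ for all $A \in C$. Thus $S \in T$
implies $\exists \phi : C \to \mathbb{P}(X)$, $(S = \bigcup_{A \in C} \phi(A)$ and $\forall A \in C$, $\phi(A) \in T_A)$.

Now assume that $S \in \mathbb{P}(X)$ satisfies $\exists \phi : C \to \mathbb{P}(X)$, $(S = \bigcup_{A \in C} \phi(A)$ and $\forall A \in C$, $\phi(A) \in T_A)$. Then
$S \in T$ by Definition 31.3.2 (iii) because $\phi(A) \in T_A \subseteq T$ for all $A \in C$. Thus line (31.7.13) is proven. Hence

\begin{align*}
\Bigl\{ \bigcup_{A \in C} \phi(A);\ \phi : C \to \mathbb{P}(X) \text{ and } \forall A \in C,\ \phi(A) \in T_A \Bigr\}
&= \Bigl\{ \bigcup_{A \in C} \phi(A);\ \phi \in \{ g \in \mathbb{P}(X)^C;\ \forall A \in C,\ g(A) \in T_A \} \Bigr\} \tag{31.7.14} \\
&= \Bigl\{ S;\ \exists \phi \in \{ g \in \mathbb{P}(X)^C;\ \forall A \in C,\ g(A) \in T_A \},\ S = \bigcup_{A \in C} \phi(A) \Bigr\} \tag{31.7.15} \\
&= \Bigl\{ S;\ \exists \phi : C \to \mathbb{P}(X),\ \bigl( S = \bigcup_{A \in C} \phi(A) \text{ and } \forall A \in C,\ \phi(A) \in T_A \bigr) \Bigr\} \tag{31.7.16} \\
&= T, \tag{31.7.17}
\end{align*}

where lines (31.7.14) and (31.7.16) follow by Theorem 7.7.14, line (31.7.15) follows by Notation 7.7.18, and
line (31.7.17) follows by line (31.7.13) and Definition 7.2.4 (1).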
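
As a concrete check of Theorem 31.7.14, the following Python sketch enumerates every choice function $\phi$ on a small finite example and compares the resulting unions with $T$. The space $X = \{1, 2, 3\}$, the topology `T`, the cover `open_cover` and all function names are illustrative assumptions which do not come from the text, and a brute-force check on one example is of course not a proof.

```python
from itertools import product

# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def relative_topology(T, A):
    """Relative topology T_A = {Omega & A; Omega in T}."""
    return {omega & A for omega in T}


def unions_of_locally_open_sets(T, cover):
    """All unions of phi(A) over A in the cover, where phi(A) is in T_A."""
    choices = [list(relative_topology(T, A)) for A in cover]
    # Each element of the product is one choice function phi on the cover.
    return {frozenset().union(*phi) for phi in product(*choices)}


open_cover = [frozenset({1, 2}), frozenset({2, 3})]
assert all(A in T for A in open_cover)           # every element is open
assert frozenset().union(*open_cover) == X       # the elements cover X

print(unions_of_locally_open_sets(T, open_cover) == T)  # True
```

The enumeration is exponential in the size of the cover, so this kind of check is only practical for very small examples.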


31.7.15 REMARK: A necessary and sufficient condition for locally open sets to be globally open. 

Theorem 31.7.16 shows that only open covers of a topological space have the property that all unions of 
relatively open subsets of elements of the cover are open. Perhaps this gives some kind of justification for 
the very frequent use of open covers in definitions of concepts for manifolds. 


Certainly Theorems 31.7.7 and 31.7.14 do both show that the global topology can be reconstructed from 
the relative topologies on elements of an open cover, and this implies that an atlas of charts can induce the 
global topology by inducing local topology in the domain of each chart. Then Theorem 31.7.16 shows that 
the domains of manifold charts must form an open cover if the global open sets are constructed as unions of 
relatively open sets. 


31.7.16 THEOREM: All unions of relatively open sets are open if and only if the cover is open. 
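Let $(X, T)$ be a topological space. Let $C$ be a cover of $X$. Then $C$ is an open cover of $X$ if and only if

$$\forall S \in \mathbb{P}(X),\quad S \in T \;\Leftrightarrow\; \exists \phi : C \to \mathbb{P}(X),\ \Bigl( S = \bigcup_{A \in C} \phi(A) \ \text{and}\ \forall A \in C,\ \phi(A) \in T_A \Bigr), \tag{31.7.18}$$

where $T_A = \{\Omega \cap A;\ \Omega \in T\}$ denotes the relative topology of $T$ on $A$ for all $A \in C$.
In other words, $C$ is an open cover if and only if $T = \{\bigcup_{A \in C} \phi(A);\ \phi : C \to \mathbb{P}(X) \text{ and } \forall A \in C,\ \phi(A) \in T_A\}$.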
PROOF: Let $C$ be an open cover. Then $C$ satisfies line (31.7.18) by Theorem 31.7.14.
Now let $C$ be a cover of $X$ which satisfies line (31.7.18). Let $A_0 \in C$. Define $\phi : C \to \mathbb{P}(X)$ by $\phi(A_0) = A_0$
and $\phi(A) = \emptyset$ for all $A \in C \setminus \{A_0\}$. Then $\bigcup_{A \in C} \phi(A) = A_0$ and $\phi(A) \in T_A$ for all $A \in C$. Therefore $A_0 \in T$
by line (31.7.18). Thus $C$ is an open cover of $X$. Hence $C$ is an open cover of $X$ if and only if $C$ satisfies
line (31.7.18).
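
The equivalence in Theorem 31.7.16 can be illustrated by brute force on a finite example. In the following Python sketch, the space, the topology `T`, the list `covers` and the function names are all hypothetical choices made for illustration. The condition in line (31.7.18) is tested as an equality between $T$ and the set of all unions of relatively open sets.

```python
from itertools import product

# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def relative_topology(T, A):
    """Relative topology T_A = {Omega & A; Omega in T}."""
    return {omega & A for omega in T}


def unions_of_locally_open_sets(T, cover):
    """All unions of phi(A) over A in the cover, where phi(A) is in T_A."""
    choices = [list(relative_topology(T, A)) for A in cover]
    return {frozenset().union(*phi) for phi in product(*choices)}


def satisfies_line_31_7_18(T, cover):
    """The globally open sets are exactly the unions of locally open sets."""
    return unions_of_locally_open_sets(T, cover) == T


def is_open_cover(T, cover):
    return all(A in T for A in cover)


covers = [
    [frozenset({1, 2}), frozenset({2, 3})],             # open cover
    [frozenset({1, 3}), frozenset({2, 3})],             # {1, 3} is not open
    [frozenset({1}), frozenset({2}), frozenset({3})],   # {3} is not open
    [X],                                                # trivial open cover
]

for cover in covers:
    assert frozenset().union(*cover) == X
    # The two tests agree for every cover in the list.
    assert satisfies_line_31_7_18(T, cover) == is_open_cover(T, cover)
```

For the second cover, the set $\{3\}$ is relatively open in $\{1, 3\}$ but is not open in $X$, which is the same mechanism as in the proof, where a single non-open element $A_0$ of the cover already violates line (31.7.18).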


31.8. Interior and closure of sets 


31.8.1 REMARK: The interior of a set in a topological space. 

Topological spaces can be formalised in terms of the “interior operator” as the definition of a topology.
Such an operator yields the interior of any set $S \in \mathbb{P}(X)$ for a given base set $X$. But as mentioned in
Remark 31.3.1, a consensus developed that the class-of-open-sets formalism was best in general. A small
price to pay for this choice is that Definition 31.8.2 is slightly unnatural-looking. However, although the
interior of a set is defined as a union of a typically infinite collection of open sets, this union is always itself
an open set. Therefore it may be thought of as “the largest open set included in $S$”. It always exists, it is
always unique, and it is always an open set.


((2020-6-4. Replace Definition 31.8.2 with $\{x \in X;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S\}$ as in Theorem 31.8.13 (iv). Then
make adjustments to all theorems and definitions based on Definition 31.8.2. ))


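31.8.2 DEFINITION: The (topological) interior of a set $S$ in a topological space $(X, T)$ is the union of all
open sets in $(X, T)$ which are included in the set $S$. In other words, the interior of $S$ is the set

$$\bigcup_{\substack{\Omega \in T \\ \Omega \subseteq S}} \Omega = \bigcup \{\Omega \in T;\ \Omega \subseteq S\} = \bigcup (T \cap \mathbb{P}(S)).$$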


31.8.3 NOTATION: Int(S) denotes the interior of a set S in a topological space (X,T). 


31.8.4 DEFINITION: The interior operator of a topological space $(X, T)$ is the map $\mathrm{Int} : \mathbb{P}(X) \to T$, where
$\mathrm{Int}(S)$ denotes the interior of subsets $S$ of $X$ as in Definition 31.8.2 and Notation 31.8.3.


31.8.5 REMARK: Some elementary properties of the interior operator. 

It is clear that $\mathrm{Int}(S) \subseteq S$ for any subset $S$ of $X$ in Definition 31.8.2, and $\mathrm{Int}(S) = S$ if $S \in T$. Conversely,
if $\mathrm{Int}(S) = S$, then $S$ must be open because $\mathrm{Int}(S)$ is always open, as noted in Remark 31.8.1. Further
elementary properties of the interior operator are given in Theorems 31.8.13 and 31.8.14.


31.8.6 REMARK: The topological closure operator. 

The intersection in Definition 31.8.7 is well-defined because $\{X \setminus \Omega;\ \Omega \in T \text{ and } S \subseteq X \setminus \Omega\}$ is non-empty
since $\emptyset \in T$ and $S \subseteq X \setminus \emptyset = X$.

For complicated set-expressions in place of $S$ in Notation 31.8.8, the closure of $S$ may be more conveniently
denoted as $\mathrm{Clos}(S)$ as in Notation 31.8.21.


((2020-6-4. Replace Definition 31.8.7 with $\{x \in X;\ \forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap S \ne \emptyset\}$ as in Theorem 31.8.17 (ii).
Then make adjustments to all theorems and definitions based on Definition 31.8.7. ))


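31.8.7 DEFINITION: The (topological) closure of a set $S$ in a topological space $(X, T)$ is the intersection of
all closed sets in $(X, T)$ which include the set $S$. In other words, the closure of $S$ is the set

\begin{align*}
\bigcap_{\substack{\Omega \in T \\ S \subseteq X \setminus \Omega}} (X \setminus \Omega)
&= \bigcap \{X \setminus \Omega;\ \Omega \in T \text{ and } S \subseteq X \setminus \Omega\} \\
&= \bigcap \{K \in \mathbb{P}(X);\ X \setminus K \in T \text{ and } S \subseteq K\} \\
&= \bigcap \{K \in \overline{\mathrm{Top}}(X);\ S \subseteq K\}.
\end{align*}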


31.8.8 NOTATION: $\overline{S}$ denotes the closure of a set $S$ with respect to an implicit topology.


31.8.9 DEFINITION: The closure operator of a topological space $(X, T)$ is the map from $\mathbb{P}(X)$ to $\overline{\mathrm{Top}}(X)$ which
is defined by $S \mapsto \overline{S}$ for $S \in \mathbb{P}(X)$.
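
For a finite topological space, Definitions 31.8.2 and 31.8.7 can be evaluated literally as a union and an intersection. The following Python sketch does this for a hypothetical topology on $X = \{1, 2, 3\}$. The example space and the function names `interior` and `closure` are assumptions made for illustration only.

```python
# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def interior(T, S):
    """Union of all open sets which are included in S (Definition 31.8.2)."""
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    """Intersection of all closed sets which include S (Definition 31.8.7)."""
    result = X  # X is closed and includes S, so the family is non-empty.
    for omega in T:
        K = X - omega  # K is a closed set
        if S <= K:
            result &= K
    return result


print(sorted(interior(T, frozenset({1, 3}))))  # [1]
print(sorted(closure(X, T, frozenset({2}))))   # [2, 3]
```

Starting the intersection from $X$ mirrors the observation in Remark 31.8.6 that the family of closed supersets of $S$ always contains $X$.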
31.8.10 REMARK: Survey of notations for interior, closure and boundary of sets in a topological space.
The overline or overbar notation $\overline{S}$ for the closure of a set $S$ is the most popular, although it is inconvenient for
complicated set-expressions. The notation $S^a$ is used by some authors. (The letter “a” may be mnemonic
for “adherent set”, but Yosida [167], page 3, states that it comes from the German phrase for closure:
“abgeschlossene Hülle”.) Table 31.8.1 lists some notations which are found in the literature for the topological
interior and closure of sets. (Also included are some notations for the topological boundary of a set, which
is defined in Section 31.9.)


31.8.11 REMARK: Pointwise characterisation of the interior and closure of a set. 

Definitions 31.8.2 and 31.8.7 are perhaps excessively abstract. Pointwise characterisations of the interior and
closure are more meaningful and intuitive. Probably the most intuitively clear equivalent definition for the
interior of a set is given by Theorem 31.8.13 (iv), which states that the interior of $S$ is the set of points which
have a neighbourhood which is fully included in $S$. In other words, $\mathrm{Int}(S) = \{x \in X;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S\}$.
(See also Theorem 31.8.17 (i).) This kind of pointwise characterisation of the interior is easily extended to
the closure of a set. Thus $\overline{S} = \{x \in X;\ \forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap S \ne \emptyset\}$. (This is asserted as Theorem 31.8.17 (ii).)


31.8.12 REMARK: Box diagram for relations between the interior and closure of a set. 
Some of the relations between the interior and closure of a set are illustrated in Figure 31.8.2. 


Figure 31.8.2 Relations between the interior and closure of a set $S$


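31.8.13 THEOREM: Some elementary properties of the interior and closure operators.
Let $S$ be a subset of a topological space $X$. Then:
(i) $\mathrm{Int}(S) \in \mathrm{Top}(X)$.
(ii) $\mathrm{Int}(S) \subseteq S$.
(iii) $\forall x \in X$, $(x \in \mathrm{Int}(S) \Leftrightarrow \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S)$.
(iv) $\mathrm{Int}(S) = \{x \in X;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S\}$.
(v) $\mathrm{Int}(S) = \{x \in S;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S\}$.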


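(vi) $\overline{S} \in \overline{\mathrm{Top}}(X)$.
(vii) $\overline{S} \supseteq S$.
(viii) $\mathrm{Int}(X \setminus S) = X \setminus \overline{S}$.
(ix) $\overline{S} = X \setminus \mathrm{Int}(X \setminus S)$.
(x) $\mathrm{Int}(S) \subseteq \overline{S}$.
(xi) $\mathrm{Int}(S) = X \setminus \overline{X \setminus S}$.
(xii) $\overline{X \setminus S} = X \setminus \mathrm{Int}(S)$.
Let $S_1$ and $S_2$ be subsets of a topological space $X$. Then:
(xiii) $S_1 \subseteq S_2 \Rightarrow \mathrm{Int}(S_1) \subseteq \mathrm{Int}(S_2)$.
(xiv) $S_1 \subseteq S_2 \Rightarrow \overline{S_1} \subseteq \overline{S_2}$.
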
PROOF: Part (i) follows from Definition 31.3.2 because the interior of a set is the union of a set of open
sets by Definition 31.8.2.
For part (ii), let $C = \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\}$. Then $\forall z \in C$, $z \subseteq S$. So $\bigcup C \subseteq S$ by Theorem 8.4.8 (xvii).
That is, $\mathrm{Int}(S) \subseteq S$.
year reference interior closure boundary
1946 Graves [85], pages 45, 340 S
1953 Rudin [129], page 39 5
1955 Kelley [101], pages 42, 44, 103 Be S- (or S) d
1957 Wallace [152], pages 17, 36 8 S
1957 Whitney [161], page 355 int(S) S fro(S)
1961 Hocking/Young [93], pages 4, 47 S B(S)
1962 B. Mendelson [115], pages 82, 84, 85 Int(S) S Bdry(S)
1963 Auslander/MacKenzie [1], page 9 S
1963 Simmons [137], pages 63, 68 Int(S) S
1964 Baum [54], pages 28, 33 S, 8° S Fr(S)
1964 Gaal [77], pages 24-25 St S S^
1964 Robertson/Robertson [126], page 6 S
1965 A.E. Taylor [145], page 52 S° S B(S)
1965 Yosida [167], pages 3, 12, 286 Ss’ 4 ∂S
1966 Ahlfors [45], page 53 Int S S- (or Cl S) Bd S or ∂S
1966 S.J. Taylor [147], pages 26, 28 S° S
1967 Gemignani [80], page 55 S? Cl S Fr S
1968 Bishop/Goldberg [3], pages 9, 10 S9 S- ∂S
1968 Wallace [153], page 5 Int S S Fr S
1969 Helms [92], page 1 S ∂S
1970 Kolmogorov/Fomin [104], pages 46, 128 I(S) [5]
1970 Steen/Seebach [141], page 6 S° S (or S7) B
1970 Wallace [154], page 264 S S
1970 Wilansky [163], pages 19-20 S! S, cr S bS
1970 Willard [165], pages 25-28 S°, Int(S) S, Cl(S) Fr(S)
1971 Kasriel [100], pages 170-171 int(S) cl(S) Fr(S)
1972 Malliavin [28], page 122 S
1973 Rudin [130], page 7 S? S
1974 Gilmore [82], page 60 S
1974 Reinhardt/Soeder [124], page 214 p S ∂S
1975 Adams [44], page 2 S bdry S
1975 Lovelock/Rund [27], page 332 S? S ∂S
1975 Treves [150], pages xv, 189 S ∂S
1981 Bleecker [254], page 11 S
1981 Johnsonbaugh/Pfaffenberger [97], page 136 5° S ∂S
1983 Gilbarg/Trudinger [81], page 9 S ∂S
1983 Nash/Sen [30], pages 14, 15 8v S b(S)
1993 EDM2 [113], pages 1607, 1611 S’ (or S°, Int S) S^ (or S, Cl S) Bd S, Fr S, ∂S
1994 Darling [8], pages 112, 114 S° S ∂S
1997 Bruckner/Bruckner/Thomson [56], page 107 int(S) S
1999 Lang [23], pages 35, 40 Int(S) S ∂S
2001 Thomson/Bruckner/Bruckner [149], p. 154-155 int(S) S
2011 Bass [53], page 198 B" S ∂S
Kennington Int(S) S, Clos(S) Bdy(S), ∂S

Table 31.8.1 Survey of topological interior, closure and boundary notations


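For part (iii), let $x \in X$. Suppose that $x \in \mathrm{Int}(S)$. Then $x \in \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\}$ by Definition 31.8.2.
So $x \in \Omega$ for some $\Omega \in \mathrm{Top}(X)$ such that $\Omega \subseteq S$. Therefore $\exists \Omega \in \mathrm{Top}_x(X)$, $\Omega \subseteq S$. Conversely,
suppose that $x \in X$ and $\exists \Omega \in \mathrm{Top}_x(X)$, $\Omega \subseteq S$. Then $x \in \Omega$ for some $\Omega \in \mathrm{Top}(X)$ such that $\Omega \subseteq S$. So
$x \in \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\}$. Hence $x \in \mathrm{Int}(S)$ by Definition 31.8.2.
Part (iv) is a paraphrase of part (iii).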
Part (v) follows from part (iv) by part (ii).

Part (vi) follows from Theorem 31.4.5 because the closure of a set is the intersection of a non-empty set of
closed sets by Definition 31.8.7. (The set $C$ of closed supersets of a set $S \subseteq X$ is non-empty because $X$ is a
closed set in any topological space $X$. Therefore $X \in C$. See also Remark 31.8.6.)

For part (vii), let $C = \{K \in \overline{\mathrm{Top}}(X);\ S \subseteq K\}$. Then $\forall z \in C$, $z \supseteq S$. So $\bigcap C \supseteq S$ by Theorem 8.4.8 (xviii).
That is, $\overline{S} \supseteq S$.

Part (viii) follows from $\mathrm{Int}(X \setminus S) = \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq X \setminus S\} = \bigcup \{\Omega;\ \Omega \in \mathrm{Top}(X) \wedge S \subseteq X \setminus \Omega\} =
X \setminus \bigcap \{X \setminus \Omega;\ \Omega \in \mathrm{Top}(X) \wedge S \subseteq X \setminus \Omega\}$, which equals $X \setminus \overline{S}$ by Definition 31.8.7.

Part (ix) follows from part (viii) and Theorem 8.2.5 (xvi).

Part (x) follows from parts (ii) and (vii).

Part (xi) follows from part (viii) by substituting $X \setminus S$ for $S$.
Part (xii) follows from part (xi).

Part (xiii) follows from Theorem 8.4.8 (ii) because, by the transitivity of the set inclusion relation, $S_1 \subseteq S_2$
implies that $\{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S_1\} \subseteq \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S_2\}$.

Part (xiv) follows from Theorem 8.4.8 (iii) because, by the transitivity of the set inclusion relation, $S_1 \subseteq S_2$
implies that $\{K \in \overline{\mathrm{Top}}(X);\ K \supseteq S_1\} \supseteq \{K \in \overline{\mathrm{Top}}(X);\ K \supseteq S_2\}$.


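31.8.14 THEOREM: Further elementary properties of the interior and closure operators.
Let $S$ be a subset of a topological space $X$. Then:
(i) $S \in \mathrm{Top}(X) \Leftrightarrow \mathrm{Int}(S) = S$.
(ii) $S \in \mathrm{Top}(X) \Leftrightarrow S \subseteq \mathrm{Int}(S)$.
(iii) $\mathrm{Int}(\emptyset) = \emptyset$.
(iv) $\mathrm{Int}(S) = X \Leftrightarrow S = X$.
(v) $S \in \overline{\mathrm{Top}}(X) \Leftrightarrow \overline{S} = S$.
(vi) $S \in \overline{\mathrm{Top}}(X) \Leftrightarrow \overline{S} \subseteq S$.
(vii) $\overline{S} = \emptyset \Leftrightarrow S = \emptyset$.
(viii) $\overline{X} = X$.
(ix) $\mathrm{Int}(\mathrm{Int}(S)) = \mathrm{Int}(S)$.
(x) $\overline{\overline{S}} = \overline{S}$.
(xi) $\overline{\mathrm{Int}(S)} \subseteq \overline{S}$.
(xii) $\mathrm{Int}(S) \subseteq \mathrm{Int}(\overline{S})$.
Let $S_1$ and $S_2$ be subsets of a topological space $X$. Then:
(xiii) $\mathrm{Int}(S_1) \cup \mathrm{Int}(S_2) \subseteq \mathrm{Int}(S_1 \cup S_2)$.
(xiv) $\mathrm{Int}(S_1) \cap \mathrm{Int}(S_2) = \mathrm{Int}(S_1 \cap S_2)$.
(xv) $\overline{S_1 \cup S_2} = \overline{S_1} \cup \overline{S_2}$.
(xvi) $\overline{S_1 \cap S_2} \subseteq \overline{S_1} \cap \overline{S_2}$.
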

PROOF: For part (i), let $S$ be an open set in $X$. That is, $S \in \mathrm{Top}(X)$. Then $S \in \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\}$.
So $S \subseteq \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\} = \mathrm{Int}(S)$ by Definition 31.8.2. But $\mathrm{Int}(S) \subseteq S$ by Theorem 31.8.13 (ii).
So $\mathrm{Int}(S) = S$.

Now suppose that $S \subseteq X$ and $\mathrm{Int}(S) = S$. Then $S$ is open by Theorem 31.8.13 (i). So part (i) is verified.
Part (ii) follows from part (i) and Theorem 31.8.13 (ii).

Part (iii) follows from part (i) and Definition 31.3.2 (i).

For part (iv), let $S = X$. Then $\mathrm{Int}(S) = X$ by part (i) and Definition 31.3.2 (i). Now suppose that $\mathrm{Int}(S) = X$.
Then $S = X$ by Theorem 31.8.13 (ii).


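For part (v), let $S$ be a closed set in $X$. That is, $S \in \overline{\mathrm{Top}}(X)$. Then $S \in \{K \in \overline{\mathrm{Top}}(X);\ K \supseteq S\}$. So
$S \supseteq \bigcap \{K \in \overline{\mathrm{Top}}(X);\ K \supseteq S\} = \overline{S}$ by Definition 31.8.7. But $\overline{S} \supseteq S$ by Theorem 31.8.13 (vii). So $\overline{S} = S$.
Now suppose that $S \subseteq X$ and $\overline{S} = S$. Then $S$ is closed by Theorem 31.8.13 (vi). So part (v) is verified.
Part (vi) follows from part (v) and Theorem 31.8.13 (vii).

For part (vii), let $S = \emptyset$. Then $\overline{S} = \emptyset$ by part (v) and Definitions 31.3.2 (i) and 31.4.1. Now suppose
that $\overline{S} = \emptyset$. Then $S = \emptyset$ by Theorem 31.8.13 (vii).

Part (viii) follows from part (v) and Definitions 31.3.2 (i) and 31.4.1.
Part (ix) follows from part (i) and Theorem 31.8.13 (i).

Part (x) follows from part (v) and Theorem 31.8.13 (vi).

Part (xi) follows from Theorem 31.8.13 (ii, xiv).
Part (xii) follows from Theorem 31.8.13 (vii, xiii).
For part (xiii), $\mathrm{Int}(S_1) \subseteq \mathrm{Int}(S_1 \cup S_2)$ by Theorem 31.8.13 (xiii). Similarly, $\mathrm{Int}(S_2) \subseteq \mathrm{Int}(S_1 \cup S_2)$. Hence
$\mathrm{Int}(S_1) \cup \mathrm{Int}(S_2) \subseteq \mathrm{Int}(S_1 \cup S_2)$ by Theorem 8.1.7 (ix).
For part (xiv), let $S_1, S_2 \in \mathbb{P}(X)$. Then $\mathrm{Int}(S_1) \supseteq \mathrm{Int}(S_1 \cap S_2)$ by Theorem 31.8.13 (xiii). Similarly $\mathrm{Int}(S_2) \supseteq
\mathrm{Int}(S_1 \cap S_2)$. So $\mathrm{Int}(S_1) \cap \mathrm{Int}(S_2) \supseteq \mathrm{Int}(S_1 \cap S_2)$ by Theorem 8.1.7 (xii). Now assume $x \in \mathrm{Int}(S_1) \cap \mathrm{Int}(S_2)$.
Then $\exists \Omega_1, \Omega_2 \in \mathrm{Top}_x(X)$, $(\Omega_1 \subseteq S_1$ and $\Omega_2 \subseteq S_2)$ by Theorem 31.8.13 (iii). So $\Omega_1 \cap \Omega_2 \in \mathrm{Top}_x(X)$ and
$\Omega_1 \cap \Omega_2 \subseteq S_1 \cap S_2$ for such $\Omega_1$ and $\Omega_2$ by Theorem 8.1.9 (iv). So $x \in \mathrm{Int}(S_1 \cap S_2)$ by Theorem 31.8.13 (iii).
Therefore $\mathrm{Int}(S_1) \cap \mathrm{Int}(S_2) \subseteq \mathrm{Int}(S_1 \cap S_2)$. Hence $\mathrm{Int}(S_1) \cap \mathrm{Int}(S_2) = \mathrm{Int}(S_1 \cap S_2)$.
For part (xv), let $S_1, S_2 \in \mathbb{P}(X)$. Then $\overline{S_1} \subseteq \overline{S_1 \cup S_2}$ by Theorem 31.8.13 (xiv). Similarly $\overline{S_2} \subseteq \overline{S_1 \cup S_2}$. So
$\overline{S_1} \cup \overline{S_2} \subseteq \overline{S_1 \cup S_2}$ by Theorem 8.1.7 (ix). Now assume that $x \in \overline{S_1 \cup S_2}$. Then $x \in X \setminus \mathrm{Int}(X \setminus (S_1 \cup S_2))$
by Theorem 31.8.13 (ix). But $\mathrm{Int}(X \setminus (S_1 \cup S_2)) = \mathrm{Int}((X \setminus S_1) \cap (X \setminus S_2)) = \mathrm{Int}(X \setminus S_1) \cap \mathrm{Int}(X \setminus S_2)$ by
part (xiv). Therefore $x \in X \setminus (\mathrm{Int}(X \setminus S_1) \cap \mathrm{Int}(X \setminus S_2)) = (X \setminus \mathrm{Int}(X \setminus S_1)) \cup (X \setminus \mathrm{Int}(X \setminus S_2)) = \overline{S_1} \cup \overline{S_2}$
by Theorem 31.8.13 (ix). So $\overline{S_1 \cup S_2} \subseteq \overline{S_1} \cup \overline{S_2}$. Hence $\overline{S_1 \cup S_2} = \overline{S_1} \cup \overline{S_2}$.
For part (xvi), $\overline{S_1} \supseteq \overline{S_1 \cap S_2}$ by Theorem 31.8.13 (xiv). Similarly, $\overline{S_2} \supseteq \overline{S_1 \cap S_2}$. Hence $\overline{S_1} \cap \overline{S_2} \supseteq \overline{S_1 \cap S_2}$
by Theorem 8.1.7 (xii).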
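
The proofs of parts (xiii) to (xvi) above establish two inclusions and two equalities for the interior and closure of unions and intersections. The following Python sketch checks them for all pairs of subsets of a small finite space, and exhibits a pair for which the two inclusions are strict. The space, the topology `T` and the function names are hypothetical choices for illustration.

```python
from itertools import combinations

# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def subsets(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def interior(T, S):
    """Union of all open sets which are included in S."""
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    """Intersection of all closed sets which include S."""
    result = X
    for omega in T:
        if S <= X - omega:
            result &= X - omega
    return result


for S1 in subsets(X):
    for S2 in subsets(X):
        # Interior: inclusion for unions, equality for intersections.
        assert interior(T, S1) | interior(T, S2) <= interior(T, S1 | S2)
        assert interior(T, S1) & interior(T, S2) == interior(T, S1 & S2)
        # Closure: equality for unions, inclusion for intersections.
        assert closure(X, T, S1 | S2) == closure(X, T, S1) | closure(X, T, S2)
        assert closure(X, T, S1 & S2) <= closure(X, T, S1) & closure(X, T, S2)

# A pair of sets for which both inclusions are strict.
S1, S2 = frozenset({2}), frozenset({3})
assert interior(T, S1) | interior(T, S2) < interior(T, S1 | S2)
assert closure(X, T, S1 & S2) < closure(X, T, S1) & closure(X, T, S2)
```

The strict cases show that the two inclusions cannot be strengthened to equalities in general.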


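31.8.15 REMARK: Some hypothetical inclusion rules for interior of closure and closure of interior.
The hypothetical rules “$\mathrm{Int}(\overline{S}) \subseteq S$” and “$\mathrm{Int}(\overline{S}) \supseteq S$” are not valid in general. For example, let $S = \mathbb{Q}$ in
$\mathbb{R}$ with the usual topology on $\mathbb{R}$. Then $\mathrm{Int}(\overline{S}) = \mathbb{R} \not\subseteq S$. Let $S = [0, 1] \subseteq \mathbb{R}$. Then $\mathrm{Int}(\overline{S}) = (0, 1) \not\supseteq S$.

Similarly, the hypothetical rules “$\overline{\mathrm{Int}\, S} \subseteq S$” and “$\overline{\mathrm{Int}\, S} \supseteq S$” are not valid general rules. Let $S = (0, 1)$
in $\mathbb{R}$. Then $\overline{\mathrm{Int}\, S} = [0, 1] \not\subseteq S$. Let $S = \mathbb{Q}$. Then $\overline{\mathrm{Int}\, S} = \emptyset \not\supseteq S$.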

To put these observations in colloquial terms, one may say that the interior operation makes sets “smaller”, 
and the closure operation makes sets “bigger”, but neither of these operations dominates the other in general. 
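
The same four failures can be reproduced in a finite space, where they can be checked mechanically. The following Python sketch is a finite analogue of the examples above, not a computation in $\mathbb{R}$. The space, the topology `T` and the function names are hypothetical choices for illustration.

```python
# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def interior(T, S):
    """Union of all open sets which are included in S."""
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    """Intersection of all closed sets which include S."""
    result = X
    for omega in T:
        if S <= X - omega:
            result &= X - omega
    return result


def int_of_clos(S):
    return interior(T, closure(X, T, S))


def clos_of_int(S):
    return closure(X, T, interior(T, S))


# The interior of the closure is neither always smaller nor always larger.
assert not int_of_clos(frozenset({2})) <= frozenset({2})        # {2, 3}
assert not int_of_clos(frozenset({1, 3})) >= frozenset({1, 3})  # {1}

# The closure of the interior is neither always smaller nor always larger.
assert not clos_of_int(frozenset({2})) <= frozenset({2})        # {2, 3}
assert not clos_of_int(frozenset({3})) >= frozenset({3})        # empty set
```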


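31.8.16 THEOREM: Yet more elementary properties of the interior and closure operators.
Let $S$ be a subset of a topological space $X$.
(i) $\overline{S} = X \setminus \mathrm{Int}(X \setminus S)$.
(ii) $\mathrm{Int}(S) = X \setminus \overline{X \setminus S}$.
PROOF: For part (i), note that
\begin{align*}
\overline{S} &= \bigcap \{X \setminus \Omega;\ \Omega \in \mathrm{Top}(X) \wedge S \subseteq X \setminus \Omega\} \\
&= X \setminus \bigcup \{\Omega;\ \Omega \in \mathrm{Top}(X) \wedge S \subseteq X \setminus \Omega\} \\
&= X \setminus \bigcup \{\Omega;\ \Omega \in \mathrm{Top}(X) \wedge \Omega \subseteq X \setminus S\} \\
&= X \setminus \mathrm{Int}(X \setminus S).
\end{align*}
Part (ii) follows from part (i) by substituting $X \setminus S$ for $S$ and complementing both sides of the equation.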


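31.8.17 THEOREM: Even more elementary properties of the interior and closure operators.
Let $S$ be a subset of a topological space $X$.

(i) An element $x$ of $X$ is an element of the interior of $S$ in $X$ if and only if $\exists \Omega \in \mathrm{Top}_x(X)$, $\Omega \subseteq S$.
In other words, $\mathrm{Int}(S) = \{x \in X;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S\}$.

(ii) An element $x$ of $X$ is an element of the closure of $S$ in $X$ if and only if $\forall \Omega \in \mathrm{Top}_x(X)$, $\Omega \cap S \ne \emptyset$.
In other words, $\overline{S} = \{x \in X;\ \forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap S \ne \emptyset\}$.

(iii) $S \in \mathrm{Top}(X)$ if and only if $\forall x \in S$, $\exists \Omega \in \mathrm{Top}_x(X)$, $\Omega \subseteq S$.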
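PROOF: To show part (i), let $x \in X$ and assume that $\exists \Omega \in \mathrm{Top}_x(X)$, $\Omega \subseteq S$. Then $x \in \Omega$ for some
$\Omega \in \mathrm{Top}(X)$ such that $\Omega \subseteq S$. So $x \in \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\}$. Therefore $x \in \mathrm{Int}(S)$ by Definition 31.8.2.

Conversely, let $x \in \mathrm{Int}(S)$. Then $x \in \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\}$ by Definition 31.8.2. So $x \in \Omega$ and $\Omega \subseteq S$ for
some $\Omega \in \mathrm{Top}(X)$. Therefore $\Omega \subseteq S$ for some $\Omega \in \mathrm{Top}_x(X)$. In other words, $\exists \Omega \in \mathrm{Top}_x(X)$, $\Omega \subseteq S$.

For part (ii), let $x \in X$. Then by Definition 31.8.7,
\begin{align*}
x \in \overline{S} &\Leftrightarrow x \in \bigcap \{X \setminus \Omega;\ \Omega \in \mathrm{Top}(X) \text{ and } S \subseteq X \setminus \Omega\} \\
&\Leftrightarrow x \notin \bigcup \{\Omega;\ \Omega \in \mathrm{Top}(X) \text{ and } S \subseteq X \setminus \Omega\} \\
&\Leftrightarrow \neg \exists \Omega \in \mathrm{Top}(X),\ (x \in \Omega \wedge S \subseteq X \setminus \Omega) \\
&\Leftrightarrow \neg \exists \Omega \in \mathrm{Top}(X),\ (x \in \Omega \wedge S \cap \Omega = \emptyset) \\
&\Leftrightarrow \forall \Omega \in \mathrm{Top}(X),\ (x \notin \Omega \vee S \cap \Omega \ne \emptyset) \\
&\Leftrightarrow \forall \Omega \in \mathrm{Top}(X),\ (x \in \Omega \Rightarrow S \cap \Omega \ne \emptyset) \\
&\Leftrightarrow \forall \Omega \in \mathrm{Top}_x(X),\ S \cap \Omega \ne \emptyset.
\end{align*}

In other words, $x$ is in the closure of $S$ if and only if $\forall \Omega \in \mathrm{Top}_x(X)$, $\Omega \cap S \ne \emptyset$, as claimed.
Part (iii) follows from part (i) and Theorem 31.8.14 (ii).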


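31.8.18 REMARK: Alternative expressions for the interior and closure of sets.
Definitions 31.8.2 and 31.8.7, and Theorem 31.8.17 (i, ii), may be summarised as follows.

\begin{align*}
\mathrm{Int}(S) &= \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\} \\
&= \{x \in X;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S\} \\
\overline{S} &= \bigcap \{X \setminus \Omega;\ \Omega \in \mathrm{Top}(X) \text{ and } S \subseteq X \setminus \Omega\} \\
&= \bigcap \{K \in \overline{\mathrm{Top}}(X);\ S \subseteq K\} \\
&= \{x \in X;\ \forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap S \ne \emptyset\},
\end{align*}

for any subset $S$ of a topological space $X$.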
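
The agreement between the set-level and pointwise expressions summarised above can be checked exhaustively on a finite space. The following Python sketch does so for a hypothetical topology on $X = \{1, 2, 3\}$. The example and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def subsets(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def interior(T, S):
    """Set-level form: union of all open sets which are included in S."""
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    """Set-level form: intersection of all closed sets which include S."""
    result = X
    for omega in T:
        if S <= X - omega:
            result &= X - omega
    return result


def open_neighbourhoods(T, x):
    """Top_x(X): the open sets which contain the point x."""
    return [omega for omega in T if x in omega]


def interior_pointwise(X, T, S):
    """Points which have some open neighbourhood included in S."""
    return frozenset(x for x in X
                     if any(omega <= S for omega in open_neighbourhoods(T, x)))


def closure_pointwise(X, T, S):
    """Points all of whose open neighbourhoods meet S."""
    return frozenset(x for x in X
                     if all(omega & S for omega in open_neighbourhoods(T, x)))


for S in subsets(X):
    assert interior(T, S) == interior_pointwise(X, T, S)
    assert closure(X, T, S) == closure_pointwise(X, T, S)
```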


31.8.19 REMARK: Interior-operator and closure-operator notations which indicate choice of topology.
Notations 31.8.3 and 31.8.8 require the topology and base set to be implied in the context. So there can be
confusion if more than one topology is under consideration. As mentioned in Remark 31.3.5, a topological
space $(X, T)$ is fully determined by the set $T$. Thus Notations 31.8.20 and 31.8.21 fully determine the implicit
point space $X = \bigcup T$ by stating that $T$ is a topology on $X$.

Notations 31.8.20 and 31.8.21 are applied in Theorem 31.8.22. The notation $\overline{S}$ for the closure of a set does not
easily permit the addition of a subscript to indicate the implied topology explicitly. Therefore the notation
$\mathrm{Clos}_T(S)$ is used here for the closure of a set $S$ with respect to a topology $T$.


31.8.20 NOTATION: $\mathrm{Int}_T(S)$ denotes the interior of a set $S \in \mathbb{P}(X)$ with respect to a topology $T$ on $X$.


31.8.21 NOTATION: $\mathrm{Clos}_T(S)$ denotes the closure of a set $S \in \mathbb{P}(X)$ with respect to a topology $T$ on $X$.
$\mathrm{Clos}(S)$ denotes the closure of a set $S$ with respect to an implicit topology.


31.8.22 THEOREM: The effect of topology strength on the interior and closure operators. 
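Let $T_1$ and $T_2$ be topologies on a set $X$ such that $T_1 \subseteq T_2$. Then:
(i) $\mathrm{Int}_{T_1}(S) \subseteq \mathrm{Int}_{T_2}(S)$ for all $S \in \mathbb{P}(X)$.
(ii) $\mathrm{Clos}_{T_1}(S) \supseteq \mathrm{Clos}_{T_2}(S)$ for all $S \in \mathbb{P}(X)$.
PROOF: For part (i), note that $\{\Omega \in T_1;\ \Omega \subseteq S\} \subseteq \{\Omega \in T_2;\ \Omega \subseteq S\}$. Therefore by Theorem 8.4.8 (ii) and
Definition 31.8.2, $\mathrm{Int}_{T_1}(S) = \bigcup \{\Omega \in T_1;\ \Omega \subseteq S\} \subseteq \bigcup \{\Omega \in T_2;\ \Omega \subseteq S\} = \mathrm{Int}_{T_2}(S)$.
Part (ii) follows from part (i) and Theorem 31.8.13 (ix).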
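
The monotonicity in Theorem 31.8.22 can be observed on a chain of three topologies on a finite set. In the following Python sketch, the chain consists of the trivial topology, an intermediate topology and the discrete topology on $X = \{1, 2, 3\}$. These topologies and the function names are hypothetical choices for illustration.

```python
from itertools import combinations

X = frozenset({1, 2, 3})


def subsets(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def interior(T, S):
    """Int_T(S): union of all sets in T which are included in S."""
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    """Clos_T(S): intersection of all T-closed sets which include S."""
    result = X
    for omega in T:
        if S <= X - omega:
            result &= X - omega
    return result


# A chain of topologies on X, from weakest to strongest.
T_trivial = {frozenset(), X}
T_middle = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}
T_discrete = set(subsets(X))
assert T_trivial <= T_middle <= T_discrete

for T1, T2 in [(T_trivial, T_middle), (T_middle, T_discrete)]:
    for S in subsets(X):
        assert interior(T1, S) <= interior(T2, S)        # interiors grow
        assert closure(X, T1, S) >= closure(X, T2, S)    # closures shrink

S = frozenset({2})
print([sorted(interior(T, S)) for T in (T_trivial, T_middle, T_discrete)])
# [[], [2], [2]]
print([sorted(closure(X, T, S)) for T in (T_trivial, T_middle, T_discrete)])
# [[1, 2, 3], [2, 3], [2]]
```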


31.8.23 THEOREM: Expressions for the interior and closure of sets in a relative topology. 
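Let $X$ be a topological space. Let $S \subseteq X$ and $A \subseteq S$.

(i) $\mathrm{Int}_{\mathrm{Top}(S)}(A) = S \cap \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \cap S \subseteq A\}$.
(ii) $\mathrm{Clos}_{\mathrm{Top}(S)}(A) = S \cap \bigcap \{F \in \overline{\mathrm{Top}}(X);\ A \subseteq F \cap S\}$.
PROOF: For part (i), it follows from Definition 31.8.2 that $\mathrm{Int}_{\mathrm{Top}(S)}(A) = \bigcup \{G \in \mathrm{Top}(S);\ G \subseteq A\}$, where
$\mathrm{Top}(S) = \{\Omega \cap S;\ \Omega \in \mathrm{Top}(X)\}$ by Definition 31.6.2. So $\mathrm{Int}_{\mathrm{Top}(S)}(A) = \bigcup \{\Omega \cap S;\ \Omega \in \mathrm{Top}(X),\ \Omega \cap S \subseteq A\}$.
Hence $\mathrm{Int}_{\mathrm{Top}(S)}(A) = S \cap \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \cap S \subseteq A\}$ by Theorem 8.4.8 (iv).

For part (ii), it follows from Definition 31.8.7 that $\mathrm{Clos}_{\mathrm{Top}(S)}(A) = \bigcap \{K \in \overline{\mathrm{Top}}(S);\ A \subseteq K\}$, where
$\overline{\mathrm{Top}}(S) = \{F \cap S;\ F \in \overline{\mathrm{Top}}(X)\}$ by Theorem 31.6.7 (ii). Therefore $\mathrm{Clos}_{\mathrm{Top}(S)}(A) = \bigcap \{F \cap S;\ F \in \overline{\mathrm{Top}}(X),\
A \subseteq F \cap S\}$. Hence $\mathrm{Clos}_{\mathrm{Top}(S)}(A) = S \cap \bigcap \{F \in \overline{\mathrm{Top}}(X);\ A \subseteq F \cap S\}$ by Theorem 8.4.8 (vii).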
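
The two formulas of Theorem 31.8.23 can be compared with a direct computation in the subspace. The following Python sketch builds the relative topology on each subset $S$ of a small finite space, computes the interior and closure of each $A \subseteq S$ inside that subspace, and compares the results with the right-hand sides of parts (i) and (ii). The example space and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def subsets(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def interior(T, S):
    """Union of all sets in T which are included in S."""
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    """Intersection of all closed sets of (X, T) which include S."""
    result = X
    for omega in T:
        if S <= X - omega:
            result &= X - omega
    return result


def relative_topology(T, S):
    """Top(S) = {Omega & S; Omega in Top(X)}."""
    return {omega & S for omega in T}


def rhs_interior(X, T, S, A):
    """Right-hand side of part (i)."""
    return S & frozenset().union(*(omega for omega in T if omega & S <= A))


def rhs_closure(X, T, S, A):
    """Right-hand side of part (ii), using closed sets F = X - Omega."""
    result = S
    for omega in T:
        F = X - omega
        if A <= F & S:
            result &= F
    return result


for S in subsets(X):
    TS = relative_topology(T, S)
    for A in subsets(S):
        assert interior(TS, A) == rhs_interior(X, T, S, A)
        assert closure(S, TS, A) == rhs_closure(X, T, S, A)
```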


31.8.24 REMARK: The extremes of interior and closure operators for trivial and discrete topologies. 

Theorem 31.8.22 says that strengthening the topology on a fixed set $X$ makes interiors of sets larger, and
makes closures of sets smaller. (The inequalities in Theorem 31.8.22 are illustrated in Figure 31.9.3.) In
the extreme case of the strongest topology on $X$, namely the discrete topology $\mathbb{P}(X)$, both the interior and
closure of $S$ equal $S$ itself. (This is discussed in more detail in Remark 31.11.11.) In the opposite extreme
of the trivial topology $T = \{\emptyset, X\}$ on $X$, the interior of any set $S$ in $\mathbb{P}(X) \setminus \{\emptyset, X\}$ equals $\emptyset$, and the closure
of such a set $S$ equals $X$. (See Remark 31.9.17 for similar comments on the exterior and boundary of sets.)


31.8.25 REMARK: Closed neighbourhoods. 

The closed neighbourhoods in Definition 31.8.26 are not nearly so useful as the open neighbourhoods in
Definition 31.3.11. The closure of any open neighbourhood is necessarily a closed neighbourhood by Theorems
31.8.13 (vii) and 31.8.14 (v), but a closed neighbourhood is not necessarily the closure of an open
neighbourhood. For example, let $X = \mathbb{R}$ with its usual topology, and let $F = [-1, 1] \cup \mathbb{Z}$. Then
$F \in \overline{\mathrm{Top}}(X)$ and $\mathrm{Int}(F) = (-1, 1)$. So $F$ is a closed neighbourhood of $x = 0 \in \mathbb{R}$. But there is no open
neighbourhood $\Omega \in \mathrm{Top}_x(X)$ such that $F = \overline{\Omega}$.


31.8.26 DEFINITION: A closed neighbourhood of a point $x$ in a topological space $X$ is any set $F \in \overline{\mathrm{Top}}_x(X)$
which satisfies $\exists \Omega \in \mathrm{Top}_x(X)$, $\Omega \subseteq F$.


31.9. Exterior and boundary of sets 


31.9.1 REMARK: The exterior of a set in a topological space. 

The exterior of a set is defined analogously to the interior in Definition 31.9.2. The exterior turns out to be 
the same as the complement of the closure which is introduced in Definition 31.8.7. It is also the same as 
the interior of the complement. A reasonable notation for the exterior of a set S would be Ext(S). 


It would perhaps also be reasonable to define the “exterior closure” of a set as the complement of the interior
(which equals the closure of the complement). But there seems to be little demand for this.


((2020-6-4. Replace Definition 31.9.2 with $\{x \in X;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq X \setminus S\}$. Then make adjustments to
all theorems and definitions based on Definition 31.9.2. ))


31.9.2 DEFINITION: The (topological) exterior of a set $S$ in a topological space $X$ is the union of all open
sets in $X$ which are included in the set $X \setminus S$. In other words, the exterior of $S$ is the set

$$\bigcup_{\substack{\Omega \in \mathrm{Top}(X) \\ \Omega \subseteq X \setminus S}} \Omega = \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \cap S = \emptyset\}.$$


31.9.3 NOTATION: Ext(S) denotes the exterior of a set S in an implicit topological space. 


31.9.4 DEFINITION: The exterior operator of a topological space $(X, T)$ is the map $\mathrm{Ext} : \mathbb{P}(X) \to T$, where
$\mathrm{Ext}(S)$ denotes the exterior of subsets $S$ of $X$ as in Definition 31.9.2 and Notation 31.9.3.


((2020-6-4. Replace Definition 31.9.5 with $\{x \in X;\ \forall \Omega \in \mathrm{Top}_x(X),\ (\Omega \cap S \ne \emptyset \text{ and } \Omega \cap (X \setminus S) \ne \emptyset)\}$. Then
make adjustments to all theorems and definitions based on Definition 31.9.5. ))


31.9.5 DEFINITION: The (topological) boundary of a set $S$ in a topological space $X$ is the complement of
the interior of $S$ within the closure of $S$; in other words, $\overline{S} \setminus \mathrm{Int}(S)$.


31.9.6 NOTATION: $\mathrm{Bdy}(S)$ denotes the boundary of a set $S$ with respect to an implicit topology.
$\partial S$ is an alternative notation for $\mathrm{Bdy}(S)$.
31.9.7 DEFINITION: The boundary operator of a topological space $(X, T)$ is the map $\mathrm{Bdy} : \mathbb{P}(X) \to \overline{\mathrm{Top}}(X)$,
where $\mathrm{Bdy}(S)$ denotes the boundary of subsets $S$ of $X$ as in Definition 31.9.5 and Notation 31.9.6.


31.9.8 REMARK: Alternative notation for the boundary of a set.
Combining Definition 31.9.5 and Notation 31.9.6 gives $\partial S = \mathrm{Bdy}(S) = \overline{S} \setminus \mathrm{Int}(S)$ for subsets $S$ of a topological
space $X$.


Table 31.8.1 suggests that the notation $\partial S$ is much more popular than $\mathrm{Bdy}(S)$ for the boundary of a set $S$.
However, the curly-dee symbol $\partial$ is also used extensively for denoting partial derivatives. In fact, the two
concepts are closely related within the context of the theory of distributions. (The gradient of the indicator
function of the set $S$ is related to the boundary of $S$.)


31.9.9 REMARK: Diagram for relations between interior, exterior, boundary and closure of a set. 

The relations between the interior, exterior, boundary and closure of a set in a topological space are illustrated
in Figure 31.9.1. Note particularly that, in general, the boundary $\mathrm{Bdy}(S)$ straddles the set $S$ and its
complement. The set $S$ is open if and only if the intersection $\mathrm{Bdy}(S) \cap S$ is empty. The set $S$ is closed if
and only if the intersection $\mathrm{Bdy}(S) \cap (X \setminus S)$ is empty.

The entire boundary $\mathrm{Bdy}(S)$ is empty if $S = \emptyset$ or $S = X$. If $\mathrm{Bdy}(S) = \emptyset$ and $S \ne \emptyset$ and $S \ne X$, then $X$
is a disconnected topological space and the pair $(S, X \setminus S)$ is a disconnection of $X$. (See Section 34.1 for
connectedness definitions.)


Figure 31.9.1 Relations between interior, exterior, boundary and closure of a set $S$


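31.9.10 THEOREM: Some elementary properties of the exterior, boundary, interior and closure operators.
Let $S$ be a subset of a topological space $X$. Then:
(i) $\mathrm{Bdy}(S) \cap \mathrm{Int}(S) = \emptyset$.
(ii) $\mathrm{Bdy}(S) \subseteq \overline{S}$.
(iii) $S \in \mathrm{Top}(X) \Leftrightarrow \mathrm{Bdy}(S) \cap S = \emptyset$.
(iv) $\mathrm{Ext}(S) = \mathrm{Int}(X \setminus S)$.
(v) $\mathrm{Ext}(S) \in \mathrm{Top}(X)$.
(vi) $\mathrm{Ext}(S) = X \setminus \overline{S}$.
(vii) $\mathrm{Ext}(S) \cap \overline{S} = \emptyset$.
(viii) $\mathrm{Ext}(S) \cap \mathrm{Int}(S) = \emptyset$.
(ix) $\mathrm{Bdy}(S) \cap \mathrm{Ext}(S) = \emptyset$.
(x) $\mathrm{Ext}(S) \subseteq X \setminus S$.
(xi) $\mathrm{Bdy}(S) = \overline{S} \cap \overline{X \setminus S}$.
(xii) $\mathrm{Bdy}(S) \in \overline{\mathrm{Top}}(X)$.
(xiii) $\mathrm{Bdy}(S) = X \setminus \mathrm{Int}(S) \setminus \mathrm{Ext}(S)$.
(xiv) $\mathrm{Bdy}(S) = \mathrm{Bdy}(X \setminus S)$.
(xv) $X = \mathrm{Int}(S) \cup \mathrm{Bdy}(S) \cup \mathrm{Ext}(S)$.
(xvi) $(\mathrm{Int}(S) \cap \mathrm{Bdy}(S) = \emptyset) \wedge (\mathrm{Int}(S) \cap \mathrm{Ext}(S) = \emptyset) \wedge (\mathrm{Bdy}(S) \cap \mathrm{Ext}(S) = \emptyset)$.
(xvii) $\{\mathrm{Int}(S), \mathrm{Bdy}(S), \mathrm{Ext}(S)\}$ is a partition of $X$.
(xviii) $\overline{X \setminus S} = \mathrm{Bdy}(S) \cup \mathrm{Ext}(S)$.
(xix) $S \in \mathrm{Top}(X) \Rightarrow X \setminus S = (X \setminus \overline{S}) \cup \mathrm{Bdy}(S)$.
(xx) $\mathrm{Int}(S) = \overline{S} \setminus \mathrm{Bdy}(S)$.
(xxi) $\mathrm{Int}(S) = S \setminus \mathrm{Bdy}(S)$.
(xxii) $S \setminus \mathrm{Int}(S) = S \cap \mathrm{Bdy}(S)$.
(xxiii) $\mathrm{Ext}(S) = (X \setminus S) \setminus \mathrm{Bdy}(S)$.
(xxiv) $\overline{S} = \mathrm{Int}(S) \cup \mathrm{Bdy}(S)$.
(xxv) $\overline{S} = S \cup \mathrm{Bdy}(S)$.
(xxvi) $S \in \overline{\mathrm{Top}}(X) \Leftrightarrow \mathrm{Bdy}(S) \subseteq S$.
(xxvii) $\mathrm{Ext}(\mathrm{Ext}(S)) = \mathrm{Int}(\overline{S})$.
(xxviii) $\mathrm{Int}(\mathrm{Ext}(S)) = \mathrm{Ext}(S)$.
(xxix) $\mathrm{Bdy}(\mathrm{Int}(S)) \subseteq \mathrm{Bdy}(S)$.
(xxx) $\mathrm{Bdy}(\overline{S}) \subseteq \mathrm{Bdy}(S)$.
(xxxi) $\mathrm{Bdy}(\mathrm{Ext}(S)) \subseteq \mathrm{Bdy}(S)$.
(xxxii) $\overline{\mathrm{Bdy}(S)} = \mathrm{Bdy}(S)$.
(xxxiii) $\mathrm{Bdy}(\mathrm{Bdy}(S)) \subseteq \mathrm{Bdy}(S)$.
(xxxiv) $X \setminus S \in \mathrm{Top}(X) \Leftrightarrow \mathrm{Ext}(S) = X \setminus S$.
(xxxv) $X \setminus S \in \mathrm{Top}(X) \Rightarrow (X \setminus S) \cap \mathrm{Bdy}(S) = \emptyset$.
(xxxvi) $\mathrm{Bdy}(S) = \emptyset \Rightarrow S \in \mathrm{Top}(X)$.
(xxxvii) $\mathrm{Bdy}(S) = \emptyset \Rightarrow X \setminus S \in \mathrm{Top}(X)$.
(xxxviii) $\mathrm{Bdy}(S) = \emptyset \Leftrightarrow (S \in \mathrm{Top}(X) \text{ and } X \setminus S \in \mathrm{Top}(X))$.
Let $S_1$ and $S_2$ be subsets of a topological space $X$. Then:
(xxxix) $S_1 \subseteq S_2 \Rightarrow \mathrm{Ext}(S_1) \supseteq \mathrm{Ext}(S_2)$.
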


PROOF: Part (i) follows directly from Definition 31.9.5.
Part (ii) follows directly from Definition 31.9.5.
For part (iii), note that $S \in \mathrm{Top}(X) \Leftrightarrow S \subseteq \mathrm{Int}(S)$ by Theorem 31.8.14 (ii). But $S \subseteq \mathrm{Int}(S) \Leftrightarrow S \setminus
\mathrm{Int}(S) = \emptyset$ by Theorem 8.2.5 (iv), and $S \setminus \mathrm{Int}(S) = (S \cap \overline{S}) \setminus \mathrm{Int}(S) = (\overline{S} \setminus \mathrm{Int}(S)) \cap S = \mathrm{Bdy}(S) \cap S$ by
Theorem 31.8.13 (vii), Theorem 8.2.6 (vi) and Definition 31.9.5. So $S \in \mathrm{Top}(X) \Leftrightarrow \mathrm{Bdy}(S) \cap S = \emptyset$.
Part (iv) follows from $\mathrm{Ext}(S) = \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \cap S = \emptyset\} = \bigcup \{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq X \setminus S\}$, which equals
$\mathrm{Int}(X \setminus S)$ by Definition 31.8.2.
Part (v) follows from Theorem 31.8.13 (i) because $\mathrm{Ext}(S) = \mathrm{Int}(X \setminus S)$ by part (iv).
Part (vi) follows from part (iv) and Theorem 31.8.13 (viii).
Part (vii) follows from part (vi) and Theorem 8.2.5 (vi).
Part (viii) follows from part (vii) and Theorem 31.8.13 (x).
Part (ix) follows from part (vii) and Definition 31.9.5.
Part (x) follows from part (iv) and Theorem 31.8.13 (ii).
Part (xi) follows from Definition 31.9.5 and Theorem 31.8.13 (xi).
Part (xii) follows from part (xi), Theorem 31.8.13 (vi) and Theorem 31.4.5 (ii).
For part (xiii), note that $\mathrm{Bdy}(S) = \overline{S} \setminus \mathrm{Int}(S) = X \setminus \mathrm{Int}(X \setminus S) \setminus \mathrm{Int}(S) = X \setminus \mathrm{Int}(S) \setminus \mathrm{Ext}(S)$ by part (iv).
To show part (xiv), note that by part (xiii), $\mathrm{Bdy}(X \setminus S) = X \setminus \mathrm{Int}(X \setminus S) \setminus \mathrm{Ext}(X \setminus S) = X \setminus \mathrm{Ext}(S) \setminus \mathrm{Int}(S)$
by part (iv) and Theorem 8.2.5 (xvi). This equals $\mathrm{Bdy}(S)$ by part (xiii).
Part (xv) follows from part (xiii) because $\mathrm{Bdy}(S) = X \setminus \mathrm{Int}(S) \setminus \mathrm{Ext}(S) = X \setminus (\mathrm{Int}(S) \cup \mathrm{Ext}(S))$.
Part (xvi) follows from parts (i), (viii) and (ix).
Part (xvii) follows from parts (xv) and (xvi).
For part (xviii), note that $\mathrm{Bdy}(S) \cup \mathrm{Ext}(S) = (X \setminus \mathrm{Int}(S) \setminus \mathrm{Ext}(S)) \cup \mathrm{Ext}(S) = X \setminus \mathrm{Int}(S)$ by part (xvi).
But by Theorem 31.8.13 (ix), $\overline{X \setminus S} = X \setminus \mathrm{Int}(S)$.
Part (xix) follows from parts (xviii) and (vi), and Theorems 31.8.13 (xii) and 31.8.14 (i).
Part (xx) follows from Definition 31.9.5 and Theorem 31.8.13 (x).
Part (xxi) follows from part (xx) and Theorem 31.8.13 (ii).
For part (xxii), $\mathrm{Int}(S) = S \setminus \mathrm{Bdy}(S)$ by part (xxi). So $S \setminus \mathrm{Int}(S) = S \setminus (S \setminus \mathrm{Bdy}(S))$. So $S \setminus \mathrm{Int}(S) = S \cap \mathrm{Bdy}(S)$
by Theorem 8.2.5 (xv).
Part (xxiii) follows from parts (iv) and (xiv).
Part (xxiv) follows from Definition 31.9.5 and Theorem 31.8.13 (x).
Part (xxv) follows from part (xxiv) and Theorem 31.8.13 (ii, vii).
For part (xxvi), note that $\mathrm{Bdy}(S) \subseteq S$ if and only if $\overline{S} = S$ by part (xxv) and Theorem 8.1.7 (viii). But
$\overline{S} = S$ if and only if $S \in \overline{\mathrm{Top}}(X)$ by Theorem 31.8.14 (v). Hence $S \in \overline{\mathrm{Top}}(X)$ if and only if $\mathrm{Bdy}(S) \subseteq S$.
Part (xxvii) follows from $\mathrm{Ext}(\mathrm{Ext}(S)) = \mathrm{Int}(X \setminus \mathrm{Ext}(S))$, by part (iv), which equals $\mathrm{Int}(\overline{S})$ by part (vi).
Part (xxviii) follows from part (v) and Theorem 31.8.14 (i).
Part (xxix) follows from $\mathrm{Bdy}(\mathrm{Int}(S)) = \overline{\mathrm{Int}(S)} \setminus \mathrm{Int}(\mathrm{Int}(S))$ by Definition 31.9.5, which equals $\overline{\mathrm{Int}(S)} \setminus \mathrm{Int}(S)$
by Theorem 31.8.14 (ix), which is a subset of $\overline{S} \setminus \mathrm{Int}(S)$ by Theorem 31.8.14 (xi), which equals $\mathrm{Bdy}(S)$.
For part (xxx), note that $\mathrm{Bdy}(\overline{S}) = \overline{\overline{S}} \setminus \mathrm{Int}(\overline{S})$ by Definition 31.9.5, which equals $\overline{S} \setminus \mathrm{Int}(\overline{S})$ by
Theorem 31.8.14 (x), which is a subset of $\overline{S} \setminus \mathrm{Int}(S)$ by Theorem 31.8.14 (xii), which equals $\mathrm{Bdy}(S)$.
For part (xxxi), note that $\mathrm{Bdy}(\mathrm{Ext}(S)) = \mathrm{Bdy}(X \setminus \overline{S})$, by part (vi), which equals $\mathrm{Bdy}(\overline{S})$ by part (xiv), and
this is a subset of $\mathrm{Bdy}(S)$ by part (xxx).
Part (xxxii) follows from part (xii) and Theorem 31.8.14 (v).
For part (xxxiii), note that $\mathrm{Bdy}(\mathrm{Bdy}(S)) \subseteq \overline{\mathrm{Bdy}(S)}$ by Definition 31.9.5, but $\overline{\mathrm{Bdy}(S)} = \mathrm{Bdy}(S)$ by
part (xxxii).
For part (xxxiv), note that $\mathrm{Ext}(S) = \mathrm{Int}(X \setminus S)$ by part (iv). Hence $\mathrm{Ext}(S) = X \setminus S \Leftrightarrow \mathrm{Int}(X \setminus S) = X \setminus S \Leftrightarrow
X \setminus S \in \mathrm{Top}(X)$ by Theorem 31.8.14 (i).
For part (xxxv), note that $(X \setminus S) \setminus \mathrm{Int}(X \setminus S) = (X \setminus S) \cap \mathrm{Bdy}(X \setminus S) = (X \setminus S) \cap \mathrm{Bdy}(S)$ by parts (xxii)
and (xiv). So $X \setminus S \in \mathrm{Top}(X) \Rightarrow (X \setminus S) \cap \mathrm{Bdy}(S) = \emptyset$ by Theorem 31.8.14 (i).
Part (xxxvi) follows from part (xxi) and Theorem 31.8.14 (i).
Part (xxxvii) follows from parts (xxiii) and (xxxiv).
For part (xxxviii), $\mathrm{Bdy}(S) = \emptyset \Rightarrow (S \in \mathrm{Top}(X) \text{ and } X \setminus S \in \mathrm{Top}(X))$ follows from parts (xxxvi) and (xxxvii).
The converse follows from parts (iii) and (xxxv) because $\mathrm{Bdy}(S) = (\mathrm{Bdy}(S) \cap S) \cup (\mathrm{Bdy}(S) \cap (X \setminus S))$.
Part (xxxix) follows from part (iv), Theorem 31.8.13 (xiii) and Theorem 8.2.6 (xvi).
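
A selection of the relations proved above can be checked by brute force on a finite space. The following Python sketch implements the exterior and boundary directly from Definitions 31.9.2 and 31.9.5 and tests parts (iii), (iv), (vi), (xi), (xiv), (xxvi) and (xxxviii) as they are used in the proof. The example space and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}
closed_sets = {X - omega for omega in T}


def subsets(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def interior(T, S):
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    result = X
    for omega in T:
        if S <= X - omega:
            result &= X - omega
    return result


def exterior(T, S):
    """Union of all open sets which are disjoint from S (Definition 31.9.2)."""
    return frozenset().union(*(omega for omega in T if not (omega & S)))


def boundary(X, T, S):
    """Closure of S minus interior of S (Definition 31.9.5)."""
    return closure(X, T, S) - interior(T, S)


for S in subsets(X):
    bdy, ext = boundary(X, T, S), exterior(T, S)
    assert (S in T) == (not (bdy & S))                     # part (iii)
    assert ext == interior(T, X - S)                       # part (iv)
    assert ext == X - closure(X, T, S)                     # part (vi)
    assert bdy == closure(X, T, S) & closure(X, T, X - S)  # part (xi)
    assert bdy == boundary(X, T, X - S)                    # part (xiv)
    assert (S in closed_sets) == (bdy <= S)                # part (xxvi)
    assert (not bdy) == (S in T and (X - S) in T)          # part (xxxviii)
```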


31.9.11 REMARK: Topology is the study of boundaries. Open sets are the insides of sets. 

As mentioned in Remarks 31.2.1 and 31.3.1, one may say that “topology is the study of boundaries”. This
notion is supported by Theorem 31.9.10 (iii), which states that a set is open if and only if it contains none
of its boundary points. Thus the concept of an open set can be simply and accurately defined in terms of
its boundary.


In the physical world, vast numbers of systems have well-defined boundaries. So it is very easy to explain
the concept of a boundary due to its familiarity in everyday life. It is not so easy to explain the notion of an
open set. But by Theorem 31.9.10 (iii), open sets are very straightforwardly defined in terms of boundaries,
which should be useful in teaching point-set topology.


The most popular mathematical definition of the topology as the set of open sets becomes more meaningful
when it is explained that open sets are sets which do not contain their boundaries. In other words, “open
sets are the insides of sets”, a characterisation which may be justified by Theorem 31.8.14 (i).


31.9.12 REMARK: Every set is partitioned into its interior, boundary and exterior. 

By Theorem 31.9.10 (xv, xvi), the set {Int(S), Bdy(S), Ext(S)} is a partition of the topological space X for 
any subset S of X. In colloquial terms one may say that every set has a distinct inside, outside and boundary. 
This is a fairly accurate informal characterisation of the essential nature and meaning of topology. 
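
The three-way decomposition can be made concrete on a finite space. The following Python sketch computes the interior, boundary and exterior of every subset of a hypothetical space $X = \{1, 2, 3\}$ and checks that the three sets cover $X$ without overlapping. The example topology and function names are assumptions for illustration. Some of the three sets may be empty, for example when $S = \emptyset$.

```python
from itertools import combinations

# Hypothetical finite example: a topology T on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (2, 3), (1, 2, 3)]}


def subsets(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def interior(T, S):
    return frozenset().union(*(omega for omega in T if omega <= S))


def closure(X, T, S):
    result = X
    for omega in T:
        if S <= X - omega:
            result &= X - omega
    return result


def exterior(T, S):
    return frozenset().union(*(omega for omega in T if not (omega & S)))


def boundary(X, T, S):
    return closure(X, T, S) - interior(T, S)


def three_way_split(X, T, S):
    return [interior(T, S), boundary(X, T, S), exterior(T, S)]


for S in subsets(X):
    parts = three_way_split(X, T, S)
    assert frozenset().union(*parts) == X         # the parts cover X
    assert sum(len(p) for p in parts) == len(X)   # so they are pairwise disjoint

print([sorted(p) for p in three_way_split(X, T, frozenset({2}))])
# [[2], [3], [1]]
```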
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31.9.13 THEOREM: The 3-way partition of the empty and full sets. 
Let X be a topological space. 
(i) Int(0) = 0, Bdy(0) = 0 and Ext(0) = X. 
(ii) Int(X) = X, Bdy(X) = 0 and Ext(X) = 0). 
PROOF: For part (i), Int(@) = Ø by Theorem 31.8.14 (iii), and Clos(@) = Ø by Theorem 31.8.14 (vii). Then 
Bdy(0) = Ø by Definition 31.9.5 and Ext(0) = X by Theorem 31.9.10 (vi). 


For part (ii), Int(0) = X by Theorem 31.8.14 (iv), Bdy(0) = Ø by Theorem 31.9.10 (xxxviii), and Ext(0) = Ø 
by Theorem 31.9.10 (x). 
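
The three-way partition of Remark 31.9.12 and the special cases of Theorem 31.9.13 can be checked by brute force on a small finite space. The following Python fragment is only an illustrative sketch. The example set X, the topology top and all function names are hypothetical, and the boundary is assumed here to be the closure minus the interior, which may differ in form from Definition 31.9.5.

```python
from itertools import combinations

def subsets(X):
    """Every subset of the finite set X, as a list of frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def interior(S, top):
    """Int(S): the union of all open sets contained in S."""
    return frozenset().union(*(G for G in top if G <= S))

def closure(S, X, top):
    """Clos(S): the intersection of all closed sets containing S."""
    result = X
    for G in top:
        closed_set = X - G          # complements of open sets are closed
        if S <= closed_set:
            result = result & closed_set
    return result

def exterior(S, X, top):
    """Ext(S): the interior of the complement of S."""
    return interior(X - S, top)

def boundary(S, X, top):
    """Bdy(S), assumed here to be Clos(S) minus Int(S)."""
    return closure(S, X, top) - interior(S, top)

# Hypothetical example: a small topology on X = {1, 2, 3, 4}.
X = frozenset({1, 2, 3, 4})
top = {frozenset(), frozenset({1}), frozenset({1, 2}),
       frozenset({3, 4}), frozenset({1, 3, 4}), X}

def three_way_partition(S):
    """True if Int(S), Bdy(S), Ext(S) are pairwise disjoint and cover X."""
    i, b, e = interior(S, top), boundary(S, X, top), exterior(S, X, top)
    return (i | b | e) == X and not (i & b or i & e or b & e)

partition_ok = all(three_way_partition(S) for S in subsets(X))

# A set is open if and only if it contains none of its boundary points.
open_iff_no_boundary = all(
    (S in top) == (not (S & boundary(S, X, top))) for S in subsets(X))
```

For the set S = {1, 3} in this example, the interior is {1}, the exterior is empty and the boundary is {2, 3, 4}.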


31.9.14 REMARK: The open and closed portions of the boundary of a set. 

For any set S in a topological space X, the points in Int(S) are always elements of S (by Theorem 31.8.13 (ii)),
and the points in Ext(S) are always elements of X \ S (by Theorem 31.9.10 (x)). But the points of Bdy(S)
may belong to either S or X \ S. (See Figure 31.9.2.)




Figure 31.9.2 Open and closed portions of boundary of a set S 


If S is open, all points of Bdy(S) belong to X \ S, whereas if S is closed, all points of Bdy(S) belong to S.
Therefore one may refer to the points of Bdy(S) which are elements of X \ S as the “open portions” of the
boundary, and the points of Bdy(S) which are elements of S as the “closed portions” of the boundary.


31.9.15 REMARK:  Ad-hoc notations for exterior and boundary operators to indicate choice of topology. 
Notation 31.9.16 gives topology-dependent notations for the exterior and boundary of sets analogous to 
Notation 31.8.20 for the interior of sets. 


31.9.16 NOTATION: Ext_T(S) denotes the exterior of a set S with respect to a topological space (X, T).
Bdy_T(S) denotes the boundary of a set S with respect to a topological space (X, T).


31.9.17 REMARK: The effects of topology strength on set exteriors and boundaries. 

Theorem 31.9.18 says that strengthening the topology on a fixed set X makes set exteriors larger and set
boundaries smaller. In the extreme case of the strongest topology on X, namely the discrete topology P(X),
the exterior of S is equal to X \ S and the boundary of S is empty. In the opposite extreme of the trivial
topology T = {∅, X} on X, the exterior of any set S in P(X) \ {∅, X} equals ∅, and the boundary of such a
set S equals X. (See Remark 31.8.24 for similar comments on the interior and closure of sets.)


31.9.18 THEOREM: The effects of topology strength on the exteriors and boundaries of sets. 
Let T1 and T2 be topologies on a set X such that T1 ⊆ T2.

(i) Ext_T1(S) ⊆ Ext_T2(S) for all S ∈ P(X).

(ii) Bdy_T1(S) ⊇ Bdy_T2(S) for all S ∈ P(X).

PROOF: Part (i) follows from Theorem 31.8.22 (i) because Ext_T(S) = Int_T(X \ S) for T = T1 and T2 by
Theorem 31.9.10 (iv).
Part (ii) follows from Theorem 31.8.22 and Theorem 31.9.10 (xiii).


Figure 31.9.3 Relation between topology strength and interior/boundary/exterior of sets


31.9.19 REMARK: Interpretation of the effect of topology strength on set boundaries. 
The inequalities in Theorems 31.8.22 and 31.9.18 are illustrated in Figure 31.9.3. 


For a fixed set S, the non-increasing boundary set Bdy(S) with respect to the strength of the topology may
be thought of as a process of nibbling away the boundary by the interior Int(S) and exterior Ext(S) as open
sets are gradually added to the topology. When the topology is weak, many points have “undecided status”.
That is, they are in neither the interior nor the exterior. (Figure 31.9.2 in Remark 31.9.14 illustrates the
“undecided status” of points in Bdy(S) which are in S or X \ S, but which are not allocated to either Int(S)
or Ext(S).)


As the topology is strengthened, more and more points are decided as either interior or exterior points. The
extreme cases, the trivial and discrete topologies, are illustrated in Figure 31.9.4, where almost all sets S
(namely all sets other than ∅ and X) in the trivial topology have Bdy(S) = X because no points in X are
“decided”, whereas all sets S in the discrete topology have Bdy(S) = ∅ because all points in X are “decided”.
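
The “nibbling away” of the boundary can be observed on a small example. The following Python fragment is a sketch under stated assumptions: the three-point set X and the chain of four topologies are hypothetical choices, and the boundary is computed as the set of points which lie in neither the interior nor the exterior, in line with Remark 31.9.12.

```python
from itertools import combinations

def interior(S, top):
    """Int_T(S): the union of all open sets contained in S."""
    return frozenset().union(*(G for G in top if G <= S))

def exterior(S, X, top):
    """Ext_T(S): the interior of the complement of S."""
    return interior(X - S, top)

def boundary(S, X, top):
    """Bdy_T(S): points in neither Int_T(S) nor Ext_T(S)."""
    return X - interior(S, top) - exterior(S, X, top)

X = frozenset({1, 2, 3})
E = frozenset()
power_set = {frozenset(c) for r in range(len(X) + 1)
             for c in combinations(sorted(X), r)}

# Hypothetical chain of topologies on X, ordered from weakest to strongest.
topologies = [
    {E, X},                                      # trivial topology
    {E, frozenset({1}), X},
    {E, frozenset({1}), frozenset({2, 3}), X},
    power_set,                                   # discrete topology
]

def exteriors_grow_boundaries_shrink(T1, T2):
    """Check both inclusions of Theorem 31.9.18 for every subset S of X."""
    return all(exterior(S, X, T1) <= exterior(S, X, T2)
               and boundary(S, X, T1) >= boundary(S, X, T2)
               for S in power_set)

results = [exteriors_grow_boundaries_shrink(T1, T2)
           for T1, T2 in zip(topologies, topologies[1:])]

# Boundary of S = {1} as the topology is strengthened.
S = frozenset({1})
boundaries_of_S = [boundary(S, X, T) for T in topologies]
```

In this example the boundary of {1} shrinks from X to {2, 3} and then to the empty set as open sets are added.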


31.9.20 THEOREM: Properties of discrete topologies. 
Let X be a discrete topological space. Then the following propositions are true. 


(i) ∀S ∈ P(X), S ∈ Top(X).

(ii) ∀S ∈ P(X), S is closed in X.

(iii) ∀S ∈ P(X), Int(S) = S.

(iv) ∀S ∈ P(X), Ext(S) = X \ S.

(v) ∀S ∈ P(X), Bdy(S) = ∅.

(vi) ∀S ∈ P(X), Clos(S) = S.


PROOF: By Definition 31.3.19 for a discrete topological space, Top(X) = P(X). This implies part (i).
Part (ii) follows from part (i) and Definition 31.4.1.

Part (iii) follows from part (i) and Theorem 31.8.14 (i).

Part (iv) follows from part (iii) and Theorem 31.9.10 (iv).

Part (v) follows from parts (iii) and (iv) and Theorem 31.9.10 (xiii).

Part (vi) follows from part (ii) and Theorem 31.8.14 (v). (Or from part (iv) and Theorem 31.9.10 (vi).)


31.9.21 THEOREM: Properties of trivial topologies. 
Let X be a trivial topological space. Then the following propositions are true. 
(i) ∀S ∈ P(X), (S ∈ Top(X) ⇔ (S = ∅ ∨ S = X)).
(ii) ∀S ∈ P(X), (S is closed in X ⇔ (S = ∅ ∨ S = X)).
(iii) ∀S ∈ P(X) \ {X}, Int(S) = ∅.
(iv) ∀S ∈ P(X) \ {∅}, Ext(S) = ∅.
(v) ∀S ∈ P(X) \ {∅, X}, Bdy(S) = X.
(vi) ∀S ∈ P(X) \ {∅}, Clos(S) = X.


PROOF: Part (i) is equivalent to Definition 31.3.18 for the trivial topology.


Part (ii) follows from part (i) and the definition of closed sets.

For part (iii), note that if S ∈ P(X) \ {X} and Ω ∈ Top(X) and Ω ⊆ S, then Ω = ∅.
Part (iv) follows from part (iii) and Theorem 31.9.10 (iv).

Part (v) follows from parts (iii) and (iv).

Part (vi) follows from part (iv) and Theorem 31.9.10 (vi).
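
The interior, exterior and boundary assertions of Theorems 31.9.20 and 31.9.21 can be confirmed exhaustively for a small set. The following Python fragment is a hedged sketch. The three-element set X and the function names are hypothetical, and the boundary is computed as the complement of the union of the interior and exterior.

```python
from itertools import combinations

def interior(S, top):
    """Int(S): the union of all open sets contained in S."""
    return frozenset().union(*(G for G in top if G <= S))

def exterior(S, X, top):
    """Ext(S): the interior of the complement of S."""
    return interior(X - S, top)

def boundary(S, X, top):
    """Bdy(S): points in neither the interior nor the exterior of S."""
    return X - interior(S, top) - exterior(S, X, top)

X = frozenset("abc")
E = frozenset()
power_set = {frozenset(c) for r in range(len(X) + 1)
             for c in combinations(sorted(X), r)}
discrete = power_set      # strongest topology on X
trivial = {E, X}          # weakest topology on X

# Discrete case: Int(S) = S, Ext(S) = X \ S and Bdy(S) is empty for all S.
discrete_ok = all(interior(S, discrete) == S
                  and exterior(S, X, discrete) == X - S
                  and boundary(S, X, discrete) == E
                  for S in power_set)

# Trivial case: the same three operators, with the exceptional sets excluded.
trivial_ok = all((S == X or interior(S, trivial) == E)
                 and (S == E or exterior(S, X, trivial) == E)
                 and (S in (E, X) or boundary(S, X, trivial) == X)
                 for S in power_set)
```

The exclusions S = X, S = ∅ and S ∈ {∅, X} in the trivial case correspond to the restricted quantifiers in Theorem 31.9.21.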


31.9.22 REMARK: Diagram illustrating trivial and discrete topologies. 
Theorems 31.9.20 and 31.9.21 are illustrated in Figure 31.9.4. 


[Diagram: the weakest topology on X is the trivial topology {∅, X}. The strongest topology on X is the
discrete topology P(X), for which Int(S) = S, Bdy(S) = ∅ and Ext(S) = X \ S.]


Figure 31.9.4 Interior/boundary/exterior of sets including trivial and discrete extremes


Note that for the trivial topology on a non-empty set X, the label Int(S) = ∅ applies only if S ≠ X; the
label Ext(S) = ∅ applies only if S ≠ ∅; the label Bdy(S) = X applies only if S ∉ {∅, X}; and the label
Clos(S) = X applies only if S ≠ ∅. (See Remark 31.9.19 for related comments.)


31.9.23 REMARK: Diagram illustrating the effect of topology strength on boundary "thickness". 

Figure 31.9.5 illustrates the way in which boundary thickness of a fixed set S in a topological space X 
decreases as the topology is strengthened. It is notable that both the weakest (i.e. trivial) topology and 
the strongest (i.e. discrete) topology contain no information. The extremes of topology strength add no 
information to the set S. 


[Diagram: a fixed set S drawn with a thick boundary for a weak topology and a thin boundary for a strong
topology. For the weakest topology {∅, X}, Bdy(S) = X. For the strongest topology P(X), Bdy(S) = ∅.]


Figure 31.9.5 The influence of topology strength on boundary thickness


31.10. Limit points and isolated points of sets


31.10.1 REMARK: Alternative terminology for limit points. 

Some authors use the term “accumulation point” for the limit points in Definition 31.10.2. (See for example
Ahlfors [45], page 53; Szekeres [305], page 260; Frankel [12], page 106; Yosida [167], page 3; EDM2 [113], 425.0,
page 1611.) The “limit point of a set” is easily confused with a “limit of a sequence”. (See Remark 35.4.12
for comments on the relations between these two concepts.) However, the term “accumulation point” is too
long, although it is very clear and precise, and not at all confusing.


31.10.2 DEFINITION: A limit point of a set S in a topological space X is a point x ∈ X which satisfies
∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅.


A limit point is also known as an accumulation point or a cluster point. 


The limit set of a set S in a topological space X is the set of limit points of S. 


31.10.3 THEOREM: The limit points of a set are also limit points of any superset. 
Let X be a topological space. Let S1, S2 ∈ P(X) satisfy S1 ⊆ S2.


(i) Let x be a limit point of S1. Then x is a limit point of S2.
(ii) Let A1, A2 be the limit sets of S1, S2 respectively. Then A1 ⊆ A2.


PROOF: For part (i), let S1, S2 be subsets of a topological space X such that S1 ⊆ S2. Let x be a limit point
of S1. Then ∀Ω ∈ Top_x(X), Ω ∩ (S1 \ {x}) ≠ ∅ by Definition 31.10.2. So ∀Ω ∈ Top_x(X), Ω ∩ (S2 \ {x}) ≠ ∅
because S1 \ {x} ⊆ S2 \ {x}. Hence x is a limit point of S2 by Definition 31.10.2.


Part (ii) follows from part (i) and Definition 31.10.2 for the limit set. 


31.10.4 THEOREM: A point is a limit point if and only if its singleton set is not a neighbourhood of it. 
A point a in a topological space X is a limit point of X if and only if {a} ∉ Top_a(X).


PROOF: Let X be a topological space. Let a ∈ X. Then for all Ω ∈ Top_a(X), Ω ∩ (X \ {a}) ≠ ∅ if and only
if Ω \ {a} ≠ ∅, which holds if and only if Ω ⊈ {a} by Theorem 8.2.5 (iv), which holds if and only if Ω ≠ ∅
and Ω ≠ {a}. But the case Ω = ∅ is excluded by the definition of Top_a(X). So a is a limit point of X if and
only if ∀Ω ∈ Top_a(X), Ω ∩ (X \ {a}) ≠ ∅, which holds if and only if ∀Ω ∈ Top_a(X), Ω ≠ {a}, which means
that {a} ∉ Top_a(X).


31.10.5 REMARK: Limit points are uninteresting for finite neighbourhood-systems. 

If a point x has only a finite number of neighbourhoods Ω ∈ Top_x(X), the point x can be a limit point only
if there is at least one point y distinct from x which is in the intersection ⋂ Top_x(X) of all neighbourhoods
of x. (See Figure 31.10.1.)


Figure 31.10.1 Limit point x of a set S (panels: not a limit point; limit point)


But then G = ⋂ Top_x(X) must be an element of Top_x(X) (i.e. an open neighbourhood of x) if the number
of neighbourhoods is finite. It would therefore follow that {x, y} ⊆ G ⊆ Ω for all non-empty Ω ∈ Top_x(X).
This would imply that the topology is extremely weak. Such a topology would not even have the extremely
weak T1 separation property in Definition 33.1.8. Therefore limit points are of real interest only when
there are infinitely many neighbourhoods at each point. (A topological space where each point has only a
finite number of neighbourhoods could be called a “first finite topological space” by analogy with the first
countable spaces in Definition 33.4.12. But such a worthless concept does not deserve to have a name.)


31.10.6 REMARK: Expression for the limit set of a set. 
The limit set of a set S in a topological space X may be written as 


{x ∈ X; ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅}.


31.10.7 THEOREM: Limit points are points which are not in the closure of their complement. 
Let S be a subset of a topological space X. Then a point x is a limit point of S if and only if
x ∈ Clos(S \ {x}).


PROOF: The proof follows from Definitions 31.10.2 and 31.8.2. Let x ∈ X. Then


x is a limit point of S ⇔ ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅
                       ⇔ ¬∃Ω ∈ Top_x(X), Ω ∩ (S \ {x}) = ∅
                       ⇔ ¬∃Ω ∈ Top_x(X), Ω ⊆ X \ (S \ {x})
                       ⇔ x ∉ Int(X \ (S \ {x}))
                       ⇔ x ∈ Clos(S \ {x}).


The last line follows from Theorem 31.8.16 (i). 


31.10.8 DEFINITION: An isolated point of a set S in a topological space X is a point x € S which is not a 
limit point of S. 


31.10.9 REMARK: Isolated points are points which are excluded by some neighbourhood of their complement. 
From Definition 31.10.2, it is clear that a point x in a topological space X is an isolated point of a subset S
of X if and only if x ∈ S and ∃Ω ∈ Top_x(X), Ω ∩ (S \ {x}) = ∅.


31.10.10 THEOREM: A point is an isolated point if and only if its singleton set is a neighbourhood of it. 
A point a in a topological space X is an isolated point of X if and only if {a} ∈ Top_a(X).


PROOF: The assertion follows immediately from Theorem 31.10.4 and Definition 31.10.8. 
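
Definitions 31.10.2 and 31.10.8 translate directly into a computation on finite topological spaces. The following Python fragment is an illustrative sketch only. The set X, the topology top and the function names are hypothetical. It also checks the singleton criteria of Theorems 31.10.4 and 31.10.10 for this one example.

```python
def neighbourhoods(x, top):
    """Top_x(X): the open sets which contain the point x."""
    return [G for G in top if x in G]

def limit_points(S, X, top):
    """lim(S): points x for which every open neighbourhood meets S minus {x}."""
    return frozenset(x for x in X
                     if all(G & (S - {x}) for G in neighbourhoods(x, top)))

def isolated_points(S, X, top):
    """iso(S): points of S which are not limit points of S."""
    return S - limit_points(S, X, top)

# Hypothetical example: a small topology on X = {1, 2, 3, 4}.
X = frozenset({1, 2, 3, 4})
top = {frozenset(), frozenset({1}), frozenset({1, 2}),
       frozenset({3, 4}), frozenset({1, 3, 4}), X}

lim_X = limit_points(X, X, top)
iso_X = isolated_points(X, X, top)

# A point a is a limit point of X if and only if {a} is not open.
singleton_criterion = all(
    (a in lim_X) == (frozenset({a}) not in top) for a in X)

S = frozenset({1, 3})
lim_S = limit_points(S, X, top)
iso_S = isolated_points(S, X, top)
```

In this example the only open singleton is {1}, so 1 is the only isolated point of X. For S = {1, 3}, both elements of S are isolated points of S, while the limit points 2 and 4 lie outside S.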


31.10.11 REMARK: Theorem and diagram for properties of limit points and isolated points. 

Some relations between the closure of a set and its limit points and isolated points are summarised in 
Theorem 31.10.12 for convenient reference. These relations are illustrated in Figure 31.10.2 for the case that
X has no isolated points. Any isolated points in the interior of a set S are necessarily also isolated points 
of X. So S cannot have any isolated interior points if X has no isolated points.


Figure 31.10.2 Limit points, isolated points and set interior and boundary if iso(X) = ∅


31.10.12 THEOREM: Properties of limit points and isolated points.
For subsets S of a topological space X, let lim(S) denote the set of limit points of S in X, and let iso(S) 
denote the set of isolated points of S in X. 


(i) iso(S) = S \ lim(S).

(ii) lim(S) ∩ iso(S) = ∅.

(iii) lim(S) \ S = Clos(S) \ S.

(iv) lim(S) ∪ iso(S) = Clos(S).

(v) lim(S) ∪ iso(S) = lim(S) ∪ S.

(vi) S is closed if and only if lim(S) ⊆ S.

(vii) Clos(S) = S ∪ lim(S).

(viii) Int(S) ⊆ lim(S) if iso(X) = ∅.

(ix) iso(S) ∩ Int(S) = ∅ if iso(X) = ∅.

(x) iso(S) ⊆ Bdy(S) ∩ S if iso(X) = ∅.


PROOF: Part (i) is a restatement of Definition 31.10.8.
Part (ii) follows from part (i).


To show part (iii), note that by Definition 31.10.2 and Theorem 31.8.17 (ii),

∀x ∈ X, x ∈ lim(S) \ S ⇔ (∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅) ∧ (x ∉ S)
                       ⇔ (∀Ω ∈ Top_x(X), Ω ∩ S ≠ ∅) ∧ (x ∉ S)
                       ⇔ x ∈ Clos(S) ∧ x ∉ S
                       ⇔ x ∈ Clos(S) \ S.

To show part (iv), note that lim(S) ∪ iso(S) = lim(S) ∪ (S \ lim(S)) = lim(S) ∪ S = (lim(S) \ S) ∪ S by
part (i). So by part (iii), lim(S) ∪ iso(S) = (Clos(S) \ S) ∪ S = Clos(S).
For part (v), note that by part (i), lim(S) ∪ iso(S) = lim(S) ∪ (S \ lim(S)) = lim(S) ∪ S.
For part (vi), note that S is closed if and only if Clos(S) ⊆ S by Theorem 31.8.14 (vi), which holds if and
only if Clos(S) \ S = ∅, if and only if lim(S) \ S = ∅ by part (iii), if and only if lim(S) ⊆ S.
Part (vii) follows from parts (iv) and (i).
For part (viii), suppose that iso(X) = ∅. Let S ∈ P(X) and x ∈ Int(S). Then Ω1 ⊆ S for some Ω1 ∈
Top_x(X). Let Ω ∈ Top_x(X). Then Ω1 ∩ Ω ∈ Top_x(X). So (Ω1 ∩ Ω) \ {x} ≠ ∅ because iso(X) = ∅. But
Ω1 ∩ Ω ⊆ S because Ω1 ⊆ S. So Ω1 ∩ Ω = Ω1 ∩ Ω ∩ S. So (Ω1 ∩ Ω ∩ S) \ {x} ≠ ∅. So (Ω ∩ S) \ {x} ≠ ∅ because
Ω ⊇ Ω1 ∩ Ω. Therefore ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅. Thus x ∈ lim(S).
For part (ix), suppose that iso(X) = ∅. Then Int(S) ⊆ lim(S) by part (viii). But iso(S) ∩ lim(S) = ∅ by
part (ii). So iso(S) ∩ Int(S) = ∅.
For part (x), suppose that iso(X) = ∅. Then iso(S) ∩ Int(S) = ∅ by part (ix). So iso(S) ⊆ S \ Int(S). But
S \ Int(S) = S ∩ Bdy(S) by Theorem 31.9.10 (xxii). Hence iso(S) ⊆ S ∩ Bdy(S).
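
The relations in Theorem 31.10.12 can be tested exhaustively on small finite spaces. The following Python fragment is a sketch, not a proof. The two example topologies and all function names are hypothetical, the boundary is assumed to be the closure minus the interior, and only the second topology has no isolated points, so parts (viii) and (x) are tested only for that one.

```python
from itertools import combinations

def subsets(X):
    """Every subset of the finite set X, as a list of frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def interior(S, top):
    return frozenset().union(*(G for G in top if G <= S))

def closure(S, X, top):
    result = X
    for G in top:
        if S <= X - G:
            result = result & (X - G)
    return result

def limit_points(S, X, top):
    return frozenset(x for x in X
                     if all(G & (S - {x}) for G in top if x in G))

def general_parts_hold(X, top):
    """Parts (ii), (iii), (iv), (vi), (vii) for every subset S of X."""
    for S in subsets(X):
        lim = limit_points(S, X, top)
        iso = S - lim                            # part (i) as the definition
        clos = closure(S, X, top)
        is_closed = (X - S) in top
        if not (not (lim & iso)                  # part (ii)
                and (lim - S) == (clos - S)      # part (iii)
                and (lim | iso) == clos          # part (iv)
                and is_closed == (lim <= S)      # part (vi)
                and clos == (S | lim)):          # part (vii)
            return False
    return True

def no_isolated_parts_hold(X, top):
    """Parts (viii) and (x), which assume that iso(X) is empty."""
    for S in subsets(X):
        lim = limit_points(S, X, top)
        bdy = closure(S, X, top) - interior(S, top)
        if not (interior(S, top) <= lim and (S - lim) <= (bdy & S)):
            return False
    return True

X = frozenset({1, 2, 3, 4})
top_a = {frozenset(), frozenset({1}), frozenset({1, 2}),
         frozenset({3, 4}), frozenset({1, 3, 4}), X}     # 1 is isolated in X
top_b = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}  # iso(X) empty
```

A finite check of this kind can only detect errors in a statement. It cannot replace the proof above.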


31.10.13 REMARK: Diagram for properties of limit points and isolated points in a general topological space. 
Figure 31.10.3 illustrates the relations in Theorem 31.10.12 between limit points, isolated points and set 
interior and boundary in the general case where the set of isolated points iso(X) is not necessarily empty.


31.10.14 REMARK: Alternative definition for a closed set. 
Theorem 31.10.12 (vi) is sometimes given as the definition of a closed set. Then Definition 31.4.1 is given as 
a property of closed sets and Theorem 31.10.12 (vii) becomes the definition of the closure of a set. 


31.10.15 REMARK: A limit point is not necessarily an ∞-limit point.

It seems intuitively clear that if every neighbourhood of a point x contains at least one element of a set S
distinct from x, then shrinking the neighbourhoods in an infinite sequence will guarantee the existence of
infinitely many distinct elements of S in every neighbourhood of x, since if the number were only finite, one
could exclude them all by sufficiently shrinking a neighbourhood. This intuition is almost correct, but not
quite. The proof that every ∞-limit point is a limit point in Theorem 31.10.18 is trivial, but the converse
implication in Theorem 33.1.22 requires the space to have the T1 separation property.


Figure 31.10.3 Relations between limit points, isolated points and set interior and boundary


31.10.16 REMARK: Terminology: ∞-limit point versus ω-limit point.

The name “ω-limit point” could be confusing. Each neighbourhood of an ∞-limit point is required to include
an infinite subset of points of the given set, not necessarily an ω-infinite subset. (See Definition 13.7.6 for
ω-infinite sets.) Unfortunately, it is the name “ω-limit point” which is most often seen in the literature for
the concept in Definition 31.10.17, or the alternatives “ω-accumulation point” or “ω-cluster point”.


31.10.17 DEFINITION: An ∞-limit point of a set S in a topological space X is a point x ∈ X which satisfies
∀Ω ∈ Top_x(X), #(Ω ∩ (S \ {x})) = ∞.


An ∞-limit point is also known as an ∞-accumulation point or an ∞-cluster point.


The ∞-limit set of a set S in a topological space X is the set of ∞-limit points of S.


31.10.18 THEOREM: All ∞-limit points are limit points.
Let X be a topological space, S ⊆ X and x ∈ X. If x is an ∞-limit point of S, then x is a limit point of S.


PROOF: Let X be a topological space with S ⊆ X. Let x ∈ X be an ∞-limit point of S. Let Ω ∈ Top_x(X).
Then #(Ω ∩ (S \ {x})) = ∞ by Definition 31.10.17. So Ω ∩ (S \ {x}) ≠ ∅. Thus ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅.
Hence x is a limit point of S by Definition 31.10.2.


31.10.19 THEOREM: All ∞-limit points of a set are ∞-limit points of all supersets.
Let X be a topological space. Let S1, S2 ∈ P(X) satisfy S1 ⊆ S2.

(i) Let x be an ∞-limit point of S1. Then x is an ∞-limit point of S2.

(ii) Let A1, A2 be the ∞-limit sets of S1, S2 respectively. Then A1 ⊆ A2.


PROOF: For part (i), let S1, S2 be subsets of a topological space X such that S1 ⊆ S2. Let x be an
∞-limit point of S1. Then ∀Ω ∈ Top_x(X), #(Ω ∩ (S1 \ {x})) = ∞ by Definition 31.10.17. Therefore
∀Ω ∈ Top_x(X), #(Ω ∩ (S2 \ {x})) = ∞. Hence x is an ∞-limit point of S2 by Definition 31.10.17.


Part (ii) follows from part (i) and Definition 31.10.17 for the ∞-limit set.


31.11. Some simple topologies on infinite sets 


31.11.1 REMARK: Stronger topologies separate sets more effectively. 

The task of a topology is to separate sets or points from each other. (Elementary topology introductions 
create the opposite impression, namely that a topology joins sets together.) A stronger topology is better at 
separating sets or points. A weaker topology has less ability to separate sets or points. (See Definition 31.3.23 
for weaker and stronger topologies. See Sections 33.1 and 33.3 for separation classes.) 


When a set is finite, the ability to separate points from each other implies that all singletons are open sets, 
which implies that the topology is the discrete topology. (See Theorem 33.1.17.) When a set is infinite, 
the ability to separate all pairs of singletons does not result in the topology being discrete. This fact is 
demonstrated in Example 31.11.2.


31.11.2 EXAMPLE: A topology with pairwise point separation, but some points not isolated.

Figure 31.11.1 illustrates a topology with imperfect separation on a countably infinite set X = Z₀⁺. In other
words, the topology is not the discrete topology. Even though the point x = 0 is separated from any other
point y ∈ X by a set Ω_y ∈ Top(X), it is not possible to construct {x} as the intersection of a finite set of
open sets Ω_y. So x is not an isolated point.


Figure 31.11.1 illustrates the set T = {∅} ∪ {Ω_y; y ∈ Z₀⁺} where Ω_y = {i ∈ Z₀⁺; i = 0 ∨ i > y} for y ∈ Z₀⁺.


Figure 31.11.1 Topology with poor separation on a countably infinite set


It is easily verified that the range of any non-increasing sequence of subsets of P(X) (with respect to the 
set-inclusion partial order) is closed under finite intersection and arbitrary union. So if Ø and X are added, 
the result is a valid topology. It follows that T is a valid topology on X. 


Although x = 0 is separated from y by the open set Ω_y for all y ∈ Z⁺ (because 0 ∈ Ω_y and y ∉ Ω_y), the
set {0} is clearly not equal to the intersection of any finite number of sets Ω_y. So {0} ∉ T although 0 is
separated from all individual elements of X.


The topology T may be extended to include all sets {y} for y ∈ Z⁺. Let T′ = {G1 ∪ G2; G1 ∈ T and G2 ∈
P(Z⁺)}. (This is the same as T′ = P(Z⁺) ∪ {Ω ∈ P(Z₀⁺); #(Z⁺ \ Ω) < ∞}.) Then it is (fairly) clear that
T′ is also a valid topology on X. This larger topology completely separates all points x ∈ Z₀⁺ from other
elements y ∈ X \ {x}. The set {x} is in T′ for all x ∈ Z⁺, but {0} ∉ T′.


31.11.3 REMARK: Translation-invariant topologies on the integers containing non-empty finite sets. 
There is now the question of whether infinite sets such as Z have interesting topologies which are invariant 
under various groups of permutations of the point set. 


Suppose a topology T on Z is invariant under all translations of Z and contains at least one non-empty
finite set. (See Definition 24.2.4 and Notation 24.2.5 for translates of sets.) Then T = P(Z). To show this,
let Ω ∈ T be a non-empty finite set. Let d = max(Ω) − min(Ω). Then Ω ∩ (Ω + d) = {max(Ω)} contains
exactly one element of Z, where Ω + d denotes the translate of Ω by a distance d. This singleton is in T
because Ω + d ∈ T by translation invariance and T is closed under finite intersections. It follows, again by
translation invariance, that {x} ∈ T for all x ∈ Z. From this, every subset of Z can be constructed as a
union of singleton sets. (See Definition 32.5.3 for the standard topology for the set of integers.)
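
The translation argument can be tried out on a concrete finite set of integers. The following Python fragment is a minimal sketch. The particular set Q (standing for Ω) and the function names are hypothetical.

```python
def translate(Q, d):
    """The translate Q + d of a set of integers Q by the distance d."""
    return {q + d for q in Q}

Q = {-3, 0, 4, 9}                  # hypothetical non-empty finite open set
d = max(Q) - min(Q)

# Every element of Q + d is at least max(Q), and every element of Q is at
# most max(Q), so the two sets can only share the single point max(Q).
overlap = Q & translate(Q, d)

def singleton(x):
    """Recover {x} by translating the one-point set `overlap`."""
    return translate(overlap, x - max(Q))
```

Here d = 12 and the overlap is {9}, so each singleton {x} is obtained as the translate of {9} by x − 9.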


31.11.4 REMARK: Translation-invariant topologies on the integers containing no non-empty finite sets. 
Next one naturally asks whether there are non-trivial, non-discrete topologies on the integers which contain 
no non-empty finite sets. 


The set T_k of all subsets of Z with period k ∈ Z⁺ is a translation-invariant topology on Z. These topologies
are simply infinite copies of the discrete topology on N_k. The case k = 1 is the trivial topology. A more
interesting class of topologies is given in Theorem 31.11.6. These complement-of-finite-set topologies are
well defined on arbitrary sets. (There is a nuance here in regard to the condition “#(X \ Ω) < ∞”. This
means that there exists a bijection from a finite ordinal number to X \ Ω. But the word “exists” raises
questions of constructibility in some situations.)


31.11.5 REMARK: Topology consisting of complements of finite subsets of the integers. 
The topologies in Remarks 31.11.3 and 31.11.4 do not exhaust all of the possibilities for translation-invariant
topologies on Z. By Theorem 31.11.6, a topology can be defined on X = Z by


Top(X) = {∅} ∪ {Ω ∈ P(X); #(X \ Ω) < ∞}. (31.11.1)

This topology is invariant under all permutations of the point set Z. The set construction defined by (31.11.1)
is a valid topology for any set X, and it is always permutation-invariant. It might not be very useful in direct
applications, but it is certainly useful for counterexamples, as in Remark 33.1.27 and Example 35.3.15.


31.11.6 THEOREM: Topology consisting of complements of finite subsets of an arbitrary set.
For any set X, let T = {∅} ∪ {Ω ∈ P(X); #(X \ Ω) < ∞}. Then T is a topology on X.


PROOF: Let T = {∅} ∪ {Ω ∈ P(X); #(X \ Ω) < ∞} for a set X. Then clearly ∅ ∈ T and X ∈ T.
Let Ω1, Ω2 ∈ T. If Ω1 = ∅ or Ω2 = ∅, then Ω1 ∩ Ω2 = ∅ ∈ T. So suppose that Ω1 ≠ ∅ and Ω2 ≠ ∅. Then
#(X \ Ω1) < ∞ and #(X \ Ω2) < ∞. But X \ (Ω1 ∩ Ω2) = (X \ Ω1) ∪ (X \ Ω2) by De Morgan’s law
(Theorem 8.2.3). So #(X \ (Ω1 ∩ Ω2)) ≤ #(X \ Ω1) + #(X \ Ω2) < ∞. So Ω1 ∩ Ω2 ∈ T.

Now suppose that C ∈ P(T). Let Ω_C = ⋃C. If every element of C is empty (in particular if C = ∅), then
Ω_C = ∅ ∈ T. So suppose that C contains at least one non-empty set. Then
X \ Ω_C = ⋂{X \ Ω; Ω ∈ C} by the generalised De Morgan’s law (Theorem 8.4.12). So #(X \ Ω_C) ≤
min{#(X \ Ω); Ω ∈ C} < ∞ because #(X \ Ω) < ∞ for at least one Ω ∈ C. So Ω_C ∈ T. Hence T is a
topology on X by Definition 31.3.2.


31.11.7 DEFINITION: The finite-complement topology on a set X is the set 


T = {∅} ∪ {Ω ∈ P(X); #(X \ Ω) < ∞}. (31.11.2)


A finite-complement topological space is a topological space (X, T) such that T is the finite-complement 
topology on X. 


31.11.8 REMARK: The finite-complement topology is the minimal “closed-singleton topology”. 

The finite-complement topology in Definition 31.11.7 could be thought of as the “minimal closed-singleton
topology” because it is the smallest topology on a set X for which singletons {x} are closed sets for all x ∈ X.
(This property of a topology is defined as the T1 separation property in Section 33.1. So a slightly less clumsy
name for this concept would be the “minimal T1 topology”.)


31.11.9 THEOREM: The finite-complement topology on a set is a topology. 
The finite-complement topology on a set X is a topology on X. 


PROOF: Let X be a set. Define T as in equation (31.11.2). Then clearly T ⊆ P(X) and {∅, X} ⊆ T. So
Definition 31.3.2 (i) is satisfied.


Let Ω1, Ω2 ∈ T. If Ω1 = ∅ or Ω2 = ∅, then Ω1 ∩ Ω2 = ∅ ∈ T. So suppose that Ω1 ≠ ∅ and Ω2 ≠ ∅. Then
Ω1, Ω2 ∈ P(X), and #(X \ Ω1) < ∞ and #(X \ Ω2) < ∞. But X \ (Ω1 ∩ Ω2) = (X \ Ω1) ∪ (X \ Ω2) by
Theorem 8.2.3. So #(X \ (Ω1 ∩ Ω2)) < ∞ and therefore Ω1 ∩ Ω2 ∈ T, which satisfies Definition 31.3.2 (ii).

Let C ⊆ T and let Ω = ⋃C. If Ω = ∅ then Ω ∈ T. So suppose that Ω ≠ ∅. Then ∃G ∈ C, G ≠ ∅. Therefore
∃G ∈ C, (G ⊆ X ∧ #(X \ G) < ∞). But by Theorem 8.4.8 (xiv), G ∈ C implies that G ⊆ ⋃C. Therefore
∃G ∈ C, (G ⊆ Ω ∧ #(X \ G) < ∞). Hence #(X \ Ω) ≤ #(X \ G) < ∞. So ⋃C ∈ T, which satisfies
Definition 31.3.2 (iii).


31.11.10 THEOREM: The finite-complement topology is the smallest topology with all singletons closed. 
The finite-complement topology on a set X is the smallest topology on X for which {x} is a closed set for
all x ∈ X.


PROOF: To show that {x} is closed for all x ∈ X for the finite-complement topology on the set X, let
Ω = X \ {x} and note that #(X \ Ω) = 1 < ∞.


Let T be the finite-complement topology on a set X. To show that T ⊆ T′ for all topologies T′ on X such
that {x} is a closed subset of X with respect to T′ for all x ∈ X, let T′ be such a topology. Let Ω ∈ T.
If Ω = ∅, then Ω ∈ T′ because T′ is a topology. So let Ω be a subset of X such that #(X \ Ω) < ∞.
Let C = {X \ {x}; x ∈ X \ Ω}. Then #(C) < ∞ and C ⊆ T′ because every singleton {x} is closed
with respect to T′. If #(C) = 0 then Ω = X, and so Ω ∈ T′ because T′ is a topology on X. So assume
that 1 ≤ #(C) < ∞. Then ⋂C ∈ T′ because T′ is a topology. Thus Ω = ⋂C ∈ T′. Hence T ⊆ T′. That
is, T is smaller than (or equal to) every topology on X for which {x} is a closed set for all x ∈ X.
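
The De Morgan steps in the proofs of Theorems 31.11.6 and 31.11.10 suggest a simple way to compute with finite-complement open sets on an infinite set such as Z: store each non-empty open set by its finite complement. The following Python fragment is a sketch under that assumption. The representation, the marker EMPTY and the function names are all hypothetical, and the set X is assumed to be infinite so that each open set has exactly one representation.

```python
from functools import reduce

EMPTY = None   # the empty open set; any other open set Ω is stored as X \ Ω

def meet(a, b):
    """Intersection of two open sets: the finite complements are united."""
    if a is EMPTY or b is EMPTY:
        return EMPTY
    return a | b

def join(opens):
    """Union of open sets: complements of the non-empty ones are intersected."""
    complements = [c for c in opens if c is not EMPTY]
    if not complements:
        return EMPTY            # a union of empty sets (or of no sets) is empty
    return reduce(lambda c1, c2: c1 & c2, complements)

def contains(a, x):
    """True if the point x is an element of the open set represented by a."""
    return a is not EMPTY and x not in a

omega_1 = frozenset({1, 2})     # represents Z \ {1, 2}
omega_2 = frozenset({2, 3})     # represents Z \ {2, 3}

# Theorem 31.11.10: Ω is the finite intersection of the sets X \ {x} for
# x in X \ Ω. The empty frozenset represents X itself.
punctured = [frozenset({x}) for x in omega_1]
rebuilt = reduce(meet, punctured, frozenset())
```

With this representation, meet(omega_1, omega_2) represents Z \ {1, 2, 3} and join([omega_1, omega_2]) represents Z \ {2}.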


31.11.11 REMARK: Properties of finite-complement topologies. 
Let X be a topological space with the finite-complement topology. If X is a finite set, then Top(X) = P(X).
In other words, the topology is the same as the discrete topology in Definition 31.3.19. So for any subset S
of X, the interior, boundary, exterior and closure are as follows.

    Int(S)    Bdy(S)    Ext(S)    Clos(S)
    S         ∅         X \ S     S


If the set X is infinite, the interior, boundary, exterior and closure are as follows.


    cardinality of S              Int(S)    Bdy(S)    Ext(S)    Clos(S)
    #(S) < ∞                      ∅         S         X \ S     S
    #(S) = ∞, #(X \ S) = ∞        ∅         X         ∅         X
    #(X \ S) < ∞                  S         X \ S     ∅         X


These tables suggest that the finite-complement topologies are rather uninteresting. When X is finite, there
are no boundary points for any set, and the interior and exterior are simply S and X \ S for any set S ∈ P(X).
So the topology gives no information about a set S other than its set of elements.


When X is infinite, any set S for which #(S) = ∞ and #(X \ S) = ∞ has Bdy(S) = X. In other words,
such sets have no interior and no exterior. So the topology says very little of interest about sets S, other
than whether they (and their complements) are finite or infinite.


It can be hoped that this simple class of topologies may be of some use in providing pathological examples 
to disprove false conjectures. (See for example Remark 33.4.18 and Theorem 33.4.19.) In particular, various 
trivial classes of topologies show that Definition 31.3.2 for a general topology is perhaps overly broad. This 
motivates the introduction of the classes of topologies in Chapter 33, which add various sets of extra axioms 
to topologies to make them more likely to be useful. 


Sections 32.1, 32.3 and 32.4 introduce methods of generating topologies which are more interesting than 
the finite-complement topologies because the interior, boundary and exterior of sets can be made to contain 
much more information when topologies are built up in more sophisticated ways. 


31.11.12 REMARK: Particular-point topological spaces. 
Particular-point topological spaces provide useful counterexamples for some compactness concepts. If X has
the particular-point topology for a point p ∈ X, then Top(X) = {∅} ∪ {Ω ∈ P(X); p ∈ Ω}.


31.11.13 DEFINITION: The particular-point topology on a set X, for a particular point p € X, is the set 


{∅} ∪ {Ω ∈ P(X); p ∈ Ω} = {∅} ∪ (P(X) \ P(X \ {p})).


31.11.14 THEOREM: Particular-point topologies are valid topologies. 
The particular-point topology on a set, for a given point in that set, is a valid topology on the set. 


PROOF: Let T be the particular-point topology on a set X for a point p ∈ X. Then {∅, X} ⊆ T, and T is
clearly closed under unions and intersections. So T is a topology on X by Definition 31.3.2.
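
For a finite set, the claim of Theorem 31.11.14 and the two forms of the set in Definition 31.11.13 can be confirmed by enumeration. The following Python fragment is a hedged sketch. The four-element set X, the point p = 0 and the function names are hypothetical.

```python
from itertools import combinations

def power_set(X):
    """P(X) for a finite set X, as a set of frozensets."""
    items = sorted(X)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def particular_point_topology(X, p):
    """The empty set together with every subset of X which contains p."""
    return {frozenset()} | {G for G in power_set(X) if p in G}

def is_topology(top, X):
    """Brute-force check of the open-set axioms on a finite set X."""
    if frozenset() not in top or X not in top:
        return False
    # For a finite collection, closure under pairwise unions and pairwise
    # intersections implies closure under all unions and finite intersections.
    return all((A | B) in top and (A & B) in top for A in top for B in top)

X = frozenset({0, 1, 2, 3})
p = 0
T = particular_point_topology(X, p)

# Second form in Definition 31.11.13: remove the subsets which avoid p.
T_alt = {frozenset()} | (power_set(X) - power_set(X - {p}))
```

Here T has 1 + 2³ = 9 elements, namely the empty set and the eight subsets of X which contain the point 0.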


31.12. Continuous functions 


31.12.1 REMARK: The fundamental importance of continuity in topology. 

One might think that the subject of topology should be initially concerned only with topological spaces, and 
then only secondarily concerned with the qualities and properties of functions between such spaces. However, 
the continuity of functions is so fundamental to topology that it would be quite difficult to say much about 
topological spaces without it. (In fact, continuity was invented before topology. See Remark 31.1.1.) 

In Chapters 32, 33 and 34, applications of continuity are almost ubiquitous. Consequently the most basic
definitions for continuity are introduced “early” in Sections 31.12, 31.13 and 31.14 to “bootstrap” the subject
of topology. (Higher-level aspects of continuity are presented in Sections 35.1, 35.2 and 35.3.)


31.12.2 REMARK: History of the continuity of functions. 

According to Bynum et alii [238], page 14, continuity of functions was first defined by Cauchy in 1821. (See
Cauchy [206], pages 34-37.) Boyer/Merzbach [237], page 456, discussing definitions of limits in Cauchy's
1821 textbook “Cours d'analyse de l'Ecole Polytechnique” and his later textbooks, stated: “Cauchy also
gave a satisfactory definition of a continuous function.” More detail is given by Cajori [241], page 419 as
follows. (The source for his free-style Cauchy translation is Cauchy [206], pages 34-35.)


J. Fourier’s declaration that any given arbitrary function can be represented by a trigonometric series 
led Cauchy to a new formulation of the concepts “continuous,” “limiting value” and “functions.” In 
his Cours d’Analyse, 1821, he says: “The function f(x) is continuous between two given limits, if for 
each value of x that lies between these limits, the numerical value of the difference f(x +a) — f(x) 
diminishes with a in such a way as to become less than every finite number." 


Such precision in the definition of continuity only became necessary when the concept of a function was 
liberated from the earlier “analytic” style of function which used a formula of some kind. Therefore such 
a precise definition of continuity could not appear before functions had been redefined to have the modern 
meaning, which permits discontinuities. 


Reading Cajori's above quotation from Cauchy, one could easily imagine that Cauchy was describing uniform 
continuity here. However, the comments and examples given by Cauchy [206], pages 34-37, leave little doubt 
that he was defining “continuity” to mean pointwise continuity at all points of the domain of a function. 
(See Theorem 35.3.3 for the equivalence of pointwise continuity to the continuity in Definition 31.12.4.) 


In 1821, Cauchy used the symbol $\alpha$ for small variations of $x$ in expressions such as $f(x + \alpha) - f(x)$, and used
$\varepsilon$ for a small bound on the variation of a function value. (See for example Cauchy [206], page 49. See also
Cauchy [207], page 27, for his use of the specific symbols $\varepsilon$ and $\delta$ for the analysis of derivatives in 1823.)


31.12.3 REMARK: The abstract open-set definition of continuity of functions. 

In Section 31.12, continuity of functions is considered from the global point of view. In other words, a 
function is defined to be globally continuous or not continuous. In Section 35.3, continuity is looked at from 
the point-wise perspective, which means that a function may be either continuous or not continuous at any 
individual point of its domain, more or less independent of the rest of the domain. 


Definition 31.12.4 is the standard modern definition for continuous functions in general topological spaces. 


31.12.4 DEFINITION: A continuous function from a topological space X to a topological space Y is a 
function $f : X \to Y$ such that $\forall \Omega \in \mathrm{Top}(Y),\ f^{-1}(\Omega) \in \mathrm{Top}(X)$.
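
For finite topological spaces, Definition 31.12.4 can be tested mechanically. The following Python sketch is only an illustration and is not part of the formal development. The helper names `preimage` and `is_continuous` and the example spaces are assumptions chosen for this sketch. A topology is represented as a set of frozensets, and a function as a dictionary.

```python
def preimage(f, subset):
    """Return the inverse image of `subset` under the function f (a dict)."""
    return frozenset(x for x in f if f[x] in subset)


def is_continuous(f, top_X, top_Y):
    """Definition 31.12.4: the inverse image of every open set of Y is open in X."""
    return all(preimage(f, omega) in top_X for omega in top_Y)


# Hypothetical example spaces: X = {1, 2, 3} and Y = {"a", "b"}.
X = frozenset({1, 2, 3})
Y = frozenset({"a", "b"})
top_X = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
top_Y = {frozenset(), frozenset({"a"}), Y}

f = {1: "a", 2: "b", 3: "b"}  # inverse image of {"a"} is {1}, which is open
g = {1: "b", 2: "a", 3: "b"}  # inverse image of {"a"} is {2}, which is not open

print(is_continuous(f, top_X, top_Y))  # True
print(is_continuous(g, top_X, top_Y))  # False
```

The test quantifies over all open sets of the target space, exactly as in the definition, so it is practical only for small finite examples.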


31.12.5 REMARK: Comparison of open-set-based continuity with interval-based real-function continuity. 

Continuity of real-valued functions of one or more real variables may be defined without topology. For such 
functions, continuity may be defined using only real-number intervals, which need only total order structure 
for the real numbers. The open sets $\Omega$ in Definition 31.12.4 may be thought of as generalisations of intervals.


31.12.6 THEOREM: A function is continuous if and only if the inverse images of closed sets are closed. 
Let X and Y be topological spaces. Then $f : X \to Y$ is continuous if and only if
$\forall K \in \overline{\mathrm{Top}}(Y),\ f^{-1}(K) \in \overline{\mathrm{Top}}(X)$.


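PROOF: Let X and Y be topological spaces. Then for $f : X \to Y$,

$$\begin{aligned}
f \text{ is continuous} &\iff \forall \Omega \in \mathrm{Top}(Y),\ f^{-1}(\Omega) \in \mathrm{Top}(X) \\
&\iff \forall K \in \overline{\mathrm{Top}}(Y),\ f^{-1}(Y \setminus K) \in \mathrm{Top}(X) \\
&\iff \forall K \in \overline{\mathrm{Top}}(Y),\ X \setminus f^{-1}(Y \setminus K) \in \overline{\mathrm{Top}}(X) \\
&\iff \forall K \in \overline{\mathrm{Top}}(Y),\ f^{-1}(K) \in \overline{\mathrm{Top}}(X). \qquad (31.12.1)
\end{aligned}$$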


Line (31.12.1) follows from Theorem 10.6.10 (v^). 


31.12.7 THEOREM: The composite of two continuous functions is continuous. 
Let X, Y and Z be topological spaces. Let $f : X \to Y$ and $g : Y \to Z$ be continuous. Then $g \circ f : X \to Z$
is continuous.


PROOF: Let $f : X \to Y$ and $g : Y \to Z$ be continuous for topological spaces X, Y and Z. Let $\Omega \in \mathrm{Top}(Z)$.
Then $g^{-1}(\Omega) \in \mathrm{Top}(Y)$ by the continuity of g. So $(g \circ f)^{-1}(\Omega) = f^{-1}(g^{-1}(\Omega)) \in \mathrm{Top}(X)$ by the continuity
of f for all $\Omega \in \mathrm{Top}(Z)$. Hence $g \circ f : X \to Z$ is continuous.


31.12.8 REMARK: Existence of continuous functions in particular cases.

It follows from Theorem 31.12.9 that there is at least one continuous function $f : X \to Y$ for any topological
spaces X and Y if $Y \ne \emptyset$.

If $X = \emptyset$, then $Y^X = \{\emptyset\}$ by Theorem 10.2.26 (ii), and the empty function $f = \emptyset$ is a continuous function
from X to Y, even if $Y = \emptyset$. (The empty function also happens to be constant!)

If $X \ne \emptyset$ and $Y = \emptyset$, then $Y^X = \emptyset$ by Theorem 10.2.26 (i), and therefore there are no continuous functions
from X to Y (because there are no functions at all). However, Theorem 31.12.9 is still valid when $X \ne \emptyset$
and $Y = \emptyset$ because there are then no constant functions which are not continuous.


31.12.9 THEOREM: All constant functions are continuous. 
All constant functions are continuous. 


PROOF: Let $f : X \to Y$ be constant, for topological spaces X and Y. Then $\exists a \in Y,\ \forall x \in X,\ f(x) = a$.
Let $a \in Y$ satisfy $\forall x \in X,\ f(x) = a$. Let $\Omega \in \mathrm{Top}(Y)$. Then either $a \in \Omega$ or $a \notin \Omega$. So either $f^{-1}(\Omega) = X$
or $f^{-1}(\Omega) = \emptyset$. In either case, $f^{-1}(\Omega) \in \mathrm{Top}(X)$ by Definition 31.3.2 (i).


31.12.10 THEOREM: All functions with discrete domain or trivial target space are continuous. 
Let X and Y be topological spaces. 


(i) If X has the discrete topology $\mathrm{Top}(X) = \mathbb{P}(X)$, then all functions from X to Y are continuous.
(ii) If Y has the trivial topology $\mathrm{Top}(Y) = \{\emptyset, Y\}$, then all functions from X to Y are continuous.


PROOF: For part (i), let X and Y be topological spaces such that $\mathrm{Top}(X) = \mathbb{P}(X)$. Let $f : X \to Y$.
Let $\Omega \in \mathrm{Top}(Y)$. Then $f^{-1}(\Omega) \in \mathbb{P}(X) = \mathrm{Top}(X)$. So f is continuous.

For part (ii), let X and Y be topological spaces with $\mathrm{Top}(Y) = \{\emptyset, Y\}$. Let $f : X \to Y$. Let $\Omega \in \mathrm{Top}(Y)$.
Then $\Omega = \emptyset$ or $\Omega = Y$. If $\Omega = \emptyset$, then $f^{-1}(\Omega) = \emptyset \in \mathrm{Top}(X)$. If $\Omega = Y$, then $f^{-1}(\Omega) = X \in \mathrm{Top}(X)$. So f
is continuous.
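
Both parts of Theorem 31.12.10 can be checked by brute force on a small finite example. The following Python sketch enumerates every function between two hypothetical finite sets. All names in it (`powerset`, `is_continuous`, `some_top_X`, `some_top_Y` and so on) are assumptions made for this illustration, and a finite check of this kind is of course no substitute for the proof.

```python
from itertools import combinations, product


def powerset(X):
    """Return the discrete topology P(X) as a set of frozensets."""
    items = list(X)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}


def is_continuous(f, top_X, top_Y):
    """Definition 31.12.4: the inverse image of each open set is open."""
    return all(frozenset(x for x in f if f[x] in omega) in top_X for omega in top_Y)


X = frozenset({1, 2, 3})
Y = frozenset({"a", "b"})

discrete_X = powerset(X)                         # discrete topology on X
some_top_X = {frozenset(), frozenset({1}), X}    # a non-discrete topology on X
trivial_Y = {frozenset(), Y}                     # trivial topology on Y
some_top_Y = {frozenset(), frozenset({"a"}), Y}  # a non-trivial topology on Y

# All 2**3 = 8 functions from X to Y, each represented as a dict.
points = sorted(X)
all_functions = [dict(zip(points, values)) for values in product(sorted(Y), repeat=len(points))]

part_i = all(is_continuous(f, discrete_X, some_top_Y) for f in all_functions)
part_ii = all(is_continuous(f, some_top_X, trivial_Y) for f in all_functions)
general = all(is_continuous(f, some_top_X, some_top_Y) for f in all_functions)

print(part_i, part_ii, general)  # True True False
```

The third value shows that the hypotheses matter: with neither a discrete domain nor a trivial target space, some of the eight functions fail to be continuous.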


31.12.11 NOTATION: 
$C(X, Y)$, for topological spaces X and Y, denotes the set of continuous functions from X to Y.


$C^0(X, Y)$ is an alternative notation for $C(X, Y)$ for topological spaces X and Y.


31.12.12 REMARK: The convenience, and ambiguity, of the $C^0$ notation for continuous functions.
The alternative notation $C^0(X, Y)$ in Notation 31.12.11 is often used in contexts where the sets $C^k(X, Y)$
of k-times continuously differentiable functions from X to Y for $k \in \mathbb{Z}^+$ are also defined.


Even if differentiability of functions from X to Y is not defined, the $C^0$ notation is a convenient abbreviation
for the word “continuous” because the notation “$C$” on its own is ambiguous, whereas “$C^0$” strongly suggests
the idea of continuity. It is often said that a “function is $C^0$”, but rarely that a “function is $C$”. (See for
example Notation 42.1.10 for $C^k$ function spaces.)


Notation 31.12.13 defines the default range Y of $C(X, Y)$ as the real numbers. In other words, $C^0(X)$ is
defined to equal $C^0(X, \mathbb{R})$.


31.12.13 NOTATION: $C(X)$, for a topological space X, denotes the set of continuous functions from X to
$\mathbb{R}$ with the usual topology on $\mathbb{R}$.


31.12.14 REMARK: Continuity of restrictions of continuous functions. 

It seems intuitively obvious that any restriction of a continuous function should be continuous. This is 
certainly clear in the case of metric spaces because the bounds which apply to the full function will apply at 
least as strongly for the restricted function. In the case of general continuous functions, one must demonstrate 
that the inverse images of open sets are open sets, but this is not true in general. (Consider for example the
map $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f : (x_1, x_2) \mapsto x_1$, restricted to $\{x \in \mathbb{R}^2;\ x_2 = 0\}$.)

However, it is true that the inverse images of a continuous function restricted to a set S are restrictions of 
open sets to S. These restricted sets happen to be the sets in the relative topology in Definition 31.6.2. This 
observation is stated as Theorem 31.12.15. This is useful for restrictions of fibre charts and projection maps 
for fibre bundles. (See Theorem 32.10.8 for an application to restrictions of homeomorphisms whose target 
space is a direct product of topological spaces.)


31.12.15 THEOREM: Restrictions of continuous functions are continuous.
Let $(X_1, T_1)$ and $(X_2, T_2)$ be topological spaces. Let $f : X_1 \to X_2$ be continuous. Let S be a subset of $X_1$
with the relative topology $T_1'$ on S from $T_1$. Then $f|_S : S \to X_2$ is continuous from $(S, T_1')$ to $(X_2, T_2)$.


PROOF: Let $G_2 \in T_2$. Then $(f|_S)^{-1}(G_2) = f^{-1}(G_2) \cap S \in T_1'$ by Definition 31.6.2 for the relative topology
on S because $f^{-1}(G_2) \in T_1$ by Definition 31.12.4. Hence $f|_S : S \to X_2$ is continuous.
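
The role of the relative topology in Theorem 31.12.15 can be seen in a small finite example. The following Python sketch is an illustration under assumed data: the spaces, the function f and the helper names `relative_topology`, `preimage` and `is_continuous` are all hypothetical. It shows a restriction whose inverse images are open in the relative topology on S, although one of them is not open in the full space.

```python
def preimage(f, subset):
    """Return the inverse image of `subset` under the function f (a dict)."""
    return frozenset(x for x in f if f[x] in subset)


def relative_topology(top_X, S):
    """Return the relative topology on S: all sets G ∩ S for G open in X."""
    return {G & S for G in top_X}


def is_continuous(f, top_dom, top_cod):
    """Definition 31.12.4 applied to the given domain and target topologies."""
    return all(preimage(f, G) in top_dom for G in top_cod)


X1 = frozenset({1, 2, 3})
T1 = {frozenset(), frozenset({1}), frozenset({1, 2}), X1}
X2 = frozenset({"a", "b"})
T2 = {frozenset(), frozenset({"a"}), X2}

f = {1: "a", 2: "a", 3: "b"}       # inverse image of {"a"} is {1, 2}, open in X1
S = frozenset({2, 3})
T1_S = relative_topology(T1, S)    # consists of the empty set, {2} and S
f_S = {x: f[x] for x in S}         # the restriction of f to S

print(is_continuous(f, T1, T2))              # True
print(is_continuous(f_S, T1_S, T2))          # True
print(preimage(f_S, frozenset({"a"})) in T1)  # False: {2} is not open in X1
```

The last line corresponds to the observation in Remark 31.12.14 that inverse images under a restricted function need not be open in the original space.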


31.12.16 REMARK: Keyhole testing for continuity. 

In the case of topological manifolds, it is sometimes inconvenient to test the continuity of a function globally. 
Theorem 31.12.17 makes it possible to conclude that a function is continuous globally if each point has 
a neighbourhood for which the restriction of the function is continuous. The proof glues together the 
restrictions $U \cap f^{-1}(\Omega)$ of an inverse image $f^{-1}(\Omega)$ to elements U of an open cover Q of the domain X to
show that $f^{-1}(\Omega)$ is open if $\Omega$ is open. (See Theorem 41.1.27 for a $C^1$ version of Theorem 31.12.17.)


31.12.17 THEOREM: Local continuity implies global continuity. 
Let $f : X \to Y$ for topological spaces X and Y. Then f is continuous if and only if for all $p \in X$ there is a
set $U \in \mathrm{Top}_p(X)$ such that $f|_U$ is continuous. In other words,

$$f \in C(X, Y) \iff \forall p \in X,\ \exists U \in \mathrm{Top}_p(X),\ f|_U \in C(U, Y).$$


PROOF: Let $f : X \to Y$ satisfy $\forall p \in X,\ \exists U \in \mathrm{Top}_p(X),\ f|_U \in C(U, Y)$. Define a collection of open sets
$Q = \{U \in \mathrm{Top}(X) \setminus \{\emptyset\};\ f|_U \in C(U, Y)\}$. Then $\bigcup Q = X$.

Let $\Omega \in \mathrm{Top}(Y)$. Let $U \in Q$. The continuity of $f|_U$ from U to Y for such a set U implies by Definition 31.12.4
that $(f|_U)^{-1}(\Omega) \in \mathrm{Top}(U)$. But $(f|_U)^{-1}(\Omega) = U \cap f^{-1}(\Omega)$ by Theorem 10.6.12 (ii). So $U \cap f^{-1}(\Omega) \in \mathrm{Top}(U)$.
But $\mathrm{Top}(U) \subseteq \mathrm{Top}(X)$ by Definition 31.6.2. So $U \cap f^{-1}(\Omega) \in \mathrm{Top}(X)$. But $f^{-1}(\Omega) = \bigcup_{U \in Q} (U \cap f^{-1}(\Omega))$
by Theorem 8.7.5 because $f^{-1}(\Omega) \subseteq X = \bigcup Q$. Therefore $f^{-1}(\Omega) \in \mathrm{Top}(X)$ by Definition 31.3.2 (iii). Hence
$f \in C(X, Y)$ by Definition 31.12.4. The converse follows from Theorem 31.12.15.


31.12.18 REMARK: Constant continuous extensions of continuous functions. 

In differential geometry, it is sometimes necessary to extend a continuous function from an open subset to 
the whole manifold. Theorem 31.12.19 states that if the function is continuous on the closure of an open 
subset and constant on its boundary, then its constant extension to the whole space is continuous. This 
result may seem obvious in a Cartesian space, but if the space is very weakly separated, it might not be 
so obvious. (See Section 33.1 for weak separation classes. See Section 49.5 for examples of non-Hausdorff 
locally Cartesian spaces where Theorem 31.12.19 might not be entirely obvious.)


31.12.19 THEOREM: Continuous function with constant boundary value has continuous constant extension. 
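Let X and Y be topological spaces. Let $\Omega \in \mathrm{Top}(X)$ and $h \in C(\bar{\Omega}, Y)$. Let $y_0 \in Y$. Suppose that $h(x) = y_0$
for all $x \in \mathrm{Bdy}(\Omega)$. Define $f : X \to Y$ by

$$\forall x \in X, \quad f(x) = \begin{cases} h(x) & \text{if } x \in \Omega \\ y_0 & \text{if } x \in X \setminus \Omega. \end{cases}$$

Then $f \in C(X, Y)$.


PROOF: Let $G \in \mathrm{Top}(Y)$. Suppose that $y_0 \notin G$. Then $f^{-1}(G) \subseteq \Omega$. So $f^{-1}(G) = f^{-1}(G) \cap \Omega = (f|_\Omega)^{-1}(G)$
by Theorem 10.6.12 (ii). So $f^{-1}(G) = (h|_\Omega)^{-1}(G) \in \mathrm{Top}(\Omega)$ because $h|_\Omega \in C(\Omega, Y)$ by Theorem 31.12.15.
Therefore $f^{-1}(G) \in \mathrm{Top}(X)$ because $\Omega \in \mathrm{Top}(X)$.

Now suppose that $y_0 \in G$. Then $f^{-1}(G) = h^{-1}(G) \cup (X \setminus \Omega)$, and $h^{-1}(G) \in \mathrm{Top}(\bar{\Omega})$ by Definition 31.12.4.
So $h^{-1}(G) = U \cap \bar{\Omega}$ for some $U \in \mathrm{Top}(X)$ by Definition 31.6.2. This gives $f^{-1}(G) = (U \cap \bar{\Omega}) \cup (X \setminus \Omega)$.
But $X \setminus \Omega = (X \setminus \bar{\Omega}) \cup \mathrm{Bdy}(\Omega)$ by Theorem 31.9.10 (xix). So $f^{-1}(G) = (U \cap \bar{\Omega}) \cup (X \setminus \bar{\Omega}) \cup \mathrm{Bdy}(\Omega)$.
But $\mathrm{Bdy}(\Omega) \subseteq h^{-1}(G)$ because $y_0 \in G$ and $h(x) = y_0$ for $x \in \mathrm{Bdy}(\Omega)$. So $\mathrm{Bdy}(\Omega) \subseteq U \cap \bar{\Omega}$, and so
$f^{-1}(G) = (U \cap \bar{\Omega}) \cup (X \setminus \bar{\Omega})$ by Theorem 8.1.7 (iv). So $f^{-1}(G) = (U \cap \bar{\Omega}) \cup (U \cap (X \setminus \bar{\Omega})) \cup (X \setminus \bar{\Omega})$ by
Theorem 8.1.6 (xiii). Therefore $f^{-1}(G) = U \cup (X \setminus \bar{\Omega})$ by Theorem 8.2.5 (ix). So $f^{-1}(G) \in \mathrm{Top}(X)$ since
$U \in \mathrm{Top}(X)$ and $X \setminus \bar{\Omega} \in \mathrm{Top}(X)$. Hence $f \in C(X, Y)$ by Definition 31.12.4.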


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




31.13. Continuous partial functions 


31.13.1 REMARK: Continuity of partial (i.e. partially defined) functions. 

In differential geometry, partial functions are frequently encountered as charts for manifolds and fibre bundles. 
Consequently a definition of continuity for partial functions is required. Definition 31.13.2 is identical in form 
to Definition 31.12.4 for fully defined functions. (See Section 10.9 for partially defined functions.) 


31.13.2 DEFINITION: A continuous partial function from a topological space X to a topological space Y 
is a partial function $f : X \mathring{\to} Y$ such that $\forall \Omega \in \mathrm{Top}(Y),\ f^{-1}(\Omega) \in \mathrm{Top}(X)$.


31.13.3 REMARK: Composition of continuous partially defined functions. 

The composition of partial functions is defined for completely arbitrary source and target sets. As indicated 
in Definition 10.9.2, a partial function need not have a specified source set or target set, and if a source 
set is specified, the domain is only required to be a subset of that source set. Thus theorems about the 
composition of partial functions may specify source and target sets, as in Theorem 10.10.9, or the source 
and target sets may be unspecified, as in Theorem 10.10.13. 


It is meaningless to talk about the continuity of a fully or partially defined function without specifying 
topological spaces as its source and target sets. But even if the source and target spaces of two functions are 
specified, the continuity of the composition $f_2 \circ f_1 : S_1 \mathring{\to} T_2$ of two continuous partial functions $f_1 : S_1 \mathring{\to} T_1$
and $f_2 : S_2 \mathring{\to} T_2$ depends on how the topologies on $T_1$ and $S_2$ are related. Theorem 31.13.4 avoids this
issue by requiring the first target space and second source space to be subsets (with relative topologies) of a 
single topological space. Then any subset of the intersection of these two spaces is open in one space if and 
only if it is open in the other. 


31.13.4 THEOREM: The composite of two continuous partial functions is continuous. 
Let X, Y and Z be topological spaces. Let $f : X \mathring{\to} Y$ and $g : Y \mathring{\to} Z$ be continuous partial functions. Then
$g \circ f : X \mathring{\to} Z$ is a continuous partial function.


PROOF: Let $f : X \mathring{\to} Y$ and $g : Y \mathring{\to} Z$ be continuous partial functions for topological spaces X, Y
and Z. Let $\Omega \in \mathrm{Top}(Z)$. Then $g^{-1}(\Omega) \in \mathrm{Top}(Y)$, and so $(g \circ f)^{-1}(\Omega) = f^{-1}(g^{-1}(\Omega)) \in \mathrm{Top}(X)$, by
Definition 31.13.2. Therefore $\forall \Omega \in \mathrm{Top}(Z),\ (g \circ f)^{-1}(\Omega) \in \mathrm{Top}(X)$. Hence $g \circ f : X \mathring{\to} Z$ is continuous by
Definition 31.13.2.


31.13.5 REMARK: Upgrading the composite of two partial functions to a surjective function. 

Theorem 31.13.6 is the same as Theorem 31.13.4 except that the partial function becomes a surjective fully
defined function by setting the source and target sets to the partial function's domain and range respectively. 
This theorem has particular relevance to the composition of charts on locally Cartesian spaces in Section 49.6. 
(The proof of Theorem 31.13.6 is illustrated in Figure 31.13.1.) 


[Figure 31.13.1 Inverse image of open set under composite of partial functions]


31.13.6 THEOREM: The composite of two continuous partial functions is a continuous function.

Let X, Y and Z be topological spaces. Let $f : X \mathring{\to} Y$ and $g : Y \mathring{\to} Z$ be continuous partial functions.
Then $g \circ f$ is a surjective continuous function from $\mathrm{Dom}(g \circ f) = f^{-1}(\mathrm{Dom}(g)) \subseteq X$ to $\mathrm{Range}(g \circ f) =
g(\mathrm{Range}(f)) \subseteq Z$ with respect to the relative topologies on $\mathrm{Dom}(g \circ f)$ and $\mathrm{Range}(g \circ f)$ in X and Z
respectively.


PROOF: Let $f : X \mathring{\to} Y$ and $g : Y \mathring{\to} Z$ be continuous partial functions for topological spaces X, Y and
Z. Let $X' = \mathrm{Dom}(g \circ f)$ and $Z' = \mathrm{Range}(g \circ f)$. Then $X' = f^{-1}(\mathrm{Dom}(g))$ and $Z' = g(\mathrm{Range}(f))$ by
Theorem 10.10.13 (i, ii). Therefore $g \circ f : X' \to Z'$ is a well-defined surjective function. The inclusions
$X' \subseteq X$ and $Z' \subseteq Z$ follow from Theorem 9.5.18 (ii, i) and Definition 10.2.2 (iii).


Let $\Omega' \in \mathrm{Top}(Z')$. Then $\Omega' = \Omega \cap Z'$ for some $\Omega \in \mathrm{Top}(Z)$ by Definition 31.6.2 for the relative topology on
$Z'$ in Z. By Definition 31.13.2, $g^{-1}(\Omega) \in \mathrm{Top}(Y)$, and so $f^{-1}(g^{-1}(\Omega)) \in \mathrm{Top}(X)$, because $f : X \mathring{\to} Y$ and
$g : Y \mathring{\to} Z$ are continuous. But $(g \circ f)^{-1}(\Omega') = (g \circ f)^{-1}(\Omega \cap Z') = (g \circ f)^{-1}(\Omega)$ by Theorem 10.7.1 (iii).
Therefore $(g \circ f)^{-1}(\Omega') = f^{-1}(g^{-1}(\Omega)) \in \mathrm{Top}(X)$. But $(g \circ f)^{-1}(\Omega') \subseteq X'$. It follows that $(g \circ f)^{-1}(\Omega') =
(g \circ f)^{-1}(\Omega') \cap X' \in \mathrm{Top}(X')$ by Definition 31.6.2 for the relative topology on $X'$ in X. Hence $g \circ f : X' \to Z'$
is a continuous function by Definition 31.12.4.


31.14. Homeomorphisms 


31.14.1 REMARK: History and etymology of the word “homeomorphism”. 

The word “homeomorphism” was introduced by Henri Poincaré in 1895. (See EDM2 [113], section 425.G.) 
It comes from the Greek word ὅμοιος meaning “like, similar, resembling; the same, of the same rank; equal
citizen; equal; common, mutual; a match for; agreeing, convenient” (Feyerabend [475], page 273), and μορφή
meaning “form, shape, figure, appearance, fashion, image; beauty, grace” (Feyerabend [475], page 257). 


31.14.2 DEFINITION: A homeomorphism between two topological spaces $(X_1, T_1)$ and $(X_2, T_2)$ is a bijection
$f : X_1 \to X_2$ such that both $f$ and $f^{-1}$ are continuous.


Two topological spaces $(X_1, T_1)$ and $(X_2, T_2)$ are said to be homeomorphic if there exists a homeomorphism
$f : X_1 \to X_2$.


A topological automorphism on a topological space $X \mathrel{-\!\!<} (X, T_X)$ is a homeomorphism from X to X.


31.14.3 NOTATION: $(X_1, T_1) \approx (X_2, T_2)$, for topological spaces $(X_1, T_1)$ and $(X_2, T_2)$, means that $(X_1, T_1)$
and $(X_2, T_2)$ are homeomorphic.


$X_1 \approx X_2$, for topological spaces $X_1$ and $X_2$, means that $X_1$ and $X_2$ are homeomorphic with respect to
topologies which are implicit in the context.


$f : (X_1, T_1) \approx (X_2, T_2)$, for topological spaces $(X_1, T_1)$ and $(X_2, T_2)$, means that f is a homeomorphism from
$X_1$ to $X_2$ with respect to topologies $T_1$ and $T_2$ on $X_1$ and $X_2$ respectively.


$f : X_1 \approx X_2$, for topological spaces $X_1$ and $X_2$, means that f is a homeomorphism from $X_1$ to $X_2$ with
respect to topologies which are implicit in the context.


31.14.4 EXAMPLE: A continuous bijection is not necessarily a homeomorphism. 

Let $X_1 = X_2 = \{a, b\}$ with $a \ne b$. Let $T_1 = \mathbb{P}(X_1)$ and $T_2 = \{\emptyset, X_2\} \subseteq T_1$. Let $f = \mathrm{id}_{X_1}$. Then
$f^{-1}(\Omega) = \Omega \in T_1$ for all $\Omega \in T_2$. So $f : X_1 \to X_2$ is continuous with respect to the topological spaces
$(X_1, T_1)$ and $(X_2, T_2)$. However, if $G = \{a\}$ then $G \in T_1$, but $G \notin T_2$. So $f^{-1} : X_2 \to X_1$ is not continuous.
Therefore $f : X_1 \to X_2$ is not a homeomorphism with respect to topologies $T_1$ and $T_2$.
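
Example 31.14.4 is small enough to verify directly. The following Python sketch encodes the two-point set with its discrete and trivial topologies and tests the identity map in both directions. The helper names `preimage` and `is_continuous` are assumptions made for this illustration.

```python
def preimage(f, subset):
    """Return the inverse image of `subset` under the function f (a dict)."""
    return frozenset(x for x in f if f[x] in subset)


def is_continuous(f, top_dom, top_cod):
    """Definition 31.12.4 applied to the given domain and target topologies."""
    return all(preimage(f, omega) in top_dom for omega in top_cod)


X = frozenset({"a", "b"})
T1 = {frozenset(), frozenset({"a"}), frozenset({"b"}), X}  # discrete topology P(X)
T2 = {frozenset(), X}                                      # trivial topology

f = {x: x for x in X}                  # the identity map, a bijection
f_inv = {y: x for x, y in f.items()}   # its inverse

print(is_continuous(f, T1, T2))      # True: f is continuous from (X, T1) to (X, T2)
print(is_continuous(f_inv, T2, T1))  # False: the inverse is not continuous
```

The second test fails on the open set {a} of T1, whose inverse image under the inverse map is {a}, which does not belong to T2.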


31.14.5 REMARK: Homeomorphisms and the axiom of choice. 

In pure ZF set theory without any axiom of choice, it is not possible to know for all pairs of topological 
spaces whether the pair is homeomorphic or not. Consider for example two topological spaces $(X_i, T_i)$ for
$i = 1, 2$, where $T_i$ is the coarse topology on $X_i$. These spaces are homeomorphic if and only if $X_1$ and $X_2$
are equinumerous, but in general it is not possible to determine whether they are equinumerous or if one of
the sets is “larger” than the other. (See Remark 13.1.20 for the “comparability theorem”.) Therefore it is
unsafe to assume that two spaces either are or are not homeomorphic. It may be impossible to determine 
unless suitable axioms are added, and even then, AC only tells you that a homeomorphism either exists or 
does not. The axiom of choice does not help you to determine whether it exists or not.


31.14.6 REMARK: Notation for sets of isomorphisms.

The notation “Iso” in Notation 31.14.7 is (probably) non-standard. But it denotes a kind of set construction 
which is often convenient in differential geometry. The space Aut(X) is the same as Iso(X, X). Such sets 
of morphisms can be defined for a very wide range of classes of structures. In a narrow context, there is no 
danger of confusion, but in differential geometry, isomorphisms are often intermingled for different classes 
within a single context. Therefore a subscript may sometimes be added to distinguish structure classes. 


31.14.7 NOTATION: $\mathrm{Iso}(X, Y)$, for topological spaces $X \mathrel{-\!\!<} (X, T_X)$ and $Y \mathrel{-\!\!<} (Y, T_Y)$, denotes the set of all
topological isomorphisms (i.e. homeomorphisms) from X to Y.


$\mathrm{Aut}(X)$, for a topological space $X \mathrel{-\!\!<} (X, T_X)$, denotes the set of all topological automorphisms on X.
In other words, $\mathrm{Aut}(X) = \mathrm{Iso}(X, X)$.


31.14.8 REMARK: The homeomorphism relation is an equivalence relation on the class of topological spaces. 
It follows from Definition 31.14.2 that the set map $f : \mathbb{P}(X_1) \to \mathbb{P}(X_2)$ of f restricts to a bijection between the open
sets in $T_1$ and $T_2$. In other words, there is a one-to-one correspondence between the topologies of the two
sets. (This is stated as Theorem 31.14.9.) This implies that absolutely all topological properties of the
two sets are identical. The homeomorphism relation is clearly an equivalence relation, but the class of all
topological spaces is not a set. So this is not an equivalence relation in the sense of a set of ordered pairs as
in Definition 9.5.2.


31.14.9 THEOREM: A homeomorphism maps the domain and range topologies to each other. 
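Let $f : X_1 \to X_2$ be a homeomorphism between topological spaces $(X_1, T_1)$ and $(X_2, T_2)$.
Then $T_2 = \{f(\Omega);\ \Omega \in T_1\}$ and $T_1 = \{f^{-1}(\Omega);\ \Omega \in T_2\}$.


PROOF: Let $f : X_1 \to X_2$ be a homeomorphism from $(X_1, T_1)$ to $(X_2, T_2)$. Let $\Omega \in T_2$. Then $f^{-1}(\Omega) \in T_1$
by Definitions 31.14.2 and 31.12.4. So $T_1 \supseteq \{f^{-1}(\Omega);\ \Omega \in T_2\}$. Let $\Omega' \in T_1$. Then $f(\Omega') = (f^{-1})^{-1}(\Omega') \in T_2$
since $f^{-1}$ is continuous. So $\Omega' = f^{-1}(f(\Omega')) = f^{-1}(\Omega)$, where $\Omega = f(\Omega') \in T_2$. So $T_1 \subseteq \{f^{-1}(\Omega);\ \Omega \in T_2\}$.
Hence $T_1 = \{f^{-1}(\Omega);\ \Omega \in T_2\}$. Also, $T_2 = \{f(\Omega);\ \Omega \in T_1\}$ since $f^{-1} : X_2 \to X_1$ is a homeomorphism.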


31.14.10 REMARK: Homeomorphisms between subsets of topological spaces. 

Theorem 31.14.11 is apparently a trivial consequence of Theorem 31.14.9, which is itself almost obvious. 
However, Theorem 31.14.11 is not quite so clear when expressed informally that “the topology induced on
the image of a homeomorphism is the same as the relative topology on the image”. This proposition is not
in fact valid unless the inverse $f^{-1}$ of a homeomorphism $f : X_1 \to X_2$ is considered to be
continuous only with respect to the relative topology on $f(X_1)$.


This use of the relative topology is not required for the continuity of the forward map f. Thus when one says 
that a topological space is homeomorphic to a subset of a given topological space, this must be interpreted 
with respect to the relative topology on the image of the homeomorphism. 


Theorem 31.14.13 is a two-sided version of Theorem 31.14.11 which similarly has the purpose of making 
clear that a homeomorphism between subsets of topological spaces is to be interpreted with respect to the 
relative topologies on the forward and reverse images of the homeomorphism. Definition 31.14.12 makes this 
interpretation explicit. 


31.14.11 THEOREM: A local homeomorphism maps the domain topology to the range’s relative topology. 
Let $(X_1, T_1)$ and $(X_2, T_2)$ be topological spaces. Let $f : X_1 \to X_2$ be a homeomorphism from $(X_1, T_1)$ to
$(f(X_1), T_{f(X_1)})$, where $T_{f(X_1)}$ is the relative topology of $f(X_1)$ in $X_2$. Then $T_{f(X_1)} = \{f(\Omega);\ \Omega \in \mathrm{Top}(X_1)\}$.


PROOF: By Theorem 31.14.9, $T_{f(X_1)} = \{f(\Omega);\ \Omega \in T_1\}$.


31.14.12 DEFINITION: A homeomorphism between subsets $S_1$ and $S_2$ of topological spaces $X_1$ and $X_2$
respectively is a homeomorphism between $(S_1, T_1)$ and $(S_2, T_2)$, where $T_k$ is the relative topology of $S_k$ in
$X_k$ for $k = 1, 2$.


31.14.13 THEOREM: A local homeomorphism maps relative topologies to each other. 

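Let $f : S_1 \to S_2$ be a homeomorphism from $S_1 \subseteq X_1$ to $S_2 \subseteq X_2$, where $X_1$ and $X_2$ are topological spaces.
Then $\mathrm{Top}(S_2) = \{f(\Omega);\ \Omega \in \mathrm{Top}(X_1)\} = \{f(\Omega);\ \Omega \in \mathrm{Top}(S_1)\}$ and $\mathrm{Top}(S_1) = \{f^{-1}(\Omega);\ \Omega \in \mathrm{Top}(X_2)\} =
\{f^{-1}(\Omega);\ \Omega \in \mathrm{Top}(S_2)\}$, where $\mathrm{Top}(S_k)$ denotes the relative topology on $S_k$ in $X_k$ for $k = 1, 2$.


PROOF: The equalities $\mathrm{Top}(S_2) = \{f(\Omega);\ \Omega \in \mathrm{Top}(S_1)\}$ and $\mathrm{Top}(S_1) = \{f^{-1}(\Omega);\ \Omega \in \mathrm{Top}(S_2)\}$ follow from
Definition 31.14.12 and Theorem 31.14.9. Let $\Omega \in \mathrm{Top}(X_1)$. Then $f(\Omega) = f(\Omega \cap \mathrm{Dom}(f)) = f(\Omega \cap S_1)$, where $\Omega \cap S_1 \in \mathrm{Top}(S_1)$
by Definition 31.6.2. So $\{f(\Omega);\ \Omega \in \mathrm{Top}(X_1)\} \subseteq \{f(\Omega);\ \Omega \in \mathrm{Top}(S_1)\}$. Let $\Omega' \in \mathrm{Top}(S_1)$. Then $\Omega' = \Omega \cap S_1$
for some $\Omega \in \mathrm{Top}(X_1)$, and $f(\Omega') = f(\Omega \cap S_1) = f(\Omega)$. So $\{f(\Omega);\ \Omega \in \mathrm{Top}(X_1)\} \supseteq \{f(\Omega);\ \Omega \in \mathrm{Top}(S_1)\}$.
Hence $\{f(\Omega);\ \Omega \in \mathrm{Top}(X_1)\} = \{f(\Omega);\ \Omega \in \mathrm{Top}(S_1)\}$. This verifies the assertion for $\mathrm{Top}(S_2)$. The assertion
for $\mathrm{Top}(S_1)$ follows similarly.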


31.14.14 THEOREM: Restrictions of homeomorphisms are homeomorphisms. 
Let $f : (X_1, T_1) \approx (X_2, T_2)$ be a homeomorphism. Let $S_1$ be a subset of $X_1$ with the relative topology
from $T_1$. Let $S_2 = f(S_1)$ have the relative topology from $T_2$. Then $f|_{S_1} : S_1 \to S_2$ is a homeomorphism.


PROOF: The restriction of a bijection is always a bijection. Denote by $T_1'$ and $T_2'$ the relative topologies
for $S_1$ and $S_2$ from $(X_1, T_1)$ and $(X_2, T_2)$ respectively. To show the continuity of $f|_{S_1}$, note that if $G_2' \in T_2'$,
then $G_2' = G_2 \cap S_2$ for some $G_2 \in T_2$. Since f is continuous, $f^{-1}(G_2) \in T_1$. It follows that $(f|_{S_1})^{-1}(G_2') =
f^{-1}(G_2) \cap S_1 \in T_1'$. So $f|_{S_1}$ is continuous with respect to the relative topologies. The continuity of the
inverse follows symmetrically. Therefore $f|_{S_1}$ is a homeomorphism with respect to the relative topologies on
$S_1$ and $f(S_1)$. That is, $f|_{S_1} : (S_1, T_1') \approx (S_2, T_2')$.


31.14.15 REMARK: Local homeomorphisms. 

A local homeomorphism between two topological spaces is a homeomorphism between subsets of the
respective spaces. Definition 31.14.16 expresses this concept in terms of continuous partial functions. (See
Definition 31.13.2 for continuous partial functions.) A local homeomorphism on a single topological space 
may be thought of as a kind of “local (topological space) automorphism”. 


31.14.16 DEFINITION: A local homeomorphism from a topological space X to a topological space Y is an 
injective continuous partial function $f : X \mathring{\to} Y$ such that $f^{-1} : Y \mathring{\to} X$ is a continuous partial function.


A local homeomorphism on a topological space X is a local homeomorphism from X to X. 


31.14.17 THEOREM: The composite of two local homeomorphisms is a local homeomorphism. 
Let X, Y and Z be topological spaces. Let $f : X \mathring{\to} Y$ and $g : Y \mathring{\to} Z$ be local homeomorphisms.


(i) $g \circ f : X \mathring{\to} Z$ is a local homeomorphism.
(ii) $\mathrm{Dom}(g \circ f) = f^{-1}(\mathrm{Dom}(g)) \subseteq X$ and $\mathrm{Range}(g \circ f) = g(\mathrm{Range}(f)) \subseteq Z$.
(iii) $g \circ f$ is a homeomorphism from $\mathrm{Dom}(g \circ f)$ to $\mathrm{Range}(g \circ f)$ with respect to the corresponding relative
topologies in X and Z.


PROOF: For part (i), let $f : X \mathring{\to} Y$ and $g : Y \mathring{\to} Z$ be local homeomorphisms. Then $f : X \mathring{\to} Y$ and
$g : Y \mathring{\to} Z$ are continuous partial functions by Definition 31.14.16. So $g \circ f : X \mathring{\to} Z$ is a continuous
partial function by Theorem 31.13.4. Similarly, $f^{-1} : Y \mathring{\to} X$ and $g^{-1} : Z \mathring{\to} Y$ are continuous partial
functions by Definition 31.14.16, and so $(g \circ f)^{-1} = f^{-1} \circ g^{-1} : Z \mathring{\to} X$ is a continuous partial function by
Theorem 31.13.4. Hence $g \circ f : X \mathring{\to} Z$ is a local homeomorphism by Definition 31.14.16.

For part (ii), $\mathrm{Dom}(g \circ f) = f^{-1}(\mathrm{Dom}(g))$ and $\mathrm{Range}(g \circ f) = g(\mathrm{Range}(f))$ by Theorem 10.10.13 (i, ii). The
inclusions $\mathrm{Dom}(g \circ f) \subseteq X$ and $\mathrm{Range}(g \circ f) \subseteq Z$ follow from Theorem 10.10.9.

For part (iii), it follows from Definition 31.14.16 and Theorem 31.13.6 that $g \circ f$ is a surjective continuous
function from $\mathrm{Dom}(g \circ f)$ to $\mathrm{Range}(g \circ f)$ with respect to the relative topologies on $\mathrm{Dom}(g \circ f)$ and
$\mathrm{Range}(g \circ f)$ in X and Z respectively. Similarly, $(g \circ f)^{-1}$ is a surjective continuous function from $\mathrm{Range}(g \circ f)$ to
$\mathrm{Dom}(g \circ f)$ with respect to the same relative topologies. Hence $g \circ f$ is a homeomorphism from $\mathrm{Dom}(g \circ f)$
to $\mathrm{Range}(g \circ f)$ for these relative topologies by Definition 31.14.2.


Chapter 32


TOPOLOGICAL SPACE CONSTRUCTIONS


32.0.1 REMARK: Construction methods for topological spaces. 

The topology T generated by a collection of sets S on a set X is a construction method for a topology on
any set, given an initial set-collection S. (See Definition 32.1.4.) Such a collection S is then called an open 
base for the topology T. (See Definition 32.2.3.) Similarly one may construct a topology from a sub-base. 
(See Definition 32.3.5.) 


A topology can be generated on a set X from a family of maps from X to other topological spaces. (See 
Definition 32.4.8.) A topology can also be defined on X as a “pull-back” of a topology on Y via a function 
from X to Y. (See Definition 32.8.4.) 


Direct products of topological spaces have a useful minimal topology for which the projection functions are 
continuous. (See Sections 32.9 and 32.12.) Disjoint and non-disjoint unions of topological spaces have a 
minimal topology. (See Section 32.14.) Partitions of topological spaces into subsets have a natural “quotient
topology” on the set of subsets in the partition. (See Section 32.13.) Families of topological spaces with
“patchwork functions” identifying overlaps have a “patchwork topology”. (See Section 32.15.)


32.1. Generation of topologies from collections of sets 


32.1.1 REMARK: Topologies can be generated from collections of sets. 

In practice, topologies are rarely constructed “by hand”. In most situations, the intended topology can be
constructed from well-known topologies on well-known topological spaces by some kind of procedure. Some 
of these procedures are described in Chapter 32. One of the most powerful general construction methods 
is to “generate” a topology from a set of sets as in Definition 32.1.4. (See Remark 32.4.5 for a summary of 
various ways of generating topologies from set-collections and function-families.) 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3]

On any set X, a topology may be generated on X from any given subset S of $\mathbb{P}(X)$. (Note that two distinct
subsets $S_1 \ne S_2$ of $\mathbb{P}(X)$ may generate the same topology on X.) This construction has the advantage that
one knows for sure that the topology generated by a collection of sets S will contain at least all of the sets 
in S. It is convenient to first state Theorem 32.1.2, which is used in Theorem 32.1.5 to prove that topologies 
generated by set-collections are valid topologies. 


32.1.2 THEOREM: The intersection of a non-empty set of topologies is a topology. 
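Let $\mathcal{T}$ be a non-empty set of topologies on a set X. Then $\bigcap \mathcal{T}$ is a topology on X.


PROOF: Let $\mathcal{T}$ be a non-empty set of topologies on a set X. Then $T \subseteq \mathbb{P}(X)$ for all $T \in \mathcal{T}$. So $\bigcap \mathcal{T}$ is a
well-defined subset of $\mathbb{P}(X)$. Since $\emptyset \in T$ and $X \in T$ for all $T \in \mathcal{T}$, it follows that $\emptyset \in \bigcap \mathcal{T}$ and $X \in \bigcap \mathcal{T}$.
So Definition 31.3.2 condition (i) is satisfied.

To prove condition (ii) of Definition 31.3.2, let $\Omega_1, \Omega_2 \in \bigcap \mathcal{T}$. Then $\Omega_1, \Omega_2 \in T$ for all $T \in \mathcal{T}$. So $\Omega_1 \cap \Omega_2 \in T$
for all $T \in \mathcal{T}$. Therefore $\Omega_1 \cap \Omega_2 \in \bigcap \mathcal{T}$. Condition (iii) of Definition 31.3.2 follows similarly. Let $C \in \mathbb{P}(\bigcap \mathcal{T})$.
Then $C \subseteq \bigcap \mathcal{T}$. So $C \subseteq T$ for all $T \in \mathcal{T}$. So $\bigcup C \in T$ for all $T \in \mathcal{T}$. Therefore $\bigcup C \in \bigcap \mathcal{T}$. Hence $\bigcap \mathcal{T}$ is a
topology on X.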
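
Theorem 32.1.2 can be illustrated with two topologies on a three-point set. The following Python sketch uses a hypothetical helper `is_topology` which tests the conditions of Definition 31.3.2 for a finite collection of sets. The example topologies are assumptions chosen for this illustration. The union of the same two topologies is tested as well, to indicate why intersections are the natural choice in Definition 32.1.4.

```python
def is_topology(T, X):
    """Test the conditions of Definition 31.3.2 for a finite collection T of subsets of X."""
    # Condition (i): the empty set and X belong to T.
    if frozenset() not in T or X not in T:
        return False
    # Every element of T must be a subset of X.
    if not all(U <= X for U in T):
        return False
    # Conditions (ii) and (iii): for finite T, closure under pairwise
    # intersections and pairwise unions is sufficient.
    return all((U & V) in T and (U | V) in T for U in T for V in T)


X = frozenset({1, 2, 3})
T_a = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
T_b = {frozenset(), frozenset({3}), frozenset({2, 3}), X}

intersection = T_a & T_b  # consists of the empty set and X only
union = T_a | T_b         # contains {1} and {3}, but not {1, 3}

print(is_topology(T_a, X), is_topology(T_b, X))  # True True
print(is_topology(intersection, X))              # True
print(is_topology(union, X))                     # False
```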


32.1.3 THEOREM: The intersection of a non-empty family of topologies is a topology. 


Let $(T_i)_{i \in I}$ be a non-empty family of topologies on a set X. Then $T = \bigcap_{i \in I} T_i$ is a topology on X.


PROOF: This follows immediately from Theorem 32.1.2 by setting $\mathcal{T} = \{T_i;\ i \in I\}$ and $T = \bigcap \mathcal{T}$.


32.1.4 DEFINITION: The topology generated by S on X, for any sets X and S such that $S \subseteq \mathbb{P}(X)$, is the
intersection of all topologies T on X such that $S \subseteq T$.


32.1.5 THEOREM: The topology generated by a collection of subsets on a set is a topology on the set. 
Let X and S be sets with $S \subseteq \mathbb{P}(X)$. Then the topology generated by S on X is a valid topology on X.


PROOF: Let $\mathcal{T} = \{T \in \mathbb{P}(\mathbb{P}(X));\ S \subseteq T \text{ and } T \text{ is a topology on } X\}$, where X and S are sets which
satisfy $S \subseteq \mathbb{P}(X)$. Then the topology generated by S on X is equal to $\bigcap \mathcal{T}$. The set $\mathcal{T}$ is non-empty because
$\mathbb{P}(X) \in \mathcal{T}$ for any set X. (See Definition 31.3.19 for the discrete topology $\mathbb{P}(X)$ on X.) So $\bigcap \mathcal{T}$ is a valid
topology on X by Theorem 32.1.2.


32.1.6 REMARK: The empty collection of sets generates the trivial topology. 
In the special case $S = \emptyset$, the topology generated by S on any set X is the trivial topology $\{\emptyset, X\}$ on X.


32.1.7 REMARK: Application of cardinality-constrained power-sets to topology. 

By Notation 13.12.5, $\mathbb{P}^\infty_1(S)$ means $\{C \in \mathbb{P}(S);\ 1 \le \#(C) < \infty\}$, the set of non-empty finite subsets of any
set S. Some elementary properties of $\mathbb{P}^\infty_1(S)$ are given in Theorem 13.12.8. This notation is useful for the
precise statement of numerous definitions and theorems in topology.


32.1.8 THEOREM: An explicit expression to construct a topology from an arbitrary collection of sets. 
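Let S be a set (of sets). Let $T(S) = \{\bigcup C;\ C \in \mathbb{P}(\{\bigcap D;\ D \in \mathbb{P}^\infty_1(S)\})\}$.


(i) $S \subseteq T(S)$.
(ii) $\bigcup T(S) = \bigcup S$.
(iii) $T(S)$ is a topology on $\bigcup S$.


PROOF: Let S be a set (of sets). Then $Y(S) = \{\bigcap D;\ D \in \mathbb{P}^\infty_1(S)\}$ is a well-defined set (of sets). So
$\mathbb{P}(Y(S))$ is well defined and $\emptyset \in \mathbb{P}(Y(S))$. So $\bigcup \emptyset \in T(S) = \{\bigcup C;\ C \in \mathbb{P}(Y(S))\}$. But $\emptyset = \bigcup \emptyset$ by
Theorem 8.4.6 (i). So $\emptyset \in T(S)$.

To put a lower bound on $Y(S)$, note that $Y(S) = \{\bigcap D;\ D \in \mathbb{P}^\infty_1(S)\} \supseteq \{\bigcap D;\ D \in \mathbb{P}^1_1(S)\}$ because
$\mathbb{P}^\infty_1(S) \supseteq \mathbb{P}^1_1(S)$. But $\{\bigcap D;\ D \in \mathbb{P}^1_1(S)\} = \{\bigcap \{a\};\ a \in S\} = \{a;\ a \in S\} = S$. So $Y(S) \supseteq S$.

To put an upper bound on $Y(S)$, note that $Y(S) \subseteq \{\bigcap D;\ D \in \mathbb{P}(S) \setminus \{\emptyset\}\}$ because $\mathbb{P}^\infty_1(S) \subseteq \mathbb{P}(S) \setminus \{\emptyset\}$. But
$\mathbb{P}(S) \subseteq \mathbb{P}(\mathbb{P}(\bigcup S))$ by Theorem 8.5.2 (viii, iii). So $Y(S) \subseteq \{\bigcap D;\ D \in \mathbb{P}(\mathbb{P}(\bigcup S)) \setminus \{\emptyset\}\}$. But $\{\bigcap D;\ D \in
\mathbb{P}(\mathbb{P}(\bigcup S)) \setminus \{\emptyset\}\} = \mathbb{P}(\bigcup S)$ by Theorem 8.5.2 (xiv). So $Y(S) \subseteq \mathbb{P}(\bigcup S)$.

From the lower bound $Y(S) \supseteq S$, it follows that $T(S) = \{\bigcup C;\ C \in \mathbb{P}(Y(S))\} \supseteq \{\bigcup C;\ C \in \mathbb{P}(S)\} \supseteq
\{\bigcup C;\ C \in \mathbb{P}^1_1(S)\} = \{\bigcup \{a\};\ a \in S\} = \{a;\ a \in S\} = S$. This verifies part (i). Hence $\bigcup T(S) \supseteq \bigcup S$.

From the upper bound $Y(S) \subseteq \mathbb{P}(\bigcup S)$, it follows that $T(S) = \{\bigcup C;\ C \in \mathbb{P}(Y(S))\} \subseteq \{\bigcup C;\ C \in
\mathbb{P}(\mathbb{P}(\bigcup S))\} = \mathbb{P}(\bigcup S)$ by Theorem 8.5.2 (xiii). So $\bigcup T(S) \subseteq \bigcup (\mathbb{P}(\bigcup S)) = \bigcup S$ by Theorem 8.5.2 (v).
Hence $\bigcup T(S) = \bigcup S$. This verifies part (ii).

It follows from $Y(S) \supseteq S$ that $S \in \mathbb{P}(Y(S))$. So $\bigcup S \in T(S)$.

By Theorem 8.6.2, $T(S)$ is closed under arbitrary unions.

Let $\Omega_1, \Omega_2 \in T(S)$. Then $\Omega_1 = \bigcup C_1$ and $\Omega_2 = \bigcup C_2$ for some $C_1, C_2 \in \mathbb{P}(Y(S))$. So $\Omega_1 \cap \Omega_2 = (\bigcup C_1) \cap
(\bigcup C_2) = \bigcup \{E_1 \cap E_2;\ E_1 \in C_1,\ E_2 \in C_2\}$ by Theorem 8.4.8 (viii). Then $E_1, E_2 \in Y(S)$ for all $E_1 \in C_1$
and $E_2 \in C_2$. So $E_1 = \bigcap D_1$ and $E_2 = \bigcap D_2$ for some $D_1, D_2 \in \mathbb{P}^\infty_1(S)$. So $E_1 \cap E_2 = (\bigcap D_1) \cap (\bigcap D_2) =
\bigcap (D_1 \cup D_2)$ by Theorem 8.4.8 (xiii). Then $E_1 \cap E_2 \in Y(S)$ because $1 \le \#(D_1 \cup D_2) < \infty$. Therefore
$\bigcup \{E_1 \cap E_2;\ E_1 \in C_1,\ E_2 \in C_2\} \in T(S)$ because $T(S)$ is closed under arbitrary unions. Thus $\Omega_1 \cap \Omega_2 \in T(S)$.

It has been shown that $\{\emptyset, \bigcup S\} \subseteq T(S) \subseteq \mathbb{P}(\bigcup S)$ and $T(S)$ is closed under binary intersections and arbitrary
unions. Therefore $T(S)$ is a topology on $\bigcup S$ by Definition 31.3.2. This verifies part (iii).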


32.1.9 REMARK: The topology generated by a collection of sets equals an explicit expression.
Theorem 32.1.10 shows that the topology generated by a collection of sets $S \subseteq \mathbb{P}(X)$ on a set $X$ is the same
as the fairly explicit topology construction in Theorem 32.1.8.


32.1.10 THEOREM: Explicit expression for the topology generated by an arbitrary collection of sets. 
Let $X = \bigcup S$ for sets $X$ and $S$. Then $T(S) = \{\bigcup C; C \in \mathbb{P}(\{\bigcap D; D \in \mathbb{P}_1^\infty(S)\})\}$ is the topology generated
by $S$ on $X$.


PROOF: Let $X$ and $S$ be sets with $X = \bigcup S$. Define $T(S)$ as indicated. Then $T(S)$ is a topology on $X$ by
Theorem 32.1.8 (iii).

Let $T'$ be a topology on $X$ which satisfies $S \subseteq T'$. Let $Y(S) = \{\bigcap D; D \in \mathbb{P}_1^\infty(S)\}$. Then $Y(S) \subseteq T'$
because $Y(S)$ is the set of all finite intersections of elements of $S$, and $T'$ is closed under finite intersections
(because $T'$ is assumed to be a topology). Since $T'$ is closed under arbitrary unions, it follows similarly that
$\{\bigcup C; C \in \mathbb{P}(Y(S))\} \subseteq T'$. Thus $T(S) \subseteq T'$. Since $T(S)$ is a topology on $X$ which satisfies $S \subseteq T(S)$
by Theorem 32.1.8 (i), it follows from Theorem 32.1.2 that $T(S)$ is equal to the intersection of all such
topologies. Therefore $T(S)$ is the topology generated by $S$ on $X$ by Definition 32.1.4.
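
The statement of Theorem 32.1.10 can be checked by brute force on a very small finite set. The following Python sketch is only an illustration under stated assumptions: the function names `powerset`, `is_topology`, `explicit_topology` and `generated_topology` are hypothetical (they do not come from the text), sets are modelled as frozensets, and the exhaustive search over all collections of subsets is feasible only for sets with about three elements.

```python
from itertools import chain, combinations

def powerset(items):
    """All subsets of a finite iterable, as a list of frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_topology(T, X):
    """Finite check: T contains the empty set and X, and is closed under
    pairwise intersections and pairwise unions."""
    T = set(T)
    if frozenset() not in T or frozenset(X) not in T:
        return False
    return all(a & b in T and a | b in T for a in T for b in T)

def explicit_topology(S):
    """T(S): all unions of non-empty finite intersections of members of S."""
    S = [frozenset(s) for s in S]
    Y = set()
    for D in powerset(S):
        if D:  # D ranges over the non-empty finite subcollections of S
            Y.add(frozenset.intersection(*D))
    return {frozenset().union(*C) for C in powerset(Y)}

def generated_topology(S, X):
    """Intersection of all topologies on X which include S (Definition 32.1.4)."""
    S = {frozenset(s) for s in S}
    result = set(powerset(X))  # start from the discrete topology
    for T in powerset(powerset(X)):
        if S <= T and is_topology(T, X):
            result &= T
    return result
```

For example, with `X = {1, 2, 3}` and `S = [{1, 2}, {2, 3}]`, both functions should return the same collection of five sets, including the intersection `{2}`.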


32.1.11 REMARK: The choice axiom is not required for an explicit expression for a generated topology. 
The proof of Theorem 32.1.10 does not use the axiom of choice because it was avoided in the proof of the 
closure of T under arbitrary unions in Theorem 8.6.2. Avoiding the axiom of choice in topology is difficult 
because topology deals with such general sets and collections of sets. Measure theory is another subject 
which tempts one to use the axiom of choice because of the enormous generality of the sets. 


32.1.12 REMARK: Verification of a generated topology theorem in a trivial case. 

It is a useful exercise to verify topology theorems for trivial cases. Theorems which are valid for an intended 
range of cases do occasionally fail in such cases. Theorem 32.1.10 may be verified for $X = \emptyset$ as follows.

Let $X = \emptyset$. Then $\mathbb{P}(X) = \{\emptyset\}$. This gives two possibilities for subsets $S \subseteq \mathbb{P}(X)$ of the power set of $X$,
namely $S = \emptyset$ or $S = \{\emptyset\}$. However, $\{\emptyset, X\} = \{\emptyset\}$. So the requirement that $\{\emptyset, X\} \subseteq S$ implies that $S = \{\emptyset\}$.
Therefore $\mathbb{P}(S) = \{\emptyset, \{\emptyset\}\}$.

In the equation $T' = \{\bigcap C; C \in \mathbb{P}(S), 1 \le \#(C) < \infty\}$, the only possibility for $C$ is $\{\emptyset\}$ because $1 \le \#(C)$.
Then $\bigcap C = \emptyset$. So $T' = \{\emptyset\}$ and $\mathbb{P}(T') = \{\emptyset, \{\emptyset\}\}$.

In the equation $T = \{\bigcup D; D \in \mathbb{P}(T')\}$, the only choices for $D$ are $\emptyset$ or $\{\emptyset\}$. Then $\bigcup D = \emptyset$ in both cases.
Therefore $T = \{\emptyset\}$. This is the same as the only possible topology on $X$, namely the set $\{\emptyset, X\} = \{\emptyset\}$.

To fully verify Theorem 32.1.10 for $X = \emptyset$, it must be shown that the topology $T = \{\emptyset\}$ on $X = \emptyset$ is the
intersection of all topologies on $X$. This follows immediately from the fact that there is one and only one
possible topology on $X = \emptyset$.


32.1.13 REMARK: The topology generated by a set is the weakest topology which includes the set. 
The topology generated on a set X by a set S of subsets of X is the unique topology on X which is weaker 
than all other possible topologies on X which include S.


32.1.14 REMARK: Construction for generated topologies from collections closed under intersection.
Theorem 32.1.15 is a version of Theorem 32.1.10 which assumes that the set $S$ is closed under finite
intersections. Then only half of the construction work is required to build the topology generated by $S$ on $X$.
To facilitate comparisons, the set $S$ is denoted as $T'$. In fact, the proof of Theorem 32.1.10 could have been
shortened by first proving Theorem 32.1.15 and then applying it to the set $T'$ in Theorem 32.1.10.


32.1.15 THEOREM: Explicit expression for topology generated by an intersection-closed collection of sets.
Let $X$ and $T'$ be sets such that $\{\emptyset, X\} \subseteq T' \subseteq \mathbb{P}(X)$ and $T'$ is closed under finite intersections. Define


    $T = \{\bigcup Q; Q \in \mathbb{P}(T')\}$.

Then $T$ is the topology generated by $T'$ on $X$.


PROOF: To show that $\emptyset \in T$, let $Q = \emptyset \in \mathbb{P}(T')$. Then $\emptyset = \bigcup Q \in T$. To show that $X \in T$, let
$Q = \{X\} \in \mathbb{P}(T')$. Then $X = \bigcup Q \in T$. So $\{\emptyset, X\} \subseteq T$.

To show that $T$ is closed under finite intersections, let $A_1, A_2 \in T$. Then $A_i = \bigcup Q_i$ for some $Q_i \subseteq T'$
for $i = 1, 2$. Hence $A_1 \cap A_2 = (\bigcup Q_1) \cap (\bigcup Q_2) = \bigcup Q$ with $Q = \{U_1 \cap U_2; U_1 \in Q_1, U_2 \in Q_2\}$ (by
Theorem 8.4.8 (viii)). For $i = 1, 2$, $U_i \in Q_i$ implies that $U_i \in T'$ and so $U_1 \cap U_2 \in T'$ by the closure of $T'$
under finite intersections. So $Q \in \mathbb{P}(T')$. Therefore $\bigcup Q \in T$. That is, $A_1 \cap A_2 \in T$. Hence $T$ is closed
under finite intersections.

The closure of $T$ under arbitrary unions is guaranteed by Theorem 8.6.2. So $T$ satisfies all of the conditions
of Definition 31.3.2 for a topology on $X$.

To show that $T$ is the topology generated by $T'$ on $X$, first show that $T' \subseteq T$. Let $U \in T'$. Then $\{U\} \in \mathbb{P}(T')$.
So $U = \bigcup \{U\} \in T$. Hence $T' \subseteq T$.

Let $\tilde{T}$ be a topology on $X$ which satisfies $T' \subseteq \tilde{T}$. Then $T \subseteq \tilde{T}$ because $\tilde{T}$ is closed under arbitrary unions
and $T$ is the closure of $T'$ under arbitrary unions. Therefore $T$ is included in the intersection of all topologies
$\tilde{T}$ on $X$ which include $T'$. Since $T$ is itself such a topology, it follows that $T$ is equal to the intersection of
all such topologies. Therefore $T$ satisfies Definition 32.1.4 for the topology generated by $T'$ on $X$.
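
The role of the intersection-closure hypothesis in Theorem 32.1.15 can be seen on a three-element set. The following Python sketch is a hedged illustration only: the helper names `unions_of`, `closed_under_intersection` and `is_topology` are hypothetical, and the checks are the finite forms of the topology axioms.

```python
from itertools import chain, combinations

def unions_of(collection):
    """All unions of subcollections of a finite collection of frozensets."""
    members = list(collection)
    subcollections = chain.from_iterable(
        combinations(members, r) for r in range(len(members) + 1))
    return {frozenset().union(*Q) for Q in subcollections}

def closed_under_intersection(collection):
    """True if every pairwise intersection is again in the collection."""
    members = set(collection)
    return all(a & b in members for a in members for b in members)

def is_topology(T, X):
    """Finite check of the topology axioms for T on X."""
    T = set(T)
    if frozenset() not in T or frozenset(X) not in T:
        return False
    return all(a & b in T and a | b in T for a in T for b in T)
```

With `X = {1, 2, 3}`, the collection consisting of the empty set, `{1}`, `{2}` and `X` is closed under intersections, and its unions form a topology. The collection consisting of the empty set, `{1, 2}`, `{2, 3}` and `X` is not closed under intersections, and its unions fail to form a topology because `{2}` is missing.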


32.2. Open bases 


32.2.1 REMARK: Generating topologies from open bases and open subbases. 

In practice, it is almost always too inconvenient to directly specify all of the open sets in a topology. A 
topology is most conveniently generated from an open base or open subbase. The open subbase concept 
corresponds roughly to Theorem 32.1.10, which states that the topology generated by an arbitrary collection
of sets equals the set of unions of finite intersections of sets in the collection. The open base concept 
corresponds roughly to Theorem 32.1.15, which states that the topology generated by a collection of sets 
which are closed under finite intersections equals the set of unions of sets in the collection. 


The open base and subbase concepts are similar to the concept of a basis for a linear space. Many operations
on linear spaces can be specified for a basis, from which the operations on the whole space follow. In the 
same way, many definitions and calculations for topological spaces may be specified for an open base or 
subbase, from which the corresponding operations follow for the full topology. 


32.2.2 DEFINITION: An open base at a point $x$ in a topological space $X$ is a subset $B$ of $\mathrm{Top}_x(X)$ such that
every neighbourhood of $x$ includes at least one element of $B$. In other words, $B \in \mathbb{P}(\mathrm{Top}_x(X))$ and


    $\forall \Omega \in \mathrm{Top}_x(X),\ \exists G \in B,\ G \subseteq \Omega.$    (32.2.1)


32.2.3 DEFINITION: An open base for a topological space $X$ is a subset $B$ of $\mathrm{Top}(X)$ such that


    $\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists G \in B,\ x \in G \text{ and } G \subseteq \Omega.$    (32.2.2)


32.2.4 THEOREM: Every topology on a set is an open base for the topological space. 
Let $X$ be a topological space. Then $\mathrm{Top}(X)$ is an open base for $X$.


PROOF: The assertion follows from Definition 32.2.3 by choosing $G = \Omega$.


32.2.5 REMARK: Equivalent expressions for open bases of topological spaces. 
Line (32.2.2) in Definition 32.2.3 may be rewritten as follows. 


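    $\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists G \in B \cap \mathrm{Top}_x(X),\ G \subseteq \Omega.$


Clearly a collection $B \in \mathbb{P}(\mathrm{Top}(X))$ is an open base for a topological space $X$ if and only if $B \cap \mathrm{Top}_x(X)$
is an open base at each point $x$ of $X$. Line (32.2.2) may also be rewritten with the universal quantifiers
swapped as follows.


    $\forall \Omega \in \mathrm{Top}(X),\ \forall x \in \Omega,\ \exists G \in B,\ x \in G \text{ and } G \subseteq \Omega.$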


Theorem 32.2.6 gives another equivalent condition for an open base. 


32.2.6 THEOREM: An open base is a subset of the topology whose unions span the topology. 
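Let $X$ be a topological space. Let $B \in \mathbb{P}(\mathrm{Top}(X))$. Then $B$ is an open base for $X$ if and only if


    $\forall \Omega \in \mathrm{Top}(X),\ \exists C \in \mathbb{P}(B),\ \Omega = \bigcup C.$    (32.2.3)


In other words, $\mathrm{Top}(X) = \{\bigcup C; C \in \mathbb{P}(B)\}$.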


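PROOF: Let $B$ be an open base for a topological space $X$. Let $\Omega \in \mathrm{Top}(X)$. Let $C = \{G \in B; G \subseteq \Omega\}$.
Then $\bigcup C \subseteq \Omega$. Let $x \in \Omega$. Then $x \in G$ and $G \subseteq \Omega$ for some $G \in B$. So $x \in \bigcup C$. Hence $\Omega = \bigcup C$.

For the converse, let $X$ be a topological space and suppose that $B \in \mathbb{P}(\mathrm{Top}(X))$ satisfies line (32.2.3). Let
$x \in X$ and $\Omega \in \mathrm{Top}_x(X)$. Then $\Omega = \bigcup C$ for some $C \in \mathbb{P}(B)$. So $x \in \bigcup C$. Therefore $x \in G$ for some $G \in C$.
But $G \in B$ and $G \subseteq \Omega$ for such $G$. So $\exists G \in B$, ($x \in G$ and $G \subseteq \Omega$). This verifies line (32.2.2).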
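
The equivalence in Theorem 32.2.6 can be tested on finite examples. The Python sketch below is an illustration under assumptions: the names `powerset`, `is_open_base` and `unions_span` are hypothetical, the topology `T` is passed as a collection of frozensets, and `unions_span` uses the particular choice $C = \{G \in B; G \subseteq \Omega\}$ from the proof above.

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of a finite set, as a set of frozensets."""
    X = list(X)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(X, r) for r in range(len(X) + 1))}

def is_open_base(B, T):
    """Line (32.2.2) with quantifiers swapped: every point of every open set
    lies in some base element which is included in that open set."""
    B = {frozenset(G) for G in B}
    if not B <= set(T):
        return False
    return all(any(x in G and G <= U for G in B) for U in T for x in U)

def unions_span(B, T):
    """Line (32.2.3): every open set equals the union of the base elements
    which it includes."""
    B = {frozenset(G) for G in B}
    return all(frozenset().union(*(G for G in B if G <= U)) == U for U in T)
```

For the discrete topology on `{1, 2, 3}`, the singletons pass both tests, whereas the collection `[{1}, {2, 3}]` fails both because the open set `{2}` includes no element of the collection.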


32.2.7 REMARK: Typical applications of open bases. 
A typical application of the open base concept is to metric spaces, where the set of all open balls forms an 
open base for the topology induced by the metric. (See for example Definition 37.5.2 and Theorem 37.5.4.) 


32.2.8 REMARK: Indexed countable open bases for a topological space. 

Definitions 32.2.9 and 32.2.10 are applicable to the concepts of first countable and second countable
topological spaces. (See Definitions 33.4.12 and 33.4.13.) A “non-increasing indexed countable open base” for
the global topology makes very little sense. So it is not defined. But in the case of a countable open base
at a single point, it is always possible to construct a corresponding non-increasing indexed open base from
a general indexed open base. This is asserted in Theorem 32.2.11.


In most cases, finite open bases are of no practical interest. If they are required, they can either be explicitly
defined to be finite, with or without an index, or a single open set can appear an infinite number of times
in the sequence. Therefore only countably infinite indexed open bases at a single point are defined here.


32.2.9 DEFINITION: A countable open base at a point $x$ in a topological space $X$ is a countable subset $B$
of $\mathrm{Top}_x(X)$ which is an open base at $x$.


An indexed countable open base at a point $x$ in a topological space $X$ is a sequence $(B_i)_{i \in \omega} \in \mathrm{Top}_x(X)^\omega$
whose range $\{B_i; i \in \omega\}$ is an open base at $x$.


A non-increasing indexed countable open base at a point $x$ in a topological space $X$ is an indexed countable
open base $(B_i)_{i \in \omega} \in \mathrm{Top}_x(X)^\omega$ at $x$ such that $B_{i+1} \subseteq B_i$ for all $i \in \omega$.


32.2.10 DEFINITION: A countable open base for a topological space $X$ is a countable subset $B$ of $\mathrm{Top}(X)$
which is an open base for $X$.


An indexed countable open base for a topological space $X$ is a sequence $(B_i)_{i \in \omega} \in \mathrm{Top}(X)^\omega$ whose range
$\{B_i; i \in \omega\}$ is an open base for $X$.


32.2.11 THEOREM: The progressive sequence of intersections of an open base sequence forms an open base.
Let $(B_i)_{i \in \omega}$ be an indexed open base at a point $x$ in a topological space $X$. Then the sequence $(B'_i)_{i \in \omega}$
defined by $B'_i = \bigcap_{j=0}^{i} B_j$ for all $i \in \omega$ is a non-increasing indexed open base at $x$.


PROOF: Let $X$ be a topological space with $x \in X$. Let $B = (B_i)_{i \in \omega}$ be an indexed countable open base at $x$.
Define $B' = (B'_i)_{i \in \omega}$ by $B'_i = \bigcap_{j=0}^{i} B_j$ for all $i \in \omega$. Then $B'_i \in \mathrm{Top}_x(X)$ for all $i \in \omega$ by Theorem 31.3.7.
Let $\Omega \in \mathrm{Top}_x(X)$. Then $B_i \subseteq \Omega$ for some $i \in \omega$ by Definition 32.2.2. But $B'_i \subseteq B_i$. Consequently
$\forall \Omega \in \mathrm{Top}_x(X),\ \exists i \in \omega,\ B'_i \subseteq \Omega$. So $\forall \Omega \in \mathrm{Top}_x(X),\ \exists G \in \mathrm{Range}(B'),\ G \subseteq \Omega$. Therefore $\mathrm{Range}(B')$ is an
open base at $x$ by Definition 32.2.2. Moreover, $B'_{i+1} = B'_i \cap B_{i+1} \subseteq B'_i$ for all $i \in \omega$. Hence $B'$ is a non-increasing
indexed open base at $x$.
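
The construction in Theorem 32.2.11 is a running intersection. The following Python sketch illustrates it for a finite prefix of a sequence of open intervals which all contain a common point. It is an assumption-laden illustration: intervals are modelled as pairs `(lo, hi)` of exact fractions, and the names `intersect`, `is_subinterval` and `progressive_intersections` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def intersect(I, J):
    """Intersection of two open intervals (lo, hi) which share a common point."""
    return (max(I[0], J[0]), min(I[1], J[1]))

def is_subinterval(I, J):
    """True if the interval I is included in the interval J."""
    return J[0] <= I[0] and I[1] <= J[1]

def progressive_intersections(base):
    """The sequence B'_i = B_0 ∩ ... ∩ B_i for a finite prefix of a base at a point."""
    return list(accumulate(base, intersect))

# A prefix of a base at the point 0 which is not itself non-increasing.
base = [
    (Fraction(-1), Fraction(1)),
    (Fraction(-1, 3), Fraction(1, 2)),
    (Fraction(-1, 2), Fraction(1, 4)),
]
nested = progressive_intersections(base)
```

Each interval in `nested` is included in its predecessor and in the corresponding interval of `base`, which are the two properties used in the proof.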


32.3. Open subbases 


32.3.1 REMARK: Non-empty finite intersections of open subbase elements span a pointwise open base. 
In Definition 32.3.2 line (32.3.1), the constraints $D \in \mathbb{P}_1^\infty(S)$ and $S \subseteq \mathrm{Top}_x(X)$ force the non-empty finite set
collection $D$ to contain only open neighbourhoods of $x$. So $\bigcap D$ is always a well-defined element of $\mathrm{Top}_x(X)$.


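32.3.2 DEFINITION: An open subbase at a point $x$ in a topological space $X$ is a subset $S$ of $\mathrm{Top}_x(X)$ such
that $\{\bigcap D; D \in \mathbb{P}_1^\infty(S)\}$, the set of finite intersections of elements of $S$, is an open base at $x$. In other
words, $S \in \mathbb{P}(\mathrm{Top}_x(X))$ and


    $\forall \Omega \in \mathrm{Top}_x(X),\ \exists D \in \mathbb{P}_1^\infty(S),\ \bigcap D \subseteq \Omega.$    (32.3.1)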


32.3.3 REMARK: Notation for the set of non-empty finite subsets of a given set. 

As mentioned in Remark 32.1.7, the expression $\mathbb{P}_1^\infty(S)$ (which appears in Definition 32.3.2) denotes the set
$\{D \in \mathbb{P}(S); 1 \le \#(D) < \infty\}$ of non-empty finite subsets of $S$. (See Notation 13.12.5 for $\mathbb{P}_1^\infty(S)$. For basic
properties, see Theorem 13.12.8.)


32.3.4 REMARK: Non-empty finite intersections of open subbase elements span an open base. 
In Definition 32.3.5 line (32.3.2), the constraint $D \in \mathbb{P}_1^\infty(S \cap \mathrm{Top}_x(X))$ forces the non-empty finite set
collection $D$ to contain only open neighbourhoods of $x$. So $\bigcap D$ is always a well-defined element of $\mathrm{Top}_x(X)$.


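32.3.5 DEFINITION: An open subbase for a topological space $X$ is a subset $S$ of $\mathrm{Top}(X)$ such that the set
$\{\bigcap D; D \in \mathbb{P}_1^\infty(S)\}$ is an open base for $X$. In other words, $S \in \mathbb{P}(\mathrm{Top}(X))$ and


    $\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists D \in \mathbb{P}_1^\infty(S \cap \mathrm{Top}_x(X)),\ \bigcap D \subseteq \Omega.$    (32.3.2)


In other words, $\mathrm{Top}(X) = \{\bigcup C; C \in \mathbb{P}(\{\bigcap D; D \in \mathbb{P}_1^\infty(S)\})\}$.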


32.3.6 THEOREM: Any open base for a topological space is an open subbase. 
Let X be a topological space. 


(i) Top(X) is an open subbase for X. 
(ii) If B is an open base for X, then B is an open subbase for X. 


PROOF: Part (i) follows from Definition 32.3.5 by choosing $D = \{\Omega\}$.


Part (ii) follows from Definitions 32.2.3 and 32.3.5 by choosing $D = \{G\}$ for each $\Omega \in \mathrm{Top}_x(X)$, where $G \in B$
satisfies $x \in G$ and $G \subseteq \Omega$.


32.3.7 REMARK: A set-collection is a subbase for a topology if and only if it generates the topology. 
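By combining Theorem 32.2.6 and Definition 32.3.5, one sees that a set $S \in \mathbb{P}(\mathbb{P}(X))$ is an open subbase
for a topological space $(X, T)$ if and only if

    $T = \{\bigcup C;\ C \in \mathbb{P}(\mathbb{P}(X)) \text{ and } \forall G \in C,\ \exists D \in \mathbb{P}_1^\infty(S),\ G = \bigcap D\}$
    $\phantom{T} = \{\bigcup C;\ \forall G \in C,\ \exists D \in \mathbb{P}_1^\infty(S),\ G = \bigcap D\}$
    $\phantom{T} = \{\bigcup C;\ C \subseteq \{\bigcap D; D \in \mathbb{P}_1^\infty(S)\}\}$
    $\phantom{T} = \{\bigcup C;\ C \in \mathbb{P}(\{\bigcap D; D \in \mathbb{P}_1^\infty(S)\})\}.$    (32.3.3)

Note that $C, D, S, T \in \mathbb{P}(\mathbb{P}(X))$ are collections of subsets of $X$, whereas $G \in \mathbb{P}(X)$ is a subset of $X$. Note
also that the open base $B = \{\bigcap D; D \in \mathbb{P}_1^\infty(S)\} \in \mathbb{P}(\mathbb{P}(X))$ is a collection of subsets of $X$, whereas
$\bigcup C \in \mathbb{P}(X)$ and $\bigcap D \in \mathbb{P}(X)$ are subsets of $X$. (See Theorem 8.5.2 for general properties of power sets.)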


The expression in line (32.3.3) is the same as in Theorems 32.1.8 and 32.1.10. Therefore S is an open subbase 
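for $(X, T)$ if and only if $T$ is the topology generated by $S$. This is formalised as Theorem 32.3.10.


32.3.8 THEOREM: A subbase for a topology is a sub-collection which generates the topology.

Let $X$ be a topological space. Let $S \in \mathbb{P}(\mathrm{Top}(X))$. Then $S$ is an open subbase for $X$ if and only if


    $\forall \Omega \in \mathrm{Top}(X),\ \exists \tilde{C} \in \mathbb{P}\bigl(\bigcup_{m=1}^{\infty} S^m\bigr),\ \Omega = \bigcup_{\tilde{D} \in \tilde{C}} \bigcap_{i=1}^{\#(\tilde{D})} \tilde{D}_i.$    (32.3.4)


PROOF: Let $S$ be an open subbase for a topological space $X$. Let $B = \{\bigcap D; D \in \mathbb{P}_1^\infty(S)\}$. Then $B$
is an open base for $X$ by Definition 32.3.5. Let $\Omega \in \mathrm{Top}(X)$. Then $\Omega = \bigcup C$ for some $C \in \mathbb{P}(B)$ by
Theorem 32.2.6. For each $G \in C$, $G = \bigcap D$ for some $D \in \mathbb{P}_1^\infty(S)$. But for each $D \in \mathbb{P}_1^\infty(S)$, there exists
a positive integer $m = \#(D) \in \mathbb{Z}^+$ and a bijection $\tilde{D} : \mathbb{N}_m \to D$. Then $G = \bigcap_{i=1}^{m} \tilde{D}_i$. Let $\tilde{C}$ be the
set of all bijections from sets $\mathbb{N}_m$ to sets $D \in \mathbb{P}_1^\infty(S)$ such that $\bigcap D \in C$, for all $m \in \mathbb{Z}^+$. (Note how the axiom
of choice is avoided here by using all possible such bijections instead of choosing a single bijection for each set $D$.) Then
$\bigcap D = \bigcap_{i=1}^{\#(\tilde{D})} \tilde{D}_i$ for all $\tilde{D} \in \tilde{C}$ which satisfy $\mathrm{Range}(\tilde{D}) = D$, for each $D \in \mathbb{P}_1^\infty(S)$. Since there is at least
one such bijection $\tilde{D}$ for each $D \in \mathbb{P}_1^\infty(S)$, it follows that $\Omega = \bigcup C = \bigcup_{\tilde{D} \in \tilde{C}} \bigcap \mathrm{Range}(\tilde{D}) = \bigcup_{\tilde{D} \in \tilde{C}} \bigcap_{i=1}^{\#(\tilde{D})} \tilde{D}_i$. Hence
line (32.3.4) holds with $\tilde{C} = \bigcup \{\mathrm{Bij}(\mathbb{N}_m, D);\ m \in \mathbb{Z}^+ \text{ and } D \in \mathbb{P}_1^\infty(S) \text{ and } \bigcap D \in C\}$.

To show the converse, let $X$ be a topological space. Suppose that $S \in \mathbb{P}(\mathrm{Top}(X))$ satisfies line (32.3.4). Let
$x \in X$ and $\Omega \in \mathrm{Top}_x(X)$. Then $\Omega = \bigcup_{\tilde{D} \in \tilde{C}} \bigcap_{i=1}^{\#(\tilde{D})} \tilde{D}_i$ for some $\tilde{C} \in \mathbb{P}(\bigcup_{m=1}^{\infty} S^m)$. So $x \in \bigcap_{i=1}^{\#(\tilde{D})} \tilde{D}_i$ for some
$\tilde{D} \in \tilde{C}$. Then $x \in \bigcap D$, where $D = \mathrm{Range}(\tilde{D}) \in \mathbb{P}_1^\infty(S)$. Let $B = \{\bigcap D; D \in \mathbb{P}_1^\infty(S)\}$. Let $G = \bigcap D$. Then
$x \in G$ and $G \in B$. But $G \subseteq \Omega$ by Theorem 8.4.8 (xiv). Consequently for all $x \in X$, for all $\Omega \in \mathrm{Top}_x(X)$,
for some $G \in B$, $x \in G$ and $G \subseteq \Omega$. Therefore $B$ is an open base for $X$ by Definition 32.2.3. Hence $S$ is an
open subbase for $X$ by Definition 32.3.5.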


32.3.9 REMARK: Diagram for the two-level hierarchy which generates a topology from an open subbase. 
The two-level hierarchy in Definition 32.3.5 is illustrated in Figure 32.3.1. The subbase elements $E_{i,j,k} \in S$ are
intersected to construct sets $\bigcap D_{i,j} = E_{i,j,1} \cap E_{i,j,2} \cap \ldots$, where $D_{i,j} = \{E_{i,j,1}, E_{i,j,2}, \ldots\}$. The set intersections
$\bigcap D_{i,j}$ are joined to construct sets $\bigcup C_i = (\bigcap D_{i,1}) \cup (\bigcap D_{i,2}) \cup \ldots$, where $C_i = \{\bigcap D_{i,1}, \bigcap D_{i,2}, \ldots\}$. Then
the topology $T = \{\bigcup C_1, \bigcup C_2, \ldots\}$ is the set of all such unions $\bigcup C_i$.


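[Diagram: the unions $\bigcup C_1, \bigcup C_2, \ldots$ in the top row are joined to the intersections $\bigcap D_{i,j}$ in the middle row,
which are joined to the subbase elements $E_{i,j,k}$ in the bottom row.]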


Figure 32.3.1 Two-level hierarchy of construction of topology from an open subbase 


32.3.10 THEOREM: An open subbase is a set-collection which generates the topology.
Let $X$ and $S$ be sets satisfying $\{\emptyset, X\} \subseteq S \subseteq \mathbb{P}(X)$. Let $T$ be a topology on $X$. Then $S$ is an open subbase
for the topological space $(X, T)$ if and only if $T$ is the topology generated by $S$ on $X$.


PROOF: Let $X$ and $S$ be sets satisfying $\{\emptyset, X\} \subseteq S \subseteq \mathbb{P}(X)$. Let $T$ be a topology on $X$. Then $X = \bigcup S$
because $X \in S \subseteq \mathbb{P}(X)$. So by Theorem 32.1.10, the topology generated by $S$ on $X$ is
the set $T(S) = \{\bigcup C; C \in \mathbb{P}(\{\bigcap D; D \in \mathbb{P}_1^\infty(S)\})\}$. But as mentioned in Remark 32.3.7, the equality $T = T(S)$ is
equivalent to the condition in Definition 32.3.5 for $S$ to be an open subbase for $(X, T)$.


32.4. Generation of topologies from inverses of functions 


32.4.1 REMARK: Avoiding axiom of choice in a topology proof. 

In the proof of Theorem 32.4.2, the axiom of choice is avoided by defining the collection of open sets $C'$ to
be the set of all $\Omega \in T_Y$ such that $f^{-1}(\Omega) \in C$ instead of selecting a single open set $\Omega \in T_Y$ for each element
of $C$. So no choice function is required.


32.4.2 THEOREM: Generation of a topology from inverse images of a single function.
Let $(Y, T_Y)$ be a topological space. Let $f : X \to Y$ be a function from $X$ to $Y$. Then $T_X = \{f^{-1}(\Omega); \Omega \in T_Y\}$
is a topology for $X$.


PROOF: $\emptyset, X \in T_X$ since $\emptyset = f^{-1}(\emptyset)$ and $X = f^{-1}(Y)$. If $G_1, G_2 \in T_X$, then $G_1 = f^{-1}(\Omega_1)$ and
$G_2 = f^{-1}(\Omega_2)$ for some $\Omega_1, \Omega_2 \in T_Y$. So by Theorem 10.6.10 (iv), $G_1 \cap G_2 = f^{-1}(\Omega_1 \cap \Omega_2) \in T_X$. Now let
$C \in \mathbb{P}(T_X)$. Let $C' = \{\Omega \in T_Y; f^{-1}(\Omega) \in C\}$. Then $C = \{f^{-1}(\Omega); \Omega \in C'\}$. So by Theorem 10.7.6 (iii),
$\bigcup C = \bigcup \{f^{-1}(\Omega); \Omega \in C'\} = f^{-1}(\bigcup \{\Omega; \Omega \in C'\}) = f^{-1}(\bigcup C')$. Then $\bigcup C \in T_X$ because $\bigcup C' \in T_Y$.
Hence $T_X$ satisfies the conditions of Definition 31.3.2 for a topology.
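
A finite instance of Theorem 32.4.2 can be computed directly. In the Python sketch below, which is an illustration only, the function `f` is modelled as a dictionary, and the names `preimage`, `preimage_topology` and `is_topology` are hypothetical helpers rather than anything defined in the text.

```python
def is_topology(T, X):
    """Finite check of the topology axioms for T on X."""
    T = set(T)
    if frozenset() not in T or frozenset(X) not in T:
        return False
    return all(a & b in T and a | b in T for a in T for b in T)

def preimage(f, U, X):
    """The inverse image of the set U under the function f with domain X."""
    return frozenset(x for x in X if f[x] in U)

def preimage_topology(f, X, T_Y):
    """The collection of inverse images of the open sets in T_Y."""
    return {preimage(f, U, X) for U in T_Y}
```

For example, if `Y = {"a", "b"}` carries the topology consisting of the empty set, `{"a"}` and `Y`, and `f` maps 0 and 1 to `"a"` and maps 2 and 3 to `"b"`, then the result on `X = {0, 1, 2, 3}` consists of the empty set, `{0, 1}` and `X`.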


32.4.3 REMARK:  Non-generation of a topology from forward images of functions. 

Theorem 32.4.2 does not have a forward analogue which would state that $(Y, T_Y)$ is a topological space with $T_Y =
\{f(\Omega); \Omega \in T_X\}$ for a topological space $(X, T_X)$ with $f : X \to Y$. The reason that the inverse map
$f^{-1}$ works so well is that an inverse function is always injective and surjective, which causes the set map
$f^{-1} : \mathbb{P}(Y) \to \mathbb{P}(X)$ to send intersections to intersections and unions to unions. (See Theorems 10.6.7,
10.6.10 and 10.7.6 for the relevant properties.)
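
A small counterexample makes the failure of the forward analogue concrete. The following Python sketch uses a hypothetical four-point space and a non-injective map chosen for illustration; the names `image`, `is_topology`, `T_X`, `f` and `forward` are assumptions of this example, not notation from the text.

```python
def is_topology(T, X):
    """Finite check of the topology axioms for T on X."""
    T = set(T)
    if frozenset() not in T or frozenset(X) not in T:
        return False
    return all(a & b in T and a | b in T for a in T for b in T)

def image(f, U):
    """The forward image of the set U under the function f."""
    return frozenset(f[x] for x in U)

X = frozenset({0, 1, 2, 3})
T_X = {frozenset(), frozenset({0, 1}), frozenset({2, 3}), X}
f = {0: "a", 1: "b", 2: "b", 3: "c"}   # surjective onto Y, but not injective
Y = frozenset(f.values())

# Forward images of the open sets of X.
forward = {image(f, U) for U in T_X}
```

Here `forward` contains `{"a", "b"}` and `{"b", "c"}` but not their intersection `{"b"}`, so it is not a topology on `Y` even though `f` is surjective.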


32.4.4 REMARK: Generation of a topology from inverse images from multiple topological spaces. 
Theorem 32.4.2 is generalised in Theorem 32.4.6 to an arbitrary family of functions and target topologies. 
This is useful for proving the validity of topologies on direct products of topological spaces. 


32.4.5 REMARK: Methods of generating topologies from function-families and set-collections. 
There are strong similarities between the generated topology constructions in Theorems 32.1.8, 32.1.15, 
32.4.6, 32.12.3, 32.14.7 and 49.8.8, but they have significant differences. 


(1) Arbitrary set-collection. Generated topology = set of all unions of intersections.
In Theorem 32.1.8, an arbitrary collection of sets $C$ induces a topology on $\bigcup C$ as the set of all unions
of intersections.


(2) Arbitrary function-family. Generated topology = set of all unions of intersections.
In Theorem 32.4.6, an arbitrary family of functions induces a topology on a set X by forming the unions 
of intersections of inverse images of open sets via the functions. 
In Theorem 32.12.3, the family of projections of a Cartesian set-product induces a topology on the 
product by forming the unions of intersections of inverse images of open sets via the projections. 


(3) Topologically consistent set-collection. Generated topology = set of all unions.
In Theorem 32.1.15, the topology generated by a set-collection is the set of all unions of sets in the 
collection if it is closed under finite intersections. 
In Theorem 32.14.7, a topology is constructed on the base set X as the set of all unions of open subsets 
of overlapping sets which cover X and have consistent topologies on overlaps. 


(4) Topologically consistent function-family. Generated topology = set of all unions.
In Theorem 49.8.8, a topology is constructed on the base set M as the set of unions of inverse images 
of partial functions on M satisfying the consistency condition in Definition 49.8.2 (iv). 


Cases (3) and (4) both require prior consistency between contributing topologies, which explains why their 
constructions are expressed in terms of simple unions of sets, not unions of intersections. Thus the generated 
topologies in cases (1) and (2) are constructed from open subbases, whereas the generated topologies in cases 
(3) and (4) are constructed from open bases. 


32.4.6 THEOREM: Generation of a topology from inverse images of many functions. 
Let $(Y_i, T_i)_{i \in I}$ be a family of topological spaces for some non-empty index set $I$. Let $X$ be a set and let
$f_i : X \to Y_i$ be a function for all $i \in I$. Define


    $T_X = \{\bigcup C; C \in \mathbb{P}(T'_X)\}$,
where
    $T'_X = \{\bigcap_{j \in J} f_j^{-1}(\Omega_j);\ J \in \mathbb{P}_1^\infty(I) \text{ and } \forall j \in J,\ \Omega_j \in T_j\}$.


Then $T_X$ is a topology on $X$.


PROOF: $\emptyset, X \in T_X$ because $\emptyset = \bigcup \emptyset$ and $X = f_i^{-1}(Y_i)$ for any $i \in I$. Let $S = \bigcup_{i \in I} \{f_i^{-1}(\Omega_i); \Omega_i \in T_i\} =
\{f_i^{-1}(\Omega_i); i \in I, \Omega_i \in T_i\}$ and $S' = \{\bigcap D; D \in \mathbb{P}_1^\infty(S)\}$. Clearly $T'_X \subseteq S'$. Since any element of $S'$ is a
finite intersection of sets $f_i^{-1}(\Omega_i)$, there must be only a finite number of these sets for each $i \in I$. In other
words, any element $\bigcap D$ of $S'$ must be expressible as $\bigcap_{j \in J} \bigcap_{k \in K_j} f_j^{-1}(\Omega_{j,k})$ for some finite subset $J$ of $I$,
and a finite family $(\Omega_{j,k})_{k \in K_j}$ for each $j \in J$. But $\bigcap_{k \in K_j} f_j^{-1}(\Omega_{j,k}) = f_j^{-1}(\bigcap_{k \in K_j} \Omega_{j,k}) = f_j^{-1}(\Omega_j)$ for some
$\Omega_j \in T_j$, for each $j \in J$. So $\bigcap D \in T'_X$. Therefore $T'_X = S'$. Hence $T_X$ is the topology generated by $S$ on
$X$ by Theorem 32.1.10.
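
The construction in Theorem 32.4.6 can be tried out on a small product set. The following Python sketch is an illustration under assumptions: each map is given as a pair `(f, T)` of a dictionary and a finite collection of open sets of its target space, and the names `subcollections`, `weak_topology` and `is_topology` are hypothetical. It forms all non-empty finite intersections of the subbase $S$ from the proof, which by the equality $T'_X = S'$ gives the same result as the formula for $T'_X$.

```python
from itertools import chain, combinations

def subcollections(items):
    """All finite subcollections of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))

def is_topology(T, X):
    """Finite check of the topology axioms for T on X."""
    T = set(T)
    if frozenset() not in T or frozenset(X) not in T:
        return False
    return all(a & b in T and a | b in T for a in T for b in T)

def weak_topology(X, maps):
    """Unions of non-empty finite intersections of inverse images of open sets.

    maps is a list of pairs (f, T), where f is a dict from X to a set Y_i
    and T is a collection of frozensets (the open sets of Y_i)."""
    X = frozenset(X)
    subbase = {frozenset(x for x in X if f[x] in U) for f, T in maps for U in T}
    base = {frozenset.intersection(*D) for D in subcollections(subbase) if D}
    return {frozenset().union(*C) for C in subcollections(base)}
```

For instance, on the four-point set of pairs with entries 0 or 1, the two coordinate projections to `{0, 1}` with open sets consisting of the empty set, `{1}` and `{0, 1}` should yield six open sets, one of which is the single point `(1, 1)`.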


32.4.7 REMARK: Properties of the weak topology generated by a family of maps and topological spaces. 
Theorem 32.4.6 shows that the set $T_X$ in Definition 32.4.8 is a topology on $X$. Any topology on $X$ which
contains all of the sets $f_i^{-1}(\Omega_i)$ must include the weak topology on $X$. It is shown in Theorem 32.8.12 that
$T_X$ is the weakest topology on $X$ for which all of the functions $f_i : X \to Y_i$ are continuous. (Definition 32.4.8
is used in Definition 32.12.2 for product topologies. A differentiable manifold version of Theorem 32.4.6 is
given as Theorem 51.5.7.)


32.4.8 DEFINITION: Weak topology generated by family of maps to topological spaces. 
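The weak topology on a set $X$ generated by a non-empty family of maps $(f_i)_{i \in I}$ to topological spaces $(Y_i, T_i)_{i \in I}$
is the set


    $T_X = \{\bigcup C; C \in \mathbb{P}(T'_X)\}$,
where
    $T'_X = \{\bigcap_{j \in J} f_j^{-1}(\Omega_j);\ J \in \mathbb{P}_1^\infty(I) \text{ and } \forall j \in J,\ \Omega_j \in T_j\}$.

In other words,


    $T_X = \{\bigcup C;\ C \in \mathbb{P}(\{\bigcap_{j \in J} f_j^{-1}(\Omega_j);\ J \in \mathbb{P}_1^\infty(I) \text{ and } (\Omega_j)_{j \in J} \in \times_{j \in J} T_j\})\}$.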


32.5. Real-number topology 


32.5.1 REMARK: Real-number topology is the starting-point for numerous other topologies. 

In practice, the topology of the real numbers is by far the most important of all topologies. In fact, the real
number system itself has been by far the most important mathematical system since it was axiomatically
formalised in the late 19th century, usurping the role of Euclidean geometry as the most important pillar of
mathematics. Topologies which are not based on the standard real-number topology are of relatively
minor interest. Even the integer, rational and complex number topologies are derived from the real numbers.


Topologies of manifolds are built from Cartesian space topology, which is built from real-number topology. 
The topologies of topological linear spaces, including finite-dimensional linear spaces, are almost entirely built 
from real-number topology. Thus real-number topology is the core topology built into the vast majority of 
applicable topologies, in the same way that the real number system is the core number system underlying 
most number systems. Consequently, the topology of the real numbers is not merely one single example of 
a topology. It is in fact the quintessential topology which is woven into all of mathematical analysis. Hence 
it deserves close attention and scrutiny, and deep understanding. 


32.5.2 REMARK: Basic properties of the usual topology on the integers. 

The usual topology for the integers in Definition 32.5.3 is the only topology on the integers which is 
translation-invariant and contains at least one non-empty finite set. (See Remark 31.11.3 for discussion 
of this.) This usual topology is the same as the relative topology induced by the set of real numbers in 
Definition 32.5.7. It is also the same as the discrete topology in Definition 31.3.19. The same comments 
apply to the non-negative integers. 


The extended integers are more interesting. The topologies in Definition 32.5.4 are the standard two-point 
and one-point compactifications of the topologies in Definition 32.5.3. 


32.5.3 DEFINITION: The usual topology on the integers is the set of all subsets of the integers. In other
words, $\mathrm{Top}(\mathbb{Z}) = \mathbb{P}(\mathbb{Z})$.


The usual topology on the non-negative integers is the set of all subsets of the non-negative integers. In other
words, $\mathrm{Top}(\mathbb{Z}_0^+) = \mathbb{P}(\mathbb{Z}_0^+)$.


32.5.4 DEFINITION: The usual topology on the extended integers is the set of all subsets of the integers
together with all complements of finite subsets of the integers. In other words,


    $\mathrm{Top}(\overline{\mathbb{Z}}) = \mathbb{P}(\mathbb{Z}) \cup \{S \in \mathbb{P}(\overline{\mathbb{Z}});\ \#(\mathbb{Z} \setminus S) < \infty\}$.


The usual topology on the non-negative extended integers is the set of all subsets of the non-negative integers
together with all complements of finite subsets of the non-negative integers. In other words,


    $\mathrm{Top}(\overline{\mathbb{Z}}_0^+) = \mathbb{P}(\mathbb{Z}_0^+) \cup \{S \in \mathbb{P}(\overline{\mathbb{Z}}_0^+);\ \#(\mathbb{Z}_0^+ \setminus S) < \infty\}$.


32.5.5 REMARK: The usual topology on the real numbers requires only the total order structure. 

The usual topology on the real numbers can be defined in a relatively elementary way without reference 
to metric spaces or open bases. The usual total order on the real numbers is a sufficient foundation for 
Definition 32.5.7. (Open intervals for general totally ordered sets are defined in Definition 11.5.10. Real 
numbers are introduced in Section 15.3.) 


For good measure, the usual topology on the set of rational numbers is also given here. It has the same 
formal structure as the usual topology on the real numbers, and it is equal to the relative topology induced 
by the usual topology on the set of real numbers. 


32.5.6 DEFINITION: The usual topology on the rational numbers is the set of all unions of open intervals 
of the rational numbers. 


32.5.7 DEFINITION: The usual topology on the real numbers is the set of all unions of open intervals of the 
real numbers. 


32.5.8 REMARK: The topology of the real numbers is generated from open intervals. 
The usual topology for the real numbers is the topology generated by the set of all real open intervals. (By 
Theorem 32.7.8, each open set of real numbers is the union of a countable sequence of disjoint open intervals.) 


Definition 32.5.7 is not as circular as it looks. In Definition 16.1.5, an “open interval” is defined as a set
of the form $(a, b) = \{x \in \mathbb{R}; a < x < b\}$, where $a, b \in \mathbb{R}$ with $a < b$. This definition uses only the order
structure on $\mathbb{R}$. It turns out that open intervals are indeed open sets in the usual topology of $\mathbb{R}$ as one would
expect. Some topological properties of real number intervals are presented in Section 34.9.


The topological and order properties of the real numbers are closely interlinked. This is shown especially by 
the two major methods for constructing the real numbers, namely Dedekind cuts (which use a total order) 
and Cauchy sequences (which use a distance function). These are discussed in Section 15.3. 


32.5.9 THEOREM: Every point of an open set of real numbers is in an open interval included in the set. 
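    $\forall \Omega \in \mathrm{Top}(\mathbb{R}),\ \forall a \in \Omega,\ \exists \varepsilon \in \mathbb{R}^+,\ (a - \varepsilon, a + \varepsilon) \subseteq \Omega.$


PROOF: Let $\Omega \in \mathrm{Top}(\mathbb{R})$. Then $\Omega = \bigcup S$ for some set $S$ of open intervals of $\mathbb{R}$. Let $a \in \Omega$. Let
$S_a = \{I \in S; a \in I\}$. Then $S_a \ne \emptyset$. Let $J = \bigcup S_a$. Then $J$ is a real-number interval and $a \in J$. Let
$I \in S_a$. Then $\inf(I) < a$ because $I$ is a non-empty open interval. Let $\varepsilon_1 = a - \inf(J)$. Then $\varepsilon_1 \in \mathbb{R}^+$
because $\inf(J) \le \inf(I)$ for all $I \in S_a$. Similarly, $\sup(J) \ge a + \varepsilon_2$, where $\varepsilon_2 = \sup(J) - a \in \mathbb{R}^+$. Let
$\varepsilon = \min(1, \min(\varepsilon_1, \varepsilon_2))$. Then $\varepsilon \in \mathbb{R}^+$ and $(a - \varepsilon, a + \varepsilon) \subseteq \Omega$.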
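
The choice of $\varepsilon$ in this proof can be computed explicitly when the open set is given as a finite union of open intervals. The Python sketch below is an illustration under assumptions: intervals are modelled as pairs `(lo, hi)`, possibly with infinite end-points, and the function name `epsilon_radius` is hypothetical. The cap at 1 keeps the result finite when an interval is unbounded.

```python
def epsilon_radius(intervals, a):
    """Return eps > 0 such that (a - eps, a + eps) is included in the union
    of the given open intervals, following the proof of Theorem 32.5.9."""
    # S_a: the intervals which contain the point a.
    S_a = [(lo, hi) for lo, hi in intervals if lo < a < hi]
    if not S_a:
        raise ValueError("a is not in the union of the intervals")
    # J is the union of S_a, which is an interval because all members contain a.
    inf_J = min(lo for lo, _ in S_a)
    sup_J = max(hi for _, hi in S_a)
    eps1 = a - inf_J
    eps2 = sup_J - a
    return min(1, eps1, eps2)
```

For the union of `(0, 2)`, `(1, 5)` and `(7, 9)`, the point `a = 0.25` gives `eps = 0.25`, while the point `a = 1.5` gives the capped value `eps = 1`.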


32.5.10 REMARK: Containment of extrema in non-empty, bounded, closed sets of real numbers. 

Theorem 32.5.11 shows that the maximum (or minimum) of any non-empty, bounded above (or below), 
closed set of real numbers is a member of the set. The non-emptiness and boundedness are required for the 
well-definition of these extrema. It is the closure which implies containment. It is shown in Theorem 37.7.7 
that a subset of R is closed and bounded (both above and below) if and only if it is compact. 


32.5.11 THEOREM: Some topological properties of minima and maxima of real-number sets. 
Let $S$ be a subset of $\mathbb{R}$.


(i) If $S$ is non-empty and bounded above, then $\max(S)$ is a limit point of $S$.
(ii) If $S$ is non-empty, closed and bounded above, then $\max(S) \in S$.
(iii) If $S$ is non-empty and bounded below, then $\min(S)$ is a limit point of $S$.
(iv) If $S$ is non-empty, closed and bounded below, then $\min(S) \in S$.


PROOF: For part (i), let $S \in \mathbb{P}(\mathbb{R})$ be non-empty and bounded above. Then $\max(S)$ is a well-defined
element of $\mathbb{R}$ by Theorem 15.9.3 (xv), Notation 11.2.16 and Definition 11.2.4. So by Theorem 16.1.18 (ii),
$\forall \varepsilon \in \mathbb{R}^+,\ S \cap (\max(S) - \varepsilon, \max(S)] \ne \emptyset$. Therefore $\max(S)$ is a limit point of $S$ by Definition 31.10.2.


Part (ii) follows from part (i) and Theorem 31.10.12 (vi). 
Part (iii) may be proved similarly to part (i). 
Part (iv) follows from part (iii) and Theorem 31.10.12 (vi). 


32.6. Cartesian space topology 


32.6.1 DEFINITION: The usual topology for $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$ is the topology generated by the set of all
Cartesian products of real open intervals.


32.6.2 REMARK: The topology on Cartesian products of real numbers is generated from open intervals.
The Cartesian products of real open intervals which are referred to in Definition 32.6.1 are sets of the form
$\times_{i=1}^{n} (a_i, b_i)$, where $(a_i)_{i=1}^{n}, (b_i)_{i=1}^{n} \in \mathbb{R}^n$ are sequences of real numbers such that $a_i < b_i$ for all $i \in \mathbb{N}_n$. The
usual topology on $\mathbb{R}^n$ is the same as the product topology for the set product $\mathbb{R}^n = \times_{i=1}^{n} \mathbb{R}$. (See Definitions
32.9.4 and 32.12.2 for general product topologies.)


The Cartesian topological spaces in Definition 32.6.3 are Cartesian tuple spaces as in Definition 16.4.1,
together with the usual topology in Definition 32.6.1. Theorem 32.6.4 is an n-dimensional version of
Theorem 32.5.9.


32.6.3 DEFINITION: The Cartesian topological space with dimension n ∈ ℤ⁺₀ is the topological space
(ℝⁿ, T_{ℝⁿ}), where T_{ℝⁿ} is the usual topology for ℝⁿ.


32.6.4 THEOREM: Points in open subsets of Cartesian spaces have open-interval neighbourhoods.
∀n ∈ ℤ⁺₀, ∀Ω ∈ Top(ℝⁿ), ∀a ∈ Ω, ∃ε ∈ ℝ⁺, ×_{i=1}^n (a_i − ε, a_i + ε) ⊆ Ω.


PROOF: Let n ∈ ℤ⁺₀ and Ω ∈ Top(ℝⁿ). Then Ω = ⋃S for some set S of Cartesian products of open
intervals of ℝⁿ. Let a ∈ Ω. Then ε₀ = sup{ε ∈ (0, 1]; ∃I ∈ S, ×_{i=1}^n (a_i − ε, a_i + ε) ⊆ I} is well defined and
ε₀ ∈ (0, 1] because a ∈ I for some I ∈ S. Let ε = ½ε₀. Then ×_{i=1}^n (a_i − ε, a_i + ε) ⊆ Ω.


32.6.5 REMARK: The standard topology on a finite-dimensional linear space. 

The topology in Definition 32.6.1 may be imported onto any finite-dimensional linear space via a coordinate 
chart as in Definition 32.6.6. The construction for this importation is given in Theorem 32.4.2. (See 
Definitions 22.8.6, 22.8.7 and 22.8.8 for coordinate maps.) The imported topology is independent of the 
choice of basis for the chart. (This follows easily from the basic properties of the vector component transition 
maps in Definition 22.9.4.) 


32.6.6 DEFINITION: The standard topology for a finite-dimensional linear space F is the topology on F
given by {κ_B⁻¹(Ω); Ω ∈ Top(ℝⁿ)} for any basis B for F, where n = dim(F) and κ_B is the component map
for B on F.


32.6.7 REMARK: The topology for general linear groups is imported from Cartesian spaces. 

In Definition 32.6.8, the standard topology for the general linear group GL(F) of a finite-dimensional linear
space is imported from the parameter space ℝ^{n×n} for n × n matrices via the linear-space component maps
κ_{B,B} in Definition 23.2.8. Since the image of κ_{B,B} is a proper subset of ℝ^{n×n} when n ≥ 2, the relative
topology on the set of invertible matrices in ℝ^{n×n} is used. The set of invertible matrices is an open subset
of the set of all matrices with respect to the topology on M_{n,n}(ℝ) which is imported from ℝ^{n×n} via the
component maps κ_{B,B}.


32.6.8 DEFINITION: The standard topology for the general linear group GL(F) of a finite-dimensional linear
space F is the topology on GL(F) given by {κ_{B,B}⁻¹(Ω); Ω ∈ Top(ℝ^{n×n})} for any basis B for F, where
n = dim(F) and κ_{B,B} is the linear-map component map for B on F as in Definition 23.2.8.


32.6.9 THEOREM: The closure of the rational numbers in the real numbers is the whole space.


ℚ̄ = ℝ.


PROOF: Let x ∈ ℝ. Let Ω ∈ Top_x(ℝ). Then (x − ε, x + ε) ⊆ Ω for some ε ∈ ℝ⁺. Let N = ceiling(ε⁻¹).
Then ε⁻¹ ≤ N. So N⁻¹ ≤ ε. Let q = ceiling(Nx)/N. Then Nq = ceiling(Nx). So Nx ≤ Nq < Nx + 1.
Therefore x ≤ q < x + N⁻¹ ≤ x + ε. So q ∈ (x − ε, x + ε). But q ∈ ℚ. So ℚ ∩ Ω ≠ ∅. Therefore x ∈ ℚ̄ by
Theorem 31.8.17 (ii). Hence ℚ̄ = ℝ.
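

The construction q = ceiling(Nx)/N in this proof can be tried out numerically. The following is a minimal sketch, not part of the original text. The function name rational_near is hypothetical, and a floating-point number is used only as a stand-in for a real number x. The arithmetic is done exactly with rational numbers.

```python
from fractions import Fraction
from math import ceil, sqrt


def rational_near(x, eps):
    """Return q = ceiling(N*x)/N with N = ceiling(1/eps), as in the proof."""
    x = Fraction(x)      # exact value of the input (floats convert exactly)
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = ceil(1 / eps)    # N >= 1/eps, hence 1/N <= eps
    return Fraction(ceil(n * x), n)


x = sqrt(2)              # floating-point stand-in for a real number
eps = Fraction(1, 1000)
q = rational_near(x, eps)
assert Fraction(x) <= q < Fraction(x) + eps   # x <= q < x + eps
print(q)                 # prints 283/200
```

The assertion checks the chain x ≤ q < x + ε for the sample point only. It is an illustration of the proof, not a substitute for it.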


32.6.10 THEOREM: The closure of the rational tuples in a real tuple-space is the whole space.
ℚ̄ⁿ = ℝⁿ for all n ∈ ℤ⁺₀, where ℚ̄ⁿ denotes the closure of ℚⁿ in ℝⁿ.


PROOF: For the case n = 0, ℚ̄⁰ = {∅} = ℝ⁰. The case n = 1 follows from Theorem 32.6.9. For n ≥ 2, the
method of Theorem 32.6.9 may be applied to each component x_i of x ∈ ℝⁿ for i ∈ ℕ_n. This yields a tuple
q = (q_i)_{i=1}^n such that x_i ≤ q_i < x_i + ε for all i ∈ ℕ_n. Then q ∈ ×_{i=1}^n (x_i − ε, x_i + ε). Thus q ∈ ℚⁿ ∩ Ω for
any given Ω ∈ Top_x(ℝⁿ). Hence ℚ̄ⁿ = ℝⁿ for all n ∈ ℤ⁺₀.


32.6.11 REMARK: The usual topology for the complex numbers.
The usual topology on the complex numbers is obtained via the standard identification of ℂ with ℝ². (See
Section 16.8 for the complex numbers. See Definition 32.6.1 for the usual topology on ℝ².)


32.6.12 DEFINITION: The usual topology on the complex numbers is the set of all unions of open intervals
of the complex numbers, identified with ℝ².


32.6.13 REMARK: Definition of circle topology on semi-open intervals in terms of set translates.
The notation Ω + L in Definition 32.6.14 means the set {x + L; x ∈ Ω}. The torus topological spaces in
Definition 32.6.15 are defined as topological space products of circle topological spaces.


32.6.14 DEFINITION: The circle topology for a finite semi-open real-number interval I is the topology


{(Ω ∪ (Ω + L)) ∩ I; Ω ∈ T},


where the semi-open interval is I = [a, a + L) or I = (a, a + L] for some a ∈ ℝ and L ∈ ℝ⁺, and T is the
usual topology on ℝ.


32.6.15 DEFINITION: The torus topology for a finite product of finite semi-open real-number intervals is the
product topology for a set ×_{k=1}^n I_k, where for each k ∈ ℕ_n, the set I_k is a semi-open interval I_k = [a_k, a_k + L_k)
or I_k = (a_k, a_k + L_k] for some a_k ∈ ℝ and L_k ∈ ℝ⁺, and the topology T_k on each interval I_k is the circle
topology on I_k.


32.7. Real-number open set component enumerations 


32.7.1 REMARK: Expressing open sets of real numbers as sequences of disjoint open intervals.
The existence of enumerations of the open intervals of which an open set of real numbers is composed has 
some significance for measure theory, for example in Sections 45.5 and 45.7. 


Theorem 32.7.8 asserts that every open set of real numbers is equal to a countable disjoint union of open
intervals. (Proofs of Theorem 32.7.8 are given by Kolmogorov/Fomin [104], page 51; Thomson/Bruckner/
Bruckner [149], pages 155-157; Shilov [135], pages 62-63; A.E. Taylor [145], page 55; Willard [165], page 18.
For the extension to Cartesian spaces ℝⁿ, the use of Zorn's lemma is suggested by Kolmogorov/Fomin [104],
page 55. However, Theorem 34.8.2 confirms that no axiom of choice is required for ℝⁿ.) Definition 32.7.2 is
generalised from the real numbers to any topological space by Definition 34.6.16. Theorem 32.7.8 is
generalised from the real numbers to general locally connected separable topological spaces by Theorem 34.8.2.


To enumerate the components of an open set of real numbers, one's first instinct might be to list them in
left-to-right order, but this clearly will not work for open sets like ℝ \ ℤ or {x ∈ ℝ \ ℤ; sin(1/frac(x)) < 0}.


If one restricts the focus to open subsets of a bounded interval J of ℝ, one may enumerate the intervals of an
open set by successively listing the intervals whose length lies in the interval (L2^{−k−1}, L2^{−k}] for k ∈ ω, where
L is the length of the interval J. This list is finite for each k ∈ ω. Thus one may generate an enumeration
of all of the component intervals of the set. This approach is not so feasible for the Cartesian products ℝⁿ
of ℝ, however, because in place of the length of open-set components as a criterion for sorting, one would
need to use some other attribute, such as the Lebesgue measure. But then it would not be possible to exploit
the countable-components property of open sets when building Lebesgue measure theory. Therefore a more
general and systematic strategy is usually adopted, which exploits the separability and local connectedness
of the real numbers.


To avoid using the excessively mystical axiom of choice, it is important to check that each stage in the 
construction of an enumeration of the components of an open set is expressible in terms of set-theoretic 
functions. The usual strategy is as follows. 


(1) Construct an explicit enumeration of a countable dense subset of a given topological space, which is in 
this case the real number system. (The usual choice of countable dense subset is the rational numbers, 
although the set of binary or decimal fractions would be suitable also. See Remark 15.0.2 for some choices 
for countable dense subsets. An explicit enumeration of the rational numbers is given in Remark 15.2.4.) 


(2) Using an explicit enumeration (x_i)_{i∈ω} of such a dense subset, construct the connected component of a
given open set Ω which contains any given point x_i of the enumeration for i ∈ ω. (This construction is
presented in Theorem 32.7.4 and Definition 32.7.5.)


(3) By countable induction, construct an explicit enumeration of the component intervals of an open set Ω
which associates each integer i ∈ ω with the first new component interval which was not “landed on”
by earlier elements in the dense subset enumeration.


The most mystical aspect of this procedure is the use of countable induction. (In other words, it is not very 
mystical at all.) This procedure is also used for the more general Theorem 34.7.11. 


32.7.2 DEFINITION: An open interval enumeration for a set Ω ∈ Top(ℝ) is a map f : I → Top(ℝ), for
some I ∈ ω⁺, such that


(i) f(i) is an open interval of ℝ for all i ∈ I,
(ii) ⋃_{i∈I} f(i) = Ω,
(iii) ∀i, j ∈ I, (i ≠ j ⇒ f(i) ∩ f(j) = ∅).


32.7.3 REMARK: The convex span of a pair of real numbers.


The notation [[x, y]] in Theorem 32.7.4 means the closed interval [x, y] if x ≤ y, or the closed interval [y, x]
if y ≤ x. In other words, [[x, y]] = [min(x, y), max(x, y)]. (See Notation 16.1.15.)


32.7.4 THEOREM: The set of points connected to a point within an open set is a maximal open interval.
Let Ω ∈ Top(ℝ) and x ∈ Ω. Let Y = {y ∈ Ω; [[x, y]] ⊆ Ω}.


(i) Y is an interval of ℝ.
(ii) Y is an open subset of ℝ.
(iii) If Z is an open interval of ℝ with Y ⊆ Z ⊆ Ω, then Z = Y.


PROOF: For part (i), let Ω ∈ Top(ℝ) and x ∈ Ω. Let Y = {y ∈ Ω; [[x, y]] ⊆ Ω}. Let y₁, y₂ ∈ Y
with y₁ ≤ y₂. If y₂ ≤ x, then [y₁, y₂] ⊆ [y₁, x] ⊆ Ω. If x ≤ y₁, then [y₁, y₂] ⊆ [x, y₂] ⊆ Ω. If y₁ ≤ x ≤ y₂,
then [y₁, y₂] = [y₁, x] ∪ [x, y₂] ⊆ Ω. So Y is a real-number interval by Definition 16.1.4.


For part (ii), let y ∈ Y. Then (y − δ, y + δ) ⊆ Ω for some δ ∈ ℝ⁺ because Ω is open. Let z ∈ (y − δ, y + δ).
Then [[x, z]] ⊆ [[x, y]] ∪ (y − δ, y + δ) ⊆ Ω. So z ∈ Y. Therefore (y − δ, y + δ) ⊆ Y. Hence Y is an open
subset of ℝ.


For part (iii), let Z be an open interval of ℝ with Y ⊆ Z ⊆ Ω. Let z ∈ Z. Then [[x, z]] ⊆ Z ⊆ Ω because
x ∈ Y ⊆ Z and Z is an interval. So z ∈ Y. Therefore Z ⊆ Y. Hence Z = Y.


32.7.5 DEFINITION: The open interval component of an element x in an open subset Ω of ℝ is the open
interval {y ∈ Ω; [[x, y]] ⊆ Ω}.


32.7.6 REMARK: Alternative expression for the open interval component of a given point. 

The open interval component of a given element x of an open subset Ω of ℝ in Definition 32.7.5 equals the
largest open interval in Ω which contains x. This may be written as ⋃{(y, z); x ∈ (y, z) and (y, z) ⊆ Ω}. The
important thing to note is that the open interval component which contains a given point may be written
as a set-theoretic expression.


32.7.7 REMARK: Explicit enumerations of the open interval components of an open set.

Theorem 32.7.8 gives an explicit expression for an enumeration of the open interval components of an open
set of real numbers using a kind of “dartboard procedure”. The bijection g : ω → ℚ may be thought of
as firing darts into a dartboard. If a dart hits an open interval of the set Ω, this interval is added to the
cumulative list of open intervals. Note that this “dartboard procedure” does not require g to be a bijection.
Any surjection is adequate to the purpose.


It may seem at first sight that this semi-pseudo-random “dartboard” procedure for enumerating components
of an open set could be replaced with a more systematic, faster, sleeker procedure. However, the union of the
open cover for the rational numbers in Remark 45.1.5 gives some idea of why peppering the real line with
rational number “darts” may be as efficient as possible for the general case. Another example of a difficult
open set to enumerate more efficiently is ℝ \ ℚ, the complement of the rational numbers. For such a set,
it is difficult to think of a better way to do it. These observations apply also to the more general case of
enumerations of components of open sets in locally connected separable spaces in Theorem 34.8.2. (Possible
alternatives to the “dartboard procedure” are also speculated on in Remark 34.8.3.)


32.7.8 THEOREM: Open sets of real numbers can be partitioned into countable sets of open intervals.
Every open set Ω ∈ Top(ℝ) has an open interval enumeration.


PROOF: Let Ω ∈ Top(ℝ). Let g : ω → ℚ be a bijection. Let I = {i ∈ ω; g(i) ∈ Ω}. Define the
function h : I → Top(ℝ) so that h(i) is the open interval component of g(i) in Ω for all i ∈ I. Let
J = {j ∈ I; ∀i ∈ I, (i < j ⇒ h(i) ≠ h(j))}. By Theorem 124.8, there exists a bijection φ : X → J for some
set X ∈ ω⁺. Define f : X → Top(ℝ) by f = h ∘ φ. Then f is a bijection from a set X ∈ ω⁺ to the set of
all open interval components of Ω.
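

The “dartboard procedure” in this proof can be sketched for a small example. The following is an illustration under stated assumptions, not the author's construction. The open set is given as a finite list of disjoint open intervals, the helper names rational_darts, make_component and dartboard are hypothetical, and only finitely many darts are thrown. The dart sequence is a surjection onto ℚ, not a bijection, which Remark 32.7.7 says is adequate.

```python
from fractions import Fraction
from itertools import islice
from math import inf


def rational_darts():
    """Yield every rational number at least once (a surjection onto Q)."""
    n = 1
    while True:
        for p in range(-n * n, n * n + 1):
            yield Fraction(p, n)
        n += 1


def make_component(intervals):
    """Return the map x -> open interval component of x, or None outside the set."""
    def component(x):
        for a, b in intervals:
            if a < x < b:
                return (a, b)
        return None
    return component


def dartboard(component, darts, max_darts):
    """List each component the first time a dart lands in it."""
    found = []
    for q in islice(darts, max_darts):
        c = component(q)
        if c is not None and c not in found:   # keep only "first hits"
            found.append(c)
    return found


omega = [(5, inf), (0, 1), (2, 3)]   # hypothetical open set, as disjoint intervals
print(dartboard(make_component(omega), rational_darts(), 200))
# prints [(0, 1), (2, 3), (5, inf)]
```

The test "c not in found" plays the role of the set J in the proof, which keeps an index j only if no earlier index gives the same component.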


32.7.9 REMARK: The measure of an open set of real numbers. 

It follows from Theorem 32.7.8 that every open set of real numbers Ω has a well-defined “measure”, which is
the sum of the lengths of all of the open interval components of Ω. It is easily seen that this sum is independent
of the enumeration. If one of the component intervals is infinite, then the sum in Definition 32.7.10 is infinite
for any enumeration of the component intervals. If the sum is unbounded for an enumeration g₁, then the
sum must be unbounded for any other enumeration g₂ because g₂ will eventually include any given finite
set of intervals enumerated by g₁. Consequently if the sum for g₁ is bounded, then the sum for g₂ is bounded
also. The sum for g₂ must be at least as great as the sum for g₁ because g₂ eventually includes any finite
subsequence of g₁. Since the converse is equally true, the sums must be equal.


32.7.10 DEFINITION: The measure of an open set of real numbers Ω is the non-negative extended real
number ∑_{i∈I} |g(i)|, where (g(i))_{i∈I} is an open interval enumeration for Ω and |g(i)| denotes the length of
g(i) as in Definition 16.1.12.


32.7.11 THEOREM: Some basic properties of the measure of an open set of real numbers.
Let μ : Top(ℝ) → ℝ̄⁺₀ denote the measure for open sets as in Definition 32.7.10.


(i) μ(∅) = 0.
(ii) μ(ℝ) = μ(ℝ⁺) = μ(ℝ⁻) = ∞.
(iii) ∀a, b ∈ ℝ, (a ≤ b ⇒ μ((a, b)) = b − a).
(iv) ∀Ω₁, Ω₂ ∈ Top(ℝ), (Ω₁ ⊆ Ω₂ ⇒ μ(Ω₁) ≤ μ(Ω₂)).
(v) ∀Ω₁, Ω₂ ∈ Top(ℝ), (Ω₁ ∩ Ω₂ = ∅ ⇒ μ(Ω₁ ∪ Ω₂) = μ(Ω₁) + μ(Ω₂)).


PROOF: For part (i), the empty function g = ∅ is an open interval enumeration for ∅. Therefore μ(∅) =
∑_{i∈∅} |g(i)| = 0, where |J| denotes the length of any real-number interval J according to Definition 16.1.12.


For part (ii), the function g : J → Top(ℝ) with J = {0} and g(0) = ℝ is an open interval enumeration for ℝ.
So μ(ℝ) = |ℝ| = ∞ by Definition 16.1.13. Similarly μ(ℝ⁺) = |(0, ∞)| = ∞ and μ(ℝ⁻) = |(−∞, 0)| = ∞.


For part (iii), let a, b ∈ ℝ with a < b. Then the open interval (a, b) may be enumerated by g : I → Top(ℝ)
with I = {0} and g(0) = (a, b). Then μ((a, b)) = |(a, b)| = b − a by Definition 16.1.13.

For part (iv), let Ω₁, Ω₂ ∈ Top(ℝ) with Ω₁ ⊆ Ω₂. Let g_k : I_k → Top(ℝ) be an enumeration of the open
interval components of Ω_k for k = 1, 2. Let S = {g₁(i₁) ∩ g₂(i₂); (i₁, i₂) ∈ I₁ × I₂} \ {∅}. Then S is a
set of disjoint non-empty open intervals of real numbers. Let x ∈ Ω₁. Then x ∈ g₁(i₁) ∩ g₂(i₂) for some
(i₁, i₂) ∈ I₁ × I₂ because Ω_k = ⋃(Range(g_k)) for k = 1, 2. So x ∈ ⋃S. Conversely, if x ∈ ⋃S, then
x ∈ Ω₁ ∩ Ω₂ = Ω₁. Therefore Ω₁ = ⋃S. The set S may be enumerated by a map g : I → S with I ∈ ω⁺.
Then μ(Ω₁) = ∑_{i∈I} |g(i)| = ∑_{i₂∈I₂} ∑_{i₁∈I₁} |g₁(i₁) ∩ g₂(i₂)| ≤ ∑_{i₂∈I₂} |g₂(i₂)| = μ(Ω₂). Hence μ(Ω₁) ≤ μ(Ω₂).

For part (v), let Ω₁, Ω₂ ∈ Top(ℝ) with Ω₁ ∩ Ω₂ = ∅. Let g_k : I_k → Top(ℝ) be an enumeration of
the open interval components of Ω_k for k = 1, 2. Then there is an enumeration g : I → Top(ℝ) of
Range(g₁) ∪ Range(g₂), and then Ω₁ ∪ Ω₂ = ⋃Range(g) and so μ(Ω₁ ∪ Ω₂) = ∑_{i∈I} |g(i)| = ∑_{i∈I₁} |g₁(i)| +
∑_{i∈I₂} |g₂(i)| = μ(Ω₁) + μ(Ω₂).
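

For open sets with finitely many components, the measure in Definition 32.7.10 and some parts of Theorem 32.7.11 can be checked directly. This is a minimal sketch which assumes that an open set is represented by a finite list of its disjoint component intervals. The function name measure and the sample sets are hypothetical.

```python
from math import inf


def measure(enumeration):
    """Sum of the lengths b - a of the listed disjoint open intervals (a, b)."""
    return sum(b - a for a, b in enumeration)


omega1 = [(0, 1), (2, 3)]            # hypothetical open set with two components
omega2 = [(5, 7)]                    # an open set disjoint from omega1

assert measure([]) == 0                                   # part (i)
assert measure([(-inf, 0)]) == inf                        # part (ii)
assert measure(omega1) == measure(omega1[::-1]) == 2      # order is irrelevant
assert measure(omega1 + omega2) == measure(omega1) + measure(omega2)   # part (v)
```

Infinite enumerations are outside the scope of this sketch because they need a limit of partial sums.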


32.8. Some constructions using continuous functions 


32.8.1 REMARK: The pull-back of a topology is a topology. 

Definition 31.12.4 states that the set {f⁻¹(Ω); Ω ∈ Top(Y)} of inverse set-map images by a continuous
function f of the topology on Y is a subset of the topology on X. Theorem 32.8.2 states that this set
of inverse set-map images is in fact a topology. This is not too surprising because inverse set-maps very
effectively preserve the set union and intersection properties on the function's target set. (See Sections 10.6
and 10.8.) One could express Theorem 32.8.2 by saying that “the pull-back of a topology is a topology”.
(See Theorem 32.13.4 for the corresponding “push-forth” topology.)


32.8.2 THEOREM: Induction of a topology on a set from inverse function images of another topology.
Let X be a set, Y be a topological space, and f : X → Y. Then {f⁻¹(Ω); Ω ∈ Top(Y)} is a topology on X.


PROOF: Let T = {f⁻¹(Ω); Ω ∈ Top(Y)} for a function f : X → Y for a topological space Y. Then
{∅, X} ⊆ T because ∅ = f⁻¹(∅) and X = f⁻¹(Y), since {∅, Y} ⊆ Top(Y). So T satisfies Definition 31.3.2 (i)
for a topology on X.

Let S₁, S₂ ∈ T. Then S₁ = f⁻¹(Ω₁) and S₂ = f⁻¹(Ω₂) for some Ω₁, Ω₂ ∈ Top(Y). But Ω₁ ∩ Ω₂ ∈ Top(Y)
by Definition 31.3.2 (ii). So by Theorem 10.6.10 (iv), S₁ ∩ S₂ = f⁻¹(Ω₁) ∩ f⁻¹(Ω₂) = f⁻¹(Ω₁ ∩ Ω₂) ∈ T. So
T satisfies Definition 31.3.2 (ii).

Let C ⊆ T. Then ∀S ∈ C, ∃Ω_S ∈ Top(Y), S = f⁻¹(Ω_S). But ⋃_{S∈C} Ω_S ∈ Top(Y) by Definition 31.3.2 (iii). So by
Theorem 10.7.6 (iii), ⋃C = ⋃{f⁻¹(Ω_S); S ∈ C} = f⁻¹(⋃_{S∈C} Ω_S) ∈ T. So T satisfies Definition 31.3.2 (iii).
Hence T is a topology on X.
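

For finite sets, the statement of Theorem 32.8.2 can be verified by brute force. The following sketch is an illustration only. The sets, the map f (stored as a dictionary) and the helper names preimage, pullback and is_topology are all hypothetical.

```python
def preimage(f, omega):
    """Inverse image of omega under a map f which is given as a dict."""
    return frozenset(x for x in f if f[x] in omega)


def pullback(f, top_y):
    """The set of inverse images of all open sets of the target space."""
    return {preimage(f, omega) for omega in top_y}


def is_topology(t, x):
    """Check the topology axioms for a finite collection t of subsets of x."""
    if frozenset() not in t or frozenset(x) not in t:
        return False
    # For a finite collection, pairwise closure gives closure under all unions.
    return all((a | b) in t and (a & b) in t for a in t for b in t)


fs = frozenset
top_y = {fs(), fs("a"), fs("ab"), fs("abc")}   # a topology on Y = {a, b, c}
f = {1: "a", 2: "a", 3: "b", 4: "c"}           # a map from X = {1, 2, 3, 4} to Y

t = pullback(f, top_y)
assert t == {fs(), fs({1, 2}), fs({1, 2, 3}), fs({1, 2, 3, 4})}
assert is_topology(t, f.keys())
```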


32.8.3 REMARK: The topology induced on a set by a single function.
The inverse-image topology in Theorem 32.8.2 is given a name in Definition 32.8.4. (This is quite possibly 
not a standard name for this topology.) 


32.8.4 DEFINITION: The topology induced on a set X by a function f : X → Y from a topological space Y
is the topology {f⁻¹(Ω); Ω ∈ Top(Y)} on X.


32.8.5 REMARK: Condition for all constant functions to be continuous. 
If f : X → Y is a constant function, the topology induced by f on X is the trivial topology. Since the
trivial topology on a set X is weaker than any other topology on X, it follows from Theorem 32.8.6 that all 
constant functions are continuous, as already stated in Theorem 31.12.9. 


32.8.6 THEOREM: Continuity test using the inverse function image topology strength.
For topological spaces X and Y, let f : X → Y. Then f is continuous if and only if the topology {f⁻¹(Ω); Ω ∈
Top(Y)} induced by f on X is weaker (i.e. not stronger) than Top(X).


PROOF: Let X and Y be topological spaces. Let f : X → Y be continuous. Then f⁻¹(Ω) ∈ Top(X) for
all Ω ∈ Top(Y), by Definition 31.12.4. So {f⁻¹(Ω); Ω ∈ Top(Y)} ⊆ Top(X). Since {f⁻¹(Ω); Ω ∈ Top(Y)}
is a topology on X, by Theorem 32.8.2, it is a weaker topology on X than Top(X), by Definition 31.3.23.
To show the converse, suppose that T′ = {f⁻¹(Ω); Ω ∈ Top(Y)} is a weaker topology on X than Top(X).
Then T′ ⊆ Top(X). So ∀Ω ∈ Top(Y), f⁻¹(Ω) ∈ Top(X). Hence f is continuous by Definition 31.12.4.
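

The continuity test of Theorem 32.8.6 is easy to apply to finite spaces. The following sketch assumes maps stored as dictionaries and topologies stored as sets of frozensets. All of the names and sample spaces are hypothetical. It also illustrates Remark 32.8.5, since a constant map passes the test even for the trivial topology.

```python
def preimage(f, omega):
    """Inverse image of omega under a map f which is given as a dict."""
    return frozenset(x for x in f if f[x] in omega)


def is_continuous(f, top_x, top_y):
    """f is continuous iff the topology induced by f is contained in top_x."""
    induced = {preimage(f, omega) for omega in top_y}
    return induced <= top_x


fs = frozenset
top_y = {fs(), fs("a"), fs("ab"), fs("abc")}
f = {1: "a", 2: "a", 3: "b", 4: "c"}
fine = {fs(), fs({1, 2}), fs({1, 2, 3}), fs({1, 2, 3, 4})}
trivial = {fs(), fs({1, 2, 3, 4})}
const = {x: "b" for x in f}                    # a constant map

assert is_continuous(f, fine, top_y)           # induced topology equals fine
assert not is_continuous(f, trivial, top_y)    # {1, 2} is not open in trivial
assert is_continuous(const, trivial, top_y)    # constant maps are continuous
```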


32.8.7 REMARK: Relation of topological space continuity to metric space continuity.

Theorem 32.8.8 effectively expresses the abstract notion of continuity in Definition 31.12.4 in terms of a more
classical style of ε-δ definition as in metric spaces. (See Theorems 38.1.3 and 38.1.7.) But instead of a ball
of radius ε, there is a general neighbourhood Ω ∈ Top_{f(x)}(Y), and instead of a ball of radius δ, there is a
general neighbourhood G ∈ Top_x(X).


32.8.8 THEOREM: Function continuity test similar to metric space continuity tests.
Let f : X → Y for topological spaces X and Y. Then f is continuous if and only if
∀x ∈ X, ∀Ω ∈ Top_{f(x)}(Y), ∃G ∈ Top_x(X), f(G) ⊆ Ω.


PROOF: Let f : X → Y be continuous. Suppose that x ∈ X and Ω ∈ Top_{f(x)}(Y). Let G = f⁻¹(Ω).
Then G ∈ Top(X) by Definition 31.12.4, and x ∈ G. So G ∈ Top_x(X). But f(G) = f(f⁻¹(Ω)) ⊆ Ω by
Theorem 10.7.1 (i′). Hence ∀x ∈ X, ∀Ω ∈ Top_{f(x)}(Y), ∃G ∈ Top_x(X), f(G) ⊆ Ω.

For the converse, let f : X → Y satisfy ∀x ∈ X, ∀Ω ∈ Top_{f(x)}(Y), ∃G ∈ Top_x(X), f(G) ⊆ Ω. Let
Ω ∈ Top(Y). Let S = f⁻¹(Ω). Let x ∈ S. Let y = f(x). Then y ∈ Ω. So f(G) ⊆ Ω for some G ∈ Top_x(X).
Therefore f⁻¹(f(G)) ⊆ f⁻¹(Ω) by Theorem 10.6.10 (ii). But f⁻¹(f(G)) ⊇ G by Theorem 10.7.1 (ii). So
G ⊆ f⁻¹(Ω) = S. Thus ∀x ∈ S, ∃G ∈ Top_x(X), G ⊆ S. Therefore S ∈ Top(X) by Theorem 31.8.17 (iii). So
∀Ω ∈ Top(Y), f⁻¹(Ω) ∈ Top(X). Hence f is continuous by Definition 31.12.4.




32.8.9 REMARK: Testing continuity of functions with open bases. 

For practical examples, the test for continuity in Theorem 32.8.8 would effectively require testing the inverse 
image of every neighbourhood of every point in the range of a function. The purpose of the open base 
concept in Section 32.2 is to substantially reduce the number of open sets which need to be tested for the 
sake of definitions. Theorem 32.8.10 modifies the test in Theorem 32.8.8 to use open bases instead of general 
neighbourhoods. Of particular interest are open bases which are countable at each point, which exist for the 
first countable spaces in Definition 33.4.12, and open bases which are globally countable, which exist for the 
second countable spaces in Definition 33.4.13. 


Since the full topology on a set is itself an open base by Theorem 32.2.4, the open base B_X may be replaced
by Top(X) in Theorem 32.8.10, and the open base B_Y may be replaced by Top(Y).


32.8.10 THEOREM: Open-base function continuity test similar to metric space continuity tests.

Let B_X ∈ ℙ(Top(X)) and B_Y ∈ ℙ(Top(Y)) be open bases for topological spaces X and Y respectively. Let
B_{X,x} = {Ω ∈ B_X; x ∈ Ω} and B_{Y,y} = {Ω ∈ B_Y; y ∈ Ω} for all x ∈ X and y ∈ Y. Then f : X → Y is
continuous if and only if ∀x ∈ X, ∀Ω ∈ B_{Y,f(x)}, ∃G ∈ B_{X,x}, f(G) ⊆ Ω.


PROOF: Let f : X → Y be continuous. Let x ∈ X and Ω ∈ B_{Y,f(x)}. Let G = f⁻¹(Ω). Then G ∈ Top(X)
by Definition 31.12.4, and x ∈ G. So G ∈ Top_x(X). Therefore by Definition 32.2.3, there exists G′ ∈ B_{X,x}
such that G′ ⊆ G. But f(G) = f(f⁻¹(Ω)) ⊆ Ω by Theorem 10.7.1 (i′). So f(G′) ⊆ Ω by Theorem 10.6.7 (ii).
Hence ∀x ∈ X, ∀Ω ∈ B_{Y,f(x)}, ∃G′ ∈ B_{X,x}, f(G′) ⊆ Ω.

For the converse, let f : X → Y satisfy ∀x ∈ X, ∀Ω ∈ B_{Y,f(x)}, ∃G ∈ B_{X,x}, f(G) ⊆ Ω. Let Ω ∈ Top(Y).
Let S = f⁻¹(Ω). Let x ∈ S. Let y = f(x). Then y ∈ Ω. So by Definition 32.2.3, there is a set Ω′ ∈ B_{Y,f(x)}
such that Ω′ ⊆ Ω. Therefore f(G) ⊆ Ω′ for some G ∈ B_{X,x}. So f(G) ⊆ Ω, and so f⁻¹(f(G)) ⊆ f⁻¹(Ω)
by Theorem 10.6.10 (ii). But f⁻¹(f(G)) ⊇ G by Theorem 10.7.1 (ii). So G ⊆ f⁻¹(Ω) = S. Consequently
∀x ∈ S, ∃G ∈ Top_x(X), G ⊆ S. So S ∈ Top(X) by Theorem 31.8.17 (iii). Consequently ∀Ω ∈ Top(Y),
f⁻¹(Ω) ∈ Top(X). Hence f is continuous by Definition 31.12.4.


32.8.11 REMARK: The weak inverse-map topology is the weakest which makes the functions continuous. 
Theorem 32.8.12 asserts that the weak topology generated by the inverse images of a family of functions (in 
Definition 32.4.8) is the weakest topology which makes those functions continuous. This justifies the choice 
of name for this “weak topology”. Theorem 32.8.12 is the function-family analogue of Theorem 32.1.10
for set collections. Theorem 32.8.12 is also closely related to Theorem 32.12.3 for the weak topology on a 
Cartesian product of topological spaces. 


32.8.12 THEOREM: The weak inverse-map topology is the weakest which makes the functions continuous.
The weak topology on a set X generated by a non-empty family of maps (f_i)_{i∈I} from X to topological
spaces (X_i, T_i)_{i∈I} is weaker (i.e. not stronger) than all other topologies on the set X for which all functions
f_i : X → X_i are continuous.


PROOF: By Definition 32.4.8, the weak topology on X generated by (f_i)_{i∈I} and (X_i, T_i)_{i∈I} is


T_X = {⋃C; C ∈ ℙ({⋂_{j∈J} f_j⁻¹(Ω_j); J ∈ ℙ*(I) and (Ω_j)_{j∈J} ∈ ×_{j∈J} T_j})}.


By Theorem 32.4.6, T_X is a topology on X. Let i ∈ I. Let Ω_i ∈ T_i. Then f_i⁻¹(Ω_i) = ⋂_{j∈J} f_j⁻¹(Ω_j) with
J = {i}, and then f_i⁻¹(Ω_i) = ⋃C with C = {f_i⁻¹(Ω_i)}. So f_i⁻¹(Ω_i) ∈ T_X. Consequently f_i : X → X_i is
continuous for all i ∈ I.

Let T be a topology on X for which all functions f_i : X → X_i are continuous. Then f_i⁻¹(Ω_i) ∈ T
for all Ω_i ∈ T_i, for all i ∈ I. So ⋂_{j∈J} f_j⁻¹(Ω_j) ∈ T for all (Ω_j)_{j∈J} ∈ ×_{j∈J} T_j, for all J ∈ ℙ*(I) by
Theorem 31.3.7. Therefore ⋃C ∈ T for all C ∈ ℙ({⋂_{j∈J} f_j⁻¹(Ω_j); J ∈ ℙ*(I) and (Ω_j)_{j∈J} ∈ ×_{j∈J} T_j})
by Definition 31.3.2 (iii). So T_X ⊆ T. Hence T_X is included in all topologies on X for which all functions
f_i : X → X_i are continuous.


32.9. Direct product of two topological spaces 


32.9.1 REMARK: The usefulness of direct product topologies. 

Product topologies are used extensively in differential geometry, particularly for fibre bundles, tuple spaces
and function spaces. The underlying set of the product of a family of topological spaces (X_i, T_i)_{i∈I} is
the Cartesian set-product X = ×_{i∈I} X_i. (See Section 9.4 for the Cartesian product of a pair of sets. See
Section 10.11 for the Cartesian product of a family of sets.) The product topology is defined to be the
weakest topology on X for which the projection maps Π_i : X → X_i are continuous. (See Definition 10.13.2
for projection maps for the Cartesian product of a family of sets.) Other useful topologies which are defined
on Cartesian set-products are typically stronger than this weak topology.


32.9.2 REMARK: Empty Cartesian products of sets. 

Some Cartesian products of sets can have axiom-of-choice issues. In most practical situations, plain Zermelo- 
Fraenkel set theory yields a non-empty Cartesian product of non-empty sets, but some products of families 
of non-empty sets, in some ZF models, can be empty. One convenient way to avoid this situation is to require 
set-products to be non-empty, not just the individual sets in the set-family. Then for a mathematician who 
accepts the axiom of choice, this is the same as requiring all individual sets to be non-empty. The empty 
product case is of limited interest since the only topology is then the set containing the empty set. (See 
Remark 31.3.9 for topology on the empty set.) 


Interestingly, if a family of non-empty sets does have an empty Cartesian product, the product topology 
in Definition 32.12.2 will be a valid topology on the set-product. If the product topology has a non-empty 
element, then it is easily shown that the set-product is non-empty. (See also Remark 32.12.5 for axiom-of- 
choice issues for product topologies.) 


32.9.3 REMARK: Weak topology for the direct product of two topological spaces. 

The general direct product of topological spaces in Definition 32.12.2 is perhaps more easily understood by 
first considering the product of two sets in Definition 32.9.4. This has the following basic properties. 

(1) Theorem 32.9.6. The product topology is a valid topology. 

(2) Theorem 32.10.3. Projected slices through the product topological space are open sets. 

(3) Theorem 32.10.7. The product topology is the weakest for which the projection maps are continuous. 
(4) Theorem 32.10.10. Partial maps from the product topological space are continuous. 

The direct product of two topological spaces is used very frequently in the fibre-bundle framework for
differential geometry. Therefore it is beneficial to focus closely on this special case. There is no dearth of
applications. This comment is equally true for the direct product of two topological manifolds and of two
differentiable manifolds. (It is also easier to create diagrams for just two spaces!)


32.9.4 DEFINITION: The (direct) product topology for two topological spaces (X₁, T₁) and (X₂, T₂) is the
set of unions of sets of the form G₁ × G₂ such that G₁ ∈ T₁ and G₂ ∈ T₂. That is, the product topology is


T = {⋃C; C ⊆ {G₁ × G₂; G₁ ∈ T₁ and G₂ ∈ T₂}}.


The (direct) product of two topological spaces (X₁, T₁) and (X₂, T₂) is the pair (X₁ × X₂, T).


32.9.5 REMARK: Practical verification that a set is open in a product topological space.

The collection C of Cartesian products of open sets in Definition 32.9.4 is typically infinite. But in practice,
one does not generally produce an infinite list of Cartesian products of open sets to verify that a set is open
in a product topological space. A more practical method, when a logical expression is given for an open-set
candidate Ω ∈ ℙ(X₁ × X₂), is to show that each point in Ω has a neighbourhood of the form G₁ × G₂. Then
Ω ∈ Top(X₁ × X₂) by Theorem 31.8.17 (iii).


32.9.6 THEOREM: Some basic properties of direct product topologies.
Let (X₁, T₁) and (X₂, T₂) be topological spaces. Let T be the product topology for (X₁, T₁) and (X₂, T₂) on
X = X₁ × X₂.

(i) The product topology for (X₁, T₁) and (X₂, T₂) is a topology on the Cartesian set product X₁ × X₂.
(ii) ∀Ω₁ ∈ T₁, ∀Ω₂ ∈ T₂, Ω₁ × Ω₂ ∈ T.
(iii) ∀(x₁, x₂) ∈ X, ∀Ω ∈ Top_{(x₁,x₂)}(X), ∃G₁ ∈ Top_{x₁}(X₁), ∃G₂ ∈ Top_{x₂}(X₂), G₁ × G₂ ⊆ Ω.


PROOF: Part (i) follows from Theorem 32.1.15 because {G₁ × G₂; G₁ ∈ T₁ and G₂ ∈ T₂} is closed under
finite intersections by Theorems 10.8.25 and 31.3.7.

For part (ii), let Ω₁ ∈ T₁, Ω₂ ∈ T₂. Let C = {Ω₁ × Ω₂}. Then Ω₁ × Ω₂ = ⋃C ∈ T by Definition 32.9.4.

For part (iii), let x = (x₁, x₂) ∈ X and Ω ∈ Top_x(X). Then by Definition 32.9.4, Ω = ⋃C for some
C ⊆ {G₁ × G₂; G₁ ∈ T₁ and G₂ ∈ T₂}. So x ∈ G₁ × G₂ for some G₁ ∈ T₁ and G₂ ∈ T₂ with G₁ × G₂ ⊆ Ω,
which implies that G₁ ∈ Top_{x₁}(X₁) and G₂ ∈ Top_{x₂}(X₂).


32.9.7 THEOREM: Natural open base and open subbase for product topology for two spaces.
Let (X₁, T₁) and (X₂, T₂) be topological spaces. Let T be the product topology on X₁ × X₂.


(i) {G₁ × G₂; G₁ ∈ T₁ and G₂ ∈ T₂} is an open base for T.
(ii) {G₁ × X₂; G₁ ∈ T₁} ∪ {X₁ × G₂; G₂ ∈ T₂} is an open subbase for T.


PROOF: For part (i), let B = {G₁ × G₂; G₁ ∈ T₁ and G₂ ∈ T₂}. Then B ⊆ T by choosing the open set
collections C in Definition 32.9.4 to be singletons. But every open set Ω ∈ T may be expressed as ⋃C for
some C ∈ ℙ(B). Hence B is an open base for T by Theorem 32.2.6.

For part (ii), let S = {G₁ × X₂; G₁ ∈ T₁} ∪ {X₁ × G₂; G₂ ∈ T₂}. Let B = {⋂D; D ∈ ℙ*(S)}. Then
B = {G₁ × G₂; G₁ ∈ T₁ and G₂ ∈ T₂}, and so B is an open base for X₁ × X₂ by part (i). Hence S is an
open subbase for X₁ × X₂ by Definition 32.3.5.
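

For two small finite spaces, the product topology in Definition 32.9.4 can be generated explicitly from its base of open rectangles. The following sketch is illustrative only. The two sample topologies and the helper names rectangle and product_topology are hypothetical. It checks that the open rectangles form a base in the sense of Theorem 32.9.6 (iii), and that not every open set is a rectangle.

```python
from itertools import product


def rectangle(g1, g2):
    """The Cartesian product G1 x G2 as a frozenset of pairs."""
    return frozenset(product(g1, g2))


def product_topology(t1, t2):
    """Return the rectangle base and the set of all unions of rectangles."""
    base = {rectangle(g1, g2) for g1 in t1 for g2 in t2}
    opens = {frozenset()}
    for b in base:
        opens |= {u | b for u in opens}   # add b to every union built so far
    return base, opens


fs = frozenset
t1 = {fs(), fs({0}), fs({0, 1})}          # a topology on X1 = {0, 1}
t2 = {fs(), fs("a"), fs("ab")}            # a topology on X2 = {a, b}
base, opens = product_topology(t1, t2)

# An L-shaped union of two rectangles is open but is not itself a rectangle.
l_shape = rectangle({0}, "ab") | rectangle({0, 1}, "a")
assert l_shape in opens and l_shape not in base

# Every point of every open set has a rectangle neighbourhood inside the set.
assert all(any(p in b and b <= u for b in base) for u in opens for p in u)
print(len(base), len(opens))              # prints 5 6
```

The brute-force union step is exponential in general, so this approach is only suitable for very small examples.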


32.9.8 THEOREM: The map from the domain to the graph of a continuous function is continuous.
Let X, Y be topological spaces. Let f : X → Y be a continuous function. Then the map g : X → X × Y
defined by g : x ↦ (x, f(x)) is continuous.


PROOF: Let Ω ∈ Top(X × Y). It must be shown that g⁻¹(Ω) ∈ Top(X). Let x ∈ g⁻¹(Ω). Then
(x, f(x)) ∈ Ω. So G₁ × G₂ ⊆ Ω for some G₁ ∈ Top_x(X) and G₂ ∈ Top_{f(x)}(Y) by Definition 32.9.4. Let
G = G₁ ∩ f⁻¹(G₂). Then G ∈ Top_x(X) because f is continuous, and g(G) ⊆ G₁ × G₂ ⊆ Ω. Therefore
x ∈ G ⊆ g⁻¹(Ω). It follows that x is in the interior of g⁻¹(Ω). Hence g⁻¹(Ω) ∈ Top(X).


32.9.9 REMARK: Continuity of double-domain direct product of two continuous functions. 

Theorem 32.9.10 concerns the double-domain style of direct function product in Definition 10.14.3. (The 
common-domain version of this is Theorem 32.11.2.) The continuity of this style of direct product of two 
continuous functions is useful for combining local charts of topological manifolds, as in Theorem 49.4.24. 


32.9.10 THEOREM: Continuity of double-domain direct products of continuous functions.
Let φ_k : X_k → Y_k be maps between topological spaces X_k and Y_k for k = 1, 2. Define φ to be the direct
product map φ₁ × φ₂ : X → Y, where X = X₁ × X₂ and Y = Y₁ × Y₂ have the respective direct product
topologies.

(i) If φ_k : X_k → Y_k are continuous maps for k = 1, 2, then φ : X → Y is a continuous map.
(ii) If φ_k : X_k → Y_k are homeomorphisms for k = 1, 2, then φ : X → Y is a homeomorphism.


PROOF: For part (i), let G ∈ Top(Y). Then for some C ⊆ {G₁ × G₂; G₁ ∈ Top(Y₁) and G₂ ∈ Top(Y₂)},
G = ⋃C by Definition 32.9.4. So φ⁻¹(G) = ⋃{φ⁻¹(G₁ × G₂); G₁ × G₂ ∈ C} by Theorem 10.7.6 (iii). But
φ_k⁻¹(G_k) ∈ Top(X_k) for k = 1, 2. So φ⁻¹(G₁ × G₂) = φ₁⁻¹(G₁) × φ₂⁻¹(G₂) ∈ Top(X) for all G₁ × G₂ ∈ C by
Theorem 32.9.6 (ii). So φ⁻¹(G) ∈ Top(X) by Definition 32.9.4. Hence φ is continuous.


Part (ii) follows from the double application of part (i). 


32.10. Slices and projections of product spaces 


32.10.1 REMARK: Projected slices of the direct product of two topological spaces. 

One would expect that if one combines the topologies of the components of a direct product of two topological 
spaces, the projections of all slices through an open set in the Cartesian set-product should be open in the 
relevant component space. This is proved in Theorem 32.10.3 and illustrated in Figure 32.10.1. 


[Figure: an open set $\Omega \in \mathrm{Top}(X_1 \times X_2)$ and its slices]


Figure 32.10.1 Slices of the direct product of two topological spaces


The “slice sets” $\mathrm{Slice}_k^p(\Omega) = \{(x_1, x_2) \in \Omega;\ \forall i \in \mathbb{N}_2 \setminus \{k\},\ x_i = p_i\}$ in Definition 10.12.5 are subsets of $\Omega$,
but it is the projected slice sets of the form $\Pi_1(\mathrm{Slice}_1^p(\Omega)) \subseteq X_1$ or $\Pi_2(\mathrm{Slice}_2^p(\Omega)) \subseteq X_2$ in Definition 10.12.7
which are used in Theorem 32.10.3. Thus the slice sets illustrated in Figure 32.10.1, which are subsets of $\Omega$,
must be projected onto the component spaces as in Figure 32.10.2 before being tested for openness. For
convenience, Notation 32.10.2 defines the projected slice maps which are used in Section 32.10.


32.10.2 NOTATION: Temporary notation for projected slice maps. 
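$P_1^{x_2}$ and $P_2^{x_1}$, for $x_1 \in X_1$ and $x_2 \in X_2$, for a Cartesian product $X_1 \times X_2$, denote the projected slice maps
$P_1^{x_2} : \mathbb{P}(X_1 \times X_2) \to \mathbb{P}(X_1)$ and $P_2^{x_1} : \mathbb{P}(X_1 \times X_2) \to \mathbb{P}(X_2)$ defined by


$\forall A \in \mathbb{P}(X_1 \times X_2),\ \forall x_2 \in X_2,\quad P_1^{x_2}(A) = \{x_1 \in X_1;\ (x_1, x_2) \in A\}$,
$\forall A \in \mathbb{P}(X_1 \times X_2),\ \forall x_1 \in X_1,\quad P_2^{x_1}(A) = \{x_2 \in X_2;\ (x_1, x_2) \in A\}$.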


32.10.3 THEOREM: Projected slices of the product of two topological spaces are open sets. 
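Let $T$ be the product topology for topological spaces $(X_1, T_1)$ and $(X_2, T_2)$.


(i) $\forall x_2 \in X_2,\ \forall \Omega \in T,\ P_1^{x_2}(\Omega) \in T_1$.
(ii) $\forall x_1 \in X_1,\ \forall \Omega \in T,\ P_2^{x_1}(\Omega) \in T_2$.
(iii) $\forall \Omega \in T,\ \{x_1 \in X_1;\ \exists x_2 \in X_2,\ (x_1, x_2) \in \Omega\} = \bigcup_{x_2 \in X_2} P_1^{x_2}(\Omega) \in \mathrm{Top}(X_1)$.
(iv) $\forall \Omega \in T,\ \{x_2 \in X_2;\ \exists x_1 \in X_1,\ (x_1, x_2) \in \Omega\} = \bigcup_{x_1 \in X_1} P_2^{x_1}(\Omega) \in \mathrm{Top}(X_2)$.


PROOF: For part (i), let $x_2 \in X_2$ and $\Omega \in T$. By Definition 32.9.4, $\Omega = \bigcup C$ for some set $C$ of set
products $G_1 \times G_2$ such that $G_1 \in T_1$ and $G_2 \in T_2$. Therefore $P_1^{x_2}(\Omega) = P_1^{x_2}(\bigcup C) = \bigcup_{G \in C} P_1^{x_2}(G)$ by
Theorem 10.12.8 (i). But $P_1^{x_2}(G_1 \times G_2)$ equals $G_1$ if $x_2 \in G_2$, and equals $\emptyset$ otherwise.
Hence $P_1^{x_2}(\Omega) = \bigcup \{G_1;\ G_1 \times G_2 \in C \text{ and } x_2 \in G_2\} \in T_1$.


Part (ii) may be proved exactly as for part (i).


Part (iii) follows from part (i) and the closure of topologies under arbitrary unions in Definition 31.3.2 (iii).
Part (iv) follows from part (ii) and Definition 31.3.2 (iii).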
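

Theorem 32.10.3 (i) and (ii) can be checked by brute force on a small finite example. The following Python sketch is only an illustration under stated assumptions: the two three-set topologies, the helper names `subcollections`, `product_topology`, `projected_slice_1` and `projected_slice_2`, and the representation of open sets as frozensets are all hypothetical choices which do not appear in the text.

```python
from itertools import chain, combinations, product


def subcollections(sets):
    """Yield every subcollection (as a tuple) of a finite collection of sets."""
    sets = list(sets)
    return chain.from_iterable(combinations(sets, r) for r in range(len(sets) + 1))


def product_topology(T1, T2):
    """All unions of collections of rectangles G1 x G2 (finite case of Definition 32.9.4)."""
    rectangles = {frozenset(product(G1, G2)) for G1 in T1 for G2 in T2}
    return {frozenset().union(*C) for C in subcollections(rectangles)}


def projected_slice_1(A, x2):
    """The projected slice {x1; (x1, x2) in A}, a subset of X1."""
    return frozenset(x1 for (x1, y) in A if y == x2)


def projected_slice_2(A, x1):
    """The projected slice {x2; (x1, x2) in A}, a subset of X2."""
    return frozenset(x2 for (x, x2) in A if x == x1)


# Hypothetical example spaces: two two-point spaces with nested open sets.
X1, X2 = {"a", "b"}, {0, 1}
T1 = {frozenset(), frozenset({"a"}), frozenset({"a", "b"})}
T2 = {frozenset(), frozenset({0}), frozenset({0, 1})}
T = product_topology(T1, T2)

# Every projected slice of every open set should be open in its component space.
slices_open = all(
    projected_slice_1(Q, x2) in T1 and projected_slice_2(Q, x1) in T2
    for Q in T
    for x1 in X1
    for x2 in X2
)
print(len(T), slices_open)
```

A finite check of this kind does not replace the proof, but it makes the role of the condition $x_2 \in G_2$ visible: a rectangle $G_1 \times G_2$ contributes $G_1$ to the slice at $x_2$ only when $x_2$ lies in $G_2$.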


32.10.4 REMARK: Terminology for Cartesian set-product slices and fibre bundle cross-sections.

The concept of a “slice” in Figure 32.10.1 would be more accurately expressed by the word “section”, since
this is the term employed in technical drawing by architects and engineers. However, there is an unfortunate
name-clash between a “section” in the sense of the inverse image $\Pi_2^{-1}(\{x_2\}) = X_1 \times \{x_2\}$ for a projection
map $\Pi_2$ of a Cartesian set-product, as opposed to a “cross-section” of a fibre bundle, which is in essence a
right inverse $\xi$ of a projection map, satisfying $\Pi_1 \circ \xi = \mathrm{id}_{X_1}$, which is a kind of “lift map”.


In a technical sense, the set $X_1 \times \{x_2\}$ is the function on $X_1$ which has the constant value $x_2$, but this is a
function from $X_1$ to $X_2$, whereas a cross-section of the “trivial fibre bundle” $X_1 \times X_2$ would be a function
from $X_1$ to $X_1 \times X_2$.

Thus the word “slice” is adopted to avoid confusion. Then a “projected slice” in Theorem 32.10.3 (i) is a
subset of $X_1$, whereas a “cross-section” of $\Pi_1 : X_1 \times X_2 \to X_1$ is a map from $X_1$ to $X_1 \times X_2$. (See also
Sections 10.12 and 10.13 for general slices, projected slices and lift maps for Cartesian set-products.)


32.10.5 REMARK: Relations between projected slice sets and projection maps. 
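In terms of $\Pi_1$ and $\Pi_2$, the following relations hold for the projected slice sets in Notation 32.10.2.


(1) $P_1^{x_2}(\Omega) = \Pi_1(\Omega \cap \Pi_2^{-1}(\{x_2\})) \in \mathrm{Top}(X_1)$ for all $x_2 \in X_2$ and $\Omega \in \mathrm{Top}(X_1 \times X_2)$.
(2) $P_2^{x_1}(\Omega) = \Pi_2(\Omega \cap \Pi_1^{-1}(\{x_1\})) \in \mathrm{Top}(X_2)$ for all $x_1 \in X_1$ and $\Omega \in \mathrm{Top}(X_1 \times X_2)$.
(3) $\Pi_1(\Omega) = \bigcup_{x_2 \in X_2} P_1^{x_2}(\Omega) \in \mathrm{Top}(X_1)$ for all $\Omega \in \mathrm{Top}(X_1 \times X_2)$.
(4) $\Pi_2(\Omega) = \bigcup_{x_1 \in X_1} P_2^{x_1}(\Omega) \in \mathrm{Top}(X_2)$ for all $\Omega \in \mathrm{Top}(X_1 \times X_2)$.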


32.10.6 REMARK: The product topology is the weakest with continuous projection maps. 

Theorem 32.10.7 shows that the product topology in Definition 32.9.4 is the weakest topology for which the
Cartesian set-product projection maps $\Pi_1 : X_1 \times X_2 \to X_1$ and $\Pi_2 : X_1 \times X_2 \to X_2$ are continuous. (See
Definition 10.13.2 for projection maps. See Definition 31.12.4 for continuity.)


32.10.7 THEOREM: The projection maps for a product topology are continuous.
Let $(X_1, T_1)$ and $(X_2, T_2)$ be topological spaces. Let $T$ be the product topology on $X_1 \times X_2$. Define the
projection maps $\Pi_1 : X_1 \times X_2 \to X_1$ and $\Pi_2 : X_1 \times X_2 \to X_2$ by $\Pi_1 : (x_1, x_2) \mapsto x_1$ and $\Pi_2 : (x_1, x_2) \mapsto x_2$.


(i) $\Pi_1$ and $\Pi_2$ are continuous with respect to the topologies $T_1$, $T_2$ and $T$ on $X_1$, $X_2$ and $X_1 \times X_2$ respectively.


(ii) If $\tilde{T}$ is a topology on $X_1 \times X_2$ for which $\Pi_1$ and $\Pi_2$ are continuous with respect to the topologies $T_1$, $T_2$
and $\tilde{T}$ on $X_1$, $X_2$ and $X_1 \times X_2$ respectively, then $T \subseteq \tilde{T}$.


PROOF: For part (i), let $G \in T_1$. Then $\Pi_1^{-1}(G) = G \times X_2$, which is in $T$ by Theorem 32.9.6 (ii). Hence $\Pi_1$
is continuous by Definition 31.12.4. Similarly $\Pi_2$ is continuous.

For part (ii), let $\tilde{T}$ be a topology on $X_1 \times X_2$ such that $\Pi_1$ and $\Pi_2$ are continuous for the topologies
$T_1$, $T_2$ and $\tilde{T}$ on $X_1$, $X_2$ and $X_1 \times X_2$. Let $G_1 \in T_1$ and $G_2 \in T_2$. Then $G_1 \times X_2 = \Pi_1^{-1}(G_1) \in \tilde{T}$
and $X_1 \times G_2 = \Pi_2^{-1}(G_2) \in \tilde{T}$ by Definition 31.12.4. So $G_1 \times G_2 = (G_1 \times X_2) \cap (X_1 \times G_2) \in \tilde{T}$ by
Definition 31.3.2 (ii). So $\bigcup C \in \tilde{T}$ for all $C \in \mathbb{P}(\{G_1 \times G_2;\ G_1 \in T_1 \text{ and } G_2 \in T_2\})$ by Definition 31.3.2 (iii).
Hence $T \subseteq \tilde{T}$ by Definition 32.9.4.
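

The two parts of Theorem 32.10.7 can be illustrated on a finite example. The Python sketch below is a hedged illustration only: the example spaces and the helper names `subcollections`, `unions_of`, `product_topology`, `preimage`, `pi1` and `pi2` are hypothetical. For part (ii), it builds the collection of sets which any topology with continuous projections is forced to contain (finite intersections of inverse images, then unions of those) and compares that collection with the product topology.

```python
from itertools import chain, combinations, product


def subcollections(sets):
    """Yield every subcollection (as a tuple) of a finite collection of sets."""
    sets = list(sets)
    return chain.from_iterable(combinations(sets, r) for r in range(len(sets) + 1))


def unions_of(base):
    """All unions of subcollections of a finite collection of sets."""
    return {frozenset().union(*C) for C in subcollections(base)}


def product_topology(T1, T2):
    """Finite case of Definition 32.9.4: unions of rectangles G1 x G2."""
    return unions_of({frozenset(product(G1, G2)) for G1 in T1 for G2 in T2})


def preimage(f, G, domain):
    """Inverse image of the set G under the function f on the given domain."""
    return frozenset(p for p in domain if f(p) in G)


def pi1(p):
    return p[0]


def pi2(p):
    return p[1]


# Hypothetical example spaces.
X1, X2 = frozenset({"a", "b"}), frozenset({0, 1})
T1 = {frozenset(), frozenset({"a"}), X1}
T2 = {frozenset(), frozenset({0}), X2}
X = frozenset(product(X1, X2))
T = product_topology(T1, T2)

# Part (i): inverse images of open sets under the projections are open in T.
projections_continuous = (
    all(preimage(pi1, G, X) in T for G in T1)
    and all(preimage(pi2, G, X) in T for G in T2)
)

# Part (ii): a topology with continuous projections must contain these inverse
# images, their non-empty finite intersections, and all unions of those.
subbase = {preimage(pi1, G, X) for G in T1} | {preimage(pi2, G, X) for G in T2}
base = {frozenset.intersection(*D) for D in subcollections(subbase) if D}
forced = unions_of(base)

print(projections_continuous, forced == T)
```

Since every set in `forced` must belong to any topology for which both projections are continuous, the equality `forced == T` is the finite counterpart of the inclusion asserted in part (ii).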


32.10.8 THEOREM: Homeomorphisms from slice sets with relative topology to direct-product components. 
Let $X_1 \times X_2$ be the product of topological spaces $X_1$ and $X_2$.


(i) $\Pi_1|_{X_1 \times \{x_2\}} : X_1 \times \{x_2\} \to X_1$ is a homeomorphism for all $x_2 \in X_2$, where $X_1 \times \{x_2\}$ has the relative
topology from $X_1 \times X_2$.

(ii) $\Pi_2|_{\{x_1\} \times X_2} : \{x_1\} \times X_2 \to X_2$ is a homeomorphism for all $x_1 \in X_1$, where $\{x_1\} \times X_2$ has the relative
topology from $X_1 \times X_2$.


PROOF: For part (i), $\Pi_1|_{X_1 \times \{x_2\}} : X_1 \times \{x_2\} \to X_1$ is a bijection which is continuous by Theorem 31.12.15
since $\Pi_1 : X_1 \times X_2 \to X_1$ is continuous by Theorem 32.10.7 (i). Let $G \in \mathrm{Top}(X_1 \times \{x_2\})$. Then $G = \Omega \cap (X_1 \times \{x_2\})$
for some $\Omega \in \mathrm{Top}(X_1 \times X_2)$. Therefore $\Pi_1(G) = \Pi_1(\Omega \cap (X_1 \times \{x_2\})) = P_1^{x_2}(\Omega) \in \mathrm{Top}(X_1)$ by Theorem 32.10.3 (i).
So $(\Pi_1|_{X_1 \times \{x_2\}})^{-1} : X_1 \to X_1 \times \{x_2\}$ is continuous by Definition 31.12.4. Hence $\Pi_1|_{X_1 \times \{x_2\}} : X_1 \times \{x_2\} \to X_1$
is a homeomorphism for all $x_2 \in X_2$.


Part (ii) may be proved according to the pattern of part (i).


32.10.9 REMARK: Projection maps, projected slices and “partial maps” for a function of two variables. 
Figure 32.10.2 illustrates the projection maps $\Pi_1$ and $\Pi_2$ in Theorem 32.10.7 and the projected slices $P_1^{x_2}(\Omega)$
and $P_2^{x_1}(\Omega)$ of open sets in the product topological space which are defined in Notation 32.10.2, together
with the partial maps $f_1^{x_2}$ and $f_2^{x_1}$ which are defined in the statement of Theorem 32.10.10.


Figure 32.10.2 Projection maps, projected slices and partial maps for a function of two variables


32.10.10 THEOREM: The “partial maps” of a continuous function of two variables are continuous. 
Let $(X_1, T_1)$, $(X_2, T_2)$ and $(Y, T_Y)$ be topological spaces. Let $f : X_1 \times X_2 \to Y$ be continuous.


(i) The function $f_1^{x_2} : X_1 \to Y$ defined by $f_1^{x_2} : x_1 \mapsto f(x_1, x_2)$ is continuous for all $x_2 \in X_2$.
(ii) The function $f_2^{x_1} : X_2 \to Y$ defined by $f_2^{x_1} : x_2 \mapsto f(x_1, x_2)$ is continuous for all $x_1 \in X_1$.


PROOF: For part (i), let $(X_1, T_1)$, $(X_2, T_2)$ and $(Y, T_Y)$ be topological spaces. Let $f : X_1 \times X_2 \to Y$ be
continuous. Let $x_2 \in X_2$. Define $f_1^{x_2} : X_1 \to Y$ by $f_1^{x_2} : x_1 \mapsto f(x_1, x_2)$. Let $\Omega \in T_Y$ and let $G_1 = (f_1^{x_2})^{-1}(\Omega)$.
Then $G_1 = \{x_1 \in X_1;\ f(x_1, x_2) \in \Omega\} = \{x_1 \in X_1;\ (x_1, x_2) \in G\}$, where $G = f^{-1}(\Omega) \in \mathrm{Top}(X_1 \times X_2)$. So $G_1 = P_1^{x_2}(G) \in T_1$
by Theorem 32.10.3 (i). Hence $f_1^{x_2}$ is continuous for all $x_2 \in X_2$.


Part (ii) may be proved exactly as for part (i).


32.10.11 REMARK: Relations between “partial maps” and projection maps. 

The “partial maps” in Theorem 32.10.10 are given this name by analogy with partial derivatives, where 
one variable is allowed to vary while all other variables are fixed. They (more or less obviously) satisfy the 
following relations with the projection maps. 


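(1) $f|_{X_1 \times \{x_2\}} = f_1^{x_2} \circ \Pi_1|_{X_1 \times \{x_2\}} : X_1 \times \{x_2\} \to Y$ for all $x_2 \in X_2$.
(2) $f|_{\{x_1\} \times X_2} = f_2^{x_1} \circ \Pi_2|_{\{x_1\} \times X_2} : \{x_1\} \times X_2 \to Y$ for all $x_1 \in X_1$.
(3) $f_1^{x_2} = f|_{X_1 \times \{x_2\}} \circ \Pi_1^{-1} : X_1 \to Y$ for all $x_2 \in X_2$.
(4) $f_2^{x_1} = f|_{\{x_1\} \times X_2} \circ \Pi_2^{-1} : X_2 \to Y$ for all $x_1 \in X_1$.

The compositions on lines (3) and (4) are general relation compositions as in Definition 9.6.2 because $\Pi_1^{-1}$
and $\Pi_2^{-1}$ are typically not functions. However, the “output” is a well-defined function in each case.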


32.11. Common-domain function-product homeomorphisms


32.11.1 REMARK: Continuity of common-domain direct product of two continuous functions. 

Theorem 32.11.2 concerns the common-domain style of direct function product in Definition 10.15.2 and 
Notation 10.15.3. (The double-domain version of Theorem 32.11.2 (i) is Theorem 32.9.10 (i).) The continuity 
of this style of direct function product is relevant when combining the projection map with a fibre chart for a 
topological fibre bundle as in Definition 47.3.3. In fact, common-domain function-product homeomorphisms 
are the simplest kind of topological fibration, known as a "trivial fibration". 


A topological space which is homeomorphic to the direct product of two topological spaces may be referred 
to as a “(globally) product-structured topological space". A topological fibration, as in Definition 47.3.6, is 
then a “locally product-structured topological space". 


32.11.2 THEOREM: Continuity of common-domain direct products of continuous functions. 
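Let $\phi_k : Y \to X_k$ be maps between topological spaces $Y$ and $X_k$, for $k = 1, 2$.


(i) If $\phi_k : Y \to X_k$ are continuous for $k = 1, 2$, then $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is continuous.
(ii) If $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is continuous, then $\phi_k : Y \to X_k$ are continuous for $k = 1, 2$.


PROOF: For part (i), let $\phi = \phi_1 \times \phi_2$ and let $G \in \mathrm{Top}(X_1 \times X_2)$. Then by Definition 32.9.4, $G = \bigcup C$ for some set-collection
$C \subseteq \{G_1 \times G_2;\ G_1 \in \mathrm{Top}(X_1) \text{ and } G_2 \in \mathrm{Top}(X_2)\}$. So $\phi^{-1}(G) = \bigcup \{\phi^{-1}(G_1 \times G_2);\ G_1 \times G_2 \in C\}$ by
Theorem 10.7.6 (iii). But $\phi_k^{-1}(G_k) \in \mathrm{Top}(Y)$ for $k = 1, 2$. So $\phi^{-1}(G_1 \times G_2) = \phi_1^{-1}(G_1) \cap \phi_2^{-1}(G_2) \in \mathrm{Top}(Y)$
for all $G_1 \times G_2 \in C$ by Theorem 10.15.6 (iii) and Definition 31.3.2 (ii). Therefore $\phi^{-1}(G) \in \mathrm{Top}(Y)$ by
Definition 31.3.2 (iii). Hence $\phi$ is continuous.


For part (ii), let $\Omega \in \mathrm{Top}(X_1)$. Let $G = \Omega \times X_2$. Then $G \in \mathrm{Top}(X_1 \times X_2)$ by Theorem 32.9.6 (ii). So
$(\phi_1 \times \phi_2)^{-1}(G) \in \mathrm{Top}(Y)$ since $\phi_1 \times \phi_2$ is continuous. But $(\phi_1 \times \phi_2)^{-1}(G) = \{y \in Y;\ (\phi_1(y), \phi_2(y)) \in \Omega \times X_2\}$
$= \{y \in Y;\ \phi_1(y) \in \Omega\} = \phi_1^{-1}(\Omega)$. Hence $\phi_1$ is continuous. Similarly, $\phi_2$ is continuous.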
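

Theorem 32.11.2 states an equivalence, so it can be tested exhaustively on small finite spaces. The following Python sketch does this under assumptions which are not part of the text: the three-point space `Y`, the two-point target spaces, and the helper names `subcollections`, `product_topology` and `is_continuous` are all hypothetical. Functions are represented as dictionaries, and every pair of functions from `Y` to the two target spaces is checked.

```python
from itertools import chain, combinations, product


def subcollections(sets):
    """Yield every subcollection (as a tuple) of a finite collection of sets."""
    sets = list(sets)
    return chain.from_iterable(combinations(sets, r) for r in range(len(sets) + 1))


def product_topology(T1, T2):
    """Finite case of Definition 32.9.4: unions of rectangles G1 x G2."""
    rectangles = {frozenset(product(G1, G2)) for G1 in T1 for G2 in T2}
    return {frozenset().union(*C) for C in subcollections(rectangles)}


def is_continuous(f, Y, TY, TX):
    """f is a dict on Y; continuity means every open set has an open inverse image."""
    return all(frozenset(y for y in Y if f[y] in G) in TY for G in TX)


# Hypothetical example spaces.
Y = ("p", "q", "r")
TY = {frozenset(), frozenset({"p"}), frozenset({"p", "q"}), frozenset(Y)}
X1, X2 = ("a", "b"), (0, 1)
T1 = {frozenset(), frozenset({"a"}), frozenset(X1)}
T2 = {frozenset(), frozenset({0}), frozenset(X2)}
T = product_topology(T1, T2)

checked = 0
agree = True
for values1 in product(X1, repeat=len(Y)):
    for values2 in product(X2, repeat=len(Y)):
        phi1 = dict(zip(Y, values1))
        phi2 = dict(zip(Y, values2))
        # Common-domain product: y maps to the pair (phi1(y), phi2(y)).
        phi = {y: (phi1[y], phi2[y]) for y in Y}
        both = is_continuous(phi1, Y, TY, T1) and is_continuous(phi2, Y, TY, T2)
        agree = agree and (is_continuous(phi, Y, TY, T) == both)
        checked += 1

print(checked, agree)
```

The loop compares continuity of the product map with continuity of both components for each of the 64 pairs of functions, which covers both directions of the theorem at once for this particular example.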


32.11.3 REMARK: Common-domain function products which are homeomorphisms. 

The set $\phi_2^{-1}(\{x_2\})$ in Theorem 32.11.4 (i) is a kind of “horizontal fibre set” within the set $Y$. It is the inverse
image under a homeomorphism of the slice-set $X_1 \times \{x_2\}$. (See Definition 10.12.5 for slice sets of Cartesian
products.) Whereas the set $X_1 \times \{x_2\}$ can be considered to have the same topology as $X_1$, in the case of the
“horizontal fibre set” $\phi_2^{-1}(\{x_2\})$ the natural topology is the relative topology from $Y$. Unsurprisingly, the
restriction of $\phi_1$ to this “horizontal fibre set” is a homeomorphism. This seemingly uninteresting observation
is related to the fact that each fibre set of a topological fibration is homeomorphic to its fibre space. (See
Theorem 47.3.11.) This idea becomes more interesting when it is shown in Theorem 52.7.5 (ix, x) that
horizontal and vertical “fibre sets” are diffeomorphic to the corresponding direct-product component spaces.
This can be applied to differentiable fibre bundles to show that each fibre set is diffeomorphic to the fibre
space. (See Theorem 64.3.9 (ii).) Theorem 32.11.4 is illustrated in Figure 32.11.1.


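[Figure: the maps $\phi_2 : Y \to X_2$ and $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$, the fibre set $\phi_2^{-1}(\{x_2\}) \subseteq Y$ and the slice $X_1 \times \{x_2\} \subseteq X_1 \times X_2$]


Figure 32.11.1 Homeomorphism for pointwise restrictions of a direct product map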


32.11.4 THEOREM: Homeomorphisms from horizontal/vertical subspaces to direct product components. 
Let $(X_1, T_1)$, $(X_2, T_2)$ and $(Y, T_Y)$ be topological spaces. Let $\phi_1 : Y \to X_1$ and $\phi_2 : Y \to X_2$ be functions
such that $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is a homeomorphism.

(i) $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to X_1$ is a homeomorphism for all $x_2 \in X_2$.

(ii) $\phi_2|_{\phi_1^{-1}(\{x_1\})} : \phi_1^{-1}(\{x_1\}) \to X_2$ is a homeomorphism for all $x_1 \in X_1$.


PROOF 1: For part (i), let $x_2 \in X_2$. Then the restriction $g_0 = \phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to X_1$ of $\phi_1$ is
continuous by Theorems 32.11.2 (ii) and 31.12.15, and it is a bijection by Theorem 10.15.10 (ii). Therefore
its inverse, $g_1 = (\phi_1|_{\phi_2^{-1}(\{x_2\})})^{-1} : X_1 \to \phi_2^{-1}(\{x_2\}) \subseteq Y$, is a well-defined bijection.


Let $f = (\phi_1 \times \phi_2)^{-1}$. Then $f : X_1 \times X_2 \to Y$ is a well-defined homeomorphism. For any $(x_1, x_2) \in X_1 \times X_2$,
$(\phi_1 \times \phi_2)(f(x_1, x_2)) = (x_1, x_2)$, and so $\phi_1(f(x_1, x_2)) = x_1$ and $\phi_2(f(x_1, x_2)) = x_2$. So $f(x_1, x_2) \in \phi_2^{-1}(\{x_2\})$,
and consequently $\phi_1|_{\phi_2^{-1}(\{x_2\})}(f(x_1, x_2)) = x_1$. Thus $g_1(x_1) = f(x_1, x_2)$ for all $x_1 \in X_1$. But $f_1^{x_2} : X_1 \to Y$ in
the statement of Theorem 32.10.10 (i) also satisfies $f_1^{x_2}(x_1) = f(x_1, x_2)$ for all $x_1 \in X_1$. Therefore $g_1 = f_1^{x_2}$.
So $g_1$ is continuous by Theorem 32.10.10 (i). Hence $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to X_1$ is a homeomorphism.


Part (ii) may be proved exactly as for part (i).


PROOF 2: For part (i), let $x_2 \in X_2$. Then the map $(\phi_1 \times \phi_2)|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to X_1 \times \{x_2\}$ is a
homeomorphism with respect to the relative topologies on $\phi_2^{-1}(\{x_2\}) \subseteq Y$ and $(X_1 \times \{x_2\}) \subseteq (X_1 \times X_2)$ by
Theorem 31.14.14. But $\phi_1|_{\phi_2^{-1}(\{x_2\})} = \Pi_1|_{X_1 \times \{x_2\}} \circ (\phi_1 \times \phi_2)|_{\phi_2^{-1}(\{x_2\})}$, where $\Pi_1|_{X_1 \times \{x_2\}} : X_1 \times \{x_2\} \to X_1$
is a homeomorphism by Theorem 32.10.8 (i). Hence $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to X_1$ is a homeomorphism for
all $x_2 \in X_2$.


Part (ii) may be proved exactly as for part (i).


32.11.5 REMARK: Product-structured topological spaces are the same as trivial topological fibrations. 

Definition 32.11.6 gives names to the horizontal and vertical subspaces of a product-structured topological
space. (See Definition 10.15.12 for the non-topological version of Definition 32.11.6.) In terms of the
notations in Definition 32.11.6, Theorem 32.11.4 asserts that $\phi_1|_{Y_1^{x_2}} : Y_1^{x_2} \to X_1$ and $\phi_2|_{Y_2^{x_1}} : Y_2^{x_1} \to X_2$
are homeomorphisms for all $x_1 \in X_1$ and $x_2 \in X_2$. The fact that the tuples $(Y_1^{x_2}, T_1^{x_2})$ and $(Y_2^{x_1}, T_2^{x_1})$ are
valid topological spaces follows from Theorem 31.6.3.


32.11.6 DEFINITION: A product-structured topological space is a tuple $Y \mathrel{-\!\!<} (Y, \phi_1, \phi_2, X_1, X_2)$ where $Y$, $X_1$
and $X_2$ are topological spaces and $\phi_1 : Y \to X_1$ and $\phi_2 : Y \to X_2$ are functions such that the common-domain
function product $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is a homeomorphism.


A horizontal subspace of a product-structured topological space $(Y, \phi_1, \phi_2, X_1, X_2)$ is a topological space
$(Y_1^{x_2}, T_1^{x_2})$ for some $x_2 \in X_2$, where


(i) $Y_1^{x_2} = \phi_2^{-1}(\{x_2\})$, and
(ii) $T_1^{x_2} = \{\Omega \cap Y_1^{x_2};\ \Omega \in \mathrm{Top}(Y)\}$.


A vertical subspace of a product-structured topological space $(Y, \phi_1, \phi_2, X_1, X_2)$ is a topological space
$(Y_2^{x_1}, T_2^{x_1})$ for some $x_1 \in X_1$, where


(i) $Y_2^{x_1} = \phi_1^{-1}(\{x_1\})$, and
(ii) $T_2^{x_1} = \{\Omega \cap Y_2^{x_1};\ \Omega \in \mathrm{Top}(Y)\}$.


32.12. Product topology for a family of spaces 


32.12.1 REMARK: The topology for a general direct product of topological spaces. 

Although Definition 32.9.4 expresses the topology on the product of two topological spaces in terms of unions
of products of open sets from the two topologies, this way of expressing the product topology is aimed more
at brevity than clear meaning. It is the proof of Theorem 32.10.7 part (ii) which best shows how the product
topology can be generalised to arbitrary products. For the projection maps to be continuous, all sets of the
form $\times_{i \in I} \Omega_i$ such that $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ and $\#\{i \in I;\ \Omega_i \ne X_i\} = 1$ must be in the product topology,
and by the closure of a topology under finite intersections, all sets of the form $\times_{i \in I} \Omega_i$ with $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ such that
$\#\{i \in I;\ \Omega_i \ne X_i\} < \infty$ must be in the product topology. Then by closure under arbitrary unions, all unions
of such sets must be in the product topology. This is the weakest topology for which all projection maps are
continuous. This explains the form of Definition 32.12.2.


32.12.2 DEFINITION: The (direct) product topology for a family of topological spaces $(X_i, T_i)_{i \in I}$ is the set
of all unions of sets $\times_{i \in I} \Omega_i$ such that $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ and $\#\{i \in I;\ \Omega_i \ne X_i\} < \infty$. In other words, the
direct product topology is the set $T$ defined by


$T' = \{\times_{i \in I} \Omega_i;\ (\Omega_i)_{i \in I} \in \times_{i \in I} T_i \text{ and } \#\{i \in I;\ \Omega_i \ne X_i\} < \infty\}$
and
$T = \{\bigcup C;\ C \subseteq T'\}$.


The (direct) product of a family of topological spaces $(X_i, T_i)_{i \in I}$ is the pair $(X, T)$, where $X = \times_{i \in I} X_i$.


32.12.3 THEOREM: Product topology on a family is the weak topology generated by its projection maps.
Let $T$ be the product topology for the product $X = \times_{i \in I} X_i$ of a family of topological spaces $(X_i, T_i)_{i \in I}$.
Define the projection maps $\Pi_i : X \to X_i$ by $\Pi_i : x \mapsto x_i$ for all $i \in I$.

(i) $(X, T)$ is a topological space.

(ii) $\Pi_i$ is continuous with respect to the topologies $T_i$ and $T$ on $X_i$ and $X$ respectively for all $i \in I$.


(iii) If $\tilde{T}$ is a topology on $X$ for which $\Pi_i$ is continuous with respect to the topologies $T_i$ and $\tilde{T}$ on $X_i$ and
$X$ respectively for all $i \in I$, then $T \subseteq \tilde{T}$.


In other words, the product topology on $X$ is the weak topology on $X$ which is generated by the family
$(\Pi_i)_{i \in I}$ of projection maps $\Pi_i : X \to X_i$, in the terminology of Definition 32.4.8.


PROOF: For part (i), let $G \in T'$, where $T'$ is as in Definition 32.12.2. Then $G = \times_{i \in I} \Omega_i$ for some family
$(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ for which $\Omega_i$ is equal to $X_i$ for all $i \in I$ except for a finite subset $J = \{i \in I;\ \Omega_i \ne X_i\}$.
For all $j \in J$, define $(G_i^j)_{i \in I} \in \times_{i \in I} T_i$ by $G_i^j = \Omega_i$ if $i = j$ and $G_i^j = X_i$ if $i \ne j$. Then $G = \bigcap_{j \in J} \times_{i \in I} G_i^j$.
But $\times_{i \in I} G_i^j = \Pi_j^{-1}(\Omega_j)$ for all $j \in J$. So $T' = \{\bigcap_{j \in J} \Pi_j^{-1}(\Omega_j);\ J \in \mathbb{P}^\infty(I) \text{ and } \forall j \in J,\ \Omega_j \in T_j\}$. Hence $T$
is a topology on $X$ by Theorem 32.4.6.

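For part (ii), let $G \in T$. Then $G = \bigcup C$ for some subset $C$ of $T'$ in Definition 32.12.2. Let $j \in I$ and
$G_j = \Pi_j(G) = \{x_j \in X_j;\ \exists x \in \bigcup C,\ \Pi_j(x) = x_j\}$. Let $I' = I \setminus \{j\}$ and $X' = \times_{i \in I'} X_i$. Then


$G_j = \{x_j \in X_j;\ \exists x' \in X',\ \exists (\Omega_i)_{i \in I} \in C,\ (x_j \in \Omega_j \text{ and } \forall i \in I',\ x'_i \in \Omega_i)\}$
$\quad = \{x_j \in X_j;\ \exists (\Omega_i)_{i \in I} \in C,\ \exists x' \in X',\ (x_j \in \Omega_j \text{ and } \forall i \in I',\ x'_i \in \Omega_i)\}$
$\quad = \{x_j \in X_j;\ \exists (\Omega_i)_{i \in I} \in C,\ (x_j \in \Omega_j \text{ and } \exists x' \in X',\ \forall i \in I',\ x'_i \in \Omega_i)\}$
$\quad = \{x_j \in X_j;\ \exists (\Omega_i)_{i \in I} \in C,\ (x_j \in \Omega_j \text{ and } \times_{i \in I'} \Omega_i \ne \emptyset)\}$
$\quad = \bigcup \{\Omega_j \in T_j;\ \exists (\Omega_i)_{i \in I} \in C,\ \times_{i \in I'} \Omega_i \ne \emptyset\}$.


So $G_j \in T_j$ by Definition 31.3.2 (iii). Thus $\Pi_j$ maps open sets to open sets. To show continuity, let $A_j \in T_j$ and
let $A_i = X_i$ for all $i \in I'$. Then $\Pi_j^{-1}(A_j) = \times_{i \in I} A_i \in T' \subseteq T$. Hence $\Pi_j$ is continuous by Definition 31.12.4 for all $j \in I$.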


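For part (iii), let $\tilde{T}$ be a topology on $X = \times_{i \in I} X_i$ such that $\Pi_i$ is continuous for the topologies $T_i$ and $\tilde{T}$
on $X_i$ and $X$ respectively for all $i \in I$. Let $\Omega \in T$. Then $\Omega = \bigcup C$ for some $C \in \mathbb{P}(T')$, where $T'$ is
as in Definition 32.12.2. Let $G \in C$. Then $G = \times_{i \in I} \Omega_i$ for some $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ for which the set
$J = \{i \in I;\ \Omega_i \ne X_i\}$ is finite. But $\times_{i \in I} \Omega_i = \bigcap_{j \in J} \times_{i \in I} \Omega_i^j$, where for each $j \in J$, $(\Omega_i^j)_{i \in I} \in \times_{i \in I} T_i$
satisfies $\Omega_i^j = \Omega_i$ for $i = j$ and $\Omega_i^j = X_i$ for all $i \in I \setminus \{j\}$. Also, $\times_{i \in I} \Omega_i^j = \Pi_j^{-1}(\Omega_j)$. So $\times_{i \in I} \Omega_i^j \in \tilde{T}$ by
Definition 31.12.4 because $\Pi_j : X \to X_j$ is continuous with respect to $\tilde{T}$ and $T_j$. Therefore $\bigcap_{j \in J} \times_{i \in I} \Omega_i^j \in \tilde{T}$
by Definition 31.3.2 (ii). Thus $G \in \tilde{T}$ for all $G \in C$. Therefore $\Omega = \bigcup C \in \tilde{T}$ by Definition 31.3.2 (iii). Hence
$T \subseteq \tilde{T}$ by Definition 32.12.2.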


32.12.4 THEOREM: Open base and open subbase for a product of topological spaces. 
Let $T$ be the product topology for a family of topological spaces $(X_i, T_i)_{i \in I}$.


(i) The set $B = \{\times_{i \in I} \Omega_i;\ (\Omega_i)_{i \in I} \in \times_{i \in I} T_i \text{ and } \#\{i \in I;\ \Omega_i \ne X_i\} < \infty\}$ is an open base for $T$.
(ii) The set $S = \{\times_{i \in I} \Omega_i;\ (\Omega_i)_{i \in I} \in \times_{i \in I} T_i \text{ and } \#\{i \in I;\ \Omega_i \ne X_i\} = 1\}$ is an open subbase for $T$.


PROOF: Part (i) follows directly from Definition 32.12.2 and Theorem 32.2.6.
Part (ii) follows from Definition 32.3.5 and the observation that $B = \{\bigcap D;\ D \in \mathbb{P}^\infty_1(S)\}$.


32.12.5 REMARK: Axiom-of-choice issues for product topologies.

Some minor axiom-of-choice issues arise when defining the product of an arbitrary family of topological
spaces in Definition 32.12.2. There are even more significant AC issues with theorems about products of
arbitrary families of topological spaces. (For example, the product can be shown to be compact if each
individual space is compact, but only by means of an axiom of choice.) If one works within a ZF model
where the product of a non-empty family of non-empty topological spaces can be empty, many claims about
the properties of the product topology will be true simply because the product topology will be empty. (This
is also mentioned in Remark 32.9.2.)


Theorem 32.12.6 (iii) asserts that if the set-product $X = \times_{i \in I} X_i$ is empty, then the product topology in
Definition 32.12.2 is a valid topology $\{\emptyset\}$ on $X$. (This “empty topology” is discussed in Remark 31.3.9 and
Example 31.5.2.) Note that there are two ways in which $X$ could be empty. Either one or more of the sets
$X_i$ are empty, or no choice axiom powerful enough to guarantee $X \ne \emptyset$ is available.


In most practical situations, no axiom of choice is required. For example, if $X_i$ is the same non-empty set
for all $i \in I$, then one may nominate any fixed element $x$ of the set so that $(x)_{i \in I} \in \times_{i \in I} X_i$. More generally,
if $\bigcap_{i \in I} X_i \ne \emptyset$, one may nominate any $x \in \bigcap_{i \in I} X_i$ in a similar way. If each $X_i$ is a group, one may nominate
the identity $e_i$ in each $X_i$, so that $(e_i)_{i \in I} \in \times_{i \in I} X_i$. Trouble typically arises only for specially crafted
pathological cases, such as in the “construction” of Lebesgue non-measurable sets.


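32.12.6 THEOREM: Some trivial boundary cases for product topologies.
Let $(X_i, T_i)_{i \in I}$ be a family of topological spaces. Let $T$ be the product topology on the Cartesian set-product
$X = \times_{i \in I} X_i$.
(i) If $I = \emptyset$, then $T = \{\emptyset, \{\emptyset\}\}$.
(ii) $\emptyset \in T$.
(iii) If $X = \emptyset$, then $T = \{\emptyset\}$.


PROOF: For part (i), let $(X_i, T_i)_{i \in I}$ be a family of topological spaces with $I = \emptyset$. Then $\times_{i \in I} T_i = \{\emptyset\}$ by
Theorem 10.11.6 (i). So the set $T'$ in Definition 32.12.2 is equal to $\{\{\emptyset\}\}$ (by applying Theorem 10.11.6 (i)
to $\times_{i \in I} \Omega_i$). Let $C \subseteq T'$. Then $C = \emptyset$ or $C = \{\{\emptyset\}\}$, and so $\bigcup C = \emptyset$ or $\bigcup C = \{\emptyset\}$. Hence $T = \{\emptyset, \{\emptyset\}\}$.
For part (ii), let $(X_i, T_i)_{i \in I}$ be a family of topological spaces. Let $X = \times_{i \in I} X_i$. If $I = \emptyset$, then $\emptyset \in T$ by
part (i). Now suppose that $I \ne \emptyset$. For any $j \in I$, define $(\Omega_i)_{i \in I}$ by $\Omega_j = \emptyset$ and $\Omega_i = X_i$ for all $i \in I \setminus \{j\}$.
Then $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$. So $\times_{i \in I} \Omega_i \in T'$, where $T'$ is as in Definition 32.12.2. But $\times_{i \in I} \Omega_i = \emptyset$. So $\emptyset \in T$
by Definition 32.12.2.

For part (iii), suppose that $T \ne \{\emptyset\}$. Since $\emptyset \in T$ by part (ii), $T$ must contain at
least one non-empty set $\Omega$. Then $\Omega = \bigcup C$ for some $C \in \mathbb{P}(T')$, where $T'$ is as in Definition 32.12.2. So $T'$
cannot equal $\emptyset$ or $\{\emptyset\}$. Therefore $T'$ must contain at least one non-empty set $G$. Then $G = \times_{i \in I} \Omega_i$ for some
$(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ for which $\Omega_i$ is equal to $X_i$ for all $i \in I$ except for a finite subset. But $\times_{i \in I} \Omega_i \subseteq \times_{i \in I} X_i$.
Hence $X \ne \emptyset$. Thus if $X = \emptyset$, then $T = \{\emptyset\}$.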


32.12.7 REMARK: Continuity of functions on products of topological spaces. 

Theorems 32.12.8, 32.12.9, 32.12.10 and 32.12.11 are concerned with continuity of functions related to direct 
products of topological spaces. The continuity property for functions in Theorem 32.12.8 can more or less 
be considered to be a defining characteristic for product space topology. Theorem 32.12.11 has practical 
applications, for example in the proof of Theorem 33.3.34 regarding $T_3$ and $T_{3\frac{1}{2}}$ topological separation
classes. Theorems 32.12.9 and 32.12.10 are used in the proof of Theorem 32.12.11. 


32.12.8 THEOREM: Functions with product target space are continuous when projections are continuous. 
Let $X$ be a topological space. Let $Y = \times_{i \in I} Y_i$ be the product of a family of topological spaces $(Y_i)_{i \in I}$.
Then a function $f : X \to Y$ is continuous if and only if $\Pi_i \circ f : X \to Y_i$ is continuous for all $i \in I$, where
$\Pi_i : Y \to Y_i$ denotes the standard projection map for $(Y_i)_{i \in I}$.


PROOF: Let $f : X \to Y = \times_{i \in I} Y_i$ be continuous. Then $\Pi_i \circ f : X \to Y_i$ is continuous for all $i \in I$ by
Theorem 32.12.3 (ii) and Theorem 31.12.7.


For the converse, let $f : X \to Y$ and suppose that $\Pi_i \circ f : X \to Y_i$ is continuous for all $i \in I$. Let $\Omega \in \mathrm{Top}(Y)$
and $x \in \Omega$. Let $B_Y = \{\times_{i \in I} \Omega_i;\ (\Omega_i)_{i \in I} \in \times_{i \in I} \mathrm{Top}(Y_i) \text{ and } \#\{i \in I;\ \Omega_i \ne Y_i\} < \infty\}$. Then $B_Y$ is an
open base for $Y$ by Theorem 32.12.4 (i). Let $G \in B_Y$. Then $G = \times_{i \in I} \Omega_i$ for some $(\Omega_i)_{i \in I} \in \times_{i \in I} \mathrm{Top}(Y_i)$


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 


1094 32. Topological space constructions 


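such that $J = \{i \in I;\ \Omega_i \ne Y_i\}$ is finite. But then $G = \bigcap_{j \in J} \times_{i \in I} \Omega_i^j$, where for all $j \in J$, the family
$(\Omega_i^j)_{i \in I} \in \times_{i \in I} \mathrm{Top}(Y_i)$ is defined by $\Omega_j^j = \Omega_j$ and $\forall i \in I \setminus \{j\},\ \Omega_i^j = Y_i$. Then $\times_{i \in I} \Omega_i^j = \Pi_j^{-1}(\Omega_j)$
for all $j \in J$. So $G = \bigcap_{j \in J} \Pi_j^{-1}(\Omega_j)$. Therefore $f^{-1}(G) = f^{-1}(\bigcap_{j \in J} \Pi_j^{-1}(\Omega_j)) = \bigcap_{j \in J} f^{-1}(\Pi_j^{-1}(\Omega_j))$.
But $f^{-1}(\Pi_j^{-1}(\Omega_j)) = (\Pi_j \circ f)^{-1}(\Omega_j)$ for all $j \in J$ by Theorem 10.7.2 (ii). So $f^{-1}(G) \in \mathrm{Top}(X)$ by
Definition 31.12.4, Theorem 31.3.7 and the continuity of $\Pi_j \circ f$ for all $j \in J$. Hence $f$ is continuous by
Theorem 32.8.10.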


32.12.9 THEOREM: Continuity of composite of Cartesian-space-valued and real-valued functions. 
Let $X$ be a topological space. Let $m \in \mathbb{Z}_0^+$. Let $f_i : X \to \mathbb{R}$ be continuous functions for $i \in \mathbb{N}_m$. Let
$g : \mathbb{R}^m \to \mathbb{R}$ be continuous. Define $h : X \to \mathbb{R}$ by $h : x \mapsto g(f_1(x), f_2(x), \ldots, f_m(x))$. Then $h$ is continuous.


PROOF: Let $X$ be a topological space. Let $m \in \mathbb{Z}_0^+$. Let $f_i : X \to \mathbb{R}$ be continuous functions for $i \in \mathbb{N}_m$.
Define $f : X \to \mathbb{R}^m$ by $f : x \mapsto (f_i(x))_{i=1}^m$. Then $f_i = \Pi_i \circ f$ for all $i \in \mathbb{N}_m$. Therefore $f$ is continuous by
Theorem 32.12.8. Let $g : \mathbb{R}^m \to \mathbb{R}$ be continuous. Define $h : X \to \mathbb{R}$ by $h : x \mapsto g(f_1(x), f_2(x), \ldots, f_m(x))$.
Then $h = g \circ f$. Hence $h$ is continuous by Theorem 31.12.7.


32.12.10 THEOREM: The minimum and maximum functions for finite tuples are continuous. 
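The functions defined on $\mathbb{R}^m$ by $x \mapsto \min\{x_i;\ i \in \mathbb{N}_m\}$ and $x \mapsto \max\{x_i;\ i \in \mathbb{N}_m\}$ are continuous for
all $m \in \mathbb{Z}^+$.


PROOF: Let $m \in \mathbb{Z}^+$. Define $f : \mathbb{R}^m \to \mathbb{R}$ by $f(x) = \min\{x_i;\ i \in \mathbb{N}_m\}$ for $x \in \mathbb{R}^m$. Let $B$ be the set of
open intervals of $\mathbb{R}$. Then $B$ is an open base for $\mathrm{Top}(\mathbb{R})$ by Definition 32.5.7 and Theorem 32.2.6. Let $\Omega \in B$.
Then $\Omega = (a, b)$ for some $a, b \in \mathbb{R}$. (See Notation 16.2.4 for $\mathbb{R}$.) Let $G = f^{-1}(\Omega)$. Then
$G = \{x \in \mathbb{R}^m;\ (\forall i \in \mathbb{N}_m,\ x_i > a) \text{ and } (\exists i \in \mathbb{N}_m,\ x_i < b)\} = \bigcap_{i \in \mathbb{N}_m} \{x \in \mathbb{R}^m;\ x_i > a\} \cap \bigcup_{i \in \mathbb{N}_m} \{x \in \mathbb{R}^m;\ x_i < b\}$.
The sets $\{x \in \mathbb{R}^m;\ x_i > a\}$ and $\{x \in \mathbb{R}^m;\ x_i < b\}$ are open because the projection maps are continuous by Theorem 32.12.3 (ii).
So $G \in \mathrm{Top}(\mathbb{R}^m)$. Hence $f$ is continuous by Theorem 32.8.10. Similarly, the map
$x \mapsto \max\{x_i;\ i \in \mathbb{N}_m\}$ is continuous.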


32.12.11 THEOREM: Continuity of minimum and maximum of Cartesian-space-valued functions. 
Let $X$ be a topological space. Let $(f_i)_{i=1}^m$ be a non-empty finite sequence of real-valued continuous functions
on $X$. Then the functions $\min_{i=1}^m f_i$ and $\max_{i=1}^m f_i$ are continuous on $X$.


PROOF: Let $X$ be a topological space. Let $(f_i)_{i=1}^m$ be a non-empty finite sequence of real-valued continuous
functions on $X$. Define $g : \mathbb{R}^m \to \mathbb{R}$ by $g : x \mapsto \min_{i=1}^m x_i$. Then $g$ is continuous by Theorem 32.12.10. So
the function $h : X \to \mathbb{R}$ defined by $h : x \mapsto \min_{i=1}^m f_i(x)$ is continuous by Theorem 32.12.9. In other words,
the function $\min_{i=1}^m f_i$ is continuous on $X$. Similarly, the function $\max_{i=1}^m f_i$ is continuous on $X$.


32.13. Topological quotient and identification spaces 


32.13.1 REMARK:  Topological identification spaces versus topological patchwork spaces. 
The topological space which is constructed from a given topological space and an equivalence relation
on its point set is widely referred to as an “identification space”, “quotient space” or “decomposition space”.


Topological identification spaces are useful for constructing topological spaces which have more interesting 
connectedness properties from less interesting basic building blocks such as Cartesian spaces. Whereas the 
topological patchwork spaces in Section 32.15 combine multiple patches with continuous overlaps to produce 
a single topological space, identification spaces identify points in a single given topological space by a kind 
of fold-and-pin procedure. The difference between these two kinds of constructions is illustrated for the case
of $S^1$ in Figure 32.13.1.


32.13.2 REMARK:  Quotients defined in terms of partitions, equivalence relations and functions. 
Topological identification spaces are based on non-topological identification spaces, which are also known as 
decomposition spaces or quotient spaces. (See Definition 9.8.7.) There are (at least) three convenient ways 
to construct a quotient space. 


(1) Define a partition of a set X. 


(2) Define an equivalence relation on a set X. 


(3) Define a function on a set X. 


Figure 32.13.1 Comparison of atlas-patchwork and identification space constructions


Every partition determines a corresponding equivalence relation and vice versa. (See Theorem 9.8.4.) For
every equivalence relation $R$ on a set $X$, there is a quotient map (or identification map or decomposition map)
$\phi : X \to X/R$ which maps each element of $X$ to its equivalence class in $X/R$. (See Definition 10.16.2
for quotient maps.) Conversely, for every function $f : X \to Y$, for any set $Y$, an equivalence kernel is
defined on $X$, which is an equivalence relation on $X$. (See Definition 10.16.7 and Theorem 10.16.8 for
equivalence kernels.) Topological identification spaces may be defined in terms of any (or all) of these
three equi-informational specification styles.


Topological identification spaces are defined in terms of partitions by EDM2 [113], page 1610, 425.L; in terms 
of equivalence relations by Willard [165], pages 59-66; Gemignani [80], pages 79-83; in terms of functions by 
Steen/Seebach [141], pages 8-9; B. Mendelson [115], pages 101-105; Hocking/Young [93], pages 132-136; in 
terms of a combination of equivalence relations and functions by Kasriel [100], pages 239-241; Wilansky [163], 
page 105. These authors make clear the close inter-linkage between the three styles of specification. 


Probably the most convenient form of definition of an identification space is in terms of a function on a set 
X because every function determines a unique partition and equivalence relation, whereas the partition and 
equivalence relation correspond to only one canonical function, namely the quotient map from X to X/R 
for an equivalence relation R on X. 


In linguistic terms, the word “quotient” suggests the division or partitioning of a set into cosets or equivalence 
classes. Therefore the partition style of definition seems closest to the meaning of the word. The word 
"identification", on the other hand, suggests an identification map, which tends to support the function style 
of definition. 


32.13.3 REMARK: The topology induced by a map onto the range set. 

The topology induced on a set $Y$ by a map $f : X \to Y$ in Definition 32.13.5 is the strongest topology for
which $f$ is continuous. The continuity of $f$ with respect to $T_X$ and $T_Y$ follows trivially from the definition
of continuity. If any more open sets were added to $T_Y$, clearly the continuity property would fail. This is a
kind of converse, inverse, obverse or reverse of Theorem 32.8.2.


32.13.4 THEOREM: The “push-forth” of a topology via an arbitrary function. 
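Let $(X, T_X)$ be a topological space. Let $f : X \to Y$ for some set $Y$. Then the set $\{\Omega \in \mathbb{P}(Y);\ f^{-1}(\Omega) \in T_X\}$
is a topology on $Y$.


PROOF: Let $T_Y = \{\Omega \in \mathbb{P}(Y);\ f^{-1}(\Omega) \in T_X\}$. Then $T_Y \subseteq \mathbb{P}(Y)$, and $\{\emptyset, Y\} \subseteq T_Y$ since $f^{-1}(\emptyset) = \emptyset \in T_X$
and $f^{-1}(Y) = X \in T_X$. Let $\Omega_1, \Omega_2 \in T_Y$. Then $f^{-1}(\Omega_1) \in T_X$ and $f^{-1}(\Omega_2) \in T_X$. So $f^{-1}(\Omega_1 \cap \Omega_2) =
f^{-1}(\Omega_1) \cap f^{-1}(\Omega_2) \in T_X$ by Theorem 10.6.10 (iv) and Definition 31.3.2 (ii). Therefore $\Omega_1 \cap \Omega_2 \in T_Y$. Thus
$T_Y$ is closed under binary intersection. Let $C \in \mathbb{P}(T_Y)$. Then $f^{-1}(\Omega) \in T_X$ for all $\Omega \in C$. So $f^{-1}(\bigcup C) =
\bigcup_{\Omega \in C} f^{-1}(\Omega) \in T_X$ by Theorem 10.8.18 (iii) and Definition 31.3.2 (iii). Therefore $\bigcup C \in T_Y$. Thus $T_Y$ is
closed under arbitrary unions. Hence $T_Y$ is a topology on $Y$ by Definition 31.3.2.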


32.13.5 DEFINITION: The topology induced by a map $f : X \to Y$ from a topological space $(X, T_X)$ to a
set $Y$ is the set $\{\Omega \in \mathbb{P}(Y);\ f^{-1}(\Omega) \in T_X\}$.
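
For finite sets, Definition 32.13.5 can be evaluated by brute force. The following Python sketch is only an illustration and is not part of the book's formal development. The names `powerset`, `is_topology` and `induced_topology`, and the three-point example spaces, are hypothetical choices made here.

```python
from itertools import chain, combinations

def powerset(s):
    """Return all subsets of s as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_topology(points, opens):
    """Check the open-set axioms on a finite set (pairwise closure suffices)."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(points) not in opens:
        return False
    return all(a & b in opens and a | b in opens for a in opens for b in opens)

def induced_topology(f, X, T_X, Y):
    """Return the set of all subsets Omega of Y such that f^{-1}(Omega) is in T_X."""
    T_X = set(T_X)
    return {om for om in powerset(Y)
            if frozenset(x for x in X if f[x] in om) in T_X}

# Assumed example: a three-point space and a non-surjective map onto {"a", "b", "c"}.
X = {0, 1, 2}
T_X = {frozenset(), frozenset({0}), frozenset({0, 1}), frozenset(X)}
Y = {"a", "b", "c"}
f = {0: "a", 1: "a", 2: "b"}   # the point "c" has an empty preimage
T_Y = induced_topology(f, X, T_X, Y)
print(is_topology(Y, T_Y))
```

Since the map in this example is not surjective, every subset of $Y \setminus f(X)$ is open in the induced topology. This is consistent with the arbitrary function in Theorem 32.13.4.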


32.13.6 REMARK: Standard topology for the quotient of a topological space by an equivalence kernel.

For any function $f : X \to Y$ from a topological space $(X, T_X)$ to a set $Y$, Definition 32.13.5 may be applied
to the quotient map from $X$ to $X/R$ in Theorem 10.16.8 to define a quotient topology on the quotient
set $X/R$, where $R$ is the equivalence kernel of $f$ in Definition 10.16.7. The topology of the topological
quotient space in Definition 32.13.7 is a topology on $X/R$ by Theorem 32.13.4.


32.13.7 DEFINITION: 
The (topological) quotient space or (topological) identification space or (topological) decomposition space of
a function $f$ from a topological space $X$ to a set $Y$ is the topological space $(X/R, T_{X/R})$ where

    $T_{X/R} = \{\Omega \in \mathbb{P}(X/R);\ g^{-1}(\Omega) \in \mathrm{Top}(X)\}$,

and $g : X \to X/R$ is defined by $g : x \mapsto f^{-1}(\{f(x)\})$, where $R$ is the equivalence kernel of $f$.


32.13.8 REMARK: Standard topology for the quotient of a topological space by an equivalence relation. 
If Definition 32.13.5 is applied to the canonical quotient map for an equivalence relation in Definition 10.16.2, 
the result is the quotient topology in Definition 32.13.9. 


32.13.9 DEFINITION: The quotient topology on the quotient set X/R, where R is an equivalence relation 
on a set X, is the topology induced by the canonical quotient map from X to X/R. 
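
As a rough illustration of Definition 32.13.9 on a finite set, the sketch below represents the quotient set as a list of equivalence classes. The preimage of a family of classes under the canonical quotient map is then the union of those classes. The function name `quotient_topology` and the four-point example are assumptions made for this sketch only.

```python
def quotient_topology(T_X, classes):
    """Open sets of X/R: the families of classes whose union is open in X."""
    classes = list(classes)
    T_X = set(T_X)
    result = set()
    for mask in range(2 ** len(classes)):
        family = frozenset(c for k, c in enumerate(classes) if (mask >> k) & 1)
        preimage = frozenset().union(*family)   # union of the selected classes
        if preimage in T_X:
            result.add(family)
    return result

# Assumed example: nested open sets on {0, 1, 2, 3}, with the points 1 and 2 identified.
X = frozenset({0, 1, 2, 3})
T_X = {frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2}), X}
c0, c12, c3 = frozenset({0}), frozenset({1, 2}), frozenset({3})
T_Q = quotient_topology(T_X, [c0, c12, c3])
print(sorted(len(family) for family in T_Q))
```

The open set $\{0, 1\}$ of $X$ has no counterpart in the quotient because it contains only part of the class $\{1, 2\}$.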


32.14. Set union topology 


32.14.1 THEOREM: The weakest topology on the union of two topological spaces. 
Let $(X_1, T_1)$ and $(X_2, T_2)$ be topological spaces such that $X_1 \cap X_2 = \emptyset$. Then the set $T = \{\Omega_1 \cup \Omega_2;\ \Omega_1 \in
T_1 \text{ and } \Omega_2 \in T_2\}$ is the weakest topology on $X_1 \cup X_2$ such that $T \supseteq T_1 \cup T_2$.


PROOF: The closure of $T$ with respect to pairwise intersection follows from the identity $\bigcap_i (\Omega_1^i \cup \Omega_2^i) =
(\bigcap_i \Omega_1^i) \cup (\bigcap_i \Omega_2^i)$, which holds if $\Omega_1^i \subseteq X_1$ and $\Omega_2^i \subseteq X_2$ for all $i$, and $X_1 \cap X_2 = \emptyset$. The closure with respect
to arbitrary unions follows from the corresponding identity for unions. The minimality of the topology $T$
follows from the fact that any topology which includes $T_1 \cup T_2$ must contain at least all of the unions of elements
of $T_1 \cup T_2$.


32.14.2 DEFINITION: The disjoint set union topology for disjoint sets $X_1$ and $X_2$ with topologies $T_1$ and
$T_2$ respectively is the topology $T = \{\Omega_1 \cup \Omega_2;\ \Omega_1 \in T_1 \text{ and } \Omega_2 \in T_2\}$.


32.14.3 REMARK: Gluing topologies on the intersections of patches. 

Definition 32.14.6 introduces a sort of “patchwork” of two or more topologies. The idea here is to try to 
define a topology on the set $X_1 \cup X_2$, especially in the case that $X_1 \cap X_2$ is non-empty. The topologies of
manifolds and fibre spaces are typically specified by identifying overlaps of patches. This is a very general 
mechanism for creating topologically interesting spaces out of flat, boring spaces without having to resort 
to induced topologies of embeddings or various quotient topologies. A patchwork topology can be expressed 
as the quotient of the “disjoint union topology” (of two nominally non-intersecting topological spaces) with 
respect to an appropriate relation on the base set union. 


A more general form of “patchwork topological space” can be defined with the aid of Definition 10.17.8, which 
defines patchwork spaces of arbitrary families of sets. If the topologies on each member of such a family are 
consistent, then a topological patchwork space is well-defined. This is presented in Definition 32.15.3. 


32.14.4 REMARK: Equivalent condition for the weakest topology on an overlap of patches. 

The conditions $\forall \Omega_1 \in T_1,\ \Omega_1 \cap X_2 \in T_2$ and $\forall \Omega_2 \in T_2,\ \Omega_2 \cap X_1 \in T_1$ in Theorem 32.14.5 are equivalent to
requiring the identity map $\mathrm{id}_{X_1 \cap X_2}$ to be a homeomorphism for $X_1 \cap X_2$ with the relative topologies from
$T_1$ and $T_2$ respectively. (Theorem 32.14.5 is a special case of Theorem 32.14.7.)


32.14.5 THEOREM: Equivalent condition for the weakest topology on union of topological spaces. 

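Suppose two topological spaces $(X_1, T_1)$ and $(X_2, T_2)$ are such that $\Omega_1 \cap X_2 \in T_2$ for all $\Omega_1 \in T_1$ and
$\Omega_2 \cap X_1 \in T_1$ for all $\Omega_2 \in T_2$. Then the set $T = \{\Omega_1 \cup \Omega_2;\ \Omega_1 \in T_1 \text{ and } \Omega_2 \in T_2\}$ is the weakest topology
on $X_1 \cup X_2$ such that $T_1 \cup T_2 \subseteq T$.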


PROOF: It must be shown that the set $T$ is a topology for $X_1 \cup X_2$. As in the proof of Theorem 32.14.1, it is
sufficient to show that $T$ is closed under pairwise intersections and arbitrary unions. (See Definition 31.3.2.)
So consider $\Omega_1^1, \Omega_1^2 \in T_1$ and $\Omega_2^1, \Omega_2^2 \in T_2$. It must be shown that $\Omega = (\Omega_1^1 \cup \Omega_2^1) \cap (\Omega_1^2 \cup \Omega_2^2) \in T$. By
distributivity,

    $\Omega = (\Omega_1^1 \cap \Omega_1^2) \cup (\Omega_1^1 \cap \Omega_2^2) \cup (\Omega_2^1 \cap \Omega_1^2) \cup (\Omega_2^1 \cap \Omega_2^2)$.

Clearly $\Omega_1^1 \cap \Omega_1^2 \in T_1$ and $\Omega_2^1 \cap \Omega_2^2 \in T_2$. Since $\Omega_1^1 \cap \Omega_2^2 = (\Omega_1^1 \cap X_2) \cap \Omega_2^2 = \Omega_1^1 \cap (\Omega_2^2 \cap X_1)$, it follows that
$\Omega_1^1 \cap \Omega_2^2 \in T_1 \cap T_2$ by the assumptions of the theorem. Similarly $\Omega_2^1 \cap \Omega_1^2 \in T_1 \cap T_2$. So $\Omega$ is equal to the
union of an element of $T_1$ and an element of $T_2$. Therefore $\Omega \in T$. The closure of $T$ under arbitrary unions
follows from the associativity and commutativity of set unions. So $T$ is a topology on $X_1 \cup X_2$. Any topology
on $X_1 \cup X_2$ which includes $T_1$ and $T_2$ must also contain all pairwise intersections and arbitrary unions of
elements of $T_1$ and $T_2$. Hence $T$ is the weakest topology on $X_1 \cup X_2$ which includes $T_1$ and $T_2$.


32.14.6 DEFINITION: The set union topological space of topological spaces $(X_1, T_1)$ and $(X_2, T_2)$ such that
$\Omega_1 \cap X_2 \in T_2$ for all $\Omega_1 \in T_1$ and $\Omega_2 \cap X_1 \in T_1$ for all $\Omega_2 \in T_2$ is the topological space $(X_1 \cup X_2, T)$, where
$T = \{\Omega_1 \cup \Omega_2;\ \Omega_1 \in T_1 \text{ and } \Omega_2 \in T_2\}$.
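
The role of the overlap condition in Definition 32.14.6 can be seen in a small finite experiment. The following Python sketch is illustrative only. The helper names `compatible` and `union_topology`, and the two overlapping two-point patches, are hypothetical choices and do not come from the text.

```python
def is_topology(points, opens):
    """Check the open-set axioms on a finite set (pairwise closure suffices)."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(points) not in opens:
        return False
    return all(a & b in opens and a | b in opens for a in opens for b in opens)

def compatible(X1, T1, X2, T2):
    """Overlap condition: every open set of one patch restricts to an open set of the other."""
    return all(o & X2 in T2 for o in T1) and all(o & X1 in T1 for o in T2)

def union_topology(T1, T2):
    """The set of all unions of one open set from T1 with one open set from T2."""
    return {a | b for a in T1 for b in T2}

X1, X2 = frozenset({0, 1}), frozenset({1, 2})   # patches overlapping in the point 1

# Compatible pair: the overlap {1} is open in both patches.
T1 = {frozenset(), frozenset({1}), X1}
T2 = {frozenset(), frozenset({1}), X2}
T = union_topology(T1, T2)
print(compatible(X1, T1, X2, T2), is_topology(X1 | X2, T))

# Incompatible pair: trivial topologies on both patches.
S1 = {frozenset(), X1}
S2 = {frozenset(), X2}
S = union_topology(S1, S2)
print(compatible(X1, S1, X2, S2), is_topology(X1 | X2, S))
```

In the incompatible case, the set of unions fails to be closed under intersection because $X_1 \cap X_2$ is not a union of open sets of the two patches.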


32.14.7 THEOREM: Some basic properties of general set-union topological spaces. 
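Let $(X_i, T_i)_{i \in I}$ be a family of topological spaces where $\forall i, j \in I,\ \forall \Omega \in T_i,\ \Omega \cap X_j \in T_j$. Let $X = \bigcup_{i \in I} X_i$
and $T = \{\bigcup_{i \in I} \Omega_i;\ \forall i \in I,\ \Omega_i \in T_i\}$.
(i) The pair $(X, T)$ is a topological space.
(ii) $\forall i \in I,\ X_i \in T$.
(iii) $T$ is the weakest topology on $X$ such that $\bigcup_{i \in I} T_i \subseteq T$.
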
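PROOF: For part (i), define $(\Omega_i)_{i \in I}$ by $\Omega_i = \emptyset$ for all $i \in I$. Then $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ and $\bigcup_{i \in I} \Omega_i = \emptyset$.
So $\emptyset \in T$. Similarly, let $(\Omega_i)_{i \in I} = (X_i)_{i \in I}$. Then $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ and $\bigcup_{i \in I} \Omega_i = X$. So $X \in T$. To show
the closure of $T$ under arbitrary unions, let $\Omega^k \in T$ for all $k \in K$, for some set $K$. Then $\Omega^k = \bigcup_{i \in I} \Omega_i^k$ for
some $(\Omega_i^k)_{i \in I} \in \times_{i \in I} T_i$, for all $k \in K$. Let $\Omega = \bigcup_{k \in K} \Omega^k$. Then $\Omega = \bigcup_{k \in K} \bigcup_{i \in I} \Omega_i^k = \bigcup_{i \in I} \bigcup_{k \in K} \Omega_i^k$. But
$\bigcup_{k \in K} \Omega_i^k \in T_i$ for all $i \in I$ because $T_i$ is a topology. Therefore $\Omega \in T$.

To show closure of $T$ under pairwise intersection, let $\Omega^k \in T$ for $k = 1, 2$. Then $\Omega^k = \bigcup_{i \in I} \Omega_i^k$ for some
$(\Omega_i^k)_{i \in I} \in \times_{i \in I} T_i$, for $k = 1, 2$. Let $\Omega = \Omega^1 \cap \Omega^2$. Then $\Omega = (\bigcup_{i \in I} \Omega_i^1) \cap (\bigcup_{j \in I} \Omega_j^2) = \bigcup_{i,j \in I} (\Omega_i^1 \cap \Omega_j^2)$
by Theorem 8.4.8 (viii). But $\Omega_i^1 \cap \Omega_j^2 = X \cap (\Omega_i^1 \cap \Omega_j^2) = (\bigcup_{\ell \in I} X_\ell) \cap (\Omega_i^1 \cap \Omega_j^2) = \bigcup_{\ell \in I} (X_\ell \cap \Omega_i^1 \cap \Omega_j^2)$ by
Theorem 10.8.14 (i) for all $i, j \in I$, and $X_\ell \cap \Omega_i^1 \cap \Omega_j^2 \in T_\ell$ for all $i, j, \ell \in I$ by the assumptions of the theorem.
Define $(\Omega_\ell)_{\ell \in I}$ by $\Omega_\ell = \bigcup_{i,j \in I} (X_\ell \cap \Omega_i^1 \cap \Omega_j^2)$ for all $\ell \in I$. Then $\Omega = \bigcup_{i,j \in I} \bigcup_{\ell \in I} (X_\ell \cap \Omega_i^1 \cap \Omega_j^2) = \bigcup_{\ell \in I} \Omega_\ell$.
But $\Omega_\ell \in T_\ell$ for all $\ell \in I$ because each set $\Omega_\ell$ is a union of elements of $T_\ell$. Therefore $\Omega \in T$. Hence $T$ is a
topology on $X$.

For part (ii), let $i \in I$ and define $(\Omega_j)_{j \in I}$ by $\Omega_j = X_i$ for $j = i$ and $\Omega_j = \emptyset$ for $j \in I \setminus \{i\}$. Then
$(\Omega_j)_{j \in I} \in \times_{j \in I} T_j$ and $\bigcup_{j \in I} \Omega_j = X_i$. Therefore $X_i \in T$.

For part (iii), let $T'$ be a topology on $X$ such that $\bigcup_{i \in I} T_i \subseteq T'$. Let $\Omega \in T$. Then $\Omega = \bigcup_{i \in I} \Omega_i$ for some
$(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$. Since $\Omega_i \in T_i$ for all $i \in I$, it follows that $\Omega_i \in T'$ for all $i \in I$, and since $T'$ is a topology,
it follows that $\bigcup_{i \in I} \Omega_i \in T'$. Thus $\Omega \in T'$. Therefore $T \subseteq T'$. Hence $T$ is the weakest topology on $X$ such
that $\bigcup_{i \in I} T_i \subseteq T$.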


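32.14.8 DEFINITION: The (overlapping) set-union topology of a family $(X_i, T_i)_{i \in I}$ of topological spaces
which satisfies $\forall i, j \in I,\ \Omega_i \cap X_j \in T_j$ for all $(\Omega_i)_{i \in I} \in \times_{i \in I} T_i$ is the topological space $(X, T)$ with $X =
\bigcup_{i \in I} X_i$ and $T = \{\bigcup_{i \in I} \Omega_i;\ \forall i \in I,\ \Omega_i \in T_i\}$.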


32.14.9 REMARK: Disjoint set-union topology of a family of topological spaces. 

Whereas Theorem 32.14.7 justifies the definition of the potentially overlapping set-union topology of a family 
of sets in Definition 32.14.8, Theorem 32.14.10 justifies the definition of the guaranteed non-overlapping set- 
union topology of a family of sets in Definition 32.14.11. Since the induced topology of a locally Cartesian 
atlas in Chapter 49 (particularly Section 49.8) is typically defined as the union of topologies on overlapping 
patches (in each of which the topology is induced by the inverse of a chart function), Definition 32.14.8 is of 
some relevance to topology on manifolds. 


32.14.10 THEOREM: Some basic properties of disjoint set-union topological spaces.
Let $(X_i, T_i)_{i \in I}$ be a family of topological spaces. Let $X' = \bigcup_{i \in I} (X_i \times \{i\})$ and $T' = \{\bigcup_{i \in I} (\Omega_i \times \{i\});\ \forall i \in
I,\ \Omega_i \in T_i\}$.

(i) For all $i \in I$, the pair $(X_i', T_i')$ is a topological space, where $X_i' = X_i \times \{i\}$ and $T_i' = \{\Omega \times \{i\};\ \Omega \in T_i\}$.
(ii) The pair $(X', T')$ is a topological space.
(iii) $\forall i \in I,\ X_i' \in T'$.
(iv) $T'$ is the weakest topology on $X'$ such that $\bigcup_{i \in I} \{\Omega \times \{i\};\ \Omega \in T_i\} \subseteq T'$.


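PROOF: For part (i), let $i \in I$. Then $\emptyset \times \{i\} \in T_i'$ because $\emptyset \in T_i$. But $\emptyset \times \{i\} = \emptyset$ by Theorem 9.4.6 (i).
So $\emptyset \in T_i'$. Similarly, $X_i' = X_i \times \{i\} \in T_i'$ because $X_i \in T_i$. To show closure of $T_i'$ under finite intersections,
let $\Omega_1', \Omega_2' \in T_i'$. Then $\Omega_1' = \Omega_1 \times \{i\}$ and $\Omega_2' = \Omega_2 \times \{i\}$ for some $\Omega_1, \Omega_2 \in T_i$. So $\Omega_1' \cap \Omega_2' = (\Omega_1 \times \{i\}) \cap
(\Omega_2 \times \{i\}) = (\Omega_1 \cap \Omega_2) \times \{i\} \in T_i'$ because $\Omega_1 \cap \Omega_2 \in T_i$. The closure of $T_i'$ under arbitrary unions may be
shown similarly. Hence $(X_i', T_i')$ is a topological space.

For part (ii), note that $X' = \bigcup_{i \in I} X_i'$ and $T' = \{\bigcup_{i \in I} \Omega_i';\ (\Omega_i')_{i \in I} \in \times_{i \in I} T_i'\}$. The family $(X_i', T_i')_{i \in I}$
satisfies the hypothesis of Theorem 32.14.7 because the sets $X_i'$ are pairwise disjoint, so that $\Omega' \cap X_j' = \emptyset \in T_j'$
for all $\Omega' \in T_i'$ with $i \neq j$. Therefore $(X', T')$ is a
topological space by part (i) and Theorem 32.14.7 (i). Similarly, part (iii) follows from Theorem 32.14.7 (ii),
and part (iv) follows from Theorem 32.14.7 (iii).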


32.14.11 DEFINITION: The (disjoint) set-union topology of a family $(X_i, T_i)_{i \in I}$ of topological spaces is the
topological space $(X, T)$ with $X = \bigcup_{i \in I} (X_i \times \{i\})$ and $T = \{\bigcup_{i \in I} (\Omega_i \times \{i\});\ \forall i \in I,\ \Omega_i \in T_i\}$.
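
The tagging construction in Definition 32.14.11 can be tried out on finite spaces. The sketch below is a hedged illustration only. The helper names `tag` and `disjoint_union_topology`, the index labels "a" and "b", and the choice of two copies of the same two-point space are assumptions made here.

```python
from itertools import product

def tag(omega, i):
    """Return omega x {i} as a frozenset of (point, index) pairs."""
    return frozenset((x, i) for x in omega)

def disjoint_union_topology(spaces):
    """spaces maps each index i to a pair (X_i, T_i). Returns the pair (X', T')."""
    X_prime = frozenset().union(*(tag(X, i) for i, (X, T) in spaces.items()))
    tagged = [[tag(om, i) for om in T] for i, (X, T) in spaces.items()]
    # One tagged open set is chosen from every space, and the choices are united.
    T_prime = {frozenset().union(*choice) for choice in product(*tagged)}
    return X_prime, T_prime

# Assumed example: two copies of the same space, which overlap completely before tagging.
X = frozenset({0, 1})
T = {frozenset(), frozenset({0}), X}
X_prime, T_prime = disjoint_union_topology({"a": (X, T), "b": (X, T)})
print(len(X_prime), len(T_prime))
```

Tagging makes the two copies disjoint, so no compatibility condition on overlaps is required, in contrast to Definition 32.14.8.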


32.15. Topological patchwork spaces 


32.15.1 REMARK: Construction of topological spaces by gluing patches. 

“Topological patchwork spaces” are topological spaces which are constructed by gluing together patches of 
topological spaces. This is how topological manifolds are often defined in practice. One may construct a 
topological patchwork either by gluing patches to each other, or else by gluing the patches onto a pre-defined 
topological space. The “glue” is the topology. Without glue, one obtains a non-topological patchwork which 
falls apart in the slightest breeze, as in Definition 10.17.8. 


32.15.2 REMARK: The small difference between set union topology and patchwork topology. 

Definition 32.15.3 requires some explanation. It is effectively the same as Definition 32.14.8 except that 
the sets $(X_i)_{i \in I}$ are first glued onto a set $X$ before the tests are applied to the topologies on the pairwise
intersections of the sets $X_i$. In other words, the patchwork topology is the same thing as a set union topology
except that an equivalence relation is first applied to the sets in the family to determine which points in the 
sets should be identified. 


A useful mental image for the definition of a topological patchwork space is a football made out of many 
patches of material sewn together with overlapping flaps. Another suitable metaphor for a patchwork space
is panorama-style photo-stitching software, which constructs very wide-angle photos by defining
homeomorphisms between the overlaps of individual photos.


For the non-topological patchwork space in Definition 32.15.3, see Definition 10.17.8. For partial Cartesian 
products $\times_{i \in I} X_i$, see Definition 10.17.2 and Notation 10.17.3.


32.15.3 DEFINITION: Let $(X_i, T_i)_{i \in I}$ be a family of topological spaces. Let $X \subseteq \times_{i \in I} X_i$ be a patchwork
space of the sets $(X_i)_{i \in I}$. The family $(X_i, T_i)_{i \in I}$ is said to be (topologically) compatible with the patchwork
space $X$ if for all $i, j \in I$, for all $\Omega_i \in T_i$, $\{x_j;\ x \in X \text{ and } x_i \in \Omega_i\} \in T_j$.

When the family of topological spaces $(X_i, T_i)_{i \in I}$ is compatible with the patchwork space $X \subseteq \times_{i \in I} X_i$, the
patchwork topology on $X$ is the set $T$ defined by

    $T = \{\bigcup_{i \in I} f_i(\Omega_i);\ \forall i \in I,\ \Omega_i \in T_i\}$,

where $f_i : X_i \to X$ is defined so that for all $y \in X_i$, $f_i(y)$ is the unique $x \in X$ such that $y = x_i$. (See
Remark 10.17.9.)

The topological space $(X, T)$ may be called a topological patchwork space of the family $(X_i, T_i)_{i \in I}$.


32.15.4 REMARK: Interpretation of definition of topological patchwork spaces.

The topological compatibility in Definition 32.15.3 means that the patchwork space transition functions 
are continuous, analogously to Definition 32.14.8. The topology T of a patchwork space X is the weakest 
topology for which projections from X to the patch topologies are continuous. 


TOPOLOGICAL SPACE CLASSES


33.1 Weaker separation classes
33.2 Topologically separated pairs of sets
33.3 Stronger separation classes
33.4 Separability and countability classes
33.5 Open covers and cover-based compactness
33.6 Local compactness
33.7 Other cover-based compactness classes
33.8 Topological dimension


33.0.1 REMARK: The usefulness of topological space classes. 

Topological space classes are important because the unifying concept of a general topological space is too 
broad. Topological spaces are a useful generalisation of metric spaces, but the extreme generality makes it 
difficult to prove much more than abstract theorems. Therefore it is usual to work within particular classes 
of topologies which satisfy the minimum requirements for particular applications. Topological classes help 
to aggregate theorems which follow from particular constraints on spaces. 


33.1. Weaker separation classes 


33.1.1 REMARK: Weaker and stronger separation classes for topological spaces. 

Presentation of the topological separation classes is fairly arbitrarily divided into “weaker separation classes”
$T_0$ to $T_2$ in Section 33.1 and “stronger separation classes” $T_3$ to $T_6$ in Section 33.3. (The real reason for
this is to avoid an excessively long single section.) The $T_2$ class is the basic level of separation which is required
for topological manifolds in Section 50.1. In practice, most topological manifolds are in all of the classes up
to and including $T_6$. (However, it is suggested in Section 49.5 that it could be profitable to relax the $T_2$
separation class for differentiable manifolds.)


33.1.2 REMARK: The Alexandrov/Hopf topological separation classes. 

The six topological separation classes $T_0$ to $T_5$ (not including $T_{3\frac{1}{2}}$) which are presented in Sections 33.1
and 33.3 are attributed by Simmons [137], page 130, and Kelley [101], page 57, to a 1935 book by Alexandrov 
and Hopf [47]. These “Trennungsaxiom” (separation axiom) spaces are defined as follows. 


definition   class                  alternative name

33.1.5       $T_0$                  Kolmogorov space
33.1.8       $T_1$                  Kuratowski space, Fréchet space
33.1.24      $T_2$                  Hausdorff space
33.3.2       $T_3$                  Vietoris axiom
             $T_{3\frac{1}{2}}$     Tikhonov axiom
33.3.20      $T_4$                  Tietze’s first axiom
33.3.25      $T_5$                  Tietze’s second axiom
33.3.30      $T_6$                  Vedenisov axiom


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg.html
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 


These axioms do not follow a strict hierarchy. The relations between them are summarised in Figure 33.3.7
in Remark 33.3.36. There are some other separation axioms which are mentioned by some authors, but they
are of less relevance for differential geometry. (See Steen/Seebach [141], pages 13, 16, for $T_{2\frac{1}{2}}$ spaces, which
are also known as “completely Hausdorff spaces”, and Urysohn spaces, perfectly $T_4$ spaces, and semiregular
spaces. See EDM2 [113], pages 1612-1613, for T} and T7, which are stronger than T; if the axiom of choice 
is not admitted.) 


33.1.3 REMARK: The confusability of “separation” and “separability” in topological spaces. 

Very confusingly, “separation” is not closely related to “separability”. A separation property of a topological 
space tells you how easy it is to disconnect subsets of the space from each other. (Separable spaces are 
defined in Section 33.4. Connectedness is defined in Section 34.1.) 


Two subsets of a space are typically separated by covering them with disjoint open sets. In the case of $T_0$
and $T_1$ spaces, only a single open set is involved. In the case of $T_{3\frac{1}{2}}$ spaces, a continuous function is used
to separate sets.


33.1.4 REMARK: The extremely weak $T_0$ topology separation class.

A topology defines neighbourhoods of points in a set $X$. A neighbourhood $\Omega$ of a point $x \in X$ means a set
of points $\Omega \subseteq X$ such that $x$ is in some sense in the “interior” of $\Omega$. This means that the set $\Omega$ surrounds
the point $x$ in some sense. In other words, the set $\Omega$ “separates” the point $x$ from the points of $X$ which
are “exterior” to $\Omega$. The concepts of “interior”, “exterior” and “boundary” are fundamental to all human
thinking. They are in fact fundamental to all life forms. (See Figure 31.2.1 in Remark 31.2.3 for a diagram
of these concepts.) Every neighbourhood divides the world into “interior” and “exterior” by including the
interior points and excluding the exterior points.


Neighbourhood and separation are two sides of the same coin. A neighbourhood is defined by the points
which are excluded. So the most fundamental question one can ask about a topology $T$ on a set $X$ is whether
it contains at least one set $\Omega \in T$ for each pair of points $x, y \in X$ such that $x \in \Omega$ and $y \notin \Omega$. If such a set
$\Omega$ does not exist for a given point pair, the pair of points might as well be considered to be a single point.


If every neighbourhood of $x$ contains $y$, and every neighbourhood of $y$ contains $x$, the topology has no
capability at all to separate the two points. A topology which forbids this possibility is said to have the $T_0$
property. In other words, the $T_0$ property requires that for each pair of distinct points $x, y \in X$, there is at least one
neighbourhood $\Omega \in T$ which contains one of the points and excludes the other.


The $T_0$ property does not guarantee that both points can exclude the other. It only guarantees that at
least one of the points can exclude the other. In other words, it guarantees that each point $x$ can be either
included (while $y$ is excluded) or excluded (while $y$ is included), but it cannot be guaranteed on which side $x$
will lie. A topology which guarantees that every given point $x$ can be included in a neighbourhood $\Omega$ which
excludes any other given point $y$ is said to have the $T_1$ property.


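33.1.5 DEFINITION: The $T_0$ topological separation class.
A $T_0$ (topological) space is a topological space $X$ such that

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},$
    $(\exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ x_2 \notin \Omega_1) \vee (\exists \Omega_2 \in \mathrm{Top}_{x_2}(X),\ x_1 \notin \Omega_2)$.    (33.1.1)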


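33.1.6 REMARK: Alternative logical expressions for the $T_0$ separation property.
Condition (33.1.1) in Definition 33.1.5 is equivalent to:

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\ \exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ \exists \Omega_2 \in \mathrm{Top}_{x_2}(X),$
    $x_2 \notin \Omega_1 \vee x_1 \notin \Omega_2$.

This equivalence follows from the fact that $\mathrm{Top}_{x_1}(X)$ and $\mathrm{Top}_{x_2}(X)$ are non-empty for all $x_1, x_2 \in X$.
Condition (33.1.1) is also equivalent to:

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\ \exists \Omega \in \mathrm{Top}(X),$
    $(x_1 \in \Omega \wedge x_2 \notin \Omega) \vee (x_1 \notin \Omega \wedge x_2 \in \Omega)$.

This may also be written in terms of the exclusive-OR logical operator $\triangle$ as:

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\ \exists \Omega \in \mathrm{Top}(X),\ x_1 \in \Omega \mathbin{\triangle} x_2 \in \Omega$.

(See Notation 3.7.8 and Definition 3.7.10 for the exclusive-OR operator. The proposition $(x_1 \in \Omega) \mathbin{\triangle} (x_2 \in \Omega)$
is equivalent to the proposition $x_1 \in \Omega \Leftrightarrow x_2 \notin \Omega$.) And this may be rewritten in terms of the cardinal
number operator $\#$ as follows.

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\ \exists \Omega \in \mathrm{Top}(X),\ \#(\{x_1, x_2\} \cap \Omega) = 1$.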


33.1.7 EXAMPLE: The trivial topology on a set with at least two points is not $T_0$.
Let $X$ be a set containing at least two points. Let $\mathrm{Top}(X) = \{\emptyset, X\}$. Let $x_1, x_2 \in X$ with $x_1 \neq x_2$. Let
$x_1 \in \Omega_1 \in \mathrm{Top}(X)$. Then $\Omega_1 = X$. So $x_2 \in \Omega_1$. Similarly for $x_2 \in \Omega_2 \in \mathrm{Top}(X)$. Hence $X$ is not a $T_0$ space.
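
For finite spaces, the single-open-set form of the $T_0$ condition in Remark 33.1.6 translates directly into a test. The following Python sketch is illustrative only. The function name `is_T0` and the two-point comparison space (commonly called the Sierpiński space, a name which is not used in the text) are assumptions of this sketch.

```python
def is_T0(X, opens):
    """True if, for every pair of distinct points, some open set contains exactly one of them."""
    return all(any((x1 in om) != (x2 in om) for om in opens)
               for x1 in X for x2 in X if x1 != x2)

X = {0, 1}
trivial = {frozenset(), frozenset(X)}                      # the topology of Example 33.1.7
sierpinski = {frozenset(), frozenset({0}), frozenset(X)}   # one extra open set {0}

print(is_T0(X, trivial), is_T0(X, sierpinski))
```

The inequality `(x1 in om) != (x2 in om)` is the exclusive-OR of the two membership propositions.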


33.1.8 DEFINITION: The $T_1$ topological separation class.
A $T_1$ (topological) space is a topological space $X$ such that

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\ \exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ x_2 \notin \Omega_1$.    (33.1.2)


33.1.9 REMARK: Comparison of the $T_0$ and $T_1$ separation classes.

The $T_1$ property means that for every pair of distinct points $x_1$ and $x_2$, there is an open set $\Omega_1$ which contains
$x_1$ but does not contain $x_2$. Since $x_1$ and $x_2$ may be swapped, this implies that $\exists \Omega_2 \in \mathrm{Top}_{x_2}(X),\ x_1 \notin \Omega_2$
also. Therefore

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},$
    $(\exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ x_2 \notin \Omega_1) \wedge (\exists \Omega_2 \in \mathrm{Top}_{x_2}(X),\ x_1 \notin \Omega_2)$.

The $T_0$ property in Definition 33.1.5 is weaker than this because it guarantees only the disjunction of these
two propositions rather than the conjunction. A topological space which is not $T_0$ has two (or more) distinct
points which are always both either inside or outside any given open set. In other words, a topological space $X$
is non-$T_0$ if and only if

    $\exists x_1, x_2 \in X,\ (x_1 \neq x_2 \wedge \forall \Omega \in \mathrm{Top}(X),\ (x_1 \in \Omega \Leftrightarrow x_2 \in \Omega))$.


33.1.10 THEOREM: Every $T_1$ space is a $T_0$ space.
Let $X$ be a $T_1$ topological space. Then $X$ is a $T_0$ topological space.


PROOF: Let $X$ be a $T_1$ topological space. Let $x_1, x_2 \in X$ with $x_1 \neq x_2$. Then by Definition 33.1.8, there
is a set $\Omega \in \mathrm{Top}_{x_1}(X)$ such that $x_2 \notin \Omega$. So line (33.1.1) is satisfied. Hence $X$ is a $T_0$ topological space.
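
Theorem 33.1.10 can be checked exhaustively on a small finite set. The sketch below enumerates every topology on a three-point set by brute force and tests both separation properties. The helper names and the choice of the set $\{0, 1, 2\}$ are assumptions made for this illustration, which is a sanity check and not a proof.

```python
from itertools import chain, combinations

def powerset(s):
    """Return all subsets of s as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_topology(X, opens):
    """Check the open-set axioms on a finite set (pairwise closure suffices)."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(X) not in opens:
        return False
    return all(a & b in opens and a | b in opens for a in opens for b in opens)

def is_T0(X, opens):
    """Some open set contains exactly one of any two distinct points."""
    return all(any((x1 in om) != (x2 in om) for om in opens)
               for x1 in X for x2 in X if x1 != x2)

def is_T1(X, opens):
    """Every point has an open neighbourhood which excludes any other given point."""
    return all(any(x1 in om and x2 not in om for om in opens)
               for x1 in X for x2 in X if x1 != x2)

X = frozenset({0, 1, 2})
topologies = [set(fam) for fam in powerset(powerset(X)) if is_topology(X, fam)]
t0_spaces = [T for T in topologies if is_T0(X, T)]
t1_spaces = [T for T in topologies if is_T1(X, T)]

print(all(is_T0(X, T) for T in t1_spaces))   # T1 implies T0
print(len(t0_spaces) > len(t1_spaces))       # the converse fails
```

On this set, the $T_0$ topologies strictly outnumber the $T_1$ topologies, which shows that the implication cannot be reversed.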


33.1.11 REMARK: Interpretation of the $T_1$ separation property.

It follows from the $T_1$ property in Definition 33.1.8 that there exist open sets $\Omega_1$ and $\Omega_2$ such that $x_1 \in \Omega_1 \setminus \Omega_2$
and $x_2 \in \Omega_2 \setminus \Omega_1$, but it does not follow that the sets $\Omega_1$ and $\Omega_2$ can be chosen such that $\Omega_1 \cap \Omega_2 = \emptyset$. (This
is illustrated in Figure 33.1.1.) A useful way of thinking of the $T_1$ separation property is that it guarantees
that all single-point sets are closed.


[Figure: two panels. $T_0$ space: $x_1 \in \Omega_1$, $x_2 \notin \Omega_1$. $T_1$ space: $x_1 \in \Omega_1$, $x_2 \in \Omega_2$, $x_1 \notin \Omega_2$, $x_2 \notin \Omega_1$.]

Figure 33.1.1 $T_1$ separation does not require disjoint covering pairs
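

33.1.12 THEOREM: Some elementary properties of $T_1$ spaces.
Let $X$ be a $T_1$ topological space. Then the following propositions are true.

(i) $\forall x \in X,\ \{x\} \in \overline{\mathrm{Top}}(X)$. (That is, $\{x\}$ is closed in $X$.)
(ii) [statement illegible in the source]
(iii) [statement illegible in the source]
(iv) $\forall S \in \mathbb{P}(X),\ \forall x \in X \setminus S,\ \exists \Omega \in \mathrm{Top}(X),\ (S \subseteq \Omega \text{ and } \{x\} \cap \Omega = \emptyset)$.
(v) For all finite $S \in \mathbb{P}(X)$, $\forall x \in X \setminus S,\ \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ (x \in \Omega_1 \setminus \Omega_2 \text{ and } S \subseteq \Omega_2 \setminus \Omega_1)$.
(vi) For all finite $S \in \mathbb{P}(X)$, $\forall x \in X \setminus S,\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \cap S = \emptyset$.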


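PROOF: For part (i), let $x \in X$. Define $C = \{\Omega \in \mathrm{Top}(X);\ x \notin \Omega\}$. Then $\forall y \in X \setminus \{x\},\ \exists \Omega \in C,\ y \in \Omega$
because $X$ has the $T_1$ property. Therefore $\forall y \in X \setminus \{x\},\ y \in \bigcup C$, by the definition of $\bigcup C$. So $X \setminus \{x\} \subseteq \bigcup C$.
But $x \notin \bigcup C$. So $X \setminus \{x\} = \bigcup C$. But $\bigcup C \in \mathrm{Top}(X)$ because $C$ is a set of open sets. Therefore
$\{x\} = X \setminus \bigcup C \in \overline{\mathrm{Top}}(X)$.

Part (ii) follows from part (i) and Theorem 31.8.14 (v).

Part (iii) follows from part (ii) and Theorem 31.9.10 (vi).

Part (iv) follows from part (i) with $\Omega = X \setminus \{x\}$.

For part (v), recall that the finite subsets of $X$ form the set $\{S \in \mathbb{P}(X);\ \#(S) < \infty\}$. (See Notation 13.12.5.) Let $S$ be a finite subset
of $X$, and let $x \in X \setminus S$. Let $n = \#(S)$. If $n = 0$, then $\Omega_1 = X$ and $\Omega_2 = \emptyset$ satisfy the assertion. The case
$n = 1$ follows directly from Definition 33.1.8. Suppose that $n \geq 2$. Let $S = \{y_i;\ i \in \mathbb{N}_n\}$ for some finite
sequence $(y_i)_{i=1}^n$. By Definition 33.1.8 (and induction), there are sequences $(\Omega_1^i)_{i=1}^n$ and $(\Omega_2^i)_{i=1}^n$ such that
$x \in \Omega_1^i \setminus \Omega_2^i$ and $y_i \in \Omega_2^i \setminus \Omega_1^i$ for all $i \in \mathbb{N}_n$. Let $\Omega_1 = \bigcap_{i=1}^n \Omega_1^i$ and $\Omega_2 = \bigcup_{i=1}^n \Omega_2^i$. Then $x \in \Omega_1 \setminus \Omega_2$ and
$S \subseteq \Omega_2 \setminus \Omega_1$.

Part (vi) follows from part (v) with $\Omega = \Omega_1$.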


33.1.13 REMARK: Comparison of $T_1$ and $T_3$ separation classes.

Part (iv) of Theorem 33.1.12 is a more or less “political” way of restating part (i). The significance of this
restatement is that the $T_1$ property implies one half of the $T_3$ property, which is illustrated in Figure 33.3.1
for Remark 33.3.1. In terms of Definition 33.2.5, one may say that any set $S$ which does not contain $x$ is
“exterior” to the set $\{x\}$ in a $T_1$ space, but the reverse is not true. In other words, the $T_1$ property does
not imply the existence of a neighbourhood of $x$ which excludes the set $S$, in particular if $S$ is closed. The
$T_3$ condition in Definition 33.3.2 does stipulate the existence of such a neighbourhood.


33.1.14 REMARK: Conditions equivalent to the $T_1$ separation property.

By Theorem 33.1.15, one could characterise $T_1$ spaces as “closed-singleton spaces”. Consequently, a $T_1$
space is a topological space where the removal of any finite number of points from an open set yields an open
set. In other words, all “finitely punctured” open sets are open. (This is shown in Theorem 33.1.16.) So a
$T_1$ space could be thought of as a space where open sets are “robust” or “resilient” enough to withstand
any finite number of punctures without losing their openness.


33.1.15 THEOREM: $T_1$ spaces are “closed-singleton spaces”.
A topological space $X$ has the $T_1$ property if and only if $\{x\}$ is closed in $X$ for all $x \in X$.


PROOF: It follows from Theorem 33.1.12 (i) that $\{x\}$ is closed in $X$ for all $x \in X$ if $X$ is a $T_1$ space. So
suppose that $X$ is a topological space such that $\{x\}$ is closed in $X$ for all $x \in X$. Then $\Omega_x = X \setminus \{x\}$ is open
for all $x \in X$. So $y \in \Omega_x$ for any $y \in X \setminus \{x\}$, but $x \notin \Omega_x$. This implies the $T_1$ property for $X$.


33.1.16 THEOREM: $T_1$ spaces are topological spaces where all “finitely punctured” open sets are open.
A topological space $X$ has the $T_1$ property if and only if

    $\forall \Omega \in \mathrm{Top}(X),\ \forall S \in \mathbb{P}(\Omega),\ \#(S) < \infty \Rightarrow \Omega \setminus S \in \mathrm{Top}(X)$.    (33.1.3)


PROOF: Let $X$ be a $T_1$ space. Let $\Omega \in \mathrm{Top}(X)$. Let $x \in \Omega$. Then $\{x\}$ is closed by Theorem 33.1.12 (i). So
$X \setminus \{x\}$ is open by Definition 31.4.1. So $\Omega \setminus \{x\} = \Omega \cap (X \setminus \{x\})$ is open by Definition 31.3.2 (ii). It then
follows by induction that $\Omega \setminus S$ is open for any finite set $S \subseteq \Omega$.

To show the converse, assume line (33.1.3). Let $x \in X$. Let $\Omega = X$ and $S = \{x\}$. Then $\Omega \setminus S$ is open by
Definition 31.3.2 (i) and line (33.1.3). So $\{x\}$ is closed by Definition 31.4.1. Consequently $\{x\}$ is closed for
all $x \in X$. Hence $X$ is a $T_1$ space by Theorem 33.1.15.


33.1.17 THEOREM: All finite $T_1$ spaces have the discrete topology.
Let $X$ be a finite $T_1$ topological space. Then $\mathrm{Top}(X) = \mathbb{P}(X)$.


PROOF: Let $X$ be a finite $T_1$ topological space. If $X = \emptyset$, then there is only one possible topology on $X$,
namely the set $\mathrm{Top}(X) = \{\emptyset\} = \mathbb{P}(\emptyset) = \mathbb{P}(X)$.

Let $x \in X$. By the $T_1$ property, there is a set $\Omega_y \in \mathrm{Top}(X)$ for each $y \in J = X \setminus \{x\}$ such that $x \in \Omega_y$
and $y \notin \Omega_y$. If $J = \emptyset$, then $\{x\} = X \in \mathrm{Top}(X)$. Otherwise $\bigcap_{y \in J} \Omega_y = \{x\}$. Since this is a finite intersection, it follows that $\{x\} \in \mathrm{Top}(X)$. Since
every subset of $X$ can be written as a union of such singleton sets, it follows that $\mathrm{Top}(X) = \mathbb{P}(X)$.
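
The finite intersection in the proof of Theorem 33.1.17 can be imitated numerically. In a finite space, the intersection of all open sets containing a point $x$ is itself open, and it is contained in the intersection $\bigcap_{y \in J} \Omega_y$ used in the proof. The following sketch computes this smallest neighbourhood. The name `minimal_neighbourhood` and the example spaces are assumptions made here.

```python
from itertools import chain, combinations

def minimal_neighbourhood(x, X, opens):
    """Intersection of all open sets which contain x (an open set when X is finite)."""
    result = frozenset(X)
    for om in opens:
        if x in om:
            result &= om
    return result

X = frozenset({0, 1, 2})
# The discrete topology, which is the only T1 topology on a finite set by the theorem.
discrete = {frozenset(c) for c in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))}
print([sorted(minimal_neighbourhood(x, X, discrete)) for x in sorted(X)])

# A two-point T0 space which is not T1: the point 1 cannot exclude the point 0.
Y = frozenset({0, 1})
not_t1 = {frozenset(), frozenset({0}), Y}
print(sorted(minimal_neighbourhood(1, Y, not_t1)))
```

In the second space the smallest neighbourhood of the point 1 is the whole space, so the singleton $\{1\}$ is not open and the topology is not discrete.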


33.1.18 REMARK: Infinite $T_1$ topological spaces do not necessarily have the discrete topology.

Theorem 33.1.17 means that if a finite set has the $T_1$ property, then all single-point sets are open. In the case
of countably infinite sets, this is not necessarily true. This is demonstrated by Example 31.11.2. Trivial $T_1$ topologies
are mentioned in Definition 31.11.7 and Remark 31.11.8.


33.1.19 EXAMPLE: A countably infinite topological space which is $T_0$ but not $T_1$.

Let $X = \mathbb{Z}$ and $T = \{\emptyset, \mathbb{Z}\} \cup \{\mathbb{Z}(-\infty, a];\ a \in \mathbb{Z}\}$, where $\mathbb{Z}(-\infty, a]$ means $\{i \in \mathbb{Z};\ i \leq a\}$. (See Notations
11.5.15 and 14.4.11.) Then $T$ is a topology on $X$. (It is easily verified that $T$ is closed under finite non-empty
intersections and arbitrary unions.)


The topological space $(X, T)$ has the $T_0$ property, but not the $T_1$ property. Given $x_1, x_2 \in X$ with $x_1 \neq x_2$,
the set $\Omega = \mathbb{Z}(-\infty, \min(x_1, x_2)]$ satisfies $\min(x_1, x_2) \in \Omega$ and $\max(x_1, x_2) \notin \Omega$, which verifies the $T_0$ property.
But clearly any set in $T$ which contains $\max(x_1, x_2)$ must also contain $\min(x_1, x_2)$. So the topology is not $T_1$.
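
The argument of Example 33.1.19 can be spot-checked by representing each ray $\mathbb{Z}(-\infty, a]$ by its endpoint $a$. The following sketch is only a finite sample of an infinite space, so it illustrates the argument without verifying it. The names `in_ray` and `t0_witness` and the sampled range are hypothetical.

```python
def in_ray(i, a):
    """Membership of the integer i in the ray Z(-inf, a] = {i in Z : i <= a}."""
    return i <= a

def t0_witness(x1, x2):
    """Endpoint of a ray which contains min(x1, x2) but not max(x1, x2), for x1 != x2."""
    return min(x1, x2)

sample = range(-5, 6)
pairs = [(x1, x2) for x1 in sample for x2 in sample if x1 != x2]

# T0: the witness ray separates the smaller point from the larger point.
t0_ok = all(in_ray(min(p), t0_witness(*p)) and not in_ray(max(p), t0_witness(*p))
            for p in pairs)

# Not T1: no sampled ray contains the larger point while excluding the smaller point.
t1_possible = any(in_ray(max(p), a) and not in_ray(min(p), a)
                  for p in pairs for a in range(-10, 11))

print(t0_ok, t1_possible)
```

The open set $\mathbb{Z}$ is omitted from the sampled rays because it contains both points of every pair.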


33.1.20 EXAMPLE: An uncountable topological space which is $T_0$ but not $T_1$.

Let $X = \mathbb{R}$ and $T = \{\emptyset, \mathbb{R}\} \cup \{(-\infty, a);\ a \in \mathbb{R}\}$. Then $T$ is a topology on $X$. It is easily verified that
$T$ is closed under finite non-empty intersections. Closure under arbitrary unions follows from the relation
$(-\infty, \bar{a}) = \bigcup \{(-\infty, a);\ a \in S\}$ for any $S \in \mathbb{P}(\mathbb{R})$, where $\bar{a} = \sup S$. (This holds for all $\bar{a} \in [-\infty, +\infty] = \overline{\mathbb{R}}$.)

The topological space $(X, T)$ has the $T_0$ property, but not the $T_1$ property. Given $x_1, x_2 \in X$ with $x_1 \neq x_2$,
the set $\Omega = (-\infty, (x_1 + x_2)/2)$ satisfies $\min(x_1, x_2) \in \Omega$ and $\max(x_1, x_2) \notin \Omega$, which verifies the $T_0$ property.
But clearly any set in $T$ which contains $\max(x_1, x_2)$ must also contain $\min(x_1, x_2)$. So the topology is not $T_1$.


33.1.21 REMARK: The effect of $T_1$ separation on limit points.

It is shown in Theorem 31.10.18 that all $\infty$-limit points are limit points. Theorem 33.1.22 shows that the
converse implication is equivalent to the $T_1$ separation property for the space. This has consequences for
the relations between various cover-based and limit-based compactness classes in Sections 32.3, 33.7, 35.5
and 35.6.


33.1.22 THEOREM: The $T_1$ property is equivalent to the equivalence of limit and $\infty$-limit points.
A topological space is $T_1$ if and only if all limit points of sets are $\infty$-limit points.


PROOF: Let $X$ be a $T_1$ topological space. Let $z$ be a limit point of a set $S \in \mathbb{P}(X)$. Let $\Omega \in \mathrm{Top}_z(X)$.
Suppose that $\#(\Omega \cap (S \setminus \{z\})) < \infty$. Then by Theorem 33.1.12 (vi), there exists $\Omega_0 \in \mathrm{Top}_z(X)$ such that
$\Omega_0 \cap (\Omega \cap (S \setminus \{z\})) = \emptyset$. Let $\Omega_1 = \Omega_0 \cap \Omega$. Then $\Omega_1 \in \mathrm{Top}_z(X)$ and $\Omega_1 \cap (S \setminus \{z\}) = \emptyset$. So by Definition 31.10.2,
$z$ is not a limit point of $S$, which is a contradiction. Therefore $z$ is an $\infty$-limit point of $S$.


For the converse, let $X$ be a topological space where all limit points of sets are $\infty$-limit points. Let $x, y \in X$
with $x \neq y$. Suppose there is no set $\Omega \in \mathrm{Top}_x(X)$ such that $y \notin \Omega$. Then $y \in \Omega$ for all $\Omega \in \mathrm{Top}_x(X)$. So
$\forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap (\{y\} \setminus \{x\}) \neq \emptyset$. Therefore $x$ is a limit point of $\{y\}$ by Definition 31.10.2. However,
$\forall \Omega \in \mathrm{Top}_x(X),\ \#(\Omega \cap (\{y\} \setminus \{x\})) = 1 < \infty$. Therefore by Definition 31.10.17, $x$ is not an $\infty$-limit point of $\{y\}$.
This contradicts the assumption for $X$. Therefore $X$ is a $T_1$ space.


33.1.23 REMARK: The $T_2$ separation property, more usually known as Hausdorff separation.
The $T_2$, i.e. Hausdorff, topological separation property in Definition 33.1.24 means that every pair of distinct
points has a pair of disjoint neighbourhoods. Line (33.1.4) may be rewritten as follows.

    $\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\ \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),$
    $x_1 \in \Omega_1 \text{ and } x_2 \in \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset$.

The Hausdorff property specifies that each set-pair $(\{x_1\}, \{x_2\})$ must be strongly separated in the sense of
Definition 33.2.10. This is equivalent to specifying that every two-point set must be disconnected in the
sense of Definition 34.1.6. This is illustrated in Figure 33.1.2.

[Figure: $x_1 \in \Omega_1$, $x_2 \in \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$.]

Figure 33.1.2 In a Hausdorff (i.e. $T_2$) space, distinct points have a disjoint open cover


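33.1.24 DEFINITION: The $T_2$ topological separation class.
A $T_2$ (topological) space is a topological space $X$ such that

    $\forall x_1, x_2 \in X,\ x_1 \neq x_2 \Rightarrow \exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ \exists \Omega_2 \in \mathrm{Top}_{x_2}(X),\ \Omega_1 \cap \Omega_2 = \emptyset$.    (33.1.4)

A Hausdorff space is the same as a $T_2$ space.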


33.1.25 REMARK: Comparison of the $T_1$ and $T_2$ separation classes.

Although the $T_1$ property guarantees that every point has a neighbourhood which excludes any given point,
it does not guarantee the simultaneous exclusion of each point by a neighbourhood of the other. (This
is illustrated in Figure 33.1.1 in Remark 33.1.11.) The existence of a disjoint covering pair (i.e. mutually
exclusive territories) is guaranteed by the $T_2$ separation property, better known as “Hausdorff separation”.


33.1.26 THEOREM: Every $T_2$ space is a $T_1$ space.
Let $X$ be a $T_2$ topological space. Then $X$ is a $T_1$ topological space.


PROOF: Let $X$ be a $T_2$ topological space. Let $x_1, x_2 \in X$ with $x_1 \neq x_2$. Then by Definition 33.1.24, there
are sets $\Omega_1 \in \mathrm{Top}_{x_1}(X)$ and $\Omega_2 \in \mathrm{Top}_{x_2}(X)$ such that $\Omega_1 \cap \Omega_2 = \emptyset$. So $x_2 \notin \Omega_1$ because $x_2 \in \Omega_2$. So line (33.1.2) is satisfied.
Hence $X$ is a $T_1$ topological space.


33.1.27 REMARK: Topological spaces which are $T_1$ but not $T_2$.
For any infinite set $X$, the example topology which is defined on $X$ in Theorem 31.11.6 is $T_1$ but not
Hausdorff. This is asserted in Theorem 33.1.28.


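33.1.28 THEOREM: The finite-complement topology on an infinite set is $T_1$ but not $T_2$.
Let $\mathrm{Top}(X) = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X);\ \#(X \setminus \Omega) < \infty\}$ for an infinite set $X$. Then $\mathrm{Top}(X)$ is $T_1$, but is not $T_2$.


PROOF: Let $\mathrm{Top}(X) = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X);\ \#(X \setminus \Omega) < \infty\}$ for any set $X$. Let $x_1, x_2 \in X$ with $x_1 \ne x_2$. Let
$\Omega_1 = X \setminus \{x_2\}$ and $\Omega_2 = X \setminus \{x_1\}$. Then $\Omega_1 \in \mathrm{Top}_{x_1}(X)$ and $\Omega_2 \in \mathrm{Top}_{x_2}(X)$. But $x_1 \notin \Omega_2$ and $x_2 \notin \Omega_1$.
Hence $\mathrm{Top}(X)$ is a $T_1$ topology on $X$.


Now suppose that $\mathrm{Top}(X)$ is a Hausdorff topology on $X$. Let $x_1, x_2 \in X$ with $x_1 \ne x_2$. Then there exist
$\Omega_1 \in \mathrm{Top}_{x_1}(X)$ and $\Omega_2 \in \mathrm{Top}_{x_2}(X)$ with $\Omega_1 \cap \Omega_2 = \emptyset$. So $X = X \setminus (\Omega_1 \cap \Omega_2) = (X \setminus \Omega_1) \cup (X \setminus \Omega_2)$ by
De Morgan's law (Theorem 8.2.3). But $\#(X \setminus \Omega_1) < \infty$ and $\#(X \setminus \Omega_2) < \infty$ because $\Omega_1 \ne \emptyset$ and $\Omega_2 \ne \emptyset$.
So $\#(X) \le \#(X \setminus \Omega_1) + \#(X \setminus \Omega_2) < \infty$. So $\mathrm{Top}(X)$ is not Hausdorff if $X$ is infinite.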
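
The De Morgan step in this proof can be illustrated concretely. The sketch below assumes that $X$ is the set of non-negative integers and that a non-empty open set is represented by its finite complement; both choices, and the function name, are illustrative only.

```python
from itertools import count

def complement_of_intersection(comp1, comp2):
    """De Morgan: the complement of an intersection is the union of the complements."""
    return frozenset(comp1) | frozenset(comp2)

# X = {0, 1, 2, ...}, x1 = 0, x2 = 1.
# Omega1 = X minus {x2} has complement {1}; Omega2 = X minus {x1} has complement {0}.
comp = complement_of_intersection({1}, {0})

# Any point outside the finite set comp lies in both Omega1 and Omega2.
witness = next(n for n in count() if n not in comp)
print(sorted(comp), witness)  # [0, 1] 2
```

Because the union of two finite complements is finite, such a witness always exists in an infinite set, so two non-empty open sets can never be disjoint.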


33.1.29 REMARK: Continuous functions with a T2 target space. 
Theorem 33.1.30 is an immediate application of the Hausdorff separation class. (Note that “the graph of f” 
means the same thing as f itself. See Remarks 9.1.2 and 9.1.3 for this issue.) 




33.1.30 THEOREM: A continuous function has a closed graph if the target set is a $T_2$ space.
Let $f : X \to Y$ be a continuous function, where $X$ is a topological space and $Y$ is a Hausdorff space. Then
the graph of $f$ is a closed subset of the product topological space $X \times Y$.


PROOF: Let $(x, y) \in X \times Y$ satisfy $y \ne f(x)$. Since $Y$ is a Hausdorff space, there exist
$\Omega_1, \Omega_2 \in \mathrm{Top}(Y)$ such that $f(x) \in \Omega_1$, $y \in \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. Then


\[ \begin{aligned}
\mathrm{graph}(f) \cap (f^{-1}(\Omega_1) \times \Omega_2)
&= \{(x', y') \in X \times Y;\ y' = f(x') \text{ and } f(x') \in \Omega_1 \text{ and } y' \in \Omega_2\} \\
&\subseteq \{(x', y') \in X \times Y;\ f(x') \in \Omega_1 \text{ and } f(x') \in \Omega_2\} \\
&= \emptyset.
\end{aligned} \]


But $(x, y) \in f^{-1}(\Omega_1) \times \Omega_2$ and $f^{-1}(\Omega_1) \times \Omega_2 \in \mathrm{Top}(X \times Y)$. Therefore $(x, y)$ is in the interior of $G =
(X \times Y) \setminus \mathrm{graph}(f)$. Since all points of $G$ are in the interior of $G$, it follows that $G$ is open in the topology
of $X \times Y$. Therefore $\mathrm{graph}(f)$ is closed in the topology of $X \times Y$.

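To make the last statement a little more rigorous, let $A = \bigcup_{(x,y) \in G} \bigcup \{\Omega \in \mathrm{Top}(X \times Y);\ \Omega \subseteq G \text{ and } (x, y) \in \Omega\}$.
Then $A \in \mathrm{Top}(X \times Y)$ and $A = G$. Therefore $\mathrm{graph}(f)$ is closed.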


33.1.31 REMARK: Convergent infinite sequences in a $T_2$ space have unique limits.
By Theorem 35.4.10, convergent infinite sequences of points in a Hausdorff space have unique limit points.


33.1.32 REMARK: Locally Cartesian spaces are not necessarily $T_2$.

It is notable that a locally Cartesian space is not necessarily Hausdorff, although all Cartesian topological 
spaces are clearly Hausdorff. This is explained in Remark 50.1.2. (See Definition 49.4.3 for Cartesian 
topological spaces. See Definition 49.4.7 for locally Cartesian spaces.) When topological manifolds are 
defined in Section 50.1, non-Hausdorff topologies are explicitly excluded. The examples of non-Hausdorff 
topological manifolds in Section 49.5 give a hint of why one would want to exclude them. 


33.1.33 THEOREM: The weak topological separation properties are hereditary.
Let $X$ be a topological space. Let $S$ be a subset of $X$ with the relative topology from $X$.


(i) If $X$ is a $T_0$ space, then $S$ is a $T_0$ space.
(ii) If $X$ is a $T_1$ space, then $S$ is a $T_1$ space.
(iii) If $X$ is a $T_2$ space, then $S$ is a $T_2$ space.


PROOF: For part (i), let $X$ be a $T_0$ space. Let $x_1, x_2 \in S$ with $x_1 \ne x_2$. Then by Definition 33.1.5, either
there exists a set $\Omega_1 \in \mathrm{Top}_{x_1}(X)$ such that $x_2 \notin \Omega_1$ or there exists a set $\Omega_2 \in \mathrm{Top}_{x_2}(X)$ such that $x_1 \notin \Omega_2$.
Suppose the former case. Let $\Omega_1' = \Omega_1 \cap S$. Then $\Omega_1' \in \mathrm{Top}(S)$ by Definition 31.6.2, and $x_1 \in \Omega_1'$ and $x_2 \notin \Omega_1'$.
Similarly for the latter case. Hence $S$ is a $T_0$ space by Definition 33.1.5.

For part (ii), let $X$ be a $T_1$ space. Let $x_1, x_2 \in S$ with $x_1 \ne x_2$. Then by Definition 33.1.8, there exists a set
$\Omega_1 \in \mathrm{Top}_{x_1}(X)$ such that $x_2 \notin \Omega_1$. Let $\Omega_1' = \Omega_1 \cap S$. Then $\Omega_1' \in \mathrm{Top}(S)$ by Definition 31.6.2, and $x_1 \in \Omega_1'$
and $x_2 \notin \Omega_1'$. The same holds for $x_1$ and $x_2$ swapped. Hence $S$ is a $T_1$ space by Definition 33.1.8.

For part (iii), let $X$ be a $T_2$ space. Let $x_1, x_2 \in S$ with $x_1 \ne x_2$. Then by Definition 33.1.24, there exist
sets $\Omega_1 \in \mathrm{Top}_{x_1}(X)$ and $\Omega_2 \in \mathrm{Top}_{x_2}(X)$ such that $\Omega_1 \cap \Omega_2 = \emptyset$. Let $\Omega_1' = \Omega_1 \cap S$ and $\Omega_2' = \Omega_2 \cap S$. Then
$\Omega_1', \Omega_2' \in \mathrm{Top}(S)$ by Definition 31.6.2, and $x_1 \in \Omega_1'$, $x_2 \in \Omega_2'$ and $\Omega_1' \cap \Omega_2' = \emptyset$. Hence $S$ is a $T_2$ space by Definition 33.1.24.
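
The construction $\Omega' = \Omega \cap S$ used in all three parts can be tried out on a small finite space. This is a hedged sketch: the function names are illustrative, and the example topology is the one labelled 3g in Example 33.2.15.

```python
from itertools import combinations

def relative_topology(opens, subset):
    """Relative topology on a subset: intersect every open set with the subset."""
    subset = frozenset(subset)
    return {frozenset(o) & subset for o in opens}

def is_t0(points, opens):
    """For each pair of distinct points, some open set contains exactly one of them."""
    return all(any((x1 in o) != (x2 in o) for o in opens)
               for x1, x2 in combinations(sorted(points), 2))

X = {1, 2, 3}
opens = [frozenset(s) for s in ([], [2], [1, 2], [2, 3], [1, 2, 3])]
S = {1, 3}
opens_S = relative_topology(opens, S)

print(sorted(sorted(o) for o in opens_S))  # [[], [1], [1, 3], [3]]
print(is_t0(X, opens), is_t0(S, opens_S))  # True True
```

In this example the subspace happens to be discrete, so it satisfies more than the $T_0$ property which it inherits.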


33.1.34 REMARK: Inheritance of weak separation properties by finite direct products. 

Theorem 33.1.35 (iii) is applicable to showing that the product of two topological manifolds is a topological 
manifold in Theorem 50.4.7 (i). The usual kind of inductive constructions and arguments extend the binary 
product case to arbitrary finite direct products of topological spaces. 


33.1.35 THEOREM: Weak topological separation properties are inherited by binary direct products.
Let $X = X_1 \times X_2$ be the direct product of topological spaces $X_1$ and $X_2$.
(i) If $X_1$ and $X_2$ are $T_0$ spaces, then $X$ is a $T_0$ space.
(ii) If $X_1$ and $X_2$ are $T_1$ spaces, then $X$ is a $T_1$ space.
(iii) If $X_1$ and $X_2$ are $T_2$ spaces, then $X$ is a $T_2$ space.


PROOF: For part (i), suppose that $X_1$ and $X_2$ are $T_0$ spaces. Let $x, y \in X$ with $x \ne y$. Then $x = (x_1, x_2)$
and $y = (y_1, y_2)$ for some unique $x_1, y_1 \in X_1$ and $x_2, y_2 \in X_2$, and either $x_1 \ne y_1$ or $x_2 \ne y_2$ (or both).
Suppose that $x_1 \ne y_1$. Then there exists $\Omega_1 \in \mathrm{Top}_{x_1}(X_1)$ with $y_1 \notin \Omega_1$, or $\Omega_2 \in \mathrm{Top}_{y_1}(X_1)$ with $x_1 \notin \Omega_2$.
In the former case, let $\Omega_1' = \Omega_1 \times X_2$. Then $\Omega_1' \in \mathrm{Top}_x(X)$ and $y \notin \Omega_1'$. In the latter case, $\Omega_2' \in \mathrm{Top}_y(X)$
and $x \notin \Omega_2'$ with $\Omega_2' = \Omega_2 \times X_2$. The case $x_2 \ne y_2$ is similar. Hence $X$ is a $T_0$ space.


For part (ii), suppose that $X_1$ and $X_2$ are $T_1$ spaces. Let $x, y \in X$ with $x \ne y$. Then $x = (x_1, x_2)$ and
$y = (y_1, y_2)$ for some unique $x_1, y_1 \in X_1$ and $x_2, y_2 \in X_2$, and either $x_1 \ne y_1$ or $x_2 \ne y_2$ (or both). Suppose
that $x_1 \ne y_1$. Then there exists $\Omega_1 \in \mathrm{Top}_{x_1}(X_1)$ with $y_1 \notin \Omega_1$. Let $\Omega_1' = \Omega_1 \times X_2$. Then $\Omega_1' \in \mathrm{Top}_x(X)$ and
$y \notin \Omega_1'$. The same holds for $x_1$ and $y_1$ swapped. The case $x_2 \ne y_2$ is similar. Hence $X$ is a $T_1$ space.


For part (iii), suppose that $X_1$ and $X_2$ are $T_2$ spaces. Let $x, y \in X$ with $x \ne y$. Then $x = (x_1, x_2)$ and
$y = (y_1, y_2)$ for some unique $x_1, y_1 \in X_1$ and $x_2, y_2 \in X_2$, and either $x_1 \ne y_1$ or $x_2 \ne y_2$ (or both). Suppose
that $x_1 \ne y_1$. Then there exist $\Omega_1 \in \mathrm{Top}_{x_1}(X_1)$ and $\Omega_2 \in \mathrm{Top}_{y_1}(X_1)$ with $\Omega_1 \cap \Omega_2 = \emptyset$. Let $\Omega_1' = \Omega_1 \times X_2$
and $\Omega_2' = \Omega_2 \times X_2$. Then $\Omega_1' \in \mathrm{Top}_x(X)$ and $\Omega_2' \in \mathrm{Top}_y(X)$, and $\Omega_1' \cap \Omega_2' = \emptyset$ by Theorem 9.4.6 (v). The
case $x_2 \ne y_2$ is similar. Hence $X$ is a $T_2$ space.
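
Part (i) can be checked by brute force on a small product. The sketch below builds the product topology of two finite spaces as all unions of open rectangles; this construction, the function names and the choice of example are assumptions made for illustration.

```python
from itertools import combinations, product

def product_topology(opens1, opens2):
    """All unions of open rectangles O1 x O2 (feasible only for very small spaces)."""
    basis = {frozenset(product(o1, o2)) for o1 in opens1 for o2 in opens2}
    topology = {frozenset()}
    for r in range(1, len(basis) + 1):
        for family in combinations(basis, r):
            topology.add(frozenset().union(*family))
    return topology

def is_t0(points, opens):
    """For each pair of distinct points, some open set contains exactly one of them."""
    return all(any((p in o) != (q in o) for o in opens)
               for p, q in combinations(sorted(points), 2))

# The two-point space with open sets {empty set, {1}, {1, 2}} is T0 but not T1.
sierpinski = [frozenset(), frozenset({1}), frozenset({1, 2})]
points = set(product({1, 2}, {1, 2}))
opens = product_topology(sierpinski, sierpinski)

print(len(opens), is_t0(points, opens))  # 6 True
```

The search over all sub-families of the basis is exponential, so this approach is only a sanity check for spaces with a handful of points.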


33.2. Topologically separated pairs of sets 


33.2.1 REMARK: Weakly and strongly topologically separated pairs of sets. 

The standard concept of a separated pair of sets and the slightly non-standard concept of a strongly separated 
pair of sets are introduced in Section 33.2. (See Figure 33.2.1 for an illustration of these two kinds of 
separation.) These concepts, and the differences between them, help to clarify the meanings of many of the 
topological separation classes in Sections 33.1 and 33.3. The weakly separated set-pair concept is equivalent 
to the disconnected set-pair concept in Definition 34.3.6 if both sets are non-empty. 


33.2.2 DEFINITION: A (weakly) (topologically) separated pair of sets in a topological space $X$ is a pair of
sets $S_1, S_2 \in \mathbb{P}(X)$ such that $\bar{S}_1 \cap S_2 = \emptyset$ and $S_1 \cap \bar{S}_2 = \emptyset$.


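33.2.3 THEOREM: Some basic properties of weakly separated pairs of sets.
Let $X$ be a topological space. Let $S_1, S_2 \in \mathbb{P}(X)$.


(i) If $S_1 = \emptyset$ or $S_2 = \emptyset$ (or both), then $(S_1, S_2)$ is a weakly separated pair of sets in $X$.
(ii) If $(S_1, S_2)$ is a weakly separated pair of sets in $X$, then $S_1 \cap S_2 = \emptyset$.
(iii) $(S_1, S_2)$ is a weakly separated pair of sets in $X$ if and only if $S_1 \subseteq \mathrm{Ext}(S_2)$ and $S_2 \subseteq \mathrm{Ext}(S_1)$.
(iv) $(S_1, S_2)$ is a weakly separated pair of sets in $X$ if and only if $\exists \Omega_1 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1$ and $S_2 \cap \Omega_1 = \emptyset$)
and $\exists \Omega_2 \in \mathrm{Top}(X)$, ($S_2 \subseteq \Omega_2$ and $S_1 \cap \Omega_2 = \emptyset$).
(v) $(S_1, S_2)$ is a weakly separated pair of sets in $X$ if and only if
$\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$ and $(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = \emptyset$).
(vi) $(S_1, S_2)$ is a weakly separated pair of sets in $X$ if and only if
$\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1 \setminus \Omega_2$ and $S_2 \subseteq \Omega_2 \setminus \Omega_1$).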


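PROOF: For part (i), suppose that $S_1 = \emptyset$. Then $\bar{S}_1 = \emptyset$ by Theorem 31.8.14 (vii). So $\bar{S}_1 \cap S_2 = \emptyset$ and
$S_1 \cap \bar{S}_2 = \emptyset$ by Theorem 8.1.4 (ii). Hence $(S_1, S_2)$ is a weakly separated pair of sets. The case $S_2 = \emptyset$ is similar.

For part (ii), let $(S_1, S_2)$ be a weakly separated pair of subsets of $X$. Then $\bar{S}_1 \cap S_2 = \emptyset$ by Definition 33.2.2.
However, $S_1 \subseteq \bar{S}_1$ by Theorem 31.8.13 (vii). So $S_1 \cap S_2 \subseteq \bar{S}_1 \cap S_2 = \emptyset$ by Theorem 8.1.7 (iii). Hence
$S_1 \cap S_2 = \emptyset$ by Theorem 8.1.4 (iv).

For part (iii), note that $\mathrm{Ext}(S_1) = X \setminus \bar{S}_1$ by Theorem 31.9.10 (vi). So $\bar{S}_1 \cap S_2 = \emptyset$ if and only if $S_2 \subseteq
X \setminus \bar{S}_1 = \mathrm{Ext}(S_1)$. Similarly, $S_1 \cap \bar{S}_2 = \emptyset$ if and only if $S_1 \subseteq \mathrm{Ext}(S_2)$.

For part (iv), let $(S_1, S_2)$ be a separated pair of sets in $X$. Let $\Omega_1 = \mathrm{Ext}(S_2)$. Then $S_1 \subseteq \Omega_1$ by part (iii),
and $S_2 \cap \Omega_1 = \emptyset$ by Theorem 31.9.10 (vii). Similarly $S_2 \subseteq \Omega_2$ and $S_1 \cap \Omega_2 = \emptyset$ with $\Omega_2 = \mathrm{Ext}(S_1)$. Now
suppose that the pair $(S_1, S_2)$ satisfies $S_1 \subseteq \Omega_1$ and $S_2 \cap \Omega_1 = \emptyset$ for some $\Omega_1 \in \mathrm{Top}(X)$. Then $S_2 \subseteq X \setminus \Omega_1$.
So $\bar{S}_2 \subseteq X \setminus \Omega_1$ by Theorem 31.8.13 (xiv) and Theorem 31.8.14 (v) because $X \setminus \Omega_1$ is a closed set in $X$. But
$X \setminus \Omega_1 \subseteq X \setminus S_1$. So $\bar{S}_2 \subseteq X \setminus S_1$. Therefore $\bar{S}_2 \cap S_1 = \emptyset$. Similarly, $\bar{S}_1 \cap S_2 = \emptyset$.

For part (v), note that by part (iv), $(S_1, S_2)$ is a separated pair of sets in $X$ if and only if $\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X)$,
($S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$ and $S_2 \cap \Omega_1 = \emptyset$ and $S_1 \cap \Omega_2 = \emptyset$). But $(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = (S_1 \cap \Omega_1 \cap \Omega_2) \cup (S_2 \cap \Omega_1 \cap \Omega_2)$
by Theorem 8.1.6 (xii). So $(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = (S_1 \cap \Omega_2) \cup (S_2 \cap \Omega_1)$ when $S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$. Then
$(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = \emptyset$ if and only if $S_1 \cap \Omega_2 = \emptyset$ and $S_2 \cap \Omega_1 = \emptyset$ (when $S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$). Hence
$(S_1, S_2)$ is a separated pair of sets in $X$ if and only if $\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$ and
$(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = \emptyset$).

Part (vi) follows from part (iv) by noting that $S_1 \cap \Omega_2 = \emptyset$ if and only if $S_1 \subseteq X \setminus \Omega_2$, and so $S_1 \subseteq \Omega_1$ and
$S_1 \cap \Omega_2 = \emptyset$ if and only if $S_1 \subseteq \Omega_1 \setminus \Omega_2$. Similarly $S_2 \subseteq \Omega_2$ and $S_2 \cap \Omega_1 = \emptyset$ if and only if $S_2 \subseteq \Omega_2 \setminus \Omega_1$.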
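
The equivalences in parts (iv), (v) and (vi) can be confirmed by exhaustive search on a small space. The following is a sketch under stated assumptions: the helper names are illustrative, and the test topology is the one labelled 3f in Example 33.2.15.

```python
from itertools import combinations

def powerset(points):
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def closure(s, points, opens):
    """Smallest closed set containing s: intersect all closed supersets of s."""
    result = frozenset(points)
    for o in opens:
        closed = frozenset(points) - o
        if s <= closed:
            result &= closed
    return result

def conditions(s1, s2, points, opens):
    c_def = (not (closure(s1, points, opens) & s2)
             and not (s1 & closure(s2, points, opens)))       # Definition 33.2.2
    c_iv = (any(s1 <= o and not (s2 & o) for o in opens)
            and any(s2 <= o and not (s1 & o) for o in opens))  # part (iv)
    c_v = any(s1 <= o1 and s2 <= o2 and not ((s1 | s2) & o1 & o2)
              for o1 in opens for o2 in opens)                 # part (v)
    c_vi = any(s1 <= o1 - o2 and s2 <= o2 - o1
               for o1 in opens for o2 in opens)                # part (vi)
    return (c_def, c_iv, c_v, c_vi)

X = {1, 2, 3}
opens_3f = [frozenset(s) for s in ([], [1], [2], [1, 2], [1, 2, 3])]
all_agree = all(len(set(conditions(s1, s2, X, opens_3f))) == 1
                for s1 in powerset(X) for s2 in powerset(X))
print(all_agree)  # True
```

Agreement on one finite example is of course not a proof; it only guards against transcription errors in the four conditions.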


33.2.4 REMARK: Interpretation of equivalent conditions for a pair of sets to be weakly separated.
Theorem 33.2.3 may be expressed as the equivalence of the following propositions in any topological space $X$.


(1) $(S_1, S_2)$ is a weakly separated pair of sets in $X$.
(2) $S_1 \subseteq \mathrm{Ext}(S_2)$ and $S_2 \subseteq \mathrm{Ext}(S_1)$.
(3) $\exists \Omega_1 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1$ and $S_2 \cap \Omega_1 = \emptyset$) and $\exists \Omega_2 \in \mathrm{Top}(X)$, ($S_2 \subseteq \Omega_2$ and $S_1 \cap \Omega_2 = \emptyset$).
(4) $\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$ and $(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = \emptyset$).
(5) $\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1 \setminus \Omega_2$ and $S_2 \subseteq \Omega_2 \setminus \Omega_1$).


Inspection of conditions (2) and (3) reveals that the weak separation of two sets naturally decomposes into
two sub-conditions, namely that $S_1$ is separated within $\Omega_1$ from $S_2$, and $S_2$ is separated within $\Omega_2$ from $S_1$.
The first of these sub-conditions effectively means that $S_1$ lies within the interior of $\Omega_1$ while $S_2$ lies within
the "exterior" of $\Omega_1$. The second sub-condition effectively means that $S_2$ lies within the interior of $\Omega_2$ while
$S_1$ lies within the "exterior" of $\Omega_2$. Strictly speaking, the word "exterior" here should be replaced with
"complement". Since the sets $\Omega_1$ and $\Omega_2$ are open, the complement is in each case equal to the union of
the topological exterior and boundary of the respective open set. In terms of Definition 33.2.5, the following
further equivalent condition may be added to the above list.


(6) $S_1$ is exterior to $S_2$, and $S_2$ is exterior to $S_1$.


It is noteworthy that there is effectively no interaction between the open sets $\Omega_1$ and $\Omega_2$ in condition (3),
although the way in which the property is written in condition (4) gives the superficial appearance that
these two sets are logically linked in some way. This situation may be contrasted with the notion of "strong
separation" in Definition 33.2.10, where the open sets are required to not overlap, which is a real logical
linkage between these sets.


33.2.5 DEFINITION: A set $S_1$ is exterior to a set $S_2$ in a topological space $X$ when $S_1 \subseteq \mathrm{Ext}(S_2)$. In other
words, $\exists \Omega_1 \in \mathrm{Top}(X)$, ($S_1 \subseteq \Omega_1$ and $S_2 \cap \Omega_1 = \emptyset$).


33.2.6 REMARK: Expressing the interior of a set as weak separation from its complement.

The concept of the topological interior of a set may be expressed as the separation of points in the interior
from the complement of the set if the topology is $T_1$. This is stated as Theorem 33.2.7 (iii). The $T_1$
condition is only required in order to guarantee both sub-conditions of the weak separation definition. (See
Remark 33.2.4 for these two sub-conditions.) Without the $T_1$ condition, it is still possible to prove one half
of the symmetric weak separation property.


33.2.7 THEOREM: Interior points of a set have weakly separated point-singleton and set-complement.
Let $X$ be a topological space. Let $S \in \mathbb{P}(X)$.
(i) $x \in \mathrm{Int}(S)$ if and only if $\{x\}$ is exterior to $X \setminus S$ in $X$.
(ii) $x \in \mathrm{Int}(S)$ if the pair $(\{x\}, X \setminus S)$ is weakly separated in $X$.
(iii) If $X$ is a $T_1$ space, then $x \in \mathrm{Int}(S)$ if and only if the pair $(\{x\}, X \setminus S)$ is weakly separated in $X$.


PROOF: For part (i), let $X$ be a topological space, and let $S \in \mathbb{P}(X)$. Suppose that $x \in \mathrm{Int}(S)$. Then
$x \in \Omega_1$ and $\Omega_1 \subseteq S$ for some $\Omega_1 \in \mathrm{Top}(X)$ by Theorem 31.8.13 (iii). So $\{x\} \subseteq \Omega_1$ and $(X \setminus S) \cap \Omega_1 = \emptyset$.
Therefore $\{x\}$ is exterior to $X \setminus S$ in $X$.

For the converse, suppose that $\{x\}$ is exterior to $X \setminus S$ in $X$. Then by Definition 33.2.5, there exists
$\Omega_1 \in \mathrm{Top}(X)$ such that $\{x\} \subseteq \Omega_1$ and $(X \setminus S) \cap \Omega_1 = \emptyset$. Thus $x \in \Omega_1$ and $\Omega_1 \subseteq S$. Hence $x \in \mathrm{Int}(S)$ by
Theorem 31.8.13 (iii).

For part (ii), suppose that the pair $(\{x\}, X \setminus S)$ is weakly separated in $X$. Then by Theorem 33.2.3 (iv),
there exists $\Omega_1 \in \mathrm{Top}(X)$ such that $\{x\} \subseteq \Omega_1$ and $(X \setminus S) \cap \Omega_1 = \emptyset$. Thus $x \in \Omega_1$ and $\Omega_1 \subseteq S$. Hence
$x \in \mathrm{Int}(S)$ by Theorem 31.8.13 (iii).

For part (iii), let $X$ be a $T_1$ topological space, and let $S \in \mathbb{P}(X)$. Suppose that $x \in \mathrm{Int}(S)$. Then $x \in \Omega_1$
and $\Omega_1 \subseteq S$ for some $\Omega_1 \in \mathrm{Top}(X)$ by Theorem 31.8.13 (ii). So $\{x\} \subseteq \Omega_1$ and $(X \setminus S) \cap \Omega_1 = \emptyset$. Let
$\Omega_2 = X \setminus \{x\}$. Then $\Omega_2 \in \mathrm{Top}(X)$ by Theorem 33.1.12 (i). Also, $X \setminus S \subseteq \Omega_2$ and $\{x\} \cap \Omega_2 = \emptyset$. Therefore
$(\{x\}, X \setminus S)$ is a weakly separated pair of sets in $X$ by Theorem 33.2.3 (iv). The converse is the same as
part (ii).


33.2.8 EXAMPLE: Non-$T_1$ space where an interior point is not weakly separated from its complement.
Let $X = \{x, y\}$ for some $x$ and $y$, and let $\mathrm{Top}(X) = \{\emptyset, \{x\}, \{x, y\}\}$. This is a $T_0$ space which is not a
$T_1$ space. (See Example 31.5.5, Figure 31.5.1 and Remark 33.3.7.) Let $S = \{x\}$. Then $x \in \mathrm{Int}(S) = S$
and $\overline{\{x\}} = \{x, y\}$ because $\{x, y\}$ is the smallest closed set which includes $\{x\}$. Therefore $\overline{\{x\}} \cap (X \setminus S) =
\{y\} \ne \emptyset$. Hence the set-pair $(\{x\}, X \setminus S)$ is not weakly separated. This shows that the $T_1$ condition in
Theorem 33.2.7 (iii) cannot be removed. Nor can it be replaced with a $T_0$ space condition.
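
The closure computation in this example can be reproduced mechanically. The sketch below assumes the points are the strings "x" and "y"; the `closure` helper is illustrative and simply intersects all closed supersets.

```python
def closure(s, points, opens):
    """Smallest closed set containing s: intersect all closed supersets of s."""
    result = frozenset(points)
    for o in opens:
        closed = frozenset(points) - o
        if s <= closed:
            result &= closed
    return result

X = frozenset({"x", "y"})
opens = [frozenset(), frozenset({"x"}), X]
S = frozenset({"x"})

cl = closure(S, X, opens)
print(sorted(cl))            # ['x', 'y']
print(sorted(cl & (X - S)))  # ['y']
```

The non-empty second result is exactly the failure of the condition in Definition 33.2.2 for the pair consisting of the singleton and the complement of S.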


33.2.9 REMARK: Strongly topologically separated set-pairs.
Definition 33.2.10 is stronger than Definition 33.2.2. (This follows from Theorem 33.2.3 (v).) The difference
between the concepts of weakly and strongly separated set-pairs is illustrated in Figure 33.2.1.


Figure 33.2.1 The difference between weakly and strongly separated set-pairs
(panels: weakly separated sets, with $S_1 \cap \Omega_2 = S_2 \cap \Omega_1 = \emptyset$, i.e. $\Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2) = \emptyset$; strongly separated sets)


One way of thinking about the difference is that weak separation of sets $S_1$ and $S_2$ in the full topology
is the same as strong separation in the relative topology on their union, $S_1 \cup S_2$. Thus the weak concept
corresponds to the weaker topology because the relative topology is a subset of the full topology.

The weak and strong separation concepts are equivalent for all sets in a topological space if and only if the
space is of class $T_5$. (See Definition 33.3.25 for $T_5$ spaces.)


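33.2.10 DEFINITION: A strongly (topologically) separated pair of sets in a topological space $X$ is a pair of
sets $S_1, S_2 \in \mathbb{P}(X)$ such that


\[ \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\quad S_1 \subseteq \Omega_1 \text{ and } S_2 \subseteq \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset. \]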


33.2.11 THEOREM: Set-pair strong separation implies weak separation.
Let $S_1$ and $S_2$ be subsets of a topological space $X$. If $(S_1, S_2)$ is a strongly separated set-pair in $X$, then
$(S_1, S_2)$ is a weakly separated set-pair in $X$.


PROOF: Let $(S_1, S_2)$ be a strongly separated set-pair in $X$. Then by Definition 33.2.10, there are sets
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. So $(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = \emptyset$ by
Theorem 8.1.4 (ii). Hence $(S_1, S_2)$ is a weakly separated set-pair in $X$ by Theorem 33.2.3 (v).


33.2.12 REMARK: Expressing the interior of a set as strong separation from its complement.

There is a theorem analogous to Theorem 33.2.7 for strongly separated set-pairs. In Theorem 33.3.9, it is
shown that a point $x$ is interior to a set $S$ if and only if the set-pair $(\{x\}, X \setminus S)$ is strongly separated, but
this equivalence requires the topology to be of class $T_3$, which is introduced in Definition 33.3.2.


33.2.13 THEOREM: Inheritance of weak and strong separation by subset-pairs of set-pairs.
Let $X$ be a topological space. Let $S_1, S_2 \in \mathbb{P}(X)$. Let $S_1' \subseteq S_1$ and $S_2' \subseteq S_2$.


(i) If the pair $(S_1, S_2)$ is weakly separated in $X$, then $(S_1', S_2')$ is weakly separated in $X$.


(ii) If the pair $(S_1, S_2)$ is strongly separated in $X$, then $(S_1', S_2')$ is strongly separated in $X$.


PROOF: For part (i), let $(S_1, S_2)$ be weakly separated in $X$. Then $\bar{S}_1 \cap S_2 = \emptyset$ and $S_1 \cap \bar{S}_2 = \emptyset$ by
Definition 33.2.2. Therefore $\overline{S_1'} \cap S_2' = \emptyset$ and $S_1' \cap \overline{S_2'} = \emptyset$ by Theorem 31.8.13 (xiv). Hence the set-pair
$(S_1', S_2')$ is weakly separated in $X$ by Definition 33.2.2.

For part (ii), let $(S_1, S_2)$ be strongly separated in $X$. Then $S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$ for some
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$. But then $S_1' \subseteq \Omega_1$ and $S_2' \subseteq \Omega_2$. Hence the set-pair $(S_1', S_2')$ is strongly separated in $X$ by
Definition 33.2.10.


33.2.14 THEOREM: A set-pair is strongly separated if one of the sets is empty.
Let $X$ be a topological space. Let $S_1, S_2 \in \mathbb{P}(X)$ with $\emptyset \in \{S_1, S_2\}$. Then the set-pair $(S_1, S_2)$ is strongly
separated.


PROOF: Let $X$ be a topological space. Let $S_1, S_2 \in \mathbb{P}(X)$. Suppose that $S_1 = \emptyset$. Then $\Omega_1 = \emptyset$ and $\Omega_2 = X$
satisfy the requirements of Definition 33.2.10. So $(S_1, S_2)$ is strongly separated. Similarly if $S_2 = \emptyset$.


33.2.15 EXAMPLE: Weakly and strongly separated set-pairs in finite-set topological spaces. 

As an exercise, it is perhaps useful to briefly study the weak and strong topological separation properties 
of pairs of sets in some small finite-set topological spaces. Consider for example the three-point topological 
space labelled 3f and 3g in Example 31.5.6. (See Figure 33.2.2.) 


Figure 33.2.2 Two 3-point topological spaces (panels: 3f and 3g)


Topology 3f is $\{\emptyset, \{1\}, \{2\}, \{1,2\}, \{1,2,3\}\}$. Topology 3g is $\{\emptyset, \{2\}, \{1,2\}, \{2,3\}, \{1,2,3\}\}$.


               topology 3f                                  topology 3g

   set-pair        weakly      strongly         set-pair        weakly      strongly
   S_1    S_2      separated   separated        S_1    S_2      separated   separated
   {1}    {2}      yes         yes              {1}    {2}      no          no
   {1}    {3}      no          no               {1}    {3}      yes         no
   {2}    {3}      no          no               {2}    {3}      no          no
   {1}    {2,3}    no          no               {1}    {2,3}    no          no
   {2}    {1,3}    no          no               {2}    {1,3}    no          no
   {3}    {1,2}    no          no               {3}    {1,2}    no          no


In each case, any set-pair which contains an empty set is automatically both weakly and strongly separated
by Theorems 33.2.14 and 33.2.11. Any set-pair $(S_1, S_2)$ with $S_1 \cap S_2 \ne \emptyset$ is automatically neither weakly
nor strongly separated by Definitions 33.2.2 and 33.2.10. So it is only necessary to examine non-intersecting
pairs of non-empty sets.
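
The entries of the table for topologies 3f and 3g can be recomputed directly from Definitions 33.2.2 and 33.2.10. This is a minimal sketch with illustrative helper names; the six set-pairs are listed in the same order as the table rows.

```python
def closure(s, points, opens):
    """Smallest closed set containing s: intersect all closed supersets of s."""
    result = frozenset(points)
    for o in opens:
        closed = frozenset(points) - o
        if s <= closed:
            result &= closed
    return result

def weakly_separated(s1, s2, points, opens):
    """Definition 33.2.2: each set is disjoint from the closure of the other."""
    return (not (closure(s1, points, opens) & s2)
            and not (s1 & closure(s2, points, opens)))

def strongly_separated(s1, s2, opens):
    """Definition 33.2.10: the sets have disjoint open supersets."""
    return any(s1 <= o1 and s2 <= o2 and not (o1 & o2)
               for o1 in opens for o2 in opens)

X = frozenset({1, 2, 3})
topologies = {
    "3f": [frozenset(s) for s in ([], [1], [2], [1, 2], [1, 2, 3])],
    "3g": [frozenset(s) for s in ([], [2], [1, 2], [2, 3], [1, 2, 3])],
}
pairs = [({1}, {2}), ({1}, {3}), ({2}, {3}), ({1}, {2, 3}), ({2}, {1, 3}), ({3}, {1, 2})]
results = {
    name: [(weakly_separated(frozenset(a), frozenset(b), X, opens),
            strongly_separated(frozenset(a), frozenset(b), opens))
           for a, b in pairs]
    for name, opens in topologies.items()
}
for name, rows in results.items():
    print(name, rows)
```

Each printed tuple is (weakly separated, strongly separated) for the corresponding row.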


33.2.16 REMARK: Definitions of topological separation classes in terms of set-pair separation. 
Five of the topological separation classes can be defined more succinctly in terms of separated set-pairs. 


   definition   class                  definition in terms of separated set-pairs
   33.1.5       $T_0$                  -
   33.1.8       $T_1$                  $(\{x_1\}, \{x_2\})$ is weakly separated for $x_1, x_2 \in X$, $x_1 \ne x_2$
   33.1.24      $T_2$                  $(\{x_1\}, \{x_2\})$ is strongly separated for $x_1, x_2 \in X$, $x_1 \ne x_2$
   33.3.2       $T_3$                  $(K, \{x\})$ is strongly separated for $K \in \overline{\mathrm{Top}}(X)$, $x \in X \setminus K$
   33.3.15      $T_{3\frac{1}{2}}$     -
   33.3.20      $T_4$                  $(K_1, K_2)$ is strongly separated for $K_1, K_2 \in \overline{\mathrm{Top}}(X)$, $K_1 \cap K_2 = \emptyset$
   33.3.25      $T_5$                  all weakly separated set-pairs $(S_1, S_2)$ are strongly separated, for $S_1, S_2 \in \mathbb{P}(X)$
   33.3.3…      …                      -


33.3. Stronger separation classes


33.3.1 REMARK: The $T_3$ separation property.

Whereas the $T_2$ property in Definition 33.1.24 requires every distinct pair $(\{x_1\}, \{x_2\})$ to have a disjoint
open cover, for $x_1, x_2 \in X$, the $T_3$ property requires every pair $(K, \{x\})$ to have a disjoint open cover, for
any closed set $K$ and point $x \notin K$. The $T_3$ separation class in Definition 33.3.2 is illustrated in Figure 33.3.1.


Figure 33.3.1 $T_3$ separation class requires disconnection of points from closed sets


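33.3.2 DEFINITION: The $T_3$ topological separation class.
A $T_3$ (topological) space is a topological space $X$ where


\[ \forall K \in \overline{\mathrm{Top}}(X),\ \forall x \in X \setminus K,\ \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\quad
x \in \Omega_1 \text{ and } K \subseteq \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset. \]


In other words, every set-pair $(K, \{x\})$, for a closed set $K$ and $x \notin K$, has a disjoint open cover.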


33.3.3 REMARK: Comparison of the $T_1$ and $T_3$ properties.

One half of the $T_3$ concept is implied by the $T_1$ condition. The $T_1$ condition implies that there is an
open set $\Omega_2$ such that $K \subseteq \Omega_2$ and $x \notin \Omega_2$. In fact, one may choose $\Omega_2 = X \setminus \{x\}$ because singletons are
closed in a $T_1$ topology by Theorem 33.1.12 (i). The other half of the $T_3$ concept is even easier to satisfy.
With $\Omega_1 = X \setminus K$, one obtains $\Omega_1 \in \mathrm{Top}(X)$ and $x \in \Omega_1$ in any topological space.


The only really demanding requirement for a topology to be of class $T_3$ is to provide neighbourhoods of $x$
and $K$ which are disjoint. (This issue is also discussed in Remark 33.1.13.)


33.3.4 THEOREM: The combination of $T_1$ and $T_3$ conditions implies $T_2$.
Let $X$ be a topological space which is both $T_1$ and $T_3$. Then $X$ is $T_2$.


PROOF: Let $X$ be a topological space which is both $T_1$ and $T_3$. Let $x_1, x_2 \in X$ with $x_1 \ne x_2$. Then
$K = \{x_2\}$ is a closed subset of $X$ by Theorem 33.1.15, and $x_1 \notin K$. So by Definition 33.3.2, there are
disjoint sets $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $x_1 \in \Omega_1$ and $K \subseteq \Omega_2$. Then $\Omega_1 \in \mathrm{Top}_{x_1}(X)$ and $\Omega_2 \in \mathrm{Top}_{x_2}(X)$
and $\Omega_1 \cap \Omega_2 = \emptyset$. Hence $X$ is $T_2$ by Definition 33.1.24.


33.3.5 REMARK: Strategies for proving the $T_2$ property from the $T_3$ property.

The strategy in the proof of Theorem 33.3.4 is to exploit the fact that all singletons in a $T_1$ space are closed
by Theorem 33.1.15. This strategy is not available for $T_0$ spaces. Nevertheless, the $T_0$ property, together
with the $T_3$ property, is sufficient to prove the $T_2$ property by a different strategy, as in Theorem 33.3.6.


33.3.6 THEOREM: The combination of $T_0$ and $T_3$ conditions implies $T_2$.
Let $X$ be a topological space which is both $T_0$ and $T_3$. Then $X$ is $T_2$.


PROOF: Let $X$ be a topological space which is both $T_0$ and $T_3$. Let $x_1, x_2 \in X$ with $x_1 \ne x_2$. Then by
Definition 33.1.5, there is a set $\Omega \in \mathrm{Top}(X)$ such that $(x_1 \in \Omega) \mathbin{\triangle} (x_2 \in \Omega)$, that is, exactly one of $x_1$ and $x_2$
is in $\Omega$. (See Remark 33.1.6 for this form of the $T_0$ definition.) By relabelling if necessary, it may be assumed
that $x_1 \in \Omega$ and $x_2 \notin \Omega$. Let $A = X \setminus \Omega$.
Then $A$ is closed, and so $A = \bar{A}$. But $x_1 \notin A$. So by the $T_3$ property, there are disjoint sets $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$
such that $x_1 \in \Omega_1$ and $A \subseteq \Omega_2$. Thus $x_2 \in \Omega_2$. Hence $X$ has the $T_2$ property by Definition 33.1.24.


33.3.7 REMARK: Some separation properties of some small finite topological spaces.
The two-point topologies in Example 31.5.5 have the following $T_0$, $T_1$, $T_2$, and $T_3$ properties.


   topology                                          T_0    T_1    T_2    T_3
   2a    $\{\emptyset, X\}$                                                yes
   2b    $\{\emptyset, \{1\}, X\}$                   yes
   2c    $\{\emptyset, \{2\}, X\}$                   yes
   2d    $\{\emptyset, \{1\}, \{2\}, X\}$            yes    yes    yes    yes


The three-point topologies in Example 31.5.6 have the following $T_0$, $T_1$, $T_2$, and $T_3$ properties.


   topology                       T_0    T_1    T_2    T_3
   3a    ∅                                             yes
   3b    1
   3c    12
   3d    12, 3                                         yes
   3e    1, 12                    yes
   3f    1, 2, 12                 yes
   3g    2, 12, 23                yes
   3h    1, 2, 12, 23             yes
   3i    1, 2, 3, 12, 13, 23      yes    yes    yes    yes


As mentioned in Remark 31.11.11, the only $T_1$ topology on a finite set is the discrete topology.
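
The three-point table can be recomputed by brute force. The sketch below assumes that each row label lists the open sets other than the empty set and the whole space, with "12" standing for {1, 2}; the predicate names and this reading of the shorthand are assumptions, not taken from Example 31.5.6 itself.

```python
from itertools import combinations

def is_t0(pts, opens):
    return all(any((a in o) != (b in o) for o in opens)
               for a, b in combinations(sorted(pts), 2))

def is_t1(pts, opens):
    return all(any(a in o and b not in o for o in opens)
               for a in pts for b in pts if a != b)

def is_t2(pts, opens):
    return all(any(a in o1 and b in o2 and not (o1 & o2) for o1 in opens for o2 in opens)
               for a, b in combinations(sorted(pts), 2))

def is_t3(pts, opens):
    # Each closed set K is the complement of an open set o, and x ranges over o.
    pts = frozenset(pts)
    return all(any(x in o1 and (pts - o) <= o2 and not (o1 & o2)
                   for o1 in opens for o2 in opens)
               for o in opens for x in o)

def topology(points, *proper_opens):
    """Add the empty set and the whole space to the listed open sets."""
    return [frozenset(), frozenset(points)] + [frozenset(s) for s in proper_opens]

X = {1, 2, 3}
three_point = {
    "3a": topology(X),
    "3b": topology(X, {1}),
    "3c": topology(X, {1, 2}),
    "3d": topology(X, {1, 2}, {3}),
    "3e": topology(X, {1}, {1, 2}),
    "3f": topology(X, {1}, {2}, {1, 2}),
    "3g": topology(X, {2}, {1, 2}, {2, 3}),
    "3h": topology(X, {1}, {2}, {1, 2}, {2, 3}),
    "3i": topology(X, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}),
}
table = {name: tuple(test(X, opens) for test in (is_t0, is_t1, is_t2, is_t3))
         for name, opens in three_point.items()}
for name, row in table.items():
    print(name, ["yes" if v else "-" for v in row])
```

No row other than 3i combines $T_0$ with $T_3$, which is consistent with Theorem 33.3.6 and with the closing sentence of the remark.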


33.3.8 REMARK: Strong separation of interior points of sets in $T_3$ spaces from their complements.
Theorem 33.3.9 is the strongly separated sets analogue of Theorem 33.2.7. It is delayed to this location
because the $T_3$ space definition is required.


33.3.9 THEOREM: Interior points of a set have strongly separated point-singleton and set-complement.
Let $X$ be a topological space. Let $S \in \mathbb{P}(X)$.

(i) $x \in \mathrm{Int}(S)$ if the pair $(\{x\}, X \setminus S)$ is strongly separated in $X$.

(ii) If $X$ is a $T_3$ space, then $x \in \mathrm{Int}(S)$ if and only if the pair $(\{x\}, X \setminus S)$ is strongly separated in $X$.


PROOF: Part (i) follows from Theorem 33.2.7 (ii) since strongly separated set-pairs are weakly separated
by Theorem 33.2.11.


For part (ii), let $X$ be a $T_3$ topological space, and let $S \in \mathbb{P}(X)$. Suppose that $x \in \mathrm{Int}(S)$. Then $x \in \Omega$
and $\Omega \subseteq S$ for some $\Omega \in \mathrm{Top}(X)$ by Theorem 31.8.13 (iii). But $X \setminus \Omega$ is a closed subset of $X$. So by
Definition 33.3.2, there exist $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $\{x\} \subseteq \Omega_1$, $X \setminus \Omega \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. Then
$X \setminus S \subseteq X \setminus \Omega \subseteq \Omega_2$. Therefore
$(\{x\}, X \setminus S)$ is a strongly separated pair of sets in $X$ by Definition 33.2.10. The converse is the same as
part (i).


33.3.10 REMARK: A $T_3$ space is a space where every neighbourhood includes a closed neighbourhood.
Theorem 33.3.11 means that a topological space is $T_3$ if and only if every open neighbourhood of every point
includes a closed neighbourhood of that point. (See Definition 31.8.26 for closed neighbourhoods.)


By Theorem 33.1.15, a space is $T_1$ if and only if every singleton is closed. However, a singleton $\{x\}$ is
typically not a closed neighbourhood of $x$ because it typically does not include an open neighbourhood of $x$.
(The statement of Theorem 33.3.11 is illustrated in Figure 33.3.2.)


Figure 33.3.2 Nested open/closed neighbourhoods condition for a $T_3$ space


33.3.11 THEOREM: Equivalent condition for the $T_3$ property in terms of closed sets.
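A topological space $X$ is $T_3$ if and only if


\[ \forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists \Omega_1 \in \mathrm{Top}_x(X),\ \exists K_2 \in \overline{\mathrm{Top}}(X),\quad
\Omega_1 \subseteq K_2 \subseteq \Omega. \tag{33.3.1} \]


In other words, for every neighbourhood $\Omega$ of any point $x \in X$, $x \in \Omega_1 \subseteq K_2 \subseteq \Omega$ for some open set $\Omega_1$ and
closed set $K_2$.


PROOF: Let $X$ be a $T_3$ space. Let $x \in X$ and $\Omega \in \mathrm{Top}_x(X)$. Let $K = X \setminus \Omega$. Then $K$ is closed
and $x \in X \setminus K$. So by Definition 33.3.2, $x \in \Omega_1$ and $K \subseteq \Omega_2$ for some disjoint open sets $\Omega_1$ and $\Omega_2$. Let
$K_2 = X \setminus \Omega_2$. Then $K_2$ is closed and $K_2 \subseteq X \setminus K = \Omega$ and $\Omega_1 \subseteq X \setminus \Omega_2 = K_2$. So $x \in \Omega_1 \subseteq K_2 \subseteq \Omega$.

The converse may be shown by reversing the argument. Let a topological space $X$ satisfy condition (33.3.1).
Let $K \in \overline{\mathrm{Top}}(X)$ and $x \in X \setminus K$. Let $\Omega = X \setminus K$. Then $\Omega \in \mathrm{Top}_x(X)$. So by condition (33.3.1), $\Omega_1 \subseteq K_2 \subseteq \Omega$
for some $\Omega_1 \in \mathrm{Top}_x(X)$ and $K_2 \in \overline{\mathrm{Top}}(X)$. Let $\Omega_2 = X \setminus K_2$. Then $x \in \Omega_1$ and $K = X \setminus \Omega \subseteq X \setminus K_2 = \Omega_2$
and $\Omega_1 \cap \Omega_2 = \emptyset$. Hence $X$ is a $T_3$ space by Definition 33.3.2.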
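
Condition (33.3.1) and Definition 33.3.2 can be compared on small finite topologies. This is only a sketch with illustrative function names; the four example topologies are assumed encodings of rows 3a, 3d, 3f and 3i of the table in Remark 33.3.7.

```python
def is_t3(points, opens):
    """Definition 33.3.2, with each closed set K taken as the complement of an open set o."""
    pts = frozenset(points)
    return all(any(x in o1 and (pts - o) <= o2 and not (o1 & o2)
                   for o1 in opens for o2 in opens)
               for o in opens for x in o)

def nested_condition(points, opens):
    """Condition (33.3.1): x in O1, O1 subset of K2, K2 subset of O, with O1 open and K2 closed."""
    pts = frozenset(points)
    closed = [pts - o for o in opens]
    return all(any(x in o1 and o1 <= k2 <= o for o1 in opens for k2 in closed)
               for o in opens for x in o)

X = {1, 2, 3}
examples = {
    "3a": [frozenset(s) for s in ([], [1, 2, 3])],
    "3d": [frozenset(s) for s in ([], [1, 2], [3], [1, 2, 3])],
    "3f": [frozenset(s) for s in ([], [1], [2], [1, 2], [1, 2, 3])],
    "3i": [frozenset(s) for s in ([], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3])],
}
checks = {name: (is_t3(X, opens), nested_condition(X, opens))
          for name, opens in examples.items()}
print(checks)
```

In each case the two Boolean values coincide, as the theorem requires.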


33.3.12 REMARK: Finite-complement topologies on infinite sets do not have the $T_3$ property.

It is not surprising that the example topological space $X$ in Theorem 33.3.13 is not $T_3$. By Theorem 33.1.28,
$X$ is $T_1$, but not $T_2$. So by Theorem 33.3.4, $X$ cannot be $T_3$. This conclusion is established more directly
in the proof of Theorem 33.3.13 (i).


It is somewhat more difficult to construct topological spaces which are $T_2$ but not $T_3$. (For examples, see
Steen/Seebach [141], counterexamples 60, 63, 66, 75, 78, 79, 80, 81, 88, 90, 91, 92, 94 and 126.)


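33.3.13 THEOREM: Separation properties of the finite-complement topology on an infinite set.
Let $X$ be an infinite set. Let $\mathrm{Top}(X) = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X);\ \#(X \setminus \Omega) < \infty\}$.
(i) $\mathrm{Top}(X)$ is not a $T_3$ topology on $X$.
(ii) $\forall S \in \mathbb{P}(X)$, ($\mathrm{Int}(S) \ne \emptyset \Leftrightarrow \#(X \setminus S) < \infty$).
(iii) $\forall S \in \mathbb{P}(X)$, ($\mathrm{Int}(S) \ne \emptyset \Leftrightarrow S \in \mathrm{Top}(X) \setminus \{\emptyset\}$).
(iv) For all $S \in \mathbb{P}(X)$ and $x \in \mathrm{Int}(S)$, the set-pair $(\{x\}, X \setminus S)$ is weakly separated.
(v) For all $S \in \mathbb{P}(X) \setminus \{X\}$ and $x \in \mathrm{Int}(S)$, the set-pair $(\{x\}, X \setminus S)$ is not strongly separated.
(vi) For all $S_1, S_2 \in \mathbb{P}(X)$, the set-pair $(S_1, S_2)$ is strongly separated if and only if $\emptyset \in \{S_1, S_2\}$.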


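PROOF: For part (i), let $X$ be an infinite set. Let $\mathrm{Top}(X) = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X);\ \#(X \setminus \Omega) < \infty\}$. Let $A$
be a non-empty finite subset of $X$. Then $A$ is closed. Let $x \in X \setminus A$. Suppose that there are disjoint sets $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$
with $x \in \Omega_1$ and $A \subseteq \Omega_2$. Then $\Omega_2$ must be finite because $\Omega_2 \subseteq X \setminus \Omega_1$ and $X \setminus \Omega_1$ is finite, since $\Omega_1 \ne \emptyset$. But the only
finite open set is $\emptyset$, whereas $\Omega_2 \supseteq A \ne \emptyset$. This is a contradiction. Hence by Definition 33.3.2, $X$ is not a $T_3$ space.


For part (ii), let $S \in \mathbb{P}(X)$ with $x \in \mathrm{Int}(S)$. Then $\Omega \subseteq S$ for some $\Omega \in \mathrm{Top}_x(X)$ by Theorem 31.8.13 (iii).
So $\#(X \setminus S) < \infty$ because $X \setminus \Omega$ is finite and $X \setminus S \subseteq X \setminus \Omega$. Conversely, suppose that $\#(X \setminus S) < \infty$. Then
$S \in \mathrm{Top}(X) \setminus \{\emptyset\}$ because $X$ is infinite. But then $\mathrm{Int}(S) = S$ by Theorem 31.8.14 (i). Hence $\mathrm{Int}(S) \ne \emptyset$.


Part (iii) follows from part (ii).

For part (iv), suppose that $S \in \mathbb{P}(X)$ and $x \in \mathrm{Int}(S)$. Then $X \setminus S$ is finite by part (ii). But $X$ is a $T_1$ space
by Theorem 33.1.28. So there exist $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ which satisfy $x \in \Omega_1 \setminus \Omega_2$ and $X \setminus S \subseteq \Omega_2 \setminus \Omega_1$ by
Theorem 33.1.12 (v). So the set-pair $(\{x\}, X \setminus S)$ is weakly separated by Theorem 33.2.3 (vi).


For part (v), let $S \in \mathbb{P}(X) \setminus \{X\}$ and $x \in \mathrm{Int}(S)$. Suppose there exist disjoint sets $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ satisfying
$x \in \Omega_1$ and $X \setminus S \subseteq \Omega_2$. Then $\Omega_1 \subseteq X \setminus \Omega_2$. But $\Omega_2 \ne \emptyset$ because $X \setminus S \ne \emptyset$. So $X \setminus \Omega_2$ is finite, and $\Omega_1$
is infinite because $\Omega_1 \ne \emptyset$. This is a contradiction. So there exists no such disjoint open cover-pair for $\{x\}$
and $X \setminus S$. Hence by Definition 33.2.10, the set-pair $(\{x\}, X \setminus S)$ is not strongly separated.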

For part (vi), let $S_1 = \emptyset$. Then $\Omega_1 = \emptyset$ and $\Omega_2 = X$ satisfy the requirements of Definition 33.2.10. So
the pair $(\emptyset, S_2)$ is strongly separated. Similarly if $S_2 = \emptyset$. Now let $S_1, S_2 \in \mathbb{P}(X) \setminus \{\emptyset\}$. Suppose that
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ satisfy $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. Then $\Omega_1 \subseteq X \setminus \Omega_2$, but $\Omega_1$ is infinite and
$X \setminus \Omega_2$ is finite, which is a contradiction. So by Definition 33.2.10, the set-pair $(S_1, S_2)$ is not strongly
separated.


33.3.14 REMARK: Illustration of the $T_{3\frac{1}{2}}$ separation property.


The $T_{3\frac{1}{2}}$ separation class in Definition 33.3.15 is illustrated in Figure 33.3.3. The set $\overline{\mathrm{Top}}(X)$ of closed
subsets of $X$ in Definition 33.3.15 uses Notation 31.4.4.


Figure 33.3.3 $T_{3\frac{1}{2}}$ class requires existence of a continuous separating function


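33.3.15 DEFINITION: The $T_{3\frac{1}{2}}$ topological separation class.
A $T_{3\frac{1}{2}}$ (topological) space is a topological space $X$ such that


\[ \forall K \in \overline{\mathrm{Top}}(X),\ \forall x \in X \setminus K,\ \exists f \in C(X, [0,1]),\quad
f(x) = 0 \text{ and } \forall y \in K,\ f(y) = 1. \]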


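33.3.16 THEOREM: Every $T_{3\frac{1}{2}}$ space is a $T_3$ space.
Let $X$ be a topological space which is $T_{3\frac{1}{2}}$. Then $X$ is $T_3$.


PROOF: Let $X$ be a $T_{3\frac{1}{2}}$ topological space. Let $K \in \overline{\mathrm{Top}}(X)$ and $x \in X \setminus K$. By Definition 33.3.15, there
is a function $f \in C(X, [0,1])$ such that $f(x) = 0$ and $\forall y \in K$, $f(y) = 1$. Let $\Omega_1 = \{y \in X;\ f(y) < 1/4\}$ and
$\Omega_2 = \{y \in X;\ f(y) > 1/2\}$. Then $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$, $K \subseteq \Omega_2$, $x \in \Omega_1$ and $\Omega_1 \cap \Omega_2 = \emptyset$. Hence $X$ is $T_3$ by
Definition 33.3.2.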


33.3.17 REMARK: A property of completely regular topological spaces.

For any finite set of points $x_1, \ldots, x_m \in X \setminus K$ in Definition 33.3.18, suitable continuous functions $f_1, \ldots, f_m$
are guaranteed to exist if the topological space is completely regular. So one may construct $f : X \to [0,1]$
with $f(y) = \min_{i=1}^{m} f_i(y)$ for all $y \in X$. Then $f$ is continuous on $X$ by Theorem 32.12.11, and has the value 1
on $K$ and the value 0 on all points $x_1, \ldots, x_m$.
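
The pointwise-minimum construction can be illustrated on the real line. The sketch below assumes $X = \mathbb{R}$ with its usual topology and the closed set $K = [3, \infty)$; the piecewise-linear functions `bump` are one hypothetical choice of the $f_i$, not a construction from the text.

```python
def bump(x_i, k_left):
    """A continuous f_i from R to [0,1] with f_i(x_i) = 0 and f_i = 1 on [k_left, infinity)."""
    return lambda y: min(1.0, abs(y - x_i) / (k_left - x_i))

k_left = 3.0          # K = [3, infinity), a closed subset of the real line
xs = [0.0, 1.0, 2.5]  # finitely many points outside K
fs = [bump(x, k_left) for x in xs]

def f(y):
    """Pointwise minimum of the f_i."""
    return min(f_i(y) for f_i in fs)

print([f(x) for x in xs])                 # [0.0, 0.0, 0.0]
print([f(y) for y in (3.0, 4.0, 10.0)])   # [1.0, 1.0, 1.0]
```

The minimum vanishes at each $x_i$ because the corresponding $f_i$ does, and it equals 1 on $K$ because every $f_i$ does.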


33.3.18 DEFINITION: A completely regular (topological) space is a topological space which is $T_1$ and $T_{3\frac{1}{2}}$.


33.3.19 REMARK: $T_4$ spaces.

The weak separation of disjoint pairs of closed sets is trivial to demonstrate in a general topological space.
(Just let $\Omega_1 = X \setminus K_2$ and $\Omega_2 = X \setminus K_1$, and apply Theorem 33.2.3 (vi).) The difficult task in Definition 33.3.20
is to provide disjoint covering sets $\Omega_1$ and $\Omega_2$. A $T_4$ space has the property that every disjoint pair of closed
sets is strongly separated, which is shown in Theorem 33.3.21. Definition 33.3.20 is illustrated in Figure 33.3.4.


[Figure labels: $K_1 \subseteq \Omega_1$, $K_2 \subseteq \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$]


Figure 33.3.4 $T_4$ separation requires a disjoint open cover for each pair of closed sets
33.3.20 DEFINITION: The $T_4$ topological separation class.
A $T_4$ (topological) space is a topological space X such that for every disjoint pair of closed sets $K_1$ and $K_2$
in X, there are disjoint open sets $\Omega_1$ and $\Omega_2$ such that $K_1 \subseteq \Omega_1$ and $K_2 \subseteq \Omega_2$. In other words,


$$\forall K_1, K_2 \in \overline{\mathrm{Top}}(X),\quad
K_1 \cap K_2 = \emptyset \ \Rightarrow\ \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ (K_1 \subseteq \Omega_1 \text{ and } K_2 \subseteq \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset).$$


33.3.21 THEOREM: Disjoint closed sets in a T4 space are strongly separated. 
Let X be a $T_4$ topological space. Let $S_1$ and $S_2$ be disjoint closed subsets of X. Then $(S_1, S_2)$ is a strongly
separated set-pair in X.


PROOF: Let X be a $T_4$ topological space. Let $S_1, S_2 \in \overline{\mathrm{Top}}(X)$ with $S_1 \cap S_2 = \emptyset$. It follows from
Definition 33.3.20 that there exist $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$ and
$\Omega_1 \cap \Omega_2 = \emptyset$. Hence $(S_1, S_2)$ is a strongly separated set-pair in X by Definition 33.2.10.


33.3.22 REMARK: The axiom of choice is required for the proof that $T_4$ implies $T_{3\frac{1}{2}}$.

The relation between $T_{3\frac{1}{2}}$ and $T_4$ spaces is somewhat problematic. With the “benefit” of the axiom of
choice, one may easily show that every $T_4$ space is a $T_{3\frac{1}{2}}$ space. This assertion is known as Urysohn’s
lemma. The full-strength AC axiom is not required, but it does require some form of AC axiom. Therefore
the continuous function f in Definition 33.3.15 is not constructible in general. The standard proof of
Urysohn’s lemma constructs approximations to the function f by countable induction, but the existence of
the limit requires the axiom of choice. The effect of this issue is suggested in Figure 33.3.7. By contrast, the
relation between $T_3$ and $T_4$ spaces is very simple, as shown by Theorem 33.3.23.


33.3.23 THEOREM: The combination of $T_1$ and $T_4$ conditions implies $T_3$.
Let X be a topological space which is both $T_1$ and $T_4$. Then X is $T_3$.


PROOF: Let X be a topological space which is both $T_1$ and $T_4$. Let $K \in \overline{\mathrm{Top}}(X)$ and $x \in X \setminus K$.
Then $\{x\} \in \overline{\mathrm{Top}}(X)$ by Theorem 33.1.15 because X is $T_1$. So by Definition 33.3.20, there are disjoint sets
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ with $\{x\} \subseteq \Omega_1$ and $K \subseteq \Omega_2$. Hence X is $T_3$ by Definition 33.3.2.


33.3.24 DEFINITION: A normal (topological) space is a topological space which is both $T_1$ and $T_4$.


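33.3.25 DEFINITION: The $T_5$ topological separation class.
A $T_5$ (topological) space is a topological space X such that for all $S_1, S_2 \in \mathbb{P}(X)$, if $\overline{S}_1 \cap S_2 = \emptyset$ and
$S_1 \cap \overline{S}_2 = \emptyset$, then there exist disjoint $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $S_1 \subseteq \Omega_1$ and $S_2 \subseteq \Omega_2$.
In other words,


$$\forall S_1, S_2 \in \mathbb{P}(X),\quad
((\overline{S}_1 \cap S_2 = \emptyset) \wedge (S_1 \cap \overline{S}_2 = \emptyset)) \ \Rightarrow\ \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ ((\Omega_1 \cap \Omega_2 = \emptyset) \wedge (S_1 \subseteq \Omega_1) \wedge (S_2 \subseteq \Omega_2)).$$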


33.3.26 REMARK: The $T_5$ topological separation class and separated pairs of sets.
Definition 33.3.25 for the $T_5$ separation class says that any sufficiently separated pair of sets has a disjoint
open cover. This style of disjoint open cover is illustrated in Figure 33.3.5.


[Figure labels: $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$]

Figure 33.3.5 The $T_5$ class disconnects pairs of sets which are separated
In terms of Definitions 33.2.2 and 33.2.10, a $T_5$ space means a topological space where every weakly separated
pair of sets is strongly separated. A set-pair $(S_1, S_2)$ with $S_1, S_2 \in \mathbb{P}(X)$ is weakly separated when
$\overline{S}_1 \cap S_2 = \emptyset$ and $S_1 \cap \overline{S}_2 = \emptyset$. By Theorem 33.2.3 (v), this is equivalent to the existence of
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ which satisfy the combined conditions $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$ and
$(S_1 \cup S_2) \cap \Omega_1 \cap \Omega_2 = \emptyset$. The pair $(S_1, S_2)$ is strongly separated when such sets satisfy $S_1 \subseteq \Omega_1$,
$S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. Thus in a $T_5$ space, a set-pair is weakly separated if and only if it is
strongly separated. This has some significance for connectedness of sets in Sections 34.1 and 34.3 because a
set is connected if and only if it cannot be partitioned into a set-pair which is weakly separated.


33.3.27 THEOREM: Every $T_5$ space is a $T_4$ space.
Let X be a $T_5$ topological space. Then X is a $T_4$ topological space.


PROOF: Let X be a $T_5$ topological space. Let $K_1, K_2 \in \overline{\mathrm{Top}}(X)$ be disjoint closed sets in X. Then
$\overline{K}_1 \cap K_2 = K_1 \cap \overline{K}_2 = K_1 \cap K_2 = \emptyset$. So by Definition 33.3.25, there are disjoint
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $K_1 \subseteq \Omega_1$ and $K_2 \subseteq \Omega_2$. Hence by Definition 33.3.20, X is a $T_4$ topological space.


33.3.28 REMARK: Topologies which are $T_5$ but not $T_1$.

Some examples of topologies which are $T_5$ but not $T_1$ are conveniently listed by Steen/Seebach [141],
page 187. Their example numbers are 4, 13, 17, 52 and 55, which they call respectively the “indiscrete
topology”, “finite excluded point topology”, “either-or topology”, “nested interval topology” and “Hjalmar
Ekdal topology”. The simplest of these is the “indiscrete topology” $\{\emptyset, X\}$ on any set X, which is the same
as the trivial topology in Definition 31.3.18. If X has two or more points, then $(X, T)$ is clearly not a $T_1$
space, but the $T_5$ property is valid because the closure of every non-empty set is X, so sets
$S_1, S_2 \in \mathbb{P}(X)$ can satisfy $\overline{S}_1 \cap S_2 = \emptyset$ and $S_1 \cap \overline{S}_2 = \emptyset$ only if at least one of them is empty.
If $S_1 = \emptyset$, then $\Omega_1 = \emptyset$ and $\Omega_2 = X$ satisfy Definition 33.3.25, and similarly if $S_2 = \emptyset$.


The independence of the $T_1$ and $T_5$ axioms can perhaps be more easily understood from the perspective of
the table of alternative definitions in Remark 33.2.16. In a $T_1$ space, every set-pair $(\{x_1\}, \{x_2\})$ is weakly
separated for $x_1, x_2 \in X$ with $x_1 \neq x_2$. In a $T_5$ space, every weakly separated set-pair $(S_1, S_2)$ is strongly
separated, for $S_1, S_2 \in \mathbb{P}(X)$. Thus the $T_5$ property only “promotes” weakly separated set-pairs to strongly
separated set-pairs, but if even pairs of disjoint singletons are not weakly separated, there is not much that
can be “promoted”.
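For finite spaces, the claim in Remark 33.3.28 can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, the $T_5$ test follows Definition 33.3.25, and the $T_1$ test assumes the usual formulation (for distinct points x and y, some open set contains x but not y), which may be worded differently in the definition used in this book.

```python
from itertools import combinations


def powerset(space):
    """All subsets of a finite set, as frozensets."""
    items = list(space)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def closure(s, space, opens):
    """Closure of s: complement of the union of all open sets disjoint from s."""
    exterior = frozenset().union(*[o for o in opens if not (o & s)])
    return space - exterior


def is_t1(space, opens):
    """Assumed T1 formulation: for distinct x, y some open set contains x but not y."""
    return all(any(x in o and y not in o for o in opens)
               for x in space for y in space if x != y)


def is_t5(space, opens):
    """Definition 33.3.25: every weakly separated pair has disjoint open supersets."""
    subsets = powerset(space)
    for s1 in subsets:
        for s2 in subsets:
            weakly_separated = (not (closure(s1, space, opens) & s2)
                                and not (s1 & closure(s2, space, opens)))
            if weakly_separated and not any(
                    s1 <= o1 and s2 <= o2 and not (o1 & o2)
                    for o1 in opens for o2 in opens):
                return False
    return True


X = frozenset({0, 1})
indiscrete = [frozenset(), X]   # the trivial topology on a two-point set
discrete = powerset(X)          # every subset is open
```

With these definitions, `is_t5(X, indiscrete)` holds while `is_t1(X, indiscrete)` fails, and the discrete topology passes both tests.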


33.3.29 REMARK: Illustration of the $T_6$ separation property.
Definition 33.3.30 for the $T_6$ regularity class is illustrated in Figure 33.3.6. The $T_6$ regularity class is attributed
by EDM2 [113], page 1612, to Nikolai Borisovich Vedenisov (Николай Борисович Веденисов, 1905-1941).


Figure 33.3.6 The $T_6$ class requires closed sets to equal $f^{-1}(\{0\})$ for some continuous f


Like the $T_{3\frac{1}{2}}$ axiom, the $T_6$ axiom does not fit perfectly into the pattern of the original Alexandrov/Hopf [47]
series of six separation axioms, but these two continuous-function-based axioms are unsurprisingly related
to each other, as shown in Theorem 33.3.34.


33.3.30 DEFINITION: The $T_6$ topological separation class.
A $T_6$ (topological) space is a topological space X such that for every closed set K in X, there is a real-valued
continuous function $f : X \to \mathbb{R}$ such that $K = \{x \in X;\ f(x) = 0\}$. That is,


$$\forall K \in \overline{\mathrm{Top}}(X),\ \exists f \in C(X, \mathbb{R}),\quad K = \{x \in X;\ f(x) = 0\}.$$
33.3.31 REMARK: Clarification of the empty-set extreme case for the $T_6$ property definition.

In the case $K = \emptyset$ in Definition 33.3.30, one may choose $f \in C(X, \mathbb{R})$ with $f(x) = 1$ for all $x \in X$. (This
function f is continuous by Theorem 31.12.9.) Then $K = \{x \in X;\ f(x) = 0\}$. Therefore it is immaterial
whether one tests all closed sets or only the non-empty closed sets in Definition 33.3.30.


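33.3.32 THEOREM: Every $T_6$ space is a $T_5$ space.
Let X be a $T_6$ topological space. Then X is a $T_5$ topological space.


PROOF: Let X be a $T_6$ topological space. Let $S_1, S_2 \in \mathbb{P}(X)$ be disjoint subsets of X such that
$\overline{S}_1 \cap S_2 = \emptyset = S_1 \cap \overline{S}_2$. By Definition 33.3.30, there are functions $f, g \in C(X, \mathbb{R})$ such that
$\overline{S}_1 = \{x \in X;\ f(x) = 0\}$ and $\overline{S}_2 = \{x \in X;\ g(x) = 0\}$. Let $\Omega_1 = \{x \in X;\ |f(x)| < |g(x)|\}$ and
$\Omega_2 = \{x \in X;\ |g(x)| < |f(x)|\}$. Then $\Omega_1 \in \mathrm{Top}(X)$ because
$\Omega_1 = \bigcup_{a \in \mathbb{R}} \{x \in X;\ |f(x)| < a \text{ and } a < |g(x)|\} = \bigcup_{a \in \mathbb{R}} \bigl(|f|^{-1}((-\infty, a)) \cap |g|^{-1}((a, +\infty))\bigr)$,
which is a union of finite intersections of open sets because f and g, and therefore $|f|$ and $|g|$, are continuous.
Similarly, $\Omega_2 \in \mathrm{Top}(X)$. Clearly $\Omega_1 \cap \Omega_2 = \emptyset$.

To show that $S_1 \subseteq \Omega_1$, suppose that $x \in S_1$ and $x \notin \Omega_1$. Then $f(x) = 0$ and $|f(x)| \geq |g(x)|$. So $g(x) = 0$.
So $x \in \overline{S}_2$. But $S_1 \cap \overline{S}_2 = \emptyset$. So $x \notin S_1$, which is a contradiction. Therefore $x \in \Omega_1$. Hence $S_1 \subseteq \Omega_1$.
Similarly, $S_2 \subseteq \Omega_2$. Hence X is a $T_5$ topological space.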
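The sets $\Omega_1$ and $\Omega_2$ in the proof of Theorem 33.3.32 can be illustrated on the real line. This is a sketch under assumptions which are not part of the text: $S_1 = (0,1)$ and $S_2 = (1,2)$ are chosen as a weakly separated pair whose closures meet at the point 1, and f and g are taken to be the distance functions to the closures $[0,1]$ and $[1,2]$.

```python
def dist_to_interval(lo: float, hi: float):
    """Distance function to the closed interval [lo, hi]; its zero set is [lo, hi]."""
    return lambda y: max(lo - y, 0.0, y - hi)


f = dist_to_interval(0.0, 1.0)   # zero set is the closure of S1 = (0, 1)
g = dist_to_interval(1.0, 2.0)   # zero set is the closure of S2 = (1, 2)


def in_omega1(y: float) -> bool:
    """Membership in Omega_1 = {y; |f(y)| < |g(y)|}."""
    return abs(f(y)) < abs(g(y))


def in_omega2(y: float) -> bool:
    """Membership in Omega_2 = {y; |g(y)| < |f(y)|}."""
    return abs(g(y)) < abs(f(y))
```

In this example the common closure point 1 belongs to neither set, since f and g both vanish there, while every point of $S_1$ lies in $\Omega_1$ and every point of $S_2$ lies in $\Omega_2$. The strict inequalities make the two sets disjoint by construction.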


33.3.33 REMARK: All $T_6$ spaces have the $T_{3\frac{1}{2}}$ property.
For choice-axiom believers, all $T_4$ spaces are $T_{3\frac{1}{2}}$. For non-believers, this theorem is not available. However,
it is elementary to show that all $T_6$ topological spaces are of class $T_{3\frac{1}{2}}$.


33.3.34 THEOREM: Every $T_6$ space is a $T_{3\frac{1}{2}}$ space.
Let X be a $T_6$ topological space. Then X is a $T_{3\frac{1}{2}}$ topological space.


PROOF: Let X be a $T_6$ topological space. Let F be a closed subset of X, and let $x \in X \setminus F$. By
Definition 33.3.30, there is a continuous function $f : X \to \mathbb{R}$ such that $F = \{y \in X;\ f(y) = 0\}$. Then $f(x) \neq 0$.
Define $g : X \to [0,1]$ by $g(y) = \max(0, \min(1, 1 - f(y)/f(x)))$ for all $y \in X$. Then $g(x) = 0$, and $g(y) = 1$ for
all $y \in F$. The continuity of g follows from Theorem 32.12.11 and the continuity of f. Hence X is a $T_{3\frac{1}{2}}$
topological space.
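The clamping formula for g in the proof of Theorem 33.3.34 is easy to evaluate directly. In the following sketch, the space is assumed to be the real line, the closed set is assumed to be $F = \{1, 2\}$, and $f(y) = (y-1)(y-2)$ is one possible continuous function with zero set F. The name `make_g` is hypothetical.

```python
from typing import Callable


def make_g(f: Callable[[float], float], x: float) -> Callable[[float], float]:
    """Return g(y) = max(0, min(1, 1 - f(y)/f(x))), as in the proof of Theorem 33.3.34."""
    fx = f(x)
    if fx == 0:
        raise ValueError("x must lie outside the zero set F of f")

    def g(y: float) -> float:
        return max(0.0, min(1.0, 1.0 - f(y) / fx))

    return g


def f(y: float) -> float:
    # Assumed example: a continuous function whose zero set is F = {1, 2}.
    return (y - 1.0) * (y - 2.0)


g = make_g(f, 0.0)   # separate the point x = 0 from F
```

The function f changes sign between the two points of F, which is why the outer clamp to the interval [0,1] is needed: without it, the quotient would leave the target space for some arguments.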


33.3.35 EXAMPLE: Topological space which is $T_6$ but not $T_0$.

Let X be any set which contains at least two points. Let $\mathrm{Top}(X) = \{\emptyset, X\}$. Then X is a $T_4$, $T_5$ and $T_6$
space which is not a $T_0$, $T_1$ or $T_2$ space. To show the $T_6$ property, for $K = \emptyset$ define $f : X \to \mathbb{R}$ by $f : x \mapsto 1$.
Then f is continuous and $f^{-1}(\{0\}) = K$. For $K = X$ define $f : X \to \mathbb{R}$ by $f : x \mapsto 0$. Then f is continuous
and $f^{-1}(\{0\}) = K$. Hence X is a $T_6$ space. Then X is a $T_5$ and $T_4$ space by Theorems 33.3.32 and 33.3.27.
But by Example 33.1.7 and Theorems 33.1.10 and 33.1.26, X is not a $T_0$, $T_1$ or $T_2$ space.


33.3.36 REMARK: Family trees for relations between separation classes. 

The relations between various separation classes are illustrated in Figure 33.3.7. One of the arrows must be 
removed if one does not accept the axiom of choice, but then it is still possible to show that the combination 
of the $T_1$ and $T_4$ properties implies $T_3$, and that the $T_6$ property implies $T_{3\frac{1}{2}}$. (Similar tables are given by
Steen/Seebach [141], pages 11-17, and EDM2 [113], pages 1612-1613.) 


33.3.37 REMARK:  Metrisable topological spaces. 

The “metrisable” topology separation class in Figure 33.3.7 means that the topology may be induced by 
a metric function. Topologies induced by metric functions are defined in Section 37.5. It is shown in 
Theorem 38.1.5 that a metrisable topological space is necessarily of class $T_6$. It is trivial to show that a
metrisable topological space is of class $T_1$. So a metrisable topological space fulfils all of the conditions in
Figure 33.3.7, with or without the axiom of choice. 


33.4. Separability and countability classes 
33.4.1 REMARK:  Dense subsets of topological spaces. 


The definition of a dense subset of a topological space is attributed by Moore [371], page 236, to a 1927 book 
by Felix Hausdorff. A prime example is the set of rational numbers $\mathbb{Q}$ within the real numbers $\mathbb{R}$.
[Figure 33.3.7 diagram: two side-by-side panels, “ZF set theory” and “ZF+AC set theory”, each showing the
classes below from top to bottom. One arrow is labelled AC.]

  class                   axiom name               combined class
  topological space
  $T_0$                   Kolmogorov space
  $T_1$                   Kuratowski space
  $T_2$                   Hausdorff space
  $T_3$                   Vietoris axiom           $T_0+T_3$: regular space
  $T_{3\frac{1}{2}}$      Tikhonov axiom           $T_1+T_{3\frac{1}{2}}$: completely regular space
  $T_4$                   Tietze’s first axiom     $T_1+T_4$: normal space
  $T_5$                   Tietze’s second axiom    $T_1+T_5$: completely normal space
  $T_6$                   Vedenisov axiom          $T_1+T_6$: perfectly normal space
  metrisable space

Figure 33.3.7 Family tree for separation classes of topological spaces


33.4.2 DEFINITION: A dense subset of a topological space X is a set $S \in \mathbb{P}(X)$ such that $\overline{S} = X$.


33.4.3 THEOREM: Equivalent conditions for sets to be dense in a topological space. 
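The following conditions are equivalent in any topological space X.


(i) S is a dense subset of X.
(ii) $\mathrm{Int}(X \setminus S) = \emptyset$.
(iii) $\forall \Omega \in \mathrm{Top}(X) \setminus \{\emptyset\},\ \Omega \cap S \neq \emptyset$.


PROOF: To show the equivalence of (i) and (ii), let S be a subset of a topological space X. Then S is a
dense subset of X if and only if $\overline{S} = X$, which holds if and only if $X \setminus \overline{S} = \emptyset$. But $X \setminus \overline{S} = \mathrm{Int}(X \setminus S)$ by
Theorem 31.8.13 (viii). Hence S is a dense subset of X if and only if $\mathrm{Int}(X \setminus S) = \emptyset$.

To show that (ii) implies (iii), let S be a subset of a topological space X such that $\mathrm{Int}(X \setminus S) = \emptyset$. Let
$\Omega \in \mathrm{Top}(X) \setminus \{\emptyset\}$. Suppose that $\Omega \cap S = \emptyset$. Then $\Omega \subseteq X \setminus S$. So $\Omega = \mathrm{Int}(\Omega) \subseteq \mathrm{Int}(X \setminus S) = \emptyset$ by
Theorem 31.8.13 (xiii), which is a contradiction. So $\Omega \cap S \neq \emptyset$. Hence $\forall \Omega \in \mathrm{Top}(X) \setminus \{\emptyset\},\ \Omega \cap S \neq \emptyset$.

To show that (iii) implies (ii), let S be a subset of X such that $\forall \Omega \in \mathrm{Top}(X) \setminus \{\emptyset\},\ \Omega \cap S \neq \emptyset$. Let
$x \in \mathrm{Int}(X \setminus S)$. Then $\Omega \subseteq X \setminus S$ for some $\Omega \in \mathrm{Top}_x(X)$ by Theorem 31.8.13 (iii). So $\Omega \cap S = \emptyset$. But this
contradicts the assumption for S. Therefore $x \in \mathrm{Int}(X \setminus S)$ is impossible. Hence $\mathrm{Int}(X \setminus S) = \emptyset$.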
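The three equivalent conditions of Theorem 33.4.3 can be compared exhaustively on a small finite space. The following sketch assumes a hypothetical three-point space with a nested topology. The function names are illustrative, and the interior and closure are computed directly from their characterisations in terms of open sets.

```python
from itertools import combinations


def interior(s, opens):
    """Interior of s: union of all open sets contained in s."""
    return frozenset().union(*[o for o in opens if o <= s])


def closure(s, space, opens):
    """Closure of s, computed as the complement of Int(X \\ S)."""
    return space - interior(space - s, opens)


def dense_by_closure(s, space, opens):
    return closure(s, space, opens) == space               # condition (i)


def dense_by_interior(s, space, opens):
    return interior(space - s, opens) == frozenset()       # condition (ii)


def dense_by_open_sets(s, space, opens):
    return all(o & s for o in opens if o)                  # condition (iii)


X = frozenset({0, 1, 2})
opens = [frozenset(), frozenset({0}), frozenset({0, 1}), X]   # assumed example topology
subsets = [frozenset(c) for r in range(4) for c in combinations(sorted(X), r)]
```

In this topology a subset is dense exactly when it contains the point 0, because every non-empty open set contains 0. All three tests agree on each of the eight subsets.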


33.4.4 DEFINITION:
A nowhere dense subset of a topological space X is a set $S \in \mathbb{P}(X)$ such that $\mathrm{Int}(\overline{S}) = \emptyset$.


33.4.5 REMARK: Separable spaces versus separability classes. 

Very regrettably, the concepts of separable spaces and separability classes are not directly related, but are 
closely enough related to be easily confused. (This is also mentioned in Remark 33.1.3.) In fact, the dense 
subsets referred to in Definition 33.4.6 suggest the non-separability of a countable set of points because the 
complement of such a set has no interior. Thus separable spaces are poorly named. 
33.4.6 DEFINITION: A separable (topological) space is a topological space in which there exists a countable
dense subset.


33.4.7 THEOREM: Real Cartesian topological spaces are separable. 
$\mathbb{R}^n$ is a separable topological space for all $n \in \mathbb{Z}_0^+$.


PROOF: For the case $n = 0$, $\mathbb{R}^0 = \{0\}$ is separable because it is finite. So let $n \geq 1$. Then $\mathbb{Q}^n$ is a dense
subset of $\mathbb{R}^n$ by Theorem 32.6.10. But $\mathbb{Q}^n$ is countable by Theorem 15.2.6. Hence $\mathbb{R}^n$ is separable by
Definition 33.4.6.


33.4.8 REMARK: ZF choice functions for separable topologies. 

In topology-related proofs, a choice function for non-empty open sets is often required. This can be done 
in ZF set theory without any axioms of choice if the topology is separable, as indicated in Theorem 33.4.9. 
This is a kind of “choice theorem”, as opposed to a “choice axiom” which tells you almost nothing concrete 
about the choice function which is delivered. 


The choice function construction in Theorem 33.4.9 uses a kind of “dartboard procedure” for choosing points, 
with a “first cab off the rank” rule for which “dart” to pick. This is related to the “dartboard procedure” 
used in Theorem 32.7.8 for enumerating component intervals of open sets of real numbers and the “dartboard 
procedure” used in Theorem 34.8.2 for enumerating connected components of open sets in locally connected 
separable spaces. 


33.4.9 THEOREM: Topology choice function for separable topological spaces. 
Every separable topology has a choice function. In other words, if X is a separable topological space, then 
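there exists a map $\phi : \mathrm{Top}(X) \setminus \{\emptyset\} \to X$ such that $\forall \Omega \in \mathrm{Top}(X) \setminus \{\emptyset\},\ \phi(\Omega) \in \Omega$.


PROOF: Let X be a separable topological space. Then $X = \overline{S}$ for some countable subset S of X. If $S = \emptyset$,
then $X = \emptyset$ and $\mathrm{Top}(X) = \{\emptyset\}$ and $\mathrm{Top}(X) \setminus \{\emptyset\} = \emptyset$. So $\phi = \emptyset$ is a suitable choice function. So assume
that $S \neq \emptyset$. Then by Theorem 13.7.13 (iv), there exists a surjection $f : \omega \to S$. Let $\Omega \in \mathrm{Top}(X) \setminus \{\emptyset\}$.
Then $\Omega \cap S \neq \emptyset$ by Theorem 33.4.3 (iii). So $\{i \in \omega;\ f(i) \in \Omega\} \neq \emptyset$. Therefore $\min\{i \in \omega;\ f(i) \in \Omega\}$ is a
well-defined element of $\omega$ by Theorem 12.2.7. Define $\phi : \mathrm{Top}(X) \setminus \{\emptyset\} \to X$ by $\phi(\Omega) = f(\min\{i \in \omega;\ f(i) \in \Omega\})$.
Then $\phi(\Omega) \in \Omega$ for all $\Omega \in \mathrm{Top}(X) \setminus \{\emptyset\}$. Thus $\phi$ is a suitable choice function for the topology on X.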
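The “first cab off the rank” rule in the proof of Theorem 33.4.9 amounts to a linear search along a fixed enumeration of the dense set. The sketch below assumes that X is the real line, that S is the set of rational numbers, and that an open set is represented by a membership predicate. The enumeration `rationals` is one possible surjection from $\omega$ onto the rationals, not one taken from this book.

```python
from fractions import Fraction
from itertools import count
from typing import Callable, Iterator


def rationals() -> Iterator[Fraction]:
    """Enumerate all rational numbers (with repetitions), ordered by |p| + q."""
    yield Fraction(0)
    for n in count(2):              # n = |p| + q with p != 0 and q >= 1
        for q in range(1, n):
            p = n - q
            yield Fraction(p, q)
            yield Fraction(-p, q)


def choose(is_member: Callable[[Fraction], bool]) -> Fraction:
    """phi(Omega) = f(min{i; f(i) in Omega}): the first enumerated rational in Omega.

    The search terminates only if the predicate describes a non-empty open set,
    because only then is a rational point guaranteed to belong to it.
    """
    for r in rationals():
        if is_member(r):
            return r
    raise AssertionError("unreachable: the enumeration is infinite")


point = choose(lambda y: 0 < y < 1)   # a point of the open interval (0, 1)
```

The choice is deterministic because it depends only on the fixed enumeration and on the set itself, which is what distinguishes this “choice theorem” from an appeal to a choice axiom.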


33.4.10 THEOREM: Topology choice function for Cartesian topological spaces. 
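Every Cartesian topological space has a topology choice function. In other words, for all $n \in \mathbb{Z}_0^+$, there
exists a map $\phi : \mathrm{Top}(\mathbb{R}^n) \setminus \{\emptyset\} \to \mathbb{R}^n$ such that $\forall \Omega \in \mathrm{Top}(\mathbb{R}^n) \setminus \{\emptyset\},\ \phi(\Omega) \in \Omega$.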


PROOF: The assertion follows from Theorems 33.4.7 and 33.4.9. 


33.4.11 REMARK: The importance of countable open bases for topological spaces. 

Since the open base and subbase concepts in Sections 32.2 and 32.3 are analogous to the concepts of spanning 
sets or bases of linear spaces and other kinds of algebraic structures, the idea of a countable open base in 
topology is similar to the idea of a countable basis for a linear space. (See Section 22.7 for bases for linear 
spaces.) A wide range of practically useful topological spaces have a countable open base, and existence of 
such an open base makes such spaces easier to manage. 


In practice, topological definitions (such as continuity of functions) are not tested with all possible open sets 
in the relevant topologies (such as the topologies of the source and target spaces of functions). Instead a 
more manageable set of open sets is used, and the general case follows from the narrower set of particular 
cases. Typically one prefers to perform tests on at most a countable set of open sets. So it is useful to know 
whether a topology can be “spanned” at each point by a countable set of open neighbourhoods. Hence the 
definition of second countable topological spaces has practical importance. (Second countable spaces are 
called “completely separable spaces” by Hocking/Young [93], page 64; B. Mendelson [115], page 184.) 


33.4.12 DEFINITION: A first countable (topological) space is a topological space for which there exists a 
countable open base at each point. 


33.4.13 DEFINITION: A second countable (topological) space is a topological space for which there exists a 
countable open base. 
33.4.14 THEOREM: Second countable implies first countable.
Let X be a second countable topological space. Then X is first countable.


PROOF: Let X be a second countable topological space. Then there exists a function $f : N \to \mathrm{Top}(X)$
such that $N \in \omega^+$ and $\mathrm{Range}(f)$ is an open base for X. For any $x \in X$, let $B_x = \mathrm{Range}(f) \cap \mathrm{Top}_x(X)$.
Then $B_x$ is an open base at x. But $f^{-1}(B_x)$ is a countable set because it is a subset of a countable set by
Theorem 13.8.2. (The proof of this requires countable induction, but does not require an axiom of choice.)
Therefore there exists a function $f' : N' \to \mathrm{Top}_x(X)$ such that $N' \in \omega^+$ and $\mathrm{Range}(f')$ is an open base at x.
Hence X is first countable.


33.4.15 REMARK: Equivalent definition for first countable spaces using indexed open bases. 
A topological space is first countable if and only if there exists a non-increasing indexed countable open base 
at every point. This follows from Theorem 33.4.16. 


33.4.16 THEOREM: First countable spaces have non-increasing countable indexed open bases. 
Every first countable topological space has a non-increasing indexed countable open base at every point. 


PROOF: Let X be a first countable topological space. Let $x \in X$. By Definition 33.4.12, there exists a
countable open base $B \in \mathbb{P}(\mathrm{Top}_x(X))$ at x. By Theorem 13.7.13 (iv), either $B = \emptyset$ or there exists a surjective
function $f : \omega \to B$. If $B = \emptyset$, then $X = \emptyset$, and so B may be replaced by the open base $\{\emptyset\}$, which is
clearly non-empty. So there always exists a sequence $(f(i))_{i \in \omega}$ such that $\mathrm{Range}(f)$ is an open base at x.
Then the sequence $(B_i)_{i \in \omega}$ defined by $B_i = \bigcap_{j=0}^{i} f(j)$ is a non-increasing indexed countable open base at x
by Theorem 32.2.11.
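The running-intersection step in the proof of Theorem 33.4.16 can be sketched for a finite initial segment of an indexed open base. The sets below are an assumed toy example of neighbourhoods of the point 0, and `nested_base` is a hypothetical helper name.

```python
from itertools import accumulate


def nested_base(base):
    """Return [B_0, B_1, ...] with B_i = f(0) & f(1) & ... & f(i)."""
    return list(accumulate(base, lambda acc, g: acc & g))


# Assumed example: three neighbourhoods of the point 0, not nested as given.
f = [frozenset({0, 1, 2}), frozenset({0, 2, 3}), frozenset({0, 1, 3})]
B = nested_base(f)
```

Each $B_i$ is a finite intersection of open sets containing the point, so it is again an open neighbourhood of the point, and $B_i \subseteq f(i)$ ensures that the new sequence refines the original base.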


33.4.17 REMARK: Some separable topological spaces are not second countable. 

A separable topological space is not necessarily second countable, although this implication is valid in a 
metric space. (See Theorem 37.7.23.) A separable space which is not second countable is the “right half-open
interval topology” for $\mathbb{R}$, for which the set of intervals $[a, b)$ for $a, b \in \mathbb{R}$ is an open base. (For this example,
see Steen/Seebach [141], page 75; Wilansky [163], page 76. For further information on relations between
separable and second countable spaces, see Steen/Seebach [141], pages 7, 21-22, 49-50; Simmons [137],
page 100; Gaal [77], pages 120-121; Wilansky [163], pages 75-80; Willard [165], pages 108-115; Baum [54],
pages 43-49; Kasriel [100], pages 193-197; Kelley [101], pages 48-50.) 


33.4.18 REMARK: Examples of separable non-first-countable topological spaces. 
Theorem 33.4.19 gives a class of topological spaces which are separable but not first countable. The set X
can be, for example, $\mathbb{R}$ or any uncountable subset of $\mathbb{R}$.


An uncountable $\omega$-infinite set is a set which is $\omega$-infinite but not countable. (See Definition 13.7.6.) An
$\omega$-infinite set is equivalent to a Dedekind set. (See Theorem 13.10.6 (i).) Any set which has a subset which is
equinumerous to the set $\mathbb{P}(\omega)$ or $\mathbb{R}$ is uncountably $\omega$-infinite. The term “$\omega$-infinite” is used in the statement
of Theorem 33.4.19 because the existence of an infinite sequence of distinct elements is required in the proof
of separability.


Without the existence of a total order in Theorem 33.4.19, it is possible to show that the topology is not 
first countable with the assistance of a fairly weak version of the axiom of choice, or else some kind of 
choice-like assumption for the set X. However, uncountable totally ordered sets are not rare. So such arcane 
technicalities are not merited here. 


33.4.19 THEOREM: Separable and first/second countable properties for finite-complement topologies. 
Let X be an uncountable $\omega$-infinite set with the finite-complement topology. Then X is a separable space,
but if X has a total order, it is not first countable and not second countable.


PROOF: Let X be an uncountable $\omega$-infinite set. Then $\mathrm{Top}(X) = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X);\ \#(X \setminus \Omega) < \infty\}$ is the
finite-complement topology on X. (See Definition 31.11.7.) Let S be a countably infinite subset of X. Let
$\Omega \in \mathrm{Top}(X) \setminus \{\emptyset\}$. Then $\#(X \setminus \Omega) < \infty$. So $S \not\subseteq X \setminus \Omega$. So $S \cap \Omega \neq \emptyset$. Thus
$\forall \Omega \in \mathrm{Top}(X) \setminus \{\emptyset\},\ \Omega \cap S \neq \emptyset$.
Therefore S is a dense subset of X by Theorem 33.4.3 (i, iii). Hence X is separable by Definition 33.4.6.


Now assume that X has a total order “<”, and suppose that X is first countable. Let $x \in X$. Then there
exists a countable set $B \subseteq \mathrm{Top}_x(X)$ such that $\forall \Omega \in \mathrm{Top}_x(X),\ \exists U \in B,\ U \subseteq \Omega$. But X is a $T_1$ space.
(See Remark 31.11.8 and Theorem 31.11.10.) So for every $y \in X \setminus \{x\}$, there exists $V \in \mathrm{Top}_x(X)$ such
that $y \notin V$. Then there is a set $U \in B$ such that $x \in U$ and $U \subseteq V$. So for all $y \in X \setminus \{x\}$, there exists
$U \in B$ such that $y \notin U$. Therefore $\bigcap B = \{x\}$. So $X \setminus \{x\} = X \setminus \bigcap B = \bigcup_{U \in B} (X \setminus U)$. But $\#(X \setminus U) < \infty$
for all $U \in B$. So $\bigcup_{U \in B} (X \setminus U)$ is countable by Theorem 13.8.7. So X is countable, which is a contradiction.
Therefore X is not first countable. Hence X is not second countable by Theorem 33.4.14.


33.4.20 REMARK: Separable non-first-countable space examples with axioms of choice. 

Theorem 33.4.21 requires only the countable choice axiom, but since Theorem 33.4.19 is used in the proof, a 
total order is required, which CC cannot deliver. In practice, a total order is often easy to produce. To prove 
that the space is not first countable, it is sufficient to assume that all countable sequences of finite subsets 
of the set have a choice function. But this kind of condition is rarely encountered “in the wild". (This topic 
is also mentioned in Remark 13.8.8.) 


33.4.21 THEOREM [ZF+AC]: Finite-complement topology separable and first/second countable properties. 
Let X be an uncountable set with the finite-complement topology. Then X is a separable space, but it is 
not first countable and not second countable. 


PROOF: Let X be an uncountable set with the finite-complement topology. Then by the axiom of choice,
X is an uncountable $\omega$-infinite set and has a total order. So the assertions follow from Theorem 33.4.19.


33.4.22 REMARK: All second countable spaces are separable, assuming the axiom of countable choice. 
Every second countable topological space is separable if the axiom of countable choice is adopted. (The 
resulting relations between separable, second countable and first countable spaces, without and with the 
axiom of countable choice, are illustrated in Figure 33.4.1.) 


[Figure 33.4.1 diagram: two side-by-side panels, “ZF set theory” and “ZF+CC set theory”, each relating the
classes: topological space, separable space, first countable space, second countable space.]


Figure 33.4.1 Family tree for separable, first countable and second countable spaces


In the most obvious proof strategy, one chooses a single point from each open set in a countable base. The 
axiom of countable choice asserts that this can be done. (See Moore [371], pages 235-237, for the history 
and other details of the CC axiom requirement for Theorem 33.4.25.) Without the benefit of a choice axiom, 
one may observe that if a human being has constructed (or otherwise specified) the countable open base in 
question (which is more than likely because this is usually how it can be asserted that it is countable), then 
that human being can very likely specify a rule to select a unique point in each of the sets in the open base. 
The most obvious example is when the open base consists of open balls in a metric space. In that case, 
one may select the centre of each ball for example. It is difficult to think of situations where a countable 
sequence of open sets could be specified without also being able to specify a point-selection rule. Thus 
the AC-free Theorem 33.4.24 may be used in essentially all practical situations, whereas the AC-assisted 
Theorem 33.4.25 may be used as a last resort when the definition context is too abstract for a point-selection 
rule to be specified. (An IOU for a rabbit is better than no rabbit at all.) 


The sad thing about Theorem 33.4.24 is that, since an open base choice function must be provided, this is 
really exactly what is required by the definition of a separable space. So in essence, the theorem says not 
very much more than that a separable space is separable. Similarly, Theorem 33.4.25 says not much more 
than that the axiom of countable choice implies the axiom of countable choice. In fact, it is known that 
Theorem 33.4.25 is equivalent modulo ZF to the axiom of countable choice. (See Howard/Rubin [362], pages 
18, 125, form 8L.) In this sense, one could almost regard Theorem 33.4.25 as a definition of CC. 


33.4.23 DEFINITION: A second countable (topological) space with open base choice function is a topological 
space X such that there exist a countable open base B and a function $C : B \to X$ such that $\forall G \in B,\ C(G) \in G$.
33.4.24 THEOREM: Second countable spaces with open base choice function are separable.
Every second countable topological space with an open base choice function is separable.


PROOF: Let X be a second countable topological space with open base choice function. Then it follows
from Definition 33.4.23 that there is a function $f : \omega \to \mathrm{Top}(X)$ such that $B = \mathrm{Range}(f)$ is an open base
for X, and there exists a function $C : B \to X$ such that $C(G) \in G$ for all $G \in B$. Let $g = C \circ f : \omega \to X$ and
$S = \mathrm{Range}(g) = \mathrm{Range}(C)$. Then S is a countable subset of X, and $\forall G \in B,\ \exists y \in S,\ y \in G$. In other words,
$\forall G \in B,\ S \cap G \neq \emptyset$. But by Definition 32.2.3, the open base B satisfies


$$\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists G \in B,\quad x \in G \text{ and } G \subseteq \Omega.$$


So $\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists G \in B,\ (x \in G \text{ and } G \subseteq \Omega \text{ and } S \cap G \neq \emptyset)$. But $G \subseteq \Omega$ and $S \cap G \neq \emptyset$ implies
$S \cap \Omega \neq \emptyset$. So $\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ S \cap \Omega \neq \emptyset$. Therefore $\forall x \in X,\ x \in \overline{S}$ by Theorem 31.8.17 (ii). In
other words, $X \subseteq \overline{S}$, and so $X = \overline{S}$. So S is a countable dense subset of X. Hence X is separable.
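The construction of the dense set $S = \mathrm{Range}(C)$ in the proof of Theorem 33.4.24 can be illustrated with a finite open base. The base, the choice rule (least element of each base set) and the helper `unions_of` in this sketch are all assumptions made for illustration.

```python
from itertools import combinations


def unions_of(base):
    """All unions of subfamilies of a finite base, i.e. the topology which it generates."""
    opens = set()
    for r in range(len(base) + 1):
        for family in combinations(base, r):
            opens.add(frozenset().union(*family))
    return opens


# Assumed example: an open base for a topology on X = {0, 1, 2, 3}.
base = [frozenset({0}), frozenset({0, 1}), frozenset({2, 3})]
choice = {G: min(G) for G in base}        # hypothetical choice function C with C(G) in G
S = set(choice.values())                  # S = Range(C)
opens = unions_of(base)
is_dense = all(o & S for o in opens if o) # condition (iii) of Theorem 33.4.3
```

Every non-empty open set contains some base set G, and G contains the chosen point C(G), so the open set meets S. This is the same argument as in the proof, specialised to a case where all of the sets can be listed.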


33.4.25 THEOREM [ZF+CC]: All second countable spaces are separable. (Conditions apply.) 
Every second countable topological space is separable. 


PROOF: Let X be a second countable topological space. Then there is a function f : w — Top(X) such that 
$B = \mathrm{Range}(f)$ is an open base for X. By the axiom of countable choice, there exists a function $C : B \to X$
such that C(G) € G for all G € B. Therefore X is a second countable topological space with open base 
choice function by Definition 33.4.23. Hence X is separable by Theorem 33.4.24. 


33.4.26 REMARK: Second countability, local compactness, and partitions of unity. 
A second countable, locally compact Hausdorff space has partitions of unity subordinate to any open cover. 
(See Lang [23], pages 35-36; EDM2 [113], page 1618.) 


33.5. Open covers and cover-based compactness 


33.5.1 REMARK: Origin and “diversity” of the term “compact” for topological spaces. 
EDM2 [113], 273.F, suggests that the term “compact” was introduced in 1906 by René Maurice Fréchet, but 
A.E. Taylor [145], page 65, said the following in about 1965. 


The term “compact” was introduced into mathematics in 1904 by Maurice Fréchet [. . .]. 
Subsequently, in 1906, he introduced an alternative definition of the term “compact,” equivalent to 
his original definition under certain conditions. [...] according to his 1906 definition a set $S$ in $R^k$
would be called compact if every infinite subset of S has at least one point of accumulation (which 
need not be in S, however). 


He then explains that Fréchet’s 1904 definition was equivalent to requiring $\bigcap_{n=1}^{\infty} S_n \neq \emptyset$ for all non-increasing
sequences $(S_n)_{n=1}^{\infty}$ consisting of subsets of S. These analytic convergence-style definitions are clearly not the
same as the modern definition. A.E. Taylor [145], pages 65-66, then said the following. 


Fréchet's definition of compactness was used quite generally for more than thirty years, and it is still 
used by some mathematicians. However, the prevailing contemporary definition of compactness in 
America and western Europe, though closely related to Fréchet's definition, is not identical with 
it. For this reason one must be very careful in reading about compactness in mathematical books 
and periodicals. The lack of uniformity in terminology is regrettable, but it is a troublesome fact 
which must be faced. 


The following 1961 comment by Hocking/Young [93], pages 20-21, gives some idea of the extent of confusion. 


The word “compact” has been defined in so many (related) ways that one must be quite careful
in reading the literature. For a long time, a space was said to be compact if it were what we have 
called countably compact. And a subset X of a space S was said to be compact if every infinite 
subset of X had a limit point in S. [...] A new term bi-compact was introduced and used for a 
while to mean our covering compactness, but the prefix was later dropped. The terms countably 
compact and sequentially compact were coined to replace the older notion of compactness for spaces; 
for subsets, the terms conditionally compact and pre-compact are sometimes used to mean that a 
subset is compact in the older sense. 
Fortunately, the term “compact” is now much better standardised, although some relatively recent authors
do still use the original Fréchet sequential-style definition for compactness. (For example, Mattuck [114],
page 356; Shilov [135], page 91.) Various related concepts suffer from more substantial “diversity”, notably
the theorems and definitions which carry the names “Bolzano-Weierstraß”. (See Remark 35.5.3.)


33.5.2 REMARK: The distinction between open-cover compactness and sequential compactness. 

Most treatments of compactness present sequential compactness and limit-point compactness in close
association with open-cover/subcover compactness. To emphasise the differences between these styles of
compactness, which are so easily confused, sequential compactness is presented separately in Section 35.6,
after the presentation of limits and convergence of sequences in Section 35.4, and limit-point compactness
is presented in Section 35.5. Equivalences between definitions in these styles often require the axiom of
countable choice.


33.5.3 REMARK: Open covers and subcovers. 

The modern definition of compactness is expressed in terms of open covers and subcovers. (See Section 8.7 
for general set covers. See Section 10.18 for general indexed set covers. See Definition 31.7.2 for unindexed 
open covers of sets.) 


Some authors define an open cover to be a set of sets. (For example, Wilansky [163], page 19; Johnsonbaugh/ 
Pfaffenberger [97], page 144; Kasriel [100], page 117; Kelley [101], page 49. According to Kelley [101], page 1, 
and Willard [165], page 1, a “family” is synonymous with a “set”.) Other authors define an open cover to be 
an indexed family of sets. (For example, Bass [53], page 205; Simmons [137], page 111; Steen/Seebach [141], 
pages 4, 163; Hocking/Young [93], page 18; Kolmogorov/Fomin [104], page 83; Gemignani [80], page 142; 
B. Mendelson [115]; A.E. Taylor [145], pages 61, 96; Shilov [135], page 93; Gaal [77], page 89; Rudin [129]; 
Nash/Sen [30], page 17. Many of these authors use the word “collection” to mean an indexed family.) 
Some authors explicitly define an open cover to be sometimes a set and sometimes an indexed family. (For 
example, Thomson/Bruckner/Bruckner [149], pages 168, 602.) And some authors add and remove indexing 
from “collections” of sets casually without much comment. (For example, Willard [165], page 104, but many 
of the other authors listed here are equally informal in their use of indices.) 


In the definitions of open covers by some authors, it is unclear whether they intend their covers to be indexed 
or not. In principle, there is no substantial difference because a family can always be “self-indexed”. (In 
other words, one may index a set $X$ with an index set $I = X$, with a map $f : I \to X$ defined by $f : x \mapsto x$.) 
On the other hand, indexed covers permit the same set to appear multiple times with different indices, 
which can create difficulties when attempting to constrain the cardinality of the cover, and can also create 
confusion between different definitions of limit points. But the real danger is that one may unconsciously 
assume that the index set has a well-ordering, or other kind of order, whereby various kinds of traversals of 
the indexed set may be defined. (This is also mentioned in Remark 10.8.1.) 
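
The difference between a set of sets and an indexed family can be made concrete with a small sketch. The representation is an assumption made only for illustration: sets are Python `frozenset` objects, an indexed family is a `dict` from indices to sets, and the helper names `self_index` and `family_range` are hypothetical.

```python
def self_index(sets):
    """Index a set of sets by itself, using the identity map f : I -> X with I = X."""
    return {s: s for s in sets}


def family_range(family):
    """Return the range of an indexed family, i.e. the underlying set of sets."""
    return set(family.values())


# An unindexed cover: a set of sets.
cover_set = {frozenset({1, 2}), frozenset({2, 3})}

# The same cover, self-indexed.
self_indexed = self_index(cover_set)

# An indexed family may repeat the same set under different indices,
# so the index set can be larger than the range of the family.
indexed = {0: frozenset({1, 2}), 1: frozenset({2, 3}), 2: frozenset({1, 2})}
```

Here the family `indexed` has three indices but only two distinct members, which is the kind of repetition that complicates cardinality constraints on indexed covers.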


33.5.4 DEFINITION: An indexed open cover of a subset $A$ of a topological space $X \mathrel{-\!\!<} (X, T)$ is a family of 
sets $(B_i)_{i \in I}$ such that 


(i) $\forall i \in I,\ B_i \in T$, 
(ii) $A \subseteq \bigcup_{i \in I} B_i$. 


33.5.5 DEFINITION: A subcover of an indexed open cover $(B_i)_{i \in I}$ of a subset $A$ of a topological space 
$X \mathrel{-\!\!<} (X, T)$ is an indexed open cover $(C_j)_{j \in J}$ of $A$ in $X$ such that $\forall j \in J,\ \exists i \in I,\ C_j = B_i$. In other words, 
$(C_j)_{j \in J} \subseteq (B_i)_{i \in I}$. 


33.5.6 REMARK: Refinements of indexed open covers. 
Refinements of open covers are applied in Definitions 33.7.15 and 33.8.2 for paracompactness and Lebesgue 
dimension respectively. 


33.5.7 DEFINITION: A refinement of an indexed open cover $(B_i)_{i \in I}$ of a subset $A$ of a topological space 
$X \mathrel{-\!\!<} (X, T)$ is an indexed open cover $(C_j)_{j \in J}$ of $A$ in $X$ such that $\forall j \in J,\ \exists i \in I,\ C_j \subseteq B_i$. 


33.5.8 REMARK: A subcover or refinement of an open cover is required to be an open cover. 

According to Definition 10.18.3, a subcover of a cover $(B_i)_{i \in I}$ of a subset $A$ in a general set $X$ is any cover 
$(C_j)_{j \in J}$ of $A$ in $X$ which satisfies $\forall j \in J,\ \exists i \in I,\ C_j = B_i$. Definition 33.5.5 requires a subcover of an open cover to 
also be an open cover. But this follows automatically because the elements of a subcover must be open sets. 

According to Definition 10.18.4, a refinement of a cover $(B_i)_{i \in I}$ of a subset $A$ in a general set $X$ is any cover 
$(C_j)_{j \in J}$ of $A$ in $X$ which satisfies $\forall j \in J,\ \exists i \in I,\ C_j \subseteq B_i$. However, Definition 33.5.7 requires a refinement of an 
open cover to also be an open cover. This does not follow automatically for any general-set-cover refinement. 
Thus Definition 33.5.5 is essentially the same as Definition 10.18.3, whereas Definition 33.5.7 differs from 
Definition 10.18.4 by requiring the members of $(C_j)_{j \in J}$ to be open sets. 
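
The conditions for indexed open covers, subcovers and refinements can be checked mechanically on a finite topological space. The following is a minimal sketch under stated assumptions: a topology is a Python set of `frozenset` objects, an indexed family is a `dict`, and the function names are hypothetical, chosen only for this illustration.

```python
def union_of(family):
    """Union of all members of an indexed family (dict index -> frozenset)."""
    result = frozenset()
    for member in family.values():
        result |= member
    return result


def is_indexed_open_cover(family, A, topology):
    """Every member is open, and the union of the members contains A."""
    return all(B in topology for B in family.values()) and A <= union_of(family)


def is_subcover(sub, family, A, topology):
    """An open cover of A in which each member equals some member of family."""
    return is_indexed_open_cover(sub, A, topology) and all(
        any(C == B for B in family.values()) for C in sub.values()
    )


def is_refinement(ref, family, A, topology):
    """An open cover of A in which each member is included in some member of family."""
    return is_indexed_open_cover(ref, A, topology) and all(
        any(C <= B for B in family.values()) for C in ref.values()
    )


X = frozenset({1, 2, 3})
topology = {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}), X}
coarse = {frozenset(), frozenset({1, 2}), X}  # a coarser topology on X

A = frozenset({1, 2})
cover = {"a": frozenset({1, 2}), "b": X}
sub = {0: frozenset({1, 2})}
ref = {0: frozenset({1}), 1: frozenset({2})}
```

With these inputs, `ref` is a refinement of `cover` in `topology` but not a subcover. In the coarser topology, `ref` still satisfies the inclusion condition for general set covers, but it fails to be a refinement of an open cover because its members are not open there.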


33.5.9 REMARK: The Heine-Borel condition for compactness. 

The characterisation of compactness in Definition 33.5.10 is sometimes called the Heine-Borel condition or 
Heine-Borel compactness. (See for example Ahlfors [45], page 60.) However, Heine's 1872 paper did not 
present any compactness concept at all, although he did prove that continuous functions on closed bounded 
intervals are uniformly continuous. His method of proof may be interpreted retrospectively as implying that 
any open cover has a finite sub-cover, but he did not mention such a concept. (See Heine [181], page 188.) 


The notation $\mathbb{P}^\infty(C)$ in Definition 33.5.10 means the set of finite subsets of $C$. (See Notation 13.12.5.) Two 
of the most useful basic properties of compact sets are given in Theorems 33.5.13 and 33.5.15. 


33.5.10 DEFINITION: A compact set in a topological space (X, Tx) is a subset K of X such that every 
open cover of K has a finite subcover. In other words, 


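$\forall C \in \mathbb{P}(T_X),\ \big( K \subseteq \bigcup C \ \Rightarrow\ \exists C' \in \mathbb{P}^\infty(C),\ K \subseteq \bigcup C' \big).$ 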


A compact topological space is a topological space (X, Tx) such that X is a compact set in X. 


33.5.11 REMARK: Compactness is unaffected by substituting relative topology for global topology. 

One generally thinks of compactness as being essentially equivalent to a combination of closedness and 
boundedness in Cartesian spaces and metric spaces. The closedness of a set depends very much on whether 
the global topology or some relative topology is used. However, in the case of set-compactness, the property 
is unaffected by substituting a relative topology for the global topology. A property which has this 
meta-property is said to be “hereditary”. Thus set-compactness is a “hereditary property”. 


A possibly noteworthy aspect of the proof of Theorem 33.5.12 is the use of an “axiom of finite choice” to 
guarantee the existence of a finite cover in the global topology corresponding to a finite cover in the relative 
topology on $S$. There could be infinitely many open sets in the full cover in the global topology whose 
intersection with $S$ equals a single member of a finite cover for $S$. Therefore one must “choose” from amongst 
such global open sets. Luckily the “axiom of finite choice” is valid in Zermelo-Fraenkel set theory without 
any axiom of choice. (By contrast, in the proof of the inverse heredity property for the Lindelöf property in 
Theorem 33.7.10, the axiom of countable choice is invoked.) 


33.5.12 THEOREM: Set-compactness is hereditary and inverse hereditary. 
Let $X$ be a topological space. Let $S \subseteq X$ and $K \subseteq S$. Then $K$ is compact in the relative topology on $S$ if 
and only if $K$ is compact in the topology on $X$. 


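PROOF: Let $X$ be a topological space. Let $S \subseteq X$ and $K \subseteq S$. Then by Definition 33.5.10, $K$ is compact 
with respect to the relative topology on $S$ if and only if 


$\forall \tilde{C} \in \mathbb{P}(\mathrm{Top}(S)),\ \big( K \subseteq \bigcup \tilde{C} \ \Rightarrow\ \exists \tilde{C}' \in \mathbb{P}^\infty(\tilde{C}),\ K \subseteq \bigcup \tilde{C}' \big),$ (33.5.1) 


where $\mathrm{Top}(S) = \{\Omega \cap S;\ \Omega \in \mathrm{Top}(X)\}$ by Definition 31.6.2. Let $K$ be compact in the relative topology on $S$. 
Suppose that $C \in \mathbb{P}(\mathrm{Top}(X))$ and $K \subseteq \bigcup C$. Let $\tilde{C} = \{\Omega \cap S;\ \Omega \in C\}$. Then $\tilde{C} \in \mathbb{P}(\mathrm{Top}(S))$ and $K \subseteq \bigcup \tilde{C}$. 
Therefore $K \subseteq \bigcup \tilde{C}'$ for some $\tilde{C}' \in \mathbb{P}^\infty(\tilde{C})$ by line (33.5.1). Let $I = \tilde{C}'$ and define the set-family $(S_i)_{i \in I}$ by 
$S_i = \{\Omega \in C;\ \Omega \cap S = i\}$ for $i \in I$. Then $I$ is a finite set and $S_i \ne \emptyset$ for all $i \in I$. Therefore $\times_{i \in I} S_i \ne \emptyset$ by 
Theorem 13.7.17. (This is a kind of “axiom of finite choice”.) So there exists a function $f : \tilde{C}' \to C$ such 
that $i \subseteq f(i)$ for all $i \in \tilde{C}'$. Let $C' = \mathrm{Range}(f)$. Then $C' \in \mathbb{P}^\infty(C)$ and $K \subseteq \bigcup C'$. Hence $K$ is compact in 
the topology on $X$ by Definition 33.5.10. 

For the converse, let $K$ be compact in the topology on $X$. Suppose that $\tilde{C} \in \mathbb{P}(\mathrm{Top}(S))$ and $K \subseteq \bigcup \tilde{C}$. Let 
$C = \{\Omega \in \mathrm{Top}(X);\ \Omega \cap S \in \tilde{C}\}$. Then $C \in \mathbb{P}(\mathrm{Top}(X))$ and $K \subseteq \bigcup C$. So $K \subseteq \bigcup C'$ for some $C' \in \mathbb{P}^\infty(C)$ 
by Definition 33.5.10. Let $\tilde{C}' = \{\Omega \cap S;\ \Omega \in C'\}$. Then $\tilde{C}' \in \mathbb{P}^\infty(\tilde{C})$ and $K \subseteq \bigcup \tilde{C}'$. Therefore $K$ satisfies 
line (33.5.1). Hence $K$ is compact in the relative topology on $S$ by Definition 33.5.10. 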
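
The finite-choice step in the first half of this proof can be illustrated by a small sketch. The assumptions are that sets are Python `frozenset` objects, that the space is finite, and that the function name `lift_finite_cover` and the tie-breaking rule are hypothetical. For each member of a finite relative cover, the code forms the non-empty set of global open sets which restrict to that member, and then makes one explicit choice from it.

```python
def lift_finite_cover(rel_cover, global_cover, S):
    """For each member i of rel_cover, choose one set in global_cover whose
    intersection with S equals i, and return the set of chosen sets."""
    lifted = set()
    for i in rel_cover:
        # The set S_i of all candidates in the global cover which restrict to i.
        candidates = [omega for omega in global_cover if omega & S == i]
        # An explicit deterministic choice: smallest size, then smallest elements.
        lifted.add(min(candidates, key=lambda omega: (len(omega), sorted(omega))))
    return lifted


# Example in the discrete topology on {1, 2, 3, 4}, where every subset is open.
S = frozenset({1, 3})
K = S
global_cover = {
    frozenset({1, 2}),
    frozenset({1, 4}),
    frozenset({3}),
    frozenset({3, 4}),
}

# The induced cover in the relative topology on S.
rel_cover = {omega & S for omega in global_cover}

lifted = lift_finite_cover(rel_cover, global_cover, S)
covered = frozenset().union(*lifted)
```

Two global sets restrict to `{1}` and two restrict to `{3}`, so a choice is required for each member of the relative cover. Because the relative cover is finite, the choice is an ordinary finite construction, which is the point made in Remark 33.5.11.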


33.5.13 THEOREM: Closed subsets of compact sets are compact. 
If $F$ is a closed subset of a compact set $K$, then $F$ is compact. 


PROOF: Let $C$ be an open cover for a closed subset $F$ of a compact set $K$ in a topological space $(X, T)$. 
Define $C' = C \cup \{X \setminus F\}$. Then $C'$ is an open cover of $K$. So $K$ has a finite open subcover $C_1$ of $C'$. 
Define $C_2 = C \cap C_1$. Then $C_2$ is a finite subset of $C$. Since $C_1 \subseteq C \cup \{X \setminus F\}$, $C_2$ must equal either $C_1$ or 
$C_1 \setminus \{X \setminus F\}$. But $C_1$ covers $F$, and $C_1 \setminus \{X \setminus F\}$ also covers $F$ because $(X \setminus F) \cap F = \emptyset$. So $C_2 \subseteq C$ is a 
finite subcover of $F$. The theorem follows. 


33.5.14 THEOREM: All compact subsets of a Hausdorff space are closed. 
Let $X$ be a Hausdorff space. Let $K$ be a compact subset of $X$. Then $K$ is a closed subset of $X$. 


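PROOF: Let $X$ be a Hausdorff space. Let $K$ be a compact subset of $X$. Let $x \in X \setminus K$. Then for each $y \in K$, 
there are disjoint sets $\Omega, G \in \mathrm{Top}(X)$ such that $y \in \Omega$ and $x \in G$. Therefore the collection of open sets 


$C = \bigcup_{y \in K} \{\Omega \in \mathrm{Top}_y(X);\ \exists G \in \mathrm{Top}_x(X),\ \Omega \cap G = \emptyset\}$ 
$\phantom{C} = \{\Omega \in \mathrm{Top}(X);\ \exists y \in K,\ (y \in \Omega \text{ and } \exists G \in \mathrm{Top}_x(X),\ \Omega \cap G = \emptyset)\}$ 


satisfies $K \subseteq \bigcup C$ and $x \notin \bigcup C$. Since $K$ is compact, there is a finite subset $C'$ of $C$ such that $K \subseteq \bigcup C'$. For 
each $\Omega \in C'$, there is a set $G_\Omega \in \mathrm{Top}_x(X)$ such that $\Omega \cap G_\Omega = \emptyset$. Let $H = \bigcap_{\Omega \in C'} G_\Omega$. Then $H \in \mathrm{Top}_x(X)$ 
and $K \cap H = \emptyset$. So $H \subseteq X \setminus K$. Therefore $X \setminus K$ is open. Hence $K$ is closed. 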


33.5.15 THEOREM: Continuous images of compact sets are compact. 
If $X$ and $Y$ are topological spaces, $f : X \to Y$ is continuous and $A$ is a compact subset of $X$, then $f(A)$ is 
compact. In other words, the image of a compact set under a continuous map is compact. 


PROOF: Let $X$ and $Y$ be topological spaces, and let $f : X \to Y$ be continuous. Suppose that $A$ is a 
compact subset of $X$. Let $(B_i)_{i \in I}$ be an open cover of $f(A)$. Then $f(A) \subseteq \bigcup_{i \in I} B_i$. 
So $f^{-1}(f(A)) \subseteq f^{-1}(\bigcup_{i \in I} B_i)$ by Theorem 10.6.10 (ii). So $A \subseteq \bigcup_{i \in I} f^{-1}(B_i)$ by Theorem 10.7.1 (ii) and 
Theorem 10.8.18 (iii). So $C = (C_i)_{i \in I} = (f^{-1}(B_i))_{i \in I}$ is an open cover for $A$ in $X$. Therefore $A$ has a subcover 
$(C_j)_{j \in J}$ for some finite subset $J$ of $I$ because $A$ is compact. So $A \subseteq \bigcup_{j \in J} C_j = \bigcup_{j \in J} f^{-1}(B_j)$. Therefore 
$f(A) \subseteq f(\bigcup_{j \in J} f^{-1}(B_j)) = \bigcup_{j \in J} f(f^{-1}(B_j)) \subseteq \bigcup_{j \in J} B_j$ by Theorem 10.8.18 ( ) and Theorem 10.7.1 (i). So 
$(B_j)_{j \in J}$ is a finite open cover for $f(A)$. But $(B_j)_{j \in J}$ is a subcover of $(B_i)_{i \in I}$. So every open cover of $f(A)$ 
has a finite subcover. Hence $f(A)$ is compact. 


33.5.16 REMARK: Tikhonov’s theorem for infinite products of spaces requires the axiom of choice. 
Tikhonov’s theorem states that the product of an arbitrary set of compact topological spaces is a compact 
topological space. (See Definition 32.12.2 for products of topological spaces.) This theorem requires the 
axiom of choice. Therefore it is not presented here for infinite products of spaces. Theorem 33.5.17 is a 
very limited form of Tikhonov’s theorem for the product of two spaces. (See Gaal [77], pages 144-145, for a 
similar proof.) 


33.5.17 THEOREM: Tikhonov’s theorem for the product of two topological spaces. 
Let $X_1$ and $X_2$ be compact topological spaces. Then $X_1 \times X_2$ is a compact topological space. 


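PROOF: Let $X_1$ and $X_2$ be compact topological spaces. Let $X = X_1 \times X_2$. If $X_1 = \emptyset$ or $X_2 = \emptyset$, then 
$X = \emptyset$ by Theorem 9.4.6 (i), and so $X$ is compact. So assume that $X_1 \ne \emptyset$ and $X_2 \ne \emptyset$. 


Let $C$ be an open cover for $X = X_1 \times X_2$. Define $C_1 : X \to \mathbb{P}(\mathrm{Top}(X_1))$ by 


$\forall (x_1, x_2) \in X,\ C_1(x_1, x_2) = \{G_1 \in \mathrm{Top}_{x_1}(X_1);\ \exists G_2 \in \mathrm{Top}_{x_2}(X_2),\ \exists \Omega \in C,\ G_1 \times G_2 \subseteq \Omega\},$ 


and define $C_1 : X_2 \to \mathbb{P}(\mathrm{Top}(X_1))$ by 


$\forall x_2 \in X_2,\ C_1(x_2) = \bigcup_{x_1 \in X_1} C_1(x_1, x_2)$ 
$\qquad = \{G_1 \in \mathrm{Top}(X_1);\ \exists x_1 \in X_1,\ \exists G_2 \in \mathrm{Top}(X_2),\ \exists \Omega \in C,\ (x_1, x_2) \in G_1 \times G_2 \subseteq \Omega\}$ 
$\qquad = \{G_1 \in \mathrm{Top}(X_1) \setminus \{\emptyset\};\ \exists G_2 \in \mathrm{Top}_{x_2}(X_2),\ \exists \Omega \in C,\ G_1 \times G_2 \subseteq \Omega\}.$ 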
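
Let $x_2 \in X_2$. To show that $C_1(x_2)$ covers $X_1$, let $x_1 \in X_1$. Then $(x_1, x_2) \in \Omega$ for some $\Omega \in C$. So 
$(x_1, x_2) \in G_1 \times G_2 \subseteq \Omega$ for some $G_1 \in \mathrm{Top}_{x_1}(X_1)$ and $G_2 \in \mathrm{Top}_{x_2}(X_2)$ by Theorem 32.9.6 (iii). Therefore 
$x_1 \in G_1$ for some $G_1 \in C_1(x_2)$. So $x_1 \in \bigcup C_1(x_2)$. Thus $X_1 \subseteq \bigcup C_1(x_2)$. The compactness of $X_1$ then 
implies that there is a finite subset $D_1$ of $C_1(x_2)$ such that $X_1 \subseteq \bigcup D_1$. Then $D_1 \ne \emptyset$ because $X_1 \ne \emptyset$. So 


$\forall x_2 \in X_2,\ \exists D_1 \in \mathbb{P}^\infty_1(C_1(x_2)),\ X_1 \subseteq \bigcup D_1,$ (33.5.2) 


where $\mathbb{P}^\infty_1(X)$ denotes the set of non-empty finite subsets of any set $X$. (See Notation 13.12.5.) 
Define $E_1 : X_2 \to \mathbb{P}(\mathbb{P}^\infty_1(\mathrm{Top}(X_1)))$ by 


$\forall x_2 \in X_2,\ E_1(x_2) = \{D_1 \in \mathbb{P}^\infty_1(C_1(x_2));\ X_1 \subseteq \bigcup D_1\}.$ 


(Note that $\mathbb{P}^\infty_1(C_1(x_2)) \subseteq \mathbb{P}^\infty_1(\mathrm{Top}(X_1))$ for all $x_2 \in X_2$ by Theorem 13.12.8 (v) because $C_1(x_2) \subseteq \mathrm{Top}(X_1)$. 
So $E_1(x_2) \in \mathbb{P}(\mathbb{P}^\infty_1(\mathrm{Top}(X_1)))$.) Then $\forall x_2 \in X_2,\ E_1(x_2) \ne \emptyset$ by line (33.5.2), and by the definition of $C_1$, 


$\forall x_2 \in X_2,\ \forall D_1 \in E_1(x_2),\ \forall G_1 \in D_1,$ 
$\qquad G_1 \in \mathrm{Top}(X_1) \setminus \{\emptyset\}$ and $\exists G_2 \in \mathrm{Top}_{x_2}(X_2),\ \exists \Omega \in C,\ G_1 \times G_2 \subseteq \Omega.$ (33.5.3) 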


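Define $F(x_2, D_1) \in \mathbb{P}(\mathrm{Top}_{x_2}(X_2)^{D_1})$ for $x_2 \in X_2$ and $D_1 \in E_1(x_2)$ by 


$\forall x_2 \in X_2,\ \forall D_1 \in E_1(x_2),$ 
$\qquad F(x_2, D_1) = \{f : D_1 \to \mathrm{Top}_{x_2}(X_2);\ \forall G_1 \in D_1,\ \exists \Omega \in C,\ G_1 \times f(G_1) \subseteq \Omega\}.$ (33.5.4) 


Then $F(x_2, D_1)$ is non-empty for all $x_2 \in X_2$ and $D_1 \in E_1(x_2)$ by mathematical induction by line (33.5.3). 
(No axiom of choice is required to prove existence of these choice functions because $D_1$ is finite.) Define 
$\phi : \bigcup_{x_2 \in X_2} \bigcup_{D_1 \in E_1(x_2)} F(x_2, D_1) \to \mathbb{P}(X_2)$ by 


$\forall x_2 \in X_2,\ \forall D_1 \in E_1(x_2),\ \forall f \in F(x_2, D_1),$ 
$\qquad \phi(f) = \bigcap_{G_1 \in D_1} f(G_1).$ 


Then $\phi$ is well defined because $D_1 \ne \emptyset$, and $\phi(f) \in \mathrm{Top}_{x_2}(X_2)$ by Theorem 31.3.7 for all $f \in F(x_2, D_1)$, for 
all $x_2 \in X_2$ and $D_1 \in E_1(x_2)$. Define $H \subseteq \mathrm{Top}(X_2)$ by $H = \mathrm{Range}(\phi)$. In other words, 


$H = \{\phi(f);\ \exists x_2 \in X_2,\ \exists D_1 \in E_1(x_2),\ f \in F(x_2, D_1)\}$ 
$\phantom{H} = \{\phi(f);\ f \in \bigcup_{x_2 \in X_2} \bigcup_{D_1 \in E_1(x_2)} F(x_2, D_1)\}.$ 


Then $X_2 \subseteq \bigcup H$. So $H$ has a finite subcover $H'$ for $X_2$ because $X_2$ is compact. 


Let $J \in H'$. Then $J = \phi(f) = \bigcap_{G_1 \in D_1} f(G_1)$ for some $f \in F(x_2, D_1)$, for some $x_2 \in X_2$ and $D_1 \in E_1(x_2)$. 
So $\exists x_2 \in X_2,\ \exists D_1 \in E_1(x_2),\ \exists f \in F(x_2, D_1),\ \forall G_1 \in D_1,\ \exists \Omega \in C,\ G_1 \times J \subseteq \Omega$ by line (33.5.4). Since $D_1$ is 
finite, the quantifiers “$\forall G_1 \in D_1,\ \exists \Omega \in C$” may be reversed by way of a finite choice function. (See Remarks 
10.11.11 and 45.2.1 for “quantifier reversal” using explicit choice functions.) By the induction principle, 
there is a function $\Omega : D_1 \to C$ such that $\forall G_1 \in D_1,\ \Omega(G_1) \in C$. Thus the quantifiers “$\forall G_1 \in D_1,\ \exists \Omega \in C$” 
may be replaced with “$\exists \Omega : D_1 \to C,\ \forall G_1 \in D_1$”, where $\Omega$ depends on $J$, as follows. 


$\forall J \in H',\ \exists x_2 \in X_2,\ \exists D_1 \in E_1(x_2),\ \exists f \in F(x_2, D_1),\ \exists \Omega_J : D_1 \to C,\ \forall G_1 \in D_1,$ 
$\qquad G_1 \times J \subseteq \Omega_J(G_1).$ (33.5.5) 


One choice of the function $\Omega_J$ must be made for each $J \in H'$, but $H'$ is finite. So by the induction 
principle, there is a function $J \mapsto \Omega_J$ on $H'$, whose values are $C$-valued functions on sets $D_1 \in \bigcup_{x_2 \in X_2} \mathbb{P}^\infty_1(C_1(x_2))$, 
which satisfies line (33.5.5). Let 
$C' = \bigcup_{J \in H'} \mathrm{Range}(\Omega_J)$. Then $C'$ is a finite open subcover of $C$. Hence $X$ is compact. 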


33.5.18 THEOREM: Tikhonov’s theorem for a finite product of topological spaces. 
Let $(X_i)_{i \in I}$ be a finite sequence of compact topological spaces. Then $\times_{i \in I} X_i$ is a compact topological space. 


PROOF: The assertion follows by induction from Theorem 33.5.17. 


33.5.19 REMARK: The compact-open topology for sets of continuous functions. 
The set of continuous functions from X to Y is denoted as C(X, Y) in Notation 31.12.11. Definition 33.5.20 
defines a stronger topology on C(X, Y) than the pointwise convergence topology in Definition 35.3.27. (For 
further details, see Willard [165], pages 282-287; EDM2 [113], pages 1648-1649.) 


33.5.20 DEFINITION: The compact-open topology on the set of functions C(X,Y) for topological spaces 
$X$ and $Y$ is the topology generated by $\{G_{K,U};\ K$ is a compact subset of $X$ and $U \in \mathrm{Top}(Y)\}$ on $C(X, Y)$, 
where $G_{K,U} = \{f : X \to Y;\ f(K) \subseteq U\}$ for all compact $K \subseteq X$ and open $U \subseteq Y$. 


33.6. Local compactness 


33.6.1 REMARK: Compact neighbourhoods and local compactness. 

The compact neighbourhoods introduced in Definition 33.6.2 are analogous to the closed neighbourhoods 
in Definition 31.8.26. (See also Definition 31.3.11 for open neighbourhoods.) Compact neighbourhoods are 
relevant to the locally compact spaces and sets in Definitions 33.6.3 and 33.6.4. 


33.6.2 DEFINITION: A compact neighbourhood of a point x in a topological space X is any compact set 
$K \in \mathbb{P}(X)$ which satisfies $\exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq K$. 


33.6.3 DEFINITION: A locally compact topology on a set X is a topology on X such that every point of X 
has a compact neighbourhood. That is, 


$\forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists K \in \mathbb{P}(X),\ \Omega \subseteq K$ and $K$ is compact. 


A locally compact (topological) space is a topological space for which the topology is locally compact. 


33.6.4 DEFINITION: A locally compact set in a topological space $X$ is a set $S \in \mathbb{P}(X)$ such that every point 
of $S$ has a compact neighbourhood in the relative topology on $S$. That is, 


$\forall x \in S,\ \exists \Omega \in \mathrm{Top}_x(S),\ \exists K \in \mathbb{P}(S),\ \Omega \subseteq K$ and $K$ is compact. (33.6.1) 


33.6.5 REMARK: Compactness implies local compactness. 

If X is a compact topological space, then X is locally compact, but the converse implication is not valid in 
general. Note that in the case of locally compact sets, whether or not a set is compact in Definition 33.6.4 
and Theorem 33.6.6 (ii) is independent of whether the global or relative topology is used to test compactness. 
(See Theorem 33.5.12.) Note also that line (33.6.1) in Definition 33.6.4 is equivalent to line (33.6.2). 


$\forall x \in S,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists K \in \mathbb{P}(S),\ \Omega \cap S \subseteq K$ and $K$ is compact. (33.6.2) 


33.6.6 THEOREM: Compactness implies local compactness. 
Let X be a topological space. 
(i) If X is compact, then X is locally compact. 
(ii) If $S \in \mathbb{P}(X)$ is a compact set in $X$, then $S$ is a locally compact set in $X$. 


PROOF: For part (i), let $X$ be a compact topological space. Let $x \in X$. Let $\Omega = X$. Then $\Omega \in \mathrm{Top}_x(X)$ 
and $\Omega$ is compact. Hence $X$ is locally compact by Definition 33.6.3. 

For part (ii), let $X$ be a topological space. Let $S$ be a compact subset of $X$. Let $x \in S$. Let $\Omega = K = S$. 
Then $\Omega \in \mathrm{Top}_x(S)$, $K \in \mathbb{P}(S)$ and $\Omega \subseteq K$, and $K$ is compact. Hence $S$ is a locally compact set in $X$ by 
Definition 33.6.4. 


33.6.7 REMARK: Relations of compact neighbourhoods to closed and open neighbourhoods. 

A compact neighbourhood is not necessarily the same thing as a closed neighbourhood which happens to be 
compact. However, every closed neighbourhood which happens to be compact is a compact neighbourhood, 
and in a Hausdorff space, every compact neighbourhood is a closed neighbourhood. Hence in a Hausdorff 
space $X$, a set $S \in \mathbb{P}(X)$ is a compact neighbourhood if and only if it is a closed neighbourhood which is a 
compact set. 


A compact neighbourhood is not necessarily the same thing as the closure of an open neighbourhood which 
happens to be compact, even in a Hausdorff space. As mentioned in Remark 31.8.25, the closure of an open 
neighbourhood must be a closed neighbourhood, but a closed neighbourhood is not necessarily equal to the 
closure of some open neighbourhood. 


Theorem 33.6.8 shows that in a Hausdorff space, the existence of a compact neighbourhood is equivalent to 
the existence of an open neighbourhood whose closure is compact. The equivalence fails if the space is not 
Hausdorff. (For counterexamples, see Steen/Seebach [141], page 20 and examples 9, 10, 50, 52 and 57.) A 
space or set where every point has an open neighbourhood whose closure is compact is called a “strongly 
locally compact" space or set, as in Definitions 33.6.10 and 33.6.11. 


33.6.8 THEOREM: Some basic properties of compact neighbourhoods. 
Let $X$ be a topological space. Let $x \in X$. 


(i) If $\overline{\Omega}$ is compact for some $\Omega \in \mathrm{Top}_x(X)$, then $x$ has a compact neighbourhood. 
(ii) If $x$ has a compact neighbourhood and $X$ is a Hausdorff space, then $\overline{\Omega}$ is compact for some $\Omega \in \mathrm{Top}_x(X)$. 
(iii) If $X$ is Hausdorff, then $x$ has a compact neighbourhood if and only if $\exists \Omega \in \mathrm{Top}_x(X)$, $\overline{\Omega}$ is compact. 


PROOF: For part (i), suppose that $\overline{\Omega}$ is compact for some $\Omega \in \mathrm{Top}_x(X)$. Then $\overline{\Omega}$ is a compact neighbourhood 
of $x$ because $\Omega \subseteq \overline{\Omega}$. 

For part (ii), suppose that $X$ is a Hausdorff space and $x$ has a compact neighbourhood $K$. Then by 
Definition 33.6.2, $K$ is compact and there is an open neighbourhood $\Omega \in \mathrm{Top}_x(X)$ such that $\Omega \subseteq K$. By 
Theorem 33.5.14, $K$ is a closed subset of $X$. So $\overline{\Omega} \subseteq \overline{K} = K$ by Theorems 31.8.13 (xiv) and 31.8.14 (v). 
Therefore $\overline{\Omega}$ is compact by Theorem 33.5.13. 

Part (iii) follows from parts (i) and (ii). 


33.6.9 REMARK:  Topology-dependent closure of open neighbourhoods. 

In Definition 33.6.10, the closure of $\Omega$ is with respect to the global topology $\mathrm{Top}(X)$. In Definition 33.6.11, 
the closure of $\Omega$ is with respect to the relative topology $\mathrm{Top}(S)$. The choice of topology is indicated by a 
subscript as in Notation 31.8.21. By Theorem 31.8.23 (ii), $\mathrm{Clos}_{\mathrm{Top}(S)}(\Omega) = S \cap \bigcap \{F \in \overline{\mathrm{Top}}(X);\ \Omega \subseteq F \cap S\}$. 


33.6.10 DEFINITION: A strongly locally compact topology on a set X is a topology on X such that every 
point of X has an open neighbourhood whose closure is compact. That is, 


$\forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \overline{\Omega}$ is compact. 


A strongly locally compact (topological) space is a topological space with a strongly locally compact topology. 


33.6.11 DEFINITION: A strongly locally compact set in a topological space $X$ is a set $S \in \mathbb{P}(X)$ such that 
every point of $S$ has an open neighbourhood in the relative topology on $S$ whose closure in that topology is compact. That is, 


$\forall x \in S,\ \exists \Omega \in \mathrm{Top}_x(S),\ \mathrm{Clos}_{\mathrm{Top}(S)}(\Omega)$ is compact. 


33.6.12 THEOREM: Some basic properties of strong local compactness. 
Let $X$ be a topological space. Let $S \in \mathbb{P}(X)$. 


(i) If S is strongly locally compact, then S is locally compact. 
(ii) If S is locally compact and X is a Hausdorff space, then S is strongly locally compact. 
(iii) If X is Hausdorff, then S is locally compact if and only if S is strongly locally compact. 


PROOF: Part (i) follows from Definition 33.6.11 and Theorem 33.6.8 (i). 
Part (ii) follows from Definition 33.6.11 and Theorem 33.6.8 (ii). 


Part (iii) follows from parts (i) and (ii). 


33.6.13 EXAMPLE: Topological spaces which are locally compact but not strongly locally compact. 

Let $X$ be a set and $p \in X$. Let $X$ have the particular-point topology for the particular point $p$. (See 
Definition 31.11.13 for particular-point topologies.) In other words, $\mathrm{Top}(X) = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X);\ p \in \Omega\} = 
\{\emptyset\} \cup (\mathbb{P}(X) \setminus \mathbb{P}(X \setminus \{p\}))$. Then $\overline{\mathrm{Top}}(X) = \{X\} \cup \mathbb{P}(X \setminus \{p\})$. 

For any $x \in X$, the set $\{x, p\}$ is both open and compact, but not closed unless $X = \{x, p\}$. Therefore every 
point in $X$ has a compact neighbourhood, and so $X$ is locally compact. 


Let $\Omega \in \mathrm{Top}(X)$. If $p \in \Omega$, then $\overline{\Omega} = X$, but $X$ is compact if and only if $X$ is finite. (Consider the open 
cover of an infinite set $S$ by pairs $\{x, p\}$ for $x \in S$.) So $p$ has no open neighbourhood for which the closure 
is compact. Therefore $X$ is not strongly locally compact if $X$ is infinite. 
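
The closure computations in this example can be checked on a finite set. The sketch below assumes a representation of sets as Python `frozenset` objects, and the function names are hypothetical. Since every finite space is compact, the code does not exhibit the failure of strong local compactness. It only confirms the form of the closed sets and the fact that the closure of any open set containing $p$ is the whole space.

```python
from itertools import combinations


def powerset(X):
    """All subsets of the finite set X, as a list of frozensets."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def particular_point_topology(X, p):
    """The empty set together with all subsets of X which contain p."""
    return {frozenset()} | {s for s in powerset(X) if p in s}


def closure(A, X, topology):
    """Intersection of all closed sets (complements of open sets) which contain A."""
    result = X
    for omega in topology:
        closed = X - omega
        if A <= closed:
            result = result & closed
    return result


X = frozenset({0, 1, 2, 3})
p = 0
top = particular_point_topology(X, p)
closed_sets = {X - omega for omega in top}

pair = frozenset({0, 1})           # the open set {x, p} with x = 1
pair_closure = closure(pair, X, top)
point_closure = closure(frozenset({1}), X, top)
```

In this finite model, `pair` is open but not closed, and its closure is all of `X`, whereas the singleton `{1}` is closed because its complement contains `p`.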


33.7. Other cover-based compactness classes 


33.7.1 REMARK: Countable compactness and sequential compactness. 

Although Definition 33.7.2 for countable compactness is expressed in terms of the existence of open covers 
with lower cardinality, as in the case of Definition 33.5.10 for ordinary compactness, countable compactness 
has more in common with the limit-based concepts in Sections 35.5 and 35.6 which are expressed in terms 
of the existence of limits of infinite sequences or limit points of infinite sets. 


33.7.2 DEFINITION: A countably compact set in a topological space X is a subset S of X such that 


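$\forall (\Omega_i)_{i \in \omega} \in \mathrm{Top}(X)^\omega,\ \big( S \subseteq \bigcup_{i \in \omega} \Omega_i \ \Rightarrow\ \exists I \in \mathbb{P}^\infty(\omega),\ S \subseteq \bigcup_{i \in I} \Omega_i \big).$ 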


In other words, every countable cover of S has a finite subcover. 


33.7.3 THEOREM: Compactness implies countable compactness. 
Every compact set in a topological space is countably compact in that space. 


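PROOF: Let $X$ be a topological space. Let $S \in \mathbb{P}(X)$ be a compact set in $X$. Let $(\Omega_i)_{i \in \omega} \in \mathrm{Top}(X)^\omega$ be 
an infinite family of open sets in $X$ such that $S \subseteq \bigcup_{i \in \omega} \Omega_i$. Let $C = \{\Omega_i;\ i \in \omega\}$. Then $C$ is an open cover 
for $S$. So by Definition 33.5.10, there is a finite subset $C'$ of $C$ such that $S \subseteq \bigcup C'$. For each $G \in C'$, let 
$i_G = \min\{i \in \omega;\ \Omega_i = G\}$, and let $I = \{i_G;\ G \in C'\}$. (The set $\{i \in \omega;\ \Omega_i \in C'\}$ could be infinite if the family 
has repeated members.) Then $I$ is a finite set of integers, and $S \subseteq \bigcup_{i \in I} \Omega_i$. Hence $S$ is countably compact by Definition 33.7.2. 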


33.7.4 REMARK: A limit-point property of countably compact sets. 

Theorem 33.7.5, and its countable choice doppelganger, Theorem 33.7.6 (i), are relevant to the relations 
between the various compactness classes in Sections 35.5 and 35.6. The relations between countable 
compactness and the existence of $\infty$-limit points for infinite or $\omega$-infinite sets are illustrated in Figure 33.7.1. 


[Figure 33.7.1: implication diagram, drawn once for ZF set theory and once for ZF+CC set theory, relating “countably compact” to “infinite set ⇒ ∞-limit point” and “ω-infinite set ⇒ ∞-limit point”.] 


Figure 33.7.1 Implications for countable compactness and ∞-limit point existence 


33.7.5 THEOREM: All $\omega$-infinite subsets of countably compact sets have an $\infty$-limit point. 
In a topological space, every $\omega$-infinite subset of a countably compact set has an $\infty$-limit point in the set. 


PROOF: Let $K$ be a countably compact subset of a topological space $X$. Let $S$ be an $\omega$-infinite subset of $K$ 
which does not have an $\infty$-limit point in $K$. Then by Definition 31.10.17 for an $\infty$-limit point, for all $z \in K$, 
for some $\Omega \in \mathrm{Top}_z(X)$, the set $\Omega \cap (S \setminus \{z\})$ is finite. By Definition 13.7.6 for an $\omega$-infinite set, there exists 
an injective sequence $x = (x_i)_{i \in \omega} \in S^\omega$. Let $S' = \mathrm{Range}(x)$. Then $S' \subseteq S$. So for all $z \in K$, there exists a 
set $\Omega \in \mathrm{Top}_z(X)$ such that $\#(\Omega \cap S') < \infty$ because $\Omega \cap S' \subseteq (\Omega \cap (S \setminus \{z\})) \cup \{z\}$. 

Let $C = \{\Omega \in \mathrm{Top}(X);\ \#(\Omega \cap S') < \infty\}$. For each $z \in K$, at least one set $\Omega \in C$ contains $z$. So $K \subseteq \bigcup C$. 
Thus $C$ is an open cover for $K$. For each $n \in \omega$, let $C_n = \{\Omega \in C;\ \Omega \cap S' \subseteq \{x_i;\ i < n\}\}$ and $\Omega_n = \bigcup C_n$. 
Then $\Omega_n \in \mathrm{Top}(X)$ by Definition 31.3.2 (iii), and $K \subseteq \bigcup_{n \in \omega} \Omega_n$ because $\bigcup_{n \in \omega} \{x_i;\ i < n\} = S'$. Thus 
$\{\Omega_n;\ n \in \omega\}$ is a countable open cover for $K$. Therefore $K \subseteq \bigcup_{n \in J} \Omega_n$ for some finite set $J \in \mathbb{P}(\omega)$ by the 
countable compactness of $K$. But then $S' = S' \cap K \subseteq S' \cap \bigcup_{n \in J} \Omega_n = \bigcup_{n \in J} (S' \cap \Omega_n)$, which is a finite 
set because the union of a finite set of finite sets is finite. This is a contradiction because $S'$ is (countably) 
infinite. Hence $S$ has an $\infty$-limit point in $K$. 


33.7.6 THEOREM [ZF+CC]: Some properties of countably compact sets in ZF+CC set theory. 
Let X be a topological space. 


(i) Every infinite subset of a countably compact set in $X$ has an $\infty$-limit point in the set. 

(ii) Let $K$ be a subset of $X$ such that every $\omega$-infinite subset of $K$ has an $\infty$-limit point in $K$. Then $K$ is a 
countably compact subset of $X$. 

(iii) Let $K$ be a subset of $X$ such that every infinite subset of $K$ has an $\infty$-limit point in $K$. Then $K$ is a 
countably compact subset of $X$. 

(iv) A set $K \in \mathbb{P}(X)$ is a countably compact subset of $X$ if and only if every $\omega$-infinite subset of $K$ has an 
$\infty$-limit point in $K$. 

(v) A set $K \in \mathbb{P}(X)$ is a countably compact subset of $X$ if and only if every infinite subset of $K$ has an 
$\infty$-limit point in $K$. 


PROOF: For part (i), let $X$ be a topological space. Let $K$ be a countably compact subset of $X$. Let $S$ be 
an infinite subset of $K$. By the axiom of countable choice and 
Theorem 13.10.11 (i), $S$ is $\omega$-infinite. Hence $S$ has an $\infty$-limit point in $K$ by Theorem 33.7.5. 


For part (ii), let $X$ be a topological space. Let $K$ be a subset of $X$ such that every $\omega$-infinite subset of $K$ has 
an $\infty$-limit point in $K$. Let $(\Omega_i)_{i \in \omega} \in \mathrm{Top}(X)^\omega$ be a countable open cover of $K$. Suppose that $K \not\subseteq \bigcup_{i \in I} \Omega_i$ 
for all finite sets $I \in \mathbb{P}(\omega)$. Let $S_n = K \setminus \bigcup_{i \in n} \Omega_i$ for $n \in \omega$. Then $S_n \ne \emptyset$ for all $n \in \omega$, but $\bigcap_{n \in \omega} S_n = \emptyset$. 
By the axiom of countable choice, there exists a sequence $x = (x_i)_{i \in \omega} \in K^\omega$ such that $\forall n \in \omega,\ x_n \in S_n$. 
Then $\mathrm{Range}(x)$ is infinite because otherwise there would be an infinite subsequence of $x$ which has a constant 
value in $\bigcap_{n \in \omega} S_n$, which is impossible. So $\mathrm{Range}(x)$ is an $\omega$-infinite subset of $K$. So $\mathrm{Range}(x)$ must have 
an $\infty$-limit point $z \in K$, and so $z \in \bigcup_{i \in \omega} \Omega_i$. Therefore $z \in \Omega_k$ for some $k \in \omega$, and so $z \notin S_i$ for $i \in \omega$ 
with $i > k$. Similarly $x_i \in S_i \subseteq K \setminus \Omega_k$ for $i > k$. So $\Omega_k \cap (\mathrm{Range}(x) \setminus \{z\})$ is a finite set. But by Definition 31.10.17, 
$\Omega \cap (\mathrm{Range}(x) \setminus \{z\})$ is an 
infinite set for all $\Omega \in \mathrm{Top}_z(X)$. This is a contradiction because $\Omega_k \in \mathrm{Top}_z(X)$. Therefore $K \subseteq \bigcup_{i \in J} \Omega_i$ for 
some finite set $J \in \mathbb{P}(\omega)$. Hence $K$ is a countably compact subset of $X$. 


Part (iii) follows from part (ii) because every w-infinite set is infinite (even if the axiom of countable choice 
is not assumed). 


Part (iv) follows from part (ii) and Theorem 33.7.5. 


Part (v) follows from parts (i) and (iii). 


33.7.7 REMARK: Lindelöf spaces are not very useful. 

An important property of Lindelöf spaces is that they promote countably compact sets to compact sets. This 
is shown in Theorem 33.7.11. Thus there is no distinction between compact sets and countably compact sets 
in a Lindelöf space. 


Regrettably, the apparently very useful-looking Theorem 33.7.12, which claims that every subset of a second 
countable space is a Lindelöf set, relies upon the axiom of countable choice. The non-removability of this 
requirement is confirmed by the existence of ZF models where even the real number system is not a Lindelöf 
space, even though it is clearly second countable in standard ZF theory. (See Howard/Rubin [362], pages 20, 
131, 325; Moore [371], pages 70-71, 236, 244, 248, 322-323.) It is known that the Lindelöf property for the 
real numbers implies the equivalence of finite and Dedekind-finite sets. (See Moore [371], page 70 footnote; 
Howard/Rubin [362], page 325.) Therefore the Lindelöf property for the real numbers will not be valid in 
any ZF model where Dedekind-finiteness does not imply finiteness. These difficulties might help to explain 
why one reads so little about Lindelöf spaces in practical application-oriented texts about topology. 


One might question the utility of the plethora of classes of topological spaces. A class of spaces $\mathcal{C}_1$ can be 
very useful if there are some difficult theorems which assert that membership of class $\mathcal{C}_1$ implies membership 
of class $\mathcal{C}_2$, where class $\mathcal{C}_2$ has some clear utility in itself. But if it is much more difficult to prove membership 
of class $\mathcal{C}_1$ than membership of class $\mathcal{C}_2$, then there is no net benefit in approaching class $\mathcal{C}_2$ via class $\mathcal{C}_1$. 
In this case, there are some useful downstream consequences of membership of the Lindelöf class, but it is 
very difficult to prove the Lindelöf property in the first place. 


When discussing Lindelöf spaces, as in Theorems 33.7.9 and 33.7.10, it is convenient to use the ad-hoc 
abbreviation “$\mathbb{P}^\omega$” for the set of countable subsets of a given set, which is defined in Notation 13.12.6. 


33.7.8 DEFINITION: A Lindelöf (topological) space is a topological space such that every open cover has a 
countable subcover. 


A Lindelöf set is a subset of a topological space such that every open cover of the subset has a countable subcover. 


33.7.9 THEOREM: The Lindelöf set property is hereditary. 
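Let $X$ be a topological space. Let $S \subseteq X$ and $K \subseteq S$. If $K$ is a Lindelöf set in the topology on $X$, then $K$ 
is a Lindelöf set in the relative topology on $S$. 


PROOF: Let $X$ be a topological space. Let $S \subseteq X$ and $K \subseteq S$. Let $K$ be a Lindelöf set in the topology on $X$. 
Suppose that $\tilde{C} \in \mathbb{P}(\mathrm{Top}(S))$ and $K \subseteq \bigcup \tilde{C}$. Let $C = \{\Omega \in \mathrm{Top}(X);\ \Omega \cap S \in \tilde{C}\}$. Then $C \in \mathbb{P}(\mathrm{Top}(X))$ 
and $K \subseteq \bigcup C$. So $K \subseteq \bigcup C'$ for some $C' \in \mathbb{P}^\omega(C)$ by Definition 33.7.8. Let $\tilde{C}' = \{\Omega \cap S;\ \Omega \in C'\}$. Then 
$\tilde{C}' \in \mathbb{P}^\omega(\tilde{C})$ and $K \subseteq \bigcup \tilde{C}'$. Thus $K$ satisfies 


$\forall \tilde{C} \in \mathbb{P}(\mathrm{Top}(S)),\ \big( K \subseteq \bigcup \tilde{C} \ \Rightarrow\ \exists \tilde{C}' \in \mathbb{P}^\omega(\tilde{C}),\ K \subseteq \bigcup \tilde{C}' \big).$ 


Hence $K$ is a Lindelöf set in the relative topology on $S$ by Definition 33.7.8. 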


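33.7.10 THEOREM [ZF+CC]: The Lindelöf set property is inverse hereditary. 
Let $X$ be a topological space. Let $S \subseteq X$ and $K \subseteq S$. If $K$ is a Lindelöf set in the relative topology on $S$, 
then $K$ is a Lindelöf set in the topology on $X$. 


PROOF: Let $X$ be a topological space. Let $S \subseteq X$ and $K \subseteq S$. Then by Definition 33.7.8, $K$ is a Lindelöf 
set with respect to the relative topology on $S$ if and only if 


$\forall \tilde{C} \in \mathbb{P}(\mathrm{Top}(S)),\ \big( K \subseteq \bigcup \tilde{C} \ \Rightarrow\ \exists \tilde{C}' \in \mathbb{P}^\omega(\tilde{C}),\ K \subseteq \bigcup \tilde{C}' \big),$ (33.7.1) 


where $\mathrm{Top}(S) = \{\Omega \cap S;\ \Omega \in \mathrm{Top}(X)\}$ by Definition 31.6.2. Let $K$ be a Lindelöf set in the relative topology 
on $S$. Let $C \in \mathbb{P}(\mathrm{Top}(X))$ and $K \subseteq \bigcup C$. Let $\tilde{C} = \{\Omega \cap S;\ \Omega \in C\}$. Then $\tilde{C} \in \mathbb{P}(\mathrm{Top}(S))$ and $K \subseteq \bigcup \tilde{C}$. 
Therefore $K \subseteq \bigcup \tilde{C}'$ for some $\tilde{C}' \in \mathbb{P}^\omega(\tilde{C})$ by line (33.7.1). Let $I = \tilde{C}'$ and define the set-family $(S_i)_{i \in I}$ by 
$S_i = \{\Omega \in C;\ \Omega \cap S = i\}$ for $i \in I$. Then $I$ is a countable set and $S_i \ne \emptyset$ for all $i \in I$. Therefore $\times_{i \in I} S_i \ne \emptyset$ 
by the axiom of countable choice and Theorem 13.7.22. So there exists a function $f : \tilde{C}' \to C$ such that 
$i \subseteq f(i)$ for all $i \in \tilde{C}'$. Let $C' = \mathrm{Range}(f)$. Then $C' \in \mathbb{P}^\omega(C)$ and $K \subseteq \bigcup C'$. Hence $K$ is a Lindelöf set in 
the topology on $X$ by Definition 33.7.8. 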


33.7.11 THEOREM: Countable compactness implies compactness in Lindelöf spaces. 
In a Lindelöf space, every countably compact set is compact.


PROOF: Let K be a countably compact set in a Lindelöf space X. Then every open cover of K has a
countable subcover, and every countable open cover of K has a finite subcover. Therefore every open
cover of K has a finite subcover. Hence K is compact.


33.7.12 THEOREM [ZF+CC]: Subsets of second countable topological spaces are Lindelöf sets.
Every subset of a second countable topological space is a Lindelöf set.


PROOF: Let S be a subset of a second countable space X. Let $C \in \mathbb{P}(\mathrm{Top}(X))$ be an open cover for S.
By Definition 33.4.13, X has an indexed countable open base $(B_i)_{i \in \omega}$. (See Definition 32.2.10 for indexed
countable bases.) Let $I = \{i \in \omega;\ \exists \Omega \in C,\ B_i \subseteq \Omega\}$. Then I is a countable set by Theorem 13.7.7 (iii). Let
$U = \bigcup_{i \in I} B_i$. Then clearly $U \subseteq \bigcup C$. Let $x \in S$. Then $x \in \Omega$ for some $\Omega \in C$. So $x \in B_i$ and $B_i \subseteq \Omega$
for some $i \in \omega$ by Definition 32.2.10. But this implies that $i \in I$. So $x \in B_i$ for some $i \in I$, and so $x \in U$.
Therefore $S \subseteq U$.

Define the countable family $Y = (Y_i)_{i \in I} \in \mathbb{P}(C)^I$ by $Y_i = \{\Omega \in C;\ B_i \subseteq \Omega\}$ for all $i \in I$. Then $Y_i \ne \emptyset$ for
all $i \in I$ by the definition of I. So Y is a countable family of non-empty sets. Therefore by the axiom of
countable choice, there exists a choice function $\phi : I \to C$ such that $\phi(i) \in Y_i$ for all $i \in I$. Let $C' = \mathrm{Range}(\phi)$.
Then $C'$ is a countable subset of C, and $S \subseteq \bigcup C'$ because $B_i \subseteq \phi(i)$ for all $i \in I$, and so $U \subseteq \bigcup C'$. Thus
$C'$ is a countable open cover of S. Hence S is a Lindelöf set in X by Definition 33.7.8.
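
The selection step in the proof of Theorem 33.7.12 can be illustrated by a minimal finite sketch in Python. It assumes that sets are represented as frozensets, and the name select_subcover and the sample base and cover are hypothetical. In a finite setting an explicit rule (take the first suitable cover element) replaces the axiom of countable choice, so the sketch shows only the shape of the construction, not the set-theoretic content.

```python
def select_subcover(base, cover):
    """Pick one cover element for each base set which fits inside some cover element."""
    # I = {i; B_i is included in some element of the cover}
    index_set = [i for i, b in enumerate(base) if any(b <= omega for omega in cover)]
    # phi(i) = first cover element which includes B_i (an explicit rule, not a choice axiom)
    phi = {i: next(omega for omega in cover if base[i] <= omega) for i in index_set}
    # C' = Range(phi), with duplicates removed
    return set(phi.values())


# Hypothetical data: the singletons form a base for the discrete topology on {0, 1, 2, 3}.
base = [frozenset({k}) for k in range(4)]
cover = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({0, 3})]
S = frozenset(range(4))

subcover = select_subcover(base, cover)
covered = frozenset().union(*subcover)
print(sorted(sorted(c) for c in subcover), S <= covered)
```

The subcover has at most one element per base set, which is the finite counterpart of the countability of Range(phi) in the proof.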


33.7.13 REMARK:  Paracompactness literature. 

Some useful presentations of paracompactness are given by Willard [165], pages 144-160, 248-249, 265;
Wilansky [163], pages 316-319, 351; Steen/Seebach [141], pages 22-27, 165-173; Kelley [101], pages 156-161, 
172-173; Hocking/Young [93], pages 77-81; Gemignani [80], pages 228-230; Gaal [77], pages 153-163. 

Some discussion of paracompactness in the context of differential geometry is given by Lang [23], pages 33-38; 
Auslander/MacKenzie [1], pages 102-104; Choquet-Bruhat [6], pages 67-69, 100-103; Spivak [37], Volume 1, 
pages 210, 459-460; Szekeres [305], page 482; Kobayashi/Nomizu [19], pages 58-60; Nash/Sen [30], page 46. 
Some indications of the set theory implications of paracompactness are given by Howard/Rubin [362], 
pages 64, 136. (They mention, for example, that the theorem that all metric spaces are paracompact 
cannot be proved in ZF, even assuming that all Dedekind-finite sets are finite. See also Remark 37.7.19.) 


33.7.14 DEFINITION: A locally finite cover of a topological space X is a set $C \in \mathbb{P}(\mathbb{P}(X))$ such that each
point of X has a neighbourhood which intersects at most a finite number of elements of C. That is,


  $\forall x \in X,\ \exists G \in \mathrm{Top}_x(X),\ \#(\{\Omega \in C;\ \Omega \cap G \ne \emptyset\}) < \infty.$


A pointwise finite cover of a topological space X is a set $C \in \mathbb{P}(\mathbb{P}(X))$ such that each point of X is contained
in at most a finite number of elements of C. That is,


  $\forall x \in X,\ \#(\{\Omega \in C;\ x \in \Omega\}) < \infty.$


33.7.15 DEFINITION: A paracompact topology on a set X is a topology on X such that every open cover 
of X has a locally finite open refinement. 


A paracompact set in a topological space X is a set $S \in \mathbb{P}(X)$ such that every open cover of S has a locally
finite open refinement.


33.7.16 DEFINITION: A countably paracompact topology on a set X is a topology on X such that every 
countable open cover of X has a locally finite open refinement. 


A countably paracompact set in a topological space X is a set $S \in \mathbb{P}(X)$ such that every countable open
cover of S has a locally finite open refinement.


33.7.17 REMARK: Conditions which imply paracompactness. 
If X is a compact topological space, then X is paracompact. Any metrisable topological space is paracompact. 
So paracompactness is a fairly weak compactness property. (See Chapter 37 for metric spaces.) 


33.7.18 REMARK: Diagram of relations between compactness classes. 
Some relations between compactness classes are summarised in Figure 33.7.2. 


[Figure 33.7.2: Family tree of compactness classes. Nodes: topological space; Hausdorff space; locally compact space; paracompact space; locally Cartesian space; locally compact, Hausdorff, second countable space; compact space; topological manifold.]
The fact that the Hausdorff property is not implied by the compactness properties is shown by the trivial
topology $\{\emptyset, \mathbb{R}\}$ for $\mathbb{R}$. This is clearly not Hausdorff, but it is compact because all open covers are finite.
The underlying topology of a topological manifold is typically defined to be a second countable locally 
Cartesian Hausdorff space. See EDM2 [113], article 425, page 1618, for a comprehensive set of family trees 
for topological spaces. 


33.8. Topological dimension 


33.8.1 REMARK: The relevance of topological dimension to Hilbert’s fifth problem. 
Topological dimension has some relevance to the determination of sufficient conditions for a locally compact 
transformation group to be a Lie group. (See Section 62.2.) 


33.8.2 DEFINITION: The Lebesgue dimension of a subset S of a normal topological space X is the extended
non-negative integer in $\bar{\mathbb{Z}}_0^+$ which is the minimum of the set of $n \in \mathbb{Z}_0^+$ such that for any finite open cover G
of S in X (i.e. a family $G = (G_i)_{i=1}^m \in \mathrm{Top}(X)^m$ with $m \in \mathbb{Z}_0^+$ such that $S \subseteq \bigcup_{i=1}^m G_i$), for some refinement
H of G (i.e. a family $H = (H_i)_{i=1}^m \in \mathrm{Top}(X)^m$ such that $H_i \subseteq G_i$ for all $i \in \mathbb{N}_m$ and $S \subseteq \bigcup_{i=1}^m H_i$), for all
sets $J \subseteq \mathbb{N}_m$ with $\#(J) = n + 2$, the set $\bigcap_{j \in J} H_j$ is empty. (If the set of such $n \in \mathbb{Z}_0^+$ is empty, then the
Lebesgue dimension of S in X is $\infty \in \bar{\mathbb{Z}}_0^+$.)


33.8.3 NOTATION: dim(S) denotes the Lebesgue dimension of a set S in a normal topological space X. 


33.8.4 REMARK: Interpretation of the definition of Lebesgue dimension. 
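Let $\mathrm{Cov}_m(S) = \{G \in \mathrm{Top}(X)^m;\ S \subseteq \bigcup_{i=1}^m G_i\}$ for $m \in \mathbb{Z}_0^+$, for subsets S of any topological space X. Then
the Lebesgue dimension dim(S) for any normal space X is the smallest $n \in \mathbb{Z}_0^+$ such that


  $\forall m \in \mathbb{Z}_0^+,\ \forall G \in \mathrm{Cov}_m(S),\ \exists H \in \mathrm{Cov}_m(S),$
    $(\forall i \in \mathbb{N}_m,\ H_i \subseteq G_i)$ and $(\forall J \in \mathbb{P}_{n+2}(\mathbb{N}_m),\ \bigcap_{j \in J} H_j = \emptyset),$


where $\mathbb{P}_k(A)$ is defined in Notation 13.12.2 as $\{B \in \mathbb{P}(A);\ \#(B) = k\}$ for any set A and $k \in \mathbb{Z}_0^+$.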


33.8.5 THEOREM: Countable subsets of normal spaces have Lebesgue dimension zero. 
Let S be a countable subset of a normal topological space X. Then dim(S) = 0. 


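PROOF: Let X be a normal topological space. Let S be a countable subset of X. Then $S = \{x_j;\ j \in J\}$
for some finite or countably infinite sequence $(x_j)_{j \in J}$ with $x_j \ne x_k$ for all $j, k \in J$ with $j \ne k$, where
$J = \mathbb{N}$ or $J = \mathbb{N}_n$ for some $n \in \mathbb{Z}_0^+$. Let $G = (G_i)_{i=1}^m$ with $m \in \mathbb{Z}_0^+$ be a finite open cover for S in X.
Define the sequence $(\ell_j)_{j \in J} \in (\mathbb{N}_m)^J$ by $\ell_j = \min\{i \in \mathbb{N}_m;\ x_j \in G_i\}$. This is a well-defined sequence because
$\{i \in \mathbb{N}_m;\ x_j \in G_i\} \ne \emptyset$ for all $j \in J$, and $\mathbb{N}_m$ is well ordered.


A sequence $(\Omega_j)_{j \in J} \in \mathrm{Top}(X)^J$ may be constructed by induction so that $x_j \in \Omega_j \subseteq G_{\ell_j}$ for all $j \in J$, and
$\Omega_j \cap \Omega_k = \emptyset$ for all $j, k \in J$ with $k < j$. (This follows from the Hausdorff separation property.) Construct
the family of sets $H = (H_i)_{i=1}^m \in \mathrm{Top}(X)^m$ as $H_i = \bigcup\{\Omega_j;\ j \in J \text{ and } \ell_j = i\}$ for all $i \in \mathbb{N}_m$. Then
$H_{i_1} \cap H_{i_2} = \emptyset$ for all $i_1, i_2 \in \mathbb{N}_m$ with $i_1 \ne i_2$, and $S \subseteq \bigcup_{i=1}^m H_i$, and $H_i \subseteq G_i$ for all $i \in \mathbb{N}_m$. So $\dim(S) \le 0$.
Hence dim(S) = 0.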
34.1. Connected spaces and sets


34.1.1 REMARK:  Connectedness is one of the fundamental concepts of topology. 

Connectedness is one of the two core concepts of topology. The other core concept is continuity. (The 
interrelated roles of these two core concepts are discussed in Sections 31.1 and 31.2.) The concepts of 
connectedness and continuity are so closely intertwined that it is difficult to say that one is more fundamental 
than the other. 


Many of the proofs concerned with connected spaces and sets in Chapter 34 may seem excessively tedious. 
However, there are many "facts" about connected sets which are intuitively obvious while not actually being 
true. Therefore it is important to check all “facts”, paying special attention to the precision of the logic 
to ensure that intuition does not become a substitute for rigorous argument. (Intuition must be turned off 
while writing proofs, but it must be turned on again to discover new theorems.) 


Some useful references for connectedness are Baum [54], pages 98-114; B. Mendelson [115], pages 112-156; 
Gaal [77], pages 98-107; Steen/Seebach [141], pages 28-33; Willard [165], pages 191-222; Wilansky [163],
pages 68-75; Simmons [137], pages 142-152; Kasriel [100], pages 110-116, 218-229; Hocking/Young [93], 
pages 14-17, 105-122; Gemignani [80], pages 183-207; Kelley [101], pages 53-55. 


34.1.2 REMARK: Topological connectedness is double-negative. Structural connectedness is positive. 
Connectedness is introduced in Definition 34.1.3 as the impossibility of partitioning a set into two non-empty 
subsets which have a disjoint open cover. This double-negative non-disconnectability concept may be thought 
of as “topological connectedness”. (Local connectedness in Section 34.7 is also a double-negative style of 
connectedness.) 


The topological double-negative kind of connectedness is implied by pathwise connectedness, which is, by 
contrast, a positive assertion of the existence of a continuous connecting path. (See Section 36.7 for pathwise 
connectedness.) Positive connectedness concepts may be thought of as a kind of “structural connectedness” 
because they signify that some connecting structure exists, whereas “topological connectedness” signifies 
that a disconnecting open cover does not exist. 


There are many “structural connectedness” concepts which require the positive existence of various kinds 
of connecting structures. These structures typically involve continuous maps of some kind, which yield 
homotopy groups and homology groups, for example. However, these topics lie within algebraic topology,
which is outside the scope of this book. (For this scope constraint, see Remark 1.6.3 item (6).) For the
double-negative style of connectedness in Chapter 34, proofs of connectedness are mostly obtained by supposing
that a space or set is disconnected and then showing that this leads to a contradiction. For the positive 
styles of connectedness, proofs of connectedness are typically obtained by demonstrating (by construction or 
otherwise) the existence of connecting structures. 


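34.1.3 DEFINITION: A connected topological space is a topological space (X, T) such that


  $\forall \Omega_1, \Omega_2 \in T,\ (X = \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset) \Rightarrow (\Omega_1 = \emptyset \text{ or } \Omega_2 = \emptyset).$  (34.1.1)


A disconnected topological space is a topological space (X, T) such that


  $\exists \Omega_1, \Omega_2 \in T \setminus \{\emptyset\},\ X = \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset.$  (34.1.2)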


34.1.4 REMARK: A topological space is disconnected if and only if it is not connected. 
Line (34.1.1) in Definition 34.1.3 is logically equivalent to the following condition, which clearly means that 
X cannot be partitioned into two non-empty open sets. 


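  $\forall \Omega_1, \Omega_2 \in T,\ (\Omega_1 \ne \emptyset \text{ and } \Omega_2 \ne \emptyset) \Rightarrow \neg(X = \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset).$
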
Thus, unsurprisingly, lines (34.1.1) and (34.1.2) are logical negatives of each other. 
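
For a finite topological space, lines (34.1.1) and (34.1.2) can be tested by exhaustive search. The following Python sketch assumes that open sets are given as frozensets, and the function names and the two sample topologies are hypothetical. It checks that the two conditions give opposite answers on these samples.

```python
def is_connected_space(X, T):
    """Line (34.1.1): every disjoint open pair which covers X has an empty member."""
    return all(not O1 or not O2
               for O1 in T for O2 in T
               if O1 | O2 == X and not (O1 & O2))


def is_disconnected_space(X, T):
    """Line (34.1.2): some pair of non-empty disjoint open sets covers X."""
    return any(O1 | O2 == X and not (O1 & O2)
               for O1 in T for O2 in T
               if O1 and O2)


X = frozenset({1, 2, 3})
trivial = [frozenset(), X]                                   # trivial topology on X
split = [frozenset(), frozenset({1, 2}), frozenset({3}), X]  # {1,2} and {3} are both open

for name, T in (("trivial", trivial), ("split", split)):
    print(name, is_connected_space(X, T), is_disconnected_space(X, T))
```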


34.1.5 REMARK: Connectedness for subsets of a topological space. 

Connectedness of subsets of topological spaces is defined in a subtly different way to connectedness of the
whole space. The condition “$\Omega_1 \cap \Omega_2 \cap S = \emptyset$” in Definition 34.1.6 is apparently a weaker constraint on $\Omega_1$
and $\Omega_2$ than the condition “$\Omega_1 \cap \Omega_2 = \emptyset$”. This is not just apparently weaker. It is in fact weaker, which in
principle makes it easier to prove that a set is disconnected. The reason for this subtle difference is to make
the definition of connectedness for a subset S of X the same as for the topological space $(S, T_S)$, where $T_S$
is the relative topology induced on S by the whole space X. This is shown in Theorem 34.1.8.


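34.1.6 DEFINITION: A connected set in a topological space (X, T) is a set $S \in \mathbb{P}(X)$ such that


  $\forall \Omega_1, \Omega_2 \in T,\ (S \subseteq \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 \cap S = \emptyset) \Rightarrow (\Omega_1 \cap S = \emptyset \text{ or } \Omega_2 \cap S = \emptyset).$  (34.1.3)


A disconnected set in a topological space (X, T) is a subset which is not connected. In other words,


  $\exists \Omega_1, \Omega_2 \in T,\ S \subseteq \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 \cap S = \emptyset \text{ and } \Omega_1 \cap S \ne \emptyset \text{ and } \Omega_2 \cap S \ne \emptyset.$  (34.1.4)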


34.1.7 EXAMPLE:  Connectedness of subsets of three-point topological spaces. 

Table 34.1.1 indicates which subsets of the three-point topological spaces presented in Example 31.5.6 are 
connected. (These are illustrated in Figure 31.5.2. See also Example 33.2.15 for weakly and strongly 
separated set-pairs in the three-point topological spaces 3f and 3g.) 


topology on {1,2,3}            connected subsets
                               {1,2}   {1,3}   {2,3}   {1,2,3}
3a   ∅                         yes     yes     yes     yes
3b   1                         yes     yes     yes     yes
3c   12                        yes     yes     yes     yes
3d   12, 3                     yes     no      no      no
3e   1, 12                     yes     yes     yes     yes
3f   1, 2, 12                  no      yes     yes     yes
3g   2, 12, 23                 yes     no      yes     yes
3h   1, 2, 12, 23              no      yes     no      no
3i   1, 2, 3, 12, 13, 23       no      no      no      no
Table 34.1.1 Summary of 3-point topological spaces
The empty set and all singletons are connected sets in all topological spaces. (See Theorem 34.4.5 (v).) The
corresponding table of connected subsets for all four-point topologies would be impractically large. (The 
curious reader might like to create a similar table for all four-point topological spaces, which are listed in 
abbreviated form in Figure 31.5.3 in Example 31.5.8, but this is not recommended!) 
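
Entries of this kind can be computed mechanically from line (34.1.3). The following Python sketch does this for the three topologies labelled 3d, 3f and 3g, assuming that the abbreviated open sets "12", "23" and so on denote {1,2}, {2,3} and so on, and that the empty set and the whole space are always open. The helper names fs and is_connected_set are hypothetical.

```python
def fs(*xs):
    """Shorthand for a frozenset of the given points."""
    return frozenset(xs)


def is_connected_set(S, T):
    """Line (34.1.3): no two open sets split S into two non-empty parts."""
    return not any(S <= O1 | O2 and not (O1 & O2 & S) and (O1 & S) and (O2 & S)
                   for O1 in T for O2 in T)


X = fs(1, 2, 3)
topologies = {
    "3d": [fs(), fs(1, 2), fs(3), X],
    "3f": [fs(), fs(1), fs(2), fs(1, 2), X],
    "3g": [fs(), fs(2), fs(1, 2), fs(2, 3), X],
}
columns = [fs(1, 2), fs(1, 3), fs(2, 3), X]

rows = {name: ["yes" if is_connected_set(S, T) else "no" for S in columns]
        for name, T in topologies.items()}
for name, row in rows.items():
    print(name, row)
```

The search is quadratic in the number of open sets for each subset, which is harmless for three points but grows quickly for larger finite spaces.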


34.1.8 THEOREM: Equivalence of connectedness in the global and relative topologies. 
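Let $X \mathrel{-\!\!<} (X, T_X)$ be a topological space. Then a set $S \in \mathbb{P}(X)$ is a connected set in X if and only if the
relative topological space $(S, T_S)$ is connected.


PROOF: Let $X \mathrel{-\!\!<} (X, T_X)$ be a topological space, and let $S \in \mathbb{P}(X)$. The relative topology on S in X is the
set $T_S = \{\Omega \cap S;\ \Omega \in T_X\}$ by Definition 31.6.2. Then by Definition 34.1.3, S is a disconnected topological
space if and only if $S = \Omega'_1 \cup \Omega'_2$ and $\Omega'_1 \cap \Omega'_2 = \emptyset$ for some $\Omega'_1, \Omega'_2 \in T_S \setminus \{\emptyset\}$. This holds if and only if
$S = (\Omega_1 \cap S) \cup (\Omega_2 \cap S)$ and $(\Omega_1 \cap S) \cap (\Omega_2 \cap S) = \emptyset$ for some $\Omega_1, \Omega_2 \in T_X$ such that $\Omega_1 \cap S \ne \emptyset$ and
$\Omega_2 \cap S \ne \emptyset$. And this holds if and only if $S \subseteq \Omega_1 \cup \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap S = \emptyset$ for some $\Omega_1, \Omega_2 \in T_X$ such that
$\Omega_1 \cap S \ne \emptyset$ and $\Omega_2 \cap S \ne \emptyset$. In other words, S is a disconnected set.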


34.1.9 REMARK:  Connectedness of a set is the same in a relative topology. 

Theorem 34.1.10 states that the connectedness of a set is unaffected by being tested with respect to any subset
of the full space in which it is included. The case A = X in Theorem 34.1.10 implies that Definition 34.1.3
is equivalent to the special case S = X of Definition 34.1.6.


34.1.10 THEOREM: Equivalence of connectedness in the global and any covering relative topology. 
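Let $X \mathrel{-\!\!<} (X, T_X)$ be a topological space. Let $A \in \mathbb{P}(X)$ and $S \in \mathbb{P}(A)$. Then S is a connected set in $(X, T_X)$
if and only if S is a connected set in the relative topological space $(A, T_A)$.


PROOF: Let $(X, T_X)$ be a topological space. Let $A \in \mathbb{P}(X)$ and $S \in \mathbb{P}(A)$. Let S be a connected set
in $(X, T_X)$. Then S satisfies line (34.1.3) in Definition 34.1.6 with $\Omega_1, \Omega_2 \in T_X$. Let $\Omega'_1, \Omega'_2 \in T_A$. Then
$\Omega'_1 = \Omega_1 \cap A$ and $\Omega'_2 = \Omega_2 \cap A$ for some $\Omega_1, \Omega_2 \in T_X$. Suppose that $S \subseteq \Omega'_1 \cup \Omega'_2$ and $\Omega'_1 \cap \Omega'_2 \cap S = \emptyset$.
Then $S \subseteq \Omega_1 \cup \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap S = \Omega_1 \cap A \cap \Omega_2 \cap A \cap S = \Omega'_1 \cap \Omega'_2 \cap S = \emptyset$ because $S \subseteq A$.
So it follows from line (34.1.3) that $\Omega_1 \cap S = \emptyset$ or $\Omega_2 \cap S = \emptyset$.
So $\Omega'_1 \cap S = (\Omega_1 \cap A) \cap S = \emptyset$ or $\Omega'_2 \cap S = (\Omega_2 \cap A) \cap S = \emptyset$. Therefore S satisfies line (34.1.3) with the
relative topology $T_A$ in place of T. Hence S is a connected set in the relative topological space $(A, T_A)$.


To show the converse, suppose that S is a connected set in the relative topological space $(A, T_A)$. Then
S satisfies line (34.1.3) in Definition 34.1.6 with $\Omega'_1, \Omega'_2 \in T_A$. Let $\Omega_1, \Omega_2 \in T_X$. Then $\Omega_1 \cap A = \Omega'_1$
and $\Omega_2 \cap A = \Omega'_2$ for some $\Omega'_1, \Omega'_2 \in T_A$. Suppose that $S \subseteq \Omega_1 \cup \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap S = \emptyset$. Then
$S = S \cap A \subseteq (\Omega_1 \cup \Omega_2) \cap A = \Omega'_1 \cup \Omega'_2$ and $\Omega'_1 \cap \Omega'_2 \cap S = \Omega_1 \cap A \cap \Omega_2 \cap A \cap S = \Omega_1 \cap \Omega_2 \cap S = \emptyset$ because $S \subseteq A$.
So it follows from line (34.1.3) that $\Omega'_1 \cap S = \emptyset$ or $\Omega'_2 \cap S = \emptyset$. So $\Omega_1 \cap S = \Omega_1 \cap (S \cap A) = \Omega'_1 \cap S = \emptyset$ or
$\Omega_2 \cap S = \Omega_2 \cap (S \cap A) = \Omega'_2 \cap S = \emptyset$. Therefore S satisfies line (34.1.3) with the full topology $T_X$ in place
of T. Hence S is a connected set in the topological space $(X, T_X)$.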
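
The statement of Theorem 34.1.10 can be checked exhaustively on a small finite space. The following Python sketch assumes a sample three-point topology with open sets {2}, {1,2} and {2,3}, and uses hypothetical helper names. It compares the connectedness test in the full topology with the same test in every relative topology.

```python
from itertools import combinations


def subsets(A):
    """All subsets of a finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_connected_set(S, T):
    """Line (34.1.3): no two open sets split S into two non-empty parts."""
    return not any(S <= O1 | O2 and not (O1 & O2 & S) and (O1 & S) and (O2 & S)
                   for O1 in T for O2 in T)


def relative_topology(A, T):
    """The relative topology {O ∩ A; O in T} on a subset A."""
    return {O & A for O in T}


X = frozenset({1, 2, 3})
T = [frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({2, 3}), X]

# Compare the two tests for every pair S ⊆ A ⊆ X.
agree = all(is_connected_set(S, T) == is_connected_set(S, relative_topology(A, T))
            for A in subsets(X) for S in subsets(A))
print(agree)
```

A finite check of this kind is only a sanity test of the statement on one example. It is not a substitute for the proof.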


34.1.11 REMARK: A connected set is a set with no non-trivial disjoint open covers. 
Line (34.1.3) in Definition 34.1.6 means that S cannot be partitioned into two sets by a two-set open cover. 
This condition may be expressed in the following equivalent way. 


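  $\forall \Omega_1, \Omega_2 \in T,\ (S \subseteq \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap S \ne \emptyset \text{ and } \Omega_2 \cap S \ne \emptyset) \Rightarrow \Omega_1 \cap \Omega_2 \cap S \ne \emptyset.$  (34.1.5)


In other words, if the set S is covered by two open sets $\Omega_1$ and $\Omega_2$, and each of these open sets covers at
least one point of S, then $\Omega_1 \cap S$ and $\Omega_2 \cap S$ must have at least one point in common. This is probably
closer to an intuitive picture of connectedness. Condition (34.1.5) is illustrated in Figure 34.1.2.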


Intuitively speaking, one must first find a "gap" between two portions of a set. Then one must cover each 
portion with an open set. If it is not possible to find any gap, the set must be connected. Difficulties arise, 
however, when the “gap” has zero width, which demands great skill to position the covering sets accurately. 
There is no margin for error! 


34.1.12 REMARK: Connected spaces are spaces which have no non-trivial boundaryless sets. 

Theorem 34.1.13 expresses connectedness of a topological space in terms of the non-existence of non-trivial 
boundaryless sets. In other words, a topological space is connected if and only if the only sets with an empty 
boundary are the empty set and the whole space. 
[Figure 34.1.2: Definition of connectedness of a set. Panels: disconnected; connected.]


34.1.13 THEOREM: A space is connected if and only if every proper subset has non-empty boundary. 
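A topological space X is connected if and only if $\forall S \in \mathbb{P}(X) \setminus \{\emptyset, X\},\ \mathrm{Bdy}(S) \ne \emptyset$.


PROOF: Let $X \mathrel{-\!\!<} (X, T)$ be a connected topological space. Let $S \in \mathbb{P}(X) \setminus \{\emptyset, X\}$ with $\mathrm{Bdy}(S) = \emptyset$. Then
S and $X \setminus S$ are both non-empty, and are both open subsets of X by Theorem 31.9.10 (xxxvi, xxxvii). So
by Definition 34.1.3, X is not connected, which is a contradiction. Therefore $\mathrm{Bdy}(S) \ne \emptyset$. Thus if X is
connected, then $\forall S \in \mathbb{P}(X) \setminus \{\emptyset, X\},\ \mathrm{Bdy}(S) \ne \emptyset$.

Now suppose that X is not connected. Then by Definition 34.1.3, there are sets $\Omega_1, \Omega_2 \in \mathrm{Top}(X) \setminus \{\emptyset\}$ such
that $X = \Omega_1 \cup \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. But then $\Omega_2 = X \setminus \Omega_1$ and so $\mathrm{Bdy}(\Omega_1) = \emptyset$ by Theorem 31.9.10 (xxxviii).
This contradicts the proposition $\forall S \in \mathbb{P}(X) \setminus \{\emptyset, X\},\ \mathrm{Bdy}(S) \ne \emptyset$ by letting $S = \Omega_1$. Hence X is connected
if and only if $\forall S \in \mathbb{P}(X) \setminus \{\emptyset, X\},\ \mathrm{Bdy}(S) \ne \emptyset$.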
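
The boundary criterion in Theorem 34.1.13 can be compared with Definition 34.1.3 on small finite spaces. The following Python sketch assumes that the boundary is computed as the closure minus the interior, and the function names and the two sample topologies are hypothetical.

```python
from itertools import combinations


def subsets(A):
    """All subsets of a finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def interior(S, T):
    """Union of all open sets which are included in S."""
    result = frozenset()
    for O in T:
        if O <= S:
            result |= O
    return result


def boundary(S, X, T):
    """Closure of S minus interior of S, where the closure is X minus the interior of X minus S."""
    closure = X - interior(X - S, T)
    return closure - interior(S, T)


def is_connected_space(X, T):
    """Line (34.1.1) of Definition 34.1.3."""
    return all(not O1 or not O2 for O1 in T for O2 in T
               if O1 | O2 == X and not (O1 & O2))


def all_proper_subsets_have_boundary(X, T):
    """The condition in Theorem 34.1.13."""
    return all(boundary(S, X, T) for S in subsets(X) if S and S != X)


X = frozenset({1, 2, 3})
T_split = [frozenset(), frozenset({1, 2}), frozenset({3}), X]
T_chain = [frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({2, 3}), X]
for T in (T_split, T_chain):
    print(is_connected_space(X, T), all_proper_subsets_have_boundary(X, T))
```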


34.2. Connectedness in strongly separated topological spaces 


34.2.1 REMARK: Attempted simplification of the conditions for connected sets. 

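The condition “$\Omega_1 \cap \Omega_2 \cap S = \emptyset$” in lines (34.1.3) and (34.1.4) of Definition 34.1.6 cannot be replaced
with “$\Omega_1 \cap \Omega_2 = \emptyset$” for general topological spaces. A counterexample may be constructed using the
finite-complement topology in Definition 31.11.7. Let $X = \mathbb{Z}$, $S = \{1, 2\}$, $\Omega_1 = \mathbb{Z} \setminus \{1\}$ and $\Omega_2 = \mathbb{Z} \setminus \{2\}$. Then
$S \subseteq \Omega_1 \cup \Omega_2 = \mathbb{Z}$, $\Omega_1 \cap \Omega_2 \cap S = \emptyset$, $\Omega_1 \cap S = \{2\} \ne \emptyset$ and $\Omega_2 \cap S = \{1\} \ne \emptyset$, but $\Omega_1 \cap \Omega_2 = \mathbb{Z} \setminus \{1, 2\} \ne \emptyset$, and
in fact $\Omega_1 \cap \Omega_2 \ne \emptyset$ for any non-empty open sets in the finite-complement topology. This topological space
has very poor separation. It is not even $T_2$. Theorem 34.2.2 shows that with $T_5$ separation, the condition
“$\Omega_1 \cap \Omega_2 \cap S = \emptyset$” in Definition 34.1.6 line (34.1.4) can be replaced with “$\Omega_1 \cap \Omega_2 = \emptyset$”.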
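
The same phenomenon can be seen in a finite space, where it can be checked by exhaustive search. The following Python sketch is an illustrative finite analogue, not the finite-complement example itself. It assumes the three-point topology with open sets {2}, {1,2} and {2,3}, and the subset S = {1,3}. The variable names are hypothetical.

```python
X = frozenset({1, 2, 3})
T = [frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({2, 3}), X]
S = frozenset({1, 3})

# Line (34.1.4): the open sets need only be disjoint relative to S.
split_relative_to_S = any(
    S <= O1 | O2 and not (O1 & O2 & S) and (O1 & S) and (O2 & S)
    for O1 in T for O2 in T)

# The attempted simplification: the open sets must be disjoint in X.
split_by_disjoint_open_sets = any(
    S <= O1 | O2 and not (O1 & O2) and (O1 & S) and (O2 & S)
    for O1 in T for O2 in T)

print(split_relative_to_S, split_by_disjoint_open_sets)
```

In this space every non-empty open set contains the point 2, so no two non-empty open sets are disjoint, although {1,2} and {2,3} are disjoint relative to S.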


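34.2.2 THEOREM: Slightly weaker test for disconnectedness in $T_5$ spaces.
Let (X, T) be a $T_5$ topological space. Then $S \in \mathbb{P}(X)$ is disconnected if and only if


  $\exists \Omega_1, \Omega_2 \in T,\ S \subseteq \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset \text{ and } \Omega_1 \cap S \ne \emptyset \text{ and } \Omega_2 \cap S \ne \emptyset.$  (34.2.1)


PROOF: Let (X, T) be a $T_5$ topological space. Line (34.1.4) follows trivially from line (34.2.1). So suppose
that $S \in \mathbb{P}(X)$ satisfies line (34.1.4). Then $S \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 \cap S = \emptyset$, $\Omega_1 \cap S \ne \emptyset$ and $\Omega_2 \cap S \ne \emptyset$
for some $\Omega_1, \Omega_2 \in T$. Let $S_1 = \Omega_1 \cap S$ and $S_2 = \Omega_2 \cap S$. Then $S_1 \ne \emptyset$, $S_2 \ne \emptyset$, $S_1 \cup S_2 \subseteq \Omega_1 \cup \Omega_2$ and
$\Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2) = \emptyset$. But then $\Omega_1 \cap S_1 = S_1$, and so $\Omega_2 \cap S_1 = \Omega_2 \cap (\Omega_1 \cap S_1) \subseteq \Omega_2 \cap \Omega_1 \cap (S_1 \cup S_2) = \emptyset$.
Thus $\Omega_2 \cap S_1 = \emptyset$, and similarly $\Omega_1 \cap S_2 = \emptyset$. Therefore $S_1 \subseteq X \setminus \Omega_2$ and $S_2 \subseteq X \setminus \Omega_1$. So $\bar{S}_1 \subseteq X \setminus \Omega_2$
and $\bar{S}_2 \subseteq X \setminus \Omega_1$ by Theorem 31.8.13 (xiv) and Theorem 31.8.14 (v) because $X \setminus \Omega_2$ and $X \setminus \Omega_1$ are closed
sets in (X, T). Therefore $\bar{S}_1 \cap \Omega_2 = \emptyset$ and $\bar{S}_2 \cap \Omega_1 = \emptyset$, and so $\bar{S}_1 \cap S_2 = \emptyset$ and $\bar{S}_2 \cap S_1 = \emptyset$. So by the
$T_5$ property, Definition 33.3.25, there exist $\Omega'_1, \Omega'_2 \in \mathrm{Top}(X)$ such that $\Omega'_1 \cap \Omega'_2 = \emptyset$, $S_1 \subseteq \Omega'_1$ and $S_2 \subseteq \Omega'_2$.
Then $S \subseteq \Omega'_1 \cup \Omega'_2$ and $\Omega'_1 \cap S = S_1 \ne \emptyset$ and $\Omega'_2 \cap S = S_2 \ne \emptyset$. This verifies line (34.2.1). Hence S is
disconnected if and only if line (34.2.1) holds.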


34.2.3 REMARK: Relaxation of the criterion for disconnected sets in a T5 topology. 

It may seem somewhat surprising that the $T_1$ separation condition is not required in Theorem 34.2.2. (Recall
from Remark 33.3.28 that a topological space may be $T_5$ and not $T_1$.) An example of a $T_5$ non-$T_1$ space
is the trivial topology $T = \{\emptyset, X\}$ in Definition 31.3.18 for a set X with two or more points. With such a
topology, no disconnected sets are possible. So both lines (34.1.4) and (34.2.1) are necessarily false.

As indicated in Figure 33.3.7 in Remark 33.3.36, all metrisable topological spaces have the $T_5$ property.
Since Cartesian spaces are metrisable, it follows that the condition in line (34.2.1) may be used instead
of line (34.1.4) in Definition 34.1.6 in the case of Cartesian spaces. Correspondingly, line (34.2.2) may be
substituted for line (34.1.3) in Definition 34.1.6 for $T_5$ spaces, in particular for metrisable and Cartesian
spaces.


  $\forall \Omega_1, \Omega_2 \in T,\ (S \subseteq \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset) \Rightarrow (\Omega_1 \cap S = \emptyset \text{ or } \Omega_2 \cap S = \emptyset).$  (34.2.2)


34.3. Disconnections of sets 


34.3.1 REMARK: A connected set is a set for which there is no disconnection. 

In terms of Definition 34.3.2, one may say that a topological space is connected if and only if there does 
not exist any disconnection of the topological space. A disconnected topological space may have many 
disconnections. For example, in the case of the discrete topology $\mathbb{P}(X)$ on a set X, the pair $(S, X \setminus S)$ is a
disconnection for any $S \in \mathbb{P}(X) \setminus \{\emptyset, X\}$. Interestingly, there is no corresponding concept of a “connection”
in the sense of a single set structure which proves that a topological space is connected because connectedness 
is proved by the absence of a disconnecting pair. (The lack of such a "connection" structure conveniently 
makes this word available for the very different parallelism-related concept in Chapter 67.) 


34.3.2 DEFINITION: A disconnection of a topological space X is a pair of sets $(\Omega_1, \Omega_2)$ which satisfy
$\Omega_1, \Omega_2 \in \mathrm{Top}(X) \setminus \{\emptyset\}$, $X = \Omega_1 \cup \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$.


In other words, a disconnection of X is a partition of X into a pair of disjoint, non-empty open subsets. 


34.3.3 REMARK: A connected subset is a subset for which there is no disconnection. 
In terms of Definition 34.3.4, one may say that a subset of a topological space is connected if and only if 
there does not exist any disconnection of the subset. 


There are some notable differences between Definitions 34.3.2 and 34.3.4. A disconnection of the whole
topological space X is expressed in terms of open sets whereas a disconnection of a subset S is expressed 
in terms of subsets of S. This is a consequence of the fact that the relative topology must be used for the 
disconnecting cover in the case of subsets. 


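It is not necessary to specify in Definitions 34.3.4 and 34.3.6 that $S_1 \cap S_2 = \emptyset$ because it follows from lines
(34.3.1) and (34.3.2) that $S_1 \cap S_2 \subseteq (\Omega_1 \cap \Omega_2) \cap (S_1 \cup S_2) = \Omega_1 \cap \Omega_2 \cap S = \emptyset$, where $S = S_1 \cup S_2$.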


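34.3.4 DEFINITION: A disconnection of a subset S of a topological space X is a pair of non-empty sets
$(S_1, S_2)$ such that $S = S_1 \cup S_2$ and


  $\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ S_1 \subseteq \Omega_1 \text{ and } S_2 \subseteq \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 \cap S = \emptyset.$  (34.3.1)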


In other words, a disconnection of S is a partition of S into a pair of disjoint non-empty subsets which may 
be covered by a corresponding pair of open sets which are disjoint relative to S. 


34.3.5 THEOREM: A set is disconnected if and only if it has a disconnection. 
A subset S of a topological space X is disconnected in X if and only if there exists a disconnection of S 
in X. (In other words, S is disconnected if and only if S has a disconnection.) 


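PROOF: Let S be a subset of a topological space X. Suppose that S is disconnected in X. Then by
Definition 34.1.6, there exist $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $S \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 \cap S = \emptyset$, $\Omega_1 \cap S \ne \emptyset$ and
$\Omega_2 \cap S \ne \emptyset$. Let $S_1 = S \cap \Omega_1$ and $S_2 = S \cap \Omega_2$. Then $S = S_1 \cup S_2$, $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$, $\Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2) = \emptyset$,
$S_1 \ne \emptyset$ and $S_2 \ne \emptyset$. Therefore $(S_1, S_2)$ is a disconnection of S in X by Definition 34.3.4.

Now suppose that $(S_1, S_2)$ is a disconnection of S in X. Then by Definition 34.3.4, $S_1, S_2 \in \mathbb{P}(X) \setminus \{\emptyset\}$,
$S = S_1 \cup S_2$, and there exist $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap S = \emptyset$. So
$S \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap S \ne \emptyset$ and $\Omega_2 \cap S \ne \emptyset$, because $\Omega_k \cap S = \Omega_k \cap (S_1 \cup S_2) \supseteq \Omega_k \cap S_k = S_k \ne \emptyset$ for k = 1, 2.
Hence S is disconnected in X by Definition 34.1.6.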
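
For a finite space, the disconnections in Definition 34.3.4 can be listed by brute force, which gives a direct check of Theorem 34.3.5. The following Python sketch assumes a sample three-point topology with open sets {1,2} and {3}, and the function names are hypothetical.

```python
from itertools import combinations


def subsets(A):
    """All subsets of a finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_connected_set(S, T):
    """Line (34.1.3): no two open sets split S into two non-empty parts."""
    return not any(S <= O1 | O2 and not (O1 & O2 & S) and (O1 & S) and (O2 & S)
                   for O1 in T for O2 in T)


def disconnections(S, T):
    """All ordered pairs (S1, S2) which satisfy Definition 34.3.4 for the subset S."""
    found = []
    for S1 in subsets(S):
        for S2 in subsets(S):
            if S1 and S2 and S1 | S2 == S and any(
                    S1 <= O1 and S2 <= O2 and not (O1 & O2 & S)
                    for O1 in T for O2 in T):
                found.append((S1, S2))
    return found


X = frozenset({1, 2, 3})
T = [frozenset(), frozenset({1, 2}), frozenset({3}), X]

for S in subsets(X):
    # Theorem 34.3.5: S is disconnected if and only if it has a disconnection.
    print(sorted(S), not is_connected_set(S, T), len(disconnections(S, T)))
```

Each unordered disconnection appears twice in the list because the pairs are ordered.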
34.3.6 DEFINITION: A disconnected pair of sets in a topological space X is a pair $S_1, S_2 \in \mathbb{P}(X) \setminus \{\emptyset\}$ such
that


  $\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ S_1 \subseteq \Omega_1 \text{ and } S_2 \subseteq \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2) = \emptyset.$  (34.3.2)


34.3.7 REMARK: Disconnected pairs of sets are the same as weakly separated pairs of non-empty sets. 
Theorem 34.3.8 asserts that Definition 34.3.6 for a disconnected pair of sets is equivalent to Definition 33.2.2 
for a weakly separated pair of sets. 


34.3.8 THEOREM: Equivalence of set-pair disconnectedness and weak separation. 

A pair of non-empty sets $(S_1, S_2)$ in a topological space X is a disconnected pair of sets in X if and only if
the pair $(S_1, S_2)$ is weakly separated.

In other words, a pair of non-empty sets $(S_1, S_2)$ in X is a disconnected pair in X if and only if $\bar{S}_1 \cap S_2 = \emptyset$
and $S_1 \cap \bar{S}_2 = \emptyset$.


PROOF: The result follows from Theorem 33.2.3 (v) and Definitions 33.2.2 and 34.3.6. 


34.3.9 REMARK: Summary of equivalent conditions for connectedness of a topological space. 

It may be useful to summarise here, in Theorem 34.3.10, some equivalent conditions for connectedness of 
a general topological space. Most of these conditions are little more than synonyms of the definition of a 
connected set. Some less shallow sufficient conditions for connectedness of subsets of topological spaces are 
outlined in Remark 34.4.1. 


34.3.10 THEOREM: Some equivalent conditions for the connectedness of a topological space. 
Let X be a topological space. Then the following conditions are equivalent. 


(i) X is a connected topological space.
(ii) X is not a disconnected topological space.
(iii) X has no disconnections.
(iv) X is not equal to the union of a weakly separated pair of non-empty sets.
(v) X is not equal to the disjoint union of two non-empty open sets.
(vi) X is not equal to the disjoint union of two non-empty closed sets.
(vii) The only sets included in X which are both open and closed are $\emptyset$ and X.
(viii) $\forall S \in \mathbb{P}(X) \setminus \{\emptyset, X\},\ \mathrm{Bdy}(S) \ne \emptyset$.


PROOF: Conditions (i) and (ii) are equivalent by Definition 34.1.3 and Remark 34.1.4. 
Conditions (ii) and (iii) are equivalent by Definitions 34.1.3 and 34.3.2. 

Conditions (iii) and (iv) are equivalent by Theorem 34.3.8. 

Conditions (i) and (v) are equivalent by Definition 34.1.3 line (34.1.1). 

Conditions (v) and (vi) are equivalent by Definition 31.4.1 for a closed set. 

Conditions (v) and (vii) are equivalent by Definition 31.4.1 for a closed set. 



Conditions (i) and (viii) are equivalent by Theorem 34.1.13. 


34.3.11 REMARK: Summary of equivalent conditions for disconnectedness of a subset. 

Theorem 34.3.12 is very similar to Theorem 34.3.10, but differs in some important details. The logical 
conditions have all been negated (apart from two omitted conditions), and the global set X has been replaced 
by subsets of X. All of the conditions in Theorem 34.3.10 are applicable to subsets if the full topology is 
replaced by the relative topology. However, this changes the appearance of some conditions. 


34.3.12 THEOREM: Some equivalent conditions for the disconnectedness of a subset of a topological space.
For any subset S of a topological space X, the following conditions are equivalent.
(i) S is a disconnected subset of X.
(ii) S is not a connected subset of X.
(iii) S has a disconnection in X.
(iv) S is equal to the union of a weakly separated pair of non-empty sets in X.
(v) $\exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ (S \subseteq \Omega_1 \cup \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 \cap S = \emptyset \text{ and } \Omega_1 \cap S \ne \emptyset \text{ and } \Omega_2 \cap S \ne \emptyset)$.
(vi) $\exists K_1, K_2 \in \overline{\mathrm{Top}}(X),\ (S \subseteq K_1 \cup K_2 \text{ and } K_1 \cap K_2 \cap S = \emptyset \text{ and } S \setminus K_1 \ne \emptyset \text{ and } S \setminus K_2 \ne \emptyset)$.


PROOF: Conditions (i) and (ii) are equivalent by Definition 34.1.6 and the observation that conditions
(34.1.3) and (34.1.4) are logical negatives of each other.

Conditions (i) and (iii) are equivalent by Theorem 34.3.5.

Conditions (iii) and (iv) are equivalent by Theorem 34.3.8.

Conditions (i) and (v) are equivalent by Definition 34.1.6 line (34.1.4).


To show the equivalence of conditions (v) and (vi), let $K_1 = X \setminus \Omega_1$ and $K_2 = X \setminus \Omega_2$. Then the propositions
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$, $S \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 \cap S = \emptyset$, $\Omega_1 \cap S \ne \emptyset$ and $\Omega_2 \cap S \ne \emptyset$ are converted respectively to
the equivalent propositions $K_1, K_2 \in \overline{\mathrm{Top}}(X)$, $K_1 \cap K_2 \cap S = \emptyset$, $S \subseteq K_1 \cup K_2$, $S \setminus K_1 \ne \emptyset$ and $S \setminus K_2 \ne \emptyset$.
(Note the subtle change in order.) Hence conditions (v) and (vi) are equivalent.
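
The duality between the open-set condition (v) and the closed-set condition (vi) can be checked by exhaustive search on small finite spaces. The following Python sketch assumes that the closed sets are the complements of the open sets, and the function names and sample topologies are hypothetical.

```python
from itertools import combinations


def subsets(A):
    """All subsets of a finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def condition_v(S, T):
    """Condition (v): two open sets cover S, are disjoint relative to S, and both meet S."""
    return any(S <= O1 | O2 and not (O1 & O2 & S) and bool(O1 & S) and bool(O2 & S)
               for O1 in T for O2 in T)


def condition_vi(S, X, T):
    """Condition (vi): the same statement expressed with closed sets K = X minus O."""
    closed = [X - O for O in T]
    return any(S <= K1 | K2 and not (K1 & K2 & S) and bool(S - K1) and bool(S - K2)
               for K1 in closed for K2 in closed)


X = frozenset({1, 2, 3})
samples = [
    [frozenset(), frozenset({1, 2}), frozenset({3}), X],
    [frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({2, 3}), X],
    [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}), frozenset({2, 3}), X],
]
equivalent = all(condition_v(S, T) == condition_vi(S, X, T)
                 for T in samples for S in subsets(X))
print(equivalent)
```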


34.3.13 REMARK: Totally disconnected spaces and sets. 

In the topological space $\mathbb{Q}$ with the relative topology from $\mathbb{R}$, every pair of distinct points can be disconnected
by a "gap" between the points. So this space is totally disconnected by Definition 34.3.14.

As mentioned in Remark 34.3.1, every subset of a discrete topological space is disconnected from its
complement. In fact, all disjoint set-pairs of a discrete space are disconnected. So all discrete topological spaces
are totally disconnected.


34.3.14 DEFINITION: A totally disconnected (topological) space is a topological space in which all connected 
subsets contain at most one element. 


A totally disconnected subset of a topological space X is a subset S of X such that all connected subsets of 
S in the relative topology on S contain at most one element. 
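
For a finite space, Definition 34.3.14 can be tested by listing all connected subsets. The following Python sketch assumes the discrete topology on three points as a totally disconnected example and a second sample topology which is not totally disconnected. The function names are hypothetical.

```python
from itertools import combinations


def subsets(A):
    """All subsets of a finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_connected_set(S, T):
    """Line (34.1.3): no two open sets split S into two non-empty parts."""
    return not any(S <= O1 | O2 and not (O1 & O2 & S) and (O1 & S) and (O2 & S)
                   for O1 in T for O2 in T)


def is_totally_disconnected(X, T):
    """Definition 34.3.14: every connected subset has at most one element."""
    return all(len(S) <= 1 for S in subsets(X) if is_connected_set(S, T))


X = frozenset({1, 2, 3})
discrete = subsets(X)  # every subset is open
partial = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}), X]

print(is_totally_disconnected(X, discrete), is_totally_disconnected(X, partial))
```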


34.4. Connectedness verification methods 


34.4.1 REMARK: Practical procedures to verify that a set is connected. 

It is relatively easy to prove that a set is disconnected according to Definition 34.1.6. One merely has to
produce a single pair of sets $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ which satisfy line (34.1.4). But to demonstrate that a set is
connected, one must show that all sets $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ satisfy line (34.1.3), which is much more difficult
because Top(X) is typically a very large set. Therefore in practice, one mostly uses some combination of
the following theorems to prove connectedness of a set.


(1) The empty set and all singletons are connected sets in all topological spaces. (See Remark 34.4.2.) 


(2) A set which is connected with respect to one topology is connected with respect to all weaker topologies. 
(See Theorems 34.4.4 and 34.4.5.) 


(3) All intervals of $\mathbb{R}$ are connected in the usual topology on $\mathbb{R}$. (See Theorem 34.4.9.)


(4) The closure of a connected set is connected. (See Theorem 34.4.8 (ii).) 


(5) The union of an arbitrary set of connected sets with non-empty pairwise intersections is connected. (See 
Theorem 34.4.13.) 


(6) The union of a countable family of connected sets with non-empty intersections of adjacent pairs is 
connected. (See Theorem 34.4.15.) 


(7) The union of a countable family of non-empty sets with connected unions of adjacent pairs is connected. 
(See Theorem 34.4.16.) 


(8) The continuous image of a connected set is connected. (See Theorem 34.4.18.) 


(9) The graph of a continuous function is a connected subset of the Cartesian product of the source and
target spaces if the domain of the function is a connected set. (See Theorem 34.4.22.)


(10) The direct product of a finite family of connected sets is connected. (See Theorem 34.4.23 (ii).) 
(11) Pathwise connected sets are connected. (See Section 36.7.) 
(12) Convex subsets of real and complex linear spaces are connected. 

In most practical situations, one forms a conjecture based on intuition that a set is connected. Then one
attempts to prove the conjecture with some combination of theorems. 
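
The asymmetry described at the start of this remark can be sketched for a finite topology. The following Python fragment is only an illustration under stated assumptions: the function name `find_disconnection` and the three-point example topology are hypothetical and are not part of this book. The test inside the loop is the condition on line (34.1.4) in the form used in the proofs of this section. One returned pair proves disconnectedness, whereas connectedness is established only after every pair of open sets has been tried.

```python
from itertools import product

def find_disconnection(S, opens):
    """Return a pair (O1, O2) of open sets which disconnects S,
    or None if every pair of open sets fails (so S is connected)."""
    S = frozenset(S)
    for O1, O2 in product(opens, repeat=2):
        covers = S <= (O1 | O2)              # S is included in O1 u O2
        disjoint_on_S = not (O1 & O2 & S)    # O1 n O2 n S is empty
        if covers and disjoint_on_S and (O1 & S) and (O2 & S):
            return O1, O2
    return None

# Hypothetical example: a topology on X = {0, 1, 2}.
X = frozenset({0, 1, 2})
opens = [frozenset(), frozenset({0}), frozenset({1, 2}), X]

split_pair = find_disconnection({0, 1}, opens)   # a witness pair exists
no_pair = find_disconnection({1, 2}, opens)      # all 16 pairs fail
```

In this sketch the cost of proving connectedness grows with the square of the number of open sets, which is one way to see why theorems are preferred to direct verification.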


34.4.2 REMARK: Connectedness of the empty set and singletons. 
Any set with fewer than two elements must be connected because it cannot be partitioned into two different
subsets in Definition 34.1.6. So the empty set and all singletons are connected. (See Theorem 34.4.5 (v).) 


34.4.3 REMARK: Stronger topologies have fewer connected sets.

The more open sets you have at your disposal, the more pairs of sets you can separate. (This can be seen
from Theorem 34.3.8, for example, because larger topologies make closures of sets smaller, as can be seen
in Figure 31.9.3 in Remark 31.9.19, or in Figure 31.9.5 in Remark 31.9.23.) Therefore “larger” topologies
have more disconnected sets, which implies fewer connected sets. In the extreme case of the weakest topology
on a set, all subsets are connected. In the extreme case of the strongest topology on a set, all subsets with
two or more elements are disconnected. These observations are asserted in Theorems 34.4.4 and 34.4.5.


34.4.4 THEOREM: Inheritance of connectedness properties by weaker topologies. 
Let $T_1$ and $T_2$ be topologies on a set $X$ with $T_1 \subseteq T_2$.


(i) If $(X, T_1)$ is disconnected, then $(X, T_2)$ is disconnected.

(ii) If $(X, T_2)$ is connected, then $(X, T_1)$ is connected.
(iii) For any set $S \in \mathbb{P}(X)$, if $S$ is disconnected in $(X, T_1)$, then $S$ is disconnected in $(X, T_2)$.
(iv) For any set $S \in \mathbb{P}(X)$, if $S$ is connected in $(X, T_2)$, then $S$ is connected in $(X, T_1)$.


PROOF: For part (i), let $T_1$ and $T_2$ be topologies on a set $X$ with $T_1 \subseteq T_2$, and suppose that $(X, T_1)$ is
disconnected. Then $X = \Omega_1 \cup \Omega_2$ for some disjoint $\Omega_1, \Omega_2 \in T_1 \setminus \{\emptyset\}$ by line (34.1.2) in Definition 34.1.3.
But then $\Omega_1, \Omega_2 \in T_2 \setminus \{\emptyset\}$. So $(X, T_2)$ is disconnected by Definition 34.1.3.

Part (ii) follows from part (i) because a space is connected if and only if it is not disconnected.

For part (iii), let $T_1$ and $T_2$ be topologies on a set $X$ with $T_1 \subseteq T_2$, and suppose that $S \in \mathbb{P}(X)$ is disconnected
in the topological space $(X, T_1)$. Then by line (34.1.4) in Definition 34.1.6, $S \subseteq \Omega_1 \cup \Omega_2$ for some $\Omega_1, \Omega_2 \in T_1$
with $\Omega_1 \cap \Omega_2 \cap S = \emptyset$, $\Omega_1 \cap S \ne \emptyset$ and $\Omega_2 \cap S \ne \emptyset$. But then $\Omega_1, \Omega_2 \in T_2$, while the other conditions are
unaffected by the choice of topology. So $S$ is disconnected in $(X, T_2)$ by Definition 34.1.6.


Part (iv) follows from part (iii) because a set is connected if and only if it is not disconnected. 
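
Theorem 34.4.4 (iii) can be checked by brute force on a small example. The following Python sketch is illustrative only: the helper names `powerset` and `is_disconnected`, and the choice of a three-point set with the two topologies `T1` and `T2`, are assumptions made for this example and do not come from this book. The test applied to each pair of open sets is the condition on line (34.1.4).

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_disconnected(S, opens):
    """True if some pair of open sets satisfies line (34.1.4) for S."""
    S = frozenset(S)
    return any(
        S <= (O1 | O2) and not (O1 & O2 & S) and bool(O1 & S) and bool(O2 & S)
        for O1, O2 in product(opens, repeat=2)
    )

X = frozenset({0, 1, 2})
T1 = [frozenset(), frozenset({0}), frozenset({1, 2}), X]   # weaker topology
T2 = powerset(X)                                           # discrete topology
assert set(T1) <= set(T2)                                  # T1 is included in T2

# Part (iii): disconnected in (X, T1) implies disconnected in (X, T2).
part_iii_holds = all(is_disconnected(S, T2)
                     for S in powerset(X) if is_disconnected(S, T1))

# Sets disconnected by the stronger topology but not by the weaker one.
only_in_T2 = [S for S in powerset(X)
              if is_disconnected(S, T2) and not is_disconnected(S, T1)]
```

In this example the set {1, 2} is the only set which is connected for `T1` but disconnected for `T2`, so the converse of part (iii) fails.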


34.4.5 THEOREM:  Connectedness properties for some very simple kinds of topological spaces. 
(i) The trivial topological space $(X, T)$ with $T = \{\emptyset, X\}$ is connected for any set $X$.
(ii) The discrete topological space $(X, T)$ with $T = \mathbb{P}(X)$ is disconnected for any set $X$ with $\#(X) \ge 2$.
(iii) In the trivial topological space $(X, T)$ with $T = \{\emptyset, X\}$, every $S \in \mathbb{P}(X)$ is connected, for any set $X$.
(iv) In the discrete topological space $(X, \mathbb{P}(X))$, every $S \in \mathbb{P}(X)$ with $\#(S) \ge 2$ is disconnected.
(v) In any topological space $(X, T)$, every $S \in \mathbb{P}(X)$ with $\#(S) < 2$ is connected.
(vi) In any Hausdorff topological space $(X, T)$, every finite set $S \in \mathbb{P}(X)$ with $\#(S) \ge 2$ is disconnected.


PROOF: For part (i), let $X$ be a set, and let $T = \{\emptyset, X\}$. Let $\Omega_1, \Omega_2 \in T \setminus \{\emptyset\}$. Then $\Omega_1 = \Omega_2 = X$ and
$X \ne \emptyset$. So the condition $X = \Omega_1 \cup \Omega_2$ is satisfied, but the condition $\Omega_1 \cap \Omega_2 = \emptyset$ is necessarily false for any
set $X$. Therefore line (34.1.2) in Definition 34.1.3 is always false. Hence $X$ is connected for the topology $T$.

For part (ii), let $X$ be a set, and let $T = \mathbb{P}(X)$ with $\#(X) \ge 2$. Let $x \in X$. Let $\Omega_1 = \{x\}$ and $\Omega_2 = X \setminus \Omega_1$.
Then $\Omega_1, \Omega_2 \in T \setminus \{\emptyset\}$, $X = \Omega_1 \cup \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. Hence $(X, T)$ is disconnected by Definition 34.1.3
line (34.1.2).

For part (iii), let $X$ be a set, $T = \{\emptyset, X\}$, and $S \in \mathbb{P}(X)$. Let $\Omega_1, \Omega_2 \in T$ with $\Omega_1 \cap S \ne \emptyset$ and $\Omega_2 \cap S \ne \emptyset$.
Then $\Omega_1 \ne \emptyset$ and $\Omega_2 \ne \emptyset$. So $\Omega_1 = \Omega_2 = X$ and $X \ne \emptyset$. Therefore $S \subseteq \Omega_1 \cup \Omega_2 = X$, but $\Omega_1 \cap \Omega_2 \cap S =
X \cap S = S \ne \emptyset$. Therefore line (34.1.4) in Definition 34.1.6 is always false. Hence $S$ is connected in the
topological space $(X, T)$.

For part (iv), let $X$ be a set, $T = \mathbb{P}(X)$, and $S \in \mathbb{P}(X)$ with $\#(S) \ge 2$. Let $x_1 \in S$. Let $\Omega_1 = \{x_1\}$
and $\Omega_2 = S \setminus \{x_1\}$. Then $\Omega_1, \Omega_2 \in T$, $S = \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 \cap S = \emptyset$, $\Omega_1 \cap S \ne \emptyset$, and $\Omega_2 \cap S \ne \emptyset$
because $\#(S) \ge 2$. Hence $S$ is disconnected in $(X, T)$ by Definition 34.1.6 line (34.1.4).

For part (v), let $(X, T)$ be a topological space. Let $S \in \mathbb{P}(X)$ with $\#(S) < 2$. If $\#(S) = 0$, then clearly the
condition $\Omega_1 \cap S \ne \emptyset$ in line (34.1.4) of Definition 34.1.6 cannot be met. So $S$ is not disconnected. Therefore
$S$ is connected. Suppose that $\#(S) = 1$. Then $S = \{x\}$ for some $x \in X$. The conditions $\Omega_1 \cap S \ne \emptyset$ and
$\Omega_2 \cap S \ne \emptyset$ then imply that $x \in \Omega_1 \cap \Omega_2$. But then $\Omega_1 \cap \Omega_2 \cap S = \{x\} \ne \emptyset$. So line (34.1.4) of Definition 34.1.6
cannot be valid. Therefore $S$ is connected.

For part (vi), let $(X, T)$ be a Hausdorff topological space, and let $S \in \mathbb{P}(X)$ be a finite set with $\#(S) \ge 2$.
Then $S = \{x_1, \ldots x_n\}$ for some distinct $x_1, \ldots x_n$. Let $S_1 = \{x_1\}$ and $S_2 = \{x_2, \ldots x_n\}$. Then $S = S_1 \cup S_2$,
$S_1 \cap S_2 = \emptyset$, $S_1 \ne \emptyset$ and $S_2 \ne \emptyset$. By the Hausdorff condition, Definition 33.1.24, there exist open sets
$G_2, \ldots G_n \in T$ and $H_2, \ldots H_n \in T$ with $x_1 \in G_i$, $x_i \in H_i$ and $G_i \cap H_i = \emptyset$ for all $i = 2, \ldots n$.
Let $\Omega_1 = \bigcap_{i=2}^n G_i$ and $\Omega_2 = \bigcup_{i=2}^n H_i$. Then $\Omega_1, \Omega_2 \in T$, $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 = \emptyset$. Hence $S$ is
disconnected by Definition 34.1.6 line (34.1.4).
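
The extreme cases in Theorem 34.4.5 can be verified exhaustively on small finite sets. The following Python sketch is an illustration only. The helper names `powerset` and `is_connected` and the example spaces are assumptions of this sketch, not part of this book. The final example, a two-point space whose only open sets are the empty set, {1} and {0, 1}, is not Hausdorff, and it suggests that the Hausdorff condition in part (vi) cannot simply be dropped.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_connected(S, opens):
    """True if no pair of open sets satisfies line (34.1.4) for S."""
    S = frozenset(S)
    return not any(
        S <= (O1 | O2) and not (O1 & O2 & S) and bool(O1 & S) and bool(O2 & S)
        for O1, O2 in product(opens, repeat=2)
    )

X = frozenset({0, 1, 2})
trivial = [frozenset(), X]      # weakest topology on X
discrete = powerset(X)          # strongest topology on X

# Part (iii): every subset is connected in the trivial topology.
part_iii = all(is_connected(S, trivial) for S in powerset(X))
# Part (iv): every subset with two or more elements is disconnected.
part_iv = all(not is_connected(S, discrete)
              for S in powerset(X) if len(S) >= 2)
# Part (v): sets with fewer than two elements are always connected.
part_v = all(is_connected(S, T) for T in (trivial, discrete)
             for S in powerset(X) if len(S) < 2)

# A non-Hausdorff two-point space in which a finite two-point set is connected.
Y = frozenset({0, 1})
two_point_topology = [frozenset(), frozenset({1}), Y]
two_point_set_is_connected = is_connected(Y, two_point_topology)
```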


34.4.6 REMARK: A connected set cannot non-trivially intersect two weakly separated sets. 

Theorem 34.4.7 is useful for proving Theorems 34.4.8, 34.4.15 and 34.4.16. It is intuitively obvious that a
connected set which is included in the union of a weakly separated pair of sets cannot have a non-empty
intersection with both sets of the pair. But of course it must be proved.


34.4.7 THEOREM: A connected subset of a weakly separated pair cannot intersect both sets. 
Let $(S_1, S_2)$ be a weakly separated pair of sets in a topological space $X$. Let $K$ be a connected set in $X$ such
that $K \subseteq S_1 \cup S_2$. Then either $K \subseteq S_1$ or $K \subseteq S_2$.


PROOF: Let $(S_1, S_2)$ be a weakly separated pair of sets in a topological space $X$. Let $K$ be a connected set
in $X$ such that $K \subseteq S_1 \cup S_2$. Suppose that $K_1 = K \cap S_1 \ne \emptyset$ and $K_2 = K \cap S_2 \ne \emptyset$. Then the pair $(S_1, S_2)$
is disconnected by Theorem 34.3.8. By Definition 34.3.6, there exist $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ such that $S_1 \subseteq \Omega_1$,
$S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2) = \emptyset$. So $K_1 \subseteq \Omega_1$, $K_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap (K_1 \cup K_2) = \emptyset$. But $K = K_1 \cup K_2$.
Therefore $(K_1, K_2)$ is a disconnection for $K$ by Definition 34.3.4. So $K$ is disconnected by Theorem 34.3.5,
which is a contradiction. Therefore either $K \cap S_1 = \emptyset$ or $K \cap S_2 = \emptyset$. But $K \subseteq S_1 \cup S_2$. So either $K \subseteq S_1$
or $K \subseteq S_2$.


34.4.8 THEOREM:  Connectedness of the closure of a connected set. 
Let K be a subset of a topological space X. 


(i) If $K$ is connected, then $S$ is connected for all $S \in \mathbb{P}(X)$ satisfying $K \subseteq S \subseteq \overline{K}$.
(ii) If $K$ is connected, then $\overline{K}$ is connected.


PROOF: For part (i), let $K$ and $S$ be subsets of a topological space $X$ such that $K$ is connected and
$K \subseteq S \subseteq \overline{K}$. Suppose that $S$ is not connected. Then by Theorem 34.3.10 (ii,iv), $S = S_1 \cup S_2$ for some
weakly separated pair of non-empty sets $(S_1, S_2)$ in $X$. Therefore $K \subseteq S_1 \cup S_2$. So by Theorem 34.4.7, either
$K \subseteq S_1$ or $K \subseteq S_2$ (because $K$ is connected). Suppose that $K \subseteq S_1$. Then $\overline{K} \subseteq \overline{S_1}$ by Theorem 31.8.13 (xiv).
So $S \subseteq \overline{S_1}$. But $\overline{S_1} \cap S_2 = \emptyset$ by Definition 33.2.2 for weakly separated sets. So $S \cap S_2 = \emptyset$. So $S_2 = \emptyset$, which
contradicts the non-emptiness assumption for $S_2$. Therefore $K \not\subseteq S_1$. Similarly, $K \not\subseteq S_2$. This contradicts
the connectedness of $K$. Therefore $(S_1, S_2)$ is not a weakly separated pair for $S$. Hence $S$ is connected.


Part (ii) follows immediately from part (i). 
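
Theorem 34.4.8 (i) can be tested exhaustively on a small finite space. The Python sketch below is an illustration under stated assumptions: the helper names, the four-point set and its list of open sets are hypothetical choices for this example. The closure is computed as the intersection of all closed sets which include the given set, where closed sets are taken to be the complements of open sets.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_connected(S, opens):
    """True if no pair of open sets satisfies line (34.1.4) for S."""
    S = frozenset(S)
    return not any(
        S <= (O1 | O2) and not (O1 & O2 & S) and bool(O1 & S) and bool(O2 & S)
        for O1, O2 in product(opens, repeat=2)
    )

def closure(K, X, opens):
    """Intersection of all closed sets (complements of open sets) including K."""
    K, result = frozenset(K), frozenset(X)
    for O in opens:
        closed = frozenset(X) - O
        if K <= closed:
            result &= closed
    return result

# Hypothetical example topology on X = {0, 1, 2, 3}.
X = frozenset({0, 1, 2, 3})
opens = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({2, 3}),
         frozenset({0, 2, 3}), X]

# Pairs (K, S) with K connected, K <= S <= closure(K), and S not connected.
violations = [(K, S) for K in powerset(X) if is_connected(K, opens)
              for S in powerset(X)
              if K <= S <= closure(K, X, opens) and not is_connected(S, opens)]
```

An empty list of violations is consistent with the theorem for this one example. It is not a proof.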


34.4.9 THEOREM: All real-number intervals are connected.
All real-number intervals are connected.


PROOF: Let $I$ be a real-number interval. Then by Definition 16.1.4, $I$ satisfies

    $\forall x, y \in I,\ \forall t \in \mathbb{R},\ (x < t \text{ and } t < y) \Rightarrow t \in I.$    (34.4.1)


Suppose that $I$ is disconnected. If $\#(I) < 2$, then $I$ is connected by Theorem 34.4.5 (v). So $I$ must contain at
least two elements. By Definition 34.1.6, the disconnectedness of $I$ implies the existence of $\Omega_1, \Omega_2 \in \mathrm{Top}(\mathbb{R})$
such that $I \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 \cap I = \emptyset$, $\Omega_1 \cap I \ne \emptyset$ and $\Omega_2 \cap I \ne \emptyset$. Let $x_1 \in \Omega_1 \cap I$ and $x_2 \in \Omega_2 \cap I$. Then $x_1 \ne x_2$.
Suppose (without loss of generality) that $x_1 < x_2$. Let $J = [x_1, x_2] = \{x \in \mathbb{R};\ x_1 \le x \text{ and } x \le x_2\}$.
Then $J \subseteq I$ by Theorem 16.1.9. So $J \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 \cap J = \emptyset$, $\Omega_1 \cap J \ne \emptyset$ and $\Omega_2 \cap J \ne \emptyset$. Let
$J_1 = J \cap \Omega_1$ and $J_2 = J \cap \Omega_2$. Then $J = J_1 \cup J_2$, $J_1 \cap J_2 = \emptyset$, $x_1 \in J_1$ and $x_2 \in J_2$. Let $y = \inf(J_2)$. Then
$y \in \mathbb{R}$ is well defined because $J_2$ is non-empty and bounded below by $x_1$. Also, $y \in [x_1, x_2]$ because $x_1$ is a
lower bound and $x_2 \in J_2$. Therefore $y \in \Omega_1$ or $y \in \Omega_2$. If $y \in \Omega_1$, then $y$ cannot equal the infimum of $J_2$.
(The infimum must be higher.) If $y \in \Omega_2$, then $y$ cannot be a lower bound for $J_2$. (The infimum must be
lower.) Therefore $J$ cannot be disconnected. Hence $I$ is connected.


34.4.10 REMARK:  Notations for connected open sets, neighbourhoods, and real intervals. 

It is frequently convenient to have a notation for open and closed intervals, particularly those which contain 
particular points. Notation 34.4.11 extends this idea to general connected open and closed sets, and open 
neighbourhoods of given points. (See Notation 31.4.4 for Top(X).) 


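Then one may write $\mathrm{Top}^{\mathrm{conn}}(\mathbb{R})$ for the set of open real intervals, $\mathrm{Top}_p^{\mathrm{conn}}(\mathbb{R})$ for the set of open real intervals
containing $p \in \mathbb{R}$, $\overline{\mathrm{Top}}^{\mathrm{conn}}(\mathbb{R})$ for the set of closed real intervals, and $\overline{\mathrm{Top}}_p^{\mathrm{conn}}(\mathbb{R})$ for the set of closed real
intervals containing $p \in \mathbb{R}$.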


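34.4.11 NOTATION:
$\mathrm{Top}^{\mathrm{conn}}(X)$, for a topological space $X$, denotes the set $\{\Omega \in \mathrm{Top}(X);\ \Omega \text{ is connected}\}$.

$\mathrm{Top}_p^{\mathrm{conn}}(X)$, for a topological space $X$ and $p \in X$, denotes the set $\{\Omega \in \mathrm{Top}_p(X);\ \Omega \text{ is connected}\}$.

$\overline{\mathrm{Top}}^{\mathrm{conn}}(X)$, for a topological space $X$, denotes the set $\{F \in \overline{\mathrm{Top}}(X);\ F \text{ is connected}\}$.

$\overline{\mathrm{Top}}_p^{\mathrm{conn}}(X)$, for a topological space $X$ and $p \in X$, denotes the set $\{F \in \overline{\mathrm{Top}}_p(X);\ F \text{ is connected}\}$.

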
34.4.12 REMARK: The union of pairwise non-disjoint connected sets is connected. 
A union of connected sets may or may not be connected. But if the sets in the union are pairwise non-
disjoint, the union must be connected because each set in the union must be included in one or the other set
of any disjoint covering pair, and two sets with a common element cannot be included in different sets of
such a pair. This is stated more formally in Theorem 34.4.13.


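34.4.13 THEOREM: The union of a pairwise non-disjoint set of connected sets is connected.
Let $C$ be a set of connected sets in a topological space $X$ such that $S_1 \cap S_2 \ne \emptyset$ for all $S_1, S_2 \in C$. Then
$\bigcup C$ is connected.


PROOF: Let $X$ be a topological space. Let $C$ be a set of connected sets in $X$ such that $S_1 \cap S_2 \ne \emptyset$ for
all $S_1, S_2 \in C$. If $C = \emptyset$, then $\bigcup C = \emptyset$, and so $\bigcup C$ is connected by Theorem 34.4.5 (v). If $\#(C) = 1$, then
$C = \{S\}$ for some connected set $S \in \mathbb{P}(X)$, and so $\bigcup C = S$, which implies that $\bigcup C$ is connected.


In the general case, suppose that $A = \bigcup C$ is disconnected. Then Definition 34.1.6 implies that there exist
$\Omega_1, \Omega_2 \in \mathrm{Top}(X)$ with $A \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 \cap A = \emptyset$, $\Omega_1 \cap A \ne \emptyset$ and $\Omega_2 \cap A \ne \emptyset$. Let $S \in C$. Then
$S \subseteq A$. So $S \subseteq \Omega_1 \cup \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap S = \emptyset$. If $\Omega_1 \cap S \ne \emptyset$ and $\Omega_2 \cap S \ne \emptyset$, then $S$ is disconnected
by Definition 34.1.6, which is impossible because $S \in C$. So either $\Omega_1 \cap S = \emptyset$ or $\Omega_2 \cap S = \emptyset$. It is not
possible for both intersections to be empty because $S = S \cap S \ne \emptyset$ and $S \subseteq \Omega_1 \cup \Omega_2$. So every set $S$ in $C$ is
a subset of either $\Omega_1$ or $\Omega_2$, but not both. If at least one set $S_1 \in C$ is included in $\Omega_1$ and at least one set
$S_2 \in C$ is included in $\Omega_2$, this would contradict the assumption that $S_1 \cap S_2 \ne \emptyset$ for all $S_1, S_2 \in C$ because
$S_1 \cup S_2 \subseteq A \subseteq \Omega_1 \cup \Omega_2$, and $S_1 \cap S_2 \subseteq \Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2) \subseteq \Omega_1 \cap \Omega_2 \cap A = \emptyset$. So either $A \subseteq \Omega_1$ and $A \cap \Omega_2 = \emptyset$,
or else $A \subseteq \Omega_2$ and $A \cap \Omega_1 = \emptyset$. Either case contradicts the assumptions about $\Omega_1$ and $\Omega_2$. Hence $\bigcup C$ is
connected.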
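
Theorem 34.4.13 can be tested over every family of connected sets in a small finite space. The following Python sketch is illustrative only. The helper names and the four-point example topology are assumptions of this sketch and do not appear in this book. The hypothesis is applied in the same form as in the theorem, including the case where the two sets of a pair are the same, which excludes the empty set from every qualifying family.

```python
from itertools import combinations, product

def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_connected(S, opens):
    """True if no pair of open sets satisfies line (34.1.4) for S."""
    S = frozenset(S)
    return not any(
        S <= (O1 | O2) and not (O1 & O2 & S) and bool(O1 & S) and bool(O2 & S)
        for O1, O2 in product(opens, repeat=2)
    )

def pairwise_non_disjoint(family):
    """S1 n S2 is non-empty for all S1, S2 in the family (S1 = S2 allowed)."""
    return all(S1 & S2 for S1, S2 in product(family, repeat=2))

# Hypothetical example topology on X = {0, 1, 2, 3}.
X = frozenset({0, 1, 2, 3})
opens = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({2, 3}),
         frozenset({0, 2, 3}), X]

connected_sets = [S for S in powerset(X) if is_connected(S, opens)]

# Families satisfying the hypothesis whose union is not connected.
counterexamples = [family for family in powerset(connected_sets)
                   if pairwise_non_disjoint(family)
                   and not is_connected(frozenset().union(*family), opens)]
```

The hypothesis matters in this example: {0} and {2} are connected and disjoint, and their union {0, 2} is disconnected.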


34.4.14 REMARK: Connectedness of the union of a chained family of connected sets. 

Theorems 34.4.15 and 34.4.16 are essentially the same. (This helps to explain why the proofs are so similar.)
Theorem 34.4.16 may be proved by setting $K_i = L_i \cup L_{i+1}$ for all $i \in \mathbb{Z}_0^+$ in Theorem 34.4.15. Theorem 34.4.15
may be proved by setting $L_i = K_i$ for all $i \in \mathbb{Z}_0^+$ in Theorem 34.4.16 and applying Theorem 34.4.13 to the
pair $C = \{K_i, K_{i+1}\}$ for each $i \in \mathbb{Z}_0^+$.


34.4.15 THEOREM: The union of a next-pair-wise non-disjoint sequence of connected sets is connected.
Let $(K_i)_{i=0}^\infty$ be a family of connected sets in a topological space $X$ which satisfies $K_i \cap K_{i+1} \ne \emptyset$ for all $i \in \mathbb{Z}_0^+$.
Then $\bigcup_{i=0}^\infty K_i$ is a connected set in $X$.


PROOF: Let $(K_i)_{i=0}^\infty$ be a family of connected sets in a topological space $X$ such that $K_i \cap K_{i+1} \ne \emptyset$ for
all $i \in \mathbb{Z}_0^+$. Suppose that $S = \bigcup_{i=0}^\infty K_i$ is not a connected set in $X$. Then by Theorem 34.3.10 (ii, iv), there
exists a weakly separated pair of non-empty sets $(S_1, S_2)$ in $X$ such that $S = S_1 \cup S_2$. So by Theorem 34.4.7,
for each $i \in \mathbb{Z}_0^+$, the connected set $K_i$ is a subset of either $S_1$ or $S_2$, but not both. Since $K_i \cap K_{i+1} \ne \emptyset$,
both $K_i$ and $K_{i+1}$ must be in the same separating set ($S_1$ or $S_2$) for all $i \in \mathbb{Z}_0^+$. By induction, this set must
be the same for all $i \in \mathbb{Z}_0^+$. (Otherwise, there must be a least $i \in \mathbb{Z}^+$ for which $K_i$ is included in a different
separating set to the initial set $K_0$, which would yield a contradiction.) Therefore $S \subseteq S_1$ or $S \subseteq S_2$, which
contradicts the assumption for $(S_1, S_2)$. Hence $S$ is connected.


34.4.16 THEOREM: The union of a next-pair-wise connected sequence of non-empty sets is connected.
Let $(L_i)_{i=0}^\infty$ be a family of non-empty sets in a topological space $X$ such that $L_i \cup L_{i+1}$ is a connected set
in $X$ for all $i \in \mathbb{Z}_0^+$. Then $\bigcup_{i=0}^\infty L_i$ is a connected set in $X$.


PROOF: Let $(L_i)_{i=0}^\infty$ be a family of non-empty sets in a topological space $X$ such that $L_i \cup L_{i+1}$ is a connected
set in $X$ for all $i \in \mathbb{Z}_0^+$. Suppose that $S = \bigcup_{i=0}^\infty L_i$ is not connected. Then by Theorem 34.3.10 (ii, iv), there
exists a weakly separated pair of non-empty sets $(S_1, S_2)$ in $X$ such that $S = S_1 \cup S_2$. So by Theorem 34.4.7,
for each $i \in \mathbb{Z}_0^+$, the connected set $L_i \cup L_{i+1}$ is a subset of either $S_1$ or $S_2$, but not both. By induction, this
set must be the same for all $i \in \mathbb{Z}_0^+$. (Otherwise, there must be a least $i \in \mathbb{Z}^+$ for which $L_i \cup L_{i+1}$ is included
in a different separating set to the initial set-union $L_0 \cup L_1$, which would yield a contradiction.) Therefore
$S \subseteq S_1$ or $S \subseteq S_2$, which contradicts the assumption for $(S_1, S_2)$. Hence $S$ is connected.


34.4.17 THEOREM: The inverse image of a disconnected set-pair is a disconnected set-pair. 
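Let $X$ and $Y$ be topological spaces, $f : X \to Y$ be continuous and $(S_1, S_2)$ be a disconnected pair of sets
in $Y$. Then $(f^{-1}(S_1), f^{-1}(S_2))$ is a disconnected pair of sets in $X$.


PROOF: Let $X$ and $Y$ be topological spaces, $f : X \to Y$ be continuous and $(S_1, S_2)$ be a disconnected pair of
sets in $Y$. Then by Definition 34.3.6, $S_1, S_2 \in \mathbb{P}(Y) \setminus \{\emptyset\}$ and for some $\Omega_1, \Omega_2 \in \mathrm{Top}(Y)$, $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$,
and $\Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2) = \emptyset$. Therefore $f^{-1}(S_1), f^{-1}(S_2) \in \mathbb{P}(X) \setminus \{\emptyset\}$, $f^{-1}(\Omega_1), f^{-1}(\Omega_2) \in \mathrm{Top}(X)$ by
Definition 31.12.4, $f^{-1}(S_1) \subseteq f^{-1}(\Omega_1)$, $f^{-1}(S_2) \subseteq f^{-1}(\Omega_2)$ by Theorem 10.6.10 (ii), and $f^{-1}(\Omega_1) \cap f^{-1}(\Omega_2) \cap
(f^{-1}(S_1) \cup f^{-1}(S_2)) = f^{-1}(\Omega_1 \cap \Omega_2 \cap (S_1 \cup S_2)) = \emptyset$ by Theorem 10.6.10 (iii, iv). Hence $(f^{-1}(S_1), f^{-1}(S_2))$
is a disconnected pair of sets in $X$ by Definition 34.3.6.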


34.4.18 THEOREM: The continuous image of a connected set is connected. 
Let $X$ and $Y$ be topological spaces, $f : X \to Y$ be continuous and $A \subseteq X$ be connected. Then $f(A)$ is
connected. In other words, the continuous image of a connected set is connected.


PROOF: Suppose $f(A)$ is not connected. Then $f(A) = S_1 \cup S_2$ for some disconnected pair of sets $(S_1, S_2)$
in $Y$ by Theorem 34.3.5. So $(f^{-1}(S_1), f^{-1}(S_2))$ is a disconnected pair of sets in $X$ by Theorem 34.4.17,
and $A = f^{-1}(f(A)) = f^{-1}(S_1 \cup S_2) = f^{-1}(S_1) \cup f^{-1}(S_2)$ by Theorem 10.7.1 (ii) and Theorem 10.6.10 (iii).
Therefore $A$ is not connected by Theorem 34.3.5. Hence the contrapositive follows. In other words, $f(A)$ is
connected if $A$ is connected.
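
Theorem 34.4.18 can be checked for all continuous maps between two small finite spaces. The following Python sketch is an illustration under stated assumptions: the helper names, the two example spaces and the map `g` are hypothetical choices, not taken from this book. Continuity is tested by requiring the inverse image of every open set to be open.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_connected(S, opens):
    """True if no pair of open sets satisfies line (34.1.4) for S."""
    S = frozenset(S)
    return not any(
        S <= (O1 | O2) and not (O1 & O2 & S) and bool(O1 & S) and bool(O2 & S)
        for O1, O2 in product(opens, repeat=2)
    )

def is_continuous(f, X, opens_X, opens_Y):
    """f is a dict from X to Y; inverse images of open sets must be open."""
    open_in_X = set(opens_X)
    return all(frozenset(x for x in X if f[x] in V) in open_in_X
               for V in opens_Y)

# Hypothetical source space X and discrete target space Y.
X = frozenset({0, 1, 2})
opens_X = [frozenset(), frozenset({0}), frozenset({1, 2}), X]
Y = frozenset({"a", "b"})
opens_Y = powerset(Y)

points = sorted(X)
all_maps = [dict(zip(points, values))
            for values in product(sorted(Y), repeat=len(points))]
continuous_maps = [f for f in all_maps if is_continuous(f, X, opens_X, opens_Y)]

# Pairs (f, A) with f continuous, A connected and f(A) not connected.
violations = [(f, A) for f in continuous_maps for A in powerset(X)
              if is_connected(A, opens_X)
              and not is_connected({f[x] for x in A}, opens_Y)]

# A map which is not continuous can split a connected set.
g = {0: "a", 1: "a", 2: "b"}
g_image_of_12 = {g[x] for x in {1, 2}}
```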


34.4.19 REMARK: The range of a continuous curve is connected. 

Theorem 34.4.20 states that continuous images of real-number intervals are connected. By Definition 36.2.3, 
a continuous map $f : I \to X$ for a real-number interval $I$ is the same thing as a continuous curve in $X$. So
Theorem 34.4.20 means that the range of a continuous curve is connected.


34.4.20 THEOREM: Continuous images of real-number intervals are connected sets.
Let $f : I \to X$ be a continuous function from a real-number interval $I$ to a topological space $X$. Then $f(I)$
is connected.


PROOF: The assertion follows from Theorems 34.4.9 and 34.4.18.


34.4.21 REMARK: The restriction of the graph of a continuous function to a connected set is connected. 
Theorem 34.4.22 is an immediate corollary of Theorem 34.4.18. (Note that “the graph of $f|_A$” means the
same thing as $f|_A$ itself. See Remarks 9.1.2 and 9.1.3 for this issue.)


34.4.22 THEOREM: The graph of a continuous function with a connected domain is connected. 
Let $X$ and $Y$ be topological spaces. Let $f : X \to Y$ be continuous and $A \subseteq X$ be connected. Then the
graph of $f|_A$ is a connected subset of $X \times Y$ with the product topology.


PROOF: By Theorem 32.9.8, the map $g : X \to X \times Y$ defined by $g : x \mapsto (x, f(x))$ is continuous if $f$ is
continuous. Therefore the image $g(A)$ of $A$ under $g$, which is the graph of $f|_A$, is connected by Theorem 34.4.18.


34.4.23 THEOREM: Finite direct products of connected sets are connected.


(i) Let $X$ and $Y$ be topological spaces. Let $A \in \mathbb{P}(X)$ and $B \in \mathbb{P}(Y)$ be connected sets. Then $A \times B$ is a
connected set in the direct product topology $X \times Y$.


(ii) Let $(X_i)_{i=1}^n$ be a family of topological spaces for some $n \in \mathbb{Z}^+$. Let $(A_i)_{i=1}^n$ be a family of connected
subsets $A_i \in \mathbb{P}(X_i)$ for $i \in \mathbb{N}_n$. Then $\times_{i=1}^n A_i$ is a connected set in the direct product $\times_{i=1}^n X_i$.


PROOF: For part (i), let $X$ and $Y$ be topological spaces, with connected sets $A \subseteq X$ and $B \subseteq Y$. Suppose
that $A \times B$ is not connected in $X \times Y$. Then $A \times B = S_1 \cup S_2$, $S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$ and $\Omega_1 \cap \Omega_2 \cap (A \times B) = \emptyset$
for some $S_1, S_2 \in \mathbb{P}(X \times Y) \setminus \{\emptyset\}$ and $\Omega_1, \Omega_2 \in \mathrm{Top}(X \times Y)$. Let $(x_1, y_1) \in S_1$ and $(x_2, y_2) \in S_2$. Then
$(x_1, y_2) \in A \times B$ because $x_1 \in A$ and $y_2 \in B$. So $(x_1, y_2) \in S_1$ or $(x_1, y_2) \in S_2$. Suppose that $(x_1, y_2) \in S_1$,
and let $S_1' = \{x \in X;\ (x, y_2) \in S_1\}$, $S_2' = \{x \in X;\ (x, y_2) \in S_2\}$, $\Omega_1' = \{x \in X;\ (x, y_2) \in \Omega_1\}$ and
$\Omega_2' = \{x \in X;\ (x, y_2) \in \Omega_2\}$. Then $S_1' \cup S_2' = \{x \in X;\ (x, y_2) \in S_1 \cup S_2\} = \{x \in X;\ (x, y_2) \in A \times B\} = A$
and $\Omega_1' \cap \Omega_2' \cap A = \{x \in X;\ (x, y_2) \in \Omega_1 \cap \Omega_2\} \cap A = \{x \in X;\ (x, y_2) \in \Omega_1 \cap \Omega_2 \cap (A \times B)\} = \emptyset$. But
$\Omega_1' \in \mathrm{Top}(X)$ and $\Omega_2' \in \mathrm{Top}(X)$ by Theorem 32.10.3, and $S_1' \supseteq \{x_1\} \ne \emptyset$ and $S_2' \supseteq \{x_2\} \ne \emptyset$. So $A$ is not a
connected subset of $X$, which is a contradiction. Therefore $(x_1, y_2) \notin S_1$. Similarly $(x_1, y_2) \notin S_2$ by letting
$S_1'' = \{y \in Y;\ (x_1, y) \in S_1\}$ and $S_2'' = \{y \in Y;\ (x_1, y) \in S_2\}$, so that $S_1'' \cup S_2'' = \{y \in Y;\ (x_1, y) \in S_1 \cup S_2\} =
\{y \in Y;\ (x_1, y) \in A \times B\} = B$, and so forth. But this contradicts the assumption that $(x_1, y_2) \in S_1 \cup S_2$.
Therefore $A \times B$ is a connected subset of $X \times Y$.


Part (ii) follows from part (i) by induction. 


34.5. Connected components 


34.5.1 REMARK: Some motivation for connected components. 

Probably the principal motivation for the study of connected components of topological spaces is a kind 
of “divide and conquer” strategy. First one divides a topological space into components, each of which is
connected, and then one only needs to analyse spaces which are connected. This is of particular value in the 
study of topological and differentiable manifolds. 


It is of particular interest to show that every open set of real numbers has a countable set of components. 
(These components are open intervals in the specific case of the real numbers.) Explicit enumeration of the 
partition of an open set of real numbers into connected components is of particular importance when one 
wishes to prove that the set of non-differentiable points of a non-decreasing function has explicit measure 
zero. (See Sections 45.5 and 45.7.) Countable covers of sets by open intervals are of particular relevance in 
measure theory. (See Sections 45.1 and 45.2.) 


34.5.2 REMARK: Connected components are maximal connected subsets of a topological space. 

Definition 34.5.3 means that a connected component is a maximal connected subset. In other words, any 
proper superset will be disconnected. In fact, it is not just maximal. It is a maximum because the union 
of any two connected sets containing a common element is also connected. (The uniqueness is important 
because it helps to avoid axioms of choice.) 


34.5.3 DEFINITION: A connected component of a topological space $X$ is a connected set $K \in \mathbb{P}(X) \setminus \{\emptyset\}$
such that $\forall S \in \mathbb{P}(X),\ (K \subsetneq S \Rightarrow S \text{ is not connected})$.


34.5.4 THEOREM: Connected components of topological spaces are closed sets. 
Every connected component of a topological space X is a closed subset of X. 


PROOF: Let $K$ be a connected component of a topological space $X$. Then $\overline{K}$ is connected by
Theorem 34.4.8 (ii). But $K \subseteq \overline{K}$. If $K \ne \overline{K}$, then $K \subsetneq \overline{K}$. So $\overline{K}$ is not connected by Definition 34.5.3, which is a
contradiction. Therefore $K = \overline{K}$. Hence $K$ is closed in $X$ by Theorem 31.8.14 (v).


34.5.5 REMARK: Connected components of subsets of a topological space. 

Definition 34.5.3 is extended in Definition 34.5.6 from components of a topological space X to components of 
general subsets $A$ of such a space. These definitions seem superficially similar, but the attribute “connected”
may be interpreted with respect to the topology $\mathrm{Top}(X)$ for the whole space $X$ or with respect to the relative
topology $\mathrm{Top}(A) = \{\Omega \cap A;\ \Omega \in \mathrm{Top}(X)\}$ for the subset $A$. Luckily these two interpretations are equivalent
in view of Theorem 34.1.8.

Since connectedness is defined to be the same in the full topology as in the relative topology, Definition 34.5.6
is equivalent to defining components of $A$ with respect to the relative topology of $X$ on $A$. Therefore
definitions and theorems regarding connected components of general topological spaces usually apply to
subsets of topological spaces also.


34.5.6 DEFINITION: A connected component of a subset $A$ of a topological space $X$ is a connected set
$K \in \mathbb{P}(A) \setminus \{\emptyset\}$ such that $\forall S \in \mathbb{P}(A),\ (K \subsetneq S \Rightarrow S \text{ is not connected})$.


34.5.7 REMARK: Exclusion of empty connected components is almost redundant. 

By Theorem 34.4.5 (v), it is not necessary to explicitly exclude the empty set as a connected component in 
Definition 34.5.3 if the topological space $X$ is non-empty. When $X$ is not empty, any singleton is connected.
So the empty set is never a maximal connected set in a non-empty topological space. In other words, the 
non-emptiness requirement in Definition 34.5.3 is redundant for non-empty topological spaces. 


It is desirable to exclude the empty set as a connected component in any empty topological space so that 
the set of components will be empty for such a space. In other words, the list of components is empty if the 
space is empty. 

Although one is rarely interested in the empty topological space as an arena for serious work, the empty 
subset of a topological space does very frequently arise in serious work. Therefore the exclusion of the empty 
set as a connected component in Definition 34.5.6 is not so redundant. 


34.5.8 REMARK: Ad-hoc notation for the set of connected components of a set. 
The ad-hoc Notation 34.5.9 is useful for discussing enumerations of the connected components of subsets of
a topological space.


34.5.9 NOTATION: ConnSet(A), for a subset A of a topological space X, denotes the set of all connected 
components of A in X. 


34.5.10 REMARK: Connected components of the subset X of X are the same as connected components of X.
The remarkably trivial Theorem 34.5.11 implies that Definition 34.5.6 is an extension of Definition 34.5.3.
Hence all theorems which apply to connected components of general subsets of a topological space also apply
to connected components of the whole space.

The slightly less trivial Theorem 34.5.12 shows that whether or not a set $K$ is a connected component of a
subset $A$ of a topological space $X$ is independent of whether Definition 34.5.6 is applied to $K$ as a subset of
the subset $A$ of $X$ or Definition 34.5.3 is applied to the subset $K$ of the relative topological space $(A, T_A)$.


34.5.11 THEOREM:  Equivalence of whole-space and subset versions of connected component definitions. 
Let $X$ be a topological space and $K \in \mathbb{P}(X)$. Then $K$ is a connected component of $X$ if and only if $K$ is a
connected component of the subset $X$ of $X$.


PROOF: Let $X$ be a topological space and $K \in \mathbb{P}(X)$. If $K$ is a connected component of $X$, then substitution
of $X$ for $A$ in Definition 34.5.6 shows that $K$ is a connected component of the subset $X$ of $X$. So suppose
that $K$ is a connected component of the subset $A$ of $X$, where $A = X$. Then substitution of $X$ for $A$ in
Definition 34.5.6 shows that $K$ satisfies Definition 34.5.3, and is therefore a connected component of $X$.


34.5.12 THEOREM:  Equivalence of subset and relative-space versions of connected component definitions. 

Let $A$ be a subset of a topological space $X$ and $K \in \mathbb{P}(A)$. Then $K$ is a connected component of the set $A$ if
and only if $K$ is a connected component of the topological space $(A, T_A)$, where $T_A = \{\Omega \cap A;\ \Omega \in \mathrm{Top}(X)\}$
is the relative topology on $A$.


PROOF: Let $X$ be a topological space. Let $A \in \mathbb{P}(X)$ and $K \in \mathbb{P}(A)$. Suppose that $K$ is a connected
component of the subset $A$ of $X$. Then by Definition 34.5.6, $K$ is a non-empty subset of $A$ which satisfies
$\forall S \in \mathbb{P}(A),\ (K \subsetneq S \Rightarrow S \text{ is not connected})$. Here the connectedness of $S$ means connectedness in $X$
according to Definition 34.1.6. But by Theorem 34.1.10, this is equivalent to connectedness of $S$ in the
relative topological space $(A, T_A)$. Therefore $K$ satisfies Definition 34.5.3 with $A$ substituted for $X$. Hence
$K$ is a connected component of the topological space $(A, T_A)$.


To show the converse, suppose that $K$ is a connected component of the topological space $(A, T_A)$. Then
$K$ satisfies Definition 34.5.3 with $A$ substituted for $X$. So $K$ is a non-empty subset of $A$ which satisfies
$\forall S \in \mathbb{P}(A),\ (K \subsetneq S \Rightarrow S \text{ is not connected})$, where connectedness of $S$ here means connectedness in $(A, T_A)$,
according to Definition 34.1.6 with $A$ substituted for $X$. But by Theorem 34.1.10, this is equivalent to
connectedness of $S$ in the full topological space $(X, T_X)$. Therefore $K$ satisfies Definition 34.5.6. Hence $K$
is a connected component of the subset $A$ of $X$.


34.5.13 REMARK: Connected components of a set are closed relative to the set. 

Theorem 34.5.4 is directly applicable to any subset of a topological space if the topology on that subset
is the relative topology induced by the space. Thus one may state that every connected component of a 
subset A of a topological space X is a closed subset of A in the relative topology on A. The components 
are not necessarily closed in the topology on X. However, if the set A is closed in the topology on X, one 
obtains Theorem 34.5.14. 


34.5.14 THEOREM: All connected components of closed sets are closed sets. 
All connected components of a closed subset of a topological space X are closed in X. 


PROOF: Let $A$ be a closed subset of a topological space $X$. Let $K$ be a connected component of $A$. Then
by Theorem 34.5.12, $K$ is a connected component of the relative topological space $(A, T_A)$. So $K$ is a closed set
in $(A, T_A)$ by Theorem 34.5.4. Hence $K$ is a closed set in $X$ by Theorem 31.6.7 (iii).


34.5.15 THEOREM: A connected set with an open complement is a connected component of the space. 
Let $X$ be a topological space. Let $\Omega$ be a non-empty connected open subset of $X$ whose complement is a
non-empty open subset of $X$. Then $\Omega$ is a connected component of $X$.


PROOF: Let $\Omega$ be a connected subset of a topological space $X$ such that $\Omega$ and $X \setminus \Omega$ are non-empty open
subsets of $X$. Suppose that $\Omega$ is not a connected component of $X$. Then by Definition 34.5.3, there is a set
$S \in \mathbb{P}(X)$ such that $\Omega \subsetneq S$ and $S$ is connected. Let $\Omega_1 = \Omega$ and $\Omega_2 = X \setminus \Omega$. Then $S \subseteq \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$,
$\Omega_1 \cap S \ne \emptyset$, $\Omega_2 \cap S \ne \emptyset$, and $\Omega_1, \Omega_2 \in \mathrm{Top}(X)$. So $S$ is disconnected by Definition 34.1.6 line (34.1.4), which
is a contradiction. Hence $\Omega$ is a connected component of $X$.


34.5.16 REMARK: A maximal connected set is not necessarily disconnected from its complement. 

The idea in Definition 34.3.4 that an arbitrary subset of a topological space can be partitioned into two 
subsets which are in some topological sense “disconnected” from each other (as opposed to being simply 
disjoint) suggests that one might carry on such a process by a kind of “divide and conquer” algorithm, 
splitting each subset into a further two subsets until they can no longer be split in this way. 


It seems reasonable to conjecture that a topological space may be partitioned into subsets which cannot 
be split any further. These would then be “atomic connected components” of the space, and every set 
could then be expressed as a partition of such “indivisible” connected components. A hopeful starting 
point would be to first define an indivisible component to be (1) disconnected from the rest of the space, 
and (2) not itself disconnectable (or “splittable”). In other words, an indivisible component $K$ should be
disconnected from $X \setminus K$, but $K$ should be connected, i.e. incapable of being split further. It turns out
that this approach does not work. A maximal connected subset $K$ of a topological space $X$ might not be
disconnected from $X \setminus K$. A simple example is the set of rational numbers $\mathbb{Q}$ with the relative topology
from $\mathbb{R}$. The maximal connected sets of $\mathbb{Q}$ are the singletons, but the singletons $\{x\}$ are not disconnected
from their complements $\mathbb{Q} \setminus \{x\}$. (Maximal connected sets as in Definition 34.5.3 are disconnected from
their complements in locally connected spaces. See Remark 34.7.12 and Theorem 34.7.13.) 


34.5.17 THEOREM: The union of connected supersets of a non-empty set is connected. 
Let X be a topological space. 
(i) $\bigcup \{S \in \mathbb{P}(A);\ x \in S \text{ and } S \text{ is connected}\}$ is a connected subset of $A$ for all $A \in \mathbb{P}(X)$ and all $x \in A$.
(ii) $\bigcup \{S \in \mathbb{P}(A);\ B \subseteq S \text{ and } S \text{ is connected}\}$ is a connected subset of $A$ for all $A \in \mathbb{P}(X)$ and all non-empty
subsets $B$ of $A$.


PROOF: For part (i), let $C = \{S \in \mathbb{P}(A);\ x \in S \text{ and } S \text{ is connected}\}$ for some $x \in A$, for a subset $A$ of a
topological space $X$. Then $S_1 \cap S_2 \supseteq \{x\} \ne \emptyset$ for all $S_1, S_2 \in C$. Hence $\bigcup C$ is connected by Theorem 34.4.13.
For part (ii), let $C = \{S \in \mathbb{P}(A);\ B \subseteq S \text{ and } S \text{ is connected}\}$ for some non-empty subset $B$ of a subset $A$ of
some topological space $X$. Then $y \in B$ for some $y$. So $S_1 \cap S_2 \supseteq \{y\} \ne \emptyset$ for all $S_1, S_2 \in C$. Hence $\bigcup C$ is
connected by Theorem 34.4.13.


34.5.18 REMARK: The connected components form a partition of a topological space or subset.

The connected components of a topological space, or any subset of a topological space, constitute a partition 
of the space or subset because every element of the space must be an element of at least one component 
of the space and the union of every pair of non-disjoint connected sets is connected. This is stated more 
formally as Theorem 34.5.19. 
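
The construction described in this remark can be sketched for a finite space. In the following Python fragment, which is only an illustration, the component containing a point is computed as the union of all connected subsets which contain that point, and the resulting set of components is then checked to be a partition. The helper names (`component_of`, `components`, `is_partition`) and the four-point example topology are assumptions of this sketch and are not taken from this book. Connectedness of subsets of `A` is tested with the open sets of the whole space.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_connected(S, opens):
    """True if no pair of open sets satisfies line (34.1.4) for S."""
    S = frozenset(S)
    return not any(
        S <= (O1 | O2) and not (O1 & O2 & S) and bool(O1 & S) and bool(O2 & S)
        for O1, O2 in product(opens, repeat=2)
    )

def component_of(x, A, opens):
    """Union of all connected subsets of A which contain the point x."""
    return frozenset().union(*(S for S in powerset(A)
                               if x in S and is_connected(S, opens)))

def components(A, opens):
    """Set of connected components of the subset A (empty if A is empty)."""
    return {component_of(x, A, opens) for x in A}

def is_partition(parts, A):
    """Non-empty, pairwise disjoint sets whose union equals A."""
    parts = list(parts)
    covers = frozenset().union(*parts) == frozenset(A)
    disjoint = all(not (P & Q) for P, Q in combinations(parts, 2))
    return covers and disjoint and all(parts)

# Hypothetical example topology on X = {0, 1, 2, 3}.
X = frozenset({0, 1, 2, 3})
opens = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({2, 3}),
         frozenset({0, 2, 3}), X]

components_of_X = components(X, opens)
components_of_A = components({1, 2, 3}, opens)
```

For the empty subset, `components` returns the empty set, which matches the convention that the empty set is not a connected component.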


34.5.19 THEOREM: Partitioning sets into connected components.

(i) Let $K_1$ and $K_2$ be connected components of a topological space $X$. Then either $K_1 = K_2$ or $K_1 \cap K_2 = \emptyset$.

(ii) Let $K_1$ and $K_2$ be connected components of a subset of a topological space $X$. Then either $K_1 = K_2$
or $K_1 \cap K_2 = \emptyset$.

(iii) Let $X$ be a topological space. Then $\bigcup \{S \in \mathbb{P}(A);\ x \in S \text{ and } S \text{ is a connected subset of } X\}$ is a
connected component of $A$ for all $A \in \mathbb{P}(X)$ and $x \in A$.

(iv) Let $A$ be a subset of a topological space $X$. Then for all $x \in A$, there is a connected component $K$ of $A$
such that $x \in K$.

(v) Let $A$ be a subset of a topological space $X$. Define a relation "$\equiv$" on $A$ so that $x_1 \equiv x_2$ if and only if
there is a connected component $K$ of $A$ such that $x_1 \in K$ and $x_2 \in K$. Then "$\equiv$" is an equivalence
relation on $A$.

(vi) The set of connected components of a subset $A$ of a topological space $X$ constitutes a partition of $A$.


PROOF: For part (i), let $K_1$ and $K_2$ be connected components of a topological space $X$. Suppose that
$K_1 \cap K_2 \ne \emptyset$. Then $K = K_1 \cup K_2$ is a connected subset of $X$ by Theorem 34.4.13 with $C = \{K_1, K_2\}$.
If $K_1 \ne K_2$, then either $K_1 \subsetneq K$ or $K_2 \subsetneq K$. So by Definition 34.5.3, either $K_1$ or $K_2$ is not a connected
component of $X$, which contradicts the assumptions. Hence either $K_1 = K_2$ or $K_1 \cap K_2 = \emptyset$.


For part (ii), let $K_1$ and $K_2$ be connected components of a subset $A$ of a topological space $X$. Suppose
that $K_1 \cap K_2 \ne \emptyset$. Then $K = K_1 \cup K_2$ is a connected set in $X$ by Theorem 34.4.13 with $C = \{K_1, K_2\}$.
If $K_1 \ne K_2$, then either $K_1 \subsetneq K$ or $K_2 \subsetneq K$. So by Definition 34.5.6, either $K_1$ or $K_2$ is not a connected
component of $A$, which contradicts the assumptions. Hence either $K_1 = K_2$ or $K_1 \cap K_2 = \emptyset$.


For part (iii), let $A$ be a subset of a topological space $X$. Let $x \in A$. Then $\{x\}$ is a connected subset of $X$
by Theorem 34.4.5 (v). Let $K = \bigcup \{S \in \mathbb{P}(A);\ x \in S \text{ and } S \text{ is a connected subset of } X\}$. Then $x \in K$, and
$K$ is connected by Theorem 34.4.13. Suppose that $K'$ is a connected subset of $A$ such that $K \subseteq K'$. Then
$K' \subseteq K$ by the definition of $K$. So $K' = K$. Therefore $\forall S \in \mathbb{P}(A),\ (K \subsetneq S \Rightarrow S \text{ is not connected})$. Hence
$K$ is a connected component of $A$ by Definition 34.5.6.


Part (iv) follows directly from part (iii).


For part (v), let $A$ be a subset of a topological space $X$. Define the relation "$\equiv$" on $A$ as indicated. Then
$x \equiv x$ for all $x \in A$ by part (iv). Clearly $x_1 \equiv x_2$ if and only if $x_2 \equiv x_1$ for all $x_1, x_2 \in A$ by the
symmetry of the definition. Suppose that $x_1 \equiv x_2$ and $x_2 \equiv x_3$ for some $x_1, x_2, x_3 \in A$. Then for some
connected components $K_1$ and $K_2$ of $A$, $x_1, x_2 \in K_1$ and $x_2, x_3 \in K_2$. But then $K_1 \cap K_2 \supseteq \{x_2\} \ne \emptyset$. So
$K_1 = K_2$ by part (ii). Therefore $x_1, x_3 \in K_1$, and so $x_1 \equiv x_3$. Hence "$\equiv$" is an equivalence relation on $A$ by
Definition 9.8.2.

For part (vi), let $X$ be a topological space. Let $P = \{K \in \mathbb{P}(A);\ K \text{ is a connected component of } A\}$
for $A \in \mathbb{P}(X)$. Then $\bigcup P = A$ by part (iv), and for all $K_1, K_2 \in P$, either $K_1 = K_2$ or $K_1 \cap K_2 = \emptyset$ by
part (ii). Hence $P$ is a partition of $A$ by Definition 8.7.12.


34.5.20 REMARK: A set-union expression for connected components of sets. 

Theorem 34.5.21 apparently gives a condition for a set $K$ to be a connected component of a subset $A$ of a
topological space $X$. However, since the set $K$ appears on both sides of equation (34.5.1), the condition is
only useful for verifying that a set is a connected component, not for "constructing" or "computing" it. For
this computational task, Theorem 34.6.6 is more useful because it "constructs" a connected component from
a single element.


34.5.21 THEOREM: Expression for the connected component of a set as a union of connected supersets.
A set $K$ is a connected component of a subset $A$ of a topological space $X$ if and only if $K \in \mathbb{P}(A) \setminus \{\emptyset\}$ and


$K = \bigcup \{S \in \mathbb{P}(A);\ K \subseteq S \text{ and } S \text{ is connected}\}$. (34.5.1)


PROOF: Let $K$ be a connected component of a subset $A$ of a topological space $X$. Then $K \in \mathbb{P}(A) \setminus \{\emptyset\}$
and $K$ is connected, by Definition 34.5.6. So $K \in \{S \in \mathbb{P}(A);\ K \subseteq S \text{ and } S \text{ is connected}\}$. Therefore
$K \subseteq \bigcup \{S \in \mathbb{P}(A);\ K \subseteq S \text{ and } S \text{ is connected}\}$. Let $y \in \bigl(\bigcup \{S \in \mathbb{P}(A);\ K \subseteq S \text{ and } S \text{ is connected}\}\bigr) \setminus K$.
Then $y \in S$ for some $S \in \mathbb{P}(A)$ such that $K \subsetneq S$ and $S$ is connected. But this contradicts Definition 34.5.6.
So $\bigcup \{S \in \mathbb{P}(A);\ K \subseteq S \text{ and } S \text{ is connected}\} \subseteq K$. Hence $K = \bigcup \{S \in \mathbb{P}(A);\ K \subseteq S \text{ and } S \text{ is connected}\}$.


For the converse, let $K \in \mathbb{P}(A) \setminus \{\emptyset\}$ satisfy $K = \bigcup C$ with $C = \{S \in \mathbb{P}(A);\ K \subseteq S \text{ and } S \text{ is connected}\}$.
Then $K$ is connected by Theorem 34.5.17 (ii). Suppose that $S' \in \mathbb{P}(A)$ is a connected set such that $K \subsetneq S'$.
Then $S' \in C$. So $S' \subseteq \bigcup C$. So $S' \subseteq K$, which is a contradiction. Therefore $S'$ must be disconnected.
It follows that $\forall S \in \mathbb{P}(A),\ (K \subsetneq S \Rightarrow S \text{ is not connected})$. Hence $K$ is a connected component of $A$ by
Definition 34.5.6.


34.5.22 THEOREM: A set is a connected component if and only if it is non-empty and equals the union of its connected supersets.
A set $K$ is a connected component of a topological space $X$ if and only if $K$ is a non-empty subset of $X$ and
$K = \bigcup \{S \in \mathbb{P}(X);\ K \subseteq S \text{ and } S \text{ is connected}\}$.


PROOF: The result follows from Theorems 34.5.21 and 34.5.11. 


34.6. The connected component map 


34.6.1 REMARK: The connected component of a point in a subset of a topological space. 

Since Theorem 34.5.19 (vi) shows that the set of connected components of a subset $A$ of a topological space
$X$ is a partition of $A$, the map which sends each point $x \in A$ to the connected component of $A$ which
contains $x$ is a well-defined function from $A$ to the set of connected components of $A$.


34.6.2 DEFINITION: The connected component of a point $x$ in a subset $A$ of a topological space $X$ is the
connected component of $A$ which contains $x$.


The connected component map for a subset $A$ of a topological space $X$ is the map which sends each
element $x \in A$ to the connected component of $A$ which contains $x$.


34.6.3 NOTATION:
$\mathrm{Conn}_A$, for $A \in \mathbb{P}(X)$, for a topological space $X$, denotes the connected component map for $A$ in $X$.


Thus $\mathrm{Conn}_A(x)$, for $x \in A$ and $A \in \mathbb{P}(X)$, for a topological space $X$, denotes the connected component of $A$
in $X$ which contains $x$.


34.6.4 REMARK: Possible extension of the connected component map. 

The definition and notation for the connected component map $\mathrm{Conn}_A : A \to \mathbb{P}(A) \setminus \{\emptyset\}$ may be extended
from a subset $A$ to the whole space $X$ so that $\mathrm{Conn}_A(x) = \emptyset$ whenever $x \in X \setminus A$. (Since the empty set
cannot be a connected component, as mentioned in Remark 34.5.5, there is no danger of ambiguity.) One
small advantage of such an extension would be that all connected component maps $\mathrm{Conn}_A$ for $A \in \mathbb{P}(X)$
would have the same domain and target space. Thus $\mathrm{Conn}_A : X \to \mathbb{P}(X)$ for all $A \in \mathbb{P}(X)$. But there is no
really compelling reason to do this.


34.6.5 REMARK: Connected component containing given point is like an image editor “fill” algorithm. 
Theorem 34.6.6 states that a set $S \in \mathbb{P}(A)$ is a connected component of a subset $A$ of a topological space $X$
when it is equal to the maximum connected subset of $A$ which contains some point $x \in S$. This is analogous to the
"fill" algorithm which is found in most computer graphics image editing tools, which computes the largest
connected set containing a given point. The maximality criterion in Definition 34.5.6 means that not even
a single point can be added to a connected component without losing connectedness. In other words, any
element of $A$ which lies outside a connected component must be disconnected from that component.
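
The "fill" analogy can be made concrete with a minimal Python sketch. It is an illustration only, under assumptions which are not part of the text above: the "space" is a finite rectangular grid of cells, two cells are adjacent when they share an edge (4-neighbour adjacency), and a "connected set" is a set of equal-valued cells which can be joined by chains of adjacent cells. This graph-style connectedness is an analogue of topological connectedness, not the definition used in this chapter. The names `fill` and `GRID` are hypothetical.

```python
from collections import deque


def fill(grid, start):
    """Return the set of cells reachable from `start` by 4-neighbour steps
    through cells which hold the same value as `start`."""
    rows, cols = len(grid), len(grid[0])
    colour = grid[start[0]][start[1]]
    component = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in component and grid[nr][nc] == colour:
                component.add((nr, nc))
                queue.append((nr, nc))
    return component


# A small hypothetical image with two '#' regions and one '.' region.
GRID = [
    "##..",
    "#..#",
    "..##",
]
```

Starting from any cell of a region, `fill` returns the same set, which mirrors the fact that the connected component of a point does not depend on which of its points is chosen as the "seed".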


34.6.6 THEOREM: Connected components of sets are maximal connected supersets of some element. 
A set $S$ is a connected component of a subset $A$ of a topological space $X$ if and only if $S \in \mathbb{P}(A)$ and


$\exists x \in S,\ S = \bigcup \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$.


PROOF: Let $S$ be a connected component of a subset $A$ of a topological space $X$. Then $S \in \mathbb{P}(A) \setminus \{\emptyset\}$ and
$S$ is connected by Definition 34.5.6. So $x \in S$ for some $x$, and $S \in \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$
for such $x$. So $S \subseteq \bigcup \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$. Let $y \in \bigl(\bigcup \{K \in \mathbb{P}(A);\ x \in K \text{ and }
K \text{ is connected}\}\bigr) \setminus S$. Then $y \in K \setminus S$ for some $K \in \mathbb{P}(A)$ such that $x \in K$ and $K$ is connected. Let
$K' = K \cup S$. Then $S \subsetneq K'$. But $K'$ is connected by Theorem 34.4.13. This contradicts Definition 34.5.3. So
$\bigcup \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\} \subseteq S$. Hence $S = \bigcup \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$.


To show the converse, let $S \in \mathbb{P}(A)$ satisfy $S = \bigcup C$ with $C = \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$ for
some $x \in S$. Then $S \in \mathbb{P}(A) \setminus \{\emptyset\}$, and $S$ is connected by Theorem 34.5.17 (i). Suppose that $K' \in \mathbb{P}(A)$ is connected,
with $S \subsetneq K'$. Then $K' \in C$. So $K' \subseteq \bigcup C$. So $K' \subseteq S$, which is a contradiction. Therefore $K'$ must be
disconnected. Thus $\forall K \in \mathbb{P}(A),\ (S \subsetneq K \Rightarrow K \text{ is not connected})$. Hence $S$ is a connected component of $A$
by Definition 34.5.6.


34.6.7 THEOREM: Connected components of a space are maximal connected supersets of some point. 
A set $S$ is a connected component of a topological space $X$ if and only if $S \in \mathbb{P}(X)$ and


$\exists x \in S,\ S = \bigcup \{K \in \mathbb{P}(X);\ x \in K \text{ and } K \text{ is connected}\}$.


PROOF: The result follows from Theorems 34.6.6 and 34.5.11. 


34.6.8 REMARK: Explicit expressions for connected components containing given points. 

Theorem 34.6.9 (i) gives an expression for the connected component of a set which contains a given point.
The fact that such an explicit set-theoretic formula can be given for connected components is important for
avoiding axioms of choice. (Part (ii) is essentially identical to part (i).)


34.6.9 THEOREM: Expressions for the connected component of a point in a set. 
Let X be a topological space. 


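(i) $\forall A \in \mathbb{P}(X),\ \forall x \in A,\ \mathrm{Conn}_A(x) = \bigcup \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$.
(ii) $\forall A \in \mathbb{P}(X),\ \forall x \in A,\ \mathrm{Conn}_A(x) = \{y \in A;\ \exists K \in \mathbb{P}(A),\ \{x, y\} \subseteq K \text{ and } K \text{ is connected}\}$.


PROOF: For part (i), let $C = \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$ for $x \in A$, for some subset $A$ of
a topological space $X$. Then $\bigcup C$ is a connected component of $A$ by Theorem 34.5.19 (iii), and $x \in \bigcup C$
because $\{x\} \in C$. However, $\mathrm{Conn}_A(x)$ is the unique connected component $K$ of $A$ such that $x \in K$, by
Theorem 34.5.19 (vi), Definition 34.6.2 and Notation 34.6.3. Hence $\mathrm{Conn}_A(x) = \bigcup C$.


For part (ii), let $X$ be a topological space and $C = \{K \in \mathbb{P}(A);\ x \in K \text{ and } K \text{ is connected}\}$ for some
$A \in \mathbb{P}(X)$ and $x \in A$. Then $\bigcup C = \{y;\ \exists K \in C,\ y \in K\} = \{y;\ \exists K \in \mathbb{P}(A),\ x \in K \text{ and } K \text{ is connected and }
y \in K\} = \{y;\ \exists K \in \mathbb{P}(A),\ \{x, y\} \subseteq K \text{ and } K \text{ is connected}\}$. But this is a subset of $A$ because every $y$ in this
set satisfies $\exists K \in \mathbb{P}(A),\ y \in K$. So $\bigcup C = \{y \in A;\ \exists K \in \mathbb{P}(A),\ \{x, y\} \subseteq K \text{ and } K \text{ is connected}\}$. Hence
the proposition follows from part (i).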
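
For a finite topological space, the formula in part (i) can be evaluated literally by brute force. The following Python sketch is an illustration under stated assumptions, not a practical algorithm: the topology is given as an explicit list of open sets (including the empty set and the whole space), a set is treated as connected when it has no partition into two non-empty relatively open subsets, and the cost is exponential in the size of $A$. The names `is_connected`, `conn`, `OPENS` and `X` are hypothetical.

```python
from itertools import combinations


def is_connected(S, opens):
    """True if S has no partition into two non-empty relatively open sets."""
    S = frozenset(S)
    traces = {S & U for U in opens}      # open sets of the relative topology on S
    for P in traces:
        Q = S - P
        if P and Q and Q in traces:      # (P, Q) is a disconnection of S
            return False
    return True


def conn(A, x, opens):
    """Union of all connected subsets K of A with x in K, as in part (i)."""
    A = frozenset(A)
    if x not in A:
        raise ValueError("x must be an element of A")
    items = sorted(A)
    result = set()
    for r in range(1, len(items) + 1):
        for K in combinations(items, r):
            if x in K and is_connected(K, opens):
                result |= set(K)
    return frozenset(result)


# A hypothetical topology on X = {0, 1, 2, 3}.
X = frozenset({0, 1, 2, 3})
OPENS = [frozenset(U) for U in
         [(), (0,), (0, 1), (2, 3), (0, 2, 3), (0, 1, 2, 3)]]
```

In this example the components of the whole space are $\{0, 1\}$ and $\{2, 3\}$, whereas the components of the subset $\{1, 2, 3\}$ are $\{1\}$ and $\{2, 3\}$. Collecting `conn(A, x, OPENS)` over all $x \in A$ gives a family of pairwise disjoint sets which covers $A$.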


34.6.10 THEOREM: Some properties of the set of connected components of all elements of a space.
Let $X$ be a topological space and $A \in \mathbb{P}(X)$.
(i) $\{\mathrm{Conn}_A(x);\ x \in A\}$ is the set of all connected components of $A$.


(ii) $\{\mathrm{Conn}_A(x);\ x \in A\}$ is a partition of $A$.


PROOF: For part (i), let $X$ be a topological space and $A \in \mathbb{P}(X)$. Let $P = \{\mathrm{Conn}_A(x);\ x \in A\}$. If
$S \in P$, then $S = \mathrm{Conn}_A(x)$ for some $x \in A$, and so $S$ is a connected component of $A$ by Notation 34.6.3
and Definition 34.6.2. But if $S$ is a connected component of $A$, then $S$ contains some element $x \in A$ by
Definition 34.5.3, and so $S = \mathrm{Conn}_A(x)$ by Definition 34.6.2 and Notation 34.6.3, whereby $S \in P$. So
$\{\mathrm{Conn}_A(x);\ x \in A\}$ is the set of all connected components of $A$.


Part (ii) follows from part (i) and Theorem 34.5.19 (vi). 


34.6.11 REMARK: Open connected components may be "filled" from a point by open sets. 

Theorem 34.6.12 (i) differs from Theorem 34.6.7 by assuming that the component $\Omega$ is an open set, and then
it is shown that $\Omega$ can be "filled" as the union of all open connected neighbourhoods of some element $x \in \Omega$,
as opposed to general connected sets which contain $x$.


34.6.12 THEOREM: Open connected components are maximal connected neighbourhoods of some point.
Let $X$ be a topological space.


(i) If $\Omega \in \mathrm{Top}(X)$ is a connected component of $X$, then
$\exists x \in \Omega,\ \Omega = \bigcup \{G \in \mathrm{Top}_x(X);\ G \text{ is connected}\}$.


PROOF: For part (i), let $\Omega \in \mathrm{Top}(X)$ be a connected component of a topological space $X$. It follows from
Theorem 34.6.7 that $\Omega = \bigcup \{G \in \mathbb{P}(X);\ x \in G \text{ and } G \text{ is connected}\}$ for some $x \in \Omega$. From $\mathrm{Top}(X) \subseteq
\mathbb{P}(X)$, it follows that $\Omega \supseteq \bigcup \{G \in \mathrm{Top}(X);\ x \in G \text{ and } G \text{ is connected}\}$. But $\Omega \in \mathrm{Top}(X)$, $x \in \Omega$ and
$\Omega$ is connected. So $\Omega \in \{G \in \mathrm{Top}(X);\ x \in G \text{ and } G \text{ is connected}\}$. Therefore $\Omega \subseteq \bigcup \{G \in \mathrm{Top}(X);\
x \in G \text{ and } G \text{ is connected}\}$. Hence $\exists x \in \Omega,\ \Omega = \bigcup \{G \in \mathrm{Top}_x(X);\ G \text{ is connected}\}$.


34.6.13 REMARK: Connected component may not equal largest connected open set containing a point. 

It may not be immediately obvious that the converse of Theorem 34.6.12 (i) is false in a general topological 
space. (That is, the union of all connected open neighbourhoods of a point is not necessarily a connected 
component of the space.) A counterexample is provided by Example 34.6.14. (This helps explain why the 
converse is so difficult to prove!) Theorem 34.7.11 (i) asserts the valid converse for locally connected spaces. 


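34.6.14 EXAMPLE: Let $X = K \cup L$, where $K = \{(x, y) \in \mathbb{R}^2;\ 0 < y < x \text{ and } \exists k \in \mathbb{Z}^+,\ x = 1 - (0.75)^k\}$ and
$L = \{(x, y) \in (0,1]^2;\ x = 1 \text{ or } y = 1\}$, with the relative topology of $\mathbb{R}^2$. (This is illustrated in Figure 34.6.1.)


[Figure: the set $X$ in the $(x, y)$ plane, with both axes marked from 0 to 1, annotated "not locally connected".]
$x = 1 - (0.75)^k$, $k \in \mathbb{Z}^+$, and $0 < y < x$,
or (($x = 1$ or $y = 1$) and $(x, y) \in (0, 1]^2$)


Figure 34.6.1 Badly connected topological space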


Let $z = (\tfrac{1}{2}, 1) \in \mathbb{R}^2$. Then $z \in X$ and the connected component of $z$ in $X$ is the subset $L$ of $X$ because $L$
is connected and if any point of $K$ is added to the set $L$, then that point will be in a connected component
of $X$ which is disconnected from $L$. However, $L$ is not an open subset of $X$ because any neighbourhood of
any point of the subset $\{1\} \times (0,1)$ of $L$ contains points of $K$. The set $\bigcup \{G \in \mathrm{Top}_z(X);\ G \text{ is connected}\}$
is equal to $(0,1) \times \{1\}$, which is a proper subset of the connected component $L$ of $X$ which contains $z$.
(If $z = (1, y) \in X$ for some $y \in (0,1)$, then $\bigcup \{G \in \mathrm{Top}_z(X);\ G \text{ is connected}\} = \emptyset$ because then $z$ has no
connected open neighbourhoods.) Thus $X$ provides a counterexample for the converse of Theorem 34.6.12 (i).


The primary reason for the failure of $\bigcup \{G \in \mathrm{Top}_z(X);\ G \text{ is connected}\}$ to be a connected component for
$z = (\tfrac{1}{2}, 1)$ is the fact that the set $X$ is not locally connected at some points of $L$. (See Section 34.7 for locally
connected topological spaces.) Another way to put this is that $L$ is an open set in a neighbourhood of $z$,
but it is not open on the subset $\{1\} \times (0,1]$ of $L$.


34.6.15 REMARK:  Enumerations of connected components of sets. 

It is often desirable to be able to enumerate the connected components of sets. This is not difficult for sets
which have a finite number of components. When the set of connected components is known to be infinite,
one may distinguish three cases. (1) The set is countable (and may be enumerated as an infinite sequence).
(2) The set is uncountable. (3) The set is not finite, but the existence of some kind of enumeration can
be proved with an axiom of choice. Definition 34.6.16 is concerned with the finite case and the countably
infinite case (1).




34.6.16 DEFINITION: A (countable) connected component enumeration for a set $S$ in a topological space $X$
is a map $f : I \to \mathbb{P}(S) \setminus \{\emptyset\}$, for some $I \in \omega^+$, such that


(i) $f(i)$ is a connected subset of $S$ for all $i \in I$,
(ii) $\bigcup_{i \in I} f(i) = S$,
(iii) $\forall i, j \in I,\ (i \ne j \Rightarrow f(i) \cap f(j) = \emptyset)$.


34.6.17 REMARK: Countable connected component enumerations of open sets in separable spaces. 

It might seem intuitively that in a separable space, the connected components of an open set should be
countable. (See Theorem 32.7.8 for the case of the real numbers.) But the components of an open set are
not necessarily open. So this approach could fail. Example 34.6.18 shows that it is in fact unsuccessful in
general. However, open sets do have a countable set of components in separable spaces if the extra condition
"locally connected" is added. (See Theorem 34.8.2.)


34.6.18 EXAMPLE: Let $X = \mathbb{R} \setminus \mathbb{Q}$ with the relative topology of $\mathbb{R}$. Let $S = \{\pi + q;\ q \in \mathbb{Q}\}$. Then $S$ is
countable and dense in $X$. So $X$ is a separable space. However, the set of components of $X$ is uncountable
because every singleton of $X$ is a connected component.


34.7. Local connectedness 


34.7.1 REMARK: Connectedness classes and topological manifolds. 
Some relations between connected spaces and (the underlying topological spaces of) topological manifolds 
are illustrated in Figure 34.7.1. 


[Figure: diagram of classes of spaces $(X, T_X)$ with boxes labelled "topological space", "Hausdorff space", "locally connected space", "connected space", "locally Cartesian space" and "topological manifold".]


Figure 34.7.1 Relations between connected spaces and topological manifolds


Locally Cartesian spaces, and topological manifolds in particular, are locally connected. Therefore in the 
context of topological manifolds (and differentiable manifolds in particular), local connectedness never fails 
to be true. However, local connectedness is highly relevant to subsets of topological manifolds. 


By contrast, connectedness is not implied by the specifications of locally Cartesian spaces or topological 
manifolds. However, each component of such a space can typically be treated as an independent space or 
manifold since the disconnection of the components typically insulates their properties from each other. 
Thus it could be argued that both connectedness and local connectedness have very limited relevance for 
locally Cartesian spaces and manifolds. The relevance of both connectedness concepts is typically to subsets 
of such spaces, not to whole spaces. 


34.7.2 DEFINITION: A locally connected point in a topological space $X$ is a point $x \in X$ such that
$\forall \Omega \in \mathrm{Top}_x(X),\ \exists U \in \mathrm{Top}_x(X),\ U \subseteq \Omega \text{ and } U \text{ is connected}$.


A topological space $X$ is locally connected at a point $x \in X$ if $x$ is a locally connected point in $X$.


34.7.3 DEFINITION: A locally connected (topological) space is a topological space $X$ such that


$\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists U \in \mathrm{Top}_x(X),$
$U \subseteq \Omega \text{ and } U \text{ is connected}$.


In other words, $X$ is locally connected at all points of $X$.


34.7.4 REMARK: Local connectedness of a subset of a space is defined within the relative topology.

It makes good sense to define local connectedness of a subset of a topological space as local connectedness of
the corresponding relative topological space. By Theorem 34.1.10, connectedness of $U \cap S$ in $X$ is the same
as the connectedness of $U \cap S$ in the relative topology on $S$ in line (34.7.1). The set-inclusion $U \cap S \subseteq \Omega \cap S$
is equivalent to $(U \setminus \Omega) \cap S = \emptyset$. Otherwise there is little that can be done to reduce or simplify line (34.7.1).
Local connectedness of subsets is rarely defined explicitly in textbooks.


34.7.5 DEFINITION: A locally connected subset of a topological space $X$ is a subset $S$ of $X$ such that $S$ is
a locally connected topological space with the relative topology from $X$. In other words,


$\forall x \in S,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists U \in \mathrm{Top}_x(X),$
$U \cap S \subseteq \Omega \cap S \text{ and } U \cap S \text{ is a connected subset of } X$. (34.7.1)


34.7.6 REMARK: Connectedness and local connectedness are independent concepts. 

A connected space is not necessarily locally connected, and vice versa. It is easy to exhibit topological
spaces which are locally connected but not connected. For example, any Hausdorff space containing a finite
number of points is locally connected, but it is not connected if it contains two or more points. Any discrete space
containing at least two points is locally connected but not connected. Some connected spaces
which are not locally connected are given in Examples 34.7.7 and 34.7.8.


34.7.7 EXAMPLE: Comb space. Connected, but not locally connected. 
A simple example of a connected space which is not locally connected is the “comb space” which is illustrated 
in Figure 34.7.2. 


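[Figure: the comb space in the $(x, y)$ plane, with both axes marked from 0 to 1, annotated "not locally connected".]
$x = 1 - (0.75)^k$, $k \in \mathbb{Z}^+$, or $x = 1$, or $y = 1$


Figure 34.7.2 Connected but not locally connected "comb space"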


34.7.8 EXAMPLE: Closure of graph of sine-of-reciprocal function. Connected, but not locally connected. 
Define a topological space $X$ based on the sine-of-reciprocal function as the set


$X = \{(x, y) \in \mathbb{R}^2;\ (x = 0 \wedge y \in [-1, 1]) \vee (x \in (0, 1] \wedge y = \sin(\pi/(2x)))\}$
$\phantom{X} = (\{0\} \times [-1, 1]) \cup \{(x, y) \in \mathbb{R}^2;\ x \in (0, 1] \text{ and } y = \sin(\pi/(2x))\}$


with the relative topology of $\mathbb{R}^2$. (This is illustrated in Figure 34.7.3.)


$X$ is connected (by Theorem 34.4.18) because it is the closure of the graph of $\sin(\pi/(2x))$ for $x \in (0, 1]$. All
neighbourhoods $\Omega \in \mathrm{Top}_z(X)$ of points $z = (0, y) \in \{0\} \times [-1, 1]$ are disconnected into an infinite number
of components if $\Omega \subseteq B_{(0,y),1/2}$. Therefore the set is not locally connected at these points.


[Figure: the set $X = (\{0\} \times [-1, 1]) \cup \{(x, y) \in (0, 1] \times \mathbb{R};\ y = \sin(\pi/(2x))\}$ in the $(x, y)$ plane, with $y$ from $-1$ to 1 and $x$ from 0 to 1, annotated "not locally connected".]


Figure 34.7.3 Connected set which is not locally connected


34.7.9 THEOREM: Some equivalent conditions for a space to be locally connected.
Let $X$ be a topological space. Then the following propositions are equivalent.


(i) $X$ is locally connected.
(ii) All connected components of open sets in $X$ are open.
(iii) $X$ has an open base whose elements are all connected sets.
(iv) The set of all connected open sets in $X$ is an open base for $X$.
(v) $\forall x \in X,\ \forall U \in \mathrm{Top}_x(X),\ \mathrm{Conn}_U(x) \in \mathrm{Top}_x(X)$.


PROOF: To show the equivalence of propositions (i) and (ii), let $X$ be a locally connected topological
space, let $\Omega \in \mathrm{Top}(X)$ and let $K$ be a connected component of $\Omega$. Let $x \in K$. Then $K = \mathrm{Conn}_\Omega(x)$
by Notation 34.6.3. So $K = \bigcup \{S \in \mathbb{P}(\Omega);\ x \in S \text{ and } S \text{ is connected}\}$ by Theorem 34.6.9 (i). Since $X$
is locally connected, there is a connected neighbourhood $U \in \mathrm{Top}_x(X)$ such that $U \subseteq \Omega$. Then $U \subseteq K$
because $U \in \{S \in \mathbb{P}(\Omega);\ x \in S \text{ and } S \text{ is connected}\}$. Therefore $x \in \mathrm{Int}(K)$ by Theorem 31.8.13 (iii). Thus
$\forall x \in K,\ x \in \mathrm{Int}(K)$, and so $K \subseteq \mathrm{Int}(K)$. So $K \in \mathrm{Top}(X)$ by Theorem 31.8.14 (ii). Hence (i) implies (ii).


Now suppose that all connected components of open sets in $X$ are open. Let $x \in X$ and $\Omega \in \mathrm{Top}_x(X)$.
Let $U = \mathrm{Conn}_\Omega(x)$ be the connected component of $\Omega$ which contains $x$. Then $U \in \mathrm{Top}_x(X)$. But $U$ is a
non-empty connected subset of $\Omega$ by Definition 34.5.6. Thus every neighbourhood of every point $x \in X$
includes a connected neighbourhood $U \in \mathrm{Top}_x(X)$. So $X$ is locally connected. Hence (ii) implies (i).


To show the equivalence of propositions (ii) and (iii), let $X$ be a topological space in which all connected
components of open sets in $X$ are open. Then $\mathrm{Conn}_\Omega(x)$ is a connected open neighbourhood of $x$, and
$\mathrm{Conn}_\Omega(x) \subseteq \Omega$, for all $x \in X$ and $\Omega \in \mathrm{Top}_x(X)$. Let $B = \{\mathrm{Conn}_\Omega(x);\ \Omega \in \mathrm{Top}(X) \text{ and } x \in \Omega\}$. Then $B$ is
an open base for $X$ whose elements are all connected sets. Hence (ii) implies (iii). (Note how the axiom of
choice is avoided by "choosing" the unique largest connected set which is included in each neighbourhood of
each point.)


Now suppose that $X$ has an open base whose elements are all connected sets. Then every neighbourhood
of every element in $X$ includes a connected neighbourhood by Definition 32.2.3. So $X$ is locally connected.
Thus (iii) implies (i). Hence (iii) implies (ii) because it has already been shown that (i) implies (ii).


To show the equivalence of propositions (iii) and (iv), let $X$ be a topological space which has an open base
$B_0$ whose elements are all connected sets. Let $B_1$ be the set of all connected open sets in $X$. Then
$B_0 \subseteq B_1 \subseteq \mathrm{Top}(X)$. So $B_1$ is an open base for $X$. Hence (iii) implies (iv).


Now suppose that the set $B_1$ of all connected open sets in $X$ is an open base for $X$. Then $B_1$ is an open
base for $X$, all of whose elements are connected sets. Hence (iv) implies (iii).


To show the equivalence of propositions (ii) and (v), suppose that all connected components of open sets in
a topological space $X$ are open. Let $x \in X$ and $U \in \mathrm{Top}_x(X)$. Then $\mathrm{Conn}_U(x)$ is a connected component
of the open set $U$ by Definition 34.6.2. Therefore $\mathrm{Conn}_U(x)$ is an open set in $X$. It follows that $\forall x \in X,\
\forall U \in \mathrm{Top}_x(X),\ \mathrm{Conn}_U(x) \in \mathrm{Top}_x(X)$. Hence (ii) implies (v).


Now assume $\forall x \in X,\ \forall U \in \mathrm{Top}_x(X),\ \mathrm{Conn}_U(x) \in \mathrm{Top}_x(X)$ in a topological space $X$. Let $K$ be a connected
component of an open set $\Omega \in \mathrm{Top}(X)$. Let $x \in K$. Then $K = \mathrm{Conn}_\Omega(x)$ by Definition 34.6.2. So
$K \in \mathrm{Top}_x(X)$, and so $K$ is an open set in $X$. Hence (v) implies (ii).


34.7.10 REMARK: Local connectedness makes maximal connected neighbourhoods connected components. 
Theorem 34.7.11 (i) is very similar to Theorem 34.6.12 (i). The difference is that in Theorem 34.6.12 (i), the
topological space is general, and only a one-way implication can be asserted, namely that every open connected
component is a maximal connected neighbourhood of some point.


Theorem 34.7.11 (i) requires the space to be locally connected. Then the two-way implication may be asserted
because all maximal connected neighbourhoods of points are connected components.


34.7.11 THEOREM: Maximal connected neighbourhoods of points are connected components and vice versa. 
Let X be a locally connected topological space. 


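(i) A set $\Omega \in \mathrm{Top}(X)$ is a connected component of $X$ if and only if


$\exists x \in \Omega,\ \Omega = \bigcup \{U \in \mathrm{Top}_x(X);\ U \text{ is connected}\}$.


PROOF: For part (i), let $X$ be a locally connected topological space, and let $\Omega \in \mathrm{Top}(X)$ be a connected
component of $X$. Then $\Omega = \bigcup \{U \in \mathrm{Top}_x(X);\ U \text{ is connected}\}$ for some $x \in \Omega$ by Theorem 34.6.12 (i).


To show the converse for part (i), let $\Omega \in \mathrm{Top}(X)$ satisfy $\Omega = \bigcup \{U \in \mathrm{Top}_x(X);\ U \text{ is connected}\}$ for
some $x \in \Omega$. Let $S = \bigcup \{U \in \mathbb{P}(X);\ x \in U \text{ and } U \text{ is connected}\}$. Then $\Omega \subseteq S$ because $\mathrm{Top}(X) \subseteq \mathbb{P}(X)$,
and $S$ is a connected component of $X$ by Theorem 34.5.19 (iii). So $S \in \mathrm{Top}(X)$ by Theorem 34.7.9 (i, ii). So
$S \subseteq \bigcup \{U \in \mathrm{Top}_x(X);\ U \text{ is connected}\} = \Omega$. Thus $\Omega = S$. Hence $\Omega$ is a connected component of $X$.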


34.7.12 REMARK:  Disconnection of connected components from their complements. 

As discussed in Remark 34.5.16, referring to the example of the rational numbers with the relative topology
from $\mathbb{R}$, and as demonstrated by Example 34.6.14, a connected component of a topological space is not
necessarily disconnected from its complement. However, in a locally connected space, every connected
component is disconnected from its complement.


34.7.13 THEOREM: Connected component of locally connected space is disconnected from its complement. 
Every connected component of a locally connected topological space is disconnected from its complement if 
the complement is non-empty. 


PROOF: Let $X$ be a locally connected topological space. Let $S$ be a connected component of $X$. Then $S$
is closed by Theorem 34.5.4. So $X \setminus S$ is open. But $S$ is open by Theorem 34.7.9 (i, ii), and $S$ is non-empty
by Definition 34.5.3. Therefore the pair $(S, X \setminus S)$ is a disconnection of $X$ if $X \setminus S$ is non-empty. In other
words, $S$ is disconnected from its complement if the complement is non-empty.


34.8. Locally connected separable spaces 


34.8.1 REMARK: Enumeration of connected components of open sets in separable topological spaces. 
Theorem 34.8.2 generalises from R to locally connected separable spaces the well-known Theorem 32.7.8 
which states that every open set of real numbers may be written as a countable sequence of connected 
components which are pairwise disconnected. (See Definition 33.4.6 for separable topological spaces. See 
Definition 34.6.16 for countable connected component enumerations.) 


34.8.2 THEOREM: Open sets have countably many components in locally connected separable spaces. 
Let $X$ be a locally connected separable topological space. Then every open subset of $X$ has a countable
connected component enumeration.


PROOF: Let $X$ be a locally connected separable topological space. Then $X$ has a countable dense subset $S$
and there exists a surjection $g : \omega \to S$. Let $\Omega \in \mathrm{Top}(X)$. Let $I_\Omega = g^{-1}(\Omega) = \{i \in \omega;\ g(i) \in \Omega\}$. Define
the function $h : I_\Omega \to \mathrm{Top}(X)$ by $h(i) = \mathrm{Conn}_\Omega(g(i))$ for all $i \in I_\Omega$. In other words, $h(i)$ is the connected
component of $g(i)$ in $\Omega$ for all $i \in I_\Omega$. Then $\mathrm{Range}(h)$ is the set of all connected components of $\Omega$ because
every component of $\Omega$ is an open subset of $X$ by Theorem 34.7.9 (i, ii), and every non-empty open set in $X$ contains at
least one element of $S$ by Theorem 33.4.3 (i, iii).


Let $J = \{j \in I_\Omega;\ \forall i \in I_\Omega,\ (i < j \Rightarrow h(i) \ne h(j))\}$. Then $h(J) = h(I_\Omega) = \mathrm{Range}(h)$ and the restriction
of $h$ to $J$ is injective. So $h|_J : J \to \mathrm{Range}(h)$ is a bijection. By Theorem 12.4.8, there exists a bijection
$\phi : N \to J$ for some set $N \in \omega^+$. Define $f : N \to \mathrm{Top}(X)$ by $f = h \circ \phi$. Then $f$ is a bijection from the
ordinal number $N \in \omega^+$ to the set of all connected components of $\Omega$. Hence $f$ is a countable connected
component enumeration for $\Omega$ by Definition 34.6.16.
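
The construction in this proof can be imitated on a finite prefix of the sequence $g$. The Python sketch below is an illustration under assumptions which are not part of the proof: the space is $\mathbb{R}$, the open set is given as an explicit finite list of disjoint open intervals (so its components are known in advance), and $g$ lists the rationals $p/q$ in $[0, 1]$ by increasing denominator. The names `g_prefix`, `component_of`, `dartboard_enumeration` and `OMEGA` are hypothetical. Keeping only the first hit of each component corresponds to restricting $h$ to the index set $J$.

```python
from fractions import Fraction


def g_prefix(max_den):
    """Finite prefix of a sequence g listing the rationals p/q in [0, 1]
    by increasing denominator q. Repeats are harmless because g only
    needs to be a surjection onto the dense set."""
    return [Fraction(p, q) for q in range(1, max_den + 1) for p in range(q + 1)]


def component_of(x, intervals):
    """Return the open interval (a, b) which contains x, or None if x lies
    outside the open set (so that the index of x is not in I_Omega)."""
    for a, b in intervals:
        if a < x < b:
            return (a, b)
    return None


def dartboard_enumeration(points, intervals):
    """List the components in the order in which they are first hit."""
    hit = []
    for x in points:
        comp = component_of(x, intervals)
        if comp is not None and comp not in hit:
            hit.append(comp)
    return hit


# A hypothetical open set with three component intervals.
OMEGA = [
    (Fraction(0), Fraction(1, 4)),
    (Fraction(1, 2), Fraction(3, 4)),
    (Fraction(4, 5), Fraction(1)),
]
```

With this choice of $g$, the middle interval is hit first (by $2/3$), then the left interval (by $1/5$), then the right interval (by $5/6$). The order of the enumeration therefore depends on the sequence $g$, not on the size or position of the components.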


34.8.3 REMARK: Alternative enumeration procedures for connected components of open sets. 

The method of enumeration of connected components of open sets in Theorem 34.8.2 is analogous to throwing 
an infinite sequence of darts at a dartboard and listing the components in the order they are hit. It seems 
desirable to find a more efficient, less “hit and miss”, procedure for the enumeration. For example, one could 
define a well-ordering on the components, possibly with the following kind of abstract form. 


$\forall G_1, G_2 \in K,$
$G_1 > G_2 \Leftrightarrow$ "size"$(G_1) <$ "size"$(G_2)$ or ("size"$(G_1) =$ "size"$(G_2)$ and $G_1$ "is to the left of" $G_2$),


where K is the set of all non-empty connected open subsets of X. When X is the set of real numbers (with 
the usual topology), the function "size" may be interpreted as the length of the set because all connected 
sets in R are intervals, but in general, one would need to use some kind of size parameter such as measure 
or diameter. The concept of "to the left of" is also easily interpreted for $X = \mathbb{R}$, but in general it is not
so easy to order subsets of a space X by location. However, if these obstacles can be overcome, one could 
presumably list all of the connected components of a given open set systematically with respect to some 
well-ordering. The objective of Theorem 34.8.2, however, is merely to verify the countable cardinality of the 
set of components of any open set in general locally connected separable topological spaces. 


For bounded open subsets of $X = \mathbb{R}$, a well-ordering of the component intervals is produced by ordering first
with respect to decreasing size, and second with respect to left-to-right order along the real line. In the case 
of an unbounded set, there could be an infinite periodic sequence of equal-size components, which would be 
totally ordered by location, but not well-ordered. This can be overcome by modifying the location-order so 
that intervals are ordered by distance from the origin, and positive intervals come before negative intervals 
if the distance is equal. Another problem with unbounded open sets of real numbers is that there could 
be an infinite sequence of components with increasing length. This problem can be overcome by applying 
left-to-right order to the set of all components which exceed a specified positive length, after which the 
components are sorted in order of decreasing size. 


The purpose of ordering components first by size is to avoid the problem of infinite sequences of components
converging to points in X. If location-ordering is used only for components of equal positive "size", one may 
hope to avoid the possibility of limit points of components. 


Whenever the connected components of a topological space can be well-ordered, enumeration of components 
by such a well-ordering will generally be preferable for practical applications to the "dartboard procedure".
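
The ordering rule for bounded open subsets of $\mathbb{R}$ (decreasing size first, then left to right) can be written as a sort key. The Python sketch below is an illustration under stated assumptions: the components are given as a finite list of open intervals $(a, b)$, "size" is taken to be the length $b - a$, and "to the left of" is decided by the left endpoint. The names `order_components` and `COMPONENTS` are hypothetical. For an infinite family of components, sorting is not available as a computation, so the sketch shows only the comparison rule.

```python
def order_components(intervals):
    """Sort open intervals (a, b) by decreasing length, breaking ties by
    left endpoint (left-to-right order along the real line)."""
    return sorted(intervals, key=lambda ab: (-(ab[1] - ab[0]), ab[0]))


# Hypothetical component intervals of a bounded open set.
COMPONENTS = [(0, 1), (2, 4), (5, 6), (7, 7.5)]
```

Here the longest interval $(2, 4)$ comes first, and the two intervals of length 1 are then listed from left to right.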


34.8.4 REMARK: Insufficiency of locally connected and separable to guarantee second countable. 

It might seem to be a reasonable conjecture that a locally connected separable space must necessarily be 
second countable. However, it is not even necessarily first countable. This is demonstrated by the class 
of finite-complement topology counterexamples in Theorem 33.4.19, which are locally connected. General 
finite-complement topological spaces with infinitely many points are both connected and locally connected, 
as shown in Theorem 34.8.5. (See Definition 31.11.7 for finite-complement topological spaces.)


34.8.5 THEOREM: Infinite subsets of finite-complement spaces are connected and locally connected. 
Let X be a finite-complement topological space which contains an infinite number of points. Then every 
infinite subset of X is connected and locally connected. 


PROOF: Let X be a finite-complement topological space with an infinite number of points. Let S ∈ P(X)
be an infinite subset of X which satisfies S = S₁ ∪ S₂ with S₁ ≠ ∅, S₂ ≠ ∅, S₁ ⊆ Ω₁, S₂ ⊆ Ω₂ and
S ∩ Ω₁ ∩ Ω₂ = ∅, where Ω₁, Ω₂ ∈ Top(X). Then S ∩ Ω₁ ⊆ X \ Ω₂. So S₁ ⊆ X \ Ω₂ because S₁ ∩ Ω₁ = S₁
since S₁ ⊆ Ω₁. But X \ Ω₂ is finite. So S₁ is finite. Similarly, S₂ is finite. So S is finite, which is a
contradiction. Therefore S is connected.


To test local connectedness, let S be an infinite subset of X. Let x ∈ S and Ω ∈ Top_x(X). Then S =
(S ∩ Ω) ∪ (S \ Ω). Suppose that S ∩ Ω is a finite set. Then S is a finite set because S \ Ω ⊆ X \ Ω is a finite
set (and the union of two finite sets is a finite set). So S ∩ Ω must be an infinite set. Therefore S ∩ Ω is
connected. Thus every neighbourhood Ω of every element x in S contains a neighbourhood Ω ∩ S of x
in the relative topology on S which is connected. Hence S is locally connected by Definition 34.7.5.


34.9. Topological properties of real number intervals 


34.9.1 REMARK: All connected sets of real numbers are intervals, and vice versa. 

If a set of real numbers is connected, assuming the usual topology on ℝ, then it is an interval. (See
Definition 16.1.4 and Notation 16.1.2 for real-number intervals.) A more general version of Theorem 34.9.2
is proved by Gaal [77], pages 101-102, for any totally ordered set with the least upper bound property. (See
Definition 11.5.10 for the generalisation of intervals to totally ordered sets.)


34.9.2 THEOREM: Every connected set of real numbers is an interval. 
Every connected subset of R is an interval of R. 


PROOF: Let S be a connected subset of ℝ. Suppose that x₁, x₂ ∈ S, t ∈ ℝ \ S and x₁ < t < x₂. Let
S₁ = {x ∈ S; x < t}, S₂ = {x ∈ S; t < x}, Ω₁ = (−∞, t) and Ω₂ = (t, ∞). Then S = S₁ ∪ S₂, S₁ ⊆ Ω₁,
S₂ ⊆ Ω₂ and Ω₁ ∩ Ω₂ ∩ S = ∅. But Ω₁, Ω₂ ∈ Top(ℝ), S₁ ⊇ {x₁} ≠ ∅ and S₂ ⊇ {x₂} ≠ ∅. So S is disconnected
by Theorem 34.3.5, which is a contradiction. Therefore ∀x, y ∈ S, ∀t ∈ ℝ, ((x < t and t < y) ⇒ t ∈ S).
Hence S is a real-number interval by Definition 16.1.4.


34.9.3 THEOREM: The connected sets of real numbers are the intervals, and vice versa. 
A set I ⊆ ℝ is connected in the usual topology on ℝ if and only if I is an interval.


PROOF: A real-number interval is connected by Theorem 34.4.9. The converse follows from Theorem 34.9.2.


34.9.4 THEOREM: The image of a connected set by a continuous real-valued function is an interval.
Let f : X → ℝ be a continuous function for some topological space X. Let A be a connected subset of X.
Then f(A) is a real-number interval. 


PROOF: The assertion follows from Theorems 34.4.18 and 34.9.2. 


34.9.5 THEOREM: A continuous real-valued function of a real number maps intervals to intervals. 
A continuous function f : ℝ → ℝ maps intervals to intervals.


PROOF: The assertion follows from Theorems 34.4.20 and 34.9.2, or alternatively from Theorems 34.9.3 
and 34.9.4. 


34.9.6 REMARK: History of the intermediate value theorem. 
For an early statement and proof of the intermediate value theorem, see Cauchy [206], pages 43-44, 460-463.


34.9.7 THEOREM: Intermediate value theorem. 
Let f : [a,b] → ℝ be continuous for some a,b ∈ ℝ with a < b. Then


∀y ∈ [[f(a), f(b)]], ∃x ∈ [a,b], f(x) = y,


where [[f(a), f(b)]] = [min(f(a), f(b)), max(f(a), f(b))] is defined by Notation 16.1.15. In other words,
f([a, b]) ⊇ [[f(a), f(b)]].


PROOF: The assertion follows from Theorem 34.9.5 and Definition 16.1.4. 
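
The existence statement of Theorem 34.9.7 can be illustrated numerically. The following Python fragment is only a sketch under stated assumptions: the function name solve_level and its parameters are hypothetical, and bisection is a standard numerical technique, not the proof method used in this book. It assumes that f is continuous on [a, b] and that y lies in [[f(a), f(b)]].

```python
from typing import Callable


def solve_level(f: Callable[[float], float], a: float, b: float,
                y: float, steps: int = 200) -> float:
    """Approximate some x in [a, b] with f(x) = y by bisection.

    Hypothetical helper: assumes f is continuous on [a, b] and that y lies
    between f(a) and f(b), so that a solution exists by Theorem 34.9.7.
    """
    fa, fb = f(a), f(b)
    if not min(fa, fb) <= y <= max(fa, fb):
        raise ValueError("y must lie between f(a) and f(b)")
    lo, hi = a, b
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        # Invariant: y lies between f(lo) and f(hi), so a solution of
        # f(x) = y remains inside [lo, hi] by the intermediate value theorem.
        if (f(lo) - y) * (f(mid) - y) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def cubic(x: float) -> float:
    """A non-monotonic example function on [0, 2]."""
    return x ** 3 - x


x_star = solve_level(cubic, 0.0, 2.0, 1.0)
```

The function need not be monotonic. The invariant only uses the fact that y stays between the values at the two current end-points, which is exactly the hypothesis of the theorem applied to the sub-interval [lo, hi].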


34.9.8 THEOREM: Continuous injective real functions preserve order.
Let f : I → ℝ be continuous and injective for some real interval I. Then


∀a,b ∈ I, f([[a, b]]) = [[f(a), f(b)]], (34.9.1)
where [[a, b]] = [min(a, b), max(a, b)] is defined in Notation 16.1.15. Moreover,
∀a,b ∈ I, f is either increasing or decreasing on [[a, b]]. (34.9.2)


PROOF: By Theorem 16.1.9, [[a, b]] ⊆ I. By Theorem 34.9.5, f([[a, b]]) is an interval. By Theorem 34.9.7,
[[f(a), f(b)]] ⊆ f([[a, b]]). Suppose that x ∈ [[a, b]] with f(x) ∉ [[f(a), f(b)]]. Then f(x) < min(f(a), f(b))
or f(x) > max(f(a), f(b)). Suppose first that a < b, f(a) ≤ f(b) and f(x) < min(f(a), f(b)) = f(a).
Then f(x′) = f(a) for some x′ ∈ [x, b] because f(a) ∈ [f(x), f(b)]. But x′ ≠ a because a < x ≤ x′. So
f is not injective on I, which contradicts the assumption. Therefore the combination a < b, f(a) ≤ f(b)
and f(x) < min(f(a), f(b)) is not possible. Similarly, all other combinations with a > b, f(a) > f(b) or
f(x) > max(f(a), f(b)) contradict injectivity. Hence f([[a, b]]) = [[f(a), f(b)]].

To show line (34.9.2), suppose that a < b and f(a) ≤ f(b). Then f(a) < f(b) by injectivity. Suppose that
there exist x₁, x₂ ∈ [[a, b]] with x₁ < x₂ and f(x₁) ≥ f(x₂). Then f(x₁) > f(x₂) by injectivity. So by
line (34.9.1), f(a) ≤ f(x₂) < f(x₁) ≤ f(b). But by applying line (34.9.1) with x₁ in place of a, one obtains
f(x₁) ≤ f(x₂) ≤ f(b), which is a contradiction. So there do not exist x₁, x₂ ∈ [[a, b]] with x₁ < x₂ and
f(x₁) ≥ f(x₂). In other words, f is increasing on [[a, b]]. Similarly, a < b with f(a) ≥ f(b) imply that f
is decreasing. And then the same argument is valid for b < a. (The case a = b is trivial.) This verifies
line (34.9.2).


34.9.9 REMARK: Continuity and injectivity implies strict monotonicity. 

Whereas Theorem 34.9.8 line (34.9.2) applies only to bounded closed intervals, Theorem 34.9.10 applies to
general intervals. (See Theorem 34.9.25 for the further consequence that f⁻¹ : Range(f) → I is a continuous
bijection.)


34.9.10 THEOREM: Continuous injective real functions on real intervals are increasing or decreasing. 
Let I be a real-number interval. Let f : I → ℝ be continuous and injective. Then f is increasing or
decreasing on I. 


PROOF: If #(I) ≤ 1, then f is both increasing and decreasing on I. So assume that #(I) > 1. Then there
exist x₁, x₂ ∈ I with x₁ < x₂. Since f is injective, either f(x₁) < f(x₂) or f(x₁) > f(x₂). Assume first that
f(x₁) < f(x₂), and suppose that f is not increasing. Then by Definition 11.1.30, there exist x₃, x₄ ∈ I with
x₃ < x₄ and f(x₃) ≥ f(x₄). Injectivity then implies that f(x₃) > f(x₄). By Theorem 34.9.8 line (34.9.2), f
is decreasing on [x₃, x₄]. So it is not possible for the intervals [x₁, x₂] and [x₃, x₄] to have a positive-length
overlap. So either x₃ < x₄ ≤ x₁ < x₂ or x₁ < x₂ ≤ x₃ < x₄. In the former case, [x₃, x₂] ⊆ I and so f must
be increasing or decreasing on [x₃, x₂], which contradicts the assumption for one of the two intervals [x₃, x₄]
or [x₁, x₂]. Likewise in the latter case. Therefore x₃, x₄ ∈ I with x₃ < x₄ and f(x₃) ≥ f(x₄) do not exist.
So f is increasing on I. Similarly, if f(x₁) > f(x₂) then f is decreasing on I. This verifies the assertion.


34.9.11 REMARK: The Heine-Borel theorem for sets of real numbers. 

Theorem 34.9.13 is proved here without using any axiom of choice. This very simple form of proof is given 
by S.J. Taylor [147], page 30, Gemignani [80], pages 152-153, and Kelley [101], pages 144-145. The proof by 
Johnsonbaugh/Pfaffenberger [97], page 113, is essentially the same. See also Jech [364], page 29, exercise 25. 
This simple proof strategy was followed in Heine's 1872 paper to prove a different theorem, namely that a 
continuous function on a compact interval is uniformly continuous, although he did not present any concept 
of compactness there. (See Heine [181], page 188.) 


For a much more complicated proof, see B. Mendelson [115], pages 165-167. The theorem is proved for 
ℝ and general ℝⁿ, unnecessarily using the axiom of choice, by Simmons [137], pages 114, 119-120. It is
called the Heine-Borel-Lebesgue theorem by Kelley [101], page 135. The version for IR? is called the Borel- 
Lebesgue theorem by Wallace [153], page 12. Moore [371], page 65, states that Borel proved the assertion of 
Theorem 34.9.12 in his 1895 doctoral thesis by tacitly, and unnecessarily, applying the axiom of countable 
choice. An elementary proof for ℝⁿ is given by Gemignani [80], pages 184-185.


The proof of Theorem 34.9.12 is illustrated in Figure 34.9.1. 


Figure 34.9.1 Proof that bounded closed real-number intervals are compact


34.9.12 THEOREM: Heine-Borel theorem for a real-number interval. 
All bounded closed real-number intervals are compact. 


PROOF: Let C be an open cover of [a,b]. Define S = {x ∈ [a,b]; ∃C₁ ⊆ C, C₁ is finite and C₁ covers [a, x]}.
Then a ∈ S because ∃G_a ∈ C, a ∈ G_a = ⋃{G_a}. So S ≠ ∅. Therefore c = sup(S) is well defined and c ∈ (a, b]
because S ⊆ [a,b] and G_a ⊆ S for some G_a ∈ C with a ∈ G_a. But c ∈ G_c for some G_c ∈ C because C
covers [a,b]. By Definition 32.5.7 for the topology on ℝ, there is an open interval (d,e) ⊆ ℝ such that
c ∈ (d,e) ⊆ G_c. Let d′ = max(a,d). Then d′ ∈ [a,c) and the interval [a, d′] is covered by a finite subcover
C₁ of C by the definition of S. Let C₂ = C₁ ∪ {G_c}. Then C₂ is a finite open cover of the interval [a, e). If
c < b, then this contradicts the definition of S because e > c. Therefore c = b. So C has a finite subset C₂
which covers [a,b]. Hence [a,b] is compact.


34.9.13 THEOREM: Heine-Borel theorem for sets of real numbers. 
All bounded, closed subsets of R are compact. 


PROOF: Any bounded, closed interval [a,b] is compact by Theorem 34.9.12. In the case of a general
bounded, closed subset K of ℝ, there is a closed interval [a,b] ⊆ ℝ with K ⊆ [a,b]. By Theorem 33.5.13,
any closed subset of a compact set is compact. Therefore K is compact.


34.9.14 THEOREM: Heine-Borel theorem for subsets of Cartesian spaces.
All bounded, closed subsets of ℝⁿ are compact for all n ∈ ℤ₀⁺.


PROOF: Let n ∈ ℤ₀⁺. Let S be a bounded closed subset of ℝⁿ. Then S ⊆ ×ᵢ₌₁ⁿ Iᵢ for some sequence
(Iᵢ)ᵢ₌₁ⁿ of bounded closed intervals in ℝ. But Iᵢ is compact for all i ∈ ℕₙ by Theorem 34.9.12. So ×ᵢ₌₁ⁿ Iᵢ is
compact by Theorem 33.5.18. Therefore S is compact by Theorem 33.5.13. Hence S is a compact subset
of ℝⁿ.


34.9.15 REMARK: Classification of real-number intervals according to topology and order.

From the topological point of view, two real intervals are equivalent if they are homeomorphic, but there 
is a further distinction according to whether the homeomorphism is order-preserving. The following table 
classifies intervals into equivalence classes with respect to order-preserving (increasing) homeomorphisms. It
is assumed that a,b ∈ ℝ with a < b.


     type            intervals                            topological properties
0.   empty           ∅                                    compact and open
1.   singleton       [a, a]                               compact
2.   compact         [a, b]                               compact
3a.  compact-open    [a, b), [a, ∞)                       left-compact, right-open
3b.  open-compact    (a, b], (−∞, b]                      left-open, right-compact
4.   open            (a, b), (a, ∞), (−∞, b), (−∞, ∞)     (non-empty) open


There are six equivalence classes of intervals for oriented homeomorphisms and five classes for unoriented 
homeomorphisms because (3a) and (3b) are equivalent if reversals are permitted. Since all topological 
properties of the image of a curve are invariant under homeomorphisms of the parameter interval, one may 
represent all possibilities in terms of bounded intervals, and one may reduce these intervals to the canonical 
case that a = 0 and b = 1. 


The above table uses the term “compact” rather than “closed”, which is used in Definition 16.1.5, because
it is technically more precise. In elementary introductions, intervals with square brackets are identified as
“closed” because they contain their specified end-points, whereas intervals with round brackets (i.e. parentheses)
are identified as “open” because they do not contain their specified end-points. However, this disagrees with
the topological definitions of “open” and “closed”. Definition 34.9.16 gives a more precise naming convention.
(The empty interval is not defined to be left-open or right-open because it has no left and no right.)


34.9.16 DEFINITION: 
A left-compact (real-number) interval is a real-number interval I such that {t ∈ I; t ≤ c} is a compact set
for some c ∈ I.


A right-compact (real-number) interval is a real-number interval I such that {t ∈ I; t ≥ c} is a compact set
for some c ∈ I.


A left-open (real-number) interval is a real-number interval I such that {t ∈ I; t < c} is a non-empty open
set for some c ∈ I.


A right-open (real-number) interval is a real-number interval I such that {t ∈ I; t > c} is a non-empty open
set for some c ∈ I.


34.9.17 REMARK: All real-number intervals are non-decreasing continuous images of an open interval. 
Intervals are closely related to continuous curves. (See Definition 36.2.3.) Theorem 34.9.18 states in essence 
that any real-number interval can be expressed as the image of an open interval. (This is applicable to the 
path-equivalence of curves in Theorem 36.5.6.) 


34.9.18 THEOREM: Every non-empty interval is the range of some non-decreasing continuous real function.
Let I be a non-empty real-number interval. Then there is a non-decreasing continuous function from ℝ to
ℝ whose range equals I.


PROOF: Let I = [a,b] for a,b ∈ ℝ with a ≤ b. Define β : ℝ → I by β(x) = max(a, min(b, x)). Then β is a
non-decreasing continuous surjection.

Let I = [a,b) for a,b ∈ ℝ with a < b. Define β : ℝ → I by β(x) = (a + b max(0, x))/(1 + max(0, x))
for x ∈ ℝ. Then β is a non-decreasing continuous surjection. Similarly, for I = (a, b], define β : ℝ → I by
β(x) = (b − a min(0, x))/(1 − min(0, x)) for x ∈ ℝ.

Let I = [a, ∞) for some a ∈ ℝ. Define β : ℝ → I by β(x) = max(a, x) for x ∈ ℝ. Then β is a non-decreasing
continuous surjection. Similarly, if I = (−∞, b], define β : ℝ → I by β(x) = min(x, b) for x ∈ ℝ.

Let I = (a,b) for a,b ∈ ℝ with a < b. Define β : ℝ → I by β(x) = ½(a + b) + ½(b − a)x/(1 + |x|) for x ∈ ℝ.
Then β is a non-decreasing continuous surjection.

Let I = (a,∞) for a ∈ ℝ. Define β : ℝ → I by β(x) = a + (1 + max(0, x))/(1 − min(0, x)) for x ∈ ℝ.
Then β is a non-decreasing continuous surjection. Similarly, if I = (−∞,b) for b ∈ ℝ, define β : ℝ → I by
β(x) = b − (1 − min(0, x))/(1 + max(0, x)) for x ∈ ℝ. Finally, if I = ℝ, the identity map on ℝ is suitable.
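
The maps in the proof of Theorem 34.9.18 can be spot-checked numerically. The following Python fragment is a sketch only: the function names are hypothetical, and sampling on a finite grid merely illustrates the non-decreasing property and the range of each map. It does not prove anything. The formulas are those of the proof, with max(a, x) for the interval [a, ∞) and the factor ½(b − a) for the open interval (a, b).

```python
from typing import Callable, List


def beta_compact(a: float, b: float) -> Callable[[float], float]:
    """Map R -> [a, b] for a <= b: x |-> max(a, min(b, x))."""
    return lambda x: max(a, min(b, x))


def beta_compact_open(a: float, b: float) -> Callable[[float], float]:
    """Map R -> [a, b) for a < b: x |-> (a + b*max(0, x)) / (1 + max(0, x))."""
    return lambda x: (a + b * max(0.0, x)) / (1.0 + max(0.0, x))


def beta_closed_ray(a: float) -> Callable[[float], float]:
    """Map R -> [a, oo): x |-> max(a, x)."""
    return lambda x: max(a, x)


def beta_open(a: float, b: float) -> Callable[[float], float]:
    """Map R -> (a, b) for a < b: x |-> (a + b)/2 + ((b - a)/2) * x / (1 + |x|)."""
    return lambda x: 0.5 * (a + b) + 0.5 * (b - a) * x / (1.0 + abs(x))


def beta_open_ray(a: float) -> Callable[[float], float]:
    """Map R -> (a, oo): x |-> a + (1 + max(0, x)) / (1 - min(0, x))."""
    return lambda x: a + (1.0 + max(0.0, x)) / (1.0 - min(0.0, x))


def is_non_decreasing(values: List[float]) -> bool:
    """True if each sampled value is at most the next one."""
    return all(u <= v for u, v in zip(values, values[1:]))


# Sample grid on [-50, 50] with step 1/4 (an arbitrary illustrative choice).
GRID = [k / 4.0 for k in range(-200, 201)]


def sample(beta: Callable[[float], float]) -> List[float]:
    """Evaluate a map on the sample grid."""
    return [beta(x) for x in GRID]
```

For example, sample(beta_open(-1.0, 3.0)) yields a non-decreasing list of values strictly between −1 and 3, and beta_open_ray(2.0) takes the value 3 at x = 0.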


34.9.19 REMARK: Relation of existence of non-decreasing continuous surjections between intervals. 
Theorems 34.9.18 and 34.9.20 assert the existence of non-decreasing continuous surjections between the five 
classes of non-empty intervals in Remark 34.9.15. Such existence is not symmetric, although it is reflexive 
and transitive. So it resembles an order relation, whereas the existence of a homeomorphism is an equivalence 
relation. (This order relation is illustrated in Figure 34.9.2.) 


Figure 34.9.2 Existence of non-decreasing continuous surjections between intervals


34.9.20 THEOREM: Existence of non-decreasing continuous surjections between some interval classes.
(i) For any non-empty open real-number interval I, there is a non-decreasing continuous surjection from I
to ℝ.
(ii) For any non-empty compact intervals I and J, there is a non-decreasing continuous surjection from I
to J if I is not a singleton.
(iii) For any compact-open intervals I and J, there is a non-decreasing continuous surjection from I to J.
(iv) For any open-compact intervals I and J, there is a non-decreasing continuous surjection from I to J.
(v) For any non-empty interval I and singleton interval J, there is a non-decreasing continuous surjection
from I to J.


PROOF: For part (i), let I = (a,b) for some a,b ∈ ℝ with a < b. Define β : (a,b) → ℝ by β : x ↦
(b − x)⁻¹ − (x − a)⁻¹. Then β is a non-decreasing continuous surjection.

Let I = (a,∞) for some a ∈ ℝ. Define β : (a,∞) → ℝ by β : x ↦ x − a − (x − a)⁻¹. Then β is a
non-decreasing continuous surjection.

Let I = (−∞,b) for some b ∈ ℝ. Define β : (−∞,b) → ℝ by β : x ↦ x − b + (b − x)⁻¹. Then β is a
non-decreasing continuous surjection. If I = ℝ, the identity map is suitable.

For part (ii), let I = [a,b] and J = [c,d] for some a,b,c,d ∈ ℝ with a < b and c ≤ d. Define β : [a,b] → [c,d]
by β : x ↦ c + (x − a)(d − c)/(b − a). Then β is a non-decreasing continuous surjection.


For part (iii), suitable maps from I = [a,b) to J = [c,d) for a,b,c,d ∈ ℝ with a < b and c < d are as given
for part (ii). For the case I = [a,∞) and J = [c,∞), β : x ↦ x + c − a is suitable.

Let I = [a,b) and J = [c,∞) for a,b,c ∈ ℝ with a < b. Define β : [a,b) → [c,∞) by β : x ↦ c + (b − x)⁻¹ −
(b − a)⁻¹. Then β is a non-decreasing continuous surjection.

Let I = [a,∞) and J = [c,d) for a,c,d ∈ ℝ with c < d. Define β : [a,∞) → [c,d) by β : x ↦ d − (d − c)
(x − a + 1)⁻¹. Then β is a non-decreasing continuous surjection.


For part (iv), suitable maps from I = (a,b] to J = (c,d] for a,b,c,d ∈ ℝ with a < b and c < d are as
given for part (ii). For the case I = (−∞,b] and J = (−∞,d], β : x ↦ x + d − b is suitable. The remaining
open-compact cases, where exactly one of the two intervals is bounded, have suitable maps following the
pattern of part (iii).

For part (v), for any non-empty interval I and singleton interval {a}, define β : I → {a} by β(x) = a for
all x ∈ I. Then β is a non-decreasing continuous surjection.
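
Surjectivity of the maps in part (i) can be illustrated by locating a preimage for a given target value. The following Python fragment is a sketch under stated assumptions: the function names are hypothetical, the bisection search is a standard numerical technique and not the author's argument, and it presumes that the map is increasing on (a, b) with limits −∞ and +∞ at the two ends. The last two functions record maps from part (iii) for an end-point check.

```python
from typing import Callable


def open_to_reals(a: float, b: float) -> Callable[[float], float]:
    """Part (i): map (a, b) -> R given by x |-> 1/(b - x) - 1/(x - a)."""
    return lambda x: 1.0 / (b - x) - 1.0 / (x - a)


def compact_open_to_ray(a: float, b: float, c: float) -> Callable[[float], float]:
    """Part (iii): map [a, b) -> [c, oo) given by x |-> c + 1/(b - x) - 1/(b - a)."""
    return lambda x: c + 1.0 / (b - x) - 1.0 / (b - a)


def ray_to_compact_open(a: float, c: float, d: float) -> Callable[[float], float]:
    """Part (iii): map [a, oo) -> [c, d) given by x |-> d - (d - c)/(x - a + 1)."""
    return lambda x: d - (d - c) / (x - a + 1.0)


def preimage_point(beta: Callable[[float], float], lo: float, hi: float,
                   y: float, steps: int = 200) -> float:
    """Bisection for beta(x) = y on the open interval (lo, hi).

    Assumes beta is increasing and unbounded in both directions on (lo, hi).
    The end-points themselves are never evaluated for moderate targets y,
    because every midpoint lies strictly inside the interval.
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if beta(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta = open_to_reals(0.0, 1.0)
x_target = preimage_point(beta, 0.0, 1.0, 5.0)   # some x in (0, 1) with beta(x) close to 5
```

The end-point values of the part (iii) maps can be read off directly: compact_open_to_ray(a, b, c) sends a to c, and ray_to_compact_open(a, c, d) sends a to c while staying below d.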


34.9.21 THEOREM: Impossible non-decreasing continuous surjections between intervals. 
(i) There is no non-decreasing continuous surjection from an empty interval to a non-empty interval.
(ii) There is no non-decreasing continuous surjection from a non-empty interval to an empty interval.
(iii) There is no non-decreasing continuous surjection from a singleton interval to a non-singleton interval.
(iv) There is no non-decreasing continuous surjection from a left-compact interval to a left-open interval.
(v) There is no non-decreasing continuous surjection from a right-compact interval to a right-open interval.
(vi) There is no non-decreasing continuous surjection from a compact-open, compact or open-compact
interval to an open interval.


PROOF: For part (i), let I = ∅ and let J be a non-empty interval. Let β : I → J be a non-decreasing
continuous surjection. Then β(I) = ∅ ≠ J by Theorem 10.6.7 (i), which contradicts the surjectivity assumption.


For part (ii), let I be a non-empty interval and let J = ∅. Let β : I → J be a non-decreasing continuous
surjection. Let x ∈ I. Then β(x) ∈ J, which contradicts the emptiness assumption for J.


For part (iii), let I be a singleton interval and J be a non-singleton interval. Then I = {a} for some a ∈ ℝ. Let
β : I → J be a non-decreasing continuous surjection. Then Range(β) = {β(a)}, which is a singleton interval.
But J = Range(β) by the surjectivity assumption. So J is a singleton interval, which is a contradiction.

For part (iv), let I be a left-compact interval and J be a left-open interval. Then K = {t ∈ I; t ≤ c} is
a compact set for some c ∈ I. Let β : I → J be a non-decreasing continuous surjection. Let K′ = β(K).
Then K′ is compact by Theorem 33.5.15, and K′ is a real-number interval by Theorem 34.9.5. But K′ =
{β(t); t ∈ I and t ≤ c} = {t′ ∈ J; t′ ≤ β(c)} because β is non-decreasing. So J is a left-compact interval,
which contradicts the assumption that J is left-open. Therefore there is no such non-decreasing continuous
surjection β : I → J.


Part (v) may be proved following the pattern of the proof of part (iv). 


Part (vi) follows from parts (iv) and (v). 


34.9.22 REMARK: Non-decreasing surjections between real-number intervals are continuous.

It may seem slightly surprising that all non-decreasing surjections between real-number intervals are
continuous. The non-decreasing condition depends only on the order on R, and the surjection condition is purely
set-theoretic at a very low level. Neither of the conditions seems to be related to topology. However, the 
topology on R is defined in terms of intervals, which are themselves defined in terms of only the order on R. 
So the topology and order on R are tightly coupled. 


Intuitively, Theorem 34.9.23 seems clear because a non-decreasing function which is not continuous must 
have a gap, which contradicts the surjectivity. In other words, non-continuity would imply that an interval 
maps to a non-interval, thereby contradicting Theorem 34.9.5. (The relationship between continuity and 
mapping connected sets to connected sets is examined in much more detail and generality in Section 35.2.) 


Theorem 34.9.23 is perhaps more easily proved in terms of the metric space ε-δ formulation of function
continuity as in Theorem 38.1.3. Then one would suppose ∃ε > 0, ∀δ > 0, ∃h ∈ (0,δ), |f(x + h) − f(x)| ≥ ε
for some x ∈ I. Since f is non-decreasing, this implies ∃ε > 0, ∀x′ ∈ I, (x′ > x ⇒ f(x′) ≥ f(x) + ε). So
Range(f) ∩ (f(x), f(x) + ε) = ∅. This contradicts the surjectivity of f. So ∀ε > 0, ∃δ > 0, ∀h ∈ (0,δ),
|f(x + h) − f(x)| < ε for all x ∈ I, and similarly for the left-side limit. Therefore f is continuous.
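
The gap argument can be made concrete with a non-decreasing function which has a single jump. The following Python fragment is a hedged illustration only: the function jump and the sampling grid are hypothetical choices, not taken from the text. The function is non-decreasing on [−1, 1] but discontinuous at 0, and its sampled values never enter the interval [0, 1), so it cannot be a surjection onto the interval [−1, 2].

```python
from typing import List


def jump(x: float) -> float:
    """Non-decreasing on R with a jump of height 1 at x = 0."""
    return x if x < 0.0 else x + 1.0


def is_non_decreasing(values: List[float]) -> bool:
    """True if each sampled value is at most the next one."""
    return all(u <= v for u, v in zip(values, values[1:]))


# Sample the interval I = [-1, 1] with step 1/1000.
samples = [jump(k / 1000.0) for k in range(-1000, 1001)]

# Values landing in the gap [0, 1) between the left limit 0 and jump(0) = 1.
gap_hits = [v for v in samples if 0.0 <= v < 1.0]
```

Here the extreme sampled values are −1 and 2, while gap_hits is empty. The interval from the smallest to the largest value is therefore not covered, which is the obstruction used in the remark.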


34.9.23 THEOREM: All non-decreasing surjective maps between real-number intervals are continuous.
Let f : I → J be a non-decreasing surjective map between real-number intervals I and J. Then f is
continuous. 


PROOF: Let I and J be real-number intervals. Let f : I → J be non-decreasing and surjective. Let
G ∈ Top(J). Then G = Ω ∩ J for some Ω ∈ Top(ℝ) by Definition 31.6.2. Let y ∈ G. Then y ∈ Ω. So
(y − ε, y + ε) ⊆ Ω for some ε ∈ ℝ⁺ by Theorem 32.5.9. Let I_{y,ε} = {t ∈ I; |f(t) − y| < ε}. Then I_{y,ε} is a
real-number interval by Definition 16.1.4 because if t₁, t₂ ∈ I_{y,ε} and t ∈ (t₁, t₂), then t ∈ I because I is an
interval, and f(t₁) ≤ f(t) ≤ f(t₂) because f is non-decreasing, and so |f(t) − y| < ε because
{y′ ∈ ℝ; |y′ − y| < ε} is an interval, which then implies that t ∈ I_{y,ε}.

Let x ∈ f⁻¹({y}). Let x₁ = inf(I_{y,ε}). Then x₁ ∈ ℝ is well defined and x₁ ≤ x because x ∈ I_{y,ε}. Suppose
that x₁ = x. Suppose that inf(I) < x₁. Then there exists x₁′ ∈ I with x₁′ < x₁. But f(x₁′) ≤ y − ε for
such x₁′ because f is non-decreasing. But this contradicts the surjectivity of f because there is then no
x₁″ ∈ I with f(x₁″) = y − ½ε although y − ½ε ∈ J because J is an interval and y − ½ε ∈ [f(x₁′), y] ⊆ J.
Therefore inf(I) = x₁, and so inf(J) = y. Thus either x₁ < x or else inf(I) = x₁ = x and inf(J) = y.
Similarly, let x₂ = sup(I_{y,ε}). Then either x < x₂ or else x = x₂ = sup(I) and sup(J) = y.

Let G′ = Ω′ ∩ I, where Ω′ = (x₁′, x₂′) ∈ Top(ℝ) with x₁′ = x − 1 if x₁ = x = inf(I) and x₁′ = x₁ if
x₁ < x, and x₂′ = x + 1 if x₂ = x = sup(I) and x₂′ = x₂ if x < x₂. Then G′ ∈ Top(I). But
G′ ⊆ I_{y,ε} ⊆ f⁻¹(G) and x ∈ G′. So G′ ∈ Top_x(I), and so x ∈ Int(f⁻¹(G)) with respect to the relative
topology on I. Therefore f⁻¹(G) ∈ Top(I) by Theorem 31.8.14 (ii). Hence f is continuous by Definition 31.12.4.


34.9.24 THEOREM: Continuous bijections between intervals are strictly monotonic homeomorphisms. 
Let f : I → J be a continuous bijection between real-number intervals I and J. Then f⁻¹ : J → I is a
continuous increasing or decreasing function.


PROOF: Let I and J be intervals. Let f : I → J be a continuous bijection. Then by Theorem 34.9.10, f is
either increasing or decreasing on I. So by Theorem 11.5.33 and Definition 11.1.21 (iiv), f⁻¹ : J → I is
increasing or decreasing. Hence f⁻¹ : J → I is continuous by Theorem 34.9.23.


34.9.25 THEOREM: A continuous real-valued injection on a real interval has a continuous inverse.
Let I be a real interval. Let f : I → ℝ be a continuous injection. Then f⁻¹ : Range(f) → I is a continuous
bijection.


PROOF: Let f : I → ℝ be a continuous injection. Then f is increasing or decreasing by Theorem 34.9.10,
and Range(f) is a real interval by Theorem 34.9.5. So f⁻¹ : Range(f) → I is continuous by Theorem 34.9.24,
and it is a bijection because f : I → Range(f) is a bijection.


34.9.26 REMARK: Topological properties of real-number sequences.
Since limits and convergence of sequences are defined in Section 35.4, the topological properties of the real
number system which are expressed in terms of sequences are presented in Section 35.7.


Chapter 35


CONTINUITY AND LIMITS


Continuity of functions is defined in Section 31.12. Chapter 35 is concerned with conditions for continuity,
and also with pointwise limits and convergence of functions. 


35.1. Set-map properties of continuous functions 


35.1.1 REMARK: Conditions, consequences and equivalences for continuity of functions. 
Theorem 35.1.2 demonstrates in parts (i), (i′), (i″) and (i‴) that continuity of functions may be defined in
terms of the function’s inverse set-map for interiors, exteriors, closures or boundaries respectively of general
sets in the domain and range. These inverse set-maps are illustrated in Figure 35.1.1.


Figure 35.1.1 Inverse set-maps of interior, exterior and boundary for a continuous function


Although practical topology is often associated with hard analysis, expressed in terms of various kinds of 
limits and convergence, the soft side of topology is often made to resemble abstract algebra. Thus, for 
example, the nitty-gritty of hard analysis is concealed beneath abstract relations in terms of various classes 
of maps and operators in Theorem 35.1.2. The disadvantage of such “abstract expressionism” is that the
algebraic style of symbology is distant from concrete analysis, and ultimately the abstract symbols must be
interpreted within concrete contexts if they are to be applied. Analysis is often broadly categorised as either 
“hard analysis” or “soft analysis”. The abstract algebraic-style formulations of topology are part of soft 
analysis. Differential geometry has a similar division into the softer “coordinate-free” style, which attempts 
to make everything look like algebra, and the harder analytical style where every symbol has a clear concrete 
meaning. Both the soft and hard wings of differential geometry, analysis and topology are essential for real 
understanding. Therefore it is important to have fluency in both languages and the ability to translate each 
into the other. 


35.1.2 THEOREM: Inverse set-map conditions for continuity of functions. 
For topological spaces X and Y, let f : X → Y.
(i) f is continuous if and only if ∀S ∈ P(Y), f⁻¹(Int(S)) ⊆ Int(f⁻¹(S)).
(ii) If f is continuous and surjective, then ∀S ∈ P(Y), Int(S) ⊆ f(Int(f⁻¹(S))).
(iii) If f is injective and ∀S ∈ P(Y), Int(S) ⊆ f(Int(f⁻¹(S))), then f is continuous and surjective.
(iv) If f is injective, then f is continuous and surjective if and only if ∀S ∈ P(Y), Int(S) ⊆ f(Int(f⁻¹(S))).
(i′) f is continuous if and only if ∀S ∈ P(Y), f⁻¹(Ext(S)) ⊆ Ext(f⁻¹(S)).
(ii′) If f is continuous and surjective, then ∀S ∈ P(Y), Ext(S) ⊆ f(Ext(f⁻¹(S))).
(iii′) If f is injective and ∀S ∈ P(Y), Ext(S) ⊆ f(Ext(f⁻¹(S))), then f is continuous and surjective.
(iv′) If f is injective, then f is continuous and surjective if and only if ∀S ∈ P(Y), Ext(S) ⊆ f(Ext(f⁻¹(S))).
(i″) f is continuous if and only if ∀S ∈ P(Y), f⁻¹(Clos(S)) ⊇ Clos(f⁻¹(S)).
(ii″) If f is continuous and surjective, then ∀S ∈ P(Y), Clos(S) ⊇ f(Clos(f⁻¹(S))).
(iii″) If f is injective and ∀S ∈ P(Y), Clos(S) ⊇ f(Clos(f⁻¹(S))), then f is continuous and surjective.
(iv″) If f is injective, then f is continuous and surjective if and only if ∀S ∈ P(Y), Clos(S) ⊇ f(Clos(f⁻¹(S))).
(i‴) f is continuous if and only if ∀S ∈ P(Y), f⁻¹(Bdy(S)) ⊇ Bdy(f⁻¹(S)).
In parts (i″) to (iv″), Clos(A) denotes the topological closure of a set A.


PROOF: For part (i), let f : X → Y be a continuous function for topological spaces X and Y. Let S ∈ P(Y).
Then Int(S) ∈ Top(Y) by Theorem 31.8.13 (i). So f⁻¹(Int(S)) ∈ Top(X) by Definition 31.12.4 because f
is continuous. But f⁻¹(Int(S)) ⊆ f⁻¹(S) by Theorem 31.8.13 (ii) and Theorem 10.6.10 (ii). Therefore
f⁻¹(Int(S)) ⊆ Int(f⁻¹(S)) by Definition 31.8.2. Hence ∀S ∈ P(Y), f⁻¹(Int(S)) ⊆ Int(f⁻¹(S)).


For the converse of part (i), assume that f : X → Y satisfies ∀S ∈ P(Y), f⁻¹(Int(S)) ⊆ Int(f⁻¹(S)) for
topological spaces X and Y. Let Ω ∈ Top(Y). Then Ω = Int(Ω) by Theorem 31.8.14 (i). So f⁻¹(Ω) =
f⁻¹(Int(Ω)). So f⁻¹(Ω) ⊆ Int(f⁻¹(Ω)) by the assumption on f. But Int(f⁻¹(Ω)) ⊆ f⁻¹(Ω) by
Theorem 31.8.13 (ii). So f⁻¹(Ω) = Int(f⁻¹(Ω)). Therefore f⁻¹(Ω) ∈ Top(X) by Theorem 31.8.14 (i). Hence f is
continuous by Definition 31.12.4.

For part (ii), let f : X → Y be continuous and surjective. Let S ∈ P(Y). Then f⁻¹(Int(S)) ⊆ Int(f⁻¹(S))
by part (i). But f(f⁻¹(Int(S))) = Int(S) by Theorem 10.7.1 (i′). So Int(S) ⊆ f(Int(f⁻¹(S))).

For part (iii), let f : X → Y be injective, and assume that ∀S ∈ P(Y), Int(S) ⊆ f(Int(f⁻¹(S))). Then
Y = Int(Y) ⊆ f(Int(f⁻¹(Y))) = f(Int(X)) = f(X). Hence f is surjective. Now let Ω ∈ Top(Y). Then
Ω = Int(Ω) ⊆ f(Int(f⁻¹(Ω))). But Int(f⁻¹(Ω)) ⊆ f⁻¹(Ω) by Theorem 31.8.13 (ii), and so f(Int(f⁻¹(Ω))) ⊆
f(f⁻¹(Ω)) = Ω by Theorem 10.6.7 and Theorem 10.7.1 (i′). So Ω = f(Int(f⁻¹(Ω))) by the ZF axiom of
extension, Definition 7.2.4 (1). So f⁻¹(Ω) = f⁻¹(f(Int(f⁻¹(Ω)))). But f⁻¹(f(Int(f⁻¹(Ω)))) = Int(f⁻¹(Ω))
by Theorem 10.7.1 (ii′) and the injectivity of f. Therefore f⁻¹(Ω) = Int(f⁻¹(Ω)). So f⁻¹(Ω) ∈ Top(X) by
Theorem 31.8.14 (i). Hence f is continuous.


Part (iv) follows from parts (ii) and (iii). 

For part (i′), let A ∈ P(Y) and let S = Y \ A. Then f⁻¹(Int(S)) = f⁻¹(Int(Y \ A)) = f⁻¹(Ext(A))
by Theorem 31.9.10 (iv). Similarly, Int(f⁻¹(S)) = Int(f⁻¹(Y \ A)) = Int(X \ f⁻¹(A)) = Ext(f⁻¹(A)) by
Theorem 10.6.10 (v′) and Theorem 31.9.10 (iv). Hence by part (i), f is continuous if and only if ∀A ∈
P(Y), f⁻¹(Ext(A)) ⊆ Ext(f⁻¹(A)).

Part (ii′) follows from part (ii) by letting S = Y \ A for A ∈ P(Y), and noting that f⁻¹(Int(S)) = f⁻¹(Ext(A))
and Int(f⁻¹(S)) = Ext(f⁻¹(A)).

Part (iii′) follows from part (iii) by letting S = Y \ A for A ∈ P(Y), and noting that f⁻¹(Int(S)) =
f⁻¹(Ext(A)) and Int(f⁻¹(S)) = Ext(f⁻¹(A)).

Part (iv′) follows from parts (ii′) and (iii′).

For part (i″), writing Clos(A) for the closure of a set A,

    ∀S ∈ P(Y), f⁻¹(Clos(S)) ⊇ Clos(f⁻¹(S))
        ⇔ ∀S ∈ P(Y), f⁻¹(Clos(Y \ S)) ⊇ Clos(f⁻¹(Y \ S))
        ⇔ ∀S ∈ P(Y), f⁻¹(Y \ Int(S)) ⊇ X \ Int(X \ f⁻¹(Y \ S))
        ⇔ ∀S ∈ P(Y), X \ f⁻¹(Y \ Int(S)) ⊆ Int(X \ f⁻¹(Y \ S))
        ⇔ ∀S ∈ P(Y), f⁻¹(Int(S)) ⊆ Int(f⁻¹(S))                    (35.1.1)
        ⇔ f is continuous.                                          (35.1.2)

Line (35.1.1) follows from Theorem 10.6.10 (v′). Line (35.1.2) follows from part (i).

For part (ii″), let f be continuous and surjective, and let S ∈ P(Y). Then it follows from part (ii) that
Clos(S) = Y \ Int(Y \ S) ⊇ Y \ f(Int(f⁻¹(Y \ S))), which equals f(X \ Int(f⁻¹(Y \ S))) by Theorem 10.6.10 (v′), and
this equals f(Clos(X \ f⁻¹(Y \ S))) by Theorem 31.8.13 (xii), which equals f(Clos(f⁻¹(S))) by Theorem 10.6.10 (v′).

For part (iii″), let f be injective, and assume that ∀S ∈ P(Y), Clos(S) ⊇ f(Clos(f⁻¹(S))). Let S ∈ P(Y). Then
Y \ S ∈ P(Y). So Clos(Y \ S) ⊇ f(Clos(f⁻¹(Y \ S))). Therefore Int(S) = Y \ Clos(Y \ S) ⊆ Y \ f(Clos(f⁻¹(Y \ S))). But
Y \ f(Clos(f⁻¹(Y \ S))) = Y \ f(X \ Int(X \ f⁻¹(Y \ S))) = Y \ f(X \ Int(f⁻¹(S))) = f(Int(f⁻¹(S))). Therefore
∀S ∈ P(Y), Int(S) ⊆ f(Int(f⁻¹(S))). Hence f is continuous and surjective by part (iii).

Part (iv″) follows from parts (ii″) and (iii″).

For part (i‴), let S ∈ P(Y). Suppose f : X → Y satisfies f⁻¹(Bdy(S)) ⊇ Bdy(f⁻¹(S)). Then f⁻¹(S) \
f⁻¹(Bdy(S)) ⊆ f⁻¹(S) \ Bdy(f⁻¹(S)) by Theorem 8.2.6 (xvi). But f⁻¹(S) \ f⁻¹(Bdy(S)) = f⁻¹(S \ Bdy(S))
by Theorem 10.6.10 (v). So f⁻¹(S \ Bdy(S)) ⊆ f⁻¹(S) \ Bdy(f⁻¹(S)). Therefore f⁻¹(Int(S)) ⊆ Int(f⁻¹(S))
because Int(S) = S \ Bdy(S) and Int(f⁻¹(S)) = f⁻¹(S) \ Bdy(f⁻¹(S)) by Theorem 31.9.10 (xxi). Hence
∀S ∈ P(Y), f⁻¹(Int(S)) ⊆ Int(f⁻¹(S)). So f is continuous by part (i).

For the converse of part (i‴), let f : X → Y be continuous. Let S ∈ P(Y). Then f⁻¹(Int(S)) ⊆ Int(f⁻¹(S))
by part (i). So f⁻¹(S) \ f⁻¹(Bdy(S)) ⊆ f⁻¹(S) \ Bdy(f⁻¹(S)) by Theorems 31.9.10 (xxi) and 10.6.10 (v).
Therefore f⁻¹(S) ∩ f⁻¹(Bdy(S)) ⊇ f⁻¹(S) ∩ Bdy(f⁻¹(S)) by Theorem 8.2.6 (xvii). Similarly, f⁻¹(Ext(S)) ⊆
Ext(f⁻¹(S)) by part (i′). So f⁻¹((Y \ S) \ Bdy(S)) ⊆ (X \ f⁻¹(S)) \ Bdy(f⁻¹(S)) by a double application
of Theorem 31.9.10 (xxiii). But f⁻¹((Y \ S) \ Bdy(S)) = f⁻¹(Y \ S) \ f⁻¹(Bdy(S)) = (f⁻¹(Y) \ f⁻¹(S)) \
f⁻¹(Bdy(S)) = (X \ f⁻¹(S)) \ f⁻¹(Bdy(S)) by Theorem 10.6.10 (v). So (X \ f⁻¹(S)) \ f⁻¹(Bdy(S)) ⊆
(X \ f⁻¹(S)) \ Bdy(f⁻¹(S)). Therefore (X \ f⁻¹(S)) ∩ f⁻¹(Bdy(S)) ⊇ (X \ f⁻¹(S)) ∩ Bdy(f⁻¹(S)) by
Theorem 8.2.6 (xvii). Combined with the proposition f⁻¹(S) ∩ f⁻¹(Bdy(S)) ⊇ f⁻¹(S) ∩ Bdy(f⁻¹(S))
which has just been obtained, this gives f⁻¹(Bdy(S)) ⊇ Bdy(f⁻¹(S)) by Theorem 8.2.6 (xxi). Hence ∀S ∈
P(Y), f⁻¹(Bdy(S)) ⊇ Bdy(f⁻¹(S)).




35.1.3 REMARK: Possibly easier proof of boundary set-map theorem. 
Figure 35.1.1 suggests that Theorem 35.1.2 part (i‴) might be proved much more easily from parts (i) and (i′)
by directly exploiting the partition of X into its interior, exterior and boundary. 


35.1.4 REMARK: Impossibility of easily removing some conditions for function continuity. 

Theorem 35.1.2 (i) cannot be strengthened to the assertion that $\forall S \in \mathbb{P}(Y)$, $f^{-1}(\mathrm{Int}(S)) = \mathrm{Int}(f^{-1}(S))$ for
any continuous function f, even if f is required to be an injection, surjection or bijection. (See Example 35.1.5
for a counterexample to this conjecture.) The same observation applies to parts (i′) and (i″).


The injectivity condition for part (ii) cannot be removed. This is shown by Example 35.1.6. The same 
observation applies to part (iii^). 


35.1.5 EXAMPLE: It is possible to define a continuous bijection $f : X \to Y$, for topological spaces X and Y,
for which it is not true that $\forall S \in \mathbb{P}(Y)$, $f^{-1}(\mathrm{Int}(S)) = \mathrm{Int}(f^{-1}(S))$. This can be done by defining a strong
topology on X and a weak topology on Y.

Let $X = \mathbb{R}$ with $\mathrm{Top}(X) = \mathbb{P}(X)$. Let $Y = \mathbb{R}$ with $\mathrm{Top}(Y) = \{\emptyset, Y\}$. Define $f : X \to Y$ by $f(x) = x$ for
all $x \in X$. Then f is continuous. (In fact, any function $f : X \to Y$ is continuous.) Also, f is a bijection.
Let $S = \{a\}$ for some $a \in \mathbb{R}$. Then $\mathrm{Int}(S) = \emptyset$. (This is also true if Y has the standard topology on $\mathbb{R}$.) So
$f^{-1}(\mathrm{Int}(S)) = \emptyset$. But $\mathrm{Int}(f^{-1}(S)) = \mathrm{Int}(\{a\}) = \{a\}$. So $f^{-1}(\mathrm{Int}(S)) \ne \mathrm{Int}(f^{-1}(S))$. This contradicts the
proposition $\forall S \in \mathbb{P}(Y)$, $f^{-1}(\mathrm{Int}(S)) = \mathrm{Int}(f^{-1}(S))$.
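
A finite analogue of this counterexample can be checked mechanically. The following is only a sketch under stated assumptions: the real line is replaced by a hypothetical three-point set, and the helper names (`powerset`, `interior`, `preimage`, `is_continuous`) are introduced here for illustration and are not notation from this book.

```python
from itertools import combinations


def powerset(points):
    """Return all subsets of a finite set as frozensets."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def interior(S, top):
    """Union of all open sets contained in S."""
    result = frozenset()
    for U in top:
        if U <= S:
            result |= U
    return result


def preimage(f, S):
    """Inverse set-map of f (given as a dict) applied to S."""
    return frozenset(x for x in f if f[x] in S)


def is_continuous(f, top_X, top_Y):
    """Open-set definition: every open set has an open preimage."""
    return all(preimage(f, U) in top_X for U in top_Y)


# Hypothetical finite stand-in for the real line: three points.
X = frozenset({0, 1, 2})
Y = X
top_X = set(powerset(X))      # discrete topology, Top(X) = P(X)
top_Y = {frozenset(), Y}      # trivial topology, Top(Y) = {empty set, Y}
f = {x: x for x in X}         # identity map, which is a bijection

S = frozenset({0})
lhs = preimage(f, interior(S, top_Y))   # f^{-1}(Int(S))
rhs = interior(preimage(f, S), top_X)   # Int(f^{-1}(S))

print("continuous:", is_continuous(f, top_X, top_Y))
print("f^-1(Int(S)) =", sorted(lhs), " Int(f^-1(S)) =", sorted(rhs))
```

The inclusion of Theorem 35.1.2 (i) holds in this finite model, but the two sets differ, which mirrors the argument given for the singleton S = {a}.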
35.1.6 EXAMPLE: Let $X = \mathbb{R}$, where $\mathrm{Top}(X)$ is the usual topology on $\mathbb{R}$. Let $Y = \mathbb{R}_0^+$, where $\mathrm{Top}(Y)$ is
the usual relative topology on $\mathbb{R}_0^+$. Define $f : X \to Y$ by

$$\forall x \in X,\quad f(x) = \begin{cases} x & \text{for } x \ge 0 \\ -x & \text{for } x < 0 \text{ with } x \ne -1 \\ 2 & \text{for } x = -1. \end{cases}$$

(See Figure 35.1.2.) Then f is surjective, but is not injective and not continuous. Let $S \in \mathbb{P}(Y)$. Then
$f^{-1}(S) \supseteq S$. So $\mathrm{Int}(f^{-1}(S)) \supseteq \mathrm{Int}(S)$. Therefore $f(\mathrm{Int}(f^{-1}(S))) \supseteq f(\mathrm{Int}(S)) = \mathrm{Int}(S)$. Hence $\forall S \in \mathbb{P}(Y)$,
$\mathrm{Int}(S) \subseteq f(\mathrm{Int}(f^{-1}(S)))$. Clearly this is satisfied for arbitrary values of $f(x)$ for $x < 0$.


Figure 35.1.2 Discontinuous function with $\forall S \in \mathbb{P}(Y)$, $\mathrm{Int}(S) \subseteq f(\mathrm{Int}(f^{-1}(S)))$


35.1.7 REMARK: Unsatisfactory continuity conditions using the forward set-map. 

Whereas the inverse set-map conditions in Theorem 35.1.2 are quite useful and pleasingly symmetric, some
corresponding conditions in Theorem 35.1.8 are not so pleasing. The unnatural-looking open range condition
“$f(X) \in \mathrm{Top}(Y)$” in part (ii) of Theorem 35.1.8 seems not to be superfluous. Consider for example the
function $f : [0,1] \to \mathbb{R}$ defined by


forz z1 
1 xU 35.1.3 
vz € [0,1], ge lo (35.1.3) 
This function apparently satisfies all of the conditions in Theorem 35.1.8 (ii) except $f(X) \in \mathrm{Top}(Y)$. It is
also very clearly discontinuous.
If the condition $f(X) \in \mathrm{Top}(Y)$ is omitted in Theorem 35.1.8 part (ii), instead of the desired equality
$f^{-1}(\mathrm{Int}(f(X))) = f^{-1}(f(X)) = X$, one obtains by Theorem 31.9.10 (xxi) that

$$\begin{aligned}
f^{-1}(\mathrm{Int}(f(X))) &= f^{-1}(f(X) \setminus \mathrm{Bdy}(f(X))) \\
&= f^{-1}(f(X)) \setminus f^{-1}(\mathrm{Bdy}(f(X))) \\
&= X \setminus f^{-1}(\mathrm{Bdy}(f(X))),
\end{aligned}$$


which could be smaller than X. (The discontinuity of the function on line (35.1.3) occurs in $\mathrm{Bdy}(f(X))$, which
is why continuity is not delivered in this case.) Instead of the desired consequence $f^{-1}(\mathrm{Int}(S)) \subseteq \mathrm{Int}(f^{-1}(S))$
in the proof, one then obtains f !(Int(S)) V f~'(Bdy(f(X))) € Int(f^!(S)). By Theorem Rd. this 
is equivalent to f !(Int(S)) C Int(f !(S)) U f  (Bdy(f(X))). Thus the condition in Theorem 35.1.2 (i) is 
satisfied except for the dangling term f! (Bdy( f (X))). Since continuity is obtained only if this set cM 
is strictly valid, the desired result is not delivered. The complexity of the conditions for Theorem 35.1.8 (ii) 
follows the general pattern that continuity is generally most conveniently expressed in terms of inverse set- 
maps rather than forward set-maps. (Exceptions to this pattern include compactness and connectedness, 
which behave better for forward maps than inverse maps.) 


35.1.8 THEOREM: Forward set-map conditions for continuity of functions. 
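For topological spaces X and Y, let $f : X \to Y$.


(i) If f is injective and continuous, then $\forall A \in \mathbb{P}(X)$, $\mathrm{Int}(f(A)) \subseteq f(\mathrm{Int}(A))$.
(ii) If f is injective and $f(X) \in \mathrm{Top}(Y)$, and $\forall A \in \mathbb{P}(X)$, $\mathrm{Int}(f(A)) \subseteq f(\mathrm{Int}(A))$, then f is continuous.
(iii) f is continuous if and only if $\forall A \in \mathbb{P}(X)$, $\overline{f(A)} \supseteq f(\overline{A})$.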
PROOF: For part (i), let $f : X \to Y$ be injective and continuous. Let $A \in \mathbb{P}(X)$. Let $S = f(A)$. Then
$f^{-1}(\mathrm{Int}(S)) \subseteq \mathrm{Int}(f^{-1}(S))$ by Theorem 35.1.2 (i). So $f^{-1}(\mathrm{Int}(f(A))) \subseteq \mathrm{Int}(f^{-1}(f(A)))$. But $f^{-1}(f(A)) = A$
by Theorem 10.7.1 (ii′) because f is injective. So $f^{-1}(\mathrm{Int}(f(A))) \subseteq \mathrm{Int}(A)$. Therefore $f(f^{-1}(\mathrm{Int}(f(A)))) \subseteq f(\mathrm{Int}(A))$
by Theorem 10.6.7 (ii). But $f(f^{-1}(\mathrm{Int}(f(A)))) = \mathrm{Int}(f(A)) \cap f(X)$ by Theorem 10.7.1 (i), and
$\mathrm{Int}(f(A)) \subseteq f(A) \subseteq f(X)$ by Theorem 31.8.13 (ii). Therefore $f(f^{-1}(\mathrm{Int}(f(A)))) = \mathrm{Int}(f(A))$, and thus
$\mathrm{Int}(f(A)) \subseteq f(\mathrm{Int}(A))$. Consequently $\forall A \in \mathbb{P}(X)$, $\mathrm{Int}(f(A)) \subseteq f(\mathrm{Int}(A))$.


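For part (ii), let $f : X \to Y$ be injective with $f(X) \in \mathrm{Top}(Y)$. Assume $\forall A \in \mathbb{P}(X)$, $\mathrm{Int}(f(A)) \subseteq f(\mathrm{Int}(A))$.
Let $S \in \mathbb{P}(Y)$. Let $A = f^{-1}(S)$. Then $\mathrm{Int}(f(f^{-1}(S))) \subseteq f(\mathrm{Int}(f^{-1}(S)))$. But $f(f^{-1}(S)) = S \cap f(X)$ by
Theorem 10.7.1 (i). Therefore $\mathrm{Int}(S \cap f(X)) \subseteq f(\mathrm{Int}(f^{-1}(S)))$. But $\mathrm{Int}(S \cap f(X)) = \mathrm{Int}(S) \cap \mathrm{Int}(f(X))$
by Theorem 31.8.14 (xiv). Therefore $\mathrm{Int}(S) \cap \mathrm{Int}(f(X)) \subseteq f(\mathrm{Int}(f^{-1}(S)))$. So $f^{-1}(\mathrm{Int}(S) \cap \mathrm{Int}(f(X))) \subseteq
f^{-1}(f(\mathrm{Int}(f^{-1}(S))))$ by Theorem 10.6.10 (i). Therefore $f^{-1}(\mathrm{Int}(S)) \cap f^{-1}(\mathrm{Int}(f(X))) \subseteq \mathrm{Int}(f^{-1}(S))$ by
Theorem 10.6.10 (iv) and Theorem 10.7.1 (ii′) because f is injective. But $f^{-1}(\mathrm{Int}(f(X))) = f^{-1}(f(X)) = X$
by Theorem 31.8.14 (i) and Theorem 10.7.1 (ii′) because $f(X) \in \mathrm{Top}(Y)$ and f is injective. Therefore
$f^{-1}(\mathrm{Int}(S)) \cap f^{-1}(\mathrm{Int}(f(X))) = f^{-1}(\mathrm{Int}(S)) \cap X = f^{-1}(\mathrm{Int}(S))$. So $f^{-1}(\mathrm{Int}(S)) \subseteq \mathrm{Int}(f^{-1}(S))$. Thus
$\forall S \in \mathbb{P}(Y)$, $f^{-1}(\mathrm{Int}(S)) \subseteq \mathrm{Int}(f^{-1}(S))$. Hence f is continuous by Theorem 35.1.2 (i).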




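For part (iii), let $f : X \to Y$ be continuous. Let $A \in \mathbb{P}(X)$, and let $S = f(A)$. Then $f^{-1}(\overline{S}) \supseteq \overline{f^{-1}(S)}$
by Theorem 35.1.2 (i″). So $f^{-1}(\overline{f(A)}) \supseteq \overline{f^{-1}(f(A))}$. But $f^{-1}(f(A)) \supseteq A$ by Theorem 10.7.1 (ii). So
$f^{-1}(\overline{f(A)}) \supseteq \overline{A}$. Therefore $f(f^{-1}(\overline{f(A)})) \supseteq f(\overline{A})$ by Theorem 10.6.7 (ii). Consequently $\overline{f(A)} \supseteq f(\overline{A})$ by
Theorem 10.7.1 (i). Hence $\forall A \in \mathbb{P}(X)$, $\overline{f(A)} \supseteq f(\overline{A})$.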


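For the converse of part (iii), let $f : X \to Y$ and assume that $\forall A \in \mathbb{P}(X)$, $\overline{f(A)} \supseteq f(\overline{A})$. Let $S \in \mathbb{P}(Y)$. Let
$A = f^{-1}(S)$. Then $\overline{f(A)} \supseteq f(\overline{A})$. So $\overline{f(f^{-1}(S))} \supseteq f(\overline{f^{-1}(S)})$. So $\overline{S} \supseteq f(\overline{f^{-1}(S)})$ by Theorem 10.7.1 (i).
Therefore $f^{-1}(\overline{S}) \supseteq f^{-1}(f(\overline{f^{-1}(S)}))$ by Theorem 10.6.10 (ii). So $f^{-1}(\overline{S}) \supseteq \overline{f^{-1}(S)}$ by Theorem 10.7.1 (ii).
Consequently $\forall S \in \mathbb{P}(Y)$, $f^{-1}(\overline{S}) \supseteq \overline{f^{-1}(S)}$. Hence f is continuous by Theorem 35.1.2 (i″).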
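
The closure criterion in part (iii) can be tested exhaustively on small finite spaces. The sketch below is an illustration only, not part of the proof: the two topologies are hypothetical examples chosen here, and all function names are assumptions introduced for this check. It enumerates every function from a three-point space to a two-point space and compares the open-set definition of continuity with the forward closure condition.

```python
from itertools import combinations, product


def powerset(points):
    """Return all subsets of a finite set as frozensets."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def interior(S, top):
    """Union of all open sets contained in S."""
    result = frozenset()
    for U in top:
        if U <= S:
            result |= U
    return result


def closure(S, top, X):
    """Closure as the complement of the interior of the complement."""
    return X - interior(X - S, top)


def image(f, A):
    """Forward set-map of f applied to A."""
    return frozenset(f[x] for x in A)


def preimage(f, S):
    """Inverse set-map of f applied to S."""
    return frozenset(x for x in f if f[x] in S)


def is_continuous(f, top_X, top_Y):
    """Open-set definition of continuity."""
    return all(preimage(f, U) in top_X for U in top_Y)


def closure_condition(f, X, top_X, Y, top_Y):
    """Forward condition: closure(f(A)) contains f(closure(A)) for all A."""
    return all(
        closure(image(f, A), top_Y, Y) >= image(f, closure(A, top_X, X))
        for A in powerset(X)
    )


# Hypothetical example spaces (not taken from the book).
X = frozenset({1, 2, 3})
top_X = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
Y = frozenset({"a", "b"})
top_Y = {frozenset(), frozenset({"a"}), Y}

results = []
for values in product(sorted(Y), repeat=len(X)):
    f = dict(zip(sorted(X), values))
    results.append((is_continuous(f, top_X, top_Y),
                    closure_condition(f, X, top_X, Y, top_Y)))

agree = all(cont == cond for cont, cond in results)
print("criteria agree on all functions:", agree)
print("continuous functions:", sum(cont for cont, _ in results), "of", len(results))
```

Agreement on such small spaces is of course no substitute for the proof, but it is a cheap guard against misreading the direction of the inclusion.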


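35.1.9 EXAMPLE: To clarify the meaning of the inclusion condition in Theorem 35.1.8 parts (i) and (ii), it
is perhaps helpful to consider the following (almost everywhere) discontinuous functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$.

$$\forall x \in \mathbb{R},\quad f_1(x) = \begin{cases} x & \text{for } x \in \mathbb{Q} \\ -x & \text{for } x \in \mathbb{R} \setminus \mathbb{Q} \end{cases}$$

$$\forall x \in \mathbb{R},\quad f_2(x) = \begin{cases} |x| & \text{for } x \in \mathbb{Q} \\ -|x| & \text{for } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases}$$

The condition $\forall A \in \mathbb{P}(X)$, $\mathrm{Int}(f(A)) \subseteq f(\mathrm{Int}(A))$ is not satisfied for $f = f_1$. (Consider $A = (\mathbb{R}^- \setminus \mathbb{Q}) \cup \mathbb{Q}^+$,
which gives $\mathrm{Int}(f_1(A)) = \mathrm{Int}((\mathbb{R}^+ \setminus \mathbb{Q}) \cup \mathbb{Q}^+) = \mathrm{Int}(\mathbb{R}^+) = \mathbb{R}^+$ and $f_1(\mathrm{Int}(A)) = f_1(\emptyset) = \emptyset$.) But $f_1$ is
injective and $\mathrm{Range}(f_1) = \mathbb{R} \in \mathrm{Top}(\mathbb{R})$. However, the condition is trivially satisfied for $f = f_2$ because
$\mathrm{Int}(\mathrm{Range}(f_2)) = \emptyset$, but $f_2$ is not injective and $\mathrm{Range}(f_2) = (\mathbb{R}^- \setminus \mathbb{Q}) \cup \mathbb{Q}_0^+ \notin \mathrm{Top}(\mathbb{R})$. Thus $f_2$ fails to be
continuous, even though it satisfies the condition $\forall A \in \mathbb{P}(X)$, $\mathrm{Int}(f(A)) \subseteq f(\mathrm{Int}(A))$, but $f_1$ totally fails
this condition, even though $f_1$ and $f_2$ are almost exactly the same function on each side of the origin. This
shows the non-local character of the condition.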


35.2. Definition of continuity of functions using connectedness 


35.2.1 REMARK: Motivation for defining continuity in terms of connectedness. 

The question of whether continuity of functions can be defined in terms of connectedness of sets is somewhat
philosophical. The two principal concepts of topology are continuity and connectedness. Therefore it is of 
some interest to know more about the relations between these concepts. It turns out that it is possible to 
define function continuity in terms of separation of sets if a relatively weak separation condition is placed on 
the topology of the target space. 


Section 35.2 has no applications in the rest of the book. It may be safely skipped. This is recommended! 
35.2.2 REMARK: Thinking about continuity in terms of the connectedness of the graph. 
Usually the continuity of a function is defined in terms of the action of the function (or its inverse) on sets. 


But there are two ways of thinking about functions: either as a dynamic map from one set to another, or 
as a static set of ordered pairs, namely the “graph” of the function. Theorem 33.1.30 is an example of the
graph view of continuity of a function. Theorem 33.1.30 shows that if a function is continuous (and its range
is a Hausdorff space), then the graph is closed. (This is similar to, but not the same as, the closed graph 
theorem for Banach space operators.) 


Definition 31.12.4 is succinct and convenient for many applications, defining a function $f : X \to Y$ to be
continuous whenever $f^{-1}(\mathrm{Top}(Y)) \subseteq \mathrm{Top}(X)$, where $f^{-1}$ denotes the inverse set-map for f. (See Section 10.6
for function set-maps and inverse set-maps.) From the technical point of view, it is a good definition. But it
does not correspond closely to the intuitive concept of continuity in everyday life. To understand the “true
nature” of continuity, it is desirable to find an alternative definition.


In elementary introductions, continuity is sometimes explained in terms of the continuity of the graph, which 
in colloquial English is often understood to mean the connectedness of the graph. (For example, when one 
says that a cable between two points is continuous, one means that it is unbroken.) Theorem 34.4.22 states 
that the graph of a continuous function is connected. But the converse does not hold in general. 


Theorem 35.2.17 asserts that if the target space of a function is a $T_1$ space, then the function is continuous
if and only if the inverse function maps disconnected sets to correspondingly disconnected sets. 


35.2.3 REMARK:  Connectedness of functions. 

It is useful to introduce here some non-standard definitions for the connectedness of functions. In the standard 
definition for continuity, the inverse of a function is required to map open sets to open sets, although the 
forward map of a continuous function does preserve some properties. For example, a continuous function 
maps compact sets to compact sets. (See Theorem 33.5.15.) 


Similarly, continuous functions map connected sets to connected sets. (See Theorem 34.4.18.) But continuity 
is not implied by the forward-map preservation of connectedness of subsets of the function's domain. (See 
Example 35.2.7.) Therefore it seems reasonable to seek a reverse-map connectedness condition which could 
guarantee continuity of a function. Definitions 35.2.5 and 35.2.6 are two candidates for such a guarantee. 


It eventuates that the requirement for connected sets to forward-map to connected sets is insufficient because
the important information is not how connectedness succeeds in the forward direction, but rather how
connectedness fails in the reverse direction. The way in which two separated subsets $S_1$ and $S_2$ of the target
space reverse-map to separated subsets $f^{-1}(S_1)$ and $f^{-1}(S_2)$ in the source space must be incorporated
into the definition of a “connected function”. (See Definitions 33.2.2 and 33.2.10 for weakly and strongly
separated set-pairs respectively.)


35.2.4 DEFINITION: A forward-connected function from a topological space X to a topological space Y is
a function $f : X \to Y$ such that

$$\forall S \in \mathbb{P}(X),\quad S \text{ is connected in } X \ \Rightarrow\ f(S) \text{ is connected in } Y.$$


35.2.5 DEFINITION: A (weakly) connected function from a topological space X to a topological space Y is
a function $f : X \to Y$ such that

$$\forall S_1, S_2 \in \mathbb{P}(Y),\quad (S_1, S_2) \text{ is weakly separated in } Y \ \Rightarrow\ (f^{-1}(S_1), f^{-1}(S_2)) \text{ is weakly separated in } X.$$


35.2.6 DEFINITION: A strongly connected function from a topological space X to a topological space Y is
a function $f : X \to Y$ such that

$$\forall S_1, S_2 \in \mathbb{P}(Y),\quad (S_1, S_2) \text{ is strongly separated in } Y \ \Rightarrow\ (f^{-1}(S_1), f^{-1}(S_2)) \text{ is strongly separated in } X.$$
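
For finite topological spaces, Definitions 35.2.4, 35.2.5 and 35.2.6 can be evaluated by brute force. The following sketch rests on assumptions which should be stated plainly: Definitions 33.2.2 and 33.2.10 are not reproduced in this section, so weak separation is taken here to mean that neither set meets the closure of the other, and strong separation is taken to mean that the two sets have disjoint open supersets. All function names are hypothetical.

```python
from itertools import combinations


def powerset(points):
    """Return all subsets of a finite set as frozensets."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def interior(S, top):
    result = frozenset()
    for U in top:
        if U <= S:
            result |= U
    return result


def closure(S, top, X):
    return X - interior(X - S, top)


def image(f, A):
    return frozenset(f[x] for x in A)


def preimage(f, S):
    return frozenset(x for x in f if f[x] in S)


def weakly_separated(S1, S2, top, X):
    # Assumed reading of Definition 33.2.2.
    return not (S1 & closure(S2, top, X)) and not (closure(S1, top, X) & S2)


def strongly_separated(S1, S2, top):
    # Assumed reading of Definition 33.2.10.
    return any(S1 <= U and S2 <= V and not (U & V) for U in top for V in top)


def is_connected(S, top, X):
    # S is disconnected if it splits into two non-empty weakly separated parts.
    return not any(
        A and (S - A) and weakly_separated(A, S - A, top, X) for A in powerset(S)
    )


def forward_connected(f, X, top_X, Y, top_Y):      # Definition 35.2.4
    return all(is_connected(image(f, S), top_Y, Y)
               for S in powerset(X) if is_connected(S, top_X, X))


def weakly_connected(f, X, top_X, Y, top_Y):       # Definition 35.2.5
    subs = powerset(Y)
    return all(weakly_separated(preimage(f, S1), preimage(f, S2), top_X, X)
               for S1 in subs for S2 in subs if weakly_separated(S1, S2, top_Y, Y))


def strongly_connected(f, X, top_X, Y, top_Y):     # Definition 35.2.6
    subs = powerset(Y)
    return all(strongly_separated(preimage(f, S1), preimage(f, S2), top_X)
               for S1 in subs for S2 in subs if strongly_separated(S1, S2, top_Y))


# Hypothetical two-point illustration: identity map between trivial and discrete spaces.
P = frozenset({0, 1})
trivial = {frozenset(), P}
discrete = set(powerset(P))
identity = {0: 0, 1: 1}

checks = (forward_connected, weakly_connected, strongly_connected)
to_discrete = tuple(c(identity, P, trivial, P, discrete) for c in checks)
to_trivial = tuple(c(identity, P, discrete, P, trivial) for c in checks)
print("trivial -> discrete:", to_discrete)
print("discrete -> trivial:", to_trivial)
```

The identity map from the trivial to the discrete two-point space, which is not continuous, fails all three properties, whereas the identity map in the opposite direction, which is continuous, satisfies all three.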


35.2.7 EXAMPLE: A function which is forward-connected, but not continuous. 

Define $f : [0,2] \to \mathbb{R}$ by $f(0) = 0$ and $f(x) = \sin(\pi/x)$ for $x \in (0,2]$. (See Figure 35.2.1.) Then f is
forward-connected, but not continuous.

Let S be a connected subset of $\mathrm{Dom}(f) = [0,2]$. If $0 \notin S$, then $f(S)$ is connected by Theorem 33.5.15
because the restriction of f to $(0,2]$ is a continuous function. So suppose that $0 \in S$. All connected subsets of
$[0,2]$ are intervals. Therefore S is either equal to the singleton $\{0\}$, or else S includes a neighbourhood of 0.
If $S = \{0\}$, then $f(S) = \{0\}$, which is a connected subset of $\mathbb{R}$. If S includes a neighbourhood of 0, then
$f(S) = [-1,1]$, which is a connected subset of $\mathbb{R}$. Thus f is a forward-connected function. However, clearly f
is not continuous.

Figure 35.2.1 Function which is forward-connected, but not continuous

Let $S_1 = \{0\}$ and $S_2 = \{1\}$. Then $0 \in f^{-1}(S_1)$, and $0 \in \overline{f^{-1}(S_2)}$ because $f^{-1}(S_2) = \{2/(4n+1);\ n \in \mathbb{Z}_0^+\}$ has 0
as a limit point. So by Definition 33.2.2, the pair $(f^{-1}(S_1), f^{-1}(S_2))$ is not weakly separated, whereas the
pair $(S_1, S_2)$ is weakly separated. Therefore by Definition 35.2.5, f is not weakly connected. Similarly,
f is not strongly connected because by Definition 33.2.10, $(S_1, S_2)$ is a strongly separated set-pair and by
Theorem 33.2.11, the set-pair $(f^{-1}(S_1), f^{-1}(S_2))$ is not strongly separated because it is not weakly separated.


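35.2.8 EXAMPLE: Bijective forward-connected function which is not weakly connected.
Define $\zeta : \mathbb{R}^2 \to \mathbb{R}^2$, where $\mathbb{R}^2$ has the usual direct product topology, by

$$\forall (s,t) \in \mathbb{R}^2,\quad \zeta(s,t) = \begin{cases} (0,t) & \text{for } s = 0 \text{ and } t \in \mathbb{R} \\ (s, t + \sin(\pi/s)) & \text{for } s \ne 0 \text{ and } t \in \mathbb{R}. \end{cases}$$

Then $\zeta$ is bijective and discontinuous, $\zeta^{-1}$ is discontinuous, and $\mathrm{Dom}(\zeta)$ and $\mathrm{Range}(\zeta)$ are connected and
locally connected. (See Section 34.7 for local connectedness.) The inverse function satisfies

$$\forall (s,t) \in \mathbb{R}^2,\quad \zeta^{-1}(s,t) = \begin{cases} (0,t) & \text{for } s = 0 \text{ and } t \in \mathbb{R} \\ (s, t - \sin(\pi/s)) & \text{for } s \ne 0 \text{ and } t \in \mathbb{R}. \end{cases}$$

By the same arguments as in Example 35.2.7, $\zeta$ and $\zeta^{-1}$ are forward-connected, but are neither weakly
connected nor strongly connected.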


35.2.9 EXAMPLE: Injective forward-connected function which is not weakly connected. 

Let $X = \mathbb{Q} \cup (\pi + \mathbb{Q}) = \mathbb{Q} \cup \{\pi + x;\ x \in \mathbb{Q}\}$ with the relative topology from $\mathbb{R}$. Define $g : X \to \mathbb{R}$ by
$g(x) = \exp(x)$ for $x \in \mathbb{Q}$ and $g(x) = -\exp(x)$ for $x \in \pi + \mathbb{Q}$. The only connected subsets of X are the empty
set and the singletons because there are “gaps” between all points of X, permitting any two distinct points
to be disconnected. Therefore the image $g(K)$ of any connected subset K of X is a connected subset of
the range $\mathbb{R}$ by Theorem 34.4.5 (v). So g is a forward-connected function. However, g is neither continuous
nor weakly or strongly connected, although it is injective and the domain and range are metric spaces. It
is perhaps noteworthy that the domain is neither connected nor locally connected. Moreover, the domain is
totally disconnected, which reduces its collection of connected subsets so much that any function at all would
be forward-connected. (See Definition 34.3.14 for totally disconnected spaces.) Clearly continuity cannot be
proved for such domains. To prove an implication of continuity from forward-connectedness, one must at
least have a domain which has an abundance of connected subsets.
35.2.10 EXAMPLE: Weakly connected does not imply strongly connected, and vice versa.

It is logically clear that a weakly connected function is not necessarily strongly connected, and that a strongly
connected function is not necessarily weakly connected. This logic is concretely verified by the three-point
topologies labelled “3f” and “3g” in Example 31.5.6. (They are illustrated in Figure 31.5.2.) The weak and
strong separation properties of these topologies are presented in Example 33.2.15. The function $\phi_1 : 3g \to 3g$
in Figure 35.2.2 is strongly connected, but not weakly connected. The function $\phi_2 : 3g \to 3f$ in Figure 35.2.2
is weakly connected, but not strongly connected.


[Figure 35.2.2: left panel, $\phi_1 : 3g \to 3g$, not forward-connected, not weakly connected, strongly connected;
right panel, $\phi_2 : 3g \to 3f$, forward-connected, weakly connected, not strongly connected.]
Figure 35.2.2 Examples of weakly and strongly connected functions


In topology “3g”, the only weakly separated set-pair is $(\{1\}, \{3\})$, and this pair is not strongly separated.
Therefore all functions with target space “3g” are strongly connected because there are no strongly separated
set-pairs in “3g”. But the inverse set-map $\phi_1^{-1}$ applied to the pair $(\{1\}, \{3\})$ yields $(\{1\}, \{2\})$, which is not
a weakly separated set-pair. Therefore $\phi_1$ is strongly connected, but not weakly connected.


In topology “3f”, the only weakly separated set-pair is $(\{1\}, \{2\})$, and this pair is also strongly separated.
The inverse set-map $\phi_2^{-1}$ applied to the pair $(\{1\}, \{2\})$ yields $(\{1\}, \{3\})$, which is weakly separated, but not
strongly separated. Therefore the function $\phi_2$ is weakly connected, but not strongly connected.


The connectedness of all subsets of all three-point topological spaces is presented in Example 34.1.7. The
function $\phi_1$ is not forward-connected because the only connected subset of the domain, which is $\{1,3\}$, is
mapped to $\{1,2\}$, which is not connected. (This implies by Theorem 34.4.18 that $\phi_1$ is not continuous.) The
function $\phi_2$ is forward-connected because the image of the only connected subset of the domain, $\{1,3\}$, is
mapped to the only connected subset of the range, $\{1,2\}$.


$\phi_1$ and $\phi_2$ are both not continuous because the set $\{2\}$ is an open subset of the range of both functions, but
$\phi_1^{-1}(\{2\}) = \{1\}$ and $\phi_2^{-1}(\{2\}) = \{3\}$ are not open subsets of the respective domains. The spaces 3f and 3g
are both $T_0$, but not $T_1$, $T_2$ or $T_3$. (This is stated in Remark 33.3.7.)


35.2.11 REMARK: Relations between continuity and various function connectedness properties. 

Examples 35.2.7, 35.2.8, 35.2.9 and 35.2.10 assist the search for true theorems by ruling out various “false
theorems”. The following table summarises the continuity, connectedness and injectivity properties of the
functions f in Example 35.2.7, ζ in Example 35.2.8, g in Example 35.2.9, and φ1 and φ2 in Example 35.2.10.


                            f     ζ     g     φ1    φ2    φ     h

continuous                  no    no    no    no    no    no    no
forward-connected           yes   yes   yes   no    yes   yes   no
weakly connected            no    no    no    no    yes   yes   no
strongly connected          no    no    no    yes   no    yes   yes
injective                   no    yes   yes   yes   yes   yes   yes

range topology              met   met   met   T0    T0    T0    T1

connected domain            yes   yes   no    yes   yes   yes   yes
locally connected domain    yes   yes   no    yes   yes   yes   yes


Also included in this table are the functions φ from Example 35.2.18 and h from Example 35.2.20.


Separation properties of the range topologies are also indicated in the above table, since these may have some
influence. (See for example Theorems 35.2.17 and 35.2.19.) The abbreviation “met” means that the topology
is induced by a distance function. As mentioned in Example 35.2.9, the connectedness of the domain is also
a significant issue in the power of the forward-connected property to influence continuity.


35.2.12 REMARK: The weakness of the forward-connectedness property for functions. 

Although forward-connectedness follows easily from conditions such as continuity or weak connectedness of 
functions in Theorem 35.2.13, it is not so easy to obtain continuity or weak connectedness from forward- 
connectedness, despite any intuitive inkling that it should imply something which resembles continuity. 
Example 35.2.9 shows that forward-connectedness combined with injectivity does not guarantee much. In 
fact, when the domain is totally disconnected, all functions are forward-connected. Probably at least local 
connectedness of the domain is required. (After two weeks of examining the issue, this author has found 
many clues but no theorems which exploit forward-connectedness in this regard.) 


Weak and strong connectedness of functions, on the other hand, are relatively powerful properties in relation 
to continuity, as demonstrated by Theorems 35.2.17 and 35.2.19. In retrospect, this is perhaps not too 
surprising. Separated set-pairs are typically more “numerous”, in some sense, than connected sets. (On 
the other hand, in the case of the coarse topology, every subset is connected and consequently there are no 
separated set-pairs at all. But excessively coarse topologies are rarely useful in practical applications.) 


35.2.13 THEOREM: Some conditions which imply forward-connectedness of functions. 
Let f : X — Y for topological spaces X and Y. 


(i) If f is continuous, then f is forward-connected. 


(ii) If f is injective and weakly connected, then f is forward-connected. 


PROOF: Part (i) is a paraphrase of Theorem 34.4.18.


For part (ii), let $f : X \to Y$ be an injective, weakly connected function for topological spaces X and Y. Let S
be a connected subset of X. Suppose that $f(S)$ is a disconnected subset of Y. Then $f(S) = A_1 \cup A_2$ for some
weakly separated pair $(A_1, A_2)$ of non-empty subsets of Y by Theorem 34.3.12 (i, iv). (See Definition 33.2.2
for weakly separated pairs of sets.) So $(f^{-1}(A_1), f^{-1}(A_2))$ is a weakly separated pair of subsets of X by
Definition 35.2.5, and since $A_1, A_2 \subseteq f(S) \subseteq f(X)$, it follows from Theorem 10.7.1 (iii″) that $f^{-1}(A_1)$
and $f^{-1}(A_2)$ are non-empty sets. So $f^{-1}(A_1) \cup f^{-1}(A_2) = f^{-1}(A_1 \cup A_2) = f^{-1}(f(S)) = S$ by Theorems
10.6.10 (iii) and 10.7.1 (ii′) because f is injective. Therefore S is disconnected by Theorem 34.3.12 (i, iv).
This contradicts the assumption. Hence f is forward-connected.


35.2.14 REMARK: Continuous functions are strongly connected. 

Theorem 35.2.13 (i) states that any continuous function is forward-connected. From Theorem 34.4.17, it 
follows that a continuous function is also weakly connected, as stated in Theorem 35.2.15 (i). This implication 
is not much more than a paraphrase of the theorem that continuous images of connected sets are connected. 


35.2.15 THEOREM: Continuity implies that a function is both weakly connected and strongly connected. 
Let X and Y be topological spaces. Let f : X — Y be continuous. 


(i) f is weakly connected. 


(ii) f is strongly connected. 


PROOF: Part (i) follows from Theorem 34.4.17 and Definition 35.2.5. (In the special case that $S_1 = \emptyset$ or
$S_2 = \emptyset$ or both, the pairs $(S_1, S_2)$ and $(f^{-1}(S_1), f^{-1}(S_2))$ are trivially both weakly and strongly separated.)


For part (ii), let X and Y be topological spaces, $f : X \to Y$ be continuous and $(S_1, S_2)$ be a strongly
separated pair of sets in Y. Then by Definition 33.2.10, $S_1, S_2 \in \mathbb{P}(Y)$ and for some $\Omega_1, \Omega_2 \in \mathrm{Top}(Y)$,
$S_1 \subseteq \Omega_1$, $S_2 \subseteq \Omega_2$, and $\Omega_1 \cap \Omega_2 = \emptyset$. Therefore $f^{-1}(S_1), f^{-1}(S_2) \in \mathbb{P}(X)$, $f^{-1}(\Omega_1), f^{-1}(\Omega_2) \in \mathrm{Top}(X)$
by Definition 31.12.4, $f^{-1}(S_1) \subseteq f^{-1}(\Omega_1)$, $f^{-1}(S_2) \subseteq f^{-1}(\Omega_2)$ by Theorem 10.6.10 (ii), and $f^{-1}(\Omega_1) \cap
f^{-1}(\Omega_2) = f^{-1}(\Omega_1 \cap \Omega_2) = \emptyset$ by Theorem 10.6.10 (iv). Hence $(f^{-1}(S_1), f^{-1}(S_2))$ is a strongly separated
pair of sets in X by Definition 33.2.10.
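
The implications in this theorem can be spot-checked by exhaustive enumeration on small finite spaces. The sketch below is illustrative only. The two three-point topologies are hypothetical choices made here, the helper names are assumptions, and the separation tests follow the assumed readings that weak separation means neither set meets the closure of the other, and strong separation means existence of disjoint open supersets.

```python
from itertools import combinations, product


def powerset(points):
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def interior(S, top):
    result = frozenset()
    for U in top:
        if U <= S:
            result |= U
    return result


def closure(S, top, X):
    return X - interior(X - S, top)


def preimage(f, S):
    return frozenset(x for x in f if f[x] in S)


def is_continuous(f, top_X, top_Y):
    return all(preimage(f, U) in top_X for U in top_Y)


def weakly_separated(S1, S2, top, X):
    # Assumed reading of Definition 33.2.2.
    return not (S1 & closure(S2, top, X)) and not (closure(S1, top, X) & S2)


def strongly_separated(S1, S2, top):
    # Assumed reading of Definition 33.2.10.
    return any(S1 <= U and S2 <= V and not (U & V) for U in top for V in top)


def weakly_connected(f, X, top_X, Y, top_Y):
    subs = powerset(Y)
    return all(weakly_separated(preimage(f, S1), preimage(f, S2), top_X, X)
               for S1 in subs for S2 in subs if weakly_separated(S1, S2, top_Y, Y))


def strongly_connected(f, X, top_X, Y, top_Y):
    subs = powerset(Y)
    return all(strongly_separated(preimage(f, S1), preimage(f, S2), top_X)
               for S1 in subs for S2 in subs if strongly_separated(S1, S2, top_Y))


# Hypothetical three-point spaces used only for this check.
X = Y = frozenset({1, 2, 3})
top_X = {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}), X}
top_Y = {frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({2, 3}), Y}

violations = []     # continuous functions failing either property
n_continuous = 0
n_functions = 0
for values in product(sorted(Y), repeat=len(X)):
    f = dict(zip(sorted(X), values))
    n_functions += 1
    if is_continuous(f, top_X, top_Y):
        n_continuous += 1
        if not (weakly_connected(f, X, top_X, Y, top_Y)
                and strongly_connected(f, X, top_X, Y, top_Y)):
            violations.append(f)

print("functions:", n_functions, "continuous:", n_continuous,
      "violations:", len(violations))
```

An empty list of violations is what the theorem predicts. The converse direction is not tested here because it requires separation conditions on the target space.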
[Figure 35.2.3: two panels. Left panel, f is continuous: $B = B_1 \cup B_2$ is disconnected and
$f^{-1}(B) = f^{-1}(B_1) \cup f^{-1}(B_2)$ is disconnected. Right panel, f is discontinuous: $B = B_1 \cup B_2$ is disconnected and
$f^{-1}(B) = f^{-1}(B_1) \cup f^{-1}(B_2)$ is connected.]
Figure 35.2.3 Continuous function pre-images of disconnected sets


35.2.16 REMARK: Conditions under which weakly or strongly connected functions are continuous. 
Theorems 35.2.17 and 35.2.19 are illustrated in Figure 35.2.3. 


To guarantee that a function is continuous if it is weakly connected, the target space must be $T_1$. To
guarantee that a function is continuous if it is strongly connected, the target space must be $T_3$. This
difference of requirements originates in Theorems 33.2.7 and 33.3.9, which give conditions on the topology
which guarantee that any point in the interior of a set is weakly or strongly separated from the complement of
the set. It is easier to guarantee weak separation than strong separation. Hence the weaker space separation
class $T_1$ is required for the weaker kind of set-separation.


For topological manifolds, the $T_1$ condition is very easy to satisfy. Even locally Cartesian spaces, which are
topological manifolds without the Hausdorff separation restriction, are guaranteed to have the $T_1$ property.
So Theorem 35.2.17 applies to all topological manifolds and locally Cartesian spaces.


35.2.17 THEOREM: For a $T_1$ target space, functions are continuous if and only if weakly connected.
Let X be a topological space and let Y be a $T_1$ space. Then $f : X \to Y$ is continuous if and only if f is
weakly connected.


PROOF: The forward implication of the assertion follows from Theorem 35.2.15 (i). It remains to show that
f is continuous if it is weakly connected. Let $f : X \to Y$ be weakly connected. By Theorem 35.1.2 (i), to
show the continuity of f, it is sufficient to show that $x \in \mathrm{Int}(f^{-1}(S))$ for all $S \in \mathbb{P}(Y)$ and $x \in f^{-1}(\mathrm{Int}(S))$.
Let $S \in \mathbb{P}(Y)$ and $x \in f^{-1}(\mathrm{Int}(S))$. Let $y = f(x)$. Then $y \in \mathrm{Int}(S)$. So by Theorem 33.2.7 (iii), the
pair $(\{y\}, Y \setminus S)$ is weakly separated because Y is a $T_1$ space. Then the set-pair $(f^{-1}(\{y\}), f^{-1}(Y \setminus S))$
is weakly separated by Definition 35.2.5. Therefore the set-pair $(\{x\}, f^{-1}(Y \setminus S))$ is weakly separated by
Theorem 33.2.13 (i). But $f^{-1}(Y \setminus S) = X \setminus f^{-1}(S)$ by Theorem 10.6.10 (v). So $x \in \mathrm{Int}(f^{-1}(S))$ by
Theorem 33.2.7 (ii). Hence f is continuous.


35.2.18 EXAMPLE: Sharpness of $T_1$ condition for weakly connected function to be continuous.

The sharpness of the $T_1$ condition in Theorem 35.2.17 is demonstrated by the function $\phi : 3f \to 3g$ which is
illustrated in Figure 35.2.4, where the three-point topologies “3f” and “3g” are defined in Example 31.5.6.
(See Remark 33.3.7 for their separation class properties.)


Figure 35.2.4 Non-continuous weakly connected function with $T_0$ target space
The topology “3g” has one weakly separated non-empty set-pair $(\{1\}, \{3\})$ and no strongly separated
non-empty set-pairs. The topology “3f” has one weakly separated non-empty set-pair $(\{1\}, \{2\})$, which is also
a strongly separated non-empty set-pair. Both of these topologies are of class $T_0$, but not of class $T_1$.
(As mentioned in Remark 33.2.16, every pair of singletons in a $T_1$ topology must be weakly separated.)
Therefore the function $\phi$ is both weakly connected and strongly connected, but $\phi$ is not continuous because
the inverse image of every non-trivial open set in the target space is a non-open set in the domain space.


It is perhaps also of interest that $\phi$ maps all connected subsets of the domain to connected subsets of the
target space. All subsets of topology “3g” are connected except $\{1,3\}$. All subsets of topology “3f” are
connected except $\{1,2\}$. The image of $\{1,2\}$ by $\phi$ is $\{1,3\}$. Thus $\phi(S)$ is connected if and only if S is
connected, for all subsets S of the domain. Therefore both $\phi$ and $\phi^{-1}$ are forward-connected functions.
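
The claims in this example can be checked by brute force, subject to clearly labelled assumptions. Example 31.5.6 and Figure 35.2.4 are not reproduced here, so the open-set lists for “3f” and “3g” below are reconstructions chosen to match the properties stated in this example (the listed weakly separated pairs, the disconnected subsets, and the fact that {2} is open), and the labelling of the map `phi` is one hypothetical bijection which sends {1,2} to {1,3}. The separation tests use the assumed readings that weak separation means neither set meets the closure of the other, and strong separation means existence of disjoint open supersets.

```python
from itertools import combinations


def powerset(points):
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def interior(S, top):
    result = frozenset()
    for U in top:
        if U <= S:
            result |= U
    return result


def closure(S, top, X):
    return X - interior(X - S, top)


def preimage(f, S):
    return frozenset(x for x in f if f[x] in S)


def is_continuous(f, top_X, top_Y):
    return all(preimage(f, U) in top_X for U in top_Y)


def weakly_separated(S1, S2, top, X):
    return not (S1 & closure(S2, top, X)) and not (closure(S1, top, X) & S2)


def strongly_separated(S1, S2, top):
    return any(S1 <= U and S2 <= V and not (U & V) for U in top for V in top)


def is_connected(S, top, X):
    return not any(
        A and (S - A) and weakly_separated(A, S - A, top, X) for A in powerset(S)
    )


def weakly_connected(f, X, top_X, Y, top_Y):
    subs = powerset(Y)
    return all(weakly_separated(preimage(f, S1), preimage(f, S2), top_X, X)
               for S1 in subs for S2 in subs if weakly_separated(S1, S2, top_Y, Y))


def strongly_connected(f, X, top_X, Y, top_Y):
    subs = powerset(Y)
    return all(strongly_separated(preimage(f, S1), preimage(f, S2), top_X)
               for S1 in subs for S2 in subs if strongly_separated(S1, S2, top_Y))


# ASSUMED reconstructions of the three-point topologies "3f" and "3g".
P3 = frozenset({1, 2, 3})
top_3f = {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}), P3}
top_3g = {frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({2, 3}), P3}

# HYPOTHETICAL labelling of phi : 3f -> 3g with phi({1,2}) = {1,3}.
phi = {1: 1, 2: 3, 3: 2}

disconnected_3f = [sorted(S) for S in powerset(P3) if not is_connected(S, top_3f, P3)]
disconnected_3g = [sorted(S) for S in powerset(P3) if not is_connected(S, top_3g, P3)]

print("continuous:", is_continuous(phi, top_3f, top_3g))
print("weakly connected:", weakly_connected(phi, P3, top_3f, P3, top_3g))
print("strongly connected:", strongly_connected(phi, P3, top_3f, P3, top_3g))
print("disconnected subsets of 3f:", disconnected_3f)
print("disconnected subsets of 3g:", disconnected_3g)
```

If the reconstructed topologies differ from those in Example 31.5.6, only the two `top_` assignments and the dictionary `phi` need to be changed.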


35.2.19 THEOREM: For a $T_3$ target space, functions are continuous if and only if strongly connected.
Let X be a topological space and let Y be a $T_3$ space. Then $f : X \to Y$ is continuous if and only if f is
strongly connected.


PROOF: The forward implication of the assertion follows from Theorem 35.2.15 (ii). It remains to show that
f is continuous if it is strongly connected. Let $f : X \to Y$ be strongly connected. By Theorem 35.1.2 (i), to
show the continuity of f, it is sufficient to show that $x \in \mathrm{Int}(f^{-1}(S))$ for all $S \in \mathbb{P}(Y)$ and $x \in f^{-1}(\mathrm{Int}(S))$.
Let $S \in \mathbb{P}(Y)$ and $x \in f^{-1}(\mathrm{Int}(S))$. Let $y = f(x)$. Then $y \in \mathrm{Int}(S)$. So by Theorem 33.3.9 (ii), the
pair $(\{y\}, Y \setminus S)$ is strongly separated because Y is a $T_3$ space. Then the pair $(f^{-1}(\{y\}), f^{-1}(Y \setminus S))$
is strongly separated by Definition 35.2.6. Therefore the pair $(\{x\}, f^{-1}(Y \setminus S))$ is strongly separated by
Theorem 33.2.13 (ii). But $f^{-1}(Y \setminus S) = X \setminus f^{-1}(S)$ by Theorem 10.6.10 (v). So $x \in \mathrm{Int}(f^{-1}(S))$ by
Theorem 33.3.9 (i). Hence f is continuous.


35.2.20 EXAMPLE: Sharpness of $T_3$ condition for strongly connected function to be continuous.

Let $X_1$ be an infinite set. Let $T_1 = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X_1);\ \#(X_1 \setminus \Omega) < \infty\}$. Then $T_1$ is a topology on $X_1$. (See
Theorem 31.11.6 for finite-complement topologies.) Let $X_0 = X_1$ and $T_0 = \{\emptyset, X_0\}$. Then $T_0$ is a topology
on $X_0$. Define $h : X_0 \to X_1$ by $h : x \mapsto x$. Then h is strongly connected because by Theorem 33.3.13 (vi),
a set-pair $(S_1, S_2)$ in the topological space $(X_1, T_1)$ is strongly separated if and only if $\emptyset \in \{S_1, S_2\}$, and for
each such pair $(S_1, S_2)$, the set-pair $(h^{-1}(S_1), h^{-1}(S_2))$ in the topological space $(X_0, T_0)$ is strongly separated by
Theorem 33.2.14. However, h is not continuous because $h^{-1}(\Omega) \in T_0$ if and only if $\Omega \in \{\emptyset, X_1\}$, which is
not satisfied by all $\Omega \in T_1$. By Theorem 33.3.13 (i), the topological space $(X_1, T_1)$ is a $T_1$ space, but is not
a $T_3$ space. By Theorem 33.3.13 (iv), h is not a weakly connected function.


Every subset of $X_0$ is connected with respect to the coarse topology $T_0$. But every distinct pair $\{x_1, x_2\}$ in
$X_1$ is a disconnected set with respect to the finite-complement topology $T_1$. Therefore h is not a
forward-connected function.


35.2.21 REMARK: Intuitive interpretation of continuity/connectedness theorem for injective functions. 
Very roughly speaking (i.e. ignoring counterexamples), a continuous function is a function which maps 
connected sets to connected sets. Preservation of connectedness is a necessary condition for a function to 
be continuous, although not quite sufficient. If there is no “gap” in a subset of the domain of a continuous 
function, there will be no “gap” in the image of that subset by the function. If there is a gap in a subset 
of the target space of the function, there must be a corresponding gap in the pre-image. “Corresponding” 
means that if a subset C of the target space is separated into A and B, then the inverse image f~!(C) is 
separated into f^! (A) and f^ !(B). 

Thus continuity may be presented in terms of the more intuitive concept of connectedness instead of the 
convoluted ¢-d definition for metric spaces, or the open-set inverse-map definition for general topologies. 
Although connectedness is technically defined in terms of open sets, people generally have a strong intuition 
for connectedness without mentioning open sets. Connectedness may be explained as the "absence of a gap", 
which is possibly easier to grasp than the absence of a discontinuity. 


A continuous function may close up some gaps, but it never opens up new ones. Therefore: “Continuous 
functions are the functions which preserve connectedness.” Or more accurately: “Continuous functions are 
the functions whose inverse set-maps preserve disconnected set-pairs.” The fact that the $T_1$ separation class
is required as a technical condition has limited effect on most practical scenarios because only pathological
topologies fail the $T_1$ test.
35.3. Pointwise continuity, limits and convergence


35.3.1 REMARK: Limits, convergence, and continuity at a point. 

The concepts of continuity, limits and convergence are closely related. Roughly speaking, a function is 
continuous at a point if the function converges to a limit at the point and the value of the function at the 
point equals the limit. However, the limit may not be unique even if the function is continuous. Intuition for 
topology is typically learned from non-pathological functions between finite-dimensional Cartesian spaces. 
So it is important to prove every assertion, no matter how intuitively obvious it may seem. The subject of 
limits is particularly prone to false intuition. 


Definition 35.3.2 decomposes the global continuity of a function in Definition 31.12.4 into a distinct property 
at each point of the domain of the function. This distinguishes points where a function fails to be continuous. 
The failure points can then perhaps be “repaired”. For example, the function f : IR — R with f : x — 059 
fails to be continuous at 0, and this continuity “failure” can be fixed by replacing the function's value with its 
limit. Definition 35.3.2 has a similar form to the traditional $\varepsilon$-$\delta$ definition for metric spaces in Theorem 38.1.7.


35.3.2 DEFINITION: A function $f : X \to Y$ is continuous at a point $p \in X$, for topological spaces $X$ and $Y$,
when $\forall G \in \mathrm{Top}_{f(p)}(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega) \subseteq G$.
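The pointwise test in Definition 35.3.2 can be tried out on finite topological spaces. The following is only a minimal sketch under stated assumptions: topologies are represented as sets of `frozenset` objects, functions as dictionaries, and the names `neighbourhoods`, `is_continuous_at` and `is_continuous`, together with the three-point example, are hypothetical illustrations which do not appear elsewhere in this book.

```python
def neighbourhoods(topology, point):
    """Open sets of the topology which contain the given point."""
    return [u for u in topology if point in u]


def is_continuous_at(f, top_x, top_y, p):
    """Pointwise test: every open G containing f(p) admits an open
    Omega containing p such that f(Omega) is a subset of G."""
    return all(
        any({f[x] for x in omega} <= g for omega in neighbourhoods(top_x, p))
        for g in neighbourhoods(top_y, f[p])
    )


def is_continuous(f, top_x, top_y):
    """Global test: the inverse image of every open set is open."""
    return all(
        frozenset(x for x in f if f[x] in g) in top_x
        for g in top_y
    )


# Hypothetical example: X = {1, 2, 3} and Y = {"a", "b"}.
top_x = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})}
top_y = {frozenset(), frozenset({"a"}), frozenset({"a", "b"})}
f = {1: "a", 2: "b", 3: "a"}

pointwise = {p: is_continuous_at(f, top_x, top_y, p) for p in f}
print(pointwise)
print(is_continuous(f, top_x, top_y))
```

In this example the function passes the pointwise test at the points 1 and 2 but fails it at 3, because the only neighbourhood of 3 is the whole space. The global test fails accordingly, since the inverse image of the open set {"a"} is {1, 3}, which is not open.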


35.3.3 THEOREM: Continuity is equivalent to everywhere pointwise continuity.
Let $f : X \to Y$ for topological spaces $X$ and $Y$. Then $f$ is continuous if and only if $f$ is continuous at every
point of $X$.


PROOF: Let $f : X \to Y$ be continuous. Let $p \in X$. Let $G \in \mathrm{Top}_{f(p)}(Y)$. Then $f^{-1}(G) \in \mathrm{Top}(X)$ by
Definition 31.12.4. So $f^{-1}(G) \in \mathrm{Top}_p(X)$ because $p \in f^{-1}(G)$ by Definition 10.6.4 (ii). Let $\Omega = f^{-1}(G)$.
Then $f(\Omega) = G \cap \mathrm{Range}(f)$ by Theorem 10.7.1 (i). So $f(\Omega) \subseteq G$. So $f$ is continuous at $p$ by Definition 35.3.2.
Now suppose that $f$ is continuous at every $p \in X$. Let $G \in \mathrm{Top}(Y)$. Let $p \in f^{-1}(G)$. Then $f(p) \in G$ by
Definition 31.12.4. So $G \in \mathrm{Top}_{f(p)}(Y)$. Therefore by Definition 35.3.2, there is a set $\Omega \in \mathrm{Top}_p(X)$ such
that $f(\Omega) \subseteq G$. Then $\Omega \subseteq f^{-1}(f(\Omega)) \subseteq f^{-1}(G)$ by Theorems 10.7.1 (ii) and 10.6.10 (ii). So $p \in \mathrm{Int}(f^{-1}(G))$
by Theorem 31.8.13 (iii). Thus $f^{-1}(G) \subseteq \mathrm{Int}(f^{-1}(G))$. So $f^{-1}(G) \in \mathrm{Top}(X)$ by Theorem 31.8.14 (ii). Hence
$f$ is continuous.


35.3.4 REMARK: Pointwise continuity proofs can be easier than global continuity proofs. 

Theorem 35.3.5 gives some examples of continuity proofs which are made easier by Theorem 35.3.3. Using 
Definition 31.12.4, it would be necessary to show for Theorem 35.3.5 (i) that $\sigma^{-1}(G) \in \mathrm{Top}(\mathbb{R} \times \mathbb{R})$, for
example. One would probably achieve this by finding $\Omega \in \mathrm{Top}_{(x_1,x_2)}(\mathbb{R} \times \mathbb{R})$ satisfying $\sigma(\Omega) \subseteq G$ for all
$(x_1, x_2) \in \sigma^{-1}(G)$, and then constructing the union of such sets $\Omega$. This ultimately amounts to the same
thing, but it is easier to use Theorem 35.3.3. The inverse images for Theorem 35.3.5 (ii, iii, iv) have a more
complicated structure, which makes them even more inconvenient when using only Definition 31.12.4.


To avoid mentioning metric spaces, Theorem 35.3.5 is expressed in terms of the absolute value function
$|\cdot| : \mathbb{R} \to \mathbb{R}_0^+$ in Definition 16.5.2.


35.3.5 THEOREM: Continuity of real-number algebraic operations.
(i) The real-number addition operation $\sigma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuous.
(ii) The real-number product operation $\tau : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuous.
(iii) The Cartesian linear space scalar product operation $\mu : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous for all $n \in \mathbb{Z}_0^+$.
(iv) The Cartesian linear space vector addition operation $\sigma_n : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous for all $n \in \mathbb{Z}_0^+$.


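PROOF: For part (i), let $x \in \mathbb{R} \times \mathbb{R}$. Then $x = (x_1, x_2)$ for some $x_1, x_2 \in \mathbb{R}$. Let $y = \sigma(x_1, x_2) = x_1 + x_2$.
Let $G \in \mathrm{Top}_y(\mathbb{R})$. Then $(y - \varepsilon, y + \varepsilon) \subseteq G$ for some $\varepsilon \in \mathbb{R}^+$ by Theorem 32.5.9. Let $\delta = \varepsilon/2$. Let
$\Omega_1 = (x_1 - \delta, x_1 + \delta)$ and $\Omega_2 = (x_2 - \delta, x_2 + \delta)$. Then $\Omega_1 \in \mathrm{Top}_{x_1}(\mathbb{R})$ and $\Omega_2 \in \mathrm{Top}_{x_2}(\mathbb{R})$ by Definition 32.5.7.
So $\Omega_1 \times \Omega_2 \in \mathrm{Top}_{(x_1,x_2)}(\mathbb{R} \times \mathbb{R})$ by Theorem 32.9.6 (i). Let $(x_1', x_2') \in \Omega_1 \times \Omega_2$. Then $|\sigma(x_1', x_2') - y| < \varepsilon$. So
$\sigma(x_1', x_2') \in G$. Thus $\sigma(\Omega_1 \times \Omega_2) \subseteq G$. Therefore $\sigma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuous at $(x_1, x_2)$ by Definition 35.3.2.
Hence $\sigma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuous by Theorem 35.3.3. (See Definition 16.5.2 for $|\cdot|$.)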

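For part (ii), let $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}$. Let $y = \tau(x_1, x_2) = x_1 x_2$. Let $G \in \mathrm{Top}_y(\mathbb{R})$. Then $(y - \varepsilon, y + \varepsilon) \subseteq G$
for some $\varepsilon \in \mathbb{R}^+$ by Theorem 32.5.9.
Suppose that $x_1 = 0$ and $x_2 = 0$. Let $\delta = \min(1, \varepsilon)$. Let $\Omega_1 = (x_1 - \delta, x_1 + \delta)$ and $\Omega_2 = (x_2 - \delta, x_2 + \delta)$. Let
$(x_1', x_2') \in \Omega_1 \times \Omega_2$. Then $|x_1' x_2'| < \varepsilon$. So $\tau(x_1', x_2') \in G$ because $y = 0$. Thus $\tau$ is continuous at $(x_1, x_2)$.
Suppose that $x_1 = 0$ and $x_2 \ne 0$. Let $\delta_1 = \frac12\varepsilon/|x_2|$ and $\delta_2 = |x_2|$. Let $\Omega_1 = (x_1 - \delta_1, x_1 + \delta_1)$ and
$\Omega_2 = (x_2 - \delta_2, x_2 + \delta_2)$. Let $(x_1', x_2') \in \Omega_1 \times \Omega_2$. Then $|x_1' x_2'| < \varepsilon$. So $\tau(x_1', x_2') \in G$. Thus $\tau$ is continuous
at $(x_1, x_2)$ because $y = 0$, and similarly if $x_1 \ne 0$ and $x_2 = 0$.

Suppose that $x_1 \ne 0$ and $x_2 \ne 0$. Let $\delta_1 = \frac13\min(\varepsilon/|x_2|, \varepsilon, 1)$ and $\delta_2 = \frac13\min(\varepsilon/|x_1|, \varepsilon, 1)$. Let
$\Omega_1 = (x_1 - \delta_1, x_1 + \delta_1)$ and $\Omega_2 = (x_2 - \delta_2, x_2 + \delta_2)$. Let $(x_1', x_2') \in \Omega_1 \times \Omega_2$. Then
$|x_1' x_2' - x_1 x_2| < \delta_1 |x_2| + \delta_2 |x_1| + \delta_1 \delta_2 \le \frac13\varepsilon + \frac13\varepsilon + \frac19\varepsilon < \varepsilon$.
So $\tau(x_1', x_2') \in G$. Thus $\tau$ is continuous at $(x_1, x_2)$. Hence $\tau : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuous
by Theorem 35.3.3.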

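For part (iii), let $(\lambda, x) \in \mathbb{R} \times \mathbb{R}^n$. Let $y = \mu(\lambda, x) = \lambda x$. Let $G \in \mathrm{Top}_y(\mathbb{R}^n)$. If $n = 0$, then $x = y = 0$ and
$G = \{0\}$, and the continuity follows trivially. So assume that $n \in \mathbb{Z}^+$. Then $\times_{i=1}^n (y_i - \varepsilon, y_i + \varepsilon) \subseteq G$ for
some $\varepsilon \in \mathbb{R}^+$ by Theorem 32.6.4.

Suppose that $\lambda = 0$ and $x = 0$. Let $\delta = \min(1, \varepsilon)$. Let $\Omega_1 = (\lambda - \delta, \lambda + \delta)$ and $\Omega_2 = \times_{i=1}^n (x_i - \delta, x_i + \delta)$. Let
$(\lambda', x') \in \Omega_1 \times \Omega_2$. Then $|\lambda' x_i'| < \varepsilon$ for all $i \in \mathbb{N}_n$. So $\mu(\lambda', x') \in G$. Thus $\mu$ is continuous at $(\lambda, x)$.
Suppose that $\lambda = 0$ and $x \ne 0$. Let $\delta_2 = \max_{i=1}^n |x_i|$. Then $\delta_2 \in \mathbb{R}^+$. Let $\delta_1 = \frac12\varepsilon/\delta_2$. Let $\Omega_1 = (\lambda - \delta_1, \lambda + \delta_1)$
and $\Omega_2 = \times_{i=1}^n (x_i - \delta_2, x_i + \delta_2)$. Let $(\lambda', x') \in \Omega_1 \times \Omega_2$. Then $|\lambda' x_i'| < \varepsilon$ for all $i \in \mathbb{N}_n$. So $\mu(\lambda', x') \in G$.
Thus $\mu$ is continuous at $(\lambda, x)$ because $y = 0$.

Suppose that $\lambda \ne 0$ and $x = 0$. Let $\delta_1 = |\lambda|$. Then $\delta_1 \in \mathbb{R}^+$. Let $\delta_2 = \frac12\varepsilon/\delta_1$. Let $\Omega_1 = (\lambda - \delta_1, \lambda + \delta_1)$ and
$\Omega_2 = \times_{i=1}^n (x_i - \delta_2, x_i + \delta_2)$. Let $(\lambda', x') \in \Omega_1 \times \Omega_2$. Then $|\lambda' x_i'| < \varepsilon$ for all $i \in \mathbb{N}_n$. So $\mu(\lambda', x') \in G$. Thus $\mu$
is continuous at $(\lambda, x)$ because $y = 0$.

Suppose that $\lambda \ne 0$ and $x \ne 0$. Let $M = \max_{i=1}^n |x_i|$. Then $M \in \mathbb{R}^+$. Let $\delta_1 = \frac13\min(\varepsilon/M, \varepsilon, 1)$ and
$\delta_2 = \frac13\min(\varepsilon/|\lambda|, \varepsilon, 1)$. Let $\Omega_1 = (\lambda - \delta_1, \lambda + \delta_1)$ and $\Omega_2 = \times_{i=1}^n (x_i - \delta_2, x_i + \delta_2)$. Let $(\lambda', x') \in \Omega_1 \times \Omega_2$.
Then $|\lambda' x_i' - \lambda x_i| \le \delta_1 M + \delta_2 |\lambda| + \delta_1 \delta_2 \le \frac13\varepsilon + \frac13\varepsilon + \frac19\varepsilon < \varepsilon$ for all $i \in \mathbb{N}_n$. So $\mu(\lambda', x') \in G$. Thus $\mu$ is
continuous at $(\lambda, x)$. Hence $\mu : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous by Theorem 35.3.3.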

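For part (iv), let $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$. Let $z = \sigma_n(x, y) = x + y$. Let $G \in \mathrm{Top}_z(\mathbb{R}^n)$. If $n = 0$, then $x = y = z = 0$
and $G = \{0\}$, and the continuity follows trivially. So assume that $n \in \mathbb{Z}^+$. Then $\times_{i=1}^n (z_i - \varepsilon, z_i + \varepsilon) \subseteq G$ for
some $\varepsilon \in \mathbb{R}^+$ by Theorem 32.6.4. Let $\delta = \varepsilon/2$. Let $\Omega_1 = \times_{i=1}^n (x_i - \delta, x_i + \delta)$ and $\Omega_2 = \times_{i=1}^n (y_i - \delta, y_i + \delta)$.
Then $\Omega_1 \in \mathrm{Top}_x(\mathbb{R}^n)$ and $\Omega_2 \in \mathrm{Top}_y(\mathbb{R}^n)$ by Definition 32.6.1. So $\Omega_1 \times \Omega_2 \in \mathrm{Top}_{(x,y)}(\mathbb{R}^n \times \mathbb{R}^n)$ by
Theorem 32.9.6 (ii). Let $(x', y') \in \Omega_1 \times \Omega_2$. Then $|\sigma_n(x', y')_i - z_i| < \varepsilon$ for all $i \in \mathbb{N}_n$. So $\sigma_n(x', y') \in G$.
Thus $\sigma_n(\Omega_1 \times \Omega_2) \subseteq G$. Therefore $\sigma_n : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous at $(x, y)$ by Definition 35.3.2. Hence
$\sigma_n : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous by Theorem 35.3.3.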
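The choice of interval radii in the proof of part (ii) for the case where both factors are non-zero can be sanity-checked numerically. The following sketch samples points of the open box around a given pair and tests the product bound. It assumes the factor 1/3 in the definition of the radii, which is one choice for which the estimate works, and the function names are hypothetical. Random sampling is of course not a proof, only a plausibility check.

```python
import random


def product_deltas(x1, x2, eps):
    """Interval radii for the case x1 != 0 and x2 != 0 (factor 1/3 assumed)."""
    d1 = min(eps / abs(x2), eps, 1.0) / 3.0
    d2 = min(eps / abs(x1), eps, 1.0) / 3.0
    return d1, d2


def check_product_bound(x1, x2, eps, samples=10_000, seed=0):
    """Sample points (x1 + a, x2 + b) of the box and test that
    |(x1 + a) * (x2 + b) - x1 * x2| < eps for each sample."""
    rng = random.Random(seed)
    d1, d2 = product_deltas(x1, x2, eps)
    for _ in range(samples):
        a = rng.uniform(-d1, d1)
        b = rng.uniform(-d2, d2)
        if abs((x1 + a) * (x2 + b) - x1 * x2) >= eps:
            return False
    return True


print(check_product_bound(2.0, -3.0, 0.5))
print(check_product_bound(0.01, 1000.0, 1e-3))
```

With these radii, the three error terms are bounded by eps/3, eps/3 and eps/9 respectively, so every sampled point should pass the test with some margin.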


35.3.6 REMARK:  Decomposition of the continuity concept into limits and values. 

After decomposing global continuity into continuity at each point of the domain, a further decomposition of 
pointwise continuity can be expressed in terms of the limit and the value of a function at each point. Then, 
roughly speaking, the function should be continuous at a point if and only if the limit and value at that 
point are the same. 


There are many situations where the procedure for computing the value of a function fails at particular
points. For example, the function $f : \theta \mapsto \theta^{-1} \sin\theta$ is not meaningful at $\theta = 0$, although the limit is well
defined. Similarly, differential quotients of the form $x \mapsto (f(x) - f(a))/(x - a)$ for functions $f : \mathbb{R} \to \mathbb{R}$
and $a \in \mathbb{R}$ are not defined for $x = a$, although the limit is often well defined. In fact, a large proportion of
analysis is concerned with limits of expressions whose values cannot be computed directly at the limit point.
Thus limits, and convergence to limits, are important independently of their relevance to continuity.


The convergence test in Definition 35.3.7 is the same as the continuity test in Definition 35.3.2 except that
the neighbourhood $\Omega$ is “punctured” by removing the point $p \in X$.


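35.3.7 DEFINITION: A limit of a function $f : X \to Y$ at a point $p \in X$, for topological spaces $X$ and $Y$, is
an element $q \in Y$ which satisfies $\forall G \in \mathrm{Top}_q(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega \setminus \{p\}) \subseteq G$.


A function $f : X \to Y$ converges at a point $p \in X$ to a value $q \in Y$, for topological spaces $X$ and $Y$, when
$\forall G \in \mathrm{Top}_q(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega \setminus \{p\}) \subseteq G$.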


35.3.8 THEOREM: A function is continuous at a point if and only if it converges to its value at that point.
Let $f : X \to Y$ for topological spaces $X$ and $Y$. Let $p \in X$. Then $f$ is continuous at $p$ if and only if $f$
converges at $p$ to $f(p)$.


PROOF: Suppose that $f$ is continuous at $p \in X$. Then $\forall G \in \mathrm{Top}_{f(p)}(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega) \subseteq G$ by
Definition 35.3.2. Therefore $\forall G \in \mathrm{Top}_{f(p)}(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega \setminus \{p\}) \subseteq G$. So $f$ converges at $p$ to $f(p)$
by Definition 35.3.7.

Now suppose that $f$ converges at $p$ to $f(p)$. Then $\forall G \in \mathrm{Top}_{f(p)}(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega \setminus \{p\}) \subseteq G$ by
Definition 35.3.7. But if $G \in \mathrm{Top}_{f(p)}(Y)$, then $f(p) \in G$. So $f(\{p\}) \subseteq G$. Therefore $f(\Omega \setminus \{p\}) \subseteq G$ implies
that $f(\Omega) = f(\Omega \setminus \{p\}) \cup f(\{p\}) \subseteq G$ by Theorem 10.6.7 (iii). So $\forall G \in \mathrm{Top}_{f(p)}(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega) \subseteq G$.
Hence $f$ is continuous at $p$ by Definition 35.3.2.


35.3.9 THEOREM: Continuity is equivalent to convergence of a function to its value at all points.
Let $f : X \to Y$ for topological spaces $X$ and $Y$. Then $f$ is continuous if and only if $f$ converges at $p$ to $f(p)$
for all $p \in X$.


PROOF: The assertion follows from Theorems 35.3.3 and 35.3.8.


35.3.10 REMARK: The link between pointwise continuity and open inverse-image continuity. 

Theorems 35.3.8 and 35.3.9 establish links between the abstract Definition 31.12.4 for continuity, expressed 
in terms of the inverse images of open sets, and the more concrete pointwise Definitions 35.3.2 and 35.3.7, 
expressed in terms of convergence of a function to its value at each point of its domain. The abstract 
definition is often more efficient for proofs, but the concrete definition is more intuitive. 


The pointwise definition is itself an abstraction of the $\varepsilon$-$\delta$ style of definition for metric spaces, but the $\varepsilon$-$\delta$
style of definition took over a century to develop out of the intuitive concepts of continuity prevalent in
calculus in the 18th century and earlier. The 17th and 18th century concepts of continuity developed out of
the even more primitive earlier concepts of limits which were due to Eudoxus of Cnidus and Archimedes of
Syracuse in the 4th and 3rd centuries BC.


35.3.11 REMARK: Pointwise neighbourhoods give meaning to locality. 

Definition 35.3.7 hints at why topology is called topology. The set of neighbourhoods $\mathrm{Top}_p(X)$ around the
point $p \in X$ defines closeness to this point. Similarly, the set of neighbourhoods $\mathrm{Top}_q(Y)$ defines proximity
to $q \in Y$. The proposition $\forall G \in \mathrm{Top}_q(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega \setminus \{p\}) \subseteq G$ may be interpreted as: “for any
neighbourhood $G$ of $q$, there is some (punctured) neighbourhood $\Omega \setminus \{p\}$ of $p$ such that the values of $f$ on
$\Omega \setminus \{p\}$ fall within $G$.” In a metric space, one uses balls $B_{p,\delta}$ and $B_{q,\varepsilon}$ as neighbourhoods. (See Section 37.3
for balls in metric spaces.) Thus the ancient Greek word “τόπος” (topos) is associated with the notion of
closeness to a place, generalising the notion of distance.


Even so, it is a far stretch to call this kind of topology the "analysis of place". OED [482], page 2328, gives 
the year 1659 for the earliest use of the word “topology”. They give the archaic primary meaning as: “The 
department of botany which treats of the localities where plants are found." Probably the word “locality” 
comes closest to describing the subject matter of the mathematical subject of topology. The Latin word 
“locus” does in fact mean “place” in the same sense as the Greek “τόπος”. So “the study of locality” would
be a good interpretation for the word “topology”. Definition 35.3.7 says that as the function f is localised 
more and more to the point p, its value f(x) is localised more and more to the value q. The meaning of the 
word “neighbourhood” is very close to the word “locality”. Thus topology is “the study of neighbourhoods", 
or simply “the study of neighbourhood", where “neighbourhood” means the abstract relation between points 
of being neighbours of each other, like “sisterhood”, “brotherhood”, “parenthood” and so forth. 


As mentioned in Remark 31.3.21, the best default topology for a set which “has no topology" is the discrete 
topology. This may seem paradoxical because the discrete topology is the biggest topology, containing 
the most open sets. But in the discrete topology, each point p is effectively an island because the set 
{p} is a neighbourhood of p. Conversely, when one says that a set does have a topology, this means that 
all neighbourhoods of points include other points, typically infinitely many other points. The topology 
with the maximum “neighbourliness” is, perhaps paradoxically, the trivial topology because the smallest 
neighbourhood of any point is the whole set. (See Remark 31.1.1 for further comments on the origin and 
meaning of the word “topology”.)


35.3.12 REMARK: Uniqueness and non-uniqueness of limits. 
The uniqueness of the value $q$ in Definition 35.3.7 is not guaranteed. As an extreme example, let $Y$ have the
trivial topology $\mathrm{Top}(Y) = \{\emptyset, Y\}$. (See Definition 31.3.18.) Let $q \in Y$, and let $G \in \mathrm{Top}_q(Y)$. Then $G = Y$.
Let $f : X \to Y$. Then $f^{-1}(G) = X \in \mathrm{Top}(X)$. Let $\Omega = X$. Then $\Omega \in \mathrm{Top}_p(X)$ for all $p \in X$. Hence $f$
converges at all points $p \in X$ to all values $q \in Y$, for all functions $f : X \to Y$.


As another extreme example, let $X$ have the discrete topology $\mathrm{Top}(X) = \mathbb{P}(X)$. (See Definition 31.3.19.)
Let $p \in X$ and let $\Omega = \{p\}$. Then $\Omega \in \mathrm{Top}_p(X)$ and $f(\Omega \setminus \{p\}) = f(\emptyset) = \emptyset \subseteq G$ for all $G \in \mathrm{Top}_q(Y)$ for
all $q \in Y$. Hence $f$ converges at all points $p \in X$ to all values $q \in Y$, for all functions $f : X \to Y$.
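The two extreme cases above can be reproduced on small finite spaces by computing the set of all limits of a function at a point directly from Definition 35.3.7. This is only an illustrative sketch: the representation of topologies as sets of `frozenset` objects, the helper names `powerset`, `neighbourhoods` and `limits_at`, and the particular three-point and two-point sets are all assumptions made for the example.

```python
from itertools import combinations


def powerset(points):
    """The discrete topology: all subsets of the given finite set."""
    pts = list(points)
    return {frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)}


def neighbourhoods(topology, point):
    """Open sets of the topology which contain the given point."""
    return [u for u in topology if point in u]


def limits_at(f, top_x, top_y, p, points_y):
    """All q in Y such that every open G containing q admits an open Omega
    containing p with f(Omega minus {p}) a subset of G."""
    return {
        q for q in points_y
        if all(
            any({f[x] for x in omega if x != p} <= g
                for omega in neighbourhoods(top_x, p))
            for g in neighbourhoods(top_y, q)
        )
    }


X = {0, 1, 2}
Y = {"a", "b"}
trivial_x = {frozenset(), frozenset(X)}
trivial_y = {frozenset(), frozenset(Y)}
f = {0: "a", 1: "a", 2: "b"}   # a non-constant function
g = {0: "a", 1: "a", 2: "a"}   # a constant function

print(limits_at(f, trivial_x, trivial_y, 0, Y))      # trivial target topology
print(limits_at(f, powerset(X), powerset(Y), 1, Y))  # discrete domain topology
print(limits_at(f, trivial_x, powerset(Y), 0, Y))    # trivial domain, discrete target
print(limits_at(g, trivial_x, powerset(Y), 0, Y))
```

The first two calls return every point of the target set, as in the two extreme examples. In the last two calls the domain has no isolated points and the target is discrete (hence Hausdorff), and the function then has at most one limit at the point 0: none for the non-constant function and exactly one for the constant function.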


These examples show that to guarantee uniqueness of the values to which functions converge, constraints 
are required both on the domain topology and the range topology. The domain topology should not be 
too strong, and the range topology should not be too weak. The standard separation classes for topologies 
in Sections 33.1 and 33.3 put minimum constraints on separation (i.e. the strength of the topology), very 
rarely maximum constraints. For Theorem 35.3.13, a suitable maximum constraint for the domain topology 
is the non-existence of isolated points. (See Definition 31.10.8.) A suitable (minimum) separation class for 
the range topology is the Hausdorff class. (See Definition 33.1.24.) 


35.3.13 THEOREM: Function limits are unique, assuming no-isolated-point domain and $T_2$ target space.
Let $f : X \to Y$ for topological spaces $X$ and $Y$, where $X$ has no isolated points and $Y$ is a Hausdorff space.
If $f$ converges at $p \in X$ to both $q_1 \in Y$ and $q_2 \in Y$, then $q_1 = q_2$.


PROOF: Let $f : X \to Y$ for topological spaces $X$ and $Y$, where $X$ has no isolated points and $Y$ is a
Hausdorff space. Suppose that $f$ converges at $p \in X$ to both $q_1 \in Y$ and $q_2 \in Y$, where $q_1 \ne q_2$. Then
by Definition 33.1.24, there are sets $G_1 \in \mathrm{Top}_{q_1}(Y)$ and $G_2 \in \mathrm{Top}_{q_2}(Y)$ such that $G_1 \cap G_2 = \emptyset$. By
Definition 35.3.7, there are sets $\Omega_1, \Omega_2 \in \mathrm{Top}_p(X)$ such that $f(\Omega_1 \setminus \{p\}) \subseteq G_1$ and $f(\Omega_2 \setminus \{p\}) \subseteq G_2$.
Let $\Omega = \Omega_1 \cap \Omega_2$. Then $\Omega \in \mathrm{Top}_p(X)$, and $\Omega \setminus \{p\} = (\Omega_1 \setminus \{p\}) \cap (\Omega_2 \setminus \{p\})$ by Theorem 8.2.6 (vi). So
$f(\Omega \setminus \{p\}) \subseteq G_1 \cap G_2 = \emptyset$ by Theorem 10.6.7 (iv). So $\Omega \setminus \{p\} = \emptyset$. So $\Omega \subseteq \{p\}$. But $p \in \Omega$. So $\Omega = \{p\}$.
So $\{p\} \in \mathrm{Top}_p(X)$. So $p$ is an isolated point of $X$ by Theorem 31.10.10, which contradicts an assumption.
Hence $q_1 = q_2$.


35.3.14 REMARK: Counterexample justifying Hausdorff condition requirement for uniqueness of limits. 
Example 35.3.15 shows that the Hausdorff condition required for the function range in Theorem 35.3.13
cannot be weakened to the $T_1$ condition.


35.3.15 EXAMPLE: A function converging to an infinite number of limits at a single point. 

Let $X = \mathbb{R}$ with the standard topology for $\mathbb{R}$. Let $Y = \mathbb{Z}$ with the finite-complement topology $\mathrm{Top}(Y) =
\{\emptyset\} \cup \{\Omega \in \mathbb{P}(Y);\ \#(Y \setminus \Omega) < \infty\}$ as in Definition 31.11.7. Then $Y$ is a $T_1$ space which is not Hausdorff.
(This is shown in Theorem 33.1.28.) Define $f : X \to Y$ by $f(0) = 0$ and $f(x) = \mathrm{floor}(|x|^{-1})$ for $x \in \mathbb{R} \setminus \{0\}$.
(See Figure 35.3.1.)


[Graph of $f(x)$ against $x$.]
Figure 35.3.1 Function which converges to an infinite number of limits


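Let $q \in Y$. Let $G \in \mathrm{Top}_q(Y)$. Then $G \supseteq \mathbb{Z}[n, \infty)$ for some $n \in \mathbb{Z}^+$. Let $p = 0$. Let $\Omega = (-1/n, 1/n)$. Then
$\Omega \in \mathrm{Top}_p(X)$ and $f(\Omega \setminus \{p\}) \subseteq \mathbb{Z}[n, \infty) \subseteq G$. Therefore $\forall G \in \mathrm{Top}_q(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ f(\Omega \setminus \{p\}) \subseteq G$.
Hence $f$ converges to $q$ at $p = 0 \in X$ for any $q \in Y$. So $f$ has an infinite number of distinct limits at 0.
Since $((-1/n, 1/n);\ n \in \mathbb{Z}^+)$ is an open base for $X$ at 0, $\bigcap_{\Omega \in \mathrm{Top}_0(\mathbb{R})} f(\Omega) = \{0\} \cup \bigcap_{n=1}^{\infty} \mathbb{Z}[n, \infty) = \{0\}$,
$\bigcap_{\Omega \in \mathrm{Top}_0(\mathbb{R})} f(\Omega \setminus \{0\}) = \emptyset$, and $\bigcap_{\Omega \in \mathrm{Top}_0(\mathbb{R})} \overline{f(\Omega \setminus \{0\})} = \mathbb{Z}$ by Remark 31.11.11.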
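The key step of this example can be checked mechanically: for a cofinite open set of integers, a punctured interval around 0 can be chosen whose image avoids the finite complement. The sketch below uses exact rational arithmetic and a finite grid of sample points, so it is an illustration rather than a proof. The open set is represented only by its finite complement, and the names `tail_start` and `punctured_image_in_G` are hypothetical.

```python
from fractions import Fraction
import math


def f(x):
    """f(0) = 0 and f(x) = floor(1/|x|) otherwise (exact for Fractions)."""
    return 0 if x == 0 else math.floor(1 / abs(x))


def tail_start(complement):
    """A positive integer n such that Z[n, oo) avoids the finite complement."""
    return max([k + 1 for k in complement] + [1])


def punctured_image_in_G(complement, max_den=40):
    """Test f(x) in G for rational sample points x with 0 < |x| < 1/n,
    where G is the set of integers outside `complement`."""
    n = tail_start(complement)
    for b in range(2, max_den):
        for a in range(1, b):
            for sign in (1, -1):
                x = sign * Fraction(a, b * n)   # 0 < |x| < 1/n
                if f(x) in complement:
                    return False
    return True


print(punctured_image_in_G({0, 1, 2, 7}))
print(punctured_image_in_G({-5, 3}))
```

The point to which the function is supposed to converge plays no role in the check, which reflects the conclusion of the example: the same punctured interval works for every open set with the given complement, whichever of its points is taken as the candidate limit.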


35.3.16 REMARK: A theorem and definition which do not require unique limits. 

Definition 35.3.7 and Theorem 35.3.9 do not require the uniqueness of the value to which the function 
converges. This seems perhaps a little surprising because in the case of metric spaces, which are necessarily 
Hausdorff, one is accustomed to always having at most one limit for a function at each non-isolated point. 
With general topological spaces, one must be more cautious. A function may converge to more than one 
value at a single point of its domain, as in Example 35.3.15. So when one says that a function is continuous 
at a point because it converges to its value, that does not necessarily exclude the possibility that it also 
converges to other values. 


Notation 35.3.17 requires the existence and uniqueness of the limit of a given function at a given point. 
Theorem 35.3.13 gives some sufficient conditions for uniqueness, but existence is more difficult to guarantee 
in a general way because it depends very much on the function. Thus Notation 35.3.17 can only be used 
after testing specific functions at specific points in specific topological spaces to determine whether the limit 
is well defined. 


35.3.17 NOTATION: $\lim_{z \to x} f(z)$, for a function $f : X \to Y$, for topological spaces $X$ and $Y$ with $x \in X$,
denotes the limit of $f$ at $x$ if the limit is well defined.


35.3.18 REMARK: The arbitrary, unnecessary dummy variable in the standard notation for limits. 
Although Notation 35.3.17 is commonly used for the limit of a function $f$, it contains a superfluous variable $z$.
Notation 35.3.19 is preferable, but is probably non-standard. (The apparently superfluous “dummy variable”
is useful for inline definition of functions. For example, $\lim_{z \to 0} z^{-1} \sin(z) = 1$.)


35.3.19 NOTATION: $\lim_x f$, for a function $f : X \to Y$, for topological spaces $X$ and $Y$ with $x \in X$, denotes
the limit of $f$ at $x$ if the limit is well defined.


35.3.20 THEOREM: Function continuity at a point is equivalent to convergence to its limit at the point.
A function $f : X \to Y$ for topological spaces $X$ and $Y$ is continuous at $x \in X$ if and only if $\lim_x f$, the limit
of $f$ at $x$, is well defined and $f(x) = \lim_x f$.


PROOF: This assertion is a paraphrase of Theorem 35.3.9.


35.3.21 REMARK: The limit set of a function at a point. 

It is inconvenient that functions must be tested for convergence at a given point before computing the limit 
of the function at that point. Every statement about limits of functions must be preceded by a condition that 
the limit exists, but that existence is determined by an attempt to compute the limit. Thus computation 
and existence are intertwined, as they very often are in analysis. It would be more satisfying if there were a 
set-theoretic expression which is always well defined, and then if this has some specified property, one may 
say that a limit is well defined. Thus one would make a computation which is always well defined, followed 
by a simple test to determine whether the function is convergent and then whether it is continuous. 


For metric spaces $M_1$ and $M_2$, one could perhaps define a suitable set-theoretic expression for a function
$f : M_1 \to M_2$ at a point $p \in M_1$ as $\lim_p(f) = \bigcap_{\delta > 0} f(B_{p,\delta})$. (This is related to Theorem 38.1.7.) Then
$\lim_p(f) = \{f(p)\}$ would hopefully be true if and only if $f$ is continuous at $p$. If $M_2 = \mathbb{R}$, one may define
inferior and superior limits of functions, which are then equal if and only if the limit of the function is well
defined, as described in Section 35.8.

For general topological spaces $X$ and $Y$, a suitable limit-set expression for a function $f : X \to Y$ at $p \in X$
could be $\lim_p(f) = \bigcap_{\Omega \in \mathrm{Top}_p(X)} f(\Omega)$. Theorem 35.3.22 suggests that this expression could be useful for
testing functions for their convergence and continuity, but Example 35.3.23 shows that this will not be
successful in general.


35.3.22 THEOREM: Continuity at a point implies intersection of images of neighbourhoods is a singleton.
Let $X$ be a topological space. Let $Y$ be a $T_1$ space. Let $f : X \to Y$ be a continuous function. Then


$\forall p \in X,\ \bigcap_{\Omega \in \mathrm{Top}_p(X)} f(\Omega) = \{f(p)\}$.
PROOF: Let $f : X \to Y$ be continuous. Let $p \in X$. Let $\Omega \in \mathrm{Top}_p(X)$. Then $p \in \Omega$. So $f(p) \in f(\Omega)$.
Therefore $f(p) \in \bigcap_{\Omega \in \mathrm{Top}_p(X)} f(\Omega)$. Now let $q \in \bigcap_{\Omega \in \mathrm{Top}_p(X)} f(\Omega)$. Suppose that $q \ne f(p)$. Then there is a set
$G \in \mathrm{Top}_{f(p)}(Y)$ such that $q \notin G$ by Definition 33.1.8 because $Y$ is a $T_1$ space. Then $\Omega = f^{-1}(G) \in \mathrm{Top}_p(X)$.
So $f(\Omega) = f(f^{-1}(G)) = G \cap \mathrm{Range}(f)$ by Theorem 10.7.1 (i). So $q \notin f(\Omega)$. Therefore $q \notin \bigcap_{\Omega \in \mathrm{Top}_p(X)} f(\Omega)$,
which is a contradiction. Hence $\bigcap_{\Omega \in \mathrm{Top}_p(X)} f(\Omega) = \{f(p)\}$.


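35.3.23 EXAMPLE: Define $f : \mathbb{R} \to \mathbb{R}$ by

$$\forall x \in \mathbb{R},\quad f(x) = \begin{cases} |x|^{-1} & \text{if } x \in \{n^{-1};\ n \in \mathbb{Z} \setminus \{0\}\} \\ 0 & \text{otherwise,} \end{cases}$$

where $\mathbb{R}$ has the usual topology. (See Figure 35.3.2.)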


[Graph of $f(x)$ against $x$.]
Figure 35.3.2 Divergent function with singleton limit set


Then $\bigcap_{\Omega \in \mathrm{Top}_p(\mathbb{R})} f(\Omega) = \{f(p)\}$ for $p = 0$. But $f$ is not continuous at 0. Nor is $f$ convergent at 0.


35.3.24 REMARK: The limit set of a function at a point. 
Example 35.3.23 shows that even if the limit set at some point in Definition 35.3.25 is a singleton, this does 
not necessarily imply that the function is convergent at that point. 


Definition 35.3.25 removes the point $x \in X$ from the limit-set expression, but adds all of the limit points of
the sets $f(\Omega \setminus \{x\})$ by computing their closures. (By Theorem 31.10.12 (vii), the closure of a set adds all of its limit
points.) So it seems that the limit set in Definition 35.3.25 should include all of the possible limit points of
a function at a given point. However, this does not imply convergence.


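35.3.25 DEFINITION: The limit set of a function $f$ at a point $x$, for topological spaces $X$ and $Y$, function
$f : X \to Y$ and $x \in X$, is the set


$\bigcap_{\Omega \in \mathrm{Top}_x(X)} \overline{f(\Omega \setminus \{x\})} = \{y \in Y;\ \forall \Omega \in \mathrm{Top}_x(X),\ y \in \overline{f(\Omega \setminus \{x\})}\}$.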


35.3.26 REMARK: The topology of pointwise convergence. 

Pointwise convergence of functions is the same as the product topology for the space of all functions, except 
that it must be restricted to the function space of interest. In the case of Definition 35.3.27, the function 
space is the set of all continuous functions from $X$ to $Y$, but this space can be any subset of $Y^X$. (See
Definition 32.12.2 for the product topology in Definition 35.3.27. For the pointwise convergence topology, 
see Gaal [77], page 61; Willard [165], pages 278-279.) 


35.3.27 DEFINITION: The topology of pointwise convergence on the set $C(X, Y)$ for topological spaces
$X -\!\!< (X, T_X)$ and $Y -\!\!< (Y, T_Y)$ is the restriction to $\mathbb{P}(C(X, Y))$ of the product topology on $Y^X$.
35.3.28 REMARK: The compact-open topology for spaces of continuous functions.
See Definition 33.5.20 for the compact-open topology on C(X,Y), which is stronger than the pointwise 
convergence topology. 


35.3.29 REMARK: Convergence of set-valued functions. 

It is sometimes useful to define limits of set-valued (or multiple valued) functions $f : X \to \mathbb{P}(Y)$ for
topological spaces $X$ and $Y$. Definition 35.3.30 follows the pattern of Definition 35.3.7. This has some
relevance to the nested set convergence concepts in Section 37.9.


35.3.30 DEFINITION: A limit of a set-valued function $f : X \to \mathbb{P}(Y)$ at a point $p \in X$, for topological
spaces $X$ and $Y$, is an element $q \in Y$ which satisfies $\forall G \in \mathrm{Top}_q(Y),\ \exists \Omega \in \mathrm{Top}_p(X),\ \forall x \in \Omega \setminus \{p\},\ f(x) \subseteq G$.


35.4. Limits and convergence of sequences 


35.4.1 REMARK: Specialisation of limits and convergence concepts to sequences. 

An infinite sequence is a function whose domain is the totally ordered set of all non-negative integers. (See 
Definition 12.3.1 for infinite sequences.) In the sense of the limit of a function at a point in Definition 35.3.7, 
every sequence converges at every point in its domain to every point in its range, assuming the usual topology 
on the integers as in Definition 32.5.3. Therefore the general definition of limits for functions is of no interest 
when applied to infinite sequences. However, in terms of the usual topology on the extended non-negative 
integers, the situation is more interesting. The limiting process of interest for infinite sequences is the limit 
of its value as a variable point in the domain “tends to infinity”. Since such a sequence has no value at 
infinity, it is not meaningful to define continuity at infinity, but limits can be well defined “at infinity” by
ignoring the non-existent value of the sequence at infinity. 


For brevity, a sequence is implicitly assumed here to be infinite unless otherwise noted. 


35.4.2 DEFINITION: A limit of a sequence $(x_i)_{i \in I}$ of points in a topological space $X$ is a point $z \in X$ such
that $\forall \Omega \in \mathrm{Top}_z(X),\ \exists n \in I,\ \forall i \ge n,\ x_i \in \Omega$, where $I$ is $\mathbb{Z}_0^+$ or $\mathbb{Z}^+$.


A convergent sequence in a topological space $X$ is a sequence of points in $X$ which has a limit in $X$.


A divergent sequence in a topological space $X$ is a sequence of points in $X$ which is not convergent in $X$.


A sequence $x = (x_i)_{i \in I}$ converges to a point $z$ in a topological space $X$ when $z$ is a limit of $x$ in $X$.


35.4.3 THEOREM: A constant infinite sequence converges to its constant value. 
Every constant infinite sequence in a topological space converges to its value. 


PROOF: Let $x$ be a sequence in a topological space $X$ with $x_i = z$ for all $i \in \mathbb{Z}_0^+$. Let $\Omega \in \mathrm{Top}_z(X)$. Let
$n = 0$. Then $x_i \in \Omega$ for all $i \ge n$. Thus $x$ converges to $z$ by Definition 35.4.2.


35.4.4 REMARK:  Notations for limits of sequences. 

There are many different uses for the notation “lim”. The notation $\lim(S)$ in Theorem 31.10.12 denotes the
limit set of a set $S$, which is always well defined. The notation “$\lim_{z \to x} f(z)$” in Notation 35.3.17, for a
function $f : X \to Y$, for topological spaces $X$ and $Y$, denotes the limit of $f$ at $x$, which may or may not be
well defined. (The notation “$\lim_x f$” in Notation 35.3.19 means the same as “$\lim_{z \to x} f(z)$”.) However, none
of these notations is suitable for limits of sequences. Therefore a limit notation must be defined specifically
for sequences, as in Notation 35.4.5. Of course, it must only be used if the limit exists and is unique.


35.4.5 NOTATION: 

$\lim_{i \to \infty} x_i$, for an infinite sequence $x = (x_i)_{i=0}^{\infty}$ in a topological space $X$, denotes the limit of $x$ in $X$ if a
limit exists and is unique. (The choice of “dummy variable” $i$ is arbitrary.)

lim x, for an infinite sequence x in a topological space X, denotes the limit of x in X if a limit exists and is 
unique. 


35.4.6 THEOREM: A subsequence of a convergent sequence converges to the same value as the sequence.
Let $y$ be an infinite subsequence of a sequence $x$ which converges to a point $z$ in a topological space $X$. Then
$y$ converges to $z$ in $X$.
PROOF: Let $x = (x_i)_{i=0}^{\infty}$ converge to $z \in X$. Then $\forall \Omega \in \mathrm{Top}_z(X),\ \exists n \in \mathbb{Z}_0^+,\ \forall i \ge n,\ x_i \in \Omega$. Let $y =
(y_j)_{j=0}^{\infty}$ be an infinite subsequence of $x$. Then $y = x \circ \beta$ for some increasing function $\beta : \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ by
Definition 12.3.3. Let $\Omega \in \mathrm{Top}_z(X)$. Then there exists $n \in \mathbb{Z}_0^+$ satisfying $\forall i \ge n,\ x_i \in \Omega$. Let $i \in \mathbb{Z}_0^+$ with
$i \ge n$. Then $\beta_i \ge i \ge n$ by Theorem 11.7.2 (ii). So $\forall i \ge n,\ x_{\beta_i} \in \Omega$. Thus $\forall i \ge n,\ y_i \in \Omega$. Therefore
$\forall \Omega \in \mathrm{Top}_z(X),\ \exists n \in \mathbb{Z}_0^+,\ \forall i \ge n,\ y_i \in \Omega$. Hence $y$ converges to $z$ by Definition 35.4.2.


35.4.7 REMARK: Conditions for the uniqueness of limits of sequences. 

Theorem 35.4.10 asserts that infinite sequences in Hausdorff (i.e. $T_2$) topological spaces have at most one
limit. Thus any convergent sequence in a Hausdorff space has a unique limit, which is as required by a wide
range of applications.


In a general topological space, the uniqueness of the limit of a sequence is not guaranteed. For example,
let $X = \{y_1, y_2\}$ with $y_1 \ne y_2$, and let $\mathrm{Top}(X) = \{\emptyset, X\}$. Then both $y_1$ and $y_2$ are limits of all infinite
sequences $(x_i)_{i \in I} \in X^I$. This space does not even have the $T_0$ property.


35.4.8 EXAMPLE: The inadequacy of $T_0$ separation to guarantee uniqueness of limits of sequences.

To prevent simple periodic “oscillating sequences” from converging, $X$ requires at least the $T_1$ separation
property. Sequences $x \in X^I$ with $\#(\mathrm{Range}(x)) < \infty$ and $\#\{y \in X;\ \#\{i \in I;\ x_i = y\} = \infty\} > 1$ “oscillate”
between a finite number of values. Such sequences can converge if points in the set of infinitely repeating
values $\{y \in X;\ \#\{i \in I;\ x_i = y\} = \infty\}$ are not separated by neighbourhoods from each other. Such
sequences may have multiple limits. But even points which are not in the set of sequence values may be
limits of sequences which have a finite number of values. For example, let $X = \{y_1, y_2, y_3\}$ for distinct
points $y_1, y_2, y_3$, and $\mathrm{Top}(X) = \{\emptyset, \{y_1, y_2\}, \{y_2\}, \{y_2, y_3\}, X\}$. (This is homeomorphic to the topological
space 3g in Example 31.5.6. See Figure 31.5.2.) As mentioned in Remark 33.3.7, this space is $T_0$, but not
$T_1$ or $T_2$. The sequence $(x_i)_{i \in I} \in X^I$ with $x_i = y_2$ for all $i \in I$ converges to all three points of $X$.
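The limits of finite-range sequences in this three-point space can be enumerated by a short computation. The sketch below relies on the following observation, which is an assumption of the sketch rather than a statement taken from the text: a sequence with finite range eventually takes only its infinitely repeated values, so a point is a limit if and only if every open set containing it contains all of those values. The function name `sequence_limits` and the string labels for the points are hypothetical.

```python
def sequence_limits(topology, points, recurring_values):
    """Limits of a finite-range sequence whose infinitely repeated values
    are `recurring_values`: z is a limit if and only if every open set
    containing z contains all of the recurring values."""
    tail = set(recurring_values)
    return {
        z for z in points
        if all(tail <= u for u in topology if z in u)
    }


points = {"y1", "y2", "y3"}
topology = {
    frozenset(),
    frozenset({"y1", "y2"}),
    frozenset({"y2"}),
    frozenset({"y2", "y3"}),
    frozenset(points),
}

print(sequence_limits(topology, points, {"y2"}))         # constant sequence y2
print(sequence_limits(topology, points, {"y1", "y2"}))   # oscillates between y1 and y2
print(sequence_limits(topology, points, {"y1", "y3"}))   # oscillates between y1 and y3
```

The constant sequence with value y2 has all three points as limits. A sequence oscillating between y1 and y2 still converges (to y1 only), whereas a sequence oscillating between y1 and y3 has no limit because these two points are separated by the open sets {y1, y2} and {y2, y3} in the sense required here.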


35.4.9 EXAMPLE: The inadequacy of $T_1$ separation to guarantee uniqueness of limits of sequences.

To see that $T_1$ separation is not enough to guarantee the uniqueness of limits of sequences, consider the
finite-complement topology $\mathrm{Top}(X) = \{\emptyset\} \cup \{\Omega \in \mathbb{P}(X);\ \#(X \setminus \Omega) < \infty\}$ on $X = \mathbb{Z}$. (See Definition 31.11.7.)
This topology is $T_1$, but not Hausdorff. Define $x = (x_i)_{i=0}^{\infty}$ by $x_i = i$ for all $i \in \mathbb{Z}_0^+$. Then every element of
$X$ is a limit of the sequence $x$.


35.4.10 THEOREM: Convergent infinite sequences in Hausdorff spaces have unique limits.
If $X$ is a Hausdorff space, then every convergent infinite sequence of points in $X$ has a unique limit.


PROOF: Let $X$ be a Hausdorff space. Let $x = (x_i)_{i \in I}$ be a convergent sequence in $X$ which has limits
$y_1, y_2 \in X$ with $y_1 \ne y_2$, where $I = \mathbb{Z}_0^+$. Then by Definition 35.4.2, $\forall \Omega \in \mathrm{Top}_{y_k}(X),\ \exists n \in I,\ \forall i \ge n,\ x_i \in \Omega$
for $k = 1, 2$. By Definition 33.1.24 for a Hausdorff space, there exist sets $\Omega_1 \in \mathrm{Top}_{y_1}(X)$ and $\Omega_2 \in \mathrm{Top}_{y_2}(X)$
such that $\Omega_1 \cap \Omega_2 = \emptyset$. Then $\exists n_k \in I,\ \forall i \ge n_k,\ x_i \in \Omega_k$ for $k = 1, 2$. Let $n = \max(n_1, n_2)$. Then
$\forall i \ge n,\ x_i \in \Omega_1 \cap \Omega_2$. Hence $\exists i \ge n,\ x_i \in \Omega_1 \cap \Omega_2$. This is a contradiction. Therefore $y_1 = y_2$.


35.4.11 THEOREM: Constant infinite sequences in Hausdorff spaces converge to their constant value.
Let x be a constant infinite sequence in a Hausdorff space with value z. Then $\lim_{i\to\infty} x_i = z$.


PROOF: By Theorem 35.4.3, x converges to z, and by Theorem 35.4.10, this limit is unique. Hence
$\lim_{i\to\infty} x_i = z$.


35.4.12 REMARK: Limit points of sets and limits of sequences are different concepts. 

The term “limit point” for sets (as in Definition 31.10.2) has a different but related meaning to the term
“limit” for a sequence of points in a topological space (as in Definition 35.4.2). For example, if all of the
points $x_i$ of a sequence $x = (x_i)_{i\in I}$ have the same value $x_i = y$, then y is a limit of the sequence x, but y is
not a limit point of the range-set $\mathrm{Range}(x) = \{y\}$ of the sequence. If all, or at least an infinite number, of
the elements of the sequence are different, then any limit of the sequence is also a limit point of the range-set
of the sequence. This is asserted in Theorem 35.4.13.


Another difference between limit points of sets and limits of sequences is that the word “infinite” has different
meanings in each context. An infinite set is a set which is not equinumerous to any finite ordinal number,
but it does not necessarily contain a subset which is equinumerous to the set $\omega$ of all finite ordinal numbers
unless the axiom of countable choice (or some slightly weaker non-ZF axiom) is invoked. In other words, it is
not necessarily $\omega$-infinite. By




contrast, the range of an infinite sequence is always countable, and the range of an injective infinite sequence
is always countably infinite, which implies that it is $\omega$-infinite. (See Section 13.7 for these infinity concepts.)


Some authors distinguish a limit point of the range of a sequence from a limit of the sequence itself by using 
terminology such as a “cluster point of the sequence” or an “accumulation point of the sequence”. This can 
be confusing. For maximum clarity, it is perhaps preferable to always mention the range of the sequence 
rather than just the sequence itself. Thus one may refer to such a point as a “limit point of the range of the 
sequence”, which removes all ambiguity. 


35.4.13 THEOREM: Conditions for a limit of a sequence to be a limit point of its range. 
Let X be a topological space. Let $x = (x_i)_{i\in I} \in X^I$ be an infinite sequence of points in X. Let y be a limit
of x.


(i) If $\#(\mathrm{Range}(x)) = \infty$, then y is a limit point of $\mathrm{Range}(x)$.
(ii) If x is injective, then y is a limit point of $\mathrm{Range}(x)$.


PROOF: For part (i), let y be a limit of an infinite sequence $x = (x_i)_{i\in I} \in X^I$ with $\#(\mathrm{Range}(x)) = \infty$ in a
topological space X. Suppose that y is not a limit point of the set $\mathrm{Range}(x)$. Then by Definition 31.10.2,
it follows that $\exists \Omega \in \mathrm{Top}_y(X),\ \Omega \cap (\mathrm{Range}(x) \setminus \{y\}) = \emptyset$. But then for such $\Omega$, there is an $n \in I$ such that
$x_i \in \Omega$ for all $i \ge n$. Therefore $x_i = y$ for all $i \ge n$. So $\#(\mathrm{Range}(x)) \le n + 1 < \infty$, which contradicts an
assumption. Hence y must be a limit point of the set $\mathrm{Range}(x)$.


Part (ii) follows from part (i) because the range of an injective infinite sequence is infinite.


35.4.14 REMARK: A limit of a sequence with a finite range could be a limit point of the range-set.

The converse of Theorem 35.4.13 (i) is not valid in general. That is, if $\#(\mathrm{Range}(x)) < \infty$, it does not
necessarily follow that a limit y of the sequence $x = (x_i)_{i\in I}$ in X is not a limit point of the range-set $\mathrm{Range}(x)$.
For example, let $X = \{y_1, y_2\}$ with $y_1 \ne y_2$ and $\mathrm{Top}(X) = \{\emptyset, \{y_2\}, X\}$. This is a $T_0$ space, but not a
$T_1$ space. Let $x_i = y_2$ for $i \in I$. Then $y_1$ is a limit of the sequence x because there is only one set $\Omega$
in $\mathrm{Top}_{y_1}(X)$, namely $\Omega = X$. But $y_1$ is also a limit point of the set $\mathrm{Range}(x)$ because $x_i \in X \setminus \{y_1\}$ for
all $i \in I$. Hence a finite sequence range does not imply that a limit of the sequence is not a limit point of the
range-set. Theorem 35.4.15 asserts that $T_1$ separation is sufficient to obtain a converse of Theorem 35.4.13 (i).
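The two-point example can be verified by direct enumeration of open sets. The sketch below is illustrative only; the function names and point labels are hypothetical, and the limit test is specialised to constant sequences.

```python
def is_limit_point(opens, p, subset):
    """True if every open set containing p meets `subset` minus {p}."""
    rest = set(subset) - {p}
    return all(rest & u for u in opens if p in u)


def is_limit_of_constant_sequence(opens, p, value):
    """True if every open set containing p contains the constant value."""
    return all(value in u for u in opens if p in u)


# The two-point space with open sets: empty set, {y2}, X.
TOP = [frozenset(), frozenset({"y2"}), frozenset({"y1", "y2"})]
RANGE = {"y2"}  # range of the constant sequence x_i = y2

y1_is_limit = is_limit_of_constant_sequence(TOP, "y1", "y2")
y1_is_limit_point = is_limit_point(TOP, "y1", RANGE)
y2_is_limit = is_limit_of_constant_sequence(TOP, "y2", "y2")
y2_is_limit_point = is_limit_point(TOP, "y2", RANGE)
```

Here y1 is both a limit of the sequence and a limit point of its one-element range, whereas y2 is a limit of the sequence but not a limit point of the range.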


35.4.15 THEOREM: Equivalent condition for a sequence limit to be a limit point of its range. 
Let X be a $T_1$ topological space. Let $x = (x_i)_{i\in I} \in X^I$ be an infinite sequence of points in X. Let y be a
limit of x.


(i) If y is a limit point of $\mathrm{Range}(x)$, then $\#(\mathrm{Range}(x)) = \infty$.
(ii) y is a limit point of $\mathrm{Range}(x)$ if and only if $\#(\mathrm{Range}(x)) = \infty$.


PROOF: For part (i), let X be a $T_1$ topological space. Let y be a limit of an infinite sequence $x = (x_i)_{i\in I} \in X^I$
in X. Suppose that $\#(\mathrm{Range}(x)) < \infty$. Let $S = \mathrm{Range}(x) \setminus \{y\}$. Then $\#(S) < \infty$. So there is a set
$\Omega \in \mathrm{Top}_y(X)$ with $\Omega \cap S = \emptyset$ by the $T_1$ property. (This follows by intersecting sets $\Omega_z \in \mathrm{Top}_y(X)$ with
$\Omega_z \cap \{z\} = \emptyset$ for $z \in S$.) Therefore $\exists \Omega \in \mathrm{Top}_y(X),\ \Omega \cap (\mathrm{Range}(x) \setminus \{y\}) = \emptyset$. Hence $\#(\mathrm{Range}(x)) < \infty$
implies that y is not a limit point of the range-set $\mathrm{Range}(x)$.


Part (ii) follows from part (i) and Theorem 35.4.13 (i).


35.4.16 REMARK: Four different kinds of convergence of a sequence. 
Reversals of the implications in Theorem 35.4.17 require some constraints. If X is a $T_1$ space, then (4)
implies (3) by Theorem 33.1.22. If X is a first countable space, then (3) implies (2) by Theorem 35.4.18. To
make (2) imply (1) requires a constraint on the sequence itself, such as Cauchy convergence.


35.4.17 THEOREM: Some sequence limit properties in non-increasing order of strength. 
Let $x = (x_i)_{i\in\omega}$ be an injective sequence of points in a topological space X. Let $z \in X$.
Then (1) $\Rightarrow$ (2) $\Rightarrow$ (3) $\Rightarrow$ (4).


(1) z is a limit of the sequence x.
(2) z is a limit of some subsequence of the sequence x.
(3) z is an $\infty$-limit point of $\mathrm{Range}(x)$.
(4) z is a limit point of $\mathrm{Range}(x)$.


PROOF: Let $z \in X$. Let $x = (x_i)_{i\in\omega}$ be an injective sequence in X. Assume (1). Then z is a limit of x.
But x is a subsequence of x. Therefore (2) is satisfied.


Assume (2). Let $z \in X$ be a limit of an injective infinite subsequence $x' = x \circ \beta$ of x, where $\beta \in \mathrm{Inc}(\omega,\omega)$.
Then by Definition 35.4.2, $\forall \Omega \in \mathrm{Top}_z(X),\ \exists n \in \omega,\ \forall i \ge n,\ x'_i \in \Omega$. But $\{x'_i;\ i \ge n\} \subseteq \mathrm{Range}(x') \subseteq \mathrm{Range}(x)$
and $\#(\{x'_i;\ i \ge n\}) = \infty$ for all $n \in \omega$. So $\Omega \cap \mathrm{Range}(x) \supseteq \Omega \cap \{x'_i;\ i \ge n\} = \{x'_i;\ i \ge n\}$, and so
$\#(\Omega \cap \mathrm{Range}(x)) \ge \#(\{x'_i;\ i \ge n\}) = \infty$. Therefore $\#(\Omega \cap (\mathrm{Range}(x) \setminus \{z\})) = \infty$ for all $\Omega \in \mathrm{Top}_z(X)$.
Hence z is an $\infty$-limit point of $\mathrm{Range}(x)$ by Definition 31.10.17. Thus (3) is satisfied.


The implication (3) $\Rightarrow$ (4) follows from Theorem 31.10.18.


35.4.18 THEOREM: Some conditions for an infinite sequence to have a convergent subsequence. 
Let $x = (x_i)_{i\in\omega}$ be an infinite sequence in a topological space X. If X is first countable and z is an $\infty$-limit
point of $\mathrm{Range}(x)$, then z is a limit of some injective infinite subsequence of x.


PROOF: Let X be a first countable topological space. Let $x = (x_i)_{i\in\omega}$ be a sequence in X such that
$z \in X$ is an $\infty$-limit point of $\mathrm{Range}(x)$. Then $\#(\Omega \cap (\mathrm{Range}(x) \setminus \{z\})) = \infty$ for all $\Omega \in \mathrm{Top}_z(X)$ by
Definition 31.10.17. Let $\bar{x} = (\bar{x}_j)_{j\in J}$ be the first-instance subsequence of x. (See Definition 12.4.16.) Then $\bar{x}$ is
an injective subsequence of x and $\mathrm{Range}(\bar{x}) = \mathrm{Range}(x)$ by Theorem 12.4.15. So $\#(\Omega \cap (\mathrm{Range}(\bar{x}) \setminus \{z\})) = \infty$
for all $\Omega \in \mathrm{Top}_z(X)$. Therefore $\#(\mathrm{Range}(\bar{x})) = \infty$, and so $J = \omega$.


By Theorem 33.4.16, there exists a non-increasing indexed countable open base $(B_i)_{i\in\omega} \in \mathrm{Top}_z(X)^\omega$ at z
because X is first countable. Define $(S_i)_{i\in\omega}$ by $S_i = \{j \in \omega;\ \bar{x}_j \in B_i \setminus \{z\}\}$ for $i \in \omega$. Then $\#(S_i) = \infty$ for
all $i \in \omega$ because otherwise $B_i \cap (\mathrm{Range}(\bar{x}) \setminus \{z\}) = \{\bar{x}_j;\ j \in S_i\}$ would be finite for some $i \in \omega$.


Inductively define the map $\beta : \omega \to \omega$ by $\beta(0) = \min(S_0)$ and $\beta(i) = \min\{j \in S_i;\ \beta(i-1) < j\}$ for all $i \in \omega \setminus \{0\}$.
Then by induction, $\beta(i)$ is well defined for all $i \in \omega$ because $\#(S_i) = \infty$, and removing a finite number of
elements from an infinite set cannot make it empty. Also by induction, $\beta$ is an increasing function. Therefore
$x' = \bar{x} \circ \beta$ is an infinite subsequence of $\bar{x}$ by Definition 12.3.3. Also, $x'$ is an injective sequence because $\bar{x}$ and
$\beta$ are both injective. But $x'_i \in B_i \setminus \{z\}$ for all $i \in \omega$ because $x'_i = \bar{x}(\beta(i))$ and $\beta(i) \in S_i$ and $\bar{x}(j) \in B_i \setminus \{z\}$
for all $j \in S_i$. So $x'_i \in B_n \setminus \{z\}$ for all $i, n \in \omega$ with $n \le i$ because $(B_n)_{n\in\omega}$ is a non-increasing sequence of
sets. Thus $\forall n \in \omega,\ \forall i \ge n,\ x'_i \in B_n \setminus \{z\}$.

Let $\Omega \in \mathrm{Top}_z(X)$. Then $B_n \subseteq \Omega$ for some $n \in \omega$ by the definition of an open base at z. So $\forall i \ge n,\ x'_i \in \Omega \setminus \{z\}$.
Thus $\forall \Omega \in \mathrm{Top}_z(X),\ \exists n \in \omega,\ \forall i \ge n,\ x'_i \in \Omega \setminus \{z\}$. Therefore z is a limit of the subsequence $x' = \bar{x} \circ \beta$ of $\bar{x}$.
But then $x'$ is a subsequence of x by Theorem 12.3.4 (iii) because $\bar{x}$ is a subsequence of x. Hence z is a limit
of the subsequence $x'$ of x.
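The inductive choice of indices in this proof can be imitated numerically. The following Python sketch is only an illustration under stated assumptions: the space is the real line, the open base at z is taken to be the intervals $B_i = (z - 1/(i+1), z + 1/(i+1))$, the injective sequence is a hypothetical example, and the search for each minimum is cut off at an arbitrary bound, which the proof itself does not need.

```python
from fractions import Fraction


def x_bar(j):
    """Hypothetical injective real sequence: even terms tend to 0, odd terms stay above 5."""
    base = Fraction(1, j + 1)
    return base if j % 2 == 0 else 5 + base


def in_punctured_ball(value, z, i):
    """Membership in B_i minus {z}, where B_i = (z - 1/(i+1), z + 1/(i+1))."""
    return value != z and abs(value - z) < Fraction(1, i + 1)


def beta_prefix(seq, z, length, search_limit=10_000):
    """First `length` values of beta, where beta(i) = min{j in S_i; beta(i-1) < j}."""
    beta = []
    for i in range(length):
        start = beta[-1] + 1 if beta else 0
        for j in range(start, search_limit):
            if in_punctured_ball(seq(j), z, i):
                beta.append(j)
                break
        else:
            raise ValueError("no suitable index found below search_limit")
    return beta


z = Fraction(0)
beta = beta_prefix(x_bar, z, 5)
subsequence = [x_bar(j) for j in beta]
```

For this example the selected indices are the even numbers 2, 4, 6, 8, 10, and the i-th selected term lies within 1/(i+1) of z, as the proof requires.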


35.4.19 THEOREM: The closure of a set contains all limits of convergent sequences in the set. 
Let S be a subset of a topological space X. 


(i) $\bar{S}$ contains all limits of convergent sequences in S.
(ii) If S is a closed subset of X, then S contains all limits of convergent sequences in S.


PROOF: For part (i), let S be a subset of X. Let $x = (x_i)_{i\in\omega}$ be a convergent sequence in S. Let z be
a limit of x. Let $\Omega \in \mathrm{Top}_z(X)$. Then $\Omega \cap S \ne \emptyset$ because $x_i \in \Omega \cap S$ for some $i \in \omega$ by Definition 35.4.2.
Therefore $z \in \bar{S}$ by Theorem 31.8.17 (ii).


Part (ii) follows from part (i) and Theorem 31.8.14 (v).


35.4.20 REMARK: Proving convergence of real-number sequences. 

The assertions in Theorem 35.4.21 are proved somewhat clumsily because of the lack of specific theorems 
for metric spaces, which are introduced in Chapter 37, and theorems for real-number sequences specifically. 
However, it is valuable to attempt to produce proofs without the usual “machinery”. (Machinery-free proofs 
are sometimes referred to as “bare-handed”.)


35.4.21 THEOREM: Some useful basic examples of convergent real-number sequences.
(i) $\forall z \in (1,\infty),\ \forall R \in \mathbb{R}_0^+,\ \exists n \in \mathbb{Z}_0^+,\ \forall i \in \mathbb{Z}_0^+,\ (i \ge n \Rightarrow z^i > R)$.
(ii) $\forall z \in (0,1),\ \lim_{i\to\infty} z^i = 0$.
(iii) $\forall z \in (-1,1),\ \lim_{i\to\infty} z^i = 0$.
(iv) $\forall z \in \mathbb{R},\ \lim_{i\to\infty} z^i/i! = 0$.


PROOF: For part (i), let $z \in (1,\infty)$. Define the sequence $x = (x_i)_{i=0}^\infty$ by $x_i = z^i$ for all $i \in \mathbb{Z}_0^+$. Then x
is an increasing sequence of positive real numbers. Let $S = \sup(\mathrm{Range}(x))$. Suppose that $S \in \mathbb{R}$. Then
$(S/z, S] \cap \mathrm{Range}(x) \ne \emptyset$ by Theorem 11.2.9 (i) because $S/z < S$. So there exists $t \in (S/z, S] \cap \mathrm{Range}(x)$.
Then $t = z^i$ for some $i \in \mathbb{Z}_0^+$. So $z^{i+1} = zt > S$, which is a contradiction. Consequently $S = \infty$. Hence
$\forall R \in \mathbb{R}_0^+,\ \exists n \in \mathbb{Z}_0^+,\ \forall i \in \mathbb{Z}_0^+,\ (i \ge n \Rightarrow z^i > R)$ because the sequence is non-decreasing.

For part (ii), let $z \in (0,1)$. Define $x = (x_i)_{i=0}^\infty$ by $x_i = z^i$ for all $i \in \mathbb{Z}_0^+$. Since $1/z \in (1,\infty)$, it follows
from part (i) that $\forall R \in \mathbb{R}^+,\ \exists n \in \mathbb{Z}_0^+,\ \forall i \in \mathbb{Z}_0^+,\ (i \ge n \Rightarrow z^i < 1/R)$. Let $\Omega \in \mathrm{Top}_0(\mathbb{R})$. Then $(-r,r) \subseteq \Omega$
for some $r \in \mathbb{R}^+$ by Theorem 32.5.9. Let $R = 1/r$. Then $\exists n \in \mathbb{Z}_0^+,\ \forall i \in \mathbb{Z}_0^+,\ (i \ge n \Rightarrow z^i \in \Omega)$ because
$(0,r) \subseteq \Omega$. Hence $\lim_{i\to\infty} z^i = 0$ by Notation 35.4.5, Definition 35.4.2 and Theorem 35.4.10.


For part (iii), the case $z \in (0,1)$ is proved by part (ii). The case $z = 0$ follows from Theorem 35.4.11. The
case $z \in (-1,0)$ follows from part (ii) by substituting $-z$ for z and noting that the neighbourhood $(-r,r)$ is
symmetric about 0 for all $r \in \mathbb{R}^+$.

For part (iv), let $z \in \mathbb{R}^+$. Define $x = (x_i)_{i=0}^\infty$ by $x_i = z^i/i!$ for all $i \in \mathbb{Z}_0^+$. Let $i_0 = 2\,\mathrm{ceiling}(z)$. (See
Definition 16.5.12 for the ceiling function.) Then $i_0 \in \mathbb{Z}^+$ and $z \le i_0/2$. Let $i \in \mathbb{Z}[i_0,\infty)$. Then $x_{i+1} =
z^{i+1}/(i+1)! = (z/(i+1))\,x_i < x_i/2$. The subsequence $y = (x_{i_0+i})_{i=0}^\infty$ is positive and decreasing. Suppose
that $I = \inf(\mathrm{Range}(y)) > 0$. Then $[I, 2I) \cap \mathrm{Range}(y) \ne \emptyset$. So $y_i \in [I, 2I)$ for some $i \in \mathbb{Z}_0^+$. But then
$y_{i+1} < y_i/2 < 2I/2 = I$, which contradicts the definition of I. Therefore $I = 0$. Let $\Omega \in \mathrm{Top}_0(\mathbb{R})$. Then
$(-r,r) \subseteq \Omega$ for some $r \in \mathbb{R}^+$ by Theorem 32.5.9, and then $\exists n \in \mathbb{Z}_0^+,\ \forall i \ge n,\ y_i \in (-r,r)$ because $I = 0$.
Therefore $\exists n' \in \mathbb{Z}_0^+,\ \forall i \ge n',\ x_i \in (-r,r)$, where $n' = i_0 + n$. Thus $\lim_{i\to\infty} z^i/i! = 0$.


The case $z = 0$ follows from Theorem 35.4.11. The case $z \in \mathbb{R}^-$ follows by substituting $-z$ for z in the case
$z \in \mathbb{R}^+$. Hence the assertion.
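The halving step in the proof of part (iv) can be checked with exact rational arithmetic. The sketch below is a numerical illustration only, not a substitute for the proof; the function names and the sample value z = 7/2 are hypothetical choices.

```python
from fractions import Fraction
from math import ceil, factorial


def term(z, i):
    """The term x_i = z^i / i! as an exact rational number."""
    return z ** i / factorial(i)


def halving_start(z):
    """The index i_0 = 2 * ceiling(z) used in the proof, for z > 0."""
    return 2 * ceil(z)


z = Fraction(7, 2)
i0 = halving_start(z)

# From i_0 onwards, each term is less than half of the previous term.
halves = all(term(z, i + 1) < term(z, i) / 2 for i in range(i0, i0 + 50))

# Consequently x_{i_0 + k} is bounded by x_{i_0} / 2^k.
bounded = all(term(z, i0 + k) <= term(z, i0) / 2 ** k for k in range(50))
```

The geometric bound in the last line is what forces the infimum of the tail to be zero.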


35.5. Set-limit-based compactness 


35.5.1 REMARK:  Cover-based versus limit-based compactness. 
Broadly speaking, compactness classes may be subdivided into cover-based and limit-based classes.
Limit-based compactness classes may be further subdivided into set-limit-based and sequence-limit-based classes.
(1) Cover-based compactness classes.
(1.1) Compact set. Every open cover has a finite subcover.
(1.2) Countably compact set. Every countable open cover has a finite subcover.
(1.3) Lindelöf set. Every open cover has a countable subcover.
(1.4) Paracompact set. Every open cover has a locally finite open refinement.
(1.5) Countably paracompact set. Every countable open cover has a locally finite open refinement.
(2) Set-limit-based compactness classes.
(2.1) Every infinite subset has an $\infty$-limit point in the set.
(2.2) Every infinite subset has a limit point in the set. (Bolzano-Weierstraß limit-point compactness.)
(2.3) Every $\omega$-infinite subset has an $\infty$-limit point in the set.
(2.4) Every $\omega$-infinite subset has a limit point in the set.
(3) Sequence-limit-based compactness classes.
(3.1) Every sequence has a convergent subsequence to a point in the set. (Sequential compactness.)
(3.2) The range of every injective infinite sequence has an $\infty$-limit point in the set.
(3.3) The range of every injective infinite sequence has a limit point in the set.


Cover-based compactness classes are presented in Sections 33.5, 33.6 and 33.7. Set-limit-based compactness
classes are presented in Section 35.5. Sequence-limit-based compactness classes are presented in Section 35.6.


35.5.2 REMARK: Literature for Bolzano-Weierstraß theorems and properties.

Various Bolzano-Weierstraß theorems and properties are presented by Kasriel [100], pages 75-77, 123-125,
202-210; Thomson/Bruckner/Bruckner [149], pages 56, 163-164, 171, 447, 598-599, 602-603; Simmons [137],
pages 120-124; Johnsonbaugh/Pfaffenberger [97], pages 58-59, 148-150; Gaal [77], pages 128-129, 269-270;
B. Mendelson [115], pages 175-176, 179-184; Hocking/Young [93], pages 128-129, 269-270; Moore [371],
pages 13, 17-19, 65, 80-81, 169, 180, 205, 237, 240.
Some briefer presentations of the Bolzano-Weierstraß concept and various related sequential compactness
concepts are given by Bass [53], pages 206, 211-212; Gemignani [80], pages 180-181; Kolmogorov/Fomin [104],
page 101; Mattuck [114], pages 82-84, 185-187, 350-351; A.E. Taylor [145], pages 58-59; Rudin [129],
page 35; Steen/Seebach [141], pages 19-20; Riesz/Szőkefalvi-Nagy [125], pages 64-66; S.J. Taylor [147],
page 31; Shilov [135], pages 75-76; Bruckner/Bruckner/Thomson [56], pages 8, 374-375; Wilansky [163],
pages 121-124; Willard [165], page 125; Jech [364], pages 21, 142; Howard/Rubin [362], pages 132, 231.


35.5.3 REMARK: Bolzano-Weierstraß property definition diversity.

There is some diversity in the definitions for the Bolzano-Weierstraß property in the literature. The cause
of this is likely the fact that many compactness definitions become equivalent when the application domain
is restricted. For example, restricting definitions of compactness concepts from general topological spaces
to metric spaces, Cartesian spaces or the real number system makes some otherwise distinct definitions
equivalent. Similarly, adding the axiom of choice, or countable choice, to plain ZF set theory makes some
distinct definitions of compactness concepts equivalent.


In many textbooks, the name “Bolzano-Weierstraß property” is given to the limit-point compactness concept
in Definition 35.5.4. (For example, Bass [53], page 206; Kasriel [100], pages 123, 203; Simmons [137], page 121;
S.J. Taylor [147], page 31; Bruckner/Bruckner/Thomson [56], page 375.) Some authors identify it with the
sequential compactness concept as in Definition 35.6.2. (For example, Thomson/Bruckner/Bruckner [149],
pages 163, 598.) Other authors define the Bolzano-Weierstraß property to mean that every infinite sequence
has an accumulation point. (For example, Gaal [77], page 129; Hocking/Young [93], pages 128-129, 269-270.)
Yet other authors define it to mean that every infinite set has an $\infty$-accumulation point, which means that
every neighbourhood of the point contains an infinite number of elements of the infinite set. (For example,
B. Mendelson [115], pages 175-176.) And some authors define it to mean that every infinite sequence has an
$\infty$-accumulation point. (For example, Bruckner/Bruckner/Thomson [56], page 375.)


35.5.4 DEFINITION: Limit-point compactness. (Bolzano-Weierstraß.)
A limit-point compact set in a topological space X is a set $S \in \mathbb{P}(X)$ such that


$\forall A \in \mathbb{P}(S),\quad \#(A) = \infty \ \Rightarrow\ \exists x \in S,\ \forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap (A \setminus \{x\}) \ne \emptyset.$ (35.5.1)
In other words, every infinite subset of S has a limit point in S.


A limit-point compact set is also known as a Bolzano-Weierstraß set.


A limit-point compact set in a topological space is said to have the Bolzano-Weierstraß property.


35.5.5 DEFINITION: Limit-point compact space. (Bolzano-Weierstraß.)
A limit-point compact (topological) space is a topological space X which is a limit-point compact set in X.


A limit-point compact space is also known as a Bolzano-Weierstraß space.


A limit-point compact topological space is said to have the Bolzano-Weierstraß property.


35.5.6 REMARK: Implications between limit-point compactness, countable compactness and limit points. 
Some implications between limit-point compactness, countable compactness and some closely related kinds 
of compactness are illustrated in Figure 35.5.1. The downward arrows are valid in general topological spaces 
in Zermelo-Fraenkel set theory. The upward arrows require either $T_1$ separation or the axiom of countable
choice. (This may be compared with the ancient board game called “snakes and ladders”.) Thus in a $T_1$
space, there are only three independent concepts in Zermelo-Fraenkel set theory, whereas in a $T_1$ space with
the axiom of countable choice, all five indicated concepts are equivalent.


It may seem, when viewing side-by-side comparisons of ZF set theory and ZF+CC set theory (or ZF+AC 
set theory) such as in Figure 35.5.1, that the benefits of the countable or full choice axiom are irresistible. 
However, the benefits are largely illusory. In practice, one generally determines directly whether a set or 
space has any particular compactness property. It is rare that a theorem such as Theorem 35.5.7 must be 
relied upon to determine whether a given set or space has a particular topological property. 


The arrows labelled “Lindelöf” in Figures 35.5.1 and 35.6.1 are of little practical interest. As mentioned
in Remark 33.7.7, it is not even possible to show directly that the real number system is a Lindelöf space.
(Proofs of this assertion require a choice axiom which is at least as strong as the equivalence of finite and
Dedekind-finite sets.)
Figure 35.5.1 Implications between limit-point compactness and countable compactness


35.5.7 THEOREM [ZF+CC]: Limit-point compact sets and countably compact sets.
(i) In a topological space, every countably compact set is limit-point compact.
(ii) In a $T_1$ topological space, every limit-point compact set is countably compact.
(iii) Every subset of a $T_1$ topological space is a limit-point compact set if and only if it is a countably
compact set.


PROOF: For part (i), let X be a topological space. Let $K \in \mathbb{P}(X)$ be a countably compact subset of X.
Then by the axiom of countable choice and Theorem 33.7.6 (ii), every infinite subset S of K has an $\infty$-limit
point in K. But by Theorem 31.10.18, every $\infty$-limit point of S is a limit point of S. Therefore every infinite
subset of K has a limit point in K. Hence K is a limit-point compact subset of X by Definition 35.5.4.


For part (ii), let X be a $T_1$ topological space. Let $S \in \mathbb{P}(X)$ be a limit-point compact set in X. Then by
Definition 35.5.4, every infinite subset of S has a limit point in S. So by Theorem 33.1.22, every infinite
subset of S has an $\infty$-limit point in S. Hence by the axiom of countable choice and Theorem 33.7.6 (iii), S is
countably compact.


Part (iii) follows from parts (i) and (ii).


35.5.8 REMARK: $\infty$-limit-point compactness is slightly stronger than limit-point compactness.

Definition 35.5.9 gives an ad-hoc name to the slightly stronger form of limit-point compactness where the
specified limit point must be an $\infty$-limit point. By Theorem 33.1.22, every limit-point compact set in a
$T_1$ space is an $\infty$-limit-point compact set.


35.5.9 DEFINITION: $\infty$-limit-point compactness.
An $\infty$-limit-point compact set in a topological space X is a set $S \in \mathbb{P}(X)$ such that


$\forall A \in \mathbb{P}(S),\quad \#(A) = \infty \ \Rightarrow\ \exists x \in S,\ \forall \Omega \in \mathrm{Top}_x(X),\ \#(\Omega \cap (A \setminus \{x\})) = \infty.$
In other words, every infinite subset of S has an $\infty$-limit point in S.


35.5.10 THEOREM: Relations between $\infty$-limit-point compactness and limit-point compactness.
Let X be a topological space. Let K be a subset of X.


(i) If K is $\infty$-limit-point compact, then K is limit-point compact.
(ii) If K is limit-point compact and X is a $T_1$ space, then K is $\infty$-limit-point compact.


PROOF: Part (i) follows from Theorem 31.10.18 and Definitions 35.5.4 and 35.5.9.
Part (ii) follows from Theorem 33.1.22 and Definitions 35.5.4 and 35.5.9.


35.5.11 REMARK: Notations for some styles of limit-point compactness.
The four limit-point compactness styles in Figure 35.5.1 may be given ad-hoc abbreviations as follows.
(1) $\mathrm{BW}(\infty,\omega)$: Every infinite set has an $\infty$-limit point. ($\infty$-limit-point compactness.)
(2) $\mathrm{BW}(\infty,1)$: Every infinite set has a limit point. (Standard Bolzano-Weierstraß limit-point compactness.)
(3) $\mathrm{BW}(\omega,\omega)$: Every $\omega$-infinite set has an $\infty$-limit point.
(4) $\mathrm{BW}(\omega,1)$: Every $\omega$-infinite set has a limit point.


Then $\mathrm{BW}(\infty,\omega) \Rightarrow \mathrm{BW}(\infty,1) \Rightarrow \mathrm{BW}(\omega,1)$ and $\mathrm{BW}(\infty,\omega) \Rightarrow \mathrm{BW}(\omega,\omega) \Rightarrow \mathrm{BW}(\omega,1)$. The standard
definition for the Bolzano-Weierstraß property is $\mathrm{BW}(\infty,1)$. Since these concepts are all equivalent in
ZF+CC set theory in a $T_1$ space, some authors define the Bolzano-Weierstraß property differently. Thus
B. Mendelson [115], pages 175-176, defines the Bolzano-Weierstraß property to mean $\mathrm{BW}(\infty,\omega)$.


As mentioned in Remark 31.10.15, the term “$\infty$-limit point”, which agrees with most of the literature, is
potentially confusing. Each neighbourhood of such a point is required to contain merely an infinite number of
points, not an $\omega$-infinite number of points. Therefore the above notations would be more accurately written
as $\mathrm{BW}(\infty,\infty) \Rightarrow \mathrm{BW}(\infty,1) \Rightarrow \mathrm{BW}(\omega,1)$ and $\mathrm{BW}(\infty,\infty) \Rightarrow \mathrm{BW}(\omega,\infty) \Rightarrow \mathrm{BW}(\omega,1)$.


The compactness styles $\mathrm{BW}(\omega,1)$ and $\mathrm{BW}(\omega,\omega)$ on lines (4) and (3) respectively are given clumsy ad-hoc
names in Definition 35.5.12.


35.5.12 DEFINITION: Ad-hoc $\omega$-infinite-set limit-point compactness class names.
An $\omega$-infinite-set limit-point compact set in a topological space X is a set $K \in \mathbb{P}(X)$ such that every $\omega$-infinite
subset of K has a limit point in K.


An $\omega$-infinite-set $\infty$-limit-point compact set in a topological space X is a set $K \in \mathbb{P}(X)$ such that every
$\omega$-infinite subset of K has an $\infty$-limit point in K.


35.5.13 THEOREM: Some properties of $\omega$-infinite-set limit-point and $\infty$-limit-point compactness.
Let X be a topological space. Let K be a subset of X.


(i) If K is $\infty$-limit-point compact, then K is $\omega$-infinite-set $\infty$-limit-point compact.
(ii) If K is limit-point compact and X is a $T_1$ space, then K is $\omega$-infinite-set $\infty$-limit-point compact.
(iii) If K is $\omega$-infinite-set $\infty$-limit-point compact, then K is $\omega$-infinite-set limit-point compact.
(iv) If K is $\omega$-infinite-set limit-point compact and X is a $T_1$ space, then K is $\omega$-infinite-set $\infty$-limit-point
compact.


PROOF: Part (i) follows from Definitions 35.5.9 and 35.5.12 because every $\omega$-infinite set is infinite by
Theorem 13.7.7 (ii).


Part (ii) follows from part (i) and Theorem 35.5.10 (ii).
Part (iii) follows from Definition 35.5.12 and Theorem 31.10.18.
Part (iv) follows from Definition 35.5.12 and Theorem 33.1.22.


35.5.14 THEOREM [ZF+CC]: Some properties of $\omega$-infinite-set $\infty$-limit-point compactness.
Let X be a topological space. Let K be a subset of X.


(i) If K is $\omega$-infinite-set $\infty$-limit-point compact, then K is $\infty$-limit-point compact.
(ii) If K is $\omega$-infinite-set $\infty$-limit-point compact, then K is limit-point compact.
(iii) If K is $\omega$-infinite-set limit-point compact, then K is limit-point compact.


PROOF: Part (i) follows from Definitions 35.5.9 and 35.5.12 because, by the axiom of countable choice,
every infinite set is $\omega$-infinite.


Part (ii) follows from part (i) and Theorem 35.5.10 (i).


Part (iii) follows from Definitions 35.5.4 and 35.5.12 because, by the axiom of countable choice, every infinite
set is $\omega$-infinite.
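The implications between the four BW classes can be tabulated and closed under transitivity to see which classes coincide under each set of assumptions. The Python sketch below is only a bookkeeping aid: the string labels, the edge sets and the function names are hypothetical encodings of Remark 35.5.11 and Theorems 35.5.10, 35.5.13 and 35.5.14, and the code proves nothing by itself.

```python
NODES = ["BW(inf,omega)", "BW(inf,1)", "BW(omega,omega)", "BW(omega,1)"]

# Implications valid in any topological space in ZF (Remark 35.5.11).
ZF = {
    ("BW(inf,omega)", "BW(inf,1)"),
    ("BW(inf,omega)", "BW(omega,omega)"),
    ("BW(inf,1)", "BW(omega,1)"),
    ("BW(omega,omega)", "BW(omega,1)"),
}
# Reversals which need T_1 separation (Theorems 35.5.10 (ii), 35.5.13 (iv)).
T1 = {
    ("BW(inf,1)", "BW(inf,omega)"),
    ("BW(omega,1)", "BW(omega,omega)"),
}
# Reversals which need countable choice (Theorem 35.5.14 (i), (iii)).
CC = {
    ("BW(omega,omega)", "BW(inf,omega)"),
    ("BW(omega,1)", "BW(inf,1)"),
}


def reachable(edges, nodes):
    """Map each node to the set of nodes it implies (reflexive-transitive closure)."""
    reach = {a: {a} for a in nodes}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            for c in nodes:
                if a in reach[c] and not reach[b] <= reach[c]:
                    reach[c] |= reach[b]
                    changed = True
    return reach


def equivalence_classes(edges, nodes):
    """Group nodes which imply each other."""
    reach = reachable(edges, nodes)
    return {
        frozenset(b for b in nodes if b in reach[a] and a in reach[b])
        for a in nodes
    }


plain_zf = equivalence_classes(ZF, NODES)
zf_t1 = equivalence_classes(ZF | T1, NODES)
zf_cc = equivalence_classes(ZF | CC, NODES)
zf_t1_cc = equivalence_classes(ZF | T1 | CC, NODES)
```

Under these edges, the four classes are pairwise distinguishable in plain ZF, collapse to two classes with either $T_1$ separation or countable choice alone, and collapse to a single class when both are assumed.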
35.6. Sequence-limit-based compactness


35.6.1 REMARK: Clear distinctions between compactness and sequential compactness. 

As mentioned in Remark 33.5.1, compactness was defined for several decades of the last hundred years in 
terms of convergence of sequences. A hundred years ago, the distinctions between different definitions of 
compactness were not very clear because analysis was not at that time based so firmly on axiomatic set 
theory, and so the need for axioms of choice to choose infinite sequences was not so evident. Nowadays, the 
distinctions between various definitions related to compactness are more clear-cut, and the need for theorems 
to establish implications between these definitions is consequently more evident. 


The notation “$\mathrm{Inc}(\omega,\omega)$” in Definition 35.6.2 means the set of all strictly increasing functions from $\omega$ to $\omega$.
(See Notation 11.1.32.) The notation “$\omega$” for the non-negative integers is more convenient than “$\mathbb{Z}_0^+$”.


35.6.2 DEFINITION: A sequentially compact set in a topological space X is a subset S of X such that


$\forall f \in S^\omega,\ \exists x \in S,\ \exists \beta \in \mathrm{Inc}(\omega,\omega),\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists k \in \omega,\ \forall j \ge k,\ f(\beta(j)) \in \Omega.$ (35.6.1)


In other words, every infinite sequence in S has a convergent infinite subsequence to a point in S.


A sequentially compact (topological) space is a topological space X such that X is a sequentially compact
subset of X.


35.6.3 REMARK:  Limit-point compactness of the range of a sequence. 
As summarised in Remark 35.5.1, the following compactness classes are based on sequences.


(3) Sequence-limit-based compactness classes.
(3.1) Every sequence has a convergent subsequence to a point in the set. (Sequential compactness.)
(3.2) The range of every injective infinite sequence has an $\infty$-limit point in the set.
(3.3) The range of every injective infinite sequence has a limit point in the set.


Classes (3.2) and (3.3) are a kind of hybrid of sequence-limit-based and limit-point-based criteria. The range
of an injective infinite sequence is effectively the definition of a countably infinite set. Therefore classes (3.2)
and (3.3) are essentially equivalent to classes (2.3) and (2.4) respectively, which are referred to as $\mathrm{BW}(\omega,\omega)$
and $\mathrm{BW}(\omega,1)$ respectively in Remark 35.5.11. To state these equivalences, it is convenient to introduce some
ad-hoc compactness classes in Definition 35.6.4, which is closely related to Definition 35.5.12. (The proofs
of the equivalences in Theorem 35.6.5 are so obvious that they do not really deserve to be presented. But
even the most obvious theorems sometimes turn out to be wrong! So it is prudent to prove everything.)


35.6.4 DEFINITION: Ad-hoc sequence-range limit-point compactness class names.
A sequence-range limit-point compact set in a topological space X is a set $K \in \mathbb{P}(X)$ such that the range of
every injective infinite sequence of points in K has a limit point in K.


A sequence-range $\infty$-limit-point compact set in a topological space X is a set $K \in \mathbb{P}(X)$ such that the range
of every injective infinite sequence of points in K has an $\infty$-limit point in K.


35.6.5 THEOREM: Some properties of sequence-range limit-point and $\infty$-limit-point compactness.
Let X be a topological space. Let K be a subset of X.
(i) K is sequence-range limit-point compact if and only if K is $\omega$-infinite-set limit-point compact.
(ii) K is sequence-range $\infty$-limit-point compact if and only if K is $\omega$-infinite-set $\infty$-limit-point compact.


PROOF: For part (i), let K be a sequence-range limit-point compact set in a topological space X. Let S
be an $\omega$-infinite subset of K. Then there exists an injective sequence $x = (x_i)_{i\in\omega} \in S^\omega$ by Definition 13.7.6.
So $\mathrm{Range}(x)$ has a limit point $z \in K$ by Definition 35.6.4. But $\mathrm{Range}(x) \subseteq S$. So z is a limit point of S by
Theorem 31.10.3 (i). Hence K is an $\omega$-infinite-set limit-point compact set in X by Definition 35.5.12.


For the converse of part (i), let K be an $\omega$-infinite-set limit-point compact set in X. Let $x = (x_i)_{i\in\omega} \in K^\omega$
be an injective infinite sequence of points in K. Then $\mathrm{Range}(x)$ is an $\omega$-infinite set by Definition 13.7.6. So
$\mathrm{Range}(x)$ has a limit point $z \in K$ by Definition 35.5.12. Thus the range of every injective infinite sequence of
points in K has a limit point in K. Hence K is a sequence-range limit-point compact set in X by Definition 35.6.4.

For part (ii), let K be a sequence-range $\infty$-limit-point compact set in a topological space X. Let S be an
$\omega$-infinite subset of K. Then there exists an injective sequence $x = (x_i)_{i\in\omega} \in S^\omega$ by Definition 13.7.6. So
$\mathrm{Range}(x)$ has an $\infty$-limit point $z \in K$ by Definition 35.6.4. But $\mathrm{Range}(x) \subseteq S$. So z is an $\infty$-limit point of S by
Theorem 31.10.19 (i). Hence K is an $\omega$-infinite-set $\infty$-limit-point compact set in X by Definition 35.5.12.


For the converse of part (ii), let K be an $\omega$-infinite-set $\infty$-limit-point compact set in X. Let $x = (x_i)_{i\in\omega} \in K^\omega$
be an injective infinite sequence of points in K. Then $\mathrm{Range}(x)$ is an $\omega$-infinite set by Definition 13.7.6.
So $\mathrm{Range}(x)$ has an $\infty$-limit point $z \in K$ by Definition 35.5.12. Thus the range of every injective infinite sequence of
points in K has an $\infty$-limit point in K. Hence K is a sequence-range $\infty$-limit-point compact set in X by
Definition 35.6.4.


35.6.6 THEOREM: More properties of sequence-range limit-point and co-limit-point compactness. 
Let X be a topological space. Let K € P(X). 


(i) If K is sequentially compact, then K is sequence-range co-limit-point compact. 


(ii) If K is sequence-range co-limit-point compact and X is a first countable space, then K is sequentially 
compact. 


(iii) If K is sequence-range oo-limit-point compact, then K is sequence-range limit-point compact. 

(iv) If K is sequence-range limit-point compact and X is a T4 space, then K is sequence-range oo-limit-point 
compact. 

(v) If X is a first countable space, then every countably compact subset of X is sequentially compact. 

(vi) If X is a first countable space, then every compact subset of X is sequentially compact. 


PROOF: For part (i), let K be sequentially compact. Let x = (zi)ie; € K" be an injective infinite sequence 
of points in K. Then some subsequence 2’ = x o f, with 8 € Inc(w,w), has a limit z € K. Thus for all 
Q € Top, (X), for some k € w, {x}; j 2 k} C Q. The set {x}; 7 > k} is an infinite subset of K because z’ 
is injective, since x and @ are both injective. So z is an oo-limit point of Range(z') by Definition 31.10.17. 
So z is an oo-limit point of Range(r) by Theorem 31.10.19 (i). Hence K is sequence-range oc-limit-point 
compact by Definition 35.6.4. (Alternatively, use Theorem 35.4.17 (2, 3).) 

For part (ii), let K be a sequence-range ∞-limit-point compact set in a first countable space X. Let
$x = (x_i)_{i\in\omega} \in K^\omega$ be an infinite sequence in K. If Range(x) is finite, then #(S) = ∞ for some j ∈ ω, where
$S = \{i \in \omega;\ x_i = x_j\}$. Let β : I → S be the standard enumeration for S as in Definition 12.4.5. Then β is
an increasing bijection for some $I \in \omega^+$. But I = ω because S is infinite. So β ∈ Inc(ω,ω). Then $x' = x \circ \beta$
is an infinite subsequence of x for which $x'_i = x_j$ for all i ∈ ω. So $x_j \in K$ is a limit for the sequence $x'$. In
other words, the sequence x has an infinite subsequence which has a limit in K.

Now suppose that Range(x) is infinite. Let $\bar{x} = (\bar{x}_i)_{i\in I}$ be the first-instance subsequence of x as specified in
Definition 12.4.16. Then $\bar{x}$ is an injective subsequence of x and Range($\bar{x}$) = Range(x) by Theorem 12.4.15.
So #(Range($\bar{x}$)) = ∞, and so I = ω. Therefore Range($\bar{x}$) has an ∞-limit point in K by Definition 35.6.4.
Therefore $\bar{x}$ has a subsequence with a sequence-limit in K by Theorem 35.4.18. Thus every infinite sequence
in K has an infinite subsequence with a sequence-limit in K (because a subsequence of a subsequence is a
subsequence by Theorem 12.3.4 (iii)). Hence K is sequentially compact by Definition 35.6.2.

Part (iii) follows from Definition 35.6.4 and Theorem 31.10.18. 

Part (iv) follows from Definition 35.6.4 and Theorem 33.1.22. 

For part (v), let X be a first countable space. Let K ∈ P(X) be countably compact. Then every ω-infinite
subset of K has an ∞-limit point in K by Theorem 33.7.5. So K is ω-infinite-set ∞-limit-point compact by
Definition 35.5.12. Therefore K is sequence-range ∞-limit-point compact by Theorem 35.6.5 (ii). Hence K
is sequentially compact by part (ii) because X is first countable.


Part (vi) follows from part (v) and Theorem 33.7.3. 


35.6.7 REMARK: Implication network between limit-point, countable and sequential compactness. 
When the implications in Theorems 35.6.5 and 35.6.6 are combined with the compactness implications in 
Section 35.5, the resulting implication network is as illustrated in Figure 35.6.1. 


Figure 35.6.1 does not show the sequence-range limit-point and sequence-range ∞-limit-point compactness
classes because they are almost trivially equivalent to the ω-infinite-set limit-point and ω-infinite-set
∞-limit-point compactness classes respectively, as shown in Theorem 35.6.5.

[Figure 35.6.1: two implication diagrams, one for ZF set theory and one for ZF+CC set theory. The nodes in
each diagram are: compact; countably compact; sequentially compact; infinite set ⇒ ∞-limit point;
BW: infinite set ⇒ limit point; ω-infinite set ⇒ ∞-limit point; ω-infinite set ⇒ limit point. Some arrows
are tagged “Lindelöf”.]


Figure 35.6.1 Implications between limit-point, countable and sequential compactness 


The arrows tagged “Lindelöf” in Figure 35.6.1 are not necessarily CC-free. Even for the real number system, 
the axiom of countable choice is required in order to prove the Lindelöf property. (See also Remarks 33.7.7
and 35.5.6 on this subject.) 


35.6.8 REMARK: Conditions for which limit-point compactness implies sequential compactness. 

The first countable and T₁ conditions in Theorem 35.6.9 are not superfluous. For limit-point compact spaces
which are not sequentially compact, see Steen/Seebach [141], examples 6, 21, 50, 52, 54, 57, 62, 105, 106,
111. No axiom of choice is required for Theorem 35.6.9. The converse assertion in Theorem 35.6.10 does use
the axiom of countable choice, however, although it requires neither the first countable nor the T₁ condition.


35.6.9 THEOREM: Limit-point compactness implies sequential compactness in a first countable T₁ space.
In a first countable T₁ topological space, every limit-point compact set is sequentially compact.


PROOF: The assertion follows from Theorems 35.5.13 (ii), 35.6.5 (ii) and 35.6.6 (ii).


35.6.10 THEOREM [ZF+CC]: Sequential compactness implies limit-point compactness in ZF+CC.
Let K be a subset of a topological space X. If K is sequentially compact, then K is limit-point compact.
In other words, sequential compactness implies the Bolzano-Weierstraß property.


PROOF: The assertion follows from the axiom of countable choice and Theorems 35.6.6 (i), 35.6.5 (ii), 
35.5.14 (i) and 35.5.10 (i). 


35.6.11 REMARK: Sequential compactness does not imply compactness in standard ZF set theory. 

It is known that there are Zermelo-Fraenkel set theory models in which there exist sequentially compact sets 
of real numbers which are neither closed nor bounded. (See Jech [364], page 142.) But every compact subset 
of R is closed and bounded. (See Theorem 37.7.3.) Therefore it is not possible to prove that all sequentially 
compact sets are compact in a general topological space in standard ZF set theory. Even in a metric space, 
the axiom of countable choice is required for this implication. (See Theorem 37.7.15.) 


35.6.12 REMARK: Zermelo-Fraenkel set theory limit-point compactness of sequentially compact sets. 

If ω-infinite-set limit-point compactness is substituted for limit-point compactness in Theorem 35.6.10, the
result is Theorem 35.6.13 (i), which is valid in Zermelo-Fraenkel set theory. In practical examples of spaces
and sets, this is generally not significantly weaker than Theorem 35.6.10 because the procedure for
determining that a set is infinite mostly lends itself to an inductive proof that the set is ω-infinite.


The first countability and T₁ separation requirements for Theorem 35.6.13 (ii) are fairly easy to satisfy for
many kinds of spaces. For example, all metric spaces have these properties. In particular, finite-dimensional 
Cartesian spaces and topological manifolds have these properties. 


35.6.13 THEOREM: Sequential compactness and ω-infinite-set limit-point compactness.
(i) Every sequentially compact set in a topological space is ω-infinite-set limit-point compact.
(ii) A subset of a first countable T₁ space is sequentially compact if and only if it is ω-infinite-set limit-point
compact.


PROOF: Part (i) follows from Theorems 35.6.6 (i), 35.6.5 (ii) and 35.5.13 (iii).


Part (ii) follows from part (i) and Theorem 35.6.9. 


35.7. Real-number limit points and sequence limits 


35.7.1 REMARK: The Bolzano-Weierstraß property for real numbers.

Theorem 35.7.3 states that bounded infinite sets of real numbers have limit points. This is known as the
“Bolzano-Weierstraß property” of the real numbers. (For the general definition for topological spaces, see
Definition 35.5.4.) One of the technical hurdles to proving this property is the difference between infinite sets
and Dedekind-infinite sets. An infinite set is a set which is not finite, which does not necessarily imply the
existence of an infinite sequence of distinct elements of the set. Therefore the Bolzano-Weierstraß property
must be proved without assuming that the given infinite set has an infinite subset which can be enumerated
by integers.


It is perhaps noteworthy that the construction in Theorem 35.7.2 does not yield an infinite sequence of points 
in each bounded infinite set S. It does yield an infinite sequence of points, but they are not chosen from S. 
The existence of an infinite sequence of distinct points of S would imply that S is Dedekind-infinite, which 
is known (from model theory) to be impossible to prove in general in pure ZF set theory. (See Jech [364], 
page 141; Moore [371], page 322; Howard/Rubin [362], form 13, pages 87, 325.) 


In some applications, instead of a single bounded infinite set of real numbers, the Bolzano-Weierstraß
limit-point property may be required to yield an explicit limit point for each set in an infinite set of bounded
infinite sets. Then an axiom of choice could be applied to this to yield one limit point per bounded infinite
set. But resorting to choice axioms is unnecessary because by Theorem 35.7.2, there exists an explicit map
from bounded infinite sets to limit points. The uniqueness is achieved by first defining interval end-points a
and b explicitly in terms of each set S, and then “choosing” a limit point to be the least limit point of S by
a kind of “binary tree search” within [a,b]. (See Figure 35.7.1.)


[Figure 35.7.1: binary bucket tree over ℝ for levels i = 0, ..., 4, showing the limit points of S and the
interval left end-points z_0 = a, z_1 = z_2 = a + (b − a)/2, and z_3 = z_4 = a + (b − a)/2 + (b − a)/8.]


Figure 35.7.1 Binary bucket-tree procedure for Bolzano-Weierstraß theorem proof 


35.7.2 THEOREM: Explicit Bolzano-Weierstraß limit-point property for real numbers.
Let X = {S ∈ P(ℝ); S is bounded and infinite}. Then there exists a map ψ : X → ℝ such that ψ(S) is an
∞-limit point of S for all S ∈ X.


PROOF: The existence of the map ψ : X → ℝ requires ψ(S) to exist and be unique for all S ∈ X. Therefore
an explicit procedure is required which yields one and only one ∞-limit point for each S ∈ X.


Let S ∈ X. Let a = inf S and b = sup S. (Note that the maps S ↦ inf(S) and S ↦ sup(S) are well-defined
ZF functions on X by Theorem 11.2.24 (i,ii).) Then S ⊆ [a,b]. So #(S ∩ [a,b]) = ∞. (The case a = b is
impossible because #(S ∩ [a,a]) ≤ #({a}) = 1.)

Inductively define the sequence $(z_i)_{i=0}^\infty$ of real numbers by $z_0 = a$ and

\[
  \forall i \in \mathbb{Z}_0^+,\quad
  z_{i+1} = \begin{cases}
    z_i & \text{if } \#(S \cap [z_i, z_i + (b-a)/2^{i+1}]) = \infty \\
    z_i + (b-a)/2^{i+1} & \text{otherwise.}
  \end{cases}
\]


By induction, $\#(S \cap [z_i, z_i + (b-a)/2^i]) = \infty$ for all $i \in \mathbb{Z}_0^+$ because $\#(S \cap [z_0, z_0 + (b-a)/2^0]) = \#(S) = \infty$, and
from the inductive assumption $\#(S \cap [z_i, z_i + (b-a)/2^i]) = \infty$, it follows that
$\#(S \cap [z_{i+1}, z_{i+1} + (b-a)/2^{i+1}]) = \#(S \cap [z_i, z_i + (b-a)/2^{i+1}]) = \infty$ if $z_{i+1} = z_i$, and

\[
  \begin{aligned}
  \#(S \cap [z_{i+1}, z_{i+1} + (b-a)/2^{i+1}])
    &= \#(S \cap [z_i + (b-a)/2^{i+1}, z_i + (b-a)/2^i]) \\
    &\ge \#(S \cap [z_i, z_i + (b-a)/2^i]) - \#(S \cap [z_i, z_i + (b-a)/2^{i+1}])
  \end{aligned}
\]

if $z_{i+1} = z_i + (b-a)/2^{i+1}$, where the last expression is infinite because its first term is infinite and its
second term is finite. Then $(z_i)_{i=0}^\infty$ is convergent because it is a Cauchy sequence. Let $\bar{z} = \lim_{i\to\infty} z_i$.
Then $\bar{z} \in [z_i, z_i + (b-a)/2^i]$ and $\#((S \setminus \{\bar{z}\}) \cap [z_i, z_i + (b-a)/2^i]) = \infty$ for all $i \in \mathbb{Z}_0^+$ by induction. Since
$(S \setminus \{\bar{z}\}) \cap B_{\bar{z},(b-a)/2^{i-1}} \supseteq (S \setminus \{\bar{z}\}) \cap [z_i, z_i + (b-a)/2^i]$ for all $i \in \mathbb{Z}_0^+$, it follows that
$\#((S \setminus \{\bar{z}\}) \cap B_{\bar{z},(b-a)/2^{i-1}}) = \infty$ for all $i \in \mathbb{Z}_0^+$. Hence $\bar{z}$ is an ∞-limit point of S by Definition 31.10.17.


Since no random choices were made in the construction of $\bar{z} = \psi(S)$ from S, the set {(S, ψ(S)); S ∈ X} is a
well-defined ZF function from X to ℝ such that ψ(S) is an ∞-limit point of S for all S ∈ X.
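
The inductive definition of the interval end-points in the proof of Theorem 35.7.2 can be sketched in Python. This is only an illustration under a strong assumption: the test “S ∩ [lo, hi] is infinite” is supplied by the caller as a hypothetical oracle `infinite_in`, because no program can decide this test for an arbitrary set of real numbers. The function name `bisection_left_endpoints` and the sample set are illustrative and do not come from the text.

```python
from fractions import Fraction


def bisection_left_endpoints(a, b, infinite_in, steps):
    """Return [z_0, ..., z_steps] following the inductive rule in the proof.

    infinite_in(lo, hi) is an assumed oracle which returns True when the
    set S has infinitely many elements in the closed interval [lo, hi].
    """
    a, b = Fraction(a), Fraction(b)
    z = [a]
    width = b - a
    for _ in range(steps):
        width /= 2                 # width is now (b - a) / 2^(i+1)
        zi = z[-1]
        if infinite_in(zi, zi + width):
            z.append(zi)           # lower half is infinite: keep z_i
        else:
            z.append(zi + width)   # otherwise move to the upper half
    return z


# Sample set (an assumption for this sketch): S = {1 - 1/m; m >= 1} with
# the extra point 2, so that a = inf S = 0 and b = sup S = 2.
# S meets [lo, hi] in infinitely many points exactly when lo < 1 <= hi.
def sample_oracle(lo, hi):
    return lo < 1 <= hi


z = bisection_left_endpoints(0, 2, sample_oracle, 10)
print([str(v) for v in z])
```

For this sample set the end-points increase towards 1, which is the only limit point of the set. Exact rational arithmetic is used so that the interval comparisons are not affected by rounding.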


35.7.3 THEOREM: The Bolzano-Weierstraß limit-point property for real numbers.
Every bounded infinite set of real numbers has at least one limit point. 


PROOF: The assertion follows from Theorems 35.7.2 and 31.10.18. 


35.7.4 REMARK: The role of Bolzano in the Bolzano-Weierstraß theorem.

More than one author has mentioned that the Bolzano-Weierstraß property and theorem owe little to Bolzano.
A.E. Taylor [145], page 59, wrote the following in regard to the version of the theorem which says that every
bounded infinite subset of a Cartesian space contains at least one accumulation point.


Theorem [...] is quite generally referred to as “the Bolzano-Weierstrass theorem.” Karl Weierstrass
(1815-1897) was one of the pioneers in the study of the theory of sets of points in connection
with investigations of the properties of functions. Bernard Bolzano (1781-1848) made early and
important contributions to the study of continuous functions and other fundamental topics in
analysis. The precise historical reasons for naming Theorem [...] after both Bolzano and Weierstrass
are not clear. Some of the ideas that underlie the theorem are present in Bolzano's work. The
theorem itself apparently should be attributed to Weierstrass.


Moore [371], page 17, refers to “the Bolzano-Weierstrass Theorem, actually due to Weierstrass alone”, and
then says the following on page 18. 


It was not his theorem, but rather the rudiments of his method of proof, that he owed to Bolzano. 


Rudin [129], page 35, calls the Bolzano-Weierstraß theorem the “Weierstrass theorem”. S.J. Taylor [147],
page 31, refers to the Bolzano-Weierstraß property as the “Weierstrass property” or “property (W)”. Since
other important theorems in analysis are named after Weierstraß, it is plausible that the prefix “Bolzano”
could have been added as a way to remember which theorem was which. Graves [85], page 49, calls it the
“Weierstrass-Bolzano theorem”.


35.7.5 REMARK: Extending the Bolzano-Weierstraß limit-point theorem to Cartesian spaces.

The proof of Theorem 35.7.3 may be extended to general finite-dimensional Cartesian spaces by replacing
real-number intervals with n-tuple rectangles $\{x \in \mathbb{R}^n;\ \forall i \in \mathbb{N}_n,\ a_i \le x_i \le b_i\}$, and then constructing a
sequence of such intervals which converges to a point in ℝⁿ. This is the approach adopted for Theorem 35.7.6.


35.7.6 THEOREM: Explicit Bolzano-Weierstraß limit-point property for Cartesian spaces.
Let X = {S ∈ P(ℝⁿ); S is bounded and infinite} for some $n \in \mathbb{Z}_0^+$. Then there exists a map ψ : X → ℝⁿ
such that ψ(S) is an ∞-limit point of S for all S ∈ X.


PROOF: If n = 0, then X = ∅, for which ψ = ∅ has the required property. So n ≥ 1 may be assumed.


Let S ∈ X. For $k \in \mathbb{N}_n$, let $a_k = \inf\{x_k;\ x \in S\}$ and $b_k = \sup\{x_k;\ x \in S\}$. Let $a = (a_k)_{k=1}^n$ and $b = (b_k)_{k=1}^n$.
Then a ≠ b. For x, y ∈ ℝⁿ, let [x,y] denote the compact rectangular subset $\times_{k=1}^n [x_k, y_k]$ of ℝⁿ. Then
S ⊆ [a,b]. So #(S ∩ [a,b]) = #(S) = ∞.

For $i \in \mathbb{Z}_0^+$ and x ∈ ℝⁿ, let $K_i(x) = [x, x + (b-a)/2^i]$ and $V_i(x) = \{(x_k + \varepsilon_k (b_k - a_k)/2^i)_{k=1}^n;\ \varepsilon \in \{0,1\}^n\}$.
Then $V_i(x)$ is the set of vertices of the compact rectangular set $K_i(x)$. Define the lexicographic order “<”
on ℝⁿ as in Definition 11.6.19. Then “<” is a total order on ℝⁿ by Theorem 11.6.20, and a < b.


Inductively define the sequence $(z_i)_{i=0}^\infty$ of points in ℝⁿ by $z_0 = a$ and

\[
  \forall i \in \mathbb{Z}_0^+,\quad
  z_{i+1} = \min\{x \in V_{i+1}(z_i);\ \#(S \cap K_{i+1}(x)) = \infty\}.
  \tag{35.7.1}
\]


In other words, $z_{i+1}$ is the minimum vertex $x \in V_{i+1}(z_i)$ of the sub-rectangle $K_{i+1}(z_i)$ of the rectangle $K_i(z_i)$
such that $K_{i+1}(x)$ contains infinitely many points of S, where the minimum is with respect to lexicographic order
on ℝⁿ. (For n = 1, this yields the same sequence as in the proof of Theorem 35.7.2.)

By induction, $\#(S \cap K_i(z_i)) = \infty$ for all $i \in \mathbb{Z}_0^+$ because $\#(S \cap K_i(z_i)) = \infty$ implies that $\#(S \cap K_{i+1}(x)) = \infty$
for at least one $x \in V_{i+1}(z_i)$, and one of these infinite sub-rectangles is chosen by line (35.7.1) for $z_{i+1}$.
Since $(z_i)_{i=0}^\infty$ is a Cauchy sequence, it is convergent. Let $\bar{z} = \lim_{i\to\infty} z_i$. Then $\bar{z} \in [z_i, z_i + (b-a)/2^i]$ and
$\#((S \setminus \{\bar{z}\}) \cap [z_i, z_i + (b-a)/2^i]) = \infty$ for all $i \in \mathbb{Z}_0^+$ by induction. Therefore $\#((S \setminus \{\bar{z}\}) \cap B_{\bar{z},|b-a|/2^{i-1}}) = \infty$
for all $i \in \mathbb{Z}_0^+$ because $(S \setminus \{\bar{z}\}) \cap B_{\bar{z},|b-a|/2^{i-1}} \supseteq (S \setminus \{\bar{z}\}) \cap [z_i, z_i + (b-a)/2^i]$. Hence $\bar{z}$ is an ∞-limit point of S by
Definition 31.10.17.

Since no random choices were made in the construction of $\bar{z} = \psi(S)$ from S, the set {(S, ψ(S)); S ∈ X} is a
well-defined ZF function from X to ℝⁿ such that ψ(S) is an ∞-limit point of S for all S ∈ X.
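
The selection rule on line (35.7.1) can be sketched in Python in the same spirit. As an assumption for the sketch, the caller supplies a hypothetical oracle `infinite_in_box(lo, hi)` which reports whether S has infinitely many points in the closed rectangle [lo, hi]. Such a test is not computable for general sets, and the names used here do not come from the text. Python compares tuples lexicographically, which matches the order used for the minimum.

```python
from fractions import Fraction
from itertools import product


def lexmin_corner_sequence(a, b, infinite_in_box, steps):
    """Return [z_0, ..., z_steps] for the rectangle [a, b] in R^n.

    infinite_in_box(lo, hi) is an assumed oracle: True when the set S has
    infinitely many points in the closed rectangle with corners lo and hi.
    """
    a = tuple(Fraction(t) for t in a)
    b = tuple(Fraction(t) for t in b)
    n = len(a)
    z = [a]
    for i in range(steps):
        # Side lengths of the sub-rectangles K_{i+1}(x).
        side = tuple((b[k] - a[k]) / 2 ** (i + 1) for k in range(n))
        # Lower corners of the 2^n sub-rectangles of K_i(z_i).
        corners = [
            tuple(z[-1][k] + e[k] * side[k] for k in range(n))
            for e in product((0, 1), repeat=n)
        ]
        candidates = [
            x for x in corners
            if infinite_in_box(x, tuple(x[k] + side[k] for k in range(n)))
        ]
        # min() on tuples is the lexicographic minimum. It raises ValueError
        # if the oracle is inconsistent and reports no infinite sub-rectangle.
        z.append(min(candidates))
    return z


# Sample set (an assumption): S = {(1 - 1/m, 1/m); m >= 1} in R^2, for which
# a = (0, 0) and b = (1, 1). Infinitely many points of S lie in [lo, hi]
# exactly when lo[0] < 1 <= hi[0] and lo[1] <= 0 < hi[1].
def sample_box_oracle(lo, hi):
    return lo[0] < 1 <= hi[0] and lo[1] <= 0 < hi[1]


z = lexmin_corner_sequence((0, 0), (1, 1), sample_box_oracle, 5)
print([(str(p[0]), str(p[1])) for p in z])
```

In this example the corners move along the bottom edge of the unit square towards (1, 0), which is the only limit point of the sample set.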


35.7.7 THEOREM: The Bolzano-Weierstraß limit-point property for Cartesian spaces.
Every bounded infinite subset of a finite-dimensional Cartesian space has at least one limit point. 


PROOF: The assertion follows from Theorems 35.7.6 and 31.10.18. 


35.7.8 REMARK: The Bolzano-Weierstraß sequence-limit property for real numbers.

Some texts give the name “Bolzano-Weierstraß theorem” to the property of real numbers that every bounded
infinite sequence of real numbers has at least one convergent subsequence. (For example, Johnsonbaugh/
Pfaffenberger [97], pages 58-59.) Other texts give this name to the sequential compactness property of closed
bounded subsets of ℝ. The sequential compactness version is given here as Theorem 35.7.11.


35.7.9 REMARK: Sequential compactness of bounded closed subsets of the real numbers. 

It is often useful in applications to know that every bounded infinite sequence of points in a set has a 
convergent infinite subsequence. This is the “sequential compactness” concept. (See Definition 35.6.2.) It is 
even more useful to know that there exists an explicit ZF map from bounded infinite sequences to convergent 
subsequences. (For an application which needs an explicit map, see the function-sequence diagonalisation 
procedure in the proof of Theorem 38.5.2, which is required for the proof of Theorem 38.5.5, Ascoli's theorem.) 


By Theorem 35.6.6 (vi), every compact subset of a first countable topological space is sequentially compact, 
but this only means that convergent subsequences of bounded sequences exist, without giving a procedure to 
construct the subsequences. In the case of the real numbers and Cartesian spaces, there is no need to invoke 
a choice axiom to choose the subsequences. The constructive procedure in Theorems 35.7.2 and 35.7.6 to 
choose limit points can be adapted to make explicit choices of convergent subsequences of bounded sequences. 


The difficult thing to do without choice axioms is to construct sequences from infinite sets of points. But 
when trying to prove sequential compactness, one is given a sequence. So the hard work is already done. 
Constructing a subsequence is then relatively mundane. The reverse direction, proving that infinite sets have 
limit points, is typically more problematic. 


The proof of Theorem 35.7.10 uses a kind of binary tree traversal algorithm to construct a convergent
subsequence for any given bounded sequence. (This algorithm is illustrated in Figure 35.7.1 for the closely
related Theorem 35.7.2.) The limit $\bar{z}$ which is found in the proof of Theorem 35.7.10 is equal to $\liminf_{j\to\infty} x_j$.


35.7.10 THEOREM: Explicit sequential compactness of compact sets of real numbers.
Let $X = \{x \in \mathbb{R}^{\mathbb{Z}_0^+};\ x \text{ is bounded}\}$. Let $Y = \{y \in \mathbb{R}^{\mathbb{Z}_0^+};\ y \text{ is convergent}\}$. Then there exists a map
$\phi : X \to (\mathbb{Z}_0^+ \to \mathbb{Z}_0^+)$ such that for all x ∈ X,


(i) $\phi(x) : \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ is an increasing sequence, and


(ii) $x \circ \phi(x) \in Y$.


Hence there exists a map ψ : X → Y such that $\psi(x) = x \circ \phi(x)$ is a subsequence of x for all x ∈ X.


PROOF: Let x ∈ X. Then $x = (x_i)_{i=0}^\infty$ is a bounded sequence of real numbers. Let a = inf(Range(x)) and
b = sup(Range(x)). (The maps x ↦ inf(Range(x)) and x ↦ sup(Range(x)) are well-defined ZF functions
from X to ℝ by Theorem 11.2.24 (i, ii) because the map x ↦ Range(x) is a well-defined ZF function from
X to {S ∈ P(ℝ); S is bounded}.) Then Range(x) ⊆ [a,b]. So $\#(\{j \in \mathbb{Z}_0^+;\ x_j \in [a,b]\}) = \infty$.


Define the sequence $z = (z_i)_{i=0}^\infty$ inductively by $z_0 = a$ and

\[
  \forall i \in \mathbb{Z}_0^+,\quad
  z_{i+1} = \begin{cases}
    z_i & \text{if } \#(\{j \in \mathbb{Z}_0^+;\ x_j \in [z_i, z_i + (b-a)/2^{i+1}]\}) = \infty \\
    z_i + (b-a)/2^{i+1} & \text{otherwise.}
  \end{cases}
\]

Then by induction, $\#(\{j \in \mathbb{Z}_0^+;\ x_j \in [z_i, z_i + (b-a)/2^i]\}) = \infty$ for all $i \in \mathbb{Z}_0^+$. This clearly holds for i = 0
because then $[z_i, z_i + (b-a)/2^i] = [a,b]$, and so $\{j \in \mathbb{Z}_0^+;\ x_j \in [z_i, z_i + (b-a)/2^i]\} = \mathbb{Z}_0^+$. If it holds for a given
value of $i \in \mathbb{Z}_0^+$, then either $\#(\{j \in \mathbb{Z}_0^+;\ x_j \in [z_i, z_i + (b-a)/2^{i+1}]\}) = \infty$ or
$\#(\{j \in \mathbb{Z}_0^+;\ x_j \in [z_i + (b-a)/2^{i+1}, z_i + (b-a)/2^i]\}) = \infty$, or both.
Consequently $\#(\{j \in \mathbb{Z}_0^+;\ x_j \in [z_{i+1}, z_{i+1} + (b-a)/2^{i+1}]\}) = \infty$.

The sequence z is a Cauchy sequence because $\forall k \in \mathbb{Z}_0^+,\ \forall i,j \in \mathbb{Z}[k,\infty),\ |z_i - z_j| \le (b-a)/2^k$, where $\mathbb{Z}[k,\infty)$ denotes the
set {i ∈ ℤ; k ≤ i} for all k ∈ ℤ as in Notation 14.4.11. Therefore z is convergent in ℝ. So $\bar{z} = \lim_{i\to\infty} z_i$ is
well-defined with $\bar{z} \in [a,b]$ because $\forall i \in \mathbb{Z}_0^+,\ z_i \in [a,b]$.


Define a sequence $\beta : \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ inductively by $\beta_0 = 0$ and

\[
  \forall i \in \mathbb{Z}_0^+,\quad
  \beta_{i+1} = \min\{j \in \mathbb{Z}(\beta_i, \infty);\ x_j \in [z_{i+1}, z_{i+1} + (b-a)/2^{i+1}]\},
\]

where $\mathbb{Z}(k,\infty)$ denotes {i ∈ ℤ; k < i} for k ∈ ℤ as in Notation 14.4.11. Then $\beta_{i+1}$ is well defined because
$\#(\{j \in \mathbb{Z}_0^+;\ x_j \in [z_{i+1}, z_{i+1} + (b-a)/2^{i+1}]\}) = \infty$. So $\{j \in \mathbb{Z}(\beta_i, \infty);\ x_j \in [z_{i+1}, z_{i+1} + (b-a)/2^{i+1}]\} \ne \emptyset$.
The definition of β also implies that it is an increasing sequence of non-negative integers. Define $y_i = x_{\beta_i}$
for all $i \in \mathbb{Z}_0^+$. Then $y = x \circ \beta$ is a subsequence of x which converges to $\bar{z}$.


Since no random choices were made in the construction of $\beta = \phi(x)$ from x, the set {(x, φ(x)); x ∈ X} is
a well-defined ZF function from X to $\mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ such that φ(x) is an increasing sequence for all x ∈ X.
Likewise, the set {(x, ψ(x)); x ∈ X} is a well-defined ZF function from X to Y such that $\psi(x) = x \circ \phi(x)$ is
a subsequence of x for all x ∈ X.
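
The index selection in the proof of Theorem 35.7.10 can be sketched in Python as follows. The sketch assumes that the caller supplies the bounds a and b and a hypothetical oracle `infinitely_often_in(lo, hi)` which reports whether the sequence has infinitely many terms in [lo, hi]. This test cannot be computed from finitely many terms, so the oracle stands in for the set-theoretic condition in the proof. All names are illustrative.

```python
from fractions import Fraction


def convergent_subsequence_indices(x, a, b, infinitely_often_in, count):
    """Return [beta_0, ..., beta_{count-1}] following the proof's procedure.

    x is a callable j -> x_j. The oracle infinitely_often_in(lo, hi) is an
    assumption: True when x_j lies in [lo, hi] for infinitely many j.
    """
    a, b = Fraction(a), Fraction(b)
    z = a                      # z_0 = a
    beta = [0]                 # beta_0 = 0
    for i in range(count - 1):
        width = (b - a) / 2 ** (i + 1)
        if not infinitely_often_in(z, z + width):
            z = z + width      # z is now z_{i+1}
        # Least index j > beta_i with x_j in [z_{i+1}, z_{i+1} + width].
        # The search terminates only if the oracle is truthful.
        j = beta[-1] + 1
        while not (z <= x(j) <= z + width):
            j += 1
        beta.append(j)
    return beta


# Sample sequence (an assumption): x_j = j mod 3, with a = 0 and b = 2.
def sample_sequence(j):
    return Fraction(j % 3)


def sample_sequence_oracle(lo, hi):
    # Each of the values 0, 1, 2 occurs infinitely often in the sequence.
    return any(lo <= v <= hi for v in (0, 1, 2))


beta = convergent_subsequence_indices(
    sample_sequence, 0, 2, sample_sequence_oracle, 5)
print(beta)
```

For the sample sequence, the selected terms are eventually all equal to 0, which is the inferior limit of the sequence, although the second selected term is still equal to 1.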


35.7.11 THEOREM: Sequential compactness of compact sets of real numbers. 
Every bounded infinite sequence of real numbers has a convergent infinite subsequence. 


PROOF: The assertion follows from Theorem 35.7.10. 


35.7.12 REMARK: The difficulty of demonstrating existence of infinite sequences in infinite sets. 
Theorem 35.7.11 should not be confused with a similar-looking assertion which requires a countable choice
axiom, namely that every bounded infinite set of real numbers has a sequential limit. Theorem 35.7.11 
asserts that every bounded infinite sequence of real numbers has a convergent subsequence. 


Moore [371], page 18, states that a choice axiom is required in order to prove that: “Every infinite bounded
subset A of ℝⁿ has a sequential limit.” (For ZF models where some infinite subset of ℝ contains no countably
infinite subset, see Cohen [349], page 138; Jech [364], pages 21, 81, 141. See Section 13.10 for the issue of
infinite versus Dedekind-infinite sets.)


35.8. Inferior and superior limits of real-valued functions


35.8.1 REMARK: Inferior and superior limits of real-valued functions on topological spaces. 

The inferior and superior limits $\liminf_{x\to a} f(x)$ and $\limsup_{x\to a} f(x)$ may be defined for functions f : X → Y
and a ∈ X for general topological spaces X and general partially ordered sets Y. The infimum and supremum
are defined very generally in Section 11.3 for functions f : X → Y where Y is a partially ordered set. There
are various tedious technicalities when the target set Y is a general ordered set because of the need to ensure
that the infimum and supremum in the following formulas are well defined.

\[
  \liminf_{x\to a} f(x) = \sup_{\Omega\in\mathrm{Top}_a(X)}\ \inf_{x\in\Omega\setminus\{a\}} f(x)
\]
and
\[
  \limsup_{x\to a} f(x) = \inf_{\Omega\in\mathrm{Top}_a(X)}\ \sup_{x\in\Omega\setminus\{a\}} f(x).
\]

These technicalities are avoided in the case of real-valued functions by employing the extended real numbers
$\bar{\mathbb{R}}$ as the target set Y, while requiring the range of f to lie within ℝ. (See Definition 16.2.13 for the order
on the extended real numbers.) The infimum and supremum of any subset of $\bar{\mathbb{R}}$ are well-defined elements
of $\bar{\mathbb{R}}$ by Theorem 16.2.14 (v). For example, inf(∅) = sup(ℝ) = +∞ and sup(∅) = inf(ℝ) = −∞.


35.8.2 DEFINITION: The inferior limit of a function f : X → ℝ at a point a ∈ X, for a topological
space X, is the extended real number $\sup_{\Omega\in\mathrm{Top}_a(X)} \inf_{x\in\Omega\setminus\{a\}} f(x)$.


The superior limit of a function f : X → ℝ at a point a ∈ X, for a topological space X, is the extended
real number $\inf_{\Omega\in\mathrm{Top}_a(X)} \sup_{x\in\Omega\setminus\{a\}} f(x)$.


35.8.3 NOTATION: $\liminf_{x\to a} f(x)$, for a function f : X → ℝ and a ∈ X, for a topological space X,
denotes the inferior limit of f at a. In other words, $\liminf_{x\to a} f(x) = \sup_{\Omega\in\mathrm{Top}_a(X)} \inf_{x\in\Omega\setminus\{a\}} f(x)$.


$\limsup_{x\to a} f(x)$, for a function f : X → ℝ and a ∈ X, for a topological space X, denotes the superior limit
of f at a. In other words, $\limsup_{x\to a} f(x) = \inf_{\Omega\in\mathrm{Top}_a(X)} \sup_{x\in\Omega\setminus\{a\}} f(x)$.


35.8.4 EXAMPLE: Locally bounded functions have an infimum and supremum everywhere. 

When the domain X and range Y are both the real numbers, with the standard topology and standard total
order respectively, any locally bounded function has an infimum and supremum everywhere. For example,
let f : ℝ → ℝ with f(0) = 0 and $f(x) = (1 + 3|x|)\sin(\pi/(2x))$ for x ∈ ℝ ∖ {0}. Then $-1 = \liminf_{x\to 0} f(x) <
\limsup_{x\to 0} f(x) = 1$. This is illustrated in Figure 35.8.1.
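
The values −1 and 1 in Example 35.8.4 can be checked numerically with a short Python sketch. This is a floating-point sanity check and not a proof. It evaluates f at the points x = 1/(4k+1), where the sine factor equals 1, and at x = 1/(4k+3), where the sine factor equals −1. The helper names `peaks` and `troughs` are illustrative.

```python
import math


def f(x: float) -> float:
    """The function of Example 35.8.4."""
    if x == 0:
        return 0.0
    return (1 + 3 * abs(x)) * math.sin(math.pi / (2 * x))


def peaks(ks):
    # At x = 1/(4k+1), sin(pi/(2x)) = sin(2*pi*k + pi/2) = 1.
    return [f(1 / (4 * k + 1)) for k in ks]


def troughs(ks):
    # At x = 1/(4k+3), sin(pi/(2x)) = sin(2*pi*k + 3*pi/2) = -1.
    return [f(1 / (4 * k + 3)) for k in ks]


ks = [1, 10, 100, 1000]
print(peaks(ks))    # values slightly above 1, decreasing towards 1
print(troughs(ks))  # values slightly below -1, increasing towards -1
```

Since |f(x)| ≤ 1 + 3|x| for all x, the supremum of f over a punctured interval of radius δ around 0 is at most 1 + 3δ, and the peak values show that it exceeds 1. Letting δ tend to 0 gives the superior limit 1, and the inferior limit −1 follows in the same way from the trough values.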


35.8.5 REMARK: At limit points, the inferior limit does not exceed the superior limit. 

At an isolated point p of a topological space X, $\liminf_{x\to p} f(x) = \infty$ and $\limsup_{x\to p} f(x) = -\infty$ for any
function f : X → Y because {p} is a neighbourhood of p, and so Ω ∖ {p} = ∅ for some $\Omega \in \mathrm{Top}_p(X)$. It may
seem paradoxical that $\liminf_{x\to p} f(x) > \limsup_{x\to p} f(x)$ in this case, but it follows automatically from the
equally “paradoxical” fact that inf(∅) = ∞ > −∞ = sup(∅). In a sense, there is in this case no limit because
the point is isolated from the rest of the set. (See Definition 31.10.8 for isolated points.)


For any non-isolated point, i.e. for any limit point, the expected inequality $\liminf_{x\to p} f(x) \le \limsup_{x\to p} f(x)$
is valid, as shown in Theorem 35.8.6. Since the assertion applies also to relative topologies, the theorem is
also valid for functions f : S → ℝ for subsets S of X and limit points p of S, where the limits are computed
with respect to the relative topology.
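
The contrast between isolated points and limit points can be made concrete on a finite topological space, where Definition 35.8.2 can be evaluated exactly. The following Python sketch is an illustration only. The three-point space, the function values and the name `lim_inf_sup` are assumptions chosen for the example. Empty infima and suprema are represented by `math.inf` and `-math.inf`.

```python
import math


def lim_inf_sup(open_sets, f, p):
    """Return (inferior limit, superior limit) of f at p.

    open_sets lists the open sets of a finite topological space as
    frozensets, and f maps each point to a real number.
    """
    nbhds = [U for U in open_sets if p in U]   # open neighbourhoods of p
    infs = [min((f[x] for x in U if x != p), default=math.inf)
            for U in nbhds]
    sups = [max((f[x] for x in U if x != p), default=-math.inf)
            for U in nbhds]
    return max(infs), min(sups)


# Sample topology on {0, 1, 2}: the point 0 is isolated because {0} is open,
# whereas every open neighbourhood of 1 contains the point 2.
topology = [frozenset(), frozenset({0}), frozenset({1, 2}),
            frozenset({0, 1, 2})]
values = {0: 5.0, 1: 7.0, 2: 3.0}

print(lim_inf_sup(topology, values, 0))   # isolated point
print(lim_inf_sup(topology, values, 1))   # limit point
```

At the isolated point 0, the punctured neighbourhood {0} ∖ {0} is empty, so the inferior limit is +∞ and the superior limit is −∞. At the limit point 1, both limits equal f(2) = 3, in agreement with the inequality for limit points.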


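35.8.6 THEOREM: At domain limit points, the inferior limit is bounded above by the superior limit.
Let f : X → ℝ for a topological space X. Let p ∈ X be a limit point of X.
Then $\liminf_{x\to p} f(x) \le \limsup_{x\to p} f(x)$.


PROOF: By Definition 31.10.2, $\forall \Omega \in \mathrm{Top}_p(X),\ \Omega \setminus \{p\} \ne \emptyset$. So $\inf(f(\Omega \setminus \{p\})) \le \sup(f(\Omega \setminus \{p\}))$ for all
$\Omega \in \mathrm{Top}_p(X)$ by Theorem 11.3.5 (i). For any $\Omega_1, \Omega_2 \in \mathrm{Top}_p(X)$, the intersection $\Omega_1 \cap \Omega_2$ is also in $\mathrm{Top}_p(X)$,
and so $\inf(f(\Omega_1 \setminus \{p\})) \le \inf(f((\Omega_1 \cap \Omega_2) \setminus \{p\})) \le \sup(f((\Omega_1 \cap \Omega_2) \setminus \{p\})) \le \sup(f(\Omega_2 \setminus \{p\}))$.
So $\sup_{\Omega\in\mathrm{Top}_p(X)} \inf(f(\Omega \setminus \{p\})) \le \inf_{\Omega\in\mathrm{Top}_p(X)} \sup(f(\Omega \setminus \{p\}))$ because
$\sup_{\Omega\in\mathrm{Top}_p(X)} \inf(f(\Omega \setminus \{p\}))$ is less than or equal to all upper bounds for $\{\inf(f(\Omega \setminus \{p\}));\ \Omega \in \mathrm{Top}_p(X)\}$ and
$\inf_{\Omega\in\mathrm{Top}_p(X)} \sup(f(\Omega \setminus \{p\}))$ is greater than or equal to all lower bounds for $\{\sup(f(\Omega \setminus \{p\}));\ \Omega \in \mathrm{Top}_p(X)\}$
by Definition 11.2.4. Hence $\liminf_{x\to p} f(x) \le \limsup_{x\to p} f(x)$ by Definition 35.8.2.
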

[Figure 35.8.1: graph of $f(x) = (1 + 3|x|)\sin(\pi/(2x))$ for x ≠ 0, with f(0) = 0.]

Figure 35.8.1 Function f : ℝ → ℝ with $\liminf_{x\to 0} f(x) < \limsup_{x\to 0} f(x)$


35.8.7 REMARK: Relations between topological limits and inferior/superior limits. 

When the domain and range of a function f : X → Y are both the real numbers ℝ, with the usual
topology and order, one observes that $\liminf_{x\to a} f(x) \le \limsup_{x\to a} f(x)$, and that if $\liminf_{x\to a} f(x) =
\limsup_{x\to a} f(x)$, then $\lim_{x\to a} f(x)$ is well defined and equal to both the inferior and superior limits. However,
$\lim_{x\to a} f(x)$ is defined in terms of the topology on Y, whereas $\liminf_{x\to a} f(x)$ and $\limsup_{x\to a} f(x)$ are defined
in terms of the order on Y. The order and topology structures on ℝ are clearly distinct. But they somehow
give the same limit.


The reason for this correspondence is not too difficult to find. If a set has the same greatest lower bound and
least upper bound, there is one and only one point which can be greater than or equal to the lower bounds,
and also less than or equal to the upper bounds. This is a consequence of the antisymmetry property of
total orders. (See Definition 11.5.1 (ii).) In the case of the topology on ℝ, there is one and only one number
which is inside all neighbourhoods of that point. This is a consequence of the T₁ property of the topology
on ℝ. (See Definition 33.1.8 for the very weak T₁ property.)


One might suppose that since total orders (or well-orderings) are simpler structures than topologies, one
might benefit from discarding topologies and replacing them with total orders. However, natural
well-orderings are somewhat rare. Even a simple space such as ℝⁿ with the usual topology has no natural total
order which corresponds to the topology. The set of real numbers is unusual in having a natural topology and
natural total order which are compatible. This helps to explain why the real numbers may be constructed
equally well from the rational numbers as Dedekind cuts (using the total order) or as Cauchy sequences
(using the topology).


35.8.8 REMARK: The inferior and superior one-sided limits of real functions always exist. 

By Theorem 16.2.14 (v), the inferior and superior left and right limits of real functions in Definition 35.8.9 are 
always well-defined extended real numbers. Such limits are applicable in particular to the Dini derivatives 
in Definition 40.10.2 and the shadow sets in Sections 45.5 and 45.6. 


35.8.9 DEFINITION: Inferior and superior one-sided limits of real functions.
The inferior left limit of a function f : ℝ → ℝ at a point a ∈ ℝ is $\sup_{\delta\in\mathbb{R}^+} \inf_{x\in(a-\delta,a)} f(x)$.


The inferior right limit of a function f : ℝ → ℝ at a point a ∈ ℝ is $\sup_{\delta\in\mathbb{R}^+} \inf_{x\in(a,a+\delta)} f(x)$.


The superior left limit of a function f : ℝ → ℝ at a point a ∈ ℝ is $\inf_{\delta\in\mathbb{R}^+} \sup_{x\in(a-\delta,a)} f(x)$.


The superior right limit of a function f : ℝ → ℝ at a point a ∈ ℝ is $\inf_{\delta\in\mathbb{R}^+} \sup_{x\in(a,a+\delta)} f(x)$.


35.8.10 NOTATION:

$\liminf_{x\to a^-} f(x)$, for a function f : ℝ → ℝ and a ∈ ℝ, denotes the inferior left limit of f at a. In other
words, $\liminf_{x\to a^-} f(x) = \sup_{\delta\in\mathbb{R}^+} \inf_{x\in(a-\delta,a)} f(x)$.

$\liminf_{x\to a^+} f(x)$, for a function f : ℝ → ℝ and a ∈ ℝ, denotes the inferior right limit of f at a. In other
words, $\liminf_{x\to a^+} f(x) = \sup_{\delta\in\mathbb{R}^+} \inf_{x\in(a,a+\delta)} f(x)$.

$\limsup_{x\to a^-} f(x)$, for a function f : ℝ → ℝ and a ∈ ℝ, denotes the superior left limit of f at a. In other
words, $\limsup_{x\to a^-} f(x) = \inf_{\delta\in\mathbb{R}^+} \sup_{x\in(a-\delta,a)} f(x)$.

$\limsup_{x\to a^+} f(x)$, for a function f : ℝ → ℝ and a ∈ ℝ, denotes the superior right limit of f at a. In other
words, $\limsup_{x\to a^+} f(x) = \inf_{\delta\in\mathbb{R}^+} \sup_{x\in(a,a+\delta)} f(x)$.


36.0.1 REMARK: The central role of curves in physics and differential geometry.

Curves are of fundamental importance to both differential geometry and physics. Therefore they deserve 
careful study. Much of physics is expressed in terms of the effect of fields (electric, magnetic, gravitational, 
etc.) on “test particles”, which are abstract infinitesimal particles which follow continuous trajectories in
some space, usually with a time parameter of some kind. Much of differential geometry is expressed in 
terms of parallel transport along curves or paths. Curves and paths are similar to 1-manifolds (defined in 
Chapter 50), but have important differences as discussed in Remark 36.1.8. 


Since almost everything in mathematics is generalised to the maximum, one might ask why the definitions 
and properties of curves are not routinely generalised from one dimension to multiple dimensions. The special 
importance of curves with a single point moving according to a single real-number parameter becomes evident 
when one considers that so much of fundamental physics is framed in terms of single-point particles, and 
their location coordinates (and other state parameters) vary in accordance with a single time parameter. 
The trajectories of entities which are not single-point particles can be modelled by expanding the location 
coordinate-tuple to include orientation and other state parameters. 

Curves and paths are also of fundamental importance in the study of connectedness in general topological 
spaces because curves are continuous maps whose domain is an interval, and intervals are precisely those 
subsets of R which are connected. Curves are therefore used for defining pathwise connectedness. Families 
of curves are also a basic tool in algebraic topology. 


36.1. Curve and path terminology and definitions 


36.1.1 REMARK: Survey of meanings of the terms “curve”, “path” and “arc”. 

The candidates for definitions of curves and paths include the following styles of structures, where X is a
topological space of points and I is a real-number interval. (The various kinds of real intervals are presented
in Definition 16.1.5 and Remark 34.9.15.)


(1) A map γ : I → X for intervals I ⊆ ℝ.
(2) An image set S = γ(I) of a map γ : I → X.


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 

Most differential geometry texts define a “curve” in style (1). This is sometimes referred to as a “parametrised
curve” or “continuous curve”. Some texts use the word “path” for structure style (1). Some texts use both 
words “curve” and “path” for style (1). A minority of texts use the word “curve” for structure style (2). 
Table 36.1.1 is a rough survey of the usage of some authors. 


The pattern which seems to emerge is as follows. 


(i) Structure style (1) is the most popular by far. Style (2) is much less popular. 


(ii) The word “curve” is used by most DG texts for structure style (1). A small number of authors use the 
word “path” for style (1), either exclusively or in addition to the word “curve”. 


(iii) The word “path” is used only by a small minority of DG texts. 


(iv) The texts which do not use the word “curve” for structure (1) are predominantly on non-DG subjects 
such as real analysis, complex analysis and topology. 


The most popular names for structures (1) and (2) are “curve”, “path”, “arc”, “locus” and “contour”. Other
plausible names are “trajectory”, “track”, “route”, “journey” and “traversal”.


Styles (3) and (4) are not found at all in the author's survey of texts, but they do have some benefits. 


(3) A map γ : I → X from which some redundant information has been removed.
(4) A set S ⊆ X to which some extra information has been added.


Structure style (1) has the most information. Style (2) has the least information. Styles (3) and (4) have 
intermediate amounts of information. The removal of information in case (3) may be achieved by replacing
the map γ in case (1) with an equivalence class of such maps. The addition of information in case (4) may be
achieved by attaching extra structures (such as start and end points, the direction of traversal, or an order
structure) to the image set S of case (2).


36.1.2 REMARK: Paths in networks are static, but possibly directed. 

In mathematical network theory (also called “graph theory”), a “path” generally means an ordered sequence
of links (or “edges”) in the network. Although the phrase “path from A to B” has a sense of directionality,
this suggests only a sequence of traversal, not the time at which each point should be reached. So the 
function concept (1) in Remark 36.1.1 has far too much information for a path. Perhaps a total order on the 
path would be the best way of representing the English-language idea of a path from one point to another. 


36.1.3 REMARK: Choice of terminology for curve and path concepts. 
In this book, the word “curve” (Definition 36.2.3) signifies the map structure (1) in Remark 36.1.1 and
“path” signifies an equivalence-class structure (3), whereas concept (2) is called simply the “image” of the
curve or path. In other words, a curve will be a map from an interval to a topological space whereas a
“path” means a curve from which some or all information about the method of traversal has been removed.


36.1.4 REMARK:  Analogies of curve and path concepts to manifold concepts. 

There may be many different path definitions according to the choice of equivalence relation. This is very 
much analogous to the way in which there is a multiplicity of definitions of manifolds: topological, C^k
differentiable, analytic, and so forth. Concept (1) in Remark 36.1.1 is similar to a manifold chart (defined in
the inverse direction), whereas concept (3) resembles a maximal atlas for a manifold. Concept (2) resembles 
the base set of a manifold without the charts or topology. (See Remark 36.1.8 for comments on inverse 
atlases for curves and paths.) 


36.1.5 REMARK: Removal of redundant information from curves. 

What is often wanted is a curve definition with the redundant information removed. Curves which have 
the same image are sometimes regarded as equivalent, but the image alone does not usually contain all of 
the desired information. It is helpful to look at some examples. Consider the maps γ₁ : [0,π] → ℝ² with
γ₁ : t ↦ (cos t, 0) and γ₂ : [0,π] → ℝ² with γ₂ : t ↦ (cos 3t, 0). (See Figure 36.1.2.) Both maps have
the image set [−1, 1] × {0}, and they have the same start and end points. Given the image of a path with
self-intersections, it is impossible to determine the order in which the image is traversed.
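
The two example curves can be compared numerically. The following is only an illustrative sketch: the function names `gamma1`, `gamma2` and `direction_changes` are hypothetical, and the curves are sampled on a finite grid, so the check is suggestive rather than a proof. It shows that both curves start at (1, 0) and end at (−1, 0), while the second curve reverses its direction of travel twice along the way.

```python
import math

def gamma1(t):
    """The curve t -> (cos t, 0) on [0, pi]."""
    return (math.cos(t), 0.0)

def gamma2(t):
    """The curve t -> (cos 3t, 0) on [0, pi]."""
    return (math.cos(3 * t), 0.0)

def direction_changes(curve, a, b, samples=3000):
    """Count reversals of the x-direction along a sampled traversal of [a, b]."""
    xs = [curve(a + (b - a) * i / samples)[0] for i in range(samples + 1)]
    # Keep only the non-zero increments, then count sign changes between neighbours.
    steps = [x1 - x0 for x0, x1 in zip(xs, xs[1:]) if x1 != x0]
    return sum(1 for s0, s1 in zip(steps, steps[1:]) if (s0 > 0) != (s1 > 0))

if __name__ == "__main__":
    print(gamma1(0.0), gamma2(0.0))              # common start point
    print(gamma1(math.pi), gamma2(math.pi))      # common end point (up to rounding)
    print(direction_changes(gamma1, 0.0, math.pi))
    print(direction_changes(gamma2, 0.0, math.pi))
```

The image set and the end-points are the same for both curves, so the count of direction reversals is exactly the kind of traversal information which the image alone cannot supply.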


In the case of a space-filling curve, essentially all parametrisation information is lost. Therefore even if a 
curve is known to be injective, the traversal sequence cannot be determined from only the image set and 
the start and end points. Therefore it is better to either specify the traversal order explicitly as an abstract 
order relation, or to specify a single map y together with an equivalence relation on the set of all curves. 
year reference curve path arc
1925 Levi-Civita [26] y:1M 
1931 Weyl [311] y : [0,1] > M 
1949 Synge/Schild [41] Img(y) [geodesic] 
1950 Struik [39] . y : [a,b] > IR? 
1953 Rudin [129] y : [a,b] 2 R* y : [a,b] — R* 
1955 Montgomery/Zippin [118] y : [0,1] > M "Y([0, 1]) 
1957 Wallace [152] y:[0,1] > M 
1959 Kreyszig [22] +([a, 6]) ¥({a, 6]) 
1959 Willmore [42] Img(y) y:1M 
1962 B. Mendelson [115] ([0, 1]) y:[0,1] > M 
1962 Taylor/Wade [146] y:I> R” 
1963 Auslander/MacKenzie [1] y:1M 
1963 Flanders [11] y:1—M 
1963 Guggenheimer [16] iy] 
1963 Simmons |137] Img(y) 
1964 Baum [54] "Y([0, 1]) 
1964 Bishop/Crittenden [2] y : [a,b] > M 
1964 Gelbaum/Olmsted [78] y:I>R 
1965 Feynman/Hibbs [265] 7: [a,b] > M 
1965 Postnikov [33] y:I>M 
1965 A.E. Taylor [145] 7: [a,b] > R” y : [a,b] > R” 
1966 Ahlfors [45] y:[o, 8] > C 
1967 Henri Cartan [4] ¥({a, b]) q : [a,b] > M 
1967 Gemignani [80] y: [0,1] > M 
1968 Bishop/Goldberg [3] y : [a,b] > M 
1968 Choquet-Bruhat [6] [fr] y:1—M y:1—M 
1970 Misner/Thorne/Wheeler [292] y:1—M ly] ? 
1970 Spivak [37] y:1—5 M Img(y) Img(y) 
1970 Steen/Seebach [141] y:[06,1—5 M y:[0,1]—^ M 
1970 Wallace [154] y:[0,1] > M 
1970 Willard [165], pages 25-28 y:[0,1] >M y:[0,1] ^ M 
1971 Kasriel [100], pages 170-171 y : [0,1] > M 
1975 Lovelock/Rund [27] y:I>M 
1979 Do Carmo [9] qy: (—£,€) > M 
1980 Schutz [36] y:1— M 
1981 Greenberg/Harper [86] y : [0,1] > X 
1983 Nash/Sen [30] y:I>M y: [0,1] > M 
1986 Crampin/Pirani [7] y:1—M Img(7) 
1987 Gallot/Hulin/Lafontaine [13] y:12 M 
1988 Kay [18] = y:I—>M 
1993 EDM2 [113] Img(y) Img(y) 
1994 Darling [8] y:1—5M 
1995 O'Neill [295 y:1—M 
1997 Frankel [12] y:1M 
1997 Lee [24] y:1—M 
1998 Petersen [31] y:1—M 
1999 Lang[23] - y:I—M y:IM 
2004 Bump [57] y : [0,1] > M 
2004 Szekeres [305 y: (a,b) > M 
2012 Sternberg [38] y:I—mR* 

Kennington y:1M I] 


Table 36.1.1 Survey of meanings of “curve” and “path”
[Figure: the curves γ₁ : t ↦ (cos t, 0) and γ₂ : t ↦ (cos 3t, 0) in ℝ²]
Figure 36.1.2 Curves with same image set and end-points


36.1.6 REMARK: The distinction between curves and their images. 

The difference between sets and curves may be compared with the difference between sets and sequences.
Sets are often indexed for convenience even when the choice of index is irrelevant. Sometimes the order of a 
sequence of objects is important, sometimes not. The parametrisation of paths is analogous to the indexing 
of sets. The choice of index map for the set doesn't matter as long as the order is right. 


The parametrisation of a curve is often of little importance apart from its order. Similarly, the choice of
charts for a manifold is often of little importance as long as the topology and differentiable structure are as 
intended. One can remove the superfluous details of the choice of parametrisation by defining an equivalence 
class of parametrisations or by simply declaring parametrisations to be equivalent with respect to some 
specified equivalence relation. These are the usual ways of doing things when a mathematical structure 
contains superfluous information which should be ignored. 


36.1.7 REMARK: Comparison of curves and one-dimensional manifolds. 

Despite some similarities between curves and 1-dimensional manifolds, they are not really the same thing. 
The charts of curves map from ℝ to the point set, whereas 1-manifold charts are from the manifold to ℝ.
(See Figure 36.1.3.) 


[Figure: left panel “curve”, a map γ : I → M with I = Dom(γ) ⊆ ℝ; right panel “one-manifold”, charts from subsets of S ⊆ M to ℝ]
Figure 36.1.3 Contrast between curve and one-manifold


This is a necessary difference because curves may self-intersect. Therefore the map for a curve may not be 
injective. A curve is not just a point-set with a given topology. A curve is a parametrised trajectory within 
a topological space. A manifold structure would be more suitable for level curves of a real-valued function. 
Strictly speaking, “level curves” should perhaps be called “level manifolds”.


36.1.8 REMARK: Reparametrisation of curves. 
A path could be fully defined by analogy with manifolds as a pair (S, A) where S ⊆ M is a subset of a
topological space M, and A is a set of continuous maps γ : I_γ → S for intervals I_γ ⊆ ℝ. (See Figure 36.1.4.)


From this perspective, the curve map in concept (1) in Remark 36.1.1 is a chart γ ∈ A, the image set in
concept (2) is the set S, and the equivalence class suggested by concept (3) is the atlas A. For any two maps
γ₁, γ₂ ∈ A, one may construct monotonic surjective continuous functions β₁ : I → I_{γ₁} and β₂ : I → I_{γ₂} such
that γ₁ ∘ β₁ = γ₂ ∘ β₂. (The technicalities here are explained in Section 36.8.)
[Figure: charts γ₁, γ₂, γ₃ ∈ A with Dom(γ₁), Dom(γ₂), Dom(γ₃) ⊆ ℝ, mapping into S = ⋃_{γ∈A} Img(γ) ⊆ M]


Figure 36.1.4 Atlas of curves for a path


If the reparametrisation functions β₁ and β₂ are non-decreasing for all pairs of maps, the path is oriented.
Other constraints could be put on reparametrisation maps. For instance, they could be required to be affine,
C^k, analytic or an isometry. Each transition map constraint would yield a different class of path. One could
even have transition maps which are of different regularity classes on different subsets of the path. In the 
case of rectifiable curves, Lipschitz transition maps might be suitable. 


One could now proceed to develop all of the concepts of topological and differentiable manifolds for paths 
of the form (S, A). After defining paths, one could define a family of paths as a pair (S, A) where γ :
I_γ → S is a multi-parameter map with I_γ ⊆ ℝⁿ for n ∈ ℤ⁺. The advantage of this kind of inverse
atlas construction for n-manifolds is that it can represent in a natural way surfaces which have complex 
self-intersections. The important point to note is that the charts are only required to be continuous, not 
necessarily homeomorphisms. 


If one expands an atlas of curves by completing the atlas with respect to equivalence of direction only,
then the information left in the atlas corresponds to a total order on the path set. In this case, one may as 
well use instead the concept of an “ordered traversal” which was introduced in Definition 11.5.29. 


A simple kind of path-chart equivalence would be to declare a set A of curves for a path set to be equivalent
if γ₁⁻¹ ∘ γ₂ : ℝ → ℝ is continuous for all γ₁ and γ₂ in A. These curve transition maps may also be required
to have various regularity properties. 


Perhaps a much more interesting concept of “path-atlas” would generalise the index set I to an open subset
of ℝⁿ for general n ∈ ℤ⁺. As in Definition 11.5.29, the analytical structure on the parameter set I could
be replaced with an order structure, because sometimes one is only interested in order of traversal, not in
the analytical properties of the traversal. In the interests of minimalism, the structure on I = ℝⁿ, for
example, could be a partial order such as x ≤ y ⇔ (∀i ∈ ℕₙ, xᵢ ≤ yᵢ) or the lexicographic total order
x ≤ y ⇔ (∀j ∈ ℕₙ, (∀i ∈ ℕₙ, i < j ⇒ xᵢ = yᵢ) ⇒ xⱼ ≤ yⱼ) if I is well ordered, as in Definition 11.6.19.
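
The difference between a componentwise partial order and a lexicographic total order on n-tuples can be illustrated with a short sketch. The function names `product_leq` and `lex_leq` are hypothetical, and the quantifier form used for the lexicographic order (for every index j, if all earlier components agree then the j-th components are ordered) is one reading of the formula in this remark, assumed here for illustration.

```python
from itertools import product

def product_leq(x, y):
    """Componentwise partial order: x <= y iff x_i <= y_i for every index i."""
    return all(xi <= yi for xi, yi in zip(x, y))

def lex_leq(x, y):
    """Lexicographic order: for every j, equal prefixes before j imply x_j <= y_j."""
    return all(
        x[j] <= y[j]
        for j in range(len(x))
        if all(x[i] == y[i] for i in range(j))
    )

if __name__ == "__main__":
    x, y = (0, 5), (1, 0)
    # Incomparable in the partial order, but comparable in the lexicographic order.
    print(product_leq(x, y), product_leq(y, x))
    print(lex_leq(x, y), lex_leq(y, x))
    # The quantifier form agrees with Python's built-in tuple comparison on a small grid.
    grid = list(product(range(3), repeat=2))
    print(all(lex_leq(u, v) == (u <= v) for u in grid for v in grid))
```

Under the partial order, some pairs of parameter tuples are not comparable at all, whereas the lexicographic order always decides which of two tuples comes first.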


36.2. Curves 


36.2.1 REMARK: Typical hint-notation for curves. 

The customary use of the symbol γ for curves is possibly due to the fact that γ is the third Greek letter,
corresponding to the Latin third letter ‘c’ for “curve”. However, the most obvious related Greek word
for “curve” is “κυρτός”, meaning “curved, arched”, which commences with the letter κ. (See for example
Liddell/Scott [478], page 458.)


36.2.2 REMARK: The continuity of curves. 

The definitions of curves in Section 36.2 are meaningful in general topological spaces although they are 
typically used in topological and differentiable manifolds. It is assumed that all curves are continuous 
because it is difficult to think of a useful class of curves with a weaker condition than continuity. (A 
curve with discontinuous jumps is probably better described as a set or sequence of curves, or an “ordered 
traversal”.) However, a curve may be referred to as a “continuous curve” to emphasise that no stronger 
regularity properties are expected from it. 


36.2.3 DEFINITION: A (continuous) curve in a topological space M is a continuous map γ : I → M for
some interval I ⊆ ℝ.
36.2.4 REMARK: The role of real-number intervals in the definition of curves.
Curves are defined in terms of real-number intervals. Curves inherit various properties from these intervals. 
(Intervals of R are defined in Definition 16.1.4.) 


36.2.5 REMARK: Classes, sub-classes and classifications of curves. 
There are many classes, sub-classes and classifications of curves. The following kinds of questions arise when 
choosing definitions for curves for various applications. 


(1) What is the domain? Intervals of ℝ can have the form [a,b], (a,b), [a,b) and (a,b] for a,b ∈ ℝ. Possible
constraints on the numbers a and b are a < b, a ≤ b or a ≠ b. Some texts require a = 0. Some texts
require both a = 0 and b = 1. Some texts require a = −b and b > 0. One may also replace the parameter
space ℝ with the extended real numbers. Then a and b may be equal to −∞ or +∞. This makes
possible intervals of the form (−∞, +∞), [−∞, ∞] and so forth.


(2) What is the direction? If the domain is denoted as [a,b], (a,b), [a, b) or (a, b] with b < a, then one 
may consider that the curve is traversed in the reverse direction. However, strictly speaking, there is 
no difference between two intervals which are specified with the end-points a and b swapped, since the 
interval set I is independent of the notation used to specify it. (For example, the notations [−1,2) and
(2,−1] denote the same set.)


(3) What kinds of self-intersections are permitted? A curve may be required to be injective, which means 
that there are no self-intersections. A curve may be required to have no “constant stretches”, which are 
sub-intervals where the curve's value is constant. The end-point of a curve may be required to be the 
same as the start point, while permitting or forbidding other self-intersections. A curve may also be 
required to have a finite or other kind of limit on the cardinality of the set of self-intersections. 


(4) Should curves be partitioned into equivalence classes? Two curves may be considered to be equivalent 
if they can be continuously reparametrised to be equal, either respecting or ignoring the direction of 
traversal. Curves may also be considered equivalent if their image set is the same. 


(5) Should curves satisfy conditions with respect to additional structures on the target space? For example, 
if the target topological space is a metric space, the image of the curve may be required to be a bounded 
set. If the target space has a differentiable structure, curves may be required to have some kind of 
differentiability property. If the target space is a manifold, curves may be required to have various local 
or global embedding or immersion properties. If the target space is a linear or affine space, curves may 
be required to be linear or polynomial with respect to the parameter. 


Many combinations of the above kinds of constraints are given special names, but the naming conventions 
vary enormously. As is suggested by Table 36.1.1, there is not much agreement on what constitutes a curve, 
path or arc. The names of classifications are even more divergent between authors. 


For each curve classification scheme, one may define various basic operations on the curves (or curve equiv- 
alence classes), such as reversal, concatenation, reparametrisation, translation, extension and restriction. 


36.2.6 REMARK: The question of "empty curves". 

There is an interesting question here as to whether empty curves should be permitted in Definition 36.2.3.
Real intervals are characterised as the connected subsets of ℝ. The set of all real intervals is closed under
intersection if empty intervals are permitted. So it is desirable to permit empty curves. If I = ∅, then γ = ∅,
namely the empty function.


36.2.7 REMARK: The connectedness of the image of a curve. 

Since the parameter interval I of a curve in Definition 36.2.3 is connected, it follows that the image γ(I) is
connected in the topology of the target space M. The interval I may be open, closed or semi-closed. It may
also be classed as bounded, singly infinite or doubly infinite. If I is compact (i.e. closed and bounded), then
the image γ(I) is compact. The non-empty compact intervals are the most useful for defining parallelism on
fibre bundles because the end-points are required.


In algebraic topology, it is usual to normalise a positive-length compact parameter interval of a curve to 
the set [0,1], but in differential geometry, general parameter intervals are required. The parameter may 
represent, for example, the time of passing a point or the distance measured along the curve. 
36.2.8 REMARK: Confusing usage of the words “open” and “closed”.

One must distinguish two different ways of using the words “open” and “closed”. A curve γ : [a,b] → X
might be called “closed” if γ(a) = γ(b), as in Definition 36.2.11, whereas the word “open” in Definition 36.2.9
refers to the topological openness of the domain of the curve. This is an unfortunate re-use of words. Because 
of these multiple meanings of “closed” and “open”, it is sometimes a matter of guesswork to interpret them. 
Perhaps a better word for a curve which ends where it starts would be a “loop” or a “closed loop”. 


36.2.9 DEFINITION: Let M be a topological space. 


An open-interval curve in M is a curve γ : I → M such that I is an open interval.


A compact-interval curve in M is a curve γ : I → M such that I is a compact interval.


36.2.10 REMARK: Classification of curves according to injectivity. 

Whereas Definition 36.2.9 classifies curves according to their domain interval, Definition 36.2.11 classifies 
curves according to the injectivity or lack of injectivity of the curve. The term “cyclic curve” is non-standard,
but it avoids the confusion caused by the term “closed curve”. 


36.2.11 DEFINITION: Let M be a topological space. 
A cyclic curve or closed curve in M is a curve γ : [a,b] → M such that γ(a) = γ(b).


A simple curve or Jordan arc in M is an injective curve, i.e. a curve γ such that γ(t₁) = γ(t₂) ⇒ t₁ = t₂.


A Jordan curve or simple closed curve in M is a curve with a non-empty compact domain interval which
is injective except that the end-points coincide; in other words, it is a curve γ : [a,b] → M such that
γ(t₁) = γ(t₂) ⇔ (t₁ = t₂ or {t₁, t₂} = {a, b}).

A constant curve in M is a curve γ : I → M such that ∀s,t ∈ I, γ(s) = γ(t).


36.2.12 REMARK: Initial and terminal points of curves. 

It is not at all guaranteed that a curve γ : (a,b) → M can be continuously extended to a curve γ : [a,b] → M
for a,b ∈ ℝ with a < b. Therefore the initial and terminal point concepts in Definition 36.2.13 and
Notation 36.2.15 assume a non-empty compact parameter interval [a,b].


36.2.13 DEFINITION: Let M be a topological space. 

The initial point of a curve γ : [a,b] → M is γ(a).

The terminal point of a curve γ : [a,b] → M is γ(b).

A multiple point of a curve γ : I → M is a point x ∈ M such that γ(t₁) = γ(t₂) = x for some t₁,t₂ ∈ I
with t₁ ≠ t₂.


36.2.14 REMARK: Source, start, terminal and target points of curves. 
In Notation 36.2.15, S is mnemonic for “source” or “start”, and T is mnemonic for “terminal” or “target”. 


36.2.15 NOTATION: The initial and terminal points of a curve γ : [a,b] → M may be denoted as S(γ) =
γ(a) and T(γ) = γ(b) respectively.


36.2.16 REMARK: Experimental notation for sets of continuous curves. 
Notation 36.2.17 is experimental. It may need some indication of the nature of the parameter interval. 


36.2.17 NOTATION: Denote by @(M) the set of all continuous curves in M. 


36.3. Space-filling curves 


36.3.1 REMARK: The Peano space-filling curve. 

It is not possible to define a homeomorphism between the real-number interval [0,1] ⊆ ℝ and the unit
square [0,1]² ⊆ ℝ², assuming the standard topologies on these sets. (See Definitions 32.5.7 and 32.6.1.)
However, it is possible to define a continuous function with domain [0,1] and range [0,1]². Such curves
are called “space-filling curves”. The first example was given in 1890 by Peano [192]. Also well-known is 
the Hilbert space-filling curve, published in 1891. (See for example Hocking/Young [93], pages 122-123; 
Simmons [137], pages 341-342; Newman [247], Volume 3, pages 1965-1966; Guggenheimer [16], pages 3-7. 
See also Gelbaum/Olmsted [78], pages 132-138, for the Hilbert curve and badly behaved plane curves in
general.) Some published descriptions of space-filling curves present the Hilbert curve while claiming that
it is the Peano curve, whereas the Hilbert curve is only one example of a space-filling curve, although such
curves in general are often called “Peano curves” as a class.


[ www.geometry.org/dg.html ] [ draft: UTC 2023-1-3 Tuesday 00:13 ]


Peano’s space-filling curve has the form

\forall x \in [0,1], \quad f(x) = \sum_{k=1}^{\infty} 3^{-k} \Bigl( (-1)^{\sum_{i=1}^{k-1} b_{2i}} (b_{2k-1} - 1) + 1,\; (-1)^{\sum_{i=1}^{k} b_{2i-1}} (b_{2k} - 1) + 1 \Bigr),

where x is expressed in the form x = \sum_{i=1}^{\infty} 3^{-i} b_i for some sequence b : ℕ → {0,1,2}. (The slight non-
uniqueness of this ternary fractional expansion of x does not affect the uniqueness of the value of f(x).) This
curve is illustrated for the summation up to k = 4 and k = 5 in Figure 36.3.1.
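
Partial sums of a curve of this kind can be computed exactly from a finite list of ternary digits. The sketch below uses the standard digit-reflection rule for Peano's curve: a digit d is replaced by 2 − d, that is (−1)(d − 1) + 1, whenever the sum of the relevant earlier digits is odd. That this rule coincides with the exponents intended in the formula of this remark is an assumption, and the function name `peano_point` is hypothetical.

```python
from fractions import Fraction

def peano_point(digits):
    """Partial sum of the Peano map for ternary digits b_1, ..., b_{2n} of x.

    Returns the pair of coordinates summed up to k = n as exact fractions.
    """
    n = len(digits) // 2
    fx, fy = Fraction(0), Fraction(0)
    even_sum = 0  # b_2 + b_4 + ... + b_{2k-2}
    odd_sum = 0   # b_1 + b_3 + ... + b_{2k-1}
    for k in range(1, n + 1):
        b_odd = digits[2 * k - 2]    # b_{2k-1}
        b_even = digits[2 * k - 1]   # b_{2k}
        odd_sum += b_odd
        # Reflect a digit d to 2 - d when the controlling digit sum is odd.
        x_digit = (-1) ** even_sum * (b_odd - 1) + 1
        y_digit = (-1) ** odd_sum * (b_even - 1) + 1
        fx += Fraction(x_digit, 3 ** k)
        fy += Fraction(y_digit, 3 ** k)
        even_sum += b_even
    return fx, fy

if __name__ == "__main__":
    print(peano_point([0, 0]))               # start of the curve
    print(peano_point([2, 2, 2, 2]))         # approaches the corner (1, 1)
    # Two ternary expansions of x = 1/3 give partial sums that differ by at most 3**-3.
    print(peano_point([1, 0, 0, 0, 0, 0]))
    print(peano_point([0, 2, 2, 2, 2, 2]))
```

The last two calls use the two ternary expansions 0.1000... and 0.0222... of x = 1/3. Their partial sums agree to within 3⁻³, which is consistent with the claim that the non-uniqueness of the expansion does not affect the limit.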


Figure 36.3.1 Peano space-filling curve for stages k ≤ 4 and k ≤ 5


Peano’s space-filling curve caused surprise and even alarm in the mathematics “community”. Gelbaum/ 
Olmsted [78], page 133, wrote the following. 


In 1890 the Italian mathematician G. Peano (1858-1932) startled the world with the first space- 
filling arc. 


Hocking/Young [93], pages 122-123, wrote the following (during the 20th century). 


The Peano spaces have an interesting history. During the last century, when mathematicians were 
first formulating concepts with a careful regard for rigor, the notion of a “curve” caused considerable 
difficulty. [...] This example, surprising and almost paradoxical at the time, is commemorated in 
the term Peano space. [...]


The modern theory of curves has absorbed this phenomenon and carried on.


Simmons [137], page 342, wrote the following.


Peano’s discovery of space-filling curves was a shock to many mathematicians of the time, for it 
violated all their preconceived ideas of what a continuous curve ought to be. 


Coincidentally, while many paradoxes were causing alarm in the mathematics world at the turn of the 20th 
century, certain “clouds on the horizon” were appearing in the physics world. Revolutions of thought arose 
out of these difficulties in both disciplines. 
36.4. Removal of stationary intervals from curves


36.4.1 REMARK: Stationary stretches of curves. 

In the study of pathwise parallelism, it is assumed that no change of orientation of a fibre occurs if the 
curve is stationary for a while. Thus if a curve γ : I → M satisfies γ(s) = γ(t) for all s,t ∈ [a,b] ⊆ I for
some a < b, then the curve could be said to be stationary on the interval [a,b]. But the word “stationary”
is usually associated with functions of one or more real or complex variables whose derivatives vanish at a 
point. Therefore the more specific term “constant” is preferable. 


If reparametrisations are permitted to be non-decreasing continuous surjections rather than increasing
homeomorphisms, then all constant stretches of curves may be removed. A curve γ₁ which is constant on
the interval [a,b] may have this constant stretch removed by expressing it as γ₁ = γ₂ ∘ β, where β : I → ℝ
is defined by β(t) = min(t, a + max(0, t − b)) for all t ∈ I, and

\gamma_2(t) = \begin{cases} \gamma_1(t) & t \le a \\ \gamma_1(t + b - a) & t \ge a \end{cases}

for t ∈ β(I) = Dom(γ₂). A reparametrisation such as this is clearly not a homeomorphism, but the curves γ₁
and γ₂ trace out the same set of points in the same order. Therefore they are equivalent as far as representing
a traversal of points is concerned. When curves are used for parallel transport, there is supposed to be no
change in the orientation of a fibre when the curve is constant.
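
The removal of a single constant stretch can be checked on a concrete example. In the sketch below, the reparametrisation is β(t) = min(t, a + max(0, t − b)) as described in this remark. The example curve `gamma1` in ℝ² and the helper names `make_beta` and `squeeze` are hypothetical choices made only for illustration.

```python
def make_beta(a, b):
    """Non-decreasing continuous map which collapses the interval [a, b] to the point a."""
    def beta(t):
        return min(t, a + max(0.0, t - b))
    return beta

def squeeze(gamma1, a, b):
    """Curve with the constant stretch [a, b] of gamma1 removed."""
    def gamma2(s):
        return gamma1(s) if s <= a else gamma1(s + (b - a))
    return gamma2

def gamma1(t):
    """Example curve on [0, 3]: moves right, rests on [1, 2], then moves up."""
    if t <= 1.0:
        return (t, 0.0)
    if t <= 2.0:
        return (1.0, 0.0)
    return (1.0, t - 2.0)

if __name__ == "__main__":
    a, b = 1.0, 2.0
    beta = make_beta(a, b)
    gamma2 = squeeze(gamma1, a, b)
    grid = [i / 8 for i in range(25)]   # sample points of [0, 3]
    # gamma1 agrees with gamma2 composed with beta at every sample point.
    print(all(gamma1(t) == gamma2(beta(t)) for t in grid))
    # beta is not injective, so it is not a homeomorphism.
    print(beta(1.25), beta(1.75))
```

The sample points are dyadic rationals so that the floating-point comparisons are exact. The domain of `gamma2` is the shorter interval [0, 2], which is the range of `beta`.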


There are two obvious ways to deal with sometimes-constant curves. Either they can be simply removed 
from consideration, or else they can be “equivalenced out”, which means that they can be collected together
in equivalence classes which effectively ignore the constant stretches of curves. If the latter approach is used, 
it will be convenient to always be able to select a never-constant representative for each equivalence class. 
In practice, this would be the same as just ignoring sometimes-constant curves completely. There remains, 
therefore, the question of whether there is any use in permitting sometimes-constant curves to be members 
of curve classes. The formalism chosen here uses unrestricted continuous curves, and imposes an equivalence 
relation which makes all curves equivalent to some never-constant curve. Therefore all sometimes-constant 
curves may be ignored since their paths are represented by never-constant curves. 


36.4.2 DEFINITION: A constant stretch of a curve γ : I → M in a topological space M is an interval
[a,b] ⊆ I with a < b and γ(s) = γ(t) for all s,t ∈ [a,b].


36.4.3 DEFINITION: A never-constant curve in a topological space M is a curve γ : I → M such that
∀a,b ∈ I, (a < b ⇒ (∃c ∈ [a,b], γ(c) ≠ γ(a))).
A sometimes-constant curve in a topological space M is a curve γ : I → M which has a constant stretch.


36.4.4 REMARK: The relation of constant stretches to injectivity. 

A curve is a never-constant curve if and only if it has no constant stretches. It is not necessarily true that 
a never-constant curve is injective if restricted to small enough sub-intervals. That is, a restriction such as
γ|_[t−ε, t+ε] may be non-injective for all ε > 0. But a curve which does have this local injectivity property is
necessarily never-constant in the sense of Definition 36.4.3. 


36.4.5 THEOREM: A constant and never-constant curve has an empty or singleton domain.
If a curve γ is constant and never-constant, then either γ = ∅ or #(Dom(γ)) = 1.


PROOF: Let γ : I → M be a curve for some topological space M and real interval I. Suppose that γ is
constant and γ ≠ ∅ and #(Dom(γ)) ≠ 1. Then #(I) ≥ 2. So there exist a,b ∈ I with a < b. Then γ(c) = γ(a)
for all c ∈ [a,b] by Definition 36.2.11 because γ is constant. Therefore by Definition 36.4.3, γ is not a
never-constant curve.


36.4.6 REMARK: The “constant neighbourhoods” of a continuous function. 

Theorem 36.4.8 shows that the set of points p in the domain X of a continuous function f : X → Y
where f is constant in some neighbourhood of p is an open subset of X. If X = ℝ, this implies by
Theorem 32.7.8 that these “constant neighbourhoods” may be expressed as a countable union of open
intervals. Then Theorem 36.4.10 is useful for “squeezing out” these constant intervals from a continuous
curve. Theorem 36.4.10 is a kind of “technical lemma” to support Theorem 36.4.11. (Part (ix) was written
to support a style of proof for Theorem 36.4.11 which was ultimately abandoned. So this part can be safely
ignored.)


The proof of Theorem 36.4.11 uses some “naive measure theory” for open intervals of real numbers to
decompose an arbitrary continuous curve γ₁ : I → M into a never-constant curve γ₂ : J → M composed
with a curve reparametrisation map β : I → J so that γ₁ = γ₂ ∘ β and γ₂ = γ₁ ∘ α, where β ∘ α = id_J. The
assertion of Theorem 36.4.11 is perhaps intuitively obvious when M is a Cartesian space, but a proof needs
to be written to ensure that such intuition is not misguided. Such caution is justified by Example 36.4.7.


36.4.7 EXAMPLE: Let Y = ℤ with the semi-infinite interval topology as in Example 33.1.19. In other
words, Top(Y) = {∅, ℤ} ∪ {ℤ(−∞,a]; a ∈ ℤ}, where ℤ(−∞,a] denotes the set {i ∈ ℤ; i ≤ a} for a ∈ ℤ.
Then Y is a topological space which is T₀ but not T₁.

Define γ₁ : ℝ → Y by γ₁(t) = floor(t) for t ∈ ℝ, where ℝ has its usual topology. Then γ₁ : ℝ → Y is a
surjection which is continuous because γ₁⁻¹(∅) = ∅, γ₁⁻¹(ℤ) = ℝ and γ₁⁻¹(ℤ(−∞,a]) = (−∞, a+1) ∈ Top(ℝ)
for all a ∈ ℤ.

Suppose that γ₁ = γ₂ ∘ β, where β : ℝ → ℝ is a non-decreasing continuous function and γ₂ : Range(β) → Y
is a never-constant continuous curve. Then β(0) < β(1) because γ₁(0) ≠ γ₁(1). For all c ∈ ℝ with
β(0) < c < β(1), there is a t ∈ (0, 1) such that c = β(t). Therefore γ₂(c) = γ₂(β(t)) = γ₁(t) = 0. This
contradicts the never-constancy of γ₂. Therefore γ₁ cannot be expressed as γ₂ ∘ β for such β and γ₂. In other
words, the constant stretches of γ₁ cannot be “squeezed out” in this way.


36.4.8 THEOREM: The constant-stretch interior points of a continuous function form an open set.
Let f : X → Y be continuous. Then {p ∈ X; ∃Ω ∈ Top_p(X), ∀x ∈ Ω, f(x) = f(p)} is an open subset of X.


PROOF: Let G = {p ∈ X; ∃Ω ∈ Top_p(X), ∀x ∈ Ω, f(x) = f(p)}. Let p ∈ G. Then f(Ω) = {f(p)} for some
Ω ∈ Top_p(X). For such Ω, let x ∈ Ω. Then Ω ∈ Top_x(X) and f(Ω) = {f(x)}. So x ∈ G. Therefore Ω ⊆ G.
So p ∈ Int(G). Therefore G ⊆ Int(G). Hence G is an open subset of X by Theorem 31.8.14 (ii).


36.4.9 REMARK: “Nearest point algorithm” to help “squeeze out the constant intervals”.

The function α in Theorem 36.4.10 computes the nearest point to p where the function β has a given value.
If no such point exists (as may happen for a discontinuous function), the relevant infimum or supremum is
used. This “nearest point algorithm” is illustrated in Figure 36.4.1.


Figure 36.4.1 Finding the nearest point where a function has a given value
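
The nearest-point construction can be approximated numerically by bisection when the function is continuous and non-decreasing on a bounded interval. The following is a sketch under those extra assumptions, not the construction used in the proofs: for t ≥ 0 it approximates the infimum of the points x ≥ p where the function value is at least t, and for t < 0 the supremum of the points x ≤ p where the value is at most t. The name `nearest_point_alpha`, the bounds `lo` and `hi`, and the tolerance `tol` are hypothetical.

```python
def nearest_point_alpha(beta, p, lo, hi, t, tol=1e-12):
    """Approximate the nearest point to p at which beta reaches the value t.

    Assumes beta is continuous and non-decreasing on [lo, hi], beta(p) == 0,
    lo <= p <= hi, and t lies in the range of beta.
    """
    if t >= 0:
        # inf{x in [p, hi] : beta(x) >= t}; invariant: beta(right) >= t.
        left, right = p, hi
        if beta(left) >= t:
            return left
        while right - left > tol:
            mid = (left + right) / 2
            if beta(mid) >= t:
                right = mid
            else:
                left = mid
        return right
    # sup{x in [lo, p] : beta(x) <= t}; invariant: beta(left) <= t.
    left, right = lo, p
    while right - left > tol:
        mid = (left + right) / 2
        if beta(mid) <= t:
            left = mid
        else:
            right = mid
    return left

def example_beta(x):
    """Non-decreasing continuous function which is constant on [1, 2]."""
    return min(x, 1.0 + max(0.0, x - 2.0))

if __name__ == "__main__":
    for t in (-0.5, 0.0, 0.5, 1.0, 1.5):
        x = nearest_point_alpha(example_beta, 0.0, -1.0, 3.0, t)
        print(t, x, example_beta(x))
```

For the example function, the computed points jump across the constant stretch [1, 2]: values of t just above 1 are first reached just beyond x = 2. The composite of the example function with the computed points returns t up to the tolerance.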


36.4.10 THEOREM: Properties of the nearest points with a given value of a non-decreasing function.
Let I be a real-number interval. Let p ∈ I. Let β : I → ℝ be a non-decreasing function such that β(p) = 0.
Let J = Range(β). Define α : J → ℝ by

\forall t \in J, \quad \alpha(t) = \begin{cases} \inf\{x \in I;\ x \ge p \text{ and } \beta(x) \ge t\} & \text{for } t \ge 0 \\ \sup\{x \in I;\ x \le p \text{ and } \beta(x) \le t\} & \text{for } t \le 0. \end{cases}

(i) α : J → I is a well-defined function.
(ii) ∀t ∈ J ∩ ℝ₀⁺, α(t) ≥ p, and ∀t ∈ J ∩ ℝ₀⁻, α(t) ≤ p.
(iii) α is a non-decreasing function.
(iv) ∀t ∈ J ∩ ℝ₀⁺, β(α(t)) ≤ t.
(v) ∀t ∈ J ∩ ℝ₀⁻, β(α(t)) ≥ t.


Let β be continuous also. Let G = {p ∈ Int(I); ∃Ω ∈ Top_p(Int(I)), ∀x ∈ Ω, β(x) = β(p)}. Then:


(vi) J is a real-number interval.
(vii) β ∘ α = id_J.
(viii) α is a strictly increasing function.
(ix) ∀Ω ∈ Top(I), (G ⊆ Ω ⇒ β(Ω) ∈ Top(J)).

PROOF: Let S_t^+ = {x ∈ I; x ≥ p and β(x) ≥ t} for t ∈ J ∩ ℝ₀⁺, and let S_t^− = {x ∈ I; x ≤ p and β(x) ≤ t}
for t ∈ J ∩ ℝ₀⁻. Then S_t^+ ⊆ [p, ∞) for all t ∈ J ∩ ℝ₀⁺, and S_t^− ⊆ (−∞, p] for all t ∈ J ∩ ℝ₀⁻. Note also that
S_t^+ is a real-number interval for all t ∈ J ∩ ℝ₀⁺ since S_t^+ = {x ∈ I; β(x) ≥ t} ∩ [p, ∞) is an intersection of
two intervals because I is an interval and β(x₁) ≥ t ⇒ β(x₂) ≥ t for any x₁, x₂ ∈ I with x₁ ≤ x₂. Similarly,
S_t^− is a real-number interval for all t ∈ J ∩ ℝ₀⁻.

For part (i), p ∈ S_0^+ ⊆ [p, ∞). So inf S_0^+ = p. Similarly p ∈ S_0^− ⊆ (−∞, p]. So sup S_0^− = p. Therefore α(0)
is well defined and α(0) = p.

Let t ∈ J ∩ ℝ₀⁺. Then β(x) = t for some x ∈ I with x ≥ p because β is non-decreasing. So x ∈ S_t^+ ⊆ [p, ∞)
for some x ∈ I with x ≥ p and β(x) = t. Therefore inf S_t^+ is well defined and p ≤ α(t) ≤ x for all x ∈ I with
x ≥ p and β(x) = t, and α(t) ∈ I because S_t^+ is a real-number interval which is bounded below.

Let t ∈ J ∩ ℝ₀⁻. Then β(x) = t for some x ∈ I with x ≤ p because β is non-decreasing. So x ∈ S_t^− ⊆ (−∞, p]
for some x ∈ I with x ≤ p and β(x) = t. Therefore sup S_t^− is well defined and x ≤ α(t) ≤ p for all x ∈ I
with x ≤ p and β(x) = t, and α(t) ∈ I because S_t^− is a real-number interval which is bounded above. Thus
α : J → I is a well-defined function.

Part (ii) follows from the proof of part (i). 

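For part (iii), let $t_1, t_2 \in J \cap \mathbb{R}_0^+$ with $t_1 \le t_2$. Then $S_{t_1}^+ \supseteq S_{t_2}^+$ because $\beta$ is non-decreasing. So $\alpha(t_1) =
\inf S_{t_1}^+ \le \inf S_{t_2}^+ = \alpha(t_2)$. Thus $\alpha$ is non-decreasing on $J \cap \mathbb{R}_0^+$. Similarly, $\alpha$ is non-decreasing on $J \cap \mathbb{R}_0^-$.
But $\alpha(t_1) \le p \le \alpha(t_2)$ if $t_1 \le 0 \le t_2$ by part (ii). So $\alpha$ is non-decreasing on $J$.

For part (iv), let $t \in J \cap \mathbb{R}_0^+$. Then $\alpha(t) \in I$ and $p \le \alpha(t) \le x$ for all $x \in I$ with $x \ge p$ and $\beta(x) = t$, as
observed in the proof of part (i). So $\beta(\alpha(t)) \le \beta(x)$ for all $x \in I$ with $x \ge p$ and $\beta(x) = t$ because $\beta$ is
non-decreasing. So $\beta(\alpha(t)) \le t$ because $t \in \beta(I \cap [p, \infty))$. Consequently $\forall t \in J \cap \mathbb{R}_0^+,\ \beta(\alpha(t)) \le t$.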

Part (v) may be proved as for part (iv). 

Part (vi) follows from Theorem 34.9.5. 

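For part (vii), let $t \in J \cap \mathbb{R}_0^+$. Let $x = \alpha(t)$. Then $x \in I$ and $x \ge p$ and $\forall \varepsilon > 0,\ S_t^+ \cap [\alpha(t), \alpha(t) + \varepsilon) \ne \emptyset$.
So $\forall \varepsilon > 0,\ \exists x \in I \cap [p, \infty),\ (x < \alpha(t) + \varepsilon \text{ and } \beta(x) \ge t)$. Therefore $\forall \varepsilon > 0,\ \beta(\alpha(t) + \varepsilon) \ge t$ because $\beta$
is non-decreasing. So $\beta(\alpha(t)) \ge t$ because $\beta$ is continuous. Therefore $\beta(\alpha(t)) = t$ by part (iv). Similarly,
$\beta(\alpha(t)) = t$ for all $t \in J \cap \mathbb{R}_0^-$ by part (v). Hence $\beta \circ \alpha = \mathrm{id}_J$.

For part (viii), suppose that $\alpha(t_1) = \alpha(t_2)$ for some $t_1, t_2 \in J$. Then $t_1 = \beta(\alpha(t_1)) = \beta(\alpha(t_2)) = t_2$ by
part (vii). So $\alpha$ is injective on $J$. Hence $\alpha$ is an increasing function on $J$ by part (iii).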


For part (ix), let $\Omega \in \mathrm{Top}(I)$ satisfy $G \subseteq \Omega$. Let $K = \beta(G)$ and $S = \beta(\Omega)$. Then $K \subseteq S \subseteq J$. Let $t \in S \setminus K$.
Then $t = \beta(x)$ for some $x \in \Omega \setminus G$. But $\Omega \setminus G \in \mathrm{Top}(I)$. So $x \in (x_1, x_2) \subseteq \Omega \setminus G$ for some $x_1, x_2 \in \mathbb{R}$.
Therefore $\beta(x_1) < t < \beta(x_2)$. (Otherwise $(x_1, x)$ or $(x, x_2)$ would be a constant stretch for $\beta$, which would
imply that these are subsets of $G$.) Thus $t \in \mathrm{Int}(S \setminus K) \subseteq \mathrm{Int}(S)$.

Now let $t \in K$. Then $t = \beta(x)$ for some $x \in G$. Let $x_1 = \inf\{z \in I;\ \beta(z) = t\}$ and $x_2 = \sup\{z \in I;\ \beta(z) = t\}$.
Then $-\infty \le x_1 \le \alpha(t) \le x_2 \le \infty$ because $\beta(\alpha(t)) = t$, and $\beta(x) = t$ for all $x \in (x_1, x_2)$. Suppose that
$x_2 < \sup(I)$. Then $(x_2, \sup(I))$ is a non-empty open interval and $\beta(x_2) < \beta(x)$ for all $x \in (x_2, \sup(I))$.
But $[x_2, x_2') \subseteq \Omega$ for some $x_2' > x_2$ because $x_2 \in \Omega$. Then $[\beta(x_2), \beta(x_2')) \subseteq \beta(\Omega)$. Similarly, if $\inf(I) < x_1$
then $(\inf(I), x_1)$ is a non-empty open interval and $\beta(x) < \beta(x_1)$ for all $x \in (\inf(I), x_1)$. Then $x_1 \in \Omega$ and
so $(x_1', x_1] \subseteq \Omega$ for some $x_1' < x_1$, which implies that $(\beta(x_1'), \beta(x_1)] \subseteq \beta(\Omega)$. Consequently $t \in \mathrm{Int}(S)$ if
$\inf(I) < x_1$ and $x_2 < \sup(I)$.

Now suppose that $\inf(I) < x_1$ and $x_2 = \sup(I)$. Then $t = \beta(x_2) = \sup(J)$ because $\beta$ is non-decreasing and
continuous. So $(\beta(x), \sup(J)] \subseteq S$ for some $x < x_1$ and $\beta(x) < \beta(x_1) = t = \sup(J)$. This is an open
neighbourhood of $t$ in the relative topology on $J$. So $t \in \mathrm{Int}(S)$. Similarly, $t \in \mathrm{Int}(S)$ if $\inf(I) = x_1$ and
$x_2 < \sup(I)$. If $\inf(I) = x_1$ and $x_2 = \sup(I)$, then $S = \{t\}$, which implies that $t \in \mathrm{Int}(S)$ in the relative
topology on $J$. Thus $t \in \mathrm{Int}(S)$ in all cases. So $S \in \mathrm{Top}(J)$ by Theorem 31.8.14 (ii).


36.4.11 THEOREM: Squeezing constant stretches out of a curve to make it never-constant. 
For any curve $\gamma_1 : I \to M$ in a $T_1$ topological space $M$, there exists a never-constant curve $\gamma_2 : J \to M$ and
a non-decreasing continuous surjection $\beta : I \to J$ such that $\gamma_1 = \gamma_2 \circ \beta$.


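PROOF: Let $\gamma_1 : I \to M$ be a curve in a topological space $M$. If $I = \emptyset$, let $J = \emptyset$ and $\gamma_2 = \emptyset$ and $\beta = \emptyset$.
Then $\beta : I \to J$ is a non-decreasing continuous surjection and $\gamma_1 = \gamma_2 \circ \beta$. If $\#(I) = 1$, let $\gamma_2 = \gamma_1$, $J = I$,
and $\beta = \mathrm{id}_I$. Then $\beta : I \to J$ is a non-decreasing continuous surjection and $\gamma_1 = \gamma_2 \circ \beta$.

Let $\#(I) > 1$. Then $\mathrm{Int}(I) \ne \emptyset$. Let $I^\circ = \mathrm{Int}(I)$. Let $G = \{p \in I^\circ;\ \exists \Omega \in \mathrm{Top}_p(I^\circ),\ \forall x \in \Omega,\ \gamma_1(x) = \gamma_1(p)\}$.
Then $G \in \mathrm{Top}(I^\circ)$ by Theorem 36.4.8. So $G \in \mathrm{Top}(\mathbb{R})$. Let $p \in I^\circ$. Define $\beta : I \to \mathbb{R}$ by

\[
\forall x \in I, \quad
\beta(x) = \begin{cases}
x - p - \mu(G \cap (p, x)) & \text{for } x \ge p \\
x - p + \mu(G \cap (x, p)) & \text{for } x \le p,
\end{cases}
\tag{36.4.1}
\]

where $\mu(\Omega)$ denotes the measure of sets $\Omega \in \mathrm{Top}(\mathbb{R})$ as in Definition 32.7.10. Then $\beta$ is well defined because
$\mu(G \cap (p, x)) \in \mathbb{R}$ for $x \ge p$ and $\mu(G \cap (x, p)) \in \mathbb{R}$ for $x \le p$, and $\mu(\emptyset) = 0$ by Theorem 32.7.11 (i). Also,
$\mu(G \cap (p, x_2)) - \mu(G \cap (p, x_1)) = \mu(G \cap (x_1, x_2)) \in [0, x_2 - x_1]$ for $p \le x_1 \le x_2$ by Theorem 32.7.11 (iii, iv, v).
Similarly, $\mu(G \cap (x_1, p)) - \mu(G \cap (x_2, p)) = \mu(G \cap (x_1, x_2)) \in [0, x_2 - x_1]$ for $x_1 \le x_2 \le p$. So $\beta$ is a
non-decreasing and continuous function on $I$.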


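Let $J = \mathrm{Range}(\beta)$. Then $\beta : I \to J$ is a non-decreasing continuous surjection. Define $\alpha : J \to I$ by

\[
\forall t \in J, \quad
\alpha(t) = \begin{cases}
\inf\{x \in I;\ x \ge p \text{ and } \beta(x) \ge t\} & \text{for } t \ge 0 \\
\sup\{x \in I;\ x \le p \text{ and } \beta(x) \le t\} & \text{for } t \le 0.
\end{cases}
\tag{36.4.2}
\]

Then $\alpha : J \to I$ is a well-defined function and $\beta \circ \alpha = \mathrm{id}_J$ by Theorem 36.4.10 (i, vii). Let $\gamma_2 = \gamma_1 \circ \alpha$. Then
$\gamma_2 : J \to M$ is a well-defined function. Therefore $\gamma_2 \circ \beta = \gamma_1 \circ \alpha \circ \beta : I \to M$ is a well-defined function.
Let $x \in I \cap [p, \infty)$. Let $x' = \alpha(\beta(x))$. Then $\beta(x') = \beta(\alpha(\beta(x))) = \beta(x)$ because $\beta \circ \alpha = \mathrm{id}_J$. If $x' > x$,
then $x' - x = \mu(G \cap (p, x')) - \mu(G \cap (p, x)) = \mu(G \cap (x, x'))$. So $(x, x') \subseteq G$. Therefore $\gamma_1(x') = \gamma_1(x)$, and
similarly if $x' < x$. Thus $\gamma_2(\beta(x)) = \gamma_1(\alpha(\beta(x))) = \gamma_1(x)$. Similarly, $\gamma_2(\beta(x)) = \gamma_1(x)$ if $x \in I \cap (-\infty, p]$.
Consequently $\gamma_1 = \gamma_2 \circ \beta$. (See Figure 36.4.2.)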


Figure 36.4.2 Reparametrisation of 7, to remove constant stretches 


To show that $\gamma_2 : J \to M$ is continuous, let $t \in J$, $q = \gamma_2(t) = \gamma_1(\alpha(t))$, $\Omega \in \mathrm{Top}_q(M)$, $\Omega' = \gamma_1^{-1}(\Omega)$
and $\Omega'' = \gamma_2^{-1}(\Omega)$. Then $\gamma_1^{-1}(\Omega) = \beta^{-1}(\gamma_2^{-1}(\Omega))$, and so $\beta(\gamma_1^{-1}(\Omega)) = \beta(\beta^{-1}(\gamma_2^{-1}(\Omega))) = \gamma_2^{-1}(\Omega)$ because
$\mathrm{Range}(\beta) = J$. Therefore $\Omega'' = \beta(\Omega')$.

Let $K_t = \beta^{-1}(\{t\})$. Then $K_t$ is a real-number interval which is a (relatively) closed subset of $I$ because $\beta$ is
continuous and non-decreasing.

If $K_t$ is a singleton $\{p\}$, then $p = \alpha(t)$ and $\forall x_2 \in I,\ (x_2 > p \Rightarrow \beta(x_2) > t)$ because otherwise $K_t$ would not be
a singleton. If $p = \sup(I)$, then $t = \beta(p) = \sup(J)$. (This includes the case $\sup(I) = \infty$.) If $p < \sup(I)$, then
$[p, x_2) \subseteq \Omega'$ for some $x_2 > p$, and then $[t, \beta(x_2)) \subseteq \beta(\Omega') = \Omega''$. Thus $t$ is in the interior of $\Omega''$ “on the right”
in both cases. Similarly, $\forall x_1 \in I,\ (x_1 < p \Rightarrow \beta(x_1) < t)$. If $p = \inf(I)$, then $t = \beta(p) = \inf(J)$. (This includes
the case $\inf(I) = -\infty$.) If $\inf(I) < p$, then $(x_1, p] \subseteq \Omega'$ for some $x_1 < p$, and then $(\beta(x_1), t] \subseteq \beta(\Omega') = \Omega''$.
Thus $t$ is in the interior of $\Omega''$ “on the left” in both cases. Consequently in all combinations of cases, $t$ is in
the interior of $\Omega''$ with respect to the relative topology on $J$.


Now suppose that $K_t$ is not a singleton. Then $\mathrm{Int}(K_t) \ne \emptyset$. Since $\beta$ is constant on $K_t$, it follows from
line (36.4.1) and the definition of $G$ that $\gamma_1(x_1) = \gamma_1(x_2)$ for all $x_1, x_2 \in \mathrm{Int}(K_t)$. Let $q' = \gamma_1(x)$ for some $x \in
\mathrm{Int}(K_t)$. Then $\{q'\}$ is a closed subset of $M$ by Theorem 33.1.12 (i) because $M$ is a $T_1$ space. So $\gamma_1^{-1}(\{q'\})$
is a (relatively) closed subset of $I$ by Theorem 31.12.6. But $\mathrm{Int}(K_t) \subseteq \gamma_1^{-1}(\{q'\})$. So $\overline{\mathrm{Int}(K_t)} \subseteq \gamma_1^{-1}(\{q'\})$.
Therefore $\gamma_1(x) = q'$ for all $x \in K_t$. So $q' = q$. In other words, $\gamma_1(x) = q$ for all $x \in K_t$. (This can
alternatively be proved using Theorem 35.3.22, which also requires a $T_1$ space.)

Let $p_1 = \inf(K_t)$ and $p_2 = \sup(K_t)$, which gives $-\infty \le p_1 < p_2 \le \infty$. Then $\forall x_2 \in I,\ (x_2 > p_2 \Rightarrow \beta(x_2) > t)$
because otherwise $K_t$ would include $[p_2, x_2)$ for some $x_2 > p_2$, which would contradict the definition of $K_t$.
If $p_2 = \sup(I)$, then $t = \sup(J)$. (This includes the case $\sup(I) = \infty$.) If $p_2 < \sup(I)$, then $[p_2, x_2) \subseteq \Omega'$
for some $x_2 > p_2$, and then $[t, \beta(x_2)) \subseteq \beta(\Omega') = \Omega''$. Thus $t$ is in the interior of $\Omega''$ “on the right” in both
cases. Similarly, $\forall x_1 \in I,\ (x_1 < p_1 \Rightarrow \beta(x_1) < t)$. If $p_1 = \inf(I)$, then $t = \beta(p_1) = \inf(J)$. (This includes the
case $\inf(I) = -\infty$.) If $\inf(I) < p_1$, then $(x_1, p_1] \subseteq \Omega'$ for some $x_1 < p_1$, and then $(\beta(x_1), t] \subseteq \beta(\Omega') = \Omega''$.
Thus $t$ is in the interior of $\Omega''$ “on the left” in both cases. Consequently in all combinations of cases, $t$ is in
the interior of $\Omega''$ with respect to the relative topology on $J$. Thus all elements of $\Omega''$ are interior points (in
the relative topology on $J$), and so $\Omega'' \in \mathrm{Top}(J)$. Therefore $\gamma_2 : J \to M$ is continuous by Definition 31.12.4.


To show that $\gamma_2$ is never-constant, suppose that $\gamma_2(t) = q$ for all $t \in (t_1, t_2)$ for $t_1, t_2 \in J$ with $t_1 < t_2$. By
Theorem 36.4.10 (viii), $\alpha(t_1) < \alpha(t_2)$. Let $x \in (\alpha(t_1), \alpha(t_2))$. Then $t_1 = \beta(\alpha(t_1)) \le \beta(x) \le \beta(\alpha(t_2)) = t_2$ by
Theorem 36.4.10 (vii). Thus $\beta(x) \in [t_1, t_2]$. So $\gamma_1(x) = \gamma_2(\beta(x)) = q$ because $\gamma_2$ is continuous. Therefore $\gamma_1$
is constant on the non-empty open interval $(\alpha(t_1), \alpha(t_2))$. So $(\alpha(t_1), \alpha(t_2)) \subseteq G$, and so $\beta(\alpha(t_1)) = \beta(\alpha(t_2))$
by line (36.4.1) and the definition of $G$. Therefore $t_1 = t_2$, which contradicts the assumption. Hence
$\gamma_2 : J \to M$ is a never-constant continuous curve such that $\gamma_1 = \gamma_2 \circ \beta$, where $\beta : I \to J$ is a non-decreasing
continuous surjection.


36.4.12 EXAMPLE: Squeezing infinitely many constant stretches out of a continuous curve. 
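Let $M = \mathbb{R}$. Let $I = \mathbb{R}$. Define $\gamma_1 : I \to M$ by $\gamma_1(x) = 0$ for $x \le 0$, $\gamma_1(x) = x - 1$ for $x \ge 2$, and

\[
\forall n \in \mathbb{Z}_0^+,\ \forall x \in [2 - 2^{1-n},\, 2 - 2^{-n}], \quad
\gamma_1(x) = \min(x - 1 + 2^{-n},\ 1 - 2^{-n-1}).
\]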


(See Figure 36.4.3.) 




Figure 36.4.3 Graph of continuous curve with infinitely many constant stretches 


The point on this curve is stationary for $x < 0$, then moves with velocity 1 for half a unit of time and then
is stationary for half a unit of time. The point then does the same thing for half the duration recursively.
After time 2, the velocity equals 1. Then $G = (-\infty, 0) \cup \bigcup_{n=0}^{\infty} (2 - 3 \cdot 2^{-n-1},\ 2 - 2^{-n})$ and $\beta = \gamma_1$ in the proof
of Theorem 36.4.11. The equality of the functions $\gamma_1$ and $\beta$ is a peculiarity caused by the fact that $\gamma_1$
has velocity equal to 1 whenever the curve is not constant. If $p$ is chosen equal to 0, then $J = [0, \infty)$ and
$\alpha(0) = 0$, and $\alpha(t) = \inf\{x \in \mathbb{R}_0^+;\ \gamma_1(x) \ge t\}$ for all $t \in \mathbb{R}^+$. Thus $\alpha(t)$ is always the earliest time $x$ that
the point reaches location $t$. So $\alpha(t)$ never takes on values $x$ in the semi-open intervals $(2 - 3 \cdot 2^{-n-1},\ 2 - 2^{-n}]$
for $n \in \mathbb{Z}_0^+$. The curve $\gamma_2 : \mathbb{R}_0^+ \to \mathbb{R}$ without the constant stretches satisfies $\gamma_2(t) = t$ for all $t \in \mathbb{R}_0^+$.

Let Q = (0,1) € Top(M). Then 0’ = y;71(Q) = (0,2), but GAM Z €'. (Therefore the approach in 
Theorem 36.4.10 (ix) cannot be applied to Q’ to obtain 8(Q) C J.) 
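
The claim that β coincides with the curve itself in this example can be checked numerically. The sketch below assumes the reading γ1(x) = min(x − 1 + 2^(−n), 1 − 2^(−n−1)) on [2 − 2^(1−n), 2 − 2^(−n)] for the curve, and the reading of G as the union of (−∞, 0) with the intervals (2 − 3·2^(−n−1), 2 − 2^(−n)); both readings are assumptions. The choice p = 0, the truncation of the infinite union at `n_max` terms and the function names are further assumptions made only for illustration.

```python
def gamma1(x, n_max=60):
    """Curve of the example: alternately moving with velocity 1 and pausing."""
    if x <= 0.0:
        return 0.0
    if x >= 2.0:
        return x - 1.0
    for n in range(n_max):
        lo, hi = 2.0 - 2.0 ** (1 - n), 2.0 - 2.0 ** (-n)
        if lo <= x <= hi:
            return min(x - 1.0 + 2.0 ** (-n), 1.0 - 2.0 ** (-n - 1))
    return 1.0  # only reached for x within 2**(-n_max) of 2


def measure_G(x, n_max=60):
    """Length of G intersected with (0, x) for x >= 0, truncating the union at n_max terms."""
    total = 0.0
    for n in range(n_max):
        a, b = 2.0 - 3.0 * 2.0 ** (-n - 1), 2.0 - 2.0 ** (-n)
        total += max(0.0, min(b, x) - a)
    return total


def beta(x):
    """beta(x) = x - p - mu(G ∩ (p, x)) with p = 0, for x >= 0."""
    return x - measure_G(x)


sample_points = [0.25, 0.5, 0.75, 1.0, 1.1, 1.3, 1.75, 2.0, 3.0]
max_gap = max(abs(beta(x) - gamma1(x)) for x in sample_points)
```

Under these assumptions `max_gap` is zero up to rounding error, which agrees with the observation that the time spent outside constant stretches equals the distance travelled when the velocity is 1.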


36.5. Path-equivalence relations for curves 


36.5.1 REMARK: Various kinds of equivalence relations for curves. 

Some texts define any two curves in a topological space M to have the same path if they have the same 
image. Such a definition discards information about the direction of the curve and detailed properties of 
the traversal, for example in the case of self-intersections or retracing of the image set. On the other hand, 
if two curves are defined to be path-equivalent when they are related by a homeomorphism between the 
parameter intervals, not enough information is discarded. This is because a curve which is constant on some 
sub-interval actually traces the same path as if there had been no such constant sub-interval. For example, 
if you travel by train from Paris to Warsaw, your path is the same whether or not your train stops in Berlin 
for 5 minutes. Parameter-interval homeomorphisms are unable to remove such pauses in journeys. Therefore 
in Section 36.5 a more precise concept of path equivalence is defined. Intervals where a curve is constant 
(called “constant stretches”) are ignored when comparing curves. In particular, this implies that a constant
curve has the same path as a curve with a one-point parameter interval, which is as one would expect. 


Information about the direction of a curve is not discarded because this information is needed in most 
applications in differential geometry. Unoriented paths are useful for defining pathwise connectedness in 
general topology, but oriented paths can do that job too. So the default definition for a path is oriented. 
Alternative terms for “oriented” would be “directed” or “ordered”. 


36.5.2 REMARK:  Reparametrisation of curves while inserting constant stretches. 

The reparametrisation functions $\beta_1$ and $\beta_2$ in Definition 36.5.3 modify the corresponding curves $\gamma_1$ and $\gamma_2$
so that they have the same parameter interval $I$. They also insert constant stretches into the curves so that
they match correctly. Thus if $p \in M$ is a point such that $\gamma_1(t) = p$ for $t$ in some positive-length interval
but $\gamma_2$ does not have such a constant stretch, then $\beta_2$ must insert a constant stretch with value $p$ into the
curve $\gamma_2$. In other words, the maps $\beta_1$ and $\beta_2$ do not remove constant stretches; they insert constant
stretches in each curve to match the other curve. (See Figure 36.5.1.)


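[Diagram: $\beta_1 : I \to \mathrm{Dom}(\gamma_1)$ and $\beta_2 : I \to \mathrm{Dom}(\gamma_2)$, with $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2 : I \to M$.]
Figure 36.5.1 Equivalence of curves $\gamma_1$ and $\gamma_2$ via reparametrisations $\beta_1$ and $\beta_2$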


Two curves which differ only by their parametrisation would be expected to transport fibres in the same way 
between two endpoints. Thus path-equivalence of curves has some significance for the theory of parallelism 
for fibre bundles, namely that path-equivalence implies parallelism-equivalence. (See also Example 36.5.11.) 


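36.5.3 DEFINITION: Curves $\gamma_1$ and $\gamma_2$ in a topological space $M$ are path-equivalent if there are
non-decreasing continuous surjections $\beta_1 : I \to \mathrm{Dom}(\gamma_1)$ and $\beta_2 : I \to \mathrm{Dom}(\gamma_2)$ for some interval $I \subseteq \mathbb{R}$
such that $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2$.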
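
As a small illustration of this definition, the following Python sketch checks path-equivalence on a sample grid for a curve with a pause and the same curve without the pause. The curves, the interval I = [0, 2], the choice of maps and all function names are hypothetical examples, and the grid check is only a plausibility test, not a proof.

```python
def gamma1(s):
    """Curve on [0, 1] without a pause."""
    return s


def gamma2(t):
    """Curve on [0, 2] tracing the same path, but pausing at 0.5 for t in [0.5, 1.5]."""
    if t <= 0.5:
        return t
    if t <= 1.5:
        return 0.5
    return t - 1.0


def beta1(x):
    """Map from I = [0, 2] onto Dom(gamma1) = [0, 1]; it inserts the pause into gamma1."""
    return gamma2(x)


def beta2(x):
    """Map from I = [0, 2] onto Dom(gamma2) = [0, 2]; here simply the identity."""
    return x


def looks_like_nondecreasing_surjection(beta, grid, target):
    """Grid check: values are non-decreasing and run from target[0] to target[1]."""
    values = [beta(x) for x in grid]
    monotone = all(u <= v for u, v in zip(values, values[1:]))
    return monotone and values[0] == target[0] and values[-1] == target[1]


grid = [k / 8 for k in range(17)]  # sample points of I = [0, 2]
same_composites = all(gamma1(beta1(x)) == gamma2(beta2(x)) for x in grid)
```

Here the pause of the second curve is matched by a constant stretch of `beta1`, which is how constant stretches are inserted rather than removed.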
36.5.4 REMARK: Non-decreasing surjections between intervals are continuous.

A continuous function maps intervals to intervals by Theorem 34.9.5. A partial converse states that a
non-decreasing surjection between two real-number intervals must be continuous. (See Theorem 34.9.23.) So it
is logically superfluous to require continuity for non-decreasing surjections between real intervals.


36.5.5 REMARK: Path-equivalence of two curves does not imply that their domains are homeomorphic. 
An immediate consequence of Theorem 36.5.6 is that path-equivalence of two curves does not imply that 
their domains are homeomorphic intervals. 


36.5.6 THEOREM: Path-equivalence of non-empty constant curves with the same range. 
Any two non-empty constant curves with the same range are path-equivalent. 


PROOF: Let $\gamma_1 : I_1 \to M$ and $\gamma_2 : I_2 \to M$ be curves with $\mathrm{Range}(\gamma_1) = \mathrm{Range}(\gamma_2) = \{p\}$ for some $p$ in a
topological space $M$. Then $I_1$ and $I_2$ are non-empty real-number intervals. So by Theorem 34.9.18, there are
non-decreasing continuous surjections $\beta_1 : I \to I_1$ and $\beta_2 : I \to I_2$, where $I = \mathbb{R}$. Then $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2$.
So $\gamma_1$ and $\gamma_2$ are path-equivalent by Definition 36.5.3.


36.5.7 REMARK:  Path-equivalent curves can be “normalised” to a common open-interval domain. 

As illustrated in Figure 34.9.2 in Remark 34.9.19, non-empty open intervals are a kind of “lowest common
denominator” for five classes of non-empty real-number intervals. This has the technical convenience that
path-equivalence (for non-empty curves) can always be expressed in terms of a common open domain for the
reparametrised curves, as shown in Theorem 36.5.8. Although this assists the proof of Theorem 36.5.9 (i), it
is not desirable to constrain path-equivalent curves to have open domains. In the theory of parallelism, one
usually possesses a fibre at a particular point of a curve and wishes to transport it along the curve. This
suggests that the most useful kind of curve for parallel transport is “left-compact” with a form such as $[a, b)$
or $[a, \infty)$ for some $a, b \in \mathbb{R}$ with $a < b$. Compact and semi-compact intervals are converted to open intervals
in Theorem 34.9.18 by adding initial or terminal open constant stretches. (The proof of Theorem 36.5.8 is
illustrated in Figure 36.5.2.)


Figure 36.5.2 “Normalisation” of path-equivalence maps to use an open interval 


36.5.8 THEOREM:  Path-equivalence maps for path-equivalent curves via a given interval. 

Let $\gamma_k : I_k \to M$ be non-empty path-equivalent curves in a topological space $M$ for $k = 1, 2$. Then for any
non-empty open real-number interval $I$, there are non-decreasing continuous surjections $\beta_k : I \to I_k$ such
that $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2$. In other words, $\gamma_1$ and $\gamma_2$ are path-equivalent via any non-empty open interval.


PROOF: Let $\gamma_k : I_k \to M$ be non-empty path-equivalent curves in a topological space $M$ for $k = 1, 2$. Then
by Definition 36.5.3, there is a non-empty interval $\bar{I}$ and non-decreasing continuous surjections $\bar\beta_k : \bar{I} \to I_k$
with $\gamma_1 \circ \bar\beta_1 = \gamma_2 \circ \bar\beta_2$. Let $I$ be a non-empty open interval. Then by Theorems 34.9.18 and 34.9.20 (i),
there is a non-decreasing continuous surjection $\beta_0 : I \to \bar{I}$. Let $\beta_k = \bar\beta_k \circ \beta_0$ for $k = 1, 2$. Then $\beta_k : I \to I_k$
is a non-decreasing continuous surjection for $k = 1, 2$ and $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2 : I \to M$.


36.5.9 THEOREM:  Path-equivalence is an equivalence relation. 
(i) Path-equivalence is a transitive relation for continuous curves. 


(ii) Path-equivalence is an equivalence relation for continuous curves. 
PROOF: For part (i), let $\gamma_k : I_k \to M$ be continuous curves in a topological space $M$ for $k = 1, 2, 3$.
Assume that $\gamma_1$ and $\gamma_2$ are path-equivalent and that $\gamma_2$ and $\gamma_3$ are path-equivalent. Then there are
non-decreasing continuous surjections $\beta_{1,1} : J_1 \to I_1$ and $\beta_{1,2} : J_1 \to I_2$ for some real-number interval $J_1$ which
satisfy $\gamma_1 \circ \beta_{1,1} = \gamma_2 \circ \beta_{1,2}$, and there are non-decreasing continuous surjections $\beta_{2,2} : J_2 \to I_2$ and
$\beta_{2,3} : J_2 \to I_3$ for some real-number interval $J_2$ which satisfy $\gamma_2 \circ \beta_{2,2} = \gamma_3 \circ \beta_{2,3}$. (See Figure 36.5.3.)


Figure 36.5.3 Intervals and maps for proof of transitivity of path-equivalence relation 


If one (or more) of $I_1$, $I_2$ or $I_3$ is empty, then all three are empty. So path-equivalence is transitive if one
(or more) of the curves is empty. So it may be assumed that $I_1$, $I_2$ and $I_3$ are all non-empty.


Let $J = J_1 + J_2$ be the Minkowski sum of $J_1$ and $J_2$. Then $J$ is an interval by Theorem 22.11.24 (ii). Define

\begin{align*}
\beta &= \{(x, y) \in J \times I_2;\ \exists x_1 \in J_1,\ \exists x_2 \in J_2,\ (x = x_1 + x_2 \text{ and } \beta_{1,2}(x_1) = \beta_{2,2}(x_2) = y)\} \tag{36.5.1} \\
&= \{(x, y) \in J \times I_2;\ \exists x_1 \in J_1,\ (x - x_1 \in J_2 \text{ and } \beta_{1,2}(x_1) = \beta_{2,2}(x - x_1) = y)\} \tag{36.5.2} \\
&= \{(x, y) \in J \times I_2;\ \exists x_2 \in J_2,\ (x - x_2 \in J_1 \text{ and } \beta_{1,2}(x - x_2) = \beta_{2,2}(x_2) = y)\}. \tag{36.5.3}
\end{align*}

In other words, $\beta = \{(x_1 + x_2, y);\ (x_1, y) \in \beta_{1,2} \text{ and } (x_2, y) \in \beta_{2,2}\}$. (It will be shown that $\beta : J \to I_2$ is a
well-defined non-decreasing continuous surjection.)


Let $x \in J$. Then $x = x_1 + x_2$ for some $x_1 \in J_1$ and $x_2 \in J_2$. Suppose that $\beta_{1,2}(x_1) < \beta_{2,2}(x_2)$. Then there
exists $x_1' \in J_1$ with $x_1 < x_1'$ and $\beta_{1,2}(x_1') = \beta_{2,2}(x_2)$ because $\beta_{1,2} : J_1 \to I_2$ is non-decreasing and surjective.
Similarly there exists $x_2' \in J_2$ with $x_2' < x_2$ and $\beta_{1,2}(x_1) = \beta_{2,2}(x_2')$ because $\beta_{2,2} : J_2 \to I_2$ is non-decreasing
and surjective. Either $x_1' + x_2' \le x$ or $x_1' + x_2' \ge x$ (or both). Suppose that $x_1' + x_2' \le x$. Then $x - x_1' \ge x_2'$,
and so $x - x_1' \in [x_2', x_2]$. So $[x - x_1', x - x_1] \subseteq [x_2', x_2] \subseteq J_2$. Therefore $x - z \in J_2$ for all $z \in [x_1, x_1']$, where
$[x_1, x_1'] \subseteq J_1$. Define $\phi : [x_1, x_1'] \to \mathbb{R}$ by $\phi : z \mapsto \beta_{1,2}(z) - \beta_{2,2}(x - z)$. Then $\phi$ is well defined, continuous and
non-decreasing on $[x_1, x_1']$. But $\phi(x_1) < 0$ and $\phi(x_1') = \beta_{1,2}(x_1') - \beta_{2,2}(x - x_1') = \beta_{2,2}(x_2) - \beta_{2,2}(x - x_1') \ge 0$.
So $\phi(z) = 0$ for some $z \in [x_1, x_1']$ by the intermediate value theorem. (See Theorem 34.9.7.) Let $y = \beta_{1,2}(z)$
for some such $z$. Then $\beta_{1,2}(z) = \beta_{2,2}(x - z) = y$ and $z \in J_1$ and $x - z \in J_2$. So $(x, y) \in \beta$. Similarly,
$(x, y) \in \beta$ for some $y \in I_2$ if $x_1' + x_2' \ge x$. Thus $\forall x \in J,\ \exists y \in I_2,\ (x, y) \in \beta$.
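
The intermediate-value step in this existence argument is essentially a root-finding problem, which can be sketched by bisection. The Python fragment below assumes compact parameter intervals J1 = [0, 1] and J2 = [0, 2] with the hypothetical maps β1,2(z) = z and β2,2(w) = w/2 onto I2 = [0, 1]; these choices and the function name `combined_beta` are illustrative assumptions, not part of the proof.

```python
def combined_beta(x, b12, b22, j1, j2, iterations=60):
    """Approximate beta(x) for x in J1 + J2 by bisection.

    Finds z in J1 with x - z in J2 and b12(z) = b22(x - z), using the fact that
    phi(z) = b12(z) - b22(x - z) is non-decreasing in z, then returns b12(z).
    Assumes compact intervals j1 = (min, max) and j2 = (min, max).
    """
    lo = max(j1[0], x - j2[1])  # smallest admissible z
    hi = min(j1[1], x - j2[0])  # largest admissible z
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if b12(mid) - b22(x - mid) < 0:
            lo = mid  # phi(mid) < 0, so a root lies to the right of mid
        else:
            hi = mid  # phi(mid) >= 0, so a root lies at or to the left of mid
    return b12(hi)


def b12(z):
    """Hypothetical non-decreasing continuous surjection from [0, 1] onto [0, 1]."""
    return z


def b22(w):
    """Hypothetical non-decreasing continuous surjection from [0, 2] onto [0, 1]."""
    return w / 2.0


J1, J2 = (0.0, 1.0), (0.0, 2.0)
value_at_mid = combined_beta(1.5, b12, b22, J1, J2)
```

For these particular maps the matching condition z = (x − z)/2 gives z = x/3, so the combined map is x ↦ x/3 on J = [0, 3], which the bisection reproduces to within rounding error.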


To show the uniqueness of $y$ for any given $x \in J$, let $(x, y), (x, y') \in \beta$. Then $x \in J$ and $y, y' \in I_2$
and $x = x_1 + x_2 = x_1' + x_2'$ for some $x_1, x_1' \in J_1$ and $x_2, x_2' \in J_2$ and $\beta_{1,2}(x_1) = \beta_{2,2}(x_2) = y$ and
$\beta_{1,2}(x_1') = \beta_{2,2}(x_2') = y'$. Suppose that $x_1 \le x_1'$. Then $x_2 \ge x_2'$. Suppose that $y < y'$. Then $y = \beta_{2,2}(x_2) <
\beta_{2,2}(x_2') = y'$, which contradicts the assumption that $\beta_{2,2}$ is non-decreasing. Similarly, if $y > y'$ then
$y = \beta_{1,2}(x_1) > \beta_{1,2}(x_1') = y'$, which contradicts the assumption that $\beta_{1,2}$ is non-decreasing. Therefore $y = y'$.
The same conclusion follows if $x_1 \ge x_1'$. Thus $\beta : J \to I_2$ is a well-defined function.


To show that $\beta : J \to I_2$ is non-decreasing, suppose that $y = \beta(x) > \beta(x') = y'$ for some $x, x' \in J$ with $x < x'$.
Then there are $x_1, x_1' \in J_1$ and $x_2, x_2' \in J_2$ such that $x = x_1 + x_2$ and $x' = x_1' + x_2'$ and $\beta_{1,2}(x_1) = \beta_{2,2}(x_2) = y$
and $\beta_{1,2}(x_1') = \beta_{2,2}(x_2') = y'$. If $x_1 \le x_1'$ then $\beta_{1,2}(x_1) = y > y' = \beta_{1,2}(x_1')$ contradicts the assumption that
$\beta_{1,2}$ is non-decreasing. So $x_1 > x_1'$. So $x_2 = x - x_1 < x - x_1' < x' - x_1' = x_2'$. But $\beta_{2,2}(x_2) = y > y' = \beta_{2,2}(x_2')$,
which then contradicts the assumption that $\beta_{2,2}$ is non-decreasing. Therefore $\beta : J \to I_2$ is non-decreasing.
The function $\beta : J \to I_2$ is a surjection because $\mathrm{Range}(\beta) = \mathrm{Range}(\beta_{1,2}) = \mathrm{Range}(\beta_{2,2}) = I_2$. So $\beta$ is continuous by
Theorem 34.9.23 because it is a non-decreasing surjection between two real intervals. Thus $\beta : J \to I_2$ is a
well-defined non-decreasing continuous surjection.
Let $K_x = \{z \in J_1;\ x - z \in J_2 \text{ and } \beta_{1,2}(z) = \beta_{2,2}(x - z)\}$ for $x \in J$. Let $x \in J$ and $y = \beta(x)$. Then $x - z \in J_2$
and $\beta_{1,2}(z) = \beta_{2,2}(x - z) = y$ for some $z \in J_1$ by line (36.5.2) and the fact that $\beta$ is a well-defined function.
Therefore $K_x$ is non-empty for all $x \in J$. So $K_x$ is a non-empty real-number interval for all $x \in J$ because
$\beta_{1,2}$ and $\beta_{2,2}$ are non-decreasing.


If $K_x$ is unbounded on the left, then $J_1$ must be unbounded on the left and $J_2$ must be unbounded on
the right, and for some $a \in \mathbb{R}$ and $y \in I_2$, $\beta_{1,2}(z) = \beta_{2,2}(x - z) = y$ for all $z \in \mathbb{R}$ with $z \le a$. But the
non-decreasing property of $\beta_{1,2}$ and $\beta_{2,2}$ then implies that $\max(I_2) = y$ and $\min(I_2) = y$, which implies that
$I_2 = \{y\}$, which implies that $\mathrm{Range}(\gamma_1) = \mathrm{Range}(\gamma_2) = \mathrm{Range}(\gamma_3) = \{\gamma_2(y)\}$. In this constant-curve case,
$\gamma_1$ and $\gamma_3$ are path-equivalent by Theorem 36.5.6. Similarly, if $K_x$ is unbounded on the right, $\gamma_1$ and $\gamma_3$ are
path-equivalent because they must be constant curves. Also, any two empty curves are path-equivalent. So
it may be assumed that $\#(I_2) \ge 2$ and $K_x$ is bounded above and below for all $x \in J$.


Define the function $\beta_1 : J \to J_1$ by $\beta_1(x) = \frac{1}{2}(\inf(K_x) + \sup(K_x))$ for all $x \in J$. Then $\beta_1$ is a
well-defined function with values in $J_1$ when $\#(I_2) \ge 2$, and $\beta_1(x) \in K_x$ for all $x \in J$ because $K_x = \{\beta_1(x)\}$ if
$\inf(K_x) = \sup(K_x)$ and $\beta_1(x) \in (\inf(K_x), \sup(K_x)) \subseteq K_x$ if $\#(K_x) > 1$.


Let $x, x' \in J$ with $x \le x'$. Then $K_{x'} = \{z' \in J_1;\ x' - z' \in J_2 \text{ and } \beta_{1,2}(z') = \beta_{2,2}(x' - z')\}$. Suppose
that $z' \in K_{x'}$ satisfies $z' < \inf(K_x)$. Then $\beta_{1,2}(z') < \beta_{2,2}(x - z')$ because otherwise $K_x$ would contain an
element which is less than $\inf(K_x)$. But $\beta_{2,2}(x - z') \le \beta_{2,2}(x' - z')$ because $\beta_{2,2}$ is non-decreasing. So
$\beta_{1,2}(z') < \beta_{2,2}(x' - z')$, which contradicts the definition of $K_{x'}$. Therefore $z' \ge \inf(K_x)$ for all $z' \in K_{x'}$, and
so $\inf(K_x) \le \inf(K_{x'})$. Similarly, $\sup(K_x) \le \sup(K_{x'})$ by considering $z \in K_x$ satisfying $\sup(K_{x'}) < z$, which
also leads to a contradiction. Therefore $\beta_1(x) \le \beta_1(x')$. Thus $\beta_1 : J \to J_1$ is non-decreasing.


To show that $\beta_1 : J \to J_1$ is continuous, let $x, x' \in J$ with $x < x'$. Suppose that $\inf(K_{x'}) > \inf(K_x) + x' - x$.
Let $L_x = x - K_x = \{w \in J_2;\ x - w \in J_1 \text{ and } \beta_{1,2}(x - w) = \beta_{2,2}(w)\}$ for $x \in J$. Then $\sup(L_{x'}) =
x' - \inf(K_{x'}) < x - \inf(K_x) = \sup(L_x)$. Then there is an element $w \in L_x$ such that $\sup(L_{x'}) < w$. So there
is an element $w \in J_2$ such that $x - w \in J_1$ and $\beta_{1,2}(x - w) = \beta_{2,2}(w)$ and $\sup(L_{x'}) < w$. But then $w \in J_2$
and $x' - w \in J_1$ and $\beta_{1,2}(x' - w) \ge \beta_{1,2}(x - w) = \beta_{2,2}(w)$ and $\sup(L_{x'}) < w$ because $\beta_{1,2}$ is non-decreasing.
So since $\beta_{1,2}$ and $\beta_{2,2}$ are non-decreasing and continuous, there is a $w' \in J_2$ with $w \le w'$ and $x' - w' \in J_1$
and $\beta_{1,2}(x' - w') = \beta_{2,2}(w')$ and $\sup(L_{x'}) < w \le w'$. This contradicts the definition of $\sup(L_{x'})$. Therefore
$\inf(K_{x'}) \le \inf(K_x) + x' - x$ for all $x, x' \in J$ with $x < x'$. So $0 \le \inf(K_{x'}) - \inf(K_x) \le x' - x$ for all
$x, x' \in J$ with $x < x'$ because $\inf(K_x)$ is non-decreasing with respect to $x$. Therefore $\inf(K_x)$ is continuous
with respect to $x \in J$. Similarly, $\sup(K_x)$ is continuous with respect to $x \in J$. Therefore $\beta_1 : J \to J_1$
is continuous. (Note that Theorem 34.9.23 cannot be used here to prove continuity from non-decreasing
surjectivity because the continuity property is used to prove surjectivity!)


To show that $\beta_{1,2} \circ \beta_1 : J \to I_2$ is surjective, let $y \in I_2$. Then $\beta_{1,2}(z_1) = \beta_{2,2}(z_2) = y$ for some $z_1 \in J_1$
and $z_2 \in J_2$ because $\beta_{1,2} : J_1 \to I_2$ and $\beta_{2,2} : J_2 \to I_2$ are surjective. Let $x = z_1 + z_2$. Then $z_1 \in K_x$
and $\beta_1(x) = \frac{1}{2}(\inf(K_x) + \sup(K_x)) \in K_x$, and so $\beta_{1,2}(\beta_1(x)) = y$. Therefore $\mathrm{Range}(\beta_{1,2} \circ \beta_1) = I_2$.
Unfortunately it cannot be shown in general that $\mathrm{Range}(\beta_{1,1} \circ \beta_1) = I_1$. (This is because a counterexample
can be constructed where $J_1$ is left-compact and $J_2$ is left-open, while $I_1$ and $I_2$ are both left-compact. Then
$\mathrm{Range}(\beta_1) = \mathrm{Int}(J_1) \subsetneq J_1$ and $\mathrm{Range}(\beta_{1,1} \circ \beta_1) = \mathrm{Int}(I_1) \subsetneq I_1$. See Definition 34.9.16 for left-compact and
left-open intervals.)


For showing that 61 : J > Jı is surjective, it may be assumed that J; and Jz are non-empty bounded open 
intervals by applying Theorem 36.5.8. Let z; € Jı and y = £1,2(z1). Let z2 € J2 satisfy Bo 2(z2) = y, which 
must exist because Range(b1,2) = I2 = Range( 62,2). Let x = z1 + z2. Then z; € K,. So —oo < inf(K,) < 
zı € sup(K,,) < oo. Suppose that z; < B(x) = $(inf(K,) +sup(K,)). If inf(I5) < y, then there is a zj € Jy 
with £1,5(21) < y, and then there is a 25 € Jz with 825(25) = B1,3(21). Let z' = z1 + 24 for such z| and 24. 
Then sup(K,;) < zı and so £i(x') < z because 81,2 and (2,2 are non-decreasing. Therefore there is an 
x” € (x',x) such that £1(x") = z1 by the intermediate value theorem because 1 is continuous. So assume 
that inf(I2) = y. Let z; = inf{z € Ji; f13(z) = y} and 25 = inf(w € Jo; B2.2(w) = y}. Then zj < z1 because 
Jı is an open interval. Let x’ = z1 +24. Let zY = (24 +2) and z7 = z/—zj. Then z/ € Jı and 6) (2) =y 


because 1,2 is non-decreasing. Similarly z4 = z' — $(z, + 21) = $((2’ — z1) + (z' — z1)) = f(a + 22) € J2 


because z2 € J2, and f55(z7) = y because £i, is non-decreasing. So 2Y € Kw. But inf(K,;) = zj and 
sup(K,) = z1 because for all z € Jy, z < z1 if and only if z’ — z > z4. So Bı(x') = zY < z1. Therefore there 
is an z" € (z', z) such that fi1(z") = z by the intermediate value theorem because £1 is continuous. The 
same form of argument shows that z; € Range(0) if z1 > (x). 
Let $\bar\beta_1 = \beta_{1,1} \circ \beta_1$. Then $\bar\beta_1 : J \to I_1$ is a non-decreasing continuous surjection. It may be shown in
the same way that if $L_x = x - K_x = \{w \in J_2;\ x - w \in J_1 \text{ and } \beta_{1,2}(x - w) = \beta_{2,2}(w)\}$ for all $x \in J$, and
$\beta_2 : J \to J_2$ is defined by $\beta_2 : x \mapsto \frac{1}{2}(\inf(L_x) + \sup(L_x))$, then $\bar\beta_3 = \beta_{2,3} \circ \beta_2 : J \to I_3$ is a non-decreasing
continuous surjection. Then $\gamma_1 \circ \bar\beta_1 = \gamma_1 \circ \beta_{1,1} \circ \beta_1 = \gamma_2 \circ \beta_{1,2} \circ \beta_1 = \gamma_2 \circ \beta = \gamma_2 \circ \beta_{2,2} \circ \beta_2 = \gamma_3 \circ
\beta_{2,3} \circ \beta_2 = \gamma_3 \circ \bar\beta_3$. Hence $\gamma_1$ and $\gamma_3$ are path-equivalent.

For part (ii), let $M$ be a topological space. Let $\gamma : I \to M$ be a curve in $M$. Then $\gamma \circ \beta_1 = \gamma \circ \beta_2$ with $\beta_1 =
\beta_2 = \mathrm{id}_I$. So $\gamma$ is path-equivalent to $\gamma$ because $\mathrm{id}_I : I \to I$ is a non-decreasing continuous surjection. This
proves reflexivity for Definition 9.8.2. The symmetry property is obvious from Definition 36.5.3. Transitivity
follows from part (i). So path-equivalence is an equivalence relation by Definition 9.8.2.


36.5.10 EXAMPLE: Curves with the same range and start and end points may not be path-equivalent. 
The curves $\gamma_1 : [0, \pi] \to \mathbb{R}^2$ with $\gamma_1 : t \mapsto (\cos t, 0)$ and $\gamma_2 : [0, \pi] \to \mathbb{R}^2$ with $\gamma_2 : t \mapsto (\cos 3t, 0)$ have the
same image set $[-1, 1] \times \{0\}$ and the same start and finish points. (See Figure 36.1.2 in Remark 36.1.5.) But
$\gamma_1$ and $\gamma_2$ are not path-equivalent according to Definition 36.5.3. (This follows from Theorem 36.5.18.)
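
A rough numerical plausibility check of this example is sketched below in Python. It takes the curves to be t ↦ (cos t, 0) and t ↦ (cos 3t, 0) on [0, π] and counts how often the first coordinate of each curve reverses direction on a sample grid. The check rests on the assumption that composing a real-valued curve with a non-decreasing continuous surjection cannot change the number of direction reversals, so that differing counts are consistent with the curves not being path-equivalent. This is only a heuristic and does not replace the argument via Theorem 36.5.18; the function name, sample size and tolerance are illustrative assumptions.

```python
import math


def count_reversals(f, a, b, samples=1000, tol=1e-12):
    """Count sign changes of the increments of f on a uniform grid over [a, b].

    Increments smaller than tol in absolute value are ignored, so constant
    stretches do not contribute to the count.
    """
    values = [f(a + (b - a) * k / samples) for k in range(samples + 1)]
    signs = []
    for u, v in zip(values, values[1:]):
        d = v - u
        if abs(d) > tol:
            s = 1 if d > 0 else -1
            if not signs or signs[-1] != s:
                signs.append(s)  # record only changes of direction
    return max(len(signs) - 1, 0)


reversals_1 = count_reversals(math.cos, 0.0, math.pi)
reversals_2 = count_reversals(lambda t: math.cos(3.0 * t), 0.0, math.pi)
same_endpoints = (
    abs(math.cos(0.0) - math.cos(3.0 * 0.0)) < 1e-12
    and abs(math.cos(math.pi) - math.cos(3.0 * math.pi)) < 1e-12
)
```

The first curve never reverses, whereas the second reverses twice, even though both start at (1, 0) and finish at (−1, 0).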


36.5.11 EXAMPLE: Curves with the same range and start and end points may not be parallelism-equivalent. 
The curves $\gamma_1 : [0, 2\pi] \to \mathbb{R}^2$ with $\gamma_1 : t \mapsto (\cos t, \sin t)$ and $\gamma_2 : [0, 2\pi] \to \mathbb{R}^2$ with $\gamma_2 : t \mapsto (\cos 2t, \sin 2t)$
have the same image set $\{x \in \mathbb{R}^2;\ |x| = 1\}$ and the same start and finish points. But $\gamma_1$ and $\gamma_2$ are not
path-equivalent according to Definition 36.5.3. Moreover, any fibre which is transported by $\gamma_2$ would be
expected to experience twice the transport effect of $\gamma_1$. This contrasts with Example 36.5.10, where the path
is traversed twice in one direction and once in the opposite direction, which should have the same effect as
a single forward traversal.


Thus it could be argued that curves should be characterised as path-equivalent if any differences can be 
removed by matching and removing forward and reverse traversals of segments of the curve. Parallel transport 
reversibility is specified, for example, in Definition 48.3.2 (v) for topological fibre bundles, and is implied 
by the linearity of connections in Definition 67.4.2 (iii) for differentiable fibre bundles. However, it does 
frequently occur in practice that two curves differ only by their parametrisation, as in Definition 36.5.3, and 
it is desirable to be able to then say that they must effect the same parallel transport. Retracing of a path 
is seen less often in practical scenarios. So it seems less important to incorporate parallelism-invariance for 
retracing into a path-equivalence definition. 


36.5.12 THEOREM: All curves in a $T_1$ space are path-equivalent to some never-constant curve.
Every curve in a $T_1$ space $M$ is path-equivalent to a never-constant curve in $M$.


PROOF: The assertion follows from Definition 36.5.3 and Theorem 36.4.11.


36.5.13 THEOREM: $T_1$ space path-equivalent curves are both path-equivalent to some never-constant curve.
Two curves $\gamma_1$ and $\gamma_2$ in a $T_1$ space $M$ are path-equivalent in $M$ if and only if they are both path-equivalent
to a never-constant curve $\gamma_3$ in $M$.


PROOF: Let $\gamma_k : I_k \to M$ be path-equivalent curves in a $T_1$ space $M$ for $k = 1, 2$. Then there are
non-decreasing continuous surjections $\beta_k : I \to I_k$ such that $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2$ by Definition 36.5.3. Let
$\gamma = \gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2 : I \to M$. Then by Theorem 36.4.11, there is a never-constant curve $\gamma_3 : I_3 \to M$ and
a non-decreasing continuous surjection $\beta_3 : I \to I_3$ such that $\gamma = \gamma_3 \circ \beta_3$. Thus $\gamma_1 \circ \beta_1 = \gamma_3 \circ \beta_3 = \gamma_2 \circ \beta_2$.
Therefore $\gamma_1$ and $\gamma_2$ are both path-equivalent to a never-constant curve $\gamma_3$ by Definition 36.5.3.

Conversely, suppose that $\gamma_1$ and $\gamma_2$ are both path-equivalent to a never-constant curve $\gamma_3$. Then $\gamma_1$ and $\gamma_2$
are path-equivalent by Theorem 36.5.9 (ii).


36.5.14 REMARK:  Reparametrisation of curves while removing constant stretches. 

Instead of inserting constant stretches into path-equivalent curves $\gamma_1$ and $\gamma_2$ to make them match as in
Definition 36.5.3 and Figure 36.5.1, it would be more satisfying to somehow remove them. This is done in
Theorem 36.5.15 by effectively reversing the directions of the reparametrisation functions $\beta_1$ and $\beta_2$. This
“reversal of the arrows” is illustrated in Figure 36.5.4.


Instead of obtaining $\gamma_1 \circ \beta_1 = \gamma_0 = \gamma_2 \circ \beta_2$ as in Definition 36.5.3, one obtains $\gamma_k = \gamma \circ \beta_k$ for $k = 1, 2$
in Theorem 36.5.15. This is achieved by first defining “forward arrows” $\beta_{0,k} : I_0 \to I_k$, then applying
Theorem 36.4.11 to remove constant stretches from $\gamma_0 : I_0 \to M$, and then using “function quotients” to
construct the “reverse arrows” $\beta_k = \beta_0 \circ \beta_{0,k}^{-1}$.

Figure 36.5.4 Equivalence of two curves to a never-constant curve


36.5.15 THEOREM: “Reversal of the arrows” for path-equivalence maps. 

Let $\gamma_1 : I_1 \to M$ and $\gamma_2 : I_2 \to M$ in a $T_1$ space $M$ be path-equivalent. Then there is a never-constant curve
$\gamma : I \to M$ and non-decreasing continuous surjections $\beta_k : I_k \to I$ for $k = 1, 2$ such that $\gamma_1 = \gamma \circ \beta_1$ and
$\gamma_2 = \gamma \circ \beta_2$.


PROOF: Let $\gamma_1 : I_1 \to M$ and $\gamma_2 : I_2 \to M$ in a $T_1$ space $M$ be path-equivalent. Then by Definition 36.5.3,
there is a curve $\gamma_0 : I_0 \to M$ and non-decreasing continuous surjections $\beta_{0,k} : I_0 \to I_k$ such that $\gamma_1 \circ \beta_{0,1} =
\gamma_2 \circ \beta_{0,2} = \gamma_0$. By Theorem 36.4.11, there is a never-constant curve $\gamma : I \to M$ and a non-decreasing
continuous surjection $\beta_0 : I_0 \to I$ such that $\gamma_0 = \gamma \circ \beta_0$.


Define the relations βₖ = β₀ ∘ (β₀,ₖ)⁻¹ : Iₖ → I for k = 1,2. To show that β₁ is a function, first note that
∀x ∈ I₁, ∃y ∈ I, (x,y) ∈ β₁ because β₀,₁ : I₀ → I₁ is a surjection and β₀ : I₀ → I is a well-defined
function. To show that β₁(x) has a unique value for all x ∈ I₁, suppose that β₀,₁(z) = β₀,₁(z′) for some
z, z′ ∈ I₀ with z < z′. Then β₀,₁(z″) = β₀,₁(z) for all z″ ∈ [z, z′] because β₀,₁ is non-decreasing. So
γ₁(β₀,₁(z″)) = γ₁(β₀,₁(z)) for all z″ ∈ [z, z′]. Therefore γ(β₀(z″)) = γ(β₀(z)) for all z″ ∈ [z, z′] because
γ ∘ β₀ = γ₁ ∘ β₀,₁. Suppose that β₀(z′) ≠ β₀(z). Then β₀(z) < β₀(z′) and [β₀(z), β₀(z′)] is a constant
stretch of γ by Definition 36.4.2. So by Definition 36.4.3, γ is not a never-constant curve, which contradicts
an assumption. Therefore β₀(z′) = β₀(z). Thus ∀z, z′ ∈ I₀, (β₀,₁(z) = β₀,₁(z′) ⇒ β₀(z) = β₀(z′)). So
β₁ = β₀ ∘ (β₀,₁)⁻¹ : I₁ → I is a well-defined “function quotient”. (See Definition 10.5.27.) Then β₁ is
non-decreasing because β₀ and β₀,₁ are non-decreasing, and β₁ : I₁ → I is surjective because β₀ : I₀ → I is
surjective and β₀,₁ : I₀ → I₁ is a well-defined function. The same argument applies to β₂.


The functions β₁ : I₁ → I and β₂ : I₂ → I are continuous by Theorem 34.9.23 because they are non-decreasing
surjections between intervals.

Now γ ∘ βₖ = γ ∘ β₀ ∘ (β₀,ₖ)⁻¹ = γ₀ ∘ (β₀,ₖ)⁻¹ = (γₖ ∘ β₀,ₖ) ∘ (β₀,ₖ)⁻¹ = γₖ for k = 1,2. (Note that in general,
(β₀,ₖ)⁻¹ ∘ β₀,ₖ is not a well-defined function, but β₀,ₖ ∘ (β₀,ₖ)⁻¹ is the identity map on Iₖ by Theorem 10.5.19 (ii)
because β₀,ₖ is surjective.) Thus γ : I → M is a never-constant curve and βₖ : Iₖ → I for k = 1,2 are non-decreasing
continuous surjections such that γ₁ = γ ∘ β₁ and γ₂ = γ ∘ β₂.


36.5.16 REMARK: Existence of a homeomorphism linking path-equivalent never-constant curves. 

If the curves γₖ : Iₖ → M in Theorem 36.5.15 are never-constant for k = 1,2, the non-decreasing continuous
surjections βₖ : Iₖ → I can be chosen to be bijections, which implies that their “quotient” β₂⁻¹ ∘ β₁ : I₁ → I₂
is a bijection. Since this bijection and its inverse are continuous, it is a homeomorphism, and since it is 
a homeomorphism, it must be strictly increasing, as asserted in Theorem 36.5.17. Consequently, path- 
equivalent never-constant curves “spend the same amount of time" in each subset of M, as asserted in 
Theorem 36.5.18. In particular, path-equivalent never-constant curves cross each point in M the same 
number of times. (See Example 36.5.10 for a pair of non-constant curves which have the same start and end 
points and range, but which cross each point a different number of times.) 


36.5.17 THEOREM: Existence of parameter-homeomorphism linking path-equivalent never-constant curves. 
Let γ₁ : I₁ → M and γ₂ : I₂ → M be path-equivalent never-constant curves in a T₁ space M. Then there is
an increasing homeomorphism β₁,₂ : I₁ → I₂ such that γ₁ = γ₂ ∘ β₁,₂.


PROOF: Let γ₁ : I₁ → M and γ₂ : I₂ → M be path-equivalent never-constant curves in a T₁ space M.
By Theorem 36.5.15, there is a never-constant curve γ : I → M and non-decreasing continuous surjections
βₖ : Iₖ → I for k = 1,2 such that γ₁ = γ ∘ β₁ and γ₂ = γ ∘ β₂. Let z, z′ ∈ I₂ satisfy β₂(z) = β₂(z′). Suppose
that z < z′. Then β₂(z″) = β₂(z) for all z″ ∈ [z, z′] because β₂ is non-decreasing. So γ(β₂(z″)) = γ(β₂(z))
for all z″ ∈ [z, z′]. Therefore γ₂(z″) = γ₂(z) for all z″ ∈ [z, z′]. But this implies that γ₂ has a constant
stretch, which contradicts an assumption. Therefore z = z′. So β₂ is injective. Therefore β₂ is a bijection.
So β₂⁻¹ : I → I₂ is a bijection. By the same argument, β₁ is a bijection. Let β₁,₂ = β₂⁻¹ ∘ β₁. Then
β₁,₂ : I₁ → I₂ is a non-decreasing continuous bijection and γ₁ = γ ∘ β₁ = γ ∘ β₂ ∘ β₂⁻¹ ∘ β₁ = γ₂ ∘ β₁,₂.
Similarly, let β₂,₁ = β₁⁻¹ ∘ β₂. Then β₂,₁ : I₂ → I₁ is a non-decreasing continuous bijection and
γ₂ = γ₁ ∘ β₂,₁. But β₂,₁ = (β₁,₂)⁻¹. Hence β₁,₂ : I₁ → I₂ is a non-decreasing homeomorphism, and β₁,₂ is
strictly increasing because it is a non-decreasing bijection.


36.5.18 THEOREM:  Equinumerosity of inverse images by path-equivalent never-constant curves. 
If never-constant curves γ₁ and γ₂ in a T₁ space M are path-equivalent, then #(γ₁⁻¹(S)) = #(γ₂⁻¹(S))
for all sets S ⊆ M. (See Definition 13.1.2 for equinumerous sets.)


PROOF: Let γ₁ : I₁ → M and γ₂ : I₂ → M be path-equivalent never-constant curves in a T₁ space M. Let
S ∈ ℙ(M). By Theorem 36.5.17, there is a homeomorphism β₁,₂ : I₁ → I₂ such that γ₁ = γ₂ ∘ β₁,₂. So
γ₁⁻¹(S) = (γ₂ ∘ β₁,₂)⁻¹(S) = (β₁,₂)⁻¹(γ₂⁻¹(S)) by Theorem 10.7.2 (ii). Hence #(γ₁⁻¹(S)) = #(γ₂⁻¹(S)) by
Definition 13.1.2 and Notation 13.1.5.


36.5.19 REMARK: Parallel transport should be independent of the parametrisation of curves. 

It seems reasonable to expect parallel transport along a curve to be independent of the parametrisation of the 
curve. In other words, the transport of a fibre between two points via a pair of path-equivalent curves should 
be the same. To determine how the parameters of two different parametrisations of a curve correspond to 
each other, it is necessary to specify a particular non-decreasing homeomorphism between their parameter 
intervals as in Definition 36.5.20. The need for this is demonstrated by Example 36.5.21. 


36.5.20 DEFINITION: Corresponding parameters of curves γ₁ : I₁ → M and γ₂ : I₂ → M in a topological
space M, for a given non-decreasing homeomorphism φ : I₁ → I₂ which satisfies γ₁ = γ₂ ∘ φ, are parameters
t₁ ∈ I₁ and t₂ ∈ I₂ such that t₂ = φ(t₁).


36.5.21 EXAMPLE: Define curves γ₁ : I → ℝ² and γ₂ : I → ℝ² for I = ℝ by γ₁ : t ↦ (cos t, sin t) and
γ₂ = γ₁ ∘ φₙ, where φₙ : I → I is defined by φₙ : t ↦ t + 2nπ for some n ∈ ℤ⁺. Then γ₂ = γ₁, but φₙ is not
the identity map even though γ₁ and γ₂ are path-equivalent never-constant curves in a T₁ space.
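
The ambiguity in Example 36.5.21 can be checked numerically. The following Python sketch is only an illustration: the names gamma1, phi, gamma2 and max_difference are hypothetical and do not come from the text. It samples the two curves and suggests that γ₁ ∘ φₙ agrees with γ₁ up to rounding error, although φₙ shifts every parameter by 2nπ.

```python
import math

def gamma1(t):
    """The unit-circle curve t -> (cos t, sin t) of Example 36.5.21."""
    return (math.cos(t), math.sin(t))

def phi(n):
    """The translation phi_n : t -> t + 2*n*pi for a positive integer n."""
    return lambda t: t + 2 * n * math.pi

def gamma2(n):
    """The reparametrised curve gamma1 o phi_n."""
    shift = phi(n)
    return lambda t: gamma1(shift(t))

def max_difference(n, samples):
    """Largest coordinate gap between gamma2(n) and gamma1 over the samples."""
    g2 = gamma2(n)
    return max(
        max(abs(u - v) for u, v in zip(g2(t), gamma1(t)))
        for t in samples
    )

samples = [k / 10 for k in range(-50, 51)]
# The two curves agree on the samples, up to floating-point rounding ...
print(max_difference(3, samples) < 1e-9)
# ... but the parameter map is not the identity.
print(phi(3)(0.0) != 0.0)
```

A finite sample cannot prove equality of curves, of course. The point is only that knowing γ₁ and γ₂ does not determine which homeomorphism φ is intended, so it must be specified as in Definition 36.5.20.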


36.6. Concatenation of curves 


36.6.1 REMARK:  Concatenation of joinable curves. 
The very simple kind of concatenation of two curves in Definition 36.6.5 requires the curves to be “joinable”
as in Definition 36.6.2. Condition (i) means that the domain intervals of the two curves “touch” at a common
point sup(Dom(γ₁)) = inf(Dom(γ₂)) in ℝ. Condition (ii) means that the end-point of the first curve equals
the start-point of the second curve.


36.6.2 DEFINITION: Two joinable curves in a topological space X are non-empty continuous curves
γ₁ : I₁ → X and γ₂ : I₂ → X such that


(i) sup(I₁) = inf(I₂) ∈ I₁ ∩ I₂,
(ii) γ₁(sup(I₁)) = γ₂(inf(I₂)).


36.6.3 THEOREM: The union of joinable continuous curves is a continuous curve. 
If γ₁ and γ₂ are joinable curves in a topological space X, then γ₁ ∪ γ₂ is a continuous curve in X.


PROOF: Let γ₁ : I₁ → X and γ₂ : I₂ → X be joinable curves in a topological space X. Then sup(I₁) =
inf(I₂). Let c = sup(I₁) = inf(I₂). Then {c} = I₁ ∩ I₂ and γ₁(c) = γ₂(c). Let I₀ = I₁ ∪ I₂. Then I₀ is an
interval in ℝ. Let γ₀ = γ₁ ∪ γ₂. Then γ₀ : I₀ → X is a well-defined function. To show that γ₀ is continuous,
let Ω ∈ Top(X). Then γ₀⁻¹(Ω) = γ₁⁻¹(Ω) ∪ γ₂⁻¹(Ω). But γ₁⁻¹(Ω) ∈ Top(ℝ) and γ₂⁻¹(Ω) ∈ Top(ℝ) because
γ₁ and γ₂ are both continuous. So γ₀⁻¹(Ω) ∈ Top(ℝ). Hence γ₀ is continuous.


36.6.4 REMARK: Simple concatenation of joinable curves. 

Definition 36.6.5 is a very restrictive kind of curve concatenation. It requires the curves which are to be 
concatenated to be “joinable” in the sense of Definition 36.6.2. From Theorem 36.6.3, it follows that the 
“simple concatenation” γ₁ ∪ γ₂ of curves γ₁ and γ₂ in Definition 36.6.5 is a continuous curve.


36.6.5 DEFINITION: The simple concatenation of joinable curves γ₁ and γ₂ in a topological space X is the
curve γ₁ ∪ γ₂.


36.6.6 REMARK: Concatenation of domain-translated curves. 

Definition 36.6.7 is a fairly general kind of concatenation of domain-translated curves. The output curve is 
defined by domain-translating the two input curves so that they are joinable in the sense of Definition 36.6.2, 
and applying Definition 36.6.5 to the translated curves. The output curve is then uniquely determined 
modulo a domain translation. (Definition 36.6.7 is illustrated in Figure 36.6.1 for the particular case where
c₁ = sup(I₁) and c₂ = inf(I₂).)


[Figure: the domain intervals Dom(γ₁) = I₁ and Dom(γ₂) = I₂ are translated in ℝ so that they meet, giving
the curve γ₀ = (γ₁ ∘ L^ℝ_{c₁}) ∪ (γ₂ ∘ L^ℝ_{c₂}) with Range(γ₀) = Range(γ₁) ∪ Range(γ₂).]

Figure 36.6.1 Concatenation of curves by translating the domain intervals


36.6.7 DEFINITION: A domain-translated concatenation of curves γ₁ : I₁ → X and γ₂ : I₂ → X in a
topological space X is the simple concatenation (γ₁ ∘ L^ℝ_{c₁}) ∪ (γ₂ ∘ L^ℝ_{c₂}) of joinable curves γ₁ ∘ L^ℝ_{c₁} and
γ₂ ∘ L^ℝ_{c₂}, for some c₁, c₂ ∈ ℝ, where L^ℝ_c : ℝ → ℝ is defined by L^ℝ_c : t ↦ t + c for c ∈ ℝ.
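
Definitions 36.6.2, 36.6.5 and 36.6.7 can be imitated in a few lines of Python for curves on compact intervals. This is a minimal sketch under simplifying assumptions: the Curve record and the function names are hypothetical, the curves take values in the plane, continuity is not checked, and exact floating-point equality stands in for the joinability conditions. The translations use c₁ = sup(I₁) and c₂ = inf(I₂), so that the two translated domains meet at 0.

```python
from typing import Callable, NamedTuple, Tuple

Point = Tuple[float, float]

class Curve(NamedTuple):
    """A curve on a compact interval [a, b] (a hypothetical representation)."""
    a: float
    b: float
    f: Callable[[float], Point]

def translate(curve: Curve, c: float) -> Curve:
    """Return curve o L_c, where L_c : t -> t + c, so the domain is [a - c, b - c]."""
    return Curve(curve.a - c, curve.b - c, lambda t: curve.f(t + c))

def simple_concatenation(g1: Curve, g2: Curve) -> Curve:
    """Union of two joinable curves, checking conditions (i) and (ii) exactly."""
    if g1.b != g2.a:
        raise ValueError("domains do not touch")
    if g1.f(g1.b) != g2.f(g2.a):
        raise ValueError("end-point and start-point differ")
    return Curve(g1.a, g2.b, lambda t: g1.f(t) if t <= g1.b else g2.f(t))

def translated_concatenation(g1: Curve, g2: Curve) -> Curve:
    """Domain-translated concatenation with c1 = sup(I1) and c2 = inf(I2)."""
    return simple_concatenation(translate(g1, g1.b), translate(g2, g2.a))

# Two straight segments in the plane: (0,0) to (1,0), then (1,0) to (1,2).
g1 = Curve(2.0, 3.0, lambda t: (t - 2.0, 0.0))
g2 = Curve(10.0, 12.0, lambda t: (1.0, t - 10.0))
g0 = translated_concatenation(g1, g2)
print(g0.a, g0.b)
print(g0.f(g0.a), g0.f(0.0), g0.f(g0.b))
```

Any other pair c₁, c₂ which makes the translated curves joinable gives the same curve up to a domain translation, which is the sense in which the output is unique.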


36.6.8 REMARK: Concatenation of domain-transformed curves. 

The style of concatenation in Definition 36.6.7, which permits only translations to make the domains of the 
constituent curves meet, is generalised in Definition 36.6.9 to allow strictly increasing reparametrisations of 
the domains before joining them. (Definition 36.6.9 is illustrated in Figure 36.6.2.) 


[Figure: the domain intervals Dom(γ₁) = I₁ and Dom(γ₂) = I₂ are mapped by β₁ and β₂ so that they meet,
giving Dom(γ₀) = β₁(I₁) ∪ β₂(I₂), γ₀ = (γ₁ ∘ β₁⁻¹) ∪ (γ₂ ∘ β₂⁻¹) and Range(γ₀) = Range(γ₁) ∪ Range(γ₂).]

Figure 36.6.2 Concatenation of curves by transforming the domain intervals


36.6.9 DEFINITION: A (domain-transformed) concatenation of curves γ₁ : I₁ → X and γ₂ : I₂ → X in a
topological space X is the simple concatenation (γ₁ ∘ β₁⁻¹) ∪ (γ₂ ∘ β₂⁻¹) of joinable curves γ₁ ∘ β₁⁻¹ and
γ₂ ∘ β₂⁻¹ for some strictly increasing continuous functions β₁ : I₁ → ℝ and β₂ : I₂ → ℝ.


36.7. Pathwise connected sets 


36.7.1 REMARK: The relevance of pathwise connectedness to differentiable manifolds. 

Pathwise connectedness of sets has a clear relevance to differential geometry. Parallelism is defined along 
paths in terms of general and affine connections, and distances between points in Riemannian manifolds are 
defined by extremising an integral of the metric tensor field along paths between the points. At the very 
minimum, one requires paths which are continuous. In a differentiable manifold, connectedness via
differentiable curves will be required. Although one uses the word “path” for these concepts, the mathematical
structure for practical definitions is a curve. 


36.7.2 DEFINITION: A pathwise connected pair of points in a topological space X is a pair x₁, x₂ ∈ X such
that for some continuous curve γ : [a,b] → X, γ(a) = x₁ and γ(b) = x₂.


36.7.3 THEOREM:  Pathwise connectedness of points is an equivalence relation. 
Let X be a topological space. Define a relation “∼” on X so that x₁ ∼ x₂ for x₁, x₂ ∈ X whenever the pair
x₁, x₂ is pathwise connected in X. Then “∼” is an equivalence relation on X.


PROOF: Let X and “∼” be as in the statement of the theorem. Let x ∈ X. Then x ∼ x because the constant
path γ : [0,1] → X with γ(t) = x for all t ∈ [0,1] is a continuous path in X. Let x₁, x₂ ∈ X with x₁ ∼ x₂.
Then x₁, x₂ is a pathwise connected pair in X. So there is a continuous path γ : [a,b] → X with γ(a) = x₁
and γ(b) = x₂. This curve can be reversed as γ̄ : [−b, −a] → X with γ̄(t) = γ(−t) for t ∈ [−b, −a]. So x₂ ∼ x₁.


Suppose that x₁, x₂, x₃ ∈ X satisfy x₁ ∼ x₂ and x₂ ∼ x₃. Then there are continuous curves γ₁ : [a₁,b₁] → X
and γ₂ : [a₂,b₂] → X with γ₁(a₁) = x₁, γ₁(b₁) = x₂, γ₂(a₂) = x₂ and γ₂(b₂) = x₃. (It may be assumed
that a₁ ≤ b₁ and a₂ ≤ b₂.) As in Definition 36.6.7, one may define the domain-translated concatenation
γ₃ = (γ₁ ∘ L^ℝ_{c₁}) ∪ (γ₂ ∘ L^ℝ_{c₂}) of the joinable curves γ₁ ∘ L^ℝ_{c₁} and γ₂ ∘ L^ℝ_{c₂} with c₁ = a₁ and c₂ = a₂ − b₁ + a₁,
where L^ℝ_c : ℝ → ℝ is defined by L^ℝ_c : t ↦ t + c for c ∈ ℝ. (This is illustrated in Figure 36.7.1.) Then
γ₃ : [0, b₁ − a₁ + b₂ − a₂] → X is a continuous curve by Theorem 36.6.3, and γ₃(0) = x₁ and
γ₃(b₁ − a₁ + b₂ − a₂) = x₃. So x₁ ∼ x₃. Hence “∼” is an equivalence relation.


[Figure: the domain intervals Dom(γ₁) = I₁ and Dom(γ₂) = I₂ are translated in ℝ so that they meet, giving
the curve γ₃ = (γ₁ ∘ L^ℝ_{c₁}) ∪ (γ₂ ∘ L^ℝ_{c₂}) with Range(γ₃) = Range(γ₁) ∪ Range(γ₂).]

Figure 36.7.1 Curve concatenation to show transitivity of pathwise connectedness


36.7.4 REMARK: The partitioning of sets according to pathwise connectedness. 

It follows from Theorem 36.7.3 that any topological space X is partitioned by the pathwise-connectedness 
equivalence relation. The equivalence classes of points for this relation may be referred to as “pathwise 
connected components” of X. This is similar to, but different from, the partitioning of a topological space into
connected components in the sense of Definition 34.5.3.


36.7.5 DEFINITION: A pathwise connected pair of points within a subset A of a topological space X is a pair
x₁, x₂ ∈ A such that for some continuous curve γ : [a,b] → X, γ(a) = x₁ and γ(b) = x₂ and Range(γ) ⊆ A.


36.7.6 DEFINITION: A pathwise connected (sub)set in a topological space X is a subset A of X such that for
all points x₁, x₂ ∈ A, for some continuous curve γ : [a,b] → X, Range(γ) ⊆ A and γ(a) = x₁ and γ(b) = x₂.


36.7.7 DEFINITION: A pathwise connected (topological) space is a topological space X such that X is a 
pathwise connected subset of X. 


36.7.8 REMARK: Alternative terminology for pathwise connectedness. 
Some authors call a pathwise connected topological space “arcwise connected”. 


36.7.9 THEOREM: Pathwise connectedness implies connectedness. 
Let X be a pathwise connected topological space. Then X is connected. 


PROOF: Let X be a topological space which is pathwise connected. Suppose that X is not connected. Then
by Definition 34.1.3, there are non-empty disjoint open sets Ω₁, Ω₂ ∈ Top(X) such that X = Ω₁ ∪ Ω₂. Let
x₁ ∈ Ω₁ and x₂ ∈ Ω₂. Then by Definition 36.7.7, there is a continuous curve γ : [a,b] → X for some a, b ∈ ℝ
with a < b, such that γ(a) = x₁ and γ(b) = x₂. From Theorem 34.4.18, Range(γ) must be a connected
subset of X because [a,b] is a connected subset of ℝ. But Range(γ) ⊆ Ω₁ ∪ Ω₂ and Ω₁ ∩ Ω₂ = ∅ and
Ω₁ ∩ Range(γ) ≠ ∅ and Ω₂ ∩ Range(γ) ≠ ∅. So Range(γ) is a disconnected subset of X by Definition 34.1.6,
line (34.1.4). This is a contradiction. So X must be connected.


36.7.10 REMARK: Connected topological spaces are not necessarily pathwise connected. 

It is not true that all connected topological spaces are pathwise connected. Example 34.7.8 is connected 
but not pathwise connected. In the illustration in Figure 34.7.3 of this topological space, one may have the
impression that the space X has two components, namely the subsets {0} × [−1,1] and {(x,y) ∈ ℝ²; x ∈
(0,1] and y = sin(π/(2x))} of ℝ². However, these two “components” cannot be disconnected by covering
them with disjoint open sets. Therefore in this sense, the two “components” are not disconnected, and so
they are connected. But it is not possible to construct a continuous curve which joins points which are in
different “components” of X. One could say that this example has two “pathwise connected components”,
but they cannot be disconnected in the sense of a disjoint open cover. It could be argued that pathwise 
connectedness is closer to one's intuition of the notion of connectedness than the disjoint open cover notion 
in Definition 34.1.3. 


36.7.11 REMARK:  Pathwise connected topological spaces are not necessarily locally connected. 
It is not true that all pathwise connected topological spaces are locally connected. This can be seen by 
slightly modifying Example 34.7.8, which is illustrated in Figure 34.7.3. Define a topological space X by 


X = ([0,1] × {0}) ∪ ({0} × [−1,1]) ∪ {(x,y) ∈ ℝ²; x ∈ (0,1] and y = sin(π/(2x))}


with the relative topology of ℝ². (The illustration in Figure 34.7.3 is valid for this space if one adds
[0,1] × {0} to the set.) Then X is pathwise connected because one may connect any pair of points in X by
a continuous curve via the set [0,1] × {0}. However, all open neighbourhoods Ω ∈ Top(X) of points (0,y) in
{0} × ([−1,1] \ {0}) are disconnected into an infinite number of components if Ω ⊆ B_{(0,y),|y|}. Therefore X
is not locally connected at these points.


36.7.12 REMARK: Pathwise connectedness partitioning is a refinement of connectedness partitioning. 

The partition of a topological space X according to the pathwise connectedness relation is a refinement 
(in the sense of Definition 8.7.16) of the partition of X into connected components. This follows from 
Theorem 36.7.9, applied to the pathwise connected subsets of X, using the relative topology. Thus each 
connected component of X is the disjoint union of pathwise-connected sub-components. It is impossible for 
a pathwise-connected component to overlap two connected components. 


36.7.13 REMARK: Locally pathwise connected topological spaces. 
One may define locally pathwise connected topological spaces as in Definition 36.7.14, analogous to the 
locally connected topological spaces in Definition 34.7.3.


36.7.14 DEFINITION: A locally pathwise connected (topological) space is a topological space X such that


∀x ∈ X, ∀Ω ∈ Topₓ(X), ∃Ω′ ∈ Topₓ(X),
Ω′ ⊆ Ω and Ω′ is pathwise connected.


A locally pathwise connected point in a topological space X is a point x ∈ X such that


∀Ω ∈ Topₓ(X), ∃Ω′ ∈ Topₓ(X), Ω′ ⊆ Ω and Ω′ is pathwise connected.


A topological space X is locally pathwise connected at a point x € X if x is a locally pathwise connected 
point in X. 


36.7.15 THEOREM: Local pathwise connectedness implies local connectedness. 
Let X be a locally pathwise connected topological space. Then X is locally connected. 


PROOF: Let X be a locally pathwise connected topological space. Let x ∈ X and Ω ∈ Topₓ(X). Then by
Definition 36.7.14, there is a pathwise connected set Ω′ ∈ Topₓ(X) such that Ω′ ⊆ Ω. By Theorem 36.7.9,
Ω′ is a connected subset of X (in the relative topology on Ω′ from X). Hence X is a locally connected
topological space.


36.7.16 REMARK: Diagram of relations between connectedness classes. 
The relations between the various classes of locally and globally connected and pathwise connected spaces 
are illustrated in Figure 36.7.2. 


[Figure: family tree with “topological space” at the top, “connected space” and “locally connected space”
below it, and “pathwise connected space” and “locally pathwise connected space” below those.]

Figure 36.7.2 Family tree of locally and globally connected and pathwise connected spaces


36.8. Paths 


36.8.1 REMARK: A parametrised curve may contain “too much information”. 
In Section 36.8, paths are defined in general topological spaces as equivalence classes of curves with respect 
to the “path equivalence” of curves defined in Section 36.5. 


A major application of curves is to integration, both in mathematics and physics. One often refers to the 
“integration path” rather than the “integration curve” because the velocity of the traversal of the path is 
irrelevant. The direction of traversal, on the other hand, is very often relevant. However, this does not imply that the
transition map between equivalent curve parametrisations can be any invertible increasing function at all. 
For practical purposes, one generally wants curves to be parametrised in a piecewise differentiable way so 
that the velocity exists almost everywhere and the displacement up to any point on the curve’s image will 
equal the integral of the velocity up to that point. (This may depend on the choice of measure which is used 
for the integration.) 


The very general definition of a parametrised curve contains in a sense “too much information”, whereas the 
image of the curve contains too little “information”. Consequently, paths are defined here as equivalence 
classes of curves, where the equivalence relation is chosen so as to remove the information which is irrelevant 
to a particular application. 


36.8.2 REMARK: Notation for paths which suggests that they are equivalence classes of curves. 

The notation chosen for paths here is [γ]₀ for the equivalence class of any given curve γ. Then [γ₁]₀ = [γ₂]₀
if and only if γ₁ and γ₂ are path-equivalent curves. One may say that [γ]₀ is “the path of γ”, so that any two
curves are equivalent if and only if they “have the same path”. So every curve is associated with a unique
(oriented continuous) path. This path structure determines the order and general manner of traversal of
points in the image set. (See Definition 36.5.3 for path-equivalent curves.)


The subscript “0” for 𝒞₀(M), [γ]₀ and 𝒫₀(M) is not entirely satisfactory. (See Notation 36.2.17 for 𝒞₀(M).)
The use of this subscript suggests that the continuity class is a part of a scale of differentiability, which requires
some kind of differentiable structure. For consistency with the notations C(X, Y) for topological spaces X
and Y, and C⁰(M₁, M₂) for topological manifolds, the notations here should be 𝒞(M), [γ] and 𝒫(M).
However, the subscript for paths [γ]₀ has the advantage that it does suggest that the meaning has something
to do with continuity.


36.8.3 NOTATION: [γ]₀ denotes the set of curves in a topological space M which are path-equivalent to a
given curve γ in M.


36.8.4 DEFINITION: A path in a topological space M is an equivalence class [γ]₀ of curves which are
path-equivalent to a given curve γ in M.


A path may also be called an (oriented) (continuous) path or an (oriented) C⁰ path, and the words directed
or ordered may be used instead of “oriented”.


For any curve γ, the path of γ is the equivalence class [γ]₀.
The set ⋃{Range(γ₁); γ₁ ∈ [γ]₀} is called the image of the path [γ]₀.


Any curve in a path Q = [γ]₀ may be referred to as a path representative or representative curve for the
path Q.


36.8.5 REMARK:  Empty-curve paths and constant-curve paths. 

For the empty curve γ = ∅, the equivalence class [γ]₀ is not empty. So it cannot be called literally the “empty
path”. But it could accurately be called the “empty curve path” or the “path of the empty curve”. This
logically correct usage is too clumsy. So the terms “empty path” and “non-empty path” will refer to the
curve map, not the equivalence class. The emptiness or non-emptiness of the curve map also matches that
of the image of the path.


Another moderately interesting trivial-curve issue is that of constant curves. For a fixed p ∈ M, the constant
curves γ₁ : {0} → M with γ₁ : t ↦ p and γ₂ : [0,1] → M with γ₂ : t ↦ p are path-equivalent although their
parameter intervals are not homeomorphic. So these curves have the same path. In fact, all constant curves
with the same value are path-equivalent, for all of the topologically different kinds of non-empty oriented
intervals in the table in Remark 34.9.15. The only kind of constant curve which is never-constant is a curve
with a singleton domain.


36.8.6 REMARK: Notation for the set of continuous paths in a topological space. 

The set of paths 𝒫₀(M) in Notation 36.8.7 may be expressed as 𝒫₀(M) = {[γ]₀; γ ∈ 𝒞₀(M)} in terms of
the corresponding Notation 36.2.17 for the set of curves 𝒞₀(M). Thus 𝒫₀(M) is a partition of 𝒞₀(M). So
𝒞₀(M) = ⋃𝒫₀(M) and for all γ₁, γ₂ ∈ 𝒞₀(M), either [γ₁]₀ = [γ₂]₀ or [γ₁]₀ ∩ [γ₂]₀ = ∅.


36.8.7 NOTATION: Denote by 𝒫₀(M) the set of all continuous paths in M.


36.8.8 DEFINITION: The reversal of a path [γ]₀ in a topological space M is the path [−γ]₀ where −γ
denotes the curve −γ : t ↦ γ(−t). The reversal of a path Q = [γ]₀ may be denoted as −Q or −[γ]₀.


36.8.9 REMARK:  Path-representative independence of initial, terminal and multiple points. 
The definitions of initial point, terminal point and multiple point in Definition 36.8.10 are independent of 
the choice of path representative. 


36.8.10 DEFINITION: The initial point and terminal point of a path Q in a topological space M with a 
non-empty compact domain interval are the initial point and terminal point respectively of any representative 
of Q. 

A multiple point of a path Q in a topological space M is a point x € M such that x is a multiple point of 
some path representative of Q. 


36.8.11 NOTATION: The initial and terminal points of a path Q with non-empty compact domain interval 
may be denoted as S(Q) = S(γ) and T(Q) = T(γ) respectively for any path representative γ of Q.


36.8.12 DEFINITION: The concatenation of two paths Q₁ and Q₂ in a topological space M with non-empty
compact domain intervals such that T(Q₁) = S(Q₂) is the concatenation of any representatives γ₁ of Q₁ and
γ₂ of Q₂ such that T(γ₁) = S(γ₂).


36.8.13 REMARK: Removal of information about the direction of traversal of a path. 

Definition 36.8.4 for a path removes information about the choice of parametrisation of a curve except 
for the direction. The information which is preserved is the order in which every point of the image is 
traversed, possibly multiple times. Definition 36.8.14 removes slightly more information because the direction 
of traversal is also removed. 


36.8.14 DEFINITION: An unoriented (continuous) path in a topological space M is the set Q ∪ (−Q) for
any path Q in M.


36.8.15 REMARK: Terminology for paths whose orientation information has been removed. 
Other possible terms for “unoriented” are “disoriented”, “undirected”, “unordered” and “disordered”. 


36.8.16 REMARK: Interchangeability of paths and curves in practice. 

As is the case for all equivalence class constructions in mathematics, an equivalence class of curves (such as
the paths in Definitions 36.8.4 and 36.8.14) may be represented in practical applications by a single curve of 
the class. In practice, one need not be fastidious about the distinction between curves and paths, as long as 
it is clear which equivalence relation is being used in each context. 


36.8.17 REMARK: Possible alternatives to parametrisation for indicating order and manner of traversal. 
The purpose of the parametrisation of paths is to indicate the order and manner of traversal of points in 
a topological space, although most of the information in the parametrisation is irrelevant. The alternative 
of defining some sort of total ordering on the image is too clumsy in practice. (See Definition 11.5.29 for 
ordered traversals.) 


This is analogous to the issue of families of atlases versus sets of atlases. In practice, very little analysis 
of paths can be done without parametrisation, just as very little differential geometry can be done without 
coordinate charts. 


36.9. Topological groups 


36.9.1 REMARK: Relations between topological groups and differentiable (Lie) groups. 

Topological groups are closely related to Lie groups. (See Section 62.2.) Topological groups are required 
for the specification of structure groups for topological fibre bundles in Definitions 47.6.5 and 47.8.3, for 
example, but structure groups for fibre bundles are always transformation groups, not abstract groups. (See 
Figure 62.1.1 in Remark 62.1.1 for a family tree of topological and differentiable groups.) 


36.9.2 DEFINITION: A topological group is a tuple G −< (G, T_G, σ_G) which satisfies the following.


(i) (G, σ_G) is a group.
(ii) (G, T_G) is a topological space.
(iii) The group operation σ_G : G × G → G is continuous with respect to T_G.
(iv) The map g ↦ g⁻¹ from G to G is continuous with respect to T_G.


36.9.3 THEOREM: Left and right action and conjugation maps on a topological group are homeomorphisms.
Let (G, T_G, σ_G) be a topological group. Then the maps L_g : G → G, R_g : G → G and C_g : G → G, defined
by L_g : h ↦ gh, R_g : h ↦ hg and C_g : h ↦ ghg⁻¹, are homeomorphisms for all g ∈ G.


PROOF: Let (G, T_G, σ_G) be a topological group. Then σ_G : G × G → G is continuous. Let g ∈ G. Then L_g
and R_g are continuous by Theorem 32.10.10. Similarly, (L_g)⁻¹ = L_{g⁻¹} and (R_g)⁻¹ = R_{g⁻¹} are continuous.
So L_g and R_g are homeomorphisms for all g ∈ G. Hence C_g = L_g ∘ R_{g⁻¹} is a homeomorphism for all g ∈ G.


36.9.4 REMARK: The general linear group for a finite-dimensional linear space is a topological group. 
The inverse operation for the general linear group for a finite-dimensional linear space can be shown to be
continuous by means of some metric space concepts. (See Theorem 39.6.4 (i).)


Even though the standard topology on a finite-dimensional linear space is induced by a metric, which is
induced by a norm, this does not imply that the general linear group of such spaces contains only isometries. 
A group acting on a set typically defines some kind of isomorphism for the set. The isomorphisms preserve 
the invariants of the group action. However, invariance of a metric under group actions is only one special 
kind of invariance. Thus the role of a distance function to induce a topology must be distinguished from 
its role in defining isometries. In particular, the operations of the general linear group GL(V) are continuous with respect to
a metric-induced topology on a finite-dimensional linear space V, and the subgroup O(V) contains those 
elements of GL(V) which preserve the metric. 


36.9.5 REMARK: One-parameter subgroups of topological groups. 

It is possible to drop the requirement for one-parameter subgroups to be continuous. The resulting
chaos has some academic interest, but limited practical value for differential geometry. So the adjective
“continuous” is often omitted. In the case of a group which lacks topological structure, continuity cannot
be defined. Therefore “one-parameter subgroups” are rarely mentioned for groups without topology.


The term “one-parameter subgroup" is not strictly correct. It is a curve in the group, not a subset of the 
group. (However, given the range of a one-parameter subgroup, it is often straightforward to guess the 
intended parametrisation.) It would be literally correct to say that a one-parameter subgroup is an “indexed
subgroup” or “parametrised subgroup”. It is asserted in Theorem 36.9.7 (i) that the range of a one-parameter
subgroup is in fact a subgroup. 


One-parameter subgroups are also rarely defined for parameter spaces other than the real numbers. One 
could think of the unital group morphisms in Definition 17.4.8 and Theorem 17.4.7 as a kind of minimalist 
“one-parameter subgroup". (See also Remark 17.1.10 for a summary of other kinds of unital morphisms.) 
These concepts are similar, but not similar enough to give them the same name. 


Continuous one-parameter subgroups have limited interest. They become much more useful when continuity 
is replaced by differentiability. 


36.9.6 DEFINITION: A (continuous) one-parameter subgroup of a topological group G is a continuous curve
γ : ℝ → G which satisfies γ(0) = e_G and


∀s, t ∈ ℝ, γ(s + t) = γ(s)γ(t).


36.9.7 THEOREM: The range of a continuous one-parameter subgroup is a commutative subgroup. 
Let γ be a continuous one-parameter subgroup of a topological group G.


(i) Range(γ) is a commutative subgroup of G.


PROOF: For part (i), let g₁, g₂ ∈ Range(γ). Then g₁ = γ(t₁) and g₂ = γ(t₂) for some t₁, t₂ ∈ ℝ. So
g₁g₂ = γ(t₁)γ(t₂) = γ(t₁ + t₂) ∈ Range(γ). So Range(γ) is closed under the group operation of G.
Let g ∈ Range(γ). Then g = γ(t) for some t ∈ ℝ. Let g′ = γ(−t). Then gg′ = γ(t)γ(−t) = γ(0) = e_G
by Definition 36.9.6, and g′, e_G ∈ Range(γ). Thus Range(γ) is a group under the group operation of G.
Therefore Range(γ) is a subgroup of G by Theorem 17.6.4.

To show commutativity, let g₁, g₂ ∈ Range(γ). Then g₁ = γ(t₁) and g₂ = γ(t₂) for some t₁, t₂ ∈ ℝ. So
g₁g₂ = γ(t₁)γ(t₂) = γ(t₁ + t₂) = γ(t₂ + t₁) = γ(t₂)γ(t₁) = g₂g₁ by Definition 36.9.6. Hence Range(γ) is a
commutative subgroup of G.
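
Theorem 36.9.7 can be illustrated inside a group which is not itself commutative. The following Python sketch uses the shear matrices t ↦ [[1, t], [0, 1]] in the group of invertible 2×2 matrices. This choice of example, and the names shear and multiply, are assumptions made for illustration and do not appear in the text. Exact rational arithmetic is used so that the identities can be tested with equality.

```python
from fractions import Fraction

def shear(t):
    """Hypothetical example curve t -> [[1, t], [0, 1]], as a tuple of rows."""
    return ((Fraction(1), Fraction(t)), (Fraction(0), Fraction(1)))

def multiply(p, q):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

identity = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
s, t = Fraction(1, 2), Fraction(-9, 4)

# The two conditions of Definition 36.9.6 at the sample parameters.
print(shear(0) == identity)
print(shear(s + t) == multiply(shear(s), shear(t)))

# Elements of the range commute and have inverses in the range.
print(multiply(shear(s), shear(t)) == multiply(shear(t), shear(s)))
print(multiply(shear(t), shear(-t)) == identity)

# The ambient group is not commutative: a shear and its transpose differ in order.
lower = ((Fraction(1), Fraction(0)), (Fraction(1), Fraction(1)))
print(multiply(shear(1), lower) == multiply(lower, shear(1)))
```

The last comparison is expected to fail, which shows that the commutativity in Theorem 36.9.7 comes from the parameter group ℝ and not from the group G.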


36.9.8 REMARK: Continuous one-parameter subgroups have “constant velocity” in some sense. 
Since a topological group G does not have a specified differentiable structure, it is not in general possible to 
define the velocity of a continuous one-parameter subgroup of G. 


Definition 36.9.6 strongly suggests a curve with “constant velocity” in some sense. If $g = \gamma(\varepsilon)$ for some
$\varepsilon \in \mathbb{R}^+$, then $\gamma(n\varepsilon) = g^n$ for any $n \in \mathbb{Z}$. The notation “$g^n$” is suggestive of the real-number formula
$g^n = \exp(n \log_e(g))$, which may be thought of as having a constant “relative velocity” since its velocity is
proportional to its value. If $G$ is the group of translations of a Cartesian space $\mathbb{R}^n$, then $\gamma$ will in fact have
a true constant velocity equal to $\gamma(1)$.


Without knowing the specific nature of the group, it is not possible to give a specific “constant velocity”
interpretation for Definition 36.9.6. This is despite the fact that the parameter set $\mathbb{R}$ does have a very specific
differentiable structure. However, one may at least note that the behaviour of a one-parameter subgroup is
very tightly constrained in general. The curve is not necessarily uniquely determined by its value for one
parameter. For example, the group element $\gamma(1)$ may have more than one “square root”. In other words,
there may be two or more elements $h \in G$ with $h^2 = \gamma(1)$. Therefore $\gamma(1/2)$ is not uniquely determined
by $\gamma(1)$. The groups $SO(n)$ for $n \ge 2$ provide ample examples of this. On the other hand, if the value of $\gamma(t)$
is known for $t$ in some neighbourhood of 0, then $\gamma$ is in fact uniquely determined for all parameter values.
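
The non-uniqueness of “square roots” can be checked numerically in $SO(2)$. The following is a minimal sketch, not part of the original text: the angle `THETA` and the names `rot`, `gamma1` and `gamma2` are assumptions chosen for illustration. Both curves satisfy the additivity rule of Definition 36.9.6 and agree at $t = 1$, but they differ at $t = 1/2$.

```python
import math

def rot(angle):
    """2x2 rotation matrix (an element of SO(2)) as nested tuples."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s), (s, c))

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def close(a, b, tol=1e-12):
    """Entrywise comparison of 2x2 matrices up to rounding error."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

THETA = 0.7  # hypothetical angle, chosen only for illustration

def gamma1(t):
    """One-parameter subgroup t -> rotation by t * THETA."""
    return rot(t * THETA)

def gamma2(t):
    """A second one-parameter subgroup with the same value at t = 1."""
    return rot(t * (THETA + 2 * math.pi))

s, t = 0.3, 1.9
# Additivity rule and commutativity of the range (Theorem 36.9.7).
assert close(gamma1(s + t), matmul(gamma1(s), gamma1(t)))
assert close(matmul(gamma1(s), gamma1(t)), matmul(gamma1(t), gamma1(s)))
# Same value at t = 1, but two different square roots at t = 1/2.
assert close(gamma1(1), gamma2(1))
assert not close(gamma1(0.5), gamma2(0.5))
```

Both `gamma1(0.5)` and `gamma2(0.5)` square to the same rotation, which is why the value at a single parameter does not determine the curve.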


In the case of differentiable one-parameter subgroups, the concept of constant velocity is applicable in a 
literal sense. If the velocity $\gamma'(t)$ for each parameter $t \in \mathbb{R}$ of the curve $\gamma$ is pulled back to the identity of the
group, it always has the same value. In the same way that constant-velocity lines give a concrete meaning to
tangent vectors in Definition 54.1.2, constant-velocity differentiable one-parameter subgroups give a concrete
meaning to elements of the Lie algebra $T_e(G)$ of a differentiable group $G$.


36.10. Topological left transformation groups 


36.10.1 REMARK: Multiple options for defining kinds of topological transformation groups. 

Definition 36.9.2 for a topological group $(G, T_G, \sigma_G)$ specifies that the group $G$ must have a topology and that
the group operation $\sigma_G$ must be continuous. There are no obvious “multiple choice” options for variations
of this definition since there is only one set and one operation. The situation with transformation groups
is different. (See Section 20.1 for transformation groups.) A non-topological left transformation group
$(G, F, \sigma, \mu)$ has two sets and two operations. So there are many options for introducing topology and
continuity to such an algebraic structure. (See Figure 62.1.2 in Remark 62.1.4 for a family tree of topological 
and differentiable groups and transformation groups.) 


36.10.2 REMARK: Topological transformation groups with and without continuity of the action map. 

The difference between Definitions 36.10.3 and 36.10.4 is the extra requirement that the action map of the
topological transformation group be continuous with respect to elements of the group. In both cases, the 
action of each element of the group G is a topological automorphism of the set X. 


In Definition 36.10.3, the group is not a topological group because it has no topology. It is just a group of 
automorphisms. In Definition 36.10.4, the group is a topological group whose action map is continuous with 
respect to both the group topology and the point-set topology. 


Probably a better name for Definition 36.10.3 would be a “group of continuous transformations”, but that
would not be quite correct. If the transformation group is not effective, the automorphisms of $F$ cannot
be identified with unique left actions $L_g : f \mapsto \mu(g, f)$. The name “transformation group of continuous
transformations” would be more accurate, but it is somewhat repetitive.


36.10.3 DEFINITION: A (left) transformation group of a topological space $X \mathrel{-\!\!<} (X, T_X)$ is a tuple
$(G, X, T_X, \sigma_G, \mu)$ such that $(G, X, \sigma_G, \mu)$ is a left transformation group of the set $X$, and $L_g : X \to X$ is a
homeomorphism from $(X, T_X)$ to $(X, T_X)$ for all $g \in G$.


36.10.4 DEFINITION: A topological (left) transformation group of a topological space $(X, T_X)$ is a tuple
$(G, T_G, X, T_X, \sigma_G, \mu)$ such that $(G, T_G, \sigma_G)$ is a topological group and the action map $\mu : G \times X \to X$ is
continuous with respect to the topologies $T_G$ and $T_X$.


36.10.5 THEOREM: A topological left transformation group of X is a left transformation group of X. 
Let $(G, T_G, X, T_X, \sigma_G, \mu)$ be a topological left transformation group of a topological space $(X, T_X)$.

(i) The map $L_g : X \to X$ is a homeomorphism from $(X, T_X)$ to $(X, T_X)$ for all $g \in G$.

(ii) $(G, X, T_X, \sigma_G, \mu)$ is a left transformation group of the topological space $X$.


PROOF: For part (i), let $(X, T_X)$ be a topological space. Let $(G, T_G, X, T_X, \sigma_G, \mu)$ be a topological left
transformation group of $(X, T_X)$. Then $\mu : G \times X \to X$ is continuous with respect to the topologies $T_G$
and $T_X$. Let $g \in G$. Then $L_g : X \to X$ and $L_{g^{-1}} : X \to X$ are continuous by Theorem 32.10.10 (ii). So
$L_g : X \to X$ is a homeomorphism by Definition 31.14.2 because $L_{g^{-1}} = L_g^{-1}$.


Part (ii) follows from part (i) and Definition 36.10.3.


36.10.6 REMARK: Effective topological transformation groups with and without action-map continuity.
Definitions 36.10.7 and 36.10.8 are the same as Definitions 36.10.3 and 36.10.4 respectively, except that the 
action is required to be effective. (See Definition 20.2.1 for effective group actions.) 


36.10.7 DEFINITION: An effective (left) transformation group of a topological space $X$ is a left transformation
group $G \mathrel{-\!\!<} (G, X, T_X, \sigma_G, \mu)$ of the topological space $X$ such that $G$ acts effectively on $X$.


36.10.8 DEFINITION: An effective topological (left) transformation group of a topological space $X$ is a
topological left transformation group $G \mathrel{-\!\!<} (G, T_G, X, T_X, \sigma_G, \mu)$ of $X$ such that $G$ acts effectively on $X$.


36.10.9 REMARK: Style rules for the order of sets, attributes and operations in specification tuples. 

On the subject of specification tuples, the rule for choosing the listing order for the components of tuples is
that all algebraic operations (such as sums $\sigma$ and products $\mu$) are placed at the end of the tuple, whereas
attributes of sets such as topologies and atlases are placed immediately after the sets they belong to, as for
example the topology $T_X$ for $X$ in Definition 36.10.8. These style rules are followed throughout this book.


36.10.10 REMARK: A topological transformation group acting on itself is effective, free and transitive. 
Theorem 36.10.11 is the topological version of Theorems 20.3.7 and 20.4.5.


36.10.11 THEOREM: A topological transformation group acting on itself is effective, free and transitive. 
Let $G \mathrel{-\!\!<} (G, T_G, \sigma_G)$ be a topological group. Define the action map $\mu : G \times G \to G$ by $\mu : (g_1, g_2) \mapsto
\sigma_G(g_1, g_2)$. Then the tuple $(G, T_G, G, T_G, \sigma_G, \mu)$ is an effective, free, transitive topological left transformation
group of $(G, T_G)$.


PROOF: $(G, G, \sigma_G, \mu)$ is an effective, free, transitive non-topological left transformation group by Theorems
20.3.7 and 20.4.5. The continuity of $\mu$ follows from the continuity of $\sigma_G$. Hence $(G, T_G, G, T_G, \sigma_G, \mu)$ is an
effective, free, transitive topological left transformation group by Definition 36.10.4.
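
The algebraic part of Theorem 36.10.11 can be verified by brute force for a small finite group. The sketch below is an illustration only: it assumes the symmetric group $S_3$ with the discrete topology (for which every map is continuous, so only the algebraic conditions need checking), and the names `op`, `mu` and the `is_*` flags are hypothetical.

```python
from itertools import permutations

G = list(permutations(range(3)))  # the six elements of S_3
E = (0, 1, 2)                     # identity permutation

def op(g, h):
    """Group operation: composition of permutations (g after h)."""
    return tuple(g[h[i]] for i in range(3))

def mu(g, x):
    """Action of G on itself by left translation."""
    return op(g, x)

# Left action axioms: the identity acts trivially, and mu(gh, x) = mu(g, mu(h, x)).
is_action = all(mu(E, x) == x for x in G) and all(
    mu(op(g, h), x) == mu(g, mu(h, x)) for g in G for h in G for x in G
)
# Effective: only the identity fixes every point.
is_effective = all(g == E or any(mu(g, x) != x for x in G) for g in G)
# Free: a non-identity element fixes no point at all.
is_free = all(g == E or all(mu(g, x) != x for x in G) for g in G)
# Transitive: any point can be moved to any other point.
is_transitive = all(any(mu(g, x) == y for g in G) for x in G for y in G)

assert is_action and is_effective and is_free and is_transitive
```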


36.10.12 DEFINITION: The topological (left) transformation group of $G$ acting on $G$ by left translation is
the topological left transformation group $(G, G) \mathrel{-\!\!<} (G, T_G, G, T_G, \sigma_G, \sigma_G)$.


36.10.13 REMARK: Substituting a different topology for the passive set of a group acting on itself. 

If the second copy of the topology $T_G$ in the specification tuple in Definition 36.10.12 is replaced with the
trivial topology $T_G^0 = \{\emptyset, G\}$, the resulting tuple $(G, T_G, G, T_G^0, \sigma_G, \sigma_G)$ is a topological left transformation
group because the map $\sigma_G : G \times G \to G$ is continuous with respect to the product topology of $T_G$ and $T_G^0$
on $G \times G$ and the topology $T_G^0$ on $G$. (It is unnecessary to construct the product topology for $G \times G$ because
all functions whose range has a trivial topology are continuous.) This remark is applicable to Remark 47.13.5
regarding fibre bundles.


36.10.14 REMARK: Action by one-parameter subgroups of left transformation groups. 
A continuous one-parameter subgroup of a topological group in Definition 36.9.6 yields a continuous curve 
for each point in the passive set of a topological left transformation group. 


Let $(G, T_G, X, T_X, \sigma_G, \mu)$ be a topological left transformation group. Let $\gamma : \mathbb{R} \to G$ be a continuous
one-parameter subgroup of the topological group $(G, T_G, \sigma_G)$. Let $p \in X$. Then the map $\gamma_p : \mathbb{R} \to X$ defined by
$\gamma_p : t \mapsto \mu(\gamma(t), p)$ must be continuous because both $\gamma$ and $\mu$ are continuous. Thus one obtains a family of
continuous curves in $X$ with $p$ as family-parameter.


For any fixed $t \in \mathbb{R}$, the map $p \mapsto \gamma_p(t)$ is a topological automorphism of $X$ by Theorem 36.10.5 (i). Therefore
$\gamma$ may be thought of as a continuous one-parameter family of topological automorphisms of $X$. One could
denote this family by $L_\gamma : \mathbb{R} \to (X \to X)$, defined by $L_\gamma(t)(p) = \gamma_p(t) = \mu(\gamma(t), p) = L_{\gamma(t)} p$ for all $t \in \mathbb{R}$
and $p \in X$. In other words, $L_\gamma : t \mapsto (p \mapsto L_{\gamma(t)} p)$. Then $L_\gamma : \mathbb{R} \to \mathrm{Aut}(X)$, where $\mathrm{Aut}(X)$ is the group
of topological automorphisms of $X$. This is clearly a one-parameter subgroup of $\mathrm{Aut}(X)$ because it obeys
the additivity rule in Definition 36.9.6, but the continuity of $L_\gamma$ requires a suitable topology to be defined
for $\mathrm{Aut}(X)$.
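
The construction of the curves $t \mapsto \mu(\gamma(t), p)$ and of the family of automorphisms can be sketched concretely. The example below is an assumption made for illustration, not taken from the text: the group is the unit circle in the complex plane, acting on the plane by multiplication, and the names `gamma`, `mu`, `gamma_p` and `L_gamma` are hypothetical.

```python
import cmath

def gamma(t):
    """One-parameter subgroup of the unit circle group: t -> exp(i t)."""
    return cmath.exp(1j * t)

def mu(g, p):
    """Left action of a unit complex number g on a point p of the plane."""
    return g * p

def gamma_p(p):
    """The curve t -> mu(gamma(t), p) through the point p."""
    return lambda t: mu(gamma(t), p)

def L_gamma(t):
    """The map p -> mu(gamma(t), p), one member of the one-parameter family."""
    return lambda p: mu(gamma(t), p)

p = 2 + 1j
s, t = 0.4, 1.1
# The curve starts at p, since gamma(0) is the identity.
assert abs(gamma_p(p)(0) - p) < 1e-12
# Additivity: the map for s + t is the composite of the maps for s and t.
assert abs(L_gamma(s + t)(p) - L_gamma(s)(L_gamma(t)(p))) < 1e-12
```

The additivity check is purely algebraic. Continuity of the family as a map into the automorphism group is a separate question which depends on the topology chosen for that group.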


36.10.15 REMARK:  Topological transformation group homomorphisms. 
Homomorphisms of topological transformation groups in Definition 36.10.16 are based on Definition 20.6.2 
for general transformation groups. 

36.10.16 DEFINITION: A topological (left) transformation group homomorphism from a topological left
transformation group $(G_1, X_1) \mathrel{-\!\!<} (G_1, T_{G_1}, X_1, T_{X_1}, \sigma_1, \mu_1)$ to a topological left transformation group
$(G_2, X_2) \mathrel{-\!\!<} (G_2, T_{G_2}, X_2, T_{X_2}, \sigma_2, \mu_2)$ is a pair of maps $(\hat\phi, \phi)$ with $\hat\phi : G_1 \to G_2$ and $\phi : X_1 \to X_2$ such that


(i) The pair $(\hat\phi, \phi)$ is a left transformation group homomorphism,


(ii) $\hat\phi$ and $\phi$ are continuous.


36.11. Topological right transformation groups 


36.11.1 REMARK: Mirror images of topological left transformation groups. 
The following definitions are for the “right” versions of the “left” transformation groups in Section 36.10. 
The non-topological versions of these topological right transformation groups are in Section 20.7. 


36.11.2 DEFINITION: A right transformation group of a topological space $X$ is a tuple $(G, X, T_X, \sigma_G, \mu)$
such that $(G, X, \sigma_G, \mu)$ is a right transformation group of the set $X$, and $R_g : X \to X$ is a homeomorphism
from $(X, T_X)$ to $(X, T_X)$ for all $g \in G$.


36.11.3 DEFINITION: A topological right transformation group of a topological space $(X, T_X)$ is a tuple
$(G, T_G, X, T_X, \sigma_G, \mu)$ such that $(G, T_G, \sigma_G)$ is a topological group and the action map $\mu : X \times G \to X$ is
continuous with respect to the topologies $T_G$ and $T_X$.


36.11.4 THEOREM: A topological right transformation group of X is a right transformation group of X. 
Let $(G, T_G, X, T_X, \sigma_G, \mu)$ be a topological right transformation group of a topological space $(X, T_X)$.


(i) The map $R_g : X \to X$ is a homeomorphism from $(X, T_X)$ to $(X, T_X)$ for all $g \in G$.
(ii) $(G, X, T_X, \sigma_G, \mu)$ is a right transformation group of the topological space $X$.


PROOF: Part (i) may be proved as for Theorem 36.10.5 (i). 
Part (ii) follows from part (i) and Definition 36.11.2. 


36.11.5 DEFINITION: An effective right transformation group of a topological space $X$ is a right transformation
group $G$ of the topological space $X$ such that $G$ acts effectively on $X$.


36.11.6 DEFINITION: An effective topological right transformation group of a topological space X is a 
topological right transformation group G of X such that G acts effectively on X. 


A free topological right transformation group of a topological space X is a topological right transformation 
group G of X such that G acts freely on X. 

A transitive topological right transformation group of a topological space X is a topological right transformation
group G of X such that the action of G on X is transitive.


36.11.7 REMARK: A topological right transformation group acting on itself is effective, free and transitive. 
Theorem 36.11.8 is the topological version of Theorem 20.7.17. The tuple $(G, T_G, G, T_G, \sigma_G, \mu)$ which appears
in Theorem 36.11.8 is identical to the corresponding tuple in Theorem 36.10.11, although it specifies a
topological right transformation group in Theorem 36.11.8 and a topological left transformation group in
Theorem 36.10.11.


Theorem 36.11.8 is the simple mirror image of Theorem 36.10.11. The specification tuple $(G, T_G, G, T_G, \sigma_G, \mu)$
is exactly the same in each case, but the “object class” is different. (See Section 8.8 for object classes and
implicit “class tags”.)


36.11.8 THEOREM: Topological right transformation group acting on itself is effective, free and transitive. 
Let $G \mathrel{-\!\!<} (G, T_G, \sigma_G)$ be a topological group. Define the action map $\mu : G \times G \to G$ by $\mu : (g_1, g_2) \mapsto
\sigma_G(g_1, g_2)$. Then the tuple $(G, T_G, G, T_G, \sigma_G, \mu)$ is an effective, free, transitive topological right
transformation group of $(G, T_G)$.


PROOF: The tuple $(G, G, \sigma_G, \mu)$ is an effective, free, transitive non-topological right transformation group
by Theorem 20.7.17. The continuity of $\mu$ follows from the continuity of $\sigma_G$. Hence $(G, T_G, G, T_G, \sigma_G, \mu)$ is
an effective, free, transitive topological right transformation group by Definition 36.11.6.


36.11.9 DEFINITION: The topological right transformation group of $G$ acting on $G$ by right translation is
the topological right transformation group $(G, G) \mathrel{-\!\!<} (G, T_G, G, T_G, \sigma_G, \sigma_G)$.


METRIC SPACES


37.0.1 REMARK: Why metric spaces are presented after general topological spaces.

Metric spaces may be regarded as being in a higher concept layer than topological spaces. Every metric 
space determines a unique canonical topology, whereas a metric function consistent with the topology on a 
topological space cannot be defined in general. When a topological space is metrisable in this sense, there are 
infinitely many metric functions which are consistent with the topology. A metric space may freely import 
all of the definitions and theorems of general topology, which are immediately applicable to its canonical 
topological structure. But the reverse is not true. 


Since metric spaces are much closer to human intuition, many textbooks introduce metric spaces before 
topological spaces to make life initially easier for the reader. In the long term, however, this is confusing 
because it is difficult to forget the “facts” of metric spaces when learning the more general subject of 
topological spaces. One's intuition for metric spaces tends to impose itself on topological spaces, leading to 
many false assumptions. 


One of the principal objectives of this book is to present the foundations of differential geometry in a 
disciplined systematic manner in order to discourage the incorrect application of specific facts to more 
general contexts. A similar danger of false generalisation occurs in many differential geometry books which 
present Riemannian manifolds before general differentiable manifolds and manifolds which have only an 
affine connection. Commencing a book in the middle of a subject (between the low-level foundations and the 
high-level applications) may be more popular in the short term, but it leads to confusion in the long term. 


37.0.2 REMARK: Metric spaces could be classified as an algebra topic. 

The distance functions in Definitions 37.1.2 and 37.2.3 have an entirely algebraic character since they mention 
nothing about topology or analysis. Topology is only introduced into metric spaces as the "topology induced 
by a metric" in Definition 37.5.2. In fact, the norms in Section 19.6, which are very closely related to 
distance functions, are treated in this book as a topic in algebra. Distance functions could be regarded in the 
same way as a topic in algebra. Even the use of a supremum in Definition 37.4.6 may be regarded as non- 
analytical because the general concept of a supremum is introduced in a very low layer in Definition 11.2.4, 
and the order properties of real numbers are part of the real number system in Section 15.6. In fact, distance 
functions are used in classical Euclidean geometry without necessarily relating the distance to any continuity 
or topological concepts. 


Metric spaces could thus be split into an algebraic sub-topic and an analytical and topological sub-topic. 
However, in the differential geometry context, the analytical and topological relevance of metric spaces is the 
principal motivation for defining them. The benefits of splitting metric spaces into algebraic and analytical
stages are outweighed by the benefits of an integrated presentation. 


37.0.3 REMARK: A Riemannian metric space is a special kind of metric space. 

Metric spaces (in the context of topology) should not be confused with the concept of a Riemannian metric 
(in the context of differential geometry). A Riemannian manifold is a particular kind of differentiable metric 
space, and a Riemannian metric is a particular kind of differential of a two-point metric on a manifold. In 
terms of conceptual layering, metric spaces are lower than Riemannian manifolds. Metric spaces are a more 
general concept which is closely associated with general topology. 


37.0.4 REMARK: In metric spaces, many distinct general topology concepts become identical. 

Just as the Riemannian metric tensor field simplifies many differential geometry definitions, so also a two- 
point metric function simplifies many topology definitions. Many distinct concepts in general topology 
become equivalent concepts when the topology is induced by a metric. A good example of this is compactness. 
Several different definitions of compactness for general topological spaces are equivalent for metric spaces. 


37.0.5 REMARK: History of metric spaces. 
According to Moore [371], page 235, it was Felix Hausdorff who introduced the name and concept of metric 
spaces in his 1914 book. (However, see Remark 31.1.3 for a different opinion.) 


37.1. General metric functions 


37.1.1 REMARK: Minimal requirements for defining a metric function. 

The space of values of the distance d(x, y) between points x and y in a set M requires at least a total order 
and a commutative addition operation. These requirements are met by ordered commutative groups. (See 
Definition 17.5.8.) Hence Definition 37.1.2 specifies that distance values lie in an ordered commutative group. 
Any ordered ring or ordered field may be used instead. 


37.1.2 DEFINITION: A metric (function) or distance function on a set $M$, valued in an ordered commutative
group $A$, is a function $d : M \times M \to A$ such that:


(i) $\forall x, y \in M,\ d(x, y) = 0_A \Leftrightarrow x = y$, [identity]
(ii) $\forall x, y \in M,\ d(x, y) = d(y, x)$, [symmetry]
(iii) $\forall x, y, z \in M,\ d(x, y) \le d(x, z) + d(z, y)$. [triangle inequality]
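
The three conditions of Definition 37.1.2 can be tested by exhaustive search when the set is finite. The following sketch assumes an integer-valued distance, so that the ordered commutative group is the integers. The function name `is_metric` and the two sample functions are hypothetical choices for illustration.

```python
from itertools import product

def is_metric(points, d, zero=0):
    """Check identity, symmetry and the triangle inequality on a finite set."""
    pts = list(points)
    identity = all((d(x, y) == zero) == (x == y) for x, y in product(pts, repeat=2))
    symmetry = all(d(x, y) == d(y, x) for x, y in product(pts, repeat=2))
    triangle = all(
        d(x, y) <= d(x, z) + d(z, y) for x, y, z in product(pts, repeat=3)
    )
    return identity and symmetry and triangle

points = range(-3, 4)
# The absolute difference is an integer-valued metric.
assert is_metric(points, lambda x, y: abs(x - y))
# The squared difference satisfies identity and symmetry but not the triangle inequality.
assert not is_metric(points, lambda x, y: (x - y) ** 2)
```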


37.1.3 DEFINITION: A metric space valued in an ordered commutative group $A$ is a pair $M \mathrel{-\!\!<} (M, d)$ where
$M$ is a set and $d : M \times M \to A$ is a metric function valued in $A$.


37.1.4 THEOREM: Some basic bounds for general metric functions. 

Let $(M, d)$ be a metric space valued in an ordered commutative group $A$. Then
(i) $\forall x, y \in M,\ d(x, y) \ge 0_A$.

(ii) $\forall x, y, z \in M,\ d(x, y) \ge |d(x, z) - d(z, y)|$.


PROOF: To show part (i), let $(M, d)$ be a metric space valued in an ordered commutative group $A$ so that
$d : M \times M \to A$ is a metric function on $M$. Let $x, y \in M$. Then $0_A = d(x, x) \le d(x, y) + d(y, x) = d(x, y) + d(x, y)$ by
Definition 37.1.2. Suppose $d(x, y) < 0_A$. Then $d(x, y) + d(x, y) < d(x, y) < 0_A$ by Definition 17.5.7. This is
a contradiction. Therefore $d(x, y) \ge 0_A$.


For part (ii), let $x, y, z \in M$. Then by Definition 37.1.2 (iii), $d(x, z) \le d(x, y) + d(y, z)$ and $d(z, y) \le
d(z, x) + d(x, y)$. So $d(x, z) - d(y, z) \le d(x, y)$ and $d(z, y) - d(z, x) \le d(x, y)$. Hence $|d(x, z) - d(z, y)| =
\max(d(x, z) - d(z, y), d(z, y) - d(x, z)) = \max(d(x, z) - d(y, z), d(z, y) - d(z, x)) \le d(x, y)$.


37.1.5 REMARK: The metric function definition implies positive distance values. 

It follows from Theorem 37.1.4 (i) that one may write $d : M \times M \to A_0^+$ in Definition 37.1.2, where $A_0^+ =
\{a \in A;\ a \ge 0_A\}$, without changing the definition. Thus only half of the ordered commutative group $A$ is
used in the definition. This suggests that one could make do with an ordered commutative monoid instead.


37.1.6 REMARK: The metric function definition implies lower bounds on distances.
Theorem 37.1.4 (ii) puts a lower bound on distances corresponding to the upper bound in the triangle
inequality. Combining the bounds gives

$\forall x, y, z \in M,\ |d(x, z) - d(z, y)| \le d(x, y) \le d(x, z) + d(z, y)$.

These inequalities become clear by drawing lines and circles on paper. Theorems 37.3.8 and 37.3.12 suggest 
how to draw such lines and circles. 
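
The lower bound of Theorem 37.1.4 (ii) and the upper bound of the triangle inequality can be spot-checked together on random points. This is only a numerical sketch under stated assumptions: the metric is the usual Euclidean distance on the plane, computed with the standard library function `math.dist`, and the helper name `bounds_hold` and the tolerance are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so that the sample is reproducible

def d(x, y):
    """Usual Euclidean distance between two points of the plane."""
    return math.dist(x, y)

def bounds_hold(x, y, z, tol=1e-12):
    """Check |d(x,z) - d(z,y)| <= d(x,y) <= d(x,z) + d(z,y) up to rounding."""
    lower = abs(d(x, z) - d(z, y))
    upper = d(x, z) + d(z, y)
    return lower - tol <= d(x, y) <= upper + tol

def random_point():
    return (random.uniform(-5, 5), random.uniform(-5, 5))

samples = [(random_point(), random_point(), random_point()) for _ in range(1000)]
assert all(bounds_hold(x, y, z) for x, y, z in samples)
```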


37.1.7 REMARK: An abstract algebraic norm induces a metric function. 
The general kind of norm in Definition 19.6.2 induces a metric on a unitary module over a ring. (See 
Definition 19.3.6 for unitary modules over rings.) 


37.1.8 THEOREM: Canonical construction of a metric function from a norm. 

Let $M$ be a unitary module over a ring $R$. Let $\psi : M \to S$ be a norm on $M$ which is compatible with an
absolute value function $\phi : R \to S$ for an ordered ring $S$. Then $d : M \times M \to S$ defined by $d : (x, y) \mapsto \psi(x - y)$
is a metric function on $M$, valued in $S$.


PROOF: Let $M$ be a unitary module over a ring $R$. Let $\psi : M \to S$ be a norm on $M$ which is compatible
with an absolute value function $\phi : R \to S$ for an ordered ring $S$. Then $S$ is an ordered commutative group
with respect to the addition operation on $S$, by Definitions 17.5.7 and 18.3.3.


Define $d : M \times M \to S$ by $d : (x, y) \mapsto \psi(x - y)$. Then $d(x, x) = \psi(0_M) = 0_S$ for all $x \in M$ by
Theorem 19.6.4 (ii), and by Definition 19.6.2 part (iii), $d(x, y) = \psi(x - y) > 0_S$ for all $x, y \in M$ with $x \ne y$,
by part (i), $d(x, y) = \psi(x - y) = \psi((-1_R)(y - x)) = \phi(-1_R)\psi(y - x) = \phi(1_R)\psi(y - x) = 1_S\,\psi(y - x) = d(y, x)$
(by Theorems 18.5.10 (iii) and 18.6.8), and by part (ii), $d(x, y) = \psi(x - y) = \psi((x - z) + (z - y)) \le
\psi(x - z) + \psi(z - y) = d(x, z) + d(z, y)$ for all $x, y, z \in M$. Therefore $d$ is a metric function on $M$, valued in
the ordered commutative ring $S$.
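
As a concrete instance of Theorem 37.1.8, one may take the module of integer pairs over the ring of integers, with a norm valued in the integers. The sketch below assumes the 1-norm as the norm and the ordinary absolute value as the compatible absolute value function. The names `psi` and `d` mirror the theorem, and the finite grid `pts` is a hypothetical test set.

```python
from itertools import product

def psi(v):
    """1-norm on pairs of integers, valued in the ordered ring of integers."""
    return abs(v[0]) + abs(v[1])

def d(x, y):
    """Distance induced by the norm: d(x, y) = psi(x - y)."""
    return psi((x[0] - y[0], x[1] - y[1]))

pts = list(product(range(-2, 3), repeat=2))  # a 5 x 5 grid of sample points

# Identity, symmetry and triangle inequality, checked exhaustively on the grid.
assert all((d(x, y) == 0) == (x == y) for x in pts for y in pts)
assert all(d(x, y) == d(y, x) for x in pts for y in pts)
assert all(d(x, y) <= d(x, z) + d(z, y) for x in pts for y in pts for z in pts)
```

Because all values are integers, the checks are exact and involve no rounding error.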


37.1.9 REMARK: Pseudo-metric (or pseudometric) spaces. 

Many authors define a pseudo-metric (function) to be a metric function as in Definition 37.1.2, but with 
the identity condition (i) weakened to require only that $d(x, y) = 0$ if $x = y$. In other words, there may be
distinct points $x$ and $y$ with $d(x, y) = 0$. (See for example Gaal [77], pages 38-40, 48-49, 64-65, 93, 120-
122, 132-133, 155-156, 159-160, 164-169, 276; Kelley [101], pages 118-126, 129-130, 184-190; Willard [165],
pages 16-17, 85; Gemignani [80], pages 92-93; Kasriel [100], pages 237-239; Steen/Seebach [141], page 34;
Simmons [137], page 58.) The applicability of pseudo-metrics to differential geometry may be somewhat 
limited, as mentioned in Remark 49.5.4. 


37.2. Real-valued metric functions 


37.2.1 REMARK: Metric functions of interest in analysis are usually real-valued. 

The general metric functions in Section 37.1 have an algebraic character. If, for example, a metric function 
is valued in the ordered commutative group of integers, the definitions of limits and convergence would not 
have much analytic interest. Analytically interesting metric functions are typically real-valued. 


37.2.2 REMARK:  Real-valued metric functions always have non-negative values. 
The real-valued metric function in Definition 37.2.3 has the requirement $d : M \times M \to \mathbb{R}_0^+$. By Remark 37.1.5,
this may be replaced with $d : M \times M \to \mathbb{R}$ without changing the definition.


37.2.3 DEFINITION: A (real-valued) distance (function) or (real-valued) metric function on a set $M$ is a
function $d : M \times M \to \mathbb{R}_0^+$ such that


(i) $\forall x, y \in M,\ d(x, y) = 0 \Leftrightarrow x = y$, [identity]
(ii) $\forall x, y \in M,\ d(x, y) = d(y, x)$, [symmetry]
(iii) $\forall x, y, z \in M,\ d(x, y) \le d(x, z) + d(z, y)$. [triangle inequality]


37.2.4 DEFINITION: A (real-valued) metric space is a pair M < (M,d) where M is a set and d is a real- 
valued metric function on M. 


37.2.5 REMARK: The discrete metric on any set.

A “discrete metric” can be defined for any set as in Definition 37.2.6. The topology induced by this metric as
in Definition 37.5.2 is the discrete topology in Definition 31.3.19. The trivial topology in Definition 31.3.18
corresponds to the “pseudo-metric” $(x, y) \mapsto 0$ for $x, y \in M$. (See Remark 37.1.9 for pseudo-metric spaces.)


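37.2.6 DEFINITION: The discrete metric on a set $X$ is the function $d : X \times X \to \mathbb{R}$ given by


$\forall x, y \in X,\ d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \ne y. \end{cases}$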
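
A minimal sketch of the discrete metric follows, assuming the standard two-valued form (0 for equal points, 1 for distinct points). The function names and the sample set are hypothetical. The final check illustrates why the induced topology is discrete: every open ball of radius 1 is a singleton.

```python
def discrete_metric(x, y):
    """Discrete metric: 0 if the points are equal, 1 otherwise."""
    return 0 if x == y else 1

def open_ball(points, d, x, r):
    """Open ball with centre x and radius r in a finite set of points."""
    return {y for y in points if d(x, y) < r}

X = {"a", "b", "c"}
# Every singleton is an open ball of radius 1, so every subset is open.
assert all(open_ball(X, discrete_metric, x, 1) == {x} for x in X)
```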


37.2.7 REMARK: The relation between distance functions and Riemannian metric tensor fields. 

The Riemannian metric tensor in Section 73.2 is also often referred to simply as “the metric” on a manifold.
This could be confused with the metric in Definition 37.2.3. When there is a possibility of confusion, the
metric in Definition 37.2.3 may be referred to as a “point-to-point metric”, “two-point metric” or “distance
function”. The Riemannian concept may be called a “Riemannian metric” or “metric tensor”. The two
concepts are closely related, as discussed in Section 73.9. It turns out that the Riemannian metric can be 
expressed as a differential of the corresponding two-point distance function. 


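37.2.8 DEFINITION: The usual metric on $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$ is the function $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_0^+$ defined by
$d(x, y) = \bigl(\sum_{i=1}^n (x_i - y_i)^2\bigr)^{1/2}$ for $x, y \in \mathbb{R}^n$.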


37.2.9 REMARK: Distance function for a finite semi-open real interval with the circle topology. 
The “circle distance function” for a finite semi-open real-number interval in Definition 37.2.10 induces the 
“circle topology” in Definition 32.6.14. 


37.2.10 DEFINITION: The circle distance function for a finite semi-open real-number interval $I = [a, a + L)$
or $I = (a, a + L]$, for $a \in \mathbb{R}$ and $L \in \mathbb{R}^+$, is the function $d : I \times I \to \mathbb{R}_0^+$ given by


$\forall x, y \in I,\ d(x, y) = \min(|y - x|, L - |y - x|)$.
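
The circle distance function translates directly into code. In this sketch the function name `circle_distance` and the sample interval $[0, 10)$ are hypothetical. Integer arguments are used so that the results are exact.

```python
def circle_distance(x, y, length):
    """Circle distance between x and y in a semi-open interval of the given length."""
    gap = abs(y - x)
    return min(gap, length - gap)

# On the interval [0, 10), the points 1 and 9 are closer "around the circle".
assert circle_distance(1, 9, 10) == 2
assert circle_distance(1, 4, 10) == 3
```

The offset $a$ of the interval does not appear as a parameter because the formula depends only on the difference $y - x$ and the length $L$.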


37.2.11 REMARK: Distance function for semi-open real intervals product with torus topology. 

The extension of the circle-topology distance function in Definition 37.2.10 to torus-topology distance
functions on finite products of finite semi-open intervals in Definition 37.2.12 induces the torus topology in
Definition 32.6.15. This distance function is illustrated in Figure 37.2.1 for the case $n = 2$, $I_1 = [0, 4)$,
$I_2 = [0, 3)$ and $y = (0, 0)$.


[Figure 37.2.1 Distance function on a torus $M = [0, 4) \times [0, 3)$, showing the points at distance $r$ from $y = (0, 0)$]


37.2.12 DEFINITION: The torus distance function for a finite product of finite semi-open real intervals is
the distance function $d : M \times M \to \mathbb{R}_0^+$ given by


$\forall x, y \in M,\ d(x, y) = \Bigl(\sum_{k=1}^n \min(|y_k - x_k|, L_k - |y_k - x_k|)^2\Bigr)^{1/2}$


for a set $M = \times_{k=1}^n I_k$, where for each $k \in \mathbb{N}_n$, the set $I_k$ is a semi-open interval $I_k = [a_k, a_k + L_k)$ or
$I_k = (a_k, a_k + L_k]$ for some $a_k \in \mathbb{R}$ and $L_k \in \mathbb{R}^+$.
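
The torus distance function can be sketched as a Euclidean combination of per-coordinate circle distances. The function name `torus_distance` is hypothetical. The sample product $[0, 4) \times [0, 3)$ with base point $(0, 0)$ matches the case described for Figure 37.2.1.

```python
import math

def torus_distance(x, y, lengths):
    """Torus distance on a product of semi-open intervals with the given lengths."""
    total = 0.0
    for xk, yk, length_k in zip(x, y, lengths):
        gap = abs(yk - xk)
        # Per-coordinate circle distance, combined with the 2-norm.
        total += min(gap, length_k - gap) ** 2
    return math.sqrt(total)

lengths = (4, 3)  # M = [0,4) x [0,3)
# The point (3, 0) is at distance 1 from (0, 0), going "around" the first factor.
assert torus_distance((0, 0), (3, 0), lengths) == 1.0
# The point (3.5, 2.5) is close to (0, 0) in both coordinates.
assert math.isclose(torus_distance((0, 0), (3.5, 2.5), lengths), math.sqrt(0.5))
```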


37.2.13 THEOREM: Subsets of metric spaces are metric spaces.


Let $(M, d)$ be a metric space. Let $M' \in \mathbb{P}(M)$ and $d' = d|_{M' \times M'}$. Then $(M', d')$ is a metric space.


PROOF: The distance function $d'$ satisfies Definition 37.2.3 if $d$ does. In particular, $d'(x, y)$ is well defined
for all $x, y \in M'$.


37.2.14 DEFINITION: A metric subspace of a metric space $(M, d)$ is a metric space $(M', d')$ such that
$M' \in \mathbb{P}(M)$ and $d' = d|_{M' \times M'}$.


37.2.15 REMARK: Cartesian products of metric spaces. 

Whereas the Cartesian product of two topological spaces has only one standard product topology, which is 
given in Definition 32.9.4, the Cartesian product of two metric spaces can have multiple standard product 
metrics. The most common product metrics use one of the p-norms in Definition 24.7.11 to combine two 
or more metrics. Definition 37.2.16 uses the 2-norm. The choice of norm clearly does influence the metric 
on the Cartesian product, but it does not influence the topology, assuming a finite number of spaces in the 
product. 


37.2.16 DEFINITION: The Cartesian product of metric spaces $(X, d_X)$ and $(Y, d_Y)$ is the pair $(X \times Y, d)$,
where $d : (X \times Y) \times (X \times Y) \to \mathbb{R}_0^+$ is given by


$\forall x_1, x_2 \in X,\ \forall y_1, y_2 \in Y,\quad d((x_1, y_1), (x_2, y_2)) = |(d_X(x_1, x_2), d_Y(y_1, y_2))|_2$
$= (d_X(x_1, x_2)^2 + d_Y(y_1, y_2)^2)^{1/2}$.
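
The dependence of the product metric on the choice of $p$-norm can be seen in a small example. This sketch is not from the text: the factory function `product_metric` and its parameter `p` are hypothetical, and both factor spaces are taken to be the real line with the absolute-difference metric. Definition 37.2.16 corresponds to `p=2`.

```python
import math

def product_metric(d_x, d_y, p=2):
    """Combine two metrics into a metric on pairs, using the p-norm."""
    def d(a, b):
        (x1, y1), (x2, y2) = a, b
        u, v = d_x(x1, x2), d_y(y1, y2)
        if p == math.inf:
            return max(u, v)
        return (u ** p + v ** p) ** (1 / p)
    return d

def d_real(s, t):
    """Usual metric on the real line."""
    return abs(s - t)

d1 = product_metric(d_real, d_real, p=1)
d2 = product_metric(d_real, d_real, p=2)
d_inf = product_metric(d_real, d_real, p=math.inf)

a, b = (0, 0), (3, 4)
# The three product metrics give different distances for the same pair of points.
assert d1(a, b) == 7.0
assert d2(a, b) == 5.0
assert d_inf(a, b) == 4
```

The three values differ, but they bound one another by fixed constant factors, which is consistent with the statement that the choice of norm does not influence the topology of a finite product.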


37.2.17 THEOREM: The Cartesian product of two metric spaces is a metric space. 
The Cartesian product of two metric spaces is a metric space.


PROOF: Let $(X \times Y, d)$ be as in Definition 37.2.16. Let $z_i = (x_i, y_i) \in X \times Y$ for $i = 1, 2, 3$. Then
$d(z_1, z_2) = d(z_2, z_1)$, and $d(z_1, z_3) \le d(z_1, z_2) + d(z_2, z_3)$ by Theorem 16.8.9 (ii). If $z_1 = z_2$, then $d(z_1, z_2) = 0$.
If $d(z_1, z_2) = 0$, then $z_1 = z_2$. Hence $(X \times Y, d)$ is a metric space by Definitions 37.2.3 and 37.2.4.


37.3. Balls 


37.3.1 DEFINITION: 
The open ball in a metric space $(M, d)$ with centre $x \in M$ and radius $r \in \mathbb{R}_0^+$ is the set $\{y \in M;\ d(x, y) < r\}$.


The closed ball in a metric space $(M, d)$ with centre $x \in M$ and radius $r \in \mathbb{R}_0^+$ is the set $\{y \in M;\ d(x, y) \le r\}$.


37.3.2 NOTATION: 

$B_{x,r}$, for a metric space $(M, d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, denotes the open ball $\{y \in M;\ d(x, y) < r\}$ in $M$ with
centre $x$ and radius $r$.

$\bar{B}_{x,r}$, for a metric space $(M, d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, denotes the closed ball $\{y \in M;\ d(x, y) \le r\}$ in $M$ with
centre $x$ and radius $r$.

$B_r(x)$, for a metric space $(M, d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, is an alternative notation for $B_{x,r}$.

$\bar{B}_r(x)$, for a metric space $(M, d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, is an alternative notation for $\bar{B}_{x,r}$.

37.3.3 REMARK: Diagrams for visualising open and closed balls. 


Definition 37.3.1 is illustrated in Figure 37.3.1. General metric spaces are quite different to $\mathbb{R}^2$, but such
diagrams are often helpful for understanding general theorems. 


37.3.4 REMARK: A closed ball might not be the closure of the open ball with the same centre and radius. 
The words “open” and “closed” in Definition 37.3.1 are only loosely related to the concepts of open and 
closed sets in the topology induced by the metric in Definition 37.5.2. Likewise, the use of the bar in the 
notation for a closed ball does not generally mean that it is the topological closure of the corresponding open 
ball. Even in the spaces $\mathbb{R}^n$, closed balls with zero radius are not topological closures of the corresponding
open balls. In general metric spaces, the relation between open and closed balls is even looser. An open ball
is always open in the induced topology of a metric space by Theorem 37.5.6 (i). Similarly, a closed ball is
always closed in the induced topology by Theorem 37.5.13. But there are examples where $\overline{B_{x,r}} \ne \bar{B}_{x,r}$
(the topological closure of the open ball differs from the closed ball), as mentioned in Remark 37.5.14.
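
One simple way to see that a closed ball need not be the closure of the open ball with the same centre and radius is to use a discrete metric on a finite set. The sketch below is an illustration under assumptions, not necessarily the example intended in Remark 37.5.14. The helper `closure` relies on the fact that, in a finite metric space, a point lies in the closure of a subset exactly when its distance to some element of the subset is zero.

```python
def discrete_metric(x, y):
    """Discrete metric: 0 if the points are equal, 1 otherwise."""
    return 0 if x == y else 1

def open_ball(points, d, x, r):
    """Open ball: points at distance strictly less than r from x."""
    return {y for y in points if d(x, y) < r}

def closed_ball(points, d, x, r):
    """Closed ball: points at distance at most r from x."""
    return {y for y in points if d(x, y) <= r}

def closure(points, d, subset):
    """Topological closure of a subset of a FINITE metric space."""
    return {y for y in points if any(d(y, s) == 0 for s in subset)}

M = {"a", "b", "c"}
ball = open_ball(M, discrete_metric, "a", 1)
# The open ball of radius 1 is a singleton, and it is already closed.
assert ball == {"a"}
assert closure(M, discrete_metric, ball) == {"a"}
# The closed ball of radius 1 is the whole space, which is strictly larger.
assert closed_ball(M, discrete_metric, "a", 1) == M
```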


Figure 37.3.1 Open and closed balls in a metric space


37.3.5 REMARK: Properties and triangle inequality conditions, assuming identity and symmetry. 
Definition 37.3.1 has an unambiguous meaning for any function $d : M \times M \to \mathbb{R}_0^+$, for any set $M$, whether
or not the conditions of Definition 37.2.3 are satisfied. If only the identity and symmetry conditions are 
guaranteed, one still obtains some useful properties of open and closed balls. 


Theorem 37.3.6 gives some properties of balls when only the identity and symmetry conditions are guaranteed. 
Theorems 37.3.8 and 37.3.12 give some conditions which are equivalent to the triangle inequality when the 
identity and symmetry conditions are assumed. 


37.3.6 THEOREM: Some properties of functions satisfying identity and symmetry conditions. 
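On a set $M$, let $d : M \times M \to \mathbb{R}_0^+$ be a function which satisfies the identity and symmetry conditions in
Definition 37.2.3. Then the following properties hold.
(i) $\forall x \in M,\ B_0(x) = \emptyset$.
(ii) $\forall x \in M,\ \bar{B}_0(x) = \{x\}$.
(iii) $\forall x \in M,\ \forall r \in \mathbb{R}^+,\ x \in B_r(x)$.
(iv) $\forall x \in M,\ \forall r \in \mathbb{R}_0^+,\ x \in \bar{B}_r(x)$.
(v) $\forall x, y \in M,\ \forall r \in \mathbb{R}_0^+,\ x \in B_r(y) \Leftrightarrow y \in B_r(x)$.
(vi) $\forall x, y \in M,\ \forall r \in \mathbb{R}_0^+,\ x \in \bar{B}_r(y) \Leftrightarrow y \in \bar{B}_r(x)$.
(vii) $\forall x \in M,\ B_\infty(x) = \bar{B}_\infty(x) = M$.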


PROOF: For part (i), $B_0(x) = \{y \in M;\ d(x,y) < 0\} = \emptyset$ by Theorem 37.1.4 (i) for all $x \in M$.

For part (ii), $\bar B_0(x) = \{y \in M;\ d(x,y) \le 0\} = \{x\}$ by Theorem 37.1.4 (i) and Definition 37.2.3 (i) for
all $x \in M$.

Parts (iii) and (iv) follow from Definition 37.2.3 (i).

Parts (v) and (vi) follow from Definition 37.2.3 (ii) and Notation 37.3.2.

Part (vii) follows from Definitions 37.2.3 and 16.2.13.
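
The properties in Theorem 37.3.6 can be checked mechanically on a small finite example. The following Python sketch is only an illustration: the names `open_ball`, `closed_ball`, `TABLE` and the three-point function `d` are hypothetical, chosen so that the identity and symmetry conditions hold while the triangle inequality fails.

```python
import math

def open_ball(M, d, x, r):
    """Open ball: the points y of M with d(x, y) < r."""
    return {y for y in M if d(x, y) < r}

def closed_ball(M, d, x, r):
    """Closed ball: the points y of M with d(x, y) <= r."""
    return {y for y in M if d(x, y) <= r}

# Hypothetical example: d is symmetric and vanishes only on the diagonal,
# but d(0, 2) = 5 > d(0, 1) + d(1, 2) = 2, so d is not a metric.
M = [0, 1, 2]
TABLE = {(0, 1): 1, (1, 2): 1, (0, 2): 5}

def d(x, y):
    return 0 if x == y else TABLE[(min(x, y), max(x, y))]

radii = [0, 1, 2, 5, math.inf]
checks = {
    "(i)": all(open_ball(M, d, x, 0) == set() for x in M),
    "(ii)": all(closed_ball(M, d, x, 0) == {x} for x in M),
    "(iii)": all(x in open_ball(M, d, x, r) for x in M for r in radii if r > 0),
    "(iv)": all(x in closed_ball(M, d, x, r) for x in M for r in radii),
    "(v)": all((x in open_ball(M, d, y, r)) == (y in open_ball(M, d, x, r))
               for x in M for y in M for r in radii),
    "(vi)": all((x in closed_ball(M, d, y, r)) == (y in closed_ball(M, d, x, r))
                for x in M for y in M for r in radii),
    "(vii)": all(open_ball(M, d, x, math.inf) == set(M) == closed_ball(M, d, x, math.inf)
                 for x in M),
}
print(checks)
```

All seven checks succeed even though `d` violates the triangle inequality, which is consistent with the theorem using only the identity and symmetry conditions.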


37.3.7 REMARK: Visualisation of the triangle inequality. 
There are many ways to express the triangle inequality in terms of balls. Some examples are given in
Theorem 37.3.8. Parts (i) and (iii) are illustrated in Figure 37.3.2.


37.3.8 THEOREM: Some equivalent conditions for the triangle inequality. 
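Let $d : M \times M \to \mathbb{R}_0^+$ be a function on a set $M$ which satisfies the identity and symmetry conditions. Then
the triangle inequality for $d$ is equivalent to each of the following conditions.

(i) $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ y \in \bar B_{r_1}(x) \Rightarrow \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x)$.
(ii) $\forall x \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ \bigcup_{y \in \bar B_{r_1}(x)} \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x)$.
(iii) $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ \bar B_{r_1}(x) \cap \bar B_{r_2}(y) \ne \emptyset \Rightarrow y \in \bar B_{r_1+r_2}(x)$.
(iv) $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ y \notin \bar B_{r_1+r_2}(x) \Rightarrow \bar B_{r_1}(x) \cap \bar B_{r_2}(y) = \emptyset$.

Figure 37.3.2 Triangle inequality equivalents in Theorem 37.3.8 (i) and (iii). Panel labels:
$y \in \bar B_{r_1}(x) \Rightarrow \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x)$ and
$\bar B_{r_1}(x) \cap \bar B_{r_2}(y) \ne \emptyset \Rightarrow y \in \bar B_{r_1+r_2}(x)$.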


PROOF: Let $d : M \times M \to \mathbb{R}_0^+$ be a function on a set $M$ which satisfies the identity and symmetry
conditions for a metric in Definition 37.2.3. It must be shown that the triangle inequality condition for $d$ is
then equivalent to each of the given alternative conditions.

For part (i), let $d$ satisfy the triangle inequality: $\forall x, y, z \in M,\ d(x,z) \le d(x,y) + d(y,z)$. Suppose $x, y \in M$
and $r_1, r_2 \in \mathbb{R}_0^+$. Let $y \in \bar B_{r_1}(x)$. Then $d(x,y) \le r_1$ by Notation 37.3.2. Suppose that $z \in \bar B_{r_2}(y)$. Then
$d(y,z) \le r_2$ by Notation 37.3.2. So $d(x,z) \le d(x,y) + d(y,z) \le r_1 + r_2$ by the triangle inequality. Therefore
$z \in \bar B_{r_1+r_2}(x)$. Hence $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ (y \in \bar B_{r_1}(x) \Rightarrow \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x))$, which is condition (i).
Now suppose that condition (i) holds. Let $x, y, z \in M$. Let $r_1 = d(x,y)$ and $r_2 = d(y,z)$. Then $y \in \bar B_{r_1}(x)$
and $z \in \bar B_{r_2}(y)$. So $z \in \bar B_{r_1+r_2}(x)$ by condition (i). So $d(x,z) \le r_1 + r_2 = d(x,y) + d(y,z)$ by Notation 37.3.2.
Therefore $\forall x, y, z \in M,\ d(x,z) \le d(x,y) + d(y,z)$. So the triangle inequality holds. Hence the triangle
inequality is equivalent to condition (i).

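For part (ii), note that $z \in \bigcup_{y \in \bar B_{r_1}(x)} \bar B_{r_2}(y)$ means $\exists y \in M,\ (y \in \bar B_{r_1}(x) \wedge z \in \bar B_{r_2}(y))$ (by Notation 8.4.2),
which is equivalent to $\exists y \in \bar B_{r_1}(x),\ z \in \bar B_{r_2}(y)$ (by Notation 7.2.7). So

\[
\begin{aligned}
\bigcup_{y \in \bar B_{r_1}(x)} \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x)
&\Leftrightarrow \forall z \in M,\ \bigl((\exists y \in \bar B_{r_1}(x),\ z \in \bar B_{r_2}(y)) \Rightarrow z \in \bar B_{r_1+r_2}(x)\bigr) \\
&\Leftrightarrow \forall z \in M,\ \forall y \in \bar B_{r_1}(x),\ \bigl(z \in \bar B_{r_2}(y) \Rightarrow z \in \bar B_{r_1+r_2}(x)\bigr) \\
&\Leftrightarrow \forall y \in \bar B_{r_1}(x),\ \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x).
\end{aligned}
\]

Therefore the condition $\forall x \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ \bigcup_{y \in \bar B_{r_1}(x)} \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x)$ is equivalent to the condition
$\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ (y \in \bar B_{r_1}(x) \Rightarrow \bar B_{r_2}(y) \subseteq \bar B_{r_1+r_2}(x))$, which is condition (i). Hence condition (ii) is
equivalent to condition (i), which is equivalent to the triangle inequality.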

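For part (iii), let $x, y \in M$ and $r_1, r_2 \in \mathbb{R}_0^+$. Then

\[
\begin{aligned}
\bar B_{r_1}(x) \cap \bar B_{r_2}(y) \ne \emptyset
&\Leftrightarrow \exists z \in M,\ z \in (\bar B_{r_1}(x) \cap \bar B_{r_2}(y)) \\
&\Leftrightarrow \exists z \in M,\ (z \in \bar B_{r_1}(x) \wedge z \in \bar B_{r_2}(y)) \\
&\Leftrightarrow \exists z \in M,\ (z \in \bar B_{r_1}(x) \wedge y \in \bar B_{r_2}(z)) \qquad (37.3.1) \\
&\Leftrightarrow \exists z \in \bar B_{r_1}(x),\ y \in \bar B_{r_2}(z) \\
&\Leftrightarrow y \in \bigcup_{z \in \bar B_{r_1}(x)} \bar B_{r_2}(z).
\end{aligned}
\]

(Line (37.3.1) follows from Theorem 37.3.6 (vi).) So by swapping the quantifier "$\forall y \in M$" with the other
quantifiers, the condition $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ \bar B_{r_1}(x) \cap \bar B_{r_2}(y) \ne \emptyset \Rightarrow y \in \bar B_{r_1+r_2}(x)$ is equivalent to the
condition $\forall x \in M,\ \forall r_1, r_2 \in \mathbb{R}_0^+,\ \bigcup_{z \in \bar B_{r_1}(x)} \bar B_{r_2}(z) \subseteq \bar B_{r_1+r_2}(x)$. In other words, condition (iii) is equivalent
to condition (ii). Hence it is equivalent to the triangle inequality.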


The quantified logical expression in part (iv) is the contrapositive of the corresponding expression in part (iii). 
So it is logically equivalent by Theorem 4.5.7 part (xxvii). Hence part (iv) is equivalent to part (iii). 
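
The equivalence between the triangle inequality and condition (i) of Theorem 37.3.8 can be illustrated on a finite set. The sketch below is hedged: the function names and the two example distance functions are hypothetical, and the radii are restricted to the finitely many values attained by `d`. That restriction is enough to detect a failure because the converse direction of the proof only uses $r_1 = d(x,y)$ and $r_2 = d(y,z)$.

```python
from itertools import product

def closed_ball(M, d, x, r):
    """Closed ball: the points y of M with d(x, y) <= r."""
    return {y for y in M if d(x, y) <= r}

def satisfies_triangle_inequality(M, d):
    return all(d(x, z) <= d(x, y) + d(y, z) for x, y, z in product(M, repeat=3))

def satisfies_condition_i(M, d):
    """Condition (i) of Theorem 37.3.8, tested only for radii attained by d."""
    radii = sorted({d(x, y) for x in M for y in M})
    return all(
        closed_ball(M, d, y, r2) <= closed_ball(M, d, x, r1 + r2)
        for x in M for r1 in radii for r2 in radii
        for y in closed_ball(M, d, x, r1)
    )

M = [0, 1, 2]

def metric(x, y):
    # The usual distance on the integers 0, 1, 2.
    return abs(x - y)

BAD = {(0, 1): 1, (1, 2): 1, (0, 2): 5}

def non_metric(x, y):
    # Identity and symmetry hold, but d(0, 2) > d(0, 1) + d(1, 2).
    return 0 if x == y else BAD[(min(x, y), max(x, y))]

results = {
    "metric": (satisfies_triangle_inequality(M, metric),
               satisfies_condition_i(M, metric)),
    "non_metric": (satisfies_triangle_inequality(M, non_metric),
                   satisfies_condition_i(M, non_metric)),
}
print(results)
```

In both examples the two tests agree: they both succeed for `metric` and both fail for `non_metric`.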


37.3.9 REMARK: Some more equivalent conditions for the triangle inequality. 
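Theorem 37.3.10 lists some additional equivalent conditions for the triangle inequality. These use various
combinations of open and closed balls.

37.3.10 THEOREM: Some conditions for the triangle inequality using open balls.
Let $d : M \times M \to \mathbb{R}_0^+$ be a function on a set $M$ which satisfies the identity and symmetry conditions. Then
the triangle inequality for $d$ is equivalent to each of the following conditions.

(i) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ y \in B_{r_1}(x) \Rightarrow \bar B_{r_2}(y) \subseteq B_{r_1+r_2}(x)$.
(ii) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}_0^+,\ \forall r_2 \in \mathbb{R}^+,\ y \in \bar B_{r_1}(x) \Rightarrow B_{r_2}(y) \subseteq B_{r_1+r_2}(x)$.
(iii) $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}^+,\ y \in B_{r_1}(x) \Rightarrow B_{r_2}(y) \subseteq B_{r_1+r_2}(x)$.
(iv) $\forall x \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ \bigcup_{y \in B_{r_1}(x)} \bar B_{r_2}(y) \subseteq B_{r_1+r_2}(x)$.
(v) $\forall x \in M,\ \forall r_1 \in \mathbb{R}_0^+,\ \forall r_2 \in \mathbb{R}^+,\ \bigcup_{y \in \bar B_{r_1}(x)} B_{r_2}(y) \subseteq B_{r_1+r_2}(x)$.
(vi) $\forall x \in M,\ \forall r_1, r_2 \in \mathbb{R}^+,\ \bigcup_{y \in B_{r_1}(x)} B_{r_2}(y) \subseteq B_{r_1+r_2}(x)$.
(vii) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ B_{r_1}(x) \cap \bar B_{r_2}(y) \ne \emptyset \Rightarrow y \in B_{r_1+r_2}(x)$.
(viii) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}_0^+,\ \forall r_2 \in \mathbb{R}^+,\ \bar B_{r_1}(x) \cap B_{r_2}(y) \ne \emptyset \Rightarrow y \in B_{r_1+r_2}(x)$.
(ix) $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}^+,\ B_{r_1}(x) \cap B_{r_2}(y) \ne \emptyset \Rightarrow y \in B_{r_1+r_2}(x)$.
(x) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ y \notin B_{r_1+r_2}(x) \Rightarrow B_{r_1}(x) \cap \bar B_{r_2}(y) = \emptyset$.
(xi) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}_0^+,\ \forall r_2 \in \mathbb{R}^+,\ y \notin B_{r_1+r_2}(x) \Rightarrow \bar B_{r_1}(x) \cap B_{r_2}(y) = \emptyset$.
(xii) $\forall x, y \in M,\ \forall r_1, r_2 \in \mathbb{R}^+,\ y \notin B_{r_1+r_2}(x) \Rightarrow B_{r_1}(x) \cap B_{r_2}(y) = \emptyset$.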


PROOF: Let $d : M \times M \to \mathbb{R}_0^+$ be a function on a set $M$ which satisfies the identity and symmetry
conditions for a metric in Definition 37.2.3. It must be shown that the triangle inequality condition for $d$ is
then equivalent to each of the given alternative conditions.

For part (i), let $d$ satisfy the triangle inequality. Let $x, y \in M$, $r_1 \in \mathbb{R}^+$ and $r_2 \in \mathbb{R}_0^+$. Let $y \in B_{r_1}(x)$. Then
$d(x,y) < r_1$. Let $z \in \bar B_{r_2}(y)$. Then $d(y,z) \le r_2$. So $d(x,z) \le d(x,y) + d(y,z) < r_1 + r_2$ by the triangle
inequality. Therefore $z \in B_{r_1+r_2}(x)$. Hence $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ (y \in B_{r_1}(x) \Rightarrow \bar B_{r_2}(y) \subseteq
B_{r_1+r_2}(x))$, which is condition (i).

Now suppose that condition (i) holds. Let $x, y, z \in M$ and $\varepsilon \in \mathbb{R}^+$. Let $r_1 = d(x,y) + \varepsilon$ and $r_2 = d(y,z)$.
Then $y \in B_{r_1}(x)$ and $z \in \bar B_{r_2}(y)$, where $r_1 \in \mathbb{R}^+$ and $r_2 \in \mathbb{R}_0^+$. So $z \in B_{r_1+r_2}(x)$ by condition (i). So
$d(x,z) < r_1 + r_2 = d(x,y) + d(y,z) + \varepsilon$. So $\forall \varepsilon \in \mathbb{R}^+,\ d(x,z) < d(x,y) + d(y,z) + \varepsilon$. It follows by the usual
reductio ad absurdum argument that $d(x,z) \le d(x,y) + d(y,z)$. So the triangle inequality holds. Hence the
triangle inequality is equivalent to condition (i).


Part (ii) may be proved by some small modifications to the proof for part (i). Let $d$ satisfy the triangle
inequality. Let $x, y \in M$, $r_1 \in \mathbb{R}_0^+$ and $r_2 \in \mathbb{R}^+$. Let $y \in \bar B_{r_1}(x)$. Then $d(x,y) \le r_1$. Let $z \in B_{r_2}(y)$. Then
$d(y,z) < r_2$. So $d(x,z) \le d(x,y) + d(y,z) < r_1 + r_2$ by the triangle inequality. Therefore $z \in B_{r_1+r_2}(x)$.
Hence $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}_0^+,\ \forall r_2 \in \mathbb{R}^+,\ (y \in \bar B_{r_1}(x) \Rightarrow B_{r_2}(y) \subseteq B_{r_1+r_2}(x))$, which is condition (ii).

Now suppose that condition (ii) holds. Let $x, y, z \in M$ and $\varepsilon \in \mathbb{R}^+$. Let $r_1 = d(x,y)$ and $r_2 = d(y,z) + \varepsilon$.
Then $y \in \bar B_{r_1}(x)$ and $z \in B_{r_2}(y)$, where $r_1 \in \mathbb{R}_0^+$ and $r_2 \in \mathbb{R}^+$. So $z \in B_{r_1+r_2}(x)$ by condition (ii). So
$d(x,z) < r_1 + r_2 = d(x,y) + d(y,z) + \varepsilon$. So $\forall \varepsilon \in \mathbb{R}^+,\ d(x,z) < d(x,y) + d(y,z) + \varepsilon$. It follows by the
usual RAA argument that $d(x,z) \le d(x,y) + d(y,z)$. So the triangle inequality holds. Hence the triangle
inequality is equivalent to condition (ii).


Part (iii) may be proved by modifying the proofs of parts (i) and (ii). Let $d$ satisfy the triangle inequality.
Let $x, y \in M$ and $r_1, r_2 \in \mathbb{R}^+$. Let $y \in B_{r_1}(x)$. Then $d(x,y) < r_1$. Let $z \in B_{r_2}(y)$. Then $d(y,z) < r_2$.
So $d(x,z) \le d(x,y) + d(y,z) < r_1 + r_2$ by the triangle inequality. Therefore $z \in B_{r_1+r_2}(x)$. Hence $\forall x, y \in
M,\ \forall r_1, r_2 \in \mathbb{R}^+,\ (y \in B_{r_1}(x) \Rightarrow B_{r_2}(y) \subseteq B_{r_1+r_2}(x))$, which is condition (iii).

Now suppose that condition (iii) holds. Let $x, y, z \in M$ and $\varepsilon \in \mathbb{R}^+$. Let $r_1 = d(x,y) + \varepsilon$ and $r_2 = d(y,z) + \varepsilon$.
Then $y \in B_{r_1}(x)$ and $z \in B_{r_2}(y)$, where $r_1, r_2 \in \mathbb{R}^+$. So $z \in B_{r_1+r_2}(x)$ by condition (iii). So $d(x,z) <
r_1 + r_2 = d(x,y) + d(y,z) + 2\varepsilon$. So $\forall \varepsilon \in \mathbb{R}^+,\ d(x,z) < d(x,y) + d(y,z) + 2\varepsilon$. It follows by the usual RAA
argument that $d(x,z) \le d(x,y) + d(y,z)$. So the triangle inequality holds. Hence the triangle inequality is
equivalent to condition (iii).

For part (iv), $z \in \bigcup_{y \in B_{r_1}(x)} \bar B_{r_2}(y)$ is equivalent to $\exists y \in B_{r_1}(x),\ z \in \bar B_{r_2}(y)$. So

\[
\begin{aligned}
\bigcup_{y \in B_{r_1}(x)} \bar B_{r_2}(y) \subseteq B_{r_1+r_2}(x)
&\Leftrightarrow \forall z \in M,\ \bigl((\exists y \in B_{r_1}(x),\ z \in \bar B_{r_2}(y)) \Rightarrow z \in B_{r_1+r_2}(x)\bigr) \\
&\Leftrightarrow \forall z \in M,\ \forall y \in B_{r_1}(x),\ \bigl(z \in \bar B_{r_2}(y) \Rightarrow z \in B_{r_1+r_2}(x)\bigr) \\
&\Leftrightarrow \forall y \in B_{r_1}(x),\ \bar B_{r_2}(y) \subseteq B_{r_1+r_2}(x).
\end{aligned}
\]

Therefore the condition $\forall x \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ \bigcup_{y \in B_{r_1}(x)} \bar B_{r_2}(y) \subseteq B_{r_1+r_2}(x)$ is equivalent to the
condition $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ (y \in B_{r_1}(x) \Rightarrow \bar B_{r_2}(y) \subseteq B_{r_1+r_2}(x))$, which is condition (i).
Hence condition (iv) is equivalent to condition (i), which has been shown to be equivalent to the triangle
inequality.

Parts (v) and (vi) follow from parts (ii) and (iii) respectively in exactly the same way that part (iv) follows 
from part (i). 

Parts (vii), (viii) and (ix) follow from parts (iv), (v) and (vi) respectively in exactly the same way that 
Theorem 37.3.8 part (iii) follows from Theorem 37.3.8 part (ii). 

The quantified logical expressions in parts (x), (xi) and (xii) are contrapositives of the corresponding expres- 
sions in parts (vii), (viii) and (ix) respectively. Hence parts (x), (xi) and (xii) are equivalent to parts (vii), 
(viii) and (ix). 


37.3.11 REMARK: Visualisation of triangle inequality lower bounds. 
There are many ways to express the triangle inequality lower bound in Theorem 37.1.4 (ii) in terms of balls.
Some examples are given in Theorem 37.3.12. Parts (i) and (iv) are illustrated in Figure 37.3.3.

Figure 37.3.3 Triangle inequality equivalents in Theorem 37.3.12 (i) and (iv). Panel labels:
$y \notin B_{r_1}(x) \Rightarrow B_{r_2}(y) \cap \bar B_{r_1-r_2}(x) = \emptyset$ and
$B_{r_2}(y) \not\subseteq B_{r_1}(x) \Rightarrow y \notin \bar B_{r_1-r_2}(x)$.


37.3.12 THEOREM: Some equivalent “lower bound” conditions for the triangle inequality. 
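Let $d : M \times M \to \mathbb{R}_0^+$ be a function on a set $M$ which satisfies the identity and symmetry conditions. Then
the triangle inequality condition for $d$ is equivalent to each of the following conditions.

(i) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in (0, r_1],\ y \notin B_{r_1}(x) \Rightarrow B_{r_2}(y) \cap \bar B_{r_1-r_2}(x) = \emptyset$.
(ii) $\forall x \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in (0, r_1],\ \bar B_{r_1-r_2}(x) \cap \bigcup_{y \in M \setminus B_{r_1}(x)} B_{r_2}(y) = \emptyset$.
(iii) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in (0, r_1],\ y \in \bar B_{r_1-r_2}(x) \Rightarrow B_{r_2}(y) \subseteq B_{r_1}(x)$.
(iv) $\forall x, y \in M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in (0, r_1],\ B_{r_2}(y) \not\subseteq B_{r_1}(x) \Rightarrow y \notin \bar B_{r_1-r_2}(x)$.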


PROOF: Let $d : M \times M \to \mathbb{R}_0^+$ be a function on a set $M$ which satisfies the identity and symmetry
conditions for a metric in Definition 37.2.3. It must be shown that the triangle inequality condition for $d$ is
then equivalent to each of the given alternative conditions.

Part (i) is similar to Theorem 37.3.10 part (x). The set $\{(r_1, r_2);\ r_1 \in \mathbb{R}^+,\ r_2 \in \mathbb{R}_0^+\}$ is bijectively mapped
to $\{(r_1', r_2');\ r_1' \in \mathbb{R}^+,\ r_2' \in (0, r_1']\}$ by the transformation $r_1' = r_1 + r_2$ and $r_2' = r_1$. The condition $\forall x, y \in
M,\ \forall r_1 \in \mathbb{R}^+,\ \forall r_2 \in \mathbb{R}_0^+,\ (y \notin B_{r_1+r_2}(x) \Rightarrow B_{r_1}(x) \cap \bar B_{r_2}(y) = \emptyset)$ in Theorem 37.3.10 part (x) becomes
$\forall x, y \in M,\ \forall r_1' \in \mathbb{R}^+,\ \forall r_2' \in (0, r_1'],\ (y \notin B_{r_1'}(x) \Rightarrow B_{r_2'}(x) \cap \bar B_{r_1'-r_2'}(y) = \emptyset)$ under this transformation.
But $y \notin B_{r_1'}(x) \Leftrightarrow x \notin B_{r_1'}(y)$ by Theorem 37.3.6 (v). So the condition is equivalent to $\forall x, y \in M,\ \forall r_1' \in
\mathbb{R}^+,\ \forall r_2' \in (0, r_1'],\ (x \notin B_{r_1'}(y) \Rightarrow B_{r_2'}(x) \cap \bar B_{r_1'-r_2'}(y) = \emptyset)$, which is exactly the same as part (i) after
swapping $x$ and $y$.

For part (ii), note that $\forall y \in M,\ (y \notin B_{r_1}(x) \Rightarrow B_{r_2}(y) \cap \bar B_{r_1-r_2}(x) = \emptyset)$ is equivalent to $\forall y \in M \setminus
B_{r_1}(x),\ B_{r_2}(y) \cap \bar B_{r_1-r_2}(x) = \emptyset$, which is equivalent to $\bar B_{r_1-r_2}(x) \cap \bigcup_{y \in M \setminus B_{r_1}(x)} B_{r_2}(y) = \emptyset$. Hence parts (i)
and (ii) are equivalent.


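Part (iii) may be proved from Theorem 37.3.10 part (ii) via the bijection $r_1' = r_1 + r_2$ and $r_2' = r_2$ between the
sets $\{(r_1, r_2);\ r_1 \in \mathbb{R}_0^+,\ r_2 \in \mathbb{R}^+\}$ and $\{(r_1', r_2');\ r_1' \in \mathbb{R}^+,\ r_2' \in (0, r_1']\}$. Then the condition $\forall x, y \in M,\ \forall r_1 \in
\mathbb{R}_0^+,\ \forall r_2 \in \mathbb{R}^+,\ (y \in \bar B_{r_1}(x) \Rightarrow B_{r_2}(y) \subseteq B_{r_1+r_2}(x))$ is transformed to the condition $\forall x, y \in M,\ \forall r_1' \in
\mathbb{R}^+,\ \forall r_2' \in (0, r_1'],\ (y \in \bar B_{r_1'-r_2'}(x) \Rightarrow B_{r_2'}(y) \subseteq B_{r_1'}(x))$. Hence part (iii) is equivalent to Theorem 37.3.10
part (ii), which is equivalent to the triangle inequality. Alternatively, one may show that part (iii) is equivalent
to part (ii) as follows.

\[
\begin{aligned}
\bar B_{r_1-r_2}(x) \cap \bigcup_{y \in M \setminus B_{r_1}(x)} B_{r_2}(y) = \emptyset
&\Leftrightarrow \forall y \in M,\ \Bigl(y \in \bar B_{r_1-r_2}(x) \Rightarrow y \notin \bigcup_{z \in M \setminus B_{r_1}(x)} B_{r_2}(z)\Bigr) \\
&\Leftrightarrow \forall y \in M,\ \bigl(y \in \bar B_{r_1-r_2}(x) \Rightarrow \forall z \in M \setminus B_{r_1}(x),\ y \notin B_{r_2}(z)\bigr) \\
&\Leftrightarrow \forall y \in M,\ \bigl(y \in \bar B_{r_1-r_2}(x) \Rightarrow (\forall z \in M,\ (z \notin B_{r_1}(x) \Rightarrow y \notin B_{r_2}(z)))\bigr) \\
&\Leftrightarrow \forall y \in M,\ \bigl(y \in \bar B_{r_1-r_2}(x) \Rightarrow (\forall z \in M,\ (z \notin B_{r_1}(x) \Rightarrow z \notin B_{r_2}(y)))\bigr) \\
&\Leftrightarrow \forall y \in M,\ \bigl(y \in \bar B_{r_1-r_2}(x) \Rightarrow (\forall z \in M,\ (z \in B_{r_2}(y) \Rightarrow z \in B_{r_1}(x)))\bigr) \\
&\Leftrightarrow \forall y \in M,\ \bigl(y \in \bar B_{r_1-r_2}(x) \Rightarrow B_{r_2}(y) \subseteq B_{r_1}(x)\bigr).
\end{aligned}
\]

Hence parts (ii) and (iii) are equivalent.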


The quantified logical expression in part (iv) is the contrapositive of the corresponding expression in part (iii). 
Hence part (iv) is equivalent to part (iii). 


37.3.13 REMARK: Open and closed annulus definitions, and punctured open and closed balls. 

It is sometimes useful to define an open and closed annulus as in Definition 37.3.14 corresponding to the 
open and closed ball in Definition 37.3.1. The special cases where the inner radius is 0 may be referred to 
as a “punctured” open or closed ball. Notation 37.3.15 is (probably) non-standard. (Gelbaum/Olmsted [78],
page 9, has the notation $D(x,r)$ for a punctured open ball $\dot B_{x,r}$, and they call it a “deleted neighborhood”.)


37.3.14 DEFINITION: The open annulus in a metric space $(M,d)$ with centre $x \in M$ and radius pair
$r_1, r_2 \in \mathbb{R}_0^+$ is the set $\{y \in M;\ r_1 < d(x,y) < r_2\}$.

The closed annulus in a metric space $(M,d)$ with centre $x \in M$ and radius pair $r_1, r_2 \in \mathbb{R}_0^+$ is the set
$\{y \in M;\ r_1 \le d(x,y) \le r_2\}$.

The punctured open ball in a metric space $(M,d)$ with centre $x \in M$ and radius $r \in \mathbb{R}_0^+$ is the set
$\{y \in M;\ 0 \ne d(x,y) < r\}$.

The punctured closed ball in a metric space $(M,d)$ with centre $x \in M$ and radius $r \in \mathbb{R}_0^+$ is the set
$\{y \in M;\ 0 \ne d(x,y) \le r\}$.


37.3.15 NOTATION:
$B_{x,r_1,r_2}$, for a metric space $(M,d)$, $x \in M$ and $r_1, r_2 \in \mathbb{R}_0^+$, denotes the open annulus in $M$ with centre $x$
and radius pair $r_1, r_2$. In other words, $B_{x,r_1,r_2} = \{y \in M;\ r_1 < d(x,y) < r_2\}$.

$\bar B_{x,r_1,r_2}$, for a metric space $(M,d)$, $x \in M$ and $r_1, r_2 \in \mathbb{R}_0^+$, denotes the closed annulus in $M$ with centre $x$
and radius pair $r_1, r_2$. In other words, $\bar B_{x,r_1,r_2} = \{y \in M;\ r_1 \le d(x,y) \le r_2\}$.

$\dot B_{x,r}$, for a metric space $(M,d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, denotes the punctured open ball in $M$ with centre $x$
and radius $r$. In other words, $\dot B_{x,r} = \{y \in M;\ 0 \ne d(x,y) < r\}$.

$\dot{\bar B}_{x,r}$, for a metric space $(M,d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, denotes the punctured closed ball in $M$ with centre $x$
and radius $r$. In other words, $\dot{\bar B}_{x,r} = \{y \in M;\ 0 \ne d(x,y) \le r\}$.

$B_{r_1,r_2}(x)$, for a metric space $(M,d)$, $x \in M$ and $r_1, r_2 \in \mathbb{R}_0^+$, is an alternative notation for $B_{x,r_1,r_2}$.

$\bar B_{r_1,r_2}(x)$, for a metric space $(M,d)$, $x \in M$ and $r_1, r_2 \in \mathbb{R}_0^+$, is an alternative notation for $\bar B_{x,r_1,r_2}$.

$\dot B_r(x)$, for a metric space $(M,d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, is an alternative notation for $\dot B_{x,r}$.

$\dot{\bar B}_r(x)$, for a metric space $(M,d)$, $x \in M$ and $r \in \mathbb{R}_0^+$, is an alternative notation for $\dot{\bar B}_{x,r}$.


37.3.16 REMARK: Relations between balls and annuli.
There are, of course, many interrelationships between the various definitions of balls and annuli. For example,
$B_{x,r_1,r_2} = B_{x,r_2} \setminus \bar B_{x,r_1}$ and $\dot B_{x,r} = B_{x,r} \setminus \bar B_{x,0}$.
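
The two identities in Remark 37.3.16 can be illustrated with finite sets. In this hedged Python sketch, the function names and the choice of the integers from $-5$ to $5$ with the distance $|x - y|$ are hypothetical illustrations, not part of the text.

```python
def open_ball(M, d, x, r):
    return {y for y in M if d(x, y) < r}

def closed_ball(M, d, x, r):
    return {y for y in M if d(x, y) <= r}

def open_annulus(M, d, x, r1, r2):
    """Open annulus: the points y of M with r1 < d(x, y) < r2."""
    return {y for y in M if r1 < d(x, y) < r2}

def punctured_open_ball(M, d, x, r):
    """Punctured open ball: the points y of M with 0 != d(x, y) < r."""
    return {y for y in M if 0 != d(x, y) < r}

# Hypothetical example space: the integers -5, ..., 5 with the usual distance.
M = range(-5, 6)

def d(x, y):
    return abs(x - y)

annulus = open_annulus(M, d, 0, 1, 4)
punctured = punctured_open_ball(M, d, 0, 3)

# Open annulus = open ball of the outer radius minus closed ball of the inner radius.
annulus_identity = annulus == open_ball(M, d, 0, 4) - closed_ball(M, d, 0, 1)
# Punctured open ball = open ball minus the closed ball of radius zero.
punctured_identity = punctured == open_ball(M, d, 0, 3) - closed_ball(M, d, 0, 0)
print(sorted(annulus), sorted(punctured), annulus_identity, punctured_identity)
```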


37.3.17 REMARK: Possible ambiguity in notations when metric space point-sets contain real numbers. 
There is some ambiguity between Notations 37.3.2 and 37.3.15. For example, $B_{x,r}$ and $B_{r_1,r_2}$ may be
confused, particularly if $M$ is the metric space of real numbers. Such clashes are usually easy to clarify
within the application context. 


37.4. Set distance and set diameter 


37.4.1 DEFINITION: The distance between a point $x$ and a set $A$ in a metric space $(M,d)$ is the non-negative
extended real number $d(x,A) \in \overline{\mathbb{R}}_0^+$ defined by $d(x,A) = \inf\{d(x,y);\ y \in A\}$ for $x \in M$ and $A \in \mathbb{P}(M)$.
Thus $d : M \times \mathbb{P}(M) \to \overline{\mathbb{R}}_0^+$ is defined by

\[
\forall x \in M,\ \forall A \in \mathbb{P}(M),\quad d(x,A) = \inf\{d(x,y);\ y \in A\}.
\]


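37.4.2 THEOREM: Some basic properties of distance between a point and a set.
Let $(M,d)$ be a metric space.

(i) $\forall x \in M,\ d(x, \emptyset) = \infty$.
(ii) $\forall x \in M,\ \forall A \in \mathbb{P}(M) \setminus \{\emptyset\},\ d(x,A) \in \mathbb{R}_0^+$.
(iii) $\forall A \in \mathbb{P}(M),\ ((\exists x \in M,\ d(x,A) = \infty) \Rightarrow A = \emptyset)$.
(iv) Let $A \in \mathbb{P}(M) \setminus \{\emptyset\}$. Then $\forall y, z \in M,\ |d(y,A) - d(z,A)| \le d(y,z)$.
(v) Let $A_1, A_2 \in \mathbb{P}(M)$ with $A_1 \subseteq A_2$. Then $d(x,A_1) \ge d(x,A_2)$ for all $x \in M$.
(vi) Let $A_1, A_2 \in \mathbb{P}(M)$ with $A_1 \subseteq A_2$. Then $\forall r \in \mathbb{R}_0^+,\ \{x \in M;\ d(x,A_1) < r\} \subseteq \{x \in M;\ d(x,A_2) < r\}$.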


PROOF: For part (i), let $x \in M$. Then $d(x,\emptyset) = \inf\{d(x,y);\ y \in \emptyset\} = \inf(\emptyset) = \infty$ by Theorem 16.2.14 (ii).

For part (ii), let $x \in M$ and $A \in \mathbb{P}(M) \setminus \{\emptyset\}$. Let $y \in A$. Then $d(x,A) \le d(x,y) < \infty$ by Definition 37.2.3.
So $d(x,A) \in \mathbb{R}_0^+$.

For part (iii), let $A \in \mathbb{P}(M)$ and $x \in M$ satisfy $d(x,A) = \infty$. Suppose that $A \ne \emptyset$. Then $d(x,A) < \infty$ by
part (ii), which gives a contradiction. Hence $A = \emptyset$. (Part (iii) paraphrases the contrapositive of part (ii).)

For part (iv), let $A \in \mathbb{P}(M) \setminus \{\emptyset\}$ and $y, z \in M$. Then $d(y,A) = \inf\{d(x,y);\ x \in A\} \le \inf\{d(x,z) + d(z,y);\
x \in A\}$ by Definition 37.2.3 (iii) and Theorem 11.3.9 (i). So $d(y,A) \le \inf\{d(x,z);\ x \in A\} + d(y,z)$. That is,
$d(y,A) \le d(z,A) + d(y,z)$. Similarly, $d(z,A) \le d(y,A) + d(y,z)$. Hence $|d(y,A) - d(z,A)| \le d(y,z)$.

For part (v), let $A_1, A_2 \in \mathbb{P}(M)$ with $A_1 \subseteq A_2$. Let $x \in M$. If $A_1 = \emptyset$, then $d(x,A_1) = \infty$ by part (i).
So $d(x,A_1) \ge d(x,A_2)$, as claimed. Suppose that $A_1 \ne \emptyset$. Then $A_2 \ne \emptyset$. By Theorem 11.2.42 (i), it
follows from $\{d(x,y);\ y \in A_1\} \subseteq \{d(x,y);\ y \in A_2\}$ that $\inf\{d(x,y);\ y \in A_1\} \ge \inf\{d(x,y);\ y \in A_2\}$. So
$d(x,A_1) \ge d(x,A_2)$ by Definition 37.4.1.

For part (vi), let $A_1, A_2 \in \mathbb{P}(M)$ with $A_1 \subseteq A_2$ and $r \in \mathbb{R}_0^+$. Let $z \in \{x \in M;\ d(x,A_1) < r\}$. Then $z \in M$
and $d(z,A_1) < r$. So $d(z,A_2) \le d(z,A_1) < r$ by Theorem 16.2.14 (i). Therefore $z \in \{x \in M;\ d(x,A_2) < r\}$.
Hence $\{x \in M;\ d(x,A_1) < r\} \subseteq \{x \in M;\ d(x,A_2) < r\}$.
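
For finite sets the infimum in Definition 37.4.1 is a minimum, so the point-to-set distance is easy to compute. The following Python sketch is only an illustration under that finiteness assumption; the name `dist_point_set` and the example sets are hypothetical. It checks parts (i), (iv) and (v) of Theorem 37.4.2 on the integers $0, \dots, 9$.

```python
import math

def dist_point_set(d, x, A):
    """d(x, A) = inf{d(x, y); y in A} for a finite set A.

    For finite A the infimum is a minimum; the empty set gives +infinity,
    matching part (i) of the theorem.
    """
    return min((d(x, y) for y in A), default=math.inf)

# Hypothetical example space: the integers 0, ..., 9 with the usual distance.
M = range(10)

def d(x, y):
    return abs(x - y)

A1 = {2, 7}
A2 = {2, 5, 7}  # A1 is a subset of A2

empty_case = dist_point_set(d, 4, set())
# Part (iv): the map y -> d(y, A1) changes by at most d(y, z).
lipschitz = all(
    abs(dist_point_set(d, y, A1) - dist_point_set(d, z, A1)) <= d(y, z)
    for y in M for z in M
)
# Part (v): enlarging the set cannot increase the distance.
monotone = all(dist_point_set(d, x, A1) >= dist_point_set(d, x, A2) for x in M)
print(empty_case, lipschitz, monotone)
```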


37.4.3 THEOREM: Some relations between point-to-set distance and inclusion of balls. 
Let (M,d) be a metric space. 


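(i) $\forall A \in \mathbb{P}(M),\ \forall x \in A,\ \forall r \in \mathbb{R}_0^+,\ (r \le d(x, M \setminus A) \Leftrightarrow B_{x,r} \subseteq A)$.
(ii) $\forall A \in \mathbb{P}(M),\ \forall x \in M,\ (d(x, M \setminus A) > 0 \Leftrightarrow \exists r \in \mathbb{R}^+,\ B_{x,r} \subseteq A)$.
(iii) $\forall A \in \mathbb{P}(M),\ \forall x \in M,\ (d(x,A) = 0 \Leftrightarrow \forall r \in \mathbb{R}^+,\ B_{x,r} \cap A \ne \emptyset)$.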


PROOF: For part (i), suppose that $r \le d(x, M \setminus A)$. If $d(x, M \setminus A) = \infty$ then $A = M$ by Theorem 37.4.2 (iii).
So $B_{x,r} \subseteq A$ for all $r \in \mathbb{R}_0^+$. So assume that $d(x, M \setminus A) < \infty$. Suppose that $B_{x,r} \not\subseteq A$. Then $y \in B_{x,r} \setminus A$
for some $y$. So $d(x,y) < r$. Therefore $d(x, M \setminus A) < r$ by Definition 37.4.1, which contradicts the assumed
range of values for $r$. Therefore $B_{x,r} \subseteq A$.

For the converse, suppose that $B_{x,r} \subseteq A$. Then $d(x,y) \ge r$ for all $y \in M \setminus A$. So $d(x, M \setminus A) \ge r$ by
Definition 10.13.

For part (ii), suppose that $d(x, M \setminus A) > 0$. If $d(x, M \setminus A) = \infty$ then $A = M$ by Theorem 37.4.2 (iii). So
$B_{x,r} \subseteq A$ for any choice of $r \in \mathbb{R}^+$. So assume that $d(x, M \setminus A) < \infty$. Let $r = d(x, M \setminus A)/2$. Then $B_{x,r} \subseteq A$
by part (i). Thus $\exists r \in \mathbb{R}^+,\ B_{x,r} \subseteq A$.

For the converse, suppose that $B_{x,r} \subseteq A$ for some $r \in \mathbb{R}^+$. Then $r \le d(x, M \setminus A)$ by part (i). Therefore
$d(x, M \setminus A) > 0$.

Part (iii) follows as the logical contrapositive of part (ii) if $A$ is replaced with $M \setminus A$.
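
Part (i) of Theorem 37.4.3 says that the largest open ball about $x$ which fits inside $A$ has radius $d(x, M \setminus A)$. The Python sketch below tests this equivalence on a hypothetical finite example (the integers $0, \dots, 9$ and a few sample radii); the helper names are illustrative assumptions, and a finite test of this kind is of course not a proof.

```python
import math

def open_ball(M, d, x, r):
    return {y for y in M if d(x, y) < r}

def dist_point_set(d, x, A):
    # For finite A the infimum is a minimum; the empty set gives +infinity.
    return min((d(x, y) for y in A), default=math.inf)

# Hypothetical example space: the integers 0, ..., 9 with the usual distance.
M = range(10)

def d(x, y):
    return abs(x - y)

A = {3, 4, 5, 6}
complement = set(M) - A
radii = [0, 0.5, 1, 1.5, 2, 2.5, 3]

# Part (i): r <= d(x, M \ A) if and only if the open ball B_{x,r} lies in A.
part_i = all(
    (r <= dist_point_set(d, x, complement)) == (open_ball(M, d, x, r) <= A)
    for x in A for r in radii
)
print(part_i, dist_point_set(d, 4, complement))
```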


37.4.4 DEFINITION: The distance between two sets $A$ and $B$ in a metric space $(M,d)$ is the non-negative
extended real number $d(A,B) \in \overline{\mathbb{R}}_0^+$ defined by $d(A,B) = \inf\{d(x,y);\ x \in A,\ y \in B\}$.


37.4.5 THEOREM: Some basic properties of distance between two sets. 
Let (M,d) be a metric space. 


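(i) $\forall A \in \mathbb{P}(M),\ d(A, \emptyset) = d(\emptyset, A) = \infty$.
(ii) $\forall A, B \in \mathbb{P}(M) \setminus \{\emptyset\},\ d(A,B) \in \mathbb{R}_0^+$.
(iii) $\forall A, B \in \mathbb{P}(M) \setminus \{\emptyset\},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists x \in A,\ \exists y \in B,\ d(x,y) < d(A,B) + \varepsilon$.
(iv) Let $A_1, A_2, B_1, B_2 \in \mathbb{P}(M)$ with $A_1 \subseteq A_2$ and $B_1 \subseteq B_2$. Then $d(A_1, B_1) \ge d(A_2, B_2)$.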


PROOF: For part (i), let $A \in \mathbb{P}(M)$. Then $d(A,\emptyset) = \inf\{d(x,y);\ x \in A,\ y \in \emptyset\} = \inf(\emptyset) = \infty$ by
Theorem 16.2.14 (ii). Similarly, $d(\emptyset, A) = \infty$.

For part (ii), let $A, B \in \mathbb{P}(M) \setminus \{\emptyset\}$. Let $x \in A$ and $y \in B$. Then $d(A,B) = \inf\{d(x,y);\ x \in A,\ y \in B\} \le
d(x,y) < \infty$ by Definition 37.2.3. So $d(A,B) \in \mathbb{R}_0^+$.

For part (iii), suppose that the assertion is false. Then there exist $A, B \in \mathbb{P}(M) \setminus \{\emptyset\}$ and $\varepsilon \in \mathbb{R}^+$ such that
$d(x,y) \ge d(A,B) + \varepsilon$ for all $x \in A$ and $y \in B$. So $d(A,B) \ge d(A,B) + \varepsilon$ by Definition 37.4.4, and so $\varepsilon \le 0$
because $d(A,B) < \infty$ by part (ii), which is a contradiction. Hence the assertion is true.

For part (iv), let $A_1, A_2, B_1, B_2 \in \mathbb{P}(M)$ with $A_1 \subseteq A_2$ and $B_1 \subseteq B_2$. If $A_1 = \emptyset$ or $B_1 = \emptyset$, then $d(A_1, B_1) =
\infty \ge d(A_2, B_2)$ by part (i), as claimed. So suppose that $A_1 \ne \emptyset$ and $B_1 \ne \emptyset$. Then by Theorem 11.2.42 (i),
it follows from $\{d(x,y);\ x \in A_1,\ y \in B_1\} \subseteq \{d(x,y);\ x \in A_2,\ y \in B_2\}$ that $\inf\{d(x,y);\ x \in A_1,\ y \in B_1\} \ge
\inf\{d(x,y);\ x \in A_2,\ y \in B_2\}$. So $d(A_1, B_1) \ge d(A_2, B_2)$ by Definition 37.4.4.


37.4.6 DEFINITION: The diameter of a set $S \ne \emptyset$ in a metric space $(M,d)$ is the non-negative extended
real number $\sup\{d(x,y);\ x, y \in S\}$.


37.4.7 NOTATION: $\mathrm{diam}_d(S)$, for non-empty subsets $S$ of a metric space $(M,d)$, denotes the diameter of $S$.
In other words, $\mathrm{diam}_d(S) = \sup\{d(x,y);\ x, y \in S\}$.

$\mathrm{diam}(S)$, for non-empty subsets $S$ of a metric space $(M,d)$, denotes the diameter of $S$. In other words,
$\mathrm{diam}(S) = \mathrm{diam}_d(S) = \sup\{d(x,y);\ x, y \in S\}$.


37.4.8 REMARK: The diameter of the empty set is not defined. Or it could be zero. 
In Definition 37.4.6, the diameter of the empty set would be $\sup \emptyset = -\infty$, which would probably be annoying.
Therefore it is not defined.

One may express the diameter of a set $S$ as $\mathrm{diam}(S) = \sup\{r \in \overline{\mathbb{R}}_0^+;\ \exists x, y \in S,\ r = d(x,y)\}$. Then the set
$\{r \in \overline{\mathbb{R}}_0^+;\ \exists x, y \in S,\ r = d(x,y)\}$ is a subset of $\overline{\mathbb{R}}_0^+$. The standard order on the set $\overline{\mathbb{R}}_0^+$ is the restriction
of the standard order on $\overline{\mathbb{R}}$ to $\overline{\mathbb{R}}_0^+$. Every element of the ordered set $\overline{\mathbb{R}}_0^+$ is an upper bound for the empty
set, and the least upper bound is zero. Thus within the ordered set $\overline{\mathbb{R}}_0^+$, we have $\sup \emptyset = 0$. In general,
$\sup(\emptyset) = \min(X)$ in an ordered set $X$ if the minimum exists. (This is shown in Theorem 11.2.15 (v).) So it
is perfectly acceptable to define $\mathrm{diam}(\emptyset) = 0$, which is closer to one's naive intuition.


Any non-empty set with a finite number of elements must have a finite diameter. The diameter of an infinite 
set may be infinite. 
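
The choice discussed in Remark 37.4.8 shows up directly when the diameter is computed for finite sets. In the hedged Python sketch below, the function name `diam` and its `empty_value` parameter are hypothetical: the caller must say explicitly which convention for the empty set is wanted, and otherwise the diameter of the empty set is treated as undefined.

```python
def diam(d, S, empty_value=None):
    """Diameter sup{d(x, y); x, y in S} of a finite set S.

    For the empty set the supremum depends on the ambient ordered set, so the
    caller must choose: empty_value=0 gives the convention sup(empty) = 0 in
    the non-negative extended reals; the default None treats the diameter of
    the empty set as undefined.
    """
    S = list(S)
    if not S:
        if empty_value is None:
            raise ValueError("diameter of the empty set is undefined")
        return empty_value
    # For a finite non-empty set the supremum is a maximum.
    return max(d(x, y) for x in S for y in S)

def d(x, y):
    # Hypothetical example metric: the usual distance on numbers.
    return abs(x - y)

print(diam(d, {1, 4, 6}), diam(d, {3}), diam(d, set(), empty_value=0))
```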


37.4.9 THEOREM: A non-empty set with zero diameter must be a singleton. 
Let $(M,d)$ be a metric space. Let $S \in \mathbb{P}(M) \setminus \{\emptyset\}$. Then $\mathrm{diam}(S) = 0$ if and only if $S = \{x\}$ for some $x \in M$.

PROOF: Let $(M,d)$ be a metric space. Let $S \in \mathbb{P}(M) \setminus \{\emptyset\}$. Suppose that $S = \{x\}$ for some $x \in M$. Then
$\mathrm{diam}(S) = 0$ because $d(x,x) = 0$. Suppose that $S$ is not a singleton. Then $\{x,y\} \subseteq S$ for some $x, y \in M$
with $x \ne y$. So $\mathrm{diam}(S) \ge d(x,y) > 0$. Hence $\mathrm{diam}(S) = 0$ if and only if $S = \{x\}$ for some $x \in M$.


37.4.10 REMARK: The diameter of a ball is at most twice its radius. 

The triangle inequality implies that $\mathrm{diam}(B_{x,r}) \le 2r$ for all $x \in M$ and $r \in \mathbb{R}_0^+$. By analogy with the set
diameter $\mathrm{diam}(S)$ in Definition 37.4.6, one could also define $\mathrm{radius}'(S) = \inf\{r \in \mathbb{R}_0^+;\ \exists x \in S,\ S \subseteq B_{x,r}\}$.
However, although clearly $\mathrm{diam}(S) \le 2\,\mathrm{radius}'(S)$, the equality $\mathrm{diam}(S) = 2\,\mathrm{radius}'(S)$ does not hold for all
balls in all metric spaces. (See for example the discrete metric in Definition 37.2.6.)

The radius concept in Theorem 37.4.11 permits the “centre points” $x$ in the expression for $\mathrm{radius}'(S)$ to be
any point of the whole space $M$. Therefore $\mathrm{radius}(S) \le \mathrm{radius}'(S)$.


37.4.11 THEOREM: Some relations between radius and diameter of sets. 
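Let $(M,d)$ be a metric space. Let $\mathrm{radius}(S,p) = \sup\{d(x,p);\ x \in S\}$ for $S \in \mathbb{P}(M) \setminus \{\emptyset\}$ and $p \in M$. Let
$\mathrm{radius}(S) = \inf\{\mathrm{radius}(S,p);\ p \in M\}$ for $S \in \mathbb{P}(M) \setminus \{\emptyset\}$.
(i) $\forall S \in \mathbb{P}(M) \setminus \{\emptyset\},\ \sup\{\mathrm{radius}(S,q);\ q \in S\} = \mathrm{diam}(S)$.
(ii) $\forall S \in \mathbb{P}(M) \setminus \{\emptyset\},\ \forall p \in M,\ \mathrm{diam}(S) \le 2\,\mathrm{radius}(S,p)$.
(iii) $\forall S \in \mathbb{P}(M) \setminus \{\emptyset\},\ \mathrm{radius}(S) \le \mathrm{diam}(S) \le 2\,\mathrm{radius}(S)$.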


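PROOF: For part (i), let $S \in \mathbb{P}(M) \setminus \{\emptyset\}$. Then $\sup\{\mathrm{radius}(S,q);\ q \in S\} = \sup\{d(p,q);\ p, q \in S\} = \mathrm{diam}(S)$
by Definition 37.4.6.

For part (ii), let $S \in \mathbb{P}(M) \setminus \{\emptyset\}$ and $p \in M$. Let $x, y \in S$. Then $d(x,y) \le d(x,p) + d(y,p) \le 2\,\mathrm{radius}(S,p)$.
Hence $\mathrm{diam}(S) \le 2\,\mathrm{radius}(S,p)$.

For part (iii), let $S \in \mathbb{P}(M) \setminus \{\emptyset\}$. Let $p \in S$. Then $\mathrm{radius}(S) \le \mathrm{radius}(S,p)$. But by part (i), $\mathrm{radius}(S,p) \le
\sup\{\mathrm{radius}(S,q);\ q \in S\} = \mathrm{diam}(S)$. Therefore $\mathrm{radius}(S) \le \mathrm{diam}(S)$. The inequality $\mathrm{diam}(S) \le 2\,\mathrm{radius}(S)$
follows from part (ii).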
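
Both bounds in Theorem 37.4.11 (iii) can be attained, as a small finite experiment suggests. The Python sketch below is illustrative only: the helper names and the two example metrics on the points $0, \dots, 4$ are hypothetical, and the suprema and infima are computed as maxima and minima because everything is finite.

```python
def diam(d, S):
    return max(d(x, y) for x in S for y in S)

def radius_about(d, S, p):
    """radius(S, p) = sup{d(x, p); x in S} for a finite non-empty set S."""
    return max(d(x, p) for x in S)

def radius(d, M, S):
    """radius(S) = inf{radius(S, p); p in M} for a finite space M."""
    return min(radius_about(d, S, p) for p in M)

M = range(5)
S = {0, 4}

def line(x, y):
    # The usual distance on the integers 0, ..., 4.
    return abs(x - y)

def discrete(x, y):
    # The discrete metric: distinct points are at distance 1.
    return 0 if x == y else 1

summary = {
    name: (radius(d, M, S), diam(d, S))
    for name, d in [("line", line), ("discrete", discrete)]
}
# Part (i): the largest radius about a point of S equals the diameter.
part_i = all(
    max(radius_about(d, S, q) for q in S) == diam(d, S)
    for d in (line, discrete)
)
print(summary, part_i)
```

With the usual distance the centre $p = 2$ gives $\mathrm{diam}(S) = 2\,\mathrm{radius}(S)$, whereas with the discrete metric $\mathrm{radius}(S) = \mathrm{diam}(S)$.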


37.4.12 DEFINITION: A bounded subset of a metric space $M \mathrel{-\!\!<} (M,d)$ is a subset $S$ of $M$ such that either
$S = \emptyset$ or $\mathrm{diam}_d(S) < \infty$. In other words,

\[
\exists b \in \mathbb{R},\ \forall x, y \in S,\ d(x,y) \le b.
\]


37.4.13 THEOREM: The diameter of a set is bounded above by the diameters of bounded supersets. 
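Let $K_1$ be a bounded subset of a metric space $(M,d)$. Let $K_2 \subseteq K_1$. Then $K_2$ is bounded and either $K_2 = \emptyset$
or $\mathrm{diam}_d(K_2) \le \mathrm{diam}_d(K_1)$.


PROOF: Let $K_1, K_2$ be subsets of a metric space $M \mathrel{-\!\!<} (M,d)$ such that $K_2 \subseteq K_1$ and $K_1$ is bounded.
Then either $K_1 = \emptyset$ or $\mathrm{diam}_d(K_1) < \infty$ by Definition 37.4.12. If $K_1 = \emptyset$, then $K_2 = \emptyset$, and so $K_2$ is
bounded by Definition 37.4.12. If $K_1 \ne \emptyset$, then $b = \mathrm{diam}_d(K_1) \in \mathbb{R}_0^+$. So $\forall x, y \in K_1,\ d(x,y) \le b$. Therefore
$\forall x, y \in K_2,\ d(x,y) \le b$ because $K_2 \subseteq K_1$. If $K_2 \ne \emptyset$, then $\sup\{d(x,y);\ x, y \in K_2\} \le b = \mathrm{diam}_d(K_1)$. So
$\mathrm{diam}_d(K_2) \le \mathrm{diam}_d(K_1)$ by Definition 37.4.6 and Notation 37.4.7. Hence $K_2$ is bounded and either $K_2 = \emptyset$
or $\mathrm{diam}_d(K_2) \le \mathrm{diam}_d(K_1)$.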


37.4.14 THEOREM: Upper bound for set-union diameter in terms of diameters and distance between sets. 
Let $(M,d)$ be a metric space. Let $S_1, S_2 \subseteq M$.

(i) If $S_1 \ne \emptyset$ and $S_2 \ne \emptyset$, then $\mathrm{diam}(S_1 \cup S_2) \le \mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(S_1, S_2)$.

(ii) $S_1 \cup S_2$ is bounded if and only if both $S_1$ and $S_2$ are bounded.


PROOF: For part (i), if $S_1$ is unbounded, then $\mathrm{diam}(S_1) = \infty$. So $\mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(S_1, S_2) = \infty$.
Therefore $\mathrm{diam}(S_1 \cup S_2) \le \mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(S_1, S_2)$, and similarly if $S_2$ is unbounded.

So suppose that $S_1$ and $S_2$ are both bounded. Then $\mathrm{diam}(S_1) < \infty$ and $\mathrm{diam}(S_2) < \infty$. Let $\varepsilon \in \mathbb{R}^+$. Then
by Theorem 37.4.5 (iii), there are $x_1 \in S_1$ and $x_2 \in S_2$ with $d(x_1, x_2) < d(S_1, S_2) + \varepsilon$, where $d(S_1, S_2) \in \mathbb{R}_0^+$
by Theorem 37.4.5 (ii). Let $y_1 \in S_1$ and $y_2 \in S_2$. Then $d(y_1, y_2) \le d(y_1, x_1) + d(x_1, x_2) + d(x_2, y_2)$ by the
triangle inequality. But $d(y_1, x_1) \le \mathrm{diam}(S_1) < \infty$ and $d(x_2, y_2) \le \mathrm{diam}(S_2) < \infty$. Therefore $d(y_1, y_2) \le
\mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(x_1, x_2)$ for all $y_1 \in S_1$ and $y_2 \in S_2$. So $d(y_1, y_2) < \mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(S_1, S_2) + \varepsilon$
for all $y_1 \in S_1$, $y_2 \in S_2$ and $\varepsilon \in \mathbb{R}^+$. Consequently $d(y_1, y_2) \le \mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(S_1, S_2)$ for all
$y_1 \in S_1$ and $y_2 \in S_2$. But $d(y_1, y_2) \le \mathrm{diam}(S_1)$ for all $y_1, y_2 \in S_1$, and $d(y_1, y_2) \le \mathrm{diam}(S_2)$ for all
$y_1, y_2 \in S_2$. So $d(y_1, y_2) \le \mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(S_1, S_2)$ for all $y_1, y_2 \in S_1 \cup S_2$. Hence $\mathrm{diam}(S_1 \cup S_2) \le
\mathrm{diam}(S_1) + \mathrm{diam}(S_2) + d(S_1, S_2)$.

For part (ii), suppose that $S_1 \cup S_2$ is bounded. Then $S_1$ and $S_2$ are both bounded by Theorem 37.4.13.
Conversely, suppose that $S_1$ and $S_2$ are both bounded. If $S_1 = \emptyset$ or $S_2 = \emptyset$, then clearly $S_1 \cup S_2$ is bounded. So assume
that $S_1 \ne \emptyset$ and $S_2 \ne \emptyset$. Then $\mathrm{diam}(S_1 \cup S_2) < \infty$ by part (i) and Theorem 37.4.5 (ii). Hence $S_1 \cup S_2$ is
bounded.
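
The bound in Theorem 37.4.14 (i) can be tried out on finite subsets of the real line. The following Python sketch is a hedged illustration with hypothetical helper names and example sets. The first pair of sets attains equality in the bound, and the second pair shows that the inequality may be strict.

```python
import math

def diam(d, S):
    return max(d(x, y) for x in S for y in S)

def dist_sets(d, A, B):
    """d(A, B) = inf{d(x, y); x in A, y in B} for finite sets A and B."""
    return min((d(x, y) for x in A for y in B), default=math.inf)

def d(x, y):
    # Hypothetical example metric: the usual distance on numbers.
    return abs(x - y)

def union_bound(S1, S2):
    """Return (diam of the union, upper bound from Theorem 37.4.14 (i))."""
    lhs = diam(d, S1 | S2)
    rhs = diam(d, S1) + diam(d, S2) + dist_sets(d, S1, S2)
    return lhs, rhs

separated = union_bound({0, 1}, {5, 7})     # equality: 7 and 1 + 2 + 4
interleaved = union_bound({0, 3}, {2, 6})   # strict: 6 and 3 + 4 + 1
print(separated, interleaved)
```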


37.5. The topology induced by a metric 


37.5.1 REMARK: Topological spaces are a generalisation of metric spaces. 

On every metric space, there is a standard topology called the “topology induced by the metric”, which is 
generated by the set of all open balls in the metric space. This induced topology has the same definitions 
of continuity and convergence as the metric. Historically, metric spaces came first. Then topological spaces 
generalised metric spaces. Many texts introduce metric spaces first, presumably because metric spaces have 
a strong intuitive appeal and general topological spaces do not. In this text, the more abstract general 
structure, a topology, is presented before the more concrete structure, a metric, in an attempt to system- 
atically make specific structures follow general ones. This has the advantage that general definitions and 
theorems may be applied immediately to the more specific structures. Then one discovers the “value added” 
by adopting the more specific structure. 


Another purpose of proceeding from general to specific is to try to avoid the accidental incorrect application 
of specific definitions and theorems to more general structures. Such incorrect application often happens 
when one loses track of which assumptions apply to which definitions and theorems. This kind of confusion is 
most likely when one learns the specific structures before the general ones. There are in fact many definitions 
and theorems for metric spaces which are not directly applicable to general topological spaces. 


37.5.2 DEFINITION: The topology induced by a metric d on a set M is the topology generated by the set 
of all open balls with positive radius in the metric space (M, d). 


37.5.3 REMARK: The topology generated by the open balls in a metric space. 
The topology generated by a set of subsets of a given set is introduced in Definition 32.1.4. One obtains 
$\mathrm{Top}(M) = \{\bigcup C;\ C \in \mathbb{P}(\{\bigcap D;\ D \text{ a non-empty finite subset of } S\})\}$, where $S = \{B_{x,r};\ x \in M,\ r \in \mathbb{R}^+\}$, by applying
Theorem 32.1.10 to Definition 37.5.2. The requirements of Theorem 32.1.10 are satisfied because $M = \bigcup S$, since
$x \in B_{x,r}$ for all $x \in M$ and $r \in \mathbb{R}^+$. However, the intersection operations are not required in this construction. That is,
all open sets of $M$ are unions of open balls. This is shown in Theorem 37.5.4.


37.5.4 THEOREM: The topology induced by a metric equals the set of all unions of open balls. 
Let $(M,d)$ be a metric space. Let $\mathrm{Top}(M)$ denote the topology induced by $d$ on $M$. Then

\[
\mathrm{Top}(M) = \{\textstyle\bigcup C;\ C \subseteq \{B_{x,r};\ x \in M,\ r \in \mathbb{R}^+\}\},
\]

where the open balls $B_{x,r}$ are as in Notation 37.3.2.


PROOF: Let $(M,d)$ be a metric space. According to Definition 37.5.2, the topology induced by $d$ on $M$ is
the topology generated by the set $S = \{B_{x,r};\ x \in M,\ r \in \mathbb{R}^+\}$ of all open balls with positive radius in $M$.
Let $T = \{\bigcup C;\ C \subseteq S\}$. To satisfy Definition 32.1.4 for the topology generated by the set $S$, it must be
shown that $T$ is a topology on $M$ and that every topology on $M$ which includes $S$ also includes $T$.

Let $C = \emptyset$. Then $C \subseteq S$ and $\bigcup C = \emptyset$. So $\emptyset \in T$. Let $C = S$. Then $\bigcup C = M$. So $M \in T$. Let $Q \subseteq T$. Then
$\bigcup Q \in T$ by Theorem 8.6.2. Therefore $T$ is closed under arbitrary unions.

To show that $T$ is closed under binary intersections, let $D, E \in T$. Then $D = \bigcup C_D$ and $E = \bigcup C_E$ for
some $C_D, C_E \in \mathbb{P}(S)$. So $D \cap E = (\bigcup C_D) \cap (\bigcup C_E) = \bigcup\{B_D \cap B_E;\ B_D \in C_D,\ B_E \in C_E\}$. Each $B_D \in C_D$
and $B_E \in C_E$ is an open ball. That is, $B_D = B_{x,r}$ and $B_E = B_{y,s}$ for some $x, y \in M$ and $r, s \in \mathbb{R}^+$. It must
be shown that each intersection $B_{x,r} \cap B_{y,s}$ may be expressed as the union of a set of open balls in $M$. For
$z \in B_{x,r} \cap B_{y,s}$, let $t_z = \min(r - d(x,z),\ s - d(y,z))$. (This is illustrated in Figure 37.5.1.) Then $t_z \in \mathbb{R}^+$
and $B_{z,t_z} \subseteq B_{x,r} \cap B_{y,s}$ by the triangle inequality. Then $B_{x,r} \cap B_{y,s} = \bigcup\{B_{z,t_z};\ z \in B_{x,r} \cap B_{y,s}\}$. So $B_{x,r} \cap B_{y,s} \in T$.
Therefore $D \cap E \in T$ because $T$ is closed under arbitrary unions. Hence $T$ is a topology on $M$.

Figure 37.5.1 Radii of open balls within the intersection of two open balls


To show that T is a subset of all possible topologies on M which include S, note that any such topology on M 
must include all unions of open balls on M, and therefore they must include T. Hence T is the topology 
generated by the open balls on M as claimed. 
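
The key step in the proof of Theorem 37.5.4 is the choice of the radius $t_z = \min(r - d(x,z),\ s - d(y,z))$. The Python sketch below illustrates this step under hypothetical assumptions: the space is a finite grid of integer points with the taxicab metric (chosen so that all distances are integers and the comparisons are exact), and the names `taxicab` and `open_ball` are illustrative.

```python
from itertools import product

def taxicab(p, q):
    """Taxicab distance between two points of the integer grid."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def open_ball(M, d, x, r):
    return {y for y in M if d(x, y) < r}

# Hypothetical example space: the 9 x 9 grid of integer points.
M = list(product(range(-4, 5), repeat=2))
x, r = (0, 0), 3
y, s = (2, 1), 3

inter = open_ball(M, taxicab, x, r) & open_ball(M, taxicab, y, s)

# The radius t_z from the proof, for each point z of the intersection.
t = {z: min(r - taxicab(x, z), s - taxicab(y, z)) for z in inter}
small_balls = {z: open_ball(M, taxicab, z, t[z]) for z in inter}

all_positive = all(tz > 0 for tz in t.values())
all_inside = all(ball <= inter for ball in small_balls.values())
union = set().union(*small_balls.values())
print(len(inter), all_positive, all_inside, union == inter)
```

Each small ball $B_{z,t_z}$ lies inside the intersection, and their union recovers the intersection exactly, as the proof asserts.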


37.5.5 REMARK: The topology generated by families of open balls. 
By indexing the open balls in Theorem 37.5.4, one may write 


\[
\mathrm{Top}(M) = \Bigl\{ \bigcup_{i \in I} B_{x_i, r_i};\ x : I \to M \text{ and } r : I \to \mathbb{R}^+ \Bigr\}.
\]


37.5.6 THEOREM: Some properties of the topology induced by a metric. 
Let (M,d) be a metric space. Let Top(M) denote the topology induced by d on M. 


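(i) $\forall x \in M,\ \forall r \in \mathbb{R}^+,\ B_{x,r} \in \mathrm{Top}_x(M)$.
(ii) $\Omega \in \mathrm{Top}(M)$ if and only if $\forall a \in \Omega,\ \exists \delta \in \mathbb{R}^+,\ B_{a,\delta} \subseteq \Omega$.
(iii) $\forall A \in \mathbb{P}(M),\ \forall a \in M,\ (a \in \mathrm{Int}(A) \Leftrightarrow \exists \delta \in \mathbb{R}^+,\ B_{a,\delta} \subseteq A)$, where $\mathrm{Int}(A)$ means the interior of $A$ with
respect to the topology $\mathrm{Top}(M)$.
(iv) $\forall a \in M,\ \forall \Omega \in \mathrm{Top}_a(M),\ \exists \delta \in \mathbb{R}^+,\ B_{a,\delta} \subseteq \Omega$, where $\mathrm{Top}_a(M) = \{\Omega \in \mathrm{Top}(M);\ a \in \Omega\}$.
(v) $\forall \Omega \in \mathrm{Top}(M),\ \forall x \in M,\ (x \in \Omega \Leftrightarrow d(x, M \setminus \Omega) > 0)$.
(vi) $\forall \Omega \in \mathrm{Top}(M),\ \forall x \in M,\ (x \in \Omega \Leftrightarrow \exists r \in \mathbb{R}^+,\ B_{x,r} \subseteq \Omega)$.
(vii) $\forall A \in \mathbb{P}(M),\ \mathrm{Int}(A) = \{x \in M;\ d(x, M \setminus A) > 0\}$.
(viii) $\forall A \in \mathbb{P}(M),\ \forall x \in M,\ (x \in \bar A \Leftrightarrow d(x,A) = 0)$.
(ix) $\forall A \in \mathbb{P}(M),\ \bar A = \{x \in M;\ d(x,A) = 0\}$.
(x) $\forall A \in \mathbb{P}(M),\ \forall x \in M,\ (x \in \bar A \Leftrightarrow \forall r \in \mathbb{R}^+,\ A \cap B_{x,r} \ne \emptyset)$.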


PROOF: For part (i), let x € M andr € Rt. Let C= {Bz}. Then C C (B,,; y € M, s € Rt}, and so 
Bz r =U C € Top(M) by Theorem 37.5.4. But x € Bz r. Therefore Vz € M, Vr € R*, Bz, € Top, (M). 


For part (ii), let Q € Top(M). Then Q = UC for some C C {B} r; x € M, r € Rt}, by Theorem 37.5.4. 
Let a € Q. Then a € Bzr for some z € M andr € R*. So Bas C Q with ô = r — d(z, a) € IR*. Conversely, 
let Q be a set which satisfies Va € 0, 3ó € Rt, Bas C Q. By part (i), Bas € Top(M). So a € Int(Q) by 
Theorem 31.8.17 (i). So Va € Q, a € Int(Q). Thus Q C Int(Q). Hence Q € Top(M) by Theorem 31.8.14 (ii). 


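For part (iii), let $A \in \mathbb{P}(M)$ and $a \in M$. Suppose that $a \in \mathrm{Int}(A)$ with respect to the topology $\mathrm{Top}(M)$.
Since $\mathrm{Int}(A) \in \mathrm{Top}(M)$ by Theorem 31.8.13 (i), it follows from Theorem 37.5.4 that $\mathrm{Int}(A) = \bigcup C$ for
some $C \in \mathbb{P}(S)$, where $S = \{B_{x,r};\ x \in M,\ r \in \mathbb{R}^+\}$. Therefore $a \in B_{x,r}$ for some $x \in M$ and $r \in \mathbb{R}^+$.
So $B_{a,\delta} \subseteq B_{x,r} \subseteq \mathrm{Int}(A) \subseteq A$ with $\delta = r - d(x,a) \in \mathbb{R}^+$. To show the converse, let $a \in M$ satisfy
$\exists \delta \in \mathbb{R}^+$, $B_{a,\delta} \subseteq A$. Let $C = \{B_{a,\delta}\}$. Then $C \in \mathbb{P}(S)$. So $B_{a,\delta} = \bigcup C \in \mathrm{Top}(M)$ by Theorem 37.5.4. Hence
$a \in \mathrm{Int}(A)$ by Theorem 31.8.17 (i).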


For part (iv), let $a \in M$ and $\Omega \in \mathrm{Top}_a(M)$. Let $A = \Omega$. Then $a \in \Omega = \mathrm{Int}(\Omega)$. So $\exists \delta \in \mathbb{R}^+$, $B_{a,\delta} \subseteq A = \Omega$
by part (iii), as claimed.

For part (v), let $\Omega \in \mathrm{Top}(M)$ and $x \in M$. Then $x \in \Omega$ if and only if $x \in \mathrm{Int}(\Omega)$ by Theorem 31.8.14 (i).
So $x \in \Omega$ if and only if $\exists \delta \in \mathbb{R}^+$, $B_{x,\delta} \subseteq \Omega$ by part (iii). Therefore $x \in \Omega$ if and only if $d(x, M \setminus \Omega) > 0$ by
Theorem 37.4.3 (ii).


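Part (vi) follows from part (v) and Theorem 37.4.3 (ii).

For part (vii), let $A \in \mathbb{P}(M)$ and $x \in M$. Then $x \in \mathrm{Int}(A)$ if and only if $\exists \delta \in \mathbb{R}^+$, $B_{x,\delta} \subseteq A$ by part (iii). So
$x \in \mathrm{Int}(A)$ if and only if $d(x, M \setminus A) > 0$ by Theorem 37.4.3 (ii). Hence $\mathrm{Int}(A) = \{x \in M;\ d(x, M \setminus A) > 0\}$.

For part (viii), let $A \in \mathbb{P}(M)$. Let $x \in M$. By Theorem 31.8.13 (ix), $x \in \bar{A}$ if and only if $x \notin \mathrm{Int}(M \setminus A)$.
But $x \in \mathrm{Int}(M \setminus A)$ if and only if $\exists \delta \in \mathbb{R}^+$, $B_{x,\delta} \subseteq M \setminus A$ by part (iii). So by logical negation, $x \in \bar{A}$ if
and only if $\forall \delta \in \mathbb{R}^+$, $B_{x,\delta} \not\subseteq M \setminus A$. But $B_{x,\delta} \not\subseteq M \setminus A$ is equivalent to $B_{x,\delta} \cap A \ne \emptyset$, which is equivalent
to $\exists y \in A$, $d(x,y) < \delta$. So $x \in \bar{A}$ if and only if $\forall \delta \in \mathbb{R}^+$, $\exists y \in A$, $d(x,y) < \delta$. So $x \in \bar{A}$ if and only if
$d(x,A) = \inf\{d(x,y);\ y \in A\} < \delta$ for all $\delta > 0$. So $x \in \bar{A} \Leftrightarrow d(x,A) \le 0$. Hence $x \in \bar{A} \Leftrightarrow d(x,A) = 0$
because $d(x,A) \ge 0$ for all $A \in \mathbb{P}(M)$ and $x \in M$.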


Part (ix) follows from part (viii) and Notation 7.7.10. 


Part (x) follows from part (viii) and Theorem 37.4.3 (iii).


37.5.7 REMARK: The topology induced by a metric is always well defined. 

The topology in Definition 37.5.2 is well-defined because an arbitrary set of subsets of any set M will always 
generate a topology on M. This does not necessarily mean that the topology will have any useful properties. 
For example, if $d(x,y) = 1$ for all $x, y \in M$ with $x \ne y$, the topology will be the discrete topology. (See
Definition 31.3.19 for the discrete topology.)

For comparison, if $d(x,y) = 0$ for all $x, y \in M$ with $x \ne y$, the topology would be the trivial topology. (See
Definition 31.3.18 for the trivial topology.) But of course, such a “metric function” would not satisfy the
requirement in Definition 37.1.2 (i) that $d(x,y)$ must be non-zero whenever $x \ne y$.


37.5.8 REMARK: Notation to distinguish topologies induced on one set by different metric functions. 
When there are two or more choices of a metric on a given set $M$, one could use a notation such as $\mathrm{Top}_d(M)$
to indicate the topology on $M$ for a particular metric $d$. But this could be confused with the notation
$\mathrm{Top}_x(M)$ for the set of open neighbourhoods of $x$ in $\mathrm{Top}(M)$. A better notation for the topology induced
by $(M,d)$ would be $\mathrm{Top}(M,d)$ or $\mathrm{Top}^d(M)$.


37.5.9 REMARK: Smaller distance functions give weaker topologies. 

Theorem 37.5.10 (i) means, very informally speaking, that smaller distance functions give weaker topologies. 
If the distance function is smaller, then the open balls (with a given radius) around points are larger. So 
it is more difficult to fit those open balls into an open set. Therefore fewer sets are open. So the topology 
is weaker. This more or less explains the direction of the inequalities. Another way to think of this is that 
smaller distance functions separate points less effectively. So the topology is also expected to separate points 
less well. As mentioned in Remark 31.11.1, stronger topologies separate sets more effectively in some sense. 


37.5.10 THEOREM: Uniform bounds between two metrics implies inclusion of their induced topologies. 
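Let $d_1$ and $d_2$ be distance functions on a set $M$. Let $\mathrm{Top}(M,d_1)$ and $\mathrm{Top}(M,d_2)$ denote the respective
topologies induced by $d_1$ and $d_2$ on $M$.

(i) If $\exists C \in \mathbb{R}^+$, $\forall x,y \in M$, $d_1(x,y) \le C d_2(x,y)$, then $\mathrm{Top}(M,d_1) \subseteq \mathrm{Top}(M,d_2)$.

(ii) If $\exists c, C \in \mathbb{R}^+$, $\forall x,y \in M$, $c\,d_1(x,y) \le d_2(x,y) \le C d_1(x,y)$, then $\mathrm{Top}(M,d_1) = \mathrm{Top}(M,d_2)$.


PROOF: For part (i), let $\Omega \in \mathrm{Top}(M,d_1)$. Let $a \in \Omega$. Then $B^1_{a,\delta} \subseteq \Omega$ for some $\delta \in \mathbb{R}^+$ by Theorem 37.5.6 (ii),
where $B^k$ denotes the respective open balls of $(M,d_k)$ for $k = 1,2$. Let $\delta' = \delta/C$. Let $x \in B^2_{a,\delta'}$. Then
$d_2(a,x) < \delta'$. So $d_1(a,x) \le C d_2(a,x) < C\delta' = \delta$. So $x \in B^1_{a,\delta}$. Therefore $B^2_{a,\delta'} \subseteq B^1_{a,\delta} \subseteq \Omega$. Consequently
$\forall a \in \Omega$, $\exists \delta' \in \mathbb{R}^+$, $B^2_{a,\delta'} \subseteq \Omega$. So $\Omega \in \mathrm{Top}(M,d_2)$ by Theorem 37.5.6 (ii). Hence $\mathrm{Top}(M,d_1) \subseteq \mathrm{Top}(M,d_2)$.

Part (ii) follows from the double application of part (i).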
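
The ball inclusion $B^2_{a,\delta/C} \subseteq B^1_{a,\delta}$ which drives part (i) can be spot-checked numerically. The following Python sketch is an illustration only and is not taken from the book. It assumes $M = \mathbb{R}^2$ with the maximum metric and the taxicab metric, which satisfy $d_{\max} \le d_{\mathrm{taxi}} \le 2\,d_{\max}$, and it tests the inclusion only on a finite grid of sample points. The names `d_max`, `d_taxi`, `ball_inclusion_holds` and `GRID` are hypothetical.

```python
from itertools import product

def d_max(p, q):
    """Maximum (Chebyshev) metric on R^2."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def d_taxi(p, q):
    """Taxicab (Manhattan) metric on R^2."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def ball_inclusion_holds(d1, d2, a, delta, C, points):
    """Check on the sample points that the d2-ball of radius delta/C at a
    is included in the d1-ball of radius delta at a.
    This is the inclusion used when d1 <= C * d2."""
    delta2 = delta / C
    return all(d1(a, x) < delta for x in points if d2(a, x) < delta2)

# Finite sample of the plane: a 41 x 41 grid with spacing 0.1.
GRID = [(i / 10, j / 10) for i, j in product(range(-20, 21), repeat=2)]
```

With `d1 = d_max`, `d2 = d_taxi` and `C = 1`, the check corresponds to one inclusion of topologies. Swapping the roles of the two metrics with `C = 2` corresponds to the other inclusion, so the two metrics induce the same topology, as in part (ii).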


37.5.11 REMARK: All definitions for topological spaces may be imported to metric spaces. 

All of the definitions for topological spaces apply also to metric spaces by referring to the induced topology. 
Thus a metric space is said to be paracompact if the induced topology is paracompact, and so forth. Similarly, 
continuity between metric spaces, and between metric spaces and topological spaces, is defined as if the metric 
function were replaced with the induced topology, as superfluously presented in Definition 38.1.1. 
37.5.12 REMARK: All open balls in a metric space are open sets in the induced topology.
Open balls in a metric space are automatically open sets by Definition 37.5.2. 


37.5.13 THEOREM: Closed balls in a metric space are closed in the induced topology. 
Closed balls in a metric space are closed sets in the induced topology. 


PROOF: Consider the closed ball $S = \{y \in M;\ d(x,y) \le r\}$. If $r = \infty$, then $S = M$, which is closed.
So assume that $r < \infty$. If $z \in M \setminus S$, then $d(x,z) > r$. Therefore by Theorem 37.1.4 (ii), $B_{z,\varepsilon} \subseteq M \setminus S$
with $\varepsilon = d(x,z) - r$. But $B_{z,\varepsilon}$ is an open set. So $M \setminus S$ is an open set by Theorem 37.5.6 (ii). Therefore $S$ is closed.


37.5.14 REMARK: A closed ball includes the closure of the open ball with the same radius. 

A closed ball in Definition 37.3.1 includes, but is not necessarily equal to, the closure of the corresponding
open ball. Since a closed ball $\bar{B}_{x,r}$ is a closed set, it follows from Definition 31.8.7 for the closure of a set
that the closure $\overline{B_{x,r}}$ of the open ball satisfies $\overline{B_{x,r}} \subseteq \bar{B}_{x,r}$. Discrete spaces such as the integers with the usual metric provide ample counterexamples
to the reverse inclusion. (See also Gelbaum/Olmsted [78], pages 159-160, example 4.)
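
The integer counterexample can be made concrete. The following Python sketch is an illustration only. It assumes a finite window of the integers with the usual metric, and it computes the closure via the zero-distance criterion of Theorem 37.5.6 (ix). The helper names are hypothetical.

```python
def d_usual(m, n):
    """Usual metric on the integers."""
    return abs(m - n)

def open_ball(points, d, x, r):
    """Open ball {y; d(x, y) < r} within the given point set."""
    return {y for y in points if d(x, y) < r}

def closed_ball(points, d, x, r):
    """Closed ball {y; d(x, y) <= r} within the given point set."""
    return {y for y in points if d(x, y) <= r}

def closure(points, d, A):
    """Closure of A as the set of points at distance zero from A."""
    return {x for x in points if A and min(d(x, y) for y in A) == 0}

Z_SAMPLE = range(-5, 6)  # finite window of the integers, for illustration

ball = open_ball(Z_SAMPLE, d_usual, 0, 1)       # the singleton {0}
ball_closure = closure(Z_SAMPLE, d_usual, ball)
ball_closed = closed_ball(Z_SAMPLE, d_usual, 0, 1)
```

Here the closure of the open ball of radius 1 at 0 is `{0}`, whereas the closed ball of radius 1 at 0 is `{-1, 0, 1}`, so the inclusion is strict.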


37.5.15 DEFINITION: A metrisable topological space is a topological space X such that Top(X) is the 
topology induced by some real-valued metric function on X. 


37.5.16 REMARK: The difficulty of discovering whether a topological space is metrisable. 

Definition 37.5.15 gives the name “metrisable” to those topologies which are induced by a metric. But if 
one is given a topological space, it may be seriously problematic to determine whether or not a metric can 
be found which induces the topology. One may resolve the question in the positive by finding a suitable 
metric, and one may resolve it in the negative by showing that the special properties of topologies induced 
by metrics are not satisfied. But there may be cases which cannot be determined one way or the other. 


37.5.17 REMARK:  Metrisability may be considered as a kind of topological separation condition. 

It is shown in Theorem 38.1.5 that every metrisable topological space is a Tg topological space. This links 
metrisable spaces into the hierarchy of topological separation properties in Figure 33.3.7 for Remark 33.3.36. 
Thus metrisability may be considered as a kind of topological separation condition. 


37.6. Some basic metric space induced topology properties 


37.6.1 THEOREM: Relations between open-ball-intersection and point-set distance. 
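Let $(M,d)$ be a metric space. Then for any $S \subseteq M$, $x \in M$, and $r \in \mathbb{R}_0^+$,


(i) $B_{x,r} \cap S = \emptyset \Leftrightarrow d(x,S) \ge r$,
(ii) $B_{x,r} \cap S \ne \emptyset \Leftrightarrow d(x,S) < r \Leftrightarrow \exists y \in S,\ d(x,y) < r \Leftrightarrow \exists y \in S,\ x \in B_{y,r}$.


Hence $\bigcup_{x \in S} B_{x,r} = \{y \in M;\ d(y,S) < r\}$ for any set $S \subseteq M$ and $r \in \mathbb{R}_0^+$.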
PROOF: For part (i), let $S \subseteq M$, $x \in M$, and $r \in \mathbb{R}_0^+$. Suppose that $B_{x,r} \cap S = \emptyset$. Let $y \in S$. Then $y \notin B_{x,r}$.
So $d(x,y) \ge r$ by Notation 37.3.2. Therefore $\forall y \in S$, $d(x,y) \ge r$. So $d(x,S) = \inf\{d(x,y);\ y \in S\} \ge r$ by
Definition 37.4.1. Conversely, suppose that $d(x,S) \ge r$. Then $\forall y \in S$, $d(x,y) \ge r$. So $B_{x,r} \cap S = \emptyset$. (The
special case $r = \infty$ follows a fortiori from Theorems 37.3.6 (vii) and 37.4.2 (i).)

Part (ii) follows as the contrapositive of part (i). (See Theorem 4.7.9 (xxi, xxii).)


Hence $\bigcup_{x \in S} B_{x,r} = \{y \in M;\ \exists x \in S,\ y \in B_{x,r}\} = \{y \in M;\ d(y,S) < r\}$ for any $S \in \mathbb{P}(M)$ and $r \in \mathbb{R}_0^+$.
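
The equivalence between ball-set intersection and point-set distance is easy to test for finite sets, where the infimum is a minimum. The following Python sketch is an illustration only. It assumes the convention that the distance to the empty set is infinite, and the function names are hypothetical.

```python
import math

def dist_to_set(d, x, S):
    """d(x, S) = inf{d(x, y); y in S} for finite S.
    Assumption: the infimum over the empty set is taken to be infinity."""
    return min((d(x, y) for y in S), default=math.inf)

def ball_meets_set(d, x, r, S):
    """True if the open ball of radius r at x contains a point of S."""
    return any(d(x, y) < r for y in S)

def d_real(a, b):
    """Usual metric on the real numbers."""
    return abs(a - b)

S = {2, 5, 9}
```

For `x = 0`, the distance to `S` is 2, so the open ball of radius 2 at 0 misses `S` (the case $d(x,S) \ge r$), while any larger radius gives a ball which meets `S`.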


37.6.2 THEOREM: A closed set contains all points with zero distance from the set. 
Let (M, d) be a metric space. 
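(i) $\forall K \in \overline{\mathrm{Top}}(M)$, $\forall x \in M$, $(x \in K \Leftrightarrow d(x,K) = 0)$.
In other words, for closed sets $K$, a point $x \in M$ is in $K$ if and only if $d(x,K) = 0$.


(ii) $\forall K \in \overline{\mathrm{Top}}(M)$, $K = \{x \in M;\ d(x,K) = 0\}$.


PROOF: For part (i), let $K$ be a closed subset of $M$ and let $x \in M$. Then $M \setminus K$ is open. Therefore

$$\begin{aligned}
x \notin K &\Leftrightarrow x \in M \setminus K \\
&\Leftrightarrow \exists r > 0,\ B_{x,r} \subseteq M \setminus K \\
&\Leftrightarrow \exists r > 0,\ B_{x,r} \cap K = \emptyset \\
&\Leftrightarrow \exists r > 0,\ d(x,K) \ge r \\
&\Leftrightarrow d(x,K) > 0.
\end{aligned}$$

Here the second line uses Theorem 37.5.6 (vi) and the fourth line uses Theorem 37.6.1 (i).
Hence $x \in K \Leftrightarrow d(x,K) = 0$ by Theorem 4.7.9 (xxi, xxii).
Part (ii) follows from part (i) and Notation 7.7.10.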


37.6.3 THEOREM: The interior points of a set are the centres of balls which are included in the set. 
Let (M,d) be a metric space. 


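(i) $\forall S \in \mathbb{P}(M)$, $\forall y \in M$, $(y \in \mathrm{Int}(S) \Leftrightarrow \exists r \in \mathbb{R}^+,\ B_{y,r} \subseteq S)$.
(ii) $\forall S \in \mathbb{P}(M)$, $\forall y \in M$, $(y \in \mathrm{Int}(S) \Leftrightarrow \exists x \in S,\ \exists r \in \mathbb{R}^+,\ (y \in B_{x,r} \text{ and } B_{x,r} \subseteq S))$.
(iii) $\forall S \in \mathbb{P}(M)$, $\mathrm{Int}(S) = \bigcup \{B_{x,r};\ x \in S,\ r \in \mathbb{R}^+,\ B_{x,r} \subseteq S\}$. In other words, the interior of any subset $S$
of $M$ equals the union of all open balls which are included in $S$.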


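PROOF: For part (i), let $S \in \mathbb{P}(M)$. Then $\mathrm{Int}(S) \in \mathrm{Top}(M)$ by Theorem 31.8.13 (i). Let $y \in \mathrm{Int}(S)$.
Then $B_{y,r} \subseteq \mathrm{Int}(S)$ for some $r \in \mathbb{R}^+$ by Theorem 37.5.6 (ii). But $\mathrm{Int}(S) \subseteq S$ by Theorem 31.8.13 (ii). So
$B_{y,r} \subseteq S$ for some $r \in \mathbb{R}^+$. Conversely, suppose that $B_{y,r} \subseteq S$ for some $r \in \mathbb{R}^+$. Then $B_{y,r} \in \mathrm{Top}(M)$ by
Theorem 37.5.6 (i). So $B_{y,r} \in \mathrm{Top}_y(M)$ because $y \in B_{y,r}$. Therefore $y \in \mathrm{Int}(S)$ by Theorem 31.8.13 (iii).
Thus $(\exists r \in \mathbb{R}^+,\ B_{y,r} \subseteq S) \Rightarrow y \in \mathrm{Int}(S)$. Hence $y \in \mathrm{Int}(S) \Leftrightarrow \exists r \in \mathbb{R}^+,\ B_{y,r} \subseteq S$.

For part (ii), let $S \in \mathbb{P}(M)$ and $y \in \mathrm{Int}(S)$. Then $\exists r \in \mathbb{R}^+$, $B_{y,r} \subseteq S$ by part (i). Therefore the proposition
$\exists x \in S$, $\exists r \in \mathbb{R}^+$, $(y \in B_{x,r}$ and $B_{x,r} \subseteq S)$ is true by choosing $x$ equal to $y$, since $y \in \mathrm{Int}(S) \subseteq S$.

For the converse, let $y \in M$ and suppose that $\exists x \in S$, $\exists r \in \mathbb{R}^+$, $(y \in B_{x,r}$ and $B_{x,r} \subseteq S)$. In other words,
$y \in B_{x,r}$ for some $x \in S$ and $r \in \mathbb{R}^+$ such that $B_{x,r} \subseteq S$. But $B_{x,r} \in \mathrm{Top}(M)$ by Theorem 37.5.6 (i). So
$B_{x,r} \in \mathrm{Top}_y(M)$ because $y \in B_{x,r}$. Therefore $y \in \mathrm{Int}(S)$ by Theorem 31.8.13 (iii).

For part (iii), let $S \in \mathbb{P}(M)$ and let $y \in \mathrm{Int}(S)$. Then $y \in B_{x,r}$ for some $x \in S$ and $r \in \mathbb{R}^+$ such that $B_{x,r} \subseteq S$
by part (ii). So $y \in \bigcup \{B_{x,r};\ x \in S,\ r > 0,\ B_{x,r} \subseteq S\}$. Therefore $\mathrm{Int}(S) \subseteq \bigcup \{B_{x,r};\ x \in S,\ r > 0,\ B_{x,r} \subseteq S\}$.
For the reverse inclusion, let $y \in \bigcup \{B_{x,r};\ x \in S,\ r > 0,\ B_{x,r} \subseteq S\}$. Then $y \in B_{x,r}$ for some $x \in S$ and
$r \in \mathbb{R}^+$ such that $B_{x,r} \subseteq S$. So $y \in \mathrm{Int}(S)$ by part (ii). So $\mathrm{Int}(S) \supseteq \bigcup \{B_{x,r};\ x \in S,\ r > 0,\ B_{x,r} \subseteq S\}$.
Hence $\mathrm{Int}(S) = \bigcup \{B_{x,r};\ x \in S,\ r > 0,\ B_{x,r} \subseteq S\}$.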


37.6.4 REMARK: Expressions for the interior, closure, exterior and boundary of sets in a metric space. 
Definitions and properties of the interior, closure, exterior and boundary of sets in a general topological 
space are presented in Sections 31.8 and 31.9. The correspondence between the metric on a metric space M 
and these set components in the induced topology on M is expressed in Theorem 37.6.5 in terms of the set 
distance function. 


37.6.5 THEOREM: Expressions for interior, closure, exterior and boundary in terms of the metric. 
Let $(M,d)$ be a metric space and $S \subseteq M$.


(i) $\mathrm{Int}(S) = \{x \in M;\ d(x, M \setminus S) > 0\}$.

(ii) $\bar{S} = \{x \in M;\ d(x,S) = 0\}$.

(iii) $\mathrm{Ext}(S) = \{x \in M;\ d(x,S) > 0\}$.

(iv) $\mathrm{Bdy}(S) = \{x \in M;\ d(x,S) = 0 \text{ and } d(x, M \setminus S) = 0\}$.

PROOF: Part (i) is a restatement of Theorem 37.5.6 (vii). (Alternatively, note that $\mathrm{Int}(S) = M \setminus \overline{M \setminus S}$
by Theorem 31.8.13 (xi). So $\mathrm{Int}(S) = M \setminus \{x \in M;\ d(x, M \setminus S) = 0\}$ by Theorem 37.5.6 (ix). Hence
$\mathrm{Int}(S) = \{x \in M;\ d(x, M \setminus S) > 0\}$.)

Part (ii) is a restatement of Theorem 37.5.6 (ix).

Part (iii) follows from part (i) because $\mathrm{Ext}(S) = \mathrm{Int}(M \setminus S)$ by Theorem 31.9.10 (iv).

Part (iv) follows from parts (i) and (ii) because $\mathrm{Bdy}(S) = \bar{S} \setminus \mathrm{Int}(S)$ by Definition 31.9.5.
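
The distance-based expressions for interior, exterior and boundary can be tried out on a simple subset of the real line. The following Python sketch is an illustration only. It assumes $M = \mathbb{R}$ with the usual metric and $S = [0,1)$, for which both set-distance functions have closed forms. The function names are hypothetical.

```python
def dist_to_S(x):
    """d(x, S) for S = [0, 1): zero on [0, 1], otherwise the gap to the interval."""
    return max(0.0, -x, x - 1.0)

def dist_to_complement(x):
    """d(x, R \\ S) for S = [0, 1), where R \\ S is (-inf, 0) united with [1, inf)."""
    if x < 0.0 or x >= 1.0:
        return 0.0
    return min(x, 1.0 - x)

def classify(x):
    """Classify x as an interior, exterior or boundary point of S = [0, 1),
    using the distance criteria for Int(S), Ext(S) and Bdy(S)."""
    if dist_to_complement(x) > 0.0:
        return "interior"
    if dist_to_S(x) > 0.0:
        return "exterior"
    return "boundary"
```

For example, `classify(0.5)` gives `"interior"`, `classify(2.0)` gives `"exterior"`, and both end-points 0 and 1 are classified as `"boundary"`, even though 1 is not an element of $S$.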
37.6.6 THEOREM: Set closure equals intersection of unions of open balls centred on points in the set.
Let $(M,d)$ be a metric space and $S \subseteq M$. Then $\bar{S} = \bigcap_{r>0} \bigcup_{x \in S} B_{x,r}$.


PROOF: Let $G_r = \bigcup_{x \in S} B_{x,r}$ for $r > 0$. Then $G_r = \{x \in M;\ d(x,S) < r\}$ by Theorem 37.6.1. So
$\bar{S} = \{x \in M;\ d(x,S) = 0\} \subseteq G_r$ by Theorem 37.6.5 (ii), and so $\bar{S} \subseteq \bigcap_{r>0} G_r$.

To show the reverse inclusion, suppose that $y \notin \bar{S}$. Then $d(y,S) > 0$ by Theorem 37.5.6 (viii). Let $r = d(y,S)$.
Then $y \notin G_r = \{x \in M;\ d(x,S) < r\}$. So $y \notin \bigcap_{r>0} G_r$. Therefore $\bigcap_{r>0} G_r \subseteq \bar{S}$ by Theorem 4.5.7 (iii) and
Definition 7.3.2. Hence $\bar{S} = \bigcap_{r>0} \bigcup_{x \in S} B_{x,r}$.


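37.6.7 REMARK: Expression for the closure of a set in a metric space.
An alternative proof for Theorem 37.6.6 is as follows.

$$\begin{aligned}
x \in \bar{S} &\Leftrightarrow d(x,S) = 0 \\
&\Leftrightarrow \inf\{d(x,y);\ y \in S\} = 0 \\
&\Leftrightarrow \forall r > 0,\ \exists y \in S,\ d(x,y) < r \\
&\Leftrightarrow \forall r > 0,\ \exists y \in S,\ x \in B_{y,r} \\
&\Leftrightarrow \forall r > 0,\ x \in \bigcup_{y \in S} B_{y,r} \\
&\Leftrightarrow x \in \bigcap_{r>0} \bigcup_{y \in S} B_{y,r}.
\end{aligned}$$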


This is possibly a more insightful form of proof. 


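37.6.8 THEOREM: The diameter of a set equals the diameter of its closure.
Let $M$ be a metric space. Let $S$ be a non-empty subset of $M$. Then $\mathrm{diam}(S) = \mathrm{diam}(\bar{S})$.


PROOF: It follows from Theorem 37.4.13 that $\mathrm{diam}(S) \le \mathrm{diam}(\bar{S})$. Let $x,y \in \bar{S}$. Let $\varepsilon \in \mathbb{R}^+$. Then by
Theorem 37.5.6 (x), $S \cap B_{x,\varepsilon/2} \ne \emptyset$ and $S \cap B_{y,\varepsilon/2} \ne \emptyset$. So there are $x',y' \in S$ with $d(x',x) < \varepsilon/2$ and
$d(y',y) < \varepsilon/2$. Therefore $d(x',y') > d(x,y) - \varepsilon$ by the triangle inequality, and so $\mathrm{diam}(S) > d(x,y) - \varepsilon$. Since $\varepsilon$ is
arbitrary, $\mathrm{diam}(S) \ge d(x,y)$. Since this holds for all $x,y \in \bar{S}$, it
follows that $\mathrm{diam}(S) \ge \mathrm{diam}(\bar{S})$. Hence $\mathrm{diam}(S) = \mathrm{diam}(\bar{S})$.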


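37.6.9 THEOREM: Every metric space is a Hausdorff space with the induced topology.
Let $X$ be a metric space. Then $X$ is a Hausdorff space.

PROOF: Let $X$ be a metric space. Let $x,y \in X$ with $x \ne y$. Then $d(x,y) \in \mathbb{R}^+$ by Definition 37.2.3 (i). Let
$r = d(x,y)/3$. Then $r \in \mathbb{R}^+$. Let $\Omega_x = B_{x,r}$ and $\Omega_y = B_{y,r}$. Then $\Omega_x, \Omega_y \in \mathrm{Top}(X)$ by Definition 37.5.2,
and $\Omega_x \cap \Omega_y = \emptyset$ (by the triangle inequality, since any $z \in \Omega_x \cap \Omega_y$ would satisfy $d(x,y) \le d(x,z) + d(z,y) < 2r < d(x,y)$), and $x \in \Omega_x$ and $y \in \Omega_y$. Hence $X$ is a Hausdorff space.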


37.6.10 THEOREM: Every metric space is first countable with the induced topology.
The topology of a metric space is first countable.


PROOF: Let $X$ be a metric space. Let $x \in X$. Let $B = \{B_{x,1/k};\ k \in \mathbb{Z}^+\}$. Then $B$ is a countable set
of open neighbourhoods of $x$. Let $\Omega \in \mathrm{Top}_x(X)$. Then $B_{x,\delta} \subseteq \Omega$ for some $\delta \in \mathbb{R}^+$ by Theorem 37.5.6 (iv).
Let $k = \mathrm{ceiling}(1/\delta)$. (See Definition 17.3.9.) Then $k \in \mathbb{Z}^+$ and $1/\delta \le k$. So $1/k \le \delta$. Therefore
$B_{x,1/k} \subseteq B_{x,\delta} \subseteq \Omega$. So $B$ is a countable open base for $\mathrm{Top}(X)$ at $x$ by Definition 32.2.2. Hence $X$ is first
countable by Definition 33.4.12.
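
The only arithmetic in this proof is the choice of the index $k$ from the radius $\delta$. The following Python sketch is an illustration only. It uses exact rational arithmetic to avoid rounding issues, and the name `ball_index` is hypothetical.

```python
from fractions import Fraction
import math

def ball_index(delta):
    """Return k = ceiling(1/delta), a positive integer with 1/k <= delta,
    so that the ball of radius 1/k is included in the ball of radius delta."""
    return math.ceil(1 / Fraction(delta))
```

For example, `ball_index(Fraction(3, 10))` gives 4, and indeed $1/4 \le 3/10$.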


37.7. Compactness and convergence in metric spaces 


37.7.1 THEOREM: The union of two bounded sets is a bounded set. 
Let $S_1$ and $S_2$ be non-empty bounded subsets of a metric space $M$. Then $S_1 \cup S_2$ is bounded in $M$.


PROOF: Let $S_1$ and $S_2$ be non-empty bounded subsets of a metric space $(M,d)$. Let $y_1 \in S_1$
and $y_2 \in S_2$. Then $d(y_1,y_2) \in \mathbb{R}_0^+$. Let $x_1, x_2 \in S_1 \cup S_2$. Then either $x_1$ and $x_2$ are both in the same set $S_1$
or $S_2$, or they are in different sets. If they are in the same set, then $d(x_1,x_2) \le \max(\mathrm{diam}_d(S_1), \mathrm{diam}_d(S_2))$.
If they are in different sets, say $x_1 \in S_1$ and $x_2 \in S_2$, then $d(x_1,x_2) \le d(x_1,y_1) + d(y_1,y_2) + d(y_2,x_2)$ by the triangle inequality for $d$. So
$d(x_1,x_2) \le \mathrm{diam}_d(S_1) + \mathrm{diam}_d(S_2) + d(y_1,y_2)$. But $\max(\mathrm{diam}_d(S_1), \mathrm{diam}_d(S_2)) \le \mathrm{diam}_d(S_1) + \mathrm{diam}_d(S_2) +
d(y_1,y_2)$. Therefore $d(x_1,x_2) \le \mathrm{diam}_d(S_1) + \mathrm{diam}_d(S_2) + d(y_1,y_2) \in \mathbb{R}_0^+$ for all $x_1,x_2 \in S_1 \cup S_2$. Hence
$S_1 \cup S_2$ is bounded.
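
The bound obtained in this proof can be checked directly for finite sets. The following Python sketch is an illustration only. It assumes finite subsets of the real line with the usual metric, and the names `diam` and `union_bound` are hypothetical.

```python
from itertools import product

def d_real(a, b):
    """Usual metric on the real numbers."""
    return abs(a - b)

def diam(d, S):
    """Diameter of a non-empty finite set S with respect to the metric d."""
    return max(d(x, y) for x, y in product(S, repeat=2))

def union_bound(d, S1, S2, y1, y2):
    """The bound diam(S1) + diam(S2) + d(y1, y2) for the diameter of the
    union of S1 and S2, where y1 is in S1 and y2 is in S2."""
    return diam(d, S1) + diam(d, S2) + d(y1, y2)

S1 = {0, 1, 2}
S2 = {10, 12}
```

With `y1 = 0` and `y2 = 10`, the bound is `2 + 2 + 10 = 14`, while the diameter of the union is 12.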
37.7.2 THEOREM: A set is bounded if it is covered by a finite set of bounded sets.
Let $K$ be a subset of a metric space $M$. Let $C$ be a finite open cover of $K$ such that $\Omega$ is a bounded subset
of $M$ for all $\Omega \in C$. Then $K$ is a bounded subset of $M$.


PROOF: Let $(M,d)$ be a metric space. Let $K$ be a subset of $M$. If $K = \emptyset$, then $K$ is bounded
by Definition 37.4.12. So assume that $K$ is a non-empty subset of $M$. Let $C$ be a finite open cover of
$K$ such that $\Omega$ is a bounded subset of $M$ for all $\Omega \in C$. Let $C' = C \setminus \{\emptyset\}$. Then $K \subseteq \bigcup C' = \bigcup C$.
Let $c_{\Omega_1,\Omega_2} = \mathrm{diam}_d(\Omega_1 \cup \Omega_2)$ for $\Omega_1, \Omega_2 \in C'$. Then $c_{\Omega_1,\Omega_2} \in \mathbb{R}_0^+$ by Theorem 37.7.1. So $\mathrm{diam}_d(\bigcup C') \le
\max\{c_{\Omega_1,\Omega_2};\ \Omega_1, \Omega_2 \in C'\} \in \mathbb{R}_0^+$ because any points $x_1, x_2 \in \bigcup C'$ must be elements of some $\Omega_1, \Omega_2 \in C'$,
and so $d(x_1,x_2) \le c_{\Omega_1,\Omega_2}$. But $\mathrm{diam}_d(K) \le \mathrm{diam}_d(\bigcup C')$ by Theorem 37.4.13. Hence $K$ is bounded.


37.7.3 THEOREM: All compact subsets of a metric space are closed and bounded. 
All compact subsets of a metric space are closed and bounded. 


PROOF: Let $X$ be a metric space. Let $K$ be a compact subset of $X$. Let $C = \{\Omega \in \mathrm{Top}(X);\ \mathrm{diam}(\Omega) < 1\}$.
Then $X = \bigcup C$, and so $K \subseteq \bigcup C$. Therefore by Definition 33.5.10, there is a finite subset $C'$ of $C$ such
that $K \subseteq \bigcup C'$. It follows from Theorem 37.7.2 that $K$ is bounded.


X is a Hausdorff space by Theorem 37.6.9. Therefore K is closed by Theorem 33.5.14. Hence K is closed 
and bounded. 


37.7.4 REMARK: Boundedness of continuous real-valued functions on compact sets. 
Theorem 37.7.5 is an immediate useful consequence of Theorem 37.7.3. 


37.7.5 THEOREM: Continuous metric-space-valued functions on compact sets are bounded. 
Continuous metric-space-valued functions are bounded on compact sets. 
Hence continuous real-valued functions are bounded on compact sets. 


PROOF: Let $X$ be a topological space. Let $K$ be a compact subset of $X$. Let $Y$ be a metric space.
Let $f : X \to Y$ be continuous. Then $f(K)$ is compact by Theorem 33.5.15. Hence $f(K)$ is bounded by
Theorem 37.7.3.


37.7.6 REMARK: Closed bounded sets in metric spaces may not be compact. 
The converse of Theorem 37.7.3 is not valid in general. In other words, a closed, bounded subset of a metric
space is not necessarily compact. 


37.7.7 THEOREM: Subsets of real Cartesian spaces are compact if and only if they are closed and bounded. 
For all $n \in \mathbb{Z}^+$, a subset of $\mathbb{R}^n$ (with the usual metric) is compact if and only if it is closed and bounded.


PROOF: Let $n \in \mathbb{Z}^+$. Then every closed and bounded subset of $\mathbb{R}^n$ is compact by Theorem 34.9.14. The
converse follows from Theorem 37.7.3.


37.7.8 THEOREM: Continuous real-valued functions attain their extrema on compact sets. 
Let $X$ be a topological space. Let $K$ be a non-empty compact subset of $X$. Let $f : K \to \mathbb{R}$ be continuous.
Then $\min(f)$ and $\max(f)$ are well-defined real numbers which are elements of $f(K)$.


PROOF: $f(K)$ is a non-empty compact subset of $\mathbb{R}$ by Theorem 33.5.15. So $f(K)$ is a non-empty, bounded,
closed subset of $\mathbb{R}$ by Theorem 37.7.3. Therefore $\max(f) = \max(f(K))$ and $\min(f) = \min(f(K))$ are
well-defined elements of $f(K)$ by Theorem 32.5.11 (ii, iv).


37.7.9 REMARK: Choice function for compact subsets of Cartesian spaces. 

Theorem 37.7.10 is useful in situations such as Theorem 37.9.6 (i, ii), where a sequence of points is chosen for
a sequence of compact subsets of a Cartesian space. The inductive proof of Theorem 37.7.10 relies upon the
fact that a lower-dimensional “projected slice” of a compact set in a Cartesian product is compact.
(See Definitions 10.12.7 and 10.13.12 for projected slices of subsets of Cartesian set-products.) Note that the
choice function in Theorem 37.7.10 is the same as one would obtain by choosing the least element according
to a lexicographic order on $\mathbb{R}^n$. (See Definition 11.6.19 for lexicographic order.)


37.7.10 THEOREM: Choice function for non-empty compact subsets of Cartesian spaces. 
In every Cartesian space, there is a choice function for non-empty compact sets. In other words, 
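$\forall n \in \mathbb{Z}_0^+$, $\exists \phi : C_n \to \mathbb{R}^n$, $\forall K \in C_n$, $\phi(K) \in K$, where $C_n = \{K \in \mathbb{P}(\mathbb{R}^n) \setminus \{\emptyset\};\ K \text{ is compact}\}$ for $n \in \mathbb{Z}_0^+$.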


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




PROOF: For $n = 0$, $\mathbb{R}^n = \{0\}$ and $\mathrm{Top}(\mathbb{R}^n) = \{\emptyset, \{0\}\}$. So $C_0 = \{\{0\}\}$. Therefore $\phi : C_0 \to \mathbb{R}^n$ with
$\phi(\{0\}) = 0$ is a suitable choice function.

For $n = 1$, $C_1$ is the set of non-empty bounded closed sets of real numbers by Theorem 37.7.7. Define
$\phi : C_1 \to \mathbb{R}$ by $\phi(K) = \inf(K)$ for all $K \in C_1$. Then $\phi(K) \in K$ for all $K \in C_1$ because closed sets contain
all of their limit points by Theorem 31.10.12 (vi). So $\phi$ is a suitable choice function.


For $n \ge 2$, assume that the assertion is proved for $\mathbb{R}^{n-1}$. Let $K$ be a non-empty compact subset of $\mathbb{R}^n$.
Let $K_n = \Pi_n(K)$, where $\Pi_n : \mathbb{R}^n \to \mathbb{R}$ is the projection map as in Definition 10.13.2. Then $K_n$ is a
non-empty compact subset of $\mathbb{R}$, by Theorem 33.5.15 because $\Pi_n$ is continuous by Theorem 32.12.3 (ii). So
$a_n = \inf(K_n) = \inf\{x_n;\ x \in K\}$ is a well-defined element of $K_n$ as in the case $n = 1$.


Let $K' = \{x' \in \mathbb{R}^{n-1};\ (x',a_n) \in K\}$, where $(x',a_n)$ denotes the concatenation of the tuples $x' \in \mathbb{R}^{n-1}$ and
$(a_n) \in \mathbb{R}^1$. Then $K'$ is non-empty because $a_n \in K_n$, and $K'$ is a compact subset of $\mathbb{R}^{n-1}$ by Theorem 32.10.8 (i) because $K' \times \{a_n\}$ is a compact
subset of $\mathbb{R}^n$ by Theorem 33.5.13 since $K' \times \{a_n\}$ is a closed subset of $K$ because it is the intersection of
the closed subsets $K$ and $\{x \in \mathbb{R}^n;\ x_n = a_n\}$ of $\mathbb{R}^n$. By the inductive hypothesis, there is a suitable choice
function $\phi_{n-1} : C_{n-1} \to \mathbb{R}^{n-1}$. Define $\phi_n : C_n \to \mathbb{R}^n$ by $\phi_n(K) = (\phi_{n-1}(K'), \inf(K_n))$ for all $K \in C_n$.
Then $\phi_n(K) \in K$ for all $K \in C_n$. The assertion for general $n \in \mathbb{Z}_0^+$ follows by induction.
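
The recursive construction in this proof can be imitated for finite sets of tuples, which are compact subsets of $\mathbb{R}^n$ and for which each infimum is a minimum. The following Python sketch is an illustration only, not the construction for general compact sets. The name `choose` is hypothetical.

```python
def choose(K):
    """Choice function for a non-empty finite set K of real n-tuples.
    Minimise the last coordinate, then recurse on the slice of K at that value."""
    n = len(next(iter(K)))
    if n == 0:
        return ()                       # the only 0-tuple
    a_n = min(x[-1] for x in K)         # inf of the projection onto coordinate n
    K_slice = {x[:-1] for x in K if x[-1] == a_n}   # projected slice at x_n = a_n
    return choose(K_slice) + (a_n,)

K = {(3, 1), (2, 1), (0, 5)}
```

For this `K`, the least last coordinate is 1, the slice is `{(3,), (2,)}`, and the chosen element is `(2, 1)`. This agrees with the least element of `K` in the lexicographic order which compares the last coordinate first.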


37.7.11 REMARK: Convergence and limits of sequences in a metric space. 
Definition 35.4.2 for convergence and limits of sequences in a general topological space may be written in 
terms of the distance function of a metric space as in Theorem 37.7.12. 


It follows from Theorem 35.4.10 that sequences of points in a metric space have at most one limit because a 
metric space is necessarily Hausdorff by Theorem 37.6.9. Therefore a convergent sequence in a metric space 
has one and only one limit. 


37.7.12 THEOREM: Limits of sequences in a metric space. 
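Let $x = (x_i)_{i \in I}$ be a sequence of points in a metric space $X$, where $I$ is $\mathbb{Z}_0^+$ or $\mathbb{Z}^+$. Then a point $y \in X$ is a
limit of $x$ if and only if $\forall \varepsilon \in \mathbb{R}^+$, $\exists n \in I$, $\forall i \ge n$, $d(x_i,y) < \varepsilon$.


PROOF: Let $x = (x_i)_{i \in I}$ be a sequence in $X$, and $y \in X$. By Definition 35.4.2, $y$ is a limit of $x$ if
and only if $\forall \Omega \in \mathrm{Top}_y(X)$, $\exists n \in I$, $\forall i \ge n$, $x_i \in \Omega$. Suppose that $y$ is a limit of $x$. Let $\varepsilon \in \mathbb{R}^+$. Then
$B_{y,\varepsilon} \in \mathrm{Top}_y(X)$ by Theorem 37.5.6 (i). So $\exists n \in I$, $\forall i \ge n$, $x_i \in B_{y,\varepsilon}$. But $x_i \in B_{y,\varepsilon}$ if and only if $d(x_i,y) < \varepsilon$
by Definition 37.3.1. Therefore $\forall \varepsilon \in \mathbb{R}^+$, $\exists n \in I$, $\forall i \ge n$, $d(x_i,y) < \varepsilon$.


Now assume that $\forall \varepsilon \in \mathbb{R}^+$, $\exists n \in I$, $\forall i \ge n$, $d(x_i,y) < \varepsilon$. Let $\Omega \in \mathrm{Top}_y(X)$. Then $B_{y,\varepsilon} \subseteq \Omega$ for some
$\varepsilon \in \mathbb{R}^+$ by Theorem 37.5.6 (vi). For such $\varepsilon$, $\exists n \in I$, $\forall i \ge n$, $d(x_i,y) < \varepsilon$. So $\exists n \in I$, $\forall i \ge n$, $x_i \in B_{y,\varepsilon}$ by
Definition 37.3.1. Therefore $\exists n \in I$, $\forall i \ge n$, $x_i \in \Omega$. Consequently $y$ is a limit of $x$.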
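
The metric form of the limit condition can be made explicit for a simple sequence. The following Python sketch is an illustration only. It assumes the sequence $x_i = 1/i$ for $i \in \mathbb{Z}^+$ in the real numbers with limit $y = 0$, and it returns an explicit index $n$ for a given $\varepsilon$. The name `tail_index` is hypothetical.

```python
from fractions import Fraction
import math

def tail_index(eps):
    """For x_i = 1/i (i >= 1) and y = 0, return an index n such that
    d(x_i, y) = 1/i < eps for all i >= n, namely n = floor(1/eps) + 1."""
    return math.floor(1 / Fraction(eps)) + 1

eps = Fraction(1, 10)
n = tail_index(eps)
```

For `eps = 1/10`, this gives `n = 11`. The term with index 10 is exactly `1/10`, which is not less than `eps`, so no smaller index works.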


37.7.13 REMARK: Relation between compactness and sequential compactness in a metric space. 
See Definition 35.6.2 for sequentially compact sets. 


37.7.14 THEOREM: Every compact set in a metric space is sequentially compact. 
In a metric space every compact set is sequentially compact. 


PROOF: Let X be a metric space. Then X is first countable by Theorem 37.6.10. So every compact set 
in X is sequentially compact by Theorem 35.6.6 (vi). 


37.7.15 THEOREM [ZF+CC]: Equivalence of compactness and sequential compactness. (Conditions apply.) 
(i) In a second countable metric space, every sequentially compact set is compact. 


(ii) A subset of a second countable metric space is compact if and only if it is sequentially compact. 


PROOF: For part (i), let $X$ be a second countable metric space. Then $X$ is a Lindelöf space by the
axiom of countable choice and Theorem 33.7.12. Let $K$ be a sequentially compact subset of $X$. Then $K$ is
sequence-range $\infty$-limit-point compact by Theorem 35.6.6 (i). So $K$ is $\omega$-infinite-set $\infty$-limit-point compact
by Theorem 35.6.5 (ii). Then $K$ is countably compact by the axiom of countable choice, Theorem 33.7.6 (ii)
and Definition 35.5.12. Hence $K$ is compact by Theorem 33.7.11 because $X$ is a Lindelöf space.


Part (ii) follows from part (i) and Theorem 37.7.14. 
37.7.16 REMARK: Existence of Lebesgue numbers for open covers of compact sets in metric spaces.

Since compactness implies sequential compactness in a metric space by Theorem 37.7.14, it follows from 
Theorem 37.7.18 that every open cover of a compact set in a metric space has a Lebesgue number. In 
ZF+CC set theory, this could be useful to show that rectifiability is well-defined for compact-interval paths 
in general topological manifolds. 


Discussion of Lebesgue numbers is simplified by the use of the abbreviation $\mathbb{P}_1(S)$ to mean $\mathbb{P}(S) \setminus \{\emptyset\}$, as
defined in Notation 13.12.2 to mean the set of all subsets of $S$ which contain at least one element.


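37.7.17 DEFINITION: A Lebesgue number for an open cover $C \in \mathbb{P}(\mathrm{Top}(X))$ of a set $S$ in a metric space
$(X,d)$ is a number $\lambda \in \mathbb{R}^+$ such that $\forall A \in \mathbb{P}_1(S)$, $((\mathrm{diam}(A) < \lambda) \Rightarrow (\exists \Omega \in C,\ A \subseteq \Omega))$. In other words,
every non-empty subset of $S$ with diameter less than $\lambda$ is included in at least one of the sets in $C$.


An open cover $C \in \mathbb{P}(\mathrm{Top}(X))$ of a set $S$ in a metric space $(X,d)$ is said to have a Lebesgue number if

$$\exists \lambda \in \mathbb{R}^+,\ \forall A \in \mathbb{P}_1(S),\ (\mathrm{diam}(A) < \lambda) \Rightarrow (\exists \Omega \in C,\ A \subseteq \Omega).$$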
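
A small numerical experiment may help to fix the idea of a Lebesgue number. The following Python sketch is an illustration only. It assumes $S = [0,1]$ in the real numbers with a hypothetical cover by the two open intervals $(-1/10, 6/10)$ and $(4/10, 11/10)$. It tests only sampled closed intervals $[a,b] \subseteq S$, which are the worst cases among subsets with infimum $a$ and supremum $b$, so it is a spot-check, not a proof.

```python
from fractions import Fraction as F

# Hypothetical open cover of S = [0, 1] by two open intervals (lo, hi).
COVER = [(F(-1, 10), F(6, 10)), (F(4, 10), F(11, 10))]

def included_in_some_cover_set(a, b, cover=COVER):
    """True if the closed interval [a, b] lies inside one open interval of the cover."""
    return any(lo < a and b < hi for lo, hi in cover)

def is_lebesgue_number(lam, step=F(1, 100)):
    """Spot-check lam on sampled intervals [a, b] in [0, 1] with b - a < lam."""
    a = F(0)
    while a <= 1:
        b = a
        while b <= 1 and b - a < lam:
            if not included_in_some_cover_set(a, b):
                return False
            b += step
        a += step
    return True
```

In this experiment the candidate `F(1, 5)` passes, whereas `F(3, 10)` fails because the interval from `4/10` to `6/10` has diameter `1/5` but is not included in either set of the cover. The overlap length `1/5` of the two cover sets is thus the largest Lebesgue number suggested by the samples.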


37.7.18 THEOREM [ZF+CC]: Lebesgue covering lemma. 
Every open cover of a sequentially compact set in a metric space has a Lebesgue number. 


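PROOF: Let $X$ be a metric space. Let $S$ be a sequentially compact subset of $X$. Let $C \in \mathbb{P}(\mathrm{Top}(X))$ be an
open cover for $S$. Let $Y = \{A \in \mathbb{P}_1(S);\ \forall \Omega \in C,\ A \not\subseteq \Omega\}$. If $Y = \emptyset$, then $\forall A \in \mathbb{P}_1(S)$, $\exists \Omega \in C$, $A \subseteq \Omega$. Then
clearly $\lambda$ is a Lebesgue number for $C$ for any $\lambda \in \mathbb{R}^+$. So assume that $Y \ne \emptyset$. Suppose that $\mathrm{diam}(A) = \infty$
for all $A \in Y$. Then $\mathrm{diam}(A) = \infty$ for all $A \in \mathbb{P}_1(S)$ such that $\forall \Omega \in C$, $A \not\subseteq \Omega$. Then once again, $\lambda$ is
clearly a Lebesgue number for $C$ for any $\lambda \in \mathbb{R}^+$. So assume that $\mathrm{diam}(A) < \infty$ for some $A \in Y$.


Let $\mu = \inf\{\mathrm{diam}(A);\ A \in Y\}$. Then $\mu < \infty$. Suppose that $\mu > 0$. Then $S$ has the Lebesgue number $\lambda$ for
any $\lambda \in (0,\mu)$. So assume that $\mu = 0$. Suppose that $\{y\} \in Y$ for some $y \in S$. Then $\{y\} \subseteq \Omega$ for some $\Omega \in C$
because $S \subseteq \bigcup C$, which contradicts the definition of $Y$. Therefore there are no singletons in $Y$. So $\mathrm{diam}(A) > 0$ for all $A \in Y$ by Theorem 37.4.9.
Therefore $\forall \varepsilon \in \mathbb{R}^+$, $\exists A \in Y$, $\mathrm{diam}(A) \in (0,\varepsilon)$. So for all $k \in \mathbb{Z}^+$, for some $A \in Y$, $\mathrm{diam}(A) \in (0,k^{-1})$.
Define the set-sequence $(U_k)_{k=1}^\infty$ by $U_k = \bigcup \{A \in Y;\ \mathrm{diam}(A) \in (0,k^{-1})\}$ for all $k \in \mathbb{Z}^+$. Then $U_k \ne \emptyset$
for all $k \in \mathbb{Z}^+$. Therefore $\times_{k=1}^\infty U_k \ne \emptyset$ by the axiom of countable choice. So there exists a sequence
$x = (x_k)_{k=1}^\infty \in \times_{k=1}^\infty U_k$. Then for all $k \in \mathbb{Z}^+$, for some $A \in Y$, $x_k \in A$ and $\mathrm{diam}(A) \in (0,k^{-1})$.

Since $S$ is sequentially compact, the subsequence $x' = x \circ \beta$ has a limit $z \in S$ for some $\beta \in \mathrm{Inc}(\mathbb{Z}^+,\mathbb{Z}^+)$ by
Definition 35.6.2. (See Notation 11.1.32 for “Inc”.) Then $z \in \Omega$ for some $\Omega \in C$ since $S \subseteq \bigcup C$. So $B_{z,r} \subseteq \Omega$
for some $r \in \mathbb{R}^+$ because $\Omega \in \mathrm{Top}(X)$. By Definition 35.4.2, for some $N \in \mathbb{Z}^+$, for all $k \ge N$, $x'_k \in B_{z,r/2}$.
Let $k = \max(N, \mathrm{ceiling}(2/r))$. Then $\beta(k) \ge k$ because $\beta$ is an increasing function. So $x_{\beta(k)} \in A$ for some
$A \in Y$ with $\mathrm{diam}(A) \in (0,\beta(k)^{-1}) \subseteq (0,k^{-1}) \subseteq (0,r/2)$. Consequently $A \subseteq B_{z,r}$ for some $A \in Y$ because
$d(z,y) \le d(z,x'_k) + d(x'_k,y) < r/2 + r/2 = r$ for all $y \in A$, for any $A$ in $\{A \in Y;\ \mathrm{diam}(A) \in (0,\beta(k)^{-1})\}$ which contains $x'_k = x_{\beta(k)}$.
So $A \subseteq \Omega$ for some $A \in Y$ and $\Omega \in C$, which contradicts the definition of $Y$. Therefore the case $\mu = 0$ is
impossible. Hence $S$ has a Lebesgue number.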


37.7.19 REMARK:  Paracompactness of metric spaces. 

It cannot be proved in ZF set theory that all metric spaces are paracompact, even assuming the axiom of 
countable choice. (See Howard/Rubin [362], pages 53, 64, 136, 387. Note that since the publication of their 
book, it has been shown that the Howard/Rubin form 383 is equivalent to form 232.) The consequences of 
paracompactness of a metric space are somewhat abstract, and are usually much easier to prove by more 
direct means. For example, there is a well-known theorem that in every paracompact space, every open 
cover has a partition of unity subordinated to it. (See for example Willard [165], page 152; Szekeres [305],
page 482; Lang [23], pages 34-36; EDM2 [113], page 1615.) But partitions of unity are typically needed in 
practice only for open covers which are so well behaved that it is obvious that the desired partition-of-unity 
functions exist. 


Since the proof that all metric spaces are paracompact is non-trivial, and the proof requires the axiom of 
choice, and the consequences are of dubious concrete value, no formal statement of this theorem is given 
here. (See Willard [165], pages 146-147 for a proof of this theorem.) 


37.7.20 REMARK: The Heine-Borel property for general metric spaces. 
Since the Heine-Borel theorem requires the definition of a bounded set for its statement, it cannot be
generalised from the real numbers to general topological spaces. It requires at least a metric space. (See
Theorem 34.9.13 for the Heine-Borel theorem for the real numbers.)


The Heine-Borel theorem is not valid for all metric spaces. The spaces where it is valid are said to have the 
“Heine-Borel property”. (For definitions and various classes of metric spaces which do and do not have the 
Heine-Borel property, see Rudin [130], pages 9, 18, 34-35, 153.) 


37.7.21 REMARK: The Bolzano-Weierstraß property.

The Bolzano-Weierstraß theorem asserts that the Cartesian metric spaces $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$ have the
Bolzano-Weierstraß property. (See Definition 35.5.4 for the Bolzano-Weierstraß property for general topological
spaces.)


37.7.22 REMARK: All separable metric spaces are second countable. 

All second countable spaces are separable by Theorem 33.4.25, but the converse is not valid for general 
topological spaces. However, the converse is valid for metric spaces, as stated in Theorem 37.7.23. (For the 
relations between separable spaces and second countable spaces, see Hocking/Young [93], pages 11, 64-65; 
Gemignani [80], page 148; Gaal [77], pages 120-121; Baum [54], pages 47-49; Kasriel [100], pages 193-194; 
Wilansky [163], pages 75-76; Willard [165], page 112; Kelley [101], pages 48-49; Steen/Seebach [141], pages 7, 
21-22, 35, 49-50; Simmons [137], page 100.) 


37.7.23 THEOREM: Every separable metric space is second countable. 
Every separable metric space is second countable. 


PROOF: Let $(M,d)$ be a separable metric space. Then $M$ has a countable dense subset $S$. The countability
of $S$ implies that there is a surjection $\phi : \omega \to S$. (It may be assumed that $\phi$ has the infinite domain $\omega$ by
permitting it to be non-injective.) Let $C = \{B_{\phi(i),1/n};\ i \in \omega,\ n \in \mathbb{N}\}$. Then $C \subseteq \mathrm{Top}(M)$, and $C$ is
countable by Theorem 13.9.2 (vii).

To show that $C$ is an open base for $M$, let $\Omega \in \mathrm{Top}(M)$ and let $x \in \Omega$. Then $B_{x,r} \subseteq \Omega$ for some $r \in \mathbb{R}^+$
by Theorem 37.5.6 (ii). Let $n = \mathrm{ceiling}(2/r)$. Then $n \in \mathbb{N}$ and $n \ge 2/r$, and $B_{x,1/n} \in \mathrm{Top}(M)$. Therefore
$\phi(i) \in B_{x,1/n}$ for some $i \in \omega$ by the density of $S$ in $M$. (See Theorem 33.4.3 (iii).) Then $x \in B_{\phi(i),1/n}$ by the
symmetry of $d$, and $B_{\phi(i),1/n} \subseteq B_{x,2/n}$ by Theorem 37.3.10 (iii). But $B_{x,2/n} \subseteq B_{x,r} \subseteq \Omega$. So $C$ is an open
base for $M$ by Definition 32.2.3. Hence $M$ is second countable.
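
The choice of the basic ball in this proof can be traced in the real line, where the rational numbers are a countable dense subset. The following Python sketch is an illustration only. It assumes $M = \mathbb{R}$ with the usual metric and uses exact rational inputs so that the comparisons are exact. The name `base_ball_for` is hypothetical.

```python
from fractions import Fraction
import math

def base_ball_for(x, r):
    """Return (q, n) with q rational and n a positive integer such that
    x lies in the ball B(q, 1/n) and B(q, 1/n) is included in B(x, r)."""
    n = math.ceil(2 / Fraction(r))          # n >= 2/r, so 2/n <= r
    q = Fraction(round(x * 2 * n), 2 * n)   # rational with |x - q| <= 1/(4n) < 1/n
    return q, n

x, r = Fraction(22, 7), Fraction(1, 3)
q, n = base_ball_for(x, r)
```

For `x = 22/7` and `r = 1/3`, this gives `n = 6` and `q = 19/6`. The ball of radius `1/6` at `19/6` is the interval from `3` to `10/3`, which contains `x` and lies inside the ball of radius `1/3` at `x`.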


37.7.24 EXAMPLE: Metric space which is not separable and not second countable. 

As mentioned in Remark 37.5.7, the function $d : M \times M \to \mathbb{R}$ with $d(x,y) = 0$ for $x = y$ and $d(x,y) = 1$
for $x \ne y$ induces the discrete topology on $M$. (See Definition 31.3.19 for the discrete topology.) Then
$\mathrm{Top}(M) = \mathbb{P}(M)$, which does not have a countable base if $M$ is uncountable because any open base
for the discrete topology must contain all sets $\{x\}$ for $x \in M$. (See Definition 32.2.3 for open bases. See
Definition 33.4.13 for second countable spaces.) Consequently not all metric spaces are second countable.


Since every subset of a discrete topological space is closed, the closure of every set is itself. So a dense subset 
of the space must be the whole space. Therefore uncountable discrete topological spaces are not separable. 


37.8. Cauchy sequences and completeness 


37.8.1 REMARK: Some literature for complete metric spaces. 

For complete metric spaces, see Kolmogorov/Fomin [104], pages 56-61; Thomson/Bruckner/Bruckner [149], 
pages 575-579; Kasriel [100], pages 127-132; Simmons [137], pages 70-75; Bruckner/Bruckner/Thomson [56], 
pages 361-365; Shilov [135], pages 80-91; Johnsonbaugh/Pfaffenberger [97], pages 159-165; A.E. Taylor [145],
pages 124-129; Willard [165], pages 175-181; Mattuck [114], pages 78-89; Gemignani [80], pages 217-222; 
Hocking/Young [93], pages 81-90; Graves [85], pages 346-350; Wilansky [163], pages 168-171; Schramm [133], 
pages 76-79; Gaal [77], pages 274-278; Kelley [101], pages 192-194; Steen/Seebach [141], pages 36-37. 


37.8.2 REMARK: Cauchy sequences specify convergence without specifying a limit. 

The purpose of defining Cauchy sequences is to specify that a sequence is in some way convergent, but 
without having to say what it converges to. Thus in Definition 37.8.3, the points of a sequence are required 
to come as close together as one wishes, and one knows that since all of the points in the sequence after some 
index lie within a ball of successively smaller radius, the sequence at least cannot converge to more than
one point. If there are no “holes” in the metric space, the sequence intuitively should then converge to some
point in the space. The sets $\mathbb{Z}[k,\infty)$ in Definition 37.8.3 are defined as $\{i \in \mathbb{Z};\ k \le i\}$ in Notation 14.4.11.


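37.8.3 DEFINITION: A Cauchy (convergent) sequence in a metric space $(M,d)$ is a sequence $(x_i)_{i=0}^\infty \in M^\omega$
which satisfies

    $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ \forall i,j \in \mathbb{Z}[k,\infty),\quad d(x_i,x_j) < \varepsilon.$    (37.8.1)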


37.8.4 THEOREM: Every convergent sequence in a metric space is a Cauchy sequence. 
In a metric space, every convergent sequence is a Cauchy sequence. 


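PROOF: Let $x = (x_i)_{i=0}^\infty$ be a convergent sequence in a metric space $M$. Then by Theorem 37.7.12 and
Definition 35.4.2, there is a $y \in M$ such that $\forall \varepsilon' \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ \forall i \ge k,\ d(x_i,y) < \varepsilon'$. Let $\varepsilon \in \mathbb{R}^+$. Let
$\varepsilon' = \varepsilon/2$. Then there is a $k \in \mathbb{Z}_0^+$ such that $d(x_i,y) < \varepsilon'$ for all $i \ge k$. So $d(x_i,x_j) \le d(x_i,y) + d(y,x_j) < \varepsilon$
for all $i,j \ge k$. Thus $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ \forall i,j \ge k,\ d(x_i,x_j) < \varepsilon$. Hence $x$ is a Cauchy sequence in $M$.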


37.8.5 THEOREM: Every Cauchy sequence in a metric space converges to at most one point. 
Every Cauchy sequence in a metric space converges to at most one point. 


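PROOF: Let $(M,d)$ be a metric space. Let $x = (x_i)_{i=0}^\infty$ be a Cauchy sequence in $M$. Suppose that
$x$ converges to points $y_1, y_2 \in M$. Then $\forall \varepsilon_m \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ \forall i \in \mathbb{Z}[k,\infty),\ d(y_m,x_i) < \varepsilon_m$ for $m = 1,2$ by
Theorem 37.7.12. But $d(y_1,y_2) \le d(y_1,x_i) + d(x_i,x_j) + d(x_j,y_2)$ for all $i,j \in \mathbb{Z}_0^+$ by the triangle inequality.
Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \varepsilon_2 = \varepsilon/3$. Then $\exists k_m(\varepsilon_m) \in \mathbb{Z}_0^+,\ \forall i \in \mathbb{Z}[k_m(\varepsilon_m),\infty),\ d(y_m,x_i) < \varepsilon_m = \varepsilon/3$
for $m = 1,2$. But by Definition 37.8.3, $\exists k \in \mathbb{Z}_0^+,\ \forall i,j \in \mathbb{Z}[k,\infty),\ d(x_i,x_j) < \varepsilon/3$. Therefore $d(y_1,y_2) \le
d(y_1,x_i) + d(x_i,x_j) + d(x_j,y_2) < \varepsilon$ for all $i,j \in \mathbb{Z}[\max(k, k_1(\varepsilon_1), k_2(\varepsilon_2)),\infty)$. Since $\varepsilon$ was arbitrary,
$d(y_1,y_2) = 0$. Therefore $y_1 = y_2$ by Definition 37.1.2 (i). Hence $x$ converges to at most one point.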


37.8.6 REMARK: Relative speeds of Cauchy convergence and point-convergence. 

For applications to uniform convergence, Theorem 37.8.7 gives a useful quantification of the relative “speeds”
of Cauchy convergence and point-convergence of sequences. For Cauchy convergence, shrinking bounds must
be found for “tail diameters” $D_k(x)$. For convergence to a point $p$, shrinking bounds must be found for “tail
radii” $R_k(x,p)$. These diameters and radii are related by a simple fixed ratio.


37.8.7 THEOREM: Bounds between the speeds of Cauchy convergence and point-convergence. 
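Let $(M,d)$ be a metric space. Define $(D_k(x))_{k=0}^\infty$ for sequences $x = (x_i)_{i=0}^\infty$ in $M$ by

    $\forall x \in M^\omega,\ \forall k \in \mathbb{Z}_0^+,\quad D_k(x) = \mathrm{diam}_d(x(\mathbb{Z}[k,\infty))) = \sup\{d(x_i,x_j);\ i,j \in \mathbb{Z}[k,\infty)\}.$

Define $(R_k(x,p))_{k=0}^\infty$ for sequences $x = (x_i)_{i=0}^\infty$ in $M$ and $p \in M$ by

    $\forall x \in M^\omega,\ \forall p \in M,\ \forall k \in \mathbb{Z}_0^+,\quad R_k(x,p) = \sup\{d(x_i,p);\ i \in \mathbb{Z}[k,\infty)\}.$    (37.8.2)

(i) If $x$ is Cauchy convergent, then $\forall k \in \mathbb{Z}_0^+,\ D_k(x) < \infty$.
(ii) $x$ is Cauchy convergent if and only if $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ D_k(x) < \varepsilon$.
(iii) $x$ converges to $p \in M$ if and only if $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ R_k(x,p) < \varepsilon$.
(iv) If $x$ converges to $p \in M$, then $\forall k \in \mathbb{Z}_0^+,\ R_k(x,p) \le D_k(x) \le 2R_k(x,p)$.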

PROOF: For part (i), let $x$ be a Cauchy sequence in $M$. Then there is a $k_0 \in \mathbb{Z}_0^+$ such that $d(x_i,x_j) < 1$
for all $i,j \in \mathbb{Z}[k_0,\infty)$. So $\mathrm{diam}(x(\mathbb{Z}[k_0,\infty))) \le 1$. Let $K = \max\{d(x_i,x_j);\ i,j \in \mathbb{Z}[0,k_0)\} = \mathrm{diam}_d(x(\mathbb{Z}[0,k_0)))$.
Then $K < \infty$ (by induction on $k_0$). So $x(\mathbb{Z}[0,\infty)) = x(\mathbb{Z}[0,k_0)) \cup x(\mathbb{Z}[k_0,\infty))$ is bounded by Theorem 37.4.14 (ii).
Therefore $D_0(x) < \infty$. Hence $D_k(x) < \infty$ for all $k \in \mathbb{Z}_0^+$ by Theorem 11.2.42 (ii).


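For part (ii), let $x$ be Cauchy convergent. Let $\varepsilon \in \mathbb{R}^+$. Then for some $k \in \mathbb{Z}_0^+$, for all $i,j \in \mathbb{Z}[k,\infty)$,
$d(x_i,x_j) < \varepsilon/2$. So $D_k(x) \le \varepsilon/2 < \varepsilon$. Thus $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ D_k(x) < \varepsilon$.
Now assume $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ D_k(x) < \varepsilon$. Then $x$ is a Cauchy sequence by Definition 37.8.3.

For part (iii), let $x$ converge to some $p \in M$. Let $\varepsilon \in \mathbb{R}^+$. Then for some $k \in \mathbb{Z}_0^+$, for all $i \in \mathbb{Z}[k,\infty)$,
$d(x_i,p) < \varepsilon/2$ by Definition 35.4.2 and Theorem 37.7.12. Therefore $R_k(x,p) \le \varepsilon/2 < \varepsilon$. Consequently
$\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ R_k(x,p) < \varepsilon$.

Now assume $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ R_k(x,p) < \varepsilon$ for some $p \in M$. Then $x$ converges to $p$ by Definition 35.4.2
and Theorem 37.7.12.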


For part (iv), let $x$ converge to $p \in M$. Let $k \in \mathbb{Z}_0^+$. Let $\varepsilon \in \mathbb{R}^+$. Then $d(x_i,p) > R_k(x,p) - \varepsilon/2$ for
some $i \in \mathbb{Z}[k,\infty)$ by line (37.8.2). But by Definition 35.4.2 and Theorem 37.7.12, $d(x_j,p) < \varepsilon/2$ for some
$j \in \mathbb{Z}[k,\infty)$. Then $d(x_i,x_j) \ge d(x_i,p) - d(x_j,p)$ by Theorem 37.1.4 (ii). Thus $d(x_i,x_j) > R_k(x,p) - \varepsilon$. So
$\mathrm{diam}(x(\mathbb{Z}[k,\infty))) > R_k(x,p) - \varepsilon$ for all $\varepsilon \in \mathbb{R}^+$. Hence $D_k(x) \ge R_k(x,p)$ for all $k \in \mathbb{Z}_0^+$. The inequality
$D_k(x) \le 2R_k(x,p)$ follows from Theorem 37.4.11 (iii).
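
The two-sided bound in part (iv) can be illustrated with the real sequence $x_i = (-1)^i/(i+1)$, which converges to $p = 0$. The sketch below is not part of the proof. The function names are hypothetical, and the suprema over infinite tails are replaced by maxima over a finite truncation, which is an assumption that happens to be harmless here because both suprema are attained at the first one or two indices of the tail.

```python
from fractions import Fraction

def x(i):
    """Alternating sequence x_i = (-1)^i / (i + 1), converging to 0."""
    return Fraction((-1) ** i, i + 1)

def tail_diameter(k, n_terms):
    """Diameter of {x_i; k <= i < n_terms}, a truncated stand-in for D_k(x)."""
    tail = [x(i) for i in range(k, n_terms)]
    return max(abs(a - b) for a in tail for b in tail)

def tail_radius(k, p, n_terms):
    """Radius about p of {x_i; k <= i < n_terms}, a stand-in for R_k(x, p)."""
    return max(abs(x(i) - p) for i in range(k, n_terms))

# Pairs (R_k, D_k) for the first few tails, computed exactly.
bounds = [(tail_radius(k, 0, 60), tail_diameter(k, 60)) for k in range(20)]
```

For this sequence $R_k(x,0) = 1/(k+1)$ and $D_k(x) = 1/(k+1) + 1/(k+2)$, so the ratio $D_k/R_k$ lies strictly between 1 and 2.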


37.8.8 REMARK: Existence of limits for Cauchy sequences. 
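Line (37.8.1) in Definition 37.8.3 may be rewritten (possibly with a larger $k$ for each $\varepsilon$) as

    $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\quad \mathrm{diam}_d(x(\mathbb{Z}[k,\infty))) < \varepsilon.$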


In other words, the "tail" of the sequence can be made to have an arbitrarily small diameter by removing 
a sufficient number of points from the beginning of the sequence. Intuitively it seems that there should be 
a point in common to some family of balls with shrinking diameter which include the shrinking tail of the 
sequence. Since a metric space such as the set of rational numbers with the usual distance function has 
infinitely many “holes” to which Cauchy sequences may “converge”, it is clearly not true that every Cauchy 
sequence converges to a point in the space. When all Cauchy sequences in a metric space do converge to a
point within the space, the space is called “complete”.
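
A standard illustration of such a “hole” in the rational numbers is the Newton iteration for the square root of 2, carried out in exact rational arithmetic. The following sketch is illustrative only. The function name newton_sqrt2 and the starting value 2 are assumptions, and a finite run can only suggest the Cauchy property; it cannot prove it.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x -> (x + 2/x)/2 starting from x = 2."""
    x = Fraction(2)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq

seq = newton_sqrt2(6)
# Distances between consecutive terms shrink rapidly.
gaps = [abs(a - b) for a, b in zip(seq, seq[1:])]
# No rational term squares to exactly 2, so no term is the "limit".
errors = [t * t - 2 for t in seq]
```

The terms are 2, 3/2, 17/12, 577/408 and so forth. They cluster ever more tightly, but the only candidate limit is not a rational number.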


37.8.9 DEFINITION: A complete metric space is a metric space in which every Cauchy sequence converges 
to a point in the space. 


37.8.10 REMARK: Complete and closed subsets of complete metric spaces. 

A converse of Theorem 37.8.11 is given by many authors, namely that a complete subset of a complete 
metric space must be closed, but the most obvious way to prove this requires a choice function for a countable 
sequence of open neighbourhoods of a point, and it is difficult to see how to accomplish this without invoking a 
countably infinite choice axiom. Therefore such a general converse is not presented here. However, a suitable 
choice function is available when the space is separable. (Example 37.7.24 shows that not all metric spaces 
are separable.) This is utilised in Theorem 37.8.12. 


37.8.11 THEOREM: Closed subspaces of a complete metric space are complete. 
Let M be a complete metric space. Let M’ be a metric subspace of M. If M’ is a closed subset of M, then 
M' is a complete metric space. 


PROOF: Suppose that $M'$ is a closed subset of $M$. Let $x = (x_i)_{i=0}^\infty$ be a Cauchy sequence in $M'$. Then
$x$ is a Cauchy sequence in $M$ by Definition 37.2.14. So $x$ converges to some $z \in M$. Therefore $z \in M'$ by
Theorem 35.4.19 (ii). Thus $M'$ is a complete metric space by Definition 37.8.9.


37.8.12 THEOREM: Complete subspaces of a separable metric space are closed. 
Let $M$ be a separable metric space. Let $M'$ be a metric subspace of $M$. If $M'$ is a complete metric space,
then $M'$ is a closed subset of $M$.


PROOF: Let $M$ be a separable metric space. If $M = \emptyset$, then all subsets of $M$ are closed subsets of $M$. So
assume that $M \ne \emptyset$. Then by Definition 33.4.6 and Theorem 13.7.13 (iv), $M$ has a dense subset $S$ for which
there exists a surjection $\phi : \mathbb{Z}_0^+ \to S$. Let $M'$ be a complete metric subspace of $M$. Let $x \in \overline{M'}$. Then
$M' \cap B_{x,1/n} \ne \emptyset$ for all $n \in \mathbb{Z}^+$ by Theorem 31.8.17 (ii) (or alternatively by Theorem 37.5.6 (x)). For all
$n \in \mathbb{Z}^+$, let $y_n = \phi(\min\{i \in \mathbb{Z}_0^+;\ \phi(i) \in M' \cap B_{x,1/n}\})$. Then $(y_n)_{n=1}^\infty$ is a well-defined sequence in $M'$ by
Theorem 33.4.3 (iii), and $y_n \in M' \cap B_{x,1/n}$ for all $n \in \mathbb{Z}^+$. So $(y_n)_{n=1}^\infty$ converges to $x$ by Theorem 37.7.12.
Then $(y_n)_{n=1}^\infty$ is a Cauchy sequence in $M'$ by Theorem 37.8.4. So $x \in M'$ by Definition 37.8.9. Thus
$\overline{M'} \subseteq M'$. Hence $M'$ is closed by Theorem 31.8.14 (vi).


37.8.13 REMARK: Completeness of Cartesian metric spaces.
The most important examples of complete metric spaces are the Cartesian spaces. The completeness of 
Cartesian metric spaces in Theorem 37.8.14 follows from the completeness of the real numbers. 


37.8.14 THEOREM: All real Cartesian spaces are complete metric spaces. 


(i) The metric space R is complete. 


(ii) The metric space $\mathbb{R}^n$ is complete for all $n \in \mathbb{Z}_0^+$.


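PROOF: For part (i), let $x = (x_i)_{i=0}^\infty$ be a Cauchy sequence in $\mathbb{R}$. Define $(L_k)_{k=0}^\infty$ by $L_k = \inf\{x_i;\ i \ge k\}$
for all $k \in \mathbb{Z}_0^+$. Then $\mathrm{Range}(L)$ is a subset of $\mathbb{R}$ which is bounded above. So $\sup(\mathrm{Range}(L)) \in \mathbb{R}$ by
Theorem 15.6.10 (i). Similarly, define $(U_k)_{k=0}^\infty$ by $U_k = \sup\{x_i;\ i \ge k\}$ for all $k \in \mathbb{Z}_0^+$. Then $\mathrm{Range}(U)$ is
a subset of $\mathbb{R}$ which is bounded below. So $\inf(\mathrm{Range}(U)) \in \mathbb{R}$. Write $\bar{L} = \sup(\mathrm{Range}(L))$ and
$\bar{U} = \inf(\mathrm{Range}(U))$. Let $\varepsilon \in \mathbb{R}^+$. Then $d(x_i,x_j) < \varepsilon$ for all $i,j \in \mathbb{Z}[k,\infty)$, for some $k \in \mathbb{Z}_0^+$ by
Definition 37.8.3. So $U_k - L_k \le \varepsilon$ and $L_k \le \bar{L} \le \bar{U} \le U_k$. Consequently $d(\bar{L},\bar{U}) \le \varepsilon$ for all $\varepsilon \in \mathbb{R}^+$.
So $\bar{L} = \bar{U}$. It follows that $x$ converges to $\bar{L} = \bar{U} \in \mathbb{R}$. Hence $\mathbb{R}$ is complete.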


For part (ii), let $x = (x_i)_{i=0}^\infty$ be a Cauchy sequence in $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$. Then the sequences of components
$x^j = (x_i^j)_{i=0}^\infty$, for $j \in \mathbb{N}_n$, are Cauchy sequences in $\mathbb{R}$. So $x^j$ converges to some $z^j \in \mathbb{R}$ for each $j \in \mathbb{N}_n$ by
part (i). Therefore $x$ converges to the $n$-tuple $z = (z^j)_{j=1}^n \in \mathbb{R}^n$.
Hence $\mathbb{R}^n$ is complete.


37.8.15 REMARK: Generalisation of Cauchy sequences to Cauchy nets. 

Many authors define “Cauchy nets” as a generalisation of Cauchy sequences. Whereas a sequence uses the 
totally ordered set $\omega$ as its domain, a net uses a “directed set” as its domain. A directed set $(D,\le)$ is like
a partially ordered set except that it is not required to be antisymmetric, but it is required to satisfy a
“common refinement” condition, $\forall x,y \in D,\ \exists z \in D,\ (x \le z \text{ and } y \le z)$. A useful example to have in mind
for a directed set is the set of partitions of an interval, where $x \le y$ means that $y$ is a refinement of $x$. (See
Definition 43.3.5 for refinements of partitions of real intervals.) 


A “net” is then any function from $D$ to $X$, where $X$ is any set and $(D,\le)$ is a directed set. A net $f : D \to X$
is said to converge to a point $p$ in a topological space $X$ if $\forall \Omega \in \mathrm{Top}_p(X),\ \exists x \in D,\ \forall y \ge x,\ f(y) \in \Omega$. This
is essentially the same as the standard convergence criterion for sequences in Definition 35.4.2.


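For any metric space $(X,d)$, a “Cauchy net” is a net $f : D \to X$ such that

    $\forall \varepsilon \in \mathbb{R}^+,\ \exists x \in D,\ \forall y,z \in D,\quad (x \le y \text{ and } x \le z) \Rightarrow d(f(y),f(z)) < \varepsilon.$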


This generalisation of Cauchy sequences is not pursued in this book. (For directed sets, nets and Cauchy 
nets, see for example Kasriel [100], pages 248-255; Bass [53], pages 202-203; Wilansky [163], pages 168-169; 
A.E. Taylor [145], pages 139-140.) 


In the case of approximations to Cauchy-Riemann-style integrals by summing function values over partitions 
of function domains, the resolution or “refinement” of partitions provides a single real-number parameter 
which can be made to converge to zero so that the usual kind of $\varepsilon$-$\delta$ convergence concept can be applied. The
refinement acts in the role of a metric for the domain for approximations. So no “net” needs to be defined. 


37.8.16 REMARK:  Generalisation of completeness to uniform spaces. 

Cauchy sequences and completeness may be generalised to structures more general than a metric, but less 
general than a topology. Kelley [101], pages 190-195, defines Cauchy nets and completeness for “uniform 
spaces", which are a generalisation of pseudo-metric spaces, which are in turn a generalisation of metric 
spaces. (See Remark 37.1.9 for pseudo-metric spaces. Kelley [101], page 174, attributes the invention of 
uniform spaces to a 1937 monograph by Weil [199]. See Remark 38.3.13 for further comment on uniform 
spaces.) This generalisation of Cauchy sequences and completeness is also not pursued in this book. (For 
some literature references for uniform spaces, see Remark 38.3.13.)


37.9. Nested set convergence in complete metric spaces


37.9.1 REMARK: Convergent families of sets in compact metric spaces. 

As mentioned in Remark 37.8.2, Cauchy sequences have the substantial advantage that it is not necessary 
to know the limit of a sequence in order to say that it is convergent. In a vast range of scenarios in analysis, 
particularly when trying to prove existence of solutions to differential and integral equations, it is not possible 
to know what the limit of an approximation procedure is, but often the existence can be proved by applying 
the completeness of the target space. 


The benefits of the Cauchy sequence concept in complete spaces can also be obtained when the point-sequence 
is replaced by some other kind of sequence. Theorems 37.9.3, 37.9.6 and 37.9.8 assert that non-increasing 
sequences of non-empty closed sets converge to a unique point in a metric space if the space is complete. 
Theorem 37.9.12 makes similar assertions for set-families parametrised by a real variable. 


Terms such as “increasing” and “decreasing” in the context of set-valued functions assume that the range is 
ordered by set-inclusion, which is a partial order on the power set of any given set by Theorem 11.1.12. 


Convergence of set-families is particularly applicable to the definition of integrals. A simple example of 
this is Theorem 43.3.20 for the Cauchy integral, which shows that as the mesh of approximations converges 
to zero, the diameter of the set of approximations converges to zero, and this implies that a limit exists, 
although the value of that limit cannot be stated in advance. It is surely no coincidence that the Cauchy 
integral uses the Cauchy sequence concept to assert existence. 


The metric on the parameter space of a convergent set-family can be replaced by a “directed set” structure, 
as mentioned briefly in Remark 37.8.15. Set-families parametrised by integer or real parameters can then 
be replaced by more general structures. However, such metric-free parametrisation is seldom advantageous. 
Therefore such an approach is not presented here. 


37.9.2 REMARK: Nested closed real-number interval intersection theorem. 

Nested interval theorems for sets of real numbers and subsets of real Cartesian spaces can be stated and 
proved without introducing the abstract concepts of compactness and completeness for general abstract 
metric spaces. Therefore such theorems are typically given in real analysis texts in the more concrete 
context of real number system completeness. However, these theorems are presented here in Section 37.9 in 
the abstract context to avoid splitting them across multiple sections, which would make the interpretation 
in terms of the completeness concept less clear. 


Theorem 37.9.3 (ii) is called the “principle of nested intervals” by Shilov [135], page 21, the “nested intervals 
theorem” by Mattuck [114], page 78; A.E. Taylor [145], pages 34-35, the “nested intervals property” by 
Schramm [133], page 94, and the “nested interval property” by Edwards [67], page 446. The combined 


Theorem 37.9.3 (i, ii) is called the “nested interval theorem" by Kasriel [100], pages 64-65. 


37.9.3 THEOREM: Nested interval theorem for real numbers. Sequence version. 


(i) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty bounded closed real-number intervals. Then
$\bigcap_{i=0}^\infty S_i \ne \emptyset$.

(ii) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty closed real-number intervals which satisfies
$\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$. Then $\bigcap_{i=0}^\infty S_i = \{z\}$ for some unique $z \in \mathbb{R}$.

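PROOF: For part (i), define $(L_k)_{k=0}^\infty$ and $(U_k)_{k=0}^\infty$ by $L_k = \inf(S_k) \in \mathbb{R}$ and $U_k = \sup(S_k) \in \mathbb{R}$ for
all $k \in \mathbb{Z}_0^+$. Let $L = \sup\{L_k;\ k \in \mathbb{Z}_0^+\}$ and $U = \inf\{U_k;\ k \in \mathbb{Z}_0^+\}$. Then $L,U \in \mathbb{R}$ by Theorem 15.9.3 (xv),
and $L \le U$, and $[L,U] \subseteq S_k$ for all $k \in \mathbb{Z}_0^+$. So by Theorem 8.4.8 (xviii), $\bigcap_{i=0}^\infty S_i \supseteq [L,U] \ne \emptyset$.

For part (ii), the assumption that $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$ implies that $\exists k_0 \in \mathbb{Z}_0^+,\ \forall i \ge k_0,\ \mathrm{diam}(S_i) < \infty$.
Define $(L_k)_{k=k_0}^\infty$ and $(U_k)_{k=k_0}^\infty$ by $L_k = \inf(S_k) \in \mathbb{R}$ and $U_k = \sup(S_k) \in \mathbb{R}$ for all $k \in \mathbb{Z}[k_0,\infty)$. Let
$L = \sup\{L_k;\ k \in \mathbb{Z}[k_0,\infty)\}$ and $U = \inf\{U_k;\ k \in \mathbb{Z}[k_0,\infty)\}$. Then $L,U \in \mathbb{R}$ and $L \le U$ and $[L,U] \subseteq S_k$ for
all $k \in \mathbb{Z}_0^+$. But $L = U$ because $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$. So $\bigcap_{i=0}^\infty S_i = [L,U] = \{z\}$ where $z = L = U$.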
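
The bisection method gives a familiar instance of part (ii): each step halves a closed interval, so the intervals are nested and their diameters converge to zero. The sketch below is only an illustration. The function name bisect_intervals, the test function $t^2 - 2$ and the number of steps are assumptions, and the finite run computes the quantities $L$ and $U$ only for a truncated family.

```python
from fractions import Fraction

def bisect_intervals(f, a, b, steps):
    """Return nested closed intervals [a_i, b_i] with f(a_i) <= 0 <= f(b_i)."""
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if f(a) * f(mid) <= 0:
            b = mid          # a sign change lies in the left half
        else:
            a = mid          # otherwise it lies in the right half
        intervals.append((a, b))
    return intervals

ivs = bisect_intervals(lambda t: t * t - 2, Fraction(1), Fraction(2), 30)
L = max(a for a, _ in ivs)   # supremum of the lower endpoints (truncated)
U = min(b for _, b in ivs)   # infimum of the upper endpoints (truncated)
```

After 30 steps the gap $U - L$ equals $2^{-30}$, and the unique point in the intersection of the full infinite family is the square root of 2.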


37.9.4 REMARK: The intersection of a nested family of closed sets might be empty. 

To see the necessity of the boundedness of sets in Theorem 37.9.3 (i), consider for example the family
$(S_i)_{i=0}^\infty$ defined by $S_i = [i,\infty) \in \overline{\mathrm{Top}}(\mathbb{R})$ for $i \in \mathbb{Z}_0^+$. To see the necessity of the sets being closed in
Theorem 37.9.3 (i, ii), consider $S_i = (0, 1/(i+1)]$ for $i \in \mathbb{Z}_0^+$.


37.9.5 REMARK: Nested closed set intersection theorem in Cartesian spaces.

The nested closed set intersection property in Theorem 37.9.6 (i) is called the “Cantor intersection theorem” 
by A.E. Taylor [145], page 63, for Cartesian spaces, whereas Simmons [137], page 73, gives this name to
the generalisation of Theorem 37.9.6 (ii) to general metric spaces. The combined Theorem 37.9.6 (i, ii) is 
called the “nested interval theorem” by Kasriel [100], page 74, for the special case of rectangular set families. 


(For Theorem 37.9.6 (ii), see also Gemignani [80], pages 219-220; Baum [54], page 137. For rectangular set
families, see also A.E. Taylor [145], pages 56-58.)

The proof of Theorem 37.9.6 (i) uses a choice function to choose elements of general compact subsets of a 
Cartesian space. This is not very difficult for Cartesian spaces, but it is not possible for the general compact 
subsets of general complete metric spaces in Theorem 37.9.8 unless a suitable choice axiom is invoked. 


37.9.6 THEOREM: Cantor’s intersection theorem for Cartesian spaces. Sequence version. 
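(i) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty bounded closed subsets of $\mathbb{R}^n$ for some $n \in \mathbb{Z}_0^+$.
Then $\bigcap_{i=0}^\infty S_i \ne \emptyset$.
(ii) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty closed subsets of $\mathbb{R}^n$ with $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$
for some $n \in \mathbb{Z}_0^+$. Then $\bigcap_{i=0}^\infty S_i = \{z\}$ for some unique $z \in \mathbb{R}^n$.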


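PROOF: For part (i), let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty bounded closed subsets of $\mathbb{R}^n$ for
some $n \in \mathbb{Z}_0^+$. By Theorem 37.7.10, there is a choice function $\phi : C_n \to \mathbb{R}^n$ satisfying $\forall K \in C_n,\ \phi(K) \in K$,
where $C_n = \{K \in \mathbb{P}(\mathbb{R}^n) \setminus \{\emptyset\};\ K \text{ is compact}\}$. (By Theorem 37.7.7, a subset of $\mathbb{R}^n$ is compact if and only
if it is closed and bounded.) Define $x = (x_i)_{i=0}^\infty$ in $\mathbb{R}^n$ by $x_i = \phi(S_i)$ for all $i \in \mathbb{Z}_0^+$. Then $x_i \in S_i \subseteq S_0$
for all $i \in \mathbb{Z}_0^+$. So $x$ has a convergent subsequence $x' = x \circ \beta$ in the compact set $S_0$ by Theorem 37.7.14
and Definition 35.6.2. Let $z$ denote the limit of $x'$. Then $z \in S_{\beta(i)}$ for all $i \in \mathbb{Z}_0^+$ because $S_{\beta(i)}$ is closed and
$x'_j \in S_{\beta(i)}$ for all $j \in \mathbb{Z}_0^+$ with $j \ge i$. Therefore $z \in \bigcap_{i=0}^\infty S_i$. Hence $\bigcap_{i=0}^\infty S_i \ne \emptyset$.

For part (ii), the assumption $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$ implies that $\exists k_0 \in \mathbb{Z}_0^+,\ \forall i \ge k_0,\ \mathrm{diam}(S_i) < \infty$. So $S_i$
is compact for all $i \ge k_0$ for some $k_0 \in \mathbb{Z}_0^+$. Therefore $\bigcap_{i=0}^\infty S_i \ne \emptyset$ by part (i). But $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$
implies that there is at most one $z \in \bigcap_{i=0}^\infty S_i$. Hence $\bigcap_{i=0}^\infty S_i = \{z\}$ for some $z \in \mathbb{R}^n$.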


37.9.7 REMARK: Nested set intersection theorems for general complete metric spaces. 

Many authors present a version of Theorem 37.9.6 (ii) for general metric spaces, which requires a countably 
infinite choice axiom in general. Theorem 37.9.8 avoids invoking a choice axiom by requiring instead that a 
compact set choice function be provided. In the case of Cartesian spaces, such a compact set choice function 
is provided by Theorem 37.7.10 without invoking a choice axiom. 


37.9.8 THEOREM:  Cantor's intersection theorem for complete metric spaces. Sequence version. 
Let $(X,d)$ be a complete metric space for which there exists a non-empty bounded closed set choice function
$\phi : K(X) \to X$ satisfying $\forall C \in K(X),\ \phi(C) \in C$, where $K(X) = \{C \in \overline{\mathrm{Top}}(X) \setminus \{\emptyset\};\ C \text{ is bounded}\}$.


(i) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty bounded closed subsets of a compact subset
of $X$. Then $\bigcap_{i=0}^\infty S_i \ne \emptyset$.

(ii) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty closed subsets of a compact subset of $X$ with
$\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$. Then $\bigcap_{i=0}^\infty S_i = \{z\}$ for some unique $z \in X$.


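PROOF: For part (i), let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty bounded closed subsets of a
compact subset $C_0$ of $X$. Define $x = (x_i)_{i=0}^\infty$ in $X$ by $x_i = \phi(S_i)$ for all $i \in \mathbb{Z}_0^+$. Then $x_i \in S_i \subseteq S_0$ for
all $i \in \mathbb{Z}_0^+$. So $x$ has a convergent subsequence $x' = x \circ \beta$ in the compact set $S_0$ by Theorem 37.7.14 and
Definition 35.6.2. But $x'_i \in S_{\beta(i)}$ for all $i \in \mathbb{Z}_0^+$. So $z \in \bigcap_{i=0}^\infty S_i$, where $z$ is the limit of $x'$. Hence $\bigcap_{i=0}^\infty S_i \ne \emptyset$.
For part (ii), the assumption $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$ implies that $\exists k_0 \in \mathbb{Z}_0^+,\ \forall i \ge k_0,\ \mathrm{diam}(S_i) < \infty$. So $S_i$
is compact for all $i \ge k_0$ for some $k_0 \in \mathbb{Z}_0^+$. Therefore $\bigcap_{i=0}^\infty S_i \ne \emptyset$ by part (i). But $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$
implies that there is at most one $z \in \bigcap_{i=0}^\infty S_i$. Hence $\bigcap_{i=0}^\infty S_i = \{z\}$ for some $z \in X$.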


37.9.9 REMARK: Countable choice version of Cantor intersection theorem for complete metric spaces. 
The proof of Theorem 37.9.10 (i), which assumes the countable choice axiom, is not constructed by feeding 
a choice function into the proof of Theorem 37.9.8 (i). Instead of choosing elements from general compact 
sets, Theorem 37.9.10 (i) chooses elements from sequences of general sets, which just happen to be compact 
in this case, although compactness is not used for constructing choice functions.

For versions of Theorem 37.9.10 (ii) which use the countable choice axiom, see for example Shilov [135],
pages 83-85; Graves [85], page 347; Simmons [137], pages 73-74. For a version of Theorem 37.9.10 (ii) for 
compact spaces, see Rosenlicht [128], page 55. A version of Theorem 37.9.10 (ii) for nested closed spheres is 
given by Kolmogorov/Fomin [104], pages 59-61. 


37.9.10 THEOREM [ZF+CC]: Cantor’s intersection theorem for complete metric spaces. Sequence version.
Let $(X,d)$ be a complete metric space.


(i) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty bounded closed subsets of a compact subset
of $X$. Then $\bigcap_{i=0}^\infty S_i \ne \emptyset$.


(ii) Let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty closed subsets of a compact subset of $X$ with
$\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$. Then $\bigcap_{i=0}^\infty S_i = \{z\}$ for some unique $z \in X$.


PROOF: For part (i), let $(S_i)_{i=0}^\infty$ be a non-increasing sequence of non-empty bounded closed subsets of
a compact subset $C_0$ of $X$. By Theorem 13.7.22 (the “countable multiplicative
axiom”), there is a sequence $x = (x_i)_{i=0}^\infty$ in $X$ which satisfies $x_i \in S_i \subseteq S_0$ for all $i \in \mathbb{Z}_0^+$. So $x$ has a convergent
subsequence $x' = x \circ \beta$ in the compact set $S_0$ by Theorem 37.7.14 and Definition 35.6.2. But $x'_i \in S_{\beta(i)}$ for
all $i \in \mathbb{Z}_0^+$. So $z \in \bigcap_{i=0}^\infty S_i$, where $z$ is the limit of $x'$. Hence $\bigcap_{i=0}^\infty S_i \ne \emptyset$.

For part (ii), the assumption $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$ implies that $\exists k_0 \in \mathbb{Z}_0^+,\ \forall i \ge k_0,\ \mathrm{diam}(S_i) < \infty$. So $S_i$
is compact for all $i \ge k_0$ for some $k_0 \in \mathbb{Z}_0^+$. Therefore $\bigcap_{i=0}^\infty S_i \ne \emptyset$ by part (i). But $\lim_{i\to\infty} \mathrm{diam}(S_i) = 0$
implies that there is at most one $z \in \bigcap_{i=0}^\infty S_i$. Hence $\bigcap_{i=0}^\infty S_i = \{z\}$ for some $z \in X$.


37.9.11 REMARK: Family versions of nested set intersection theorems. 

Theorem 37.9.12 gives “family versions" of Theorems 37.9.3 and 37.9.6. The statements and proofs of 
the assertions are very similar. Theorem 37.9.12(ii) is useful in integral calculus to show the existence 
of limits for families of sets of sums with mesh converging to zero. (See for example Theorem 43.3.20.) 
Theorem 37.9.12 (iv) is useful for integrating finite-dimensional vector-valued integrals. 


Since the approximating sums for integrals require vector addition and scalar multiplication operations, the 
generalisation of Theorem 37.9.12 (iv) to more general metric spaces is not very useful for integral calculus. 
At the very least, a Banach space structure would be required. (See Section 39.4.) 


37.9.12 THEOREM:  Cantor's intersection theorem for Cartesian spaces. Family version. 
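(i) Let $(S_\delta)_{\delta\in\mathbb{R}^+}$ be a non-decreasing family of non-empty bounded closed real-number intervals. Then
$\bigcap_{\delta\in\mathbb{R}^+} S_\delta \ne \emptyset$.


(ii) Let $(S_\delta)_{\delta\in\mathbb{R}^+}$ be a non-decreasing family of non-empty closed real-number intervals which satisfies
$\lim_{\delta\to 0^+} \mathrm{diam}(S_\delta) = 0$. Then $\bigcap_{\delta\in\mathbb{R}^+} S_\delta = \{z\}$ for some unique $z \in \mathbb{R}$.


(iii) Let $(S_\delta)_{\delta\in\mathbb{R}^+}$ be a non-decreasing family of non-empty bounded closed subsets of $\mathbb{R}^n$ for some $n \in \mathbb{Z}_0^+$.
Then $\bigcap_{\delta\in\mathbb{R}^+} S_\delta \ne \emptyset$.

(iv) Let $(S_\delta)_{\delta\in\mathbb{R}^+}$ be a non-decreasing family of non-empty closed subsets of $\mathbb{R}^n$ with $\lim_{\delta\to 0^+} \mathrm{diam}(S_\delta) = 0$
for some $n \in \mathbb{Z}_0^+$. Then $\bigcap_{\delta\in\mathbb{R}^+} S_\delta = \{z\}$ for some unique $z \in \mathbb{R}^n$.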


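PROOF: For part (i), define $(L_\delta)_{\delta\in\mathbb{R}^+}$ and $(U_\delta)_{\delta\in\mathbb{R}^+}$ by $L_\delta = \inf(S_\delta) \in \mathbb{R}$ and $U_\delta = \sup(S_\delta) \in \mathbb{R}$
for all $\delta \in \mathbb{R}^+$. Let $L = \sup\{L_\delta;\ \delta \in \mathbb{R}^+\}$ and $U = \inf\{U_\delta;\ \delta \in \mathbb{R}^+\}$. Then $L,U \in \mathbb{R}$ by Theorem 15.6.10 (i),
and $L \le U$, and $[L,U] \subseteq S_\delta$ for all $\delta \in \mathbb{R}^+$ since $S_\delta$ is closed. So $\bigcap_{\delta\in\mathbb{R}^+} S_\delta \supseteq [L,U]$ by Theorem 8.4.8 (xviii).
Hence $\bigcap_{\delta\in\mathbb{R}^+} S_\delta \ne \emptyset$.

For part (ii), the assumption that $\lim_{\delta\to 0^+} \mathrm{diam}(S_\delta) = 0$ implies that $\exists \delta_0 \in \mathbb{R}^+,\ \forall \delta \in (0,\delta_0],\ \mathrm{diam}(S_\delta) < \infty$.
Define $(L_\delta)_{\delta\in(0,\delta_0]}$ and $(U_\delta)_{\delta\in(0,\delta_0]}$ by $L_\delta = \inf(S_\delta) \in \mathbb{R}$ and $U_\delta = \sup(S_\delta) \in \mathbb{R}$ for all $\delta \in (0,\delta_0]$.
Let $L = \sup\{L_\delta;\ \delta \in (0,\delta_0]\}$ and $U = \inf\{U_\delta;\ \delta \in (0,\delta_0]\}$. Then $L,U \in \mathbb{R}$ and $L \le U$, and $[L,U] \subseteq S_\delta$ for
all $\delta \in \mathbb{R}^+$ because $S_\delta$ is closed. But $L = U$ because $\lim_{\delta\to 0^+} \mathrm{diam}(S_\delta) = 0$. So $\bigcap_{\delta\in\mathbb{R}^+} S_\delta = [L,U] = \{z\}$,
where $z = L = U$.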

For part (iii), by Theorem 37.7.10 there is a choice function $\phi : C_n \to \mathbb{R}^n$ satisfying $\forall K \in C_n,\ \phi(K) \in K$,
where $C_n = \{K \in \mathbb{P}(\mathbb{R}^n) \setminus \{\emptyset\};\ K \text{ is compact}\}$. (By Theorem 37.7.7, a subset of $\mathbb{R}^n$ is compact if and only
if it is closed and bounded.) Define $x : \mathbb{Z}_0^+ \to \mathbb{R}^n$ by $x_i = \phi(S_{1/(i+1)})$ for all $i \in \mathbb{Z}_0^+$. Then $x_i \in S_{1/(i+1)} \subseteq S_1$
for all $i \in \mathbb{Z}_0^+$. So $x$ has a convergent subsequence $x' = x \circ \beta$ in the compact set $S_1$ by Theorem 37.7.14 and
Definition 35.6.2. Let $z$ denote the limit of $x'$. Then $z \in S_{1/(\beta(i)+1)}$ for all $i \in \mathbb{Z}_0^+$ because $x'_j \in S_{1/(\beta(i)+1)}$
for all $j \in \mathbb{Z}_0^+$ with $j \ge i$. Therefore $z \in \bigcap_{i=0}^\infty S_{1/(i+1)}$. Hence

    $\bigcap_{\delta\in\mathbb{R}^+} S_\delta \supseteq \bigcap_{i=0}^\infty S_{1/(i+1)} \ne \emptyset.$

For part (iv), the assumption $\lim_{\delta\to 0^+} \mathrm{diam}(S_\delta) = 0$ implies $\exists \delta_0 \in \mathbb{R}^+,\ \forall \delta \in (0,\delta_0],\ \mathrm{diam}(S_\delta) < \infty$. So $S_\delta$ is
compact for all $\delta \in (0,\delta_0]$ for some $\delta_0 \in \mathbb{R}^+$. Therefore $\bigcap_{\delta\in\mathbb{R}^+} S_\delta \ne \emptyset$ by part (iii). But $\lim_{\delta\to 0^+} \mathrm{diam}(S_\delta) = 0$
implies that there is at most one $z \in \bigcap_{\delta\in\mathbb{R}^+} S_\delta$. Hence $\bigcap_{\delta\in\mathbb{R}^+} S_\delta = \{z\}$ for some $z \in \mathbb{R}^n$.


Chapter 38


METRIC SPACE CONTINUITY


38.0.1 REMARK: Lipschitz functions and rectifiable curves may be defined for general metric spaces. 
Some of the topics in Chapter 38, such as Lipschitz functions and rectifiable curves are more closely related 
to calculus than topology. It turns out that Lipschitz functions and rectifiable curves are differentiable 
almost everywhere when the metric space is a manifold. Differentiability is a calculus concept and “almost 
everywhere" is a measure theory concept, but the Lipschitz and rectifiability conditions may be stated in 
the absence of such higher-layer concepts. 


38.1. Continuity of maps between metric spaces 


38.1.1 DEFINITION: A continuous function from a metric space $(M,d)$ to a topological space $(X,T_X)$ is a
function $f : M \to X$ which is continuous with respect to the topologies $\mathrm{Top}(M)$ and $T_X$.


Continuity of a function $f : X \to M$ is defined with respect to $T_X$ and $\mathrm{Top}(M)$ respectively.


A function $f : M_1 \to M_2$ for metric spaces $(M_1,d_1)$ and $(M_2,d_2)$ is said to be continuous when $f$ is continuous
with respect to $\mathrm{Top}(M_1)$ and $\mathrm{Top}(M_2)$.


38.1.2 REMARK: Equivalence of topological space continuity and $\varepsilon$-$\delta$ continuity in a metric space.
Theorem 38.1.3 states that in a metric space, $\varepsilon$-$\delta$ continuity is the same as continuity with respect to the
induced topology of the metric space.


The $\varepsilon$-$\delta$ definition of limits and continuity is attributed to Weierstraß by Bell [234], page 294, and by Bynum
et alii [238], page 15. In an earlier time, continuity was defined intuitively. This made it difficult to arrive at 
consensus on which kinds of functions were continuous. But more importantly, the absence of an objective 
definition of continuity made deductive arguments impossible. As soon as a purely logical expression for 
continuity was discovered and agreed upon, rapid progress in the subject was possible. This shows the 
importance of replacing intuition with objective definitions. 


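38.1.3 THEOREM: The $\varepsilon$-$\delta$ criterion of continuity of maps between metric spaces.
A function $f : M_1 \to M_2$ between metric spaces $(M_1,d_1)$ and $(M_2,d_2)$ is continuous if and only if

    $\forall x \in M_1,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall y \in M_1,\quad d_1(x,y) < \delta \Rightarrow d_2(f(x),f(y)) < \varepsilon.$    (38.1.1)


PROOF: The topologies for $(M_1,d_1)$ and $(M_2,d_2)$ are given by Definition 37.5.2. (Denote open balls in $M_1$
and $M_2$ by $B^1$ and $B^2$ respectively.) Suppose that $f : M_1 \to M_2$ is continuous. Let $x \in M_1$ and $\varepsilon \in \mathbb{R}^+$.
Then $B^2_{f(x),\varepsilon} \in \mathrm{Top}_{f(x)}(M_2)$ by Theorem 37.5.6 (i). So $f^{-1}(B^2_{f(x),\varepsilon}) \in \mathrm{Top}_x(M_1)$ by Definition 31.12.4. So
$B^1_{x,\delta} \subseteq f^{-1}(B^2_{f(x),\varepsilon})$ for some $\delta \in \mathbb{R}^+$ by Theorem 37.5.6 (ii). Then $f(B^1_{x,\delta}) \subseteq f(f^{-1}(B^2_{f(x),\varepsilon})) \subseteq B^2_{f(x),\varepsilon}$ by
Theorems 10.6.7 (ii) and 10.7.1 (i′). This verifies condition (38.1.1).

Now suppose that condition (38.1.1) holds. Let $\Omega \in \mathrm{Top}(M_2)$. Let $x \in f^{-1}(\Omega)$. Then $f(x) \in \Omega$. So
$B^2_{f(x),\varepsilon} \subseteq \Omega$ for some $\varepsilon \in \mathbb{R}^+$ by Theorem 37.5.6 (ii). Then $f^{-1}(B^2_{f(x),\varepsilon}) \subseteq f^{-1}(\Omega)$ by Theorem 10.6.10 (ii).
By condition (38.1.1), $B^1_{x,\delta} \subseteq \{y \in M_1;\ d_2(f(x),f(y)) < \varepsilon\}$ for some $\delta \in \mathbb{R}^+$. But $\{y \in M_1;\ d_2(f(x),f(y)) <
\varepsilon\} = f^{-1}(B^2_{f(x),\varepsilon})$. So $B^1_{x,\delta} \subseteq f^{-1}(B^2_{f(x),\varepsilon}) \subseteq f^{-1}(\Omega)$. So $f^{-1}(\Omega) \in \mathrm{Top}(M_1)$ by Theorem 37.5.6 (ii). Thus
$\forall \Omega \in \mathrm{Top}(M_2),\ f^{-1}(\Omega) \in \mathrm{Top}(M_1)$. Hence $f$ is continuous by Definition 31.12.4.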


38.1.4 THEOREM: Continuity of the distance of a point from a fixed set. 
Let $(M,d)$ be a metric space. Let $A \in \mathbb{P}(M) \setminus \{\emptyset\}$. Define $f_A : M \to \mathbb{R}_0^+$ by $f_A(x) = d(x,A)$ for $x \in M$.


(i) $\bar{A} = \{x \in M;\ f_A(x) = 0\}$.


(ii) $f_A$ is continuous.


PROOF: For part (i), $\bar{A} = \{x \in M;\ f_A(x) = 0\}$ by Theorem 37.5.6 (ix).
For part (ii), let $y,z \in M$. Then $|d(y,A) - d(z,A)| \le d(y,z)$ by Theorem 37.4.2 (iv). So $f_A$ is continuous by
Theorem 38.1.3.
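
The inequality used for part (ii) says that the distance to a fixed set is a Lipschitz function with constant 1. It can be spot-checked on the real line with exact rational arithmetic. This sketch is an illustration only: the name dist_to_set, the finite set A and the sample grid are all assumptions, and a finite check is of course not a proof.

```python
from fractions import Fraction
from itertools import product

def dist_to_set(x, A):
    """Distance from the point x to a finite non-empty set A of reals."""
    return min(abs(x - a) for a in A)

A = [Fraction(0), Fraction(3, 2), Fraction(5)]
grid = [Fraction(n, 4) for n in range(-8, 29)]     # sample points in [-2, 7]

# Largest value of |d(y,A) - d(z,A)| - d(y,z) over all sampled pairs.
worst = max(abs(dist_to_set(y, A) - dist_to_set(z, A)) - abs(y - z)
            for y, z in product(grid, repeat=2))

# Sampled points at distance zero from A (A is finite, hence closed).
zero_set = [x for x in grid if dist_to_set(x, A) == 0]
```

The quantity worst never exceeds zero, and the sampled zero set of the distance function coincides with A, in agreement with part (i) for a closed set.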


38.1.5 THEOREM: Every metrisable topological space is a $T_6$ space.
Let $X$ be a metrisable topological space. Then $X$ is a $T_6$ topological space.


PROOF: Let $X$ be a metrisable topological space. Then the topology on $X$ may be induced by a real-valued
metric function $d : X \times X \to \mathbb{R}_0^+$. Let $K$ be a closed subset of $X$. If $K = \emptyset$, define $f : X \to \mathbb{R}$ by $f(x) = 1$
for all $x \in X$. Then $K = \{x \in X;\ f(x) = 0\}$. So $f$ satisfies the requirements of Definition 33.3.30 for a $T_6$
topological space. (By Remark 33.3.31, it is not actually necessary to check the special case $K = \emptyset$.)

If $K$ is non-empty, define $f : X \to \mathbb{R}_0^+$ by $f(x) = d(x,K)$ for all $x \in X$, where $d(x,K)$ denotes the
distance between $x$ and the set $K$, as in Definition 37.4.1. Then $d(x,K) = 0$ if and only if $x \in K$, by
Theorem 37.5.6 (viii), because $\bar{K} = K$. That is, $K = \{x \in X;\ f(x) = 0\}$. The continuity of $f$ follows from
Theorem 38.1.4 (ii). Hence $X$ is a $T_6$ topological space by Definition 33.3.30.


38.1.6 REMARK: Expression for $\varepsilon$-$\delta$ continuity in terms of open balls.

Condition (38.1.1) in Theorem 38.1.3 is expressed in Theorem 38.1.7 in terms of open balls. The ball notation 
requires superscripts to indicate in which metric space the balls are defined, although these can generally be 
guessed from the context. 


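38.1.7 THEOREM: The $\varepsilon$-$\delta$ open-ball-inclusion criterion for continuity.
A function $f : M_1 \to M_2$ between metric spaces $(M_1, d_1)$ and $(M_2, d_2)$ is continuous if and only if

\[ \forall x \in M_1,\ \forall \varepsilon > 0,\ \exists \delta > 0,\quad f(B^1_{x,\delta}) \subseteq B^2_{f(x),\varepsilon}. \tag{38.1.2} \]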


PROOF: Condition (38.1.2) is exactly equivalent to condition (38.1.1) in Theorem 38.1.3. So the assertion 
follows from Theorem 38.1.3. 


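38.1.8 THEOREM: Continuity expressed as convergence of diameter of images of ball-neighbourhoods.
A function $f : M_1 \to M_2$ between metric spaces $(M_1, d_1)$ and $(M_2, d_2)$ is continuous if and only if

\[ \forall x \in M_1,\quad \lim_{\delta \to 0^+} \mathrm{diam}_2(f(B^1_{x,\delta})) = 0. \]


PROOF: Let $f$ be continuous. Let $x \in M_1$. Let $\varepsilon \in \mathbb{R}^+$. Then by Theorem 38.1.7, $f(B^1_{x,\delta}) \subseteq B^2_{f(x),\varepsilon/4}$ for some $\delta \in \mathbb{R}^+$.
So $\mathrm{diam}_2(f(B^1_{x,\delta})) \le \varepsilon/2$ by Definition 37.4.6. Thus $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \mathrm{diam}_2(f(B^1_{x,\delta})) < \varepsilon$. Therefore
$\lim_{\delta \to 0^+} \mathrm{diam}_2(f(B^1_{x,\delta})) = 0$ by Definition 35.3.7 and Notation 35.3.17.

For the converse, assume that $\lim_{\delta \to 0^+} \mathrm{diam}_2(f(B^1_{x,\delta})) = 0$. Then $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \mathrm{diam}_2(f(B^1_{x,\delta})) < \varepsilon$.
Let $\varepsilon \in \mathbb{R}^+$. Then $\mathrm{diam}_2(f(B^1_{x,\delta})) < \varepsilon$ for some $\delta \in \mathbb{R}^+$. Therefore $f(B^1_{x,\delta}) \subseteq B^2_{f(x),\varepsilon}$ for some $\delta \in \mathbb{R}^+$ by
Definition 37.4.6. Hence $f$ is continuous by Theorem 38.1.7.
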

38.1.9 REMARK: Continuity condition for functions from a Euclidean space to a Euclidean space.
Theorem 38.1.10 specialises Theorem 38.1.3 to the case where the metric spaces are real-tuple spaces with 
the usual Euclidean metric specified in Definition 37.2.8. 


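38.1.10 THEOREM: The $\varepsilon$-$\delta$ continuity criterion for maps between Cartesian spaces.
A function $f : U \to \mathbb{R}^m$ for $U \in \mathrm{Top}(\mathbb{R}^n)$, for $m, n \in \mathbb{Z}^+_0$, is continuous if and only if

\[ \forall x \in U,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall y \in U,\quad |x - y| < \delta \ \Rightarrow\ |f(x) - f(y)| < \varepsilon. \]


PROOF: The assertion follows from Theorem 38.1.3 because the distance functions on $M_1 = U \subseteq \mathbb{R}^n$ and
$M_2 = \mathbb{R}^m$ are given by $d_1 : (x, y) \mapsto |x - y|_{\mathbb{R}^n}$ and $d_2 : (x, y) \mapsto |x - y|_{\mathbb{R}^m}$ respectively.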


38.1.11 REMARK: Continuity of sums and products of continuous functions. 

Although Theorem 38.1.12 is elementary and well known, its proof is a useful exercise in the $\varepsilon$-$\delta$ quantifier
calculus which is required for many similar proofs. In most such proofs, the key step is the choice of the $\varepsilon$
values for the “input functions” so as to deliver the required $\varepsilon$ value for the “output function”.


38.1.12 THEOREM: Sums and products of continuous real-Cartesian-space-valued functions are continuous.
Let $(M, d)$ be a metric space.

(i) Let $f : M \to \mathbb{R}^m$ be continuous and $c \in \mathbb{R}$ for some $m \in \mathbb{Z}^+_0$. Then $cf : M \to \mathbb{R}^m$ is continuous.

(ii) Let $f_k : M \to \mathbb{R}^m$ be continuous for $k = 1, 2$ for some $m \in \mathbb{Z}^+_0$. Then $f_1 + f_2 : M \to \mathbb{R}^m$ is continuous.

(iii) Let $f_k : M \to \mathbb{R}$ be continuous for $k = 1, 2$. Then $f_1 f_2 : M \to \mathbb{R}$ is continuous.


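PROOF: For part (i), if $c = 0$ then $cf$ is continuous by Theorem 31.12.9. So assume that $c \ne 0$. Let
$x \in M$ and $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \varepsilon/|c|$. Then $\varepsilon_1 \in \mathbb{R}^+$. So by Theorem 38.1.3, there is a $\delta \in \mathbb{R}^+$ such that
$\forall y \in B_{x,\delta},\ |f(y) - f(x)| < \varepsilon_1$. Then $|(cf)(y) - (cf)(x)| = |c| \cdot |f(y) - f(x)| < \varepsilon$ for all $y \in B_{x,\delta}$. Hence $cf$ is
continuous by Theorem 38.1.3.

For part (ii), let $x \in M$ and $\varepsilon \in \mathbb{R}^+$. Then $\forall y \in B_{x,\delta_k},\ |f_k(y) - f_k(x)| < \varepsilon/2$ for some $\delta_k \in \mathbb{R}^+$ for $k = 1, 2$ by
Theorem 38.1.3. Let $\delta = \min(\delta_1, \delta_2)$. Then $|(f_1 + f_2)(y) - (f_1 + f_2)(x)| \le |f_1(y) - f_1(x)| + |f_2(y) - f_2(x)| < \varepsilon$
for all $y \in B_{x,\delta}$. Hence $f_1 + f_2$ is continuous by Theorem 38.1.3.

For part (iii), let $x \in M$ and $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \frac{1}{3}\varepsilon / \max(1, |f_2(x)|)$ and $\varepsilon_2 = \frac{1}{3}\varepsilon / \max(\varepsilon_1, |f_1(x)|)$. Then
$\varepsilon_1, \varepsilon_2 \in \mathbb{R}^+$. So there are $\delta_k \in \mathbb{R}^+$ such that $\forall y \in B_{x,\delta_k},\ |f_k(y) - f_k(x)| < \varepsilon_k$ for $k = 1, 2$ by Theorem 38.1.3.
Let $\delta = \min(\delta_1, \delta_2)$. Then

\begin{align*}
\forall y \in B_{x,\delta},\quad |f_1(y) f_2(y) - f_1(x) f_2(x)|
&= |(f_1(y) f_2(y) - f_1(y) f_2(x)) + (f_1(y) f_2(x) - f_1(x) f_2(x))| \\
&\le |f_1(y)|\,|f_2(y) - f_2(x)| + |f_2(x)|\,|f_1(y) - f_1(x)| \\
&\le \varepsilon_2 |f_1(y)| + \varepsilon_1 |f_2(x)| \\
&\le \varepsilon_2 |f_1(y) - f_1(x)| + \varepsilon_2 |f_1(x)| + \varepsilon_1 |f_2(x)| \\
&< \varepsilon_2 |f_1(x)| + \varepsilon_2 \varepsilon_1 + \varepsilon_1 |f_2(x)| \\
&\le \tfrac{1}{3}\varepsilon + \tfrac{1}{3}\varepsilon + \tfrac{1}{3}\varepsilon = \varepsilon.
\end{align*}

Hence $f_1 f_2$ is continuous by Theorem 38.1.3.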
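
The choice of input tolerances for the product in part (iii) can be tried out numerically. The sketch below assumes the factor $\frac{1}{3}$ in both tolerances, which is one choice consistent with the three-term estimate in the proof. The function names and the sample values standing in for $f_1(x)$ and $f_2(x)$ are hypothetical. The code searches a grid of admissible perturbations and reports the worst product error found.

```python
def product_epsilons(eps, f1x, f2x):
    """Input tolerances (eps1, eps2) for f1 and f2 chosen so that the product error is below eps."""
    eps1 = eps / (3.0 * max(1.0, abs(f2x)))
    eps2 = eps / (3.0 * max(eps1, abs(f1x)))
    return eps1, eps2

def worst_product_error(eps1, eps2, f1x, f2x, steps=200):
    """Largest |u*v - f1x*f2x| over a grid of (u, v) with |u - f1x| < eps1 and |v - f2x| < eps2."""
    worst = 0.0
    for i in range(-steps + 1, steps):
        u = f1x + eps1 * i / steps
        for j in range(-steps + 1, steps):
            v = f2x + eps2 * j / steps
            worst = max(worst, abs(u * v - f1x * f2x))
    return worst

eps = 0.01
f1x, f2x = 5.0, -7.0   # hypothetical values of f1(x) and f2(x)
eps1, eps2 = product_epsilons(eps, f1x, f2x)
print(eps1, eps2, worst_product_error(eps1, eps2, f1x, f2x))
```

The use of $\max(\varepsilon_1, |f_1(x)|)$ in the second tolerance is what keeps the cross term $\varepsilon_1 \varepsilon_2$ under control even when $f_1(x) = 0$.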


38.1.13 REMARK: Continuity of the inverse of a continuous bijection between metric spaces. 
It is not true that general continuous bijections between metric spaces have continuous inverses. For example, 
consider the function $f : [0, 1) \cup (1, 2] \to [0, 2)$ defined by

\[ \forall x \in [0, 1) \cup (1, 2],\quad f(x) = \begin{cases} x & \text{for } x \in [0, 1) \\ 3 - x & \text{for } x \in (1, 2]. \end{cases} \]

Although $f$ is continuous (assuming the induced metric from $\mathbb{R}$ on the domain and range), the inverse
function $f^{-1}$ is not continuous at 1. But Theorem 38.1.14 shows that if the domain is compact, then
the inverse of a continuous bijection is continuous. (For another example of a continuous bijection on a 
non-compact domain whose inverse is not continuous, see Example 50.3.11 and Figure 50.3.2.) 


38.1.14 THEOREM: Continuous inverse of continuous bijection from compact metric space to metric space.
Let $X$ be a compact metric space. Let $Y$ be a metric space. Let $f : X \to Y$ be a continuous bijection. Then
$f^{-1} : Y \to X$ is continuous.


PROOF: Let $f : X \to Y$ be a continuous bijection, where $X$ is a compact metric space and $Y$ is a metric
space. Let $\Omega \in \mathrm{Top}(X)$. Let $K = X \setminus \Omega$. Then $K$ is a compact set by Theorem 33.5.13 because $K$ is a closed
subset of the compact set $X$. So $f(K)$ is a compact set by Theorem 33.5.15. Therefore $f(K)$ is a closed subset
of $Y$ by Theorem 33.5.14 because $Y$ is a Hausdorff space by Theorem 37.6.9. So $f(\Omega) = Y \setminus f(K) \in \mathrm{Top}(Y)$
by Definition 31.4.1. Thus $f(\Omega) \in \mathrm{Top}(Y)$ for all $\Omega \in \mathrm{Top}(X)$. Hence $f^{-1} : Y \to X$ is continuous by
Definition 31.12.4.


38.2. Explicit continuity of maps between metric spaces 


38.2.1 REMARK: Explicit continuity of functions. 

The term “explicit continuity” means here that the existential variable $\delta$ which appears in the metric space
version of the definition of continuity in Theorem 38.1.3 is given explicitly as a function of the universal
variable $\varepsilon$. In other words, “$\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,$” is replaced by “$\exists \delta : \mathbb{R}^+ \to \mathbb{R}^+,\ \forall \varepsilon \in \mathbb{R}^+,$”, and “$\delta$” is
replaced by “$\delta(\varepsilon)$”. (This is closely related to the concept of an explicit right inverse for a surjection which
is discussed in Remark 10.5.16.) Explicit continuity removes doubt that the “choice” of the existential
variable could require a choice axiom.


Theorems 38.2.7 and 38.2.10 assert that it is always possible to prove within pure ZF set theory (without
the axiom of choice) that an explicit choice function exists for $\delta$ as a function of $\varepsilon$. This is in contrast to
a comparable situation which arises for sets of measure zero (and for the general Lebesgue measurability of
sets), where one needs to know that there exists a choice function for a countably infinite family of covers,
although this cannot be done without the axiom of countable choice in general. (See Remark 45.3.6.)


38.2.2 REMARK: Existence of a $\delta(\varepsilon)$ choice function with or without the axiom of choice.

At first sight, it might seem that an axiom of choice could be required in order to prove the existence of a
“$\delta(\varepsilon)$ choice function”. Theorem 10.5.17 requires the axiom of choice to prove the existence of a right inverse
of every surjective function. (See also Remark 10.11.11 for “quantifier swapping”.)


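Let $P_{f,x}(\delta, \varepsilon)$ denote the proposition “$f(B^1_{x,\delta}) \subseteq B^2_{f(x),\varepsilon}$” for functions $f : M_1 \to M_2$, points $x \in M_1$ and
parameters $\delta, \varepsilon \in \mathbb{R}^+$. Then the continuity of $f$ at $x$ is equivalent to the proposition

\[ \forall \varepsilon \in \mathbb{R}^+,\quad \{\delta \in \mathbb{R}^+;\ P_{f,x}(\delta, \varepsilon)\} \ne \emptyset. \]

In other words, $\forall \varepsilon \in \mathbb{R}^+,\ X_\varepsilon \ne \emptyset$, where $X_\varepsilon = \{\delta \in \mathbb{R}^+;\ P_{f,x}(\delta, \varepsilon)\}$ for all $\varepsilon \in \mathbb{R}^+$. By the axiom of choice,
one may assert $\times_{\varepsilon \in \mathbb{R}^+} X_\varepsilon \ne \emptyset$. Therefore there exists a function $\delta : \mathbb{R}^+ \to \mathbb{R}^+$ such that $\delta(\varepsilon) \in X_\varepsilon$ for
all $\varepsilon \in \mathbb{R}^+$. In other words, there exists a function $\delta : \mathbb{R}^+ \to \mathbb{R}^+$ such that $\forall \varepsilon \in \mathbb{R}^+,\ f(B^1_{x,\delta(\varepsilon)}) \subseteq B^2_{f(x),\varepsilon}$.
Thus $\delta$ is a choice function which chooses a value of $\delta$ from $\mathbb{R}^+$ for each $\varepsilon \in \mathbb{R}^+$. In this way, one may reverse
the order of the universal and existential quantifiers to obtain $\exists \delta \in (\mathbb{R}^+)^{\mathbb{R}^+},\ \forall \varepsilon \in \mathbb{R}^+,\ f(B^1_{x,\delta(\varepsilon)}) \subseteq B^2_{f(x),\varepsilon}$.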


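Luckily the axiom of choice is completely unnecessary for choosing a value of $\delta$ for each $\varepsilon$.

Define $\delta^+(\varepsilon) = \sup\{\delta \in \mathbb{R}^+;\ P_{f,x}(\delta, \varepsilon)\}$ for all $\varepsilon \in \mathbb{R}^+$. Then $\delta^+(\varepsilon)$ is a well-defined element of $\mathbb{R}^+ \cup \{\infty\}$
for all $\varepsilon \in \mathbb{R}^+$. (One may easily force the value to be finite by defining $\hat{\delta}(\varepsilon) = \min(1, \delta^+(\varepsilon))$ for $\varepsilon \in \mathbb{R}^+$ if
infinite values are inconvenient.) Then one obtains $\forall \varepsilon \in \mathbb{R}^+,\ f(B^1_{x,\delta^+(\varepsilon)}) \subseteq B^2_{f(x),\varepsilon}$. No invocation of an axiom of
choice is required to ensure the existence of the function $\delta^+$. The fact that $\delta^+$ is a function follows from the
observation that $\forall \delta_1, \delta_2 \in \mathbb{R}^+,\ \delta_1 \le \delta_2 \Rightarrow (P_{f,x}(\delta_2, \varepsilon) \Rightarrow P_{f,x}(\delta_1, \varepsilon))$. Therefore the sets $\{\delta \in \mathbb{R}^+;\ P_{f,x}(\delta, \varepsilon)\}$
are real-number intervals.


The choice function $\delta^+$ is in essence the inverse of the pointwise modulus of continuity in Definition 38.2.4.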
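
The supremum construction can be imitated numerically for real functions of one real variable. The following is a rough sketch under stated assumptions: the ball inclusion is tested only on finitely many sample points, so it approximates the proposition in the remark rather than deciding it, and the names `inclusion_holds`, `delta_plus` and the cap value are hypothetical. Bisection is justified by the observation in the remark that the admissible values of $\delta$ form an interval starting at 0.

```python
def inclusion_holds(f, x, delta, eps, samples=2001):
    """Sampled test of the inclusion: |f(y) - f(x)| < eps for sampled y with |y - x| < delta."""
    fx = f(x)
    for k in range(1, samples):
        t = k / samples                      # 0 < t < 1, so |y - x| < delta
        for y in (x - t * delta, x + t * delta):
            if not abs(f(y) - fx) < eps:
                return False
    return True

def delta_plus(f, x, eps, cap=1.0, iterations=60):
    """Approximate min(cap, sup{delta > 0; inclusion holds}) by bisection on delta."""
    if inclusion_holds(f, x, cap, eps):
        return cap
    lo, hi = 0.0, cap                        # the inclusion fails at hi and holds below lo
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if inclusion_holds(f, x, mid, eps):
            lo = mid
        else:
            hi = mid
    return lo

# For the affine map y -> 3y + 1 the exact supremum is eps / 3.
print(delta_plus(lambda y: 3.0 * y + 1.0, 2.0, 0.3))
```

No choice is made anywhere in this computation: the value returned is determined by the function, the point and $\varepsilon$, which is the point of the construction. The cap plays the role of the truncation $\min(1, \delta^+(\varepsilon))$ mentioned above.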


38.2.3 REMARK: The pointwise modulus of continuity, or “distance transfer function envelope”.

For any function $f : M_1 \to M_2$, for metric spaces $(M_1, d_1)$ and $(M_2, d_2)$, and any $p \in M_1$, one may define
a “pointwise modulus of continuity” $\omega_{f,p} : \mathbb{R}^+_0 \to \overline{\mathbb{R}}^+_0$ by $\omega_{f,p}(\delta) = \inf\{r \in \overline{\mathbb{R}}^+_0;\ f(B^1_{p,\delta}) \subseteq B^2_{f(p),r}\}$ for all
$\delta \in \mathbb{R}^+_0$. (See Definition 38.3.9 for the corresponding global modulus of continuity.) The characteristics
of this function are the basis of various specialised definitions of continuity such as Lipschitz and Hölder
continuity. (See Sections 38.6 and 38.7.) By analogy with an electronics engineering concept, one could
think of this as a “distance transfer function envelope”.


38.2.4 DEFINITION: The pointwise modulus of continuity at a point $p \in M_1$ of a function $f : M_1 \to M_2$,
for metric spaces $(M_1, d_1)$ and $(M_2, d_2)$, is the function $\omega_{f,p} : \mathbb{R}^+_0 \to \overline{\mathbb{R}}^+_0$ defined by

\[ \forall \delta \in \mathbb{R}^+_0,\quad \omega_{f,p}(\delta) = \sup\{d_2(f(x), f(p));\ x \in B^1_{p,\delta}\}. \]


38.2.5 REMARK: Explicit continuity for a continuous function. 

Definition 38.2.6 reverses the universal and existential quantifiers for $\varepsilon$ and $\delta$ in Definition 38.1.1. This may
be thought of as specifying an explicit inverse modulus of continuity for the function at a point. A function
$\delta \in (\mathbb{R}^+)^{\mathbb{R}^+}$ specifies a single choice of $\delta(\varepsilon) \in \mathbb{R}^+$ for each $\varepsilon \in \mathbb{R}^+$.


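38.2.6 DEFINITION: An explicitly continuous function at a point $x \in M_1$ between metric spaces $M_1$ and $M_2$
is a function $f : M_1 \to M_2$ which satisfies

\[ \exists \delta \in (\mathbb{R}^+)^{\mathbb{R}^+},\ \forall \varepsilon > 0,\ \forall y \in M_1,\quad d_1(x, y) < \delta(\varepsilon) \ \Rightarrow\ d_2(f(x), f(y)) < \varepsilon. \]

In other words, $\exists \delta : \mathbb{R}^+ \to \mathbb{R}^+,\ \forall \varepsilon > 0,\ f(B^1_{x,\delta(\varepsilon)}) \subseteq B^2_{f(x),\varepsilon}$.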


38.2.7 THEOREM: Pointwise equivalence of continuity and explicit continuity between metric spaces.
A function $f : M_1 \to M_2$ for metric spaces $M_1$ and $M_2$ is continuous at a point $x \in M_1$ if and only if $f$ is
explicitly continuous at $x$.


PROOF: It follows by Theorem 38.1.3 and pure predicate calculus (by Theorem 6.6.24 (iii) in particular)
that an explicitly continuous function at a point is continuous at that point. As outlined in Remark 38.2.2,
the converse follows from the observation that one may choose an explicit value of $\delta$ for each $\varepsilon$ by constructing
the supremum $\delta^+(\varepsilon)$ of the values of $\delta \in \mathbb{R}^+$ for which $f(B^1_{x,\delta}) \subseteq B^2_{f(x),\varepsilon}$. One may use the value $\delta^+(\varepsilon/2)/2$ to be
certain that the inclusion is strict.


38.2.8 REMARK: Explicitly continuous functions between metric spaces.

Definition 38.2.9 is a simple extension of Definition 38.2.6 from a single point to the whole domain $M_1$ of
a function $f : M_1 \to M_2$. This should not be confused with uniform continuity, which demands the same
pointwise modulus of continuity for all $x \in M_1$. In Definition 38.2.9, the value $\delta_x(\varepsilon)$ depends arbitrarily
on $x$. This explicit definition is once again equivalent to the standard definition of continuity.


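38.2.9 DEFINITION: An explicitly continuous function between metric spaces $M_1$ and $M_2$ is a function
$f : M_1 \to M_2$ which satisfies

\[ \exists \delta : M_1 \to (\mathbb{R}^+)^{\mathbb{R}^+},\ \forall x \in M_1,\ \forall \varepsilon > 0,\ \forall y \in M_1,\quad d_1(x, y) < \delta_x(\varepsilon) \ \Rightarrow\ d_2(f(x), f(y)) < \varepsilon. \]

In other words, $\exists \delta : M_1 \to (\mathbb{R}^+ \to \mathbb{R}^+),\ \forall x \in M_1,\ \forall \varepsilon > 0,\ f(B^1_{x,\delta_x(\varepsilon)}) \subseteq B^2_{f(x),\varepsilon}$.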


38.2.10 THEOREM: Equivalence of continuity and explicit continuity between metric spaces.
A function $f : M_1 \to M_2$ for metric spaces $M_1$ and $M_2$ is continuous if and only if $f$ is explicitly continuous.


PROOF: An explicitly continuous function is continuous by Theorem 38.1.3 and predicate calculus, in
particular by Theorem 6.6.24 (iii). The converse follows by constructing the function $\delta^+ : M_1 \to (\mathbb{R}^+ \to \mathbb{R}^+)$,
whose value is the supremum of the values of $\delta$ for which line (38.1.1) holds in Theorem 38.1.3. One may
use the value $\delta_x(\varepsilon) = \delta^+_x(\varepsilon/2)/2$ to be totally certain that the inclusion $f(B^1_{x,\delta_x(\varepsilon)}) \subseteq B^2_{f(x),\varepsilon}$ is strict.


38.2.11 EXAMPLE: Let $M_1 = M_2 = \mathbb{R}$ and define $f : M_1 \to M_2$ by $f : x \mapsto x^2$. Then the function
$\delta^+ : M_1 \to (\mathbb{R}^+ \to \mathbb{R}^+)$ mentioned in the proof of Theorem 38.2.10 is given by $\delta^+_x(\varepsilon) = (x^2 + \varepsilon)^{1/2} - |x|$ for
all $x \in M_1$ and $\varepsilon \in \mathbb{R}^+$. (This is illustrated in Figure 38.2.1.)


[Figure 38.2.1: Explicit choice of $\delta \in \mathbb{R}^+$ for each $x_0 \in M_1$ and $\varepsilon \in \mathbb{R}^+$. The diagram marks the points $x_0 - \delta_{x_0}(\varepsilon)$ and $x_0 + \delta_{x_0}(\varepsilon)$.]
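
The closed form in Example 38.2.11 can be sanity-checked by sampling. The sketch below assumes the formula $\delta^+_x(\varepsilon) = (x^2 + \varepsilon)^{1/2} - |x|$ from the example. The helper names and the test pairs $(x, \varepsilon)$ are hypothetical. For each pair it checks that the ball of radius $\delta^+_x(\varepsilon)$ is mapped inside the $\varepsilon$-ball, and that a radius one percent larger is not.

```python
import math

def delta_plus_square(x, eps):
    """Closed form from Example 38.2.11 for f(x) = x^2: sqrt(x^2 + eps) - |x|."""
    return math.sqrt(x * x + eps) - abs(x)

def max_deviation(x, delta, samples=10001):
    """Largest sampled |y^2 - x^2| over points y with |y - x| < delta."""
    worst = 0.0
    for k in range(samples):
        t = k / samples                       # 0 <= t < 1
        for y in (x - t * delta, x + t * delta):
            worst = max(worst, abs(y * y - x * x))
    return worst

for x, eps in [(0.0, 0.25), (1.0, 0.5), (-3.0, 0.1), (10.0, 2.0)]:
    d = delta_plus_square(x, eps)
    inside = max_deviation(x, d) < eps            # radius d: image stays in the eps-ball
    outside = max_deviation(x, 1.01 * d) >= eps   # radius 1.01 d: image leaves the eps-ball
    print(x, eps, d, inside, outside)
```

The worst case occurs at the point $y$ with $|y| = |x| + \delta$, where $y^2 - x^2 = 2|x|\delta + \delta^2$. Setting this equal to $\varepsilon$ and solving for $\delta$ gives the formula in the example.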


38.3. Uniform continuity 


38.3.1 REMARK: Uniform continuity. 

Uniform continuity is well defined for functions between metric spaces. Definition 38.3.2 defines uniform
continuity by swapping the quantifiers $\forall x$ and $\exists \delta$ in Theorem 38.1.3 line (38.1.1). The uniform condition is
equivalent to $\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in M_1,\ f(B_{x,\delta}) \subseteq B_{f(x),\varepsilon}$, which is equivalent to requiring that the function
$\omega_{f,p}$ in Definition 38.2.4 satisfy $\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall p \in M_1,\ \omega_{f,p}(\delta) \le \varepsilon$. Most importantly, the $\delta$ value is
independent of $x$.


Uniformly continuous functions include some classes of uniformly Hölder and Lipschitz functions, but some
classes of Hölder and Lipschitz functions are not uniformly continuous. (See Sections 38.7 and 40.8.)


38.3.2 DEFINITION: A uniformly continuous function $f : M_1 \to M_2$ from a metric space $(M_1, d_1)$ to a
metric space $(M_2, d_2)$ is a function $f : M_1 \to M_2$ which satisfies

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x, y \in M_1,\quad d_1(x, y) < \delta \ \Rightarrow\ d_2(f(x), f(y)) < \varepsilon. \]


38.3.3 REMARK: Some basic theorems for composition of uniformly continuous functions. 
Theorem 38.3.4 is the uniformly continuous version of Theorem 31.12.7. Theorem 38.3.5 is the uniformly
continuous version of Theorem 32.9.8.


38.3.4 THEOREM: The composition of uniformly continuous functions is uniformly continuous.
Let $f : M_1 \to M_2$ and $g : M_2 \to M_3$ be uniformly continuous functions between metric spaces $(M_1, d_1)$,
$(M_2, d_2)$ and $(M_3, d_3)$. Then $g \circ f : M_1 \to M_3$ is uniformly continuous.


PROOF: Let $\varepsilon \in \mathbb{R}^+$. Then there exists $\delta_2 \in \mathbb{R}^+$ such that $d_2(x_2, y_2) < \delta_2 \Rightarrow d_3(g(x_2), g(y_2)) < \varepsilon$ for
all $x_2, y_2 \in M_2$. But then there exists $\delta_1 \in \mathbb{R}^+$ such that $d_1(x_1, y_1) < \delta_1 \Rightarrow d_2(f(x_1), f(y_1)) < \delta_2$ for all
$x_1, y_1 \in M_1$. Therefore for such $\delta_1$, for all $x_1, x_2 \in M_1$, $d_1(x_1, x_2) < \delta_1 \Rightarrow d_3(g(f(x_1)), g(f(x_2))) < \varepsilon$. Thus
$g \circ f : M_1 \to M_3$ is uniformly continuous by Definition 38.3.2.


38.3.5 THEOREM: Map from domain to graph of uniformly continuous function is uniformly continuous.
Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Let $f : X \to Y$ be uniformly continuous. Let $(X \times Y, d)$
be the Cartesian metric space product of $(X, d_X)$ and $(Y, d_Y)$. (See Definition 37.2.16.) Then the map
$g : X \to X \times Y$ defined by $g : x \mapsto (x, f(x))$ is uniformly continuous.


PROOF: Let $\varepsilon \in \mathbb{R}^+$. Then there exists $\delta_1 \in \mathbb{R}^+$ such that $d_Y(f(x_1), f(x_2)) < \varepsilon/2$ for any $x_1, x_2 \in X$
with $d_X(x_1, x_2) < \delta_1$. Let $\delta = \min(\varepsilon/2, \delta_1)$. Then $d((x_1, f(x_1)), (x_2, f(x_2))) < (\varepsilon^2/4 + \varepsilon^2/4)^{1/2} < \varepsilon$ for all
$x_1, x_2 \in X$ with $d_X(x_1, x_2) < \delta$. Hence $g : X \to X \times Y$ is uniformly continuous by Definition 38.3.2.


38.3.6 REMARK: Methods of proof that continuity on compact metric spaces is uniform. 
The simple form of “direct proof” given here for Theorem 38.3.7 is also followed for general metric spaces by
A.E. Taylor [145], page 74; S.J. Taylor [147], page 37; Bass [53], pages 214-215; Rosenlicht [128], pages 80-81,
and for real-number functions by Schramm [133], pages 236-238.


Various forms of “indirect proof”, often requiring the axiom of countable choice, are given by Rudin [129],
pages 78-79; Kolmogorov/Fomin [104], page 109; Edwards [67], page 54; Mattuck [114], page 191; Shilov [135],
pages 137-138; B. Mendelson [115], pages 177-178; Gemignani [80], pages 166-167; Hocking/Young [93],
pages 30-31; Johnsonbaugh/Pfaffenberger [97], pages 153-154; Kasriel [100], pages 103-105; Simmons [137],
pages 77-79, 124. For the special case of real-number functions, see Thomson/Bruckner/Bruckner [149],
page 220; Friedman [74], page 67. The first publication of Theorem 38.3.7, for compact real intervals only,
was in 1872 by Heine [181], page 188.


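38.3.7 THEOREM: Function continuity from a compact metric space to a metric space is uniform.
Let $K$ be a compact subset of a metric space $M_1$. Let $M_2$ be a metric space. Let $f : K \to M_2$ be continuous.
Then $f$ is uniformly continuous.


PROOF: Let $K$ be a compact subset of a metric space $(M_1, d_1)$. Let $(M_2, d_2)$ be a metric space.
Let $f : K \to M_2$ be continuous. Let $\varepsilon \in \mathbb{R}^+$. Then

\[ \forall x \in K,\ \exists \delta \in \mathbb{R}^+,\quad f(B^1_{x,\delta} \cap K) \subseteq B^2_{f(x),\varepsilon/2} \tag{38.3.1} \]

by Theorems 38.1.7 and 6.6.24 (i). Let $C = \{B^1_{x,\delta/2};\ x \in K,\ \delta \in \mathbb{R}^+ \text{ and } f(B^1_{x,\delta} \cap K) \subseteq B^2_{f(x),\varepsilon/2}\}$.
Then $C \subseteq \mathrm{Top}(M_1) \setminus \{\emptyset\}$ by Theorem 37.5.6 (i) and $K \subseteq \bigcup C$ by line (38.3.1). In other words, $C$ is
an open cover of $K$. So $K$ has a finite subcover $C'$ of $C$ by Definition 33.5.10. For $\Omega \in \mathrm{Top}(M_1)$, let
$S_\Omega = \{(x, r) \in K \times \mathbb{R}^+;\ \Omega = B^1_{x,r/2} \text{ and } f(B^1_{x,r} \cap K) \subseteq B^2_{f(x),\varepsilon/2}\}$. Then $S_\Omega \ne \emptyset$ for all $\Omega \in C'$ by the
definition of $C$. (Note that it is not guaranteed that $\#(S_\Omega) = 1$ for all $\Omega \in C'$.) So $\times_{\Omega \in C'} S_\Omega \ne \emptyset$ by the
“axiom of finite choice”. Let $(x_\Omega, \delta_\Omega)_{\Omega \in C'} \in \times_{\Omega \in C'} S_\Omega$. Let $\delta = \frac{1}{2} \min\{\delta_\Omega;\ \Omega \in C'\}$. Then $\delta \in \mathbb{R}^+$ is well
defined. Let $z_1, z_2 \in K$ satisfy $d_1(z_1, z_2) < \delta$. Then $z_1 \in B^1_{x_\Omega,\delta_\Omega/2}$ for some $\Omega \in C'$ because $K \subseteq \bigcup C'$.
Therefore for such $\Omega$,

\begin{align*}
d_1(x_\Omega, z_2) &\le d_1(x_\Omega, z_1) + d_1(z_1, z_2) \\
&< \tfrac{1}{2}\delta_\Omega + \delta \\
&\le \tfrac{1}{2}\delta_\Omega + \tfrac{1}{2}\delta_\Omega \\
&= \delta_\Omega.
\end{align*}

Thus $z_1, z_2 \in B^1_{x_\Omega,\delta_\Omega} \cap K$. So $f(z_1), f(z_2) \in B^2_{f(x_\Omega),\varepsilon/2}$ by the definition of $S_\Omega$. Therefore $d_2(f(z_1), f(z_2)) < \varepsilon$ by
the triangle inequality for $d_2$. (See Theorem 37.1.2 (iii).) So $\forall z_1, z_2 \in K,\ (d_1(z_1, z_2) < \delta \Rightarrow d_2(f(z_1), f(z_2)) < \varepsilon)$.
Hence $f$ is uniformly continuous by Definition 38.3.2.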


38.3.8 REMARK: Two definition styles for “modulus of continuity”.
The modulus of continuity concept is a generalisation of the Lipschitz constant in Definition 38.6.6 and the
Hölder continuity functions $(x, y) \mapsto K d_1(x, y)^\alpha$ in Definition 38.7.2.


In one style of “modulus of continuity” definition, a function $f$ between two metric spaces is fixed, and
the lowest possible envelope $\omega_f$ is computed for that function's distance function. In another style, the
modulus of continuity $\omega$ provides an upper bound for one or more functions. These styles are both given in
Definition 38.3.9.

The first style of modulus of continuity $\omega_f$ is uniquely determined by a given function $f$. In other words, it
is a property of the function. The second style $\omega$ is a non-unique upper bound. Thus the first style $\omega_f$ is a
least upper bound, namely the least modulus of continuity bound $\omega$ consistent with a given function $f$.


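38.3.9 DEFINITION: The modulus of continuity of a function $f : M_1 \to M_2$ between metric spaces $(M_1, d_1)$
and $(M_2, d_2)$ with $M_1 \ne \emptyset$ is the function $\omega_f : \mathbb{R}^+_0 \to \overline{\mathbb{R}}^+_0$ given by

\[ \forall \delta \in \mathbb{R}^+_0,\quad \omega_f(\delta) = \sup\{d_2(f(x), f(y));\ x, y \in M_1 \text{ and } d_1(x, y) \le \delta\}. \]

A modulus of continuity (bound) for a function $f : M_1 \to M_2$ between metric spaces $(M_1, d_1)$ and $(M_2, d_2)$
with $M_1 \ne \emptyset$ is a function $\omega : \mathbb{R}^+_0 \to \overline{\mathbb{R}}^+_0$ which satisfies $\forall \delta \in \mathbb{R}^+_0,\ \omega_f(\delta) \le \omega(\delta)$. In other words,

\[ \forall \delta \in \mathbb{R}^+_0,\ \forall x, y \in M_1,\quad d_1(x, y) \le \delta \ \Rightarrow\ d_2(f(x), f(y)) \le \omega(\delta). \]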


38.3.10 REMARK: Properties of the modulus of continuity.

In the case of uniformly continuous functions between general metric spaces, the properties of the modulus 
of continuity are very weak. This is particularly true for large values of the independent variable because the 
uniform continuity property focuses very much on smaller values. (See Example 38.3.11.) The properties 
are very much stronger in the case of convex subsets of Cartesian spaces (or Banach spaces). 


38.3.11 EXAMPLE: Let $M_1 = \mathbb{Z}^+_0$ and $M_2 = \mathbb{R}$ with the usual distance functions. Define $f : M_1 \to M_2$ by
$f(x) = x^2$ for all $x \in M_1$. Then $f$ is uniformly continuous because for any $\varepsilon \in \mathbb{R}^+$, the value $\delta = 1/2$ ensures
that $d_2(f(x), f(y)) < \varepsilon$ for all $x, y \in M_1$ with $d_1(x, y) < \delta$. However, $\omega_f(\delta) = 0$ for $\delta \in [0, 1)$ and $\omega_f(\delta) = \infty$
for $\delta \in [1, \infty)$.
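
The behaviour in this example can be observed on finite truncations of the domain. The sketch below computes the supremum in Definition 38.3.9 over a finite point set by brute force. The truncation to $\{0, 1, \ldots, n\}$ and the function names are assumptions made only for this illustration. For $\delta = 1/2$ the value is 0, while for $\delta = 1$ the truncated value is $2n - 1$, which grows without bound as $n$ increases.

```python
def modulus_on_finite_set(f, points, delta):
    """sup{|f(x) - f(y)|; x, y in points and |x - y| <= delta} for a finite point set."""
    return max(
        abs(f(x) - f(y))
        for x in points for y in points
        if abs(x - y) <= delta
    )

def square(x):
    return x * x

for n in (10, 100, 400):
    points = range(n + 1)          # truncation {0, 1, ..., n} of the non-negative integers
    print(n,
          modulus_on_finite_set(square, points, 0.5),
          modulus_on_finite_set(square, points, 1.0))
```

On the truncated domain the supremum for $\delta = 1$ is attained by the pair $(n - 1, n)$, since $n^2 - (n - 1)^2 = 2n - 1$.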
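

38.3.12 THEOREM: Some basic properties of the modulus of continuity.
Let $f : M_1 \to M_2$ be a map between metric spaces $(M_1, d_1)$ and $(M_2, d_2)$ with $M_1 \ne \emptyset$. Let $\omega_f : \mathbb{R}^+_0 \to \overline{\mathbb{R}}^+_0$
be the modulus of continuity of $f$. Let $X_\delta = \{d_2(f(x), f(y));\ x, y \in M_1 \text{ and } d_1(x, y) \le \delta\}$ for all $\delta \in \mathbb{R}^+_0$.

(i) $\mathrm{Range}(\omega_f) \subseteq \overline{\mathbb{R}}^+_0$. In other words, $\omega_f : \mathbb{R}^+_0 \to \overline{\mathbb{R}}^+_0$ is a well-defined function.

(ii) $\forall \delta_1, \delta_2 \in \mathbb{R}^+_0,\ \delta_1 \le \delta_2 \Rightarrow X_{\delta_1} \subseteq X_{\delta_2}$.

(iii) $\forall \delta_1, \delta_2 \in \mathbb{R}^+_0,\ \delta_1 \le \delta_2 \Rightarrow \omega_f(\delta_1) \le \omega_f(\delta_2)$.

(iv) $f$ is uniformly continuous if and only if $\lim_{\delta \to 0^+} \omega_f(\delta) = 0$.


PROOF: For part (i), there exists $x \in M_1$ because $M_1 \ne \emptyset$. Then $X_0 \ne \emptyset$ because $0 = d_2(f(x), f(x)) \in X_0$
since $d_1(x, x) = 0$. So $X_\delta \ne \emptyset$ for all $\delta \in \mathbb{R}^+_0$ by part (ii). Therefore $\omega_f(\delta) \ge 0$ for all $\delta \in \mathbb{R}^+_0$. Hence
$\mathrm{Range}(\omega_f) \subseteq \overline{\mathbb{R}}^+_0$.

Part (ii) follows from Theorem 15.9.3 (xii).

Part (iii) follows from part (ii), Definition 38.3.9 and Theorem 11.2.42 (ii).

For part (iv), let $f : M_1 \to M_2$ be uniformly continuous. Let $\varepsilon \in \mathbb{R}^+$. Then Definition 38.3.2 implies
$\exists \delta_0 \in \mathbb{R}^+,\ \forall \delta \in [0, \delta_0),\ X_\delta \subseteq [0, \varepsilon/2)$. So $\omega_f(\delta) \le \varepsilon/2$ for all $\delta \in (0, \delta_0)$. Therefore $\omega_f(\delta_0/2) \le \varepsilon/2 < \varepsilon$.
Consequently $\lim_{\delta \to 0^+} \omega_f(\delta) = 0$.

Now suppose that $\lim_{\delta \to 0^+} \omega_f(\delta) = 0$. Let $\varepsilon \in \mathbb{R}^+$. Then there exists $\delta_0 \in \mathbb{R}^+$ such that $\omega_f(\delta) < \varepsilon/2$ for
all $\delta \in [0, \delta_0]$. Let $x, y \in M_1$ satisfy $d_1(x, y) < \delta_0$. Then $d_2(f(x), f(y)) \le \omega_f(\delta_0) < \varepsilon$. Hence $f$ is uniformly
continuous by Definition 38.3.2.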


38.3.13 REMARK: Uniform continuity for uniform spaces. 

Uniform continuity is not well defined for general topological spaces, but is well defined for “uniform spaces”.
These have a structure known as a “uniformity”, which uniquely determines a topology on the space, and
may itself be determined by a metric-space distance function. Cauchy sequences and “Cauchy nets” may
also be defined within uniform spaces. Using these, completeness may be generalised from metric spaces to
uniform spaces. (See Remark 37.8.16.) The theory of uniform spaces is somewhat abstract and of limited
relevance to the differential geometry definitions in this book. (See Kelley [101], pages 174-216, for an
exposition of uniform spaces and their relations to metric spaces, uniform continuity and completeness. For
uniform continuity for uniform spaces, see also Hocking/Young [93], pages 31-32; Gaal [77], pages 224-229;
Wilansky [163], pages 209-211; Willard [165], pages 242, 247, 269.)


38.4. Uniform convergence 


38.4.1 REMARK: Uniform convergence of sequences of functions. 

The uniform convergence concept in Definition 38.4.2 is equivalent to the “uniform Cauchy convergence" 
concept in Definition 38.4.3 if the common range of the functions is a complete metric space, as shown in 
Theorem 38.4.4. This conveniently enables uniform convergence to be proved without knowing in advance 
what the limit function is. 


38.4.2 DEFINITION: Uniform convergence of sequences of metric-space-valued functions.
A sequence of functions $f = (f_i)_{i=0}^\infty$ from a set $X$ to a metric space $(M, d)$ converges uniformly to a function
$g : X \to M$ when

\[ \forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}^+_0,\ \forall x \in X,\ \forall i \in \mathbb{Z}[k, \infty),\quad d(f_i(x), g(x)) < \varepsilon. \]

A sequence of functions $f$ from a set $X$ to a metric space $(M, d)$ converges uniformly on $X$ when $f$ converges
uniformly to some function from $X$ to $M$.


A sequence of functions $f$ from a set $X$ to a metric space $(M, d)$ is uniformly convergent on $X$ when $f$
converges uniformly to some function from $X$ to $M$.


38.4.3 DEFINITION: Uniform Cauchy convergence of sequences of metric-space-valued functions.
A uniformly Cauchy (convergent) sequence of functions from a set $X$ to a metric space $(M, d)$ is a sequence
$f = (f_i)_{i=0}^\infty$ of functions from $X$ to $M$ such that

\[ \forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}^+_0,\ \forall x \in X,\ \forall i, j \in \mathbb{Z}[k, \infty),\quad d(f_i(x), f_j(x)) < \varepsilon. \]


38.4.4 THEOREM: Equivalence of uniform convergence and uniform Cauchy convergence.
Let $f = (f_i)_{i=0}^\infty$ be a sequence of functions from a set $X$ to a complete metric space $(M, d)$. Then $f$ is
uniformly convergent on $X$ if and only if it is uniformly Cauchy convergent.


PROOF: Let $f$ be uniformly convergent. Then there is a function $g : X \to M$ such that

\[ \forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}^+_0,\ \forall x \in X,\ \forall i \in \mathbb{Z}[k, \infty),\quad d(f_i(x), g(x)) < \varepsilon. \]

Let $\varepsilon \in \mathbb{R}^+$. Then there exists $k \in \mathbb{Z}^+_0$ such that $\forall x \in X,\ \forall i \in \mathbb{Z}[k, \infty),\ d(f_i(x), g(x)) < \varepsilon/2$. Let $x \in X$
and $i, j \in \mathbb{Z}[k, \infty)$. Then $d(f_i(x), f_j(x)) \le d(f_i(x), g(x)) + d(g(x), f_j(x)) < \varepsilon/2 + \varepsilon/2 = \varepsilon$. Thus $f$ is uniformly
Cauchy convergent by Definition 38.4.3.

Now let $f$ be uniformly Cauchy convergent. Then Definition 38.4.3 implies that $(f_i(x))_{i=0}^\infty$ is a Cauchy
sequence in $M$ for each $x \in X$. So by Definition 37.8.9 and Theorems 37.8.5 and 37.8.14 (i),
there is a unique $g(x) \in M$ for all $x \in X$ such that $(f_i(x))_{i=0}^\infty$ converges to $g(x)$. Let $\varepsilon \in \mathbb{R}^+$. Then by
Definition 38.4.3, there exists $k \in \mathbb{Z}^+_0$ such that $\forall x \in X,\ \forall i, j \in \mathbb{Z}[k, \infty),\ d(f_i(x), f_j(x)) < \varepsilon/2$. Therefore by
Theorem 37.8.7 (iv), $\forall x \in X,\ \forall i \in \mathbb{Z}[k, \infty),\ d(f_i(x), g(x)) \le \varepsilon/2 < \varepsilon$. Hence $f$ is uniformly convergent on $X$
by Definition 38.4.2.


38.4.5 THEOREM: Uniformly convergent continuous function sequences have continuous limits.
Let $f = (f_i)_{i=0}^\infty$ be a sequence of continuous functions from a metric space $(M_1, d_1)$ to a metric space $(M_2, d_2)$
which converges uniformly to a function $g : M_1 \to M_2$. Then $g$ is continuous.


PROOF: Let $z \in M_1$. Let $\varepsilon \in \mathbb{R}^+$. Then by Definition 38.4.2, there exists $k \in \mathbb{Z}^+_0$ with $d_2(f_i(x), g(x)) < \varepsilon/3$
for all $x \in M_1$ and $i \in \mathbb{Z}[k, \infty)$. Since $f_k$ is continuous, there exists $\delta \in \mathbb{R}^+$ with $d_2(f_k(x), f_k(z)) < \varepsilon/3$ for
all $x \in B^1_{z,\delta}$. Thus

\begin{align*}
\forall x \in B^1_{z,\delta},\quad d_2(g(x), g(z)) &\le d_2(g(x), f_k(x)) + d_2(f_k(x), f_k(z)) + d_2(f_k(z), g(z)) \\
&< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.
\end{align*}

Hence $g$ is continuous by Theorem 38.1.3.
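
The role of uniform convergence in the sense of Definition 38.4.2 can be seen in a standard example which is not taken from the text: the functions $x \mapsto x^n$ on $[0, 1]$ converge pointwise to a discontinuous limit, and their sup-distance from that limit does not tend to zero. The sketch below estimates sup-distances on a finite grid, so the numbers are sampled lower bounds for the true suprema. The scaled sequence $x \mapsto x^n / n$ is included as a contrasting case which does converge uniformly. All names are hypothetical.

```python
def sup_distance(f, g, points):
    """Sampled sup-distance max |f(x) - g(x)| over the given points."""
    return max(abs(f(x) - g(x)) for x in points)

def power(n):
    """The function f_n : x -> x^n on [0, 1]."""
    return lambda x: x ** n

def pointwise_limit(x):
    """Pointwise limit of x^n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def scaled_power(n):
    """The function g_n : x -> x^n / n on [0, 1], which converges uniformly to 0."""
    return lambda x: x ** n / n

grid = [k / 10000 for k in range(10001)]
for n in (1, 10, 100, 1000):
    print(n,
          sup_distance(power(n), pointwise_limit, grid),
          sup_distance(scaled_power(n), lambda x: 0.0, grid))
```

The first column of distances stays close to 1 for every $n$ shown, so no single $k$ works for all $x$ at once, whereas the second column equals $1/n$ and tends to zero.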


38.4.6 REMARK: Technical lemma about uniform continuity and convergence. 

Theorem 38.4.7 is a “technical lemma” to assist another “technical lemma”, Theorem 44.3.18, which assists
Theorem 44.3.20, which asserts Peano-style ODE existence. The metric function for $X \times Y$ is assumed to
use the 2-norm as in Definition 37.2.16 to combine the metrics on $X$ and $Y$.


38.4.7 THEOREM: Uniform continuity and convergence for some Peano ODE existence constructions.
Let $(X, d_X)$, $(Y, d_Y)$ and $(Z, d_Z)$ be metric spaces. Let $K \subseteq X \times Y$. Let $h : K \to Z$ be uniformly continuous
on $K$. Let $(f_i)_{i=0}^\infty$ be a sequence of functions $f_i : X \to Y$ which converge uniformly to a function $g : X \to Y$.
Suppose that $f_i \subseteq K$ and $g \subseteq K$ for all $i \in \mathbb{Z}^+_0$. (In other words, their graphs are included in $K$.) Define
$\phi_i : X \to Z$ and $\psi : X \to Z$ by $\phi_i : x \mapsto h(x, f_i(x))$ and $\psi : x \mapsto h(x, g(x))$ for $i \in \mathbb{Z}^+_0$ and $x \in X$.


(i) $\phi_i : X \to Z$ and $\psi : X \to Z$ are uniformly continuous for all $i \in \mathbb{Z}^+_0$.


(ii) $(\phi_i)_{i=0}^\infty$ converges uniformly to $\psi$.


PROOF: For part (i), Theorem 38.3.5 implies that the maps $x \mapsto (x, f_i(x))$ and $x \mapsto (x, g(x))$ are uniformly
continuous for all $i \in \mathbb{Z}^+_0$. Hence by Theorem 38.3.4 and the uniform continuity of $h$, the maps $\phi_i : X \to Z$
and $\psi : X \to Z$ are uniformly continuous for all $i \in \mathbb{Z}^+_0$.


For part (ii), let $\varepsilon \in \mathbb{R}^+$. Then the uniform continuity of $h$ implies that there exists $\delta \in \mathbb{R}^+$ such that
$d_Z(h(x_1, y_1), h(x_2, y_2)) < \varepsilon$ for all $(x_1, y_1), (x_2, y_2) \in K$ with $d_{X \times Y}((x_1, y_1), (x_2, y_2)) < \delta$. (For the Cartesian
metric space product distance function $d_{X \times Y}$, see Definition 37.2.16.) The uniform convergence of $f$ to
$g$ implies that there exists $k \in \mathbb{Z}^+_0$ such that $d_Y(f_i(x), g(x)) < \delta$ for all $x \in X$ and $i \in \mathbb{Z}[k, \infty)$. Then
$d_{X \times Y}((x, f_i(x)), (x, g(x))) = d_Y(f_i(x), g(x)) < \delta$. So $d_Z(\phi_i(x), \psi(x)) = d_Z(h(x, f_i(x)), h(x, g(x))) < \varepsilon$ for all
$x \in X$ and $i \in \mathbb{Z}[k, \infty)$. Hence $(\phi_i)_{i=0}^\infty$ converges uniformly to $\psi$.


38.4.8 REMARK: Additional function-sequence definitions which are required for Ascoli’s theorem. 
Ascoli’s theorem requires the uniform convergence concept in Definition 38.4.2. (See Section 38.5 for Ascoli’s 
theorem.) It also requires some related concepts for sequences of functions, such as the equicontinuity in 
Definition 38.4.10 and the uniform boundedness in Definition 38.4.14. 


38.4.9 REMARK: Pointwise and uniform equicontinuity of sequences and sets of functions. 

In Definition 38.3.2, a single function $f : M_1 \to M_2$ is said to be “uniformly continuous” when the domain
distance $\delta$ ensures that $d_2(f(x), f(y)) < \varepsilon$ whenever $d_1(x, y) < \delta$, independent of where the points $x$ and $y$
are in the domain. In other words, $\delta$ does not depend on location in $M_1$.


Definition 38.4.10 requires $\delta$ to be uniform with respect to the functions in a function sequence, but not
with respect to the domain points. This “pointwise equicontinuity” is useful when attempting to show that
a sequence of functions converges to some function, for example in Theorem 38.5.5.


Definition 38.4.11 requires $\delta$ to be uniform with respect to both the functions in a function sequence and the
points in the domain. This is “uniform equicontinuity”.


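38.4.10 DEFINITION: A (pointwise) equicontinuous sequence of functions from a metric space $(M_1, d_1)$ to
a metric space $(M_2, d_2)$ is a sequence $(f_i)_{i=0}^\infty$ of functions from $M_1$ to $M_2$ which satisfies

\[ \forall x \in M_1,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall i \in \mathbb{Z}^+_0,\ \forall y \in M_1,\quad d_1(x, y) < \delta \ \Rightarrow\ d_2(f_i(x), f_i(y)) < \varepsilon. \]

A (pointwise) equicontinuous set of functions from a metric space $(M_1, d_1)$ to a metric space $(M_2, d_2)$ is a
set $F$ of functions from $M_1$ to $M_2$ which satisfies

\[ \forall x \in M_1,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall f \in F,\ \forall y \in M_1,\quad d_1(x, y) < \delta \ \Rightarrow\ d_2(f(x), f(y)) < \varepsilon. \]


38.4.11 DEFINITION: A uniformly equicontinuous sequence of functions from a metric space $(M_1, d_1)$ to a
metric space $(M_2, d_2)$ is a sequence $(f_i)_{i=0}^\infty$ of functions from $M_1$ to $M_2$ which satisfies

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall i \in \mathbb{Z}^+_0,\ \forall x, y \in M_1,\quad d_1(x, y) < \delta \ \Rightarrow\ d_2(f_i(x), f_i(y)) < \varepsilon. \]

A uniformly equicontinuous set of functions from a metric space $(M_1, d_1)$ to a metric space $(M_2, d_2)$ is a set
$F$ of functions from $M_1$ to $M_2$ which satisfies

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall f \in F,\ \forall x, y \in M_1,\quad d_1(x, y) < \delta \ \Rightarrow\ d_2(f(x), f(y)) < \varepsilon. \]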


38.4.12 REMARK: Pointwise and uniform boundedness of sequences and sets of functions.

Definition 38.4.14 introduces uniformly bounded sequences and sets of functions with values in a general
metric space. (See Definition 37.4.12 for bounded subsets of metric spaces.) Theorem 38.5.5 requires
uniformly bounded function sequences for the special case of real-valued functions. For real or Cartesian space
valued functions, the uniform boundedness of a sequence or set of functions is equivalent to the existence of
a uniform bound for the norms of the values of all functions at all points of their common domain.


Definition 38.4.13 defines the corresponding “non-uniformly bounded” sequences and sets of functions. The
principal difference is that universal and existential quantifiers are swapped.
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38.4.13 DEFINITION: A pointwise bounded sequence of functions from a set X to a metric space (M, d) is 
a sequence (f;);2. of functions from X to M which satisfies 


Va e X, IK € R$, Vi, j € Ze, d(fi(a), f;(x)) € K. 


In other words, Vx € X, diam((fi(z); i € Zg ]) < oc. 


A pointwise bounded set of functions from a set X to a metric space (M,d) is a set F of functions from X 
to M which satisfies 


Va e X, IK € R, Vig € F, d(f(x),g(x)) < K. 


In other words, Vx € X, diam({ f(x); f € F}) < oo. 


38.4.14 DEFINITION: A uniformly bounded sequence of functions from a set X to a metric space (M, d) is 
a sequence (f;);2. of functions from X to M which satisfies 


JK € R$, Vz,y € X, Vi,j € Ze, d(f;(x), f;(y)) € K. 


In other words, diam(UJ;*, Range(f;) ) < oo. 


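A uniformly bounded set of functions from a set $X$ to a metric space $(M, d)$ is a set $F$ of functions from $X$
to $M$ which satisfies


$\exists K \in \mathbb{R}_0^+,\ \forall x, y \in X,\ \forall f, g \in F,\ d(f(x), g(y)) \le K.$


In other words, $\operatorname{diam}\bigl(\bigcup_{f \in F} \operatorname{Range}(f)\bigr) < \infty$.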
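
The difference between the two quantifier orders can be seen numerically. The following Python sketch is only an illustration on finite samples, not part of the formal development. The family f_i(x) = min(i, 1/x) on (0, 1] and all function names are assumptions introduced here. At each fixed point the sampled values stay inside [0, 1/x], but the diameter of the joint range grows as the sample points approach 0.

```python
def pointwise_diameters(funcs, points):
    """Map each sample point x to diam({f(x); f in funcs}) for real-valued f."""
    return {x: max(f(x) for f in funcs) - min(f(x) for f in funcs) for x in points}


def joint_range_diameter(funcs, points):
    """Diameter of the union of the sampled ranges of all functions in funcs."""
    values = [f(x) for f in funcs for x in points]
    return max(values) - min(values)


def make_family(n):
    """Hypothetical family f_i(x) = min(i, 1/x) on (0, 1], for i = 0, ..., n-1."""
    return [lambda x, i=i: min(float(i), 1.0 / x) for i in range(n)]


coarse = [2.0 ** -k for k in range(8)]    # sample points 1, 1/2, ..., 1/128
fine = [2.0 ** -k for k in range(11)]     # sample points 1, 1/2, ..., 1/1024
family = make_family(2000)

# Pointwise view: one bound K = 1/x per point x.
pointwise = pointwise_diameters(family, fine)
# Uniform view: a single bound K for all points, which fails to stabilise here.
coarse_diameter = joint_range_diameter(family, coarse)
fine_diameter = joint_range_diameter(family, fine)
```

A finite sample cannot prove unboundedness, but the growth of the joint diameter under refinement matches the failure of the condition in Definition 38.4.14.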


38.5. Ascoli’s theorem 


38.5.1 REMARK: Ascoli’s theorem. 

An 1884 paper by Ascoli [170] showed that pointwise equicontinuous sequences of functions had uniform 
limits. (For some history of this result, see A.E. Taylor [145], page 167.) The method of proof, which uses a 
diagonalisation procedure, can be readily generalised to much broader scenarios. 


The main obstacle which the proof must overcome is that the application of infinitely many subsequencing
operations does not generally yield a sequence in the limit. Consider for example the subsequence operation
which replaces $x = (x_i)_{i=0}^\infty$ with $(x_{2i})_{i=0}^\infty$. When this operation is applied $k$ times, the result is $(x_{2^k i})_{i=0}^\infty$. If
the operation is applied infinitely many times, the result does not converge to an infinite subsequence of $x$.
The main task here is to construct a single subsequence from a nested sequence of subsequences.


This obstacle is overcome by constructing a convergent function-sequence by first choosing the first function 
in the original function-sequence, then choosing the second function of the first subsequence, then the third 
function in the second subsequence, and so forth. This yields a subsequence of the original sequence which 
converges at all elements of a dense subset of the domain. So it converges uniformly on the whole domain. 
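
The index bookkeeping behind this diagonal choice can be sketched in Python. This is an illustration only. The doubling map from the example above is used as the subsequencing operation at every level, whereas in the proof each level has its own index map, and the helper names are hypothetical.

```python
def compose(outer, inner):
    """Return the index map i -> outer(inner(i))."""
    return lambda i: outer(inner(i))


def nested_index_maps(step, depth):
    """maps[k] sends position i of the k-th nested subsequence to an index
    of the original sequence, where level k+1 is level k composed with step."""
    maps = [lambda i: i]
    for _ in range(depth):
        maps.append(compose(maps[-1], step))
    return maps


def diagonal_indices(step, length):
    """Original-sequence indices of the diagonal: term j of the j-th subsequence."""
    maps = nested_index_maps(step, length - 1)
    return [maps[j](j) for j in range(length)]


double = lambda i: 2 * i            # the operation (x_i) -> (x_{2i})
levels = nested_index_maps(double, 3)
diagonal = diagonal_indices(double, 5)
```

With the doubling map, the k-th level has indices 2^k i, and the diagonal indices are strictly increasing. From position j onwards they are all multiples of 2^j, so the tail of the diagonal is a subsequence of the j-th nested subsequence.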


For greater clarity, some steps in the proof of Ascoli's theorem, Theorem 38.5.5, are presented separately 
as Theorems 38.5.2, 38.5.3 and 38.5.4. (A similar approach is taken by Murray/Miller [119], pages 10-12.) 
These preparatory theorems are given in a very restricted form. They can be considerably generalised. 


The proof of Theorem 38.5.2 uses the basic diagonalisation procedure which makes Ascoli's theorem work,
but it requires neither uniform boundedness nor equicontinuity. It requires only the explicit sequential
compactness of bounded sets, which is shown to be a property of $\mathbb{R}$ in Theorem 35.7.10. This sequential
compactness must be “explicit” so as to avoid invoking an axiom of choice for choosing an infinite sequence
of subsequences in the proof of Theorem 38.5.2. The range $\mathbb{R}$ for the functions may be replaced by any
topological space which has the same explicit sequential compactness of bounded sets. The domain superset
$\mathbb{R}$ for $S$ in Theorem 38.5.2 may be replaced by any set because its topology is not used.


38.5.2 THEOREM: Pointwise bounded function-sequences have convergent subsequences on countable sets.
Let $S$ be a subset of $\mathbb{R}$. Let $f = (f_i)_{i=0}^\infty$ be a pointwise bounded sequence of real-valued functions on $S$. Let
$J$ be a countable subset of $S$. Then $f$ has a subsequence which is convergent at all points of $J$.

PROOF: Since $J$ is countable, there exists a sequence $q = (q_k)_{k=0}^\infty$ with range $J$ by Theorem 13.7.13 (iv).
(Clearly $q$ will not be injective if $J$ is finite, but this does not affect the proof.)

Let $X = \{x : \mathbb{Z}_0^+ \to \mathbb{R};\ x \text{ is bounded}\}$ and $Y = \{y : \mathbb{Z}_0^+ \to \mathbb{R};\ y \text{ is convergent}\}$. Then by Theorem 35.7.10,
there is a map $\phi : X \to (\mathbb{Z}_0^+ \to \mathbb{Z}_0^+)$ such that $\phi(x)$ is an increasing sequence and $x \circ \phi(x) \in Y$ for all $x \in X$.
Thus $x \circ \phi(x)$ is a convergent subsequence of any given bounded sequence of real numbers $x$.


Inductively define the sequence $(g_j)_{j=0}^\infty = ((g_{j,i})_{i=0}^\infty)_{j=0}^\infty$, where $\forall j, i \in \mathbb{Z}_0^+,\ g_{j,i} : S \to \mathbb{R}$, by $g_0 = f$ and


$\forall j \in \mathbb{Z}_0^+,\ g_{j+1} = g_j \circ \phi\bigl( (g_{j,k}(q_j))_{k=0}^\infty \bigr).$


To show that this sequence is well defined, note first that $g_0 = f$ is a pointwise bounded sequence of
functions, by assumption. So $(g_{0,k}(q_0))_{k=0}^\infty$ is a bounded sequence of real numbers. So $\phi\bigl((g_{0,k}(q_0))_{k=0}^\infty\bigr)$
is a well-defined increasing sequence of non-negative integers. So $g_1$ is a well-defined subsequence of $g_0$.
Therefore $g_1$ is a pointwise bounded sequence of real-valued functions. The same argument shows that $g_{j+1}$
is a well-defined pointwise bounded subsequence of $g_j$ for all $j \in \mathbb{Z}_0^+$. So by induction, $g_j$ is a well-defined
pointwise bounded subsequence of $g_0 = f$ for all $j \in \mathbb{Z}_0^+$ because an infinite subsequence of a subsequence is
an infinite subsequence by Theorem 12.3.4 (v).

For all $j \in \mathbb{Z}_0^+$, let $x^j = (x^j_k)_{k=0}^\infty$ denote the bounded sequence of real numbers $(g_{j,k}(q_j))_{k=0}^\infty$. In other words,
$\forall j, k \in \mathbb{Z}_0^+,\ x^j_k = g_{j,k}(q_j)$. Let $\beta_j = \phi(x^j) = \phi\bigl((g_{j,k}(q_j))_{k=0}^\infty\bigr)$ for all $j \in \mathbb{Z}_0^+$. Then $x^j \circ \beta_j = x^j \circ \phi(x^j) \in Y$
converges to a unique limit in $\mathbb{R}$. Denote this limit by $p_j = \lim_{i\to\infty} (x^j \circ \beta_j)_i = \lim_{i\to\infty} x^j_{\beta_j(i)}$ for all $j \in \mathbb{Z}_0^+$.
Then $\lim_{i\to\infty} g_{j+1,i}(q_j) = \lim_{i\to\infty} g_{j,\beta_j(i)}(q_j) = \lim_{i\to\infty} x^j_{\beta_j(i)} = p_j$. Thus the function-sequence $g_{j+1}$ converges
to $p_j$ at $q_j$ for all $j \in \mathbb{Z}_0^+$. However, since $g_{j+1}$ is a subsequence of $g_{\ell+1}$ for all $\ell \in \mathbb{Z}[0, j]$, it follows from
Theorem 35.4.6 that $g_{j+1}$ also converges to $p_\ell$ at $q_\ell$ for all $\ell \in \mathbb{Z}[0, j)$.

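Define the function-sequence $h = (h_j)_{j=0}^\infty$ by $\forall j \in \mathbb{Z}_0^+,\ h_j = g_{j+1,j}$. Then the tail $(h_k)_{k=j}^\infty$ of $h$ is a subsequence of $g_{j+1}$ for all
$j \in \mathbb{Z}_0^+$. So the real-number sequence $(h_k(q_\ell))_{k=j}^\infty$ converges to $p_\ell$ for all $j \in \mathbb{Z}_0^+$ and $\ell \in \mathbb{Z}[0, j]$. Therefore
$(h_j(q_\ell))_{j=0}^\infty$ converges to $p_\ell$ for all $\ell \in \mathbb{Z}_0^+$. Thus $h$ is a subsequence of $f$ which is convergent at all points
of $J$.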


38.5.3 THEOREM: Extension of pointwise convergence from a dense subset to the whole set.
Let $S \subseteq \mathbb{R}$. Let $f = (f_i)_{i=0}^\infty$ be a pointwise equicontinuous sequence of real-valued functions on $S$ such that
$f$ is pointwise convergent on a countable dense subset of $S$. Then $f$ is pointwise convergent on $S$.


PROOF: Let $f$ be an equicontinuous sequence of real-valued functions on $S$ which is pointwise convergent
on a countable dense subset $J$ of $S$. Let $x_0 \in S$. Let $\varepsilon \in \mathbb{R}^+$. Since $f$ is equicontinuous, it follows from
Definition 38.4.10 that there is a $\delta_0 \in \mathbb{R}^+$ such that $\forall x \in B_{x_0,\delta_0},\ \forall i \in \mathbb{Z}_0^+,\ |f_i(x) - f_i(x_0)| < \varepsilon/3$.

The density of $J$ in $S$ implies that $J \cap B_{x_0,\delta_0} \ne \emptyset$. The countability of $J$ implies that $x_1 \in J \cap B_{x_0,\delta_0}$ may
be chosen to be the first element of an enumeration of $J$ which is in $B_{x_0,\delta_0}$. Since the real-number sequence
$(f_i(x_1))_{i=0}^\infty$ is convergent, it is a Cauchy sequence by Theorem 37.8.4. So by Definition 37.8.3, there is a
$k \in \mathbb{Z}_0^+$ such that $\forall i, j \in \mathbb{Z}[k,\infty),\ |f_i(x_1) - f_j(x_1)| < \varepsilon/3$. Consequently


$\exists k \in \mathbb{Z}_0^+,\ \forall i, j \in \mathbb{Z}[k,\infty),$
$|f_i(x_0) - f_j(x_0)| \le |f_i(x_0) - f_i(x_1)| + |f_i(x_1) - f_j(x_1)| + |f_j(x_1) - f_j(x_0)|$
$< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.$


Thus $(f_i(x_0))_{i=0}^\infty$ is a Cauchy sequence by Definition 37.8.3. So $(f_i(x_0))_{i=0}^\infty$ is a convergent sequence by
Definition 37.8.9 and Theorem 37.8.14 (i). Hence $f$ is pointwise convergent on $S$.


38.5.4 THEOREM: Convergent equicontinuous sequences on compact sets are uniformly convergent.
Let $f = (f_i)_{i=0}^\infty$ be a pointwise equicontinuous sequence of real-valued functions which is pointwise convergent
on a compact subset $K$ of $\mathbb{R}$. Then $f$ is uniformly convergent on $K$.


PROOF: Let $\varepsilon \in \mathbb{R}^+$. Then the equicontinuity of $f$ implies by Definition 38.4.10 that there is a $\delta_0 \in \mathbb{R}^+$
such that $\forall i \in \mathbb{Z}_0^+,\ \forall x, y \in K,\ (|x - y| < \delta_0 \Rightarrow |f_i(x) - f_i(y)| < \varepsilon/3)$. By Theorem 34.9.12, the Heine-Borel
theorem, the compactness of $K$ implies that there is a finite subset $S$ of $K$ such that $K \subseteq \bigcup\{B_{p,\delta_0};\ p \in S\}$
because $\{B_{x,\delta_0};\ x \in K\}$ is an open cover for $K$. The pointwise convergence of $f$ implies that for all $p \in S$,
there exists $k_p \in \mathbb{Z}_0^+$ such that $\forall i, j \in \mathbb{Z}[k_p, \infty),\ |f_i(p) - f_j(p)| < \varepsilon/3$ because $(f_i(p))_{i=0}^\infty$ is a Cauchy sequence
for all $p \in S$ by Theorem 37.8.4. Let $k' = \max_{p \in S} k_p$. Then $k' \in \mathbb{Z}_0^+$ and


$\forall p \in S,\ \forall i, j \in \mathbb{Z}[k', \infty),\ |f_i(p) - f_j(p)| < \varepsilon/3.$

Let $x \in K$. Then $x \in B_{p,\delta_0}$ for some $p \in S$, and


$\forall i, j \in \mathbb{Z}[k', \infty),\ |f_i(x) - f_j(x)| \le |f_i(x) - f_i(p)| + |f_i(p) - f_j(p)| + |f_j(p) - f_j(x)|$
$< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.$


Thus $\forall \varepsilon \in \mathbb{R}^+,\ \exists k \in \mathbb{Z}_0^+,\ \forall x \in K,\ \forall i, j \in \mathbb{Z}[k, \infty),\ |f_i(x) - f_j(x)| < \varepsilon$. So $f$ is uniformly Cauchy convergent
on $K$. Hence $f$ is uniformly convergent on $K$.


38.5.5 THEOREM: Ascoli's theorem for real-valued functions of a real variable.
Let $K$ be a compact subset of $\mathbb{R}$. Let $f = (f_i)_{i=0}^\infty$ be a uniformly bounded, pointwise equicontinuous sequence
of real-valued functions on $K$. Then $f$ has a subsequence which is uniformly convergent on $K$.


PROOF: Since $\mathbb{Q}$, the set of rational numbers, is dense in $\mathbb{R}$, it follows that $\mathbb{Q} \cap K$ is dense in $K$. (See
Definition 33.4.2 for dense subsets.) There exists a sequence whose range is $\mathbb{Q} \cap K$. (See Remark 15.2.4 for
enumerations of $\mathbb{Q}$. See Theorem 12.4.2 for enumerations of arbitrary subsets of $\omega$, which can be used to
convert enumerations of $\mathbb{Q}$ to enumerations of subsets of $\mathbb{Q}$.) Thus $K$ has a countable dense subset $\mathbb{Q} \cap K$.


By Theorem 38.5.2, $f$ has a subsequence $g$ which is convergent at all points of $\mathbb{Q} \cap K$. Since $f$ is pointwise
equicontinuous on K, g is also pointwise equicontinuous on K. So by Theorem 38.5.3, g is pointwise 
convergent on all of K. Therefore by Theorem 38.5.4, g is uniformly convergent on K. Hence f has a 
subsequence which is uniformly convergent on K. 


38.5.6 REMARK: Issues with the proof of Ascoli’s theorem. 

The explicit procedure used in the construction of a convergent subsequence in Theorem 38.5.2 is evidently 
useless as a basis for numerical approximations. The very cumbersome procedure at each individual point 
of the functions’ common domain is presented in the proof of Theorem 35.7.10. The procedure has the 
advantage that it avoids axioms of choice by being explicit. But this point-by-point procedure must be 
executed for each rational number in the common domain of the functions in the given sequence. 


Another problem with the proof-method of Ascoli’s theorem is that the uniformly convergent subsequence 
which is constructed, and also the function which it converges to, can be strongly influenced by the order of 
enumeration of a dense subset of the common function domain as in Example 38.5.7. 


The numerical uselessness of Ascoli’s theorem is particularly regrettable because it is used in the Peano 
method for proving existence of solutions of ordinary differential equations in Section 44.3, and ODE solutions 
are required in differential geometry to obtain parallel transport from connections, and to compute integral 
curves of vector fields. The essential difficulty with Peano’s method is that it does not provide a unique 
solution. So any existence procedure applied to it must necessarily make a choice amongst possible solutions. 
Even though this choice does not use an axiom of choice in Theorem 38.5.5, the arbitrariness and the 
unwieldiness of the procedure make it unsuitable for practical differential geometry applications. It is possibly 
for this reason, amongst others, that the Picard ODE existence proof method in Section 44.6 is almost 
universally preferred. 


38.5.7 EXAMPLE: Dependence of Ascoli’s theorem construction on enumerations. 

Let $\phi : \mathbb{Z}_0^+ \to \mathbb{Q} \cap [0,1]$ be an enumeration of $\mathbb{Q} \cap [0,1]$, and define the function sequence $f = (f_i)_{i=0}^\infty$ by
$\forall i \in \mathbb{Z}_0^+,\ \forall x \in [0,1],\ f_i(x) = (x - \phi(i))^2$. (See Section 15.2 for enumerations of rational numbers.) Then the
procedure in the proof of Theorem 35.7.10 will choose a convergent subsequence of $f$ which has the limit
$p_0 = 0$ for the real-number sequence $(f_i(q_0))_{i=0}^\infty$ because it chooses the least limit value. Thus, when the full
procedure in the proof of Theorem 38.5.2 is applied, the limit function will be $x \mapsto (x - q_0)^2$, where $q_0$ is
arbitrarily determined by the function-domain enumeration sequence $q = (q_i)_{i=0}^\infty$.


In this example, it is not possible to achieve the limit point $p_1 = 0$ for the real-number sequence $(f_i(q_1))_{i=0}^\infty$
because this would not match the choice of $p_0 = 0$ at $q_0$. Clearly, the only limit possible at $q_1$ will be
$(q_1 - q_0)^2$. In this simple example, there is only one possible choice for the limit function after the first limit
point has been chosen, but a second rational parameter could be added to functions in the sequence $f$ so as
to give a second “degree of freedom”. Then two domain point choices could determine the limit function.


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




38.6. Lipschitz continuity of maps between metric spaces 


38.6.1 REMARK: Definitions of Lipschitz continuity. 
There are many variations of the concept of Lipschitz continuity between metric spaces $(M_1, d_1)$ and $(M_2, d_2)$,
including the following.


(1) $\exists K \in \mathbb{R}_0^+,\ \forall x, y \in M_1,\ d_2(f(x), f(y)) \le K d_1(x,y)$.
Uniform bound with global scope.

(2) $\exists K \in \mathbb{R}_0^+,\ \exists r \in \mathbb{R}^+,\ \forall x, y \in M_1,\ d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y)$.
Uniform bound with uniform locality.

(3') $\exists K \in \mathbb{R}_0^+,\ \forall z \in M_1,\ \exists \rho \in \mathbb{R}^+,\ \forall x, y \in B^1_{z,\rho},\ d_2(f(x), f(y)) \le K d_1(x,y)$.
Uniform bound with pointwise pair-locality.

(3) $\exists K \in \mathbb{R}_0^+,\ \forall x \in M_1,\ \exists r \in \mathbb{R}^+,\ \forall y \in M_1,\ d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y)$.
Uniform bound with pointwise locality.

(4) $\forall x \in M_1,\ \exists K \in \mathbb{R}_0^+,\ \forall y \in M_1,\ d_2(f(x), f(y)) \le K d_1(x,y)$.
Pointwise bound with global scope.

(5') $\exists \rho \in \mathbb{R}^+,\ \forall z \in M_1,\ \exists K \in \mathbb{R}_0^+,\ \forall x, y \in B^1_{z,\rho},\ d_2(f(x), f(y)) \le K d_1(x,y)$.
Pointwise bound with uniform pair-locality.

(5) $\exists r \in \mathbb{R}^+,\ \forall x \in M_1,\ \exists K \in \mathbb{R}_0^+,\ \forall y \in M_1,\ d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y)$.
Pointwise bound with uniform locality.

(6') $\forall z \in M_1,\ \exists \rho \in \mathbb{R}^+,\ \exists K \in \mathbb{R}_0^+,\ \forall x, y \in B^1_{z,\rho},\ d_2(f(x), f(y)) \le K d_1(x,y)$.
Pointwise bound with pointwise pair-locality.

(6) $\forall x \in M_1,\ \exists r \in \mathbb{R}^+,\ \exists K \in \mathbb{R}_0^+,\ \forall y \in M_1,\ d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y)$.
Pointwise bound and pointwise locality.


Lipschitz continuity is often defined to be uniformly globally bounded as in (1). (See Definition 38.6.6.) This
is excessively strict for many applications. The weakest style of definition as in (6) is sometimes preferable.
(See Definition 38.6.11.) Each of the above definitions may be the most suitable for the requirements of
particular applications.


The implication relations in Theorem 38.6.2 are illustrated in Figure 38.6.1. 


[Figure 38.6.1: Lipschitz class variants. Implication diagram: (1) uniform bound, global scope, implies both
(2) uniform bound, uniform locality, and (4) pointwise bound, global scope; (2) implies both (3) uniform bound,
pointwise locality, and (5) pointwise bound, uniform locality; (4) implies (5); (3) and (5) each imply
(6) pointwise bound, pointwise locality.]


38.6.2 THEOREM: Some implications and not-implications between Lipschitz continuity classes. 

Let $(M_1, d_1)$ and $(M_2, d_2)$ be metric spaces. Then the following implications are valid for the Lipschitz
continuity criteria in Remark 38.6.1. These implications are all strict in the sense that the converse of each
one is false for some pair of metric spaces, as indicated by not-implications.


(i) (1) $\Rightarrow$ (2). (2) $\not\Rightarrow$ (1).
(ii) (1) $\Rightarrow$ (4). (4) $\not\Rightarrow$ (1).
(iii) (2) $\Rightarrow$ (3). (3) $\not\Rightarrow$ (2).
(iv) (2) $\Rightarrow$ (5). (5) $\not\Rightarrow$ (2).
(v) (3) $\Rightarrow$ (6). (6) $\not\Rightarrow$ (3).
(vi) (4) $\Rightarrow$ (5). (5) $\not\Rightarrow$ (4).
(vii) (5) $\Rightarrow$ (6). (6) $\not\Rightarrow$ (5).


PROOF: Part (i) follows from Theorem 6.6.18 (iv) and Definition 6.3.9 (viii) (EI). To show strictness, let
$M_1 = \bigcup_{i \in \mathbb{Z}} (2i, 2i+1)$ and $M_2 = \mathbb{R}$ with the standard distance function on $\mathbb{R}$. Define $f : M_1 \to M_2$ by
$f : x \mapsto \operatorname{floor}(x)^2$. Let $K = 0$ and $r = 1$. Then $\forall x, y \in M_1,\ d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y)$. So
$f$ satisfies (2), but $f$ does not satisfy (1).
Part (ii) follows from Theorem 6.6.24 (iii). To show strictness, let $M_1 = \mathbb{R}^+$ and $M_2 = \mathbb{R}$ with the usual
distance function. Define $f : M_1 \to M_2$ by $f : x \mapsto x^{1/2}$. For $x \in M_1$, let $K(x) = x^{-1/2}$. Let $y \in M_1$ with
$y > x$. Then $d_2(f(x), f(y)) = y^{1/2} - x^{1/2} < \frac12 x^{-1/2} |y - x| \le K(x) d_1(x,y)$. Now suppose that $0 < y < x$.
Then $d_2(f(x), f(y)) \le x^{-1/2} d_1(x,y) = K(x) d_1(x,y)$. So $f$ satisfies (4), but $f$ does not satisfy (1).


Part (iii) follows from Theorem 6.6.24 (iii). For strictness, let $M_1 = \bigcup_{i \in \mathbb{Z}^+} ((2i)^{-1}, (2i-1)^{-1})$ and $M_2 = \mathbb{R}$.
Define $f : M_1 \to M_2$ by $f : x \mapsto \operatorname{floor}(x^{-1})$. (Then for example, $f$ equals 1 on $(\frac12, 1)$, equals 3 on $(\frac14, \frac13)$, and so
forth.) Let $K = 0$, and for $x \in M_1$ let $r(x) = \operatorname{floor}(x^{-1})^{-2}$. Let $x \in M_1$ and $y \in B^1_{x,r(x)}$. Let $k = \operatorname{floor}(x^{-1})$.
Then $k = 2i-1$ for some $i \in \mathbb{Z}^+$, and then $x \in ((2i)^{-1}, (2i-1)^{-1})$ and $(2i-1)^{-1} - (2i)^{-1} = (2i)^{-1}(2i-1)^{-1} =
k^{-1}(k+1)^{-1} < k^{-2} = r(x)$. So $d_1(x,y) < r(x)$ implies $f(y) = f(x)$, and so $d_2(f(x), f(y)) = 0 = K d_1(x,y)$.
Hence $f$ satisfies (3), but $f$ does not satisfy (2).
Part (iv) follows from Theorem 6.6.24 (iii). To show strictness, let $M_1 = M_2 = \mathbb{R}$ with the usual metric, and
define $f : M_1 \to M_2$ by $f : x \mapsto x^2$. Let $r = 1$, and for $x \in M_1$ let $K(x) = 2|x| + 1$. Let $x \in M_1$ with $x \ge 0$.
Let $y \in M_1$ with $d_1(x,y) < 1$. Then $d_2(f(x), f(y)) \le ((x+1)^2 - x^2) d_1(x,y) = K(x) d_1(x,y)$, and similarly
for $x < 0$. Therefore $f$ satisfies (5), but $f$ does not satisfy (2).
Part (v) follows from Theorem 6.6.24 (iii). To show strictness, let $M_1 = \mathbb{R}^+$ and $M_2 = \mathbb{R}^+$ with the usual
distance functions, and define $f : M_1 \to M_2$ by $f : x \mapsto x^{-1}$. For $x \in M_1$, let $K(x) = 2x^{-2}$ and $r(x) = x/2$.
Let $x \in M_1$. Let $y \in M_1$ satisfy $d_1(x,y) < r(x)$. Then $d_2(f(x), f(y)) \le (((x/2)^{-1} - x^{-1})/(x/2)) d_1(x,y) =
K(x) d_1(x,y)$. Therefore $f$ satisfies (6), but $f$ does not satisfy (3).
Part (vi) follows from Theorem 6.6.18 (iv) and Definition 6.3.9 (viii) (EI). To show strictness, define $M_1$, $M_2$
and $f$ as in the proof of part (iv) by $M_1 = M_2 = \mathbb{R}$, and define $f : M_1 \to M_2$ by $f : x \mapsto x^2$. Then $f$
satisfies (5), but $f$ does not satisfy (4).
Part (vii) follows from Theorem 6.6.24 (iii). To show strictness, define $M_1$, $M_2$ and $f$ as in the proof of
part (v) by $M_1 = \mathbb{R}^+$ and $M_2 = \mathbb{R}^+$, and define $f : M_1 \to M_2$ by $f : x \mapsto x^{-1}$. Then $f$ satisfies (6), but $f$
does not satisfy (5).
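
The square-root counterexample for part (ii) can be checked numerically. The following Python sketch is an illustration on sample points only, and the helper names are hypothetical. For a fixed point x, the difference quotients of the square root stay below K(x) = x^(-1/2), as condition (4) requires, but pairs of points near 0 give arbitrarily large quotients, so no single K works for condition (1).

```python
import math


def quotient(f, x, y):
    """Difference quotient d2(f(x), f(y)) / d1(x, y) on the real line."""
    return abs(f(x) - f(y)) / abs(x - y)


def max_quotient_at(f, x, ys):
    """Largest sampled difference quotient with one end fixed at x."""
    return max(quotient(f, x, y) for y in ys if y != x)


ys = [k / 1000.0 for k in range(1, 5001)]   # sample of the interval (0, 5]

# Condition (4): for each fixed x, the bound K(x) = x**-0.5 holds on the sample.
pointwise_ok = all(
    max_quotient_at(math.sqrt, x, ys) <= x ** -0.5 + 1e-12
    for x in (0.01, 0.5, 2.0)
)

# Condition (1) fails: the quotient for the pair (t, 4t) is 1 / (3 sqrt(t)).
near_zero = [quotient(math.sqrt, t, 4.0 * t) for t in (1e-2, 1e-6, 1e-10)]
```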


38.6.3 REMARK: Lipschitz continuity classes with pair-locality.
When the “pair-locality” Lipschitz conditions (3'), (5') and (6') are included, Theorem 38.6.4 shows that the
relations are no longer so simple. The picture which emerges is illustrated in Figure 38.6.2.


[Figure 38.6.2: Lipschitz classes including “pair-locality” variants]


38.6.4 THEOREM: Some implications and not-implications for pair-locality Lipschitz continuity classes.
Let $(M_1, d_1)$ and $(M_2, d_2)$ be metric spaces. Then the following implications are valid for the Lipschitz
continuity criteria in Remark 38.6.1.

(i) (2) $\Rightarrow$ (3').
(ii) (2) $\Rightarrow$ (5').
(iii) (3) $\not\Rightarrow$ (6').
(iv) (3') $\Rightarrow$ (6').
(v) (3') $\Rightarrow$ (3). (3) $\not\Rightarrow$ (3').
(vi) (4) $\not\Rightarrow$ (5').
(vii) (5) $\not\Rightarrow$ (6').
(viii) (5') $\Rightarrow$ (6').
(ix) (5') $\Rightarrow$ (5). (5) $\not\Rightarrow$ (5').
(x) (6') $\Rightarrow$ (6). (6) $\not\Rightarrow$ (6').


PROOF: Part (i) follows with $\rho = r/2$ because $x, y \in B^1_{z,r/2} \Rightarrow d_1(x,y) < r$.


Part (ii) follows with $\rho = r/2$ because $x, y \in B^1_{z,r/2} \Rightarrow d_1(x,y) < r$.


For part (iii), let $M_1 = \{0\} \cup \{1/i;\ i \in \mathbb{Z}^+\} \subseteq \mathbb{R}$ and $M_2 = \mathbb{R}$. Define $f : M_1 \to M_2$ by $f(0) = 0$ and
$f(1/i) = (-1)^i/i$ for $i \in \mathbb{Z}^+$. Let $K = 1$. Let $x \in M_1$. If $x = 0$, then $d_2(f(x), f(y)) = K d_1(x,y)$ for all
$y \in M_1$. If $x \ne 0$, let $r = x - (x^{-1} + 1)^{-1} = x^2/(1+x) = 1/i - 1/(i+1)$, where $i = x^{-1}$. Then $B^1_{x,r} = \{x\}$.
So $d_2(f(x), f(y)) = 0 \le K d_1(x,y)$ for all $y \in M_1$ with $d_1(x,y) < r$. So $f$ satisfies (3). To test (6'), let
$z = 0$. Let $\rho \in \mathbb{R}^+$ and $K \in \mathbb{R}_0^+$. Let $i = \operatorname{ceiling}(\max(1 + \rho^{-1}, K/2))$. Let $y = 1/i$ and $x = 1/(i+1)$. Then
$d_1(z,y) = 1/i < \rho$ and $d_1(z,x) = 1/(i+1) < \rho$ because $i > \rho^{-1}$, and $d_1(y,x) = i^{-1} - (i+1)^{-1} = i^{-1}(i+1)^{-1}$
and $d_2(f(y), f(x)) = i^{-1} + (i+1)^{-1} = (2i+1) i^{-1} (i+1)^{-1} = (2i+1) d_1(y,x) > K d_1(y,x)$ because $i \ge K/2$.
Therefore $f$ does not satisfy (6'). Hence (3) $\not\Rightarrow$ (6').


Part (v) follows with $\rho = r$ and $z = x$. The assertion (3) $\not\Rightarrow$ (3') follows from the example in part (iii)
because it satisfies (3), but does not satisfy (6'). So by part (iv), it does not satisfy (3').


For part (vi), let $M_1$, $M_2$ and $f$ be as in the counterexample in the proof of Theorem 38.6.2 (ii). In other
words, let $M_1 = \mathbb{R}^+$ and $M_2 = \mathbb{R}$ and define $f : M_1 \to M_2$ by $f : x \mapsto x^{1/2}$. Then $f$ satisfies (4). To test
whether $f$ satisfies (5'), let $\rho \in \mathbb{R}^+$. Let $z = \rho$. Let $K \in \mathbb{R}_0^+$. Let $K' = \max(1, 2K)$. Then $K' \in \mathbb{R}^+$. Let
$x = \min(z, \frac49 K'^{-2})$. Then $0 < x \le z$. So $x \in B^1_{z,\rho}$. Let $y = x/4$. Then $y \in B^1_{z,\rho}$ also. But $d_2(f(x), f(y)) =
x^{1/2} - y^{1/2} = \frac12 x^{1/2} = \frac23 x^{-1/2}(x - y) = \frac23 x^{-1/2} d_1(x,y) \ge \frac23 (\frac49 K'^{-2})^{-1/2} d_1(x,y) = K' d_1(x,y) > K d_1(x,y)$.
Thus for any $\rho \in \mathbb{R}^+$, for some $z \in M_1$, for all $K \in \mathbb{R}_0^+$, there are $x, y \in B^1_{z,\rho}$ with $d_2(f(x), f(y)) > K d_1(x,y)$.
This is the negation of (5'). Hence (4) $\not\Rightarrow$ (5').

For part (vii), let $M_1$, $M_2$ and $f : M_1 \to M_2$ be as in part (iii). Thus $M_1 = \{0\} \cup \{1/i;\ i \in \mathbb{Z}^+\} \subseteq \mathbb{R}$
and $M_2 = \mathbb{R}$ and $f : M_1 \to M_2$ is defined by $f(0) = 0$ and $f(1/i) = (-1)^i/i$ for $i \in \mathbb{Z}^+$. Let $r = 2$. Let
$x \in M_1$. If $x = 0$, let $K = 1$. Then $d_2(f(x), f(y)) \le K d_1(x,y)$ for all $y \in B^1_{x,r}$. If $x \ne 0$, then $x = 1/i$ for
some $i \in \mathbb{Z}^+$. Let $K = 2i + 1$. Then $d_2(f(x), f(y)) \le K d_1(x,y)$ for all $y \in B^1_{x,r}$. Therefore $f$ satisfies (5). It
has been shown in the proof of part (iii) that $f$ does not satisfy condition (6'). Hence (5) $\not\Rightarrow$ (6').


Part (viii) follows from Theorem 6.6.24 (iii). 


Part (ix) follows with $\rho = r$ and $z = x$. The assertion (5) $\not\Rightarrow$ (5') follows from the example in part (vii)
because it satisfies (5), but does not satisfy (6'). So by part (viii), it does not satisfy (5').


Part (x) follows with $\rho = r$ and $z = x$. The assertion (6) $\not\Rightarrow$ (6') follows from the example in parts (iii)
and (vii) because it satisfies (3) and (5), and therefore satisfies (6) by Theorem 38.6.2 (v, vii), but it does
not satisfy (6'). (Alternatively see Example 38.6.5.)


38.6.5 EXAMPLE: Function with very weak Lipschitz continuity on a real interval. 

The counterexample used in the proof of Theorem 38.6.4 (iii, v, vii, ix,x) may not seem very convincing 
since it is defined on a very disconnected domain. However, it is possible to “join the dots" while still 
satisfying Lipschitz conditions (5) and (6) in Remark 38.6.1. Then it remains a valid counterexample for
Theorem 38.6.4 (vii, ix, x). (See Figure 38.6.3. The function has been negated to make it look better.)
The linearly interpolated function $f : [0,1] \to \mathbb{R}$ satisfies $f(0) = 0$ and


$\forall x \in (0,1],\ f(x) = (-1)^{\operatorname{floor}(1/x)} \bigl( -1/\operatorname{floor}(1/x) + (1 - x \operatorname{floor}(1/x))(2 + 1/\operatorname{floor}(1/x)) \bigr).$


[Figure 38.6.3: Function with very weak Lipschitz continuity property]


38.6.6 DEFINITION: Lipschitz function style (1): uniform global bound. 
A (uniform-bound) (globally) Lipschitz (continuous) function from a metric space $(M_1, d_1)$ to a metric space
$(M_2, d_2)$ is a function $f : M_1 \to M_2$ such that


$\exists K \in \mathbb{R}_0^+,\ \forall x, y \in M_1,\ d_2(f(x), f(y)) \le K d_1(x,y).$ (38.6.1)


A Lipschitz constant for a Lipschitz function $f$ is any $K \in \mathbb{R}_0^+$ such that (38.6.1) holds.


38.6.7 REMARK: The Lipschitz constant for uniform-bound globally Lipschitz functions. 

Notation 38.6.8 is a well-defined non-negative real number for a function $f : M_1 \to M_2$, for metric spaces $M_1$
and $M_2$ such that $\#(M_1) \ge 2$, if $\{d_2(f(x), f(y))/d_1(x,y);\ x \in M_1,\ y \in M_1 \setminus \{x\}\}$ is bounded above. This
set is bounded above if the Lipschitz continuity bound $K$ is uniform, which is true for cases (1), (2) and (3)
in Remark 38.6.1.


38.6.8 NOTATION: Lip(f) denotes the infimum of all Lipschitz constants for a Lipschitz function f. 
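
Every difference quotient d_2(f(x), f(y))/d_1(x, y) is a lower bound for each Lipschitz constant, so the maximum of these quotients over a finite sample is a lower bound for Lip(f). The following Python sketch computes this sampled lower bound. It is an illustration only: the function name lip_lower_bound and its default distance functions (the usual distance on the real line) are assumptions introduced here.

```python
import math
from itertools import combinations


def lip_lower_bound(f, points,
                    d1=lambda x, y: abs(x - y),
                    d2=lambda u, v: abs(u - v)):
    """Largest difference quotient d2(f(x), f(y)) / d1(x, y) over sampled pairs.

    This never exceeds Lip(f); it approaches Lip(f) only if the sample is
    rich enough, which is not guaranteed in general.
    """
    return max(
        d2(f(x), f(y)) / d1(x, y)
        for x, y in combinations(points, 2)
        if d1(x, y) > 0
    )


grid = [k * 0.01 for k in range(629)]               # sample of [0, 6.28]
affine_bound = lip_lower_bound(lambda x: 3.0 * x + 1.0, grid)
sine_bound = lip_lower_bound(math.sin, grid)
```

For the affine map the sampled value agrees with the slope 3 up to rounding, and for the sine function it stays just below the Lipschitz constant 1.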


38.6.9 REMARK: Lipschitz continuity styles which guarantee continuity or uniform continuity. 

The relatively strong style of local Lipschitz continuity in Definition 38.6.10 guarantees uniform continuity. 
(So Definition 38.6.6 guarantees uniform continuity also.) The very weak style of Lipschitz continuity in 
Definition 38.6.11 guarantees only continuity. (Example 38.6.5 satisfies Definition 38.6.11.) 


38.6.10 DEFINITION: Lipschitz function style (2): uniform bound and uniform locality. 


A uniform-bound uniform-locality Lipschitz (continuous) function from a metric space $(M_1, d_1)$ to a metric
space $(M_2, d_2)$ is a function $f : M_1 \to M_2$ such that


$\exists K \in \mathbb{R}_0^+,\ \exists r \in \mathbb{R}^+,\ \forall x, y \in M_1,$
$d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y).$


38.6.11 DEFINITION: Lipschitz function style (6): pointwise bound and pointwise locality. 
A pointwise-bound pointwise-locality Lipschitz (continuous) function from a metric space $(M_1, d_1)$ to a metric
space $(M_2, d_2)$ is a function $f : M_1 \to M_2$ such that


$\forall x \in M_1,\ \exists K \in \mathbb{R}_0^+,\ \exists r \in \mathbb{R}^+,\ \forall y \in M_1,$
$d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y).$


38.6.12 THEOREM: Continuity of Lipschitz functions.
Let $f : M_1 \to M_2$ for metric spaces $M_1$ and $M_2$.

(i) If $f$ is a uniform-bound uniform-locality Lipschitz function, then $f$ is uniformly continuous.
(ii) If $f$ is a uniform-bound globally Lipschitz function, then $f$ is uniformly continuous.
(iii) If $f$ is a pointwise-bound pointwise-locality Lipschitz function, then $f$ is continuous.


PROOF: For part (i), let $f : M_1 \to M_2$ be a uniform-bound uniform-locality Lipschitz function. Let
$\varepsilon \in \mathbb{R}^+$. By Definition 38.6.10, there are $K \in \mathbb{R}_0^+$ and $r \in \mathbb{R}^+$ such that $d_2(f(x), f(y)) \le K d_1(x,y)$ for
all $x, y \in M_1$ satisfying $d_1(x,y) < r$. Let $\delta = \min(r, \varepsilon/\max(1, K))$. Then $\delta \in \mathbb{R}^+$. Let $x, y \in M_1$ satisfy
$d_1(x,y) < \delta$. If $d_1(x,y) = 0$ then $x = y$ and so $d_2(f(x), f(y)) = 0 < \varepsilon$. So suppose that $d_1(x,y) > 0$. Then
$d_2(f(x), f(y)) \le K d_1(x,y) \le \max(1, K) d_1(x,y) < \max(1, K)\delta \le \varepsilon$. Hence $f$ is uniformly continuous by
Definition 38.3.2.

Part (ii) follows from part (i) because Definition 38.6.6 implies Definition 38.6.10 by Theorem 38.6.2.

For part (iii), let $f : M_1 \to M_2$ be a pointwise-bound pointwise-locality Lipschitz function. Let $x \in M_1$. Let
$\varepsilon \in \mathbb{R}^+$. By Definition 38.6.11, there are $K \in \mathbb{R}_0^+$ and $r \in \mathbb{R}^+$ such that $d_2(f(x), f(y)) \le K d_1(x,y)$ for all
$y \in M_1$ satisfying $d_1(x,y) < r$. Let $\delta = \min(r, \varepsilon/\max(1, K))$. Then $\delta \in \mathbb{R}^+$. Let $y \in M_1$ satisfy $d_1(x,y) < \delta$.
If $d_1(x,y) = 0$ then $y = x$ and so $d_2(f(x), f(y)) = 0 < \varepsilon$. So suppose that $d_1(x,y) > 0$. Then $d_2(f(x), f(y)) \le
K d_1(x,y) \le \max(1, K) d_1(x,y) < \max(1, K)\delta \le \varepsilon$. Hence $f$ is continuous by Theorem 38.1.3.
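
The choice of delta in this proof is fully explicit, so it can be tried out numerically. The following Python sketch is an illustration only. It uses the squaring function with K(x) = 2|x| + 1 and r = 1, which are the values from the proof of Theorem 38.6.2 (iv), and the helper names are hypothetical. The denominator max(1, K) keeps the quotient defined when K = 0.

```python
import math


def delta_for(eps, K, r):
    """delta = min(r, eps / max(1, K)), as chosen in the proof above."""
    return min(r, eps / max(1.0, K))


def check_continuity_at(f, x, eps, K, r, samples=1000):
    """Check |f(y) - f(x)| < eps at sample points y with |y - x| < delta."""
    delta = delta_for(eps, K, r)
    # Sample points strictly inside the open interval (x - delta, x + delta).
    ys = [x + delta * (2.0 * k / samples - 1.0) for k in range(1, samples)]
    return all(abs(f(y) - f(x)) < eps for y in ys)


square = lambda x: x * x
x0 = 2.0
ok = check_continuity_at(square, x0, eps=0.1, K=2.0 * abs(x0) + 1.0, r=1.0)
```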


38.6.13 REMARK:  Lipschitz continuity class for applications to locally rectifiable curves. 
Definition 38.6.14 is slightly stronger than Definition 38.6.11. It has the advantage that it locally maps 
rectifiable curves to rectifiable curves. 


38.6.14 DEFINITION: Lipschitz function style (6'): pointwise bound and pointwise pair-locality. 
A pointwise-bound pointwise-pair-locality Lipschitz (continuous) function from a metric space $(M_1, d_1)$ to a
metric space $(M_2, d_2)$ is a function $f : M_1 \to M_2$ such that


$\forall z \in M_1,\ \exists K \in \mathbb{R}_0^+,\ \exists r \in \mathbb{R}^+,\ \forall x, y \in B^1_{z,r},$
$d_1(x,y) < r \Rightarrow d_2(f(x), f(y)) \le K d_1(x,y).$


38.7. Hölder continuity of maps between metric spaces


38.7.1 REMARK: Relevance of Hölder continuity.

The Hölder continuity classes $C^{k,\alpha}(M_1, M_2)$ of functions whose $k$th derivative is $\alpha$-Hölder continuous are
important in the Schauder existence theory for elliptic second-order boundary value problems. (For example
see Gilbarg/Trudinger [81], pages 87-141; Miranda [116], pages 164-169; Ladyzhenskaya/Ural'tseva [107],
pages 106-138.) This is relevant for analysis of differentiable manifolds and functions which “inhabit” them.
(See for example Petersen [31], pages 301-317.) The special sub-class of 1-Hölder continuous or “Lipschitz”
functions in Section 38.6 has more direct relevance to parallel transport along curves.


As discussed in Remark 38.6.1, there are numerous distinct styles of definitions of Lipschitz function spaces.
These styles also apply to Hölder continuity classes, but these distinctions of local versus global, and uniform
versus pointwise, are not fully represented in Definitions 38.7.2 and 38.7.3.


38.7.2 DEFINITION: Hölder continuity at a point.

A Hölder-continuous function with exponent $\alpha$ or $\alpha$-Hölder (continuous) function from $M_1$ to $M_2$, for metric
spaces $M_1 -\!\!< (M_1, d_1)$ and $M_2 -\!\!< (M_2, d_2)$, for $\alpha \in (0,1]$, at a point $x \in S$ for a set $S \subseteq M_1$, is a function
$f : S \to M_2$ such that


$\exists K \in \mathbb{R}_0^+,\ \forall y \in S,\ d_2(f(x), f(y)) \le K d_1(x,y)^\alpha.$


38.7.3 DEFINITION: Hölder continuity on sets.

A uniformly Hölder-continuous function with exponent $\alpha$ or uniformly $\alpha$-Hölder (continuous) function from
$M_1$ to $M_2$, for metric spaces $M_1 -\!\!< (M_1, d_1)$ and $M_2 -\!\!< (M_2, d_2)$, for $\alpha \in (0,1]$, on a set $S \subseteq M_1$, is a function
$f : S \to M_2$ such that


$\exists K \in \mathbb{R}_0^+,\ \forall x, y \in S,\ d_2(f(x), f(y)) \le K d_1(x,y)^\alpha.$


A locally Hölder-continuous function with exponent $\alpha$ or locally $\alpha$-Hölder (continuous) function from $M_1$
to $M_2$, for metric spaces $M_1 -\!\!< (M_1, d_1)$ and $M_2 -\!\!< (M_2, d_2)$, for $\alpha \in (0,1]$, on a set $S \subseteq M_1$, is a function
$f : S \to M_2$ such that $f$ is uniformly $\alpha$-Hölder continuous on all compact subsets of $S$.

38.7.4 REMARK: Notations for spaces of Hölder continuous functions.

In the PDE literature, there are various definitions and notations for sets of Hölder continuous functions.
The notation “$C^\alpha(S)$” for $\alpha \in (0,1]$ and subsets $S$ of Cartesian spaces or manifolds has meanings which
vary from author to author, and typically depend on the nature of the domain set $S$.


Similar variability is seen for definitions and notations for $k$-times differentiable functions whose $k$th derivatives
are $\alpha$-Hölder continuous, where $k \in \mathbb{Z}_0^+$. The notation $C^{k,\alpha}(S)$ has meanings which depend on the
needs of particular applications. (See for example Petersen [31], pages 301-302; Gilbarg/Trudinger [81],
pages 51-53; Ladyzhenskaya/Ural’tseva [107], pages 4-5; Miranda [116], pages 1-2.)

For simplicity, Notation 38.7.5 denotes only uniformly globally Hölder continuous functions. In the special
case $\alpha = 1$, $C^{0,1}(S, M_2)$ denotes the set of uniformly globally Lipschitz functions in Definition 38.6.6.


38.7.5 NOTATION: $C^{0,\alpha}(S, M_2)$, for $\alpha \in (0,1]$ and $S \in \mathbb{P}(M_1)$, for metric spaces $M_1$ and $M_2$, denotes the
set of uniformly $\alpha$-Hölder continuous functions on $S$, from $M_1$ to $M_2$. In other words,


$C^{0,\alpha}(S, M_2) = \{f : S \to M_2;\ \exists K \in \mathbb{R}_0^+,\ \forall x, y \in S,\ d_2(f(x), f(y)) \le K d_1(x,y)^\alpha\}.$


$C^\alpha(S, M_2)$, for $\alpha \in (0,1)$, means the same as $C^{0,\alpha}(S, M_2)$.
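
A standard example which separates the exponents is the square root function on [0, 1]. It satisfies |sqrt(x) - sqrt(y)| <= |x - y|^(1/2), so it is uniformly 1/2-Hölder continuous with K = 1, but it is not 1-Hölder (Lipschitz) continuous because its difference quotients are unbounded near 0. The following Python sketch checks both statements on a finite grid. It is an illustration only, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def holder_quotient(f, x, y, alpha):
    """The ratio |f(x) - f(y)| / |x - y|**alpha for real x != y."""
    return abs(f(x) - f(y)) / abs(x - y) ** alpha


def max_holder_quotient(f, points, alpha):
    """Largest sampled ratio, which is a lower bound for any admissible K."""
    return max(holder_quotient(f, x, y, alpha) for x, y in combinations(points, 2))


grid = [k / 200.0 for k in range(201)]                # sample of [0, 1]
half_bound = max_holder_quotient(math.sqrt, grid, 0.5)
lipschitz_bound = max_holder_quotient(math.sqrt, grid, 1.0)
```

On this grid the ratio for exponent 1/2 does not exceed 1, whereas the ratio for exponent 1 is already sqrt(200) at the pair (0, 1/200) and grows without bound as the grid is refined.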


38.8. Curve length in metric spaces 


38.8.1 REMARK: Definition of the length of a curve. 

The curve length in Definition 38.8.2 is often interpreted geometrically as the sum of lengths of edges of 
an “inscribed polygon". By the identity property of distance functions, it makes no difference whether 
increasing or non-decreasing parameter sequences are used in Definition 38.8.2. (Every sum in the set on 
line (38.8.2) corresponds to a non-decreasing parameter value sequence, and if the repeating values in this 
sequence are removed, the result is an increasing sequence for which the sum is the same. Since the sets of 
real numbers in lines (38.8.1) and (38.8.2) are the same, their supremums must be the same!) 


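38.8.2 DEFINITION: The length of a curve $\gamma$ in a metric space $(M, d)$ is $L(\gamma) \in \overline{\mathbb{R}}_0^+$ defined by


$L(\gamma) = \sup \{ \sum_{i=1}^n d(\gamma(t_i), \gamma(t_{i+1}));\ n \in \mathbb{Z}_0^+ \text{ and } t \in \operatorname{Inc}(\mathbb{N}_{n+1}, \operatorname{Dom}(\gamma)) \}$ (38.8.1)
$\phantom{L(\gamma)} = \sup \{ \sum_{i=1}^n d(\gamma(t_i), \gamma(t_{i+1}));\ n \in \mathbb{Z}_0^+ \text{ and } t \in \operatorname{NonDec}(\mathbb{N}_{n+1}, \operatorname{Dom}(\gamma)) \}$ (38.8.2)


if $\gamma \ne \emptyset$ and $L(\emptyset) = 0$. (See Notation 11.1.32 for “Inc” and “NonDec”.)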
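
Each sum inside the supremum is the length of an inscribed polygon, which is easy to compute for a concrete curve. The following Python sketch evaluates such sums for the unit circle in the Euclidean plane. It is an illustration only: the helper names are hypothetical, and uniform partitions are just one convenient family of parameter sequences, so the computed values are lower bounds for the length and not the supremum itself.

```python
import math


def polygonal_sum(gamma, ts, d):
    """Sum of d(gamma(t_i), gamma(t_{i+1})) over a non-decreasing list ts."""
    return sum(d(gamma(a), gamma(b)) for a, b in zip(ts, ts[1:]))


def uniform_partition(a, b, n):
    """The n + 1 equally spaced parameter values from a to b."""
    return [a + (b - a) * i / n for i in range(n + 1)]


circle = lambda t: (math.cos(t), math.sin(t))     # unit circle in the plane

square_sum = polygonal_sum(circle, uniform_partition(0.0, 2.0 * math.pi, 4), math.dist)
octagon_sum = polygonal_sum(circle, uniform_partition(0.0, 2.0 * math.pi, 8), math.dist)
fine_sum = polygonal_sum(circle, uniform_partition(0.0, 2.0 * math.pi, 1000), math.dist)

# Repeated parameter values add only zero-length edges.
with_repeats = polygonal_sum(circle, [0.0, 0.0, 1.0, 1.0, 2.0], math.dist)
without_repeats = polygonal_sum(circle, [0.0, 1.0, 2.0], math.dist)
```

The sums increase under refinement and approach the circumference from below. The last two values agree, which illustrates why increasing and non-decreasing parameter sequences give the same supremum.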


38.8.3 NOTATION: $L(\gamma)$, for a continuous curve $\gamma$ in a metric space, denotes the length of $\gamma$.
$L_J(\gamma)$, for a continuous curve $\gamma$ in a metric space and a real interval $J$, denotes the length of $\gamma|_J$.


38.8.4 THEOREM: Some basic properties of the lengths of curves in metric spaces. 
Let $(M, d)$ be a metric space. Let $\gamma : I \to M$ be a curve in $M$.


(i) If $\#(\operatorname{Dom}(\gamma)) = 1$, then $L(\gamma) = 0$.
(ii) If $\gamma$ is constant, then $L(\gamma) = 0$.
(iii) If $I = [a,b]$ for some $a, b \in \mathbb{R}$ with $a < b$, then $d(\gamma(a), \gamma(b)) \le L(\gamma)$.
(iv) Let $\gamma = \gamma_1 \cup \gamma_2 : I \to M$ be a curve in $M$, where $\gamma_1 : I_1 \to M$ and $\gamma_2 : I_2 \to M$ are curves in $M$
and $\sup(I_1) = \inf(I_2) \in I_1 \cap I_2$ and $\gamma_1(\sup(I_1)) = \gamma_2(\inf(I_2))$. (See Definition 36.6.5 for “simple
concatenation of joinable curves”.) Then $L(\gamma) = L(\gamma_1) + L(\gamma_2)$.


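PROOF: For part (i), let $I = \{x\}$ for some $x \in \mathbb{R}$. Then $\mathrm{NonDec}(\mathbb{N}_{n+1}, I)$ contains only the constant function $t : i \mapsto x$ for all $n \in \mathbb{Z}_0^+$. So line (38.8.2) yields $L(\gamma) = 0$. (Alternatively, using line (38.8.1), $\mathrm{Inc}(\mathbb{N}_1, I)$ contains only $t : 1 \mapsto x$, and $\mathrm{Inc}(\mathbb{N}_{n+1}, I) = \emptyset$ for $n \in \mathbb{Z}^+$, which also gives $L(\gamma) = 0$.)

For part (ii), let $I$ be any real interval and suppose that $\exists p \in M$, $\forall t \in I$, $\gamma(t) = p$. Then $L(\gamma) = 0$ because $d(\gamma(x), \gamma(x')) = 0$ for all $x, x' \in I$.

For part (iii), let $I = [a,b]$ for some $a,b \in \mathbb{R}$ with $a \le b$. Define $t : \mathbb{N}_2 \to I$ by $t(1) = a$ and $t(2) = b$. Then $t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I)$ for $n = 1$, and $\sum_{i=1}^n d(\gamma(t_i), \gamma(t_{i+1})) = d(\gamma(a), \gamma(b))$. Therefore $L(\gamma) \ge d(\gamma(a), \gamma(b))$ by Definition 38.8.2 line (38.8.2).

For part (iv), clearly $L(\gamma) \ge L(\gamma_1) + L(\gamma_2)$ because the concatenation of sequences $t^{(1)} \in \mathrm{NonDec}(\mathbb{N}_{n_1+1}, I_1)$ and $t^{(2)} \in \mathrm{NonDec}(\mathbb{N}_{n_2+1}, I_2)$ for $n_1, n_2 \in \mathbb{Z}_0^+$ yields a sequence $t \in \mathrm{NonDec}(\mathbb{N}_{n_1+n_2+2}, I)$ for which the sum $S_{\gamma,t}$ equals the sum of the sums for the individual parameter value sequences plus one extra distance term, namely $d(\gamma(t_{n_1+1}), \gamma(t_{n_1+2}))$. (See Definition 14.12.6 (iii) for list concatenation.)

Each sum in the expression for $L(\gamma)$ corresponds to a parameter value sequence $t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I)$ for some $n \in \mathbb{Z}_0^+$. Let $x = \sup(I_1) = \inf(I_2)$. If $\mathrm{Range}(t) \subseteq I_1$ or $\mathrm{Range}(t) \subseteq I_2$, then $S_{\gamma,t}$ equals one of the sums for either $I_1$ or $I_2$. So suppose that $\mathrm{Range}(t) \not\subseteq I_1$ and $\mathrm{Range}(t) \not\subseteq I_2$. Let $n_1 = \max\{i \in \mathbb{N}_{n+1};\ t_i \le x\}$. Then $n_1 < n + 1$. Let $n_2 = n + 1 - n_1$. Define $t^{(1)} \in \mathrm{NonDec}(\mathbb{N}_{n_1+1}, I_1)$ by $t^{(1)}_i = t_i$ for $i \in \mathbb{N}_{n_1}$, and $t^{(1)}_{n_1+1} = x$. Define $t^{(2)} \in \mathrm{NonDec}(\mathbb{N}_{n_2+1}, I_2)$ by $t^{(2)}_1 = x$ and $t^{(2)}_i = t_{n_1+i-1}$ for $i \in \mathbb{N}_{n_2+1} \setminus \{1\}$. Then

$$
\begin{aligned}
S_{\gamma_1,t^{(1)}} + S_{\gamma_2,t^{(2)}}
&= S_{\gamma,t} + d(\gamma(t^{(1)}_{n_1}), \gamma(t^{(1)}_{n_1+1})) + d(\gamma(t^{(2)}_1), \gamma(t^{(2)}_2)) - d(\gamma(t_{n_1}), \gamma(t_{n_1+1})) \\
&= S_{\gamma,t} + d(\gamma(t_{n_1}), \gamma(x)) + d(\gamma(x), \gamma(t_{n_1+1})) - d(\gamma(t_{n_1}), \gamma(t_{n_1+1})) \\
&\ge S_{\gamma,t}.
\end{aligned}
$$

Therefore $L(\gamma_1) + L(\gamma_2) \ge L(\gamma)$. Hence $L(\gamma) = L(\gamma_1) + L(\gamma_2)$.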
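
The partition sums used in Theorem 38.8.4 can be explored numerically. The following is only a minimal sketch, assuming the Euclidean metric on the plane and uniform partitions; the helper names `partition_sum` and `uniform_partition` and the example curve `arc` are hypothetical and are not part of the text. Each computed sum is a lower bound for the corresponding length, so the sketch illustrates parts (iii) and (iv) but proves nothing.

```python
import math

def partition_sum(curve, ts):
    """Sum of d(curve(t_i), curve(t_{i+1})) for a non-decreasing list ts (Euclidean metric)."""
    pts = [curve(t) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def uniform_partition(a, b, n):
    """Return n + 1 equally spaced parameter values from a to b."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def arc(t):
    # Hypothetical example curve: the upper unit semicircle for t in [0, pi].
    return (math.cos(t), math.sin(t))

x = 1.0  # common endpoint sup(I_1) = inf(I_2) of the two subintervals
whole = partition_sum(arc, uniform_partition(0.0, math.pi, 4000))
left = partition_sum(arc, uniform_partition(0.0, x, 2000))
right = partition_sum(arc, uniform_partition(x, math.pi, 2000))
chord = math.dist(arc(0.0), arc(math.pi))  # the two-point sum from part (iii)

print(whole, left + right, chord)
```

For this curve the two-point sum `chord` is visibly smaller than the fine partition sums, while `left + right` and `whole` agree closely, as part (iv) suggests.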


38.8.5 REMARK: Curve length invariance under reparametrisation. 
The reparametrisation of the curves $\gamma_1$, $\gamma_2$ by the maps $\beta_1$, $\beta_2$ in Theorem 38.8.6 is illustrated in Figure 38.8.1.


Figure 38.8.1 Reparametrisation of a curve


For reparametrisation via domain-homeomorphisms, see Friedman [74], pages 301-302. For reparametrisation 
via non-decreasing continuous functions, see Graves [85], pages 211-213. 


Non-decreasing and non-increasing continuous surjections are used instead of homeomorphisms between 
parameter intervals in Theorem 38.8.6. Various properties of such reparametrisation maps are presented in 
Sections 36.4 and 36.5. Constant stretches of curves can be removed by applying Theorem 36.4.11 to obtain 
never-constant curves. Definition 38.8.2 does not exclude curves which have constant stretches. Therefore 
they are permitted in Theorem 38.8.6. Also permitted are reversals of the direction of parametrisation. 


38.8.6 THEOREM: Path-equivalent curves in a metric space have the same length.

If $\gamma_1$ and $\gamma_2$ are non-empty curves in a metric space, and $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2$ for some non-decreasing or non-increasing continuous surjections $\beta_k : I \to \mathrm{Dom}(\gamma_k)$ for $k = 1,2$, where $I$ is a real interval, then $\gamma_1$ and $\gamma_2$ have the same length.


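PROOF: Let $(M,d)$ be a metric space. For $k = 1,2$, let $\gamma_k : I_k \to M$ be non-empty curves in $M$, and let $\beta_k : I \to I_k$ be non-decreasing continuous surjections for a real interval $I$ such that $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2 : I \to M$. Then the length of $\gamma_1 \circ \beta_1$ is

$$
\begin{aligned}
L(\gamma_1 \circ \beta_1)
&= \sup\Big\{ \sum_{i=1}^n d(\gamma_1(\beta_1(t_i)), \gamma_1(\beta_1(t_{i+1})));\ n \in \mathbb{Z}_0^+,\ t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I) \Big\} && (38.8.3) \\
&= \sup\Big\{ \sum_{i=1}^n d(\gamma_1((\beta_1 \circ t)_i), \gamma_1((\beta_1 \circ t)_{i+1}));\ n \in \mathbb{Z}_0^+,\ \beta_1 \circ t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I_1) \Big\} && (38.8.4) \\
&= \sup\Big\{ \sum_{i=1}^n d(\gamma_1(t_i), \gamma_1(t_{i+1}));\ n \in \mathbb{Z}_0^+,\ t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I_1) \Big\} \\
&= L(\gamma_1).
\end{aligned}
$$

For the equality of lines (38.8.3) and (38.8.4), it must be noted that $\beta_1 \circ t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I_1)$ does not necessarily imply $t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I)$, but wherever $\beta_1(t_i) \le \beta_1(t_{i+1})$ and $t_i \not\le t_{i+1}$, the sum in line (38.8.3) is not affected because then $\beta_1(t_i) = \beta_1(t_{i+1})$, which implies $d(\gamma_1(\beta_1(t_i)), \gamma_1(\beta_1(t_{i+1}))) = 0$.

Similarly $L(\gamma_2 \circ \beta_2) = L(\gamma_2)$. Therefore $L(\gamma_1) = L(\gamma_2)$. The non-increasing continuous surjection case follows similarly except that $\beta_1 \circ t \in \mathrm{NonInc}(\mathbb{N}_{n+1}, I_1)$, which must be replaced with a reversed sequence $\tilde{t} \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I_1)$ defined by $\tilde{t} : i \mapsto \beta_1(t_{n+2-i})$.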
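
The invariance in Theorem 38.8.6 can be illustrated with a small numerical experiment. This is a hedged sketch only: the example curve `parabola`, the reparametrisation map `beta` (which has a constant stretch on [1, 2]) and the helper names are assumptions made for illustration, and the Euclidean metric is assumed. The closed-form value used for comparison is the standard arc length of the parabola s ↦ (s, s²) on [0, 1].

```python
import math

def partition_sum(curve, ts):
    """Sum of Euclidean distances between consecutive curve points."""
    pts = [curve(t) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def uniform_partition(a, b, n):
    """Return n + 1 equally spaced parameter values from a to b."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def parabola(s):
    # Hypothetical example curve gamma_1 on the interval [0, 1].
    return (s, s * s)

def beta(t):
    # Non-decreasing continuous surjection from [0, 2] onto [0, 1],
    # constant on [1, 2], so the reparametrised curve has a constant stretch.
    return min(t, 1.0) ** 2

def reparametrised(t):
    return parabola(beta(t))

direct = partition_sum(parabola, uniform_partition(0.0, 1.0, 4000))
via_beta = partition_sum(reparametrised, uniform_partition(0.0, 2.0, 4000))
exact = math.sqrt(5.0) / 2.0 + math.asinh(2.0) / 4.0  # integral of sqrt(1 + 4 s^2) over [0, 1]

print(direct, via_beta, exact)
```

The constant stretch of `beta` contributes only zero terms to the sums, which mirrors the remark in the proof about terms where the sequence stalls.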


38.8.7 REMARK: Partial curve length functions and length-parametrisation of curves. 

Parametrisation of a curve by its length has substantial practical advantages. (See Definition 38.9.3 for 
length-parametrisation.) It not only makes the curve Lipschitz continuous. In a differentiable manifold, it 
also makes the curve differentiable almost everywhere, which is important for definitions of parallel transport 
for differentiable fibre bundles and pathwise distance in Riemannian and pseudo-Riemannian manifolds. 


In the case of the very simple curve $\gamma : \mathbb{R} \to \mathbb{R}$ with $\gamma(x) = x$ for all $x \in \mathbb{R}$, Definition 38.8.8 gives the
partial curve length function $\Lambda_\gamma(x) = \infty$ for all $x \in \mathbb{R}$, which is not very useful. (See also Example 38.8.14.)
A more useful style of partial length would, for example, give the answer $\Lambda_\gamma(x) = x$, which is a signed
"length". However, with infinite-length curves in general, it is not possible to state a fixed rule which
identifies the best starting point for the partial length function. Therefore no attempt is made to achieve
this in Definition 38.8.8. Instead, this style of partial length function delivers a useful finite value if the curve
is "finite on the left". For example, the curve $\gamma : \mathbb{R}_0^+ \to \mathbb{R}$ with $\gamma : x \mapsto x$ would give $\Lambda_\gamma(x) = \max(0, x)$
for all $x \in \mathbb{R}$. Thus Definition 38.8.8 is suitable for curves which are initially of finite partial length, but
whose partial length may diverge to infinity on the right of the parameter interval. This would be suitable
in particular for transporting a fibre from an initial point along a semi-infinite curve, which is a typical kind
of application.


The length of a curve $L(\gamma)$ in Definition 38.8.2 has some of the character of a Riemann integral, although
a general metric space has no differential structure. Similarly, the partial curve length function $\Lambda_\gamma$ in
Definition 38.8.8 has some of the character of the solution of a differential equation. The temptation to make
assumptions based on this similarity must, of course, be resisted. (For example, the partial curve length can
have an infinite jump, but it cannot have a finite jump!)


38.8.8 DEFINITION: The partial curve length function for a curve $\gamma$ in a metric space is the function from
$\mathbb{R}$ to $\bar{\mathbb{R}}_0^+$ which is defined by $x \mapsto L_{(-\infty,x]}(\gamma)$.


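38.8.9 NOTATION: $\Lambda_\gamma$, for a curve $\gamma$ in a metric space, denotes the partial curve length function for $\gamma$. In
other words, $\Lambda_\gamma : \mathbb{R} \to \bar{\mathbb{R}}_0^+$ is defined by

$$\forall x \in \mathbb{R},\quad \Lambda_\gamma(x) = L_{(-\infty,x]}(\gamma).$$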
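
As a rough numerical illustration of the partial curve length function, the following sketch estimates its values for real-valued curves which are "finite on the left". It assumes that the curve's domain is a half-line starting at a known point `dom_min` and that the metric is the absolute difference; the names `partial_length` and `identity` are hypothetical. The uniform partition gives only a lower estimate of the supremum in the definition.

```python
import math

def partial_length(curve, dom_min, x, n=10000):
    """Polygonal estimate of the partial curve length at x for a real-valued
    curve whose domain is assumed to be the half-line [dom_min, oo)."""
    if x <= dom_min:
        return 0.0  # the restriction to (-oo, x] is empty or a single point
    ts = [dom_min + (x - dom_min) * i / n for i in range(n + 1)]
    vals = [curve(t) for t in ts]
    return sum(abs(q - p) for p, q in zip(vals, vals[1:]))

def identity(x):
    # The curve x -> x on the non-negative reals, as in Remark 38.8.7.
    return x

print(partial_length(identity, 0.0, -1.0))
print(partial_length(identity, 0.0, 2.5))
print(partial_length(math.sin, 0.0, math.pi))
```

For the identity curve the estimates agree with max(0, x), and for the sine curve the value at π is close to 2 because the curve travels up to 1 and back down to 0.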


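38.8.10 THEOREM: Some basic properties of partial curve length functions in metric spaces.
Let $(M, d)$ be a metric space. Let $\gamma : I \to M$ be a curve in $M$. Let $S_{\gamma,t}$ denote the sum $\sum_{i=1}^n d(\gamma(t_i), \gamma(t_{i+1}))$
for each sequence $t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I)$ with $n \in \mathbb{Z}_0^+$. Let $X_\gamma = \{c \in \mathbb{R};\ \Lambda_\gamma(c) < \infty\}$.
(i) $\Lambda_\gamma$ is non-decreasing.
(ii) $X_\gamma$ is a real interval of the form $\emptyset$ or $\mathbb{R}$ or $(-\infty,b)$ or $(-\infty, b]$ for some $b \in \mathbb{R}$. In other words, $X_\gamma$ is "infinite on the left" if it is non-empty.
(iii) $\forall c \in X_\gamma$, $\forall \varepsilon \in \mathbb{R}^+$, $\exists n \in \mathbb{Z}_0^+$, $\exists t \in \mathrm{Inc}(\mathbb{N}_{n+1}, I \cap (-\infty, c])$, $S_{\gamma,t} > \Lambda_\gamma(c) - \varepsilon$.
(iv) $\forall c \in \mathbb{R}$, $L_{(-\infty,c)}(\gamma) = \sup\{L_{(-\infty,x]}(\gamma);\ x < c\} = \sup\{\Lambda_\gamma(x);\ x < c\}$.
(v) $\forall c \in \mathbb{R}$, $L_{(-\infty,c]}(\gamma) = L_{(-\infty,c)}(\gamma)$. In other words, $\forall c \in \mathbb{R}$, $\Lambda_\gamma(c) = \sup\{\Lambda_\gamma(x);\ x < c\}$.
(vi) $\forall a,c \in \mathbb{R}$, $(a < c \Rightarrow L_{[a,c)}(\gamma) = \sup\{L_{[a,x]}(\gamma);\ a \le x < c\})$.
(vii) $\forall c,b \in \mathbb{R}$, $(c < b \Rightarrow L_{(c,b]}(\gamma) = \sup\{L_{[x,b]}(\gamma);\ c < x \le b\})$.
(viii) $\forall a,c \in \mathbb{R}$, $(a < c \Rightarrow L_{[a,c]}(\gamma) = L_{[a,c)}(\gamma))$.
(ix) $\forall c,b \in \mathbb{R}$, $(c < b \Rightarrow L_{[c,b]}(\gamma) = L_{(c,b]}(\gamma))$.
(x) $\forall c,b \in \mathbb{R}$, $(c < b \Rightarrow L_{(c,b)}(\gamma) = L_{[c,b]}(\gamma))$.
(xi) $\forall c \in \mathrm{Int}(X_\gamma)$, $\inf\{L_{[c,x]}(\gamma);\ c < x\} = 0$.
(xii) $\forall c \in \mathrm{Int}(X_\gamma)$, $\Lambda_\gamma(c) = \inf\{\Lambda_\gamma(x);\ x > c\}$.
(xiii) $\Lambda_\gamma|_{X_\gamma}$ is continuous.
(xiv) $\Lambda_\gamma(X_\gamma)$ is a real interval.
(xv) If $L(\gamma) < \infty$ and $I$ is a non-empty compact interval, then $\mathrm{Range}(\Lambda_\gamma) = [0, L(\gamma)]$.
(xvi) $\forall a,b \in X_\gamma$, $(a \le b \Rightarrow L_{[a,b]}(\gamma) = L_{[a,b)}(\gamma) = L_{(a,b]}(\gamma) = L_{(a,b)}(\gamma) = \Lambda_\gamma(b) - \Lambda_\gamma(a))$.
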
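PROOF: For part (i), $\Lambda_\gamma(x) = L_{(-\infty,x]}(\gamma) \le L_{(-\infty,x']}(\gamma) = \Lambda_\gamma(x')$ for all $x,x' \in \mathbb{R}$ with $x \le x'$ because every sum in line (38.8.1) for $\gamma|_{(-\infty,x]}$ is a sum for $\gamma|_{(-\infty,x']}$. Thus $\Lambda_\gamma$ is non-decreasing. (Alternatively, apply Theorem 38.8.4 (iv) to the subcurves $\gamma|_{(-\infty,x]}$ and $\gamma|_{[x,x']}$ to obtain $L_{(-\infty,x']}(\gamma) = L_{(-\infty,x]}(\gamma) + L_{[x,x']}(\gamma) \ge L_{(-\infty,x]}(\gamma)$ if $x \in \mathrm{Dom}(\gamma)$. If $x \notin \mathrm{Dom}(\gamma)$, then the result follows from either $\gamma|_{(-\infty,x']} = \gamma|_{(-\infty,x]}$ or else $\gamma|_{(-\infty,x]} = \emptyset$ and $\gamma|_{(-\infty,x']} = \gamma|_{[x,x']}$.)

Part (ii) follows from part (i).

Part (iii) follows from line (38.8.1) and the definition of a supremum.

Part (iv) follows from part (i) and the observation that every sum $S_{\gamma,t}$ for $\gamma|_{(-\infty,c)}$ is also a sum for a partial curve $\gamma|_{(-\infty,x]}$ for some $x \in (-\infty, c)$. (Note that $L_{(-\infty,c)}(\gamma)$ may be infinite.)

For part (v), let $c \in \mathbb{R}$. If $c \notin I$, then $\gamma|_{(-\infty,c]} = \gamma|_{(-\infty,c)}$, which implies that $L_{(-\infty,c]}(\gamma) = L_{(-\infty,c)}(\gamma)$. So suppose that $c \in I$. If $c = \inf(I)$, then $L_{(-\infty,c]}(\gamma) = L_{[c,c]}(\gamma) = 0$ by Theorem 38.8.4 (i), and $L_{(-\infty,c)}(\gamma) = 0$ because $\gamma|_{(-\infty,c)} = \emptyset$. So suppose that $\inf(I) < c \in I$. Let $\varepsilon \in \mathbb{R}^+$. Then by part (iii), there exist $n \in \mathbb{Z}_0^+$ and $t \in \mathrm{Inc}(\mathbb{N}_{n+1}, I \cap (-\infty, c])$ such that $S_{\gamma,t} > \Lambda_\gamma(c) - \varepsilon/2$. If $t_{n+1} < c$, then $t \in \mathrm{Inc}(\mathbb{N}_{n+1}, I \cap (-\infty,c))$, and so $S_{\gamma,t}$ is in the set of sums for the supremum expression for $L_{(-\infty,c)}(\gamma)$. So suppose that $t_{n+1} = c$. Then by the continuity of $\gamma$, there exists $x \in (t_n, t_{n+1})$ such that $d(\gamma(x), \gamma(t_{n+1})) < \varepsilon/2$. Then $d(\gamma(t_n), \gamma(x)) \ge d(\gamma(t_n), \gamma(t_{n+1})) - \varepsilon/2$. Define $\tilde{t} \in \mathrm{Inc}(\mathbb{N}_{n+1}, I \cap (-\infty,c))$ by $\tilde{t}_i = t_i$ for $i \in \mathbb{N}_n$ and $\tilde{t}_{n+1} = x$. Then $S_{\gamma,\tilde{t}} \ge S_{\gamma,t} - \varepsilon/2 > \Lambda_\gamma(c) - \varepsilon$. Since this holds for all $\varepsilon \in \mathbb{R}^+$, it follows that $L_{(-\infty,c)}(\gamma) \ge L_{(-\infty,c]}(\gamma)$. Hence $L_{(-\infty,c]}(\gamma) = L_{(-\infty,c)}(\gamma)$. From part (iv), it then follows that $\forall c \in \mathbb{R}$, $\Lambda_\gamma(c) = \sup\{\Lambda_\gamma(x);\ x < c\}$.

Part (vi) follows by the same reasoning as in part (iv). (Alternatively, apply Theorem 38.8.4 (iv) to the subcurves $\gamma|_{(-\infty,a]}$ and $\gamma|_{[a,c)}$ and then apply part (iv).)

Part (vii) follows from part (vi) because Definition 38.8.2 is invariant under parameter interval reversal.

Part (viii) follows by the same reasoning as in part (v). (Alternatively, apply Theorem 38.8.4 (iv) to the subcurves $\gamma|_{(-\infty,a]}$ and $\gamma|_{[a,c]}$ and then apply part (v).)

Part (ix) follows from part (viii) because Definition 38.8.2 is invariant under parameter interval reversal.

Part (x) follows by the same reasoning as in parts (v) and (viii).

For part (xi), let $c \in \mathrm{Int}(X_\gamma)$. Then $b \in \mathrm{Int}(X_\gamma)$ for some $b \in \mathbb{R}$ such that $c < b$. Let $x \in (c,b)$. Then $L_{[c,x]}(\gamma) = L_{[c,b]}(\gamma) - L_{[x,b]}(\gamma)$ by Theorem 38.8.4 (iv). So $\inf\{L_{[c,x]}(\gamma);\ c < x\} = L_{[c,b]}(\gamma) - \sup\{L_{[x,b]}(\gamma);\ c < x\} = L_{[c,b]}(\gamma) - L_{(c,b]}(\gamma) = 0$ by parts (vii) and (ix).

Part (xii) follows from Theorem 38.8.4 (iv), applied to $\gamma|_{(-\infty,c]}$ and $\gamma|_{[c,x]}$, and part (xi).

Part (xiii) follows from parts (v) and (xii).

Part (xiv) follows from parts (xiii) and (ii) and Theorem 34.9.5.

For part (xv), let $I = [a,b]$. Then $\Lambda_\gamma(a) = 0$ by Theorem 38.8.4 (i). If $L(\gamma) < \infty$, then $X_\gamma = \mathbb{R}$. So by part (xiii), $\Lambda_\gamma$ is a continuous function from $\mathbb{R}$ to $\mathbb{R}$. Therefore $\mathrm{Range}(\Lambda_\gamma) = \Lambda_\gamma([a, b])$ is a compact interval of $\mathbb{R}$ by Theorems 33.5.15 and 34.9.5. But $\mathrm{Range}(\Lambda_\gamma) \supseteq [0, x)$ for all $x \in [0, L(\gamma))$ by the definition of $L(\gamma)$. So $\mathrm{Range}(\Lambda_\gamma) \supseteq [0, L(\gamma))$. But $\mathrm{Range}(\Lambda_\gamma) \subseteq [0, L(\gamma)]$. Hence $\mathrm{Range}(\Lambda_\gamma) = [0, L(\gamma)]$.

For part (xvi), let $a,b \in X_\gamma$ with $a \le b$. Then $\Lambda_\gamma(b) = L_{(-\infty,b]}(\gamma) = L_{(-\infty,a]}(\gamma) + L_{[a,b]}(\gamma) = \Lambda_\gamma(a) + L_{[a,b]}(\gamma)$ by Theorem 38.8.4 (iv). So $L_{[a,b]}(\gamma) = \Lambda_\gamma(b) - \Lambda_\gamma(a)$. The rest of the equalities follow from parts (viii), (ix) and (x).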


38.8.11 REMARK: Failure of right continuity of partial curve length for infinite-length curves. 

The partial curve length function $\Lambda_\gamma$ in Theorem 38.8.10 may have a jump from a finite real number to
infinity. Therefore Theorem 38.8.10 (xii) has a restriction to the interior of the set where the partial curve
length function is finite. Examples of curves with such jumps include the space-filling curves in Section 36.3.


38.8.12 REMARK: Motivation for bidirectional partial curve length functions.

Definition 38.8.8 is useful for curves which have finite length or which have “finite length on the left”. A
curve which has “infinite length on the left” has a partial curve length function with the constant value $\infty$
as in Example 38.8.14, which is totally uninformative. For some curves, even a bidirectional style of partial
curve length as in Definition 38.8.13, which is guaranteed to give the value 0 for $\Lambda_{\gamma,x_0}(x_0)$, could give
positive or negative infinite values for all $x \in \mathbb{R} \setminus \{x_0\}$ for one choice of $x_0$, but give finite values for some
neighbourhood of $x_0$ for a different choice. (For example, this could happen if the curve is constructed as a
concatenation of multiple space-filling curves alternating with straight lines joining them.) So the amount
of useful information in this function depends on the choice of initial value relative to the locations of any
infinite-length subcurves.


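38.8.13 DEFINITION: The bidirectional partial curve length function for a curve $\gamma$ in a metric space with
initial value $x_0 \in \mathrm{Dom}(\gamma)$ is the function $\Lambda_{\gamma,x_0} : \mathbb{R} \to \bar{\mathbb{R}}$ given by

$$
\Lambda_{\gamma,x_0}(x) =
\begin{cases}
L_{[x_0,x]}(\gamma) & \text{for } x \in \mathrm{Dom}(\gamma) \cap [x_0, \infty) \\
-L_{[x,x_0]}(\gamma) & \text{for } x \in \mathrm{Dom}(\gamma) \cap (-\infty, x_0].
\end{cases}
$$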


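38.8.14 EXAMPLE: Doubly infinite straight lines have useless partial length functions.

Define $\gamma : \mathbb{R} \to \mathbb{R}^n$ for $n \in \mathbb{Z}^+$ by $\gamma : t \mapsto p + tv$ for some $p \in \mathbb{R}^n$ and $v \in \mathbb{R}^n \setminus \{0\}$. Then according to
Definition 38.8.8, its partial length function $\Lambda_\gamma : \mathbb{R} \to \bar{\mathbb{R}}_0^+$ is given by $\Lambda_\gamma(x) = \infty$ for all $x \in \mathbb{R}$. However,
$\Lambda_{\gamma,x_0}(x) = (x - x_0)|v|$ for all $x \in \mathbb{R}$.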
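
The signed formula in Example 38.8.14 can be checked numerically for a particular line. The sketch below is an illustration under stated assumptions: the Euclidean metric on the plane, a hypothetical base point `P` and direction `V` with |v| = 5, and hypothetical helper names. Partition sums along a straight line are exact up to rounding, so the estimate here coincides with the length.

```python
import math

def segment_length(curve, a, b, n=1000):
    """Partition sum over a uniform partition of [a, b] (Euclidean metric)."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    pts = [curve(t) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def bidirectional_partial_length(curve, x0, x, n=1000):
    """Signed length from the initial value x0 to x: positive to the right, negative to the left."""
    if x >= x0:
        return segment_length(curve, x0, x, n)
    return -segment_length(curve, x, x0, n)

P = (1.0, 2.0)  # hypothetical base point p
V = (3.0, 4.0)  # hypothetical direction v with Euclidean norm 5

def line(t):
    return (P[0] + t * V[0], P[1] + t * V[1])

print(bidirectional_partial_length(line, 1.0, 3.0))
print(bidirectional_partial_length(line, 1.0, -1.0))
```

With initial value 1, the two printed values should be close to (3 - 1)·5 and (-1 - 1)·5 respectively.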


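38.8.15 THEOREM: Some basic properties of bidirectional partial curve length functions.
Let $(M,d)$ be a metric space. Let $\gamma : I \to M$ be a curve in $M$. Let $X_{\gamma,x_0} = \{x \in \mathbb{R};\ \Lambda_{\gamma,x_0}(x) \in \mathbb{R}\}$ for
all $x_0 \in \mathbb{R}$.
(i) $\Lambda_{\gamma,x_0}$ is a well-defined non-decreasing function for all $x_0 \in I$.
(ii) $\forall x \in [x_0, \infty)$, $\Lambda_{\gamma,x_0}(x) = \Lambda_{\tilde{\gamma}}(x - x_0)$, where $\tilde{\gamma} : \mathbb{R}_0^+ \to M$ is defined by $\forall z \in \mathbb{R}_0^+$, $\tilde{\gamma}(z) = \gamma(x_0 + z)$.
(iii) $\forall x \in (-\infty, x_0]$, $\Lambda_{\gamma,x_0}(x) = -\Lambda_{\tilde{\gamma}}(x_0 - x)$, where $\tilde{\gamma} : \mathbb{R}_0^+ \to M$ is defined by $\forall z \in \mathbb{R}_0^+$, $\tilde{\gamma}(z) = \gamma(x_0 - z)$.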
(iv) $\Lambda_{\gamma,x_0}|_{X_{\gamma,x_0}}$ is continuous.


PROOF: For part (i), $\Lambda_{\gamma,x_0} : \mathbb{R} \to \bar{\mathbb{R}}$ is well-defined because $L_{[x_0,x_0]}(\gamma) = 0$ by Theorem 38.8.4 (i), and
$L_{[a,b]}(\gamma)$ is well defined for all $a,b \in \mathbb{R}$ with $a \le b$ by Definition 38.8.2. The map $\Lambda_{\gamma,x_0}$ is non-decreasing by
Theorem 38.8.4 (i, iv).

Part (ii) follows from Notations 38.8.3 and 38.8.9 and Definition 38.8.13. 
Part (iii) follows from Notations 38.8.3 and 38.8.9 and Definition 38.8.13. 
Part (iv) follows from Theorem 38.8.10 (xiii) and parts (ii) and (iii). 


38.9. Rectifiable curves in metric spaces 


38.9.1 REMARK: Rectifiable curves are curves which have finite length. 
A rectifiable curve is in essence a continuous curve with finite length in a metric space, although they may 
be defined in differentiable manifolds as in Section 50.7. 


The word “rectifiable” suggests that something can be “rectified” or “corrected”, or “straightened out” in 
some way. In a Cartesian space, the “rectification” of a curve may be thought of as continuously deforming 
the curve until it is a straight line with finite length. Therefore it is not surprising that a rectifiable curve 
is defined to be a curve with finite length. Two very useful properties of rectifiable curves are that they 
become Lipschitz curves when they are reparametrised in terms of subcurve length, and that as Lipschitz 
curves they are differentiable almost everywhere if the metric space has a suitable differentiable structure. 
This makes them well suited to definitions of parallel transport and distance in differential geometry.


For rectifiable curves, see A.E. Taylor [145], pages 389-390, 417-419; Bruckner/Bruckner/Thomson [56],
pages 38, 136-138; Riesz/Szőkefalvi-Nagy [125], pages 26-28; Friedman [74], pages 298-302; Ahlfors [45],
pages 104-105; Phillips [121], page 86; Graves [85], pages 211-214; Rudin [129], pages 124-126; Saks [131],
pages 121-125.


38.9.2 DEFINITION: A rectifiable curve in a metric space $M$ is a curve in $M$ with finite length. In other
words, a rectifiable curve in a metric space $(M, d)$ is a curve $\gamma$ in $M$ which is empty or satisfies

$$\sup\Big\{ \sum_{i=1}^n d(\gamma(t_i), \gamma(t_{i+1}));\ n \in \mathbb{Z}_0^+ \text{ and } t \in \mathrm{Inc}(\mathbb{N}_{n+1}, \mathrm{Dom}(\gamma)) \Big\} < \infty.$$


38.9.3 DEFINITION: The length-parametrisation of a rectifiable curve $\gamma$ in a metric space is $\tilde{\gamma} = \gamma \circ \Lambda_\gamma^{-1}$,
where $\Lambda_\gamma$ is the partial curve length function for $\gamma$.


A length-parametrised curve in a metric space M is a rectifiable curve in M which is equal to its own 
length-parametrisation. 


38.9.4 THEOREM: Some properties of the length-parametrisation of a rectifiable curve.
Let $\tilde{\gamma}$ be the length-parametrisation of a rectifiable curve $\gamma$ in a metric space $M$.

(i) $\tilde{\gamma}$ is a well-defined function from $\mathrm{Range}(\Lambda_\gamma)$ to $M$.
(ii) $\tilde{\gamma}$ is a Lipschitz curve in $M$ with Lipschitz constant 1.
(iii) $\tilde{\gamma}$ is path-equivalent to $\gamma$.
(iv) $L(\tilde{\gamma}) = L(\gamma)$.
(v) $L_{[a,b]}(\tilde{\gamma}) = b - a$ for all $a,b \in \mathrm{Dom}(\tilde{\gamma})$ with $a \le b$.
(vi) $\tilde{\gamma}$ is a never-constant curve.


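PROOF: For part (i), let $\gamma : I \to M$ be a rectifiable curve in a metric space $(M,d)$. Then $\gamma$ has finite
length $L(\gamma)$. Let $J = \Lambda_\gamma(I)$, where $\Lambda_\gamma$ is the partial curve length function in Definition 38.8.8. Then $J$ is a
real interval and $\Lambda_\gamma$ is a well-defined non-decreasing continuous surjection by Theorem 38.8.10 (i, xiii, xiv).

Define the relation $\tilde{\gamma} = \gamma \circ \Lambda_\gamma^{-1}$. Then $\mathrm{Dom}(\tilde{\gamma}) = J$ because $\mathrm{Range}(\Lambda_\gamma) = J$ and so $\mathrm{Dom}(\Lambda_\gamma^{-1}) = J$. To show that $\tilde{\gamma}$ is a well-defined function, suppose that $(x, y_k) \in \tilde{\gamma}$ for $k = 1,2$. Then $(x, z_k) \in \Lambda_\gamma^{-1}$ and $(z_k, y_k) \in \gamma$ for some $z_k \in I$, for $k = 1,2$. So $x = \Lambda_\gamma(z_k)$ and $y_k = \gamma(z_k)$ for $k = 1,2$. But then $x = \Lambda_\gamma(z_1) = \Lambda_\gamma(z_2)$, which implies, assuming that $z_1 \le z_2$, that $L_{[z_1,z_2]}(\gamma) = \Lambda_\gamma(z_2) - \Lambda_\gamma(z_1) = 0$ by Theorem 38.8.4 (iv). So $d(\gamma(z_1), \gamma(z_2)) = 0$ by Theorem 38.8.4 (iii). So $\gamma(z_1) = \gamma(z_2)$ by Definition 37.2.3 (i). Therefore $\tilde{\gamma} : J \to M$ is a well-defined function.

For part (ii), let $x_1,x_2 \in J$ with $x_1 \le x_2$. Then $x_1 = \Lambda_\gamma(z_1)$ and $x_2 = \Lambda_\gamma(z_2)$ for some $z_1, z_2 \in I$. (Note that $z_1, z_2$ are not necessarily uniquely determined by $x_1, x_2$.) Then $d(\tilde{\gamma}(x_1), \tilde{\gamma}(x_2)) = d(\gamma(z_1), \gamma(z_2)) \le L_{[z_1,z_2]}(\gamma)$ by Theorem 38.8.4 (iii). But $L_{[z_1,z_2]}(\gamma) = \Lambda_\gamma(z_2) - \Lambda_\gamma(z_1) = x_2 - x_1$ by Theorem 38.8.10 (xvi). Therefore $d(\tilde{\gamma}(x_1), \tilde{\gamma}(x_2)) \le |x_2 - x_1|$ for all $x_1,x_2 \in J$. Hence $\tilde{\gamma}$ is a Lipschitz map by Definition 38.6.6. So $\tilde{\gamma}$ is continuous by Theorem 38.6.12 (ii). Hence $\tilde{\gamma}$ is a Lipschitz curve in $M$ with $\mathrm{Lip}(\tilde{\gamma}) \le 1$.

Part (iii) follows from Definition 36.5.3 and the observation that $\gamma = \tilde{\gamma} \circ \Lambda_\gamma$, where $\Lambda_\gamma$ is a non-decreasing continuous surjection.

Part (iv) follows from part (iii) and Theorem 38.8.6.

For part (v), let $a,b \in \mathrm{Dom}(\tilde{\gamma})$ with $a \le b$. Then $a = \Lambda_\gamma(z_1)$ and $b = \Lambda_\gamma(z_2)$ for some $z_1,z_2 \in I$. Therefore $\tilde{\gamma}|_{[a,b]} \circ \Lambda_\gamma|_{[z_1,z_2]} = \gamma|_{[z_1,z_2]}$. So $L(\tilde{\gamma}|_{[a,b]}) = L(\gamma|_{[z_1,z_2]})$ by Theorem 38.8.6 because $\Lambda_\gamma|_{[z_1,z_2]} : [z_1,z_2] \to [a,b]$ is a non-decreasing continuous surjection. Hence $L_{[a,b]}(\tilde{\gamma}) = L_{[z_1,z_2]}(\gamma) = \Lambda_\gamma(z_2) - \Lambda_\gamma(z_1) = b - a$ by Theorem 38.8.10 (xvi).

For part (vi), suppose that $\tilde{\gamma}$ is not a never-constant curve. Then Definition 36.4.3 implies that there exist $a,b \in \mathrm{Dom}(\tilde{\gamma})$ with $a < b$ such that $\tilde{\gamma}$ is constant on $[a,b]$. So $L_{[a,b]}(\tilde{\gamma}) = 0$. This contradicts part (v). Hence $\tilde{\gamma}$ is a never-constant curve.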
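
The Lipschitz property of a length-parametrisation can be illustrated numerically for a curve with a constant stretch. The following is a sketch under explicit assumptions rather than an implementation of Definition 38.9.3 itself: it length-parametrises the inscribed polygon through sampled points of the curve, using the Euclidean metric on the plane. The function `polygon_length_param` and the example curve `corner` are hypothetical names.

```python
import bisect
import math

def polygon_length_param(curve, a, b, n=1000):
    """Length-parametrise the inscribed polygon through n + 1 sample points of curve on [a, b]."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    pts = [curve(t) for t in ts]
    cum = [0.0]  # cum[j] is the polygon length up to sample point j
    for p, q in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]

    def gamma_tilde(s):
        s = min(max(s, 0.0), total)
        j = bisect.bisect_right(cum, s) - 1  # largest j with cum[j] <= s
        if j >= n:
            return pts[-1]
        seg = cum[j + 1] - cum[j]  # positive because cum[j + 1] > s >= cum[j]
        w = (s - cum[j]) / seg
        return tuple(pj + w * (qj - pj) for pj, qj in zip(pts[j], pts[j + 1]))

    return gamma_tilde, total

def corner(t):
    # Hypothetical curve on [0, 3]: moves right, rests on [1, 2], then moves up.
    if t <= 1.0:
        return (t, 0.0)
    if t <= 2.0:
        return (1.0, 0.0)
    return (1.0, t - 2.0)

gamma_tilde, total = polygon_length_param(corner, 0.0, 3.0, n=3000)
print(total, gamma_tilde(0.5), gamma_tilde(1.5))
```

The constant stretch on [1, 2] disappears in the reparametrised curve, which is consistent with the never-constant property in part (vi) of the proof above.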


38.9.5 REMARK:  Length-parametrised curves are curves with “constant speed”. 

Theorem 38.9.4 (v) seems to be interpretable to mean that a length-parametrised curve is a curve which
has “constant speed” equal to 1. Certainly the distance travelled in a “time” interval $[a,b]$ is no more than
$b - a$ by Theorem 38.8.4 (iii). But speed in the sense of the absolute value of the velocity cannot be defined
since the curve cannot be differentiated without a differentiable structure on the metric space. Nevertheless,
in the absence of any better definition, it does seem reasonable to describe the Lipschitz constant 1 as the
speed of any length-parametrised curve although velocity (which would give a direction to the “speed”) is
not defined.


38.9.6 REMARK: Rectifiable curves versus locally rectifiable curves.

In many textbooks, rectifiable curves are defined to be curves with finite length, but Example 38.8.14 shows 
that even the straightest of straight lines in Cartesian spaces are not rectifiable according to this definition. 
It seems reasonable to describe such “doubly infinite” curves as rectifiable, even though various irksome 
technicalities do arise when proving their basic properties. For compatibility with the literature, curves 
which have finite length on compact intervals are called “locally rectifiable" in Definition 38.9.8. Some 
relations between local rectifiability and Lipschitz continuity are given in Theorems 38.9.7 and 38.9.9. 


The “Lipschitz curve” in Theorem 38.9.7 means a curve which is a Lipschitz map. (See Definition 38.6.6.) 
Thus rectifiability is closely related to Lipschitz continuity for some choice of parametrisation. 

The range $\Lambda_\gamma(I)$ of the partial curve length function in the proof of Theorem 38.9.7 (i) might not be $[0, L(\gamma)]$
because the parameter interval $I$ might not be compact. However, if $I$ is a non-empty compact interval, then
$\Lambda_\gamma(I) = [0, L(\gamma)]$ by Theorem 38.8.10 (xv).


38.9.7 THEOREM: Some relations between rectifiable curves and Lipschitz curves.
Let $\gamma : I \to M$ be a curve in a metric space $M$.

(i) If $\gamma$ is rectifiable, then $\gamma$ is path-equivalent to a Lipschitz curve with Lipschitz constant 1.
(ii) If $\gamma$ is a Lipschitz curve, then $\gamma|_{[a,b]}$ is rectifiable for all $a, b \in \mathbb{R}$ with $a \le b$.
(iii) If $\gamma$ is path-equivalent to a Lipschitz curve, then $\gamma|_{[a,b]}$ is rectifiable for all $a,b \in \mathrm{Dom}(\gamma)$ with $a \le b$.
(iv) If $\gamma|_{[a,b]}$ is rectifiable for all $a,b \in \mathrm{Dom}(\gamma)$ with $a \le b$, then $\gamma$ is path-equivalent to a Lipschitz curve
with Lipschitz constant 1.


PROOF: Part (i) follows from Theorem 38.9.4 (ii, iii).

For part (ii), let $\gamma : I \to M$ be a Lipschitz curve in a metric space $(M,d)$. Then there is a $K \in \mathbb{R}_0^+$ such that $\forall x_1, x_2 \in I$, $d(\gamma(x_1),\gamma(x_2)) \le K|x_2 - x_1|$ by Definition 38.6.6. Let $a,b \in \mathbb{R}$ with $a \le b$. Let $t \in \mathrm{NonDec}(\mathbb{N}_{n+1}, I \cap [a, b])$ for some $n \in \mathbb{Z}_0^+$. Then $\sum_{i=1}^n d(\gamma(t_i), \gamma(t_{i+1})) \le K \sum_{i=1}^n |t_{i+1} - t_i| = K(t_{n+1} - t_1) \le K(b - a)$. Therefore $L(\gamma|_{[a,b]}) \le K(b - a)$ by Definition 38.8.2. Hence $\gamma|_{[a,b]}$ is rectifiable for all $a, b \in \mathbb{R}$ with $a \le b$.

For part (iii), let $\gamma_1 : I_1 \to M$ be path-equivalent to a Lipschitz curve $\gamma_2 : I_2 \to M$. Then by Definition 36.5.3, there are non-decreasing continuous surjections $\beta_k : I \to I_k$ for $k = 1,2$ such that $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2$ for some real interval $I$.

Let $a_1, b_1 \in I_1$ with $a_1 \le b_1$. If $a_1 = b_1$, then $L(\gamma_1|_{[a_1,b_1]}) = 0$ by Theorem 38.8.4 (i) and so $\gamma_1|_{[a_1,b_1]}$ is rectifiable. So assume that $a_1 < b_1$.

Let $a = \sup\{x \in I;\ \beta_1(x) \le a_1\} = \sup(\beta_1^{-1}((-\infty,a_1]))$. Then by Theorem 15.9.3 (xv), $a$ is a well-defined real number because $\{x \in I;\ \beta_1(x) \le a_1\}$ is non-empty and bounded above since $a_1 < b_1 \in I_1$ and $\beta_1$ is non-decreasing, and then $a \in \{x \in I;\ \beta_1(x) \le a_1\}$ because this is a closed set since $\beta_1$ is continuous. Therefore $a \in I$ and $\beta_1(a) = a_1$. Similarly, let $b = \inf\{x \in I;\ \beta_1(x) \ge b_1\}$. Then $b \in I$ and $\beta_1(b) = b_1$. Therefore $\beta_1([a, b]) = [a_1, b_1]$ because $\beta_1$ is non-decreasing. So $\gamma_1 \circ \beta_1|_{[a,b]} = \gamma_1|_{[a_1,b_1]} \circ \beta_1|_{[a,b]} : [a,b] \to M$.

Let $a_2 = \beta_2(a)$ and $b_2 = \beta_2(b)$. Then $\gamma_2 \circ \beta_2|_{[a,b]} = \gamma_2|_{[a_2,b_2]} \circ \beta_2|_{[a,b]} : [a,b] \to M$. But $\beta_1|_{[a,b]} : [a,b] \to [a_1, b_1] = \mathrm{Dom}(\gamma_1|_{[a_1,b_1]})$ and $\beta_2|_{[a,b]} : [a,b] \to [a_2, b_2] = \mathrm{Dom}(\gamma_2|_{[a_2,b_2]})$ are non-decreasing continuous surjections. So $\gamma_1|_{[a_1,b_1]}$ and $\gamma_2|_{[a_2,b_2]}$ are path-equivalent curves. Therefore they have the same length by Theorem 38.8.6. But $L(\gamma_2|_{[a_2,b_2]}) < \infty$ by part (ii) because $\gamma_2|_{[a_2,b_2]}$ is a Lipschitz curve. Therefore $L(\gamma_1|_{[a_1,b_1]}) < \infty$. Hence $\gamma_1|_{[a_1,b_1]}$ is rectifiable for all $a_1,b_1 \in I_1$ with $a_1 \le b_1$.

For part (iv), assume that $\gamma|_{[a,b]}$ is rectifiable for all $a,b \in I$ with $a \le b$. Let $x_0 \in I$. Let $\Lambda_{\gamma,x_0} : I \to \bar{\mathbb{R}}$ be the bidirectional partial curve length function for $\gamma$ with initial value $x_0$ as in Definition 38.8.13. Then $\Lambda_{\gamma,x_0}(I) \subseteq \mathbb{R}$ because $\forall x \in I \cap [x_0, \infty)$, $L_{[x_0,x]}(\gamma) < \infty$ and $\forall x \in I \cap (-\infty, x_0]$, $L_{[x,x_0]}(\gamma) < \infty$. Therefore $\Lambda_{\gamma,x_0} : I \to \mathbb{R}$ is non-decreasing and continuous by Theorem 38.8.15 (i, iv). Let $J = \Lambda_{\gamma,x_0}(I)$. Let $\tilde{\gamma} = \gamma \circ \Lambda_{\gamma,x_0}^{-1}$. Then $\tilde{\gamma}|_{[0,\infty)}$ is the length-parametrisation of $\gamma|_{[x_0,\infty)}$, and $\tilde{\gamma}|_{(-\infty,0]}$ is the length-parametrisation of the reversal of $\gamma|_{(-\infty,x_0]}$ with domain translated to commence at 0. Therefore $\tilde{\gamma} : J \to M$ is a well-defined Lipschitz curve in $M$ with Lipschitz constant 1 by Theorem 38.9.4 (i, ii), and $\tilde{\gamma}$ is path-equivalent to $\gamma$ by Theorem 38.9.4 (iii).


38.9.8 DEFINITION: A locally rectifiable curve in a metric space $M$ is a curve $\gamma$ in $M$ such that $\gamma|_{[a,b]}$ is
rectifiable for all $a,b \in \mathrm{Dom}(\gamma)$ with $a \le b$. In other words, a locally rectifiable curve is a curve whose
restrictions to compact subintervals have finite length.


38.9.9 THEOREM: Path-equivalence of locally rectifiable curves and Lipschitz curves. 
A curve in a metric space is locally rectifiable if and only if it is path-equivalent to a Lipschitz curve. 


PROOF: The assertion follows from Definition 38.9.8 and Theorem 38.9.7 (iii, iv).


38.9.10 REMARK: Invariance of local rectifiability under curve reparametrisation. 

It follows from Theorem 38.9.9 that if two curves in a metric space are path-equivalent, then either both 
or neither are locally rectifiable because path-equivalence is an equivalence relation by Theorem 36.5.9 (ii). 
Therefore local rectifiability is a well-defined property of the “paths” in Definition 36.8.4, which are
equivalence classes with respect to path-equivalence. However, neither the rectifiability nor the Lipschitz continuity
of a curve is well defined for such equivalence classes.


Definition 38.9.11 gives a formula for converting any locally rectifiable curve to a Lipschitz curve with 
Lipschitz constant 1. This is unique apart from translations and reversals. Thus it provides a kind of 
normalisation of a locally rectifiable curve if the initial value and direction are given. 


38.9.11 DEFINITION: The bidirectional length-parametrisation of a locally rectifiable curve $\gamma$ in a metric
space with initial value $x_0 \in \mathrm{Dom}(\gamma)$ is $\tilde{\gamma} = \gamma \circ \Lambda_{\gamma,x_0}^{-1}$, where $\Lambda_{\gamma,x_0}$ is the bidirectional partial curve length
function for $\gamma$ with initial value $x_0$.


A bidirectional length-parametrised curve in a metric space M is a locally rectifiable curve in M which is 
equal to its own bidirectional length-parametrisation for some initial value. 


38.9.12 REMARK:  Applicability of rectifiable curves to differentiable manifolds and parallel transport. 
Rectifiable curves are the natural kind of structure to define parallelism on because parallel transport of 
fibres along curves between base points of a fibre bundle requires integration (or more precisely the solution 
of first order ODEs), which in turn requires the existence of a velocity vector almost everywhere. This is in 
fact provided by rectifiable curves. (This is the subject of Section 45.7.) 


In differentiable manifolds and differentiable fibre bundles, it is desirable to be able to define rectifiability of
curves locally within each chart in the atlas. Since local rectifiability in Definition 38.9.8 is expressed in terms
of compact subintervals of the domain of a curve, it is possible to localise the definition to individual
manifold charts. Theorem 38.9.13 shows that local rectifiability is guaranteed if each point in the curve's
parameter space has a neighbourhood (in the relative topology of the parameter interval) within which the
restricted curve has finite length.


Although a metric is required on a differentiable manifold to be able to determine a chart-independent curve
length, rectifiability may be defined with only a $C^1$ structure, or even with only a Lipschitz atlas. (See
Section 50.7 for rectifiable curves in Lipschitz manifolds.) Thus rectifiability may be defined in a meaningful
way for differentiable manifolds which have no metric space structure, whereas a meaningful number for the
length of a curve does require a metric, which is provided by Riemannian manifolds.


38.9.13 THEOREM: Curves are locally rectifiable if and only if they have locally finite length.
A curve $\gamma : I \to M$ in a metric space $M$ is locally rectifiable if and only if

$$\forall t \in \mathrm{Int}(I),\ \exists a, b \in \mathbb{R},\quad t \in (a,b) \text{ and } L_{[a,b]}(\gamma) < \infty. \qquad (38.9.1)$$


PROOF: Let $\gamma : I \to M$ satisfy line (38.9.1). Let $a',b' \in I$ with $a' \le b'$. Then $[a',b']$ is a compact
subset of $\mathbb{R}$. Let $C = \{(a,b) \in \mathrm{Top}(\mathbb{R});\ a,b \in \mathbb{R},\ a < b,\ L_{[a,b]}(\gamma) < \infty\}$. Then $C$ is an open cover of
$[a', b']$ by line (38.9.1). So $C$ has a finite open subcover $C'$. Therefore by induction on the elements of the
subcover, $L_{[a',b']}(\gamma) < \infty$ by Theorem 38.8.4 (iv). So $\gamma$ is locally rectifiable. The converse follows directly
from Definition 38.9.8 and line (38.9.1).


38.10. Functions of bounded variation


38.10.1 REMARK: Functions of bounded variation, Lipschitz maps and rectifiable curves. 

Functions of bounded variation, or “BV functions” for short, are closely related to both the Lipschitz 
functions in Section 38.6 and the rectifiable curves in Section 38.9. (BV functions are attributed to Camille 
Jordan by Bruckner/Bruckner/Thomson [56], page 37.) 


BV functions are more general than both Lipschitz functions and rectifiable curves because BV functions
are not necessarily continuous, but they are also less general because they are usually defined to have values in
normed linear spaces such as Cartesian spaces or Banach spaces. This is summarised in the following table.


function class      domain          range           continuous
Lipschitz map       metric space    metric space    yes
rectifiable curve   real interval   metric space    yes
BV function         real interval   normed space    not necessarily


BV function spaces have a natural linear space structure if the range is a linear space. Their properties are 
generally studied in this linear space context with particular relevance for integration theory. There are also 
closely related definitions of BV functions on sets of sets, which are also important for integration theory, 
but they are not presented here. 


For BV functions of a real variable, see A.E. Taylor [145], pages 382-401; Riesz/Szőkefalvi-Nagy [125],
pages 9-11, 26-28; Kolmogorov/Fomin [104], pages 328-332; Graves [85], pages 202-207, 267-269; Bass [53], 
pages 117-119; Johnsonbaugh/Pfaffenberger [97], pages 213-224; Rudin [129], pages 117-121; Bruckner/ 
Bruckner/Thomson [56], pages 37-38, 149-153; Schramm [133], pages 297-299; S.J. Taylor [147], pages 228- 
229; Shilov/Gurevich [136], page 65; Giusti [83], pages 2-4, 33-37; Saks [131], pages 96-100; Federer [69], 
pages 109-111. 


38.10.2 REMARK: Total variation and functions of bounded variation. 

Although Definitions 38.10.3 and 38.10.4 are stated in terms of general metric spaces, the total variation
and bounded variation concepts are usually stated for linear spaces only, and the metric function $d$ is then
defined in terms of a norm $|\cdot|$ as $(x,y) \mapsto |x - y|$.


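38.10.3 DEFINITION: The total variation of a function $f : I \to M$, for a non-empty real interval $I$ and
metric space $(M, d)$, is the non-negative extended real number $\mathrm{Var}(f) \in \bar{\mathbb{R}}_0^+$ given by

$$\mathrm{Var}(f) = \sup\Big\{ \sum_{i=1}^n d(f(t_i), f(t_{i+1}));\ n \in \mathbb{Z}_0^+,\ t \in \mathrm{Inc}(\mathbb{N}_{n+1}, I) \Big\}, \qquad (38.10.1)$$

where $\mathrm{Inc}(\mathbb{N}_{n+1}, I) = \{t \in I^{n+1};\ \forall i \in \mathbb{N}_n,\ t_i < t_{i+1}\}$ for $n \in \mathbb{Z}_0^+$. (See Notation 11.1.32 for “Inc”.)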


38.10.4 DEFINITION: A function of bounded variation from a non-empty real interval $I$ to a metric space $M$
is a function $f : I \to M$ for which $\mathrm{Var}(f) < \infty$.


38.10.5 REMARK: Definitions of total variation and bounded variation. 

The empty function is excluded in Definitions 38.10.3 and 38.10.4 for total variation and bounded variation
(by requiring $I \ne \emptyset$) because the empty function would have a total variation equal to $-\infty$. The “sample
point sequences” in $\mathrm{Inc}(\mathbb{N}_{n+1}, I)$ are required to be non-decreasing to avoid backtracking, which would make
the total variation equal $\infty$ in general, and the point sequences are required to be strictly increasing to avoid
the technical inconvenience of empty intervals in partitions.


If $I = \{x\}$ for some $x \in \mathbb{R}$, then $\mathrm{Inc}(\mathbb{N}_1, I)$ contains only the function $t : 1 \mapsto x$, and $\mathrm{Inc}(\mathbb{N}_{n+1}, I) = \emptyset$ for
all $n \in \mathbb{Z}^+$. Then line (38.10.1) yields $\mathrm{Var}(f) = 0$. Similarly, $\mathrm{Var}(f) = 0$ for any constant function $f$.
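
The sums in line (38.10.1) make sense for discontinuous functions, which is one way in which BV functions differ from curves. The sketch below is an assumption-laden illustration for real-valued functions with the absolute-difference metric; the names `variation_sum`, `step` and `spike` are hypothetical. It evaluates single sums for chosen sample point sequences, so each result is only a lower bound for the total variation.

```python
def variation_sum(f, ts):
    """Sum of |f(t_{i+1}) - f(t_i)| for a strictly increasing list ts of sample points."""
    if any(s >= t for s, t in zip(ts, ts[1:])):
        raise ValueError("sample points must be strictly increasing")
    vals = [f(t) for t in ts]
    return sum(abs(q - p) for p, q in zip(vals, vals[1:]))

def step(x):
    # Hypothetical discontinuous function on [0, 1] with one jump of height 1.
    return 0.0 if x < 0.5 else 1.0

def spike(x):
    # Hypothetical function on [0, 1] which is zero except at the single point 0.5.
    return 1.0 if x == 0.5 else 0.0

print(variation_sum(step, [0.0, 0.25, 0.75, 1.0]))
print(variation_sum(spike, [0.0, 0.5, 1.0]))
print(variation_sum(spike, [0.0, 0.25, 0.75, 1.0]))
```

For `spike`, a sample sequence which misses the point 0.5 sees no variation at all, whereas a sequence through 0.5 picks up the jump up and the jump down. This is why the total variation is defined as a supremum over all sample point sequences.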


38.10.6 REMARK: Locally bounded variation. 

The properties of BV functions which are generally considered interesting or important are mostly local, 
such as differentiability almost everywhere. This is essentially the Lebesgue differentiation theorem. (See 
Section 45.7.) Its proof requires the concept of sets of measure zero. (See Sections 45.1 and 45.2.) Since the 
properties of interest are mostly local, it is desirable to define locally BV functions as in Definition 38.10.7.


38.10.7 DEFINITION: A function of locally bounded variation from a non-empty real interval $I$ to a metric
space $M$ is a function $f : I \to M$ for which $f|_K$ is a function of bounded variation from $K$ to $M$ for all
compact subintervals $K$ of $I$.


38.10.8 REMARK: Extension of bounded variation concepts to manifolds. 

The almost everywhere differentiability property of BV functions is relevant to the definition of parallel 
transport on differentiable manifolds. However, manifolds are not linear spaces, and they are not always 
metric spaces. The concept of a rectifiable curve, as in Section 38.9, is closer to the requirements for 
manifolds, but it must be adapted to the kinds of differentiable structure which are available on manifolds. 
The style of locally bounded variation in Definition 38.10.7 is suitable for metric spaces, but must be made 
even more local in order to adapt it for the local charts offered by manifolds. 


39.1. Topological linear spaces


39.1.1 REMARK: Terminology for topological linear spaces. 

The term “linear space” is used in preference to “vector space” in this book because the word “vector”
suggests a geometric arrow with both a specified start point (or base point) and a specified end point,
whereas a linear space is an algebraic structure where every “arrow” has the same base point. The term
“topological vector space” is most often seen in the mathematics literature, although the term “topological
linear space” does appear in EDM2 [113], pages 1600-1606. It is called a “linear topological space” by
Yosida [167], page 25.


Some family relations between topological linear spaces and other structures are illustrated in Figure 39.1.1. 


[Diagram omitted. Nodes: linear space, topological space, topological linear space, metric space, normed linear space, complete metric space, Banach space, Hilbert space, Euclidean space.]


Figure 39.1.1 Family tree of topological linear spaces 


The abbreviation TVS is often used for a topological vector space. Here the abbreviation TLS may be used 
for a topological linear space. 


39.1.2 REMARK: Combination of linear spaces with topological spaces. 

Definition 39.1.3 combines Definition 22.1.1 for a linear space with Definition 31.3.3 for a topological space. 
Topological linear spaces are defined only for the real-number field by Lang [23], page 5. Topological 
linear spaces are defined for both real and complex fields by Rudin [130], page 7; Bachman/Narici [51], 
page 343; Robertson/Robertson [126], page 9; Kolmogorov/Fomin [104], pages 167-168; Yosida [167], page 25;
EDM2 [113], page 1600. The field is assumed to be the complex numbers by Adams [44], pages 2-3. 


It makes little sense to replace the field $K$ in Definition 39.1.3 with a more general class of field. For most
topological analysis, the completeness of $K$ is more or less indispensable. The standard topologies on $\mathbb{R}$
and $\mathbb{C}$ are generated by open intervals. (See Definitions 32.5.7 and 32.6.12.) These topologies correspond
to the standard distance functions for $\mathbb{R}$ and $\mathbb{C}$. According to Theorem 39.5.3 (iii), all possible norms
on $\mathbb{C}$, identified with $\mathbb{R}^2$, give the same topology. So there is little room for choice of the topology $T_K$ in
Definition 39.1.3.


39.1.3 DEFINITION: A topological linear space is a tuple $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, T_K, T_V)$ such that

(i) $K < (K, \sigma_K, \tau_K)$ is the field $\mathbb{R}$ or $\mathbb{C}$,

(ii) $T_K$ is the standard topology on $K$,

(iii) $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ is a linear space over $K$,

(iv) $V < (V, T_V)$ is a topological space,

(v) the vector addition function $\sigma_V : V \times V \to V$ is continuous with respect to the topology $T_V$ on $V$ and
the corresponding product topology on $V \times V$,

(vi) the scalar multiplication function $\mu : K \times V \to V$ is continuous with respect to the topology $T_V$ on $V$
and the product topology on $K \times V$.


A real topological linear space is a topological linear space for which the field is $\mathbb{R}$.


A complex topological linear space is a topological linear space for which the field is $\mathbb{C}$.


39.1.4 EXAMPLE: Non-trivial topological linear spaces with trivial topology are not $T_1$ spaces.

Let $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu)$ be a linear space over $K$, where $K$ is $\mathbb{R}$ or $\mathbb{C}$. Let $T_K$ be the usual topology
on $K$. Let $T_V$ be the trivial topology $\{\emptyset, V\}$ on $V$.

Let $\Omega \in \mathrm{Top}(V) = T_V$. Then $\Omega = \emptyset$ or $\Omega = V$. If $\Omega = \emptyset$, then $\sigma_V^{-1}(\Omega) = \emptyset \in \mathrm{Top}(V \times V)$. If $\Omega = V$,
then $\sigma_V^{-1}(\Omega) = V \times V \in \mathrm{Top}(V \times V)$. So by Definition 31.12.4, $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, T_K, T_V)$ satisfies
Definition 39.1.3 (v).

Let $\Omega \in \mathrm{Top}(V) = \{\emptyset, V\}$. If $\Omega = \emptyset$, then $\mu^{-1}(\Omega) = \emptyset \in \mathrm{Top}(K \times V)$. If $\Omega = V$, then $\mu^{-1}(\Omega) = K \times V \in \mathrm{Top}(K \times V)$.
So $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, T_K, T_V)$ satisfies Definition 39.1.3 (vi). Hence $V$ is a topological
linear space.


If $V$ is the trivial linear space $\{0_V\}$, then $V$ satisfies the requirements for $T_0$, $T_1$ and $T_2$ spaces. But if $V$ is
non-trivial, $V$ is not even a $T_0$ space. (This is shown in Example 33.1.7.) So by Theorem 33.1.10, $V$ is not
a $T_1$ space, and by Theorem 33.1.26, $V$ is not a $T_2$ space.


39.1.5 REMARK: If a topological linear space is $T_1$, then it is $T_2$.

Theorem 39.1.8 (ii) asserts that if a topological linear space is $T_1$, then it is $T_2$. The proof uses the technical
result in Theorem 39.1.7, which uses the scaling-invariance property in Theorem 39.1.6 (iv). In a $T_2$ space,
every convergent sequence has a unique limit by Theorem 35.4.10. So it follows that in a $T_1$ topological
linear space, the limit of an infinite series as in Definition 39.2.6 must be unique. (This is asserted in
Theorem 39.2.8.) It seems likely that this is why some authors incorporate the $T_1$ condition as part of the
definition of a topological linear space. (See for example Rudin [130], page 7; Lang [23], page 5.)


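39.1.6 THEOREM: Translation and scaling invariance of the topology on a topological linear space.
Let $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, T_K, T_V)$ be a topological linear space. For all $u \in V$, define $f_u : V \to V$ by
$v \mapsto \sigma_V(u, v) = u + v$. For all $\lambda \in K \setminus \{0_K\}$, define $g_\lambda : V \to V$ by $v \mapsto \mu(\lambda, v) = \lambda v$.


(i) $f_u : V \to V$ is a homeomorphism for all $u \in V$.

(ii) $g_\lambda : V \to V$ is a homeomorphism for all $\lambda \in K \setminus \{0_K\}$.

(iii) $\forall u \in V,\ \forall U \in \mathrm{Top}(V),\ f_u(U) \in \mathrm{Top}(V)$.

(iv) $\forall \lambda \in K \setminus \{0_K\},\ \forall U \in \mathrm{Top}(V),\ g_\lambda(U) \in \mathrm{Top}(V)$.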

PROOF: For part (i), let $u \in V$. Then $f_u : V \to V$ is continuous by Definition 39.1.3 (v) and Theorem 32.10.10 (ii).
Similarly $f_{-u} : V \to V$ is continuous. But $f_u \circ f_{-u} = f_{-u} \circ f_u = \mathrm{id}_V$. So $f_u$ is a bijection.
Therefore $f_u : V \to V$ is a homeomorphism by Definition 31.14.2.

For part (ii), let $\lambda \in K \setminus \{0_K\}$. Then $g_\lambda : V \to V$ is continuous by Definition 39.1.3 (vi) and Theorem 32.10.10 (ii).
Similarly $g_{1/\lambda} : V \to V$ is continuous. But $g_\lambda \circ g_{1/\lambda} = g_{1/\lambda} \circ g_\lambda = \mathrm{id}_V$. So $g_\lambda$ is a
bijection. Therefore $g_\lambda : V \to V$ is a homeomorphism by Definition 31.14.2.

Part (iii) follows from part (i) and Definition 31.12.4. 


Part (iv) follows from part (ii) and Definition 31.12.4. 


39.1.7 THEOREM: Existence of neighbourhoods of zero which are closed under negation and addition.
Let $V$ be a topological linear space. Let $\Omega \in \mathrm{Top}_{0_V}(V)$. Then for some $U \in \mathrm{Top}_{0_V}(V)$,


(i) $\forall x \in V,\ (x \in U \Leftrightarrow -x \in U)$, and
(ii) $\forall x, y \in U,\ x + y \in \Omega$.


In other words, $U$ is a neighbourhood of $0_V$ such that $U = -U$ and $U + U \subseteq \Omega$. (See Definition 22.10.2 and
Notation 22.10.5 for the Minkowski negations and sums of sets.)


PROOF: Let $V$ be a topological linear space with zero element $0_V$. Let $\Omega \in \mathrm{Top}_{0_V}(V)$. Let $G = \sigma_V^{-1}(\Omega)$.
Then $(0_V, 0_V) \in G \subseteq V \times V$, where $V \times V$ has the product topology as in Definition 32.9.4. Therefore
$G \in \mathrm{Top}_{(0_V,0_V)}(V \times V)$ by Definition 31.12.4 because $\sigma_V : V \times V \to V$ is continuous. Then by Theorem 32.9.6 (iii),
there are sets $G_1, G_2 \in \mathrm{Top}_{0_V}(V)$ such that $G_1 \times G_2 \subseteq G$. Let $G_0 = G_1 \cap G_2$. Then $G_0 \in \mathrm{Top}_{0_V}(V)$ by
Definition 31.3.2 (ii), and $G_0 \times G_0 \in \mathrm{Top}_{(0_V,0_V)}(V \times V)$ by Theorem 32.9.6 (ii), and $G_0 \times G_0 \subseteq G$. So
$\sigma_V(G_0 \times G_0) \subseteq \Omega$. Let $-G_0$ denote $\{v \in V;\ -v \in G_0\}$. Then $-G_0 \in \mathrm{Top}_{0_V}(V)$ by Theorem 39.1.6 (iv).
Let $U = G_0 \cap (-G_0)$. Then $\forall x \in V,\ (x \in U \Leftrightarrow -x \in U)$, and $U \in \mathrm{Top}_{0_V}(V)$ by Definition 31.3.2 (ii).
Let $x, y \in U$. Then $(x, y) \in G_0 \times G_0$. So $x + y \in \sigma_V(G_0 \times G_0) \subseteq \sigma_V(G_1 \times G_2) \subseteq \sigma_V(G) \subseteq \Omega$ by
Theorem 10.7.1 (i). Thus $U$ satisfies (i) and (ii).


39.1.8 THEOREM: Sufficient conditions for a topological linear space to be Hausdorff.
Let $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, T_K, T_V)$ be a topological linear space.


(i) If $\{0_V\}$ is a closed subset of $V$, then $V$ is a Hausdorff space.
(ii) If $(V, T_V)$ is a $T_1$ space, then $(V, T_V)$ is a Hausdorff space.


PROOF: For part (i), suppose that $\{0_V\}$ is a closed subset of $V$. Then the singleton $\{v\}$ is closed for any
$v \in V$ by the translation invariance of $\mathrm{Top}(V)$ in Theorem 39.1.6 (i). (Alternatively apply Theorem 39.1.6 (iii)
to the open set $U = V \setminus \{0_V\}$ and apply translations to this.) Let $v \in V \setminus \{0_V\}$. Then $\{v\}$ is a closed subset
of $V$, and so $\Omega = V \setminus \{v\}$ is an open subset of $V$. Therefore $\Omega \in \mathrm{Top}_{0_V}(V)$, and so by Theorem 39.1.7, there
is a symmetric set $U_1 \in \mathrm{Top}_{0_V}(V)$ which satisfies $U_1 + U_1 \subseteq \Omega$. In other words, $(U_1 + U_1) \cap \{v\} = \emptyset$. Then
$U_1 \cap (\{v\} - U_1) = \emptyset$ by Theorem 22.10.6 (i). But $U_1 \cap (\{v\} - U_1) = U_1 \cap (\{v\} + U_1) = U_1 \cap (v + U_1)$ by
Definition 22.10.4 and the symmetry of $U_1$. So $U_1 \cap (v + U_1) = \emptyset$. But $U_1 \in \mathrm{Top}_{0_V}(V)$, and $v + U_1 \in \mathrm{Top}_v(V)$
by Theorem 39.1.6 (i). Hence $V$ is a Hausdorff space by Definition 33.1.24.


Part (ii) follows from part (i) and Theorem 33.1.15. 


39.1.9 THEOREM: Continuity of pointwise sums and scalar products of continuous TLS-valued functions.
Let $X < (X, T_X)$ be a topological space. Let $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, T_K, T_V)$ be a topological linear space.


(i) $\forall \lambda \in K,\ \forall f \in C(X, V),\ \lambda f \in C(X, V)$, where $\lambda f : X \to V$ denotes the pointwise scalar product
$\lambda f : x \mapsto \lambda f(x)$.

(ii) $\forall f, g \in C(X, V),\ f + g \in C(X, V)$, where $f + g : X \to V$ denotes the pointwise sum $f + g : x \mapsto
f(x) + g(x)$.


PROOF: For part (i), if $\lambda = 0_K$, then $\lambda f : x \mapsto 0_V$. So $\lambda f \in C(X, V)$ by Theorem 31.12.9. So assume $\lambda \ne 0_K$.
Let $\Omega \in \mathrm{Top}(V)$. Then $\lambda^{-1}\Omega \in \mathrm{Top}(V)$ by Theorem 39.1.6 (iv). So $(\lambda f)^{-1}(\Omega) = f^{-1}(\lambda^{-1}\Omega) \in \mathrm{Top}(X)$ by
Definition 31.12.4. Hence $\lambda f \in C(X, V)$.

For part (ii), the map $f \times g : X \to V \times V$ is continuous by Theorem 32.11.2 (i). The map $\sigma_V : V \times V \to V$
is continuous by Definition 39.1.3 (v). So the composite function $\sigma_V \circ (f \times g) : X \to V$ is continuous by
Theorem 31.12.7. But this is the map $x \mapsto \sigma_V(f(x), g(x)) = f(x) + g(x)$. Hence $f + g \in C(X, V)$.


39.2. Infinite series in topological linear spaces


39.2.1 REMARK: The partial-sum sequence for an infinite series. 

Convergence of sequences in topological linear spaces is a simple application of the concept of convergence 
of sequences in general topological spaces as in Section 35.4. The significant “value added” in a topological 
linear space is that for any given sequence, a partial-sum sequence may be constructed using the vector 
addition operation, which is commutative and associative. The partial-sum sequence is called a “series”. 


An infinite series is thought of as an infinite sum expression. In other words, an infinite series is an infinite 
sequence which is thought of as being progressively summed in the order of the elements in the sequence. 
Sums of finite sequences are well defined in general semigroups as in Definition 17.1.17. As mentioned in 
Remark 17.1.16, the result of the summation is order-independent if the semigroup is commutative. 


An “infinite sum” makes little sense without a topology, unless all but a finite number of elements of the 
sequence are equal to an additive identity for the semigroup in which the addition occurs. (An additive 
identity exists if the semigroup is a monoid as in Definition 17.1.17.) But such an infinite sum is merely a 
finite sum with an infinite amount of “padding”. A true infinite sum is not well defined without some notion 
of convergence. (Whether one considers the limit of a sequence of partial sums to be the same thing as an 
“infinite sum” is a philosophical question. There seems to be little harm in stating that infinite sums are in 
fact the same thing as limits of partial-sum sequences.) 


A topological linear space does possess the necessary topology, and the set of vectors in a linear space is a 
commutative group, which together amply meet the requirements for order-independent sums of finite series. 
(Convergence of infinite series would be well defined in more general “topological commutative groups”, but 
convergence theory for such a broad category is probably not very useful.) 


The index set $I$ in Definition 39.2.2 could easily be replaced with a general countably infinite set which is
order-isomorphic to $\mathbb{Z}_0^+$. Then the index $i - 1$ would need to be replaced by $\max\{j \in I;\ j < i\}$. Examples
of such alternative index sets would be the set of all even positive integers or the set of all negative integers
(with reverse order).


In the inductive construction procedure in Definition 39.2.2, the partial-sum sequence could be defined so
that $s_{i_0} = 0$ and line (39.2.1) is replaced by $\forall i \in I \setminus \{i_0\},\ s_i = s_{i-1} + a_{i-1}$. There are arguments for
and against either choice. Convention favours defining $s_i = \sum_{j=i_0}^{i} a_j$ rather than $s_i = \sum_{j=i_0}^{i-1} a_j$. Thus
the index $i$ of $(s_i)_{i \in I}$ equals the index of the last term which is present in the partial sum. This has the
disadvantage that there is no partial sum containing zero terms. In other words, the partial-sum sequence
does not commence with the “empty sum”. In practice, it is generally not onerous to explicitly state which
style is intended.


In the customary mathematical way of thinking, a "series" is an infinite algebraic sum expression of the
form $a_0 + a_1 + a_2 + \ldots$, but since an infinite algebraic expression is a somewhat metaphysical concept, it is
replaced in Definition 39.2.2 by the more concrete concept of an infinite sequence of partial sums. Then a 
"series" is defined by its relation to a given sequence, not as a class of object in itself. The series is not a 
sequence of terms or sums of terms, but rather the partial-sum sequence associated with a given sequence. 
Without this association, a series would be the same thing as a sequence. 


39.2.2 DEFINITION: The partial-sum sequence of a sequence $(a_i)_{i \in I}$ in a linear space, where $I = \mathbb{Z}_0^+$ or
$I = \mathbb{Z}^+$, is the sequence $(s_i)_{i \in I}$ defined inductively by $s_{i_0} = a_{i_0}$, where $i_0 = \min(I)$, and


$\forall i \in I \setminus \{i_0\},\ s_i = s_{i-1} + a_i$. (39.2.1)


The (infinite) series of an infinite sequence $(a_i)_{i \in I}$ in a linear space means the partial-sum sequence $(s_i)_{i \in I}$
of $(a_i)_{i \in I}$.
The (infinite) series $(a_i)_{i \in I}$ in a linear space means the partial-sum sequence $(s_i)_{i \in I}$ of $(a_i)_{i \in I}$.
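
As an informal illustration only, the inductive construction in line (39.2.1) can be sketched in Python for a finite initial segment of a real-valued sequence. The function name `partial_sums` is a hypothetical choice for this sketch, and real numbers stand in for vectors of a general linear space.

```python
from itertools import accumulate

def partial_sums(terms):
    """Return the partial sums s_i = s_{i-1} + a_i of a finite list of terms.

    The first partial sum equals the first term, as in Definition 39.2.2.
    """
    return list(accumulate(terms))

# First four terms a_i = 2^{-i} for i = 0, 1, 2, 3.
a = [1.0, 0.5, 0.25, 0.125]
s = partial_sums(a)
```

Here the index of each partial sum matches the index of its last term, which is the convention adopted in Definition 39.2.2.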


39.2.3 REMARK: Explicit indication of inductive construction of partial-sum sequences. 
It may seem at first that one may define the partial-sum sequence $(s_i)_{i \in I}$ by


$\forall i \in I,\ s_i = \sum \{a_j;\ j \in I \text{ and } j \le i\}$,


but this produces the wrong result if $(a_i)_{i \in I}$ is not an injective sequence. Therefore the inductive construction
procedure must be indicated explicitly. Alternatively, partial-sum notations such as $\sum_{j=i_0}^{i} a_j$ in Definitions
17.1.17 or 18.10.2 may be applied here.
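
The difficulty with a set-based sum can be seen in a small Python sketch, assuming a finite real sequence with a repeated term. The variable names are illustrative only.

```python
from itertools import accumulate

a = [1, 1, 2]  # not injective: the value 1 occurs twice

# Set-based "sum" over {a_j; j <= i}: repeated values collapse into one element.
set_based = [sum({a[j] for j in range(i + 1)}) for i in range(len(a))]

# Inductive construction s_i = s_{i-1} + a_i: every term is counted.
inductive = list(accumulate(a))
```

The two lists agree at the first index and differ as soon as a value is repeated, which is why the inductive procedure is the one used in Definition 39.2.2.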


39.2.4 REMARK: Definition of the value of an “infinite sum”.

Definition 39.2.2 does not make any reference to topology. So it could have been presented in Chapter 22. 
(See for example Definition 22.8.7, where the sum of an infinite family of vectors is well defined if all but a 
finite number of vectors are equal to zero.) However, the first thing one wants to know about an “infinite sum” 
expression, where infinitely many terms are non-zero, is whether it has a well-defined meaning. Therefore 
it seems reasonable to introduce infinite series when the convergence of sequences has already been defined, 
which requires topology. 


The notion of convergence of a sequence is introduced in Definition 35.4.2, and uniqueness of the limit of a 
sequence is guaranteed in Hausdorff spaces by Theorem 35.4.10. Since an infinite series is defined to be an 
infinite sequence (which has a particular kind of relation to some given infinite sequence), the convergence 
properties of series are a straightforward application of the theory for infinite sequences. The value of an 
infinite series is simply the limit of its partial-sum sequence, if it exists and is unique. Otherwise the value 
is undefined. 


Since the theory of convergence of series is merely a special case of the theory for sequences, one might well 
ask why there is any need for the concept of an infinite series at all. One answer is that it is often not
convenient to write down a closed formula for the sum of n terms for general n. There are many kinds 
of sequences which have simple formulas, but whose partial sum sequences are very difficult to write as a 
closed formula other than as a sum-of-n-terms formula. This is similar to the situation with integration, 
where a simple integrand may not be expressible as anything other than an integral expression. (There are 
in fact close parallels between sums of series and integrals of functions.) Another answer is that infinite 
series typically arise from particular application contexts, as for example power series for analytic functions 
or Fourier series for integrable functions. (See Section 42.8 for analytic functions.) 


39.2.5 REMARK: Informal expressions for infinite series. 

An expression of the form “the (infinite) series $(a_i)_{i \in I}$” is an informal way of saying “the (infinite) series
of $(a_i)_{i \in I}$”. In other words, the series is not the sequence $(a_i)_{i \in I}$, but rather the corresponding partial-sum
sequence. (This kind of informal language is used in Definition 39.2.6 for example.)


Even more informal is an expression of the form “the (infinite) series $\sum_{i \in I} a_i$”, which could mean either
the partial-sum sequence of $(a_i)_{i \in I}$ or the limit (if it exists and is unique) of the partial-sum sequence. The
meaning of the symbolic expression “$\sum_{i \in I} a_i$” is thus ambiguous.

The word “sum”, applied to an infinite sequence, means a limit of its partial-sum sequence. This distinguishes
a limit of the sequence $(s_i)_{i \in I} = (\sum_{j=i_0}^{i} a_j)_{i \in I}$ from a limit of the original sequence $(a_i)_{i \in I}$.

39.2.6 DEFINITION: A sum of an infinite series $(a_i)_{i \in I}$ in a topological linear space $V$ is a limit in $V$ of
the partial-sum sequence of $(a_i)_{i \in I}$.


39.2.7 DEFINITION: A convergent series in a topological linear space V is an infinite series for which a 
limit of the partial-sum sequence exists in that space. 


A divergent series in a topological linear space V is an infinite series for which there exists no limit of the 
partial-sum sequence in that space. 


39.2.8 THEOREM: Every convergent series in a $T_1$ TLS has a unique sum.
The sum of an infinite series in a $T_1$ topological linear space is unique if it exists.


In other words, every convergent series in a $T_1$ topological linear space has a unique sum in the space.


PROOF: The assertion follows from Theorems 35.4.10 and 39.1.8 (ii).


39.2.9 DEFINITION: The sum of a convergent (infinite) series $(a_i)_{i \in I}$ in a $T_1$ topological linear space $V$ is
the limit in $V$ of the partial-sum sequence of $(a_i)_{i \in I}$.


39.2.10 NOTATION: $\sum_{i \in I} a_i$, for a convergent infinite series $(a_i)_{i \in I}$ in a $T_1$ topological linear space, denotes
the unique limit of its partial-sum sequence.


$\sum_{i=i_0}^{\infty} a_i$, for a convergent infinite series $(a_i)_{i \in I}$ in a $T_1$ topological linear space, denotes the unique limit of
its partial-sum sequence, where $i_0 = \min(I)$.

$\sum a$, for a convergent infinite series $a = (a_i)_{i \in I}$ in a $T_1$ topological linear space, denotes the unique limit of
its partial-sum sequence.


39.2.11 THEOREM: If a series in a $T_1$ TLS is convergent, its elements must converge to zero.
Let $(a_i)_{i \in I}$ be a convergent series in a $T_1$ topological linear space $V$. Then $(a_i)_{i \in I}$ is a convergent sequence
with the unique limit $0 \in V$.


PROOF: Let $(a_i)_{i \in I}$ be a convergent series in a $T_1$ topological linear space $V$. Then there is a unique $v \in V$
such that the partial-sum sequence $(s_i)_{i \in I} = (\sum_{j=i_0}^{i} a_j)_{i \in I}$ converges to $v$, where $i_0 = \min(I)$. So by Definition 35.4.2,


$\forall G \in \mathrm{Top}_v(V),\ \exists n \in I,\ \forall i \ge n,\ s_i \in G$.


Let $\Omega \in \mathrm{Top}_0(V)$. Then by Theorem 39.1.7, there is a set $U \in \mathrm{Top}_0(V)$ such that $U = -U$
and $U + U \subseteq \Omega$. Let $G = v + U$ as in Definition 22.10.4. Then $G \in \mathrm{Top}_v(V)$ by Theorem 39.1.6 (iii). So
there is an $n \in I$ such that $s_i \in G$ for all $i \ge n$. It follows that $a_i = s_i - s_{i-1} \in G - G$ for all $i > n$. But
$G - G = (v + U) - (v + U) = U - U = U + U \subseteq \Omega$. Therefore $a_i \in \Omega$ for all $i > n$. So 0 is a limit of the
sequence $(a_i)_{i \in I}$ by Definition 35.4.2. But this limit is unique by Theorem 39.2.8. Hence $(a_i)_{i \in I}$ converges
to the unique limit $0 \in V$.
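
The converse of Theorem 39.2.11 does not hold in general. The following Python sketch, which uses the real numbers as the space and a hypothetical helper name, compares the geometric series (convergent, terms tending to zero) with the harmonic series (terms tending to zero, partial sums unbounded). It is a numerical illustration over finitely many terms, not a proof.

```python
from itertools import accumulate

def terms_and_partial_sums(term, n):
    """Return the first n terms a_1, ..., a_n and their partial sums."""
    a = [term(i) for i in range(1, n + 1)]
    return a, list(accumulate(a))

# Geometric series a_i = 2^{-i}: the partial sums approach 1 and the terms approach 0.
geo_terms, geo_sums = terms_and_partial_sums(lambda i: 0.5 ** i, 50)

# Harmonic series a_i = 1/i: the terms approach 0, but the partial sums keep growing.
har_terms, har_sums = terms_and_partial_sums(lambda i: 1.0 / i, 10000)
```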


39.3. Normed linear spaces 


39.3.1 REMARK: Normed linear spaces are topological linear spaces. 
A norm on a linear space V as in Definition 24.7.2 induces a metric on V by Theorem 37.1.8, which then 
induces a topology on V by Definition 37.5.2. 


39.3.2 DEFINITION: The distance function induced by a norm $\psi$ on a linear space $V$ is the function $d :
V \times V \to \mathbb{R}_0^+$ given by $d(x, y) = \psi(y - x)$ for all $x, y \in V$.


The topology induced by a norm $\psi$ on a linear space $V$ is the topology induced on $V$ by the distance function
which is induced by $\psi$ on $V$.
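
A minimal Python sketch of the induced distance function follows, assuming vectors in a Euclidean space represented as lists of floats. The Euclidean norm is just one example of a norm, and the names `euclidean_norm` and `induced_distance` are hypothetical.

```python
import math

def euclidean_norm(v):
    """Euclidean norm on R^n, used here as one example of a norm."""
    return math.sqrt(sum(x * x for x in v))

def induced_distance(norm):
    """Return the distance function d(x, y) = norm(y - x) induced by a norm."""
    def d(x, y):
        return norm([yi - xi for xi, yi in zip(x, y)])
    return d

d = induced_distance(euclidean_norm)
```

For example, `d([0.0, 0.0], [3.0, 4.0])` evaluates the norm of the difference vector `[3.0, 4.0]`.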


39.3.3 REMARK: Every normed linear space is a $T_1$ topological linear space.

Theorem 39.3.4 asserts that every normed linear space is a $T_1$ topological linear space. An immediate benefit
of this is the inheritance of various properties of infinite series. In particular, Theorem 39.3.5 asserts that 
the terms of a convergent series must converge to the zero vector. This does not require completeness of the 
normed linear space because the zero vector is guaranteed to not be a “gap” in the space. 


39.3.4 THEOREM: Every normed real or complex linear space is a $T_1$ TLS.

Let $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, |\cdot|_K, \psi)$ be a normed linear space, where $K$ is $\mathbb{R}$ or $\mathbb{C}$. (See Definition 24.7.5.)
Let $T_V$ be the topology induced by $\psi$ on $V$. Then $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu, T_K, T_V)$ is a $T_1$ topological
linear space.


PROOF: To show the continuity of the vector addition operation $\sigma_V$ with respect to $T_V$, let $(x, y) \in V \times V$.
Let $z = \sigma_V(x, y) = x + y \in V$. Let $G \in \mathrm{Top}_z(V)$. Then $B_{z,r} \subseteq G$ for some $r \in \mathbb{R}^+$ by Theorem 37.5.6 (ii),
where the open ball $B_{z,r}$ is with respect to the distance function $d : V \times V \to \mathbb{R}_0^+$ induced by $\psi$. Let
$\Omega = B_{x,r/2} \times B_{y,r/2}$. Then $\Omega \in \mathrm{Top}_{(x,y)}(V \times V)$ by Theorems 37.5.6 (i) and 32.9.6 (ii). Let $(x', y') \in \Omega$.
Then $d(\sigma_V(x', y'), \sigma_V(x, y)) = \psi(\sigma_V(x, y) - \sigma_V(x', y')) = \psi((x + y) - (x' + y')) = \psi((x - x') + (y - y'))
\le \psi(x - x') + \psi(y - y') < r$ by Definition 24.7.2 (ii). Thus $\sigma_V(\Omega) \subseteq B_{z,r} \subseteq G$. Therefore $\sigma_V$ is continuous at $(x, y)$ for all $(x, y) \in V \times V$ by
Definition 35.3.2. So $\sigma_V$ is continuous by Theorem 35.3.3.


Similarly, to show the continuity of the scalar product operation $\mu$ with respect to $T_V$, let $(\lambda, v) \in K \times V$.
Let $z = \mu(\lambda, v) = \lambda v \in V$. Let $G \in \mathrm{Top}_z(V)$. Then $B_{z,r} \subseteq G$ for some $r \in \mathbb{R}^+$ by Theorem 37.5.6 (ii). Let
$r_v = \frac{r}{3} \max(1, |\lambda|_K)^{-1}$. Then $r_v \in \mathbb{R}^+$. Let $r_\lambda = \frac{r}{3} (\psi(v) + r_v)^{-1}$. Then $r_\lambda \in \mathbb{R}^+$. Let $\Omega = B_{\lambda,r_\lambda} \times B_{v,r_v}$.
Then $\Omega \in \mathrm{Top}_{(\lambda,v)}(K \times V)$ by Theorems 37.5.6 (i) and 32.9.6 (ii). Let $(\lambda', v') \in \Omega$. Then

$d(\lambda' v', \lambda v) = \psi(\lambda' v' - \lambda v)$
$\le \psi(\lambda' v' - \lambda v') + \psi(\lambda v' - \lambda v)$
$= |\lambda' - \lambda|_K\, \psi(v') + |\lambda|_K\, \psi(v' - v)$
$\le |\lambda' - \lambda|_K\, (\psi(v) + \psi(v' - v)) + |\lambda|_K\, \psi(v' - v)$ (39.3.1)
$< r_\lambda (\psi(v) + r_v) + |\lambda|_K\, r_v$
$\le r/3 + r/3 < r$,

where line (39.3.1) follows from Theorem 24.6.6 (iii). Thus $\mu(\Omega) \subseteq G$. Therefore $\mu$ is continuous at $(\lambda, v)$
for all $(\lambda, v) \in K \times V$ by Definition 35.3.2. So $\mu$ is continuous by Theorem 35.3.3. Consequently $V$ is a
topological linear space.


$V$ is a $T_1$ space by Theorems 37.6.9 and 33.1.26. Hence $V$ is a $T_1$ topological linear space.
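
The estimate for scalar multiplication in the proof of Theorem 39.3.4 can be checked numerically in the simplest case. The Python sketch below assumes $K = V = \mathbb{R}$ with the absolute value as norm, and assumes the radii $r_v = (r/3)\max(1, |\lambda|)^{-1}$ and $r_\lambda = (r/3)(\psi(v) + r_v)^{-1}$, which are one choice consistent with the final bound $r/3 + r/3 < r$. The function names are hypothetical, and random sampling illustrates the bound rather than proving it.

```python
import random

def radii(lam, norm_v, r):
    """Return (r_lam, r_v) for the point (lam, v), where norm_v = psi(v)."""
    r_v = (r / 3.0) / max(1.0, abs(lam))
    r_lam = (r / 3.0) / (norm_v + r_v)
    return r_lam, r_v

def check_product_estimate(lam, v, r, trials=1000, seed=0):
    """Sample (lam', v') inside the two balls and test |lam' v' - lam v| < r."""
    rng = random.Random(seed)
    r_lam, r_v = radii(lam, abs(v), r)
    for _ in range(trials):
        # The factor 0.999 keeps the samples strictly inside the open balls.
        lam2 = lam + 0.999 * r_lam * rng.uniform(-1.0, 1.0)
        v2 = v + 0.999 * r_v * rng.uniform(-1.0, 1.0)
        if not abs(lam2 * v2 - lam * v) < r:
            return False
    return True
```

The radius for $v'$ is chosen first because the radius for $\lambda'$ depends on it through the bound $\psi(v') \le \psi(v) + r_v$.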


39.3.5 THEOREM: If a series in a normed linear space is convergent, its elements must converge to zero.
Let $(a_i)_{i \in I}$ be a convergent series in a normed linear space $V$. Then the sequence $(a_i)_{i \in I}$ converges to the
unique limit $0 \in V$.


PROOF: Let $V$ be a normed linear space. Then $V$ is a $T_1$ topological linear space by Theorem 39.3.4. Let
$(a_i)_{i \in I}$ be a convergent series with respect to the topology induced by the norm on $V$. Then $(a_i)_{i \in I}$ is a
convergent sequence with unique limit $0 \in V$ with respect to the topology on $V$ by Theorem 39.2.11.


39.3.6 THEOREM: Norms on real linear spaces are continuous with respect to their induced topologies.
Let $\psi$ be a norm on a real linear space $V$. Then $\psi$ is a continuous function with respect to the topology
induced by $\psi$.


PROOF: By Theorem 24.6.6 (iv), $|\psi(x) - \psi(y)| \le \psi(x - y)$ for all $x, y \in V$. Therefore $\psi$ is continuous by
Theorem 38.1.3.


39.4. Banach spaces 


39.4.1 REMARK: Banach spaces. 

Banach spaces are complete normed linear spaces over either the real numbers or the complex numbers. (See 
Section 24.7 for normed linear spaces. See Section 37.8 for complete metric spaces.) The principal motivation 
for defining Banach spaces is to extend basic differential and integral calculus concepts for Euclidean spaces 
to as large a class of spaces as possible. Banach spaces are not required to be equipped with an inner product. 
Hilbert spaces are intended for those concepts which do require an inner product. 


39.4.2 DEFINITION: A (real) Banach space is a real normed linear space whose induced metric space
structure is complete. In other words, a real Banach space is a tuple $V < (\mathbb{R}, V, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_V, \mu, \psi)$ such that


(i) $\mathbb{R} < (\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ is the field of real numbers,

(ii) $V < (\mathbb{R}, V, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_V, \mu)$ is a linear space over $\mathbb{R}$, as in Definition 22.1.15,
(iii) $\psi : V \to \mathbb{R}$ is a norm on $V$,
(iv) the metric space $(V, d_\psi)$ induced by $\psi$ on $V$ is complete.


39.4.3 REMARK: Complex Banach spaces. 

The unitary groups, which arise as structure groups for connections on differentiable principal bundles 
in gauge theory, act on finite-dimensional complex linear spaces, which are complex Banach spaces when 
equipped with a suitable norm. Complex Banach spaces also appear in connection with the complex Hilbert 
spaces which arise in quantum theory. 


39.4.4 DEFINITION: A (complex) Banach space is a complex normed linear space whose induced metric
space structure is complete. In other words, a complex Banach space is a tuple $V < (\mathbb{C}, V, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}, \sigma_V, \mu, \psi)$
such that


(i) $\mathbb{C} < (\mathbb{C}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}})$ is the field of complex numbers,

(ii) $V < (\mathbb{C}, V, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}, \sigma_V, \mu)$ is a linear space over $\mathbb{C}$, as in Definition 22.1.16,
(iii) $\psi : V \to \mathbb{R}$ is a norm on $V$,
(iv) the metric space $(V, d_\psi)$ induced by $\psi$ on $V$ is complete.


39.4.5 THEOREM: The complex number system is a complex Banach space. 

Let $(\mathbb{C}, \mathbb{C}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}})$ be the linear space $\mathbb{C}$ over $\mathbb{C}$ (as in Definition 22.1.9). Let $\psi : \mathbb{C} \to \mathbb{R}_0^+$ be the
standard absolute value function on $\mathbb{C}$ (as in Definition 16.8.8). Then the tuple $(\mathbb{C}, \mathbb{C}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}, \psi)$ is
a complex Banach space.

PROOF: The standard absolute value function on $\mathbb{C}$ is an absolute value function on $\mathbb{C}$ by Theorem 18.5.18.
In other words, $\psi$ satisfies the product rule, triangle inequality and positivity conditions in Definition 18.5.6.
So $\psi$ satisfies the corresponding scalarity, subadditivity and positivity conditions for a norm on the linear
space $\mathbb{C}$ in Definition 24.7.2. Therefore $\mathbb{C} < (\mathbb{C}, \mathbb{C}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}, \sigma_{\mathbb{C}}, \tau_{\mathbb{C}}, \psi)$ is a normed linear space over $\mathbb{C}$. The
completeness of this space follows from Definition 32.6.12 and Theorem 37.8.14 (ii). Hence the complex linear
space $\mathbb{C}$ is a complex Banach space.


39.4.6 THEOREM: If a complex-number series is convergent, its elements must converge to zero. 
Let $(a_i)_{i \in I}$ be a convergent series in $\mathbb{C}$. Then the sequence $(a_i)_{i \in I}$ converges to the unique limit $0 \in \mathbb{C}$.


PROOF: The assertion follows from Theorems 39.3.5 and 39.4.5. 


39.5. Finite-dimensional normed linear spaces 


39.5.1 REMARK: Equivalence of topologies induced by norms on finite-dimensional linear spaces. 
Theorem 39.5.2 generalises Theorem 24.8.9 from $\dim(V) < 2$ to $\dim(V) \in \mathbb{Z}_0^+$. The compactness of the
image of a compact set by a continuous function yields the positive lower bound for a norm in line (24.8.6). 
When this is combined with the analogous upper bound in Theorem 24.8.2, the result is that any two norms 
are equivalent according to Definition 24.8.11, and then by Theorem 37.5.10, the topologies induced by any 
two norms must be the same. This justifies the term “equivalent norms” in Definition 24.8.11. 


39.5.2 THEOREM: Lower bound for the norm of a linear combination in terms of individual norms.
Let $\psi$ be a norm on a finite-dimensional real linear space $V$. Let $B = (e_i)_{i=1}^n$ be a basis for $V$. Then


$\exists k \in \mathbb{R}^+,\ \forall \lambda \in \mathbb{R}^n,\ \psi(\sum_{i=1}^n \lambda_i e_i) \ge k |\lambda|_1 \max_{i=1}^n \psi(e_i)$, (39.5.1)


where $n = \dim(V)$.


PROOF: Let $K_0 = \{\lambda \in (\mathbb{R}_0^+)^n;\ \sum_{i=1}^n \lambda_i = 1\}$. Then $K_0$ is a compact subset of $\mathbb{R}^n$. Define $f : K_0 \to \mathbb{R}$
by $f : \lambda \mapsto \psi(\sum_{i=1}^n \lambda_i e_i)$. Then $f$ is continuous because $\psi$ is continuous by Theorem 39.3.6. So $f(K_0)$ is a
compact subset of $\mathbb{R}$ by Theorem 33.5.15. Let $\varepsilon_0 = \inf(f(K_0))$. Then $\varepsilon_0 \in f(K_0)$ by the compactness of
$f(K_0)$. So $f(v) = \varepsilon_0$ for some $v \in K_0$. However, $0 \notin f(K_0)$ by Definition 24.7.2 (iii). So $\varepsilon_0 \ne 0$. Therefore
$\forall v \in K_0,\ f(v) \ge \varepsilon_0 > 0$. There is a similar positive lower bound for $f$ on each of the other “facets” of
the “hyper-octahedron” $K = \{\lambda \in \mathbb{R}^n;\ \sum_{i=1}^n |\lambda_i| = 1\}$. (Note that $K$ is the same as the “unit ball” of the
power-norm $|\cdot|_1$ on $\mathbb{R}^n$.) Thus line (39.5.1) is satisfied with $k$ equal to the minimum of the lower bounds
for the $2^n$ facets.


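39.5.3 THEOREM: Equivalence of norms on finite-dimensional real linear spaces.
Let $\psi_1, \psi_2$ be norms on a finite-dimensional real linear space $V$.

(i) $\psi_1$ and $\psi_2$ are equivalent norms on $V$.

(ii) The topologies induced on $V$ by $\psi_1$ and $\psi_2$ are the same.

(iii) The topologies induced on $V$ by $\psi_1$ and $\psi_2$ are the same as the standard topology induced on $V$ by any
component map.


PROOF: Let $\psi_1, \psi_2$ be norms on a finite-dimensional real linear space $V$. Let $B = (e_i)_{i=1}^n$ be a basis for $V$.
Let $v \in V$. Then $v = \sum_{i=1}^n \lambda_i e_i$ for some unique $\lambda = (\lambda_i)_{i=1}^n \in \mathbb{R}^n$. So $\psi_1(v) \le |\lambda|_1 \max_{i=1}^n \psi_1(e_i)$ by
Theorem 24.8.2. But by Theorem 39.5.2, $\psi_2(v) \ge k |\lambda|_1 \max_{i=1}^n \psi_2(e_i)$ for some constant $k \in \mathbb{R}^+$ which is
independent of $v$. Let $c = k \max_{i=1}^n \psi_2(e_i) / \max_{i=1}^n \psi_1(e_i)$. Then $c \in \mathbb{R}^+$ and


$\forall v \in V,\ c\,\psi_1(v) \le c\,|\lambda|_1 \max_{i=1}^n \psi_1(e_i)$
$= k |\lambda|_1 \max_{i=1}^n \psi_2(e_i)$
$\le \psi_2(v)$.


Swapping $\psi_1$ and $\psi_2$ in this argument gives a constant $c' \in \mathbb{R}^+$ such that $\forall v \in V,\ c'\,\psi_2(v) \le \psi_1(v)$.
Let $C = (c')^{-1}$. Then $\forall v \in V,\ \psi_2(v) \le C\,\psi_1(v)$. So by Definition 24.8.11, the norms
are equivalent.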

Part (ii) follows from part (i) and Theorem 37.5.10 (ii).

For part (iii), let $B$ be a basis for $V$. Let $T$ be the topology induced by the component map $\kappa_B$ on $V$ as
in Definition 32.6.6. Then by Definition 32.6.1, $T$ is generated by the set of all Cartesian products of real
open intervals. These products include the open balls corresponding to the power norm $|\cdot|_\infty$ on $\mathbb{R}^n$. But
$n^{-1} |\lambda|_1 \le |\lambda|_\infty \le |\lambda|_1$ for all $\lambda \in \mathbb{R}^n$. So the power norms $|\cdot|_1$ and $|\cdot|_\infty$ are equivalent. But the norms
$\psi_1 \circ \phi_B$ and $\psi_2 \circ \phi_B$ are equivalent to $|\cdot|_1$ by Theorems 24.8.2 and 39.5.2, where $\phi_B : \mathbb{R}^n \to V$ is the linear
combination map $\phi_B : \lambda \mapsto \sum_{i=1}^n \lambda_i e_i$. Therefore all three norms are equivalent on $\mathbb{R}^n$. Hence they induce
the same topology.
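
The standard comparison between the power norms $|\cdot|_1$ and $|\cdot|_\infty$ on $\mathbb{R}^n$, namely $n^{-1}|\lambda|_1 \le |\lambda|_\infty \le |\lambda|_1$, can be spot-checked with a short Python sketch. The function names and the random sampling scheme are assumptions of this illustration, which is a sanity check rather than a proof.

```python
import random

def norm_1(lam):
    """Power norm |lam|_1: the sum of the absolute values of the components."""
    return sum(abs(x) for x in lam)

def norm_inf(lam):
    """Power norm |lam|_inf: the largest absolute value of a component."""
    return max(abs(x) for x in lam)

def check_bounds(n, trials=1000, seed=0):
    """Check |lam|_1 / n <= |lam|_inf <= |lam|_1 on random vectors in R^n."""
    rng = random.Random(seed)
    tol = 1e-12  # absorbs floating-point rounding in the sums
    for _ in range(trials):
        lam = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        if not (norm_1(lam) / n <= norm_inf(lam) + tol
                and norm_inf(lam) <= norm_1(lam) + tol):
            return False
    return True
```

The constants $n^{-1}$ and 1 depend only on the dimension, which is what makes the two norms equivalent in the sense used in the proof.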


39.5.4 REMARK: Finite-dimensional normed linear spaces are Banach spaces. 

By Theorem 39.5.3 (iii), all norms on a finite-dimensional linear space are topologically equivalent to the 
standard topology. The definition of a Banach space specifies a particular norm, not a particular topology. 
So two finite-dimensional Banach spaces may have the same topology but a different norm. 


Even in the case of a one-dimensional Banach space, or the space $\mathbb{R}$ regarded as a Banach space over the
field $\mathbb{R}$, the choice of norm may differ by a constant multiple. (See Definition 22.1.9 for the linear space $K$
over the field $K$ itself.) However, in Theorem 39.5.7, it is assumed that the norm on $\mathbb{R}$ is the same as the
standard absolute value function. 


In a wide range of situations where definitions or theorems are valid for finite-dimensional normed spaces, 
those definitions or theorems are equally valid for Banach spaces. Often the methods of proof may be 
extended by simply replacing the text “finite-dimensional normed space” with “Banach space”. 


39.5.5 THEOREM: Every normed finite-dimensional real linear space is a real Banach space.
Let $\psi$ be a norm on a finite-dimensional real linear space $V < (\mathbb{R}, V, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_V, \mu)$.
Then $V < (\mathbb{R}, V, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_V, \mu, \psi)$ is a real Banach space.


PROOF: Definition 39.4.2 parts (i), (ii) and (iii) are clearly satisfied. Part (iv) follows from Theorems 
37.8.14 (ii) and 39.5.3 (iii). 


39.5.6 THEOREM: If a real-number series is convergent, its elements must converge to zero.
Let $(a_i)_{i \in I}$ be a convergent series in $\mathbb{R}$. Then the sequence $(a_i)_{i \in I}$ converges to the unique limit $0 \in \mathbb{R}$.


PROOF: The assertion follows from Theorems 39.3.5 and 39.5.5.


39.5.7 THEOREM: The real number system is a real Banach space. 

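Let $(\mathbb{R}, \mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ be the linear space $\mathbb{R}$ over $\mathbb{R}$ (as in Definition 22.1.9). Let $\psi : \mathbb{R} \to \mathbb{R}$
be the standard absolute value function $\psi : x \mapsto \max(x, -x)$ on $\mathbb{R}$ (as in Definition 16.5.2). Then
$(\mathbb{R}, \mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \psi)$ is a real Banach space.


PROOF: The standard absolute value function on $\mathbb{R}$ is an absolute value function on $\mathbb{R}$ by Theorem 18.5.15.
In other words, $\psi$ satisfies the product rule, triangle inequality and positivity conditions in Definition 18.5.6.
So $\psi$ satisfies the corresponding scalarity, subadditivity and positivity conditions for a norm on the linear
space $\mathbb{R}$ in Definition 24.7.2. Therefore $\mathbb{R} \mathrel{-\!\!<} (\mathbb{R}, \mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \psi)$ is a normed linear space over $\mathbb{R}$. The
completeness of this space follows from Theorem 37.8.14 (i). Hence the real linear space $\mathbb{R}$ is a real Banach
space.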
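
The three norm conditions named in the proof can be spot-checked for the function $x \mapsto \max(x, -x)$. The sketch below uses exact rational samples as stand-ins for real numbers. The names `abs_val` and `satisfies_norm_conditions` are hypothetical, and checking finitely many samples illustrates the conditions without proving them.

```python
from fractions import Fraction
from itertools import product

def abs_val(x):
    """Standard absolute value, written as max(x, -x)."""
    return max(x, -x)

def satisfies_norm_conditions(samples):
    """Check scalarity, subadditivity and positivity on all pairs of samples."""
    for x, y in product(samples, repeat=2):
        if abs_val(x * y) != abs_val(x) * abs_val(y):   # scalarity (product rule)
            return False
        if abs_val(x + y) > abs_val(x) + abs_val(y):    # subadditivity
            return False
    # positivity: never negative, and zero only at zero
    return all(abs_val(x) >= 0 and (abs_val(x) == 0) == (x == 0) for x in samples)

samples = [Fraction(n, d) for n in range(-4, 5) for d in (1, 2, 3)]
```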


39.6. Topological groups of linear transformations 


39.6.1 REMARK: Norms for real square matrices. 
As shown in Theorem 25.12.4 (iii), the upper bound function $\mu^+ : M_{n,n}(\mathbb{R}) \to \mathbb{R}_0^+$ in Definition 25.12.2 is
a norm on the real linear space $M_{n,n}(\mathbb{R})$ of real square matrices for $n \in \mathbb{Z}^+$. With respect to the topology
induced by this norm, the inversion operation $A \mapsto A^{-1}$ for $A \in M^{\mathrm{inv}}_{n,n}(\mathbb{R})$ is continuous.


39.6.2 THEOREM: Matrix inversion is a topological automorphism of the group of invertible matrices. 
For all $n \in \mathbb{Z}^+$, let $M_{n,n}(\mathbb{R})$ have the distance function and topology induced by the norm $\mu^+$, the upper
bound function on $M_{n,n}(\mathbb{R})$.


(i) $M^{\mathrm{inv}}_{n,n}(\mathbb{R})$ is an open subset of $M_{n,n}(\mathbb{R})$.
(ii) The matrix inversion map $A \mapsto A^{-1}$ is a topological automorphism on $M^{\mathrm{inv}}_{n,n}(\mathbb{R})$.

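PROOF: For part (i), let $A \in M^{\mathrm{inv}}_{n,n}(\mathbb{R})$. Let $r = \mu^-(A)$, where $\mu^- : M_{n,n}(\mathbb{R}) \to \mathbb{R}_0^+$ is the matrix lower
bound function in Definition 25.12.2. Then $r > 0$ by Theorem 25.12.3 (xii). Let $B_{A,r} = \{A' \in M_{n,n}(\mathbb{R});\ \mu^+(A' - A) < r\}$.
Then $\mu^-(A') \ge \mu^-(A) - \mu^+(A' - A) > 0$ for any $A' \in B_{A,r}$. So $A'$ is invertible for
all $A' \in B_{A,r}$. Thus $B_{A,r} \subseteq M^{\mathrm{inv}}_{n,n}(\mathbb{R})$. Therefore $M^{\mathrm{inv}}_{n,n}(\mathbb{R})$ is an open subset of $M_{n,n}(\mathbb{R})$.

For part (ii), define $f : M^{\mathrm{inv}}_{n,n}(\mathbb{R}) \to M_{n,n}(\mathbb{R})$ by $f : A \mapsto A^{-1}$. Then $\mathrm{Range}(f) = M^{\mathrm{inv}}_{n,n}(\mathbb{R})$ since $A^{-1}$ is
invertible if $A$ is invertible, and $f$ is injective because matrix inverses are unique. So $f$ is a bijection from
$M^{\mathrm{inv}}_{n,n}(\mathbb{R})$ to $M^{\mathrm{inv}}_{n,n}(\mathbb{R})$. Note that $f = f^{-1}$ because $f(f(A)) = A$ for all $A \in M^{\mathrm{inv}}_{n,n}(\mathbb{R})$.
To show that $f$ is continuous, let $A \in M^{\mathrm{inv}}_{n,n}(\mathbb{R})$. Then $\mu^-(A) > 0$ by Theorem 25.12.3 (xii). Let $\varepsilon \in \mathbb{R}^+$. Let
$\delta = \min(\frac12 \mu^-(A), \frac12 \varepsilon \mu^-(A)^2)$. Then $\delta \in \mathbb{R}^+$. Let $B_{A,\delta} = \{X \in M_{n,n}(\mathbb{R});\ \mu^+(X - A) < \delta\}$. Let $X \in B_{A,\delta}$.
Then $\mu^-(X) \ge \mu^-(A) - \mu^+(X - A)$ by Theorem 25.12.6 (i). So $\mu^-(X) > \mu^-(A) - \delta \ge \mu^-(A) - \frac12 \mu^-(A) =
\frac12 \mu^-(A) > 0$. Therefore $X$ is an invertible matrix by Theorem 25.12.3 (xii). So $B_{A,\delta} \subseteq M^{\mathrm{inv}}_{n,n}(\mathbb{R})$. Also,
$\mu^+(X^{-1} - A^{-1}) \le \mu^+(X - A) \mu^-(A)^{-1} \mu^-(X)^{-1}$ by Theorem 25.12.6 (v) for all $X \in B_{A,\delta}$. Then from the
bound $\mu^-(X) > \frac12 \mu^-(A)$, it follows that $\mu^+(X^{-1} - A^{-1}) \le 2\mu^+(X - A) \mu^-(A)^{-2} < 2\delta \mu^-(A)^{-2} \le \varepsilon$. So
$f(X) \in B_{A^{-1},\varepsilon}$. Thus $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ f(B_{A,\delta}) \subseteq B_{f(A),\varepsilon}$. So $f$ is continuous by Theorem 38.1.7. So
$f^{-1} = f$ is continuous. Hence the map $A \mapsto A^{-1}$ is a topological automorphism on $M^{\mathrm{inv}}_{n,n}(\mathbb{R})$.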
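
The two inequalities used in the continuity argument can be checked numerically for 2x2 matrices. Definition 25.12.2 is not reproduced here, so the sketch below assumes that the upper bound function is the Euclidean operator norm (largest singular value) and that the lower bound function is the smallest singular value. All function names are illustrative, and the chosen matrices are arbitrary examples.

```python
import math

def sub(a, b):
    """Entrywise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def inverse(m):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def singular_values(m):
    """Largest and smallest singular value of a 2x2 matrix (closed form)."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d      # s1^2 + s2^2
    det = a * d - b * c                    # |det| = s1 * s2
    root = math.sqrt(max(s * s - 4.0 * det * det, 0.0))
    return math.sqrt((s + root) / 2.0), math.sqrt(max((s - root) / 2.0, 0.0))

def upper(m):
    """Assumed upper bound function: the Euclidean operator norm."""
    return singular_values(m)[0]

def lower(m):
    """Assumed lower bound function: the smallest singular value."""
    return singular_values(m)[1]

def delta_for(a, eps):
    """The radius delta chosen in the continuity argument."""
    return min(0.5 * lower(a), 0.5 * eps * lower(a) ** 2)

A = [[2.0, 1.0], [0.0, 1.0]]
eps = 0.1
delta = delta_for(A, eps)
# A perturbation X of A with upper(X - A) < delta (a multiple of the identity).
X = [[2.0 + 0.5 * delta, 1.0], [0.0, 1.0 + 0.5 * delta]]
```

With these values, `upper(sub(inverse(X), inverse(A)))` stays below `eps`, as the choice of delta in the proof requires.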


39.6.3 REMARK: General linear groups of finite-dimensional linear spaces are topological groups. 
Since the inversion operation for square matrices is continuous by Theorem 39.6.2, the corresponding inversion 
operation for linear maps between finite-dimensional linear spaces is also continuous because linear-map 
coordinate maps are homeomorphisms. Consequently the general linear group for a finite-dimensional linear 
space is a topological group according to Definition 36.9.2. This is asserted in Theorem 39.6.4 (i). 


39.6.4 THEOREM: General linear groups are effective topological transformation groups. 

Let $V$ be a finite-dimensional real linear space. Let $G = \mathrm{GL}(V)$ as in Notation 23.1.12. Define $\sigma_G : G \times G \to G$
by $\sigma_G : (g_1, g_2) \mapsto g_1 \circ g_2$ for $g_1, g_2 \in G$. Define $\mu : G \times V \to V$ by $\mu : (g, v) \mapsto g(v)$. Let $T_G$ be the
standard topology for $G$ as in Definition 32.6.8. Let $T_V$ be the standard topology for $V$ as in Definition 32.6.6.


(i) $(G, T_G, \sigma_G)$ is a topological group.
(ii) $(G, T_G, V, T_V, \sigma_G, \mu)$ is a topological left transformation group.
(iii) $(G, T_G, V, T_V, \sigma_G, \mu)$ is an effective topological left transformation group.


PROOF: For part (i), $(G, \sigma_G)$ is a group by Theorem 23.1.17, and $(G, T_G)$ is a topological space because $T_G$
is a topology on $G$, imported from $\mathbb{R}^{n \times n}$ via coordinate maps which are bijections, where $n = \dim(V)$. (See
Definition 32.6.8.) The group operation $\sigma_G$ is continuous because it corresponds, via the coordinate maps,
to the multiplication of matrices, which is a continuous operation. The inversion operation is continuous,
via coordinate maps, by Theorem 39.6.2. Hence $(G, T_G, \sigma_G)$ is a topological group by Definition 36.9.2.


For part (ii), $(G, T_G, \sigma_G)$ is a topological group by part (i), and $\mu$ is continuous with respect to $T_G$ and $T_V$
because matrix multiplication, via coordinate maps, is linear with respect to the elements of the matrix and
the tuple. Hence $(G, T_G, V, T_V, \sigma_G, \mu)$ is a topological left transformation group by Definition 36.10.4.


Part (iii) follows from part (ii) and Theorem 23.1.17. 
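
On the coordinate side, the algebraic part of these assertions can be illustrated with 2x2 matrices acting on pairs. The sketch below uses exact rational entries as stand-ins for real numbers. It checks the left action rule, the identity rule, and the effectiveness criterion that only the identity fixes every vector. The helper names are hypothetical, and continuity is not addressed by this sketch.

```python
from fractions import Fraction

def mat_mul(g, h):
    """Product of two 2x2 matrices (composition of the linear maps)."""
    return [[sum(g[i][k] * h[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def act(g, v):
    """Left action of a matrix g on a tuple v."""
    return [sum(g[i][k] * v[k] for k in range(2)) for i in range(2)]

IDENTITY = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
BASIS = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

def fixes_every_basis_vector(g):
    """True if g(e) = e for each basis vector e, hence g(v) = v for all v."""
    return all(act(g, e) == e for e in BASIS)

g1 = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(1)]]
g2 = [[Fraction(1), Fraction(0)], [Fraction(3), Fraction(1, 2)]]
v = [Fraction(5), Fraction(-7, 3)]
```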


39.7. Landau's order symbols 


39.7.1 REMARK: The use of Landau order symbols to denote “throw-away terms”. 

There are many situations where the limiting behaviour of a function is conveniently expressed in terms 
of the Landau order notations. In the first notation, one writes something of the form $f(x) = O(g(x))$ as
$x \to a$ to mean that “$f(x)$ is of order $g(x)$ as $x \to a$”. In the second notation, one writes something of
the form $f(x) = o(g(x))$ as $x \to a$ to mean that “$f(x)$ is of smaller order than $g(x)$ as $x \to a$”. These are
pseudo-notations, but they are in common use and are useful.


According to EDM2 [113], 87.G, page 327, the Landau symbols (due to Edmund Georg Hermann Landau 
in 1909) are in common use for describing the orders of infinities and zeros of complex functions. In analysis 
more generally, the Landau symbols are used for “throw-away terms”. That is, whenever intermediate steps
in calculations involve terms which don't affect the intended outcome of the calculations, the negligible 
terms are written with Landau symbol notation to indicate the order of the terms which will be discarded. 
This maintains some semblance of logical correctness without actually having to supply exact formulas for 
the negligible terms. A substantial proportion of analysis consists in putting bounds on terms so as to
show convergence or uniqueness or other such desirable properties of functions or sequences. The Landau 
symbols provide a convenient notation for such bounds on the absolute values of functions or sequences. 
Schramm [133], pages 265-266, wrote the following. 


The Landau symbols give a neat appearance to our work. We will soon see that their use also 
has real benefit. This is a wonderful example of a situation where nothing deeper than a choice of 
notation can greatly clarify a subject. Perhaps the choice of notation is “deep” after all! 


At the very least, one can say that the Landau order symbols do save time and space when writing on a 
blackboard (or whiteboard or slide presentation), and the focus is profitably moved from the unimportant 
throw-away terms to the terms which one intends to keep. One fly in the ointment here is that the real 
business of analysis often consists in the very precise bounding of throw-away terms, and informality in this 
regard can give a perfectly plausible appearance to a totally incorrect argument. To be safe, one should prove 
every result with precise use of logical quantifiers, and reserve the Landau order symbols for presentation 
purposes only, or for rough sketches which are later carefully verified. 


39.7.2 REMARK: Some general definitions of function order. 

Definitions 39.7.3 and 39.7.4 put weaker than usual requirements on the spaces for which function order is 
defined. (See Remark 18.3.21 for the notation $S^+ = \{\lambda \in S;\ \lambda > 0_S\}$. See Definition 19.6.3 for normed
modules over rings.) The spaces and functions in these definitions are illustrated in Figure 39.7.1. 


Figure 39.7.1 Function $f$ of order $g$ for a normed module $M$ over a ring $R$


Definition 39.7.3 defines equal order whereas Definition 39.7.4 defines strictly lesser order. However, only 
upper bounds are asserted in both cases. So “of order $g$” really means “of order equal to or less than $g$”.


39.7.3 DEFINITION: Weak order bound, normed module over a ring. 

A function of order $g$ near $a$ with respect to a norm $\psi : M \to S$, for a function $g : X \to M$ and $a \in X$,
where $X$ is a topological space and $(M, \psi)$ is a normed module over a ring $R$ compatible with absolute value
function $\phi : R \to S$ for ordered ring $S$, is a function $f : X \to M$ such that


$\exists C \in S^+,\ \exists \Omega \in \mathrm{Top}_a(X),\ \forall x \in \Omega \setminus \{a\},\quad \psi(f(x)) \le C\psi(g(x)).$


39.7.4 DEFINITION: Strong order bound, normed module over a ring. 

A function of order less than $g$ near $a$ with respect to a norm $\psi : M \to S$, for a function $g : X \to M$
and $a \in X$, where $X$ is a topological space and $(M, \psi)$ is a normed module over a ring $R$ compatible with
absolute value function $\phi : R \to S$ for ordered ring $S$, is a function $f : X \to M$ such that


$\forall C \in S^+,\ \exists \Omega \in \mathrm{Top}_a(X),\ \forall x \in \Omega \setminus \{a\},\quad \psi(f(x)) \le C\psi(g(x)).$


39.7.5 REMARK: Interpretation of general definitions of function order. 

Even if the set of absolute values $S$ is a minimal ordered ring such as $\mathbb{Z}$, Definition 39.7.3 puts a meaningful
constraint on the function $f$. If the function $g$ is constant, for example, then the definition asserts that $f$ is
bounded. If $g$ is unbounded, then the definition asserts that $f$ does not diverge faster than $g$.


Definition 39.7.4 is not of much use if there is a value $C \in S^+$ with $\{\lambda \in S^+;\ \lambda < C\} = \emptyset$. This happens
if $S = \mathbb{Z}$. If $g$ is constant, for example, the definition asserts only that $f$ has a particular bound, not that $f$
converges to zero.
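
The degenerate case $S = \mathbb{Z}$ can be made explicit. The following is a worked consequence, writing $\psi$ for the norm and assuming that the strong order bound is read with a non-strict inequality and that norm values are non-negative. Since 1 is the least element of $\mathbb{Z}^+$, the condition for all $C$ collapses to the single case $C = 1$.

```latex
\[
  \Bigl(\forall C \in \mathbb{Z}^+,\ \exists \Omega \in \mathrm{Top}_a(X),\
        \forall x \in \Omega \setminus \{a\},\ \psi(f(x)) \le C\,\psi(g(x))\Bigr)
  \iff
  \Bigl(\exists \Omega \in \mathrm{Top}_a(X),\
        \forall x \in \Omega \setminus \{a\},\ \psi(f(x)) \le \psi(g(x))\Bigr).
\]
```

The forward implication takes $C = 1$. The reverse implication holds because $\psi(g(x)) \le C\psi(g(x))$ whenever $C \ge 1$ and $\psi(g(x)) \ge 0$.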
Figure 39.7.2 Function $f$ of order $g$ for a normed linear space $V$ over a field $K$


To give analytic significance to Definition 39.7.4, it seems reasonable to require the set $\{\lambda \in S^+;\ \lambda < C\}$ to
be infinite for any $C \in S^+$. This suggests that $S$ should be an ordered field. Likewise, the ring $R$ probably
should be a field. This leads naturally to the less general Definitions 39.7.6 and 39.7.7, which are probably 
adequate for most purposes. The spaces and functions in these definitions are illustrated in Figure 39.7.2. 


One could perhaps require that S should be merely an ordered division ring and that R should be a division 
ring, but it is not clear that such generality would bring worthwhile benefits. Likewise, the space V could 
be replaced by a general metric space. However, this would ignore the “scalarity” implicit in the bounds in 
Definitions 39.7.3 and 39.7.4. (See Remark 19.6.7 for scalarity.) 


39.7.6 DEFINITION: Weak order bound, normed linear space. 

A function of order $g$ near $a$, for a function $g : X \to V$ and $a \in X$, where $X$ is a topological space and
$(V, \|\cdot\|)$ is a normed linear space over a field $K$ compatible with absolute value function $|\cdot| : K \to \mathbb{R}$, is a
function $f : X \to V$ such that


$\exists C \in \mathbb{R}^+,\ \exists \Omega \in \mathrm{Top}_a(X),\ \forall x \in \Omega \setminus \{a\},\quad \|f(x)\| \le C\|g(x)\|.$


39.7.7 DEFINITION: Strong order bound, normed linear space. 

A function of order less than $g$ near $a$, for a function $g : X \to V$ and $a \in X$, where $X$ is a topological space
and $(V, \|\cdot\|)$ is a normed linear space over a field $K$ compatible with absolute value function $|\cdot| : K \to \mathbb{R}$,
is a function $f : X \to V$ such that


$\forall C \in \mathbb{R}^+,\ \exists \Omega \in \mathrm{Top}_a(X),\ \forall x \in \Omega \setminus \{a\},\quad \|f(x)\| \le C\|g(x)\|.$


39.7.8 REMARK: Dummy variables in Landau order notations. 

Notations 39.7.9 and 39.7.10 for Landau's order symbols “O” and “o” are expressed in terms of Definitions 
39.7.6 and 39.7.7. Strictly speaking, these are pseudo-notations. The dummy variable “x” should be thought 
of as an element of the set X, but it is more correct to write “f = O(g) near a” and “f = o(g) near a” 
respectively. 


Landau’s order symbols do almost always use dummy variables in practice. This is particularly useful for 
specifying functions “inline”. For example, $f(x) = 2x^{3/2} + 5x = O(|x|^{3/2}) = o(|x|^2)$ as $x \to \infty$. Such inline
formulas are easily converted into unambiguous correct notation. (But it is usually tedious to do so.) 
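
The inline example can be tabulated numerically. The sketch below evaluates the quotients $|f(x)|/|x|^{3/2}$ and $|f(x)|/|x|^2$ for $f(x) = 2x^{3/2} + 5x$ along an increasing sequence of positive $x$ values. The first quotient stays bounded and the second tends to zero, which is what the two order symbols assert. The helper names are illustrative, and a finite table is evidence rather than a proof.

```python
def f(x):
    """The example function f(x) = 2 x^(3/2) + 5 x for x > 0."""
    return 2.0 * x ** 1.5 + 5.0 * x

def ratio(x, power):
    """Quotient |f(x)| / |x|^power, which the order symbols constrain."""
    return abs(f(x)) / abs(x) ** power

xs = [10.0 ** k for k in range(0, 9)]          # x = 1, 10, ..., 10^8
big_o_ratios = [ratio(x, 1.5) for x in xs]     # bounded: equals 2 + 5 / sqrt(x)
little_o_ratios = [ratio(x, 2.0) for x in xs]  # tends to 0
```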


39.7.9 NOTATION: Weak order bound, normed linear space. 
“$f(x) = O(g(x))$ as $x \to a$”, for functions $f, g : X \to V$ and $a \in X$, for a topological space $X$ and a normed
linear space $V$, means that $f$ is a function of order $g$ near $a$.


39.7.10 NOTATION: Strong order bound, normed linear space.
“$f(x) = o(g(x))$ as $x \to a$”, for functions $f, g : X \to V$ and $a \in X$, for a topological space $X$ and a normed
linear space $V$, means that $f$ is a function of order less than $g$ near $a$.
40.0.1 REMARK: Differential calculus versus differential analysis.

Chapters 40, 41 and 42 are on the subject of differentiation, whereas “differential calculus” is a corpus of 
methods, rules and procedures for computing derivatives of explicit functions, as for example the product 
and composition rules, and all of the formulas which one finds in the books of tables. (See for example 
Abramowitz/Stegun [43]; CRC [63]; Gradstein/Ryzhik [84]; Spiegel [139].) Differential calculus is the art of 
differentiating specific functions, whereas the presentation here is concerned with the analytic foundations of 
the subject. Perhaps a more accurate description would be “differential analysis”, or just “differentiation”.


Similarly, Chapters 43 and 45 are concerned with the foundations and general principles of integration, not 
the methods, rules and procedures for computing integrals of explicit functions. So the subject of these 
“integral calculus” chapters could be more accurately described as “integral analysis”, or just “integration”.


40.1. Velocity 


40.1.1 REMARK: Mathematical modelling of velocity. 

The concept of “velocity” is more elusive than the concepts of “length” and “duration”. Even length and 
duration are elusive concepts. It is not truly possible to say what length is and what duration is. Both length 
and duration may be measured. Measurement is performed by comparison. Lengths may be compared with 
each other and durations may be compared with each other. One may therefore say when two lengths are 
equal and when two durations are equal. Lengths may be compared to standard lengths such as a metre 
rod. Durations may be compared to standard clocks. Standard wavelengths and frequencies of light may be 
counted to measure length and duration respectively. But no amount of measurement technology answers 
the question of what length and duration are. Therefore it is also not possible to say what velocity is, since 
velocity is a quotient of length divided by duration. (See Remarks 53.1.7 and 53.1.12 for discussion of “native
velocity” and “native time”.)


On the one hand, velocity seems to be well defined in everyday observations of the real world. Velocity 
easily survives the “test-retest correlation”. In other words, if a velocity is measured twice, its value is very
much the same. Therefore the velocity quotient does measure something. Velocity can also be accurately 
predicted from initial conditions using physical laws. 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 
On the other hand, no equipment can measure length and duration to arbitrary precision. So the mathematical
limit of the length/duration quotient cannot be shown to be well defined experimentally. Particularly
when velocity is varying, the instantaneous length/duration limit is impossible to measure. Like the perfectly 
straight zero-width lines of Euclidean geometry, precise instantaneous velocities exist in human models of 
the world, not in human observations of the world. Velocities are inferred from imperfect observations. 


It is probably the differential equations of classical physics which ultimately give confidence that velocities 
are well defined. This confidence relates only to the perfect world of mathematical models. As the old saying 
goes, reality is an imperfect approximation to theory. If we believe that the second derivative of position
is directly related to a force field, and that the force field is continuous, it then follows that the velocity
is well defined. Since classical physical theories work very well at the macroscopic level, and even at quite 
small microscopic scales, it seems reasonable to extrapolate from the macroscopic to the infinitesimal. If 
observation cannot measure the infinitesimal, then there is no harm done in supposing that the infinitesimal 
limit is well defined, and there is much calculational benefit in making this assumption. The infinitesimal 
limit is thus a metaphysical construction which is both harmless and beneficial. Nevertheless, it is erroneous 
to assume that the infinitesimal limit is observable in physical reality. 


Since we view the world through Cartesian charts, we have little choice but to model the intrinsic velocities 
of the world via charts. The concept of a derivative of a real-valued function is how a native velocity appears 
when viewed through a chart. The unnaturalness of the concept of differentiation may be due to the way 
in which we coordinatise space and time (and other aspects of the physical world). One could say, perhaps, 
that the derivative of space with respect to time is merely the “pull-back” of native velocity with respect to 
our space-time charts. And the “push-forth” of derivatives calculated within charts would no doubt yield 
native velocities. But mathematics deals only with coordinate charts, not with the real world. To see the 
real world, one must look beyond mathematics, using the “push-forth” from models to reality. 


40.1.2 REMARK: Mathematical modelling of direction. 

The direction of a velocity has difficulties which are related to the difficulties with the magnitude of a velocity. 
Directions may be compared, but it is not possible to say what a direction is. Static compass directions on 
the Earth are comparisons with the direction to the North Pole. But this is merely a quantification of the 
difference between directions. It is not practical to define a direction in terms of a geodesic which may be 
thousands of kilometres long. 


Perhaps the best natural description of a “direction” is “the line which a curve would follow if it did not 
change direction”. This is, of course, a circular definition. But it has an intuitive appeal. It suggests that 
one should describe a direction by a line rather than some other kind of geometrical object. Thus a tangent 
line seems to be the right kind of object to specify a direction. 


If the velocity of motion along the tangent line is also specified, a uniformly parametrised line becomes a 
strong candidate to specify velocity vectors, combining speed and direction. Then the velocity of a curve is 
specified by a uniformly parametrised line which is “the trajectory which a curve would follow if it changed 
neither speed nor direction”. (Equivalently, an unparametrised straight line in space-time specifies both 
uniform speed and uniform direction.) 


40.1.3 REMARK: Archimedes included the time parameter in his geometry. 

Within the Cartesian coordinate framework, it is fairly clear that a velocity corresponds to a time-space 
(n + 1)-tuple, where n is the dimension of the space coordinates. In the absence of Cartesian coordinates, 
a velocity may be defined as a set of time-space (n + 1)-tuples describing a line in space-time, which is 
equivalent to a line in space with time-labels attached. The time-space line representation of velocity is 
described in Sections 26.13 and 26.14. This time-space probably would have been acceptable to Archimedes. 
(See Remark 26.19.1 for Archimedes’ parametrisation of lines and curves by time.) In fact, replacing space 
vectors with time-space lines may be regarded as modernising the idea of a tangent from the time of Euclid 
(about 300BC) to the time of Archimedes (about 250BC). 


40.1.4 REMARK:  Curve-classes and differential operators are unsatisfactory as tangent vectors. 

An equivalence class of curves with the same velocity is perhaps philosophically satisfying as a replacement 
on a differentiable manifold for the straight line representation for Cartesian space, but the curve-class 
representation has few benefits for practical calculation, and it does not really solve the philosophical question 
of what a velocity is. The differential operator representation in differentiable manifolds is very useful for 
calculations, but it results in circular definitions. All in all, the Cartesian number-tuple representation
seems to be closest to the real meaning of a velocity. Therefore it would seem best to define derivatives of 
real-valued functions as numerical estimators for quotients of space divided by time. 


40.2. Lines through points 


40.2.1 REMARK: The fundamental nature of lines in human vision and classical geometry. 

At a very basic level, differentiation of mathematical functions and curves is related to the perception of lines. 
The human visual perception system initially registers points of light in the retina, but this information is 
processed to recognise lines at a low level in the processing path between the eyes and the higher levels of 
the visual cortex. For example, Churchland/Sejnowski [447], page 153, states the following in regard to the 
V1 area of the visual cortex, which is “early” in the processing of visual input. 


Complex cells are the most common cell type in the early visual cortex. [...] Like simple cells, they 
are tuned to respond to elongated stimuli, such as slits or bars, in their receptive field, and they 
prefer certain orientations to others. But unlike simple cells, they respond best when their favorite 
stimulus is moving in a particular direction across their receptive field; if the cell does respond to a 
stationary stimulus, the stimulus can be located anywhere in the receptive field. For example, a cell 
might respond best to a 30° stimulus moving in a downward direction. Some simple and complex 
cells show length summation, which means that the longer the line within its receptive field, the 
more vigorous the response. 


Similar organisation of visual processing is found in primates and other mammals, such as monkeys and cats. 
Thus the perception of lines and their orientation is biologically programmed at a low level in the brain, and 
has experienced a long evolution. Therefore lines are part of naive mathematics. So it is not surprising that 
a thousand years of ancient Greek pure mathematics were centred primarily on straight lines as the most 
fundamental concept. 


The ancient Greeks made various attempts to define lines in terms of lower-level concepts, but they were 
inevitably unsuccessful. For example, Euclid/Heath [213], page 153, has the following “definitions”. 


1. A point is that which has no part.
2. A line is a breadthless length.
3. The extremities of a line are points.
4. A straight line is a line which lies evenly with the points on itself.
5. A surface is that which has length and breadth only.
6. The extremities of a surface are lines.


The definitions become very rapidly circular. There were various other attempts to define lines in the ancient
Greek literature, but they were all equally circular. Lines are in fact a naive concept, like points. 


Tangentiality of a straight line to a curve was generally demonstrated by the ancient Greeks by showing 
(1) that the line passed through a point on the curve and (2) that it did not pass through any other point. 
The first part was usually very easy. Proving the absence of other intersection points typically followed 
by demonstrating a contradiction if further intersections occurred. This worked well if the curve had no 
inflection points, which was generally true for the curves of interest. 


40.2.2 REMARK: Lines are more fundamental than numbers or text.

Humans perceive lines in the environment at a similar fundamental level to points and cardinal numbers. 
It is commonly stated that the integers are completely natural and innate, and that nothing can be more 
fundamental and "given" than the integers. (Some people even think that the integers would be the best 
way to commence communications with an inter-stellar intelligent society.) This shows an arithmétic bias. 
Modern mathematics is formulated in terms of numbers primarily because they are more convenient for lines 
of text, and diagrams are difficult to print. (At least until the 1980s, it was usual for mathematics publishers 
to discourage authors from including diagrams because of the cost. This probably reinforced our modern 
textification of mathematics.) But perception of geometry in the environment is surely more fundamental 
and innate than even the integers. Geometric perception of sophisticated kinds has been present in animals 
for hundreds of millions of years. The fundamentals of geometric perception include points, lines and curves. 


It could be fairly argued that lines are more fundamental in perception than points. A point has no attributes. 
A single pixel in a photo carries almost no significant information. The outlines and boundaries of shapes 
are made up of curves and lines. There is useful and important information in the environment in the form
of lines and curves, whereas it is rare that a single point is of any significance. (One exception in the pre- 
modern environment would be stars, of course.) Therefore lines are perhaps at the most fundamental level 
of perception. It follows that tangents should be thought of as lines, not as numbers. It is no accident that 
the ancient Greeks defined numbers in terms of lines, not vice versa. Even Leonhard Euler defined numbers 
in terms of lines as late as the 18th century. (See for example Leonhard Euler’s 1748 book Introductio in 
analysin infinitorum [217], which contains much “geometric algebra” following Euclid’s idiom.) The desire
to fit lines to curves is thus at least as fundamental as the desire to count sheep, and probably much more 
fundamental. Since differentiation is the fitting of tangent lines to curves, it follows that differential calculus 
is at least as innate as counting. 


40.2.3 REMARK: The choice of mathematical representation for lines. 

Before examining the concepts of differentiation and tangentiality, it is helpful to first consider a more 
basic concept, namely the concept of a line joining two points. The line joining two points was presumably 
determined in classical Greek geometry by positioning a straight ruler so that it touched both points. Since 
the word “tangent” is Latin for “touching”, the joining of two points by a line was literally a kind of 
tangentiality! (Euclid used the word “ἐφάπτεσθαι”, which means “to touch”. See Euclid/Heath [214], page 2.)


Figure 40.2.1 illustrates the familiar line through two points in $\mathbb{R}^2$.


Figure 40.2.1 Line through two points in $\mathbb{R}^2$


It is easy to give formulas for the line through points $(x_1, y_1)$ and $(x_2, y_2)$ as $y = y_1 + (x - x_1)(y_2 - y_1)(x_2 - x_1)^{-1}$
or $x = x_1 + (y - y_1)(x_2 - x_1)(y_2 - y_1)^{-1}$, carefully avoiding the infinite gradient cases. When the components
of the Cartesian product are rings rather than fields, the formula cannot so easily be given so that one
variable is a function of the other. Consider for example the situation illustrated in Figure 40.2.2 for the
Cartesian product $\mathbb{Z}^2$.


In this case, the ratio $(y_2 - y_1)(x_2 - x_1)^{-1} = 2/3$ is not well defined in the ring $\mathbb{Z}$. The set of points $(x, y) \in \mathbb{Z}^2$
which satisfy $(y - y_1)(x_2 - x_1) = (x - x_1)(y_2 - y_1)$ does not in fact constitute a function from $x$ to $y$ nor
from $y$ to $x$. An attractive idea here is to define the line through the two points as a parametrised line of
the form $L : \mathbb{Z} \to \mathbb{Z}^2$ defined by $L : t \mapsto p_0 + tv$, where $p_0 = (x_1, y_1)$, $p_1 = (x_2, y_2)$ and $v = p_1 - p_0$. (This
style of parametrised line is also described in Section 26.9.) Such a parametrised line has the advantage that
it passes through the two given points, and all points that it passes through are on the line passing through
those two points. However, it does not in general pass through all such points. To reach every such point, the
velocity vector $v$ must be divided by the greatest common divisor of its components.
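
The role of the greatest common divisor can be seen in a small computation. The sketch below builds the parametrised line through two points of $\mathbb{Z}^2$, with and without reducing the velocity vector to a primitive one, and compares the points visited with the solution set of the equation $(y - y_1)(x_2 - x_1) = (x - x_1)(y_2 - y_1)$. The points $(0, 0)$ and $(6, 4)$ and all function names are illustrative choices, not taken from the text.

```python
from math import gcd

def parametrised_line(p0, p1, primitive=False):
    """Return L : t -> p0 + t*v on Z^2, optionally with v made primitive."""
    v = (p1[0] - p0[0], p1[1] - p0[1])
    if primitive:
        d = gcd(v[0], v[1])
        if d > 1:
            v = (v[0] // d, v[1] // d)   # exact, since d divides both components
    return lambda t: (p0[0] + t * v[0], p0[1] + t * v[1])

def on_line_set(p, p0, p1):
    """Membership of p in the set (y - y1)(x2 - x1) = (x - x1)(y2 - y1)."""
    (x, y), (x1, y1), (x2, y2) = p, p0, p1
    return (y - y1) * (x2 - x1) == (x - x1) * (y2 - y1)

p0, p1 = (0, 0), (6, 4)
coarse = {parametrised_line(p0, p1)(t) for t in range(-3, 4)}
fine = {parametrised_line(p0, p1, primitive=True)(t) for t in range(-6, 7)}
```

Here the unreduced line skips the lattice point $(3, 2)$, whereas the line with primitive velocity $(3, 2)$ visits every lattice point of the solution set in the sampled window.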


These examples suggest that lines between points in a Cartesian product of any kind should preferably be
expressed as parametrised lines rather than functions of one of the components of the Cartesian product.
This achieves the greatest generality. A further advantage of parametrised lines is that they have a built-in
traversal mechanism. A set expression such as $\{(x, y) \in X \times X;\ (y - y_1)(x_2 - x_1) = (x - x_1)(y_2 - y_1)\}$ for a
ring or field $X$, on the other hand, has no clear algorithm for traversing the points.
[Diagram labels: $(y - y_1)(x_2 - x_1) = (x - x_1)(y_2 - y_1)$ and $L : t \mapsto (x_1, y_1) + t((x_2, y_2) - (x_1, y_1))$]
Figure 40.2.2 Line through two points in $\mathbb{Z}^2$


If the two points to be fitted are the same, the set $\{(x, y) \in X \times X;\ (y - y_1)(x_2 - x_1) = (x - x_1)(y_2 - y_1)\}$ is
the whole set $X \times X$, whereas the line $L : t \mapsto p_0 + t(p_1 - p_0)$ maps all parameters to a single point $p_0 = p_1$.
The set may be interpreted to mean that all lines pass through the point. The parametrised line through
two equal points is a constant map, a line which never moves. This constant map seems preferable to a set
which goes everywhere.


Fitting lines as tangents to curves is analogous to fitting lines to point-pairs, although clearly “the unique
line through a single point” is not well defined. The unique line would be well defined if one of the points
on the curve could be varied by an infinitesimal amount, so small that no one would notice! 


Despite the technical difficulties, it seems reasonable to adopt parametrised lines as a standard representation 
for tangents to curves. And since differentiation is the art of determining tangent lines for curves (and other 
kinds of functions), it seems reasonable to define derivatives of functions to be parametrised lines. Such lines 
have well-defined velocities which are easily extracted from them. These velocities are the customary numbers 
(or sets of numbers) which represent derivatives in most texts. Note in particular that the parametrised line 
approach has no difficulties with infinite gradients. One-sided derivatives may be conveniently represented 
as semi-infinite lines. 
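
The absence of difficulties with infinite gradients can be illustrated numerically. The sketch below estimates the velocity of a parametrised curve by a central difference and returns the tangent as a parametrised line. At the point $(1, 0)$ of the unit circle the gradient $dy/dx$ is infinite, yet the velocity tuple is an unremarkable $(0, 1)$. The finite-difference estimator, the step size and all names are assumptions made for illustration only.

```python
import math

def velocity(curve, s, h=1e-6):
    """Central-difference estimate of the velocity tuple of a curve at s."""
    (x0, y0), (x1, y1) = curve(s - h), curve(s + h)
    return ((x1 - x0) / (2.0 * h), (y1 - y0) / (2.0 * h))

def tangent_line(curve, s):
    """Parametrised tangent line t -> curve(s) + t * velocity at s."""
    px, py = curve(s)
    vx, vy = velocity(curve, s)
    return lambda t: (px + t * vx, py + t * vy)

def circle(s):
    """Unit circle, traversed anticlockwise at unit speed."""
    return (math.cos(s), math.sin(s))

line = tangent_line(circle, 0.0)   # tangent at (1, 0), where dy/dx is infinite
```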


40.2.4 REMARK: Straight lines and calculus. 
The focus on straight lines in differential calculus is justified by the fact that they are close to how the human
being sees things, not because they are close to the way the world is. 


Straight lines are algebraically simple because they use only a simple addition and multiplication rule. So 
differential calculus is, in a sense, the art of fitting straight lines to curves, or vice versa! 


40.2.5 REMARK: The epsilon-delta approach to calculus. 

The $\varepsilon/\delta$ approach comes from the work of Huygens, Cauchy, d’Alembert and Weierstraß, who progressively
refined the derivative concept as the limit of a differential quotient. (See Boyer/Merzbach [237], page 345
for Huygens, and pages 426, 439-440 for d’Alembert. For Cauchy's $\varepsilon/\delta$ limit using d’Alembert’s differential
quotients, see Boyer/Merzbach [237], pages 455-456, and Struik [249], page 152. For the Weierstraß $\varepsilon/\delta$
limits, see Bell [234], page 294. See Cauchy [207], page 27, for Cauchy's use of the specific symbols $\varepsilon$ and $\delta$
for defining derivatives.)


40.3. Differentiability for one variable 


40.3.1 REMARK:  Desiderata for a definition of differentiability. 

Since both continuity and differentiability can be defined in terms of limits, it seems desirable to make the 
definition of differentiability resemble the definition of continuity. However, the resemblance is not so close 
in general as it seems to be for real-valued functions of a single real variable. Differentiability is not defined 
for general topological spaces. In the case of general metric spaces, Lipschitz and Hölder continuity can be
defined, but differentiability requires at least some kind of local linear structure because the derivative of
a function is effectively a tangent to its graph, and tangent lines (and tangent planes) are unambiguously
linear concepts requiring some kind of linear structure. 


Differentiability may be defined either in terms of bounds on deviation of a function from linearity, or in 
terms of bounds on differential quotients. The deviation-from-linearity style of definition appears as follows. 


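\[
\exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v(x - p)| < \varepsilon |x - p|. \tag{40.3.1}
\]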


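The differential-quotient style of definition has the following appearance.

\[
\exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad
\left| \frac{f(x) - f(p)}{x - p} - v \right| < \varepsilon. \tag{40.3.2}
\]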


Line (40.3.2) is clearly equivalent to line (40.3.1), but the quotient form can be made to resemble the definition 
of continuity for general metric spaces to some extent as follows. (See Theorem 38.1.3.) 


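\[
\exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in \dot{B}_{p,\delta} \cap U,\quad
\frac{f(x) - f(p)}{x - p} \in B_{v,\varepsilon}. \tag{40.3.3}
\]

(See Notation 37.3.15 for the punctured open ball $\dot{B}_{p,\delta} = B_{p,\delta} \setminus \{p\}$.) Line (40.3.3) requires the differential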
quotient to lie in an arbitrarily small open ball neighbourhood of v for all x in some open ball neighbourhood 
of p. This form of differentiability definition follows from the definition of a limit for general metric spaces. 


Line (40.3.1) may be expressed in terms of “conical neighbourhoods” which are subsets of R x R of the form 
$C_{p,\varepsilon} = \{(x,y) \in \mathbb{R} \times \mathbb{R};\ |y - f(p)| < \varepsilon|x - p|\}$, and the graph of $f$ may be constrained to lie within such
“neighbourhoods”. This is clumsy and artificial, and does not seem to increase clarity or generalisability. 
Therefore this option may be rejected with some confidence. However, although line (40.3.3) bears some 
similarity to the $\varepsilon$-$\delta$ definition of continuity for metric spaces, the differential quotient is difficult to generalise
to functions of several real variables, whereas line (40.3.1) generalises nicely even to Banach spaces for both 
the domain and range. 


The differential quotient may be considered to be a kind of “parameter estimator” for the derivative of a 
function. This “estimator” concept generalises well when the range is a Banach space, but not when the 
domain is a general Banach space or even a finite-dimensional Euclidean space. (Such a generalisation would 
require n points x near p instead of a single real number x. Then together with p, the n choices of x would 
determine a plane if their displacements from p are linearly independent. This approach would yield an 
"estimator" plane for each such set of n points. Clearly this is not a natural generalisation of the differential 
quotient concept.) 


The “goodness-of-fit” idea which is implicit in line (40.3.1) easily generalises because it is not difficult to 
define broad classes of “approximation functions" together with measures of “fit” between the approximations 
and the actual function. If the fit is arbitrarily good for close enough neighbourhoods of a given point, then 
one may say that the limit function is some kind of derivative at the point. 


All in all, it seems that the expression in line (40.3.1) best captures the concept of a derivative, although 
it is not the form favoured by many elementary calculus textbooks. In terms of classical geometry, the 
“goodness-of-fit” concept is not very far away from the idea of a tangent line or plane. Line (40.3.4) could 
probably have been explained to, and understood by, classical Greek geometers without difficulty, but the 
differential quotient “parameter estimator" approach would have been very much more difficult to explain 
because it is numerical rather than geometric, and is therefore foreign to the mind-set of classical geometry. 


40.3.2 REMARK: Testing differentiability without needing to guess the derivative in advance. 

The convergence of a sequence of points in a metric space can be tested by either showing that the sequence
has a limit or by showing that it is a Cauchy sequence. (See Section 37.8 for Cauchy sequences.) The Cauchy 
test requires the diameter of the tail of a sequence to converge to zero. This does not require the limit to be 
guessed in advance. 
It is also desirable to be able to test differentiability without needing to guess the derivative v in advance
as in lines (40.3.1), (40.3.2) and (40.3.3) in Remark 40.3.1. It seems somewhat unreasonable that one must 
determine the derivative first, and then use this to prove that the function is differentiable. The style of 
Cauchy sequence test in Definition 37.8.3 may be applied to line (40.3.2) as follows. 


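\[
\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall x_1, x_2 \in \dot{B}_{p,\delta} \cap U,\quad
\left| \frac{f(x_1) - f(p)}{x_1 - p} - \frac{f(x_2) - f(p)}{x_2 - p} \right| < \varepsilon.
\]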


Alternatively one may apply the equivalent diameter-style Cauchy test in Remark 37.8.8 as follows. 


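\[
\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\quad
\mathrm{diam}\bigl(\{(f(x) - f(p))/(x - p);\ x \in \dot{B}_{p,\delta} \cap U\}\bigr) < \varepsilon.
\]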


These tests imply the existence of a unique v € R satisfying line (40.3.2) by Definition 37.8.9 and Theorems 
37.8.5 and 37.8.14 (i). 


Cauchy tests may also be devised for the total differentiability concept in Definition 41.6.4 by estimating the 
linear map $L \in \mathrm{Lin}(\mathbb{R}^n, \mathbb{R}^m)$ from $n$-tuples $(q_i)_{i=1}^n$ of points $q_i \in \dot{B}_{p,\delta} \cap U$ such that the $n$-tuple $(q_i - p)_{i=1}^n$
is linearly independent. Such an approach is not presented here. Although it is desirable in principle to be 
able to test differentiability without guessing the derivative in advance, in practice such guesswork is usually 
not too difficult. 
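
The diameter-style test above can be explored numerically. The following is only a rough sketch under stated assumptions: the function name `quotient_diameter`, the sampling scheme and the choice $U = \mathbb{R}$ are illustrative and not part of the text, and a finite sample can only suggest, never prove, that the diameter tends to zero.

```python
def quotient_diameter(f, p, delta, n=1000):
    """Estimate the diameter of the set of differential quotients
    (f(x) - f(p))/(x - p) over sample points x with 0 < |x - p| < delta.
    Hypothetical helper: U is assumed to be the whole real line."""
    quotients = []
    for k in range(1, n + 1):
        h = delta * k / (n + 1)        # 0 < h < delta
        for x in (p - h, p + h):       # sample both sides of p
            quotients.append((f(x) - f(p)) / (x - p))
    return max(quotients) - min(quotients)


# The estimate shrinks with delta for x^2 at p = 1,
# but stays near 2 for |x| at p = 0.
for delta in (0.1, 0.01, 0.001):
    print(delta,
          quotient_diameter(lambda x: x * x, 1.0, delta),
          quotient_diameter(abs, 0.0, delta))
```

No value of the derivative is supplied to the helper, which is the point of the Cauchy-style test.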


40.3.3 REMARK: Defining differentiability by adapting the $\varepsilon$-$\delta$ continuity definition.

The $\varepsilon$-$\delta$ style of continuity condition in Theorem 38.1.3 for general metric spaces is adapted for differentiability
in Definition 40.3.4. In the case of continuity, the difference $|f(x) - f(p)|$ must be bounded by $\varepsilon$ for arbitrarily
small $\varepsilon$. For differentiability, $|f(x) - f(p) - v(x - p)|$ must be bounded by $\varepsilon|x - p|$ for arbitrarily small $\varepsilon$.
This is illustrated in Figure 40.3.1.


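[Figure 40.3.1 (image not reproduced). Labelled curves: $v(x - p) + \varepsilon|x - p|$, $v(x - p)$ and $v(x - p) - \varepsilon|x - p|$; horizontal axis $x$ marked at $p - \delta$, $p$ and $p + \delta$.]

Figure 40.3.1 Definition of derivative of real-valued function of real variable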


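40.3.4 DEFINITION: A function $f : U \to \mathbb{R}$ for $U \in \mathrm{Top}(\mathbb{R})$ is said to be differentiable at $p$ for $p \in U$ if
\[
\exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v(x - p)| < \varepsilon |x - p|. \tag{40.3.4}
\]
A function $f : U \to \mathbb{R}$ for $U \in \mathrm{Top}(\mathbb{R})$ is said to be differentiable on $U$ if it is differentiable at $p$ for all $p \in U$.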
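
The inequality in line (40.3.4) can be spot-checked on sample points. This is a minimal sketch, assuming $U = \mathbb{R}$ and a hypothetical helper name `fits_within_cone`. A finite sample can refute a candidate pair ($v$, $\delta$) but cannot prove differentiability. For $f(x) = x^3$ at $p = 1$ one has $|f(x) - f(p) - 3(x - p)| = |x - 1|^2 |x + 2|$, so $\delta = \varepsilon/4$ suffices when $\varepsilon \le 4$.

```python
def fits_within_cone(f, p, v, eps, delta, n=1000):
    """Check |f(x) - f(p) - v(x - p)| < eps*|x - p| at sample points x != p
    with |x - p| < delta. Hypothetical helper; U is assumed to be all of R."""
    for k in range(1, n + 1):
        h = delta * k / (n + 1)        # 0 < h < delta
        for x in (p - h, p + h):
            if not abs(f(x) - f(p) - v * (x - p)) < eps * abs(x - p):
                return False
    return True


cube = lambda x: x ** 3
# v = 3 with delta = eps/4 passes on the sample; v = 2 is rejected.
print(fits_within_cone(cube, 1.0, 3.0, eps=0.1, delta=0.025))
print(fits_within_cone(cube, 1.0, 2.0, eps=0.1, delta=0.025))
```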


40.3.5 REMARK:  Keyhole testing for differentiability. 

Theorems 40.3.6, 40.3.7 and 40.3.8 may seem to be pointless formalities, proving the obvious. However, the 
open domain condition is necessary, and some logical steps are required to show that it is sufficient. The 
principal assertion of Theorem 40.3.8 is that part (v) implies part (i). This means that everywhere local 
differentiability implies global differentiability.


Theorem 40.3.8 is analogous to the continuity keyhole test in Theorem 31.12.17. The equivalence of parts 
(i) and (v) is the special single-variable case of the multi-variable keyhole test in Theorem 41.1.27. Such 
“keyhole tests” are applicable to manifolds, where continuity and differentiability tests can only be applied 
via charts, whose domains form open covers of the underlying topological space. (See Definition 31.7.2 for 
open covers.) 
40.3.6 THEOREM: Open-domain-independence of pointwise differentiability.
Let $p \in \mathbb{R}$ and $U_1, U_2 \in \mathrm{Top}_p(\mathbb{R})$. Let $f_1 : U_1 \to \mathbb{R}$ and $f_2 : U_2 \to \mathbb{R}$ satisfy $f_1|_{U_2} = f_2|_{U_1}$. Then $f_1$ is
differentiable at $p$ if and only if $f_2$ is differentiable at $p$.


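PROOF: Suppose that $f_1$ is differentiable at $p$. Then by Definition 40.3.4, there exists $v \in \mathbb{R}$ such that

\[
\forall \varepsilon > 0,\ \exists \delta_1 > 0,\ \forall x \in (p - \delta_1, p + \delta_1) \cap (U_1 \setminus \{p\}),\quad
|f_1(x) - f_1(p) - v(x - p)| < \varepsilon |x - p|. \tag{40.3.5}
\]

By Theorem 32.5.9, there exists $\delta' \in \mathbb{R}^+$ such that $(p - \delta', p + \delta') \subseteq U_1$ because $U_1 \in \mathrm{Top}_p(\mathbb{R})$. Let $\varepsilon \in \mathbb{R}^+$.
Then there exists $\delta_1 \in \mathbb{R}^+$ satisfying line (40.3.5) for all $x \in (p - \delta_1, p + \delta_1) \cap (U_1 \setminus \{p\})$. Let $\delta_2 = \min(\delta', \delta_1)$.

Let $x \in (p - \delta_2, p + \delta_2) \cap (U_2 \setminus \{p\})$. Then $x \in (p - \delta_1, p + \delta_1) \cap (U_1 \setminus \{p\})$. So $|f_1(x) - f_1(p) - v(x - p)| < \varepsilon|x - p|$.
Therefore $|f_2(x) - f_2(p) - v(x - p)| < \varepsilon|x - p|$ because $f_1|_{U_2} = f_2|_{U_1}$ and $x \in U_1 \cap U_2$. Thus

\[
\forall \varepsilon > 0,\ \exists \delta_2 > 0,\ \forall x \in (p - \delta_2, p + \delta_2) \cap (U_2 \setminus \{p\}),\quad
|f_2(x) - f_2(p) - v(x - p)| < \varepsilon |x - p|.
\]

Hence $f_2$ is differentiable at $p$ by Definition 40.3.4. The converse follows similarly.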


40.3.7 THEOREM: Inheritance of differentiability by restrictions to open subsets. 
Let $U_1, U_2 \in \mathrm{Top}(\mathbb{R})$ with $U_2 \subseteq U_1$. Let $f$ be differentiable on $U_1$. Then $f|_{U_2}$ is differentiable on $U_2$.


PROOF: Suppose that $f$ is differentiable on $U_1$. Let $p \in U_2$. Then $p \in U_1$. So $f$ is differentiable at $p$ by
Definition 40.3.4. Therefore $f|_{U_2}$ is differentiable at $p$ by Theorem 40.3.6. Hence $f|_{U_2}$ is differentiable on $U_2$
by Definition 40.3.4.


40.3.8 THEOREM: Everywhere local differentiability implies global differentiability. 


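Let $U \in \mathrm{Top}(\mathbb{R})$ and $f : U \to \mathbb{R}$. Then the following propositions are equivalent.
(i) $f$ is differentiable on $U$.
(ii) $f|_{U \cap \Omega}$ is differentiable on $U \cap \Omega$ for all $\Omega \in C$, for every open cover $C$ of $U$.
(iii) $f|_{U \cap \Omega}$ is differentiable on $U \cap \Omega$ for all $\Omega \in C$, for some open cover $C$ of $U$.
(iv) $\forall p \in U,\ \forall \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega$ is differentiable on $\Omega$.
(v) $\forall p \in U,\ \exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega$ is differentiable on $\Omega$.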


PROOF: To show that (i) implies (ii), let $f$ be differentiable on $U$. Let $C$ be an open cover of $U$. Let
$\Omega \in C$. Then $U \cap \Omega \in \mathrm{Top}(\mathbb{R})$ by Definition 31.7.2 and $U \cap \Omega \subseteq U$. So $f|_{U \cap \Omega}$ is differentiable on $U \cap \Omega$ by
Theorem 40.3.7. Thus (i) implies (ii).

To show that (ii) implies (i), let $f$ satisfy (ii). Let $C = \{U\}$. Then $C$ is an open cover of $U$ by Definition 31.7.2.
So $f = f|_U = f|_{U \cap U}$ is differentiable by (ii). Thus (ii) implies (i). Hence (i) and (ii) are equivalent.

To show that (ii) implies (iii), suppose that $f$ satisfies (ii). Let $C = \{U\}$. Then $C$ is an open cover of $U$ by
Definition 31.7.2. So $f|_{U \cap \Omega}$ is differentiable on $U \cap \Omega$ for all $\Omega \in C$ by (ii). Therefore (iii) is satisfied with
$C = \{U\}$. Thus (ii) implies (iii).

To show that (iii) implies (i), let $f$ satisfy (iii). Then $U$ has an open cover $C$ such that $f|_{U \cap \Omega}$ is differentiable
on $U \cap \Omega$ for all $\Omega \in C$. Let $p \in U$. Then $p \in \Omega$ for some $\Omega \in C$ by Definition 31.7.2. So $p \in U \cap \Omega$.
Therefore $f|_{U \cap \Omega}$ is differentiable on $U \cap \Omega$. So $f|_{U \cap \Omega}$ is differentiable at $p$ by Definition 40.3.4. Therefore
$f$ is differentiable at $p$ by Theorem 40.3.6. Consequently $f$ is differentiable on $U$ by Definition 40.3.4. Thus
(iii) implies (i). Hence (i), (ii) and (iii) are equivalent.

To show that (i) implies (iv), let $f$ satisfy (i). Let $p \in U$. Let $\Omega \in \mathrm{Top}_p(U)$. Then $f|_\Omega$ is differentiable on $\Omega$
by Theorem 40.3.7. Thus (i) implies (iv).

To show that (iv) implies (i), let $f$ satisfy (iv). If $U = \emptyset$, then $f$ is differentiable on $U$ by Definition 40.3.4.
So assume that $U \ne \emptyset$. Let $p \in U$. Let $\Omega = U$. Then $\Omega \in \mathrm{Top}_p(U)$. So $f = f|_\Omega$ is differentiable on $U = \Omega$.
Thus (iv) implies (i). Hence (i), (ii), (iii) and (iv) are equivalent.

To show that (i) implies (v), let $f$ satisfy (i). Let $p \in U$. Let $\Omega = U$. Then $f|_\Omega = f$ is differentiable on
$\Omega = U$. Thus (i) implies (v).

To show that (v) implies (iii), let $f$ satisfy (v). Let $C = \{\Omega \in \mathrm{Top}(U);\ f|_\Omega \text{ is differentiable on } \Omega\}$. Then $C$
is an open cover of $U$ by subsets of $U$, and $f|_{U \cap \Omega} = f|_\Omega$ is differentiable on $U \cap \Omega = \Omega$ for all $\Omega \in C$. Thus
(v) implies (iii). Hence (i), (ii), (iii), (iv) and (v) are equivalent.


40.3.9 REMARK:  Differential-quotient-limit definition of differentiability. 

Theorem 40.3.10 converts the “goodness-of-fit” style of differentiability in Definition 40.3.4 to the differential 
quotient style which is familiar from elementary calculus textbooks. This corresponds geometrically to the 
idea of approximating a tangent of a curve with chords. By converting each chord to its real-number gradient, 
a geometric concept is converted to a numerical concept. One practical advantage of the differential quotient 
is that it can be used for a limit expression for the derivative, namely $f'(p) = \lim_{x \to p} (f(x) - f(p))/(x - p)$. (See
Theorem 40.4.9.) Perhaps this fact more than anything else is the real motivation for defining differentiability 
in terms of differential quotients. 


40.3.10 THEOREM: Equivalent quotient form of the definition of differentiability. 
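A function $f : U \to \mathbb{R}$ for $U \in \mathrm{Top}(\mathbb{R})$ is differentiable at $p$ for $p \in U$ if and only if

\[
\exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad
\left| \frac{f(x) - f(p)}{x - p} - v \right| < \varepsilon.
\]

A function $f : U \to \mathbb{R}$ for $U \in \mathrm{Top}(\mathbb{R})$ is differentiable on $U$ if and only if

\[
\forall p \in U,\ \exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad
\left| \frac{f(x) - f(p)}{x - p} - v \right| < \varepsilon.
\]


PROOF: The assertions follow directly from Definition 40.3.4 by dividing line (40.3.4) by $|x - p|$ and then
observing that $|f(x) - f(p) - v(x - p)|/|x - p| = |(f(x) - f(p))/(x - p) - v|$ for $x \ne p$.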


40.4. Derivatives of differentiable functions 


40.4.1 REMARK: Uniqueness of the derivative of a differentiable function. 

The real number v in Definition 40.3.4 is unique for a given function f and point p. This is shown in 
Theorem 40.4.2. The difference between equations (40.3.4) and (40.4.1) is the use of the unique existence 
quantifier “$\exists'$” in (40.4.1).


40.4.2 THEOREM: Uniqueness of the derivative in the definition of differentiability. 
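Let $U \in \mathrm{Top}(\mathbb{R})$ and let $f : U \to \mathbb{R}$ be a function which is differentiable at $p \in U$. Then

\[
\exists' v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v(x - p)| < \varepsilon |x - p|. \tag{40.4.1}
\]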


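PROOF 1: The uniqueness of $v$ follows from a straightforward modus tollens. Let $v_1, v_2 \in \mathbb{R}$ satisfy equation
(40.3.4) for some $p \in U$. Suppose $v_1 \ne v_2$ and let $\varepsilon = |v_1 - v_2|/2$. Then $\varepsilon > 0$. Therefore

\[
\exists \delta_1 > 0,\ \forall x \in (p - \delta_1, p + \delta_1) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v_1(x - p)| < \varepsilon |x - p|
\]

and

\[
\exists \delta_2 > 0,\ \forall x \in (p - \delta_2, p + \delta_2) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v_2(x - p)| < \varepsilon |x - p|.
\]

Let $\delta = \min(\delta_1, \delta_2)$. Then for all $x \in (p - \delta, p + \delta) \cap (U \setminus \{p\})$,

\[
|f(x) - f(p) - v_1(x - p)| < \varepsilon|x - p| \quad\text{and}\quad |f(x) - f(p) - v_2(x - p)| < \varepsilon|x - p|.
\]

Since $(p - \delta, p + \delta) \cap (U \setminus \{p\}) \ne \emptyset$, it follows that for some $x \in (p - \delta, p + \delta) \cap (U \setminus \{p\})$,

\[
\begin{aligned}
|v_1 - v_2| \cdot |x - p| &= |(v_1 - v_2)(x - p)| \\
&= |(f(x) - f(p) - v_1(x - p)) - (f(x) - f(p) - v_2(x - p))| \\
&\le |f(x) - f(p) - v_1(x - p)| + |f(x) - f(p) - v_2(x - p)| \\
&< 2\varepsilon|x - p| \\
&= |v_1 - v_2| \cdot |x - p|.
\end{aligned}
\]

This is impossible. Therefore $v_1 = v_2$. In other words, $v$ is unique as claimed.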


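PROOF 2: Let $v_1, v_2 \in \mathbb{R}$ satisfy equation (40.3.4) for some $p \in U$. Let $\varepsilon > 0$. Then

\[
\exists \delta_1 > 0,\ \forall x \in (p - \delta_1, p + \delta_1) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v_1(x - p)| < \varepsilon |x - p|
\]

and

\[
\exists \delta_2 > 0,\ \forall x \in (p - \delta_2, p + \delta_2) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v_2(x - p)| < \varepsilon |x - p|.
\]

Let $\delta = \min(\delta_1, \delta_2)$. Then since $(p - \delta, p + \delta) \cap (U \setminus \{p\}) \ne \emptyset$, for some $x \in (p - \delta, p + \delta) \cap (U \setminus \{p\})$,

\[
\begin{aligned}
|v_1 - v_2| &= |(v_1 - v_2)(x - p)|/|x - p| \\
&= |(f(x) - f(p) - v_1(x - p)) - (f(x) - f(p) - v_2(x - p))|/|x - p| \\
&\le \bigl(|f(x) - f(p) - v_1(x - p)| + |f(x) - f(p) - v_2(x - p)|\bigr)/|x - p| \\
&< 2\varepsilon|x - p|/|x - p| \\
&= 2\varepsilon.
\end{aligned}
\]

Since $\varepsilon > 0$ was arbitrary, $|v_1 - v_2| = 0$. So $v_1 = v_2$. In other words, $v$ is unique as claimed.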


40.4.3 REMARK: Definition of the (unique) derivative of a differentiable function. 

The uniqueness of the number v in Definition 40.3.4 implies that the word “the” can be used instead of 
“a” for this value. Thus it may be given a name such as “the derivative of f at p". (If the number v was 
not unique, we would have to call it “a derivative of f at p".) Hence Definition 40.4.4 is meaningful. If 
the function $f$ is differentiable on $U$, there is a unique well-defined function which maps each $p \in U$ to a
corresponding unique value v at p. This is called “the derivative of f". 


40.4.4 DEFINITION: The derivative of $f$ at $p$, for a function $f : U \to \mathbb{R}$ which is differentiable at a point
$p \in U$ for $U \in \mathrm{Top}(\mathbb{R})$, is the unique number $v \in \mathbb{R}$ which satisfies

\[
\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad
|f(x) - f(p) - v(x - p)| < \varepsilon |x - p|. \tag{40.4.2}
\]

The derivative of a function $f : U \to \mathbb{R}$ which is differentiable on $U \in \mathrm{Top}(\mathbb{R})$, is the function from $U$ to $\mathbb{R}$
whose value for each $p \in U$ is the derivative of $f$ at $p$.


40.4.5 NOTATION: $f'$, for a function $f : U \to \mathbb{R}$ which is differentiable on $U$ for some $U \in \mathrm{Top}(\mathbb{R})$, denotes
the derivative of $f$.
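
As a worked illustration of Definition 40.4.4 (an example chosen here, not taken from the text), consider $f(x) = x^2$ on $U = \mathbb{R}$ with the candidate value $v = 2p$. Given $\varepsilon > 0$, the choice $\delta = \varepsilon$ works, because for $0 < |x - p| < \delta$,

\[
\begin{aligned}
|f(x) - f(p) - 2p(x - p)| &= |x^2 - p^2 - 2p(x - p)| \\
&= |(x - p)(x + p) - 2p(x - p)| \\
&= |x - p|^2 \\
&< \varepsilon |x - p|.
\end{aligned}
\]

So $v = 2p$ satisfies condition (40.4.2), and by the uniqueness in Theorem 40.4.2 the derivative of $f$ at $p$ is $2p$.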


40.4.6 REMARK: Expression for the derivative of a real function using quotients and balls. 
Condition (40.4.2) in Definition 40.4.4 can also be expressed in terms of the differential quotient and open 
balls as follows. 


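\[
\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in \dot{B}_{p,\delta} \cap U,\quad
\frac{f(x) - f(p)}{x - p} \in B_{v,\varepsilon}.
\]

(See Definition 37.3.14 and Notation 37.3.15 for the punctured open ball $\dot{B}_{p,\delta}$.)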
40.4.7 REMARK: Pseudo-notation for the derivative of a function at a single point.

According to Notation 40.4.5, $f'$ denotes a function from $U$ to $\mathbb{R}$ for some $U \in \mathrm{Top}(\mathbb{R})$. Therefore the
notation $f'(p)$ is meaningful for all $p \in U$, and it means the value of the function $f'$ at $p$. (See Notation 10.2.9.)
However, it is customary to use the notation $f'(p)$ for the derivative of a function $f : U \to \mathbb{R}$ at $p \in U$ even
if $f'$ is not a well-defined function. It may be, in fact, that the derivative of $f$ at $p$ is defined for only one
point $p$ in $U$. Then $f'(p)$ refers only to a single real number, and there is no such function as $f'$, unless one
defines the domain of $f'$ to be the set of all $p \in U$ where $f$ is differentiable. (This option for making $f'$
denote a partially defined function from $U$ to $\mathbb{R}$ is outlined in Remark 42.1.1.)


40.4.8 NOTATION: $f'(p)$, for a function $f : U \to \mathbb{R}$ which is differentiable at $p \in U$ for some $U \in \mathrm{Top}(\mathbb{R})$,
denotes the derivative of $f$ at $p$.


40.4.9 THEOREM: Equivalent quotient form for the derivative of a function at a point. 
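Let $f : U \to \mathbb{R}$ for some $U \in \mathrm{Top}(\mathbb{R})$ be differentiable at $p \in U$. Then

\[
f'(p) = \lim_{x \to p} \frac{f(x) - f(p)}{x - p}.
\]

PROOF: Let $U \in \mathrm{Top}(\mathbb{R})$. Let $f : U \to \mathbb{R}$ be differentiable at $p \in U$. Define $g : U \to \mathbb{R}$ by $g(p) = f'(p)$
and $g(x) = (f(x) - f(p))/(x - p)$ for $x \in U \setminus \{p\}$. Let $G \in \mathrm{Top}_{f'(p)}(\mathbb{R})$. Then $B_{f'(p),\varepsilon} \subseteq G$ for some
$\varepsilon \in \mathbb{R}^+$. So by Definition 40.4.4, for some $\delta \in \mathbb{R}^+$, $\forall x \in \dot{B}_{p,\delta} \cap U$, $|g(x) - f'(p)| < \varepsilon$. In other words,
$g(\Omega \setminus \{p\}) \subseteq B_{f'(p),\varepsilon}$, where $\Omega = B_{p,\delta} \cap U$. Therefore $g(\Omega \setminus \{p\}) \subseteq G$. But $\Omega \in \mathrm{Top}_p(\mathbb{R})$. Consequently
$\forall G \in \mathrm{Top}_{f'(p)}(\mathbb{R}),\ \exists \Omega \in \mathrm{Top}_p(\mathbb{R}),\ g(\Omega \setminus \{p\}) \subseteq G$. So $f'(p) = \lim_{x \to p} (f(x) - f(p))/(x - p)$ by Definition 35.3.7
and Notation 35.3.17.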


40.4.10 REMARK: Making differentiability proofs easier by weakening the inequality. 

The purpose of Theorem 40.4.11 is to simplify some proofs by very slightly weakening the inequality which 
needs to be satisfied to pass the test. Note that in Definition 40.3.4, a “punctured interval” $(p - \delta, p + \delta) \setminus \{p\}$ is
used because the strict inequality $|f(x) - f(p) - v(x - p)| < \varepsilon|x - p|$ in condition (40.3.4) cannot be satisfied
for x = p. This exclusion of x = p has the convenient consequence that the condition can be expressed 
in terms of the differential quotient (f(x) — f(p))/(x — p) in Definition 40.4.4 for example. But for some 
proofs it is preferable to apply the slightly weaker inequality in Theorem 40.4.11, for example in the proof 
of Theorem 40.5.7 (i, ii, iii). 


40.4.11 THEOREM: Equivalence of weak and strong inequalities in the definition of the derivative. 
Let $f : U \to \mathbb{R}$ for some $U \in \mathrm{Top}(\mathbb{R})$. Let $p \in U$. Then $f$ is differentiable at $p$ with derivative $f'(p)$ at $p$ if
and only if

\[
\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap U,\quad
|f(x) - f(p) - f'(p)(x - p)| \le \varepsilon |x - p|. \tag{40.4.3}
\]

PROOF: Let $f : U \to \mathbb{R}$ for some $U \in \mathrm{Top}(\mathbb{R})$, and $p \in U$. If $f$ is differentiable at $p$ with derivative $f'(p)$
at $p$, then condition (40.4.3) follows directly from Definitions 40.3.4 and 40.4.4. (When $x = p$, the inequality
states that $0 \le 0$, which is clearly true!)

For the converse, let $f$ satisfy condition (40.4.3). Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \varepsilon/2$. Then $\varepsilon_1 \in \mathbb{R}^+$. So by (40.4.3),
for some $\delta \in \mathbb{R}^+$, for all $x \in (p - \delta, p + \delta) \cap (U \setminus \{p\})$, $|f(x) - f(p) - f'(p)(x - p)| \le \varepsilon_1|x - p| < \varepsilon|x - p|$
because $|x - p| > 0$. So $f$ is differentiable at $p$ with derivative $f'(p)$ at $p$ by Definitions 40.3.4 and 40.4.4.


40.4.12 REMARK: Notations for derivatives of real-valued functions. 

EDM2 [113], 106.A, gives the notations $dy/dx$, $y'$, $\dot{y}$, $df(x)/dx$, $(d/dx)f(x)$, $f'(x)$ and $D_x f(x)$ for the derivative
of $y = f(x)$. In the olden days, the dot notation $\dot{y}$ usually meant the derivative with respect to time,
whereas the dashed notation $y'$ meant the derivative with respect to a space variable. Leibniz created the
$dy/dx$ notation whereas Newton created the dot notation. (For the attribution of the $dy/dx$ notation to
Leibniz, see for example Spivak [140], pages 131-132; Cajori [242], Volume II, page 186.)


Some authors attribute the slow progress of analysis in England after Newton to the use of his notation. For 
example, Ball [232], page 439, wrote the following in about 1893. 
Towards the beginning of the last century the more thoughtful members of the Cambridge school
of mathematics began to recognise that their isolation from their continental contemporaries was a 
serious evil. The earliest attempt in England to explain the notation and methods of the calculus as 
used on the continent was due to Woodhouse, who stands out as the apostle of the new movement. 


Ball [232], page 441, wrote the following about Charles Babbage, who was also a member of the Cambridge 
Analytical Society. 


It was he who gave the name to the Analytical Society, which, as he stated, was formed to advocate 
“the principles of pure d-ism as opposed to the dot-age of the university.” 


The d and dot here refer to the continental and Newtonian notations respectively. Struik [249], page 171, 
wrote the following. (He gives the reference Dubbey [243] for the Babbage quote.) 


The names of Hamilton and Cayley show that by 1840 English-speaking mathematicians had at 
last begun to catch up with their continental colleagues. Until well into the nineteenth century, 
the Cambridge and Oxford dons regarded any attempt at improvement of the theory of fluxions 
as an impious revolt against the sacred memory of Newton. The result was that the Newtonian 
school of England and the Leibnitzian school of the continent drifted apart to such an extent that 
Euler, in his integral calculus (1768), considered a union of both methods of expression as useless. 
The dilemma was broken in 1812 by a group of young mathematicians at Cambridge who, under 
the inspiration of the older Robert Woodhouse, formed an Analytical Society to propagate the 
differential notation. Its leaders were George Peacock, Charles Babbage, and John Herschel. They 
tried, in Babbage’s words to advocate “the principles of pure d-ism as opposed to the dot-age of the 
university.” [...] The new generation in England now began to participate in modern mathematics. 


Ball [232], pages 361—362, also wrote as follows about Newton’s [228] “fluxions” approach as opposed to the 
European “differential” method. 


The controversy with Leibnitz was regarded in England as an attempt by foreigners to defraud 
Newton of the credit of his invention, and the question was complicated on both sides by national 
jealousies. It was therefore natural, though it was unfortunate, that in England the geometrical and 
fluxional methods as used by Newton were alone studied and employed. For more than a century 
the English school was thus out of touch with continental mathematicians. The consequence was 
that, in spite of the brilliant band of scholars formed by Newton, the improvements in the methods 
of analysis gradually effected on the continent were almost unknown in Britain. It was not until 
1820 that the value of analytical methods was fully recognised in England, and that Newton's 
countrymen again took any large share in the development of mathematics. 


(See also some related comments in Remark 77.1.6 regarding the decline of English mathematics in the 18th 
and 19th centuries. See also Remark 57.9.12 regarding the dot and dash notations.) 


40.4.13 REMARK: Expressing the derivative of a real-valued function as a limit. 
Equation (40.3.4) is the same as (40.4.4).
\[
\lim_{x \to p} \frac{|f(x) - f(p) - v(x - p)|}{|x - p|} = 0. \tag{40.4.4}
\]
(See Definition 35.3.7 for the definition of the limit of a function at a point.) 


40.5. Differentiation rules for one variable 


40.5.1 REMARK:  Differentiability of a real-valued function of one variable implies continuity. 
Differentiability is generally thought of as a stronger condition than continuity. However, in the case of 
multiple independent variables, Examples 41.1.21 and 41.4.8 show that partial and directional differentiability 
do not imply continuity. Therefore it is prudent to prove that differentiability for a single independent 
variable does imply continuity. Theorem 40.5.2 is also useful for proving various differential calculus rules. 


40.5.2 THEOREM: Pointwise differentiability implies pointwise continuity.
Let $U \in \mathrm{Top}(\mathbb{R})$. Let $f : U \to \mathbb{R}$ be differentiable at $p \in U$. Then $f$ is continuous at $p$.
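PROOF: Let $f : U \to \mathbb{R}$ be differentiable at $p \in U \in \mathrm{Top}(\mathbb{R})$. Let $\varepsilon > 0$. Let $\varepsilon_1 = 1$. By Theorem 40.4.11,

\[
\exists \delta_1 > 0,\ \forall x \in (p - \delta_1, p + \delta_1) \cap U,\quad
|f(x) - f(p) - f'(p)(x - p)| \le \varepsilon_1|x - p| = |x - p|.
\]

Let $\delta = \min(\delta_1, \tfrac12\varepsilon(1 + |f'(p)|)^{-1})$. Then

\[
\begin{aligned}
\forall x \in (p - \delta, p + \delta) \cap U,\quad
|f(x) - f(p)| &\le |f(x) - f(p) - f'(p)(x - p)| + |f'(p)(x - p)| \\
&\le |x - p| + |f'(p)| \cdot |x - p| \\
&\le \tfrac12\varepsilon < \varepsilon.
\end{aligned}
\]

Hence $f$ is continuous at $p$ by Theorem 38.1.10.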
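
The choice of $\delta$ in this proof can be tried out numerically. The sketch below is only illustrative: the helper name `continuity_delta` and the sample function $f(x) = x^2$ at $p = 1$ are assumptions made here, and $\delta_1 = 1$ is justified by hand, since $|f(x) - f(p) - f'(p)(x - p)| = (x - 1)^2 \le |x - 1|$ whenever $|x - 1| \le 1$.

```python
def continuity_delta(eps, delta1, dfp):
    """delta = min(delta1, (eps/2)/(1 + |f'(p)|)), the choice made in the proof.
    Hypothetical helper name; dfp stands for the value f'(p)."""
    return min(delta1, 0.5 * eps / (1.0 + abs(dfp)))


f = lambda x: x * x
p, dfp = 1.0, 2.0
delta1 = 1.0      # valid for this f and p, as checked by hand above
eps = 0.3
delta = continuity_delta(eps, delta1, dfp)

# Largest sampled deviation |f(x) - f(p)| over points with |x - p| < delta.
worst = max(abs(f(p + s * delta * k / 1001) - f(p))
            for k in range(1, 1001) for s in (-1, 1))
print(delta, worst, worst < eps)
```

The sampled deviation stays below $\varepsilon$, as the proof guarantees for every point of the interval, not only the sampled ones.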


40.5.3 THEOREM: Differentiability implies continuity. 
Let $f : U \to \mathbb{R}$ be differentiable on $U \in \mathrm{Top}(\mathbb{R})$. Then $f$ is continuous on $U$.


PROOF: Let $f : U \to \mathbb{R}$ be differentiable on $U \in \mathrm{Top}(\mathbb{R})$. Then $f$ is differentiable at $p$ for all $p \in U$ by
Definition 40.3.4. So $f$ is continuous at $p$ for all $p \in U$ by Theorem 40.5.2. Hence $f$ is continuous on $U$ by
Theorem 35.3.3.


40.5.4 REMARK: The derivative of any constant real function is zero. 
As usual, proving a trivial theorem is a good exercise in the mechanics of using new definitions. (If it is 
too difficult to prove a trivial theorem, this would suggest that the definitions should be adjusted to make 
them more easily applicable.) Theorem 40.5.5 is an exercise in the application of the basic definitions for 
differentiable functions. It is also useful, as many trivial theorems are. 


40.5.5 THEOREM: Constant functions are differentiable, and have derivative equal to zero. 
Let $U \in \mathrm{Top}(\mathbb{R})$. Let $c \in \mathbb{R}$. Define $f : U \to \mathbb{R}$ by $f(x) = c$ for all $x \in U$. Then $f$ is differentiable on $U$,
and $f'(p) = 0$ for all $p \in U$.


PROOF: Let $p \in U$. Let $v = 0$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $x \in (p - \delta, p + \delta) \cap (U \setminus \{p\})$. Then $|f(x) - f(p) - v(x - p)| =
|c - c - 0| = 0 < \varepsilon|x - p|$. So $f$ is differentiable on $U$ by Definition 40.3.4, and the derivative of $f$ at $p$ equals
0 for all $p \in U$ by Definition 40.4.4. Hence $f'(p) = 0$ for all $p \in U$ by Notation 40.4.8.


40.5.6 REMARK: Elementary proofs of elementary differential calculus rules. 

Although the constant multiplication rule, sum rule, product rule and quotient rule in Theorem 40.5.7 are 
among the first rules to be demonstrated in an elementary differential calculus course, their proofs, which 
are not entirely trivial, contain ingredients which are often needed to prove advanced theorems. 


40.5.7 THEOREM: Some elementary pointwise differential calculus rules for first-order derivatives. 

Let $U \in \mathrm{Top}(\mathbb{R})$ and $p \in U$.

(i) (Constant multiplication rule.) If $\lambda \in \mathbb{R}$ and $f : U \to \mathbb{R}$ is differentiable at $p$, then the pointwise
product $\lambda f$ is differentiable at $p$ and $(\lambda f)'(p) = \lambda f'(p)$.

(ii) (Sum rule.) If $f_1, f_2 : U \to \mathbb{R}$ are differentiable at $p$, then the pointwise sum $f_1 + f_2$ is differentiable
at $p$ and $(f_1 + f_2)'(p) = f_1'(p) + f_2'(p)$.

(iii) (Product rule.) If $f_1, f_2 : U \to \mathbb{R}$ are differentiable at $p$, then the pointwise product $f_1 f_2$ is differentiable
at $p$ and $(f_1 f_2)'(p) = f_1'(p) f_2(p) + f_1(p) f_2'(p)$.

(iv) (Reciprocal rule.) If $g : U \to \mathbb{R} \setminus \{0\}$ is differentiable at $p$, then the pointwise reciprocal $1/g$ is
differentiable at $p$ and $(1/g)'(p) = -g'(p)/g(p)^2$.

(v) (Quotient rule.) If $f : U \to \mathbb{R}$ and $g : U \to \mathbb{R} \setminus \{0\}$ are differentiable at $p$, then the pointwise quotient
$f/g$ is differentiable at $p$ and $(f/g)'(p) = f'(p)/g(p) - f(p)g'(p)/g(p)^2 = (f'(p)g(p) - f(p)g'(p))/g(p)^2$.
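PROOF: For part (i), let $\lambda \in \mathbb{R}$ and let $f : U \to \mathbb{R}$ be differentiable at $p$. Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \varepsilon/\max(1, |\lambda|)$.
By Theorem 40.4.11, $\exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap U$, $|f(x) - f(p) - f'(p)(x - p)| \le \varepsilon_1|x - p|$ because $\varepsilon_1 \in \mathbb{R}^+$.
Then $|(\lambda f)(x) - (\lambda f)(p) - \lambda f'(p)(x - p)| = |\lambda| \cdot |f(x) - f(p) - f'(p)(x - p)| \le |\lambda|\varepsilon_1|x - p| \le \varepsilon|x - p|$ for such
$\delta$ and $x$. Hence $\lambda f$ is differentiable at $p$ and $(\lambda f)'(p) = \lambda f'(p)$ by Theorem 40.4.11.

For part (ii), let $f_1, f_2 : U \to \mathbb{R}$ be differentiable at $p$. Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_0 = \varepsilon/2 \in \mathbb{R}^+$. By Theorem 40.4.11,
$\exists \delta_k > 0,\ \forall x \in (p - \delta_k, p + \delta_k) \cap U$, $|f_k(x) - f_k(p) - f_k'(p)(x - p)| \le \varepsilon_0|x - p|$ for $k = 1, 2$. Let $\delta = \min(\delta_1, \delta_2)$.
Then $\delta > 0$ and $\forall x \in (p - \delta, p + \delta) \cap U$, $|(f_1 + f_2)(x) - (f_1 + f_2)(p) - (f_1'(p) + f_2'(p))(x - p)| \le 2\varepsilon_0|x - p| = \varepsilon|x - p|$.
Hence $f_1 + f_2$ is differentiable at $p$ and $(f_1 + f_2)'(p) = f_1'(p) + f_2'(p)$ by Theorem 40.4.11.

For part (iii), let $f_1, f_2 : U \to \mathbb{R}$ be differentiable at $p$. Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \tfrac14\varepsilon/\max(1, |f_2(p)|)$ and
$\varepsilon_2 = \tfrac14\varepsilon/\max(1, |f_1(p)|)$. Then $\varepsilon_1, \varepsilon_2 \in \mathbb{R}^+$. So by Theorem 40.4.11,

\[
\exists \delta_k > 0,\ \forall x \in (p - \delta_k, p + \delta_k) \cap U,\quad
|f_k(x) - f_k(p) - f_k'(p)(x - p)| \le \varepsilon_k|x - p| \tag{40.5.1}
\]

for $k = 1, 2$. Let $\delta_0 = \tfrac12\varepsilon(|f_1'(p)| + \varepsilon_1)^{-1}(|f_2'(p)| + \varepsilon_2)^{-1}$. Let $\delta = \min(\delta_0, \delta_1, \delta_2)$. Then

\[
\begin{aligned}
\forall x \in (p - \delta, p + \delta) \cap U,\quad
&|(f_1 f_2)(x) - (f_1 f_2)(p) - (f_1'(p) f_2(p) + f_1(p) f_2'(p))(x - p)| \\
&= |(f_1(x) - f_1(p) - f_1'(p)(x - p)) f_2(p) + (f_2(x) - f_2(p) - f_2'(p)(x - p)) f_1(p) \\
&\qquad + (f_1(x) - f_1(p))(f_2(x) - f_2(p))| \\
&\le \varepsilon_1|f_2(p)| \cdot |x - p| + \varepsilon_2|f_1(p)| \cdot |x - p| + |f_1(x) - f_1(p)| \cdot |f_2(x) - f_2(p)| \\
&\le \tfrac12\varepsilon|x - p| + |f_1(x) - f_1(p)| \cdot |f_2(x) - f_2(p)|.
\end{aligned}
\]

But $|f_k(x) - f_k(p)| \le |f_k'(p)| \cdot |x - p| + \varepsilon_k|x - p|$ for $k = 1, 2$ by line (40.5.1). So

\[
\begin{aligned}
\forall x \in (p - \delta, p + \delta) \cap U,\quad
|f_1(x) - f_1(p)| \cdot |f_2(x) - f_2(p)| &\le (|f_1'(p)| + \varepsilon_1)(|f_2'(p)| + \varepsilon_2)|x - p|^2 \\
&\le \delta_0(|f_1'(p)| + \varepsilon_1)(|f_2'(p)| + \varepsilon_2)|x - p| \\
&= \tfrac12\varepsilon|x - p|.
\end{aligned}
\]

Therefore

\[
\forall x \in (p - \delta, p + \delta) \cap U,\quad
|(f_1 f_2)(x) - (f_1 f_2)(p) - (f_1'(p) f_2(p) + f_1(p) f_2'(p))(x - p)| \le \varepsilon|x - p|.
\]

Hence $f_1 f_2$ is differentiable at $p$ and $(f_1 f_2)'(p) = f_1'(p) f_2(p) + f_1(p) f_2'(p)$ by Theorem 40.4.11.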


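For part (iv), let $g : U \to \mathbb{R} \setminus \{0\}$ be differentiable at $p$. Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \tfrac14\varepsilon|g(p)|^2$. By Theorem 40.4.11,
$\exists \delta_1 > 0,\ \forall x \in (p - \delta_1, p + \delta_1) \cap U$, $|g(x) - g(p) - g'(p)(x - p)| \le \varepsilon_1|x - p|$ because $\varepsilon_1 \in \mathbb{R}^+$. Since $g$ is
continuous at $p$ by Theorem 40.5.2, $\exists \delta_0 \in \mathbb{R}^+,\ \forall x \in (p - \delta_0, p + \delta_0) \cap U$, $|g(x)| > \tfrac12|g(p)|$ and
$|g'(p)| \cdot |g(x) - g(p)| \le \varepsilon_1|g(p)|$. Let $\delta = \min(\delta_0, \delta_1)$. Then

\[
\begin{aligned}
\forall x \in (p - \delta, p + \delta) \cap U,\quad
&|(1/g)(x) - (1/g)(p) + g'(p)g(p)^{-2}(x - p)| \\
&= |g(x)|^{-1}|g(p)|^{-1}\,|g(p) - g(x) + g'(p)g(x)g(p)^{-1}(x - p)| \\
&\le |g(x)|^{-1}|g(p)|^{-1}\bigl(|g(x) - g(p) - g'(p)(x - p)| + |g'(p)| \cdot |g(p)|^{-1}|g(x) - g(p)| \cdot |x - p|\bigr) \\
&\le 2\varepsilon_1|g(x)|^{-1}|g(p)|^{-1}|x - p| \\
&\le 4\varepsilon_1|g(p)|^{-2}|x - p| \\
&= \varepsilon|x - p|.
\end{aligned}
\]

Hence $1/g$ is differentiable at $p$ and $(1/g)'(p) = -g'(p)/g(p)^2$ by Theorem 40.4.11.
Part (v) follows from parts (iii) and (iv).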
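
The product and quotient rules can be sanity-checked numerically. The sketch below is not the book's method: it uses a symmetric difference quotient as an illustrative stand-in for the derivative, and the helper name `central_quotient`, the step size and the test functions $\sin$ and $\exp$ are all assumptions made here.

```python
import math


def central_quotient(f, p, h=1e-6):
    """Symmetric difference quotient (f(p+h) - f(p-h))/(2h), used here only
    as a numerical approximation to f'(p). Hypothetical helper."""
    return (f(p + h) - f(p - h)) / (2.0 * h)


f1, df1 = math.sin, math.cos      # f1' = cos
f2, df2 = math.exp, math.exp      # f2' = exp, and exp never vanishes
p = 0.7

product = lambda x: f1(x) * f2(x)
quotient = lambda x: f1(x) / f2(x)

# Values predicted by Theorem 40.5.7 (iii) and (v).
product_rule = df1(p) * f2(p) + f1(p) * df2(p)
quotient_rule = (df1(p) * f2(p) - f1(p) * df2(p)) / f2(p) ** 2

print(central_quotient(product, p), product_rule)
print(central_quotient(quotient, p), quotient_rule)
```

Agreement to several decimal places is consistent with the rules but is no substitute for the proofs above.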


40.5.8 REMARK: Application of pointwise calculus rules to functions on open sets. 
Theorem 40.5.9 applies the pointwise calculus rules in Theorem 40.5.7 to functions on open subsets of the 
real numbers. 

40.5.9 THEOREM: Some elementary differential calculus rules for first-order derivatives.
Let $U \in \mathrm{Top}(\mathbb{R})$.

(i) If $\lambda \in \mathbb{R}$ and $f : U \to \mathbb{R}$ is differentiable on $U$, then the pointwise product $\lambda f$ is differentiable on $U$ and $\forall p \in U$, $(\lambda f)'(p) = \lambda f'(p)$. In other words, $(\lambda f)' = \lambda f'$. [constant multiplication rule]
(ii) If $f, g : U \to \mathbb{R}$ are differentiable on $U$, then the pointwise sum $f + g$ is differentiable on $U$ and $\forall p \in U$, $(f + g)'(p) = f'(p) + g'(p)$. In other words, $(f + g)' = f' + g'$. [sum rule]
(iii) If $f, g : U \to \mathbb{R}$ are differentiable on $U$, then the pointwise product $fg$ is differentiable on $U$ and $\forall p \in U$, $(fg)'(p) = f'(p) g(p) + f(p) g'(p)$. In other words, $(fg)' = f'g + fg'$. [product rule]
(iv) If $g : U \to \mathbb{R} \setminus \{0\}$ is differentiable on $U$, then the pointwise reciprocal $1/g$ is differentiable on $U$ and $\forall p \in U$, $(1/g)'(p) = -g'(p)/g(p)^2$. In other words, $(1/g)' = -g'/g^2$. [reciprocal rule]
(v) If $f : U \to \mathbb{R}$ and $g : U \to \mathbb{R} \setminus \{0\}$ are differentiable on $U$, then the pointwise quotient $f/g$ is differentiable on $U$ and $\forall p \in U$, $(f/g)'(p) = f'(p)/g(p) - f(p) g'(p)/g(p)^2 = (f'(p) g(p) - f(p) g'(p))/g(p)^2$. In other words, $(f/g)' = f'/g - fg'/g^2 = (f'g - fg')/g^2$. [quotient rule]


PROOF: Parts (i), (ii), (iii), (iv) and (v) follow directly from Theorem 40.5.7 parts (i), (ii), (iii), (iv) and (v) 
respectively. 
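
The rules in Theorem 40.5.9 can be spot-checked numerically. The following is only a sketch under stated assumptions: the helper names `central_diff` and `rule_errors`, the sample functions $f = \sin$ and $g(x) = x^2 + 1$, and the step size are illustrative choices which do not appear in the text, and a symmetric difference quotient only approximates the derivative.

```python
import math

def central_diff(func, p, h=1e-6):
    """Approximate func'(p) by a symmetric difference quotient (illustrative helper)."""
    return (func(p + h) - func(p - h)) / (2 * h)

# Sample functions with known derivatives (assumed for illustration only).
f, df = math.sin, math.cos
g, dg = (lambda x: x * x + 1.0), (lambda x: 2.0 * x)   # g never vanishes

def rule_errors(p):
    """Gaps between the product, reciprocal and quotient formulas and numerical derivatives at p."""
    product = central_diff(lambda x: f(x) * g(x), p)
    reciprocal = central_diff(lambda x: 1.0 / g(x), p)
    quotient = central_diff(lambda x: f(x) / g(x), p)
    return (
        abs(product - (df(p) * g(p) + f(p) * dg(p))),
        abs(reciprocal - (-dg(p) / g(p) ** 2)),
        abs(quotient - (df(p) * g(p) - f(p) * dg(p)) / g(p) ** 2),
    )
```

Calling `rule_errors(0.7)` returns three small gaps, one per rule. Such a check is no substitute for the proof, but it exercises the formulas in the same way that the trivial cases of Theorem 40.5.12 exercise the definitions.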


40.5.10 THEOREM: Differentiable functions form a linear space with pointwise addition and scalar product. 
For any $U \in \mathrm{Top}(\mathbb{R})$, the set of real-valued functions on $\mathbb{R}$ which are differentiable on $U$ constitutes a real linear space with the operations of pointwise scalar multiplication and addition.


PROOF: The assertion follows from Theorem 40.5.9 (i, ii) and Definition 22.1.1.


40.5.11 REMARK: Computation of the derivatives of monomial and reciprocal monomial functions. 
Theorem 40.5.12 applies Theorems 40.5.7 and 40.5.9 to compute first-order derivatives for simple monomial 
functions, including reciprocal monomials. (The result is trivial, but trivial tests are useful for exercising the 
mechanisms of basic definitions and theorems.) The rule for n = 0 could be regarded as an extension of the 
rule for positive $n$ if $0 \cdot x^{-1}$ is understood to mean 0, but strictly speaking this is gobbledygook. The rule for
$n = 0$ cannot be regarded as an extension of the rule for negative $n$ because the domains do not match. So
regrettably $n = 0$ is a special case.


40.5.12 THEOREM: Derivatives of monomials and reciprocal monomials. 
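Let $U \in \mathrm{Top}(\mathbb{R})$.
(i) Let $n = 0$. Define $f_n : U \to \mathbb{R}$ by $f_n(x) = x^n$ for all $x \in U$.
Then $f_n$ is differentiable on $U$ and $f_n'(x) = 0$ for all $x \in U$.
(ii) Let $n = 1$. Define $f_n : U \to \mathbb{R}$ by $f_n(x) = x^n$ for all $x \in U$.
Then $f_n$ is differentiable on $U$ and $f_n'(x) = n x^{n-1}$ for all $x \in U$.
(iii) Let $n \in \mathbb{Z}^+$. Define $f_n : U \to \mathbb{R}$ by $f_n(x) = x^n$ for all $x \in U$.
Then $f_n$ is differentiable on $U$ and $f_n'(x) = n x^{n-1}$ for all $x \in U$.
(iv) Let $n \in \mathbb{Z}^-$. Define $f_n : U \setminus \{0\} \to \mathbb{R}$ by $f_n(x) = x^n$ for all $x \in U \setminus \{0\}$.
Then $f_n$ is differentiable on $U \setminus \{0\}$ and $f_n'(x) = n x^{n-1}$ for all $x \in U \setminus \{0\}$.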


PROOF: Part (i) follows from Theorem 40.5.5 and Definition 16.6.3 (1). 


For part (ii), note that $f_n(x) - f_n(p) - 1 \cdot (x - p) = 0$ for all $x, p \in U$. So $f_n$ is differentiable at $p$ for all
$p \in U$ by Definition 40.3.4, and $f_n'(p) = 1$ for all $p \in U$ by Definition 40.4.4 and Notation 40.4.8. Hence
$f_n'(x) = n x^{n-1}$ for all $x \in U$ by Definition 16.6.3.


For part (iii), the case $n = 1$ is given by part (ii). Suppose that the assertion is proved for some $n \in \mathbb{Z}^+$.
Then $f_n$ is differentiable on $U$ and $f_n'(x) = n x^{n-1}$ for all $x \in U$. By Definition 16.6.3 (2), $f_{n+1}(x) = x \cdot f_n(x)$
for all $x \in U$. So $f_{n+1}$ is differentiable on $U$ and $f_{n+1}'(x) = 1 \cdot f_n(x) + x \cdot f_n'(x)$ for all $x \in U$ by part (ii) and
Theorem 40.5.7 (iii). Thus $f_{n+1}'(x) = x^n + x \cdot n x^{n-1} = (n + 1) x^n$ for all $x \in U$. Hence the assertion follows
by induction for all $n \in \mathbb{Z}^+$.


For part (iv), let $n = -1$. Then $f_n(x) = 1/x$ for all $x \in U \setminus \{0\}$ by Definition 16.6.4 (1). So by part (ii)
and Theorem 40.5.9 (iv), $f_n$ is differentiable on $U \setminus \{0\}$ and $f_n'(x) = -1/x^2$ for all $x \in U \setminus \{0\}$. Therefore
$f_n'(x) = n x^{n-1}$ for all $x \in U \setminus \{0\}$ by Definition 16.6.4.

Now suppose that the assertion has been proved for some $n \in \mathbb{Z}^-$. Then $f_n$ is differentiable on $U \setminus \{0\}$ and
$f_n'(x) = n x^{n-1}$ for all $x \in U \setminus \{0\}$. By Definition 16.6.4 (2), $f_{n-1}(x) = x^{-1} \cdot f_n(x)$ for all $x \in U \setminus \{0\}$. So
$f_{n-1}$ is differentiable on $U \setminus \{0\}$ and

$$\begin{aligned}
\forall x \in U \setminus \{0\},\quad f_{n-1}'(x) &= -x^{-2} \cdot f_n(x) + x^{-1} \cdot f_n'(x) \\
&= -x^{-2} x^n + x^{-1} n x^{n-1} \\
&= (n - 1) x^{n-2}
\end{aligned}$$

by the case $n = -1$ and Theorem 40.5.9 (iii). Hence the assertion follows by induction for all $n \in \mathbb{Z}^-$.


40.5.13 THEOREM: Differentiation of polynomials. 
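Let $U \in \mathrm{Top}(\mathbb{R})$. Define $f : U \to \mathbb{R}$ by $f(x) = \sum_{k=0}^{n} a_k x^k$ for some sequence $(a_k)_{k=0}^{n} \in \mathbb{R}^{n+1}$. Then $f$ is
differentiable on $U$ and $f'(x) = \sum_{k=1}^{n} k a_k x^{k-1} = \sum_{k=0}^{n-1} (k + 1) a_{k+1} x^k$ for all $x \in U \setminus \{0\}$.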


PROOF: The case $n = 0$ follows from Theorem 40.5.5. Now assume that the assertion is proved for some
$n \in \mathbb{Z}_0^+$. Define $f : U \to \mathbb{R}$ by $f(x) = \sum_{k=0}^{n+1} a_k x^k$ for some sequence $(a_k)_{k=0}^{n+1} \in \mathbb{R}^{n+2}$.


Define $g : U \to \mathbb{R}$ by $g(x) = \sum_{k=1}^{n+1} a_k x^{k-1} = \sum_{k=0}^{n} a_{k+1} x^k$ for all $x \in U$. Then $g(x) = \sum_{k=0}^{n} b_k x^k$ for all
$x \in U$, where $(b_k)_{k=0}^{n} \in \mathbb{R}^{n+1}$ is defined by $b_k = a_{k+1}$ for all $k \in \{0, \ldots, n\}$. But $f(x) = x g(x) + a_0$ for all
$x \in U$. So $f$ is differentiable on $U$ and $f'(x) = g(x) + x g'(x)$ for all $x \in U$ by Theorem 40.5.9 (ii, iii) and the
case $n = 0$. Therefore $f'(x) = \sum_{k=0}^{n} b_k x^k + x \sum_{k=1}^{n} k b_k x^{k-1} = \sum_{k=0}^{n} (k + 1) b_k x^k = \sum_{k=1}^{n+1} k a_k x^{k-1}$ for all
$x \in U$. Thus case $n + 1$ follows from case $n$. Hence the assertion follows by induction for all $n \in \mathbb{Z}_0^+$.
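
The coefficient shift $a_k \mapsto (k + 1) a_{k+1}$ in Theorem 40.5.13 is easy to mechanise. The sketch below is illustrative only: the function names `poly_derivative` and `poly_eval` and the list-of-coefficients representation are assumptions made for this example.

```python
def poly_derivative(a):
    """Coefficients of f' for f(x) = sum of a[k] * x**k.

    The k-th output coefficient is (k + 1) * a[k + 1], as in Theorem 40.5.13.
    A constant polynomial (one coefficient) yields the empty list, i.e. zero.
    """
    return [(k + 1) * a[k + 1] for k in range(len(a) - 1)]

def poly_eval(a, x):
    """Evaluate sum of a[k] * x**k by Horner's scheme; the empty list evaluates to 0."""
    result = 0
    for coeff in reversed(a):
        result = result * x + coeff
    return result
```

For example, the polynomial $1 + 2x + 3x^2$ is represented by `[1, 2, 3]`, and `poly_derivative([1, 2, 3])` gives the coefficients of $2 + 6x$.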


40.5.14 REMARK: Differential calculus rule for composition of functions. 

Differentiation of the composite of two functions depends to some extent on which definition is adopted for 
the composite. According to the relatively strict Definition 10.4.17, the composite $g \circ f$ of $f$ and $g$ is only
well defined if $\mathrm{Range}(f) \subseteq \mathrm{Dom}(g)$. When this constraint is relaxed, the result is a partial function as in
Definition 10.10.6. Theorem 40.5.15 is expressed in terms of the partial-function style of composition. Then
$g \circ f$ is a well-defined function from $f^{-1}(\mathrm{Dom}(g))$ to $g(\mathrm{Range}(f))$.


40.5.15 THEOREM: Differential calculus composition rule. Also known as "the chain rule". 
Let $f : U \to \mathbb{R}$ and $g : V \to \mathbb{R}$ be functions on $U, V \in \mathrm{Top}(\mathbb{R})$. If $f$ is differentiable at $p \in U$ and $g$ is
differentiable at $f(p) \in V$, then $g \circ f$ is differentiable at $p$ and $(g \circ f)'(p) = g'(f(p)) f'(p)$.


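PROOF: Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \tfrac12\varepsilon (1 + |g'(f(p))|)^{-1}$. Let $\varepsilon_2 = \tfrac12\varepsilon (\varepsilon_1 + |f'(p)|)^{-1}$. Then by Theorem 40.4.11,
there are $\delta_1, \delta_2 \in \mathbb{R}^+$ which satisfy the following.

$$\forall x \in (p - \delta_1, p + \delta_1) \cap U,\quad |f(x) - f(p) - f'(p)(x - p)| \le \varepsilon_1 |x - p|, \tag{40.5.2}$$
$$\forall y \in (f(p) - \delta_2, f(p) + \delta_2) \cap V,\quad |g(y) - g(f(p)) - g'(f(p))(y - f(p))| \le \varepsilon_2 |y - f(p)|. \tag{40.5.3}$$

Since $f(p) \in V$ and $V \in \mathrm{Top}(\mathbb{R})$, there is a $\delta_2' \in \mathbb{R}^+$ such that $(f(p) - \delta_2', f(p) + \delta_2') \subseteq V$. Since $f$ is
continuous at $p$ by Theorem 40.5.2, there is a $\delta_1' \in \mathbb{R}^+$ such that $\forall x \in (p - \delta_1', p + \delta_1') \cap U$, $|f(x) - f(p)| < \min(\delta_2, \delta_2')$.
Let $\delta = \min(\delta_1, \delta_1')$. Then $x \in (p - \delta, p + \delta) \cap U$ implies $f(x) \in (f(p) - \delta_2, f(p) + \delta_2) \cap V$. So

$$\begin{aligned}
\forall x \in (p - \delta, p + \delta) \cap U,\quad
& |g(f(x)) - g(f(p)) - g'(f(p)) f'(p)(x - p)| \\
&\le |g(f(x)) - g(f(p)) - g'(f(p))(f(x) - f(p))| + |g'(f(p))(f(x) - f(p)) - g'(f(p)) f'(p)(x - p)| \\
&\le \varepsilon_2 |f(x) - f(p)| + |g'(f(p))| \, \varepsilon_1 |x - p| \\
&\le \varepsilon_2 \bigl(|f(x) - f(p) - f'(p)(x - p)| + |f'(p)(x - p)|\bigr) + \tfrac12\varepsilon |x - p| \\
&\le \varepsilon_2 (\varepsilon_1 + |f'(p)|) |x - p| + \tfrac12\varepsilon |x - p| \\
&= \tfrac12\varepsilon |x - p| + \tfrac12\varepsilon |x - p| = \varepsilon |x - p|.
\end{aligned}$$

Hence $g \circ f$ is differentiable at $p$ and $(g \circ f)'(p) = g'(f(p)) f'(p)$ by Theorem 40.4.11.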
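
As a numerical illustration of the composition rule, the following sketch compares a symmetric difference quotient of $g \circ f$ with the product $g'(f(p)) f'(p)$. The function name `chain_rule_gap`, the step size and the choice of test functions are assumptions made for this example only.

```python
import math

def chain_rule_gap(f, df, g, dg, p, h=1e-6):
    """Gap between a numerical derivative of g(f(x)) at p and g'(f(p)) * f'(p).

    f, g are the functions and df, dg their known derivatives (all supplied by the caller).
    """
    numeric = (g(f(p + h)) - g(f(p - h))) / (2 * h)
    return abs(numeric - dg(f(p)) * df(p))

# Example: f = sin, g = exp, so (g o f)'(p) = exp(sin(p)) * cos(p).
gap = chain_rule_gap(math.sin, math.cos, math.exp, math.exp, 0.3)
```

The value of `gap` is of the order of the discretisation and rounding error of the difference quotient.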


40.5.16 DEFINITION: A continuously differentiable real-valued function of a real variable is a differentiable
function $f : U \to \mathbb{R}$ for some $U \in \mathrm{Top}(\mathbb{R})$ such that $f'$ is a continuous function on $U$.


40.5.17 THEOREM:  Differentiability and chain rule for composition of differentiable functions. 
Let $f : U \to \mathbb{R}$ and $g : V \to \mathbb{R}$ be differentiable functions with $U, V \in \mathrm{Top}(\mathbb{R})$.

(i) $g \circ f$ is differentiable on $\mathrm{Dom}(g \circ f) = f^{-1}(V) \in \mathrm{Top}(\mathbb{R})$, and $(g \circ f)' = (g' \circ f) f'$.

(ii) If $f$ and $g$ are continuously differentiable, then $g \circ f$ is continuously differentiable.


PROOF: For part (i), $\mathrm{Dom}(g \circ f) = f^{-1}(V)$ by Theorem 10.10.13 (i), and $f^{-1}(V) \in \mathrm{Top}(\mathbb{R})$ because $f$
is continuous by Theorem 40.5.3. Then it follows from Theorem 40.5.15 that $g \circ f$ is differentiable on
$\mathrm{Dom}(g \circ f)$ and that $(g \circ f)' = (g' \circ f) f'$.
Part (ii) follows from part (i) and Theorems 31.12.7 and 38.1.12 (iii).


40.5.18 REMARK: Continuous differentiability of real functions. 

Definition 40.5.16 may seem at first sight to be somewhat obvious. A continuously differentiable function 
is a differentiable function whose derivative is continuous. However, this concept does have some subtleties. 
The function $f : U \to \mathbb{R}$ in Example 40.5.19 is continuously differentiable, but the graph in Figure 40.5.1
gives the impression that $f$ is not differentiable at the origin. In fact, it is not possible to express $f$ as a
restriction to $U$ of a continuously differentiable function on an open set $\Omega$ which includes $U$.


40.5.19 EXAMPLE: Continuously differentiable function with continuously extendable derivative. 

Let $U = \bigcup_{j=1}^{\infty} U_j$ with $U_j = (-1/(2j - 1), -1/(2j)) \cup (1/(2j), 1/(2j - 1))$ for all $j \in \mathbb{Z}^+$. Define $f : U \to \mathbb{R}$ by
$f(x) = \tfrac12 \mathrm{ceiling}(1/|2x|)^{-1}$ for all $x \in U$. (See Definition 16.5.12 for the ceiling function.) Then $f(x) = 1/(2j)$
for $x \in U_j$ for all $j \in \mathbb{Z}^+$. (See Figure 40.5.1.)

Figure 40.5.1 Continuously differentiable function not extendable to a neighbourhood

Define $g : \bar U \to \mathbb{R}$ by

$$\forall x \in \bar U,\quad g(x) = \begin{cases} \tfrac12 \mathrm{ceiling}(1/|2x|)^{-1} & \text{for } x \ne 0 \\ 0 & \text{for } x = 0. \end{cases}$$

Then $g(x) = 1/(2j)$ for $x \in U_j$ for all $j \in \mathbb{Z}^+$. So $g$ is continuous on $\bar U = \{0\} \cup \bigcup_{j=1}^{\infty} U_j$, and $f = g|_U$.
Define $h : \bar U \to \mathbb{R}$ by $h(x) = 0$ for all $x \in \bar U$. Then $h \in C(\bar U, \mathbb{R})$ and $f' = h|_U$. However, there is no
continuously differentiable function $F$ on a set $\Omega \in \mathrm{Top}(\mathbb{R})$ which satisfies $\bar U \subseteq \Omega$ and $f = F|_U$.
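
The step structure of the function in Example 40.5.19 can be checked by direct computation. The sketch below assumes the formula $f(x) = \tfrac12 \mathrm{ceiling}(1/|2x|)^{-1}$, which is consistent with the stated values $f(x) = 1/(2j)$ on $U_j$; the helper names `in_U_j` and `midpoint_of_U_j` are hypothetical and introduced only for this illustration.

```python
import math

def f(x):
    """f(x) = (1/2) * ceiling(1/|2x|)**(-1), assumed form of the example function (x != 0)."""
    return 0.5 / math.ceil(1.0 / abs(2.0 * x))

def in_U_j(x, j):
    """True if x lies in U_j = (-1/(2j-1), -1/(2j)) union (1/(2j), 1/(2j-1))."""
    return 1.0 / (2 * j) < abs(x) < 1.0 / (2 * j - 1)

def midpoint_of_U_j(j):
    """Midpoint of the positive component of U_j (hypothetical helper)."""
    return 0.5 * (1.0 / (2 * j) + 1.0 / (2 * j - 1))

# Difference quotients of the extension g at the origin, using g(0) = 0.
x = midpoint_of_U_j(50)
right_slope = f(x) / x          # close to +1
left_slope = f(-x) / (-x)       # close to -1
```

On each component of $U_j$ the function is constant, so $f' = 0$ on $U$, while the one-sided difference quotients of the extension at the origin approach $+1$ and $-1$. This is the numerical counterpart of the impression given by the graph that no differentiable extension through the origin exists.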


40.6. The mean value theorem and related theorems 


40.6.1 REMARK: The derivative of a real-number function is zero at an extremum if it is differentiable. 

The vanishing of the derivative of a function at a maximum or minimum was known in various forms since 
well before the time of Newton and Leibniz, although logically self-consistent and meaningful definitions for 
derivatives were not published until the 19th century by Cauchy, Weierstraß and others. (See Boyer [235],
pages 267-298, for the “rigorous formulation” of differential calculus.) Anticipations of the observation
that the derivative vanishes at maxima and minima include Oresme (probably before 1361), Kepler (1615), 
Fermat (1638/1643), Johann Hudde (1658/1659) and Huygens (a few years after 1659). (See Boyer [235], 
pages 85, 110-111, 155-159, 185-186; Cajori [241], pages 160, 163-164, 180, 193, 196; Struik [249], page 100.) 


40.6.2 THEOREM:  Differentiable functions have zero derivative at a minimum or maximum. 
Let $S$ be a subset of $\mathbb{R}$. Let $f : S \to \mathbb{R}$.

(i) If p € Int(S) and f(p) = max(f) and f is differentiable at p, then f’(p) = 0. 

(ii) If p € Int(S) and f(p) = min(f) and f is differentiable at p, then f'(p) = 0. 


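PROOF: For part (i), let $S \in \mathbb{P}(\mathbb{R})$, $f : S \to \mathbb{R}$, $p \in \mathrm{Int}(S)$ and $f(p) = \max(f)$, and suppose that $f$ is
differentiable at $p$. Let $U = \mathrm{Int}(S)$. Then by Theorem 40.4.2, there is a unique number $v \in \mathbb{R}$ which satisfies

$$\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad |f(x) - f(p) - v(x - p)| < \varepsilon |x - p|.$$

Suppose that this number $v \in \mathbb{R}$ is non-zero. Let $\varepsilon = |v|/2$. Then for some $\delta \in \mathbb{R}^+$,

$$\forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad |f(x) - f(p) - v(x - p)| < \tfrac12 |v| \cdot |x - p|.$$

But $f(x) - f(p) = v(x - p) + (f(x) - f(p) - v(x - p)) \ge v(x - p) - |f(x) - f(p) - v(x - p)|$ for all $x \in S$ by
Theorem 16.5.3 (ii). So

$$\forall x \in (p - \delta, p + \delta) \cap (U \setminus \{p\}),\quad f(x) - f(p) > v(x - p) - \tfrac12 |v| \cdot |x - p|.$$

Since $p \in \mathrm{Int}(U)$, for some $\delta_0 \in \mathbb{R}^+$, $(p - \delta_0, p + \delta_0) \subseteq U$. Let $\delta_1 = \tfrac12 \min(\delta, \delta_0)$. Then $\delta_1 \in \mathbb{R}^+$, and $p - \delta_1$
and $p + \delta_1$ are elements of $(p - \delta, p + \delta) \cap (U \setminus \{p\})$. Suppose that $v < 0$. Then

$$f(p - \delta_1) - f(p) > v(p - \delta_1 - p) - (|v|/2) |p - \delta_1 - p| = (|v|/2) \delta_1 > 0,$$

which contradicts the assumption that $f(p) = \max(f)$. Suppose now that $v > 0$. Then

$$f(p + \delta_1) - f(p) > v(p + \delta_1 - p) - (|v|/2) |p + \delta_1 - p| = (|v|/2) \delta_1 > 0,$$

which once again contradicts the assumption that $f(p) = \max(f)$. Therefore $v = 0$. That is, $f'(p) = 0$.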
For part (ii), apply part (i) to — f. 


40.6.3 REMARK: Left and right superior and inferior limits of real-number functions at extrema. 

Another way to approach the proof of Theorem 40.6.2 is to note that the left and right upper and lower 
limits of a real-valued function f of a single real variable, which are well defined extended real numbers at 
all points in the interior of the domain of f, have obvious bounds at minima and maxima. At a maximum, 
one finds $\liminf_{x \to p^-} (f(x) - f(p))/(x - p) \ge 0$ and $\limsup_{x \to p^+} (f(x) - f(p))/(x - p) \le 0$. Similarly, at a
minimum, one finds $\limsup_{x \to p^-} (f(x) - f(p))/(x - p) \le 0$ and $\liminf_{x \to p^+} (f(x) - f(p))/(x - p) \ge 0$. If $f$ is
differentiable at p, all four of these limits have the same value, namely the derivative at p. Hence f'(p) = 0. 


40.6.4 THEOREM: — Rolle's theorem for real-valued functions of real numbers. 
Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$, with $a < b$. Let $f : I \to \mathbb{R}$ be continuous on $I$ and
differentiable on $\mathrm{Int}(I) = (a,b)$. If $f(a) = f(b)$, then

$$\exists x \in (a,b),\quad f'(x) = 0.$$


PROOF: Since $I = [a,b]$ is a bounded, closed subset of $\mathbb{R}$, it follows by Theorem 34.9.13 that $I$ is a compact
subset of $\mathbb{R}$. So $f(I)$ is a compact subset of $\mathbb{R}$ by Theorem 33.5.15. Therefore $f(I)$ is a closed, bounded
subset of $\mathbb{R}$ by Theorem 37.7.3 because the topology on $\mathbb{R}$ is induced by a metric space. So $\inf(f(I))$ and
$\sup(f(I))$ are well-defined real numbers, and $\inf(f(I)) = \min(f(I)) \le f(a) = f(b) \le \max(f(I)) = \sup(f(I))$.
Therefore $f(x_1) = \min(f(I)) \le f(a)$ and $f(x_2) = \max(f(I)) \ge f(a)$ for some $x_1, x_2 \in I$.

Let $x_1, x_2 \in I$ be such that $f(x_1) = \min(f(I))$ and $f(x_2) = \max(f(I))$. Suppose that $\min(f(I)) < f(a)$.
Then $x_1 \in (a,b)$. So $f'(x_1) = 0$ by Theorem 40.6.2 (ii). Suppose that $\max(f(I)) > f(a)$. Then $x_2 \in (a,b)$.
Therefore $f'(x_2) = 0$ by Theorem 40.6.2 (i). Suppose that $\min(f(I)) \ge f(a)$ and $\max(f(I)) \le f(a)$. Then
$\min(f(I)) = f(a)$ and $\max(f(I)) = f(a)$. Therefore $f(x) = f(a)$ for all $x \in I$, and so $f'(x) = 0$ for
all $x \in (a,b) \ne \emptyset$. Hence in all cases, $f'(x) = 0$ for some $x \in (a,b)$.


40.6.5 REMARK: Mean value theorem guarantees a point with gradient equal to the mean gradient. 

It would perhaps be more accurate to call Theorem 40.6.6 a “mean gradient theorem” or “mean velocity
theorem” because it states that there exists at least one point along a real interval where the gradient is
equal to the mean gradient or one-dimensional velocity. 


40.6.6 THEOREM: Mean value theorem for real-valued functions of real numbers. 
Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$, with $a < b$. Let $f : I \to \mathbb{R}$ be continuous on $I$ and
differentiable on $\mathrm{Int}(I) = (a,b)$. Then

$$\exists x \in (a,b),\quad f'(x) = \frac{f(b) - f(a)}{b - a}.$$


PROOF: Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$ with $a < b$. Let $f : I \to \mathbb{R}$ be continuous on $I$
and differentiable on $(a,b)$. Let $k = (f(b) - f(a))/(b - a)$. This is a well-defined real number because $a \ne b$.
Define $g : I \to \mathbb{R}$ by $g(x) = f(x) - k(x - a)$ for all $x \in I$. Then $g(a) = f(a)$ and $g(b) = f(b) - k(b - a) = f(a)$.
Therefore $g(a) = g(b)$. But $g$ is the sum of two continuous functions on $[a,b]$ which are differentiable
on $(a,b)$. Therefore $g$ is continuous on $[a,b]$ and differentiable on $(a,b)$. So by Theorem 40.6.4, $g'(x_0) = 0$ for
some $x_0 \in (a,b)$. But the derivative of the sum of two functions equals the sum of the derivatives. So
$g'(x_0) = f'(x_0) - k$. Hence $f'(x_0) = k = (f(b) - f(a))/(b - a)$.
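
The proof is non-constructive, but when the derivative is known and continuous, a mean value point can be located numerically. The sketch below is not the method of the proof: it assumes a continuous derivative `df` supplied by the caller, scans a grid for a sign change of $f'(x) - k$, and then bisects. The function name `mean_value_point` and its parameters are illustrative.

```python
def mean_value_point(f, df, a, b, samples=1000, tol=1e-12):
    """Search for x in (a, b) with df(x) = (f(b) - f(a)) / (b - a).

    Assumes df is continuous, so a sign change of df - k brackets a solution.
    Returns None if no sign change is found on the sampling grid.
    """
    k = (f(b) - f(a)) / (b - a)
    phi = lambda x: df(x) - k
    # Interior grid points only, so every candidate lies in the open interval.
    xs = [a + (b - a) * (i + 0.5) / samples for i in range(samples)]
    for lo, hi in zip(xs, xs[1:]):
        if phi(lo) == 0.0:
            return lo
        if phi(lo) * phi(hi) <= 0.0:
            while hi - lo > tol:            # plain bisection on [lo, hi]
                mid = 0.5 * (lo + hi)
                if phi(lo) * phi(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
    return None

# Example: f(x) = x**3 on [0, 2] has mean gradient 4, attained where 3 x**2 = 4.
c = mean_value_point(lambda x: x ** 3, lambda x: 3.0 * x * x, 0.0, 2.0)
```

For this example the point found is close to $2/\sqrt{3}$. A grid scan can miss sign changes between sample points, so a return value of `None` does not contradict Theorem 40.6.6.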


40.6.7 REMARK: Zero derivative implies constant function. 

Theorem 40.6.8 states the intuitively clear idea that if a function has zero derivative everywhere, then it
must be constant. One way to think of this is that if the function is going to vary at any point, it must have
a non-zero derivative at the point where it starts to vary. However, the function $f : [-1, 1] \to \mathbb{R}$ defined by
$f(x) = \max(0, x^3)$ has zero derivative at $x = 0$ although that is precisely where it starts to vary. So analysis
of the "starting point" of the variation is not a promising approach for proving Theorem 40.6.8. Instead, the 
proof given here assumes that the function has already varied by some finite amount f(x) — f(a) between a 
and x, and then finds that the derivative must have varied from zero for at least one value of y € (a, x). 


40.6.8 THEOREM: A function with zero derivative on an interval must be constant. 
Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$. Let $f : I \to \mathbb{R}$ be continuous on $I$ and differentiable
on $\mathrm{Int}(I) = (a,b)$ with $f'(x) = 0$ for all $x \in (a,b)$. Then $f(x) = f(a)$ for all $x \in [a,b]$.


PROOF: Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$. Let $f : I \to \mathbb{R}$ be continuous on $I$ and differentiable
on $\mathrm{Int}(I) = (a,b)$ with $f'(x) = 0$ for all $x \in (a,b)$. If $a = b$, then clearly $I = \{a\}$ and $f(x) = f(a)$ for all $x \in I$.
So assume that $a < b$. Suppose that $f(x) \ne f(a)$ for some $x \in I$. Then $x \in (a,b]$, and by Theorem 40.6.6,
$f'(y) = (f(x) - f(a))/(x - a) \ne 0$ for some $y \in (a,x)$. This contradicts the assumption that $f'(y) = 0$.
Hence $f(x) = f(a)$ for all $x \in [a,b]$.


40.6.9 THEOREM: Monotonicity of functions with derivative bounded above or below by zero. 
Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$. Let $f : I \to \mathbb{R}$ be continuous on $I$ and differentiable
on $\mathrm{Int}(I) = (a,b)$.

(i) If $f'(x) \ge 0$ for all $x \in (a,b)$, then $f(y) \le f(z)$ for all $y, z \in [a,b]$ with $y \le z$. [non-decreasing]
(ii) If $f'(x) > 0$ for all $x \in (a,b)$, then $f(y) < f(z)$ for all $y, z \in [a,b]$ with $y < z$. [increasing]
(iii) If $f'(x) \le 0$ for all $x \in (a,b)$, then $f(y) \ge f(z)$ for all $y, z \in [a,b]$ with $y \le z$. [non-increasing]
(iv) If $f'(x) < 0$ for all $x \in (a,b)$, then $f(y) > f(z)$ for all $y, z \in [a,b]$ with $y < z$. [decreasing]


PROOF: For part (i), let $y, z \in [a,b]$ with $y < z$. Theorem 40.6.6 implies $\exists x \in (y,z)$, $f(z) = f(y) + f'(x)(z - y)$. So
$f(z) \ge f(y)$. The case $y = z$ is trivial. Hence $f(y) \le f(z)$ for all $y, z \in [a,b]$ with $y \le z$.

For part (ii), Theorem 40.6.6 implies $\exists x \in (y,z)$, $f(z) = f(y) + f'(x)(z - y)$. Hence $f(y) < f(z)$ if $y < z$.
Part (iii) follows as for part (i).


Part (iv) follows as for part (ii).


40.6.10 REMARK: Monotonicity of first derivatives implies convexity or concavity. 
Theorem 40.6.11 applies Theorem 40.6.9 to show that the monotonicity of the first derivative implies the 
convexity or concavity of the function. 


40.6.11 THEOREM: Non-decreasing/non-increasing derivative implies convex/concave function. 
Let $I$ be a real interval. Let $f : I \to \mathbb{R}$ be continuous on $I$ and differentiable on $\mathrm{Int}(I)$.

(i) If $f'$ is non-decreasing on $\mathrm{Int}(I)$, then $\forall x_0 \in \mathrm{Int}(I)$, $(f'(x_0) = 0 \Rightarrow f(x_0) = \min_{x \in I} f(x))$.
(ii) If $f'$ is non-decreasing on $\mathrm{Int}(I)$, then $\forall a, b \in I$, $\forall x \in [[a,b]]$, $f(x) \le \max(f(a), f(b))$.
(iii) If $f'$ is non-decreasing on $\mathrm{Int}(I)$, then $f$ is convex on $I$.
(iv) If $f'$ is non-increasing on $\mathrm{Int}(I)$, then $\forall x_0 \in \mathrm{Int}(I)$, $(f'(x_0) = 0 \Rightarrow f(x_0) = \max_{x \in I} f(x))$.
(v) If $f'$ is non-increasing on $\mathrm{Int}(I)$, then $\forall a, b \in I$, $\forall x \in [[a,b]]$, $f(x) \ge \min(f(a), f(b))$.
(vi) If $f'$ is non-increasing on $\mathrm{Int}(I)$, then $f$ is concave on $I$.


PROOF: For part (i), let $x_0 \in \mathrm{Int}(I)$ with $f'(x_0) = 0$. Then $f'(x) \ge 0$ for all $x \in \mathrm{Int}(I)$ with $x \ge x_0$. So
$f(x) \ge f(x_0)$ for all $x \in \mathrm{Int}(I)$ with $x \ge x_0$ by Theorem 40.6.9 (i). Similarly, $f'(x) \le 0$ for all $x \in \mathrm{Int}(I)$
with $x \le x_0$. So $f(x) \ge f(x_0)$ for all $x \in \mathrm{Int}(I)$ with $x \le x_0$ by Theorem 40.6.9 (iii). Therefore $f(x) \ge f(x_0)$
for all $x \in \mathrm{Int}(I)$. Hence $f(x) \ge f(x_0)$ for all $x \in I$ by Theorem 35.1.8 (iii) because $f$ is continuous on $I$.
For part (ii), let $a, b \in I$ with $a \le b$. Suppose that $f(x_1) > \max(f(a), f(b))$ for some $x_1 \in [a,b]$. By
Theorem 37.7.8, $\max(f|_{[a,b]}) \in f([a,b])$. In other words, $f(x_0) = \max(f|_{[a,b]})$ for some $x_0 \in [a,b]$. But
then $x_0 \in (a,b)$ because $f(x_0) \ge f(x_1) > \max(f(a), f(b))$. So $f'(x_0) = 0$ by Theorem 40.6.2 (i). Therefore
$f(x_0) = \min_{x \in [a,b]} f(x)$ by part (i). But then $\min_{x \in [a,b]} f(x) = \max_{x \in [a,b]} f(x)$, which implies that $f$ is
constant on $[a,b]$, which contradicts the assumption $f(x_1) > \max(f(a), f(b))$. Hence $f(x) \le \max(f(a), f(b))$
for all $x \in [a,b]$. The same argument is applicable to $[[a,b]] = [b,a]$ in the case $b \le a$. (See Notation 16.1.15
for $[[a,b]]$.)
For part (iii), let $a, b \in I$. The case $a = b$ is trivial. So assume that $a \ne b$. Define $g : [0,1] \to \mathbb{R}$ by

$$\forall \lambda \in [0,1],\quad g(\lambda) = f((1 - \lambda)a + \lambda b) - (1 - \lambda) f(a) - \lambda f(b).$$

Then $g'(\lambda) = (b - a) f'((1 - \lambda)a + \lambda b) - (f(b) - f(a))$ for all $\lambda \in (0,1)$ by Theorems 40.5.7 (i, ii, iii), 40.5.12 (ii)
and 40.5.15. So $g'$ is non-decreasing on $\mathrm{Int}([0,1]) = (0,1)$ because $f'$ is non-decreasing on $\mathrm{Int}(I)$. (If $b < a$, both the factor $b - a$ and the direction of the map $\lambda \mapsto (1 - \lambda)a + \lambda b$ are reversed, which leaves $g'$ non-decreasing.) Therefore
by part (ii), $g(\lambda) \le \max(g(0), g(1)) = 0$ for all $\lambda \in [0,1]$. That is, $f((1 - \lambda)a + \lambda b) \le (1 - \lambda) f(a) + \lambda f(b)$ for all $\lambda \in [0,1]$. Hence $f$ is convex on $I$ by Definition 23.12.2.


Part (iv) may be proved similarly to part (i). 


Part (v) may be proved from part (iv) similarly to part (ii) from part (i). 


Part (vi) may be proved from part (v) similarly to part (iii) from part (ii). 


40.6.12 REMARK: Bounding end-to-end function variation in terms of the derivative. 

Theorem 40.6.13 says almost exactly the same thing as Theorem 40.6.6. The purpose of rewriting the mean 
value theorem for real-valued functions of real numbers in this way is to put bounds on the point-to-point 
variation f(b) — f(a) in terms of a lower or upper bound (or both) on the derivative. In some situations, 
one knows bounds on the derivative and wants to know a bound on the end-to-end variation of the function 
which is being differentiated. By contrast, Theorem 40.6.6 says something about the derivative $f'(x)$ at
some $x \in (a,b)$ for a given variation of $f$ between $a$ and $b$.


Theorem 40.6.13 says that the overall average point-to-point velocity of a curve lies between the minimum
and the maximum (or potentially could be equal to the minimum or maximum). The idea that the average
of anything lies between its minimum and maximum is intuitively obvious. However, the idea of “average”
here is not yet defined because it requires the concept of integration in Chapters 43 and 24.1 so that a
generalisation of the arithmetic mean of a real-valued function may be defined as the quotient of its integral
divided by the length of the interval. Since the concept of integration is not yet available here, one cannot
give the interpretation of “arithmetic mean” to the quotient in line (40.6.1).


40.6.13 THEOREM: Differential quotient is bounded above/below by supremum/infimum of derivative.
Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$, with $a < b$. Let $f : I \to \mathbb{R}$ be continuous on $I$ and
differentiable on $\mathrm{Int}(I) = (a,b)$. Then

$$\inf_{x \in (a,b)} f'(x) \le \frac{f(b) - f(a)}{b - a} \le \sup_{x \in (a,b)} f'(x). \tag{40.6.1}$$


PROOF: The result follows from Theorem 40.6.6 because $\inf_{x \in (a,b)} f'(x) \le f'(y) \le \sup_{x \in (a,b)} f'(x)$ for
all $y \in (a,b)$.


40.6.14 REMARK: Bound for absolute function variation in terms of an absolute derivative bound. 

One way in which one might obtain knowledge of the lower and upper bounds $\inf_{x \in (a,b)} f'(x)$ and
$\sup_{x \in (a,b)} f'(x)$ for the derivative $f'(x)$ is from the continuity of this derivative. If $f'$ is known to be the
restriction to $[a,b]$ of a $C^1$ function, then $f'$ must be bounded on $(a,b)$. Alternatively, the derivative of $f$
might be known to be bounded via a differential equation of some kind. 


40.6.15 THEOREM: Differential quotient absolute value bounded by supremum of derivative absolute value. 
Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$ with $a < b$. Let $f : I \to \mathbb{R}$ be continuous on $I$ and
differentiable on $\mathrm{Int}(I) = (a,b)$. Then

$$\frac{|f(b) - f(a)|}{b - a} \le \sup_{x \in (a,b)} |f'(x)|.$$


PROOF: The result follows immediately from Theorem 40.6.13.


40.6.16 REMARK: Lower bound for absolute function variation in terms of the derivative. 
The absence of a lower bound for the absolute function variation quotient in Theorem 40.6.15 is mostly 
due to the fact that one rarely knows a lower bound on the derivative of a function. The lower bound 
$\inf_{x \in (a,b)} |f'(x)| \le |f(b) - f(a)|/(b - a)$ is still valid and informative if $f'(x)$ does not change sign in the
interval (a, b), but if this is known to be the case, one may apply Theorem 40.6.13 directly. 


40.6.17 REMARK:  Generalisation to higher-dimensional curves. 

Theorem 40.6.6 may be interpreted to mean that the velocity of a differentiable curve in one-dimensional 
Cartesian space is equal to the average velocity at some point along the curve. This is not true for curves in 
higher-dimensional Cartesian spaces. However, a generalisation of Theorem 40.6.15 to higher-dimensional 
spaces is possible. This is the subject of Section 40.8. 


40.7. Differentiation of vector-valued functions 


40.7.1 REMARK:  Vector-valued functions versus curves. 
Definition 40.7.2 introduces differentiable maps from open sets of real numbers to Cartesian spaces. These 
may be informally referred to as “vector-valued functions”. They may be regarded as a specialisation of the
partially differentiable function concept in Definition 41.2.2, but the concept of “partial derivative” is not
applicable because there is only one independent variable.


The Cartesian-space-valued curves in Definition 40.8.4 specialise Definition 40.7.2 to interval domains.


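40.7.2 DEFINITION: A differentiable vector-valued function in $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, on an open set $\Omega \in \mathrm{Top}(\mathbb{R})$,
is a function $f : \Omega \to \mathbb{R}^n$ such that the map $f_i : \Omega \to \mathbb{R}$ defined by $f_i : t \mapsto f(t)_i$ is differentiable for
all $i \in \mathbb{N}_n$.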


40.7.3 THEOREM:  Differentiability implies continuity. 
Let $n \in \mathbb{Z}^+$. Let $\Omega \in \mathrm{Top}(\mathbb{R})$. Let $f : \Omega \to \mathbb{R}^n$ be differentiable on $\Omega$. Then $f$ is continuous on $\Omega$.


PROOF: Let $f : \Omega \to \mathbb{R}^n$ be differentiable on $\Omega$. Then $f_i : \Omega \to \mathbb{R}$ is differentiable on $\Omega$ for all $i \in \mathbb{N}_n$.
So $f_i$ is continuous on $\Omega$ for all $i \in \mathbb{N}_n$ by Theorem 40.5.3. Hence $f : \Omega \to \mathbb{R}^n$ is continuous on $\Omega$ by
Theorem 32.12.8.


40.8. The mean value theorem for curves


40.8.1 REMARK: Generalisation of the mean value theorem to several dependent variables. 

The mean value theorem for a real-valued function of a real variable, Theorem 40.6.6, is not valid for 
differentiable curves in higher-dimensional Cartesian spaces. Theorem 40.6.6 states, in essence, that there 
is some point in a finite time-interval at which the velocity of an object moving differentiably in a one- 
dimensional Cartesian space is equal to the average velocity during that time. One might expect that if an 
object is moving differentiably in a higher-dimensional Cartesian space, there would also be some point in 
time at which the velocity equals the average. This is true for each component of the velocity separately, but 
not for all components simultaneously. A simple counterexample to the conjecture is the case of a helical 
coil on the surface of a cylinder in $\mathbb{R}^3$. The velocity at each point is tangential to the cylinder, but the
end-to-end average velocity of any such path is a secant through the cylinder in general, or else a tangent 
parallel to the axis of the cylinder. In either case, the average velocity occurs at no point on the curve. In 
the two-dimensional case, one may consider the counterexample of uniform motion along half of a circle. 
The velocity at one point has the same direction as the average velocity, but the speed there is different to 
the average. 
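
The half-circle counterexample can be made concrete with a short computation. The parametrisation $t \mapsto (\cos t, \sin t)$ on $[0, \pi]$ and the helper names `average_velocity` and `norm` are choices made for this illustration; they are not taken from the text.

```python
import math

def average_velocity(f, a, b):
    """End-to-end average velocity (f(b) - f(a)) / (b - a) of a curve f on [a, b]."""
    fa, fb = f(a), f(b)
    return tuple((y - x) / (b - a) for x, y in zip(fa, fb))

def norm(v):
    """Euclidean norm of a tuple of real numbers."""
    return math.sqrt(sum(c * c for c in v))

# Uniform motion along half of the unit circle (assumed parametrisation).
half_circle = lambda t: (math.cos(t), math.sin(t))
velocity = lambda t: (-math.sin(t), math.cos(t))

avg = average_velocity(half_circle, 0.0, math.pi)
```

Here the average velocity has norm $2/\pi$, whereas the speed $|f'(t)|$ equals 1 for every $t$. So the velocity never equals the average velocity, although at $t = \pi/2$ it has the same direction.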


The name “mean value theorem” in the case of several dependent variables is given by some authors to 
Theorem 40.8.6, which states that the average point-to-point speed of a curve in a general finite-dimensional 
space is not more than the maximum speed. This is a generalisation of the inequality which is asserted in 
Theorem 40.6.15. The generalisation to higher-dimensional curves is intuitively obvious, but requires proof, 
which is not as easy as one might expect for such an obvious assertion. As mentioned in Remark 40.6.12 for 
the mean value theorem for real-valued functions, viewing the quotient in line (40.8.2) in Theorem 40.8.6 as 
the point-to-point “average speed” for the curve requires integration (and a fundamental theorem of calculus)
so as to be able to define a continuous version of the arithmetic mean as the integral of the velocity divided 
by the distance between the end-points. 


40.8.2 REMARK: Triangle inequality for an infinite number of infinitesimal line elements. 

One may regard Theorem 40.8.7 as a kind of limit of the triangle inequality applied inductively to an infinite 
sequence of segments of a polygonal curve. The total distance along any piecewise linear curve, with a 
finite number of segments, is clearly not more than the sum of the lengths of the segments. If one takes 
the limit as this polygonal curve approximates a general differentiable curve, one easily obtains the desired 
inequality. However, this is an integration procedure, which is not presented in this book until Chapter 43. 
The inequality must be proved here without using integration. (Theorem 40.8.6 is proved by Rudin [129], 
page 99, without using integration. Theorem 40.8.7 is given by Lang [23], pages 13-14, using integration. 
A proof using integration for several dependent variables and several independent variables is given by 
Edwards [67], pages 172-180.) 


40.8.3 REMARK:  Differentiable curves in Cartesian spaces. 
The kind of differentiable curve in a Cartesian space in Definition 40.8.4 could be regarded as a specialisation 
of the partially differentiable function concept in Definition 41.2.2 to domains which are open real-number 
intervals. However, differentiable curves have much more in common with differentiable real-valued func- 
tions of a real variable than with general partially differentiable maps between Cartesian spaces. (See also 
Definition 42.6.6 for higher-order differentiability of curves.) 


40.8.4 DEFINITION: A differentiable curve in IR" for n € Zf, on an open interval J € Top®°""(IR), is a 
function $f : I \to \mathbb{R}^n$ such that the map $f_i : I \to \mathbb{R}$ defined by $f_i : t \mapsto f(t)_i$ is differentiable for all $i \in \mathbb{N}_n$.


40.8.5 THEOREM: Derivative equals differential quotient limit for converging left/right end-points. 

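Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$, with $a < b$. Let $f : I \to \mathbb{R}^n$ be continuous on $I$ and
differentiable on $\mathrm{Int}(I) = (a,b)$ for some $n \in \mathbb{Z}^+$. Let $c \in (a,b)$, and let $(s_i)_{i=0}^{\infty}$ and $(t_i)_{i=0}^{\infty}$ be real-
number sequences such that $s_i \in [a,b]$ and $t_i \in [a,b]$ and $s_i < t_i$ and $s_i \le c \le t_i$ for all $i \in \mathbb{Z}_0^+$, and
$\lim_{i \to \infty} s_i = \lim_{i \to \infty} t_i = c$. Then $f'(c) = \lim_{i \to \infty} (f(t_i) - f(s_i))/(t_i - s_i)$.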

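PROOF: Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$ with $a < b$. Let $f : I \to \mathbb{R}^n$ be continuous
on $I$ and differentiable on $\mathrm{Int}(I) = (a,b)$ for some $n \in \mathbb{Z}^+$. Let $c \in (a,b)$, and let $(s_i)_{i=0}^{\infty}$ and $(t_i)_{i=0}^{\infty}$ be
real-number sequences such that $s_i \in [a,b]$ and $t_i \in [a,b]$ and $s_i < t_i$ and $s_i \le c \le t_i$ for all $i \in \mathbb{Z}_0^+$, and
$\lim_{i \to \infty} s_i = \lim_{i \to \infty} t_i = c$. Then the quotient $(f(t_i) - f(s_i))/(t_i - s_i)$ is a well-defined element of $\mathbb{R}^n$ for
all $i \in \mathbb{Z}_0^+$. Let $\varepsilon \in \mathbb{R}^+$. By the definition of $f'(c)$ (and Theorem 40.4.11),

$$\exists \delta \in \mathbb{R}^+,\ \forall x \in B_\delta(c) \cap [a,b],\quad |f(x) - f(c) - (x - c) f'(c)| \le \varepsilon |x - c|. \tag{40.8.1}$$

But $\exists i_s \in \mathbb{Z}_0^+$, $\forall i \ge i_s$, $|s_i - c| < \delta$, and $\exists i_t \in \mathbb{Z}_0^+$, $\forall i \ge i_t$, $|t_i - c| < \delta$, where $\delta \in \mathbb{R}^+$ in line (40.8.1) depends
only on $f$, $c$ and $\varepsilon$. For this choice of $\delta$, $i_s$ and $i_t$, the numbers $s_i$ and $t_i$ lie in $B_\delta(c) \cap [a,b]$ for all $i \ge i_s$
and $i \ge i_t$ respectively. Therefore

$$\begin{aligned}
\forall i \ge \max(i_s, i_t),\quad & |f(s_i) - f(c) - (s_i - c) f'(c)| \le \varepsilon |s_i - c| \\
\text{and}\quad & |f(t_i) - f(c) - (t_i - c) f'(c)| \le \varepsilon |t_i - c|.
\end{aligned}$$

Then by the triangle inequality in $\mathbb{R}^n$,

$$\begin{aligned}
\forall i \ge \max(i_s, i_t),\quad
|f(t_i) - f(s_i) - (t_i - s_i) f'(c)|
&\le |f(t_i) - f(c) - (t_i - c) f'(c)| + |f(s_i) - f(c) - (s_i - c) f'(c)| \\
&\le \varepsilon |t_i - c| + \varepsilon |s_i - c| \\
&= \varepsilon |t_i - s_i|.
\end{aligned}$$

But $s_i < t_i$ for all $i \in \mathbb{Z}_0^+$. So

$$\forall \varepsilon \in \mathbb{R}^+,\ \exists i_0 \in \mathbb{Z}_0^+,\ \forall i \ge i_0,\quad \left| \frac{f(t_i) - f(s_i)}{t_i - s_i} - f'(c) \right| \le \varepsilon.$$

Hence $f'(c) = \lim_{i \to \infty} (f(t_i) - f(s_i))/(t_i - s_i)$.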

40.8.6 THEOREM: Mean value theorem for curves in Cartesian spaces. 
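Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$, with $a < b$. Let $f : I \to \mathbb{R}^n$ be continuous on $I$ and
differentiable on $\mathrm{Int}(I) = (a,b)$ for some $n \in \mathbb{Z}^+$. Then

\[ \exists x \in (a,b),\quad |f'(x)| \ge \frac{|f(b) - f(a)|}{b - a}. \tag{40.8.2} \]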


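PROOF: Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$ with $a < b$. Let $f : I \to \mathbb{R}^n$ be continuous on $I$
and differentiable on $\mathrm{Int}(I) = (a,b)$ for some $n \in \mathbb{Z}^+$. Define the real-number sequences $(s_i)_{i=0}^\infty$ and $(t_i)_{i=0}^\infty$
by induction so that $s_0 = a$ and $t_0 = b$, and

\[ \forall i \in \mathbb{Z}_0^+,\quad s_{i+1} = \begin{cases} \tfrac12(s_i + t_i) & \text{if } |f(\tfrac12(s_i + t_i)) - f(s_i)| < \tfrac12 |f(t_i) - f(s_i)| \\ s_i & \text{otherwise} \end{cases} \]
\[ \text{and}\quad t_{i+1} = \begin{cases} \tfrac12(s_i + t_i) & \text{if } |f(\tfrac12(s_i + t_i)) - f(s_i)| \ge \tfrac12 |f(t_i) - f(s_i)| \\ t_i & \text{otherwise.} \end{cases} \]

It is easily seen that $t_i - s_i = 2^{-i}(b - a)$ for all $i \in \mathbb{Z}_0^+$. For any $i \in \mathbb{Z}_0^+$, if
$|f(\tfrac12(s_i + t_i)) - f(s_i)| < \tfrac12 |f(t_i) - f(s_i)|$, it follows from the triangle inequality for the norm $|\cdot|$ on $\mathbb{R}^n$ that
$|f(t_{i+1}) - f(s_{i+1})| = |f(t_i) - f(\tfrac12(s_i + t_i))| \ge |f(t_i) - f(s_i)| - |f(\tfrac12(s_i + t_i)) - f(s_i)|
> |f(t_i) - f(s_i)| - \tfrac12 |f(t_i) - f(s_i)| = \tfrac12 |f(t_i) - f(s_i)|$. Otherwise, if
$|f(\tfrac12(s_i + t_i)) - f(s_i)| \ge \tfrac12 |f(t_i) - f(s_i)|$, it follows that
$|f(t_{i+1}) - f(s_{i+1})| = |f(\tfrac12(s_i + t_i)) - f(s_i)| \ge \tfrac12 |f(t_i) - f(s_i)|$.
So $|f(t_{i+1}) - f(s_{i+1})| \ge \tfrac12 |f(t_i) - f(s_i)|$ for all $i \in \mathbb{Z}_0^+$. Therefore $|f(t_i) - f(s_i)| \ge 2^{-i} |f(b) - f(a)|$
for all $i \in \mathbb{Z}_0^+$. So $|f(t_i) - f(s_i)|/|t_i - s_i| \ge |f(b) - f(a)|/(b - a)$ for all $i \in \mathbb{Z}_0^+$. Both $(s_i)_{i=0}^\infty$ and $(t_i)_{i=0}^\infty$
are Cauchy sequences, and they have the same limit. Let $c = \lim_{i\to\infty} s_i = \lim_{i\to\infty} t_i$. Then it follows from
Theorem 40.8.5 that $f'(c) = \lim_{i\to\infty} (f(t_i) - f(s_i))/(t_i - s_i)$. Hence $|f'(c)| \ge |f(b) - f(a)|/(b - a)$.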
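
The bisection in the proof of Theorem 40.8.6 is constructive, so it can be run as an algorithm. The Python
sketch below is a hedged illustration, not the author's code. The example curve $t \mapsto (t, -\cos t)$ on $[0, 2]$,
and the names `bisect_interval`, `curve` and `speed`, are assumptions introduced for this example.

```python
import math

def norm(v):
    """Euclidean norm of a tuple."""
    return math.sqrt(sum(x * x for x in v))

def diff(u, v):
    """Componentwise difference u - v of two tuples."""
    return tuple(a - b for a, b in zip(u, v))

def bisect_interval(f, a, b, steps):
    """Repeat the bisection step of the proof, returning the final pair (s, t)."""
    s, t = a, b
    for _ in range(steps):
        mid = 0.5 * (s + t)
        if norm(diff(f(mid), f(s))) < 0.5 * norm(diff(f(t), f(s))):
            s = mid   # left chord is short, so the right half is kept
        else:
            t = mid   # left chord is long enough, so the left half is kept
    return s, t

def curve(t):
    """Example curve in R^2 (an assumption for illustration)."""
    return (t, -math.cos(t))

def speed(t):
    """Exact speed |f'(t)| of the example curve."""
    return math.sqrt(1.0 + math.sin(t) ** 2)

if __name__ == "__main__":
    a, b = 0.0, 2.0
    s, t = bisect_interval(curve, a, b, 20)
    mean_speed = norm(diff(curve(b), curve(a))) / (b - a)
    print("interval:", s, t)
    print("mean speed:", mean_speed)
    print("speed near the limit point:", speed(0.5 * (s + t)))
```

Each step halves $t_i - s_i$ while the chord length shrinks by at most a factor of two, so the quotient
$|f(t_i) - f(s_i)|/(t_i - s_i)$ never falls below the mean speed over $[a,b]$.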


40.8.7 THEOREM: The “mean speed” is bounded above by the supremum of the pointwise “speed”. 
Let $I = [a,b]$ be a bounded closed interval of $\mathbb{R}$ with $a < b$. Let $f : I \to \mathbb{R}^n$ be continuous on $I$ and
differentiable on $\mathrm{Int}(I) = (a,b)$ for some $n \in \mathbb{Z}^+$. Then

\[ \frac{|f(b) - f(a)|}{b - a} \le \sup_{x \in (a,b)} |f'(x)|. \tag{40.8.3} \]


PROOF: The result follows immediately from Theorem 40.8.6.


40.8.8 REMARK: Application of upper bound for mean speed to prove Lipschitz continuity.

Theorem 40.8.7 may be used in two different ways. If one knows a priori an upper bound for the speed
$|f'(x)|$ for $x \in (a,b)$, one may infer the same upper bound for the quotient on the left side of line (40.8.3).
A useful application of this is to show that if a function is continuously differentiable on an interval, then
the function satisfies the Lipschitz continuity property for every bounded closed sub-interval. (See Section 38.6 for
Lipschitz continuity.) In other words, $C^1$ curves are Lipschitz continuous. If one knows a priori that the
quotient in line (40.8.3) has some lower bound, then there exists at least one point on the curve where the
speed is greater than or equal to that lower bound. This gives a lower bound for the $C^1$ norm of $f$.


40.8.9 REMARK: “Mean value theorem” for curves in general metric spaces. 

Theorem 38.8.4 (iii) states that the distance between the endpoints of a curve in a metric space is less than 
or equal to the length of the curve. Theorem 38.9.7 (iv) implies that any locally rectifiable curve in a metric 
space may be parametrised by its length so that it has Lipschitz constant 1. As mentioned in Remark 38.9.5, 
a curve with Lipschitz constant 1 may be thought of as having a “constant speed” (although the velocity is 
not well defined in a general metric space). This may be regarded as a kind of “mean value theorem” for
curves in general metric spaces because the “average velocity” of a length-parametrised curve $\gamma : [a,b] \to M$ in
a metric space $M$ for $a < b$ may be defined as the quotient $d(\gamma(a), \gamma(b))/(b - a)$, whereas the “speed” at each
point of the curve equals 1. Since $b - a$ equals the length $L(\gamma)$ of the curve, the bound $d(\gamma(a), \gamma(b)) \le L(\gamma)$
implies that $1 \ge d(\gamma(a), \gamma(b))/(b - a)$, which resembles line (40.8.2) in Theorem 40.8.6.


40.9. Unidirectional real function differentiability 


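40.9.1 DEFINITION: A right-open subset of $\mathbb{R}$ is a set $U \subseteq \mathbb{R}$ which satisfies

\[ \forall p \in U,\ \exists \delta > 0,\quad (p, p + \delta) \subseteq U. \]


40.9.2 DEFINITION: A left-open subset of $\mathbb{R}$ is a set $U \subseteq \mathbb{R}$ which satisfies

\[ \forall p \in U,\ \exists \delta > 0,\quad (p - \delta, p) \subseteq U. \]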


40.9.3 REMARK: Right-differentiable functions. 
Definition 40.9.4 for right-differentiable functions is illustrated in Figure 40.9.1. 


[Figure: bounding lines $f(p) + v(x - p) \pm \varepsilon |x - p|$ over the interval $(p, p + \delta)$ on the $x$ axis.]

Figure 40.9.1 Right differentiability of real-valued function of real variable


40.9.4 DEFINITION: A right-differentiable function at a point $p \in U$ for a right-open set $U \subseteq \mathbb{R}$ is a function
$f : U \to \mathbb{R}$ which satisfies

\[ \exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p, p + \delta) \cap U,\quad |f(x) - f(p) - v(x - p)| \le \varepsilon |x - p|. \]

A right-differentiable function on a right-open set $U \subseteq \mathbb{R}$ is a function $f : U \to \mathbb{R}$ which is right-differentiable
for all $p \in U$.


40.9.5 DEFINITION: A left-differentiable function at a point $p \in U$ for a left-open set $U \subseteq \mathbb{R}$ is a function
$f : U \to \mathbb{R}$ which satisfies

\[ \exists v \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x \in (p - \delta, p) \cap U,\quad |f(x) - f(p) - v(x - p)| \le \varepsilon |x - p|. \]

A left-differentiable function on a left-open set $U \subseteq \mathbb{R}$ is a function $f : U \to \mathbb{R}$ which is left-differentiable
for all $p \in U$.


40.9.6 DEFINITION: A unidirectionally differentiable function at a point $p \in U$ for an open set $U \subseteq \mathbb{R}$ is a
function $f : U \to \mathbb{R}$ which is both right-differentiable and left-differentiable at $p$.

A unidirectionally differentiable function on an open set $U \subseteq \mathbb{R}$ is a function $f : U \to \mathbb{R}$ which is both
right-differentiable and left-differentiable on $U$.
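
A standard example of a function which is unidirectionally differentiable, but not differentiable, is the
absolute value function at $p = 0$. The Python sketch below samples the one-sided differential quotients. The
function names and the sampling scheme are assumptions made for illustration. A finite sample can only
suggest, not prove, that the conditions of Definitions 40.9.4 and 40.9.5 hold.

```python
def right_quotients(f, p, delta, samples=1000):
    """Quotients (f(x) - f(p)) / (x - p) at sample points x in (p, p + delta)."""
    xs = [p + delta * k / (samples + 1) for k in range(1, samples + 1)]
    return [(f(x) - f(p)) / (x - p) for x in xs]

def left_quotients(f, p, delta, samples=1000):
    """Quotients (f(x) - f(p)) / (x - p) at sample points x in (p - delta, p)."""
    xs = [p - delta * k / (samples + 1) for k in range(1, samples + 1)]
    return [(f(x) - f(p)) / (x - p) for x in xs]

if __name__ == "__main__":
    # For f = abs at p = 0, the right quotients all equal 1 and the left quotients all equal -1.
    print(set(right_quotients(abs, 0.0, 0.1)))
    print(set(left_quotients(abs, 0.0, 0.1)))
```

So the value $v = 1$ satisfies the right-differentiability condition at 0, and $v = -1$ satisfies the
left-differentiability condition, but no single $v$ satisfies both.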


40.10. Dini derivatives 


40.10.1 REMARK: Upper and lower directional derivatives of real-valued functions. 

Upper and lower directional derivatives of real-valued functions of a single real variable are variously known
as “Dini derivatives”, “Dini derivates”, “derivates” or “derived numbers”. (See for example Thomson/
Bruckner/Bruckner [149], pages 300-303; A.E. Taylor [145], page 403; S.J. Taylor [147], page 224; Riesz/
Szőkefalvi-Nagy [125], pages 7, 17-18; Kolmogorov/Fomin [104], page 318; Bass [53], page 122; Graves [85],
page 73; Yosida [167], page 239.) One immediate advantage of such “derivatives” is that they always exist
in the interior of the domain of any real-valued function. Their values are all extended real numbers in
$\bar{\mathbb{R}} = [-\infty, \infty] = \mathbb{R} \cup \{-\infty, \infty\}$. (See Section 16.2 for extended real numbers. See Definition 35.8.9 for
inferior and superior left and right limits.)


Similarly, upper and lower directional derivatives may be defined for functions from $\mathbb{R}^n$ to $\mathbb{R}$ for $n \in \mathbb{Z}^+$.
The Dini derivative concept may then be extended from real-valued functions on Cartesian spaces $\mathbb{R}^n$ to
real-valued functions on differentiable manifolds.


The basic four “derived numbers" of a real function in Definition 40.10.2 are illustrated in Figure 40.10.1. 


[Figure: the limits $\limsup_{h \to 0^+} (f(x+h) - f(x))/h$ and $\liminf_{h \to 0^+} (f(x+h) - f(x))/h$.]

Figure 40.10.1 Upper and lower left and right derivatives of a real function


40.10.2 DEFINITION: Let $f : \mathbb{R} \to \mathbb{R}$ and $x \in \mathbb{R}$.
(i) The upper right (Dini) derivative of $f$ at $x$ is the superior right limit $\limsup_{h \to 0^+} (f(x+h) - f(x))/h$.
(ii) The lower right (Dini) derivative of $f$ at $x$ is the inferior right limit $\liminf_{h \to 0^+} (f(x+h) - f(x))/h$.
(iii) The upper left (Dini) derivative of $f$ at $x$ is the superior left limit $\limsup_{h \to 0^-} (f(x+h) - f(x))/h$.
(iv) The lower left (Dini) derivative of $f$ at $x$ is the inferior left limit $\liminf_{h \to 0^-} (f(x+h) - f(x))/h$.
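
The four Dini derivatives can differ even for a continuous function. For $f(x) = x \sin(1/x)$ with $f(0) = 0$, the
differential quotient at $x = 0$ equals $\sin(1/h)$, so both upper derivatives equal 1 and both lower derivatives
equal $-1$. The Python sketch below estimates these values by sampling. The example function, the name
`sampled_dini` and the sampling scheme are assumptions made for illustration. Taking the maximum and minimum
over a finite sample is only a crude stand-in for the superior and inferior limits.

```python
import math

def f(x):
    """Example function (an assumption for illustration): x sin(1/x), with f(0) = 0."""
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0

def sampled_dini(f, x, delta, samples=20000):
    """Crude estimates of the four Dini derivatives of f at x.

    The maximum and minimum of the differential quotient are taken over finitely
    many sample points h with 0 < |h| <= delta. This only approximates the true
    lim sup and lim inf as h -> 0, which would also require delta -> 0.
    """
    hs = [delta * k / samples for k in range(1, samples + 1)]
    right = [(f(x + h) - f(x)) / h for h in hs]
    left = [(f(x - h) - f(x)) / (-h) for h in hs]
    return {
        "upper_right": max(right),
        "lower_right": min(right),
        "upper_left": max(left),
        "lower_left": min(left),
    }

if __name__ == "__main__":
    for name, value in sampled_dini(f, 0.0, 0.01).items():
        print(name, value)
```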


40.10.3 REMARK: Notations for Dini derivatives.
In Notation 40.10.4, the superscript $+$ or $-$ signifies whether the upper or lower derivative is meant. The
subscript $+$ or $-$ signifies whether the derivative is to the right or the left.


This kind of notation is, unfortunately, not easily extended to upper and lower directional derivatives when
the domain of the function is a multidimensional Cartesian space. (The definition is easy to extend. It is only
the notation which encounters “issues”.) In the multidimensional case, it is more natural to define upper
and lower directional derivatives of the form $\partial_v^+ f(x) = \limsup_{h \to 0^+} (f(x + hv) - f(x))/h$ for $v \in T_x(\mathbb{R}^n)$.
But if $n = 1$ and $v = -e_1 = (-1)$, one obtains $\partial_v^+ f(x) = -D_-^- f(x)$ instead of $D_-^+ f(x)$.


40.10.4 NOTATION: Let $f : \mathbb{R} \to \mathbb{R}$ and $x \in \mathbb{R}$.


(i) $D_+^+ f(x)$ denotes the upper right Dini derivative of $f$ at $x$. Alternatives: $\Lambda_r f(x)$, $D^+ f(x)$, $\overline{D}^+ f(x)$.
(ii) $D_+^- f(x)$ denotes the lower right Dini derivative of $f$ at $x$. Alternatives: $\lambda_r f(x)$, $D_+ f(x)$, $\underline{D}^+ f(x)$.
(iii) $D_-^+ f(x)$ denotes the upper left Dini derivative of $f$ at $x$. Alternatives: $\Lambda_l f(x)$, $D^- f(x)$, $\overline{D}^- f(x)$.
(iv) $D_-^- f(x)$ denotes the lower left Dini derivative of $f$ at $x$. Alternatives: $\lambda_l f(x)$, $D_- f(x)$, $\underline{D}^- f(x)$.
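

40.10.5 THEOREM: Some bounds for Dini derivatives.

Let $f : \mathbb{R} \to \mathbb{R}$.

(i) $D_+^- f(x) \le D_+^+ f(x)$ for all $x \in \mathbb{R}$.
(ii) $D_-^- f(x) \le D_-^+ f(x)$ for all $x \in \mathbb{R}$.
(iii) If $f$ is non-decreasing, then $D_+^- f(x) \ge 0$ and $D_-^- f(x) \ge 0$ for all $x \in \mathbb{R}$.
(iv) If $f$ is non-increasing, then $D_+^+ f(x) \le 0$ and $D_-^+ f(x) \le 0$ for all $x \in \mathbb{R}$.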


PROOF: For part (i), $D_+^- f(x) = \liminf_{h \to 0^+} (f(x+h) - f(x))/h \le \limsup_{h \to 0^+} (f(x+h) - f(x))/h = D_+^+ f(x)$
by Theorem 35.8.6.

For part (ii), $D_-^- f(x) = \liminf_{h \to 0^-} (f(x+h) - f(x))/h \le \limsup_{h \to 0^-} (f(x+h) - f(x))/h = D_-^+ f(x)$ by
Theorem 35.8.6.

For part (iii), $D_+^- f(x) = \liminf_{h \to 0^+} (f(x+h) - f(x))/h \ge 0$ because $f(x+h) \ge f(x)$ for all $x \in \mathbb{R}$
and $h \in \mathbb{R}^+$. Similarly, $D_-^- f(x) = \liminf_{h \to 0^-} (f(x+h) - f(x))/h = \liminf_{h \to 0^+} (f(x-h) - f(x))/(-h) =
\liminf_{h \to 0^+} (f(x) - f(x-h))/h \ge 0$ because $f(x) \ge f(x-h)$ for all $x \in \mathbb{R}$ and $h \in \mathbb{R}^+$.

For part (iv), $D_+^+ f(x) = \limsup_{h \to 0^+} (f(x+h) - f(x))/h \le 0$ because $f(x+h) \le f(x)$ for all $x \in \mathbb{R}$
and $h \in \mathbb{R}^+$. Similarly, $D_-^+ f(x) = \limsup_{h \to 0^-} (f(x+h) - f(x))/h = \limsup_{h \to 0^+} (f(x-h) - f(x))/(-h) =
\limsup_{h \to 0^+} (f(x) - f(x-h))/h \le 0$ because $f(x) \le f(x-h)$ for all $x \in \mathbb{R}$ and $h \in \mathbb{R}^+$.


41.1. Partial derivatives of real-valued functions on Cartesian spaces


41.1.1 REMARK: Two main questions: Is it differentiable, and what is the derivative? 
There are two interlinked questions that must always be asked when defining differentiation of functions.


(1) Is the function differentiable? 
(2) What is the derivative? 


There is a certain amount of "chicken and egg" here. One may say that the answer to question (1) is 
that a function is differentiable whenever the derivative exists. But the answer to question (2) is often the 
"output" from the differentiability test for question (1). Typically a differentiability test says that a function 
is differentiable if the output from some formula exists and is unique (and inhabits an acceptable kind of 
function space). Then that becomes the definition of the derivative. So it makes sense to answer question (1)
before attempting question (2).


On the other hand, the test for question (1) usually arises from the definition formula for question (2). In 
practice, one typically first writes down a formula for the derivative in order to answer question (2). Then 
one investigates the conditions which must be placed on the function to ensure that this formula gives a 
well-defined result. So in practice, one answers question (2), then question (1), but when documenting the 
answers, one presents (1) before (2). This kind of order reversal is not unusual. One typically works out the 
details of a proof first, which determines the conditions which must be placed on a theorem statement. Then 
one presents the theorem before the proof. This order reversal was noted even by the ancient Greeks, who 
called the discovery of the proof “analysis” and the presentation, in reverse order, the “synthesis”. Proofs are 
typically found by starting from the conclusion and working backwards! Then one documents the successful 
outcome for publication. (The difference between the method of discovery and method of presentation was 
famously highlighted by the “Method” of Archimedes. See Archimedes/Heath [200].)


Questions (1) and (2) become more “interesting” (i.e. problematic) as one progresses from Cartesian spaces to 
real functions on differentiable manifolds, tensor bundle cross-sections on differentiable manifolds, functions 
on differentiable fibre bundles, and beyond. 


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 


41.1.2 REMARK: Partial and total differentiability of real-valued functions of several variables.

Definition 40.3.4 for differentiability of a real-valued function of a single real variable may be generalised in
many different ways to functions from $\mathbb{R}^n$ to $\mathbb{R}$. The most restrictive natural generalisation requires the
existence of a “total differential” at a point in $\mathbb{R}^n$. This is given by Definition 41.6.7. A much easier test to
apply is the partial differentiability property in Definition 41.1.4.


41.1.3 REMARK: Independence of existence of individual partial derivatives. 

Individual partial derivatives of a function may exist independently of each other. In other words, the set of
$i \in \mathbb{N}_n$ for which line (41.1.1) in Definition 41.1.4 is satisfied can be any subset of $\mathbb{N}_n$. However, individual
partial derivatives are not defined here because in most applications, partial derivatives are useful only if 
they are all defined. A particular reason for this in differential geometry, and in analysis generally, is the 
“democracy of axes”. In other words, “all axes are equal” in such contexts. Abstract properties of functions, 
particularly geometry-related properties, rarely depend on a particular choice of axes. In other words, the 
validity of a property is typically invariant under permutations of the axes. Hence: “one in, all in”. 

This kind of consideration applies also to the directional derivatives in Section 41.4, where typically the 
existence of directional derivatives is useful only when all directional derivatives exist because in differential 
geometry, properties which are not invariant under local diffeomorphisms are typically not meaningful. 


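41.1.4 DEFINITION: A partially differentiable function $f : U \to \mathbb{R}$ at a point $p \in U$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$
with $n \in \mathbb{Z}_0^+$, is a function $f : U \to \mathbb{R}$ such that

\[ \forall i \in \mathbb{N}_n,\ \exists w \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),\quad
p + te_i \in U \Rightarrow |f(p + te_i) - f(p) - tw| \le \varepsilon |t|. \tag{41.1.1} \]

A partially differentiable function $f : U \to \mathbb{R}$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n \in \mathbb{Z}_0^+$, is a function $f : U \to \mathbb{R}$
such that $f$ is a partially differentiable function at $p$ for all $p \in U$.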


41.1.5 REMARK: Keyhole testing for partial differentiability. 

Theorem 41.1.6 is a multi-variable generalisation of the implication (v) $\Rightarrow$ (i) in Theorem 40.3.8. For the
sake of brevity, the full detailed arguments for the single-variable case in Theorems 40.3.6, 40.3.7 and 40.3.8 
are not generalised here. In fact, even the assertion of Theorem 41.1.6 is considered by most authors to 
be too obvious to be worth proving. (It wouldn't even be proposed as a student exercise!) However, it is 
required in the context of differentiable manifolds in order to show that partial differentiability testing via 
the charts implies global partial differentiability. 


Likewise Theorem 41.1.7, which generalises Theorem 40.3.7 from $n = 1$ to $n \in \mathbb{Z}^+$, is somewhat obvious.


41.1.6 THEOREM: Everywhere local partial differentiability implies global partial differentiability.
Let $n \in \mathbb{Z}_0^+$, $U \in \mathrm{Top}(\mathbb{R}^n)$ and $f : U \to \mathbb{R}$ satisfy

\[ \forall p \in U,\ \exists \Omega \in \mathrm{Top}_p(U),\quad f|_\Omega \text{ is partially differentiable on } \Omega. \tag{41.1.2} \]

Then $f$ is partially differentiable on $U$.


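PROOF: Suppose that $f$ satisfies condition (41.1.2). Let $p \in U$. Then there exists $\Omega \in \mathrm{Top}_p(U)$ such that
$f|_\Omega$ is partially differentiable on $\Omega$. So $f|_\Omega$ is partially differentiable at $p$ by Definition 41.1.4. Let $i \in \mathbb{N}_n$.
Then there exists $w \in \mathbb{R}$ satisfying

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),\quad
p + te_i \in \Omega \Rightarrow |f(p + te_i) - f(p) - tw| \le \varepsilon |t|. \]

Let $\varepsilon \in \mathbb{R}^+$. Then there exists $\delta_0 \in \mathbb{R}^+$ such that $|f(p + te_i) - f(p) - tw| \le \varepsilon |t|$ for all $t \in \mathbb{R}$ satisfying
$t \in (-\delta_0, \delta_0)$ and $p + te_i \in \Omega$. Let $\Omega' = \{t \in (-\delta_0, \delta_0);\ p + te_i \in \Omega\}$. Then $\Omega' \in \mathrm{Top}_0(\mathbb{R})$ by Definition 31.12.4
because the map $t \mapsto p + te_i$ is continuous. So by Theorem 32.5.9, $(-\delta', \delta') \subseteq \Omega'$ for some $\delta' \in \mathbb{R}^+$. Let
$\delta_0' = \min(\delta', \delta_0)$. Then $p + te_i \in \Omega$ for all $t \in (-\delta_0', \delta_0')$. So $|f(p + te_i) - f(p) - tw| \le \varepsilon |t|$ for all $t \in (-\delta_0', \delta_0')$
satisfying $p + te_i \in U$. Consequently $f$ is partially differentiable at $p$ by Definition 41.1.4. Hence $f$ is partially
differentiable on $U$ by Definition 41.1.4.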


41.1.7 THEOREM: Inheritance of partial differentiability by restrictions to open subsets.
Let $n \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $f : U \to \mathbb{R}$ be partially differentiable on $U$. Then $f|_\Omega$ is partially
differentiable on $\Omega$ for all $\Omega \in \mathrm{Top}(U)$.


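PROOF: Let $f$ be partially differentiable on $U$, and let $\Omega \in \mathrm{Top}(U)$. Let $p \in \Omega$. Then $p \in U$. So $f$ is
partially differentiable at $p$ by Definition 41.1.4. Let $i \in \mathbb{N}_n$. Then there exists $w \in \mathbb{R}$ such that

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),\quad
p + te_i \in U \Rightarrow |f(p + te_i) - f(p) - tw| \le \varepsilon |t|. \]

Since $\Omega \subseteq U$, the proposition $p + te_i \in \Omega$ implies $p + te_i \in U$. So by Theorem 4.5.7 (ix),

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),\quad
p + te_i \in \Omega \Rightarrow |f(p + te_i) - f(p) - tw| \le \varepsilon |t|. \]

So $f|_\Omega$ is partially differentiable at $p$ by Definition 41.1.4. Hence $f|_\Omega$ is partially differentiable on $\Omega$.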


41.1.8 REMARK: The special case of the zero-dimensional Cartesian space. 

When $n = 0$ in Definition 41.1.4, the set $\mathbb{R}^n$ contains only the empty tuple $() = \emptyset$, which is thought of as the
zero vector in $\mathbb{R}^0$. Thus $\mathbb{R}^0 = \{()\} = \{0\}$. Therefore $U = \{0\}$ and $p = 0$, and so $f : U \to \mathbb{R}$ must be defined
by $f(0) = y$ for some $y \in \mathbb{R}$. Then line (41.1.1) is always satisfied because $\mathbb{N}_n = \mathbb{N}_0 = \emptyset$, so that $i \in \mathbb{N}_n$
is impossible, and so the rest of the logical predicate is irrelevant. So every function $f : U \to \mathbb{R}$ is partially
differentiable at every $p \in U$. Consequently, all functions $f : U \to \mathbb{R}$ are partially differentiable. However,
this is a “pyrrhic victory” because in Definition 41.1.10, one finds that there are no partial derivatives with
respect to component $i \in \mathbb{N}_n$ because $\mathbb{N}_n$ is empty. So the tuple of partial derivatives is empty.


41.1.9 REMARK: Partial derivatives of partially differentiable functions. 

The uniqueness of the number $w \in \mathbb{R}$ on line (41.1.1) is guaranteed by the uniqueness of the derivatives of
functions from $\mathbb{R}$ to $\mathbb{R}$, in this case the function $t \mapsto f(p + te_i)$. Therefore this number may be given a name.
This is done in Definition 41.2.7. The array containing all $n$ of these $m$-tuples as rows is given a name in
Definition 41.1.10. In both of these definitions, the function is required to be differentiable only at the single
point $p$ where the $m$-tuple or matrix is defined.


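41.1.10 DEFINITION: The partial derivative of a partially differentiable function $f : U \to \mathbb{R}$ at a point
$p \in U$ with respect to component $i \in \mathbb{N}_n$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n \in \mathbb{Z}_0^+$, is the number $w \in \mathbb{R}$ which satisfies

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),\quad
p + te_i \in U \Rightarrow |f(p + te_i) - f(p) - tw| \le \varepsilon |t|. \]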


41.1.11 REMARK: Notations for partial derivatives. 

The functional notation $\partial_i f(p)$ is used in Notation 41.1.12 for a partial derivative which may exist at only one
point. This is a slight abuse of notation because the rest of the function might not exist. On the other hand, 
the partial derivative may be well defined as a partial function. (See Section 10.9 for partial functions.) 


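41.1.12 NOTATION: $\partial_i f(p)$, for $i \in \mathbb{N}_n$ and a partially differentiable function $f : U \to \mathbb{R}$ at a point $p \in U$,
for a set $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n \in \mathbb{Z}_0^+$, denotes the partial derivative of $f$ at $p$ with respect to component $i$.


41.1.13 NOTATION: $\partial_i f$, for $i \in \mathbb{N}_n$ and a partially differentiable function $f : U \to \mathbb{R}$, where $U \in \mathrm{Top}(\mathbb{R}^n)$
and $n \in \mathbb{Z}_0^+$, denotes the map $p \mapsto \partial_i f(p)$ for $p \in U$.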


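41.1.14 DEFINITION: The partial derivative tuple of a partially differentiable function $f : U \to \mathbb{R}$ at a
point $p \in U$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n \in \mathbb{Z}_0^+$, is the $n$-tuple $w \in \mathbb{R}^n$ which satisfies

\[ \forall i \in \mathbb{N}_n,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),\quad
p + te_i \in U \Rightarrow |f(p + te_i) - f(p) - tw_i| \le \varepsilon |t|. \]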
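
The partial derivative tuple can be approximated numerically, one axis at a time. The Python sketch below
uses central differences, which is an assumption of this example and not the definition above: the definition
is an $\varepsilon$-$\delta$ condition, whereas the code evaluates a single quotient at a fixed step `h`. The function names
and the example function $x \mapsto x_1^2 x_2$ are also hypothetical. Indices are 0-based in the code.

```python
def partial_derivative_tuple(f, p, h=1e-6):
    """Approximate the tuple (w_1, ..., w_n) at p by central differences along each axis."""
    n = len(p)
    w = []
    for i in range(n):
        # p + h e_i and p - h e_i, where e_i is the i-th coordinate basis vector
        plus = tuple(p[j] + (h if j == i else 0.0) for j in range(n))
        minus = tuple(p[j] - (h if j == i else 0.0) for j in range(n))
        w.append((f(plus) - f(minus)) / (2.0 * h))
    return tuple(w)

def example(x):
    """Example function on R^2 (an assumption for illustration): x_1^2 x_2."""
    return x[0] ** 2 * x[1]

if __name__ == "__main__":
    print(partial_derivative_tuple(example, (1.0, 2.0)))   # close to (4, 1)
    print(partial_derivative_tuple(lambda x: 5.0, ()))     # the empty tuple when n = 0
```

The second call matches Remark 41.1.8: when $n = 0$ there are no components, so the tuple of partial
derivatives is empty.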
41.1.15 REMARK: Constant real-valued functions on Cartesian spaces have zero partial derivatives. 


As mentioned in Remark 40.6.7, proving a trivial theorem is an excellent exercise for the mechanics of the 
application of new definitions. Theorem 41.1.16 is also useful. 


41.1.16 THEOREM: The partial derivatives of a constant function are well defined and equal to zero.
Let $n \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $c \in \mathbb{R}$. Define $f : U \to \mathbb{R}$ by $f(x) = c$ for all $x \in U$. Then $f$ is partially
differentiable on $U$, and $\partial_i f(p) = 0$ for all $p \in U$ and $i \in \mathbb{N}_n$.


PROOF: Let $p \in U$. Let $i \in \mathbb{N}_n$. Let $w = 0$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta, \delta)$ satisfy $p + te_i \in U$.
Then $|f(p + te_i) - f(p) - tw| = |c - c - 0| = 0 \le \varepsilon |t|$. So $f$ is a partially differentiable function on $U$ by
Definition 41.1.4, and the partial derivative of $f$ at $p$ equals 0 for all $p \in U$ and $i \in \mathbb{N}_n$, by Definition 41.1.10.
Hence $\partial_i f(p) = 0$ for all $p \in U$ and $i \in \mathbb{N}_n$ by Notation 41.1.12.


41.1.17 REMARK:  Cartesian space lift maps and partial derivatives. 

The rules in Theorem 41.1.18 are the partial derivative versions of the single-variable rules in Theorem 40.5.7.
Partial derivatives of real-valued functions on Cartesian spaces can be converted to ordinary derivatives via
lift maps of the kind described in Definition 10.13.17. Such a map $\mathrm{Lift}_p^i$ “lifts” a real number $t \in \mathbb{R}$ to a
point $\mathrm{Lift}_p^i(t) \in \mathbb{R}^n$ which lies on the line through a point $p \in \mathbb{R}^n$, running parallel to the $i$th axis. (This
idea is illustrated in Figure 41.1.1.)


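[Figure: the points $(0, p_2, p_3)$ and $(t, p_2, p_3)$ on the line through $p$ parallel to the first axis.]

Figure 41.1.1 Lift map from $\mathbb{R}$ to $\mathbb{R}^3$ through a point $p \in \mathbb{R}^3$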


The derivative of the function $f \circ \mathrm{Lift}_p^i$ is the same as the partial derivative of $f$ with respect to component $i$.
An important property of this “lift map” is that it is continuous by Theorem 32.12.3 (ii). So the inverse
image $(\mathrm{Lift}_p^i)^{-1}(U)$ is an open subset of $\mathbb{R}$, which implies that Theorems 40.5.7 and 40.5.9 are applicable.


41.1.18 THEOREM: Some elementary rules for pointwise first-order partial derivatives.
Let $U \in \mathrm{Top}(\mathbb{R}^n)$ and $p \in U$, where $n \in \mathbb{Z}_0^+$.

(i) (Constant multiplication rule.) If $\lambda \in \mathbb{R}$ and $f : U \to \mathbb{R}$ is partially differentiable at $p$, then the
pointwise product $\lambda f$ is partially differentiable at $p$ and $\forall i \in \mathbb{N}_n,\ \partial_i(\lambda f)(p) = \lambda \partial_i f(p)$.

(ii) (Sum rule.) If $f_1, f_2 : U \to \mathbb{R}$ are partially differentiable at $p$, then the pointwise sum $f_1 + f_2$ is partially
differentiable at $p$ and $\forall i \in \mathbb{N}_n,\ \partial_i(f_1 + f_2)(p) = \partial_i f_1(p) + \partial_i f_2(p)$.

(iii) (Product rule.) If $f_1, f_2 : U \to \mathbb{R}$ are partially differentiable at $p$, then the pointwise product $f_1 f_2$ is
partially differentiable at $p$ and $\forall i \in \mathbb{N}_n,\ \partial_i(f_1 f_2)(p) = \partial_i f_1(p) f_2(p) + f_1(p) \partial_i f_2(p)$.

(iv) (Reciprocal rule.) If $g : U \to \mathbb{R} \setminus \{0\}$ is partially differentiable at $p$, then the pointwise reciprocal $1/g$
is partially differentiable at $p$ and $\forall i \in \mathbb{N}_n,\ \partial_i(1/g)(p) = -(\partial_i g(p))/g(p)^2$.

(v) (Quotient rule.) If $f : U \to \mathbb{R}$ and $g : U \to \mathbb{R} \setminus \{0\}$ are partially differentiable at $p$, then the pointwise
quotient $f/g$ is partially differentiable at $p$ and $\forall i \in \mathbb{N}_n,\ \partial_i(f/g)(p) = \partial_i f(p)/g(p) - f(p) \partial_i g(p)/g(p)^2 =
(\partial_i f(p) g(p) - f(p) \partial_i g(p))/g(p)^2$.


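PROOF: Let $U \in \mathrm{Top}(\mathbb{R}^n)$ for some $n \in \mathbb{Z}_0^+$. Let $U_i = (\mathrm{Lift}_p^i)^{-1}(U)$ for $i \in \mathbb{N}_n$. Then $U_i \in \mathrm{Top}(\mathbb{R})$ for all
$i \in \mathbb{N}_n$ by Theorem 32.12.3 (ii).

For part (i), let $f : U \to \mathbb{R}$ be partially differentiable at $p \in U$, and define $\tilde{f} : U_i \to \mathbb{R}$ by $\tilde{f} = f \circ \mathrm{Lift}_p^i$. Then
$\tilde{f}$ is differentiable at $p_i \in \mathbb{R}$ and $\tilde{f}'(p_i) = \partial_i f(p)$. Similarly $(\lambda \tilde{f})'(p_i) = \partial_i(\lambda f)(p)$. Therefore $\partial_i(\lambda f)(p) =
(\lambda \tilde{f})'(p_i) = \lambda \tilde{f}'(p_i) = \lambda \partial_i f(p)$ by Theorem 40.5.7 (i).

For part (ii), for $k = 1, 2$, let $f_k : U \to \mathbb{R}$ be partially differentiable at $p \in U$ and define $\tilde{f}_k : U_i \to \mathbb{R}$
by $\tilde{f}_k = f_k \circ \mathrm{Lift}_p^i$. Then $\tilde{f}_k$ is differentiable at $p_i \in \mathbb{R}$ and $\tilde{f}_k'(p_i) = \partial_i f_k(p)$ for $k = 1, 2$. Similarly
$(\tilde{f}_1 + \tilde{f}_2)'(p_i) = \partial_i(f_1 + f_2)(p)$. Therefore $\partial_i(f_1 + f_2)(p) = (\tilde{f}_1 + \tilde{f}_2)'(p_i) = \tilde{f}_1'(p_i) + \tilde{f}_2'(p_i) = \partial_i f_1(p) + \partial_i f_2(p)$
by Theorem 40.5.7 (ii).

Similarly, parts (iii), (iv) and (v) follow from Theorem 40.5.7 parts (iii), (iv) and (v) respectively.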
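
The reduction of partial derivatives to ordinary derivatives via lift maps can be mimicked in code. The
Python sketch below is illustrative only. The names `lift`, `derivative` and `partial`, the central-difference
approximation, and the example functions `f1` and `f2` are all assumptions of this example. It checks the
product rule of Theorem 41.1.18 (iii) numerically at one point, using 0-based component indices.

```python
import math

def lift(p, i):
    """Return the map t -> (p_1, ..., t, ..., p_n) with t in component i (0-based)."""
    def lifted(t):
        return tuple(t if j == i else p[j] for j in range(len(p)))
    return lifted

def derivative(g, t, h=1e-6):
    """Central-difference approximation to g'(t) for a real function g."""
    return (g(t + h) - g(t - h)) / (2.0 * h)

def partial(f, p, i):
    """Approximate the i-th partial derivative of f at p as the derivative of f o lift at p_i."""
    line = lift(p, i)
    return derivative(lambda t: f(line(t)), p[i])

def f1(x):
    return math.sin(x[0]) * x[1]

def f2(x):
    return x[0] ** 2 + math.exp(x[1])

if __name__ == "__main__":
    p = (0.7, -0.3)
    for i in range(2):
        lhs = partial(lambda x: f1(x) * f2(x), p, i)
        rhs = partial(f1, p, i) * f2(p) + f1(p) * partial(f2, p, i)
        print(i, lhs, rhs)
```

The two printed columns should agree up to the accuracy of the finite-difference approximation.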


41.1.19 THEOREM: Partial derivatives equal zero at a maximum or minimum. 

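Let $f : U \to \mathbb{R}$ be a partially differentiable function at $p \in U$, where $U \in \mathrm{Top}(\mathbb{R}^n)$ for some $n \in \mathbb{Z}_0^+$.
(i) If $f(p) = \max(f)$, then $\partial_i f(p) = 0$ for all $i \in \mathbb{N}_n$.

(ii) If $f(p) = \min(f)$, then $\partial_i f(p) = 0$ for all $i \in \mathbb{N}_n$.


PROOF: For part (i), let $\Omega = \{x_i - p_i;\ x \in U \text{ and } \forall j \in \mathbb{N}_n \setminus \{i\},\ x_j = p_j\} = \{t \in \mathbb{R};\ p + te_i \in U\}$,
where $i \in \mathbb{N}_n$, and $e_i$ is the usual coordinate basis vector in $\mathbb{R}^n$. Then $\Omega \in \mathrm{Top}(\mathbb{R})$ by the inductive
application of Theorem 32.10.3. (Alternatively by the continuity of the map $t \mapsto p + te_i$.) Define $\tilde{f} : \Omega \to \mathbb{R}$
by $\tilde{f} : t \mapsto f(p + te_i)$. Then $\partial_i f(p) = \tilde{f}'(0)$. But if $f(p) = \max(f)$, then $\tilde{f}(0) = \max(\tilde{f})$. So $\tilde{f}'(0) = 0$ by
Theorem 40.6.2 (i). Hence $\partial_i f(p) = 0$ for all $i \in \mathbb{N}_n$.

Part (ii) follows in the same way as part (i) from Theorem 40.6.2 (ii).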


41.1.20 REMARK: Partial differentiability does not guarantee continuity. 

Examples 41.1.21 and 41.1.22 show that partially differentiable functions are not necessarily continuous. 
Since all totally differentiable functions are continuous, it follows that the discontinuous Examples 41.1.21 
and 41.1.22 are not totally differentiable. Hence partial differentiability everywhere does not imply total 
differentiability. 


41.1.21 EXAMPLE: A function which is partially differentiable, but not continuous. 
Define $f : \mathbb{R}^2 \to \mathbb{R}$ by

\[ f(x) = \begin{cases} 2x_1x_2/(x_1^2 + x_2^2) & x \ne (0,0) \\ 0 & x = (0,0). \end{cases} \]

Then $f$ is partially differentiable on $\mathbb{R}^2$, but $f$ is not continuous at $(0,0) \in \mathbb{R}^2$. This function is not
directionally differentiable at $(0,0)$. The level curves of $f$ are illustrated in Figure 41.1.2. Note that $|f(x)| \le 1$
for all $x \in \mathbb{R}^2$. (For some further comment on this example, see Example 52.6.9.)

Figure 41.1.2 Level curves of $f(x) = 2x_1x_2/(x_1^2 + x_2^2)$, $f : \mathbb{R}^2 \to \mathbb{R}$
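
The behaviour in Example 41.1.21 is easy to observe numerically. The Python sketch below assumes that the
function is $f(x) = 2x_1x_2/(x_1^2 + x_2^2)$ away from the origin, which is how the caption of Figure 41.1.2 reads.
The helper names are hypothetical. Along each axis the differential quotients at the origin vanish, so both
partial derivatives there equal 0, but along the diagonal $f$ is constantly equal to 1, so $f$ does not tend
to $f(0,0) = 0$.

```python
def f(x1, x2):
    """Assumed form of the example function: 2 x1 x2 / (x1^2 + x2^2), with f(0, 0) = 0."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return 2.0 * x1 * x2 / (x1 * x1 + x2 * x2)

def axis_quotients(i, ts):
    """Differential quotients (f(t e_i) - f(0)) / t at the origin along axis i (0-based)."""
    out = []
    for t in ts:
        point = (t, 0.0) if i == 0 else (0.0, t)
        out.append((f(*point) - f(0.0, 0.0)) / t)
    return out

def diagonal_values(ts):
    """Values f(t, t) at points on the diagonal, which approach (0, 0) as t -> 0."""
    return [f(t, t) for t in ts]

if __name__ == "__main__":
    ts = [10.0 ** -k for k in range(1, 12)]
    print(axis_quotients(0, ts))    # all zero
    print(axis_quotients(1, ts))    # all zero
    print(diagonal_values(ts))      # all equal to 1
```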


41.1.22 EXAMPLE: A function which is partially differentiable, but discontinuous for rational coordinates. 
Let $h : \mathbb{Z}^+ \to \mathbb{Q}^2$ be an enumeration of the set $\mathbb{Q}^2$ of elements of $\mathbb{R}^2$ which have rational components. Define
$\phi : \mathbb{R}^2 \to \mathbb{R}$ by

\[ \phi(x) = \sum_{k \in \mathbb{Z}^+} 2^{-k} f(x - h(k)) \]

for all $x \in \mathbb{R}^2$, where $f$ is as defined in Example 41.1.21 and $x - a$ denotes the element $(x_1 - a_1, x_2 - a_2)$
of $\mathbb{R}^2$ for all $a \in \mathbb{R}^2$. Then $\phi$ is well defined and partially differentiable on $\mathbb{R}^2$, but $\phi$ is discontinuous at
all points $a \in \mathbb{Q}^2$. Thus $\phi$ is discontinuous on a dense subset of its domain $\mathbb{R}^2$. Note that $|\phi(x)| \le 1$ for
all $x \in \mathbb{R}^2$. (For all points $a \in \mathbb{Q}^2$, the function $\phi$ is also not directionally differentiable at $a$.)


41.1.23 REMARK: Real functions with continuous partial derivatives. 

The issues which become clear from pathological functions as in Examples 41.1.21 and 41.1.22 can be 
resolved either by explicitly requiring directional derivatives to exist as in Definition 41.4.2, by requiring a 
much stronger kind of differentiability called “total differentiability” as in Definition 41.6.4, or by stipulating 
that the partial derivatives be continuous, which guarantees both directional and total differentiability. 
The amount of intellectual effort required to manage the technicalities of directional and total derivatives 
is substantial compared to the benefit of some extra generality. In fact, a large proportion of differential 
geometry authors require most functions to have continuous partial derivatives of all orders because it reduces 
the mental workload. The compromise which is adopted in this book is to typically assume that functions 
have continuous partial derivatives up to some order. That order is chosen as the least order which makes 
the relevant definitions or theorems valid. With considerable effort, it is often possible to achieve a small 
increase in generality, but the scale of $C^k$ differentiability classes for $k \in \mathbb{Z}_0^+$ is very well suited to the vast
majority of tasks and purposes in differential geometry.


Definition 41.1.24 does not have a pointwise version as Definition 41.1.4 does. The reason for this is that
continuity of partial derivatives at a point makes no sense. (Note that $C^1(U, \mathbb{R})$ in Notation 41.1.25 is a
special case of $C^k(U, \mathbb{R})$ for $k \in \mathbb{Z}_0^+$ in Notation 42.2.9. See Notation 31.12.11 for $C^0$.)


41.1.24 DEFINITION: A continuously partially differentiable function $f : U \to \mathbb{R}$ for a set $U \in \mathrm{Top}(\mathbb{R}^n)$
and $n \in \mathbb{Z}_0^+$, is a partially differentiable function $f$ on $U$ such that the partial derivatives of $f$ are all
continuous on $U$.


A function which has continuous partial derivatives means a continuously partially differentiable function.


41.1.25 NOTATION: $C^1(U, \mathbb{R})$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$, for $n \in \mathbb{Z}_0^+$, denotes the set of functions $f : U \to \mathbb{R}$
which have continuous partial derivatives.


41.1.26 REMARK:  Keyhole testing for continuous partial differentiability. 

As mentioned in Remark 31.12.16, it is desirable in the context of topological manifolds to be able to 
demonstrate global continuity of a function by testing locally. This is because manifold structure is defined 
via charts whose domains are open subsets of the manifold. The same issue arises for differentiable manifolds, 
where one wishes to demonstrate global continuous differentiability, utilising only local tests within chart 
domains. Theorem 41.1.27 provides such a test in the case of real-valued functions on Cartesian space 
domains, which are obtained from real-valued functions on manifolds via local charts. Thus Theorem 41.1.27
is the $C^1$ analogue of Theorem 31.12.17.


41.1.27 THEOREM: Local continuous differentiability implies global continuous differentiability. 

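Let $n \in \mathbb{Z}_0^+$. Let $f : U \to \mathbb{R}$ for some $U \in \mathrm{Top}(\mathbb{R}^n)$. Then $f$ is continuously partially differentiable on $U$ if
and only if for all $p \in U$ there is a set $\Omega \in \mathrm{Top}_p(U)$ such that $f|_\Omega$ is continuously partially differentiable. In
other words,

\[ f \in C^1(U, \mathbb{R}) \iff \forall p \in U,\ \exists \Omega \in \mathrm{Top}_p(U),\ f|_\Omega \in C^1(\Omega, \mathbb{R}). \]


PROOF: Suppose that $f \in C^1(U, \mathbb{R})$. Let $p \in U$ and let $\Omega \in \mathrm{Top}_p(U)$ (for example, $\Omega = U$). Then $f|_\Omega$ is
partially differentiable on $\Omega$ by Theorem 41.1.7. The
continuity of the partial derivatives $\partial_i(f|_\Omega) = (\partial_i f)|_\Omega$ of $f|_\Omega$ on $\Omega$ for $i \in \mathbb{N}_n$ follows from the continuity of
the partial derivatives of $f$ on $U$ by Theorem 31.12.15. Thus $f|_\Omega \in C^1(\Omega, \mathbb{R})$ by Notation 41.1.25.

Now suppose that $\forall p \in U,\ \exists \Omega \in \mathrm{Top}_p(U),\ f|_\Omega \in C^1(\Omega, \mathbb{R})$. Then $f : U \to \mathbb{R}$ is partially differentiable on $U$
by Theorem 41.1.6, and by Theorem 31.12.17, the continuity of the partial derivatives of $f$ on $U$ follows from
the continuity of the partial derivatives $\partial_i(f|_\Omega) = (\partial_i f)|_\Omega$ of $f|_\Omega$ on $\Omega$ for $i \in \mathbb{N}_n$. Hence $f \in C^1(U, \mathbb{R})$.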


41.2. Partial derivatives of maps between Cartesian spaces


41.2.1 REMARK: The choice of norm for convergence in Cartesian spaces. 

The norm “$|\cdot|$” in Definition 41.2.2 is the 2-norm (or “root mean square”) as in Definition 24.7.16. By
Theorem 39.5.3 (i), the choice of norm affects the rate of convergence by at most a constant factor. So 
Definition 41.2.2 is independent of the choice of norm. 


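41.2.2 DEFINITION: A partially differentiable function $f : U \to \mathbb{R}^m$ at a point $p \in U$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$
and $n, m \in \mathbb{Z}_0^+$, is a function $f : U \to \mathbb{R}^m$ such that

\[ \forall i \in \mathbb{N}_n,\ \exists w \in \mathbb{R}^m,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),\quad
p + te_i \in U \Rightarrow |f(p + te_i) - f(p) - tw| \le \varepsilon |t|. \tag{41.2.1} \]

A partially differentiable function $f : U \to \mathbb{R}^m$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n, m \in \mathbb{Z}_0^+$, is a function
$f : U \to \mathbb{R}^m$ such that $f$ is a partially differentiable function at $p$ for all $p \in U$.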


41.2.3 REMARK: Keyhole testing for partial differentiability. 

Theorem 41.2.4 is a tuple-valued generalisation of Theorem 41.1.6. Likewise Theorem 41.2.5 is a tuple-valued 
generalisation of Theorem 41.1.7. The statements and proofs are essentially unchanged. Thus the proofs are 
more or less a waste of space, but the theorems are useful. 


41.2.4 THEOREM: Everywhere local partial differentiability implies global partial differentiability. 
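Let $n,m \in \mathbb{Z}_0^+$, $U \in \mathrm{Top}(\mathbb{R}^n)$ and $f : U \to \mathbb{R}^m$ satisfy

\[ \forall p \in U,\ \exists \Omega \in \mathrm{Top}_p(U),\quad f|_\Omega \text{ is partially differentiable on } \Omega. \tag{41.2.2} \]

Then $f$ is partially differentiable on $U$.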


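PROOF: Suppose that $f$ satisfies condition (41.2.2). Let $p \in U$. Then there exists $\Omega \in \mathrm{Top}_p(U)$ such that $f|_\Omega$ is partially differentiable on $\Omega$. So $f|_\Omega$ is partially differentiable at $p$ by Definition 41.2.2. Let $i \in \mathbb{N}_n$. Then there exists $w \in \mathbb{R}^m$ satisfying

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + te_i \in \Omega \ \Rightarrow\ |f(p + te_i) - f(p) - tw| \le \varepsilon|t|. \]

Let $\varepsilon \in \mathbb{R}^+$. Then there exists $\delta_0 \in \mathbb{R}^+$ such that $|f(p + te_i) - f(p) - tw| \le \varepsilon|t|$ for all $t \in \mathbb{R}$ satisfying $t \in (-\delta_0,\delta_0)$ and $p + te_i \in \Omega$. Let $\Omega' = \{t \in (-\delta_0,\delta_0);\ p + te_i \in \Omega\}$. Then $\Omega' \in \mathrm{Top}_0(\mathbb{R})$ by Definition 31.12.4 because the map $t \mapsto p + te_i$ is continuous. So by Theorem 32.5.9, $(-\delta',\delta') \subseteq \Omega'$ for some $\delta' \in \mathbb{R}^+$. Let $\delta = \min(\delta',\delta_0)$. Then $p + te_i \in \Omega$ for all $t \in (-\delta,\delta)$. So $|f(p + te_i) - f(p) - tw| \le \varepsilon|t|$ for all $t \in (-\delta,\delta)$ satisfying $p + te_i \in U$. Consequently $f$ is partially differentiable at $p$ by Definition 41.2.2. Hence $f$ is partially differentiable on $U$ by Definition 41.2.2.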


41.2.5 THEOREM: Inheritance of partial differentiability by restrictions to open subsets. 
Let $n,m \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $f : U \to \mathbb{R}^m$ be partially differentiable on $U$. Then $f|_\Omega$ is partially differentiable on $\Omega$ for all $\Omega \in \mathrm{Top}(U)$.


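PROOF: Let $f$ be partially differentiable on $U$, and let $\Omega \in \mathrm{Top}(U)$. Let $p \in \Omega$. Then $p \in U$. So $f$ is partially differentiable at $p$ by Definition 41.2.2. Let $i \in \mathbb{N}_n$. Then there exists $w \in \mathbb{R}^m$ such that

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + te_i \in U \ \Rightarrow\ |f(p + te_i) - f(p) - tw| \le \varepsilon|t|. \]

Since $\Omega \subseteq U$, the proposition $p + te_i \in \Omega$ implies $p + te_i \in U$. So by Theorem 4.5.7 (ix),

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + te_i \in \Omega \ \Rightarrow\ |f(p + te_i) - f(p) - tw| \le \varepsilon|t|. \]

So $f|_\Omega$ is partially differentiable at $p$ by Definition 41.2.2. Hence $f|_\Omega$ is partially differentiable on $\Omega$.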


41.2.6 REMARK: Partial derivatives of partially differentiable functions.

The uniqueness of the m-tuple $w \in \mathbb{R}^m$ on line (41.2.1) is guaranteed by the uniqueness of the derivatives of functions from $\mathbb{R}$ to $\mathbb{R}^m$, in this case the function $t \mapsto f(p + te_i)$. Therefore this m-tuple may be given a name. This is done in Definition 41.2.7. The array containing all n of these m-tuples as rows is given a name in Definition 41.2.12. In both of these definitions, the function is required to be partially differentiable only at the single point p where the m-tuple or matrix is defined.


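41.2.7 DEFINITION: The partial derivative of a partially differentiable function $f : U \to \mathbb{R}^m$ at $p \in U$ with respect to component $i \in \mathbb{N}_n$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n,m \in \mathbb{Z}_0^+$, is the m-tuple $w \in \mathbb{R}^m$ which satisfies

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + te_i \in U \ \Rightarrow\ |f(p + te_i) - f(p) - tw| \le \varepsilon|t|. \]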


41.2.8 REMARK:  Notations for partial derivatives. 

The functional notation $\partial_i f(p)$ is used in Notation 41.2.9 for a partial derivative which may exist at only one point. This is a slight abuse of notation because the rest of the function might not exist. On the other hand, the partial derivative may be well defined as a partial function. (See Section 10.9 for partial functions.)


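41.2.9 NOTATION: $\partial_i f(p)$, for $i \in \mathbb{N}_n$ and a partially differentiable function $f : U \to \mathbb{R}^m$ at a point $p \in U$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n,m \in \mathbb{Z}_0^+$, denotes the partial derivative of $f$ at $p$ with respect to component $i \in \mathbb{N}_n$.


41.2.10 NOTATION: $\partial_i f$, for $i \in \mathbb{N}_n$ and a partially differentiable function $f : U \to \mathbb{R}^m$ with $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n,m \in \mathbb{Z}_0^+$, denotes the map $p \mapsto \partial_i f(p)$ for $p \in U$.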


41.2.11 REMARK: Partial derivative matrices do not imply existence of directional derivatives. 

Since matrices are closely associated with linear maps in Definitions 25.7.3 and 25.7.5, one may be tempted 
to believe that the partial derivative matrix in Definition 41.2.12 is associated with directional derivatives. 
This is true if the function is totally differentiable at a given point, but otherwise the product of the matrix with a vector in the source space does not necessarily match the function's directional derivative in that direction. The directional derivatives might not even be well defined.


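41.2.12 DEFINITION: The partial derivative matrix of a partially differentiable function $f : U \to \mathbb{R}^m$ at a point $p \in U$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n,m \in \mathbb{Z}_0^+$, is the matrix $w = (w_{ij})_{i=1,j=1}^{n,m} \in M_{n,m}(\mathbb{R})$ which satisfies

\[ \forall i \in \mathbb{N}_n,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + te_i \in U \ \Rightarrow\ |f(p + te_i) - f(p) - tw_i| \le \varepsilon|t|, \]

where $w_i$ denotes the m-tuple $(w_{ij})_{j=1}^m$ for all $i \in \mathbb{N}_n$.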
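
The row convention in Definition 41.2.12 (one row per source component, one column per target component) can be illustrated by a finite-difference approximation. This is only a sketch under stated assumptions: the helper name partial_derivative_matrix, the step size h and the test map f are hypothetical choices for this illustration, and a symmetric difference quotient is merely an approximation which can converge even at points where a partial derivative does not exist.

```python
import math

def partial_derivative_matrix(f, p, h=1e-6):
    """Approximate the n x m partial derivative matrix of f at p.

    Row i approximates the m-tuple which is the partial derivative of f
    with respect to source component i (counted from 0 in this code).
    """
    rows = []
    for i in range(len(p)):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        # Symmetric difference quotient (f(p + h e_i) - f(p - h e_i)) / (2h).
        rows.append([(a - b) / (2 * h) for a, b in zip(f(plus), f(minus))])
    return rows

def f(x):
    # Hypothetical test map f : R^2 -> R^3.
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

w = partial_derivative_matrix(f, [1.0, 2.0])
for row in w:
    print(row)
```

For this test map the exact rows at p = (1, 2) are (2, cos 1, 0) and (1, 0, 4), so the matrix has n = 2 rows and m = 3 columns.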


41.2.13 REMARK: Component-by-component partial differentiability between Cartesian spaces. 
Instead of defining partial differentiability at $p \in \mathbb{R}^n$ of a map $f$ between Cartesian spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ in terms of the differentiability of the n maps $t \mapsto f(p + te_i)$ for $i \in \mathbb{N}_n$ as in Definition 41.2.2, one may equivalently define partial differentiability of $f$ in terms of the $n \times m$ real functions $t \mapsto f_j(p + te_i)$ for $(i,j) \in \mathbb{N}_n \times \mathbb{N}_m$. In other words, instead of requiring n curves in $\mathbb{R}^m$ to be differentiable, one may equivalently require $n \times m$ real functions to be differentiable. This is shown in Theorem 41.2.14.


41.2.14 THEOREM: Componentwise partial differentiability is equivalent to partial differentiability. 
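Let $f : U \to \mathbb{R}^m$ and $p \in U$ for some $U \in \mathrm{Top}(\mathbb{R}^n)$, for some $n,m \in \mathbb{Z}_0^+$. Then $f$ is partially differentiable at $p$ if and only if

\[ \forall i \in \mathbb{N}_n,\ \forall j \in \mathbb{N}_m,\ \exists v \in \mathbb{R},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall t \in (-\delta,\delta),\quad p + te_i \in U \ \Rightarrow\ |f_j(p + te_i) - f_j(p) - vt| \le \varepsilon|t|. \tag{41.2.3} \]


PROOF: Assume that $f$ is partially differentiable at $p$. Let $i \in \mathbb{N}_n$ and $j \in \mathbb{N}_m$. Then by Definition 41.2.2, there is a $w \in \mathbb{R}^m$ which satisfies line (41.2.1). Let $v = w_j$. Then line (41.2.3) is satisfied because of the inequality $|f_j(p + te_i) - f_j(p) - tw_j| \le |f(p + te_i) - f(p) - tw|$.

Conversely, suppose that line (41.2.3) is satisfied. Let $i \in \mathbb{N}_n$, and let $w \in \mathbb{R}^m$ be the m-tuple whose component $w_j$ is the number $v$ given by line (41.2.3) for each $j \in \mathbb{N}_m$. For $\varepsilon \in \mathbb{R}^+$, choose $\varepsilon_1 = \varepsilon/(m + 1)$ in line (41.2.3) and let $\delta$ be the minimum of the resulting $m$ values of $\delta$. Since the 2-norm of an m-tuple is at most $\sqrt{m}$ times its largest absolute component, and $\sqrt{m} \le m + 1$, this achieves the inequality on line (41.2.1). Hence $f$ is partially differentiable at $p$.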


41.2.15 REMARK: Continuity of “lift functions” at points where partial derivatives exist.

In terms of the “lift functions” in Definition 10.13.17 and Notation 10.13.18, the assertion of Theorem 41.2.16 is that the composition of $f$ with the lift function $t \mapsto p + te_i$ is continuous at 0 for $p \in U$ and $i \in \mathbb{N}_n$ if that composition is differentiable at 0. Therefore the assertion follows from the one-dimensional case.


41.2.16 THEOREM: Continuity of “lift functions” at points where partial derivatives exist.

Let $f : U \to \mathbb{R}^m$ be a partially differentiable function at $p \in U$ for some $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n,m \in \mathbb{Z}_0^+$. For $i \in \mathbb{N}_n$, let $U_i = \{t \in \mathbb{R};\ p + te_i \in U\}$ and define $f_i : U_i \to \mathbb{R}^m$ by $f_i : t \mapsto f(p + te_i)$. Then $U_i \in \mathrm{Top}_0(\mathbb{R})$ and $f_i$ is continuous at 0 for all $i \in \mathbb{N}_n$.


PROOF: Define $L_{p,i} : \mathbb{R} \to \mathbb{R}^n$ by $L_{p,i} : t \mapsto p + te_i$, where $(e_i)_{i=1}^n$ is the usual family of unit vectors in $\mathbb{R}^n$. Then $L_{p,i}$ is continuous. So $U_i = L_{p,i}^{-1}(U) \in \mathrm{Top}(\mathbb{R})$. But $0 \in U_i$. So $U_i \in \mathrm{Top}_0(\mathbb{R})$.

It follows from Definitions 41.2.2 and 40.3.4 that $f_i = f \circ L_{p,i} : U_i \to \mathbb{R}^m$ is differentiable at $0 \in U_i$ for all $i \in \mathbb{N}_n$. Hence $f_i$ is continuous at 0 by Theorem 40.5.2.


41.2.17 REMARK: Functions with continuous partial derivatives. 

Definition 41.2.18 does not have a pointwise version as Definition 41.2.2 does. The reason for this is that continuity of a partial derivative at a point makes no sense unless the partial derivative is defined at least in a neighbourhood of that point. Continuity at a point of a partial derivative which is defined in a neighbourhood of the point is a well-defined concept, but it is rarely useful. By contrast, continuity of all partial derivatives at all points in a neighbourhood of a given point is a very useful concept indeed because of the total differential theorem, Theorem 41.6.15. (Note that $C^1(U,\mathbb{R}^m)$ in Notation 41.2.19 is a special case of $C^k(U,\mathbb{R}^m)$ for $k \in \mathbb{Z}_0^+$ and $m \in \mathbb{Z}_0^+$ in Notation 42.5.10.)


41.2.18 DEFINITION: A continuously partially differentiable function $f : U \to \mathbb{R}^m$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n,m \in \mathbb{Z}_0^+$, is a partially differentiable function $f$ on $U$ such that the partial derivatives of the components of $f$ are all continuous on $U$.

A function which has continuous partial derivatives means a continuously partially differentiable function.


41.2.19 NOTATION: $C^1(U,\mathbb{R}^m)$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$, for $n,m \in \mathbb{Z}_0^+$, denotes the set of functions $f : U \to \mathbb{R}^m$ which have continuous partial derivatives.


41.2.20 REMARK:  Keyhole testing for continuous partial differentiability of Cartesian space maps. 

For maps between differentiable manifolds, it is sometimes necessary to show global continuous differentiability, but this can only be done using a local test within each chart domain. Theorem 41.2.21 provides a local test for Cartesian space maps, which are obtained from maps between manifolds via local charts. The statement and proof of Theorem 41.2.21 are almost exactly the same as those of Theorem 41.1.27.


41.2.21 THEOREM: Local continuous differentiability implies global continuous differentiability. 

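Let $n,m \in \mathbb{Z}_0^+$. Let $f : U \to \mathbb{R}^m$ for some $U \in \mathrm{Top}(\mathbb{R}^n)$. Then $f$ is continuously partially differentiable on $U$ if and only if for all $p \in U$ there is a set $\Omega \in \mathrm{Top}_p(U)$ such that $f|_\Omega$ is continuously partially differentiable. In other words,

\[ f \in C^1(U,\mathbb{R}^m) \iff \forall p \in U,\ \exists \Omega \in \mathrm{Top}_p(U),\ f|_\Omega \in C^1(\Omega,\mathbb{R}^m). \]


PROOF: Suppose that $f \in C^1(U,\mathbb{R}^m)$. Let $\Omega \in \mathrm{Top}(U)$. Then $f|_\Omega$ is partially differentiable on $\Omega$ by Theorem 41.2.5. The continuity of the partial derivatives $\partial_i(f|_\Omega) = (\partial_i f)|_\Omega$ of $f|_\Omega$ on $\Omega$ for $i \in \mathbb{N}_n$ follows from the continuity of the partial derivatives of $f$ on $U$ by Theorem 31.12.15. Thus $f|_\Omega \in C^1(\Omega,\mathbb{R}^m)$ by Notation 41.2.19. In particular, this holds for $\Omega = U \in \mathrm{Top}_p(U)$ for all $p \in U$.

Now suppose that $\forall p \in U$, $\exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega \in C^1(\Omega,\mathbb{R}^m)$. Then $f : U \to \mathbb{R}^m$ is partially differentiable on $U$ by Theorem 41.2.4, and then by Theorem 31.12.17, the continuity of the partial derivatives of $f$ on $U$ follows from the continuity of the partial derivatives $\partial_i(f|_\Omega) = (\partial_i f)|_\Omega$ of $f|_\Omega$ on $\Omega$ for $i \in \mathbb{N}_n$. Hence $f \in C^1(U,\mathbb{R}^m)$.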


41.2.22 REMARK: Constant Cartesian space maps have zero partial derivatives. 
Theorem 41.2.23 is the natural extension of Theorems 40.5.5 and 41.1.16 to Cartesian space maps. 


41.2.23 THEOREM: Constant Cartesian space maps have zero partial derivatives.
Let $n,m \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $c \in \mathbb{R}^m$. Define $f : U \to \mathbb{R}^m$ by $f(x) = c$ for all $x \in U$. Then $f$ is partially differentiable on $U$, and $\forall p \in U$, $\forall i \in \mathbb{N}_n$, $\partial_i f(p) = 0$, and $f \in C^1(U,\mathbb{R}^m)$.


PROOF: Let $p \in U$. Let $i \in \mathbb{N}_n$. Let $w = 0 \in \mathbb{R}^m$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta,\delta)$ satisfy $p + te_i \in U$. Then $|f(p + te_i) - f(p) - tw| = |c - c - 0| = 0 \le \varepsilon|t|$. So $f$ is a partially differentiable function on $U$ by Definition 41.2.2, and the partial derivative of $f$ at $p$ equals $0 \in \mathbb{R}^m$ for all $p \in U$ and $i \in \mathbb{N}_n$ by Definition 41.2.7. Therefore $\partial_i f(p) = 0$ for all $p \in U$ and $i \in \mathbb{N}_n$ by Notation 41.2.9. But the zero function is continuous by Theorem 31.12.9 because it is constant. Hence $f \in C^1(U,\mathbb{R}^m)$.


41.2.24 THEOREM: Partial differentiability of the identity map on Cartesian spaces.
Let $n \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Then $\mathrm{id}_U \in C^1(U,\mathbb{R}^n)$ and $\partial_i \mathrm{id}_U(p) = e_i$ for all $p \in U$ and $i \in \mathbb{N}_n$.


PROOF: Let $f = \mathrm{id}_U$. Let $p \in U$ and $i \in \mathbb{N}_n$. Let $w = e_i$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta,\delta)$ satisfy $p + te_i \in U$. Then $|f(p + te_i) - f(p) - tw| = |te_i - te_i| = 0 \le \varepsilon|t|$. So $f$ is a partially differentiable function on $U$ by Definition 41.2.2, and the partial derivative of $f$ at $p$ equals $e_i$ for all $p \in U$ and $i \in \mathbb{N}_n$ by Definition 41.2.7. Therefore $\partial_i f(p) = e_i$ for all $p \in U$ and $i \in \mathbb{N}_n$ by Notation 41.2.9. But the function $p \mapsto e_i$ is continuous for all $i \in \mathbb{N}_n$ by Theorem 31.12.9 because it is constant. Hence $f \in C^1(U,\mathbb{R}^n)$ by Definition 41.2.18 and Notation 41.2.19.


41.2.25 REMARK: Zero partial derivatives of hyperplane-constrained maps. 
Theorem 41.2.26 is applicable to the differentiable submanifold concept in Definition 52.4.2, particularly in 
regard to the submanifold tangent vector embedding map concept in Definition 54.6.2. 


41.2.26 THEOREM: Zero partial derivatives of hyperplane-constrained maps. 
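Let $n,m \in \mathbb{Z}_0^+$. Let $n',m' \in \mathbb{Z}_0^+$ with $n' \le n$ and $m' \le m$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $\phi : U \to \mathbb{R}^m$ be a partially differentiable function which satisfies

\[ \forall p \in U \cap (\mathbb{R}^{n'} \times \{0_{\mathbb{R}^{n-n'}}\}),\quad \phi(p) \in \mathbb{R}^{m'} \times \{0_{\mathbb{R}^{m-m'}}\}. \]

Then $\forall p \in U \cap (\mathbb{R}^{n'} \times \{0_{\mathbb{R}^{n-n'}}\})$, $\forall i \in \mathbb{N}_{n'}$, $\forall j \in \mathbb{N}_m \setminus \mathbb{N}_{m'}$, $\partial_i\phi_j(p) = 0$.


PROOF: Let $p \in U \cap (\mathbb{R}^{n'} \times \{0_{\mathbb{R}^{n-n'}}\})$. Let $i \in \mathbb{N}_{n'}$. Then for some $\delta \in \mathbb{R}^+$, for all $t \in (-\delta,\delta)$, $p + te_i \in U$ because $U \in \mathrm{Top}(\mathbb{R}^n)$. So $p + te_i \in U \cap (\mathbb{R}^{n'} \times \{0_{\mathbb{R}^{n-n'}}\})$ for all $t \in (-\delta,\delta)$. Therefore $\phi(p + te_i) - \phi(p) \in \mathbb{R}^{m'} \times \{0_{\mathbb{R}^{m-m'}}\}$ because $\phi(p), \phi(p + te_i) \in \mathbb{R}^{m'} \times \{0_{\mathbb{R}^{m-m'}}\}$. So $\phi_j(p + te_i) - \phi_j(p) = 0$ for all $j \in \mathbb{N}_m \setminus \mathbb{N}_{m'}$ and all $t \in (-\delta,\delta)$. Hence $\partial_i\phi_j(p) = 0$ by Definition 41.1.10.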


41.3. Joint-domain and joint-range map differentiability 


41.3.1 REMARK: Differentiable structure on Cartesian products of Cartesian spaces. 

If the two domains of a function of two variables are both open subsets of Cartesian spaces, then the two domains may be combined into a single direct product space. More generally, a function of two variables may have a domain which is an open subset of the joint domain. In other words, a partially defined function $f : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R}^m$ may have a domain $\Omega \in \mathrm{Top}(\mathbb{R}^{n_1+n_2})$, where $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ is identified with $\mathbb{R}^{n_1+n_2}$. Under such circumstances, it is difficult to distinguish between a function of two variables and the corresponding function of a single joint variable. (See also Remark 10.2.31 for this ambiguity.)


It is very often convenient to combine the domains of a function of two variables into a single joint domain. 
In fact, this is the usual interpretation, for both topological and differentiable structure. Thus one may 
say that a function of two variables is continuous if it is continuous with respect to the topological direct 
product of the two domain sets. (This is done for example in Theorems 32.9.10 and 35.3.5.) And then one 
may say that a function of two variables is partially differentiable if it is differentiable with respect to the 
differentiable structure on the joint domain, as in Theorem 41.3.3. 


The corresponding observations may be made for joint ranges of direct products of maps. This concept is applied in both Theorems 41.3.3 and 41.3.5.


41.3.2 REMARK: Double-domain product of partially differentiable maps between Cartesian spaces.
Particularly for tangent bundles, tensor bundles and general differentiable fibre bundles, the differentiability of double-domain direct products of maps between Cartesian spaces is often required for atlases on direct products of manifolds. Theorem 41.3.3 shows that if two such maps are partially differentiable then their direct product is partially differentiable. (Theorem 41.3.3 is illustrated in Figure 41.3.1.)

Figure 41.3.1 Double-domain product of partially differentiable Cartesian space maps


41.3.3 THEOREM: Double-domain product of partially differentiable maps is partially differentiable.
For $k = 1,2$, let $\phi_k : U_k \to \mathbb{R}^{m_k}$ for some $U_k \in \mathrm{Top}(\mathbb{R}^{n_k})$.
(i) If $\phi_k$ is partially differentiable at $p_k \in U_k$ for $k = 1,2$, then $\phi_1 \times \phi_2 : U_1 \times U_2 \to \mathbb{R}^{m_1+m_2}$ is partially differentiable at $(p_1,p_2)$.
(ii) If $\phi_k$ is continuously partially differentiable on $U_k$ for $k = 1,2$, then $\phi_1 \times \phi_2 : U_1 \times U_2 \to \mathbb{R}^{m_1+m_2}$ is continuously partially differentiable on $U_1 \times U_2$.


PROOF: Part (i) follows from Theorem 41.2.14 by expressing the partial differentiability of $\phi_1 \times \phi_2$ in terms of the differentiability of its $m_1 + m_2$ real component functions with respect to its $n_1 + n_2$ real variables.

Part (ii) follows from part (i) and Definition 41.2.18 and Theorem 32.9.10 (i).


41.3.4 REMARK: Common-domain product of partially differentiable maps between Cartesian spaces. 
The differentiability of common-domain direct products of maps between Cartesian spaces is often required because the manifold charts for total spaces of tangent bundles, tensor bundles and general differentiable fibre bundles are defined to be common-domain products of maps. Theorem 41.3.5 shows that if two maps with a common domain are partially differentiable then their direct product is partially differentiable.


41.3.5 THEOREM: Common-domain product of partially differentiable maps is partially differentiable. 
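Let $n, m_1, m_2 \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $\phi_\ell : U \to \mathbb{R}^{m_\ell}$ for $\ell = 1,2$.
(i) If $\phi_\ell$ is partially differentiable at $p \in U$ for $\ell = 1,2$, then $\phi_1 \times \phi_2 : U \to \mathbb{R}^{m_1+m_2}$ is partially differentiable at $p$, and $\partial_i(\phi_1 \times \phi_2)(p) = (\partial_i\phi_1(p), \partial_i\phi_2(p))$ for all $i \in \mathbb{N}_n$.
(ii) If $\phi_\ell$ is continuously partially differentiable on $U$ for $\ell = 1,2$, then $\phi_1 \times \phi_2 : U \to \mathbb{R}^{m_1+m_2}$ is continuously partially differentiable on $U$.


PROOF: For part (i), let $\phi_\ell$ be partially differentiable at $p \in U$ for $\ell = 1,2$. Then by Definition 41.2.2,

\[ \forall \ell \in \mathbb{N}_2,\ \forall i \in \mathbb{N}_n,\ \exists w_{i,\ell} \in \mathbb{R}^{m_\ell},\ \forall \varepsilon_\ell > 0,\ \exists \delta_\ell > 0,\ \forall t \in (-\delta_\ell,\delta_\ell),\quad p + te_i \in U \ \Rightarrow\ |\phi_\ell(p + te_i) - \phi_\ell(p) - tw_{i,\ell}| \le \varepsilon_\ell|t|. \tag{41.3.1} \]

Let $i \in \mathbb{N}_n$. Let $w_i = (w_{i,1}, w_{i,2}) \in \mathbb{R}^{m_1} \times \mathbb{R}^{m_2} = \mathbb{R}^{m_1+m_2}$. Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_\ell = \varepsilon/2$ for $\ell = 1,2$. Let $\delta_\ell$ satisfy condition (41.3.1) for $\ell = 1,2$. Let $\delta = \min(\delta_1,\delta_2)$. Then for all $t \in (-\delta,\delta)$ with $p + te_i \in U$,

\[ |\phi(p + te_i) - \phi(p) - tw_i|_{\mathbb{R}^m} \le |\phi_1(p + te_i) - \phi_1(p) - tw_{i,1}|_{\mathbb{R}^{m_1}} + |\phi_2(p + te_i) - \phi_2(p) - tw_{i,2}|_{\mathbb{R}^{m_2}} \le \varepsilon|t|, \]

where $\phi = \phi_1 \times \phi_2$ and $m = m_1 + m_2$. So $\phi_1 \times \phi_2$ is partially differentiable at $p$ by Definition 41.2.2, and by Definition 41.2.7 and Notation 41.2.9, $\partial_i(\phi_1 \times \phi_2)(p) = w_i = (w_{i,1}, w_{i,2}) = (\partial_i\phi_1(p), \partial_i\phi_2(p))$ for all $i \in \mathbb{N}_n$.

For part (ii), let $\phi_\ell$ be continuously partially differentiable on $U$ for $\ell = 1,2$. Then $\phi = \phi_1 \times \phi_2$ is partially differentiable on $U$ by part (i) and Definition 41.2.2. The continuity of $\partial_i(\phi_1 \times \phi_2)$ follows from the equality in part (i) and the continuity of $\partial_i\phi_1$ and $\partial_i\phi_2$ because $\phi_1$ and $\phi_2$ are assumed to be continuously partially differentiable.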


41.3.6 REMARK: Some basic real-number operations are continuously partially differentiable. 

Theorem 41.3.7 shows that some basic real-number operations are $C^1$. By comparison, Theorem 35.3.5 shows that the same operations are $C^0$. However, the methods of proof are very different. Theorem 35.3.5 proves continuity using only the most basic definitions for the topological structure of the real numbers and Cartesian linear spaces, whereas Theorem 41.3.7 uses analytical $\varepsilon$-$\delta$ differential quotient definitions.


Since addition operations are linear, and multiplication operations are bilinear, the partial derivatives for addition operations in Theorem 41.3.7 (i,iv) are constant, whereas the partial derivatives for multiplication operations in Theorem 41.3.7 (ii,iii) are linear.

41.3.7 THEOREM: Continuous partial differentiability of real-number algebraic operations. 


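(i) The real-number addition operation $\sigma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuously partially differentiable, and $\sigma$ satisfies $\partial_1\sigma(p_1,p_2) = \partial_2\sigma(p_1,p_2) = 1$ for all $(p_1,p_2) \in \mathbb{R} \times \mathbb{R}$.

(ii) The real-number product operation $\tau : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuously partially differentiable, and $\tau$ satisfies $\partial_1\tau(p_1,p_2) = p_2$ and $\partial_2\tau(p_1,p_2) = p_1$ for all $(p_1,p_2) \in \mathbb{R} \times \mathbb{R}$.

(iii) The Cartesian linear space scalar multiplication operation $\mu : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuously partially differentiable for all $n \in \mathbb{Z}_0^+$, and the partial derivative matrix $w \in M_{1+n,n}(\mathbb{R})$ for $\mu$ satisfies $w_{1,j} = x_j$ and $w_{i,j} = \lambda\delta_{i-1,j}$ for $i \in \mathbb{N}_{1+n} \setminus \{1\}$ and $j \in \mathbb{N}_n$ at each point $p = (\lambda,x) \in \mathbb{R} \times \mathbb{R}^n = \mathbb{R}^{1+n}$.

(iv) The Cartesian linear space vector addition operation $\sigma_n : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is continuously partially differentiable for all $n \in \mathbb{Z}_0^+$, and the partial derivative matrix $w \in M_{n+n,n}(\mathbb{R})$ for $\sigma_n$ satisfies $w_{i,j} = w_{i+n,j} = \delta_{i,j}$ for $i,j \in \mathbb{N}_n$ at each point $p \in \mathbb{R}^{n+n} = \mathbb{R}^{2n}$.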


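PROOF: For part (i), let $p = (p_1,p_2) \in \mathbb{R} \times \mathbb{R} = \mathbb{R}^2$. Let $w = 1$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta,\delta)$. Then $|\sigma(p + te_1) - \sigma(p) - tw| = |\sigma(p_1 + t, p_2) - \sigma(p_1,p_2) - t| = |p_1 + t + p_2 - (p_1 + p_2) - t| = 0 \le \varepsilon|t|$. Similarly $|\sigma(p + te_2) - \sigma(p) - tw| = 0 \le \varepsilon|t|$. So $\sigma$ is partially differentiable on $\mathbb{R} \times \mathbb{R}$ by Definition 41.1.4. Then $\partial_1\sigma(p) = 1$ and $\partial_2\sigma(p) = 1$ for all $p \in \mathbb{R} \times \mathbb{R}$ by Definition 41.1.10 and Notation 41.1.12. Thus $\partial_1\sigma$ and $\partial_2\sigma$ are continuous functions on $\mathbb{R} \times \mathbb{R}$ by Theorem 31.12.9 because they are constant. Hence $\sigma$ is continuously partially differentiable by Definition 41.1.24.

For part (ii), let $p = (p_1,p_2) \in \mathbb{R} \times \mathbb{R} = \mathbb{R}^2$. Let $w = p_2$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta,\delta)$. Then $|\tau(p_1 + t, p_2) - \tau(p_1,p_2) - tw| = |p_1p_2 + tp_2 - p_1p_2 - tp_2| = 0 \le \varepsilon|t|$. So $\partial_1\tau(p_1,p_2) = p_2$ for all $(p_1,p_2) \in \mathbb{R} \times \mathbb{R}$ by Notation 41.1.12. Similarly $\partial_2\tau(p_1,p_2) = p_1$ for all $(p_1,p_2) \in \mathbb{R} \times \mathbb{R}$. Since $\partial_1\tau$ and $\partial_2\tau$ are thus projection maps, it follows from Theorem 32.10.7 (i) that they are continuous. Hence $\tau$ is continuously partially differentiable by Definition 41.1.24.

For part (iii), let $n \in \mathbb{Z}_0^+$ and $p = (\lambda,x) \in \mathbb{R} \times \mathbb{R}^n = \mathbb{R}^{1+n}$. If $n = 0$, then $p = (\lambda,0)$ and $\mu(p) = 0$, which implies that $\mu$ is constant, which implies that $\mu \in C^1(\mathbb{R} \times \mathbb{R}^n, \mathbb{R}^n)$ with zero partial derivatives by Theorem 41.2.23. So assume that $n \in \mathbb{Z}^+$. Let $w = x$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta,\delta)$. Then $|\mu(\lambda + t, x) - \mu(\lambda,x) - tw| = 0 \le \varepsilon|t|$. Thus $\mu$ satisfies Definition 41.2.2 line (41.2.1) for $i = 1$.

Now let $i \in \mathbb{N}_{n+1} \setminus \{1\}$. Let $w = \lambda e_{i-1}$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta,\delta)$. Then $|\mu(\lambda, x + te_{i-1}) - \mu(\lambda,x) - tw| = 0 \le \varepsilon|t|$. Thus $\mu$ satisfies Definition 41.2.2 line (41.2.1) for $i \in \mathbb{N}_{n+1} \setminus \{1\}$. Therefore $\mu : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is a partially differentiable function by Definition 41.2.2, and the partial derivative matrix $w \in M_{n+1,n}(\mathbb{R})$ for $\mu$ satisfies $w_{1,j} = x_j$ and $w_{i,j} = \lambda\delta_{i-1,j}$ for $i \in \mathbb{N}_{n+1} \setminus \{1\}$ and $j \in \mathbb{N}_n$ at each point $p = (\lambda,x) \in \mathbb{R}^{1+n}$ by Definition 41.2.12. But these matrix elements are all continuous functions on $\mathbb{R}^{1+n}$ because they are either constant or are projection maps from $\mathbb{R}^{1+n}$ to $\mathbb{R}$. Therefore $\mu$ is continuously partially differentiable by Definition 41.2.18.

For part (iv), let $n \in \mathbb{Z}_0^+$ and $p = (u,v) \in \mathbb{R}^n \times \mathbb{R}^n = \mathbb{R}^{2n}$. Let $i \in \mathbb{N}_n$. Let $w = e_i \in \mathbb{R}^n$. Let $\varepsilon \in \mathbb{R}^+$. Let $\delta = 1$. Let $t \in (-\delta,\delta)$. Then $|\sigma_n(p + te_i) - \sigma_n(p) - tw| = |\sigma_n(u + te_i, v) - \sigma_n(u,v) - tw| = 0 \le \varepsilon|t|$. Thus $\sigma_n$ satisfies Definition 41.2.2 line (41.2.1) for $i \in \mathbb{N}_n$. Similarly, for $i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n$, let $w = e_{i-n} \in \mathbb{R}^n$. Let $\varepsilon \in \mathbb{R}^+$, $\delta = 1$ and $t \in (-\delta,\delta)$. Then $|\sigma_n(p + te_i) - \sigma_n(p) - tw| = |\sigma_n(u, v + te_{i-n}) - \sigma_n(u,v) - tw| = 0 \le \varepsilon|t|$. Thus $\sigma_n$ satisfies Definition 41.2.2 line (41.2.1) for $i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n$. Therefore $\sigma_n : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is a partially differentiable function by Definition 41.2.2, and the partial derivative matrix $w \in M_{n+n,n}(\mathbb{R})$ for $\sigma_n$ satisfies $w_{i,j} = w_{i+n,j} = \delta_{i,j}$ for all $i,j \in \mathbb{N}_n$ at each point $p \in \mathbb{R}^{2n}$ by Definition 41.2.12. But these matrix elements are all continuous functions on $\mathbb{R}^{2n}$ because they are constant. Therefore $\sigma_n$ is continuously partially differentiable by Definition 41.2.18.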
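
The matrix in Theorem 41.3.7 (iii) can be compared with a finite-difference approximation. The sketch below is illustrative only. The helper partial_derivative_matrix, the step size h and the sample point are assumptions for this illustration, not part of the theorem. Since scalar multiplication is bilinear, the symmetric difference quotients agree with the exact partial derivatives up to rounding error.

```python
def mu(p):
    # Scalar multiplication mu(lambda, x) = lambda * x on R x R^n = R^(1+n).
    return [p[0] * xj for xj in p[1:]]

def partial_derivative_matrix(f, p, h=1e-6):
    # Row i approximates the partial derivative with respect to component i.
    rows = []
    for i in range(len(p)):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        rows.append([(a - b) / (2 * h) for a, b in zip(f(plus), f(minus))])
    return rows

lam, x = 3.0, [2.0, -1.0, 0.5]
n = len(x)
w = partial_derivative_matrix(mu, [lam] + x)

# Expected matrix: first row is x, the remaining n rows are lambda times
# the n x n identity matrix.
expected = [list(x)] + [[lam if i == j else 0.0 for j in range(n)]
                        for i in range(n)]
max_error = max(abs(a - b)
                for w_row, e_row in zip(w, expected)
                for a, b in zip(w_row, e_row))
print(max_error)
```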


41.4. Directional derivatives 


41.4.1 REMARK: Logical implications between differentiability classes for several variables. 
Figure 41.4.1 shows some relations between differentiability properties for real-valued functions of several 
real variables. 


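[Diagram of implications between the following properties: f continuous in $\Omega$; f partially differentiable in $\Omega$; f directionally differentiable in $\Omega$; f totally differentiable in $\Omega$; f continuously partially differentiable in $\Omega$; f continuously directionally differentiable in $\Omega$; f continuously totally differentiable in $\Omega$.]

Figure 41.4.1 Relations between differentiability properties for $f : \Omega \to \mathbb{R}$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $n \in \mathbb{Z}^+$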


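41.4.2 DEFINITION: A directionally differentiable $\mathbb{R}^m$-valued function at a point $p$ in a set $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n,m \in \mathbb{Z}_0^+$ is a function $f : U \to \mathbb{R}^m$ such that

\[ \forall v \in \mathbb{R}^n,\ \exists w \in \mathbb{R}^m,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + tv \in U \ \Rightarrow\ |f(p + tv) - f(p) - tw| \le \varepsilon|t|. \tag{41.4.1} \]

A directionally differentiable $\mathbb{R}^m$-valued function on a set $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n,m \in \mathbb{Z}_0^+$ is a function $f : U \to \mathbb{R}^m$ which is directionally differentiable at all points $p$ in $U$.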


41.4.3 REMARK: Basic properties of directional derivatives. 

It is easy to show that the m-tuple $w$ in Definition 41.4.2 is uniquely determined by the n-tuple $v$ for each point $p \in U$. Therefore $w \in \mathbb{R}^m$ is a well-defined function of $v \in \mathbb{R}^n$ for each fixed $p \in U$. Denote this well-defined function by $\phi_p : \mathbb{R}^n \to \mathbb{R}^m$. It is immediately clear from Definition 41.4.2 that $\phi_p(\lambda v) = \lambda\phi_p(v)$ for any $v \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. Thus each function $\phi_p$ is linear on one-dimensional linear subspaces of the linear space $\mathbb{R}^n$. However, it is certainly not true in general that $\phi_p$ is linear on the whole of $\mathbb{R}^n$. (Remark 41.5.3 is the unidirectional analogue of this remark.)


Directional derivatives may be expressed as $\partial_v f(x) = \lim_{a \to 0}(f(x + av) - f(x))/a$ for $x \in U$ and $v \in \mathbb{R}^n$. Consequently $\partial_{e_i} f(x) = \partial_i f(x)$ for $x \in U$ and $i \in \mathbb{N}_n$ by Definition 41.2.7, for unit vectors $(e_i)_{i=1}^n$. Thus partial derivatives are special cases of directional derivatives.


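41.4.4 DEFINITION: The directional derivative of a directionally differentiable function $f : U \to \mathbb{R}^m$ at $p \in U$ in the direction $v \in \mathbb{R}^n$, where $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n,m \in \mathbb{Z}_0^+$, is the m-tuple $w \in \mathbb{R}^m$ which satisfies

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + tv \in U \ \Rightarrow\ |f(p + tv) - f(p) - tw| \le \varepsilon|t|. \tag{41.4.2} \]


41.4.5 NOTATION: $\partial_v f(p)$, for a directionally differentiable function $f : U \to \mathbb{R}^m$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, for $v \in \mathbb{R}^n$ and $p \in U$, for $n,m \in \mathbb{Z}_0^+$, denotes the directional derivative of $f$ in the direction $v$ at $p$. Thus

\[ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta,\delta),\quad p + tv \in U \ \Rightarrow\ |f(p + tv) - f(p) - t\,\partial_v f(p)| \le \varepsilon|t|. \]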


41.4.6 EXAMPLE: Function with directional derivatives not linear with respect to direction. 
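Let $g : \mathbb{R}^n \to \mathbb{R}$ be a $C^1$ function which satisfies $g(-x) = -g(x)$ for all $x \in \mathbb{R}^n$, for some $n \in \mathbb{Z}^+$. Define $f : \mathbb{R}^n \to \mathbb{R}$ by

\[ \forall x \in \mathbb{R}^n,\quad f(x) = \begin{cases} g(x/|x|)\,|x| & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases} \]

Let $v \in \mathbb{R}^n \setminus \{0\}$. Then $f(tv) = g(tv/|tv|)\,|tv| = t\,g(v/|v|)\,|v|$ for all $t \in \mathbb{R} \setminus \{0\}$, and $f(0) = 0$. So $\partial_v f(0) = g(v/|v|)\,|v|$. Therefore $\partial_{kv} f(0) = g(kv/|kv|)\,|kv| = k\,g(v/|v|)\,|v|$ for all $k \in \mathbb{R} \setminus \{0\}$. In other words, $\partial_v f(0)$ is linear with respect to scaling of the vector $v$. (This holds even for $v = 0$ because $\partial_v f(0) = 0$ when $v = 0$.)

Now consider the function $g : x \mapsto x_1^3$. This gives $\partial_v f(0) = (v_1/|v|)^3|v| = v_1^3|v|^{-2}$ for all $v \in \mathbb{R}^n \setminus \{0\}$. For $n = 2$, let $u = (1,0)$, $v = (0,1)$ and $w = u + v = (1,1)$. Then $\partial_u f(0) = 1$ and $\partial_v f(0) = 0$, but $\partial_w f(0) = 1/2 \ne \partial_u f(0) + \partial_v f(0)$. Therefore clearly $\partial_v f(0)$ is not a linear function of $v$. However, $f$ is a $C^1$ function on $\mathbb{R}^n \setminus \{0\}$ for any $C^1$ function $g$.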
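
The failure of additivity in Example 41.4.6 can be observed numerically. The following sketch takes n = 2 and g(x) = x_1^3, so that f(x) = x_1^3/|x|^2 for nonzero x. The function names and the step t are assumptions made for this illustration. Because f(tv) = t f(v) for this particular f, the difference quotient at the origin equals the directional derivative up to rounding error for any nonzero t.

```python
def f(x):
    # f(x) = g(x/|x|)|x| with g(x) = x_1^3, that is x_1^3/|x|^2, and f(0) = 0.
    r2 = sum(c * c for c in x)
    return 0.0 if r2 == 0.0 else x[0] ** 3 / r2

def quotient_at_origin(v, t):
    # Difference quotient (f(0 + t*v) - f(0)) / t.
    return (f([t * c for c in v]) - f([0.0] * len(v))) / t

u, v = [1.0, 0.0], [0.0, 1.0]
w = [a + b for a, b in zip(u, v)]          # w = u + v = (1, 1)
d_u, d_v, d_w = (quotient_at_origin(d, 1e-3) for d in (u, v, w))
print(d_u, d_v, d_w)
```

The quotient in the direction u + v is close to 1/2, whereas the quotients in the directions u and v sum to approximately 1.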


41.4.7 REMARK:  Directionally differentiable functions may be discontinuous. 

Examples 41.4.8 and 41.4.9 describe discontinuous functions which are everywhere directionally differentiable. 
These are analogous to Examples 41.1.21 and 41.1.22 respectively, which describe everywhere partially 
differentiable functions. 


Since all totally differentiable functions are continuous, it follows that Examples 41.4.8 and 41.4.9 are not 
totally differentiable. Hence directional differentiability everywhere does not guarantee total differentiability. 


41.4.8 EXAMPLE: An everywhere directionally differentiable function which is discontinuous. 

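Define $g : \mathbb{R} \to \mathbb{R}$ by $g(t) = t\exp((1 - t^2)/2)$ for all $t \in \mathbb{R}$. (See Figure 41.4.2.) Then $g$ is a $C^1$ function on $\mathbb{R}$, $g(1) = 1$ and $|g(t)| \le 1$ for all $t \in \mathbb{R}$. (In fact, $g$ is $C^\infty$. See Notation 42.1.10 for $C^\infty$ functions. See Definition 44.1.5 for exp.)

Figure 41.4.2 The function $g : \mathbb{R} \to \mathbb{R}$ with $g : t \mapsto t\exp((1 - t^2)/2)$

Define $f : \mathbb{R}^2 \to \mathbb{R}$ by

\[ \forall x \in \mathbb{R}^2,\quad f(x) = \begin{cases} g(x_1^2/x_2) & x_2 \ne 0 \\ 0 & x_2 = 0 \end{cases} = \begin{cases} x_1^2 x_2^{-1}\exp((1 - x_1^4 x_2^{-2})/2) & x_2 \ne 0 \\ 0 & x_2 = 0. \end{cases} \]

The level curves of this function are illustrated in Figure 41.4.3.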
Let $v \in \mathbb{R}^2$ with $v_2 \ne 0$. Then $\partial_v f(0) = \lim_{t \to 0} f(tv)/t = \lim_{t \to 0} v_1^2 v_2^{-1}\exp((1 - t^2 v_1^4 v_2^{-2})/2) = v_1^2 v_2^{-1} e^{1/2}$. If $v_2 = 0$, then $\partial_v f(0) = 0$. Thus $f$ is directionally differentiable at 0, and also on $\mathbb{R}^2 \setminus \{0\}$. However, $\partial_v f(0)$ is unbounded, even for bounded $v$. In fact, $\partial_v f(0) = e^{1/2} v_1^{-1}$ for $v$ satisfying $v_2 = v_1^3$ and $v_1 \ne 0$, which implies that $\partial_v f(0)$ is unbounded for $v$ in any neighbourhood of 0, although $|f(x)| \le 1$ for all $x \in \mathbb{R}^2$.

Figure 41.4.3 Level curves of $f(x) = x_1^2 x_2^{-1}\exp((1 - x_1^4 x_2^{-2})/2)$, $f : \mathbb{R}^2 \to \mathbb{R}$

Unsurprisingly, since $\partial_v f(0)$ is unbounded for bounded $v$, $f$ is not continuous at $(0,0) \in \mathbb{R}^2$. (For example, $f(s,s^2) = g(1) = 1$ for all $s \in \mathbb{R} \setminus \{0\}$, whereas $f(0,0) = 0$.) Consequently $f$ is not totally differentiable at $(0,0)$, although this follows more immediately from the observation that $\partial_v f(0)$ is not linear with respect to $v$.
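
The behaviour described in Example 41.4.8 can be observed numerically. This sketch assumes the reading f(x) = g(x_1^2/x_2) for x_2 ≠ 0 and f(x) = 0 for x_2 = 0, with g(t) = t exp((1 - t^2)/2). The sample points, the direction v and the step t are arbitrary choices for this illustration. The function takes the value g(1) = 1 along the parabola x_2 = x_1^2 arbitrarily close to the origin, although f(0) = 0 and every directional derivative at the origin exists.

```python
import math

def g(t):
    return t * math.exp((1.0 - t * t) / 2.0)

def f(x):
    # f(x) = g(x_1^2 / x_2) if x_2 != 0, and f(x) = 0 if x_2 == 0.
    if x[1] == 0.0:
        return 0.0
    return g(x[0] * x[0] / x[1])

# Values along the parabola x_2 = x_1^2, approaching the origin.
parabola_values = [f([s, s * s]) for s in (1e-1, 1e-3, 1e-6)]

def quotient_at_origin(v, t):
    # Difference quotient (f(t*v) - f(0)) / t, where f(0) = 0.
    return f([t * v[0], t * v[1]]) / t

v = [1.0, 2.0]
approx = quotient_at_origin(v, 1e-6)
exact = v[0] ** 2 / v[1] * math.exp(0.5)   # v_1^2 v_2^{-1} e^{1/2}
print(parabola_values, approx, exact)
```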


41.4.9 EXAMPLE: Everywhere directionally differentiable function, discontinuous on a dense set. 
As in Example 41.1.22, let $h : \mathbb{Z}^+ \to \mathbb{Q}^2$ be an enumeration of the set $\mathbb{Q}^2$ of elements of $\mathbb{R}^2$ which have rational components. Define $\phi : \mathbb{R}^2 \to \mathbb{R}$ by

\[ \phi(x) = \sum_{k \in \mathbb{Z}^+} 2^{-k} f(x - h(k)) \]

for all $x \in \mathbb{R}^2$, where $f$ is as defined in Example 41.4.8 and $x - a$ denotes the element $(x_1 - a_1, x_2 - a_2)$ of $\mathbb{R}^2$ for all $a \in \mathbb{R}^2$. Then $\phi$ is well defined and directionally differentiable on $\mathbb{R}^2$, but $\phi$ is discontinuous at all points $a \in \mathbb{Q}^2$. Thus $\phi$ is discontinuous on a dense subset of its domain $\mathbb{R}^2$. Note that $|\phi(x)| \le 1$ for all $x \in \mathbb{R}^2$. (For all points $a \in \mathbb{Q}^2$, the function $\phi$ is also not totally differentiable at $a$.)


41.4.10 REMARK: Continuously directionally differentiable $\mathbb{R}^m$-valued functions.

When an $\mathbb{R}^m$-valued function is said to be continuously directionally differentiable, this means that for each fixed direction, the directional derivative is continuous with respect to points in the domain of the function. (See Theorem 41.6.11 (iii) for an application.)


41.4.11 DEFINITION:

A continuously directionally differentiable $\mathbb{R}^m$-valued function on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, where $n,m \in \mathbb{Z}_0^+$, is a directionally differentiable $\mathbb{R}^m$-valued function on $U$ such that the directional derivative $\partial_v f(x)$ is a continuous function of $x$ for all $v \in \mathbb{R}^n$. In other words, for all $v \in \mathbb{R}^n$, the map $x \mapsto \partial_v f(x)$ is a continuous map from $U$ to $\mathbb{R}^m$.


41.5. Unidirectional derivatives for several variables 


41.5.1 REMARK: Unidirectionally differentiable functions of several variables. 
Definition 41.5.2 is a one-sided derivative version of Definition 41.4.2. 


41.5.2 DEFINITION: A function $f : U \to \mathbb{R}^m$ for $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n,m \in \mathbb{Z}_0^+$ is said to be unidirectionally differentiable at $p \in U$ if

\[ \forall v \in \mathbb{R}^n,\ \exists w \in \mathbb{R}^m,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (0,\delta),\quad p + tv \in U \ \Rightarrow\ |f(p + tv) - f(p) - tw| \le \varepsilon|t|. \tag{41.5.1} \]

A function $f : U \to \mathbb{R}^m$ for $U \in \mathrm{Top}(\mathbb{R}^n)$ is said to be unidirectionally differentiable on $U$ if it is unidirectionally differentiable at all points $p \in U$.


41.5.3 REMARK: Basic properties of unidirectional derivatives. 

In the same way as mentioned in Remark 41.4.3, it is easy to show that the m-tuple $w$ in Definition 41.5.2 is uniquely determined by the n-tuple $v$ for each point $p \in U$. Therefore $w \in \mathbb{R}^m$ is a well-defined function of $v \in \mathbb{R}^n$ for each fixed $p \in U$. Denote this well-defined function by $\phi_p : \mathbb{R}^n \to \mathbb{R}^m$. It is immediately clear from Definition 41.5.2 that $\phi_p(u) = \lambda\phi_p(v)$ for any $u,v \in \mathbb{R}^n$ which satisfy $u = \lambda v$ for some $\lambda \in (0,\infty)$. Thus each function $\phi_p$ is linear on one-dimensional directed linear subspaces $S_v^+ = \{\lambda v;\ \lambda \in \mathbb{R}_0^+\}$ of the linear space $\mathbb{R}^n$ for $v \in \mathbb{R}^n$. However, it is not true in general that $\phi_p$ is linear on linear subspaces $S_v = \{\lambda v;\ \lambda \in \mathbb{R}\}$ of $\mathbb{R}^n$. It is a fortiori not true in general that $\phi_p$ is linear on the whole of $\mathbb{R}^n$.
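
A standard illustration of the distinction in Remark 41.5.3 is the Euclidean norm at the origin. It is unidirectionally differentiable there with one-sided derivative |v| in each direction v, which scales correctly under positive factors but does not change sign when v is replaced by -v. The sketch below is an assumption-laden illustration. The function names and the step t are hypothetical choices.

```python
import math

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def one_sided_quotient(f, p, v, t):
    # Difference quotient (f(p + t*v) - f(p)) / t, restricted to t > 0.
    if t <= 0:
        raise ValueError("t must be positive for a unidirectional quotient")
    return (f([a + t * b for a, b in zip(p, v)]) - f(p)) / t

origin = [0.0, 0.0]
v = [3.0, 4.0]
phi_v = one_sided_quotient(norm, origin, v, 1e-3)
phi_two_v = one_sided_quotient(norm, origin, [2 * c for c in v], 1e-3)
phi_minus_v = one_sided_quotient(norm, origin, [-c for c in v], 1e-3)
print(phi_v, phi_two_v, phi_minus_v)
```

Here the quotients for v and 2v are approximately 5 and 10, in agreement with positive homogeneity, whereas the quotient for -v is approximately 5 rather than -5.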


41.5.4 REMARK: Functions which are everywhere unidirectionally differentiable may be discontinuous. 
Since every directionally differentiable function is necessarily unidirectionally differentiable, it follows that 
Examples 41.4.8 and 41.4.9 demonstrate the existence of discontinuous functions which are everywhere 
unidirectionally differentiable. 


41.6. Total differentiability 


41.6.1 REMARK: Terminology for the total differential. 

Some authors refer to the total differential in Definition 41.6.7 by the name “total differential” or “total derivative”. (See for example Friedman [74], page 221; Rudin [129], page 189.) Other authors refer to the total differential as simply “the differential”. (See for example Thomson/Bruckner/Bruckner [149], page 526; Edwards [67], page 414; Graves [85], page 76.)


41.6.2 REMARK: The importance of the total differential for differential geometry. 

The total differential is an important brick in the edifice of differential geometry. The meaningfulness of 
tangent vectors is derived from the total differential. The main motivation for requiring the derivatives of a 
function to be continuous, in the differential geometry context, is to ensure that the function has a well-defined 
total derivative (by Theorem 41.6.15), which means that the tuple of partial derivatives for a single basis 
determines the directional derivative in every direction, and these directional derivatives are linearly related 
to those partial derivatives. This reduces the computations for coordinate transformations for derivatives 
to simple linear algebra. Likewise, the derivatives of compositions of totally differentiable functions may be 
computed by matrix multiplication. Without total differentiability of functions, differential geometry would 
be a very different subject. 


41.6.3 REMARK: Coordinate-free vectorial geometry requires the total differential theorem. 
A constant refrain in differential geometry books is the need to be “coordinate-free”. However, something
which is not often mentioned is that the vectorial approach depends heavily on the total differential theorem.


Theorem 41.6.21 (ii) is the crucial link between coordinate concepts and “coordinate-free” vector concepts.
If f ∈ C¹(U, ℝ^m) for some U ∈ Top(ℝ^n) with n, m ∈ ℤ₀⁺, then all directional derivatives of f can be
expressed in terms of the partial derivatives of f. Without this property, it would be necessary to write
all formulas in terms of partial derivatives with respect to coordinates. The concept of a vector is really
only useful when functions are totally differentiable, so that partial derivative formulas can be replaced with
vector expressions. For example, in potential theory one may write F_i = −∂V/∂x_i for the components of
the force for a given potential function V. This can be written vectorially as F = −∇V because the relation
is not dependent on the coordinate system. Computation in one system yields values for all systems.


Given the partial derivatives for one set of coordinates, the partial derivatives for all other coordinate systems 
can be computed by a simple linear transformation. This is the big advantage of total differentiability, which 
follows from C¹ differentiability by Theorem 41.6.15, the total differential theorem. In all areas of science
and engineering, the use of vectors assumes sufficient continuous differentiability of functions to ensure
that partial derivatives and total derivatives are freely interchangeable. In rare circumstances, such as at
singularities of functions or boundaries of regions, the vectorial approach may in fact fail because of the
absence of total differentiability. Then one must return to first principles.


41.6.4 DEFINITION: A totally differentiable function f : U → ℝ^m at a point p ∈ U, for U ∈ Top(ℝ^n) with
n, m ∈ ℤ₀⁺, is a function f : U → ℝ^m which satisfies


    \exists L \in \mathrm{Lin}(\mathbb{R}^n, \mathbb{R}^m),\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall q \in B_{p,\delta} \cap U,
        |f(q) - f(p) - L(q - p)| \le \varepsilon |q - p|.        (41.6.1)


A totally differentiable function f : U → ℝ^m, for U ∈ Top(ℝ^n), is a function f : U → ℝ^m which is a totally
differentiable function at every p ∈ U.


41.6.5 REMARK: Uniform convergence and linearity of the total differential. 

In Definition 41.6.4, the “error term” ε|q − p| is uniformly bounded with respect to the unit direction vector
(q − p)/|q − p| for q ∈ B_{p,δ} ∩ (U \ {p}). In other words, the “speed of convergence” of the difference f(q) − f(p)
to the directional derivative term L(q − p) is uniform with respect to the direction. However, this uniform
convergence alone is not sufficient to make a function totally differentiable.


The significance of the linearity requirement is highlighted by Example 41.6.13, where directional derivatives
are defined in every direction at the origin, and the convergence is uniform with respect to direction. (In fact,
the “error term” is zero because the function is linear along each ray from the origin.) But the directional
derivative is not linearly related to the direction vector. So the function is not totally differentiable.


41.6.6 REMARK: Uniqueness of the total differential.
It is a trivial (and tedious) exercise to show that the linear map L in condition (41.6.1) is unique if it exists.
Therefore it can be given a name in Definition 41.6.7.


41.6.7 DEFINITION: The total differential of f at p, for a totally differentiable function f : U → ℝ^m, where
U ∈ Top(ℝ^n) and p ∈ U, is the map L ∈ Lin(ℝ^n, ℝ^m) which satisfies condition (41.6.1) for f at p.


The total differential of f, for a totally differentiable function f : U → ℝ^m with U ∈ Top(ℝ^n), is the
function from U to Lin(ℝ^n, ℝ^m) which maps each p ∈ U to the total differential of f at p.


41.6.8 NOTATION: df, for a totally differentiable function f : U → ℝ^m with U ∈ Top(ℝ^n), denotes the
total differential of f. In other words, df : U → Lin(ℝ^n, ℝ^m) and for all p ∈ U, df(p) is the total differential
of f at p. Thus


    \forall p \in U,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall q \in B_{p,\delta} \cap U,
        |f(q) - f(p) - df(p)(q - p)| \le \varepsilon |q - p|.        (41.6.2)


(df)_p is an alternative notation for the value of the total differential of f at p.


41.6.9 REMARK: Visualisation of total differentiation. 

In the special case n = 1 in Definition 41.6.7, the total differential of a function γ : ℝ → ℝ^m may be
visualised as a tangent vector to a curve. In the special case m = 1, the total differential of a function
f : ℝ^n → ℝ may be visualised as a tangent plane to the constant-value contours of f. These two special
cases are illustrated in Figure 41.6.1.


Figure 41.6.1 Visualisation of tangent vector v = γ′(0) and tangent covector w = df(p)

The arrow in Figure 41.6.1 is misleading, even though this style of visualisation is often employed for
gradients of functions in mathematics and physics books. A true vector scales proportionally to the scale
of the diagram, whereas a gradient scales inversely. Even worse is the fact that the direction of a gradient
changes when a diagram is scaled differently in the X and Y directions for example. The correct aspect of
this kind of diagram is the spacing between the tangent planes of the level curves and their orientation. The
normal direction does not transform like a true vector under general affine transformations.


More generally, a fairly accurate picture of the total differential of a map f : U → ℝ^m, for U ∈ Top(ℝ^n)
and n, m ∈ ℤ₀⁺, may be given by sketching a sequence (e_i)_{i=1}^n of basis vectors at a point p in the source
space and the corresponding vector-sequence ((df)_p(e_i))_{i=1}^n at the point f(p) in the target space.


41.6.10 DEFINITION: A continuously totally differentiable function f : U → ℝ^m, for U ∈ Top(ℝ^n) with
n, m ∈ ℤ₀⁺, is a totally differentiable function f : U → ℝ^m for which the total differential df is a continuous
map from U to Lin(ℝ^n, ℝ^m).


41.6.11 THEOREM: Total differentiability implies directional differentiability. 
Let f : U → ℝ^m, where U ∈ Top(ℝ^n) with n, m ∈ ℤ₀⁺.
(i) If f is totally differentiable at p ∈ U, then f is directionally differentiable at p, and ∂_v f(p) = L(v) for
all v ∈ ℝ^n, where L ∈ Lin(ℝ^n, ℝ^m) is the total differential of f at p. (In other words, ∂_v f(p) = (df)_p(v)
for all v ∈ ℝ^n.)


(ii) If f is totally differentiable on U, then f is directionally differentiable on U. 
(iii) If f is continuously totally differentiable on U, then f is continuously directionally differentiable on U. 


PROOF: For part (i), let n, m ∈ ℤ₀⁺ and U ∈ Top(ℝ^n). Let f : U → ℝ^m be totally differentiable at p ∈ U.
Then by Definition 41.6.4, there is a linear map L ∈ Lin(ℝ^n, ℝ^m) such that


    \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall q \in B_{p,\delta} \cap U,\ |f(q) - f(p) - L(q - p)| \le \varepsilon |q - p|.        (41.6.3)


Let v ∈ ℝ^n \ {0}. Let w = L(v) ∈ ℝ^m. Let ε₀ ∈ ℝ⁺. Let δ ∈ ℝ⁺ satisfy line (41.6.3) for ε = ε₀/|v|.
Let δ₀ = δ/|v|. For t ∈ (−δ₀, δ₀), let q = p + tv. If q ∈ U, then |f(q) − f(p) − L(q − p)| ≤ ε|q − p| by
line (41.6.3). Thus |f(p + tv) − f(p) − tw| ≤ ε₀|t|. So ∀ε₀ > 0, ∃δ₀ > 0, ∀t ∈ (−δ₀, δ₀), (p + tv ∈ U ⇒
|f(p + tv) − f(p) − tw| ≤ ε₀|t|). Hence f is directionally differentiable at p by Definition 41.4.2 and the
directional derivative of f at p in the direction v equals L(v) by Definition 41.4.4. In the case v = 0, clearly
f has directional derivative 0 ∈ ℝ^m since f(p + tv) − f(p) = 0 for all t ∈ ℝ.


Part (ii) follows from part (i) and Definition 41.4.2. 


For part (iii), if f is continuously totally differentiable on U, then the map p ↦ ∂_v f(p) = (df)_p(v) is
continuous for all v ∈ ℝ^n because the map p ↦ (df)_p is continuous. Hence f is continuously directionally
differentiable on U by Definition 41.4.11.


41.6.12 REMARK: Insufficient conditions to guarantee total differentiability. 

The partial differentiability of a function is insufficient to guarantee total differentiability. Moreover,
Examples 41.4.8 and 41.4.9 show that even the existence everywhere of all directional derivatives is
insufficient to imply even continuity. Example 41.6.13 then shows that even if a function is continuous and
has bounded directional derivatives everywhere, this does not imply total differentiability.


This is important for differential geometry because it shows that even if the simple n-tuple concept of a 
tangent vector on an n-dimensional manifold is replaced by the more sophisticated concept of a directional 
derivative, which is known for the infinite set of possible directions, this still does not sufficiently describe 
the first-order behaviour of a function near a point. 


41.6.13 EXAMPLE: Bounded directional derivatives everywhere, but not totally differentiable. 

Define f : ℝ² → ℝ by f(0,0) = 0 and f(x) = x₁³(x₁² + x₂²)⁻¹ for x ∈ ℝ² \ {0}. (See Figure 41.6.2.)

Then |f(x)| ≤ |x| for all x ∈ ℝ². So f is continuous at 0. Clearly f also has continuous derivatives of all
orders on ℝ² \ {0}. Define g_v : ℝ → ℝ by g_v : t ↦ f(tv) for v ∈ ℝ² \ {0}. Then g_v(t) = k_v t for all t ∈ ℝ,
where k_v = v₁³/(v₁² + v₂²). So ∂_t g_v(0) = k_v for all v ∈ ℝ² \ {0}. Therefore f has directional derivatives in all
directions at x = 0. However, f is not totally differentiable at x = 0. Clearly ∂₁f(0) = 1 and ∂₂f(0) = 0, but
k_v = ½ for v = (1,1), whereas for a totally differentiable function, ∂_t g_v(0) should equal v₁∂₁f(0) + v₂∂₂f(0),
which equals 1.


Figure 41.6.2 Function which is not totally differentiable at the origin


The fact that f does not have continuous partial derivatives at the origin follows from the observation that
∂₁f(x₁,0) = 1 for x₁ ∈ ℝ \ {0}, whereas ∂₁f(0,x₂) = 0 for x₂ ∈ ℝ \ {0}.
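
The failure of linearity in Example 41.6.13 can be observed numerically. The following Python sketch
approximates the directional derivatives at the origin by symmetric difference quotients. The helper name
directional_derivative_at_origin and the step size t are assumptions made for this illustration only.

```python
def f(x1, x2):
    # Example 41.6.13: f(0,0) = 0 and f(x) = x1^3 / (x1^2 + x2^2) otherwise.
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 ** 3 / (x1 ** 2 + x2 ** 2)

def directional_derivative_at_origin(v, t=1e-3):
    # Symmetric difference quotient of the map t -> f(t v) at t = 0.
    return (f(t * v[0], t * v[1]) - f(-t * v[0], -t * v[1])) / (2.0 * t)

d1 = directional_derivative_at_origin((1.0, 0.0))   # partial derivative in x1 at 0
d2 = directional_derivative_at_origin((0.0, 1.0))   # partial derivative in x2 at 0
k = directional_derivative_at_origin((1.0, 1.0))    # directional derivative for v = (1, 1)

# Value which a totally differentiable function would have to give for v = (1, 1).
linear_prediction = 1.0 * d1 + 1.0 * d2

print(d1, d2, k, linear_prediction)
```

Since f is linear along each ray from the origin, the difference quotients do not depend on the step
size apart from rounding error. The computed directional derivative for v = (1,1) differs from the linear
combination of the two partial derivatives, which is the discrepancy described in the example.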


41.6.14 REMARK: Continuous partial differentiability implies continuous total differentiability. 

Theorem 41.6.15 is of fundamental importance to differential geometry. This is the theorem which implies 
that the transition maps for tangent differential operators are linear if the partial derivatives exist and are
continuous. (See Definition 41.2.18.) Hence the structure group of the tangent bundle of an n-dimensional 
differentiable manifold may be chosen as a subgroup of the general linear group GL(n). The importance of 
this theorem is the fact that only n partial derivatives need to be shown to exist and be continuous, which 
is much easier than showing total differentiability directly. 

The importance of Theorem 41.6.15 suggests that it should have its own nickname by analogy with such core 
analysis theorems as the “mean value theorem”, “inverse function theorem” and “implicit function theorem”. 
The name “total differential theorem” seems like a sensible choice. Theorem 41.6.15 is a straightforward 
consequence of Theorem 40.6.6, the mean value theorem for a single variable. 


41.6.15 THEOREM: The total differential theorem. 
If a function f : U → ℝ^m for U ∈ Top(ℝ^n), with n, m ∈ ℤ₀⁺, has continuous partial derivatives on U, then
f is continuously totally differentiable on U.


PROOF: Let n, m ∈ ℤ₀⁺ and U ∈ Top(ℝ^n). Let f : U → ℝ^m have continuous partial derivatives on U. (See
Definitions 41.2.2, 41.2.7, 41.2.12 and 41.2.18.) If n = 0, then U = ∅ and so f is the empty function, which
is totally differentiable, or else U = {0} and so f(0) ∈ ℝ^m is constant, which is also totally differentiable. If
m = 0, then f is a constant function, which is totally differentiable. So assume that n ≥ 1 and m ≥ 1.


Let x ∈ U. By Theorem 37.5.6 (ii), B_{x,r} ⊆ U for some r ∈ ℝ⁺. Choose r₀ ∈ ℝ⁺ so that B_{x,r₀} ⊆ U. (For
example, choose r₀ = d(x, ℝ^n \ U).) The continuity of the partial derivatives ∂_i f implies that


    \forall i \in \mathbb{N}_n,\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall y \in B_{x,\delta},
        |(\partial_i f)(y) - (\partial_i f)(x)|_m < \varepsilon/n,        (41.6.4)


where |·|_m is the Euclidean norm on ℝ^m. Let ε₀ ∈ ℝ⁺. Let δ₀ ∈ (0, r₀] satisfy |(∂_i f)(y) − (∂_i f)(x)|_m < ε₀/n
for all i ∈ ℕ_n and y ∈ B_{x,δ₀}. For z ∈ B_{x,δ₀} and i ∈ {0} ∪ ℕ_n, let p_i = x + Σ_{k=1}^i (z_k − x_k)e_k, where (e_k)_{k=1}^n
is the coordinate basis for ℝ^n. (In other words, the kth component of p_i equals z_k if k ≤ i, and equals x_k
if k > i. That is, p_i = (z_1, ..., z_i, x_{i+1}, ..., x_n).) Then f(z) − f(x) = Σ_{i=1}^n (f(p_i) − f(p_{i−1})).

For i ∈ ℕ_n, define h_i : [0,1] → ℝ^n by h_i : t ↦ (1 − t)p_{i−1} + t p_i. Then h_i(t) ∈ B_{x,δ₀} for all t ∈ [0,1].
For i ∈ ℕ_n and j ∈ ℕ_m, define g_{i,j} : [0,1] → ℝ by g_{i,j} : t ↦ f_j(h_i(t)), where f_j denotes the jth
component of f. Then g_{i,j} is continuous on [0,1] and continuously differentiable on (0,1). Therefore by
Theorem 40.6.6, g_{i,j}(1) − g_{i,j}(0) = g′_{i,j}(t_{i,j}) for some t_{i,j} ∈ (0,1). But g_{i,j}(1) − g_{i,j}(0) = f_j(p_i) − f_j(p_{i−1})
and g′_{i,j}(t_{i,j}) = (z_i − x_i)(∂_i f_j)(h_i(t_{i,j})). So f_j(z) − f_j(x) = Σ_{i=1}^n (z_i − x_i)(∂_i f_j)(h_i(t_{i,j})).

It follows from line (41.6.4) that |(∂_i f_j)(h_i(t_{i,j})) − (∂_i f_j)(x)| < ε₀/n for all i ∈ ℕ_n and j ∈ ℕ_m. Therefore


    |f(z) - f(x) - \sum_{i=1}^n (z_i - x_i)(\partial_i f)(x)|_m
        = \Big| \Big( \sum_{i=1}^n (z_i - x_i)\big((\partial_i f_j)(h_i(t_{i,j})) - (\partial_i f_j)(x)\big) \Big)_{j=1}^m \Big|_m
        \le \Big( \sum_{i=1}^n |z_i - x_i| \Big) \max_{i \in \mathbb{N}_n,\, j \in \mathbb{N}_m} |(\partial_i f_j)(h_i(t_{i,j})) - (\partial_i f_j)(x)|
        \le n |z - x|_n \cdot \varepsilon_0/n = \varepsilon_0 |z - x|_n.


Thus ∀z ∈ B_{x,δ₀}, |f(z) − f(x) − Σ_{i=1}^n (z_i − x_i)(∂_i f)(x)|_m ≤ ε₀|z − x|_n. Hence f is a totally differentiable
function at x by Definition 41.6.4, and by Definition 41.6.7, the total differential of f at x is the linear map
from ℝ^n to ℝ^m defined by v ↦ Σ_{i=1}^n v_i(∂_i f)(x).

Since the total differential of f is a matrix all of whose components are continuous functions on U, it follows
that f is continuously totally differentiable on U.


41.6.16 REMARK: Equivalence of continuously partially, directionally and totally differentiable. 

A continuously directionally differentiable function, as in Definition 41.4.11, is clearly continuously partially 
differentiable as in Definition 41.2.18. But Theorem 41.6.15 shows that continuous partial differentiability in 
an open set implies continuous total differentiability, and Theorem 41.6.11 (iii) shows that continuous total 
differentiability in an open set implies continuous directional differentiability. Therefore these three concepts
are effectively equivalent. Even though the definitions are not literally the same, they define the same spaces 
of functions. This is stated more formally in Theorem 41.6.17. 


41.6.17 THEOREM: Equivalent conditions for continuous total differentiability. 
Let f : U → ℝ^m, where U ∈ Top(ℝ^n) with n, m ∈ ℤ₀⁺. Then the following conditions are equivalent.


(i) f is continuously partially differentiable on U. 
(ii) f is continuously directionally differentiable on U. 
(iii) f is continuously totally differentiable on U. 


PROOF: Condition (i) follows from condition (ii) by Definitions 41.2.18 and 41.4.11 because the partial
derivative ∂_i f(p) is equal to the directional derivative ∂_{e_i} f(p) for all i ∈ ℕ_n and p ∈ U, where (e_i)_{i=1}^n is
the usual coordinate basis for ℝ^n. So the continuity of the map p ↦ ∂_i f(p) for all i ∈ ℕ_n follows from the
continuity of the map p ↦ ∂_v f(p) for all v ∈ ℝ^n.


Condition (ii) follows from condition (iii) by Theorem 41.6.11 (iii). 
Condition (iii) follows from condition (i) by Theorem 41.6.15. 


41.6.18 REMARK: Totally differentiable functions are continuous. 

As mentioned in Remark 41.6.12, directional differentiability everywhere does not guarantee continuity, 
whereas in the case of functions of a single variable, Theorem 40.5.3 shows that differentiability everywhere 
does imply continuity. 


Partial and directional differentiability are significantly weaker than total differentiability. As mentioned in 
Remark 41.6.5, even uniform convergence of directional derivatives at a point with respect to direction is not 
as strong as total differentiability, which very specifically requires the directional derivative to have a linear 
structure. Thus the natural generalisation of differentiability from a single variable to several variables is 
apparently total differentiability, which does guarantee pointwise continuity. 


41.6.19 THEOREM: Total differentiability implies continuity.
Let n, m ∈ ℤ₀⁺. Let U ∈ Top(ℝ^n). Let f : U → ℝ^m.
(i) If f is totally differentiable at p ∈ U, then f is continuous at p.
(ii) If f is totally differentiable on U, then f is continuous on U.
(iii) If f ∈ C¹(U, ℝ^m), then f ∈ C(U, ℝ^m).

PROOF: For part (i), let f : U → ℝ^m be totally differentiable at p ∈ U. Let ε > 0. Let ε₁ = 1. Then by
Definition 41.6.4,

    \exists L \in \mathrm{Lin}(\mathbb{R}^n, \mathbb{R}^m),\ \exists \delta_1 > 0,\ \forall q \in B_{p,\delta_1} \cap U,
        |f(q) - f(p) - L(q - p)| \le \varepsilon_1 |q - p|.


For some such L and δ₁, let δ = min(δ₁, ½ε(1 + ρ(L))⁻¹), where ρ(L) = Σ_{i=1}^n Σ_{j=1}^m |L(e_i)_j| is the sum of absolute
values of components of the action of L on standard unit vectors e_i. Then


    \forall q \in B_{p,\delta} \cap U,\quad |f(q) - f(p)| \le |f(q) - f(p) - L(q - p)| + |L(q - p)|
        \le |q - p| + \rho(L)\,|q - p|
        < \tfrac{1}{2}\varepsilon < \varepsilon


because |Lv| ≤ ρ(L)|v| for all v ∈ ℝ^n. Hence f is continuous at p by Theorem 38.1.10.
Part (ii) follows from part (i) and Theorem 35.3.3.

Part (iii) follows from Notation 41.2.19, Theorem 41.6.17 (i, iii) and part (ii).


41.6.20 REMARK: Rewriting directional derivatives in terms of partial derivatives. 

The great value of total differentiability is that it enables directional derivatives to be expressed as simple 
linear combinations of the partial derivatives. Thus something which in general contains a large amount 
of information is reduced to a single n-tuple. That is, the expression ∂_v f for v ∈ ℝ^n may be replaced by
the expression Σ_{i=1}^n v_i ∂_i f, which requires computation of only the n-tuple (∂_i f(p))_{i=1}^n at each p ∈ U. This
observation is stated more formally in Theorem 41.6.21. 


41.6.21 THEOREM: Rewriting directional derivatives in terms of partial derivatives. 
Let f : U → ℝ^m, where U ∈ Top(ℝ^n) with n, m ∈ ℤ₀⁺.
(i) If f is totally differentiable at p ∈ U, then ∀v ∈ ℝ^n, ∂_v f(p) = Σ_{i=1}^n v_i ∂_i f(p).
(ii) If f is continuously partially differentiable on U, then ∀p ∈ U, ∀v ∈ ℝ^n, ∂_v f(p) = Σ_{i=1}^n v_i ∂_i f(p).


PROOF: For part (i), let f be totally differentiable at p ∈ U. Then f is directionally differentiable at p,
and ∂_v f(p) = (df)_p(v) for all v ∈ ℝ^n by Theorem 41.6.11 (i). But ∂_i f(p) = ∂_{e_i} f(p) for all i ∈ ℕ_n by
Definitions 41.2.7 and 41.4.4 and Notations 41.2.9 and 41.4.5. But (df)_p : ℝ^n → ℝ^m is a linear map by
Definition 41.6.4. So (df)_p(v) = (df)_p(Σ_{i=1}^n v_i e_i) = Σ_{i=1}^n v_i (df)_p(e_i) = Σ_{i=1}^n v_i ∂_{e_i} f(p) for all v ∈ ℝ^n. Hence
∂_v f(p) = Σ_{i=1}^n v_i ∂_i f(p) for all v ∈ ℝ^n.


Part (ii) follows from part (i) and Theorem 41.6.15. 
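
The identity in Theorem 41.6.21 can be checked numerically for a particular continuously differentiable
function. In the following Python sketch, the map f, the point p, the direction v, the step size h and the
helper names are all hypothetical choices for illustration. Derivatives are approximated by central
difference quotients, so the two sides agree only up to a small numerical error.

```python
import math

def f(x):
    # Hypothetical smooth map f : R^2 -> R^2, used only for illustration.
    return [x[0] ** 2 * x[1], math.sin(x[0]) + x[1]]

def directional_derivative(func, p, v, h=1e-6):
    # Central difference quotient (func(p + h v) - func(p - h v)) / (2 h).
    fwd = func([pi + h * vi for pi, vi in zip(p, v)])
    bwd = func([pi - h * vi for pi, vi in zip(p, v)])
    return [(a - b) / (2.0 * h) for a, b in zip(fwd, bwd)]

def partial_derivatives(func, p, h=1e-6):
    # The n-tuple of partial derivatives, computed as directional
    # derivatives along the coordinate basis vectors e_1, ..., e_n.
    n = len(p)
    basis = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
    return [directional_derivative(func, p, e, h) for e in basis]

p = [1.0, 2.0]
v = [0.5, -3.0]

lhs = directional_derivative(f, p, v)          # approximates the directional derivative of f at p
partials = partial_derivatives(f, p)           # partials[i] approximates the ith partial derivative at p
rhs = [sum(v[i] * partials[i][j] for i in range(len(p))) for j in range(len(lhs))]

print(lhs, rhs)
```

Only the n partial derivatives are computed on the right-hand side, and the directional derivative for
any other direction v is then obtained by a linear combination.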


41.6.22 REMARK: Continuously differentiable functions are Lipschitz on convex compact sets. 
Theorem 41.6.23 is useful for ODE theory in particular, where Lipschitz continuity is often required in order 
to prove existence or uniqueness of solutions. (See for example Sections 44.5, 44.6 and 44.7.) 


41.6.23 THEOREM: Continuous differentiability implies Lipschitz continuity on convex compact sets. 
Let n, m ∈ ℤ⁺, U ∈ Top(ℝ^n) and f ∈ C¹(U, ℝ^m). Then f is globally uniformly Lipschitz continuous on
every convex compact subset of U.


PROOF: Let K be a convex compact subset of U. If K = ∅, then f is Lipschitz by Definition 38.6.6. So
assume K ≠ ∅. Then K is a bounded subset of ℝ^n by Theorem 37.7.3, and f and ∂_i f are bounded continuous
functions on K for all i ∈ ℕ_n by Theorems 41.6.19 (iii) and 37.7.5. Let C = sup{|(∂_i f(x))_{i=1}^n|; x ∈ K}.
Then C ∈ ℝ₀⁺.

Let x, y ∈ K with x ≠ y. Let v = y − x. Define g : [0,1] → ℝ^m by g(t) = f(x + tv) for all t ∈ [0,1].
By Theorem 41.6.17 (i,ii), f is continuously directionally differentiable on U. Then g′(t) = ∂_v f(x + tv) for
all t ∈ (0,1). So |g(1) − g(0)| ≤ sup{|g′(t)|; t ∈ (0,1)} ≤ C|v| by Theorem 40.8.7 and Notation 41.4.5.
Thus ∀x, y ∈ K, |f(y) − f(x)| ≤ C|x − y|. Hence f is globally uniformly Lipschitz continuous on K by
Definition 38.6.6.
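
The Lipschitz bound in Theorem 41.6.23 can be illustrated on a sample grid. In the following Python sketch,
the function f, the convex compact set K = [0,1]² and the grid resolution are hypothetical choices. The
constant C is only a grid estimate of the supremum in the proof, although for this particular f the
supremum of the gradient norm over K equals 1 and is attained at grid points.

```python
import itertools
import math

def f(x):
    # Hypothetical C^1 function f : R^2 -> R.
    return math.sin(x[0]) * x[1]

def gradient_norm(x):
    # Euclidean norm of the tuple of partial derivatives (cos(x1) x2, sin(x1)).
    return math.hypot(math.cos(x[0]) * x[1], math.sin(x[0]))

# Sample grid on the convex compact set K = [0, 1]^2.
grid = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]

# Grid estimate of C = sup of the gradient norm over K.
C = max(gradient_norm(x) for x in grid)

# Largest difference quotient |f(y) - f(x)| / |x - y| over all pairs of distinct grid points.
worst_ratio = max(
    abs(f(y) - f(x)) / math.dist(x, y)
    for x, y in itertools.combinations(grid, 2)
)

print(C, worst_ratio)
```

Every sampled difference quotient is bounded by C, as the theorem predicts for points of a convex compact
set. Convexity matters because the proof integrates along the line segment from x to y, which must lie in K.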
41.7. Chain rules and diffeomorphisms


41.7.1 REMARK: Chain rule for partial derivatives. 

The continuity of the partial derivatives in Theorem 41.7.2 is required because the partial derivative of f for
the unit vector e_i will in general cause f to vary in a direction which is not aligned with the axes of ℝ^m.
Then the directional derivative ∂_v g must be computed in the direction of the tuple v = (∂_i f_j(x))_{j=1}^m ∈ ℝ^m.
Moreover, the curve t ↦ f(x + te_i) will in general deviate from the straight line t ↦ f(x) + t∂_i f(x). Therefore
the total derivative of g is required. (This is illustrated in Figure 41.7.1.)


Figure 41.7.1 Chain rule for partial derivatives


If Range(f) ⊆ V in Theorem 41.7.2, then the composition g ∘ f is the usual function composite as in
Definition 10.4.17. Otherwise g ∘ f is the partial function composite in Definition 10.10.6, which is the same
as the usual function composite of the restriction of f to f⁻¹(V) with g. This is equivalent to replacing U with f⁻¹(V).


41.7.2 THEOREM: First-order pointwise partial derivative composition rule. Also called “the chain rule”.
Let n, m, p ∈ ℤ⁺. Let U ∈ Top(ℝ^n) and V ∈ Top(ℝ^m). Let f : U → ℝ^m be partially differentiable at x ∈ U.
Let g : V → ℝ^p be totally differentiable at f(x) ∈ V. Then g ∘ f is partially differentiable at x and


    \forall i \in \mathbb{N}_n,\quad \partial_i (g \circ f)(x) = \sum_{j=1}^m (\partial_j g)(f(x))\, (\partial_i f_j)(x).        (41.7.1)


PROOF: Let q ∈ f⁻¹(V) and i ∈ ℕ_n. Let ε ∈ ℝ⁺. Then ε₁ = ½ε/(1 + Σ_{j=1}^m |(∂_j g)(f(q))|_p) ∈ ℝ⁺ and it
follows from Definition 41.2.7, Notation 41.2.9 and Theorem 41.2.16 that there is a δ₁ ∈ ℝ⁺ which satisfies
∀t ∈ (−δ₁, δ₁), q + te_i ∈ f⁻¹(V) and

    \forall t \in (-\delta_1, \delta_1),\quad |f(q + te_i) - f(q) - t\,\partial_i f(q)|_m \le \varepsilon_1 |t|.        (41.7.2)


Let ε₂ = ½ε/(ε₁ + |∂_i f(q)|_m). Then ε₂ ∈ ℝ⁺, and by Definition 41.6.4 and the assumption V ∈ Top(ℝ^m),
there is a δ₂ ∈ ℝ⁺ which satisfies B_{f(q),δ₂} ⊆ V and


    \forall y \in B_{f(q),\delta_2},\quad \Big| g(y) - g(f(q)) - \sum_{j=1}^m (\partial_j g)(f(q))\,(y_j - f_j(q)) \Big|_p \le \varepsilon_2 |y - f(q)|_m.        (41.7.3)


Let δ₁′ = min(δ₁, δ₂/(ε₁ + |∂_i f(q)|_m)). Then |f(q + te_i) − f(q)|_m ≤ |t|(ε₁ + |∂_i f(q)|_m) < δ₁′(ε₁ + |∂_i f(q)|_m) ≤ δ₂
for all t ∈ (−δ₁′, δ₁′) by line (41.7.2). So ∀t ∈ (−δ₁′, δ₁′), f(q + te_i) ∈ B_{f(q),δ₂}. Therefore


    \forall t \in (-\delta_1', \delta_1'),
    \Big| g(f(q + te_i)) - g(f(q)) - t \sum_{j=1}^m (\partial_j g)(f(q))\,(\partial_i f_j)(q) \Big|_p
        \le \Big| g(f(q + te_i)) - g(f(q)) - \sum_{j=1}^m (\partial_j g)(f(q))\,(f_j(q + te_i) - f_j(q)) \Big|_p
          + \Big| \sum_{j=1}^m (\partial_j g)(f(q))\,(f_j(q + te_i) - f_j(q)) - t \sum_{j=1}^m (\partial_j g)(f(q))\,(\partial_i f_j)(q) \Big|_p
        \le \varepsilon_2 |f(q + te_i) - f(q)|_m + \sum_{j=1}^m |(\partial_j g)(f(q))|_p\, |f_j(q + te_i) - f_j(q) - t(\partial_i f_j)(q)|
        \le \varepsilon_2 |f(q + te_i) - f(q)|_m + \varepsilon_1 |t| \sum_{j=1}^m |(\partial_j g)(f(q))|_p
        \le \varepsilon_2 \big( |f(q + te_i) - f(q) - t(\partial_i f)(q)|_m + |t(\partial_i f)(q)|_m \big) + \tfrac{1}{2}\varepsilon |t|
        \le \varepsilon_2 (\varepsilon_1 + |(\partial_i f)(q)|_m)\,|t| + \tfrac{1}{2}\varepsilon |t|
        \le \tfrac{1}{2}\varepsilon |t| + \tfrac{1}{2}\varepsilon |t| = \varepsilon |t|.

Hence by Definitions 41.2.2 and 41.2.7, g ∘ f is partially differentiable at q and its partial derivatives at q
are as given in line (41.7.1).

41.7.3 REMARK: First-order partial derivative chain rule.

The pointwise chain rule for partial derivatives in Theorem 41.7.2 places a weaker condition on f than on g 
because f is only required to be partially differentiable whereas g is required to be totally differentiable. In 
the open-domain version of this chain rule in Theorem 41.7.4, there is effectively no difference between the 
conditions for f and g because Theorem 41.6.15 promotes continuously partially differentiable functions on 
open domains to be continuously totally differentiable. (Note that line (41.7.4) is intentionally arranged so 
as to resemble a product of two matrices. See Theorem 42.5.25 for the second-order chain rule.) 


41.7.4 THEOREM: First-order continuous partial derivative composition rule. Also called “the chain rule”.
Let n, m, p ∈ ℤ⁺. Let f : U → ℝ^m and g : V → ℝ^p be continuously partially differentiable on U ∈ Top(ℝ^n)
and V ∈ Top(ℝ^m) respectively. Then g ∘ f is continuously totally differentiable on f⁻¹(V) and

    \forall x \in f^{-1}(V),\ \forall i \in \mathbb{N}_n,\ \forall k \in \mathbb{N}_p,
        (\partial_i (g \circ f))(x)_k = \sum_{j=1}^m (\partial_i f_j)(x)\, (\partial_j g_k)(f(x)).        (41.7.4)


In other words, ∂_i(g ∘ f) = Σ_{j=1}^m ∂_i f_j · ((∂_j g) ∘ f).


PROOF: The assertion follows from Theorems 41.7.2 and 41.6.15. 
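
The matrix-product form of line (41.7.4) can be checked numerically. In the following Python sketch, the maps
f and g, the point x, the step size h and the helper name partials are hypothetical choices for illustration.
The array convention D[i][k] for the partial derivative of component k with respect to variable i is chosen
to match the index order in line (41.7.4), and derivatives are approximated by central difference quotients.

```python
import math

def f(x):
    # Hypothetical C^1 map f : R^2 -> R^2.
    return [x[0] * x[1], x[0] + math.exp(x[1])]

def g(y):
    # Hypothetical C^1 map g : R^2 -> R^2.
    return [math.sin(y[0]) * y[1], y[0] ** 2]

def partials(func, p, h=1e-6):
    # Returns D with D[i][k] approximating the partial derivative of
    # component k of func with respect to variable i at the point p.
    out = []
    for i in range(len(p)):
        fwd = func([c + (h if k == i else 0.0) for k, c in enumerate(p)])
        bwd = func([c - (h if k == i else 0.0) for k, c in enumerate(p)])
        out.append([(a - b) / (2.0 * h) for a, b in zip(fwd, bwd)])
    return out

x = [0.3, -0.2]
Df = partials(f, x)        # Df[i][j] approximates the ith partial derivative of f_j at x
Dg = partials(g, f(x))     # Dg[j][k] approximates the jth partial derivative of g_k at f(x)

# Right-hand side of line (41.7.4): sum over j of Df[i][j] * Dg[j][k].
product = [[sum(Df[i][j] * Dg[j][k] for j in range(2)) for k in range(2)] for i in range(2)]

# Left-hand side: partial derivatives of the composite, computed directly.
direct = partials(lambda z: g(f(z)), x)

print(product)
print(direct)
```

The two arrays agree up to the error of the difference quotients, so the derivative array of the composite
is the product of the derivative arrays of f and g in this index order.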


41.7.5 REMARK: Definition of C¹ map morphisms.
General C^k map morphisms are given by Definition 42.7.2, but C¹ morphisms are required at this point for
some specific chain rule tasks, and also for Theorem 41.10.4, the inverse function theorem.


41.7.6 DEFINITION: Differentiable morphisms between open subsets of Cartesian spaces. 

Let n, m ∈ ℤ₀⁺.

A C¹ (differentiable) homomorphism from a set Ω ∈ Top(ℝ^n) to a set S ∈ ℙ(ℝ^m) is a C¹ differentiable
map φ : Ω → S.

A C¹ (differentiable) monomorphism from a set Ω ∈ Top(ℝ^n) to a set S ∈ ℙ(ℝ^m) is an injective C¹
differentiable homomorphism φ : Ω → S.

A C¹ (differentiable) epimorphism from a set Ω ∈ Top(ℝ^n) to a set S ∈ ℙ(ℝ^m) is a surjective C¹
differentiable homomorphism φ : Ω → S.

A C¹ (differentiable) isomorphism or C¹ diffeomorphism from a set Ω₁ ∈ Top(ℝ^n) to a set Ω₂ ∈ Top(ℝ^m)
is a bijective C¹ differentiable homomorphism φ : Ω₁ → Ω₂ such that φ⁻¹ : Ω₂ → Ω₁ is a C¹ differentiable
homomorphism from Ω₂ to Ω₁.

A C¹ (differentiable) endomorphism of a set Ω ∈ Top(ℝ^n) is a C¹ differentiable homomorphism φ : Ω → Ω.
A C¹ (differentiable) automorphism of a set Ω ∈ Top(ℝ^n) is a C¹ differentiable isomorphism φ : Ω → Ω.


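41.7.7 THEOREM: Inverse chain rule for C¹ diffeomorphisms.
Let n ∈ ℤ⁺ and Ω₁, Ω₂ ∈ Top(ℝ^n). Let f : Ω₁ → Ω₂ be a C¹ diffeomorphism. Then


    \forall x \in \Omega_1,\quad \big[ (\partial_j (f^{-1})_k)(f(x)) \big]_{j,k=1}^n = \Big( \big[ (\partial_i f_j)(x) \big]_{i,j=1}^n \Big)^{-1}.


PROOF: By Definition 41.7.6, f⁻¹ : Ω₂ → Ω₁ is a C¹ diffeomorphism. So by Theorem 41.7.4,


    \forall x \in \Omega_1,\ \forall i, k \in \mathbb{N}_n,\quad
    \sum_{j=1}^n (\partial_i f_j)(x)\, (\partial_j (f^{-1})_k)(f(x)) = (\partial_i (f^{-1} \circ f))(x)_k
        = \partial_{x_i} (\mathrm{id}_{\Omega_1}(x))_k
        = \partial_{x_i} x_k
        = \delta_{ik}.

Therefore by Notation 25.8.11,

    \forall x \in \Omega_1,\quad \big[ (\partial_j (f^{-1})_k)(f(x)) \big]_{j,k=1}^n = \Big( \big[ (\partial_i f_j)(x) \big]_{i,j=1}^n \Big)^{-1}.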
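
The inverse chain rule can be illustrated with an explicit C¹ diffeomorphism of ℝ². In the following Python
sketch, the map f, its inverse f_inv, the point x, the step size h and the helper name partials are
hypothetical choices. The sketch checks that the product of the two derivative arrays is approximately the
identity matrix, which is the statement that each array is the inverse of the other.

```python
def f(x):
    # Hypothetical C^1 diffeomorphism of R^2: a shear by a cubic term.
    return [x[0] + x[1] ** 3, x[1]]

def f_inv(y):
    # Explicit inverse of f.
    return [y[0] - y[1] ** 3, y[1]]

def partials(func, p, h=1e-6):
    # Returns D with D[i][k] approximating the partial derivative of
    # component k of func with respect to variable i at the point p.
    out = []
    for i in range(len(p)):
        fwd = func([c + (h if k == i else 0.0) for k, c in enumerate(p)])
        bwd = func([c - (h if k == i else 0.0) for k, c in enumerate(p)])
        out.append([(a - b) / (2.0 * h) for a, b in zip(fwd, bwd)])
    return out

x = [0.7, -1.2]
Df = partials(f, x)               # Df[i][j] approximates the ith partial derivative of f_j at x
Dinv = partials(f_inv, f(x))      # Dinv[j][k] approximates the jth partial derivative of (f^-1)_k at f(x)

# Sum over j of Df[i][j] * Dinv[j][k], which should approximate the Kronecker delta.
product = [[sum(Df[i][j] * Dinv[j][k] for j in range(2)) for k in range(2)] for i in range(2)]

print(product)
```

Here the derivative of the inverse map is evaluated at f(x), not at x, in agreement with the statement
of the theorem.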
41.8. Vector-valued function differentiation


41.8.1 REMARK: Differentiation of vector-valued functions on Cartesian spaces. 
The main purpose of Section 41.8 is to define and describe C¹(U, W) as in Notation 41.2.19, but for functions
from open subsets U of Cartesian spaces to general finite-dimensional real linear spaces W.


Every finite-dimensional real linear space W has a finite basis by Theorem 22.7.12, and every basis for W has 
the same number of elements by Theorem 22.7.16. The bases are all related by vector component transition 
maps as described in Section 22.9. Since these transition maps are equivalent to matrix multiplication maps, 
the differentiability of W-valued functions on a Cartesian space is independent of the choice of the basis 
which is used for mapping W to a Cartesian space. (The equivalence of line (41.8.1) for different bases 
follows from Theorem 25.12.3 (xii).) 


Consequently the differentiability of W-valued functions is basis-independent. In other words, one may 
think of finite-dimensional real linear spaces as effectively equivalent to Cartesian spaces for the purposes 
of differentiation. However, concepts such as partial derivatives and continuous partial differentiability, for 
example, do require particular choices of bases to be made in order to apply concepts from Cartesian spaces 
to more abstract linear spaces. 


41.8.2 DEFINITION: A partially differentiable function f : U → W at a point p ∈ U, for a set U ∈ Top(ℝ^n)
with n ∈ ℤ₀⁺ and a finite-dimensional real linear space W, is a function f : U → W such that for some basis
B for W,


    \forall i \in \mathbb{N}_n,\ \exists w \in W,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),
        p + te_i \in U \;\Rightarrow\; |\kappa_B(f(p + te_i) - f(p) - tw)| \le \varepsilon |t|,        (41.8.1)


where κ_B : W → ℝ^m with m = dim(W) denotes the component map for basis B. (See Definition 22.8.8.)


A partially differentiable function f : U → W, for a set U ∈ Top(ℝ^n) with n ∈ ℤ₀⁺ and a finite-dimensional
real linear space W, is a function f : U → W such that f is a partially differentiable function at p for all p ∈ U.


41.8.3 REMARK: The role of a basis in the definition of partial differentiability. 

If the linear space W in Definition 41.8.2 is the Cartesian space ℝ^m for some m ∈ ℤ₀⁺, then the basis B
can be chosen as the tuple (e_j)_{j=1}^m as in Definition 22.7.9. For this standard basis, the component map
κ_B : W → ℝ^m in Definition 22.8.8 is the identity map id_W. So with this choice of basis, line (41.8.1) is
identical to line (41.2.1) in Definition 41.2.2. Thus Definition 41.8.2 is a natural extension of Definition 41.2.2
from Cartesian space valued functions to general finite-dimensional real linear space valued functions.


If a basis B₁ is replaced by a basis B₂, it follows from Theorems 22.9.11 and 25.12.3 (xiii, xiv) that the
positive ratio of the 2-norm expressions |κ_B(f(p + te_i) − f(p) − tw)| on line (41.8.1) is bounded above and
below by positive factors which depend only on B₁ and B₂. So by adjusting the ε and δ values, it is easily
shown that the partial differentiability of a vector-valued function in Definition 41.8.2 is independent of the
choice of basis for W. Thus saying “for some basis B” or “for all bases B” makes no difference.


It follows that the vector w ∈ W in Definition 41.8.2 is unique for each i ∈ ℕ_n. Therefore this vector can be
given a well-defined name and notations as in Definition 41.8.4 and Notations 41.8.5 and 41.8.6. (These are
natural extensions of Definition 41.2.7 and Notations 41.2.9 and 41.2.10 respectively.)


41.8.4 DEFINITION: The partial derivative of a partially differentiable function f : U → W at p ∈ U with
respect to component i ∈ ℕ_n, for U ∈ Top(ℝ^n) with n ∈ ℤ₀⁺, where W is a finite-dimensional real linear
space, is the vector w ∈ W which satisfies


    \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall t \in (-\delta, \delta),
        p + te_i \in U \;\Rightarrow\; |\kappa_B(f(p + te_i) - f(p) - tw)| \le \varepsilon |t|


for some basis B of W. (See Definition 22.8.8 for the component map κ_B.)


41.8.5 NOTATION: ∂_i f(p), for i ∈ ℕ_n and a partially differentiable function f : U → W at a point p ∈ U,
for U ∈ Top(ℝ^n) with n ∈ ℤ₀⁺, for a finite-dimensional real linear space W, denotes the partial derivative of
f at p with respect to component i ∈ ℕ_n.

41.8.6 NOTATION: ∂_i f, for i ∈ ℕ_n and a partially differentiable function f : U → W for U ∈ Top(ℝ^n)
with n ∈ ℤ₀⁺, for a finite-dimensional real linear space W, denotes the map p ↦ ∂_i f(p) for p ∈ U.


41.8.7 REMARK: Vector-valued partial derivatives can be evaluated “through the charts”.

Since the component chart κ_B in Definitions 41.8.2 and 41.8.4 is effectively a linear map from W to the
standard linear space structure on the coordinate space ℝ^m, one may rewrite κ_B(f(p + te_i) − f(p) − tw)
as κ_B(f(p + te_i)) − κ_B(f(p)) − tκ_B(w). This is the same as f̂(p + te_i) − f̂(p) − tŵ, where f̂ = κ_B ∘ f
and ŵ = κ_B(w). Thus these definitions may be obtained from Definitions 41.2.2 and 41.2.7 by substituting
f̂ for f and ŵ for w.

Thus for a fixed choice of basis B, the vector-valued differentiability definitions are essentially identical to 
the Cartesian space valued definitions. The principal difference is the need to ensure that the vector-valued 
definitions are meaningful independent of the choice of basis. 


41.8.8 REMARK: Continuous partial differentiability of vector-valued functions. 

Definition 41.8.9 and Notation 41.8.10 are natural extensions of Definition 41.2.18 and Notation 41.2.19.
The continuity of partial derivatives of vector-valued functions may be defined either as continuity with
respect to the topology induced on the linear space W by any component chart κ_B : W → ℝᵐ where
m = dim(W) ∈ ℤ⁺, or equivalently by considering the continuity of the partial derivatives ∂_i(κ_B ∘ f).


If the maps ∂_i f : U → W in Notation 41.8.6 are continuous for all i ∈ ℕ_n, one may write ∂_i f ∈ C(U,W)
and f ∈ C¹(U,W) as in Notation 41.8.10. This utilises the topology on W which is induced by the bijection
κ_B : W → ℝᵐ for any basis B of W.


Alternatively, as mentioned in Remark 41.8.7, one may define partial derivatives of vector-valued functions on
Cartesian spaces “through the charts”. Then one may write ∂_i f = κ_B⁻¹ ∘ ∂_i(κ_B ∘ f) for partially differentiable
functions f : U → W. In other words, κ_B ∘ ∂_i f = ∂_i(κ_B ∘ f). Then one may say that ∂_i f ∈ C(U,W) and
f ∈ C¹(U,W) if ∂_i(κ_B ∘ f) ∈ C(U, ℝᵐ). Therefore Definition 41.8.9 gives two equivalent interpretations for
the continuity of partial derivatives. (See Notation 42.5.32 for extension to Cᵏ(U,W) for k ∈ ℤ₀⁺.)


41.8.9 DEFINITION: A continuously partially differentiable function f : U → W for a set U ∈ Top(ℝⁿ)
with n ∈ ℤ₀⁺ and a finite-dimensional real linear space W, is a partially differentiable function f on U whose
partial derivatives are all continuous on U.

In other words, ∀i ∈ ℕ_n, ∂_i f ∈ C(U,W), where W has the topology induced by the component chart κ_B for
some basis B for W.

Alternatively, ∀i ∈ ℕ_n, ∂_i(κ_B ∘ f) ∈ C(U,ℝᵐ) for the component chart κ_B : W → ℝᵐ for some basis B
for W, where m = dim(W).


41.8.10 NOTATION: C¹(U,W), for a set U ∈ Top(ℝⁿ) with n ∈ ℤ₀⁺ and a finite-dimensional real linear
space W, denotes the set of functions f : U → W which have continuous partial derivatives.


41.8.11 REMARK: Partial derivatives of constant vector-valued functions.
Theorem 41.8.12 is the natural extension of Theorems 40.5.5 (ℝ → ℝ), 41.1.16 (ℝⁿ → ℝ) and 41.2.23
(ℝⁿ → ℝᵐ) to vector-valued functions (ℝⁿ → W).


41.8.12 THEOREM: Constant vector-valued functions have zero partial derivatives.
Let n ∈ ℤ₀⁺. Let U ∈ Top(ℝⁿ). Let W be a finite-dimensional real linear space. Let c ∈ W. Define
f : U → W by f(x) = c for all x ∈ U. Then f ∈ C¹(U,W) and satisfies


∀p ∈ U, ∀i ∈ ℕ_n, ∂_i f(p) = 0.


PROOF: Let B be a basis for W. Let p ∈ U. Let i ∈ ℕ_n. Let w = 0 ∈ W. Let ε ∈ ℝ⁺. Let δ = 1.
Let t ∈ (−δ, δ) satisfy p + te_i ∈ U. Then |κ_B(f(p + te_i) − f(p) − tw)| = |κ_B(c − c − 0)| = 0 ≤ ε|t|. So f
is a partially differentiable function on U by Definition 41.8.2, and the partial derivative of f at p equals
0 ∈ W for all p ∈ U and i ∈ ℕ_n by Definition 41.8.4. Therefore ∂_i f(p) = 0 for all p ∈ U and i ∈ ℕ_n
by Notation 41.8.5. But the zero function is continuous by Theorem 31.12.9 because it is constant. Hence
f ∈ C¹(U,W).

41.8.13 REMARK: Directional derivatives of vector-valued functions on Cartesian spaces.

The directional derivatives of ℝᵐ-valued functions in Definitions 41.4.2 and 41.4.4 are expressed in terms
of low-level constructions using ε-δ style convergence criteria. However, by Theorem 41.6.17, a continuously
partially differentiable function from an open subset of ℝⁿ to ℝᵐ for n,m ∈ ℤ⁺ is necessarily directionally
differentiable. Moreover, Theorem 41.6.21 (ii) gives a simple formula which expresses directional derivatives
as linear combinations of partial derivatives.


In practice, continuously partially differentiable functions as in Definition 41.8.9 and Notation 41.8.10 are
almost always used in preference to merely partially or directionally differentiable functions in differential
geometry. So there is no practical benefit in extending Definitions 41.8.2 and 41.8.4 to directional derivatives
using low-level ε-δ style criteria. It is simpler to define general directional derivatives as linear combinations
of partial derivatives for functions in C¹(U, W) for U ∈ Top(ℝⁿ) and linear spaces W.


41.8.14 DEFINITION: The directional derivative of a vector-valued function f ∈ C¹(U,W) at p ∈ U in the
direction v ∈ ℝⁿ, for U ∈ Top(ℝⁿ) with n ∈ ℤ₀⁺, where W is a finite-dimensional real linear space, is the
vector in W which is given by


∂_v f(p) = Σ_{i=1}^n v_i ∂_i f(p).


In other words, ∂_v f(p) = κ_B⁻¹(∂_v(κ_B ∘ f)(p)) for any basis B of W. (This uses operator ∂_v in Notation 41.4.5.)
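
The formula in Definition 41.8.14 can be checked numerically. The following Python sketch, with a hypothetical polynomial map f : ℝ³ → ℝ² and illustrative helper names, compares the linear combination Σ v_i ∂_i f(p) of approximate partial derivatives with a difference quotient taken directly along the direction v. Central differences stand in for the limits, so the two results agree only up to rounding and truncation error.

```python
# Sketch: directional derivative as a linear combination of partial derivatives.
# The map f and the helper names are illustrative assumptions.

def f(p):
    """Example map f : R^3 -> W = R^2."""
    x, y, z = p
    return (x * y + z, y * z * z)

def partial(fun, p, i, h=1e-6):
    """Central-difference approximation to the partial derivative d_i fun(p)."""
    plus, minus = list(p), list(p)
    plus[i] += h
    minus[i] -= h
    return tuple((a - b) / (2 * h) for a, b in zip(fun(plus), fun(minus)))

def directional_from_partials(fun, p, v):
    """Sum over i of v_i * d_i fun(p), computed component by component."""
    parts = [partial(fun, p, i) for i in range(len(p))]
    m = len(parts[0])
    return tuple(sum(v[i] * parts[i][k] for i in range(len(p))) for k in range(m))

def directional_direct(fun, p, v, h=1e-6):
    """Central-difference quotient of t -> fun(p + t v) at t = 0."""
    plus = [a + h * b for a, b in zip(p, v)]
    minus = [a - h * b for a, b in zip(p, v)]
    return tuple((a - b) / (2 * h) for a, b in zip(fun(plus), fun(minus)))

p = (1.0, 2.0, -1.0)
v = (0.5, -1.0, 2.0)
lhs = directional_from_partials(f, p, v)
rhs = directional_direct(f, p, v)
print(lhs, rhs)   # both approximate the vector (2, -9)
```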


41.9. Differentiation for normed linear spaces 


41.9.1 REMARK: Differential calculus for abstract linear spaces. 
Differential calculus concepts can be extended from concrete finite-dimensional Cartesian spaces to more 
general abstract linear spaces in two ways. 


(1) Use a basis for the linear space to first map the space to a Cartesian space, then apply the differential 
calculus for Cartesian spaces, and finally map the Cartesian space concepts back to the abstract space. 


(2) Use an abstract norm on the abstract linear space to define differential calculus concepts directly and 
abstractly without using bases or coordinate maps. 


The task in Section 41.9 is to present some definitions according to approach (2). Since most of the basic 
differential calculus concepts for abstract finite-dimensional linear spaces are automatically valid for general 
(real) Banach spaces, it seems reasonable to define them for Banach spaces, although it is only the finite- 
dimensional case which is required in this book. (See Definition 39.4.2 for real Banach spaces.) 


In the absence of a basis and coordinates, the partial derivative concept in Definitions 41.2.2 and 41.2.7 
is difficult to extend to abstract linear spaces, but the directional derivative concept in Definitions 41.4.2 
and 41.4.4 is easy to extend. 


41.9.2 DEFINITION: A directionally differentiable function at a point p ∈ U, from U ∈ Top(V) to W, where
V and W are Banach spaces with norms ‖·‖_V and ‖·‖_W, is a function f : U → W such that


∀v ∈ V, ∃w ∈ W, ∀ε > 0, ∃δ > 0, ∀t ∈ (−δ, δ),
p + tv ∈ U ⇒ ‖f(p + tv) − f(p) − tw‖_W ≤ ε|t|. (41.9.1)


A directionally differentiable function on U ∈ Top(V) is a function f : U → W which is directionally
differentiable at all points in U.


41.9.3 REMARK: Definition and notation for directional derivatives. 

Definition 41.9.2 is essentially identical to Definition 41.4.2. It simply substitutes abstract spaces V and W
for the Cartesian spaces ℝⁿ and ℝᵐ respectively. Since the norm on V plays no role, Definitions 41.9.2
and 41.9.4 are applicable to any linear space V.


It is clear, by the definition of a norm, that the limit in line (41.9.1) is unique if it exists. So this value can
be given a name, as in Definition 41.9.4 and Notation 41.9.5, when it exists.


41.9.4 DEFINITION: The directional derivative of a directionally differentiable function f : U → W at
p ∈ U in the direction v ∈ V, where U ∈ Top(V) and V and W are Banach spaces, is the vector w ∈ W
which satisfies


∀ε > 0, ∃δ > 0, ∀t ∈ (−δ, δ),
p + tv ∈ U ⇒ ‖f(p + tv) − f(p) − tw‖_W ≤ ε|t|.


41.9.5 NOTATION: ∂_v f(p), for a directionally differentiable function f : U → W, for U ∈ Top(V), for
Banach spaces V and W, denotes the directional derivative of f in the direction v ∈ V at p ∈ U. Thus


∀ε > 0, ∃δ > 0, ∀t ∈ (−δ, δ),
p + tv ∈ U ⇒ ‖f(p + tv) − f(p) − t∂_v f(p)‖_W ≤ ε|t|.


41.9.6 REMARK: Pathological possibilities for directionally differentiable functions. 

As outlined in Section 41.4, directionally differentiable functions can exhibit various kinds of pathological
behaviour. Totally differentiable functions, as in Section 41.6, are much better behaved. Therefore these
are generally more useful for abstract linear spaces also. Most importantly, for a totally differentiable
function, a single linear map determines the directional derivatives in all directions at a point. This implies
in particular a kind of uniformity of the directional derivative with respect to direction.
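
One standard example of such pathology, which is not taken from this book, is the function f(x, y) = x²y/(x⁴ + y²) on ℝ² with f(0, 0) = 0. For a direction v = (v₁, v₂), the quotient f(tv)/t equals v₁²v₂/(t²v₁⁴ + v₂²), which tends to v₁²/v₂ if v₂ ≠ 0 and to 0 if v₂ = 0. So every directional derivative at the origin exists, but the map from v to the directional derivative is not linear, and f is not even continuous at the origin because f(s, s²) = 1/2 for all s ≠ 0. The following Python sketch evaluates these quantities numerically. All names are illustrative.

```python
# Sketch: a function with all directional derivatives at the origin which is
# neither continuous nor totally differentiable there. Names are illustrative.

def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def directional_quotient(v, t):
    """Difference quotient (f(0 + t v) - f(0)) / t at the origin."""
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t

t = 1e-4
d_e1 = directional_quotient((1.0, 0.0), t)    # direction e_1
d_e2 = directional_quotient((0.0, 1.0), t)    # direction e_2
d_sum = directional_quotient((1.0, 1.0), t)   # direction e_1 + e_2

# Values of f along the parabola y = x^2, approaching the origin.
on_parabola = [f(s, s * s) for s in (1e-1, 1e-2, 1e-3)]

print(d_e1, d_e2, d_sum)   # d_sum is close to 1, not d_e1 + d_e2
print(on_parabola)         # values stay close to 0.5 instead of tending to f(0, 0)
```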


The space Hom(V, W) is used in Definition 41.9.7 instead of Lin(V, W) because it is customary (and also
a good idea) to require the total derivative map L to be continuous with respect to the topologies on V
and W. (This continuity of L is equivalent to requiring it to be bounded. See for example Lang [23], page 8,
for the continuity requirement. See for example Gilbarg/Trudinger [81], pages 74-75, for the equivalence of
boundedness and continuity.)


In the case of finite-dimensional linear spaces, the continuity requirement for L is redundant because all 
norms on such spaces are equivalent by Theorem 39.5.3. Thus the criterion of total differentiability in 
Definition 41.9.7 is also independent of the choices for norms on V and W if they are both finite-dimensional. 


41.9.7 DEFINITION: A totally differentiable function f : U → W at a point p ∈ U, where U ∈ Top(V) and
V and W are Banach spaces, is a function f : U → W which satisfies


∃L ∈ Hom(V, W), ∀ε > 0, ∃δ > 0, ∀q ∈ B_{p,δ} ∩ U,
‖f(q) − f(p) − L(q − p)‖_W ≤ ε‖q − p‖_V, (41.9.2)


where Hom(V, W) denotes the set of bounded (i.e. continuous) linear maps from V to W.


A totally differentiable function f : U → W, for U ∈ Top(V), is a function f : U → W which is totally
differentiable at every p ∈ U.


41.9.8 DEFINITION: The total differential of f at p, for a totally differentiable function f : U → W, where
p ∈ U and U ∈ Top(V), and V and W are Banach spaces, is the map L ∈ Hom(V, W) which satisfies
condition (41.9.2) for f at p.

The total differential of f, for a totally differentiable function f : U → W with U ∈ Top(V), is the function
from U to Hom(V, W) which maps each p ∈ U to the total differential of f at p.


41.9.9 NOTATION: df, for a totally differentiable function f : U → W with U ∈ Top(V), where V and W
are Banach spaces, denotes the total differential of f. In other words,


∀p ∈ U, ∀ε > 0, ∃δ > 0, ∀q ∈ B_{p,δ} ∩ U,
‖f(q) − f(p) − df(p)(q − p)‖_W ≤ ε‖q − p‖_V.

(df)_p denotes the value (df)(p) of the total differential df of f at p. Thus df is the map p ↦ (df)_p for p ∈ U.
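
The inequality in Notation 41.9.9 can be explored numerically in the finite-dimensional case V = W = ℝ² with the Euclidean norm. The following Python sketch uses the hypothetical example f(x, y) = (x² − y², 2xy), for which the matrix of partial derivatives is a candidate for (df)_p, and computes the ratio ‖f(p + h) − f(p) − (df)_p(h)‖ / ‖h‖ for shrinking h. For this particular f the remainder is exactly quadratic in h, so the ratio should equal ‖h‖ up to rounding. This illustrates the criterion and is not a proof of it.

```python
# Sketch: the remainder ratio in the total differentiability criterion for an
# example map on R^2. The map f and all names are illustrative assumptions.
import math

def f(p):
    x, y = p
    return (x * x - y * y, 2.0 * x * y)

def total_differential(p):
    """Candidate for (df)_p: the 2x2 matrix of partial derivatives of f at p."""
    x, y = p
    return ((2.0 * x, -2.0 * y), (2.0 * y, 2.0 * x))

def remainder_ratio(p, h):
    """|f(p + h) - f(p) - L h| / |h| for the Euclidean norm, with L = (df)_p."""
    L = total_differential(p)
    q = (p[0] + h[0], p[1] + h[1])
    fq, fp = f(q), f(p)
    Lh = (L[0][0] * h[0] + L[0][1] * h[1], L[1][0] * h[0] + L[1][1] * h[1])
    r = (fq[0] - fp[0] - Lh[0], fq[1] - fp[1] - Lh[1])
    return math.hypot(*r) / math.hypot(*h)

p = (1.0, 2.0)
scales = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(p, (s * 0.6, s * 0.8)) for s in scales]
print(ratios)   # decreasing with |h|, as the epsilon-delta criterion requires
```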


41.9.10 REMARK: Equivalence of total differentiability with or without coordinates.

Total differentiability is expressed concretely in Definition 41.6.4 in terms of Cartesian coordinates, whereas
it is expressed in Definition 41.9.7 in terms of abstract linear spaces. Theorem 41.9.11 shows that these are
equivalent via coordinate maps κ_{B₁} : V → ℝⁿ and κ_{B₂} : W → ℝᵐ. By Theorem 39.5.3 (iii), the topology
induced by any norms on V and W is the same as the topology induced by coordinate maps.


41.9.11 THEOREM: Equivalence of abstract and through-the-coordinates total differentiability.
Let V and W be finite-dimensional real linear spaces. Let B₁ and B₂ be bases for V and W respectively.
Then f : U → W for U ∈ Top(V) is totally differentiable on U if and only if f̃ : Ũ → ℝᵐ is a totally
differentiable map between Cartesian spaces, where Ũ = κ_{B₁}(U) and f̃ = κ_{B₂} ∘ f ∘ κ_{B₁}⁻¹.

PROOF: The norms in Definition 41.6.4 line (41.6.1) for total differentiability of maps between finite-
dimensional Cartesian spaces are generally assumed to be the standard 2-norms on the domain and range as
in Euclidean geometry. By Theorem 39.5.3 (iii), these are equivalent via any coordinate maps to any norms
on V and W. Therefore Definition 41.9.7 line (41.9.2) holds for f if and only if Definition 41.6.4 line (41.6.1)
holds for f̃. So f is totally differentiable between abstract finite-dimensional linear spaces if and only if f̃ is
totally differentiable as a concrete map between Cartesian spaces.


41.10. Inverse function theorem and implicit function theorem 


41.10.1 REMARK: Applicability of the inverse and implicit function theorems. 
The inverse function theorem is used in connection with the regularity of immersions and submersions in
Section 52.5. Implicit and inverse function theorems are useful for Lie groups in Section 62.2.


The proof of Theorem 41.10.4 depends, in essence, on the “extreme value theorem”, which is a special case 
of the combination of Theorems 33.5.15 and 34.4.18, which assert respectively that a continuous function 
maps compact sets to compact sets and that a continuous function maps connected sets to connected sets. 
(The strategy of the proof of Theorem 41.10.4 is similar to the one adopted by Rudin [129], pages 193-194.) 


Another approach to proving the inverse and implicit function theorems is to use contraction maps. (This
approach is used by Edwards [67], pages 160-194. The contraction map approach is used by Rosenlicht [128],
pages 205-210, for the implicit function theorem, which is then applied to prove the inverse function theorem.)
The contraction map approach yields an iterative algorithm for computing the inverse or implicit functions
whose existence and regularity are proved.
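
The kind of iterative algorithm which arises from the contraction map approach can be sketched as follows. One common form of the iteration, assumed here for illustration and not necessarily the variant used in the cited books, is x ↦ x − A⁻¹(f(x) − z) with the fixed matrix A = f′(a). The example map f, the point a = (0, 0), the target z and all names in this Python sketch are hypothetical.

```python
# Sketch: computing a local inverse of a C^1 map near a = (0, 0) by iterating
# x -> x - A^{-1} (f(x) - z), where A = f'(a). All names are illustrative.

def f(p):
    x, y = p
    return (2.0 * x + y + 0.1 * y * y, x + 3.0 * y + 0.1 * x * x)

# Inverse of A = f'(0, 0) = [[2, 1], [1, 3]], which has determinant 5.
A_INV = ((0.6, -0.2), (-0.2, 0.4))

def local_inverse(z, start=(0.0, 0.0), steps=50):
    """Approximate the x near `start` with f(x) = z by a fixed number of iterations."""
    x = start
    for _ in range(steps):
        fx = f(x)
        r = (fx[0] - z[0], fx[1] - z[1])          # current error f(x) - z
        x = (x[0] - (A_INV[0][0] * r[0] + A_INV[0][1] * r[1]),
             x[1] - (A_INV[1][0] * r[0] + A_INV[1][1] * r[1]))
    return x

z = (0.3, -0.2)
x = local_inverse(z)
fx = f(x)
residual = max(abs(fx[0] - z[0]), abs(fx[1] - z[1]))
print(x, residual)
```

The iteration contracts near a because the derivative of the iterated map is −A⁻¹(f′(x) − A), which is small when f′(x) is close to A. This is the same closeness condition as in line (41.10.3).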


41.10.2 REMARK: The upper and lower bound functions for a square matrix.

The proof of Theorem 41.10.4 uses the “upper bound” function μ⁺ : M_{n,n}(ℝ) → ℝ and the “lower bound”
function μ⁻ : M_{n,n}(ℝ) → ℝ for square matrices, which are given in Definition 25.12.2 by the formulas
μ⁺(A) = sup{ |Av|; v ∈ ℝⁿ and |v| = 1 } and μ⁻(A) = inf{ |Av|; v ∈ ℝⁿ and |v| = 1 }.

The inverse function theorem rests on the equality μ⁺(A⁻¹) = μ⁻(A)⁻¹ for any invertible matrix A, and
the fact that A is invertible if and only if μ⁻(A) > 0. (See Theorem 25.12.3 (xii, xiv).) If the first-order
partial derivative matrix A(x) = [∂_j f_i(x)]_{i,j=1}^n is continuous with respect to x in a neighbourhood of a, and
A(a) is invertible, then μ⁻(A(x)) must be bounded below by a positive number, and so μ⁺(A(x)⁻¹) must
be bounded above, in a neighbourhood of a. So A(x) must be invertible in a neighbourhood of a.
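
For a 2 × 2 matrix, the bounds μ⁺(A) and μ⁻(A) with respect to the Euclidean norm are the square roots of the largest and smallest eigenvalues of AᵀA. This is a standard linear algebra fact which is assumed here and is not derived in this section. The following Python sketch uses it to check the equality μ⁺(A⁻¹) = μ⁻(A)⁻¹ for one hypothetical matrix. The function names are illustrative.

```python
# Sketch: upper and lower bounds of a real 2x2 matrix via the eigenvalues of A^T A,
# and a check of mu_plus(A^{-1}) = 1 / mu_minus(A). Names are illustrative.
import math

def bounds(A):
    """Return (mu_minus, mu_plus): the inf and sup of |Av| over unit vectors v."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = 0.5 * (p + r)
    disc = math.sqrt((0.5 * (p - r)) ** 2 + q * q)
    return math.sqrt(max(mean - disc, 0.0)), math.sqrt(mean + disc)

def inverse(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

A = ((2.0, 1.0), (1.0, 3.0))
mu_minus, mu_plus = bounds(A)
inv_minus, inv_plus = bounds(inverse(A))
print(mu_minus, mu_plus, inv_plus, 1.0 / mu_minus)
```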


41.10.3 REMARK: A proof strategy for the inverse function theorem. 

Since the proof of Theorem 41.10.4 is quite long, it is possibly helpful to outline the stages.
(1) Show that f|_U is injective for some neighbourhood U = B_{a,δ₀} of a ∈ Ω.
(2) Show that f(U) is an open neighbourhood of f(a).
(3) Show that (f|_U)⁻¹ : f(U) → U is a continuous function.
(4) Show that (f|_U)⁻¹ : f(U) → U is a differentiable function.
(5) Show that (f|_U)⁻¹ : f(U) → U is a C¹ function.


41.10.4 THEOREM: Inverse function theorem for C¹ maps between ℝⁿ and ℝⁿ.
Let f ∈ C¹(Ω, ℝⁿ) for some Ω ∈ Top(ℝⁿ) and n ∈ ℤ⁺. Suppose that the matrix [∂_j f_i(a)]_{i,j=1}^n is invertible
for some a ∈ Ω. Then


∃U ∈ Top_a(ℝⁿ), f(U) ∈ Top_{f(a)}(ℝⁿ) and f|_U : U → f(U) is a C¹ diffeomorphism. (41.10.1)


(See Definition 41.7.6 for C¹ diffeomorphisms.) Hence


∃V ∈ Top_{f(a)}(ℝⁿ), f⁻¹|_V is a function and f⁻¹|_V ∈ C¹(V, ℝⁿ). (41.10.2)

PROOF: For stage (1) of the strategy outlined in Remark 41.10.3, let n ∈ ℤ⁺, Ω ∈ Top(ℝⁿ), f ∈ C¹(Ω, ℝⁿ)
and a ∈ Ω. Suppose that A = [∂_j f_i(a)]_{i,j=1}^n = f′(a) is an invertible matrix. Then μ⁻(A) = μ⁺(A⁻¹)⁻¹ > 0
by Theorem 25.12.3 (xii, xiii). Let λ = ¼μ⁻(A). Then λ ∈ ℝ⁺ and


∀ε ∈ ℝ⁺, ∃δ ∈ ℝ⁺, ∀x ∈ B_{a,δ}, μ⁺(f′(x) − f′(a)) < ε (41.10.3)


because f′ ∈ C(Ω, ℝ^{n×n}) and the upper bound of a matrix is continuous with respect to the elements of the
matrix. Let ε₀ = 2λ, and let δ = δ₀ satisfy line (41.10.3) for ε = ε₀.


Let x, y ∈ B_{a,δ₀}. Then x + t(y − x) ∈ B_{a,δ₀} ⊆ Ω for all t ∈ [0,1] because B_{a,δ₀} is a convex subset
of ℝⁿ. So the map F : [0,1] → ℝⁿ given by F : t ↦ f(x + t(y − x)) − tA(y − x) is well defined and
continuously differentiable on (0,1). Then F′(t) = (f′(x + t(y − x)) − A)(y − x) for all t ∈ (0,1). So
μ⁺(F′(t)) = μ⁺((f′(x + t(y − x)) − A)(y − x)) ≤ 2λ|y − x|. But |A(y − x)| ≥ μ⁻(A)|y − x| by Theorem 25.12.3 (vii).
Therefore μ⁺(F′(t)) ≤ 2λμ⁻(A)⁻¹|A(y − x)| = ½|A(y − x)|. It follows that |F(1) − F(0)| ≤ ½|A(y − x)| by
the mean value theorem for curves. (See Theorem 40.8.7.) Thus


∀x,y ∈ B_{a,δ₀}, |f(y) − f(x) − A(y − x)| ≤ ½|A(y − x)|. (41.10.4)

But |A(y − x)| ≤ |A(y − x) − (f(y) − f(x))| + |f(y) − f(x)| by the triangle inequality for the norm on ℝⁿ.
So |f(y) − f(x)| ≥ |A(y − x)| − |f(y) − f(x) − A(y − x)| ≥ |A(y − x)| − ½|A(y − x)| = ½|A(y − x)| ≥
½μ⁻(A)|y − x| = 2λ|y − x|. Consequently


∀x,y ∈ B_{a,δ₀}, |f(y) − f(x)| ≥ 2λ|y − x|, (41.10.5)


which implies that f is injective on B_{a,δ₀} because λ > 0. (This completes stage (1) of the strategy outlined
in Remark 41.10.3.)

For stage (2) of the strategy outlined in Remark 41.10.3, let b ∈ B_{a,δ₀}. Then |b − a| < δ₀. Let r = ½(δ₀ − |b − a|).
Then 0 < r < δ₀ − |b − a|. So B̄_{b,r} ⊆ B_{a,δ₀}, where B̄_{b,r} denotes the closed ball with centre b and
radius r. Define a set-valued function S : B_{f(b),λr} → ℙ(B̄_{b,r}) by


∀z ∈ B_{f(b),λr}, S_z = {x ∈ B̄_{b,r}; ∀x′ ∈ B̄_{b,r}, |f(x) − z| ≤ |f(x′) − z|}. (41.10.6)


(Note that if it can be shown that B_{f(b),λr} ⊆ f(B_{b,r}), then f⁻¹ will be well defined on B_{f(b),λr}, and then
S_z = {f⁻¹(z)} for all z ∈ B_{f(b),λr}.)

Let z ∈ B_{f(b),λr}. Then the map φ : B̄_{b,r} → ℝ defined by x ↦ |f(x) − z| is continuous on the compact,
connected set B̄_{b,r}. So φ(B̄_{b,r}) is a compact, connected set of real numbers by Theorems 33.5.15 and 34.4.18.
Therefore φ(B̄_{b,r}) is the bounded closed interval [c,d] for some c, d ∈ ℝ with c ≤ d. It follows that φ(x) = c
for some x ∈ B̄_{b,r}, and that ∀x′ ∈ B̄_{b,r}, |f(x) − z| ≤ |f(x′) − z| for any such choice of x. Therefore S_z ≠ ∅.


Let z ∈ B_{f(b),λr} and x ∈ S_z. Then x ∈ B̄_{b,r}. Suppose that x ∈ ∂B_{b,r} = {x′ ∈ ℝⁿ; |x′ − b| = r}.
Then |f(x) − f(b)| ≥ 2λr by line (41.10.5). So 2λr ≤ |f(x) − z| + |f(b) − z| < |f(x) − z| + λr because
z ∈ B_{f(b),λr}. Therefore |f(x) − z| > λr, and so |f(x) − z| > |f(b) − z|. This contradicts the assumption that
|f(x) − z| ≤ |f(x′) − z| for all x′ ∈ B̄_{b,r}. Therefore x ∉ ∂B_{b,r}. Thus S_z ⊆ B_{b,r} for all z ∈ B_{f(b),λr}.

Let z ∈ B_{f(b),λr} and x ∈ S_z. To show that f(x) = z, let w = f(x) − z and suppose that w ≠ 0. Then
w ∈ ℝⁿ ∖ {0}. Let v = A⁻¹w. Then v ≠ 0. (Here w may be thought of as a kind of “error vector” because
f(x) is intended to equal z. To correct this “error”, the strategy here is to subtract a small multiple of
A⁻¹w from x, which should approximately move f(x) closer to z by the same multiple of w, since A is the
“differential” of the map f.)


Let t₁ = ½ min(1, (r − |x − b|)/|v|). Then x − t₁v ∈ B_{b,r} because x ∈ B_{b,r} and t₁ ∈ (0,1). So


|f(x − t₁v) − z| = |f(x − t₁v) − f(x) + t₁w + (f(x) − z − t₁w)|
≤ |f(x − t₁v) − f(x) + t₁w| + |f(x) − z − t₁w|
= |f(x − t₁v) − f(x) + At₁v| + (1 − t₁)|w| (41.10.7)
≤ ½|At₁v| + (1 − t₁)|w| (41.10.8)
= (1 − ½t₁)|w|
= (1 − ½t₁)|f(x) − z|,


where line (41.10.7) follows from the calculation |f(x) − z − t₁Av| = |w − t₁w| = (1 − t₁)|w|, and line (41.10.8)
follows from line (41.10.4). But this means that f(x − t₁v) is closer to z than f(x), which contradicts the
definition of S_z in line (41.10.6). Hence f(x) = z. Since f is injective on B_{b,r} (because f is injective on
B_{a,δ₀}), x is unique. So S_z equals the singleton {f⁻¹(z)}. Therefore B_{f(b),λr} ⊆ f(B_{b,r}) ⊆ f(B_{a,δ₀}). In other
words, every element f(b) in f(B_{a,δ₀}) has an open neighbourhood B_{f(b),λr} which is included in f(B_{a,δ₀}).
Hence f(B_{a,δ₀}) ∈ Top_{f(a)}(ℝⁿ). This verifies the first half of line (41.10.1) with U = B_{a,δ₀}. (This means that
f has a well-defined inverse (f|_U)⁻¹ in an open neighbourhood of f(a), which completes stage (2) of the
strategy outlined in Remark 41.10.3.)

Stage (3) of the strategy in Remark 41.10.3 is to show that (f|_U)⁻¹ : f(U) → U is a continuous function
for U = B_{a,δ₀}. This follows immediately from line (41.10.5). In fact this shows that (f|_U)⁻¹ is uniformly
Lipschitz on f(U) with Lipschitz constant ½λ⁻¹.

Stage (4) of the strategy outlined in Remark 41.10.3 is to show that (f|_U)⁻¹ : f(U) → U is a differentiable
function for U = B_{a,δ₀}. Let g = (f|_U)⁻¹. Let z ∈ f(U) = Dom(g). Let r′ = min(1, d(z, ℝⁿ ∖ f(U))). (See
Definition 37.4.1 for the distance between a point and a non-empty set.) Then r′ > 0 because f(U) is open.
Let w ∈ B_{0,r′}. Then z + tw ∈ B_{z,r′} ⊆ f(U) for all t ∈ [0,1]. So g(z + tw) ∈ U for all t ∈ [0,1].


Let x = g(z) and v(t) = g(z + tw) − g(z) for t ∈ [0,1]. (This is illustrated in Figure 41.10.1.)


Figure 41.10.1 Proving that the inverse of a C¹ function is differentiable


Then tw = f(x + v(t)) − f(x) = f′(x)v(t) + Q(t) for all t ∈ [0,1], where Q : [0,1] → ℝⁿ is defined by
Q(t) = f(x + v(t)) − f(x) − f′(x)v(t) for t ∈ [0,1]. So


∀ε > 0, ∃δ > 0, ∀t ∈ [0,1], |v(t)| < δ ⇒ |Q(t)| ≤ ε|v(t)|


by the definition of the differentiability of f at x. (See Definition E 2.2.) By the choice of ε = ε₀ and δ = δ₀
in line (41.10.3), μ⁺(f′(x) − A) = μ⁺(f′(x) − f′(a)) < ε₀ = 2λ = ½μ⁻(A). So f′(x) is an invertible matrix
by Theorem 25.12.6 (ii). Let B = f′(x)⁻¹. Then tBw = B(tw) = Bf′(x)v(t) + BQ(t) = v(t) + BQ(t) for
all t ∈ [0,1]. So g(z + tw) − g(z) = v(t) = tBw − BQ(t) for all t ∈ [0,1].

Let ε₁ ∈ ℝ⁺. Then for some δ₁ ∈ ℝ⁺, |Q(t)| ≤ ε₁|v(t)| for all t ∈ [0,1] satisfying |v(t)| < δ₁. Then
|BQ(t)| ≤ μ⁺(B)|Q(t)| ≤ ε₁μ⁺(B)|v(t)|. But 2λ|v(t)| ≤ |tw| for all t ∈ [0,1] by line (41.10.5) and the
definition of v. (Alternatively, use the Lipschitz constant ½λ⁻¹ for g.) So |BQ(t)| ≤ ½ε₁λ⁻¹μ⁺(B)|t||w| for
all t ∈ [0,1] satisfying |v(t)| < δ₁. Therefore |(g(z + tw) − g(z))/t − Bw| = |t⁻¹BQ(t)| ≤ ½ε₁λ⁻¹μ⁺(B)|w|
for all t ∈ (0,1] satisfying |v(t)| < δ₁. Therefore g′(z) = B = f′(x)⁻¹ by the definition of the derivative.
(See Definition 41.2.12.) So g is differentiable on f(U). (This completes stage (4) of the strategy outlined
in Remark 41.10.3.)

Stage (5) of the strategy outlined in Remark 41.10.3 is to show that (f|_U)⁻¹ : f(U) → U is a C¹ function
for U = B_{a,δ₀}. From the formula g′(z) = f′(x)⁻¹, it follows that g′(z) = f′(g(z))⁻¹ for all z ∈ f(U). But
f′ : U → M_{n,n}(ℝ) is continuous by the assumption that f is a C¹ function, and g : f(U) → U has been
shown to be continuous in stage (3), and matrix inversion is a continuous map by Theorem 39.6.2 (ii). So g
is a C¹ function on f(U). (This completes stage (5) of the strategy outlined in Remark 41.10.3.)
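
The formula g′(z) = f′(g(z))⁻¹ can be checked numerically for a map whose local inverse is known explicitly. The following Python sketch uses the hypothetical example f(x, y) = (eˣ cos y, eˣ sin y), which has the local inverse g(u, v) = (½ log(u² + v²), atan2(v, u)) for |y| < π, and verifies that the product of finite-difference approximations to g′(z) and f′(x) is close to the identity matrix when z = f(x). All names are illustrative.

```python
# Sketch: numerical check that g'(z) f'(x) is the identity when g is the local
# inverse of f and z = f(x). The maps and names are illustrative assumptions.
import math

def f(p):
    x, y = p
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def g(z):
    """Explicit local inverse of f, valid for |y| < pi."""
    u, v = z
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def jacobian(fun, p, h=1e-6):
    """Central-difference approximation to the 2x2 matrix of partial derivatives."""
    cols = []
    for i in range(2):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        fp, fm = fun(plus), fun(minus)
        cols.append(((fp[0] - fm[0]) / (2 * h), (fp[1] - fm[1]) / (2 * h)))
    # cols[i] holds the i-th column, so transpose into rows.
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

x = (0.3, 0.4)
z = f(x)
product = matmul(jacobian(g, z), jacobian(f, x))
print(product)   # close to the 2x2 identity matrix
```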


41.10.5 REMARK: The implicit function theorem. 

The implicit function theorem states in essence that the level sets of a C¹ partial function f : ℝ^{n+m} ⇸ ℝⁿ,
for n,m ∈ ℤ₀⁺, can be represented in a neighbourhood of a non-stationary point as the graph of a C¹ partial
function g : ℝᵐ ⇸ ℝⁿ. This makes it possible to express the level sets as C¹ differentiable m-dimensional
embedded submanifolds of the ambient space ℝ^{n+m}.


For proofs of the implicit function theorem, see Rudin [129], pages 196-197; Friedman [74], pages 236-242;
Lang [23], pages 19-21; Edwards [67], pages 108, 167-169, 190-192; Rosenlicht [128], pages 174-175, 205-209;
Thomson/Bruckner/Bruckner [149], pages 509-512; Auslander/MacKenzie [1], pages 28-31; Kosinski [21],
pages 224-225; Kaplan/Lewis [99], pages 935-937.

Whereas the inverse function theorem is analogous to matrix inversion, the implicit function theorem is 
analogous to finding all solutions of a set of simultaneous linear equations. Since the solution of simultaneous 
linear equations is closely related to the inversion of a matrix, it is not surprising that the implicit function 
theorem is closely related to the inverse function theorem. In fact, Theorems 41.10.6 and 41.10.4 may be 
inferred from each other. 
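
As a concrete illustration with n = m = 1, the following Python sketch computes an implicit function g numerically for the hypothetical example f(x, y) = x + x³ + y² at the point (p, q) = (1, 1), where the partial derivative of f with respect to x is 4 and is therefore invertible. Newton iteration in the variable x, which is an assumed numerical method and not part of the theorem, is used to solve f(x, y) = f(p, q) for x = g(y). The slope of g at q is compared with −A⁻¹B = −2/4, in line with the linear equation analogy. All names are illustrative.

```python
# Sketch: numerically computing an implicit function g with f(g(y), y) = f(p, q).
# The function f, the point (P, Q) and the solver are illustrative assumptions.

def f(x, y):
    return x + x ** 3 + y * y

def dfdx(x, y):
    """Partial derivative of f with respect to x (the 1x1 matrix A when evaluated at (P, Q))."""
    return 1.0 + 3.0 * x * x

P, Q = 1.0, 1.0
LEVEL = f(P, Q)

def g(y, steps=30):
    """Solve f(x, y) = LEVEL for x by Newton iteration starting from x = P."""
    x = P
    for _ in range(steps):
        x -= (f(x, y) - LEVEL) / dfdx(x, y)
    return x

ys = [0.8, 0.9, 1.0, 1.1, 1.2]
residuals = [abs(f(g(y), y) - LEVEL) for y in ys]

h = 1e-5
slope = (g(Q + h) - g(Q - h)) / (2 * h)   # approximates g'(q)
print(residuals, slope)
```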


41.10.6 THEOREM: Implicit function theorem.

Let f ∈ C¹(Ω, ℝⁿ) for some Ω ∈ Top(ℝ^{n+m}) and n, m ∈ ℤ₀⁺. Let (p,q) ∈ Ω with p ∈ ℝⁿ and q ∈ ℝᵐ,
and suppose that the matrix [∂_j f_i(p,q)]_{i,j=1}^n is invertible. Then for some W ∈ Top_q(ℝᵐ), there is a unique
g ∈ C¹(W, ℝⁿ) which satisfies g(q) = p and


∀y ∈ W, f(g(y), y) = f(p, q). (41.10.9)


PROOF: Let f ∈ C¹(Ω,ℝⁿ) for some Ω ∈ Top(ℝ^{n+m}) and n,m ∈ ℤ₀⁺. Then f is continuously totally
differentiable on Ω by Theorem 41.6.15. Let (p,q) ∈ Ω with p ∈ ℝⁿ and q ∈ ℝᵐ.

Define A = [∂_j f_i(p,q)]_{i,j=1}^n, B = [∂_{n+j} f_i(p,q)]_{i=1,j=1}^{n,m} and C = [∂_j f_i(p,q)]_{i=1,j=1}^{n,n+m},
so that C ∈ M_{n,n+m}(ℝ) is the concatenation of A ∈ M_{n,n}(ℝ) and B ∈ M_{n,m}(ℝ) with respect to the index j.
Then A is assumed invertible. Define F : Ω → ℝ^{n+m} by F(x,y) = (f(x,y), y) for all (x,y) ∈ Ω. Then
F ∈ C¹(Ω, ℝ^{n+m}). Define ρ : Ω → ℝⁿ by ρ(x,y) = f(x,y) − f(p,q) − C(x − p, y − q) for all (x,y) ∈ Ω.
Let ε ∈ ℝ⁺. Then


∃δ ∈ ℝ⁺, ∀(x,y) ∈ Ω,
|(x − p, y − q)|_{ℝ^{n+m}} < δ ⇒ |ρ(x,y)|_{ℝⁿ} ≤ ε|(x − p, y − q)|_{ℝ^{n+m}}

by Definition 41.6.4. Define the matrix D ∈ M_{n+m,n+m}(ℝ) by


D = [ A  B   ]
    [ 0  I_m ],


where 0 denotes the m × n zero matrix and I_m denotes the m × m identity matrix. Then


∀(x,y) ∈ Ω, F(x,y) − F(p,q) = (f(x,y) − f(p,q), y − q)
= (C(x − p, y − q) + ρ(x,y), y − q)
= (A(x − p) + B(y − q), y − q) + (ρ(x,y), 0)
= D(x − p, y − q) + (ρ(x,y), 0).


So |F(x,y) − F(p,q) − D(x − p, y − q)|_{ℝ^{n+m}} = |(ρ(x,y), 0)|_{ℝ^{n+m}} = |ρ(x,y)|_{ℝⁿ} for all (x,y) ∈ Ω. Therefore


∃δ ∈ ℝ⁺, ∀(x,y) ∈ Ω,
|(x − p, y − q)|_{ℝ^{n+m}} < δ ⇒ |F(x,y) − F(p,q) − D(x − p, y − q)|_{ℝ^{n+m}} ≤ ε|(x − p, y − q)|_{ℝ^{n+m}}.

Hence D is the total derivative of F at (p,q) by Definition 41.6.4. But D is an invertible matrix because A
is invertible. In fact,


D⁻¹ = [ A⁻¹  −A⁻¹B ]
      [ 0     I_m   ],


where I_m denotes the m × m identity matrix. Therefore by Theorem 41.10.4, the inverse function theorem,
there is a set U ∈ Top_{(p,q)}(ℝ^{n+m}) such that F(U) ∈ Top_{F(p,q)}(ℝ^{n+m}) and F|_U : U → F(U) is a
C¹ diffeomorphism, and G = (F|_U)⁻¹ : V = F(U) → U is a well-defined injection in C¹(V, ℝ^{n+m}).
Express G(w,z) as (φ(w,z), ψ(w,z)) for all (w,z) ∈ V. Then
(x,y) = G(F(x,y)) = G(f(x,y), y) = (φ(f(x,y), y), ψ(f(x,y), y)) for all (x,y) ∈ U. Therefore ψ(F(x,y)) =
ψ(f(x,y), y) = y for all (x,y) ∈ U. So ψ(w,z) = z for all (w,z) ∈ V because Range(F|_U) = V. Thus
G(w,z) = (φ(w,z), z) for all (w,z) ∈ V.

Let W = {z ∈ ℝᵐ; ∃w ∈ ℝⁿ, (w,z) ∈ V}. Then q ∈ W because (f(p,q),q) ∈ V, and so W ∈ Top_q(ℝᵐ)
by Theorem 32.10.3 (iv) because V ∈ Top(ℝ^{n+m}). Define g : W → ℝⁿ by g(y) = φ(f(p,q), y) for
all y ∈ W. Then g ∈ C¹(W,ℝⁿ) because φ and f are C¹ functions, and (f(g(y),y), y) = F(g(y), y) =
F(φ(f(p,q), y), y) = F(φ(f(p,q), y), ψ(f(p,q), y)) = F(G(f(p,q), y)) = (f(p,q), y) for all y ∈ W. So
f(g(y), y) = f(p,q) for all y ∈ W. Similarly, (g(q), q) = (φ(f(p,q), q), q) = (φ(F(p,q)), ψ(F(p,q))) =
G(F(p,q)) = (p,q). So g(q) = p.

To show that g : W → ℝⁿ satisfying g(q) = p and line (41.10.9) is unique, let g̃ : W → ℝⁿ be any function
which satisfies g̃(q) = p and line (41.10.9), and let y ∈ W. Then (g̃(y), y) = G(F(g̃(y), y)) =
G(f(g̃(y), y), y) = G(f(p,q), y) = (φ(f(p,q), y), y) = (g(y), y). So g̃(y) = g(y). Hence g is unique.


Chapter 42


HIGHER-ORDER DERIVATIVES


42.1 Higher-order derivatives for real-to-real functions
42.2 Higher-order derivatives for real functions on Cartesian spaces
42.8 Power series and analytic functions


42.1. Higher-order derivatives for real-to-real functions 


42.1.1 REMARK: Considerations for the formalisation of higher-order derivatives. 

Since the kth derivative of a function f at a point x ∈ ℝ depends on two variables, k and x, one must
make a choice of order for the formalisation of higher-order derivatives. One may think of these derivatives
as a sequence of partially defined functions or a sequence-valued function. In the former case, the function
is of the form ℤ₀⁺ → (ℝ ⇸ ℝ). In the latter case, the form is ℝ → (ℤ₀⁺ ⇸ ℝ), or more precisely, ℝ →
List(ℝ). (See Definition 10.9.2 and Notation 10.9.3 for partially defined functions. See Definition 14.12.12
and Notation 14.12.13 for extended list spaces.)


formalisation               functional form

function-valued sequence    ℤ₀⁺ → (ℝ ⇸ ℝ)
sequence-valued function    ℝ → (ℤ₀⁺ ⇸ ℝ)   [ℝ → List(ℝ)]


The sequence-valued function representation for higher-order derivatives is clumsy. In this representation, a
sequence of higher-order derivatives f(x), f′(x), f″(x), ... is attached to each point x ∈ ℝ. This is unnatural
in the sense that the computation of each derivative f⁽ᵏ⁾(x) requires as input the values of the function
f⁽ᵏ⁻¹⁾(t) for t in a neighbourhood of x, not just the value f⁽ᵏ⁻¹⁾(x). So it is not possible to define the
values f⁽ᵏ⁾(x) at a single point x as an inductive sequence. The sequence-valued function representation has
the minor advantage that it forces the values f⁽ˡ⁾(x) to be defined for l < k if f⁽ᵏ⁾(x) is defined.


The comparison of advantages seems to strongly favour the “sequence of partially defined functions” style
of representation as in Definition 42.1.4. The domains of the partial functions in this sequence are non-
increasing. The set of partially defined functions is closed under differentiation, whereas a set of functions
with a fixed domain is not. (This is illustrated by the example in Figure 42.1.1.)


Definition 42.1.2 makes use of Definition 40.3.4 for the differentiability of f at x, and Definition 40.4.4 and
Notation 40.4.8 for the value of the derivative of f at x. The main advantage of Definition 42.1.2 compared
to Definition 40.4.4 is that it is well defined for real-valued functions on any subset of ℝ, which implies that
it is well defined when it is applied any finite number of times.


42.1.2 DEFINITION: The derivative of a partial function f : ℝ ⇸ ℝ is the partial function
{(x, f′(x)); x ∈ Int(Dom(f)) and f is differentiable at x}.


42.1.3 NOTATION: ∂f, for a partial function f : ℝ ⇸ ℝ, denotes the derivative of f.


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 

[Figure: graphs of f, f′, f″ and f‴, with Dom(f) = ℝ, Dom(f′) = ℝ ∖ {1}, Dom(f″) = ℝ ∖ {1, 2} and Dom(f‴) = ℝ ∖ {1, 2}]


Figure 42.1.1 Higher derivatives may be partially defined 


42.1.4 DEFINITION: The sequence of higher-order derivatives of a partially defined real function f : ℝ ⇸ ℝ
is the sequence F : ℤ₀⁺ → (ℝ ⇸ ℝ) defined inductively by F₀ = f and ∀k ∈ ℤ₀⁺, F_{k+1} = ∂F_k.


The k-fold derivative of a partial function f : ℝ ⇸ ℝ, for k ∈ ℤ₀⁺, is the element F_k of the sequence of
higher-order derivatives of f.


42.1.5 NOTATION: ∂ᵏf, for a partial function f : ℝ ⇸ ℝ and k ∈ ℤ₀⁺, denotes the k-fold derivative of f.


42.1.6 REMARK: The domains of higher-order derivatives form a non-increasing sequence of sets.
Definition 42.1.4 implies that Dom(∂ᵏ⁺¹f) ⊆ Int(Dom(∂ᵏf)) ⊆ Dom(∂ᵏf) for all k ∈ ℤ₀⁺.
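
The shrinking of domains can be made concrete for a very restricted class of partial functions. The following Python sketch represents a partial function by polynomial pieces on the open intervals between finitely many breakpoints, together with the set of breakpoints at which the function is defined. This representation, the class name and the helper names are assumptions made only for this illustration. The value at a defined breakpoint is taken from the right-hand piece, and exact floating-point comparison is adequate only because the example has exactly representable values. The example f(x) = x|x| has Dom(f) = Dom(∂f) = ℝ and Dom(∂²f) = Dom(∂³f) = ℝ ∖ {0}.

```python
# Sketch: the sequence of higher-order derivatives of a piecewise polynomial
# partial function, tracking which breakpoints drop out of the domain.
# The representation and all names are illustrative assumptions.

def poly_eval(c, x):
    """Evaluate the polynomial with coefficients c (lowest degree first) at x."""
    return sum(a * x ** k for k, a in enumerate(c))

def poly_diff(c):
    """Coefficients of the derivative polynomial."""
    return [k * a for k, a in enumerate(c)][1:] or [0.0]

class PiecewisePoly:
    """Polynomial pieces on the open intervals between breakpoints. The function
    is defined at a breakpoint b only if b is in `defined`, with the value there
    taken from the right-hand piece."""

    def __init__(self, breaks, pieces, defined):
        self.breaks = list(breaks)
        self.pieces = list(pieces)
        self.defined = set(defined)

    def derivative(self):
        keep = set()
        for i, b in enumerate(self.breaks):
            left, right = self.pieces[i], self.pieces[i + 1]
            # b stays in the domain only if the function is defined and continuous
            # at b and the one-sided derivatives at b agree.
            if (b in self.defined
                    and poly_eval(left, b) == poly_eval(right, b)
                    and poly_eval(poly_diff(left), b) == poly_eval(poly_diff(right), b)):
                keep.add(b)
        return PiecewisePoly(self.breaks, [poly_diff(c) for c in self.pieces], keep)

    def excluded_points(self):
        """Breakpoints which are not in the domain."""
        return sorted(set(self.breaks) - self.defined)

# f(x) = x|x|: the piece -x^2 for x < 0 and the piece x^2 for x > 0, defined at 0.
F = PiecewisePoly([0.0], [[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]], {0.0})
excluded = []
for k in range(4):
    excluded.append(F.excluded_points())   # points missing from Dom of the k-fold derivative
    F = F.derivative()
print(excluded)
```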


42.1.7 DEFINITION: A partial function f : ℝ ⇸ ℝ is said to have an undefined k-fold derivative at
x ∈ Dom(f), for k ∈ ℤ⁺, if x ∉ Dom(∂ᵏf).


A function f : ℝ ⇸ ℝ is said to be k-times differentiable at x ∈ Dom(f), for k ∈ ℤ₀⁺, if x ∈ Dom(∂ᵏf).
A function f : ℝ ⇸ ℝ is said to be k-times differentiable on a set S ⊆ Dom(f), for k ∈ ℤ₀⁺, if S ⊆ Dom(∂ᵏf).


42.1.8 REMARK: Linear space structure for spaces of higher-order differentiable real functions. 

The spaces of functions in Notations 42.1.9 and 42.1.10 are closed under pointwise addition and multiplication 
by real numbers. So these spaces have a natural linear space structure with respect to these operations. (See 
Notation 31.12.11 for the space of continuous functions C(U, ℝ) in Notation 42.1.10.)


42.1.9 NOTATION: Dᵏ(U,ℝ), for an open subset U of ℝ and k ∈ ℤ₀⁺, denotes the set of functions f : U → ℝ
for which Dom(∂ᵏf) = U. In other words,


∀U ∈ Top(ℝ), ∀k ∈ ℤ₀⁺, Dᵏ(U,ℝ) = {f : U → ℝ; Dom(∂ᵏf) = U}.


D^∞(U, ℝ), for an open subset U of ℝ, denotes the set ⋂_{k=0}^∞ Dᵏ(U, ℝ).
Dᵏ(U) is an abbreviation for Dᵏ(U, ℝ).
D^∞(U) is an abbreviation for D^∞(U, ℝ).


42.1.10 NOTATION: Cᵏ(U,ℝ), for U ∈ Top(ℝ) and k ∈ ℤ₀⁺, denotes the set of f : U → ℝ such that
Dom(∂ᵏf) = U and ∂ᵏf is continuous on U. In other words,


∀U ∈ Top(ℝ), ∀k ∈ ℤ₀⁺,
Cᵏ(U, ℝ) = {f ∈ Dᵏ(U, ℝ); ∂ᵏf ∈ C(U, ℝ)}
= {f : U → ℝ; Dom(∂ᵏf) = U and ∂ᵏf ∈ C(U,ℝ)}.


C^∞(U, ℝ), for an open subset U of ℝ, denotes the set ⋂_{k=0}^∞ Cᵏ(U, ℝ). (This is the same as D^∞(U,ℝ).)


Cᵏ(U) is an abbreviation for Cᵏ(U, ℝ).
C^∞(U) is an abbreviation for C^∞(U, ℝ).


42.1.11 DEFINITION: A Cᵏ differentiable function from U to ℝ, for k ∈ ℤ₀⁺ and U ∈ Top(ℝ), is any
element of Cᵏ(U, ℝ).
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42.1.12 REMARK: Interpretation of infinite superscripts for OF function spaces. 

Many definitions and theorems are stated in terms of $C^{k+1}$ functions with $k \in \bar{\mathbb{Z}}^+_0 = \mathbb{Z}^+_0 \cup \{\infty\}$, or some such
set of extended non-negative integers. The question then arises as to what "$C^{\infty+1}$" might mean. This can
be interpreted in terms of some kind of infinite arithmetic where $\infty + j = \infty$ for all $j \in \mathbb{Z}$, which has dubious
validity. A preferable interpretation is that any superscript expression for a continuous differentiability class
$C^k$ which contains "$\infty$" should be constructed as an infinite intersection as for $C^\infty(U, \mathbb{R})$ in Notation 42.1.10.
Thus "$C^{\infty+j}$" should be interpreted as $\bigcap_{k=0}^\infty C^{k+j}$. In other words, the presence of the symbol "$\infty$" in the
superscript means that the function is in every continuous differentiability class with an integer substituted
for the symbol "$\infty$". Thus "$C^{\infty+j}$" means the same as "$C^\infty$" although "$\infty + j$" might be ill-defined.
When $j < 0$, the expression "$C^{\infty+j}$" cannot strictly be interpreted as $\bigcap_{k=0}^\infty C^{k+j}$ because $C^{-1}$ is undefined.
Therefore this interpretation must be limited to classes $C^{k+j}$ which are well defined. (Defining negative
differentiability classes to include all functions might sound attractive, but this is strictly "verboten" in
good mathematics.) This shows once again the delicate trade-off between expediency and exactness, between
clarity and correctness, in the choice of notations.
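
The intersection interpretation can be written out for the case $j = 1$. The following is only a sketch, and it assumes the inclusion $C^{k+1}(U) \subseteq C^k(U)$ for all $k \in \mathbb{Z}^+_0$ (stated for Cartesian spaces in Theorem 42.2.11 (v)).

```latex
\[
  C^{\infty+1}(U)
  \;=\; \bigcap_{k=0}^{\infty} C^{k+1}(U)
  \;=\; \bigcap_{k=1}^{\infty} C^{k}(U)
  \;=\; C^{0}(U) \cap \bigcap_{k=1}^{\infty} C^{k}(U)
  \;=\; C^{\infty}(U).
\]
% The third equality uses C^1(U) \subseteq C^0(U), so adjoining C^0(U) does not shrink the intersection.
```

For $j = -1$, the same computation goes through only if the intersection is restricted to the indices $k \geq 1$ for which $C^{k-1}$ is defined.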


42.1.13 REMARK: Constant real functions are infinitely continuously differentiable. 

Theorem 42.1.14 is a trivial exercise to determine whether the definitions and notations are workable. But
the result is also useful. (Some related trivial exercises for constant functions are the proofs of Theorems 
40.5.5, 41.1.16 and 41.2.23. For the extension of Theorem 42.1.14 to real-valued functions on Cartesian 
spaces, see Theorem 42.2.16.) 


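42.1.14 THEOREM: Constant real functions of a real variable are $C^\infty$.
Let $U \in \mathrm{Top}(\mathbb{R})$ and $c \in \mathbb{R}$. Define $f : U \to \mathbb{R}$ by $f(x) = c$ for all $x \in U$. Then $f \in C^\infty(U)$, and $\partial^k f(p) = 0$
for all $p \in U$ and $k \in \mathbb{Z}^+$.


PROOF: By Theorem 40.5.5, $f$ is differentiable on $U$, and $f'(p) = 0$ for all $p \in U$. So $\mathrm{Dom}(\partial f) = U$ by
Definition 42.1.2 and Notation 42.1.3, and $(\partial f)(p) = f'(p) = 0$ for all $p \in U$.

Assume for some $k \in \mathbb{Z}^+$ that $\mathrm{Dom}(\partial^k f) = U$ and $(\partial^k f)(p) = 0$ for all $p \in U$. Then $\mathrm{Dom}(\partial^{k+1} f) = U$ and
$(\partial^{k+1} f)(p) = (\partial(\partial^k f))(p) = 0$ for all $p \in U$ by Theorem 40.5.5 and Definition 42.1.4 and Notation 42.1.5.
Since $f$ and every $\partial^k f$ are constant on $U$, they are continuous on $U$.
Hence $f \in C^\infty(U)$ by Notation 42.1.10 by induction on $k \in \mathbb{Z}^+$.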


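42.1.15 THEOREM: A function with $C^k$ derivative must be $C^{k+1}$.
Let $U \in \mathrm{Top}(\mathbb{R})$. Let $f : U \to \mathbb{R}$ be differentiable on $U$.

(i) $\forall k \in \mathbb{Z}^+_0,\ (f' \in C^k(U) \Rightarrow f \in C^{k+1}(U))$.

(ii) $f' \in C^\infty(U) \Rightarrow f \in C^\infty(U)$.

PROOF: For part (i), let $k \in \mathbb{Z}^+_0$ and suppose that $f$ is differentiable on $U$ with $f' \in C^k(U)$. Then
$\mathrm{Dom}(\partial^k f') = U$ and $\partial^k f' \in C(U, \mathbb{R})$ by Notation 42.1.10. But by Notation 42.1.3, $\partial^k f' = \partial^k(\partial f) = \partial^{k+1} f$
by Definition 42.1.4 and Notation 42.1.5. Therefore $\mathrm{Dom}(\partial^{k+1} f) = U$ and $\partial^{k+1} f \in C(U, \mathbb{R})$. Hence
$f \in C^{k+1}(U)$ by Notation 42.1.10.

For part (ii), suppose that $f$ is differentiable on $U$ with $f' \in C^\infty(U)$. Then $f' \in C^k(U, \mathbb{R})$ for all $k \in \mathbb{Z}^+_0$ by
Notation 42.1.10. So $f \in C^{k+1}(U)$ for all $k \in \mathbb{Z}^+_0$ by part (i), and $f \in C^0(U)$ because differentiability
implies continuity. Hence $f \in C^\infty(U)$ by Notation 42.1.10.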


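42.1.16 THEOREM: Differentiability of the identity map on the real numbers.
Let $U \in \mathrm{Top}(\mathbb{R})$. Then $\mathrm{id}_U \in C^\infty(U)$ and $\partial^k \mathrm{id}_U(p) = \delta_{k1}$ (that is, 1 if $k = 1$ and 0 if $k \geq 2$) for all $p \in U$ and $k \in \mathbb{Z}^+$.


PROOF: By Theorem 40.5.12 (ii), $\mathrm{id}_U$ is differentiable on $U$ and $\mathrm{id}_U'(p) = 1$ for all $p \in U$. So $\mathrm{Dom}(\partial\,\mathrm{id}_U) = U$
by Definition 42.1.2 and Notation 42.1.3, and $(\partial\,\mathrm{id}_U)(p) = \mathrm{id}_U'(p) = 1$ for all $p \in U$. Then $\mathrm{id}_U' \in C^\infty(U)$
and $\partial^{1+k} \mathrm{id}_U(p) = \partial^k \partial\,\mathrm{id}_U(p) = 0$ for all $k \in \mathbb{Z}^+$ by Theorem 42.1.14 because $\mathrm{id}_U'$ is constant on $U$. Hence
$\mathrm{id}_U \in C^\infty(U)$ by Theorem 42.1.15 (ii), and $\partial^k \mathrm{id}_U(p) = \delta_{k1}$ for all $p \in U$ and $k \in \mathbb{Z}^+$.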


42.1.17 REMARK: Higher derivatives of composites of real-valued functions. 
Theorem 40.5.17 (i) may be generalised to higher-order composition rules for differentiation with a single
variable as follows, where $h = f \circ g$ and each derivative of $f$ is evaluated at $g$.


\[
\begin{aligned}
h'' &= f''\,(g')^2 + f'\,g'' \\
h''' &= f'''\,(g')^3 + 3f''\,g''\,g' + f'\,g''' \\
h'''' &= f''''\,(g')^4 + 6f'''\,g''\,(g')^2 + 3f''\,(g'')^2 + 4f''\,g'''\,g' + f'\,g'''',
\end{aligned}
\]

and so forth. It is assumed that $g \in C^k(U)$ and $f \in C^k(V)$, where $k$ is the order of the derivative of $h$.
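
The composition rules for the second, third and fourth derivatives can be checked exactly on polynomials. The following is a minimal sketch, not part of the book's development: polynomials are represented as integer coefficient lists, and all helper names (`p_add`, `p_mul`, `p_deriv`, `p_eval`, `p_compose`, `check_composition_rules`) are hypothetical names chosen for this illustration.

```python
# Exact check of the composition rules for h = f o g using integer polynomials.
# A polynomial is a coefficient list [c0, c1, c2, ...] meaning c0 + c1*x + c2*x^2 + ...

def p_add(a, b):
    """Sum of two polynomials."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def p_mul(a, b):
    """Product of two polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def p_deriv(a, order=1):
    """Derivative of the given order."""
    for _ in range(order):
        a = [i * c for i, c in enumerate(a)][1:] or [0]
    return a

def p_eval(a, x):
    """Evaluate by Horner's rule."""
    acc = 0
    for c in reversed(a):
        acc = acc * x + c
    return acc

def p_compose(f, g):
    """Coefficients of the composite f(g(x)), again by Horner's rule."""
    out = [0]
    for c in reversed(f):
        out = p_add(p_mul(out, g), [c])
    return out

def check_composition_rules(f, g, x):
    """Compare direct derivatives of h = f o g at x with the three displayed rules."""
    h = p_compose(f, g)
    y = p_eval(g, x)
    F = [p_eval(p_deriv(f, k), y) for k in range(5)]  # F[k] = k-th derivative of f at g(x)
    G = [p_eval(p_deriv(g, k), x) for k in range(5)]  # G[k] = k-th derivative of g at x
    H = [p_eval(p_deriv(h, k), x) for k in range(5)]  # H[k] = k-th derivative of h at x
    rule2 = F[2] * G[1] ** 2 + F[1] * G[2]
    rule3 = F[3] * G[1] ** 3 + 3 * F[2] * G[2] * G[1] + F[1] * G[3]
    rule4 = (F[4] * G[1] ** 4 + 6 * F[3] * G[2] * G[1] ** 2
             + 3 * F[2] * G[2] ** 2 + 4 * F[2] * G[3] * G[1] + F[1] * G[4])
    return H[2] == rule2 and H[3] == rule3 and H[4] == rule4
```

For example, `check_composition_rules([0, -1, 0, 2, 1], [1, -2, 0, 1], 2)` tests $f(y) = y^4 + 2y^3 - y$ and $g(x) = x^3 - 2x + 1$ at $x = 2$. Integer arithmetic is used so that the comparison is exact rather than subject to rounding.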


42.1.18 REMARK: Formulas for higher derivatives of inverse functions. 
It is sometimes useful to be able to calculate the derivatives of inverse functions. Some example computations 
are given in Theorem 42.1.19. 


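42.1.19 THEOREM: Formulas for first, second and third derivatives of inverse functions.
Let $f : \Omega_1 \to \Omega_2$ be a $C^1$ bijection for open sets $\Omega_1, \Omega_2 \subseteq \mathbb{R}$. Let $t \in \Omega_2$ be such that $f'(f^{-1}(t)) \neq 0$.


(i) $\partial_t f^{-1}(t) = f'(f^{-1}(t))^{-1}$.
(ii) $\partial_t^2 f^{-1}(t) = -f''(f^{-1}(t))\, f'(f^{-1}(t))^{-3}$ if $f$ is $C^2$.
(iii) $\partial_t^3 f^{-1}(t) = -f'''(f^{-1}(t))\, f'(f^{-1}(t))^{-4} + 3 f''(f^{-1}(t))^2 f'(f^{-1}(t))^{-5}$ if $f$ is $C^3$.

PROOF: Let $f : \Omega_1 \to \Omega_2$ be a $C^1$ bijection for open sets $\Omega_1, \Omega_2 \subseteq \mathbb{R}$. Let $g = f^{-1}$. Then $f(g(t)) = t$ for
all $t \in \Omega_2$. Let $t \in \Omega_2$ be such that $f'(f^{-1}(t)) \neq 0$.

To prove part (i), $\partial_t f(g(t)) = f'(g(t))\, g'(t) = 1$. So $g'(t) = f'(g(t))^{-1}$. Hence $\partial_t f^{-1}(t) = f'(f^{-1}(t))^{-1}$.

To prove part (ii), $\partial_t^2 f(g(t)) = f''(g(t))\, g'(t)^2 + f'(g(t))\, g''(t) = 0$. So $g''(t) = -f''(g(t))\, g'(t)^2 / f'(g(t))$. But
$g'(t) = f'(g(t))^{-1}$. So $g''(t) = -f''(g(t))\, f'(g(t))^{-3}$. That is, $\partial_t^2 f^{-1}(t) = -f''(f^{-1}(t))\, f'(f^{-1}(t))^{-3}$.

Alternatively, part (ii) may be proved by differentiating part (i). Thus $\partial_t^2 f^{-1}(t) = \partial_t\bigl(f'(f^{-1}(t))^{-1}\bigr) =
-f'(f^{-1}(t))^{-2} f''(f^{-1}(t))\, \partial_t f^{-1}(t) = -f'(f^{-1}(t))^{-2} f''(f^{-1}(t))\, f'(f^{-1}(t))^{-1}$ by part (i). Hence $\partial_t^2 f^{-1}(t) =
-f''(f^{-1}(t))\, f'(f^{-1}(t))^{-3}$.


For part (iii), differentiating $f \circ g = \mathrm{id}_{\Omega_2}$ three times gives


\[ f'''(g(t))\, g'(t)^3 + 2 f''(g(t))\, g'(t)\, g''(t) + f''(g(t))\, g'(t)\, g''(t) + f'(g(t))\, g'''(t) = 0. \]


Then

\[
\begin{aligned}
f'(g(t))\, g'''(t) &= -f'''(g(t))\, g'(t)^3 - 3 f''(g(t))\, g'(t)\, g''(t) \\
&= -f'''(g(t))\, f'(g(t))^{-3} - 3 f''(g(t))\, f'(g(t))^{-1} \bigl(-f''(g(t))\, f'(g(t))^{-3}\bigr)
\end{aligned}
\]

by parts (i) and (ii). Therefore $g'''(t) = -f'''(g(t))\, f'(g(t))^{-4} + 3 f''(g(t))^2 f'(g(t))^{-5}$. Hence $\partial_t^3 f^{-1}(t) =
-f'''(f^{-1}(t))\, f'(f^{-1}(t))^{-4} + 3 f''(f^{-1}(t))^2 f'(f^{-1}(t))^{-5}$.

Alternatively, part (iii) follows by differentiating $\partial_t^2 f^{-1}(t) = -f''(f^{-1}(t))\, f'(f^{-1}(t))^{-3}$ to give


\[
\begin{aligned}
\partial_t^3 f^{-1}(t) &= \partial_t\bigl(-f''(f^{-1}(t))\, f'(f^{-1}(t))^{-3}\bigr) \\
&= -f'''(f^{-1}(t))\, \partial_t f^{-1}(t)\, f'(f^{-1}(t))^{-3} + 3 f''(f^{-1}(t))\, f'(f^{-1}(t))^{-4} f''(f^{-1}(t))\, \partial_t f^{-1}(t) \\
&= -f'''(f^{-1}(t))\, f'(f^{-1}(t))^{-4} + 3 f''(f^{-1}(t))^2 f'(f^{-1}(t))^{-5}.
\end{aligned}
\]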
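
The three inverse-derivative formulas can be sanity-checked on a case where the inverse is known in closed form. The sketch below assumes $f = \exp$, so that $f^{-1} = \log$ with derivatives $1/t$, $-1/t^2$ and $2/t^3$. The function names are hypothetical and the example is not taken from the book.

```python
import math

def inverse_derivatives(d1, d2, d3):
    """Given f', f'', f''' evaluated at f^{-1}(t), return the first three
    derivatives of f^{-1} at t, following parts (i), (ii) and (iii)."""
    g1 = 1.0 / d1
    g2 = -d2 / d1 ** 3
    g3 = -d3 / d1 ** 4 + 3.0 * d2 ** 2 / d1 ** 5
    return g1, g2, g3

def check_with_exp(t):
    """Assumed test case: f = exp, so f' = f'' = f''' = exp and f^{-1} = log."""
    x = math.log(t)          # x = f^{-1}(t)
    e = math.exp(x)          # common value of f'(x), f''(x), f'''(x)
    g1, g2, g3 = inverse_derivatives(e, e, e)
    # Compare with the known derivatives of log at t.
    return (math.isclose(g1, 1.0 / t, rel_tol=1e-12)
            and math.isclose(g2, -1.0 / t ** 2, rel_tol=1e-12)
            and math.isclose(g3, 2.0 / t ** 3, rel_tol=1e-12))
```

Calling `check_with_exp(2.5)` compares the formula values with the derivatives of the logarithm at $t = 2.5$. A relative tolerance is used because the computation is in floating point.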


42.1.20 THEOREM: Bounds on second derivatives at minimum and maximum points of a function. 
Let $S$ be a subset of $\mathbb{R}$. Let $f : S \to \mathbb{R}$. Let $f$ be differentiable on $U = \mathrm{Int}(S)$.


(i) If $p \in U$ and $f(p) = \min(f)$ and $f'$ is differentiable at $p$, then $f''(p) \geq 0$.
(ii) If $p \in U$ and $f(p) = \max(f)$ and $f'$ is differentiable at $p$, then $f''(p) \leq 0$.


PROOF: For part (i), let $f'$ be differentiable at $p \in U$. Suppose that $f(p) = \min(f)$ and $f''(p) < 0$. It follows
from Theorem 40.6.2 (ii) that $f'(p) = 0$. Let $w = f''(p)$ and $\varepsilon = -w/2 \in \mathbb{R}^+$. Then by Definition 40.4.4,
there is a $\delta_0 \in \mathbb{R}^+$ such that $|f'(x) - w(x - p)| < \varepsilon |x - p|$ for all $x \in (p - \delta_0, p + \delta_0) \cap (U \setminus \{p\})$. This implies
$f'(x) < (w + \varepsilon)(x - p) < 0$ for all $x \in (p, p + \delta_0) \cap U$. Let $\delta_1 = \sup\{\delta \in (0, \delta_0);\ (p, p + \delta) \subseteq U\}$. Then $0 < \delta_1 \leq \delta_0$
because $U \in \mathrm{Top}(\mathbb{R})$, and then $f'(x) < (w + \varepsilon)(x - p) < 0$ for all $x \in (p, p + \delta_1)$. So $f(x) < f(p)$ for all
$x \in (p, p + \delta_1)$ by Theorem 40.6.9 (iv). This contradicts the assumption that $f(p) = \min(f)$. Hence $f''(p) \geq 0$.


For part (ii), apply part (i) to $-f$.


42.1.21 REMARK: The relation between the second derivative and convexity/concavity of functions. 
Theorem 42.1.22 shows that if a real-valued function on a real interval has a non-negative second derivative 
on the interval, then it must be convex. Similarly, the function must be concave if the second derivative is 
non-positive. (See Definition 23.12.2 for convex and concave functions.)


42.1.22 THEOREM: Second-derivative bounds which imply convexity or concavity.

Let $I$ be a real interval. Let $f : I \to \mathbb{R}$ be continuous on $I$ and twice differentiable on $\mathrm{Int}(I)$.
(i) If $f''(x) \geq 0$ for all $x \in \mathrm{Int}(I)$, then $\forall x_0 \in \mathrm{Int}(I),\ (f'(x_0) = 0 \Rightarrow f(x_0) = \min_{x \in I} f(x))$.
(ii) If $f''(x) \geq 0$ for all $x \in \mathrm{Int}(I)$, then $\forall a, b \in I,\ \forall x \in [[a, b]],\ f(x) \leq \max(f(a), f(b))$.

(iii) If $f''(x) \geq 0$ for all $x \in \mathrm{Int}(I)$, then $f$ is convex on $I$.

(iv) If $f''(x) \leq 0$ for all $x \in \mathrm{Int}(I)$, then $\forall x_0 \in \mathrm{Int}(I),\ (f'(x_0) = 0 \Rightarrow f(x_0) = \max_{x \in I} f(x))$.
(v) If $f''(x) \leq 0$ for all $x \in \mathrm{Int}(I)$, then $\forall a, b \in I,\ \forall x \in [[a, b]],\ f(x) \geq \min(f(a), f(b))$.

(vi) If $f''(x) \leq 0$ for all $x \in \mathrm{Int}(I)$, then $f$ is concave on $I$.


PROOF: For part (i), $f'$ is non-decreasing on $[a, b]$ for all $a, b \in \mathrm{Int}(I)$ with $a < b$ by Theorem 40.6.9 (i). Therefore $f'$ is
non-decreasing on $\mathrm{Int}(I)$. Consequently the assertion follows from Theorem 40.6.11 (i).


Part (ii) follows from Theorem 40.6.11 (ii) because $f'$ is non-decreasing on $\mathrm{Int}(I)$ as in part (i).
Part (iii) follows from Theorem 40.6.11 (iii) because $f'$ is non-decreasing on $\mathrm{Int}(I)$ as in part (i).


For part (iv), $f'$ is non-increasing on $[a, b]$ for all $a, b \in \mathrm{Int}(I)$ with $a < b$ by Theorem 40.6.9 (iii). So $f'$ is non-increasing
on $\mathrm{Int}(I)$. Consequently the assertion follows from Theorem 40.6.11 (iv).


Part (v) follows from Theorem 40.6.11 (v) because $f'$ is non-increasing on $\mathrm{Int}(I)$ as in part (iv).


Part (vi) follows from Theorem 40.6.11 (vi) because $f'$ is non-increasing on $\mathrm{Int}(I)$ as in part (iv).
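
Parts (ii) and (iii) can be illustrated on a concrete function. The sketch below assumes the example $f(x) = x^4 + x^2 - 3x$, whose second derivative $12x^2 + 2$ is non-negative everywhere. It tests the chord inequality for convexity and the bound $f(x) \leq \max(f(a), f(b))$ on a grid of points between $a$ and $b$. The function names and the example are hypothetical, and a grid test is only an illustration, not a proof.

```python
from fractions import Fraction

def f(x):
    # Assumed example with f''(x) = 12*x**2 + 2 >= 0 for all real x.
    return x ** 4 + x ** 2 - 3 * x

def convex_on_grid(func, a, b, steps=20):
    """Check func(x) <= chord(x) and func(x) <= max(func(a), func(b))
    at the points x = lam*a + (1 - lam)*b for lam = 0, 1/steps, ..., 1."""
    for i in range(steps + 1):
        lam = Fraction(i, steps)
        x = lam * a + (1 - lam) * b
        chord = lam * func(a) + (1 - lam) * func(b)
        if func(x) > chord or func(x) > max(func(a), func(b)):
            return False
    return True
```

With exact rational endpoints such as `Fraction(-2)` and `Fraction(3)`, the inequalities are tested without rounding error. A concave function such as $x \mapsto -x^2$ on $[-1, 1]$ fails the test, as expected.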


42.2. Higher-order derivatives for real functions on Cartesian spaces 


42.2.1 REMARK: Higher-order partial derivative notations use multi-indices. 

It is convenient to denote higher-order partial derivatives in terms of multi-indices which are lists of individual
indices in a specific order. (See Definition 14.8.38 for the style of multi-index which is used here.) However,
there is more than one way to interpret an expression such as $\partial_\alpha f = \partial_{\alpha_1,\ldots\alpha_i,\ldots\alpha_k} f$ for a function $f : U \to \mathbb{R}$,
where $U \in \mathrm{Top}(\mathbb{R}^n)$ and $\alpha \in \mathbb{N}_n^k$ for some $n, k \in \mathbb{Z}^+_0$.


(1) $\partial_{\alpha_1,\ldots\alpha_i,\ldots\alpha_k} f$ means $\partial_{\alpha_k}(\ldots(\partial_{\alpha_i}(\ldots(\partial_{\alpha_1} f))))$. This order is adopted by Rosenlicht [128], page 201;
Thomson/Bruckner/Bruckner [149], page 470; Friedman [74], page 215; Graves [85], page 80.


(2) $\partial_{\alpha_1,\ldots\alpha_i,\ldots\alpha_k} f$ means $\partial_{\alpha_1}(\ldots(\partial_{\alpha_i}(\ldots(\partial_{\alpha_k} f))))$. This order is adopted by Rudin [129], page 208.


Alternative notations for $f_{x_i x_j}$ include $\frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}$, $\frac{\partial^2 f}{\partial x_j\,\partial x_i}$, $D_{ij} f$ and $D_j D_i f$, as pointed out by
Rosenlicht [128], page 201, who uses a different order for $i$ and $j$ according to the style of notation. This is
not as confusing as it might at first seem because when the differentiation operations are written individually,
the meaning is in no doubt at all. When the indices (or independent variables) are merely listed, they are
given in the reverse order, but the intention is that the first index in the list is executed first. In other
words, left-to-right order is also a kind of "chronological order". (Similar comments are made by Thomson/
Bruckner/Bruckner [149], page 470.)


Although finite sequences are typically written in left-to-right increasing order so that $\partial_\alpha$ means $\partial_{\alpha_1,\ldots\alpha_i,\ldots\alpha_k}$,
finite sequences are also written right-to-left, as for example in the digits of a decimal number or in the
terms of a polynomial. Such a right-to-left order would have the advantage of harmonising the notation
$\partial_\alpha f = \partial_{\alpha_k,\ldots\alpha_i,\ldots\alpha_1} f$ with $\partial_{\alpha_k}(\ldots(\partial_{\alpha_i}(\ldots(\partial_{\alpha_1} f))))$. The potential for ambiguities is clearly substantial.


In the case of $C^k$ functions, Theorem 42.3.6 shows that the order of application of $k$ partial derivatives makes
no difference to the value of the derivative, but it does make a difference to the meaning of the expression.
Since not all $k$-times differentiable functions are continuously $k$-times differentiable, it is a good idea to settle
on a preferred order of differentiation.


It seems best to adopt option (1), especially in a book on differential geometry, because in tensor calculus,
one often indicates partial or covariant derivatives by appending coordinate indices to a list, as for example
"$\Gamma^i_{jk,\ell}$" or "$g_{ij,k\ell}$", with the meaning that the derivatives should be applied in left-to-right sequence. In the
case of covariant derivatives, the order very definitely does make a difference, and this difference is closely
related to the curvature of the manifold.

Another issue which arises for multi-indices is whether to commence the index ranges with 0 or 1.
For convenience and conventionality, $\mathbb{N}_n^k$ will be interpreted here as the set $\{(\alpha_1, \ldots \alpha_k);\ \forall i \in \mathbb{N}_k,\ \alpha_i \in \mathbb{N}_n\}$
for any $n, k \in \mathbb{Z}^+_0$. In other words, $\mathbb{N}_n^k$ will be interpreted as the set of maps from $\mathbb{N}_k$ to $\mathbb{N}_n$.

Definition 42.2.2 refers to Definition 41.1.4 for the partial differentiability of a real-valued function on an
open subset of $\mathbb{R}^n$, and to Definition 41.1.10 and Notation 41.1.12 for the partial derivative $\partial_i f(x)$ of $f$ at
$x$ with respect to component $i \in \mathbb{N}_n$. (See Definition 42.1.2 for the corresponding ordinary derivative for a
partial function $f : \mathbb{R} \to \mathbb{R}$.)


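42.2.2 DEFINITION: The partial derivative of a partial function $f : \mathbb{R}^n \to \mathbb{R}$ with respect to component
$i \in \mathbb{N}_n$, for $n \in \mathbb{Z}^+_0$, is the partial function


\[ \{ (x, \partial_i f(x));\ x \in \mathrm{Int}(\mathrm{Dom}(f)) \text{ and } f \text{ is partially differentiable at } x \}. \]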


42.2.3 REMARK: Extension of partial derivative notation from functions to partial functions. 

Notation 42.2.4 is very similar to Notation 41.1.13. The difference is that Notation 42.2.4 refers to the partial
derivative of a partially defined function in Definition 42.2.2, whereas the expression "$\partial_i f(x)$" which appears
in Definition 42.2.2 applies Notation 41.1.12 to the fully defined function $f$ on the open set $\mathrm{Int}(\mathrm{Dom}(f))$. The
expression "$\partial_i F_\alpha$" in Definition 42.2.5 applies Notation 42.2.4 to a partially defined function $F_\alpha : \mathbb{R}^n \to \mathbb{R}$.


The principal advantage of this "partial function approach" is that one may define and notate any number
of higher-order derivatives without needing to continually qualify all computations by the requirement that
the derivatives must be "defined". Since Definition 42.2.5 accepts partial functions as input and produces
partial functions as output, it may be applied any number of times with confidence that the result will be
"well defined".


42.2.4 NOTATION: $\partial_i f$, for a partial function $f : \mathbb{R}^n \to \mathbb{R}$, for $n \in \mathbb{Z}^+_0$ and $i \in \mathbb{N}_n$, denotes the partial
derivative of $f$ with respect to component $i$.


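42.2.5 DEFINITION: The partial derivative tree for a partial function $f : \mathbb{R}^n \to \mathbb{R}$ with $n \in \mathbb{Z}^+_0$, is the map
$F : \bigcup_{k=0}^\infty \mathbb{N}_n^k \to (\mathbb{R}^n \to \mathbb{R})$, whose values are partial functions from $\mathbb{R}^n$ to $\mathbb{R}$, which is defined inductively by the following rules.


(i) $F_\emptyset = f$.


(ii) $\forall k \in \mathbb{Z}^+_0,\ \forall \alpha \in \mathbb{N}_n^k,\ \forall i \in \mathbb{N}_n,\ F_{\alpha,i} = \partial_i F_\alpha$, where "$\alpha, i$" means the concatenation of $\alpha$ and $(i)$.


The partial derivative for multi-index $\alpha$ of a partial function $f : \mathbb{R}^n \to \mathbb{R}$, for $\alpha \in \mathbb{N}_n^k$ with $n, k \in \mathbb{Z}^+_0$, is the
value $F_\alpha$ of the partial derivative tree for $f$.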
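
The inductive rule $F_{\alpha,i} = \partial_i F_\alpha$ can be sketched in code for the special case of polynomials, where every derivative is defined everywhere, so the partial-function aspect of the definition plays no role. In this sketch a polynomial in $n$ variables is a dictionary from exponent tuples to integer coefficients. The names `partial` and `derivative_for_multi_index` are hypothetical.

```python
def partial(poly, i):
    """Partial derivative with respect to component i (1-based, as in N_n).
    poly maps exponent tuples such as (2, 3) to integer coefficients."""
    out = {}
    for exps, c in poly.items():
        e = exps[i - 1]
        if e > 0:
            new_exps = exps[:i - 1] + (e - 1,) + exps[i:]
            out[new_exps] = out.get(new_exps, 0) + c * e
    return out

def derivative_for_multi_index(poly, alpha):
    """Walk the tree from the root: F_() = f and F_(alpha, i) = d_i F_alpha.
    The indices in alpha are therefore applied from left to right."""
    F = poly
    for i in alpha:
        F = partial(F, i)
    return F

# f(x1, x2) = x1^2 * x2^3 and alpha = (1, 2, 2):
# d_1 gives 2*x1*x2^3, then d_2 gives 6*x1*x2^2, then d_2 gives 12*x1*x2.
example = derivative_for_multi_index({(2, 3): 1}, (1, 2, 2))
# example == {(1, 1): 12}
```

The loop makes the left-to-right reading of the multi-index explicit: the first entry of `alpha` is the first derivative to be applied.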


42.2.6 NOTATION: $\partial_\alpha f$, for a partial function $f : \mathbb{R}^n \to \mathbb{R}$ and $\alpha \in \mathbb{N}_n^k$, for $n, k \in \mathbb{Z}^+_0$, denotes the partial
derivative for multi-index $\alpha$ of $f$.


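42.2.7 DEFINITION: A partial function $f : \mathbb{R}^n \to \mathbb{R}$, for $n \in \mathbb{Z}^+_0$, is said to have an undefined partial
derivative for multi-index $\alpha$ at $x \in \mathrm{Dom}(f)$, for $\alpha \in \bigcup_{k=0}^\infty \mathbb{N}_n^k$, when $x \notin \mathrm{Dom}(\partial_\alpha f)$.

A partial function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be $k$-times differentiable at $x \in \mathrm{Dom}(f)$, for $n, k \in \mathbb{Z}^+_0$, when
$x \in \mathrm{Dom}(\partial_\alpha f)$ for all $\alpha \in \mathbb{N}_n^k$.


A partial function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be $k$-times differentiable on a set $S \subseteq \mathrm{Dom}(f)$, for $n, k \in \mathbb{Z}^+_0$, when
$S \subseteq \mathrm{Dom}(\partial_\alpha f)$ for all $\alpha \in \mathbb{N}_n^k$.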


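42.2.8 NOTATION: $D^k(U, \mathbb{R})$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n, k \in \mathbb{Z}^+_0$, denotes the set of functions $f : U \to \mathbb{R}$ for
which $\mathrm{Dom}(\partial_\alpha f) = U$ for all $\alpha \in \mathbb{N}_n^k$. In other words,


\[ \forall n, k \in \mathbb{Z}^+_0,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\quad D^k(U, \mathbb{R}) = \{ f : U \to \mathbb{R};\ \forall \alpha \in \mathbb{N}_n^k,\ \mathrm{Dom}(\partial_\alpha f) = U \}. \]


$D^\infty(U, \mathbb{R})$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, denotes the set $\bigcap_{k=0}^\infty D^k(U, \mathbb{R})$.
$D^k(U)$ is an abbreviation for $D^k(U, \mathbb{R})$ for $U \in \mathrm{Top}(\mathbb{R}^n)$.
$D^\infty(U)$ is an abbreviation for $D^\infty(U, \mathbb{R})$ for $U \in \mathrm{Top}(\mathbb{R}^n)$.


42.2.9 NOTATION: $C^k(U, \mathbb{R})$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n, k \in \mathbb{Z}^+_0$, denotes the set of functions $f : U \to \mathbb{R}$
such that $\mathrm{Dom}(\partial_\alpha f) = U$ and $\partial_\alpha f$ is continuous on $U$ for all $\alpha \in \mathbb{N}_n^k$. In other words,


\[
\begin{aligned}
\forall n, k \in \mathbb{Z}^+_0,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\quad
C^k(U, \mathbb{R}) &= \{ f \in D^k(U, \mathbb{R});\ \forall \alpha \in \mathbb{N}_n^k,\ \partial_\alpha f \in C(U, \mathbb{R}) \} \\
&= \{ f : U \to \mathbb{R};\ \forall \alpha \in \mathbb{N}_n^k,\ (\mathrm{Dom}(\partial_\alpha f) = U \text{ and } \partial_\alpha f \in C(U, \mathbb{R})) \}.
\end{aligned}
\]


$C^\infty(U, \mathbb{R})$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}^+_0$, denotes the set $\bigcap_{k=0}^\infty C^k(U, \mathbb{R})$.
$C^k(U)$ is an abbreviation for $C^k(U, \mathbb{R})$ for $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}^+_0$.
$C^\infty(U)$ is an abbreviation for $C^\infty(U, \mathbb{R})$ for $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}^+_0$.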


42.2.10 REMARK: Simpler-looking recurrence formula for $C^k$ function spaces.

The recurrence formula $C^{k+1}(U) = \{ f \in C^k(U);\ \forall i \in \mathbb{N}_n,\ \partial_i f \in C^k(U) \}$ in Theorem 42.2.11 (vii) could be
regarded as the definition of the spaces $C^k(U)$ for $k \in \mathbb{Z}^+_0$. The statement and proof of Theorem 42.2.11
are essentially identical to the statement and proof of Theorem 42.5.16 for $\mathbb{R}^m$-valued maps for $m \in \mathbb{Z}^+_0$.
(The relations in Theorems 42.2.11 (iii, iv, v) and 42.5.16 (iii, iv, v) are illustrated in Figure 42.2.1. Note that
general inclusions of the form $C^k(U, \mathbb{R}^m) \supseteq D^{k+1}(U, \mathbb{R}^m)$ are contradicted by Examples 41.1.21 and 41.4.8
if $n \geq 2$. In other words, partial differentiability does not guarantee continuity.)


Figure 42.2.1 Differentiability class relations for Cartesian space maps


42.2.11 THEOREM: Inclusion relations for differentiable and continuously differentiable function classes. 
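Let $n \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$.
(i) $\forall f : \mathbb{R}^n \to \mathbb{R},\ \forall \alpha \in \mathrm{List}(\mathbb{N}_n),\ \forall i \in \mathbb{N}_n,\ \mathrm{Dom}(\partial_\alpha f) \supseteq \mathrm{Dom}(\partial_{\alpha,i} f)$.
(ii) $\forall f : \mathbb{R}^n \to \mathbb{R},\ \forall \alpha, \alpha' \in \mathrm{List}(\mathbb{N}_n),\ \alpha \subseteq \alpha' \Rightarrow \mathrm{Dom}(\partial_\alpha f) \supseteq \mathrm{Dom}(\partial_{\alpha'} f)$.
(iii) $\forall k \in \mathbb{Z}^+_0,\ D^k(U, \mathbb{R}) \supseteq D^{k+1}(U, \mathbb{R})$.
(iv) $\forall k \in \mathbb{Z}^+_0,\ D^k(U, \mathbb{R}) \supseteq C^k(U, \mathbb{R})$.
(v) $\forall k \in \mathbb{Z}^+_0,\ C^k(U, \mathbb{R}) \supseteq C^{k+1}(U, \mathbb{R})$.
(vi) $\forall k \in \mathbb{Z}^+_0,\ C^k(U, \mathbb{R}) \supseteq C^\infty(U, \mathbb{R})$.
(vii) $\forall k \in \mathbb{Z}^+_0,\ C^{k+1}(U, \mathbb{R}) = \{ f \in C^k(U, \mathbb{R});\ \forall i \in \mathbb{N}_n,\ \partial_i f \in C^k(U, \mathbb{R}) \}$.


PROOF: Part (i) follows from Definitions 42.2.2 and 42.2.5 (ii).

Part (ii) follows from part (i) by induction on $\mathrm{length}(\alpha)$ and $\mathrm{length}(\alpha')$.

For part (iii), let $f \in D^{k+1}(U, \mathbb{R})$. Then $\mathrm{Dom}(\partial_\beta f) = U$ for all $\beta \in \mathbb{N}_n^{k+1}$ by Notation 42.2.8. Let $\alpha \in \mathbb{N}_n^k$.
Let $i \in \mathbb{N}_n$. Let $\beta = \mathrm{concat}(\alpha, (i))$. Then $\beta \in \mathbb{N}_n^{k+1}$. So $\mathrm{Dom}(\partial_{\alpha,i} f) = \mathrm{Dom}(\partial_\beta f) = U$. Therefore
$\mathrm{Dom}(\partial_\alpha f) = U$ by part (i). Hence $f \in D^k(U, \mathbb{R})$ by Notation 42.2.8.

Part (iv) follows from Notation 42.2.9.


For part (v), let $k \in \mathbb{Z}^+_0$ and $f \in C^{k+1}(U, \mathbb{R})$. Then $f \in D^{k+1}(U, \mathbb{R})$ by Notation 42.2.9. Therefore
$f \in D^k(U, \mathbb{R})$ by part (iii). So $\mathrm{Dom}(\partial_\alpha f) = U$ for all $\alpha \in \mathbb{N}_n^k$.

Let $\alpha \in \mathbb{N}_n^k$. Let $i \in \mathbb{N}_n$. Let $\beta = \mathrm{concat}(\alpha, (i))$. Then $\beta \in \mathbb{N}_n^{k+1}$. So $\partial_i(\partial_\alpha f) = \partial_{\alpha,i} f = \partial_\beta f \in C(U, \mathbb{R})$
by Notation 42.2.9. Therefore $\partial_\alpha f \in C^1(U, \mathbb{R})$ by Notation 41.2.19 (or by Notation 42.2.9). Consequently
$\partial_\alpha f \in C(U, \mathbb{R})$ by Theorem 41.6.19 (iii). Hence $f \in C^k(U, \mathbb{R})$ by Notation 42.2.9.

Part (vi) follows from part (v) and Notation 42.2.9 for $C^\infty(U, \mathbb{R})$.

For part (vii), let $f \in C^{k+1}(U, \mathbb{R})$. Then $f \in C^k(U, \mathbb{R})$ by part (v). Let $i \in \mathbb{N}_n$ and $\alpha \in \mathbb{N}_n^k$. Let $\beta =
\mathrm{concat}((i), \alpha)$. Then $\beta \in \mathbb{N}_n^{k+1}$. So $\partial_\alpha(\partial_i f) = \partial_\beta f \in C(U, \mathbb{R})$ by Notation 42.2.9. Therefore $\partial_i f \in C^k(U, \mathbb{R})$
by Notation 42.2.9 because $f \in D^{k+1}(U, \mathbb{R})$ by part (iv). Thus $C^{k+1}(U, \mathbb{R}) \subseteq \{ f \in C^k(U, \mathbb{R});\ \forall i \in \mathbb{N}_n,\ \partial_i f \in
C^k(U, \mathbb{R}) \}$.

For the reverse inclusion, let $f \in C^k(U, \mathbb{R})$ satisfy $\forall i \in \mathbb{N}_n,\ \partial_i f \in C^k(U, \mathbb{R})$. Let $\beta \in \mathbb{N}_n^{k+1}$. Then
$\beta = \mathrm{concat}((i), \alpha)$ for some (unique) $i \in \mathbb{N}_n$ and $\alpha \in \mathbb{N}_n^k$. So $\partial_i f \in C^k(U, \mathbb{R})$. Therefore $\mathrm{Dom}(\partial_\alpha(\partial_i f)) = U$
by Notation 42.2.8. But $\partial_\alpha(\partial_i f) = \partial_{i,\alpha} f = \partial_\beta f$. Thus $\mathrm{Dom}(\partial_\beta f) = U$ for all $\beta \in \mathbb{N}_n^{k+1}$. Therefore
$f \in D^{k+1}(U, \mathbb{R})$ by Notation 42.2.8. It also follows from $\partial_i f \in C^k(U, \mathbb{R})$ that $\forall \alpha \in \mathbb{N}_n^k,\ \partial_\alpha(\partial_i f) \in C(U, \mathbb{R})$
by Notation 42.2.9. Thus $\partial_\beta f \in C(U, \mathbb{R})$ for all $\beta \in \mathbb{N}_n^{k+1}$. Consequently $f \in C^{k+1}(U, \mathbb{R})$ by Notation 42.2.9.


Hence $C^{k+1}(U, \mathbb{R}) = \{ f \in C^k(U, \mathbb{R});\ \forall i \in \mathbb{N}_n,\ \partial_i f \in C^k(U, \mathbb{R}) \}$.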


42.2.12 REMARK: Locally $C^k$ implies globally $C^k$.
Theorem 42.2.14 is useful for the analysis of differentiable real-valued functions on differentiable manifolds. 
(See for example the proof of Theorem 61.1.5.) 


42.2.13 THEOREM: Locally continuous implies globally continuous. 
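Let $f : \Omega \to \mathbb{R}$ for some $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}^+_0$. Suppose that for all $x \in \Omega$ there is an $r \in \mathbb{R}^+$ such that
$f|_{B_{x,r}}$ is continuous. Then $f$ is continuous. In other words,


\[ (\forall x \in \Omega,\ \exists r \in \mathbb{R}^+,\ f|_{B_{x,r}} \in C(B_{x,r}, \mathbb{R})) \Rightarrow f \in C(\Omega, \mathbb{R}). \]


PROOF: The assertion follows from Theorem 31.12.17 and the topology on $\mathbb{R}^n$.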


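42.2.14 THEOREM: Locally $C^k$ implies globally $C^k$.
Let $f : \Omega \to \mathbb{R}$ for some $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}^+_0$. Let $k \in \bar{\mathbb{Z}}^+_0$. Suppose that for all $x \in \Omega$ there is an
$r \in \mathbb{R}^+$ such that $f|_{B_{x,r}} \in C^k(B_{x,r}, \mathbb{R})$. Then $f \in C^k(\Omega, \mathbb{R})$. In other words,


\[ (\forall x \in \Omega,\ \exists r \in \mathbb{R}^+,\ f|_{B_{x,r}} \in C^k(B_{x,r}, \mathbb{R})) \Rightarrow f \in C^k(\Omega, \mathbb{R}). \qquad (42.2.1) \]


PROOF: Let $k \in \mathbb{Z}^+_0$. Let $f : \Omega \to \mathbb{R}$ satisfy $\forall x \in \Omega,\ \exists r \in \mathbb{R}^+,\ f|_{B_{x,r}} \in C^k(B_{x,r}, \mathbb{R})$. Then $x \in \mathrm{Dom}(\partial_\alpha f)$
for all $\alpha \in \mathbb{N}_n^k$, for all $x \in \Omega$, because the differentiability of $f$ at $x$ depends only on the values of $f$ in a
neighbourhood of $x$. Thus $\mathrm{Dom}(\partial_\alpha f) = \Omega$ for all $\alpha \in \mathbb{N}_n^k$. Since $\partial_\alpha f$ is continuous in $B_{x,r}$ for some $r \in \mathbb{R}^+$,
for all $x \in \Omega$, it follows from Theorem 42.2.13 that $\partial_\alpha f \in C(\Omega, \mathbb{R})$ for all $\alpha \in \mathbb{N}_n^k$. Hence $f \in C^k(\Omega, \mathbb{R})$. The
case $k = \infty$ follows directly from the validity of all of the finite cases.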


42.2.15 REMARK: Constant real functions are infinitely continuously differentiable. 
As in the case of Theorems 40.5.5, 41.1.16, 41.2.23 and 42.1.14, Theorem 42.2.16 is a trivial exercise to 
determine whether the definitions and notations are workable. But it is also useful in applications. 


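42.2.16 THEOREM: Constant real functions on Cartesian spaces are $C^\infty$.
Let $n \in \mathbb{Z}^+_0$, $U \in \mathrm{Top}(\mathbb{R}^n)$ and $c \in \mathbb{R}$. Define $f : U \to \mathbb{R}$ by $f(x) = c$ for all $x \in U$. Then $f \in C^\infty(U)$, and
$\partial_\alpha f(p) = 0$ for all $p \in U$ and $\alpha \in \mathbb{N}_n^k$ with $k \in \mathbb{Z}^+$.


PROOF: By Theorem 41.1.16, $f$ is partially differentiable on $U$, and $\partial_i f(p) = 0$ for all $p \in U$ and $i \in \mathbb{N}_n$.
So $\mathrm{Dom}(\partial_i f) = U$ for all $i \in \mathbb{N}_n$ by Definition 42.2.2 and Notation 42.2.4.


Assume for some $k \in \mathbb{Z}^+$ that $\mathrm{Dom}(\partial_\alpha f) = U$ for all $\alpha \in \mathbb{N}_n^k$, and $(\partial_\alpha f)(p) = 0$ for all $p \in U$ and $\alpha \in \mathbb{N}_n^k$.
Then $\mathrm{Dom}(\partial_{\alpha,i} f) = U$ and $(\partial_{\alpha,i} f)(p) = (\partial_i(\partial_\alpha f))(p) = 0$ for all $p \in U$, for all $\alpha \in \mathbb{N}_n^k$ and $i \in \mathbb{N}_n$, by
Theorem 41.1.16 and Definition 42.2.2 and Notation 42.2.6. Thus $\mathrm{Dom}(\partial_\alpha f) = U$ and $(\partial_\alpha f)(p) = 0$ for all
$p \in U$ and $\alpha \in \mathbb{N}_n^{k+1}$. Hence $f \in C^\infty(U)$ by Notation 42.2.9 by induction on $k \in \mathbb{Z}^+$.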


42.3. Commutativity of partial derivatives 


42.3.1 REMARK: Commutativity of partial derivatives of real-valued functions of real variables. 

Of similar importance to the “total differential theorem”, Theorem 41.6.15, is the commutativity of partial 
derivatives. The total differential theorem gives conditions under which directional derivatives of functions of 
n real variables are fully determined as a linear function of n partial derivatives. The commutativity of partial 
derivatives implies that the order in which multiple partial derivatives are computed makes no difference to 
the value obtained. In both cases, the amount of information required is reduced due to redundancy, but in 
both cases, the benefits are only guaranteed if the partial derivatives obey some continuity conditions. Thus 
this is another situation where real analysis is much simpler with $C^k$ spaces than with the corresponding $D^k$
spaces which specify only that the derivatives are well defined.
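
The need for a continuity condition can be seen numerically with a standard textbook example which is not taken from this book: the function $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ with $f(0, 0) = 0$. Both mixed second partial derivatives exist at the origin, but they are not continuous there, and they take the values $+1$ and $-1$. The following sketch approximates them by nested central differences. The step sizes and function names are assumptions made for the illustration.

```python
def f(x, y):
    # Standard example: x*y*(x^2 - y^2)/(x^2 + y^2), extended by f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def d1(func, x, y, h=1e-7):
    """Central-difference approximation to the partial derivative in component 1."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def d2(func, x, y, h=1e-7):
    """Central-difference approximation to the partial derivative in component 2."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

def mixed_at_origin(k=1e-3):
    """Approximate d_1(d_2 f)(0, 0) and d_2(d_1 f)(0, 0).
    The inner step h is much smaller than the outer step k."""
    d1d2 = (d2(f, k, 0.0) - d2(f, -k, 0.0)) / (2 * k)
    d2d1 = (d1(f, 0.0, k) - d1(f, 0.0, -k)) / (2 * k)
    return d1d2, d2d1
```

Calling `mixed_at_origin()` returns two values close to $1$ and $-1$ respectively, so the two orders of differentiation disagree at the origin for this function.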


Proofs of the commutativity of partial derivatives are given, for example, by Friedman [74], pages 215-217; 
Thomson/Bruckner/Bruckner [149], pages 473-476; Rosenlicht [128], pages 202-203; Edwards [67], page 87; 
Graves [85], page 80; Rudin [129], page 208 (using integration). 

To avoid being distracted by minor details, a method of proving the commutativity of partial derivatives 
is shown in Theorems 42.3.2 and 42.3.3 for the special case of $\mathbb{R}^2$. But the more general Theorems 42.3.4
and 42.3.6 follow directly from them. 


42.3.2 THEOREM: Bounds for double differential quotient for $C^2$ functions on $\mathbb{R}^2$.
Let $S = [a_1, b_1] \times [a_2, b_2] \subseteq \mathbb{R}^2$ for some $a_1, b_1, a_2, b_2 \in \mathbb{R}$ with $a_1 < b_1$ and $a_2 < b_2$. Let $S \subseteq U \in \mathrm{Top}(\mathbb{R}^2)$.
Let $f \in C^1(U, \mathbb{R})$.
(i) Let $\mathrm{Dom}(\partial_1 \partial_2 f) = U$, and let $K \in \mathbb{R}$ be such that $\partial_1 \partial_2 f(x) \geq K$ for all $x \in U$.
Then $f(b_1, b_2) - f(a_1, b_2) - f(b_1, a_2) + f(a_1, a_2) \geq K (b_1 - a_1)(b_2 - a_2)$.
(ii) Let $\mathrm{Dom}(\partial_2 \partial_1 f) = U$, and let $K \in \mathbb{R}$ be such that $\partial_2 \partial_1 f(x) \geq K$ for all $x \in U$.
Then $f(b_1, b_2) - f(a_1, b_2) - f(b_1, a_2) + f(a_1, a_2) \geq K (b_1 - a_1)(b_2 - a_2)$.


PROOF: For part (i), define $g_t : [a_1, b_1] \to \mathbb{R}$ for $t \in [a_2, b_2]$ by $g_t : s \mapsto \partial_2 f(s, t)$. Then $g_t$ is continuous on
$[a_1, b_1]$ and differentiable on $(a_1, b_1)$ because $f \in C^1(U, \mathbb{R})$ and $\mathrm{Dom}(\partial_1 \partial_2 f) = U$, and for all $s \in (a_1, b_1)$,
$g_t'(s) = \partial_1 \partial_2 f(s, t) \geq K$. So $g_t(b_1) - g_t(a_1) \geq K (b_1 - a_1)$ by Theorem 40.6.13.

Now define $h : [a_2, b_2] \to \mathbb{R}$ by $h : t \mapsto f(b_1, t) - f(a_1, t)$. Then $h$ is continuous on $[a_2, b_2]$ and differentiable
on $(a_2, b_2)$, and $h'(t) = \partial_2 f(b_1, t) - \partial_2 f(a_1, t) = g_t(b_1) - g_t(a_1) \geq K (b_1 - a_1)$ for all $t \in (a_2, b_2)$. Therefore
$h(b_2) - h(a_2) \geq K (b_1 - a_1)(b_2 - a_2)$ by Theorem 40.6.13. Thus $f(b_1, b_2) - f(a_1, b_2) - f(b_1, a_2) + f(a_1, a_2) \geq
K (b_1 - a_1)(b_2 - a_2)$.

Part (ii) follows as for part (i) by swapping $x_1$ and $x_2$.
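
A small worked instance may help to fix the meaning of the double difference. The choices of $f$, $U$, $S$ and $K$ below are assumptions made purely for illustration.

```latex
% Assumed example: f(x_1, x_2) = x_1^2 x_2^2 on U = (1/2, \infty) \times (1/2, \infty),
% with S = [1,2] \times [1,3], so that a_1 = 1, b_1 = 2, a_2 = 1, b_2 = 3.
\[
  \partial_1 \partial_2 f(x) = 4 x_1 x_2 \geq 1 = K \quad\text{for all } x \in U,
\]
\[
  f(2,3) - f(1,3) - f(2,1) + f(1,1) = 36 - 9 - 4 + 1 = 24
  \;\geq\; K (b_1 - a_1)(b_2 - a_2) = 1 \cdot 1 \cdot 2 = 2 .
\]
```

The bound is far from sharp here because $K$ is a lower bound for the mixed derivative on all of $U$, not just on $S$.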


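42.3.3 THEOREM: First derivatives of a $C^2$ function on $\mathbb{R}^2$ commute.
Let $U \in \mathrm{Top}(\mathbb{R}^2)$. Let $f \in C^1(U, \mathbb{R})$ satisfy $\mathrm{Dom}(\partial_1 \partial_2 f) = \mathrm{Dom}(\partial_2 \partial_1 f) = U$, and suppose that $\partial_1 \partial_2 f \in
C(U, \mathbb{R})$ and $\partial_2 \partial_1 f \in C(U, \mathbb{R})$. Then $\partial_1 \partial_2 f(x) = \partial_2 \partial_1 f(x)$ for all $x \in U$.


PROOF: Suppose that $a \in U$ and $\partial_1 \partial_2 f(a) > \partial_2 \partial_1 f(a)$. Then $\partial_1 \partial_2 f(a) > K_1 > K_2 > \partial_2 \partial_1 f(a)$ for some
$K_1, K_2 \in \mathbb{R}$. (For example, choose $K_1 = \frac{2}{3} \partial_1 \partial_2 f(a) + \frac{1}{3} \partial_2 \partial_1 f(a)$ and $K_2 = \frac{1}{3} \partial_1 \partial_2 f(a) + \frac{2}{3} \partial_2 \partial_1 f(a)$.) Since
$\partial_1 \partial_2 f$ and $\partial_2 \partial_1 f$ are continuous, there is a set $\Omega \in \mathrm{Top}_a(\mathbb{R}^2)$ such that $\partial_1 \partial_2 f(x) > K_1 > K_2 > \partial_2 \partial_1 f(x)$
for all $x \in \Omega$. But then there is a set $S = [a_1, b_1] \times [a_2, b_2]$ with $a_1 < b_1$ and $a_2 < b_2$ and $S \subseteq \Omega$.
Therefore $f(b_1, b_2) - f(a_1, b_2) - f(b_1, a_2) + f(a_1, a_2) \geq K_1 (b_1 - a_1)(b_2 - a_2)$ by Theorem 42.3.2 (i), and
$f(b_1, b_2) - f(a_1, b_2) - f(b_1, a_2) + f(a_1, a_2) \leq K_2 (b_1 - a_1)(b_2 - a_2)$ by Theorem 42.3.2 (ii) (by substituting $-f$
for $f$ and $-K_2$ for $K$). This contradicts the inequality $K_1 > K_2$. Therefore $\partial_1 \partial_2 f(a) \leq \partial_2 \partial_1 f(a)$. Similarly,
$\partial_1 \partial_2 f(a) \geq \partial_2 \partial_1 f(a)$. Hence $\partial_1 \partial_2 f(a) = \partial_2 \partial_1 f(a)$.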


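42.3.4 THEOREM: First derivatives of a $C^2$ function on real Cartesian spaces commute.

Let $n \in \mathbb{Z}^+$ with $n \geq 2$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $i, j \in \mathbb{N}_n$ with $i \neq j$. Assume that $f \in C^1(U, \mathbb{R})$ satisfies
$\mathrm{Dom}(\partial_i \partial_j f) = \mathrm{Dom}(\partial_j \partial_i f) = U$, and that $\partial_i \partial_j f$ and $\partial_j \partial_i f$ are continuous on $U$. Then $\partial_i \partial_j f(x) = \partial_j \partial_i f(x)$
for all $x \in U$.


PROOF: Let $U \in \mathrm{Top}(\mathbb{R}^n)$ for some $n \in \mathbb{Z}^+$ with $n \geq 2$, and let $i, j \in \mathbb{N}_n$ with $i \neq j$. Let $a \in U$, and define
$Q : \mathbb{R}^2 \to \mathbb{R}^n$ by $Q : y \mapsto a + y_1 e_i + y_2 e_j$. Let $\tilde{U} = Q^{-1}(U)$. Then $\tilde{U} \in \mathrm{Top}(\mathbb{R}^2)$ because $Q$ is continuous.


Define $\tilde{f} : \tilde{U} \to \mathbb{R}$ by $\tilde{f} = f \circ Q$. Then $\tilde{f} \in C^1(\tilde{U}, \mathbb{R})$ because $f \in C^1(U, \mathbb{R})$. Also, after verifying
various tedious technicalities, $\partial_1 \tilde{f}(y) = \partial_i f(Q(y))$ and $\partial_2 \tilde{f}(y) = \partial_j f(Q(y))$ for all $y \in \tilde{U}$. Moreover,
$\partial_1 \partial_2 \tilde{f}(y) = \partial_i \partial_j f(Q(y))$ and $\partial_2 \partial_1 \tilde{f}(y) = \partial_j \partial_i f(Q(y))$ for all $y \in \tilde{U}$. Thus $\tilde{f} \in C^1(\tilde{U}, \mathbb{R})$ and $\mathrm{Dom}(\partial_1 \partial_2 \tilde{f}) =
\mathrm{Dom}(\partial_2 \partial_1 \tilde{f}) = \tilde{U}$, and $\partial_1 \partial_2 \tilde{f} \in C(\tilde{U}, \mathbb{R})$ and $\partial_2 \partial_1 \tilde{f} \in C(\tilde{U}, \mathbb{R})$. So $\partial_1 \partial_2 \tilde{f}(y) = \partial_2 \partial_1 \tilde{f}(y)$ for all $y \in \tilde{U}$ by
Theorem 42.3.3. Thus $\partial_i \partial_j f(x) = \partial_j \partial_i f(x)$ for all $x \in Q(\tilde{U}) = U \cap Q(\mathbb{R}^2)$. But this holds for all $a \in U$.
Hence $\partial_i \partial_j f(x) = \partial_j \partial_i f(x)$ for all $x \in U$.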


42.3.5 REMARK: Independence of partial derivatives with respect to permutations. 
Theorem 42.3.6 asserts the independence of partial derivatives with respect to permutations of the partial 
derivative operations. This is one of the theorems which make tensor calculus, and differential geometry
in general, much simpler, because the continuity of partial derivatives up to order k guarantees the total 
differentiability of partial derivatives up to order k — 1, and the order of differentiation has no effect on the 
value. The fundamental importance of Theorem 42.3.6 suggests that, like Theorem 41.6.15, it deserves a 
nickname, such as the “partial derivative commutativity theorem”, for example. 


42.3.6 THEOREM: The partial derivative commutativity theorem. 
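Let $n, k \in \mathbb{Z}^+_0$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ and $f \in C^k(\Omega, \mathbb{R})$. Then $\partial_\alpha f(p) = \partial_{\alpha \circ P^{-1}} f(p)$ for all $p \in \Omega$, for all $\alpha \in \mathbb{N}_n^k$,
for all permutations (i.e. bijections) $P : \mathbb{N}_k \to \mathbb{N}_k$. In other words,


\[
\forall n, k \in \mathbb{Z}^+_0,\ \forall \Omega \in \mathrm{Top}(\mathbb{R}^n),\ \forall f \in C^k(\Omega, \mathbb{R}),\ \forall p \in \Omega,\ \forall \alpha \in \mathbb{N}_n^k,\ \forall P \in \mathrm{perm}(\mathbb{N}_k),\quad
\partial_\alpha f(p) = \partial_{\alpha \circ P^{-1}} f(p).
\]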


In other words, the value of multi-index partial derivatives is independent of the order of differentiation. 


PROOF: The assertion follows by induction from Theorem 42.3.3. 
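
The index bookkeeping in the expression $\alpha \circ P^{-1}$ can be made concrete with the smallest non-trivial case. This is only an illustrative sketch, and the particular choice of $\alpha$ and $P$ is an assumption.

```latex
% Assumed example: n = 2, k = 2, \alpha = (1,2), and P the transposition of \mathbb{N}_2 = \{1,2\}.
\[
  P^{-1} = P, \qquad
  \alpha \circ P^{-1} = (\alpha_{P^{-1}(1)}, \alpha_{P^{-1}(2)}) = (\alpha_2, \alpha_1) = (2,1),
\]
\[
  \partial_{(1,2)} f(p) = \partial_2(\partial_1 f)(p) = \partial_1(\partial_2 f)(p) = \partial_{(2,1)} f(p)
  \quad\text{for all } f \in C^2(\Omega, \mathbb{R}) \text{ and } p \in \Omega .
\]
```

Here the outer equalities use the left-to-right convention $F_{\alpha,i} = \partial_i F_\alpha$ of Definition 42.2.5, and the middle equality is the commutativity assertion.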


42.3.7 REMARK: Excessive smoothness requirements restrict applicability and hide analytical issues. 

For the reasons outlined in Remark 42.3.5, most of the differential geometry in this book is expressed in
terms of $C^k$ regularity classes for functions and manifolds. This restriction is generally a good trade-off
between generality and pragmatic simplicity. Such regularity classes are readily refined further, for example
to the $C^{k,\alpha}$ Hölder continuity classes, if required by applications.


The blanket use of $C^\infty$ spaces, which is commonly seen in textbooks, is avoided here because many real-world
applications do not satisfy such extreme smoothness requirements. Many important issues are completely
removed from consideration by assuming that all manifolds, regions, functions, curves and maps satisfy $C^\infty$
regularity. The extra work which is required in the case of $C^k$ spaces, compared to $C^\infty$ spaces, is perhaps
excessive for an introductory course, but many insights are gained by constantly having to justify claims for
each level of regularity of each new construction. When everything is "smooth", one is oblivious to all of the
"rough" detail which lurks below the surface.


42.4. Higher-order directional derivatives 


42.4.1 REMARK: Multiple directional derivatives. 

The single directional derivative $\partial_v f(p)$ in Theorem 42.4.2 means $\lim_{t \to 0} (f(p + tv) - f(p))/t$ according to
Definition 41.4.4, not the "coordinate version" of this expression, which would be $\sum_{i=1}^n v_i \partial_i f(p)$. These
first-order expressions happen to have the same value by Theorem 41.6.21 (ii) because $f \in C^1(\Omega, \mathbb{R})$. However,
the distinction between directional and partial derivatives is much more serious in the second-order case
because the partial derivatives might not even be well defined.


42.4.2 THEOREM: Bounds for second partial and directional derivatives at minimum/maximum points.
Let $n \in \mathbb{Z}^+_0$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $p \in \Omega$ and $f \in C^1(\Omega, \mathbb{R})$.


(i) If $i \in \mathbb{N}_n$ and $\partial_i^2 f(p)$ is well defined and $f(p) = \min(f)$, then $\partial_i^2 f(p) \geq 0$.
(ii) If $i \in \mathbb{N}_n$ and $\partial_i^2 f(p)$ is well defined and $f(p) = \max(f)$, then $\partial_i^2 f(p) \leq 0$.
(iii) If $v \in \mathbb{R}^n$ and $\partial_v^2 f(p)$ is well defined and $f(p) = \min(f)$, then $\partial_v^2 f(p) \geq 0$.
(iv) If $v \in \mathbb{R}^n$ and $\partial_v^2 f(p)$ is well defined and $f(p) = \max(f)$, then $\partial_v^2 f(p) \leq 0$.

PROOF: Part (i) follows from Theorem 42.1.20 (i) by considering the map $t \mapsto f(p + t e_i)$ with $e_i = (\delta_{ij})_{j=1}^n$.


Part (ii) follows from Theorem 42.1.20 (ii) by considering the map $t \mapsto f(p + t e_i)$ with $e_i = (\delta_{ij})_{j=1}^n$.
Part (iii) follows from Theorem 42.1.20 (i) by considering the map $t \mapsto f(p + tv)$.


Part (iv) follows from Theorem 42.1.20 (ii) by considering the map $t \mapsto f(p + tv)$.


42.4.3 REMARK: Expression for second-order directional derivative in terms of partial derivatives. 

Although Theorem 41.6.21 (i) gives a formula for first-order directional derivatives in terms of partial derivatives
if the function is totally differentiable at a single point, the situation for second-order derivatives is not
so simple. In Theorem 42.4.4, it is shown that an analogous formula is obtained for second-order directional
derivatives, but to ensure that the first-order directional derivatives are well defined in a neighbourhood of a
particular point and have suitable properties, the first-order partial derivatives are assumed to be continuous
in some neighbourhood.


The well-definition and continuity of the first-order partial derivatives ensures that the first-order directional
derivative is well defined in some neighbourhood. This can be directionally differentiated because each of the
individual partial derivatives can be directionally differentiated since they are assumed totally differentiable.


The purpose of Theorem 42.4.5, which extends Theorem 42.4.4, is to give fairly weak conditions for the
validity of the usual formula for second-order directional derivatives in terms of partial derivatives. This
formula is one of the most important foundation-stones for differential geometry. Tensor calculus relies very
heavily on the ability to convert all directional derivatives into linear combinations of partial derivatives so
that they can be expressed in terms of a simple $n$-tuple of numbers $v = (v_i)_{i=1}^n \in \mathbb{R}^n$. The first-order
case is established by Theorem 41.6.21, and Theorem 42.4.5 establishes that this simple $n$-tuple of numbers
suffices also for the second order.


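42.4.4 THEOREM: Formula for second directional derivative in a plane in terms of partial derivatives.
Let $n \in \mathbb{Z}^+$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $p \in \Omega$ and $f \in C^1(\Omega, \mathbb{R})$. For $i, j \in \mathbb{N}_n$, let $\partial_i f$ and
$\partial_j f$ be totally differentiable at $p$. Let $v = se_i + te_j \in \mathbb{R}^n$. Then $\partial_v^2 f(p)$ is well defined and
$\partial_v^2 f(p) = s^2 \partial_i^2 f(p) + 2st\, \partial_i \partial_j f(p) + t^2 \partial_j^2 f(p)$.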


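PROOF: By Theorem 41.6.21 (ii), $\forall x \in \Omega$, $\partial_v f(x) = \sum_{k=1}^n v_k \partial_k f(x) = s\partial_i f(x) + t\partial_j f(x)$ because
$f \in C^1(\Omega, \mathbb{R})$. By Theorem 41.6.21 (i),
$\partial_v(\partial_\ell f)(p) = \sum_{k=1}^n v_k \partial_k(\partial_\ell f)(p) = s\partial_i(\partial_\ell f)(p) + t\partial_j(\partial_\ell f)(p)$ for $\ell = i$
and $\ell = j$ because $\partial_i f$ and $\partial_j f$ are totally differentiable at $p$. So
\[
\begin{aligned}
\partial_v(\partial_v f)(p) &= \partial_v\bigl(s\partial_i f(x) + t\partial_j f(x)\bigr)\big|_{x=p} \\
&= s\partial_v(\partial_i f)(p) + t\partial_v(\partial_j f)(p) \\
&= s\bigl(s\partial_i(\partial_i f)(p) + t\partial_j(\partial_i f)(p)\bigr) + t\bigl(s\partial_i(\partial_j f)(p) + t\partial_j(\partial_j f)(p)\bigr) \\
&= s^2 \partial_i^2 f(p) + 2st\, \partial_i \partial_j f(p) + t^2 \partial_j^2 f(p),
\end{aligned}
\tag{42.4.1}
\]
where line (42.4.1) follows from Theorem 42.3.4.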


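42.4.5 THEOREM: General formula for second directional derivative in terms of partial derivatives.
Let $n \in \mathbb{Z}^+$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $p \in \Omega$ and $f \in C^1(\Omega, \mathbb{R})$. Let $\partial_i f$ be totally differentiable at $p$
for all $i \in \mathbb{N}_n$. Then $\partial_v^2 f(p)$ is well defined for all $v \in \mathbb{R}^n$ and
\[
\forall v \in \mathbb{R}^n, \quad \partial_v^2 f(p) = \sum_{i,j=1}^n v_i v_j \partial_i \partial_j f(p). \tag{42.4.2}
\]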


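PROOF: By Theorem 41.6.21 (ii), $\forall x \in \Omega$, $\partial_v f(x) = \sum_{k=1}^n v_k \partial_k f(x)$ because $f \in C^1(\Omega, \mathbb{R})$. It follows from
Theorem 41.6.21 (i) that $\partial_v(\partial_i f)(p) = \sum_{j=1}^n v_j \partial_j(\partial_i f)(p)$ for all $i \in \mathbb{N}_n$ by the total differentiability
of $\partial_i f$ at $p$. Therefore
\[
\begin{aligned}
\forall v \in \mathbb{R}^n, \quad \partial_v(\partial_v f)(p)
&= \partial_v\Bigl(\sum_{i=1}^n v_i \partial_i f(x)\Bigr)\Big|_{x=p} \\
&= \sum_{i=1}^n v_i\, \partial_v(\partial_i f)(p) \\
&= \sum_{i=1}^n v_i \sum_{j=1}^n v_j\, \partial_j(\partial_i f)(p) \\
&= \sum_{i,j=1}^n v_i v_j\, \partial_i \partial_j f(p),
\end{aligned}
\tag{42.4.3}
\]
where line (42.4.3) follows from Theorem 42.3.4.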
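
The formula in line (42.4.2) can be checked numerically. The following Python sketch is only an illustration
under stated assumptions: the test function, the point, the direction and the step size `h` are hypothetical
choices which do not appear in the text, and central finite differences are used as a stand-in for exact
derivatives.

```python
import math

def second_directional_derivative(f, p, v, h=1e-3):
    """Central-difference estimate of the second derivative of t -> f(p + t*v) at t = 0."""
    forward = [pi + h * vi for pi, vi in zip(p, v)]
    backward = [pi - h * vi for pi, vi in zip(p, v)]
    return (f(forward) - 2.0 * f(p) + f(backward)) / (h * h)

def second_partials(f, p, h=1e-3):
    """Central-difference estimate of the matrix of second-order partial derivatives at p."""
    n = len(p)

    def shifted(i, si, j, sj):
        # Evaluate f at p shifted by si*h in coordinate i and sj*h in coordinate j.
        x = list(p)
        x[i] += si * h
        x[j] += sj * h
        return f(x)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4.0 * h * h)
             for j in range(n)]
            for i in range(n)]

def quadratic_form(matrix, v):
    """Sum of v_i * v_j * matrix[i][j] over all i and j, as in line (42.4.2)."""
    return sum(v[i] * v[j] * matrix[i][j]
               for i in range(len(v)) for j in range(len(v)))

# Hypothetical test data: f(x) = x_1^2 x_2 + sin(x_2), p = (1, 2), v = (3, -1).
def example_f(x):
    return x[0] ** 2 * x[1] + math.sin(x[1])

p = [1.0, 2.0]
v = [3.0, -1.0]
lhs = second_directional_derivative(example_f, p, v)
rhs = quadratic_form(second_partials(example_f, p), v)
print(lhs, rhs)  # the two estimates should agree up to discretisation error
```

For smooth functions the two estimates agree up to discretisation error, which is consistent with the
assertion that the n-tuple $v$ and the matrix of second-order partial derivatives together determine the
second-order directional derivative.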


42.4.6 THEOREM: Bounds of second-derivative matrices at minimum/maximum of a function.
Let $n \in \mathbb{Z}^+$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $p \in \Omega$ and $f \in C^1(\Omega, \mathbb{R})$. Let $\partial_i f$ be totally differentiable at $p$
for all $i \in \mathbb{N}_n$.

(i) If $f(p) = \min(f)$, then $\forall v \in \mathbb{R}^n$, $\partial_v^2 f(p) \ge 0$ and the matrix $[\partial_i \partial_j f(p)]_{i,j=1}^n$ is positive semi-definite.

(ii) If $f(p) = \max(f)$, then $\forall v \in \mathbb{R}^n$, $\partial_v^2 f(p) \le 0$ and the matrix $[\partial_i \partial_j f(p)]_{i,j=1}^n$ is negative semi-definite.

PROOF: For part (i), it follows from Theorem 42.4.5 that $\partial_v^2 f(p)$ is well defined for all $v \in \mathbb{R}^n$, and from
Theorem 42.4.2 (iii) it follows that $\partial_v^2 f(p) \ge 0$. Then it follows from Theorem 42.4.5 line (42.4.2) and
Definition 25.11.7 (i) that the matrix $[\partial_i \partial_j f(p)]_{i,j=1}^n$ is positive semi-definite.

For part (ii), $\partial_v^2 f(p)$ is similarly well defined for all $v \in \mathbb{R}^n$, and so $\partial_v^2 f(p) \le 0$ by Theorem 42.4.2 (iv).
Then it follows from Theorem 42.4.5 line (42.4.2) and Definition 25.11.7 (ii) that the matrix
$[\partial_i \partial_j f(p)]_{i,j=1}^n$ is negative semi-definite.
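
As a simple illustration (not taken from the text), consider $f(x) = x_1^2 + x_2^4$ on $\Omega = \mathbb{R}^2$, which attains
its minimum at $p = 0$. The second-derivative matrix at $p$ is positive semi-definite but not positive
definite, so semi-definiteness is the most that can be expected in general.

```latex
\[
  [\partial_i \partial_j f(0)]_{i,j=1}^2
  = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix},
  \qquad
  \partial_v^2 f(0) = \sum_{i,j=1}^2 v_i v_j\, \partial_i \partial_j f(0) = 2 v_1^2 \ge 0
  \quad \text{for all } v \in \mathbb{R}^2 .
\]
```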


42.5. Higher-order derivatives of maps between Cartesian spaces 


42.5.1 REMARK: Extension of higher-order derivatives from real-valued functions to maps. 

The definitions and notations for higher-order derivatives of maps between Cartesian spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ are
a straightforward extension of the case m = 1 in Section 41.2. (Extending the range dimension from 1 to
general m is much easier than extending the domain dimension from 1 to general n because differentiation
is performed with respect to the n domain coordinates.)

The first-order partial derivatives “$\partial_i f$” for partially defined functions in Definition 42.5.3 refer to the
versions for fully defined functions on open domains in Definition 41.2.7 and Notations 41.2.9 and 41.2.10.
(Thus $\partial_i f(x)$ is an m-tuple in Definition 42.5.3 because the target space of $f$ is $\mathbb{R}^m$.)

The considerations in Remark 42.2.3 apply here also, namely that partially defined functions are employed in
Definition 42.5.3 so as to incorporate well-definition tests for higher-order derivatives into the computations, 
so that one does not need to separately test for well-definition of derivatives at each stage of the construction. 
In this way, instead of requiring the derivatives of each order to be well defined on some domain U, one 
requires that the domain should include U. In other words, the derivatives are always “defined”, in the sense 
of partially defined functions, and the question then only remains as to whether the domain of definition 
includes the desired set. Thus the domain of definition depends on the function which is being differentiated. 


42.5.2 REMARK: Terminology: Partial functions versus partial maps. 

It is preferable to use the word “map” to refer to functions between two Cartesian spaces, particularly when
the dimensions are greater than 1. This creates a convenient contrast between the word “function”, which
is a hint that the function is real-valued, and a “map” which is real tuple-valued. However, as discussed in
Remark 10.12.16, the term “partial map” has a special meaning which is inconsistent with the term “partial
function”. Therefore Cartesian space maps which are partial functions in Section 42.5 cannot be referred to
as “partial maps”. So the slightly confusing term “partial function” is used for Cartesian space maps which
are partially defined.


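42.5.3 DEFINITION: The partial derivative of a partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$ with respect to component
$i \in \mathbb{N}_n$, for $n, m \in \mathbb{Z}^+_0$, is the partial function
\[
\{(x, \partial_i f(x));\ x \in \mathrm{Int}(\mathrm{Dom}(f)) \text{ and } f \text{ is partially differentiable at } x\}.
\]

42.5.4 NOTATION: $\partial_i f$, for a partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$, for $n, m \in \mathbb{Z}^+_0$ and $i \in \mathbb{N}_n$, denotes the
partial derivative of $f$ with respect to component $i$.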


42.5.5 REMARK: Multi-index-lists for partial derivative trees.
Strictly speaking, the “multi-indices” in Definition 42.5.6 are multi-index-lists as in Definition 14.8.38. For
definiteness, one may assume that the elements of such lists are indexed commencing with 1. Thus a
multi-index-list $\alpha = (\alpha_1, \ldots, \alpha_k)$ is an element of $\mathrm{List}(\mathbb{N}_n) = \bigcup_{k=0}^\infty \mathbb{N}_n^k$.


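42.5.6 DEFINITION: The partial derivative tree for a partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$ with $n, m \in \mathbb{Z}^+_0$, is the
map $F : \bigcup_{k=0}^\infty \mathbb{N}_n^k \to (\mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m)$ which is defined inductively by the following rules.

(i) $F_\emptyset = f$.

(ii) $\forall k \in \mathbb{Z}^+_0$, $\forall \alpha \in \mathbb{N}_n^k$, $\forall i \in \mathbb{N}_n$, $F_{\alpha,i} = \partial_i F_\alpha$, where “$\alpha, i$” means the concatenation of $\alpha$ and $(i)$.

The partial derivative for multi-index $\alpha$ of a partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$, for $\alpha \in \mathbb{N}_n^k$ with
$n, m, k \in \mathbb{Z}^+_0$, is the value $F_\alpha$ of the partial derivative tree for $f$.

42.5.7 NOTATION: $\partial_\alpha f$, for a partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$ and $\alpha \in \mathbb{N}_n^k$, for $n, m, k \in \mathbb{Z}^+_0$, denotes the
partial derivative of $f$ for multi-index $\alpha$.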
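
The inductive rules (i) and (ii) of Definition 42.5.6 can be sketched in code for the special case of
polynomials. This is a minimal illustration, not the construction in the text: the dictionary representation
of polynomials and the names `partial` and `derivative_tree` are assumptions made here, and since polynomials
are defined and differentiable everywhere, the domain bookkeeping for partial functions does not arise.

```python
from itertools import product

def partial(poly, i):
    """Partial derivative of a polynomial with respect to the variable x_i (1-based index i).

    A polynomial is a dict mapping exponent tuples to coefficients,
    e.g. {(2, 3): 1} stands for x_1^2 * x_2^3.
    """
    result = {}
    for exponents, coeff in poly.items():
        e = exponents[i - 1]
        if e > 0:
            lowered = exponents[:i - 1] + (e - 1,) + exponents[i:]
            result[lowered] = result.get(lowered, 0) + coeff * e
    return result

def derivative_tree(poly, n, max_order):
    """Map each index list alpha over {1, ..., n} of length <= max_order to d_alpha(poly)."""
    tree = {(): poly}  # rule (i): the empty list is mapped to f itself
    for k in range(max_order):
        for alpha in product(range(1, n + 1), repeat=k):
            for i in range(1, n + 1):
                # rule (ii): F_{alpha,i} = d_i F_alpha, with i appended to alpha
                tree[alpha + (i,)] = partial(tree[alpha], i)
    return tree

# Hypothetical example: f(x_1, x_2) = x_1^2 * x_2^3, with derivatives up to order 2.
tree = derivative_tree({(2, 3): 1}, n=2, max_order=2)
print(tree[(1, 2)], tree[(2, 1)])
```

The tree is keyed by index lists rather than by unordered multi-indices, so the entries for (1, 2) and (2, 1)
are stored separately even when they happen to be equal.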


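42.5.8 DEFINITION: A partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$, for $n, m \in \mathbb{Z}^+_0$, is said to have an undefined partial
derivative for multi-index $\alpha$ at $x \in \mathrm{Dom}(f)$, for $\alpha \in \bigcup_{k=0}^\infty \mathbb{N}_n^k$, if $x \notin \mathrm{Dom}(\partial_\alpha f)$.

A partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$ is said to be k-times differentiable at $x \in \mathrm{Dom}(f)$, for $n, m, k \in \mathbb{Z}^+_0$, if
$x \in \mathrm{Dom}(\partial_\alpha f)$ for all $\alpha \in \bigcup_{j=0}^k \mathbb{N}_n^j$.

A partial function $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$ is said to be k-times differentiable on a set $S \subseteq \mathrm{Dom}(f)$, for $n, m, k \in \mathbb{Z}^+_0$,
if $S \subseteq \mathrm{Dom}(\partial_\alpha f)$ for all $\alpha \in \bigcup_{j=0}^k \mathbb{N}_n^j$.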


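42.5.9 NOTATION: $D^k(U, \mathbb{R}^m)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n, m, k \in \mathbb{Z}^+_0$, denotes the set of $f : U \to \mathbb{R}^m$ such
that $\mathrm{Dom}(\partial_\alpha f) = U$ for all $\alpha \in \mathbb{N}_n^k$. In other words,
\[
\forall n, m, k \in \mathbb{Z}^+_0,\ \forall U \in \mathrm{Top}(\mathbb{R}^n), \quad
D^k(U, \mathbb{R}^m) = \{f : U \to \mathbb{R}^m;\ \forall \alpha \in \mathbb{N}_n^k,\ \mathrm{Dom}(\partial_\alpha f) = U\}.
\]
$D^\infty(U, \mathbb{R}^m)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, denotes the set $\bigcap_{k=0}^\infty D^k(U, \mathbb{R}^m)$.

42.5.10 NOTATION: $C^k(U, \mathbb{R}^m)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n, m, k \in \mathbb{Z}^+_0$, denotes the set of $f : U \to \mathbb{R}^m$ such
that $\mathrm{Dom}(\partial_\alpha f) = U$ and $\partial_\alpha f$ is continuous on $U$ for all $\alpha \in \mathbb{N}_n^k$. In other words,
\[
\begin{aligned}
\forall n, m, k \in \mathbb{Z}^+_0,\ \forall U \in \mathrm{Top}(\mathbb{R}^n), \quad
C^k(U, \mathbb{R}^m) &= \{f \in D^k(U, \mathbb{R}^m);\ \forall \alpha \in \mathbb{N}_n^k,\ \partial_\alpha f \in C(U, \mathbb{R}^m)\} \\
&= \{f : U \to \mathbb{R}^m;\ \forall \alpha \in \mathbb{N}_n^k,\ (\mathrm{Dom}(\partial_\alpha f) = U \text{ and } \partial_\alpha f \in C(U, \mathbb{R}^m))\}.
\end{aligned}
\]

$C^\infty(U, \mathbb{R}^m)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n, m \in \mathbb{Z}^+_0$, denotes the set $\bigcap_{k=0}^\infty C^k(U, \mathbb{R}^m)$.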


42.5.11 DEFINITION: A $C^k$ (differentiable) function from $U$ to $\mathbb{R}^m$, for $k \in \mathbb{Z}^+_0$ and a set $U \in \mathrm{Top}(\mathbb{R}^n)$,
for $n, m \in \mathbb{Z}^+_0$, is a function $f : U \to \mathbb{R}^m$ such that the partial derivatives of $f$ of order less than or equal
to $k$ all exist and are continuous on $U$. In other words, $f \in C^k(U, \mathbb{R}^m)$.


42.5.12 REMARK: Special cases of sets of differentiable and continuously differentiable maps. 

In the special case k = 0 in Notation 42.5.7, the set $\mathbb{N}_n^k$ contains only the empty list or tuple (), which is the
same as the empty function, which is the same as the empty set. Then $\partial_\emptyset f = f$ by Definition 42.5.6. So as
expected, the zeroth derivative of $f$ is $f$.

In the special case k = 0 in Notation 42.5.9, the set $D^0(U, \mathbb{R}^m)$ is equal to $(\mathbb{R}^m)^U$, the set of all functions
from $U$ to $\mathbb{R}^m$, because $\mathrm{Dom}(\partial_\emptyset f) = \mathrm{Dom}(f) = U$ for all functions $f : U \to \mathbb{R}^m$. This is also as expected
because all functions should be zeroth-order differentiable.

In the special case k = 0 in Notation 42.5.10, the set $C^0(U, \mathbb{R}^m)$ is equal to $C(U, \mathbb{R}^m)$, the set of all
continuous functions from $U$ to $\mathbb{R}^m$.


42.5.13 REMARK: Locally $C^k$ Cartesian space maps are useful for differentiable manifold atlases.
Notation 42.5.14 is applicable to the transition maps between charts in atlases for $C^k$ manifolds. For each
pair of charts $\psi_1$ and $\psi_2$, the transition map $\psi_2 \circ \psi_1^{-1}$ must be a $C^k$ map between open subsets of a Cartesian
space. The purpose of Notation 42.5.14 is to avoid the necessity of stating the domain and range.


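42.5.14 NOTATION: Locally $C^k$ maps between Cartesian spaces.
$\mathring{C}^k(\Omega, \mathbb{R}^m)$, for $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ and $n, m, k \in \mathbb{Z}^+_0$, denotes the set of $C^k$ functions from $U$ to $\mathbb{R}^m$ with
$U \in \mathrm{Top}(\Omega)$. In other words,
\[
\forall n, m, k \in \mathbb{Z}^+_0,\ \forall \Omega \in \mathrm{Top}(\mathbb{R}^n), \quad
\mathring{C}^k(\Omega, \mathbb{R}^m) = \bigcup_{U \in \mathrm{Top}(\Omega)} C^k(U, \mathbb{R}^m).
\]

$\mathring{C}^k(\Omega, S)$, for $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $S \in \mathbb{P}(\mathbb{R}^m)$ and $n, m, k \in \mathbb{Z}^+_0$, denotes the set of $f \in \mathring{C}^k(\Omega, \mathbb{R}^m)$ such that
$\mathrm{Range}(f) \subseteq S$. In other words, $\mathring{C}^k(\Omega, S) = \{f \in \mathring{C}^k(\Omega, \mathbb{R}^m);\ \mathrm{Range}(f) \subseteq S\}$.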
42.5.15 REMARK: Simple-looking recurrence formula for $C^k$ map spaces.

As in Theorem 42.2.11 (vii) for real-valued functions on Cartesian spaces, Theorem 42.5.16 (vii) gives a
simple-looking recurrence formula for the spaces of $C^k$ maps between Cartesian spaces. (The statements
and proofs of these two theorems are essentially identical.) Note that it cannot be asserted here that
$C^k(U, \mathbb{R}^m) \supseteq D^{k+1}(U, \mathbb{R}^m)$ because this would be contradicted by Example 41.1.21. (The differentiability
class relations in Theorem 42.5.16 (iii, iv, v) are illustrated in Figure 42.2.1.)

When n = 0, either $U = \emptyset$ or $U = \{0\}$. In either case, $C^k(U, \mathbb{R}^m) = D^k(U, \mathbb{R}^m) = (\mathbb{R}^m)^U$ because then
all functions $f : U \to \mathbb{R}^m$ are constant, which implies that partial derivatives of all orders exist and are
everywhere continuous (possibly on the empty set).
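
Regarding the non-inclusion noted in the first paragraph of this remark, a standard example (which is not
necessarily the same as Example 41.1.21) shows why it fails for k = 0 and n = 2. The function below has both
first-order partial derivatives at every point of $\mathbb{R}^2$, so it is in $D^1(\mathbb{R}^2, \mathbb{R})$, but it is not continuous at
the origin, so it is not in $C^0(\mathbb{R}^2, \mathbb{R})$.

```latex
\[
  f(x_1, x_2) =
  \begin{cases}
    \dfrac{x_1 x_2}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \ne (0, 0), \\[1ex]
    0 & \text{if } (x_1, x_2) = (0, 0),
  \end{cases}
\]
\[
  \partial_1 f(0,0) = \lim_{t \to 0} \frac{f(t,0) - f(0,0)}{t} = 0,
  \qquad
  \partial_2 f(0,0) = 0,
  \qquad
  f(t,t) = \tfrac12 \ \text{ for } t \ne 0 .
\]
```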


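42.5.16 THEOREM: Some inclusion relations between $C^k$ and $D^k$ differentiability classes.
Let $n, m \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$.

(i) $\forall f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$, $\forall \alpha \in \mathrm{List}(\mathbb{N}_n)$, $\forall i \in \mathbb{N}_n$, $\mathrm{Dom}(\partial_\alpha f) \supseteq \mathrm{Dom}(\partial_{\alpha,i} f)$.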


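PROOF: Part (i) follows from Definitions 42.5.3 and 42.5.6 (ii).

Part (ii) follows from part (i) by induction on $\mathrm{length}(\alpha)$ and $\mathrm{length}(\alpha')$.

For part (iii), let $f \in D^{k+1}(U, \mathbb{R}^m)$. Then $\mathrm{Dom}(\partial_\beta f) = U$ for all $\beta \in \mathbb{N}_n^{k+1}$ by Notation 42.5.9. Let
$\alpha \in \mathbb{N}_n^k$. Let $i \in \mathbb{N}_n$. Let $\beta = \mathrm{concat}(\alpha, (i))$. Then $\beta \in \mathbb{N}_n^{k+1}$. So $\mathrm{Dom}(\partial_{\alpha,i} f) = \mathrm{Dom}(\partial_\beta f) = U$. Therefore
$\mathrm{Dom}(\partial_\alpha f) = U$ by part (i). Hence $f \in D^k(U, \mathbb{R}^m)$ by Notation 42.5.9.

Part (iv) follows from Notation 42.5.10.

For part (v), let $k \in \mathbb{Z}^+_0$ and $f \in C^{k+1}(U, \mathbb{R}^m)$. Then $f \in D^{k+1}(U, \mathbb{R}^m)$ by Notation 42.5.10. Therefore
$f \in D^k(U, \mathbb{R}^m)$ by part (iii). So $\mathrm{Dom}(\partial_\alpha f) = U$ for all $\alpha \in \mathbb{N}_n^k$.

Let $\alpha \in \mathbb{N}_n^k$. Let $i \in \mathbb{N}_n$. Let $\beta = \mathrm{concat}(\alpha, (i))$. Then $\beta \in \mathbb{N}_n^{k+1}$. So $\partial_i(\partial_\alpha f) = \partial_{\alpha,i} f = \partial_\beta f \in C(U, \mathbb{R}^m)$ by
Notation 42.5.10. Therefore $\partial_\alpha f \in C^1(U, \mathbb{R}^m)$ by Notation 41.2.19 (or by Notation 42.5.10). Consequently
$\partial_\alpha f \in C(U, \mathbb{R}^m)$ by Theorem 41.6.19 (iii). Hence $f \in C^k(U, \mathbb{R}^m)$ by Notation 42.5.10.

Part (vi) follows from part (v) and Notation 42.5.10 for $C^\infty(U, \mathbb{R}^m)$.

For part (vii), let $f \in C^{k+1}(U, \mathbb{R}^m)$. Then $f \in C^k(U, \mathbb{R}^m)$ by part (v). Let $i \in \mathbb{N}_n$. Let $\alpha \in \mathbb{N}_n^k$. Let
$\beta = \mathrm{concat}((i), \alpha)$. Then $\beta \in \mathbb{N}_n^{k+1}$. So $\partial_\alpha(\partial_i f) = \partial_\beta f \in C(U, \mathbb{R}^m)$ by Notation 42.5.10. So $\partial_i f \in C^k(U, \mathbb{R}^m)$
by Notation 42.5.10 since $f \in D^k(U, \mathbb{R}^m)$ by part (iv). Thus
$C^{k+1}(U, \mathbb{R}^m) \subseteq \{f \in C^k(U, \mathbb{R}^m);\ \forall i \in \mathbb{N}_n,\ \partial_i f \in C^k(U, \mathbb{R}^m)\}$.

For the reverse inclusion, let $f \in C^k(U, \mathbb{R}^m)$ satisfy $\forall i \in \mathbb{N}_n$, $\partial_i f \in C^k(U, \mathbb{R}^m)$. Let $\beta \in \mathbb{N}_n^{k+1}$. Then
$\beta = \mathrm{concat}((i), \alpha)$ for some (unique) $i \in \mathbb{N}_n$ and $\alpha \in \mathbb{N}_n^k$. So $\partial_i f \in C^k(U, \mathbb{R}^m)$. Thus $\mathrm{Dom}(\partial_\alpha(\partial_i f)) = U$
by Notation 42.5.9. But $\partial_\alpha(\partial_i f) = \partial_{i,\alpha} f = \partial_\beta f$. So $\mathrm{Dom}(\partial_\beta f) = U$ for all $\beta \in \mathbb{N}_n^{k+1}$. Therefore
$f \in D^{k+1}(U, \mathbb{R}^m)$ by Notation 42.5.9. It also follows from $\partial_i f \in C^k(U, \mathbb{R}^m)$ that $\forall \alpha \in \mathbb{N}_n^k$, $\partial_\alpha(\partial_i f) \in C(U, \mathbb{R}^m)$
by Notation 42.5.10. Thus $\partial_\beta f \in C(U, \mathbb{R}^m)$ for all $\beta \in \mathbb{N}_n^{k+1}$. So $f \in C^{k+1}(U, \mathbb{R}^m)$ by Notation 42.5.10.
Hence $C^{k+1}(U, \mathbb{R}^m) = \{f \in C^k(U, \mathbb{R}^m);\ \forall i \in \mathbb{N}_n,\ \partial_i f \in C^k(U, \mathbb{R}^m)\}$.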


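42.5.17 THEOREM: A function is $C^{k+1}$ if and only if its order-k derivatives are $C^1$.
Let $n, m, k \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$.

(i) If $f \in C^{k+1}(U, \mathbb{R}^m)$, then $\partial_\alpha f \in C^1(U, \mathbb{R}^m)$ for all $\alpha \in \mathbb{N}_n^k$.

(ii) If $\partial_\alpha f \in C^1(U, \mathbb{R}^m)$ for all $\alpha \in \mathbb{N}_n^k$, then $f \in C^{k+1}(U, \mathbb{R}^m)$.

PROOF: For part (i), let $f \in C^{k+1}(U, \mathbb{R}^m)$. Then by Notation 42.5.10, $\partial_\beta f \in C(U, \mathbb{R}^m)$ for all $\beta \in \mathbb{N}_n^{k+1}$.
Let $\alpha \in \mathbb{N}_n^k$. Then $\mathrm{concat}(\alpha, (i)) \in \mathbb{N}_n^{k+1}$ for all $i \in \mathbb{N}_n$. So $\partial_i(\partial_\alpha f) = \partial_{\alpha,i} f \in C(U, \mathbb{R}^m)$ for all $i \in \mathbb{N}_n$.
Therefore $\partial_\alpha f \in C^1(U, \mathbb{R}^m)$ by Definition 41.2.18.

For part (ii), assume that $\partial_\alpha f \in C^1(U, \mathbb{R}^m)$ for all $\alpha \in \mathbb{N}_n^k$. Then $\partial_i(\partial_\alpha f) \in C(U, \mathbb{R}^m)$ for all $\alpha \in \mathbb{N}_n^k$ and
$i \in \mathbb{N}_n$ by Definition 41.2.18. Let $\beta \in \mathbb{N}_n^{k+1}$. Then $\beta = \mathrm{concat}(\alpha, (i))$ for some $\alpha \in \mathbb{N}_n^k$ and $i \in \mathbb{N}_n$, and so
$\partial_\beta f = \partial_{\alpha,i} f = \partial_i(\partial_\alpha f) \in C(U, \mathbb{R}^m)$. Hence $f \in C^{k+1}(U, \mathbb{R}^m)$ by Notation 42.5.10.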


42.5.18 REMARK: $C^k$ differentiability of first-order derivatives implies $C^{k+1}$.
Theorem 42.5.19 is the extension of Theorem 42.1.15 from real-valued functions of a real number to maps
between Cartesian spaces. Theorem 42.5.19 (i) is slightly stronger than Theorem 42.5.16 (vii).


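42.5.19 THEOREM: A function with $C^k$ first-order derivatives must be $C^{k+1}$.
Let $n, m \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $f : U \to \mathbb{R}^m$.

(i) For all $k \in \mathbb{Z}^+_0$, if $\partial_j f \in C^k(U, \mathbb{R}^m)$ for all $j \in \mathbb{N}_n$, then $f \in C^{k+1}(U, \mathbb{R}^m)$.

(ii) If $\partial_j f \in C^\infty(U, \mathbb{R}^m)$ for all $j \in \mathbb{N}_n$, then $f \in C^\infty(U, \mathbb{R}^m)$.

PROOF: For part (i), let $k \in \mathbb{Z}^+_0$, and suppose that $\partial_j f \in C^k(U, \mathbb{R}^m)$ for all $j \in \mathbb{N}_n$. Then $\mathrm{Dom}(\partial_\alpha \partial_j f) = U$
and $\partial_\alpha \partial_j f \in C(U, \mathbb{R}^m)$ for all $\alpha \in \mathbb{N}_n^k$ and $j \in \mathbb{N}_n$ by Notation 42.5.10.

Let $\beta \in \mathbb{N}_n^{k+1}$. Then $\beta = \mathrm{concat}((j), \alpha)$ for some $\alpha \in \mathbb{N}_n^k$ and $j \in \mathbb{N}_n$. So $\partial_\beta f = \partial_\alpha \partial_j f$, and then
$\mathrm{Dom}(\partial_\beta f) = U$ and $\partial_\beta f \in C(U, \mathbb{R}^m)$. Hence $f \in C^{k+1}(U, \mathbb{R}^m)$ by Notation 42.5.10.

For part (ii), suppose that $\partial_j f \in C^\infty(U, \mathbb{R}^m)$ for all $j \in \mathbb{N}_n$. Then $\partial_j f \in C^k(U, \mathbb{R}^m)$ for all $j \in \mathbb{N}_n$, for all
$k \in \mathbb{Z}^+_0$, by Notation 42.5.10. So $f \in C^{k+1}(U, \mathbb{R}^m)$ for all $k \in \mathbb{Z}^+_0$ by part (i). Hence $f \in C^\infty(U, \mathbb{R}^m)$ by
Notation 42.5.10 and Theorem 42.5.16 (v).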


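42.5.20 THEOREM: Constant maps between Cartesian spaces are $C^\infty$ differentiable.
Let $n, m \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Suppose that $f : U \to \mathbb{R}^m$ satisfies $\exists c \in \mathbb{R}^m$, $\forall p \in U$, $f(p) = c$. Then
$f \in C^\infty(U, \mathbb{R}^m)$ and $\partial_\alpha f(p) = 0_{\mathbb{R}^m}$ for all $p \in U$, for all $\alpha \in \mathbb{N}_n^k$ with $k \in \mathbb{Z}^+$.

PROOF: Let $f : U \to \mathbb{R}^m$ satisfy $\exists c \in \mathbb{R}^m$, $\forall p \in U$, $f(p) = c$. Then by Theorem 41.2.23, $f$ is partially
differentiable on $U$ and satisfies $\forall p \in U$, $\forall i \in \mathbb{N}_n$, $\partial_i f(p) = 0_{\mathbb{R}^m}$ and $f \in C^1(U, \mathbb{R}^m)$. Thus the proposition
$P(k)$ = “$f \in C^k(U, \mathbb{R}^m)$ and $\forall p \in U$, $\forall \alpha \in \mathbb{N}_n^k$, $\partial_\alpha f(p) = 0_{\mathbb{R}^m}$” is true for k = 1.

Now suppose that $P(k)$ is true for some $k \in \mathbb{Z}^+$. Let $\beta \in \mathbb{N}_n^{k+1}$. Then $\beta = \mathrm{concat}(\alpha, (i))$ for some $i \in \mathbb{N}_n$ and
$\alpha \in \mathbb{N}_n^k$. Then $\partial_\alpha f(p) = 0_{\mathbb{R}^m}$ for all $p \in U$ for such $\alpha$, which implies that $\partial_\alpha f$ is constant on $U$, which implies
by Theorem 41.2.23 that $\partial_\alpha f$ is partially differentiable on $U$ and satisfies $\forall p \in U$, $\forall i \in \mathbb{N}_n$, $\partial_i \partial_\alpha f(p) = 0_{\mathbb{R}^m}$
and $\partial_\alpha f \in C^1(U, \mathbb{R}^m)$. Thus $\forall p \in U$, $\forall \beta \in \mathbb{N}_n^{k+1}$, $\partial_\beta f(p) = \partial_i \partial_\alpha f(p) = 0_{\mathbb{R}^m}$, and then Theorem 42.5.17 (ii)
implies that $f \in C^{k+1}(U, \mathbb{R}^m)$. Therefore $P(k+1)$ is true. Hence by induction, $f \in C^\infty(U, \mathbb{R}^m)$ and
$\forall k \in \mathbb{Z}^+$, $\forall \alpha \in \mathbb{N}_n^k$, $\forall p \in U$, $\partial_\alpha f(p) = 0_{\mathbb{R}^m}$.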


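42.5.21 THEOREM: Differentiability of the identity map on Cartesian spaces.
Let $n \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$.

(i) $\mathrm{id}_U \in C^1(U, \mathbb{R}^n)$ and $\forall j \in \mathbb{N}_n$, $\forall p \in U$, $\partial_j \mathrm{id}_U(p) = e_j$.

(ii) $\mathrm{id}_U \in C^\infty(U, \mathbb{R}^n)$ and $\forall k \in \mathbb{Z}^+$, $\forall \alpha \in \mathbb{N}_n^{k+1}$, $\forall p \in U$, $\partial_\alpha \mathrm{id}_U(p) = 0_{\mathbb{R}^n}$.

PROOF: Part (i) is a paraphrase of Theorem 41.2.24.

For part (ii), $\partial_j \mathrm{id}_U$ is constant for all $j \in \mathbb{N}_n$ by part (i). So by Theorem 42.5.20, $\partial_j \mathrm{id}_U \in C^\infty(U, \mathbb{R}^n)$ for all
$j \in \mathbb{N}_n$ and $\forall j \in \mathbb{N}_n$, $\forall k \in \mathbb{Z}^+$, $\forall \alpha \in \mathbb{N}_n^k$, $\forall p \in U$, $\partial_\alpha \partial_j \mathrm{id}_U(p) = 0_{\mathbb{R}^n}$. By substituting $\beta$ for $\mathrm{concat}((j), \alpha)$,
it follows that $\forall k \in \mathbb{Z}^+$, $\forall \beta \in \mathbb{N}_n^{k+1}$, $\forall p \in U$, $\partial_\beta \mathrm{id}_U(p) = 0_{\mathbb{R}^n}$. By Theorem 42.5.19 (ii), $\partial_j \mathrm{id}_U \in C^\infty(U, \mathbb{R}^n)$
implies $\mathrm{id}_U \in C^\infty(U, \mathbb{R}^n)$. This verifies the claim.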


42.5.22 REMARK: Commutativity of partial derivatives for maps between Cartesian spaces.
Theorem 42.5.23 follows immediately from Theorem 42.3.6 because each component $f(x)_j$ of $f(x)$ for $j \in \mathbb{N}_m$
is a real-valued function on an open subset of a Cartesian space.


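42.5.23 THEOREM: The partial derivative commutativity theorem for Cartesian space maps.
Let $n, m, k \in \mathbb{Z}^+_0$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ and $f \in C^k(\Omega, \mathbb{R}^m)$. Then $\partial_\alpha f(p)_j = \partial_{\alpha \circ P^{-1}} f(p)_j$ for all $p \in \Omega$, for all
$j \in \mathbb{N}_m$ and $\alpha \in \mathbb{N}_n^k$, for all permutations (i.e. bijections) $P : \mathbb{N}_k \to \mathbb{N}_k$. In other words,
\[
\forall n, m, k \in \mathbb{Z}^+_0,\ \forall \Omega \in \mathrm{Top}(\mathbb{R}^n),\ \forall f \in C^k(\Omega, \mathbb{R}^m),\ \forall p \in \Omega,\ \forall j \in \mathbb{N}_m,\ \forall \alpha \in \mathbb{N}_n^k,\ \forall P \in \mathrm{perm}(\mathbb{N}_k),
\quad \partial_\alpha f(p)_j = \partial_{\alpha \circ P^{-1}} f(p)_j.
\]

In other words, the value of multi-index partial derivatives is independent of the order of differentiation.

PROOF: The assertion follows from Theorem 42.3.6.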


42.5.24 REMARK: Second-order chain rule for continuously partially differentiable maps. 

Theorem 42.5.25 is the second-order version of the first-order chain rule in Theorem 41.7.4. First-order 
partial derivatives must be defined on a neighbourhood of each point so that the second-order derivative can 
be computed. 


42.5.25 THEOREM: Second-order chain rule for partial derivatives. 

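Let $n, m, p \in \mathbb{Z}^+_0$. Let $f : U \to \mathbb{R}^m$ and $g : V \to \mathbb{R}^p$ be twice continuously partially differentiable on
$U \in \mathrm{Top}(\mathbb{R}^n)$ and $V \in \mathrm{Top}(\mathbb{R}^m)$ respectively. Then $g \circ f$ is twice continuously totally differentiable on
$f^{-1}(V)$ and
\[
\forall x \in f^{-1}(V),\ \forall i, j \in \mathbb{N}_n,\ \forall q \in \mathbb{N}_p, \quad
\partial_i \partial_j (g \circ f)(x)_q
= \sum_{k=1}^m \partial_i \partial_j f_k(x)\, (\partial_k g_q)(f(x))
+ \sum_{k,\ell=1}^m \partial_i f_k(x)\, \partial_j f_\ell(x)\, (\partial_k \partial_\ell g_q)(f(x)).
\]

In other words, $\partial_i \partial_j (g \circ f) = \sum_k \partial_i \partial_j f_k\, (\partial_k g) \circ f + \sum_{k,\ell} \partial_i f_k\, \partial_j f_\ell\, (\partial_k \partial_\ell g) \circ f$.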


PROOF: The assertion follows from Theorems 41.7.4 and 40.5.7 (iii). 
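
In the special case n = m = p = 1, the formula reduces to the familiar second-order chain rule for real
functions. The choice $f(x) = x^2$ and $g(y) = e^y$ below is an illustrative assumption, not an example from the
text.

```latex
\[
  (g \circ f)''(x) = f''(x)\, g'(f(x)) + f'(x)^2\, g''(f(x)),
\]
\[
  \frac{d^2}{dx^2}\, e^{x^2} = 2\, e^{x^2} + (2x)^2\, e^{x^2} = (2 + 4x^2)\, e^{x^2}.
\]
```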


42.5.26 REMARK: Chain rule for higher-order differentiability between Cartesian spaces. 

Theorem 42.5.25 gives a hint that the order-k derivatives of a chain of two functions may become rapidly 
more complicated as k increases. A general rule for finite order could be given, but it would probably not 
be very useful. 


Between Cartesian spaces, the values of order-k function derivatives are higher-degree arrays of the form
$\alpha \mapsto \partial_\alpha f$ for $\alpha \in \mathbb{N}_n^k$. (Such arrays are described briefly in Section 25.15.) In the case of $C^k$ manifolds,
the higher-order derivatives are even more complicated objects in higher-order tangent bundles. Therefore
Theorem 42.5.27 is concerned only with the $C^k$ differentiability property, not with the more complicated
question of what the higher-order derivatives actually are.


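42.5.27 THEOREM: $C^k$ differentiability of a chain of $C^k$ functions between Cartesian spaces.
Let $n, m, p \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$ and $V \in \mathrm{Top}(\mathbb{R}^m)$. Let $k \in \mathbb{Z}^+_0 \cup \{\infty\}$. Let $f \in C^k(U, \mathbb{R}^m)$ and
$g \in C^k(V, \mathbb{R}^p)$. Then $g \circ f \in C^k(f^{-1}(V), \mathbb{R}^p)$.

PROOF: Let $n, m, p \in \mathbb{Z}^+_0$, and let $U \in \mathrm{Top}(\mathbb{R}^n)$ and $V \in \mathrm{Top}(\mathbb{R}^m)$. For all $k \in \mathbb{Z}^+_0$, let $P(k)$ be the
proposition “$\forall f \in C^k(U, \mathbb{R}^m)$, $\forall g \in C^k(V, \mathbb{R}^p)$, $g \circ f \in C^k(f^{-1}(V), \mathbb{R}^p)$”. Then $P(0)$, $P(1)$ and $P(2)$ follow
from Theorems 31.13.4, 41.7.4 and 42.5.25 respectively. Assume that $P(k)$ is valid for some $k \in \mathbb{Z}^+_0$.
Suppose that $f \in C^{k+1}(U, \mathbb{R}^m)$ and $g \in C^{k+1}(V, \mathbb{R}^p)$. Then by Theorem 41.7.4,
\[
\begin{aligned}
\forall i \in \mathbb{N}_n,\ \forall x \in f^{-1}(V), \quad
\partial_i (g \circ f)(x) &= \sum_{\ell=1}^m \partial_i f_\ell(x)\, (\partial_\ell g)(f(x)) \\
&= \sum_{\ell=1}^m \bigl(\partial_i f_\ell \cdot ((\partial_\ell g) \circ f)\bigr)(x),
\end{aligned}
\]
where “$\cdot$” denotes pointwise function product. But $\partial_\ell g \in C^k(V, \mathbb{R}^p)$ for $\ell \in \mathbb{N}_m$ by Theorem 42.5.16 (vii),
and $f \in C^k(U, \mathbb{R}^m)$ by Theorem 42.5.16 (v). So $(\partial_\ell g) \circ f \in C^k(f^{-1}(V), \mathbb{R}^p)$ by $P(k)$. Therefore since
$\partial_i f_\ell$ is of class $C^k$ on $U$ for all $i \in \mathbb{N}_n$ and $\ell \in \mathbb{N}_m$ by Theorem 42.5.16 (vii), and pointwise products and
finite sums of $C^k$ functions are $C^k$, it follows that $\partial_i (g \circ f) \in C^k(f^{-1}(V), \mathbb{R}^p)$ for all $i \in \mathbb{N}_n$. So
$g \circ f \in C^{k+1}(f^{-1}(V), \mathbb{R}^p)$ by Theorem 42.5.19 (i), which verifies $P(k+1)$. Hence by induction,
$P(k)$ is valid for all $k \in \mathbb{Z}^+_0$. The case $k = \infty$ then follows from Notation 42.5.10.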


42.5.28 REMARK:  Keyhole testing for higher-order differentiability. 

Theorem 42.5.29 (i) is a higher-order generalisation of the “keyhole test” for the first-order differentiability of
maps in Theorem 41.2.4. Theorem 42.5.29 (ii) is a higher-order generalisation of the inheritance of first-order
differentiability of maps by restrictions of maps to open subsets in Theorem 41.2.5. Theorem 42.5.29 (iii, iv)
is a higher-order generalisation of the “keyhole test” for continuous first-order differentiability of maps in
Theorem 41.2.21.

42.5.29 THEOREM: Local/global higher-order differentiability implications.
Let $n, m \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0 \cup \{\infty\}$, $U \in \mathrm{Top}(\mathbb{R}^n)$ and $f : U \to \mathbb{R}^m$.

(i) If $\forall p \in U$, $\exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega \in D^k(\Omega, \mathbb{R}^m)$, then $f \in D^k(U, \mathbb{R}^m)$.

(ii) If $f \in D^k(U, \mathbb{R}^m)$, then $\forall \Omega \in \mathrm{Top}(U)$, $f|_\Omega \in D^k(\Omega, \mathbb{R}^m)$.

(iii) If $\forall p \in U$, $\exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega \in C^k(\Omega, \mathbb{R}^m)$, then $f \in C^k(U, \mathbb{R}^m)$.

(iv) If $f \in C^k(U, \mathbb{R}^m)$, then $\forall \Omega \in \mathrm{Top}(U)$, $f|_\Omega \in C^k(\Omega, \mathbb{R}^m)$.


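PROOF: For part (i), the case k = 0 follows from the observation that $D^0(U, \mathbb{R}^m) = (\mathbb{R}^m)^U$ is the set of all
functions from $U$ to $\mathbb{R}^m$, and likewise $D^0(\Omega, \mathbb{R}^m) = (\mathbb{R}^m)^\Omega$ for all $\Omega \in \mathrm{Top}(U)$. (See Remark 42.5.12.) The
case k = 1 follows from Theorem 41.2.4, which says the same thing.

For $k \in \mathbb{Z}^+$, assume that $\forall p \in U$, $\exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega \in D^k(\Omega, \mathbb{R}^m)$ implies $f \in D^k(U, \mathbb{R}^m)$. Let $f$
satisfy $\forall p \in U$, $\exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega \in D^{k+1}(\Omega, \mathbb{R}^m)$. Then $f \in D^k(U, \mathbb{R}^m)$ by Theorem 42.5.16 (iii) and
the assumption. Let $\alpha \in \mathbb{N}_n^k$. Then $\partial_\alpha f : U \to \mathbb{R}^m$ is well defined by Notation 42.5.9. Let $p \in U$. Let
$\Omega \in \mathrm{Top}_p(U)$ satisfy $f|_\Omega \in D^{k+1}(\Omega, \mathbb{R}^m)$. Then $(\partial_\alpha f)|_\Omega = \partial_\alpha(f|_\Omega) \in D^1(\Omega, \mathbb{R}^m)$ by Notation 42.5.9. So by
Theorem 41.2.4, $\partial_\alpha f \in D^1(U, \mathbb{R}^m)$. Therefore $f \in D^{k+1}(U, \mathbb{R}^m)$ by Notation 42.5.9. Thus the proposition
for k + 1 follows from the proposition for k. Hence by induction, the proposition holds for all $k \in \mathbb{Z}^+_0$. Then
the proposition for $k = \infty$ follows from Notation 42.5.9.

For part (ii), the case k = 0 follows from the observation that $D^0(U, \mathbb{R}^m) = (\mathbb{R}^m)^U$ is the set of all functions
from $U$ to $\mathbb{R}^m$, and likewise $D^0(\Omega, \mathbb{R}^m) = (\mathbb{R}^m)^\Omega$ for all $\Omega \in \mathrm{Top}(U)$. The case k = 1 follows from
Theorem 41.2.5, which says the same thing.

For $k \in \mathbb{Z}^+$, assume that $f \in D^k(U, \mathbb{R}^m)$ implies $\forall \Omega \in \mathrm{Top}(U)$, $f|_\Omega \in D^k(\Omega, \mathbb{R}^m)$. Suppose that $f$ satisfies
$f \in D^{k+1}(U, \mathbb{R}^m)$. Then $\forall \Omega \in \mathrm{Top}(U)$, $f|_\Omega \in D^k(\Omega, \mathbb{R}^m)$ by Theorem 42.5.16 (iii) and the assumption.
Let $\Omega \in \mathrm{Top}(U)$. Then $f|_\Omega \in D^k(\Omega, \mathbb{R}^m)$. Let $\alpha \in \mathbb{N}_n^k$. Then $\partial_\alpha(f|_\Omega) : \Omega \to \mathbb{R}^m$ is well defined by
Notation 42.5.9. But $\partial_\alpha f \in D^1(U, \mathbb{R}^m)$ by Notation 42.5.9. So $(\partial_\alpha f)|_\Omega = \partial_\alpha(f|_\Omega) \in D^1(\Omega, \mathbb{R}^m)$ by
Theorem 41.2.5. Therefore $f|_\Omega \in D^{k+1}(\Omega, \mathbb{R}^m)$ by Notation 42.5.9. Thus $f|_\Omega \in D^{k+1}(\Omega, \mathbb{R}^m)$ for all
$\Omega \in \mathrm{Top}(U)$, which verifies the proposition for k + 1. Hence by induction, the proposition holds for all
$k \in \mathbb{Z}^+_0$, and then the proposition for $k = \infty$ follows from Notation 42.5.9.

For part (iii), the case k = 0 follows from Theorem 31.12.17, and the case k = 1 follows from Theorem 41.2.21.
For $k \in \mathbb{Z}^+$, assume that $\forall p \in U$, $\exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega \in C^k(\Omega, \mathbb{R}^m)$ implies $f \in C^k(U, \mathbb{R}^m)$. Suppose that $f$
satisfies $\forall p \in U$, $\exists \Omega \in \mathrm{Top}_p(U)$, $f|_\Omega \in C^{k+1}(\Omega, \mathbb{R}^m)$. Then $f \in C^k(U, \mathbb{R}^m)$ by Theorem 42.5.16 (v) and the
assumption. Let $\alpha \in \mathbb{N}_n^k$. Then $\partial_\alpha f \in C(U, \mathbb{R}^m)$ by Notation 42.5.10. Let $p \in U$. Let $\Omega \in \mathrm{Top}_p(U)$ satisfy
$f|_\Omega \in C^{k+1}(\Omega, \mathbb{R}^m)$. Then $(\partial_\alpha f)|_\Omega = \partial_\alpha(f|_\Omega) \in C^1(\Omega, \mathbb{R}^m)$ by Notation 42.5.10. So by Theorem 41.2.21,
$\partial_\alpha f \in C^1(U, \mathbb{R}^m)$. Therefore $f \in C^{k+1}(U, \mathbb{R}^m)$ by Notation 42.5.10. Thus the proposition for k + 1 follows
from the proposition for k. Hence by induction, the proposition holds for all $k \in \mathbb{Z}^+_0$. Then the proposition
for $k = \infty$ follows from Notation 42.5.10.

For part (iv), the case k = 0 follows from Theorem 31.12.15, and the case k = 1 follows from Theorem 41.2.21
because in the special case $\Omega = \emptyset$, the empty function $f|_\Omega = \emptyset$ satisfies $f|_\Omega \in C^k(\Omega, \mathbb{R}^m)$ by Notation 42.5.10.

For $k \in \mathbb{Z}^+$, assume that $f \in C^k(U, \mathbb{R}^m)$ implies $\forall \Omega \in \mathrm{Top}(U)$, $f|_\Omega \in C^k(\Omega, \mathbb{R}^m)$. Suppose that $f$ satisfies
$f \in C^{k+1}(U, \mathbb{R}^m)$. Then $\forall \Omega \in \mathrm{Top}(U)$, $f|_\Omega \in C^k(\Omega, \mathbb{R}^m)$ by Theorem 42.5.16 (v) and the assumption. Let
$\Omega \in \mathrm{Top}(U)$. Then $f|_\Omega \in C^k(\Omega, \mathbb{R}^m)$. Let $\alpha \in \mathbb{N}_n^k$. Then $\partial_\alpha(f|_\Omega) \in C(\Omega, \mathbb{R}^m)$ by Notation 42.5.10. But
$\partial_\alpha f \in C^1(U, \mathbb{R}^m)$ by Notation 42.5.10. So $(\partial_\alpha f)|_\Omega = \partial_\alpha(f|_\Omega) \in C^1(\Omega, \mathbb{R}^m)$ by Theorem 41.2.21. Therefore
$f|_\Omega \in C^{k+1}(\Omega, \mathbb{R}^m)$ by Notation 42.5.10. Thus $f|_\Omega \in C^{k+1}(\Omega, \mathbb{R}^m)$ for all $\Omega \in \mathrm{Top}(U)$, which verifies the
proposition for k + 1. Hence by induction, the proposition holds for all $k \in \mathbb{Z}^+_0$, and then the proposition for
$k = \infty$ follows from Notation 42.5.10.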


42.5.30 REMARK: Higher-order differentiability of vector-valued functions on Cartesian spaces. 

Notation 42.5.32 is the abstract vector-valued version of Notation 42.5.10 for spaces of concrete Cartesian
space valued functions on a Cartesian space. This family of spaces extends the first-order space $C^1(U, W)$
in Notation 41.8.10. Definition 42.5.31 is the higher-order version of Definition 41.8.9. The abstract vector
values are made “concrete” by applying the linear space component map in Definition 22.8.8.

42.5.31 DEFINITION: A $C^k$ (differentiable) $W$-valued function on a set $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n, k \in \mathbb{Z}^+_0$ and
a finite-dimensional real linear space $W$, is a function $f : U \to W$ such that $\kappa_B \circ f \in C^k(U, \mathbb{R}^m)$ for some
basis $B$ for $W$, where $m = \dim(W)$. (See Definition 22.8.8 for the component map $\kappa_B$.)

42.5.32 NOTATION: $C^k(U, W)$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n, k \in \mathbb{Z}^+_0$ and a finite-dimensional real linear
space $W$, denotes the set of $C^k$ differentiable $W$-valued functions $f : U \to W$.

$C^\infty(U, W)$, for a set $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}^+_0$ and a finite-dimensional real linear space $W$, denotes the
set $\bigcap_{k=0}^\infty C^k(U, W)$.


42.5.33 REMARK: Sets of functions with Hölder continuous derivatives.

Notation 42.5.34 for $C^{k,\alpha}$ functions is an extension of Notation 38.7.5 for $C^{0,\alpha}$ functions, except that the
source and target spaces must be Cartesian spaces, not fully general metric spaces. For differentiability, some
kind of differentiable structure is required on the source and target spaces, plus also some kind of metric
structure so that Hölder continuity is well defined. Cartesian spaces have both the required differentiable
structure and metric structure. General manifolds have the required differentiable structure, but not the
metric structure, unless this is added.

As noted in Remark 38.7.4, the PDE literature defines $C^{k,\alpha}$ spaces with many variants of the concept of
Hölder continuity, which have varying degrees of locality and uniformity. To keep matters simple, Notations
38.7.5 and 42.5.34 assume the strongest form of Hölder continuity, which is global and uniform.

The special case $C^{k,1}(U, \mathbb{R}^m)$ denotes the set of functions with Lipschitz continuous kth derivatives.


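42.5.34 NOTATION: $C^{k,\alpha}(U, \mathbb{R}^m)$, for $k \in \mathbb{Z}^+_0$, $\alpha \in (0, 1]$, $U \in \mathrm{Top}(\mathbb{R}^n)$ and $m, n \in \mathbb{Z}^+_0$, denotes the set of
$C^k$ functions from $U$ to $\mathbb{R}^m$ whose derivatives of order $k$ are uniformly $\alpha$-Hölder continuous functions from
$U$ to $\mathbb{R}^m$. In other words, writing $\beta$ for the multi-index to avoid a clash with the Hölder exponent $\alpha$,
\[
C^{k,\alpha}(U, \mathbb{R}^m) = \{f \in C^k(U, \mathbb{R}^m);\ \forall \beta \in \mathbb{N}_n^k,\ \partial_\beta f \in C^{0,\alpha}(U, \mathbb{R}^m)\}.
\]

(See Notation 42.5.10 for $C^k(U, \mathbb{R}^m)$.)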
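
As a hedged illustration (not from the text), the function $f(x) = |x|^{3/2}$ on $U = \mathbb{R}$ has a first derivative
which is globally Hölder continuous with exponent $1/2$ but is not Lipschitz continuous near $0$. So it belongs
to the class with k = 1 and Hölder exponent $1/2$, but not to the Lipschitz class with k = 1.

```latex
\[
  f'(x) = \tfrac32 \operatorname{sgn}(x)\, |x|^{1/2},
  \qquad
  |f'(x) - f'(y)| \le \tfrac32 \sqrt{2}\; |x - y|^{1/2}
  \quad \text{for all } x, y \in \mathbb{R}.
\]
```

The bound follows from $|\sqrt{a} - \sqrt{b}| \le \sqrt{|a - b|}$ for points of the same sign and from
$\sqrt{a} + \sqrt{b} \le \sqrt{2}\,\sqrt{a + b}$ for points of opposite sign, where $a, b \ge 0$ are the absolute values.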


42.6. Applications of higher-order derivatives of maps 


42.6.1 REMARK: Constant functions are infinitely continuously differentiable. 

It is useful to know that constant functions are infinitely continuously differentiable. But proving this is 
also a useful exercise in the mechanics of applying the definitions. Table 42.6.1 lists some continuity and 
differentiability properties for constant functions and where they are proved in theorems. 


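theorem   function class                                            property of constant functions
31.129    $f : X \to Y$                                             $f$ continuous for general topological spaces $X$ and $Y$
40.5.5    $f : \mathbb{R} \mathrel{\mathring{\to}} \mathbb{R}$      $f$ differentiable and $f' = 0$
41.1.16   $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}$    $f$ partially differentiable and $\partial_i f = 0$, $i \in \mathbb{N}_n$
41.2.23   $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$  $f \in C^1(\mathrm{Dom}(f), \mathbb{R}^m)$ and $\partial_i f = 0$, $i \in \mathbb{N}_n$
41.8.12   $f : \mathbb{R}^n \mathrel{\mathring{\to}} W$             $f \in C^1(\mathrm{Dom}(f), W)$ and $\partial_i f = 0$, $i \in \mathbb{N}_n$, linear space $W$
42.1.14   $f : \mathbb{R} \mathrel{\mathring{\to}} \mathbb{R}$      $f \in C^\infty(\mathrm{Dom}(f), \mathbb{R})$ and $\partial^k f = 0$, $k \in \mathbb{Z}^+$
42.2.16   $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}$    $f \in C^\infty(\mathrm{Dom}(f), \mathbb{R})$ and $\partial_\alpha f = 0$, $\alpha \in \mathbb{N}_n^k$, $k \in \mathbb{Z}^+$
42.6.2    $f : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^m$  $f \in C^\infty(\mathrm{Dom}(f), \mathbb{R}^m)$ and $\partial_\alpha f = 0$, $\alpha \in \mathbb{N}_n^k$, $k \in \mathbb{Z}^+$
5165      $f : M \to \mathbb{R}$                                    $f \in C^k(M, \mathbb{R})$ for a $C^k$ manifold $M$, $k \in \mathbb{Z}^+_0$
51.58     $f : M \to W$                                             $f \in C^1(M, W)$ for a $C^1$ manifold $M$, linear space $W$
52.1.9    $f : M_1 \to M_2$                                         $f \in C^k(M_1, M_2)$ for $C^k$ manifolds $M_1$ and $M_2$, $k \in \mathbb{Z}^+_0$
54.116    $f : M \to \mathbb{R}$                                    $f \in C^1(M, \mathbb{R})$ for a $C^1$ manifold $M$, $\partial_i f = 0$ for $i \in \mathbb{N}_n$, $n = \dim(M)$
54.14.5   $f : M \to W$                                             ôpvwyf = 0 for a $C^1$ manifold $M$, linear space $W$

Table 42.6.1 Differentiability and derivatives of constant functions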
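42.6.2 THEOREM: Constant functions between Cartesian spaces are $C^\infty$ with all derivatives zero.
Let $n, m \in \mathbb{Z}^+_0$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $v \in \mathbb{R}^m$. Define $f : U \to \mathbb{R}^m$ by $f(x) = v$ for all $x \in U$. Then
$f \in C^\infty(U, \mathbb{R}^m)$, and $\partial_\alpha f(p) = 0$ for all $p \in U$ and $\alpha \in \mathbb{N}_n^k$, for all $k \in \mathbb{Z}^+$.

PROOF: By Theorem 41.2.23, $f \in C^1(U, \mathbb{R}^m)$ and $\partial_i f(p) = 0$ for all $p \in U$ and $i \in \mathbb{N}_n$. So $\mathrm{Dom}(\partial_i f) = U$
for all $i \in \mathbb{N}_n$ by Definition 42.5.3 and Notation 42.5.4.

Assume for some $k \in \mathbb{Z}^+$ that $\mathrm{Dom}(\partial_\alpha f) = U$ for all $\alpha \in \mathbb{N}_n^k$, and $(\partial_\alpha f)(p) = 0$ for all $p \in U$ and $\alpha \in \mathbb{N}_n^k$.
Then $\mathrm{Dom}(\partial_{\alpha,i} f) = U$ and $(\partial_{\alpha,i} f)(p) = (\partial_i(\partial_\alpha f))(p) = 0$ for all $p \in U$, for all $\alpha \in \mathbb{N}_n^k$ and $i \in \mathbb{N}_n$, by
Theorem 41.2.23 and Definition 42.5.3 and Notation 42.5.7. Thus $\mathrm{Dom}(\partial_\alpha f) = U$ and $(\partial_\alpha f)(p) = 0$ for all
$p \in U$ and $\alpha \in \mathbb{N}_n^{k+1}$. Hence $f \in C^\infty(U, \mathbb{R}^m)$ by Notation 42.5.10 by induction on $k \in \mathbb{Z}^+$.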


42.6.3 REMARK: Basic real-number algebraic operations are $C^\infty$.
The addition and multiplication operations for real numbers and Cartesian linear spaces are shown to be $C^0$
in Theorem 35.3.5, $C^1$ in Theorem 41.3.7, and $C^\infty$ in Theorem 42.6.4.


42.6.4 THEOREM: Smoothness of some basic real-number algebraic operations.

(i) The real-number addition operation $\sigma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is $C^\infty$.

(ii) The real-number product operation $\tau : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is $C^\infty$.

(iii) The Cartesian linear space scalar multiplication operation $\mu_n : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is $C^\infty$ for all $n \in \mathbb{Z}^+_0$.

(iv) The Cartesian linear space vector addition operation $\sigma_n : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is $C^\infty$ for all $n \in \mathbb{Z}^+_0$.

PROOF: Part (i) follows from Theorems 41.3.7 (i) and 42.6.2 because the first-order partial derivatives of $\sigma$
are constant.

Part (ii) follows from Theorems 41.3.7 (ii) and 42.6.2 because all of the second-order partial derivatives
of $\tau$ are constant, which follows from the fact that the first-order derivatives are linear (because they are
projection maps).

Part (iii) follows from Theorems 41.3.7 (iii) and 42.6.2 because all of the second-order partial derivatives of
$\mu_n$ are constant, which follows from the fact that all of the first-order derivatives are linear (because they are
projection maps).

Part (iv) follows from Theorems 41.3.7 (iv) and 42.6.2 because all of the first-order partial derivatives of $\sigma_n$
are constant.
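
For the product operation, writing $\tau(x_1, x_2) = x_1 x_2$ (the symbol $\tau$ and the coordinate names are notational
assumptions made here), the partial derivatives can be listed explicitly. The first-order derivatives are the
coordinate projections, the second-order derivatives are constant, and all derivatives of order three or
more vanish.

```latex
\[
  \partial_1 \tau(x) = x_2, \qquad \partial_2 \tau(x) = x_1,
\]
\[
  \partial_1 \partial_1 \tau = 0, \qquad
  \partial_1 \partial_2 \tau = \partial_2 \partial_1 \tau = 1, \qquad
  \partial_2 \partial_2 \tau = 0,
\]
\[
  \partial_\alpha \tau = 0 \quad \text{for all } \alpha \in \mathbb{N}_2^k \text{ with } k \ge 3 .
\]
```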


42.6.5 REMARK: Differentiable curves in Cartesian spaces.
Curves in a Cartesian space $\mathbb{R}^n$ are continuous maps from $\mathbb{R}$ to $\mathbb{R}^n$ for some $n \in \mathbb{Z}^+_0$. So a curve may be
thought of as a special kind of map between the Cartesian spaces $\mathbb{R}$ and $\mathbb{R}^n$.

The domain of an m-parameter curve family in a Cartesian space $\mathbb{R}^n$ is typically of the form $\mathop{\times}_{j=1}^m I_j$ for
real-number intervals $I_j$. So a curve family may be thought of as a map between Cartesian spaces $\mathbb{R}^m$
and $\mathbb{R}^n$.

Although curves and curve families in Cartesian spaces are apparently maps between Cartesian spaces, they
are generally thought of as objects which are located inside the target space $\mathbb{R}^n$. They are, more or less,
m-dimensional manifolds immersed inside a target space $\mathbb{R}^n$.

Definition 42.6.6 means that a map $\gamma : I \to \mathbb{R}^n$ is a $C^k$ curve if and only if it is continuous on the whole
interval $I$ and $C^k$ differentiable on the interior of $I$.

42.6.6 DEFINITION: A $C^k$ (differentiable) curve in $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0$ and an interval $I \subseteq \mathbb{R}$ is a map
$\gamma \in C^0(I, \mathbb{R}^n)$ such that $\gamma|_{\mathrm{Int}(I)}$ is of class $C^k$.
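
An illustrative example (an assumption for this sketch, not taken from the text) is the map
$\gamma : [0, 1] \to \mathbb{R}^2$ below. It is continuous on the closed interval and infinitely differentiable on the
interior $(0, 1)$, so under Definition 42.6.6 it is a $C^k$ curve for every finite $k$, even though its derivative
is unbounded near $t = 0$.

```latex
\[
  \gamma(t) = (t, \sqrt{t}\,), \qquad
  \gamma'(t) = \Bigl(1, \frac{1}{2\sqrt{t}}\Bigr) \quad \text{for } t \in (0, 1).
\]
```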


42.6.7 REMARK: Differentiability of common-domain and double-domain products of maps. 

In Theorem 42.6.8, $\phi_1 \times \phi_2$ is a common-domain function product as in Definition 10.15.2. Such function
products arise in particular as products of projection maps and fibre maps in fibre atlases for differentiable
fibre bundles when viewed via manifold charts on the respective spaces.


Theorem 42.6.9 is a double-domain version of Theorem 42.6.8. 
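

42.6.8 THEOREM: Order-k differentiability of the common-domain direct product of $C^k$ functions.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ for some $n \in \mathbb{Z}^+_0$. Let $\phi_\ell \in C^k(\Omega, \mathbb{R}^{m_\ell})$ for some $m_\ell \in \mathbb{Z}^+_0$, for $\ell = 1, 2$. Then
$\phi_1 \times \phi_2 \in C^k(\Omega, \mathbb{R}^{m_1+m_2})$, where $\mathbb{R}^{m_1} \times \mathbb{R}^{m_2}$ is identified with $\mathbb{R}^{m_1+m_2}$.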


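PROOF: The case $k = 0$ follows from Theorem 32.11.2 (i). The case $k = 1$ follows from Theorem 41.3.5 (ii).
Assume that the assertion holds for some $k \in \mathbb{Z}^+$. Suppose that $\phi_\ell \in C^{k+1}(\Omega, \mathbb{R}^{m_\ell})$ for some $m_\ell \in \mathbb{Z}^+_0$,
for $\ell = 1, 2$. Then $\partial_\alpha \phi_\ell \in C^1(\Omega, \mathbb{R}^{m_\ell})$ for all $\alpha \in \mathbb{N}_n^k$ and $\ell = 1, 2$ by Theorem 42.5.17 (i). Consequently
$(\partial_\alpha \phi_1) \times (\partial_\alpha \phi_2) \in C^1(\Omega, \mathbb{R}^{m_1+m_2})$ for all $\alpha \in \mathbb{N}_n^k$ by Theorem 41.3.5 (ii). But $\partial_\alpha(\phi_1 \times \phi_2) = (\partial_\alpha \phi_1) \times (\partial_\alpha \phi_2)$.
So $\partial_\alpha(\phi_1 \times \phi_2) \in C^1(\Omega, \mathbb{R}^{m_1+m_2})$ for all $\alpha \in \mathbb{N}_n^k$. So $\phi_1 \times \phi_2 \in C^{k+1}(\Omega, \mathbb{R}^{m_1+m_2})$ by Theorem 42.5.17 (ii).
Thus by induction on $k$, the result holds for all $k \in \mathbb{Z}^+$, and hence for $k = \infty$ also.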


((2020-2-21. The proof of Theorem 42.6.9, which is adapted from the proof of Theorem 42.7.4 (i), is too 
sketchy. Both of these proofs should be given in as much detail as for Theorem 42.6.8. )) 


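42.6.9 THEOREM: Order-k differentiability of the double-domain direct product of $C^k$ functions.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $n_\ell, m_\ell \in \mathbb{Z}^+_0$ and $\Omega_\ell \in \mathrm{Top}(\mathbb{R}^{n_\ell})$ for $\ell = 1, 2$. Let $\phi_\ell \in C^k(\Omega_\ell, \mathbb{R}^{m_\ell})$ for $\ell = 1, 2$. Then
$\phi_1 \times \phi_2 : \Omega_1 \times \Omega_2 \to \mathbb{R}^{m_1} \times \mathbb{R}^{m_2} = \mathbb{R}^{m_1+m_2}$ is a $C^k$ map. Hence $\phi_1 \times \phi_2 \in C^k(\Omega_1 \times \Omega_2, \mathbb{R}^{m_1+m_2})$.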


PROOF: Note that $\Omega_1 \times \Omega_2$ is an open subset of $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2} = \mathbb{R}^{n_1+n_2}$ by Theorem 32.9.6 (ii). The product
map $\phi_1 \times \phi_2$ is given by $(\phi_1 \times \phi_2)(x_1, x_2) = (\phi_1(x_1), \phi_2(x_2))$ for all $(x_1, x_2) \in \Omega_1 \times \Omega_2$ by Definition 10.14.3,
and $\phi_1 \times \phi_2$ is continuous by Theorem 32.9.10 (i). If $k \ge 1$, then $\phi_1 \times \phi_2$ is $C^1$ by Theorem 41.3.3 (ii). The
assertion for $k \ge 2$ follows by noting that each level of the partial derivative tree in Definition 42.5.6 for
$\phi_1 \times \phi_2$ is the concatenation of component-by-component partial derivatives for $\phi_1$ and $\phi_2$ as in the proof of
Theorem 41.3.3 (i).
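
The difference between the common-domain product of Theorem 42.6.8 and the double-domain product of Theorem 42.6.9 may be easier to see in a small case. The following sketch uses the assumed example functions $\phi_1(x) = x^2$ and $\phi_2(x) = \sin x$ on $\mathbb{R}$, which are not taken from the text.

```latex
% Assumed example functions: \phi_1(x) = x^2 and \phi_2(x) = \sin x on \Omega = \mathbb{R}.
% Common-domain product (Theorem 42.6.8): one shared argument.
(\phi_1 \times \phi_2)(x) = (x^2, \sin x), \qquad
\phi_1 \times \phi_2 : \mathbb{R} \to \mathbb{R}^2, \qquad
\partial_1 (\phi_1 \times \phi_2)(x) = (2x, \cos x).
% Double-domain product (Theorem 42.6.9): independent arguments.
(\phi_1 \times \phi_2)(x_1, x_2) = (x_1^2, \sin x_2), \qquad
\phi_1 \times \phi_2 : \mathbb{R}^2 \to \mathbb{R}^2,
% with a block-diagonal matrix of first-order partial derivatives:
\begin{pmatrix} 2 x_1 & 0 \\ 0 & \cos x_2 \end{pmatrix}.
```

In the double-domain case, each component depends on only one of the two arguments, which is why the derivative tree splits component by component.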


42.6.10 REMARK: Differentiability of common-domain product of real and tuple-valued functions.
Theorem 42.6.11 exploits the $C^\infty$ differentiability of the scalar multiplication operation on a Cartesian linear
space, combined with the $C^k$ differentiability of common-domain function products, to show that the product
of a scalar and vector function is $C^k$. This is relevant for the proof of Leibniz rules for products of scalar
and vector fields on manifolds.


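42.6.11 THEOREM: Differentiability of the product of a scalar function and a vector function.

Let $k \in \bar{\mathbb{Z}}^+_0$ and $n, m \in \mathbb{Z}^+_0$. Let $\Omega \in \mathrm{Top}(\mathbb{R}^n)$. Let $f \in C^k(\Omega, \mathbb{R})$ and $g \in C^k(\Omega, \mathbb{R}^m)$. Define $f.g : \Omega \to \mathbb{R}^m$
by $(f.g)(x) = f(x)g(x) = \mu(f(x), g(x))$ for all $x \in \Omega$, where $\mu : \mathbb{R} \times \mathbb{R}^m \to \mathbb{R}^m$ is the Cartesian linear space
scalar product. Then $f.g \in C^k(\Omega, \mathbb{R}^m)$. In other words,


$\forall k \in \bar{\mathbb{Z}}^+_0,\ \forall n, m \in \mathbb{Z}^+_0,\ \forall \Omega \in \mathrm{Top}(\mathbb{R}^n),\ \forall f \in C^k(\Omega, \mathbb{R}),\ \forall g \in C^k(\Omega, \mathbb{R}^m),$
$\mu \circ (f \times g) \in C^k(\Omega, \mathbb{R}^m).$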
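

PROOF: By Definition 22.2.19, $\mu(\lambda, x) = (\lambda x_i)_{i=1}^m$ for all $\lambda \in \mathbb{R}$ and $x = (x_i)_{i=1}^m \in \mathbb{R}^m$, and by Theorem 42.6.4 (iii),
$\mu \in C^\infty(\mathbb{R} \times \mathbb{R}^m, \mathbb{R}^m) = C^\infty(\mathbb{R}^{1+m}, \mathbb{R}^m)$. But $f \times g \in C^k(\Omega, \mathbb{R} \times \mathbb{R}^m) = C^k(\Omega, \mathbb{R}^{1+m})$ by
Theorem 42.6.8. Hence $\mu \circ (f \times g) \in C^k(\Omega, \mathbb{R}^m)$ by Theorem 42.5.27.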
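
The composition $\mu \circ (f \times g)$ in Theorem 42.6.11 may be checked by hand in a small case. The sketch below assumes the example functions $f(x) = x_1$ and $g(x) = (x_1, x_2^2)$ on $\Omega = \mathbb{R}^2$, which are chosen here for illustration only, and compares direct differentiation with the familiar product rule.

```latex
% Assumed example: \Omega = \mathbb{R}^2, f(x) = x_1, g(x) = (x_1, x_2^2).
(f.g)(x) = \mu(f(x), g(x)) = (x_1^2,\; x_1 x_2^2).
% Direct differentiation of the product:
\partial_1 (f.g)(x) = (2 x_1,\; x_2^2), \qquad
\partial_2 (f.g)(x) = (0,\; 2 x_1 x_2).
% Product rule: \partial_j (f.g) = (\partial_j f)\, g + f\, \partial_j g.
(\partial_1 f)\, g + f\, \partial_1 g = 1 \cdot (x_1, x_2^2) + x_1 \cdot (1, 0) = (2 x_1,\; x_2^2),
(\partial_2 f)\, g + f\, \partial_2 g = 0 \cdot (x_1, x_2^2) + x_1 \cdot (0, 2 x_2) = (0,\; 2 x_1 x_2).
```

The two computations agree, as one would expect from applying the chain rule to $\mu \circ (f \times g)$.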


42.6.12 REMARK: Closure of $C^k$ maps under linear operations.

Ultimately all maps between $C^k$ manifolds are merely high-level structural packaging for $C^k$ maps between
Cartesian spaces. Therefore a very large proportion of analytical properties of differentiable manifolds and
fibre bundles are really just properties of Cartesian space maps, packaged so as to represent particular classes
of geometric objects.


In particular $C^k$ cross-sections of $C^k$ vector bundles are ultimately merely $C^k$ maps between Cartesian
spaces. To show that they are closed under linear space operations, it must be shown that $C^k$ Cartesian
space maps are likewise closed. This is the purpose of Theorem 42.6.13.


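42.6.13 THEOREM: Closure of $C^k$ maps under linear operations.
Let $k \in \bar{\mathbb{Z}}^+_0$ and $n, m \in \mathbb{Z}^+_0$. Let $\Omega \in \mathrm{Top}(\mathbb{R}^n)$. Then


$\forall \lambda \in \mathbb{R},\ \forall g \in C^k(\Omega, \mathbb{R}^m),\quad \lambda g \in C^k(\Omega, \mathbb{R}^m)$
and
$\forall f, g \in C^k(\Omega, \mathbb{R}^m),\quad f + g \in C^k(\Omega, \mathbb{R}^m).$


PROOF: Let $\lambda \in \mathbb{R}$ and $g \in C^k(\Omega, \mathbb{R}^m)$. Define $f : \Omega \to \mathbb{R}$ by $f(x) = \lambda$ for all $x \in \Omega$. Then $f \in C^k(\Omega, \mathbb{R})$
by Theorem 42.6.2. So $\lambda g = f.g \in C^k(\Omega, \mathbb{R}^m)$ by Theorem 42.6.11.


Let $f, g \in C^k(\Omega, \mathbb{R}^m)$. Then $f + g = \sigma_m \circ (f \times g)$, where $f \times g \in C^k(\Omega, \mathbb{R}^{m+m})$ by Theorem 42.6.8,
and the real tuple addition operation $\sigma_m : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m$ is a $C^\infty$ map by Theorem 42.6.4 (iv). Hence
$f + g \in C^k(\Omega, \mathbb{R}^m)$ by Theorem 42.5.27.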


42.6.14 REMARK: Differentiability of “partial maps” of differentiable functions.

Theorem 42.6.15 is the Cartesian space differentiability analogue of Theorem 32.10.10, which shows that
the “partial maps” of a continuous map from a direct product of two topological spaces are continuous.
Theorem 42.6.15 shows that the “partial maps” of a $C^k$ differentiable map between Cartesian spaces are $C^k$
differentiable. Theorem 42.6.16 (i, ii) is the obvious generalisation of Theorem 42.6.15 (i, ii) to open subsets
of Cartesian spaces.


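42.6.15 THEOREM: The “partial maps” of a $C^k$ map between Cartesian spaces are $C^k$ differentiable.
Let $\mathbb{R}^{n_1}$, $\mathbb{R}^{n_2}$ and $\mathbb{R}^m$ be Cartesian spaces with $n_1, n_2, m \in \mathbb{Z}^+_0$. Let $f : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R}^m$ be $C^k$ differentiable
for some $k \in \bar{\mathbb{Z}}^+_0$.


(i) The function $f_1^{x_2} : \mathbb{R}^{n_1} \to \mathbb{R}^m$ defined by $f_1^{x_2} : x_1 \mapsto f(x_1, x_2)$ is $C^k$ differentiable for all $x_2 \in \mathbb{R}^{n_2}$.
(ii) The function $f_2^{x_1} : \mathbb{R}^{n_2} \to \mathbb{R}^m$ defined by $f_2^{x_1} : x_2 \mapsto f(x_1, x_2)$ is $C^k$ differentiable for all $x_1 \in \mathbb{R}^{n_1}$.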


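PROOF: For part (i), let $f \in C^k(\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}, \mathbb{R}^m)$ for some $k \in \mathbb{Z}^+_0$. Let $x_2 \in \mathbb{R}^{n_2}$. Let $\alpha \in (\mathbb{Z}^+_0)^{n_1}$. Define
$\tilde\alpha \in (\mathbb{Z}^+_0)^{n_1+n_2}$ by


$\forall i \in \mathbb{N}_{n_1+n_2},\quad \tilde\alpha_i = \begin{cases} \alpha_i & \text{for } i \in \mathbb{N}_{n_1}, \\ 0 & \text{for } i \notin \mathbb{N}_{n_1}. \end{cases}$


Then $\partial_\alpha f_1^{x_2}(x_1) = \partial_{\tilde\alpha} f(x_1, x_2)$ for all $x_1 \in \mathbb{R}^{n_1}$, which is well defined because $f \in C^k(\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}, \mathbb{R}^m)$,
and $\partial_\alpha f_1^{x_2} : \mathbb{R}^{n_1} \to \mathbb{R}^m$ is then continuous by Theorem 32.10.10 (i). Therefore $f_1^{x_2} \in C^k(\mathbb{R}^{n_1}, \mathbb{R}^m)$. The
assertion for $k = \infty$ then follows from the definition of $C^\infty$ differentiability.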


Part (ii) may be proved as for part (i). 
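
Theorem 42.6.15 passes from the differentiability of $f$ to the differentiability of its partial maps. The following standard counterexample, which is not taken from the text and is given here only as an assumed illustration, suggests that the converse implication should not be expected: every partial map is $C^\infty$, but $f$ itself is not even continuous.

```latex
% Assumed example on \mathbb{R} \times \mathbb{R} with values in \mathbb{R}:
f(x_1, x_2) =
\begin{cases}
  \dfrac{x_1 x_2}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \neq (0, 0), \\[2ex]
  0 & \text{if } (x_1, x_2) = (0, 0).
\end{cases}
% Partial maps for fixed x_2 = c:
f_1^{c}(x_1) = \frac{c\, x_1}{x_1^2 + c^2} \quad (c \neq 0), \qquad
f_1^{0}(x_1) = 0,
% each of which is C^\infty on \mathbb{R}; by symmetry the same holds for f_2^{c}.
% But f is not continuous at the origin:
f(t, t) = \tfrac{1}{2} \text{ for all } t \neq 0, \qquad f(0, 0) = 0.
```

So the hypothesis that $f$ is $C^k$ on the product space is strictly stronger than the conclusion about partial maps.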


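42.6.16 THEOREM: The “partial maps” of a $C^k$ map between Cartesian domains are $C^k$ differentiable.
Let $\Omega_1 \in \mathrm{Top}(\mathbb{R}^{n_1})$, $\Omega_2 \in \mathrm{Top}(\mathbb{R}^{n_2})$ and $\Omega_0 \in \mathrm{Top}(\mathbb{R}^m)$ with $n_1, n_2, m \in \mathbb{Z}^+_0$. Let $f : \Omega_1 \times \Omega_2 \to \Omega_0$ be $C^k$
differentiable for some $k \in \bar{\mathbb{Z}}^+_0$.

(i) The function $f_1^{x_2} : \Omega_1 \to \Omega_0$ defined by $f_1^{x_2} : x_1 \mapsto f(x_1, x_2)$ is $C^k$ differentiable for all $x_2 \in \Omega_2$.

(ii) The function $f_2^{x_1} : \Omega_2 \to \Omega_0$ defined by $f_2^{x_1} : x_2 \mapsto f(x_1, x_2)$ is $C^k$ differentiable for all $x_1 \in \Omega_1$.


PROOF: Parts (i) and (ii) may be proved exactly as for Theorem 42.6.15 (i, ii).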


42.6.17 REMARK: Differentiability of component-projection and general index-selection maps.

Theorem 42.6.18 may be generalised to any map which is defined between Cartesian spaces by an
index-selection map between the tuple index sets. In other words, if $P : \mathbb{N}_m \to \mathbb{N}_n$ is a function for some $m, n \in \mathbb{Z}^+_0$,
then $f : \mathbb{R}^n \to \mathbb{R}^m$ defined by $f(x)_i = x_{P(i)}$ for all $x \in \mathbb{R}^n$ and $i \in \mathbb{N}_m$ will be a $C^\infty$ map which satisfies
$\partial_j f(x)_i = \delta_{P(i), j}$ for all $x \in \mathbb{R}^n$, $i \in \mathbb{N}_m$ and $j \in \mathbb{N}_n$. Theorem 42.6.18 gives only the special case $P = \mathrm{id}_{\mathbb{N}_k}$
with $m = k$ and $n = k + \ell$.
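
The index-selection formula in Remark 42.6.17 can be made concrete with a small assumed example, which is not taken from the text. Here $m = 3$, $n = 2$ and $P : \mathbb{N}_3 \to \mathbb{N}_2$ repeats and reorders the two input components.

```latex
% Assumed example: m = 3, n = 2, P(1) = 2, P(2) = 1, P(3) = 2.
f : \mathbb{R}^2 \to \mathbb{R}^3, \qquad
f(x)_i = x_{P(i)}, \qquad
f(x_1, x_2) = (x_2, x_1, x_2).
% First-order partial derivatives: \partial_j f(x)_i = \delta_{P(i), j},
% arranged with row index i and column index j:
\bigl( \partial_j f(x)_i \bigr)_{i \in \mathbb{N}_3,\, j \in \mathbb{N}_2}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}.
% The matrix is constant, so all partial derivatives of order 2 or more vanish.
```

The projection map of Theorem 42.6.18 corresponds to the case where this matrix consists of an identity block followed by zero columns.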


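42.6.18 THEOREM: Component projection maps between Cartesian spaces are $C^\infty$ differentiable.
Let $k, \ell \in \mathbb{Z}^+_0$. Define $f : \mathbb{R}^{k+\ell} \to \mathbb{R}^k$ by $f(x, y) = x$ for all $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^\ell$.

(i) $f$ is a $C^1$ map, and $\forall x \in \mathbb{R}^{k+\ell},\ \forall j \in \mathbb{N}_{k+\ell},\ \forall i \in \mathbb{N}_k,\ \partial_j f(x)_i = \delta_{ij}$.

(ii) $f$ is a $C^\infty$ map, and $\forall x \in \mathbb{R}^{k+\ell},\ \forall q \in \mathbb{Z}^+ \setminus \{1\},\ \forall \alpha \in \mathbb{N}_{k+\ell}^q,\ \forall i \in \mathbb{N}_k,\ \partial_\alpha f(x)_i = 0$.


PROOF: For part (i), $\mathrm{Int}(\mathrm{Dom}(f)) = \mathbb{R}^{k+\ell}$. Let $i \in \mathbb{N}_k$. Then $\partial_i f(x) = e_i = (\delta_{ij})_{j=1}^k$ for all $x \in \mathbb{R}^{k+\ell}$.
Similarly, if $i \in \mathbb{N}_{k+\ell} \setminus \mathbb{N}_k$, then $\partial_i f(x) = 0_{\mathbb{R}^k} = (\delta_{ij})_{j=1}^k$ for all $x \in \mathbb{R}^{k+\ell}$.


Part (ii) follows from part (i) and Theorem 42.6.2.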


42.7. Cartesian space local diffeomorphisms


42.7.1 REMARK:  Diffeomorphisms by analogy with homeomorphisms. 

Diffeomorphisms are defined by analogy with the homeomorphisms in Section 31.14. A homeomorphism is 
a topological space isomorphism. An isomorphism is one of the six standard named kinds of morphisms, as 
for example the group morphisms in Definition 17.4.1 or the linear space morphisms in Definition 23.1.8. 
The objects which are “morphed” here by Cartesian space diffeomorphisms are open subsets of Cartesian 
spaces which have the same dimension. Diffeomorphisms are of fundamental significance in the definition of 
a differentiable manifold in Section 51.3. 

The $C^k$ differentiability in Definition 42.7.2 is based on Definition 42.5.11. This permits the domain and
range spaces to have a different dimension. In the case of a $C^k$ diffeomorphism, the dimensions could be
different if the domain and range are empty, but otherwise the dimensions must be the same.


It is noteworthy that a $C^k$ diffeomorphism is not defined to be a bijective $C^k$ differentiable homomorphism.
Thus a map between Cartesian spaces which is both a $C^k$ monomorphism and a $C^k$ epimorphism is not
necessarily a $C^k$ isomorphism. For example, consider the function $x \mapsto x^3$ from $\mathbb{R}$ to $\mathbb{R}$, which is a $C^\infty$ bijection
whose inverse is not even differentiable. (This problem could be remedied by requiring $C^k$ homomorphisms
to have maximal rank at all points, but this would require “maximal rank” to be defined first.)
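
The failure of differentiability for the inverse of the cube function can be verified directly from the difference quotient at the origin. The following sketch uses only elementary calculus; the symbol $h$ for the increment is introduced here for the illustration.

```latex
% The cube map and its inverse on \mathbb{R}:
f(x) = x^3, \qquad f^{-1}(y) = \operatorname{sgn}(y)\, |y|^{1/3}.
% Difference quotient of the inverse at y = 0, for h \neq 0:
\frac{f^{-1}(h) - f^{-1}(0)}{h}
= \frac{\operatorname{sgn}(h)\, |h|^{1/3}}{h}
= |h|^{-2/3} \to \infty \quad \text{as } h \to 0.
% So f^{-1} is continuous but not differentiable at 0.
% The underlying cause is the vanishing derivative of f at the origin:
f'(x) = 3 x^2, \qquad f'(0) = 0.
```

This is consistent with the parenthetical comment on maximal rank, since the derivative of $f$ fails to be invertible exactly at the point where the inverse fails to be differentiable.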


42.7.2 DEFINITION: Differentiable morphisms between open subsets of Cartesian spaces.

Let $n, m \in \mathbb{Z}^+_0$ and $k \in \bar{\mathbb{Z}}^+_0$.

A $C^k$ (differentiable) homomorphism from a set $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ to a set $S \in \mathbb{P}(\mathbb{R}^m)$ is a $C^k$ differentiable
map $\phi : \Omega \to S$.

A $C^k$ (differentiable) monomorphism from a set $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ to a set $S \in \mathbb{P}(\mathbb{R}^m)$ is an injective $C^k$
differentiable homomorphism $\phi : \Omega \to S$.

A $C^k$ (differentiable) epimorphism from a set $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ to a set $S \in \mathbb{P}(\mathbb{R}^m)$ is a surjective $C^k$
differentiable homomorphism $\phi : \Omega \to S$.

A $C^k$ (differentiable) isomorphism or $C^k$ diffeomorphism from a set $\Omega_1 \in \mathrm{Top}(\mathbb{R}^n)$ to a set $\Omega_2 \in \mathrm{Top}(\mathbb{R}^m)$
is a bijective $C^k$ differentiable homomorphism $\phi : \Omega_1 \to \Omega_2$ such that $\phi^{-1} : \Omega_2 \to \Omega_1$ is a $C^k$ differentiable
homomorphism from $\Omega_2$ to $\Omega_1$.

A $C^k$ (differentiable) endomorphism of a set $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ is a $C^k$ differentiable homomorphism $\phi : \Omega \to \Omega$.

A $C^k$ (differentiable) automorphism of a set $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ is a $C^k$ differentiable isomorphism $\phi : \Omega \to \Omega$.


42.7.3 REMARK: Direct products of diffeomorphisms between open subsets of Cartesian spaces.

The direct product functions $\phi_1 \times \phi_2$ in Theorem 42.7.4 are the double-domain style of direct function
product in Definition 10.14.3. The products of diffeomorphisms in Theorem 42.7.4 (ii) are applicable to the
products of differentiable manifolds in Section 52.7. (Theorem 42.7.4 is illustrated in Figure 42.7.1.)

((2020-2-21. The proof of Theorem 42.7.4 (i) is too sketchy. The proof of Theorem 42.6.9, which is adapted
from the proof of Theorem 42.7.4 (i), is likewise too sketchy. Both of these proofs should be given in as much
detail as for Theorem 42.6.8. ))


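42.7.4 THEOREM: The double-domain direct product of $C^k$ diffeomorphisms is a $C^k$ diffeomorphism.
Let $n_1, n_2 \in \mathbb{Z}^+_0$ and $k \in \bar{\mathbb{Z}}^+_0$. Let $\Omega_{i,j} \in \mathrm{Top}(\mathbb{R}^{n_i})$ and $\phi_i : \Omega_{i,1} \to \Omega_{i,2}$ for $i, j = 1, 2$.
(i) If $\phi_i : \Omega_{i,1} \to \Omega_{i,2}$ is a $C^k$ map for $i = 1, 2$, then $\phi_1 \times \phi_2 : \Omega_{1,1} \times \Omega_{2,1} \to \Omega_{1,2} \times \Omega_{2,2}$ is a $C^k$ map.
(ii) If $\phi_i : \Omega_{i,1} \to \Omega_{i,2}$ is a $C^k$ diffeomorphism for $i = 1, 2$, then $\phi_1 \times \phi_2 : \Omega_{1,1} \times \Omega_{2,1} \to \Omega_{1,2} \times \Omega_{2,2}$ is a
$C^k$ diffeomorphism.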


PROOF: For part (i), note that $\Omega_{1,1} \times \Omega_{2,1}$ and $\Omega_{1,2} \times \Omega_{2,2}$ are open subsets of $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2} = \mathbb{R}^{n_1+n_2}$
by Theorem 32.9.6 (ii). The product map $\phi_1 \times \phi_2$ is given by $(\phi_1 \times \phi_2)(x_1, x_2) = (\phi_1(x_1), \phi_2(x_2))$ for all
$(x_1, x_2) \in \Omega_{1,1} \times \Omega_{2,1}$ by Definition 10.14.3, and $\phi_1 \times \phi_2$ is continuous by Theorem 32.9.10 (i). If $k \ge 1$,
then $\phi_1 \times \phi_2$ is $C^1$ by Theorem 41.3.3 (ii). The assertion for $k \ge 2$ follows by noting that each level of
the partial derivative tree in Definition 42.5.6 for $\phi_1 \times \phi_2$ is the concatenation of component-by-component
partial derivatives for $\phi_1$ and $\phi_2$ as in the proof of Theorem 41.3.3 (i).


Part (ii) follows from part (i) by noting that $(\phi_1 \times \phi_2)^{-1} = \phi_1^{-1} \times \phi_2^{-1}$.
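
The inverse formula used for part (ii) can be checked in a simple case. The sketch below assumes the example diffeomorphisms $\phi_1(x) = 2x + 1$ on $\mathbb{R}$ and $\phi_2(y) = e^y$ from $\mathbb{R}$ to $\mathbb{R}^+$, which are chosen here for illustration and are not taken from the text.

```latex
% Assumed example: \phi_1 : \mathbb{R} \to \mathbb{R}, \phi_2 : \mathbb{R} \to \mathbb{R}^+.
\phi_1(x) = 2x + 1, \qquad \phi_1^{-1}(u) = \tfrac{1}{2}(u - 1), \qquad
\phi_2(y) = e^y, \qquad \phi_2^{-1}(v) = \ln v.
% Double-domain product and its inverse:
(\phi_1 \times \phi_2)(x, y) = (2x + 1,\; e^y), \qquad
\phi_1 \times \phi_2 : \mathbb{R} \times \mathbb{R} \to \mathbb{R} \times \mathbb{R}^+,
(\phi_1^{-1} \times \phi_2^{-1})(u, v) = \bigl(\tfrac{1}{2}(u - 1),\; \ln v\bigr).
% Verification that the product of the inverses inverts the product:
(\phi_1^{-1} \times \phi_2^{-1})\bigl((\phi_1 \times \phi_2)(x, y)\bigr)
= \bigl(\tfrac{1}{2}(2x + 1 - 1),\; \ln e^y\bigr) = (x, y).
```

Since $\phi_1^{-1}$ and $\phi_2^{-1}$ are themselves $C^\infty$ in this example, part (i) applied to the inverses gives the differentiability of the inverse product map.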


Figure 42.7.1 Double-domain product of diffeomorphisms between Cartesian spaces


42.7.5 REMARK: Pointwise map restrictions are diffeomorphisms if their product is a diffeomorphism.
Theorem 42.7.6 is the $C^k$ diffeomorphism analogue of Theorem 32.11.4, which states that if the direct
function product of continuous maps is a homeomorphism then the pointwise restrictions of these maps
are also homeomorphisms. Theorem 42.7.6 states that if the direct function product of $C^k$ maps is a $C^k$
diffeomorphism then the pointwise restrictions of these maps are also $C^k$ diffeomorphisms. Theorem 42.7.6
can be applied to differentiable fibrations to show that per-base-point fibre charts are diffeomorphisms. (See
Theorem 64.3.9 (ii).)


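42.7.6 THEOREM: “Fibre set” restriction of common-domain diffeomorphism product is a diffeomorphism.
Let $\Omega_1 \in \mathrm{Top}(\mathbb{R}^{n_1})$, $\Omega_2 \in \mathrm{Top}(\mathbb{R}^{n_2})$ and $\Omega_0 \in \mathrm{Top}(\mathbb{R}^m)$ with $n_1, n_2, m \in \mathbb{Z}^+_0$. Let $\phi_1 : \Omega_0 \to \Omega_1$ and
$\phi_2 : \Omega_0 \to \Omega_2$ be $C^k$ maps for some $k \in \bar{\mathbb{Z}}^+_0$ such that $\phi_1 \times \phi_2 : \Omega_0 \to \Omega_1 \times \Omega_2$ is a $C^k$ diffeomorphism.


(i) $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to \Omega_1$ is a $C^k$ diffeomorphism for all $x_2 \in \Omega_2$.

(ii) $\phi_2|_{\phi_1^{-1}(\{x_1\})} : \phi_1^{-1}(\{x_1\}) \to \Omega_2$ is a $C^k$ diffeomorphism for all $x_1 \in \Omega_1$.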


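PROOF: For part (i), let $x_2 \in \Omega_2$. Then by Theorem 32.11.4 (i), $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to \Omega_1$ is a
homeomorphism. So its inverse $g_1 = (\phi_1|_{\phi_2^{-1}(\{x_2\})})^{-1} : \Omega_1 \to \phi_2^{-1}(\{x_2\}) \subseteq \Omega_0$ is a well-defined homeomorphism.


Let $f = (\phi_1 \times \phi_2)^{-1}$. Then $f : \Omega_1 \times \Omega_2 \to \Omega_0$ is a $C^k$ diffeomorphism. For any $(x_1, x_2) \in \Omega_1 \times \Omega_2$,
$(\phi_1 \times \phi_2)(f(x_1, x_2)) = (x_1, x_2)$, and so $\phi_1(f(x_1, x_2)) = x_1$ and $\phi_2(f(x_1, x_2)) = x_2$. So $f(x_1, x_2) \in \phi_2^{-1}(\{x_2\})$,
and consequently $\phi_1|_{\phi_2^{-1}(\{x_2\})}(f(x_1, x_2)) = x_1$. Thus $g_1(x_1) = f(x_1, x_2)$ for all $x_1 \in \Omega_1$. But $f_1^{x_2} : \Omega_1 \to \Omega_0$
in the statement of Theorem 42.6.16 (i) also satisfies $f_1^{x_2}(x_1) = f(x_1, x_2)$ for all $x_1 \in \Omega_1$. Therefore $g_1 = f_1^{x_2}$.
So $g_1$ is a $C^k$ map by Theorem 42.6.16 (i). Hence $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to \Omega_1$ is a $C^k$ diffeomorphism.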


Part (ii) may be proved exactly as for part (i). 
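
The construction in the proof of Theorem 42.7.6 can be followed in a low-dimensional case. The sketch below assumes $\Omega_0 = \mathbb{R}^2$, $\Omega_1 = \Omega_2 = \mathbb{R}$ and the maps $\phi_1(x, y) = x$, $\phi_2(x, y) = y - x^2$. These maps and the constant $c$ are chosen here for illustration and are not taken from the text.

```latex
% Assumed example: common-domain product on \Omega_0 = \mathbb{R}^2.
(\phi_1 \times \phi_2)(x, y) = (x,\; y - x^2), \qquad
f = (\phi_1 \times \phi_2)^{-1}, \qquad f(u, v) = (u,\; v + u^2).
% Check: \phi_1(f(u,v)) = u and \phi_2(f(u,v)) = (v + u^2) - u^2 = v.
% Fibre set of \phi_2 over a point c \in \Omega_2 (a parabola):
\phi_2^{-1}(\{c\}) = \{ (x, x^2 + c) ;\; x \in \mathbb{R} \}.
% Restriction of \phi_1 to the fibre set, and its inverse g_1:
\phi_1|_{\phi_2^{-1}(\{c\})} : (x, x^2 + c) \mapsto x, \qquad
g_1(u) = (u,\; u^2 + c) = f(u, c) = f_1^{c}(u).
```

The inverse $g_1$ is thus the partial map $f_1^{c}$ of the inverse diffeomorphism, which is the observation on which the proof relies.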


42.8. Power series and analytic functions 


42.8.1 REMARK: Literature for power series and analytic functions. 

Some useful presentations of real-number power series and analyticity are Thomson/Bruckner/Bruckner [149], 
pages 404-427; Rudin [129], pages 158-163; Mattuck [114], pages 305-322; Friedman [74], pages 167-175; 
Rosenlicht [128], pages 150-156; Schramm [133], pages 257-259, 275-278; Johnsonbaugh/Pfaffenberger [97], 
pages 87-89, 185-188, 259-261. Some useful presentations of complex power series and analyticity are 
Lang [109], pages 37-74; Shilov [135], pages 373-404; Ahlfors [45], pages 24-41, 173-182. 


42.8.2 REMARK: Real analytic functions. 
The name “analytic function” in Definition 42.8.3 may be justified by considering that such functions may
be “analysed” by computing all of the infinitely many derivatives of the function at each point $p$, and
then the function may be precisely re-synthesised from the countably infinite sequence of derivatives in a
neighbourhood of $p$. A non-analytic $C^\infty$ function is one for which the re-synthesis does not match the
original function. Thus one may say that an analytic function is one which can be not only analysed but
also re-synthesised. It is the successful synthesis which distinguishes analytic from non-analytic functions,
not merely the ability to analyse it.

The infinite sum $\sum_{k=0}^\infty a_k (x - p)^k$ in Definition 42.8.3 is given meaning by Definition 39.2.6 because $\mathbb{R}$ is
a T topological linear space with its standard topology and linear space structure. (See Theorems 39.5.7,
39.3.4 and 39.1.8 (ii) for justification of this.)
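
A standard example of a $C^\infty$ function for which the re-synthesis fails is sketched below. The function name $h$ is introduced here for the illustration, and the vanishing of all derivatives at the origin is quoted as a well-known fact rather than proved.

```latex
% A C^\infty function on \mathbb{R} which is not analytic at p = 0:
h(x) =
\begin{cases}
  e^{-1/x^2} & \text{if } x \neq 0, \\
  0 & \text{if } x = 0.
\end{cases}
% "Analysis" at p = 0: every derivative vanishes there,
h^{(k)}(0) = 0 \quad \text{for all } k \in \mathbb{Z}^+_0.
% "Re-synthesis" from the derivatives at p = 0 gives the zero function:
\sum_{k=0}^\infty \frac{h^{(k)}(0)}{k!}\, x^k = 0 \neq h(x) \quad \text{for all } x \neq 0.
```

So the power series converges everywhere, but it agrees with $h$ only at the single point $x = 0$, and therefore on no neighbourhood of $0$.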


42.8.3 DEFINITION: A (real) analytic function on $U \in \mathrm{Top}(\mathbb{R})$ is a function $f : U \to \mathbb{R}$ such that


$\forall p \in U,\ \exists a = (a_k)_{k=0}^\infty \in \mathbb{R}^{\mathbb{Z}^+_0},\ \exists \Omega \in \mathrm{Top}_p(U),\ \forall x \in \Omega,$

$f(x) = \sum_{k=0}^\infty a_k (x - p)^k.$ (42.8.1)


In other words, each point in $U$ has a neighbourhood within which the function is equal to the sum of a
convergent real-number power series at each point.


The coefficient sequence of a real analytic function $f : U \to \mathbb{R}$ at $p \in U$ is the sequence $(a_k)_{k=0}^\infty$ for $f$ at $p$
in line (42.8.1).
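
As a concrete instance of Definition 42.8.3, the sketch below computes the coefficient sequence of the assumed example function $f(x) = 1/(1 - x)$ on $U = \mathbb{R} \setminus \{1\}$ at an arbitrary point $p \in U$, using only the geometric series. The example is not taken from the text.

```latex
% Assumed example: U = \mathbb{R} \setminus \{1\}, f(x) = 1/(1 - x), p \in U.
\frac{1}{1 - x}
= \frac{1}{(1 - p) - (x - p)}
= \frac{1}{1 - p} \cdot \frac{1}{1 - \dfrac{x - p}{1 - p}}
= \sum_{k=0}^\infty \frac{(x - p)^k}{(1 - p)^{k+1}}
\qquad \text{for } |x - p| < |1 - p|.
% Coefficient sequence of f at p, and a suitable neighbourhood of p in U:
a_k = (1 - p)^{-(k+1)}, \qquad
\Omega = \{ x \in \mathbb{R} ;\; |x - p| < |1 - p| \}.
```

The coefficient sequence and the neighbourhood $\Omega$ both depend on $p$, which is why the quantifiers for $a$ and $\Omega$ follow the quantifier for $p$ in line (42.8.1).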


42.8.4 REMARK: Complex analytic functions. 

Although complex numbers are largely avoided in this book, they are unavoidable in the definitions of the 
classical Lie groups which are used in applications of connections on complex vector bundles to quantum 
field theory. (See Definition 16.8.1 for the complex number system.) 


Differentiable complex functions are required to have directional derivatives which are independent of the 
direction of the differentiation. 


42.8.5 DEFINITION: A (complex) analytic function on $U \in \mathrm{Top}(\mathbb{C})$ is a function $f : U \to \mathbb{C}$ such that


$\forall p \in U,\ \exists a = (a_k)_{k=0}^\infty \in \mathbb{C}^{\mathbb{Z}^+_0},\ \exists \Omega \in \mathrm{Top}_p(U),\ \forall z \in \Omega,$

$f(z) = \sum_{k=0}^\infty a_k (z - p)^k.$


In other words, each point in $U$ has a neighbourhood within which the function is equal to the sum of a
complex series expansion at each point.


42.8.6 REMARK: Analytic versus holomorphic complex functions. 

The terms “(complex) analytic” and “holomorphic” are often regarded as synonymous because they are 
essentially equivalent. (See for example Ahlfors [45], page 24; Shilov [135], page 374.) Some authors call 
a holomorphic function “complex differentiable”. (See for example Lang [109], pages 27, 30.) And some 
authors call a complex function “differentiable” at a single point if it satisfies Definition 42.8.7 at that point. 
(See for example Shilov [135], page 373.) 

It is quite straightforward to show that an analytic function is holomorphic. (See for example Ahlfors [45],
pages 39-40; Lang [109], pages 72-73.) But it is not so easy to show the converse. (For proof of the converse,
see for example Ahlfors [45], pages 125, 177; Shilov [135], page 396; Lang [109], pages 128-129.)


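42.8.7 DEFINITION: A (complex) holomorphic function on $U \in \mathrm{Top}(\mathbb{C})$ is a function $f : U \to \mathbb{C}$ such that
$\lim_{z \to a} (f(z) - f(a))/(z - a)$ is well defined for all $a \in U$. In other words,


$\forall a \in U,\ \exists w \in \mathbb{C},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall z \in B_{a,\delta},$
$|f(z) - f(a) - w(z - a)| \le \varepsilon |z - a|.$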
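
The quantified condition in Definition 42.8.7 can be tested on two elementary functions. The sketch below assumes $U = \mathbb{C}$ and uses the examples $f(z) = z^2$ and $g(z) = \bar{z}$, which are chosen here for illustration and are not taken from the text.

```latex
% Example 1: f(z) = z^2 satisfies the condition with w = 2a and \delta = \varepsilon.
|z^2 - a^2 - 2a(z - a)| = |(z - a)(z + a) - 2a(z - a)| = |z - a|^2
\le \varepsilon\, |z - a| \quad \text{whenever } |z - a| < \delta = \varepsilon.
% Example 2: g(z) = \bar{z} fails the condition at every point a.
% With z = a + h, the difference quotient depends on the direction of h:
\frac{g(a + h) - g(a)}{h} = \frac{\bar{h}}{h} =
\begin{cases}
  1 & \text{if } h \in \mathbb{R} \setminus \{0\}, \\
  -1 & \text{if } h \in i\,\mathbb{R} \setminus \{0\},
\end{cases}
% so no single w \in \mathbb{C} can serve as the limit as h \to 0.
```

The second example illustrates the requirement that the directional derivatives of a complex differentiable function must be independent of the direction of differentiation.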


INTEGRAL CALCULUS


43.1 History and styles of integration
43.2 Gregory-Barrow-Leibniz-Newton integral
43.3 Cauchy integral
43.4 Cauchy-Riemann integral
43.5 Cauchy-Riemann-Darboux integral
43.6 Darboux integrability, pointwise oscillation and Jordan content
43.7 Cauchy-Riemann-Darboux integral basic properties
43.8 Fundamental theorems of calculus
43.9 Cauchy-Riemann-Darboux integral for vector-valued integrands
43.10 Cauchy-Riemann-Darboux-Stieltjes integral
43.11 Stieltjes integral for vector-valued integrands
43.12 Stieltjes integral for linear operator integrands


43.0.1 REMARK: Integration is a “more advanced technology” than differentiation. 

Nowadays, integration seems like a more complex operation than differentiation. To differentiate a function
at a point, one merely needs to take the limit of a differential quotient. The operations required are two
subtractions, one division, and one limit. This yields $f'(p) = \lim_{x \to p} (f(x) - f(p))/(x - p)$ as the derivative
of a differentiable function $f$. By contrast, integration requires the determination of upper and lower limits
of areas under functions which are approximations to the given function for various partitions of the domain
or range of the function (depending on the kind of integral). This is why integration is presented after
differentiation in this book.
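
The contrast between the two procedures can be seen by applying both to the assumed example function $f(x) = x^2$, which is not taken from the text. The integration sketch uses lower and upper sums on $n$ equal subintervals of $[0, 1]$; the symbols $L_n$ and $U_n$ are introduced here for the illustration.

```latex
% Differentiation of f(x) = x^2 at p: two subtractions, one division, one limit.
f'(p) = \lim_{x \to p} \frac{x^2 - p^2}{x - p} = \lim_{x \to p} (x + p) = 2p.
% Integration of f over [0,1]: lower and upper sums for n equal subintervals.
L_n = \sum_{i=0}^{n-1} \Bigl(\frac{i}{n}\Bigr)^2 \frac{1}{n} = \frac{(n - 1)(2n - 1)}{6 n^2},
\qquad
U_n = \sum_{i=1}^{n} \Bigl(\frac{i}{n}\Bigr)^2 \frac{1}{n} = \frac{(n + 1)(2n + 1)}{6 n^2}.
% The bounds squeeze the integral from below and above:
L_n \le \int_0^1 x^2 \, dx \le U_n, \qquad U_n - L_n = \frac{1}{n} \to 0, \qquad
\lim_{n \to \infty} L_n = \lim_{n \to \infty} U_n = \frac{1}{3}.
```

Even in this simple case, the integral requires a family of approximating step functions and a closed-form summation, whereas the derivative requires only one algebraic cancellation.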


Ironically, the techniques and theory of integration were surprisingly advanced in the time of Archimedes 
around 250BC, whereas differential calculus was struggling to come into existence before the 17th century, 
and it was only in the 19th century that a satisfactory definition of differentiation was obtained. The 
ancient Greek limit procedure, later given the name “method of exhaustion”, was a method of inferior and
superior bounds which very much resembled the limit procedures used in the definitions of the 19th century
Cauchy-Riemann-Darboux integral. There is no really obvious reason why a limiting tangent line to a graph
(a limit of secants) should be any more intellectually demanding than the area of the limiting curve of an
infinite sequence of polygons. In each case, upper and lower bounds converge to a single number. Even
allowing for the fact that the real number system had not developed sufficiently before the 17th century,
and geometric methods were used instead, it is still mystifying that differential calculus lagged so far behind
integral calculus (which was formerly known under the names “quadrature” and “cubature”).


The answer to this mystery could possibly lie in the lack of applications for derivatives. Calculation of areas
and volumes was perhaps considered to be a worthy intellectual exercise, and applications could be found 
to some practical problems such as determining the volume of a pyramid or beer barrel. By contrast, there 
may not have been much intellectual or practical motivation to compute tangents to curves and surfaces 
expressed in terms of algebraic formulas before the 17th century. Geometric arguments were typically applied 
to determine tangent lines at points of curves, not analytic arguments, whereas integration could make use 
of the method of exhaustion, which has a clearly analytical character. 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 


In the 17th century, the newly discovered laws of motion and gravity were expressed in terms of first and
second derivatives. The discovery that the universe obeys differential equations was surely the principal 
stimulus for the rapid, and perhaps over-hasty, development of differential calculus at that time. A very 
broad range of intellectually challenging applications was opened up. Even so, it took another 200 years to 
develop a sound definition of differentiation despite concerted efforts to find logically meaningful and self- 
consistent definitions. The concept of a real number finally caught up with the demands of differentiation 
by the end of the 19th century, and then at the beginning of the 20th century, the theory of integration 
demanded a more refined concept of a function. 


Perhaps the answer to why integration is now a more advanced theory than differentiation is that integration 
requires a sophisticated concept of a function, whereas differentiation requires only a sophisticated concept 
of a number. For integration one must approximate functions by functions, whereas for differentiation one 
approximates curves by lines. Since function spaces are infinite-dimensional and lines are described by a 
finite number of parameters, it is the approximation process of integration which is more demanding. One 
might say that the “technology” of integration has outpaced the “technology” of differentiation since the 
beginning of the 20th century. Integration theory is no longer simply a matter of guessing anti-derivatives 
as it may have been in earlier centuries. 


43.1. History and styles of integration 


43.1.1 REMARK: History of the name “integral calculus”. 
According to Cajori [242], Volume 2, pages 181-182, it was Leibniz and Johann Bernoulli (younger brother 
of Jacob) who introduced and standardised the name “integral calculus” and the symbol for the integral. 


At one time Leibniz and Johann Bernoulli discussed in their letters both the name and the principal
symbol of the integral calculus. Leibniz favored the name calculus summatorius and the long letter
$\int$ as the symbol. Bernoulli favored the name calculus integralis and the capital letter I as the
sign of integration. The word “integral” had been used in print first by Jakob Bernoulli, although
Johann claimed for himself the introduction of the term. Leibniz and Johann Bernoulli finally
reached a happy compromise, adopting Bernoulli’s name “integral calculus,” and Leibniz’ symbol
of integration.


For the attribution of the integral symbol to Leibniz, see also Cajori [242], Volume II, pages 187, 242-244. 


Leibniz published his integral calculus in 1686, referring to indefinite integrals as “Tetragonismi Indefiniti”
or “indefinite quadratures”, and antidifferentiation as “Methodus Tangentium inversus” or “inverse method
of tangents”. (See Leibniz [186], page 295.)


Newton published his integral calculus in his “Tractatus de quadratura curvarum” in 1704. His “Method
of fluxions” was published in 1736, ten years after his death. (See Newton [228].) Newton used the term
“fluents” for integrals, a term which is notably absent from the modern mathematics literature.


43.1.2 REMARK: Milestones in the history of integration. 
The theory of integration has a history which is both long and deep. The developments in this history 
include the following highlights. 


(1) The “Eudoxus-Euclid-Archimedes integral”. The method of exhaustion for integration was introduced
by Eudoxus in the 4th century BC. In the 3rd century BC, Euclid and Archimedes applied the exhaustion
method to verify their computations of areas and volumes in works which have survived.

(2) 1668/1670/1686/1704. James Gregory, Isaac Barrow, Gottfried Wilhelm Leibniz and Isaac Newton
published antidifferentiation as a method of integration.

(3) 1823. Cauchy defined and demonstrated the well-definition of the integration of continuous functions
by approximation with sums. (See Cauchy [207], pages 81-84; Shilov/Gurevich [136], page 8;
Thomson/Bruckner/Bruckner [149], pages 333-348; Bruckner/Bruckner/Thomson [56], pages 41-42.)

(4) 1854. Riemann gave existence conditions for the Cauchy integral in terms of the oscillation of the
integrand. As a result of this work, his name is commonly associated with this style of integral. (See
Riemann [195], pages 101-104.)

(5) 1875. Darboux gave the first rigorous proof that the Cauchy-Riemann integral is well defined for a
continuous real function of a real variable. (See Darboux [176], pages 57-77; Shilov/Gurevich [136],
page 8.)

(6) Émile Borel introduced the Borel measure between about 1894 and 1898. (See Struik [249], page 195.)
(7) Lebesgue introduced the Lebesgue integral in 1902. (See Shilov/Gurevich [136], page 8.)
(8) Bell [234], page 482, says the following regarding developments after the Lebesgue integral. 


[...] Lebesgue’s generalisation was not a complete synthesis of integration as summation and
anti-derivation. A more inclusive union came with the further generalised integrations of 
A. Denjoy (French) in 1912 and O. Perron (German) in 1914. In the meantime, crossbreeds 
between various types of integrals came into being, for example the Lebesgue-Stieltjes integrals. 


(9) Radon introduced the Radon measure in 1913. (See Bell [234], page 483.) 


(10) According to Shilov/Gurevich [136], page 8, Radon and Fréchet generalised the Lebesgue integral during 
the period 1912-1915. 


(11) 1918. Hausdorff introduced Hausdorff measure and Hausdorff dimension. (See Hayman/Kennedy [91], 
page 220, and Hausdorff [180].) 


43.1.3 REMARK: Directions in which measure and integration can be generalised. 
Integration has been generalised in many directions, including the following “generalisation thrusts”. 


(1) From integration of continuous functions to integration of very discontinuous functions. 
(2) From real-number intervals to very general subsets of the real numbers. 


(3) From the domain of real numbers to finite-dimensional Cartesian spaces and infinite-dimensional linear 
spaces. 


(4) From real-valued functions to generalised functions such as Radon measures, Schwartz distributions and 
Sobolev spaces. 


(5) From Lebesgue-like volume measures to sub-manifold and fractional-dimensional measures such as the 
Hausdorff and Carathéodory measures. 


(6) From domains with simple smooth boundaries to domains with smooth but topologically non-trivial 
boundaries which require algebraic topology for their description. 


(7) From Cartesian spaces to general differentiable manifolds. 


Because of the wide range of generalisation thrusts, which have developed in parallel, it is not possible to 
present the theory of integration as a simple linear sequence of historically or logically ordered developments. 


43.1.4 REMARK: Styles of integrals. 
The literature contains descriptions of a wide range of integration styles. The following list is restricted to 
those integrals which are relevant to real-valued functions of a real variable. 


(1) Gregory-Barrow-Leibniz-Newton integral. (See Bruckner/Bruckner/Thomson [56], page 40; Cajori [242], 
Volume 2, pages 242-244; Gregory [220], pages 17-19; Leibniz [186], pages 295-298.) 


(2) Cauchy integral. (See Cauchy [207], pages 81-84; Thomson/Bruckner/Bruckner [149], pages 333-348;
Bruckner/Bruckner/Thomson [56], pages 41-42; Shilov/Gurevich [136], page 8.)


(3) Cauchy-Riemann integral. Also called the Riemann integral. (See Riemann [195], pages 101-104; 
Graves [85], pages 85-97; Rosenlicht [128], pages 111-128, 216-231; Schramm [133], pages 282-293; 
Shilov [135], pages 274-286; Thomson/Bruckner/Bruckner [149], pages 347-358; Shilov/Gurevich [136], 
pages 7-21; Mattuck [114], pages 251-262; Rudin [129], pages 104-105; Wilcox/Myers [164], pages 1-9, 
82-84; Spivak [140], pages 214-232; Edwards [67], pages 223-233; Bruckner/Bruckner/Thomson [56], 
pages 43-44, 189-192; Riesz/Szőkefalvi-Nagy [125], pages 23-24; Bass [53], pages 69-72; Friedman [74],
pages 113-129; Kaplan/Lewis [99], pages 320-327.) 


(4) Darboux or Riemann-Darboux integral. Also called the Riemann integral. (See Friedman [74], pages 
108-115, 260-261, 272; Riesz/Szőkefalvi-Nagy [125], pages 23-26; Darboux [176], pages 57-77.)


(5) Stieltjes integral. (See Graves [85], pages 260-296; A.E. Taylor [145], pages 392-401;
Kolmogorov/Fomin [104], pages 362-378; Riesz/Szőkefalvi-Nagy [125], pages 105-140; Shilov [135], pages 316-317.)


(6) Riemann-Stieltjes integral. (See Shilov/Gurevich [136], pages 61-85; Johnsonbaugh/Pfaffenberger [97],
pages 189-243; Rudin [129], pages 104-126; Bruckner/Bruckner/Thomson [56], pages 47-49, 490-494;
Kolmogorov/Fomin [104], pages 367-370; Schramm [133], pages 294-299; Riesz/Szőkefalvi-Nagy [125],
page 122; Friedman [74], pages 402-404.)


(7) Darboux-Stieltjes integral. (See Friedman [74], pages 402-403.) 


(8) Lebesgue integral. (See Graves [85], pages 173-259; Shilov/Gurevich [136], pages 23-55; S.J. Taylor [147], 
pages 74-132; Riesz/Szőkefalvi-Nagy [125], pages 29-104; Rudin [129], pages 227-256; Johnsonbaugh/
Pfaffenberger [97], pages 355-394; Kolmogorov/Fomin [104], pages 258-327; Bass [53], pages 47-66;
A.E. Taylor [145], pages 177-280; Bruckner/Bruckner/Thomson [56], pages 50-52, 58-121, 163-203;
Wilcox/Myers [164], pages 13-88; Saks [131], pages 65-67.)

(9) Lebesgue-Stieltjes integral. (See Shilov/Gurevich [136], pages 88-107; Riesz/Szőkefalvi-Nagy [125],
pages 122-128; Kolmogorov/Fomin [104], pages 364-367; S.J. Taylor [147], pages 95-98, 124-126;
Bruckner/Bruckner/Thomson [56], pages 121-132; Bass [53], pages 24-27; Saks [131], pages 64-104.)


(10) Daniell integral. (See A.E. Taylor [145], pages 281-323; Riesz/Szőkefalvi-Nagy [125], pages 132-140;
S.J. Taylor [147], pages 241-248.)


(11) Generalised Riemann integral (Henstock/Kurzweil). (See Bruckner/Bruckner/Thomson [56], page 54.) 


(12) Denjoy, Denjoy-Perron or Denjoy-Khintchine integral. (See Bruckner/Bruckner/Thomson [56], page 54; 
Saks [131], pages 241-259.) 


(13) Perron integral. (See Saks [131], pages 201-203.) 


(14) Perron-Stieltjes integral. (See Saks [131], pages 207-212.) 


For other purposes, there are very numerous further styles of integration. Amongst the many styles of 
integrals, there is usually at least one style which achieves a particular purpose with minimal technicalities. 
For parallel transport definitions, the Riemann, Riemann-Stieltjes and Stieltjes integrals seem to offer the 
required “services” at a reasonable “price”. 


43.2. Gregory-Barrow-Leibniz-Newton integral 


43.2.1 REMARK: History of publications of the antiderivative method. 
The sequence of publications of the antiderivative method, and the proof of the fundamental theorem of 
calculus, is roughly as follows. 


(1) 1668. James Gregory (Jacobo Gregorio) published “Geometriae pars universalis, inserviens quantitatum
curvarum transmutationi et mensurae” [220] in Padua, Italy.

(2) 1670. Isaac Barrow (Isaaco Barrow) published “Lectiones geometricae” [202] in London.

(3) 1686. Leibniz published “De geometria recondita et analysi indivisibilium atque infinitorum” [186] in a
journal in Leipzig.

(4) 1704. Newton published “Introductio ad quadraturam curvarum” and “De quadratura curvarum” as
appendices to his book “Opticks” in London.


Some of these authors were able to demonstrate that they wrote down many of their ideas many years before 
they published them, but some unlucky authors make earlier discoveries without leaving behind notes which 
can be dated and later published as evidence of priority. It seems unfair to give credit for unpublished results 
only for those fortunate individuals whose private notes or letters have been preserved. Therefore it seems 
fairest to give the name “Gregory-Barrow-Leibniz-Newton integral” to the antiderivative, according to the
order of their publications. 


43.2.2 REMARK: History of the Gregory-Barrow-Leibniz-Newton integral.
According to Cajori [242], Volume 2, page 181, the early development of integral calculus in the sense of 
antidifferentiation was due to Leibniz [186] and Johann Bernoulli. 


As regards the integral calculus, Johann Bernoulli had been active in this field and was looked upon 
as the creator of the integral calculus, notwithstanding Leibniz’ publication of 1686. 


Since Newton's publications were much later and much less influential for the development of integral calculus, 
it could be more accurate to call the antiderivative the “Leibniz-Bernoulli-Newton integral”. Many important
concepts were first invented privately before someone else published them. (See Nauenberg [nauenberg] for 
the possible influences of Barrow's work on Leibniz's work.) Nowadays credit is generally given for the first 
publication of an idea because that is typically what initiates its development amongst mathematicians. 
43.2.3 REMARK: Antiderivatives, special functions and term-by-term integration of series expansions.

In the 17th and 18th centuries, integration was primarily the art of guessing antiderivatives. (This is called 
“Newton’s integral” by Bruckner/Bruckner/Thomson [56], page 40.) If a suitable antiderivative could not 
be constructed algebraically from the library of known functions, it was added to the library as a “special 
function”, which extended the range of available antiderivatives. 


Antidifferentiation is performed mostly to integrate piecewise analytic functions. So when an explicit analytic 
antiderivative could not be found, the method of term-by-term integration could be used on a Taylor series 
for the integrand. This numerical method could deliver good accuracy for small intervals, but at first the 
dangers of divergence were poorly understood. Nevertheless, term-by-term integration could be used to 
construct the “special functions”, which could then be tabulated and graphed, and their properties could be 
studied in terms of infinite series expansions. 
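
As a minimal sketch of this term-by-term method (an illustration only, not taken from the cited literature), the following Python code integrates the Taylor series of exp(-t^2) term by term to approximate the error function, which is one of the classical “special functions”. The name erf_series and the default number of terms are assumptions chosen for this example. The result is compared with the standard-library function math.erf.

```python
import math


def erf_series(x: float, terms: int = 30) -> float:
    """Approximate erf(x) = (2/sqrt(pi)) * (integral of exp(-t^2) for t from 0 to x).

    The Taylor series exp(-t^2) = sum_n (-1)^n t^(2n) / n! is integrated
    term by term, which gives sum_n (-1)^n x^(2n+1) / (n! (2n+1)).
    """
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * x ** (2 * n + 1) / (math.factorial(n) * (2 * n + 1))
    return 2.0 / math.sqrt(math.pi) * total


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 6.0):
        # Columns: x, series value, library value, absolute difference.
        # A fixed number of terms works well near 0 but not for larger x.
        print(x, erf_series(x), math.erf(x), abs(erf_series(x) - math.erf(x)))
```

With a fixed number of terms, the truncated series is accurate on a small interval around zero and becomes useless further away, which is consistent with the remark about accuracy for small intervals.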


As the library of special functions expanded, the tables of integrable functions expanded accordingly. Modern 
examples of tables of integrals and special functions include Gradstein/Ryzhik [84], Volume 1, pages 81-662, 
Volume 2, pages 5-497; Abramowitz/Stegun [43], pages 65-819, 925-1010, 1019-1030; CRC [63], pages 236- 
294, 337-426; Spiegel [139], pages 57-103, 136-184. 

Although the ecosystem of antiderivatives, special functions and term-by-term integration of series expansions 
is limited to piecewise analytic functions, this limitation is not a serious hindrance in applications to sciences 
and engineering. When functions are not piecewise analytic, they are generally given as raw data which can 
be processed by approximate numerical integration techniques. One rarely needs to integrate, for example, 
the indicator function $\chi_{\mathbb{Q}} : \mathbb{R} \to \{0,1\}$ of the rational numbers within the set of real numbers, which has a
well-defined Lebesgue integral although it is not the derivative of any real function. (Shilov [135], page 285,
and Shilov/Gurevich [136], page 8, call $\chi_{\mathbb{Q}}$ the “Dirichlet function”.) The technique of antiderivatives of
piecewise analytic functions can be extended to integration of functions between Cartesian spaces, including 
path integrals, area integrals, volume integrals and complex functions. 


To go beyond the explicit integration of piecewise analytic functions requires advanced techniques. The 
Lebesgue integral, for example, which requires some difficult analysis with frequent invocations of choice 
axioms, is clearly motivated by the desire to be able to integrate as many functions as possible. One 
might ask, however, whether the substantial extra effort is justified by the ability to integrate pathological 
functions which never arise in nature. It does seem reasonable to expect the integral of the limit of a 
sequence of functions to equal the limit of the integrals of the functions, especially when that limit has some 
practical value, but sometimes a limit of functions can be quite pathological. It also seems reasonable to 
be able to integrate a function with a nowhere dense set of discontinuities, but when the discontinuities are 
everywhere dense or the function is nowhere continuous, the practicality of integrating such a function may 
be questionable. 


43.2.4 REMARK: Notation for indefinite and definite integrals of real functions. 
The integral symbol “∫” was introduced by Leibniz in 1686. For example, Leibniz [186], page 297, used a
notation resembling “∫ dx : 2x − xx” with an additional overline or “vinculum” to indicate the scope of
the integration operation. (See also Cajori [242], Volume 2, pages 181-183, 242-244; Struik [249], page 111.)
According to Cauchy [207], page 84, Lesson 21, the notational convention where the endpoints a and b for
a definite integral on an interval [a,b], as in Notation 43.2.6, are indicated by a subscript and superscript
respectively, was due to Fourier.

De plus, comme la valeur de l'intégrale définie que l'on considère dépend des valeurs extrêmes $x_0$,
$X$ attribuées à la variable $x$, on est convenu de placer ces deux valeurs, la première au-dessous, la
seconde au-dessus de la lettre $\int$, ou de les écrire à côté de l'intégrale, que l'on désigne en conséquence
par l'une des notations

$$\int_{x_0}^{X} f(x)\,dx, \qquad \int f(x)\,dx \begin{bmatrix} x_0 \\ X \end{bmatrix}, \qquad \int f(x)\,dx \begin{bmatrix} x = x_0 \\ x = X \end{bmatrix}.$$

La première de ces notations, imaginée par M. Fourier, est la plus simple.

This may be translated as follows. 


Moreover, as the value of the definite integral that one considers depends on the extreme values
$x_0$, $X$ attributed to the variable $x$, it is conventional to place these two values, the first below, the
second above the letter $\int$, or to write them beside the integral, which one consequently denotes by
one of the notations

$$\int_{x_0}^{X} f(x)\,dx, \qquad \int f(x)\,dx \begin{bmatrix} x_0 \\ X \end{bmatrix}, \qquad \int f(x)\,dx \begin{bmatrix} x = x_0 \\ x = X \end{bmatrix}.$$

The first of these notations, imagined by Mr. Fourier, is the simplest.


And in fact, this is the notation generally adopted in the modern mathematics literature. 


43.2.5 REMARK: Generic notation for integrals on “directed intervals” of real numbers. 

Notation 43.2.6 attempts to give a generic meaning for integrals on “directed intervals” [a,b], where the value 
of the integral is negated if b < a. In fact, this standard form of notation for integrals signifies a function of 
two variables, not a function of a real interval. Thus $\int_a^b f(x)\,dx$ means $\int_{[a,b]} f(x)\,dx$ only if $a \le b$.


43.2.6 NOTATION: Generic integral of a real-valued function on a bounded closed real “directed interval”. 
$\int_a^b f(x)\,dx$, for $a,b \in \mathbb{R}$ with $a \le b$ and an integrable real-valued function $f : [a,b] \to \mathbb{R}$, denotes the integral
of $f$ on $[a,b]$.

$\int_a^b f(x)\,dx$, for $a,b \in \mathbb{R}$ with $b < a$ and an integrable real-valued function $f : [b,a] \to \mathbb{R}$, denotes the negative
of the integral of $f$ on $[b,a]$. In other words, $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$.

In other words,

$$\forall a,b \in \mathbb{R}, \qquad \int_a^b f(x)\,dx
  = \begin{cases} \int_{[a,b]} f(x)\,dx & \text{if } a \le b \\[4pt] -\int_{[b,a]} f(x)\,dx & \text{if } b < a \end{cases}
  \;=\; \operatorname{sign}(b - a) \int_{[[a,b]]} f(x)\,dx,$$

where $\int_{[[a,b]]} f(x)\,dx$ means some generic integral of $f$ on the (undirected) set $[[a,b]] = [\min(a,b), \max(a,b)]$.
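
The convention in Notation 43.2.6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and a simple midpoint rule stands in for the “generic integral” on the undirected set [[a,b]], since the notation itself does not fix any particular style of integration.

```python
def sign(t: float) -> int:
    """Signum function: -1, 0 or 1."""
    return (t > 0) - (t < 0)


def undirected_integral(f, lo: float, hi: float, n: int = 1000) -> float:
    """Midpoint-rule stand-in (an assumption) for an integral of f on [lo, hi], lo <= hi."""
    if lo == hi:
        return 0.0
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def directed_integral(f, a: float, b: float, n: int = 1000) -> float:
    """sign(b - a) times the integral of f on [min(a, b), max(a, b)]."""
    return sign(b - a) * undirected_integral(f, min(a, b), max(a, b), n)


if __name__ == "__main__":
    f = lambda t: t
    print(directed_integral(f, 0.0, 1.0))   # close to 1/2
    print(directed_integral(f, 1.0, 0.0))   # the negative of the previous value
    print(directed_integral(f, 2.0, 2.0))   # zero for a degenerate interval
```

The point of the sketch is that the directed integral is a function of the two end-points a and b, whereas the undirected integral depends only on the set [[a,b]].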


43.2.7 REMARK: Notation difficulties for integrals. 
Notation 43.2.6 is used for many different styles of integration. The integration operation which is intended 
can usually be inferred from the context in which it appears. 


Another difficulty with notations for integrals is the distinction between names of functions and rules for 
functions. In the case of function names, the integrand is a label for a function (i.e. a set of ordered pairs) 
which is defined in the context. In the case of function rules, the integrand is an inline declaration which may 
combine any number of named functions according to various algebraic (such as addition and division) or 
analytic operations (such as infinite sums, derivatives and integrals). In this “inline rule” case, a parameter 
(i.e. “dummy variable”) is used to specify the rule, and this parameter is appended to the integrand. Thus if
the parameter is t, then dt is appended to the integrand for example. The parameter is itself also sometimes 
replaced by a function name or inline function rule. 


When a function rule is given, the restricted domain of the function is typically given as the subscript and 
superscript to the integral sign, whereas for a function name, this is redundant information in principle. 

The dummy variable $x$ in $\int_a^b f(x)\,dx$ is apparently arbitrary and unnecessary, although it does have the
advantage that the integrand function can be “declared inline”, as in formulas such as $\int_a^b f(x + 3)\,dx$ or
$\int_a^b f(x) + g(x)\,dx$. Some authors do write simply $\int f$ or $\int_a^b f$ instead of Notation 43.2.6, but this is best
regarded as a convenient abbreviation rather than the standard notation. This abbreviation is analogous to
writing $\sum z$ or $\sum_1^n z$ instead of $\sum_{i=1}^n z_i$ for the sum of a sequence $z = (z_i)_{i=1}^n$. In the case of integrals, the
differential “dx” is a useful mnemonic for the factor $x_i - x_{i-1}$ in Definition 43.5.10.


43.2.8 REMARK: The frequent non-existence of closed-form integrals of simple-looking functions. 
Integral calculus is, in a sense, infinitely more difficult than differential calculus. It is not generally possible 
to find closed-form integrals. In fact, the integration of many simple-looking functions requires the invention
of “special functions”. Bell [233], page 101, said the following about the difficulty of closed-form integration. 


[...] the problem of evaluating $\int f(x)\,dx$ for comparatively innocent-looking functions f(x) may be
beyond our powers. It does not follow that an “answer” exists at all in terms of known functions 
when an f(x) is chosen at random—the odds against such a chance are an infinity of the worst 
sort (“non-denumerable”) to one. When a physical problem leads to one of these nightmares 
approximate methods are applied which give the result within the desired accuracy. 


Existence for the anti-derivative style of integral depends on the existence of known special functions, or 
the ad-hoc addition of new functions to the international standard corpus of special functions as required. 
The notion of a “special function” is thus socially defined and varies over time. Thus the “existence” of a
Gregory-Barrow-Leibniz-Newton integral is socially defined, which is not at all satisfactory in the supposedly 
logical subject which mathematics is claimed to be. 


The Cauchy-Riemann-Darboux style of integration, described in Sections 43.3, 43.4 and 43.5, does not require 
guesswork to recognise anti-derivatives. This style is closely related to the Eudoxus-Euclid-Archimedes 
approach, except that instead of first guessing the integral and then verifying it, one simply says that the 
integral does exist, and it can be found to any accuracy by successive approximation. This meant, in the 
strict classical Greek sense, that the integral did not exist. 


The important innovation by Cauchy, described in Section 43.3, was to apply the Cauchy sequence way of 
thinking to assert that an integral exists even if you cannot say exactly what it is. Then the ancient “method 
of exhaustion” yields an “answer” even when this answer cannot be given an explicit name. 


43.3. Cauchy integral 


43.3.1 REMARK: Cauchy replaced Leibniz-Newton antiderivatives with an exhaustion-method integral. 
After the antiderivative style of integration had been applied very successfully for more than a century, a more 
analytic style of integration based on approximation by simple geometric regions in the exhaustion-method 
style of Euclid and Archimedes was defined in 1823 by Cauchy [207], pages 81-84, Lesson 21. (See also 
Thomson/Bruckner/Bruckner [149], pages 333-348; Bruckner/Bruckner/Thomson [56], pages 41-42; Shilov/
Gurevich [136], page 8.) 


Although Cauchy’s definition and proof of well-definition may seem rough by modern standards, they were 
published decades before even the concept of a real number had been defined adequately, and a satisfactory 
definition of a general continuous function had been published only 2 years earlier by Cauchy himself. 


Cauchy’s integral differed from the Eudoxus-Euclid-Archimedes integral in a significant way. With the 
exhaustion method, it was necessary to first guess the integral. Then it was necessary to demonstrate 
arbitrarily close bounds above and below. With Cauchy’s integral, the convergence of approximations was 
considered sufficient to state that an integral existed, and the approximations then provided a numerical 
method for its computation. This is analogous to the Cauchy sequence concept in Definition 37.8.3, where 
a sequence can be said to be convergent without stating in advance what its limit is. The idea of defining 
concepts in terms of approximations could perhaps have been due to the increasing influence of applied 
mathematics in Cauchy’s time. In engineering applications in particular, approximations were deemed 
sufficient, and pure mathematical theory had to follow engineering pragmatism. 


43.3.2 REMARK: Integration on “signed intervals”. Integrals on sets versus integrals on curves. 

Cauchy allowed the interval for integration to be bidirectional. In other words, he permitted the endpoints
a and b of the integration interval to be any $a,b \in \mathbb{R}$, not necessarily with $a < b$, and the value of the integral
was negated by swapping a and b. (This is particularly useful for the fundamental theorem of calculus,
for example.) Cauchy's definition and proof are presented in Definition 43.3.15 and Theorem 43.3.20 for
such “signed intervals” by using convex-span intervals [[a, b]] as in Notation 16.1.15 and the signum function
“sign” as in Definition 16.5.4. The interval partitions in Definition 43.3.3 are therefore increasing if a < b,
decreasing if a > b, and constant if a = b.


Allowing integrals $\int_a^b$ to have b < a is as convenient and natural as allowing numbers to be negative! However,
negating the value of the integral when the end-points are reversed highlights the fact that there are two 
ways of thinking about integrals of real-valued functions of the real numbers. The domain of the integral 
may be thought of as either a curve or a set. If the integral’s domain is generalised to Cartesian spaces
or manifolds, for example, the “direction” of the integration no longer makes so much sense. Volume or 
area elements on which integrals are defined can sometimes be given an orientation, but this is not always 
meaningful. Integrals along differentiable curves, on the other hand, do have an explicit direction, and such 
integrals can be meaningfully reversed. Thus the convention of negating the integral on a real interval when 
the end-points are reversed strongly suggests that the domain is in fact thought of as a curve, not a set, and 
the “intervals” of an interval partition are vectors, not intervals. Thus $\int_a^b$ is not the same as $\int_{[a,b]}$.


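43.3.3 DEFINITION: An interval partition between end-points $a,b \in \mathbb{R}$ is a sequence $(x_j)_{j=0}^k \in \mathbb{R}^{k+1}$ for
some $k \in \mathbb{Z}^+$ such that $x_0 = a$, $x_k = b$ and $\forall j \in \mathbb{N}_k,\ \operatorname{sign}(x_j - x_{j-1}) = \operatorname{sign}(b - a)$.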


43.3.4 NOTATION: Part(a,b) denotes the set of interval partitions between end-points $a,b \in \mathbb{R}$.
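
The condition in Definition 43.3.3 can be checked mechanically. The following Python sketch is an illustration only. The function name is hypothetical, a partition is represented as a list of floats (x_0, ..., x_k), and exact floating-point comparison is assumed to be adequate for the inputs.

```python
def sign(t: float) -> int:
    """Signum function: -1, 0 or 1."""
    return (t > 0) - (t < 0)


def is_interval_partition(x, a: float, b: float) -> bool:
    """Test whether x = (x_0, ..., x_k) is an interval partition between a and b.

    Requires k >= 1, x_0 = a, x_k = b, and every step x_j - x_{j-1}
    to have the same sign as b - a.
    """
    k = len(x) - 1
    if k < 1 or x[0] != a or x[k] != b:
        return False
    s = sign(b - a)
    return all(sign(x[j] - x[j - 1]) == s for j in range(1, k + 1))


if __name__ == "__main__":
    print(is_interval_partition([0.0, 0.5, 1.0], 0.0, 1.0))       # increasing case
    print(is_interval_partition([1.0, 0.5, 0.0], 1.0, 0.0))       # decreasing case
    print(is_interval_partition([2.0, 2.0, 2.0], 2.0, 2.0))       # constant case
    print(is_interval_partition([0.0, 0.5, 0.5, 1.0], 0.0, 1.0))  # repeated point
```

The last example fails the test because a repeated point gives a step of sign 0, which differs from sign(b - a) = 1.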


43.3.5 DEFINITION: A refinement of an interval partition $x = (x_i)_{i=0}^k$ between end-points $a,b \in \mathbb{R}$ is an
interval partition $x' = (x'_i)_{i=0}^{k'}$ between a and b such that $\mathrm{Range}(x') \supseteq \mathrm{Range}(x)$.


A common refinement of interval partitions $x$ and $x'$ between end-points $a,b \in \mathbb{R}$ is an interval partition
between a and b which is a refinement of both $x$ and $x'$.


43.3.6 REMARK: The coarsest common refinement of a non-empty finite set of interval partitions. 
The purpose of Definition 43.3.7 is to construct a particular common partition from amongst infinitely many.


In the special (and arguably useless) case of a pair of partitions $x', x'' \in \mathrm{Part}(a,b)$ with $a = b$, both
$x' = (x'_i)_{i=0}^{k'}$ and $x'' = (x''_i)_{i=0}^{k''}$ have the constant value $a$. So the smallest partition $x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$
with $\mathrm{Range}(x) = \mathrm{Range}(x') \cup \mathrm{Range}(x'') = \{a\}$ must satisfy $x_i = a$ for $i \in \{0\} \cup \mathbb{N}_k$ with $k = \min(k', k'')$.
For partitions $x', x'' \in \mathrm{Part}(a,b)$ with $a < b$, the set $\mathrm{Range}(x') \cup \mathrm{Range}(x'')$ is a finite subset of $[a,b]$, for
which there is a unique increasing enumeration by the set $\{0\} \cup \mathbb{N}_k$, where $k + 1 = \#(\mathrm{Range}(x') \cup \mathrm{Range}(x''))$.
This is the only possibility for the “coarsest common refinement” of $x'$ and $x''$ in Definition 43.3.7. (So the
word “smallest” is redundant when a < b.) The same observation is applicable to the case b < a, and similar
observations apply to general non-empty finite sets of interval partitions.


43.3.7 DEFINITION: The coarsest common refinement of interval partitions $x', x'' \in \mathrm{Part}(a,b)$, for $a,b \in \mathbb{R}$,
is the smallest partition $x \in \mathrm{Part}(a,b)$ which satisfies $\mathrm{Range}(x) = \mathrm{Range}(x') \cup \mathrm{Range}(x'')$.


The coarsest common refinement of a non-empty finite set of interval partitions $S \subseteq \mathrm{Part}(a,b)$, for $a,b \in \mathbb{R}$,
is the smallest partition $x \in \mathrm{Part}(a,b)$ which satisfies $\mathrm{Range}(x) = \bigcup_{x' \in S} \mathrm{Range}(x')$.


43.3.8 THEOREM: Some basic properties of interval partitions and refinements. 
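Let $a,b \in \mathbb{R}$.
(i) $\mathrm{Part}(a,b) \ne \emptyset$. In particular, $\hat{x} = (\hat{x}_i)_{i=0}^1 \in \mathrm{Part}(a,b)$ with $\hat{x}_0 = a$ and $\hat{x}_1 = b$.

(ii) $\forall x \in \mathrm{Part}(a,b),\ \mathrm{Range}(x) \subseteq [[a,b]]$.

(iii) For any finite subset $S$ of $[[a,b]]$ with $a \ne b$, there is a unique $x \in \mathrm{Part}(a,b)$ with $\mathrm{Range}(x) = S \cup \{a,b\}$.

(iv) If $a = b$, then the coarsest common refinement of $x', x'' \in \mathrm{Part}(a,b)$ with $x' = (x'_i)_{i=0}^{k'}$ and $x'' = (x''_i)_{i=0}^{k''}$
is $x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$ defined by $x_i = a$ for all $i \in \{0\} \cup \mathbb{N}_k$, where $k = \min(k', k'')$.

(v) If $a,b \in \mathbb{R}$ with $a \ne b$, then the coarsest common refinement of $x', x'' \in \mathrm{Part}(a,b)$ is the unique
$x \in \mathrm{Part}(a,b)$ with $\mathrm{Range}(x) = \mathrm{Range}(x') \cup \mathrm{Range}(x'')$.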


PROOF: For part (i), $\operatorname{sign}(\hat{x}_1 - \hat{x}_0) = \operatorname{sign}(b - a)$ because $\hat{x}_0 = a$ and $\hat{x}_1 = b$. Therefore $\hat{x} \in \mathrm{Part}(a,b)$ by
Definition 43.3.3. Hence $\mathrm{Part}(a,b) \ne \emptyset$.

For part (ii), let $x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$ with $\mathrm{Range}(x) \not\subseteq [[a,b]]$. Then $x_i \notin [[a,b]]$ for some $i \in \mathbb{N}_{k-1}$.
Suppose that $a < b$. Then either $\{i \in \mathbb{N}_{k-1};\ x_i < a\} \ne \emptyset$ or $\{i \in \mathbb{N}_{k-1};\ x_i > b\} \ne \emptyset$ because $x_0 = a$
and $x_k = b$. In the former case, let $i_0 = \min\{i \in \mathbb{N}_{k-1};\ x_i < a\}$. Then $i_0 \in \mathbb{N}_{k-1}$, $x_{i_0} < a$ and
$x_{i_0 - 1} \ge a$. So $\operatorname{sign}(x_{i_0} - x_{i_0 - 1}) = -1 \ne \operatorname{sign}(b - a)$, which contradicts Definition 43.3.3. In the latter case, let
$i_1 = \max\{i \in \mathbb{N}_{k-1};\ x_i > b\}$. Then $i_1 \in \mathbb{N}_{k-1}$, $x_{i_1} > b$ and $x_{i_1 + 1} \le b$. So $\operatorname{sign}(x_{i_1 + 1} - x_{i_1}) = -1 \ne \operatorname{sign}(b - a)$,
which contradicts Definition 43.3.3. Therefore $\mathrm{Range}(x) \subseteq [[a,b]]$ if $a < b$, and similarly for $b < a$. If $a = b$,
then $x$ is constant by Definition 43.3.3, so $\mathrm{Range}(x) = \{a\} = [[a,b]]$.

For part (iii), let $a < b$. Let $S' = S \cup \{a,b\}$. Then by Theorem 13.5.14, there is a unique increasing sequence
$x$ with $\mathrm{Range}(x) = S'$. Then $x = (x_i)_{i=0}^k$, where $k + 1 = \#(S')$. Since $a,b \in S'$ and $S \subseteq [a,b]$, it follows
that $x_0 = a$ and $x_k = b$. Therefore $x \in \mathrm{Part}(a,b)$. Since $x$ is unique amongst increasing sequences with range $S'$, it is
a fortiori unique amongst elements of $\mathrm{Part}(a,b)$ with range $S'$.

For $b < a$, let $S' = \{t \in \mathbb{R};\ -t \in S \cup \{a,b\}\}$. Then by Theorem 13.5.14, there is a unique increasing
sequence $y = (y_i)_{i=0}^k$ with $\mathrm{Range}(y) = S'$. Then $x = (-y_i)_{i=0}^k$ is the unique partition in $\mathrm{Part}(a,b)$ with
$\mathrm{Range}(x) = S \cup \{a,b\}$.

For part (iv), both $x'$ and $x''$ have the constant value $a$. So the smallest partition $x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$
with $\mathrm{Range}(x) = \mathrm{Range}(x') \cup \mathrm{Range}(x'') = \{a\}$ must satisfy $x_i = a$ for $i \in \{0\} \cup \mathbb{N}_k$ with $k = \min(k', k'')$.

For part (v), let $a,b \in \mathbb{R}$ with $a \ne b$. Let $x', x'' \in \mathrm{Part}(a,b)$. Then $\mathrm{Range}(x') \cup \mathrm{Range}(x'')$ is a finite subset
of $[[a,b]]$. So by part (iii), there is a unique $x \in \mathrm{Part}(a,b)$ with $\mathrm{Range}(x) = \mathrm{Range}(x') \cup \mathrm{Range}(x'')$. There is
no partition $y \in \mathrm{Part}(a,b)$ with $\mathrm{Range}(y) = \mathrm{Range}(x') \cup \mathrm{Range}(x'')$, for which $y \ne x$ and $x$ is a refinement
of $y$. So this is the only choice for the coarsest common refinement of $x'$ and $x''$ in Definition 43.3.7. (The
word “smallest” in the definition is thus redundant when $a \ne b$.)
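
The construction in Theorem 43.3.8 (iv) and (v) can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the function name is hypothetical, partitions are lists of floats, the inputs are assumed to be valid partitions between the same end-points, and exact floating-point equality is used to merge coincident points.

```python
def coarsest_common_refinement(xp, xpp):
    """Coarsest common refinement of two partitions between the same end-points.

    For a != b, this is the unique monotone enumeration of the union of the ranges.
    For a == b, it is the constant partition with k = min(k', k'').
    """
    a, b = xp[0], xp[-1]
    if (xpp[0], xpp[-1]) != (a, b):
        raise ValueError("partitions must have the same end-points")
    if a == b:
        k = min(len(xp), len(xpp)) - 1
        return [a] * (k + 1)
    points = set(xp) | set(xpp)
    # Increasing order if a < b, decreasing order if b < a.
    return sorted(points, reverse=(b < a))


if __name__ == "__main__":
    print(coarsest_common_refinement([0.0, 0.5, 1.0], [0.0, 0.25, 1.0]))
    print(coarsest_common_refinement([1.0, 0.5, 0.0], [1.0, 0.25, 0.0]))
    print(coarsest_common_refinement([2.0, 2.0, 2.0], [2.0, 2.0]))
```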


43.3.9 REMARK: The mesh of an interval partition. 

For the Cauchy integral, the “mesh” of partitions plays a role analogous to a metric on the domain of a 
function which is being tested for continuity. The domain of the Cauchy sum in Definition 43.3.14 for a fixed 
integrand f is the set of all partitions of an interval. The mesh of a partition is not truly a metric, although 
it does in some sense measure the distance of partitions from an idealised infinite division of the interval 
into infinitesimal sub-intervals. The $\delta$ in Notation 43.3.11 plays a role similar to the diameter of a ball in a
metric space in the definition of continuity in Theorem 38.1.3. 


43.3.10 DEFINITION: The mesh of an interval partition $(x_j)_{j=0}^k \in \mathrm{Part}(a,b)$ is $\max_{j=1}^k |x_j - x_{j-1}|$.


43.3.11 NOTATION: Mesh-constrained sets of interval partitions. 
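$\mathrm{Part}_\delta(a,b)$ denotes the set of interval partitions between $a,b \in \mathbb{R}$ for which the mesh does not exceed $\delta \in \mathbb{R}_0^+$.
In other words,

$$\forall a,b \in \mathbb{R},\ \forall \delta \in \mathbb{R}_0^+, \qquad \mathrm{Part}_\delta(a,b) = \bigl\{ (x_j)_{j=0}^k \in \mathrm{Part}(a,b);\ \max_{j=1}^k |x_j - x_{j-1}| \le \delta \bigr\}.$$

mesh(x) denotes the mesh of an interval partition x.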


43.3.12 THEOREM: Some basic properties of mesh-constrained sets of interval partitions. 
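(i) $\forall a,b \in \mathbb{R},\ \forall \delta_1, \delta_2 \in \mathbb{R}_0^+,\ (\delta_1 \le \delta_2 \Rightarrow \mathrm{Part}_{\delta_1}(a,b) \subseteq \mathrm{Part}_{\delta_2}(a,b))$.
(ii) $\forall a,b \in \mathbb{R},\ (a \ne b \Rightarrow \mathrm{Part}_0(a,b) = \emptyset)$.

PROOF: For part (i), let $a,b \in \mathbb{R}$ and $\delta_1, \delta_2 \in \mathbb{R}_0^+$ with $\delta_1 \le \delta_2$. Let $(x_j)_{j=0}^k \in \mathrm{Part}_{\delta_1}(a,b)$. Then
$\max_{j=1}^k |x_j - x_{j-1}| \le \delta_1$ by Notation 43.3.11. So $\max_{j=1}^k |x_j - x_{j-1}| \le \delta_2$. Therefore $(x_j)_{j=0}^k \in \mathrm{Part}_{\delta_2}(a,b)$.
Hence $\mathrm{Part}_{\delta_1}(a,b) \subseteq \mathrm{Part}_{\delta_2}(a,b)$.

For part (ii), suppose that $\mathrm{Part}_0(a,b) \ne \emptyset$. Then there exists $(x_j)_{j=0}^k \in \mathrm{Part}_0(a,b)$. Then $x_j = x_{j-1}$ for all
$j \in \mathbb{N}_k$ by Notation 43.3.11, where $k \ge 1$ by Definition 43.3.3. So $x_k = x_0$ by induction on $j$. Therefore
$a = b$ by Definition 43.3.3. Hence $a \ne b \Rightarrow \mathrm{Part}_0(a,b) = \emptyset$ by Theorem 4.5.7 (xxx).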
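
The mesh in Definition 43.3.10 and the membership test for the mesh-constrained sets in Notation 43.3.11 can be sketched in Python as follows. The function names are hypothetical, and the input is assumed to be a valid interval partition (x_0, ..., x_k) with k >= 1, given as a list of floats.

```python
def mesh(x) -> float:
    """Mesh of a partition x = (x_0, ..., x_k): the largest step length |x_j - x_{j-1}|."""
    return max(abs(x[j] - x[j - 1]) for j in range(1, len(x)))


def in_part_delta(x, delta: float) -> bool:
    """Whether the mesh of the partition x does not exceed delta, for delta >= 0."""
    return mesh(x) <= delta


if __name__ == "__main__":
    x = [0.0, 0.25, 1.0]
    print(mesh(x))
    # Theorem 43.3.12 (i): membership is preserved when delta is increased.
    print(in_part_delta(x, 0.75), in_part_delta(x, 1.0))
    # Theorem 43.3.12 (ii): no partition with distinct end-points has mesh 0.
    print(in_part_delta([0.0, 1.0], 0.0), in_part_delta([2.0, 2.0], 0.0))
```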


43.3.13 REMARK: Cauchy sums and the Cauchy integral. 

For the discussion of the Cauchy integral, it is convenient to define “Cauchy sums” as in Definition 43.3.14. 
These sums depend only on the integrand and the choice of partition since the value of the integrand is 
only “sampled” at the initial point of each interval of the partition. The “sample points” are on the left of 
intervals if a ≤ b and on the right if b < a. Cauchy sums have negative values for positive integrands if b < a.
Since there is no use of extrema on intervals as there is for Darboux sums, there is no need to assume
that the integrand is bounded. In fact, the sums are well defined for any real-valued function on [[a, b]]. 


The Cauchy integral of a continuous function is the limit for small $\delta \in \mathbb{R}^+$ of Cauchy sums for partitions
with mesh not exceeding $\delta$. (The case $\delta = 0$ may be ignored in view of Theorem 43.3.12 (ii).)


43.3.14 DEFINITION: The Cauchy sum of a function $f : [[a,b]] \to \mathbb{R}$ for an interval partition $x = (x_i)_{i=0}^k$
between $a,b \in \mathbb{R}$ is the real number $S_C(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) f(x_{i-1})$. In other words,

$$\forall a,b \in \mathbb{R},\ \forall f : [[a,b]] \to \mathbb{R},\ \forall x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b), \qquad
  S_C(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) f(x_{i-1}).$$
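
The Cauchy sum in Definition 43.3.14 translates directly into code. The following Python sketch is an illustration only. The function name is hypothetical, and a partition is represented as a list of floats (x_0, ..., x_k).

```python
def cauchy_sum(f, x) -> float:
    """Cauchy sum S_C(f, x): each step x_i - x_{i-1} times f at the initial point x_{i-1}."""
    return sum((x[i] - x[i - 1]) * f(x[i - 1]) for i in range(1, len(x)))


if __name__ == "__main__":
    f = lambda t: t * t
    # Increasing partition: the integrand is sampled at the left of each interval.
    print(cauchy_sum(f, [0.0, 0.5, 1.0]))   # 0.5*f(0) + 0.5*f(0.5) = 0.125
    # Decreasing partition: the steps are negative and the samples are on the right.
    print(cauchy_sum(f, [1.0, 0.5, 0.0]))   # -0.5*f(1) - 0.5*f(0.5) = -0.625
    # Constant partition: every step is zero.
    print(cauchy_sum(f, [2.0, 2.0, 2.0]))
```

The two non-trivial examples show that reversing the end-points does not simply negate the sum, because the sample points also move from the left to the right of each interval. Only in the limit of small mesh do the two values become negatives of each other.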
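43.3.15 DEFINITION: The Cauchy integral of a function $f : [[a,b]] \to \mathbb{R}$ for $a,b \in \mathbb{R}$ is the number $I \in \mathbb{R}$,
if it exists, which satisfies

$$\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b), \qquad
  \Bigl| \sum_{i=1}^k (x_i - x_{i-1}) f(x_{i-1}) - I \Bigr| < \varepsilon. \tag{43.3.1}$$

In other words,

$$\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall x \in \mathrm{Part}_\delta(a,b), \qquad |S_C(f,x) - I| < \varepsilon.$$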


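43.3.16 DEFINITION: A (real-valued) Cauchy integrable function on an interval $[a,b]$, for $a,b \in \mathbb{R}$ with
$a < b$, is a function $f : [a,b] \to \mathbb{R}$ which satisfies

$$\exists I \in \mathbb{R},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b), \qquad
  \Bigl| \sum_{i=1}^k (x_i - x_{i-1}) f(x_{i-1}) - I \Bigr| < \varepsilon.$$

In other words, $\exists I \in \mathbb{R},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall x \in \mathrm{Part}_\delta(a,b),\ S_C(f,x) \in B_{I,\varepsilon}$, where $B_{I,\varepsilon} = (I - \varepsilon, I + \varepsilon)$.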


43.3.17 REMARK: Alternative set-limit expression for the Cauchy integral. Approximation clouds. 
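The Cauchy integral in Definition 43.3.15 could be thought of as the following limit of sets.

$$\{I(f)\} = \lim_{\delta \to 0^+} \Bigl\{ \sum_{i=1}^k (x_i - x_{i-1}) f(x_{i-1});\ (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b) \Bigr\}.$$

Thus if the difference between the real number $I(f)$ and the singleton $\{I(f)\}$ is ignored, the Cauchy integral
of $f$ may be expressed as $\lim_{\delta \to 0^+} \{S_C(f,x);\ x \in \mathrm{Part}_\delta(a,b)\}$ if this limit exists. This may be expressed more
precisely as follows.

$$\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+, \qquad \{S_C(f,x);\ x \in \mathrm{Part}_\delta(a,b)\} \subseteq B_{I(f),\varepsilon}.$$

In other words, $\lim_{\delta \to 0^+} \sup\{|S_C(f,x) - I(f)|;\ x \in \mathrm{Part}_\delta(a,b)\} = 0$.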


The set $I(f,\delta) = \{S_C(f,x);\ x \in \mathrm{Part}_\delta(a,b)\}$ may be thought of as a “cloud” of approximations, where
each point in the cloud corresponds to one or more Cauchy sums for interval partitions with mesh not
exceeding $\delta$. Then one may ask whether this “cloud” converges to a unique number $I(f) \in \mathbb{R}$ as $\delta$ tends to
zero. Such a “converging cloud” is almost ubiquitous in the theory of integration because the approximations
are parametrised by partitions, not by simple numbers or vectors. The mesh of the partitions provides a
simple real number which aggregates the partitions which are “small” in some sense into a “cloud” depending
on the “smallness” parameter. One can then say that the cloud of approximations $I(f,\delta)$ converges to the
value $I(f)$ if for every $\varepsilon \in \mathbb{R}^+$, there is a $\delta \in \mathbb{R}^+$ such that the entire cloud $I(f,\delta)$ is included in the
neighbourhood $B_{I(f),\varepsilon}$.
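
The shrinking “cloud” can be observed numerically. The following Python sketch samples finitely many random partitions with mesh at most δ and reports the spread of the resulting Cauchy sums. It is only an illustration under stated assumptions: the function names, the sampling scheme and the test integrand are all hypothetical choices, only the case a < b is handled, and a finite random sample can only suggest, not prove, the behaviour of the full set I(f,δ).

```python
import math
import random


def cauchy_sum(f, x) -> float:
    """Cauchy sum: each step x_i - x_{i-1} times f at the initial point x_{i-1}."""
    return sum((x[i] - x[i - 1]) * f(x[i - 1]) for i in range(1, len(x)))


def random_partition(a, b, delta, extra, rng):
    """A random increasing partition of [a, b], for a < b, with mesh at most delta.

    A uniform grid with spacing (b - a)/n <= delta ensures the mesh bound
    (up to rounding). Extra random points can only shrink the gaps further.
    """
    n = math.ceil((b - a) / delta)
    points = {a, b}
    points.update(a + (b - a) * j / n for j in range(1, n))
    points.update(rng.uniform(a, b) for _ in range(extra))
    return sorted(points)


def cloud(f, a, b, delta, samples=200, seed=0):
    """A finite random sample of the cloud I(f, delta) of Cauchy sums."""
    rng = random.Random(seed)
    return [cauchy_sum(f, random_partition(a, b, delta, rng.randrange(50), rng))
            for _ in range(samples)]


if __name__ == "__main__":
    f = lambda t: t * t   # the exact integral on [0, 1] is 1/3
    for delta in (0.1, 0.01, 0.001):
        sums = cloud(f, 0.0, 1.0, delta)
        # Columns: delta, sampled diameter of the cloud, largest distance from 1/3.
        print(delta, max(sums) - min(sums), max(abs(s - 1 / 3) for s in sums))
```

For this increasing integrand, each Cauchy sum lies within mesh(x) multiplied by f(1) - f(0) = 1 of the exact value 1/3, so the whole sampled cloud must lie within δ of 1/3.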


The standard Cauchy sequence concept is defined for sequences, of course. So this concept must be extended 
to make it applicable to “clouds” of approximations. The mere fact that the diameter of the “cloud” converges 
to zero does not immediately imply that the “cloud” converges to some unique number. A theorem is required
in order to guarantee a meaningful limit for the quite complicated dependence of Cauchy sums on interval 
partitions. This can be done either by modelling the “clouds” as real-number functions of “directed sets” 
and “Cauchy nets” as in Remark 37.8.15, or by defining “Cauchy cloud functions” which are set-valued 
functions of a real variable. 


43.3.18 REMARK: The slightly unnecessary precondition of uniform continuity. 

Since Cauchy’s form of proof was not valid unless the integrand was assumed to be uniformly continuous, 
this is assumed as a condition in Theorem 43.3.20, whereas Darboux’s 1875 proof benefited from the 1872 
proof by Heine [181], page 188, that continuity on bounded closed intervals implies uniform continuity. Thus 
the assertion of Cauchy’s theorem was valid for general continuous integrands, but the proof was incomplete. 
43.3.19 REMARK: Cauchy’s presentation of the Cauchy integral.

Theorem 43.3.20 follows the 1823 proof of Cauchy [207], pages 81-83, lesson 21, using more modern notation 
and terminology, but otherwise essentially the same proof. Although he did not explicitly invoke the symbols 
ε and δ to prove the well-definition of his integral, he did use these symbols in the now familiar way in an
earlier section which he referred to specifically during this proof. (See Cauchy [207], page 27, lesson 7.)
Therefore epsilons and deltas are incorporated in the proof of Theorem 43.3.20.


Cauchy's 1823 justification of his integral was included in an analysis course. Therefore it was a somewhat 
informal exposition, not a rigorous proof. His justification may be outlined as follows. 


(1) For continuous $f : [[a,b]] \to \mathbb{R}$ with $a,b \in \mathbb{R}$ and $a \ne b$, let $S(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) f(x_{i-1})$ for
$x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$.


(2) Note that $S(f,x) = (b - a) \sum_{i=1}^k \mu_i f(x_{i-1})$, where $\mu_i = (x_i - x_{i-1})/(b - a)$ for $i \in \mathbb{N}_k$, where $\sum_{i=1}^k \mu_i = 1$
and $\mu_i > 0$ for all $i \in \mathbb{N}_k$. Thus $S(f,x)/(b - a)$ is a weighted mean of the values $f(x_{i-1})$ for $i \in \mathbb{N}_k$.
Therefore $S(f,x)/(b - a)$ lies in the closed interval between the minimum and maximum of $f$ on $[[a,b]]$.

(3) Conclude from (2) that $S(f,x) = (b - a) f(z)$ for some $z \in [[a,b]]$ by “some arguments similar to those
which we used in lesson 7”. (See Cauchy [207], pages 26-28, lesson 7.) By this, he meant his proof
of the mean value theorem. (See Theorem 40.6.6 for the mean value theorem. See Theorem 34.9.7
for the intermediate value theorem, which is more directly applicable. For Cauchy's own proof of the
intermediate value theorem, see Cauchy [206], pages 43-44, 460-463.)


(4) Apply the procedure in (3) to a refinement $x'$ of $x$ to obtain $S(f,x,x') = \sum_{i=1}^k (x_i - x_{i-1}) f(z_i)$ for some
$(z_i)_{i=1}^k$ with $z_i \in [[x_{i-1}, x_i]]$ for all $i \in \mathbb{N}_k$.


(5) Let $\varepsilon_i = f(z_i) - f(x_{i-1})$ for $i \in \mathbb{N}_k$. Then $S(f,x,x') = \sum_{i=1}^k (x_i - x_{i-1}) f(x_{i-1}) + \sum_{i=1}^k \varepsilon_i (x_i - x_{i-1})$.

(6) Observe that the differences $x_i - x_{i-1}$ “have very small numerical values”, and that “each one of the
quantities $\varepsilon_i$ will differ very little from zero”. So the sum $\sum_{i=1}^k \varepsilon_i (x_i - x_{i-1})$, which equals $b - a$ multiplied
by the weighted mean $\sum_{i=1}^k \mu_i \varepsilon_i$ of the quantities $\varepsilon_i$, will also differ very little from zero.

(7) By (6), comparing $S(f,x)$ in (1) with $S(f,x,x')$ in (4), “one will not perceptibly alter the value of”
$S(f,x)$ if one substitutes $x'$ for a partition $x$ whose intervals are “numerically very small”.


(8) For any two interval partitions $x$ and $x''$, construct a common refinement $x'$. Then $S(f,x,x') =
S(f,x'',x')$ will differ very little from $S(f,x)$ or $S(f,x'')$. Consequently, $S(f,x)$ and $S(f,x'')$ will differ
very little. “Therefore when the intervals of $x$ become infinitely small, the choice of partition has no
longer any more than an imperceptible influence on the value of $S(f,x)$.”


(9) “If one indefinitely shrinks the numerical values of the intervals, the value of S(f, x) [...] will end up 
attaining a certain limit which will depend uniquely on the form of the function f and the end-points 
a and b. This limit is what one calls a definite integral.” 
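
Two of the steps in this outline can be checked numerically. The following Python sketch verifies the weighted-mean observation in step (2) and bounds the change of the sum under refinement as in steps (5) to (7). It is an illustration under stated assumptions, not Cauchy's argument itself: the function names and the test integrand sin are hypothetical choices, and the bound uses the fact that |sin(s) - sin(t)| ≤ |s - t|, so that each ε-type error is at most the mesh of the coarser partition.

```python
import math


def cauchy_sum(f, x) -> float:
    """Cauchy sum: each step x_i - x_{i-1} times f at the initial point x_{i-1}."""
    return sum((x[i] - x[i - 1]) * f(x[i - 1]) for i in range(1, len(x)))


def refine(x, m: int):
    """Refinement of x obtained by cutting each interval into m equal pieces."""
    out = [x[0]]
    for i in range(1, len(x)):
        out.extend(x[i - 1] + (x[i] - x[i - 1]) * j / m for j in range(1, m))
        out.append(x[i])
    return out


if __name__ == "__main__":
    a, b, k = 0.0, 2.0, 8
    x = [a + (b - a) * i / k for i in range(k + 1)]
    f = math.sin
    s = cauchy_sum(f, x)

    # Step (2): S(f,x)/(b-a) is a weighted mean of the sampled values of f.
    samples = [f(t) for t in x[:-1]]
    assert min(samples) <= s / (b - a) <= max(samples)

    # Steps (5)-(7): refining x changes the sum by at most |b-a| times the
    # largest variation of f over an interval of x, which is at most mesh(x) here.
    mesh_x = max(abs(x[i] - x[i - 1]) for i in range(1, len(x)))
    s_refined = cauchy_sum(f, refine(x, 10))
    assert abs(s_refined - s) <= abs(b - a) * mesh_x
    print(s, s_refined, abs(s_refined - s), abs(b - a) * mesh_x)
```

The bound in the second check uses a quantitative (Lipschitz) form of uniform continuity, which is the ingredient that the informal phrase “will differ very little from zero” leaves implicit.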


Cauchy's modern-style ε-δ argument in lesson 7 showed that he knew how to make the above style of
argument rigorous, if one assumes that continuity implies uniform continuity on bounded closed intervals. 
(See Cauchy [207], pages 27-28.) 


Cauchy's 1823 integral is essentially identical to Riemann’s 1854 integral. A sum-expression almost exactly
the same as Cauchy's sum-expression $S(f,x,x') = \sum_{i=1}^k (x_i - x_{i-1}) f(z_i)$ with $z_i \in [x_{i-1}, x_i]$ for $i \in \mathbb{N}_k$ in
step (4) appeared in Riemann's presentation of convergence conditions for the integral. (See Riemann [195],
page 102.) However, Riemann did not explain his interpolated numbers $z_i$ in terms of the intermediate value
theorem for refinements $x'$ of the interval partition $x$.


In step (8) of the above argument, Cauchy was saying that when the interval lengths are small enough, 
the differences between two sums will be small. This strongly resembles the Cauchy sequence concept in 
Definition 37.8.3. Therefore a modernised version of his proof should presumably also use Cauchy sequences 
and the compactness properties of the real numbers. 


One difficulty with applying Cauchy sequences to the proof of Theorem 43.3.20 is that the sums are not
indexed by an integer parameter. They are parametrised by an interval partition $x \in \mathrm{Part}(a,b)$. This is
suggestive of the “Cauchy net” concept. (See for example Kasriel [100], pages 248-255. See also Remarks
37.8.16 and 38.3.13.) Luckily such abstract concepts are not required because a measure of “smallness” $\delta$ of a
partition from an imaginary limiting “zero-length interval partition” is the maximum interval length $\delta$. Thus
the set $\mathrm{Part}_\delta(a,b)$ in Definition 43.3.15 is a suitable modernisation of Cauchy’s idea of intervals becoming
“infinitely small”.


43.3.20 THEOREM: Well-definition of the Cauchy integral for uniformly continuous functions. 
The Cauchy integral of any uniformly continuous function on a bounded closed interval is well defined. In 
other words, every uniformly continuous function on a bounded closed real interval is Cauchy integrable. 


PROOF: Let $f : [a,b] \to \mathbb{R}$ be uniformly continuous for some $a, b \in \mathbb{R}$. If $a = b$, then $S_C(f,x) = 0$ for all
$x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$ because $x$ is constant with $x_i = a$ for all $i \in \{0, \ldots, k\}$. So the Cauchy integral of $f$ is
well defined and equals zero. Thus assume that $a \ne b$.


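Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_0 = \frac{1}{2}\varepsilon / |b - a|$. Then by the uniform continuity of $f$, there is a $\delta \in \mathbb{R}^+$ such that
$|f(x) - f(x')| < \varepsilon_0$ for all $x, x' \in [a,b]$ such that $|x - x'| < \delta$. Let $\delta_0$ be such a value of $\delta$.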

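Let $x = (x_i)_{i=0}^k \in \mathrm{Part}_{\delta_0}(a,b)$. Let $x' = (x'_j)_{j=0}^{k'}$ be a refinement of $x$. Let $i \in \mathbb{N}_k$. Then $x_{i-1} = x'_{j_0}$ and
$x_i = x'_{j_1}$ for some $j_0, j_1 \in \mathbb{Z}_0^+$ with $j_0 < j_1 \le k'$ by Definition 43.3.5, and then $(x'_j)_{j=j_0}^{j_1}$ is an interval partition
between $x_{i-1}$ and $x_i$ by the monotonicity of $x'$. (See Figure 43.3.1.)


Figure 43.3.1 Refinement of an interval partition


Since the factors $x'_j - x'_{j-1}$ all have the same sign and $x_i - x_{i-1} = \sum_{j=j_0+1}^{j_1} (x'_j - x'_{j-1})$, it follows
that

\[
|x_i - x_{i-1}| \min_{j=j_0+1}^{j_1} f(x'_{j-1})
\le \sum_{j=j_0+1}^{j_1} |x'_j - x'_{j-1}| f(x'_{j-1})
\le |x_i - x_{i-1}| \max_{j=j_0+1}^{j_1} f(x'_{j-1}).
\]

Therefore

\[
|x_i - x_{i-1}| \min_{t \in [x_{i-1}, x_i]} f(t)
\le \sum_{j=j_0+1}^{j_1} |x'_j - x'_{j-1}| f(x'_{j-1})
\le |x_i - x_{i-1}| \max_{t \in [x_{i-1}, x_i]} f(t).
\]

Therefore by the intermediate value theorem, Theorem 34.9.7, $\sum_{j=j_0+1}^{j_1} (x'_j - x'_{j-1}) f(x'_{j-1}) = (x_i - x_{i-1}) f(t_{i-1})$ for some
$t_{i-1} \in [x_{i-1}, x_i]$ since $f$ is continuous. But $|t_{i-1} - x_{i-1}| \le \delta_0$. So $|f(t_{i-1}) - f(x_{i-1})| < \varepsilon_0$. Therefore

\[
\begin{aligned}
|S_C(f,x') - S_C(f,x)|
&= \Bigl| \sum_{j=1}^{k'} (x'_j - x'_{j-1}) f(x'_{j-1}) - \sum_{i=1}^{k} (x_i - x_{i-1}) f(x_{i-1}) \Bigr| \\
&= \Bigl| \sum_{i=1}^{k} (x_i - x_{i-1}) f(t_{i-1}) - \sum_{i=1}^{k} (x_i - x_{i-1}) f(x_{i-1}) \Bigr| \\
&< |b - a| \varepsilon_0 \\
&= \tfrac{1}{2}\varepsilon.
\end{aligned}
\]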


Now let $x, x' \in \mathrm{Part}_{\delta_0}(a,b)$, and let $x'' \in \mathrm{Part}(a,b)$ be the coarsest common refinement of $x$ and $x'$, as in
Definition 43.3.7. Then $|S_C(f,x) - S_C(f,x'')| < \frac{1}{2}\varepsilon$ and $|S_C(f,x') - S_C(f,x'')| < \frac{1}{2}\varepsilon$, which implies that
$|S_C(f,x) - S_C(f,x')| < \varepsilon$. So $\mathrm{diam}(X_{\delta_0}) \le \varepsilon$, where $X_\delta = \{S_C(f,x);\ x \in \mathrm{Part}_\delta(a,b)\}$ for all $\delta \in \mathbb{R}^+$.
Therefore $\lim_{\delta \to 0^+} \mathrm{diam}(X_\delta) = 0$. So $\bigcap_{\delta \in \mathbb{R}^+} \overline{X}_\delta = \{I\}$ for some unique $I \in \mathbb{R}$ by Theorem 37.9.12 (ii) because
this family is non-decreasing. (In other words, it is non-increasing with respect to decreasing $\delta$.)


Thus there is a unique $I \in \mathbb{R}$ which satisfies condition (43.3.1) in Definition 43.3.15. Hence the Cauchy
integral of every uniformly continuous function on $[a,b]$ is well defined.
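
The common-refinement comparison in the proof above can be tried numerically. The following Python sketch
is an illustration only, not part of the proof. It assumes the integrand sin on the interval from 0 to 2, whose
Lipschitz constant 1 permits the choice of delta0 equal to eps0. The helper names cauchy_sum,
uniform_partition and common_refinement are hypothetical and do not come from the text.

```python
import math

def cauchy_sum(f, x):
    """Left-endpoint sum: sum over i of (x[i] - x[i-1]) * f(x[i-1])."""
    return sum((x[i] - x[i - 1]) * f(x[i - 1]) for i in range(1, len(x)))

def uniform_partition(a, b, k):
    """Partition of [a, b] into k equal sub-intervals (k + 1 points)."""
    return [a + (b - a) * i / k for i in range(k + 1)]

def common_refinement(x, y):
    """Coarsest common refinement of two increasing partitions of the same interval."""
    return sorted(set(x) | set(y))

a, b = 0.0, 2.0
f = math.sin                      # assumption: |f(s) - f(t)| <= |s - t|
eps = 0.1
eps0 = 0.5 * eps / abs(b - a)     # as in the proof: eps0 = (1/2) eps / |b - a|
delta0 = eps0                     # valid only because the Lipschitz constant is 1

x1 = uniform_partition(a, b, 100)     # mesh 0.02, smaller than delta0
x2 = uniform_partition(a, b, 130)     # mesh about 0.0154, smaller than delta0
x3 = common_refinement(x1, x2)

d13 = abs(cauchy_sum(f, x1) - cauchy_sum(f, x3))
d23 = abs(cauchy_sum(f, x2) - cauchy_sum(f, x3))
d12 = abs(cauchy_sum(f, x1) - cauchy_sum(f, x2))
print(d13, d23, d12)
```

Each of the first two printed differences should be below eps/2, and the third below eps, which is the
pattern of inequalities used in the proof.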


43.3.21 REMARK: Common refinements of partitions versus merged-partition overlap bounds. 

The procedure for comparing Cauchy sums for two different partitions of the same interval in the proof
of Theorem 43.3.20 is the same as was followed by Cauchy in his 1823 analysis textbook. But instead of
constructing a third partition which is a common refinement of two given partitions, it is possible to arrive
at the same result by putting bounds on the difference terms for the overlaps between component terms of
the two partitions. (For this alternative tactic, see for example Graves [85], pages 86-87.)


43.3.22 REMARK: The class of Cauchy integrable functions.

Example 43.3.23 shows that the class of Cauchy integrable functions includes some anomalous functions 
which have an asymptote at the right of their domain. This is because function values are sampled at the 
left of each interval of the partition. However, the concept of a “class of integrable functions” corresponding 
to each integration procedure is very modern. A few decades after Cauchy, it became quite normal to 
associate each mathematical procedure with the class of all inputs which give a meaningful output. 


It would be anachronistic to criticise Cauchy for sampling function values only at the left of each partition 
interval. He correctly stated that if a function is continuous on a closed interval, then his procedure yielded 
a meaningful integral. An important novelty is that his integral is demonstrably meaningful when there is no 
known anti-derivative (which was the 17th century calculus approach), when there is no known power series 
which can be integrated term by term (another 17th century approach), and even when one cannot guess 
the value of the definite integral in advance (which was required by the exhaustion method). In the modern 
approach to mathematics in general, it is quite normal to prove that something exists without knowing what 
it is. This was not so normal before Cauchy’s time. 


43.3.23 EXAMPLE: Function which is “Cauchy net integrable”, but is neither continuous nor bounded.
Let $a = 0$, $b = 1$, and define $f : [a,b] \to \mathbb{R}$ by

\[
\forall t \in [0,1], \quad
f(t) = \begin{cases} (1 - t)^{-1/2} & \text{if } t < 1 \\ 0 & \text{if } t = 1. \end{cases}
\]

Let $I = 2$. Let $\varepsilon \in \mathbb{R}^+$. Let $x = (x_i)_{i=0}^k \in \mathrm{Part}(0,1)$. (See Figure 43.3.2.)


Figure 43.3.2 Anomalous Cauchy integral of non-continuous, non-bounded function


Subtracting line (43.3.2), which equals $S_C(f,x)$, from line (43.3.3), which equals 2, one obtains

\[
\begin{aligned}
2 - S_C(f,x)
&= \sum_{i=1}^k \bigl((1 - x_{i-1})^{1/2} - (1 - x_i)^{1/2}\bigr) \bigl((1 - x_{i-1})^{1/2} - (1 - x_i)^{1/2}\bigr) (1 - x_{i-1})^{-1/2} \\
&= \sum_{i=1}^k \bigl((1 - x_{i-1})^{1/2} - (1 - x_i)^{1/2}\bigr)^2 (1 - x_{i-1})^{-1/2}.
\end{aligned}
\]


Let $\delta_i = x_i - x_{i-1}$ for $i \in \mathbb{N}_k$. Then
where line (43.3.4) follows from Theorem 16.6.19 (ii).

Unfortunately, this kind of convergence with respect to the sum of square roots of partition interval lengths
does not satisfy the requirements of Definition 43.3.16. The sum of square roots is a kind of “½-norm” in the
context of p-norms in Definition 24.7.11, but for $p = \frac{1}{2}$, this is not a true norm according to Definition 24.7.2.


Nevertheless, it is intuitively clear that the Cauchy sums do converge in some sense to the usually understood
integral of $f$, which is 2. If a partition $x \in \mathrm{Part}(0,1)$ satisfies $\sum_{i=1}^k (x_i - x_{i-1})^{1/2} < \varepsilon$, then $S_C(f,x)$ will lie in
the real-number ball $B_{2,\varepsilon}$. This will also be true for any refinement of $x$. Thus if the mesh-constrained sets of
partitions in Notation 43.3.11, which put a bound on the ∞-norm of $x$ as in Notation 24.7.13, are replaced by
the metric-free directed set concept in Remark 37.8.15, this function $f$ would be Cauchy integrable without
using limits of sub-intervals to extend the Cauchy integral definition. Thus one could perhaps say that $f$ is
“Cauchy net integrable”, but not “Cauchy mesh integrable”.
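
The behaviour of the Cauchy sums in this example can be inspected numerically. The Python sketch below is
a hedged illustration which uses uniform partitions only, so it says nothing about arbitrary partitions. It
assumes the integrand $f(t) = (1-t)^{-1/2}$ for $t < 1$ with $f(1) = 0$, and it compares $2 - S_C(f,x)$ with the sum of
the terms $((1 - x_{i-1})^{1/2} - (1 - x_i)^{1/2})^2 (1 - x_{i-1})^{-1/2}$. The function names are hypothetical.

```python
import math

def f(t):
    """Assumed integrand: (1 - t)^(-1/2) for t < 1, and 0 at t = 1."""
    return (1.0 - t) ** -0.5 if t < 1.0 else 0.0

def cauchy_sum(f, x):
    """Left-endpoint sum: sum over i of (x[i] - x[i-1]) * f(x[i-1])."""
    return sum((x[i] - x[i - 1]) * f(x[i - 1]) for i in range(1, len(x)))

def error_identity(x):
    """Sum of (a_i - b_i)^2 / a_i with a_i = sqrt(1 - x[i-1]) and b_i = sqrt(1 - x[i])."""
    total = 0.0
    for i in range(1, len(x)):
        a_i = math.sqrt(1.0 - x[i - 1])
        b_i = math.sqrt(1.0 - x[i])
        total += (a_i - b_i) ** 2 / a_i
    return total

def uniform_partition(k):
    """Uniform partition of [0, 1] into k sub-intervals."""
    return [i / k for i in range(k + 1)]

for k in (10, 100, 1000, 10000):
    x = uniform_partition(k)
    s = cauchy_sum(f, x)
    # Columns: k, Cauchy sum, shortfall from 2, right-hand side of the identity.
    print(k, s, 2.0 - s, error_identity(x))
```

The last two columns should agree up to rounding error, and for these uniform partitions the shortfall from 2
decreases as k increases.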


43.4. Cauchy-Riemann integral 


43.4.1 REMARK: The Cauchy-Riemann-Darboux integral. Also known as the Riemann integral.

The style of integral which is commonly given the name “Riemann integral” is perhaps more accurately 
referred to as the Cauchy-Riemann-Darboux integral. It was Cauchy who in 1823, in one of his textbooks, 
defined the integral of a general continuous function as the limit of numerical approximations by sums of 
areas of rectangles in an analytical fashion reminiscent of Euclid and Archimedes. (See Shilov/Gurevich [136], 
page 8; Thomson/Bruckner/Bruckner [149], pages 333-348; Bruckner/Bruckner/Thomson [56], pages 41-42.)


In 1854, posthumously published in 1868, Riemann added to Cauchy's definition some convergence criteria 
in terms of the local oscillation of functions. (See Riemann [195], pages 101-104.) In 1875, Darboux gave a 
correct proof, using Heine's 1872 work on uniformly continuous functions, that the Cauchy-Riemann style 
of integral is well defined for continuous functions. (See Shilov/Gurevich [136], page 8; Riesz/Szőkefalvi-Nagy
[125], pages 25-26; Darboux [176], pages 59-75; Heine [181], page 188.) The most basic property of the
integral, namely that it is well defined, was therefore not proven within Riemann's lifetime. Much later, 
Lebesgue gave necessary and sufficient existence conditions in terms of sets of measure zero. 


It is difficult to attribute the Riemann integral to Riemann. The definition presented in most textbooks is 
in fact Darboux's, which is expressed in terms of limits of upper and lower bounds by rectangular functions. 
As in the case of the so-called “Stokes theorem”, it is probably most convenient to employ the traditional
Riemann name, although the true origins should not be forgotten. It was Cauchy who invented it. The other 
two merely paraphrased it and studied its properties in greater detail. 


All in all, the Riemann presentation of the Cauchy-Riemann-Darboux integral is perhaps the least valuable
of the three. The “management overheads” for the arbitrary sample for each interval in a partition are
unnecessarily onerous, as seen particularly in the statement of Theorem 43.4.11. The principal test of
Riemann integrability, given in Theorem 43.4.16, is impractical to compute, as discussed in Remark 43.4.15.
So the definition and properties which Riemann contributed did not in themselves give substantial net benefit.
It was Darboux who made the Cauchy-Riemann integral easier to analyse, and demonstrated its existence
for general continuous functions, and it was Jordan who gave a practically useful necessary and sufficient
condition for integrability in terms of the Jordan content of the discontinuity set of the integrand. The
original Cauchy presentation of the integral, on the other hand, had the virtue of being very simple and
practical. Nevertheless, the integral is most often named after Riemann.


43.4.2 REMARK: Riemann sums depend on choices of sample points in each sub-interval. 

Whereas the Cauchy sums in Definition 43.3.14 depend only on the end-points of a partition, the Riemann 
sums in Definition 43.4.5 depend also on arbitrary choices of “sample points” within the component intervals 
of the partition. However, the sample points are strongly implied by Cauchy’s proof of the well-definition 
of his integral. The extra freedom to vary the sample points instead of sampling only at initial points of 
sub-intervals clearly makes no difference to the scope of well-definition of the integral nor to its value. Part 
of the difference in approach may be due to the fact that Cauchy was writing an undergraduate textbook 
whereas Riemann was writing a doctoral thesis. 


43.4.3 DEFINITION: An interval partition sample (sequence) for an interval partition $(x_i)_{i=0}^k \in \mathrm{Part}(a,b)$
for $a, b \in \mathbb{R}$ is a sequence $(z_i)_{i=1}^k \in \mathbb{R}^k$ which satisfies $\forall i \in \mathbb{N}_k,\ z_i \in [x_{i-1}, x_i]$.


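43.4.4 NOTATION: $\mathrm{Samp}(x)$, for an interval partition $x$, denotes the set of all interval partition sample
sequences for $x$. In other words,

\[
\forall a, b \in \mathbb{R},\ \forall (x_i)_{i=0}^k \in \mathrm{Part}(a,b), \quad
\mathrm{Samp}(x) = \{ z \in \mathbb{R}^k;\ \forall i \in \mathbb{N}_k,\ z_i \in [x_{i-1}, x_i] \}.
\]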


43.4.5 DEFINITION: The Riemann sum of a function $f : [a,b] \to \mathbb{R}$ for an interval partition $x = (x_i)_{i=0}^k$
between $a, b \in \mathbb{R}$, and an interval partition sample sequence $z = (z_i)_{i=1}^k$ for $x$, is the real number $S_R(f,x,z) =
\sum_{i=1}^k (x_i - x_{i-1}) f(z_i)$. In other words,

\[
\forall a, b \in \mathbb{R},\ \forall f : [a,b] \to \mathbb{R},\ \forall x \in \mathrm{Part}(a,b),\ \forall z \in \mathrm{Samp}(x), \quad
S_R(f,x,z) = \sum_{i=1}^k (x_i - x_{i-1}) f(z_i).
\]


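43.4.6 DEFINITION: The Riemann integral of a function $f : [a,b] \to \mathbb{R}$ for $a, b \in \mathbb{R}$ is, if it exists,

\[
\lim_{\delta \to 0^+} \Bigl\{ \sum_{i=1}^k (x_i - x_{i-1}) f(z_i);\ (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b),\ (z_i)_{i=1}^k \in \mathrm{Samp}(x) \Bigr\}.
\]

In other words, the Riemann integral of $f$ is the number $I \in \mathbb{R}$, if it exists, such that

\[
\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b),\ \forall (z_i)_{i=1}^k \in \mathrm{Samp}(x), \quad
\Bigl| \sum_{i=1}^k (x_i - x_{i-1}) f(z_i) - I \Bigr| < \varepsilon. \tag{43.4.1}
\]

That is, the Riemann integral of $f$ is $\lim_{\delta \to 0^+} \{ S_R(f,x,z);\ x \in \mathrm{Part}_\delta(a,b),\ z \in \mathrm{Samp}(x) \}$ if this limit exists.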


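43.4.7 DEFINITION: A (real-valued) Riemann integrable function on an interval $[a,b]$, for $a, b \in \mathbb{R}$ with
$a < b$, is a function $f : [a,b] \to \mathbb{R}$ which satisfies

\[
\exists I \in \mathbb{R},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b),\ \forall (z_i)_{i=1}^k \in \mathrm{Samp}(x), \quad
\Bigl| \sum_{i=1}^k (x_i - x_{i-1}) f(z_i) - I \Bigr| < \varepsilon.
\]

That is, $\exists I \in \mathbb{R},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall x \in \mathrm{Part}_\delta(a,b),\ \forall z \in \mathrm{Samp}(x),\ S_R(f,x,z) \in B_{I,\varepsilon} = (I - \varepsilon, I + \varepsilon)$.


43.4.8 REMARK: Riemann integrability is stricter than Cauchy integrability.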
Theorem 43.4.9 shows that Riemann integrability implies Cauchy integrability. This is because the Riemann 
integral requires convergence for all choices of sample points within the sub-intervals in the interval partition. 


43.4.9 THEOREM: Every Riemann integrable function is Cauchy integrable. 
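Let $f : [a,b] \to \mathbb{R}$ for some $a, b \in \mathbb{R}$ with $a < b$. If $f$ is Riemann integrable on $[a,b]$, then $f$ is Cauchy
integrable on $[a,b]$.


PROOF: Suppose that $f$ is Riemann integrable on $[a,b]$. Then by Definition 43.4.7, there exists $I_R \in \mathbb{R}$
satisfying $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall x \in \mathrm{Part}_\delta(a,b),\ \forall z \in \mathrm{Samp}(x),\ S_R(f,x,z) \in B_{I_R,\varepsilon}$. Let $\varepsilon \in \mathbb{R}^+$. Then there
is a $\delta_0 \in \mathbb{R}^+$ satisfying $\forall x \in \mathrm{Part}_{\delta_0}(a,b),\ \forall z \in \mathrm{Samp}(x),\ S_R(f,x,z) \in B_{I_R,\varepsilon}$. Let $x = (x_i)_{i=0}^k \in \mathrm{Part}_{\delta_0}(a,b)$,
and define $z = (z_i)_{i=1}^k$ by $z_i = x_{i-1}$ for all $i \in \mathbb{N}_k$. Then $z \in \mathrm{Samp}(x)$ by Notation 43.4.4. Therefore
$S_R(f,x,z) \in B_{I_R,\varepsilon}$. But $S_R(f,x,z) = S_C(f,x)$ by Definitions 43.3.14 and 43.4.5. So $S_C(f,x) \in B_{I_R,\varepsilon}$. It
thus follows that $\exists I \in \mathbb{R},\ \forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall x \in \mathrm{Part}_\delta(a,b),\ S_C(f,x) \in B_{I,\varepsilon}$. Therefore $f$ is Cauchy
integrable on $[a,b]$ by Definition 43.3.16.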
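
The key step of this proof, that the Riemann sum with samples at the initial points of the sub-intervals is the
Cauchy sum, can be checked in a few lines. The Python sketch below is an illustration under stated
assumptions. The integrand $t^2$ and the four-interval partition are arbitrary choices, and the names
riemann_sum, cauchy_sum and left_samples are hypothetical. The list z holds one sample per sub-interval, so
z[i-1] plays the role of $z_i$.

```python
def cauchy_sum(f, x):
    """Left-endpoint sum: sum over i of (x[i] - x[i-1]) * f(x[i-1])."""
    return sum((x[i] - x[i - 1]) * f(x[i - 1]) for i in range(1, len(x)))

def riemann_sum(f, x, z):
    """Sum over i of (x[i] - x[i-1]) * f(z[i-1]), after checking that z samples x."""
    if len(z) != len(x) - 1:
        raise ValueError("need exactly one sample point per sub-interval")
    for i in range(1, len(x)):
        lo, hi = min(x[i - 1], x[i]), max(x[i - 1], x[i])
        if not lo <= z[i - 1] <= hi:
            raise ValueError("sample point outside its sub-interval")
    return sum((x[i] - x[i - 1]) * f(z[i - 1]) for i in range(1, len(x)))

def left_samples(x):
    """The sample sequence z_i = x_{i-1} used in the proof."""
    return x[:-1]

def f(t):
    return t * t                      # arbitrary integrand for the illustration

x = [0.0, 0.25, 0.5, 0.75, 1.0]
midpoints = [(x[i - 1] + x[i]) / 2 for i in range(1, len(x))]

print(riemann_sum(f, x, left_samples(x)), cauchy_sum(f, x))   # equal by construction
print(riemann_sum(f, x, midpoints))                           # a different sample choice
```

A different choice of sample points generally gives a different sum for the same partition, which is why
Riemann integrability is the stricter condition.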


43.4.10 REMARK: Conditions for well-definition of the Riemann integral. 

As Riemann [195], pages 101-104, remarked in his presentation of the integral in Definition 43.4.6, it is 
certainly not well defined if the integrand is unbounded. He suggested that by breaking up an interval [a,b] 
into intervals $[a, c - \alpha_1]$ and $[c + \alpha_2, b]$ and computing the limits of the integral on these intervals for $\alpha_1 \to 0^+$
and $\alpha_2 \to 0^+$, a divergence of $f$ at $c$ would not prevent a meaningful integral from being defined if the two
limits were well defined, but he noted that the arbitrariness of such fixes made them unsuitable. He then 
proceeded to demonstrate the well-definition of the integral in terms of the oscillation of the integrand on 
small intervals. 


Riemann [195], pages 103-104, gave a necessary condition for the integral in Definition 43.4.6 to be well 
defined. His informal theorem and proof strategy are completed and formalised here as Theorem 43.4.11. 


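43.4.11 THEOREM: Necessary conditions for well-definition of the Riemann integral.
For bounded functions $f : [a,b] \to \mathbb{R}$ with $a, b \in \mathbb{R}$, define the functions $D_f$, $W_f$, $\Delta_f$ and $L_f$ by

\[
\begin{aligned}
&\forall x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b),\ \forall i \in \mathbb{N}_k, \\
&\qquad D_f(x,i) = \mathrm{diam}(f([x_{i-1}, x_i]))
 = \sup\{ f(t);\ t \in [x_{i-1}, x_i] \} - \inf\{ f(t);\ t \in [x_{i-1}, x_i] \}, \\
&\forall x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b), \\
&\qquad W_f(x) = \sum_{i=1}^k |x_i - x_{i-1}| \, D_f(x,i), \\
&\forall \delta \in \mathbb{R}^+, \quad \Delta_f(\delta) = \sup\{ W_f(x);\ x \in \mathrm{Part}_\delta(a,b) \}, \qquad (43.4.2) \\
&\forall x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b),\ \forall \sigma \in \mathbb{R}^+, \\
&\qquad L_f(x,\sigma) = \sum \{ |x_i - x_{i-1}|;\ i \in \mathbb{N}_k,\ D_f(x,i) > \sigma \}.
\end{aligned}
\]

Assume that $f$ is Riemann integrable on the interval $[a,b]$. Then the following assertions follow.


(i) $f$ is bounded.
(ii) $\forall \delta_1, \delta_2 \in \mathbb{R}^+,\ (\delta_1 \le \delta_2 \Rightarrow \Delta_f(\delta_1) \le \Delta_f(\delta_2))$.
(iii) $\lim_{\delta \to 0^+} \Delta_f(\delta) = 0$.
(iv) $\forall \sigma \in \mathbb{R}^+,\ \forall \delta \in \mathbb{R}^+,\ \forall x \in \mathrm{Part}_\delta(a,b),\ \sigma L_f(x,\sigma) \le \Delta_f(\delta)$.
(v) $\forall \sigma \in \mathbb{R}^+,\ \lim_{\delta \to 0^+} \sup\{ L_f(x,\sigma);\ x \in \mathrm{Part}_\delta(a,b) \} = 0$.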


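PROOF: For part (i), suppose that $f$ is unbounded above. Let $\delta, K \in \mathbb{R}^+$. Let $x = (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b)$.
Then $\infty = \sup\{ f(t);\ t \in [a,b] \} = \sup\{ \sup(f([x_{i-1}, x_i]));\ i \in \mathbb{N}_k \}$. So $\sup(f([x_{i_0-1}, x_{i_0}])) = \infty$ for
some $i_0 \in \mathbb{N}_k$. For such $i_0 \in \mathbb{N}_k$, $f(z') > \bigl(K - \sum_{i \in \mathbb{N}_k \setminus \{i_0\}} |x_i - x_{i-1}| f(x_{i-1})\bigr) / |x_{i_0} - x_{i_0-1}|$ for some
$z' \in [x_{i_0-1}, x_{i_0}]$. (The case $a = b$ can be ignored because $f$ is always bounded on a singleton, and otherwise
the sequence $(x_i)_{i=0}^k$ is strictly monotonic by Definitions 43.3.3 and 43.3.10. So $x_{i_0} - x_{i_0-1} \ne 0$.) Define
$z \in \mathrm{Samp}(x)$ by $z_{i_0} = z'$ and $z_i = x_{i-1}$ for $i \ne i_0$. Then $S_R(f,x,z) = \sum_{i=1}^k |x_i - x_{i-1}| f(z_i) > K$. But this
is true for all $\delta \in \mathbb{R}^+$ and $K \in \mathbb{R}^+$. So $\sup\{ S_R(f,x,z);\ x \in \mathrm{Part}_\delta(a,b),\ z \in \mathrm{Samp}(x) \} = \infty$ for all $\delta \in \mathbb{R}^+$.
So the Riemann integral in Definition 43.4.6 is undefined, and similarly if $f$ is unbounded below. Therefore
$f$ is bounded.


For part (ii), let $\delta_1, \delta_2 \in \mathbb{R}^+$ with $\delta_1 \le \delta_2$. Then $\mathrm{Part}_{\delta_1}(a,b) \subseteq \mathrm{Part}_{\delta_2}(a,b)$ by Theorem 43.3.12 (i). So
$\sup\{ W_f(x);\ x \in \mathrm{Part}_{\delta_1}(a,b) \} \le \sup\{ W_f(x);\ x \in \mathrm{Part}_{\delta_2}(a,b) \}$. Thus $\Delta_f(\delta_1) \le \Delta_f(\delta_2)$.


For part (iii), let $X_f(\delta) = \{ S_R(f,x,z);\ x \in \mathrm{Part}_\delta(a,b),\ z \in \mathrm{Samp}(x) \}$ for $\delta \in \mathbb{R}^+$. Then for all $\delta \in \mathbb{R}^+$,

\[
\begin{aligned}
\mathrm{diam}(X_f(\delta))
&= \sup\{ |S_R(f,x,z) - S_R(f,x',z')|;\ x, x' \in \mathrm{Part}_\delta(a,b),\ z \in \mathrm{Samp}(x),\ z' \in \mathrm{Samp}(x') \} \\
&\ge \sup\{ |S_R(f,x,z) - S_R(f,x,z')|;\ x \in \mathrm{Part}_\delta(a,b),\ z, z' \in \mathrm{Samp}(x) \} \\
&= \Delta_f(\delta).
\end{aligned}
\]

Therefore $\lim_{\delta \to 0^+} \Delta_f(\delta) = 0$ by Definition 43.4.6.

For part (iv), let $x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$ and $\sigma \in \mathbb{R}^+$. Then

\[
\begin{aligned}
W_f(x)
&= \sum \{ |x_i - x_{i-1}| D_f(x,i);\ i \in \mathbb{N}_k \} \\
&\ge \sum \{ |x_i - x_{i-1}| D_f(x,i);\ i \in \mathbb{N}_k,\ D_f(x,i) > \sigma \} \\
&\ge \sigma \sum \{ |x_i - x_{i-1}|;\ i \in \mathbb{N}_k,\ D_f(x,i) > \sigma \} \\
&= \sigma L_f(x,\sigma).
\end{aligned}
\]

But $W_f(x) \le \Delta_f(\delta)$ for all $x \in \mathrm{Part}_\delta(a,b)$ for all $\delta \in \mathbb{R}^+$ by line (43.4.2). Therefore $\sigma L_f(x,\sigma) \le \Delta_f(\delta)$ for
all $x \in \mathrm{Part}_\delta(a,b)$ for all $\delta \in \mathbb{R}^+$ for all $\sigma \in \mathbb{R}^+$.


For part (v), let $\sigma \in \mathbb{R}^+$. Let $\varepsilon \in \mathbb{R}^+$. Then by part (iii), there is a $\delta \in \mathbb{R}^+$ such that $\Delta_f(\delta) < \sigma\varepsilon$. For
such $\delta$, by part (iv), $\forall x \in \mathrm{Part}_\delta(a,b),\ L_f(x,\sigma) \le \sigma^{-1} \Delta_f(\delta) < \sigma^{-1} \sigma \varepsilon = \varepsilon$. Therefore $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,$
$\sup\{ L_f(x,\sigma);\ x \in \mathrm{Part}_\delta(a,b) \} \le \varepsilon$. Hence $\lim_{\delta \to 0^+} \sup\{ L_f(x,\sigma);\ x \in \mathrm{Part}_\delta(a,b) \} = 0$ for all $\sigma \in \mathbb{R}^+$.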
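
The quantities $W_f$ and $L_f$ of Theorem 43.4.11 can be computed exactly for a simple test function. The Python
sketch below is a hedged illustration, not part of the proof. It assumes a non-decreasing function on $[0,1]$
with a unit jump at an arbitrarily chosen point C, so that the oscillation on each sub-interval of an increasing
partition is the difference of the end-point values. It checks only the inequality $\sigma L_f(x,\sigma) \le W_f(x)$ from the
proof of part (iv), for uniform partitions. The names f, D, W, L and uniform_partition are hypothetical.

```python
import math

C = 1.0 / math.sqrt(2.0)          # jump location, an arbitrary choice

def f(t):
    """Non-decreasing on [0, 1] with a unit jump at C."""
    return t + (1.0 if t >= C else 0.0)

def D(x, i):
    """Oscillation of f on [x[i-1], x[i]]; valid because f is non-decreasing and x is increasing."""
    return f(x[i]) - f(x[i - 1])

def W(x):
    """Sum over i of |x[i] - x[i-1]| * D(x, i)."""
    return sum(abs(x[i] - x[i - 1]) * D(x, i) for i in range(1, len(x)))

def L(x, sigma):
    """Total length of the sub-intervals on which the oscillation exceeds sigma."""
    return sum(abs(x[i] - x[i - 1]) for i in range(1, len(x)) if D(x, i) > sigma)

def uniform_partition(k):
    """Uniform partition of [0, 1] into k sub-intervals."""
    return [i / k for i in range(k + 1)]

sigma = 0.5
for k in (10, 100, 1000):
    x = uniform_partition(k)
    # Columns: k, W_f(x), L_f(x, sigma), and whether sigma * L_f <= W_f holds.
    print(k, W(x), L(x, sigma), sigma * L(x, sigma) <= W(x))
```

For this test function, both $W_f(x)$ and $L_f(x,\sigma)$ shrink in proportion to the mesh, because only the one
sub-interval containing the jump has oscillation greater than $\sigma$.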


43.4.12 REMARK: Implicit concept of the measure of a set of real numbers. 

Part (v) of Theorem 43.4.11 means that for any given positive $\sigma$, the total length of the intervals of a
partition $x$ in which the “oscillation” of the function is greater than $\sigma$ can be made arbitrarily small by
making the maximum length $\delta$ of the component intervals of $x$ small enough. This implies that the function's
set of jump-discontinuities which are greater than any given $\sigma > 0$ can be covered by a finite set of intervals
whose total length can be made arbitrarily small by making them smaller and more numerous.


43.4.13 REMARK: Riemann’s sufficient condition for convergence of the integral. 

In his posthumously published doctoral thesis, Riemann [195], page 104, noted that his necessary conditions 
for integrability, equivalent to Theorem 43.4.11 (i, v), were also sufficient. However, he gave only a rough 
indication of how this might be proved. Common refinements of partitions, which were used in Cauchy's 
proof of the convergence of his integral as in the proof of Theorem 43.3.20, may be applied in the proof of 
Theorem 43.4.14 also. (See Remark 43.3.21 for an alternative style of proof.) 


When Riemann's integrability condition is written out more fully as in Theorem 43.4.14 line (43.4.3), it 
is not immediately obvious that it has practical value for testing functions for integrability apart from 
the observation that continuous functions and bounded piecewise continuous functions with finitely many 
discontinuities are integrable, which is intuitively evident from the convergence procedure anyway. 


43.4.14 THEOREM: Sufficient condition for well-definition of the Riemann integral.
Let $f : [a,b] \to \mathbb{R}$, for some $a, b \in \mathbb{R}$, be bounded and satisfy condition (v) of Theorem 43.4.11. Then $f$
has a well-defined Riemann integral on $[a,b]$. In other words, if $f$ is bounded and satisfies

\[
\forall \sigma \in \mathbb{R}^+, \quad
\lim_{\delta \to 0^+} \sup \Bigl\{ \sum \{ |x_i - x_{i-1}|;\ i \in \mathbb{N}_k,\ \mathrm{diam}(f([x_{i-1}, x_i])) > \sigma \};\ (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b) \Bigr\} = 0, \tag{43.4.3}
\]

then $f$ has a well-defined Riemann integral on $[a,b]$.


PROOF: Let $a, b \in \mathbb{R}$ and let $f : [a,b] \to \mathbb{R}$ be bounded and satisfy line (43.4.3). If $a = b$, then all of
the Riemann sums equal zero and the Riemann integral of $f$ is well defined and equal to zero. So assume
that $a \ne b$.


Let $\varepsilon \in \mathbb{R}^+$. Let $\sigma = \frac{1}{4}\varepsilon / |b - a|$. Let $\varepsilon_1 = \frac{1}{4}\varepsilon / M$, where $M = \max(1, \sup\{ |f(s) - f(t)|;\ s, t \in [a,b] \})$. Then
by line (43.4.3), there is a $\delta_1 \in \mathbb{R}^+$ such that $L_f(x,\sigma) < \varepsilon_1$ for all $x \in \mathrm{Part}_{\delta_1}(a,b)$. (See Theorem 43.4.11
for notation.)

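Let $x = (x_i)_{i=0}^k,\ x' = (x'_i)_{i=0}^{k'} \in \mathrm{Part}_{\delta_1}(a,b)$. Let $x'' = (x''_j)_{j=0}^{k''} \in \mathrm{Part}(a,b)$ be the coarsest common
refinement of $x$ and $x'$ as in Definition 43.3.7. Then $x'' \in \mathrm{Part}_{\delta_1}(a,b)$. Let $z \in \mathrm{Samp}(x)$ and $z'' \in \mathrm{Samp}(x'')$.


Let $i \in \mathbb{N}_k$. Since $x''$ is a refinement of $x$, $x_{i-1} = x''_{j_0}$ and $x_i = x''_{j_1}$ for some $j_0, j_1 \in \mathrm{Dom}(x'')$ with $j_0 < j_1$.
Then $|x_i - x_{i-1}| = \sum_{j=j_0+1}^{j_1} |x''_j - x''_{j-1}|$ and

\[
\begin{aligned}
\Bigl| \, |x_i - x_{i-1}| f(z_i) - \sum_{j=j_0+1}^{j_1} |x''_j - x''_{j-1}| f(z''_j) \Bigr|
&\le |x_i - x_{i-1}| \Bigl( \sup_{t \in [x_{i-1}, x_i]} f(t) - \inf_{t \in [x_{i-1}, x_i]} f(t) \Bigr) \\
&= |x_i - x_{i-1}| \, D_f(x,i).
\end{aligned}
\]

By applying this inequality to each of the terms in the expression $S_R(f,x,z) - S_R(f,x'',z'')$, the result is

\[
\begin{aligned}
|S_R(f,x,z) - S_R(f,x'',z'')|
&= \Bigl| \sum_{i=1}^{k} |x_i - x_{i-1}| f(z_i) - \sum_{j=1}^{k''} |x''_j - x''_{j-1}| f(z''_j) \Bigr| \\
&\le \sum_{i=1}^{k} |x_i - x_{i-1}| \, D_f(x,i) \\
&= \sum_{i \in N^-} |x_i - x_{i-1}| \, D_f(x,i) + \sum_{i \in N^+} |x_i - x_{i-1}| \, D_f(x,i) \\
&\le |b - a| \sigma + L_f(x,\sigma) M \\
&< \tfrac{1}{4}\varepsilon + \tfrac{1}{4}\varepsilon = \tfrac{1}{2}\varepsilon,
\end{aligned}
\]

where $N^- = \{ i \in \mathbb{N}_k;\ D_f(x,i) \le \sigma \}$ and $N^+ = \{ i \in \mathbb{N}_k;\ D_f(x,i) > \sigma \}$. Similarly, let $z' \in \mathrm{Samp}(x')$. Then
$|S_R(f,x',z') - S_R(f,x'',z'')| < \frac{1}{2}\varepsilon$. Therefore $|S_R(f,x,z) - S_R(f,x',z')| < \varepsilon$ for all $x, x' \in \mathrm{Part}_{\delta_1}(a,b)$ and
$z \in \mathrm{Samp}(x)$, $z' \in \mathrm{Samp}(x')$. Thus $f$ satisfies line (43.4.1) in Definition 43.4.6. Hence the Riemann integral
of $f$ is well defined.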


43.4.15 REMARK: Necessary and sufficient conditions for existence of the Riemann integral. 

Theorem 43.4.16 combines Theorems 43.4.11 and 43.4.14 to give a necessary and sufficient condition for 
Riemann integrability of a real-valued function on a bounded closed interval. However, the property in 
Theorem 43.4.16 line (43.4.4) is not very easy to understand or test because it is not neatly separated into
a set-measure concept and a function discontinuity-set concept. It is a combined condition involving both 
the oscillation of the function and the sums of lengths of intervals in a partition. Later authors made this 
separation explicit by defining distinct concepts of the “content” or measure of a set of real numbers and 
the set of discontinuities of a function. Then it became possible to determine integrability by measuring a 
function’s discontinuity set with a separately developed measure theory. 


43.4.16 THEOREM: Riemann integrability is equivalent to convergence of oscillation to zero.
Let $f : [a,b] \to \mathbb{R}$, for some $a, b \in \mathbb{R}$. Then $f$ is Riemann integrable on $[a,b]$ if and only if $f$ is bounded
and satisfies

\[
\forall \sigma \in \mathbb{R}^+, \quad
\lim_{\delta \to 0^+} \sup \Bigl\{ \sum \{ |x_i - x_{i-1}|;\ i \in \mathbb{N}_k,\ \mathrm{diam}(f([x_{i-1}, x_i])) > \sigma \};\ (x_i)_{i=0}^k \in \mathrm{Part}_\delta(a,b) \Bigr\} = 0. \tag{43.4.4}
\]


PROOF: The assertion follows from Theorems 43.4.11 (i, v) and 43.4.14.


43.5. Cauchy-Riemann-Darboux integral


43.5.1 REMARK: Some milestones in the history of the Cauchy-Riemann-Darboux integral. 
(1) 1821. Cauchy defined continuous functions. (See Cauchy [206], pages 34-35.)
(2) 1823. Cauchy's integral. (See Cauchy [207], pages 81-84.)
(3) 1854. Riemann's integral, with necessary/sufficient existence conditions in terms of the oscillation of
the integrand. (See Riemann [195], pages 101-104.)
(4) 1868. Riemann’s 1854 results published posthumously.
(5) 1872. Heine’s proof that continuous functions are uniformly continuous on closed bounded intervals.
(6) 1875. Darboux’s integral and existence proof for continuous functions. (See Darboux [176], pages 57-77.)
(7) 1892. Jordan’s necessary/sufficient integral existence condition in terms of the “content” of the set of
points of discontinuity. (See Jordan [182].)


43.5.2 REMARK: The Darboux integral and well-definition proof. 

The integral defined in 1875 by Darboux [176], pages 57-77, was equivalent to Riemann's integral, which in 
turn was essentially identical to Cauchy's integral. All three integrals were well defined for the same functions 
and gave the same value for the integral, and the equivalences between them were clear. Darboux's definition 
is perhaps conceptually tidier than the previous definitions, and is in fact presented as “the Riemann integral” 
in many textbooks. Neither Riemann nor Darboux was in any doubt that they were merely reformulating 
the precursor integrals for greater convenience. The “Cauchy-Riemann-Darboux integral” is a single integral
with three slightly different formulations. 


Darboux's 1875 proof that his slightly reformulated integral was well defined for continuous functions, which 
also proved well-definition for the other two integrals, used the recently proved theorem that continuous 
functions on bounded closed real intervals are uniformly continuous. Darboux [176], pages 73-74, credited
the uniform continuity theorem to an 1873 book by Thomae [231], pages 7-9, who in turn credited an 1872 
paper by Heine [181], page 188, for the first proof. Darboux then gave his own different proof of uniform 
continuity, using successive subdivisions of intervals instead of a sequence of bounds from left to right across 
the interval. (See Theorem 34.9.12 for such a Heine-style left-to-right proof of the compactness of bounded 
closed intervals. See Section 38.3 for uniform continuity.) 


For presentations of the Darboux integral, see for example Friedman [74], pages 108-112; Graves [85], 
pages 85-93; Schramm [133], pages 282-293; Shilov/Gurevich [136], pages 8-11; Bass [53], pages 69-71; 
Rudin [129], pages 104-105; Mattuck [114], pages 251-260. (Some of these authors give the name “Riemann
integral” for the Darboux integral construction.)


43.5.3 REMARK: The simplicity of the Darboux integral. 

If a function is viewed from the geometric perspective as a graph in two-dimensional Euclidean space, the 
Darboux integral is a simple method for computing upper and lower piecewise rectangular approximations 
to the area under the graph. This interpretation is not so obvious in the Cauchy and Riemann formulations 
of the same integral. 


It is difficult to see any real difference between the Darboux integral and the application of the method of 
exhaustion, attributed to Eudoxus, to quadrature and cubature by Euclid in about 300BC and Archimedes in 
about 250BC. (For Euclid's application of the exhaustion method to quadrature and cubature, see for example 
Euclid/Heath [215], pages 14-15, pages 369-391, 400-417; Euclid [216], pages 237-238, 411-421, 427-436;
Heath [244], pages 413-415; Boyer/Merzbach [237], pages 81-83; Boyer [235], page 34. For Archimedes'
application of the method in his “Quadrature of the parabola”, see for example Archimedes/Heath [200],
pages 233-252; Boyer [235], pages 51-56; Boyer/Merzbach [237], pages 115-116. The similarity of the Cauchy 
integral to the method of exhaustion is mentioned by Thomson/Bruckner/Bruckner [149], pages 331-333.) 


Although “quadrature” and “cubature” were geometric rather than arithmetic in ancient Greek mathematics,
and rational numbers were used instead of real numbers, the technique is essentially identical. In fact, the
Darboux integral is restricted to rectangles, whereas the method of exhaustion used more general geometric
shapes, including triangles and tetrahedra, for bounding areas above and below. So it could be argued that
the ancient Greek “Eudoxus-Euclid-Archimedes integral” was more general than the Darboux integral. (The
definition of measures in terms of covers by non-rectangular simple sets reappeared in 20th century geometric
measure theory. See for example Federer [69], pages 169-174.)


43.5.4 REMARK: Upper and lower Darboux sums of bounded functions.

As in the case of the Cauchy sums in Definition 43.3.14 and the Riemann sums in Definition 43.4.5, the upper
and lower Darboux sums in Definition 43.5.6 are meaningful for general functions. The integrand does not
need to be restricted to bounded or continuous functions. Lower Darboux sums may contain terms which are
negative infinite, but not positive infinite. Negative infinite terms will occur if and only if the integrand is
unbounded below. Then the sum has the well-defined value $-\infty$. Similarly, the upper Darboux sum will be
positive infinite if and only if the integrand is unbounded above. Since this is true for all interval partitions,
the integral will not be well defined for unbounded functions. This agrees with Theorem 43.4.11 (i).


A notable difference between the Darboux integral in Definition 43.5.10 and the Cauchy and Riemann 
integrals in Definitions 43.3.15 and 43.4.6 respectively is that there are no epsilons or deltas in the Darboux 
integral. The epsilons are implicit in the infimums and supremums, but the deltas are unnecessary. 


43.5.5 REMARK: The Darboux integral definition for reversed domain intervals. 

An inconvenient difference between the Darboux integral and the Cauchy and Riemann integrals is that the 
use of the supremum and infimum as upper and lower bounds for function values actually gives the wrong 
value when the sense of the domain interval for the integral is reversed. As mentioned in Remark 43.3.2, 
the integrals of real-valued functions on real intervals are effectively integrals along curves, not on sets. 
Thus the line elements $x_i - x_{i-1}$ are vectors, not lengths! Therefore to obtain the correct value of the
upper and lower bounds for the area elements, one must apply the supremum and infimum to each product
$(x_i - x_{i-1}) f([x_{i-1}, x_i])$, not directly to the image set $f([x_{i-1}, x_i])$. (Note that $(x_i - x_{i-1}) f([x_{i-1}, x_i])$
must then be defined to be the product of a number by a set, yielding $\{ (x_i - x_{i-1}) f(t);\ t \in [x_{i-1}, x_i] \}$.)
In his 1875 presentation, Darboux defined the upper and lower sums, and the oscillation, according to the
formulas on lines (43.5.1), (43.5.2) and (43.5.3). Therefore when $b < a$, it is found that the “upper sum” is
less than or equal to the “lower sum”, and the oscillation is non-positive. (See Darboux [176], page 71.) He
did not present the required adjustments to the theory of the integral for the case $b < a$, and in fact later
authors have preferred to merely define $\int_a^b$ to mean $-\int_b^a$ when $b < a$. This is analogous to how negative
numbers are typically constructed, namely by effectively prefixing a positive value with a negative sign. This
avoids technicalities which are tedious and worthless.


When the Darboux integral approach is applied to Stieltjes-style integrals along curves as in Sections 43.11 
and 43.12, the supremum and infimum are replaced by topological convergence. Thus the supremum and 
infimum in the Darboux integral construction are merely an old-fashioned way of computing the diameters of 
the image sets $f([x_{i-1}, x_i])$. A more modern way to compute the Darboux oscillation in line (43.5.3) would
be as $\sum_{i=1}^k |x_i - x_{i-1}| \, \mathrm{diam}(f([x_{i-1}, x_i]))$. Hence the presentation given here, based on Definitions 43.5.6,
43.5.10, 43.5.11 and 43.5.12, has some historical and nostalgic value, but is not the best way to formulate 
the theory in a modern idiom. 


43.5.6 DEFINITION: Darboux sums and Darboux oscillation, to be ignored for reverse domain intervals.
The upper Darboux sum of a function $f : [[a,b]] \to \mathbb{R}$ for an interval partition $x = (x_i)_{i=0}^k$ between
$a,b \in \mathbb{R}$ is the extended real number $S_D^+(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) \sup f([[x_{i-1},x_i]])$. In other words,

\[
  \forall a,b \in \mathbb{R},\ \forall f : [[a,b]] \to \mathbb{R},\ \forall x \in \mathrm{Part}(a,b),\quad
  S_D^+(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) \sup f([[x_{i-1},x_i]]). \tag{43.5.1}
\]

The lower Darboux sum of a function $f : [[a,b]] \to \mathbb{R}$ for an interval partition $x = (x_i)_{i=0}^k$ between
$a,b \in \mathbb{R}$ is the extended real number $S_D^-(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) \inf f([[x_{i-1},x_i]])$. In other words,

\[
  \forall a,b \in \mathbb{R},\ \forall f : [[a,b]] \to \mathbb{R},\ \forall x \in \mathrm{Part}(a,b),\quad
  S_D^-(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) \inf f([[x_{i-1},x_i]]). \tag{43.5.2}
\]

The Darboux oscillation of a function $f : [[a,b]] \to \mathbb{R}$ for an interval partition $x = (x_i)_{i=0}^k$ between
$a,b \in \mathbb{R}$ is the difference $\Delta_D(f,x) = S_D^+(f,x) - S_D^-(f,x)$ between the upper and lower Darboux sums of $f$.
In other words,

\begin{align*}
  \forall a,b \in \mathbb{R},\ \forall f : [[a,b]] \to \mathbb{R},\ \forall x \in \mathrm{Part}(a,b),\quad
  \Delta_D(f,x) &= S_D^+(f,x) - S_D^-(f,x) \\
  &= \sum_{i=1}^k (x_i - x_{i-1}) \big( \sup f([[x_{i-1},x_i]]) - \inf f([[x_{i-1},x_i]]) \big). \tag{43.5.3}
\end{align*}
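
As a minimal sketch of Definition 43.5.6, the following Python code evaluates the two Darboux sums and the
Darboux oscillation for a given partition. The helper names `sup_on` and `inf_on` are assumptions introduced
only for illustration: they must be supplied by the caller and must return the exact supremum and infimum of the
function on an undirected interval. The example uses $f(t) = t^2$ on $[0,1]$, where these are end-point values.

```python
from fractions import Fraction

def darboux_sums(sup_on, inf_on, x):
    """Return (upper sum, lower sum, oscillation) as in lines (43.5.1)-(43.5.3).

    sup_on(lo, hi) and inf_on(lo, hi) are hypothetical helpers which must return
    the supremum and infimum of f on the undirected interval [lo, hi], lo <= hi.
    """
    upper = lower = Fraction(0)
    for x_prev, x_next in zip(x, x[1:]):
        lo, hi = min(x_prev, x_next), max(x_prev, x_next)   # [[x_{i-1}, x_i]]
        upper += (x_next - x_prev) * sup_on(lo, hi)
        lower += (x_next - x_prev) * inf_on(lo, hi)
    return upper, lower, upper - lower

# f(t) = t^2 is non-decreasing on [0, 1], so sup and inf are end-point values.
sup_sq = lambda lo, hi: hi * hi
inf_sq = lambda lo, hi: lo * lo

forward = [Fraction(i, 4) for i in range(5)]      # partition from a = 0 to b = 1
backward = forward[::-1]                          # end-points reversed, b < a

print(darboux_sums(sup_sq, inf_sq, forward))
print(darboux_sums(sup_sq, inf_sq, backward))
```

For the forward partition the three values are 15/32, 7/32 and 1/4. For the reversed partition they are -15/32,
-7/32 and -1/4, so the “upper sum” is then less than the “lower sum” and the oscillation is negative, which is the
behaviour that the definition's title says is to be ignored.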


43.5.7 THEOREM: Monotonicity of Darboux sums and Darboux oscillation with respect to refinements.
Let $a,b \in \mathbb{R}$ with $a \le b$. Let $f : [a,b] \to \mathbb{R}$. Let $x \in \mathrm{Part}(a,b)$ be a refinement of $y \in \mathrm{Part}(a,b)$.

(i) $-\infty < S_D^+(f,x) \le S_D^+(f,y)$.
(ii) $S_D^-(f,y) \le S_D^-(f,x) < \infty$.
(iii) $S_D^-(f,y) \le S_D^-(f,x) \le S_D^+(f,x) \le S_D^+(f,y)$.
(iv) $0 \le \Delta_D(f,x) \le \Delta_D(f,y)$.


PROOF: For part (i), let $x = (x_i)_{i=0}^k$ and $y = (y_j)_{j=0}^\ell$. If $a = b$ then $S_D^+(f,x) = S_D^+(f,y) = 0$ because
$\sup f([[a,b]]) = \sup\{f(a)\} = f(a)$ and $x_i - x_{i-1} = 0$ for all $i \in \mathbb{N}_k$, and $y_j - y_{j-1} = 0$ for all $j \in \mathbb{N}_\ell$.

Now suppose that $a < b$. Then $\mathrm{Range}(x) \supseteq \mathrm{Range}(y)$ by Definition 43.3.5. Let $j \in \mathbb{N}_\ell$. Then $x_{i_0} = y_{j-1}$
and $x_{i_1} = y_j$ for some $i_0, i_1 \in \{0\} \cup \mathbb{N}_k$ with $i_0 < i_1$, and then

\begin{align*}
  \sum_{i=i_0+1}^{i_1} (x_i - x_{i-1}) \sup f([[x_{i-1},x_i]])
    &\le \sum_{i=i_0+1}^{i_1} (x_i - x_{i-1}) \sup f([[y_{j-1},y_j]]) \tag{43.5.4} \\
    &= (y_j - y_{j-1}) \sup f([[y_{j-1},y_j]]),
\end{align*}

where line (43.5.4) follows from Theorem 11.2.42 (ii). So $S_D^+(f,x) \le S_D^+(f,y)$ by summing over $j \in \mathbb{N}_\ell$. Since
$-\infty < \sup f([[x_{i-1},x_i]])$ for all $i \in \mathbb{N}_k$, it follows that $-\infty < S_D^+(f,x)$.

Part (ii) may be proved as for part (i).

Part (iii) follows from parts (i) and (ii), and the observation that $\inf f([[x_{i-1},x_i]]) \le \sup f([[x_{i-1},x_i]])$ for
all $i$ because $[[x_{i-1},x_i]] \ne \emptyset$.

Part (iv) follows from part (iii) and Definition 43.5.6.
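
The chain of inequalities in Theorem 43.5.7 (iii) can be checked on a small example. The following Python
sketch assumes the specific function $f(t) = t^2$ on $[0,1]$, which is non-decreasing there, so that the supremum and
infimum on each sub-interval are end-point values. The function name `upper_lower` is hypothetical.

```python
from fractions import Fraction

def upper_lower(x):
    """Upper and lower Darboux sums of f(t) = t^2 on a forward partition x of [0, 1]."""
    # f is non-decreasing on [0, 1], so sup and inf on [lo, hi] are hi^2 and lo^2.
    upper = sum((hi - lo) * hi * hi for lo, hi in zip(x, x[1:]))
    lower = sum((hi - lo) * lo * lo for lo, hi in zip(x, x[1:]))
    return upper, lower

y = [Fraction(0), Fraction(1, 2), Fraction(1)]                 # coarse partition
x = sorted(set(y) | {Fraction(1, 4), Fraction(3, 4)})          # a refinement of y

(upper_y, lower_y), (upper_x, lower_x) = upper_lower(y), upper_lower(x)
# Theorem 43.5.7 (iii): lower_y <= lower_x <= upper_x <= upper_y.
print(lower_y, lower_x, upper_x, upper_y)
```

Here the four values are 1/8, 7/32, 15/32 and 5/8, so the refinement raises the lower sum, lowers the upper sum,
and reduces the oscillation from 1/2 to 1/4.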


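43.5.8 THEOREM: Oscillation on sub-intervals is bounded by oscillation on the whole interval.
Let $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$. Let $f : [\alpha,\beta] \to \mathbb{R}$. Let $a,b \in [\alpha,\beta]$ with $a \le b$. Then

\[
  \forall x \in \mathrm{Part}(\alpha,\beta),\ \forall y \in \mathrm{Part}(a,b),\quad
  \mathrm{Range}(y) \subseteq \mathrm{Range}(x) \Rightarrow \Delta_D(f|_{[a,b]}, y) \le \Delta_D(f,x).
\]


PROOF: Let $x \in \mathrm{Part}(\alpha,\beta)$ and $y \in \mathrm{Part}(a,b)$ with $\mathrm{Range}(y) \subseteq \mathrm{Range}(x)$. If $a = b$, then
$\Delta_D(f|_{[a,b]}, y) = 0$ by Definition 43.5.6. Therefore $\Delta_D(f|_{[a,b]}, y) \le \Delta_D(f,x)$. So assume that $a < b$.

Since $a,b \in \mathrm{Range}(y)$, $\mathrm{Range}(y) \subseteq \mathrm{Range}(x)$ implies that $a,b \in \mathrm{Range}(x)$. Let $x = (x_i)_{i=0}^k$ and $y = (y_j)_{j=0}^\ell$.
Then $\ell \le k$, and $y_0 = x_{i_0}$ and $y_\ell = x_{i_1}$ for some $i_0, i_1 \in \mathrm{Dom}(x) = \{0\} \cup \mathbb{N}_k$ with $i_0 < i_1$. By the injectivity
of $x$ and $y$, this implies that $i_1 = i_0 + \ell$ and $x_{i_0+j} = y_j$ for all $j \in \mathrm{Dom}(y)$. So by Definition 43.5.6,

\begin{align*}
  \Delta_D(f,x)
    &= \sum_{i=1}^{k} (x_i - x_{i-1}) \big( \sup f([[x_{i-1},x_i]]) - \inf f([[x_{i-1},x_i]]) \big) \\
    &\ge \sum_{i=i_0+1}^{i_1} (x_i - x_{i-1}) \big( \sup f([[x_{i-1},x_i]]) - \inf f([[x_{i-1},x_i]]) \big) \\
    &= \sum_{j=1}^{\ell} (x_{i_0+j} - x_{i_0+j-1}) \big( \sup f([[x_{i_0+j-1},x_{i_0+j}]]) - \inf f([[x_{i_0+j-1},x_{i_0+j}]]) \big) \\
    &= \sum_{j=1}^{\ell} (y_j - y_{j-1}) \big( \sup f([[y_{j-1},y_j]]) - \inf f([[y_{j-1},y_j]]) \big) \\
    &= \Delta_D(f|_{[a,b]}, y).
\end{align*}

This verifies the assertion.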




43.5.9 REMARK: Restriction of the Darboux integral to “forward domain intervals”.
Definition 43.5.10 is restricted to “forward domain intervals” $[a,b]$ with $a \le b$ because otherwise the formulas
would have “unintended consequences”. In other words, their values would be worthless gobbledygook. The
assertions of Theorem 43.5.7 are not valid for $b < a$. However, the integral in Definition 43.5.12 is required
to be defined and valid for $b < a$. Therefore the customary ad-hoc negation of the “forward domain interval”
definition is invoked.


In Definition 43.5.11, Darboux integrability is defined only for a function on an “undirected interval” $[a,b]$.
In other words, the domain is just a set, not a path. But the Darboux integral is defined for specified
“end-points” $a,b \in \mathbb{R}$. Then the sign of the difference $b - a$ determines whether the common value of the upper
and lower Darboux integrals on the (undirected) interval $[[a,b]] = [\min(a,b), \max(a,b)]$ must be negated.
(See Notation 16.1.15 for $[[a,b]]$.)


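43.5.10 DEFINITION: Darboux integral for a real function on a bounded closed interval.
The upper Darboux integral of a bounded real-valued function $f : [a,b] \to \mathbb{R}$ for $a,b \in \mathbb{R}$ with $a \le b$ is

\begin{align*}
  I_D^+(f) &= \inf \Big\{ \sum_{j=1}^k (x_j - x_{j-1}) \sup f([[x_{j-1},x_j]]);\ (x_j)_{j=0}^k \in \mathrm{Part}(a,b) \Big\} \tag{43.5.5} \\
           &= \inf \{ S_D^+(f,x);\ x \in \mathrm{Part}(a,b) \}.
\end{align*}

The lower Darboux integral of a bounded real-valued function $f : [a,b] \to \mathbb{R}$ for $a,b \in \mathbb{R}$ with $a \le b$ is

\begin{align*}
  I_D^-(f) &= \sup \Big\{ \sum_{j=1}^k (x_j - x_{j-1}) \inf f([[x_{j-1},x_j]]);\ (x_j)_{j=0}^k \in \mathrm{Part}(a,b) \Big\} \tag{43.5.6} \\
           &= \sup \{ S_D^-(f,x);\ x \in \mathrm{Part}(a,b) \}.
\end{align*}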


43.5.11 DEFINITION: Darboux integrability on a real interval with non-decreasing end-point order.
A Darboux integrable (real-valued) function on a real interval $[a,b]$ for $a,b \in \mathbb{R}$ with $a \le b$ is a bounded
real-valued function on $[a,b]$ for which the upper and lower Darboux integrals $I_D^+(f)$ and $I_D^-(f)$ are equal.


43.5.12 DEFINITION: Darboux integral with a specified ordered pair of end-points.
The Darboux integral of a Darboux integrable real-valued function $f$ on a real interval $[[a,b]]$ with end-points
$a,b \in \mathbb{R}$ is $\mathrm{sign}(b - a)$ multiplied by the common value of the upper and lower integrals of $f$ on $[[a,b]]$.
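
The sign convention of Definition 43.5.12 can be sketched in Python as follows. This is only an illustration under
stated assumptions: the integrand is assumed to be non-decreasing on the undirected interval, so that the infimum
and supremum on each sub-interval are the left and right end-point values, and the infimum and supremum over all
partitions in Definition 43.5.10 are replaced by a single uniform partition, which gives a bracket rather than the
integral itself. The function names are hypothetical.

```python
from fractions import Fraction

def sign(r):
    return (r > 0) - (r < 0)

def signed_darboux_bracket(f, a, b, n):
    """Bracket the Darboux integral of a non-decreasing f with end-points a, b.

    The sums are computed on the undirected interval [min(a,b), max(a,b)] with a
    uniform partition of n sub-intervals, and then multiplied by sign(b - a),
    following Definition 43.5.12.
    """
    lo, hi = min(a, b), max(a, b)
    pts = [lo + (hi - lo) * Fraction(i, n) for i in range(n + 1)]
    lower = sum((q - p) * f(p) for p, q in zip(pts, pts[1:]))   # inf at left end
    upper = sum((q - p) * f(q) for p, q in zip(pts, pts[1:]))   # sup at right end
    s = sign(b - a)
    return s * lower, s * upper

square = lambda t: t * t
print(signed_darboux_bracket(square, Fraction(0), Fraction(1), 100))
print(signed_darboux_bracket(square, Fraction(1), Fraction(0), 100))
```

The second call returns the negatives of the values returned by the first call, because the sums are always
computed in the forward direction and the reversal of end-points enters only through the factor sign(b - a).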


43.5.13 REMARK: Darboux integral notation. 

In general, the various styles of integrals do not have distinct notations. The generic Notation 43.2.6 is 
used for a wide variety of integral operators. The operator which is meant can usually be inferred from the 
surrounding context, but occasionally it is convenient to give an explicit indication as in Notation 43.5.14. 
(See Friedman [74], page 122, for the similar notation “(D) $\int_a^b f(x)\,dx$”.)


Possibly a more rational systematic notation could be $\int_{[a,b]}^{D} f(x)\,dx$, which is presented in Notation 43.5.14 for
temporary convenience. This would be consistent with notations such as $\int_S f(x)\,dx$ which are very widely
used for integrals over regions, but it would be inconsistent with the traditional Fourier-style notation
$\int_a^b f(x)\,dx$ mentioned in Remark 43.2.4. As observed in Remark 43.2.5, the Fourier-style notation signifies
a function of two variables, $a$ and $b$, not a function of a single set $[[a,b]]$. This malady could be remedied


with a notation like i f(x) dz, but this is too clumsy to meet with popular acceptance. Ultimately the 
problem here is that the integral symbol has only two “sockets” for inserting parameters, but there are three 
parameters: a, b and the integration procedure D. 


43.5.14 NOTATION: The Darboux integral for real-valued functions on a bounded closed real interval.
$\int_{[a,b]}^{D} f(x)\,dx$, for a Darboux integrable function $f : [a,b] \to \mathbb{R}$ with $a,b \in \mathbb{R}$ such that $a \le b$, denotes the
Darboux integral of $f$ on $[a,b]$ with end-points $a$ and $b$. In other words, $\int_{[a,b]}^{D} f(x)\,dx = I_D^+(f) = I_D^-(f)$.

$[D]\int_a^b f(x)\,dx$, for a Darboux integrable real-valued function $f : [[a,b]] \to \mathbb{R}$ with $a,b \in \mathbb{R}$, denotes the
Darboux integral of $f$ on $[[a,b]]$ with end-points $a$ and $b$. In other words,

\[
  \forall a,b \in \mathbb{R},\quad [D]\!\int_a^b f(x)\,dx = \mathrm{sign}(b - a) \int_{[[a,b]]}^{D} f(x)\,dx.
\]

$\int_a^b f(x)\,dx$ is an abbreviation for $[D]\int_a^b f(x)\,dx$ when the meaning is clear from context.


43.5.15 REMARK: Darboux integral definition and properties.

Darboux defined sums and limits of sums for $a < b$ and then noted that the procedures are equally applicable
for $a > b$, and that the signs of the sums and limits of sums are reversed. (See Darboux [176], page 71.) This
is in contrast to the presentations of Cauchy [207], pages 81-83, and Riemann [195], pages 101-104, where $a$
and $b$ were always assumed to be arbitrary real numbers. (See Definitions 43.3.15 and 43.4.6.)

The upper and lower Darboux integrals in Definition 43.5.10 are well defined real numbers for bounded
integrands because the expressions $x_j - x_{j-1}$, $\inf f([[x_{j-1},x_j]])$ and $\sup f([[x_{j-1},x_j]])$ are finite real numbers.
The absolute value of a finite sum of such products is bounded by $(b - a) \sup |f|$. Therefore the inf and sup
of these sums are similarly bounded.


The publication in 1875 by Darboux [176] of his formulation of the Cauchy-Riemann integral included the
following aspects.

(1) Theorem that every bounded real-valued function on a closed bounded real interval has a well-defined
    real-valued infimum and supremum.

(2) Theorem that every continuous real-valued function on a closed bounded real interval attains its infimum
    and supremum, which are finite, and so (by the intermediate value theorem) attains all values in between.

(3) Definition of sums $\sum_{i=1}^k (x_i - x_{i-1}) \inf f([[x_{i-1},x_i]])$ and $\sum_{i=1}^k (x_i - x_{i-1}) \sup f([[x_{i-1},x_i]])$ of infimums
    and supremums of a real-valued function $f$ on sub-intervals $[[x_{i-1},x_i]]$ of a partition $(x_i)_{i=0}^k \in \mathrm{Part}(a,b)$
    of a closed bounded interval $[[a,b]]$, and the sum $\sum_{i=1}^k (x_i - x_{i-1}) (\sup f([[x_{i-1},x_i]]) - \inf f([[x_{i-1},x_i]]))$
    of oscillations on the sub-intervals of such a partition.

(4) For any real-valued function on a closed bounded real interval, whether continuous or not, the sums of
    infimums, supremums and oscillations in part (3) each converge to a well-defined limit as the mesh of
    the partitions converges to zero. (See Darboux [176], pages 65-70.) In other words,

    \begin{align*}
      \lim_{\delta \to 0^+} \{ S_D^+(f,x);\ x \in \mathrm{Part}_\delta(a,b) \} &= I_D^+(f), \\
      \lim_{\delta \to 0^+} \{ S_D^-(f,x);\ x \in \mathrm{Part}_\delta(a,b) \} &= I_D^-(f), \\
      \lim_{\delta \to 0^+} \{ S_D^+(f,x) - S_D^-(f,x);\ x \in \mathrm{Part}_\delta(a,b) \} &= I_D^+(f) - I_D^-(f).
    \end{align*}

(5) Restatement of Riemann's necessary and sufficient criterion for well-definition of the integral in terms
    of the total length of sub-intervals with oscillation greater than any given positive value.

(6) Formulation of the Riemann integral as the common limit of the upper and lower Darboux sums if these
    limits are the same. (See Darboux [176], pages 72-73.)

(7) New proof of the uniform continuity of a continuous real-valued function on a closed bounded real
    interval, expressed in the terminology of oscillations on sub-intervals.

(8) New theorem that continuous functions are Riemann integrable because they are uniformly continuous.

(9) Assertion of various basic properties of the Riemann-Darboux integral, such as the immunity of the
    integral to changes of value at a finite number of points of the integrand, the continuity of the integral,
    the equality of the derivative of the integral to the original function at points where the original function
    is continuous (i.e. FTOC 1), and the equality of the integral to the difference between values of an
    antiderivative at the end-points of the integrand's domain (i.e. FTOC 2).


As in Riemann's treatise, Darboux did not decompose the integrability condition into a set-measure concept 
and a set-of-discontinuity-points concept. This was remedied in 1892 by Jordan [182]. (See Remark 43.6.1.) 


43.5.16 REMARK: The approximation procedures for the Darboux integral and curve length are similar. 
The Darboux integral is similar to Definition 38.8.2 for the length of a curve, which is the supremum of 
approximations to the length over all ordered sequences of curve parameter values. The Darboux integral is 
computed as both a supremum and infimum of approximations to the area under a curve over all ordered 
sequences of domain-interval partition points. Although the supremum and infimum of area approximations 
are always well defined, the integral is only well defined if they are equal, whereas the length of a curve is 
always well defined as the supremum of length approximations. The difficulty of making the upper and lower 
area approximations equal is one of the principal motivations for defining the technically more demanding 
integration procedures.

Instead of presenting the 19th century Cauchy-Riemann-Darboux integral, many authors merely comment
on its limitations before presenting the 20th century Lebesgue integral. Some authors present the
Riemann-Stieltjes integral and make only brief comments on the special case of the Cauchy-Riemann-Darboux integral.


43.6. Darboux integrability, pointwise oscillation and Jordan content 


43.6.1 REMARK: Integrability conditions in terms of zero Jordan content. 

Necessary and sufficient conditions for the well-definition of the Cauchy-Riemann-Darboux integral of a real 
function of a real variable may be expressed in terms of the “Jordan outer content” of the set of points where 
the integrand has oscillation greater than some positive value. 


The content of a set of real numbers is called the “exterior Jordan content” by Graves [85], page 88. It
is simply called the “content” by A.E. Taylor [145], page 191. “Sets of content zero” are also defined by
Schramm [133], pages 340-341, and are called “contented sets” by Edwards [67], pages 214-215. The early
development of the outer content of sets in connection with the well-definition of the Darboux integral is
attributed to Peano, Jordan and Cantor by Bruckner/Bruckner/Thomson [56], page 20.

For the application of Jordan content to Riemann integrability, see Graves [85], pages 88-90; Friedman [74],
pages 123-125; Bruckner/Bruckner/Thomson [56], pages 20-21, 25-27; A.E. Taylor [145], pages 191-196;
Shilov [135], pages 485-488; Edwards [67], pages 204-206, 214-233; Rosenlicht [128], pages 118-123, 224-228;
Jordan [182], pages 81-94.


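43.6.2 DEFINITION: The (Jordan) (outer) content of a set of real numbers $E$ is

\[
  \inf \Big\{ \sum_{i=1}^k \ell_i;\ (a_i)_{i=1}^k \in \mathrm{List}(\mathbb{R}),\ (\ell_i)_{i=1}^k \in \mathrm{List}(\mathbb{R}^+),\
  \text{and}\ E \subseteq \bigcup_{i=1}^k (a_i, a_i + \ell_i) \Big\}.
\]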


43.6.3 REMARK: Integrability conditions in terms of zero Lebesgue measure.
A bounded real-valued function on a bounded closed real interval is Cauchy-Riemann-Darboux integrable if and
only if its set of discontinuities is a set of Lebesgue measure zero. (For sets of real numbers of Lebesgue
measure zero, see also Sections 45.1 and 45.2.)


The Lebesgue outer measure of a set of real numbers is called the “exterior Lebesgue measure” by Graves [85],
pages 88-89. Sets with zero Lebesgue measure are called “sets of measure zero” by Schramm [133],
pages 343-344. The early development of the outer measure of sets in connection with the well-definition of
the Darboux integral is attributed to Borel and Lebesgue by Bruckner/Bruckner/Thomson [56], page 20.


The theorem that a bounded function on a bounded closed interval is Riemann integrable if and only if
it is continuous except on a set of Lebesgue measure zero is shown by Johnsonbaugh/Pfaffenberger [97],
pages 236-237; Riesz/Szőkefalvi-Nagy [125], pages 18-20. This theorem is obtained as an application of
Lebesgue integration theory by Bass [53], pages 70-71; Bruckner/Bruckner/Thomson [56], page 209.


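43.6.4 DEFINITION: The Lebesgue outer measure of a set of real numbers $E$ is

\[
  \inf \Big\{ \sum_{i=1}^\infty \ell_i;\ (a_i)_{i=1}^\infty \in \mathbb{R}^{\mathbb{N}},\ (\ell_i)_{i=1}^\infty \in (\mathbb{R}^+)^{\mathbb{N}},\
  \text{and}\ E \subseteq \bigcup_{i=1}^\infty (a_i, a_i + \ell_i) \Big\}.
\]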
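
The difference between Definitions 43.6.2 and 43.6.4 is that the former permits only finite covers, whereas the
latter permits countably infinite covers. As a rough illustration, the following Python sketch covers the first
$n$ terms of an enumeration of the rationals in $[0,1]$ by open intervals of lengths $\varepsilon/2^i$, so that the total length
stays below $\varepsilon$ no matter how large $n$ is. The particular enumeration and the function names are assumptions made
for this example only, and the truncation to finitely many terms is needed merely to make the computation finite.

```python
from fractions import Fraction

def rationals_in_unit_interval(n):
    """First n terms of one (hypothetical) enumeration of the rationals in [0, 1]."""
    seen, out, q = set(), [], 1
    while len(out) < n:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == n:
                    break
        q += 1
    return out

def shrinking_cover(points, eps):
    """Open intervals (a_i, a_i + l_i) with l_i = eps / 2^i centred on each point."""
    cover = []
    for i, r in enumerate(points, start=1):
        length = eps / 2**i
        cover.append((r - length / 2, length))
    return cover

eps = Fraction(1, 100)
pts = rationals_in_unit_interval(50)
cover = shrinking_cover(pts, eps)
total = sum(length for _, length in cover)
print(total < eps)
```

Continuing this construction through the whole enumeration shows that the set of rationals in $[0,1]$ has Lebesgue
outer measure zero. By contrast, any finite cover of this set by open intervals must cover all of $[0,1]$ except
for finitely many points, so its Jordan outer content is 1.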


43.6.5 DEFINITION: The oscillation of a function $f : X \to \mathbb{R}$ on a non-empty subset $S$ of $X$ is the
non-negative extended real number

\[
  \mathrm{osc}_S(f) = \sup(f|_S) - \inf(f|_S).
\]


43.6.6 DEFINITION: The oscillation of a function $f : S \to \mathbb{R}$ at a point $t \in S$, where $S$ is a subset of $\mathbb{R}$, is
the non-negative extended real number $\omega_f(t) = \lim_{\delta \to 0^+} \mathrm{diam}(f([t - \delta, t + \delta]))$. In other words,

\[
  \forall t \in S,\quad \omega_f(t) = \inf \{ \mathrm{osc}_{S \cap (t - \delta, t + \delta)}(f);\ \delta \in \mathbb{R}^+ \}.
\]
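
The pointwise oscillation in Definition 43.6.6 can be estimated numerically, as in the following Python sketch.
This is only a heuristic under stated assumptions: the supremum and infimum on each window are replaced by the
maximum and minimum over finitely many sample points, which in general gives only a lower bound for the
oscillation on the window, and the infimum over all $\delta$ is replaced by a minimum over a few fixed values. The function
names and default parameters are hypothetical.

```python
def osc_on(f, lo, hi, samples=2001):
    """Sampled estimate of sup f - inf f on [lo, hi] (a lower bound in general)."""
    values = [f(lo + (hi - lo) * i / (samples - 1)) for i in range(samples)]
    return max(values) - min(values)

def pointwise_oscillation(f, t, deltas=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Estimate omega_f(t) as the infimum of the oscillation over shrinking windows."""
    return min(osc_on(f, t - d, t + d) for d in deltas)

step = lambda t: 0.0 if t < 0 else 1.0        # unit jump at t = 0

print(pointwise_oscillation(step, 0.0))
print(pointwise_oscillation(step, 1.0))
```

For this step function the estimate is 1.0 at the jump point and 0.0 at the continuity point, in agreement with
the exact values of the pointwise oscillation there.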


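43.6.7 THEOREM: Global bound for pointwise oscillation bounds oscillation on small enough intervals.
Let $f : K \to \mathbb{R}$ for some compact subset $K$ of $\mathbb{R}$. Suppose that $\varepsilon \in \mathbb{R}^+$ is such that $\forall t \in K,\ \omega_f(t) < \varepsilon$. Then
$\exists \delta \in \mathbb{R}^+,\ \forall t \in K,\ \mathrm{osc}_{[t - \delta, t + \delta]}(f) < \varepsilon$.


PROOF: Let $C = \{ (t' - \delta', t' + \delta');\ t' \in K \text{ and } \delta' \in \mathbb{R}^+ \text{ and } \mathrm{osc}_{[t' - 2\delta', t' + 2\delta']}(f) < \varepsilon \}$. Then $C$ is an open
cover of $K$. So $K$ has a finite subcover $C' \subseteq C$ by Theorem 34.9.13. Let $\delta = \frac{1}{2} \min\{ \mathrm{diam}(Q);\ Q \in C' \}$.
Then $\delta \in \mathbb{R}^+$ and every $t \in K$ is an element of some $(t' - \delta', t' + \delta') \in C'$, which implies that
$[t - \delta, t + \delta] \subseteq [t' - 2\delta', t' + 2\delta']$, which implies that $\mathrm{osc}_{[t - \delta, t + \delta]}(f) < \varepsilon$.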


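43.6.8 THEOREM: Darboux integrable if and only if positive oscillation points have Jordan measure zero.
Let $f : [a,b] \to \mathbb{R}$ for $a,b \in \mathbb{R}$ with $a < b$. Then $f$ is Darboux integrable if and only if

\[
  \forall \varepsilon \in \mathbb{R}^+,\quad \mu_J^+(\{ t \in [a,b];\ \omega_f(t) > \varepsilon \}) = 0, \tag{43.6.1}
\]

where $\mu_J^+(E)$ denotes the Jordan outer content of any set $E \in \mathbb{P}(\mathbb{R})$.


PROOF: Let $f : [a,b] \to \mathbb{R}$ be a Darboux integrable function with $a < b$. Suppose that condition (43.6.1) is
not satisfied. Then there is an $\varepsilon \in \mathbb{R}^+$ such that $\mu_J^+(\{ t \in [a,b];\ \omega_f(t) > \varepsilon \}) = \eta > 0$. By Definition 43.5.10,
there are partitions $x', x'' \in \mathrm{Part}(a,b)$ such that $S_D^+(f,x') - S_D^-(f,x'') < \varepsilon\eta$. Let $x = (x_i)_{i=0}^n$ be the coarsest
common refinement of $x'$ and $x''$. (See Definition 43.3.7.) Then $S_D^+(f,x) \le S_D^+(f,x')$ and $S_D^-(f,x) \ge S_D^-(f,x'')$.
So $S_D^+(f,x) - S_D^-(f,x) < \varepsilon\eta$.

Let $I = \{ i \in \mathbb{N}_n;\ \mathrm{osc}_{[x_{i-1},x_i]}(f) > \varepsilon \}$. Let $X_\varepsilon = \{ t \in [a,b];\ \omega_f(t) > \varepsilon \}$. Then $X_\varepsilon \subseteq \bigcup_{i \in I} [x_{i-1},x_i]$. So
$X_\varepsilon \subseteq \bigcup_{i \in I} (x_{i-1} - \delta, x_i + \delta)$ for all $\delta \in \mathbb{R}^+$. Therefore $\mu_J^+(X_\varepsilon) \le 2n\delta + \sum_{i \in I} (x_i - x_{i-1})$ for all $\delta \in \mathbb{R}^+$.
So $\eta = \mu_J^+(X_\varepsilon) \le \sum_{i \in I} (x_i - x_{i-1})$. Therefore $S_D^+(f,x) - S_D^-(f,x) \ge \varepsilon \sum_{i \in I} (x_i - x_{i-1}) \ge \varepsilon\eta$. This is a
contradiction. Therefore condition (43.6.1) is satisfied whenever $f$ is Darboux integrable.

For the converse, suppose that condition (43.6.1) holds. Let $\varepsilon \in \mathbb{R}^+$. Let $L = \max(1, b - a)$. Let $\varepsilon_0 = \frac{1}{4}\varepsilon/L$.
Let $M = \max(1, \mathrm{osc}_{[a,b]}(f))$. Let $\eta = \frac{1}{4}\varepsilon/M$. Then by (43.6.1) and Definition 43.6.2, there are $(a_i)_{i=1}^k$ and
$(\ell_i)_{i=1}^k$ such that $X_{\varepsilon_0} \subseteq \bigcup_{i=1}^k (a_i, a_i + \ell_i)$ and $\sum_{i=1}^k \ell_i < \eta$. Let $K = [a,b] \setminus \bigcup_{i=1}^k (a_i, a_i + \ell_i)$. Then $K$ is a compact
subset of $\mathbb{R}$. So by Theorem 43.6.7, there is a $\delta_0 \in \mathbb{R}^+$ such that $\mathrm{osc}_{[t - \delta_0, t + \delta_0]}(f) < \varepsilon_0$ for all $t \in K$.

The closed intervals $[a_i, a_i + \ell_i]$ may be assumed to be disjoint by joining overlapping or adjoining intervals,
which does not increase the value of $\sum_{i=1}^k \ell_i$. Then $[a,b]$ may be partitioned by a sequence $x \in \mathrm{Part}(a,b)$ into
intervals which either belong to $\{ [a_i, a_i + \ell_i];\ i \in \mathbb{N}_k \}$ or are subsets of $K$ with length not exceeding $\delta_0$. The
sum of oscillations of the former kinds of intervals does not exceed $\eta\, \mathrm{osc}_{[a,b]}(f) \le \frac{1}{4}\varepsilon$. The sum of oscillations
of the latter kinds of intervals does not exceed $\varepsilon_0 (b - a) \le \frac{1}{4}\varepsilon$. Therefore $S_D^+(f,x) - S_D^-(f,x) \le \frac{1}{2}\varepsilon < \varepsilon$.
Hence $f$ is Darboux integrable.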


43.6.9 REMARK: Continuous functions are Darboux integrable on closed bounded real intervals.
As expected, continuous functions on closed bounded real intervals are Darboux integrable. This follows
easily from Theorem 43.6.8 because the oscillation of such a function equals zero at every point of its domain.


43.6.10 THEOREM: Continuous functions have zero oscillation at every point.
Let $f \in C^0([a,b], \mathbb{R})$ for some $a,b \in \mathbb{R}$ with $a < b$. Then

\[
  \forall t \in [a,b],\quad \omega_f(t) = 0.
\]

Hence $\mu_J^+(\{ t \in [a,b];\ \omega_f(t) > \varepsilon \}) = 0$ for all $\varepsilon \in \mathbb{R}^+$.


PROOF: The assertion follows from Definition 43.6.6, Theorem 38.1.8, and Definition 43.6.2.


43.6.11 THEOREM: Darboux integrability of continuous functions on closed bounded real intervals.
Let $f \in C^0([a,b], \mathbb{R})$ for some $a,b \in \mathbb{R}$ with $a \le b$. Then $f$ is Darboux integrable on $[a,b]$.


PROOF: The assertion follows from Theorems 43.6.8 and 43.6.10.


43.7. Cauchy-Riemann-Darboux integral basic properties 


43.7.1 REMARK: Some basic properties of the Darboux integral. 

The various names of this integral are more or less interchangeable. They give the same output for the same 
spaces of functions. They differ only in some minor aspects of their construction methods. The attributions 
to individual mathematicians seem to be justified more by their theorems than their definitions. Thus the 
Cauchy, Riemann, Darboux, Cauchy-Riemann, Cauchy-Riemann-Darboux and Riemann-Darboux integrals 
are the same except for some construction details. Here, for simplicity, the term “Darboux integral” is used. 




((2018-12-3. Should give a theorem which states, roughly, that if a function is Darboux integrable on a 
bounded interval, then that function is both Cauchy integrable and Riemann integrable, and the values of 
the integrals are all the same. Part of this work is already done in Theorem 43.4.9. )) 


43.7.2 REMARK: Using nets of interval partitions instead of the mesh.
Theorem 43.7.3 gives a formulation of Darboux integrability which is not based on the mesh. The mesh-based
approach is followed for the Cauchy integral in Definition 43.3.16, and for the Riemann integral in
Definitions 43.4.6 and 43.4.7, and Theorems 43.4.11, 43.4.14 and 43.4.16. The mesh $\delta \in \mathbb{R}^+$ of a partition is
closely related to the metric on the real numbers. Then convergence is expressed in the usual $\varepsilon$-$\delta$ manner. A
net-based integrability assertion, on the other hand, states that for all $\varepsilon \in \mathbb{R}^+$, all refinements $x'$ of some
fine enough partition $x$ have “error” less than $\varepsilon$. This net-based approach is adopted in Theorem 43.7.3.
(For convergent nets, see for example Bass [53], pages 202-203.)


43.7.3 THEOREM: Darboux integrability formulated in a convergent-net style.
Let $f : [a,b] \to \mathbb{R}$ for $a,b \in \mathbb{R}$ with $a < b$. Then $f$ is Darboux integrable on $[a,b]$ if and only if

\[
  \forall \varepsilon \in \mathbb{R}^+,\ \exists x \in \mathrm{Part}(a,b),\quad \Delta_D(f,x) < \varepsilon. \tag{43.7.1}
\]


PROOF: Let $f$ be Darboux integrable on $[a,b]$. Then the Darboux integral of $f$ on $[a,b]$ is $I_D^+(f) = I_D^-(f)$.
Let $\varepsilon \in \mathbb{R}^+$. Then by Definition 43.5.10 and Theorem 16.1.18 (v), there exists a partition $x^+ \in \mathrm{Part}(a,b)$ with
$S_D^+(f,x^+) \in [I_D^+(f), I_D^+(f) + \frac{1}{2}\varepsilon)$. Similarly, there exists $x^- \in \mathrm{Part}(a,b)$ with $S_D^-(f,x^-) \in (I_D^-(f) - \frac{1}{2}\varepsilon, I_D^-(f)]$.

Let $x$ be the coarsest common refinement of $x^+$ and $x^-$, as in Definition 43.3.7. Then by Theorem 43.5.7 (iii),
$S_D^-(f,x^-) \le S_D^-(f,x) \le S_D^+(f,x) \le S_D^+(f,x^+)$. Therefore $I_D^-(f) - \frac{1}{2}\varepsilon < S_D^-(f,x) \le S_D^+(f,x) < I_D^+(f) + \frac{1}{2}\varepsilon$.
So $\Delta_D(f,x) = S_D^+(f,x) - S_D^-(f,x) < \varepsilon$ because $I_D^+(f) = I_D^-(f)$. Hence line (43.7.1).

Now assume line (43.7.1). Let $\varepsilon \in \mathbb{R}^+$. Then there exists $x \in \mathrm{Part}(a,b)$ satisfying $\Delta_D(f,x) < \varepsilon$. Therefore
$S_D^+(f,x) - S_D^-(f,x) < \varepsilon$. However, $S_D^-(f,x) \le I_D^-(f) \le I_D^+(f) \le S_D^+(f,x)$ follows from Definition 43.5.10
and Theorem 11.2.17 (i). So $I_D^+(f) - I_D^-(f) \le S_D^+(f,x) - S_D^-(f,x) < \varepsilon$. Since this holds for all $\varepsilon \in \mathbb{R}^+$, it
follows that $I_D^+(f) = I_D^-(f)$. Hence $f$ is Darboux integrable on $[a,b]$ by Definition 43.5.11.
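
Condition (43.7.1) only requires the existence of one suitable partition for each $\varepsilon$. For a non-decreasing
function this partition can be written down explicitly, as in the following Python sketch. The monotonicity
assumption and the function name are introduced here for illustration only. Under this assumption the Darboux
oscillation for a uniform partition with $n$ sub-intervals telescopes to $(b - a)(f(b) - f(a))/n$.

```python
from fractions import Fraction
import math

def partition_with_small_oscillation(f, a, b, eps):
    """Return a uniform partition x of [a, b] with Darboux oscillation below eps.

    Assumes a < b and that f is non-decreasing, so that the oscillation of a
    uniform partition with n sub-intervals telescopes to (b - a)(f(b) - f(a))/n.
    """
    n = math.floor((b - a) * (f(b) - f(a)) / eps) + 1
    x = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    delta = sum((q - p) * (f(q) - f(p)) for p, q in zip(x, x[1:]))
    return x, delta

cube = lambda t: t**3
eps = Fraction(1, 100)
x, delta = partition_with_small_oscillation(cube, Fraction(0), Fraction(2), eps)
print(len(x) - 1, delta, delta < eps)
```

For $f(t) = t^3$ on $[0,2]$ and $\varepsilon = 1/100$, this selects $n = 1601$ sub-intervals, for which the oscillation is 16/1601.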


43.7.4 REMARK: Darboux integrability is defined on intervals, not for interval end-point pairs. 
Darboux integrability is defined in Definition 43.5.11 on real intervals [a,b], whereas the Darboux integral 
in Definition 43.5.12 is defined for ordered pairs of end-points a and b. As mentioned in Remarks 43.5.5 
and 43.5.9, this follows the very useful convention that the domain interval is regarded as a vector, not a set. 
In fact, the domain is regarded as a set when that is convenient, and it is regarded as a vector or “directed
interval” when that is convenient.


Here Darboux integrability is defined on a set, whereas the Darboux integral is defined on a “directed 
interval” with a specified ordered pair of end-points. Consequently, Theorem 43.7.5 is expressed in terms of 
undirected domain sets, because its assertion concerns integrability. Theorems 43.7.7 and 43.7.8, on the other 
hand, concern the integral. So their integral formulas are expressed in terms of directed domain intervals, 
whereas their integrability assumptions are for undirected domain intervals. 


43.7.5 THEOREM: Darboux integrability is inherited on sub-intervals.
Let $\alpha, \beta \in \mathbb{R}$ with $\alpha \le \beta$. Let $f : [\alpha,\beta] \to \mathbb{R}$ be Darboux integrable on $[\alpha,\beta]$. Then $f|_{[a,b]}$ is Darboux
integrable on $[a,b]$ for all $a,b \in [\alpha,\beta]$ with $a \le b$.


PROOF: Let $f : [\alpha,\beta] \to \mathbb{R}$ be Darboux integrable on $[\alpha,\beta]$. Assume for simplicity that $\alpha < \beta$ and
$a,b \in [\alpha,\beta]$ with $a < b$. Let $\varepsilon \in \mathbb{R}^+$. Then by Theorem 43.7.3, there exists $x \in \mathrm{Part}(\alpha,\beta)$ with $\Delta_D(f,x) < \varepsilon$.
By Theorem 43.3.8 (iii), there is a unique $x' \in \mathrm{Part}(\alpha,\beta)$ such that $\mathrm{Range}(x') = \mathrm{Range}(x) \cup \{a,b\}$. Since $x'$
is a refinement of $x$ by Definition 43.3.5, $\Delta_D(f,x') \le \Delta_D(f,x) < \varepsilon$ by Theorem 43.5.7 (iv).

Since $x'$ is injective and $a,b \in \mathrm{Range}(x')$, there are unique $i_0, i_1 \in \mathrm{Dom}(x')$ with $x'_{i_0} = a$ and $x'_{i_1} = b$,
and $i_0 < i_1$ because $a < b$. Let $\ell = i_1 - i_0$, and define $y = (y_j)_{j=0}^\ell$ by $y_j = x'_{i_0+j}$ for all $j \in \{0\} \cup \mathbb{N}_\ell$.
Then $y \in \mathrm{Part}(a,b)$ and $\mathrm{Range}(y) \subseteq \mathrm{Range}(x')$. So by Theorem 43.5.8, $\Delta_D(f|_{[a,b]}, y) \le \Delta_D(f,x') < \varepsilon$.
Thus for all $\varepsilon \in \mathbb{R}^+$, there exists $y \in \mathrm{Part}(a,b)$ with $\Delta_D(f|_{[a,b]}, y) < \varepsilon$. Hence $f|_{[a,b]}$ is Darboux integrable
on $[a,b]$ by Theorem 43.7.3 and Definition 43.5.11. In the special case $a = b$, $f|_{[a,b]}$ is Darboux integrable on $[a,b]$
for all functions $f : [\alpha,\beta] \to \mathbb{R}$. So the assertion holds for general $a \le b$.


43.7.6 REMARK: By its definition, end-point reversal negates the Darboux integral.
As mentioned in Remark 43.5.5, Darboux defined the upper and lower sums such that reversing the end-points
$a$ and $b$ of the domain interval $[a,b]$ would make the upper sum less than or equal to the lower sum.
For $b < a$, the correct upper sum is $S_D^-(f,x)$, and the correct lower sum is $S_D^+(f,x)$.


Rather than correcting Darboux's error, Definition 43.5.6 presents the “upper sum” and “lower sum” as
Darboux did, but then Definitions 43.5.11 and 43.5.12 for Darboux integrability and the Darboux integral
utilise only the case $a \le b$. Since the upper and lower sums are not defined correctly for $b < a$, Theorem 43.7.7
merely quotes Definition 43.5.12, which skirts around the issue.


43.7.7 THEOREM: End-point reversal negates the Darboux integral.
Let $f : [\alpha,\beta] \to \mathbb{R}$, for $\alpha, \beta \in \mathbb{R}$ with $\alpha \le \beta$, be a Darboux integrable function. Then

\[
  \forall a,b \in [\alpha,\beta],\quad \int_a^b f(t)\,dt = -\int_b^a f(t)\,dt.
\]

Hence $\forall a \in [\alpha,\beta],\ \int_a^a f(t)\,dt = 0$.


PROOF: The assertion follows directly from Definition 43.5.12. The existence of the integrals follows from
Theorem 43.7.5.


43.7.8 THEOREM: The Darboux integral on the “sum” of two intervals equals the sum of the integrals.
Let $f : [\alpha,\beta] \to \mathbb{R}$, for $\alpha, \beta \in \mathbb{R}$ with $\alpha \le \beta$, be a Darboux integrable function. Then

\[
  \forall a,b,c \in [\alpha,\beta],\quad \int_a^b f(t)\,dt + \int_b^c f(t)\,dt = \int_a^c f(t)\,dt.
\]


PROOF: If $a = b$, then $\int_a^b f(t)\,dt = 0$ and the assertion follows, and similarly if $b = c$. If $a = c$, then the
assertion follows from Theorem 43.7.7. So assume that $a$, $b$ and $c$ are distinct.

Suppose that $a < b$ and $b < c$. Then the concatenation of every interval partition $x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$
with an interval partition $y = (y_i)_{i=0}^\ell \in \mathrm{Part}(b,c)$ is an interval partition $z = (z_i)_{i=0}^m \in \mathrm{Part}(a,c)$, where
$z_i = x_i$ for $0 \le i \le k$ and $z_i = y_{i-k}$ for $k \le i \le m = k + \ell$. Consequently $I_D^+(f|_{[a,b]}) + I_D^+(f|_{[b,c]}) \ge I_D^+(f|_{[a,c]})$.
Now let $z = (z_i)_{i=0}^m$ be any interval partition between $a$ and $c$. If $b$ is in two adjacent intervals of $z$, then $z$ is a
concatenation of intervals in $\mathrm{Part}(a,b)$ and $\mathrm{Part}(b,c)$. Otherwise $b$ is in exactly one of the intervals in $z$. Let
this be the interval $[z_{k-1}, z_k]$ with $k \in \mathbb{N}_m$. Define $x = (x_i)_{i=0}^k$ and $y = (y_i)_{i=0}^\ell$ by $x_i = z_i$ for $0 \le i \le k - 1$
and $x_k = b$, and $y_0 = b$ and $y_i = z_{k+i-1}$ for $1 \le i \le \ell = m - k + 1$. Then $S_D^+(f,z) \ge S_D^+(f,x) + S_D^+(f,y)$. So
$I_D^+(f|_{[a,b]}) + I_D^+(f|_{[b,c]}) \le I_D^+(f|_{[a,c]})$. Hence $\int_a^b f(t)\,dt + \int_b^c f(t)\,dt = \int_a^c f(t)\,dt$. Then the assertion for the cases
$b < a$ or $c < b$ (or both) follows from Theorem 43.7.7.
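
The partition-splitting step in this proof can be illustrated with a small computation. The following Python
sketch assumes the specific function $f(t) = t^2$ on $[0,2]$, which is non-decreasing there, and a partition $z$ which does
not contain the point $b$. The variable names follow the proof, but the function name `upper_sum` is hypothetical.

```python
from fractions import Fraction

def upper_sum(x):
    """Upper Darboux sum of f(t) = t^2 for a forward partition x within [0, oo)."""
    return sum((q - p) * q * q for p, q in zip(x, x[1:]))   # sup on [p, q] is q^2

z = [Fraction(0), Fraction(1, 2), Fraction(3, 2), Fraction(2)]   # partition of [a, c]
b = Fraction(1)                                                  # interior point
k = next(i for i, zi in enumerate(z) if zi > b)                  # b lies in [z[k-1], z[k]]
x = z[:k] + [b]                                                  # partition of [a, b]
y = [b] + z[k:]                                                  # partition of [b, c]
print(upper_sum(z), upper_sum(x) + upper_sum(y))
```

Here the upper sum for $z$ is 35/8, whereas the upper sums for $x$ and $y$ add up to 15/4, which is smaller because
inserting the point $b$ refines the partition. The upper sum of the concatenation of $x$ and $y$ equals the sum of
their upper sums exactly.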


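43.7.9 THEOREM: Some fundamental properties of the Darboux integral.
Let $f : [\alpha,\beta] \to \mathbb{R}$ and $g : [\alpha,\beta] \to \mathbb{R}$ be Darboux integrable on $[\alpha,\beta]$ for some $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$.

(i) $\forall c \in \mathbb{R},\ \forall a,b \in [\alpha,\beta],\ \int_a^b c\,dt = (b - a)c$.
(ii) $c_1 f + c_2 g$ is Darboux integrable on $[[a,b]]$ for all $c_1, c_2 \in \mathbb{R}$ and $a,b \in [\alpha,\beta]$.
(iii) $\forall c_1, c_2 \in \mathbb{R},\ \forall a,b \in [\alpha,\beta],\ \int_a^b (c_1 f(t) + c_2 g(t))\,dt = c_1 \int_a^b f(t)\,dt + c_2 \int_a^b g(t)\,dt$.
(iv) $\forall a,b \in [\alpha,\beta],\ \big( (a \le b \text{ and } \forall t \in [a,b],\ f(t) \le g(t)) \Rightarrow \int_a^b f(t)\,dt \le \int_a^b g(t)\,dt \big)$.
(v) $\forall a,b \in [\alpha,\beta],\ \big( (a \le b \text{ and } \forall t \in [a,b],\ g(t) \ge 0) \Rightarrow \int_a^b g(t)\,dt \ge 0 \big)$.
(vi) $|f|$ is Darboux integrable on $[[a,b]]$ for all $a,b \in [\alpha,\beta]$.
(vii) $\forall a,b \in [\alpha,\beta],\ \big| \int_a^b f(t)\,dt \big| \le \big| \int_a^b |f(t)|\,dt \big|$.
(viii) $\forall c \in \mathbb{R}_0^+,\ \forall a,b \in [\alpha,\beta],\ \big( (\forall t \in [a,b],\ |f(t)| \le c) \Rightarrow \big| \int_a^b f(t)\,dt \big| \le |b - a|\,c \big)$.
(ix) If $(h_i)_{i=0}^\infty$ is a sequence of continuous real-valued functions on $[\alpha,\beta]$ which converges uniformly to $f$,
     then $\forall a,b \in [\alpha,\beta],\ \lim_{i \to \infty} \int_a^b h_i(t)\,dt = \int_a^b f(t)\,dt$.


PROOF: For part (i), $S_D^+(c,x) = \sum_{i=1}^k (x_i - x_{i-1})c = (b - a)c$ for all of the upper Darboux sums in
Definition 43.5.6. Therefore $I_D^+(c) = (b - a)c$ for the upper Darboux integral in Definition 43.5.10, and
similarly $I_D^-(c) = (b - a)c$. Hence $\int_a^b c\,dt = (b - a)c$.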

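For part (ii), let $a \le b$ and $c_1, c_2 \in \mathbb{R}_0^+$. Let $\varepsilon \in \mathbb{R}^+$. Let $\varepsilon_1 = \frac{1}{2}\varepsilon \max(1,c_1)^{-1}$ and $\varepsilon_2 = \frac{1}{2}\varepsilon \max(1,c_2)^{-1}$.
Then $\varepsilon_1, \varepsilon_2 \in \mathbb{R}^+$. So there are partitions $x,y \in \mathrm{Part}(a,b)$ with $\Delta_D(f,x) < \varepsilon_1$ and $\Delta_D(g,y) < \varepsilon_2$. Let
$z \in \mathrm{Part}(a,b)$ be the coarsest common refinement of $x$ and $y$. (See Definition 43.3.7.) Then $\Delta_D(f,z) \le \Delta_D(f,x)$
and $\Delta_D(g,z) \le \Delta_D(g,y)$. So $\Delta_D(c_1 f + c_2 g, z) \le \varepsilon_1 c_1 + \varepsilon_2 c_2 \le \varepsilon$. Therefore $c_1 f + c_2 g$ is Darboux integrable,
and similarly for $c_1 < 0$ or $c_2 < 0$, both for $a \le b$ and $a > b$.

For part (iii), let $a \le b$ and $c_1, c_2 \in \mathbb{R}_0^+$. Then $S_D^+(c_1 f + c_2 g, x) \le c_1 S_D^+(f,x) + c_2 S_D^+(g,x)$ for all $x \in \mathrm{Part}(a,b)$
by the inequality $\sup (c_1 f + c_2 g)([[x_{j-1},x_j]]) \le c_1 \sup f([[x_{j-1},x_j]]) + c_2 \sup g([[x_{j-1},x_j]])$ for the individual
terms in the sums. Similarly $S_D^-(c_1 f + c_2 g, x) \ge c_1 S_D^-(f,x) + c_2 S_D^-(g,x)$ for all $x \in \mathrm{Part}(a,b)$. Hence
$\int_a^b (c_1 f(t) + c_2 g(t))\,dt = c_1 \int_a^b f(t)\,dt + c_2 \int_a^b g(t)\,dt$, and similarly for $c_1 < 0$ or $c_2 < 0$, both for $a \le b$
and $a > b$.

For part (iv), let $a \le b$ and $\forall t \in [a,b],\ f(t) \le g(t)$. Then $S_D^+(f,x) \le S_D^+(g,x)$ and $S_D^-(f,x) \le S_D^-(g,x)$ for
all $x \in \mathrm{Part}(a,b)$. So $I_D^+(f) \le I_D^+(g)$ and $I_D^-(f) \le I_D^-(g)$. Hence $\int_a^b f(t)\,dt \le \int_a^b g(t)\,dt$.

Part (v) follows from part (iv) with $f(t) = 0$ for all $t \in [a,b]$.

For part (vi), let $a \le b$. Then the Darboux oscillation $\Delta_D(f,x)$ in Definition 43.5.6 line (43.5.3) may be
expressed as $\Delta_D(f,x) = \sum_{i=1}^k (x_i - x_{i-1}) \sup\{ |f(s) - f(t)|;\ s,t \in [x_{i-1},x_i] \}$ for $x \in \mathrm{Part}(a,b)$. Let $\varepsilon \in \mathbb{R}^+$.
Then by the Darboux integrability of $f$, there is a partition $x \in \mathrm{Part}(a,b)$ such that $\Delta_D(f,x) < \varepsilon$. So

\begin{align*}
  \Delta_D(|f|,x) &= \sum_{i=1}^k (x_i - x_{i-1}) \sup\{ \big| |f(s)| - |f(t)| \big|;\ s,t \in [x_{i-1},x_i] \} \\
                  &\le \sum_{i=1}^k (x_i - x_{i-1}) \sup\{ |f(s) - f(t)|;\ s,t \in [x_{i-1},x_i] \} \\
                  &= \Delta_D(f,x) \\
                  &< \varepsilon
\end{align*}

by Theorem 18.5.16 (iii). Therefore $|f|$ is Darboux integrable on $[a,b]$, and similarly if $b < a$.

For part (vii), let $a \le b$. Then $\forall t \in [a,b],\ f(t) \le |f(t)|$. So $S_D^+(f,x) \le S_D^+(|f|,x)$ for all $x \in \mathrm{Part}(a,b)$, and
so $I_D^+(f) \le I_D^+(|f|)$. But $|f|$ is Darboux integrable on $[a,b]$ by part (vi). So $\big| \int_a^b f(t)\,dt \big| \le \big| \int_a^b |f(t)|\,dt \big|$, and
similarly for $b < a$.

Part (viii) follows from parts (vii), (iv) and (i).

For part (ix), let $a < b$. Let $\varepsilon \in \mathbb{R}^+$. By the uniform convergence of $(h_i)_{i=0}^\infty$ to $f$ (see 38.4.2), for some
$k \in \mathbb{Z}_0^+$, $|h_i(t) - f(t)| < \varepsilon/(b - a)$ for all $t \in [a,b]$ and $i \in \mathbb{Z}[k,\infty)$. So $\big| \int_a^b (h_i(t) - f(t))\,dt \big| \le \varepsilon$ for
all $i \in \mathbb{Z}[k,\infty)$ by part (viii). Consequently by part (iii), $\big| \int_a^b h_i(t)\,dt - \int_a^b f(t)\,dt \big| \le \varepsilon$ for all $i \in \mathbb{Z}[k,\infty)$.
Hence $\lim_{i \to \infty} \int_a^b h_i(t)\,dt = \int_a^b f(t)\,dt$ by Theorem 37.7.12, and similarly for $a > b$. (The case $a = b$ is trivial.)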


43.8. Fundamental theorems of calculus 


43.8.1 REMARK: The two fundamental theorems of calculus for the Darboux integral. 
Theorems 43.8.3 and 43.8.5 are both known as the fundamental theorem of calculus. The abbreviation 
FTOC is sometimes used for a “fundamental theorem of calculus”. 


In the FTOC style in Theorem 43.8.3, called “FTOC 1” here for brevity, an integral is constructed from 
the integrand, and the derivative of the integral is shown to equal the original function almost everywhere, 
namely on the set of continuity points of the integrand. In other words, a continuous function is integrated 
and then differentiated to return to the original function. 


FTOC 1 is presented for the Riemann-Darboux integral by Friedman [74], pages 127-128; Rudin [129],
pages 114-115; Schramm [133], pages 292-293; Graves [85], pages 90-91; Johnsonbaugh/Pfaffenberger [97],
pages 225-226; Edwards [67], pages 209-210; Mattuck [114], pages 270-273; Spivak [140], pages 240-241;
Bruckner/Bruckner/Thomson [56], page 41 (Cauchy integral); Rosenlicht [128], pages 126-127; Shilov [135],
pages 291-292; Thomson/Bruckner/Bruckner [149], pages 340-341.


In the FTOC style in Theorem 43.8.5, called “FTOC 2” here for brevity, a differentiable function is assumed
to be given and it is shown that the integral of its derivative yields the original function. In other words, a
differentiable function is differentiated and then integrated to return to the original function.


FTOC 2 is presented for the Riemann-Darboux integral by Friedman [74], pages 128-129; Rudin [129], 
page 115; Schramm [133], page 291; Graves [85], page 91; Johnsonbaugh/Pfaffenberger [97], pages 224-226; 
Edwards [67], page 210; Mattuck [114], pages 269-273; Bruckner/Bruckner/Thomson [56], pages 41, 191-192;
Spivak [140], page 244; Thomson/Bruckner/Bruckner [149], page 341; Shilov [135], page 293; Rosenlicht [128], 
pages 127-128. All of these authors require the antiderivative (or "indefinite integral") to be differentiable 
or continuously differentiable, with the exception of Shilov [135], page 293, who allows the integrand to be 
piecewise continuous (with finite left and right limits everywhere), which permits the antiderivative to have 
a finite number of discontinuities. Greater generality than this is possible. The antiderivative may be non- 
differentiable on a set of measure zero, although the derivative must be bounded. (This fits very well with 
the definition of a Lipschitz function for the antiderivative.) 


Amongst the references consulted for the FTOC here, roughly half of those which designated one as the first, 
and the other as the second, presented them in the same order as given here. The other half presented them 
in the opposite order. So there is apparently no international standard order for this! 


43.8.2 REMARK: Comparison of FTOC 1 with FTOC 2. 
Roughly speaking, it is FTOC 1 which is the more technically powerful, whereas FTOC 2 is generally 
considered to be more useful in practical computations. 


FTOC 2 makes it possible to compute a definite integral if an antiderivative is known, which is how integral 
calculus was done from the late 17th century onwards. (See Remarks 43.2.1 and 43.2.2. See also Leibniz [186], 
page 297.) But FTOC 1 is especially valuable for the functions which cannot be integrated using a corpus 
of known antiderivatives (which are typically determined by differentiating combinations of familiar kinds 
of functions). FTOC 1 can be used to add new functions to the corpus of antiderivatives because when a 
function can be determined to be Darboux integrable, it is then known what its derivative is. So essentially 
any Darboux integral may be added to the corpus as a "special function" in this way. (Special functions are 
also defined as convergent series or by other kinds of rules.) This may be summarised as follows. 


* FTOC 1. Construct a new antiderivative. This theorem verifies that the output from integrating an
integrable function is the antiderivative of the input. So it may be added to the corpus of special 
functions whose derivatives are known. 


* FTOC 2. Apply a known antiderivative. This theorem applies a known antiderivative to compute a
definite integral. 


43.8.3 THEOREM: FTOC 1. Fundamental theorem of calculus. Integrate, then differentiate. 

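Let $f : [\alpha, \beta] \to \mathbb{R}$ be Darboux integrable on $[\alpha, \beta]$ for some $\alpha, \beta \in \mathbb{R}$ with $\alpha \le \beta$. For $c \in [\alpha, \beta]$, define
$F_c : [\alpha, \beta] \to \mathbb{R}$ by $F_c : x \mapsto \int_c^x f(t)\,dt$. Then for all $c \in [\alpha, \beta]$ and $x \in (\alpha, \beta)$, if $f$ is continuous at $x$ then $F_c$
is differentiable at $x$ and $F_c'(x) = f(x)$. Hence

\[
\forall c \in [\alpha, \beta],\ \forall x \in C_f,\quad \frac{d}{dx} \int_c^x f(t)\,dt = f(x),
\]
where $C_f$ is the set of points in $(\alpha, \beta)$ where $f$ is continuous.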


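PROOF: Let $f : [\alpha, \beta] \to \mathbb{R}$ be Darboux integrable on $[\alpha, \beta]$ with $\alpha \le \beta$. Let $c \in [\alpha, \beta]$ and $x \in (\alpha, \beta)$.
Suppose that $f$ is continuous at $x$. Let $\varepsilon \in \mathbb{R}^+$. Then there is a $\delta \in \mathbb{R}^+$ such that $(x - \delta, x + \delta) \subseteq (\alpha, \beta)$ and
$|f(t) - f(x)| < \varepsilon$ for all $t \in (x - \delta, x + \delta)$. From Theorem 43.7.8, it then follows that

\[
\begin{aligned}
\forall y \in (x - \delta, x + \delta),\quad |F_c(y) - F_c(x) - (y - x) f(x)|
 &= \Bigl| \int_x^y f(t)\,dt - (y - x) f(x) \Bigr| \\
 &= \Bigl| \int_x^y f(t)\,dt - \int_x^y f(x)\,dt \Bigr| \\
 &= \Bigl| \int_x^y \bigl( f(t) - f(x) \bigr)\,dt \Bigr| \\
 &\le \varepsilon\,|y - x|
\end{aligned}
\]
by Theorem 43.7.9 (i, iii, viii). Hence $F_c$ is differentiable at $x$ and $F_c'(x) = f(x)$ by Theorem 40.4.11.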
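
As an informal numerical illustration of Theorem 43.8.3 (not part of the formal development), the following
Python sketch approximates the function F_c by a midpoint Riemann sum and compares a symmetric difference
quotient of F_c at a continuity point with the value of the integrand there. The helper names
`midpoint_integral`, `F` and `difference_quotient`, the choice of cos as the integrand, and the step sizes
are all assumptions made for this example only.

```python
import math

def midpoint_integral(g, a, b, n=2000):
    """Approximate the integral of g from a to b by a midpoint Riemann sum.

    If b < a, the step h is negative, so the sign reversal of the
    integral under exchange of end-points is reproduced automatically.
    """
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def F(g, c, x):
    """Hypothetical helper: F_c(x), the integral of g from c to x."""
    return midpoint_integral(g, c, x)

def difference_quotient(g, c, x, h=1e-3):
    """Symmetric difference quotient of F_c at x."""
    return (F(g, c, x + h) - F(g, c, x - h)) / (2 * h)

f = math.cos          # continuous everywhere, so every x is a continuity point
c, x = 0.0, 1.2
approx = difference_quotient(f, c, x)
error = abs(approx - f(x))
print(approx, f(x), error)
```

The difference quotient agrees with f(x) only up to discretisation error. The theorem itself concerns the
exact limit, which no finite computation can verify.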

43.8.4 REMARK: A limited form of FTOC 2.

Theorem 43.8.5 is stated for the special case of a differentiable antiderivative (or “indefinite integral”) because
this is how it is stated by most authors. Slightly more generally, Shilov [135], page 293, permits the integrand 
to be piecewise continuous (with finite left and right limits everywhere), which allows the antiderivative to 
have a finite number of discontinuities. In the 150 years after FTOC 2 was published by Gregory, Barrow, 
Leibniz and Newton, antiderivatives were generally differentiable, as in Theorem 43.8.5, but it is clearly valid 
for any antiderivative which is an integral of a Darboux integrable function. 


It should perhaps be emphasised that the late 17th century antidifferentiation procedure was not FTOC 2 
in the sense of Theorem 43.8.5 because the Cauchy $\varepsilon$-$\delta$ analytical style of integral had not yet been invented.
The 17th century style of integral was geometric, not analytical. It was expressed in terms of the gradual 
addition of infinitesimal rectangles to a geometric region, not in terms of approximation by sums of areas 
of rectangles on partitions with gradual refinement of the mesh. Thus the FTOC 2 in Theorem 43.8.5 gives 
the antiderivative difference F(b) — F(a) as the value of a limit of gradually refined piecewise rectangular 
approximations to the area, whereas the 17th century FTOC 2 gave the same antiderivative difference for the 
area of a geometric region thought of as infinitely many infinitesimal rectangles. The distinction is somewhat 
subtle. It is a distinction between 17th century geometric intuition and 19th century analytical precision 
using epsilons and deltas. 


43.8.5 THEOREM: FTOC 2. Fundamental theorem of calculus. Differentiate, then integrate. 
Let $f : [a,b] \to \mathbb{R}$ be a Darboux integrable function for $a,b \in \mathbb{R}$ with $a < b$, and let $F : [a,b] \to \mathbb{R}$ be a
continuous function on $[a,b]$ such that $F$ is differentiable on $(a,b)$ and $F'(x) = f(x)$ for all $x \in (a,b)$. Then

\[
\int_a^b f(x)\,dx = F(b) - F(a).
\]


PROOF: Define $G : [a,b] \to \mathbb{R}$ by $\forall x \in [a,b],\ G(x) = \int_a^x f(t)\,dt$, which is well defined because $f$ is integrable
on $[a,b]$. Then $G$ is differentiable on $(a,b)$ and $G'(x) = f(x)$ for all $x \in (a,b)$ by Theorem 43.8.3. Define
$H : [a,b] \to \mathbb{R}$ by $H(x) = F(x) - G(x)$ for all $x \in [a,b]$. Then $H$ is continuous on $[a,b]$ and differentiable
on $(a,b)$, and $H'(x) = 0$ for all $x \in (a,b)$. So $H(x) = H(a)$ for all $x \in [a,b]$ by Theorem 40.6.8. Therefore
$G(x) = F(x) - H(a) = F(x) - F(a)$ for all $x \in [a,b]$. In particular, $G(b) = F(b) - F(a)$. In other words,
$\int_a^b f(x)\,dx = F(b) - F(a)$.
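
A small numerical check of the formula in Theorem 43.8.5 can be sketched as follows. The pair F(x) = x^3,
f(x) = 3x^2 and the interval [-1, 2] are hypothetical choices for this example, and the midpoint Riemann sum
is only an approximation to the Darboux integral.

```python
def midpoint_integral(g, a, b, n=1000):
    """Approximate the integral of g from a to b by a midpoint Riemann sum."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def F(x):
    """Assumed antiderivative for the example."""
    return x ** 3

def f(x):
    """Derivative of F, used as the integrand."""
    return 3.0 * x ** 2

a, b = -1.0, 2.0
integral = midpoint_integral(f, a, b)
antiderivative_difference = F(b) - F(a)
gap = abs(integral - antiderivative_difference)
print(integral, antiderivative_difference, gap)
```

The value F(b) - F(a) is obtained from two function evaluations, whereas the Riemann sum needs many. This is
the practical advantage of FTOC 2 which is mentioned in Remark 43.8.2.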


43.8.6 REMARK: Uniqueness of antiderivatives. 

The equation F’(x) = f(x) in the statement of Theorem 43.8.5 resembles a differential equation. It seems 
to say that F is a solution of a first-order differential equation. The proof of such existence is in general 
non-trivial. (See for example Theorems 44.3.20 and 44.7.1.) However, in the statement of Theorem 43.8.5, 
this existence is an assumption, not an assertion. This theorem asserts the uniqueness of the solution as an 
implicit consequence of the formula for the definite integral. Thus $F(t) = F(a) + \int_a^t f(x)\,dx$ for all $t \in [a,b]$.
This is stated explicitly in Theorem 43.8.7.


43.8.7 THEOREM: Existence and uniqueness of antiderivatives. 

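Let $f : [\alpha, \beta] \to \mathbb{R}$ be a Darboux integrable function for $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$. Let $x_0 \in [\alpha, \beta]$ and $y_0 \in \mathbb{R}$.
Then there exists a unique function $F \in C^0([\alpha, \beta], \mathbb{R})$ which is differentiable on $(\alpha, \beta)$ and satisfies

(1) $F(x_0) = y_0$, and

(2) $\forall t \in (\alpha, \beta),\ F'(t) = f(t)$.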


PROOF: For a given Darboux integrable function $f : [\alpha, \beta] \to \mathbb{R}$, define $F : [\alpha, \beta] \to \mathbb{R}$ by

\[
\forall t \in [\alpha, \beta],\quad F(t) = y_0 + \int_{x_0}^t f(x)\,dx.
\]


Then F is well defined by Theorem 43.7.5, F satisfies (1) by Theorem 43.7.7, and F satisfies (2) by Theorems 
43.8.3, 40.5.5 and 40.5.7 (ii). 


To show uniqueness, let $F_1, F_2 \in C^0([\alpha, \beta], \mathbb{R})$ be differentiable on $(\alpha, \beta)$ and satisfy (1) and (2) with $F_1$ or
$F_2$ substituted for $F$. Let $F_0 = F_1 - F_2$, the pointwise difference of $F_1$ and $F_2$. Then $F_0 \in C^0([\alpha, \beta], \mathbb{R})$,
and $F_0$ is differentiable on $(\alpha, \beta)$, and $F_0(x_0) = 0$, and $\forall t \in (\alpha, \beta),\ F_0'(t) = 0$ by Theorem 43.7.9 (ii, iii). So
$F_0(t) = 0$ for all $t \in [\alpha, \beta]$ by Theorem 40.6.8. So $F_1 = F_2$. Hence the solution $F$ is unique.

43.8.8 REMARK: The “change of variables” formula for the Riemann integral.
Theorem 43.8.9 applies FTOC 2 to the chain rule for derivatives of real functions of a single real variable
in Theorem 40.5.15. Since the “change of variables” map $\phi$ in Theorem 43.8.9 is not necessarily injective, it
may be better to think of it as a curve which has end-points $\phi(a)$ and $\phi(b)$. (See Friedman [74], page 130,
for a similar version of the change of variables formula.)


Since $\phi$ is continuous on a compact interval domain, its range is also a compact interval by Theorems 34.4.20,
33.5.15 and 37.7.7. Therefore $\mathrm{Dom}(f) = \mathrm{Range}(\phi)$ is a compact real interval.


((2019-8-31. Theorem 43.8.9 could be strengthened by allowing f to be merely Darboux integrable. However, 
this would require a prior theorem that $f \circ \phi$ is Darboux integrable if $\phi$ is continuously differentiable, and
then also that the product of a Darboux integrable function by a continuous function is Darboux integrable. 
At the moment though, Theorem 43.8.9 is adequate for its intended applications. (This isn't a calculus 
textbook. It's supposed to be only some minimal preliminary material.) 


Theorem 43.8.9 can obviously be extended to Riemann-Stieltjes integrals. (See for example Rudin [129], 
page 124.) It can also be extended to real functions on Cartesian spaces, i.e. functions of several variables. 
(See for example Rudin [129], pages 206-207.) However, this would require a presentation of the Riemann 
integral on Cartesian spaces, which would take a long time to write. This does not need to be done, at least 
not yet. It might be required for the Stokes theorem if it is extended to non-rectangular regions. )) 


43.8.9 THEOREM: Change of variables formula. (Also known as "substitution of variables".) 


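Let $a,b \in \mathbb{R}$ with $a \le b$. Let $\phi \in C^0([a,b], \mathbb{R})$ satisfy $\phi|_{(a,b)} \in C^1((a,b), \mathbb{R})$. Let $f \in C^0(\mathrm{Range}(\phi), \mathbb{R})$. Then

\[
\int_{\phi(a)}^{\phi(b)} f(t)\,dt = \int_a^b f(\phi(s))\,\phi'(s)\,ds.
\]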


PROOF: First note that $\int_{\phi(a)}^{\phi(b)} f(t)\,dt$ is well defined by Theorems 43.7.5 and 43.6.11 because $[[\phi(a), \phi(b)]]$ is
a compact sub-interval of the compact interval $\mathrm{Dom}(f) = \mathrm{Range}(\phi) = \phi([a,b])$. Similarly, $\int_a^b f(\phi(s))\phi'(s)\,ds$
is well defined by Theorem 43.6.11 because $s \mapsto f(\phi(s))\phi'(s)$ is continuous on $[a,b]$.

Define $F : \mathrm{Range}(\phi) \to \mathbb{R}$ by $F(y) = \int_{\phi(a)}^{y} f(t)\,dt$ for $y \in \mathrm{Range}(\phi)$. This is well defined by Theorem 43.7.5.
Define $G : [a,b] \to \mathbb{R}$ by $G(x) = F(\phi(x))$ for all $x \in [a,b]$. Then $G'(x) = F'(\phi(x))\phi'(x) = f(\phi(x))\phi'(x)$ for
all $x \in (a,b)$ by Theorem 40.5.15 (the chain rule) and Theorem 43.8.3 (FTOC 1). Therefore

\[
\begin{aligned}
\int_a^b f(\phi(s))\,\phi'(s)\,ds &= \int_a^b G'(s)\,ds \\
 &= G(b) - G(a) && (43.8.1) \\
 &= F(\phi(b)) - F(\phi(a)) \\
 &= \int_{\phi(a)}^{\phi(b)} f(t)\,dt, && (43.8.2)
\end{aligned}
\]

where line (43.8.1) follows from Theorem 43.8.5 (FTOC 2), and line (43.8.2) follows from Theorem 43.7.7
and the definition of $F$.
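
The following Python sketch illustrates the change of variables formula numerically for a map which is not
injective, as permitted by Remark 43.8.8. The choices f(t) = t^2, phi(s) = sin(s) and the interval
[0, 3*pi/2] are assumptions for this example only. Here phi(a) = 0 and phi(b) = -1, so the left-hand
integral runs "backwards" from 0 to -1.

```python
import math

def midpoint_integral(g, a, b, n=4000):
    """Midpoint Riemann sum from a to b. A reversed interval gives a negated value."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(t):
    return t * t

def phi(s):
    return math.sin(s)      # not injective on [0, 3*pi/2]

def dphi(s):
    return math.cos(s)      # derivative of phi

a, b = 0.0, 1.5 * math.pi
lhs = midpoint_integral(f, phi(a), phi(b))                      # integral of f from phi(a) to phi(b)
rhs = midpoint_integral(lambda s: f(phi(s)) * dphi(s), a, b)    # integral of (f o phi) * phi'
print(lhs, rhs)
```

Both values approximate -1/3, even though phi first rises from 0 to 1 and then falls to -1. Only the
end-points phi(a) and phi(b) enter the left-hand side.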


43.8.10 REMARK: Integral version of the product formula for derivatives. 

The very useful “integration by parts” formula is a direct consequence of the application of FTOC 2 to the 
product formula for derivatives. However, integration by parts does not always make integration easier. It 
often replaces a difficult integral with a much more difficult integral. So the “parts” must be chosen carefully 
so that the new integral is easier to solve than the old integral. Integration by parts is often used as a kind 
of “tactical move” to try to simplify an integral. If the new integral is not easier than the old integral, one 
simply retreats to the old integral. Such “tactical retreats” usually occur only in handwritten notes. They 
are usually edited out before reaching the printing press! 

Theorem 43.8.11 can be generalised to integrals $F$ which are not necessarily differentiable everywhere, nor
even continuous everywhere. (See Remark 43.8.4 for some comment on this.) For brevity and simplicity,
only the restricted case is presented here.

43.8.11 THEOREM: Product rule integration. (Also known as “integration by parts”.)

Let $a,b \in \mathbb{R}$ with $a \le b$. Let $f : [a,b] \to \mathbb{R}$ be Darboux integrable on $[a,b]$. Let $F : [a,b] \to \mathbb{R}$ be continuous
on $[a,b]$, such that $F$ is differentiable on $(a,b)$ with $F'(x) = f(x)$ for all $x \in (a,b)$. Let $G : [a,b] \to \mathbb{R}$ be
continuous on $[a,b]$ and continuously differentiable on $(a,b)$. Then

\[
\int_a^b f(t)G(t)\,dt = F(b)G(b) - F(a)G(a) - \int_a^b F(t)G'(t)\,dt. \qquad (43.8.3)
\]

Hence

\[
\int_a^b f(t)G(t)\,dt = G(b) \int_a^b f(t)\,dt - \int_a^b G'(t) \int_a^t f(s)\,ds\,dt. \qquad (43.8.4)
\]


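PROOF: Define $H : [a,b] \to \mathbb{R}$ by $H(x) = F(x)G(x)$ for $x \in [a,b]$. Then $H$ is continuous on $[a,b]$ and
differentiable on $(a,b)$, and $H'(x) = F'(x)G(x) + F(x)G'(x)$ for $x \in (a,b)$ by Theorem 40.5.7 (iii). So
$\int_a^b \bigl(F'(t)G(t) + F(t)G'(t)\bigr)\,dt = F(b)G(b) - F(a)G(a)$ by Theorem 43.8.5 (FTOC 2). Then line (43.8.3) follows
from Theorem 43.7.9 (iii).


By Theorem 43.8.5, $F(t) = F(a) + \int_a^t f(s)\,ds$ for all $t \in [a,b]$. So $\int_a^b F(t)G'(t)\,dt = \int_a^b G'(t)\bigl(F(a) + \int_a^t f(s)\,ds\bigr)\,dt
= \int_a^b G'(t) \int_a^t f(s)\,ds\,dt + \int_a^b G'(t)F(a)\,dt$. But $\int_a^b G'(t)F(a)\,dt = F(a) \int_a^b G'(t)\,dt = F(a)(G(b) - G(a))$
by Theorem 43.8.5. So by line (43.8.3),

\[
\begin{aligned}
\int_a^b f(t)G(t)\,dt &= \int_a^b F'(t)G(t)\,dt \\
 &= F(b)G(b) - F(a)G(a) - \int_a^b G'(t) \int_a^t f(s)\,ds\,dt - F(a)\bigl(G(b) - G(a)\bigr) \\
 &= G(b)\bigl(F(b) - F(a)\bigr) - \int_a^b G'(t) \int_a^t f(s)\,ds\,dt
\end{aligned}
\]

by Theorem 43.8.5. Since $F(b) - F(a) = \int_a^b f(t)\,dt$ by the same theorem, this verifies line (43.8.4).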
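
Lines (43.8.3) and (43.8.4) can be compared numerically. The sketch below assumes f = cos, F = sin and
G(t) = t^2 on [0, 1]. These functions, the helper `midpoint_integral` and the number of subintervals are
illustrative choices, not part of the theorem.

```python
import math

def midpoint_integral(g, a, b, n=400):
    """Midpoint Riemann sum approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

f = math.cos                 # integrand
F = math.sin                 # antiderivative of f, so F' = f

def G(t):
    return t * t

def dG(t):
    return 2.0 * t           # derivative of G

a, b = 0.0, 1.0

# Left-hand side common to lines (43.8.3) and (43.8.4).
lhs = midpoint_integral(lambda t: f(t) * G(t), a, b)

# Right-hand side of line (43.8.3).
rhs_1 = F(b) * G(b) - F(a) * G(a) - midpoint_integral(lambda t: F(t) * dG(t), a, b)

# Right-hand side of line (43.8.4), with the inner integral from a to t.
def inner(t):
    return midpoint_integral(f, a, t)

rhs_2 = G(b) * midpoint_integral(f, a, b) - midpoint_integral(lambda t: dG(t) * inner(t), a, b)
print(lhs, rhs_1, rhs_2)
```

The second form does not mention F at all, which is why it remains meaningful when only the integrand f is
given.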


43.9. Cauchy-Riemann-Darboux integral for vector-valued integrands 


43.9.1 REMARK: Generalisation of real-valued integration to vector-valued. 
The “vector-valued” integrals in Section 43.9 are required for ODE systems theory, as in Sections 4 4, 44.5 


and 44.7, which are required for the definition of integral curves, as in Section 57.10, which are required for
one-parameter subgroups of Lie groups in Section 62.9 and for many other purposes. 


Definition 43.9.2 is a generalisation of Definitions 43.5.11 and 43.5.12. The Darboux integral of a
Cartesian-space-valued function is nothing more than the tuple of integrals of the components of the function. So
its properties are almost automatic consequences of the properties for real-valued functions. In this sense, 
Section 43.9 is superficial. It is provided here for easy reference from later sections of the book. But there 
are no surprises. 


43.9.2 DEFINITION: A Darboux integrable (vector-valued) function on a bounded closed interval $[a,b]$ for
$a,b \in \mathbb{R}$ with $a < b$ is a bounded function $f : [a,b] \to \mathbb{R}^n$, for some $n \in \mathbb{Z}^+$, for which the components
$f_j = \Pi_j \circ f : [a,b] \to \mathbb{R}$ are Darboux integrable real-valued functions for $j \in \mathbb{N}_n$. (See Notation 10.13.3 for
the projection map $\Pi_j$.)


The Darboux integral of a Darboux integrable vector-valued function $f : [[a,b]] \to \mathbb{R}^n$ with end-points
$a,b \in \mathbb{R}$ is the tuple

\[
\int_a^b f(t)\,dt = \Bigl( \int_a^b f_j(t)\,dt \Bigr)_{j=1}^n,
\]

where $\int_a^b f_j(t)\,dt$ is the Darboux integral of $f_j$ with end-points $a$ and $b$ for $j \in \mathbb{N}_n$.


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




43.9.3 REMARK: The Darboux integral version of the triangle inequality.

Theorem 43.9.5 (vii) has the appearance of a triangle inequality for an infinite number of vectors, one vector 
in $\mathbb{R}^n$ for each parameter $t \in [a,b]$. As mentioned in Remark 40.8.2, the form of the mean value theorem
for curves in Theorem 40.8.7 is a kind of “triangle inequality for an infinite number of infinitesimal line 
elements”. Theorem 43.9.5 (vii) may be regarded as an integral version of this, namely a kind of “triangle 
inequality for an infinite number of infinitesimal area elements”. 


Theorem 43.9.5 (vii) is important for demonstrating uniqueness of solutions of systems of ordinary differential 
equations in Theorem 44.5.3, which in turn makes possible the copying of connections between associated 
fibre bundles in Definition 67.12.3. This is then used, in particular, to apply connection forms on principal 
bundles to ordinary bundles, which is of some significance in the formulation of Lagrangians and equations 
of motion in gauge theory. 


43.9.4 REMARK: Technical note on proof of Theorem 43.9.5. 

The proofs of parts (vi) and (vii) of Theorem 43.9.5 are perhaps of some interest because of the mechanisms 
which make them work. The proof of part (vi) works because on line (43.9.3), the oscillation of a sum is 
bounded by the sum of the oscillations. Something similar is seen in probability and statistics, where the 
standard deviation of a sample is bounded by the sum of the individual standard deviations. So the proof 
in this case is assisted by summing n variables. 


The proof of part (vii), ironically, has some difficulties from the same cause. Trying to bound the sum of 
supremums of n values on individual intervals in terms of the supremum of the sum does not work because 
the variation of the sum is typically less than the sum of the variations. (By contrast, this kind of supremum 
bounding does work in the proof of Theorem 43.7.9 (vii) for a single variable.) Consequently in the proof of 
part (vii), the sum of infimums of n values is bounded in terms of the infimum of sums, and then part (vi) 
guarantees that the corresponding infimums and supremums converge to the same value. 


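43.9.5 THEOREM: Some fundamental properties of the real-tuple-valued Darboux integral.
Let $n \in \mathbb{Z}^+$. Let $f : [\alpha, \beta] \to \mathbb{R}^n$ and $g : [\alpha, \beta] \to \mathbb{R}^n$ be Darboux integrable on $[\alpha, \beta]$ for some $\alpha, \beta \in \mathbb{R}$
with $\alpha < \beta$.

(i) $\forall a,b \in [\alpha, \beta],\ \int_a^b f(t)\,dt = -\int_b^a f(t)\,dt$.

(ii) $\forall a,b,c \in [\alpha, \beta],\ \int_a^b f(t)\,dt + \int_b^c f(t)\,dt = \int_a^c f(t)\,dt$.

(iii) $\forall c \in \mathbb{R}^n,\ \forall a,b \in [\alpha, \beta],\ \int_a^b c\,dt = (b - a)c$.

(iv) $c_1 f + c_2 g$ is Darboux integrable on $[[a,b]]$ for all $c_1, c_2 \in \mathbb{R}$ and $a,b \in [\alpha, \beta]$.

(v) $\forall c_1, c_2 \in \mathbb{R},\ \forall a,b \in [\alpha, \beta],\ \int_a^b \bigl(c_1 f(t) + c_2 g(t)\bigr)\,dt = c_1 \int_a^b f(t)\,dt + c_2 \int_a^b g(t)\,dt$.

(vi) $|f|$ is Darboux integrable on $[[a,b]]$ for all $a,b \in [\alpha, \beta]$.

(vii) $\forall a,b \in [\alpha, \beta],\ |\int_a^b f(t)\,dt| \le |\int_a^b |f(t)|\,dt|$.

(viii) $\forall c \in \mathbb{R}_0^+,\ \forall a,b \in [\alpha, \beta],\ (\forall t \in [a,b],\ |f(t)| \le c) \Rightarrow |\int_a^b f(t)\,dt| \le |b - a|c$.

(ix) If $(h_i)_{i=0}^\infty$ is a sequence of continuous $\mathbb{R}^n$-valued functions on $[\alpha, \beta]$ which converges uniformly to $f$, then
$\forall a,b \in [\alpha, \beta],\ \lim_{i\to\infty} \int_a^b h_i(t)\,dt = \int_a^b f(t)\,dt$.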


PROOF: Part (i) follows from Theorem 43.7.7 applied to the components of $f$.

Part (ii) follows from Theorem 43.7.8 applied to the components of $f$.

Part (iii) follows from Theorem 43.7.9 (i) applied to the components of $c$.

Part (iv) follows from Theorem 43.7.9 (ii) applied to the components of $f$, $g$ and $c_1 f + c_2 g$.

Part (v) follows from Theorem 43.7.9 (iii) applied to the components of $f$, $g$ and $c_1 f + c_2 g$.

For part (vi), it follows from Theorem 43.7.9 (vi) that the absolute value $|f_j|$ is Darboux integrable on $[[a,b]]$
for all $j \in \mathbb{N}_n$. (But this does not directly imply that $|f|$ is Darboux integrable. It is necessary to re-run the
proof with suitable adjustments.)

Let $a \le b$. Then for $j \in \mathbb{N}_n$, the Darboux oscillation $\Delta_D(f_j, x)$ of $f_j$ in Definition 43.5.6 line (43.5.3) may be
expressed as $\Delta_D(f_j, x) = \sum_{i=1}^k (x_i - x_{i-1}) \sup\{ |f_j(s) - f_j(t)|;\ s,t \in [x_{i-1}, x_i] \}$ for each $x = (x_i)_{i=0}^k \in \mathrm{Part}(a,b)$.
Let $\varepsilon \in \mathbb{R}^+$. Then by the Darboux integrability of $f$, there is for each $j \in \mathbb{N}_n$ a partition $x^j \in \mathrm{Part}(a,b)$ such
that $\Delta_D(f_j, x^j) < \varepsilon/n$. Let $x$ be the coarsest common refinement of all of the partitions $x^j$ for $j \in \mathbb{N}_n$, as
in Definition 43.3.7. Then

\[
\begin{aligned}
\Delta_D(|f|, x) &= \sum_{i=1}^k (x_i - x_{i-1}) \sup\{ \bigl| |f(s)| - |f(t)| \bigr|;\ s,t \in [x_{i-1}, x_i] \} \\
 &\le \sum_{i=1}^k (x_i - x_{i-1}) \sup\{ |f(s) - f(t)|;\ s,t \in [x_{i-1}, x_i] \} && (43.9.1) \\
 &\le \sum_{i=1}^k (x_i - x_{i-1}) \sup\{ \textstyle\sum_{j=1}^n |f_j(s) - f_j(t)|;\ s,t \in [x_{i-1}, x_i] \} && (43.9.2) \\
 &\le \sum_{i=1}^k (x_i - x_{i-1}) \sum_{j=1}^n \sup\{ |f_j(s) - f_j(t)|;\ s,t \in [x_{i-1}, x_i] \} && (43.9.3) \\
 &= \sum_{j=1}^n \Delta_D(f_j, x) \\
 &\le \sum_{j=1}^n \Delta_D(f_j, x^j) && (43.9.4) \\
 &< \varepsilon,
\end{aligned}
\]

where line (43.9.1) follows from Theorem 37.1.4 (ii) applied to the standard 2-norm on $\mathbb{R}^n$, line (43.9.2)
follows from Theorem 24.7.18 (ii), line (43.9.3) follows from Theorem 16.7.8 (i), and line (43.9.4) follows from
Theorem 43.5.7 (iv) because $x$ is a refinement of $x^j$ for each $j \in \mathbb{N}_n$. Hence $|f|$ is Darboux integrable on $[a,b]$
by Definition 43.5.11, and similarly for $a > b$.


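For part (vii), let $a < b$. Then $\int_a^b f(t)\,dt$ is well defined because $f$ is Darboux integrable by assumption, and
$\int_a^b |f(t)|\,dt$ is similarly well defined by part (vi). Define $g_j : [a,b] \to \mathbb{R}_0^+$ by $g_j(t) = |f_j(t)|$ for all $t \in [a,b]$,
for all $j \in \mathbb{N}_n$. Then $g_j$ is Darboux integrable and $|\int_a^b f_j(t)\,dt| \le \int_a^b g_j(t)\,dt$ by Theorem 43.7.9 (vi, vii) for
all $j \in \mathbb{N}_n$. Therefore

\[
\begin{aligned}
\Bigl| \int_a^b f(t)\,dt \Bigr| &= \Bigl| \Bigl( \int_a^b f_j(t)\,dt \Bigr)_{j=1}^n \Bigr| \\
 &= \Bigl| \Bigl( \Bigl| \int_a^b f_j(t)\,dt \Bigr| \Bigr)_{j=1}^n \Bigr| \\
 &\le \Bigl| \Bigl( \int_a^b g_j(t)\,dt \Bigr)_{j=1}^n \Bigr|. && (43.9.5)
\end{aligned}
\]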


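Let $K = |(\int_a^b g_j(t)\,dt)_{j=1}^n|$. Then $K \in \mathbb{R}_0^+$. If $K = 0$, then $|\int_a^b f(t)\,dt| = 0$ by line (43.9.5), from which it follows that
$|\int_a^b f(t)\,dt| \le |\int_a^b |f(t)|\,dt|$, which is the assertion to be proved. So assume that $K \in \mathbb{R}^+$.
Let $\varepsilon \in (0, K]$. Then for all $j \in \mathbb{N}_n$, there exists $x^j \in \mathrm{Part}(a,b)$ for which the lower Darboux sum satisfies
$S_D^-(g_j, x^j) \ge (1 - \varepsilon/K) \int_a^b g_j(t)\,dt$
by Definition 43.5.10 line (43.5.6) and Theorems 43.7.9 (v) and 16.1.18 (ii). Let $x = (x_i)_{i=0}^k$ be the coarsest
common refinement of all of the partitions $x^j$ for $j \in \mathbb{N}_n$. (See Definition 43.3.7.) Then

\[
\forall j \in \mathbb{N}_n,\quad S_D^-(g_j, x^j) \le S_D^-(g_j, x) = \sum_{i=1}^k (x_i - x_{i-1}) \inf\{ g_j(s);\ s \in [[x_{i-1}, x_i]] \}.
\]

For each $i \in \mathbb{N}_k$, define the $n$-tuple $V_i = (V_{i,j})_{j=1}^n = ((x_i - x_{i-1}) \inf g_j([[x_{i-1}, x_i]]))_{j=1}^n$. Then $V_i \in (\mathbb{R}_0^+)^n$ for
all $i \in \mathbb{N}_k$. So $|\sum_{i=1}^k V_i| \le \sum_{i=1}^k |V_i|$ by the triangle inequality for the 2-norm of finite sequences of vectors
in $\mathbb{R}^n$. Then $S_D^-(g_j, x) = \sum_{i=1}^k V_{i,j} = (\sum_{i=1}^k V_i)_j$ for all $j \in \mathbb{N}_n$ by Definitions 43.5.12, 43.5.10 and 43.5.6.
So

\[
\begin{aligned}
\bigl| (S_D^-(g_j, x))_{j=1}^n \bigr| &= \Bigl| \sum_{i=1}^k V_i \Bigr| \\
 &\le \sum_{i=1}^k |V_i| \\
 &= \sum_{i=1}^k (x_i - x_{i-1}) \bigl| ( \inf\{ g_j(s);\ s \in [[x_{i-1}, x_i]] \} )_{j=1}^n \bigr| \\
 &\le \sum_{i=1}^k (x_i - x_{i-1}) \inf\{ |(g_j(s))_{j=1}^n|;\ s \in [[x_{i-1}, x_i]] \} && (43.9.6) \\
 &= S_D^-(|f|, x),
\end{aligned}
\]

where line (43.9.6) follows by Theorem 24.7.20 (ii). Therefore

\[
\begin{aligned}
S_D^-(|f|, x) &\ge \bigl| (S_D^-(g_j, x))_{j=1}^n \bigr| \\
 &\ge (1 - \varepsilon/K) \Bigl| \Bigl( \int_a^b g_j(t)\,dt \Bigr)_{j=1}^n \Bigr| && (43.9.7) \\
 &\ge (1 - \varepsilon/K) \Bigl| \int_a^b f(t)\,dt \Bigr|, && (43.9.8)
\end{aligned}
\]

where line (43.9.7) follows by Theorem 24.7.18 (iii) and Definition 24.7.2 (i), and line (43.9.8) follows from
line (43.9.5). Since this holds for all $\varepsilon \in (0, K]$, it follows from Definitions 43.5.12 and 43.5.10 line (43.5.6)
that

\[
\int_a^b |f(t)|\,dt \ge \Bigl| \int_a^b f(t)\,dt \Bigr|.
\]

Similarly for $a > b$, $|\int_a^b |f(t)|\,dt| = \int_b^a |f(t)|\,dt \ge |\int_b^a f(t)\,dt| = |\int_a^b f(t)\,dt|$. Hence $|\int_a^b f(t)\,dt| \le |\int_a^b |f(t)|\,dt|$
for all $a,b \in [\alpha, \beta]$.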
For part (viii), let $c \in \mathbb{R}_0^+$, $a,b \in [\alpha, \beta]$, and $|f(t)| \le c$ for all $t \in [a,b]$. Then $|\int_a^b f(t)\,dt| \le |\int_a^b |f(t)|\,dt|$ by
part (vii). But $|\int_a^b |f(t)|\,dt| \le |\int_a^b c\,dt|$ by Theorem 43.7.9 (iv), and $|\int_a^b c\,dt| = |b - a|c$ by Theorem 43.7.9 (i).
Hence $|\int_a^b f(t)\,dt| \le |b - a|c$.
Part (ix) follows from Theorem 43.7.9 (ix) applied to the components of $f$ and $h_i$ for $i \in \mathbb{Z}_0^+$.
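
The componentwise integral of Definition 43.9.2 and the inequality in Theorem 43.9.5 (vii) can be illustrated
numerically. The sketch below assumes the curve f(t) = (cos t, sin t) on [0, pi] and uses the 2-norm. The
helper names and the midpoint Riemann sum are assumptions for the example.

```python
import math

def midpoint_integral(g, a, b, n=2000):
    """Midpoint Riemann sum approximation to the integral of a real-valued g."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(t):
    """Assumed vector-valued integrand with values in R^2."""
    return (math.cos(t), math.sin(t))

def norm(v):
    """Standard 2-norm of a tuple of real numbers."""
    return math.sqrt(sum(c * c for c in v))

a, b = 0.0, math.pi

# Integrate each component separately, then collect the results into a tuple.
integral_of_f = tuple(
    midpoint_integral(lambda t, j=j: f(t)[j], a, b) for j in range(2)
)
norm_of_integral = norm(integral_of_f)                       # close to |(0, 2)| = 2
integral_of_norm = midpoint_integral(lambda t: norm(f(t)), a, b)   # close to pi
print(integral_of_f, norm_of_integral, integral_of_norm)
```

The norm of the integral is the straight-line distance between the end-points of the curve whose velocity
is f, whereas the integral of the norm is the length of that curve. This is the sense in which part (vii) is a
triangle inequality.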


43.9.6 REMARK: Fundamental theorems of calculus for curves. 

Theorems 43.8.3 and 43.8.5 may be generalised to integrate the velocity of a $C^1$ curve. In fact, these
theorems are valid for general Lipschitz continuous curves in Cartesian spaces. So the vector from one point 
on a Lipschitz curve to another is equal to the integral of the curve's velocity function along that stretch of 
the curve. To assert an FTOC for curves in terms of Lipschitz continuity requires the Lebesgue differentiation 
theorem in Section 45.7, but it can be stated here that any curve which is the integral of an almost everywhere 
continuous velocity can be differentiated to recover the velocity wherever it is well defined, and the velocity 
of any differentiable curve can be integrated to recover the point-to-point displacement along the curve. 


43.9.7 THEOREM: FTOC 1 for curves. Integrate, then differentiate. 

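Let $f_j : [\alpha, \beta] \to \mathbb{R}$ be a Darboux integrable function on $[\alpha, \beta]$ for some $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$ for all $j \in \mathbb{N}_n$
for some $n \in \mathbb{Z}^+$. For $c \in [\alpha, \beta]$, define $F_{c,j} : [\alpha, \beta] \to \mathbb{R}$ by $F_{c,j} : x \mapsto \int_c^x f_j(t)\,dt$ for all $j \in \mathbb{N}_n$. Then for
all $c \in [\alpha, \beta]$ and $x \in (\alpha, \beta)$, if $f_j$ is continuous at $x$ for all $j \in \mathbb{N}_n$, then $F_{c,j}$ is differentiable at
$x$ and $F_{c,j}'(x) = f_j(x)$ for all $j \in \mathbb{N}_n$. Hence

\[
\forall c \in [\alpha, \beta],\ \forall x \in C_f,\ \forall j \in \mathbb{N}_n,\quad \frac{d}{dx} \int_c^x f_j(t)\,dt = f_j(x),
\]
where $C_f$ is the set of points in $(\alpha, \beta)$ where $f_j$ is continuous for all $j \in \mathbb{N}_n$. In other words,

\[
\forall c \in [\alpha, \beta],\ \forall x \in C_f,\quad \frac{d}{dx} \int_c^x f(t)\,dt = f(x),
\]

where $f : [\alpha, \beta] \to \mathbb{R}^n$ is defined by $f : x \mapsto (f_j(x))_{j=1}^n$.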


PROOF: The assertion follows from Theorem 43.8.3 applied to each of the n components. 


43.9.8 THEOREM: FTOC 2 for curves. Differentiate, then integrate. 
Let $n \in \mathbb{Z}^+$. Let $a,b \in \mathbb{R}$ with $a < b$. For all $j \in \mathbb{N}_n$, let $f_j : [a,b] \to \mathbb{R}$ be a Darboux integrable function, and
let $F_j : [a,b] \to \mathbb{R}$ be a continuous function on $[a,b]$ such that $F_j$ is differentiable on $(a,b)$ and $F_j'(x) = f_j(x)$
for all $x \in (a,b)$. Then

\[
\forall j \in \mathbb{N}_n,\quad \int_a^b f_j(x)\,dx = F_j(b) - F_j(a).
\]
In other words,
\[
\int_a^b f(x)\,dx = F(b) - F(a),
\]

where $f : [a,b] \to \mathbb{R}^n$ is defined by $f : x \mapsto (f_j(x))_{j=1}^n$, and $F : [a,b] \to \mathbb{R}^n$ is defined by $F : x \mapsto (F_j(x))_{j=1}^n$.


PROOF: The assertion follows from Theorem 43.8.5 applied to each of the $n$ components.


43.9.9 THEOREM: Existence and uniqueness of antiderivatives. 
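Let $n \in \mathbb{Z}^+$. Let $f : [\alpha, \beta] \to \mathbb{R}^n$ be a Darboux integrable function for $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$. Let $x_0 \in [\alpha, \beta]$
and $y_0 \in \mathbb{R}^n$. Then there exists a unique $F \in C^0([\alpha, \beta], \mathbb{R}^n)$ which is differentiable on $(\alpha, \beta)$ and satisfies


(1) $F(x_0) = y_0$, and
(2) $\forall t \in (\alpha, \beta),\ F'(t) = f(t)$.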


PROOF: The assertion follows by application of Theorem 43.8.7 to the components of f and F. 


43.10. Cauchy-Riemann-Darboux-Stieltjes integral 


43.10.1 REMARK: Riemann, Riemann-Stieltjes and Stieltjes integrals. 

The Riemann integral for real-valued functions on Cartesian spaces is quite straightforward because it does 
not require measure theory. The Riemann-Stieltjes integral for real-valued functions of a real variable is a 
fairly straightforward extension of the corresponding Riemann integral to a fairly simple class of non-uniform 
measures which may have “point masses”.

The generalisation of the Stieltjes integration concept to vector-valued functions in Section 43.11 is not very 
much more difficult. It is slightly more difficult to extend the Stieltjes integral to linear-operator-valued 
integrands with vector-valued differentials in Section 43.12. In this case, the linear-operator integrand 
operates on the velocity of a rectifiable path. (This velocity may be undefined on a set of measure zero.) 
This latter case is of particular interest for the definition of parallel transport along rectifiable paths for 
connections on differentiable fibre bundles. 


The following kinds of integral are defined in Sections 43.4, 43.10, 43.11 and 43.12.
(1) Definition 43.5.10. Riemann integral for real-valued functions on real-number intervals.
(2) Definition 43.10.4. Riemann-Stieltjes integral for real-valued functions on real-valued integrator curves.
(3) Definition 43.11.3. Stieltjes integral for vector-valued functions on real-valued integrator curves.
(4) Definition 43.12.3. Stieltjes integral of linear-operator-valued functions on vector-valued integrator
curves.


43.10.2 REMARK: Riemann-Stieltjes integral definition. 

The product terms $(x_j - x_{j-1}) \inf f([x_{j-1}, x_j])$ and $(x_j - x_{j-1}) \sup f([x_{j-1}, x_j])$ in lines (43.5.6) and (43.5.5)
in Definition 43.5.10 may be modified to “morph” the $x$-axis in continuous or discontinuous ways. This
innovation has many benefits. In probability theory, one may represent measures (i.e. probability densities) on
the real line which combine continuous distributions with discrete distributions within a common framework.
(This is reminiscent of the way light spectra can have both continuous and discrete components.) Of
particular interest for differential geometry is the generalisation of the domain from $\mathbb{R}$ to curves in $\mathbb{R}^n$ or
other linear spaces, which permits integration of a wide variety of functions along rectifiable curves.


It goes almost without saying that the Riemann integral is the special case of the Riemann-Stieltjes integral 
where $\alpha(x) = x$ for all $x \in [a,b]$ in Definition 43.10.4.

43.10.3 DEFINITION: A (real-valued) integrator (curve) is a real-valued function of bounded variation on
a bounded closed real-number interval. 


A nondecreasing (real-valued) integrator (curve) is a bounded nondecreasing real-valued function on a 
bounded closed real-number interval. 


43.10.4 DEFINITION: Riemann-Stieltjes integral for a real function on a bounded closed interval. 
The upper Riemann-Stieltjes integral of a bounded real-valued function $f : [a,b] \to \mathbb{R}$ with respect to an
integrator curve $\alpha : [a,b] \to \mathbb{R}$ is

\[
\inf \Bigl\{ \sum_{j=1}^k \bigl(\alpha(x_j) - \alpha(x_{j-1})\bigr) \sup f([x_{j-1}, x_j]);\ (x_j)_{j=0}^k \in \mathrm{Part}(a,b) \Bigr\}. \qquad (43.10.1)
\]

The lower Riemann-Stieltjes integral of a bounded real-valued function $f : [a,b] \to \mathbb{R}$ with respect to an
integrator curve $\alpha : [a,b] \to \mathbb{R}$ is

\[
\sup \Bigl\{ \sum_{j=1}^k \bigl(\alpha(x_j) - \alpha(x_{j-1})\bigr) \inf f([x_{j-1}, x_j]);\ (x_j)_{j=0}^k \in \mathrm{Part}(a,b) \Bigr\}. \qquad (43.10.2)
\]

A Riemann-Stieltjes integrable (real-valued) function on a bounded closed interval $[a,b]$ with respect to an
integrator curve $\alpha : [a,b] \to \mathbb{R}$ is a bounded real-valued function $f : [a,b] \to \mathbb{R}$ for which the upper and
lower Riemann-Stieltjes integrals with respect to $\alpha$ are equal.


The Riemann-Stieltjes integral of a Riemann-Stieltjes integrable real-valued function $f$ with respect to an
integrator curve $\alpha : [a,b] \to \mathbb{R}$ is the common value of the upper and lower integrals of $f$ with respect to $\alpha$.
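
The sums inside lines (43.10.1) and (43.10.2) can be computed directly for a single partition. The Python
sketch below assumes a nondecreasing integrator with a unit “point mass” at 0.5 and the integrand
f(x) = x^2 on [0, 1]. Because this f is nondecreasing on [0, 1], its supremum and infimum on each
subinterval are attained at the right and left end-points respectively. That shortcut is an assumption of
the example, not a general method. The names `alpha`, `f` and `stieltjes_sums` are hypothetical.

```python
def alpha(x):
    """Assumed nondecreasing integrator: uniform measure plus a unit point mass at 0.5."""
    return x + (1.0 if x >= 0.5 else 0.0)

def f(x):
    """Assumed integrand, nondecreasing on [0, 1]."""
    return x * x

def stieltjes_sums(f, alpha, a, b, n):
    """Return (lower_sum, upper_sum) for the uniform partition of [a, b] into n pieces.

    Assumes f is nondecreasing on [a, b], so that inf f and sup f on each
    subinterval are the values at its left and right end-points.
    """
    xs = [a + (b - a) * j / n for j in range(n + 1)]
    lower = 0.0
    upper = 0.0
    for j in range(1, n + 1):
        d_alpha = alpha(xs[j]) - alpha(xs[j - 1])
        lower += d_alpha * f(xs[j - 1])
        upper += d_alpha * f(xs[j])
    return lower, upper

lower, upper = stieltjes_sums(f, alpha, 0.0, 1.0, 1000)
expected = 1.0 / 3.0 + f(0.5)     # ordinary integral of x^2 plus the point-mass term
print(lower, expected, upper)
```

A single partition only yields a lower bound for the lower integral and an upper bound for the upper
integral. Here the two bounds enclose the value 1/3 + f(0.5), and the gap between them shrinks as the
partition is refined.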


43.10.5 REMARK: Notation issues for the Riemann-Stieltjes integral. 

As mentioned in Remark 43.2.7, notations for integrals suffer from various difficulties. In the case of the
Riemann-Stieltjes integral, there is at least a clear indication of the Stieltjes character of the integral because
the integrator $\alpha$ is incorporated into the notation. This reduces to some extent the ambiguity as to which
kind of integral is meant.

In regard to the dummy variable $x$, it is easier to omit in Notation 43.10.6 than in Notation 43.2.6 because
“$d\alpha$” can take the place of the implied uniform-measure differential “$dx$”. Thus the differential “$d\alpha$” in the
expression “$\int_a^b f\,d\alpha$” in Notation 43.10.6 is a mnemonic for the factor $\alpha(x_j) - \alpha(x_{j-1})$ in Definition 43.10.4. An
explicit indication of the dummy variable $x$ is still often useful, however, because it permits “inline function
declaration”.


43.10.6 NOTATION: $\int_a^b f(x)\,d\alpha(x)$ denotes the Riemann-Stieltjes integral of a Riemann-Stieltjes integrable
real-valued function $f$ on a bounded closed interval $[a,b]$ with respect to an integrator curve $\alpha : [a,b] \to \mathbb{R}$.


$\int_a^b f\,d\alpha$ means the same as $\int_a^b f(x)\,d\alpha(x)$.


43.10.7 REMARK: Existence of Riemann-Stieltjes integral for functions of bounded variation. 

For existence proofs for the Riemann-Stieltjes integral for functions of bounded variation, see for example 
Johnsonbaugh/Pfaffenberger [97], page 219; Rudin [129], pages 121-122; Kolmogorov/Fomin [104], page 368. 
(See Section 38.10 for functions of bounded variation in metric spaces.) 


43.11. Stieltjes integral for vector-valued integrands 


43.11.1 REMARK: The Stieltjes integral versus the Riemann-Stieltjes integral. 

When the Stieltjes approach is applied to vector-valued and linear-operator-valued integrands, the calculation 
of upper and lower bounds is no longer relevant because a total order is not available. Therefore the name 
“Stieltjes integral” is used in Sections 43.11 and 43.12. 

The “Stieltjes integral” is apparently the restriction of the Riemann-Stieltjes integral in Definition 43.10.4
to continuous integrands, which may be defined as the limit of the sums $\sum_{j=1}^k (\alpha(x_j) - \alpha(x_{j-1})) f(x_{j-1})$ as
the mesh $\max_{j=1}^k (x_j - x_{j-1})$ of the interval partitions converges to zero. (See Riesz/Szőkefalvi-Nagy [125],
page 122; Stieltjes [198], pages 68-71.) The Riemann-Stieltjes integral extends this to more general
real-valued integrands by considering upper and lower bounds.

43.11.2 REMARK: Extension of the Riemann-Stieltjes integral to vector-valued integrands.

Lines (43.10.1) and (43.10.2) in Definition 43.10.4 do not refer to any topology on the real numbers. They 
resemble, respectively, a “limsup” and “liminf” of sums of products of integrand values and differentials, 
and when “limsup = liminf”, it is considered that the limit is well defined. Each sum in line (43.10.1) is 
a kind of upper bound, and each sum in line (43.10.2) is a kind of lower bound, for the integrand f with 
respect to the partial order on real-valued functions which is defined in terms of pointwise order. (In other 
words, $f \le g$ means $\forall x \in [a,b]$, $f(x) \le g(x)$.) The integral is considered well defined when the least sum for
upper bounds is equal to the greatest sum for lower bounds. (Note that the least upper and greatest lower
approximations to $f$ might not be equal. Consider for example the Riemann integral of $f : [-1,1] \to \mathbb{R}$ with
$f(0) = 1$ and $f(x) = 0$ for $x \ne 0$.)

The structure required on $\mathbb{R}$ for the Riemann-Stieltjes integral of a real-valued function includes the standard
field operations and properties, and also completeness with respect to a compatible total order. Topology is 
not required at all, although order can be considered to be a kind of topology. 


When one wishes to generalise a real-number integral to vector-valued integrands, the supremum and infimum 
are no longer available since linear spaces do not usually have a total order which is compatible with the 
algebraic structure. Therefore for vector-valued integrands, a topology is required. More specifically, one 
requires some notion of convergence of a non-increasing family of sets to a point. To better understand this 
issue, let $f : [a,b] \to V$ for some linear space $V$, and consider the set

$$S_\delta = \Big\{ \sum_{j=1}^k (\alpha(x_j) - \alpha(x_{j-1})) f(t_j);\ (x_j)_{j=0}^k \in \mathrm{Part}_\delta(a,b) \text{ and } \forall j \in \mathbb{N}_k,\ t_j \in [x_{j-1}, x_j] \Big\},$$

where $\mathrm{Part}_\delta(a,b)$ is the set of interval partitions of $[a,b]$ with mesh not exceeding $\delta$. (See Notation 43.3.11.)
If $V = \mathbb{R}$ and $f$ is a Riemann-Stieltjes integrable function on $[a,b]$ with respect to $\alpha$, then $\sup(S_\delta)$ and $\inf(S_\delta)$
will converge to a common value as $\delta \to 0$. In fact, in this case, $\lim_{\delta\to 0} \sup(S_\delta)$ and $\lim_{\delta\to 0} \inf(S_\delta)$ are equal
to the values in the respective lines (43.10.1) and (43.10.2) in Definition 43.10.4. In the case of a general linear
space $V$, one may hope that $\lim_{\delta\to 0} \mathrm{diam}(S_\delta) = 0$ if $V$ is a normed space, or that the set-family $(S_\delta)_{\delta\in\mathbb{R}^+}$
may be said to converge to a point in some other sense otherwise. (One could use seminorms for example.)
In the most general case, one would want $(S_\delta)_{\delta\in\mathbb{R}^+}$ to satisfy $\exists v \in V$, $\forall \Omega \in \mathrm{Top}_v(V)$, $\exists \delta \in \mathbb{R}^+$, $S_\delta \subseteq \Omega$ in a
topological linear space.


The lack of supremum and infimum expressions for the vector-valued integral in Definition 43.11.3 implies 
that there is no need to prefix the name “Riemann” to the name “Stieltjes”. (For some definitions of vector- 
valued integrals, which do not necessarily use the Stieltjes approach, see Lang [23], pages 12-13; Rudin [129], 
pages 116-123; Rudin [130], pages 77-81.) 


43.11.3 DEFINITION: Stieltjes integral for a vector-valued integrand. 

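A Stieltjes integrable vector-valued function on a bounded closed interval $[a,b]$ with respect to an integrator
curve $\alpha : [a,b] \to \mathbb{R}$ is a vector-valued function $f : [a,b] \to W$, for some topological linear space $W$, which
satisfies

$$\exists w \in W,\ \forall \Omega \in \mathrm{Top}_w(W),\ \exists \delta \in \mathbb{R}^+,\ S_\delta \subseteq \Omega, \qquad (43.11.1)$$

where

$$\forall \delta \in \mathbb{R}^+,\quad S_\delta = \Big\{ \sum_{j=1}^k (\alpha(x_j) - \alpha(x_{j-1})) f(t_j);\ (x_j)_{j=0}^k \in \mathrm{Part}_\delta(a,b) \text{ and } \forall j \in \mathbb{N}_k,\ t_j \in [x_{j-1}, x_j] \Big\}.$$

The Stieltjes integral of a Stieltjes integrable vector-valued function $f$ with respect to an integrator curve
$\alpha : [a,b] \to \mathbb{R}$ is the (unique) value of $w$ in line (43.11.1).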
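
The tagged sums in this definition can be explored numerically. The following is a minimal sketch, not part of the original text: the integrand `f`, the integrator `alpha` and the helper names are hypothetical choices, and only three tag choices on uniform partitions are sampled, so the computed spread is merely a lower estimate of the diameter of the full set of sums for a given mesh.

```python
import math

def stieltjes_sum(f, alpha, xs, ts):
    """Return the sum over j of (alpha(x_j) - alpha(x_{j-1})) * f(t_j).

    xs is a partition x_0 < ... < x_k of [a, b]; ts[j-1] is the tag t_j in [x_{j-1}, x_j].
    f returns a tuple of floats (a vector in R^m); alpha returns a float.
    """
    total = [0.0] * len(f(ts[0]))
    for j in range(1, len(xs)):
        d_alpha = alpha(xs[j]) - alpha(xs[j - 1])
        for i, component in enumerate(f(ts[j - 1])):
            total[i] += d_alpha * component
    return tuple(total)

def uniform_tagged_partition(a, b, k, where):
    """Uniform partition of [a, b] into k intervals, tagged at 'left', 'mid' or 'right'."""
    xs = [a + (b - a) * j / k for j in range(k + 1)]
    offset = {"left": 0.0, "mid": 0.5, "right": 1.0}[where]
    ts = [xs[j] + offset * (xs[j + 1] - xs[j]) for j in range(k)]
    return xs, ts

def tag_spread(f, alpha, a, b, k):
    """Largest Euclidean distance between sums which differ only in the choice of tags."""
    sums = [stieltjes_sum(f, alpha, *uniform_tagged_partition(a, b, k, w))
            for w in ("left", "mid", "right")]
    return max(math.dist(s, t) for s in sums for t in sums)

f = lambda t: (math.cos(t), math.sin(t))   # hypothetical integrand with values in R^2
alpha = lambda x: x * x                    # hypothetical integrator of bounded variation

for k in (10, 100, 1000):
    print(k, tag_spread(f, alpha, 0.0, 1.0, k))
```

For this smooth example the spread shrinks roughly in proportion to the mesh, which is the behaviour that the convergence condition on the sets of sums asks for.
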


43.11.4 REMARK: Stieltjes integral for finite-dimensional vector-valued integrands. 

In the case of finite-dimensional linear spaces W over R in Definition 43.11.3, one may easily decompose the 
integral into components with respect to a basis. Each component of the integral may be written in terms of 
supremum and infimum expressions as in Definition 43.10.4. (This formulation for vector-valued integrands 
in terms of the Riemann-Stieltjes integrals of components is described by Rudin [129], pages 116-117. An $\varepsilon$-$\delta$
version of Definition 43.11.3 for Euclidean spaces $\mathbb{R}^n$, which is easily extendable to general Banach spaces,
is given by A.E. Taylor [145], page 393.)


43.12. Stieltjes integral for linear operator integrands


43.12.1 REMARK: Stieltjes integrals for linear operator valued integrands. 
For applications to ordinary differential equations along rectifiable curves, it is useful to generalise the 
Stieltjes integral to integrands which are effectively linear operators acting on vector integrator differentials. 


Suppose that the real-valued integrator curve $\alpha$ in Definition 43.10.4 is replaced by a map $\alpha : [a,b] \to U$
for some linear space $U$, and the integrand space $\mathbb{R}$ is replaced by a linear space $W$. In Definition 43.11.3,
$W$ is generalised from $\mathbb{R}$ to a topological linear space. In this case, the products $(\alpha(x_j) - \alpha(x_{j-1})) f(t_j)$ are
scalar multiples of vectors $f(t_j) \in W$ by scalars $\alpha(x_j) - \alpha(x_{j-1}) \in \mathbb{R}$. This situation can be reversed by
making $U$ a linear space while $W = \mathbb{R}$, which corresponds to an integrator curve $\alpha$ in $U$ along which a scalar
function $f$ is integrated.


In case both U and W are linear spaces, one must define some kind of product of vectors in these two spaces. 
The case $U = W = \mathbb{C}$ gives a particularly useful kind of integral for complex integrands $f$ along rectifiable
curves $\gamma$ in $\mathbb{C}$. This yields the standard expression $\int_\gamma f(z)\,dz = \int_a^b f(\gamma(t))\,d\gamma(t)$ for complex functions and
curves. (See Riesz/Szőkefalvi-Nagy [125], page 105; Rudin [129], pages 122-123; Ahlfors [45], pages 104-105;
Phillips [121], pages 85-88.) The “vector” product in this case is a complex number product, but it may be
thought of as the action of a linear operator $f : U \to W$ on a vector in $U$, where $U = W = \mathbb{R}^2 \equiv \mathbb{C}$. If
$z_j = x_j + i y_j$ for $j = 1,2,3$, and $z_3 = z_1 z_2$, then

$$\begin{bmatrix} x_3 \\ y_3 \end{bmatrix} = \begin{bmatrix} x_1 & -y_1 \\ y_1 & x_1 \end{bmatrix} \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}.$$

Thus one may think of pre-multiplication by $z_1 \in \mathbb{C}$ as the action of a linear operator, which is written here
in matrix form. It is then only a short step to permit linear operator valued functions for much more general
linear spaces $U$ and $W$. (This is hinted at by A.E. Taylor [145], page 393.)
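
The identification of complex multiplication with a real 2x2 matrix action can be checked directly. This is a small illustrative sketch only; the helper names `as_matrix` and `apply` and the sample values are hypothetical.

```python
def as_matrix(z):
    """Real 2x2 matrix of the linear map w -> z*w on R^2, identified with C."""
    x, y = z.real, z.imag
    return ((x, -y), (y, x))

def apply(m, v):
    """Apply a 2x2 matrix m to a vector v = (v0, v1)."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

z1 = complex(1.5, -2.0)   # hypothetical sample values
z2 = complex(0.25, 3.0)
z3 = z1 * z2

# Matrix action of z1 on the real and imaginary parts of z2.
x3, y3 = apply(as_matrix(z1), (z2.real, z2.imag))
print((x3, y3), (z3.real, z3.imag))
```
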


If the factor $\alpha(x_j) - \alpha(x_{j-1})$ in the product $f(t_j)(\alpha(x_j) - \alpha(x_{j-1}))$ in Definition 43.11.3 becomes an element
of a linear space $U$ because $\alpha : [a,b] \to U$ has become a curve in $U$, then $f(t_j)$ should be a linear map from
$U$ to $W$ for each $t_j \in [a,b]$. In other words, $f$ should have the form $f : [a,b] \to \mathrm{Lin}(U, W)$.


In the application to the Picard iteration procedure for ordinary differential equations, $f$ has the form
$A \circ z$, where $A$ and $z$ have the forms $A : W \to \mathrm{Lin}(U, W)$ and $z : [a,b] \to W$ respectively. Then the map
$z' : x \mapsto z(a) + \int_a^x f(t)\,d\alpha(t) = z(a) + \int_a^x A(z(t))\,d\alpha(t)$ has the form $z' : [a,b] \to W$, which (not) coincidentally
is the same as the form of $z$. This enables $z'$ to be substituted for $z$ to generate an infinite sequence of iterated
integrals. (See Section 44.6 for ordinary differential equations systems and the Picard iteration method.)
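
The substitution of the new curve for the old one can be sketched in the simplest scalar setting. The following is a hedged illustration under strong assumptions which are not in the original text: $U = W = \mathbb{R}$, the integrator is $\alpha(t) = t$, the operator $A(w)$ acts on $U$ as multiplication by $w$, and each Stieltjes integral is replaced by a left-tagged sum on a fixed uniform grid. The function name `picard_step` is hypothetical.

```python
import math

def picard_step(A, z, alpha_vals, z_a):
    """One Picard step on a grid.

    Returns the list of values z(a) + sum_{j<=n} A(z(x_{j-1})) * (alpha(x_j) - alpha(x_{j-1})),
    which is a left-tagged Stieltjes sum approximating z(a) + integral of A(z(t)) d alpha(t).
    """
    new_z = [z_a]
    for j in range(1, len(z)):
        increment = A(z[j - 1]) * (alpha_vals[j] - alpha_vals[j - 1])
        new_z.append(new_z[-1] + increment)
    return new_z

n = 1000
xs = [j / n for j in range(n + 1)]   # grid on [0, 1]
alpha_vals = xs[:]                   # assumed integrator alpha(t) = t
A = lambda w: w                      # assumed operator: A(w) multiplies the increment by w
z = [1.0] * (n + 1)                  # initial guess: the constant curve z(a) = 1

for _ in range(30):
    z = picard_step(A, z, alpha_vals, 1.0)

print(z[-1], math.e)
```

With these choices the iterates approach the grid solution of $z' = z$, $z(0) = 1$, so the final value is close to $e$, up to the discretisation error of the left-tagged sums.
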


43.12.2 DEFINITION: A vector-valued integrator curve is a function $f : [a,b] \to U$ of bounded variation on
a bounded closed real-number interval [a, b] for some finite-dimensional real linear space U. 


43.12.3 DEFINITION: Stieltjes integral for a linear-operator-valued integrand and vector-valued integrator. 
A Stieltjes integrable linear-operator-valued function on a bounded closed interval $[a,b]$ with respect to a
vector-valued integrator curve $\alpha : [a,b] \to U$ is a vector-valued function $f : [a,b] \to \mathrm{Lin}(U,W)$, for some
finite-dimensional real linear space $U$ and topological linear space $W$, which satisfies

$$\exists w \in W,\ \forall \Omega \in \mathrm{Top}_w(W),\ \exists \delta \in \mathbb{R}^+,\ S_\delta \subseteq \Omega, \qquad (43.12.1)$$

where

$$\forall \delta \in \mathbb{R}^+,\quad S_\delta = \Big\{ \sum_{j=1}^k f(t_j)(\alpha(x_j) - \alpha(x_{j-1}));\ (x_j)_{j=0}^k \in \mathrm{Part}_\delta(a,b) \text{ and } \forall j \in \mathbb{N}_k,\ t_j \in [x_{j-1}, x_j] \Big\}.$$

The Stieltjes integral of a Stieltjes integrable linear-operator-valued function $f : [a,b] \to \mathrm{Lin}(U,W)$ with
respect to a vector-valued integrator curve $\alpha : [a,b] \to U$, for a finite-dimensional real linear space $U$ and
topological linear space $W$, is the value of $w$ in line (43.12.1).


Chapter 44


DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS 


44.1 Logarithmic and exponential functions
44.2 Trigonometric functions
44.3 Ordinary differential equations, Peano method
44.4 Ordinary differential equations, Peano method, systems
44.5 Ordinary differential equations, uniqueness
44.6 Ordinary differential equations, Picard method
44.7 Ordinary differential equations, Picard method, systems
44.8 Calculus of variations on Cartesian spaces


44.0.1 REMARK: Ordinary differential equations. 

ODE existence and uniqueness theory is required in differential geometry to justify the well-definition of 
integral curves generated by vector fields, parallel transport generated by connections on vector bundles, 
and one-parameter subgroups of Lie groups generated by Lie algebra elements. (These applications are also 
summarised in Remark 44.6.3.) 


44.0.2 REMARK: Special functions and differential equations. 

Differential equations and special functions are grouped into a single chapter, partly for convenience (to keep 
the chapter lengths roughly equal), and partly because special functions and differential equations motivate 
each other. 


Special functions can typically be defined as solutions of differential equations, or as integrals, or as power 
series. Since differential equations are typically solved by integration of some kind, it makes sense to first 
present integration so that special functions can be defined as integrals, then introduce some special functions 
and show that they satisfy some particular differential equations, and then finally present some differential 
equations theory, using the properties of special functions for motivation and in examples. Most of the theory 
of differential equations is motivated by the desire to extend special cases to much more general classes. 


To be called "special", functions generally require some kind of analytical limit procedure for their definition. 
In other words, they should not be constructible by algebra alone. The special functions presented here are 
the most elementary, namely the logarithm, exponential and trigonometric functions. These functions range 
in age from about 400 to 2400 years. So in this sense, they are no longer truly “special”. They are the 
least special of all special functions. The exponential in particular is almost as essential in analysis as the 
real-number sum and product operations. 


44.0.3 REMARK: Ordinary differential equations are a generalisation of integral calculus. 

An ordinary differential equation is a differential equation in which all derivatives are with respect to a single 
variable. The simplest kind of ODE seeks a function $F : I \to \mathbb{R}$ satisfying $\forall x \in I$, $F'(x) = f(x)$ for a given
real interval $I$ and function $f : I \to \mathbb{R}$. This is the principal task of integral calculus. Thus ODE theory may
be regarded as a natural generalisation of integral calculus. Unsurprisingly, integral calculus is the principal 
tool for solving ODEs. 


44.0.4 REMARK: Practical methods for solving ordinary differential equations. 
There is an extensive literature on practical methods for solving ODEs. (See for example Tenenbaum/
Pollard [148], pages 1-790; Kreyszig [105], pages 1-269; Coddington/Levinson [62], pages 108-416; Bear [55],
pages 1-139; Coddington [61], pages 33-183; Murray/Miller [119], pages 117-153.) Therefore very little is 
said here about practical methods of solution except in a few basic cases where solutions are required for 
proving some kinds of theorems. 


44.0.5 REMARK: Partial differential equations. 

Partial differential equations are not presented here because the theory is too huge. (See the scope summary in 
Remark 1.6.3 (7).) Even in flat space, PDE theory is far too big for this book. Differential geometry provides 
a substrate on which partial differential equations may be defined and solved, but ordinary differential 
equations are part of that substrate. Thus one may say that ODEs are part of the DG infrastructure, 
whereas PDEs are part of the superstructure. 


44.1. Logarithmic and exponential functions 


44.1.1 REMARK: The logarithm arises naturally as an integral. 

In terms of Taylor series or solutions of differential equations, the exponential function is more natural than 
the logarithm function. But in terms of integrals, the logarithm is more natural. The exponential is not an 
integral of some simpler function such as a quotient of polynomials, whereas the logarithm arises naturally 
as the integral of $x^{-1}$. Therefore in this section, the exponential is defined in terms of the logarithm.


44.1.2 DEFINITION: The logarithm function is the function $\ln : \mathbb{R}^+ \to \mathbb{R}$ defined by

$$\forall x \in \mathbb{R}^+,\quad \ln(x) = \int_1^x t^{-1}\,dt.$$

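
The integral definition of the logarithm can be evaluated directly by numerical quadrature. The sketch below is illustrative only; it assumes a midpoint rule with a fixed number of steps, and the function name `ln_by_integral` is hypothetical. The standard library `math.log` is used purely for comparison.

```python
import math

def ln_by_integral(x, steps=100_000):
    """Approximate ln(x) as the integral of 1/t from 1 to x by the midpoint rule."""
    if x <= 0:
        raise ValueError("the logarithm is defined only for positive real numbers")
    h = (x - 1.0) / steps   # negative when x < 1, which orients the integral correctly
    return sum(h / (1.0 + (i + 0.5) * h) for i in range(steps))

for x in (0.5, 2.0, 3.0, 6.0):
    print(x, ln_by_integral(x), math.log(x))
```

The printed values also illustrate the product rule $\ln(kx) = \ln(k) + \ln(x)$, for instance with $k = 2$ and $x = 3$.
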
44.1.3 THEOREM: Some basic properties of the logarithm function. 


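(ii) $\forall x \in \mathbb{R}^+$, $\frac{d}{dx} \ln(x) = x^{-1}$.
(iv) $\forall x \in \mathbb{R}^+$, $1 - x^{-1} \le \ln(x) \le x - 1$.
(v) $\forall x_1, x_2 \in \mathbb{R}^+$, $x_1 < x_2 \Rightarrow \ln(x_1) < \ln(x_2)$. In other words, “ln” is a strictly increasing function.
(vi) $\ln : \mathbb{R}^+ \to \mathbb{R}$ is a bijection.
(vii) $\ln : \mathbb{R}^+ \to \mathbb{R}$ is a concave function.
(viii) $\forall k, x \in \mathbb{R}^+$, $\ln(kx) = \ln(k) + \ln(x)$.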


PROOF: Part (i) follows from Definition 44.1.2 and Theorem 43.8.3 (FTOC 1). 

Part (ii) follows from Definition 44.1.2 and Theorem 43.8.3 (FTOC 1). 

Part (iii) follows from Definition 44.1.2 and Theorem 43.7.7. 

Part (iv) follows from Definition 44.1.2 and Theorems 43.7.9 (i, iv) and 43.7.7 because $x^{-1} \le t^{-1} \le 1$ for all
$t \in [1,x]$, for all $x \in [1,\infty)$, and $x^{-1} \ge t^{-1} \ge 1$ for all $t \in [x,1]$, for all $x \in (0,1]$.

For part (v), $\ln(x_2) = \int_1^{x_2} t^{-1}\,dt = \int_1^{x_1} t^{-1}\,dt + \int_{x_1}^{x_2} t^{-1}\,dt \ge \int_1^{x_1} t^{-1}\,dt + (x_2 - x_1) x_2^{-1} > \int_1^{x_1} t^{-1}\,dt = \ln(x_1)$ by
Definition 44.1.2 and Theorems 43.7.8 and 43.7.9 (i, iv). (Alternatively apply Theorem 40.6.9 (ii) to part (ii).)

For part (vi), injectivity follows from part (v). Surjectivity follows from part (iv) and Theorem 34.9.7. 


Part (vii) follows from Theorem 40.6.11 (vi) because $x \mapsto \frac{d}{dx} \ln x$ is non-increasing by part (ii).


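For part (viii),

$$\begin{aligned}
\forall k, x \in \mathbb{R}^+,\quad \ln(kx) &= \int_1^{kx} t^{-1}\,dt \\
&= \int_1^x t^{-1}\,dt + \int_x^{kx} t^{-1}\,dt && (44.1.1) \\
&= \int_1^x t^{-1}\,dt + \int_1^k (xs)^{-1} x\,ds && (44.1.2) \\
&= \int_1^x t^{-1}\,dt + \int_1^k s^{-1}\,ds \\
&= \ln x + \ln k,
\end{aligned}$$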


where line (44.1.1) follows from Theorem 43.7.8, and line (44.1.2) follows from Theorem 43.8.9 with the 
substitution t = xs. 


Part (ix) follows from parts (viii) and (iii) by letting $k = x^{-1}$.


Part (x) follows from parts (viii) and (ix). 


44.1.4 REMARK: The exponential function. 

The exponential function is probably the most fundamental and essential of all special functions. It is 
difficult to do much analysis without it. Often it is presented as a power series because this uses the most 
elementary analysis to bring it into existence. However, the properties of the exponential are easier to derive 
from the differential equation which it satisfies by Theorem 44.1.9 (iii), which follows from its definition as 
the inverse of the logarithm, which may be constructed as an integral. The power series follows easily from 
the differential equation. (See Theorem 44.1.9 (xii, xiii).) 


44.1.5 DEFINITION: The exponential of a real number. 
The exponential function is the function $\exp : \mathbb{R} \to \mathbb{R}^+$ which is the inverse of the logarithm function.


44.1.6 NOTATION: $e$ denotes the real number $\exp(1)$. In other words, $e$ is the unique solution of $\ln e = 1$.
$e^x$, for $x \in \mathbb{R}$, denotes $\exp(x)$.


44.1.7 REMARK: Logarithms and exponentials as inverses of each other. 

Definition 44.1.5 means that $\forall x \in \mathbb{R}$, $\ln(\exp(x)) = x$. In other words, $\ln \circ \exp = \mathrm{id}_{\mathbb{R}}$. The equation
$\ln(y) = x$ has one and only one solution $y$ for each $x \in \mathbb{R}$ because the logarithm function is injective and its range
is $\mathbb{R}$. Equivalently, exp may be expressed as the left inverse of ln. That is, $\exp \circ \ln = \mathrm{id}_{(0,\infty)}$. In other
words, $\forall x \in \mathbb{R}^+$, $\exp(\ln(x)) = x$.


44.1.8 REMARK: The choice of which functions should be fundamental. 

It may seem a little troubling that such a basic function as the logarithm is defined as an integral. Integrals 
are cumbersome to calculate in practice. Defining the exponential function as the inverse of an integral 
means that it is defined as the solution of an integral equation. Thus $y = \exp(x)$ is defined as the solution
of $\int_1^y t^{-1}\,dt = x$. The logarithm and exponential functions are often defined in terms of Taylor series which
are better suited to computers which primarily offer addition and multiplication operations. 
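
To make the integral-equation viewpoint concrete, the following sketch computes the exponential by solving the integral equation for the upper limit of integration. It is an illustration under assumptions which are not in the original text: the integral is approximated by a midpoint rule, the equation is solved by bisection, and the names `ln_quad` and `exp_by_inversion` are hypothetical. The accuracy is limited by the quadrature, not by the bisection.

```python
import math

def ln_quad(y, steps=2000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to y, for y > 0."""
    h = (y - 1.0) / steps
    return sum(h / (1.0 + (i + 0.5) * h) for i in range(steps))

def exp_by_inversion(x, iterations=60):
    """Solve ln_quad(y) = x for y by bisection, using that the integral increases with y."""
    lo, hi = 1.0, 1.0
    while ln_quad(hi) < x:      # widen the bracket upwards for positive x
        hi *= 2.0
    while ln_quad(lo) > x:      # widen the bracket downwards for negative x
        lo /= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ln_quad(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (-1.0, 0.0, 1.0):
    print(x, exp_by_inversion(x), math.exp(x))
```
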


As always, one’s choice of definition can be optimised for a given range of applications. Integrals have 
some advantages for defining transcendental functions. Integrals don’t require convergence tests to ensure 
that they are well-defined. It is easier to show that integral-defined functions are solutions to differential 
equations. So for many analysis purposes, integral definitions are better. Series expansions are usually quite 
easy to derive from integral definitions. 


In the olden days, when analogue and digital computers were considered to both be viable contenders to 
be the future dominant category of computer, it was often observed in electronics texts (and by teachers) 
that one of the most important building blocks of analogue computers, namely “operational amplifiers” (also 
called “op amps”), could be configured to either integrate or differentiate an input signal (so as to solve 
ordinary differential equations), and input noise in the differentiating configuration was greatly amplified in 
the output, whereas input noise in the integrating configuration was greatly reduced. Thus in a sense, one 
may say that defining standard special functions as integrals has the advantage of “reducing input noise”. On 
the other hand, as noted in Remark 43.0.1, integration is mathematically more onerous than differentiation. 
It is a “more advanced technology”. 


44.1.9 THEOREM: Some basic properties of the exponential function. 
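(i) $\forall x \in \mathbb{R}$, $\exp(x) > 0$.
(ii) $\exp \in C^0(\mathbb{R}, \mathbb{R})$.
(iii) $\forall x \in \mathbb{R}$, $\frac{d}{dx} \exp(x) = \exp(x)$.
(iv) $\exp(0) = 1$.
(v) $\forall y \in \mathbb{R}$, $1 + y \le \exp(y)$, and $\forall y \in (-\infty, 1)$, $\exp(y) \le (1 - y)^{-1}$.
(vi) $\forall y_1, y_2 \in \mathbb{R}$, $y_1 < y_2 \Rightarrow \exp(y_1) < \exp(y_2)$. In other words, “exp” is a strictly increasing function.
(vii) $\exp : \mathbb{R} \to \mathbb{R}^+$ is a bijection.
(viii) $\exp : \mathbb{R} \to \mathbb{R}^+$ is a convex function.
(ix) $\forall x, y \in \mathbb{R}$, $\exp(x + y) = \exp(x) \exp(y)$.
(x) $\forall y \in \mathbb{R}$, $\exp(-y) = 1/\exp(y)$.
(xi) $\forall y \in \mathbb{R}$, $\exp(y) = 1 + \int_0^y \exp(t)\,dt$.
(xii) $\forall k \in \mathbb{Z}_0^+$, $\forall y \in \mathbb{R}^+$, $\exp(y) > \sum_{i=0}^k y^i / i!$.
(xiii) $\forall k \in \mathbb{Z}_0^+$, $\forall y \in [0,1]$, $\exp(y) \le \sum_{i=0}^k y^i / i! + (e - 1) y^{k+1} / (k + 1)!$.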


PROOF: Part (i) follows from Definition 44.1.5 because $\mathrm{Dom}(\ln) = \mathbb{R}^+$.
Part (ii) follows from Definition 44.1.5 and Theorems 44.1.3 (i) and 34.9.25.
For part (iii), $\ln(\exp(x)) = x$ for all $x \in \mathbb{R}$ by Definition 44.1.5. So $\exp(x)^{-1} \frac{d}{dx} \exp(x) = 1$ by Definition 44.1.2,
Theorem 40.5.15 (the chain rule) and Theorem 43.8.3 (FTOC 1). Hence $\frac{d}{dx} \exp(x) = \exp(x)$.
Part (iv) follows from Definition 44.1.5 and Theorem 44.1.3 (iii). 

Part (v) follows from Definition 44.1.5 and Theorem 44.1.3 (iv). 

Part (vi) follows from Definition 44.1.5 and Theorem 44.1.3 (v). (Alternatively apply Theorem 40.6.9 (ii) to 
parts (iii) and (i).) 

Part (vii) follows from Definition 44.1.5 and Theorem 44.1.3 (vi). 

Part (viii) follows from Theorem 40.6.11 (iii) because $x \mapsto \frac{d}{dx} \exp(x)$ is non-decreasing by parts (iii) and (vi).
Part (ix) follows from Definition 44.1.5 and Theorem 44.1.3 (viii). 

Part (x) follows from Definition 44.1.5 and Theorem 44.1.3 (ix). Alternatively by parts (ix) and (iv). 

Part (xi) follows from parts (iii) and (iv), and Theorem 43.8.5 (FTOC 2). 

For part (xii), the case $k = 0$ follows from parts (iv) and (vi). Now assume that the proposition is true for some
$k \in \mathbb{Z}_0^+$. Then $\exp(y) > \sum_{i=0}^k y^i / i!$ for all $y \in \mathbb{R}^+$. Let $y \in \mathbb{R}^+$. Then by part (xi) and Theorems 43.7.9 (iv),
43.8.5 (FTOC 2) and 40.5.12,
$\exp(y) = 1 + \int_0^y \exp(t)\,dt > 1 + \int_0^y \sum_{i=0}^k t^i / i!\,dt = 1 + \sum_{i=0}^k \int_0^y t^i / i!\,dt = 1 + \sum_{i=0}^k y^{i+1} / (i + 1)! = \sum_{i=0}^{k+1} y^i / i!$.
Thus the proposition is true for case $k + 1$. Hence by induction, the
proposition is true for all $k \in \mathbb{Z}_0^+$. (Note that the strict inequality between the integrals follows from the
continuity of exp by part (ii).)

For part (xiii), the case $k = 0$ follows from part (viii), Notation 44.1.6 and Definition 23.12.2. Thus
$\exp(y) \le 1 + (e - 1) y$ for all $y \in [0,1]$. Assume that the assertion holds for some $k \in \mathbb{Z}_0^+$. Then
$\exp(y) \le \sum_{i=0}^k y^i / i! + (e - 1) y^{k+1} / (k + 1)!$ for all $y \in [0,1]$. So by part (xi) and Theorems 43.7.9 (iv), 43.8.5 (FTOC 2)
and 40.5.12, $\exp(y) \le 1 + \int_0^y \big( \sum_{i=0}^k t^i / i! + (e - 1) t^{k+1} / (k + 1)! \big)\,dt = \sum_{i=0}^{k+1} y^i / i! + (e - 1) y^{k+2} / (k + 2)!$ for all
$y \in [0,1]$. Hence by induction, the assertion holds for all $k \in \mathbb{Z}_0^+$.
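
The two series bounds, a strict lower bound by the partial sums and an upper bound with remainder term $(e - 1) y^{k+1} / (k+1)!$ on $[0,1]$, can be spot-checked numerically. This is a sanity check on sample points, not a proof; the helper names are hypothetical and `math.exp` serves as the reference value.

```python
import math

def partial_sum(y, k):
    """Sum of y**i / i! for i = 0, ..., k."""
    return sum(y**i / math.factorial(i) for i in range(k + 1))

def upper_bound(y, k):
    """Partial sum plus the remainder bound (e - 1) * y**(k+1) / (k+1)!, intended for y in [0, 1]."""
    return partial_sum(y, k) + (math.e - 1.0) * y**(k + 1) / math.factorial(k + 1)

for k in range(4):
    for y in (0.25, 0.5, 1.0):
        print(k, y, partial_sum(y, k), math.exp(y), upper_bound(y, k))
```

Each printed row shows the exponential sandwiched between the two bounds, with the gap shrinking as $k$ grows.
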


44.1.10 EXAMPLE: The function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} \exp(-x^{-1}) & x > 0 \\ 0 & x \le 0 \end{cases}$$

is a $C^\infty$ function on $\mathbb{R}$. (See Figure 44.1.1.)
Similarly, the function $g_R : \mathbb{R}^n \to \mathbb{R}$ defined for $R > 0$ by

$$g_R(x) = \begin{cases} \exp((|x|^2 - R^2)^{-1}) & |x| < R \\ 0 & |x| \ge R \end{cases}$$

is a $C^\infty$ function on $\mathbb{R}^n$. (See Figure 44.1.1.)


Figure 44.1.1 $C^\infty$ functions $f(x) = \exp(-x^{-1})$ and $g_R(x) = \exp((x^2 - R^2)^{-1})$; $R = 1$


44.1.11 EXAMPLE: How to construct a $C^\infty$ smooth step function in terms of the exponential.
The function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} 0 & x \le 0 \\ (1 + \exp(x^{-1} - (1 - x)^{-1}))^{-1} & x \in (0,1) \\ 1 & x \ge 1 \end{cases}$$

is a $C^\infty$ function on $\mathbb{R}$. (See Figure 44.1.2.)


Figure 44.1.2 $C^\infty$ function which is constant outside $[0, 1]$


This function was arrived at by first finding a function $\tanh(x)$ which is $C^\infty$ and bounded between two
finite values, and then finding another function $(1 - x)^{-1} - x^{-1}$ which maps the finite interval $(0,1)$ to
the doubly infinite interval $(-\infty, \infty)$. When these two functions are composed, the result is a function
which has the desired properties but whose range lies in the interval $[-1,1]$. This was adjusted by noting
that $(\tanh(z/2) + 1)/2 = (1 + e^{-z})^{-1}$.
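
A direct implementation of this smooth step may help when experimenting with it. The sketch below is an illustration only: the function name `smooth_step` is hypothetical, and the logistic expression is evaluated in a branch-wise form (an implementation choice, not taken from the text) so that the exponential never overflows near the endpoints.

```python
import math

def smooth_step(x):
    """Smooth step: 0 for x <= 0, 1 for x >= 1, and 1 / (1 + exp(1/x - 1/(1 - x))) in between."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    z = 1.0 / (1.0 - x) - 1.0 / x        # maps (0, 1) onto the whole real line
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)                      # z < 0, so this cannot overflow
    return e / (1.0 + e)

for x in (-0.5, 0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0, 1.5):
    print(x, smooth_step(x))
```

The function satisfies the symmetry `smooth_step(x) + smooth_step(1 - x) == 1` up to rounding, which reflects the identity $(\tanh(z/2) + 1)/2 = (1 + e^{-z})^{-1}$ applied to $z$ and $-z$.
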


44.1.12 EXAMPLE: A smooth function with compact support, constant inside a ball. 
The function $g_{r,R} : \mathbb{R}^n \to \mathbb{R}$ defined for $n \in \mathbb{Z}^+$ and $r, R \in \mathbb{R}$ with $0 \le r < R$ by

$$\forall x \in \mathbb{R}^n,\quad g_{r,R}(x) = \begin{cases} 1 & |x| \le r \\ (1 + \exp((R - |x|)^{-1} - (|x| - r)^{-1}))^{-1} & |x| \in (r, R) \\ 0 & |x| \ge R \end{cases}$$

is a $C^\infty$ function on $\mathbb{R}^n$ which is zero outside $B_{0,R}$ and equal to 1 inside $B_{0,r}$. (See Figure 44.1.3.)


This function is useful for constructing $C^\infty$ functions with compact support with prescribed properties
within a given region. For example, if the function $g_{r,R}$ is multiplied by a general polynomial
$P(x) = \sum_{\alpha} c_\alpha \prod_{i=1}^n x_i^{\alpha_i}$ in $n$ variables, then all of the derivatives of the pointwise product function $P.g_{r,R}$ are
arbitrarily determined at $x = 0$ by the choice of coefficients $c_\alpha$.


44.1.13 EXAMPLE: The function $g : (-\infty,1) \to \mathbb{R}$ defined by

$$g(y) = \begin{cases} 0 & y \le 0 \\ 1/\ln(y^{-1}) & y \in (0,1) \end{cases}$$

is a $C^0$ function on $(-\infty, 1)$, and a $C^\infty$ function on $\mathbb{R}^- \cup (0, 1)$. (See Figure 44.1.4.)


Figure 44.1.3 $C^\infty$ function which is zero outside $B_{0,R}$; cross-section $x_2, \ldots, x_n = 0$; $r = 1$, $R = 2$


Figure 44.1.4 Continuous function $g$ which is not Hölder continuous


This is the inverse of the function $f$ in Example 44.1.10, which is very smooth at the origin, whereas $g$ grows
so rapidly (or shrinks so slowly) at the origin that it is almost like a vertical line. This demonstrates in
particular that a $C^0$ function could have a very poor modulus of continuity, even if multiplied by arbitrary
fractional-positive power functions. (See Definition 38.3.9 for the modulus of continuity.) This function is
not $\alpha$-Hölder continuous for any $\alpha \in (0,1)$. (See Section 38.7 for $\alpha$-Hölder continuity.)


Define $g_\alpha : (-\infty,1) \to \mathbb{R}$ for $\alpha \in (0,1]$ by $g_\alpha(y) = g(y)/y^\alpha$ for $y \in (0,1)$ and $g_\alpha(y) = 0$ for $y \le 0$. Then
$\lim_{y\to 0^+} g_\alpha(y) = \lim_{x\to\infty} g_\alpha(1/x) = \lim_{x\to\infty} x^\alpha / \ln x = \infty$ because the logarithm grows more slowly than
any positive power. Therefore $\forall \alpha \in (0,1]$, $\forall k \in \mathbb{R}^+$, $\forall \delta \in (0,1)$, $\exists y \in (0,\delta)$, $g(y) > k y^\alpha$. In other words,
there is no $\alpha \in (0,1)$ for which $g$ is $\alpha$-Hölder continuous.
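
The divergence of the quotient $g(y)/y^\alpha$ as $y$ tends to zero from above can be observed numerically, although only for extremely small $y$ when $\alpha$ is small. The sketch below is an illustration with hypothetical helper names and a hypothetical sample exponent $\alpha = 0.1$.

```python
import math

def g(y):
    """g(y) = 1 / ln(1/y) for y in (0, 1), and 0 for y <= 0."""
    return 1.0 / math.log(1.0 / y) if y > 0.0 else 0.0

def holder_ratio(y, alpha):
    """The quotient g(y) / y**alpha, which is unbounded as y tends to 0 from above."""
    return g(y) / y**alpha

alpha = 0.1   # hypothetical sample exponent
for y in (1e-50, 1e-100, 1e-200):
    print(y, g(y), holder_ratio(y, alpha))
```

The values of $g$ itself decrease very slowly, while the quotient grows without bound, so no inequality of the form $g(y) \le k y^\alpha$ can hold near zero.
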


The right derivative of $g$ at 0 is infinite, even if it is multiplied by some fractional positive power. Define
$h_\beta : (-\infty,1) \to \mathbb{R}$ for $\beta \in [0,1)$ by $h_\beta(y) = y^\beta g(y)$ for $y \in (0,1)$ and $h_\beta(y) = 0$ otherwise. Then
$\partial^+ h_\beta(0) = \lim_{y\to 0^+} y^\beta g(y)/y = \lim_{y\to 0^+} g_{1-\beta}(y) = \infty$. Thus $\partial_y^+ (y^\beta g(y))\big|_{y=0} = \infty$ for all $\beta \in [0,1)$.


44.2. Trigonometric functions


44.2.1 REMARK: The importance of trigonometric functions. 
Trigonometric functions are needed for the study of spheres, which provide important examples of most 
things in differential geometry. 


44.2.2 REMARK: Inverse trigonometric functions are integrals of algebraic functions. 

In terms of Taylor series, the trigonometric functions sin, cos and tan are more natural than the inverse 
trigonometric functions. But in terms of integrals, the inverse functions are more natural. The sin, cos and 
tan functions cannot be constructed as integrals of algebraic functions. The inverse trigonometric functions 
do arise naturally as integrals of simple algebraic functions. Therefore in Section 44.2, the trigonometric 
functions are defined in terms of their inverses. More generally, there are three main ways in which one may 
define “special functions”, including the trigonometric functions. 


(1) Taylor series. Example: $\sin(x) = \sum_{i=0}^\infty (-1)^i x^{2i+1} / (2i + 1)!$.
(2) Integrals of algebraic functions. Example: $\arcsin(x) = \int_0^x (1 - t^2)^{-1/2}\,dt$.
(3) Initial value problems for differential equations.

Example: $(d^2/dx^2) \sin(x) = -\sin(x)$ with $\sin(0) = 0$ and $(d/dx) \sin(0) = 1$.


44.2.3 DEFINITION: The one-argument inverse trigonometric functions are defined as follows. 


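$$\forall x \in \mathbb{R},\quad \arctan(x) = \int_0^x (1 + t^2)^{-1}\,dt$$
$$\forall x \in [-1,1],\quad \arcsin(x) = \int_0^x (1 - t^2)^{-1/2}\,dt$$
$$\forall x \in [-1,1],\quad \arccos(x) = \int_x^1 (1 - t^2)^{-1/2}\,dt.$$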


44.2.4 REMARK: The inverse trigonometric functions are injective. 

The inverse trigonometric functions are illustrated in Figure 44.2.1. It is clear from the definitions that these 
functions are injective. The arctangent, arcsine and arccosine functions are often abbreviated to atan, asin 
and acos respectively. 


Figure 44.2.1 The atan, asin and acos functions


44.2.5 DEFINITION: $\pi = 4 \arctan(1)$.
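
The integral definitions of the inverse trigonometric functions, and the resulting definition of the circle constant as four times the arctangent of 1, can be evaluated by simple quadrature. The following sketch is illustrative; it assumes a midpoint rule, the helper names are hypothetical, and `math.pi` is used only for comparison. The arcsine is evaluated away from the endpoints, where its integrand is unbounded.

```python
import math

def midpoint_integral(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def arctan_by_integral(x):
    """Integral of 1 / (1 + t^2) from 0 to x."""
    return midpoint_integral(lambda t: 1.0 / (1.0 + t * t), 0.0, x)

def arcsin_by_integral(x):
    """Integral of (1 - t^2)^(-1/2) from 0 to x, intended for |x| well below 1."""
    return midpoint_integral(lambda t: (1.0 - t * t) ** -0.5, 0.0, x)

pi_estimate = 4.0 * arctan_by_integral(1.0)
print(pi_estimate, math.pi)
print(6.0 * arcsin_by_integral(0.5), math.pi)
```
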


44.2.6 REMARK: The origin of the notation $\pi$.

A Welsh mathematician, William Jones (1675-1749), used the $\pi$ notation in 1706 in Synopsis Palmariorum
Matheseos. Euler used the $\pi$ notation in his Variae observationes circa series infinitas, page 165, presented
to the St. Petersburg Academy in 1737, but not published until 1744.


44.2.7 REMARK: Some properties of $\pi$.
The number $\pi$ in Definition 44.2.5 also satisfies $\pi = 2 \arctan(\infty) = \int_{-\infty}^{\infty} (1 + t^2)^{-1}\,dt$. Then
$\arccos(x) = \pi/2 - \arcsin(x)$ for all $x \in [-1,1]$.


Note also that $\pi = 2 \arcsin(1) = \arccos(-1) = \int_{-1}^{1} (1 - t^2)^{-1/2}\,dt$. (There are a zillion such formulas for $\pi$.)


44.2.8 REMARK: The most natural multiple of $\pi$.

It is sometimes argued that a more natural and convenient constant than $\pi$ would be $2\pi$. This is based
on the observation that the combination $2\pi$ appears so often in formulas. For example, the circumference of
a circle is $2\pi r$ and 360° is equal to $2\pi$ radians.


However, as mentioned in Remark 44.2.22, the ancient Greeks expressed circle geometry in terms of the
diameter of a circle, not the Latin word “radius”. So the formula for the circumference is $\pi d$ for diameter $d$.
The radian is defined as the angle of a circle which cuts off a distance equal to the radius. If the radius is
replaced by the diameter, then a radian would be defined as twice its current definition. Then 360° would
be equal to $\pi$ radians.


One could also argue that $\pi/4$ is more natural than $\pi$ because the area of a circle is $\pi d^2/4 = (\pi/4) d^2$ and
$\pi/4 = \arctan(1)$. The idea of replacing $\pi$ with an alternative symbol representing $2\pi$ is best reserved for
April Fools Day jokes.


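44.2.9 DEFINITION: A two-parameter arctangent function $\arctan : \mathbb{R}^2 \to (-\pi, \pi]$ may be defined in terms
of the standard single-parameter version as follows.

$$\forall (x,y) \in \mathbb{R}^2,\quad \arctan(x, y) = \begin{cases}
\arctan(y/x) & \text{if } x > 0 \\
\pi + \arctan(y/x) & \text{if } x < 0 \text{ and } y \ge 0 \\
-\pi + \arctan(y/x) & \text{if } x < 0 \text{ and } y < 0 \\
\pi/2 & \text{if } x = 0 \text{ and } y > 0 \\
-\pi/2 & \text{if } x = 0 \text{ and } y < 0 \\
0 & \text{if } x = y = 0.
\end{cases}$$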
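
The case analysis can be transcribed into code and compared with the standard library function `math.atan2`, which takes its arguments in the swapped order `(y, x)`. This is a sketch for illustration; the function name `arctan2_xy` is hypothetical.

```python
import math

def arctan2_xy(x, y):
    """Two-parameter arctangent with argument order (x, y) and values in (-pi, pi]."""
    if x > 0:
        return math.atan(y / x)
    if x < 0:
        shift = math.pi if y >= 0 else -math.pi
        return shift + math.atan(y / x)
    if y > 0:
        return math.pi / 2
    if y < 0:
        return -math.pi / 2
    return 0.0

for point in ((1, 1), (-1, 1), (-1, -1), (0, 2), (0, -2), (-3, 0), (0, 0)):
    x, y = point
    print(point, arctan2_xy(x, y), math.atan2(y, x))
```
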


44.2.10 REMARK: Alternative order for the parameters of the two-parameter arctangent function. 

The two-parameter arctangent function is often defined with the two parameters swapped. This has the 
advantage that one may write “$\arctan(y, x) = \arctan(y/x)$” when $x > 0$, which is perhaps an aid to memory.
The order given in Definition 44.2.9 is better harmonised with the complex argument function. (In fact,
the “principal argument” of a complex number $z = (x,y) \in \mathbb{C} \setminus \{0\}$ is $\arctan(x, y)$.) The two-parameter
arctangent function is also sometimes denoted as “atan2” or “arctan2”.


44.2.11 REMARK: Two-parameter arcsine and arccosine functions. 

There is no significant benefit in defining two-parameter arcsine and arccosine functions analogous to the 
two-parameter arctan function. Such definitions would presumably look like $\arcsin(y, r) = \arcsin(y/r)$
and $\arccos(x, r) = \arccos(x/r)$, or possibly like $\arcsin(x, y) = \arcsin(y(x^2 + y^2)^{-1/2})$ and $\arccos(x, y) =
\arccos(x(x^2 + y^2)^{-1/2})$. The motivation for the two-parameter arctangent function lies in the fact that the
single-parameter arctangent is limited to the range $(-\pi/2, \pi/2)$, and the formula $\arctan(y/x)$ gives the same
value for positive and negative $x \in \mathbb{R} \setminus \{0\}$. These inconveniences are largely remedied by the two-parameter
arctangent function. 


44.2.12 REMARK: Some formulas for the two-parameter arctangent function. 
The 2-parameter arctan function seems to be the best basis for deriving the other trigonometric functions. 
For instance, 


$$\forall x \in [-1,1],\quad \arcsin(x) = \arctan(\sqrt{1 - x^2}, x)$$
$$\forall x \in [-1,1],\quad \arccos(x) = \arctan(x, \sqrt{1 - x^2}).$$


The 2-parameter arctan function satisfies the following.

$$\forall (x,y) \in \mathbb{R}^2,\quad \arctan(x, y) = \begin{cases}
\arccos(x(x^2 + y^2)^{-1/2}) & x^2 + y^2 > 0,\ y \ge 0 \\
-\arccos(x(x^2 + y^2)^{-1/2}) & x^2 + y^2 > 0,\ y < 0 \\
0 & x = y = 0.
\end{cases}$$


44.2.13 REMARK: Definitions of trigonometric functions in terms of inverse trigonometric functions.
Definition 44.2.14 expresses the sine, cosine and tangent functions in terms of the arcsin, arccos and arctan 
functions. The sawtooth functions used in this definition are discussed in Remark 16.5.22. 


44.2.14 DEFINITION: The functions sin, cos and tan are defined as follows. 


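$$\forall x \in \mathbb{R},\quad \cos(x) = \arccos^{-1}\big(|(x + \pi) \bmod 2\pi - \pi|\big)$$
$$\forall x \in \mathbb{R},\quad \sin(x) = \arcsin^{-1}\big(|(x + 3\pi/2) \bmod 2\pi - \pi| - \pi/2\big)$$
$$\forall x \in \mathbb{R} \setminus \{(k + 1/2)\pi;\ k \in \mathbb{Z}\},\quad \tan(x) = \arctan^{-1}\big((x + \pi/2) \bmod \pi - \pi/2\big).$$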
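
The sawtooth arguments can be sanity-checked against the standard library. The sketch below assumes the sawtooth forms $|(x + \pi) \bmod 2\pi - \pi|$ for cosine, $|(x + 3\pi/2) \bmod 2\pi - \pi| - \pi/2$ for sine and $(x + \pi/2) \bmod \pi - \pi/2$ for tangent, and checks on sample points that each one agrees with the corresponding inverse function applied to the library value. The function names are hypothetical, and the library functions stand in for the integral-defined inverses.

```python
import math

def saw_cos(x):
    """Sawtooth with values in [0, pi]; expected to equal acos(cos(x))."""
    return abs((x + math.pi) % (2 * math.pi) - math.pi)

def saw_sin(x):
    """Sawtooth with values in [-pi/2, pi/2]; expected to equal asin(sin(x))."""
    return abs((x + 1.5 * math.pi) % (2 * math.pi) - math.pi) - math.pi / 2

def saw_tan(x):
    """Sawtooth with values in [-pi/2, pi/2); expected to equal atan(tan(x)) away from the poles."""
    return (x + math.pi / 2) % math.pi - math.pi / 2

for x in (-7.0, -2.0, 0.0, 1.0, 2.0, 4.0, 10.0):
    print(x, saw_cos(x), math.acos(math.cos(x)),
          saw_sin(x), math.asin(math.sin(x)),
          saw_tan(x), math.atan(math.tan(x)))
```

Python's `%` operator returns a non-negative result for a positive modulus, which matches the mathematical "mod" assumed here for negative arguments.
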


44.2.15 THEOREM: The first derivatives of the basic inverse trigonometric functions. 
The inverse trigonometric functions have the following first derivatives. 


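$$\forall x \in \mathbb{R},\quad \partial_x \arctan x = (1 + x^2)^{-1}$$
$$\forall x \in (-1,1),\quad \partial_x \arcsin x = (1 - x^2)^{-1/2}$$
$$\forall x \in (-1,1),\quad \partial_x \arccos x = -(1 - x^2)^{-1/2}$$
$$\forall (x,y) \in \mathbb{R}^2 \setminus \{(0,0)\},\quad \partial_x \arctan(x,y) = -y(x^2 + y^2)^{-1}$$
$$\forall (x,y) \in \mathbb{R}^2 \setminus \{(0,0)\},\quad \partial_y \arctan(x,y) = x(x^2 + y^2)^{-1}.$$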


PROOF: The calculation of derivatives follows in an elementary way from Definitions 44.2.3 and 44.2.9.
The only subtlety is the overlap line for the two-parameter arctangent function, where $x = 0$ and $y \ne 0$.
It is easily verified that the function is $C^1$ differentiable across this line, and that $\partial_x \arctan(x,y) = -y^{-1}$ and
$\partial_y \arctan(x,y) = 0$ on this line.


44.2.16 THEOREM: The second derivatives of the basic inverse trigonometric functions. 
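The inverse trigonometric functions have the following second derivatives.


\[ \forall x \in \mathbb{R},\quad \partial_x^2 \arctan x = -2x(1 + x^2)^{-2} \]
\[ \forall x \in (-1,1),\quad \partial_x^2 \arcsin x = x(1 - x^2)^{-3/2} \]
\[ \forall x \in (-1,1),\quad \partial_x^2 \arccos x = -x(1 - x^2)^{-3/2} \]
\[ \forall (x,y) \in \mathbb{R}^2 \setminus \{(0,0)\},\quad \partial_x^2 \arctan(x,y) = 2xy(x^2+y^2)^{-2} \]
\[ \forall (x,y) \in \mathbb{R}^2 \setminus \{(0,0)\},\quad \partial_x \partial_y \arctan(x,y) = (-x^2+y^2)(x^2+y^2)^{-2} \]
\[ \forall (x,y) \in \mathbb{R}^2 \setminus \{(0,0)\},\quad \partial_y^2 \arctan(x,y) = -2xy(x^2+y^2)^{-2}. \]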


PROOF: The second derivatives follow from Theorem 44.2.15 by direct differentiation. 
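
The partial derivatives of the two-parameter arctangent can be spot-checked by finite differences. The sketch below assumes that the text's $\arctan(x,y)$ corresponds to `math.atan2(y, x)`, and it avoids the negative $x$-axis, where `math.atan2` has its branch cut. The helper names `partials` and `closed_forms` are hypothetical.

```python
import math

def arctan2p(x: float, y: float) -> float:
    # The text's arctan(x, y) takes x first; math.atan2 takes y first.
    return math.atan2(y, x)

def partials(f, x: float, y: float, h: float = 1e-4):
    """Central-difference estimates (f_x, f_y, f_xx, f_xy, f_yy) at (x, y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fx, fy, fxx, fxy, fyy

def closed_forms(x: float, y: float):
    """The first and second partial derivatives as rational functions."""
    s = x * x + y * y
    return (-y / s, x / s, 2 * x * y / s**2, (y * y - x * x) / s**2, -2 * x * y / s**2)
```

The point $(0, 2)$ lies on the overlap line $x = 0$, so comparing the two tuples there also illustrates the smoothness across that line.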


44.2.17 REMARK: An equation which is useful for spherical geometry. 
Theorem 44.2.18 is useful for computing the equation in terrestrial coordinates on a 2-sphere for a circle on 
the sphere with a given centre and radius. 


44.2.18 THEOREM: The possible angles of a unit vector with specified inner product to a given vector. 
The equation $a\cos\theta + b\sin\theta = c$ for $\theta \in \mathbb{R}$, $(a,b) \ne (0,0)$ and
$c^2 \le a^2 + b^2$ has the solutions
$\theta = \arctan(a,b) \pm \arccos(c(a^2+b^2)^{-1/2}) + 2n\pi$ for $n \in \mathbb{Z}$.


PROOF: This follows from the formula $a\cos\theta + b\sin\theta = \sqrt{a^2+b^2}\,\cos(\theta - \arctan(a,b))$.
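
As an illustration of Theorem 44.2.18, the following sketch computes the two solutions for $n = 0$. It assumes that $\arctan(a,b)$ in the text corresponds to `math.atan2(b, a)`. The function name `solve_acos_bsin` is hypothetical.

```python
import math

def solve_acos_bsin(a: float, b: float, c: float):
    """Return the two solutions (n = 0) of a*cos(theta) + b*sin(theta) = c."""
    r = math.hypot(a, b)
    if r == 0.0:
        raise ValueError("(a, b) must not be (0, 0)")
    if abs(c) > r:
        raise ValueError("no real solution: c^2 exceeds a^2 + b^2")
    phase = math.atan2(b, a)  # arctan(a, b) in the argument order of the text
    spread = math.acos(max(-1.0, min(1.0, c / r)))
    return phase + spread, phase - spread
```

All other solutions differ from these two by integer multiples of $2\pi$. When $c^2 = a^2 + b^2$, the two returned values coincide.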


44.2.19 REMARK: Further properties of the trigonometric functions. 

For further trigonometric function properties, see for example CRC [261], pages A-2 to A-7; CRC [63], 
pages 133-148; Reinhardt/Soeder [124], volume 1, pages 178-181; Gradstein/Ryzhik [B4], pages 50-80; 
Spiegel [139], pages 11-20.


44.2.20 REMARK: The historical parallel of elliptic integrals.

It may seem unusual to define the inverse trigonometric functions first and then patch them together to 
create the familiar trigonometric functions sin, cos and tan. However, Bell [233], page 324, said the following 
in 1937. 


Now, in the integral calculus, the inverse trigonometric functions present themselves naturally as 
definite integrals of simple algebraic irrationalities (second degree); such integrals appear when we 
seek to find the length of an arc of a circle by means of the integral calculus. Suppose the inverse 
trigonometric functions had first presented themselves this way. Would it not have been “more 
natural” to consider the inverses of these functions, that is, the familiar trigonometric functions 
themselves as the given functions to be studied and analyzed? Undoubtedly; but in shoals of more 
advanced problems, the simplest of which is that of finding the length of the arc of an ellipse 
by the integral calculus, the awkward inverse “elliptic” (not “circular,” as for the arc of a circle) 
functions presented themselves first. It took Abel to see that these functions should be “inverted” 
and studied, precisely as in the case of $\sin x$, $\cos x$ instead of $\sin^{-1} x$, $\cos^{-1} x$. Simple, was it not?
Yet Legendre, a great mathematician, spent more than forty years over his “elliptic integrals” (the
awkward “inverse functions” of his problem) without ever once suspecting that he should invert.
This extremely simple, uncommonsensical way of looking at an apparently simple but profoundly 
recondite problem was one of the greatest mathematical advances of the nineteenth century. 


Thus the trigonometric functions are the inverses of “circular integrals”. Just as the 19th century
mathematicians focused too long on elliptic integrals while ignoring their inverses, so also the modern focus on
the ill-behaved trigonometric functions at the expense of their inverses is unfortunate. Trigonometric
functions have simple Taylor series, but their radius of convergence is very limited. Amongst the trigonometric
functions, the tangent is more difficult to work with than the sine and cosine because it has asymptotes, for
example. But amongst the inverses, the arctangent is the simplest to deal with because it is defined and
analytic on all of $\mathbb{R}$ and the arctangent is the integral of a rational function.


To the modern reader, it is often perplexing that nineteenth century mathematicians invested so much time 
and energy on the mysterious "elliptic integrals". Top university positions were often awarded for prowess 
in this field which seems to have almost disappeared from the later mathematics literature. Nowadays they 
are listed alongside innumerable other “special functions" in mathematical tables reference books. 


44.2.21 REMARK: The historical origin of the word “tangent”. 
Figure 44.2.2 illustrates the classical sine, cosine and tangent. (Here “classical” refers to the geometry of 
about 600BC to 400AD, not just the classical Greek era of the 5th century BC.) 


[Figure 44.2.2: The classical sine, cosine and tangent. Labelled points: O, C, B, T. Labelled segments: radius, cosine.]


The word “tangent” has many meanings. Normally this is a minor inconvenience, but the tangent space is 
arguably the most fundamental concept in differential geometry. Tangent vectors on manifolds are built into 
almost every structure and mathematical formula in differential geometry, and yet there is no agreement 
on which modern set-theoretic mathematical structure should represent tangent vectors. (See Section 53.3 
regarding the variety of representations.) None of the representations are, in fact, truly satisfactory. This 
annoyance is traceable to the ambiguity which is present even in classical and Renaissance flat geometry. 


In Euclidean geometry, the word “tangent” is used for the tangent line BT in Figure 44.2.2. This tangent 
line may be considered to be the infinite line through B and T, or any segment of that line. This line is
considered to be a tangent because it touches the circle (Latin: tangens, tangentis means “touching”),
but does not cut it. (See Remark 53.3.12 for Euclid’s definition of tangents to circles.)


In trigonometry, the word “tangent” refers to the ratio of the length of BT to the length of OB for a given
angle $\angle TOB = \theta$. This proportion is a number, not a line. (In classical geometry, however, a proportion
was always a proportion of lines, not a ratio of numbers.)


In analytical geometry, the tangent to a curve may be considered to be the direction of the line BT at B. 
This may be represented by any vector from B in the direction of T.


These three tangent concepts, (1) the tangent line, (2) the tangent ratio and (3) the tangent direction vector, 
survive into modern differential geometry as (1) equivalence classes of curves, (2) tangent vector coordinates, 
and (3) differential operators. (These are discussed in Section 53.3.) 


44.2.22 REMARK: Sines, cosines and tangents are a relatively modern invention. 

The words “sine”, “cosine”, “tangent” and “radius” in Remark 44.2.21 are all of Latin origin. This is 
somewhat perplexing when one considers that Euclidean geometry is Greek in origin. In fact, Euclid's 
“Elements” did not contain any word meaning “radius”. Euclid/Heath [214], page 2, says the following in a 
footnote regarding definition 1 in Book III of Euclid's “Elements”. 


The Greeks had no distinct word for radius, which is with them, as here, the (straight line drawn) 
from the centre. 


Thus the Greeks, including Archimedes, used the phrase “from the centre" where we would expect the word 
“radius”. See also Euclid/Heath [213], page 199, which has the following. 


The Greeks had no word corresponding to radius: if they had to express it, they said "(straight 
lines) drawn from the centre". 


This throws doubt on the idea that the classical and Hellenic Greeks would have used the sine, cosine and
tangent as we now understand them, since they are based on the radius of a circle. In fact, the ancient Greeks
prepared tables of “chords”. (See Heath [244], page 45, and Heath [245], page 257, regarding Ptolemy’s Table
of Chords.) Such tables expressed the chord of a triangle with a vertex at the centre of a circle in terms
of the angle at the centre. This is equivalent to the function $\theta \mapsto \sin(\theta/2)$ for a circle
of unit diameter. Thus the chord of 60° is 0.5.
(Ptolemy, however, expressed this as a fraction of 120, into which he divided the diameter. This gave the 
value 60 for the chord of 60°.) Galileo [268] used chord-based trigonometric functions instead of the modern 
radius-based trigonometric functions in his calculations regarding the Ptolemaic and Copernican systems. 


44.3. Ordinary differential equations, Peano method 


44.3.1 REMARK: Some literature for the Peano existence theorem. 

The Peano method for proving existence of solutions for ordinary differential equations is presented by 
Murray/Miller [119], pages 1-35; Coddington/Levinson [62], pages 1-7; Hurewicz [95], pages 1-18, 23-33; 
Bruckner/Bruckner/Thomson [56], pages 382-384. It is presented without proof by Bear [55], pages 156-158. 
The Peano method is called the Cauchy-Peano theorem by Coddington/Levinson [62], page 6. It is called 
the Cauchy-Euler method by Hurewicz [95], page 5. It is called the Cauchy method by Bear [55], page 156. 


44.3.2 REMARK:  Non-uniqueness of solutions of first-order ODEs. 

The Peano method, published in 1886 and 1890, has the advantage that it puts fewer constraints on ODE
problems than the Picard method [193], published in 1890, but it has the disadvantage that such generality
breeds non-uniqueness of solutions. (See Peano [190]; Peano [191].)


Consider the first-order ODE $\forall t \in \mathbb{R},\ u'(t) = g(u(t))$, where $g \in C(\mathbb{R},\mathbb{R})$
and $g(0) = 0$. Then clearly the function $u \in D^1(\mathbb{R},\mathbb{R})$ defined by
$\forall t \in \mathbb{R},\ u(t) = 0$ is a solution to this ODE. To seek other solutions,
note that $u$ and $g$ are related by the formula $\forall y \in \mathrm{Range}(u),\ g(y)\,\partial_y u^{-1}(y) = 1$
whenever the terms are well defined. (This follows from the chain rule applied to $u^{-1} \circ u$.
See Theorem 40.5.17 (i).) For the special case $\forall t \in \mathbb{R},\ u(t) = (t-c)^{1+2\alpha}$ with
$\alpha \in \mathbb{Z}^+$ and $c \in \mathbb{R}$, one obtains
$\forall y \in \mathbb{R},\ g(y) = (1+2\alpha)|y|^{2\alpha/(1+2\alpha)}$.
For example, $\alpha = 1$ yields $\forall t \in \mathbb{R},\ u(t) = (t-c)^3$ and
$\forall y \in \mathbb{R},\ g(y) = 3|y|^{2/3}$. Similarly, $\alpha = 2$ yields
$\forall t \in \mathbb{R},\ u(t) = (t-c)^5$ and $\forall y \in \mathbb{R},\ g(y) = 5|y|^{4/5}$.
In each case, $g$ is continuous, but does not have a bounded derivative, and in fact $g'(0)$ is not well defined.
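
The non-uniqueness for $g(y) = 3|y|^{2/3}$ can be observed numerically. The sketch below checks the residual $u'(t) - g(u(t))$ by central differences for three functions which all take the value $0$ at $t = c$: the zero function, the cubic $(t-c)^3$, and a glued function which is zero up to $t = c$ and cubic afterwards. The glued function is not taken from the text; it is included here as a further illustration, and all function names are hypothetical.

```python
def g(y: float) -> float:
    """The 'velocity function' g(y) = 3 |y|^(2/3)."""
    return 3.0 * abs(y) ** (2.0 / 3.0)

def u_zero(t: float) -> float:
    return 0.0

def u_cubic(t: float, c: float) -> float:
    return (t - c) ** 3

def u_glued(t: float, c: float) -> float:
    # Zero up to t = c, then the cubic branch (an illustrative extra example).
    return 0.0 if t <= c else (t - c) ** 3

def residual(u, t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of u'(t) - g(u(t))."""
    return (u(t + h) - u(t - h)) / (2 * h) - g(u(t))
```

For example, `residual(lambda t: u_cubic(t, 0.5), 1.0)` should be close to zero, and the same holds for the other two functions, although the three functions are different.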


Consequently, one can say in general only that Peano’s method produces a solution, not the solution.

The non-uniqueness of first-order ODE solutions can be much worse than is suggested by single-parameter
solution families such as $t \mapsto (t-c)^3$. The function $y \mapsto 3|y|^{2/3}$ has only a single point where
differentiability fails. Otherwise, there is unique continuation of the function. In other words, bifurcations
only occur where differentiability fails. Clearly if the function $g$ is nowhere differentiable, the
non-uniqueness opportunities will be hugely greater. (For nowhere-differentiable continuous real functions, see
for example Schramm [133], pages 332-333; Bruckner/Bruckner/Thomson [56], pages 419-422; Rudin [129], pages 141-142.)


44.3.3 REMARK: Peano ODE existence theorem for a single dependent variable. 

The proof of Peano’s ODE existence theorem for real-valued functions in Theorem 44.3.20 requires many 
technical lemmas. (See Remark 2.4.6 for the meanings of “lemmas”.) These lemmas include Theorems 35.4.6,
35.7.10, 37.4.5, 37.4.11, 37.4.14, 37.8.7, 38.4.4, 38.5.2, 38.5.3, 38.5.4, 38.5.5, 44.3.4, 44.3.5, 44.3.10, 44.3.13,
44.3.16 and 44.3.18. Most of these have been stated and proved specifically to assist Theorem 44.3.20.
(This proof strategy is based on the presentation by Murray/Miller [119], pages 1-12.) 


In the statement of Theorem 44.3.4, the requirement $\gamma \in \mathbb{P}(U)$ means that $\gamma \subseteq U$,
which means that the graph of $\gamma$ lies within $U$. In other words,
$\forall t \in \mathrm{Dom}(\gamma),\ (t, \gamma(t)) \in U$. Therefore $X(t, \gamma(t))$ is well defined
for all $t \in B_{t_0,r}$, because $\mathrm{Dom}(X) = U$ and $\mathrm{Dom}(\gamma) = B_{t_0,r}$, since
$X \in C^0(U, \mathbb{R})$ and $\gamma \in D^1(B_{t_0,r}, \mathbb{R})$.
(See Notation 42.2.8 for the space $D^1(B_{t_0,r}, \mathbb{R})$ of differentiable real-valued functions on
$B_{t_0,r}$. Some of the sets and functions in Theorems 44.3.4, 44.3.5, 44.3.10, 44.3.13, 44.3.16, 44.3.18 and
44.3.20 are illustrated in Figure 44.3.1.)


[Figure 44.3.1: Sets and functions for proof of Peano existence theorem. Labelled items include the points
$(t_1, \gamma(t_1))$ and $(t_2, \gamma(t_2))$, the value $X(t_1, \gamma(t_1))$, the level $x_0$, and the times $t_0$, $t_1$, $t_2$, $t_3$.]


The right-hand side of a second-order ODE may often be thought of as a “force function” because it specifies 
the “acceleration” of the left-hand side. In the same way, one could refer to the right-hand side of a first-order 
ODE as a “velocity function”. 


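44.3.4 THEOREM: The solution of a first-order ODE with continuous “velocity function” is $C^1$.
Let $U \in \mathrm{Top}(\mathbb{R}^2)$ and $X \in C^0(U, \mathbb{R})$. Let $\Omega \in \mathrm{Top}(\mathbb{R})$
and $\gamma \in D^1(\Omega, \mathbb{R}) \cap \mathbb{P}(U)$ satisfy

\[ \forall t \in \Omega,\quad \gamma'(t) = X(t, \gamma(t)). \]

Then $\gamma \in C^1(\Omega, \mathbb{R})$.


PROOF: Since $\gamma \in D^1(\Omega, \mathbb{R})$, Theorem 40.5.3 implies that $\gamma \in C^0(\Omega, \mathbb{R})$.
So the function $t \mapsto (t, \gamma(t))$ for $t \in \Omega$ is continuous by Theorem 32.11.2 (i). Therefore the
function $t \mapsto X(t, \gamma(t))$ for $t \in \Omega$ is continuous by Theorem 31.12.7. But this equals the
function $\gamma'$. Hence $\gamma \in C^1(\Omega, \mathbb{R})$ by Notation 42.5.10.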


44.3.5 THEOREM: Equivalence of differential and integral equations. 
Let $U \in \mathrm{Top}(\mathbb{R}^2)$ and $X \in C^0(U, \mathbb{R})$. Let $I \in \mathrm{Top}(\mathbb{R})$ be an
interval with $t_0 \in I$. Let $\gamma \in D^1(I, \mathbb{R}) \cap \mathbb{P}(U)$ satisfy $\gamma(t_0) = x_0$.
Then $\gamma$ satisfies line (44.3.1) if and only if $\gamma$ satisfies line (44.3.2).

\begin{align}
\forall t \in I,\quad \gamma'(t) &= X(t, \gamma(t)), \tag{44.3.1} \\
\forall t \in I,\quad \gamma(t) &= x_0 + \int_{t_0}^{t} X(s, \gamma(s))\, ds, \tag{44.3.2}
\end{align}

where “$\int$” is the Cauchy-Riemann-Darboux integral in Definition 43.5.10.


PROOF: Suppose that $\gamma$ satisfies line (44.3.1). Then $\gamma \in C^1(I, \mathbb{R})$ by Theorem 44.3.4.
So $\gamma' \in C^0(I, \mathbb{R})$ by Theorem 42.5.16 (vii). Therefore $\gamma'$ is uniformly continuous on the
symmetrised closed interval $[[t_0, t]]$ by Theorem 38.3.7 for all $t \in I$. (See Notation 16.1.15 for
symmetrised closed intervals $[[a, b]]$ for $a, b \in \mathbb{R}$.) So
$\gamma(t) - \gamma(t_0) = \int_{t_0}^{t} \gamma'(s)\, ds$ for all $t \in I$ by Theorems 43.3.20 and 43.8.5.
Thus $\gamma$ satisfies line (44.3.2).

Now suppose that $\gamma$ satisfies line (44.3.2). The continuity of $\gamma$ and $X$ implies that the map
$s \mapsto X(s, \gamma(s))$ is continuous from $I$ to $\mathbb{R}$ by Theorem 31.12.7. So
$\gamma'(t) = X(t, \gamma(t))$ for all $t \in I$ by Theorem 43.8.3. Hence $\gamma$ satisfies line (44.3.1).


44.3.6 EXAMPLE: Time-independent linear velocity field. 

Consider the equation $\gamma'(t) = a + b\gamma(t)$ for $t \in \mathbb{R}$, where $a, b \in \mathbb{R}$ with
$b \ne 0$. Let $k \in \mathbb{R}$. Define $\gamma : \mathbb{R} \to \mathbb{R}$ by
$\gamma(t) = k\exp(bt) - a/b$ for all $t \in \mathbb{R}$. Then $\gamma'(t) = kb\exp(bt)$ and
$a + b\gamma(t) = a + bk\exp(bt) - ba/b = \gamma'(t)$.
So $\gamma$ solves the equation. The “velocity field” here is a linear function of the space variable.

44.3.7 EXAMPLE: Time-independent quadratic velocity field. 

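Consider the equation $\gamma'(t) = a + b\gamma(t) + c\gamma(t)^2$ for $t \in \mathbb{R}$, where
$a, b, c \in \mathbb{R}$. The “velocity field” here is a quadratic function of the space variable.


Let $\Delta = 4ac - b^2$. Assume that $\Delta > 0$. Let $t_0 \in \mathbb{R}$. Let
$I = (t_0 - \pi\Delta^{-1/2},\ t_0 + \pi\Delta^{-1/2})$. Then $I \in \mathrm{Top}(\mathbb{R})$.
Define $\gamma : I \to \mathbb{R}$ by $\gamma(t) = (\Delta^{1/2}\tan(\Delta^{1/2}(t - t_0)/2) - b)/2c$ for all
$t \in I$. Then for all $t \in I$,

\begin{align*}
\gamma'(t) &= \Delta\sec^2(\Delta^{1/2}(t - t_0)/2)/4c \\
&= \Delta/4c + \Delta\tan^2(\Delta^{1/2}(t - t_0)/2)/4c
\end{align*}

and

\begin{align*}
a + b\gamma(t) + c\gamma(t)^2
&= a + b(\Delta^{1/2}\tan(\Delta^{1/2}(t - t_0)/2) - b)/2c + c((\Delta^{1/2}\tan(\Delta^{1/2}(t - t_0)/2) - b)/2c)^2 \\
&= a - b^2/2c + b\Delta^{1/2}\tan(\Delta^{1/2}(t - t_0)/2)/2c \\
&\quad + \Delta\tan^2(\Delta^{1/2}(t - t_0)/2)/4c - b\Delta^{1/2}\tan(\Delta^{1/2}(t - t_0)/2)/2c + b^2/4c \\
&= a - b^2/4c + \Delta\tan^2(\Delta^{1/2}(t - t_0)/2)/4c \\
&= \gamma'(t).
\end{align*}

Thus $\gamma$ satisfies the equation on $I$, but cannot be extended beyond $I$.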
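
The tangent-based solution of the quadratic example can be checked numerically. The following sketch builds the solution and its maximal interval from the coefficients, assuming $4ac - b^2 > 0$. The function name `quadratic_ode_solution` and the finite-difference helper `ode_residual` are hypothetical.

```python
import math

def quadratic_ode_solution(a: float, b: float, c: float, t0: float):
    """Return (gamma, (lo, hi)) for gamma' = a + b*gamma + c*gamma^2 when 4ac - b^2 > 0."""
    disc = 4.0 * a * c - b * b
    if disc <= 0.0:
        raise ValueError("this construction assumes 4ac - b^2 > 0")
    root = math.sqrt(disc)

    def gamma(t: float) -> float:
        return (root * math.tan(root * (t - t0) / 2.0) - b) / (2.0 * c)

    half_width = math.pi / root  # the tangent blows up at both ends of the interval
    return gamma, (t0 - half_width, t0 + half_width)

def ode_residual(gamma, a: float, b: float, c: float, t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of gamma'(t) - (a + b*gamma(t) + c*gamma(t)^2)."""
    slope = (gamma(t + h) - gamma(t - h)) / (2.0 * h)
    return slope - (a + b * gamma(t) + c * gamma(t) ** 2)
```

With `a = b = c = 1` and `t0 = 0`, the interval is approximately $(-1.81, 1.81)$, and the residual should be close to zero at interior points. Near the endpoints the solution grows without bound, which is consistent with the statement that it cannot be extended.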


44.3.8 REMARK: Constructions for the Peano existence proof. 

The Peano ODE existence proof requires the construction of a suitable rectangle on which to define local 
polygonal approximations to a solution. The ad-hoc concept of a “local approximation rectangle” is defined 
in Definition 44.3.9 for this purpose. Theorem 44.3.10 shows that local approximation rectangles can be 
constructed (without choice axioms) from given set/function/point triples $(U, X, (t_0, x_0))$. Perhaps a more
accurate name for these rectangles would be “rectangular Peano approximation neighbourhoods”, which has 
the disadvantage of being even longer. 


If one has specific concrete knowledge of the region U and function X, one may construct larger rectangles 
than are constructed in Theorem 44.3.10. However, one may extend any solution by constructing an extension
to the right of the rightmost point $t_0 + a$ of the interval $\bar{B}_{t_0,a}$, and then proceed indefinitely,
and likewise to the left. Note that due to non-uniqueness, extending back to the left from $t_0 + a$ may not
coincide with the previously constructed solution. Even starting at the same $t$-value $t_0$, but with a
different sequence of interval partitions, may yield a different solution. In fact, even a single sequence of
interval partitions $p$ (in
Definition 44.3.12) may yield multiple solutions depending on the choice of convergence points in the Ascoli 
theorem procedure. 


44.3.9 DEFINITION: A local approximation rectangle for a set $U \in \mathrm{Top}(\mathbb{R}^2)$, a function
$X \in C^0(U, \mathbb{R})$, and a point $(t_0, x_0) \in U$, is a set $R = \bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$,
with $a, b \in \mathbb{R}^+$, which satisfies the following.


(i) $R \subseteq U$. (The closure is relative to $\mathrm{Top}(\mathbb{R}^2)$.)
(ii) $b \ge Ma$, where $M = \sup\{|X(t,x)|;\ (t,x) \in R\} \in \mathbb{R}_0^+$.


44.3.10 THEOREM: Existence of a local approximation rectangle. 
For all $U \in \mathrm{Top}(\mathbb{R}^2)$, $X \in C(U, \mathbb{R})$, $(t_0, x_0) \in U$, there exists a local
approximation rectangle for $U$, $X$ and $(t_0, x_0)$.


PROOF: Let $b = \min(1, \tfrac{1}{2} d((t_0, x_0), \mathbb{R}^2 \setminus U))$. (See Definition 37.4.1 for the
point-set distance function $d$.) Then $b \in \mathbb{R}^+$ by Theorem 37.5.6 (v). Let
$R_1 = \bar{B}_{t_0,b} \times \bar{B}_{x_0,b}$. Then
$\forall (t,x) \in R_1,\ d((t_0, x_0), (t,x)) \le 2^{1/2} b < 2b \le d((t_0, x_0), \mathbb{R}^2 \setminus U)$.
So $R_1 \subseteq U$. Let $M_1 = \sup\{|X(t,x)|;\ (t,x) \in R_1\}$. Then $M_1 \in \mathbb{R}_0^+$ by
Theorem 37.7.5. Let $a = b/\max(1, M_1)$. Then $0 < a \le b$.

Let $R = \bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$ and $M = \sup\{|X(t,x)|;\ (t,x) \in R\}$. Then
$R \subseteq R_1 \subseteq U$ by Theorem 31.8.13 (xiv), and $b \ge a\max(1, M) \ge Ma$ by Theorem 11.2.42 (ii).
Hence $R$ is a local approximation rectangle for $U \in \mathrm{Top}(\mathbb{R}^2)$, $X \in C(U, \mathbb{R})$ and
$(t_0, x_0) \in U$ by Definition 44.3.9.


44.3.11 REMARK:  Polygonal approximations to ODE solutions. 

Definition 44.3.12 partitions the interval $\bar{B}_{t_0,a} = [t_0 - a, t_0 + a]$ according to endpoint sequences
$(p_i)_{i=0}^m$ and $(q_j)_{j=0}^n$, and then inductively defines a continuous piecewise linear function
$f : [t_0 - a, t_0 + a] \to \mathbb{R}$ which is linear on the intervals $[p_{i-1}, p_i]$ and $[q_j, q_{j-1}]$,
with gradients determined by the value of $X$ at the interval end-points closest to $t_0$.
(See Figure 44.3.2. See Definition 43.3.3 for interval partitions.)


[Figure 44.3.2: Polygonal approximation for proof of Peano existence theorem. The figure shows the rectangle
$R = \bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$, the partition points
$t_0 - a = q_4,\ q_3,\ q_2,\ q_1,\ t_0 = p_0,\ p_1,\ p_2,\ p_3 = t_0 + a$, and the vertex rules
$f(q_{i+1}) = f(q_i) + (q_{i+1} - q_i) X(q_i, f(q_i))$ and $f(p_{i+1}) = f(p_i) + (p_{i+1} - p_i) X(p_i, f(p_i))$.]


44.3.12 DEFINITION: The polygonal approximation for interval partitions
$p = (p_i)_{i=0}^m \in \mathrm{Part}(t_0, t_0 + a)$ and $q = (q_j)_{j=0}^n \in \mathrm{Part}(t_0, t_0 - a)$, for a
local approximation rectangle $\bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$ for $U \in \mathrm{Top}(\mathbb{R}^2)$,
$X \in C(U, \mathbb{R})$ and $(t_0, x_0) \in U$, is the function $f : [t_0 - a, t_0 + a] \to \mathbb{R}$ defined
inductively as follows.


(i) $f(t_0) = x_0$.
(ii) $\forall i \in \mathbb{N}_m,\ \forall t \in [p_{i-1}, p_i],\ f(t) = f(p_{i-1}) + X(p_{i-1}, f(p_{i-1}))\,(t - p_{i-1})$.
(iii) $\forall j \in \mathbb{N}_n,\ \forall t \in [q_j, q_{j-1}],\ f(t) = f(q_{j-1}) + X(q_{j-1}, f(q_{j-1}))\,(t - q_{j-1})$.
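
The inductive definition above can be sketched in code as follows. The partitions are represented as Python lists which start at $t_0$ and run to $t_0 + a$ (increasing) or to $t_0 - a$ (decreasing). This representation, and the names `uniform_partition` and `polygonal_approximation`, are assumptions made for illustration. The sketch does not check that the rectangle conditions of Definition 44.3.9 hold.

```python
def uniform_partition(t0: float, end: float, m: int):
    """Partition points t0 = r_0, r_1, ..., r_m = end with equal spacing."""
    return [t0 + i * (end - t0) / m for i in range(m + 1)]

def polygonal_approximation(X, t0: float, x0: float, p, q):
    """Return the piecewise linear function f for partitions p (rightwards) and q (leftwards)."""
    def vertex_values(part):
        # f(r_i) = f(r_{i-1}) + X(r_{i-1}, f(r_{i-1})) * (r_i - r_{i-1}), with f(t0) = x0.
        values = [x0]
        for prev, nxt in zip(part, part[1:]):
            values.append(values[-1] + X(prev, values[-1]) * (nxt - prev))
        return values

    fp, fq = vertex_values(p), vertex_values(q)

    def f(t: float) -> float:
        part, values = (p, fp) if t >= t0 else (q, fq)
        for i in range(1, len(part)):
            lo, hi = sorted((part[i - 1], part[i]))
            if lo <= t <= hi:
                # The gradient comes from the end-point closest to t0.
                return values[i - 1] + X(part[i - 1], values[i - 1]) * (t - part[i - 1])
        raise ValueError("t lies outside [t0 - a, t0 + a]")

    return f
```

For instance, with `X = lambda t, x: x`, `t0 = 0`, `x0 = 1` and uniform partitions of $[0, 0.5]$ and $[0, -0.5]$ into 64 intervals, the values `f(0.5)` and `f(-0.5)` should be close to $e^{0.5}$ and $e^{-0.5}$ respectively.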


44.3.13 THEOREM:  Well-definition of polygonal approximations for Peano's ODE existence theorem. 

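Let $U \in \mathrm{Top}(\mathbb{R}^2)$, $X \in C(U, \mathbb{R})$ and $(t_0, x_0) \in U$. Let
$R = \bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$ be a local approximation rectangle for $U$, $X$ and $(t_0, x_0)$.
For interval partitions $p \in \mathrm{Part}(t_0, t_0 + a)$ and $q \in \mathrm{Part}(t_0, t_0 - a)$, let
$f_{p,q}$ denote the polygonal approximation for $U$, $X$, $(t_0, x_0)$, $R$, $p$ and $q$.


(i) $f_{p,q}$ is a well-defined piecewise linear continuous function from $\bar{B}_{t_0,a}$ to $\bar{B}_{x_0,b}$
for all $p \in \mathrm{Part}(t_0, t_0 + a)$ and $q \in \mathrm{Part}(t_0, t_0 - a)$.

(ii) $\exists K \in \mathbb{R}_0^+,\ \forall p \in \mathrm{Part}(t_0, t_0 + a),\ \forall q \in \mathrm{Part}(t_0, t_0 - a),\ \forall t \in [t_0 - a, t_0 + a],\ |f_{p,q}(t)| \le K$.
Thus the set of polygonal approximations, for given $U$, $X$ and $(t_0, x_0)$, is a uniformly bounded set of
functions on $[t_0 - a, t_0 + a]$. (See Definition 38.4.14.)

(iii) $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall p \in \mathrm{Part}(t_0, t_0 + a),\ \forall q \in \mathrm{Part}(t_0, t_0 - a),\ \forall t_1, t_2 \in [t_0 - a, t_0 + a],\ |t_1 - t_2| < \delta \Rightarrow |f_{p,q}(t_1) - f_{p,q}(t_2)| < \varepsilon$.
Thus the set of polygonal approximations, for given $U$, $X$ and $(t_0, x_0)$, is a uniformly equicontinuous
set of functions on $[t_0 - a, t_0 + a]$. (See Definition 38.4.11.)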


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




PROOF: In this proof, it is convenient to abbreviate “$f_{p,q}$” to “$f$”.


For part (i), $f(p_i) = f(p_{i-1}) + X(p_{i-1}, f(p_{i-1}))\,(p_i - p_{i-1})$ for all $i \in \mathbb{N}_m$, where
$p = (p_i)_{i=0}^m$. Summing differences $f(p_i) - f(p_{i-1})$ gives $|f(p_m) - f(t_0)| \le Ma \le b$ with
$M = \sup\{|X(t,x)|;\ (t,x) \in R\}$. So $(t_0 + a, f(t_0 + a)) \in R$. Induction on $i$ gives
$\forall t \in [t_0, t_0 + a],\ (t, f(t)) \in R$ by the convexity of $R$. Similarly,
$\forall t \in [t_0 - a, t_0],\ (t, f(t)) \in R$.

For part (ii), let $K = |x_0| + Ma$ with $M = \sup\{|X(t,x)|;\ (t,x) \in R\}$. The bound on $[t_0, t_0 + a]$
follows by induction on $i$ because $|f(t_0)| = |x_0|$ and $|f(t) - f(p_{i-1})| \le M(p_i - p_{i-1})$ for all
$t \in [p_{i-1}, p_i]$, for all $i \in \mathbb{N}_m$, which gives $|f(t)| \le |x_0| + Ma$ for all
$t \in [t_0, t_0 + a]$ by the triangle inequality applied to successive intervals of $p$. The bound on
$[t_0 - a, t_0]$ follows similarly.

For part (iii), the bound holds with $\delta = \varepsilon/M$ for all $\varepsilon \in \mathbb{R}^+$, where
$M = \sup\{|X(t,x)|;\ (t,x) \in R\}$. (This may be shown for $t_1, t_2 \in [t_0, t_0 + a]$ by induction on
$i_1, i_2$ such that $t_1 \in [p_{i_1 - 1}, p_{i_1}]$ and $t_2 \in [p_{i_2 - 1}, p_{i_2}]$, and similarly for
$t_1 \in [t_0 - a, t_0]$ or $t_2 \in [t_0 - a, t_0]$.)


44.3.14 REMARK: Convergence "oscillations" of polygonal approximations in the Peano method. 

Theorem 44.3.13 (i) may seem, at first sight, to imply that the polygonal approximations for the Peano 
method are well defined enough to permit them to be used to construct solutions of the ODE in the limit. 
However, since the ODE conditions are insufficient to guarantee uniqueness (as discussed in Remark 44.3.2), it 
is unclear whether a sequence of partitions of the domain interval will yield approximations which converge, or 
could possibly oscillate between multiple solutions, even after existence has been demonstrated. This differs 
significantly from the situation with Cauchy-Riemann-Darboux integrals, which always converge to the same 
result, independent of the choice of interval partitions for which the mesh converges to zero. The possibility 
of oscillations in sequences of polygonal approximations is the reason for the application of Ascoli's theorem, 
Theorem 38.5.5, to extract a convergent subsequence from a given sequence of Peano-style approximations. 


The possibility of oscillations in the convergence of polygonal approximations may seem to be merely a 
technical issue which would rarely arise in practice. Such optimism is contradicted by the simplicity of 
ODEs for which non-uniqueness does occur, as mentioned in Remark 44.3.2. The Peano-style space-filling 
curves in Section 36.3 give some idea of how “random” a continuous curve can be. An equation which is 
driven by such a “random force” could easily develop oscillatory-like behaviour.


44.3.15 REMARK: Convergence estimates for polygonal approximations. 

After asserting in Theorem 44.3.13 (i) that the polygonal approximations in Definition 44.3.12 are well 
defined, the next assertion to make is that they are in fact approximations, which means that they differ 
from an exact solution by an amount which can be made as small as one wishes. Theorem 44.3.16 asserts that 
the approximation error is arbitrarily small in the sense of how closely it satisfies the integral equation on 
line (44.3.2) in Theorem 44.3.5. The “error measure” $E_{p,q}$ in Theorem 44.3.16 is not the difference between
an approximation $f_{p,q}$ and an exact solution of the equation.


The strategy of proof here is to show that an approximation sequence can be constructed which converges 
to some limit function, then note that the error measure $E_{p,q}$ converges to zero, and then deduce that
the limit function must satisfy the differential equation. As usual in ODE and PDE existence theory, one 
cannot estimate the error of approximations directly because the solution is not known in advance. (This is 
analogous to the situation with Cauchy sequences in complete metric spaces, which are known to converge 
to something, but the limit is not generally known in advance.) 


The function $\delta^+$ in the statement of Theorem 44.3.16 is a kind of “choice function” for the usual
$\delta$-value which one requires for each $\varepsilon \in \mathbb{R}^+$ in the definition of uniform
continuity as in Definition 38.3.2. (In Theorem 44.3.16, $X$ is uniformly continuous on $R$ by Theorem 38.3.7.
See Remark 38.2.2 for the avoidance of the axiom of choice when choosing $\delta$.) The function $\delta^+$ is
effectively a kind of inverse of the modulus of continuity of $X$.


44.3.16 THEOREM: Error measure bounds for polygonal approximations. 

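Let $U \in \mathrm{Top}(\mathbb{R}^2)$, $X \in C(U, \mathbb{R})$ and $(t_0, x_0) \in U$. Let
$R = \bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$ be a local approximation rectangle for $U$, $X$ and $(t_0, x_0)$.
Let $f_{p,q} : [t_0 - a, t_0 + a] \to \mathbb{R}$ denote the polygonal approximation for interval partitions
$p \in \mathrm{Part}(t_0, t_0 + a)$ and $q \in \mathrm{Part}(t_0, t_0 - a)$, for $U$, $X$, $(t_0, x_0)$ and $R$.
Define an “error measure” $E_{p,q} : [t_0 - a, t_0 + a] \to \mathbb{R}$ by

\begin{align}
&\forall p \in \mathrm{Part}(t_0, t_0 + a),\ \forall q \in \mathrm{Part}(t_0, t_0 - a),\ \forall t \in [t_0 - a, t_0 + a], \notag \\
&\qquad E_{p,q}(t) = f_{p,q}(t) - x_0 - \int_{t_0}^{t} X(s, f_{p,q}(s))\, ds. \tag{44.3.3}
\end{align}

Define $\delta^+ : \mathbb{R}^+ \to \mathbb{R}^+$ by

\begin{align*}
&\forall \varepsilon \in \mathbb{R}^+, \\
&\qquad \delta^+(\varepsilon) = \sup\{\delta \in \mathbb{R}^+;\ \forall (t_1, x_1), (t_2, x_2) \in R,\
|(t_1, x_1) - (t_2, x_2)|_\infty < \delta \Rightarrow |X(t_1, x_1) - X(t_2, x_2)| < \varepsilon\}.
\end{align*}

Let $M = \sup\{|X(t,x)|;\ (t,x) \in R\}$. Let $\mu(\varepsilon) = \delta^+(\varepsilon/a)/\max(1, M)$ for
$\varepsilon \in \mathbb{R}^+$. Then

\begin{align*}
&\forall \varepsilon \in \mathbb{R}^+,\ \forall p \in \mathrm{Part}_{\mu(\varepsilon)}(t_0, t_0 + a),\
\forall q \in \mathrm{Part}_{\mu(\varepsilon)}(t_0, t_0 - a),\ \forall t \in [t_0 - a, t_0 + a], \\
&\qquad |E_{p,q}(t)| \le \varepsilon.
\end{align*}

In other words, for any $\varepsilon \in \mathbb{R}^+$, polygonal approximations $f_{p,q}$ with mesh not exceeding
$\mu(\varepsilon)$ have an error measure $|E_{p,q}|$ not exceeding $\varepsilon$.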


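PROOF: Let $\varepsilon \in \mathbb{R}^+$. Let $p = (p_i)_{i=0}^m \in \mathrm{Part}_{\mu(\varepsilon)}(t_0, t_0 + a)$.
Let $t \in (t_0, t_0 + a]$. Then $t \in (p_{i-1}, p_i]$ for some $i \in \mathbb{N}_m$, and then by
Definition 44.3.12 (ii),
$|f_{p,q}(t) - f_{p,q}(p_{i-1})| = |X(p_{i-1}, f_{p,q}(p_{i-1}))| \cdot |t - p_{i-1}| \le M\mu(\varepsilon) \le \delta^+(\varepsilon/a)$.
But $|t - p_{i-1}| \le \mu(\varepsilon) \le \delta^+(\varepsilon/a)$. So
$|(t, f_{p,q}(t)) - (p_{i-1}, f_{p,q}(p_{i-1}))|_\infty \le \delta^+(\varepsilon/a)$. (See Notation 24.7.13 for
$|\cdot|_\infty$.) Therefore $|X(t, f_{p,q}(t)) - X(p_{i-1}, f_{p,q}(p_{i-1}))| \le \varepsilon/a$. (Note that
$\delta^+(\varepsilon/a) > 0$ because $X$ is uniformly continuous on $R$ by Theorem 38.3.7.) It follows that

\begin{align}
\forall t \in (p_{i-1}, p_i],\quad
&\Bigl| f_{p,q}(t) - f_{p,q}(p_{i-1}) - \int_{p_{i-1}}^{t} X(s, f_{p,q}(s))\, ds \Bigr| \notag \\
&\qquad = \Bigl| X(p_{i-1}, f_{p,q}(p_{i-1}))\,(t - p_{i-1}) - \int_{p_{i-1}}^{t} X(s, f_{p,q}(s))\, ds \Bigr| \notag \\
&\qquad = \Bigl| \int_{p_{i-1}}^{t} \bigl( X(p_{i-1}, f_{p,q}(p_{i-1})) - X(s, f_{p,q}(s)) \bigr)\, ds \Bigr| \tag{44.3.4} \\
&\qquad \le \int_{p_{i-1}}^{t} \frac{\varepsilon}{a}\, ds \tag{44.3.5} \\
&\qquad = \frac{\varepsilon}{a}\,(t - p_{i-1}), \notag
\end{align}

where lines (44.3.4) and (44.3.5) follow from Theorem 43.7.9 (i, vii). Summing over intervals, one obtains

\begin{align*}
\forall t \in (p_{i-1}, p_i],\quad
|E_{p,q}(t)| &= \Bigl| f_{p,q}(t) - f_{p,q}(t_0) - \int_{t_0}^{t} X(s, f_{p,q}(s))\, ds \Bigr| \\
&\le \sum_{j=1}^{i-1} \frac{\varepsilon}{a}\,(p_j - p_{j-1}) + \frac{\varepsilon}{a}\,(t - p_{i-1}) \\
&= \frac{\varepsilon}{a}\,(t - t_0) \le \varepsilon.
\end{align*}

Similarly for $t \in [t_0 - a, t_0)$ and $q \in \mathrm{Part}(t_0, t_0 - a)$. Clearly
$E_{p,q}(t_0) = 0 \le \varepsilon$. The assertion follows.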
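
The behaviour of the error measure can be observed numerically. The sketch below evaluates $E_{p,q}$ at the right end-point $t_0 + a$ for uniform partitions of $[t_0, t_0 + a]$, using a composite trapezoid rule for the integral, which is an assumption of this illustration rather than part of the theorem. For the example $X(t,x) = x$ the integrand is piecewise linear, so the trapezoid rule is exact up to rounding. The names `euler_vertices` and `error_measure_at_end` are hypothetical.

```python
def euler_vertices(X, t0: float, x0: float, a: float, m: int):
    """Vertices (p_i, f(p_i)) of the polygonal approximation on a uniform partition of [t0, t0 + a]."""
    h = a / m
    ts = [t0 + i * h for i in range(m + 1)]
    fs = [x0]
    for i in range(m):
        fs.append(fs[i] + X(ts[i], fs[i]) * h)
    return ts, fs

def error_measure_at_end(X, t0: float, x0: float, a: float, m: int, sub: int = 16) -> float:
    """Approximate E(t0 + a) = f(t0 + a) - x0 - integral of X(s, f(s)) over [t0, t0 + a]."""
    ts, fs = euler_vertices(X, t0, x0, a, m)
    integral = 0.0
    for i in range(m):
        slope = X(ts[i], fs[i])
        step = (ts[i + 1] - ts[i]) / sub
        # Sample the integrand s -> X(s, f(s)) along the linear piece of f.
        vals = [X(ts[i] + k * step, fs[i] + slope * k * step) for k in range(sub + 1)]
        integral += step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return fs[-1] - x0 - integral
```

With `X = lambda t, x: x`, `t0 = 0`, `x0 = 1` and `a = 0.5`, the absolute value of the result should roughly halve each time `m` is doubled, which is consistent with the error measure tending to zero as the mesh tends to zero.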


44.3.17 REMARK: Some limit of the polygonal approximations solves the ODE. 
Theorem 44.3.18 provides the last step in the proof of Theorem 44.3.20, which asserts existence of solutions 
for first-order ODEs for real-valued functions. 


44.3.18 THEOREM: Uniform limits of polygonal approximations satisfy the ODE integral equation. 

Let $U \in \mathrm{Top}(\mathbb{R}^2)$, $X \in C(U, \mathbb{R})$ and $(t_0, x_0) \in U$. Let
$R = \bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$ be a local approximation rectangle for $U$, $X$ and $(t_0, x_0)$.
Let $f_{p,q} : [t_0 - a, t_0 + a] \to \mathbb{R}$ denote the polygonal approximation for interval partitions
$p \in \mathrm{Part}(t_0, t_0 + a)$ and $q \in \mathrm{Part}(t_0, t_0 - a)$, for $U$, $X$, $(t_0, x_0)$ and $R$.

Let $(p^j)_{j=0}^\infty \in \mathrm{Part}(t_0, t_0 + a)^\omega$ and
$(q^j)_{j=0}^\infty \in \mathrm{Part}(t_0, t_0 - a)^\omega$ be sequences of interval partitions with
$\lim_{j \to \infty} \mathrm{mesh}(p^j) = 0$ and $\lim_{j \to \infty} \mathrm{mesh}(q^j) = 0$. Let
$(h_j)_{j=0}^\infty$ be the polygonal approximation sequence defined by $h_j = f_{p^j, q^j}$ for all
$j \in \mathbb{Z}_0^+$. Suppose that $h$ converges uniformly to a function
$g : [t_0 - a, t_0 + a] \to \mathbb{R}$. Then

\[ \forall t \in [t_0 - a, t_0 + a],\quad g(t) = x_0 + \int_{t_0}^{t} X(s, g(s))\, ds. \]


PROOF: Let $\varepsilon \in \mathbb{R}^+$. Define $\mu : \mathbb{R}^+ \to \mathbb{R}^+$ as in the statement of
Theorem 44.3.16. (The function $\mu$ depends only on $X$ and $R$.) Then $\mu(\varepsilon) > 0$ because $X$ is
uniformly continuous on $R$ by Theorem 38.3.7. The assumption $\lim_{j \to \infty} \mathrm{mesh}(p^j) = 0$ then
implies that there exists $k_p \in \mathbb{Z}_0^+$ such that $\mathrm{mesh}(p^j) \le \mu(\varepsilon)$ for all
$j \in \mathbb{Z}[k_p, \infty)$. Similarly there exists $k_q \in \mathbb{Z}_0^+$ such that
$\mathrm{mesh}(q^j) \le \mu(\varepsilon)$ for all $j \in \mathbb{Z}[k_q, \infty)$. Let $k = \max(k_p, k_q)$.
Then Theorem 44.3.16 implies

\begin{align}
&\forall j \in \mathbb{Z}[k, \infty),\ \forall t \in [t_0 - a, t_0 + a], \notag \\
&\qquad \Bigl| f_{p^j,q^j}(t) - f_{p^j,q^j}(t_0) - \int_{t_0}^{t} X(s, f_{p^j,q^j}(s))\, ds \Bigr| \le \varepsilon. \tag{44.3.6}
\end{align}

Since $h_j = f_{p^j,q^j}$ is continuous for all $j \in \mathbb{Z}_0^+$ by Theorem 44.3.13 (i), the uniform
convergence of $h$ implies that $g$ is continuous by Theorem 38.4.5.

Define $Y_j : [t_0 - a, t_0 + a] \to \mathbb{R}$ by $Y_j(t) = X(t, h_j(t))$ for all $j \in \mathbb{Z}_0^+$ and
$t \in [t_0 - a, t_0 + a]$. Then $Y_j$ is continuous for all $j \in \mathbb{Z}_0^+$ because $h_j$ is continuous.
Define $Z : [t_0 - a, t_0 + a] \to \mathbb{R}$ by $Z(t) = X(t, g(t))$ for all $t \in [t_0 - a, t_0 + a]$. Then $Z$
is continuous because $g$ is continuous. The function sequence $Y$ converges uniformly to $Z$ because $X$ is
uniformly continuous on $R$ and $h$ converges uniformly to $g$. So
$\lim_{j \to \infty} \int_{t_0}^{t} Y_j(s)\, ds = \int_{t_0}^{t} Z(s)\, ds$ for all $t \in [t_0 - a, t_0 + a]$ by
Theorem 43.7.9 (ix). Hence

\begin{align}
\forall t \in [t_0 - a, t_0 + a],\quad g(t) &= \lim_{j \to \infty} h_j(t) \notag \\
&= \lim_{j \to \infty} \Bigl( x_0 + \int_{t_0}^{t} X(s, h_j(s))\, ds + E_{p^j,q^j}(t) \Bigr) \tag{44.3.7} \\
&= x_0 + \int_{t_0}^{t} X(s, g(s))\, ds, \tag{44.3.8}
\end{align}

where line (44.3.7) follows from line (44.3.3), and line (44.3.8) follows from line (44.3.6).


44.3.19 REMARK: The first-order Peano ODE existence theorem for real-valued functions. 

The proof of the local existence of solutions of first-order real-valued ODEs in Theorem 44.3.20 may seem 
quite short, but this is because a large amount of work has already been done in numerous preparatory 
theorems and technical lemmas which reduce the proof to the mere gluing together of those theorems. (This 
proof strategy is summarised in Remark 44.3.3.) 


44.3.20 THEOREM: Peano existence theorem for real-valued first-order ODEs. 
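Let $U \in \mathrm{Top}(\mathbb{R}^2)$, $X \in C^0(U, \mathbb{R})$ and $(t_0, x_0) \in U$. Then

\begin{align*}
&\exists r \in \mathbb{R}^+,\ \exists \gamma \in C^1(B_{t_0,r}, \mathbb{R}) \cap \mathbb{P}(U), \\
&\qquad \gamma(t_0) = x_0 \ \text{ and } \ \forall t \in B_{t_0,r},\ \gamma'(t) = X(t, \gamma(t)).
\end{align*}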


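PROOF: Let $U \in \mathrm{Top}(\mathbb{R}^2)$, $X \in C^0(U, \mathbb{R})$ and $(t_0, x_0) \in U$. Then by
Theorem 44.3.10, there exists a local approximation rectangle $R = \bar{B}_{t_0,a} \times \bar{B}_{x_0,b}$ for
$U$, $X$ and $(t_0, x_0)$, and by Definition 44.3.9, this rectangle satisfies $R \subseteq U$ and
$a, b \in \mathbb{R}^+$ with $b \ge Ma$, where $M = \sup\{|X(t,x)|;\ (t,x) \in R\} \in \mathbb{R}_0^+$.

Let $(p^j)_{j=0}^\infty \in \mathrm{Part}(t_0, t_0 + a)^\omega$ and
$(q^j)_{j=0}^\infty \in \mathrm{Part}(t_0, t_0 - a)^\omega$ be sequences of interval partitions such that
$\lim_{j \to \infty} \mathrm{mesh}(p^j) = 0$ and $\lim_{j \to \infty} \mathrm{mesh}(q^j) = 0$. (For example,
$p^j = (p^j_i)_{i=0}^{m_j}$ and $q^j = (q^j_i)_{i=0}^{m_j}$ with $m_j = 2^j$ for all $j \in \mathbb{Z}_0^+$, where
$p^j_i = t_0 + ia/m_j$ and $q^j_i = t_0 - ia/m_j$ for all $i \in \mathbb{Z}[0, m_j]$ and $j \in \mathbb{Z}_0^+$.)

Define the function sequence $h = (h_j)_{j=0}^\infty$ by $h_j = f_{p^j,q^j}$ for all $j \in \mathbb{Z}_0^+$, where
$f_{p^j,q^j} : [t_0 - a, t_0 + a] \to \mathbb{R}$ is the polygonal approximation for partitions $p^j$ and $q^j$
for $U$, $X$, $(t_0, x_0)$ and $R$, as in Definition 44.3.12. Then $h$ is uniformly bounded and uniformly
equicontinuous by Theorem 44.3.13 (ii, iii). So by Ascoli's theorem, Theorem 38.5.5, some subsequence of $h$
converges uniformly on $[t_0 - a, t_0 + a]$ to some (unique) function
$\gamma : [t_0 - a, t_0 + a] \to \mathbb{R}$. Then $\gamma$ is continuous by Theorem 38.4.5.
Therefore by Theorem 44.3.18,

\[ \forall t \in [t_0 - a, t_0 + a],\quad \gamma(t) = x_0 + \int_{t_0}^{t} X(s, \gamma(s))\, ds. \]

Then $\gamma$ is differentiable on $(t_0 - a, t_0 + a)$ by Theorem 43.8.3 because $s \mapsto X(s, \gamma(s))$ is
continuous by Theorems 32.9.8 and 31.12.7. So $\gamma \in C^1(B_{t_0,a}, \mathbb{R})$ by Theorem 44.3.4. Hence
by Theorem 44.3.5,

\[ \forall t \in B_{t_0,a},\quad \gamma'(t) = X(t, \gamma(t)), \]

and clearly $\gamma(t_0) = x_0$.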


44.4. Ordinary differential equations, Peano method, systems 


44.4.1 REMARK: Generalisation of the Peano existence method from ODEs to ODE systems. 

Theorem 44.4.9 generalises Theorem 44.3.20 from real valued functions of time (or some time-like parameter) 
to Cartesian space valued functions of time. Such functions may be referred to colloquially as “vector-valued
functions”, although the space variable for ODEs is generally located in a differentiable manifold, not a linear
space. ODEs for such functions are traditionally referred to as “systems” or “coupled systems” of ODEs. 


The generalisation to ODE systems requires various preliminary lemmas to be generalised. Thus Theorems 
44.4.2, 44.4.3, 44.4.5 and 44.4.8 generalise Theorems 44.3.4, 44.3.5, 44.3.10 and 35.7.10 respectively. Similarly, 
Definitions 44.4.4 and 44.4.6 generalise Definitions 44.3.9 and 44.3.12 respectively. Although the assertions
and proofs are almost identical, there are some subtle differences which need to be stated
explicitly to remove doubt and ambiguity.


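44.4.2 THEOREM: The solution of a first-order ODE with continuous “velocity function” is $C^1$.
Let $n \in \mathbb{Z}^+$. Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$ and $X \in C^0(U, \mathbb{R}^n)$. Let $\Omega \in \mathrm{Top}(\mathbb{R})$ and $\gamma \in D^1(\Omega, \mathbb{R}^n) \cap \mathbb{P}(U)$ satisfy


$\forall t \in \Omega$, $\gamma'(t) = X(t, \gamma(t))$.
Then $\gamma \in C^1(\Omega, \mathbb{R}^n)$. (See Notation 42.5.9 for the set of partially differentiable functions $D^1(\Omega, \mathbb{R}^n)$.)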


PROOF: Since $\gamma \in D^1(\Omega, \mathbb{R}^n)$, Theorem 40.7.3 implies that $\gamma \in C^0(\Omega, \mathbb{R}^n)$. So the function $t \mapsto (t, \gamma(t))$
for $t \in \Omega$ is continuous by Theorem 32.11.2 (i). Therefore the function $t \mapsto X(t, \gamma(t))$ for $t \in \Omega$ is continuous
by Theorem 31.12.7. But this equals the function $\gamma'$. Hence $\gamma \in C^1(\Omega, \mathbb{R}^n)$ by Notation 42.5.10.


44.4.3 THEOREM: Equivalence of differential and integral equations. 
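Let $n \in \mathbb{Z}^+$. Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$ and $X \in C^0(U, \mathbb{R}^n)$. Let $I \in \mathrm{Top}(\mathbb{R})$ be an interval with $t_0 \in I$. Let
$\gamma \in D^1(I, \mathbb{R}^n) \cap \mathbb{P}(U)$ and $x_0 = \gamma(t_0)$. Then $\gamma$ satisfies line (44.4.1) if and only if $\gamma$ satisfies line (44.4.2).


$\forall t \in I$, $\gamma'(t) = X(t, \gamma(t))$ (44.4.1)


$\forall t \in I$, $\gamma(t) = x_0 + \int_{t_0}^{t} X(s, \gamma(s))\,ds$, (44.4.2)


where “$\int$” is the Cauchy-Riemann-Darboux vector-valued integral in Definition 43.9.2.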


PROOF: Suppose that $\gamma$ satisfies line (44.4.1). Then $\gamma \in C^1(I, \mathbb{R}^n)$ by Theorem 44.4.2. So $\gamma' \in C^0(I, \mathbb{R}^n)$
by Theorem 42.5.16 (vii). Therefore $\gamma'$ is uniformly continuous on the symmetrised closed interval $[[t_0, t]]$ by
Theorem 38.3.7 for all $t \in I$. (See Notation 16.1.15 for symmetrised closed intervals $[[a,b]]$ for $a, b \in \mathbb{R}$.) So
$\gamma(t) - \gamma(t_0) = \int_{t_0}^{t} \gamma'(s)\,ds$ for all $t \in I$ by Theorems 43.3.20 and 43.8.5 applied to the components $\gamma_i$ of $\gamma$
for $i \in \mathbb{N}_n$. Thus $\gamma$ satisfies line (44.4.2).

Now suppose that $\gamma$ satisfies line (44.4.2). The continuity of $\gamma$ and $X$ implies that the map $s \mapsto X(s, \gamma(s))$
is continuous from $I$ to $\mathbb{R}^n$ by Theorem 31.12.7. So $\gamma'(t) = X(t, \gamma(t))$ for all $t \in I$ by Theorem 43.8.3 applied
to the components of $\gamma$. Hence $\gamma$ satisfies line (44.4.1).
44.4.4 DEFINITION: A local approximation cylinder for a set $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$ with $n \in \mathbb{Z}^+$, a function
$X \in C^0(U, \mathbb{R}^n)$, and $(t_0, x_0) \in U$, is a set $R = B_{t_0,a} \times B_{x_0,b}$ with $a, b \in \mathbb{R}^+$, which satisfies the following.

(i) $\bar R \subseteq U$. (The closure is relative to $\mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$.)

(ii) $b \ge Ma$, where $M = \sup\{|X(t,x)|; (t,x) \in \bar R\} \in \mathbb{R}_0^+$.


44.4.5 THEOREM: Existence of a local approximation cylinder. 
For all $n \in \mathbb{Z}^+$, $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$, $X \in C^0(U, \mathbb{R}^n)$ and $(t_0, x_0) \in U$, there exists a local approximation
cylinder for $U$, $X$ and $(t_0, x_0)$.


PROOF: Let $b = \min(1, \tfrac12 d((t_0, x_0), \mathbb{R}^{1+n} \setminus U))$. (See Definition 37.4.1 for the point-set distance function $d$.)
Then $b \in \mathbb{R}^+$ by Theorem 37.5.6 (v). Let $R_1 = B_{t_0,b} \times B_{x_0,b}$. Then $\forall (t,x) \in \bar R_1$, $d((t_0, x_0), (t,x)) \le 2^{1/2} b <
2b \le d((t_0, x_0), \mathbb{R}^{1+n} \setminus U)$. So $\bar R_1 \subseteq U$. Let $M_1 = \sup\{|X(t,x)|; (t,x) \in \bar R_1\}$. Then $M_1 \in \mathbb{R}_0^+$ by
Theorem 37.7.5. Let $a = b/\max(1, M_1)$. Then $0 < a \le b$.

Let $R = B_{t_0,a} \times B_{x_0,b}$ and $M = \sup\{|X(t,x)|; (t,x) \in \bar R\}$. Then $\bar R \subseteq \bar R_1 \subseteq U$ by Theorem 31.8.13 (xiv), and
$b \ge a\max(1, M) \ge Ma$ by Theorem 11.2.42 (ii). Hence $R$ is a local approximation cylinder for $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$,
$X \in C^0(U, \mathbb{R}^n)$ and $(t_0, x_0) \in U$ by Definition 44.4.4.


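44.4.6 DEFINITION: The polygonal approximation for interval partitions $p = (p_i)_{i=0}^{m} \in \mathrm{Part}(t_0, t_0 + a)$ and
$q = (q_j)_{j=0}^{\ell} \in \mathrm{Part}(t_0, t_0 - a)$, for a local approximation cylinder $B_{t_0,a} \times B_{x_0,b}$ for $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$,
$X \in C^0(U, \mathbb{R}^n)$ and $(t_0, x_0) \in U$, where $n \in \mathbb{Z}^+$, is the function $f : [t_0 - a, t_0 + a] \to \mathbb{R}^n$ defined inductively
as follows.

(i) $f(t_0) = x_0$.

(ii) $\forall i \in \mathbb{N}_m$, $\forall t \in [p_{i-1}, p_i]$, $f(t) = f(p_{i-1}) + X(p_{i-1}, f(p_{i-1}))(t - p_{i-1})$.

(iii) $\forall j \in \mathbb{N}_\ell$, $\forall t \in [q_j, q_{j-1}]$, $f(t) = f(q_{j-1}) + X(q_{j-1}, f(q_{j-1}))(t - q_{j-1})$.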
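The inductive rules (i) to (iii) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name `polygonal_approximation`, the uniform partitions $p_i = t_0 + ia/m$ and $q_j = t_0 - ja/m$, and the rotation field used in the demonstration are hypothetical choices, not part of the definition.

```python
import math


def polygonal_approximation(X, t0, x0, a, m):
    """Return the polygonal approximation f on [t0 - a, t0 + a].

    X(t, x) must return a list of n floats (the "velocity function").
    Assumption: both partitions are uniform with m steps; the definition
    itself permits arbitrary interval partitions.
    """
    def build(sign):
        # Nodes and node values on one side of t0, following rule (ii) or (iii).
        ts, xs = [t0], [list(x0)]
        for i in range(1, m + 1):
            t_prev, x_prev = ts[-1], xs[-1]
            v = X(t_prev, x_prev)
            t_next = t0 + sign * i * a / m
            xs.append([xj + vj * (t_next - t_prev) for xj, vj in zip(x_prev, v)])
            ts.append(t_next)
        return ts, xs

    right, left = build(+1), build(-1)

    def f(t):
        ts, xs = right if t >= t0 else left
        # Index of the sub-interval containing t, counted outwards from t0.
        i = min(int(abs(t - t0) * m / a), m - 1)
        v = X(ts[i], xs[i])
        # Linear extension from the node of that sub-interval nearer to t0.
        return [xj + vj * (t - ts[i]) for xj, vj in zip(xs[i], v)]

    return f


if __name__ == "__main__":
    # Hypothetical test field: rotation in the plane, exact solution (cos t, sin t).
    rotation = lambda t, x: [-x[1], x[0]]
    f = polygonal_approximation(rotation, 0.0, (1.0, 0.0), 1.0, 1000)
    print(f(1.0), (math.cos(1.0), math.sin(1.0)))
```

Refining the partitions (larger `m`) gives the sequence of approximations from which the existence proof extracts a convergent subsequence.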
44.4.7 REMARK: Proof of well-definition of polygonal approximations. 


The proof of Theorem 44.4.8 is textually identical to the proof of Theorem 44.3.13. However, many of the 
symbols have a different meaning. 


44.4.8 THEOREM: Well-definition of polygonal approximations for Peano’s ODE existence theorem.

Let $n \in \mathbb{Z}^+$. Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$, $X \in C^0(U, \mathbb{R}^n)$ and $(t_0, x_0) \in U$. Let $R = B_{t_0,a} \times B_{x_0,b}$ be a local
approximation cylinder for $U$, $X$ and $(t_0, x_0)$. For interval partition pairs $(p,q)$ with $p \in \mathrm{Part}(t_0, t_0 + a)$ and
$q \in \mathrm{Part}(t_0, t_0 - a)$, let $f_{p,q}$ denote the polygonal approximation for $U$, $X$, $(t_0, x_0)$, $R$, $p$ and $q$.

(i) $f_{p,q}$ is a well-defined piecewise linear continuous function from $\bar B_{t_0,a}$ to $\bar B_{x_0,b}$ for all $p \in \mathrm{Part}(t_0, t_0 + a)$
and $q \in \mathrm{Part}(t_0, t_0 - a)$.

(ii) $\exists K \in \mathbb{R}_0^+$, $\forall p \in \mathrm{Part}(t_0, t_0 + a)$, $\forall q \in \mathrm{Part}(t_0, t_0 - a)$, $\forall t \in [t_0 - a, t_0 + a]$, $|f_{p,q}(t)| \le K$.
Thus the set of polygonal approximations, for given $U$, $X$ and $(t_0, x_0)$, is a uniformly bounded set of
functions on $[t_0 - a, t_0 + a]$. (See Definition 38.4.14.)

(iii) $\forall \varepsilon \in \mathbb{R}^+$, $\exists \delta \in \mathbb{R}^+$, $\forall p \in \mathrm{Part}(t_0, t_0 + a)$, $\forall q \in \mathrm{Part}(t_0, t_0 - a)$, $\forall t_1, t_2 \in [t_0 - a, t_0 + a]$,
$|t_1 - t_2| < \delta \Rightarrow |f_{p,q}(t_1) - f_{p,q}(t_2)| < \varepsilon$.
Thus the set of polygonal approximations, for given $U$, $X$ and $(t_0, x_0)$, is a uniformly equicontinuous
set of functions on $[t_0 - a, t_0 + a]$. (See Definition 38.4.11.)


PROOF: In this proof, it is convenient to abbreviate “$f_{p,q}$” to “$f$”.


For part (i), $f(p_i) = f(p_{i-1}) + X(p_{i-1}, f(p_{i-1}))(p_i - p_{i-1})$ for all $i \in \mathbb{N}_m$, where $p = (p_i)_{i=0}^{m}$. Summing
differences $f(p_i) - f(p_{i-1})$ gives $|f(p_m) - f(t_0)| \le Ma \le b$ with $M = \sup\{|X(t,x)|; (t,x) \in \bar R\}$. So
$(t_0 + a, f(t_0 + a)) \in \bar R$. Induction on $i$ gives $\forall t \in [t_0, t_0 + a]$, $(t, f(t)) \in \bar R$ by the convexity of $\bar R$. Similarly,
$\forall t \in [t_0 - a, t_0]$, $(t, f(t)) \in \bar R$.

For part (ii), let $K = |x_0| + Ma$ with $M = \sup\{|X(t,x)|; (t,x) \in \bar R\}$. The bound on $[t_0, t_0 + a]$ follows
by induction on $i$ because $|f(t_0)| = |x_0|$ and $|f(t) - f(p_{i-1})| \le M(p_i - p_{i-1})$ for all $t \in [p_{i-1}, p_i]$, for all
$i \in \mathbb{N}_m$, which gives $|f(t)| \le |x_0| + Ma$ for all $t \in [t_0, t_0 + a]$ by the triangle inequality applied to
successive intervals of $p$. The bound on $[t_0 - a, t_0]$ follows similarly.

For part (iii), the bound holds with $\delta = \varepsilon/M$ for all $\varepsilon \in \mathbb{R}^+$, where $M = \sup\{|X(t,x)|; (t,x) \in \bar R\}$. (This
may be shown for $t_1, t_2 \in [t_0, t_0 + a]$ by induction on $i_1, i_2$ such that $t_1 \in [p_{i_1 - 1}, p_{i_1}]$ and $t_2 \in [p_{i_2 - 1}, p_{i_2}]$, and
similarly for $t_1 \in [t_0 - a, t_0]$ or $t_2 \in [t_0 - a, t_0]$.)
((2018-11-16. Here must generalise more Peano ODE lemmas to $\mathbb{R}^n$. ))


44.4.9 THEOREM: Peano existence theorem for first-order ODE systems. 
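Let $n \in \mathbb{Z}^+$. Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$, $X \in C^0(U, \mathbb{R}^n)$ and $(t_0, x_0) \in U$. Then


$\exists r \in \mathbb{R}^+$, $\exists \gamma \in C^1(B_{t_0,r}, \mathbb{R}^n) \cap \mathbb{P}(U)$,
$\gamma(t_0) = x_0$ and $\forall t \in B_{t_0,r}$, $\gamma'(t) = X(t, \gamma(t))$.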


PROOF: 
((2018-11-11. To be continued ... ))


44.5. Ordinary differential equations, uniqueness 


44.5.1 REMARK: Uniqueness theorems for ODEs and ODE systems. 

For the sets of open interval neighbourhoods of points of $\mathbb{R}$ which are used in Theorems 44.5.2 and 44.5.3, see Notation 34.4.11.
The Lipschitz conditions are imposed here only on the “space” variable, but uniformly with respect to the
“time” variable.


Theorems 44.5.2 and 44.5.3, for real-valued and vector-valued ODEs respectively, may seem to have almost 
identical statements and proofs, but Theorem 44.5.3 relies upon Theorem 43.9.5 (vii), a kind of triangle 
inequality for vector-valued Darboux integrals, which has a significantly more difficult proof than the much 
simpler inequality in Theorem 43.7.9 (vii) for real-valued Darboux integrals. Thus the relative complexity 
here is “hidden in the lemmas”. 


44.5.2 THEOREM: Uniqueness theorem for real-valued first-order ODEs. 
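Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R})$. Let $U' = \{t \in \mathbb{R}; U_t \ne \emptyset\}$, where $U_t = \{x \in \mathbb{R}; (t,x) \in U\}$ for all $t \in \mathbb{R}$. Let
$X \in C^0(U, \mathbb{R})$ and


$\exists L \in \mathbb{R}_0^+$, $\forall t \in U'$, $\forall x, y \in U_t$, $|X(t,x) - X(t,y)| \le L|x - y|$. (44.5.1)


Let $(t_0, x_0) \in U$ and let $I$ be an open interval neighbourhood of $t_0$ in $\mathbb{R}$. Let $\gamma_1, \gamma_2 \in C^1(I, \mathbb{R}) \cap \mathbb{P}(U)$ satisfy
(i) $\gamma_k(t_0) = x_0$ for $k = 1, 2$, and
(ii) $\forall t \in I$, $\gamma_k'(t) = X(t, \gamma_k(t))$ for $k = 1, 2$.


Then $\gamma_1 = \gamma_2$.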


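PROOF: By Theorem 44.3.5, $\forall t \in I$, $\gamma_k(t) = x_0 + \int_{t_0}^{t} X(s, \gamma_k(s))\,ds$ for $k = 1, 2$. So by Theorem 43.7.9 (vii),


$\forall t \in I$, $|\gamma_1(t) - \gamma_2(t)| = \left| \int_{t_0}^{t} \big( X(s, \gamma_1(s)) - X(s, \gamma_2(s)) \big)\,ds \right|$


$\le L \left| \int_{t_0}^{t} |\gamma_1(s) - \gamma_2(s)|\,ds \right|$, (44.5.2)


where $L$ satisfies the Lipschitz condition in line (44.5.1). Let $t_1 \in I$. Then $I' = [[t_0, t_1]]$ is a compact subset
of $I$. So $\gamma_1$ and $\gamma_2$ are bounded on $I'$ by Theorem 37.7.5 and Notation 42.1.10. Thus there exists $K \in \mathbb{R}_0^+$
with $|\gamma_1(s) - \gamma_2(s)| \le K$ for all $s \in I'$. So by Theorem 43.7.9 (viii) and line (44.5.2),


$\forall t \in I'$, $|\gamma_1(t) - \gamma_2(t)| \le KL\,|t - t_0|$.
This may be substituted any number of times into line (44.5.2) to obtain


$\forall t \in I'$, $\forall k \in \mathbb{Z}_0^+$, $|\gamma_1(t) - \gamma_2(t)| \le K L^k \dfrac{|t - t_0|^k}{k!}$.


Therefore $\forall t \in I'$, $\gamma_1(t) = \gamma_2(t)$ by Theorem 35.4.21 (iv). Since this holds for all intervals of the form $[[t_0, t_1]]$
for $t_1 \in I$, it follows that $\gamma_1 = \gamma_2$.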
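The final step of this argument relies on the fact that $K L^k |t - t_0|^k / k!$ tends to zero as $k$ increases, for any fixed $K$, $L$ and $t$. A minimal numerical sketch of this decay follows. The function name `uniqueness_bound` and the sample values of $K$, $L$ and $|t - t_0|$ are hypothetical, chosen only to show that the bound may grow at first before the factorial takes over.

```python
import math


def uniqueness_bound(K, L, dt, k):
    """Right-hand side K * L**k * |dt|**k / k! of the iterated estimate."""
    return K * (L * abs(dt)) ** k / math.factorial(k)


# Hypothetical sample values: K = 2, L = 3, |t - t0| = 1.5.
bounds = [uniqueness_bound(K=2.0, L=3.0, dt=1.5, k=k) for k in range(0, 41, 5)]

if __name__ == "__main__":
    for k, value in zip(range(0, 41, 5), bounds):
        print(k, value)
```

Since the left-hand side $|\gamma_1(t) - \gamma_2(t)|$ does not depend on $k$, it is bounded by every term of this sequence and must therefore equal zero.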
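44.5.3 THEOREM: Uniqueness theorem for Cartesian-space-valued first-order ODEs.
Let $n \in \mathbb{Z}^+$. Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$. Let $U' = \{t \in \mathbb{R}; U_t \ne \emptyset\}$, where $U_t = \{x \in \mathbb{R}^n; (t,x) \in U\}$ for all
$t \in \mathbb{R}$. Let $X \in C^0(U, \mathbb{R}^n)$ and


$\exists L \in \mathbb{R}_0^+$, $\forall t \in U'$, $\forall x, y \in U_t$, $|X(t,x) - X(t,y)| \le L|x - y|$. (44.5.3)
Let $(t_0, x_0) \in U$ and let $I$ be an open interval neighbourhood of $t_0$ in $\mathbb{R}$. Let $\gamma_1, \gamma_2 \in C^1(I, \mathbb{R}^n) \cap \mathbb{P}(U)$ satisfy
(i) $\gamma_k(t_0) = x_0$ for $k = 1, 2$, and
(ii) $\forall t \in I$, $\gamma_k'(t) = X(t, \gamma_k(t))$ for $k = 1, 2$.
Then $\gamma_1 = \gamma_2$.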


PROOF: By Theorem 44.4.3, $\forall t \in I$, $\gamma_k(t) = x_0 + \int_{t_0}^{t} X(s, \gamma_k(s))\,ds$ for $k = 1, 2$. So by Theorem 43.9.5 (vii),


$\forall t \in I$, $|\gamma_1(t) - \gamma_2(t)| \le \left| \int_{t_0}^{t} |X(s, \gamma_1(s)) - X(s, \gamma_2(s))|\,ds \right|$


$\le L \left| \int_{t_0}^{t} |\gamma_1(s) - \gamma_2(s)|\,ds \right|$, (44.5.4)


where $L$ satisfies the Lipschitz condition in line (44.5.3). Let $t_1 \in I$. Then $I' = [[t_0, t_1]]$ is a compact subset
of $I$. So $\gamma_1$ and $\gamma_2$ are bounded on $I'$ by Theorem 37.7.5 and Notation 42.5.10. Thus there exists $K \in \mathbb{R}_0^+$
with $|\gamma_1(s) - \gamma_2(s)| \le K$ for all $s \in I'$. So by Theorem 43.9.5 (viii) and line (44.5.4),


$\forall t \in I'$, $|\gamma_1(t) - \gamma_2(t)| \le KL\,|t - t_0|$.
This may be substituted any number of times into line (44.5.4) to obtain


$\forall t \in I'$, $\forall k \in \mathbb{Z}_0^+$, $|\gamma_1(t) - \gamma_2(t)| \le K L^k \dfrac{|t - t_0|^k}{k!}$.


Therefore $\forall t \in I'$, $\gamma_1(t) = \gamma_2(t)$ by Theorem 35.4.21 (iv). Since this holds for all intervals of the form $[[t_0, t_1]]$
for $t_1 \in I$, it follows that $\gamma_1 = \gamma_2$.


44.5.4 REMARK: Approximation of solutions of first-order ODEs. 

An important application of solutions of first-order ordinary differential equations is to the Lie bracket of two 
vector fields. For the geometric interpretation of the Lie bracket, for example in Theorems 46.5.8 and 61.5.21, 
approximations are required for integral curves of vector fields. It can be shown that the geometric “holonomy 
deviation” of the commutator of two vector field actions is determined by the algebraic commutator of the 
two vector fields X and Y. This “holonomy deviation” means the difference between the start and end points 
of a quadrilateral formed from the concatenation of four integral curves, generated by fields X, Y, —X and 
—Y for equal curve parameter “times”. (See Definition 46.5.6 for “quadrilateral curve families”.) To obtain 
an approximation for such concatenations of curves, the first task is to obtain suitable approximations for a 
single curve. For this purpose, it is sufficient to obtain approximations for time-independent ODEs. 


There is a kind of “chicken-and-egg” difficulty in the proof of Theorem 44.5.5. A bound for $\gamma$ can be obtained
from a bound for $\gamma'$, which can be obtained from a bound for $X \circ \gamma$, which can be obtained from a bound
for $\gamma$. A similar difficulty arises in the proofs of Theorems 44.5.2 and 44.5.3, where the cycle of dependencies
is boot-strapped by noting bounds for the curves on some compact set. The same kind of strategy is adopted
in the proof of Theorem 44.5.5.


((2019-8-17. Theorem 44.5.5 is not correct. Please ignore it temporarily. It is just a first stab at a sketch of a 
theorem which should give a bound for a curve, given a bound for the vector field. The important theorem 
will be the first-order approximation which will be given shortly after. This will then be used to show the 
relation between nonholonomy of vector field pairs and their Lie bracket. ))


44.5.5 THEOREM: First-order approximation for solutions of real-valued time-independent ODEs. 
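Let $I$ be an open interval of $\mathbb{R}$. Let $\Omega \in \mathrm{Top}(\mathbb{R})$ and $X \in C^0(\Omega, \mathbb{R})$. Let $\gamma \in C^1(I, \Omega)$ satisfy $\gamma'(t) = X(\gamma(t))$ for
all $t \in I$. Then
$\forall t_0 \in I$, $\forall \varepsilon \in \mathbb{R}^+$, $\exists \delta \in \mathbb{R}^+$, $\forall t \in (t_0 - \delta, t_0 + \delta)$,
$|\gamma(t) - (\gamma(t_0) + X(\gamma(t_0))(t - t_0))| \le \phi_X(\bar B_{\gamma(t_0),\varepsilon})\,|t - t_0|$,
where $\phi_X(K) = \sup\{|X(x_1) - X(x_2)|; x_1, x_2 \in K\}$ for compact subsets $K$ of $\mathrm{Dom}(X)$.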
PROOF: Let rg = (to). Assume that X(xo) Æ 0. Let &i = i|X (zo). Then £1 € IR*. So there exists 
ô € Rt such that X (Bay51) c Bx (5),5,- Let G— In aH BS ds s 


((2019-8-18. To be continued ... )) 


44.6. Ordinary differential equations, Picard method 


44.6.1 REMARK: Some literature for the Picard iteration method. 
The Picard iteration method for ODE existence is presented by Murray/Miller [119], pages 48-76, 112-117; 
Bear [55], pages 140-160, 171-178; Coddington [61], pages 200-225, 250-256; Coddington/Levinson [62], 
pages 11-13; Mattuck [114], pages 445-454; Hurewicz [95], pages 18-22, 39-40; Tenenbaum/Pollard [148], 
pages 719-747, 763-770. (See also the 1890 publication by Picard [193].)


ODE existence proofs based on contraction map theorems are given by Lang [23], pages 67-88; Spivak [37],
Volume 1, pages 139-142; Kolmogorov/Fomin [104], pages 71-74; Simmons [137], pages 339-340; Thomson/
Bruckner/Bruckner [149], pages 595-597; Bruckner/Bruckner/Thomson [56], pages 372-374; Rosenlicht [128],
pages 177-192.


44.6.2 REMARK: Integration ignores “noise”. 

Integral methods, such as the Picard iteration method, have the advantage of ignoring “noise” which has 
measure zero in some sense. A differential method, on the other hand, such as the Peano method, is 
sensitive to variations in the force function at individual points of a partition which is used for defining 
approximations. (It is observed in the electronics engineering of operational amplifiers that integration is 
resilient to noise, whereas differentiation amplifies noise. So generally integration is preferred in the design 
of analogue computers.) The “noise resilience” of integral methods has the particular advantage that the 
dependent variable can be a point on a rectifiable curve, whose velocity is only defined almost everywhere. 


44.6.3 REMARK: Applications of ODE theory to differential geometry. 
The applications of ODE theory to differential geometry include the following. 


(1) Existence and uniqueness of integral curves of vector fields. 
(2) Existence and uniqueness of one-parameter subgroups of Lie groups for given Lie algebra elements. 


(3) Existence and uniqueness of parallel transport along curves for a given connection on a vector bundle. 


Cases (1) and (2) have the form of time-independent ODE systems, whereas case (3) has the form of a 
time-dependent linear ODE system. 


44.6.4 REMARK: Curvature, parallel transport, connections, and systems of ODEs. 

The central concept of differential geometry is curvature, which is defined in terms of the deviation from 
absolute parallelism of parallel transport along closed curves in a fibre bundle. (This “holonomy deviation” 
is discussed in Section 70.1. See Section 71.4 for parallel transport.) Therefore, since curvature is defined 
in terms of parallel transport, it could be argued that parallel transport is even more central to differential
geometry than curvature. However, parallel transport is defined, via the coordinate charts, as the solution of 
an initial value problem for a system of ordinary differential equations. Therefore one could argue that ODE 
systems are even more central to differential geometry than parallelism. Without existence and uniqueness 
for solutions of initial value problems for ODE systems, there would be no parallel transport. Without 
parallel transport, there would be no curvature. Without curvature, all differential geometry would be flat. 


If one writes down the meanings of the equations which appear in differential geometry, it becomes evident 
that it is really a branch of the analysis of differential equations, since almost all equations in differential 
geometry are differential equations. In particular, the equations for parallel transport are systems of ordinary 
differential equations via the coordinate charts. 


44.6.5 REMARK: Linearity and discontinuity of the differential equations for parallel transport. 

The ordinary differential equations which describe parallel transport for affine and general connections are 
linear with respect to the z-variable, but potentially discontinuous with respect to the t-variable. (As a 
mnemonic, z means here some kind of “state” in a fibre space for a parameter t of a curve in the base 
space.) This is assuming that one formulates the systems as $\phi'(t) = f(t, \phi(t))$, where $\phi : \mathbb{R} \to \mathbb{R}^m$ is the
unknown solution and $f : \mathbb{R} \times \mathbb{R}^m \to \mathbb{R}^m$ defines the system of equations. Here $t \in \mathbb{R}$ and $z = \phi(t) \in \mathbb{R}^m$
for some $m \in \mathbb{Z}^+$. The linearity with respect to $z$ means that the system of equations may be written as
$\phi'(t) = B(t)(\phi(t))$ for $t \in \mathbb{R}$, where $B(t) : \mathbb{R}^m \to \mathbb{R}^m$ is a linear map for all $t \in \mathbb{R}$.


In the parallel transport application, t is the parameter of a rectifiable curve in a differentiable manifold and 
z is the object which is being transported, typically a vector in a vector bundle. (The transported object 
could also be an element of a structure group, for example, or an element of any other kind of fibre space. 
If the fibre space is a differentiable manifold, the ODE system involves vector fields and tangent vectors via 
Cartesian charts.) Typically the function B(t) depends on the tangent vector to the curve, but this tangent 
vector only exists almost everywhere. For this reason, the Peano approach to approximation and existence 
proofs is unsuitable. (The Peano approach requires the function f to be uniformly continuous on closed 
intervals.) The Picard approach uses integrals which conveniently gloss over discontinuities which occur at 
most on a set of measure zero. 


44.6.6 REMARK: Motivation for solving ODE systems along rectifiable curves. 

Many differential geometry texts require curves to be at least differentiable everywhere, and typically $C^1$ or
even $C^\infty$. But for calculating curvature for a given connection, one typically calculates parallel transport
around a rectangular path which converges to a point. Rectangular paths are not differentiable everywhere! 
One way out of this problem is to regard a rectangular path as piecewise differentiable, but this naturally 
suggests the question of the most general kind of path along which parallel transport may be computed, 
which is a rectifiable path. 


In the special case of rectangular paths, the velocity is not defined at the vertices, and the left and right limits 
of the velocity at the vertices are different. The parallel transport which is computed for such a path is the 
same as if one integrates along each edge and uses the final value along one edge as the initial value for the 
succeeding edge. In the case of a general rectifiable curve, one may have infinitely many such discontinuities, 
but the integral is still well defined. 


44.6.7 REMARK:  Unfortunate properties of Picard approximations. 

Example 44.6.17 demonstrates the unfortunate fact that although the solution to a parallel transport system 
of equations may be orthogonal, the Picard approximations are definitely not orthogonal. Worse than this 
is the fact that the convergence is not uniform on an infinite interval. It would be possible to fix the first 
problem by developing an iteration method which uses only approximate curves which lie within a specified 
Lie group. However, such an approach would in essence perform Picard iterations in advance to construct the 
approximation curves within the specified Lie group, after which the iterations are all performed within that 
group. Although this approach would be philosophically desirable, the costs of setting up the “machinery” 
for such a method may exceed the benefits. 


To get around the non-uniform convergence of Picard iterants on unbounded intervals, one simply restricts 
the curve parameter domain to a bounded interval and “daisy-chains” the solutions together along an infinite 
concatenation of such bounded intervals. 
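The effect of such “daisy-chaining” can be sketched numerically for the rotation operator of Example 44.6.17. The sketch assumes a constant operator $B$, in which case the $k$-th Picard iterate started from $v$ is the partial sum $\sum_{j \le k} (hB)^j v / j!$ of the exponential series. The function names `picard_endpoint` and `daisy_chain`, the iterate order $k = 8$ and the interval $[0, 20]$ are hypothetical choices for illustration.

```python
import math


def apply_B(v):
    # B(v) = (-v^2, v^1): the rotation operator of Example 44.6.17.
    return (-v[1], v[0])


def picard_endpoint(v, h, k):
    """Value at time h of the k-th Picard iterate started from v at time 0.

    Assumption: B is constant, so the iterate is the partial sum
    sum_{j<=k} (h*B)^j v / j!, which the loop accumulates term by term.
    """
    total = list(v)
    term = list(v)
    for j in range(1, k + 1):
        bt = apply_B(term)
        term = [h * c / j for c in bt]
        total = [s + c for s, c in zip(total, term)]
    return tuple(total)


def daisy_chain(v0, t_end, pieces, k):
    """Chain k-th Picard iterates over equal sub-intervals of [0, t_end]."""
    v, h = tuple(v0), t_end / pieces
    for _ in range(pieces):
        v = picard_endpoint(v, h, k)
    return v


def error(v, t):
    """Distance from v to the exact solution (cos t, sin t)."""
    return math.hypot(v[0] - math.cos(t), v[1] - math.sin(t))


if __name__ == "__main__":
    print(error(picard_endpoint((1.0, 0.0), 20.0, 8), 20.0))
    print(error(daisy_chain((1.0, 0.0), 20.0, 40, 8), 20.0))
```

With the same iterate order, the single long interval gives a very large error at $t = 20$, whereas forty short intervals keep the error small, because each restart uses the end value of the previous piece as its initial value.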


44.6.8 REMARK: Representation of “problems” and “procedures” as sets. 

For the last hundred years or more, there has been a strong tendency in pure mathematics to identify every 
kind of object with a set. Thus numbers, for example, are now regarded as sets which are members of 
number-sets such as $\omega$, $\mathbb{Z}$ and $\mathbb{R}$, and geometrical points and lines are regarded as elements or subsets of
point-set structures such as $\mathbb{R}^n$. In this book, even the meanings of logical propositions are regarded as
“knowledge sets”. (See Section 3.4.) 


Following this general trend, “problems” may be represented as particular kinds of sets. For example, ODE 
initial value problem classes are represented in Definitions 44.6.12 and 44.6.14 by a parametrised family of 
sets of solutions of problems in each problem-class. The prime objective is then to demonstrate that the 
solution-set contains one and only one element for each problem in the problem-class. As always (or almost 
always), when sets are used to indicate elements of a class, the meaning of such sets must be indicated in 
the context. As outlined in Section 8.8, this contextual bestowal of meanings on sets could be formalised by 
associating “class tags” with each set, although the meanings of all such class tags would need to be defined 
in natural language in some kind of class-tag dictionary. 


In a similar way, iteration procedures may be represented by the sequences of outputs which result from 
executing those procedures, as for example in Definition 44.6.15. The output sequences are merely sets, 
whereas the procedure is a sequence of computations. The “true meaning” of a computational procedure 
is metamathematical and cannot be described by a set alone. The natural-language context adds this
metamathematical meaning to the sets. Thus both “problems” and “procedures” may be identified with
particular ZF sets, but the meaning must still be gleaned from the context. An object-set communicates
which object of a class is intended. The natural-language context, or a class-tag, communicates which
class is intended. The knowledgeable reader may then associate a meaning with the object, given these two
“inputs”. (This is somewhat analogous to the process by which pages of text and images are communicated 
to a printer by sequences of data-bytes. The printer interprets the byte-sequence to produce its meaning, 
which in this case is a piece of paper with ink printed onto it.) 


44.6.9 REMARK: Specification of systems of ordinary differential equations. 

Definition 44.6.12 specifies a system of ordinary differential equations as a set of solutions. Line (44.6.1)
requires the solutions to be differentiable on some real-number interval. Consequently all solutions in the
solution-set $S_c(I, V, B)$ must be classical solutions whose derivatives are defined everywhere on their domains.


The fundamental theorem of calculus implies that solutions according to Definition 44.6.12 are also solutions
according to Definition 44.6.14. In other words, $S_c(I, V, B) \subseteq S_w(I, V, B)$ for all real-number intervals $I$,
finite-dimensional real linear spaces $V$, and linear operator families $B$.


Expressed in terms of Definitions 44.6.12 and 44.6.14, the initial main focus of the study of initial value
problems for systems of ordinary differential equations is to demonstrate existence and uniqueness, which
means that the sets $S_c(I, V, B; t_0, v_0)$ and $S_w(I, V, B; t_0, v_0)$ should contain one and only one element for
the least possible constraints on the parameters of these sets.


44.6.10 NOTATION: $D^k(\Omega, \mathbb{R}^m)$, for $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, for some $n, m, k \in \mathbb{Z}^+$, denotes the set of functions
from $\Omega$ to $\mathbb{R}^m$ which are $k$ times differentiable on $\Omega$.


44.6.11 NOTATION: $R^p_{\mathrm{loc}}(I, \mathbb{R}^m)$, for an interval-product $I$ of $\mathbb{R}^n$, for some $n, m \in \mathbb{Z}^+$ and $p \in \mathbb{R}^+$, denotes
the set of functions from $I$ to $\mathbb{R}^m$ for which the $p$-th power of the norm is Riemann integrable on bounded
subinterval-products of $I$.


44.6.12 DEFINITION: A system of ordinary differential equations (with classical solutions) is a set


$S_c(I, V, B) = \{A \in D^1(I, V); \forall t \in I, A'(t) = B(t)(A(t))\}$, (44.6.1)


where $I$ is an open interval of $\mathbb{R}$, $V$ is a finite-dimensional real linear space, and $B : I \to \mathrm{Lin}(V, V)$ is a
family of bounded linear operators on $V$.


An initial value problem for a system of ordinary differential equations (with classical solutions) is a set


$S_c(I, V, B; t_0, v_0) = \{A \in S_c(I, V, B); A(t_0) = v_0\}$, (44.6.2)


for some initial parameter $t_0 \in I$ and initial value $v_0 \in V$, where $I$ is an open interval of $\mathbb{R}$, $V$ is a
finite-dimensional real linear space, and $B : I \to \mathrm{Lin}(V, V)$ is a family of bounded linear operators on $V$.


44.6.13 REMARK: Integrability requirements for ordinary differential equations coefficients.

The integral expressions in lines (44.6.3) and (44.6.5) in Definitions 44.6.14 and 44.6.15 respectively are not
necessarily well defined for the constraints which are stated for the “coefficient” linear operator family $B$.
Further constraints may be required in order to guarantee their well-definition.


44.6.14 DEFINITION: A system of ordinary differential equations (with weak solutions) is a set


$S_w(I, V, B) = \{A \in C^0(I, V) \cap R^1_{\mathrm{loc}}(I, V); \forall t_0, t_1 \in I, A(t_1) - A(t_0) = \int_{t_0}^{t_1} B(s)(A(s))\,ds\}$, (44.6.3)


where $I$ is an open interval of $\mathbb{R}$, $V$ is a finite-dimensional real linear space, and $B : I \to \mathrm{Lin}(V, V)$ is a
family of bounded linear operators on $V$.


An initial value problem for a system of ordinary differential equations (with weak solutions) is a set


$S_w(I, V, B; t_0, v_0) = \{A \in S_w(I, V, B); A(t_0) = v_0\}$, (44.6.4)
for some initial parameter $t_0 \in I$ and initial value $v_0 \in V$, where $I$ is an open interval of $\mathbb{R}$, $V$ is a
finite-dimensional real linear space, and $B : I \to \mathrm{Lin}(V, V)$ is a family of bounded linear operators on $V$.
44.6.15 DEFINITION: The Picard iteration sequence for an initial value problem $S_w(I, V, B; t_0, v_0)$, for
initial parameter $t_0 \in I$ and initial value $v_0 \in V$, where $I$ is an open interval of $\mathbb{R}$, $V$ is a finite-dimensional
real linear space, and $B : I \to \mathrm{Lin}(V, V)$ is a family of bounded linear operators on $V$, is the sequence $(A_i)_{i=0}^\infty$
of functions $A_i \in C^0(I, V) \cap R^1_{\mathrm{loc}}(I, V)$ for $i \in \mathbb{Z}_0^+$ which are defined inductively by $A_0(t) = v_0$ for all $t \in I$,
and


$\forall i \in \mathbb{Z}^+$, $\forall t \in I$, $A_i(t) = v_0 + \int_{t_0}^{t} B(s)(A_{i-1}(s))\,ds$. (44.6.5)


44.6.16 REMARK: Iteration schemes generate infinite families of wrong answers. 

One of the crazy things about iteration schemes is that they generate infinite families of wrong answers, and 
supposedly after an infinite number of iterations, the correct answer is produced, but one never arrives there 
in a finite lifetime. So the real output of an iteration scheme is a very large number of wrong answers. This is 
something which could be difficult to explain to a non-mathematician. In fact, there were initially some quite 
strong objections to calculus in the 18th century for very similar reasons. Luckily in the 21st century, the 
concept of a limit is well accepted. Approximations are considered acceptable because of the broad success 
of applied mathematics in a wide range of disciplines, many of which bring substantial economic benefits. 
The idea that an approximation is “close enough” is part of everyday life and work. One might summarise 
the trade-off between the costs and benefits of precision in the motto: “Close enough is close enough.” 


44.6.17 EXAMPLE: Picard approximations for orthogonal transport along a curve. 

Let $I = \mathbb{R}$ and $V = \mathbb{R}^2$. Define $B : I \to \mathrm{Lin}(V, V)$ by $B(t)(v)^i = -\delta^i_1 v^2 + \delta^i_2 v^1$ for $t \in I$, $v \in V$
and $i \in \mathbb{N}_2$. Let $t_0 = 0$ and $v_0 = (1, 0)$. The unique solution for this initial value problem is $A : I \to V$
defined by $A(t) = (\cos t, \sin t)$ for all $t \in I$. Then $A'(t) = (-\sin t, \cos t) = (-v^2, v^1) = (-\delta^1_1 v^2, \delta^2_2 v^1)$, where
$v = (\cos t, \sin t)$ with the dependence on $t$ suppressed. So $A'(t)^i = -\delta^i_1 v^2 + \delta^i_2 v^1 = B(t)(v)^i = B(t)(A(t))^i$
for all $t \in I$ and $i \in \mathbb{N}_2$. Hence $A'(t) = B(t)(A(t))$ for all $t \in I$.

Define a zeroth approximation $A_0 : I \to V$ by $A_0(t) = v_0$ for all $t \in I$. Inductively define approximations
$A_k : I \to V$ for $k \in \mathbb{Z}^+$ by $A_k(t) = v_0 + \int_{t_0}^{t} B(s)(A_{k-1}(s))\,ds$. Then


$\forall t \in I$, $A_0(t) = (1, 0)$
$\forall t \in I$, $A_1(t) = (1, t)$
$\forall t \in I$, $A_2(t) = (1 - t^2/2, t)$
$\forall t \in I$, $A_3(t) = (1 - t^2/2, t - t^3/6)$
$\forall t \in I$, $A_4(t) = (1 - t^2/2 + t^4/24, t - t^3/6)$
$\forall k \in \mathbb{Z}_0^+$, $\forall t \in I$, $A_{2k}(t) = \left( \sum_{\ell=0}^{k} (-1)^\ell t^{2\ell}/(2\ell)!, \; \sum_{\ell=0}^{k-1} (-1)^\ell t^{2\ell+1}/(2\ell+1)! \right)$
$\forall k \in \mathbb{Z}_0^+$, $\forall t \in I$, $A_{2k+1}(t) = \left( \sum_{\ell=0}^{k} (-1)^\ell t^{2\ell}/(2\ell)!, \; \sum_{\ell=0}^{k} (-1)^\ell t^{2\ell+1}/(2\ell+1)! \right)$.


(This is illustrated in Figure 44.6.1.) Hence $\lim_{k\to\infty} A_k(t) = (\cos t, \sin t) = A(t)$ for all $t \in I$. Of course,
although these series for the cosine and sine have infinite radius of convergence, the convergence is very
unsatisfactory indeed for numerical methods for large values of the argument $t$.
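The first few iterants of this example can be checked mechanically with exact rational arithmetic. The following is a minimal sketch, assuming that each component of an iterant is stored as a list of polynomial coefficients in $t$. The helper names `integrate`, `picard_step`, `picard_iterates`, `evaluate` and `strip` are hypothetical.

```python
from fractions import Fraction


def integrate(poly):
    """Antiderivative vanishing at t = 0; poly[j] is the coefficient of t**j."""
    return [Fraction(0)] + [c / (j + 1) for j, c in enumerate(poly)]


def picard_step(A, v0):
    """One step A -> v0 + integral_0^t B(A(s)) ds with B(v) = (-v^2, v^1)."""
    a1, a2 = A
    new1 = integrate([-c for c in a2])  # first component of B(A) is -A^2
    new2 = integrate(a1)                # second component of B(A) is A^1
    new1[0] += v0[0]
    new2[0] += v0[1]
    return new1, new2


def picard_iterates(count):
    """Return the list [A_0, A_1, ..., A_count] for t0 = 0 and v0 = (1, 0)."""
    v0 = (Fraction(1), Fraction(0))
    A = ([v0[0]], [v0[1]])
    out = [A]
    for _ in range(count):
        A = picard_step(A, v0)
        out.append(A)
    return out


def evaluate(poly, t):
    """Evaluate a coefficient list at the float t."""
    return sum(float(c) * t ** j for j, c in enumerate(poly))


def strip(poly):
    """Drop trailing zero coefficients for easier comparison."""
    poly = list(poly)
    while len(poly) > 1 and poly[-1] == 0:
        poly.pop()
    return poly


if __name__ == "__main__":
    for k, (a1, a2) in enumerate(picard_iterates(4)):
        print(k, strip(a1), strip(a2))
```

Evaluating the squared norm of an iterant, for example with `evaluate`, shows directly that the iterants leave the unit circle even though the limit does not.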


44.6.18 REMARK: Interpretation of the Picard iteration method. 

There are some worrying aspects of the Picard iteration method which are demonstrated to some extent in
Example 44.6.17. It seems unlikely that physical systems find solutions to their evolution equations by first 
trying a constant-state solution for a while and then going back to the beginning to feed the velocity data 
discovered along the trial trajectory into the integral to generate a second guess, after which the system 
makes an infinite number of approximations to determine how to evolve. One could counter-argue that this 
does not matter as long as the mathematical approximation technique obtains the correct answer, but there 
are some difficulties with this counter-argument. 


Figure 44.6.1 Picard iterants for an orthogonal transport


First, the system's evolution equations may have a perfectly good solution which is not discovered by this
iteration technique. In fact, the iterations may not converge at all, or they may converge to a non-solution
or the wrong solution. Second, the vector field map $B : I \to \mathrm{Lin}(V, V)$ may not be defined at all along the
approximate trajectories. This map could be defined only in some neighbourhood of the correct trajectories
in the $I \times V$ product space. In Example 44.6.17, for example, the map $B(t)$ could be defined only on the
unit sphere defined by $|v| = 1$. If a physical system inherently preserves orthogonality, it might have no
need of an extension of the vector field from this unit sphere to the surrounding space. Thus B(t) could be 
defined so as to have a value in the tangent space to this sphere for all argument values in the tangent space 
to the sphere. Of course, one may move the problem to the chart space, where this issue disappears. But 
some spaces are not necessarily identifiable as manifolds in this way. This non-definition issue is probably 
of less importance in practice than the non-convergence issue. 


In Figure 44.6.1, one sees that the approximation $A_0$ is a constant function of time, with value $(1, 0)$. But
the velocity prescribed by the vector field $B(t)$ at $v = (1, 0)$ is $B(t)(v) = (0, 1)$ for all $t \in I$. Therefore in the
approximation $A_1$, the velocity is equal to $(0, 1)$ for all $t \in I$. In other words, the velocity for $A_1$ is equal
to the velocity which should have been used for $A_0$. The velocity which is prescribed for the trajectory $A_1$
is $(-t, 1)$ at each $t \in I$. Therefore the actual velocity for $A_2$ at time $t$ is equal to the velocity which should
have been used for $A_1$ at time $t$. Thus for all $k \in \mathbb{Z}_0^+$, the actual velocity of $A_{k+1}(t)$ is the same as the
velocity of $A_k(t)$ should have been at that time. This is as if a tape recording is made of the prescribed
velocity during trajectory $A_k$, and then this velocity is used for the trajectory $A_{k+1}$ at the same time. So
each trajectory follows a path determined by the previous trajectory's measurements of the velocity it should
have had, but that new path always measures a different velocity field. So each path is adjusted to have the
correct velocity at each time for the previous trajectory, but then it must “tape record” the velocity field to
be used for the following trajectory. It seems difficult to believe that real physical systems work like this. In
this example, the circular limit of the trajectories is determined by the vector field in a region surrounding 
the circle. In a real system, the actual trajectory would surely be determined only by the vector field in a 
small neighbourhood of the trajectory. 
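

The “tape recording” iteration described above can be reproduced in a few lines of code. The following is only 
a sketch under stated assumptions, not the construction of Example 44.6.17 itself: the generator is taken to be 
the constant map B(v1, v2) = (-v2, v1), which is the antisymmetric choice consistent with B(t)(1,0) = (0,1), 
and the initial time is taken to be 0 with initial value (1,0). All function names are hypothetical. Each 
trajectory is stored as a pair of polynomials in t with exact rational coefficients. 

```python
from fractions import Fraction
import math

# Assumption: B(v1, v2) = (-v2, v1), constant in t, so that B(1, 0) = (0, 1).
# Assumption: initial time t0 = 0 and initial value A(0) = (1, 0).
# A polynomial is a list of Fraction coefficients, lowest degree first.

START = (Fraction(1), Fraction(0))


def apply_B(traj):
    """Velocity prescribed by B along a trajectory (px, py): (-py, px)."""
    px, py = traj
    return ([-c for c in py], list(px))


def integrate(poly):
    """Integral from 0 to t of a polynomial, as a new coefficient list."""
    return [Fraction(0)] + [c / (n + 1) for n, c in enumerate(poly)]


def picard_step(traj):
    """Next iterant: A_{k+1}(t) = A(0) + integral_0^t B(A_k(s)) ds."""
    vx, vy = apply_B(traj)
    ix, iy = integrate(vx), integrate(vy)
    ix[0] += START[0]
    iy[0] += START[1]
    return (ix, iy)


def picard_iterant(k):
    """Return A_k, starting from the constant trajectory A_0 = (1, 0)."""
    traj = ([START[0]], [START[1]])
    for _ in range(k):
        traj = picard_step(traj)
    return traj


def evaluate(traj, t):
    """Evaluate a polynomial trajectory at time t as a pair of floats."""
    return tuple(sum(float(c) * t**n for n, c in enumerate(p)) for p in traj)


if __name__ == "__main__":
    for k in (0, 1, 2, 10):
        x, y = evaluate(picard_iterant(k), 1.0)
        print(k, x, y, math.hypot(x, y))
```

Under these assumptions the first iterants are (1, t) and (1 - t^2/2, t), which leave the unit circle for 
nonzero t. The iterants are the Taylor polynomials of (cos t, sin t), so the circle is recovered only in the limit. 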


When the function $B(t) \in \mathrm{Lin}(V,V)$ maps elements $v \in M$ to tangent vectors $B(t)(v) \in T(M)$ in the tangent 
bundle of a differentiable manifold $M$ which is embedded in $V$ for all $t \in I$, it is intuitively clear that if an 
initial value $A(t_0)$ of $A$ lies in $M$, then $A(t)$ will lie in $M$ for all $t \in I$. This does occur in Example 44.6.17. 
In such a case, one knows a priori that the solution lies inside $M$, and the problem can be solved in a 
chart space, thereby avoiding the need to consider convergence within the ambient space. In this particular 
example, the manifold $M$ corresponds to a subgroup of the differentiable group $GL(2)$. So the problem can 
be reformulated as an evolution equation in the Lie algebra of this group. 


Another observation that arises from this example is the fact that the linear map $B(t) \in \mathrm{Lin}(V,V)$ is best 
thought of as a map from the space $V$ to the tangent space of $V$. In the theory of connections on fibre bundles, 
this is in fact how such equations are formulated. The linear map $B(t)$ maps from each point $p$ in the space $V$ 
to the tangent space $T_p(V)$ at $p$ in $V$. This formulation facilitates the conversion of Example 44.6.17 to an 
evolution equation for the orbit of the element $(1,0) \in V$ with respect to the subgroup $SO(2)$ of $GL(2)$. The 
orbit is in this case the subset $\{v \in V;\ |v| = 1\}$, which may be considered to be a submanifold of $V = \mathbb{R}^2$. 


44.6.19 REMARK: The Picard ODE existence theorems. 

Theorem 44.6.20, which can be proved by the Picard method, is implied by Theorem 44.3.20, which can be 
proved by the Peano method. Thus Theorem 44.6.20 could be claimed to be a superfluous corollary. However, 
it is the method of proof which is of interest, and this method can be generalised in different ways from the 
Peano method. Another motivation for presenting Theorem 44.6.20 is the fact that under its conditions, 
uniqueness is guaranteed by Theorem 44.5.2. The Picard method also facilitates the demonstration of 
continuous dependence of solutions on the various parameters of an ODE or system of ODEs. 
(Corresponding comments apply to Theorem 44.7.1, which uses the Picard method to prove existence for 
ODE systems.) 


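44.6.20 THEOREM: Picard existence theorem for real-valued first-order ODEs. 
Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R})$. Let $X \in C^{0,1}(U, \mathbb{R})$. Then 


$$\forall (t_0, x_0) \in U,\ \exists r \in \mathbb{R}^+,\ \exists \gamma \in C^1(B_{t_0,r}, \mathbb{R}) \cap \mathbb{P}(U),$$
$$\gamma(t_0) = x_0 \text{ and } \forall t \in B_{t_0,r},\ \gamma'(t) = X(t, \gamma(t)).$$


PROOF: 
((2018-11-11. To be continued ... )) 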


44.7. Ordinary differential equations, Picard method, systems 
((2018-11-11. Section 44.7 is new today. Work in progress ... )) 


44.7.1 THEOREM: Picard existence theorem for first-order ODE systems. 
Let $n \in \mathbb{Z}^+$. Let $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$. Let $X \in C^{0,1}(U, \mathbb{R}^n)$. Then 


$$\forall (t_0, x_0) \in U,\ \exists r \in \mathbb{R}^+,\ \exists \gamma \in C^1(B_{t_0,r}, \mathbb{R}^n) \cap \mathbb{P}(U),$$
$$\gamma(t_0) = x_0 \text{ and } \forall t \in B_{t_0,r},\ \gamma'(t) = X(t, \gamma(t)).$$


PROOF: 
((2018-11-13. To be continued ... )) 


44.8. Calculus of variations on Cartesian spaces 


(( 2019-10-27. Section 44.8 is very sketchy right now, with some repetition and self-contradiction. The purpose 
of presenting calculus of variations is to help explain the application of covariant derivatives on vector bundles 
to the derivation of gauge theory equations of motion from Lagrangian density functions. )) 


44.8.1 REMARK: Calculus of variations literature. Path integrals versus field integrals. 

In the early history of the calculus of variations, the “solution” to the problem was a path or curve from 
an initial state to a final state, and the states in between were elements of some finite-dimensional manifold 
such as the points of some Cartesian space. For applications to field theories, the states of the “solution” to 
a calculus of variations problem are fields, which are typically functions valued on some finite-dimensional 
manifold. In the former case, the action integral is a functional of a path through a manifold. In the latter 
case, the action integral is a functional of the evolution of the state of a field as a function of space and time. 


For the calculus of variations for paths, see for example Gelfand/Fomin [79], pages 1-151; Lovelock/Rund [27], 
pages 181-213; Henri Cartan [4], pages 105-137; Lang [23], pages 243-262; Frankel [12], pages 275-281; 
Itzykson/Zuber [277], pages 1-7; Spivak [37], Volume 1, pages 316-320; Guggenheimer [16], pages 73-85; 
Rebhan [299], pages 250-253; Levi-Civita [26], pages 208-218; Miranda [116], pages 143-145, 234; 
Joos/Freeman [278], pages 75-78; Courant/Hilbert [259], pages 139-233. 

For the calculus of variations for fields, see for example Gelfand/Fomin [79], pages 152-191; Lovelock/Rund [27], 
pages 298-314; Itzykson/Zuber [277], pages 7-13; Mandl/Shaw [288], pages 27-30; Peskin/Schroeder [298], 
pages 15-19; Bjorken/Drell [252], pages 24-28; Gilbarg/Trudinger [81], pages 288-292. 


44.8.2 REMARK: Extremal points versus stationary points. 

In general, the calculus of variations is concerned with functions $I : X \to \mathbb{R}$ for some topological space $X$. 
The objective of the study is to either find points $f \in X$ for which $I(f)$ is a local minimum or maximum, 
or else find points $f \in X$ for which the velocity of variation of $I$ in a neighbourhood of $f$ is equal to zero in 
every direction. The former is an “extremal point”. The latter is a “stationary point”. 


Extremal points require only topological structure on X for their definition. 


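(1) Weak minimum $f$. $\exists \Omega \in \mathrm{Top}_f(X),\ \forall g \in \Omega,\ I(f) \le I(g)$. 

(2) Weak maximum $f$. $\exists \Omega \in \mathrm{Top}_f(X),\ \forall g \in \Omega,\ I(f) \ge I(g)$. 

(3) Strong minimum $f$. $\exists \Omega \in \mathrm{Top}_f(X),\ \forall g \in \Omega \setminus \{f\},\ I(f) < I(g)$. 

(4) Strong maximum $f$. $\exists \Omega \in \mathrm{Top}_f(X),\ \forall g \in \Omega \setminus \{f\},\ I(f) > I(g)$. 

(5) Stationary point $f$. $\forall \varepsilon \in \mathbb{R}^+,\ \forall \gamma \in C^1((-\varepsilon, \varepsilon), X),\ \big(\gamma(0) = f \Rightarrow \partial_s I(\gamma(s))\big|_{s=0} = 0\big)$. 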


Stationary points require some kind of differentiable structure on $X$. If $X$ is a space of vector-valued functions 
with pointwise addition and scalar multiplication operations, affine maps of the form $\gamma_{f,\phi} : s \mapsto f + s\phi$ will be 
meaningful, and can be assumed to be $C^1$ in some sense. Such maps may be regarded as tangent line vectors 
at points $f$ in the same way that tangent line vectors are defined on Cartesian spaces in Definition 26.13.1 and 
Notation 26.13.4. One must beware the possibility that directional differentiability of $I$ might not guarantee 
even the continuity of $I$. (See Examples 41.4.8 and 41.4.9.) However, one may at least hope to obtain 


for suitable Banach spaces. 


Thus stationarity, although it is a weaker concept than extremality in principle, is very much more demanding 
on the differentiable structure of the space X. In practice, however, the calculus of variations is often merely 
a convenient way to arrive at equations of motion indirectly via a Lagrangian density function, and it is 
the stationarity requirement which justifies the application of the Euler-Lagrange equations. So stationarity 
is really the necessary concept, not local or global extremality. But then one must introduce sufficient 
differentiable structure to make stationarity meaningful. So in general, it is apparently preferable to focus 
on stationarity as the key concept in the calculus of variations, not extremality. 
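

The difference between stationarity and extremality can be seen already in finite dimensions. The sketch below 
is an illustration under assumptions which are not in the text: the space is X = R^2 and the functional is the 
saddle I(x, y) = x^2 - y^2. Only straight-line curves through f are tested, which is weaker than a condition 
involving all differentiable curves through f. The names are hypothetical. 

```python
# Hypothetical example: X = R^2 and I(x, y) = x^2 - y^2 (a saddle).
# Only straight-line curves s -> f + s*phi are tested here.


def I(p):
    x, y = p
    return x * x - y * y


def directional_derivative(func, f, phi, h=1e-6):
    """Central-difference estimate of d/ds func(f + s*phi) at s = 0."""
    plus = tuple(a + h * b for a, b in zip(f, phi))
    minus = tuple(a - h * b for a, b in zip(f, phi))
    return (func(plus) - func(minus)) / (2 * h)


f = (0.0, 0.0)
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -3.0)]
derivatives = [directional_derivative(I, f, phi) for phi in directions]

# Values of I at nearby points on each axis, for comparison with I(f).
nearby_on_x_axis = I((0.01, 0.0))
nearby_on_y_axis = I((0.0, 0.01))
```

Here every tested directional derivative at the origin vanishes, while I takes values both above and below I(f) 
arbitrarily close to f. So the origin is stationary along these lines without being a local minimum or maximum. 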


Since the Euler-Lagrange equations are only necessary for stationarity, questions concerning the extension of 
stationarity along tangent lines to stationarity for all differentiable curves do not have any great importance. 
If the calculus of variations is merely a clever way to encode and manipulate equations of motion, detailed 
analytical considerations have only philosophical significance. The important thing is to derive the correct 
equations of motion. 


In regard to the great convenience of Lagrangian functionals for combining fields to derive equations of 
motion, the comments of Penrose [297], page 491, are of some interest. 


However, I must confess my unease with this as a fundamental approach. I have difficulties in 
formulating my unease, but it has something to do with the generality of the Lagrangian approach, 
so that little guidance may be provided towards finding the correct theories. Also the choice of 
Lagrangian is often not unique, and sometimes rather contrived—even to the extent of undisguised 
complication. There tends to be a remoteness from actual physical ‘hands-on’ understanding, 
particularly in the case of Lagrangians for fields. [...] In most situations, the Lagrangian density 
does not itself seem to have clear physical meaning; moreover, there tend to be many different 
Lagrangians leading to the same field equations. 


Lagrangians for fields are undoubtedly extremely useful as mathematical devices, and they enable 
us to write down large numbers of suggestions for physical theories. But I remain uneasy about 
relying upon them too strongly in our searches for improved fundamental physical theories. 


It is the equations of motion which are physically significant, although they seem to follow almost magically 
from the Euler-Lagrange stationarity equations for suitable Lagrangian functionals. Therefore the analytical 
niceties of the stationarity of functionals are of secondary importance. As a trivial example, a circle is a set 
of points equidistant from a given point. The fact that a circle also minimises the circumference bounding 
a given area is a secondary issue. So one may define circles in a very elementary way without needing to 
define the length functional on the set of all curves which enclose sets with a given fixed area, or conversely 
define the area functional on the set of all curves with a given fixed length. 


44.8.3 REMARK: Definitions of functionals, Lagrangians and action integrals. 

It is not surprising that the long history and wide applicability of the calculus of variations has led to some 
divergence of terminology. In principle, the calculus of variations can apply to any kind of functional, but 
in practice one mostly sees functionals which are integrals of functions of time, position and velocity, either 
abstract or concrete. Such integrals are typically called “action integrals”, and their integrands are typically 
called “Lagrangians”. The integrands may be called “Lagrangian density functions” in the case of integrals 
over regions of space-time. 

Since functionals, Lagrangians and action integrals are very general concepts, it is difficult to provide a 
definite specification of what they are. So it is difficult to give definite names for these concepts. When the 
Euler-Lagrange equations are given for some particular class of functionals, Lagrangians or action integrals, 
they appear as theorems rather than definitions. Consequently the terminology for calculus of variations 
concepts is typically somewhat informal. 


44.8.4 REMARK: The methods and scope of the calculus of variations. 

The calculus of variations is a set of methods which are applied to any kind of problem for which the 
methods yield useful results. In other words, any problem for which the methods are useful can be said to 
lie within the scope of the calculus of variations. 


The methods are principally applied to “action” functionals $I : X \to \mathbb{R}$ for spaces $X$ which have sufficient 
structure to be able to define $I(f + t\phi)$ for $f \in X$ and $\phi \in T_f(X)$, where $T_f(X)$ denotes some kind of 
tangent space at $f$. (See Notation 54.1.4 for the tangent set at a point of a differentiable manifold $X$.) Then 
$t \in \mathbb{R}$ may be varied to determine whether the map $t \mapsto I(f + t\phi)$ has a minimum, maximum or stationary 
value at $t = 0$. If this map is differentiable with derivative zero at $t = 0$, for all choices of $\phi \in T_f(X)$, then 
$f$ is a stationary point of $I$. The first objective of the calculus of variations is to determine which points 
$f \in X$ have this property. In the case of a differentiable manifold $X$, the stationary point condition would 
be $(dI)_f(\phi) = 0$ for all $\phi \in T_f(X)$. (See Definition 58.1.2 for the differential $(dI)_f$.) In other words, the 
condition is $(dI)_f = 0_{T_f^*(X)}$. 


Thus in the case of a differentiable manifold $X$, the calculus of variations is primarily concerned with 
determining solutions to the equation $(dI)_f = 0$ for differentiable functions $I : X \to \mathbb{R}$. As a purely 
abstract study, this would be of limited interest. The calculus of variations is more typically concerned 
with functionals $I$ which are integrals of functions in $X$. Then varying these functions yields differential 
equations which can be subjected to other methods. The main interest in the subject is then the conversion 
of stationary integral problems into differential equations. These equations may be ordinary or partial 
differential equations, depending on the types of integrals $I$ and function spaces $X$. 


In physics and other fields, the differential equations for numerous kinds of systems may be derived by 
varying action integrals. In some cases, the equations are discovered before the action integral, but in many 
areas, it is the integral which is chosen first, and the system's differential equations then follow from this. 
Particularly in quantum field theory, it is usual to construct the integral first, and the equations of motion 
are regarded as secondary. (This is not very surprising when one considers that initial value problems in 
quantum field theory are very difficult to apply directly in practice because observing the current state of 
the system significantly alters its future evolution, whereas classical mechanical systems may be observed 
with great precision without altering their future evolution.) 


Action functionals of the form $I(f) = \int_a^b F(f(t))\,dt$ for functions $f : [a,b] \to \mathbb{R}$ and $F : \mathbb{R} \to \mathbb{R}$ are not 
generally of interest for the calculus of variations. For example, the minimum of such $I$ for $F : y \mapsto y^2$ would 
be $f : t \mapsto 0$, the zero function. Action integrals become interesting when $F$ depends on both the value of 
the function and its derivative because then there is a trade-off between the two which makes the solution 
difficult to guess. By converting this trade-off to an equation such as 


$$\forall t \in (a,b),\quad \partial_y F(y, f'(t))\big|_{y=f(t)} = \partial_t\Big(\partial_v F(f(t), v)\big|_{v=f'(t)}\Big)$$


for the functional $I : f \mapsto \int_a^b F(f(t), f'(t))\,dt$, the stationary choices for $f$ can be determined by solving a 
differential equation. The equation is most interesting when $F(y,v)$ depends on both $y$ and $v$. In the case 
$F : (y,v) \mapsto v^2$, for example, the “trade-off equation” reduces to $f''(t) = 0$, the equation for a straight line, 
which is interesting only as a basic introductory example. 
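

The trade-off between the value and the derivative can be checked numerically. The sketch below rests on 
assumptions which are not in the text: the integrand is F(y, v) = v^2 + y^2 on the interval [0, 1], with the 
boundary values f(0) = 0 and f(1) = 1 held fixed. For this choice the stationarity equation becomes 
f''(t) = f(t), which is solved by sinh(t)/sinh(1). The function names and the midpoint-rule discretisation are 
hypothetical choices for illustration. 

```python
import math

# Assumptions (not from the text): F(y, v) = v^2 + y^2 on [a, b] = [0, 1],
# with boundary values f(0) = 0 and f(1) = 1 held fixed.  For this F the
# stationarity equation reads f''(t) = f(t), solved by sinh(t) / sinh(1).

N = 1000          # number of subintervals of [0, 1]
H = 1.0 / N       # mesh width


def action(f):
    """Midpoint-rule approximation of the integral of f'(t)^2 + f(t)^2."""
    total = 0.0
    for i in range(N):
        y0, y1 = f(i * H), f((i + 1) * H)
        v = (y1 - y0) / H            # slope on the subinterval
        y = 0.5 * (y0 + y1)          # midpoint value
        total += (v * v + y * y) * H
    return total


def variation(f, phi, eps=1e-3):
    """Central-difference estimate of d/ds I(f + s*phi) at s = 0."""
    up = action(lambda t: f(t) + eps * phi(t))
    down = action(lambda t: f(t) - eps * phi(t))
    return (up - down) / (2 * eps)


def stationary(t):
    return math.sinh(t) / math.sinh(1.0)


def straight(t):
    return t


def bump(t):
    """Test direction which vanishes at both endpoints."""
    return math.sin(math.pi * t)


var_stationary = variation(stationary, bump)
var_straight = variation(straight, bump)
```

Under these assumptions the variation at the sinh solution is zero up to discretisation error, whereas the 
straight line, which would be stationary for F(y, v) = v^2 alone, has a clearly nonzero variation in the 
direction tested. 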

Consequently the calculus of variations is principally concerned with the stationary points of functionals 
which are defined as integrals of combinations of both the function and one or more of its derivatives. 


44.8.5 REMARK: The tangent space of a function, curve or surface. 

In the same way that a tangent vector $t_{p,v,\psi}$ at a point $p$ in a manifold $M$ may be represented in terms of 
a parametrised straight line in $\mathbb{R}^n$ with starting point $\psi(p) \in \mathbb{R}^n$ and velocity $v \in \mathbb{R}^n$, a “tangent vector” 
to a function, curve or surface can be represented via a kind of chart in this way. Then stationarity of 
a functional with respect to the function, curve or surface can be defined as stationarity with respect to 
all tangent vectors at a given “point”. Theorem 41.6.17 suggests that testing stationarity with respect to all 
tangent directions will imply uniform “total” stationarity with respect to all neighbourhoods of the “point” 
for some metric function. 


(( 2019-10-19. To be continued ... )) 


44.8.6 REMARK: Differentiability of real functionals on linear spaces. 

Definition 44.8.8 requires only directional differentiability of functionals. As pointed out in Examples 41.4.8 
and 41.4.9, this does not even guarantee that the functional is continuous when the linear space is Cartesian. 
By Theorem 41.6.17 (ii), continuous directional differentiability on a (finite-dimensional) Cartesian space 
implies continuous total differentiability, which in turn implies continuity by Theorem 41.6.19 (ii). 


On infinite-dimensional linear spaces, the definition of total differentiability of real functionals requires a 
suitable metric on the space. This can be provided by a norm as in Definition 39.3.2. Thus a preferable 
structure for the definition of differentiable functionals is a normed linear space as in Definition 24.7.5. 


44.8.7 DEFINITION: A (real) differentiable functional on a real linear space $X$ is a function $I : X \to \mathbb{R}$ 
such that the map $t \mapsto I(f + t\phi)$ is differentiable at $t = 0$ for all $f, \phi \in X$. 


44.8.8 DEFINITION: A stationary point of a differentiable functional $I : X \to \mathbb{R}$, for a real linear space $X$, 
is a point $f \in X$ such that $\partial_t I(f + t\phi)\big|_{t=0} = 0$ for all $\phi \in X$. 
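

As an illustration of Definitions 44.8.7 and 44.8.8, consider the following worked example. The choice of the 
space $X = C([a,b], \mathbb{R})$ and of the functional $I(f) = \int_a^b f(t)^2\,dt$ is an assumption made here for 
illustration only. The variation parameter is written as $s$ because $t$ is used as the integration variable. 

```latex
% Assumed example: X = C([a,b], R) and I(f) = \int_a^b f(t)^2 dt.
\begin{align*}
I(f + s\phi) &= \int_a^b f(t)^2\,dt + 2s\int_a^b f(t)\phi(t)\,dt + s^2\int_a^b \phi(t)^2\,dt, \\
\partial_s I(f + s\phi)\big|_{s=0} &= 2\int_a^b f(t)\phi(t)\,dt .
\end{align*}
```

So this $I$ is a differentiable functional, and $f$ is a stationary point if and only if $\int_a^b f(t)\phi(t)\,dt = 0$ 
for all $\phi \in X$. Choosing $\phi = f$ gives $\int_a^b f(t)^2\,dt = 0$, which forces $f = 0$ because $f$ is continuous. 
Thus the only stationary point in this example is the zero function, which is also the minimum. 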


44.8.9 REMARK: Definitions and notations for continuously extendable $C^k$ functions. 

There are two obvious ways to define $C^k$ functions whose derivatives have continuous extensions to the 
closure $\bar\Omega$ of an open domain $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ for $n, k \in \mathbb{Z}^+$. They may be defined as restrictions of $C^k$ 
functions on some open set $G \in \mathrm{Top}(\mathbb{R}^n)$ with $\bar\Omega \subseteq G$, or they can be defined as $C^k$ functions on $\Omega$ which 
can be extended continuously to $\bar\Omega$. 


One or the other approach may be preferred, depending on the application, but they are not equivalent. The 
restriction style of definition is stronger than the extension style. (See Example 40.5.19 for a function which 
is $C^1$ on a set $\Omega \in \mathrm{Top}(\mathbb{R})$, with a derivative which has a continuous extension to $\bar\Omega$, but which is not equal 
to the restriction to $\Omega$ of a $C^1$ function on $G \in \mathrm{Top}(\mathbb{R})$ with $\bar\Omega \subseteq G$.) 


Notation 44.8.10 uses the simple extension style of definition for the extendability of $C^k$ functions to the 
closures of their domains. So one must resist the temptation to think of the function $g \in C(\bar\Omega, \mathbb{R}^m)$ as being 
equal to $\partial_\alpha f$ on $\mathrm{Bdy}(\Omega)$. 


44.8.10 NOTATION: The set of $C^k$ functions with continuous extensions to the closure of the domain. 
$C^k(\bar\Omega, \mathbb{R}^m)$, for $k, m \in \mathbb{Z}_0^+$ and $\Omega \subseteq \mathbb{R}^n$ for some $n \in \mathbb{Z}^+$, denotes the set 


$$\{ f \in C(\bar\Omega, \mathbb{R}^m);\ f|_\Omega \in C^k(\Omega, \mathbb{R}^m) \text{ and } \forall j \in \mathbb{N}_k,\ \forall \alpha \in \mathbb{N}_n^j,\ \exists g \in C(\bar\Omega, \mathbb{R}^m),\ g|_\Omega = \partial_\alpha(f|_\Omega) \}.$$


((2019-9-25. Theorems 44.8.11 and 44.8.12 are work in progress. They probably contain errors. )) 


44.8.11 THEOREM: Well-definition of action density for real functions on a bounded real interval. 
Assume the following conditions. 

(i) $a, b \in \mathbb{R}$ with $a < b$. 
(ii) $f \in C^1((a,b), \mathbb{R})$. 
(iii) $F \in C^1((a,b) \times \mathbb{R} \times \mathbb{R}, \mathbb{R})$. 
(iv) $A : [a,b] \to \mathbb{R}$ is defined by $A(t) = F(t, f(t), f'(t))$ for all $t \in [a,b]$. 


Then $A$ is continuous on $[a,b]$. Hence $A$ is Darboux integrable on $[a,b]$. 


PROOF: 
((2019-9-25. To be continued ... )) 


44.8.12 THEOREM: Euler-Lagrange equation for real functions on a bounded real interval. 
Let $a, b \in \mathbb{R}$ with $a < b$. Let $F \in C^2((a,b) \times \mathbb{R} \times \mathbb{R}, \mathbb{R})$. Define $I : C^1((a,b), \mathbb{R}) \to \mathbb{R}$ by 


$$\forall f \in C^1((a,b), \mathbb{R}),\quad I(f) = \int_a^b F(t, f(t), f'(t))\,dt.$$


Then [...] 


PROOF: 
((2019-9-25. To be continued ... )) 


Chapter 45 


LEBESGUE MEASURE ZERO 


45.1 Sets of real numbers of measure zero 
45.2 Sets of real numbers of explicit measure zero 
45.7 Lebesgue differentiation theorem 


45.0.1 REMARK: The distinction between measure and integration. 

Usually “measure” means quantification of the size of a set, whereas “integration” means quantification of 
the area under a function. However, measure may be regarded as a special case of integration because the 
measure of a set equals the integral of the indicator function of the set. Conversely, integration of a function 
may be regarded as a special case of measure because the integral of a function equals the measure of the set 
“under” the function in the Cartesian product of the domain and range. However, the techniques of measure 
and integration are different. Roughly speaking, measure theory adds up the length, area or volume of a set 
by approximating the set with simpler sets, whereas integration theory adds up the area or volume under a 
graph by approximating the area or volume by Cartesian products of domain sets and range intervals. The 
subject is thereby partitioned into measure theory and integration theory, presented in that order. Since 
integration theory is the “higher layer” of these two layers, the term “integration theory” effectively includes 
both measure and integration. 


45.1. Sets of real numbers of measure zero 


45.1.1 REMARK: Almost-everywhere assertions are true except on a set of measure zero. 

Many analytical concepts are expressed with the qualification “almost everywhere”, which means at all 
points except for a set of measure zero. In other words, the set of points where the assertion in question 
is false has measure zero. Hence the relatively simple concept of zero measure is worthy of study for its 
own sake. It provides a (relatively) relaxing prelude to general measures and general integration. Some very 
basic properties of sets of measure zero are asserted in Theorem 45.1.4. The less basic property that countable 
unions of sets of measure zero also have measure zero is delayed to Theorems 45.3.3 and 45.3.5 because this 
property requires either an explicit choice function or an axiom of choice. 


45.1.2 REMARK: Ball notation for real-number intervals. 
It is convenient to write real-number intervals of the form $(a - \frac12\ell, a + \frac12\ell)$ in the ball notation $B(a, \ell/2)$. 
(See Section 37.3 for balls.) The notations $\omega$ and $\mathbb{Z}_0^+$ are used interchangeably. 


45.1.3 DEFINITION: A set (of real numbers) of measure zero is a subset $X$ of $\mathbb{R}$ such that 


$$\forall \varepsilon \in \mathbb{R}^+,\ \exists (a_i)_{i=0}^\infty \in \mathbb{R}^\omega,\ \exists (\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega,\quad \sum_{i=0}^\infty \ell_i < \varepsilon \text{ and } X \subseteq \bigcup_{i=0}^\infty B(a_i, \tfrac12\ell_i). \qquad (45.1.1)$$


45.1.4 THEOREM: Some basic properties of sets of real numbers with measure zero. 
Sets of measure zero have the following properties. 


(i) The empty set is a set of measure zero. 
(ii) Any countably infinite set of real numbers is a set of measure zero. 
(iii) Any subset of a set of real numbers of measure zero is also a set of measure zero. 
(iv) The union of two sets of real numbers of measure zero is also a set of measure zero. 
(v) The union of a finite collection of sets of real numbers of measure zero is also a set of measure zero. 


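PROOF: For part (i), let $\varepsilon \in \mathbb{R}^+$. Define $(a_i)_{i=0}^\infty \in \mathbb{R}^\omega$ and $(\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega$ by $a_i = 0$ and $\ell_i = \varepsilon 2^{-i-2}$ for 
all $i \in \mathbb{Z}_0^+$. Then $\sum_{i=0}^\infty \ell_i = \frac12\varepsilon < \varepsilon$ and $\emptyset \subseteq \bigcup_{i=0}^\infty B(a_i, \frac12\ell_i)$. Hence $\emptyset$ is a set of measure zero. 


For part (ii), let $(x_i)_{i=0}^\infty$ be a sequence of elements in $\mathbb{R}$, and let $X = \{x_i;\ i \in \mathbb{Z}_0^+\}$. Let $\varepsilon \in \mathbb{R}^+$. Define 
$(a_i)_{i=0}^\infty \in \mathbb{R}^\omega$ and $(\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega$ by $a_i = x_i$ and $\ell_i = \varepsilon 2^{-i-2}$ for all $i \in \mathbb{Z}_0^+$. Then $\sum_{i=0}^\infty \ell_i = \frac12\varepsilon < \varepsilon$ and 
$X \subseteq \bigcup_{i=0}^\infty B(a_i, \frac12\ell_i)$. Hence $X$ is a set of measure zero. 

For part (iii), let $X$ be a set of real numbers of measure zero. Let $Y \subseteq X$. Then for all $\varepsilon \in \mathbb{R}^+$, there 
are families $(a_i)_{i=0}^\infty \in \mathbb{R}^\omega$ and $(\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega$ such that $\sum_{i=0}^\infty \ell_i < \varepsilon$ and $X \subseteq \bigcup_{i=0}^\infty B(a_i, \frac12\ell_i)$. Then 
$Y \subseteq \bigcup_{i=0}^\infty B(a_i, \frac12\ell_i)$ under the same conditions. So $Y$ is a set of measure zero. 

For part (iv), let $X_0$ and $X_1$ be sets of real numbers of measure zero. Then for all $\varepsilon \in \mathbb{R}^+$, there are families 
$(a^\varepsilon_{j,i})_{i=0}^\infty \in \mathbb{R}^\omega$ and $(\ell^\varepsilon_{j,i})_{i=0}^\infty \in (\mathbb{R}^+)^\omega$ such that $\sum_{i=0}^\infty \ell^\varepsilon_{j,i} < \varepsilon$ and $X_j \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_{j,i}, \frac12\ell^\varepsilon_{j,i})$ for $j = 0, 1$. Let 
$\varepsilon \in \mathbb{R}^+$. Define families $(a_i)_{i=0}^\infty$ and $(\ell_i)_{i=0}^\infty$ by $a_{2i+j} = a^{\varepsilon/2}_{j,i}$ and $\ell_{2i+j} = \ell^{\varepsilon/2}_{j,i}$ for all $i \in \mathbb{Z}_0^+$ and $j = 0, 1$. 
Then $\sum_{i=0}^\infty \ell_i < \varepsilon$ and $X_0 \cup X_1 \subseteq \bigcup_{i=0}^\infty B(a_i, \frac12\ell_i)$. Hence $X_0 \cup X_1$ is a set of measure zero. 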


Part (v) follows by induction from part (iv). 
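

The covers used in parts (ii) and (iv) can be sketched concretely. The code below is an illustration under 
assumptions: only finitely many terms of each infinite family are built, the sample point sets are hypothetical, 
and the function names are invented for this example. Exact rational arithmetic is used so that the length 
bounds are not affected by rounding. 

```python
from fractions import Fraction

# Assumptions: only the first n terms of each infinite family are built, and
# the sample sets are hypothetical.  Exact rational arithmetic is used.


def countable_cover(points, eps):
    """Part (ii): interval i is B(x_i, l_i / 2) with l_i = eps * 2^(-i-2)."""
    return [(x, eps / 2 ** (i + 2)) for i, x in enumerate(points)]


def interleave(cover0, cover1):
    """Part (iv): a_(2i+j) = a_(j,i) and l_(2i+j) = l_(j,i) for j = 0, 1."""
    merged = []
    for c0, c1 in zip(cover0, cover1):
        merged.extend([c0, c1])
    return merged


def total_length(cover):
    return sum(length for _, length in cover)


def covers(cover, x):
    """True if x lies in some open interval B(a, l/2) of the cover."""
    return any(abs(x - a) < length / 2 for a, length in cover)


eps = Fraction(1, 10)
xs0 = [Fraction(k) for k in range(50)]            # sample points 0, 1, 2, ...
xs1 = [Fraction(1, k + 1) for k in range(50)]     # sample points 1, 1/2, 1/3, ...

# Each family is built for eps/2, as in the proof of part (iv).
union_cover = interleave(countable_cover(xs0, eps / 2), countable_cover(xs1, eps / 2))
```

The interleaving places the two families at the even and odd indices respectively, so that the total length of 
the merged family is the sum of the two total lengths, each of which is less than half of the target. 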


45.1.5 REMARK: The rational numbers have measure zero. 

It is perhaps somewhat perplexing that the rational numbers have measure zero. Since the rational numbers 
are a dense subset of the real numbers, there seems to be no space at all between the rational numbers. 
The “contiguous stretches” of irrational numbers all have zero length. So it might seem that the irrational 
numbers should have zero measure too! Yet the irrationals in the interval [0, 1] have measure 1 while the 
rationals have measure 0. 


To see how the rationals have measure zero according to Definition 45.1.3, one may commence covering the 
rational numbers with a small interval centred on the number $0 \in \mathbb{Q}$ of diameter $\varepsilon$. Then one may put an 
interval of diameter $\varepsilon 2^{-1}$ centred at 1, and of diameter $\varepsilon 2^{-2}$ at 1/2. Then one may place successive intervals 
centred at integer multiples of 1/3, 1/4, 1/5 and so forth. (See Figure 45.1.1.) 


    a_j/b_j   0/1   1/5   1/4   1/3   2/5   1/2   3/5   2/3   3/4   4/5   1/1
    j          0     7     5     3     8     2     9     4     6    10     1
    2^j        1   128    32     8   256     4   512    16    64  1024     2

    $\mathbb{Q}[0,1] \subseteq \bigcup_{j=0}^\infty C_j$
    $\forall j \in \mathbb{Z}_0^+,\ C_j = B(a_j/b_j, r_j)$
    $\forall j \in \mathbb{Z}_0^+,\ r_j = 0.1/2^j$

Figure 45.1.1 Covering the rational numbers to prove they have measure zero 


Since the diameters are reducing exponentially while the gaps between the interval-centres are falling very 
slowly, there are very many irrational points which will never be reached. Even though this is logically 
clear, it is somewhat counterintuitive. This is only one of the very many facts about real analysis which run 
counter to one's intuition. Hence it is very important to get the logic right. Everything must be proved! 
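

The cover of the rational numbers can be explored numerically. The following sketch makes several assumptions: 
the enumeration order 0, 1, 1/2, 1/3, 2/3, 1/4, 3/4 and so forth and the radii 0.1/2^j are read off 
Figure 45.1.1, while the test point 1/sqrt(2) and the cut-off of 200 intervals are arbitrary choices made here. 
The function names are hypothetical. 

```python
from fractions import Fraction
import math

# Assumptions: the enumeration order 0, 1, 1/2, 1/3, 2/3, 1/4, 3/4, ... and the
# radii r_j = 0.1 / 2^j are read off Figure 45.1.1; the test point 1/sqrt(2)
# and the cut-off n = 200 are arbitrary choices for this illustration.


def rationals_in_unit_interval():
    """Yield each rational in [0, 1] once: 0, 1, then a/b in lowest terms."""
    yield Fraction(0)
    yield Fraction(1)
    b = 2
    while True:
        for a in range(1, b):
            if math.gcd(a, b) == 1:
                yield Fraction(a, b)
        b += 1


def first_intervals(n):
    """Return the first n pairs (centre, radius) of the cover, exactly."""
    gen = rationals_in_unit_interval()
    return [(next(gen), Fraction(1, 10) / 2**j) for j in range(n)]


def is_covered(x, intervals):
    """True if the float x lies in some open interval B(centre, radius)."""
    return any(abs(x - float(c)) < float(r) for c, r in intervals)


intervals = first_intervals(200)
total_length = sum(2 * r for _, r in intervals)   # exact rational, below 2/5
x = 1 / math.sqrt(2)
```

With these choices the total length of the intervals stays below 0.4, and the irrational test point is not 
reached by any of the first 200 intervals, because the radii shrink much faster than the centres approach it. 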

In topology, one learns that the closure of a dense subset equals the whole set. (See Definition 33.4.2 for 
topological density.) The cover described here covers every element $a_j/b_j$ of the dense subset $\mathbb{Q}[0,1]$ of the 
real interval $[0,1]$ with some open interval $C_j$. Therefore one naturally expects that the union $\bigcup_{j=0}^\infty C_j$ of 
these intervals should cover all of the real-number interval $[0,1]$. However, the closure of any set $X$ is the 
smallest closed set which covers $X$. In this case, the cover of $X = \mathbb{Q}[0,1]$ is an open set because it is equal 
to a union of open sets. If the individual open intervals $C_j$ in the cover are replaced with the corresponding 
closed intervals $\bar{C}_j$, this does not help the case. The union of a countably infinite family of closed intervals is 
not guaranteed to be closed! The real numbers are the topological completion of the rational numbers, but 
if the lengths of a sequence of covering intervals decrease rapidly enough, the completion of the dense set is 
prevented. It should be noted here that the word “density” has very different meanings in the topological 
and measure-theoretic senses. In the measure-theoretic sense, the density of the rational numbers is measured 
numerically as zero, although the density in the topological sense of ubiquity means that every neighbourhood 
contains at least one rational number. 


45.2. Sets of real numbers of explicit measure zero 


45.2.1 REMARK: Sets of explicit measure zero. Swapping the quantifiers. 

Instead of merely requiring the existence of a cover of $X$ of total length less than each $\varepsilon \in \mathbb{R}^+$ as was done 
in Definition 45.1.3, in Definition 45.2.2 the requirement is for a choice function which chooses such a cover 
for each $\varepsilon$. In the case of continuity of functions between metric spaces in Section 38.2, it is found in 
Theorems 38.2.7 and 38.2.10 that a function is continuous if and only if it is “explicitly continuous”. That is 
possible because there is a total order on the real numbers, and the set of possible values of $\delta \in \mathbb{R}^+$ is a 
real-number interval for each $\varepsilon \in \mathbb{R}^+$. A “choice” of $\delta$ can be made by setting it equal to the supremum of 
a set of possible values. 


In the case of covers of a set $X$ with length less than $\varepsilon \in \mathbb{R}^+$, it is not immediately clear how to define a 
total order on the set of covers so that the supremum of the covers with length less than $\varepsilon$ is a suitable cover 
for the definition of a set of measure zero. The set of choices of covers is analogous to the set of choices of 
$\delta$-values for continuity of a function between metric spaces. 


Definition 45.2.2 requires the sequences $a^\varepsilon = (a^\varepsilon_i)_{i=0}^\infty \in \mathbb{R}^\omega$ and $\ell^\varepsilon = (\ell^\varepsilon_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega$ to be given in advance 
for all values of $\varepsilon \in \mathbb{R}^+$ before being tested for all such $\varepsilon$ for length and covering ability. This may be thought 
of as “swapping the quantifiers” or “reversing the quantifiers”. 


Limit processes in mathematical analysis are generally expressed as a universal quantifier followed by an 
existential quantifier, as in the expression $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ f(B_{x,\delta}) \subseteq B_{f(x),\varepsilon}$ in Theorem 38.1.7. This is 
reversed in the “explicit” expression $\exists \delta : \mathbb{R}^+ \to \mathbb{R}^+,\ \forall \varepsilon \in \mathbb{R}^+,\ f(B_{x,\delta(\varepsilon)}) \subseteq B_{f(x),\varepsilon}$ in Definition 38.2.6. The 
motivation for “reversing the quantifiers” is to replace an implicit infinite set of choices, for example of the 
number $\delta$ for each $\varepsilon$, with a single function which makes all of those choices. With the axiom of choice, 
such a reversal is always possible. By reversing the quantifiers, either one is claiming that the reversal is 
possible (either by some ZF construction or by the axiom of choice), or one is claiming that the reversibility 
is required as an input to a theorem. (This depends on whether the logical expression is on the left or on the 
right of the assertion symbol.) In Definition 45.2.2, the reversal of the quantifiers is a precondition which 
will be exploited in theorems which require the existence of an explicit choice function. With the assistance 
of the axiom of choice, this reversal is always possible. (See also Remarks 10.11.11 and 10.5.16 for comments 
on quantifier swapping and the axiom of choice. See Theorem 10.5.17 (ii) for justification.) 


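45.2.2 DEFINITION: A set (of real numbers) of explicit measure zero is a subset $X$ of $\mathbb{R}$ such that 


$$\exists a : \mathbb{R}^+ \to \mathbb{R}^\omega,\ \exists \ell : \mathbb{R}^+ \to (\mathbb{R}^+)^\omega,\ \forall \varepsilon \in \mathbb{R}^+,\quad \sum_{i=0}^\infty \ell^\varepsilon_i < \varepsilon \text{ and } X \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_i, \tfrac12\ell^\varepsilon_i). \qquad (45.2.1)$$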

45.2.3 REMARK: Converting a measure-zero theorem into an explicit measure-zero theorem. 

Theorem 45.2.4 is the explicit version of Theorem 45.1.4. The difference is that in Theorem 45.2.4, it must 
be shown that there exists a function which maps positive real numbers $\varepsilon$ to suitable set-covers with length 
less than $\varepsilon$, whereas in Theorem 45.1.4, it is only necessary to demonstrate the existence of such a set-cover 
for any given $\varepsilon \in \mathbb{R}^+$. Connoisseurs of tedious proofs may enjoy examining the differences between the proofs 
of these two theorems in detail. 


45.2.4 THEOREM: Some basic properties of real-number sets of explicit measure zero. 
Sets of explicit measure zero have the following properties. 


(i) The empty set is a real-number set of explicit measure zero. 
(ii) Any countably infinite set of real numbers is a set of explicit measure zero. 
(iii) Any subset of a set of real numbers of explicit measure zero is also a set of explicit measure zero. 
(iv) The union of two sets of real numbers of explicit measure zero is also a set of explicit measure zero. 
(v) The union of a finite collection of sets of real numbers of explicit measure zero is also a set of explicit 
measure zero. 


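PROOF: For part (i), define $a : \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\ell : \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ by $a^\varepsilon_i = 0$ and $\ell^\varepsilon_i = \varepsilon 2^{-i-2}$ for
all $i \in \mathbb{Z}^+_0$ and $\varepsilon \in \mathbb{R}^+$. Then $\sum_{i=0}^\infty \ell^\varepsilon_i = \tfrac12\varepsilon < \varepsilon$ and $\emptyset \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_i, \tfrac12\ell^\varepsilon_i)$ for all $\varepsilon \in \mathbb{R}^+$.
Hence $\emptyset$ is a set of explicit measure zero.

For part (ii), let $(x_i)_{i=0}^\infty$ be a sequence of elements in $\mathbb{R}$, and let $X = \{x_i;\ i \in \mathbb{Z}^+_0\}$. Define functions
$a : \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\ell : \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ by $a^\varepsilon_i = x_i$ and $\ell^\varepsilon_i = \varepsilon 2^{-i-2}$ for all $i \in \mathbb{Z}^+_0$ and $\varepsilon \in \mathbb{R}^+$. Then
$\sum_{i=0}^\infty \ell^\varepsilon_i = \tfrac12\varepsilon < \varepsilon$ and $X \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_i, \tfrac12\ell^\varepsilon_i)$ for all $\varepsilon \in \mathbb{R}^+$. Hence $X$ is a set of explicit measure zero.

For part (iii), let $X$ be a set of real numbers of explicit measure zero. Then there exist functions $a : \mathbb{R}^+ \to \mathbb{R}^\omega$
and $\ell : \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ such that $\sum_{i=0}^\infty \ell^\varepsilon_i < \varepsilon$ and $X \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_i, \tfrac12\ell^\varepsilon_i)$ for all $\varepsilon \in \mathbb{R}^+$. Let $Y \subseteq X$. Then
$\sum_{i=0}^\infty \ell^\varepsilon_i < \varepsilon$ and $Y \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_i, \tfrac12\ell^\varepsilon_i)$ for all $\varepsilon \in \mathbb{R}^+$. So $Y$ is a set of explicit measure zero.

For part (iv), let $X_0$ and $X_1$ be sets of real numbers of explicit measure zero. Then there exist functions
$a_j : \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\ell_j : \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ for $j = 0, 1$ such that $\sum_{i=0}^\infty \ell^\varepsilon_{j,i} < \varepsilon$ and $X_j \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_{j,i}, \tfrac12\ell^\varepsilon_{j,i})$ for
all $\varepsilon \in \mathbb{R}^+$, for $j = 0, 1$. Define functions $\tilde a : \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\tilde\ell : \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ by $\tilde a^\varepsilon_{2i+j} = a^{\varepsilon/2}_{j,i}$ and $\tilde\ell^\varepsilon_{2i+j} = \ell^{\varepsilon/2}_{j,i}$
for all $i \in \mathbb{Z}^+_0$ and $j = 0, 1$, for all $\varepsilon \in \mathbb{R}^+$. Then $\sum_{i=0}^\infty \tilde\ell^\varepsilon_i < \varepsilon$ and $X_0 \cup X_1 \subseteq \bigcup_{i=0}^\infty B(\tilde a^\varepsilon_i, \tfrac12\tilde\ell^\varepsilon_i)$. Hence
$X_0 \cup X_1$ is a set of explicit measure zero.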


Part (v) follows by induction from part (iv). 
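The cover used in part (ii) can be checked mechanically on a finite initial segment of the sequence. The following Python sketch is only an illustration, not part of the proof. The helper names `cover` and `covers`, the sample points `xs` and the value of `eps` are assumptions made here, exact rationals from `fractions` stand in for real numbers, and only finitely many of the infinitely many intervals are generated.

```python
from fractions import Fraction

def cover(xs, eps):
    """First len(xs) pairs (a_i, l_i) with a_i = x_i and l_i = eps * 2**(-i-2)."""
    return [(x, eps / 2 ** (i + 2)) for i, x in enumerate(xs)]

def covers(pairs, x):
    """True if x lies in some open interval B(a, l/2) = (a - l/2, a + l/2)."""
    return any(abs(x - a) < l / 2 for a, l in pairs)

eps = Fraction(1, 10)                           # sample tolerance (assumption)
xs = [Fraction(n, n + 1) for n in range(20)]    # sample enumerated points (assumption)
pairs = cover(xs, eps)
total = sum(l for _, l in pairs)                # partial sum of the interval lengths
```

The partial sums of the lengths stay below eps/2 however many points are enumerated, which is the inequality used in the proof.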


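45.2.5 THEOREM [ZF+CC]: Real-number measure zero implies explicit measure zero with countable choice.
Every set of real numbers of measure zero is a set of real numbers of explicit measure zero.


PROOF: Let $X$ be a set of measure zero. Then $X$ satisfies Definition 45.1.3 line (45.1.1). For $\varepsilon \in \mathbb{R}^+$, define
$C_\varepsilon = \{((a_i)_{i=0}^\infty, (\ell_i)_{i=0}^\infty) \in \mathbb{R}^\omega \times (\mathbb{R}^+)^\omega;\ \sum_{i=0}^\infty \ell_i < \varepsilon \text{ and } X \subseteq \bigcup_{i=0}^\infty B(a_i, \tfrac12\ell_i)\}$. Then $\forall \varepsilon \in \mathbb{R}^+,\ C_\varepsilon \ne \emptyset$.
Therefore by the axiom of countable choice, $\times_{k=0}^\infty C_{2^{-k}} \ne \emptyset$. For all $\varepsilon \in \mathbb{R}^+$, let $k(\varepsilon) = \inf\{k \in \mathbb{Z}^+_0;\ 2^{-k} < \varepsilon\}$.
(It follows from the Archimedean property of $\mathbb{R}$ that $\{k \in \mathbb{Z}^+_0;\ 2^{-k} < \varepsilon\} \ne \emptyset$. See Section 18.4.) Then
$C_\varepsilon \supseteq C_{2^{-k(\varepsilon)}}$ for all $\varepsilon \in \mathbb{R}^+$. So $\times_{\varepsilon \in \mathbb{R}^+} C_\varepsilon \supseteq \times_{\varepsilon \in \mathbb{R}^+} C_{2^{-k(\varepsilon)}}$. Therefore $\times_{\varepsilon \in \mathbb{R}^+} C_\varepsilon \ne \emptyset$. So $X$ satisfies line (45.2.1)
in Definition 45.2.2. Hence $X$ is a set of explicit measure zero.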
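The index $k(\varepsilon)$ in this proof is obtained by a direct search, with no choice involved. A minimal Python sketch follows, assuming that $\varepsilon$ is supplied as an exact positive rational. The function name `k_of` is hypothetical.

```python
from fractions import Fraction

def k_of(eps):
    """Smallest non-negative integer k with 2**(-k) < eps, for eps > 0."""
    k = 0
    while Fraction(1, 2 ** k) >= eps:
        k += 1          # terminates because 2**(-k) tends to zero
    return k

samples = {eps: k_of(eps) for eps in (Fraction(5), Fraction(1), Fraction(1, 10))}
```

Once a cover has been chosen for each tolerance $2^{-k}$, the cover for a general $\varepsilon$ is simply the one with index `k_of(eps)`.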


45.2.6 REMARK: Alternative equivalent definitions for sets of explicit measure zero. 

A set of real numbers of explicit measure zero is formalised in Definition 45.2.2 as a family with a real-number
parameter. This can be replaced by an equivalent family with an integer parameter. In other words, the
parameter set $\mathbb{R}^+$ can be replaced with $\mathbb{Z}^+$. To convert from the real parameter $\varepsilon$ to the integer parameter $k$,
one may substitute $\varepsilon = k^{-1}$. In the reverse direction, one may construct the real-parameter family by
substituting $k = \mathrm{ceiling}(\varepsilon^{-1})$. (See Definition 16.5.12 for the ceiling function.)


One may also replace Definition 45.2.2 with an equivalent definition where the sets $X_\varepsilon = \bigcup_{i=0}^\infty B(a^\varepsilon_i, \tfrac12\ell^\varepsilon_i)$ in
line (45.2.1) are replaced by sets $X'_\varepsilon$ which are non-decreasing with respect to $\varepsilon$. For example, one may
define $X'_\varepsilon = \bigcap \{X_{k^{-1}};\ k \in \mathbb{Z}^+,\ k \le \mathrm{ceiling}(\varepsilon^{-1})\}$.


45.3. Explicit families of real-number sets of explicit measure zero 


45.3.1 REMARK: Explicit countable families of sets of explicit measure zero. 

In preparation for Theorem 45.3.3, Definition 45.3.2 introduces the concept of an explicit countable family 
of sets of explicit measure zero. This clumsy choice of terminology signifies that the cover for each set in 
a countable family of sets is explicitly chosen, not acquired via an axiom of choice. The explicit countably 
infinite family of choices permits Theorem 45.3.3 to be stated without any axiom of choice. 
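45.3.2 DEFINITION: An explicit countable family of sets of explicit measure zero is a countable family of
sets $(X_j)_{j=0}^\infty$ such that there exist maps $a : \mathbb{Z}^+_0 \times \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\ell : \mathbb{Z}^+_0 \times \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ such that

$$\forall j \in \mathbb{Z}^+_0,\ \forall \varepsilon \in \mathbb{R}^+,\quad \sum_{i=0}^\infty \ell^\varepsilon_{j,i} < \varepsilon \ \text{ and }\ X_j \subseteq \bigcup_{i=0}^\infty B(a^\varepsilon_{j,i}, \tfrac12\ell^\varepsilon_{j,i}), \tag{45.3.1}$$

where $(a^\varepsilon_{j,i})_{i=0}^\infty = a(j, \varepsilon) \in \mathbb{R}^\omega$ and $(\ell^\varepsilon_{j,i})_{i=0}^\infty = \ell(j, \varepsilon) \in (\mathbb{R}^+)^\omega$ for all $j \in \mathbb{Z}^+_0$ and $\varepsilon \in \mathbb{R}^+$.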


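45.3.3 THEOREM: Countable unions of explicit measure-zero sets have explicit measure zero.
The union of an explicit countable family of sets of real numbers of explicit measure zero is a set of real
numbers of explicit measure zero.


PROOF: Let $X = \bigcup_{j=0}^\infty X_j$, where $(X_j)_{j=0}^\infty$ is an explicit countable family of sets of real numbers of explicit
measure zero. Then there exist maps $a : \mathbb{Z}^+_0 \times \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\ell : \mathbb{Z}^+_0 \times \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ which satisfy line (45.3.1)
in Definition 45.3.2. Let $\varepsilon \in \mathbb{R}^+$. Define families $(\bar a^\varepsilon_{j,i})_{j,i=0}^\infty \in \mathbb{R}^{\omega\times\omega}$ and $(\bar\ell^\varepsilon_{j,i})_{j,i=0}^\infty \in (\mathbb{R}^+)^{\omega\times\omega}$ by $\bar a^\varepsilon_{j,i} = a^{\varepsilon_j}_{j,i}$
and $\bar\ell^\varepsilon_{j,i} = \ell^{\varepsilon_j}_{j,i}$ for all $j, i \in \mathbb{Z}^+_0$, where $\varepsilon_j = 2^{-j-1}\varepsilon$ for all $j \in \mathbb{Z}^+_0$. Then $X_j \subseteq \bigcup_{i=0}^\infty B(\bar a^\varepsilon_{j,i}, \tfrac12\bar\ell^\varepsilon_{j,i})$ for
all $j \in \mathbb{Z}^+_0$.

Define the families $(\tilde a^\varepsilon_k)_{k=0}^\infty \in \mathbb{R}^\omega$ and $(\tilde\ell^\varepsilon_k)_{k=0}^\infty \in (\mathbb{R}^+)^\omega$ by $\tilde a^\varepsilon_k = \bar a^\varepsilon_{J(k),I(k)}$ and $\tilde\ell^\varepsilon_k = \bar\ell^\varepsilon_{J(k),I(k)}$ for all $k \in \mathbb{Z}^+_0$,
where the functions $J, I : \mathbb{Z}^+_0 \to \mathbb{Z}^+_0$ are defined by

$$\begin{aligned}
\forall k \in \mathbb{Z}^+_0,\quad g(k) &= \mathrm{floor}\bigl(-\tfrac12 + (2k + \tfrac14)^{1/2}\bigr) \\
\forall k \in \mathbb{Z}^+_0,\quad f(k) &= g(k)(g(k) + 1)/2 \\
\forall k \in \mathbb{Z}^+_0,\quad J(k) &= f(k) + g(k) - k \\
\forall k \in \mathbb{Z}^+_0,\quad I(k) &= k - f(k).
\end{aligned}$$

It is easily verified (in view of Remark 13.9.4) that

$$\sum_{k=0}^\infty \tilde\ell^\varepsilon_k = \sum_{j=0}^\infty \sum_{i=0}^\infty \ell^{\varepsilon_j}_{j,i} < \sum_{j=0}^\infty \varepsilon_j = \varepsilon \quad\text{and}\quad X \subseteq \bigcup_{k=0}^\infty B(\tilde a^\varepsilon_k, \tfrac12\tilde\ell^\varepsilon_k).$$

Hence $X$ is a set of real numbers of explicit measure zero by Definition 45.2.2.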
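The index maps $J$ and $I$ enumerate the pairs $(j,i)$ along successive anti-diagonals $j + i = g(k)$. The following Python sketch is an illustration only. It keeps the names `g`, `f`, `J`, `I` from the proof, but it is an assumption of this sketch to compute $g$ with the integer square root: for non-negative integers $k$ the expression `(isqrt(8*k + 1) - 1) // 2` equals $\mathrm{floor}(-\tfrac12 + (2k+\tfrac14)^{1/2})$, and it avoids floating-point rounding.

```python
from math import isqrt

def g(k):
    # Index of the anti-diagonal containing position k (exact integer arithmetic).
    return (isqrt(8 * k + 1) - 1) // 2

def f(k):
    # Triangular number g(g+1)/2: the position at which diagonal g(k) starts.
    return g(k) * (g(k) + 1) // 2

def J(k):
    return f(k) + g(k) - k

def I(k):
    return k - f(k)

pairs = [(J(k), I(k)) for k in range(10)]
# First diagonals: (0,0), then (1,0),(0,1), then (2,0),(1,1),(0,2), ...
```

Each pair $(j,i)$ is visited exactly once, which is what permits the double families to be re-indexed as single sequences without any choice being made.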


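45.3.4 THEOREM [ZF+CC]: Countable unions of measure-zero sets have explicit measure zero.
Every countable family of sets of real numbers of measure zero is an explicit countable family of sets of real
numbers of explicit measure zero.


PROOF: Let $(X_j)_{j=0}^\infty$ be a countable family of sets of real numbers of measure zero. For all $j \in \mathbb{Z}^+_0$
and $\varepsilon \in \mathbb{R}^+$, define $C_{j,\varepsilon} = \{((a_i)_{i=0}^\infty, (\ell_i)_{i=0}^\infty) \in \mathbb{R}^\omega \times (\mathbb{R}^+)^\omega;\ \sum_{i=0}^\infty \ell_i < \varepsilon \text{ and } X_j \subseteq \bigcup_{i=0}^\infty B(a_i, \tfrac12\ell_i)\}$. Then
$\forall j \in \mathbb{Z}^+_0,\ \forall \varepsilon \in \mathbb{R}^+,\ C_{j,\varepsilon} \ne \emptyset$ by Definition 45.1.3. So by the axiom of countable choice, $\times_{j,k=0}^\infty C_{j,2^{-k}} \ne \emptyset$
because $\mathbb{Z}^+_0 \times \mathbb{Z}^+_0$ is a countable set.

For all $\varepsilon \in \mathbb{R}^+$, let $k(\varepsilon) = \inf\{k \in \mathbb{Z}^+_0;\ 2^{-k} < \varepsilon\}$. (It follows from the Archimedean property of $\mathbb{R}$ that
$\{k \in \mathbb{Z}^+_0;\ 2^{-k} < \varepsilon\} \ne \emptyset$. See Section 18.4.) Then $C_{j,\varepsilon} \supseteq C_{j,2^{-k(\varepsilon)}}$ for all $j \in \mathbb{Z}^+_0$ and $\varepsilon \in \mathbb{R}^+$. So
$\times_{(j,\varepsilon) \in \mathbb{Z}^+_0 \times \mathbb{R}^+} C_{j,\varepsilon} \supseteq \times_{(j,\varepsilon) \in \mathbb{Z}^+_0 \times \mathbb{R}^+} C_{j,2^{-k(\varepsilon)}}$. Therefore $\times_{(j,\varepsilon) \in \mathbb{Z}^+_0 \times \mathbb{R}^+} C_{j,\varepsilon} \ne \emptyset$. In other words, there exist
maps $a : \mathbb{Z}^+_0 \times \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\ell : \mathbb{Z}^+_0 \times \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ which satisfy line (45.3.1) in Definition 45.3.2. So
$(X_j)_{j=0}^\infty$ satisfies Definition 45.3.2. Hence $(X_j)_{j=0}^\infty$ is an explicit countable family of sets of real numbers of explicit
measure zero.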


45.3.5 THEOREM [ZF+CC]: Countable unions of measure-zero sets have measure zero.
The union of a countable family of sets of real numbers of measure zero is a set of real numbers of measure
zero.


PROOF: The result follows immediately from Theorems 45.3.4 and 45.3.3.


45.3.6 REMARK: The countable choice axiom as a “front end” for unions of sets of measure zero. 

In Theorem 45.3.3, no axiom of choice is assumed. Therefore the set-cover-family mid-point parameters
$(a^\varepsilon_{j,i})_{i=0}^\infty \in \mathbb{R}^\omega$ and diameter parameters $(\ell^\varepsilon_{j,i})_{i=0}^\infty \in (\mathbb{R}^+)^\omega$ for $X_j$ must be “given” or “known” or “chosen”
for each $j \in \mathbb{Z}^+_0$, for each $\varepsilon \in \mathbb{R}^+$. This is an infinite set of choices of set-cover-families. The
“infinite-multiple-choice” requirement arises here because if one only knows that a cover-family “exists” for each
$j \in \mathbb{Z}^+_0$ and $\varepsilon \in \mathbb{R}^+$, then the choice of a unique cover-family for each $j$ constitutes a countably infinite set
of choices from a very infinite set of possible families of mid-point and diameter parameters.


Intuitively, it seems clear that if all of the sets $X_j$ have suitable covers for each $\varepsilon \in \mathbb{R}^+$, then someone must
presumably have proved that such covers exist, and so presumably there is sufficient information available to
write down an explicit cover-family for each set $X_j$. If these sets were not “constructed” with the assistance
of an axiom of choice, generally one would be able to trace through the proofs to determine a method of
constructing the cover-families explicitly.


Since in every situation where a cover is really known to exist, one may construct explicit covers for all 
sets $X_j$, it seems reasonable to assume that this is always the case, even if no information is available except
the knowledge that the sets have zero measure. However, there are Zermelo-Fraenkel models in which the set 
of all real numbers can be expressed as the union of a countable family of countable sets. (See for example 
Jech [364], pages 142-143; Moore [371], page 70; Howard/Rubin [362], page 30 (form 38).) This would imply 
by Theorem 45.1.4 (ii) that the set of all real numbers is a set of measure zero. This is clearly unacceptable! 


There are two obvious escape paths from this conundrum. The easy path is to adopt the axiom of countable 
choice as in Theorem 45.3.5, so that the required infinite sets of choices of cover-families are delivered as if by 
magic. The more realistic path is to demand that suitable cover-families be given explicitly, or at least must 
be known to be explicitly choosable. After all, this is the intuitive underpinning of the frequent accidental 
use of the axiom of choice. In most realistic scenarios, explicit choices can be made. 


Moore [371], pages 64-76, describes the “historic irony that many of the mathematicians who later opposed 
the Axiom of Choice had used it implicitly in their own researches." Not least amongst these was Henri 
Lebesgue, who in 1902 proved that the union of a family of measurable sets is measurable by an unwitting 
application of the axiom of countable choice. “Thus Lebesgue was eventually faced with the dilemma of 
either accepting the Denumerable Axiom or else restricting his theory of integration in an essential way." 
(Moore [371], page 70.) In fact, the dilemma is not as serious as it seems. One simply has to replace the 
unconditional assertion with an assertion which is predicated on the existence of a choice function, and that 
choice function can almost always be provided in practical contexts. Theorem 45.3.3 is an example of how 
such a condition may be added so as to keep the assertion within the scope of Zermelo-Fraenkel set theory. 
If that is inconvenient, one may invoke the countable choice axiom to summon forth a choice function as in 
Theorem 45.3.5. The difference between the two approaches is not enormous. In one case, one requires a 
choice function as an input to make the theorem work. In the other case, one trusts and believes that such a 
choice function exists, either in reality or on some astral plane. The structure of the proof is still the same. 
It is only the source of the input data which distinguishes the AC-free and AC-assisted proofs. This issue 
is more philosophical than mathematical. (See Remark 12.1.27 for a similar comment on the philosophical, 
insubstantial nature of the axiom of infinity.) 


One can only test the validity of the claims of an axiom of choice in situations where the claimed choice
functions can be constructed without the assistance of the axiom. Therefore choice axioms can never be
falsified because they can never be tested. In this sense, one could say that the axioms of choice are harmless,
but there is substantial benefit in stating explicitly what kinds of choice functions one is relying upon. In
concrete applications they will be needed anyway. Thus Theorem 45.3.3 states explicitly what kind of choice
function is required, whereas Theorem 45.3.5 sweeps the requirement under the carpet. Instead of tacitly 
relying upon rabbits out of hats, Theorem 45.3.3 specifies precisely what kind of rabbit is required. Users of 
the theorem who believe that they can obtain suitable rabbits from hats are at liberty to obtain them from 
such a source. 


45.3.7 REMARK: The axiom of choice should not be taken literally. 

Since all applications of the axiom of choice, such as Theorem 45.3.5, may be rewritten with a caveat, as in 
Theorem 45.3.3, to make them AC-free, one may consider that AC-assisted theorems are true in a kind of 
metaphorical or virtual sense. In other words, any AC-assisted theorem which asserts a proposition P may 
be interpreted to mean that P is true if a suitable choice function can be provided. Thus Theorem 45.3.3 
makes explicit the implicit assumption of the existence of a choice function in Theorem 45.3.5. In this 
sense, one may consider that all applications of the axiom of choice are entirely harmless, no matter how
counterintuitive the assertions may seem, because the assertions are not meant to be taken literally. 


There is a precedent for such a non-literal interpretation of axioms of choice, namely the axiom of infinity. 
As mentioned in Remark 12.6.2, even the finite set $V_6$ is far too huge to be listed explicitly within the known
Universe, and this set does not even contain the ordinal number 6. So the axiom of infinity is not meant to be
taken literally. It really means that any finite set that we can think up is “blessed” with “existence”, and is
admitted to the respectable universe of sets. We can also write down a name $\omega$ for the set of all finite ordinal
numbers and we can “bless its existence”. But it does not exist in the literal sense. Another precedent is the
power-set axiom. The set of subsets of $\omega$ also does not really exist. Similarly, the von Neumann universe V
(in Section 12.6) is “blessed with existence” (as an NBG class) despite being too big to really think about. 


In a related way, the choice functions required for AC-assisted theorems often cannot be provided by any kind
of construction because to write them down as set-theoretic expressions would require an infinite number 
of symbols. So all AC-assisted theorems are conditional on the provision of choice functions. Invocation 
of an axiom of choice is a shorthand way of saying: “If you can obtain suitable choice functions for the 
inputs for this theorem, then you can obtain the outputs from this theorem via a ZF proof.” This does not 
excuse or validate the application of axioms of choice. In fact, this point of view underlines the importance 
of identifying every AC application so that a caveat may be added to inform users that they must provide 
suitable choice functions if they want such applications to be valid. 


Referring to the idea in Remark 7.10.3 that the axiom of choice delivers a kind of IOU for a rabbit rather 
than a real rabbit out of a hat, this is not so much a disparagement of the axiom, but rather an objective 
statement that in practice, the IOU is very often fulfilled when non-AC constructive methods deliver the 
rabbit. In other words, whatever choice functions are promised by the axiom of choice are later fulfilled by 
the constructive ZF axioms. In this perspective, one may say that Theorem 45.3.3 states what kind of choice 
function (i.e. rabbit) must be delivered, and Theorem 45.3.5 claims (falsely) that it can always be delivered. 
This is mostly not a real problem because the specified choice function (in this case an infinite sequence of 
choices of cover-families) can be delivered when required. 


45.4. Non-differentiability of monotonic real functions 


45.4.1 REMARK: An alternative (and perplexing) characterisation of real-number sets of measure zero. 
Theorem 45.4.2 gives an alternative characterisation of sets of real numbers of measure zero. (A similar 
theorem is given by Riesz/Szőkefalvi-Nagy [125], pages 5-6.) The condition on line (45.4.1) means that a
set X can be covered by a countably infinite sequence of intervals whose total length is finite, but such that 
every point of X is contained in an infinite number of those intervals. Roughly speaking, the thinking here 
is that if every point is in an infinite number of intervals, one may remove any proportion of the total length 
of the cover by removing a finite number of intervals, but because the cover is infinite at every point, the 
cover will still be a full cover (because "infinity minus a finite number equals infinity"). Therefore there 
must exist a cover of X with any given positive total length, and so X has zero measure. 


The purpose of Theorem 45.4.2 is to help construct a bounded monotonic real function which is non- 
differentiable on every point of a given set of explicit measure zero. (If the zero measure is not explicit, it 
is not possible to carry out this construction.) The alternative definition of a set of explicit measure zero is 
applied in Theorem 45.4.5 to construct such a function. 
45.4.2 THEOREM: Explicit measure zero if and only if all points infinitely covered by finite-length family.
A set of real numbers $X$ is a set of explicit measure zero if and only if

$$\exists (a_i)_{i=0}^\infty \in \mathbb{R}^\omega,\ \exists (\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega,\quad \sum_{i=0}^\infty \ell_i < \infty \ \text{ and }\ X \subseteq \bigcup_{i=0}^\infty B(a_i, \ell_i/2) \ \text{ and }\ \forall x \in X,\ \#\{i \in \mathbb{Z}^+_0;\ x \in B(a_i, \ell_i/2)\} = \infty. \tag{45.4.1}$$


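PROOF: Let $X$ be a set of real numbers of explicit measure zero. Let $L \in \mathbb{R}^+$. Then by Definition 45.2.2,
there exist double families $(a_{j,i})_{j,i=0}^\infty \in \mathbb{R}^{\omega\times\omega}$ and $(\ell_{j,i})_{j,i=0}^\infty \in (\mathbb{R}^+)^{\omega\times\omega}$ such that $\sum_{i=0}^\infty \ell_{j,i} < L 2^{-j-1}$ and
$X \subseteq \bigcup_{i=0}^\infty B(a_{j,i}, \ell_{j,i}/2)$ for all $j \in \mathbb{Z}^+_0$. Define families $(\tilde a_k)_{k=0}^\infty \in \mathbb{R}^\omega$ and $(\tilde\ell_k)_{k=0}^\infty \in (\mathbb{R}^+)^\omega$ by $\tilde a_k = a_{J(k),I(k)}$
and $\tilde\ell_k = \ell_{J(k),I(k)}$ for all $k \in \mathbb{Z}^+_0$, where the functions $J, I : \mathbb{Z}^+_0 \to \mathbb{Z}^+_0$ are defined by

$$\begin{aligned}
\forall k \in \mathbb{Z}^+_0,\quad g(k) &= \mathrm{floor}\bigl(-\tfrac12 + (2k + \tfrac14)^{1/2}\bigr) \\
\forall k \in \mathbb{Z}^+_0,\quad f(k) &= g(k)(g(k) + 1)/2 \\
\forall k \in \mathbb{Z}^+_0,\quad J(k) &= f(k) + g(k) - k \\
\forall k \in \mathbb{Z}^+_0,\quad I(k) &= k - f(k).
\end{aligned}$$

(See Remark 13.9.4.) Then it is easily verified that $\sum_{k=0}^\infty \tilde\ell_k = \sum_{j=0}^\infty \sum_{i=0}^\infty \ell_{j,i} < \sum_{j=0}^\infty L 2^{-j-1} = L$ and

$$X \subseteq \bigcup_{j=0}^\infty \bigcup_{i=0}^\infty B(a_{j,i}, \ell_{j,i}/2) = \bigcup_{k=0}^\infty B(\tilde a_k, \tilde\ell_k/2),$$

and $\forall x \in X,\ \forall j \in \mathbb{Z}^+_0,\ \exists i \in \mathbb{Z}^+_0,\ x \in B(a_{j,i}, \ell_{j,i}/2)$. In other words,

$$\forall x \in X,\ \forall j \in \mathbb{Z}^+_0,\quad \#\{i \in \mathbb{Z}^+_0;\ x \in B(a_{j,i}, \ell_{j,i}/2)\} \ge 1.$$

Therefore

$$\begin{aligned}
\forall x \in X,\quad \#\{k \in \mathbb{Z}^+_0;\ x \in B(\tilde a_k, \tilde\ell_k/2)\} &= \#\{(j,i) \in \mathbb{Z}^+_0 \times \mathbb{Z}^+_0;\ x \in B(a_{j,i}, \ell_{j,i}/2)\} \\
&= \sum_{j=0}^\infty \#\{i \in \mathbb{Z}^+_0;\ x \in B(a_{j,i}, \ell_{j,i}/2)\} \\
&= \infty.
\end{aligned}$$

Hence $X$ satisfies line (45.4.1).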

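To show the converse, let $X \subseteq \mathbb{R}$ satisfy line (45.4.1) for some $(a_i)_{i=0}^\infty \in \mathbb{R}^\omega$ and $(\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega$. Let
$L = \sum_{i=0}^\infty \ell_i$. Then $L \in \mathbb{R}^+$. Let $\varepsilon \in \mathbb{R}^+$. Then $\sum_{i=j}^\infty \ell_i < \varepsilon$ for some $j \in \mathbb{Z}^+_0$ by the convergence of the series
$(\ell_i)_{i=0}^\infty$. For $\varepsilon \in \mathbb{R}^+$, define $S_\varepsilon = \{j \in \mathbb{Z}^+_0;\ \sum_{i=j}^\infty \ell_i < \varepsilon\}$. Then $S_\varepsilon \ne \emptyset$ and so $f_\varepsilon = \inf S_\varepsilon$ is a well-defined
element of $\mathbb{Z}^+_0$. Since $\ell_i \in \mathbb{R}^+$ for all $i \in \mathbb{Z}^+_0$, it follows that $\forall j \in \mathbb{Z}^+_0,\ (j \ge f_\varepsilon \Leftrightarrow \sum_{i=j}^\infty \ell_i < \varepsilon)$. Consequently
the subset $f = \{(\varepsilon, k) \in \mathbb{R}^+ \times \mathbb{Z}^+_0;\ \forall j \in \mathbb{Z}^+_0,\ (j \ge k \Leftrightarrow \sum_{i=j}^\infty \ell_i < \varepsilon)\}$ of $\mathbb{R}^+ \times \mathbb{Z}^+_0$ is a function from $\mathbb{R}^+$
to $\mathbb{Z}^+_0$. Define functions $\tilde a : \mathbb{R}^+ \to \mathbb{R}^\omega$ and $\tilde\ell : \mathbb{R}^+ \to (\mathbb{R}^+)^\omega$ by

$$\forall \varepsilon \in \mathbb{R}^+,\ \forall i \in \mathbb{Z}^+_0,\quad \tilde a^\varepsilon_i = a_{i+f_\varepsilon} \ \text{ and }\ \tilde\ell^\varepsilon_i = \ell_{i+f_\varepsilon}.$$

Then $X \subseteq \bigcup_{i=0}^\infty B(\tilde a^\varepsilon_i, \tilde\ell^\varepsilon_i/2)$ by line (45.4.1) because only a finite number of elements of the sequence-pair
$(a, \ell)$ have been removed. Therefore $\tilde a$ and $\tilde\ell$ satisfy line (45.2.1) in Definition 45.2.2. Hence $X$ is a set of
real numbers of explicit measure zero.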
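The cut-off index $f_\varepsilon = \inf S_\varepsilon$ in the converse is determined by the sequence of lengths alone. The following Python sketch illustrates this for one concrete sequence. It assumes, purely for illustration, that $\ell_i = 2^{-i}$, so that the tail sum from index $j$ has the closed form $2^{1-j}$. The function names `tail` and `cutoff` are hypothetical.

```python
from fractions import Fraction

def tail(j):
    # Exact tail sum of l_i = 2**(-i) over i >= j, namely 2**(1 - j).
    return Fraction(2) / 2 ** j

def cutoff(eps):
    # f_eps = inf{ j : tail(j) < eps }, found by linear search.
    # The search terminates because tail(j) tends to zero.
    j = 0
    while tail(j) >= eps:
        j += 1
    return j

shifted_start = cutoff(Fraction(1, 10))   # the cover for eps = 1/10 starts at this index
```

Discarding the first `cutoff(eps)` intervals leaves a family of total length below `eps`, and every point of $X$ is still covered because it lies in infinitely many of the original intervals.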


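45.4.3 THEOREM [ZF+CC]: Measure zero if and only if all points infinitely covered by finite-length family.
A set of real numbers $X$ is a set of measure zero if and only if

$$\exists (a_i)_{i=0}^\infty \in \mathbb{R}^\omega,\ \exists (\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega,\quad \sum_{i=0}^\infty \ell_i < \infty \ \text{ and }\ X \subseteq \bigcup_{i=0}^\infty B(a_i, \ell_i/2) \ \text{ and }\ \forall x \in X,\ \#\{i \in \mathbb{Z}^+_0;\ x \in B(a_i, \ell_i/2)\} = \infty.$$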


PROOF: This follows from the axiom of countable choice and Theorems 45.2.5 and 45.4.2. 
45.4.4 REMARK: Existence of monotonic functions which are non-differentiable on a set of measure zero.
Theorem 45.4.5 gives an explicit construction of a bounded non-decreasing function $f : \mathbb{R} \to \mathbb{R}$ which is
non-differentiable on any given set $X$ of real numbers. (See also Riesz/Szőkefalvi-Nagy [125], page 6, for this
construction.) The purpose of this construction is to show the sharpness of a theorem which states that all
monotonic real-valued functions are differentiable except on a set of real numbers of measure zero.


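45.4.5 THEOREM: Construction of monotonic functions non-differentiable on set of explicit measure zero.
Let $X$ be a subset of $\mathbb{R}$, and let $(a_i)_{i=0}^\infty \in \mathbb{R}^\omega$ and $(\ell_i)_{i=0}^\infty \in (\mathbb{R}^+)^\omega$ satisfy

$$\sum_{i=0}^\infty \ell_i < \infty \ \text{ and }\ X \subseteq \bigcup_{i=0}^\infty B(a_i, \ell_i/2) \ \text{ and }\ \forall x \in X,\ \#\{i \in \mathbb{Z}^+_0;\ x \in B(a_i, \ell_i/2)\} = \infty. \tag{45.4.2}$$

Define the function $f : \mathbb{R} \to \mathbb{R}$ by

$$\forall x \in \mathbb{R},\quad f(x) = \sum_{i=0}^\infty \ell_i(x),$$

where

$$\forall i \in \mathbb{Z}^+_0,\ \forall x \in \mathbb{R},\quad \ell_i(x) = \begin{cases} 0 & \text{if } x \le a_i - \tfrac12\ell_i \\ x - a_i + \tfrac12\ell_i & \text{if } x \in B(a_i, \tfrac12\ell_i) \\ \ell_i & \text{if } x \ge a_i + \tfrac12\ell_i. \end{cases}$$

Then $f$ is a non-decreasing function which is non-differentiable at all points of $X$.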


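PROOF: Let $X$, $(a_i)_{i=0}^\infty$, $(\ell_i)_{i=0}^\infty$ and $f$ be as in the statement of the theorem. Let $L = \sum_{i=0}^\infty \ell_i$. Clearly the
function $\ell_i : \mathbb{R} \to \mathbb{R}^+_0$ is non-decreasing for all $i \in \mathbb{Z}^+_0$. Since $0 \le \ell_i(x) \le \ell_i$ for all $x \in \mathbb{R}$ and $i \in \mathbb{Z}^+_0$, it
follows that $0 \le f(x) \le L$ for all $x \in \mathbb{R}$. Let $x_0 \in X$ and let $S = \{i \in \mathbb{Z}^+_0;\ x_0 \in B(a_i, \tfrac12\ell_i)\}$. Then $S$ is an
infinite set. Define $b_i = a_i + \tfrac12\ell_i - x_0$ for $i \in S$. Then $b_i > 0$ for all $i \in S$. Let $c_j = \min\{b_i;\ i \in S \text{ and } i \le j\}$.
Then $c_j > 0$ for all $j \in \mathbb{Z}^+_0$, and $\forall h \in [0, c_j),\ f(x_0 + h) - f(x_0) \ge N_j h$, where $N_j = \#\{i \in S;\ i \le j\}$
for all $j \in \mathbb{Z}^+_0$. Let $K \in \mathbb{R}^+$. Then $N_j > K$ for some $j \in \mathbb{Z}^+_0$. For such $j$, let $h = c_j/2$. Then
$(f(x_0 + h) - f(x_0))/h \ge N_j > K$. Therefore $\forall K \in \mathbb{R}^+,\ \exists h \in \mathbb{R}^+,\ (f(x_0 + h) - f(x_0))/h > K$. So $f$ is not
differentiable at $x_0$. Hence $f$ is non-differentiable at all elements of $X$.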
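The growth of the difference quotients can be observed numerically on a finite partial sum of the series. The following Python sketch is an illustration under stated assumptions, not the construction of the theorem itself: the set $X$ is taken to be the single point 0, every interval is centred at 0 with $\ell_i = 2^{-i}$, only the first `n` terms are summed, and the names `ramp`, `f_trunc` and `quotient` are hypothetical. Since every term is non-decreasing, increments of the truncated sum are lower bounds for increments of the full sum.

```python
from fractions import Fraction

def ramp(x, a, l):
    # l_i(x): 0 to the left of B(a, l/2), x - a + l/2 inside it, l to the right of it.
    return min(max(x - a + l / 2, 0), l)

def f_trunc(x, centres, lengths):
    # Finite partial sum of f(x) = sum_i l_i(x); the theorem uses the infinite sum.
    return sum(ramp(x, a, l) for a, l in zip(centres, lengths))

n = 40
centres = [Fraction(0)] * n                          # every interval is centred at 0
lengths = [Fraction(1, 2 ** i) for i in range(n)]    # l_i = 2**(-i), total length below 2

def quotient(m):
    # Right difference quotient at x0 = 0 with step h = 2**(-m-2).
    h = Fraction(1, 2 ** (m + 2))
    zero = Fraction(0)
    return (f_trunc(h, centres, lengths) - f_trunc(zero, centres, lengths)) / h
```

With these choices the step $h = 2^{-m-2}$ stays inside the intervals with index $i \le m$, so `quotient(m)` is at least $m + 1$ and grows without bound as the step shrinks.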


45.4.6 THEOREM: Existence of bounded non-decreasing non-differentiable function on measure-zero set.
For any set $X$ of real numbers of explicit measure zero, there is a bounded non-decreasing function $f : X \to \mathbb{R}^+_0$
such that $f$ is non-differentiable for all $x \in X$.


PROOF: The result follows from Theorems 45.4.2 and 45.4.5.


45.4.7 THEOREM [ZF+CC]: Bounded non-decreasing non-differentiable function on measure-zero set.
For any set $X$ of real numbers of measure zero, there is a bounded non-decreasing function $f : X \to \mathbb{R}^+_0$
such that $f$ is non-differentiable for all $x \in X$.


PROOF: The result follows from the axiom of countable choice and Theorems 45.2.5 and 45.4.6. 


45.5. Shadow sets 


45.5.1 REMARK: A technical lemma to assist the proof of the Lebesgue differentiation theorem. 

Theorem 45.5.4 is similar to a technical lemma used by Riesz/Szőkefalvi-Nagy [125], pages 5-11, to prove the
Lebesgue differentiation theorem without needing to first define Lebesgue measure and the Lebesgue integral.
(This approach to the Lebesgue differentiation theorem is also presented by Kolmogorov/Fomin [104],
pages 318-323, and A.E. Taylor [145], pages 403-407. A closely related approach for functions of bounded
variation is given by Johnsonbaugh/Pfaffenberger [97], pages 234-237. See Section 38.10 for functions of
bounded variation in metric spaces.)
Figure 45.5.1 Smaller shadows are cast by a higher light source: $K_1 \le K_2 \Rightarrow \Omega^+_{r,f,K_1} \supseteq \Omega^+_{r,f,K_2}$

Theorem 45.5.4 gives some basic properties of “shadow sets”, which are the shrinking sets of points on a 
continuous graph which lie in shadows cast by a rising light source on the right. All points with a positive- 
infinite upper right Dini derivative must lie permanently in a shadow, no matter how high the light source 
rises on the right. (See Section 40.10 for Dini derivatives.) This is illustrated in Figure 45.5.1. 


It is clear that the set of $x$-coordinates which lie under the “shadow regions” in Figure 45.5.1 is non-increasing
with respect to the “altitude of the rising Sun”. (This is shown in Theorem 45.5.4 (ii).) If the sum of the
lengths of the intervals which lie under the “shadow regions” can be shown to converge to zero, this will
imply that the measure of the set of infinite-derivative points must be zero. This strategy can then be applied
to each coordinate of a curve in a multidimensional space to achieve the same result.


The openness of the shadow set $\Omega^+_{r,f,K}$ in Theorem 45.5.4 (iv) implies that each such set may be expressed
as an explicit disjoint union of a countable set of open intervals by Theorem 32.7.8, which does not require
an axiom of infinite choice. Therefore the points $a', b' \in \mathrm{Bdy}(\Omega^+_{r,f,K})$ in Theorem 45.5.4 (vi) are always the
end-points of some open interval of which $\Omega^+_{r,f,K}$ is composed. However, the decomposition of an open set
of real numbers into a countable set of disjoint open intervals is not used in the proof of Theorem 45.5.4.


45.5.2 DEFINITION: The (upper right) shadow set with gradient $K \in \mathbb{R}$ for a function $f : [a,b] \to \mathbb{R}$ on a
closed bounded real-number interval $[a,b]$ is the set $\Omega^+_{r,f,K} \in \mathbb{P}(\mathbb{R})$ given by

$$\Omega^+_{r,f,K} = \{x \in (a,b);\ \exists y \in (x,b),\ f(y) > f(x) + K(y - x)\}.$$
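A discretised version of this definition can be computed directly. The following Python sketch is only a rough illustration under stated assumptions: the function name `shadow`, the sample polynomial `f` and the grid size are choices made here, and because the witness $y$ is restricted to grid points, the computed set may miss points of the true shadow set.

```python
from fractions import Fraction

def shadow(f, a, b, K, n):
    """Grid points x in (a, b) for which some grid point y in (x, b)
    satisfies f(y) > f(x) + K*(y - x)."""
    grid = [a + (b - a) * Fraction(k, n) for k in range(1, n)]   # interior points only
    return {x for x in grid
            if any(f(y) > f(x) + K * (y - x) for y in grid if y > x)}

def f(x):
    # Sample function (assumption): its slope on [0, 1] lies between -1 and 5.
    return 8 * x ** 3 - 12 * x ** 2 + 5 * x

S0 = shadow(f, Fraction(0), Fraction(1), 0, 64)
S2 = shadow(f, Fraction(0), Fraction(1), 2, 64)
S6 = shadow(f, Fraction(0), Fraction(1), 6, 64)
```

Raising the gradient shrinks the computed set, and a gradient above the largest slope of the sample function leaves no shadow at all.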


45.5.3 REMARK: Shadow sets are open and non-decreasing. 

For a general real-valued function $f : [a,b] \to \mathbb{R}$, the shadow sets in Definition 45.5.2 are not necessarily
open sets, although if $f$ is continuous, the shadow sets $\Omega^+_{r,f,K}$ are shown to be open sets of real numbers
in Theorem 45.5.4 (iv). The most important “deliverable” of Theorem 45.5.4 is part (vi), which implies the
upper bound $b' - a' \le K^{-1}(f(b') - f(a'))$ for the length of the shadow interval $(a', b')$ in terms of $f(b') - f(a')$,
which is in turn bounded by the finite range of values for $f$. More importantly, the sum of the interval lengths
$b' - a'$ for the whole shadow set is bounded by the sum of the differences $f(b') - f(a')$, which is bounded by
the range of values of $f$ if $f$ is non-decreasing. This observation is exploited in the proof of Theorem 45.5.8,
which is in turn exploited in the proof of Theorem 45.7.4 to show that the set of points with infinite upper
right Dini derivative has explicit Lebesgue measure zero.


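45.5.4 THEOREM: Some basic properties of shadow sets.
Let $f : [a,b] \to \mathbb{R}$ be a continuous function for some $a, b \in \mathbb{R}$ with $a < b$. Let $\Omega^+_{r,f,K} \in \mathbb{P}(\mathbb{R})$ denote the
upper right shadow set for $f$ for any gradient $K \in \mathbb{R}$.

(i) $\forall x \in (a,b),\ \forall K \in \mathbb{R},\ (D^+_r f(x) > K \Rightarrow x \in \Omega^+_{r,f,K})$.
(ii) $\forall K_1, K_2 \in \mathbb{R},\ (K_1 \le K_2 \Rightarrow \Omega^+_{r,f,K_1} \supseteq \Omega^+_{r,f,K_2})$.
(iii) $\Omega^+_{r,f,0}$ is an open subset of $\mathbb{R}$.
(iv) $\Omega^+_{r,f,K}$ is an open subset of $\mathbb{R}$ for all $K \in \mathbb{R}$.
(v) $\forall a', b' \in \mathrm{Bdy}(\Omega^+_{r,f,0}),\ ((a' < b' \text{ and } (a', b') \subseteq \Omega^+_{r,f,0}) \Rightarrow f(a') \le f(b'))$.
(vi) $\forall K \in \mathbb{R},\ \forall a', b' \in \mathrm{Bdy}(\Omega^+_{r,f,K}),\ ((a' < b' \text{ and } (a', b') \subseteq \Omega^+_{r,f,K}) \Rightarrow f(b') - f(a') \ge K(b' - a'))$.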


PROOF: For part (i), let $x \in (a,b)$ and $K \in \mathbb{R}$. Suppose that $D^+_r f(x) > K$. Then by Definition 40.10.2 and
Notation 40.10.4, $\limsup_{h \to 0^+} (f(x + h) - f(x))/h > K$. Thus $\inf_{h > 0} \sup_{t \in (0,h)} (f(x + t) - f(x))/t > K$. So
$\forall h \in \mathbb{R}^+,\ \sup_{t \in (0,h)} (f(x + t) - f(x))/t > K$. So $\forall h \in \mathbb{R}^+,\ \exists t \in (0,h),\ (f(x + t) - f(x))/t > K$. Let $h = b - x$.
Then $(f(y) - f(x))/(y - x) > K$ with $y = x + t \in (x,b)$ for some $t \in (0,h)$. So $f(y) > f(x) + K(y - x)$.
Therefore $x \in \Omega^+_{r,f,K}$.

For part (ii), let $K_1, K_2 \in \mathbb{R}$ with $K_1 \le K_2$. Let $x \in \Omega^+_{r,f,K_2}$. Then $f(y) > f(x) + K_2(y - x)$ for
some $y \in (x,b)$. So $f(y) > f(x) + K_1(y - x)$ for the same value of $y$ because $y > x$. Therefore $x \in \Omega^+_{r,f,K_1}$.
Hence $\Omega^+_{r,f,K_1} \supseteq \Omega^+_{r,f,K_2}$.

For part (iii), let $x \in \Omega^+_{r,f,0}$. Then $x \in (a,b)$ and $f(x) < f(y)$ for some $y \in (a,b)$ with $x < y$. Let
$\varepsilon = f(y) - f(x)$. Then $\varepsilon > 0$. So by the continuity of $f$, there exists $\delta \in \mathbb{R}^+$ such that $|f(x') - f(x)| < \varepsilon$ for
all $x' \in B(x, \delta) \cap [a,b]$. Let $\delta' = \min(\delta, (y - x)/2)$. Then $\delta' \in \mathbb{R}^+$ and $f(x') < f(x) + \varepsilon = f(y)$ and $x' \in (a,b)$
for all $x' \in B(x, \delta')$. So $x' \in \Omega^+_{r,f,0}$. Therefore $B(x, \delta') \subseteq \Omega^+_{r,f,0}$. Hence $\Omega^+_{r,f,0}$ is an open subset of $\mathbb{R}$.

For part (iv), define $\tilde f : [a,b] \to \mathbb{R}$ and $\tilde\Omega = \{x \in (a,b);\ \exists y \in (x,b),\ \tilde f(y) > \tilde f(x)\}$ with $\tilde f(x) = f(x) - Kx$ for
all $x \in [a,b]$. Then $\tilde\Omega$ is an open subset of $\mathbb{R}$ by part (iii) because $\tilde f$ is continuous on $[a,b]$. But $\Omega^+_{r,f,K} = \tilde\Omega$.
Hence $\Omega^+_{r,f,K}$ is an open subset of $\mathbb{R}$.

For part (v), suppose that $a', b' \in \mathrm{Bdy}(\Omega^+_{r,f,0})$ with $a' < b'$ and $(a', b') \subseteq \Omega^+_{r,f,0}$. Then $b' \notin \Omega^+_{r,f,0}$ by
Theorem 31.9.10 (iii). Let $x \in (a', b')$. Let $x_1 = \sup\{x' \in [x, b'];\ f(x) \le f(x')\}$. Then $x_1 \in [x, b']$. Suppose
that $x_1 \ne b'$. Then $x_1 \in \Omega^+_{r,f,0}$. So $f(x_1) < f(y)$ for some $y \in (x_1, b)$ by the definition of $\Omega^+_{r,f,0}$. But
$f(x) > f(z)$ for all $z \in (x_1, b')$ by the definition of $x_1$, and $f(x) \le f(x_1)$ by the continuity of $f$. So
$f(z) < f(x_1)$ for all $z \in (x_1, b')$. Therefore $f(z) \le f(x_1)$ for all $z \in [x_1, b']$ by the continuity of $f$. So
$y \in (x_1, b) \setminus [x_1, b'] = (b', b)$. But $f(b') < f(x)$ by the assumption $x_1 \ne b'$ and the definition of $x_1$. So
$f(b') < f(x) \le f(x_1) < f(y)$. Therefore $f(b') < f(y)$, and so $b' \in \Omega^+_{r,f,0}$, which is a contradiction. So
$x_1 = b'$. Therefore $f(x) \le f(b')$. This holds for all $x \in (a', b')$. Hence $f(a') \le f(b')$ by the continuity of $f$.

For part (vi), define $\tilde f : [a,b] \to \mathbb{R}$ and $\tilde\Omega = \{x \in (a,b);\ \exists y \in (x,b),\ \tilde f(y) > \tilde f(x)\}$ with $\tilde f(x) = f(x) - Kx$ for
all $x \in [a,b]$. Then $\tilde\Omega$ satisfies part (v). So $\tilde f(a') \le \tilde f(b')$ for all $a', b' \in \mathrm{Bdy}(\tilde\Omega)$ with $a' < b'$ and $(a', b') \subseteq \tilde\Omega$.
Hence $f(b') - f(a') \ge K(b' - a')$ for all $a', b' \in \mathrm{Bdy}(\tilde\Omega)$ with $a' < b'$ and $(a', b') \subseteq \tilde\Omega$, where $\tilde\Omega = \Omega^+_{r,f,K}$.


45.5.5 REMARK: The construction of the shadow-set interval lists. 

The expression $\mathrm{List}(\mathrm{Top}(\mathbb{R}))$ in Definition 45.5.6 uses Notation 14.12.13 for extended lists. It is equal to
the set of all finite and infinite sequences of open subsets of $\mathbb{R}$. It would have been preferable to specify
$I^+_{r,f,K} \in \mathrm{List}(X)$ in Definition 45.5.6, where $X$ is the set of all non-empty bounded open intervals of $\mathbb{R}$. But
there is apparently no standard notation for this space of intervals.

Definition 45.5.6 (ii) uses notation of the form $[[x, y]]$ for $x, y \in \mathbb{R}$ to mean the non-empty bounded closed
real-number interval $[\min(x, y), \max(x, y)]$ as in Notation 16.1.15.


The set $\Omega^+_{r,f,K}$ in Definition 45.5.6 (i) is an open subset of $\mathbb{R}$ for all $K \in \mathbb{R}$ by Theorem 45.5.4 (iv). The set
$V^+_{r,f,K}(i)$ in Definition 45.5.6 (ii) is a component interval of $\Omega^+_{r,f,K}$ as in Definition 16.1.10 if $\eta(i) \in \Omega^+_{r,f,K}$.
Otherwise $V^+_{r,f,K}(i) = \emptyset$. The index set $J$ in Definition 45.5.6 (iii) depends on $K$ because it is the ordinal
number which has cardinality equal to the cardinality of the set of component intervals in the set $\Omega^+_{r,f,K}$.
(This construction of a list of component intervals is justified by Theorem 32.7.8.)


The important thing to notice about Definition 45.5.6 is the explicit constructions at each stage, which could
all, in principle, be written as finite-length set-theoretic expressions in terms of the given function $f$ and the
given enumeration $\eta$ for the rational numbers. (See Section 15.2 for explicit enumerations of the rational
numbers.) As noted in Remark 32.7.7, any surjection $\eta : \omega \to \mathbb{Q}$ may be used in Definition 45.5.6 instead of
a bijection.
45.5.6 DEFINITION: The upper right shadow-set interval list for a continuous function $f : [a,b] \to \mathbb{R}$ with
$a, b \in \mathbb{R}$ such that $a < b$, for a given rational number enumeration map (i.e. bijection) $\eta : \omega \to \mathbb{Q}$, is a list
$I^+_{r,f,K} \in \mathrm{List}(\mathrm{Top}(\mathbb{R})) = \bigcup_{n \in \omega^+} \mathrm{Top}(\mathbb{R})^n$ for $K \in \mathbb{R}$ which is constructed in the following stages.

(i) $\forall K \in \mathbb{R},\ \Omega^+_{r,f,K} = \{x \in (a,b);\ \exists y \in (x,b),\ f(y) > f(x) + K(y - x)\}$.
(ii) $\forall K \in \mathbb{R},\ \forall i \in \omega,\ V^+_{r,f,K}(i) = \{x \in (a,b);\ [[x, \eta(i)]] \subseteq \Omega^+_{r,f,K}\}$.
(iii) $(I^+_{r,f,K}(i))_{i \in J}$ is the first-instance subsequence of the sequence $(V^+_{r,f,K}(i))_{i \in \omega}$ for all $K \in \mathbb{R}$, but omitting
the empty set. (See Definition 12.4.16 for first-instance subsequences.) That is, $I^+_{r,f,K} = V^+_{r,f,K} \circ \beta$,
where $\beta : J \to S$ is the standard enumeration of $S = \{i \in \omega;\ V^+_{r,f,K}(i) \ne \emptyset \text{ and } \forall j < i,\ V^+_{r,f,K}(j) \ne V^+_{r,f,K}(i)\}$.
(See Definition 12.4.5 for the standard enumeration of a set of non-negative integers. Note
that $J$ is determined by $S$. In fact, $J$ is in essence the cardinal number of $S$.)


45.5.7 REMARK: Management of variable-length lists of non-empty bounded open intervals. 

Writing proofs of theorems in terms of countable sets of connected components of open sets, as in the
case of Definition 45.5.6, is made particularly cumbersome by the need to manage variable-length lists of
components. Each list $I^+_{r,f,K}$ is a finite or infinite sequence whose cardinality depends on $K$. For each such
list, the index set $J = \mathrm{Dom}(I^+_{r,f,K})$ can depend in a very complicated way on $K$, but the set is computed by
induction in terms of the given rational number enumeration map $\eta$.

Each list $I^+_{r,f,K}$ also depends on the chosen rational-number enumeration map $\eta$, but the lists constructed
for different enumerations differ only in their order. The important thing here is that out of the infinity of
possible orderings, a single ordering may be chosen according to a definite, explicit procedure. Much more
important than the explicit choice for a single value of $K$ is the fact that an explicit choice is made for all
values of $K$ simultaneously, thereby avoiding the temptation to invoke an axiom of infinite choice.


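45.5.8 THEOREM: Upper bound for the measure of a shadow set.
Let $f : [a,b] \to \mathbb{R}$ be continuous and non-decreasing for some $a,b \in \mathbb{R}$ with $a < b$. Let $\Omega^+_{r,f,K} \in \mathrm{Top}(\mathbb{R})$ be the
upper right shadow set for $f$ with gradient $K \in \mathbb{R}^+$. Let $I^{\eta}_{f,K} \in \mathrm{List}(\mathrm{Top}(\mathbb{R}))$ be the list of open intervals
in $\Omega^+_{r,f,K}$, as constructed in Definition 45.5.6 for an enumeration $\eta : \omega \to \mathbb{Q}$ of the rationals by non-negative
integers. Let $\mu^{\eta}_{f,K} = \sum_{i\in J} \mathrm{length}(I^{\eta}_{f,K}(i))$, where $J = \mathrm{Dom}(I^{\eta}_{f,K})$. Then $\mu^{\eta}_{f,K} \le K^{-1}(f(b) - f(a))$.


PROOF: For a given continuous non-decreasing function $f : [a,b] \to \mathbb{R}$, $K \in \mathbb{R}^+$, and $\eta : \omega \to \mathbb{Q}$, let
$a_i = \inf(I^{\eta}_{f,K}(i))$ and $b_i = \sup(I^{\eta}_{f,K}(i))$ for all $i \in J$. Then $b_i - a_i \le K^{-1}(f(b_i) - f(a_i))$ for all $i \in J$
by Theorem 45.5.4 (vi). Therefore $\mu^{\eta}_{f,K} = \sum_{i\in J}(b_i - a_i) \le K^{-1} \sum_{i\in J}(f(b_i) - f(a_i)) \le K^{-1}(f(b) - f(a))$
because $f$ is non-decreasing and the intervals $(a_i, b_i)$ are pairwise disjoint.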
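The bound in Theorem 45.5.8 can be checked numerically on a grid. The following Python sketch is an illustration only, under the assumption that the shadow set is sampled at equally spaced points and that witnesses $y$ are sought only among later grid points; the function name `shadow_measure_estimate` and the test function $f(x) = x^3$ on $[0,2]$ are hypothetical choices, not taken from the text.

```python
def shadow_measure_estimate(f, a, b, K, n):
    """Grid estimate of the total length of the upper right shadow set.

    A grid point x_j counts as shadowed when some later grid point x_k
    satisfies f(x_k) > f(x_j) + K * (x_k - x_j), which is the same as
    g(x_k) > g(x_j) for g(x) = f(x) - K * x.
    """
    h = (b - a) / n
    g = [f(a + j * h) - K * (a + j * h) for j in range(n + 1)]
    best = g[n]          # maximum of g over grid points to the right of x_j
    count = 0
    for j in range(n - 1, 0, -1):   # interior grid points only
        if best > g[j]:
            count += 1
        best = max(best, g[j])
    return count * h


f = lambda x: x ** 3
a, b, K = 0.0, 2.0, 6.0
estimate = shadow_measure_estimate(f, a, b, K, 2000)
bound = (f(b) - f(a)) / K
print(estimate, bound)
```

For this example the shadow set is the interval $(\sqrt{3} - 1, 2)$, so the estimate should be close to $3 - \sqrt{3}$, which lies below the bound $8/6$.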


45.5.9 REMARK: Upper right Dini derivative of non-decreasing function is finite almost everywhere.
It follows almost immediately from Theorem 45.5.8 that the upper right Dini derivative of a continuous
non-decreasing function is finite almost everywhere. This is shown in Theorem 45.7.4 (i).


45.6. Double shadow sets 


45.6.1 REMARK: Comparison of single and double shadow sets. 

Whereas the single shadow sets in Section 45.5 are introduced so as to demonstrate in Theorem 45.7.4 
that a continuous monotonic function has bounded Dini derivatives almost everywhere, the double shadow 
sets in Section 45.6 are presented in order to show that left and right Dini derivatives are equal almost
everywhere. The “calculus” of double shadow sets is one or two orders of magnitude more complicated than
the corresponding single shadow set “calculus”. This additional complexity seems to be unavoidable. 


45.6.2 REMARK: Double shadow set concept of operation. 
The concept of a double shadow set is illustrated in Figure 45.6.1. There are five points $x$ on the graph of the
function $f$ in this diagram where $D^-_l f(x) < c_1$ and $c_2 < D^+_r f(x)$. The double shadow procedure constructs
a sequence of open intervals which contain those five points.


Figure 45.6.1. Double shadow set to cover $x$ with $D^-_l f(x) < c_1$ and $c_2 < D^+_r f(x)$


The five component intervals in Figure 45.6.1 are obtained by first applying “torchlight” (from the lower left
with gradient $c_1$), and then applying “sunlight” (from the upper right with gradient $c_2$). By comparison,
the component intervals in Figure 45.5.1 in Remark 45.5.1 use sunlight only. The sum of the lengths of the
open intervals is guaranteed to not exceed $(c_1/c_2)(b - a)$. (The “torchlight phase” puts an upper limit on
the intermediate vertical displacement in terms of the initial horizontal displacement. The “sunlight phase”
puts an upper limit on the final horizontal displacement in terms of the intermediate vertical displacement.)


If this procedure is applied again, to each of the five open intervals in Figure 45.6.1, the sum of the lengths
of the new sequence of intervals will be bounded above by $(c_1/c_2)^2(b - a)$. Repeated application of the
procedure yields the upper bound $(c_1/c_2)^n(b - a)$ for $n$ steps, which clearly converges to zero if $0 \le c_1 < c_2$.
Therefore $\{x \in (a,b);\ D^-_l f(x) < c_1 \text{ and } c_2 < D^+_r f(x)\}$ is a set of explicit measure zero.


The repeated double shadow set procedure may be applied for any $c_1,c_2 \in \mathbb{R}$ with $0 \le c_1 < c_2$ to construct
a disjoint cover by open intervals of the set $\{x \in (a,b);\ D^-_l f(x) < c_1 \text{ and } c_2 < D^+_r f(x)\}$ such that the sum
of the interval lengths is smaller than any given $\varepsilon \in \mathbb{R}^+$. A countably infinite sequence of pairs $(\bar{c}_1(i), \bar{c}_2(i))$
may be chosen so that all pairs $(c_1, c_2)$ with $0 \le c_1 < c_2$ satisfy $c_1 < \bar{c}_1(i)$ and $\bar{c}_2(i) < c_2$ for some $i \in \omega$. (See
Theorem 45.6.12 for such a sequence.) For this sequence of pairs, one may construct an explicit sequence
of open intervals with total length less than $\varepsilon_i = \varepsilon_0 2^{-i-1}$ for all $i \in \omega$, for some $\varepsilon_0 \in \mathbb{R}^+$. Then these
explicit sequences may be combined into a single sequence as described in Remark 13.9.11. The result
is an explicit sequence of intervals with total length less than any given $\varepsilon_0 \in \mathbb{R}^+$. Consequently the set
$\{x \in (a,b);\ D^-_l f(x) < D^+_r f(x)\}$ is a set of explicit measure zero. This result can then be extended from
$(a,b)$ to all of $\mathbb{R}$. When this result is applied also to the function $x \mapsto -f(-x)$, the combination of these
results shows that $f$ is differentiable almost everywhere.


The double shadow sets in Definition 45.6.3 correspond to the single shadow sets in Definition 45.5.2. The
double shadow set is employed in each stage of an iteration like in Definition 45.5.6 so as to successively
refine a sequence of open covers of the set $\{x \in (a,b);\ D^-_l f(x) < c_1 \text{ and } c_2 < D^+_r f(x)\}$ in order to prove
that it has zero Lebesgue measure. Definition 45.6.3 is illustrated in Figure 45.6.1, where $\Omega^-_{l,f,c_1}$ is the set of
$x$-values which lie under the lightly shaded regions, and $\Omega^+_{r,f,c_1,c_2}$ is the set of $x$-values which lie under the
darkly shaded regions.


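45.6.3 DEFINITION: The (upper right) double shadow set for gradient pair $(c_1, c_2) \in \mathbb{R}^+_0 \times \mathbb{R}^+$ with $c_1 < c_2$,
for a function $f : [a,b] \to \mathbb{R}$ on a closed bounded real-number interval $[a,b]$ is the set $\Omega^+_{r,f,c_1,c_2} \subseteq \mathbb{R}$ given by

$\Omega^-_{l,f,c_1} = \{y \in (a,b);\ \exists z \in (a,y),\ f(z) > f(y) + c_1(z - y)\}$ (45.6.1)

$\forall x \in \Omega^-_{l,f,c_1},\ \Omega^-_{l,f,c_1}(x) = \{w \in \Omega^-_{l,f,c_1};\ x \le w \text{ and } [x,w] \subseteq \Omega^-_{l,f,c_1}\}$ (45.6.2)

$\Omega^+_{r,f,c_1,c_2} = \{x \in \Omega^-_{l,f,c_1};\ \exists y \in \Omega^-_{l,f,c_1}(x),\ f(y) > f(x) + c_2(y - x)\}$. (45.6.3)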


45.6.4 REMARK: Properties of lower left “torchlight” shadow sets. 
Theorem 45.6.5 is a kind of mirror image of Theorem 45.5.4. So its proof is essentially the same. 
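45.6.5 THEOREM: Some basic properties of “torchlight” shadow sets.
Let $f : [a,b] \to \mathbb{R}$ be a continuous function for some $a,b \in \mathbb{R}$ with $a < b$. Let $\Omega^-_{l,f,c_1}$ be as in Definition 45.6.3
for all $c_1 \in \mathbb{R}^+_0$.
(i) $\forall y \in (a,b),\ \forall c_1 \in \mathbb{R}^+_0,\ (D^-_l f(y) < c_1 \Rightarrow y \in \Omega^-_{l,f,c_1})$.
(ii) $\forall c_{1,1}, c_{1,2} \in \mathbb{R}^+_0,\ (c_{1,1} \le c_{1,2} \Rightarrow \Omega^-_{l,f,c_{1,1}} \subseteq \Omega^-_{l,f,c_{1,2}})$.
(iii) $\Omega^-_{l,f,0}$ is an open subset of $\mathbb{R}$.
(iv) $\Omega^-_{l,f,c_1}$ is an open subset of $\mathbb{R}$ for all $c_1 \in \mathbb{R}^+_0$.
(v) $\forall a',b' \in \mathrm{Bdy}(\Omega^-_{l,f,0}),\ ((a' < b' \text{ and } (a',b') \subseteq \Omega^-_{l,f,0}) \Rightarrow f(a') \ge f(b'))$.
(vi) $\forall c_1 \in \mathbb{R}^+_0,\ \forall a',b' \in \mathrm{Bdy}(\Omega^-_{l,f,c_1}),\ ((a' < b' \text{ and } (a',b') \subseteq \Omega^-_{l,f,c_1}) \Rightarrow f(b') - f(a') \le c_1(b' - a'))$.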

PROOF: For part (i), let $y \in (a,b)$ and $c_1 \in \mathbb{R}^+_0$. Suppose that $D^-_l f(y) < c_1$. Then by Definition 40.10.2
and Notation 40.10.4, $\liminf_{h\to 0^-} (f(y+h) - f(y))/h < c_1$. Thus $\sup_{h<0} \inf_{t\in(h,0)} (f(y+t) - f(y))/t < c_1$. So
$\forall h \in \mathbb{R}^-,\ \inf_{t\in(h,0)} (f(y+t) - f(y))/t < c_1$. So $\forall h \in \mathbb{R}^-,\ \exists t \in (h,0),\ (f(y+t) - f(y))/t < c_1$. Let $h = a - y$.
Then $(f(z) - f(y))/(z - y) < c_1$ with $z = y + t \in (a,y)$ for some $t \in (h,0)$. So $f(z) > f(y) + c_1(z - y)$.
Therefore $y \in \Omega^-_{l,f,c_1}$.

For part (ii), let $c_{1,1}, c_{1,2} \in \mathbb{R}^+_0$ with $c_{1,1} \le c_{1,2}$. Let $y \in \Omega^-_{l,f,c_{1,1}}$. Then $f(z) > f(y) + c_{1,1}(z - y)$ for some
$z \in (a,y)$. So $f(z) > f(y) + c_{1,2}(z - y)$ for such $z$ because $z < y$. So $y \in \Omega^-_{l,f,c_{1,2}}$. Hence $\Omega^-_{l,f,c_{1,1}} \subseteq \Omega^-_{l,f,c_{1,2}}$.

For part (iii), let $y \in \Omega^-_{l,f,0}$. Then $y \in (a,b)$ and $f(y) < f(z)$ for some $z \in (a,b)$ with $z < y$. Let
$\varepsilon = f(z) - f(y)$. Then $\varepsilon > 0$. So by the continuity of $f$, there exists $\delta \in \mathbb{R}^+$ such that $|f(y') - f(y)| < \varepsilon$ for
all $y' \in B(y,\delta) \cap [a,b]$. Let $\delta' = \min(\delta, (y - z)/2, b - y)$. Then $\delta' \in \mathbb{R}^+$ and $f(y') < f(y) + \varepsilon = f(z)$, and $y' \in (a,b)$
with $z < y'$ for all $y' \in B(y,\delta')$. So $y' \in \Omega^-_{l,f,0}$. Therefore $B(y,\delta') \subseteq \Omega^-_{l,f,0}$. Hence $\Omega^-_{l,f,0}$ is an open subset of $\mathbb{R}$.

For part (iv), define $\bar{f} : [a,b] \to \mathbb{R}$ by $\bar{f} : y \mapsto f(y) - c_1 y$, and $\bar{\Omega} = \{y \in (a,b);\ \exists z \in (a,y),\ \bar{f}(z) > \bar{f}(y)\}$.
Then $\bar{\Omega}$ is an open subset of $\mathbb{R}$ by part (iii) because $\bar{f}$ is continuous on $[a,b]$. But $\Omega^-_{l,f,c_1} = \bar{\Omega}$. Hence $\Omega^-_{l,f,c_1}$
is an open subset of $\mathbb{R}$.

For part (v), suppose that $a',b' \in \mathrm{Bdy}(\Omega^-_{l,f,0})$ with $a' < b'$ and $(a',b') \subseteq \Omega^-_{l,f,0}$. Then $a' \notin \Omega^-_{l,f,0}$ by
Theorem 31.9.10 (iii). Let $y \in (a',b')$. Let $y_1 = \inf\{y' \in [a',y];\ f(y') \ge f(y)\}$. Then $y_1 \in [a',y]$. Suppose
that $y_1 \ne a'$. Then $y_1 \in \Omega^-_{l,f,0}$. So $f(y_1) < f(z)$ for some $z \in (a,y_1)$ by the definition of $\Omega^-_{l,f,0}$. But
$f(y) > f(w)$ for all $w \in [a',y_1)$ by the definition of $y_1$, and $f(y) \le f(y_1)$ by the continuity of $f$ and the
definition of $y_1$. So $f(w) < f(y_1)$ for all $w \in [a',y_1)$. Therefore $f(w) \le f(y_1)$ for all $w \in [a',y_1]$ by the
continuity of $f$. So $z \in (a,y_1) \setminus [a',y_1] = (a,a')$. But $f(a') < f(y)$ by the assumption $y_1 \ne a'$ and the
definition of $y_1$. So $f(a') < f(y) \le f(y_1) < f(z)$. Therefore $f(a') < f(z)$, and so $a' \in \Omega^-_{l,f,0}$, which is a
contradiction. So $y_1 = a'$. Therefore $f(y) \le f(a')$. This holds for all $y \in (a',b')$. Hence $f(a') \ge f(b')$ by the
continuity of $f$.

For part (vi), define $\bar{f} : [a,b] \to \mathbb{R}$ by $\bar{f} : y \mapsto f(y) - c_1 y$ and $\bar{\Omega} = \{y \in (a,b);\ \exists z \in (a,y),\ \bar{f}(z) > \bar{f}(y)\}$.
Then $\bar{\Omega}$ satisfies part (v). So $\bar{f}(a') \ge \bar{f}(b')$ for all $a',b' \in \mathrm{Bdy}(\bar{\Omega})$ with $a' < b'$ and $(a',b') \subseteq \bar{\Omega}$. Hence
$f(b') - f(a') \le c_1(b' - a')$ for all $a',b' \in \mathrm{Bdy}(\bar{\Omega})$ with $a' < b'$ and $(a',b') \subseteq \bar{\Omega}$, where $\bar{\Omega} = \Omega^-_{l,f,c_1}$.


45.6.6 THEOREM: Some basic properties of double shadow sets. 


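Let $f : [a,b] \to \mathbb{R}$ be a continuous function for some $a,b \in \mathbb{R}$ with $a < b$. Let $\Omega^+_{r,f,c_1,c_2}$ denote the upper
right double shadow set for $f$ with gradient pair $(c_1, c_2)$ satisfying $0 \le c_1 < c_2$, and let $\Omega^-_{l,f,c_1}$ and $\Omega^-_{l,f,c_1}(x)$ be
as in Definition 45.6.3. Let $a',b' \in \mathrm{Bdy}(\Omega^-_{l,f,c_1})$ with $a' < b'$ and $(a',b') \subseteq \Omega^-_{l,f,c_1}$.

(i) $\Omega^+_{r,\bar{f},c_2} = (a',b') \cap \Omega^+_{r,f,c_1,c_2}$, where $\bar{f} = f|_{[a',b']}$. (See Definition 45.5.2 for $\Omega^+_{r,\bar{f},c_2}$.)
(ii) $\forall x \in (a',b'),\ (D^+_r f(x) > c_2 \Rightarrow x \in \Omega^+_{r,f,c_1,c_2})$.
(iii) $\Omega^+_{r,f,c_1,c_2}$ is an open subset of $\mathbb{R}$.
(iv) $\forall a'',b'' \in \mathrm{Bdy}(\Omega^+_{r,f,c_1,c_2}),\ ((a'' < b'' \text{ and } (a'',b'') \subseteq \Omega^+_{r,f,c_1,c_2}) \Rightarrow f(b'') - f(a'') \ge c_2(b'' - a''))$.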


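PROOF: For part (i), let $a',b' \in \mathrm{Bdy}(\Omega^-_{l,f,c_1})$ with $a' < b'$ and $(a',b') \subseteq \Omega^-_{l,f,c_1}$. Let $\bar{f} = f|_{[a',b']}$. Then by
Definition 45.5.2, $\Omega^+_{r,\bar{f},c_2} = \{x \in (a',b');\ \exists y \in (x,b'),\ f(y) > f(x) + c_2(y - x)\}$. By lines (45.6.2) and (45.6.3)
of Definition 45.6.3,

$\Omega^+_{r,f,c_1,c_2} = \{x \in \Omega^-_{l,f,c_1};\ \exists y \in \Omega^-_{l,f,c_1},\ (x \le y \text{ and } [x,y] \subseteq \Omega^-_{l,f,c_1} \text{ and } f(y) > f(x) + c_2(y - x))\}$.

Let $x \in \Omega^+_{r,\bar{f},c_2}$. Then $x \in \Omega^-_{l,f,c_1}$ because $x \in (a',b') \subseteq \Omega^-_{l,f,c_1}$. So $\exists y \in (x,b'),\ f(y) > f(x) + c_2(y - x)$.
But $(x,b') \subseteq (a',b') \subseteq \Omega^-_{l,f,c_1}$. So $\exists y \in \Omega^-_{l,f,c_1},\ (x \le y \text{ and } [x,y] \subseteq \Omega^-_{l,f,c_1} \text{ and } f(y) > f(x) + c_2(y - x))$.
Therefore $x \in \Omega^+_{r,f,c_1,c_2}$. Thus $\Omega^+_{r,\bar{f},c_2} \subseteq (a',b') \cap \Omega^+_{r,f,c_1,c_2}$.

Let $x \in (a',b') \cap \Omega^+_{r,f,c_1,c_2}$. Then $x \le y$ and $[x,y] \subseteq \Omega^-_{l,f,c_1}$ and $f(y) > f(x) + c_2(y - x)$ for some $y \in \Omega^-_{l,f,c_1}$.
Such $y$ must be in $[x,b')$ because $b' \in \mathrm{Bdy}(\Omega^-_{l,f,c_1})$ and $[x,y] \subseteq \Omega^-_{l,f,c_1}$ and $\Omega^-_{l,f,c_1} \in \mathrm{Top}(\mathbb{R})$, and $y \ne x$
because $f(y) > f(x) + c_2(y - x)$. So $y \in (x,b')$. Therefore $x \in \Omega^+_{r,\bar{f},c_2}$. Hence $\Omega^+_{r,\bar{f},c_2} = (a',b') \cap \Omega^+_{r,f,c_1,c_2}$.

For part (ii), let $x \in (a',b')$ with $D^+_r f(x) > c_2$. Then $D^+_r \bar{f}(x) > c_2$. So $x \in \Omega^+_{r,\bar{f},c_2}$ by Theorem 45.5.4 (i).
Hence $x \in \Omega^+_{r,f,c_1,c_2}$ by part (i).

For part (iii), $\Omega^-_{l,f,c_1} \in \mathrm{Top}(\mathbb{R})$ by Theorem 45.6.5 (iv). So $\Omega^-_{l,f,c_1}$ is the union of a countable set of disjoint
open real-number intervals by Theorem 32.7.8. Each of these component intervals has the form $(a',b')$ for
some $a',b' \in \mathrm{Bdy}(\Omega^-_{l,f,c_1})$ with $a' < b'$ and $(a',b') \subseteq \Omega^-_{l,f,c_1}$ since $\Omega^-_{l,f,c_1}$ is bounded. Then $\Omega^+_{r,\bar{f},c_2} \in \mathrm{Top}(\mathbb{R})$
by Theorem 45.5.4 (iv) with $\bar{f} = f|_{[a',b']}$ for each such pair $(a',b')$. So $(a',b') \cap \Omega^+_{r,f,c_1,c_2} \in \mathrm{Top}(\mathbb{R})$ for each
such pair $(a',b')$ by part (i). Therefore $\Omega^+_{r,f,c_1,c_2}$ equals the union of a countable set of open sets $(a',b') \cap \Omega^+_{r,f,c_1,c_2}$, with
one such open set for each component of $\Omega^-_{l,f,c_1}$. Hence $\Omega^+_{r,f,c_1,c_2}$ is an open subset of $\mathbb{R}$.

For part (iv), let $a'',b'' \in \mathrm{Bdy}(\Omega^+_{r,f,c_1,c_2})$ satisfy $a'' < b''$ and $(a'',b'') \subseteq \Omega^+_{r,f,c_1,c_2}$. Then $a'',b'' \in \mathrm{Bdy}(\Omega^+_{r,\bar{f},c_2})$
and $(a'',b'') \subseteq \Omega^+_{r,\bar{f},c_2}$ by part (i) with $\bar{f} = f|_{[a',b']}$, where $(a',b')$ is the component interval of $\Omega^-_{l,f,c_1}$ which
contains $(a'',b'')$. So $f(b'') - f(a'') \ge c_2(b'' - a'')$ by Theorem 45.5.4 (vi).
Hence $\forall a'',b'' \in \mathrm{Bdy}(\Omega^+_{r,f,c_1,c_2}),\ ((a'' < b'' \text{ and } (a'',b'') \subseteq \Omega^+_{r,f,c_1,c_2}) \Rightarrow f(b'') - f(a'') \ge c_2(b'' - a''))$.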


45.6.7 REMARK: Double shadow-set interval lists. 

The double shadow-set interval lists in Definition 45.6.8 are a straightforward adaptation of the single 
shadow-set interval lists in Definition 45.5.6. The main purpose of both of these definitions is to ensure that 
no axiom of infinite choice is used in their construction. Similarly, Theorem 45.6.9 is the double shadow-set 
adaptation of Theorem 45.5.8. 


45.6.8 DEFINITION: The upper right double shadow-set interval list for a continuous function $f : [a,b] \to \mathbb{R}$
with $a,b \in \mathbb{R}$ such that $a < b$, for a given rational number enumeration bijection $\eta : \omega \to \mathbb{Q}$, is a list
$I^{\eta}_{f,c_1,c_2} \in \mathrm{List}(\mathrm{Top}(\mathbb{R})) = \bigcup_{n\in\omega^+} \mathrm{Top}(\mathbb{R})^n$ for $c_1,c_2 \in \mathbb{R}$ with $c_1 < c_2$ constructed in the following stages.
(i) $\Omega^+_{r,f,c_1,c_2}$ is the upper right double shadow set as in Definition 45.6.3.
(ii) $\forall i \in \omega,\ \tilde{I}^{\eta}_{f,c_1,c_2}(i) = \{x \in (a,b);\ [[x, \eta(i)]] \subseteq \Omega^+_{r,f,c_1,c_2}\}$. (The component of $\Omega^+_{r,f,c_1,c_2}$
containing $\eta(i)$.)
(iii) $(I^{\eta}_{f,c_1,c_2}(i))_{i\in J}$ is the first-instance subsequence of the sequence $(\tilde{I}^{\eta}_{f,c_1,c_2}(i))_{i\in\omega}$, but omitting the
empty set. (See Definition 12.4.16 for first-instance subsequences.) That is, $I^{\eta}_{f,c_1,c_2} = \tilde{I}^{\eta}_{f,c_1,c_2} \circ \beta$,
where $\beta : J \to S = \{i \in \omega;\ \tilde{I}^{\eta}_{f,c_1,c_2}(i) \ne \emptyset \text{ and } \forall j < i,\ \tilde{I}^{\eta}_{f,c_1,c_2}(j) \ne \tilde{I}^{\eta}_{f,c_1,c_2}(i)\}$ is the standard
enumeration of $S$. (See Definition 12.4.5 for the standard enumeration of a set of non-negative integers.
Note that $J$ is determined by $S$. In fact, $J$ is in essence the cardinal number of $S$.)


45.6.9 THEOREM: Upper bound for the measure of a double shadow set. 
Let $f : [a,b] \to \mathbb{R}$ be continuous and non-decreasing for some $a,b \in \mathbb{R}$ with $a < b$. Let $\Omega^+_{r,f,c_1,c_2} \in \mathrm{Top}(\mathbb{R})$ be
the upper right double shadow set for $f$ with gradients $c_1, c_2 \in \mathbb{R}^+_0$ with $c_1 < c_2$. Let $I^{\eta}_{f,c_1,c_2} \in \mathrm{List}(\mathrm{Top}(\mathbb{R}))$
be the list of open intervals in $\Omega^+_{r,f,c_1,c_2}$, as constructed in Definition 45.6.8 for an enumeration $\eta : \omega \to \mathbb{Q}$ of
the rationals by non-negative integers. Let $\mu^{\eta}_{f,c_1,c_2} = \sum_{i\in J} \mathrm{length}(I^{\eta}_{f,c_1,c_2}(i))$, where $J = \mathrm{Dom}(I^{\eta}_{f,c_1,c_2})$.
Then $\mu^{\eta}_{f,c_1,c_2} \le (c_1/c_2)(b - a)$. (See Definition 32.7.10 for the measure of open sets of real numbers.)


PROOF: For a given continuous non-decreasing function $f : [a,b] \to \mathbb{R}$, $c_1,c_2 \in \mathbb{R}^+_0$ with $c_1 < c_2$, and
rational number enumeration $\eta : \omega \to \mathbb{Q}$, let $a^{\eta}_i = \inf(I^{\eta}_{f,c_1,c_2}(i))$ and $b^{\eta}_i = \sup(I^{\eta}_{f,c_1,c_2}(i))$ for all $i \in J$.
Then $I^{\eta}_{f,c_1,c_2}(i) = (a^{\eta}_i, b^{\eta}_i)$ for all $i \in J$ because $I^{\eta}_{f,c_1,c_2}(i)$ is an open component interval of $\Omega^+_{r,f,c_1,c_2}$. Each
component of $\Omega^+_{r,f,c_1,c_2}$ is contained in one and only one open component interval of $\Omega^-_{l,f,c_1}$.
Let $(a',b')$ be a component interval of $\Omega^-_{l,f,c_1}$. Then $f(b') - f(a') \le c_1(b' - a')$ by Theorem 45.6.5 (vi), and
by Theorem 45.6.6 (iv), $f(b^{\eta}_i) - f(a^{\eta}_i) \ge c_2(b^{\eta}_i - a^{\eta}_i)$ for all $i \in J$ such that $(a^{\eta}_i, b^{\eta}_i) \subseteq (a',b')$. Let $J'$ be the
set of $i \in J$ such that $(a^{\eta}_i, b^{\eta}_i) \subseteq (a',b')$. Then since $f$ is non-decreasing,

$\sum_{i\in J'} (b^{\eta}_i - a^{\eta}_i) \le c_2^{-1} \sum_{i\in J'} (f(b^{\eta}_i) - f(a^{\eta}_i))$
$\le c_2^{-1} (f(b') - f(a'))$
$\le (c_1/c_2)(b' - a')$.

Applying this to all components $(a',b')$ of $\Omega^-_{l,f,c_1}$ gives $\sum_{i\in J} (b^{\eta}_i - a^{\eta}_i) \le (c_1/c_2)(b - a)$.


45.6.10 REMARK: Bound on measure of double shadow set for a given gradient-pair. 

Theorem 45.6.9 implies that the measure of $\Omega^+_{r,f,c_1,c_2}$ does not exceed $(c_1/c_2)(b - a)$. This is independent of
f(b) — f(a), by contrast with Theorem 45.5.8, where the bound is proportional to f(b) — f(a). The reason 
for this is that the jumps in f are excluded by the “torchlight” procedure, as can be seen in Figure 45.6.1. 
The torchlight procedure excludes any jump which goes “faster” than cı, whereas the sunlight procedure 
excludes jumps which go “slower” than K. 


The construction in Theorem 45.6.9 may be iterated by applying it to each of the component intervals of
$\Omega^+_{r,f,c_1,c_2}$ to obtain the combined double shadow set $\Omega^+_{r,f,c_1,c_2,c_1,c_2}$ for all of the intervals of $\Omega^+_{r,f,c_1,c_2}$. Then
$\mu^{\eta}_{f,c_1,c_2,c_1,c_2} \le (c_1/c_2)^2 (b - a)$. One may define an infinite sequence $(G_k)_{k=0}^{\infty}$ of open sets by $G_0 = (a,b)$ and
$G_{k+1} = \Omega^+_{r,f,c_1,c_2}(G_k)$, meaning the $(c_1, c_2)$ double shadow set of $f$ on $G_k$. Then $\mu(G_k) \le (c_1/c_2)^k (b - a)$,
which clearly converges to zero. Then by Theorems 45.6.5 (i) and 45.6.6 (ii), this implies that the set of
points $x \in (a,b)$ where $D^-_l f(x) < c_1$ and $D^+_r f(x) > c_2$ has explicit measure zero.


45.6.11 REMARK: Coverage of all $(D^-_l f(x), D^+_r f(x))$ pairs satisfying $0 \le D^-_l f(x) < D^+_r f(x)$.
The gradual build-up of coverage of the set $\{(\lambda, \Lambda) \in \mathbb{R}^+_0 \times \mathbb{R}^+;\ \lambda < \Lambda\}$ by individual coverage regions
$\{(\lambda, \Lambda) \in \mathbb{R}^+_0 \times \mathbb{R}^+;\ \lambda < c_1(i) \text{ and } c_2(i) < \Lambda\}$ in Theorem 45.6.12 (vi) is illustrated in Figure 45.6.2.


Figure 45.6.2. Coverage of $(\lambda, \Lambda)$ space by individual coverage regions


For each pair in the sequence $(c_1(i), c_2(i))_{i\in\omega}$ of gradient pairs in Theorem 45.6.12, the measure-bounding
procedure in Theorem 45.6.9 can be “run” often enough to show that the measure (i.e. sum of component
interval lengths) of the set of points $x \in (a,b)$ where $D^-_l f(x) < c_1(i)$ and $D^+_r f(x) > c_2(i)$ is less than $2^{-i-1}\varepsilon_0$.


The “speed” of convergence is very slow for a gradient pair $(c_1(i), c_2(i))$ which is close to the line $\lambda = \Lambda$.
So the combined procedures for a large finite number of gradient pairs will, in practical calculations, yield 
an extraordinarily slow convergence. This is not surprising because the combined procedure is hunting for 
possibly very microscopic deviations from differentiability, and these could occur near any pair of gradient 
values. An intuitively appealing alternative might be to check every point of (a,b), but this would require 
an uncountable number of checks! The procedure in Theorem 45.6.12 may be described as a combination of
“refinement” and “extension”, which resembles the rational number enumerations in Section 15.2. 
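The slowness of convergence near the line $\lambda = \Lambda$ can be made concrete by counting how many iterations of the bound $(c_1/c_2)^n(b - a)$ are needed to fall below a given tolerance. The following Python sketch is only an illustration; the function name `iterations_needed` and the sample gradient pairs are hypothetical.

```python
from fractions import Fraction


def iterations_needed(c1, c2, length, eps):
    """Smallest n with (c1/c2)**n * length < eps, for 0 <= c1 < c2."""
    ratio = Fraction(c1) / Fraction(c2)
    bound = Fraction(length)
    n = 0
    while bound >= eps:
        bound *= ratio
        n += 1
    return n


# Gradient pairs closer to the diagonal need many more iterations.
for c1, c2 in [(1, 2), (9, 10), (99, 100)]:
    print(c1, c2, iterations_needed(c1, c2, 1, Fraction(1, 1000)))
```

Exact rational arithmetic is used so that the comparison with the tolerance is not affected by rounding.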


The suggestion of Riesz/Szőkefalvi-Nagy [125], pages 8-9, is to use all rational pairs $(c_1,c_2)$, which can
be explicitly enumerated. This is logically adequate, and would make the method of Theorem 45.6.12
unnecessary. But the “refinement and extension” method described here has the advantage of at least trying
to be efficient.


45.6.12 THEOREM: Properties of covering construction for left/right Dini derivative pairs. 
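Define the sequences $\rho : \omega \to \omega$, $\sigma : \omega \to \omega$, $c_1 : \omega \to \mathbb{R}$ and $c_2 : \omega \to \mathbb{R}$ as follows.
(1) $\forall i \in \omega,\ \rho(i) = \max\{k \in \omega;\ \sum_{j=0}^{k-1} 2^{2j} \le i\}$.
(2) $\forall i \in \omega,\ \sigma(i) = \sum_{j=0}^{\rho(i)-1} 2^{2j}$.
(3) $\forall i \in \omega,\ c_1(i) = 2^{-\rho(i)}(1 + i - \sigma(i))$.
(4) $\forall i \in \omega,\ c_2(i) = 2^{-\rho(i)}(2 + i - \sigma(i))$.
Then:
(i) $\rho$ and $\sigma$ are well defined and non-decreasing.
(ii) $c_1$ and $c_2$ are well defined functions.
(iii) $\forall i \in \omega,\ \sigma(i) \le i$.
(iv) $\forall i \in \omega,\ 0 < c_1(i) < c_2(i)$.
(v) $\forall \lambda, \Lambda \in \mathbb{R}^+_0,\ (\lambda < \Lambda \Rightarrow \exists i \in \omega,\ (\lambda < c_1(i) \text{ and } c_2(i) < \Lambda))$.
(vi) $\{(\lambda, \Lambda) \in \mathbb{R}^+_0 \times \mathbb{R}^+;\ \lambda < \Lambda\} = \bigcup_{i\in\omega} \{(\lambda, \Lambda) \in \mathbb{R}^+_0 \times \mathbb{R}^+;\ \lambda < c_1(i) \text{ and } c_2(i) < \Lambda\}$.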

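PROOF: For part (i), define $S : \omega \to \omega$ by $S(k) = \sum_{j=0}^{k-1} 2^{2j}$ for $k \in \omega$, and define the set-sequence $(X_i)_{i=0}^{\infty}$
by $X_i = \{k \in \omega;\ S(k) \le i\}$ for $i \in \omega$. Then $\rho(i) = \max\{k \in \omega;\ S(k) \le i\} = \max X_i$ for all $i \in \omega$, and
$X = (X_i)_{i=0}^{\infty}$ is non-decreasing because $S$ is increasing. (Note that $\rho$ is a left inverse of $S$ because $\rho(S(k)) = k$
for all $k \in \omega$.) Then $X_0 = \{0\}$ because $S(0) = 0$ and $S(1) = 1$. So $X_0$ is non-empty and bounded. Therefore
$X_i$ is non-empty and bounded for all $i \in \omega$. Hence $\rho(i)$ is well defined for all $i \in \omega$. The well-definition of $\sigma$
follows directly from this. Since $X$ is non-decreasing, $\rho$ is non-decreasing, and so $\sigma$ is non-decreasing.

Part (ii) follows from part (i).

For part (iii), the formula $\rho(i) = \max\{k \in \omega;\ \sum_{j=0}^{k-1} 2^{2j} \le i\}$ implies that $\rho(i) \in \{k \in \omega;\ \sum_{j=0}^{k-1} 2^{2j} \le i\}$.
Therefore $\sum_{j=0}^{\rho(i)-1} 2^{2j} \le i$. In other words, $\sigma(i) \le i$.

Part (iv) follows from part (iii).

For part (v), let $\lambda, \Lambda \in \mathbb{R}^+_0$ with $\lambda < \Lambda$. Let $\delta = \min(1, \Lambda - \lambda)$. Then $0 < \delta \le 1$. So $\delta^{-1} \ge 1$. Therefore
$\log_2(\delta^{-1}) \ge 0$. Let $p = \mathrm{ceiling}(\log_2(\max(\delta^{-1}, \lambda/2)))$. Then $p \in \mathbb{Z}^+_0 = \omega$ and $2^p \ge \delta^{-1} = \max(1, (\Lambda - \lambda)^{-1}) \ge (\Lambda - \lambda)^{-1}$.
So $\Lambda - \lambda \ge 2^{-p}$. Let $q = \mathrm{floor}(2^{p+2}\lambda)$. Then $q \le 2^{p+2}\lambda < q + 1$. So $\lambda < 2^{-p-2}(q + 1)$ and
$\Lambda \ge \lambda + 2^{-p} \ge 2^{-p-2}q + 2^{-p} = 2^{-p-2}(q + 4) > 2^{-p-2}(q + 2)$. Let $i = q + \sum_{j=0}^{p+1} 2^{2j}$. Then $\rho(i) = p + 2$ since
$0 \le q < 2^{2p+4}$ because $p \ge \log_2(\max(\delta^{-1}, \lambda/2))$ and so $2^p \ge \max(\delta^{-1}, \lambda/2) \ge \lambda/2$, which implies $\lambda \le 2^{p+1}$,
and so $q \le 2^{p+2}\lambda \le 2^{2p+3} < 2^{2p+4}$. So $\sigma(i) = \sum_{j=0}^{p+1} 2^{2j}$. Then $c_1(i) = 2^{-p-2}(1 + i - \sigma(i)) = 2^{-p-2}(q + 1)$
and $c_2(i) = 2^{-p-2}(2 + i - \sigma(i)) = 2^{-p-2}(q + 2)$. Hence $\lambda < c_1(i)$ and $c_2(i) < \Lambda$.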


Part (vi) follows from part (v). 
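The sequences of Theorem 45.6.12 and the index constructed in the proof of part (v) can be computed directly. The following Python sketch is an illustration under the assumption that the block sums are $S(k) = \sum_{j=0}^{k-1} 2^{2j}$, as in the proof of part (i); the function names, in particular `witness`, are hypothetical. Exact rational arithmetic is used, and the ceiling of the logarithm is replaced by an equivalent integer search.

```python
from fractions import Fraction
from math import floor


def S(k):
    """S(k) = sum of 2**(2*j) for j = 0, ..., k-1."""
    return sum(4 ** j for j in range(k))


def rho(i):
    """Largest k with S(k) <= i."""
    k = 0
    while S(k + 1) <= i:
        k += 1
    return k


def sigma(i):
    return S(rho(i))


def c1(i):
    return Fraction(1 + i - sigma(i), 2 ** rho(i))


def c2(i):
    return Fraction(2 + i - sigma(i), 2 ** rho(i))


def witness(lam, Lam):
    """Index i with lam < c1(i) and c2(i) < Lam, following part (v)."""
    lam, Lam = Fraction(lam), Fraction(Lam)
    delta = min(Fraction(1), Lam - lam)
    target = max(1 / delta, lam / 2)
    p = 0
    while 2 ** p < target:      # smallest p with 2**p >= target
        p += 1
    q = floor(2 ** (p + 2) * lam)
    return q + S(p + 2)


i = witness(Fraction(1, 3), Fraction(1, 2))
print(i, c1(i), c2(i))
```

The first block consists of the single pair $(c_1(0), c_2(0)) = (1, 2)$, and each later block halves the grid spacing while doubling the range covered.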


45.7. Lebesgue differentiation theorem 


45.7.1 REMARK: Motivation for presenting the Lebesgue differentiation theorem. 

The Lebesgue differentiation theorem yields a very general kind of integration along rectifiable curves in both 
Cartesian spaces and in differentiable manifolds. The functions which can be integrated along such curves 
include not only real-valued functions but also affine and general connections (contracted with the curve’s 
velocity field). It is the generalisation of the integration of connections to calculate parallel transport along 
rectifiable curves which is the principal motivation for presenting the Lebesgue differentiation theorem here. 
(See Remarks 67.3.3 and 71.4.5 for related comments.) 
For the Lebesgue differentiation theorem, see for example Riesz/Szőkefalvi-Nagy [125], pages 5-11; Bass [53],
pages 117-124; A.E. Taylor [145], pages 402-410; Shilov/Gurevich [136], pages 216-220; Simmons [137],
pages 224-229; Kolmogorov/Fomin [104], pages 321-323, 328-331; Bruckner/Bruckner/Thomson [56], pages
149-153, 267-271.


45.7.2 REMARK: Development of Lebesgue differentiation theorem. 
There are several stages in the development of a constructive proof of the Lebesgue differentiation theorem. 
These are outlined as follows, including some of the intended applications. 


(1) Section 12.4. Enumerations of sets by extended finite ordinal numbers. 
Construction of explicit bijections from some element of $\omega^+$ to any given subset of $\omega$.


(2) Section 15.2. Cardinality of the rational numbers. 
Enumerations of the rational numbers. Explicit construction of bijections from $\omega$ to $\mathbb{Q}$.


(3) Section 16.1. Real-number intervals. 
Definitions of intervals of the real number system. 


(4) Section 32.7. Component enumerations for open sets of real numbers. 
Explicit enumerations of the open-interval components of a given open set of real numbers. 


(5) Section 34.6. The connected component map. 
Map from points to connected components which contain them. Enumerations of connected components. 


(6) Section 34.8. Locally connected separable spaces. 
Theorem that open sets in a locally connected separable space have a countable set of components. 


(7) Section 38.9. Rectifiable curves and paths in metric spaces. 
Definitions for rectifiable curves in metric spaces. 


(8) Section 40.10. Dini derivatives. 
Definitions of upper and lower, left and right Dini derivatives. 


(9) Section 45.1. Sets of real numbers of measure zero. 
The standard definition of sets of measure zero. For example the rational numbers. 


(10) Section 45.2. Sets of real numbers of explicit measure zero. 
Definitions of explicit measure zero sets of real numbers. 


(11) Section 45.3. Explicit families of real-number sets of explicit measure zero. 
AC-free proof of closure of explicit measure zero sets under countable unions. 


(12) Sections 45.5 and 45.6. Shadow-set maps. 
Properties of shadow-set maps. 


(13) Section 45.7. Lebesgue differentiation theorem. 
AC-free proof that functions of bounded variation are differentiable almost everywhere. 


(14) Section 50.7. Lipschitz manifolds and rectifiable curves. 
Applications to rectifiable curves in Lipschitz manifolds. 


(15) Section 67.3. Reconstruction of parallel transport from its differential. 
Application to the integration of connections on general differentiable fibre bundles. 


(16) Section 71.4. Parallel transport by horizontal lift functions. 
Application to the integration of affine connections on tangent bundles. 


The motivation for the strong focus on the constructions underlying the Lebesgue differentiation theorem is 
to ensure that there is no need for any axiom of choice here. Saying that a construction merely “exists” is 
nowhere near as satisfying as a specific, concrete, explicit construction. (One might say that a construction
“in the hand” is worth more than twice as much as a choice-function “in the bush”.)


The motivation for formulating the integration of connections on fibre bundles in terms of rectifiable curves 
instead of a narrower class of curves is partly because it is always more satisfying to have a “sharp theorem” 
which cannot be improved upon, and partly because integration of connections is often required for curves 
which have at least finitely many discontinuity points. The ability to integrate along rectangular paths in 
the coordinate chart space is a minimum requirement. 
45.7.3 REMARK: Dini derivative estimates for monotonic real-valued functions of a real variable.

The set of points where a monotonic real-valued function of a real variable has an infinite upper right Dini 
derivative is a set of measure zero. This is shown in Theorem 45.7.4 (i). (See Section 40.10 for definitions 
and notations for upper and lower right and left Dini derivatives.) 


Since Theorem 45.7.4 is stated within Zermelo-Fraenkel set theory without any axiom of choice, the given 
function f is required to exist within the ZF framework. Then the sets of measure zero are explicit in terms 
of the given function f. To be more precise, the required families of covers which validate the zero measure 
of the sets are given in terms of set-theoretic formulas containing the function f, whereas non-explicit sets of 
measure zero are only known to have a suitable cover for each $\varepsilon \in \mathbb{R}^+$, with no way of proving that a single
family of covers exists in ZF set theory to validate the zero measure. 


The proof given here for the Lebesgue differentiation theorem is based on the observation that a real function
$f : \mathbb{R} \to \mathbb{R}$ is differentiable at a point $x \in \mathbb{R}$ if and only if all four Dini derivatives at $x$ are finite and
equal to a single common value. This requirement can be split into the following requirements.


(1) $D^+_r f(x) \ne +\infty$ and $D^-_l f(x) \ne -\infty$.


(2) $D^-_l f(x) \ge D^+_r f(x)$ and $D^+_l f(x) \le D^-_r f(x)$.

The strategy of Theorems 45.7.4 and 45.7.6 is as described by Riesz/Szőkefalvi-Nagy [125], pages 7-9, which
is similar to the approach of Kolmogorov/Fomin [104], pages 318-323, and A.E. Taylor [145], pages 403-407. 
The main difference in the proof given here is that each stage of the proof is written out in greater detail to 
ensure that no axiom of infinite choice is invoked. 


At each point where the function is not differentiable, the requirements must fail in one of the two ways described.
That is, either (a) one Dini derivative is infinite at the point or else (b) two Dini derivatives are not equal at the point.
In either case, this is shown to be possible for a monotonic function only on a set of measure zero. This
is shown by (a) putting a bound on the measure of the set of points where the derivative is larger than some
constant, and (b) putting a bound on the measure of the set of points where the difference between Dini
derivative pairs is greater than some constant. As the “constant” is varied, the measure tends to zero.


45.7.4 THEOREM: Infinite Dini-derivative sets of a non-decreasing function have explicit measure zero. 
Let $f : \mathbb{R} \to \mathbb{R}$ be a non-decreasing continuous function.


(i) $\{x \in \mathbb{R};\ D^+_r f(x) = \infty\}$ is a set of real numbers of explicit measure zero.


(ii) $\{x \in \mathbb{R};\ D^+_l f(x) = \infty\}$ is a set of real numbers of explicit measure zero.


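PROOF: For part (i), let $f : \mathbb{R} \to \mathbb{R}$ be a non-decreasing continuous function. Let $a,b \in \mathbb{R}$ with $a < b$.
Let $g : [a,b] \to \mathbb{R}$ be the restriction of $f$ to $[a,b]$. Let $S = \{x \in (a,b);\ D^+_r g(x) = \infty\}$. For $K \in \mathbb{R}^+$,
let $\Omega^+_{r,g,K} = \{x \in (a,b);\ \exists y \in (x,b),\ g(y) > g(x) + K(y - x)\}$. Then $S \subseteq \Omega^+_{r,g,K}$ for all $K \in \mathbb{R}^+$. But
$\Omega^+_{r,g,K}$ is the upper right shadow set for $g$ with gradient $K$ by Definition 45.5.6. So $\Omega^+_{r,g,K} \in \mathrm{Top}(\mathbb{R})$ for all
$K \in \mathbb{R}^+$ by Theorem 45.5.4 (iv). Therefore by Theorem 32.7.8, $\Omega^+_{r,g,K}$ has a component interval enumeration
$I^{\eta}_{g,K} \in \mathrm{List}(\mathrm{Top}(\mathbb{R}))$ for each $K \in \mathbb{R}^+$ in terms of some given rational-number enumeration $\eta : \omega \to \mathbb{Q}$.
Thus by Definition 32.7.2, $\Omega^+_{r,g,K}$ is equal to the disjoint union of the sequence of open real-number intervals
$I^{\eta}_{g,K} = (I^{\eta}_{g,K}(i))_{i\in J} \in \mathrm{Top}(\mathbb{R})^J$, where $J = \mathrm{Dom}(I^{\eta}_{g,K}) \in \omega^+$.


It follows from Theorem 45.5.8 that $\mu^{\eta}_{g,K} \le K^{-1}(f(b) - f(a))$ where $\mu^{\eta}_{g,K} = \sum_{i\in J} \mathrm{length}(I^{\eta}_{g,K}(i))$.
Let $\varepsilon \in \mathbb{R}^+$. Put $K = 2\varepsilon^{-1}(f(b) - f(a))$. (If $f(b) = f(a)$, then $g$ is constant and $S = \emptyset$.) Then $\mu^{\eta}_{g,K} \le \frac{1}{2}\varepsilon < \varepsilon$.
Therefore $S$ is a set of real numbers of explicit measure zero by Definition 45.2.2.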


To complete the proof of part (i), the result for $f$ restricted to bounded closed intervals must be extended
to the whole domain $\mathbb{R}$. Let $\varepsilon_0 \in \mathbb{R}^+$. Let $\zeta : \omega \to \mathbb{Z}$ be a bijection. (This is an enumeration of all
integers.) For all $i \in \omega$, let $g_i$ be the restriction $f|_{[\zeta(i),\zeta(i)+1]}$ of $f$ to $[\zeta(i),\zeta(i)+1]$. For each $i \in \omega$, let $I^{\eta}_{g_i,K_i}$
be the (unique) sequence of disjoint open intervals defined as above for the restricted function $g_i$ for the
gradient $K_i = 2^{i+2}\varepsilon_0^{-1}(f(\zeta(i) + 1) - f(\zeta(i)))$. Let $X = \{(i,j) \in \omega \times \omega;\ j \in \mathrm{Dom}(I^{\eta}_{g_i,K_i})\}$. Since $X$ is a
subset of $\omega \times \omega$, one may construct the (unique) bijection $\xi : \omega \to X$ which preserves order from $\omega$ to $X$, where
the order “<” on $X$ is defined so that $(i_1, j_1) < (i_2, j_2)$ whenever $(i_1 + j_1 < i_2 + j_2)$ or else $(i_1 + j_1 = i_2 + j_2)$
and $j_1 < j_2$. (The map $\xi$ is easily generated inductively by choosing the least remaining element $\xi(k)$ of $X$ for
each value of $k \in \omega$ as in Remark 13.9.11. Strictly speaking, the domain of $\xi$ is some element of $\omega^+$, but this
is too tedious, and obvious, to describe here.) A combined list $\bar{I}^{\eta} \in \mathrm{List}(\mathrm{Top}(\mathbb{R}))$ of all of the open intervals
in all of the lists $I^{\eta}_{g_i,K_i}$ may now be (uniquely) constructed as the map $\bar{I}^{\eta} : k \mapsto I^{\eta}_{g_{\xi(k)_0},K_{\xi(k)_0}}(\xi(k)_1)$ from
$\omega$ to $\mathrm{Top}(\mathbb{R})$, where the two components of each $\xi(k)$ are written as $\xi(k) = (\xi(k)_0, \xi(k)_1)$. (This list of open
intervals does not cover the integers embedded in $\mathbb{R}$. The easy but tedious resolution of this technicality is
omitted here for brevity.)


One obtains thus an explicit countable cover of $S = \{x \in \mathbb{R};\ D^+_r f(x) = \infty\}$ by non-empty bounded open
real-number intervals whose total length sums to less than any given $\varepsilon_0 \in \mathbb{R}^+$. Hence $S$ is a set of real
numbers of explicit measure zero.
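The order on $X$ which is used to merge the interval lists can be sketched in a few lines of Python. This is an illustration only, under the simplifying assumption that there are finitely many lists, each of finite length; the name `diagonal_order` and the argument `lengths`, whose entry `lengths[i]` stands for the number of intervals in the list with index $i$, are hypothetical.

```python
def diagonal_order(lengths):
    """Order X = {(i, j); j < lengths[i]} by (i + j, j), as in the proof.

    Pairs on an earlier anti-diagonal i + j come first; within one
    anti-diagonal, the pair with the smaller j comes first.
    """
    pairs = [(i, j) for i, n in enumerate(lengths) for j in range(n)]
    return sorted(pairs, key=lambda ij: (ij[0] + ij[1], ij[1]))


print(diagonal_order([2, 1, 3]))
```

Each pair $(i,j)$ is reached after finitely many steps even when infinitely many lists are present, which is why this order, unlike the plain lexicographic order, yields a single sequence indexed by $\omega$.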

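For part (ii), let $f : \mathbb{R} \to \mathbb{R}$ be non-decreasing and continuous, and define $\bar{f} : \mathbb{R} \to \mathbb{R}$ by $\bar{f} : x \mapsto -f(-x)$.
Then $\bar{f}$ is non-decreasing and continuous. So $\{x \in \mathbb{R};\ D^+_r \bar{f}(x) = \infty\}$ is a set of explicit measure zero by
part (i). But $\{x \in \mathbb{R};\ D^+_l f(x) = \infty\} = \{x \in \mathbb{R};\ D^+_r \bar{f}(-x) = \infty\}$. Hence $\{x \in \mathbb{R};\ D^+_l f(x) = \infty\}$ is a set of
explicit measure zero.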


45.7.5 REMARK: Left and right Dini derivatives are equal almost everywhere. 
Theorem 45.7.6 is significantly more difficult than Theorem 45.7.4. The proof is achieved by
Riesz/Szőkefalvi-Nagy [125], pages 8-9, with the tacit assistance of the axiom of countable choice because they rely upon the
assertion that the union of a countable set of sets of measure zero has measure zero. No axiom of choice is
in fact required because the relevant countable union of sets of measure zero can be written explicitly.


45.7.6 THEOREM: Unequal Dini-derivative sets of a non-decreasing function have explicit measure zero. 
Let $f : \mathbb{R} \to \mathbb{R}$ be a non-decreasing continuous function.


(i) $\{x \in \mathbb{R};\ D^-_l f(x) < D^+_r f(x)\}$ is a set of real numbers of explicit measure zero.
(ii) $\{x \in \mathbb{R};\ D^+_l f(x) > D^-_r f(x)\}$ is a set of real numbers of explicit measure zero.


PROOF: Part (i) follows from Theorems 45.6.9 and 45.6.12 in the same way that Theorem 45.7.4 follows
from Theorem 45.5.8.


Part (ii) may be proved similarly to part (i). 


45.7.7 REMARK: Lebesgue differentiation theorems. 
Although the Lebesgue differentiation theorem is valid for general functions of bounded variation, this is not 
shown here. The intended application here is to rectifiable curves, which are always continuous. 


The Lebesgue differentiation theorem for non-decreasing continuous functions follows from the various
constructions in Sections 45.5, 45.6 and 45.7 which provide upper bounds on the measure of sets of points
where the derivative does not exist. This is easily extended to functions with locally bounded variation by
expressing them locally as a difference between two non-decreasing functions. This can then be extended to
rectifiable curves in Cartesian spaces by noting that each component of such a curve has bounded variation.


45.7.8 THEOREM: Lebesgue differentiation theorem for monotonic continuous functions. 
Let $f : \mathbb{R} \to \mathbb{R}$ be a non-decreasing continuous function. Then $f$ is differentiable everywhere except on a set
of points of explicit measure zero. 


PROOF: The assertion follows from Theorems 45.7.4 and 45.7.6. 


45.7.9 THEOREM: Lebesgue differentiation theorem for locally BV functions. 
Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function with locally bounded variation. Then $f$ is differentiable everywhere
except on a set of points of explicit measure zero. 


PROOF: The assertion follows from Theorem 45.7.8 by expressing the function as the difference of two
non-decreasing continuous functions.


Chapter 46


VECTOR FIELD CALCULUS FOR CARTESIAN SPACES 


46.1 True vector fields
46.2 Differential forms
46.3 Naive vector fields
46.4 The Lie bracket for naive vector fields
46.5 Holonomy and the Lie bracket
46.6 Naive differential forms
46.7 The exterior derivative
46.8 The exterior derivative for vector field arguments
46.9 Rectangular Stokes theorem in two dimensions
46.10 Rectangular Stokes theorem in three dimensions


46.0.1 REMARK: Vector field calculus includes differential form calculus. 

In the same way that Chapter 61 includes the calculus of vector fields and differential forms on differentiable 
manifolds, Chapter 46 includes the calculus of vector fields and differential forms on Cartesian spaces. Thus 
Chapter 61 may be regarded as an extension of Chapter 46 from calculus in a single chart to calculus for 
multi-chart atlases. 


The topics for vector fields include particularly the Lie bracket and the holonomy (or nonholonomy) of 
the integral curves of pairs of vector fields. The topics for differential forms include especially the exterior 
derivative and the very closely related Stokes theorem. 


46.0.2 REMARK: Exterior calculus combines tensor algebra with differential and integral calculus. 
Differential forms and the exterior derivative combine the antisymmetric tensor algebra in Section 30.4 and 
Cartesian space tensor bundles in Section 30.6 with the differential calculus in Chapters 40, 41 and 42. The 
Stokes-style integration theorems also make use of various kinds of integration in Chapter 43. 


46.1. True vector fields 


46.1.1 REMARK: True vector fields versus naive vector fields. 

As discussed in Remark 46.3.1, the “true vector fields” in Section 46.1 are less convenient for practical 
purposes than the “naive vector fields” in Section 46.3. Nevertheless, the “true” versions are presented here 
within the framework of the fibre bundle formalism. The naive vector fields may be regarded as some kind 
of abbreviation of the true vector fields. Within a given context, it should be straightforward to “upgrade” 
naive vector fields to true vector fields whenever required. 


46.1.2 REMARK: Vector fields on Cartesian spaces. 

Vector fields are presented for differentiable manifolds in Section 57.1. Vector fields for Cartesian spaces are 
much simpler, although not very much different. For Cartesian spaces, a few definitions and notations are 
required for the benefit of exterior calculus. In particular, the Lie bracket of vector fields is of some interest. 


Notation 30.6.14 for sets of cross-sections of tensor bundles on Cartesian spaces can be applied to the
tangent bundle total space $T(\mathbb{R}^n)$ for Cartesian spaces $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ in Notation 26.14.3 to define the
sets $X(T(\mathbb{R}^n))$ and $X(T(\mathbb{R}^n)\,|\,U)$ for $U \in \mathbb{P}(\mathbb{R}^n)$, which are respectively sets of global and local
cross-sections of the basic tangent bundle of $\mathbb{R}^n$. In other words, they are sets of vector fields. For the purposes
of exterior calculus, the differentiability of such vector fields must be defined. Definitions and notations for
vector fields follow very much the same pattern as for differential forms in Section 46.2.


46.1.3 DEFINITION: A (global) vector field on a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ is an element of the set
$X(T(\mathbb{R}^n))$ of cross-sections of the bundle $T(\mathbb{R}^n)$ of tangent vectors on $\mathbb{R}^n$.


A (local) vector field on a subset $U$ of a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ is an element of the set $X(T(\mathbb{R}^n)\,|\,U)$
of local cross-sections with domain $U$ of the bundle of tangent vectors on $U$.


46.1.4 REMARK:  Differentiability definitions for vector fields. 

The differentiability classes $C^k$ for $k \in \bar{\mathbb{Z}}_0^+$ in Definition 46.1.5 assume that the domain is an open set. This
avoids the difficult issues regarding differentiability at boundary points. When k = 0, such issues admittedly 
do not arise, but it is straightforward to define continuous vector fields in terms of the relative topology of 
the domain if non-open domains are required. 


Although the formal definition of differentiability of vector fields applies a “velocity chart” to convert tangent
vectors to real-number component tuples, it is usual to identify the tangent sets $T_p(\mathbb{R}^n)$ with $\mathbb{R}^n$ to make
differentiation simpler to write. (See Definition 26.14.7 for the velocity chart $\beta : T(\mathbb{R}^n) \to \mathbb{R}^n$ which is
mentioned in Definition 46.1.5.)


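46.1.5 DEFINITION: Differentiability of vector fields.

A (global) $C^k$ vector field on a Cartesian space $\mathbb{R}^n$, with $n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, is a vector field $X \in X(T(\mathbb{R}^n))$
for which $\beta \circ X \in C^k(\mathbb{R}^n, \mathbb{R}^n)$, where $\beta : T(\mathbb{R}^n) \to \mathbb{R}^n$ is the velocity chart for $T(\mathbb{R}^n)$.

A (local) $C^k$ vector field on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, with $n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, is a function $X \in X(T(\mathbb{R}^n)\,|\,U)$
for which $\beta \circ X \in C^k(U, \mathbb{R}^n)$, where $\beta : T(\mathbb{R}^n) \to \mathbb{R}^n$ is the velocity chart for $T(\mathbb{R}^n)$.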


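46.1.6 NOTATION: Sets of differentiable vector fields.
$X^k(T(\mathbb{R}^n))$, for $k \in \bar{\mathbb{Z}}_0^+$ and $n \in \mathbb{Z}_0^+$, denotes the set of global $C^k$ vector fields on a Cartesian space $\mathbb{R}^n$. In
other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall n \in \mathbb{Z}_0^+,\quad
X^k(T(\mathbb{R}^n)) = \{ X \in X(T(\mathbb{R}^n));\ \beta \circ X \in C^k(\mathbb{R}^n, \mathbb{R}^n) \},
\]

where $\beta : T(\mathbb{R}^n) \to \mathbb{R}^n$ is the standard velocity chart for $T(\mathbb{R}^n)$.


$X^k(T(\mathbb{R}^n)\,|\,U)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, $k \in \bar{\mathbb{Z}}_0^+$ and $n \in \mathbb{Z}_0^+$, denotes the set of local $C^k$ vector fields on $U$. In
other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\quad
X^k(T(\mathbb{R}^n)\,|\,U) = \{ X \in X(T(\mathbb{R}^n)\,|\,U);\ \beta \circ X \in C^k(U, \mathbb{R}^n) \}.
\]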


46.1.7 REMARK: The Lie bracket and directional derivatives of vector fields on Cartesian spaces.

The Lie bracket for vector fields on differentiable manifolds is introduced in Section 61.5. This is based on the
naive directional derivatives of vector fields on differentiable manifolds in Sections 61.2 and 61.4. These
concepts are essentially equivalent to the corresponding concepts for Cartesian spaces. It is convenient to define
directional derivatives and the Lie bracket here for Cartesian spaces because they arise in Theorem 46.8.8
and Definition 46.8.10 when extending the exterior derivative from constant vectors to general vector fields.


The directional derivative of a vector field on a $C^2$ manifold $M$ in Definition 61.2.3 has an output in the
double-tangent bundle $T(T(M))$. Such abstraction is not applied to Definition 46.1.8, where the output is a
simple real-number $n$-tuple.


46.1.8 DEFINITION: Directional derivative of a vector field on a Cartesian space. 

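The directional derivative of a vector field $Y \in X^1(T(\mathbb{R}^n)\,|\,U)$, where $n \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$, with
respect to a vector $V \in T_p(\mathbb{R}^n)$ for some $p \in U$, is the real $n$-tuple $\sum_{i=1}^n \beta(V)^i\,\partial_{x^i}\beta(Y(x))\big|_{x=p} \in \mathbb{R}^n$, where
$\beta : T(\mathbb{R}^n) \to \mathbb{R}^n$ is the velocity chart for $T(\mathbb{R}^n)$.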


Alternative name: The action of a vector V on a vector field Y . 


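46.1.9 NOTATION: $\partial_V Y$, for a vector $V \in T_p(\mathbb{R}^n)$ and a vector field $Y \in X^1(T(\mathbb{R}^n)\,|\,U)$ with $n \in \mathbb{Z}_0^+$,
$U \in \mathrm{Top}(\mathbb{R}^n)$ and $p \in U$, denotes the directional derivative of $Y$ with respect to $V$. In other words,

\[
\forall n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\ \forall Y \in X^1(T(\mathbb{R}^n)\,|\,U),\ \forall p \in U,\ \forall V \in T_p(\mathbb{R}^n),\quad
\partial_V Y = \sum_{i=1}^n \beta(V)^i\,\partial_{x^i}\beta(Y(x))\big|_{x=p},
\]

where $\beta : T(\mathbb{R}^n) \to \mathbb{R}^n$ is the velocity chart for $T(\mathbb{R}^n)$.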


46.1.10 REMARK:  Non-vectoriality of the directional derivative of a vector field. 

The partial derivative $\partial_V Y$ in Notation 46.1.9 is a well-defined real-number $n$-tuple, but this tuple does not
transform like the component tuple of a vector under $C^2$ local diffeomorphisms of the point-space $\mathbb{R}^n$. This
is because the second derivatives of diffeomorphisms enter into the transformation rules when $V$ and $Y$ are
both transformed as component tuples of vectors in $\mathbb{R}^n$. (See Theorem 46.4.5 for details.)


The non-vectorial second-order terms for transformations of a directional derivative $\partial_V Y$ of a vector field $Y$
can be cancelled by subtracting some other construction which always suffers from the same second-order
terms. It happens that if $V \in T_p(\mathbb{R}^n)$ is extended to a vector field $X \in X^1(T(\mathbb{R}^n)\,|\,U)$, then $\partial_{Y(p)} X$
has the same second-order dependence on local diffeomorphisms as $\partial_V Y$. So subtracting $\partial_{Y(p)} X$ from
$\partial_V Y = \partial_{X(p)} Y$ yields a vectorial $n$-tuple of real numbers. In other words, it yields the same vector if the
same construction procedure is applied in different coordinate systems. Since this holds for all $p \in U$, it
seems reasonable to construct the difference $\partial_{X(p)} Y - \partial_{Y(p)} X$ for all $p \in U$, and then attempt to verify that
the map $p \mapsto \partial_{X(p)} Y - \partial_{Y(p)} X$ behaves like the velocity components of a vector field when subjected to
diffeomorphisms.


46.1.11 DEFINITION: Lie bracket or commutator of actions of vector fields on a Cartesian space.
The Lie bracket of vector fields $X, Y \in X^1(T(\mathbb{R}^n)\,|\,U)$, where $n \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$, is the function
$p \mapsto \partial_{X(p)} Y - \partial_{Y(p)} X$ from $U$ to $\mathbb{R}^n$.


Alternative name: The commutator of actions of vector fields. 


46.1.12 REMARK: Proof of vectoriality of the commutator of actions of vector fields. 

The basic calculus for showing that the commutator in Definition 46.1.11 yields a real-number $n$-tuple
which transforms like a vector under $C^2$ diffeomorphisms of the point space is not too difficult. However,
the use of coordinate charts for a more or less abstract tangent bundle necessitates conversions back and 
forth between abstract vectors and concrete real-tuple coordinates. (See for example Theorem 61.2.11 for 
some computations.) Therefore it is preferable to show the basic properties of vector field commutators using 
naive vector fields. (See for example Theorem 46.4.5.) Otherwise the practical inconvenience of computations 
would be almost as great for Cartesian spaces as for general differentiable manifolds. 


46.2. Differential forms 


46.2.1 REMARK: Real-valued differential forms. 

Real-valued differential forms are defined as antisymmetric covariant tensor fields. (See Notation 30.6.12 for
the bundle $\Lambda_m(T(\mathbb{R}^n)) = \bigcup_{p \in \mathbb{R}^n} \Lambda_m(T_p(\mathbb{R}^n)) = \bigcup_{p \in \mathbb{R}^n} \mathscr{L}^-_m(T_p(\mathbb{R}^n))$ of real-valued antisymmetric covariant
tensors of degree $m$ on $\mathbb{R}^n$. See Notation 30.6.14 for the set of cross-sections $X(\Lambda_m(T(\mathbb{R}^n)))$.) Differential
forms may be thought of as functions of infinitesimal volume, area or length elements.


Definition 46.2.2 implies that a differential form of degree $m$ on a Cartesian space $\mathbb{R}^n$, for $n, m \in \mathbb{Z}_0^+$, is
a function $\omega : \mathbb{R}^n \to \bigcup_{p \in \mathbb{R}^n} \Lambda_m(T_p(\mathbb{R}^n))$ such that $\omega(p) \in \Lambda_m(T_p(\mathbb{R}^n))$ for all $p \in \mathbb{R}^n$. However, it is
often convenient to employ “short-cuts” for differential forms so that the first map of the double map is
skipped. In other words, one may regard a differential form as a map $\bar\omega$ from $T^m(\mathbb{R}^n) = \bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)^m$
to $\mathbb{R}$ defined by $\bar\omega(V) = \omega(\pi^m(V))(V)$ for all $V \in T^m(\mathbb{R}^n)$, where $\pi^m : T^m(\mathbb{R}^n) \to \mathbb{R}^n$ is the projection
map for $T^m(\mathbb{R}^n)$. Such considerations are more conveniently discussed in the context of differential forms
on differentiable manifolds. (See for example Remarks 57.7.1 and 58.11.6. See also Section 21.4 for general
short-cuts of cross-sections of fibrations.)


46.2.2 DEFINITION: Real-valued differential forms on Cartesian spaces.
A (global) differential form of degree $m$ on a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$, is an element of the set
$X(\Lambda_m(T(\mathbb{R}^n)))$ of cross-sections of the bundle of antisymmetric covariant tensors of degree $m$ on $\mathbb{R}^n$.


A (local) differential form of degree $m$ on a subset $U$ of a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$, is an element
of the set $X(\Lambda_m(T(\mathbb{R}^n))\,|\,U)$ of local cross-sections with domain $U$ of the bundle of antisymmetric covariant
tensors of degree $m$ on $U$.


46.2.3 REMARK:  Vector-valued differential forms. 

Definition 46.2.4 extends Definition 46.2.2 from real-valued to general vector-valued differential forms. In 
the case of differentiable classes of differential forms, the linear space W must have a suitable differentiable 
structure. This is easily provided in a standard way if W is a finite-dimensional real linear space. The 
standard differentiable structure of a real Banach space is also suitable. A typical choice for W in differential 
geometry applications is the Lie algebra of the structure group of a differentiable fibre bundle. Such Lie 
algebras are finite-dimensional real linear spaces. 


46.2.4 DEFINITION:  Vector-valued differential forms on Cartesian spaces. 

A (global) differential form of degree $m$, valued in a real linear space $W$, on a Cartesian space $\mathbb{R}^n$, for $n, m \in \mathbb{Z}_0^+$,
is an element of the set $X(\Lambda_m(T(\mathbb{R}^n), W))$ of cross-sections of the bundle of antisymmetric covariant
tensors of degree $m$ on $\mathbb{R}^n$, valued in $W$.


A (local) differential form of degree $m$, valued in a real linear space $W$, on a subset $U \in \mathrm{Top}(\mathbb{R}^n)$ of a
Cartesian space $\mathbb{R}^n$, for $n, m \in \mathbb{Z}_0^+$, is an element of the set $X(\Lambda_m(T(\mathbb{R}^n), W)\,|\,U)$ of local cross-sections of
the bundle of antisymmetric covariant tensors of degree $m$ on $U$, valued in $W$.


46.2.5 REMARK:  Differentiability of differential forms. 

In the context of differentiable manifolds, the differentiability of differential forms, which are cross-sections 
of particular kinds of tensor bundles, is defined via manifold charts as in Theorem 57.5.7. For differential 
forms on Cartesian spaces, it is convenient to define differentiability in terms of Cartesian coordinates in the 
classical fashion. 


For local differential forms in Definitions 46.2.6 and 46.2.8, it is assumed that the domains of cross-sections
are open subsets of $\mathbb{R}^n$, which makes the standard $C^k$ differentiability of functions well defined. (This
constraint can be relaxed for $C^0$ differentiability.)


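46.2.6 DEFINITION: Differentiability classes for real-valued differential forms.
A (global) $C^k$ differential form of degree $m$ on a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, is a
function $\omega \in X(\Lambda_m(T(\mathbb{R}^n)))$ for which the map $x \mapsto \omega(x)(V)$ is of class $C^k$ from $\mathbb{R}^n$ to $\mathbb{R}$ for all $V \in (\mathbb{R}^n)^m$.


A (local) $C^k$ differential form of degree $m$ on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, is a function
$\omega \in X(\Lambda_m(T(\mathbb{R}^n))\,|\,U)$ for which the map $x \mapsto \omega(x)(V)$ is of class $C^k$ from $U$ to $\mathbb{R}$ for all $V \in (\mathbb{R}^n)^m$.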


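46.2.7 NOTATION: Sets of differentiable real-valued differential forms.
$X^k(\Lambda_m(T(\mathbb{R}^n)))$, for $k \in \bar{\mathbb{Z}}_0^+$ and $m, n \in \mathbb{Z}_0^+$, denotes the set of global $C^k$ differential forms with degree $m$
on a Cartesian space $\mathbb{R}^n$. In other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall m, n \in \mathbb{Z}_0^+,\quad
X^k(\Lambda_m(T(\mathbb{R}^n))) = \{ \omega \in X(\Lambda_m(T(\mathbb{R}^n)));\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(\mathbb{R}^n, \mathbb{R}) \}.
\]

$X^k(\Lambda_m(T(\mathbb{R}^n))\,|\,U)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, $k \in \bar{\mathbb{Z}}_0^+$ and $m, n \in \mathbb{Z}_0^+$, denotes the set of local $C^k$ differential forms
with degree $m$ on $U$. In other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall m, n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\quad
X^k(\Lambda_m(T(\mathbb{R}^n))\,|\,U) = \{ \omega \in X(\Lambda_m(T(\mathbb{R}^n))\,|\,U);\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(U, \mathbb{R}) \}.
\]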

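46.2.8 DEFINITION: Differentiability classes for vector-valued differential forms.
A (global) $C^k$ differential form of degree $m$ on a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, valued in a
finite-dimensional real linear space $W$, is a function $\omega \in X(\Lambda_m(T(\mathbb{R}^n), W))$ for which the map $x \mapsto \omega(x)(V)$
is of class $C^k$ from $\mathbb{R}^n$ to $W$ for all $V \in (\mathbb{R}^n)^m$.


A (local) $C^k$ differential form of degree $m$ on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, valued
in a finite-dimensional real linear space $W$, is a function $\omega \in X(\Lambda_m(T(\mathbb{R}^n), W)\,|\,U)$ for which the map
$x \mapsto \omega(x)(V)$ is of class $C^k$ from $U$ to $W$ for all $V \in (\mathbb{R}^n)^m$.


46.2.9 NOTATION: Sets of differentiable vector-valued differential forms.
$X^k(\Lambda_m(T(\mathbb{R}^n), W))$, for $k \in \bar{\mathbb{Z}}_0^+$ and $m, n \in \mathbb{Z}_0^+$, and a finite-dimensional real linear space $W$, denotes the
set of $C^k$ global $W$-valued differential forms with degree $m$ on a Cartesian space $\mathbb{R}^n$. In other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall m, n \in \mathbb{Z}_0^+,\quad
X^k(\Lambda_m(T(\mathbb{R}^n), W)) = \{ \omega \in X(\Lambda_m(T(\mathbb{R}^n), W));\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(\mathbb{R}^n, W) \}.
\]

$X^k(\Lambda_m(T(\mathbb{R}^n), W)\,|\,U)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, $k \in \bar{\mathbb{Z}}_0^+$ and $m, n \in \mathbb{Z}_0^+$, and a finite-dimensional real linear
space $W$, denotes the set of $C^k$ local $W$-valued differential forms with degree $m$ on $U$. In other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall m, n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\quad
X^k(\Lambda_m(T(\mathbb{R}^n), W)\,|\,U) = \{ \omega \in X(\Lambda_m(T(\mathbb{R}^n), W)\,|\,U);\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(U, W) \}.
\]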


46.3. Naive vector fields 


46.3.1 REMARK: Convenience versus philosophical correctness. 

Although the tangent bundle $T(\mathbb{R}^n)$ for the Cartesian space $\mathbb{R}^n$ in Definition 26.14.2 is in some sense
“philosophically correct”, it suffers from the inconvenience of needing to frequently map backwards and 
forwards between abstract and concrete structures using charts of various kinds. The purpose of charts for 
manifolds and fibre bundles is to map an abstract structure to a concrete structure such as a Cartesian space. 
Therefore it is counterproductive to introduce abstract structures for tangent bundles on a Cartesian space, 
since this is the environment where real computations (with real numbers) are supposed to be carried out. 


When defining the tangent bundle itself, there must be some way of distinguishing the elements of $T_{p_1}(\mathbb{R}^n)$
from $T_{p_2}(\mathbb{R}^n)$ for $p_1, p_2 \in \mathbb{R}^n$ with $p_1 \ne p_2$. This justifies using point/velocity pairs $(p, v)$ for tangent
velocity vectors in Definition 26.16.2, or the corresponding affine maps $L_{p,v} : t \mapsto p + tv$ in Definition 26.13.1.
However, in the case of vector fields and differential forms, any function on the point-space $\mathbb{R}^n$ automatically
“tags” the velocity value at that point. So it is then superfluous to provide “point-tags” for all elements of
the tangent bundle.


For a Cartesian space $\mathbb{R}^n$ and a finite-dimensional real linear space $W$, Notations 30.4.2 and 30.4.3 define the
linear spaces $\Lambda_m(\mathbb{R}^n; W) = \mathscr{L}^-_m(\mathbb{R}^n; W)$ and $\Lambda_m\mathbb{R}^n = \mathscr{L}^-_m(\mathbb{R}^n; \mathbb{R})$ which contain respectively vector-valued
and real-valued antisymmetric $m$-linear forms on $\mathbb{R}^n$. A function from $\mathbb{R}^n$ to one of these spaces may then
be regarded as a differential form on $\mathbb{R}^n$. Similarly, a function from $\mathbb{R}^n$ to $\mathbb{R}^n$ may be regarded as a vector
field on $\mathbb{R}^n$.


Naive vector fields and naive differential forms may be obtained from the philosophically more correct 
Definitions 46.1.3, 46.2.2 and 46.2.4 by applying the coordinate chart $\beta : T(\mathbb{R}^n) \to \mathbb{R}^n$ in Definition 26.14.7.
This chart is a bijection when restricted to a single tangent space $T_p(\mathbb{R}^n)$, but otherwise is not injective.
This is why the naive structures require special efforts to retain the base-point information. Using the naive
structures could be thought of as “working in the charts”, as opposed to working with real tangent vectors.
This is in essence the same as the classical tensor calculus approach, using numbers rather than structures. 


The naive approach and tangent bundle approach to vector fields and differential forms are effectively 
interchangeable. So it is possible to work with the naive structures for convenience, particularly for calculus, 
and then map the results back to the better-defined tangent bundles. 


46.3.2 REMARK: Naive vector fields. 

The naive vector fields in Definitions 46.3.3 and 46.3.4, and Notation 46.3.5 are represented simply as
functions between a Cartesian space $\mathbb{R}^n$ and itself. As mentioned in Remark 8.8.7 and elsewhere, it often
happens that different classes of objects are represented by the same ZF set construction. This is not
ambiguous if the object class is indicated in the context. In the naive vector field definitions, the domain
$\mathbb{R}^n$ of the function is thought of as a base space $B$, while the direct product $\mathbb{R}^n \times \mathbb{R}^n$ is thought of as the
total space $E$. Then a map $X : \mathbb{R}^n \to \mathbb{R}^n$ is thought of as the map $x \mapsto (x, X(x))$ from $B$ to $E$.


The use of cross-section notations $X^k(\mathbb{R}^n)$ and $X^k(\mathbb{R}^n\,|\,U)$ gives the hint that these are vector fields, not
just maps between Cartesian spaces. Note that the notation for the naive vector field space $X^k(\mathbb{R}^n)$ is distinct
from the genuine vector field space $X^k(T(\mathbb{R}^n))$ in Definition 46.1.5.


46.3.3 DEFINITION: Naive vector fields on Cartesian spaces.
A (global) naive vector field on a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ is a map from $\mathbb{R}^n$ to $\mathbb{R}^n$.


A (local) naive vector field on a subset $U$ of a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ is a map from $U$ to $\mathbb{R}^n$.


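46.3.4 DEFINITION: Differentiability of naive vector fields.

A (global) $C^k$ naive vector field on a Cartesian space $\mathbb{R}^n$, with $n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, is a function in
$C^k(\mathbb{R}^n, \mathbb{R}^n)$.

A (local) $C^k$ naive vector field on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, where $n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, is a function in $C^k(U, \mathbb{R}^n)$.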


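46.3.5 NOTATION: Sets of differentiable naive vector fields.
$X^k(\mathbb{R}^n)$, for $k \in \bar{\mathbb{Z}}_0^+$ and $n \in \mathbb{Z}_0^+$, denotes the set of global $C^k$ naive vector fields on a Cartesian space $\mathbb{R}^n$.
In other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall n \in \mathbb{Z}_0^+,\quad X^k(\mathbb{R}^n) = C^k(\mathbb{R}^n, \mathbb{R}^n).
\]

$X^k(\mathbb{R}^n\,|\,U)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, $k \in \bar{\mathbb{Z}}_0^+$ and $n \in \mathbb{Z}_0^+$, denotes the set of local $C^k$ naive vector fields on $U$. In
other words,

\[
\forall k \in \bar{\mathbb{Z}}_0^+,\ \forall n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\quad X^k(\mathbb{R}^n\,|\,U) = C^k(U, \mathbb{R}^n).
\]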


46.3.6 REMARK: Vectorial transformations of naive vector fields. 

Theorem 46.3.8 does not prove that naive vector fields are vectorial. It proves that if a $C^k$ naive vector field
is transformed according to the vectorial transformation rule, then the resulting function is also a $C^k$ naive
vector field, which in this case means only that it is a $C^k$ function with the correct domain and target set.


In the naive vector field framework, vectoriality is defined with reference to transformation properties with 
respect to point-set diffeomorphisms. In the tangent bundle framework, these diffeomorphisms are replaced 
by the transition maps between coordinate charts. All objects which reside in tangent bundles, and in any 
kind of tensor bundle, are automatically transformed according to specified rules. For naive vector fields, 
vectoriality must be applied “by hand” in each application context.


An important difference between the formulation of vectoriality in Theorem 46.3.8 and the vectoriality of
tangent vectors on manifolds, for example in Theorem 54.1.11, is that the transformation $\phi : \mathbb{R}^n \to \mathbb{R}^n$
for naive vector fields is a point transformation, whereas the transformations $\phi = \psi_2 \circ \psi_1^{-1}$ between subsets of
$\mathbb{R}^n$ for tangent vectors on manifolds are coordinate chart transition maps, which are therefore coordinate
transformations. In the latter case, abstract points are regarded as static while the coordinates change.


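46.3.7 DEFINITION: The Jacobian (matrix) (function) for a $C^1$ diffeomorphism $\phi : U \to \tilde{U}$ between sets
$U, \tilde{U} \in \mathrm{Top}(\mathbb{R}^n)$ is the function $J_\phi : U \to M_n(\mathbb{R})$ which is defined by

\[
\forall x \in \mathrm{Dom}(\phi),\quad J_\phi(x) = \bigl[J_\phi(x)^i{}_j\bigr]_{i,j=1}^n = \bigl[\partial_{x^j}\phi^i(x)\bigr]_{i,j=1}^n.
\]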
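
As a hedged illustration of Definition 46.3.7, the following sketch computes the Jacobian of one particular diffeomorphism and applies it to a vector, as in the vectorial transformation rule. The map $\phi$, the point $p$ and the vector $V$ are assumptions chosen only for this example.

```latex
% Illustrative assumption: n = 2 and a polynomial shear map of R^2 onto itself.
\[
  \phi : \mathbb{R}^2 \to \mathbb{R}^2, \qquad
  \phi(x) = \bigl(x^1,\; x^2 + (x^1)^2\bigr), \qquad
  \phi^{-1}(y) = \bigl(y^1,\; y^2 - (y^1)^2\bigr).
\]
% Row index i labels the component of phi, column index j labels the variable x^j.
\[
  J_\phi(x) = \bigl[\partial_{x^j}\phi^i(x)\bigr]_{i,j=1}^2
  = \begin{pmatrix} 1 & 0 \\ 2x^1 & 1 \end{pmatrix}.
\]
% Applying the Jacobian at p = (1,0) to the tuple V = (1,1):
\[
  J_\phi(p)\,V
  = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}
    \begin{pmatrix} 1 \\ 1 \end{pmatrix}
  = \begin{pmatrix} 1 \\ 3 \end{pmatrix}.
\]
```

Since $\det J_\phi(x) = 1$ for all $x$ in this example, the Jacobian matrix is invertible everywhere, which is consistent with $\phi$ having a differentiable inverse.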


46.3.8 THEOREM: Differentiability of the vectorial transformation of a naive vector field. 

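Let $n \in \mathbb{Z}_0^+$. Let $U, \tilde{U} \in \mathrm{Top}(\mathbb{R}^n)$. Let $k \in \mathbb{Z}_0^+$. Let $X \in X^k(\mathbb{R}^n\,|\,U)$. Let $\phi : U \to \tilde{U}$ be a $C^{k+1}$
diffeomorphism. Define $\tilde{X} : \tilde{U} \to \mathbb{R}^n$ by $\tilde{X}(\phi(p)) = J_\phi(p)X(p)$ for all $p \in U$, where $J_\phi$ is the Jacobian of $\phi$.
Then $\tilde{X} \in X^k(\mathbb{R}^n\,|\,\tilde{U})$.


PROOF: Let $q \in \tilde{U}$. Let $p = \phi^{-1}(q)$. Then $\tilde{X}(q) = \tilde{X}(\phi(p)) = J_\phi(p)X(p) \in \mathbb{R}^n$. So $\tilde{X} : \tilde{U} \to \mathbb{R}^n$ is a naive
vector field on $\tilde{U}$ by Definition 46.3.3.

Since $\phi \in C^{k+1}(U, \mathbb{R}^n)$, it follows from Theorem 42.5.16 (vii) that $\partial_j\phi \in C^k(U, \mathbb{R}^n)$ for all $j \in \mathbb{N}_n$. Therefore
$\partial_j\phi^i \in C^k(U, \mathbb{R})$ for all $i, j \in \mathbb{N}_n$. So the map $p \mapsto \sum_{j=1}^n \partial_j\phi(p)^i X(p)^j$ is in $C^k(U, \mathbb{R})$ for all $i \in \mathbb{N}_n$ by the
inductive application of Theorem 41.1.18 (ii, iii). So $p \mapsto J_\phi(p)X(p)$ is in $C^k(U, \mathbb{R}^n)$. Thus $\tilde{X} \circ \phi \in C^k(U, \mathbb{R}^n)$.
Therefore $\tilde{X} = (\tilde{X} \circ \phi) \circ \phi^{-1} \in C^k(\tilde{U}, \mathbb{R}^n)$. Hence $\tilde{X} \in X^k(\mathbb{R}^n\,|\,\tilde{U})$ by Notation 46.3.5.


46.4. The Lie bracket for naive vector fields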


46.4.1 REMARK: Directional derivatives of naive vector fields. 

The directional derivative for naive vector fields in Definition 46.4.2 requires an explicit specification of the
base-point where the derivative should be computed because a naive tangent vector $V \in \mathbb{R}^n$ has no base-point
tag, unlike the philosophically correct tangent vectors in $T(\mathbb{R}^n)$. Note that the component charts
which were required in Definition 46.1.8 are not required here because the naive tangent vectors are their
own component tuples.


46.4.2 DEFINITION: Directional derivative of a naive vector field on a Cartesian space. 
The directional derivative of a naive vector field $Y \in X^1(\mathbb{R}^n\,|\,U)$, where $n \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$, with
respect to a vector $V \in \mathbb{R}^n$ at $p \in U$, is the real-number $n$-tuple $\sum_{i=1}^n V^i\,\partial_{x^i}Y(x)\big|_{x=p}$.

Alternative name: The action of a vector $V$ on a naive vector field $Y$.


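46.4.3 NOTATION: $\partial_{p,V}Y$, for a point $p \in U$, a vector $V \in \mathbb{R}^n$, and a naive vector field $Y \in X^1(\mathbb{R}^n\,|\,U)$
with $n \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$, denotes the directional derivative of $Y$ with respect to $V$ at $p$. In other words,

\[
\forall n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n),\ \forall Y \in X^1(\mathbb{R}^n\,|\,U),\ \forall p \in U,\ \forall V \in \mathbb{R}^n,\quad
\partial_{p,V}Y = \sum_{i=1}^n V^i\,\partial_{x^i}Y(x)\big|_{x=p}.
\]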
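
The directional derivative of a naive vector field can be checked by hand on a small case. The following is a minimal sketch; the field $Y$, the point $p$ and the vector $V$ are assumptions made up for illustration and do not appear in the text.

```latex
% Illustrative assumption: n = 2, U = R^2, with a polynomial naive vector field Y.
\[
  Y(x) = \bigl(x^1 x^2,\; (x^2)^2\bigr), \qquad p = (1, 2), \qquad V = (3, -1).
\]
% Partial derivatives of Y, evaluated at x = p.
\[
  \partial_{x^1} Y(x)\big|_{x=p} = (x^2, 0)\big|_{x=p} = (2, 0), \qquad
  \partial_{x^2} Y(x)\big|_{x=p} = (x^1, 2x^2)\big|_{x=p} = (1, 4).
\]
% The directional derivative is the V-weighted sum of these partial derivatives.
\[
  \sum_{i=1}^2 V^i\,\partial_{x^i} Y(x)\big|_{x=p}
  = 3\,(2, 0) + (-1)\,(1, 4) = (5, -4).
\]
```

The result depends on the base point $p$ as well as on $V$, which is why the base point must be specified explicitly for naive vectors.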


46.4.4 REMARK: Non-vectoriality of the directional derivative of a naive vector field. 

Theorem 46.4.5 shows the non-vectorial second-order term which arises from applying the construction in 
Definition 46.4.2 to a transformed naive vector field. This term is non-zero if the transformation is not affine. 
(See Section 24.4 for affine transformations.) 


In the fibre bundle framework, the second-order term arises because the directional derivative of a vector 
field has a non-vertical component in the second-level tangent bundle, which is the tangent bundle of the 
tangent bundle. (See Remark 61.2.8.) But in the naive vector field framework, it is simply a non-vectorial 
term in the transformation rule. 
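
A one-dimensional computation makes the non-vectorial second-order term concrete. The following is a sketch under stated assumptions: the diffeomorphism $\phi(x) = e^x$, the constant field $Y$ and the tilde notation for transformed objects are all chosen here for illustration only.

```latex
% Illustrative assumption: n = 1, U = R, and phi maps R onto the open set (0, infinity).
\[
  \phi(x) = e^x, \qquad J_\phi(x) = \phi'(x) = e^x, \qquad \phi''(x) = e^x .
\]
% A constant naive vector field Y and a vector V at a base point p in U.
\[
  Y(x) = 1, \qquad V = 1, \qquad V\,Y'(p) = 0 .
\]
% Vectorially transformed field and vector (tilde denotes the transformed object).
\[
  \tilde{Y}(\phi(x)) = \phi'(x)\,Y(x) = e^x
  \quad\Longrightarrow\quad \tilde{Y}(y) = y \ \text{ for } y > 0,
  \qquad \tilde{V} = \phi'(p)\,V = e^p .
\]
% Directional derivative computed in the transformed coordinates.
\[
  \tilde{V}\,\tilde{Y}'(\phi(p)) = e^p \cdot 1 = e^p
  \;\ne\; 0 = \phi'(p)\,\bigl(V\,Y'(p)\bigr).
\]
% The discrepancy is exactly the second-order term V Y(p) phi''(p).
\[
  \tilde{V}\,\tilde{Y}'(\phi(p)) - \phi'(p)\,V\,Y'(p) = V\,Y(p)\,\phi''(p) = e^p .
\]
```

For an affine map $\phi(x) = ax + b$ with $a \ne 0$, the same computation gives $\phi'' = 0$, so the discrepancy vanishes, in agreement with the remark that the extra term is non-zero only for non-affine transformations.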


46.4.5 THEOREM: Coordinate transformation rule for directional derivative of naive vector field. 

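Let $n \in \mathbb{Z}_0^+$. Let $U, \tilde{U} \in \mathrm{Top}(\mathbb{R}^n)$. Let $\phi : U \to \tilde{U}$ be a $C^2$ diffeomorphism. Let $p \in U$ and $V \in \mathbb{R}^n$. Let
$Y \in X^1(\mathbb{R}^n\,|\,U)$. Define $\tilde{V} \in \mathbb{R}^n$ by $\tilde{V} = J_\phi(p)V$, where $J_\phi$ is the Jacobian matrix function for $\phi$. Define
$\tilde{Y} : \tilde{U} \to \mathbb{R}^n$ by $\tilde{Y}(\phi(x)) = J_\phi(x)Y(x)$ for all $x \in U$. Then

\[
\partial_{\phi(p),\tilde{V}}\tilde{Y} = J_\phi(p)\,\partial_{p,V}Y + \sum_{j,k=1}^n V^j\,Y(p)^k\,\partial_{x^j}\partial_{x^k}\phi(x)\big|_{x=p}.
\]


PROOF: By Notation 46.4.3, $\partial_{\phi(p),\tilde{V}}\tilde{Y} = \sum_{\ell=1}^n \tilde{V}^\ell\,\partial_{y^\ell}\tilde{Y}(y)\big|_{y=\phi(p)}$. By the chain rule, Theorem 41.7.4,

\[
\begin{aligned}
\forall i, j \in \mathbb{N}_n,\quad
\sum_{\ell=1}^n J_\phi(p)^\ell{}_j\,\partial_{y^\ell}\tilde{Y}(y)^i\big|_{y=\phi(p)}
&= \sum_{\ell=1}^n \partial_{x^j}\phi(x)^\ell\big|_{x=p}\,\partial_{y^\ell}\tilde{Y}(y)^i\big|_{y=\phi(p)} \\
&= \partial_{x^j}\tilde{Y}(\phi(x))^i\big|_{x=p} \\
&= \partial_{x^j}\bigl(J_\phi(x)Y(x)\bigr)^i\big|_{x=p} \\
&= \sum_{k=1}^n \partial_{x^j}\bigl(J_\phi(x)^i{}_k\,Y(x)^k\bigr)\big|_{x=p} \\
&= \sum_{k=1}^n \bigl(\partial_{x^j}\partial_{x^k}\phi(x)^i\bigr)\,Y(x)^k\big|_{x=p}
 + \sum_{k=1}^n \bigl(\partial_{x^k}\phi(x)^i\bigr)\,\partial_{x^j}Y(x)^k\big|_{x=p} \\
&= \sum_{k=1}^n J_\phi(p)^i{}_k\,\partial_{x^j}Y(x)^k\big|_{x=p}
 + \sum_{k=1}^n \partial_{x^j}\partial_{x^k}\phi(x)^i\,Y(x)^k\big|_{x=p}.
\end{aligned}
\]

Therefore

\[
\begin{aligned}
\forall i \in \mathbb{N}_n,\quad
\bigl(\partial_{\phi(p),\tilde{V}}\tilde{Y}\bigr)^i
&= \sum_{\ell=1}^n \tilde{V}^\ell\,\partial_{y^\ell}\tilde{Y}(y)^i\big|_{y=\phi(p)} \\
&= \sum_{\ell=1}^n \sum_{j=1}^n J_\phi(p)^\ell{}_j\,V^j\,\partial_{y^\ell}\tilde{Y}(y)^i\big|_{y=\phi(p)} \\
&= \sum_{j=1}^n V^j \Bigl( \sum_{k=1}^n J_\phi(p)^i{}_k\,\partial_{x^j}Y(x)^k\big|_{x=p}
 + \sum_{k=1}^n \partial_{x^j}\partial_{x^k}\phi(x)^i\,Y(x)^k\big|_{x=p} \Bigr) \\
&= \bigl(J_\phi(p)\,\partial_{p,V}Y\bigr)^i + \sum_{j,k=1}^n V^j\,Y(p)^k\,\partial_{x^j}\partial_{x^k}\phi(x)^i\big|_{x=p}.
\end{aligned}
\]

This verifies the assertion.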


46.4.6 REMARK: The Lie bracket. 

To vectorise the partial derivative of a vector field, the second-order term must be removed somehow. One 
way to do this is with a connection which adjusts the n-tuple for each coordinate system. The other well- 
known way to vectorise the n-tuple construction is to subtract from it another n-tuple construction which 
has the negative of the second-order term. Note that this subtraction method (called the Lie bracket) is not 
entirely satisfactory because the identity of the original n-tuple is lost. When another n-tuple is subtracted, 
it becomes a different n-tuple in all coordinate systems, whereas the application of a connection preserves 
the identity of the original vector by matching it with other n-tuples in other coordinate systems. 


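46.4.7 DEFINITION: The Lie bracket of naive vector fields $X, Y \in X^1(\mathbb{R}^n\,|\,U)$ on a set $U \in \mathrm{Top}(\mathbb{R}^n)$,
where $n \in \mathbb{Z}_0^+$, is the map from $U$ to $\mathbb{R}^n$ defined by $p \mapsto \partial_{p,X(p)}Y - \partial_{p,Y(p)}X$ for $p \in U$.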


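46.4.8 NOTATION: $[X, Y]$, for naive vector fields $X, Y \in X^1(\mathbb{R}^n\,|\,U)$ for $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}_0^+$,
denotes the Lie bracket of $X$ and $Y$. In other words,

\[
\begin{aligned}
\forall p \in U,\quad [X, Y](p) &= \partial_{p,X(p)}Y - \partial_{p,Y(p)}X \\
&= \sum_{i=1}^n X(p)^i\,\partial_{x^i}Y(x)\big|_{x=p} - \sum_{i=1}^n Y(p)^i\,\partial_{x^i}X(x)\big|_{x=p}.
\end{aligned}
\]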

46.4.9 REMARK:  Vectoriality of the Lie bracket construction. 

Theorem 46.4.10 shows the vectoriality of the Lie bracket construction procedure. In other words, when 
the same procedure is carried out in the transformed coordinates, using vectorially transformed n-tuples as 
inputs, the output of the procedure is equal to the vectorial transformation of the output of the procedure 
performed in the original coordinates. Thus there is nothing at all vectorial about the n-tuple which is 
produced by the procedure in a single coordinate system. It is the construction procedure which is vectorial. 
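
The following one-dimensional computation is a hedged sketch of what vectoriality of the construction procedure means in practice, anticipating Theorem 46.4.10. The map $\phi(x) = e^x$, the fields $X$ and $Y$, and the tilde notation for transformed fields are assumptions made for this illustration.

```latex
% Illustrative assumption: n = 1, U = R, phi maps R onto (0, infinity).
\[
  \phi(x) = e^x, \qquad J_\phi(x) = e^x, \qquad X(x) = 1, \qquad Y(x) = x .
\]
% Bracket in the original coordinates.
\[
  [X, Y](p) = X(p)\,Y'(p) - Y(p)\,X'(p) = 1 \cdot 1 - p \cdot 0 = 1 .
\]
% Vectorially transformed fields on (0, infinity).
\[
  \tilde{X}(e^x) = e^x \cdot 1 \ \Longrightarrow\ \tilde{X}(y) = y, \qquad
  \tilde{Y}(e^x) = e^x \cdot x \ \Longrightarrow\ \tilde{Y}(y) = y \ln y .
\]
% Bracket in the transformed coordinates: the same procedure applied to the new inputs.
\[
  [\tilde{X}, \tilde{Y}](y)
  = \tilde{X}(y)\,\tilde{Y}'(y) - \tilde{Y}(y)\,\tilde{X}'(y)
  = y\,(\ln y + 1) - y \ln y = y .
\]
% Comparison at y = phi(p): the output is the vectorial transform of the original output.
\[
  [\tilde{X}, \tilde{Y}](\phi(p)) = e^p = J_\phi(p)\,[X, Y](p) .
\]
```

Each of the two terms separately fails to transform vectorially. For instance $\tilde{X}\,\tilde{Y}' = e^p(p + 1)$ at $y = \phi(p)$, which differs from $J_\phi(p)\,X(p)\,Y'(p) = e^p$ by the second-order term $X(p)\,Y(p)\,\phi''(p) = p\,e^p$. The same term appears in $\tilde{Y}\,\tilde{X}'$, so it cancels in the difference.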


46.4.10 THEOREM:  Vectoriality of the Lie bracket. 
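Let $n \in \mathbb{Z}_0^+$. Let $U, \tilde{U} \in \mathrm{Top}(\mathbb{R}^n)$. Let $X, Y \in X^1(\mathbb{R}^n\,|\,U)$. Let $\phi : U \to \tilde{U}$ be a $C^2$ diffeomorphism. Let $J_\phi$
be the Jacobian matrix function for $\phi$. Define $\tilde{X} : \tilde{U} \to \mathbb{R}^n$ and $\tilde{Y} : \tilde{U} \to \mathbb{R}^n$ by $\tilde{X}(\phi(p)) = J_\phi(p)X(p)$ and
$\tilde{Y}(\phi(p)) = J_\phi(p)Y(p)$ for all $p \in U$. Then

\[
\forall p \in U,\quad [\tilde{X}, \tilde{Y}](\phi(p)) = J_\phi(p)\,[X, Y](p).
\]

PROOF: Let $p \in U$. Then by Notation 46.4.8 and Theorem 46.4.5,

\[
\begin{aligned}
[\tilde{X}, \tilde{Y}](\phi(p))
&= \partial_{\phi(p),\tilde{X}(\phi(p))}\tilde{Y} - \partial_{\phi(p),\tilde{Y}(\phi(p))}\tilde{X} \\
&= J_\phi(p)\,\partial_{p,X(p)}Y + \sum_{j,k=1}^n X(p)^j\,Y(p)^k\,\partial_{x^j}\partial_{x^k}\phi(x)\big|_{x=p} \\
&\qquad - J_\phi(p)\,\partial_{p,Y(p)}X - \sum_{j,k=1}^n Y(p)^j\,X(p)^k\,\partial_{x^j}\partial_{x^k}\phi(x)\big|_{x=p} \\
&= J_\phi(p)\,\partial_{p,X(p)}Y - J_\phi(p)\,\partial_{p,Y(p)}X \\
&= J_\phi(p)\,[X, Y](p).
\end{aligned}
\]

The two double sums cancel because $\partial_{x^j}\partial_{x^k}\phi = \partial_{x^k}\partial_{x^j}\phi$ for the $C^2$ map $\phi$.
This verifies the assertion.


46.4.11 THEOREM: Differentiability of the Lie bracket.
Let $n \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $k \in \bar{\mathbb{Z}}_0^+$. Let $X, Y \in X^{k+1}(\mathbb{R}^n\,|\,U)$. Then $[X, Y] \in X^k(\mathbb{R}^n\,|\,U)$.


PROOF: Let $k \in \mathbb{Z}_0^+$. Then Theorem 42.5.16 (vii) implies that $\partial_i Y \in C^k(U, \mathbb{R}^n)$ for all $i \in \mathbb{N}_n$. So the map
$p \mapsto X(p)^i\,\partial_{x^i}Y(x)\big|_{x=p}$ is in $C^k(U, \mathbb{R}^n)$ for all $i \in \mathbb{N}_n$. Therefore the map $p \mapsto \sum_{i=1}^n X(p)^i\,\partial_{x^i}Y(x)\big|_{x=p}$ is in
$C^k(U, \mathbb{R}^n)$ by the inductive application of the sum rule. The same argument with $X$ and $Y$ interchanged applies to
the map $p \mapsto \sum_{i=1}^n Y(p)^i\,\partial_{x^i}X(x)\big|_{x=p}$. So $[X, Y] \in C^k(U, \mathbb{R}^n)$ by Notation 46.4.8. Hence
$[X, Y] \in X^k(\mathbb{R}^n\,|\,U)$ by Notation 46.3.5. The case $k = \infty$ then follows from Notation 42.5.10.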


46.5. Holonomy and the Lie bracket 


46.5.1 REMARK:  Holonomic families of vector fields. 

The holonomic vector field families in Definition 46.5.3 arise very easily and naturally on differentiable 
manifolds as the coordinate vector fields for charts. (See Definition 61.5.19 for the differentiable manifold 
version.) Levi-Civita [26], pages 53-59, referred to a holonomic vector field family as a “Jacobian system”.


The simple exterior derivative which is defined in Section 46.7 is extended to general nonholonomic vector 
fields in Section 46.8. Holonomic vector field families have the advantage that all of the terms with Lie 
brackets disappear. It is frequently useful to simplify expressions by such a choice of vector field families. 


46.5.2 REMARK: The application of vector field holonomy to parallelism holonomy. 

In the application of the Lie bracket to the curvature of connections on differentiable fibre bundles, two loops 
are traversed simultaneously. When parallel transport is integrated around points p in a small closed loop 
in the base space $M$ of a fibre bundle $(E, \pi, M, A_E^F)$, the fibre set $E_p$ above each point $p \in M$ experiences
infinitesimal transformations when viewed via a fibre chart $\phi \in A_E^F$. When $p$ moves around a small curved
quadrangle in $M$, the individual points in the image of $E_p$ in $F$ via $\phi$ also move along a curve.


But there is an important difference here. Whereas the base-space curve starts and ends at the same point, 
the projection of the fibre sets $E_p$ onto the fibre space $F$ does not in general follow a closed curve. This is the
essence of curvature. It explains why one studies the holonomy deviation for the Lie bracket in terms of the 
"size of the break" between the start and end points, whereas in the study of curvature of connections, the 
base-space loop is always closed. The connection has non-zero curvature if the lifted curve in the total space 
is not closed. This explains why the nonholonomy (i.e. curvature) of a connection is sometimes explained, 
often with a diagram, in terms of a closed curve, and sometimes in terms of the gap of an open curve. There 
are two curves simultaneously, one open, one closed. The Lie bracket holonomy interpretation is concerned 
with the open curve in a fibre space. 


Thus the importance of the Lie bracket for curvature in fibre bundles, including tangent bundles, vector 
bundles and principal bundles, lies in its application to fibre spaces, not to base spaces of fibre bundles. 
The important vector fields on fibre spaces are the infinitesimal transformations which are generated by Lie
algebra elements, as described in Section 63.6. 


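46.5.3 DEFINITION: A holonomic family of naive vector fields on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, where $n \in \mathbb{Z}_0^+$, is a
family $(X_i)_{i \in I}$ with $X_i \in X^1(\mathbb{R}^n \,|\, U)$ for all $i \in I$, which satisfies

$\forall i, j \in I,\quad [X_i, X_j] = 0.$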


46.5.4 REMARK: Interpretation of the Lie bracket as deviation from holonomy. 

The most important attribute of the Lie bracket is its meaning. The Lie bracket of two vector fields equals 
zero when they are holonomic, and when they are not holonomic, the Lie bracket provides a measure of the 
deviation from holonomy. This is asserted in Theorem 46.5.8. (See Theorem 61.5.21 for the corresponding 
assertion for differentiable manifolds.) 
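
The vanishing of the Lie bracket for holonomic families can be checked numerically. The following is only a minimal sketch, assuming the common component convention $[X, Y]^i = X^j \partial_j Y^i - Y^j \partial_j X^i$ and central-difference derivatives. The function name lie_bracket and the sample fields e1, e2 and rot are hypothetical and are not taken from this book.

```python
def lie_bracket(X, Y, p, h=1e-5):
    """Approximate [X, Y](p) = (dY)(X) - (dX)(Y) at p by central differences.

    X and Y map a point (list of floats) to a vector (list of floats).
    The sign convention is an assumption; some texts use the opposite sign.
    """
    def directional(F, v):
        # Central-difference derivative of the field F at p in direction v.
        plus = F([pi + h * vi for pi, vi in zip(p, v)])
        minus = F([pi - h * vi for pi, vi in zip(p, v)])
        return [(a - b) / (2 * h) for a, b in zip(plus, minus)]

    dY_along_X = directional(Y, X(p))
    dX_along_Y = directional(X, Y(p))
    return [u - v for u, v in zip(dY_along_X, dX_along_Y)]


def e1(p):
    return [1.0, 0.0]       # coordinate vector field along the first axis


def e2(p):
    return [0.0, 1.0]       # coordinate vector field along the second axis


def rot(p):
    return [-p[1], p[0]]    # rotation field, not holonomic together with e1


point = [0.4, -1.3]
bracket_coord = lie_bracket(e1, e2, point)   # coordinate fields commute
bracket_rot = lie_bracket(rot, e1, point)    # approximately (0, -1)
```

The coordinate vector fields give a zero bracket, whereas the pair (rot, e1) gives a non-zero bracket, so that pair is not a holonomic family in the sense of Definition 46.5.3.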


The integral curves in Theorem 46.5.8 are solutions of ordinary differential equations where the curves act
both as input and output. They have the form $\partial_t \gamma(t) = X(\gamma(t))$ for given fixed vector fields $X$. So bounds
for these curves can only be obtained by solving a kind of “chicken-and-egg” riddle. The path which is taken
by the curve $\gamma$ through the vector field $X$ determines its velocity, which determines its path. Therefore the
proof of Theorem 46.5.8 requires some subtle arguments.


The output from the first side of the “holonomy quadrilateral” becomes the input of the second side, which
is repeated three times. If it is only known that $X$ and $Y$ are continuous, only linear approximations can be
made from vertex to vertex. At each vertex, the integral curve diverges from the linear approximation, which
can lead to a quite large deviation from holonomy at the last stage. This is illustrated in Figure 46.5.1.


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]


Figure 46.5.1 Holonomy quadrilateral for two vector fields in Cartesian space


Bounds for each output must be known in terms of the corresponding input, which implies that the solutions 
of ODEs must have known continuity or differentiability with respect to their initial values. 


Theorems which assert that the Lie bracket is equal to the limit of the ratio of nonholonomy to area of a
quadrilateral are presented by Spivak [37], Volume 1, pages 159-163; Frankel [12], pages 129-131; Bishop/
Goldberg [3], pages 135-138; Crampin/Pirani [7], pages 79-82; Schutz [36], pages 45-47.


46.5.5 REMARK: Families of quadrilateral curves generated by vector field pairs. 

Definition 46.5.6 is somewhat untidy, but it does describe the kinds of families of quadrilateral curves 
which are required for holonomy deviation computations. As noted in Remark 44.3.2, the integral curves in 
Definition 46.5.6 are not necessarily unique if it is only known that the vector fields are continuous. But as 
is shown in Theorem 44.5.3, the solutions are unique if the vector fields are at least Lipschitz continuous. 
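
The non-uniqueness of integral curves for merely continuous vector fields can be seen in a standard one-dimensional example, which is not necessarily the example given in Remark 44.3.2. The sketch below assumes the vector field $X(x) = |x|^{1/2}$ on $\mathbb{R}$, which is continuous but not Lipschitz at 0, and checks that two different curves starting at 0 both satisfy $\partial_t \gamma(t) = X(\gamma(t))$. All names in the code are hypothetical.

```python
import math


def field(x):
    # Continuous on R, but not Lipschitz continuous in any neighbourhood of 0.
    return math.sqrt(abs(x))


def solution_zero(t):
    return 0.0              # the curve which stays at the origin


def solution_parabola(t):
    return 0.25 * t * t     # the curve which leaves the origin immediately


def max_residual(curve, times, h=1e-6):
    """Largest |curve'(t) - field(curve(t))| over the sample times.

    The velocity is approximated by a central difference with step h.
    """
    worst = 0.0
    for t in times:
        velocity = (curve(t + h) - curve(t - h)) / (2 * h)
        worst = max(worst, abs(velocity - field(curve(t))))
    return worst


times = [0.1 * k for k in range(1, 21)]
res_zero = max_residual(solution_zero, times)
res_parabola = max_residual(solution_parabola, times)
```

Both residuals are zero up to rounding error, although the two curves have the same initial value and differ for all $t > 0$. A side of a quadrilateral curve which starts at such a point is therefore not determined by the vector field alone.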


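46.5.6 DEFINITION: A quadrilateral curve family at a point $p_0 \in U$, for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $n \in \mathbb{Z}_0^+$,
generated by vector fields $X, Y \in X^0(\mathbb{R}^n \,|\, U)$, is a family of curves $(\gamma_a)_{a \in (0, a_0)}$ for some $a_0 \in \mathbb{R}^+$, such that
$\gamma_a$ satisfies the following for all $a \in (0, a_0)$.


(i) $\gamma_a \in C^0([0, 4a], U)$.
(ii) $\gamma_a(0) = p_0$.
(iii) $\forall k \in \mathbb{N}_4,\ \gamma_a|_{((k-1)a, ka)} \in C^1(((k-1)a, ka), U)$.
(iv) $\forall t \in (0, a),\ \gamma_a'(t) = X(\gamma_a(t))$.
(v) $\forall t \in (a, 2a),\ \gamma_a'(t) = Y(\gamma_a(t))$.
(vi) $\forall t \in (2a, 3a),\ \gamma_a'(t) = -X(\gamma_a(t))$.
(vii) $\forall t \in (3a, 4a),\ \gamma_a'(t) = -Y(\gamma_a(t))$.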
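
The four sides of such a quadrilateral curve can be traced numerically. The following sketch assumes smooth vector fields and uses the classical fourth-order Runge-Kutta method, which is not part of this book's ODE theory. It follows $X$, $Y$, $-X$ and $-Y$ in turn, each for time $a^{1/2}$, and divides the gap between the end point and $p_0$ by $a$. The names rk4_flow, quadrilateral_gap_ratio and the sample fields are hypothetical.

```python
import math


def rk4_flow(field, p, duration, steps=200):
    """Approximate the integral curve of `field` from p over time `duration`."""
    h = duration / steps
    x = list(p)
    for _ in range(steps):
        k1 = field(x)
        k2 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = field([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h * (s1 + 2 * s2 + 2 * s3 + s4) / 6.0
             for xi, s1, s2, s3, s4 in zip(x, k1, k2, k3, k4)]
    return x


def negated(field):
    return lambda p: [-v for v in field(p)]


def quadrilateral_gap_ratio(X, Y, p0, a):
    """Follow X, Y, -X, -Y for time sqrt(a) each; return (end - p0) / a."""
    side = math.sqrt(a)
    p = rk4_flow(X, p0, side)
    p = rk4_flow(Y, p, side)
    p = rk4_flow(negated(X), p, side)
    p = rk4_flow(negated(Y), p, side)
    return [(pi - qi) / a for pi, qi in zip(p, p0)]


def X(p):
    return [1.0, 0.0]       # constant field along the first axis


def Y(p):
    return [0.0, p[0]]      # shear field; [X, Y] = (0, 1) in the usual convention


def Z(p):
    return [0.0, 1.0]       # constant field along the second axis


p0 = [0.3, -0.2]
ratio_XY = quadrilateral_gap_ratio(X, Y, p0, 1e-2)   # approximately (0, 1)
ratio_XZ = quadrilateral_gap_ratio(X, Z, p0, 1e-2)   # approximately (0, 0)
```

For this polynomial example the flows can be written down exactly, and the gap for the pair (X, Y) equals $a \cdot (0, 1)$, which agrees with the Lie bracket. The commuting pair (X, Z) closes up exactly. For general fields the ratio is only expected to approach the bracket as $a \to 0^+$.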


46.5.7 REMARK: Minimal requirements for vector-field-pair nonholonomy estimation. 

Unfortunately, the quadrilateral curve families in Definition 46.5.6 could fail to exist, fail to be unique, or fail
to be sufficiently continuous or differentiable to be usable for estimating the nonholonomy of pairs of vector
fields. Theorem 44.4.9 implies the existence of the first side $\gamma_a|_{(0,a)}$ of the quadrilateral curve $\gamma_a$ following
vector field $X$ in Definition 46.5.6 (iv) for all $a$ less than some $a_0 \in \mathbb{R}^+$.

Similarly, the second side, commencing at $\gamma_a(a)$ and following the vector field $Y$ in Definition 46.5.6 (v),
will have at least one solution $\gamma_a|_{(a, a+a')}$ for $a' \in (0, a_1(a))$ for sufficiently small $a_1(a) \in \mathbb{R}^+$. However,
$a_1(a)$ could be less than $a$. (See Example 44.3.7 for typical divergence of ODE solutions for even analytic
vector fields.) If so, the bound $a_0$ will need to be lowered. However, the difficulties do not stop there. This
bound $a_1(a)$ evidently depends on $a$, assuming that $X$, $Y$ and $p_0$ are given. In order to adjust $a_0 \in \mathbb{R}^+$
to a sufficiently low value, it must be known that for all $a \in (0, a_0)$, the value of $a_1(a)$ is not less than $a$.
This is a weaker requirement than a uniform constant lower bound for $a_1(a)$ as a function of $a$, but even
this weaker requirement, $\exists a_0 \in \mathbb{R}^+,\ \forall a \in (0, a_0),\ a_1(a) \ge a$, is not guaranteed by the simple existence
assertion in Theorem 44.4.9. Therefore it is desirable to replace the pointwise existence assertion with a
uniform “existence radius” for integral curves of $Y$, depending only on $Y$ and $p_0$.


When the “integral curve existence radius uniformity” issue has been resolved so as to obtain existence of
a quadrilateral curve family, the next item on the agenda is uniqueness. Amongst the possibilities for the
integral curves $\gamma_a|_{((k-1)a, ka)}$ for $k \in \mathbb{N}_4$ in Definition 46.5.6 (iii), a combination of choices must be made
for each $a \in (0, a_0)$. (See Remark 44.3.2 for examples of ODE non-uniqueness.) Even if the holonomy
deviation limit, $\lim_{a \to 0^+} (\gamma_{a^{1/2}}(4a^{1/2}) - \gamma_a(0))/a$, is well defined and independent of the choices of curves,
it is certainly very inconvenient to have to manage such complexity.


Using Theorem 44.5.3, uniqueness of quadrilateral curve families can be obtained by requiring the vector fields
to satisfy a Lipschitz condition. Then the next “flies in the ointment” are continuity and differentiability. Here
continuity means that the first order limit, $\lim_{a \to 0^+} (\gamma_a(4a) - \gamma_a(0))/a$, must equal zero, and differentiability
means that the second order limit, $\lim_{a \to 0^+} (\gamma_{a^{1/2}}(4a^{1/2}) - \gamma_a(0))/a$, must be well defined. To obtain this, the
continuous or differentiable dependence of solutions of ODEs on their initial values must be demonstrated.
This is because each curve $\gamma_a|_{((k-1)a, ka)}$ for $k = 2, 3, 4$ has a variable initial value. It must be shown
that $\gamma_a(ka)$ depends in a suitably continuous or differentiable fashion on $\gamma_a((k-1)a)$ for all $k \in \mathbb{N}_4$
and $a \in (0, a_0)$. This then puts a stronger requirement on the differentiability of the vector fields.


Clearly to meet all of the practical requirements for applications to nonholonomy estimates, the quadrilateral 
curve families in Definition 46.5.6 will require the vector fields to be at least $C^1$.


((2019-8-21. Theorem 46.5.8 is not yet correct. Please ignore it. 
2019-8-26. Should first show $\lim_{a \to 0^+} (\gamma_a(4a) - \gamma_a(0))/a = 0$ as a first-order version of Theorem 46.5.8.
Also, the unfinished proof of Theorem 46.5.8 starts with an excessively lengthy proof of some bounds for
quadrilateral curve family range. It should be possible to accomplish this in a couple of lines. Maybe
Theorem 40.8.7, a mean value theorem for curves, could be useful for this.
Before Theorem 46.5.8, it would be desirable for Theorem 44.5.5 and its corresponding second-order version 
to be proved first. Also required are continuous and differentiable dependence theorems for ODEs. 
The overall objective here is to find minimal differentiability conditions on the vector fields which guarantee 
that the quadrilateral curve families converge as expected. It’s fairly obvious that $C^1$ differentiability will
be sufficient. It isn't so obvious how to prove this in a tidy way. 
Another requirement is that, as mentioned in Remark 46.5.7, the “convergence radius” of ODE solutions 
must be uniformly bounded below in some sense. This is intuitively clear, but also must be proved. 
All of these considerations imply that the ODE theory needs to be “upgraded” before the Lie bracket can 
be interpreted in terms of quadrilateral curve family limits. )) 


46.5.8 THEOREM: The Lie bracket equals the limit of holonomy deviation divided by area. 

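Let $n \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $X_1, X_2 \in X^1(\mathbb{R}^n \,|\, U)$. Let $p_0 \in U$ and $\delta_0 \in \mathbb{R}^+$. For $\delta \in (0, \delta_0]$,
let $\gamma_\delta : [0, 4\delta] \to U$ be a continuous curve with $\gamma_\delta(0) = p_0$, which satisfies $\partial_s \gamma_\delta(s) = X_1(\gamma_\delta(s))$ and
$\partial_s \gamma_\delta(s + 2\delta) = -X_1(\gamma_\delta(s + 2\delta))$ for all $s \in (0, \delta)$, and $\partial_t \gamma_\delta(t) = X_2(\gamma_\delta(t))$ and $\partial_t \gamma_\delta(t + 2\delta) = -X_2(\gamma_\delta(t + 2\delta))$
for all $t \in (\delta, 2\delta)$. Define $\rho : \mathbb{R}_0^+ \to U$ by $\rho(0) = p_0$ and $\rho(a) = \gamma_{a^{1/2}}(4a^{1/2})$ for all $a \in \mathbb{R}^+$. Then $\rho$ is a
continuous curve which satisfies $\partial_a \rho(a)\big|_{a=0} = [X_1, X_2](p_0)$.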


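PROOF: Let $p_0 \in U$. Let $K_1 = \bar{B}_{p_0, r_1}$, where $r_1 = \min(1, \frac{1}{2} d(p_0, \mathbb{R}^n \setminus U))$. (See Notation 37.3.2 for $\bar{B}_{p_0, r_1}$.
See Definition 37.4.1 for $d(p_0, \mathbb{R}^n \setminus U)$.) Then $r_1 \in \mathbb{R}^+$, $K_1$ is a compact subset of $U$, and $p_0 \in \mathrm{Int}(K_1)$. So
$X_1$ and $X_2$ are bounded on $K_1$. Let $c_0 = \max(1, \sup_{q \in K_1} \max(|X_1(q)|, |X_2(q)|))$. Then $c_0 \in \mathbb{R}^+$.


Let $\delta_1 = \min(\delta_0, \frac{1}{4} r_1/c_0)$. Let $p_{k,\delta} = \gamma_\delta(k\delta)$ for $\delta \in (0, \delta_1]$ and $k \in \mathbb{Z}_5$. (See Notation 14.4.7 for $\mathbb{Z}_5$.) To show
that $|\gamma_\delta(t) - p_0| < \frac{1}{2} r_1$ for all $t \in [0, \delta]$, for all $\delta \in (0, \delta_1]$, suppose that $|\gamma_\delta(t) - p_0| \ge \frac{1}{2} r_1$ for some $t \in [0, \delta]$,
for some $\delta \in (0, \delta_1]$. Let $t_1 = \min\{t \in [0, \delta];\ |\gamma_\delta(t) - p_0| \ge \frac{1}{2} r_1\}$. Then $t_1 \in (0, \delta]$ is well defined by the
continuity of $\gamma_\delta$, and $\gamma_\delta(t) \in K_1$ for all $t \in [0, t_1]$. So $|\gamma_\delta(t) - p_0| \le c_0 t$ by Theorems 43.9.8 and 43.9.5 (viii)
for all $t \in [0, t_1]$. But then $|\gamma_\delta(t_1) - p_0| \le c_0 t_1 \le c_0 \delta_1 \le \frac{1}{4} r_1 < \frac{1}{2} r_1$, which is a contradiction. Therefore
$|\gamma_\delta(t) - p_0| < \frac{1}{2} r_1$ for all $t \in [0, \delta]$, for all $\delta \in (0, \delta_1]$.

Similarly $|\gamma_\delta(t + k\delta) - p_{k,\delta}| \le \delta_1 c_0$ for all $k \in \mathbb{Z}_4$, for all $t \in [0, \delta]$, for all $\delta \in (0, \delta_1]$. So $|\gamma_\delta(t) - p_0| \le 4\delta_1 c_0 \le r_1$
for all $t \in [0, 4\delta]$, for all $\delta \in (0, \delta_1]$. It follows that $|X_1|$ and $|X_2|$ are uniformly bounded by $c_0$ on the ranges
of the curves $\gamma_\delta$ for all $\delta \in (0, \delta_1]$. More importantly, the derivatives of $X_1$ and $X_2$ are uniformly bounded
on the ranges of the curves $\gamma_\delta$ for all $\delta \in (0, \delta_1]$ because their ranges are included in a compact subset of $U$.
In other words, $\exists c_1 \in \mathbb{R}^+,\ \forall \delta \in (0, \delta_1],\ \forall t \in [0, 4\delta],\ \forall i \in \mathbb{N}_n,\ \max(|\partial_i X_1(\gamma_\delta(t))|, |\partial_i X_2(\gamma_\delta(t))|) \le c_1$.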
For such c4 € IR*, ys(p1,5) = po + 9Xi(po) + 2020,, x (po) X1 + £1, where 

((2019-8-12. To be continued ... )) 


((2019-8-28. After proving the Lie bracket holonomy relation, maybe summarise it using the Landau order 
symbols in Section 39.7. Whereas some authors use the weak order bound “O”, here the strong order bound 
“o” will be required, but for a lower monomial degree. )) 


46.5.9 REMARK: The Lie bracket relation to holonomy links ODE theory with differential geometry. 

The relation of the Lie bracket to holonomy implies a very strong link between calculus, geometry and physics. 
The Lie bracket is an unavoidable component of the curvature concept, and curvature is probably the most
important amongst all of the core concepts of differential geometry. Without curvature, the subject would 
be merely differential topology, namely the study of differentiable manifolds without geometry. Curvature is 
also arguably the core mathematical concept in geometrised fundamental physics because it represents field 
strength (or force) in field theories. 


The geometric significance of the Lie bracket can only be verified by means of substantial applications of 
ODE theory, which in turn requires substantial applications of both differential and integral calculus. Thus 
the Lie bracket/holonomy relation tightly links differential geometry with Cartesian space calculus.


One could take it on trust that the Lie bracket is to be used in the definition of curvature. However, even the 
definition of the Lie bracket, ignoring its holonomy significance, is a somewhat questionable construction. 
As mentioned in Remark 61.5.6, both swap functions and drop functions are required in order to fit the 
Lie bracket into the framework of differentiable manifolds because vectors in two different tangent spaces 
must be subtracted, and their difference is equal to a vector in yet a third, different tangent space. (Related 
difficulties are also mentioned in Remarks 61.5.4 and 61.5.5.) Instead of regarding the Lie bracket as some 
kind of “computational magic” which serendipitously yields something which “transforms like a vector”,
it is much more satisfying to know that a truly geometric concept lies behind it. This is the purpose of 
demonstrating the relation of the Lie bracket to holonomy, and thereby to curvature. 


46.6. Naive differential forms 


46.6.1 REMARK: Naive differential forms on Cartesian spaces. 

Naive differential forms are defined following the pattern of naive vector fields. In other words, elements 
of the total space have their base-point tags removed, and are replaced by their component tuples, which 
makes coordinate charts unnecessary, but the base-point must be indicated by context. 


The interesting differential operator for vector fields is the Lie bracket. The corresponding interesting 
differential operator for differential forms is the exterior derivative. For both operators, tensoriality is a 
consequence of cancellations of second-order derivatives of local $C^2$ diffeomorphisms by virtue of alternating
signs of terms. Since the proof of tensoriality and some other basic properties for the exterior derivative is 
somewhat non-trivial, the exterior derivative is presented in Sections 46.7 and 46.8. 


46.6.2 DEFINITION: Real-valued naive differential forms on Cartesian spaces.
A (global) naive differential form of degree $m$ on a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$, is a function from
$\mathbb{R}^n$ to the set $\Lambda_m \mathbb{R}^n$ of real-valued antisymmetric $m$-linear functions on the linear space $\mathbb{R}^n$.


A (local) naive differential form of degree $m$ on a subset $U$ of a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$, is a
function from $U$ to the set $\Lambda_m \mathbb{R}^n$ of real-valued antisymmetric $m$-linear functions on the linear space $\mathbb{R}^n$.


46.6.3 DEFINITION: Vector-valued naive differential forms on Cartesian spaces.

A (global) naive differential form of degree $m$, valued in a real linear space $W$, on a Cartesian space $\mathbb{R}^n$ with
$n, m \in \mathbb{Z}_0^+$, is a function from $\mathbb{R}^n$ to the set $\Lambda_m(\mathbb{R}^n; W)$ of $W$-valued antisymmetric $m$-linear functions on
the linear space $\mathbb{R}^n$.


A (local) naive differential form of degree $m$, valued in a real linear space $W$, on a subset $U$ of a Cartesian
space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$, is a function from $U$ to the set $\Lambda_m(\mathbb{R}^n; W)$ of $W$-valued antisymmetric $m$-linear
functions on the linear space $\mathbb{R}^n$.


46.6.4 REMARK: Differentiability of naive differential forms.

As in Notation 46.3.5, the differentiable cross-section sets in Notations 46.6.6 and 46.6.8 do not contain
genuine cross-sections because the differential forms do not constitute a genuine fibre bundle or fibration.
However, as mentioned in Remark 46.3.2, the maps may be regarded as true cross-sections of true fibrations if
point-tags are added to the outputs from the maps. Thus for example, $\omega \in X^k(\Lambda_m(\mathbb{R}^n, W) \,|\, U)$ is a function
from $U$ to $\Lambda_m(\mathbb{R}^n, W)$, but it can be converted to the map $p \mapsto (p, \omega(p))$ from $U$ to $U \times \Lambda_m(\mathbb{R}^n, W)$.


46.6.5 DEFINITION: Differentiability classes for real-valued naive differential forms.

A (global) $C^k$ naive differential form of degree $m$ on a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \mathbb{Z}_0^+$, is
a naive differential form $\omega : \mathbb{R}^n \to \Lambda_m \mathbb{R}^n$ for which the map $x \mapsto \omega(x)(V)$ is of class $C^k$ from $\mathbb{R}^n$ to $\mathbb{R}$ for
all $V \in (\mathbb{R}^n)^m$.


A (local) $C^k$ naive differential form of degree $m$ on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \mathbb{Z}_0^+$, is
a naive differential form $\omega : U \to \Lambda_m \mathbb{R}^n$ for which the map $x \mapsto \omega(x)(V)$ is of class $C^k$ from $U$ to $\mathbb{R}$ for
all $V \in (\mathbb{R}^n)^m$.


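46.6.6 NOTATION: Sets of differentiable real-valued naive differential forms.
$X^k(\Lambda_m \mathbb{R}^n)$, for $k \in \mathbb{Z}_0^+$ and $m, n \in \mathbb{Z}_0^+$, denotes the set of global $C^k$ naive differential forms with degree $m$
on a Cartesian space $\mathbb{R}^n$. In other words,


$\forall k \in \mathbb{Z}_0^+,\ \forall m, n \in \mathbb{Z}_0^+$,
$X^k(\Lambda_m \mathbb{R}^n) = \{\omega : \mathbb{R}^n \to \Lambda_m \mathbb{R}^n;\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(\mathbb{R}^n, \mathbb{R})\}$.


$X^k(\Lambda_m \mathbb{R}^n \,|\, U)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, $k \in \mathbb{Z}_0^+$ and $m, n \in \mathbb{Z}_0^+$, denotes the set of local $C^k$ naive differential
forms with degree $m$ on $U$. In other words,


$\forall k \in \mathbb{Z}_0^+,\ \forall m, n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n)$,
$X^k(\Lambda_m \mathbb{R}^n \,|\, U) = \{\omega : U \to \Lambda_m \mathbb{R}^n;\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(U, \mathbb{R})\}$.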


46.6.7 DEFINITION: Differentiability classes for vector-valued naive differential forms.

A (global) $C^k$ naive differential form of degree $m$ on a Cartesian space $\mathbb{R}^n$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \mathbb{Z}_0^+$,
valued in a finite-dimensional real linear space $W$, is a function $\omega : \mathbb{R}^n \to \Lambda_m(\mathbb{R}^n, W)$ for which the map
$x \mapsto \omega(x)(V)$ is of class $C^k$ from $\mathbb{R}^n$ to $W$ for all $V \in (\mathbb{R}^n)^m$.


A (local) $C^k$ naive differential form of degree $m$ on a set $U \in \mathrm{Top}(\mathbb{R}^n)$, with $n, m \in \mathbb{Z}_0^+$ and $k \in \mathbb{Z}_0^+$,
valued in a finite-dimensional real linear space $W$, is a function $\omega : U \to \Lambda_m(\mathbb{R}^n, W)$ for which the map
$x \mapsto \omega(x)(V)$ is of class $C^k$ from $U$ to $W$ for all $V \in (\mathbb{R}^n)^m$.


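46.6.8 NOTATION: Sets of differentiable vector-valued naive differential forms.
$X^k(\Lambda_m(\mathbb{R}^n, W))$, for $k \in \mathbb{Z}_0^+$ and $m, n \in \mathbb{Z}_0^+$, and a finite-dimensional real linear space $W$, denotes the set
of $C^k$ global $W$-valued naive differential forms with degree $m$ on a Cartesian space $\mathbb{R}^n$. In other words,


$\forall k \in \mathbb{Z}_0^+,\ \forall m, n \in \mathbb{Z}_0^+$,
$X^k(\Lambda_m(\mathbb{R}^n, W)) = \{\omega : \mathbb{R}^n \to \Lambda_m(\mathbb{R}^n, W);\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(\mathbb{R}^n, W)\}$.


$X^k(\Lambda_m(\mathbb{R}^n, W) \,|\, U)$, for $U \in \mathrm{Top}(\mathbb{R}^n)$, $k \in \mathbb{Z}_0^+$ and $m, n \in \mathbb{Z}_0^+$, and a finite-dimensional real linear space $W$,
denotes the set of $C^k$ local $W$-valued naive differential forms with degree $m$ on $U$. In other words,


$\forall k \in \mathbb{Z}_0^+,\ \forall m, n \in \mathbb{Z}_0^+,\ \forall U \in \mathrm{Top}(\mathbb{R}^n)$,
$X^k(\Lambda_m(\mathbb{R}^n, W) \,|\, U) = \{\omega : U \to \Lambda_m(\mathbb{R}^n, W);\ \forall V \in (\mathbb{R}^n)^m,\ (x \mapsto \omega(x)(V)) \in C^k(U, W)\}$.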




46.7. The exterior derivative 


46.7.1 REMARK: The exterior derivative requires integral calculus for its motivation. 

The exterior derivative is not defined in Chapters 27-30 on tensor algebra because differential calculus is 
required for its definition, and its motivation comes from integral calculus. Theorem 46.9.3 motivates the 
exterior derivative of a 1-form, and Remark 46.10.1 motivates the exterior derivative of a 2-form. (It is 
called the exterior differential by Guggenheimer [16], page 189; Frankel [12], page 73; Whitney [161], page 70; 
Choquet-Bruhat [6], page 60. This terminology seems more accurate, but is not widely adopted.)


For the exterior derivative, see Lang [23], pages 124-137; Gómez-Ruiz [14], pages 96-97; Guggenheimer [16],
pages 189-193; Gallot/Hulin/Lafontaine [13], pages 43-45; Crampin/Pirani [7], pages 120-126, 258-259;
Poor [32], pages 88, 150-151; Spivak [37], Volume 1, pages 210-215; Darling [8], pages 35-39; Sternberg [38],
page 53; Frankel [12], pages 73-77; Szekeres [305], pages 447-453; Willmore [4], pages 202-205; Penrose [297],
pages 231-233; Misner/Thorne/Wheeler [292], pages 90-94, 114-120; Bishop/Crittenden [2], pages 64-68;
Bishop/Goldberg [3], pages 167-169; Nash/Sen [30], pages 41-43; Whitney [161], pages 70-74; Flanders [11],
pages 20-22; Lovelock/Rund [27], pages 136-141, 344-349; Sulanke/Wintgen [40], pages 106-109; Choquet-
Bruhat [6], pages 59-63; Malliavin [28], pages 115-122; Federer [69], pages 351-353; Schutz [36], page 134;
Auslander/MacKenzie [1], pages 211-212; Bleecker [254], pages 10-11; Kobayashi/Nomizu [19], pages 7-8,
34-37; Struik [39], pages 206-207; EDM2 [113], pages 388-389; Drechsler/Mayer [262], pages 25-28.


The invention of the exterior derivative is generally credited to Élie Cartan in 1901. (See for example Misner/ 
Thorne/Wheeler [292], page 198; Sternberg [38], page 161; Penrose [297], page 231.) 


46.7.2 REMARK: Naive vector fields and differential forms are best for concrete differentiation. 

The tangent-bundle-based vector fields and differential forms in Sections 46.1 and 46.2 are superior from 
the philosophical point of view, as mentioned in Remark 46.3.1, but they are inconvenient for computations 
with derivatives, such as are required for the definitions and properties of the exterior derivative. Therefore 
it is the naive vector fields and differential forms in Sections 46.3 and 46.6 which are employed here. It is 
straightforward to translate all results back to the tangent-bundle-based definitions after the work is done. 


46.7.3 REMARK: Explicit definition of the exterior derivative.

Many authors define the exterior derivative either in terms of tuples of vector fields, or in terms of components 
of differential forms with respect to some basis, or as the unique operator which satisfies a small set of axioms. 
Definition 46.7.4 gives an explicit specification of the exterior derivative. (See Notation 30.4.2 for the space
$\Lambda_m(\mathbb{R}^n, W)$ of $W$-valued $m$-linear functions on the linear space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$. See Notation 46.6.8 for sets
$X^k(\Lambda_m(\mathbb{R}^n, W) \,|\, U)$ of local $C^k$ naive differential $m$-forms on the Cartesian space $\mathbb{R}^n$ for $k \in \mathbb{Z}_0^+$.)


The vector $(m+1)$-tuples $V = (V_\ell)_{\ell=0}^m \in (\mathbb{R}^n)^{m+1}$ are assumed to be indexed starting at 0 because this
makes some formulas simpler. The derivative operator $\partial_{V_\ell}$ means the directional derivative $\sum_{i=1}^n V_\ell^i \partial_i$ for
$\ell \in \mathbb{Z}[0, m]$. (See Notation 14.4.10 for $\mathbb{Z}[0, m]$. See Definition 14.12.6 (iv) for the “omit” operator, which
omits one specified element from a tuple. See Definitions 41.8.4 and 41.8.14 for partial and directional 
derivatives of vector-valued functions on Cartesian spaces respectively.) 


The exterior derivative is really a procedure or map-template, not a ZF function. The class of linear spaces W 
is not a ZF set. The target space W is quite often the real number system, but in the theory of connections, it 
is often the Lie algebra of the structure group of a differentiable fibre bundle. Definition 46.7.4 is meaningful 
if W is a Banach space, but such generality is not required here. 


The differentiable structure for a finite-dimensional real linear space W could be thought of as the set of all 
component maps, one for each basis. However, W is not regarded as a manifold in Definition 46.7.4. Instead, 
the directional derivative for vector-valued functions in Definition 41.8.14 is used for $\partial_{V_\ell}$ on line (46.7.1).
This requires only the linear space structure on W. 


The target space $X(\Lambda_{m+1}(\mathbb{R}^n, W) \,|\, U)$ for $d$ may be replaced by $X^0(\Lambda_{m+1}(\mathbb{R}^n, W) \,|\, U)$ in Definition 46.7.4.
This is justified by Theorem 40.7.9.


The tacit antisymmetric multilinearity assertion in Definition 46.7.4 that $d\omega \in X(\Lambda_{m+1}(\mathbb{R}^n, W) \,|\, U)$ for all
$\omega \in X^1(\Lambda_m(\mathbb{R}^n, W) \,|\, U)$ follows from the multilinearity of the terms $\partial_{V_\ell}\omega(x)(\mathrm{omit}_\ell(V))$ in line (46.7.1). The
expression $\omega(x)(\mathrm{omit}_\ell(V))$ is multilinear with respect to the $m$-tuple $\mathrm{omit}_\ell(V)$. The linearity with respect
to $V_\ell$ for fixed $\mathrm{omit}_\ell(V)$ follows from the linearity of the directional derivative operation in Definition 41.4.4. The
antisymmetry follows from the antisymmetry of $\omega$ combined with the factor $(-1)^\ell$.


46.7.4 DEFINITION: Exterior derivative in Cartesian spaces.

The exterior derivative for a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$, valued in a finite-dimensional real linear
space $W$, is the map $d : X^1(\Lambda_m(\mathbb{R}^n, W) \,|\, U) \to X(\Lambda_{m+1}(\mathbb{R}^n, W) \,|\, U)$, for degree $m \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$,
defined by


$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W) \,|\, U),\ \forall x \in U,\ \forall V \in (\mathbb{R}^n)^{m+1}$,


$(d\omega)(x)(V) = \sum_{\ell=0}^m (-1)^\ell \partial_{V_\ell}\omega(x)(\mathrm{omit}_\ell(V))$. (46.7.1)
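
Line (46.7.1) can be evaluated numerically for real-valued forms. The following is only a sketch, assuming $W = \mathbb{R}$ and central-difference directional derivatives. A form is represented by a hypothetical Python function omega(x, V), where x is a point and V is a list of m vectors. None of the names below are from this book.

```python
def omit(V, l):
    """Return the tuple V with the element at index l removed."""
    return V[:l] + V[l + 1:]


def directional_derivative(g, x, v, h=1e-4):
    """Central-difference derivative of the real function g at x along v."""
    plus = g([xi + h * vi for xi, vi in zip(x, v)])
    minus = g([xi - h * vi for xi, vi in zip(x, v)])
    return (plus - minus) / (2 * h)


def exterior_derivative(omega):
    """Map an m-form omega(x, V) to the (m+1)-form given by line (46.7.1)."""
    def d_omega(x, V):
        total = 0.0
        for l in range(len(V)):
            rest = omit(V, l)
            # The tuple `rest` is held fixed while x is differentiated along V[l].
            total += (-1) ** l * directional_derivative(
                lambda y: omega(y, rest), x, V[l])
        return total
    return d_omega


def one_form(x, V):
    # The 1-form -x2 dx1 + x1 dx2 on R^2, applied to the single vector V[0].
    v = V[0]
    return -x[1] * v[0] + x[0] * v[1]


def zero_form(x, V):
    # The 0-form f(x) = x1^2 * x2; the argument V is the empty tuple.
    return x[0] ** 2 * x[1]


e1, e2 = [1.0, 0.0], [0.0, 1.0]
x = [0.7, -0.4]
d_one = exterior_derivative(one_form)
value = d_one(x, [e1, e2])        # approximately 2
swapped = d_one(x, [e2, e1])      # approximately -2, by antisymmetry
dd_zero = exterior_derivative(exterior_derivative(zero_form))
closed = dd_zero(x, [e1, e2])     # approximately 0, since d(df) = 0
```

The value 2 for the 1-form reflects the convention of Definition 46.7.4, which has no factor $1/(m+1)$. The last value illustrates that applying the construction twice to a function gives zero, up to finite-difference rounding error.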


46.7.5 REMARK: Variant definitions for the exterior derivative. 
Some authors multiply the expression in line (46.7.1) by the factor 1/(m+1). (See for example Szekeres [305], 
pages 448-453; Helgason [17], page 21; Kobayashi/Nomizu [19], page 36.) 


46.7.6 REMARK: Interpretation of lower-degree exterior derivative maps.
Substitution of degree $m = 0$ in Definition 46.7.4 yields:


$\forall \omega \in X^1(\Lambda_0(\mathbb{R}^n, W) \,|\, U),\ \forall x \in U,\ \forall V_0 \in \mathbb{R}^n$,
$(d\omega)(x)(V_0) = \partial_{V_0}\omega(x)()$.
The empty parameter list “()” can be ignored by identifying $X^1(\Lambda_0(\mathbb{R}^n, W) \,|\, U)$ with $C^1(U, W)$. Thus
$\forall \omega \in C^1(U, W),\ \forall x \in U,\ \forall V_0 \in \mathbb{R}^n,\ (d\omega)(x)(V_0) = \partial_{V_0}\omega(x)$.


So $d\omega$ may be identified with the total differential of $\omega$ as in Notation 41.9.9 for abstract linear spaces $W$,
or with the total differential in Notation 41.6.8 if $W$ is a Cartesian space.


Substitution of degree $m = 1$ in Definition 46.7.4 yields:
$\forall \omega \in X^1(\Lambda_1(\mathbb{R}^n, W) \,|\, U),\ \forall x \in U,\ \forall V_0, V_1 \in \mathbb{R}^n$,
$(d\omega)(x)(V_0, V_1) = \partial_{V_0}\omega(x)(V_1) - \partial_{V_1}\omega(x)(V_0)$.
Substitution of degree $m = 2$ in Definition 46.7.4 yields:
$\forall \omega \in X^1(\Lambda_2(\mathbb{R}^n, W) \,|\, U),\ \forall x \in U,\ \forall V_0, V_1, V_2 \in \mathbb{R}^n$,
$(d\omega)(x)(V_0, V_1, V_2) = \partial_{V_0}\omega(x)(V_1, V_2) - \partial_{V_1}\omega(x)(V_0, V_2) + \partial_{V_2}\omega(x)(V_0, V_1)$.
Degrees $m = 0$, 1 and 2 are illustrated in Figure 46.7.1.


Figure 46.7.1 Interpretation of the exterior derivative


The antisymmetric multilinear form spaces $\Lambda_m(\mathbb{R}^n)$ are canonically isomorphic to the corresponding duals
$\Lambda_m(\mathbb{R}^n)^{**} = \Lambda^m(\mathbb{R}^n)^*$ of the spaces $\Lambda^m(\mathbb{R}^n)$ of wedge-products of $m$-tuples of vectors in $\mathbb{R}^n$. Therefore $\omega$
can be interpreted for $m = 2$ and $n = 3$ as a linear function of area elements, and $d\omega$ can be interpreted as
a linear function of volume elements. (See Remark 57.6.9 for further details for differentiable manifolds.)


46.7.7 REMARK: Covariance of the exterior derivative with respect to diffeomorphisms. 

To be a meaningful concept in the differential geometry context, the construction for the exterior derivative in 
Definition 46.7.4 must give the same value when the coordinates of the underlying point-space are transformed 
by a local diffeomorphism. Then the coordinates of the vectors $V_\ell$ in the vector-tuple $V = (V_\ell)_{\ell=0}^m$ must
also be transformed because a fixed vector has different components with respect to a different tuple of 
coordinate basis vectors. Covariance of the exterior derivative means that its value is independent of the 
choice of coordinates. The construction procedure for the exterior derivative must yield the same output 
when the coordinatisation of the point-space is changed. 


Let $\phi : U \to \tilde{U}$ be a $C^2$ diffeomorphism, where $U, \tilde{U} \in \mathrm{Top}(\mathbb{R}^n)$. Each point with component tuple $y \in U$
may be given an alternative component tuple $\tilde{y} = \phi(y) \in \tilde{U}$ in the transformed coordinate system. Then
each vector with component $n$-tuple $V_\ell \in \mathbb{R}^n$ in the $(m+1)$-tuple $V = (V_\ell)_{\ell=0}^m \in (\mathbb{R}^n)^{m+1}$ must have
the component $n$-tuple $\tilde{V}_\ell = J_\phi(y)V_\ell$ in the transformed coordinate system, where the Jacobian matrix
$J_\phi(y) = [J_\phi(y)^i{}_j]_{i,j=1}^n = [\partial_{y^j}\phi(y)^i]_{i,j=1}^n$ depends on $y$. (See Theorem 54.1.11 line (54.1.4) for justification
of this formula in the context of tangent bundles for differentiable manifolds.) Thus whereas the vector-
tuple $V$ is constant with respect to $x$ in Definition 46.7.4, it is variable with respect to $\tilde{x} = \phi(x)$ when a
diffeomorphism $\phi$ is applied (unless $\phi$ is an affine transformation as in Definition 24.4.6).


It is assumed in Definition 46.7.4 that the directional derivatives $\partial_{V_\ell}$ are computed for fixed vector-tuples $V$
so that there is no contribution from differentiating the expression “$\mathrm{omit}_\ell(V)$”. Thus there is an apparent
contradiction between the variability of the vector-tuples, due to the diffeomorphism’s variability, and the
assumed constancy of the vector-tuples in Definition 46.7.4. This is the motivation for Theorem 46.7.8,
which checks that if the transformed vector-tuples are fixed for the directional derivative computation, then
the same answer is obtained. The fact that this fixing of the vector-tuples makes no difference to the value
obtained can be seen, in the proof, to be due to the way in which the second-order terms with respect to $\phi$ are
cancelled by virtue of the factor $(-1)^\ell$.


The equation on line (46.7.2) is the covariance formula for a differential $m$-form $\omega$ which is subjected to a $C^2$
diffeomorphism $\phi$. Then line (46.7.3) compares the outputs $d\omega$ and $d\tilde{\omega}$ for the exterior derivatives of $\omega$ and
$\tilde{\omega}$ computed according to Definition 46.7.4. The assertion of the theorem is that these outputs are equal,
which means that Definition 46.7.4 has the same geometric meaning independent of $C^2$ diffeomorphisms of
the point-space. (This is why the exterior derivative is in differential layer 2 in Section 1.1.)
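
The covariance property can be spot-checked numerically for degree $m = 1$. The following sketch assumes a specific shear map on $\mathbb{R}^2$ as the diffeomorphism and a specific 1-form, both chosen only for illustration. The transformed form is obtained by solving the covariance formula for $\tilde{\omega}$, and derivatives are approximated by central differences. All names in the code are hypothetical.

```python
def directional_derivative(g, x, v, h=1e-5):
    """Central-difference derivative of the real function g at x along v."""
    plus = g([xi + h * vi for xi, vi in zip(x, v)])
    minus = g([xi - h * vi for xi, vi in zip(x, v)])
    return (plus - minus) / (2 * h)


def d_one_form(omega):
    """Exterior derivative of a 1-form omega(x, v), the degree-1 case."""
    def d_omega(x, V0, V1):
        return (directional_derivative(lambda y: omega(y, V1), x, V0)
                - directional_derivative(lambda y: omega(y, V0), x, V1))
    return d_omega


def phi(x):
    # Assumed diffeomorphism of R^2: a smooth shear.
    return [x[0] + x[1] ** 2, x[1]]


def phi_inverse(y):
    return [y[0] - y[1] ** 2, y[1]]


def jacobian_apply(x, v):
    # J_phi(x) v for the shear above.
    return [v[0] + 2.0 * x[1] * v[1], v[1]]


def jacobian_inverse_apply(x, w):
    # J_phi(x)^{-1} w for the shear above.
    return [w[0] - 2.0 * x[1] * w[1], w[1]]


def omega(x, v):
    # The 1-form -x2 dx1 + x1 dx2 in the original coordinates.
    return -x[1] * v[0] + x[0] * v[1]


def omega_tilde(y, w):
    # The transformed form, chosen so that omega(x)(V) = omega_tilde(phi(x))(J V).
    x = phi_inverse(y)
    return omega(x, jacobian_inverse_apply(x, w))


x = [0.7, -0.4]
V0, V1 = [1.0, 2.0], [-0.5, 0.3]
lhs = d_one_form(omega)(x, V0, V1)
rhs = d_one_form(omega_tilde)(
    phi(x), jacobian_apply(x, V0), jacobian_apply(x, V1))
```

The two values agree, even though the vector-tuple is held fixed in each coordinate system separately while the directional derivatives are computed. A single example of this kind is only a consistency check, not a substitute for the proof.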


46.7.8 THEOREM: Covariance of the exterior derivative. 
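Let $n, m \in \mathbb{Z}_0^+$. Let $U, \tilde{U} \in \mathrm{Top}(\mathbb{R}^n)$. Let $\phi : U \to \tilde{U}$ be a $C^2$ diffeomorphism. Let $W$ be a finite-dimensional
real linear space. Let $\omega \in X^1(\Lambda_m(\mathbb{R}^n, W) \,|\, U)$ and $\tilde{\omega} \in X^1(\Lambda_m(\mathbb{R}^n, W) \,|\, \tilde{U})$ satisfy


$\forall x \in U,\ \forall V \in (\mathbb{R}^n)^m,\quad \omega(x)(V) = \tilde{\omega}(\phi(x))\big((J_\phi(x)V_\ell)_{\ell=0}^{m-1}\big)$
$\qquad = \tilde{\omega}(\phi(x))(J_\phi(x)V)$, (46.7.2)


where $J_\phi(x) = [J_\phi(x)^i{}_j]_{i,j=1}^n = [\partial_{x^j}\phi(x)^i]_{i,j=1}^n$ for all $x \in U$, and $J_\phi(x)V$ denotes the tuple $(J_\phi(x)V_\ell)_{\ell=0}^{m'-1}$
for any $V \in (\mathbb{R}^n)^{m'}$, for any $m' \in \mathbb{Z}_0^+$. Then


$\forall x \in U,\ \forall V \in (\mathbb{R}^n)^{m+1}$,


$\sum_{\ell=0}^m (-1)^\ell \sum_{k=1}^n V_\ell^k \partial_{x^k}\omega(x)(\mathrm{omit}_\ell(V)) = \sum_{\ell=0}^m (-1)^\ell \sum_{k=1}^n (J_\phi(x)V_\ell)^k \partial_{y^k}\tilde{\omega}(y)(\mathrm{omit}_\ell(J_\phi(x)V))\big|_{y=\phi(x)}$. (46.7.3)


Hence the exterior derivative in Definition 46.7.4 is covariant under $C^2$ diffeomorphisms. In other words,
$\forall x \in U,\ \forall V \in (\mathbb{R}^n)^{m+1},\ (d\omega)(x)(V) = (d\tilde{\omega})(\phi(x))(J_\phi(x)V)$
whenever $\omega$ and $\tilde{\omega}$ satisfy line (46.7.2).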
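

PROOF: Let $\omega$ and $\tilde{\omega}$ satisfy line (46.7.2). Let $y \in \tilde{U}$ and $\tilde{V} \in (\mathbb{R}^n)^m$. Let $x = \phi^{-1}(y)$ and $V = J_\phi(x)^{-1}\tilde{V}$.
Then $\tilde{\omega}(y)(\tilde{V}) = \tilde{\omega}(\phi(x))(J_\phi(x)V) = \omega(x)(V) = \omega(\phi^{-1}(y))(J_\phi(\phi^{-1}(y))^{-1}\tilde{V})$. Thus


$\forall y \in \tilde{U},\ \forall \tilde{V} \in (\mathbb{R}^n)^m,\quad \tilde{\omega}(y)(\tilde{V}) = \omega(\phi^{-1}(y))(J_\phi(\phi^{-1}(y))^{-1}\tilde{V})$. (46.7.4)


Let $x \in U$ and $V \in (\mathbb{R}^n)^{m+1}$. Let $\tilde{V} = J_\phi(x)V$. Then for all $\ell \in \mathbb{Z}[0, m]$, line (46.7.4) implies


$\forall y \in \tilde{U},\quad \sum_{k=1}^n (J_\phi(x)V_\ell)^k \partial_{y^k}\tilde{\omega}(y)(\mathrm{omit}_\ell(J_\phi(x)V)) = \sum_{k=1}^n \tilde{V}_\ell^k \partial_{y^k}\big(\omega(\phi^{-1}(y))(\mathrm{omit}_\ell(J_\phi(\phi^{-1}(y))^{-1}\tilde{V}))\big)$. (46.7.5)
(Note that on the left side of this equation, the differential operator $\partial_{y^k}$ acts only on the “$y$” in $\tilde{\omega}(y)$ because
the $m$-tuple $\mathrm{omit}_\ell(J_\phi(x)V)$ is fixed relative to $y$, whereas on the right side, the $m$-tuple $\mathrm{omit}_\ell(J_\phi(\phi^{-1}(y))^{-1}\tilde{V})$
does depend on $y$ because the diffeomorphism $\phi^{-1}$ maps fixed vectors to variable vectors.)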

For y € Ü, let Jely) = Jel t (y)) t. Then Jely) = [397 (y)]?;-, by Theorem 41.7.4 (the chain rule) 
applied to 9 o 9^! and $^! o ¢. So by line (46.7.5), for all £ € Z[0, m], 


25 GV) 0,8) omit Gr)V)),. aa) = = VEO pA HY) (omit WV pote 


= Y. VEO (67 U) omit (662) ))|, uy + 92 VA OyoGr) (mito) V), ay 
k=1 k=1 
= E Y ds GG), (omit, (6())7)) + YS Tia. (46.7.6) 


where Te a is defined for £, o. € Z[0, m] with a # £ by 
Mam a Vf wle) Vea) y-ga) 


where Vra = (Ve,a,j)j20 € (R”)™ is defined by 


Jely) V; ifjcfandjZa 
Jely) Vj+ı ifj>€andj+lF#a 
OyJo(y)V; fj < Land j=a 


yr Jely) Visa if j >€andj+l=a. 


Vj € Z[0, m — 1], Veo. j = 


Thus the m-tuple Vz a,j is the same as omite(Jy(@(a))V) except that component a is differentiated with 
respect to y" before component / is omitted. (The differentiation of the arguments Jo (y)V; of w(x) follows 
from the multilinearity of w(x).) 

Let à = $71, and let ó';,(y) = 8,0,:(y)! for all y € U. Then dy Jely) V; = (xd dios) VP). Then 
by the multilinearity of w(x), for all £4, € Z[0, m] with a # £, 


Tha = MW E Op Jo) Va wl) Via) y-a) 
DA OOVA 


T 
m 
" 
ll 
m 
"3 
Il 
un 


where Vi, = (Vi, ;)7* € (IR")" is defined for i € Na by 


Jely) V; ifj«fandj zo 
Joly)Vjy1 iff > Cand j+14a 
€i if j < land j = a 
€i if j > land j+1=a. 


Vj € Z[0,m — 1], Viaj = 

It then follows from the antisymmetry of $\omega(x)$ that $T_{a,\ell} = (-1)^{\ell-a-1} T_{\ell,a}$ for all $\ell, a \in \mathbb{Z}[0, m]$ with $a \ne \ell$.
To see this, first suppose that a < £. Then Vins = Viaj for j < a and j > £, and the remaining £ — o 
components suffer a one-step rotation, which is equivalent to ( — & — 1 transpositions. (See Definition 14.8.14 
for transpositions.) Therefore a factor of (—1)’~°! is incurred. If / < o, then an equal factor (—1)9-*-! is 
incurred. It then follows that 


SCM! X m. Y Cols 


i axl oz 


Therefore $77 9(—1)* ei 0,a4¢0 Tea = 0. It then follows from line (46.7.6) that 


LAMY Qs) a Dl) omits()V))|, s) 
= E Cn! D VEY OE) rbola) omit Aa) 
= ECn' 3 Vioc) (omit(V)) 


which verifies the assertion. 


46.7.9 THEOREM: Differentiability of the exterior derivative of a differential form.
Let $n \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $W$ be a finite-dimensional real linear space. Then

$$\forall m \in \mathbb{Z}_0^+,\ \forall k \in \mathbb{Z}_0^+,\ \forall \omega \in X^{k+1}(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\quad
d\omega \in X^k(\Lambda_{m+1}(\mathbb{R}^n, W)\,|\,U),$$

where $d$ is the exterior derivative map in Definition 46.7.4.


PROOF: Let $\omega \in X^{k+1}(\Lambda_m(\mathbb{R}^n, W)\,|\,U)$. Then by Notation 46.6.8, the map $x \mapsto \omega(x)(V)$ is of class $C^{k+1}$
from $U$ to $W$ for all $V \in (\mathbb{R}^n)^m$. Thus $\forall V \in (\mathbb{R}^n)^m$, $\omega(\cdot)(V) \in C^{k+1}(U, W)$. Let $V \in (\mathbb{R}^n)^{m+1}$. Then
$\omega(\cdot)(\mathrm{omit}_\ell(V)) \in C^{k+1}(U, W)$ for $\ell \in \mathbb{Z}[0,m]$. Therefore $\partial_{V_\ell}\omega(\cdot)(\mathrm{omit}_\ell(V)) \in C^k(U, W)$ for $\ell \in \mathbb{Z}[0,m]$ by
Theorem 42.5.16 (vii). (This assumes that the differentiable structure on $W$ is defined by some component
map. As mentioned in Remark 46.7.3, the component map choice does not affect $C^k$ classes.) It then
follows that $(d\omega)(\cdot)(V) \in C^k(U, W)$ by Definition 46.7.4 and the sum rule for $C^k$ differentiability. Hence
$d\omega \in X^k(\Lambda_{m+1}(\mathbb{R}^n, W)\,|\,U)$ by Notation 46.6.8.


46.7.10 REMARK: The exterior derivative applied to tuples of unit vectors.
Theorem 46.7.11 specialises Definition 46.7.4 to the chart basis vectors for the standard coordinate chart $\mathrm{id}_U$
for open subsets $U$ of Cartesian spaces $\mathbb{R}^n$.


The basis $(e_i)_{i=1}^n$ in Theorem 46.7.11 is the standard basis for a Cartesian linear space. (See Definition 22.7.9.)
The notation $e_\alpha$ for $\alpha \in (\mathbb{N}_n)^m$ means the same as $e \circ \alpha$ or $(e_{\alpha_i})_{i=0}^{m-1}$. The “omit” operator omits one element
from a tuple. (See Definition 14.12.6 (iv).)

The multilinearity properties of differential forms imply that the action of the exterior derivative on chart
basis vectors in Theorem 46.7.11 uniquely determines its action for general vector tuples. So this action is
a kind of coordinatisation of the exterior derivative. It could in fact be used as an alternative definition.


46.7.11 THEOREM: Application of exterior derivative to chart-basis vector-tuples.
Let $n, m \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $W$ be a finite-dimensional real linear space. Then the exterior
derivative $d : X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U) \to X^0(\Lambda_{m+1}(\mathbb{R}^n, W)\,|\,U)$ satisfies

$$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\ \forall x \in U,\ \forall \alpha \in (\mathbb{N}_n)^{m+1},\quad
(d\omega)(x)(e_\alpha) = \sum_{\ell=0}^m (-1)^\ell\, \partial_{\alpha_\ell}\bigl(\omega(x)\bigl((e_{\alpha_i})_{i=0,\ i\ne\ell}^m\bigr)\bigr), \tag{46.7.7}$$

where $(e_i)_{i=1}^n$ is the usual coordinate basis for $\mathbb{R}^n$. In other words,

$$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\ \forall x \in U,\ \forall \alpha \in (\mathbb{N}_n)^{m+1},\quad
(d\omega)(x)(e_\alpha) = \sum_{\ell=0}^m (-1)^\ell\, \partial_{\alpha_\ell}\omega(x)(\mathrm{omit}_\ell(e_\alpha)). \tag{46.7.8}$$


PROOF: Since all of the vector field tuples $(e_{\alpha_i})_{i=0}^m$ are constant with respect to $x$, the assertion follows
directly from Definition 46.7.4.
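
As an informal numerical illustration of line (46.7.8), the following sketch evaluates its right-hand side for m = 1, n = 3 and W = ℝ, with central differences standing in for the exact partial derivatives. The 1-form `omega`, the helper names and the step size `h` are assumptions chosen for this example only, and indices are zero-based in the code.

```python
def omega(x):
    """Hypothetical 1-form on R^3, given by its components (w_0, w_1, w_2) at x."""
    x0, x1, x2 = x
    return (x1 * x2, x0 * x0, x0 + x1 * x2 * x2)


def partial(f, x, i, h=1e-5):
    """Central-difference estimate of the partial derivative of f along e_i at x."""
    xp = list(x)
    xm = list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)


def d_omega_on_basis(x, alpha):
    """Right-hand side of (46.7.8) for m = 1 and alpha = (alpha_0, alpha_1)."""
    total = 0.0
    for l in range(2):
        kept = alpha[1 - l]  # omit_l(e_alpha) leaves the single vector e_kept
        total += (-1) ** l * partial(lambda y: omega(y)[kept], x, alpha[l])
    return total


x = (1.0, 2.0, 3.0)
print(d_omega_on_basis(x, (0, 1)))  # estimates d_0 w_1 - d_1 w_0 at x
```

For m = 1 the sum has only two terms, so the result is the familiar antisymmetric combination of first derivatives of the components of the form.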


46.7.12 REMARK: Interpretation of the exterior derivative as the limit of boundary integrals. 

The exterior derivative $d\omega$ of a form $\omega$ of degree $m$ may be interpreted as the infinitesimal limit of the
integral of $\omega$ over the m-dimensional boundary of an (m+1)-dimensional submanifold divided by the (m+1)-dimensional
area of the submanifold as it shrinks to a point. The Stokes theorem supports this interpretation.
The definition of exterior derivative is designed to make the Stokes theorem valid. One may regard the 
Stokes theorem as the definition of the exterior derivative. Defining the exterior derivative in this way is 
more satisfying than pulling it out of a hat. (See Section 46.9 for motivation for the exterior derivative in 
terms of the Stokes formula.) 


The exterior derivative of differential forms of degree 1 is also important for the definition of curvature of 
connection forms on principal bundles. (See Definition 70.5.2.) In this application, the input is a Lie algebra 
valued 1-form, and the output is a 2-form valued in the same Lie algebra. Since the curvature application is 
limited to the differentiation of 1-forms, this would not justify the exterior derivative for general m-forms. 


46.8. The exterior derivative for vector field arguments 


46.8.1 REMARK: Extension of the exterior derivative to vector-field tuples.

Definition 46.7.4 gives the exterior derivative of an antisymmetric multilinear form field by extending tuples 
of vectors to constant vector fields so that a naive directional derivative can be applied to them. In general, 
naive derivatives do not yield tensorial objects, but Theorem 46.7.8 shows that the non-tensorial terms which 
refer to second-order derivatives of point-space diffeomorphisms are cancelled by the special form of exterior 
derivative. (This is analogous to the cancellation which occurs for the Lie bracket in Definition 61.5.7.) 


Theorem 46.7.8 shows that the procedure for the computation of the exterior derivative in Definition 46.7.4,
which extends vectors to constant vector fields, is immune to the distortions of these vector fields by $C^2$
diffeomorphisms. This suggests that the constancy of the vector fields might be an unnecessary constraint,
which could be removed while still producing the same output values. In fact, Theorem 46.7.8 only shows
that the constant vector fields can be replaced by the basis vector fields of any $C^2$ chart for the domain.
Such “holonomic” vector-field tuples have the property that their pairwise Lie bracket is everywhere zero.


The extension of the exterior derivative from constant extensions of vector tuples to general tuples of vector
fields is no doubt motivated at least partly by the desire to have a definition which is apparently liberated
from coordinate charts. Apart from the philosophical motivations, there are also concrete benefits from
the ability to evaluate the exterior derivative for nonholonomic tuples of vector fields. There are some
situations where nonholonomic vector fields are easily available, but replacing them with equivalent holonomic
vector fields would be inconvenient. (For example, differentiable principal bundles have “fundamental vector fields” in
Definition 66.6.2 which typically do not commute with each other.)

Definition 46.8.3 applies the computational procedure in Definition 46.7.4 to vector field tuples. In the case 
of holonomic vector field tuples, the output of the procedure is the same. (This is shown in Theorem 46.8.5.) 
For nonholonomic vector fields, some extraneous terms appear. (This is shown in Theorem 46.8.8.) These 
must be cancelled in order to recover the correct value for the exterior derivative.


46.8.2 REMARK: Conversion of differential forms to functions of vector fields.

To extend the procedure in Definition 46.7.4 to families of vector fields, the functional form of differential
forms must be adjusted so that they have parameters which are vector-field tuples instead of vector tuples.
The uncorrected exterior derivative “$\hat d\omega$” for vector fields in Definition 46.8.3 effectively transposes the
parameters in the exterior derivative $d\omega$ for vectors. Whereas $(d\omega)(x)(V)$ acts on a vector tuple $V$ at each
point $x \in U$, the vector field version $(\hat d\omega)(X)(x)$ first constructs a function from $U$ to $W$ from a vector field
tuple $X$, and then restricts the result to a particular point $x \in U$.


To place the input $\omega$ and the output $\hat d\omega$ on the same footing, the naive differential form $\omega : U \to \Lambda_m(\mathbb{R}^n, W)$
may be converted to a map $\tilde\omega : X^1(\mathbb{R}^n\,|\,U)^m \to (U \to W)$ defined by $\tilde\omega(X)(x) = \omega(x)(X(x))$ for all
$X \in X^1(\mathbb{R}^n\,|\,U)^m$ and $x \in U$. Then $\tilde\omega$ encapsulates the same information as $\omega$. (The original information
can be recovered by choosing $X^V \in (U \to \mathbb{R}^n)^m$ with $X^V_\alpha(x) = V_\alpha$ for all $\alpha \in \mathbb{Z}[0, m-1]$ and $x \in U$.)
Then the input for $\hat d$ is a function of the form $\tilde\omega : X^1(\mathbb{R}^n\,|\,U)^m \to (U \to W)$, while the output is a function
$\hat d\omega : X^1(\mathbb{R}^n\,|\,U)^{m+1} \to (U \to W)$. In this way, pointwise differential forms are converted into maps from
naive vector-field tuples to W-valued functions on the point space $U$.


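46.8.3 DEFINITION: Uncorrected exterior derivative for vector fields on Cartesian spaces.

The uncorrected exterior derivative for vector fields on a Cartesian space $\mathbb{R}^n$, valued in a finite-dimensional
real linear space $W$, is the map $\hat d : X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U) \to (X^1(\mathbb{R}^n\,|\,U)^{m+1} \to (U \to W))$, for degree
$m \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}_0^+$, which is defined by

$$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\ \forall X = (X_\alpha)_{\alpha=0}^m \in X^1(\mathbb{R}^n\,|\,U)^{m+1},\ \forall x \in U,$$

$$(\hat d\omega)(X)(x) = \sum_{\ell=0}^m (-1)^\ell \sum_{i=1}^n X_\ell(x)^i\, \partial_{y^i}\bigl(\omega(y)(\mathrm{omit}_\ell(X)(y))\bigr)\Big|_{y=x}. \tag{46.8.1}$$

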
46.8.4 REMARK: Recovering the exterior derivative from the uncorrected vector-field exterior derivative. 
Theorem 46.8.5 asserts that the pointwise exterior derivative in Definition 46.7.4 can be recovered from the 
uncorrected vector-field exterior derivative in Definition 46.8.3 by applying it to constant vector fields. 


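46.8.5 THEOREM: Application of the uncorrected exterior derivative to constant vector fields.
Let $n, m \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $W$ be a finite-dimensional real linear space. Then

$$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\ \forall x \in U,\ \forall V \in (\mathbb{R}^n)^{m+1},\quad
(d\omega)(x)(V) = (\hat d\omega)(X^V)(x),$$

where $X^V = (X^V_\alpha)_{\alpha=0}^m \in X^1(\mathbb{R}^n\,|\,U)^{m+1}$ is defined for $V = (V_\alpha)_{\alpha=0}^m \in (\mathbb{R}^n)^{m+1}$ by
$\forall \alpha \in \mathbb{Z}[0,m]$, $\forall x \in U$, $X^V_\alpha(x) = V_\alpha$.


PROOF: The assertion follows by substituting $X^V$ into $(\hat d\omega)(X^V)(x)$ in Definition 46.8.3, which gives the
same expression as in Definition 46.7.4 for $(d\omega)(x)(V)$.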


46.8.6 REMARK: The effect of the uncorrected vector-field exterior derivative on general vector fields.
When Definition 46.8.3 is applied to general vector field tuples $X = (X_\alpha)_{\alpha=0}^m \in X^1(\mathbb{R}^n\,|\,U)^{m+1}$, there are
two instances of the variable $y$ in line (46.8.1) which contribute terms to the result. The first instance
reproduces the pointwise exterior derivative as in Theorem 46.8.5, but the second instance produces terms
which depend on the Lie brackets of pairs of fields. These are “error terms” which must be subtracted from
the naive uncorrected vector-field exterior derivative to recover the correct exterior derivative.


It is unsurprising that the application of the uncorrected exterior derivative to general vector fields produces
some commutators of actions of vector fields, i.e. Lie brackets, because the exterior derivative is constructed
from differences of actions of vector fields. The Lie bracket $[X_i, X_j]$ in Theorem 46.8.8 is introduced in
Definition 46.4.7 and Notation 46.4.8. It transforms vectorially under diffeomorphisms by Theorem 46.4.10,
and by Theorem 46.4.11, $[X_i, X_j] \in X^0(U, \mathbb{R}^n)$ if $X_i, X_j \in X^1(U, \mathbb{R}^n)$.


46.8.7 REMARK: Tuple concatenation for parameters of differential forms.

The expression “$([X_k, X_\ell](x), \mathrm{omit}_{k,\ell}(X(x)))$” in line (46.8.2) means $\mathrm{concat}(([X_k, X_\ell](x)), \mathrm{omit}_{k,\ell}(X(x)))$,
which is the concatenation of the 1-tuple $([X_k, X_\ell](x))$ with the (m - 1)-tuple $\mathrm{omit}_{k,\ell}(X(x))$. It is convenient
to indicate such tuple concatenations by a comma rather than with a formal functional notation. In many
contexts, this kind of informal comma-notated concatenation is not ambiguous.


46.8.8 THEOREM: Application of the uncorrected exterior derivative to general vector fields.
Let $n, m \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $W$ be a finite-dimensional real linear space. Then

$$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\ \forall X \in X^1(\mathbb{R}^n\,|\,U)^{m+1},\ \forall x \in U,$$

$$(\hat d\omega)(X)(x) = (d\omega)(x)(X(x)) - \sum_{k,\ell=0,\ k<\ell}^{m} (-1)^{k+\ell}\, \omega(x)\bigl([X_k, X_\ell](x), \mathrm{omit}_{k,\ell}(X(x))\bigr), \tag{46.8.2}$$

where $X(x) = (X_\alpha(x))_{\alpha=0}^m$ for all $x \in U$.


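PROOF: The expression $\partial_{y^i}\bigl(\omega(y)(\mathrm{omit}_\ell(X)(y))\bigr)\big|_{y=x}$ in Definition 46.8.3 line (46.8.1) can be expressed as the
sum $\partial_{y^i}\omega(y)(\mathrm{omit}_\ell(X)(x))\big|_{y=x} + \partial_{y^i}\omega(x)(\mathrm{omit}_\ell(X)(y))\big|_{y=x}$ of derivatives where one instance of $y$ is held
constant in each term. Considering the second term first, the multilinearity of $\omega(x)$ implies that the action
of the naive differential operator $\partial_{y^i}$ on $\omega(x)(\mathrm{omit}_\ell(X)(y))$ yields

$$\partial_{y^i}\omega(x)(\mathrm{omit}_\ell(X)(y)) = \sum_{j\ne\ell} \omega(x)\bigl(\mathrm{omit}_\ell(\mathrm{subs}_{j,\partial_i X_j(y)}(X)(y))\bigr).$$

Thus $\partial_i X_j(y)$ is substituted for the parameter $X_j(y)$ of $\omega(x)$ for each $j \in \mathbb{Z}[0,m]$ in turn, omitting $j = \ell$.
(See Definition 14.12.6 (vii) for the list substitution operator “subs”.) Therefore

$$\partial_{y^i}\omega(x)(\mathrm{omit}_\ell(X)(y)) = \sum_{j=0}^{\ell-1} (-1)^j\, \omega(x)\bigl(\partial_i X_j(y), \mathrm{omit}_{j,\ell}(X)(y)\bigr) + \sum_{j=\ell+1}^{m} (-1)^{j-1}\, \omega(x)\bigl(\partial_i X_j(y), \mathrm{omit}_{j,\ell}(X)(y)\bigr)$$

because the parameters with $j < \ell$ must be transposed $j$ times to move into the first parameter position,
whereas parameters with $j > \ell$ must be transposed $j - 1$ times to move into the first parameter position.
(See Definition 14.12.6 (v) for the two-element-omission list-operator “omit”.) Therefore

$$\begin{aligned}
\sum_{\ell=0}^m (-1)^\ell \sum_{i=1}^n X_\ell(x)^i\, \partial_{y^i}\omega(x)(\mathrm{omit}_\ell(X)(y))\Big|_{y=x}
&= \sum_{\ell=0}^m (-1)^\ell \sum_{i=1}^n X_\ell(x)^i \Bigl(\sum_{j=0}^{\ell-1} (-1)^j\, \omega(x)\bigl(\partial_i X_j(x), \mathrm{omit}_{j,\ell}(X)(x)\bigr) + \sum_{j=\ell+1}^{m} (-1)^{j-1}\, \omega(x)\bigl(\partial_i X_j(x), \mathrm{omit}_{j,\ell}(X)(x)\bigr)\Bigr) \\
&= \sum_{\ell=0}^m (-1)^\ell \sum_{j=0}^{\ell-1} (-1)^j\, \omega(x)\bigl([X_\ell, X_j](x), \mathrm{omit}_{j,\ell}(X)(x)\bigr) \\
&= -\sum_{k,\ell=0,\ k<\ell}^{m} (-1)^{k+\ell}\, \omega(x)\bigl([X_k, X_\ell](x), \mathrm{omit}_{k,\ell}(X)(x)\bigr).
\end{aligned}$$

In the sum $\sum_{\ell=0}^m (-1)^\ell \sum_{i=1}^n X_\ell(x)^i\, \partial_{y^i}\bigl(\omega(y)(\mathrm{omit}_\ell(X)(y))\bigr)\big|_{y=x}$ in line (46.8.1), the contribution of the term
$\partial_{y^i}\omega(y)(\mathrm{omit}_\ell(X)(x))\big|_{y=x}$ yields the exterior derivative $(d\omega)(x)(X(x))$ as in Definition 46.7.4. Combining
these two terms then verifies line (46.8.2).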


46.8.9 REMARK: Correction of the vector-field exterior derivative. 

It would be desirable to make a small adjustment to the uncorrected exterior derivative in Definition 46.8.3 so
that it gives the correct answer for all $C^1$ vector-field tuples. However, the simplest procedure for computing
the exterior derivative seems to be to solve equation (46.8.2) in Theorem 46.8.8 for the correct exterior
derivative $d\omega$. This can be done by first defining a “corrected” vector-field version of the exterior derivative,
and then observing that the vector version can be recovered from it, as in Theorem 46.8.12.


The complexity of Definition 46.8.10 can be avoided in several ways.
(1) The vector fields $X_i$ can be restricted to be holonomic. In other words, they can be the basis vector
fields of some local $C^2$ chart.

(2) The computation in equation (46.8.3) can be modified so that $y$ is replaced with $x$ in the sub-expression
$\mathrm{omit}_\ell(X)(y)$ so that effectively Definition 46.7.4 is being used instead.


(3) Ignore vector fields and just use Definition 46.7.4 for vector tuples instead.


In many situations, none of these ways of avoiding the Lie bracket terms is convenient. However, as
with many concepts in differential geometry, it is not the computational algorithm which is used directly in
applications, but rather the abstract properties of the concept. (Engineers might say that it is the interface
which matters, not the implementation.) In practice, it is the cases $m = 0$ and $m = 1$ which are typically the
most useful. The case $m = 0$ is equivalent to the differential of a function, and in case $m = 1$, the formula
in Definition 46.8.10 reduces to $(d\omega)(X_0, X_1)(x) = \partial_{X_0(x)}(\omega \circ X_1) - \partial_{X_1(x)}(\omega \circ X_0) - \omega([X_0, X_1](x))$, or
something of that ilk. More briefly, $(d\omega)(X_0, X_1) = \partial_{X_0}(\omega \circ X_1) - \partial_{X_1}(\omega \circ X_0) - \omega([X_0, X_1])$.


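46.8.10 DEFINITION: Corrected exterior derivative for vector fields on Cartesian spaces.

The exterior derivative for vector fields on a Cartesian space $\mathbb{R}^n$, valued in a finite-dimensional real linear
space $W$, is the map $d : X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U) \to (X^1(\mathbb{R}^n\,|\,U)^{m+1} \to (U \to W))$, for degree $m \in \mathbb{Z}_0^+$ and
$U \in \mathrm{Top}(\mathbb{R}^n)$ with $n \in \mathbb{Z}_0^+$, which is defined by

$$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\ \forall X = (X_\alpha)_{\alpha=0}^m \in X^1(\mathbb{R}^n\,|\,U)^{m+1},\ \forall x \in U,$$

$$(d\omega)(X)(x) = \sum_{\ell=0}^m (-1)^\ell \sum_{i=1}^n X_\ell(x)^i\, \partial_{y^i}\bigl(\omega(y)(\mathrm{omit}_\ell(X)(y))\bigr)\Big|_{y=x} + \sum_{k,\ell=0,\ k<\ell}^{m} (-1)^{k+\ell}\, \omega(x)\bigl([X_k, X_\ell](x), \mathrm{omit}_{k,\ell}(X(x))\bigr). \tag{46.8.3}$$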


46.8.11 REMARK: Equivalence of the vector-field extension of the exterior derivative.
Theorem 46.8.12 asserts that the vector-field version of the exterior derivative in Definition 46.8.10 is a valid
extension of the vector version in Definition 46.7.4. Note that line (46.8.4) is not a pure tautology. The
appearance of a vector field $X$ as the first argument of “$(d\omega)(X)(x)$” signifies that “$d\omega$” is the vector-field
exterior derivative in Definition 46.8.10 in this expression, whereas “$d\omega$” in the expression “$(d\omega)(x)(X(x))$”
refers to the vector version in Definition 46.7.4 because of the classes of its arguments.


The most important observation to make regarding Theorem 46.8.12 is that it shows that for the corrected
vector-field version $d\omega$ of the exterior derivative in Definition 46.8.10, $(d\omega)(X)(x)$ depends only on $X(x)$,
independent of all possible variability of the vector-field tuple $X$. This implies that the pointwise values of
the exterior derivative can be recovered from the corrected $(d\omega)(X)$, whereas this can only be done in this
way for the uncorrected exterior derivative $(\hat d\omega)(X)$ in Definition 46.8.3 if holonomic vector fields, such as
the constant vector fields in Theorem 46.8.5, are used for $X$.


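46.8.12 THEOREM: Equivalence of vector-field and constant-vector versions of the exterior derivative.
Let $n, m \in \mathbb{Z}_0^+$. Let $U \in \mathrm{Top}(\mathbb{R}^n)$. Let $W$ be a finite-dimensional real linear space. Then

$$\forall \omega \in X^1(\Lambda_m(\mathbb{R}^n, W)\,|\,U),\ \forall X \in X^1(\mathbb{R}^n\,|\,U)^{m+1},\ \forall x \in U,\quad
(d\omega)(X)(x) = (d\omega)(x)(X(x)), \tag{46.8.4}$$

where $X(x) = (X_\alpha(x))_{\alpha=0}^m$ for all $x \in U$.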


PROOF: The assertion follows from Theorem 46.8.8 and Definition 46.8.10. 
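
The case m = 1 of Theorems 46.8.8 and 46.8.12 can be checked numerically. The following sketch is only an illustration under stated assumptions: the 1-form `omega`, the non-constant fields `X0` and `X1`, the finite-difference step `h` and all function names are hypothetical, and the Lie bracket is assumed to follow the convention $[X, Y] = \partial_X Y - \partial_Y X$. It compares the pointwise exterior derivative with the uncorrected and corrected vector-field versions on ℝ² with W = ℝ.

```python
def partial(f, x, i, h=1e-5):
    """Central-difference estimate of the i-th partial derivative of f at x."""
    xp = list(x)
    xm = list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)


def directional(f, x, v):
    """Naive directional derivative: sum_i v^i * d_i f(x)."""
    return sum(v[i] * partial(f, x, i) for i in range(len(x)))


def omega(x):
    """Hypothetical 1-form on R^2, given by its components at x."""
    return (x[0] * x[1], x[0] + x[1] * x[1])


def pair(w, v):
    """Apply a covector w to a vector v."""
    return sum(wi * vi for wi, vi in zip(w, v))


def X0(x):
    return (x[1], 1.0)          # hypothetical non-constant vector field


def X1(x):
    return (1.0, x[0] * x[1])   # hypothetical non-constant vector field


def d_pointwise(x, v0, v1):
    """Definition 46.7.4 for m = 1: the vectors v0 and v1 are held constant."""
    return (directional(lambda y: pair(omega(y), v1), x, v0)
            - directional(lambda y: pair(omega(y), v0), x, v1))


def d_uncorrected(x):
    """Line (46.8.1) for m = 1: the fields are evaluated at the moving point y."""
    return (directional(lambda y: pair(omega(y), X1(y)), x, X0(x))
            - directional(lambda y: pair(omega(y), X0(y)), x, X1(x)))


def lie_bracket(x):
    """[X0, X1](x) under the assumed convention d_{X0} X1 - d_{X1} X0."""
    return tuple(directional(lambda y: X1(y)[k], x, X0(x))
                 - directional(lambda y: X0(y)[k], x, X1(x)) for k in range(2))


def d_corrected(x):
    """Definition 46.8.10 for m = 1: add (-1)^(0+1) * omega(x)([X0, X1](x))."""
    return d_uncorrected(x) - pair(omega(x), lie_bracket(x))


x = (2.0, 3.0)
print(d_pointwise(x, X0(x), X1(x)), d_uncorrected(x), d_corrected(x))
```

For these particular fields the uncorrected value differs from the pointwise value by the Lie bracket term, while the corrected value agrees with it up to finite-difference error.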


46.9. Rectangular Stokes theorem in two dimensions 


46.9.1 REMARK: The Stokes theorem in the literature. 

Some presentations of the Stokes theorem in the differential geometry literature are as follows. (Some
authors call it Green's theorem or the Gauß-Green theorem.) Weyl [310], pages 109-112; Synge/Schild [41],
pages 267-277; Struik [39], pages 208-209; Flanders [11], pages 64-66; Guggenheimer [16], pages 190-193;
Kobayashi/Nomizu [19], pages 281-283; Henri Cartan [4], pages 76-82; Choquet-Bruhat [6], pages 74-81;
Bishop/Goldberg [3], pages 195-199; Misner/Thorne/Wheeler [292], pages 94-98, 127, 150-151; Spivak [37],
Volume 1, pages 253-263, Volume 4, pages 132-134; Whitney [161], pages 21-24, 94-103, 108-110, 273;
Sulanke/Wintgen [40], pages 213-217; Lovelock/Rund [27], pages 156-163; Poor [32], pages 157-158;
Gallot/Hulin/Lafontaine [13], pages 182-184; Darling [8], pages 183-193; Frankel [12], pages 110-117, 155; Lang [23],
pages 475-510; Szekeres [305], pages 486-493; Penrose [297], pages 230-233; Bleecker [254], pages 12-13. A
very general Gauß-Green theorem in Cartesian space is given by Federer [69], pages 391, 478. Some useful
summaries are given by EDM2 [113], 94.F, 105.U, 194.B, App. A, Table 3.III.


Some authors refer to the Stokes theorem as the “fundamental theorem of calculus on manifolds” or the
“fundamental theorem of exterior calculus”. (See Lovelock/Rund [27], page 131; Penrose [297], page 233.)


Stokes did not discover the Stokes theorem, but he did use the three-dimensional version, which he obtained 
from William Thomson (also known as Lord Kelvin), as part of the Smith Prize examination at Cambridge 
University in 1854. (See Penrose [297], pages 245-246; Darling [8], page 189.) 


46.9.2 REMARK:  Generalisation of the fundamental theorem of calculus to rectangular regions. 
A consequence of Theorem 43.8.5 is the corresponding Theorem 46.9.3 for two variables. The theorem holds 
for very general regions, but it is illuminating to first prove it for a rectangular region. 


46.9.3 THEOREM: Stokes theorem for a rectangle in two-dimensional Cartesian space. 
Let $a_1, a_2, b_1, b_2 \in \mathbb{R}$ with $a_1 < b_1$ and $a_2 < b_2$, and let $A : [a_1, b_1] \times [a_2, b_2] \to \mathbb{R}^2$ be a continuous function
such that $A$ has partial differentials on $(a_1, b_1) \times (a_2, b_2)$. Then

$$\int_Q \partial_1 A_2 - \partial_2 A_1 \; dx^1 dx^2 = \int_{\partial Q} A.ds, \tag{46.9.1}$$

where $Q = [a_1, b_1] \times [a_2, b_2]$ and $ds$ denotes the anti-clockwise line integral around $\partial Q$.


PROOF: The first term $\partial_1 A_2$ may be integrated by Theorem 43.8.5 with respect to $x^1$ for fixed $x^2$. This
gives $A_2(b_1, x^2) - A_2(a_1, x^2)$. (See Figure 46.9.1.) So


$$\int_Q \partial_1 A_2 \; dx^1 dx^2 = \int_{a_2}^{b_2} \int_{a_1}^{b_1} \partial_1 A_2(x^1, x^2) \; dx^1 dx^2
= \int_{a_2}^{b_2} A_2(b_1, x^2) - A_2(a_1, x^2) \; dx^2.$$


[Figure 46.9.1 Integration of exterior derivative in a rectangle]


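Similarly

$$\int_Q -\partial_2 A_1 \; dx^1 dx^2 = \int_{a_1}^{b_1} \int_{a_2}^{b_2} -\partial_2 A_1(x^1, x^2) \; dx^2 dx^1
= \int_{a_1}^{b_1} A_1(x^1, a_2) - A_1(x^1, b_2) \; dx^1.$$

So the left-hand side of (46.9.1) becomes the sum of anti-clockwise line integrals:

$$\int_{\gamma_r} A.ds + \int_{\gamma_\ell} A.ds + \int_{\gamma_b} A.ds + \int_{\gamma_t} A.ds,$$

where $\gamma_r$, $\gamma_\ell$, $\gamma_b$ and $\gamma_t$ denote respectively the right, left, bottom and top sides of $[a_1, b_1] \times [a_2, b_2]$.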
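
The identity (46.9.1) can be sanity-checked numerically. The sketch below is an illustration only: the field `A`, the midpoint-rule quadrature, the finite-difference step `h` and the function names are all assumptions made for this example, not part of the text above. It compares the area integral of $\partial_1 A_2 - \partial_2 A_1$ with the anti-clockwise boundary integral, side by side as in the proof.

```python
def A(x1, x2):
    """Hypothetical covariant vector field (A_1, A_2) on R^2."""
    return (x1 * x2 * x2, x1 * x1 + x2)


def curl(x1, x2, h=1e-5):
    """Central-difference estimate of d_1 A_2 - d_2 A_1 at (x1, x2)."""
    d1_A2 = (A(x1 + h, x2)[1] - A(x1 - h, x2)[1]) / (2 * h)
    d2_A1 = (A(x1, x2 + h)[0] - A(x1, x2 - h)[0]) / (2 * h)
    return d1_A2 - d2_A1


def midpoints(a, b, n):
    """Midpoints of n equal subintervals of [a, b], and the subinterval length."""
    step = (b - a) / n
    return [a + (k + 0.5) * step for k in range(n)], step


def area_integral(a1, b1, a2, b2, n=100):
    """Midpoint-rule estimate of the left-hand side of (46.9.1)."""
    xs, dx = midpoints(a1, b1, n)
    ys, dy = midpoints(a2, b2, n)
    return sum(curl(x, y) for x in xs for y in ys) * dx * dy


def boundary_integral(a1, b1, a2, b2, n=100):
    """Midpoint-rule estimate of the anti-clockwise line integral around the rectangle."""
    xs, dx = midpoints(a1, b1, n)
    ys, dy = midpoints(a2, b2, n)
    bottom = sum(A(x, a2)[0] for x in xs) * dx   # traversed left to right
    right = sum(A(b1, y)[1] for y in ys) * dy    # traversed upwards
    top = -sum(A(x, b2)[0] for x in xs) * dx     # traversed right to left
    left = -sum(A(a1, y)[1] for y in ys) * dy    # traversed downwards
    return bottom + right + top + left


print(area_integral(0.0, 1.0, 0.0, 1.0), boundary_integral(0.0, 1.0, 0.0, 1.0))
```

The signs attached to the top and left sides in `boundary_integral` are exactly the signs which arise in the two one-dimensional integrations of the proof.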
46.9.4 REMARK: Dimensional analysis check for the Stokes theorem in two-dimensional Cartesian space.
Some elementary scaling tests can be applied to Theorem 46.9.3 to ensure that it is at least plausible. The
left-hand side of equation (46.9.1) has the terms $\partial_1 A_2 - \partial_2 A_1$ in the integrand. These both look like covariant
tensors of degree 2. So if the coordinates are multiplied by 2, they will be multiplied by 1/4. The differential
form of the integral is $dx^1 dx^2$, which looks like a contravariant tensor of degree 2. So under any simple
coordinate scaling, the left-hand side should remain constant. In fact, this is true for any $C^1$ transformation.


The right-hand side of equation (46.9.1) has an integrand which looks like a covariant tensor of degree 1
and a differential form $ds$ which looks like a contravariant tensor of degree 1. So this is also invariant under
coordinate scaling. This kind of “dimensional analysis” is useful as a basic sanity check for equations and
expressions in differential geometry.


46.9.5 REMARK: How the form of the exterior derivative arises from the Stokes theorem. 

The proof of Theorem 46.9.3 gives a clue for how to remember the form of the exterior derivative. The
term $\partial_1 A_2$, if integrated on its own, yields the difference between $A_2$ on the right and left sides of the
integration region, which is just like in the fundamental theorem of calculus. All that is happening here is
that the partial derivative $\partial_1$ is cancelled by the integration $\int \ldots dx^1$. Then the integral over $x^2$ sums this
right-left difference over the whole right and left edges of the region. The term $\partial_2 A_1$ may be understood
similarly, except that a minus-sign is required because the top edge integral is going in a negative direction,
i.e. in the direction of decreasing $x^1$. Thus one may think of $\partial_1 A_2 - \partial_2 A_1$ as “the right-left difference of $A_2$
minus the top-bottom difference of $A_1$”.


46.9.6 REMARK: The Stokes theorem integrals vanish for a conservative field. 

If the (covariant) vector field $A$ in Theorem 46.9.3 is replaced with the differential $(\partial_1 f, \partial_2 f)$ of a $C^2$
function $f : Q \to \mathbb{R}$, the left-hand integral has a value equal to zero because $\partial^2 f/\partial x^1 \partial x^2 = \partial^2 f/\partial x^2 \partial x^1$.
So the theorem implies that the integral of the differential $df$ around the boundary $\partial Q$ is zero. This is not
surprising because this integral of $df$ signifies the difference in “height” of the function $f$ as a point completes
a loop around the boundary. This must be zero so that the value of $f$ comes back to where it started. This is,
in fact, a consequence of the fundamental theorem of calculus applied to the boundary curve. (This pattern
continues for higher dimensions.)


The expression $\partial_1 A_2 - \partial_2 A_1$ may be interpreted as the “deviation of the vector field $A$ from the differential
of a function $f$”. If this expression is zero everywhere, the curve integral of $A$ will be path-independent,
in which case an integral $f$ may be determined by integration of $A$ along non-closed curves. If the vector
field $A$ represents a physical force field, the integrals in Theorem 46.9.3 may be thought of as the energy
gained by one rotation around the boundary curve. So a zero value implies that the field is conservative.


46.9.7 REMARK: Interpretation of the exterior derivative as the limit of a boundary integral. 

Roughly speaking, Theorem 46.9.3 suggests that the exterior derivative $\partial_1 A_2 - \partial_2 A_1$ may be thought of as
“curl per unit area”. The kind of directional boundary path integral in Theorem 46.9.3 has an interesting
additive property. If two rectangular regions are placed side by side, the common boundary segments cancel 
each other. The arrows cannot have the same direction on a common segment if they are always oriented 
counterclockwise as indicated. 


If a region is partitioned into many rectangles, the integral of the curl operator over the entire region may be 
calculated by integrating around its boundary, ignoring all of the internal line segments where the component 
rectangles coincide. In fact, this can be generalised to almost any region at all. This is not a surprising 
result when it is considered that the differential operator in the interior is equal to the limit of the per-area 
boundary line integral for vanishingly small rectangles. Thus, roughly speaking, one may write:

$$(\partial_1 A_2 - \partial_2 A_1)(p) = \lim_{Q \to \{p\}} \frac{1}{\mu(Q)} \int_{\partial Q} A.ds,$$

for $p \in \mathbb{R}^2$, where $\mu$ denotes the area for subsets of $\mathbb{R}^2$ and the expression “$\lim_{Q \to \{p\}}$” can be made precise
in terms of the diameter of $Q$. This interpretation may be compared with the corresponding formula in $\mathbb{R}^1$:

$$f'(p) = \lim_{I \to \{p\}} \frac{1}{\mathrm{length}(I)} \int_{\partial I} f.ds = \lim_{a,b \to p} \frac{f(b) - f(a)}{b - a},$$

where $\partial I$ denotes the “signed boundary” of the interval $I = [a, b]$. This “signed boundary” is positive at $b$
and negative at $a$. Hence the boundary integral is the integral of $f$ multiplied by the 0-form $ds$ which equals
1 at $b$ and $-1$ at $a$.


The Stokes theorem follows very naturally from the way the exterior derivative (in this case the curl
operator) is defined. The reason for the name “exterior derivative” is clear from the “limit of per-area
boundary integral” interpretation. (See also Remark 46.7.12 for this interpretation.)
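
The “limit of per-area boundary integral” interpretation can be observed numerically. In the following sketch, the field `A`, the point `p`, the square regions and the midpoint-rule quadrature are all hypothetical choices made for illustration. The anti-clockwise line integral around a small square centred at `p` is divided by the area of the square, and the quotient is compared with $\partial_1 A_2 - \partial_2 A_1$ at `p` as the side length shrinks.

```python
def A(x1, x2):
    """Hypothetical covariant vector field (A_1, A_2) on R^2."""
    return (x1 * x2, x1 ** 3)


def circulation(p, s, n=200):
    """Midpoint-rule line integral of A, anti-clockwise, around the square of side s centred at p."""
    lo1, hi1 = p[0] - s / 2, p[0] + s / 2
    lo2, hi2 = p[1] - s / 2, p[1] + s / 2
    step = s / n
    ts = [(k + 0.5) * step for k in range(n)]
    bottom = sum(A(lo1 + t, lo2)[0] for t in ts) * step   # left to right
    right = sum(A(hi1, lo2 + t)[1] for t in ts) * step    # upwards
    top = -sum(A(lo1 + t, hi2)[0] for t in ts) * step     # right to left
    left = -sum(A(lo1, lo2 + t)[1] for t in ts) * step    # downwards
    return bottom + right + top + left


def per_area(p, s):
    """Boundary integral divided by the area of the square."""
    return circulation(p, s) / (s * s)


p = (1.0, 2.0)
exact = 3 * p[0] ** 2 - p[0]  # d_1 A_2 - d_2 A_1 = 3*x1^2 - x1 for this choice of A
for s in (1.0, 0.1, 0.01):
    print(s, per_area(p, s), exact)
```

For this particular field the quotient differs from the pointwise value by a term proportional to the square of the side length, so the printed values approach `exact` as `s` decreases.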


46.9.8 REMARK: The tensorial form of integrands in Stokes’ theorem suggests diffeomorphism invariance. 
The exterior derivative of a covariant vector field transforms like a covariant tensor field of degree 2. This
suggests that Theorem 46.9.3 should hold when the point space is subjected to a $C^1$ diffeomorphism.


46.10. Rectangular Stokes theorem in three dimensions 


46.10.1 REMARK: Extension of the Stokes theorem from a rectangle to a rectangular solid. 

The Stokes theorem can be extended from a rectangle to a rectangular solid. Consider first the surface integral
for $S_A = \{x^1\} \times [x^2, x^2 + \Delta x^2] \times [x^3, x^3 + \Delta x^3]$. The integral of the differential form $a \in X^1(\Lambda_2(T(\mathbb{R}^3)))$ over $S_A$ is
$\int_{S_A} a(x)(e_2, e_3) \; dx^2 dx^3$. In terms of coordinates, let $a(x)(e_2, e_3) = a_{23}(x)$. (See Figure 46.10.1.)


[Figure 46.10.1 Stokes theorem in $\mathbb{R}^3$]


Then by subtracting the integral over $S_A$ from the integral over $S_B$ (the opposite face, at $x^1 + \Delta x^1$), dividing by $\Delta x^1 \Delta x^2 \Delta x^3$ and taking
the limit, the result is $\partial_1 a_{23}(x) = (\partial/\partial x^1)\, a(x^1, x^2, x^3)(e_2, e_3)$. When the other two surface pairs are added,
the result is $\partial_1 a_{23}(x) - \partial_2 a_{13}(x) + \partial_3 a_{12}(x)$. By integrating this over a non-infinitesimal rectangular solid
as for a 2-dimensional rectangle, the result is $\int_Q \partial_1 a_{23}(x) - \partial_2 a_{13}(x) + \partial_3 a_{12}(x) \; dx^1 dx^2 dx^3 = \int_{\partial Q} a(x)(dA)$.
This suggests that $da(x)$ should be defined as $\partial_1 a_{23}(x) - \partial_2 a_{13}(x) + \partial_3 a_{12}(x)$ in order to make the Stokes
formula valid.


The Stokes formula for a rectangular solid is then

$$\int_Q \partial_1 a_{23} - \partial_2 a_{13} + \partial_3 a_{12} \; dx^1 dx^2 dx^3 = \int_{\partial Q} a.dA.$$

The integral over the surface could be called an “exterior integral”, while the integral over the rectangular
region could be called an “internal integral”. The name of the “exterior derivative” is clearly related to the
external nature of the surface integral.


An easy proof of the Stokes formula for a rectangular solid follows the same method as for Theorem 46.9.3. 
Each of the three terms of the integrand may be integrated along lines as in Figure 46.10.2 to get rid of 
the partial derivative. Each of these line integrals may be integrated over the corresponding faces of the 
rectangular solid. These solid integrals may be added to give the surface integral. 


[Figure 46.10.2 Stokes theorem integration paths in $\mathbb{R}^3$]


46.10.2 REMARK: Interpretation of the exterior derivative as the limit of a surface integral. 
As for the rectangular Stokes theorem in $\mathbb{R}^2$ (Remark 46.9.7), one may motivate the definition of the exterior
derivative by the (very rough) expression:

$$(\partial_1 a_{23} - \partial_2 a_{13} + \partial_3 a_{12})(p) = \lim_{Q \to \{p\}} \frac{1}{\mu(Q)} \int_{\partial Q} a.dA,$$

for $p \in \mathbb{R}^3$, where $\mu$ denotes the volume for subsets of $\mathbb{R}^3$.
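
The Stokes formula for a rectangular solid can be checked numerically in the same spirit as the two-dimensional case. In this sketch the components `a23`, `a13`, `a12`, the midpoint-rule quadrature, the finite-difference step and all function names are assumptions introduced for illustration, and indices are zero-based in the code. The volume integral of $\partial_1 a_{23} - \partial_2 a_{13} + \partial_3 a_{12}$ is compared with the signed sum of differences of integrals over the three pairs of opposite faces.

```python
def a23(x):
    return x[0] * x[0] * x[1]   # hypothetical component a(x)(e_2, e_3)


def a13(x):
    return x[1] * x[2]          # hypothetical component a(x)(e_1, e_3)


def a12(x):
    return x[2] * x[2] * x[0]   # hypothetical component a(x)(e_1, e_2)


def partial(f, x, i, h=1e-5):
    """Central-difference estimate of the i-th partial derivative of f at x."""
    xp = list(x)
    xm = list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)


def da(x):
    """d_1 a_23 - d_2 a_13 + d_3 a_12 at x."""
    return partial(a23, x, 0) - partial(a13, x, 1) + partial(a12, x, 2)


def midpoints(a, b, n):
    step = (b - a) / n
    return [a + (k + 0.5) * step for k in range(n)], step


def volume_integral(lo, hi, n=16):
    """Midpoint-rule integral of da over the box with opposite corners lo and hi."""
    (xs, dx), (ys, dy), (zs, dz) = [midpoints(lo[i], hi[i], n) for i in range(3)]
    return sum(da((x, y, z)) for x in xs for y in ys for z in zs) * dx * dy * dz


def face_pair(f, axis, lo, hi, n=16):
    """Integral of f over the face at hi[axis] minus its integral over the face at lo[axis]."""
    u_axis, v_axis = [i for i in range(3) if i != axis]
    us, du = midpoints(lo[u_axis], hi[u_axis], n)
    vs, dv = midpoints(lo[v_axis], hi[v_axis], n)
    total = 0.0
    for u in us:
        for v in vs:
            far = [0.0, 0.0, 0.0]
            near = [0.0, 0.0, 0.0]
            far[u_axis] = near[u_axis] = u
            far[v_axis] = near[v_axis] = v
            far[axis] = hi[axis]
            near[axis] = lo[axis]
            total += f(far) - f(near)
    return total * du * dv


def surface_integral(lo, hi, n=16):
    """Signed sum over the three pairs of opposite faces, with signs +, -, +."""
    return (face_pair(a23, 0, lo, hi, n)
            - face_pair(a13, 1, lo, hi, n)
            + face_pair(a12, 2, lo, hi, n))


lo, hi = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
print(volume_integral(lo, hi), surface_integral(lo, hi))
```

The alternating signs in `surface_integral` mirror the alternating signs of the three terms in the integrand.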
TOPOLOGICAL FIBRE BUNDLES


47.1 Motivation and overview of fibre bundles
47.2 Topological fibrations with intrinsic fibre spaces
47.3 Topological fibre charts and fibrations
47.6 Topological fibre bundles
47.7 Fibre bundle homomorphisms
47.8 Topological principal fibre bundles
47.9 Associated topological fibre bundles
47.10 Patchwork associated topological fibre bundles
47.11 Orbit-space associated topological fibre bundles
47.12 Topological short-cut orbit-space associated cross-sections
47.13 Combined topological fibre/frame bundles


47.0.1 REMARK: Chapter dependencies. 

Chapter 47 imports no concepts from Chapters 37-39 on metric spaces and topological linear spaces. Nor does 
it import concepts from Chapters 40-46 on differential and integral calculus. Therefore it could be argued 
that Chapters 47-48 on topological fibre bundles should appear directly following Chapter 36. However, after 
Chapters 31-36 on abstract topology, it's good to have a break from abstraction by first entering instead 
into the more concrete metric spaces and calculus before diving once again into pure topology. 


Another reason to place Chapters 47-48 at the end of Part III is that fibre atlases, which are the defining 
characteristic of fibre bundles, have much in common with locally Cartesian space atlases, which are the 
defining characteristic of the differential geometry in Part IV. Moreover, the core concept of topological fibre 
bundles is parallelism, which is a quintessentially geometric notion. So topological fibre bundles provide a 
suitable prelude to the geometry in Part IV. 


47.0.2 REMARK:  Topological fibre bundles are simpler to study than differentiable fibre bundles. 

Fibre bundles provide a framework within which geometrical concepts such as parallelism and curvature can 
be given meaning. They also provide a “home” for concepts such as vector fields and tensor fields which are 
important in physics. 

The relative simplicity of topological fibre bundles makes it easier to examine the fine detail of definitions 
in depth without the distractions of topics such as Lie groups and vector fields which arise naturally in the 
context of differentiable fibre bundles. 


Non-topological fibre bundles are defined in Chapter 21. Chapter 47 deals with topological fibre bundles. 
Differentiable fibre bundles are defined in Chapters 64-66. Non-topological fibre bundles may be regarded 
as a sub-class of topological fibre bundles by adding the discrete topology on every space in the specification 
tuple as the “default topology”. (This default topology is also mentioned in Remark 31.3.21.) This has
the consequence that all subsets of these spaces are open sets, and all functions between these spaces are 
continuous by Theorem 31.12.10 (i). Therefore the properties of topological fibre bundles are applicable to 
non-topological fibre bundles by adding the default discrete topology on all spaces. 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 
Conversely, the properties of non-topological fibre bundles are applicable to the topological fibre bundles by
ignoring the topologies on the spaces and the continuity requirements. 


47.0.3 REMARK: History of fibre bundles. 

It seems that fibre bundles as a distinct concept were first defined by Eduard Ludwig Stiefel in 1936 in the 
context of pathwise parallelism in manifolds. (See EDM2 [113], 147.A. However, see also the 1932 paper on 
“fibred spaces” by Seifert [197].) Steenrod [142], page v, wrote the following. 


The recognition of the domain of mathematics called fibre bundles took place in the period 1935-1940. 
The first general definitions were given by H. Whitney. His work and that of H. Hopf and E. Stiefel 
demonstrated the importance of the subject for the applications of topology to differential geometry. 


The need for fibre bundles in physics can be traced back to the introduction of fields to explain forces. 
Instantaneous action at a distance was necessarily replaced by field theories when it was realised that light 
was governed by Maxwell’s equations for electricity and magnetism. Since light travelled at a finite speed, 
there was a requirement for some mathematical structure to contain the state of a moving wave between the 
time of its emission and the time of its reception. (See Remark 21.0.7 for some comments by Maxwell on 
resistance to the idea of a “medium” for the propagation of light.) 


When the state of a “medium” for the propagation of forces is modelled mathematically, the concept of a fibre 
bundle is implicitly used. This is usually not “topologically interesting” in the combinatorial topology sense. 
However, it is geometrically interesting, in particular because forces are typically identified with curvature, 
which is a geometric concept. So one could say that the geometric concepts of fibre bundle theory arose in 
physics, while the topological concepts arose in pure mathematics. But even after the general recognition 
that gauge theory in particle physics is closely related to connections on fibre bundles, these two literatures 
have still been largely separate, with distinct sets of definitions and notations. (See Remark 70.8.2 for some 
references to gauge theory geometry histories in the literature.) So one could say that there has been separate 
development of the topological and geometric aspects of fibre bundles since the mid-19th century. 


47.0.4 REMARK: Abbreviations for cumbersome names of fibre bundle categories. 
In this book, OFB is short for “ordinary fibre bundle”, and PFB is short for “principal fibre bundle”, as 
mentioned in Remarks 21.8.1 and 21.9.2. These are non-standard abbreviations. However, the abbreviation
“PFB” is used by Bleecker [254], page 26.


47.1. Motivation and overview of fibre bundles 


47.1.1 REMARK: The underlying motivation and significance of fibre bundles. 

A fibre bundle is a mathematical model for a set of objects with well-defined locations, where each object 
can be coordinatised by a single coordinate space. The canonical example of a fibre bundle is the set of 
tangent vectors on an n-dimensional real differentiable manifold. Each tangent vector has a well-defined 
location on the manifold and can be coordinatised by an n-tuple of real numbers. Physical examples of fibre 
bundles include the set of all possible electric field vectors in a region of space and the set of all possible 
electromagnetic stress-energy tensors in a region of space-time. 


In physics, field theories attach various scalar, vectorial and tensorial objects to each point in space-time. 
Thus physical fields “inhabit” fibre bundles. 


The topology on a fibre bundle is applied to the entire set of possible objects at all locations. This has the 
effect of “gluing” together the sets of possible objects at different locations. The non-topological fibre bundles 
in Chapter 21, on the other hand, have no sense of locality or “glue” because the topological structure which 
is required for the specification of locality is absent. 


47.1.2 REMARK: The component parts of fibre bundles. 
The set E of all elements in a fibre bundle is called the “total space”. This is often denoted as “E”,
presumably as a mnemonic for the French word “entier”, which means “whole”, “entire” or “total”. 


A fibre bundle has a “base space” B, which serves to parametrise the location of all elements of the total
space E. The function $\pi : E \to B$ which maps total space elements to their locations is called the “projection
map” of the fibre bundle. Each element of E has one and only one location. It follows that the sets $\pi^{-1}(\{b\})$
for $b \in B$ are pairwise disjoint, and the union of these “fibre sets” is the entire total space. (This is illustrated
in Figure 47.1.1.)


Figure 47.1.1 Partition of total space into fibres by projection map


One may think of the location in the base space B as the horizontal component of each element of E. 
Each fibre bundle also has a vertical component space F. For each $b \in B$, there is required to be a bijection
between the fibre set $\pi^{-1}(\{b\})$ and F. The combination of the horizontal and vertical components determines
a unique point in E. 


It would seem, then, that a vertical component projection $\phi : E \to F$ could be defined, and the combined
horizontal/vertical map $\pi \times \phi : E \to B \times F$ would serve to parametrise the whole total space, where $\pi \times \phi$
maps $z \in E$ to the ordered pair $(\pi(z), \phi(z))$. The fly in the ointment here is that the topology on E is not in
general the same as the standard Cartesian product topology on $B \times F$. The two-sphere $S^2$ and the Möbius
strip are examples where these topologies are different. However, the topology on fibre bundles can always
be identified with such Cartesian product topologies in a neighbourhood of each base point. (This is part of
the definition of a topological fibre bundle.) Therefore topological fibre bundles are defined to permit any
number of “fibre charts”, which locally map the total space to the vertical component space F. Each such
local fibre chart is required to be a homeomorphism between its domain and range. 
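
As a hedged illustration of why several local fibre charts may be needed, the following sketch writes down two fibre charts for the Möbius strip mentioned above. The quotient model of E and the names $U_1$, $U_2$, $\phi_1$, $\phi_2$ are assumptions made for this illustration only; they are not taken from the text.

```latex
% Assumed model (not from the text): the open Moebius strip as a quotient space.
\[ E = ([0,1] \times \mathbb{R}) / {\sim}, \qquad (0, t) \sim (1, -t), \qquad
   B = [0,1] / (0 \sim 1), \qquad \pi([s, t]) = [s], \qquad F = \mathbb{R}. \]
% Two open chart domains which together cover the base space B.
\[ U_1 = \{ [s] ;\ 0 < s < 1 \}, \qquad U_2 = \{ [s] ;\ s \neq 1/2 \}. \]
% Fibre charts. The chart \phi_2 is well defined at the seam because
% (0, t) ~ (1, -t) and -(-t) = t.
\[ \phi_1([s, t]) = t \quad (0 < s < 1), \qquad
   \phi_2([s, t]) = \begin{cases} t & 0 \le s < 1/2, \\ -t & 1/2 < s \le 1. \end{cases} \]
% On the overlap of the chart domains, the two charts agree on one component
% and differ by a sign on the other component.
\[ \phi_2(z) = \begin{cases} \phi_1(z) & \pi(z) = [s],\ 0 < s < 1/2, \\
                            -\phi_1(z) & \pi(z) = [s],\ 1/2 < s < 1. \end{cases} \]
```

In this sketch, each of $\pi \times \phi_1$ and $\pi \times \phi_2$ identifies part of E with a product $U_k \times \mathbb{R}$, but the sign change on one component of the overlap prevents these two particular charts from being merged into a single global chart.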


47.1.3 REMARK: The three main categories of topological fibre spaces. 

A structure with only the component parts mentioned in Remark 47.1.2 is called a “topological fibration”. 
The name “topological fibre bundle” is given when an atlas of fibre charts is provided, and a structure group
is specified, as in Definition 47.6.5. 


The name “topological principal fibre bundle” is given when the fibre space F is the same as the structure
group G as in Definition 47.8.3. In this case, the total space of the fibre bundle is thought of as the
set of “reference frames” at each point of the base space. These have a role in defining the fibre charts
on associated ordinary fibre bundles. Fibrations, ordinary fibre bundles and principal fibre bundles are
illustrated in Figure 47.1.2.


47.2. Topological fibrations with intrinsic fibre spaces


47.2.1 REMARK: Upgrading non-topological fibrations to topological fibrations. 

Definition 47.2.2 is the same as the non-topological fibration in Definition 21.2.1 except that topologies $T_E$
and $T_B$ have been added for the total space E and base space B respectively, and continuity is required for
the projection map and the local bijections between fibre sets $\pi^{-1}(\{b\})$. The continuity is required uniformly
for the fibre sets in an open neighbourhood of every base point.


47.2.2 DEFINITION: A topological fibration with intrinsic fibre space is a tuple $(E, \pi, B) -\!\!< (E, T_E, \pi, B, T_B)$
such that


(i) $E -\!\!< (E, T_E)$ and $B -\!\!< (B, T_B)$ are topological spaces,
(ii) $\pi : E \to B$ is continuous,
(iii) $\forall b \in B,\ \exists U \in \mathrm{Top}_b(B),\ \exists \phi : \pi^{-1}(U) \to \pi^{-1}(\{b\}),\ \pi \times \phi : \pi^{-1}(U) \approx U \times \pi^{-1}(\{b\})$,
(iv) $\forall b_1, b_2 \in B,\ \pi^{-1}(\{b_1\}) \approx \pi^{-1}(\{b_2\})$.

E is called the total space of $(E, \pi, B)$.
$\pi$ is called the projection map of $(E, \pi, B)$.
B is called the base space of $(E, \pi, B)$.


For any $b \in B$, the set $\pi^{-1}(\{b\})$ is called the fibre set of $(E, \pi, B)$ at b.
The maps $\phi$ are called intrinsic fibre charts for $(E, \pi, B)$.


47.2.3 EXAMPLE: The real number system based on a single point. 

Let $E = \mathbb{R}$, $T_E = \mathrm{Top}(\mathbb{R})$, $B_0 = \{0\}$, $T_{B_0} = \{\emptyset, \{0\}\}$ and $\pi_0 = \{(t, 0);\ t \in \mathbb{R}\}$, where 0 means $0_{\mathbb{R}} \in \mathbb{R}$. In
other words, $\pi_0 : \mathbb{R} \to \{0\}$ is defined by $\pi_0 : t \mapsto 0$. Then $(E, T_E, \pi_0, B_0, T_{B_0})$ is a topological fibration with
intrinsic fibre space according to Definition 47.2.2.


For condition (iii), for any $b \in B_0$, the set U may (and must) be chosen as $\{0\}$, and $\phi : \pi_0^{-1}(U) \to \pi_0^{-1}(\{b\})$
may be chosen as $\phi = \mathrm{id}_{\mathbb{R}}$, although $\phi$ could be chosen as any continuous bijection from $\mathbb{R}$ to $\mathbb{R}$. Such
a continuous bijection is a homeomorphism by Theorem 34.9.24. Then $\pi_0 \times \phi : \mathbb{R} \to \{0\} \times \mathbb{R}$ is the
map $\pi_0 \times \phi : t \mapsto (0, t)$, which is a homeomorphism with respect to the standard product topology on
$B_0 \times E_b = \{0\} \times \mathbb{R}$ according to Definition 32.9.4. (See Notation 21.1.3 for $E_b = \pi_0^{-1}(\{b\})$.) Condition (iv)
is trivially satisfied. Thus $(E, T_E, \pi_0, B_0, T_{B_0}) = (\mathbb{R}, \mathrm{Top}(\mathbb{R}), \mathbb{R} \times \{0\}, \{0\}, \{\emptyset, \{0\}\})$ is a topological fibration
with intrinsic fibre space.


47.2.4 REMARK: The component parts of a topological fibration. 

The spaces and maps in Definition 47.2.2 are illustrated in Figure 47.2.1. Recall that $\mathrm{Top}_b(B)$ denotes the
set of all open neighbourhoods of $b \in B$. (See Notation 31.3.12.) See Definition 10.15.2 for pointwise direct
products of functions such as $\pi \times \phi$.


Figure 47.2.1 A map $\phi$ for an intrinsic fibre space $\pi^{-1}(\{b\})$


Definition 47.2.2 (iii) implies by Theorem 32.11.4 that $\pi^{-1}(\{b_1\}) \approx \pi^{-1}(\{b_2\})$ for all $b_1, b_2 \in U$. For a
connected topology on B, this implies condition (iv). So condition (iv) is superfluous if B is connected.
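
The step from local to global uniformity of the fibres can be spelled out as follows. This is only a sketch of one standard argument, not necessarily the argument intended by the author. The notation $[b]$ for the class of base points whose fibre sets are homeomorphic to the fibre set at b is an assumption introduced here.

```latex
% For b in B, let [b] be the set of base points whose fibre sets are
% homeomorphic to the fibre set at b. (The notation [b] is hypothetical.)
\[ [b] = \{ b' \in B ;\ \pi^{-1}(\{b'\}) \approx \pi^{-1}(\{b\}) \}. \]
% By condition (iii), every b' in [b] has an open neighbourhood U_{b'} on which
% all fibre sets are homeomorphic to the fibre set at b', and hence to that at b.
\[ \forall b' \in [b],\ \exists U_{b'} \in \mathrm{Top}_{b'}(B),\ U_{b'} \subseteq [b]. \]
% So [b] is open. Its complement is a union of classes of the same kind,
% and is therefore also open.
\[ B \setminus [b] = \bigcup_{b'' \in B \setminus [b]} [b''] \in T_B. \]
% The set [b] contains b, so it is a non-empty set which is both open and closed.
% If B is connected, this forces [b] = B, which is condition (iv).
\[ [b] = B. \]
```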


47.2.5 REMARK: Global uniformity of fibres of a topological fibration with intrinsic fibre space. 
Definition 47.2.2 (iv) means that the fibres of $(E, \pi, B)$ are globally uniform. This is reminiscent of the
requirement that a manifold have the same dimension at all points. Just as a manifold could be alternatively
defined to have different dimensions on different components, so a fibration or fibre bundle could be defined
to have different fibre spaces on different components of the base space. But the benefits of the increased 
generality would not outweigh the inconvenience. 


Since the fibre sets $\pi^{-1}(\{b\})$ are pairwise homeomorphic, any topological space F which is homeomorphic
to one such fibre set is homeomorphic to all of them. Such a space F is therefore uniquely defined up to 
homeomorphism and is uniform over all of B. 


47.2.6 REMARK: Trivial examples of topological fibrations with intrinsic fibre spaces. 

Trivial examples are often useful for checking the basic sanity of definitions and theorems. (See Remark 21.2.5 
for analogous trivial examples for non-topological fibrations.) The topological space $(B, T_B)$ in Definition 47.2.2
could be as trivial as the space $(\emptyset, \{\emptyset\})$, which implies that $E = \emptyset$ and $\pi = \emptyset$. This satisfies all
of the conditions of Definition 47.2.2. This example could be referred to as the trivial or empty topological
fibration.


Another kind of triviality occurs if $(E, T_E) = (\emptyset, \{\emptyset\})$ for an arbitrary topological space $(B, T_B)$. Then $\pi = \emptyset$,
and condition (iii) is satisfied by $U = B$ and $\phi = \emptyset$ for all $b \in B$.


A slightly less trivial example is where $(E, T_E) = (B, T_B)$ for any topological space $(B, T_B)$ and $\pi$ is the
identity $\mathrm{id}_E$ on E. Then $\pi^{-1}(\{b\}) = \{b\}$ for all $b \in B$ and condition (iii) is satisfied by $U = B$ and $\phi : x \mapsto b$
for all $b \in B$.


47.3. Topological fibre charts and fibrations


47.3.1 REMARK: Adding fibre spaces, fibre charts and fibre atlases to fibrations. 

Sections 47.3 and 47.5 add fibre spaces, fibre charts and fibre atlases to the fibrations with intrinsic fibre 
spaces which are defined in Section 47.2. The specification of an external fibre space for an intrinsic fibration 
(Definition 47.2.2) yields the topological fibration with extrinsic fibre space in Definition 47.3.6. The addition 
of a fibre atlas to Definition 47.3.6 yields the topological fibre bundle in Definition 47.5.5. (See Figure 47.3.1 
for Definitions 47.3.2, 47.3.3 and 47.3.6.) 


Figure 47.3.1 Fibre chart $\phi$ for an extrinsic fibre space F


47.3.2 DEFINITION: A fibre space or standard fibre for a topological fibration (with intrinsic fibre space)
$(E, \pi, B)$ is any topological space F such that $F \approx \pi^{-1}(\{b\})$ for all $b \in B$.


47.3.3 DEFINITION: A fibre chart for a fibre space F for a topological fibration (with intrinsic fibre space)
$(E, \pi, B)$ is any function $\phi : \pi^{-1}(U) \to F$ such that $\pi \times \phi : \pi^{-1}(U) \approx U \times F$ for some $U \in \mathrm{Top}(B)$.


47.3.4 THEOREM: Continuity of fibre charts.
Let $\phi$ be a fibre chart for a fibre space F for a topological fibration $(E, \pi, B)$. Then $\phi$ is continuous.


PROOF: Since $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is continuous for some $U \in \mathrm{Top}(B)$ by Definition 47.3.3, the
assertion follows from Theorem 32.11.2 (ii). 
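
The continuity of a fibre chart can be seen by writing the chart as a composition. The following sketch assumes that the cited theorem amounts to the continuity of the component maps of a continuous map into a product space. The name $\Pi_2$ for the second-factor projection is introduced here by analogy with the map $\Pi_1$ in Remark 47.3.5.

```latex
% \Pi_2 denotes the projection of U x F onto its second factor, which is
% continuous with respect to the product topology on U x F.
\[ \Pi_2 : U \times F \to F, \qquad \Pi_2 : (b, f) \mapsto f. \]
% Evaluate the composite at an arbitrary point z of \pi^{-1}(U).
\[ (\Pi_2 \circ (\pi \times \phi))(z) = \Pi_2(\pi(z), \phi(z)) = \phi(z). \]
% Hence \phi is a composition of two continuous maps, and is therefore continuous.
\[ \phi = \Pi_2 \circ (\pi \times \phi). \]
```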


47.3.5 REMARK: Specification styles for fibre charts in the literature. 

Many texts define fibre charts to be the combined maps $\psi = \pi \times \phi : \pi^{-1}(U) \to B \times F$. Then they
require $\forall e \in \pi^{-1}(U),\ \Pi_1(\psi(e)) = \pi(e)$, where $\Pi_1 : B \times F \to B$ is defined by $\Pi_1 : (b, f) \mapsto b$. Some
texts define fibre charts to be the inverses of such combined maps. That is, they define charts of the form
$\psi^{-1} = (\pi \times \phi)^{-1} : B \times F \mathrel{\mathring{\rightarrow}} \pi^{-1}(U)$. (The relations between the three styles of fibre chart definitions
are surveyed in Remark 21.5.11, Table 21.5.2 and Theorem 21.5.13.)


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




47.3.6 DEFINITION: A topological fibration with fibre space F for a topological space $F -\!\!< (F, T_F)$ is a tuple
$(E, \pi, B) -\!\!< (E, T_E, \pi, B, T_B)$ such that


(i) $E -\!\!< (E, T_E)$ and $B -\!\!< (B, T_B)$ are topological spaces,
(ii) $\pi : E \to B$ is continuous,
(iii) $\forall b \in B,\ \exists U \in \mathrm{Top}_b(B),\ \exists \phi : \pi^{-1}(U) \to F,\ \pi \times \phi : \pi^{-1}(U) \approx U \times F$.
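
As a sanity check of Definition 47.3.6, the simplest non-empty case is a product space, for which one global chart suffices. The example below is an illustrative assumption, not part of the text. The names $\Pi_1$ and $\Pi_2$ denote the projections of $B \times F$ onto its first and second factors.

```latex
% Assumed example: the product fibration over B with fibre space F.
\[ E = B \times F, \qquad T_E = \text{product topology of } T_B \text{ and } T_F, \qquad
   \pi = \Pi_1 : (b, f) \mapsto b. \]
% The map \pi is continuous because it is a projection map of a product topology.
% For condition (iii), choose U = B and \phi = \Pi_2 for every b in B.
\[ \phi = \Pi_2 : (b, f) \mapsto f, \qquad \pi^{-1}(B) = B \times F. \]
% Then \pi x \phi is the identity map on B x F, which is a homeomorphism.
\[ (\pi \times \phi)(b, f) = (\pi(b, f), \phi(b, f)) = (b, f), \qquad
   \pi \times \phi = \mathrm{id}_{B \times F} : \pi^{-1}(B) \approx B \times F. \]
```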


47.3.7 REMARK: Fibre sets of a topological fibration may be regarded as regularly embedded. 

By analogy with Definitions 50.3.6 and 50.2.8 for topological manifolds, it seems that it should be possible 
to claim that the fibre sets $E_b = \pi^{-1}(\{b\})$ are in some sense “regularly embedded” in the total space E.
The topological space $E_b$ with the relative topology from E is locally homeomorphic to a Cartesian product
of topological spaces, one of which is F. This is almost precisely what the “local trivialisation” condition
Definition 47.3.6 (iii) says. The main obstacle here is the lack of locally Cartesian structure. Nevertheless, 
it is useful to think of the fibre sets as topological spaces which happen to be “regularly embedded” in some 
sense inside the total space. 


The “local triviality” condition means that fibrations are “locally product-structured topological spaces”.
(This is also noted in Remark 32.11.1. See Section 32.11 for globally product-structured topological spaces.) 


It is generally assumed implicitly that each fibre set of a topological fibration is a topological space as in 
Definition 47.3.8. 


47.3.8 DEFINITION: The topological fibre set of a topological fibration $(E, \pi, B) -\!\!< (E, T_E, \pi, B, T_B)$ at a
point $b \in B$ is the topological space $(E_b, T_{E_b})$, where $E_b = \pi^{-1}(\{b\})$ and $T_{E_b} = \{\Omega \cap E_b;\ \Omega \in T_E\}$.


47.3.9 REMARK: Fibre charts compatible with a given topological fibration. 

Strictly speaking, the only fully acceptable fibre charts for a topological fibration are those which are specified 
in its fibre atlas, if one is specified. (See Section 47.5 for topological fibre atlases.) To avoid confusion, the 
charts in Definition 47.3.10 are called “compatible fibre charts”, which means merely that they are topologically
consistent with the topological structure on the fibration. A fibre atlas might not necessarily contain all
compatible fibre charts, and it is often undesirable for the fibre atlas to contain them all. Theorem 47.3.11 
is the topological version of Theorem 21.5.6 for non-topological fibrations. 


47.3.10 DEFINITION: A compatible fibre chart for a topological fibration $(E, \pi, B)$ with fibre space F is a
function $\phi : \pi^{-1}(U) \to F$, for some $U \in \mathrm{Top}(B)$, such that $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a homeomorphism.


47.3.11 THEOREM: Restrictions of fibre charts to fibre sets are homeomorphisms. 
Let $(E, \pi, B)$ be a topological fibration with fibre space F.


(i) If $\phi : \pi^{-1}(U) \to F$ is a compatible fibre chart for $(E, \pi, B)$, then $\phi\big|_{\pi^{-1}(\{b\})} : \pi^{-1}(\{b\}) \to F$ is a
homeomorphism for all $b \in U$.


PROOF: For part (i), suppose that $\phi : \pi^{-1}(U) \to F$ is a compatible fibre chart for $(E, \pi, B)$. Then
$\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a homeomorphism by Definition 47.3.10. Therefore $\phi\big|_{\pi^{-1}(\{b\})} : \pi^{-1}(\{b\}) \to F$ is
a homeomorphism for all $b \in U$ by Theorem 32.11.4 (ii).


47.3.12 REMARK: The benefits of fibre charts. 

In the case of topological manifolds (in Section 50.1), topological charts are required to be compatible 
with each other on the intersections of their domains. This is not required for topological fibrations in 
Definition 47.3.6 because any chart which is compatible with the topologies on the total space E and base 
space B is automatically compatible with all other such charts due to the transitivity of continuity. (This 
is also mentioned in Remark 47.5.3.) It would seem, then, that the provision of fibre charts for a fibration 
adds no value. 


The topologies on the total space E and base space B, together with the projection map $\pi$, determine totally
and precisely the set of all possible fibre charts. Constructing a fibre atlas for a fibration (in Section 47.5) 
is therefore apparently merely a matter of choosing any subset of this set of charts which cover the whole 
total space. So the provision of a fibre atlas for a fibration also seems to add no value. However, there are 
some important reasons for providing fibre charts. 

(1) Fibre charts are the easiest and best way to specify the topology on the total space.

In practice, it is very tedious to define the topology for the total space. For example, the topology on the 
Möbius strip fibration, which is amongst the simplest of non-trivial fibrations, is almost never defined
directly because it is too onerous to define the set of all open sets on its total space. One should recall
that the topology on $\mathbb{R}^n$ is usually defined as arbitrary combinations of unions and finite intersections
of open balls. Following such a procedure for the total spaces of the Möbius fibration and much more
complicated fibrations would be both tedious and difficult to interpret. Therefore it is best to induce the 
topology on a fibration's total space via fibre charts. In practice, fibre charts are the preferred method 
of specification of the topology. (This is also true in the case of manifolds.) 


(2) Fibre charts specify structures other than topology. 

Topology is not the only purpose for providing fibre charts. Fibrations are almost never abstract 
topological fibrations in applications. For example, the fibre space may be a linear space, in which case 
the fibre charts indicate which element of each fibre set $\pi^{-1}(\{b\})$ is intended to correspond to the origin
of the linear space. The local homeomorphism property of fibre charts is only a minimum requirement. 
The real value is added by the other structures which are described by the fibre chart maps. As in 
the case of the topology in reason (1), fibre charts are the most convenient way to specify additional 
structuring of the total space. 


(3) Specification of topology and other structure via charts facilitates generalisation. 
By using fibre charts to specify topology and other structure, one may present the theory of a very wide 
range of fibrations in terms of a common class of fibre spaces. One does not need to be concerned with 
the details of the topology and other structure on the concrete total space and base space. One can 
thus focus on the common features of a wide range of fibrations. 


These observations apply also to differentiable fibre bundles, and to topological and differentiable manifolds. 
Even if the topology (or differentiable structure) on a fibration (or fibre bundle) were pre-defined, and only 
the topological (or differentiable) structure were of interest, one would still find it easier to define fibre charts 
because they are very convenient for understanding and manipulating the structure. 


It would be more realistic to say that the topology (and differentiable structure) on the total space of a 
fibration are actually defined by the fibre charts, not vice versa. The topology (and differentiable structure) 
on the fibre set at each point of the base space is often defined pointwise without fibre charts, but the way 
in which the per-point fibre sets are glued together is generally defined in practice by the fibre charts. So 
the advantages of fibre charts are that they (1) define the topological (or differentiable) structure which glues
together the per-point fibre sets, (2) specify additional structure on the fibre sets, and (3) allow fibrations (and fibre
bundles) to be divided into classes for efficient study.


47.4. Cross-sections of topological fibrations 


47.4.1 REMARK: A cross-section is a choice of fibre set element at each base point. 

In Definitions 47.4.2 and 47.4.6, a local cross-section over a subset U of the base space B of a fibration 
$(E, \pi, B)$ represents a choice of fibre element $f(b) \in E_b = \pi^{-1}(\{b\})$ for each $b \in U$ because $\pi \circ f = \mathrm{id}_U$. (See
Definition 21.3.3 and Notation 21.3.4 for cross-sections of non-topological fibrations. See Definition 64.7.2 
and Notation 64.7.3 for differentiable cross-sections of differentiable fibrations.) 


More generally, a right inverse f of any surjective function $g : X \to Y$ represents the choice of an element
$f(y) \in g^{-1}(\{y\})$ for all $y \in Y$. A cross-section of a fibration is required to be a right inverse of the projection
map $\pi$. Therefore it represents a choice of fibre set element at each base-space point.


47.4.2 DEFINITION: A (global) cross-section of a topological fibration $(E, \pi, B)$ is a function $X : B \to E$
such that $\pi \circ X = \mathrm{id}_B$.


A (local) cross-section of a topological fibration $(E, \pi, B)$, on a set $U \subseteq B$, is a function $X : U \to E$ such
that $\pi \circ X = \mathrm{id}_U$.


47.4.3 NOTATION: $X(E, \pi, B)$, for a topological fibration $(E, \pi, B)$, denotes the set of all global cross-sections
of $(E, \pi, B)$.

$X(E, \pi, B \,|\, U)$, for a topological fibration $(E, \pi, B)$, denotes the set of all local cross-sections of $(E, \pi, B)$ on
a set $U \subseteq B$.

$\mathring{X}(E, \pi, B)$, for a topological fibration $(E, \pi, B)$, denotes the set of all local cross-sections of $(E, \pi, B)$.

$X_{\mathrm{loc}}(E, \pi, B)$, for a topological fibration $(E, \pi, B)$, denotes the set of all local cross-sections of $(E, \pi, B)$ with
open domains. In other words, $X_{\mathrm{loc}}(E, \pi, B) = \{X \in \mathring{X}(E, \pi, B);\ \mathrm{Dom}(X) \in \mathrm{Top}(B)\}$.


47.4.4 REMARK: Choice of notation for sets of local cross-sections. 

The use of the “circle-accent” to indicate local cross-sections in Notation 47.4.3 follows the pattern of 
Notation 10.9.3, where partially defined functions $f : A \mathrel{\mathring{\rightarrow}} B$ are indicated by a “circle-accent”. (A mnemonic
for the circle-accent could be that it represents a gap in knowledge. So the function is only partially defined.) 
The use of the subscript “loc” for local cross-sections with open domains follows a common practice in PDE 
analysis. The word “topology” is closely connected to the word “locality”, as noted in Remark 31.1.1. So 
the abbreviation “loc” suggests an open set. 


47.4.5 REMARK: Continuous local cross-sections of topological fibrations. 

Continuous local cross-sections of a topological fibration $(E, \pi, B)$ may be defined either as restrictions $X\big|_U$
of continuous cross-sections $X \in X(E, \pi, B)$ to the subset U of B, or as continuous functions $X : U \to E$
such that $\pi \circ X = \mathrm{id}_U$. (The relative topology is assumed for the subset U. See Section 31.6 for the relative
topology on a subset.)

These definitions are not equivalent. A continuous function on a subset cannot always be extended and then
restricted to recover the original function. (For example, the function $\mathrm{sign} : \mathbb{R} \to \mathbb{R}$ in Definition 16.5.4 has
a continuous restriction to $\mathbb{R} \setminus \{0\}$, but it is not the restriction of a continuous function from $\mathbb{R}$ to $\mathbb{R}$.)


Although a restriction symbol “|” is used in Notation 47.4.7, it does not signify the function-restriction style
of definition. It is the domain which is restricted, not the function. 

The superscript C, rather than the superscript 0, indicates continuity in Definition 47.4.6 because the superscript 0 strongly suggests a
differentiable structure on E so that k-times differentiability can be defined for general $k \in \mathbb{Z}_0^+$. (This issue
is also mentioned in Remark 31.12.12.) 


47.4.6 DEFINITION: A continuous (global) cross-section of a topological fibration $(E, \pi, B)$ is a cross-section
of $(E, \pi, B)$ which is continuous.


A continuous (local) cross-section of a topological fibration $(E, \pi, B)$, on a set $U \subseteq B$, is a continuous function
$X : U \to E$ such that $\pi \circ X = \mathrm{id}_U$.


47.4.7 NOTATION: $X^C(E, \pi, B)$, for a topological fibration $(E, \pi, B)$, denotes the set of all continuous
global cross-sections of $(E, \pi, B)$.

$X^C(E, \pi, B \,|\, U)$, for a topological fibration $(E, \pi, B)$, denotes the set of all continuous local cross-sections
of $(E, \pi, B)$ on a set $U \subseteq B$.

$\mathring{X}^C(E, \pi, B)$, for a topological fibration $(E, \pi, B)$, denotes the set of all continuous local cross-sections
of $(E, \pi, B)$.

$X^C_{\mathrm{loc}}(E, \pi, B)$, for a topological fibration $(E, \pi, B)$, denotes the set of all continuous local cross-sections of
$(E, \pi, B)$ with open domains. In other words, $X^C_{\mathrm{loc}}(E, \pi, B) = \{X \in \mathring{X}^C(E, \pi, B);\ \mathrm{Dom}(X) \in \mathrm{Top}(B)\}$.


47.4.8 REMARK: A cross-section is continuous “through the fibre charts”.

Since fibre charts $\phi : \pi^{-1}(U_\phi) \to F$ for $U_\phi \in \mathrm{Top}(B)$ and a fibre space F define homeomorphisms $\pi \times \phi :
\pi^{-1}(U_\phi) \to U_\phi \times F$, the continuity of a cross-section in Definition 47.4.6 is equivalent to continuity “through
the fibre charts”. In other words, a cross-section X is continuous if and only if the map $\phi \circ X\big|_{U_\phi} : U \cap U_\phi \to F$
is continuous for all fibre charts $\phi$.
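
The equivalence can be made explicit by a short calculation. This is a sketch under the assumptions that X is a local cross-section on a set $U \subseteq B$ and that the base domains $U_\phi$ of the available fibre charts cover U.

```latex
% Apply the homeomorphism \pi x \phi to X(b), using \pi(X(b)) = b.
\[ (\pi \times \phi)(X(b)) = (\pi(X(b)), \phi(X(b))) = (b, (\phi \circ X)(b)),
   \qquad b \in U \cap U_\phi. \]
% Inverting the chart map recovers X from its chart representation.
\[ X(b) = (\pi \times \phi)^{-1}\bigl(b, (\phi \circ X)(b)\bigr),
   \qquad b \in U \cap U_\phi. \]
```

If $\phi \circ X$ is continuous on $U \cap U_\phi$, then the second line expresses X on $U \cap U_\phi$ as a composition of continuous maps. Conversely, if X is continuous, then $\phi \circ X$ is continuous because $\phi$ is continuous by Theorem 47.3.4.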


47.5. Topological fibre atlases 


47.5.1 REMARK:  Fibration topology is typically specified via a fibre atlas. 

Although atlases are defined in terms of a pre-defined topology on the fibration, this logic is the reverse 
to usual practice. It is more usual to define the atlas first and then define the topology so that all of the 
charts in the atlas are homeomorphisms. In other words, the topology on the total space is imported from 
the fibre atlas, not vice versa. However, the total space set is typically defined as a simple Cartesian set 
product, such as $B \times F$, for example. Topologies $T_B$ and $T_F$ are pre-defined on B and F respectively. But
then the topology on E is not defined as the product topology on $B \times F$. (See Section 32.9 for the direct
product topology for two spaces.) The topology on E is then imported locally from open subsets of $B \times F$
via the fibre charts. One may therefore observe that the fibre atlas is the most important part of a fibration 
in practice, although in theory, the fibre atlas is an optional extra. 


47.5.2 DEFINITION: A fibre atlas for a fibre space F for a topological fibration $(E, \pi, B)$ is a set $A_E^F$ of
fibre charts for the fibre space F for $(E, \pi, B)$ such that $\bigcup_{\phi \in A_E^F} \mathrm{Dom}(\phi) = E$.


An indexed fibre atlas for fibre space F for a topological fibration $(E, \pi, B)$ is a family $(\phi_i)_{i \in I}$ of fibre charts
for fibre space F for $(E, \pi, B)$ such that $\bigcup_{i \in I} \mathrm{Dom}(\phi_i) = E$. (That is, $\mathrm{Range}(\phi)$ is a fibre atlas for F.)


47.5.3 REMARK: If the topology on a fibration is known, the fibre atlas is superfluous. 

All fibre atlases for a given topological fibration are equivalent to each other. So the specification of a fibre 
atlas is optional for topological fibrations. But as mentioned in Remarks 47.3.12 and 47.5.1, the topology 
on a fibration is typically defined in practice by an atlas, not the other way around. More importantly, the 
purpose of an atlas is to indicate additional structure on fibre sets by mapping them to a fixed fibre space, 
where the structure is typically indicated by a structure group. 


The chart transition function for charts $\phi_i$ and $\phi_j$ in an indexed fibre atlas in Definition 47.5.2 is
$(\pi \times \phi_j) \circ \bigl((\pi \times \phi_i)\big|_{\pi^{-1}(U_i \cap U_j)}\bigr)^{-1} : (U_i \cap U_j) \times F \to (U_i \cap U_j) \times F$,


where $U_i = \pi(\mathrm{Dom}(\phi_i))$ for $i \in I$. This is necessarily continuous by Definition 47.3.3. (See Figure 47.5.1.)
Therefore it is unnecessary to add any subsidiary condition on the regularity of transition functions to
Definitions 47.5.2 and 47.5.5.


Figure 47.5.1 Chart transition map for a fibration with fibre space F
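
The action of the chart transition function on a pair $(b, f)$ can be computed directly. The following is a sketch which uses the notation $E_b = \pi^{-1}(\{b\})$ and the fact that the restriction of a fibre chart to a fibre set is a bijection onto F (Theorem 47.3.11). The point name z is introduced here for the calculation only.

```latex
% Fix (b, f) in (U_i \cap U_j) x F, and let z be the point of E whose
% coordinates with respect to chart i are (b, f).
\[ z = (\pi \times \phi_i)^{-1}(b, f), \qquad \pi(z) = b, \qquad \phi_i(z) = f. \]
% Apply chart j to the point z. The base point b is unchanged, and the fibre
% coordinate f is replaced by its image under a map from F to F.
\[ \bigl((\pi \times \phi_j) \circ (\pi \times \phi_i)^{-1}\bigr)(b, f)
   = (\pi(z), \phi_j(z))
   = \bigl(b,\ (\phi_j \circ (\phi_i|_{E_b})^{-1})(f)\bigr). \]
```

So the transition function preserves the base point and acts on each fibre by the homeomorphism $\phi_j \circ (\phi_i|_{E_b})^{-1} : F \to F$.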


47.5.4 REMARK: Addition of a fibre atlas to a topological fibration. 
Definition 47.5.5 adds a fibre atlas to the fibration structure in Definition 47.3.6. The resulting combined 
structure is given the name “fibre bundle”.


47.5.5 DEFINITION: A topological fibre bundle with fibre space F, for a topological space F, is a tuple
$(E, \pi, B, A_E^F) -\!\!< (E, T_E, \pi, B, T_B, A_E^F)$ such that
(i) $E -\!\!< (E, T_E)$ and $B -\!\!< (B, T_B)$ are topological spaces and $\pi : E \to B$ is continuous,
(ii) $\forall \phi \in A_E^F,\ \exists U_\phi \in T_B,\ \pi \times \phi : \pi^{-1}(U_\phi) \approx U_\phi \times F$,
(iii) $\bigcup_{\phi \in A_E^F} U_\phi = B$.


47.5.6 REMARK: Most of the information in a fibre bundle is redundant.

Since the charts $\phi \in A_E^F$ in Definition 47.5.5 are homeomorphisms, the topologies $T_E$ and $T_B$ may be discarded
without losing information. This is analogous to the fact that the atlas on a differentiable manifold makes
the specification of the topology unnecessary. So a topological fibration could be specified as $(E, \pi, B, A_E^F)$
without loss of information. In fact, the sets E and B (and the set F) can be discarded too. The fibre atlas 
and the projection map contain all of the information in the fibre bundle. (This is discussed in more detail 
in Remark 47.3.12.) 


47.6. Topological fibre bundles 


47.6.1 REMARK: Family tree for topological fibre bundles. 
A small family tree for topological fibre bundles is shown in Figure 47.6.1. 


non-topological fibration $(E, \pi, B)$
topological fibration $(E, T_E, \pi, B, T_B)$
non-topological fibre bundle $(E, \pi, B, A_E^F)$
topological fibre bundle $(E, T_E, \pi, B, T_B, A_E^F)$
Figure 47.6.1 Family tree for topological fibre bundles


47.6.2 REMARK: Notation for fibre atlas components of specification tuples. 

To distinguish manifold atlases from fibre atlases, the notation $A_M$ is used for a manifold atlas, whereas the
notation $A_E^F$ for a fibre atlas on a total space E is tagged with the fibre space F. Then differentiable fibre
bundles (in Chapters 64-66) can be specified by tuples such as $(E, A_E, \pi, B, A_B, A_E^F)$, where $A_E$ and $A_B$ are
manifold atlases for E and B respectively. 


47.6.3 REMARK: Idea for replacing specification tuples with specification trees or networks. 

A possibly superior scheme for specification tuples which include topological or differentiable spaces would 
be to pair each topological or differentiable structure with its corresponding space. (See Section 8.8 for 
specification tuples.) Thus the tuples $(G, T_G, F, T_F, \sigma_G, \mu)$ and $(E, T_E, \pi, B, T_B, A_E^F)$ in Definition 47.6.5
would become respectively $((G, T_G), (F, T_F), \sigma_G, \mu)$ and $((E, T_E), \pi, (B, T_B), A_E^F)$. Similarly, the group operation
$\sigma_G$ would be paired with the topological space $(G, T_G)$, so that $((G, T_G), (F, T_F), \sigma_G, \mu)$ would become
$(((G, T_G), \sigma_G), (F, T_F), \mu)$ or $(((G, \sigma_G), T_G), (F, T_F), \mu)$. It is, however, not quite clear whether a topological
group is a topological space with a group operation or a group with a topological structure. A more accurate
specification structure would be a network of spaces and maps. To avoid the subjective questions of which
structures are more or less primary or secondary than others, the arguably more accurate “specification
trees” or “specification networks” are not used in this book.


47.6.4 REMARK: Diagram of maps and spaces for a fibre bundle with structure group. 

In Definition 47.6.5, $L_g$ denotes the function $L_g : F \to F$ satisfying $L_g : f \mapsto \mu(g, f)$, for transformation group elements $g \in G < (G, F, \sigma_G, \mu)$. See Definition 36.10.7 for “effective topological left transformation groups”. The spaces and maps of Definition 47.6.5 are illustrated in Figure 47.6.2.


Figure 47.6.2 Maps and spaces for a fibre bundle with structure group
In Figure 47.6.2, the map $\bar\mu$ denotes the function-valued function $\bar\mu : G \to (F \to F)$ defined by $\bar\mu(g)(f) = \mu(g, f)$, where $\mu : G \times F \to F$ is the group action of $G$ on $F$. The circle on the arrow for $\bar\mu$ in Figure 47.6.2 indicates that the map is of the form $\bar\mu : G \to (F \to F)$, not $\bar\mu : G \to F$. That is, for all $g \in G$, $\bar\mu(g)$ is a map $\bar\mu(g) : F \to F$. In this case, the circled arrow $\bar\mu$ is the action of a transformation group $G$ on $F$.
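
The function-valued form of the action can be related to the maps $L_g$ by a short calculation. The following is only a sketch. It writes $\bar\mu$ for the map $G \to (F \to F)$ of Figure 47.6.2 and $g_1 g_2$ for $\sigma_G(g_1, g_2)$, and it assumes the usual left transformation group axioms $\mu(g_1 g_2, f) = \mu(g_1, \mu(g_2, f))$ and $\mu(e, f) = f$, which are not restated in this section.

```latex
% Assumed axioms: \mu(g_1 g_2, f) = \mu(g_1, \mu(g_2, f)) and \mu(e, f) = f.
\begin{align*}
\bar\mu(g_1 g_2)(f) &= \mu(g_1 g_2, f)
                     = \mu(g_1, \mu(g_2, f))
                     = \bar\mu(g_1)\bigl(\bar\mu(g_2)(f)\bigr), \\
\bar\mu(e)(f)       &= \mu(e, f) = f .
\end{align*}
```

Under these assumptions $\bar\mu(g) = L_g$ for each $g \in G$, and $\bar\mu$ maps products in $G$ to compositions of maps on $F$. The action is effective when $\bar\mu$ is injective.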


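47.6.5 DEFINITION: A topological $(G, F)$ fibre bundle for an effective topological left transformation group $(G, F) < (G, T_G, F, T_F, \sigma_G, \mu)$ is a tuple $(E, \pi, B, A_E^F) < (E, T_E, \pi, B, T_B, A_E^F)$ satisfying the following.

(i) $(E, T_E)$ and $(B, T_B)$ are topological spaces, $\pi \in C(E, B)$ and $\pi(E) = B$.

(ii) $\forall \phi \in A_E^F,\ \exists U \in T_B,\ \phi : \pi^{-1}(U) \to F$.

(iii) $\bigcup_{\phi \in A_E^F} U_\phi = B$.

(iv) $\forall \phi \in A_E^F,\ \pi \times \phi : \pi^{-1}(U_\phi) \approx U_\phi \times F$.

(v) $\forall \phi_1, \phi_2 \in A_E^F,\ \exists g_{\phi_2,\phi_1} \in C(U_{\phi_1} \cap U_{\phi_2}, G),\ \forall z \in \pi^{-1}(U_{\phi_1} \cap U_{\phi_2}),\ \phi_2(z) = \mu(g_{\phi_2,\phi_1}(\pi(z)), \phi_1(z))$.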


A topological ordinary fibre bundle is a topological (G, F) fibre bundle for some effective topological left 
transformation group (G, F). 


$E$ is the total space of the fibre bundle.

$\pi$ is the projection map of the fibre bundle.

$B$ is the base space of the fibre bundle.

$A_E^F$ is the fibre atlas of the fibre bundle.

$G$ is the structure group of the fibre bundle.

$F$ is the fibre space of the fibre bundle.

$\phi$ is a fibre chart of the fibre bundle for $\phi \in A_E^F$.

$\pi^{-1}(\{b\})$ is the fibre set of the fibre bundle at $b \in B$.

Each element of $E$ is a fibre of the fibre bundle.

$\pi \times \phi$ is the local trivialisation of the fibre bundle by the chart $\phi$.

$g_{\phi_2,\phi_1}$ is the fibre chart transition map from $\phi_1$ to $\phi_2$, for $\phi_1, \phi_2 \in A_E^F$.


47.6.6 THEOREM: Topological fibre bundles are derived from non-topological fibre bundles. 
Let $(E, T_E, \pi, B, T_B, A_E^F)$ be a topological $(G, T_G, F, T_F, \sigma_G, \mu)$ fibre bundle. Then $(E, \pi, B, A_E^F)$ is a non-topological $(G, F, \sigma_G, \mu)$ fibre bundle.


PROOF: Definition 21.8.3 conditions (i), (ii), (iii), (iv) and (v) follow from Definition 47.6.5 conditions (i), 
(ii), (iii), (iv) and (v) respectively. 


47.6.7 THEOREM: Continuity of topological fibre bundle charts. 
The fibre charts of a topological ordinary fibre bundle are continuous. 


PROOF: The assertion follows from Theorem 47.3.4. 


47.6.8 THEOREM: Restrictions of fibre charts to fibre sets are homeomorphisms. 
Let $(E, \pi, B, A_E^F)$ be a topological $(G, F)$ fibre bundle. Let $\phi \in A_E^F$ and $b \in \pi(\mathrm{Dom}(\phi))$. Let $E_b = \pi^{-1}(\{b\})$ have the induced topology from $E$. Then $\phi|_{E_b} : E_b \approx F$.


PROOF: The assertion follows by Definition 47.6.5 (iv) and Theorem 32.11.4 (ii). 


47.6.9 REMARK: Terminology for topological fibre bundles. 
Steenrod [142], pages 7-8, defines a “coordinate bundle” to be the same as Definition 47.6.5, except for the use of the reverse style of fibre chart $\phi : U_\phi \times F \to \pi^{-1}(U_\phi)$.


47.6.10 REMARK: Interpretation of components of definition of a topological fibre bundle. 

The space $C(E, B)$ in Definition 47.6.5 (i) means the set of continuous functions from $E$ to $B$, using the topologies $T_E$ and $T_B$. The sets $C(\pi^{-1}(U_\phi), F)$ and $C(U_{\phi_1} \cap U_{\phi_2}, G)$ assume the respective relative topologies for their domain spaces.


The sets $\pi^{-1}(U_\phi)$ in Definition 47.6.5 (ii) are open subsets of $E$ because $\pi$ is continuous. From (ii), it follows that $U_\phi = \pi(\mathrm{Dom}(\phi))$ is unique for each $\phi \in A_E^F$.

The fibre-set-restricted maps $\phi|_{E_b}$ are homeomorphisms from $E_b = \pi^{-1}(\{b\})$ to $F$ for all $b \in U_\phi$ and $\phi \in A_E^F$ by Theorem 47.6.8. Then condition (v) implies

$\forall \phi_1, \phi_2 \in A_E^F,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \exists g \in G,\ \forall z \in \pi^{-1}(\{b\}),\ \phi_2(z) = \mu(g, \phi_1(z))$.

This may be written in terms of the pointwise homeomorphisms $\phi|_{E_b} : E_b \to F$ as follows.


$\forall \phi_1, \phi_2 \in A_E^F,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \exists g \in G,\ \phi_2|_{E_b} \circ (\phi_1|_{E_b})^{-1} = L_g$.


The group element $g \in G$ is unique for each pair $(\phi_1, \phi_2)$ by Theorem 20.2.7 because $(G, F)$ is an effective transformation group. So the function $g_{\phi_2,\phi_1}$ is unique for each pair $(\phi_1, \phi_2)$. The chart transition map $g_{\phi_2,\phi_1}(b)$ at $b \in B$ from chart $\phi_1$ to chart $\phi_2$ in Definition 47.6.5 is illustrated in Figure 47.6.3.


Figure 47.6.3 Fibre bundle transition maps are group elements 


47.6.11 REMARK: The arbitrary choice of subscript order for fibre chart transition maps. 

The fibre chart transition map notation $g_{\phi_2,\phi_1}$ in Definition 47.6.5 is a consequence of the fact that the transformation groups of fibre bundles are assumed to be left transformation groups, and this fact is a consequence of the convention in matrix algebra that a matrix multiplies a column vector from the left, and this is doubtless a consequence of the fact that function names are generally written before their parameters. Thus for functions $f_1$ and $f_2$ one generally writes $f_2(f_1(x))$, not $((x)f_1)f_2$, and for matrices $A_1$ and $A_2$ one writes $(A_2 A_1)v = A_2(A_1 v)$ for a column vector $v$, not $v(A_1 A_2) = (v A_1)A_2$ for a row vector $v$ (except in some subjects such as the theory of Markov probability models, where the order is chronological left-to-right). Long experience with matrix multiplication has led mathematicians to expect to be able to write $g_{\phi_3,\phi_2}(b)\, g_{\phi_2,\phi_1}(b) = g_{\phi_3,\phi_1}(b)$, for example, where matching indices are adjacent.


If one defines $g_{\phi_2,\phi_1}$ to be the “fibre chart transition map” from $\phi_1$ to $\phi_2$ (which is notated in “reverse chronological order”), one has the formula $L_{g_{\phi_2,\phi_1}(b)} = \phi_2|_{E_b} \circ (\phi_1|_{E_b})^{-1}$. The order of $\phi_1$ and $\phi_2$ is conveniently the same on the left and right sides of the formula. One has also $g_{\phi_3,\phi_2}(b)\, g_{\phi_2,\phi_1}(b) = g_{\phi_3,\phi_1}(b)$, which is similarly convenient.


On the other hand, one writes maps $\phi : X_1 \to X_2$ and $\phi \in C(X_1, X_2)$ in the forward chronological order, which makes simple rules of thumb unreliable.
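
The composition rule for transition maps can be checked directly from the formula for $L_{g_{\phi_2,\phi_1}(b)}$. The following derivation is a sketch which assumes $b \in U_{\phi_1} \cap U_{\phi_2} \cap U_{\phi_3}$ and writes $E_b$ for $\pi^{-1}(\{b\})$.

```latex
% The inner factors (\phi_2|_{E_b})^{-1} and \phi_2|_{E_b} cancel.
\begin{align*}
L_{g_{\phi_3,\phi_2}(b)} \circ L_{g_{\phi_2,\phi_1}(b)}
  &= \bigl(\phi_3|_{E_b} \circ (\phi_2|_{E_b})^{-1}\bigr)
     \circ \bigl(\phi_2|_{E_b} \circ (\phi_1|_{E_b})^{-1}\bigr) \\
  &= \phi_3|_{E_b} \circ (\phi_1|_{E_b})^{-1}
   = L_{g_{\phi_3,\phi_1}(b)} .
\end{align*}
```

Since $L_g \circ L_{g'} = L_{gg'}$ for a left transformation group, and since an effective action determines the group element from the map $L_g$, this gives $g_{\phi_3,\phi_2}(b)\, g_{\phi_2,\phi_1}(b) = g_{\phi_3,\phi_1}(b)$ with the matching indices adjacent.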


47.6.12 REMARK: A topological fibre bundle is a topological fibration with a fibre atlas. 

Conditions (i), (ii), (iii) and (iv) in Definition 47.6.5 mean that $(E, T_E, \pi, B, T_B)$ is a topological fibration with fibre space $F$ according to Definition 47.3.6, and that $A_E^F$ is a fibre atlas for $(E, \pi, B)$ with fibre space $F$ according to Definition 47.5.2.


47.6.13 REMARK: Empty topological fibre bundles. 

Topological fibre bundles may be empty. (This is mentioned for topological fibrations in Remark 47.2.6.) The general empty topological $(G, F)$ fibre bundle for non-empty $F$ has the form $(E, T_E, \pi, B, T_B, A_E^F) = (\emptyset, \{\emptyset\}, \emptyset, \emptyset, \{\emptyset\}, A_E^F)$, where $A_E^F = \emptyset$ or $\{\emptyset\}$. A special case is $(E, T_E, \pi, B, T_B, A_E^F) = (\emptyset, \{\emptyset\}, \emptyset, \emptyset, \{\emptyset\}, \emptyset)$. This may be shown as follows.


An empty fibre bundle means a fibre bundle whose total space is empty. If $E$ in the topological $(G, F)$ fibre bundle $(E, T_E, \pi, B, T_B, A_E^F)$ has $E = \emptyset$, then $T_E = \{\emptyset\}$ is the only possible topology on $E$. (See Remarks 31.3.17 and 31.3.9 regarding the empty topological space.) The only possible function $\pi : \emptyset \to B$ is the empty function $\pi = \emptyset$. (See Remark 10.2.22.) The empty function on the empty topological space $(\emptyset, \{\emptyset\})$ is continuous for any target topology $(B, T_B)$. So Definition 47.6.5 (i) is satisfied.


To satisfy Definition 47.6.5 (ii), one must set $U_\phi = \emptyset$ because $\phi$ must be the empty function, $\pi^{-1}(U) = \emptyset$, and $\pi \times \phi$ is the empty function, which can only be a bijection $\pi \times \phi : \pi^{-1}(U_\phi) \approx U_\phi \times F$ for $F \ne \emptyset$ if $U_\phi = \emptyset$. So the empty chart $\phi = \emptyset$ is the only permissible chart. It then follows by Definition 47.6.5 (iii) that $B = \emptyset$. If $A_E^F = \{\emptyset\}$ (i.e. if the atlas contains only the empty function), then Definition 47.6.5 (v) requires that for $\phi_1 = \phi_2 = \emptyset$ and $U_{\phi_1} = U_{\phi_2} = \emptyset$, the condition $\forall b \in U_{\phi_1} \cap U_{\phi_2},\ \exists g \in G,\ \phi_2|_{E_b} \circ (\phi_1|_{E_b})^{-1} = L_g$ holds. But this is always true because $U_{\phi_1} \cap U_{\phi_2} = \emptyset$. The single transition map $g_{\phi_2,\phi_1}$ in Definition 47.6.5 (v) is the empty function. Therefore $A_E^F = \{\emptyset\}$ is a valid atlas. The fibre atlas $A_E^F = \emptyset$ similarly satisfies all of the conditions.


47.6.14 REMARK: The fibre atlas of a topological fibre bundle contains non-redundant information. 

The fibre atlas $A_E^F$ in Definition 47.6.5 is not an optional parameter in the specification of a topological fibre bundle. A topological fibration $(E, \pi, B)$ may be given two different atlases to specify two different, incompatible fibre bundles with respect to a pair $(G, F)$. A topological fibration $(E, \pi, B)$ with a single atlas is a $(G, F)$ fibre bundle for a range of choices of the group $G$. But this set of groups depends on the choice of atlas $A_E^F$.

In general, the larger the atlas, the smaller the set of allowable structure groups $G$. A fibre bundle with a single-chart atlas has the widest range of possibilities for the group $G$, since any topological transformation group $G$ on $F$ must be compatible with the atlas. If a quadruple $(E, \pi, B, A_E^F)$ is a $(G, F)$ fibre bundle, then it is a $(G', F)$ fibre bundle for any topological left transformation group $G'$ of $F$ which is a supergroup of $G$.


The purpose of the fibre atlas $A_E^F$ on each fibre set is to specify algebraic structure. Topologically, fibre charts are fully determined by the topological fibration $(E, \pi, B)$.


47.6.15 DEFINITION: A compatible fibre chart for a topological $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$ is a fibre chart $\phi$ for $(E, \pi, B)$ with fibre space $F$ such that $(E, \pi, B, A_E^F \cup \{\phi\})$ is a topological $(G, F)$ fibre bundle.


47.6.16 REMARK: Equivalent topological fibre atlases. 


Another way of stating Definition 47.6.17 is that two atlases $A_1$ and $A_2$ are said to be equivalent if every fibre chart in $A_1$ is compatible with every fibre chart in $A_2$.


47.6.17 DEFINITION: Topological $(G, F)$ fibre atlases $A_1$ and $A_2$ for a topological fibration $(E, \pi, B)$ are said to be equivalent topological $(G, F)$ fibre atlases for $(E, \pi, B)$ if $A_1 \cup A_2$ is a topological $(G, F)$ fibre atlas for $(E, \pi, B)$.


47.6.18 EXAMPLE: The Möbius strip as a topological fibre bundle.

The Möbius strip provides probably the simplest non-trivial example of a fibre bundle. For the Möbius strip, the only possible structure group is $G = \{I, J\}$ where $I$ is the identity on $F = \{-1, 1\}$ and $J : F \to F$ swaps the elements of $F$. The trivial group $\{I\}$ would not be an effective group since it leaves two elements of $F$ fixed. In this example, both the fibre space $F$ and the structure group $G$ are completely determined by the triple $(E, \pi, B)$. In general, the structure group $G$ may not be uniquely determined by this triple.


47.7. Fibre bundle homomorphisms 


47.7.1 REMARK: Classification of fibre bundles according to global topology. 

Topological homomorphisms between fibre bundles are not unavoidable in differential geometry. They are 
required for the classification of fibre bundles into equivalence classes with respect to global topology, but 
the style of differential geometry presented in this book is predominantly local. Consequently the definitions 
in Section 47.7 are not much used in later chapters. It is the associations between fibre bundles, as discussed 
in Sections 47.9, 47.10, 47.11 and 47.12, which are unavoidable in applicable differential geometry. 


47.7.2 REMARK: Fibre bundle homomorphisms. 

The style of homomorphism in Definition 47.7.3 maps only total spaces because the rest of the structure more or less follows from this. The atlases are not required to be completely equivalent. They are only required to be “continuous-equivalent”. Thus conditions (iii) and (iv) specify that the map must be compatible with the charts on each fibre bundle and the topology of the structure group.


A fibre bundle homomorphism is called a “fibre bundle map” in EDM2 [113], 147.B. It is called simply a “map” by Steenrod [142], page 9, who mentions that “in the literature” it is called a “fibre preserving” map.
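47.7.3 DEFINITION: A topological $(G, F)$ fibre bundle homomorphism between two topological $(G, F)$ fibre bundles $(E_1, \pi_1, B_1, A_1)$ and $(E_2, \pi_2, B_2, A_2)$ is a continuous map $h : E_1 \to E_2$ such that:


(i) $\forall b_1 \in B_1,\ \exists b_2 \in B_2,\ h(\pi_1^{-1}(\{b_1\})) \subseteq \pi_2^{-1}(\{b_2\})$;
(ii) $\pi_2 \circ h = \bar h \circ \pi_1$ for some continuous function $\bar h : B_1 \to B_2$;
(iii) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \forall b \in U_1 \cap \bar h^{-1}(U_2),\ \exists g \in G,\ \phi_2|_{\bar h(b)} \circ h \circ (\phi_1|_b)^{-1} = L_g$, where $\phi_i|_{b_i} = \phi_i|_{\pi_i^{-1}(\{b_i\})}$ for $\phi_i \in A_i$ and $b_i \in U_i = \pi_i(\mathrm{Dom}(\phi_i))$ for $i = 1, 2$;
(iv) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \rho_{\phi_2,\phi_1} : U_1 \cap \bar h^{-1}(U_2) \to G$ is continuous, where $\rho_{\phi_2,\phi_1}$ is defined by $L_{\rho_{\phi_2,\phi_1}(b)} = \phi_2|_{\bar h(b)} \circ h \circ (\phi_1|_b)^{-1}$.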


47.7.4 REMARK: Interpretation of the conditions for a topological fibre bundle homomorphism. 

Definition 47.7.3 (ii) is equivalent to requiring that the relation $\bar h = \pi_2 \circ h \circ \pi_1^{-1}$ be a well-defined continuous function. If this function exists, it is unique. The continuity of $\bar h$ means roughly that $h$ continuously maps sets of the form $\pi_1^{-1}(\{b_1\})$ for $b_1 \in B_1$ to sets of the form $\pi_2^{-1}(\{b_2\})$ for $b_2 \in B_2$.


Condition (iii) can be made to look more similar to condition (ii) by writing it as follows:
$\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \forall b \in U_1 \cap \bar h^{-1}(U_2),\ \exists g \in G,$
$\phi_2|_{\bar h(b)} \circ h = L_g \circ \phi_1|_b$ on $\pi_1^{-1}(\{b\})$, where $h$ maps $\pi_1^{-1}(\{b\})$ into $\pi_2^{-1}(\{\bar h(b)\})$.


Thus $L_g : F \to F$ in (iii) is analogous to the base space map $\bar h : B_1 \to B_2$ in (ii).
Condition (iv) is illustrated in Figure 47.7.1. It may be written equivalently as


$\phi_2(h(z)) = L_{\rho_{\phi_2,\phi_1}(\pi_1(z))}(\phi_1(z))$


for $z \in \pi_1^{-1}(U_1 \cap \bar h^{-1}(U_2))$. This shows that $h(z)$ is equivalent “through the charts” to a group action which depends only on $\pi_1(z)$ for each fixed choice of charts.


Figure 47.7.1 Topological (G, F) fibre bundle homomorphism 


47.7.5 DEFINITION: A topological $(G, F)$ fibre bundle isomorphism between topological $(G, F)$ fibre bundles $(E_1, \pi_1, B_1, A_1)$ and $(E_2, \pi_2, B_2, A_2)$ is a map $h : E_1 \to E_2$ such that $h$ and $h^{-1}$ are topological $(G, F)$ fibre bundle homomorphisms.


47.7.6 DEFINITION: Equivalent topological (G, F) fibre bundles are topological (G, F) fibre bundles between 
which there exists a topological (G, F) fibre bundle isomorphism. 


47.7.7 REMARK: Topological fibre bundle homomorphisms for differing structure groups. 

The style of fibre bundle homomorphism in Definition 47.7.3 assumes the same topological transformation 
group (G, F) for the two fibre bundles. Definition 47.7.8 removes this restriction. (See Definition 36.10.16 
for topological transformation group homomorphisms.) 
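47.7.8 DEFINITION: A topological fibre bundle homomorphism between a topological $(G_1, F_1)$ fibre bundle $(E_1, \pi_1, B_1, A_1)$ and a topological $(G_2, F_2)$ fibre bundle $(E_2, \pi_2, B_2, A_2)$ is a tuple of maps $(h, \hat h, \check h)$ such that:


(i) $(\hat h, \check h)$ is a topological transformation group homomorphism with $\hat h : G_1 \to G_2$ and $\check h : F_1 \to F_2$;
(ii) $h : E_1 \to E_2$ is continuous;
(iii) $\pi_2 \circ h = \bar h \circ \pi_1$ for some continuous function $\bar h : B_1 \to B_2$;
(iv) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \forall b \in U_1 \cap \bar h^{-1}(U_2),\ \exists g \in G_1,\ \phi_2|_{\bar h(b)} \circ h \circ (\phi_1|_b)^{-1} = \check h \circ L_g$, where $\phi_i|_{b_i} = \phi_i|_{\pi_i^{-1}(\{b_i\})}$ for $\phi_i \in A_i$ and $b_i \in U_i = \pi_i(\mathrm{Dom}(\phi_i))$ for $i = 1, 2$;
(v) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \rho_{\phi_2,\phi_1} : U_1 \cap \bar h^{-1}(U_2) \to G_1$ is continuous, where $\rho_{\phi_2,\phi_1}$ is defined by $\check h \circ L_{\rho_{\phi_2,\phi_1}(b)} = \phi_2|_{\bar h(b)} \circ h \circ (\phi_1|_b)^{-1}$.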


47.7.9 REMARK: Conditions for topological fibre bundle homomorphisms. 
Definition 47.7.8 is illustrated in Figure 47.7.2. 


Figure 47.7.2 General topological fibre bundle homomorphism


If the equation $\phi_2|_{\bar h(b)} \circ h \circ (\phi_1|_b)^{-1} = \check h \circ L_g$ in condition (iv) is replaced with the less restrictive condition $\exists g_2 \in G_2,\ \phi_2|_{\bar h(b)} \circ h \circ (\phi_1|_b)^{-1} = L_{g_2} \circ \check h \circ L_g$, which would be more symmetric, it would no longer be required in general that $(\phi_2|_{\bar h(b)} \circ h \circ (\phi_1|_b)^{-1})(F_1) \subseteq \check h(F_1)$.


47.7.10 REMARK:  Topological fibre bundle isomorphisms with equivalent but different structure groups. 
Definitions 47.7.11 and 47.7.12 permit the structure groups $(G_1, F_1)$ and $(G_2, F_2)$ to be different but equivalent whereas Definitions 47.7.5 and 47.7.6 require identical structure groups.


47.7.11 DEFINITION: A topological fibre bundle isomorphism between topological fibre bundles $(E_1, \pi_1, B_1)$ and $(E_2, \pi_2, B_2)$ is a bijection $h : E_1 \to E_2$ such that $h$ and $h^{-1}$ are topological fibre bundle homomorphisms.


47.7.12 DEFINITION: Equivalent topological fibre bundles are topological fibre bundles between which there 
is a topological fibre bundle isomorphism. 


47.7.13 REMARK:  Cross-sections of topological fibre bundles. 
The definitions of cross-sections for topological fibre bundles are inherited from the corresponding definitions 
for topological fibrations in Section 47.4. 
47.8. Topological principal fibre bundles


47.8.1 REMARK: The purpose and roles of principal fibre bundles. 

A topological principal fibre bundle is a particular kind of topological fibre bundle, namely a topological $(G, F)$ fibre bundle $(P, \pi, B, A_P^F)$ such that $F = G$. (Topological fibre bundles were defined in Section 47.6.) More precisely, the structure group $(G, F) < (G, T_G, F, T_F, \sigma_G, \mu)$ has $F = G$, $T_F = T_G$ and $\mu = \sigma_G$. So the structure group for a principal fibre bundle is the topological left transformation group $(G, G) < (G, T_G, G, T_G, \sigma_G, \sigma_G)$. The corresponding topological group is $G < (G, T_G, \sigma_G)$. A principal fibre bundle with structure group $G$ is usually called a “principal $G$-bundle” or just a “$G$-bundle”.

Some authors define principal fibre bundles in reverse. They start with the right transformation group $(G, P)$ and build the $(G, G)$ fibre bundle $(P, \pi, B, A_P^G)$ from this. (Some authors construct the base space of a principal fibre bundle as the quotient of a $G$-space $P$ by the right action of a group $G$. Then they construct ordinary fibre bundles as associated bundles for left transformation groups $(G, F)$ as in Section 47.11.)


Most authors define connections (differential parallelism) on principal fibre bundles rather than ordinary 
fibre bundles. Then they copy connections from PFBs to associated OFBs. Thus principal fibre bundles 
customarily serve as a structure on which to define connections and parallelism. Connections and parallelism 
are defined on OFBs instead of PFBs in this book, but the importance of connections on PFBs is inescapable. 


47.8.2 REMARK: Comparison with non-topological and differentiable principal fibre bundles. 

Definitions 47.8.3 and 47.8.7 introduce topological principal bundles and right action maps corresponding to 
Definitions 21.9.4 and 21.11.4 for non-topological principal fibre bundles, and Definitions 66.1.2 and 66.2.2 
for differentiable principal fibre bundles. 


47.8.3 DEFINITION: A topological principal (fibre) bundle with structure group $G$ for a topological group $G < (G, T_G, \sigma_G)$ is a topological $(G, G)$ fibre bundle $(P, \pi, B, A_P^G) < (P, T_P, \pi, B, T_B, A_P^G)$, where $(G, G) < (G, T_G, G, T_G, \sigma_G, \sigma_G)$ is the topological left transformation group $G$ acting on itself.


Alternative names: topological principal G-bundle or topological G-bundle. 


47.8.4 REMARK: Diagram of the component parts of a topological principal fibre bundle. 
Figure 47.8.1 illustrates the spaces and maps in Definition 47.8.3. The map $R_g : P \to P$ in Figure 47.8.1 is defined in Notation 47.8.9, and $L_g : G \to G$ is the left translation map $L_g : g' \mapsto gg'$.


Figure 47.8.1 Principal fibre bundle spaces and maps


47.8.5 THEOREM: Topological principal bundles are derived from non-topological principal bundles. 
Let $(P, T_P, \pi, B, T_B, A_P^G)$ be a topological principal $G$-bundle, where $G < (G, T_G, \sigma_G)$. Then $(P, \pi, B, A_P^G)$ is a non-topological principal $G$-bundle with structure group $G < (G, \sigma_G)$.


PROOF: The assertion follows from Theorem 47.6.6 and Definitions 47.8.3 and 21.9.4. 


47.8.6 REMARK: The right action map for topological principal bundles. 
Definition 47.8.7 introduces the right action map for topological principal bundles. (See Definition 21.11.4 
for non-topological principal bundles. See Definition 66.2.2 for differentiable principal bundles.) 
47.8.7 DEFINITION: The right action (map) for a topological principal bundle $(P, \pi, B, A_P^G)$ with structure group $G < (G, T_G, \sigma_G)$ is the operation $\mu_G^P : P \times G \to P$ which is defined by $\mu_G^P(z, g) = (\phi|_{P_{\pi(z)}})^{-1}(\sigma_G(\phi(z), g))$ for $(z, g) \in P \times G$ for any $\phi \in A_P^G$ with $z \in \mathrm{Dom}(\phi)$. In other words,


$\forall z \in P,\ \forall g \in G,\ \forall \phi \in A_P^G$ with $z \in \mathrm{Dom}(\phi)$, $\mu_G^P(z, g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g)$. (47.8.1)


47.8.8 REMARK:  Topological version of notation for the right action map. 
Notation 47.8.9 is the topological version of Notation 21.11.6 for non-topological principal bundles. See 
Notation 66.2.3 for the corresponding notation for differentiable principal bundles. 


47.8.9 NOTATION: Right action by a structure group element on a PFB total space. 
$R_g^P$, for a topological principal $G$-bundle $(P, \pi, B, A_P^G)$ and $g \in G$, denotes the map from $P$ to $P$ defined by $z \mapsto \mu_G^P(z, g)$, where $\mu_G^P$ is the right action map of $G$ on $P$. In other words, $\forall z \in P,\ R_g^P(z) = \mu_G^P(z, g)$.


$R_g$ is an abbreviation for $R_g^P$ when the context makes the meaning clear.


$zg$, for $z \in P$ and $g \in G$, denotes $R_g^P(z)$.


47.8.10 THEOREM: Fibre chart independence of the right action map. 
The right action map in Definition 47.8.7 is fibre chart independent.


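PROOF: For $z \in P$, $b = \pi(z)$ and $\phi, \phi' \in A_P^G$, let $g' \in G$ satisfy $\phi' \circ (\phi|_{P_b})^{-1} = L_{g'}$. Then


$(\phi|_{P_b})^{-1}(\sigma_G(\phi(z), g)) = (\phi'|_{P_b})^{-1}(L_{g'}(\sigma_G(\phi(z), g)))$
$= (\phi'|_{P_b})^{-1}(\sigma_G(\phi'(z), g))$.


The fibre chart independence follows from this. (Alternatively, apply Theorem 21.11.2 (iv).)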
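
The second equality in the proof of Theorem 47.8.10 rests on the associativity of the group operation. As a sketch, write $g'\phi(z)$ for $\sigma_G(g', \phi(z))$ and $\phi(z)g$ for $\sigma_G(\phi(z), g)$, which are abbreviations assumed here, and note that $\phi'(z) = L_{g'}(\phi(z))$ because $\phi' \circ (\phi|_{P_b})^{-1} = L_{g'}$.

```latex
% Left translation by g' commutes with right multiplication by g.
\begin{align*}
L_{g'}\bigl(\sigma_G(\phi(z), g)\bigr)
  &= g'\,(\phi(z)\,g)
   = (g'\,\phi(z))\,g \\
  &= \sigma_G\bigl(L_{g'}(\phi(z)),\, g\bigr)
   = \sigma_G\bigl(\phi'(z),\, g\bigr).
\end{align*}
```

The first equality in the proof uses only $(\phi|_{P_b})^{-1} = (\phi'|_{P_b})^{-1} \circ L_{g'}$, which is a rearrangement of the defining relation for $g'$.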


47.8.11 REMARK: Fibre-chart independence of right action is due to commutativity with left actions. 
This chart independence in Theorem 47.8.10 is due to the fact that the fibre space is a group whose elements 
can be acted on from both the left and the right. One could say that group elements are “two-port” objects. 
Since groups are associative, the right and left actions commute with each other. The elements of ordinary 
fibre spaces are then “single-port” objects. 


47.8.12 REMARK: The right action on the total space is the action of G on itself via the charts. 

The right action $R_g : P \to P$ of $G$ on $P$ in Notation 47.8.9 may be expressed in terms of the right action $R_g : G \to G$ of $G$ on $G$ as $R_g = (\phi|_{P_b})^{-1} \circ R_g \circ \phi|_{P_b}$ on $P_b$ for $g \in G$ and $b \in \pi(\mathrm{Dom}(\phi))$. In other words, the action of $G$ on $P$ is the same as the action of $G$ on $G$ through the charts. Therefore the right action $\mu_G^P$ does not provide any information that is not already in the fibre bundle.


47.8.13 THEOREM: Topological right transformation group using right action map. 
Let $(P, \pi, B, A_P^G)$ be a topological principal bundle with structure group $G < (G, \sigma_G) < (G, T_G, \sigma_G)$. Let $\mu_G^P : P \times G \to P$ be its right action map. Then $(G, P, \sigma_G, \mu_G^P)$ is a topological right transformation group.


PROOF: By Theorem 21.11.7 (iv), $(G, P, \sigma_G, \mu_G^P)$ (ignoring the topologies on $G$ and $P$) is a non-topological right transformation group.


To show the continuity of $\mu_G^P : P \times G \to P$, it is sufficient to show this for each fibre chart $\phi \in A_P^G$ in the definition of $\mu_G^P$ on line (47.8.1) because the right action map is chart-independent by Theorem 47.8.10. The continuity of $\pi$, $\phi$ and $(\pi \times \phi)^{-1}$ follows from Definitions 47.8.3 and 47.6.5 (i, iv). The expression “$\phi(z)g$” on line (47.8.1) is continuous with respect to $\phi(z)$ and $g$ because $\sigma_G$ is continuous by Definitions 47.8.3 and 47.6.5. Therefore the composition of functions on line (47.8.1) yields a continuous function $\mu_G^P : P \times G \to P$. Hence $(G, P, \sigma_G, \mu_G^P)$ is a topological right transformation group by Definition 36.11.3.


47.8.14 DEFINITION: The right transformation group of a topological principal fibre bundle $(P, \pi, B, A_P^G)$ is the topological right transformation group $(G, P) < (G, T_G, P, T_P, \sigma_G, \mu_G^P)$.
47.8.15 REMARK: The right transformation groups of topological principal bundles act freely.

As in the case of non-topological principal bundles, the right transformation group in Definition 47.8.14 acts 
freely on the total space. As mentioned in Remark 21.11.9, the possibility of an empty total space cannot 
be ignored in Theorem 47.8.16. 


47.8.16 THEOREM: Free and effective action of right transformation group of a principal bundle. 
The right transformation group of a topological principal bundle acts freely on the total space. Hence the 
action is effective if the total space is non-empty. 


PROOF: The assertion follows from Theorem 21.11.10 for non-topological principal bundles.


47.8.17 REMARK: Commutativity of right actions with fibre charts. 

Assertion (iv) of Theorem 47.8.18 is sometimes given as part of the definition of a principal fibre bundle,
whereas here it is presented as a consequence of the definition of a right action which is expressed in terms 
of the structural components of the standard fibre bundle tuple in Definition 47.8.3. 


47.8.18 THEOREM: Some basic properties of the right action map of a principal bundle. 

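Let $(P, T_P, \pi, B, T_B, A_P^G)$ be a topological principal $G$-bundle, where $G < (G, T_G, \sigma_G)$.

(i) $\forall g \in G,\ \mathrm{Dom}(R_g^P) = P$.

(ii) $\forall z \in P,\ \forall g_1, g_2 \in G,\ z(g_1 g_2) = (z g_1) g_2$.
(That is, $\forall z \in P,\ \forall g_1, g_2 \in G,\ R^P_{g_1 g_2}(z) = R^P_{g_2}(R^P_{g_1}(z))$. Thus $\forall g_1, g_2 \in G,\ R^P_{g_1 g_2} = R^P_{g_2} \circ R^P_{g_1}$.)

(iii) $\forall z \in P,\ \forall g \in G,\ \pi(zg) = \pi(z)$. (That is, $\pi(\mu_G^P(z, g)) = \pi(z)$.)

(iv) $\forall \phi \in A_P^G,\ \forall z \in \mathrm{Dom}(\phi),\ \forall g \in G,\ \phi(zg) = \phi(z)g$. (That is, $\phi(\mu_G^P(z, g)) = \sigma_G(\phi(z), g)$.)

(v) Let $\mu : P \times G \to P$ be a function which satisfies (iii) $\forall z \in P,\ \forall g \in G,\ \pi(\mu(z, g)) = \pi(z)$ and (iv) $\forall \phi \in A_P^G,\ \forall z \in \mathrm{Dom}(\phi),\ \forall g \in G,\ \phi(\mu(z, g)) = \sigma_G(\phi(z), g)$. Then $\mu = \mu_G^P$. Hence $\mu_G^P$ is the unique function from $P \times G$ to $P$ which satisfies parts (iii) and (iv).

(vi) $\forall g \in G,\ \pi \circ R_g^P = \pi$.

(vii) $\forall \phi \in A_P^G,\ \forall g \in G,\ \phi \circ R_g^P = R_g \circ \phi$.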
(vii) Vb € B, Yz € P, P, = (29: g € G}. 

(ix) Vb € B, Vz € P, Py = (zg ig € Gh. 


PROOF: Part (i) follows from Theorems 21.11.7 (i) and 47.6.6, and Definitions 21.11.4 and 47.8.7.
Part (ii) follows from Theorems 21.11.7 (ii) and 47.6.6, and Definitions 21.11.4 and 47.8.7. 


Part (iii) follows from Theorems 21.11.7 (v) and 47.6.6, and Definitions 21.11.4 and 47.8.7. 

Part (iv) follows from Theorems 21.11.7 (vi) and 47.6.6, and Definitions 21.11.4 and 47.8.7. 
Part (v) follows from Theorems 21.11.7 (vii) and 47.6.6, and Definitions 21.11.4 and 47.8.7. 
Part (vi) follows from Theorems 21.11.7 (viii) and 47.6.6, and Definitions 21.11.4 and 47.8.7. 
Part (vii) follows from Theorems 21.11.7 (ix) and 47.6.6, and Definitions 21.11.4 and 47.8.7. 
Part (viii) follows from Theorems 21.11.7 (x) and 47.6.6, and Definitions 21.11.4 and 47.8.7. 
Part (ix) follows from part (viii). 

47.8.19 REMARK: Structure group elements act freely on the principal fibre bundle total space. 


In Definition 47.6.5 for ordinary fibre bundles, the structure group G is required to act effectively on the 
fibre space F. Definition 47.8.3 does not need to specify that the right action of G on G is effective because a 
group always acts freely on itself, and a free action on a non-empty set is necessarily effective, as mentioned 
in Remark 20.3.3. 


The right action $\mu_G^P$ of $G$ acts freely on $P$. To see this in terms of the conditions in Theorem 47.8.18, let $z \in P$ and $g \in G \setminus \{e\}$ satisfy $\mu_G^P(z, g) = z$; that is, $zg = z$. Then by (iii), $\pi(zg) = \pi(z)$. So $zg$ and $z$ are in the same fibre set of $P$; that is, they have the same base point $b = \pi(zg) = \pi(z)$. From (iv) it follows that $\phi(zg) = \phi(z)g$ for any chart $\phi \in A_P^G$ such that $b \in U_\phi = \pi(\mathrm{Dom}(\phi))$. Since $G$ acts freely on $G$, this implies that $\phi(zg) \ne \phi(z)$. But $\pi \times \phi : \pi^{-1}(U_\phi) \approx U_\phi \times F$ is a bijection. Therefore $zg = (\pi \times \phi)^{-1}(b, \phi(zg)) \ne (\pi \times \phi)^{-1}(b, \phi(z)) = z$, which contradicts the assumption $zg = z$. This means that non-identity elements of $G$ have no fixed points, and $G$ therefore acts freely on $P$.


[ www. geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13] 




47.8.20 REMARK: Interpretation of properties of right action maps on PFB total spaces. 
Theorem 47.8.18 (iv) means that all fibre charts are equivariant maps between the right transformation groups $(G, P)$ and $(G, G)$. (See Definition 20.8.4 for equivariant maps.)


Condition (iii) of Theorem 47.8.18 means that the action of $G$ on $P$ is “vertical”. Without this condition, the action of $G$ on $P$ would not be uniquely determined by the fibre charts as a result of condition (iv).


47.8.21 REMARK: The information in PFB fibre charts is partly but not fully redundant. 

In contrast to the ease of constructing $\mu_G^P$ from the atlas $A_P^G$ in Remark 47.8.12, it is not possible to construct the atlas from the right action $\mu_G^P$. It is true that if $\phi(z)$ is known for one value of $z \in \pi^{-1}(\{b\})$ for a given $b \in \pi(\mathrm{Dom}(\phi))$, then $\phi(zg)$ is uniquely defined for all $g \in G$ as $\phi(zg) = \phi(z)g$, and this means (by Definition 47.6.5 (iv)) that $\phi(z)$ is uniquely defined for all $z \in \pi^{-1}(\{b\})$. Thus the fibre charts are fully defined if they are defined for just one value of $z$ in each fibre $\pi^{-1}(\{b\})$. However, this single value cannot be obtained from $\mu_G^P$. Therefore the information in the fibre charts is partly but not fully redundant.


To specify a fibre chart $\phi$, it is sufficient to specify the set $\phi^{-1}(\{e\})$, where $e \in G$ is the identity of $G$. The rest of the fibre chart is uniquely determined by this. One may think of the unique point $(\phi|_{P_b})^{-1}(e) \in \phi^{-1}(\{e\}) \cap \pi^{-1}(\{b\})$ as the “identity point” in each fibre set $\pi^{-1}(\{b\})$. For a tangent bundle, this gives a kind of “coordinate frame field” on $B$.
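
The failure to recover the charts from the right action can be made explicit. As a sketch, let $\phi$ be a fibre chart with $U_\phi = \pi(\mathrm{Dom}(\phi))$, and let $k : U_\phi \to G$ be any continuous function. (The name $k$ is hypothetical and is introduced only for this illustration.) Define $\phi'(z) = k(\pi(z))\,\phi(z)$ for $z \in \mathrm{Dom}(\phi)$. Using $\pi(zg) = \pi(z)$ and $\phi(zg) = \phi(z)g$ from Theorem 47.8.18, together with associativity in $G$, one obtains the following.

```latex
% \phi' is a left translate of \phi by the hypothetical function k.
\begin{align*}
\phi'(zg) &= k(\pi(zg))\,\phi(zg)
           = k(\pi(z))\,\bigl(\phi(z)\,g\bigr) \\
          &= \bigl(k(\pi(z))\,\phi(z)\bigr)\,g
           = \phi'(z)\,g .
\end{align*}
```

So $\phi'$ is equivariant with respect to the same right action as $\phi$, although $\phi' \ne \phi$ unless $k$ is identically $e$. The right action alone therefore cannot distinguish $\phi$ from $\phi'$, which is the sense in which the “identity point” of each fibre set must be supplied by the chart.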


47.9. Associated topological fibre bundles 


47.9.1 REMARK: Topological versus non-topological associated fibre bundles. 

Most of the associated topological fibre bundle concepts in Sections 47.9, 47.10, 47.11 and 47.12 can be 
defined for non-topological fibre bundles. Definitions 47.9.5 and 47.9.7 for fibre bundle associations have no 
continuity requirement. So the definitions are essentially the same as the corresponding Definitions 21.12.4
and 21.12.5 for non-topological fibre bundles. Likewise, the associated fibre bundle construction methods are 
essentially the same because they are non-topological. (Specific methods for constructing associated fibre 
bundles are presented in Sections 47.10 and 47.11.) 


47.9.2 REMARK: The distinction between fibre bundle associations and fibre bundle homomorphisms. 

A fibre bundle association is a different kind of relation to a fibre bundle homomorphism. (See Section 47.7 
for fibre bundle homomorphisms.) A fibre bundle association specifies a map between the fibre atlases of 
the fibre bundles. This fibre atlas map is required to preserve the chart transition maps, but there is no 
specified map between the total spaces of the associated bundles. A fibre bundle association may be thought 
of as an isomorphism of the fibre atlases. However, since the fibre spaces are typically different, associated 
fibre bundles typically are not topologically isomorphic. So associations and topological isomorphisms are 
distinct kinds of relations. 


47.9.3 REMARK: Parallelism can be ported between associated fibre bundles. 

If $(G, F)$ and $(G, \bar F)$ are effective left transformation groups, then the fibre chart transition functions uniquely
determine group elements. These group elements can then be compared with each other. If they are 
equal for some bijection between the atlases, they are said to be associated. This shows a good reason 
why the transformation groups should be effective. Since parallelism is determined by group elements 
at each base point, it follows that parallelism can be uniquely transferred from any fibre bundle to any 
associated fibre bundle. In the case of differentiable fibre bundles, this means that connections may be 
ported between different types of fibre bundles if they have the same base space and structure group. In 
particular, affine connections on tangent bundles can be copied to all types of tensor bundles.


47.9.4 REMARK: Associations between fibre bundles are defined between the fibre charts. 
Definition 47.9.5 says only that associated fibre bundles must have exactly matching domains for all charts 
in the respective atlases, and that all chart transition functions must be the same. 


Definition 47.9.5 is illustrated in Figure 47.9.1. The correspondence between the fibre bundles involves only 
a map h between the charts. There are no maps between the spaces of the two fibre bundles. This should 
be contrasted with fibre bundle homomorphisms as illustrated in Figure 47.7.2, which have maps between 
all components of the two fibre bundles. Note also that although the spaces “on the outside” ($G$ and $B$)
are the same, the spaces “on the inside” ($F$ and $E$) are different. Therefore the group action maps $\mu$ and $\hat\mu$
are different and the projection maps $\pi$ and $\hat\pi$ are different. In the middle, the charts $\phi_i$ and $\hat\phi_i = h(\phi_i)$ are
related by the fibre space association map $h$.


Figure 47.9.1 Associated topological fibre bundles


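47.9.5 DEFINITION: A topological fibre bundle association is a bijection $h : A_E^F \to A_{\hat E}^{\hat F}$ between the fibre
atlases of topological $(G, F)$ and $(G, \hat F)$ fibre bundles $(E, \pi, B, A_E^F)$ and $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ respectively such that:
(i) $\forall \phi \in A_E^F$, $\pi(\mathrm{Dom}(\phi)) = \hat\pi(\mathrm{Dom}(h(\phi)))$;
(ii) $\forall \phi_1, \phi_2 \in A_E^F$, $\forall b \in U_{\phi_1} \cap U_{\phi_2}$, $g_{\phi_2,\phi_1}(b) = \hat g_{h(\phi_2),h(\phi_1)}(b)$, where $U_\phi$ denotes $\pi(\mathrm{Dom}(\phi))$, and $g$, $\hat g$ denote
the fibre chart transition functions for the respective fibre bundle atlases.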


In other words, a topological fibre bundle association is the same as a non-topological fibre bundle association 
in Definition 21.12.4. 


47.9.6 REMARK: Alternative expression for fibre-chart-independence condition for OFB associations. 
Perhaps a clearer way of presenting Definition 47.9.5 (ii) is: 


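$\forall \phi_1, \phi_2 \in A_E^F$, $\forall b \in U_{\phi_1} \cap U_{\phi_2}$, $\forall g \in G$,
$\phi_2|_{E_b} = L_g \circ \phi_1|_{E_b} \iff h(\phi_2)|_{\hat E_b} = \hat L_g \circ h(\phi_1)|_{\hat E_b}$. (47.9.1)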


This is similar to equation (48.4.1) in Definition 48.4.2 for associated parallelism. Condition (47.9.1) is 
equivalent to Definition 47.9.5 (ii) by Theorem 21.12.8. 


47.9.7 DEFINITION: Associated topological fibre bundles are topological $(G, F)$ and $(G, \hat F)$ fibre bundles
$(E, \pi, B, A_E^F)$ and $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ for which a topological fibre bundle association $h : A_E^F \to A_{\hat E}^{\hat F}$ is specified.


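47.9.8 REMARK: Expressions for fibre chart transition functions in terms of fibre chart compositions.
By Definition 47.6.5, chart transition functions in Definition 47.9.5 are defined by
$L_{g_{\phi_2,\phi_1}(b)} = \beta_{b,\phi_2} \circ \beta_{b,\phi_1}^{-1}$ and $\hat L_{\hat g_{\hat\phi_2,\hat\phi_1}(b)} = \hat\beta_{b,\hat\phi_2} \circ \hat\beta_{b,\hat\phi_1}^{-1}$,
where $\beta$ and $\hat\beta$ are notations defined by $\beta_{b,\phi} = \phi|_{\pi^{-1}(\{b\})}$ and $\hat\beta_{b,\hat\phi} = \hat\phi|_{\hat\pi^{-1}(\{b\})}$.
Definition 47.9.5 (ii) means that the group elements $g$ and $\hat g$ are equal, but $L_g : F \to F$
and $\hat L_g : \hat F \to \hat F$ are not equal in general. Even if the spaces $F$ and $\hat F$ are the same, the group actions
$\mu : G \times F \to F$ and $\hat\mu : G \times \hat F \to \hat F$ might be different.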


47.9.9 REMARK: For a given pair of fibre bundles, there may be many fibre bundle associations. 

The fibre bundle association for fibre bundles in Definition 47.9.7 is generally not unique even if the atlases 
of the fibre bundles are fixed. Since a fibre bundle may be associated with itself via an atlas bijection which 
is not the identity, such a self-association may be composed with associations to different fibre bundles to 
also make associations between different fibre bundles non-unique. It follows that when parallelism is being 
copied between associated fibre bundles, it is essential to specify which association map is being used between 
the atlases.


47.9.10 REMARK: Fibre bundles may be associated via fibre bundle equivalence maps.

Definition 47.9.7 is very strict. It requires the atlases of associated topological fibre bundles to have closely 
matched domains and identical fibre chart transition maps. In practice, one may be content to call a pair of 
fibre bundles associated if they are merely “continuous-equivalent” (in the sense of Definition 47.7.12) to a 
pair of strictly associated fibre bundles. This is presented in Definition 47.9.11. 


For defining parallelism and connections, it is the very strict linkage between associated fibre bundles, 
as stipulated in Definition 47.9.7, which is of real benefit. Parallelism and connections can be “ported” 
between strictly associated fibre bundles so as to guarantee certain kinds of “contragredience” relations when 
associated fibre charts are used. Definition 47.9.11 does have applications in the topological classification of 
fibre bundles, but this topic is outside the scope of this book. 


47.9.11 DEFINITION: Continuous-associated topological fibre bundles are topological fibre bundles $\xi_1$, $\hat\xi_1$
with the same base space which are equivalent to associated topological fibre bundles $\xi_2$, $\hat\xi_2$ respectively.


47.9.12 REMARK: Illustration of fibre bundles associated via fibre bundle equivalence maps. 

Definition 47.9.11 is illustrated in Figure 47.9.2. Definition 47.9.11 means that if $\xi_2$ and $\hat\xi_2$ are associated
topological fibre bundles according to Definition 47.9.7, and $\xi_1$ and $\xi_2$ are equivalent, and $\hat\xi_1$ and $\hat\xi_2$ are
equivalent, and they all have the same base space, then $\xi_1$ and $\hat\xi_1$ are continuous-associated topological fibre
bundles. (See Definition 47.7.12 for equivalent topological fibre bundles.) Definition 47.9.11 permits the
associated fibre bundles £2, €z to have different but equivalent structure groups (G4, F1) and (G4, Fi), but 
the base spaces are required to be identical. This is somewhat arbitrary and may be changed in light of the 
requirements for any applications. 


[Diagram: $\xi_1$ (isomorphism) $\xi_2$ (association) $\hat\xi_2$ (isomorphism) $\hat\xi_1$]


Figure 47.9.2 Continuous-associated topological fibre bundles


47.10. Patchwork associated topological fibre bundles 


47.10.1 REMARK: Construction of associated fibre bundles using patchwork spaces. 
The “peer-to-peer” associated fibre bundle construction method described in Section 47.10 is implemented 
using “patchwork spaces”. (See Section 21.13 for the non-topological version.) 


The basic concept here is that for each chart $\phi$ in the atlas $A_E^F$ of the source OFB, there is a single copy of
the topological direct product $U_\phi \times \hat F$, where $U_\phi = \pi(\mathrm{Dom}(\phi))$. These simple direct products are then “sewn
together” by identifying pairs $(b, f) \in U_\phi \times \hat F$ and $(b, f') \in U_{\phi'} \times \hat F$ which are related by the transition rule
$f' = g_{\phi',\phi}(b) f$, where $g_{\phi',\phi}(b)$ is the transition map from $\phi$ to $\phi'$ at $b$ for $\phi, \phi' \in A_E^F$.
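
The sewing procedure can be illustrated with a small finite model. This is only a sketch under assumptions which are not in the text: the base space has four points, there are two hypothetical charts named "a" and "b", the structure group is $\mathbb{Z}_2$ acting on a three-element fibre space, and the transition map has a twist over one base point. Topology is ignored.

```python
B = [0, 1, 2, 3]
F_hat = [0, 1, 2]
# Chart name -> base set U_phi = pi(Dom(phi)); the names "a", "b" are hypothetical.
charts = {"a": {0, 1, 2}, "b": {1, 2, 3}}

def act(g, f):
    """Z_2 = {0, 1} acting on F_hat: g = 1 swaps 0 and 1 and fixes 2."""
    return 1 - f if (g == 1 and f in (0, 1)) else f

def transition(c2, c1, x):
    """Assumed transition element g_{c2,c1}(x): a twist over base point 2 only."""
    return 1 if (c1 != c2 and x == 2) else 0

def equivalence_class(x, f, c):
    """All tagged triples identified with (x, f, c) by f' = g_{c',c}(x) f."""
    return frozenset((x, act(transition(c2, c, x), f), c2)
                     for c2, U in charts.items() if x in U)

# One copy of U_phi x F_hat per chart, then sew the copies together.
X = [(x, f, c) for c, U in charts.items() for x in sorted(U) for f in F_hat]
E_hat = {equivalence_class(x, f, c) for (x, f, c) in X}

def projection(cls):
    """pi_hat : [(x, f, c)] -> x."""
    return next(iter(cls))[0]

def h(c):
    """h(c) : [(x, f, c)] -> f, defined on classes lying over U_c."""
    return lambda cls: next(f for (x, f, c2) in cls if c2 == c)

fibre_sizes = {x: sum(1 for cls in E_hat if projection(cls) == x) for x in B}
overlap = [cls for cls in E_hat if projection(cls) in charts["a"] & charts["b"]]
transitions_match = all(
    h("b")(cls) == act(transition("b", "a", projection(cls)), h("a")(cls))
    for cls in overlap)
print(len(X), len(E_hat), fibre_sizes, transitions_match)
```

In this model the 18 tagged triples are sewn into 12 classes, three over each base point, and the new charts `h("a")` and `h("b")` are related on the overlap by the same transition elements that were used for the sewing.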


47.10.2 REMARK: Construction of fibre bundles from abstract local transition maps. 
Fibre bundles may be constructed from sets of transition maps between open sets which cover the base space 
and satisfy a simple transitivity rule. This construction procedure is presented for example by Steenrod [142],
pages 14-15; Ehresmann [177], page 762; Crampin/Pirani [7], page 355; Kobayashi/Nomizu [19], page 52;
EDM2 [113], page 569. (See also Remark 21.13.1 for minimal input requirements for associated fibre bundles.) 
Definition 47.10.4 makes use of this method of construction from transition maps, but in this case, the 
transition maps are not abstract. They are obtained from a more concrete pre-existing fibre bundle, which 
has the advantage of ensuring a priori that the transitivity rule is valid.


47.10.3 REMARK: Construction of an associated fibre bundle with a different fibre space. 

Definition 47.10.4 constructs an associated $(G, \hat F)$ fibre bundle from a given $(G, F)$ fibre bundle. The only
information that the associated bundle inherits from the given bundle is the set of transition maps $g_{\phi_2,\phi_1}$ of
Definition 47.6.5 for topological fibre bundles.


The inputs and outputs of the patchwork construction method may be summarised as follows. (For the non- 
topological version, see Remark 21.13.1. The corresponding lists for the orbit-space construction method are 
given in Remark 47.11.4.) 


Inputs:

1) $(G, F)$ < $(G, T_G, F, T_F, \sigma_G, \mu_G^F)$, a topological left transformation group. (The source fibre space.)

2) $(E, \pi, B, A_E^F)$ < $(E, T_E, \pi, B, T_B, A_E^F)$, a topological $(G, F)$ fibre bundle. (The source OFB.)

3) $(G, \hat F)$ < $(G, T_G, \hat F, T_{\hat F}, \sigma_G, \mu_G^{\hat F})$, a topological left transformation group. (The target fibre space.)

Outputs:

1) $\hat E$ < $(\hat E, T_{\hat E})$, the new total space.

2) $\hat\pi$, the new projection map.

3) $A_{\hat E}^{\hat F}$, the new fibre atlas.

4) $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ < $(\hat E, T_{\hat E}, \hat\pi, B, T_B, A_{\hat E}^{\hat F})$, the new topological $(G, \hat F)$ fibre bundle. (The target OFB.)
Composed of the new structures $\hat E$ < $(\hat E, T_{\hat E})$, $\hat\pi$ and $A_{\hat E}^{\hat F}$, and the old base space $B$ < $(B, T_B)$.

5) $h : A_E^F \to A_{\hat E}^{\hat F}$, the fibre chart association map. (From source OFB to target OFB.)


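47.10.4 DEFINITION: The patchwork associated topological $(G, \hat F)$ fibre bundle of a topological $(G, F)$ fibre
bundle $(E, \pi, B, A_E^F)$ < $(E, T_E, \pi, B, T_B, A_E^F)$, for effective topological left transformation groups $(G, F)$ <
$(G, T_G, F, T_F, \sigma_G, \mu_G^F)$ and $(G, \hat F)$ < $(G, T_G, \hat F, T_{\hat F}, \sigma_G, \mu_G^{\hat F})$, is the topological $(G, \hat F)$ fibre bundle
$(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ < $(\hat E, T_{\hat E}, \hat\pi, B, T_B, A_{\hat E}^{\hat F})$, with the chart association map $h : A_E^F \to A_{\hat E}^{\hat F}$, defined as follows.

(i) $\hat E = \{[(b, f, \phi)];\ b \in B,\ f \in \hat F,\ \phi \in A_{E,b}^F\}$, where $[(b, f, \phi)] = \{(b, g_{\phi',\phi}(b) f, \phi');\ \phi' \in A_{E,b}^F\}$, and the
transition maps $g_{\phi_2,\phi_1} : U_{\phi_1} \cap U_{\phi_2} \to G$ are defined by $L_{g_{\phi_2,\phi_1}(b)} = \phi_2 \circ (\phi_1|_{\pi^{-1}(\{b\})})^{-1}$, where $U_\phi =
\pi(\mathrm{Dom}(\phi))$ for $\phi \in A_E^F$.

(ii) $\hat\pi : \hat E \to B$ is defined by $\hat\pi : [(b, f, \phi)] \mapsto b$.

(iii) $A_{\hat E}^{\hat F} = \{h(\phi);\ \phi \in A_E^F\}$, where $h(\phi) : \hat\pi^{-1}(U_\phi) \to \hat F$ is defined for $\phi \in A_E^F$ by $h(\phi) : [(b, f, \phi)] \mapsto f$.

(iv) $T_{\hat E} = \{\bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega_\phi);\ \Omega : A_E^F \to \mathbb{P}(B \times \hat F)$ and $\forall \phi \in A_E^F,\ \Omega_\phi \in \mathrm{Top}(U_\phi \times \hat F)\}$.


The topological fibre bundle association map is then $h : A_E^F \to A_{\hat E}^{\hat F}$ as in part (iii). (See Figure 47.10.1.)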


47.10.5 REMARK: Comments on associated fibre bundles constructed with patchwork spaces.

The fibre bundle $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ constructed in Definition 47.10.4 is well defined and satisfies Definition 47.9.5
for a fibre bundle association because the chart transition maps satisfy $h(\phi_2) : [(b, f_1, \phi_1)] \mapsto g_{\phi_2,\phi_1}(b) f_1$ by
conditions (i) and (iii).


It would perhaps be more logical to use the charts $\hat\phi = h(\phi)$ as tags for triples $(b, f, \hat\phi)$ instead of $\phi$, but then they
would be used in (i) although they are not defined until (iii). Besides, there is a one-to-one map between
them anyway.


47.10.6 REMARK: Construction of associated principal fibre bundle with structure group as fibre space. 

If the fibre space $\hat F$ in Definition 47.10.4 is the structure group $G$, the associated fibre bundle $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F}) =
(P, \pi_P, B, A_P^G)$ will be a principal $G$-bundle. In this case, the right action $\mu_G^P : P \times G \to P$ of the $G$-bundle
is defined by $\mu_G^P : ([(b, g, \phi)], g') \mapsto [(b, gg', \phi)]$.


Figure 47.10.1 Associated topological fibre bundle patchwork construction


47.10.7 REMARK: Validation of patchwork associated topological fibre bundles. 
Theorem 47.10.8 is entirely non-topological. So it follows directly from Theorem 21.13.3. Similarly, the 
non-topological parts of Theorem 47.10.9 follow from Theorem 21.13.5. 


47.10.8 THEOREM: Validation of the patchwork associated fibre bundle total space construction.
Let $(G, F)$ and $(G, \hat F)$ be effective topological left transformation groups. Let $(E, \pi, B, A_E^F)$ be a topological
$(G, F)$ fibre bundle. Let $X = \{(b, f, \phi);\ b \in B,\ f \in \hat F,\ \phi \in A_{E,b}^F\}$. Define the relation “$\equiv$” on $X$ by
$(b', f', \phi') \equiv (b, f, \phi)$ whenever $b' = b$ and $f' = g_{\phi',\phi}(b) f$.

(i) “$\equiv$” is an equivalence relation on $X$.
(ii) The set $\{[(b, f, \phi)];\ b \in B,\ f \in \hat F,\ \phi \in A_{E,b}^F\}$ of non-empty equivalence classes of “$\equiv$” is a partition of $X$.
(iii) $\forall b \in B$, $\forall f \in \hat F$, $\forall \phi_1, \phi_2 \in A_{E,b}^F$, $[(b, f, \phi_1)] = [(b, g_{\phi_2,\phi_1}(b) f, \phi_2)]$.


PROOF: Parts (i), (ii) and (iii) follow from Theorem 21.13.3 parts (i), (ii) and (iii) respectively because
by Theorem 47.6.6, every topological $(G, F)$ fibre bundle in Definition 47.6.5 satisfies the conditions for a
non-topological $(G, F)$ fibre bundle in Definition 21.8.3.
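
The dependence of the partition property on the transitivity rule for transition maps can be seen in a toy computation. The following sketch uses assumptions which are not in the text: a single base point, three hypothetical charts, and the group $\mathbb{Z}_3$ acting on itself by translation. One transition table is derived from a fixed assignment of group elements to charts, as when the maps come from a pre-existing fibre bundle, and the other is an abstract table which violates the rule.

```python
from itertools import product

N = 3                      # G = Z_3 (additive) acting on F_hat = Z_3 by translation
CHARTS = [0, 1, 2]         # three hypothetical charts over a single base point

def classes(g):
    """Sets {(f + g[c2][c], c2)} built from a table g[c2][c] of transition elements."""
    return {frozenset(((f + g[c2][c]) % N, c2) for c2 in CHARTS)
            for f, c in product(range(N), CHARTS)}

def is_partition(cls_set):
    """True if the sets are pairwise disjoint (they cover X when g[c][c] = 0)."""
    return sum(len(c) for c in cls_set) == N * len(CHARTS)

def satisfies_transitivity(g):
    """Transitivity rule g_{c3,c1} = g_{c3,c2} + g_{c2,c1} for all charts."""
    return all(g[c3][c1] == (g[c3][c2] + g[c2][c1]) % N
               for c1, c2, c3 in product(CHARTS, repeat=3))

# Transition table of the form g[c2][c1] = a[c2] - a[c1], which obeys the rule.
a = [0, 1, 2]
good = [[(a[c2] - a[c1]) % N for c1 in CHARTS] for c2 in CHARTS]

# An abstract table with g_{c1,c2} = -g_{c2,c1} which violates the rule:
# g_{1,0} = 1 and g_{2,1} = 1, but g_{2,0} = 0 instead of 2.
bad = [[0, 2, 0],
       [1, 0, 2],
       [0, 1, 0]]

print(satisfies_transitivity(good), is_partition(classes(good)))
print(satisfies_transitivity(bad), is_partition(classes(bad)))
```

For the first table the sets are the equivalence classes of a partition. For the second table the sets overlap without being equal, so the relation is not transitive and no total space is obtained.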


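47.10.9 THEOREM: Patchwork associated topological fibre bundles are associated topological fibre bundles.
Let $(G, F)$ < $(G, T_G, F, T_F, \sigma_G, \mu_G^F)$ be an effective topological left transformation group. Let $(E, \pi, B, A_E^F)$ <
$(E, T_E, \pi, B, T_B, A_E^F)$ be a topological $(G, F)$ fibre bundle.

Let $(G, \hat F)$ < $(G, T_G, \hat F, T_{\hat F}, \sigma_G, \mu_G^{\hat F})$ be an effective topological left transformation group. Let $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ <
$(\hat E, T_{\hat E}, \hat\pi, B, T_B, A_{\hat E}^{\hat F})$ be the patchwork associated topological $(G, \hat F)$ fibre bundle of $(E, \pi, B, A_E^F)$ with chart
association map $h : A_E^F \to A_{\hat E}^{\hat F}$ as in Definition 47.10.4.

(i) $\hat E$ is a well-defined partition of $\{(b, f, \phi);\ b \in B,\ f \in \hat F,\ \phi \in A_{E,b}^F\} = \bigcup_{\phi \in A_E^F} (\pi(\mathrm{Dom}(\phi)) \times \hat F \times \{\phi\})$.
(ii) $\hat\pi : \hat E \to B$ is a well-defined function, and $\hat\pi : \hat E \to B$ is a surjection.
(iii) $\forall U \in \mathbb{P}(B)$, $\hat\pi(\hat\pi^{-1}(U)) = U$.
(iv) $\forall b \in B$, $\hat E_b \ne \emptyset$, where $\hat E_b$ denotes $\hat\pi^{-1}(\{b\})$ for $b \in B$.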


(xi) vó c AE, JU € Tg, 9:0 > F. 
(xii) U ácAE &(U;) = B, where U; = Dom(4) for all ¢ € AF. 
E Ree eos Au 
(xiii) V € AR, t x 6:  (U5) ~ UZ x F. 
(xiv) Voi, $2 € AZ, Jg5, g, € C(Uz, N Uz, G), V € t (U5, QU), $2(2) = 95, a, (2) (2). 
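(xv) $\forall \phi_1, \phi_2 \in A_E^F$, $\forall b \in U_{\phi_1} \cap U_{\phi_2}$, $g_{\phi_2,\phi_1}(b) = \hat g_{h(\phi_2),h(\phi_1)}(b)$, where $U_\phi$ denotes $\pi(\mathrm{Dom}(\phi))$, and $g$, $\hat g$ denote
the fibre chart transition functions for the respective fibre bundle atlases.
(xvi) $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ is a topological $(G, \hat F)$ fibre bundle.
(xvii) $(\hat E, \hat\pi, B, A_{\hat E}^{\hat F})$ is an associated topological $(G, \hat F)$ fibre bundle of $(E, \pi, B, A_E^F)$ with chart association
map $h : A_E^F \to A_{\hat E}^{\hat F}$.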


PROOF: Part (i) follows from Theorem 21.13.5 (i) and Theorem 47.6.6.

Part (ii) follows from Theorem 21.13.5 (ii) and Theorem 47.6.6.

Part (iii) follows from Theorem 21.13.5 (iii) and Theorem 47.6.6.

Part (iv) follows from Theorem 21.13.5 (iv) and Definitions 47.10.4 and 21.13.2.

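For part (v), to show that $T_{\hat E}$ is a topology on $\hat E$, first let $\Omega : A_E^F \to \mathbb{P}(B \times \hat F)$ be the map with $\Omega_\phi = \emptyset$ for all $\phi \in A_E^F$.
Then $\bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega_\phi) = \emptyset$. So $\emptyset \in T_{\hat E}$ by Definition 47.10.4 (iv). Secondly, define $\Omega : A_E^F \to \mathbb{P}(B \times \hat F)$
by $\Omega_\phi = U_\phi \times \hat F$ for all $\phi \in A_E^F$. Then $\Omega_\phi \in \mathrm{Top}(U_\phi \times \hat F)$ for all $\phi \in A_E^F$, and

$$\bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega_\phi) = \bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(U_\phi \times \hat F)$$

by Theorem 10.15.6 (iii).
Since $\hat\pi^{-1}(U_\phi) = \{[(b, f, \phi')] \in \hat E;\ b \in U_\phi\}$ and $h(\phi)^{-1}(\hat F) = \{[(b, f, \phi')] \in \hat E;\ \phi' = \phi\}$, it follows that

$$\begin{aligned}
\bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega_\phi)
&= \bigcup_{\phi \in A_E^F} \{[(b, f, \phi')] \in \hat E;\ \phi' = \phi \text{ and } b \in U_\phi\} \\
&= \bigcup_{\phi \in A_E^F} \{[(b, f, \phi)];\ b \in U_\phi \text{ and } f \in \hat F\} \\
&= \hat E.
\end{aligned}$$

Therefore $\hat E \in T_{\hat E}$ by Definition 47.10.4 (iv).
Two sets $X_1, X_2 \in T_{\hat E}$ must satisfy $X_k = \bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega^k_\phi)$ for some $\Omega^k : A_E^F \to \mathbb{P}(B \times \hat F)$ satisfying
$\forall \phi \in A_E^F$, $\Omega^k_\phi \in \mathrm{Top}(U_\phi \times \hat F)$ for $k = 1, 2$. Then by Theorem 10.8.14 (iii), their intersection is

$$\begin{aligned}
X_1 \cap X_2
&= \Big(\bigcup_{\phi_1 \in A_E^F} (\hat\pi \times h(\phi_1))^{-1}(\Omega^1_{\phi_1})\Big) \cap \Big(\bigcup_{\phi_2 \in A_E^F} (\hat\pi \times h(\phi_2))^{-1}(\Omega^2_{\phi_2})\Big) \\
&= \bigcup_{\phi_1, \phi_2 \in A_E^F} \Big((\hat\pi \times h(\phi_1))^{-1}(\Omega^1_{\phi_1}) \cap (\hat\pi \times h(\phi_2))^{-1}(\Omega^2_{\phi_2})\Big) \\
&= \bigcup_{\phi_1, \phi_2 \in A_E^F} \Big(\{[(b, f_1, \phi_1)] \in \hat E;\ (b, f_1) \in \Omega^1_{\phi_1}\} \cap \{[(b, f_2, \phi_2)] \in \hat E;\ (b, f_2) \in \Omega^2_{\phi_2}\}\Big) \\
&= \bigcup_{\phi_1, \phi_2 \in A_E^F} \Big(\{[(b, f_1, \phi_1)] \in \hat E;\ (b, f_1) \in \Omega^1_{\phi_1}\} \cap \{[(b, g_{\phi_1,\phi_2}(b) f_2, \phi_1)] \in \hat E;\ (b, f_2) \in \Omega^2_{\phi_2}\}\Big)
\end{aligned}$$

by Theorem 47.10.8 (iii). For $g \in G$ and $S \in \mathrm{Top}(U_\phi \times \hat F)$, let $gS$ denote the set $\{(b, gf);\ (b, f) \in S\}$. Then
$gS \in \mathrm{Top}(U_\phi \times \hat F)$ for all $g \in G$ and $S \in \mathrm{Top}(U_\phi \times \hat F)$. This follows from the fact that $gS = (\mathrm{id}_{U_\phi} \times L_g)S$,
where $\mathrm{id}_{U_\phi} \times L_g : U_\phi \times \hat F \to U_\phi \times \hat F$ is a homeomorphism by Theorem 32.9.10 (ii) because $L_g : \hat F \to \hat F$ is a
homeomorphism by Definition 36.10.3. Then by Theorem 10.8.14 (i),

$$\begin{aligned}
X_1 \cap X_2
&= \bigcup_{\phi_1, \phi_2 \in A_E^F} \Big(\{[(b, f_1, \phi_1)] \in \hat E;\ (b, f_1) \in \Omega^1_{\phi_1}\} \cap \{[(b, f_1, \phi_1)] \in \hat E;\ (b, g_{\phi_2,\phi_1}(b) f_1) \in \Omega^2_{\phi_2}\}\Big) \\
&= \bigcup_{\phi_1, \phi_2 \in A_E^F} \Big(\{[(b, f_1, \phi_1)] \in \hat E;\ (b, f_1) \in \Omega^1_{\phi_1}\} \cap \{[(b, f_1, \phi_1)] \in \hat E;\ (b, f_1) \in g_{\phi_1,\phi_2}(b)\Omega^2_{\phi_2}\}\Big) \\
&= \bigcup_{\phi_1, \phi_2 \in A_E^F} \Big((\hat\pi \times h(\phi_1))^{-1}(\Omega^1_{\phi_1}) \cap (\hat\pi \times h(\phi_1))^{-1}(g_{\phi_1,\phi_2}(b)\Omega^2_{\phi_2})\Big) \\
&= \bigcup_{\phi_1 \in A_E^F} \Big((\hat\pi \times h(\phi_1))^{-1}(\Omega^1_{\phi_1}) \cap \bigcup_{\phi_2 \in A_E^F} (\hat\pi \times h(\phi_1))^{-1}(g_{\phi_1,\phi_2}(b)\Omega^2_{\phi_2})\Big) \\
&= \bigcup_{\phi_1 \in A_E^F} \Big((\hat\pi \times h(\phi_1))^{-1}(\Omega^1_{\phi_1}) \cap (\hat\pi \times h(\phi_1))^{-1}\Big(\bigcup_{\phi_2 \in A_E^F} g_{\phi_1,\phi_2}(b)\Omega^2_{\phi_2}\Big)\Big) \\
&= \bigcup_{\phi_1 \in A_E^F} (\hat\pi \times h(\phi_1))^{-1}\Big(\Omega^1_{\phi_1} \cap \bigcup_{\phi_2 \in A_E^F} g_{\phi_1,\phi_2}(b)\Omega^2_{\phi_2}\Big) \\
&= \bigcup_{\phi_1 \in A_E^F} (\hat\pi \times h(\phi_1))^{-1}(\hat\Omega_{\phi_1}),
\end{aligned}$$

where $\hat\Omega : A_E^F \to \mathbb{P}(B \times \hat F)$, with $\hat\Omega_{\phi_1} \in \mathrm{Top}(U_{\phi_1} \times \hat F)$, is defined by $\hat\Omega_{\phi_1} = \Omega^1_{\phi_1} \cap \bigcup_{\phi_2 \in A_E^F} g_{\phi_1,\phi_2}(b)\Omega^2_{\phi_2}$ for all $\phi_1 \in A_E^F$. It follows
that $X_1 \cap X_2 \in T_{\hat E}$.

Now let $C \subseteq T_{\hat E}$. Then for all $X \in C$, $X = \bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega^X_\phi)$ for some function $\Omega^X : A_E^F \to \mathbb{P}(B \times \hat F)$
satisfying $\forall \phi \in A_E^F$, $\Omega^X_\phi \in \mathrm{Top}(U_\phi \times \hat F)$. So by Theorems 10.8.16 (i) and 10.8.18 (i),

$$\bigcup C = \bigcup_{X \in C} \bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega^X_\phi) = \bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega_\phi),$$

where $\Omega : A_E^F \to \mathbb{P}(B \times \hat F)$, with $\Omega_\phi \in \mathrm{Top}(U_\phi \times \hat F)$, is defined by $\Omega_\phi = \bigcup_{X \in C} \Omega^X_\phi$ for all $\phi \in A_E^F$. It follows that $\bigcup C \in T_{\hat E}$. Thus
$T_{\hat E}$ is a topology on $\hat E$ by Definition 31.3.2.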
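For part (vi), let $Z \in T_B = \mathrm{Top}(B)$. Then it follows from Definition 47.10.4 (ii, i) that

$$\begin{aligned}
\hat\pi^{-1}(Z) &= \{[(b, f, \phi)] \in \hat E;\ b \in Z\} \\
&= \{[(b, f, \phi)];\ b \in Z,\ f \in \hat F,\ \phi \in A_{E,b}^F\} \\
&= \bigcup_{\phi \in A_E^F} \{[(b, f, \phi)];\ b \in Z \cap U_\phi,\ f \in \hat F\}.
\end{aligned}$$

Define $\Omega : A_E^F \to \mathbb{P}(B \times \hat F)$ by $\Omega_\phi = (Z \cap U_\phi) \times \hat F$ for all $\phi \in A_E^F$, so that $\Omega_\phi \in \mathrm{Top}(U_\phi \times \hat F)$. Then $\hat\pi^{-1}(Z) = \bigcup_{\phi \in A_E^F} (\hat\pi \times h(\phi))^{-1}(\Omega_\phi)$. So
$\hat\pi^{-1}(Z) \in T_{\hat E}$ by Definition 47.10.4 (iv). Therefore $\hat\pi : \hat E \to B$ is continuous by Definition 31.12.4.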

Part (vii) follows from Theorem 21.13.5 (v) and Definitions 47.10.4 and 21.13.2.

Part (viii) follows from Theorem 21.13.5 (vi) and Theorem 47.6.6, and Definitions 47.10.4 and 21.13.2.
Part (ix) follows from Theorem 21.13.5 (vii) and Theorem 47.6.6.

Part (x) follows from Theorem 21.13.5 (viii) and Theorem 47.6.6, and Definitions 47.10.4 and 21.13.2.

For part (xi), 

For part (xii), 

For part (xiii), 

For part (xiv), 
For part (xv), [...] 
Part (xvi) follows from parts (v), (vi), (vii), (xi), (xii), (xiii) and (xiv) and Definition 47.6.5. 
Part (xvii) follows from parts (ix), (x), (xv) and (xvi) and Definitions 47.9.5 and 47.9.7. 
(( 2022-12-13. To be continued ... )) 


47.11. Orbit-space associated topological fibre bundles 


47.11.1 REMARK: Construction of associated fibre bundles using orbit spaces. 

The “orbit-space method” of defining associated fibre bundles constructs associated ordinary fibre bundles 
from a given principal fibre bundle. The orbit-space method is less general than the patchwork method in 
Section 47.10 because the input fibre bundle must be a PFB, but it is the method most often encountered 
in textbooks. It is very often given as the definition of an associated fibre bundle. Perhaps this is because it 
is easier to define, understand and use than the abstract definition, and maybe also partly because it is so 
useful for gauge theory. 


For the orbit-space method, the action of a group $G$ on a principal bundle $P$ and fibre space $F$ is defined
by $g : (z, f) \mapsto (zg^{-1}, gf)$ for $g \in G$, $z \in P$ and $f \in F$, or some variant of this. In terms of the physical
interpretation of combined figure/frame bundles in Section 20.10, this action by $g$ on the pair $(z, f)$ may be
thought of as the “rotation” $g$ of the “camera” $z$ combined with the inverse “rotation” $g^{-1}$ of the “image” $f$.


It is common experience that when the camera is rotated or translated in one way, the image is rotated or
moved in the opposite way. The underlying idea of the orbit-space method is that if $z$ and $f$ are varied in
opposite ways, then there must be a unique object which is producing images $f$ from camera orientations $z$.
Since the “real object” may not be available, the closest possible substitute is the equivalence class $[(z, f)]$
of all pairs $(zg^{-1}, gf)$ associated with the “real object”.


The sets $[(z, f)] = \{(zg^{-1}, gf);\ g \in G\}$ are orbit spaces of actions $g : (z, f) \mapsto (zg^{-1}, gf)$ by $G$ on $P \times F$. (See
Definition 20.5.6 for orbit spaces.) However, one usually thinks of actions by groups on sets as translating
or transforming the elements of those sets in some way. In this case, the purpose of the action of $g$ on $(z, f)$
is to keep the underlying “real object” the same! This action of a group element is a kind of “change of
coordinates” for an unchanged object.


The “true meaning” of the sets $[(z, f)]$ is that they are equivalence classes very much in the same sense that
the coordinate charts of manifolds refer to the same underlying “real points”. One does not think of the
points of manifolds as “orbit spaces” of the transition maps between coordinate charts. These points are
merely equivalence classes of pairs $[(\psi, x)]$, where the chart $\psi$ may be thought of as the “camera”, and $x$ may
be thought of as the “image” or “measurement” produced by that camera. Thus the term “orbit space” is
probably somewhat irrelevant when applied to the construction method in Definition 47.11.5. However, this
term is very widely used and convenient in presentations of this method.


47.11.2 REMARK: The importance of associated principal bundles and vector bundles in physics. 

As mentioned in Remark 21.12.9, principal bundles and their associated vector bundles are important in 
gauge theory for elementary particles. Principal bundles represent radiation fields (which are bosonic, such 
as photons), and their associated vector bundles represent physically associated matter fields (which are 
fermionic, such as electrons). 


Thus the fact that the orbit-space method for constructing associated ordinary fibre bundles requires a 
principal bundle as input does not restrict its applicability to gauge theory at all. One commences with a 
PFB representing the radiation field and builds the OFB representing a matter field from this. 


47.11.3 REMARK: Comparison of the orbit-space and patchwork methods. 

The patchwork method uses fibre charts $\phi$ as tags for pairs $(b, f) \in B \times F$, for a base space $B$ and fibre
space $F$, to make tagged tuples $(b, f, \phi)$. The fibre chart tags determine the required transformation of the
fibre space element $f$ when changing the fibre chart. So for an arbitrary chart $\phi'$, the fibre space element $f'$
in the tuple $(b, f', \phi')$ which labels the same point as $(b, f, \phi)$ is calculated as $g_{\phi',\phi}(b) f$ in terms of a chart
transition map $g_{\phi',\phi}(b)$ for the given fibre bundle.


The orbit-space method, on the other hand, uses tuples of the form $(z, f) \in P \times F$, for a given PFB total
space $P$. The component $z \in P$ contains the same information as the combination of the base point $b$ and fibre
chart $\phi$ in the patchwork method. The fibre space element $f'$ in a pair $(z', f')$ which is identified with a pair
$(z, f)$ is calculated as $f' = \phi(z')^{-1}\phi(z) f$ for an arbitrary chart $\phi$ for the principal bundle. This gives the same
value, independent of the choice of $\phi$, since $g_{\phi',\phi}(b) = \phi'(z)\phi(z)^{-1}$ is independent of $z \in \pi^{-1}(\{b\})$ for $b \in B$. So
$\phi'(z')^{-1}\phi'(z) = (g_{\phi',\phi}(b)\phi(z'))^{-1} g_{\phi',\phi}(b)\phi(z) = \phi(z')^{-1} g_{\phi',\phi}(b)^{-1} g_{\phi',\phi}(b)\phi(z) = \phi(z')^{-1}\phi(z)$. The
orbit-space method looks simpler, but the result is equivalent.

In summary, the tuple $(b, f, \phi)$ in the patchwork method carries around a copy of the base point $b$ and fibre
chart $\phi$ so that the fibre chart transition map $g_{\phi',\phi}(b)$ can be applied correctly to change $f$ to $f' = g_{\phi',\phi}(b) f$,
whereas the tuple $(z, f)$ in the orbit-space method carries around the charts $\phi$ in the principal bundle atlas
$A_P^G$ so that the group element $\phi(z')^{-1}\phi(z)$ may be applied to $f$ to change it to $f' = \phi(z')^{-1}\phi(z) f$ when $z'$
is substituted for $z$.

The orbit-space method does not work if the given fibre bundle is not a PFB because the product $\phi(z) f$ is
only defined if $\phi(z)$ is an element of a transformation group acting on $f \in F$.


47.11.4 REMARK: The orbit-space method for associated fibre bundles uses equivalence classes. 

Definition 47.11.5 constructs a topological $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$ from a given topological principal
$G$-bundle $(P, \pi_P, B, A_P^G)$, where $(G, F)$ is an effective topological left transformation group. The total space
$E$ of the ordinary fibre bundle $(E, \pi, B, A_E^F)$ is constructed as a set of equivalence classes of pairs in $P \times F$, which
is locally homeomorphic to $B \times F$. The $(G, F)$ fibre atlas $A_E^F$ for the fibration $(E, \pi, B)$ is constructed from
the atlas $A_P^G$ for $(P, \pi_P, B)$ by applying the group operation of $G$. The topology $T_E$ for $E$ is defined so that
the maps $\pi \times \hat\phi$ will be homeomorphisms for $\hat\phi = h(\phi) \in A_E^F$.


The inputs and outputs of the orbit-space construction method may be summarised as follows. (The lists 
for the patchwork construction method are given in Remark 47.10.3.) 


Inputs:

1) $G$ < $(G, T_G, \sigma_G)$, a topological group. (The source structure group.)

2) $(P, \pi_P, B, A_P^G)$ < $(P, T_P, \pi_P, B, T_B, A_P^G)$, a topological principal $G$-bundle. (The source PFB.)

3) $(G, F)$ < $(G, T_G, F, T_F, \sigma_G, \mu_G^F)$, a topological left transformation group. (The target fibre space.)


Outputs:

1) $E$ < $(E, T_E)$, the new total space.

2) $\pi$, the new projection map.

3) $A_E^F$, the new fibre atlas.

4) $(E, \pi, B, A_E^F)$ < $(E, T_E, \pi, B, T_B, A_E^F)$, the new topological $(G, F)$ fibre bundle. (The target OFB.)
Composed of the new structures $E$ < $(E, T_E)$, $\pi$ and $A_E^F$, and the old base space $B$ < $(B, T_B)$.

5) $h : A_P^G \to A_E^F$, the fibre chart association map. (From source PFB to target OFB.)


47.11.5 DEFINITION: Orbit-space construction of topological fibre bundle from a principal bundle.
The orbit-space associated topological $(G, F)$ fibre bundle for a topological principal bundle $(P, \pi_P, B, A_P^G)$ <
$(P, T_P, \pi_P, B, T_B, A_P^G)$ and a topological left transformation group $(G, F)$ < $(G, T_G, F, T_F, \sigma_G, \mu_G^F)$ is the
topological $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$ < $(E, T_E, \pi, B, T_B, A_E^F)$ which is constructed as follows.

(i) $E = \{[(z, f)];\ z \in P,\ f \in F\}$, where for all $(z, f) \in P \times F$,

$[(z, f)] = \{(z', f') \in P \times F;\ \pi_P(z') = \pi_P(z)$ and $\exists \phi \in A_P^G,\ \phi(z') f' = \phi(z) f\}$.

(ii) $\pi : E \to B$ is defined by $\pi : [(z, f)] \mapsto \pi_P(z)$.
(iii) $A_E^F = \{h(\phi);\ \phi \in A_P^G\}$, where $h(\phi) : \pi^{-1}(U_\phi) \to F$ is defined for $\phi \in A_P^G$ by $h(\phi) : [(z, f)] \mapsto \phi(z) f$.
(iv) $T_E = \{\bigcup_{\phi \in A_P^G} (\pi \times h(\phi))^{-1}(\Omega_\phi);\ \Omega : A_P^G \to \mathbb{P}(B \times F)$ and $\forall \phi \in A_P^G,\ \Omega_\phi \in \mathrm{Top}(U_\phi \times F)\}$.


The topological fibre bundle association map is then $h : A_P^G \to A_E^F$ as in part (iii). (See Figure 47.11.1.)


((2019-4-1. Show that the orbit-space associated topological fibre bundles in Definition 47.11.5 satisfy the 
abstract Definition 47.9.7. This should follow the pattern of Theorem 47.10.9.)) 


47.11.6 NOTATION: $(P \times F)/G$ denotes the orbit-space version of the associated topological fibre bundle
in Definition 47.11.5. (See Notation 47.11.13 for an alternative.)


Figure 47.11.1 Associated topological fibre bundle orbit-space construction


47.11.7 REMARK: Well-definition of associated fibre bundles using the orbit-space method. 

The relation $\phi(z') f' = \phi(z) f$ in Definition 47.11.5 (i) is independent of the choice of $\phi \in A_P^G$. (This is
asserted in Theorem 47.11.8.) If $P$ is non-empty, then $A_P^G$ is non-empty. So the set of $(z', f')$ satisfying
“$\exists \phi \in A_P^G,\ \phi(z') f' = \phi(z) f$” is the same as the set of $(z', f')$ satisfying “$\forall \phi \in A_P^G,\ \phi(z') f' = \phi(z) f$”. Thus
two pairs $(z, f)$ and $(z', f')$ are considered equivalent in Definition 47.11.5 (i) if they have the same base
point in $B$ and the action of $z$ on $f$ through a chart $\phi$ is the same as the action of $z'$ on $f'$ through the same
chart $\phi$.


Hence the elements $[(z, f)]$ of $E$ are the same as orbits of the action of $P$ on $F$ except that the action is
indirect via one or more charts $\phi \in A_P^G$. (For comparison, see Definition 20.5.6 for the orbit space of a
general left transformation group.)


47.11.8 THEOREM: Chart-independence of orbit spaces of associated fibre bundle construction.
The relation $\phi(z') f' = \phi(z) f$ in Definition 47.11.5 (i) is independent of the choice of $\phi \in A_P^G$. In other words,

$\forall b \in B$, $\forall z, z' \in \pi_P^{-1}(\{b\})$, $\forall f, f' \in F$, $\forall \phi_1, \phi_2 \in A_P^G$,
$\phi_1(z') f' = \phi_1(z) f \iff \phi_2(z') f' = \phi_2(z) f$.

PROOF: Let $(P, \pi_P, B, A_P^G)$ < $(P, T_P, \pi_P, B, T_B, A_P^G)$ be a topological $G$-bundle, and let $(G, F)$ <
$(G, T_G, F, T_F, \sigma_G, \mu_G^F)$ be an effective topological left transformation group, let $z, z' \in P$, $f, f' \in F$ and
$\phi_1, \phi_2 \in A_P^G$, and suppose that $\pi_P(z) = \pi_P(z') = b$ and $\phi_1(z') f' = \phi_1(z) f$. Then

$$\begin{aligned}
\phi_2(z') f' &= g_{\phi_2,\phi_1}(b)\,\phi_1(z') f' \\
&= g_{\phi_2,\phi_1}(b)\,\phi_1(z) f \\
&= \phi_2(z) f,
\end{aligned}$$

which proves the implication $\phi_1(z') f' = \phi_1(z) f \Rightarrow \phi_2(z') f' = \phi_2(z) f$. The reverse implication follows in
the same way. Therefore the choice of $\phi$ in Definition 47.11.5 (i) does not affect the definition.
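
The chart-independence asserted in Theorem 47.11.8 can be confirmed by brute force in a finite model. This is a sketch under assumptions which are not in the text: a single fibre of the principal bundle is modelled by the group $S_3$ acting on itself from the right, the fibre space is the set $\{0, 1, 2\}$ with the natural left action, and the hypothetical charts are the left translations $z \mapsto kz$, which all commute with the right action.

```python
from itertools import permutations, product

G = list(permutations(range(3)))       # structure group S_3
F = [0, 1, 2]                          # fibre space with the natural left action

def compose(p, q):
    """Group product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def act(g, f):
    """Left action of g on the fibre space element f."""
    return g[f]

# Model of one fibre P_b of a principal bundle: the set G with right action
# z -> compose(z, g). Hypothetical charts are phi_k(z) = k z for fixed k in G.
def chart(k):
    return lambda z: compose(k, z)

def related(phi, z1, f1, z, f):
    """The relation phi(z') f' = phi(z) f of Definition 47.11.5 (i)."""
    return act(phi(z1), f1) == act(phi(z), f)

charts = [chart(k) for k in G]

# Each chart satisfies phi(z g) = phi(z) g, as required for a principal bundle chart.
charts_are_equivariant = all(
    phi(compose(z, g)) == compose(phi(z), g)
    for phi in charts for z in G for g in G)

chart_independent = all(
    related(phi1, z1, f1, z, f) == related(phi2, z1, f1, z, f)
    for phi1, phi2 in product(charts, repeat=2)
    for z1, z in product(G, repeat=2)
    for f1, f in product(F, repeat=2))
print(charts_are_equivariant, chart_independent)
```

In this model any two charts differ by a fixed left translation, which plays the role of the transition element $g_{\phi_2,\phi_1}(b)$ in the proof above.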


47.11.9 REMARK: Orbit spaces of orbit-space associated fibre bundles are functions. 
According to Theorem 47.11.10 (v), each orbit space $[(z, f)]$ in the total space $E$ of an orbit-space associated
fibre bundle $E$ < $(E, \pi, B, A_E^F)$ is a well-defined function from $P_{\pi_P(z)}$ to $F$. This is perhaps an accident
of the way in which functions are represented in ZF set theory. Nevertheless, each orbit space $[(z, f)] =
\{(zg^{-1}, gf);\ g \in G\}$ does associate a unique element of $F$ with every element of $P_{\pi_P(z)}$, which is without
doubt a well-defined function. It turns out that this is useful. (See Theorem 47.12.6.)

According to Theorem 47.11.10 (vii), each orbit space $[(z, f)] \in E$, regarded as a function, has an invariance
property. Let $y = [(z, f)]$. Then $\forall z' \in P_{\pi_P(z)}$, $\forall g \in G$, $y(z'g^{-1}) = g\,y(z')$. (Or equivalently one may write
$\forall z' \in P_{\pi_P(z)}$, $\forall g \in G$, $y(z'g) = g^{-1}y(z')$.) This implies in particular that the value of the function $[(z, f)]$ is
fully determined on the fibre set $P_{\pi_P(z)}$ if its value is known for just one element of that set. It follows that
the value of a cross-section of the associated fibre bundle $E$ is therefore equivalent at each base point $b \in B$
to an $F$-valued function on $P_b$ possessing this invariance property.

In the context of differentiable principal bundles, this implies that a covariant derivative may be applied 
to cross-sections of the OFB indirectly via F-valued functions on the principal bundle. Thus a covariant 
derivative may be first defined for F-valued functions on the PFB total space, and this may then be “copied” 
to arbitrary orbit-space associated OFB cross-sections with matching fibre space F. 
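
The statement that an invariant function on a fibre set is fully determined by one of its values can be checked by enumeration. The following is a sketch under assumptions which are not in the text: the fibre set $P_b$ is modelled by the group $S_3$ acting on itself from the right, and $F = \{0, 1, 2\}$ carries the natural left action. All functions from $P_b$ to $F$ are listed, and those with the invariance property $y(z g^{-1}) = g\,y(z)$ are kept.

```python
from itertools import permutations, product

G = list(permutations(range(3)))       # fibre P_b modelled by S_3 acting on itself
F = [0, 1, 2]

def compose(p, q):
    """Group product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    """Inverse permutation q with q[p[i]] = i."""
    q = [0] * 3
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def is_invariant(y):
    """Check y(z g^{-1}) = g y(z) for all z in P_b and g in G."""
    return all(y[compose(z, inverse(g))] == g[y[z]] for z in G for g in G)

# Enumerate all 3^6 functions y : P_b -> F and keep the invariant ones.
all_functions = (dict(zip(G, values)) for values in product(F, repeat=len(G)))
invariant = [y for y in all_functions if is_invariant(y)]

identity = (0, 1, 2)
values_at_identity = sorted(y[identity] for y in invariant)
print(len(invariant), values_at_identity)
```

Of the 729 functions, exactly three are invariant, one for each possible value at a chosen element of the fibre set. So in this model the invariant $F$-valued functions on $P_b$ are in one-to-one correspondence with the elements of $F$.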


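47.11.10 THEOREM: Some basic properties of orbit spaces of associated fibre bundles.
Let $(P, \pi_P, B, A_P^G)$ be a topological principal $G$-bundle. Let $(G, F)$ < $(G, T_G, F, T_F, \sigma_G, \mu_G^F)$ be an effective
topological left transformation group.

(i) $\forall z \in P$, $\forall f \in F$, $[(z, f)] = \{(zg^{-1}, gf);\ g \in G\}$.
(ii) $\forall z \in P$, $\forall f \in F$, $\forall g \in G$, $[(z, f)] = [(zg^{-1}, gf)]$.
(iii) $\forall z \in P$, $\forall f \in F$, $\forall g \in G$, $[(zg, f)] = [(z, gf)]$.
(iv) $\forall z \in P$, $\forall f \in F$, $\forall (z_1, f_1), (z_2, f_2) \in [(z, f)]$, $\exists g \in G$, $(z_2, f_2) = (z_1 g^{-1}, g f_1)$.
(v) $\forall z \in P$, $\forall f \in F$, $[(z, f)]$ is a well-defined function from $P_{\pi_P(z)}$ to $F$.
(vi) $\forall (z, f) \in P \times F$, $\forall z' \in P_{\pi_P(z)}$, $\forall \phi \in A_{P,\pi_P(z)}^G$, $[(z, f)](z') = \phi(z')^{-1}\phi(z) f$.
(vii) $\forall (z, f) \in P \times F$, $\forall z' \in P_{\pi_P(z)}$, $\forall g \in G$, $[(z, f)](z'g^{-1}) = g([(z, f)](z'))$.
(viii) $\forall (z, f) \in P \times F$, $\forall z' \in P_{\pi_P(z)}$, $\forall g \in G$, $[(z, f)](z'g) = g^{-1}([(z, f)](z'))$.
(ix) $\forall (z, f) \in P \times F$, $\forall z' \in P_{\pi_P(z)}$, $[(z, f)](z') = (L_{z'}^P)^{-1}(z) f$. (See Definition 21.11.18 for $L_{z'}^P = \mu_G^P(z', \cdot)$.)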


PROOF: For part (i), let z € P and f € F. Then by Definition 47.11.5 (i), the elements of [(z, f)] are the 
pairs (2’, f^) for which z’ € P and f' € F, such that mp(z’) = mp(z) and 3ó € AS, ó(z)f' = o(z)f. But 


p(2') = np(z) and ó(7)/' = é(z)f €& np(z) = mp(z) and f' = ó(7) *é(z)f 
€ mp(z') = np(z) and 3g € G, (f' = gf and g = 4(z')~*4(z)) 
€ mp(z') = mp(z) and 3g € G, (f' = gf and ¢(2') = é(zg ^) 
© JdgcG, (F = gf and z’ = zg E) 
€ 3g EG, (2, f") = (z9, gf). 


So [(z, f)] = (997,9): g € G- 

For part (ii), let z € P, f € F and g € G. Let (2’ f) € [(z, f)]. Then by part (i), (z^, f^) = (29,97'f) 
for some à € G. Let g = $g |. Then (2’, f") = (zg-1g—', gf) € (za 9] by part (i). Thus [(z, f)] € 
[29 ^, gf)]. Similarly [(z, f)] 2 [(z9~*, gf)]. Hence [(z, f)] = (2975, 9 f)]. 

Part (iii) follows from part (ii) by substituting zg for z. 

For part (iv), let $z \in P$ and $f \in F$. Let $(z_1, f_1), (z_2, f_2) \in [(z, f)]$. By part (i), $(z_1, f_1) = (zg_1^{-1}, g_1 f)$ and
$(z_2, f_2) = (zg_2^{-1}, g_2 f)$ for some $g_1, g_2 \in G$. So $(z_1 g_1, g_1^{-1} f_1) = (z, f) = (z_2 g_2, g_2^{-1} f_2)$. Let $g = g_2 g_1^{-1}$. Then
$z_2 = z_1 g^{-1}$ and $f_2 = g f_1$. Hence $(z_2, f_2) = (z_1 g^{-1}, g f_1)$.

For part (v), let $z \in P$ and $f \in F$. Then $[(z, f)] \subseteq P_{\pi_P(z)} \times F$ by Definition 47.11.5 (i). Therefore $[(z, f)]$
is a relation from $P_{\pi_P(z)}$ to $F$ by Definition 9.5.2. Let $z' \in P_{\pi_P(z)}$. Then $z' = zg^{-1}$ for some $g \in G$ by
Theorem 47.8.18 (ix). Then $(z', gf) = (zg^{-1}, gf) \in [(z, f)]$ by part (i). So $\mathrm{Dom}([(z, f)]) = P_{\pi_P(z)}$.

To show that the relation $[(z, f)]$ has a unique value, let $(z', f_1), (z', f_2) \in [(z, f)]$. Then $(z', f_1) = (z'g^{-1}, gf_2)$
for some $g \in G$ by part (iv). So $\phi(z') = \phi(z'g^{-1}) = \phi(z')g^{-1}$ for any $\phi \in A^G_{P,\pi_P(z)}$ by Definition 47.8.7.
Therefore $g^{-1} = e_G$ by Theorem 17.3.14 (iii). So $f_1 = gf_2 = f_2$. Thus the relation $[(z, f)]$ has a unique value
for each element of its domain. Hence $[(z, f)]$ is a well-defined function from $P_{\pi_P(z)}$ to $F$ by Definition 10.2.2.

For part (vi), let $z \in P$ and $f \in F$. Let $z' \in P_{\pi_P(z)}$ and $\phi \in A^G_{P,\pi_P(z)}$. Let $f' = \phi(z')^{-1}\phi(z)f$. Then
$(z', f') \in [(z, f)]$ by Definition 47.11.5 (i). Hence $[(z, f)](z') = f' = \phi(z')^{-1}\phi(z)f$.

For part (vii), let $z \in P$ and $f \in F$. Let $z' \in P_{\pi_P(z)}$ and $g \in G$. Let $\phi \in A^G_{P,\pi_P(z)}$. Then by part (vi) and
Theorem 66.2.12 (iv), $[(z, f)](z'g^{-1}) = \phi(z'g^{-1})^{-1}\phi(z)f = g\phi(z')^{-1}\phi(z)f = g\,[(z, f)](z')$.

Part (viii) follows from part (vii). 

For part (ix), let z € P and f € F. Let z' € P,,(, and o € AS c (s): Then (LE)-!(z) = $(2')-!é(z) by 
Theorem 21.11.19 (vi). Hence [(z, f)](z’) = (DE) !(z)f by part (vi). 


47.11.11 REMARK: Comments on associated fibre bundles constructed with orbit spaces. 

By Theorem 47.11.10 (i), the equivalence classes $[(z, f)]$ in Definition 47.11.5 (i) are the orbits of the left
action $(g, (z, f)) \mapsto (zg^{-1}, gf)$ by $G$ on $P \times F$.

Since $\hat\phi((zg^{-1}, gf)) = \phi(zg^{-1})gf = \phi(z)g^{-1}gf = \phi(z)f = \hat\phi((z, f))$ for any $g \in G$, a fibre chart $\hat\phi = h(\phi)$
maps all representatives of an orbit $[(z, f)]$ to the same element of $F$.

The construction of $(P \times F)/G$ is similar to the construction of a tensor product of two spaces. It is
particularly similar to the tensor product of two modules over a ring. (See EDM2 [113], 277.J.) For any
ring $R$, right $R$-module $X$ and left $R$-module $Y$, the tensor product $X \otimes_R Y$ is defined so as to satisfy
$(xa) \otimes y = x \otimes (ay)$ for $a \in R$, $x \in X$ and $y \in Y$, and some other conditions. The projection map from
$P \times F$ to $E$ is "$G$-balanced" in the sense that $[(zg, f)] = [(z, gf)]$ for all $z \in P$, $g \in G$ and $f \in F$.


As mentioned in Remark 47.8.12, the information in the right action map $\mu^P_G$ is redundant because this
information may be recovered from the fibre charts. So an associated OFB constructed in Definition 47.11.5
from a PFB may be defined without reference to $\mu^P_G$. To be specific, the expression $\mu^P_G(z, g)$ in condition (i)
may be replaced with $\phi|_{P_{\pi_P(z)}}^{-1}(\phi(z)g)$ for $\phi \in A^G_{P,\pi_P(z)}$. This shows that the associated OFB is constructed
essentially in terms of only the fibre charts of the PFB.


47.11.12 REMARK: Alternative notation for orbit-space associated topological fibre bundles. 

Notation 47.11.13, like Notation 47.11.6, is well defined even if only a topological right transformation group 
(G, P) and a topological left transformation group (G, F) are given. The principal bundle P is only mentioned 
in Notation 47.11.13 because a right transformation group (G, P) can be automatically constructed from it. 
However, in differential geometry this is usually how the right transformation group is in fact constructed. 


47.11.13 NOTATION: Orbit-space associated topological fibre bundle. 
$P \times_G F$, for a topological principal bundle $P$ and a topological left transformation group $(G, F)$, denotes the
orbit-space associated topological $(G, F)$ fibre bundle for $P$. In other words,

$P \times_G F = \{[(z, f)];\ z \in P,\ f \in F\}$,

where $\forall z \in P$, $\forall f \in F$, $[(z, f)] = \{(zg^{-1}, gf);\ g \in G\}$.
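The orbit formula above can be checked by brute force on a small finite model. The following is only a sketch under stated assumptions: the group $S_3$, the two-point base space, the trivial bundle $P = B \times G$ and the three-point fibre space are hypothetical choices made for illustration, not taken from the text, and all function names are invented here.

```python
from itertools import permutations

# Hypothetical finite model: G = S_3, B = {0, 1}, P = B x G, F = {0, 1, 2}.
G = list(permutations(range(3)))

def compose(g, h):
    """Return the permutation g∘h, i.e. i -> g[h[i]]."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

B = [0, 1]
P = [(b, h) for b in B for h in G]   # trivial principal bundle B x G
F = [0, 1, 2]

def right(z, g):
    """Right action of G on P: (b, h)g = (b, h∘g)."""
    b, h = z
    return (b, compose(h, g))

def left(g, f):
    """Left action of G on F: gf = g[f]."""
    return g[f]

def orbit(z, f):
    """The class [(z, f)] = {(z g^-1, g f); g in G}."""
    return frozenset((right(z, inverse(g)), left(g, f)) for g in G)

orbits = {orbit(z, f) for z in P for f in F}
print(len(orbits), sorted(len(o) for o in orbits))
```

Because $G$ acts freely on $P$, every orbit in this model has exactly $|G| = 6$ elements, so the 36 pairs in $P \times F$ fall into $|B| \cdot |F| = 6$ orbits. Each orbit contains every point of one fibre of $P$ exactly once, which is the finite counterpart of the statement that $[(z, f)]$ is a function from a fibre set of $P$ to $F$.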


47.12. Topological short-cut orbit-space associated cross-sections 


47.12.1 REMARK: Motivation for contravariant fibre-space-valued functions on principal bundles.

The contravariant fibre-space-valued functions in Definition 47.12.3 are arrived at by some simple steps.

(1) Construct the orbit-space associated fibre bundle $E = (P \times F)/G$ as in Definition 47.11.5.

(2) Define cross-sections $X \in X(E, \pi_E, B)$.

(3) Note that $X(p) = [(z, f)] = \{(z, Y_p(z));\ z \in P_p\} = Y_p : P_p \to F$ is a contravariant function for $p \in B$.

(4) Create short-cut versions $Y : z \mapsto Y_{\pi_P(z)}(z) = X(\pi_P(z))(z)$ of the maps $X = \{(p, Y_p);\ p \in B\}$.

Thus contravariant principal bundle functions are the automatic outcome of combining the orbit-space
associated fibre bundle construction with the short-cut concept in Definition 21.4.9. So the set of


contravariant principal bundle functions is the same as the set X (Em, B) in Notation 21.4.10. In other 
words, it is the set X ((P x F)/G, re, B). This is abbreviated to X ((P x F)/G) in Notation 47.12.4. 


47.12.2 REMARK: Contravariant fibre-space-valued functions on a principal bundle total space. 

Bleecker [254], pages 43-44, defines fibre-space-valued functions on principal bundles as in Definition 47.12.3 
for Lie left transformation groups (G, F). When F is a linear space, they are called “particle fields" by 
Bleecker [254], page 43. (Such fields are also defined and used by Daniel/Viallet [317], pages 180, 186.) 
When $F$ is a linear space, these kinds of functions are extended to general $F$-valued $k$-forms on $P$ for $k \in \mathbb{Z}^+_0$,
namely in the space $\Lambda_k(T(P), F)$, by both Bleecker [254], page 44, and Sternberg [38], pages 336-339, who
calls them "basic forms".


Contravariant fibre-space-valued functions on principal bundles have a natural one-to-one correspondence to 
cross-sections of orbit-space associated ordinary fibre bundles. Their construction requires only a principal 
bundle and an action (i.e. representation) of the structure group on some fibre space. It is then unnecessary 
to construct the orbit-space associated OFB in Definition 47.11.5 because cross-sections of the associated 
OFB can be represented as contravariant PFB total space functions. Bleecker [254], page 43, wrote the 
following regarding his presentation of such “particle fields." 


Much of what follows can be rephrased in terms of associated bundles, but we will not bother to 
do so, since this point of view is foreign to most physicists and is also notationally more difficult. 


This may explain why so few physicists define associated fibre bundles, and why only a small number of 
mathematicians or physicists define associated connections. (See Remark 67.12.1 for a list of some authors 
who do define associated connections.) 


In terms of the physical interpretation of fibres and frames in Section 20.10, contravariant principal bundle 
functions output the “coordinates” or “measurement” value in F for each “reference frame" in P. In 
practical terms, this is actually much more useful than an abstract element [(z, f)] of an abstract ordinary 
fibre bundle P xg F. (The word “contravariance” in Definition 47.12.3 is taken from the contravariance 
condition (iv) in Definition 20.10.8 for “contravariant baseless figure/frame bundles".) 


The functions $Y$ in Definition 47.12.3 are effectively "orbit-valued functions". For each $z \in P$, the set of
ordered pairs $\alpha_Y(z) = \{(zg^{-1}, Y(zg^{-1}));\ g \in G\} = \{(zg^{-1}, gY(z));\ g \in G\}$ is an element of $P \times_G F$, the total
space of the orbit-space associated $(G, F)$ fibre bundle of $P$ in Definition 47.11.5, by Theorem 47.11.10 (i).
Thus $\alpha_Y(z) = [(z, Y(z))] \in P \times_G F$ for all $z \in P$. But $\alpha_Y$ is constant on fibre sets of $P$ because $\alpha_Y(z) = Y|_{P_{\pi_P(z)}}$
for all $z \in P$. In other words, $\alpha_Y(z) = \alpha_Y(z')$ for all $z, z' \in P_p$, for all $p \in B$. Thus $Y$ maps each
element of $P_{\pi_P(z)}$ to the same orbit $[(z, Y(z))] \in P \times_G F$. Therefore $Y$ determines a map $X : B \to P \times_G F$
defined by the formula $X(b) = [(z, Y(z))]$ whenever $\pi_P(z) = b$. In other words, $X = \{(b, [(z, Y(z))]);\ \pi_P(z) = b\}$.
Consequently a possible name for this kind of function would be an “orbit-valued cross-section of a principal 
bundle", or something similar. However, the identification of such functions with OFB cross-sections is valid 
for general abstract associated ordinary fibre bundles, as shown in Theorem 47.12.8, not only for the orbit- 
space construction in Definition 47.11.5. A canonical bijection specifically between contravariant principal 
bundle functions and associated orbit-space OFB cross-sections is shown in Theorem 47.12.6. 


47.12.3 DEFINITION:  Contravariant fibre-space-valued principal bundle functions. 
A contravariant principal bundle function on a topological principal $G$-bundle $(P, \pi_P, B, A_P^G)$ for a fibre
space $F$, where $(G, F)$ is a topological left transformation group, is a function $Y : P \to F$ which satisfies

$\forall z \in P$, $\forall g \in G$, $Y(zg) = g^{-1}Y(z)$.


Alternative name: short-cut orbit-space associated cross-section. 
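The contravariance condition can be tested mechanically on a finite model. The sketch below assumes a hypothetical setting that is not part of the text: $G = S_3$, a two-point base space, the trivial bundle $P = B \times G$, and $F = \{0, 1, 2\}$ with the permutation action. The helper names `is_contravariant` and `from_seed` are invented for this illustration.

```python
from itertools import permutations

# Hypothetical finite model (not from the text): G = S_3, P = B x G, F = {0, 1, 2}.
G = list(permutations(range(3)))
B = [0, 1]
P = [(b, h) for b in B for h in G]

def compose(g, h):
    """Return the permutation g∘h, i.e. i -> g[h[i]]."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def right(z, g):
    """Right action of G on P: (b, h)g = (b, h∘g)."""
    b, h = z
    return (b, compose(h, g))

def left(g, f):
    """Left action of G on F: gf = g[f]."""
    return g[f]

def is_contravariant(Y):
    """Check Y(zg) = g^-1 Y(z) for all z in P and g in G."""
    return all(Y[right(z, g)] == left(inverse(g), Y[z]) for z in P for g in G)

def from_seed(seed):
    """Extend the values seed[b] = Y((b, e)) to P by Y((b, h)) = h^-1 seed[b]."""
    return {(b, h): left(inverse(h), seed[b]) for (b, h) in P}

Y = from_seed({0: 2, 1: 0})
constant = {z: 0 for z in P}
print(is_contravariant(Y), is_contravariant(constant))
```

In this model a contravariant function is fixed by one value per fibre, since $Y(z_0 g) = g^{-1}Y(z_0)$ determines $Y$ on the whole fibre through $z_0$. A constant function fails the condition here because the action of $S_3$ on $F$ moves every point.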


47.12.4 NOTATION: Short-cut orbit-space associated cross-section spaces. 
$X((P \times F)/G)$, for a topological principal $G$-bundle $(P, \pi_P, B, A_P^G)$ and topological left transformation group
$(G, F)$, denotes the set of contravariant fibre-space-valued principal bundle functions on $P$. In other words,

$X((P \times F)/G) = \{Y : P \to F;\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$.


Alternative notation: X (P xg F). 
$X^0((P \times F)/G)$, for a topological principal $G$-bundle $(P, \pi_P, B, A_P^G)$ and topological left transformation group
$(G, F)$, denotes the set of continuous contravariant fibre-space-valued principal bundle functions on $P$. In
other words,

$X^0((P \times F)/G) = \{Y \in C(P, F);\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$.


Alternative notation: X? (P xg F). 
47.12.5 REMARK: Contravariant principal bundle functions are short-cut orbit-space cross-sections.

The contravariant principal bundle functions in Definition 47.12.3 are short-cut versions of cross-sections of 
the orbit-space associated fibre bundle with fibre space F. (See Definition 21.4.9 for non-topological short-cut 
maps. See Section 57.7 for short-cut versions of differential forms.) 


To see this, let $X \in X(E, \pi_E, M)$ be a cross-section of the orbit-space associated fibre bundle $E = (P \times F)/G$.
Let $p \in M$. Then $X(p) = [(z, f)] = \{(zg^{-1}, gf);\ g \in G\}$ for some $z \in P_p$ and $f \in F$. Each set $[(z, f)]$ is a
function from $P_p$ to $F$ by Theorem 47.11.10 (v), where $p = \pi_P(z)$. Denote this function by $Y_p : P_p \to F$.
Since the sets $P_p$ are pairwise disjoint for $p \in M$, it follows that $Y = \bigcup_{p \in M} Y_p$ is a well-defined function
from $P$ to $F$. This function $Y$ satisfies Definition 47.12.3.


This construction gives a simple map from $X(E, \pi_E, M)$ to the set of all contravariant principal bundle
functions on $P$. The map $\rho$ in Theorem 47.12.6 is the inverse of this map. Thus Theorem 47.12.6 is very
closely related to Theorem 21.4.13 (v).

A version of Theorem 47.12.6 is stated and proved by Daniel/Viallet [317], page 180. The result is stated 
without proof by Bleecker [254], page 43. It is stated with an informal proof by Sternberg [38], page 337. 


47.12.6 THEOREM:  Bijection from contravariant PFB functions to orbit-space associated cross-sections. 
Let $(P, \pi_P, B, A_P^G)$ be a topological principal bundle. Let $(G, F)$ be a topological left transformation group.
Let $(E, \pi_E, B, A_E^F)$ be the orbit-space associated topological $(G, F)$ bundle for $(P, \pi_P, B, A_P^G)$.

Let $S = \{Y : P \to F;\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$. Define $\rho : S \to X(E, \pi_E, B)$ by

$\forall Y \in S$, $\forall b \in B$, $\rho(Y)(b) = \{(z, Y(z));\ z \in \pi_P^{-1}(\{b\})\}$.

Then $\rho : S \to X(E, \pi_E, B)$ is a well-defined bijection, and its inverse satisfies

$\forall X \in X(E, \pi_E, B)$, $\forall z \in P$, $\rho^{-1}(X)(z) = X(\pi_P(z))(z)$
and
$\forall X \in X(E, \pi_E, B)$, $\forall b \in B$, $\rho^{-1}(X)|_{P_b} = X(b)$.


PROOF: Let $Y \in S$ and $b \in B$. Let $z_0 \in P_b = \pi_P^{-1}(\{b\})$. Then $P_b = \{z_0 g;\ g \in G\}$ by Theorem 47.8.18 (viii). So

$\rho(Y)(b) = \{(z, Y(z));\ z \in \pi_P^{-1}(\{b\})\} = \{(z, Y(z));\ z \in \{z_0 g;\ g \in G\}\}$
$= \{(z_0 g, Y(z_0 g));\ g \in G\}$
$= \{(z_0 g, g^{-1}Y(z_0));\ g \in G\}$
$= \{(z_0 g^{-1}, gY(z_0));\ g \in G\}$
$= [(z_0, Y(z_0))] \in E$

by Definition 47.11.5 (i, ii). Thus $\mathrm{Range}(\rho) \subseteq X(E, \pi_E, B)$. Now let $X \in X(E, \pi_E, B)$. Then for all $b \in B$,
$X(b) = [(z, f)]$ for some $z \in P$ and $f \in F$ by Definition 47.11.5 (i), and $\pi_E(X(b)) = b$ implies that $z \in P_b$.
By Theorem 47.11.10 (v), $[(z, f)]$ is then a well-defined function from $P_b$ to $F$ for each $b \in B$. So the function
$Y : P \to F$ is well defined by $\forall z' \in P$, $Y(z') = X(\pi_P(z'))(z')$. But Theorem 47.11.10 (viii) then implies
that $Y(z'g) = g^{-1}Y(z')$ for all $g \in G$, for all $z' \in P$. Therefore $Y \in S$. Thus $\mathrm{Range}(\rho) \supseteq X(E, \pi_E, B)$.
Consequently $\mathrm{Range}(\rho) = X(E, \pi_E, B)$. In other words, $\rho : S \to X(E, \pi_E, B)$ is a surjection.

To show that $\rho$ is injective, let $Y_1, Y_2 \in S$ satisfy $\rho(Y_1) = \rho(Y_2)$. Then $Y_1|_{P_b} = Y_2|_{P_b}$ for all $b \in B$. So
$Y_1 = Y_2$. Thus $\rho$ is injective. Hence $\rho : S \to X(E, \pi_E, B)$ is a well-defined bijection.

To verify the formulas for $\rho^{-1}$, let $X \in X(E, \pi_E, B)$. Define $Y : P \to F$ by $\forall z \in P$, $Y(z) = X(\pi_P(z))(z)$.
Then $Y \in S$ by Theorem 47.11.10 (v, vii) as above. Let $b \in B$. Then $\rho(Y)(b) = Y|_{P_b} = X(b)$. So $\rho(Y) = X$.
So $Y = \rho^{-1}(X)$. Hence $\rho^{-1}(X)(z) = X(\pi_P(z))(z)$ for all $z \in P$ and $\rho^{-1}(X)|_{P_b} = X(b)$ for all $b \in B$.
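The bijection $\rho$ of Theorem 47.12.6 can be enumerated explicitly on a small finite model. This is a hedged sketch, not the author's construction: $G = S_3$, $B = \{0, 1\}$, $P = B \times G$ and $F = \{0, 1, 2\}$ are hypothetical choices, topology is ignored, and the function names are invented for illustration.

```python
from itertools import permutations, product

# Hypothetical finite model: G = S_3, B = {0, 1}, P = B x G, F = {0, 1, 2}.
G = list(permutations(range(3)))
IDENTITY = (0, 1, 2)
B = [0, 1]
F = [0, 1, 2]
P = [(b, h) for b in B for h in G]

def compose(g, h):
    """Return the permutation g∘h, i.e. i -> g[h[i]]."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def fibre(b):
    """The fibre set P_b of the projection (b, h) -> b."""
    return [z for z in P if z[0] == b]

def orbit(z, f):
    """The class [(z, f)] = {(z g^-1, g f); g in G}."""
    b, h = z
    return frozenset(((b, compose(h, inverse(g))), g[f]) for g in G)

def from_seed(seed):
    """Contravariant Y with Y((b, e)) = seed[b], so Y((b, h)) = h^-1 seed[b]."""
    return {(b, h): inverse(h)[seed[b]] for (b, h) in P}

def rho(Y):
    """rho(Y)(b) = {(z, Y(z)); z in P_b}, stored as a dict b -> frozenset."""
    return {b: frozenset((z, Y[z]) for z in fibre(b)) for b in B}

def rho_inverse(X):
    """rho^-1(X)(z) = X(pi_P(z))(z): evaluate the graph X(b) at z."""
    return {z: dict(X[z[0]])[z] for z in P}

# All contravariant functions: one free value per fibre.
S = [from_seed(dict(zip(B, seed))) for seed in product(F, repeat=len(B))]
sections = [rho(Y) for Y in S]
print(len(S), len({frozenset(X.items()) for X in sections}))
```

In this model there are $3^2 = 9$ contravariant functions and $3^2 = 9$ cross-sections (three orbits over each of the two base points), and `rho` maps the former one-to-one onto the latter. Each value `rho(Y)[b]` is literally the restriction of $Y$ to the fibre over $b$, regarded as a set of ordered pairs, which coincides with the orbit $[(z_0, Y(z_0))]$.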


47.12.7 REMARK: Contravariant PFB functions related to general associated ordinary bundles. 
In the spirit of maximum generalisation (which has pervaded mathematics since the earliest times, in the 
same way that maximum unification has pervaded physics), one naturally wishes to know whether the 
contravariant principal bundle function concept can be extended to general associated OFBs. This would be
particularly helpful when interpreting physics texts where definitions of covariant derivatives, for example, 
sometimes presuppose that all associated OFBs are orbit-space associated OFBs. 


Theorem 47.12.8 shows that the associated fibre bundle $E$ does not need to have any particular construction
method. The functions in $S$ are defined in terms of $P$, $G$ and $F$ in a way which is independent of any
particular representation of the associated $F$-bundle. Then the map $\rho : S \to X(E, \pi_E, B)$ is constructed
from the chart association map $h : A_P^G \to A_E^F$. It is not in fact necessary to define or construct any OFB at
all. The "particle fields" $Y \in S$ are well defined in the absence of any associated ordinary fibre bundle.


The interpretation of contravariant PFB functions as "particle fields", mentioned in Remark 47.12.2, is not
too difficult to see. For each reference frame $z \in P$, a measurement in $F$, which is typically a linear space
such as $\mathbb{C}$, $\mathbb{C}^2$ or $\mathbb{C}^3$, is defined so that it behaves as expected under coordinate transformations $z \mapsto zg$.
This is a minimum self-consistency condition. Then by knowing the measurement $Y(z)$ for one reference
frame $z$, one may compute it for all other reference frames $zg$. Such a field value in $F$ is defined on each fibre
set $P_b$. So this does effectively give a "vector field" at each point $b \in B$, for each reference frame $z \in P_b$.


The reason for the name "particle field" is that in gauge theory, the force exerted by a boson "radiation field"
is represented by a connection form on the principal bundle, whereas the "wave function" or "matter field"
in the Dirac equation for fermions is represented by a vector field $X \in X(E, \pi_E, B)$. Thus "particle" signifies
"matter", which means fermions, whereas "radiation" means a boson field. Thus very roughly speaking, $\mathbb{C}$
fields represent electrons and U(1) fields represent photons, whereas $\mathbb{C}^3$ fields represent quarks and U(3)
fields represent gluons. (If in doubt, ask an expert, i.e. not me.)


47.12.8 THEOREM:  Bijection from contravariant PFB functions to general associated cross-sections. 

Let $(P, \pi_P, B, A_P^G)$ be a topological principal bundle. Let $(G, F)$ be a topological left transformation group.
Let $(E, \pi_E, B, A_E^F)$ be an associated topological $(G, F)$ bundle for $(P, \pi_P, B, A_P^G)$ with chart association map
$h : A_P^G \to A_E^F$. Let $S = \{Y : P \to F;\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$. Define $\rho : S \to X(E, \pi_E, B)$ by

$\forall Y \in S$, $\forall b \in B$, $\forall \phi \in A^G_{P,b}$, $\rho(Y)(b) = h(\phi)|_{E_b}^{-1}\bigl(Y(\phi|_{P_b}^{-1}(e))\bigr)$,

where $e$ is the identity of $G$. Then $\rho : S \to X(E, \pi_E, B)$ is a well-defined bijection, and $\rho^{-1}$ satisfies

$\forall X \in X(E, \pi_E, B)$, $\forall z \in P$, $\forall \phi \in A^G_{P,\pi_P(z)}$,
$\rho^{-1}(X)(z) = \phi(z)^{-1}\, h(\phi)(X(\pi_P(z)))$. (47.12.1)


PROOF: Let $Y \in S$, $b \in B$ and $\phi \in A^G_{P,b}$. Then $\phi|_{P_b}^{-1}(e)$ is a well-defined element of $P_b$ because $\phi|_{P_b} : P_b \to G$
is a bijection by Theorem 21.8.8. (See Definition 21.10.3 for $\phi|_{P_b}$.) So $Y(\phi|_{P_b}^{-1}(e))$ is a well-defined element
of $F$. Therefore $\rho(Y)(b)$ is a well-defined element of $E_b$ because $h(\phi)|_{E_b} : E_b \to F$ is a bijection.

To show that the expression for $\rho(Y)(b)$ is independent of $\phi \in A^G_{P,b}$, let $\phi_1, \phi_2 \in A^G_{P,b}$. Then $\phi_2|_{P_b} \circ \phi_1|_{P_b}^{-1} = L_g : G \to G$
for some $g \in G$ by Definitions 47.8.3 and 47.6.5 (v). So $Y(\phi_2|_{P_b}^{-1}(e)) = Y(\phi_1|_{P_b}^{-1}(g^{-1})) = Y(\phi_1|_{P_b}^{-1}(e)\,g^{-1}) = g\,Y(\phi_1|_{P_b}^{-1}(e))$
by Theorem 21.11.7 (xiii) and the definition of $S$. But $h(\phi_2)|_{E_b} \circ h(\phi_1)|_{E_b}^{-1} = L_g : F \to F$
for the same value of $g \in G$ by Definition 47.9.5 (ii). So

$h(\phi_2)|_{E_b}^{-1}\bigl(Y(\phi_2|_{P_b}^{-1}(e))\bigr) = h(\phi_2)|_{E_b}^{-1}\bigl(g\,Y(\phi_1|_{P_b}^{-1}(e))\bigr)$
$= h(\phi_1)|_{E_b}^{-1}\bigl(Y(\phi_1|_{P_b}^{-1}(e))\bigr)$.

Thus $\rho(Y)(b)$ is chart-independent. So $\rho : S \to X(E, \pi_E, B)$ is a well-defined function.

Now let $X \in X(E, \pi_E, B)$ and define $Y : P \to F$ by $Y(z) = \phi(z)^{-1}h(\phi)(X(\pi_P(z)))$ for all $z \in P$. Then $Y$
is a well-defined function from $P$ to $F$ because $z \in P$ implies $\pi_P(z) \in B$, which implies $X(\pi_P(z)) \in E_{\pi_P(z)}$,
which implies $h(\phi)(X(\pi_P(z))) \in F$, which implies $\phi(z)^{-1}h(\phi)(X(\pi_P(z))) \in F$ because $\phi(z)^{-1} \in G$. Since
$\phi(zg)^{-1} = (\phi(z)g)^{-1} = g^{-1}\phi(z)^{-1}$, it follows that $Y(zg) = g^{-1}Y(z)$ for all $z \in P$ and $g \in G$. So $Y \in S$ and

$\forall b \in B$, $\forall \phi \in A^G_{P,b}$, $\rho(Y)(b) = h(\phi)|_{E_b}^{-1}\bigl(Y(\phi|_{P_b}^{-1}(e))\bigr)$
$= h(\phi)|_{E_b}^{-1}\bigl(\phi(\phi|_{P_b}^{-1}(e))^{-1}\, h(\phi)(X(b))\bigr)$
$= h(\phi)|_{E_b}^{-1}\bigl(h(\phi)(X(b))\bigr)$
$= X(b)$.

Thus $\rho(Y) = X$. So $\rho : S \to X(E, \pi_E, B)$ is surjective. To show that $\rho$ is injective, let $Y_1, Y_2 \in S$ satisfy
$\rho(Y_1) = \rho(Y_2)$. Then for all $b \in B$ and $\phi \in A^G_{P,b}$, $h(\phi)|_{E_b}^{-1}(Y_1(\phi|_{P_b}^{-1}(e))) = h(\phi)|_{E_b}^{-1}(Y_2(\phi|_{P_b}^{-1}(e)))$. Therefore
$Y_1(\phi|_{P_b}^{-1}(e)) = Y_2(\phi|_{P_b}^{-1}(e))$ because $h(\phi)|_{E_b}^{-1}$ is injective. So $Y_1(z_0) = Y_2(z_0)$, where $z_0 = \phi|_{P_b}^{-1}(e) \in P_b$.

Let $z \in P_b$. Then $z = z_0 g$ for some $g \in G$ by Theorem 47.8.18 (viii). So $Y_1(z) = Y_1(z_0 g) = g^{-1}Y_1(z_0) =
g^{-1}Y_2(z_0) = Y_2(z_0 g) = Y_2(z)$. Thus $Y_1 = Y_2$. Hence $\rho : S \to X(E, \pi_E, B)$ is a well-defined bijection, and
line (47.12.1) follows from the computation already made, that $\rho(Y) = X$ for $Y : z \mapsto \phi(z)^{-1}h(\phi)(X(\pi_P(z)))$
for $\phi \in A^G_{P,\pi_P(z)}$.


47.12.9 REMARK: Doppelgänger orbit-space associated OFBs for general associated OFBs.

An unimportant side-effect of Theorem 47.12.8 is that from any associated OFB $E_1$ of a given PFB $P$, for a
fibre space $F$, one may bijectively map the cross-sections $X$ of $E_1$ to contravariant principal bundle functions
$\rho_1^{-1}(X) : P \to F$, where $\rho_1 : S \to X(E_1, \pi_1, B)$ denotes the bijection in Theorem 47.12.8. Then one may
apply Theorem 47.12.6 (with bijection $\rho_2$) to $\rho_1^{-1}(X) \in S$ to obtain $\rho_2(\rho_1^{-1}(X)) \in X(E_2, \pi_2, B)$, where $E_2$
is the orbit-space associated OFB which is constructed from $P$ and $F$. Thus $\rho_2 \circ \rho_1^{-1} : X(E_1, \pi_1, B) \to
X(E_2, \pi_2, B)$ is a bijection between the two spaces of cross-sections.

Therefore if one possesses only an associated OFB which does not have the orbit-space structure, it is
straightforward to map its cross-sections bijectively to those of the unique orbit-space construction for the
given principal bundle $P$ and fibre space $F$.

This bijection $\rho_2 \circ \rho_1^{-1}$ between cross-section spaces could be useful when one wishes to obtain the benefits
of some theory which requires orbit-space structure for definitions and theorems. Then results for the
orbit-space structure may be converted to results for generally structured associated OFBs.


47.13. Combined topological fibre/frame bundles 


47.13.1 REMARK: Bundles which combine fibres with frames. 

Definition 47.13.2 is an extension of the “baseless figure/frame bundle" ideas in Section 20.10 to topological 
fibre bundles, in particular Definition 20.10.8. (See Notation 47.4.7 for the set XC(P, p, B) of continuous 
local cross-sections of a topological fibration (P, p, B) in Definition 47.13.2 (viii).) The maps and spaces in 
Definition 47.13.2 are illustrated in Figure 47.13.1. 


Figure 47.13.1 Combined topological fibre/frame bundle 


47.13.2 DEFINITION: A (contravariant) topological fibre/frame bundle is a tuple
$(G, P, \pi_P, E, \pi, B, F, \eta, A_P^B) < (G, T_G, P, T_P, \pi_P, E, T_E, \pi, B, T_B, F, T_F, \sigma, \mu^P_G, \mu^F_G, \eta, A_P^B)$ which satisfies the
following conditions.

(i) $(G, T_G)$, $(P, T_P)$, $(E, T_E)$, $(B, T_B)$ and $(F, T_F)$ are topological spaces.

(ii) $\pi_P : P \to B$ and $\pi : E \to B$ are continuous surjective maps.

(iii) $(G, P, \sigma, \mu^P_G) < (G, T_G, P, T_P, \sigma, \mu^P_G)$ is a topological right transformation group which acts freely on $P$,
and acts transitively on sets $\pi_P^{-1}(\{b\})$ for $b \in B$.

(iv) $\forall b \in B$, $\exists U \in \mathrm{Top}_b(B)$, $\exists \phi : U \to P$, ($\pi_P \circ \phi = \mathrm{id}_U$ and $\rho_\phi : U \times G \approx \pi_P^{-1}(U)$), where the local
"trivialisation" $\rho_\phi : U \times G \to P$ is defined by $\rho_\phi : (b, g) \mapsto \mu^P_G(\phi(b), g)$.

(v) $(G, F, \sigma, \mu^F_G) < (G, T_G, F, T_F, \sigma, \mu^F_G)$ is an effective topological left transformation group.

(vi) $\eta : \bigcup_{b \in B} (\pi_P^{-1}(\{b\}) \times \pi^{-1}(\{b\})) \to F$ satisfies
$\forall b \in B$, $\forall p \in \pi_P^{-1}(\{b\})$, $\forall z \in \pi^{-1}(\{b\})$, $\forall g \in G$, $\eta(\mu^P_G(p, g^{-1}), z) = \mu^F_G(g, \eta(p, z))$.

(vii) $\forall b \in B$, $\forall p \in \pi_P^{-1}(\{b\})$, $\forall f \in F$, $\exists' z \in \pi^{-1}(\{b\})$, $\eta(p, z) = f$.
(I.e. the map $z \mapsto \eta(p, z)$ is a bijection from $\pi^{-1}(\{b\})$ to $F$ for all $p \in \pi_P^{-1}(\{b\})$, for all $b \in B$.)

(viii) $A_P^B \subseteq X^0(P, \pi_P, B)$, and $\bigcup_{\phi \in A_P^B} \mathrm{Dom}(\phi) = B$.

(ix) $\forall \phi \in A_P^B$, $\pi \times \eta_\phi : \pi^{-1}(\mathrm{Dom}(\phi)) \approx \mathrm{Dom}(\phi) \times F$, where $\eta_\phi : \pi^{-1}(\mathrm{Dom}(\phi)) \to F$ is defined for $\phi \in A_P^B$
by $\eta_\phi : z \mapsto \eta(\phi(\pi(z)), z)$.

$G$ is the structure group or reference-frame transition group.
$P$ is the principal bundle or frame bundle or observer bundle or viewpoint bundle or perspective bundle.
$E$ is the ordinary bundle or fibre bundle or state bundle or entity bundle or object bundle.
$B$ is the base space.
$F$ is the observation space or measurement space or component space or coordinate space.
$\mu^P_G$ is the right action (map) of $G$ on $P$.
$\mu^F_G$ is the left action (map) of $G$ on $F$.
$(G, P, \sigma, \mu^P_G)$ is the frame (transformation) group.
$(G, F, \sigma, \mu^F_G)$ is the measurement (transformation) group.
$\eta$ is the measurement (process) map or observation (process) map.
$A_P^B$ is the frame cross-section atlas.
$\phi$ is a frame cross-section chart for $\phi \in A_P^B$.
$p$ is a frame or observer or viewpoint or perspective for $p \in P$.
$z$ is a fibre or state or entity or object for $z \in E$.
$f$ is an observation or a measurement or a view or the components or the coordinates for $f \in F$.
$\eta_\phi$ is a chart or component map or coordinate map for $\phi \in A_P^B$.


47.13.3 REMARK: Ordinary and principal fibre bundles are most natural when combined. 

If Definition 47.13.2 seems excessively long, it should be considered that it combines a principal fibre bundle 
with an associated ordinary fibre bundle, and it defines the association between them. So it is really three 
definitions in one. 


This definition is not motivated by abstract formalism. It is probably closer to “the truth" than the other 
fibre bundle definitions because it shows where fibre charts and atlases come from and what they mean. 
Instead of presenting principal and ordinary fibre bundles as enigmatically intertwined object classes with 
some inscrutable relevance to each other, it shows that fibre charts are constructed directly from cross-sections 
of the reference frame bundle. 


47.13.4 REMARK: Simple interpretation of the definition of a combined topological fibre/frame bundle. 
The conditions of Definition 47.13.2 may be summarised in plainer language as follows.

1. The principal and ordinary fibrations.
(i) $G$, $P$, $E$, $B$ and $F$ are topological spaces.
(ii) $(P, \pi_P, B)$ and $(E, \pi, B)$ are topological fibrations.

2. Reference-frame transition group $G$ acting on reference frames in $P$.
(iii) $(G, P)$ is a topological right transformation group which acts freely on $P$, and freely and transitively
on the fibre sets $\pi_P^{-1}(\{b\})$.
(iv) The fibration $(P, \pi_P, B)$ has a local trivialisation to $U \times G$ for an open cover of sets $U \in \mathrm{Top}(B)$.

3. Reference-frame transition group $G$ acting on measurements in $F$.
(v) $(G, F)$ is an effective topological left transformation group.

4. The measurement process map $\eta$.
(vi) $\eta : P \times E \to F$ satisfies $\eta(pg^{-1}, z) = g\eta(p, z)$ for $g \in G$, for $(p, z) \in P \times E$ with $\pi_P(p) = \pi(z)$.
(vii) $\eta(p, \cdot) : \pi^{-1}(\{b\}) \to F$ is a bijection for all $b \in B$ and $p \in \pi_P^{-1}(\{b\})$.

5. The (optional) frame cross-section atlas $A_P^B$.
(viii) The charts in the atlas $A_P^B$ are continuous local cross-sections of $(P, \pi_P, B)$ which cover $B$.
(ix) $\pi \times \eta_\phi : \pi^{-1}(\mathrm{Dom}(\phi)) \approx \mathrm{Dom}(\phi) \times F$ is a local trivialisation of $E$ for all charts $\phi \in A_P^B$.


47.13.5 REMARK: Discussion of the definition of a combined topological fibre/frame bundle.

The following comments apply to Definition 47.13.2.

(1) It follows from the free and transitive action of $G$ on the sets $P_b = \pi_P^{-1}(\{b\})$ for $b \in B$ in condition (iii)
that the map $\mu^P_G(p, \cdot) : G \to P_b$ is a bijection for all $p \in P_b$, for all $b \in B$. (See Figure 47.13.2.)


Figure 47.13.2 Principal and ordinary fibre sets of topological fibre/frame bundle


The free and transitive requirements are a direct consequence of the fact that the purpose of $G$ is
to model the set of transitions between the reference frames $p \in P_b$ for all $b \in B$, as mentioned in
Remark 20.10.13. The structure group may be expanded to a supergroup without affecting
its validity for the ordinary bundle, but an expanded structure group requires a corresponding expansion
of the principal bundle.


(2) The action $\mu^P_G$ of $G$ on $P$ is continuous by condition (iii). But it does not follow from this that the
principal fibre sets $P_b$ are homeomorphic to $G$ via the maps $\mu^P_G(p, \cdot)$. For example, if $P_b$ has the
trivial topology, the map $\mu^P_G(p, \cdot) : G \to P_b$ must be continuous, but the inverse would then only
be continuous if $G$ has the trivial topology. (See also Remark 36.10.13 for this kind of topological
transformation group.) Hence the local trivialisation condition (iv) for the principal bundle is required.


(3) A useful mnemonic for condition (vi) is the formula "$pv = z$" as an interpretation of $\eta(p, z) = v$.
From the mnemonic $pv = z$, it seems plausible to write $(pg^{-1})(gv) = z$. Translated back to the
formula for $\eta$, this gives $\eta(pg^{-1}, z) = gv$. (This resembles the canonical case $G = \mathrm{GL}(n)$, where the
coordinatisation of an element $z \in E_b$ is achieved by determining which components $(v^i)_{i=1}^n$ yield $z$ in
the linear combination $\sum_{i=1}^n e_i v^i = z$, where $p$ represents a basis $(e_i)_{i=1}^n$.)

Condition (vi) is probably the core non-trivial concept in Definition 47.13.2. This formula is the essence
of the notion of an associated principal bundle. It encapsulates a kind of "relativity principle" in the
sense that a reverse transformation $R_{g^{-1}}$ of the measuring apparatus $p$ is indistinguishable from a
forward transformation $L_g$ of the measurement $\eta(p, z)$.


(4) The specification of a particular atlas $A_P^B$ of frame cross-sections in condition (viii) permits more than
mere topological regularity to be indicated. The obvious alternative would be to require the complete
atlas of all continuous local cross-sections which are compatible with the principal bundle's topology.
(Such maximal atlases are avoided in this book.)

The frame cross-section atlas in condition (viii) was originally included because it seemed to match the overall
philosophy of Definition 47.13.2. However, such frame cross-sections do appear as "local gauges" in gauge
theory, and the transition maps between such cross-sections are known as "gauge transformations".


(5) The homeomorphisms in condition (ix) imply that the topology on the ordinary bundle $E$ is fully
determined by the frame bundle charts $\phi \in A_P^B$. However, the frame bundle charts are fully determined
by the topological fibration $(P, \pi_P, B)$. So the topology on the ordinary bundle is uniquely determined
by the topology on the principal bundle.

Conversely, the topological fibration $(E, \pi, B)$ fully determines the set of fibre charts from $\pi^{-1}(U)$ to $F$
for open sets $U \in \mathrm{Top}(B)$, which fully determines the set of compatible maps $\eta_\phi : \pi^{-1}(\mathrm{Dom}(\phi)) \to F$
of the form $\eta_\phi : z \mapsto \eta(\phi(\pi(z)), z)$ for functions $\phi : \mathrm{Dom}(\phi) \to P$ with $\mathrm{Dom}(\phi) \in \mathrm{Top}(B)$ and $\pi_P \circ \phi = \mathrm{id}_U$
with $U = \mathrm{Dom}(\phi)$. So the set $X^0(P, \pi_P, B)$ of continuous cross-sections of the principal bundle, and
hence the topology on $P$, is fully determined by the topological fibration $(E, \pi, B)$. It follows that
conditions (viii) and (ix) are effectively redundant!


47.13.6 REMARK: Interpretation of fibre/frame bundles in terms of observations of object states. 
Another way of looking at topological fibre bundles is shown in Figure 47.13.3. 


Figure 47.13.3 Observer/observed/observation maps for a combined fibre/frame bundle
(Panels: measurement map $\eta : P_b \times E_b \to F$ with $\eta(p, z) = L_g\eta(R_g p, z)$; reconstruction map $\zeta : P_b \times F \to E_b$ with $\zeta(R_g p, f) = \zeta(p, L_g f)$.)


For each $b \in B$, a "reconstruction map" $\zeta : P_b \times F \to E_b$ may be defined by $\zeta : (p, f) \mapsto \eta(p, \cdot)^{-1}(f)$. Then
$\forall p \in P_b$, $\forall f \in F$, $\eta(p, \zeta(p, f)) = f$. In other words, if the state is first reconstructed (by $\zeta$) from an
observation, and then measured (by $\eta$), then the observation is the same. The map $\zeta$ reconstructs the
observed object state $z \in E_b$ from the observer $p$ and the observation $f$. The rule which makes the object
state observer-independent is $\forall p \in P_b$, $\forall f \in F$, $\forall g \in G$, $\zeta(p, f) = \zeta(R_{g^{-1}}p, L_g f)$. (This is related to the
comments in part (3) of Remark 47.13.5.)

It follows that one may reconstruct $E_b$ as the set of equivalence classes $[(p, f)] = \{(R_{g^{-1}}p, L_g f);\ g \in G\}$ for $(p, f) \in
P_b \times F$. The double application of $g \in G$ keeps the state unchanged. With this reconstruction technique,
one may construct ordinary topological fibre bundles for any given topological fibre space $F$ for which $(G, F)$
is an effective topological transformation group.

Applying measurement before reconstruction, one obtains $\forall p \in P_b$, $\forall z \in E_b$, $\zeta(p, \eta(p, z)) = z$. This shows
a kind of duality between the noumena in $E_b$ and the phenomena in $F$. The noumena in $E_b$ are "the
real thing", whereas the phenomena in $F$ are mere observations, resulting from the interaction between an
observing system reference-frame $p \in P_b$ and an observed system state $z \in E_b$.
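The measurement and reconstruction identities above can be verified exhaustively for one fibre of a finite model. The sketch below is an assumption-laden illustration rather than the author's construction: the frames of the fibre are taken to be the elements of $S_3$ with right multiplication as the frame action, $F = \{0, 1, 2\}$ carries the permutation action, and each state is modelled as an equivalence class $[(p, f)]$. The names `zeta` and `eta` follow the symbols in the remark.

```python
from itertools import permutations

# Hypothetical single-fibre model: frames P_b = S_3, observations F = {0, 1, 2}.
G = list(permutations(range(3)))
F = [0, 1, 2]

def compose(g, h):
    """Return the permutation g∘h, i.e. i -> g[h[i]]."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def zeta(p, f):
    """Reconstruction map: the state [(p, f)] = {(p g^-1, g f); g in G}."""
    return frozenset((compose(p, inverse(g)), g[f]) for g in G)

def eta(p, z):
    """Measurement map: the observation f for which (p, f) lies in the state z."""
    return dict(z)[p]

E_b = {zeta(p, f) for p in G for f in F}   # the states in this fibre

# eta after zeta, and zeta after eta, are both identities.
round_trip_1 = all(eta(p, zeta(p, f)) == f for p in G for f in F)
round_trip_2 = all(zeta(p, eta(p, z)) == z for p in G for z in E_b)
# Contravariance: eta(p g^-1, z) = g eta(p, z).
contravariant = all(
    eta(compose(p, inverse(g)), z) == g[eta(p, z)]
    for p in G for g in G for z in E_b
)
print(len(E_b), round_trip_1, round_trip_2, contravariant)
```

In this model the fibre has exactly $|F| = 3$ states, each of which assigns one observation to every frame. The design choice of storing a state as a set of (frame, observation) pairs makes the measurement map a simple lookup, which mirrors the description of $E_b$ as a set of equivalence classes.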


47.13.7 REMARK: Multiple ordinary fibre bundles associated with a single principal fibre bundle. 

Figure 47.13.4 illustrates the case of multiple ordinary bundles for a single common principal bundle. There 
is a different set of component measurement maps nis for each ordinary bundle Eg, but they are parametrised 
by the same set of frames ¢ € P. 


Figure 47.13.4 Fibre/frame bundle for multiple ordinary bundles

[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]


47.13.8 REMARK: The difficulty of separating observations into location and state components.
Although the definition of fibre bundles is apparently always given in terms of a clean decomposition into
a base space $B$ and a coordinate space $F$, it is not so clear in even the macroscopic physical world that
location and state can be so cleanly separated. The baseless figure bundles in Section 20.10 show how one
may completely integrate location and state in fibre bundles, while Definition 47.13.2 shows how location
and state can be cleanly separated.


This raises the question of whether one can define a class of fibre bundles which are "almost decomposable".
In other words, one could attempt to define fibre bundles where the fibre sets are not quite disjoint, and the 
group actions do not preserve the location/state pair within a strict fibre set. For example, one could find 
that the same particle at some point may seem to have an observer-dependent location. Then the structure 
group could contain transition maps between observations at different locations. Against this, one may 
note that a topological fibre bundle does provide a kind of “glue” linking the states and frames at nearby 
locations. In this sense, there is already some “fuzziness” in the definition of topological fibre bundles. 


47.13.9 REMARK: The specification tuple for the topological fibre/frame bundle definition. 

To harmonise Definition 47.13.2 with the "uncombined" fibre bundle definitions such as Definition 47.6.5, the
specification tuple would have to omit the structure group $G$ and fibre space $F$. Then Definition 47.13.2
would resemble the following.

A topological $(G, F)$ fibre/frame bundle, for an effective topological left transformation group $(G, F) <
(G, T_G, F, T_F, \sigma, \mu^F_G)$, is a tuple $(P, \pi_P, E, \pi, B, \eta, A_P^B) < (P, T_P, \pi_P, E, T_E, \pi, B, T_B, \mu^P_G, \eta, A_P^B)$ which
satisfies the following conditions.


[....the same conditions as in Definition 47.13.2, but omitting (v)....] 


However, there is a kind of symmetry between the spaces P and F which would be lost in this form of 
definition. There are arguments in favour of both specification tuple styles here. In fact, Definition 47.13.2 
is very much in harmony with Definition 20.10.8 for baseless combined figure/frame bundles. Moreover, 
it could be argued that all fibre bundle specification tuples should include the principal bundle, structure 
group and fibre space because the underlying significance of fibre bundles is that fibre charts map real-world 
systems E to measurements in F via real-world reference frames in P. 
PARALLELISM ON TOPOLOGICAL FIBRE BUNDLES


48.1 Structure-preserving fibre set maps
48.2 Parallelism path classes
48.3 Pathwise parallelism on topological fibre bundles
48.4 Associated parallelism


48.0.1 REMARK:  Parallelism is a generalisation of the concept of a connection. 
Parallelism on topological fibre bundles is a generalisation of the concept of a connection on a differentiable 
fibre bundle which is presented in Chapters 67, 68 and 69. 


48.1. Structure-preserving fibre set maps 


48.1.1 REMARK: Preservation of "structure" by parallelism. 

Maps which “preserve structure” between fibre sets $E_b$ with $b \in B$ for (G, F) fibre bundles $(E, \pi, B, A_E^F)$
are important for defining parallelism because parallelism must preserve structure. (At least something 
must be preserved because parallelism must be reversible. This is related to the holonomy group concept in 
Section 70.1.) A map which preserves structure means a map which is equivalent to the action of an element 
of the structure group “through the charts”. 


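48.1.2 DEFINITION: A structure-preserving fibre set map for a topological (G, F) fibre bundle
$(E, \pi, B, A_E^F)$ is a homeomorphism $h : E_{b_1} \approx E_{b_2}$ for $b_1, b_2 \in B$ such that


$$\forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists g \in G,\quad \phi_2 \circ h = L_g \circ \phi_1|_{E_{b_1}}. \qquad (48.1.1)$$


In other words, $\forall \phi_1 \in A^F_{E,b_1}$, $\forall \phi_2 \in A^F_{E,b_2}$, $\exists g \in G$, $\forall z \in E_{b_1}$, $\phi_2(h(z)) = g\phi_1(z)$.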


48.1.3 THEOREM: Uniqueness of chart-transition left-action group element. 
Let $(E, \pi, B, A_E^F)$ be a topological (G, F) fibre bundle with $b_1, b_2 \in B$. Let $h : E_{b_1} \to E_{b_2}$ be a structure-preserving fibre set map. Then


$$\forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists'\, g \in G,\quad \phi_2 \circ h = L_g \circ \phi_1|_{E_{b_1}}.$$


In other words, the group element g in line (48.1.1) is unique for each given $\phi_1$ and $\phi_2$.


PROOF: Let $\phi_1 \in A^F_{E,b_1}$ and $\phi_2 \in A^F_{E,b_2}$. Let $g, g' \in G$ satisfy $\phi_2 \circ h = L_g \circ \phi_1|_{E_{b_1}}$ and $\phi_2 \circ h = L_{g'} \circ \phi_1|_{E_{b_1}}$. Theorem 47.6.8 implies that $\phi_1|_{E_{b_1}} : E_{b_1} \to F$ is a bijection. So $L_g = \phi_2 \circ h \circ \phi_1|_{E_{b_1}}^{-1} = L_{g'}$. By
Theorem 20.2.7, g is uniquely determined by $L_g$ because (G, F) is an effective transformation group by
Definition 47.6.5. Hence $g = g'$.


48.1.4 REMARK: Structure group elements may be thought of as the coordinates of fibre set bijections. 

The group element $g \in G$ in Definition 48.1.2 and Notation 48.1.5 may be thought of as the coordinates of
the map $h : E_{b_1} \to E_{b_2}$. This is clearer if g is written as $g_{\phi_2,\phi_1}(h; b_2, b_1)$ such that $L_{g_{\phi_2,\phi_1}(h; b_2, b_1)} = \phi_2|_{E_{b_2}} \circ h \circ \phi_1|_{E_{b_1}}^{-1}$.
48.1.5 NOTATION: The set of isomorphisms between the fibre sets at two base points.

$\mathrm{Iso}_G(E_{b_1}, E_{b_2})$, for a (G, F) topological fibre bundle $(E, \pi, B, A_E^F)$, denotes the set of topological isomorphisms
from $E_{b_1}$ to $E_{b_2}$ for $b_1, b_2 \in B$ which are equivalent to left translations by group elements “through the
charts”. In other words,


$$\mathrm{Iso}_G(E_{b_1}, E_{b_2}) = \{h \in \mathrm{Iso}(E_{b_1}, E_{b_2});\ \forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists g \in G,\ \forall z \in E_{b_1},\ \phi_2(h(z)) = g\phi_1(z)\}.$$


48.1.6 NOTATION: The set of automorphisms of the fibre set at a base point. 

$\mathrm{Aut}_G(E_b)$ for a (G, F) topological fibre bundle $(E, \pi, B, A_E^F)$ denotes the set of all topological automorphisms
of $E_b$ for $b \in B$ which are equivalent to left translations by group elements “through the charts”. That is,
$\mathrm{Aut}_G(E_b) = \mathrm{Iso}_G(E_b, E_b)$.


48.1.7 REMARK: Relation between fibre-set isomorphisms and general topological isomorphisms. 

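Notation 48.1.5 is based on Notation 31.14.7, by which $\mathrm{Iso}(E_{b_1}, E_{b_2})$ denotes the set of homeomorphisms
from $E_{b_1}$ to $E_{b_2}$. However, it is superfluous to require that $h \in \mathrm{Iso}(E_{b_1}, E_{b_2})$ for all $h \in \mathrm{Iso}_G(E_{b_1}, E_{b_2})$
because (G, F) is a continuous transformation group. So


$$\mathrm{Iso}_G(E_{b_1}, E_{b_2}) = \{h : E_{b_1} \to E_{b_2};\ \forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists g \in G,\ \forall z \in E_{b_1},\ \phi_2(h(z)) = g\phi_1(z)\}.$$

In terms of the per-base-point chart notation $\beta_{b,\phi} = \phi|_{E_b}$, one may write


$$\mathrm{Iso}_G(E_{b_1}, E_{b_2}) = \{\beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1};\ \phi_1 \in A^F_{E,b_1},\ \phi_2 \in A^F_{E,b_2},\ g \in G\}$$
$$\phantom{\mathrm{Iso}_G(E_{b_1}, E_{b_2})} = \{\phi_2|_{E_{b_2}}^{-1} \circ L_g \circ \phi_1|_{E_{b_1}};\ \phi_1 \in A^F_{E,b_1},\ \phi_2 \in A^F_{E,b_2},\ g \in G\}.$$


All maps of the form $\beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1}$ are automatically elements of $\mathrm{Iso}(E_{b_1}, E_{b_2})$ by the conditions of
Definition 47.6.5.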


48.1.8 DEFINITION: A (fibre set) automorphism through the charts on a (G, F) fibre bundle $(E, \pi, B, A_E^F)$
is a map of the form $\beta_{b,\phi}^{-1} \circ L_g \circ \beta_{b,\phi} : E_b \approx E_b$ for some $g \in G$, $b \in B$ and $\phi \in A^F_{E,b}$.


48.1.9 NOTATION: Left actions by structure group elements on fibre sets. 
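$L^b_{g,\phi}$, for $b \in B$, $g \in G$ and $\phi \in A^F_{E,b}$, for a topological (G, F) fibre bundle $(E, \pi, B, A_E^F)$, denotes the
automorphism through the charts $L^b_{g,\phi} = \phi|_{E_b}^{-1} \circ L_g \circ \phi|_{E_b} : E_b \approx E_b$.

$L_{g,\phi}$, for $g \in G$ and $\phi \in A_E^F$, for a topological (G, F) fibre bundle $(E, \pi, B, A_E^F)$, denotes the map
$z \mapsto \phi|_{E_{\pi(z)}}^{-1} \circ L_g \circ \phi|_{E_{\pi(z)}}(z)$ for $z \in \mathrm{Dom}(\phi)$.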


48.1.10 REMARK: Notation for left action by structure group elements via an entire chart. 
The map $L_{g,\phi}$ in Notation 48.1.9 is an automorphism $L_{g,\phi} : \mathrm{Dom}(\phi) \approx \mathrm{Dom}(\phi)$.


48.1.11 REMARK: Illustration of basic properties of fibre set automorphisms. 
Theorem 48.1.12 is illustrated in Figure 48.1.1, particularly the proof of part (ii). 


Figure 48.1.1 Fibre set automorphisms (diagram levels, top to bottom: fibre space F, fibre charts, total space E, projection map $\pi$, base space B)
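48.1.12 THEOREM: Basic properties of fibre set automorphisms.
The fibre set automorphisms $L^b_{g,\phi}$ for a (G, F) topological fibre bundle $(E, \pi, B, A_E^F)$ for $g \in G$, $\phi \in A_E^F$ and
$b \in U_\phi = \pi(\mathrm{Dom}(\phi))$ have the following properties, where $\beta_{b,\phi}$ denotes $\phi|_{E_b}$ for all $b \in B$ and $\phi \in A^F_{E,b}$.


(i) $\forall \phi \in A_E^F$, $\forall b \in U_\phi$, $\forall g_1, g_2 \in G$, $L^b_{g_1,\phi} \circ L^b_{g_2,\phi} = L^b_{g_1 g_2,\phi}$.


(ii) $\forall \phi_1, \phi_2 \in A_E^F$, $\forall b \in U_{\phi_1} \cap U_{\phi_2}$, $\forall g_1, g_2 \in G$, $L^b_{g_1,\phi_1} \circ L^b_{g_2,\phi_2} = L^b_{g_1 h_{12} g_2 h_{21},\phi_1} = L^b_{h_{21} g_1 h_{12} g_2,\phi_2}$, where
$h_{ij} = g_{\phi_i,\phi_j}(b)$ for $i, j \in \mathbb{N}_2$, and $g_{\phi_i,\phi_j}(b)$ as in Definition 47.6.5 (v).


(iii) $\forall \phi_1, \phi_2 \in A_E^F$, $\forall b \in U_{\phi_1} \cap U_{\phi_2}$, $L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}$ for $h_{12}$ as in (ii).

(iv) $\forall \phi_1, \phi_2, \phi_3 \in A_E^F$, $\forall b \in U_{\phi_1} \cap U_{\phi_2} \cap U_{\phi_3}$, $L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2} = L^b_{h_{31} h_{12} h_{13},\phi_3}$ for $h_{ij}$ as in (ii).

(v) $\forall \phi_1, \phi_2 \in A_E^F$, $\forall b \in U_{\phi_1} \cap U_{\phi_2}$, $\forall g \in G$, $L^b_{g,\phi_1} = L^b_{h_{21},\phi_2} L^b_{g,\phi_2} L^b_{h_{12},\phi_2} = L^b_{h_{21} g h_{12},\phi_2}$.


PROOF: Part (i) follows from $L^b_{g_1,\phi} L^b_{g_2,\phi} = (\beta_{b,\phi}^{-1} L_{g_1} \beta_{b,\phi})(\beta_{b,\phi}^{-1} L_{g_2} \beta_{b,\phi}) = \beta_{b,\phi}^{-1} L_{g_1} L_{g_2} \beta_{b,\phi} = \beta_{b,\phi}^{-1} L_{g_1 g_2} \beta_{b,\phi} = L^b_{g_1 g_2,\phi}$.
(For simplicity, juxtaposition is used instead of the “$\circ$” symbol to indicate function composition here.)


Part (ii) may be calculated as follows. (See Figure 48.1.1.)


$$L^b_{g_1,\phi_1} L^b_{g_2,\phi_2} = (\beta_{b,\phi_1}^{-1} L_{g_1} \beta_{b,\phi_1})(\beta_{b,\phi_2}^{-1} L_{g_2} \beta_{b,\phi_2})$$
$$= \beta_{b,\phi_1}^{-1} L_{g_1} L_{h_{12}} L_{g_2} \beta_{b,\phi_2} (\beta_{b,\phi_1}^{-1} \beta_{b,\phi_1})$$
$$= \beta_{b,\phi_1}^{-1} L_{g_1 h_{12} g_2} L_{h_{21}} \beta_{b,\phi_1}$$
$$= L^b_{g_1 h_{12} g_2 h_{21},\phi_1}.$$


The equality to $L^b_{h_{21} g_1 h_{12} g_2,\phi_2}$ follows similarly. (Note how the indices match up nicely.)


Part (iii) follows from $L^b_{h_{12},\phi_1} = \beta_{b,\phi_1}^{-1} L_{h_{12}} \beta_{b,\phi_1} = \beta_{b,\phi_1}^{-1} (\beta_{b,\phi_1} \beta_{b,\phi_2}^{-1}) \beta_{b,\phi_1} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}$. Similarly, $L^b_{h_{12},\phi_2} = \beta_{b,\phi_2}^{-1} (\beta_{b,\phi_1} \beta_{b,\phi_2}^{-1}) \beta_{b,\phi_2} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}$.

Part (iv) follows from the calculation:

$$L^b_{h_{12},\phi_1} = \beta_{b,\phi_1}^{-1} L_{h_{12}} \beta_{b,\phi_1}$$
$$= (\beta_{b,\phi_3}^{-1} \beta_{b,\phi_3}) \beta_{b,\phi_1}^{-1} L_{h_{12}} \beta_{b,\phi_1} (\beta_{b,\phi_3}^{-1} \beta_{b,\phi_3})$$
$$= \beta_{b,\phi_3}^{-1} L_{h_{31}} L_{h_{12}} L_{h_{13}} \beta_{b,\phi_3}$$
$$= \beta_{b,\phi_3}^{-1} L_{h_{31} h_{12} h_{13}} \beta_{b,\phi_3}$$
$$= L^b_{h_{31} h_{12} h_{13},\phi_3}.$$

The other half of part (iv) follows from part (iii). Part (v) follows from the calculation
$L^b_{g,\phi_1} = \beta_{b,\phi_1}^{-1} L_g \beta_{b,\phi_1} = \beta_{b,\phi_2}^{-1} (\beta_{b,\phi_2} \beta_{b,\phi_1}^{-1}) L_g (\beta_{b,\phi_1} \beta_{b,\phi_2}^{-1}) \beta_{b,\phi_2} = L^b_{h_{21},\phi_2} L^b_{g,\phi_2} L^b_{h_{12},\phi_2} = L^b_{h_{21} g h_{12},\phi_2}$.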


48.1.13 REMARK: Interpretation of some basic properties of fibre set automorphisms. 

Part (i) of Theorem 48.1.12 means that the composition rules for the automorphisms $L^b_{g,\phi}$ are the same as
for the left translation operators of the transformation group (G, F).

The general composition rule $L^b_{g_1,\phi_1} \circ L^b_{g_2,\phi_2} = L^b_{g_1 h_{12} g_2 h_{21},\phi_1} = L^b_{h_{21} g_1 h_{12} g_2,\phi_2}$ for different charts in part (ii)
involves some sort of conjugation of group elements with the coordinate transition group elements $h_{12}$
and $h_{21} = h_{12}^{-1}$.

Part (iii) in reverse gives a chart transition rule for elements of a fibre set. Thus $\beta_{b,\phi_1}^{-1} \beta_{b,\phi_2}(z) = L^b_{h_{21},\phi_1}(z)$
for all $z \in E_b$. Hence $\beta_{b,\phi_2}(z) = \beta_{b,\phi_1}(L^b_{h_{21},\phi_1}(z))$ for all $z \in E_b$. In terms of Notation 48.1.15, one may write


$$L^{b,b}_{e,\phi_2,\phi_1} = L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}.$$


The fact that $L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2}$ in part (iii) suggests that $L^b_{h_{12},\phi}$ is independent of the chart $\phi$. Part (iv)
shows that this is not true in general.


Part (v) is a general chart transition rule. This rule indicates a fundamental problem with the transfer of 
group actions from the fibre space F to the fibre sets Ey. The problem is that parallelism cannot be specified 
by associating a group element g € G with each point on a curve to indicate a left action on the fibre set 
at that point. The left actions $L^b_{g,\phi}$ on $E_b$ are chart-dependent. So the orientation of the fibre sets must be
indicated in general by a different group element for each chart. The purpose of principal fibre bundles is to 
try to remove this problem. 
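
The composition and chart transition rules in parts (i), (ii) and (v) of Theorem 48.1.12 can be checked numerically in a toy model. The following Python sketch is only an illustration under stated assumptions: the fibre set at one base point and the fibre space are both modelled as the plane, the structure group is taken to be O(2), the two restricted charts are modelled as invertible 2x2 matrices `beta1` and `beta2` chosen so that `beta1 * beta2^{-1}` lies in O(2), and all function and variable names are hypothetical.

```python
import math

# Toy model (an assumption for illustration): E_b = F = R^2, G = O(2),
# and the restricted charts are invertible 2x2 matrices.

def mul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inv(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / det, -a[0][1] / det), (-a[1][0] / det, a[0][0] / det))

def rot(angle):
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s), (s, c))

REFLECT = ((1.0, 0.0), (0.0, -1.0))

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def through_chart(g, beta):
    """The automorphism beta^{-1} L_g beta of the fibre set for the chart beta."""
    return mul(inv(beta), mul(g, beta))

# Chart 1 is an arbitrary invertible matrix.
beta1 = ((2.0, 1.0), (0.0, 1.0))
# The transition element h12 = beta1 * beta2^{-1} must lie in G,
# so chart 2 is defined as beta2 = h21 * beta1 with h21 = h12^{-1}.
h12 = mul(rot(0.7), REFLECT)
h21 = inv(h12)
beta2 = mul(h21, beta1)

g1, g2 = rot(0.3), mul(REFLECT, rot(1.1))

# Part (i): in a single chart the group elements simply multiply.
lhs_i = mul(through_chart(g1, beta1), through_chart(g2, beta1))
rule_i = close(lhs_i, through_chart(mul(g1, g2), beta1))

# Part (ii): with two charts the transition elements h12 and h21 appear.
lhs_ii = mul(through_chart(g1, beta1), through_chart(g2, beta2))
rule_ii_chart1 = close(lhs_ii, through_chart(mul(mul(g1, h12), mul(g2, h21)), beta1))
rule_ii_chart2 = close(lhs_ii, through_chart(mul(mul(h21, g1), mul(h12, g2)), beta2))

# Part (v): one automorphism has coordinates g in chart 1
# and h21 g h12 in chart 2.
g1_in_chart2 = mul(h21, mul(g1, h12))
rule_v = close(through_chart(g1, beta1), through_chart(g1_in_chart2, beta2))

print(rule_i, rule_ii_chart1, rule_ii_chart2, rule_v)
```

In this sketch the transition element contains a reflection, so the coordinate `g1_in_chart2` differs from `g1` even though both describe the same automorphism of the fibre set, which is the chart-dependence discussed above.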
48.1.14 DEFINITION: A (fibre set) isomorphism through the charts on a (G, F) fibre bundle $(E, \pi, B, A_E^F)$
is a map of the form $\beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1} : E_{b_1} \approx E_{b_2}$ for some $g \in G$, $b_1, b_2 \in B$, $\phi_1 \in A^F_{E,b_1}$ and $\phi_2 \in A^F_{E,b_2}$.


48.1.15 NOTATION: Left actions by structure group elements between two fibre sets.
$L^{b_2,b_1}_{g,\phi_2,\phi_1}$, for $b_1, b_2 \in B$, $g \in G$, $\phi_1 \in A^F_{E,b_1}$ and $\phi_2 \in A^F_{E,b_2}$, for a topological (G, F) fibre bundle $(E, \pi, B, A_E^F)$,
denotes the isomorphism through the charts $L^{b_2,b_1}_{g,\phi_2,\phi_1} = \beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1} : E_{b_1} \approx E_{b_2}$.


48.1.16 REMARK: Notation for isomorphisms between fibre sets at different points. 
Notation 48.1.15 is illustrated in Figure 48.1.2. 


Figure 48.1.2 Fibre set isomorphisms through the charts (diagram levels, top to bottom: fibre space, fibre charts, total space, projection map, base space B with points $b_1$, $b_2$, $b_3$)


If the fibre set isomorphisms $L^{b_2,b_1}_{g,\phi_2,\phi_1}$ for base point pairs $(b_1, b_2) \in \mathrm{Dom}(\phi_1) \times \mathrm{Dom}(\phi_2)$ are considered as the
maps $L_{g,\phi_2,\phi_1}$ for variable $(b_1, b_2)$, the result is not a single-valued function because each element of the fibre
set $E_{b_1}$ is mapped to a single element in every fibre set $E_{b_2}$ for $b_2 \in \mathrm{Dom}(\phi_2)$. Therefore the symbol $L_{g,\phi_2,\phi_1}$
is better thought of as an equivalence relation than as a function. For each choice of $g \in G$ and $\phi_1, \phi_2 \in A_E^F$,
the relation $L_{g,\phi_2,\phi_1}$ sets up an equivalence between one point in each fibre set of $\mathrm{Dom}(\phi_1)$ and one point
in each fibre set of $\mathrm{Dom}(\phi_2)$. In practice, the superscripts on the notations $L^b_{g,\phi}$ and $L^{b_2,b_1}_{g,\phi_2,\phi_1}$ are tedious to
write. So they may be omitted.


48.1.17 THEOREM: Basic properties of isomorphisms between fibre sets at different points. 
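The fibre set isomorphisms $L^{b_2,b_1}_{g,\phi_2,\phi_1}$ for a (G, F) topological fibre bundle $(E, \pi, B, A_E^F)$ for $g \in G$, $\phi_k \in A_E^F$
and $b_k \in U_{\phi_k} = \pi(\mathrm{Dom}(\phi_k))$ for k = 1, 2 have the following properties:

(i) $\forall \phi_1, \phi_2, \phi_3 \in A_E^F$, $\forall b_1 \in U_{\phi_1}$, $\forall b_2 \in U_{\phi_2}$, $\forall b_3 \in U_{\phi_3}$, $\forall g_1, g_2 \in G$, $L^{b_3,b_2}_{g_2,\phi_3,\phi_2} \circ L^{b_2,b_1}_{g_1,\phi_2,\phi_1} = L^{b_3,b_1}_{g_2 g_1,\phi_3,\phi_1}$;

(ii) $\forall \phi_1, \phi_2, \phi_3, \phi_4 \in A_E^F$, $\forall b_1 \in U_{\phi_1}$, $\forall b_2 \in U_{\phi_2} \cap U_{\phi_3}$, $\forall b_3 \in U_{\phi_4}$, $\forall g_1, g_2 \in G$, $L^{b_3,b_2}_{g_2,\phi_4,\phi_3} \circ L^{b_2,b_1}_{g_1,\phi_2,\phi_1} = L^{b_3,b_1}_{g_2 h_{32} g_1,\phi_4,\phi_1}$,
where $h_{ij} = g_{\phi_i,\phi_j}(b_2)$ for i, j = 2, 3, and $g_{\phi_i,\phi_j}(b)$ as in Definition 47.6.5 (v);

(iii) $\forall \phi_1, \phi_1', \phi_2, \phi_2' \in A_E^F$, $\forall b_1 \in U_{\phi_1} \cap U_{\phi_1'}$, $\forall b_2 \in U_{\phi_2} \cap U_{\phi_2'}$, $L^{b_2,b_1}_{g,\phi_2,\phi_1} = L^{b_2,b_1}_{g',\phi_2',\phi_1'}$, where
$g' = g_{\phi_2',\phi_2}(b_2) \cdot g \cdot g_{\phi_1,\phi_1'}(b_1)$;

(iv) $\forall \phi_1, \phi_2 \in A_E^F$, $\forall b_1 \in U_{\phi_1}$, $\forall b_2 \in U_{\phi_2}$, $L^{b_2,b_1}_{g,\phi_2,\phi_1} = \beta_{b_2,\phi_2}^{-1} L_g \beta_{b_1,\phi_1}$;

(v) $\forall \phi_1, \phi_2 \in A_E^F$, $\forall b_1 \in U_{\phi_1}$, $\forall b_2 \in U_{\phi_2}$, $L_g = \beta_{b_2,\phi_2} L^{b_2,b_1}_{g,\phi_2,\phi_1} \beta_{b_1,\phi_1}^{-1} : F \to F$.

PROOF: Part (i) follows by simple calculation (indicating composition by juxtaposition):

$$L^{b_3,b_2}_{g_2,\phi_3,\phi_2} L^{b_2,b_1}_{g_1,\phi_2,\phi_1} = (\beta_{b_3,\phi_3}^{-1} L_{g_2} \beta_{b_2,\phi_2})(\beta_{b_2,\phi_2}^{-1} L_{g_1} \beta_{b_1,\phi_1})$$
$$= \beta_{b_3,\phi_3}^{-1} L_{g_2 g_1} \beta_{b_1,\phi_1}$$
$$= L^{b_3,b_1}_{g_2 g_1,\phi_3,\phi_1}.$$

Part (ii) may be calculated similarly as follows:

$$L^{b_3,b_2}_{g_2,\phi_4,\phi_3} L^{b_2,b_1}_{g_1,\phi_2,\phi_1} = (\beta_{b_3,\phi_4}^{-1} L_{g_2} \beta_{b_2,\phi_3})(\beta_{b_2,\phi_2}^{-1} L_{g_1} \beta_{b_1,\phi_1})$$
$$= \beta_{b_3,\phi_4}^{-1} L_{g_2} L_{h_{32}} L_{g_1} \beta_{b_1,\phi_1}$$
$$= L^{b_3,b_1}_{g_2 h_{32} g_1,\phi_4,\phi_1}.$$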




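The rule for changes of fibre charts in part (iii) follows from the calculation:

$$L^{b_2,b_1}_{g,\phi_2,\phi_1} = \beta_{b_2,\phi_2}^{-1} L_g \beta_{b_1,\phi_1}$$
$$= \beta_{b_2,\phi_2'}^{-1} (\beta_{b_2,\phi_2'} \beta_{b_2,\phi_2}^{-1}) L_g (\beta_{b_1,\phi_1} \beta_{b_1,\phi_1'}^{-1}) \beta_{b_1,\phi_1'}$$
$$= \beta_{b_2,\phi_2'}^{-1} L_{g_{\phi_2',\phi_2}(b_2)} L_g L_{g_{\phi_1,\phi_1'}(b_1)} \beta_{b_1,\phi_1'}.$$

Parts (iv) and (v) follow trivially from Definition 48.1.14.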


48.1.18 REMARK: Uniqueness of structure group element which coordinatises a fibre set isomorphism. 

Theorem 48.1.17 (v) implies that any fibre set isomorphism $h : E_{b_1} \to E_{b_2}$ may be converted to a unique $g \in G$
by calculating $L_g = \beta_{b_2,\phi_2} \circ h \circ \beta_{b_1,\phi_1}^{-1}$. Any isomorphism $h = L^{b_2,b_1}_{g,\phi_2,\phi_1}$ may be regarded as a parallelism
relation between the fibre sets at $b_1$ and $b_2$ in B. To specify absolute (path-independent) parallelism,
these fibre set isomorphisms are completely adequate. But for pathwise (path-dependent) parallelism, the
definitions of Section 48.3 are required.


48.1.19 REMARK: Connections on fibre bundles require differentiability of fibre set isomorphisms. 
Looking ahead to differentiable fibre bundles, the strategy will be to try to differentiate the parameter g for
the map h with respect to the point $b_2$ as it varies along a curve. This derivative will be called a “connection”.
One may think of g as a function $g_{h,\phi_2,\phi_1}(b_2, b_1)$ which is a coordinatisation of the map $h(b_2, b_1) : E_{b_1} \to E_{b_2}$.
Thus $L_{g_{h,\phi_2,\phi_1}(b_2, b_1)} = \beta_{b_2,\phi_2}\, h(b_2, b_1)\, \beta_{b_1,\phi_1}^{-1}$. Looking even further ahead, one may also try to calculate the
exterior derivative of the derivative of $g_{h,\phi_2,\phi_1}(b_2, b_1)$ with respect to $b_2$ to obtain a measure of the curvature
of the parallelism. This is not totally unlike the situation in elementary calculus where the curvature of a
curve in flat 2-space $\mathbb{R}^2$ is related to the second derivative of the curve regarded as a graph.


48.2. Parallelism path classes 


48.2.1 REMARK: The difference between absolute parallelism and pathwise parallelism. 

Parallelism in flat spaces is absolute parallelism, which means a global equivalence relation between elements 
of fibre sets at each base point. By contrast, pathwise parallelism means parallelism which is absolute only
within each path. For all the points along each path, there is an equivalence relation between elements of 
fibre sets on the points of the path. (Self-intersections of paths are dealt with by treating multiple crossings 
of a single base point as different points.) 


48.2.2 REMARK: A pathwise parallelism specification requires a suitable path class. 

The fibre atlas on a topological fibre bundle uniquely determines the topology, which in turn determines 
which cross-sections along paths are continuous. A definition of parallelism, on the other hand, determines 
which continuous cross-sections along paths correspond to parallel translation. Since parallel translation 
must satisfy a group invariance property, the structure group plays a role in defining parallelism but not in 
defining the set of all continuous cross-sections. 


As discussed in Section 36.1, the terminology adopted for curves and paths in this book is that “curves” 
are maps $\gamma : I \to M$ for intervals I and topological spaces M, whereas “paths” are equivalence classes of
curves. Two curves are considered equivalent if they are related to each other by an increasing parameter 
homeomorphism. So a path is a set of curves which all start at the same point and take the same route to 
the end point, passing every point in the same order. Parallel transport depends only on the path, not on 
the particular choice of curve which represents the path. 


In the case of differentiable fibre bundles, parallelism is defined on rectifiable paths because a connection 
can only be integrated along a path if the tangent to the path exists almost everywhere. (See Sections 38.9 
and 50.7 for rectifiable paths.) It does not seem to be possible to generalise parallelism in a satisfactory 
way to all continuous paths in a topological space or topological manifold. (The set of all continuous paths 
in M is denoted $\mathcal{P}_0(M)$ in Notation 36.8.7.) This is not surprising because the transitivity property of
parallelism along paths gives parallelism the character of an integral, and integrals are not usually defined 
for completely general functions. In this case, the integral is the kind of direction-dependent path integral 
that appears in the Stokes theorem, which requires a locally rectifiable curve. Therefore it seems natural 
and unavoidable that the most general definition of pathwise parallelism on a topological fibre bundle will 
require the specification of a class $\mathcal{P}$ of paths on which parallelism may be defined. Since the path class
$\mathcal{P}$ will typically be defined in terms of a differentiable structure, it will not generally be definable in terms
of the topological structure alone. This is a kind of ex machina path class which requires some external
structure for its definition, and which therefore must be specified as an ad hoc set which has certain closure
and continuity properties. The paths in the class $\mathcal{P}$ could be thought of as “wormholes” through which
parallelism is carried between the fibre sets at different points in the base space. This is illustrated in 
Figure 48.2.1. 


Figure 48.2.1 Parallelism “wormholes” (paths carrying fibre set orientation information), drawn from a point labelled “Start here” to a point labelled “End here”


48.2.3 REMARK: Definitions of parallelism typically arise from physics or geometry. 

It is not guaranteed that every kind of transformation of fibre sets along paths in a fibre bundle will satisfy 
the criteria for a definition of parallelism. In a physical system, one may imagine that one sends test particles 
out into the state space of the system to measure the transformations that occur in the fibre sets along the 
paths of the particles. Only if these transformations satisfy a definition such as Definition 48.3.2 can the 
transformations be thought of as a kind of parallelism. In other words, a parallelism is something that one 
must discover. It turns out that many mathematical models, in particular all Riemannian metric spaces, 
have a natural and useful parallelism. If the parallelism can be differentiated, then a connection is defined. 
If the connection has a well-defined exterior derivative, then the curvature may be defined, and curvature is 
what makes differential geometry different to flat-space geometry. 


48.2.4 REMARK: Parallelism is very fundamental in physics. 

Parallelism along paths is essential in physics for the support of polarisation of light and conservation of 
momentum. Since Mach’s principle (a very reasonable principle) says that momentum must be related to 
the rest of the matter of the universe (and inertial frames just “coincidentally” are those which have constant 
velocity relative to the “fixed stars”), one might ask if there is some causal relation between the “fixed stars” 
and the parallelism or affine connection on physical space. Even though there is no luminiferous aether in 
the 19th century sense, there still seems to be some sort of structure in the vacuum which defines parallelism 
so that momentum and polarisation are meaningful. 


Space seems to require a connection in addition to mere differentiable structure, and one might reasonably 
ask what this structure is composed of. It seems to obey equations which are related to gravity, but it is 
not clear what the “parallelism wormholes” are. It seems almost as if space has tramlines laid down for 
matter and energy to flow along, and the tramlines have some sort of “roll control” which maintains parallel 
transport. 


One could go further and ask the more fundamental question of why physical space (or space-time) seems to 
have a differentiable structure. How are nearby points “glued” together to make a smooth manifold? How 
do nearby points “know” that they are near each other? How does space know that it must let light pass through
it at the speed of light and not some other speed? Why doesn’t light travel in a randomly crooked path or
go round in circles? The anthropic principle does not tell us how these things happen. It only tells
us that we would not be observing the world if it were otherwise. 
48.2.5 REMARK: Determination of minimum conditions for path classes for parallelism definitions.
Definitions 48.2.6 and 48.3.2 are, presumably, non-standard. It is reasonable to expect that a parallelism path
class will be closed under concatenation, restriction and reversal. Closure under continuous reparametrisation
is taken care of by the definition of a path as an equivalence class of curves.


A class $\mathcal{P}$ of paths in a base space B for defining parallelism on a fibre bundle $(E, \pi, B, A_E^F)$ must be a
partition of some set $\mathcal{C}$ of curves in B such that the curves in any path are path-equivalent according to
Definition 36.5.3. In other words, every path in $\mathcal{P}$ must be a non-empty set of path-equivalent curves in $\mathcal{C}$
and the paths must be pairwise disjoint. In particular, $\mathcal{C} = \bigcup \mathcal{P}$.


48.2.6 DEFINITION: A parallelism path class for a topological fibre bundle $(E, \pi, B, A_E^F)$ is a set $\mathcal{P}$ of paths
in B such that


(i) All constant paths in B are in $\mathcal{P}$; [idempotence]
(ii) For all paths $Q = [\gamma]_0 \in \mathcal{P}$, the reverse $-Q = [-\gamma]_0$ is in $\mathcal{P}$; [symmetry]
(iii) For all $Q_1, Q_2 \in \mathcal{P}$ with $T(Q_1) = S(Q_2)$, the concatenation of $Q_1$ with $Q_2$ is in $\mathcal{P}$. [transitivity]


A parallelism curve class is the set of curves $\mathcal{C} = \bigcup \mathcal{P}$ in a parallelism path class $\mathcal{P}$.


48.2.7 REMARK: Examples of path classes suitable for defining parallelism. 

Examples of suitable parallelism path classes are the set of rectifiable paths in a Lipschitz manifold, the set 
of piecewise $C^k$ paths in a $C^k$ differentiable manifold for $k \geq 1$, and the set of piecewise linear paths in an
affine space. 
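
For the last of these examples, the closure conditions of Definition 48.2.6 can be made concrete. The following Python sketch is a hedged illustration rather than part of the formal development. It assumes that a piecewise linear path in an affine space is adequately represented by the tuple of its vertices in traversal order, which stands in for the equivalence class of its parametrisations. Redundant collinear vertices are not merged, and all helper names are hypothetical.

```python
# Assumed representation: a piecewise linear path is the tuple of its vertices
# in traversal order. Consecutive repeated vertices are dropped, since pausing
# at a point does not change the path.

def normalise(vertices):
    out = []
    for v in vertices:
        v = tuple(v)
        if not out or out[-1] != v:
            out.append(v)
    return tuple(out)

def constant_path(point):
    """Condition (i): the constant path at a point."""
    return normalise([point])

def source(path):
    return path[0]

def target(path):
    return path[-1]

def reverse(path):
    """Condition (ii): the same route traversed backwards."""
    return tuple(reversed(path))

def concatenate(q1, q2):
    """Condition (iii): defined only when the target of q1 is the source of q2."""
    if target(q1) != source(q2):
        raise ValueError("target of first path must equal source of second")
    return normalise(q1 + q2)

q1 = normalise([(0, 0), (1, 0), (1, 1)])
q2 = normalise([(1, 1), (3, 1)])
q3 = concatenate(q1, q2)               # again a tuple of vertices
loop = concatenate(q3, reverse(q3))    # out and back along the same route
print(q3)
print(source(loop) == target(loop))
```

Each operation returns another vertex tuple, so the set of all such tuples is closed under the three operations, which is the property that the definition asks of a path class.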


48.3. Pathwise parallelism on topological fibre bundles 


48.3.1 REMARK: A general definition of parallelism on topological fibre bundles. 

Definition 48.3.2 is necessarily a little convoluted. In plain language, it means that a pathwise parallelism 
is a set of structure-preserving maps between the fibre sets of pairs of points of curves in the specified curve 
class $\mathcal{C}$. These parallelism maps are equivalent for curves which are path-equivalent, and they obey some
basic rules of transitivity and symmetry. 


Although general parallelism is not an absolute (i.e. path-independent) map between fibre sets, the restric- 
tion to paths is absolute. Within a path, every point in every fibre set has a unique association with a 
point in each other fibre set. (Recall that intersection points of paths are regarded as different points.) 
Therefore parallelism along a path may be formalised as an equivalence relation rather than as the maps of 
Definition 48.3.2. The functional representation is probably better for such tasks as differentiation though. 


See Notation 48.1.5 for the isomorphism sets $\mathrm{Iso}_G(E_p, E_q)$. Figure 48.3.1 illustrates some of the structures
involved in the pathwise parallelism in Definition 48.3.2.


Figure 48.3.1 Pathwise parallelism structure (showing parameters $s, t \in \mathbb{R}$, a point $q = \gamma(t)$ in $\mathrm{Range}(\gamma) \subseteq B$, and the set $\mathrm{Iso}_G(E_p, E_q)$)
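48.3.2 DEFINITION: A (topological) (pathwise) parallelism on a parallelism path class $\mathcal{P}$ for a topological
(G, F) fibre bundle $(E, \pi, B, A_E^F)$ is a map


$$\Theta : \mathcal{C} \to \bigcup_{\gamma \in \mathcal{C}} \Bigl( I_\gamma \times I_\gamma \to \bigcup_{p,q \in B} \mathrm{Iso}_G(E_p, E_q) \Bigr),$$


where $\mathcal{C} = \bigcup \mathcal{P}$ and $I_\gamma = \mathrm{Dom}(\gamma)$ for $\gamma \in \mathcal{C}$, which satisfies the following:
(i) $\forall \gamma \in \mathcal{C}$, $\mathrm{Dom}(\Theta^\gamma) = I_\gamma \times I_\gamma$;
(ii) $\forall \gamma \in \mathcal{C}$, $\forall s, t \in I_\gamma$, $\Theta^\gamma_{s,t} \in \mathrm{Iso}_G(E_{\gamma(s)}, E_{\gamma(t)})$;
(iii) $\forall Q \in \mathcal{P}$, $\forall \gamma \in Q$, $\forall I \subseteq \mathbb{R}$, $\forall \beta \in C(I, I_\gamma)$, $\forall s, t \in I$, $(\gamma \circ \beta \in Q \Rightarrow \Theta^{\gamma \circ \beta}_{s,t} = \Theta^\gamma_{\beta(s),\beta(t)})$; [parametrisation independence]
(iv) $\forall \gamma \in \mathcal{C}$, $\forall s, t, u \in I_\gamma$, $\Theta^\gamma_{t,u} \circ \Theta^\gamma_{s,t} = \Theta^\gamma_{s,u}$; [transitivity]
(v) $\forall \gamma \in \mathcal{C}$, $\forall s, t \in I_\gamma$, $\Theta^{-\gamma}_{-s,-t} = \Theta^\gamma_{s,t}$; [reversibility]
(vi) $\forall \gamma_1, \gamma_2 \in \mathcal{C}$, $(\gamma_1 \subseteq \gamma_2 \Rightarrow \Theta^{\gamma_1} \subseteq \Theta^{\gamma_2})$; [monotonicity]
(vii) $\forall \gamma \in \mathcal{C}$, $\forall \phi_1, \phi_2 \in A_E^F$, $g^\gamma_{\phi_2,\phi_1}$ is continuous, where $g^\gamma_{\phi_2,\phi_1} : I_1 \times I_2 \to G$ with $I_k = I_\gamma \cap \gamma^{-1}(U_{\phi_k})$ for
k = 1, 2 is defined by $\phi_2 \circ \Theta^\gamma_{s,t} = L_{g^\gamma_{\phi_2,\phi_1}(s,t)} \circ \phi_1|_{E_{\gamma(s)}}$ for all $\phi_1, \phi_2 \in A_E^F$ and $(s, t) \in I_1 \times I_2$. [continuity]


The notation $\Theta^\gamma_{s,t}$ means $\Theta(\gamma)(s,t)$, and $\Theta^\gamma$ means $\Theta(\gamma)$.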


48.3.3 REMARK: Consequences of parametrisation-independence of parallelism. 
Definition 48.3.2 (iii) has the consequence that the parallelism map is the identity map along constant
stretches of curves. That is, if $\beta : J \to I_\gamma$ is constant on [a, b], then $\Theta^{\gamma \circ \beta}_{s,t} = \mathrm{id}_{E_{\gamma(\beta(s))}}$ for all $s, t \in [a, b]$.


If (iii) is applied twice in the case of a curve equivalence $\gamma_1 \circ \beta_1 = \gamma_2 \circ \beta_2 = \gamma_3$ with $\gamma_1, \gamma_2, \gamma_3 \in Q$, the
result is $\Theta^{\gamma_1}_{\beta_1(s),\beta_1(t)} = \Theta^{\gamma_2}_{\beta_2(s),\beta_2(t)} = \Theta^{\gamma_3}_{s,t}$. This means that the definition of parallelism is independent of the
curve chosen to represent a path. So parallelism depends only on the path, not on the parametrisation.


48.3.4 REMARK: Consequences of the transitivity rule for parallelism. 

The transitivity rule, Definition 48.3.2 (iv), with u = t implies an idempotence rule, namely that $\Theta^\gamma_{t,t} = \mathrm{id}_{E_{\gamma(t)}}$
for any $\gamma \in \mathcal{C}$ and $t \in I_\gamma$. Similarly, (iv) implies the rule $\Theta^\gamma_{t,s} = (\Theta^\gamma_{s,t})^{-1}$. These look like semigroup
properties, but the maps $\Theta^\gamma_{s,t}$ are only isomorphisms, not automorphisms.


Condition (iv) in Definition 48.3.2 is specified by Willmore [42], page 206, as two separate rules, one corre- 
sponding to s < t < u, the other corresponding to s = u. 


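If the fibre set isomorphisms $\Theta^\gamma_{s,t}$ are known for a fixed $s \in I_\gamma$, then the isomorphisms for all other pairs
may be calculated, because rule (iv) gives $\Theta^\gamma_{t,u} = \Theta^\gamma_{s,u} \circ (\Theta^\gamma_{s,t})^{-1}$ for all $t, u \in I_\gamma$.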


48.3.5 REMARK:  Parallelism is absolute along paths. 

The reversibility rule, Definition 48.3.2 (v), together with the transitivity rule (iv), implies that the parallelism 
on a path is absolute. That is, it doesn't matter how a curve gets from one point of the path to another, it will 
always give the same parallelism from one point on the path to another. (When there are self-intersections, 
the different traversals through the same point are regarded as different points of the path.) In particular, 
if a curve starts at a point on a path and comes back to the same point, the result is the identity map. So 
the parallelism is “flat” because there are no closed paths for which the parallelism is a non-identity fibre 
set map. 


This absolute parallelism implies that the function $\Theta^\gamma$ may be replaced with a simple equivalence relation
on the fibre sets over the base points of the curve y. 


48.3.6 EXAMPLE: Showing that reversibility does not follow from the other parallelism conditions. 

Condition (v) for Definition 48.3.2 does not follow from the other conditions. As a counterexample, consider
a trivial (G, F) fibre bundle $(E, \pi, B, A_E^F)$ with $G = O(2)$, $F = \mathbb{R}^2$, $B = \mathbb{R}$, $E = B \times F$, $\pi : (x, z) \mapsto x$, and
$A_E^F = \{\phi\}$ with $\phi : (x, z) \mapsto z$. Define $\mathcal{C}$ to be the set of rectifiable curves $\gamma : I \to B$. Define a map $\Theta$ for
this fibre bundle by $\Theta^\gamma_{s,t} = R(\alpha^\gamma_{s,t})$, where $R(\alpha^\gamma_{s,t})$ denotes rotation of the fibre sets (through the chart) by
angle $\alpha^\gamma_{s,t} = \int_s^t |\gamma'(u)|\,du$ for $\gamma \in \mathcal{C}$. The interested reader may verify that all of Definition 48.3.2 except
condition (v) is satisfied by $\Theta$. (It’s about time the other readers did some work too. The interested reader
can’t be expected to do everything!) If the rotation angles are replaced with $\alpha^\gamma_{s,t} = \int_s^t \gamma'(u)\,du = \gamma(t) - \gamma(s)$,
then $\Theta$ satisfies all of the conditions of Definition 48.3.2. These examples are illustrated in Figure 48.3.2 for
$\gamma : u \mapsto \sin u$, $s = 0$ and $t = \pi$.
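
The two choices of rotation angle in this example can be compared numerically. The Python sketch below is an illustration under stated assumptions, not a proof: the integral is approximated by a midpoint rule, the derivative by a central difference, the reversed curve is assumed to be the map u -> gamma(-u), and reversibility is read as the requirement that the reversed curve gives the same angle between the same two points. All function names are hypothetical.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint rule for the integral of f from a to b (signed if b < a)."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def derivative(curve, u, eps=1e-6):
    """Central difference approximation to the derivative of the curve."""
    return (curve(u + eps) - curve(u - eps)) / (2 * eps)

def angle_abs(curve, s, t):
    """First choice: integral of |gamma'(u)| from s to t."""
    return integrate(lambda u: abs(derivative(curve, u)), s, t)

def angle_signed(curve, s, t):
    """Second choice: integral of gamma'(u), which equals gamma(t) - gamma(s)."""
    return curve(t) - curve(s)

gamma = math.sin

def reversed_gamma(u):
    # Assumed convention for the reversed curve: u -> gamma(-u).
    return gamma(-u)

s, t = 0.0, math.pi
forward_abs = angle_abs(gamma, s, t)
backward_abs = angle_abs(reversed_gamma, -s, -t)
forward_signed = angle_signed(gamma, s, t)
backward_signed = angle_signed(reversed_gamma, -s, -t)

print(round(forward_abs, 6), round(backward_abs, 6))
print(round(forward_signed, 6), round(backward_signed, 6))
```

Under these assumptions the first choice gives an angle close to 2 along the curve and close to -2 along the reversed curve, so the two rotations differ, whereas the second choice gives the same angle in both cases. Both choices are additive in the parameter, which corresponds to the transitivity rule.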


Figure 48.3.2 Definition of parallelism without and with reversibility (two panels, one for the angle $\int |\gamma'(u)|\,du$ and one for the angle $\int \gamma'(u)\,du$)


48.3.7 REMARK: Consequences of the monotonicity (subsets of curves) condition for parallelism. 
By Definition 48.3.2 (vi), if $\gamma_1$ is a subcurve of $\gamma_2$, then $\Theta^{\gamma_1}$ is a restriction of $\Theta^{\gamma_2}$. So $\Theta^{\gamma_1} = \Theta^{\gamma_2}|_{I_{\gamma_1} \times I_{\gamma_1}}$.
If two curves $\gamma_1$ and $\gamma_2$ have a common portion $\gamma_0$ so that $\gamma_0 \subseteq \gamma_1$ and $\gamma_0 \subseteq \gamma_2$, then condition (vi) implies
that the parallelism of $\gamma_1$ and $\gamma_2$ will be the same along the common portion $\gamma_0$. This means that any
two curves passing through the same points will experience the same parallelism transformation, no matter
how the curves differ elsewhere. This may be thought of as a “memoryless” property. The transformation
experienced by a test particle moving along any portion of a curve is independent of anything that happens
before (or after) it passes through that portion.


This can be looked at in reverse. The opposite of a function restriction is a function extension. If a curve
$\gamma_3$ is the concatenation of curves $\gamma_1$ and $\gamma_2$, then $\gamma_1 \subseteq \gamma_3$ and $\gamma_2 \subseteq \gamma_3$. So both curves are subcurves of $\gamma_3$.
(It follows by Definition 48.2.6 that $\gamma_3 \in \mathcal{C}$.) Therefore the parallelism map $\Theta^{\gamma_3}_{s,t}$ for $s \in I_{\gamma_1}$ and $t \in I_{\gamma_2}$ is
obtained as the composition of $\Theta^{\gamma_1}_{s,b_1}$ and $\Theta^{\gamma_2}_{a_2,t}$, where $I_{\gamma_k} = [a_k, b_k]$ for k = 1, 2. Thus $\Theta^{\gamma_3}_{s,t} = \Theta^{\gamma_2}_{a_2,t} \circ \Theta^{\gamma_1}_{s,b_1}$.
So Definition 48.3.2 (vi) may be thought of as a concatenation rule.


48.3.8 REMARK: Chart-dependence of structure group elements which coordinatise parallelism. 

The group element $g^\gamma_{\phi_2,\phi_1}(s, t)$ in Definition 48.3.2 (vii) generally depends on the fibre charts $\phi_1$ and $\phi_2$. If
$b = \gamma(s) = \gamma(t)$, one may choose a single chart $\phi = \phi_1 = \phi_2$. For such a closed curve portion, $\Theta^\gamma_{s,t} = L^b_{g,\phi} \in \mathrm{Aut}_G(E_{\gamma(s)})$
with $g = g^\gamma_{\phi,\phi}(s, t)$. Unfortunately, by Theorem 48.1.12 (v), the group element g depends on $\phi$.


A simple example of this is the tangent bundle of the sphere $S^2$ with the orthogonal group $G = O(2)$ as
the structure group. The parallel transport around a closed curve (with the standard parallelism definition)
results in a rotation of the tangent space at the common initial and terminal point through some angle, $\alpha \in \mathbb{R}$ say.
This angle is the same for all charts which have the same orientation, but the rotation angle is $-\alpha$ for a
chart with the opposite orientation.
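
The sign change in this example is a matter of conjugation in O(2), which can be checked directly. The following Python sketch is a minimal illustration which assumes that O(2) is represented by 2x2 orthogonal matrices and that the chart transition acts on the group element by the conjugation rule of Theorem 48.1.12 (v). The particular angles and all function names are hypothetical.

```python
import math

# Toy model (an assumption): elements of O(2) are 2x2 orthogonal matrices
# stored as nested tuples.

def mul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transpose(a):
    """For an orthogonal matrix the transpose is the inverse."""
    return ((a[0][0], a[1][0]), (a[0][1], a[1][1]))

def rot(angle):
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s), (s, c))

REFLECT = ((1.0, 0.0), (0.0, -1.0))  # an orientation-reversing element of O(2)

def in_second_chart(g, h21):
    """Group element h21 * g * h21^{-1} seen through a second chart."""
    return mul(h21, mul(g, transpose(h21)))

def angle_of(r):
    """Rotation angle of a rotation matrix, in the interval (-pi, pi]."""
    return math.atan2(r[1][0], r[0][0])

alpha = 0.4
g = rot(alpha)  # rotation found through the first chart
same = in_second_chart(g, rot(1.3))                   # same orientation
flipped = in_second_chart(g, mul(rot(1.3), REFLECT))  # opposite orientation
print(round(angle_of(same), 6), round(angle_of(flipped), 6))
```

A transition by a rotation leaves the angle unchanged because rotations of the plane commute, while a transition containing a reflection replaces the angle by its negative.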


48.4. Associated parallelism 


48.4.1 REMARK:  Porting parallelism between associated fibre bundles. 

There is a logical progression which leads to porting of parallelism between associated fibre bundles. The 
logical progression is: (1) porting right action on G to left action on F; (2) porting base-point-to-base-point 
parallelism between associated fibre bundles (ignoring curves); (3) porting pathwise or curvewise parallelism 
as in Section 48.4. 


Definition 48.4.2 shows how parallelism can be “ported” between a topological (G, F) fibre bundle and an 
associated topological (G, F) fibre bundle. This concept is illustrated in Figure 48.4.1. 


This is the real reason for defining associated fibre bundles. The idea is to achieve economy of specifications of
parallelism by specifying it just once for one fibre bundle and then copying it to all associated fibre bundles. 
The prime example of this is where the parallelism on the tangent bundle of a differentiable manifold is 
re-used for all of the different kinds of tensor bundles on that manifold. 
Figure 48.4.1 Porting parallelism between associated fibre bundles


48.4.2 DEFINITION: The associated topological pathwise parallelism from a topological (G, F) fibre bundle 
€ = (E,7, B, AZ) to an associated topological (G, F) fibre bundle € = (E, 7, B, AP) for a given topological 
pathwise parallelism O on a parallelism curve class € is the topological pathwise parallelism Õ on which 
is defined by 


Vy € €, Vs,t € Ly, Vy € AE), Vós € AE Vg EG, 


y _ po (os) y y) a(s) 
Ol = Lye € One Lid 


E,y(t)? 
(48.4.1) 


where dài = h(¢)) and d» = h(ó2) are the charts for € which are associated with the charts $1 and ¢2 
respectively for € via a topological fibre bundle association h : AE — AF. 


48.4.3 REMARK: Associated parallelism uses the same structure group elements for both fibre bundles. 
Definition 48.4.2 is illustrated in Figure 48.4.2. The most important thing to focus on in this cluttered 
diagram is the equality g = g. This means that for matching (i.e. associated) charts, the parallelism is 
"coordinatised" by the same group element, regarding the fibre charts as a kind of coordinatisation of the 
space of all permitted isomorphisms of the fibre space. 


Figure 48.4.2 Associated topological pathwise parallelism
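
Recall from Notation 48.1.15 that $\Theta^{\gamma(s),\gamma(t)}_{g,\phi_1,\phi_2} = \phi_{2,\gamma(t)}^{-1} \circ L_g \circ \phi_{1,\gamma(s)} : E_{\gamma(s)} \to E_{\gamma(t)}$ and
$\tilde\Theta^{\gamma(s),\gamma(t)}_{g,\tilde\phi_1,\tilde\phi_2} = \tilde\phi_{2,\gamma(t)}^{-1} \circ \tilde L_g \circ \tilde\phi_{1,\gamma(s)} : \tilde E_{\gamma(s)} \to \tilde E_{\gamma(t)}$,
where $\phi_{1,\gamma(s)}$ denotes $\phi_1|_{E_{\gamma(s)}}$ and so forth.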


Using the notation of Definition 48.3.2 (vii), the parallelism association in expression (48.4.1) may be formulated
as the equation $\tilde g^\gamma_{\tilde\phi_2,\tilde\phi_1} = g^\gamma_{\phi_2,\phi_1}$ for all associated chart pairs $(\phi_1, \tilde\phi_1)$ and $(\phi_2, \tilde\phi_2)$. The group elements
$g^\gamma_{\phi_2,\phi_1}(s,t)$ and $\tilde g^\gamma_{\tilde\phi_2,\tilde\phi_1}(s,t)$ may be thought of as the coordinates in G of the fibre set isomorphisms $\Theta^\gamma_{s,t}$ and
$\tilde\Theta^\gamma_{s,t}$ respectively with respect to the corresponding fibre charts.


48.4.4 REMARK: Associated parallelism needs most components of the two fibre bundles to be the same. 
There are many things which are the same in the two associated fibre bundles in Definition 48.4.2. These 
include the structure group G, the base space B, and the curve class $\mathcal{C}$. Thus both parallelisms $\Theta$ and $\tilde\Theta$
are defined for the same values of $\gamma$, s and t, and their values are left translations through the charts by
the same group element $g \in G$ for each curve $\gamma \in \mathcal{C}$. The difference is that these left translations are for
different fibre spaces. 
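
As a hedged illustration of this remark, consider the case mentioned earlier in this section, where the parallelism on a tangent bundle is re-used for another tensor bundle, here the covector bundle. The concrete choices below (structure group $GL(n)$, fibre space $\mathbb{R}^n$ of column vectors, dual fibre space of row vectors, and the two actions $L_g$ and $\tilde L_g$) are assumptions made for this sketch. They are not taken from Definition 48.4.2.

```latex
% Assumed setting (illustrative only):
%   G = GL(n),  F = R^n (column vectors),  \tilde F = (R^n)^* (row vectors).
% The same group element g acts on the two different fibre spaces by
\[
  L_g v = g\,v \quad (v \in \mathbb{R}^n),
  \qquad
  \tilde L_g w = w\,g^{-1} \quad (w \in (\mathbb{R}^n)^*).
\]
% Both are left actions, since
\[
  \tilde L_{gh} w = w\,(gh)^{-1} = (w\,h^{-1})\,g^{-1} = \tilde L_g \tilde L_h w .
\]
% Transporting vector and covector components with the same g preserves the pairing:
\[
  (\tilde L_g w)(L_g v) = w\,g^{-1} g\,v = w\,v .
\]
```

In this sketch the single group element g coordinatises both transports, while the maps themselves act on different fibre spaces.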


48.4.5 REMARK: The chart-independence of associated parallelism. 

It doesn’t seem to be possible to define associated parallelism without the use of fibre charts because fibre 
bundle associations can only be defined in terms of fibre charts. Therefore, as with all definitions which 
are constructed with charts, it must be verified that the definition is chart-independent. This is done in 
Theorem 48.4.6. 


48.4.6 THEOREM:  Fibre-chart-independence of associated pathwise parallelism definition. 
The associated parallelism in Definition 48.4.2 is chart-independent. 


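PROOF: It must be shown that the isomorphism $\tilde\Theta^\gamma_{s,t} = \tilde\Theta^{\gamma(s),\gamma(t)}_{g,\tilde\phi_1,\tilde\phi_2}$ with $g = g^\gamma_{\phi_2,\phi_1}(s,t)$ is independent of
the choice of fibre charts. The original parallelism $\Theta$ is automatically chart-independent because the group
elements g are defined in terms of $\Theta$ rather than the other way around. Chart-independence for $\Theta$ means that
the group elements $g^\gamma_{\phi_2,\phi_1}(s,t) \in G$ obey the rule
$g^\gamma_{\phi_2',\phi_1'}(s,t) = g_{\phi_2',\phi_2}(\gamma(t))\, g^\gamma_{\phi_2,\phi_1}(s,t)\, g_{\phi_1,\phi_1'}(\gamma(s))$, where the
functions $g_{\phi',\phi} : U_\phi \cap U_{\phi'} \to G$ denote the fibre chart transition functions for charts $\phi, \phi' \in A_E^F$. This follows
from the rules for change of fibre charts for the fibre set isomorphisms $\Theta^{b_1,b_2}_{g,\phi_1,\phi_2}$. (See Theorem 48.1.17 (iii).)
The same rule must be shown to apply for $\tilde\Theta$; that is, it must be shown that

\[
\tilde g^\gamma_{\tilde\phi_2',\tilde\phi_1'}(s,t) = \tilde g_{\tilde\phi_2',\tilde\phi_2}(\gamma(t))\, \tilde g^\gamma_{\tilde\phi_2,\tilde\phi_1}(s,t)\, \tilde g_{\tilde\phi_1,\tilde\phi_1'}(\gamma(s)).
\]

By Definition 48.4.2, $\tilde g^\gamma_{\tilde\phi_2,\tilde\phi_1}(s,t) = g^\gamma_{\phi_2,\phi_1}(s,t)$ and $\tilde g^\gamma_{\tilde\phi_2',\tilde\phi_1'}(s,t) = g^\gamma_{\phi_2',\phi_1'}(s,t)$. From Definition 47.9.5 (ii) for
fibre bundle associations, it follows that $\tilde g_{\tilde\phi_2',\tilde\phi_2}(\gamma(t)) = g_{\phi_2',\phi_2}(\gamma(t))$ and $\tilde g_{\tilde\phi_1,\tilde\phi_1'}(\gamma(s)) = g_{\phi_1,\phi_1'}(\gamma(s))$.
Substituting these four equalities into the rule for $\Theta$ gives the required rule for $\tilde\Theta$. So
everything works out nicely.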
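
The chart-change rule used in the proof of Theorem 48.4.6 can be checked by a short calculation. The sketch below assumes that the chart expression of the parallelism has the form "inverse chart, left translation, chart" on fibre sets, and that a change of fibre chart acts on the fibre over a base point b as a left translation by the transition function value. Both conventions are inferred from the surrounding text rather than quoted from Notation 48.1.15 or Theorem 48.1.17, and the primed symbols are illustrative.

```latex
% Assumed conventions (not quoted from the source):
%   \Theta^\gamma_{s,t} = \phi_2^{-1} \circ L_g \circ \phi_1 on the fibre over \gamma(s),
%       with g = g^\gamma_{\phi_2,\phi_1}(s,t);
%   \phi' \circ \phi^{-1} = L_{g_{\phi',\phi}(b)} on the fibre over b.
\begin{align*}
  L_{g'} &= \phi_2' \circ \Theta^\gamma_{s,t} \circ (\phi_1')^{-1} \\
         &= (\phi_2' \circ \phi_2^{-1}) \circ L_g \circ (\phi_1 \circ (\phi_1')^{-1}) \\
         &= L_{g_{\phi_2',\phi_2}(\gamma(t))} \circ L_g \circ L_{g_{\phi_1,\phi_1'}(\gamma(s))} .
\end{align*}
% For an effective left action, L_a \circ L_b = L_{ab} then gives
\[
  g^\gamma_{\phi_2',\phi_1'}(s,t)
  = g_{\phi_2',\phi_2}(\gamma(t))\; g^\gamma_{\phi_2,\phi_1}(s,t)\; g_{\phi_1,\phi_1'}(\gamma(s)) .
\]
```

The same calculation applies verbatim to the associated bundle with tildes on every symbol, which is why equality of the transition functions and of the group elements is all that the proof needs.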


Part IV


Differential geometry 


The informal outlines at the beginning of Parts I, II, III and IV of this book are not intended to be read. 
However, these outlines could possibly be of some value as an adjunct to the table of contents and index. 
The reader is recommended to skip immediately to Chapter 49. 


PART IV: Differential geometry 




Differential geometry includes locally Cartesian spaces and any kind of construction which incorporates 
a locally Cartesian space in its structure. Thus tensor algebra and topological fibre bundles, for example, 
are amongst the “preliminary topics” which are presented in Parts II and III, even though they are often 
considered to be core topics of differential geometry.


In terms of the layer model in Section 1.1, Chapters 49-50 are in (topological) layer 1, Chapters 51-61
are in (differentiable) layer 2, Chapters 67-72 are in (connection) layer 3, and Chapters 73-75 are in 
(metric) layer 4. 


Chapters 62-66 (Lie groups, Lie transformation groups and differentiable fibre bundles) are in a kind of 
intermediate “layer 2.5” which lays the foundations for layer 3.


Most of the material in Chapters 49-61 is somewhat dispiriting in the sense that it is, in many ways,
not much more than “differential calculus with charts”. In other words, it has very little geometric
character. The Lie groups and differentiable fibre bundles in Chapters 62-66 do have slightly more
geometric flavour, but still lack real geometric substance.


After some 546 pages of dry “calculus in charts”, the real geometry commences in Chapter 67. Then there
are approximately 208 pages of “true geometry”, which are concerned with parallelism and distance. 


The reason for the very lengthy “dry” lead-up to the truly geometric chapters of this book (which 
are the more satisfying chapters if one has some enthusiasm for the geometry of curved space) is the 
guiding principle of systematic stratification of concepts. Consequently all of the dry “calculus in charts”
chapters necessarily precede the presentation of connections and metrics. Most textbooks present the 
geometric concepts with the calculus concepts interpolated as they are required. By insisting on a 
strictly layered presentation order, this book has concentrated the dull, dry chapters at the beginning, 
with the risk that the reader may never arrive at the more geometric material. 


CHAPTER 49: Locally Cartesian spaces 


Chapter 49 presents locally Cartesian spaces, which enable geometry to progress from spaces which are
topologically globally Cartesian like $\mathbb{R}^n$ to spaces which are only locally Cartesian. Thus every point
must have a neighbourhood which is homeomorphic to an open subset of a Cartesian space $\mathbb{R}^n$, but
there are no further constraints in this chapter.


Section 49.1 contains some remarks about intrinsic differential geometry as opposed to the extrinsic 
differential geometry of submanifolds of Cartesian spaces. 


Section 49.2 is a general discussion of how best to define, name and interpret locally Cartesian spaces 
and manifolds in general. After one and a half centuries of attempts to reformulate differential geometry
to make it either simpler or more powerful, there is now such a plethora of novel formalisms that 
the diversity of inconsistent definitions, names and conceptual frameworks is now a major obstacle to 
learning the subject. One of the main objectives of this book is to reunify the subject to some extent. 


Section 49.3 defines non-topological charts and atlases. Some of their basic technical properties and 
constructions are independent of topological considerations. 

Section 49.4 introduces locally Cartesian topological spaces in Definition 49.4.7. Many authors refer to
these as “locally Euclidean spaces”, but in this book, the adjective “Euclidean” implies the metric space
structure of classical Euclidean spaces, whereas “Cartesian” refers to only the standard topological and
differentiable structures on finite-dimensional Cartesian set-products $\mathbb{R}^n$.


Section 49.5 gives several examples of non-Hausdorff locally Cartesian spaces. These demonstrate that 
the very simple definition for locally Cartesian spaces has not completely captured the intuitive notion 
of a topological manifold. In addition to the strictly local topology in a neighbourhood of each point,
something extra is required in order to prevent a wide range of apparently pathological possibilities 
from emerging. Therefore most authors append an ad-hoc Hausdorff separation requirement to the list 
of conditions for a manifold to exclude such “pathology”. However, closer examination reveals some 
intriguing possibilities for quantum field theory, which are not pursued here. 


Section 49.6 defines charts for locally Cartesian spaces. These are homeomorphisms between open subsets
of the space and open subsets of some Cartesian space $\mathbb{R}^n$. They are guaranteed to exist and be
consistent on their overlaps by the definition of a locally Cartesian space.


Section 49.7 defines locally Cartesian space atlases, which are sets of consistent charts which cover the 
whole space. The consistency between charts is guaranteed if the charts are consistent with the space. 


Section 49.8 considers how to induce the topological structure of a locally Cartesian space from a given 
atlas instead of defining the atlas from the space. In the induced topology case, the consistency and 
global covering conditions must be explicitly required in order to guarantee that the induced topology 
will be well defined and locally Cartesian. The induced topology concept is important because locally 
Cartesian spaces are often defined in this way in practice. 


Section 49.9 defines “locally Cartesian patchwork spaces”. These are the same as the induced topological 
spaces in Section 49.8 except that the point set for the space is defined to be a “patchwork” of individual 
patches of $\mathbb{R}^n$ by topologically gluing them together.


Section 49.10 is about continuous real-valued functions on locally Cartesian spaces, and Section 49.11 
is about continuous curves in locally Cartesian spaces and continuous maps between locally Cartesian 
spaces. It is not necessary to define these concepts here because they have already been defined for 
general topological spaces. 


CHAPTER 50: Topological manifolds 




Section 50.1 defines topological manifolds by adding the Hausdorff condition to the locally Cartesian 
space definition. All of the definitions, notations and theorems for locally Cartesian spaces are then 
immediately applicable to topological manifolds. Some authors add further constraints on topological 
manifolds, such as second countability or connectedness of the topology. These are considered to be 
“optional extras” here. 


Section 50.2 defines submanifolds and regular submanifolds of topological manifolds. Regularity of a 
submanifold means that it is locally equivalent to the graph of a continuous function. 


Section 50.3 defines continuous embeddings, immersions and submersions of topological manifolds within 
other topological manifolds. These may or may not be regular. Some authors make regularity mandatory 
for these definitions. 


Section 50.4 defines direct products of topological manifolds. 


Section 50.5 presents product-structured topological manifolds and their horizontal and vertical sub- 
manifolds. 


Section 50.6 defines Hölder continuous and Lipschitz topological manifolds.


Section 50.7 defines rectifiable curves in Lipschitz manifolds. 


CHAPTER 51: Differentiable manifolds 


Section 51.1 gives a very brief overview of the “differential layer”, which is layer 2 in the structural
framework for differential geometry in this book. 


Section 51.2 presents some of the choices which have been made here for definitions of concepts in 
differentiable manifolds. 

Section 51.3 defines differentiable manifolds by adding explicit atlases to topological manifolds to specify
differentiable structure. (The word “differentiable” here means “$C^k$ differentiable” for some $k \in \mathbb{Z}^+_0$.
The differentiability class is specified explicitly when required.)


Section 51.4 defines some basic technical concepts for differentiable manifolds. 


Section 51.5 shows how to generate the underlying topology of a differentiable manifold from an atlas. 
(The alternative is to pre-define the topology and require the atlas to be consistent with the topology.) 


Section 51.6 defines differentiable real-valued functions on differentiable manifolds. 

Section 51.7 defines differentiable vector-valued functions on differentiable manifolds. 

Section 51.8 is concerned with the global extension of differentiable real-valued “test functions”. These 
are functions which can be used to convert chart-based definitions of differentiability to test-function- 
based definitions. 

Section 51.9 defines differentiable curves in differentiable manifolds. 

Section 51.10 defines analytic manifolds. Section 51.11 defines unidirectionally differentiable manifolds. 
(These two sections may be removed.) 


CHAPTER 52: Differentiable manifold maps and products 




Section 52.1 defines differentiable maps between differentiable manifolds. 
Section 52.2 presents diffeomorphisms between differentiable manifolds. 


Sections 52.3 and 52.4 define differentiable submanifolds of differentiable manifolds. Regularity of sub- 
manifolds is defined in terms of local graphability. 

Section 52.5 defines differentiable embeddings, immersions and submersions of differentiable manifolds 
in differentiable manifolds. Regularity of such maps is defined in terms of local graphability. 

Sections 52.6 and 52.7 define direct products of differentiable manifolds and give some properties
of maps whose domains or ranges are direct products. Such products and maps are relevant to the
definitions and properties of fibre charts for differentiable fibre bundles.


CHAPTER 53: Philosophy of tangent bundles 




Chapter 53 takes a rest from serious mathematics to try to get some understanding of the “true nature” 
of tangent bundles. (This chapter may be considerably shortened or removed.) 


Section 53.1 is about the ontology of tangent vectors. When we say “point” or “line”, we have some idea 
of what these objects might look like. Our direct experience of these things can be extended to more
abstract mathematical concepts. Similarly, tangent vectors are found in our direct experience, and each 
textbook about intrinsic differential geometry defines them to be some particular kind of set-construction 
in terms of the differentiable manifold structure. It is argued here that the best representation for a 
tangent vector is a parametrised line in a coordinate chart, both because it corresponds very well with 
intuition and because it has practical advantages. (The author spent over 20 years trying to find a 
satisfactory representation for tangent vectors, and this is the result.) 


Section 53.2 gives some examples of pathological manifolds which hopefully help to justify why tangent 
vectors should be represented as parametrised lines. (This section may be removed.) 


Section 53.3 compares various tangent vector representations which are found in the differential geometry 
literature. Table 53.3.1 in Remark 53.3.2 gives some idea of the lack of unanimity on this subject. 


Section 53.4 summarises some methods for constructing tangent bundles from differentiable manifolds. 
Since a tangent bundle is a differentiable manifold, these methods may be used recursively. Table 53.4.1 
in Remark 53.4.1 summarises much of the plethora of differentiable manifold constructions in this book. 
(This section may be removed.) 


CHAPTER 54: Tangent bundles 


Section 54.1 defines tangent vectors in Definition 54.1.2 to be equivalence classes of chart-tagged affinely
parametrised lines in the coordinate space of a differentiable manifold. This means that a tangent vector
at a point $p \in M$ is an equivalence class of the form $[(\psi, L)]$ for charts $\psi$ and affinely parametrised
lines $L : t \mapsto \psi(p) + vt$ for $v \in \mathbb{R}^n$. This representation is (apparently) not yet in the differential
geometry literature, but it can mostly be ignored because the explicit details are not required to be
known for most other definitions.
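
A plausible form of the equivalence relation behind this representation is sketched below. It is an assumption based on the standard transformation rule for tangent vector components under a change of chart, and it is not quoted from Definition 54.1.2. The symbols $\psi_1$, $\psi_2$, $v_1$, $v_2$, $L_1$ and $L_2$ are illustrative names.

```latex
% Assumed equivalence of chart-tagged lines at a point p of a C^1 manifold M.
% L_k is the affinely parametrised line through \psi_k(p) with velocity v_k:
\[
  L_k : t \mapsto \psi_k(p) + v_k t, \qquad v_k \in \mathbb{R}^n, \quad k = 1, 2.
\]
% Two tagged lines represent the same tangent vector when their velocities
% are related by the derivative of the chart transition map at \psi_1(p):
\[
  (\psi_1, L_1) \sim (\psi_2, L_2)
  \iff
  v_2 = D(\psi_2 \circ \psi_1^{-1})(\psi_1(p))\, v_1 .
\]
```

Under this assumption the relation is well defined only when the transition maps are at least $C^1$, which matches the differentiability requirement noted for Section 54.4.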

Section 54.2 shows that tangent vectors are maps from charts to Cartesian lines. 

Section 54.3 presents some technical points and philosophical points regarding tangent vectors. 
Section 54.4 defines the tangent vector space at a point to be the linear space of all tangent vectors 
at that point. It is important to know that $C^1$ differentiability of the manifold is required in order
to ensure that the set of tangent vectors at a point does constitute a well-defined linear space whose 
dimension equals the dimension of the manifold. (This is emphasised in Remark 54.4.2.) 


Section 54.5 defines the tangent (vector) bundle on a differentiable manifold as a special kind of bundle
which fits the exact definition of neither a topological nor a differentiable fibre bundle. (There are 
technical reasons for this.) 


Section 54.6 defines submanifold tangent vector embedding maps. 
Section 54.7 presents tangent bundle identification maps for direct products of tangent bundles. 


Section 54.8 presents various technical theorems regarding the submanifolds which arise from product- 
structuring of differentiable manifolds. 


Section 54.9 introduces a “drop function” which identifies tangent vectors of a finite-dimensional real 
linear space, regarded as a manifold, with elements of the underlying linear space. 


Section 54.10 gives an alternative representation for tangent vectors which is called here a “tangent 
velocity vector”. For this representation, the equivalence class of parametrised lines is replaced with an 
equivalence class of velocity n-tuples. This is often found in the literature, and is the most convenient 
representation for practical computations. 


Section 54.11 gives another alternative representation for tangent vectors, which is called here a “tangent 
differential operator”. These are also often found in the literature as a way of avoiding “coordinates”. 
This representation has some technical limitations which make it unsuitable as the primary definition
of a tangent vector, although it is currently very popular in this role.


Section 54.12 presents the difficulties which arise when one attempts to construct a fibre bundle of some 
kind from the set of tangent differential operators on a manifold. 


Section 54.13 presents various coordinate transformation rules for tangent differential operators on $C^1$
manifolds.

Section 54.14 defines the action of tangent differential operators on vector-valued functions. 

Section 54.15 is about "tagged tangent operators", which are tangent differential operators with point- 
tags added to keep track of which point of the manifold they belong to. 

Section 54.16 is about “unidirectional tangent bundles”, which are intended to show how the standard
bidirectional tangent vectors can be adapted for situations such as boundary points of manifolds with
boundary. (This section may be removed.)


CHAPTER 55: Covector bundles and frame bundles


Chapter 55 introduces covector bundles and frame bundles. 


Section 55.1 examines the comparative advantages of numerous plausible representations for tangent 
covector bundles. These are even more numerous than for tangent vector bundles in Section 53.3. 


Section 55.2 defines tangent covector spaces in Definition 55.2.1 in what seems to be the most effective 
way for applications, which is the simple algebraic dual space of the tangent vector space. Standard 
components for tangent covectors are given by Definition 55.2.5. The chart transition rule for tangent 
covector components is given in Theorem 55.2.17. 


Section 55.3 introduces the standard basis for the tangent covector space at a point in terms of the 
standard basis of the corresponding tangent vector space. The chart transition rule for tangent covector 
space basis elements is given in Theorem 55.3.11. 


Section 55.4 defines tangent covector bundles in Definition 55.4.11 by combining the tangent covector 
spaces at all points of a given differentiable manifold. The fibre maps for this bundle are constructed 
directly from the associated tangent vector bundle. 


[ draft: UTC 2023-1-3 Tuesday 00:13] 




Section 55.5 defines tangent vector-tuple bundles, which are useful for the definition of tensor bundles, 
Riemannian and Minkowskian metric tensors, and coordinate frame bundles. 

Section 55.6 defines frame bundles, which are the same as tangent vector-tuple bundles except that the
vectors in each tuple must be linearly independent. Section 55.7 shows that frame bundles containing n vectors
in an n-dimensional differentiable manifold satisfy the requirements for a principal fibre bundle.


CHAPTER 56: Tensor bundles and multilinear map bundles 




Chapter 56 introduces (tangent) tensor bundles, which are built from tangent (vector) bundles using any 
of the tensor constructions in Chapters 27-30. All tensor bundles are associated with each other in the 
sense of associated topological fibre bundles in Section 47.9. Therefore if parallelism (or a connection) 
is defined on one of them, an associated parallelism (connection) is automatically defined on all of them.

Section 56.1 defines pointwise tensor spaces in terms of the pointwise tangent vector space in much the
same way that the tangent covector space is constructed. Tensor spaces include not only general mixed
tensor spaces, but also antisymmetric and symmetric tensor spaces.

Section 56.2 shows how to import Cartesian space tensors onto differentiable manifolds. 

Section 56.3 builds tensor bundles on manifolds from pointwise tensor spaces. 

Section 56.4 defines general multilinear function bundles on manifolds. 

Section 56.5 defines antisymmetric multilinear function bundles on manifolds. 


Section 56.6 defines symmetric multilinear function bundles on manifolds. 


Section 56.7 defines symmetric and antisymmetric multilinear map bundles on manifolds. 


CHAPTER 57: Vector fields, tensor fields, differential forms 




Chapter 57 is about cross-sections of vector and tensor bundles, also known as vector fields, tensor fields 
and differential forms. These arise naturally in many areas of physics, and also in pure mathematical 
differential geometry. In fact, vector fields are the most basic “object” in the popular Koszul formalism,
although the term “vector field” in that formalism typically means “differential operator field”.
The Koszul formalism could be thought of as “vector field calculus”, or more accurately, “first-order
differential operator field calculus”.


Section 57.1 introduces vector fields in Definition 57.1.2, and in Notation 57.1.11 denotes the space
of general vector fields on a differentiable manifold M as $X(T(M))$, and the space of $C^k$ vector fields
as $X^k(T(M))$. Notations 57.1.14 and 57.1.16 associate a differential operator field $\partial_X$ with every vector
field $X \in X(T(M))$. This avoids the confusions of the Koszul formalism.


Section 57.2 defines differentiability of vector fields. 


Section 57.3 introduces tangent operator fields, which can be thought of as cross-sections of the tangent
(first-order differential) operator bundle on a differentiable manifold. Unfortunately, this bundle is not 
well defined because zero differential operators have an ambiguous base point. Therefore the operator 
fields which are central to the Koszul formalism are not well defined. In practice, they are always tacitly 
or explicitly associated with the “coordinates” which the formalism seeks to avoid. Given a differential 
operator field, it is non-trivial to determine the corresponding vector field. 


Section 57.4 presents vector-tuple fields. 

Section 57.5 gives definitions and notations for tensor fields. 
Section 57.6 gives definitions and notations for differential forms. 
Section 57.7 introduces “short-cut” versions of differential forms. 


Section 57.8 is about vector fields and tensor fields along curves in differentiable manifolds. These have
particular importance for parallel transport, geodesics and Jacobi fields.

Section 57.9 is about the velocity field of a differentiable curve.


Section 57.10 presents integral curves of vector fields on differentiable manifolds. 


Section 57.11 defines frame fields, which generalise the idea of a coordinate frame field. These are also 
known as “moving frames”.


CHAPTER 58: Differentials of functions and maps 

Section 58.1 is about pointwise differentials of real-valued functions on differentiable manifolds. The
differential of a real-valued function is a particular kind of differential form. (Differential forms are 
covariant antisymmetric tensor fields.) 


Section 58.2 is about global differentials of real-valued functions on differentiable manifolds. 


Section 58.3 is about the kinds of spaces which are suitable for differentials of maps between manifolds to 
“inhabit”. These are sometimes called “two-point tensors”, although they are perhaps more accurately 
thought of as covariant tensor fields with values in a “foreign” contravariant tensor bundle. Even 
though differentials of maps between manifolds often appear in differential geometry, the fibre bundle 
formalism does not comfortably accommodate them, especially in the case of higher-degree tensors and
higher-order differentials.

Sections 58.4, 58.5, 58.6, 58.7, 58.8, 58.9, 58.10 and 58.11 are about differentials and induced maps
of maps between differentiable manifolds. As suggested by the literature survey in Table 58.4.1 in
Remark 58.4.1, the term “induced map” is not much used to distinguish differentials of maps from
differentials of real-valued functions. Therefore the term “differential” has a wide range of meanings.
There is some danger of confusion of differentials with the exterior derivative, which uses the same
notation “d”.


Section 58.12 is about differentials of differential operator fields. These have a formal simplicity which 
is exploited in the Koszul formalism. 


CHAPTER 59: Recursive tangent bundles and differentials 


Chapter 59 is about tangent bundles of tangent bundles and differentials of differentials.


Section 59.1 is about tangent bundles of tangent bundles. These are also called “second-level tangent 
bundles” or “double tangent spaces” here. They are the spaces on which connections are defined. So 
their structure and properties are not a mere matter of idle curiosity. 


Section 59.2 defines vertical subspaces and horizontal components for vectors in double tangent bundles. 
These are core concepts for definitions of affine connections on tangent bundles. Definitions 59.2.9 
and 59.2.15 introduce the “drop function”, which maps vertical vectors in the double tangent space 
to vectors in the single tangent space. This function is required for defining covariant derivatives, for 
example, but it is apparently absent from the differential geometry literature. 


Section 59.3 presents oblique drop functions, which are applicable to non-vertical vectors. 


Section 59.4 gives some theorems for differentials of “scaling curves” and “constant-scale maps” which 
are useful for proving Leibniz theorems for various differential operators. 


Section 59.5 defines sprays on tangent bundles. These are second-level vector fields which satisfy a 
quadratic scaling condition. 


Section 59.6 defines the “(horizontal component) swap function” for double tangent bundles. Like the 
drop function in Section 59.2, the swap function is required for differential geometry, but is apparently 
absent from the literature. The swap function is needed for defining commutators of vector fields and 
commutators of covariant derivatives. As in the case of the drop function, the requirement for a swap 
function is glossed over in the literature by simply proving that certain kinds of differences between 
second-level tangent vectors transform just like ordinary first-level tangent vectors. Thus coordinates 
are used in an ad-hoc way to justify informalities in a supposedly coordinate-free formalism. 


Section 59.7 is about tangent bundles of tangent vector-tuple bundles, which are relevant to the analysis 
of affine connections on Riemannian manifolds. 


Section 59.8 defines differentials of differentials of curves. These represent the acceleration of curves. 
However, in the absence of an affine connection, these second-order differentials can only be defined as 
second-level tangent vectors in the double tangent bundle. 


Section 59.9 defines higher-order differentials of curve families. 


Section 59.10 defines higher-order differentials of real-valued functions. The second-order differential of 
a real-valued function is the “Hessian” of the function, but without an affine connection, the Hessian 
can only be defined in the double tangent bundle. 

Section 59.11 shows that the Hessian is a vertical double tangent vector at a critical point of a real-valued
function. This means that it can be “dropped” (using a drop function) to a well-defined second-degree 
tensor without any affine connection. 


Section 59.12 is about higher-order differentials of maps between manifolds. These are relevant to the 
analysis of second-order Jacobi fields. 


CHAPTER 60: Higher-order tangent operators 




Chapter 60 is about higher-order tangent differential operators which act on real-valued functions on a 
manifold. The Laplace-Beltrami operator is a second-order differential operator, but it uses an affine
connection to make it tensorial. The higher-order tangent differential operators here are defined more 
abstractly without an affine connection. In principle, abstract second-order differential operators on 
manifolds should be of significant interest because they are of such importance in physics. However, in 
practice the need for abstraction is bypassed by employing covariant derivatives, which use connections. 

Section 60.1 argues that if first-order derivatives are the inspiration for tangent vectors, then second-order
derivatives should inspire some sort of vector-like object.

Section 60.2 defines second-order tangent operators, which are analogous to the standard first-order 
tangent operators. The transformation rules for second-order operators are given in Theorem 60.2.5. 
These are related to the transformation rules for double tangent bundles. 

Section 60.3 examines the possible results of the composition of two first-order tangent operator fields.
Remark 60.3.2 gives two interpretations. This is relevant to the Lie bracket of vector fields. 

Section 60.4 looks at the equations which must be satisfied by an array of “tensorisation coefficients” 
for a second-order operator. These equations are satisfied by a Christoffel array for example. 

Section 60.5 is about the conversion of second-order tangent operators into the corresponding second- 
order tangent vectors. These can then be combined to form a second-order tangent bundle. 


CHAPTER 61: Vector field calculus 




Chapter 61 is concerned with the Lie bracket, Lie derivatives and exterior derivatives, which accept vector
or tensor fields as inputs and produce vector or tensor fields as outputs. None of these operations
requires a connection or metric.


Section 61.1 concerns the action of vectors and vector fields on real-valued functions. 


Section 61.2 is about “naive derivatives”, where a vector acts on a vector field to produce a double 
tangent vector. 


Section 61.3 presents a Leibniz rule for the naive derivative by a vector of the product of a scalar and 
vector field. This rule equates only second-level tangent vectors, but when combined with connections, 
it yields Leibniz rules of the more familiar kind.


Section 61.4 is about “naive derivatives”, where a vector field acts on a vector field to produce a double 
tangent vector field. 


Section 61.5 introduces the Lie bracket of two vector fields, notated like a commutator. 
Section 61.6 applies the Lie bracket to map-related vector fields. 
Section 61.7 introduces Lie connections, which are required for defining Lie derivatives. 


Section 61.8 is about Lie derivatives of vector fields. In terms of coordinates, these are the same as the 
Lie bracket. But there are various formal “issues” which need to be clarified. 
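
For orientation, the coordinate expression referred to here is the standard component formula for the Lie bracket, written out below. The index notation $X^i$, $Y^i$ for components with respect to chart coordinates $x^1, \dots, x^n$ is an assumption of this sketch and may differ from the notation used in Sections 61.5 and 61.8.

```latex
% Standard component formula for the Lie bracket of vector fields X and Y
% on an n-dimensional manifold, in a chart with coordinates x^1, ..., x^n:
\[
  [X, Y]^i
  = \sum_{j=1}^{n} \Bigl( X^j \frac{\partial Y^i}{\partial x^j}
                        - Y^j \frac{\partial X^i}{\partial x^j} \Bigr),
  \qquad i = 1, \dots, n .
\]
% The second-order terms X^j Y^k \partial_j \partial_k f cancel in the
% commutator of the two first-order operators when f is C^2, so that the
% result is again a first-order operator.
```

No connection or metric appears in this formula, which is consistent with the statement at the start of the Chapter 61 outline.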


Section 61.9 extends Lie derivatives from vector fields to general tensor fields. 
Sections 61.10, 61.11, 61.12 and 61.13 define the exterior derivative for differential forms. 
Section 61.14 applies the exterior derivative to vector fields. 


CHAPTER 62: Lie groups 




Chapter 62 presents groups which have finite-dimensional manifold structure. These are much better 
known as “Lie groups”. 


Section 62.1 describes Hilbert’s fifth problem, which is concerned with demonstrating that a topological
group, under some technical conditions, is necessarily an analytic group. Because of this implication,
groups in general classes C^k are of little interest since they are all effectively C^ω. However, C^ω
regularity does not necessarily apply to the action of the group on a manifold.


Section 62.2 defines both C^k differentiable and analytic groups, but only the analytic groups are of real
relevance in differential geometry.


Section 62.3 defines left translation operators on Lie groups. These extend the action of the group on 
points to associated actions on functions, operators, operator fields, tangent vectors and tangent fields. 


Section 62.4 is about left invariant vector fields on Lie groups. These are vector fields which are invariant 
under the group action described in Section 62.3. These are relevant to the definition of connections on 
principal fibre bundles. 


Section 62.5 is about standard left Maurer-Cartan forms on Lie groups. 


Section 62.6 defines right translation operators on Lie groups similarly to the left translation operators 
in Section 62.3. 


Section 62.7 defines right invariant vector fields on Lie groups similarly to left invariant vector fields in 
Section 62.4. 


Section 62.8 introduces the Lie algebra of a Lie group. 
Section 62.9 presents one-parameter subgroups of Lie groups. 
Section 62.10 presents adjoint maps for Lie groups. 


CHAPTER 63: Lie transformation groups 


Chapter 63 presents general groups of diffeomorphisms in Sections 63.1, 63.2 and 63.3. When such
groups have a finite-dimensional manifold structure, they are known as Lie transformation groups, 
which are introduced in Sections 63.4 and 63.5. Lie transformation groups play a central role as the 
structure groups of differentiable fibre bundles, on which connections are defined. 


Section 63.1 defines groups of diffeomorphisms. For these kinds of transformation groups, a differentiable 
manifold structure is defined on the passive set, but not on the group. 


Section 63.2 defines families of diffeomorphisms and presents some properties. These are relevant for 
defining a very general kind of dual space for a group of diffeomorphisms. 


Section 63.3 defines dual action and dual differential action for diffeomorphism families. These are a 
generalisation of the dual action of a linear transformation on a linear space. Theorem 63.3.8 shows 
that this very general kind of dual differential action is the true differential of the dual. This can be 
used to define a dual connection from a given connection on a kind of diffeomorphism group which is 
much more general than a Lie transformation group. 


Section 63.4 defines Lie (left) transformation groups. These are incorporated into the definition of a 
general differentiable fibre bundle. 


Section 63.5 defines Lie right transformation groups. These are useful in definitions for differentiable 
principal fibre bundles. 


Section 63.6 defines infinitesimal transformations by Lie left transformation groups. In the definition of 
a connection on a differentiable fibre bundle, the action of the connection on the fibre set at each point 
is required to be equal to such an infinitesimal transformation of the structure group when viewed via 
a fibre chart. 


Section 63.7 defines infinitesimal transformations by Lie right transformation groups. 


CHAPTER 64: Differentiable fibre bundles


Chapter 64 introduces differentiable fibre bundles. These provide a very general substrate on which 
parallelism may be defined by means of connections. (Connections are differential specifications of 
parallelism which can be integrated to compute parallel translation.) Although many of the structures
of differentiable fibre bundles are the same as for the topological fibre bundles in Chapter 47, there are
many features which are new. 


Sections 64.1 and 64.2 define differentiable fibrations, which are the same as differentiable fibre bundles 
except that they have no fibre atlas. Differentiable fibrations are simpler structures which allow some 
basic concepts to be introduced before structure groups are added.

Section 64.3 introduces fibre charts for differentiable fibrations.
Section 64.4 introduces fibre atlases for differentiable fibrations. 


Section 64.5 introduces vertical vectors and horizontal components for differentiable fibrations. These 
are generalisations of the same concepts in Section 59.2 for double tangent bundles. 

Section 64.6 introduces drop functions for differentiable fibrations. This is a generalisation of the drop 
functions in Section 59.2 for double tangent bundles. 


Section 64.7 defines differentiable cross-sections and naive derivative operators. 


Section 64.8 defines differentiable fibre bundles by adding a structure group to the simpler definition of 
a differentiable fibration. The transition maps of pairs of fibre charts are required to be in the structure 
group. The reason for this is to ensure that when a connection is defined on the fibre bundle, it will be 
well defined when viewed via any fibre chart. 

Section 64.9 defines pull-back atlases for total spaces of differentiable fibre bundles. 

Section 64.10 defines analytic ordinary fibre bundles. 

Section 64.11 is about differentiable structure on fibre sets. 


Section 64.12 is about tangent-vector embedding maps from fibre-set submanifolds to their ambient 
total spaces of differentiable fibre bundles. 


Section 64.13 is about vector fields on differentiable fibre bundles. 


Section 64.14 is about non-vertical vector fields generated by the Lie algebra. 


CHAPTER 65: Differentiable vector bundles 




Section 65.1 defines differentiable vector bundles. These are more general than tangent bundles, but 
because of the linear structure of the fibre space, linear connections may be defined on them. 


Section 65.2 presents linear operations on differentiable vector bundles. 


Section 65.3 defines vertical drop functions for differentiable vector bundles. These are required for 
mapping vertical vectors on the total space down to the total space itself. This is necessary for covariant 
derivative definitions in particular. 


Section 65.4 defines oblique drop functions for differentiable vector bundles. These have some applications
which are related to Leibniz rules for naive and covariant derivatives.

Section 65.5 gives some basic properties of differentials of “scaling curves” and “constant-scale maps” 
on vector bundles. This generalises the corresponding tangent bundle concepts in Section 59.4. 
Section 65.6 applies the scaling curve and constant-scale map differential properties in Section 65.5 to 
obtain a Leibniz rule for naive derivatives of cross-sections of vector bundles. 

Section 65.7 introduces vector-tuple bundles which are constructed from vectors in the total spaces of 
general vector bundles. This is an extension of the vector-tuple concept in Section 55.5. 

Section 65.8 presents vector-frame bundles defined on vector bundles, which are the same as the vector-tuple
bundles in Section 65.7, except that they are linearly independent.


Section 65.9 is about tangent bundles on differentiable manifolds, which may be viewed as differentiable 
fibre bundles with a general linear group as structure group. 


CHAPTER 66: Differentiable principal bundles 




Section 66.1 introduces differentiable principal fibre bundles, which are in essence bundles of reference 
frames. These are typically associated with differentiable ordinary fibre bundles, in which cross-sections 
representing physical fields are defined. The combination of a reference frame with a field value yields 
a measurement in the fibre space. 

Section 66.2 presents properties of differentiable principal bundle right action maps. 

Section 66.3 presents properties of differentials of identity chart transition maps. 

Section 66.4 defines left action maps on principal bundles by structure group elements. 


Section 66.5 defines the infinitesimal transformations on principal bundles which are generated by Lie 
algebra elements. These are the differentials of the left action maps in Section 66.4.


Section 66.6 defines the function-transposes of infinitesimal transformations on principal bundles. These 
transposes are known as “fundamental (vertical) vector fields”.


Section 66.7 defines associated differentiable fibre bundles. 


Section 66.8 presents short-cut differentiable orbit-space associated cross-sections. These are useful for
representing matter fields in gauge theory. 


CHAPTER 67: Connections on ordinary fibre bundles 




Chapter 67 introduces connections on differentiable fibre bundles. This commences the connection 
layer 3 in Section 1.1. Historically, connections on tangent bundles were introduced first, but connections 
on differentiable fibre bundles are presented first here because they are more general. 


Section 67.1 comments on the historical background of connections and parallel transport. 


Section 67.2 surveys the numerous ways of representing connections from 1918 until the present. These 
include the Christoffel array, two kinds of connection forms, covariant derivatives, horizontal subspaces 
and horizontal lift functions. (The primary representation chosen here is horizontal lift functions.) 


Section 67.3 discusses how parallel transport is reconstructed from a connection, which is its differential 
in some sense. 


Section 67.4 introduces horizontal lift functions on differentiable fibrations. These have very great 
freedom because they are not constrained by a structure group. 


Section 67.5 introduces horizontal lift functions on differentiable ordinary fibre bundles. This is the 
primary definition chosen for connections in this book. 


Section 67.6 defines connection generator functions for ordinary fibre bundles. 

Section 67.7 defines differentiability of horizontal lift functions on ordinary fibre bundles. 

Section 67.8 defines transposed horizontal lift functions on ordinary fibre bundles. 

Section 67.9 is about horizontal component maps and horizontal subspaces on differentiable fibre bundles. 
Section 67.10 is about the vertical component map representation of connections. 


Section 67.11 is about conversion rules between various representations of connections on ordinary fibre 
bundles. 


Section 67.12 defines associated connections for associated ordinary fibre bundles. 


CHAPTER 68: Connections on vector bundles 




Section 68.1 defines the coefficients of a connection on a vector bundle. 

Section 68.2 defines a covariant derivative for cross-sections of a vector bundle. 

Section 68.3 presents coefficient arrays for connections on vector bundles with respect to Cartan-style 
moving frames. 

Section 68.4 defines a covariant derivative for functions on the total space of an ordinary fibre bundle. 
Section 68.5 mentions parallel transport on differentiable ordinary fibre bundles. (This section may be 
removed.) 


CHAPTER 69: Connections on principal bundles 


Section 69.1 is about horizontal lift functions on differentiable principal fibre bundles.

Section 69.2 is about transposed horizontal lift functions on differentiable principal fibre bundles. 
Section 69.3 defines horizontal component maps and horizontal subspaces for connections on principal 
bundles. These are alternative representations for connections. 

Section 69.4 defines vertical component maps for principal bundles. 

Section 69.5 defines connection forms on differentiable principal bundles. 

Section 69.6 gives conversion rules between five definitions of connections on principal bundles. These 
are the transposed horizontal lift, the horizontal and vertical component maps, the horizontal subspace 
function, and the connection form. 


Section 69.7 expresses principal bundle connections in terms of connection generator functions.

Section 69.8 is concerned with the effect of right action maps on connection forms on differentiable
principal bundles. 


Section 69.9 applies principal bundle connections to associated ordinary fibre bundles. 
Section 69.10 presents associated connections for orbit-space associated vector bundles. 


Section 69.11 is about the projection of connection forms on principal bundles via cross-sections to the 
base-point space. Such projections are “localisations” of connection forms. Such localisations are called
“gauge potentials” in gauge theories.


Section 69.12 shows how to combine gauge potentials which are consistent on overlapping regions into 
a global connection form. The consistency rules are called “gauge transformations”.


Section 69.13 is about the component functions for the “connection form localisations” which are also
known as “gauge potentials”.


Section 69.14 is about “gauge covariant derivatives”, which are defined on a base space using a connection 
form on the principal bundle. 


Section 69.15 gives conversion rules between nine definitions of connections. Well, ten maybe. 


CHAPTER 70: Curvature of connections on fibre bundles 




Section 70.1 mentions holonomy groups. 

Section 70.2 is a speculative discussion of some ideas for defining curvature of connections on general 
ordinary fibre bundles. Such generality seems to be very difficult or impossible, and probably not very 
useful, but failed attempts help to motivate narrowing the focus to vector bundles and principal bundles. 
Section 70.3 defines Riemann curvature for connections on general differentiable ordinary fibre bundles. 
Section 70.4 discusses curvature of connections on differentiable vector bundles. 

Section 70.5 discusses curvature of connections on principal bundles. 

Section 70.6 presents gauge localisation of curvature forms, which means the pull-back of curvature 
forms from a principal bundle total space down to the base space via a “gauge”, which is a cross-section 
of the principal bundle. 

Section 70.7 presents in Theorem 70.7.6 a justification for the Riemann curvature formula in the special 
case of a connection on a fibre bundle, expressed in terms of coordinates. It is noted that the most 
popular formula for the Riemann curvature is actually the curvature of the covariant derivative, not the 
curvature of parallel transport. These two kinds of curvature differ by a minus-sign. 


Section 70.8 is on the subject of gauge theory, which is a core topic of particle physics. However, the 
objective here is limited to deriving the classical equations of motion for gauge potentials. 


CHAPTER 71: Affine connections on tangent bundles 




Chapter 71 applies the general framework for connections on differentiable fibre bundles in Chapters 67, 
68 and 69 to the special case of affine connections on tangent bundles. This specialisation is achieved 
by replacing the general fibre bundle total space E with the tangent bundle T(M) of a manifold M. In 
this book, the term "affine connection" always means a connection on a tangent bundle. 


Section 71.1 defines an affine connection on a tangent bundle to be a horizontal lift function.


Section 71.2 presents conversions for horizontal lift functions to and from tensor calculus. 


Section 71.3 is about associated affine connections on vector-tuple bundles, which are useful for the
analysis of connections on Riemannian manifolds.

Section 71.4 defines parallel transport on a tangent bundle in terms of an affine connection. 

Section 71.5 defines transposed affine connections. 

Section 71.6 defines the covariant derivative corresponding to a given affine connection. 

Section 71.7 defines the covariant derivative of a vector field along a curve. 

Section 71.8 defines the divergence of a vector field. 

Section 71.9 is about the covariant Hessian of a real-valued function. 


Section 71.10 is about the covariant differential of a map between two manifolds.

Section 71.11 defines the Riemann and Ricci curvatures of an affine connection.

Section 71.12 defines the torsion of an affine connection.

Section 71.13 is on the coefficients of affine connections on principal fibre bundles. (This section may
be removed.)

Section 71.14 is concerned with transferring affine connections to vector-frame fields. This is the Cartan
style of connection form matrices.


CHAPTER 72: Geodesics and Jacobi fields 

Chapter 72 applies the affine connections in Chapter 71 to define geodesic curves, Jacobi fields on these
curves, and convex sets and functions.

Section 72.1 defines geodesic curves as self-parallel curves with respect to a given affine connection. 


Section 72.2 discusses the relations between geodesics and torsion. 


Section 72.3 is about exponential maps.

Section 72.4 defines Jacobi fields.

Section 72.5 gives the equations of geodesic deviation which are satisfied by Jacobi fields.


CHAPTER 73: Riemannian manifolds 


Chapter 73 introduces Riemannian manifolds. These can be defined either as manifolds which have a
distance function whose square has a well-defined Hessian, or as a manifold with an inner product on
the tangent space at every point, or else in a third way by an orthogonal connection.

Section 73.1 is concerned with the construction of Riemannian metric tensor fields from two-point
distance functions or affine connections.

Section 73.2 defines Riemannian metric tensor fields, which define an inner product on the tangent space
at each point of the manifold.

Section 73.3 expresses Riemannian metric tensor fields in terms of “coordinates” as in tensor calculus.

Section 73.4 defines Riemannian functions as an alternative construction for metric tensor fields.

Section 73.5 is concerned with raising and lowering the indices of tensors in a Riemannian manifold,
which is achieved by “musical isomorphisms”.

Section 73.6 is concerned with the lengths of curves in a Riemannian manifold.

Section 73.7 is concerned with the distance function in a Riemannian manifold, which is computed by
integrating the length element along curves joining two points and minimising over all such curves.

Section 73.8 is concerned with the calculus of variations for the extremisation of distance in Riemannian
manifolds.

Section 73.9 is about how to differentiate the square of the distance function for a Riemannian manifold
to recover the metric tensor field.

Section 73.10 gives an example of a manifold with an inner product at each point which is only
semi-definite at some points.


CHAPTER 74: Levi-Civita parallelism and curvature 


Section 74.1 is about the Levi-Civita connection for a Riemannian manifold.

Section 74.2 constructs the Levi-Civita connection as a horizontal lift function.

Section 74.3 expresses the Levi-Civita connection in terms of coordinates (tensor calculus).

Section 74.4 is about various kinds of curvature tensors in a Riemannian manifold, namely the Riemann
curvature, the Ricci curvature, and the scalar curvature.

Section 74.5 is about sectional curvature.

Section 74.6 is about differential operators in a Riemannian manifold, including the gradient and the
Laplace-Beltrami operator.


CHAPTER 75: Pseudo-Riemannian manifolds

Chapter 75 is concerned with pseudo-Riemannian manifolds, which may be regarded as a generalisation
of Riemannian manifolds. Their importance lies in general relativity.


Section 75.1 discusses some basic concepts of pseudo-Riemannian manifolds. 
Section 75.2 defines a pseudo-Riemannian “metric” tensor on a manifold. 
Section 75.3 states the fundamental gravity formula for general relativity. 


Section 75.4 applies gauge theory geometry to pseudo-Riemannian spaces to obtain equations of motion 
for elementary particles. 


CHAPTER 76: Spherical geometry 

Chapter 76 collects together many differential geometry formulas for the special examples of spheres
embedded in Euclidean spaces.

Section 76.1 defines and coordinatises general spheres S^n in spaces ℝ^{n+1}.

Section 76.2 defines terrestrial-style coordinates for S^2.

Section 76.3 describes the embedding of S^2 in ℝ^3.

Section 76.4 computes the Hessian differential operator.

Section 76.5 computes the metric tensor for S^2 from the distance function.

Section 76.6 gives various formulas for the metric, Christoffel array and curvatures on S^2.

Section 76.7 gives formulas for geodesic curves on S^2.

Section 76.8 gives formulas for isometries of S^2.


Chapter 49


LOCALLY CARTESIAN SPACES


49.1. Intrinsic differential geometry


49.1.1 REMARK: Locally Cartesian spaces distinguish curved space from flat space. 

It is not strictly true that differential geometry is distinguished from "flat Euclidean space" by locally 
Cartesian structure. However, in this book this is the arbitrary criterion which is adopted to decide which 
topics are presented in Part IV, which commences with Chapter 49. 


Locally Cartesian structure signals that the geometry will be intrinsic, which distinguishes it from the 
classical differential geometry of Gauß. It is true that Gauß famously discovered the Gaußian curvature,
which is intrinsic, i.e. independent of the choice of embedding, but this did not lead in his time to the 
development of explicit multi-chart structuring. Differential geometry has used locally Cartesian charts 
explicitly for only about one hundred years. It is typically associated with the intrinsic approach. 


49.1.2 REMARK: Locally Cartesian charts are an abstraction from submanifold projection maps. 

“A manifold is a set which can be coordinatised in a neighbourhood of each point.” This simple informal
definition implies that a manifold cannot be defined on a set which has no topology. A neighbourhood of a 
point is well defined only when a topology is defined on the set. (This tends to support the argument that 
manifold atlases should be defined on topological spaces, not on bare sets. This issue is discussed more fully 
in Remark 49.2.4.) 


The differentiable structure of a manifold is technically defined in terms of an atlas, which is a set of charts. 
This is the cause of huge inconvenience in all definitions and proofs of theorems. An atlas provides a 
coordinatisation of the manifold's base-point set in a neighbourhood of each point. Anyone who has bought 
half a dozen atlases of planet Earth knows that there are a huge number of possible untidy ways to cover the 
Earth with charts, and that it is quite inconvenient to have to follow structures from one chart to the next, 
typically with significant differences in scale, orientation and projection between different charts of each area. 
One naturally wonders whether there exists some kind of canonical representation of a sphere's surface which 
removes most of the arbitrariness of an atlas. (See Remark 49.2.7 for a semi-plausible biological solution to 
this problem.) 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html

More importantly, one naturally asks whether one could define some kind of intrinsic differentiable structure
on a manifold. It is suggested in Remark 53.1.1 that the “true differentiable structure” of a manifold is the 
tangent bundle, but the tangent bundle is constructed from an atlas of charts, whereas the question here is 
how to avoid having to supply an atlas. 


Locally Cartesian spaces first arose historically as submanifolds of Euclidean ambient spaces. (Riemann later 
suggested that the ambient space could itself be a non-Euclidean manifold, but the Riemannian manifolds 
which he proposed did not have a non-Cartesian global topology.) A submanifold or embedded manifold 
inherits differentiable structure from the ambient manifold. (The technical details of submanifolds and 
submanifold of a Cartesian space can be represented as the graph of a continuous function in a neighbourhood 
of each point. (This is shown in Theorem 50.2.15.) So one could use such function-graphs as some kind 
of intrinsic C^0 differentiable structure on a submanifold. More importantly, such function-graphs could
provide an intrinsic C^k differentiable structure for k ≥ 1. (See Theorem 52.3.14 for a proof of the local
graph condition for regularity of C^k differentiable submanifolds.) The locally Cartesian space character of
a topological space is evident from the topology alone, whereas C^k differentiable structure for k ≥ 1 cannot
be determined from the topology alone.


The fact that regular C^k submanifolds can be represented locally as graphs of C^k functions is equivalent
to the assertion that they have C^k projections onto hyperplanes in a neighbourhood of each point. These
projections are effectively local charts for the submanifold. Therefore these can provide a kind of intrinsic
atlas for a submanifold in terms of the atlas of the ambient space. In this sense, one may say that C^k
manifolds do have intrinsic differentiable structure, but only if they can inherit this structure from an
ambient manifold by being regularly embedded.
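
The local graph representation can be illustrated numerically. The following is a minimal Python sketch, assuming the unit sphere in ℝ^3 and its open upper hemisphere as the regular submanifold. The function names graph_point and projection_chart are hypothetical labels chosen only for this illustration. The hemisphere is the graph of a smooth function on the open unit disc, and the projection onto the hyperplane x_3 = 0 acts as a local chart whose inverse is the graph map.

```python
import math


def graph_point(u1, u2):
    """Return the point of the open upper unit hemisphere lying over (u1, u2).

    The hemisphere is the graph of f(u1, u2) = sqrt(1 - u1^2 - u2^2)
    on the open unit disc.
    """
    r2 = u1 * u1 + u2 * u2
    if r2 >= 1.0:
        raise ValueError("(u1, u2) must lie in the open unit disc")
    return (u1, u2, math.sqrt(1.0 - r2))


def projection_chart(x):
    """Project a point of the open upper hemisphere onto the hyperplane x3 = 0."""
    x1, x2, x3 = x
    if x3 <= 0.0:
        raise ValueError("this chart is only defined where x3 > 0")
    return (x1, x2)


# The projection is a left inverse of the graph map, so it acts as a local chart.
u = (0.3, -0.4)
x = graph_point(*u)
print(x, projection_chart(x))
```

Other hemispheres (x_3 < 0, x_1 > 0, and so forth) would be handled by the analogous projections, which together cover the sphere.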


Locally Cartesian charts are effectively an abstraction from local projection maps of regular submanifolds. 
For an embedded manifold, local projection maps are typically employed as the charts for its atlas. (The 
charts in a geographical atlas are in fact generally referred to as “projections”.) In the case of manifolds 
which are not defined as submanifolds of Cartesian spaces or other manifolds, an atlas may be regarded as
an abstraction or generalisation of the set of local projection maps of a regular submanifold, except that 
there is no ambient space. 


In the case of a regular topological submanifold, the relative topology contains all of the required information
about the locally Cartesian structure of the submanifold. In the case of a regular C^k submanifold with k ≥ 1,
the relative topology lacks the required differentiable structure. However, the local projection maps do
provide the required structure. In this sense, one may say that the set of local projection maps provides a
kind of “relative differentiable structure” on a submanifold, analogous to the “relative topology” concept.
The use of an atlas to specify differentiable structure on an abstract manifold is clearly an abstraction of
this “relative differentiable structure” concept.

The adverb “locally” is applied to various adjectives describing properties of topological spaces. Examples are 
“locally compact” and “locally connected”. The term “locally Cartesian” follows the same pattern. In each 
case, the word “locally” has meaning only if a topology is defined. In the following table, the non-standard 
term “n-Cartesian” means “Cartesian with dimension n” for any n ∈ ℤ^+_0.


property               meaning

locally compact        ∀p ∈ X, ∃Ω ∈ Top_p(X), Ω is compact
locally connected      ∀p ∈ X, ∃Ω ∈ Top_p(X), Ω is connected
locally n-Cartesian    ∀p ∈ X, ∃Ω ∈ Top_p(X), ∃G ∈ Top(ℝ^n), Ω ≈ G


Unfortunately, this pattern cannot be followed in the case of a C^k differentiable locally Cartesian space
with k ≥ 1. Therefore differentiable manifolds cannot be defined as “locally C^k Cartesian spaces”. This is
because the homeomorphism relation “≈” cannot be extended to diffeomorphism relations without a prior
differentiable structure on the topological space X. This is why charts are required on X. (Probably the two
most annoying aspects of differential geometry are charts and tensors, although numerous other concepts
are strong competitors for this honour.)


In the absence of a pre-defined differentiable structure on a topological space X, a diffeomorphism condition
must be imposed on the transition maps between charts. So a C^k manifold does not fit into the pattern
of locally compact and locally connected spaces. In other words, there is no such thing as a “locally C^k
Cartesian space” if k ≥ 1. The transition map C^k diffeomorphism condition is a mechanism whereby
embedded manifolds can be generalised to intrinsic manifolds. It is the need for transition map rules which
sharply distinguishes differentiable manifolds from topological manifolds. This justifies the presentation of
topological and differentiable manifolds in two separate chapters, namely Chapters 47.9.1 and 51.
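
The transition map condition can be checked concretely in a simple case. The following Python sketch assumes the two stereographic charts on the unit sphere S^2 (projection from the north and south poles), which are a standard choice but are not taken from the text above. The names chart_north, chart_south, inverse_north and transition are hypothetical. On the overlap of the two chart domains, the transition map works out to u ↦ u/|u|^2, which is analytic away from the origin, so this two-chart atlas satisfies the diffeomorphism condition for every k.

```python
def chart_north(x):
    """Stereographic projection from the north pole (0, 0, 1); needs x3 != 1."""
    x1, x2, x3 = x
    return (x1 / (1.0 - x3), x2 / (1.0 - x3))


def chart_south(x):
    """Stereographic projection from the south pole (0, 0, -1); needs x3 != -1."""
    x1, x2, x3 = x
    return (x1 / (1.0 + x3), x2 / (1.0 + x3))


def inverse_north(u):
    """Inverse of chart_north: maps the plane back onto the sphere minus the north pole."""
    u1, u2 = u
    s = u1 * u1 + u2 * u2
    return (2.0 * u1 / (s + 1.0), 2.0 * u2 / (s + 1.0), (s - 1.0) / (s + 1.0))


def transition(u):
    """Transition map chart_south o inverse_north, defined for u != (0, 0)."""
    return chart_south(inverse_north(u))


# Compare the composed maps with the closed form u / |u|^2.
u = (0.5, 2.0)
s = u[0] ** 2 + u[1] ** 2
closed_form = (u[0] / s, u[1] / s)
print(transition(u), closed_form)
```

The point of the sketch is that the compatibility condition is a property of the composite map between open subsets of ℝ^2, so it can be verified without any prior differentiable structure on the sphere itself.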

In the case of topological manifolds, the topology alone is a sufficient structure to qualify the space as a 
manifold. In the case of differentiable manifolds, much more work is required in order to model the missing 
differentiable structure using charts and their transition maps. This is a burden which must be carried 
throughout the entire subject of differential geometry. 


49.2. Philosophy of locally Cartesian spaces 


49.2.1 REMARK: Cartesian charts are a modelling tool for real-world manifolds. 
Manifolds are sets with locally Cartesian structure. Figure 49.2.1 illustrates this idea. 


[Figure 49.2.1 Topological space S^2 without and with a chart.
Left panel: topological space without charts, S^2 = {x ∈ ℝ^3; |x| = 1}.
Right panel: topological space with chart ψ : x ↦ (arctan(x_1, x_2), arcsin(x_3)) for x ∈ S^2 with x_1 > 0 or x_2 ≠ 0.]


On the left in Figure 49.2.1 is the set of points on the surface of a sphere, shown here embedded in a Cartesian 
3-space coordinate grid for purposes of illustration. (A bare grey circle would not look much like a sphere.) 


On the right in Figure 49.2.1, the sphere S^2 is shown with a chart. Cartesian charts are intended to reflect
some kind of native structure on a point set. For example, if a continuous function or curve is defined on the 
point set, which is external to our pure mathematical model, that function or curve should also be continuous 
within our mathematical model, and vice versa. This is a modelling issue which cannot be resolved within 
pure mathematics. It is the modeller’s task to ensure that the model is accurate, or at least accurate enough 
for the purposes of the model. 
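
The chart in Figure 49.2.1 can be written out as a short computation. The following Python sketch is an illustration under stated assumptions: the two-argument arctangent arctan(x_1, x_2) is assumed to mean the angle of the point (x_1, x_2) in the plane, which corresponds to math.atan2(x2, x1) in Python, and the names psi and psi_inverse are hypothetical. The chart sends a point of the sphere to a (longitude, latitude) pair, and the inverse map recovers the point.

```python
import math


def psi(x):
    """Longitude/latitude chart on the unit sphere.

    Defined only where x1 > 0 or x2 != 0, which excludes the poles
    and the half-meridian of longitude pi.
    """
    x1, x2, x3 = x
    if not (x1 > 0.0 or x2 != 0.0):
        raise ValueError("point lies outside the chart domain")
    # Clamp x3 to [-1, 1] to guard against rounding error before arcsin.
    latitude = math.asin(max(-1.0, min(1.0, x3)))
    longitude = math.atan2(x2, x1)
    return (longitude, latitude)


def psi_inverse(longitude, latitude):
    """Map (longitude, latitude) back to a point of the unit sphere."""
    c = math.cos(latitude)
    return (c * math.cos(longitude), c * math.sin(longitude), math.sin(latitude))


# Round trip: coordinates -> point on the sphere -> coordinates.
point = psi_inverse(2.0, -0.5)
print(point, psi(point))
```

A single chart of this kind cannot cover the whole sphere, which is why the excluded half-meridian and poles must be handled by other charts.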


When spheres are encountered in the real world “in the wild”, they typically do not have coordinate charts. 
Before the concepts of latitude and longitude became established several hundred years ago, maps of the 
Earth had no latitude/longitude grids. Yet it was possible to clearly distinguish connected curves from 
disconnected curves, which suggests that there is a “native topology”. A locally Cartesian atlas is not just 
an atlas of charts which are compatible with each other. The charts must be compatible with the “native 
topology”. As a bonus, if all charts are compatible with the native topology, they will also be compatible 
with each other. Thus topological compatibility of charts on their overlaps (i.e. continuity of the transition 
maps) is a necessary condition, not a sufficient condition, for a valid atlas. 
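
That mutual compatibility of charts is not sufficient can be seen in a small example. The following sketch is not taken from the text above. The set M = ℝ with its usual topology as the "native topology", and the one-chart atlas A = {ψ}, are hypothetical choices made only for illustration.

```latex
% Hypothetical example: M = \mathbb{R} with its usual ("native") topology.
\[
  \psi : \mathbb{R} \to \mathbb{R}, \qquad
  \psi(x) =
  \begin{cases}
    1 & \text{if } x = 0, \\
    0 & \text{if } x = 1, \\
    x & \text{otherwise.}
  \end{cases}
\]
% The one-chart atlas A = \{\psi\} has a single transition map,
\[
  \psi \circ \psi^{-1} = \mathrm{id}_{\mathbb{R}},
\]
% which is continuous, so the charts of A are compatible with each other.
% But \psi is not continuous at x = 0 with respect to the usual topology:
\[
  \lim_{x \to 0,\ x \ne 0} \psi(x) = 0 \ne 1 = \psi(0).
\]
```

So this atlas passes the transition-map test, but it is not compatible with the native topology.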


In the special case of the sphere in Figure 49.2.1, there are no charts on the sphere on the left. The topological
(and differentiable) structures on $S^2$ are perfectly well defined via the induced metric on $S^2$ from the ambient
space $\mathbb{R}^3$, as indicated in Example 49.2.3. Therefore charts are not strictly necessary. However, the existence
of pairwise compatible local Cartesian charts does need to be demonstrated.


49.2.2 REMARK: Locally Cartesian space charts correspond to “observer frames of reference”. 
From the mathematical point of view, a topological manifold is a “patchwork” consisting of patches of 
Cartesian topological space stitched together so that the overlaps are consistent. Mathematics is mostly
performed within individual Cartesian charts, but the point set is the “real thing”. 


In practical applications, the patches typically correspond to different observer frames of reference. So the 
coordinate maps from the point set to observer reference frames are not so much part of the real system 
being observed as part of the process and procedures of measurement and observation of the real system. 
Manifolds permit multiple patches because the topology of the point set may not permit it to be covered by 
a single Cartesian coordinate patch, but even if the structure of a point-set is identical to the structure of a 
single patch of a Cartesian space, one would still want to permit multiple patches, each patch corresponding 
to a different observer frame of reference. 


49.2.3 EXAMPLE: A locally Cartesian topological space which can be defined without charts. 

The topology on the two-sphere $S^2 = \{x \in \mathbb{R}^3;\ |x| = 1\}$ may be defined without charts. A distance function
$d : S^2 \times S^2 \to \mathbb{R}_0^+$ may be defined by $d(x,y) = |x - y|_{\mathbb{R}^3} = (\sum_{i=1}^3 (x_i - y_i)^2)^{1/2}$ for all $x, y \in S^2$. Instead of this
chord-distance, one may use the geodesic distance $\bar{d} : S^2 \times S^2 \to \mathbb{R}_0^+$ defined by $\bar{d}(x,y) = 2 \arcsin(d(x,y)/2)$
for all $x, y \in S^2$. These metric functions induce the same topology on $S^2$. (See Section 37.5 for the topology
induced by a metric.) Although the topology can be defined without defining charts on $S^2$, these distance
function definitions do require a chart on $\mathbb{R}^3$. It is difficult to avoid using at least some kind of chart.


Projections onto the three axial planes demonstrate that $S^2$ is a locally Cartesian space.
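
The two distance functions and the axial projections of Example 49.2.3 can be sketched in a few lines of code. This is only an illustrative sketch, not part of the original text. The function names are hypothetical. It is assumed that each axial projection is restricted to an open hemisphere so that it is injective, which gives six charts in total.

```python
import math

def chord(x, y):
    """Chord distance |x - y| measured in the ambient 3-space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def geodesic(x, y):
    """Geodesic distance 2*arcsin(chord/2) on the unit sphere."""
    return 2.0 * math.asin(min(1.0, chord(x, y) / 2.0))

def axial_chart(axis, sign):
    """Projection chart on the open hemisphere where sign * x[axis] > 0.

    The coordinate with index `axis` is dropped (assumed construction).
    """
    def psi(x):
        if sign * x[axis] <= 0:
            raise ValueError("point lies outside this hemisphere")
        return tuple(c for i, c in enumerate(x) if i != axis)
    return psi

def axial_chart_inverse(axis, sign):
    """Recover the dropped coordinate from the constraint |x| = 1."""
    def psi_inv(u):
        missing = sign * math.sqrt(1.0 - u[0] ** 2 - u[1] ** 2)
        x = list(u)
        x.insert(axis, missing)
        return tuple(x)
    return psi_inv

def covering_charts(x):
    """List the (axis, sign) pairs whose hemisphere contains the point x."""
    return [(a, s) for a in range(3) for s in (1, -1) if s * x[a] > 0]

north, east = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(chord(north, east), geodesic(north, east))
print(covering_charts(north), covering_charts((0.6, 0.0, 0.8)))
```

Every point of the sphere has at least one non-zero coordinate, so at least one of the six hemisphere charts contains it. Both distance functions are computed from ambient coordinates, in line with the observation that some kind of chart on the ambient space is still used.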


49.2.4 REMARK: Charts are required for verification that a topological space is locally Cartesian. 
Example 49.2.3 shows that a locally Cartesian topological space $S^2$ can be defined without specifying charts
directly on the space itself, but charts are required in order to verify that the topology is locally Cartesian. 
To meet the requirements of a differentiable manifold, however, an atlas of charts must be provided because 
a locally Cartesian topological space has no specific differentiable structure in the absence of an atlas. Two 
atlases may specify two incompatible differentiable structures on the same underlying topological space. The 
relations between point sets, topologies and atlases are summarised in Figure 49.2.2. 


[Figure: diagram with boxes "topological space" and "manifold with atlas"]


Figure 49.2.2 Relations between point sets, topologies and atlases


The point set is uniquely determined by the topology, but the topology cannot be uniquely determined from 
the point set. Similarly, the topology is uniquely determined by a continuous atlas, but the atlas cannot 
be uniquely determined from the topology, although the maximal atlas consisting of all continuous locally 
Cartesian charts can be determined, in principle, from the topology. 


In practice, the topology on a set is a pure mathematical fiction. The set of all open sets on a point set is 
typically an enormous collection of sets which cannot be written down explicitly. In practice, a topology is 
typically specified by a point-to-point distance function of some kind, either on the point set or locally in 
a chart coordinate space. But an atlas, on the other hand, can be specified as a small finite set of explicit 
formulas for charts, or the inverses of charts, or constraints defining a submanifold. 


Real-world locally Cartesian spaces arise from simultaneous measurements of parameters. One does not 
measure the topology of any real-world set. Open sets are not found “in the wild”. To distinguish between
a closed interval and an open interval, infinite-precision instrumentation would be required. The topology 
of the real numbers is a pure abstraction which can never be observed in reality, whereas the coordinates of 
points in a locally Cartesian space can be measured, to some precision, in real-world observations. 


Thus an atlas is much more closely aligned with the practical realities of differential geometry. The main 
benefit of defining the underlying topology which is induced by a continuous atlas is that it permits all of 
the definitions, notations and theorems from general topology to be applied. 
Then again, one must also consider that without a topology, the notion of approximate measurement is
undefined. The concept of “approximate” requires either a metric or a topology (or something similar). 
Therefore inherent in the idea that real-number parameters of points can be measured in the “real world” 
is the assumption of some kind of underlying topology. If one is to suppose that one’s real-parameter 
measurements are a true reflection of the underlying structure of a physical system, one must assume that 
there is some kind of “glue” which attaches the measurement tuples to the points which are being measured. 
In this sense, it could be argued that a topology is part of the measurement process, and that the topology 
therefore deserves a foreground role in the definition of locally Cartesian spaces. 


It may be concluded, perhaps, that both the charts and the topology deserve equal standing in the definitions 
for locally Cartesian spaces. But there are two topologies: the topology on the point set and the topology in 
the coordinate space, and these two topologies should agree. Small variations in points and coordinate tuples 
should correspond to each other. The topology in the coordinate space is part of the standard mathematical 
machinery for Cartesian spaces. The topology on the point set is extra-mathematical in the case of real- 
world point-spaces. (In the case of embedded manifolds, the point-set topology is obtained from the ambient 
space.) Therefore the point-set topology may be mathematically defined only as the induced topology from 
an atlas. 


There is no way to specify the “locally Cartesian quality” of a point-set apart from either assuming a prior
topology which is locally homeomorphic to Cartesian patches, or else defining an atlas which induces a 
topology and locally Cartesian structure from the prior definitions in Cartesian space. This issue becomes 
even more difficult to resolve in the case of differentiable manifolds. 


The balance seems to be tipped in favour of the “native topology” style of specification of locally Cartesian 
spaces, including differentiable manifolds, by the fact that every chart is supposed to have an open domain. 
This only makes sense if the point-set has a topology. It is true that one may define an atlas in isolation 
from the point set as a “patchwork” as in Section 49.9, but even for an intrinsic space such as cosmological 
space-time, there is a psychological need for some mathematical set to represent points. (Both the point-set 
and topology of a patchwork of Cartesian patches are self-contained within an atlas.) If an explicit point 
set is part of the specification, it really seems most natural to specify that it should have its own topology, 
which is then used to ensure that all charts have open domains. Thus the default style of structure for a 
locally Cartesian space or manifold should include a topological space at its core, whether an atlas is part 
of the specification or not. (Other styles may also be defined.) 


An additional argument in favour of the topology being central is that the modern definition in terms of open 
sets is a high abstraction from the traditional definition of a metric, and a metric, i.e. a point-to-point distance 
function, is typically expressible as a formula of some kind. Therefore the abstract set-of-subsets concept of 
topology is not the concept which is typically employed in the practical specification of a topological space. 
Patches do need “glue” (i.e. homeomorphisms) to attach them to the point set. So it seems that the best 
kind of mathematical representation of manifolds should have a topological space at its core. This appears 
to agree with the majority view according to the literature sample in Table 49.2.3. 


49.2.5 REMARK: Cartesian charts offer “richer services” than the underlying topology. 

Since the underlying topology of a locally Cartesian space, which is induced by a locally Cartesian atlas, 
fully determines the set of all locally Cartesian charts and atlases which are compatible with that topology, 
it might be thought that the topology contains the same information as an atlas, and so one may freely 
choose to specify a locally Cartesian space as a topological space (which is a set with a topology), or as a 
set with an atlas. In principle this is true, but in practice, it is very much easier to compute the topology 
from an atlas than to compute an atlas from a given topology. 


Another consideration is the differing “services” provided by a topology and an atlas. The topology offers 
the following “services” for example. 

(1) Tests for which functions on the set are continuous. 

(2) Tests for which curves in the set are continuous. 

(3) Tests for which sets are open, closed, compact, connected, locally connected, and so forth. 

(4) Construction of the interior, exterior, closure and boundary of sets. 


Those “services” are the same as are offered by any topology. But a locally Cartesian atlas offers the following
additional services.

(1) Define coordinates for points, i.e., charts.

(2) Define the dimension of the space.

(3) Decompose function, curve and map continuity tests into tests with respect to individual coordinates.

(4) Define submanifolds, embeddings, immersions and submersions.

(5) Define regularity of submanifolds, embeddings, immersions and submersions.


year  reference                               core structure
1949  Synge/Schild [41], pages 3-4            set
1959  Willmore [42], pages 152, 193           topological space
1963  Auslander/MacKenzie [1], page 33        topological space
1963  Kobayashi/Nomizu [19], page 2           topological space
1964  Bishop/Crittenden [2], page 2           topological space
1965  Postnikov [33], pages 1-4               topological space
1968  Bishop/Goldberg [3], page 21            topological space
1968  Choquet-Bruhat [6], pages 5-6           topological space
1970  Spivak [37], Volume 1, page 1           topological space
1972  Malliavin [28], page 17                 set
1972  Sulanke/Wintgen [40], page 11           topological space
1975  Lovelock/Rund [27], pages 333-334       topological space
1977  Drechsler/Mayer [262], pages 12-13      set
1979  Do Carmo [9], page 2                    set
1980  Schutz [36], page 23                    topological space
1981  Bleecker [254], page 6                  set
1983  Nash/Sen [30], page 26                  topological space
1986  Crampin/Pirani [7], page 238            topological space
1987  Gallot/Hulin/Lafontaine [13], page 6    topological space
1988  Kay [18], page 195                      topological space
1993  Kosinski [21], page 1                   topological space
1994  Darling [8], pages 98-99                set
1995  O'Neill [295], page 2                   topological space
1997  Frankel [12], page 13                   topological space
1999  Lang [23], page 22                      set
2004  Szekeres [305], page 411                topological space
2015  Gómez-Ruiz [14], pages 15-17            topological space
      Kennington                              topological space

Table 49.2.3 Survey of manifold definition core structures


These “additional services” which are provided by an atlas are worth having. A large proportion of operations
on locally Cartesian spaces, apart from the purely topological operations, are expressed in terms of coordinate
charts. This provides an argument in favour of specifying a locally Cartesian space (and a topological
manifold) as a set-plus-atlas rather than a set-plus-topology.


Nevertheless, a locally Cartesian space is specified in Definition 49.4.7 as a set-plus-topology. This is not as 
irrational as it might seem. Many geometric and topological concepts are defined as point sets with some 
additional structures on them, even though the point sets are often abstract and extra-mathematical. In 
the same way, a topological space is a structure which is an abstraction underlying other structures, which 
those structures “point to” in some sense. Thus the topology is the abstraction, while an atlas is concrete.
The same thinking process underlies the abstraction of linear spaces in terms of a set of axioms. In practice,
a linear space is generally manipulated in terms of a basis and coordinates. The underlying vectors are 
generally an abstraction which is not directly useful. 


49.2.6 REMARK: The unnatural representation of locally Cartesian spaces by patches and charts. 
There is something discomforting about the need to use multiple Cartesian patches stitched together to 
represent locally Cartesian spaces. Since most physical manifolds do not have stitch-marks on them, it seems
unnatural to introduce such stitch-marks into the mathematical model. However, this is a consequence of the 
great familiarity of human beings with Cartesian spaces. If we had very much more familiarity with spherical 
topology, we would presumably use sphere-patches to model Cartesian spaces. Then we would remark that it 
is unnatural to use an infinite number of sphere-patches to cover an infinite Cartesian space. Thus what we 
do in defining manifolds is to leverage our prior strong familiarity with simple Cartesian topology to study 
other topologies. It is simply a matter of extending the tools which we already have in order to broaden 
their applicability. 

As mentioned in Remark 49.2.2, coordinate patches correspond to observer frames of reference. The entire 
surface of a sphere cannot be observed from a single external viewpoint. Consequently more than one patch 
is required. We accept that we cannot view the entire surface of a three-dimensional object from a single 
viewpoint. The mathematical definition of a manifold using charts is no more unnatural than that. 


49.2.7 REMARK: Patch-free differential geometry with spherical retinas. 

As a kind of thought-experiment, one might consider how a two-sphere could be viewed in a single patch. 
One could perhaps imagine a creature which views objects by swallowing them so as to use an internal 
“retina” which is homeomorphic to $S^2$, which is temporarily opened during swallowing to allow an object to
be placed inside the spherical viewing space. (Even the human eye has a blind spot through which the optic
nerve passes.) Then the spherical retina could simultaneously view all points on the surface of the swallowed
object. Such a creature would presumably have a notion of geometry which is based on $S^2$ topology without
the need to use patches as we do with our approximately flat oval-shaped retinas. A creature with $S^2$ vision
would presumably have a neural organisation to enable it to perceive an entire two-sphere surface in a single
observation without needing to mentally stitch together Cartesian patches from different viewpoints as we do.
It may be concluded that the patches in differential geometry are a consequence of the limitations of human
perception which cannot be removed because our neural organisation is adapted to Cartesian-patch retinas.
Therefore it is vain to pine for a mathematical formalism for differential geometry which is patch-free.


49.2.8 REMARK: Terminology: Topological manifolds versus locally Cartesian topological spaces. 

Table 49.2.4 outlines the naming scheme for locally Cartesian spaces and manifolds in this book. (The 
columns headed “sep” and “reg” indicate the separation and regularity classes respectively. The text in 
parentheses is implied if not stated. The parameter $k$ is assumed to be in $\mathbb{Z}^+$, but “$C^k$” may be replaced by
any other regularity class which is at least as strong as continuity.) 


name                                             structure             sep    reg

locally Cartesian (topological) space            topology: $(M, T_M)$         $C^0$
Hausdorff locally Cartesian (topological) space  topology: $(M, T_M)$  $T_2$  $C^0$
non-Hausdorff topological manifold               topology: $(M, T_M)$         $C^0$
(Hausdorff) topological manifold                 topology: $(M, T_M)$  $T_2$  $C^0$
non-Hausdorff $C^0$ (differentiable) manifold    atlas: $(M, A_M)$            $C^0$
(Hausdorff) $C^0$ (differentiable) manifold      atlas: $(M, A_M)$     $T_2$  $C^0$
non-Hausdorff $C^k$ (differentiable) manifold    atlas: $(M, A_M)$            $C^k$
(Hausdorff) $C^k$ (differentiable) manifold      atlas: $(M, A_M)$     $T_2$  $C^k$


Table 49.2.4 Naming scheme for locally Cartesian spaces and manifolds 


In this naming scheme, a “locally Cartesian (topological) space” means a topological space structure $(M, T_M)$
which happens to be everywhere locally Cartesian. The adjective “Hausdorff” specialises this to spaces which
have $T_2$ separation.


The term “topological manifold” is interpreted by some authors to mean a Hausdorff locally Cartesian
topological space $(M, T_M)$, while other authors define a “topological manifold” to mean a set/atlas pair
$(M, A_M)$ where the atlas $A_M$ induces a Hausdorff locally Cartesian topology on the set M. However, the
majority of authors do not define a “topological manifold” at all. Of those who do define it, a substantial
proportion interpret it as a topological space. This seems preferable because the atlas version already has a
name, which is a “$C^0$ (differentiable) manifold”. It seems sensible to include the atlas structure within the
scale of $C^k$ manifolds rather than give two names to the same structure.


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]


Concerning the terms “Hausdorff” and “non-Hausdorff”, it is generally assumed that a topological space
does not have a property unless it is stated. Therefore a “locally Cartesian (topological) space” is
non-Hausdorff, i.e. it is neither required to be Hausdorff nor required to be not-Hausdorff. Thus Hausdorff
separation is not implied unless it is explicitly stated.

In the case of manifolds, however, most authors assume that the topology (explicit or induced) is Hausdorff 
unless otherwise stated. Thus one may override the “default” for manifolds by stating that they are “non- 
Hausdorff”, which means that Hausdorff separation is not required although it is permitted. If a manifold 
is called “Hausdorff”, this is redundant, but reinforces the default in contexts where it could be in doubt. 


The not-Hausdorff locally Cartesian space examples in Section 49.5 give some idea of what is avoided when
the Hausdorff condition is enforced. In view of the huge range of “interesting” topologies which are
unleashed when Hausdorff is not enforced, it is perfectly understandable why most authors would want to
exclude such spaces. Hence “topological manifolds” are effectively those locally Cartesian spaces which are
embeddable in Cartesian spaces and are thus devoid of rampant bifurcations.


When a manifold has a regularity class $C^k$ for some $k \in \mathbb{Z}_0^+$, the adjective “differentiable” is superfluous.
The word “manifold” on its own is ambiguous, but the meaning is typically clear in particular contexts. 


It is perhaps also worth noting that under the above naming scheme, a “locally Cartesian (topological) space”
is exactly the same thing as a “non-Hausdorff topological manifold”, and a “Hausdorff locally Cartesian 
(topological) space” is exactly the same thing as a “(Hausdorff) topological manifold”. This is inefficient 
terminology, but does reflect a difference of emphasis which is present in the literature. 


When the term “topological space” is used, the implied context is general topology, where one generally 
lists constraints explicitly, and constraints which are not listed are not implied. This is reasonable because 
pathological spaces are the bread and butter of general topology. So there is no need to be overly cautious. 
In the differential geometry context, however, the focus of study is much more specific. Topologically 
pathological spaces are generally avoided because such academic curiosities are not the main interest. There 
are plenty of other things to worry about. Therefore in the differential geometry context, it makes sense to 
make “manifold” imply Hausdorff. 


Some authors include second countability as a necessary property of manifolds. This seems unduly restrictive
as a default. When the second countability property, or any other topological property, is required, it
should be stated explicitly instead of being hidden in the definition of a manifold. Books can be difficult
to interpret if they use non-standard definitions which are hidden in an introduction somewhere. (One
shouldn’t need to wade through 50 pages of introduction to discover what basic words such as “manifold”
mean.) Other plausible default properties for manifolds include connectivity, local connectivity, smoothness
(i.e. $C^\infty$ regularity), and paracompactness. These should be explicitly stated wherever they are needed.


49.2.9 REMARK: Origin of the term “manifold”. 
Riemann's 1854 habilitation thesis [230], page 3, used the word “Mannigfaltigkeit” (“manifold”) in reference 
to physical space. (See also Riemann [196], page 135.) However, he did not mention anything like an atlas. 


49.2.10 REMARK: Hermann Weyl’s three-storey metaphor for differential geometry structures. 

Weyl [310], page 104, writing in 1918-1922, strongly hinted at a three-layer model of differential geometry 
structure in the following passage which appears at the end of a philosophical discussion of the nature of 
physical space. 


Die gedanklichen Grundlagen sind gelegt, und wir dürfen jetzt nicht länger säumen, mit dem
systematischen Aufbau der »reinen Infinitesimalgeometrie« zu beginnen, der sich naturgemäß in 
drei Stockwerken vollziehen wird; vom jeder näheren Bestimmung baren Kontinuum über die affin 
zusammenhängende Mannigfaltigkeit zum metrischen Raum. 


This may be translated into English as follows. 
The conceptual foundations are laid, and we mustn’t tarry any longer now to begin with the 
systematic construction of the “pure infinitesimal geometry” which will be carried out, appropriately 
to its nature, in three storeys; from the continuum, bare of every qualification, via the affinely 
connected manifold to the metric space. 
Another passage (Weyl [310], page 78) makes it clear that Weyl’s “continuum” means a locally Cartesian
topological space. (The word “Raum” means “space”, but it also means a “room” as in a house. Probably 
this was intended humorously!) Thus Weyl was evidently proposing a “three-storey” model as follows. 


storey/floor                        structure
3  metric space                     Riemannian metric
2  affinely connected manifold      affine connection
1  continuum                        Cartesian charts


This is related to the five-layer model in Section 1.1. Weyl misses out the differential layer, although 
throughout his book, he does require sufficient differentiability of the chart transition maps for his purposes. 
Since not much can be done without differentiability (i.e. with only a pure Cartesian manifold “bare of every
qualification”), it is quite understandable that he does not split his “continuum” layer into a topological
continuum and differentiable continuum. 


The zeroth layer (a set without topology) is not defined by Weyl. This is also quite understandable because 
differential geometry really only starts when you get to charts in the “continuum” layer. What goes on in 
the zeroth layer “cellar” was quite rightly not Weyl’s concern in a book intended for physicists around 1920. 


49.2.11 REMARK: Alternative concepts of locally Cartesian spaces. 

Locally Cartesian topological spaces in the style of Definition 49.4.7 could be generalised to spaces which
are locally homeomorphic to open subsets of any topological space at all. This would yield a more general
category of patchwork spaces with patches taken from any given topological space. In particular, some
authors use general Banach spaces in place of $\mathbb{R}^n$. (See for example Lang [23], page 22.)


It could be argued that an affine space should be used instead of the n-tuple space $\mathbb{R}^n$ for the coordinate
space of manifolds. This is an appealing idea. There is no truly geometric role for the origin and axes
of $\mathbb{R}^n$. An affine space removes the origin and axes by defining the point space of the affine space to be an
abstract extra-mathematical set which has various correspondences with a linear space. (See Section 26.10 
for affine spaces over linear spaces.) In effect, an affine space is a kind of manifold with affinely related linear 
space charts. However, using an affine space as the coordinate space for a topological manifold would create 
two layers of atlases. The topological manifold would have charts valued in an affine space, which then 
has affinely related linear space charts. This would be excessively burdensome for a dubious philosophical 
benefit. In practice, the abstract point set of an affine space is defined itself as a linear space. This is because 
there is no explicit mathematical structure which has no origin or axes. Defining an affine space pretends 
that there is no origin or axes by making the point set abstract, but whenever it is made concrete, the origin 
and axes reappear. 


The important point here is not whether “real-world” manifolds have an origin and axes such as is imposed 
by a chart, but rather whether the human measurement of real-world manifolds has an origin and axes. The 
answer to this second question is “yes”. The mathematical models employed in physics and other sciences 
describe the interaction between observers and the real world, not the “real world” itself. As suggested by 
Figure 49.2.1, the real world is free of coordinates, but when one measures the real world, the coordinates 
inevitably appear. The true role of mathematics in science is to provide tools for describing measurements 
or “phenomena”, not for describing the “real world” itself. 


49.3. Non-topological charts and atlases 


49.3.1 REMARK: The value of non-topological charts and atlases. 

Non-topological topological spaces make very little sense, but non-topological charts and atlases are a
different matter. As in the case of non-topological fibrations and fibre bundles in Chapter 21, there are some
things which make perfect sense in the absence of a topology. Some aspects of charts and atlases are easier 
to understand when the focus does not have to be placed on topological technicalities. 


49.3.2 REMARK: Non-topological atlases. 
Definitions 49.3.3 and 49.3.4 introduce X-valued charts and atlases for a general set M, where X is any set.
These are related to the topological charts and atlases in Definitions 49.6.2 and 49.7.3, and the continuous
locally Cartesian atlases in Definition 49.8.2. No constraints are placed on the transition maps between
charts in Definition 49.3.4. (Some relations between these structures are illustrated in Figure 49.3.1.)


[Figure: diagram with boxes "non-topological atlas on a set", "locally Cartesian topological space", "locally Cartesian atlas on a set", "topological manifold", "topological manifold atlas on a set", "Cartesian topological space" and "Cartesian atlas on a set"]

Figure 49.3.1 Relations between manifolds and atlases on sets.

A non-topological atlas with chart space X for a set M is a set of X-valued injections from subsets of M 
to X such that the domains of these injections cover M. Theorem 49.3.6 gives some basic properties which 
can be applied or extended to more structured kinds of atlases. 


49.3.3 DEFINITION: A (non-topological) chart with chart space X on a set M is a bijection $\psi : U \to G$ for
some $U \in \mathbb{P}(M)$ and $G \in \mathbb{P}(X)$.


49.3.4 DEFINITION: A (non-topological) atlas with chart space X on a set M is a set A of charts with chart
space X for M such that $\bigcup_{\psi \in A} \mathrm{Dom}(\psi) = M$.


An indexed (non-topological) atlas with chart space X on a set M is a family $(\psi_i)_{i \in I}$ of charts with chart
space X for M such that $\bigcup_{i \in I} \mathrm{Dom}(\psi_i) = M$.
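
For finite sets, Definitions 49.3.3 and 49.3.4 can be checked mechanically. The following is a minimal sketch, not part of the original text. It assumes that a chart is stored as a Python dict from points of M to values in X. The names is_chart and is_atlas and the sample data are hypothetical.

```python
def is_chart(psi, M, X):
    """Check that the dict psi is an injection from a subset of M into X."""
    domain_ok = set(psi.keys()) <= set(M)
    range_ok = set(psi.values()) <= set(X)
    injective = len(set(psi.values())) == len(psi)
    return domain_ok and range_ok and injective

def is_atlas(A, M, X):
    """Check that every element of A is a chart and that the domains cover M."""
    covered = set().union(*(psi.keys() for psi in A)) if A else set()
    return all(is_chart(psi, M, X) for psi in A) and covered == set(M)

# Hypothetical sample data.
M = {"a", "b", "c", "d"}
X = {0, 1, 2}
psi1 = {"a": 0, "b": 1, "c": 2}
psi2 = {"c": 0, "d": 1}

print(is_atlas([psi1, psi2], M, X))   # the two domains together cover M
print(is_atlas([psi1], M, X))         # the point "d" is not covered
```

No test is applied to the overlap of the two chart domains, because a non-topological atlas places no constraint on transition maps.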


49.3.5 REMARK: Some basic properties of non-topological atlases. 

Theorem 49.3.6 is valid for general functions between sets which have non-topological atlases. The conclusion
is essentially obvious, namely that the images and pre-images of any function f between two sets may be 
reconstructed as the union of the corresponding images and pre-images via the charts. 


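49.3.6 THEOREM: Some basic properties of non-topological “manifold” atlases.
Let $A_1$ and $A_2$ be atlases with chart spaces $X_1$ and $X_2$ for sets $M_1$ and $M_2$ respectively.
(i) $\forall \psi_1 \in A_1,\ \psi_1^{-1} \circ \psi_1 = \mathrm{id}_{\mathrm{Dom}(\psi_1)}$.
(ii) $\forall \psi_1 \in A_1,\ \psi_1 \circ \psi_1^{-1} = \mathrm{id}_{\mathrm{Range}(\psi_1)}$.
(iii) $\bigcup_{\psi_1 \in A_1} \psi_1^{-1} \circ \psi_1 = \mathrm{id}_{M_1}$.
(iv) For any map $f : M_1 \to M_2$,
    $\forall S \in \mathbb{P}(M_1),\ f(S) = \bigcup_{\psi_1 \in A_1} \bigcup_{\psi_2 \in A_2} (\psi_2^{-1} \circ f_{\psi_1,\psi_2} \circ \psi_1)(S)$,
where $f_{\psi_1,\psi_2} = \psi_2 \circ f \circ \psi_1^{-1}$ for all $\psi_1 \in A_1$ and $\psi_2 \in A_2$.
(v) For any map $f : M_1 \to M_2$,
    $\forall S \in \mathbb{P}(M_2),\ f^{-1}(S) = \bigcup_{\psi_1 \in A_1} \bigcup_{\psi_2 \in A_2} (\psi_1^{-1} \circ f_{\psi_1,\psi_2}^{-1} \circ \psi_2)(S)$,
where $f_{\psi_1,\psi_2} = \psi_2 \circ f \circ \psi_1^{-1}$ for all $\psi_1 \in A_1$ and $\psi_2 \in A_2$.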


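PROOF: Part (i) follows from Theorem 9.6.19 (ii).

Part (ii) follows from Theorem 9.6.19 (iii).

Part (iii) follows from part (i) and Definition 49.3.4.

For part (iv), $\psi_2^{-1} \circ f_{\psi_1,\psi_2} \circ \psi_1 = (\psi_2^{-1} \circ \psi_2) \circ f \circ (\psi_1^{-1} \circ \psi_1)$ for all $\psi_1 \in A_1$ and $\psi_2 \in A_2$, and
$\psi_k^{-1} \circ \psi_k = \mathrm{id}_{U_k}$ with $U_k = \mathrm{Dom}(\psi_k)$ for $k = 1, 2$ by part (i). Therefore $\psi_2^{-1} \circ f_{\psi_1,\psi_2} \circ \psi_1 = \mathrm{id}_{U_2} \circ f \circ \mathrm{id}_{U_1}$.
Consequently $f = \bigcup_{\psi_1 \in A_1} \bigcup_{\psi_2 \in A_2} (\psi_2^{-1} \circ f_{\psi_1,\psi_2} \circ \psi_1)$ by Theorem 9.6.24 (v) and Definition 49.3.4. The
assertion follows.

For part (v), $\psi_1^{-1} \circ f_{\psi_1,\psi_2}^{-1} \circ \psi_2 = (\psi_1^{-1} \circ \psi_1) \circ f^{-1} \circ (\psi_2^{-1} \circ \psi_2)$ for $\psi_1 \in A_1$ and $\psi_2 \in A_2$, and $\psi_k^{-1} \circ \psi_k =
\mathrm{id}_{U_k}$ with $U_k = \mathrm{Dom}(\psi_k)$ and $\psi_k \in A_k$ for $k = 1, 2$ by part (i). So $\psi_1^{-1} \circ f_{\psi_1,\psi_2}^{-1} \circ \psi_2 = \mathrm{id}_{U_1} \circ f^{-1} \circ \mathrm{id}_{U_2}$.
Therefore $f^{-1} = \bigcup_{\psi_1 \in A_1} \bigcup_{\psi_2 \in A_2} (\psi_1^{-1} \circ f_{\psi_1,\psi_2}^{-1} \circ \psi_2)$ by Theorem 9.6.24 (v) and Definition 49.3.4.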
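
Parts (iv) and (v) of Theorem 49.3.6 can be spot-checked on small finite sets. The following sketch is only a numerical illustration, not a substitute for the proof. It assumes that charts and the map f are stored as Python dicts. All function names and the sample data are hypothetical.

```python
from itertools import chain, combinations

def image(fn, S):
    """Image of the set S under a finite partial function given as a dict."""
    return {fn[x] for x in S if x in fn}

def preimage(fn, S):
    """Pre-image of the set S under a finite partial function given as a dict."""
    return {x for x, y in fn.items() if y in S}

def inverse(psi):
    """Inverse of an injective dict."""
    return {v: k for k, v in psi.items()}

def local_rep(f, psi1, psi2):
    """Chart representative psi2 o f o psi1^{-1} as a partial function."""
    return {psi1[p]: psi2[f[p]] for p in psi1 if f[p] in psi2}

def image_via_charts(f, A1, A2, S):
    """Right-hand side of part (iv): union over all chart pairs."""
    out = set()
    for psi1 in A1:
        for psi2 in A2:
            g = local_rep(f, psi1, psi2)
            out |= image(inverse(psi2), image(g, image(psi1, S)))
    return out

def preimage_via_charts(f, A1, A2, S):
    """Right-hand side of part (v): union over all chart pairs."""
    out = set()
    for psi1 in A1:
        for psi2 in A2:
            g = local_rep(f, psi1, psi2)
            out |= image(inverse(psi1), preimage(g, image(psi2, S)))
    return out

def subsets(M):
    """All subsets of the finite set M, as tuples."""
    items = list(M)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

# Hypothetical sample data: two-chart atlases on M1 and M2, and a map f.
M1, M2 = {"a", "b", "c", "d"}, {"p", "q", "r"}
A1 = [{"a": 0, "b": 1, "c": 2}, {"c": 0, "d": 1}]
A2 = [{"p": 0, "q": 1}, {"q": 0, "r": 1}]
f = {"a": "p", "b": "q", "c": "q", "d": "r"}

ok_iv = all(image(f, set(S)) == image_via_charts(f, A1, A2, set(S)) for S in subsets(M1))
ok_v = all(preimage(f, set(S)) == preimage_via_charts(f, A1, A2, set(S)) for S in subsets(M2))
print(ok_iv, ok_v)
```

The map f in the sample data is not injective, so its inverse is a relation and not a function. For this reason part (v) is computed here with pre-images instead of an inverse dict.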


49.3.7 REMARK: Non-topological manifolds.

Despite the paucity of theorems for non-topological atlases, the concept of a “non-topological manifold” is
not entirely without value (even though it literally means a “non-topological topological space”). Riemann
briefly mentioned “discrete manifolds” (i.e. non-topological manifolds) in his 1854 thesis before proceeding to
the “continuous manifolds” which were his main subject. (See Riemann [196], page 135; Riemann/Weyl [230],
page 3.) Otherwise, non-topological or “discrete” manifolds are not much mentioned.


One may consider various generalisations of the continuity constraints which are imposed on the transition 
maps of topological manifolds. If the continuity constraints are abstracted as some kind of transformation 
semigroup or pseudogroup, a wide class of manifold-like structures can be defined. However, it is questionable 
whether these yield interesting applications. 


49.3.8 DEFINITION: A non-topological manifold with chart space X is a pair (M, A) for some set M and a 
non-topological atlas with chart space X for M. 


49.3.9 REMARK: Products of non-topological atlases. 
The direct product of atlases on direct products of sets in Definition 49.3.10 is of little value, except to show 
that it can be done. 


49.3.10 DEFINITION: The (direct) product atlas for sets $M_1$ and $M_2$ with atlases $A_1$ and $A_2$ respectively
is the set $\{\psi_1 \times \psi_2;\ \psi_1 \in A_1 \text{ and } \psi_2 \in A_2\}$.
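
The product atlas of Definition 49.3.10 is easy to construct for finite sets. The following sketch is not part of the original text. It assumes that the product of two charts is the usual direct product of functions, which maps a pair (p1, p2) to the pair of chart values. The function names and sample data are hypothetical.

```python
from itertools import product

def product_chart(psi1, psi2):
    """Direct product chart: (p1, p2) -> (psi1(p1), psi2(p2))."""
    return {(p1, p2): (psi1[p1], psi2[p2]) for p1, p2 in product(psi1, psi2)}

def product_atlas(A1, A2):
    """Set of all product charts, one for each pair of charts."""
    return [product_chart(psi1, psi2) for psi1 in A1 for psi2 in A2]

# Hypothetical sample data.
A1 = [{"a": 0, "b": 1}, {"b": 0, "c": 1}]   # atlas on M1 = {a, b, c}
A2 = [{"p": 0}, {"q": 0}]                   # atlas on M2 = {p, q}

A = product_atlas(A1, A2)
covered = set().union(*(chart.keys() for chart in A))
print(len(A), sorted(covered))
```

Each product chart is injective because both factors are injective, and the product chart domains cover the direct product of the two sets because the factor domains cover each set.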


49.3.11 REMARK: The fundamental mysteries of real-world geometry. 

One of the great mysteries of geometry is why space is three-dimensional, at least locally. Perhaps an 
even greater mystery is why space is locally Cartesian at all, in other words locally homeomorphic to $\mathbb{R}^n$
for some $n \in \mathbb{Z}^+_0$. The real numbers, as mentioned in various places in Chapter 15, constitute a quite
complicated system when studied in detail. So one might ask why this very complicated number system is 
the basis of physical space, why we use ordered triples to describe space, why space doesn't have any gaps 
or bubbles, and why space seems to be continuous and even differentiable in some sense. 


As mentioned in Remark 15.0.3, part of the answer to the mystery of spatial geometry might be the fact that 
our measurements of the real world are recorded as tuples of numbers, which we imagine can be extended to 
any precision, with no gaps or bubbles. But if the universe had been significantly different, presumably our 
measurements of the world would be significantly different. Possibly the real world does have gaps, bubbles, 
wormholes or other deviations from the locally Cartesian model. Possibly the universe is sometimes even 
fractional-dimensional or cannot be modelled by our current range of mathematical systems. But since our 
perceptions are funnelled through equipment which reports locations and other parameters as tuples of real 
numbers (to some precision), we might miss out on something. 


When formalising differential geometry, the question inevitably arises as to how much generality should be 
permitted in the definitions so that future applications are not precluded or discouraged. Definitions which 
are too general are a constant burden. If the real numbers are replaced by general fields, rings, groups or 
semigroups, and if finite integer dimensionality is replaced by possibly infinite or fractional dimensionality, 
and if differentiability is replaced by continuity, almost-everywhere continuity or some kind of measurability,
every definition, notation and theorem is burdened by extensions to spaces which will never be used, and 
in practical applications, they must be specialised so that ultimately no benefit is obtained. The best kinds 
of generalisations are those which are directly suggested by applications. Generalisation for its own sake, in 
anticipation of applications in future centuries, rarely yields any benefit because the required generalisations 
are typically straightforward to supply when needed, and the theoretical frameworks developed a century 
earlier are generally burdened by substantial irrelevancies which do more harm than good. It is often claimed 
that various kinds of mathematics were developed 50 or 100 or more years before they were needed for 
physics, but in the case of differential geometry, for example, it is not clear that 19th century mathematical 
developments materially assisted 20th century physics, and in fact the subject is even to this day split between
the pure geometry of manifolds immersed in Cartesian spaces on the one hand, and the less topologically 
interesting pseudo-Riemannian pseudo-geometry on the other. 


It may be concluded that there is no real harm in restricting manifolds to locally Cartesian spaces. As 
discussed in Section 49.5, the Hausdorff condition is a restriction on manifolds which could preclude some 
useful applications in physics, and this restriction is inherited from the pure geometry applications of the 
19th century which did not seriously envisage that intrinsic physical space could be different to the extrinsic 
spaces which then attracted interest. Other directions in which manifolds could be profitably extended 
include boundary complexity and unusual ways of stitching locally Cartesian patches together. To minimise 
the burden of excessive generality, the presentation here is limited to finite-dimensional locally Cartesian 
spaces without boundaries and with no unusual patch-stitching. 


49.4. Locally Cartesian topological spaces 


49.4.1 REMARK: The difference between “Euclidean spaces” and “Cartesian spaces”. 

The view is taken here that a “Euclidean space” has the standard Euclidean distance function, whereas 
a “Cartesian space” is a set of n-tuples of real numbers together with an optional implied topological or 
differentiable structure. Thus a “Cartesian space” has no distance function and no affine connection. A 
Cartesian space is a numerical, algebraic, topological or analytical concept, whereas a Euclidean space is a 
geometric concept. In the theory of manifolds, it is the topological or analytical Cartesian space which plays 
the role of coordinate space for charts. Nevertheless, the usual term for a locally Cartesian space is currently 
a “locally Euclidean space”. (This terminology issue is also discussed in Section 26.11.) 


49.4.2 REMARK: The product topology for Cartesian patches. 

Definition 49.4.3 extends Definition 32.6.3, which is the Cartesian tuple space $\mathbb{R}^n$ together with its standard
topology, by giving the name “Cartesian topological space” to any topological space which is homeomorphic
to it. Thus Definition 49.4.3 introduces a class of topological spaces which include the concrete canonical
example in Definition 32.6.3. All finite-dimensional linear spaces which possess the standard topology in
Definition 32.6.6 are also members of this class.


The standard topology on $\mathbb{R}^n$ is the product topology of its factors. (Product topologies are defined in
Sections 32.9 and 32.12. See Definition 32.5.7 for the topology on $\mathbb{R}$. See Definition 32.6.1 for the usual
topology on $\mathbb{R}^n$.)


49.4.3 DEFINITION: A Cartesian topological space is a topological space which is homeomorphic to the
Cartesian product set $\mathbb{R}^n$ for some $n \in \mathbb{Z}^+_0$ together with the usual product topology.


49.4.4 REMARK: Variable-dimension versus constant-dimension locally Cartesian topological spaces. 
Definition 49.4.5 lets each topological component of a locally Cartesian space have a different dimension. 
Such a definition has limited value. It is rarely used. Such spaces are best considered to be loose associations 
of the kind of constant-dimension locally Cartesian space in Definition 49.4.7. 


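49.4.5 DEFINITION: A variable-dimension locally Cartesian (topological) space is a topological space $X$
such that $\forall x \in X,\ \exists n \in \mathbb{Z}^+_0,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$.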


49.4.6 REMARK: By default, a locally Cartesian space has constant dimension. 
Life is much simpler when locally Cartesian spaces (and manifolds) have constant dimension by default. 
Variability of the dimension is best introduced only when a particular range of applications is envisaged. 


49.4.7 DEFINITION: A locally Cartesian (topological) space is a topological space X such that 


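$\exists n \in \mathbb{Z}^+_0,\ \forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),$
$\Omega \approx G.$


In other words,


$\exists n \in \mathbb{Z}^+_0,\ \forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \exists \psi : \Omega \to G,$
$\psi$ is a homeomorphism.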


49.4.8 REMARK: History of locally Cartesian spaces.
The local homeomorphism style of definition of a locally Cartesian space was used by Hermann Weyl in 
1913. (See Weyl [158], pages 16-18.) 


49.4.9 REMARK: Zero-dimensional locally Cartesian spaces. 
Zero-dimensional locally Cartesian spaces are not without interest. A zero-dimensional locally Cartesian 
space is any set with the discrete topology. In other words, it is an arbitrary set $M$ with the topology $\mathbb{P}(M)$.


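49.4.10 DEFINITION: The dimension of a locally Cartesian topological space $M$ is
(i) for $M \ne \emptyset$: the unique integer $n \in \mathbb{Z}^+_0$ such that $\forall p \in M,\ \exists \Omega \in \mathrm{Top}_p(M),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$;
(ii) for $M = \emptyset$: the integer $0 \in \mathbb{Z}^+_0$.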


49.4.11 NOTATION: dim(M) denotes the dimension of a locally Cartesian topological space M. 


49.4.12 DEFINITION: The coordinate space or chart space of a locally Cartesian space $M$ is the Cartesian
topological space $\mathbb{R}^n$, where $n = \dim(M)$.


49.4.13 REMARK: The dimension of the empty locally Cartesian space. 
The special case of the empty topological space $(\emptyset, \{\emptyset\})$ satisfies Definition 49.4.7 for all $n \in \mathbb{Z}^+_0$. For this
reason, Definition 49.4.10 chooses the ad-hoc dimension zero for the empty locally Cartesian space.


It may seem more reasonable to exclude the empty space from the definition of dimension for a locally 
Cartesian space. In fact, it would be reasonable to exclude the empty set from Definition 49.4.7. The empty 
set has all properties, as shown in Theorem 7.6.9. So it makes little sense to say that the empty set has any 
particular property.


On the other hand, one could say that the empty topology has all non-negative integer dimensions. In other 
words, the dimension is indeterminate like the value of $0/0$ because $0 = 0 \cdot n$ for all $n \in \mathbb{Z}^+_0$. An example of the
usefulness of such indeterminacy of the dimension of the empty topology is the observation that in general 
the intersection of two open subsets of an n-dimensional space is n-dimensional. If the intersection is empty, 
it would be convenient to be able to say that it is n-dimensional also. In fact, this is true if one applies 
part (i) of Definition 49.4.10 to all locally Cartesian spaces, whether they are empty or not, except that the 
words “the unique" would need to be replaced with “an”. 


Another way to resolve this issue is hinted at in Remark 8.8.5, where it is suggested that each mathematical 
object should have a class-tag. Then there could be a separate class for each dimension of locally Cartesian 
space, and the empty topology could belong to all of them, with a different class-tag within each class. Such 
an approach would be suitable for mathematical software, but presumably not in a book for humans! 


49.4.14 REMARK: Logical formula for the dimension of a locally Cartesian space. 
An explicit logical predicate for the dimension of a locally Cartesian topological space is as follows. 


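$(\dim(M) = n) \Leftrightarrow \bigl( (n = 0 \text{ and } M = \emptyset)$
$\text{ or } (n \in \mathbb{Z}^+_0 \text{ and } M \ne \emptyset \text{ and } \forall p \in M,\ \exists \Omega \in \mathrm{Top}_p(M),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G) \bigr).$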


This gives a unique value for dim(M) because a non-empty topological space cannot be locally homeomorphic 
to Cartesian spaces with different dimension. A non-unique value for $\dim(M)$ for non-empty $M$ would imply
that each point of M has overlapping neighbourhoods homeomorphic to open subsets of different Cartesian 
spaces, and by restricting these homeomorphisms to the intersection of the neighbourhoods, one would obtain 
a local homeomorphism between open subsets of two different Cartesian spaces. (Note that the Peano space- 
filling curve in Remark 36.3.1 does not contradict this argument because it is not a bijection, although it is 
continuous.) 


49.4.15 REMARK: The Cartesian topological spaces are locally Cartesian spaces. 

Definition 49.4.16 gives the name “Cartesian locally Cartesian space” to the Cartesian topological spaces
$\mathbb{R}^n$ in Definition 32.6.3. Since the structure of a locally Cartesian space is a topological space without
any atlas, there is no difference in the structures here. It is clear that the identity map on $\mathbb{R}^n$ provides a
locally Cartesian chart for all points of the set. Therefore every Cartesian topological space according to 
Definition 49.4.3 is also a locally Cartesian space. In particular, every finite-dimensional linear space, with 
the standard topology in Definition 32.6.6, is a locally Cartesian space. 


49.4.16 DEFINITION: The Cartesian locally Cartesian space with dimension $n \in \mathbb{Z}^+_0$ is the Cartesian
topological space $\mathbb{R}^n$ with its usual topology.


49.4.17 DEFINITION: The locally Cartesian space structure for a finite-dimensional linear space $F$ is the
topological space $(F, T_F)$, where $T_F$ is the standard topology for $F$ in Definition 32.6.6.


49.4.18 DEFINITION: The locally Cartesian space structure for the general linear group $\mathrm{GL}(F)$ of a
finite-dimensional linear space $F$ is the topological space $(\mathrm{GL}(F), T_{\mathrm{GL}(F)})$, where $T_{\mathrm{GL}(F)}$ is the standard topology
for $\mathrm{GL}(F)$ in Definition 32.6.8.


49.4.19 REMARK: Some basic topological properties of locally Cartesian spaces. 
A locally Cartesian space is both locally connected and locally compact. 


49.4.20 THEOREM: Local connectedness and local compactness of locally Cartesian spaces. 
(i) A locally Cartesian space is locally connected. 
(ii) A locally Cartesian space is locally compact. 


PROOF: For part (i), let $(X, T)$ be a locally Cartesian space. It then follows from Definition 49.4.7 that
$\exists n \in \mathbb{Z}^+_0,\ \forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$. Let $y = \phi(x)$, where $\phi : \Omega \to G$ is the
homeomorphism in Definition 49.4.7. Since $G$ is open in $\mathbb{R}^n$, there is an $r > 0$ such that $B_{y,r} \subseteq G$. Define
$\Omega' = \phi^{-1}(B_{y,r})$. Then $\Omega'$ is connected because $B_{y,r}$ is connected. Therefore $\Omega'$ satisfies the requirements of
Definition 34.7.3 for the local connectedness of $X$.


For part (ii), let $(X, T)$ be a locally Cartesian space. It then follows from Definition 49.4.7 that $\exists n \in \mathbb{Z}^+_0$,
$\forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$. Let $y = \phi(x)$, where $\phi : \Omega \to G$ is the homeomorphism
in Definition 49.4.7. Since $G$ is open in $\mathbb{R}^n$, there is an $r > 0$ such that $\bar{B}_{y,r} \subseteq G$. Define $\Omega' = \phi^{-1}(B_{y,r})$.
Then $\bar{\Omega}' = \phi^{-1}(\bar{B}_{y,r})$ because $\phi$ is a homeomorphism, and $\bar{\Omega}'$ is a compact subset of $(X, T)$ because $\bar{B}_{y,r}$ is a
compact subset of $\mathbb{R}^n$. Therefore $\Omega'$ satisfies the requirements of Definition 33.6.3 for the local compactness
of $X$.


49.4.21 REMARK: Topological separation classes for locally Cartesian spaces. 

Since Cartesian topological spaces are metrisable, they have very strong topological separation properties. 
(See Remark 33.3.37 and Figure 33.3.7 for metrisability.) It is therefore quite surprising that locally Cartesian 
spaces have very weak topological separation properties, especially considering that separation classes are 
apparently concerned only with local properties of the topology. Theorem 49.4.22 states that locally Cartesian 
spaces have the very weak $T_1$ separation property. (See Definition 33.1.8 for the $T_1$ separation class.)


49.4.22 THEOREM: Every locally Cartesian space is a $T_1$ space.
All locally Cartesian topological spaces have the $T_1$ topological separation property.


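PROOF: Let $X$ be a locally Cartesian topological space. Let $x_1, x_2 \in X$ with $x_1 \ne x_2$.
Suppose there is a homeomorphism $\psi : \Omega \to G$ for some $\Omega \in \mathrm{Top}(X)$ and $G \in \mathrm{Top}(\mathbb{R}^n)$, for
some $n \in \mathbb{Z}^+_0$, such that $x_1, x_2 \in \Omega$. Let $y_i = \psi(x_i)$ for $i = 1, 2$. Then by the $T_1$ property of $\mathbb{R}^n$, there exist open sets $G_i \in \mathrm{Top}_{y_i}(\mathbb{R}^n)$
such that $y_1 \notin G_2$ and $y_2 \notin G_1$. Since $\psi$ is a homeomorphism, it follows that $\psi^{-1}(G_i) \in \mathrm{Top}_{x_i}(X)$ for $i = 1, 2$,
and $x_1 \notin \psi^{-1}(G_2)$ and $x_2 \notin \psi^{-1}(G_1)$. So $x_1$ and $x_2$ are separated in $\mathrm{Top}(X)$ as required for the $T_1$ property.


Now suppose that for all $n \in \mathbb{Z}^+_0$, there is no homeomorphism $\psi : \Omega \to G$ with $\Omega \in \mathrm{Top}(X)$ and $G \in \mathrm{Top}(\mathbb{R}^n)$
such that $x_1, x_2 \in \Omega$. By Definition 49.4.7, for some $n \in \mathbb{Z}^+_0$ there are homeomorphisms $\psi_i : \Omega_i \to G_i$ with
$\Omega_i \in \mathrm{Top}_{x_i}(X)$ and $G_i \in \mathrm{Top}(\mathbb{R}^n)$ for $i = 1, 2$. Let $y_i = \psi_i(x_i)$ for $i = 1, 2$. Since $y_1 \in \mathbb{R}^n$ has a neighbourhood $G_1' \in \mathrm{Top}_{y_1}(\mathbb{R}^n)$ with $G_1' \subseteq G_1$, it
follows that $\Omega_1' = \psi_1^{-1}(G_1') \in \mathrm{Top}_{x_1}(X)$ because $\psi_1$ is a homeomorphism. Suppose that $x_2 \in \Omega_1'$. Then
$\psi_1$ restricted to $\Omega_1'$ is a homeomorphism between $\Omega_1'$ and $G_1'$ such that $x_1, x_2 \in \Omega_1'$, which contradicts the
assumption. Therefore $x_2 \notin \Omega_1'$. Similarly there exists a neighbourhood $\Omega_2' \in \mathrm{Top}_{x_2}(X)$ such that $x_1 \notin \Omega_2'$.
Hence $X$ has the $T_1$ property by Definition 33.1.8.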


49.4.23 REMARK: Direct product of locally Cartesian topological spaces. 

The direct product of two locally Cartesian spaces is fully specified by Definition 32.9.4 for general topological
spaces because a locally Cartesian space is a particular kind of topological space without any extra structure. 
Theorem 49.4.24 asserts that this product space is also locally Cartesian. This holds by induction for any 
direct product of a finite number of locally Cartesian spaces. 


The fact that the dimension of the product of two locally Cartesian spaces is not equal to the sum of their
dimensions if one of the sets is empty while the other has positive dimension suggests that it is preferable 
to exclude empty locally Cartesian spaces in many situations. As mentioned in Remark 49.4.9, the empty 
set could be considered to have any non-negative integer dimension. If this point of view were accepted, the 
simple dimension additivity rule for products of locally Cartesian spaces would in fact be generally valid. 
Definition 49.4.10 specifies zero as the dimension of an empty locally Cartesian space so that “$\dim(M)$”
will always have a unique non-negative integer value. It would be more truthful to call the dimension of 
the empty set “any non-negative integer”, but this would require the dimension to be defined as a set of 
non-negative integers, which would create technical nuisances that far outweigh the benefits of truthfulness. 


49.4.24 THEOREM: Dimension of direct product of locally Cartesian spaces is the sum of the dimensions.
Let $M_1$ and $M_2$ be locally Cartesian topological spaces. The direct product of $M_1$ and $M_2$ is a
locally Cartesian topological space $M$ with $\dim(M) = \dim(M_1) + \dim(M_2)$ if $M_1$ and $M_2$ are both non-empty,
and $\dim(M) = 0$ otherwise.


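PROOF: If $M_1 = \emptyset$ or $M_2 = \emptyset$, then $M = \emptyset$ and $\dim(M) = 0$. So assume that $M_1 \ne \emptyset$ and $M_2 \ne \emptyset$. Then
$M_1 \times M_2 \ne \emptyset$ by Theorem 9.4.6 (i).


Let $n_1 = \dim(M_1)$, $n_2 = \dim(M_2)$ and $n = n_1 + n_2$. Then $n_1, n_2, n \in \mathbb{Z}^+_0$. Let $p \in M$. Then $p = (p_1, p_2)$
for some (unique) $p_1 \in M_1$ and $p_2 \in M_2$. By Definition 49.4.7, there are homeomorphisms $\psi_k : \Omega_k \to G_k$
for some $\Omega_k \in \mathrm{Top}_{p_k}(M_k)$ and $G_k \in \mathrm{Top}(\mathbb{R}^{n_k})$ for $k = 1, 2$. Let $\Omega = \Omega_1 \times \Omega_2$ and $G = G_1 \times G_2$. Then
$\Omega \in \mathrm{Top}(M_1 \times M_2)$ and $G \in \mathrm{Top}(\mathbb{R}^{n_1} \times \mathbb{R}^{n_2})$ by Theorem 32.9.6 (ii). So $\Omega \in \mathrm{Top}_p(M)$ since $(p_1, p_2) \in \Omega_1 \times \Omega_2$,
and $G \in \mathrm{Top}(\mathbb{R}^{n_1 + n_2}) = \mathrm{Top}(\mathbb{R}^n)$ by the standard identification of $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ with $\mathbb{R}^{n_1 + n_2}$. Let $\psi = \psi_1 \times \psi_2$
as in Definition 10.14.3. Then $\psi : \Omega \to G$ is a homeomorphism by Theorem 32.9.10 (ii). Hence $M$ is a locally
Cartesian space by Definition 49.4.7, and $\dim(M) = \dim(M_1) + \dim(M_2)$.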
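
The “standard identification” of $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ with $\mathbb{R}^{n_1 + n_2}$ used in this proof amounts to concatenating coordinate tuples. The following is a minimal sketch under that assumption. The function name `product_chart` and the two sample charts are hypothetical illustrations, not constructions from the text.

```python
def product_chart(psi1, psi2):
    """Return psi1 x psi2 with values flattened into R^(n1+n2) by tuple concatenation."""
    def psi(p):
        p1, p2 = p
        return tuple(psi1(p1)) + tuple(psi2(p2))
    return psi

# Illustrative charts (assumptions): a 1-dimensional chart and a 2-dimensional chart.
psi1 = lambda t: (2.0 * t,)              # n1 = 1
psi2 = lambda q: (q[0], q[1] + 1.0)      # n2 = 2

psi = product_chart(psi1, psi2)
coords = psi((0.5, (3.0, 4.0)))          # a point (p1, p2) of the product patch
```

The number of coordinates returned by the product chart is $n_1 + n_2$, which is the dimension additivity asserted by the theorem for non-empty spaces.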


49.5. Non-Hausdorff locally Cartesian spaces 


49.5.1 REMARK: One person's poison may be another person's paradise. 

The class of not-Hausdorff locally Cartesian spaces may be regarded as a chamber of horrors or a paradise
of opportunities depending on one's point of view. A pure mathematical geometer classifying manifolds 
might want to relegate them to the pathology lab. But as outlined in Remark 49.5.10, non-Hausdorff locally 
Cartesian spaces open up some intriguing possibilities for extending the concepts of space and space-time to 
incorporate fibre bundles and fields. 


49.5.2 REMARK: The Hausdorff property for locally Cartesian spaces. 
The Hausdorff property is not implied by Definition 49.4.7. This is demonstrated by Example 49.5.5. 
Lang [23], page 23, says the following about the definition of a $C^k$ atlas for a manifold $X$ for $k \in \mathbb{Z}^+_0$:


We see no reason to assume that X is Hausdorff. If we wanted X to be Hausdorff, we would have 
to place a separation condition on the covering. This plays no role in the formal development in 
Chapters II and III. It is to be understood, however, that any construction which we perform (like 
products, tangent bundles, etc.) would yield Hausdorff spaces if we start with Hausdorff spaces. 


Thus some authors define “locally Euclidean” (i.e. locally Cartesian) to require Hausdorff, whereas some
other authors define “manifold” to not require Hausdorff. However, the majority of authors define manifolds
to be Hausdorff, and they typically define “locally Euclidean spaces” to not require Hausdorff.


Some authors require all general topologies to be Hausdorff, which is entirely understandable because the 
failure of the Hausdorff property leads to contradictions to intuitive expectations of topological properties. 


49.5.3 REMARK: A locally Cartesian space with two non-separable points. 

In Example 49.5.5 the quotient topological space formed by identifying two real lines at all points except 
one is locally Cartesian but not Hausdorff. This may be thought of as “the real line with two origins”. (This
is illustrated in Figure 49.5.1.) 


While one chart covers all of the real line including the upper origin (but excluding the lower origin), the 
other chart covers all of the real line including the lower origin (but excluding the upper origin). 


[Diagram: the real line $\mathbb{R}$ with an upper origin and a lower origin at 0.]


Figure 49.5.1 The real line with two origins


49.5.4 REMARK: Non-metrisability of non-Hausdorff locally Cartesian spaces. 

As mentioned in Remark 49.4.21, the metrisability of a topological space implies the Hausdorff property. 
Therefore locally Cartesian spaces which are not Hausdorff are not metrisable. (See Remark 33.3.37 and 
Figure 33.3.7 for metrisability.) 


It may seem at first sight that the Euclidean metric may be directly imported onto the locally Cartesian 
space in Figure 49.5.1 and Example 49.5.5, but the distance between the two “origins” would then equal 
zero, which contradicts the strict positivity condition for a distance function. (See Definition 37.1.2 (i).) So 
one could perhaps settle for second-best and say that it is a pseudo-metric. (See Remark 37.1.9.) However, 
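not even the conditions for a pseudo-metric are met. Let $B_{0_0,\varepsilon}$ and $B_{0_1,\varepsilon}$ be balls with radius $\varepsilon$ around the
two origins, $0_0$ and $0_1$. Then $0_0 \notin B_{0_1,\varepsilon}$ and $0_1 \notin B_{0_0,\varepsilon}$ for some $\varepsilon \in \mathbb{R}^+$ by the $T_1$ property. Therefore
$d(0_0, 0_1) \ge \varepsilon > 0$ for some $\varepsilon \in \mathbb{R}^+$. But by the triangle inequality, $d(0_0, 0_1) < 2\varepsilon$ for all $\varepsilon \in \mathbb{R}^+$. (This can
be seen by considering that $B_{0_0,\varepsilon} \cap B_{0_1,\varepsilon} \ne \emptyset$ for all $\varepsilon \in \mathbb{R}^+$.) So $d(0_0, 0_1) = 0$, which is a contradiction. (In
fact, any pseudo-metric on a $T_0$ space is necessarily a metric. See Willard [165], page 85; Gemignani [80],
page 93.)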
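
The triangle-inequality step in this argument can be spelled out as follows. Here $z_\varepsilon$ is a symbol introduced only for this illustration. It denotes any point in the intersection of the two balls of radius $\varepsilon$ around the two origins.

```latex
\begin{align*}
z_\varepsilon \in B_{0_0,\varepsilon} \cap B_{0_1,\varepsilon}
  &\;\Longrightarrow\; d(0_0, z_\varepsilon) < \varepsilon
     \text{ and } d(z_\varepsilon, 0_1) < \varepsilon \\
  &\;\Longrightarrow\; d(0_0, 0_1) \le d(0_0, z_\varepsilon) + d(z_\varepsilon, 0_1) < 2\varepsilon .
\end{align*}
```

Since such a point exists for every $\varepsilon \in \mathbb{R}^+$, the non-negative number $d(0_0, 0_1)$ is smaller than every positive number, and so it must equal zero.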


It follows from the non-metrisability of locally Cartesian spaces which are not Hausdorff that such spaces 
cannot support a Riemannian metric tensor field. This makes such spaces of limited utility in differential 
geometry. Of course, one may generalise the concept of a Riemannian metric so that each chart has its own 
metric field, but then one must re-work much of the theory of Riemannian manifolds. 


49.5.5 EXAMPLE: The real line with two origins. 

An example of a topological space which is a not-Hausdorff locally Cartesian space may be constructed as
follows. Define $X = \mathbb{R} \times \{0, 1\}$ to have the relative topology from $\mathbb{R}^2$. Define an equivalence relation $R$ on
$X$ so that $(x_1, y_1) \mathrel{R} (x_2, y_2)$ whenever $x_1 = x_2 \ne 0$ or $(x_1, y_1) = (x_2, y_2)$. Let $Y = X/R$ have the standard
quotient topology. (See Definition 32.13.9.) The set $Y$ may be identified with the set $Y' = (\mathbb{R} \times \{0\}) \cup \{(0, 1)\}$.
Clearly the two points $(0, 0)$ and $(0, 1)$ cannot be separated by disjoint open neighbourhoods. So $Y'$ is not Hausdorff although it is $T_1$.


Define a locally Cartesian atlas on $Y'$ with charts $\phi_i : U_i \to \mathbb{R}$ for $i = 0, 1$, where $U_0 = \mathbb{R} \times \{0\}$, $\phi_0 : (x, 0) \mapsto x$
for $x \in \mathbb{R}$, $U_1 = ((\mathbb{R} \setminus \{0\}) \times \{0\}) \cup \{(0, 1)\}$, and $\phi_1 : (x, y) \mapsto x$ for $(x, y) \in U_1$. (See Figure 49.5.2.)


[Diagram: the identification map from $X = \mathbb{R} \times \{0, 1\}$ onto $Y'$, and the charts $\phi_0$ and $\phi_1$ from $Y'$ to $\mathbb{R}$.]


Figure 49.5.2 Non-Hausdorff locally Cartesian space example 49.5.5


Then Definition 49.4.7 for a locally Cartesian space is satisfied, but $Y'$ is not Hausdorff.
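
The failure of the Hausdorff property in this example can be checked by a small computation. The following is a minimal sketch which assumes that points of $Y'$ are represented as pairs $(x, y)$ and that basic neighbourhoods of the two origins are the chart pre-images of intervals $(-\varepsilon, \varepsilon)$. The function names are hypothetical and do not come from the text.

```python
def chart_inverse(i, x):
    """Inverse of the chart phi_i: the point of Y' with coordinate x in chart i."""
    if x != 0:
        return (x, 0)      # away from the origin, both charts agree
    return (0, i)          # chart 0 sees (0, 0); chart 1 sees (0, 1)

def in_basic_neighbourhood(i, eps, point):
    """Test whether point lies in phi_i^{-1}((-eps, eps)), a basic open set around origin i."""
    x, y = point
    if x == 0:
        return y == i      # only origin i belongs to the domain of chart i
    return y == 0 and abs(x) < eps

def common_point(eps0, eps1):
    """Return a point lying in both basic neighbourhoods of the two origins."""
    return chart_inverse(0, min(eps0, eps1) / 2)
```

For any positive radii, `common_point` exhibits a point shared by the two neighbourhoods, so the origins have no disjoint neighbourhoods. On the other hand, each origin has a neighbourhood which excludes the other origin, which is the $T_1$ property.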


49.5.6 REMARK: The real line with two closed unit intervals. 
Example 49.5.5 may be extended to multiple isolated “double points” on the real line. The number of double
points could be infinite. It is only slightly troubling that a universe which seems to be everywhere locally
Cartesian could in fact be hiding “double points” which are in one chart but not in another chart which
seems to cover the same area. Perhaps more troubling is the possibility illustrated in Figure 49.5.3.


Whole intervals of the real line could be “doubled” while the topology remains Cartesian in a neighbourhood
of each point. Theorem 49.5.7 shows that such a locally Cartesian space may be defined for an arbitrary
closed set $S$ of “double points” in a Cartesian space $\mathbb{R}^n$ for any $n \in \mathbb{Z}^+_0$. If $S \ne \emptyset$ and $S \ne \mathbb{R}^n$, the set
$S$ will have boundary points which cause the topology to be non-Hausdorff. (The case $S = [0, 1] \subseteq \mathbb{R}^1$ is
illustrated in Figure 49.5.3.)


[Diagram: the real line $\mathbb{R}$ with an upper and a lower copy of the closed unit interval $[0, 1]$, with end-points labelled upper 0, upper 1, lower 0 and lower 1.]


Figure 49.5.3 The real line with two closed unit intervals


49.5.7 THEOREM: Cartesian space with an arbitrary closed set of “double points”. 

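Let $n \in \mathbb{Z}^+_0$ and let $S$ be a closed subset of $\mathbb{R}^n$. Define $X = (\mathbb{R}^n \times \{0\}) \cup (S \times \{1\})$. Let $U_0 = \mathbb{R}^n \times \{0\}$.
Define $\psi_0 : U_0 \to \mathbb{R}^n$ by $\psi_0((x, 0)) = x$ for all $x \in \mathbb{R}^n$. Let $U_1 = ((\mathbb{R}^n \setminus S) \times \{0\}) \cup (S \times \{1\})$. Define
$\psi_1 : U_1 \to \mathbb{R}^n$ by $\psi_1((x, y)) = x$ for all $(x, y) \in U_1$. Define


$T_X = \{\Omega_0 \cup \Omega_1;\ \Omega_0 \subseteq U_0,\ \Omega_1 \subseteq U_1,\ \psi_0(\Omega_0) \in \mathrm{Top}(\mathbb{R}^n),\ \psi_1(\Omega_1) \in \mathrm{Top}(\mathbb{R}^n)\}.$


(i) $(X, T_X)$ is a topological space.
(ii) $(X, T_X)$ is a locally Cartesian topological space.
(iii) If $S \ne \emptyset$ and $S \ne \mathbb{R}^n$, then $(X, T_X)$ is not a Hausdorff topological space.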


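PROOF: For part (i), let $T_0 = \{\psi_0^{-1}(G_0);\ G_0 \in \mathrm{Top}(\mathbb{R}^n)\}$ and $T_1 = \{\psi_1^{-1}(G_1);\ G_1 \in \mathrm{Top}(\mathbb{R}^n)\}$. Then $(U_0, T_0)$ and
$(U_1, T_1)$ are topological spaces because $\psi_0$ and $\psi_1$ are bijections. Let $\Omega_0 = \psi_0^{-1}(G_0) \in T_0$. Then $\Omega_0 \cap U_1 =
\Omega_0 \cap ((\mathbb{R}^n \setminus S) \times \{0\}) = (G_0 \setminus S) \times \{0\} = \psi_0^{-1}(G_0 \setminus S) \in T_0$ because $S$ is closed. Similarly, let $\Omega_1 = \psi_1^{-1}(G_1) \in T_1$.
Then $\Omega_1 \subseteq ((\mathbb{R}^n \setminus S) \times \{0\}) \cup (S \times \{1\})$. So $\Omega_1 \cap U_0 = \Omega_1 \cap (\mathbb{R}^n \times \{0\}) = (G_1 \setminus S) \times \{0\} = \psi_1^{-1}(G_1 \setminus S) \in T_1$
because $S$ is closed. Hence by Theorem 32.14.5, $T_X$ is a topology on $X$.

Part (ii) follows from Definition 49.4.7 and the observation that $U_0$ and $U_1$ are open subsets in the topological
space $(X, T_X)$ because $\psi_0(U_0) = \mathbb{R}^n \in \mathrm{Top}(\mathbb{R}^n)$ and $\psi_1(U_1) = \mathbb{R}^n \in \mathrm{Top}(\mathbb{R}^n)$, and $X = U_0 \cup U_1$.

For part (iii), assume that $S \ne \emptyset$ and $S \ne \mathbb{R}^n$. Then $\mathrm{Bdy}(S) \ne \emptyset$ by Theorem 34.1.13 because $\mathbb{R}^n$ is a
connected topological space, and $\mathrm{Bdy}(S) \subseteq S$ by Theorem 31.9.10 (xxvi) because $S$ is closed. Let $x \in \mathrm{Bdy}(S)$.
Then $x \in S$, and so $\psi_0^{-1}(x) = (x, 0)$ and $\psi_1^{-1}(x) = (x, 1) \ne \psi_0^{-1}(x)$. Let $y_i = \psi_i^{-1}(x)$ for $i = 0, 1$. Let
$\Omega_i \in \mathrm{Top}_{y_i}(X)$ and $G_i = \psi_i(\Omega_i)$ for $i = 0, 1$. Then $G_0, G_1 \in \mathrm{Top}_x(\mathbb{R}^n)$. Let $G = G_0 \cap G_1$. Then
$G \in \mathrm{Top}_x(\mathbb{R}^n)$, and $G \setminus S \ne \emptyset$ because $\mathrm{Bdy}(S) \cap \mathrm{Int}(S) = \emptyset$ by Theorem 31.9.10 (i), since $x \in \mathrm{Bdy}(S)$. But
$\psi_0^{-1}(G \setminus S) = \psi_1^{-1}(G \setminus S)$. So $\Omega_0 \cap \Omega_1 \supseteq \psi_0^{-1}(G_0) \cap \psi_1^{-1}(G_1) \supseteq \psi_0^{-1}(G \setminus S) \cap \psi_1^{-1}(G \setminus S) = \psi_0^{-1}(G \setminus S) \ne \emptyset$.
Since this is true for all $\Omega_i \in \mathrm{Top}_{y_i}(X)$ for $i = 0, 1$, it follows that $X$ is not Hausdorff.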
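
To make part (iii) concrete, consider the case $S = [0, 1] \subseteq \mathbb{R}$ of Figure 49.5.3 with the boundary point $x = 0$, so that $y_0 = (0, 0)$ and $y_1 = (0, 1)$. In the following sketch, the radii $\delta_0$, $\delta_1$ and $\delta$ are symbols introduced here for illustration, and $\delta_1 \le 1$ is assumed. It computes the overlap of basic neighbourhoods of the two copies of 0.

```latex
\begin{align*}
\Omega_0 &= \psi_0^{-1}((-\delta_0, \delta_0)) = (-\delta_0, \delta_0) \times \{0\}, \\
\Omega_1 &= \psi_1^{-1}((-\delta_1, \delta_1))
          = ((-\delta_1, 0) \times \{0\}) \cup ([0, \delta_1) \times \{1\}), \\
\Omega_0 \cap \Omega_1 &= (-\delta, 0) \times \{0\} \ne \emptyset,
  \qquad \delta = \min(\delta_0, \delta_1).
\end{align*}
```

Every neighbourhood of $y_0$ or $y_1$ contains a basic neighbourhood of this form, so the two copies of 0 cannot be separated by disjoint open sets.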


49.5.8 REMARK: Consequences of modelling reality with non-Hausdorff locally Cartesian spaces. 

The Hausdorff condition is imposed on topological manifolds in Section 50.1 to prevent bifurcations of the 
kind which are shown to be possible in locally Cartesian spaces by Theorem 49.5.7. The Hausdorff condition 
is somewhat embarrassing because it is not imposed on individual charts, but rather on the topology which 
results from the combination of many charts. If the condition is not imposed, a huge variety of possibilities 
is opened up. Since charts correspond in physical models to the reference frames of different observers, this 
raises the possibility not only of observer-dependent observations, but also of observer-dependent universes. 
In other words, different observers could follow different bifurcations of the total universe and find themselves 
no longer on the same sheet. Or two observers could briefly follow different sheets and return to the same 
sheet later. In the context of the differential geometry of space-time, this raises a very broad range of 
possibilities. Therefore one might perhaps ask whether the Hausdorff condition excludes wide ranges of 
models which could be useful. 


49.5.9 EXAMPLE: Multiple real lines with a “switching matrix” at the origin.
Let $X = ((\mathbb{R} \setminus \{0\}) \times F) \cup (\{0\} \times F \times F)$ for some non-empty set $F$. For $i, j \in F$, define


$U_{i,j} = (\mathbb{R}^- \times \{i\}) \cup (\{0\} \times \{i\} \times \{j\}) \cup (\mathbb{R}^+ \times \{j\}).$


Then $\bigcup_{i,j \in F} U_{i,j} = X$. (The case $F = \{0, 1\}$ is illustrated in Figure 49.5.4.)


[Diagram: the left upper and left lower real lines $\mathbb{R}^- \times \{1\}$ and $\mathbb{R}^- \times \{0\}$ are joined to the right upper and right lower real lines $\mathbb{R}^+ \times \{1\}$ and $\mathbb{R}^+ \times \{0\}$ through the four origins $(0, 1, 1)$, $(0, 1, 0)$, $(0, 0, 0)$ and $(0, 0, 1)$.]


Figure 49.5.4 A double real line with four origins


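Define $\psi_{i,j} : U_{i,j} \to \mathbb{R}$ for $i, j \in F$ by


$\forall i, j \in F,\ \forall x \in \mathbb{R}^-,\quad \psi_{i,j}(x, i) = x$
$\forall i, j \in F,\ \forall x \in \{0\},\quad \psi_{i,j}(x, i, j) = x$
$\forall i, j \in F,\ \forall x \in \mathbb{R}^+,\quad \psi_{i,j}(x, j) = x.$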


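Then $\psi_{i,j}$ is a bijection for all $i, j \in F$. Let $T_X = \{\bigcup_{i,j \in F} \psi_{i,j}^{-1}(G_{i,j});\ G_{i,j} \in \mathrm{Top}(\mathbb{R}) \text{ for all } i, j \in F\}$. Then $T_X$ will be a topology
on $X$ by Theorem 32.14.7 (i) if it can be shown that $\psi_{i',j'}(\psi_{i,j}^{-1}(G)) \in \mathrm{Top}(\mathbb{R})$ for all $G \in \mathrm{Top}(\mathbb{R})$ and all pairs $(i, j), (i', j') \in F \times F$.
If $(i, j) = (i', j')$, then obviously $\psi_{i',j'}(\psi_{i,j}^{-1}(G)) = G \in \mathrm{Top}(\mathbb{R})$ for all $G \in \mathrm{Top}(\mathbb{R})$. Suppose that $i = i'$
and $j \ne j'$. Then $\psi_{i',j'}(\psi_{i,j}^{-1}(G)) = G \cap \mathbb{R}^- \in \mathrm{Top}(\mathbb{R})$ because $\psi_{i',j'}(\psi_{i,j}^{-1}(\{0\})) = \psi_{i',j'}(\{(0, i, j)\}) = \emptyset$ since
$(0, i, j) \notin \mathrm{Dom}(\psi_{i',j'})$. Similarly, if $i \ne i'$ and $j = j'$, then $\psi_{i',j'}(\psi_{i,j}^{-1}(G)) = G \cap \mathbb{R}^+ \in \mathrm{Top}(\mathbb{R})$. If $i \ne i'$
and $j \ne j'$, then $\psi_{i',j'}(\psi_{i,j}^{-1}(G)) = \emptyset \in \mathrm{Top}(\mathbb{R})$. Therefore $(X, T_X)$ is a topological space. Hence $(X, T_X)$ is a
locally Cartesian topological space by Definition 49.4.7 because the maps $\psi_{i,j}$ are all homeomorphisms, and
their domains cover $X$.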
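
The four cases for the transition maps can be checked on sample points. The following is a minimal sketch, assuming that points of $X$ are represented as tuples $(x, k)$ for $x \ne 0$ and $(0, i, j)$ for the origins. The function names are hypothetical and do not come from the text.

```python
def chart_inverse(i, j, x):
    """Return the point of the patch U_{i,j} with coordinate x under psi_{i,j}."""
    if x < 0:
        return (x, i)        # left half-line on track i
    if x > 0:
        return (x, j)        # right half-line on track j
    return (0, i, j)         # the origin belonging to the pair (i, j)

def chart(i, j, point):
    """Return psi_{i,j}(point), or None if point lies outside U_{i,j}."""
    if point == chart_inverse(i, j, point[0]):
        return point[0]
    return None

def transition(i, j, i2, j2, x):
    """Return psi_{i2,j2}(psi_{i,j}^{-1}(x)), or None where the transition map is undefined."""
    return chart(i2, j2, chart_inverse(i, j, x))
```

With $F = \{0, 1\}$, the transition map is the identity when the index pairs agree. It is defined only on the negative half-line when just the first indices agree, only on the positive half-line when just the second indices agree, and nowhere when both indices differ.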


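To show that $(X, T_X)$ is a non-Hausdorff locally Cartesian topological space if $\#(F) \ge 2$, let $x = (0, i, j)$ and
$x' = (0, i', j')$ with $i = i'$ and $j \ne j'$. Suppose that $\Omega \in \mathrm{Top}_x(X)$ and $\Omega' \in \mathrm{Top}_{x'}(X)$. Let $G = \psi_{i,j}(\Omega)$ and
$G' = \psi_{i',j'}(\Omega')$. Then $G, G' \in \mathrm{Top}_0(\mathbb{R})$. So $G'' = G \cap G' \in \mathrm{Top}_0(\mathbb{R})$. But $\Omega \supseteq \Omega \cap U_{i,j} = \psi_{i,j}^{-1}(G) \supseteq \psi_{i,j}^{-1}(G'')$
by Theorem 10.6.10 (i), Theorem 10.6.7 (ii) and Theorem 10.6.10 (ii). Similarly, $\Omega' \supseteq \psi_{i',j'}^{-1}(G'')$. Therefore
$\Omega \cap \Omega' \supseteq \psi_{i,j}^{-1}(G'') \cap \psi_{i',j'}^{-1}(G'') = (\mathbb{R}^- \cap G'') \times \{i\} \ne \emptyset$. So $x$ and $x'$ have no disjoint neighbourhoods. Hence
$(X, T_X)$ is a non-Hausdorff locally Cartesian topological space.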


The non-Hausdorff topology of the case F = {0,1} is roughly illustrated in Figure 49.5.5. 


[Diagram: the same four half-lines and four origins as in Figure 49.5.4, drawn to indicate the non-Hausdorff topology at the origins.]


Figure 49.5.5 Non-Hausdorff topology of double real line with four origins


Each of the four patches $U_{i,j}$ in Figure 49.5.5 passes through its own “origin” $(0, i, j) \in X$. Each chart $\psi_{i,j}$
may be thought of as mapping a point on the patch $U_{i,j}$ to its $x$-coordinate in the diagram. Points with the
same x-coordinate should be thought of as having the same “location” in some sense. In other words, points 
directly above or below each other are “collocated”, but are distinguished by some internal state parameter. 
This kind of “collocation” clearly cannot happen if the locally Cartesian space is smoothly embedded inside 
a higher-order Cartesian space, which explains why it is difficult to visualise such geometries. 


The origins may be thought of as “switching points” for trains travelling from left to right in the diagram.
The index i is the “state” or “track” before arriving at the switching point. The index j is the “state” or 
“track” after passing the switching point. 


The completely arbitrary set $F$ may be thought of as a fibre space on the base space $\mathbb{R}$, and the single
switching point may clearly be replaced by any locally finite set of switching points. This gives some hint 
of the great variety of locally Cartesian topological spaces which may be constructed when the Hausdorff 
condition is not in force. 


49.5.10 REMARK: The non-necessity of the Hausdorff condition for intrinsic geometry.

It is noteworthy that locally Cartesian spaces with “switching points” as described in Examples 49.5.5
and 49.5.9, and in Theorem 49.5.7, cannot be embedded in Cartesian spaces because the continuity of 
the embedding enforces the Hausdorff condition. One might speculate that this is the historical cause of 
the insistence on the Hausdorff condition for topological manifolds in differential geometry. Originally all 
manifolds were thought of as embedded inside some Cartesian space, and when general relativity offered a 
real-world application for intrinsic geometry, the Hausdorff condition was, perhaps, retained purely because 
this had become the custom and habit in earlier applications to the geometry of embedded manifolds. 


There is no compelling reason to believe that physical space, or space-time, needs to be constrained by the 
Hausdorff condition. When switching points occur in a locally Cartesian space, each point of the “switch” 
has a neighbourhood which is topologically Cartesian. It is only when one tries to separate different points 
of a “switch” by local patches that one notices the impossibility of making those patches disjoint. This is 
a somewhat non-local issue because it can be ignored completely by focusing on a neighbourhood of each 
point. Generally one thinks of physical laws as being local, not “acting at a distance”. Therefore it seems
plausible that some useful classes of models of space-time could be constructed if manifolds were liberated
from the Hausdorff condition. One could perhaps even speculate that the “switching matrix” $F \times F$ in
Example 49.5.9 could represent transitions between quantum states of some kind. Even more rashly, one
could speculate that Figure 49.5.5 might have some value in the interpretation of the famous quantum state
transitions of Schrödinger's cat.


Instead of regarding fields as actors in a space-time which is a passive arena for quantum field theory, 
one could perhaps regard fibre elements (i.e. field values) as an integral component of the geometry. In 
other words, one could regard an active space-time arena as the totality of physics, as opposed to a passive 
substrate inhabited by actors. The idea that physics is crisply divided into a duality of arena and actors 
seems as absurd as the idea that the human body and mind have such a duality, in which the mind is an 
active actor inhabiting an otherwise passive body in some kind of master-servant relationship. Relaxing the 
Hausdorff condition permits such speculations to be investigated. 


In the context of quantum gravity, if the stress-energy tensor is in a mixed state, one would have a mixture 
of solutions of the Einstein gravity equations, each one correct for a particular pure state of the stress-energy 
tensor. So one would have a superposition of space-times which are very close to each other. If the mixed 
state collapses to a single state, each individual state before the collapse may “see” a slightly different
space-time locally Cartesian space which smoothly merges into the post-collapse space-time, but each state “sees”
a slightly different collapse trajectory. This is, once more, highly speculative, but it gives some idea that
relaxing the Hausdorff condition might lead to some interesting “opportunities”.


49.5.11 REMARK: Fibre bundles and non-Hausdorff locally Cartesian spaces. 

The notation F for the state space in Example 49.5.9 is intended to suggest the idea of a fibre space. 
The first component i of each pair (i,j) in the Cartesian product $F \times F$ may be thought of as the input
to a switching node, if the real line represents a time parameter, and j would then represent the output. 
Instead of permitting all possible transitions from F to F, one could restrict transitions to elements of some 
structure group G which acts on F. For example, one could require all transitions to be orthogonal, in other 
words maintaining the length of fibre elements. Parallel transport along a curve could then be thought of 
as “no change” of fibre elements as they are transported, whereas switching between “tracks” or “threads”
of the locally Cartesian space could be thought of as a non-zero covariant derivative. Such variation along 
a path could be made continuous by taking the limit of arbitrarily large numbers of switching points, each 
permitting a small change of orientation of a fibre element as it travels along the path. 


49.6. Charts for locally Cartesian spaces


49.6.1 REMARK: Continuous charts for locally Cartesian spaces. 
Continuous Cartesian charts are introduced in Definition 49.6.2, which is illustrated in Figure 49.6.1. 


Figure 49.6.1 Chart $\psi : U \to G \in \mathrm{Top}(\mathbb{R}^n)$ for some $U \in \mathrm{Top}(M)$


49.6.2 DEFINITION: A (continuous) chart for a locally Cartesian space M with $n = \dim(M) \in \mathbb{Z}_0^+$ is a
homeomorphism $\psi : U \to G$ for some $U \in \mathrm{Top}(M)$ and $G \in \mathrm{Top}(\mathbb{R}^n)$.


A continuous chart for a locally Cartesian space is also called a (continuous) coordinate map. 


49.6.3 REMARK: Motivation for requiring locally Cartesian space chart domains to be open sets. 

In principle, there is no overwhelming reason to require topological manifold charts to be defined on open 
sets, but open sets offer the convenience of open neighbourhoods around all points within which the map is 
defined. This is particularly convenient when one wishes to test the chart transition maps for differentiability,
for example.


49.6.4 DEFINITION: The transition map from chart $\psi_1$ to chart $\psi_2$, where $\psi_1$ and $\psi_2$ are continuous charts
for a locally Cartesian space M, is the composite map $\psi_2 \circ \psi_1^{-1}$.


A (chart) transition map for a locally Cartesian space M is a transition map from any chart to any other 
chart on M. 


49.6.5 THEOREM: Some basic properties of transition maps for locally Cartesian spaces. 
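Let $\psi_1$ and $\psi_2$ be continuous charts for a locally Cartesian space M. Let $n = \dim(M)$.


(i) $\mathrm{Dom}(\psi_2 \circ \psi_1^{-1}) = \psi_1(\mathrm{Dom}(\psi_2)) = \mathrm{Range}(\psi_1|_{\mathrm{Dom}(\psi_2)}) = \psi_1(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2))$.

(ii) $\mathrm{Range}(\psi_2 \circ \psi_1^{-1}) = \psi_2(\mathrm{Dom}(\psi_1)) = \mathrm{Range}(\psi_2|_{\mathrm{Dom}(\psi_1)}) = \psi_2(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2))$.

(iii) $\psi_2 \circ \psi_1^{-1} : \psi_1(\mathrm{Dom}(\psi_2)) \to \psi_2(\mathrm{Dom}(\psi_1))$ is a well-defined function.

(iv) $\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2) \in \mathrm{Top}(M)$.

(v) $\mathrm{Dom}(\psi_2 \circ \psi_1^{-1}) \in \mathrm{Top}(\mathbb{R}^n)$.

(vi) $\mathrm{Range}(\psi_2 \circ \psi_1^{-1}) \in \mathrm{Top}(\mathbb{R}^n)$.

(vii) $\psi_2 \circ \psi_1^{-1} : G_1 \to G_2$ is a homeomorphism with respect to the relative topologies on $G_1 = \mathrm{Dom}(\psi_2 \circ \psi_1^{-1})$
and $G_2 = \mathrm{Range}(\psi_2 \circ \psi_1^{-1})$ in $\mathbb{R}^n$.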

PROOF: Part (i) follows from Theorem 10.10.13 (iii, v, ix).

Part (ii) follows from Theorem 10.10.13 (iv, vi, x).

Part (iii) follows from parts (i) and (ii).

For part (iv), $\mathrm{Dom}(\psi_1) \in \mathrm{Top}(M)$ and $\mathrm{Dom}(\psi_2) \in \mathrm{Top}(M)$ by Definition 49.6.2. So $\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2) \in \mathrm{Top}(M)$
by Definition 31.3.2 (ii).

For part (v), $\psi_1(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)) \in \mathrm{Top}(\mathbb{R}^n)$ by part (iv) and Definitions 31.14.2 and 31.12.4 because
$\psi_1$ is a homeomorphism. Hence $\mathrm{Dom}(\psi_2 \circ \psi_1^{-1}) \in \mathrm{Top}(\mathbb{R}^n)$ by part (i).

For part (vi), $\psi_2(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)) \in \mathrm{Top}(\mathbb{R}^n)$ by part (iv) and Definitions 31.14.2 and 31.12.4 because
$\psi_2$ is a homeomorphism. Hence $\mathrm{Range}(\psi_2 \circ \psi_1^{-1}) \in \mathrm{Top}(\mathbb{R}^n)$ by part (ii).

For part (vii), $\psi_2 \circ \psi_1^{-1} : G_1 \to G_2$ is a bijection because $\psi_2$ and $\psi_1^{-1}$ are well-defined injective partial
functions. The continuity of $\psi_2 \circ \psi_1^{-1}$ and its inverse follow from the continuity of $\psi_1$, $\psi_2$, $\psi_1^{-1}$ and $\psi_2^{-1}$
by Theorem 31.13.6. Hence $\psi_2 \circ \psi_1^{-1} : G_1 \to G_2$ is a homeomorphism. (Alternatively, part (vii) follows
from Definition 31.14.16 and Theorem 31.14.17 (iii) because $\psi_1^{-1} : \mathbb{R}^n \to M$ and $\psi_2 : M \to \mathbb{R}^n$ are local
homeomorphisms.)
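
The purely set-theoretic content of parts (i) and (ii) of Theorem 49.6.5 can be checked on a small finite model. The following Python sketch represents partial functions as dictionaries. The point names and chart values are hypothetical, chosen only for illustration, and the topological parts (iv) to (vii) are not modelled at all.

```python
def inverse(f):
    """Inverse of an injective partial function stored as a dict."""
    inv = {value: key for key, value in f.items()}
    assert len(inv) == len(f), "f must be injective"
    return inv


def compose(g, f):
    """Composite g o f of partial functions, defined where f(x) lies in Dom(g)."""
    return {x: g[f[x]] for x in f if f[x] in g}


# Hypothetical finite "charts" on the point set {a, b, c, d}.
psi1 = {"a": 0, "b": 1, "c": 2}
psi2 = {"b": 10, "c": 20, "d": 30}

trans = compose(psi2, inverse(psi1))   # psi2 o psi1^{-1}
overlap = set(psi1) & set(psi2)        # Dom(psi1) n Dom(psi2)

dom = set(trans)
rng = set(trans.values())
assert dom == {psi1[p] for p in overlap}   # part (i)
assert rng == {psi2[p] for p in overlap}   # part (ii)
```

The domain of the composite is automatically cut down to the image of the overlap of the two chart domains, which is why no explicit restriction of $\psi_1$ is needed in the definition of the transition map.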


49.6.6 EXAMPLE: Two coordinate charts for the unit circle. 

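Let $M = \{(x,y) \in \mathbb{R}^2; x^2 + y^2 = 1\}$. Let $\mathrm{Top}(M) = \{\Omega \cap M; \Omega \in \mathrm{Top}(\mathbb{R}^2)\}$. Then Top(M) is the relative
topology on M induced by the standard topology on $\mathbb{R}^2$ by Definition 31.6.2. Therefore Top(M) is a valid
topology on M by Theorem 31.6.3.

Define $\psi_1 : M \setminus \{(-1,0)\} \to \mathbb{R}$ by $\psi_1 : (x,y) \mapsto y/(1+x) = \mathrm{sign}(y)((1-x)/(1+x))^{1/2}$. Similarly, define
$\psi_2 : M \setminus \{(1,0)\} \to \mathbb{R}$ by $\psi_2 : (x,y) \mapsto y/(1-x) = \mathrm{sign}(y)((1+x)/(1-x))^{1/2}$. Then $\mathrm{Dom}(\psi_1) \in \mathrm{Top}(M)$ and
$\mathrm{Dom}(\psi_2) \in \mathrm{Top}(M)$, and $\mathrm{Range}(\psi_1) = \mathrm{Range}(\psi_2) = \mathbb{R}$. (The charts $\psi_1$ and $\psi_2$ are projections of points
in M from (−1,0) and (1,0) respectively to the Y-axis.) These charts are continuous with respect to the
topologies on M and $\mathbb{R}$.


Note that $\mathrm{Dom}(\psi_k \circ \psi_\ell^{-1}) = \mathbb{R} \setminus \{0\} \ne \mathbb{R}$ and $\mathrm{Range}(\psi_k \circ \psi_\ell^{-1}) = \mathbb{R} \setminus \{0\} \ne \mathbb{R}$ for $k \ne \ell$ even though
$\mathbb{R}$ is the range of both charts. The transition maps $\psi_k \circ \psi_\ell^{-1} : \mathbb{R} \setminus \{0\} \to \mathbb{R} \setminus \{0\}$ may be calculated as
$\psi_k \circ \psi_\ell^{-1} : t \mapsto t^{-1}$ for $k \ne \ell$. These transition maps are homeomorphisms.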
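
The transition map formula in Example 49.6.6 can be spot-checked numerically. The following Python sketch assumes the explicit inverse $\psi_1^{-1} : t \mapsto ((1-t^2)/(1+t^2), 2t/(1+t^2))$, which is not stated in the example but follows from solving $y/(1+x) = t$ on the unit circle. The function names are illustrative only.

```python
import math


def psi1(x, y):
    """Chart psi_1: projection from (-1, 0) to the Y-axis; undefined at (-1, 0)."""
    return y / (1.0 + x)


def psi2(x, y):
    """Chart psi_2: projection from (1, 0) to the Y-axis; undefined at (1, 0)."""
    return y / (1.0 - x)


def psi1_inv(t):
    """Assumed inverse of psi_1: the point of the unit circle with psi_1-coordinate t."""
    d = 1.0 + t * t
    return ((1.0 - t * t) / d, 2.0 * t / d)


def transition_21(t):
    """Transition map psi_2 o psi_1^{-1}, defined only for t != 0."""
    return psi2(*psi1_inv(t))


for t in (-3.0, -0.5, 0.25, 2.0):
    x, y = psi1_inv(t)
    assert math.isclose(x * x + y * y, 1.0)          # the point lies on the circle
    assert math.isclose(psi1(x, y), t)               # psi1_inv inverts psi1
    assert math.isclose(transition_21(t), 1.0 / t)   # transition map is t -> 1/t
```

The value t = 0 is excluded because $\psi_1^{-1}(0) = (1,0)$, which is exactly the point removed from the domain of $\psi_2$.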


49.6.7 REMARK: Hint-notations for charts and functions.

In this book, the symbol $\psi$ very often hints at a chart (i.e. a coordinate map) for a locally Cartesian space
or manifold. Many books use the symbol $\phi$ for charts. In this book, the symbol $\phi$ often hints at a function
from one space to another. A possible mnemonic for this is that $\psi$ (“psi”) suggests the last two letters of
the word “maps”, whereas $\phi$ (“phi”) sounds like the first letter of the word “function” (a Latin word which
does not seem to be of Greek origin).


49.6.8 REMARK: The choice of homeomorphism direction for locally Cartesian space charts. 

Since the map $\psi$ in Definition 49.6.2 is a homeomorphism between open sets, so also is its inverse $\psi^{-1}$. So
charts could be specified as maps from the Cartesian coordinate space to the point space. Gauß adopted
this convention in his 1827 paper, “Disquisitiones generales circa superficies curvas” [219], section 4.


Duae habentur methodi generales ad exhibendam indolem superficiei curuae. Methodus prima 
vtitur aequatione inter coordinatas x, y, z quam reductam esse supponemus ad formam W = 0, 
ubi W erit functio indeterminatarum x, y, z. [...]


Methodus secunda sistit coordinatas in forma functionum duarum variabilium p, q.


The following translation is given by Spivak [37], Volume 2, page 65.


There are two general methods for defining the nature of a curved surface. The first uses the 
equation between the coordinates x, y, z, which we may suppose reduced to the form W = 0, where 
W will be a function of the indeterminants x, y, z. [...] 


The second method expresses the coordinates in the form of functions of two variables, p, q.


However, it is better to regard coordinates as tags on geometrical points. The points are the primary entity, 
and the coordinates are mere labels for the points. On the other hand, in the case of curves and families 
of curves, points are given as a function of real n-tuples because curves and families of curves represent a 
kind of possibly self-intersecting (non-injective) motion within the point space. A curve may self-intersect, 
whereas the modern style of manifold must not. A curve gives you a unique point for each parameter value. 
A manifold has a unique set of parameters for each point. 


Another way to see that it is more natural to define charts as functions from the point set to the coordinate 
set than vice versa is to think of how people make real maps of the Earth. The usual procedure is to choose 
which points are of interest, such as towns and mountains, and then determine the coordinates (e.g. longitude 
and latitude) of these points. In other words, the coordinates are attributes of the point, not vice versa. One 
does not choose a set of coordinates and then go out and see what is at those coordinates. 

Perhaps an exception to this would be aerial and satellite photography where data is organised as a set 
of pixels, and the points on the Earth must be determined as a function of the pixel row and column in 
the matrix. However, such images are usually calibrated by identifying points on Earth which have known 
coordinates and then determining the Earth-to-pixels map by interpolation. 


Similarly in the case of fibre bundles, fibre charts are expressed as functions from the fibre bundle to the fibre 
space rather than vice versa. On the other hand, when defining charts for particular embedded manifolds, it 
is generally easier to define the inverse chart, i.e. from a Cartesian space to the manifold. (See for example
spherical coordinates in Section 76.2.) All in all, the arguments in favour of chart-to-point-set are strong 
enough to make one seriously doubt that the points-to-coordinates direction is the best choice of direction. 
However, a sub-optimal standard is better than no standard. Discontented certainty is better than chaos. 


49.6.9 REMARK: Three methods for specifying locally Cartesian spaces. 
The common methods of specifying the points of a locally Cartesian space include the following. 


(1) Specify a constraint on the points of an ambient space. For example, $S^2 = \{x \in \mathbb{R}^3; |x| = 1\}$.
(2) Specify the point space by means of curve families. For example, $x = (\cos\phi \cos\theta, \sin\phi \cos\theta, \sin\theta)$.
(3) Specify coordinates for each given point. For example, $\phi = \arctan(x_1, x_2)$, $\theta = \arcsin(x_3)$.


Method (1) is philosophically more satisfying because there are no charts or coordinates at all, but it requires 
an ambient space, and often the constraint may not be so simple as “|x| = 1”. It must then be proved from 
the constraint that the set of points can in fact be coordinatised in the fashion required for locally Cartesian 
spaces. A further disadvantage is that the topology and differentiability class of the space depend on the 
choice of charts. Specifying only the point set leaves the topological and differentiable structure a matter of 
guesswork. 


Method (2) is in many practical situations the easiest way to specify a locally Cartesian point space. The 
topological and differentiable structure is implied by the inverse of each curve family. However, it must be 
proved that the curve families are injective, and that their inverses have the desired regularity properties. 
For example, the curve family $x = (\cos\phi \cos\theta, \sin\phi \cos\theta, \sin\theta)$ requires careful choice of its domain to ensure
injectivity and avoid singularities, particularly at the poles. 
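
The relation between the curve family of method (2) and the coordinates of method (3) can be illustrated with a short Python sketch. It assumes that the two-argument expression $\arctan(x_1, x_2)$ denotes the angle of the point $(x_1, x_2)$, which corresponds to `math.atan2(x2, x1)`. That reading is an assumption, and the function names are illustrative only.

```python
import math


def curve_family(phi, theta):
    """Method (2): point of the unit sphere for longitude phi and latitude theta."""
    return (math.cos(phi) * math.cos(theta),
            math.sin(phi) * math.cos(theta),
            math.sin(theta))


def coordinates(x):
    """Method (3): (phi, theta) for a point x of the unit sphere away from the poles."""
    x1, x2, x3 = x
    return (math.atan2(x2, x1), math.asin(x3))


# Away from the poles, the two maps are mutually inverse.
phi, theta = coordinates(curve_family(0.7, -0.4))
assert math.isclose(phi, 0.7) and math.isclose(theta, -0.4)

# At theta = pi/2, different values of phi give (numerically) the same point,
# so the curve family is not injective there.
north_a = curve_family(0.0, math.pi / 2)
north_b = curve_family(1.0, math.pi / 2)
assert max(abs(a - b) for a, b in zip(north_a, north_b)) < 1e-15
```

Restricting the domain to, for instance, $-\pi < \phi < \pi$ and $-\pi/2 < \theta < \pi/2$ removes the non-injectivity, at the cost of leaving the poles and one meridian uncovered by this single chart.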


Method (3) has the advantage that the coordinate charts are easily checked to be well-defined functions, 
and the regularity of the transition maps can be evaluated fairly straightforwardly. However, in many 
practical situations, the point-to-coordinates maps may be considerably less pleasant than the coordinates- 
to-points maps. One generally does most work on locally Cartesian spaces in terms of coordinates. So the 
points themselves are somewhat abstract and extra-mathematical. One does not usually want to know the 
coordinates of given points because one is working in the coordinate space. Most often, one merely wishes 
to know how to change to a different coordinate chart. 


As often happens in differential geometry, the choice of definitions requires a subjective trade-off according 
to the relative advantages and disadvantages in a given application context. (This helps explain the plethora 
of different definitions in differential geometry.) In the case of locally Cartesian spaces, it seems best to 
use method (3) for abstract theoretical contexts, although method (2) is often most convenient for practical 
applications, and method (1) may be best for the specification of point sets. 


The requirement for an ambient space is not as restrictive as it may seem. In practice, explicit specifications of
locally Cartesian spaces are almost always expressed in terms of an ambient space of some kind. An ambient 
space may be avoided by specifying only the transition maps for the charts, although this in effect embeds 
all patches within other patches. In other words, both the domains and ranges of charts are almost always 
sets of n-tuples of numbers of some kind. 


49.7. Atlases for locally Cartesian spaces 


49.7.1 REMARK: Locally Cartesian spaces may be specified with a topology or with an atlas. 
A locally Cartesian space does not require an atlas for its specification. This is in contrast to differentiable 
manifolds, which do require an atlas for their specification. (See also Remark 49.2.4.) 


Since an atlas is typically the most convenient structure for specifying the topology on a locally Cartesian 
space, even if it is not differentiable, it is convenient to accept an alternative specification tuple $(M, A_M)$ for a
locally Cartesian space $(M, T_M)$, where $A_M$ is a continuous atlas for $(M, T_M)$ according to Definition 49.7.3.


49.7.2 REMARK: Continuous atlases consistent with a given locally Cartesian space. 

In Section 49.8, continuous locally Cartesian atlases are defined in the absence of a topology on a given set. 
Then the topology is determined by the atlas. By contrast, Definition 49.7.3 makes the atlas suit a given
topology which is known to be locally Cartesian on a given point set. This arrangement is realistic when 
the point set is defined as a subset of a Cartesian space, in which case the topology may be defined as the 
relative topology within the ambient space. (Definition 49.7.3 is illustrated in Figure 49.7.1.) 


Figure 49.7.1 Atlas for a locally Cartesian space M


49.7.3 DEFINITION: A (continuous) atlas for a locally Cartesian space M is a set A of continuous charts
for M such that $\bigcup_{\psi \in A} \mathrm{Dom}(\psi) = M$.


An indexed (continuous) atlas for a locally Cartesian space M is a family $(\psi_i)_{i \in I}$ of continuous charts for M
such that $\bigcup_{i \in I} \mathrm{Dom}(\psi_i) = M$.


49.7.4 REMARK: Comparison of chart-set and chart-family atlases. 

An atlas is given two alternative formalisations in Definition 49.7.3: with and without an index. In practical 
applications, the charts are usually indexed. For convenience, the family of charts is usually referred to 
interchangeably as a set of charts, which then means the set of charts which is indexed by the family. As 
always with indexed families, the indexing may be implicit or explicit, according to the requirements of the 
context. The same issue arises for fibre bundle atlases in Definition 47.5.2. 


The arguments in favour of defining an atlas as a set of charts rather than an indexed family are much 
stronger than the counterarguments. In the case of an indexed atlas: (1) it is difficult to choose an index set 
for a complete atlas other than the set of charts themselves, which is rather clumsy; (2) when merging two 
atlases, it is difficult to choose an index for the union of the atlases, particularly if the atlases are infinite; 
(3) when restricting a locally Cartesian space to a subset, the restricted atlas uses a subset of the index set 
of the full atlas; (4) since the content of an atlas is independent of the indexing, the extraneous index map 
must be ignored when comparing atlases. All in all, it is best to simply add an index set whenever it is 
convenient for applications. 


49.7.5 REMARK: Consequences of “incorrect coverage” by an atlas. 

The requirement $\bigcup_{\psi \in A} \mathrm{Dom}(\psi) = M$ in Definition 49.7.3 is apparently a constraint on the atlas A. However,
that would be putting the cart before the horse. In application contexts where locally Cartesian spaces arise, 
one typically defines an atlas to cover all of the points of interest, and the point-space is then defined to be 
the union of the domains of the charts. 


For example, if an atlas covers only the northern hemisphere of a sphere, it is not the atlas which is “at 
fault”. The point space is then the northern hemisphere, not the whole sphere. As another example, if a 
chart covers all points of a Euclidean space except the origin, then the origin is not part of the point space. 


Similarly, as discussed in Section 49.8, it often happens that the topology on the point set is defined by the 
atlas, whereas Definition 49.7.3 requires an atlas to be compatible with a pre-defined topology. If the atlas 
is defined first, both the point set and the topology are derived from the atlas, in which case the point-set 
coverage is always “correct”. In this sense, one may say that the coverage-matching requirement is not a
requirement. It is, in a sense, the definition of the point set. 


49.7.6 REMARK: Chart transition maps for a specified atlas on a locally Cartesian space. 
Definition 49.7.7 specialises the general transition maps between locally Cartesian spaces in Definition 49.6.4 
to transition maps between charts in a specified atlas. 


49.7.7 DEFINITION: A (chart) transition map for an atlas A for a locally Cartesian space M is a transition 
map from any chart in A to any other chart in A. 


49.7.8 REMARK: Transition map continuity conditions are automatic for continuous atlases.

Additional continuity conditions on transition maps $\psi_j \circ (\psi_i|_{U_i \cap U_j})^{-1} : \psi_i(U_i \cap U_j) \to \psi_j(U_i \cap U_j)$ for
$\psi_i, \psi_j \in A$ are not needed in Definition 49.7.3 because all charts are homeomorphisms by Definition 49.6.2.
So the transition maps are homeomorphisms by Theorem 31.14.17 (iii). Therefore any continuous atlas for
M according to Definition 49.7.3 is also a $C^0$ atlas according to Definition 51.3.2.


The way in which a given locally Cartesian topology forces the transition maps for any continuous atlas for
the space to be continuous is apparently not available in the case of $C^k$ atlases for $k \ge 1$. In other words, the
topology alone does not impose differentiability properties on the transition maps. However, a continuous
tangent bundle does impose $C^1$ differentiability on the chart transition maps if the charts are required to be
compatible with the tangent bundle.


A $C^0$ tangent bundle for a $C^1$ differentiable manifold plays the same role for $C^1$ atlases as a locally Cartesian
topology plays for a continuous atlas. Similarly, the higher-level tangent bundles discussed in Chapter 59
play a role similar to the topology in enforcing $C^k$ compatibility of charts for larger values of $k \in \mathbb{Z}^+$.


49.7.9 REMARK: Notations for atlases on locally Cartesian spaces. 

Although the notations $\mathrm{atlas}(M)$ and $\mathrm{atlas}_p(M)$ in Notation 49.7.10 suggest that the atlases are attributes of
a locally Cartesian space $M \mathrel{-\!\!<} (M, T_M)$, the atlas is context-dependent, i.e. not inferable from the topological
space specification tuple $(M, T_M)$. In this sense, these are pseudo-notations.

For compactness and brevity, $A_M$ may occasionally denote the atlas $\mathrm{atlas}(M)$ for M, and then $A_{M,p}$ may
denote the corresponding set of charts $\mathrm{atlas}_p(M) = \{\psi \in A_M; p \in \mathrm{Dom}(\psi)\}$.


49.7.10 NOTATION: 

$\mathrm{atlas}(M)$, for a locally Cartesian space M, denotes an atlas for M which is implicit in a particular context.
$\mathrm{atlas}_p(M)$ denotes the set $\{\psi \in \mathrm{atlas}(M); p \in \mathrm{Dom}(\psi)\}$ of charts in $\mathrm{atlas}(M)$ whose domain contains a
given point $p \in M$.


49.7.11 REMARK: The standard atlas for a Cartesian space contains only the identity function. 
The usual or standard atlas for any Cartesian space $\mathbb{R}^n$ is the single-chart atlas which contains only the
identity function on $\mathbb{R}^n$. Definition 49.7.12 generalises this to any open subset of $\mathbb{R}^n$.


49.7.12 DEFINITION: The usual atlas or standard atlas for an open subset M of the Cartesian topological
space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the atlas $\{\mathrm{id}_M\}$ for M.


49.7.13 REMARK: The standard atlas for a finite-dimensional linear space contains all component maps. 
Any one of the component map styles in Definitions 22.8.6, 22.8.7 or 22.8.8 may be used for the “standard
atlas” for a finite-dimensional linear space in Definition 49.7.14. These component maps are compatible with
the standard topology for a finite-dimensional linear space in Definition 32.6.6. So the set of all component 
maps constitutes a locally Cartesian space atlas for the space according to Definition 49.7.3. 


49.7.14 DEFINITION: The standard atlas for a finite-dimensional real linear space F is the set $A_F$ of
component charts for all bases of F. In other words, $A_F = \{\kappa_B; B \text{ is a basis for } F\}$, where $\kappa_B$ denotes the
component map with respect to a basis B for F.


49.7.15 DEFINITION: The standard atlas for the general linear group GL(F) of a finite-dimensional real 
linear space F is the set of linear-map component maps for all bases of F. In other words, it is the set
$\{\kappa_{B,B}; B \text{ is a basis for } F\}$, where $\kappa_{B,B}$ denotes the linear-map component map with respect to a basis B
for F as in Definition 23.2.10. 


49.7.16 REMARK: Existence of atlases with finitely many charts. 

Every locally Cartesian space M possesses an atlas. In particular, if M is compact, any atlas on M has a 
finite subset or “sub-atlas” which is also an atlas. Therefore every compact locally Cartesian space has a 
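finite atlas. But a locally Cartesian space with uncountably many connected components cannot have
a finite atlas, because each chart domain is homeomorphic to an open subset of $\mathbb{R}^n$ and therefore has at most
countably many components. For most locally Cartesian spaces of practical interest, a finite atlas is available.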


49.7.17 REMARK: Automatic consistency of charts for a locally Cartesian space. 
All pairs of charts for a locally Cartesian space are automatically consistent on their intersection. Hence all 
atlases are consistent. This is different to the case of differentiable manifolds, where an atlas is required to
indicate which structure is intended.

If A is an atlas for a locally Cartesian topological space M, and $\psi$ is a chart for M, then $A \cup \{\psi\}$ is also an
atlas for M. Hence the set $\{\psi; \psi \text{ is a chart for } M\}$ is an atlas for M.


49.7.18 DEFINITION: 

The maximal atlas for a locally Cartesian space (M, T) is the set of all locally Cartesian charts which are 
compatible with (M, T). In other words, it is the set of all functions $\psi : U \to \mathbb{R}^n$ such that $U \in \mathrm{Top}(M)$
and $\psi : U \to \psi(U)$ is a homeomorphism with respect to the relative topology on $\psi(U)$ in $\mathbb{R}^n$.


A maximal atlas is also known as a complete atlas. 


49.7.19 REMARK: Notation for the atlas of all continuous charts for a locally Cartesian space. 

It is convenient to introduce the non-standard Notation 49.7.20 for the atlas of all continuous charts on a 
given locally Cartesian space, in other words for the maximal atlas in Definition 49.7.18. This atlas is fully 
determined by the topology Top(M) on M. Thus atlas(M) is the set of all homeomorphisms between open
subsets of M and open subsets of $\mathbb{R}^n$, where $n = \dim(M)$.


49.7.20 NOTATION: atlas(M), for a locally Cartesian space M, denotes the maximal atlas for M.

$\mathrm{atlas}_p(M)$ denotes the set $\{\psi \in \mathrm{atlas}(M); p \in \mathrm{Dom}(\psi)\}$ of charts in atlas(M) whose domain contains a given
point $p \in M$.

Alternative notations: $\mathrm{atlas}^0(M)$ and $\mathrm{atlas}^0_p(M)$ respectively.


49.8. Topology induced by a locally Cartesian atlas 


49.8.1 REMARK: Making the topology fit the atlas instead of making the atlas fit the topology. 

The purpose of Definition 49.8.2 is to define a continuous locally Cartesian atlas which does not assume any
prior provision of a topology on a given point set. Continuous atlases may be defined in conformance with a
given topology on a set, or the topology may be induced by a given continuous atlas. Theorem 49.8.8 shows
that such an “atlas-induced topology” is a valid topology.


A third possibility is to dispense with the predefined point set altogether, as in Section 49.9, where a locally 
Cartesian space is defined as a patchwork of Cartesian patches stitched together by transition maps. The 
three specification styles may be outlined as follows. 


section   specified structures   derived structures

49.7      point set, topology    atlas
49.8      point set, atlas       topology
49.9      atlas                  topology, point set


The spaces and maps in Definition 49.8.2 are illustrated in Figure 49.8.1.


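49.8.2 DEFINITION: A continuous locally Cartesian atlas for a set M is a set $A_M$ which satisfies the
following conditions for some $n \in \mathbb{Z}_0^+$.
(i) $\forall \psi \in A_M, \exists U \in \mathbb{P}(M), \exists \Omega \in \mathrm{Top}(\mathbb{R}^n)$, $\psi : U \to \Omega$ is a bijection.
(ii) $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$.
(iii) $\forall \psi_1, \psi_2 \in A_M$, $\psi_1(\mathrm{Dom}(\psi_2)) \in \mathrm{Top}(\mathbb{R}^n)$.
(iv) $\forall \psi_1, \psi_2 \in A_M$, $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is continuous, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$.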


49.8.3 REMARK: Interpretation of conditions for a continuous locally Cartesian atlas. 
The conditions of Definition 49.8.2 mean that a continuous locally Cartesian atlas on a set M is a set $A_M$
of bijections from arbitrary subsets of M to open subsets of $\mathbb{R}^n$, such that the transition map $\psi_2 \circ \psi_1^{-1}$ is
continuous for all $\psi_1, \psi_2 \in A_M$, and the domains of the bijections in $A_M$ cover all of M.


Condition (iii) of Definition 49.8.2 ensures that the patches of the atlas are sewn together correctly. The
set $\psi_1(\mathrm{Dom}(\psi_2))$ equals the image $\psi_1(U_1 \cap U_2)$ under $\psi_1$ of the patch-domain intersection $U_1 \cap U_2$. The
requirement that $\psi_1(U_1 \cap U_2)$ and $\psi_2(U_1 \cap U_2)$ be open subsets of $\mathbb{R}^n$ ensures that the continuity test in
condition (iv) is carried out on functions $\psi_2 \circ \psi_1^{-1}$ and $\psi_1 \circ \psi_2^{-1}$ which are conveniently defined on open
subsets of $\mathbb{R}^n$, but this is not the principal motivation for condition (iii). The main motivation is to ensure
that the definition matches the standard concept of a locally Cartesian space (or a topological manifold if
the topology has the Hausdorff property). Example 49.8.4 gives some idea of what could go wrong without
condition (iii).


Figure 49.8.1 Transition map $\psi_2 \circ \psi_1^{-1}$ with $n = \dim(M)$

The continuity requirement in condition (iv) is interpreted in terms of the relative topologies on the
range-sets $\psi_1(U_1 \cap U_2)$ and $\psi_2(U_1 \cap U_2)$. Since these are open subsets of $\mathbb{R}^n$ by condition (iii), the points of these
range-sets are surrounded by neighbourhoods within $\mathbb{R}^n$. So the continuity criterion is effectively identical
to the criterion in $\mathbb{R}^n$.


49.8.4 EXAMPLE: Consequence of not enforcing topological openness of images of patch intersections. 

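Let $M = \{x \in \mathbb{R}^3; x_3 = 0 \text{ or } x_2 = 0\}$ and $A_M = \{\psi_1, \psi_2\}$, where $\psi_1 : U_1 = \{x \in \mathbb{R}^3; x_3 = 0\} \to \Omega_1 = \mathbb{R}^2$
is defined by $\psi_1 : x \mapsto (x_1, x_2)$, and $\psi_2 : U_2 = \{x \in \mathbb{R}^3; x_2 = 0\} \to \Omega_2 = \mathbb{R}^2$ is defined by $\psi_2 : x \mapsto (x_1, x_3)$.
Then conditions (i) and (ii) in Definition 49.8.2 are satisfied by $A_M$. To verify condition (iv), note that
$U_1 \cap U_2 = \{x \in \mathbb{R}^3; x_3 = 0 \text{ and } x_2 = 0\}$, and $\psi_1(U_1 \cap U_2) = \psi_2(U_1 \cap U_2) = \{x \in \mathbb{R}^2; x_2 = 0\}$, and
$\psi_2 \circ \psi_1^{-1} : x \mapsto (x_1, 0)$ and $\psi_1 \circ \psi_2^{-1} : x \mapsto (x_1, 0)$, which are continuous functions.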


$(M, A_M)$ does not satisfy condition (iii) because $\psi_1(\mathrm{Dom}(\psi_2)) = \psi_2(\mathrm{Dom}(\psi_1)) = \{x \in \mathbb{R}^2; x_2 = 0\}$ is not
an open subset of the coordinate space $\mathbb{R}^2$. This corresponds to the fact that M is the union of planes
$U_1 = \{x \in \mathbb{R}^3; x_3 = 0\}$ and $U_2 = \{x \in \mathbb{R}^3; x_2 = 0\}$, which does not fit the usual concept of a locally
Cartesian space. Points on the intersection of $U_1$ and $U_2$ have no neighbourhoods which are homeomorphic
to patches of $\mathbb{R}^2$. (It is acceptable to assume here that M has a well-defined topology because the maps $\psi_1$
and $\psi_2$ have been chosen to be continuous with respect to the relative topology on M in $\mathbb{R}^3$.)
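
The failure of condition (iii) in Example 49.8.4 can be made concrete with a small Python sketch. It encodes the two charts on the union of two coordinate planes in $\mathbb{R}^3$, checks the transition map on the image of the overlap, and shows that arbitrarily close to a point of that image there are points outside it. The function names and the membership test are illustrative assumptions, not part of the example.

```python
def psi1(x):
    """Chart on U1 = {x in R^3; x3 = 0}: keep the first two coordinates."""
    x1, x2, x3 = x
    if x3 != 0:
        raise ValueError("point is not in U1")
    return (x1, x2)


def psi2(x):
    """Chart on U2 = {x in R^3; x2 = 0}: keep the first and third coordinates."""
    x1, x2, x3 = x
    if x2 != 0:
        raise ValueError("point is not in U2")
    return (x1, x3)


def psi1_inv(u):
    """Inverse of psi1: embed R^2 as the plane x3 = 0."""
    return (u[0], u[1], 0.0)


def in_psi1_of_overlap(u):
    """Membership test for psi1(U1 n U2) = {u in R^2; u2 = 0}."""
    return u[1] == 0


def transition_21(u):
    """Transition map psi2 o psi1^{-1}, defined only on psi1(U1 n U2)."""
    if not in_psi1_of_overlap(u):
        raise ValueError("point is outside the domain of the transition map")
    return psi2(psi1_inv(u))


u = (1.0, 0.0)
assert transition_21(u) == (1.0, 0.0)      # x -> (x1, 0) on the overlap image

# Condition (iii) fails: every eps-ball around u contains a point outside the image.
for eps in (1.0, 1e-3, 1e-9):
    nearby = (u[0], u[1] + eps / 2)        # within distance eps of u
    assert not in_psi1_of_overlap(nearby)
```

The transition map itself is perfectly continuous here, which is why condition (iv) alone cannot detect the problem.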


49.8.5 REMARK: Optional exclusion of empty charts from continuous locally Cartesian space atlases. 
Definition 49.8.2 does not exclude the possibility of empty charts. For any set M, one may define the empty
chart $\psi = \emptyset$. Then $\mathrm{Dom}(\psi) = \emptyset \in \mathbb{P}(M)$ and $\mathrm{Range}(\psi) = \emptyset \in \mathrm{Top}(\mathbb{R}^n)$ for any $n \in \mathbb{Z}_0^+$. So the empty
chart always satisfies condition (i) of Definition 49.8.2. It is easily seen that if $A_M$ is a continuous locally
Cartesian atlas for M, then $A_M \cup \{\emptyset\}$ is also a continuous locally Cartesian atlas for M.


In the particular case M = Ø, both Ø and {Ø} are continuous locally Cartesian atlases for M. It is difficult to 
think of any immediate benefit from permitting empty charts. (It is particularly annoying that the dimension 
n is an arbitrary non-negative integer for the empty chart.) 


The proof of Theorem 49.8.8 is an example of the technical difficulties that can arise from empty charts.
(Empty sets are much more difficult to analyse than one might expect!) In view of such technicalities, it is 
often convenient to exclude empty charts and the empty point-set from consideration. 


49.8.6 REMARK: The topology induced by a continuous locally Cartesian atlas on a set.

The lack of a predefined topology on the set M in Definition 49.8.2 is intentional. The procedure followed
here is to first define an atlas on M, then define the topology induced by the atlas on M, and then verify that
the induced topology has the desired locally Cartesian property. If the induced topology $T_M$ is additionally
a Hausdorff space, the topological space $(M, T_M)$ will be a topological manifold as in Definition 50.1.1.


Theorem 49.8.8 asserts that the topology induced on a point set $M$ by a continuous locally Cartesian atlas
$A_M$ is a valid topology. The induced topology of a continuous locally Cartesian atlas is the weakest topology
$T_M$ on $M$ for which all of the maps in $A_M$ are local homeomorphisms. (The induced topology is formalised
in Definition 49.8.12.)


49.8.7 REMARK: Relations to other kinds of “overlap topologies”. 

Theorem 49.8.8 is related to Theorem 32.4.6 for the induced topology by the inverses of a family of maps 
from a general set X to a family of general topological spaces, which induces a “weak topology” on a general 
set X from a family of maps. However, such an induced topology requires all of the functions to be well 
defined on the whole set X, whereas the topology induced by a continuous locally Cartesian atlas permits 
the functions to have domains which are subsets of X. This extra generality requires the induced topology 
to be defined as unions of inverse images of the functions, which substantially increases the effort required 
to prove that the induced topology is valid. Theorem 49.8.8 is also related to the set-union topology in 
Section 32.14, the patchwork topology in Section 32.15, and the identification topology in Section 32.13. 


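49.8.8 THEOREM: The topology induced on a set by a continuous locally Cartesian atlas is a topology.
Let $A_M$ be a continuous locally Cartesian atlas for a set $M$, where $n = \dim(M) \in \mathbb{Z}_0^+$.


(i) $\forall \psi_1, \psi_2 \in A_M,\ \forall \Omega \in \mathrm{Top}(\mathbb{R}^n),\ (\psi_2 \circ \psi_1^{-1})(\Omega) \in \mathrm{Top}(\mathbb{R}^n)$.
(ii) The set
$$T_M = \Big\{ \bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi);\ (\Omega_\psi)_{\psi \in A_M} \in \mathrm{Top}(\mathbb{R}^n)^{A_M} \Big\}$$
is a topology on $M$.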


PROOF: For part (i), let $\psi_1, \psi_2 \in A_M$ and $\Omega \in \mathrm{Top}(\mathbb{R}^n)$. Then $(\psi_2 \circ \psi_1^{-1})(\Omega) = (\psi_2 \circ \psi_1^{-1})(\Omega')$, where
$\Omega' = \Omega \cap \psi_1(U_2)$ is an element of the relative topology on $\psi_1(U_2)$. (See Section 31.6 for relative topologies.)
So $(\psi_2 \circ \psi_1^{-1})(\Omega) \in \mathrm{Top}(\psi_2(U_1))$ by Definition 49.8.2 (iv). (See Remark 49.8.3 for interpretation of the chart
continuity condition.) Therefore $(\psi_2 \circ \psi_1^{-1})(\Omega) \in \mathrm{Top}(\mathbb{R}^n)$ by Theorem 31.6.6 because $\psi_2(U_1) \in \mathrm{Top}(\mathbb{R}^n)$
by Definition 49.8.2 (iii). (This is illustrated in Figure 49.8.2.)


Figure 49.8.2 Effect of a chart transition map on an open set


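For part (ii), to show that $\emptyset \in T_M$, let $\Omega_\psi = \emptyset$ for all $\psi \in A_M$. Then $(\Omega_\psi)_{\psi \in A_M} \in \mathrm{Top}(\mathbb{R}^n)^{A_M}$ and
$\psi^{-1}(\Omega_\psi) = \emptyset$ for all $\psi \in A_M$. So $\bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi) = \emptyset$. Therefore $\emptyset \in T_M$.


To show that $M \in T_M$, let $\Omega_\psi = \mathbb{R}^n$ for all $\psi \in A_M$. Then $(\Omega_\psi)_{\psi \in A_M} \in \mathrm{Top}(\mathbb{R}^n)^{A_M}$ because $\mathbb{R}^n \in
\mathrm{Top}(\mathbb{R}^n)$, and $\psi^{-1}(\Omega_\psi) = \mathrm{Dom}(\psi)$ for all $\psi \in A_M$. So $\bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi) = \bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$ by
Definition 49.8.2 (ii). Therefore $M \in T_M$.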
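
To show closure of $T_M$ under pairwise intersections, let $G_1, G_2 \in T_M$. Then $G_k = \bigcup_{\psi \in A_M} \psi^{-1}(\Omega^k_\psi)$ for some
set-family $(\Omega^k_\psi)_{\psi \in A_M} \in \mathrm{Top}(\mathbb{R}^n)^{A_M}$, for $k = 1, 2$. Then

$$\begin{aligned}
G_1 \cap G_2 &= \Big( \bigcup_{\psi \in A_M} \psi^{-1}(\Omega^1_\psi) \Big) \cap \Big( \bigcup_{\psi \in A_M} \psi^{-1}(\Omega^2_\psi) \Big) \\
&= \bigcup_{\psi_1 \in A_M} \bigcup_{\psi_2 \in A_M} \big( \psi_1^{-1}(\Omega^1_{\psi_1}) \cap \psi_2^{-1}(\Omega^2_{\psi_2}) \big)
\end{aligned}$$

by Theorem 8.4.8 (viii). Let $G_{0,\psi_0} = \mathrm{Dom}(\psi_0) \cap G_1 \cap G_2$ for all $\psi_0 \in A_M$. Then

$$\begin{aligned}
\forall \psi_0 \in A_M,\quad \psi_0(G_{0,\psi_0}) &= \psi_0(G_1 \cap G_2) \\
&= \psi_0\Big( \bigcup_{\psi_1 \in A_M} \bigcup_{\psi_2 \in A_M} \big( \psi_1^{-1}(\Omega^1_{\psi_1}) \cap \psi_2^{-1}(\Omega^2_{\psi_2}) \big) \Big) \\
&= \bigcup_{\psi_1 \in A_M} \bigcup_{\psi_2 \in A_M} \big( (\psi_0 \circ \psi_1^{-1})(\Omega^1_{\psi_1}) \cap (\psi_0 \circ \psi_2^{-1})(\Omega^2_{\psi_2}) \big)
\end{aligned}$$

by Theorem 10.8.18 (i) and Theorem 10.6.7 (iv’) because $\psi_0$ is injective. But $(\psi_0 \circ \psi_1^{-1})(\Omega^1_{\psi_1}) \in \mathrm{Top}(\mathbb{R}^n)$
and $(\psi_0 \circ \psi_2^{-1})(\Omega^2_{\psi_2}) \in \mathrm{Top}(\mathbb{R}^n)$ for all $\psi_0, \psi_1, \psi_2 \in A_M$ by part (i). So $\forall \psi_0 \in A_M,\ \psi_0(G_{0,\psi_0}) \in \mathrm{Top}(\mathbb{R}^n)$
because each set $\psi_0(G_{0,\psi_0})$ is a union of finite intersections of open sets in $\mathbb{R}^n$. Now let $\Omega^0_{\psi_0} = \psi_0(G_{0,\psi_0})$
for all $\psi_0 \in A_M$. Then $G_1 \cap G_2 = \bigcup_{\psi_0 \in A_M} (\mathrm{Dom}(\psi_0) \cap G_1 \cap G_2)$ by Definition 49.8.2 (ii). Therefore
$G_1 \cap G_2 = \bigcup_{\psi_0 \in A_M} G_{0,\psi_0} = \bigcup_{\psi_0 \in A_M} \psi_0^{-1}(\Omega^0_{\psi_0})$ by Theorem 10.7.1 (ii) because each chart $\psi_0$ is injective by
Definition 49.8.2 (i). Hence $G_1 \cap G_2 \in T_M$ because $\Omega^0_{\psi_0} = \psi_0(G_{0,\psi_0}) \in \mathrm{Top}(\mathbb{R}^n)$ for all $\psi_0 \in A_M$.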

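To show closure of $T_M$ under arbitrary unions, let $G_k \in T_M$ for all $k \in I$, for some index set $I$. Then
$G_k = \bigcup_{\psi \in A_M} \psi^{-1}(\Omega^k_\psi)$ for some set-family $(\Omega^k_\psi)_{\psi \in A_M} \in \mathrm{Top}(\mathbb{R}^n)^{A_M}$, for all $k \in I$. Thus

$$\begin{aligned}
\bigcup_{k \in I} G_k &= \bigcup_{k \in I} \bigcup_{\psi \in A_M} \psi^{-1}(\Omega^k_\psi) \\
&= \bigcup_{\psi \in A_M} \bigcup_{k \in I} \psi^{-1}(\Omega^k_\psi) \\
&= \bigcup_{\psi \in A_M} \psi^{-1}\Big( \bigcup_{k \in I} \Omega^k_\psi \Big)
\end{aligned}$$

by Theorem 10.8.18 (iii). But $\bigcup_{k \in I} \Omega^k_\psi \in \mathrm{Top}(\mathbb{R}^n)$ for all $(\Omega^k_\psi)_{k \in I} \in \mathrm{Top}(\mathbb{R}^n)^I$, for all $\psi \in A_M$. Therefore
$\bigcup_{k \in I} G_k \in T_M$. Hence $T_M$ is a topology on $M$.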
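
The construction in part (ii) can be explored on a finite toy model. The following Python sketch is only an illustration under stated assumptions, not part of the proof: a two-point set with a hand-picked topology `OPEN` stands in for $\mathrm{Top}(\mathbb{R}^n)$, charts are represented as dictionaries from points to coordinates, and all names (`preimage`, `induced_family`, `is_topology`, `ATLAS`) are hypothetical. It enumerates every union $\bigcup_\psi \psi^{-1}(\Omega_\psi)$ and checks the topology axioms by brute force.

```python
from itertools import product

def preimage(chart, omega):
    """Inverse image of the set omega under a chart (dict: point -> coordinate)."""
    return frozenset(p for p, x in chart.items() if x in omega)

def induced_family(atlas, open_sets):
    """All unions of chart preimages, choosing one open set per chart."""
    family = set()
    for choice in product(open_sets, repeat=len(atlas)):
        pieces = [preimage(chart, omega) for chart, omega in zip(atlas, choice)]
        family.add(frozenset().union(*pieces))
    return family

def is_topology(points, family):
    """Finite check: contains the empty set and the whole set, and is closed
    under pairwise unions and intersections (enough for a finite family)."""
    if frozenset() not in family or frozenset(points) not in family:
        return False
    return all((g | h) in family and (g & h) in family
               for g in family for h in family)

# Assumed toy data: coordinate space {0, 1} with open sets {}, {0}, {0, 1}.
OPEN = [frozenset(), frozenset({0}), frozenset({0, 1})]
# Two charts on M = {a, b, c} whose domains overlap only at the point a.
ATLAS = [{"a": 0, "b": 1}, {"a": 0, "c": 1}]
M = {"a", "b", "c"}

T_M = induced_family(ATLAS, OPEN)
print(sorted(sorted(g) for g in T_M))
print(is_topology(M, T_M))
```

In this toy case the chart overlap $\{a\}$ maps to the open set $\{0\}$ under both charts, so the overlap conditions hold, and the brute-force check confirms that the resulting family of five sets is a topology.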


49.8.9 REMARK: Faster proof that the induced topology of a locally Cartesian atlas is well defined. 
Theorem 49.8.8 can also be proved from Theorem 32.14.7 by considering that each patch $\mathrm{Dom}(\psi)$ for $\psi \in A_M$
has an induced topology from $\mathbb{R}^n$ via the inverse map $\psi^{-1}$. Since these topologies are assumed to be
consistent on overlaps, the overlap conditions of Theorem 32.14.7 are met. Therefore the “union topology”
is well defined.


49.8.10 REMARK: Topology induced by a continuous locally Cartesian atlas for some empty-set cases. 
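Although the proof of Theorem 49.8.8 is apparently valid for the special case $M = \emptyset$, with either $A_M = \emptyset$
or $A_M = \{\emptyset\}$, it is perhaps of some interest to study this scenario separately. (Note that $n$ is an arbitrary
non-negative integer when $M = \emptyset$.)


In the special case $M = \emptyset$, either $A_M = \emptyset$ or $A_M = \{\emptyset\}$. That is, either there is no chart or only the empty
chart. If $A_M = \emptyset$, then $\mathrm{Top}(\mathbb{R}^n)^{A_M} = \{\emptyset\}$ by Theorem 10.2.26 (ii), and so $(\Omega_\psi)_{\psi \in A_M} = \emptyset$ is the only element
of $\mathrm{Top}(\mathbb{R}^n)^{A_M} = \{\emptyset\}$. For this element, $\bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi) = \emptyset$. Therefore $T_M = \{\emptyset\}$, which is the one and
only possible topology for $\emptyset$. (See Remark 31.3.17 and Example 31.5.2 for the empty set topology.)


If $M = \emptyset$ and $A_M = \{\emptyset\}$, then $\mathrm{Top}(\mathbb{R}^n)^{A_M} = \{\{(\emptyset, \Omega_\emptyset)\};\ \Omega_\emptyset \in \mathrm{Top}(\mathbb{R}^n)\}$ by Theorem 10.2.26 (iii), and so
$(\Omega_\psi)_{\psi \in A_M} = \{(\emptyset, \Omega_\emptyset)\}$ for some $\Omega_\emptyset \in \mathrm{Top}(\mathbb{R}^n)$. Then $\bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi) = \bigcup_{\psi \in \{\emptyset\}} \psi^{-1}(\Omega_\psi) = \emptyset^{-1}(\Omega_\emptyset) = \emptyset$.
Hence $T_M = \{\emptyset\}$, which is once again the one and only possible topology for $\emptyset$. (It may seem perplexing
that two different atlases $\emptyset$ and $\{\emptyset\}$ yield the same topology $T_M = \{\emptyset\}$. In essence, the reason for this is the
fact that $\bigcup \emptyset = \emptyset$, which follows from Theorem 8.4.6 (i).)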
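
The two empty-set cases can be mimicked in a few lines of Python. This is a sketch under assumptions: charts are dictionaries, a small finite list `OPEN` stands in for $\mathrm{Top}(\mathbb{R}^n)$, and the function name `induced_family` is hypothetical. The point of interest is that `itertools.product` with `repeat=0` yields exactly one (empty) choice, which mirrors $\mathrm{Top}(\mathbb{R}^n)^{\emptyset} = \{\emptyset\}$.

```python
from itertools import product

def induced_family(atlas, open_sets):
    """All unions of chart preimages, choosing one open set per chart."""
    family = set()
    for choice in product(open_sets, repeat=len(atlas)):
        pieces = [frozenset(p for p, x in chart.items() if x in omega)
                  for chart, omega in zip(atlas, choice)]
        family.add(frozenset().union(*pieces))
    return family

# Assumed stand-in for Top(R^n): any non-empty list of "open sets" will do here.
OPEN = [frozenset(), frozenset({0})]

no_charts = induced_family([], OPEN)      # atlas with no charts at all
empty_chart = induced_family([{}], OPEN)  # atlas containing only the empty chart

print(no_charts == {frozenset()}, empty_chart == {frozenset()})
```

Both atlases yield the family containing only the empty set, in agreement with $T_M = \{\emptyset\}$ in each case.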

49.8.11 REMARK: Topology induced by continuous locally Cartesian atlas is valid, therefore nameable.
Since the set of sets in Theorem 49.8.8 part (ii) is proven to be a valid topology, it may now be referred to
as “the topology induced by a continuous locally Cartesian atlas” in Definition 49.8.12.


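49.8.12 DEFINITION: The topology induced by a continuous locally Cartesian atlas $A_M$ on a set $M$, where
$n = \dim(M) \in \mathbb{Z}_0^+$, is the set

$$T_M = \Big\{ \bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi);\ (\Omega_\psi)_{\psi \in A_M} \in \mathrm{Top}(\mathbb{R}^n)^{A_M} \Big\}.$$

The topological space induced by a continuous locally Cartesian atlas $A_M$ on a set $M$ is the pair $(M, T_M)$.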


49.8.13 REMARK: The unique topology determined by an atlas might be only locally Cartesian. 
The topology $T_M$ in Definition 49.8.12 is uniquely determined by the atlas $A_M$ and is the weak topology
induced by the charts $\psi \in A_M$.


If the topology $T_M$ on $M$ is not Hausdorff, then the atlas $A_M$ may only be thought of as a “topological locally
Cartesian space atlas”. If the topology is Hausdorff, then the atlas may be thought of as a “topological
manifold atlas”. (See Definitions 50.1.11 and 50.1.12.)


49.8.14 REMARK: The topology induced by a continuous locally Cartesian atlas is a “union topology”. 
The topology induced by a continuous locally Cartesian atlas in Definition 49.8.2 may be considered to be a
“union topology” according to the terminology in Section 32.14. Each patch $U_\psi = \mathrm{Dom}(\psi)$ has the topology
$T_\psi = \{\psi^{-1}(G);\ G \in \mathrm{Top}(\mathbb{R}^n)\}$ which is induced by the inverse of a chart $\psi \in A_M$ as in Theorem 32.4.2.
Since the topologies $T_{\psi_1}$ and $T_{\psi_2}$ on any pair of patches $U_{\psi_1} = \mathrm{Dom}(\psi_1)$ and $U_{\psi_2} = \mathrm{Dom}(\psi_2)$ are guaranteed
to be consistent by Definition 49.8.2 (iv), the conditions of Theorem 32.14.7 are met. Therefore the induced
topology of a continuous locally Cartesian atlas is a well-defined topology, and this is the weakest topology
for which all of the charts are continuous.


49.8.15 REMARK: Equivalent "overlap condition" for a locally Cartesian atlas to be continuous. 
Theorem 49.8.16 gives a single condition (iii’) to replace conditions (iii) and (iv) in Definition 49.8.2. 


49.8.16 THEOREM: Equivalent set of conditions for a continuous locally Cartesian atlas. 
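A set $A_M$ is a continuous locally Cartesian atlas for a set $M$ if and only if the following conditions are all
satisfied.


(i’) $\exists n \in \mathbb{Z}_0^+,\ \forall \psi \in A_M,\ \exists U \in \mathbb{P}(M),\ \exists \Omega \in \mathrm{Top}(\mathbb{R}^n),\ \psi : U \to \Omega$ is a bijection,
(ii’) $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$,
(iii’) $\forall \psi_1, \psi_2 \in A_M,\ \forall \Omega \in T_{\psi_1},\ \Omega \cap U_{\psi_2} \in T_{\psi_2}$,

where $U_\psi = \mathrm{Dom}(\psi)$ and $T_\psi = \{\psi^{-1}(G);\ G \in \mathrm{Top}(\mathbb{R}^n)\}$ for all $\psi \in A_M$.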


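PROOF: Suppose that $A_M$ is a continuous locally Cartesian atlas for $M$. Then conditions (i’) and (ii’) follow
from Definition 49.8.2 (i, ii). Let $\psi_1, \psi_2 \in A_M$ and $\Omega \in T_{\psi_1}$. Then $\Omega = \psi_1^{-1}(G_1)$ for some $G_1 \in \mathrm{Top}(\mathbb{R}^n)$. Let
$V_k = \psi_k(U_{\psi_1} \cap U_{\psi_2})$ for $k = 1, 2$. Then $V_1, V_2 \in \mathrm{Top}(\mathbb{R}^n)$ by Definition 49.8.2 (iii). Let $G_1' = G_1 \cap V_1$. Then
$G_1' \in \mathrm{Top}(V_1)$ by the definition of the relative topology on $V_1$. So $\psi_2(\Omega \cap U_{\psi_2}) = \psi_2(\Omega) = \psi_2(\psi_1^{-1}(G_1)) =
(\psi_2 \circ \psi_1^{-1})(G_1') \in \mathrm{Top}(V_2)$ because $\psi_2 \circ \psi_1^{-1} : V_1 \to V_2$ is continuous (with continuous inverse $\psi_1 \circ \psi_2^{-1}$) by Definition 49.8.2 (iv). Therefore
$\psi_2(\Omega \cap U_{\psi_2}) = G_2 \cap V_2$ for some $G_2 \in \mathrm{Top}(\mathbb{R}^n)$ by the definition of the relative topology on $V_2$. Let
$G_2' = G_2 \cap V_2$. Then $\Omega \cap U_{\psi_2} = \psi_2^{-1}(G_2')$, where $G_2' \in \mathrm{Top}(\mathbb{R}^n)$ because $V_2 \in \mathrm{Top}(\mathbb{R}^n)$. Thus $\Omega \cap U_{\psi_2} \in T_{\psi_2}$.
Hence $A_M$ satisfies (iii’).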


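To show the converse, suppose that $M$ and $A_M$ satisfy (i’), (ii’) and (iii’). Definition 49.8.2 conditions (i)
and (ii) follow from (i’) and (ii’). Let $\psi_1, \psi_2 \in A_M$. Then $U_{\psi_2} = \psi_2^{-1}(\mathbb{R}^n) \in T_{\psi_2}$ because $\mathbb{R}^n \in \mathrm{Top}(\mathbb{R}^n)$.
So $U_{\psi_1} \cap U_{\psi_2} \in T_{\psi_1}$ by (iii’). Therefore $U_{\psi_1} \cap U_{\psi_2} = \psi_1^{-1}(G)$ for some $G \in \mathrm{Top}(\mathbb{R}^n)$. So $\psi_1(\mathrm{Dom}(\psi_2)) =
\psi_1(U_{\psi_2}) = \psi_1(U_{\psi_1} \cap U_{\psi_2}) = G \in \mathrm{Top}(\mathbb{R}^n)$. This verifies Definition 49.8.2 condition (iii).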


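To verify condition (iv), let $\psi_1, \psi_2 \in A_M$. Let $V_k = \psi_k(U_{\psi_1} \cap U_{\psi_2})$ for $k = 1, 2$. Let $G_1 \in \mathrm{Top}(V_1)$. Then
$G_1 = G_1' \cap V_1$ for some $G_1' \in \mathrm{Top}(\mathbb{R}^n)$ by the definition of the relative topology on $V_1$. So $G_1 \in \mathrm{Top}(\mathbb{R}^n)$
because $V_1 = \psi_1(\mathrm{Dom}(\psi_2)) \in \mathrm{Top}(\mathbb{R}^n)$. Therefore $\psi_1^{-1}(G_1) \in T_{\psi_1}$. Then $\psi_1^{-1}(G_1) \cap U_{\psi_2} \in T_{\psi_2}$ by (iii’).
Let $G_2 = (\psi_2 \circ \psi_1^{-1})(G_1)$. Then $G_2 = \psi_2(\psi_1^{-1}(G_1)) = \psi_2(\psi_1^{-1}(G_1) \cap U_{\psi_2}) \in \mathrm{Top}(\mathbb{R}^n)$. But $G_2 \subseteq V_2$ and
$V_2 \in \mathrm{Top}(\mathbb{R}^n)$. So $G_2 = G_2 \cap V_2 \in \mathrm{Top}(V_2)$. Thus $(\psi_2 \circ \psi_1^{-1})(G_1) \in \mathrm{Top}(V_2)$ for all $G_1 \in \mathrm{Top}(V_1)$. So
$\psi_2 \circ \psi_1^{-1} : V_1 \to V_2$ is an open map. Similarly, $\psi_1 \circ \psi_2^{-1} : V_2 \to V_1$ is an open map. Hence $\psi_2 \circ \psi_1^{-1} : V_1 \to V_2$
is a homeomorphism, which verifies Definition 49.8.2 condition (iv).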
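
Condition (iii’) lends itself to a direct brute-force test on finite data. The Python sketch below is an illustration only, under the assumptions that charts are dictionaries from points to coordinates and that a finite list `OPEN` stands in for $\mathrm{Top}(\mathbb{R}^n)$; the function names are hypothetical. It computes each chart topology $T_\psi$ and then checks that $\Omega \cap U_{\psi_2} \in T_{\psi_2}$ for every $\Omega \in T_{\psi_1}$.

```python
def chart_topology(chart, open_sets):
    """T_psi = { psi^{-1}(G) : G open }, for a chart given as dict point -> coordinate."""
    return {frozenset(p for p, x in chart.items() if x in g) for g in open_sets}

def satisfies_overlap_condition(atlas, open_sets):
    """Condition (iii'): for all charts psi1, psi2 and all Omega in T_psi1,
    the set Omega intersected with Dom(psi2) lies in T_psi2."""
    tops = [chart_topology(chart, open_sets) for chart in atlas]
    for t1 in tops:
        for chart2, t2 in zip(atlas, tops):
            dom2 = frozenset(chart2)  # the keys of the dict form Dom(psi2)
            if any((omega & dom2) not in t2 for omega in t1):
                return False
    return True

# Assumed toy coordinate space {0, 1} with open sets {}, {0}, {0, 1}.
OPEN = [frozenset(), frozenset({0}), frozenset({0, 1})]

# Overlap {a} is sent to the open set {0} by both charts.
good_atlas = [{"a": 0, "b": 1}, {"a": 0, "c": 1}]
# Overlap {b} is sent to {1} by the first chart, which is not open.
bad_atlas = [{"a": 0, "b": 1}, {"b": 0, "c": 1}]

print(satisfies_overlap_condition(good_atlas, OPEN))
print(satisfies_overlap_condition(bad_atlas, OPEN))
```

The second atlas fails because $\{b\} \in T_{\psi_2}$ but $\{b\} \cap U_{\psi_1} = \{b\} \notin T_{\psi_1}$, which matches the failure of Definition 49.8.2 (iii) for that atlas.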


49.8.17 REMARK: Continuous atlases for locally Cartesian spaces induce the correct original topology. 
Theorem 49.8.18 verifies that a continuous atlas $A$ for a locally Cartesian topological space $(M, T_M)$ according
to Definition 49.7.3 induces the same topology $T_A$ on $M$ as was present on the original space. In other
words, $T_A = T_M$. Therefore it is safe to work with either the topological space or a continuous atlas, at least
from the purely topological point of view.


49.8.18 THEOREM: The topology induced by a continuous atlas is the same as the original topology. 
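Let $A$ be a continuous atlas for a locally Cartesian space $M$. Let $T_A$ be the topology induced by $A$ on $M$.
Then $T_A = \mathrm{Top}(M)$.


PROOF: By Definition 49.8.12, $T_A = \{\bigcup_{\psi \in A} \psi^{-1}(G_\psi);\ (G_\psi)_{\psi \in A} \in \mathrm{Top}(\mathbb{R}^n)^A\}$. Let $\Omega \in T_A$. Then
$\Omega = \bigcup_{\psi \in A} \psi^{-1}(G_\psi)$ for some family $(G_\psi)_{\psi \in A}$ in $\mathrm{Top}(\mathbb{R}^n)^A$. But $\psi^{-1}(G_\psi) \in \mathrm{Top}(M)$ for all $\psi \in A$ because $\psi$
is continuous and $G_\psi \in \mathrm{Top}(\mathbb{R}^n)$, where $n = \dim(M)$. So $\bigcup_{\psi \in A} \psi^{-1}(G_\psi) \in \mathrm{Top}(M)$ by Definition 31.3.2 (iii).
Therefore $T_A \subseteq \mathrm{Top}(M)$.

Now let $\Omega \in \mathrm{Top}(M)$. Let $G_\psi = \psi(\Omega)$ for all $\psi \in A$. Then $(G_\psi)_{\psi \in A} \in \mathrm{Top}(\mathbb{R}^n)^A$ because every chart $\psi \in A$
is a homeomorphism. But $\bigcup_{\psi \in A} \psi^{-1}(G_\psi) = \Omega$ by Theorem 49.3.6 (iii). So $\Omega \in T_A$. Therefore $\mathrm{Top}(M) \subseteq T_A$.
Hence $T_A = \mathrm{Top}(M)$.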


49.9. Locally Cartesian patchwork spaces 


49.9.1 REMARK:  Topological patchwork spaces are constructed by joining patches. 

General topological patchwork spaces are defined in Section 32.15. (The patchwork of a family of general 
topological spaces is presented in Definition 32.15.3.) When the patches of a topological patchwork space 
are all open subsets of a fixed Cartesian topological space, the result is a locally Cartesian patchwork space. 
(If the topology on a locally Cartesian patchwork space is Hausdorff, the space may be referred to as a
“topological manifold patchwork space”.) Such spaces are useful for constructing locally Cartesian spaces
when the underlying point space is not given. This is quite frequently the case for the modelling of intrinsic
space or space-time in physics.


Conversely, the transition maps of any given locally Cartesian space may be combined to construct a
topological synthesis of the patches, as stated in Theorem 49.9.2. (Theorem 49.9.2 is illustrated in Figure 49.9.1.)


Figure 49.9.1 Topological patchwork of charts of a locally Cartesian space


49.9.2 THEOREM: Homeomorphism between locally Cartesian space and its patchwork space. 
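Let $(\psi_i)_{i \in I}$ be an indexed atlas for a locally Cartesian space $M$ with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $X_i = \mathrm{Range}(\psi_i)$
for $i \in I$. Define $X \subseteq \times_{i \in I} X_i$ by

$$\begin{aligned}
X &= \big\{ (x_i)_{i \in J} \in \times_{i \in I} X_i;\ \exists p \in M,\ (\forall i \in J,\ x_i = \psi_i(p) \text{ and } \forall i \in I \setminus J,\ p \notin \mathrm{Dom}(\psi_i)) \big\} \\
&= \big\{ (x_i)_{i \in J} \in \times_{i \in I} X_i;\ \exists p \in M,\ \forall i \in I,\ ((i \in J \text{ and } x_i = \psi_i(p)) \text{ or } (i \notin J \text{ and } p \notin \mathrm{Dom}(\psi_i))) \big\}.
\end{aligned}$$

For all $i \in I$, define the topology $T_i$ on $X_i$ to be the relative topology from $\mathbb{R}^n$. The family $(X_i, T_i)_{i \in I}$ is
topologically compatible with the patchwork $X$. Let $T$ be the patchwork topology on $X$. Then $(X, T) \approx
(M, \mathrm{Top}(M))$.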


PROOF: It is evident from Definition 10.17.8 that $X$ is a set patchwork of the family $(X_i)_{i \in I}$. Condition (i)
of Definition 10.17.8 requires that there be no null families in $X$. This follows from the fact that the atlas
covers $M$. The remaining conditions follow equally straightforwardly.


The fact that the family $(X_i, T_i)_{i \in I}$ is topologically compatible with the patchwork $X$ follows directly from
the topological consistency of all charts in the atlas. Since all chart transition maps are homeomorphisms,
the image of an open set under any chart transition map is an open set.


To show the topological equivalence of $(X, T)$ and $(M, \mathrm{Top}(M))$, define the identification map $f_i : X_i \to X$
as in Definition 32.15.3 so that $f_i(x_i) = x$ for all $i \in I$ and $x \in X$. In other words, $f_i$ maps each element of
$X_i$ to the corresponding element of the patchwork $X$.


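Define the function $f : M \to X$ so that $f(p) = f_i(\psi_i(p))$ for some $i \in I$ with $p \in \mathrm{Dom}(\psi_i)$. This is well-defined because
the functions $f_i$ are defined so that $f_i(x_i) = f_j(x_j)$ if and only if $\psi_i^{-1}(x_i) = \psi_j^{-1}(x_j)$ for all $i, j \in I$. To
show that $f$ is a homeomorphism, first let $\Omega \in \mathrm{Top}(M)$ and note that $\psi_i(\Omega) \in T_i$ for all $i \in I$. Therefore
$f(\Omega) = \bigcup_{i \in I} f_i(\psi_i(\Omega)) \in T$. Similarly, any open set of $(X, T)$ is of the form $\bigcup_{i \in I} f_i(\Omega_i)$ for some open sets
$\Omega_i \in T_i$ for $i \in I$, by Definition 32.15.3. Each set $\psi_i^{-1}(\Omega_i)$ is open in $(M, \mathrm{Top}(M))$. So $f^{-1}(\bigcup_{i \in I} f_i(\Omega_i)) =
\bigcup_{i \in I} f^{-1}(f_i(\Omega_i)) = \bigcup_{i \in I} \psi_i^{-1}(\Omega_i) \in \mathrm{Top}(M)$. Hence $f : (M, \mathrm{Top}(M)) \approx (X, T)$.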


49.10. Continuous real-valued functions 


49.10.1 REMARK: Linear spaces of real-valued functions on locally Cartesian spaces. 

One of the most important constructions on a locally Cartesian space is the real linear space of continuous 
real-valued functions. (Continuity of real-valued functions may be tested via the charts in the same way as 
for continuous curves and continuous maps between locally Cartesian spaces.) This is a linear subspace of 
the free linear space on M. (See Definition 22.2.5 for the free linear space on a set.) Many useful classes of 
linear spaces on topological and differentiable manifolds are defined as subspaces of $C(M, \mathbb{R})$.


The definitions and notations in Section 49.10 are all applicable to general topological spaces.


49.10.2 DEFINITION: The linear space of continuous real-valued functions on a locally Cartesian space M 
is the real linear space of continuous real-valued functions on M together with the operations of pointwise 
addition and multiplication by elements of $\mathbb{R}$.


49.10.3 NOTATION: $C(M, \mathbb{R})$, for a locally Cartesian space $M$, denotes the real linear space of continuous
real-valued functions on $M$.


$C(M)$, $C^0(M, \mathbb{R})$ and $C^0(M)$ are alternative notations for $C(M, \mathbb{R})$.


49.10.4 REMARK: Specification tuple for linear space of real-valued functions on a locally Cartesian space. 
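The full specification tuple for the real linear space $C(M, \mathbb{R})$ in Definition 49.10.2 and Notation 49.10.3 is
$(\mathbb{R}, V, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_V, \mu)$, where
(i) $\mathbb{R} \mathbin{-\!\!<} (\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ denotes the field of real numbers,
(ii) $V = \{f : M \to \mathbb{R};\ f \text{ is continuous}\}$,
(iii) $\sigma_V : V \times V \to V$ is the pointwise vector addition operation on $V$, and
(iv) $\mu : \mathbb{R} \times V \to V$ is the pointwise scalar multiplication operation by $\mathbb{R}$ on $V$.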

49.10.5 REMARK: Sets of partially defined real-valued functions on a locally Cartesian space.

Notations 49.10.6 and 49.10.7 refer to sets, not linear spaces, of partially defined functions on a locally
Cartesian space because these sets of functions have no natural linear space structure. (See Section 10.9 for
partially defined functions.) However, such sets of functions are often useful because of the local character
of locally Cartesian spaces. The sets of functions $C(\Omega, \mathbb{R})$ in Notation 49.10.6 have optional linear space
structure.


49.10.6 NOTATION: $\mathring{C}(M, \mathbb{R})$, for a locally Cartesian space $M$, denotes the set of continuous real-valued
functions on open subsets of $M$. In other words,
$$\mathring{C}(M, \mathbb{R}) = \bigcup_{\Omega \in \mathrm{Top}(M)} C(\Omega, \mathbb{R}).$$


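49.10.7 NOTATION: $\mathring{C}_p(M, \mathbb{R})$, for a locally Cartesian space $M$ and $p \in M$, denotes the set of continuous
real-valued functions on open subsets of $M$ containing $p$. That is, $\mathring{C}_p(M, \mathbb{R}) = \{f \in \mathring{C}(M, \mathbb{R});\ p \in \mathrm{Dom}(f)\}$.
In other words,
$$\mathring{C}_p(M, \mathbb{R}) = \bigcup_{\Omega \in \mathrm{Top}_p(M)} C(\Omega, \mathbb{R}).$$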


49.10.8 REMARK: Linear spaces of vector-valued functions on a locally Cartesian space. 
Definition 49.10.9 extends Definition 49.10.2 from real-valued functions to general vector-valued functions.
The full specification tuple for such a space is $(\mathbb{R}, V, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_V, \mu)$, where


(i) $\mathbb{R} \mathbin{-\!\!<} (\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ denotes the field of real numbers,
(ii) $V = \{f : M \to W;\ f \text{ is continuous}\}$,
(iii) $\sigma_V : V \times V \to V$ is the pointwise vector addition operation on $V$, and
(iv) $\mu : \mathbb{R} \times V \to V$ is the pointwise scalar multiplication operation by $\mathbb{R}$ on $V$.

This looks very similar to the specification tuple in Remark 49.10.4, but the vector addition and scalar
multiplication operations use the linear space structure of $W$ instead of the field addition and multiplication
operations of $\mathbb{R}$.


In the case of a differentiable manifold, it is not possible to let $W$ be the tangent space at each point of
the point-space because $W$ must be the same linear space at every point. If the linear space is different at
distinct points, the structure is better represented as a vector bundle. Then the space of functions becomes a space
of cross-sections of the vector bundle.


In some applications, the space W is a real linear space of component m-tuples for vectors or tensors or 
other structures on the point-space. Since components are generally defined only within a single chart, it is 
useful to define also a notation for partially defined vector-valued functions. 


Continuous W-valued functions on locally Cartesian spaces require a topology on the linear space W. In the 
case of finite-dimensional or Banach linear spaces, this topology is generally clear. For other kinds of linear 
spaces, the topology must be stated in the context. 


49.10.9 DEFINITION: The linear space of continuous W -valued functions on a locally Cartesian space M, 
for a real linear space W, is the real linear space of continuous W-valued functions on M together with the 
operations of pointwise addition and multiplication by elements of $\mathbb{R}$.


49.10.10 NOTATION: $C(M, W)$, for a locally Cartesian space $M$ and real linear space $W$, denotes the real
linear space of continuous $W$-valued functions on $M$.


$C^0(M, W)$ is an alternative notation for $C(M, W)$.


49.10.11 NOTATION: $\mathring{C}(M, W)$, for a locally Cartesian space $M$ and a real linear space $W$, denotes the
set of continuous $W$-valued functions on open subsets of $M$. In other words,

$$\mathring{C}(M, W) = \bigcup_{\Omega \in \mathrm{Top}(M)} C(\Omega, W).$$

49.10.12 NOTATION: $\mathring{C}_p(M, W)$, for a locally Cartesian space $M$ and $p \in M$, and a real linear space $W$,
denotes the set of continuous $W$-valued functions on open subsets of $M$ which contain $p$. In other words,
$$\forall p \in M,\quad \mathring{C}_p(M, W) = \bigcup_{\Omega \in \mathrm{Top}_p(M)} C(\Omega, W).$$


49.11. Continuous curves and maps 


49.11.1 REMARK: Curves and paths for locally Cartesian spaces are inherited from topological spaces. 
Definitions of curves and paths in locally Cartesian spaces are inherited without change from the topological 
space definitions in Sections 36.2 and 36.8. 


49.11.2 REMARK: Continuity of curves in locally Cartesian spaces may be tested via the charts. 

In locally Cartesian spaces, curves may be tested for continuity by mapping them through the charts. Thus
if $\gamma : I \to M$ is a map from an interval $I \subseteq \mathbb{R}$ to a locally Cartesian space $M$ with $n = \dim(M)$, then $\gamma$ is
continuous if and only if $\psi \circ \gamma|_{\gamma^{-1}(U)} : \gamma^{-1}(U) \to \mathbb{R}^n$ is continuous for all continuous charts $\psi : U \to \mathbb{R}^n$
for $M$.


49.11.3 REMARK: Continuity of maps between locally Cartesian spaces may be tested via the charts. 
Continuity of maps between locally Cartesian spaces can be tested either with respect to the topologies on 
the respective point sets, or alternatively with respect to atlases of continuous charts for these sets. This is 
asserted in Theorem 49.11.4. In the case of differentiable manifolds, on the other hand, the only option for 
testing differentiability of maps is with respect to atlases of differentiable charts. 


49.11.4 THEOREM: Continuity between locally Cartesian spaces is equivalent to continuity via charts. 
Let $M_1$ and $M_2$ be locally Cartesian spaces. Let $A_1$ and $A_2$ be continuous atlases for $M_1$ and $M_2$ respectively.
Then a map $f : M_1 \to M_2$ is continuous if and only if


$$\forall \psi_1 \in A_1,\ \forall \psi_2 \in A_2,\quad \psi_2 \circ f \circ \psi_1^{-1} \text{ is continuous.} \qquad (49.11.1)$$


PROOF: Suppose that $f : M_1 \to M_2$ is continuous. Let $\psi_k \in A_k$ with $U_k = \mathrm{Dom}(\psi_k)$ for $k = 1, 2$. Then
$\psi_2 \circ f \circ \psi_1^{-1} : \psi_1(U_1) \to \psi_2(U_2)$ is continuous by Theorem 31.13.4 because $\psi_2$, $f$ and $\psi_1^{-1}$ are continuous.
This verifies line (49.11.1).


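Now suppose that line (49.11.1) holds. By Theorem 49.3.6 (v), a map $f : M_1 \to M_2$ satisfies

$$\forall \Omega \in \mathrm{Top}(M_2),\quad f^{-1}(\Omega) = \bigcup_{\psi_1 \in A_1} \bigcup_{\psi_2 \in A_2} (\psi_1^{-1} \circ f_{\psi_1,\psi_2}^{-1} \circ \psi_2)(\Omega),$$

where $f_{\psi_1,\psi_2} = \psi_2 \circ f \circ \psi_1^{-1}$ for $\psi_1 \in A_1$ and $\psi_2 \in A_2$. By Theorem 31.13.4, $\psi_2^{-1} \circ f_{\psi_1,\psi_2} \circ \psi_1$ is continuous.
So $(\psi_1^{-1} \circ f_{\psi_1,\psi_2}^{-1} \circ \psi_2)(\Omega) = (\psi_2^{-1} \circ f_{\psi_1,\psi_2} \circ \psi_1)^{-1}(\Omega) \in \mathrm{Top}(M_1)$. Therefore $f^{-1}(\Omega) \in \mathrm{Top}(M_1)$ by
Definition 31.3.2 (iii). Hence $f$ is continuous by Definition 31.12.4.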
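
The equivalence in Theorem 49.11.4 can be spot-checked on a finite toy space. The following Python sketch rests on several assumptions which are not taken from the text: a two-point set with topology `OPEN` stands in for $\mathbb{R}^n$, the three-point space `TOP_M` with atlas `ATLAS` plays the role of both $M_1$ and $M_2$, and a partially defined map is treated as continuous when the inverse image of every open set is open in the whole source space. All names are hypothetical.

```python
def inverse_image(mapping, target_set):
    """Inverse image of target_set under a (possibly partial) map given as a dict."""
    return frozenset(p for p, q in mapping.items() if q in target_set)

def is_continuous(mapping, open_src, open_dst):
    """Assumed convention: inverse images of open sets are open in the source space."""
    return all(inverse_image(mapping, g) in open_src for g in open_dst)

def local_expression(psi1, f, psi2):
    """The composite psi2 o f o psi1^{-1} as a dict from coordinates to coordinates."""
    return {x: psi2[f[p]] for p, x in psi1.items() if f[p] in psi2}

def continuous_via_charts(atlas1, f, atlas2, open_coord):
    """Line (49.11.1): every local expression is continuous."""
    return all(is_continuous(local_expression(p1, f, p2), open_coord, open_coord)
               for p1 in atlas1 for p2 in atlas2)

# Assumed toy coordinate space {0, 1} and toy space M = {a, b, c}.
OPEN = {frozenset(), frozenset({0}), frozenset({0, 1})}
ATLAS = [{"a": 0, "b": 1}, {"a": 0, "c": 1}]
TOP_M = {frozenset(), frozenset("a"), frozenset("ab"), frozenset("ac"), frozenset("abc")}

swap = {"a": "a", "b": "c", "c": "b"}  # exchanges the two "branches"
flip = {"a": "b", "b": "a", "c": "a"}  # inverse image of {a} is {b, c}, not open

for f in (swap, flip):
    print(is_continuous(f, TOP_M, TOP_M), continuous_via_charts(ATLAS, f, ATLAS, OPEN))
```

For both sample maps the direct test and the chart-based test agree, which is what the theorem predicts; this is of course only a check on two examples, not a proof.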


49.11.5 NOTATION: $C(M_1, M_2)$, for locally Cartesian spaces $M_1$ and $M_2$, denotes the set of all continuous
functions from $M_1$ to $M_2$.


$C^0(M_1, M_2)$ is an alternative notation for $C(M_1, M_2)$.


49.11.6 REMARK:  Pull-back atlases via homeomorphisms. 

Since homeomorphisms preserve all topological properties, and a locally Cartesian space atlas is a kind
of “topological property”, the “pull-back” of an atlas via a homeomorphism as in Definition 49.11.7 is,
unsurprisingly, a valid atlas on the domain space. Such pull-back atlases are useful on submanifolds which
are embedded via some homeomorphism or diffeomorphism, in particular for constructing atlases on the fibre
sets of differentiable fibre bundles. (See Theorem 50.5.2 for an application of Theorem 49.11.8 to
product-structured topological manifolds. See Theorem 52.2.6 for some related properties for differentiable manifolds
and diffeomorphisms.) Definition 49.11.7 is illustrated in Figure 49.11.1.

Figure 49.11.1 The pull-back of a topological manifold chart via a homeomorphism


49.11.7 DEFINITION: The pull-back atlas for a locally Cartesian space $M_1$ from an atlas $A_2$ on a locally
Cartesian space $M_2$ via a homeomorphism $\phi : M_1 \to M_2$ is the set $\{\psi \circ \phi;\ \psi \in A_2\}$.


Alternative name: induced atlas. 


49.11.8 THEOREM: Some basic properties of pull-back atlases. 
Let $\phi : M_1 \to M_2$ be a homeomorphism between locally Cartesian spaces $M_1$ and $M_2$. Let $A_2$ be a continuous
atlas for $M_2$. Let $A_1$ be the pull-back atlas for $M_1$ from $A_2$ via $\phi$.


(i) $A_1$ is a continuous atlas for $M_1$. Hence $A_1 \subseteq \mathrm{atlas}(M_1)$.
(ii) $\forall p_1 \in M_1,\ \{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} = \{\psi_2 \circ \phi;\ \psi_2 \in A_2 \text{ and } \phi(p_1) \in \mathrm{Dom}(\psi_2)\}$.
(iii) If $A_2 = \mathrm{atlas}(M_2)$, then $A_1 = \mathrm{atlas}(M_1)$.


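PROOF: For part (i), let $\psi_1 \in A_1$. Then $\psi_1 = \psi_2 \circ \phi$ for some $\psi_2 \in A_2$. So by Definitions 49.7.3 and 49.6.2,
$\psi_2 : U_2 \to G_2$ is a homeomorphism for some $U_2 \in \mathrm{Top}(M_2)$ and $G_2 \in \mathrm{Top}(\mathbb{R}^{n_2})$, where $n_2 = \dim(M_2)$. Let
$U_1 = \phi^{-1}(U_2)$. Then $U_1 \in \mathrm{Top}(M_1)$ by Definition 31.12.4, and $\psi_2 \circ \phi : U_1 \to G_2$ is a homeomorphism by
Theorem 31.14.17 (i). Thus every $\psi_1 \in A_1$ is a continuous chart for $M_1$.


Let $p_1 \in M_1$. Then $\phi(p_1) \in \mathrm{Dom}(\psi_2)$ for some $\psi_2 \in A_2$ by Definition 49.7.3. So $p_1 \in \mathrm{Dom}(\psi_2 \circ \phi)$. Therefore
$M_1 \subseteq \bigcup_{\psi_1 \in A_1} \mathrm{Dom}(\psi_1)$. So $M_1 = \bigcup_{\psi_1 \in A_1} \mathrm{Dom}(\psi_1)$. Thus $A_1$ is a continuous atlas for $M_1$ by Definition 49.7.3.
Hence $A_1 \subseteq \mathrm{atlas}(M_1)$ by Notation 49.7.20 and Definition 49.7.18.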


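For part (ii), let $p_1 \in M_1$ and $\psi_1 \in A_1$ with $p_1 \in \mathrm{Dom}(\psi_1)$. Then $\psi_1 = \psi_2 \circ \phi$ for some $\psi_2 \in A_2$, and
$\phi(p_1) \in \phi(\mathrm{Dom}(\psi_1)) = \mathrm{Dom}(\psi_1 \circ \phi^{-1})$ by Theorem 10.10.13 (iv). Therefore $\phi(p_1) \in \mathrm{Dom}(\psi_2)$. Thus
$\{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} \subseteq \{\psi_2 \circ \phi;\ \psi_2 \in A_2 \text{ and } \phi(p_1) \in \mathrm{Dom}(\psi_2)\}$.

For the reverse inclusion, let $\psi_1 \in \{\psi_2 \circ \phi;\ \psi_2 \in A_2 \text{ and } \phi(p_1) \in \mathrm{Dom}(\psi_2)\}$. Then $\psi_1 = \psi_2 \circ \phi$ for
some $\psi_2 \in A_2$, and $\phi(p_1) \in \mathrm{Dom}(\psi_2)$. Therefore $\psi_1 \in A_1$ and $p_1 \in \phi^{-1}(\mathrm{Dom}(\psi_2)) = \mathrm{Dom}(\psi_2 \circ \phi)$ by
Theorem 10.10.13 (i). So $p_1 \in \mathrm{Dom}(\psi_1)$. Thus $\{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} \supseteq \{\psi_2 \circ \phi;\ \psi_2 \in A_2 \text{ and } \phi(p_1) \in
\mathrm{Dom}(\psi_2)\}$. Hence $\{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} = \{\psi_2 \circ \phi;\ \psi_2 \in A_2 \text{ and } \phi(p_1) \in \mathrm{Dom}(\psi_2)\}$.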

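For part (iii), let $A_2 = \mathrm{atlas}(M_2)$. Let $\psi_1 \in \mathrm{atlas}(M_1)$. Then $\psi_1 : U_1 \to G_1$ is a homeomorphism for
some $U_1 \in \mathrm{Top}(M_1)$ and $G_1 \in \mathrm{Top}(\mathbb{R}^{n_1})$, where $n_1 = \dim(M_1)$. Let $\psi_2 = \psi_1 \circ \phi^{-1}$ and $U_2 = \phi(U_1)$. Then
$\mathrm{Dom}(\psi_2) = \phi(\mathrm{Dom}(\psi_1))$ by Theorem 10.10.13 (iii). So $\mathrm{Dom}(\psi_2) = \phi(U_1) \in \mathrm{Top}(M_2)$ by Definitions 31.14.2
and 31.12.4. But $\mathrm{Range}(\psi_2) = \psi_1(\mathrm{Dom}(\phi))$ by Theorem 10.10.13 (iv). So $\mathrm{Range}(\psi_2) = \mathrm{Range}(\psi_1) = G_1 \in
\mathrm{Top}(\mathbb{R}^{n_1})$. Therefore $\psi_2 \in \mathrm{atlas}(M_2) = A_2$. But $\psi_1 = \psi_2 \circ \phi$. So $\psi_1 \in A_1$. Thus $A_1 \supseteq \mathrm{atlas}(M_1)$. Hence
$A_1 = \mathrm{atlas}(M_1)$ by part (i).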
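
At the level of sets, the pull-back atlas of Definition 49.11.7 is just composition of each chart with $\phi$. The short Python sketch below illustrates this and spot-checks part (ii) of Theorem 49.11.8 on toy data. It assumes charts and the map $\phi$ are given as dictionaries, ignores topology entirely, and uses hypothetical names (`pull_back_atlas`, `charts_at`).

```python
def pull_back_atlas(atlas2, phi):
    """The list of composites psi o phi for psi in atlas2 (charts and phi are dicts)."""
    return [{p: psi[q] for p, q in phi.items() if q in psi} for psi in atlas2]

def charts_at(atlas, point):
    """The charts of an atlas whose domain contains the given point."""
    return [chart for chart in atlas if point in chart]

# Assumed toy data: a bijection phi from M1 = {x, y, z} to M2 = {a, b, c}.
phi = {"x": "a", "y": "b", "z": "c"}
atlas2 = [{"a": 0, "b": 1}, {"a": 0, "c": 1}]

atlas1 = pull_back_atlas(atlas2, phi)
print(atlas1)

# Part (ii): the charts of atlas1 at p1 are the pull-backs of the charts of atlas2 at phi(p1).
for p1 in phi:
    assert charts_at(atlas1, p1) == pull_back_atlas(charts_at(atlas2, phi[p1]), phi)
```

The domain of each pulled-back chart is $\phi^{-1}(\mathrm{Dom}(\psi_2))$, which is why the filter `if q in psi` appears in the composition.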

Chapter 50


TOPOLOGICAL MANIFOLDS


50.1 Topological manifolds
50.2 Submanifolds of topological manifolds
50.5 Product-structured topological manifolds and submanifolds
50.6 Hölder and Lipschitz continuous manifolds
50.7 Rectifiable curves in locally Lipschitz manifolds


50.0.1 REMARK:  Topological manifolds versus locally Cartesian spaces. 

As explained in Remark 49.2.8, topological manifolds are defined in this book as locally Cartesian topological 
spaces which happen to have the Hausdorff separation property. Some authors add various other topological 
requirements for topological manifolds, but they are best stated as needed. 


There are two principal reasons to require topological manifolds to be Hausdorff. 


(1) Hausdorff locally Cartesian spaces can be embedded or immersed in Cartesian spaces.


(2) For purely intrinsic manifolds, the Hausdorff condition excludes the pathological-looking kinds of spaces 
which are discussed in Section 49.5. 


In pure mathematical differential geometry, embedded or immersed locally Cartesian spaces are the principal 
structures to study. In most physics, bifurcations of space and time are rarely of interest. Therefore the 
"benefits" of the Hausdorff restriction are generally welcome in both pure mathematics and physics contexts. 
There is, however, no obstacle in the way of anyone who wishes to open the Pandora’s box of non-Hausdorff 
locally Cartesian spaces. (But the name is intentionally long to discourage its use!) 


Since submanifolds of Cartesian spaces, and embeddings and immersions in Cartesian spaces, force the 
Hausdorff condition, these are presented in Sections 50.2 and 50.3 for topological manifolds only. Such 
concepts are easily “back-ported” to general locally Cartesian spaces if required. 


50.0.2 REMARK: Terminology: Topological manifolds versus continuous manifolds. 

The terminology for differentiable manifolds in Chapter 51, if applied to topological manifolds, would suggest 
that the name should be “continuous manifolds". A differentiable manifold has differentiable chart transition 
maps, but a topological manifold has continuous chart transition maps. To make the names more consistent, 
one could change “topological manifold” to “continuous manifold”, but the real problem is the lack of a 
differentiable structure which can play the same role for differentiable transition maps as a topology plays 
for continuous transition maps. As mentioned in Remark 49.7.8, tangent bundles can act in this role, but 
this is rarely seen in the literature. Moreover, a term such as “tangent-bundle manifold" would certainly not 
enhance comprehensibility. 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 

50.1. Topological manifolds


50.1.1 DEFINITION: A topological manifold is a Hausdorff locally Cartesian topological space. In other 
words, a topological manifold is a topological space M which satisfies 

(i) $\exists n \in \mathbb{Z}_0^+,\ \forall x \in M,\ \exists \Omega \in \mathrm{Top}_x(M),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$, and

(ii) M is a Hausdorff space. 


50.1.2 REMARK: The Hausdorff property is not implied by the locally Cartesian property. 

It is perhaps not immediately obvious why the Hausdorff property does not follow from the locally Cartesian 
property in Definition 50.1.1 (i), but the examples in Section 49.5 amply demonstrate that it is not without 
effect. (See Definition 33.1.24 for the Hausdorff condition.) A brief family tree for topological manifolds and
locally Cartesian spaces is illustrated in Figure 50.1.1. 


Figure 50.1.1 Family tree for topological manifolds and locally Cartesian spaces (nodes: locally Cartesian topological space, Hausdorff topological space, topological manifold, Cartesian topological space)


Definitions, notations and theorems which are valid for locally Cartesian spaces are also valid for topological 
manifolds. In fact, most topological manifold concepts make no reference to the Hausdorff condition at all 
and do not need it. (The issue of whether to require Hausdorff separation is also mentioned in Remark 49.5.2.) 


To permit emphasis of the Hausdorff property of topological manifolds, and to allow this “default” property 
to be overridden, Definition 50.1.3 effectively makes “topological manifold” mean the same thing as “locally 
Cartesian space” because the Hausdorff or non-Hausdorff qualification is stated explicitly anyway. 


50.1.3 DEFINITION: A Hausdorff topological manifold means the same as a “topological manifold”. 


A non-Hausdorff topological manifold means the same as a “locally Cartesian space”. 


50.1.4 REMARK: Dimension of a topological manifold. 

Notation 50.1.5 is a duplicate of Notation 49.4.11. Since topological manifolds are a sub-class of locally 
Cartesian spaces, both Definition 49.4.10 and Notation 49.4.11 for the dimension of a locally Cartesian space 
are immediately applicable. In fact, the great majority of concepts in Chapter 49 are directly applicable to 
topological manifolds. 


50.1.5 NOTATION: dim(M) denotes the dimension of a topological manifold M. 


50.1.6 EXAMPLE: Cartesian spaces satisfy the requirements for a topological manifold. 

A trivial example of a topological manifold is the set $\mathbb{R}^n$ with its standard topology for $n \in \mathbb{Z}_0^+$. For
any $x \in \mathbb{R}^n$, the sets $\Omega$ and $G$ in Definition 50.1.1 may be taken as the whole set $\mathbb{R}^n$, and the homeomorphism
is then the identity map.


50.1.7 EXAMPLE: Cartesian space homeomorphisms for a torus. 

A class of topological spaces which model toruses is presented as Definition 32.6.15. The point set of a torus
may be modelled as a Cartesian product of real-number intervals $M = \times_{k=1}^n I_k$, where for each $k \in \mathbb{N}_n$, the
set $I_k$ is the semi-open real-number interval $[a_k, a_k + L_k)$ or $(a_k, a_k + L_k]$ for some $a_k \in \mathbb{R}$ and $L_k \in \mathbb{R}^+$.
Let $I_k = [0, L_k)$ for all $k \in \mathbb{N}_n$. Then with the topology on $M = \times_{k=1}^n I_k$ as in Definition 32.6.15, there is
a homeomorphism $\psi_{x,b} : \Omega_{x,b} \to \mathbb{R}^n$ for each $x \in M$ and $b \in \mathbb{R}^n$ such that $0 < b_k < L_k/2$ for all $k \in \mathbb{N}_n$,
where $\psi_{x,b}((x + y) \bmod L) = y$ for all $y \in G_{x,b} = \times_{k=1}^n (-b_k, b_k)$ and $\Omega_{x,b} = \{(x + y) \bmod L;\ y \in G_{x,b}\}$. Here
$z \bmod L$ is defined as the real n-tuple $(z_k \bmod L_k)_{k=1}^n$ for all $z \in \mathbb{R}^n$. (See Definition 16.5.19 for the binary
modulo function “mod”.) This is illustrated in Figure 50.1.2.

[Figure label: homeomorphism $\psi_{x,b}$ with domain $\Omega_{x,b}$ centred on $x$]


Figure 50.1.2 Local homeomorphism from a torus to the Cartesian space $\mathbb{R}^2$
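
The chart in Example 50.1.7 can be spot-checked numerically. The following Python sketch is illustrative only and is not part of the formal development. It assumes $I_k = [0, L_k)$ as in the example, uses floating-point numbers in place of real numbers, and the names torus_chart, psi and psi_inv are hypothetical.

```python
def torus_chart(x, b, L):
    """Build the chart psi_{x,b} and its inverse for the torus [0,L_1) x ... x [0,L_n).

    x: centre point on the torus, b: half-widths with 0 < b_k < L_k/2, L: side lengths.
    (Hypothetical helper; names are not from the book.)
    """
    n = len(L)
    if not all(0 < b[k] < L[k] / 2 for k in range(n)):
        raise ValueError("require 0 < b_k < L_k/2 for all k")

    def psi_inv(y):
        # Map y in G = (-b_1, b_1) x ... x (-b_n, b_n) to (x + y) mod L.
        if not all(-b[k] < y[k] < b[k] for k in range(n)):
            raise ValueError("y is outside the chart range G")
        return tuple((x[k] + y[k]) % L[k] for k in range(n))

    def psi(p):
        # Recover y as the representative of (p - x) mod L in [-L_k/2, L_k/2).
        y = []
        for k in range(n):
            d = (p[k] - x[k] + L[k] / 2) % L[k] - L[k] / 2
            if not -b[k] < d < b[k]:
                raise ValueError("point is outside the chart domain")
            y.append(d)
        return tuple(y)

    return psi, psi_inv


# A chart centred near the "seam" of a 2-torus with side lengths 2 and 3.
psi, psi_inv = torus_chart(x=(1.75, 0.25), b=(0.5, 0.5), L=(2.0, 3.0))
p = psi_inv((0.4, -0.4))   # wraps around the seam in both coordinates
y = psi(p)                 # should recover (0.4, -0.4) up to rounding
```

The condition $b_k < L_k/2$ is what makes the choice of representative in psi unambiguous, because the chart domain then never wraps onto itself.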


50.1.8 REMARK: Manifolds with boundary points. 

In the pure mathematical kind of differential geometry which is concerned primarily with the classification 
of global topology, boundaries of manifolds are typically absent. In other words, manifolds are assumed to 
be locally homeomorphic to open subsets of a fixed Cartesian space at all points, as in Definition 50.1.1. 


For example, the open ball $B_{0,1}$ in $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$ is an n-dimensional topological manifold and its boundary,
$\partial B_{0,1} = S^{n-1}$, is an $(n-1)$-dimensional topological manifold, but neither of these manifolds contains boundary
points in the sense of contradicting Definition 50.1.1. The closed ball $\bar{B}_{0,1}$, however, does contain true
boundary points which contradict Definition 50.1.1.


In applications of differential geometry to physics, on the other hand, true boundary points are often required 
to be part of the mathematical model, for example for boundary value problems where the boundaries are 
geometrically non-trivial. (The edge and vertex boundary points of cubes, for example, have different 
geometric qualities, and the vertices of cones can have varying qualities.) 


Despite the importance of manifolds with boundaries, few authors specifically define them. Those who 
do define them include Lang [23], pages 38-42; Gallot/Hulin/Lafontaine [13], pages 181-184; Spivak [37],
Volume 1, page 19; Kosinski [21], page 2; Darling [8], pages 111-114 (only submanifolds with boundary). 
The following remark is made by Bishop/Goldberg [3], page 22. 


The reasons for the restriction to open sets are that it forces a uniformity in the local structure which 
simplifies analysis on a manifold (there are no “edge points”) and, even if local uniformity were
forced in some other way, it avoids the problem of spelling out what we mean by differentiability 
at boundary points of the coordinate neighborhood; that is, one-sided derivatives need not be 
mentioned. On the other hand, in applications, boundary value problems frequently arise, the 
setting for which is a manifold with boundary. These spaces are more general than manifolds and 
the extra generality arises from allowing a boundary manifold of one dimension less. The points of 
the boundary manifold have a coordinate neighborhood in the boundary manifold which is attached 
to a coordinate neighborhood of the interior in much the same way as a face of a cube is attached to 
the interior. Just as the study of boundary value problems is more difficult than the study of spatial 
problems, the study of manifolds with boundary is more difficult than that of mere manifolds, so 
we shall limit ourselves to the latter. 


So after explaining very clearly the importance of manifolds with boundary, they do not give a formal 
definition because of the technical difficulties. The kind of boundary “attachment” to the manifold which 
they describe is the smoothest kind of attachment. Practical applications do not always feature such a 
regular style of attachment. 


50.1.9 REMARK: The optional second countability property for topological manifolds. 
Topological manifolds are sometimes defined to require second countability of the topology. This is not 
assumed in Definition 50.1.1. (See Definition 33.4.13 for second countability.) Second countability is probably 
best regarded as an “optional extra” condition which is stated when needed. Like Hausdorff separation,
second countability is inherited by topological subspaces, and since Cartesian topological spaces are second 
countable, any topological manifold embedded in a Cartesian space will necessarily be second countable as 
well as Hausdorff. 


50.1.10 REMARK: Atlases for topological manifolds. 

Although any continuous atlas (according to Definition 49.7.3) for a topological manifold automatically 
induces a Hausdorff topology on the underlying set (by Theorem 49.8.18), a locally Cartesian atlas which is 
defined on a set (by Definition 49.8.2) will not necessarily induce a Hausdorff topology on the set by merely 
renaming it to a “topological manifold atlas for a set”. Therefore this must be defined additionally to the 
locally Cartesian space definitions in Chapter 49. Definitions 50.1.11 and 50.1.12 are Hausdorff adaptations 
of Definition 49.8.2.


50.1.11 DEFINITION: A topological manifold atlas for a set M is a continuous locally Cartesian atlas for M 
which induces a Hausdorff topology on M. 


50.1.12 DEFINITION: An indexed topological manifold atlas for a set M is an injective family $(\psi_\alpha)_{\alpha \in I}$ such
that $\{\psi_\alpha;\ \alpha \in I\}$ is a topological manifold atlas for M.


50.1.13 REMARK: Preventing the closure of a subset of a chart domain from “leaving the domain”. 
Theorem 50.1.14 asserts that for any chart $\psi$ for a topological manifold M, the closure in M of the inverse
image $\psi^{-1}(S)$ of a set S is included within $\mathrm{Dom}(\psi)$ if the closure of S in $\mathbb{R}^n$ is included within $\mathrm{Range}(\psi)$.
In other words, $\bar{S} \subseteq \mathrm{Range}(\psi) \Rightarrow \overline{\psi^{-1}(S)} \subseteq \mathrm{Dom}(\psi)$. (The topological closure operations are relative to
$\mathbb{R}^n$ and M, not relative to $\mathrm{Range}(\psi)$ and $\mathrm{Dom}(\psi)$.) This seems fairly intuitively obvious in the case of an
embedded manifold as one usually pictures it, but for an intrinsic manifold, the validity of this assertion
depends critically on the Hausdorff separation assumption.


The assertion in Theorem 50.1.14 has an elementary proof if the closures are instead defined with respect to
the range and domain of $\psi$ because $\psi$ is a homeomorphism between these two topological subspaces. The
proof is not so straightforward when the closures are with respect to $\mathbb{R}^n$ and M.


The examples of non-Hausdorff topological manifolds in Section 49.5 give some idea of what can “go wrong”
if the Hausdorff condition is not available. In the case of the simple “real-number line with two origins” in
Example 49.5.5, illustrated in Figure 49.5.1, the open subset $(-1, 1)$ of either of the two chart ranges will
violate the conclusion of Theorem 50.1.14 because the origin for each chart $\psi$ will be in the closure of the inverse image of
$(-1, 1)$ by the other chart $\psi'$. Thus a point from outside the domain of each chart is in the closure of a relatively
compact subset of the domain. This contradicts any intuition which may be derived from experience with
embedded manifolds. It is a peculiar characteristic of manifolds which have truly intrinsic geometry since
non-Hausdorff manifolds cannot be embedded in Cartesian spaces.
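
The failure for the line with two origins can be made concrete with a small exact-arithmetic model. The Python sketch below is only a finite spot-check, not a proof. The encoding of points as tagged tuples, and the helper names regular, in_chart_domain, in_preimage_of_unit_interval, in_basic_neighbourhood and witness, are all hypothetical. It assumes the usual description of the space: each chart domain consists of all non-zero reals together with one of the two origins, and a basic neighbourhood of an origin is that origin together with the non-zero reals of absolute value less than some radius.

```python
from fractions import Fraction

ORIGIN_A = ("origin", "a")
ORIGIN_B = ("origin", "b")


def regular(t):
    """A point of the space which corresponds to a non-zero real number t."""
    t = Fraction(t)
    if t == 0:
        raise ValueError("0 is replaced by the two origins")
    return ("regular", t)


def in_chart_domain(point, which):
    """Domain of chart `which`: all regular points plus the origin named `which`."""
    return point[0] == "regular" or point == ("origin", which)


def in_preimage_of_unit_interval(point, which):
    """Test membership of the inverse image of (-1, 1) under chart `which`."""
    if not in_chart_domain(point, which):
        return False
    if point[0] == "origin":
        return True  # the chart maps its own origin to 0
    return abs(point[1]) < 1


def in_basic_neighbourhood(point, origin, eps):
    """Basic open neighbourhood of `origin`: the origin plus regular t with |t| < eps."""
    if point[0] == "origin":
        return point == origin
    return abs(point[1]) < eps


def witness(eps):
    """A point in both the eps-neighbourhood of an origin and the preimage of (-1, 1)."""
    return regular(min(Fraction(eps), Fraction(1)) / 2)


# Every tested neighbourhood of origin "b" meets the preimage of (-1, 1) under chart "a",
# although origin "b" itself lies outside the domain of chart "a".
checks = [
    in_basic_neighbourhood(witness(eps), ORIGIN_B, eps)
    and in_preimage_of_unit_interval(witness(eps), "a")
    for eps in (Fraction(1), Fraction(1, 10), Fraction(1, 1000))
]
```

The witness $\min(\varepsilon, 1)/2$ works for every radius $\varepsilon$, which is the reason the second origin lies in the closure of the inverse image.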


On a technical point regarding the proof of Theorem 50.1.14, it may seem to be convenient to assert that
$\bar{V}_{1/i} \subseteq V_{1/(i-1)}$ for $i \ge 2$. This temptation must be resisted. The assertion fails due to the inconvenient
fact that $\psi(\tilde\psi^{-1}(\tilde{z})) \notin \mathrm{Range}(\psi)$ because $\tilde\psi^{-1}(\tilde{z}) = \tilde{q} \notin \mathrm{Dom}(\psi)$. So $V_{1/i} = \psi(\tilde\psi^{-1}(B_{\tilde{z},1/i}))$ is “punctured”
by a single-point “hole in the middle” for all $i \ge 1$. Thus the closure $\bar{V}_{1/i}$ does contain the “point in the
middle” whereas the set $V_{1/(i-1)}$ does not. This is in fact the principal issue of the theorem. A point which
is surrounded by other points when viewed through one chart may correspond to a “hole” when viewed
through a different chart.


Theorem 50.1.14 is applied in the proofs of Theorems 51.8.3 and 51.8.5.


50.1.14 THEOREM: Locality of the closure of a subset of a Hausdorff manifold. 
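Let M be a topological manifold with $n = \dim(M)$. Let S be a subset of $\mathrm{Range}(\psi)$ for some $\psi \in \mathrm{atlas}(M)$.
Suppose that $\mathrm{Clos}_{\mathbb{R}^n}(S) \subseteq \mathrm{Range}(\psi)$. Then $\mathrm{Clos}_M(\psi^{-1}(S)) \subseteq \mathrm{Dom}(\psi)$.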


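PROOF: Let $\bar{S}$ denote $\mathrm{Clos}_{\mathbb{R}^n}(S)$ and $\overline{\psi^{-1}(S)}$ denote $\mathrm{Clos}_M(\psi^{-1}(S))$. Suppose that $\tilde{q} \in \overline{\psi^{-1}(S)} \setminus \mathrm{Dom}(\psi)$.
Then $\tilde{q} \in \mathrm{Dom}(\tilde\psi)$ for some $\tilde\psi \in \mathrm{atlas}(M)$ with $\tilde\psi \ne \psi$. Let $\tilde{z} = \tilde\psi(\tilde{q})$.

Let $\delta_0 = d(\tilde{z}, \mathbb{R}^n \setminus \mathrm{Range}(\tilde\psi))$. Then $B_{\tilde{z},\delta} \subseteq \mathrm{Range}(\tilde\psi)$ for all $\delta \in \mathbb{R}^+$ with $\delta < \delta_0$ by Theorem 37.4.3 (i). Let
$U_\delta = \tilde\psi^{-1}(B_{\tilde{z},\delta})$ for all $\delta \in \mathbb{R}^+$. Then $U_\delta \subseteq \mathrm{Dom}(\tilde\psi)$ and $U_\delta \in \mathrm{Top}_{\tilde{q}}(\mathrm{Dom}(\tilde\psi)) \subseteq \mathrm{Top}_{\tilde{q}}(M)$ for all $\delta \in \mathbb{R}^+$
because $\tilde\psi$ is continuous. Therefore $U_\delta \cap \psi^{-1}(S) \ne \emptyset$ for all $\delta \in \mathbb{R}^+$ by Theorem 31.8.17 (ii). Let $V_\delta = \psi(U_\delta)$
for all $\delta \in \mathbb{R}^+$. Then $V_\delta \subseteq \mathrm{Range}(\psi)$ and $V_\delta \in \mathrm{Top}(\mathrm{Range}(\psi)) \subseteq \mathrm{Top}(\mathbb{R}^n)$ because $\psi^{-1}$ is continuous. Then
$\lim_{\delta \to 0^+} \mathrm{diam}(V_\delta) = \lim_{\delta \to 0^+} \mathrm{diam}(\psi(\tilde\psi^{-1}(B_{\tilde{z},\delta}))) = 0$ by Theorem 38.1.8 because $\psi \circ \tilde\psi^{-1}$ is continuous. So
$\lim_{\delta \to 0^+} \mathrm{diam}(\bar{V}_\delta) = 0$ because $\mathrm{diam}(\bar{V}_\delta) = \mathrm{diam}(V_\delta)$ for all $\delta \in \mathbb{R}^+$ by Theorem 37.6.8.

Define a set-family $(S_i)_{i=1}^\infty$ by $S_i = \overline{V_{1/i} \cap S} = \mathrm{Clos}_{\mathbb{R}^n}(V_{1/i} \cap S)$ for all $i \in \mathbb{Z}^+$. Then $S_i \subseteq \bar{S}$ for all $i \in \mathbb{Z}^+$
because $V_\delta \cap S \subseteq S$ and so $\overline{V_\delta \cap S} \subseteq \bar{S}$ for all $\delta \in \mathbb{R}^+$ by Theorem 31.8.13 (xiv). So $S_i \subseteq \mathrm{Range}(\psi)$ for all
$i \in \mathbb{Z}^+$. But $S_i \ne \emptyset$ for all $i \in \mathbb{Z}^+$ because $S_i \supseteq V_{1/i} \cap S = \psi(U_{1/i}) \cap \psi(\psi^{-1}(S)) = \psi(U_{1/i} \cap \psi^{-1}(S)) \ne \emptyset$
because $U_{1/i} \cap \psi^{-1}(S) \ne \emptyset$. By Theorem 37.4.13, $\lim_{i \to \infty} \mathrm{diam}(S_i) = 0$ because $\lim_{\delta \to 0^+} \mathrm{diam}(\bar{V}_\delta) = 0$
and $S_i \subseteq \bar{V}_{1/i}$ for all $i \in \mathbb{Z}^+$. Thus $(S_i)_{i=1}^\infty$ is a family of non-increasing compact subsets of $\mathbb{R}^n$ satisfying
$\lim_{i \to \infty} \mathrm{diam}(S_i) = 0$. Therefore $\bigcap_{i=1}^\infty S_i = \{x\}$ for some $x \in \mathbb{R}^n$ by Theorem 37.9.6 (ii). Then $x \in \bar{S}$ since
$x \in S_1 \subseteq \bar{V}_1 \cap \bar{S}$. So $x \in \mathrm{Range}(\psi)$. Let $q = \psi^{-1}(x)$.

Let $W \in \mathrm{Top}_q(M)$ and $\tilde{W} \in \mathrm{Top}_{\tilde{q}}(M)$. Then $\tilde\psi(\tilde{W}) \in \mathrm{Top}_{\tilde{z}}(\mathbb{R}^n)$. So $B_{\tilde{z},\delta} \subseteq \tilde\psi(\tilde{W})$ for some $\delta \in \mathbb{R}^+$.
Therefore $U_\delta = \tilde\psi^{-1}(B_{\tilde{z},\delta}) \subseteq \tilde\psi^{-1}(\tilde\psi(\tilde{W})) = \tilde{W} \cap \mathrm{Dom}(\tilde\psi) \subseteq \tilde{W}$ for some $\delta \in \mathbb{R}^+$. So $\forall \delta \in (0, \delta_1],\ U_\delta \subseteq \tilde{W}$
for some $\delta_1 \in \mathbb{R}^+$.

The continuity of $\psi^{-1}$ implies that $\psi(W) \in \mathrm{Top}_x(\mathbb{R}^n)$. So $B_{x,\delta'} \subseteq \psi(W)$ for some $\delta' \in \mathbb{R}^+$. Therefore
$\psi^{-1}(B_{x,\delta'}) \subseteq W$ for some $\delta' \in \mathbb{R}^+$. From $\lim_{i \to \infty} \mathrm{diam}(S_i) = 0$, it follows that $\forall i \ge i_0,\ S_i \subseteq B_{x,\delta'}$ for some
$i_0 \in \mathbb{Z}^+$. Then $\psi^{-1}(S_i) \subseteq W$ for all $i \ge i_0$. Let $\delta_2 = \min(\delta_1, 1/i_0)$. Then

$$\begin{aligned}
\psi^{-1}(S_{i_0}) \cap U_{\delta_2} &\supseteq \psi^{-1}(S_{i_0}) \cap (U_{\delta_2} \cap \mathrm{Dom}(\psi)) \\
&= \psi^{-1}(S_{i_0}) \cap \psi^{-1}(V_{\delta_2}) \\
&= \psi^{-1}(S_{i_0} \cap V_{\delta_2}) \\
&= \psi^{-1}(\overline{V_{1/i_0} \cap S} \cap V_{\delta_2}) \\
&\supseteq \psi^{-1}((V_{1/i_0} \cap S) \cap V_{\delta_2}) \\
&\supseteq \psi^{-1}(V_{\delta_2} \cap S) \\
&\ne \emptyset
\end{aligned}$$

because $\emptyset \ne V_\delta \cap S \subseteq \mathrm{Range}(\psi)$ for all $\delta \in \mathbb{R}^+$. So $W \cap \tilde{W} \ne \emptyset$ for all $W \in \mathrm{Top}_q(M)$ and $\tilde{W} \in \mathrm{Top}_{\tilde{q}}(M)$
because $\psi^{-1}(S_{i_0}) \subseteq W$ and $U_{\delta_2} \subseteq \tilde{W}$. This contradicts the Hausdorff assumption for topological manifolds
since $q \ne \tilde{q}$ because $q \in \mathrm{Dom}(\psi)$ and $\tilde{q} \notin \mathrm{Dom}(\psi)$. From this contradiction, it follows that there is no $\tilde{q}$ in
$\overline{\psi^{-1}(S)} \setminus \mathrm{Dom}(\psi)$. In other words, $\overline{\psi^{-1}(S)} \setminus \mathrm{Dom}(\psi) = \emptyset$. Hence $\overline{\psi^{-1}(S)} \subseteq \mathrm{Dom}(\psi)$.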


50.2. Submanifolds of topological manifolds 


50.2.1 REMARK: Submanifolds, embeddings, immersions and submersions. 

The various classes of submanifolds in ambient topological or differentiable manifolds are defined essentially 
in the same way locally as they are for ambient Euclidean spaces. (See Sections 52.3 and 52.4 for submanifolds 
of differentiable manifolds.)


The closely related concepts of submanifolds, embeddings (or imbeddings), immersions and submersions 
are required for various technical aspects of topological and differential fibre bundle concepts, particularly 
in relation to connections and curvature. Local trivialisations of fibre bundles establish homeomorphisms 
between the fibre space and fibre sets. Each fibre set is a submanifold of the total space. Submanifolds are 
also studied as classes of geometric objects for their inherent combinatorial topology properties. 


50.2.2 REMARK: Submanifold definition styles. Regularity of graphs versus rank of differentials. 

In the literature, submanifolds, embeddings, immersions and submersions are often defined in terms of the 
rank of the differential of a map. (This map is either the inclusion map in the case of submanifolds, or a 
map between two manifolds in the case of embeddings, immersions and submersions.) An alternative style 
of definition requires the map to be everywhere locally a graph, via the charts, which must satisfy some 
regularity condition. 

The “rank of differential” style of definition has some difficulties. To demonstrate that the graph (via the 
charts) of the map exists locally everywhere, one must invoke some kind of inverse mapping theorem. 


The “graph regularity” style of definition has the advantage that it is not necessary to define or calculate 
the differential first, and the existence of a local regular graph is part of the definition, not an analytical 
consequence of it. An even more important advantage of the “graph regularity” style is that it can be very
easily adapted for more general classes of regularity, such as the Hölder $C^{k,\alpha}$ classes for example. Such
regularity classes often arise from second-order elliptic boundary value problems.


Thus it is preferable to define the regularity of submanifolds, embeddings, immersions and submersions in 
terms of local graph properties to obtain maximum generality with minimum technical machinery. The 
“rank of differential” approach can then be used to show that under suitable constraints on the differential 
of a map, the map will satisfy the local graph regularity requirements. (Constraints on the rank of the 
differential are most often expressed in terms of injectivity or surjectivity of the differential.) 


In the case of topological manifolds, the concept of the differential of a map is not available. So the “graph 
regularity” approach is the only possibility. 


50.2.3 REMARK: Spaces and maps for submanifolds of a topological manifold. 

Classes of submanifolds of a topological manifold M, and maps from a topological manifold N to M, include 
the following. (See Notation 49.7.20 for the maximal atlas notation “atlas”.) Almost all of the authors in 
the following terminology surveys define these terms only for differentiable manifolds. Amongst the authors 
in these lists, only Lang [23] and Choquet-Bruhat [6] define these concepts for topological manifolds. 


(1) Topological subspace $(S, T_S)$. Definition 31.6.2.
$S \in \mathbb{P}(M)$. $\mathrm{Top}(S) = T_S = \{\Omega \cap S;\ \Omega \in \mathrm{Top}(M)\}$.
Any subset S of M, with the relative topology from M. This is not a topological manifold in general.
(2) Topological submanifold $(S, T_S)$. Definition 50.2.6.
A locally Cartesian topological subspace of M.
Any subset S of M, with the relative topology $T_S$ from M, which is a locally Cartesian space will be a
topological manifold if M is a topological manifold because the Hausdorff property is inherited from M
if M is Hausdorff by Theorem 33.1.33 (iii).
This is called “une sous-variété topologique” (“a topological submanifold”) by Choquet-Bruhat [6], 
page 11; Malliavin [28], pages 53-54. 
(3) Regular submanifold $(S, T_S)$. Definition 50.2.8.
A topological submanifold which also satisfies:

$\forall p \in S,\ \exists \psi \in \mathrm{atlas}_p(M),\ \mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.


This is called a “submanifold” by Lang [23], page 26; Kosinski [21], page 27; Flanders [11], page 52; 
Crampin/Pirani [7], page 243; Do Carmo [9], page 11; Frankel [12], page 27; Lee [24], page 15; Spivak [37], 
Vol. 1, page 49; Bishop/Goldberg [3], page 41; Bishop/Crittenden [2], page 21; Darling [8], page 56; 
Kobayashi/Nomizu [19], page 9; Wallace [153], pages 33-34. It is called a “regular submanifold” by 
Lee [24], page 15; EDM2 [113], page 385. It is called an “embedded submanifold” by Frankel [12], 
page 27; Lee [24], page 15. It is called an “imbedded submanifold” by Crampin/Pirani [7], page 243; 
Kobayashi/Nomizu [19], page 9. 


(4) Regular embedding $\mu : N \to M$. Definition 50.3.6.

A homeomorphism $\mu : N \to \mu(N)$ such that $\mu(N)$ is a regular submanifold.

This is called an “embedding” by Lang [23], page 27; Do Carmo [9], page 11; Lee [24], page 15; Gallot/ 
Hulin/Lafontaine [13], page 12; Szekeres [305], page 430; EDM2 [113], page 385. (Spivak [37], Vol. 1, 
page 49, says that “embedding” is used by “the English”.) It is called an “imbedding” by Kosinski [21], 
page 27; Flanders [11], page 53; Crampin/Pirani [7], page 243; Spivak [37], Vol. 1, page 49; Bishop/ 
Crittenden [2], page 21; Bishop/Goldberg [3], page 40; Whitney [161], page 113; Kobayashi/Nomizu [19], 
page 9. It is called a “regular embedding” by Szekeres [305], page 430; EDM2 [113], page 385. It is
called “un plongement” (in French) by Choquet-Bruhat [6], page 15; Malliavin [28], page 53.


(5) Regular immersion $\mu : N \to M$. Definition 50.3.8.
A continuous map which is locally injective, and whose image is locally a regular submanifold.

$\forall q \in N,\ \exists G \in \mathrm{Top}_q(N),$
$\mu|_G$ is a homeomorphism onto $\mu(G)$, and $\mu(G)$ is a regular submanifold of M.


This is called an “immersion” by Lang [23], page 26; Kosinski [21], page 27; Gallot/Hulin/Lafontaine [13],
page 12; Crampin/Pirani [7], page 243; Do Carmo [9], page 11; Frankel [12], page 169; Spivak [37], 
Vol. 1, page 46; Bishop/Crittenden [2], page 185; Bishop/Goldberg [3], page 40; Szekeres [305], page 429;
Darling [8], page 53; Kobayashi/Nomizu [19], page 9; EDM2 [113], page 385; Sulanke/Wintgen [40], 
page 32. It is called “une immersion” (in French) by Choquet-Bruhat [6], page 13. It is also called “une 
immersion” (in French) by Malliavin [28], page 52, but only if it is injective. 

(6) Regular submersion $\mu : N \to M$. Definition 50.3.13.
A continuous map whose inverse image is locally a regular submanifold in some sense.

$\forall q \in N,\ \exists \psi_1 \in \mathrm{atlas}_q(N),\ \exists \psi_2 \in \mathrm{atlas}_{\mu(q)}(M),\ \forall x \in \mathrm{Range}(\psi_1),$
$\psi_2(\mu(\psi_1^{-1}(x))) = (x_i)_{i=1}^m,$

where $m = \dim(M) \le \dim(N)$.

This is called a “submersion” by Lang [23], page 27; Kosinski [21], page 27; Gallot/Hulin/Lafontaine [13],
page 12; Crampin/Pirani [7], page 243; Do Carmo [9], page 185; Darling [8], page 54; Frankel [12], 
page 181; Sternberg [38], page 378; EDM2 [113], page 385. 


The maps $\mu : N \to M$ in the above list are usually tacitly assumed to satisfy a regularity condition. The
corresponding purely topological concepts, without the regularity condition, are not often explicitly defined 
in differential geometry textbooks. 


50.2.4 EXAMPLE: A topological submanifold which is not regular. 

Not all topological submanifolds are regular. This is demonstrated by “wild arcs”, which are curves in $\mathbb{R}^3$
with one or more points where no extension of the curve to a local chart is possible. (See Hocking/Young [93],
pages 176-177; Artin/Fox [169].) The curve illustrated in Figure 50.2.1 is based on the Artin/Fox wild arc.


[Figure 50.2.1: curve $\gamma$ based on the Artin/Fox wild arc, with the points $t = 0$ and $t = 0.5$ marked.]


Let [0, 1) have the circle topology. (See Definition 32.6.14.) Then the curve $\gamma : [0, 1) \to \mathbb{R}^3$ in Figure 50.2.1
is a homeomorphism with respect to the relative topology on $\mathrm{Img}(\gamma)$ in $\mathbb{R}^3$. The inverse of $\gamma$ is the function
$\gamma^{-1} : \mathrm{Img}(\gamma) \to [0, 1)$, which is continuous at $\gamma(0.5)$ because for all $\varepsilon \in \mathbb{R}^+$, for some $\delta \in \mathbb{R}^+$, the image
$\gamma((0.5 - \varepsilon, 0.5 + \varepsilon))$ of the interval $(0.5 - \varepsilon, 0.5 + \varepsilon)$ under $(\gamma^{-1})^{-1} = \gamma$ includes the set $\mathrm{Img}(\gamma) \cap B_{\gamma(0.5),\delta}$.
However, it is not possible to define a homeomorphism $\psi : U \to \Omega$ for a neighbourhood U of $\gamma(0.5)$ and
$\Omega \in \mathrm{Top}(\mathbb{R}^3)$ such that $\psi(\mathrm{Img}(\gamma) \cap U) = \{x \in \Omega;\ x_1 = x_2 = 0\}$.

Thus the image of this curve is homeomorphic to a one-dimensional manifold, and its internal topology is
the same as the relative topology on the set. So the image is a topological subspace of $\mathbb{R}^3$, and it is also a
topological manifold. However, it is not a regular submanifold in the sense of being representable at each
point as the graph of a function from $\mathbb{R}$ to $\mathbb{R}^2$ via some local topological manifold chart. In other words, it is
not embedded in a regular manner in the ambient manifold. This justifies the distinction between concepts
(2) and (3) in Remark 50.2.3. (Writing a formula for Figure 50.2.1 is left to the reader as an exercise.)


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




50.2.5 REMARK: A topological submanifold is a subset with a suitable relative topology. 

The definitions of submanifolds for topological and differentiable manifolds have some technical differences. 
This is because of the differing role of charts and atlases. A topological manifold is defined as a topological 
space which happens to have the property that it is Hausdorff and everywhere locally homeomorphic to 
a Cartesian space. These local homeomorphisms are then optional charts for the manifold, which may be 
aggregated into atlases. A differentiable manifold, on the other hand, requires an atlas of some kind to 
specify its structure. (This difference is also discussed in Remark 49.2.4.) 


This difference in the method of specification affects the way in which submanifolds are defined. In the 
case of topological manifolds, the natural choice for the topology for a submanifold is the relative topology 
which is induced on a subset by the ambient space. Then one merely needs to test such a set, with its 
relative topology, to determine whether it meets the requirements to be a topological manifold. In the case 
of differentiable manifolds, some kind of a “relative atlas” must be defined for the submanifold. (The concept 
of a “relative atlas” is mentioned in Remarks 52.4.9 and 52.7.4.) 


It should be possible to call any subset of a topological manifold a topological submanifold if the relative 
topology meets the criteria of Definition 49.4.7 for a locally Cartesian space, or Definition 50.1.1 for a 
topological manifold, depending on whether one requires the Hausdorff property. For brevity and simplicity, 
definitions and theorems are given here only for topological manifolds, not for the slightly more general 
locally Cartesian spaces, which may or may not be Hausdorff. Since the relative topology on a subset 
of a topological space is Hausdorff if the ambient space is Hausdorff, the Hausdorff separation property 
is automatically inherited by subsets of a topological manifold by Theorem 33.1.33 (iii). So a topological 
submanifold is automatically Hausdorff if the ambient manifold is Hausdorff. Otherwise the Hausdorff 
property is independent of the definitions for submanifolds. 


The term “topological submanifold” is used in this book to more clearly distinguish the purely topologically 
embedded submanifolds in Definition 50.2.6 from the regular submanifolds in Definition 50.2.8. In the 
literature, it is most often the regularly embedded style of submanifold which is assumed. 


50.2.6 DEFINITION: A topological submanifold of a topological manifold M is a topological manifold $(S, T_S)$
such that $S \subseteq M$ and $T_S$ is the relative topology on S in M.


50.2.7 REMARK: Regular submanifolds of topological manifolds. 

Regular submanifolds of topological manifolds are topological submanifolds which satisfy the regularity 
condition on line (50.2.1) in Definition 50.2.8. (Essentially the same condition is given also by Choquet- 
Bruhat [6], pages 11-12.) Regular submanifolds are sometimes called “embedded submanifolds” because the 
identity function of the submanifold is a regular embedding of the submanifold within the ambient manifold. 
(See Definition 50.3.6 for regular embeddings.) 

Line (50.2.1) uses the set $\mathrm{atlas}_p(M)$ in Notation 49.7.20, which is the set of all locally Cartesian charts
for M which are compatible with Top(M) and contain p in their domain. (These charts are automatically
compatible with each other, as mentioned in Remark 49.7.8.)


50.2.8 DEFINITION: A regular (topological) submanifold of a topological manifold M is a topological
submanifold S of M which satisfies

$\forall p \in S,\ \exists \psi \in \mathrm{atlas}_p(M),\ \mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}),$ (50.2.1)

where $m = \dim(S)$ and $n = \dim(M)$.


50.2.9 REMARK: Regular submanifolds are locally homeomorphic to a zero graph. 

Definition 50.2.8 line (50.2.1) requires each point p in a regular topological submanifold S to have a local
chart $\psi \in \mathrm{atlas}_p(M)$ such that the image $\psi(S)$ is a relatively open subset of the hyperplane $\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}$
of $\mathbb{R}^n$. This is the same as requiring $\psi(S)$ to be the graph of the zero function on some open subset of $\mathbb{R}^m$.
The equivalence of this zero-graph condition is shown in Theorem 50.2.10 (ii). Then from Theorem 50.2.13,
it follows that this zero-graph condition may be replaced by an equivalent continuous-graph condition.

It should be noted that in Theorem 50.2.10, the condition $\psi(S) = h_0|_\Omega$ in part (ii) is not sufficient to
ensure that $\psi$ satisfies the condition $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ in part (i), which is equivalent
to line (50.2.1) in Definition 50.2.8. The condition $\psi(S) = h_0|_\Omega$ does imply $\psi(S) \subseteq \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$, but
the reverse inclusion is not implied. It is for this reason that a restricted chart $\tilde\psi$ is constructed from $\psi$ in
the proof of part (ii). All things considered, it is probably better to incorporate such a restriction into the
definition of a submanifold. (For such an approach, see Lang [23], pages 25-26.) This kind of restriction is
imposed also in the construction of the chart $\psi_2 = \phi \circ \psi_1$ in the proof of Theorem 50.2.13.


50.2.10 THEOREM: Equivalent zero-graph condition for regularity of a topological submanifold. 
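Let S be a topological submanifold of a topological manifold M. Let $n = \dim(M)$ and $m = \dim(S)$.


(i) S is a regular topological submanifold of M if and only if

$\forall p \in S,\ \exists \psi \in \mathrm{atlas}_p(M),\ \psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

(ii) S is a regular topological submanifold of M if and only if

$\forall p \in S,\ \exists \psi \in \mathrm{atlas}_p(M),\ \exists \Omega \in \mathrm{Top}(\mathbb{R}^m),$
$\psi(S) = \{(x, 0_{\mathbb{R}^{n-m}});\ x \in \Omega\} = h_0|_\Omega,$ (50.2.2)

where $h_0 : \mathbb{R}^m \to \mathbb{R}^{n-m}$ is defined by $h_0(x) = 0_{\mathbb{R}^{n-m}}$ for all $x \in \mathbb{R}^m$.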


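PROOF: For part (i), let $p \in S$ and $\psi \in \mathrm{atlas}_p(M)$. The condition $\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ is
equivalent to $\psi(\mathrm{Dom}(\psi) \cap S) = \psi(\psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}))$ because $\psi$ is injective. This is then equivalent to
$\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ by Theorems 10.9.12 (ii′) and 10.10.2 (i). The assertion follows.

For part (ii), note that $\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\} = \{(x, 0_{\mathbb{R}^{n-m}});\ x \in \mathbb{R}^m\} = \{(x, h_0(x));\ x \in \mathbb{R}^m\} = h_0$.

Suppose that S is a regular topological submanifold of M. Let $p \in S$. Then $\psi(S) = h_0 \cap \mathrm{Range}(\psi)$ for
some $\psi \in \mathrm{atlas}_p(M)$ by part (i). Let $\Omega = \Pi_1^m(\psi(S)) = \Pi_1^m(h_0 \cap \mathrm{Range}(\psi))$. Then $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ by
Theorem 32.10.3 (i). Since $\psi(S) \subseteq \mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}$, it follows that $\psi(S) = \{(x, 0_{\mathbb{R}^{n-m}});\ (x, 0_{\mathbb{R}^{n-m}}) \in \psi(S)\}$.
So $\Pi_1^m(\psi(S)) = \{x;\ (x, 0_{\mathbb{R}^{n-m}}) \in \psi(S)\}$. Therefore $\psi(S) = \{(x, 0_{\mathbb{R}^{n-m}});\ x \in \Pi_1^m(\psi(S))\} = h_0|_\Omega$. This
verifies line (50.2.2).

Now assume condition (50.2.2). Let $p \in S$. Then $\psi(S) = h_0|_\Omega$ for some $\psi \in \mathrm{atlas}_p(M)$ and $\Omega \in \mathrm{Top}(\mathbb{R}^m)$.
Let $U = \psi^{-1}(\Omega \times \mathbb{R}^{n-m})$ and $\tilde\psi = \psi|_U$. Then $\tilde\psi(S) = \psi(U \cap S) = \psi(U) \cap \psi(S) = \psi(S) \cap (\Omega \times \mathbb{R}^{n-m}) = h_0|_\Omega$
because $\psi$ is injective, and

$$\begin{aligned}
\mathrm{Range}(\tilde\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) &= \psi(U) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) \\
&= (\mathrm{Range}(\psi) \cap (\Omega \times \mathbb{R}^{n-m})) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) \\
&= \mathrm{Range}(\psi) \cap (\Omega \times \{0_{\mathbb{R}^{n-m}}\}).
\end{aligned}$$

Since $\Omega \times \{0_{\mathbb{R}^{n-m}}\} = h_0|_\Omega = \psi(S) \subseteq \mathrm{Range}(\psi)$, the last set equals $h_0|_\Omega = \tilde\psi(S)$.
Since $U \in \mathrm{Top}(M)$, and $p \in \mathrm{Dom}(\tilde\psi) = U \cap \mathrm{Dom}(\psi) \in \mathrm{Top}(M)$ because $p \in S$, it follows that $\tilde\psi \in \mathrm{atlas}_p(M)$.
Hence S is a regular topological submanifold of M by part (i).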


50.2.11 REMARK: Replacement of the zero-graph condition with a continuous-graph condition. 

The zero-graph condition in Theorem 50.2.10 (ii) for regularity of a topological submanifold may be replaced 
by a continuous-graph condition. Theorem 50.2.13 asserts, roughly speaking, that one continuous graph may 
be replaced by any other continuous graph. In particular, a zero graph (which is of course continuous) can 
be replaced by any other continuous graph. Consequently the zero-graph condition can be replaced by an 
equivalent general continuous-graph condition. 


It could be argued that the basic definition for a regular topological submanifold should be expressed in 
terms of continuous graphs, not zero graphs. One advantage of this would be to make it easier to prove that 
a submanifold is regular. In the case of differentiable manifolds, the differentiability class of the submanifold 
could be more easily fine-tuned. For example, the graph could be required to be $C^k$, $C^{k,\alpha}$, analytic, Dini
continuous, and so forth. Unfortunately, these differentiability classes are only meaningful if the atlas 
transition maps are in the same class (or stronger). So then there would be a double requirement. Both the 
atlas and the submanifold local graphs would need to be in the same class. An advantage of the zero-graph 
condition is that it is very simple. The atlas transition maps must lie in some differentiability class, but this 
is necessary anyway. And to make it easier to prove that a submanifold is regular, theorems can be provided
for this purpose. 


It should perhaps be emphasised that the atlas whose transition map class must match the submanifold
graph class is the maximal (or “complete”) atlas for the corresponding differentiability class, not the given
atlas for the ambient manifold. The class of this maximal atlas must not be stronger than the class of the
given atlas. Thus for example, a $C^1$ submanifold is well defined in a $C^2$ ambient manifold by using its
maximal $C^1$ atlas, but a $C^2$ submanifold is not well defined in a $C^1$ ambient manifold.


50.2.12 REMARK: Construction of a chart which maps a given submanifold to a given graph. 

The purpose of Theorem 50.2.13 is to construct charts which map given submanifolds to given graphs. 
These charts must be compatible with the manifold structure of the ambient space. Therefore they cannot 
be completely freely chosen. In the case of topological manifolds and submanifolds, if a given chart maps 
the submanifold locally to one continuous function graph in Cartesian space, a new chart can be constructed 
to map the submanifold locally to any other continuous function with the same domain. For $C^k$ manifolds,
Theorem 52.3.12 performs the same task, but the graphs must be $C^k$ differentiable. In particular, this kind
of “graph interchange” construction permits a zero-graph condition to be replaced with a continuous-graph
condition, or vice versa, and similarly for $C^k$ graphs in $C^k$ manifolds.


Theorem 50.2.13 states, in essence, that if S is the graph of a function $h_1 \in C(\Omega, \mathbb{R}^{n-m})$ via some compatible
chart $\psi_1$, then S is the graph of any function $h_2 \in C(\Omega, \mathbb{R}^{n-m})$ via some compatible chart $\psi_2$. Moreover,
the first m coordinates of $\psi_1|_S$ and $\psi_2|_S$ are the same. (Theorem 50.2.13 is illustrated in Figure 50.2.2.)




Figure 50.2.2 Construction of chart with alternative Cartesian graph 


50.2.13 THEOREM: Interchangeability of continuous-graph conditions for submanifold regularity. 

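Let M be a topological manifold with $n = \dim(M)$. Let $S \subseteq M$ and $m \in \mathbb{Z}_0^+$ with $m \le n$. Let $\psi_1 \in \mathrm{atlas}(M)$
with $\psi_1(S) = h_1$ for some $h_1 \in C(\Omega, \mathbb{R}^{n-m})$ with $\Omega \in \mathrm{Top}(\mathbb{R}^m)$. Let $h_2 \in C(\Omega, \mathbb{R}^{n-m})$. Then $\psi_2(S) = h_2$
and $\Pi_1^m \circ \psi_1|_S = \Pi_1^m \circ \psi_2|_S$ for some $\psi_2 \in \mathrm{atlas}(M)$. (See Definition 14.6.11 for $\Pi_1^m : (x_i)_{i=1}^n \mapsto (x_i)_{i=1}^m$.)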


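PROOF: Let $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ with $h_1, h_2 \in C(\Omega, \mathbb{R}^{n-m})$, where $\psi_1 \in \mathrm{atlas}(M)$ satisfies $\psi_1(S) = h_1$. Define
$\phi : \Omega \times \mathbb{R}^{n-m} \to \Omega \times \mathbb{R}^{n-m}$ by

\[ \forall x \in \Omega \times \mathbb{R}^{n-m}, \quad \phi(x) = x + \bigl(0_{\mathbb{R}^m},\ h_2(\Pi^m_1(x)) - h_1(\Pi^m_1(x))\bigr), \tag{50.2.3} \]

where the addition operation “+” on $\mathbb{R}^n$ is defined in the usual componentwise fashion. The bijectivity of
$\phi : \Omega \times \mathbb{R}^{n-m} \to \Omega \times \mathbb{R}^{n-m}$ follows by swapping $h_1$ and $h_2$ to obtain its inverse. Also $\phi$ is continuous with
respect to the relative topology on $\Omega \times \mathbb{R}^{n-m}$ within $\mathbb{R}^n$ because $h_1$ and $h_2$ are continuous, and similarly
$\phi^{-1}$ is continuous. So $\phi : \Omega' \to \Omega'$ is a homeomorphism, where $\Omega' = \Omega \times \mathbb{R}^{n-m} \in \mathrm{Top}(\mathbb{R}^n)$.

Then $\phi(\mathrm{Range}(\psi_1)) \in \mathrm{Top}(\mathbb{R}^n)$ by Theorem 31.6.6 because $\mathrm{Range}(\psi_1) \in \mathrm{Top}(\mathbb{R}^n)$ and $\mathrm{Dom}(\phi) \in \mathrm{Top}(\mathbb{R}^n)$.
Let $\psi_2 = \phi \circ \psi_1$. Then $\psi_2$ is a homeomorphism from $\mathrm{Dom}(\psi_2) = \psi_1^{-1}(\mathrm{Dom}(\phi)) = \psi_1^{-1}(\Omega') \in \mathrm{Top}(M)$ to
$\mathrm{Range}(\psi_2) = \phi(\mathrm{Range}(\psi_1)) \in \mathrm{Top}(\mathbb{R}^n)$. So $\psi_2 \in \mathrm{atlas}(M)$. Moreover,

\begin{align*}
\psi_2(S) &= \phi(\psi_1(S)) \\
&= \phi(h_1) \\
&= \{\phi(x, h_1(x));\ x \in \Omega\} \\
&= \{(x, h_1(x) + h_2(x) - h_1(x));\ x \in \Omega\} \\
&= h_2
\end{align*}

and

\begin{align*}
\forall q \in S, \quad \Pi^m_1(\psi_2(q)) &= \Pi^m_1(\phi(\psi_1(q))) \\
&= \Pi^m_1(\psi_1(q))
\end{align*}

by line (50.2.3). Thus $\psi_2(S) = h_2$ and $\Pi^m_1 \circ \psi_1|_S = \Pi^m_1 \circ \psi_2|_S$ with $\psi_2 \in \mathrm{atlas}(M)$.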
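
The graph-interchange map on line (50.2.3) can be checked numerically in the simplest case m = 1, n = 2. The following Python sketch is only an illustration under stated assumptions: the chart psi_1 is taken to be the identity on R^2, and the names make_phi, h1 and h2 are hypothetical choices made here, not notation from the text.

```python
import math

def make_phi(h1, h2):
    """Return phi(x, y) = (x, y + h2(x) - h1(x)), the case m = 1, n = 2 of (50.2.3)."""
    def phi(point):
        x, y = point
        return (x, y + h2(x) - h1(x))
    return phi

# Illustrative continuous graph functions on Omega = R (assumptions, not from the text).
h1 = math.sin
h2 = lambda x: x * x

phi = make_phi(h1, h2)        # sends the graph of h1 to the graph of h2
phi_inv = make_phi(h2, h1)    # swapping h1 and h2 gives the inverse map

for x in [-1.5, -0.25, 0.0, 0.7, 2.0]:
    gx, gy = phi((x, h1(x)))
    assert gx == x                                   # first m coordinates are unchanged
    assert math.isclose(gy, h2(x), abs_tol=1e-12)    # image point lies on the graph of h2
    back = phi_inv(phi((x, 3.0)))
    assert math.isclose(back[1], 3.0, abs_tol=1e-12) # phi_inv undoes phi off the graph too
```

Swapping the two arguments of make_phi mirrors the bijectivity argument in the proof, where the inverse is obtained by exchanging the two graph functions.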


50.2.14 REMARK: Projected submanifold charts. 

The functions $\Pi^m_1 \circ \psi_1|_S = \Pi^m_1 \circ \psi_2|_S$ in Theorem 50.2.13 are (equal) charts for the submanifold $S$. In
fact, $\Pi^m_1 \circ \psi_1|_S \in \mathrm{atlas}_p(S)$. This is not of so much importance for topological submanifolds because the
relative topology contains all of the manifold structure information for the submanifold. But in the case of
differentiable manifolds, such charts carry significant information about the differentiable structure on the
submanifold. Then the regularity of the embedding requires the submanifold’s own atlas to be compatible
with these projected charts $\Pi^m_1 \circ \psi|_S$.


50.2.15 THEOREM: Equivalent continuous-graph condition for regularity of a topological submanifold. 
Let S be a topological submanifold of a topological manifold M. Let n = dim(M) and m = dim(S). Then 
S is a regular topological submanifold of M if and only if 


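\begin{align*}
\forall p \in S,\ \exists \psi \in \mathrm{atlas}_p(M),\ \exists \Omega \in \mathrm{Top}(\mathbb{R}^m),\ \exists h \in C(\Omega, \mathbb{R}^{n-m}), \tag{50.2.4} \\
\psi(S) = \{(x, h(x));\ x \in \Omega\} = h.
\end{align*}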


PROOF: Let $S$ be a regular topological submanifold of $M$. Let $p \in S$. Then by Theorem 50.2.10 (ii), there
exist $\psi_1 \in \mathrm{atlas}_p(M)$ and $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ such that $\psi_1(S) = h_0|_\Omega$, where $h_0 : \mathbb{R}^m \to \mathbb{R}^{n-m}$ is the zero
function. Thus condition (50.2.4) is satisfied with $h = h_0|_\Omega \in C(\Omega, \mathbb{R}^{n-m})$.

Now suppose that condition (50.2.4) is satisfied. Let $p \in S$. Then there exist $\psi_1 \in \mathrm{atlas}_p(M)$, $\Omega \in \mathrm{Top}(\mathbb{R}^m)$
and $h_1 \in C(\Omega, \mathbb{R}^{n-m})$ such that $\psi_1(S) = \{(x, h_1(x));\ x \in \Omega\} = h_1$. Let $h_2 = h_0|_\Omega$. Then $h_2 \in C(\Omega, \mathbb{R}^{n-m})$.
So by Theorem 50.2.13, there exists $\psi_2 \in \mathrm{atlas}(M)$ such that $\psi_2(S) = h_2$ and $\Pi^m_1 \circ \psi_1|_S = \Pi^m_1 \circ \psi_2|_S$.
Since $p \in \mathrm{Dom}(\psi_1) \cap S = \mathrm{Dom}(\Pi^m_1 \circ \psi_1|_S)$, it follows that $p \in \mathrm{Dom}(\psi_2) \cap S = \mathrm{Dom}(\Pi^m_1 \circ \psi_2|_S)$. Therefore
$\psi_2 \in \mathrm{atlas}_p(M)$. Thus condition (50.2.2) is satisfied. Hence $S$ is a regular topological submanifold of $M$ by
Theorem 50.2.10 (ii).


50.2.16 REMARK: Arbitrariness of the zero value in the definition of a regular submanifold. 
Theorem 50.2.17 asserts that $0_{\mathbb{R}^{n-m}}$ in Definition 50.2.8 can be replaced by any real $(n - m)$-tuple.


50.2.17 THEOREM: Regularity of submanifolds satisfying a constant-graph condition. 
A topological submanifold S of a topological manifold M is a regular submanifold of M if and only if 


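\begin{align*}
\forall p \in S,\ \exists \psi \in \mathrm{atlas}_p(M),\ \exists x \in \mathbb{R}^{n-m}, \\
\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{x\}), \tag{50.2.5}
\end{align*}
where $m = \dim(S)$ and $n = \dim(M)$.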


PROOF: Suppose that $S$ is a regular topological submanifold of $M$. Then condition (50.2.5) is satisfied
with $x = 0_{\mathbb{R}^{n-m}}$ by Definition 50.2.8.

Now suppose that condition (50.2.5) is satisfied. Let $p \in S$. Then $\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{x\})$ for some
$\psi \in \mathrm{atlas}_p(M)$ and $x \in \mathbb{R}^{n-m}$. So $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{x\})$ by Theorem 10.7.1 (i). Let $\Omega = \Pi^m_1(\psi(S))$.
Then $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ and $\psi(S) = h$, where $h = \{(z, x);\ z \in \Omega\}$. By Theorem 31.12.9, $h \in C(\Omega, \mathbb{R}^{n-m})$. Hence
by Theorem 50.2.15, $S$ is a regular topological submanifold of $M$.


50.2.18 REMARK: Regular submanifolds of topological manifolds with codimension zero. 
In the special case $m = n$ in Definition 50.2.8 and Theorems 50.2.10 and 50.2.15, the set $\mathbb{R}^{n-m}$ equals the
singleton $\{0_{\mathbb{R}^0}\}$, where $0_{\mathbb{R}^0}$ equals the empty tuple $() = \emptyset \in \mathbb{R}^0$. Clearly then $\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}$ may be
identified with $\mathbb{R}^n$. So a regular submanifold of codimension zero is nothing more or less than an open subset $S$ of $M$
with the relative topology on $S$ in $M$. All such open subsets are necessarily regular submanifolds.


The number n — m is the “codimension” of the submanifold, while m is its dimension. The most interesting 
submanifolds are, of course, those whose dimension and codimension are both positive. 


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




50.3. Continuous embeddings, immersions and submersions 


50.3.1 REMARK: Construction of submanifolds of topological manifolds from maps. 

For constructing and validating topological submanifolds, the criteria in Definitions 50.2.6 and 50.2.8 could 
be inconvenient in practice because one must apparently first construct the relative topology for a given 
subset S of M, and then try to find a suitable local coordinate chart at every point of S. Sometimes this is 
easy, sometimes not. 


A submanifold can often be constructed and validated more easily by specifying it as the image of a suitable 
kind of map from an already validated topological manifold. In fact, it is much more usual to work with 
manifolds which are differentiable everywhere or almost everywhere. Then only the non-differentiable subsets 
need to be tested in more detail. Most textbooks don’t define purely topological submanifolds at all. (See 
Section 52.5 for differentiable manifold embeddings, immersions and submersions.) 


Theorem 50.3.2 shows that the image of any homeomorphism from a source topological manifold to a subset 
of a target topological manifold is a topological submanifold of the target space, although such a submanifold 
is not necessarily a regular submanifold in the sense of Definition 50.2.8. (The proof of Theorem 50.3.2 is 
illustrated in Figure 50.3.1.) 


Figure 50.3.1 Continuous embedding of a topological manifold


50.3.2 THEOREM: The image of a homeomorphism between a space and a subset is a submanifold.

Let $f : M_1 \to M_2$ be a map between topological manifolds $M_1$ and $M_2$ such that $f : M_1 \to f(M_1)$ is
a homeomorphism with respect to the relative topology on $f(M_1)$ in $M_2$. Then $f(M_1)$ is a topological
submanifold of $M_2$.


PROOF: By Theorem 31.14.11, the relative topology $\mathrm{Top}(f(M_1))$ on $f(M_1)$ is $\{f(\Omega);\ \Omega \in \mathrm{Top}(M_1)\}$ since
$f : M_1 \to f(M_1)$ is a homeomorphism. To show that $(f(M_1), \mathrm{Top}(f(M_1)))$ is a topological manifold, let
$p \in f(M_1)$, and for a chart $\psi \in \mathrm{atlas}_{f^{-1}(p)}(M_1)$, let $\tilde\psi = \psi \circ f^{-1} : f(\mathrm{Dom}(\psi)) \to \mathbb{R}^{n_1}$, where $n_1 = \dim(M_1)$.
Then $\mathrm{Dom}(\tilde\psi) = f(\mathrm{Dom}(\psi)) \in \mathrm{Top}_p(f(M_1))$, and $\tilde\psi$ is a homeomorphism between $\mathrm{Dom}(\tilde\psi)$ and $\mathrm{Range}(\tilde\psi)$
because $f^{-1} : \mathrm{Dom}(\tilde\psi) \to \mathrm{Dom}(\psi)$ and $\psi : \mathrm{Dom}(\psi) \to \mathrm{Range}(\psi) = \mathrm{Range}(\tilde\psi)$ are both homeomorphisms.
Since $f(M_1)$ is a Hausdorff space by Theorem 33.1.33 (iii) because $M_2$ is a Hausdorff space, it follows from
Definition 50.1.1 that $f(M_1)$ is a topological manifold. Hence $f(M_1)$ is a topological submanifold of $M_2$ by
Definition 50.2.6.


50.3.3 DEFINITION: A continuous embedding of a topological manifold $M_1$ in a topological manifold $M_2$ is
a homeomorphism from $M_1$ to a topological subspace of $M_2$.


50.3.4 THEOREM: The image of a continuous embedding of one space in another is a submanifold.
The image $f(M_1)$ of any continuous embedding $f : M_1 \to M_2$ of a topological manifold $M_1$ in a topological
manifold $M_2$ is a topological submanifold of $M_2$.


PROOF: This is a paraphrase of Theorem 50.3.2 in terms of Definition 50.3.3.


50.3.5 REMARK: Regularity of embeddings of topological manifolds. 
The purely topological or continuous style of embedding in Definition 50.3.3 is inadequate for many purposes. 
As demonstrated by Example 50.2.4, not all submanifolds are regularly embedded. Therefore the regularity
of an embedding must be explicitly required, as in Definition 50.3.6. In the literature, the term “embedding” 
usually refers to the regular style of embedding, not the purely topological style. 


50.3.6 DEFINITION: A regular embedding of a topological manifold $M_1$ in a topological manifold $M_2$ is a
homeomorphism from $M_1$ to a regular topological submanifold of $M_2$.


50.3.7 REMARK: Generalisation of embeddings to non-injective immersions. 

An immersion is the same as an embedding locally, but globally it is not necessarily injective. Consequently 
the image of an immersion is not necessarily a topological submanifold of the target space. So immersions 
are only fairly loosely connected to submanifolds. Even if an immersion is injective, its inverse might not be 
continuous due to “near-overlaps”, which also can cause a failure of the image to be a topological submanifold.
(See Example 50.3.11.) However, immersions are required to be at least locally well behaved. 


50.3.8 DEFINITION: A regular immersion of a topological manifold $M_1$ in a topological manifold $M_2$ is a
continuous map $f : M_1 \to M_2$ such that

\begin{align*}
&\forall p \in M_1,\ \exists \Omega \in \mathrm{Top}_p(M_1), \\
&\qquad f|_\Omega : \Omega \to f(\Omega) \text{ is a homeomorphism, and } f(\Omega) \text{ is a regular submanifold of } M_2.
\end{align*}


50.3.9 REMARK: All regular embeddings are regular immersions, but not vice versa.


Theorem 50.3.10 shows that every topological manifold embedding is a topological manifold immersion. 
Example 50.3.11 shows that the converse is not true, even if the immersion is injective. 


50.3.10 THEOREM: A regular embedding between two manifolds is a regular immersion. 
Let $f : M_1 \to M_2$ be a regular embedding between topological manifolds $M_1$ and $M_2$. Then $f$ is a regular
immersion of $M_1$ in $M_2$.


PROOF: Let $f : M_1 \to M_2$ be a regular embedding between $M_1$ and $M_2$. Let $p \in M_1$. Let $\Omega = M_1$.
Then $f|_\Omega : \Omega \to f(\Omega)$ is a homeomorphism and $f(\Omega)$ is a regular submanifold of $M_2$. Hence $f$ is a regular
immersion of $M_1$ in $M_2$.


50.3.11 EXAMPLE: An injective regular topological manifold immersion which is not an embedding. 
Figure 50.3.2 illustrates an injective continuous map $f : t \mapsto (\sin t, \sin 2t)$ from the topological manifold
$(0, 2\pi) \subseteq \mathbb{R}$ to the topological manifold $\mathbb{R}^2$, but $f^{-1}$ is not continuous although $f$ is an embedding locally
at every point in the domain of the map.


$f : (0, 2\pi) \to \mathbb{R}^2$ with $f : t \mapsto (\sin t, \sin 2t)$
Figure 50.3.2 An injective regular immersion which is not an embedding


Let $p \in (0, 2\pi)$. Let $\Omega = (p - \delta, p + \delta)$ with $\delta = \frac{1}{2} \min(p, 2\pi - p)$. Then $\Omega \in \mathrm{Top}_p(M_1)$, where $M_1 = (0, 2\pi)$, and $f|_\Omega : \Omega \to f(\Omega)$
is a homeomorphism with respect to the relative topology on $f(\Omega)$, and $f(\Omega)$ is a regular submanifold of $\mathbb{R}^2$.
So $f$ is a regular immersion from $(0, 2\pi)$ in $\mathbb{R}^2$ by Definition 50.3.8.


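Since $f$ is a bijection onto its range, $f^{-1}$ is a well-defined function from $\mathrm{Range}(f)$ to $(0, 2\pi)$. To show that $f^{-1}$ is not
continuous at $q = (0, 0) \in \mathbb{R}^2$, note that $f^{-1}(q) = \pi$, and let $\Omega_1 = (\pi - \varepsilon, \pi + \varepsilon) \in \mathrm{Top}_\pi((0, 2\pi))$ for
some $\varepsilon \in \mathbb{R}^+$ with $\varepsilon \le \pi/2$. Let $\Omega_2 = B_{q,r} \in \mathrm{Top}_q(\mathbb{R}^2)$ for any $r \in (0, \varepsilon]$. Let $t = r/3$. Then
$|f(t)| \le (t^2 + (2t)^2)^{1/2} = 5^{1/2}|t| < r$. So $f(t) \in \Omega_2$, but $f^{-1}(f(t)) = t \notin \Omega_1$ because $t \le \varepsilon/3 < \pi - \varepsilon$.
Thus no open neighbourhood $\Omega_2$ of $q$ satisfies $f^{-1}(\Omega_2) \subseteq \Omega_1$. (This
follows from the fact that the open balls are an open base for the metric space $\mathbb{R}^2$.) So $f^{-1}$ is not continuous.
Therefore $f$ is not a homeomorphism onto its range with respect to its relative topology in $\mathbb{R}^2$. Hence $f$ is
not a regular embedding of $(0, 2\pi)$ in $\mathbb{R}^2$.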
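
The failure of continuity of the inverse map at the origin can also be observed numerically. The following Python sketch assumes only the formula for f in this example; the helper name norm and the sample parameter values are illustrative choices, not part of the text.

```python
import math

def f(t):
    """The map t -> (sin t, sin 2t) on the interval (0, 2*pi)."""
    return (math.sin(t), math.sin(2.0 * t))

def norm(v):
    """Euclidean norm of a pair."""
    return math.hypot(v[0], v[1])

# The parameter pi is mapped (up to rounding) to the origin q = (0, 0).
assert norm(f(math.pi)) < 1e-12

# Parameters t near 0 are also mapped arbitrarily close to q,
# although they stay far away from the parameter pi.
for k in range(1, 6):
    t = 10.0 ** (-k)
    assert norm(f(t)) <= math.sqrt(5.0) * t   # the bound used in the text
    assert abs(t - math.pi) > 3.0             # t is not near f^{-1}(q) = pi
```

So points of the image arbitrarily close to q have pre-images which are bounded away from the pre-image of q, which is the content of the non-continuity argument.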


50.3.12 REMARK: Submersions are fibrations. 

As noted in Remark 21.1.1, every function is a (non-topological) fibration. Thus every map $f : M_1 \to M_2$,
for topological manifolds $M_1$ and $M_2$, may be viewed as a fibration. But as noted in Remark 21.1.10, the
theoretical framework of fibrations (and fibre bundles) is principally concerned with functions which are
surjective and nowhere-injective. In other words, a map $f : M_1 \to M_2$ is only interesting from the fibration
theory point of view when $f^{-1}(\{p\})$ contains more than one element for every $p \in M_2$. Therefore embeddings
and immersions have no real interest from the point of view of fibrations, but a submersion is intentionally
defined so as to permit (and encourage) non-injectivity everywhere in its image set $f(M_1) \subseteq M_2$. Then by
regarding the map $f$ as the projection map of a fibration, one may regard the inverse images $f^{-1}(\{p\}) \subseteq M_1$
as fibre sets for all $p \in f(M_1)$.


The equation on line (50.3.1) in Definition 50.3.13 ensures that a submersion is effectively the projection map
of a fibration. Line (50.3.1) may be written as $\psi_2 \circ f \circ \psi_1^{-1} = \Pi^{n_2}_1|_{\Omega_1}$, where $\Pi^{n_2}_1 : \mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ is defined
by $\Pi^{n_2}_1 : x \mapsto (x_i)_{i=1}^{n_2}$, and $\Omega_1 = \mathrm{Dom}(\psi_2 \circ f \circ \psi_1^{-1}) \in \mathrm{Top}_{\psi_1(p)}(\mathbb{R}^{n_1})$. Clearly the component projection
map $\Pi^{n_2}_1$ may be replaced by any other linear map from $\mathbb{R}^{n_1}$ to a linear subspace which has dimension $n_2$.
It is also clear that the plane (x € IR?!; (z;)72,) may be replaced by the graph of a continuous function 
$h : \mathbb{R}^{n_2} \to \mathbb{R}^{n_1 - n_2}$, so that $\psi_2(f(\psi_1^{-1}(x))) = (\Pi^{n_2}_1(x), h(\Pi^{n_2}_1(x)))$ for all $x \in \mathrm{Dom}(\psi_2 \circ f \circ \psi_1^{-1})$.


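50.3.13 DEFINITION: A regular submersion of a topological manifold $M_1$ in a topological manifold $M_2$,
where $n_1 = \dim(M_1) \ge n_2 = \dim(M_2)$, is a continuous surjection $f : M_1 \to M_2$ such that

\begin{align*}
\forall p \in M_1,\ \exists \psi_1 \in \mathrm{atlas}_p(M_1),\ \exists \psi_2 \in \mathrm{atlas}_{f(p)}(M_2),\ \forall x \in \mathrm{Dom}(\psi_2 \circ f \circ \psi_1^{-1}), \\
\psi_2(f(\psi_1^{-1}(x))) = (x_i)_{i=1}^{n_2}. \tag{50.3.1}
\end{align*}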


50.3.14 REMARK: Immersions and submersions have a different character. 

Despite the similarity of the names, immersions and submersions have a different character. Roughly speaking,
an immersion has similar character to an injective linear map $\mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ with $n_1 \le n_2$, whereas a
submersion has similar character to a surjective linear map $\mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ with $n_1 \ge n_2$. In the former case,
the source space is embedded in the target space as a linear subspace. In the latter case, the source space
may be partitioned into hyperplanes, each of which maps to a single point of the target space, and these
planes constitute a quotient linear space.


In the case $n_1 \le n_2$, the map determines the target-space image-hyperplane and its relation to the source
space. In the case $n_1 \ge n_2$, the map determines the source-space kernel-hyperplane and its relation to the
target space. Similarly, the interest for topological manifold immersions (and embeddings) is in the target-space
image and how it is controlled by the map, whereas the interest for topological manifold submersions
is in the source-space pre-image and how it is controlled by the inverse map.


50.3.15 REMARK: Alternative expression for the regularity condition for a submersion. 

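The regularity condition for a submersion in line (50.3.1) can be slightly rewritten so that the range of $\psi_1$ is
required to be a Cartesian product $\Omega^b \times \Omega^f$ of open subsets $\Omega^b$ and $\Omega^f$ of $\mathbb{R}^{n_2}$ and $\mathbb{R}^{n_1 - n_2}$ respectively. These
sets correspond to the base space and fibre space respectively. Thus a regular submersion of a topological
manifold $M_1$ in a topological manifold $M_2$ is a continuous surjection $f : M_1 \to M_2$ such that

\begin{align*}
&\forall p \in M_1,\ \exists \psi_1 \in \mathrm{atlas}_p(M_1),\ \exists \psi_2 \in \mathrm{atlas}_{f(p)}(M_2),\ \exists \Omega^b \in \mathrm{Top}(\mathbb{R}^{n_2}),\ \exists \Omega^f \in \mathrm{Top}(\mathbb{R}^{n_1 - n_2}), \\
&\qquad \mathrm{Range}(\psi_1) = \Omega^b \times \Omega^f \text{ and } \mathrm{Range}(\psi_2) = \Omega^b \text{ and } \psi_2 \circ f \circ \psi_1^{-1} = \Pi^{n_2}_1|_{\mathrm{Range}(\psi_1)},
\end{align*}

where $n_1 = \dim(M_1) \ge n_2 = \dim(M_2)$, and $\Pi^{n_2}_1 : \mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ is defined by $\Pi^{n_2}_1 : x \mapsto (x_i)_{i=1}^{n_2}$. (A similar
form of definition is given by Lang [23], page 27.)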
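
The product form of the regularity condition can be illustrated with the radial projection x -> x/|x| from the punctured plane to the unit circle. The following Python sketch is an illustration under assumptions made here, not a construction from the text: the charts psi1 (angle and logarithm of the radius, on the plane with the negative real axis removed) and psi2 (angle, on the circle with the point (-1, 0) removed) are hypothetical choices, so that Range(psi1) is the product of the interval (-pi, pi) with R, and Range(psi2) is the interval (-pi, pi).

```python
import math

def f1(x):
    """Radial projection from the punctured plane onto the unit circle."""
    r = math.hypot(x[0], x[1])
    return (x[0] / r, x[1] / r)

def psi1(x):
    """Assumed chart on the slit plane: x -> (angle, log of radius)."""
    return (math.atan2(x[1], x[0]), math.log(math.hypot(x[0], x[1])))

def psi1_inv(u):
    """Inverse of psi1 on (-pi, pi) x R."""
    theta, s = u
    r = math.exp(s)
    return (r * math.cos(theta), r * math.sin(theta))

def psi2(y):
    """Assumed chart on the circle minus (-1, 0): y -> (angle,)."""
    return (math.atan2(y[1], y[0]),)

def local_form(u):
    """The chart expression psi2 o f1 o psi1^{-1}."""
    return psi2(f1(psi1_inv(u)))

# The chart expression agrees with projection onto the first coordinate.
for u in [(-3.0, -2.0), (0.0, 0.0), (1.1, 0.7), (3.0, 4.0)]:
    (theta,) = local_form(u)
    assert math.isclose(theta, u[0], abs_tol=1e-12)
```

In this sketch the first coordinate plays the role of the base space and the second coordinate plays the role of the fibre space.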


50.3.16 EXAMPLE: Submersions from $\mathbb{R}^2 \setminus \{0\}$ onto the unit circle $S^1$.

Define the map $f_1 : \mathbb{R}^2 \setminus \{0\} \to S^1 = \{x \in \mathbb{R}^2;\ |x| = 1\}$ by $f_1 : x \mapsto x/|x|$, and define $f_{k+1} : \mathbb{R}^2 \setminus \{0\} \to S^1$ for
$k \in \mathbb{Z}^+$ by $f_{k+1} : x \mapsto f_k(x) \cdot f_1(x)$, where $y \cdot z$ is defined for all $y, z \in \mathbb{R}^2$ by $y \cdot z = (y_1 z_1 - y_2 z_2,\ y_1 z_2 + y_2 z_1)$, as
for complex numbers. (Thus in terms of complex number exponentiation, $f_k(x) = (x/|x|)^k$ for all $x \in \mathbb{R}^2 \setminus \{0\}$
and $k \in \mathbb{Z}^+$.) Then $f_k$ is a topological manifold submersion for all $k \in \mathbb{Z}^+$, assuming the usual relative
topologies on $\mathbb{R}^2 \setminus \{0\}$ and $S^1$. (With the relative topologies, these sets are both topological manifolds.)


A single ray $R_x = \{cx;\ c \in \mathbb{R}^+\}$ is mapped by $f_1$ onto each point in $S^1$, but for $k > 1$, there are $k$ such
rays which are mapped by $f_k$ to each point of $S^1$. This shows that in some sense, submersions have more
in common with immersions than with embeddings because the regularity condition is enforced only locally
for each point of the source manifold. So a kind of “multiple wrapping” or “multiple covering” is permitted.
(This is explained further in Example 50.3.17.)
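
The recursive definition of the maps in this example, and the claim that k rays are mapped to each point of the circle, can be checked numerically. The following Python sketch is illustrative only; the function names mul, f and rays_onto are hypothetical names introduced here.

```python
import math

def mul(y, z):
    """Complex-style product of two pairs, as in the example."""
    return (y[0] * z[0] - y[1] * z[1], y[0] * z[1] + y[1] * z[0])

def f(k, x):
    """The map f_k, computed by the recursion f_{k+1}(x) = f_k(x) . f_1(x)."""
    r = math.hypot(x[0], x[1])
    f1 = (x[0] / r, x[1] / r)
    value = f1
    for _ in range(k - 1):
        value = mul(value, f1)
    return value

def rays_onto(k, angle):
    """Direction angles of the k rays mapped by f_k to (cos(angle), sin(angle))."""
    return [(angle + 2.0 * math.pi * j) / k for j in range(k)]

k, angle = 3, 0.6
target = (math.cos(angle), math.sin(angle))
for direction in rays_onto(k, angle):
    for c in (0.5, 1.0, 7.0):      # several points on the same ray
        x = (c * math.cos(direction), c * math.sin(direction))
        image = f(k, x)
        assert math.isclose(image[0], target[0], abs_tol=1e-12)
        assert math.isclose(image[1], target[1], abs_tol=1e-12)
```

For k = 3 the three direction angles differ by one third of a full turn, so the three rays are distinct but have the same image point.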


50.3.17 EXAMPLE: A topological manifold map can be both an immersion and a submersion. 

With the same kind of complex number style of multiplication as in Example 50.3.16, define $g_k : S^1 \to S^1$
by $g_k : x \mapsto x^k$. Then clearly $g_1$ is a topological manifold embedding, and it is also a regular immersion and
a regular submersion. But for $k > 1$, the map $g_k$ is both an immersion and a submersion because it is a
regular homeomorphism locally at every point of the source manifold, but it is not an embedding because it
is not a homeomorphism globally.
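
The global non-injectivity of the power maps on the circle for k > 1 can be made concrete by listing the pre-images of a point. The following Python sketch identifies the plane with the complex numbers, which is an assumption of convenience made here; the names g and preimages are hypothetical.

```python
import cmath
import math

def g(k, x):
    """The k-th power map on the unit circle, with points as complex numbers."""
    return x ** k

def preimages(k, y):
    """The k points of the unit circle which are mapped by g(k, .) to y."""
    theta = cmath.phase(y)
    return [cmath.exp(1j * (theta + 2.0 * math.pi * j) / k) for j in range(k)]

k = 4
y = cmath.exp(0.3j)
roots = preimages(k, y)

# Every listed point is mapped to y.
assert all(abs(g(k, x) - y) < 1e-12 for x in roots)

# The listed points are pairwise well separated, so they are distinct.
gaps = [abs(a - b) for i, a in enumerate(roots) for b in roots[i + 1:]]
assert min(gaps) > 1.0
```

Since each point has k well-separated pre-images, the map cannot be injective for k > 1, although it is injective on any sufficiently short arc.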


50.4. Products of topological manifolds 


50.4.1 REMARK: Direct products of topological manifolds with or without atlases. 

Since topological manifolds are defined to be a subspecies of topological spaces in Definition 50.1.1, the direct 
product of topological manifolds is already defined as a matter of pure topology. (See Definition 32.9.4 for 
the product of two topological spaces.) Consequently, Definition 50.4.2 is more or less redundant. In 
Definition 50.4.8, the direct product of topological atlases is defined via given atlases. The resulting induced 
topology on $M_1 \times M_2$ is the same via Definition 50.4.2 or via Definition 50.4.8, no matter which atlas is
used, since they are all compatible. However, Definition 50.4.8 has the advantage that it is consistent with 
Definition 52.6.2 for differentiable manifolds, and in many contexts it is preferable to define topological 
manifolds via atlases. 


50.4.2 DEFINITION: The (direct) product of topological manifolds $(M_1, T_1)$ and $(M_2, T_2)$ is the topological
manifold $(M_1 \times M_2, T)$, where $T$ is the direct product topology on $M_1 \times M_2$.


50.4.3 THEOREM: Dimension of a direct product of topological manifolds equals the sum of dimensions. 
The product of topological manifolds $M_1$ and $M_2$ is a non-empty topological manifold $M$ with $\dim(M) =
\dim(M_1) + \dim(M_2)$ if $M_1 \ne \emptyset$ and $M_2 \ne \emptyset$, and is an empty manifold with $\dim(M) = 0$ otherwise.


PROOF: Let $(M_1, T_1)$ and $(M_2, T_2)$ be non-empty topological manifolds. Then $M = (M_1 \times M_2, T)$ is a non-empty
locally Cartesian space with $\dim(M) = \dim(M_1) + \dim(M_2)$ by Theorem 49.4.24, which is Hausdorff
by Theorem 33.1.35 (iii). So the product is a non-empty topological manifold by Definition 50.1.1. If $M_1 = \emptyset$
or $M_2 = \emptyset$, then the set $M_1 \times M_2$ is empty by Theorem 9.4.6 (i), which implies by Definition 49.4.10 that
$\dim(M) = 0$.


50.4.4 REMARK: Generalisation of products of topological manifolds to locally Cartesian spaces.

It is too tedious to spell out in full the obvious extension of topological manifold concepts (such as products)
to locally Cartesian spaces. For example, Definition 50.4.6 is exactly the same with the term “topological
manifolds” replaced by “locally Cartesian spaces”. Similarly, Theorem 50.4.7 (iii) is still valid if it is modified
to say that the product of locally Cartesian spaces is a locally Cartesian space.


50.4.5 REMARK: The direct product of atlases for two manifolds. 

The double-domain direct product functions $\psi_1 \times \psi_2$ in Definition 50.4.6 are given by Definition 10.14.3, which
defines $\psi_1 \times \psi_2 : (p_1, p_2) \mapsto (\psi_1(p_1), \psi_2(p_2))$ for $p_1 \in \mathrm{Dom}(\psi_1)$ and $p_2 \in \mathrm{Dom}(\psi_2)$. The usual identification
of $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ with $\mathbb{R}^{n_1 + n_2}$ by concatenation is assumed, where $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. (See
Definition 14.6.10 for concatenation of tuples.) Thus Definition 50.4.6 simply combines the tuples from charts
by concatenating them. The topology induced by the direct product atlas on the set-product $M_1 \times M_2$ is the
same as the direct product of the topologies induced by the individual atlases on their respective point-sets.


Definition 50.4.6 is essentially identical to the corresponding Definition 49.3.10 for non-topological manifolds. 


50.4.6 DEFINITION: The (direct) product atlas for topological manifolds $M_1$ and $M_2$ with continuous atlases
$A_1$ and $A_2$ respectively is the set $\{\psi_1 \times \psi_2;\ \psi_1 \in A_1 \text{ and } \psi_2 \in A_2\}$.
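
The construction of a product atlas by concatenating chart tuples can be sketched in a few lines of Python. This is a toy model under assumptions made here: charts are represented as plain functions returning tuples, chart domains are not modelled, and the names product_chart, product_atlas and the four sample charts are hypothetical.

```python
def product_chart(psi1, psi2):
    """Double-domain product: (p1, p2) -> psi1(p1) concatenated with psi2(p2)."""
    def psi(p):
        p1, p2 = p
        return tuple(psi1(p1)) + tuple(psi2(p2))
    return psi

def product_atlas(atlas1, atlas2):
    """All product charts psi1 x psi2 with psi1 in atlas1 and psi2 in atlas2."""
    return [product_chart(a, b) for a in atlas1 for b in atlas2]

# Toy atlases (assumptions): two global charts on R and two global charts on R^2.
identity_1 = lambda x: (x,)
cube_1 = lambda x: (x ** 3,)            # a homeomorphism of R, so a valid continuous chart
identity_2 = lambda p: (p[0], p[1])
swap_2 = lambda p: (p[1], p[0])

A = product_atlas([identity_1, cube_1], [identity_2, swap_2])

point = (2.0, (1.0, 5.0))               # a point of R x R^2
images = [psi(point) for psi in A]
assert len(A) == 4
assert all(len(image) == 3 for image in images)   # dimensions add: 1 + 2 = 3
```

Each product chart returns a tuple whose length is the sum of the lengths returned by its two component charts, which corresponds to the identification by concatenation described in Remark 50.4.5.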


50.4.7 THEOREM: Some basic properties of direct product atlases for topological manifolds. 
Let $M_1$ and $M_2$ be non-empty topological manifolds with continuous atlases $A_1$ and $A_2$ respectively. Let $A$
be the direct product atlas for $M_1$ and $M_2$. Let $M = M_1 \times M_2$ be the product of the topological spaces $M_1$
and $M_2$.
(i) $M$ is a topological manifold with $\dim(M) = \dim(M_1) + \dim(M_2)$.
(ii) $\psi_1 \times \psi_2$ is a chart for $M$ for all $\psi_1 \in A_1$ and $\psi_2 \in A_2$.
(iii) $A$ is a topological manifold atlas for $M$.
(iv) The topology induced by $A$ on $M$ is equal to $\mathrm{Top}(M)$.


PROOF: Part (i) follows from Theorem 50.4.3. 


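For part (ii), let $\psi_1 \in A_1$ and $\psi_2 \in A_2$. Then $\psi_k : \Omega_k \to G_k$ are homeomorphisms for some $\Omega_k \in \mathrm{Top}(M_k)$
and $G_k \in \mathrm{Top}(\mathbb{R}^{n_k})$ for $k = 1, 2$. Let $\Omega = \Omega_1 \times \Omega_2$ and $G = G_1 \times G_2$. Then $\Omega \in \mathrm{Top}(M_1 \times M_2)$ and
$G \in \mathrm{Top}(\mathbb{R}^{n_1} \times \mathbb{R}^{n_2})$ by Theorem 32.9.6 (ii). So $\Omega \in \mathrm{Top}(M)$ and $G \in \mathrm{Top}(\mathbb{R}^{n_1 + n_2}) = \mathrm{Top}(\mathbb{R}^n)$, where $n = n_1 + n_2$, by the
standard identification of $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ with $\mathbb{R}^{n_1 + n_2}$. Let $\psi = \psi_1 \times \psi_2$ as in Definition 10.14.3. Then $\psi : \Omega \to G$
is a homeomorphism by Theorem 32.9.10 (ii). Hence $\psi_1 \times \psi_2$ is a chart for $M$.


For part (iii), let $\psi \in A$. Then $\psi = \psi_1 \times \psi_2$ for some $\psi_1 \in A_1$ and $\psi_2 \in A_2$ by Definition 50.4.6.
So $\psi$ is a chart for $M$ by part (ii). Let $X = \bigcup_{\psi \in A} \mathrm{Dom}(\psi)$. Then
$X = \bigcup_{\psi_1 \in A_1} \bigcup_{\psi_2 \in A_2} \mathrm{Dom}(\psi_1 \times \psi_2) = \bigcup_{\psi_1 \in A_1} \bigcup_{\psi_2 \in A_2} \mathrm{Dom}(\psi_1) \times \mathrm{Dom}(\psi_2)$.
So $X = \bigl(\bigcup_{\psi_1 \in A_1} \mathrm{Dom}(\psi_1)\bigr) \times \bigl(\bigcup_{\psi_2 \in A_2} \mathrm{Dom}(\psi_2)\bigr)$ by Theorem 9.4.7 (v).
So $\bigcup_{\psi \in A} \mathrm{Dom}(\psi) = X = M_1 \times M_2$ by Definition 49.7.3. Hence $A$ is an atlas for $M$ by Definition 49.7.3.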


Part (iv) follows from Theorem 49.8.18. 


50.4.8 DEFINITION: The (direct) product via atlases of two topological manifolds with atlases, $(M_1, A_1)$
and $(M_2, A_2)$, is the topological manifold with atlas $(M_1 \times M_2, A)$, where $A$ is the direct product atlas for
$M_1$ and $M_2$.


50.4.9 REMARK: The direct product of two maximal atlases is not maximal. 

Generally the direct product atlas of two maximal atlases for topological manifolds is not itself maximal. 
This is another good reason to not work exclusively with maximal or complete atlases. It is possible to 
“normalise” the atlas to be maximal following the construction of a direct product, but this has insignificant 
advantages and significant disadvantages, as alluded to in Remarks 51.2.2, 51.4.4, 54.5.31 and 54.9.12. 


50.5. Product-structured topological manifolds and submanifolds 


50.5.1 REMARK: Atlases which are pulled back from direct products to product-structured manifolds. 
Theorem 50.5.2 gives some basic properties of the “pull-back atlas” which is induced on a product-structured 
topological manifold by a direct product of manifolds via a homeomorphism. (See Definition 49.11.7 for pull- 
back atlases for general topological manifold homeomorphisms.) 


In practice, a product-structured topological manifold may have no explicit atlas or topology. Particularly 
in the case of fibre sets of fibre bundles, the atlas and topology may be entirely specified by “induction” from 
products of manifolds, in other words from the base space and fibre space. Therefore the atlas $\mathrm{atlas}(M_0)$
for $M_0$ in Theorem 50.5.2 may only exist as an “abstract” atlas, which is nowhere specified. This kind of
abstraction is seen also for general manifolds, where the abstract topology of a manifold is determined by
a locally Cartesian atlas, and the point-set itself may never be specified except as an abstraction from a
specified atlas. Thus the induced atlas on a product-structured manifold is in practice much more than just
“compatible” with the given atlas on $M_0$. In practice, it typically is the given atlas on $M_0$.


50.5.2 THEOREM: Some basic properties of pull-back atlases for product-structured topological manifolds.
Let $M_1 \mathrel{-\!\!<} (M_1, A_1)$, $M_2 \mathrel{-\!\!<} (M_2, A_2)$ and $M_0$ be topological manifolds. Let $\phi_1 : M_0 \to M_1$ and $\phi_2 : M_0 \to M_2$
be continuous functions such that $\phi_1 \times \phi_2 : M_0 \to M_1 \times M_2$ is a homeomorphism. Let $A_0$ be the pull-back
atlas for $M_0$ via $\phi_1 \times \phi_2$ from the direct product atlas $A = \{\psi_1 \times \psi_2;\ \psi_1 \in A_1 \text{ and } \psi_2 \in A_2\}$ on $M_1 \times M_2$.

(i) $A_0 = \{(\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2);\ \psi_1 \in A_1,\ \psi_2 \in A_2\}$.
(ii) The pull-back atlas $A_0$ is a continuous atlas for $M_0$. Hence $A_0 \subseteq \mathrm{atlas}(M_0)$.
(iii) $\{\psi_0 \in A_0;\ p_0 \in \mathrm{Dom}(\psi_0)\} = \{(\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2);\ \psi_1 \in \mathrm{atlas}_{\phi_1(p_0)}(M_1),\ \psi_2 \in \mathrm{atlas}_{\phi_2(p_0)}(M_2)\}$ for all
$p_0 \in M_0$, where $\mathrm{atlas}(M_1) = A_1$ and $\mathrm{atlas}(M_2) = A_2$.
(iv) If $A = \mathrm{atlas}(M_1 \times M_2)$, then $A_0 = \mathrm{atlas}(M_0)$.


PROOF: Part (i) follows from Definitions 49.11.7 and 50.4.6. 
Part (ii) follows from Theorem 49.11.8 (i). 

Part (iii) follows from Theorem 49.11.8 (ii). 

Part (iv) follows from Theorem 49.11.8 (iii). 


50.5.3 REMARK:  Decomposing a function-product homeomorphism into pointwise homeomorphisms. 
The decomposition of a homeomorphism of the form $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ into pointwise topological
manifold isomorphisms in Theorem 50.5.4 is applicable to the study of topological fibre bundles.


Theorem 50.5.4 is the topological manifold version of Theorem 32.11.4, which is for general topological 
spaces. When topological space terms, such as “continuous” and “homeomorphism” in Theorem 50.5.4, 
are applied to topological manifolds, it is understood that they apply to the underlying topologies on the 
relevant spaces. (It would be tedious to redefine every topological concept for manifolds.) However, there 
is a definitional issue for the concept of a topological manifold isomorphism between a topological manifold 
(such as $X_1$ or $X_2$) and a subset (such as $\phi_2^{-1}(\{x_2\})$ or $\phi_1^{-1}(\{x_1\})$) of a topological manifold $Y$. Such subsets
must be given some kind of topological manifold structure. 


The induced topology on an arbitrary subset of a topological space is well defined. (See Definition 32.8.4.) 
But the induced topological manifold structure on an arbitrary subset of a topological manifold is not 
defined in general. In other words, every subset of a topological space is a topological space with the relative 
topology, but only special kinds of subsets of topological manifolds are topological manifolds. 


Until the late 19th century, it was commonly believed that images of continuous maps from open subsets 
of Cartesian spaces to Cartesian spaces must inherit the manifold structure of the source space, or at least 
its dimension. But it became clear that this was not true following the publication of a space-filling curve 
in 1890 by Peano [192]. (See Section 36.3 for space-filling curves.) Images of continuous curves can be 
quite pathological, possibly having a higher dimension than the domain of the curve. So the existence and 
properties of manifold structure on subsets of topological manifolds must always be verified. This applies 
even more to subsets of differentiable manifolds. (See Sections 52.3 and 52.4 for submanifolds of differentiable 
manifolds.) 


Theorem 50.5.4 shows how a broad class of submanifolds may be constructed and validated. The set
$\phi_2^{-1}(\{x_2\})$ in part (vii) is equal to $(\phi_1 \times \phi_2)^{-1}(\Pi_2^{-1}(\{x_2\}))$, where $(\phi_1 \times \phi_2)^{-1} : X_1 \times X_2 \to Y$ is a topological
manifold isomorphism and $\Pi_2^{-1}(\{x_2\}) = X_1 \times \{x_2\} \subseteq X_1 \times X_2$. (This scenario is illustrated in Figure 50.5.1.)


50.5.4 THEOREM: Homeomorphisms from horizontal/vertical submanifolds to direct-product components.
Let $X_1$, $X_2$ and $Y$ be topological manifolds. Let $\phi_1 : Y \to X_1$ and $\phi_2 : Y \to X_2$ be continuous functions
such that $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is a topological manifold isomorphism.

(i) $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to X_1$ is a topological space homeomorphism for all $x_2 \in X_2$.
(ii) $\phi_2|_{\phi_1^{-1}(\{x_1\})} : \phi_1^{-1}(\{x_1\}) \to X_2$ is a topological space homeomorphism for all $x_1 \in X_1$.
(iii) $(\phi_1|_{\phi_2^{-1}(\{x_2\})})^{-1} : X_1 \to \phi_2^{-1}(\{x_2\})$ is a continuous embedding of $X_1$ in $Y$ for all $x_2 \in X_2$.
(iv) $(\phi_2|_{\phi_1^{-1}(\{x_1\})})^{-1} : X_2 \to \phi_1^{-1}(\{x_1\})$ is a continuous embedding of $X_2$ in $Y$ for all $x_1 \in X_1$.
(v) $\phi_1|_{\phi_2^{-1}(\{x_2\})} : \phi_2^{-1}(\{x_2\}) \to X_1$ is a topological manifold isomorphism for all $x_2 \in X_2$.
(vi) $\phi_2|_{\phi_1^{-1}(\{x_1\})} : \phi_1^{-1}(\{x_1\}) \to X_2$ is a topological manifold isomorphism for all $x_1 \in X_1$.
(vii) $\phi_2^{-1}(\{x_2\})$ is a topological submanifold of $Y$ for all $x_2 \in X_2$.
(viii) $\phi_1^{-1}(\{x_1\})$ is a topological submanifold of $Y$ for all $x_1 \in X_1$.
(ix) $\phi_2^{-1}(\{x_2\})$ is a regular topological submanifold of $Y$ for all $x_2 \in X_2$.
(x) $\phi_1^{-1}(\{x_1\})$ is a regular topological submanifold of $Y$ for all $x_1 \in X_1$.


Figure 50.5.1 Topological manifold embedding from a direct product homeomorphism


PROOF: Part (i) follows from Theorem 32.11.4 (i). 

Part (ii) follows from Theorem 32.11.4 (ii). 

Part (iii) follows from part (i) and Definition 50.3.3. 

Part (iv) follows from part (ii) and Definition 50.3.3. 

Parts (v) and (vi) follow from parts (i) and (ii) respectively and Theorem 50.3.2. 

Part (vii) follows from part (iii) and Theorem 50.3.4. 

Part (viii) follows from part (iv) and Theorem 50.3.4. 

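For part (ix), let $x_2 \in X_2$, $S = \phi_2^{-1}(\{x_2\})$ and $p \in S$. Let $\psi_1 \in \mathrm{atlas}_{\phi_1(p)}(X_1)$ and $\psi_2 \in \mathrm{atlas}_{\phi_2(p)}(X_2) =
\mathrm{atlas}_{x_2}(X_2)$. (See Notation 49.7.20 for the set $\mathrm{atlas}_p(M)$ of all charts $\psi$ on a topological manifold $M$ such
that $p \in \mathrm{Dom}(\psi)$.) Let $U = \phi_1^{-1}(\mathrm{Dom}(\psi_1)) \cap \phi_2^{-1}(\mathrm{Dom}(\psi_2))$. Then $U \in \mathrm{Top}_p(Y)$. Define $\psi : U \to \mathbb{R}^{n_1 + n_2}$
by $\psi : y \mapsto (\psi_1(\phi_1(y)), \psi_2(\phi_2(y))) = (\psi_1 \times \psi_2)((\phi_1 \times \phi_2)(y))$, where $n_k = \dim(X_k)$ for $k = 1, 2$. In other
words, $\psi = (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)$. Then $\psi$ is continuous, open and injective because $\phi_1 \times \phi_2$ and $\psi_1 \times \psi_2$
are continuous, open and injective. Therefore $\psi : U \to \psi(U) = \psi_1(\phi_1(U)) \times \psi_2(\phi_2(U)) \in \mathrm{Top}(\mathbb{R}^{n_1 + n_2})$ is
a homeomorphism. So $\psi \in \mathrm{atlas}_p(Y)$. But $\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^{n_1} \times \{\psi_2(x_2)\})$. Hence
$\phi_2^{-1}(\{x_2\})$ is a regular topological submanifold of $Y$ for all $x_2 \in X_2$ by Theorem 50.2.17.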


Part (x) may be proved as for part (ix) by swapping $X_1$ and $X_2$.


50.5.5 REMARK: All hyperplanes of all charts induce submanifolds onto topological manifolds. 

Example 50.5.6 suggests that hyperplanes parallel to axes in a coordinate patch are always mapped by charts 
on a topological manifold to submanifolds of that manifold. This is clearly true, but not all topological 
submanifolds can be generated in this way because of the limited range of topologies provided by Cartesian 
chart spaces. The continuous embeddings in Definition 50.3.3 allow the “coordinate space” $M_1$ to be a
general topological manifold which is embedded in a manifold $M_2$. Example 50.5.6 uses lines in a patch of
$\mathbb{R}^2$ for the manifold $M_1$, but this kind of direct product of two copies of $\mathbb{R}$ cannot provide an embedding for
the full latitude circles in $S^2$. If $S^1 \times \mathbb{R}$ is used in place of $M_1$, the full latitude circles can be shown to be
embedded as images of sets $S^1 \times \{x_2\}$ for $x_2 \in \mathbb{R}$, but $S^1$ cannot be embedded in $\mathbb{R}$. So Theorem 50.5.4 (vii)
cannot be applied to prove that the full latitude circles in $S^2$ are embedded if $X_1$ and $X_2$ are both
required to be submanifolds of $\mathbb{R}$.


The limited scope of Theorem 50.5.4 is hinted at by Example 50.5.6. The theorem applies to manifolds which 
are homeomorphic to direct products $X_1 \times X_2$ of manifolds $X_1$ and $X_2$, and the submanifolds $\phi_2^{-1}(\{x_2\})$ or
$\phi_1^{-1}(\{x_1\})$ must be homeomorphic to one or the other of the components $X_1$ and $X_2$. These are the kinds
of constraints which are satisfied by “local trivialisations” of topological fibre bundles. (See Remark 21.5.7
for local trivialisations.) 


50.5.6 EXAMPLE: Figure 50.5.2 illustrates the embedding of topological submanifolds in a topological
manifold. The map $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is a homeomorphism with $Y = \{y \in S^2;\ y_1 > -(y_1^2 + y_2^2)^{1/2}\}$,
$X_1 = (-\pi, \pi)$ and $X_2 = (-\frac{1}{2}\pi, \frac{1}{2}\pi)$ with the usual topologies. By Theorem 50.5.4 (vii), $\phi_2^{-1}(\{x_2\})$ is a
topological submanifold of $Y$ for any $x_2 \in X_2$.


$\phi_1 \times \phi_2 : Y \to X_1 \times X_2$
$Y = \{y \in S^2;\ y_1 > -(y_1^2 + y_2^2)^{1/2}\}$, $X_1 \times X_2 = (-\pi, \pi) \times (-\pi/2, \pi/2)$
$\phi_1 : y \mapsto \arctan(y_1, y_2)$, $\phi_2 : y \mapsto \arcsin(y_3)$, $\phi_2^{-1}(\{x_2\}) = \{y \in Y;\ y_3 = \sin(x_2)\}$
Figure 50.5.2 Topological submanifolds embedded in a manifold
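
The longitude and latitude maps of this example can be tried out numerically. The following Python sketch is illustrative and rests on assumptions made here: the two-argument arctangent in the figure is interpreted as the angle of the point (y1, y2), which is math.atan2(y2, y1) in Python, and the names phi1, phi2, in_Y and inverse are hypothetical.

```python
import math

def in_Y(y):
    """Membership test for Y: the sphere minus the closed half-meridian at longitude pi."""
    return y[0] > -math.hypot(y[0], y[1])

def phi1(y):
    """Longitude in (-pi, pi), assuming arctan(y1, y2) means the angle of (y1, y2)."""
    return math.atan2(y[1], y[0])

def phi2(y):
    """Latitude in (-pi/2, pi/2)."""
    return math.asin(y[2])

def inverse(x1, x2):
    """Candidate inverse of phi1 x phi2: (longitude, latitude) -> point of the sphere."""
    c = math.cos(x2)
    return (c * math.cos(x1), c * math.sin(x1), math.sin(x2))

# Round trip through the candidate inverse for a few sample coordinates.
for x1, x2 in [(2.0, -0.4), (-3.0, 1.2), (0.0, 0.0), (0.5, 1.5)]:
    y = inverse(x1, x2)
    assert in_Y(y)
    assert math.isclose(phi1(y), x1, abs_tol=1e-12)
    assert math.isclose(phi2(y), x2, abs_tol=1e-12)

# A latitude circle: all points with the same x2 have third coordinate sin(x2).
x2 = 0.3
circle = [inverse(x1, x2) for x1 in (-3.0, -1.0, 0.0, 1.0, 3.0)]
assert all(math.isclose(y[2], math.sin(x2), abs_tol=1e-12) for y in circle)
```

The excluded half-meridian and the two poles fail the membership test, which is why the latitude circles obtained in this way are not the full circles discussed in Remark 50.5.5.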


50.5.7 REMARK: The natural atlas induced by a topological manifold isomorphism. 

The conclusions of Theorem 50.5.4 are fairly obvious. The spaces $X_1$ and $X_2$ are regularly immersed in
$X_1 \times X_2$ by Definition 50.2.8 and Theorem 50.2.17. But $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is a topological manifold
isomorphism. So whatever is true (topologically) in $Y$ should be the same as in $X_1 \times X_2$.


More interesting is the natural choice of charts which arises from the isomorphism $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$.
This is asserted in Theorem 50.5.8.


50.5.8 THEOREM: Atlases induced on horizontal/vertical submanifolds of product-structured manifolds.
Let $X_1$, $X_2$ and $Y$ be topological manifolds. Let $\phi_1 : Y \to X_1$ and $\phi_2 : Y \to X_2$ be
continuous functions such that $\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is a topological manifold
isomorphism. Let $A_k$ be a topological manifold atlas on $X_k$ for $k = 1, 2$.

(i) $\{\psi_1 \circ \phi_1|_S;\ \psi_1 \in A_1\}$ is a topological manifold atlas on
    $S = \phi_2^{-1}(\{x_2\})$ for all $x_2 \in X_2$.

(ii) $\{\psi_2 \circ \phi_2|_S;\ \psi_2 \in A_2\}$ is a topological manifold atlas on
    $S = \phi_1^{-1}(\{x_1\})$ for all $x_1 \in X_1$.



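PROOF: For part (i), let $x_2 \in X_2$ and $S = \phi_2^{-1}(\{x_2\})$. By Theorem 50.5.4,
$\phi_1|_S : S \to X_1$ is a homeomorphism. Let $\psi_1 \in A_1$. Then
$\mathrm{Dom}(\psi_1) \subseteq X_1 = \mathrm{Range}(\phi_1|_S)$. Consequently
$\mathrm{Dom}(\psi_1 \circ \phi_1|_S) = (\phi_1|_S)^{-1}(\mathrm{Dom}(\psi_1) \cap \mathrm{Range}(\phi_1|_S))
= (\phi_1|_S)^{-1}(\mathrm{Dom}(\psi_1)) = S \cap \phi_1^{-1}(\mathrm{Dom}(\psi_1))$. It then follows that
$\bigcup_{\psi_1 \in A_1} \mathrm{Dom}(\psi_1 \circ \phi_1|_S)
= S \cap \bigcup_{\psi_1 \in A_1} \phi_1^{-1}(\mathrm{Dom}(\psi_1))
= S \cap \phi_1^{-1}(\bigcup_{\psi_1 \in A_1} \mathrm{Dom}(\psi_1))
= S \cap \phi_1^{-1}(X_1) = S \cap Y = S$. In other words,
the domains of the functions in $\{\psi_1 \circ \phi_1|_S;\ \psi_1 \in A_1\}$ constitute a cover for $S$.


To verify that $\psi_1 \circ \phi_1|_S$ is a valid topological manifold chart on $S$ for each
$\psi_1 \in A_1$, let $\psi_1 \in A_1$ and note that $\phi_1^{-1}(\mathrm{Dom}(\psi_1))$ is an open subset of
$Y$ because $\mathrm{Dom}(\psi_1)$ is an open subset of $X_1$ and $\phi_1 : Y \to X_1$ is continuous.
Therefore $\mathrm{Dom}(\psi_1 \circ \phi_1|_S) = S \cap \phi_1^{-1}(\mathrm{Dom}(\psi_1))$ is an open subset
of $S$ in the relative topology on $S$ in $Y$. But
$\mathrm{Range}(\psi_1 \circ \phi_1|_S) = \mathrm{Range}(\psi_1)$ because
$\mathrm{Dom}(\psi_1) \subseteq \mathrm{Range}(\phi_1|_S)$. So
$\mathrm{Range}(\psi_1 \circ \phi_1|_S) \in \mathrm{Top}(\mathbb{R}^n)$, where $n = \dim(X_1)$. Therefore
$\psi_1 \circ \phi_1|_S : S \to \mathbb{R}^n$ is a valid $n$-dimensional topological manifold chart on $S$.
Hence $\{\psi_1 \circ \phi_1|_S;\ \psi_1 \in A_1\}$ is a topological manifold atlas on $S$.


Part (ii) may be proved as for part (i) by swapping $X_1$ and $X_2$.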


50.5.9 REMARK: Product-structured topological manifolds and horizontal and vertical submanifolds.
Definition 50.5.10 is obtained from Definition 32.11.6 by substituting the word “manifold” for the word
“space”. The two definitions are automatically consistent because by Definition 50.1.1, every topological
manifold is a topological space. The horizontal and vertical “submanifolds” in Definition 50.5.10 are regular
topological submanifolds of the manifold $Y$ by Theorem 50.5.4 (ix, x).


50.5.10 DEFINITION: A product-structured topological manifold is a tuple
$Y \mathrel{-\!\!<} (Y, \phi_1, \phi_2, X_1, X_2)$ where $Y$, $X_1$ and $X_2$ are topological manifolds and
$\phi_1 : Y \to X_1$ and $\phi_2 : Y \to X_2$ are functions such that the common-domain function product
$\phi_1 \times \phi_2 : Y \to X_1 \times X_2$ is a homeomorphism.


A horizontal submanifold of a product-structured topological manifold $(Y, \phi_1, \phi_2, X_1, X_2)$ is a
topological manifold $(Y^{x_2}, T^{x_2})$ for some $x_2 \in X_2$, where


(i) $Y^{x_2} = \phi_2^{-1}(\{x_2\})$, and
(ii) $T^{x_2} = \{\Omega \cap Y^{x_2};\ \Omega \in \mathrm{Top}(Y)\}$.


A vertical submanifold of a product-structured topological manifold $(Y, \phi_1, \phi_2, X_1, X_2)$ is a
topological manifold $(Y^{x_1}, T^{x_1})$ for some $x_1 \in X_1$, where


(i) $Y^{x_1} = \phi_1^{-1}(\{x_1\})$, and
(ii) $T^{x_1} = \{\Omega \cap Y^{x_1};\ \Omega \in \mathrm{Top}(Y)\}$.


50.6. Hölder and Lipschitz continuous manifolds


50.6.1 REMARK: Manifold regularity classes between continuity and differentiability. 

The $\alpha$-Hölder regularity classes for functions in Section 38.7 are stronger than continuity. The
Lipschitz regularity classes for functions in Section 38.6, which are the same as $\alpha$-Hölder regularity
with $\alpha = 1$, are at the top of the scale of $\alpha$-Hölder regularity with $\alpha \in (0, 1]$.
Neither of these regularity classes is strong enough to guarantee the well-definition of a tangent bundle
covering an entire topological manifold. Therefore these regularity classes are presented in Chapter 50
rather than in Chapter 51.


50.6.2 REMARK: Interpolation of regularity classes between $C^k$ differentiability classes.

For many purposes, including analysis on the graphs of solutions of boundary value problems, it would
be valuable to be able to define various interpolations of manifold regularity classes between the integral
differentiability classes $C^k$ for $k \in \mathbb{Z}^+_0$ in Chapter 51. The $C^{k,\alpha}$ regularity
classes for transition maps whose $k$th derivative is $\alpha$-Hölder continuous are applicable in the
context of the Schauder existence and regularity theory. (See for example Petersen [31], pages 301-307;
Gilbarg/Trudinger [81], pages 87-141.)


50.6.3 REMARK: Lipschitz manifolds.
In the case of Lipschitz continuity, the first derivative exists almost everywhere, and instead of a tangent
bundle with copies of the linear space $\mathbb{R}^n$ attached to each point in the manifold, there is a
tangent cone at
each point. This cone is a linear tangent space almost everywhere, and inferior and superior unidirectional 
tangent vectors are defined everywhere. 


Lipschitz manifolds are useful for defining rectifiable curves, which are the natural kind of curve for defining 
parallelism. Rectifiable curves on Lipschitz manifolds are presented in Section 50.7. 


50.6.4 REMARK: Oblique projection charts for the graphs of real-valued functions. 

The set $M = \{x \in \mathbb{R}^{n+1};\ x^{n+1} = f(x^1, \ldots x^n)\} \subseteq \mathbb{R}^{n+1}$ for any
$C^{0,1}$ function $f : \mathbb{R}^n \to \mathbb{R}$ can be given manifold charts in a natural way by
projecting points from $M$ to $\{x \in \mathbb{R}^{n+1};\ x^{n+1} = 0\}$, which may be identified with
$\mathbb{R}^n$. For any $w \in \mathbb{R}^n$, a function $\psi_w : M \to \mathbb{R}^n$ may be defined by
$\psi_w : (x^1, \ldots x^{n+1}) \mapsto (x^1 - w^1 x^{n+1}, \ldots x^n - w^n x^{n+1})$. This function is
injective if $|w| < \mathrm{Lip}(f)^{-1}$, where $\mathrm{Lip}(f)$ is the Lipschitz constant for $f$ in
Notation 38.6.8. An example of this is illustrated in Figure 50.6.1.

Manifolds which are embedded in the flat spaces $\mathbb{R}^n$ are said to be "regularly" embedded if they
are everywhere locally projectable in this injective manner to hyperplanes of the ambient space.


If only a single projection of a manifold onto a hyperplane is used in an atlas, the atlas will be of class
$C^\infty$ because there will be no transition maps. So it is important to include enough charts in the atlas
to accurately describe the inherent regularity (or irregularity) of the manifold. In $\mathbb{R}^n$, $n$
projections with $n$ independent projection directions at each point should be adequate to fully describe
the manifold's regularity.

Figure 50.6.1 Projection maps for a Lipschitz manifold


50.6.5 EXAMPLE: A two-chart manifold which is Lipschitz continuous, but not differentiable.
The set $M = \{x \in \mathbb{R}^{n+1};\ x^{n+1} = |x^1|\}$ is a simple example of a set which is naturally
modelled as an $n$-dimensional $C^{0,1}$ manifold. (See Figure 50.6.1.)


The most obvious chart for this set is $\psi_0 : M \to \mathbb{R}^n$ with
$\psi_0 : (x^1, \ldots x^{n+1}) \mapsto (x^1, \ldots x^n)$. The atlas $\{\psi_0\}$ containing only this chart
makes this set a $C^\infty$ manifold. One might ask why this very smooth state of affairs should be upset by
adding further charts. The fly in this ointment is that this would not be an accurate description of the
manifold. Problems would arise when the embedding of $M$ in $\mathbb{R}^{n+1}$ is used as a
diffeomorphism. Tangent vectors and higher order differential constructions would not map as expected.


To expose the non-$C^\infty$ nature of the set $M$, it suffices to project the set onto $\mathbb{R}^n$ in
different directions. (See Remark 50.6.4 for general projections for graphs of functions.) For
$\beta \in (-1, 1)$, define the chart $\psi_\beta : M \to \mathbb{R}^n$ by
$\psi_\beta : (x^1, \ldots x^{n+1}) \mapsto (x^1 - \beta x^{n+1}, x^2, \ldots x^n)$. This map is clearly
$C^\infty$ with respect to the ambient space $\mathbb{R}^{n+1}$, but when $x^{n+1} = |x^1|$ is substituted,
this yields $\psi_\beta : (x^1, \ldots x^{n+1}) \mapsto (x^1 - \beta|x^1|, x^2, \ldots x^n)$.
The transition map $\psi_\beta \circ \psi_0^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is defined by
$\psi_\beta \circ \psi_0^{-1} : (x^1, \ldots x^n) \mapsto (x^1 - \beta|x^1|, x^2, \ldots x^n)$,
which is clearly only $C^{0,1}$. (See Figure 50.6.2.)


[Figure: graph of the first component of the transition map, $x \mapsto x - \beta|x|$.]

Figure 50.6.2 Transition maps for a Lipschitz manifold
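
The kink in this transition map is easy to see numerically. The sketch below is an illustration only: it looks at the first coordinate $x \mapsto x - \beta|x|$, and the names `transition` and `one_sided_slopes`, the value $\beta = 0.5$ and the sampling grid are all assumptions chosen for the demonstration. It compares the left and right difference quotients at $x = 0$ and estimates the Lipschitz ratio on a grid.

```python
def transition(x, beta):
    """First coordinate of the transition map: x -> x - beta*|x|."""
    return x - beta * abs(x)

def one_sided_slopes(beta, h=1e-6):
    """Left and right difference quotients of the transition map at x = 0."""
    right = (transition(h, beta) - transition(0.0, beta)) / h
    left = (transition(0.0, beta) - transition(-h, beta)) / h
    return left, right

beta = 0.5
left, right = one_sided_slopes(beta)   # slopes 1 + beta and 1 - beta

# Empirical Lipschitz ratio over a grid of pairs of distinct points.
grid = [i / 50.0 for i in range(-100, 101)]
max_ratio = max(abs(transition(a, beta) - transition(b, beta)) / abs(a - b)
                for a in grid for b in grid if a != b)
```

The two one-sided slopes differ whenever $\beta \neq 0$, so the map is not $C^1$ at the origin, while the sampled ratio stays bounded by $1 + |\beta|$, consistent with $C^{0,1}$ regularity.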


50.6.6 REMARK: Using oblique projections to honestly reflect the structure of an embedded manifold. 

For general embedded manifolds, the direction of projection has no natural choice. It would be deceptive to 
use only one projection of a set because this would give the manifold a structure which depends on the choice 
of projection chart. To honestly reflect the structure of an embedded manifold, all local projections onto 
hyperplanes should be included in the atlas so that there will be no perplexing chart-dependent properties
when relations between the manifold and the ambient space are examined. 


50.6.7 REMARK: Generalised tangent bundle required for non-$C^1$ manifolds.

An interesting question to ask about the $x^{n+1} = |x^1|$ manifold is how the lack of $C^1$ regularity
affects the definition of the tangent bundle. In fact, this example manifold has an extra property which is
not shared by general $C^{0,1}$ manifolds; namely its transition maps have unidirectional derivatives at all
points. This kind of manifold is discussed in Section 54.16.


50.6.8 EXAMPLE: Lipschitz function which has no one-sided derivatives at one point.
Figure 50.6.3 shows a function which is $C^{0,1}$ but which has no one-sided derivatives at $x = 0$.


[Figure: graph of $f(x) = x \sin(\pi \log_2 |x|)$ oscillating between the lines $y = x$ and $y = -x$.]

Figure 50.6.3 Lipschitz function without one-sided derivatives at $x = 0$


The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x \sin(k \ln|x|)$ for $x \neq 0$ and
$f(0) = 0$ has derivative $f'(x) = \sin(k \ln|x|) + k \cos(k \ln|x|)$ for $x \neq 0$. So the $C^{0,1}$ norm
of $f$ is $\|f\|_{0,1} = (1 + k^2)^{1/2}$. In this case, $k = \pi / \ln 2$.
Define the set $M \subseteq \mathbb{R}^{n+1}$ by $M = \{x \in \mathbb{R}^{n+1};\ x^{n+1} = f(x^1)\}$. When
this is projected at various angles onto the $x^{n+1} = 0$ plane as in Example 50.6.5, the resulting charts
will have $C^{0,1}$ transition maps, but the points on $M$ with $x^1 = 0$ will have no one-sided tangent
vectors in most directions.
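
A short numerical check makes the failure of one-sided differentiability concrete. This is a sketch under assumptions: the sample points $x_j = 2^{-(j+1/2)}$ and the logarithmic sampling grid for $f'$ are choices made here for illustration, and a sampled supremum can only suggest, not prove, the stated bound.

```python
import math

K = math.pi / math.log(2.0)   # so that K * ln|x| equals pi * log2|x|

def f(x):
    """f(x) = x sin(K ln|x|) for x != 0, and f(0) = 0."""
    return 0.0 if x == 0.0 else x * math.sin(K * math.log(abs(x)))

def f_prime(x):
    """Derivative of f for x != 0."""
    u = K * math.log(abs(x))
    return math.sin(u) + K * math.cos(u)

# Difference quotients (f(x) - f(0)) / x along x_j = 2^-(j + 1/2), which tends to 0+.
sample_points = [2.0 ** -(j + 0.5) for j in range(8)]
quotients = [f(x) / x for x in sample_points]

# Sampled supremum of |f'| compared with the bound (1 + K^2)^(1/2).
lipschitz_bound = math.sqrt(1.0 + K * K)
sampled_sup = max(abs(f_prime(10.0 ** (-6.0 * i / 5000.0))) for i in range(1, 5001))
```

The quotients alternate between values close to $-1$ and $+1$ as $x_j \to 0^+$, so they have no limit, whereas the sampled values of $|f'|$ never exceed the bound $(1 + k^2)^{1/2}$.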


50.6.9 REMARK: Locally Lipschitz continuous atlases for topological manifolds. 
As in the case of differentiable structure on manifolds in Definition 51.3.8, locally Lipschitz continuous struc- 
ture on topological manifolds must be defined by means of an atlas because multiple mutually incompatible 
locally Lipschitz structures may be consistent with a single topological structure. 


The domains of the transition maps $\psi_2 \circ \psi_1^{-1}$ in Definition 50.6.10 are open subsets of the
Cartesian space $\mathbb{R}^n$ with $n = \dim(M)$. (See Definition 38.6.14 for pointwise-bound
pointwise-pair-locality Lipschitz functions.)


50.6.10 DEFINITION: A locally Lipschitz (continuous) atlas for a topological manifold $M$ is a continuous
atlas $A$ for $M$ such that $\psi_2 \circ \psi_1^{-1}$ is a pointwise-bound pointwise-pair-locality Lipschitz
map for all $\psi_1, \psi_2 \in A$. In other words,


    $\forall \psi_1, \psi_2 \in A,\ \forall p \in \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2),\
     \exists K \in \mathbb{R}^+_0,\ \exists r \in \mathbb{R}^+,\
     \forall x, y \in \mathrm{Range}(\psi_1) \cap B_{\psi_1(p),r},$
        $|\psi_2(\psi_1^{-1}(x)) - \psi_2(\psi_1^{-1}(y))| \le K|x - y|$.


50.6.11 DEFINITION: A locally Lipschitz (continuous) manifold is a pair $(M, A_M)$ where $M$ is a topological
manifold and $A_M$ is a locally Lipschitz continuous atlas on $M$.


50.6.12 REMARK: Examples of Lipschitz manifolds.
See Section 53.2 for examples of locally Lipschitz manifold charts.


50.7.1 REMARK: Application of rectifiable curve concepts from metric spaces to Lipschitz manifolds.
Although the length of a curve in a Lipschitz manifold is chart-dependent, the rectifiability of a curve is
chart-independent. This shows that the definition sometimes given for rectifiability in terms of curve length
has some limitations. Theorem 38.9.9 asserts that a curve in a metric space is locally rectifiable if and only
if it is path-equivalent to a Lipschitz curve. Since the path-equivalence relation in Definition 36.5.3 is purely
topological, not requiring a metric, local rectifiability may be applied to Lipschitz manifolds by requiring a
curve to be path-equivalent to a Lipschitz curve. (See Section 38.6 for Lipschitz continuity of maps between
general metric spaces. See Section 38.8 for curve length in general metric spaces. See Section 38.9 for
rectifiable curves in general metric spaces.)


50.7.2 REMARK: The applicability of rectifiability of curves to parallelism definitions. 

Rectifiable curves are a minimum requirement for defining connections (and therefore curvature and covariant 
derivatives) on manifolds, and a Lipschitz atlas is a minimum requirement for defining rectifiable curves in a 
chart-independent manner. Therefore Lipschitz manifolds are a minimum requirement for much of differential 
geometry. This is the motivation for Section 50.7. 


50.7.3 REMARK: Chart-dependence of rectifiability of curves. 

Example 50.7.4 shows that it is not possible to define rectifiability of curves in topological spaces in terms 
of the curve map "through the charts", because even if a curve is rectifiable with respect to one chart, there 
will certainly be other charts for which rectifiability does not hold. 


50.7.4 EXAMPLE:  Chart-dependence of rectifiability of curves in topological manifolds. 

Define a 2-dimensional topological manifold $(M, T_M)$ by $M = \mathbb{R}^2$ with the usual topology $T_M$
on $\mathbb{R}^2$. For any continuous function $h : \mathbb{R} \to \mathbb{R}$ define the map
$\psi_h : M \to \mathbb{R}^2$ by $\psi_h : (x^1, x^2) \mapsto (x^1, x^2 + h(x^1))$. Then $\psi_h$
is a continuous map for any continuous $h$, and the inverse of $\psi_h$ is $\psi_{-h}$, which is also
continuous. Therefore $\psi_h$ is a homeomorphism and so a valid continuous chart for $(M, T_M)$ for any
continuous $h : \mathbb{R} \to \mathbb{R}$. For the zero function $h = 0$, the map $\psi_h = \psi_0$ is the
identity map on $\mathbb{R}^2$. This is illustrated in Figure 50.7.1.


[Figure: a curve $\gamma$ in $M$ and its images in $\mathbb{R}^2$ under the charts $\psi_0$ and $\psi_h$.]

Figure 50.7.1 Chart-dependence of rectifiability of a curve


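Now define a curve $\gamma : I \to M$ by $\gamma : x \mapsto (x, 0)$ for some interval
$I \subseteq \mathbb{R}$. Consider the map “through the charts” defined by
$\psi_h \circ \gamma : I \to \mathbb{R}^2$. This is continuous because $\psi_h$ and $\gamma$ are continuous.
When $h = 0$, the map $\psi_0 \circ \gamma = \gamma$ is $C^\infty$ and therefore Lipschitz continuous. But if
$h$ is chosen to be a continuous-everywhere, differentiable-nowhere function, then $\psi_h \circ \gamma$ is
continuous but not Lipschitz continuous. In fact $\psi_h \circ \gamma : x \mapsto (x, h(x))$ is then not
rectifiable on any subinterval, because a real function of bounded variation is differentiable almost
everywhere.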
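
The chart-dependence can be illustrated numerically. The sketch below rests on several assumptions which are not in the text: the rough function is a truncated Weierstrass-type series $\sum_n a^n \cos(b^n \pi x)$ with $a = 1/2$ and $b = 8$, the parameter interval is $[0, 1]$, and the polygons use the grid $x_j = j/8^m$. A truncated series is itself smooth, so the code can only show polygonal lengths growing as the grid is refined, which is consistent with non-rectifiability of the limit but does not prove it.

```python
import math

def h(x, terms=8, a=0.5, b=8):
    """Partial sum of a Weierstrass-type series sum a^n cos(b^n pi x)."""
    return sum(a ** n * math.cos(b ** n * math.pi * x) for n in range(terms))

def polygonal_length(chart_h, level):
    """Length of the polygon through psi_h(gamma(x_j)) = (x_j, h(x_j)), x_j = j / 8^level."""
    steps = 8 ** level
    pts = [(j / steps, chart_h(j / steps)) for j in range(steps + 1)]
    return sum(math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(pts, pts[1:]))

def zero(x):
    """The zero function, giving the identity chart psi_0."""
    return 0.0

levels = (1, 2, 3)
lengths_identity = [polygonal_length(zero, m) for m in levels]
lengths_rough = [polygonal_length(h, m) for m in levels]
```

Through the identity chart every polygon has the length of the interval, while through the rough chart the polygonal lengths keep increasing with each refinement level.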


50.7.5 REMARK: The relation of rectifiability of curves to metric spaces. 
One could define Lipschitz manifolds whose transition functions have the corresponding regularity. On such 
manifolds, rectifiable curves and sets would be well-defined. 


Rectifiable curves and paths are defined for metric spaces in Section 38.9. Manifold charts induce a chart- 
dependent metric structure on a manifold. Lengths of curves cannot be defined in a chart-independent 
manner in the absence of a true metric. But the rectifiability property can be defined in a chart-independent 
manner if the manifold is locally Lipschitz continuous as in Definition 50.6.11. 
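

50.7.6 DEFINITION: A locally rectifiable curve in a locally Lipschitz manifold $M$ is a curve
$\gamma : I \to M$ with


    $\forall t \in I,\ \exists \psi \in \mathrm{atlas}_{\gamma(t)}(M),\ \exists \delta \in \mathbb{R}^+,$
        $\gamma(B_{t,\delta}) \subseteq \mathrm{Dom}(\psi)$ and
        $\mathrm{length}(\psi \circ \gamma|_{B_{t,\delta}}) < \infty$.


In other words, for all $t \in I$, for some $\psi \in \mathrm{atlas}_{\gamma(t)}(M)$, there is a
$\delta > 0$ such that $\gamma(B_{t,\delta}) \subseteq \mathrm{Dom}(\psi)$ and the restricted curve
$\psi \circ \gamma|_{B_{t,\delta}} : I \cap B_{t,\delta} \to \mathbb{R}^n$ is rectifiable in $\mathbb{R}^n$,
where $n = \dim(M)$.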


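50.7.7 THEOREM: Chart-independence of locally rectifiable curve definition.
For any locally rectifiable curve $\gamma : I \to M$ in a locally Lipschitz manifold $M$,


    $\forall t \in I,\ \forall \psi \in \mathrm{atlas}_{\gamma(t)}(M),\ \exists \delta \in \mathbb{R}^+,$
        $\gamma(B_{t,\delta}) \subseteq \mathrm{Dom}(\psi)$ and
        $\mathrm{length}(\psi \circ \gamma|_{B_{t,\delta}}) < \infty$.                (50.7.1)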


In other words, Definition 50.7.6 is chart-independent. 

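PROOF: Let $t \in I$. Let $\psi_0 \in \mathrm{atlas}_{\gamma(t)}(M)$. By Definition 50.7.6, there is a
$\psi_1 \in \mathrm{atlas}_{\gamma(t)}(M)$ and $\delta_1 \in \mathbb{R}^+$ such that
$\gamma(B_{t,\delta_1}) \subseteq \mathrm{Dom}(\psi_1)$ and
$\mathrm{length}(\psi_1 \circ \gamma|_{B_{t,\delta_1}}) < \infty$. Let $U_0 = \mathrm{Dom}(\psi_0)$ and
$U_1 = \mathrm{Dom}(\psi_1)$. Then $\gamma^{-1}(U_0 \cap U_1) \in \mathrm{Top}_t(I)$ because
$U_0 \cap U_1 \in \mathrm{Top}(M)$ and $\gamma : I \to M$ is continuous. So
$B_{t,\delta_0} \cap I \subseteq \gamma^{-1}(U_0 \cap U_1)$ for some $\delta_0 \in \mathbb{R}^+$ by the
definition of the relative topology on $I$ in $\mathbb{R}$. Then
$\gamma(B_{t,\delta_0}) \subseteq U_0 \cap U_1$ for such $\delta_0$.
Let $\delta_2 = \min(\delta_0, \delta_1)$. Then $\gamma(B_{t,\delta_2}) \subseteq U_0 \cap U_1$ and
$\mathrm{length}(\psi_1 \circ \gamma|_{B_{t,\delta_2}}) \le
\mathrm{length}(\psi_1 \circ \gamma|_{B_{t,\delta_1}}) < \infty$ because
$\psi_1 \circ \gamma|_{B_{t,\delta_2}}$ is a sub-curve of $\psi_1 \circ \gamma|_{B_{t,\delta_1}}$ in
$\mathbb{R}^n$, where $n = \dim(M)$.


Let $V_0 = \mathrm{Range}(\psi_0)$ and $V_1 = \mathrm{Range}(\psi_1)$. By Definition 50.6.10, there are
$K \in \mathbb{R}^+_0$ and $r \in \mathbb{R}^+$ such that


    $\forall x, y \in V_1 \cap B_{\psi_1(p),r},\quad
     |\psi_0(\psi_1^{-1}(x)) - \psi_0(\psi_1^{-1}(y))| \le K|x - y|$,


where $p = \gamma(t)$. For such $r$, there is a $\delta_3 \in \mathbb{R}^+$ such that
$\psi_1(\gamma(B_{t,\delta_3})) \subseteq B_{\psi_1(p),r}$ because $\psi_1 \circ \gamma$ is continuous
and $B_{\psi_1(p),r} \in \mathrm{Top}_{\psi_1(p)}(\mathbb{R}^n)$. Let $\delta_4 = \min(\delta_2, \delta_3)$.
Then $\gamma_1 = \psi_1 \circ \gamma|_{B_{t,\delta_4}}$ is a rectifiable curve in $B_{\psi_1(p),r}$
and $\gamma_0 = \psi_0 \circ \gamma|_{B_{t,\delta_4}}$ is a curve in
$\psi_0(\psi_1^{-1}(B_{\psi_1(p),r}))$, which is an open subset of $V_0$. To show that $\gamma_0$ has
finite length, let $s = (s_i)_{i=0}^m$ be a finite non-decreasing sequence of parameter values in the real
interval $\mathrm{Dom}(\gamma_0) = I \cap B_{t,\delta_4}$. Then


    $\sum_{i=0}^{m-1} |\gamma_0(s_{i+1}) - \gamma_0(s_i)|
        = \sum_{i=0}^{m-1} |\psi_0(\psi_1^{-1}(\gamma_1(s_{i+1}))) - \psi_0(\psi_1^{-1}(\gamma_1(s_i)))|$
        $\le K \sum_{i=0}^{m-1} |\gamma_1(s_{i+1}) - \gamma_1(s_i)|$
        $\le K\, \mathrm{length}(\gamma_1)$.


Therefore $\mathrm{length}(\gamma_0) \le K\, \mathrm{length}(\gamma_1)
\le K\, \mathrm{length}(\psi_1 \circ \gamma|_{B_{t,\delta_1}}) < \infty$ by Definition 38.8.2. This verifies
line (50.7.1).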
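
The key inequality of the proof, that a $K$-Lipschitz transition map multiplies polygonal length by at most $K$, can be tested on a concrete case. This is a sketch under assumptions: the transition map is the two-dimensional map of Example 50.6.5 with $\beta = 0.5$ (so $K = 1 + |\beta|$), the curve is a sampled unit circle, and the names `transition` and `polygonal_length` are hypothetical.

```python
import math

BETA = 0.5
K = 1.0 + abs(BETA)   # Lipschitz constant of the transition map below

def transition(p):
    """(x1, x2) -> (x1 - BETA*|x1|, x2), a K-Lipschitz map of R^2."""
    return (p[0] - BETA * abs(p[0]), p[1])

def polygonal_length(points):
    """Sum of Euclidean distances between consecutive sample points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Sample a curve gamma_1 (a circle crossing the kink x1 = 0) at parameters s_i.
params = [2.0 * math.pi * i / 200 for i in range(201)]
gamma_1 = [(math.cos(s), math.sin(s)) for s in params]
gamma_0 = [transition(p) for p in gamma_1]

len_1 = polygonal_length(gamma_1)
len_0 = polygonal_length(gamma_0)
```

Since the bound holds term by term for every partition, it passes to the supremum over partitions, which is how the finiteness of the length transfers from one chart to the other.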


50.7.8 REMARK: Applicability of local rectifiability of curves in Lipschitz manifolds. 

If the parameter interval $I$ in Definition 50.7.6 is compact, the cover of $I$ by open balls
$B_{t,\delta} \subseteq \mathbb{R}$ may be replaced by a finite cover. Then one could define finite-length
curves in an atlas-independent manner, although the length would clearly depend on the choice of charts and
the choice of neighbourhoods. However, this is not relevant to the intended range of applications of the
local rectifiability concept in manifolds. It is the differentiability almost everywhere which is most
important because this permits velocity to be combined with a connection almost everywhere, which then
determines a system of differential equations (when viewed via the charts), which can be solved to compute
parallel transport. If the manifold is only locally Lipschitz, not $C^1$, then a meaningful tangent space may
be available only in some almost-everywhere sense. But if tangent spaces are in fact suitable for defining
parallel transport, the best class of curves for such transport would be locally rectifiable curves. In the
case of $C^1$ manifolds, suitable tangent spaces are available at all points, and parallel transport is then
straightforward to define in a chart-independent manner.


51.1. Overview of the differential layer


51.1.1 REMARK: The differential layer is between the topological layer and the connection layer. 

Chapter 51 commences the "differential layer" of differential geometry. (The structural layers of differential 
geometry are outlined in Section 1.1.) Only differentiable structure is defined in this layer. No connection 
or metric is defined in the differential layer. So concepts such as parallelism and distance are meaningless 
here. The differential layer is presented in Chapters 51 to 66. 


Figure 51.1.1 illustrates the relations between the various kinds of manifolds according to the kinds of 
structures which are defined on them. 


[Figure: tree diagram with the following levels, from top to bottom.
    topological manifold
    differentiable manifold
    affinely connected manifold
    Riemannian manifold; pseudo-Riemannian manifold]

Figure 51.1.1 Family tree of manifolds according to structures defined on them


51.1.2 REMARK: A differentiable manifold has no “real geometry”.

A differentiable manifold is locally the same as a Cartesian space, but only concepts which retain their 
meaning under local diffeomorphisms of the point-space are meaningful within differentiable manifolds. For 
example, the concept of a straight line is meaningless, but the tangent vector to a curve is well defined 
because both the vector and the curve undergo “covarying” transformations which are associated with local 
diffeomorphisms. 

A differentiable manifold does not in fact have any “geometry” in the sense of distances, angles, parallelism,
and so forth. It is a purely analytical structure. In the absence of definitions of parallelism (a connection)
and distance (a metric), differentiable manifold structure supports only differential and integral calculus.
The real geometry begins when parallelism (Chapter 67) or distance (Chapter 73) is added to the
differentiable manifold structure. Nevertheless, a surprisingly large proportion of the basic machinery of
differential
geometry may be developed without parallelism or distance. Most importantly, tangent bundles may be 
defined, and all of differential geometry is built on top of these. 


51.1.3 REMARK: Distribution of differentiable manifolds topics amongst the chapters. 
Differentiable manifold topics are presented in the following chapters. 


chapter                                                 topics

51. Differentiable manifolds                            differentiable manifolds and real-valued functions
52. Differentiable manifold maps and products           differentiable manifold curves, maps, products
53. Philosophy of tangent bundles                       the tangent vector representation question
54. Tangent bundles                                     tangent vectors, tangent operators
55. Covector bundles and frame bundles                  covectors and frame bundles
56. Tensor bundles and multilinear map bundles          mixed tensors and multilinear bundles
57. Vector fields, tensor fields, differential forms    vector and tensor fields, differential forms
58. Differentials of functions and maps                 differentials and induced maps
59. Recursive tangent bundles and differentials         tangent bundles of tangent bundles
60. Higher-order tangent operators                      higher-order tangent vectors and operators
61. Vector field calculus                               Lie bracket; Lie derivatives; exterior derivative

62. Lie groups                                          includes Lie algebras
63. Lie transformation groups                           includes groups of diffeomorphisms

64. Differentiable fibre bundles                        locally product-structured manifolds
65. Differentiable vector bundles                       based on linear-space manifold fibre spaces
66. Differentiable principal bundles                    based on Lie-group fibre spaces


Chapters 62-66 provide the foundational structures for extending parallelism and curvature concepts from 
tangent bundles to general differentiable fibre bundles. 


51.1.4 REMARK: Vectors are well defined in a differentiable manifold. 

Vectors specify direction at each point of a set. Vectors make possible the determination of the rates of change 
of functions in each direction. There is no diffeomorphism-invariant one-to-one correspondence between the 
vectors at different points of a differentiable manifold. Even if two vectors at different points seem to have 
the same direction for a particular choice of coordinates, the directions will generally be different for other 
orientations of the charts. This general idea is very roughly illustrated in Figure 51.1.2. 


Figure 51.1.2 Layer 2: Charts and vectors 


The vectors at each point of a differentiable manifold are dissociated from vectors at other points. There is no
structure to hold the affine transformations between tangent spaces at different points fixed when the space 
is subjected to a diffeomorphism. Precisely such a structure is in fact provided by an affine connection in 
Chapter 71. An affine connection defines parallel transport along paths linking the points of a differentiable 
manifold so that the tangent spaces at distant points may be related to each other. 


51.2. Terminology and definition choices 


51.2.1 REMARK: The motivation for using $C^k$ regularity classes.

The central concept of the differential layer of a manifold is the tangent bundle, which is defined in
Chapter 54. The reason for using regularity classes such as $C^k$ rather than $k$-times differentiability is
to simplify the definitions of derivatives. For instance, the directional derivative
$\lim_{a \to 0} (f(x + av) - f(x))/a$ for $v \in \mathbb{R}^n$ may be expressed as
$\sum_{i=1}^n v^i\, \partial f(x)/\partial x^i$ if $f$ is $C^1$. So the derivatives of $C^1$ functions are
fully described by vectors $v = (v^i)_{i=1}^n$ in an $n$-dimensional linear space of tangent vectors at each
point. The directional derivatives do not always have such a simple structure if the function is merely
partially or directionally differentiable everywhere. (See Examples 41.4.6 and 41.4.8.)
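
The contrast can be seen in a small numerical experiment. This sketch is only an illustration: the two test functions are standard textbook choices assumed here and are not claimed to be those of Examples 41.4.6 and 41.4.8, and the names `directional`, `partials`, `smooth` and `rough` are hypothetical. It compares the difference quotient along $v$ with the linear combination of partial derivatives.

```python
import math

def directional(f, x, v, a=1e-6):
    """Difference quotient approximating lim (f(x + a v) - f(x)) / a."""
    xa = [xi + a * vi for xi, vi in zip(x, v)]
    return (f(xa) - f(x)) / a

def partials(f, x, a=1e-6):
    """Approximate partial derivatives of f at x, one coordinate direction at a time."""
    n = len(x)
    return [directional(f, x, [1.0 if j == i else 0.0 for j in range(n)], a)
            for i in range(n)]

def smooth(p):
    """A C^1 function on R^2."""
    return math.sin(p[0]) * p[1] + p[0] ** 2

def rough(p):
    """Directionally differentiable at the origin, but not C^1 there."""
    x, y = p
    return 0.0 if x == 0.0 and y == 0.0 else x * x * y / (x * x + y * y)

v = [1.0, 1.0]
x = [0.3, -0.7]
origin = [0.0, 0.0]

lhs_smooth = directional(smooth, x, v)
rhs_smooth = sum(vi * di for vi, di in zip(v, partials(smooth, x)))
lhs_rough = directional(rough, origin, v)
rhs_rough = sum(vi * di for vi, di in zip(v, partials(rough, origin)))
```

For the $C^1$ function the two sides agree up to discretisation error. For the second function the directional derivative along $(1, 1)$ at the origin is $1/2$ although both partial derivatives vanish there, so the directional derivative is not a linear function of $v$.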


51.2.2 REMARK: The benefits of a small atlas. 

For computational applications, it is desirable to define a manifold in terms of a small atlas, preferably finite, 
rather than a maximal or complete atlas. The desire to have a manifold definition which is independent of a 
particular small or minimal atlas should be resisted because the completion of any atlas is highly dependent 
on the level of regularity specified. A maximal atlas discards possibly valuable information, such as, for 
example, a varying level of regularity in different regions of the manifold. Therefore a $C^k$ differentiable
structure is defined in this book as a $C^k$ differentiable atlas, not a maximal $C^k$ differentiable atlas.
As is hinted at in Remark 49.7.8, the tangent bundle is the true “philosophically correct” differentiable
structure underlying a $C^1$ manifold, in the same sense that a topological space is the true topological
structure underlying a $C^0$ manifold. However, this is not widely asserted in the literature, and fibre
bundles have various complexities which are a distraction when initially defining differentiable manifolds.


51.2.3 REMARK: The disadvantage of defining manifolds to always be very smooth. 

Some authors use the term “differentiable manifold” to mean a $C^\infty$ manifold. This is an unfortunate
practice. When a reader refers to a book for particular results, it is very easy to make serious errors by
thinking that the assumptions or assertions of a theorem are weaker than they really are. It is an unnecessary
practice, given that the term “$C^\infty$” is easier to write and has the same number of syllables to
pronounce as “differentiable”.
So no effort is economised and real harm can be done in terms of wasted time and energy for the reader. 
Redefining standard terms to mean something else is nearly always a nuisance for the reader. (A similar 
practice is the habit of using the term “map” to mean a continuous map. This is dangerous if the reader is 
not reading a book from cover to cover.) 


Neither the $C^1$ nor $C^\infty$ interpretations of the word “differentiable” agree with the standard
elementary calculus definition, which is a weaker notion than $C^1$. Here the $C^1$ interpretation is often
used for the term “differentiable manifold” in this book, but only when precision is unimportant.


51.2.4 REMARK: The loss of knowledge and generality when smoothness is always assumed. 

Defining all manifolds to be $C^\infty$ has the disadvantage that large numbers of theorems are “lost” in the
sense that one never needs to determine the minimal order of differentiability to make each assertion valid. 
The knowledge of the level of differentiability required for each theorem gives a valuable insight into the 
relations between concepts. But insight is not the only thing lost. Weakening theorems by always requiring 
smoothness makes it impossible to validly apply them to many situations in analysis where only very limited 
differentiability is available. On this subject, Lee [24], page 14, wrote the following. 


    Throughout this book, all our manifolds are assumed to be smooth, Hausdorff, and second countable;
    and smooth always means $C^\infty$, or infinitely differentiable. As in most parts of differential
    geometry, the theory still works under weaker differentiability assumptions, but such considerations
    are usually relevant only when treating questions of hard analysis that are beyond our scope.


The problem with this approach is that when applications to “hard analysis” are in fact desired, all of the
theorems and definitions must be rewritten. It is better to develop the theory from the beginning with 
minimum differentiability requirements rather than having to do it all again when the need arises. 


On this topic, Drechsler/Mayer [262], page 13, wrote the following.


    Most definitions are easily modified to accommodate $C^r$-differentiability, with finite $r$, but the
    statements of many results become more involved, and care is required not to exceed the order
    of differentiability. Since in quantum field theory we deal with distributions, the $C^\infty$ assumption
    represents no loss of generality, and where necessary (e.g., near sources of fields) we will remove
    the set where a function ceases to be smooth, possibly at the cost of ruining the connectivity of the
    manifold under consideration. This will be seen to have a certain usefulness.


It is very true that when the minimum differentiability level is given for theorems and definitions, “many
results become more involved”, and this book amply demonstrates this. In many practical applications, this
implies much work for no benefit. But in a book such as this, the greater understanding which arises from
determining the minimum differentiability which makes each theorem and definition valid is a significant
benefit. This quotation also seems to support the assertion in Remark 51.2.2 that the differentiability of
manifolds could be variable. (For an example of a differential geometry book which states theorems with
minimum $C^k$ classes, see Sulanke/Wintgen [40].)


51.2.5 REMARK: The difference between a $C^0$ manifold and a topological manifold.

As mentioned in Remark 51.3.18, a $C^0$ manifold (as in Definition 51.3.8) is the same as a topological
manifold (as in Definition 50.1.1) combined with a particular $C^0$ atlas. In other words, the $C^0$
attribute does not constrain the underlying topological space in any way because the word “manifold” requires
it to have locally Cartesian structure already. No new requirement is added.


By contrast, as mentioned in Remark 64.1.4, a $C^0$ fibre bundle (as in Definition 64.8.3) is a much more
specific structure than a topological fibre bundle (as in Definition 47.6.5). The topological spaces which are
structural components of a topological fibre bundle are extremely general by comparison with $C^0$ fibre
bundles. The notation “$C^k$” indicates that a locally Cartesian structure is present for any
$k \in \mathbb{Z}^+_0$. The special case “$C^0$” still has this locally Cartesian structure, although with
0-level differentiability.


Thus the attribute “$C^0$” is effectively a kind of shorthand for “topological” in the case of manifolds,
whereas in the case of fibre bundles, it has a much more specific meaning. This is because the term
“topological fibre bundle”, like the term “topological space”, has no connotations of locally Cartesian
structure.


51.3. Differentiable manifolds 


51.3.1 REMARK: Building a differentiable manifold on top of a topological manifold. 

A differentiable manifold can be specified in many ways. 

(1) Add a differentiable atlas to a locally Cartesian topological space. 

(2) Add a differentiable atlas to a set. 

(3) Create a differentiable manifold as a patchwork of Cartesian patches. 

As stated in Remark 49.2.4, approach (1) seems to be best overall, considering the many arguments for and
against. It is also apparently the most popular in the literature, although some authors adopt approach (2). 


For approach (1), first differentiable atlases must be defined for locally Cartesian spaces. Then differentiable 
manifolds are defined as the aggregate of a topological manifold and a differentiable atlas. This is opposite to 
the presentation order for locally Cartesian topological spaces, where the topological structure can be defined 
first (as in Definition 49.4.7) and atlases can be defined afterwards to fit the topology (as in Definition 49.7.3). 


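51.3.2 DEFINITION: A $C^k$ (differentiable) atlas for a locally Cartesian space $M$, for $k \in \bar{\mathbb{Z}}_0^+$, is a set $A_M$
which satisfies the following conditions, where $n = \dim(M) \in \mathbb{Z}_0^+$.


(i) $\forall \psi \in A_M,\ \exists U \in \mathrm{Top}(M),\ \exists \Omega \in \mathrm{Top}(\mathbb{R}^n),\ \psi : U \to \Omega$ is a homeomorphism.
(ii) $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$.
(iii) $\forall \psi_1, \psi_2 \in A_M,\ \psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^k$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$.


An indexed $C^k$ (differentiable) atlas for a locally Cartesian space $M$ is a family $(\psi_\alpha)_{\alpha \in I}$ such that $\{\psi_\alpha;\ \alpha \in I\}$
is a $C^k$ atlas for $M$.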
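

As an illustration of conditions (i) to (iii) of Definition 51.3.2, the following sketch works through a two-chart atlas on the unit circle. The space $S^1$, the charts $\psi_1$, $\psi_2$ and the angle parameter $\theta$ are assumptions introduced for this example only.

```latex
\begin{gather*}
% Illustrative example (not from the source text): two angle charts on the unit circle.
M = S^1 = \{(\cos\theta, \sin\theta);\ \theta \in \mathbb{R}\} \subseteq \mathbb{R}^2,
\qquad n = \dim(M) = 1. \\
% Each chart omits one point of the circle.
\psi_1 : U_1 = S^1 \setminus \{(1,0)\} \to \Omega_1 = (0, 2\pi),
\qquad \psi_1(\cos\theta, \sin\theta) = \theta \text{ for } \theta \in (0, 2\pi), \\
\psi_2 : U_2 = S^1 \setminus \{(-1,0)\} \to \Omega_2 = (-\pi, \pi),
\qquad \psi_2(\cos\theta, \sin\theta) = \theta \text{ for } \theta \in (-\pi, \pi). \\
% Condition (ii): the chart domains cover M.
U_1 \cup U_2 = S^1. \\
% Condition (iii): the transition map on the image of the overlap.
\psi_1(U_1 \cap U_2) = (0, \pi) \cup (\pi, 2\pi),
\qquad
\psi_2 \circ \psi_1^{-1}(t) =
\begin{cases}
  t        & \text{if } t \in (0, \pi), \\
  t - 2\pi & \text{if } t \in (\pi, 2\pi).
\end{cases}
\end{gather*}
```

On each component of the open set $\psi_1(U_1 \cap U_2)$, the transition map is a translation, so it is $C^k$ for every $k$. The same holds for $\psi_1 \circ \psi_2^{-1}$, and the maps $\psi_1 \circ \psi_1^{-1}$ and $\psi_2 \circ \psi_2^{-1}$ are identity maps. So under these assumptions, $\{\psi_1, \psi_2\}$ is a $C^k$ atlas for the circle with its usual topology.
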


51.3.3 REMARK: Comparison of differentiable and topological locally Cartesian atlases.

Definition 51.3.2 satisfies the requirements of Definition 49.7.3 for a continuous atlas for a locally Cartesian 
topological space. The difference between the two definitions lies entirely in condition (iii) of Definition 51.3.2. 
Since condition (iii) is satisfied for $k = 0$ by any continuous atlas for a locally Cartesian space, it follows
that a $C^0$ atlas for a locally Cartesian space is exactly the same thing as a continuous atlas. This is stated
(in unnecessary formality) as Theorem 51.3.4. 


51.3.4 THEOREM: A $C^k$ atlas is equivalent to a continuous atlas with $C^k$ transition maps.

A $C^k$ differentiable atlas for a locally Cartesian space $(M, T_M)$, for $k \in \bar{\mathbb{Z}}_0^+$, is the same thing as a continuous
atlas $A_M$ for $(M, T_M)$ such that $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^k$ for all $\psi_1, \psi_2 \in A_M$, where
$U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$.


PROOF: Let $A_M$ be a $C^k$ differentiable atlas for a locally Cartesian space $M -\!\!< (M, T_M)$ for some $k \in \bar{\mathbb{Z}}_0^+$.
Let $n = \dim(M)$. Then $n \in \mathbb{Z}_0^+$. For any $\psi \in A_M$, $\psi$ is a continuous chart for $M$ by Definitions 51.3.2 (i)
and 49.6.2, and $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$ by Definition 51.3.2 (ii). Therefore $A_M$ is a continuous atlas for
$M$ by Definition 49.7.3. Hence by Definition 51.3.2 (iii), $A_M$ is a continuous atlas for $(M, T_M)$ such that
$\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^k$ for all $\psi_1, \psi_2 \in A_M$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$.

For the converse, suppose that $A_M$ is a continuous atlas for a locally Cartesian space $(M, T_M)$ such that
$\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^k$ for all $\psi_1, \psi_2 \in A_M$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$. Then
$A_M$ satisfies Definition 51.3.2 by Definitions 49.6.2 and 49.7.3.


51.3.5 REMARK: The relative advantages of chart-set atlases and chart-family atlases.

Although an atlas is principally specified as a set of charts $A_M$ in Definition 51.3.2, an atlas is often formalised
as a family $(\psi_\alpha)_{\alpha \in I}$ such that $A_M = \{\psi_\alpha;\ \alpha \in I\}$. So both forms are given here. Each form has its own
advantages according to context. The conversion between these two styles of atlases is usually handled
informally in the literature. The indices are often added or dropped without comment!


51.3.6 REMARK: Differentiable locally Cartesian space and topological manifold atlases.

Since topological manifolds (in Definition 50.1.1) are a sub-class of locally Cartesian topological spaces (in
Definition 49.4.7), Definition 51.3.2 suffices to define $C^k$ differentiable atlases for both Definitions 51.3.7
and 51.3.8. In both definitions, $M -\!\!< (M, T_M)$ is a locally Cartesian space, which is a special kind of
topological space. Therefore the tuple $(M, A_M)$ in each case has the form $((M, T_M), A_M)$, where $T_M$ is a
locally Cartesian topology on the set $M$. (In Definition 51.3.8, this topology is required to be Hausdorff.)
The topology $T_M$ contains superfluous information because it can be regenerated from the atlas $A_M$, but
even the set $M = \bigcup \{\mathrm{Dom}(\psi);\ \psi \in A_M\}$ can be regenerated from $A_M$. So all of the real information is
in the atlas. Reconstructing the topology from the atlas every time it is needed is tedious. Therefore it is
convenient to make it available in the standard specification tuple.


51.3.7 DEFINITION: A $C^k$ (differentiable) locally Cartesian space, for $k \in \bar{\mathbb{Z}}_0^+$, is a tuple $((M, T_M), A_M)$
such that $(M, T_M)$ is a locally Cartesian space and $A_M$ is a $C^k$ differentiable atlas for $M$.


51.3.8 DEFINITION: A $C^k$ (differentiable) manifold, for $k \in \bar{\mathbb{Z}}_0^+$, is a tuple $((M, T_M), A_M)$ such that
$(M, T_M)$ is a topological manifold and $A_M$ is a $C^k$ differentiable atlas for $M$.


51.3.9 REMARK: The point set of a manifold is conventionally an abbreviation for the manifold. 

Since differentiable locally Cartesian spaces and manifolds $((M, T_M), A_M)$ are “nested structures”, $M$ may
have three meanings. Thus $M$ may be an abbreviation for $(M, A_M)$, which is in turn an abbreviation for
$((M, T_M), A_M)$, where $T_M$ is the topology on the set $M$, and $A_M$ is an atlas for the topological space $(M, T_M)$.
In terms of the “chicken-foot” notation “$-\!\!<$” which is mentioned in Remarks 1.5.7 and 8.8.4, one may write


    $M -\!\!< (M, A_M) -\!\!< ((M, T_M), A_M)$,

but one may also write


    $M -\!\!< (M, T_M) -\!\!< ((M, T_M), A_M)$.


Ultimately it does not matter which abbreviation is used because they all refer to the same standard triple,
which is $((M, T_M), A_M)$, at least in this book. As discussed at some length in Remark 49.2.4, the topology
is always assumed to be present if the atlas is present. The atlas always induces the correct topology by
means of a simple construction. Most importantly, $T_M$ is always implied as part of the tuple $(M, A_M)$.


The abbreviation which is intended can be clarified, when necessary, by using the phrase “the set $M$” to
emphasise that no topology or atlas is attached, the phrase “the topological space $M$” or “the topological
manifold $M$” to mean the pair $(M, T_M)$, and the phrase “the differentiable manifold $M$” to mean the nested
pair $((M, T_M), A_M)$. It is almost always possible, with a sufficiently sympathetic attitude, to determine
which meaning is intended.


One convenient aspect of the inclusion of both the topology and an atlas in the nested specification tuple
$((M, T_M), A_M)$ for a differentiable locally Cartesian space $M$ is that one may use the notations $\mathrm{Top}(M)$ and
$\mathrm{atlas}(M)$ to refer to the topology and atlas respectively, and these operations on the nested tuple are well
defined by the set-theoretic formulas in Theorem 9.2.11 (i, iv).


51.3.10 REMARK: Notation for the implicit atlas for a manifold in a particular context.

The atlas $A_M$ is often implicit unless more than one atlas is under consideration in a single context. When
a unique choice of atlas is clearly implied by context, Notation 51.3.11 is sometimes helpful.


51.3.11 NOTATION: $\mathrm{atlas}(M)$ denotes the implicit atlas $A_M$ for a differentiable locally Cartesian space or
manifold $((M, T_M), A_M)$.


$\mathrm{atlas}_p(M)$ denotes the subset $\{\psi \in \mathrm{atlas}(M);\ p \in \mathrm{Dom}(\psi)\}$ of charts $\psi$ in the implied atlas $A_M$ whose
domains contain a given point $p \in M$.


$A_{M,p}$ is an alternative notation for $\mathrm{atlas}_p(M)$ when $A_M$ denotes an atlas.


51.3.12 REMARK: The underlying topology of a differentiable manifold is locally Cartesian.


Definitions 51.3.13 and 51.3.14 give names to the topological space and topology underlying a given tuple
$((M, T_M), A_M)$ according to Definitions 51.3.7 and 51.3.8.


51.3.13 DEFINITION: The underlying topological space of a differentiable locally Cartesian space
$M -\!\!< ((M, T_M), A_M)$ is the topological space $(M, T_M)$.


The underlying topology of a differentiable locally Cartesian space $M -\!\!< ((M, T_M), A_M)$ is the topology $T_M$.


51.3.14 DEFINITION: The underlying topological space of a differentiable manifold $M -\!\!< ((M, T_M), A_M)$ is
the topological space $(M, T_M)$.


The underlying topology of a differentiable manifold $M -\!\!< ((M, T_M), A_M)$ is the topology $T_M$.


51.3.15 REMARK: Dimension of a differentiable manifold. 

Although Notation 50.1.5 for the dimension of a topological manifold is a duplicate of Notation 49.4.11 for a 
locally Cartesian space, Notation 51.3.17 is not exactly a duplicate of those two notations. A differentiable 
manifold is a different class of structure to a locally Cartesian space. So it needs its own definition. 


51.3.16 DEFINITION: The dimension of a differentiable manifold $M -\!\!< ((M, T_M), A_M)$ is the dimension of
the underlying locally Cartesian topological space $(M, T_M)$.


51.3.17 NOTATION: dim(M) denotes the dimension of a differentiable manifold M. 


51.3.18 REMARK: A $C^0$ manifold is a topological manifold combined with a particular $C^0$ atlas.

A $C^0$ differentiable manifold is a differentiable manifold structure with zero differentiability. Therefore
a $C^0$ differentiable manifold is not, strictly speaking, a differentiable manifold. The case $k = 0$ for $C^k$
differentiability is included for notational convenience, but also to specify a particular choice of atlas. The
term “$C^0$ (differentiable) manifold” means a tuple $((M, T_M), A_M)$ such that $A_M$ is a $C^0$ manifold atlas for the
underlying topological space $(M, T_M)$, whereas the underlying topological space $(M, T_M)$ is a “topological
manifold” as in Section 50.1.

Thus a “topological manifold” is a Hausdorff locally Cartesian topological space $(M, T_M)$ where only the
topology $T_M$ is specified for the set $M$. Such a topological space has at least one topological manifold
atlas $A_M$, and it typically has infinitely many such topological manifold atlases, all of which are topologically
compatible with each other and with the topology $T_M$. If one chooses such an atlas $A_M$, the $C^0$ manifold
$((M, T_M), A_M)$ contains all of the information of the topological space $(M, T_M)$. However, it is not
“equi-informational”. The particular choice of atlas contains additional information. For example, the chosen
atlas may specify a $C^k$ differentiable structure on $M$ for some $k \in \mathbb{Z}^+$. Therefore a $C^0$ differentiable
manifold is not exactly the same thing as a topological manifold.


51.3.19 THEOREM: A $C^k$ manifold is a $C^\ell$ manifold for all $\ell \le k$.
Let $(M, A_M)$ be a $C^k$ manifold for $k \in \bar{\mathbb{Z}}_0^+$. Then $(M, A_M)$ is a $C^\ell$ manifold for all $\ell \in \bar{\mathbb{Z}}_0^+$ with $\ell \le k$.


PROOF: Let $k \in \bar{\mathbb{Z}}_0^+$, and let $\ell \in \bar{\mathbb{Z}}_0^+$ satisfy $\ell \le k$. Let $(M, A_M)$ be a $C^k$ manifold. Then $A_M$ is a $C^k$
manifold atlas for $M$ by Definition 51.3.8. So $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is a $C^k$ transition map
for all $\psi_1, \psi_2 \in A_M$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$, by Definition 51.3.2 (iii). By Definition 42.7.2 for
$C^k$ Cartesian space diffeomorphisms, it follows that $\psi_2 \circ \psi_1^{-1}$ is of class $C^\ell$ for all $\psi_1, \psi_2 \in A_M$. Therefore
$A_M$ is a $C^\ell$ manifold atlas for $M$. Hence $(M, A_M)$ is a $C^\ell$ manifold.


51.4. Compatible charts and maximal atlases 


51.4.1 REMARK: Compatible charts and atlases for differentiable manifolds. 

Given a differentiable manifold $M -\!\!< ((M, T_M), A_M)$, a $C^k$ compatible chart for $M$ is a topological chart for
the underlying locally Cartesian space or topological manifold $(M, T_M)$ which is $C^k$-compatible with all of
the charts in $A_M$. (See Definition 49.6.2 for continuous charts on locally Cartesian spaces.)


51.4.2 DEFINITION: A $C^k$ compatible chart for a $C^k$ locally Cartesian space or manifold $((M, T_M), A_M)$,
for $k \in \bar{\mathbb{Z}}_0^+$, is a continuous chart $\psi$ on $(M, T_M)$ such that $A_M \cup \{\psi\}$ is a $C^k$ atlas for the underlying
topological space $(M, T_M)$.


51.4.3 THEOREM: A $C^k$ atlas remains a $C^k$ atlas after the addition of $C^k$ compatible charts.
Let $M -\!\!< ((M, T_M), A_M)$ be a $C^k$ locally Cartesian space or manifold for some $k \in \bar{\mathbb{Z}}_0^+$. Let $A$ be a set of
$C^k$ compatible charts for $M$. Then $A_M \cup A$ is a $C^k$ atlas on $(M, T_M)$.


PROOF: Let $M -\!\!< ((M, T_M), A_M)$ be a $C^k$ manifold for some $k \in \bar{\mathbb{Z}}_0^+$. Let $A' = A_M \cup A$, where $A$ is a set
of $C^k$ compatible charts for $M$. Then $\psi$ is a $C^k$ compatible chart on $M$ for all $\psi \in A'$ because all charts in
$A_M$ are $C^k$ compatible charts on $M$ by Definition 51.4.2. So every $\psi \in A'$ is a continuous chart on $M$ by
Definition 51.4.2. Therefore $A'$ satisfies Definition 51.3.2 (i).


Since $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$ by Definition 51.3.2 (ii), it follows that $\bigcup_{\psi \in A'} \mathrm{Dom}(\psi) = M$ because $A_M \subseteq A'$.
Therefore $A'$ satisfies Definition 51.3.2 (ii).


Let $\psi_1, \psi_2 \in A'$. Then $\psi_1$ and $\psi_2$ are continuous charts on $M$, and both $A_M \cup \{\psi_1\}$ and $A_M \cup \{\psi_2\}$
are $C^k$ atlases for $M$ by Definition 51.4.2. Let $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$. Let $p \in U_1 \cap U_2$. Then
$p \in U_0 = \mathrm{Dom}(\psi_0)$ for some $\psi_0 \in A_M$ by Definition 51.3.2 (ii). So it follows from Definition 51.3.2 (iii) that
$\psi_1 \circ \psi_0^{-1} : \psi_0(U_0 \cap U_1) \to \psi_1(U_0 \cap U_1)$ is a $C^k$ diffeomorphism because $A_M \cup \{\psi_1\}$ is a $C^k$ atlas on $M$, and
$\psi_2 \circ \psi_0^{-1} : \psi_0(U_0 \cap U_2) \to \psi_2(U_0 \cap U_2)$ is a $C^k$ diffeomorphism because $A_M \cup \{\psi_2\}$ is a $C^k$ atlas on $M$.
Therefore $\psi_2 \circ \psi_1^{-1}\big|_{\psi_1(U_0 \cap U_1 \cap U_2)} = (\psi_2 \circ \psi_0^{-1}) \circ (\psi_1 \circ \psi_0^{-1})^{-1} : \psi_1(U_0 \cap U_1 \cap U_2) \to \psi_2(U_0 \cap U_1 \cap U_2)$ is a $C^k$
diffeomorphism. Thus $\psi_2 \circ \psi_1^{-1}$ is a $C^k$ diffeomorphism in an open neighbourhood $\psi_1(U_0 \cap U_1 \cap U_2)$ of each point
$\psi_1(p)$ in $\psi_1(U_1 \cap U_2)$. In other words, $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is a $C^k$ diffeomorphism. Therefore
$A'$ satisfies Definition 51.3.2 (iii). Hence $A'$ is a $C^k$ atlas on $(M, T_M)$.


51.4.4 REMARK: Maximal atlases, subject to a differentiability criterion. 

A $C^k$ maximal atlas can be constructed for a $C^k$ manifold $(M, A_M)$ as the set of all $C^k$ compatible charts
for $(M, A_M)$ according to Definition 51.4.2. This $C^k$ maximal atlas is sometimes called the “$C^k$ differentiable
structure” on $M$. Maximal or “complete” atlases are not very useful in practice. For example,
Theorem 51.3.19 would not be valid if differentiable manifolds were required to have maximal atlases. A $C^k$
maximal atlas could also be referred to as a “$C^k$-maximal differentiable structure”, but it is clearer to call
it a $C^k$-maximal atlas.
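
The failure of Theorem 51.3.19 under a maximality requirement can be seen in a one-dimensional sketch. The manifold $\mathbb{R}$ with the atlas $\{\mathrm{id}_{\mathbb{R}}\}$ and the chart $\phi$ below are assumptions chosen for illustration.

```latex
\begin{gather*}
% Illustrative example (not from the source text).
M = \mathbb{R}, \qquad A_M = \{\mathrm{id}_{\mathbb{R}}\}, \qquad
\phi : \mathbb{R} \to \mathbb{R}, \quad \phi(x) = x + x|x|. \\
% First derivative: continuous and strictly positive, so \phi is a C^1 diffeomorphism of \mathbb{R}.
\phi'(x) = 1 + 2|x| > 0. \\
% Second derivative: jumps at x = 0, so \phi is not C^2.
\phi''(x) = 2\,\mathrm{sgn}(x) \quad \text{for } x \neq 0.
\end{gather*}
```

So $\phi$ is $C^1$ compatible with $A_M$ but not $C^2$ compatible. The maximal $C^2$ atlas of $(M, A_M)$ therefore omits $\phi$, whereas the maximal $C^1$ atlas contains it. If every manifold had to carry a maximal atlas, this $C^2$ manifold with its maximal $C^2$ atlas would fail to be a $C^1$ manifold, because its atlas would not be maximal among $C^1$ atlases.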


There is a basic difference between the maximal atlases for differentiable manifolds in Definition 51.4.5
and Notation 51.4.7 as opposed to the maximal atlases for topological manifolds in Definition 49.7.18 and
Notation 49.7.20. In the differentiable manifold case, a maximal atlas must be compatible with a given
atlas (which implies compatibility with the underlying topology), but in the topological manifold case, a
maximal atlas must be compatible with the given topology. However, the maximal atlas notation $\mathrm{atlas}^0(M)$
does denote the same set of charts as the corresponding topological manifold notation $\mathrm{atlas}(M)$ because by
Definitions 51.3.2 and 51.3.8, every $C^0$ atlas for $M$ must be topologically compatible with $M$. But this is
true as a theorem, not as a definition.


51.4.5 DEFINITION: The maximal $C^k$ atlas for a $C^k$ differentiable manifold $M$, for $k \in \bar{\mathbb{Z}}_0^+$, is the set of
all locally Cartesian charts on $M$ which are $C^k$ compatible with $\mathrm{atlas}(M)$.


A maximal $C^k$ atlas is also known as a complete $C^k$ atlas.


51.4.6 REMARK: Notation for the atlas of all $C^k$ charts for a $C^k$ manifold.

It is convenient to introduce the non-standard Notation 51.4.7 for the atlas of all $C^k$ differentiable charts
which are $C^k$ compatible with a given $C^k$ manifold, in other words for the maximal atlas in Definition 51.4.5.
This atlas is fully determined by $\mathrm{atlas}(M)$. Theorem 51.4.8 shows that charts which are $C^k$ compatible with
a given atlas are also $C^k$ compatible with each other. Therefore the $C^k$ maximal atlas is a $C^k$ atlas.


51.4.7 NOTATION: $\mathrm{atlas}^k(M)$, for a $C^k$ manifold $M$ with $k \in \bar{\mathbb{Z}}_0^+$, denotes the maximal $C^k$ atlas for $M$.


$\mathrm{atlas}^k_p(M)$ denotes the set of charts in $\mathrm{atlas}^k(M)$ whose domain contains a given point $p \in M$.
In other words, $\mathrm{atlas}^k_p(M) = \{\psi \in \mathrm{atlas}^k(M);\ p \in \mathrm{Dom}(\psi)\}$.


51.4.8 THEOREM: The maximal $C^k$ atlas of a $C^k$ manifold is a $C^k$ atlas.
Let $M$ be a $C^k$ manifold for some $k \in \bar{\mathbb{Z}}_0^+$. Then $\mathrm{atlas}^k(M)$ is a $C^k$ atlas on $M$.


PROOF: Let $M -\!\!< ((M, T_M), A_M)$ be a $C^k$ manifold for some $k \in \bar{\mathbb{Z}}_0^+$. Let $A = \mathrm{atlas}^k(M)$. Let $\psi \in A$.
Then $\psi$ is a locally Cartesian chart on $M$ which is $C^k$ compatible with $A_M$ by Definition 51.4.5. So $A_M \cup A$
is a $C^k$ atlas on $M$ by Theorem 51.4.3. But $A_M \subseteq A$ because all charts in $A_M$ are compatible with $M$ by
Definition 51.4.2. So $A = A_M \cup A$ is a $C^k$ atlas on $M$.


51.4.9 REMARK: Definition styles for equivalent atlases and compatible charts. 

In Definition 51.4.2, a $C^k$ compatible chart is characterised as a chart $\psi$ such that $A_M \cup \{\psi\}$ is an atlas,
which enforces $C^k$ compatibility between $\psi$ and every chart in the given atlas $A_M$. An alternative style of
definition would be to require explicit individual $C^k$ compatibility between $\psi$ and each chart in the given
atlas $A_M$. The former style of definition could be thought of as “blending with the crowd”, while the latter
style could be thought of as “agreement with all individuals”. The practical consequences of the choice of
style can be seen in proofs of theorems such as Theorem 51.4.8.


In the case of Definition 51.4.10, a hybrid approach is adopted. Two atlases $A_M^1$ and $A_M^2$ on a given locally
Cartesian space $M$ could be said to be $C^k$ equivalent if their union $A_M^1 \cup A_M^2$ is a valid $C^k$ atlas. This
is a kind of “blending of two crowds” into a single crowd. The opposite approach would be to say that the
two “crowds” are equivalent if every individual in one crowd agrees with every individual in the other
crowd. (In other words, $\psi_1$ and $\psi_2$ are $C^k$ compatible for all $\psi_1 \in A_M^1$ and $\psi_2 \in A_M^2$.) The approach taken
in Definition 51.4.10 is to require each individual in each “crowd” to “blend” with the other “crowd”. The
equivalence of the approaches is easily proved, as for example in Theorem 51.4.11.


51.4.10 DEFINITION: $C^k$ equivalent atlases on a locally Cartesian space $M$ are $C^k$ atlases $A_M^1$ and $A_M^2$
for $M$ such that every chart in $A_M^1$ is a $C^k$ compatible chart for $(M, A_M^2)$ and every chart in $A_M^2$ is a $C^k$
compatible chart for $(M, A_M^1)$.


51.4.11 THEOREM: Two $C^k$ atlases are equivalent if and only if their union is a $C^k$ atlas.
Let $A_M^1$ and $A_M^2$ be $C^k$ atlases for a locally Cartesian space $M$. Then $A_M^1$ and $A_M^2$ are $C^k$ equivalent atlases
on $M$ if and only if $A_M^1 \cup A_M^2$ is a $C^k$ atlas for $M$.


PROOF: Let $A_M^1$ and $A_M^2$ be $C^k$ atlases for a locally Cartesian space $M$. Suppose that $A_M^1$ and $A_M^2$ are
$C^k$ equivalent atlases on $M$. Then $A_M^1 \cup A_M^2$ is a $C^k$ atlas for $M$ by Theorem 51.4.3 because every chart in
$A_M^2$ is a $C^k$ compatible chart for $(M, A_M^1)$ by Definition 51.4.10. Now suppose that $A_M^1 \cup A_M^2$ is a $C^k$ atlas
for $M$. Then by Definition 51.4.2, every chart in $A_M^1$ is a $C^k$ compatible chart for $(M, A_M^2)$ and every chart
in $A_M^2$ is a $C^k$ compatible chart for $(M, A_M^1)$. Therefore $A_M^1$ and $A_M^2$ are $C^k$ equivalent atlases on $M$ by
Definition 51.4.10.
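

A standard illustration of Theorem 51.4.11 is a pair of atlases on $\mathbb{R}$ which are individually smooth but not equivalent. The atlases $A_M^1$, $A_M^2$ and the chart $x \mapsto x^3$ below are assumptions chosen for this sketch, and $t^{1/3}$ denotes the real cube root.

```latex
\begin{gather*}
% Illustrative example (not from the source text).
M = \mathbb{R}, \qquad A_M^1 = \{\psi_1\}, \quad \psi_1(x) = x, \qquad
A_M^2 = \{\psi_2\}, \quad \psi_2(x) = x^3. \\
% Each atlas has a single chart, so its only transition map is the identity on \mathbb{R}.
\psi_1 \circ \psi_1^{-1} = \psi_2 \circ \psi_2^{-1} = \mathrm{id}_{\mathbb{R}}. \\
% Transition maps of the union A_M^1 \cup A_M^2.
\psi_2 \circ \psi_1^{-1}(t) = t^3, \qquad \psi_1 \circ \psi_2^{-1}(t) = t^{1/3}. \\
% The difference quotient of the cube root at t = 0 is unbounded.
\lim_{t \to 0} \frac{t^{1/3} - 0}{t} = \lim_{t \to 0} t^{-2/3} = +\infty.
\end{gather*}
```

Both atlases are $C^k$ atlases for every $k$, and their union is a $C^0$ atlas, but the union is not a $C^1$ atlas because the cube root is not differentiable at $0$. By Theorem 51.4.11, the two atlases are therefore $C^0$ equivalent but not $C^k$ equivalent for any $k \ge 1$.
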


51.4.12 THEOREM: Some very basic properties of atlas compatibility, equivalence and completion.
Let $k \in \bar{\mathbb{Z}}_0^+$. Let $M$ be a $C^k$ manifold.


(i) Every chart in $\mathrm{atlas}(M)$ is $C^k$ compatible with $\mathrm{atlas}(M)$.
(ii) $\mathrm{atlas}(M)$ and $\mathrm{atlas}(M)$ are $C^k$ equivalent atlases on $M$.
(iii) $\mathrm{atlas}(M) \subseteq \mathrm{atlas}^k(M)$.
(iv) $\mathrm{atlas}(M)$ and $\mathrm{atlas}^k(M)$ are $C^k$ equivalent atlases on $M$.


PROOF: For part (i), let $\psi \in \mathrm{atlas}(M)$. Then $\mathrm{atlas}(M) \cup \{\psi\} = \mathrm{atlas}(M)$ is an atlas for $M$. So $\psi$ is a $C^k$
compatible chart for $M$ by Definition 51.4.2. Hence every chart in $\mathrm{atlas}(M)$ is $C^k$ compatible with $\mathrm{atlas}(M)$.


Part (ii) follows from part (i) and Definition 51.4.10.
Part (iii) follows from part (i) and Definition 51.4.5.


Part (iv) follows from part (iii) and Theorem 51.4.11 because $\mathrm{atlas}(M) \cup \mathrm{atlas}^k(M) = \mathrm{atlas}^k(M)$, and
$\mathrm{atlas}^k(M)$ is an atlas on $M$ by Theorem 51.4.8.


51.4.13 THEOREM: A covering set of $C^k$ compatible charts is a $C^k$ equivalent atlas.
Let $k \in \bar{\mathbb{Z}}_0^+$. Let $M -\!\!< (M, A_1)$ be a $C^k$ manifold. Let $A_2 \subseteq \mathrm{atlas}^k(M)$ satisfy $\bigcup_{\psi_2 \in A_2} \mathrm{Dom}(\psi_2) = M$. Then
$A_2$ is a $C^k$ atlas for $M$ which is $C^k$ equivalent to $A_1$.


PROOF: Let $A_2 \subseteq \mathrm{atlas}^k(M)$ satisfy $\bigcup_{\psi_2 \in A_2} \mathrm{Dom}(\psi_2) = M$. Then by Definition 51.4.5, every $\psi_2 \in A_2$
is $C^k$ compatible with $A_1$. So by Theorem 51.4.3, $A_1 \cup A_2$ is a $C^k$ atlas on $M$. Therefore $A_2$ satisfies
Definition 51.3.2 conditions (i) and (iii). But $A_2$ also satisfies condition (ii) because the charts of $A_2$ cover $M$.
Hence $A_2$ is a $C^k$ atlas for $M$ by Definition 51.3.2, and $A_2$ is $C^k$ equivalent to $A_1$ by Theorem 51.4.11.


51.4.14 REMARK: Restriction of a differentiable manifold to an open subset.

If $(M, A_M)$ is a $C^k$ manifold for some $k \in \bar{\mathbb{Z}}_0^+$, the restricted manifold $(\Omega, A_\Omega)$ in Definition 51.4.15 is
clearly $C^k$, and the topology induced on $\Omega$ is clearly the relative topology on $\Omega$ from $M$. (Note that the
definition is applicable and valid for $C^0$ manifolds in particular.)


The exclusion of empty restricted charts $\psi|_\Omega$ in Definition 51.4.15 would be arbitrary and unnecessary. In
fact, the restriction of a $C^k$ manifold with a maximal atlas would not yield a maximal atlas on the restricted
space if the empty chart were excluded.

The restriction of a differentiable manifold which has an indexed atlas does not remove duplicate restricted
charts. (If the index set is well ordered, it would be straightforward to make choices of which charts to keep.)


Theorem 51.4.16 is fairly obvious, and its proof contains no surprises. The key point is that an open subset
of an open subset of $\mathbb{R}^n$ is itself an open subset of $\mathbb{R}^n$. This is why the theorem requires $\Omega$ to be open.


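51.4.15 DEFINITION: The restriction of a differentiable manifold $M -\!\!< (M, A_M)$ to an open subset $\Omega$ of $M$ is
the pair $(\Omega, A_\Omega)$, where $A_\Omega = \{(\alpha, \psi_\alpha|_\Omega);\ \alpha \in I\} = \{(\alpha, \psi|_\Omega);\ (\alpha, \psi) \in A_M\}$ in the case of an indexed atlas
$A_M = (\psi_\alpha)_{\alpha \in I}$, and $A_\Omega = \{\psi|_\Omega;\ \psi \in A_M\}$ in the case of a non-indexed atlas $A_M$.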
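

The following sketch applies Definition 51.4.15 to a small non-indexed atlas and shows how an empty restricted chart can arise. The manifold $\mathbb{R}$, the charts $\psi_1$, $\psi_2$ and the subset $\Omega$ are assumptions chosen for illustration.

```latex
\begin{gather*}
% Illustrative example (not from the source text).
M = \mathbb{R}, \qquad A_M = \{\psi_1, \psi_2\}, \qquad
\psi_1 = \mathrm{id}_{(-\infty, 1)}, \quad \psi_2 = \mathrm{id}_{(0, \infty)}, \qquad
\Omega = (2, 3). \\
% Restricting a chart to \Omega intersects its domain with \Omega.
\mathrm{Dom}(\psi_1|_\Omega) = (-\infty, 1) \cap (2, 3) = \emptyset, \qquad
\mathrm{Dom}(\psi_2|_\Omega) = (0, \infty) \cap (2, 3) = (2, 3). \\
% The restricted atlas keeps the empty chart.
A_\Omega = \{\psi_1|_\Omega,\ \psi_2|_\Omega\} = \{\emptyset,\ \mathrm{id}_{(2,3)}\}.
\end{gather*}
```

The empty chart is harmless for conditions (i) to (iii) of Definition 51.3.2, since its domain and range are both the empty open set and every transition map involving it is the empty function.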


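51.4.16 THEOREM: Every restriction of a $C^k$ manifold to an open subset is a $C^k$ manifold.
Let $k \in \bar{\mathbb{Z}}_0^+$. Let $(M, A_M)$ be a $C^k$ manifold. Let $\Omega \in \mathrm{Top}(M)$. Let $(\Omega, A_\Omega)$ be the restriction of $(M, A_M)$
to $\Omega$. Then $(\Omega, A_\Omega)$ is a $C^k$ manifold.


PROOF: By Definition 51.3.8, $A_M$ is a $C^k$ atlas for the topological manifold $M$. Let $A_M$ be non-indexed.
Then $A_\Omega = \{\psi|_\Omega;\ \psi \in A_M\}$ by Definition 51.4.15. Let $\psi \in A_M$. Then by Definition 51.3.2 (i), $\psi : U \to V$ is a
homeomorphism for some $U \in \mathrm{Top}(M)$ and $V \in \mathrm{Top}(\mathbb{R}^n)$, where $n = \dim(M)$. So $\psi|_\Omega : U \cap \Omega \to V \cap \psi(\Omega)$ is
a homeomorphism by Theorem 31.14.14, where $U \cap \Omega \in \mathrm{Top}(\Omega)$ and $V \cap \psi(\Omega) \in \mathrm{Top}(\mathbb{R}^n)$ by Definition 31.6.2
and Theorem 31.6.6. Thus $A_\Omega$ satisfies Definition 51.3.2 (i).

By Definition 51.3.2 (ii), $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$. Therefore $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi|_\Omega) = \bigcup_{\psi \in A_M} (\mathrm{Dom}(\psi) \cap \Omega) =
\bigl(\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi)\bigr) \cap \Omega = M \cap \Omega = \Omega$ by Theorem 8.4.8 (iv). In other words, $A_\Omega$ covers $\Omega$. So $A_\Omega$ satisfies
Definition 51.3.2 (ii).

Let $\tilde\psi_1, \tilde\psi_2 \in A_\Omega$. Then $\tilde\psi_\alpha = \psi_\alpha|_\Omega$ for some $\psi_\alpha \in A_M$, for $\alpha = 1, 2$. Let $\tilde U_\alpha = \mathrm{Dom}(\tilde\psi_\alpha)$ for $\alpha = 1, 2$.
Then $\tilde U_\alpha = U_\alpha \cap \Omega$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$, for $\alpha = 1, 2$. So $\mathrm{Dom}(\tilde\psi_2 \circ \tilde\psi_1^{-1}) = \tilde\psi_1(\mathrm{Dom}(\tilde\psi_1) \cap \mathrm{Dom}(\tilde\psi_2)) =
\tilde\psi_1(\tilde U_1 \cap \tilde U_2)$ and $\mathrm{Range}(\tilde\psi_2 \circ \tilde\psi_1^{-1}) = \tilde\psi_2(\mathrm{Dom}(\tilde\psi_1) \cap \mathrm{Dom}(\tilde\psi_2)) = \tilde\psi_2(\tilde U_1 \cap \tilde U_2)$ by Theorem 10.10.13 (ix, x),
and then $\tilde\psi_2 \circ \tilde\psi_1^{-1} = (\psi_2|_{\tilde U_1 \cap \tilde U_2}) \circ (\psi_1|_{\tilde U_1 \cap \tilde U_2})^{-1} = (\psi_2|_{\tilde U_1 \cap \tilde U_2}) \circ (\psi_1^{-1}|_{\psi_1(\tilde U_1 \cap \tilde U_2)}) = \psi_2 \circ \psi_1^{-1}\big|_{\psi_1(\tilde U_1 \cap \tilde U_2)}$. However,
$\psi_1(\tilde U_1 \cap \tilde U_2) \in \mathrm{Top}(\mathbb{R}^n)$, and $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^k$ by Definition 51.3.2 (iii). Therefore
$\tilde\psi_2 \circ \tilde\psi_1^{-1} : \tilde\psi_1(\tilde U_1 \cap \tilde U_2) \to \tilde\psi_2(\tilde U_1 \cap \tilde U_2)$ is $C^k$ for all $\tilde\psi_1, \tilde\psi_2 \in A_\Omega$. So $A_\Omega$ satisfies Definition 51.3.2 (iii). Hence
$(\Omega, A_\Omega)$ is a $C^k$ manifold.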


51.4.17 REMARK: Chart transition matrix notation.

Definition 51.4.18 gives a name and notation for the chart transition matrices which are implicit in the chart
transition differentiability condition in Definition 51.3.2 (iii).


51.4.18 DEFINITION: The chart transition matrix or Jacobian matrix from chart $\psi_\alpha$ to chart $\psi_\beta$ at a point
$p \in \mathrm{Dom}(\psi_\alpha) \cap \mathrm{Dom}(\psi_\beta)$, for $\psi_\alpha$ and $\psi_\beta$ in an indexed atlas $(\psi_\alpha)_{\alpha \in I}$ for a $C^1$ locally Cartesian space or
manifold $M$, is the invertible $n \times n$ matrix $J_{\beta\alpha}(p) \in GL(n)$, with $n = \dim(M)$, which is defined by


    $\forall i, j \in \mathbb{N}_n,\quad J_{\beta\alpha}(p)^i{}_j = \frac{\partial}{\partial x^j} \bigl(\psi_\beta^i \circ \psi_\alpha^{-1}(x)\bigr) \Big|_{x = \psi_\alpha(p)}$.    (51.4.1)


The chart transition matrix map for $\psi_\alpha, \psi_\beta \in \mathrm{atlas}(M)$, for a $C^1$ locally Cartesian space or manifold $M$, is
the map $J_{\beta\alpha} : \mathrm{Dom}(\psi_\alpha) \cap \mathrm{Dom}(\psi_\beta) \to GL(n)$.


51.4.19 REMARK: Chart transition matrix notation alternatives. 

In the case of an indexed atlas $(\psi_\alpha)_{\alpha \in I}$, the indices $\alpha$ and $\beta$ for $J_{\beta\alpha}$ are elements of $I$. If the atlas is not
indexed, it may be assumed to be “self-indexed”, meaning that each element of the atlas is indexed by itself.
(See Remark 10.8.6 for self-indexing of families of sets and functions.)


Equation (51.4.1) may be written more succinctly as follows.


    $\forall \alpha, \beta \in I,\ \forall i, j \in \mathbb{N}_n,\quad J_{\beta\alpha}(p)^i{}_j = \partial_j(\psi_\beta^i \circ \psi_\alpha^{-1})(\psi_\alpha(p))$    (51.4.2)
    $\phantom{\forall \alpha, \beta \in I,\ \forall i, j \in \mathbb{N}_n,\quad J_{\beta\alpha}(p)^i{}_j} = \frac{\partial \psi_\beta^i}{\partial \psi_\alpha^j} \Big|_p$.    (51.4.3)


Line (51.4.2) is an accurate representation of equation (51.4.1), whereas line (51.4.3) is merely a mnemonic
shorthand which may or may not be useful. (See also Remark 54.1.12 for this mnemonic notation.)
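

To make the definition of the chart transition matrix concrete, the following sketch computes one for a Cartesian chart and a polar chart on the open right half-plane. The manifold $M$, the charts $\psi_\alpha$, $\psi_\beta$ and the coordinate names $(x, y)$ and $(r, \theta)$ are assumptions for this example.

```latex
\begin{gather*}
% Illustrative example (not from the source text).
M = \{(x, y) \in \mathbb{R}^2;\ x > 0\}, \qquad
\psi_\alpha = \mathrm{id}_M, \qquad
\psi_\beta(x, y) = \bigl(\sqrt{x^2 + y^2},\ \arctan(y/x)\bigr) = (r, \theta). \\
% Since \psi_\alpha is the identity, \psi_\beta \circ \psi_\alpha^{-1} = \psi_\beta and \psi_\alpha(p) = p.
J_{\beta\alpha}(p) =
\begin{pmatrix}
  \partial r / \partial x & \partial r / \partial y \\
  \partial\theta / \partial x & \partial\theta / \partial y
\end{pmatrix}
=
\begin{pmatrix}
  x / r & y / r \\
  -y / r^2 & x / r^2
\end{pmatrix}
\quad \text{at } p = (x, y). \\
% The determinant is non-zero on M, so J_{\beta\alpha}(p) \in GL(2).
\det J_{\beta\alpha}(p) = \frac{x^2 + y^2}{r^3} = \frac{1}{r} \neq 0.
\end{gather*}
```

The row index $i$ selects the component of the target chart $\psi_\beta$ and the column index $j$ selects the coordinate of the source chart $\psi_\alpha$, which matches the from/to order of the subscripts in $J_{\beta\alpha}$.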


The order of $\alpha$ and $\beta$ is chosen in the same from/to order as for fibre bundles in Definitions 47.6.5 (v)
and 64.8.3 (v). This may be thought of (mnemonically) as a kind of chronological or causal order.


51.4.20 REMARK: Finite-dimensional linear spaces as differentiable manifolds. 

It often happens that a general kind of definition which has differentiable manifolds as spaces within a 
structure must be specialised to the special case of a finite-dimensional linear space. (For example, the fibre 
space of a differentiable fibre bundle is specialised to a linear space in Definition 65.1.3.) Then it is necessary 
to have some way to fit linear spaces into the general framework of differentiable manifolds. 


A standard locally Cartesian space atlas is given in Definition 49.7.14 for finite-dimensional linear spaces,
consisting of all component maps for the linear space. In the special case of the Cartesian tuple space $\mathbb{R}^n$ in
Definition 51.4.22, it is possible to choose a single chart for the atlas, which induces the same differentiable
structure as if all possible component maps had been used. From the point of view of differentiable manifold
structure, it is immaterial whether one or all charts are used. (In the case of an abstract linear space, it
is not possible to choose a priori a single basis since all bases are “equal”, and there are infinitely many of
them if the dimension is non-zero.)


It is obvious without formal proof that the differentiable manifolds in Definitions 51.4.21 and 51.4.22 are $C^\infty$.
In fact, they are clearly analytic.
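
The reason these structures are $C^\infty$, and in fact analytic, is that the transition maps between component maps are linear. The following is a sketch, assuming that $\kappa_B : F \to \mathbb{R}^n$ denotes the component map for a basis $B = (e_j)_{j=1}^n$ of $F$ and that $P$ is the change-of-basis matrix; both names are assumptions for this illustration.

```latex
\begin{gather*}
% Two bases B = (e_j) and B' = (e'_i) of F, related by an invertible matrix P.
e_j = \sum_{i=1}^n P^i{}_j\, e'_i, \qquad P \in GL(n). \\
% A vector with components x relative to B has components Px relative to B'.
v = \sum_{j=1}^n x^j e_j = \sum_{i=1}^n \Bigl(\sum_{j=1}^n P^i{}_j\, x^j\Bigr) e'_i. \\
% Hence the transition map between the two charts is the linear map x \mapsto Px.
\kappa_{B'} \circ \kappa_B^{-1}(x) = P x \quad \text{for all } x \in \mathbb{R}^n.
\end{gather*}
```

A linear map on $\mathbb{R}^n$ is a polynomial of degree one in the coordinates, so it is analytic, and its inverse $x \mapsto P^{-1} x$ is of the same kind. For $F = \mathbb{R}^n$ with the single chart $\mathrm{id}_{\mathbb{R}^n}$, the only transition map is the identity.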


51.4.21 DEFINITION: The differentiable manifold (structure) for a finite-dimensional linear space $F$ is the
differentiable manifold $(F, A_F) -\!\!< ((F, T_F), A_F)$, where $A_F = \{\kappa_B;\ B \text{ is a basis for } F\}$ is the standard atlas
for $F$ in Definition 49.7.14, and $T_F$ is the standard topology for $F$ in Definition 32.6.6.


51.4.22 DEFINITION: The Cartesian differentiable manifold with dimension $n \in \mathbb{Z}_0^+$ is the differentiable
manifold $(\mathbb{R}^n, A_{\mathbb{R}^n})$, where $A_{\mathbb{R}^n} = \{\mathrm{id}_{\mathbb{R}^n}\}$ is the standard atlas for $\mathbb{R}^n$ as in Definition 49.7.12.


51.4.23 REMARK: Standard differentiable manifold structure for general linear groups.

Analogous to the standard differentiable structure in Definition 51.4.21 for finite-dimensional linear spaces is 
the standard differentiable structure in Definition 51.4.24 for the corresponding general linear groups. Such 
structure is required for Lie groups, which require differentiable structure in Definition 62.2.2, and also for 
the Lie transformation groups in Definition 63.4.2, which are used in Definition 64.8.3 for the specification 
of differentiable fibre bundles. 


51.4.24 DEFINITION: The differentiable manifold (structure) for the general linear group $GL(F)$ of a
finite-dimensional linear space $F$ is the differentiable manifold $(GL(F), A_{GL(F)}) -\!\!< ((GL(F), T_{GL(F)}), A_{GL(F)})$,
where $A_{GL(F)} = \{\kappa_{B,B};\ B \text{ is a basis for } F\}$ is the standard atlas for $GL(F)$ in Definition 49.7.15, and $T_{GL(F)}$
is the standard topology for $GL(F)$ in Definition 32.6.8.


51.4.25 REMARK: Topological classes of differentiable manifolds.

Whenever a topological property such as “compact”, “connected”, “second countable”, and so forth, is 
applied in reference to a differentiable manifold, the property is understood to refer to the underlying 
topological space. 


51.5. Differentiable locally Cartesian atlases for sets 


51.5.1 REMARK: Two ways to define a differentiable locally Cartesian atlas. 

As mentioned in Remark 49.8.1, an atlas may be derived from a given locally Cartesian topological space, 
or the topological space may be derived from a locally Cartesian atlas on a bare set. The former approach 
is followed in Section 51.3. The latter approach is followed in Section 51.5. Thus in Definitions 51.3.7 
and 51.3.8, a continuous atlas for a given locally Cartesian topological space is required to have a specified level 
of differentiability, whereas in Definition 51.5.2, a differentiable locally Cartesian atlas is defined on a pure 
set. From such an atlas, a locally Cartesian topology may be "induced" onto the set as in Definition 51.5.8. 


As discussed in Remark 49.2.4, most authors prefer to define differentiable atlases in reference to a given 
locally Cartesian topological space as in Definition 51.3.2. (See Table 49.2.3.) However, there are arguments 
in favour of defining an atlas as an independent structure “from scratch" as in Definition 51.5.2, and some 
authors do define manifolds in this way. (In particular, Synge/Schild [41], pages 3-4; Malliavin [28], page 17; 
Do Carmo [9], page 2; Darling [8], pages 98-99; Lang [23], page 22.)


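51.5.2 DEFINITION: A $C^k$ (differentiable) locally Cartesian atlas for a set $M$, for $k \in \bar{\mathbb{Z}}_0^+$, is a set $A_M$
which satisfies the following conditions for some $n \in \mathbb{Z}_0^+$.
(i) $\forall \psi \in A_M,\ \exists U \in \mathbb{P}(M),\ \exists \Omega \in \mathrm{Top}(\mathbb{R}^n),\ \psi : U \to \Omega$ is a bijection.
(ii) $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$.
(iii) $\forall \psi_1, \psi_2 \in A_M,\ \psi_1(\mathrm{Dom}(\psi_2)) \in \mathrm{Top}(\mathbb{R}^n)$.
(iv) $\forall \psi_1, \psi_2 \in A_M,\ \psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^k$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$.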


51.5.3 REMARK: Interpretation of conditions for a differentiable locally Cartesian atlas. 

The spaces and maps in Definition 51.5.2 are illustrated in Figure 49.8.1 for Remark 49.8.3. The comments in 
Remark 49.8.3 about Definition 49.8.2 for a continuous locally Cartesian atlas apply also to Definition 51.5.2 
for a differentiable locally Cartesian atlas. Those comments are adapted and expanded here in this remark. 


The conditions of Definition 51.5.2 mean that a $C^k$ atlas on a set $M$ is a set $A_M$ of bijections from arbitrary
subsets of $M$ to open subsets of $\mathbb{R}^n$, such that the transition map $\psi_2 \circ \psi_1^{-1}$ is of regularity class $C^k$
for all $\psi_1, \psi_2 \in A_M$, and the domains of the bijections in $A_M$ cover all of $M$. (See Definition 42.5.11
and Notation 42.5.10 for $C^k$ differentiable Cartesian space maps.) The transition maps are necessarily $C^k$
diffeomorphisms because the $C^k$ differentiability applies in both directions. (See Definition 42.7.2 for $C^k$
diffeomorphisms.)


Condition (iii) of Definition 51.5.2 ensures that the patches of the atlas are sewn together correctly. The
set $\psi_1(\mathrm{Dom}(\psi_2))$ equals the image $\psi_1(U_1 \cap U_2)$ under $\psi_1$ of the patch-domain intersection $U_1 \cap U_2$. The
requirement that $\psi_1(U_1 \cap U_2)$ and $\psi_2(U_1 \cap U_2)$ be open subsets of $\mathbb{R}^n$ ensures that the $C^k$ differentiability
test in condition (iv) may be carried out for functions $\psi_2 \circ \psi_1^{-1}$ and $\psi_1 \circ \psi_2^{-1}$ on open sets, but this is not
the principal motivation for condition (iii). The main motivation is to ensure that the definition matches
the standard concept of a differentiable manifold. Example 51.5.4 gives some idea of what could go wrong
without condition (iii).

The differentiability class $C^k$ in condition (iv) is interpreted in terms of the standard differentiable structure
on the range-sets $\psi_1(U_1 \cap U_2)$ and $\psi_2(U_1 \cap U_2)$. Since these are open subsets of $\mathbb{R}^n$ by condition (iii), the
points in these range-sets are surrounded by neighbourhoods within $\mathbb{R}^n$. So the $C^k$ criterion is identical to
the criterion for points in $\mathbb{R}^n$.


51.5.4 EXAMPLE: Consequence of not enforcing topological openness of images of patch intersections. 
Example 49.8.4 describes a “non-manifold” which complies with all of the conditions of Definition 49.8.2 
for a continuous locally Cartesian atlas on a set except for condition (iii). That example is adapted here 
for Definition 51.5.2 for a differentiable manifold, with additional observations which arise because of the 
differentiability criterion. 

Let $M = \{x \in \mathbb{R}^3;\ x_3 = 0 \text{ or } x_2 = 0\}$ and $A_M = \{\psi_1, \psi_2\}$, where $\psi_1 : U_1 = \{x \in \mathbb{R}^3;\ x_3 = 0\} \to \Omega_1 = \mathbb{R}^2$
is defined by $\psi_1 : x \mapsto (x_1, x_2)$, and $\psi_2 : U_2 = \{x \in \mathbb{R}^3;\ x_2 = 0\} \to \Omega_2 = \mathbb{R}^2$ is defined by $\psi_2 : x \mapsto (x_1, x_3)$.
Then the pair $(M, A_M)$ complies with Definition 51.5.2 except for condition (iii). Clearly conditions (i)
and (ii) are satisfied by $A_M$. To verify condition (iv), note that $U_1 \cap U_2 = \{x \in \mathbb{R}^3;\ x_3 = 0 \text{ and } x_2 = 0\}$,
and $\psi_1(U_1 \cap U_2) = \psi_2(U_1 \cap U_2) = \{x \in \mathbb{R}^2;\ x_2 = 0\}$, and $\psi_2 \circ \psi_1^{-1} : x \mapsto (x_1, 0)$ and $\psi_1 \circ \psi_2^{-1} : x \mapsto (x_1, 0)$.
It is a little difficult to interpret the $C^k$ property for such a map between closed subsets of $\mathbb{R}^2$. Certainly
when $k = 0$, one may say (without fear of contradiction) that $\psi_2 \circ \psi_1^{-1}$ and $\psi_1 \circ \psi_2^{-1}$ are $C^0$ maps. In the
cases $k \in \mathbb{Z}^+$, one could plausibly claim that these chart transition maps are $C^k$ although the domain and
range have no interior points.
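
The failure of condition (iii) in this example can be probed numerically. The following Python sketch is only an illustration, assuming the reading $U_1 = \{x_3 = 0\}$, $U_2 = \{x_2 = 0\}$ and $\psi_1 : x \mapsto (x_1, x_2)$ of the example. All function names are hypothetical and are not part of this book. It checks that the point $(1, 0)$ lies in $\psi_1(U_1 \cap U_2)$, while arbitrarily close points of $\mathbb{R}^2$ do not, so the image cannot be an open subset of $\mathbb{R}^2$.

```python
# Illustrative sketch for Example 51.5.4 (all helper names are hypothetical).
# M is the union of the planes x3 = 0 and x2 = 0 in R^3.

def in_U1(x):
    return x[2] == 0.0          # chart domain U1 = {x3 = 0}

def in_U2(x):
    return x[1] == 0.0          # chart domain U2 = {x2 = 0}

def psi1_inv(y):
    return (y[0], y[1], 0.0)    # inverse of psi1 : x -> (x1, x2) on U1

def in_image_of_overlap(y):
    """True if the point y of R^2 lies in psi1(U1 intersect U2)."""
    x = psi1_inv(y)
    return in_U1(x) and in_U2(x)

def has_small_cross_inside(y, eps=1e-3):
    """Crude openness probe: test four points at distance eps from y."""
    probes = [(y[0] + eps, y[1]), (y[0] - eps, y[1]),
              (y[0], y[1] + eps), (y[0], y[1] - eps)]
    return all(in_image_of_overlap(q) for q in probes)

y = (1.0, 0.0)
print(in_image_of_overlap(y))       # True: y is in the chart image of the overlap
print(has_small_cross_inside(y))    # False: nearby points off the axis are not
```

The probe fails for every positive value of `eps`, which matches the observation that the image of the patch intersection is a line with no interior points.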


Whether one regards the $C^k$ property as being well defined on a non-open domain or not, it seems desirable
to exclude this circumstance explicitly rather than resting upon the regularity-class definition to exclude
non-open domains in some implicit way. Therefore condition (iii) is added to Definition 51.5.2 to clarify the
situation. In simple terms, this condition requires the chart-images of intersections of chart-domains to be
open subsets of $\mathbb{R}^n$ in order to keep the chart-transition-map regularity condition tidy.


51.5.5 REMARK: Optional exclusion of empty charts from differentiable locally Cartesian atlases. 

The issues regarding empty charts are the same for differentiable manifold atlases on sets in Definition 51.5.2
as for topological manifold atlases on sets in Definition 49.8.2. To cut a long story short, permitting empty 
charts is sometimes a good thing and sometimes a bad thing. (See Remark 49.8.5.) 


51.5.6 REMARK: The topology induced by a differentiable locally Cartesian atlas on a set. 

Theorem 51.5.7 asserts that the topology induced on a set $M$ by a $C^k$ differentiable locally Cartesian atlas
$A_M$ is a valid topology. (This theorem is related to, but not directly provable from, Theorem 32.4.6 for
a family of maps targeted at general topological spaces.) The induced topology of a differentiable locally
Cartesian atlas can then be formalised in Definition 51.5.8. It is the weakest topology $T_M$ on $M$ for which
all of the maps in $A_M$ are local homeomorphisms. (Note that Definition 51.5.8 is essentially identical to
Definition 49.8.12.)


51.5.7 THEOREM: The topology induced on a set by a locally Cartesian atlas is a topology on the set. 
Let $A_M$ be a $C^k$ locally Cartesian atlas for a set $M$, for some $k \in \mathbb{Z}_0^+$. Then the set

$\{ \bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi);\ (\Omega_\psi)_{\psi \in A_M} \in \mathrm{Top}(\mathbb{R}^n)^{A_M} \}$

is a topology on $M$, where $n = \dim(M) \in \mathbb{Z}_0^+$.


PROOF: The result follows immediately from Theorem 49.8.8 (ii) because an $n$-dimensional $C^k$ differentiable
locally Cartesian atlas for a set $M$ is necessarily an $n$-dimensional continuous locally Cartesian atlas for $M$,
for any $n \in \mathbb{Z}_0^+$ and $k \in \mathbb{Z}_0^+$.


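51.5.8 DEFINITION: The topology induced by a $C^k$ locally Cartesian atlas $A_M$ on a set $M$, for $k \in \mathbb{Z}_0^+$ and
$n = \dim(M) \in \mathbb{Z}_0^+$, is the set

$T_M = \{ \bigcup_{\psi \in A_M} \psi^{-1}(\Omega_\psi);\ \forall \psi \in A_M,\ \Omega_\psi \in \mathrm{Top}(\mathbb{R}^n) \}$.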


The topological space induced by a $C^k$ locally Cartesian atlas $A_M$ on a set $M$ is the pair $(M, T_M)$.


51.5.9 REMARK: The topology induced by a locally Cartesian atlas might not be Hausdorff.

The topology $T_M$ in Definition 51.5.8 is uniquely determined by the atlas $A_M$. It is the weak topology
induced by the charts $\psi \in A_M$. If the topological space $(M, T_M)$ is not Hausdorff, then the atlas $A_M$ is
merely a $C^k$ locally Cartesian space atlas, as indicated in Definition 51.5.2. (See Section 49.4 for locally
Cartesian spaces. See Section 49.5 for non-Hausdorff locally Cartesian spaces.) But if the induced topology
is Hausdorff, the word “manifold” may be used, which implies the Hausdorff separation property by default.


Definition 51.5.10 grants the title “manifold” to $C^k$ locally Cartesian atlases on sets which induce Hausdorff
topologies on their sets.


51.5.10 DEFINITION: A $C^k$ (differentiable) manifold atlas for a set $M$, for $k \in \mathbb{Z}_0^+$, is a $C^k$ differentiable
locally Cartesian atlas for $M$ which induces a Hausdorff topology on $M$.

An indexed $C^k$ (differentiable) manifold atlas for a set $M$ is a family $(\psi_\alpha)_{\alpha \in I}$ such that $\{\psi_\alpha;\ \alpha \in I\}$ is a
$C^k$ differentiable manifold atlas for the set $M$.


51.5.11 REMARK:  Single-chart locally Cartesian spaces are manifolds. 
Theorem 51.5.12 shows that a locally Cartesian space with a single-chart atlas is necessarily Hausdorff. 


51.5.12 THEOREM: Single-chart locally Cartesian spaces are manifolds. 
Let $A_M$ be a $C^k$ locally Cartesian atlas on a set $M$ for some $k \in \mathbb{Z}_0^+$. If $A_M$ contains only one chart, then
the topology induced by $A_M$ on $M$ is a Hausdorff topology.


PROOF: Let $n \in \mathbb{Z}_0^+$ and $k \in \mathbb{Z}_0^+$. Let $M$ be a set, and let $A_M = \{\psi_0\}$ be a $C^k$ locally Cartesian atlas
on $M$ with $n = \dim(M) \in \mathbb{Z}_0^+$. Then by Definition 51.5.8, the topology induced by $A_M$ on $M$ is the set

$T_M = \{ \psi_0^{-1}(\Omega');\ \Omega' \in \mathrm{Top}(\mathbb{R}^n) \}$.

By Definitions 31.12.4 and 31.14.2, $\psi_0$ is a homeomorphism between $(M, T_M)$ and $(\Omega, \mathrm{Top}(\Omega))$, where
$\Omega = \mathrm{Range}(\psi_0)$. But the
usual topology on $\mathbb{R}^n$ is a Hausdorff topology. So the induced topology $\mathrm{Top}(\Omega)$ on $\Omega$ is Hausdorff by
Theorem 33.1.33 (iii). Therefore the topological space $(M, T_M)$ is Hausdorff.


51.5.13 REMARK:  Differentiable locally Cartesian spaces and manifolds defined in terms of atlases. 
Definitions 51.5.14 and 51.5.15 differ from Definitions 51.3.7 and 51.3.8 respectively by being derived from 
atlases on sets instead of atlases on topological spaces. Definitions 51.5.14 and 51.5.15 refer to Definitions 
51.5.2 and 51.5.10 respectively for atlases on sets. The induced topology comes from Definition 51.5.8. 


51.5.14 DEFINITION: The $C^k$ (differentiable) locally Cartesian space induced by an atlas $A_M$, where $A_M$
is a $C^k$ locally Cartesian atlas for the set $M$ for some $k \in \mathbb{Z}_0^+$, is the tuple $((M, T_M), A_M)$, where $T_M$ is the
topology induced by $A_M$ on $M$.


51.5.15 DEFINITION: The $C^k$ (differentiable) manifold induced by an atlas $A_M$, where $A_M$ is a $C^k$ manifold
atlas for the set $M$ for some $k \in \mathbb{Z}_0^+$, is the tuple $((M, T_M), A_M)$, where $T_M$ is the topology induced by $A_M$
on $M$.


51.5.16 REMARK: Building a differentiable manifold from a set versus building it on a topological space. 
Definitions 51.5.14 and 51.5.15 build differentiable locally Cartesian spaces and manifolds from sets by adding 
atlases, which then induce topologies on those sets. Definition 51.5.2 provides conditions to ensure that the 
atlas has the right properties, which in turn ensures that the topology has the right properties as shown in 
Theorem 51.5.7, which is based on the fairly complicated Theorem 49.8.8.


The alternative approach in Section 51.3, which appears to be simpler, is to define an atlas on a pre-defined 
topological space which has locally Cartesian or topological manifold properties. This does not require a 
complicated theorem such as Theorem 49.8.8 to verify that the topology has the right kinds of properties. 
The atlas-on-topological-space approach is convenient for manifolds which are specified by embeddings or 
immersions so that the topology and charts are easily specified. However, a truly intrinsic geometry does 
not inherit its structure from embeddings or immersions. An intrinsic geometry obtains its structure from 
an atlas defined on an abstract set. Thus the “atlases on sets” approach is more intrinsic than the “atlases 
on topological spaces” approach.


51.5.17 REMARK: Abstract differentiable manifold structure should reflect concrete manifold structure.
The definition of a differentiable manifold specifies that the transition maps must lie in some class of local 
transformations in the coordinate space. This use of the coordinate space to define differentiability, as 
opposed to the manifold itself, is necessary because the manifold exists outside the pure mathematical 
framework. The differentiability requirement for the transition maps reflects the fact that the manifold 
itself is supposed to have a differentiable structure. In other words, the differentiability of transition maps 
$\psi_2 \circ \psi_1^{-1}$ is only a by-product of the fact that the charts $\psi_1$ and $\psi_2$ are supposed to faithfully reflect the
native differentiability of the manifold itself. 


If we had direct access to the differentiable structure on the manifold (otherwise than through Cartesian
charts), we could state that all charts $\psi$ in the atlas must be differentiable. But the manifold’s differentiable
structure exists outside the mathematical framework. So we can only state the necessary condition that the
transition maps $\psi_2 \circ \psi_1^{-1}$ must be differentiable. There is no way for the pure mathematician to ensure that
the maps $\psi : M \to \mathbb{R}^n$ are themselves differentiable. As part of the formulation of any mathematical study,
someone must choose charts which correctly map the manifold's structure to Cartesian charts. This step in
the mathematical analysis is not often mentioned. Just because the charts are $C^k$ consistent with each other
does not necessarily imply that they are $C^k$ consistent with the native structure on the manifold. Otherwise
any old coordinatisation would be acceptable.


An atlas is only a model. It is the responsibility of the mathematician (or somebody else) to ensure that 
the model matches the system under study. This is not a purely philosophical issue. It often happens that a 
system which has singularities of various kinds can be made to look very smooth by choosing an atlas which 
has smooth transition maps. Then the false conclusions which result from such a modelling error are the 
responsibility of whoever chose the unfaithful coordinatisation. 


51.6. Differentiable real-valued functions 


51.6.1 REMARK: The differentiability of functions on manifolds. 

The continuity of a real-valued function on a topological manifold $M$ depends only on the topological
structure. It has nothing to do with the choice of atlas. By contrast, differentiability of a real-valued function
$f : M \to \mathbb{R}$ is meaningless if the only structure on the set $M$ is the topology. A single topological space can
have multiple incompatible differentiable atlases such that some functions are differentiable with respect to
some atlases but not with respect to others.
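
A standard illustration of this, which is not taken from this book, uses the real line $M = \mathbb{R}$ with the two single-chart atlases $\{p \mapsto p\}$ and $\{p \mapsto p^3\}$. The Python sketch below uses hypothetical names throughout and assumes that a symmetric difference quotient is an adequate numerical probe. It compares the function $f : p \mapsto p$ viewed through each chart. Through the identity chart the quotient at $0$ stays equal to $1$, but through the cubing chart the composite $f \circ \psi^{-1} : y \mapsto y^{1/3}$ has quotients at $0$ which grow without bound as the step shrinks.

```python
import math

def cbrt(y):
    """Real cube root, valid for negative arguments too."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

# Two single-chart atlases on the same topological space M = R.
def psi_a(p):
    return p            # identity chart

def psi_a_inv(y):
    return y

def psi_b(p):
    return p ** 3       # cubing chart, a homeomorphism of R

def psi_b_inv(y):
    return cbrt(y)

def f(p):
    return p            # test function f : M -> R

def difference_quotient(g, h):
    """Symmetric difference quotient of g at the point 0 with step h."""
    return (g(h) - g(-h)) / (2.0 * h)

for h in (1e-2, 1e-4, 1e-6):
    qa = difference_quotient(lambda y: f(psi_a_inv(y)), h)
    qb = difference_quotient(lambda y: f(psi_b_inv(y)), h)
    print(h, qa, qb)    # qa stays near 1; qb grows like h ** (-2/3)
```

Each atlas on its own is trivially differentiable because it has only one chart, so the incompatibility shows up only in the transition map between the two charts and in the functions which each atlas accepts.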


The $C^k$ differentiability of functions on manifolds is defined in terms of the same property for Cartesian
spaces. The $C^k$ differentiability of a function whose domain is an open subset of $\mathbb{R}^n$ and range is $\mathbb{R}^m$ for
some $m, n \in \mathbb{Z}_0^+$ is defined in Section 41.1. Definition 51.6.2 for $m = 1$ is illustrated in Figure 51.6.1. (See
Definition 52.1.2 for the corresponding $C^k$ maps between $C^k$ manifolds.)


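[Diagram: the manifold $M$ with open subset $\Omega$; the chart $\psi$ maps $\Omega \cap \mathrm{Dom}(\psi)$ onto $\psi(\Omega) = \psi(\Omega \cap \mathrm{Dom}(\psi)) \subseteq \mathrm{Range}(\psi) \subseteq \mathbb{R}^n$; the maps $f$ and $f \circ \psi^{-1}$ both take values in $\mathbb{R}$.]
Figure 51.6.1 Differentiability test for $f \circ \psi^{-1} : \psi(\Omega) \to \mathbb{R}$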


51.6.2 DEFINITION: A (real-valued) $C^k$ (differentiable) function on a $C^k$ manifold $M$, for $k \in \mathbb{Z}_0^+$, is a
function $f : M \to \mathbb{R}$ such that $f \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathbb{R}$ is $C^k$ for all charts $\psi \in \mathrm{atlas}(M)$. In other words,

$\forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$.


A (real-valued) $C^k$ (differentiable) function in an open subset $\Omega$ of a $C^k$ manifold $M$, for $k \in \mathbb{Z}_0^+$, is a
function $f : \Omega \to \mathbb{R}$ such that $f \circ \psi^{-1} : \psi(\Omega) \to \mathbb{R}$ is of class $C^k$ for all charts $\psi \in \mathrm{atlas}(M)$. In other words,

$\forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\psi(\Omega), \mathbb{R})$.


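51.6.3 NOTATION: $C^k(M, \mathbb{R})$, for $k \in \mathbb{Z}_0^+$ and a $C^k$ manifold $M$, denotes the set of $C^k$ functions on $M$,
with or without linear space structure for pointwise addition and real scalar multiplication operations. Thus

$\forall k \in \mathbb{Z}_0^+,\ C^k(M, \mathbb{R}) = \{ f : M \to \mathbb{R};\ \forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}) \}$.

$C^k(M)$ is an abbreviation for $C^k(M, \mathbb{R})$.

$C^k(\Omega, \mathbb{R})$, for $k \in \mathbb{Z}_0^+$ and an open subset $\Omega$ of a $C^k$ manifold $M$, denotes the set of $C^k$ functions on $\Omega$, with
or without linear space structure for pointwise addition and real scalar multiplication operations. Thus

$\forall k \in \mathbb{Z}_0^+,\ \forall \Omega \in \mathrm{Top}(M),\ C^k(\Omega, \mathbb{R}) = \{ f : \Omega \to \mathbb{R};\ \forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\psi(\Omega), \mathbb{R}) \}$.

$C^k(\Omega)$ is an abbreviation for $C^k(\Omega, \mathbb{R})$.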


51.6.4 THEOREM: The $C^k$ real functions with pointwise operations form a commutative unitary ring.
Let $M$ be a $C^k$ manifold with $k \in \mathbb{Z}_0^+$. Let $\Omega \in \mathrm{Top}(M)$. Then $C^k(\Omega, \mathbb{R})$ with the operations of pointwise
addition and multiplication is a commutative unitary ring.


PROOF: Let $M$ be a $C^k$ manifold with $k \in \mathbb{Z}_0^+$ and $n = \dim(M) \in \mathbb{Z}_0^+$. Let $\Omega \in \mathrm{Top}(M)$. The pointwise
sum $f_1 + f_2 : \Omega \to \mathbb{R}$ defined by $f_1 + f_2 : p \mapsto f_1(p) + f_2(p)$ is a well-defined function for all $f_1, f_2 \in
C^k(\Omega, \mathbb{R})$. Let $f_1, f_2 \in C^k(\Omega, \mathbb{R})$ and $\psi \in \mathrm{atlas}(M)$. Then the function $g = (f_1 + f_2) \circ \psi^{-1} : \psi(\Omega) \to \mathbb{R}$
is the pointwise sum $g = (f_1 \circ \psi^{-1}) + (f_2 \circ \psi^{-1})$ on the open subset $\psi(\Omega)$ of $\mathbb{R}^n$. So $g$ is $C^k$ on $\psi(\Omega)$
by Theorem 41.1.18 (ii) because $f_1 \circ \psi^{-1}$ and $f_2 \circ \psi^{-1}$ are $C^k$ on $\psi(\Omega)$. So $f_1 + f_2 \in C^k(\Omega, \mathbb{R})$ by
Definition 51.6.2. Therefore $C^k(\Omega, \mathbb{R})$ is algebraically closed under the operation of pointwise function
addition. Similarly, $C^k(\Omega, \mathbb{R})$ is algebraically closed under the operation of pointwise multiplication by
Theorem 41.1.18 (iii), and this operation is clearly commutative. Moreover, the function $p \mapsto 1_{\mathbb{R}}$ is a
multiplicative identity for $C^k(\Omega, \mathbb{R})$. Hence $C^k(\Omega, \mathbb{R})$ is a commutative unitary ring by Definitions 18.1.2,
18.1.17 and 18.2.14.


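51.6.5 THEOREM: Constant real-valued functions on $C^k$ manifolds are $C^k$ differentiable.
Let $k \in \mathbb{Z}_0^+$. Let $M$ be a $C^k$ manifold. Let $\Omega \in \mathrm{Top}(M)$ and $\lambda \in \mathbb{R}$. Define $f : \Omega \to \mathbb{R}$ by $f(p) = \lambda$ for all
$p \in \Omega$. Then $f \in C^k(\Omega, \mathbb{R})$.


PROOF: Let $\psi \in \mathrm{atlas}(M)$. Then $f \circ \psi^{-1}$ is a constant map from $\psi(\Omega)$ to $\mathbb{R}$. So $f \circ \psi^{-1} \in C^k(\psi(\Omega), \mathbb{R})$
by Theorem 42.2.16. Hence $f \in C^k(\Omega, \mathbb{R})$ by Notation 51.6.3.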


51.6.6 REMARK: Spaces of functions defined in a neighbourhood of a point. 
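Notations 51.6.7 and 51.6.8 are concerned with “local functions”, namely functions which are not necessarily
defined everywhere on a manifold.


51.6.7 NOTATION: $\mathring{C}^k(M, \mathbb{R})$, for $k \in \mathbb{Z}_0^+$ and a $C^k$ manifold $M$, denotes the set of all $C^k$ functions
$f : \Omega \to \mathbb{R}$ on open sets $\Omega \in \mathrm{Top}(M)$. In other words,

$\mathring{C}^k(M, \mathbb{R}) = \bigcup_{\Omega \in \mathrm{Top}(M)} C^k(\Omega, \mathbb{R})$.

$\mathring{C}^k(M)$ is an abbreviation for $\mathring{C}^k(M, \mathbb{R})$.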


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




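51.6.8 NOTATION: $\mathring{C}^k_p(M, \mathbb{R})$, for $k \in \mathbb{Z}_0^+$ and a point $p$ in a $C^k$ manifold $M$, denotes the set of all $C^k$
functions $f : \Omega \to \mathbb{R}$ on open sets $\Omega \in \mathrm{Top}_p(M)$. In other words,

$\mathring{C}^k_p(M, \mathbb{R}) = \bigcup_{\Omega \in \mathrm{Top}_p(M)} C^k(\Omega, \mathbb{R})$.

$\mathring{C}^k_p(M)$ is an abbreviation for $\mathring{C}^k_p(M, \mathbb{R})$.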


51.6.9 REMARK: Minima and maxima of real-valued functions on manifolds. 

Whether or not a point is a local extremum (i.e. minimum or maximum) for a given function on a manifold 
depends only on the underlying topology. This is seen clearly in Definition 51.6.10, which mentions the 
topology, but says nothing about charts. Theorems 51.6.12 and 51.6.13 do mention charts, but if a function 
passes the tests for one chart, it will pass for them all. In this sense, the criteria are chart-independent. 


51.6.10 DEFINITION: A local minimum of a function $u : M \to \mathbb{R}$ on a $C^0$ manifold $M$ is a point $p \in M$
which satisfies $\exists \Omega \in \mathrm{Top}_p(M),\ \forall q \in \Omega,\ u(q) \ge u(p)$.


A local maximum of a function $u : M \to \mathbb{R}$ on a $C^0$ manifold $M$ is a point $p \in M$ which satisfies
$\exists \Omega \in \mathrm{Top}_p(M),\ \forall q \in \Omega,\ u(q) \le u(p)$.


51.6.11 REMARK: Theorems about local minima and maxima of real functions on manifolds. 

Theorems 51.6.12 and 51.6.13 are closely related to other theorems which give necessary first and second 
order derivative conditions for a point to be a local minimum or maximum of a real-valued function in a 
space with a differentiable structure. The following table lists theorems for minima for various classes of 
functions. (The shorthand “$d^2 f \ge 0$” here means that the Hessian is positive semi-definite.)


$df = 0$       $d^2 f \ge 0$       class of functions
40.6.2         42.1.20 (i, ii)     $f : \mathbb{R} \to \mathbb{R}$, differentiable
41.1.19        42.4.2 (i, ii)      $f : \mathbb{R}^n \to \mathbb{R}$, partially differentiable
-              42.4.2 (iii, iv)    $f : \mathbb{R}^n \to \mathbb{R}$, directionally differentiable
-              42.4.6 (iii)        $f : \mathbb{R}^n \to \mathbb{R}$, totally differentiable
51.6.12        51.6.13 (i, ii)     $f \in C^2(M, \mathbb{R})$
51.9.5 (i)     51.9.5 (ii)         $f = u \circ \gamma$, $\gamma \in C^2(\mathbb{R}, M)$, $u \in C^2(M, \mathbb{R})$


51.6.12 THEOREM: Partial derivatives of real-valued functions equal zero at a minimum/maximum.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $u \in C^1(M)$ and let $p \in M$ be a local minimum or
local maximum of $u$. Then $\forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\ (\partial/\partial x^i)(u \circ \psi^{-1}(x))|_{x=\psi(p)} = 0$.


PROOF: Suppose that $p$ is a local minimum of $u$, and let $\psi \in \mathrm{atlas}_p(M)$. Then $\psi(p)$ is a local minimum
for $u \circ \psi^{-1}$ because $\psi$ is a homeomorphism, and so $\mathrm{Dom}(u \circ \psi^{-1}) = \mathrm{Range}(\psi) \in \mathrm{Top}_{\psi(p)}(\mathbb{R}^n)$. Therefore
$(\partial/\partial x^i)(u \circ \psi^{-1}(x))|_{x=\psi(p)} = 0$ for all $i \in \mathbb{N}_n$ by Theorem 41.1.19 (ii). The case that $p$ is a local maximum
of $u$ follows similarly from Theorem 41.1.19 (i).


51.6.13 THEOREM: Functions have semi-definite partial derivative matrices at a minimum/maximum.
Let $M$ be a $C^2$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $u \in C^2(M, \mathbb{R})$ and $p \in M$.


(i) If $p$ is a local minimum of $u$, then $[(\partial^2/\partial x^i \partial x^j)(u \circ \psi^{-1}(x))|_{x=\psi(p)}]_{i,j=1}^n$ is a positive semi-definite
matrix for all $\psi \in \mathrm{atlas}_p(M)$.

(ii) If $p$ is a local maximum of $u$, then $[(\partial^2/\partial x^i \partial x^j)(u \circ \psi^{-1}(x))|_{x=\psi(p)}]_{i,j=1}^n$ is a negative semi-definite
matrix for all $\psi \in \mathrm{atlas}_p(M)$.


PROOF: For part (i), suppose that $p$ is a local minimum of $u$, and let $\psi \in \mathrm{atlas}_p(M)$. Then $\psi(p)$ is a local
minimum for $u \circ \psi^{-1}$ because $\psi$ is a homeomorphism, and so $\mathrm{Dom}(u \circ \psi^{-1}) = \mathrm{Range}(\psi) \in \mathrm{Top}_{\psi(p)}(\mathbb{R}^n)$.
Therefore $(\partial/\partial x^i)(u \circ \psi^{-1}(x))|_{x=\psi(p)} = 0$ for all $i \in \mathbb{N}_n$, and the matrix of second-order partial derivatives
of $u \circ \psi^{-1}$ at $\psi(p)$ is positive semi-definite, by Theorems 41.6.17 and 42.4.6 (i).


Part (ii) follows similarly from Theorem 42.4.6 (ii).


51.6.14 REMARK: Second-order derivatives at a local minimum of a $C^2$ function.

Since the first-order derivatives of the function u in Theorem 51.6.12 equal zero, the second-order derivative 
matrix in Theorem 51.6.13 is tensorial. In other words, it transforms according to the first-order derivatives 
of chart transition maps only, not involving higher order derivatives of chart transition maps. 


51.6.15 REMARK: Local extrema of real-valued functions on non-Hausdorff manifolds. 

With regard to non-Hausdorff manifolds, local extremal points have some possibly surprising properties. 
Consider the “real line with two origins” in Example 49.5.5. If a function is defined equal to zero everywhere, 
except for a negative value at the lower origin, clearly the lower origin is a local minimum for the function. But 
the upper origin is also a local minimum by Definition 51.6.10, although in the closure of every neighbourhood 
of the upper origin, there is a point where the function has a smaller value. 


51.7. Differentiable vector-valued functions 


51.7.1 REMARK: Differentiable real-tuple-valued functions on manifolds.

Definition 51.7.2 and Notation 51.7.3 extend Definition 51.6.2 and Notation 51.6.3 from real-valued to
real-tuple-valued functions on differentiable manifolds. The $C^k$ differentiability of real-tuple-valued functions
viewed “through the charts” relies upon differentiability definitions for maps between Cartesian spaces. (See
Notation 42.5.10 and Definition 42.5.11 for $C^k$ differentiable Cartesian space maps.)


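51.7.2 DEFINITION: A $C^k$ (differentiable) $\mathbb{R}^m$-valued function for $k \in \mathbb{Z}_0^+$ and $m \in \mathbb{Z}_0^+$ on an open subset
$\Omega$ of a $C^k$ manifold $M$ is a function $f : \Omega \to \mathbb{R}^m$ such that $f \circ \psi^{-1} : \psi(\Omega) \to \mathbb{R}^m$ is of class $C^k$ for all
charts $\psi \in \mathrm{atlas}(M)$. In other words,

$\forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\psi(\Omega), \mathbb{R}^m)$.

(See Notation 42.5.10 for $C^k(U, \mathbb{R}^m)$ for $U \in \mathrm{Top}(\mathbb{R}^n)$ and $m \in \mathbb{Z}_0^+$.)


51.7.3 NOTATION: $C^k(\Omega, \mathbb{R}^m)$, for $k \in \mathbb{Z}_0^+$, $m \in \mathbb{Z}_0^+$, and an open subset $\Omega$ of a $C^k$ manifold $M$, denotes
the set of all $\mathbb{R}^m$-valued functions of class $C^k$ on $\Omega$, with or without the linear space structure of the
pointwise addition and real scalar multiplication operations for function spaces. In other words,

$C^k(\Omega, \mathbb{R}^m) = \{ f : \Omega \to \mathbb{R}^m;\ \forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\psi(\Omega), \mathbb{R}^m) \}$.


51.7.4 THEOREM: Constant real-tuple-valued functions on $C^k$ manifolds are $C^k$ differentiable.
Let $k \in \mathbb{Z}_0^+$ and $m \in \mathbb{Z}_0^+$. Let $M$ be a $C^k$ manifold. Let $\Omega \in \mathrm{Top}(M)$ and $\lambda \in \mathbb{R}^m$. Define $f : \Omega \to \mathbb{R}^m$ by
$f(p) = \lambda$ for all $p \in \Omega$. Then $f \in C^k(\Omega, \mathbb{R}^m)$.


PROOF: Let $\psi \in \mathrm{atlas}(M)$. Then $f \circ \psi^{-1}$ is a constant map from $\psi(\Omega)$ to $\mathbb{R}^m$. So $f \circ \psi^{-1} \in C^k(\psi(\Omega), \mathbb{R}^m)$
by Theorem 42.6.2. Hence $f \in C^k(\Omega, \mathbb{R}^m)$ by Notation 51.7.3.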


51.7.5 REMARK:  Higher-order differentiability of vector-valued functions on manifolds. 
Definition 51.7.6 and Notation 51.7.7 are the manifold versions of Definition 42.5.31 and Notation 42.5.32 
for vector-valued functions on Cartesian spaces. 


51.7.6 DEFINITION: A $C^k$ (differentiable) $W$-valued function on an open subset $\Omega$ of a $C^k$ manifold $M$,
where $W$ is a finite-dimensional real linear space and $k \in \mathbb{Z}_0^+$, is a function $f : \Omega \to W$ such that $f \circ \psi^{-1} :
\psi(\Omega) \to W$ is of class $C^k$ for all charts $\psi \in \mathrm{atlas}(M)$. In other words,

$\forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\psi(\Omega), W)$.

(See Notation 42.5.32 for $C^k(U, W)$ for $U \in \mathrm{Top}(\mathbb{R}^n)$.)


51.7.7 NOTATION: $C^k(\Omega, W)$, for $k \in \mathbb{Z}_0^+$, an open subset $\Omega$ of a $C^k$ manifold $M$, and a finite-dimensional
real linear space $W$, denotes the set of all $W$-valued functions of class $C^k$ on $\Omega$. In other words,

$C^k(\Omega, W) = \{ f : \Omega \to W;\ \forall \psi \in \mathrm{atlas}(M),\ f \circ \psi^{-1} \in C^k(\psi(\Omega), W) \}$.


51.7.8 THEOREM: Constant vector-valued functions on $C^1$ manifolds are $C^1$ differentiable.
Let $M$ be a $C^1$ manifold. Let $W$ be a finite-dimensional real linear space. Let $\Omega \in \mathrm{Top}(M)$ and $w_0 \in W$.
Define $f : \Omega \to W$ by $f(p) = w_0$ for all $p \in \Omega$. Then $f \in C^1(\Omega, W)$.


PROOF: Let $\psi \in \mathrm{atlas}(M)$. Then $f \circ \psi^{-1}$ is a constant map from $\psi(\Omega)$ to $W$. So $f \circ \psi^{-1} \in C^1(\psi(\Omega), W)$
by Theorem 41.8.12. Hence $f \in C^1(\Omega, W)$ by Notation 51.7.7.


51.8. Global extension of differentiable test functions


51.8.1 REMARK:  Test-function-based definitions of differentiability for manifolds. 

In modern approaches to differential geometry, coordinate-chart-based definitions of differentiability are
sometimes replaced with test-function-based definitions. Thus for example, instead of defining a function
$\phi : M_1 \to M_2$ to be $C^k$ if $\psi_2 \circ \phi \circ \psi_1^{-1}$ is $C^k$ for all $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_2 \in \mathrm{atlas}(M_2)$ as in Definition 52.1.2,
one might replace this chart-based criterion with the test-function criterion in Theorem 52.1.19, which
requires $f \circ \phi \in C^k(M_1)$ for all $f \in C^k(M_2)$. This is an elegant style of definition which is superficially
coordinate-free. Similarly, the concrete chart-based definition for the $C^k$ differentiability of a vector field,
whose action is defined in Definition 61.1.3, may be expressed in terms of an abstract test using $C^{k+1}$
test functions as in Theorem 61.1.5.


One fly in the ointment here is that the proof of the validity of the elegant definition requires the extension
of chart coordinates $\psi^j$ from each patch $\mathrm{Dom}(\psi)$ to a $C^k$ function on the whole manifold. This is not as easy
as intuition might suggest. The purpose of Theorem 51.8.3 is to achieve such extensions of chart coordinates
to global test functions which can be used for test-function-based definitions. (It should be noted, however,
that there is more than one “fly in the ointment”.)


51.8.2 REMARK: The Hausdorff requirement for global test functions. 

Intuitively it seems clear that one may define a function with compact support within the domain of some 
chart for a differentiable manifold and extend this with a zero value to points of the manifold which lie 
outside that domain. As the proof of Theorem 51.8.3 shows, this is possible if the manifold is Hausdorff, but 
not so easy when the manifold is not Hausdorff. The issue here, which is resolved in Theorem 50.1.14, is that 
a relatively compact open subset of the range of a chart may correspond to an open subset of the domain 
whose closure is not included in that domain. Thus what seems to be a small neighbourhood $U \subseteq \mathrm{Dom}(\psi)$
of a point $p \in \mathrm{Dom}(\psi)$ for some chart $\psi$ may have a closure $\bar{U}$ which is not included in $\mathrm{Dom}(\psi)$ even though,
when viewed through the chart $\psi$, the set $\psi(U)$ has closure $\overline{\psi(U)}$ which is included in $\mathrm{Range}(\psi)$.


The difficulty here is a relic of the classical embedded manifold way of thinking of the 19th century. The 
Hausdorff condition is imposed on manifolds to ensure that they can be embedded in Cartesian spaces, 
but 20th century developments have shown that intrinsic geometry is preferable for applications in physics. 
So the embeddability of the manifold is no longer required, although the majority of differential geometry 
textbooks routinely exclude non-Hausdorff locally Cartesian spaces with little discussion of the justification 
for this exclusion. Certainly differential geometry is easier to manage in Hausdorff topological manifolds, 
but mathematicians are well accustomed to difficult intellectual tasks. 


In the proof of Theorem 51.8.3, the possibility must be excluded that a different chart $\tilde\psi$ may contain points
which are in the closure of some small neighbourhood $\psi^{-1}(B_{x,R})$ of a point $\psi^{-1}(x)$ in $\mathrm{Dom}(\psi)$. As shown
in Example 49.5.5, a patch $\mathrm{Dom}(\tilde\psi)$ may contain a point $\tilde q \in \overline{\psi^{-1}(B_{x,R})}$ even though $\tilde q \notin \mathrm{Dom}(\psi)$.


In colloquial terms, one may say that the closure of a small neighbourhood of a point in a manifold may
contain a point which is on a different “plane of reality” which is outside the coordinate system at that point.
Such an intruding point is “infinitely near” to the given point, but not identical with it. This can never
happen in a manifold which is embedded in a Cartesian space, but there seems to be no real justification
for insisting that real physical space should be embedded in some higher-dimensional Cartesian space when
there is no direct evidence for such an embedding.


The difficulty of demonstrating the extendability of a simple test function with compact support on a manifold
in Theorem 51.8.3 hints at the unnaturalness of the Hausdorff condition. A natural condition would not be
expected to require such technicalities as are encountered in the proof of Theorem 50.1.14, which is applied
in the proof of Theorem 51.8.3 to show that the closure of $\psi^{-1}(B_{x,R})$ is included in $\mathrm{Dom}(\psi)$. (The function
$f$ in Theorem 51.8.3 is illustrated in Figure 51.8.1.)


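51.8.3 THEOREM: Global $C^k$ extension of locally constant real-valued function with compact support.

Let $M$ be a $C^k$ manifold for some $k \in \mathbb{Z}_0^+$. Let $\dim(M) = n \in \mathbb{Z}_0^+$. Then for all $\psi \in \mathrm{atlas}(M)$, for all
$x \in \Omega = \mathrm{Range}(\psi)$, for all $r, R \in \mathbb{R}$ with $0 < r < R < \rho = d(x, \mathbb{R}^n \setminus \Omega)$, there is a function $f \in C^k(M, \mathbb{R})$
such that $f(p) = 1$ for all $p \in \psi^{-1}(B_{x,r})$ and $f(p) = 0$ for all $p \in M \setminus \psi^{-1}(B_{x,R})$.


[Diagram: the graph of $f$ over $M$, equal to 1 on $\psi^{-1}(B_{x,r})$ and equal to 0 on $M \setminus \psi^{-1}(B_{x,R})$, with the point $\psi^{-1}(x)$ and the sets $\psi^{-1}(\partial B_{x,r})$ and $\psi^{-1}(\partial B_{x,R})$ marked.]
Figure 51.8.1 Global $C^k$ function with specified values 0 and 1


PROOF: Let $\psi \in \mathrm{atlas}(M)$. Let $\Omega = \mathrm{Range}(\psi) \subseteq \mathbb{R}^n$. Let $x \in \Omega$. Let $\rho = d(x, \mathbb{R}^n \setminus \Omega)$. Then $\rho \in \bar{\mathbb{R}}_0^+$ by
Definition 37.4.1, and $\rho > 0$ by Theorem 37.6.5 (i) because $\Omega \in \mathrm{Top}(\mathbb{R}^n)$. (By Theorem 37.4.2 (i, ii), $\rho = \infty$
if and only if $\Omega = \mathbb{R}^n$.) Let $r, R \in \mathbb{R}^+$ with $r < R < \rho$. Define $f : M \to \mathbb{R}$ by

$\forall p \in M,\ f(p) = \begin{cases} g_{r,R}(\psi(p) - x) & \text{if } p \in \mathrm{Dom}(\psi) \\ 0 & \text{if } p \notin \mathrm{Dom}(\psi), \end{cases}$   (51.8.1)

where $g_{r,R} \in C^\infty(\mathbb{R}^n, \mathbb{R})$ is defined in Example 44.1.12. In other words,

$\forall z \in \mathbb{R}^n,\ g_{r,R}(z) = \begin{cases} 1 & |z| \le r \\ \bigl(1 + \exp((R - |z|)^{-1} - (|z| - r)^{-1})\bigr)^{-1} & |z| \in (r, R) \\ 0 & |z| \ge R. \end{cases}$   (51.8.2)

Then $f(p) = 1$ for all $p \in \psi^{-1}(B_{x,r})$. Let $p \in M \setminus \psi^{-1}(B_{x,R})$. If $p \in \mathrm{Dom}(\psi)$, then $f(p) = g_{r,R}(\psi(p) - x) = 0$
by line (51.8.2) because $\psi(p) \in \Omega \setminus B_{x,R}$. If $p \notin \mathrm{Dom}(\psi)$, then $f(p) = 0$ by line (51.8.1). Thus $f(p) = 0$ for
all $p \in M \setminus \psi^{-1}(B_{x,R})$. Also, the map $f \circ \psi^{-1} : y \mapsto g_{r,R}(y - x)$ is in $C^\infty(\Omega, \mathbb{R})$. To prove that $f \in C^k(M, \mathbb{R})$, it remains to
show that $f \circ \tilde\psi^{-1} \in C^k(\mathrm{Range}(\tilde\psi), \mathbb{R})$ for all $\tilde\psi \in \mathrm{atlas}(M)$. (See Definition 51.6.2.)


Let $\tilde\psi \in \mathrm{atlas}(M)$. Let $\tilde\Omega = \mathrm{Range}(\tilde\psi)$. Let $\tilde x \in \tilde\Omega$. Let $\tilde q = \tilde\psi^{-1}(\tilde x)$. Suppose that $\tilde q \in \mathrm{Dom}(\psi)$. Let
$U = \mathrm{Dom}(\psi) \cap \mathrm{Dom}(\tilde\psi)$. Then $U \in \mathrm{Top}_{\tilde q}(M)$ because $\mathrm{Dom}(\psi), \mathrm{Dom}(\tilde\psi) \in \mathrm{Top}(M)$. So $\tilde\psi(U) \in \mathrm{Top}_{\tilde x}(\mathbb{R}^n)$.
Then $f \circ \tilde\psi^{-1}|_{\tilde\psi(U)} = f \circ \psi^{-1} \circ \psi \circ \tilde\psi^{-1}|_{\tilde\psi(U)} = (f \circ \psi^{-1}|_{\psi(U)}) \circ (\psi \circ \tilde\psi^{-1}|_{\tilde\psi(U)})$, which is in $C^k(\tilde\psi(U), \mathbb{R})$
because $f \circ \psi^{-1}|_{\psi(U)} \in C^\infty(\psi(U), \mathbb{R})$ and $\psi \circ \tilde\psi^{-1}|_{\tilde\psi(U)} \in C^k(\tilde\psi(U), \psi(U))$. Thus $f \circ \tilde\psi^{-1} : \tilde\Omega \to \mathbb{R}$ is $C^k$ in
the neighbourhood $\tilde\psi(U)$ of $\tilde x$.

Now suppose that $\tilde q \notin \mathrm{Dom}(\psi)$. Since $M$ is Hausdorff and $\bar{B}_{x,R} \subseteq \mathrm{Range}(\psi)$, it follows from Theorem 50.1.14
that $\overline{\psi^{-1}(B_{x,R})} \subseteq \mathrm{Dom}(\psi)$. So $\tilde q \notin \overline{\psi^{-1}(B_{x,R})}$. Therefore $\tilde q \in \mathrm{Int}(M \setminus \psi^{-1}(B_{x,R}))$. So $f(p) = 0$ for $p$ in
some neighbourhood of $\tilde q$ in $M$. Therefore $f \circ \tilde\psi^{-1}$ is $C^k$ in some neighbourhood of $\tilde x$. Consequently
$f \circ \tilde\psi^{-1} : \tilde\Omega \to \mathbb{R}$ is $C^k$ in all of $\tilde\Omega = \mathrm{Range}(\tilde\psi)$ for all $\tilde\psi \in \mathrm{atlas}(M)$. Hence $f \in C^k(M)$ by Definition 51.6.2.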
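
The radial cut-off function in line (51.8.2) is easy to evaluate numerically. The following Python sketch is a minimal illustration, assuming that the formula reads $g_{r,R}(z) = (1 + \exp((R - |z|)^{-1} - (|z| - r)^{-1}))^{-1}$ for $r < |z| < R$. The function name `g_profile` and the overflow threshold are hypothetical choices made for this sketch only. The function takes $|z|$ as its argument because $g_{r,R}$ depends on $z$ only through its norm.

```python
import math

def g_profile(z_norm, r, R):
    """Value of g_{r,R}(z) as a function of |z| = z_norm, for 0 < r < R."""
    if not 0.0 < r < R:
        raise ValueError("need 0 < r < R")
    if z_norm <= r:
        return 1.0                  # plateau inside the inner ball
    if z_norm >= R:
        return 0.0                  # zero outside the outer ball
    t = 1.0 / (R - z_norm) - 1.0 / (z_norm - r)
    if t > 700.0:                   # exp(t) would overflow a float here;
        return 0.0                  # the true value is positive but negligible
    return 1.0 / (1.0 + math.exp(t))

# Sample the profile across the transition annulus r < |z| < R.
r, R = 1.0, 2.0
for s in (0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5):
    print(s, g_profile(s, r, R))
```

At the midpoint $|z| = (r + R)/2$ the two reciprocal terms cancel, so the exponent is zero and the value is exactly one half. As $|z|$ decreases to $r$ the exponent tends to $-\infty$, and as $|z|$ increases to $R$ it tends to $+\infty$, which is why the profile joins the constant values 1 and 0 continuously.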


51.8.4 REMARK: Modulation of coordinates by a globally $C^k$ function with compact support.

Theorem 51.8.5 uses Theorem 51.8.3 to modulate a chart component $\psi^j$ by multiplying it with a function
$h \in C^k(M, \mathbb{R})$ with compact support in $\mathrm{Range}(\psi)$ so that the resulting function $f \in C^k(M, \mathbb{R})$ is equal to
$\psi^j$ within a neighbourhood of a given point $\psi^{-1}(x)$ in the manifold. This is useful for converting chart-based
definitions of differentiability into test-function-based definitions as in Theorems 52.1.19 and 61.1.5.


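51.8.5 THEOREM: Global $C^k$ extension of local coordinates.

Let $M$ be a $C^k$ manifold for some $k \in \mathbb{Z}_0^+$. Let $\dim(M) = n \in \mathbb{Z}_0^+$. Then for all $\psi \in \mathrm{atlas}(M)$, for all
$x \in \Omega = \mathrm{Range}(\psi)$, for all $r, R \in \mathbb{R}$ with $0 < r < R < \rho = d(x, \mathbb{R}^n \setminus \Omega)$, for all $j \in \mathbb{N}_n$, there is a function
$f \in C^k(M, \mathbb{R})$ such that $f(p) = \psi^j(p)$ for all $p \in \psi^{-1}(B_{x,r})$ and $f(p) = 0$ for all $p \in M \setminus \psi^{-1}(B_{x,R})$.


PROOF: Let $\psi \in \mathrm{atlas}(M)$. Let $\Omega = \mathrm{Range}(\psi) \subseteq \mathbb{R}^n$. Let $x \in \Omega$. Let $\rho = d(x, \mathbb{R}^n \setminus \Omega)$. Then $\rho \in \bar{\mathbb{R}}_0^+$ by
Definition 37.4.1, and $\rho > 0$ by Theorem 37.6.5 (i) because $\Omega \in \mathrm{Top}(\mathbb{R}^n)$. Let $r, R \in \mathbb{R}^+$ with $r < R < \rho$.
Then by Theorem 51.8.3, there is a function $h \in C^k(M, \mathbb{R})$ such that $h(p) = 1$ for all
$p \in \psi^{-1}(B_{x,r})$ and $h(p) = 0$ for all $p \in M \setminus \psi^{-1}(B_{x,R})$. Let $j \in \mathbb{N}_n$. Define $f : M \to \mathbb{R}$ by

$\forall p \in M,\ f(p) = \begin{cases} h(p)\,\psi^j(p) & \text{if } p \in \mathrm{Dom}(\psi) \\ 0 & \text{if } p \notin \mathrm{Dom}(\psi). \end{cases}$   (51.8.3)

Then $f(p) = \psi^j(p)$ for all $p \in \psi^{-1}(B_{x,r})$ and $f(p) = 0$ for all $p \in M \setminus \psi^{-1}(B_{x,R})$. To prove that $f \in C^k(M, \mathbb{R})$,
it must be shown that $f \circ \tilde\psi^{-1} \in C^k(\mathrm{Range}(\tilde\psi), \mathbb{R})$ for all $\tilde\psi \in \mathrm{atlas}(M)$. (See Definition 51.6.2.)

Let $\tilde\psi \in \mathrm{atlas}(M)$. If $\tilde\psi = \psi$, then $f \circ \tilde\psi^{-1} = f \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$ because $f(\psi^{-1}(y)) = h(\psi^{-1}(y))\,y^j$
for all $y \in \mathrm{Range}(\psi)$ and $h \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$.

Assume $\tilde\psi \ne \psi$. Then $f \circ \tilde\psi^{-1}|_{\tilde\psi(\mathrm{Dom}(\psi))} = f \circ \psi^{-1}|_{\psi(\mathrm{Dom}(\tilde\psi))} \circ \psi \circ \tilde\psi^{-1}|_{\tilde\psi(\mathrm{Dom}(\psi))} \in C^k(\tilde\psi(\mathrm{Dom}(\psi)), \mathbb{R})$
because $f \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$ and the transition maps $\psi \circ \tilde\psi^{-1}$ of $M$ are all $C^k$ differentiable. Thus
$f \circ \tilde\psi^{-1}$ is $C^k$ on the open subset $\tilde\psi(\mathrm{Dom}(\psi))$ of $\mathrm{Range}(\tilde\psi)$.

Finally let $\tilde x \in \mathrm{Range}(\tilde\psi) \setminus \tilde\psi(\mathrm{Dom}(\psi))$. Let $\tilde q = \tilde\psi^{-1}(\tilde x)$. Then $\tilde q \notin \overline{\psi^{-1}(B_{x,R})}$ by Theorem 50.1.14. So
$f(p) = 0$ for $p$ in a neighbourhood of $\tilde q$. Therefore $f \circ \tilde\psi^{-1}$ is $C^k$ on an open neighbourhood of $\tilde x$. Thus
$f \circ \tilde\psi^{-1}$ is $C^k$ on all of $\mathrm{Range}(\tilde\psi)$. Hence $f \in C^k(M)$ by Definition 51.6.2.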


51.9. Differentiable curves 


51.9.1 REMARK: Application of the topological space definition of curves and spaces to manifolds. 
Section 51.9 uses the definitions in Sections 36.2 and 36.8 for curves, paths and parametric families of curves 
in general topological spaces. Curves and families of curves are assumed to be continuous by definition. 
Continuity of curves on a differentiable manifold is defined in terms of the topology which is induced on the 
manifold as in Definition 51.5.8. 


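51.9.2 DEFINITION: A $C^k$ (differentiable) curve in a $C^k$ manifold $M$, for $k \in \mathbb{Z}_0^+$, is a continuous curve $\gamma :
I \to M$ for some open interval $I \subseteq \mathbb{R}$, such that $\psi \circ \gamma : \gamma^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$ is a $C^k$ map for all $\psi \in \mathrm{atlas}(M)$,
where $n = \dim(M) \in \mathbb{Z}_0^+$.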


51.9.3 REMARK: The effect of viewing curve differentiability through the charts. 

Continuous curves on general topological spaces are introduced in Definition 36.2.3. The domain of a
continuous curve is defined to be an interval of $\mathbb{R}$, which is not necessarily an open interval. In principle,
$C^k$ differentiability is defined “through the charts” as in Figure 51.9.1.


[Diagram: a curve $\gamma$ in the manifold $M$ and its chart image $\psi \circ \gamma$.]
Figure 51.9.1 Curve in a differentiable manifold


However, when a curve $\gamma : I \to M$ is viewed via a chart $\psi \in \mathrm{atlas}(M)$, the domain of the curve is restricted
to the set $\mathrm{Dom}(\psi \circ \gamma) = \gamma^{-1}(\mathrm{Dom}(\psi))$, which is not generally an interval of $\mathbb{R}$. Nevertheless, when $I$ is an
open interval, this restricted domain is always an open subset of $\mathbb{R}$ by Definition 31.12.4, which is therefore equal to the disjoint union
of a countable set of open intervals of $\mathbb{R}$ by Theorem 32.7.8. The $C^k$ differentiability property for curves
in $\mathbb{R}^n$, as given in Definition 42.6.6, may then be applied to each such component interval.
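
As a concrete illustration, which is not taken from this book, consider the unit circle with a chart whose domain omits the single point $(1, 0)$, and the curve $\gamma : t \mapsto (\cos t, \sin t)$ on an open interval $(a, b)$. The Python sketch below uses a hypothetical function name and relies on the assumption that $\gamma(t) = (1, 0)$ exactly when $t$ is an integer multiple of $2\pi$. It lists the component intervals of $\gamma^{-1}(\mathrm{Dom}(\psi))$.

```python
import math

def chart_preimage_components(a, b):
    """Components of gamma^{-1}(Dom(psi)) for gamma(t) = (cos t, sin t) on (a, b),
    where Dom(psi) is the unit circle with the point (1, 0) removed."""
    two_pi = 2.0 * math.pi
    # gamma(t) = (1, 0) exactly when t is an integer multiple of 2*pi.
    k_min = math.floor(a / two_pi) + 1      # smallest k with k * 2*pi > a
    k_max = math.ceil(b / two_pi) - 1       # largest k with k * 2*pi < b
    cuts = [k * two_pi for k in range(k_min, k_max + 1)]
    endpoints = [a] + cuts + [b]
    # Each consecutive pair of endpoints bounds one open component interval.
    return list(zip(endpoints[:-1], endpoints[1:]))

# The curve on (-1, 7) passes through the omitted point at t = 0 and t = 2*pi.
for left, right in chart_preimage_components(-1.0, 7.0):
    print(left, right)
```

For the interval $(-1, 7)$ the restricted domain splits into three open intervals, and the differentiability test for $\psi \circ \gamma$ is applied on each of them separately.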


51.9.4 REMARK: Curve differentiability classes lower than the manifold differentiability class. 

Definition 51.9.2 apparently defines $C^k$ curves only when the curve differentiability parameter $k$ exactly
equals the manifold differentiability parameter $k$. However, a $C^\ell$ manifold for $\ell \in \mathbb{Z}_0^+$ such that $k \le \ell$ is
automatically of class $C^k$ by Theorem 51.3.19. Clearly any definition requiring $C^k$ differentiability of the
inputs could be rewritten to require “$C^\ell$ differentiability for any $\ell \ge k$”. But some things are just too obvious
to be worth stating every time.


51.9.5 THEOREM: First and second derivatives via a curve of a real function at a minimum. 
Let $M$ be a $C^1$ manifold. Let $n = \dim(M) \in \mathbb{Z}_0^+$. Let $p \in M$ be a local minimum of a function $u \in C^1(M)$.
Let $\gamma : I \to M$ be a $C^1$ curve in $M$ for some open interval $I$ of $\mathbb{R}$.


(i) $(d/dt)(u \circ \gamma(t))\big|_{t=s} = 0$ for all $s \in I$ such that $p = \gamma(s)$.


(ii) If $M$ is $C^2$, $u \in C^2(M)$ and $\gamma$ is $C^2$, then $(d^2/dt^2)(u \circ \gamma(t))\big|_{t=s} \ge 0$ for all $s \in I$ such that $p = \gamma(s)$.


[ www. geometry. org/dg. html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




PROOF: Part (i) follows from Theorem 40.6.2 (ii) because $\mathrm{Dom}(u \circ \gamma) = I \in \mathrm{Top}_s(\mathbb{R})$, and $s$ is a local
minimum for $u \circ \gamma$ if $p = \gamma(s)$ is a local minimum for $u$.


Part (ii) follows similarly from Theorem 42.1.20 (i).
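

A small worked check of Theorem 51.9.5 may be useful. The example below is an added illustration, assuming $M = \mathbb{R}^2$ with the identity chart, the function $u(x,y) = x^2 + y^2$ and the minimum point $p = (0,0)$. The second curve suggests why the inequality in part (ii) cannot be made strict.

```latex
% Assumptions (illustrative only): M = \mathbb{R}^2 with the identity chart,
% u(x,y) = x^2 + y^2, p = (0,0). Both curves pass through p only at s = 0.
\begin{align*}
\gamma_1(t) &= (t, t^2): & (u \circ \gamma_1)(t) &= t^2 + t^4,
  & \tfrac{d}{dt}\big|_{t=0} &= 0, & \tfrac{d^2}{dt^2}\big|_{t=0} &= 2 \ge 0, \\
\gamma_2(t) &= (t^3, 0): & (u \circ \gamma_2)(t) &= t^6,
  & \tfrac{d}{dt}\big|_{t=0} &= 0, & \tfrac{d^2}{dt^2}\big|_{t=0} &= 0 \ge 0.
\end{align*}
```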


51.9.6 THEOREM: Constant curves are continuously differentiable. 
Let $M$ be a $C^1$ manifold. Let $\gamma : I \to M$ be a constant curve in $M$ for some open real interval $I$. Then $\gamma$ is
a $C^1$ curve on $M$.


PROOF: By Definition 36.2.11, a constant curve $\gamma : I \to M$ satisfies $\forall s, t \in I,\ \gamma(s) = \gamma(t)$. If $I = \emptyset$, then
$\gamma$ is a $C^1$ curve. So assume that $I \ne \emptyset$. Let $p \in \mathrm{Range}(\gamma)$. Then $\forall t \in I,\ \gamma(t) = p$. Let $\psi \in \mathrm{atlas}_p(M)$.
Let $J = \gamma^{-1}(\mathrm{Dom}(\psi))$. Then $J \in \mathrm{Top}(\mathbb{R})$ and $\psi \circ \gamma : J \to \mathbb{R}^n$ is constant, where $n = \dim(M)$. In fact,
$\forall t \in J,\ \psi(\gamma(t)) = \psi(p)$. Therefore $\partial_t((\psi \circ \gamma)(t)) = 0_{\mathbb{R}^n}$ by Theorem 40.5.5 and Definition 40.7.2. But
constant functions are continuous by Theorem 31.12.9. So $\psi \circ \gamma \in C^1(J, \mathbb{R}^n)$ by Notation 42.5.10. Hence $\gamma$
is a $C^1$ curve in $M$ by Definition 51.9.2.


51.9.7 REMARK: Summary of book-sections dealing with curve and path topics. 
Table 51.9.2 summarises the distribution of curve and path topics within this book. 


section   topics                                                  notation and remarks
36.1      terminology for curves and paths                        paths are equivalence classes $[\gamma]_0$ of curves $\gamma$
36.2      curves                                                  definitions for curves in topological spaces
36.5      path-equivalence of curves                              definition of curves which have the same path
36.8      paths                                                   definitions for paths
38.9      rectifiable curves and paths                            almost-everywhere differentiable curves
48.3      pathwise topological parallelism                        parallelism for paths $\gamma$
50.7      rectifiable curves
51.9      differentiable curves
57.8      vector fields along curves
57.9      differential of a curve                                 $d\gamma$
59.8      higher-order differentials of curves and families       $d^k\gamma$, $k \ge 1$; $\partial_{ijk\ldots}\gamma$
71.7      covariant derivatives of vector fields along curves     $D\gamma$; $D^k\gamma$, $k \ge 1$
72.1      geodesic curves                                         $D\gamma = 0$


Table 51.9.2 Summary of curve and path topics


51.10. Analytic manifolds 


51.10.1 REMARK: Applicability of manifold analyticity to Lie groups. 

Analytic manifolds have no special properties that are useful in this book, except that they are used in
defining Lie groups in Chapter 62. The analytic manifold definitions are the same as the corresponding $C^k$
definitions except for a simple text substitution. To be clearer, one should refer to real-analytic manifolds
when there is danger of confusion with complex-analytic manifolds.


51.10.2 DEFINITION: An analytic (manifold) atlas for a topological manifold $M$ is a topological manifold
atlas $S$ for $M$ such that $\psi_2 \circ \psi_1^{-1}$ is analytic for all $\psi_1, \psi_2 \in S$.


51.10.3 DEFINITION: An $n$-dimensional analytic manifold for $n \in \mathbb{Z}_0^+$ is a pair $(M, S)$ such that $M$ is
an $n$-dimensional topological manifold and $S$ is an analytic manifold atlas for $M$. Then $M$ is called the
underlying topological space of $(M, S)$.


51.10.4 DEFINITION: An analytic chart for an analytic manifold $(M, S)$ is a chart $\psi$ for $M$ such that
$S \cup \{\psi\}$ is also an analytic manifold atlas for $M$.


51.10.5 DEFINITION: Equivalent analytic manifold atlases on a topological manifold $M$ are analytic manifold
atlases $A_1$ and $A_2$ on $M$ such that $A_1 \cup A_2$ is an analytic manifold atlas on $M$. Then $(M, A_1)$ and
$(M, A_2)$ are said to be equivalent analytic manifolds. (Clearly $(M, A_1 \cup A_2)$ is then also analytic-equivalent
to both $(M, A_1)$ and $(M, A_2)$.)


51.10.6 DEFINITION: A compact analytic manifold is an analytic manifold $(M, A_M)$ whose underlying
topological space $M$ is compact.


51.10.7 EXAMPLE:  Cartesian spaces are analytic manifolds. 

Let $\Omega$ be an open subset of $\mathbb{R}^n$ for some $n \in \mathbb{Z}_0^+$. Define an atlas on $\Omega$ by $S = \{\psi\}$ with $\psi : \Omega \to \mathbb{R}^n$ defined
by $\psi(x) = x$ for all $x \in \Omega$. (This is the same as in Definition 49.7.12.) Then $(\Omega, S)$ is an analytic manifold.
In general, any differentiable manifold whose atlas contains only one chart is an analytic manifold because
there are no chart transition maps to test!


51.10.8 REMARK:  Differentiable manifold charts for toruses. 

The charts for toruses in Example 50.1.7 have differentiable transition maps. So such toruses are differentiable
manifolds (because such charts clearly cover the whole manifold and the topology is Hausdorff). In fact,
they are $C^\infty$ and analytic.


51.11. Unidirectionally differentiable manifolds 


51.11.1 REMARK: Weakening bidirectional differentiability to unidirectional differentiability. 
Definitions 51.11.2 and 51.11.3 are unidirectional analogues of Definitions 51.3.2 and 51.3.8 for $C^k$ manifolds.
(See Section 41.5 for unidirectional derivatives.)


51.11.2 DEFINITION: A unidirectionally differentiable atlas for a topological manifold $M -\!\!< (M, T_M)$ is a
topological atlas $A_M$ for $(M, T_M)$ such that $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is unidirectionally
differentiable for all $\psi_1, \psi_2 \in A_M$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$.


51.11.3 DEFINITION: A unidirectionally differentiable manifold is a pair $(M, A_M)$ such that $M$ is a
topological manifold and $A_M$ is a unidirectionally differentiable manifold atlas for $M$.


51.11.4 REMARK: Compatibility of unidirectional differentiability with local transformation semigroups. 
Unidirectionally differentiable atlases could be characterised as those atlases which are compatible with the 
local transformation semigroups of unidirectionally differentiable homeomorphisms on Cartesian spaces. For 
this LTS, there is an LTS homomorphism from unidirectionally differentiable curves to unidirectional tangent 
vectors. 


51.11.5 REMARK: Applications of unidirectionally differentiable manifolds. 

A simple example of a manifold which is unidirectionally differentiable but not $C^1$ differentiable is a cube or
rectangle. Any manifold with vertices or “corners” may have unidirectional tangent vectors in all directions
at all points, but clearly will not be $C^1$ at the vertices. The unidirectional differentiability of a topological
manifold embedded in a Cartesian space is related to the internal and external cone conditions, which
require each point on the boundary of a set to be the vertex of some open cone within the interior or exterior
of the set. Interior and exterior cone conditions are closely related to Lipschitz continuity of local graph
representations of the boundary of a set.
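

The behaviour at a corner can be sketched with the simplest local graph representation, $f(x) = |x|$. The computation below is an added illustration. It assumes that the unidirectional derivative of Section 41.5 is the one-sided directional limit written as $\partial_v^+$ here, which is a notation chosen for this sketch and not taken from the text.

```latex
% Assumption (illustrative only): one-sided directional derivative
%   \partial_v^+ f(x) = \lim_{t \to 0^+} \frac{f(x + tv) - f(x)}{t}.
% Local graph of a corner: f(x) = |x| on \mathbb{R}.
\begin{align*}
\partial_{+1}^+ f(0) &= \lim_{t \to 0^+} \frac{|t| - 0}{t} = 1, \\
\partial_{-1}^+ f(0) &= \lim_{t \to 0^+} \frac{|-t| - 0}{t} = 1.
\end{align*}
% Both one-sided limits exist, but \partial_{-1}^+ f(0) \ne -\partial_{+1}^+ f(0),
% so v \mapsto \partial_v^+ f(0) is not linear and f is not differentiable at 0.
```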


Since sets with vertices do arise very naturally in applications, the concept of a unidirectionally differentiable
manifold is not a mere pure mathematical curiosity.


Chapter 52


DIFFERENTIABLE MANIFOLD MAPS AND PRODUCTS


52.1 Differentiable maps
52.2 Diffeomorphisms and pull-back atlases
52.3 Differentiable submanifold point-sets
52.4 Differentiable submanifolds
52.5 Differentiable embeddings, immersions and submersions
52.6 Products of differentiable manifolds
52.7 Product-structured manifolds and submanifolds


52.1. Differentiable maps 


52.1.1 REMARK:  Differentiable maps and diffeomorphisms between differentiable manifolds. 
Differentiable maps include diffeomorphisms as a special case. A diffeomorphism is a differentiable map 
which is also a homeomorphism whose inverse is a differentiable map. (See Section 52.2.) Differentiable 
maps are well defined between manifolds of arbitrary equal or unequal dimension. 

The real-valued $C^k$ functions on $C^k$ manifolds in Section 51.6 are essentially a special case of the $C^k$ maps
between $C^k$ manifolds in Section 52.1. Thus Definition 51.6.2 is in essence the special case of Definition 52.1.2
for real-valued functions.


The domain of each function $\psi_2 \circ \phi \circ \psi_1^{-1}$ in Definition 52.1.2 satisfies


    $\mathrm{Dom}(\psi_2 \circ \phi \circ \psi_1^{-1}) = \psi_1(\mathrm{Dom}(\psi_2 \circ \phi))$          (52.1.1)
    $\qquad = \psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2)))$          (52.1.2)
    $\qquad \in \mathrm{Top}(\mathbb{R}^{n_1})$,          (52.1.3)


where $n_1 = \dim(M_1)$, lines (52.1.1) and (52.1.2) follow from Theorems 10.10.13 (iii) and 10.10.13 (i), and
line (52.1.3) follows from the observation that $\mathrm{Dom}(\psi_2) \in \mathrm{Top}(M_2)$, which implies $\phi^{-1}(\mathrm{Dom}(\psi_2)) \in \mathrm{Top}(M_1)$
by the continuity of $\phi$, which implies $\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))) \in \mathrm{Top}(\mathbb{R}^{n_1})$ because $\psi_1$ is a local homeomorphism.
So the $C^k$ differentiability of this map is well defined by Definition 42.5.11 for $C^k$ differentiability of maps
between Cartesian spaces, which requires the domain to be an open set.


The spaces and maps in Definition 52.1.2 are illustrated in Figure 52.1.1.


[Figure 52.1.1: $C^k$ differentiability of a map “through the charts”]


52.1.2 DEFINITION: A $C^k$ (differentiable) map from a $C^k$ manifold $M_1$ to a $C^k$ manifold $M_2$ for $k \in \overline{\mathbb{Z}}_0^+$
is a map $\phi : M_1 \to M_2$ such that


    $\forall \psi_1 \in \mathrm{atlas}(M_1),\ \forall \psi_2 \in \mathrm{atlas}(M_2),\quad \psi_2 \circ \phi \circ \psi_1^{-1}$ is $C^k$ differentiable.

In other words, $\forall \psi_1 \in \mathrm{atlas}(M_1),\ \forall \psi_2 \in \mathrm{atlas}(M_2),\ \psi_2 \circ \phi \circ \psi_1^{-1} \in C^k(\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))), \mathrm{Range}(\psi_2))$.
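

The dependence of Definition 52.1.2 on the chosen atlases can be sketched with a one-dimensional example. This is an added illustration, not part of the text: the two single-chart atlases on $\mathbb{R}$ below are hypothetical choices, and $x^{1/3}$ denotes the real cube root.

```latex
% Assumptions (illustrative only):
%   M_1 = \mathbb{R} with atlas \{\psi_1\}, \psi_1(x) = x,
%   M_2 = \mathbb{R} with atlas \{\psi_2\}, \psi_2(y) = y^3.
\begin{align*}
\phi(x) = x^{1/3}: \quad
  & (\psi_2 \circ \phi \circ \psi_1^{-1})(x) = (x^{1/3})^3 = x
  && \Rightarrow\ \phi \in C^\infty(M_1, M_2), \\
\mathrm{id} : M_1 \to M_2: \quad
  & (\psi_2 \circ \mathrm{id} \circ \psi_1^{-1})(x) = x^3
  && \Rightarrow\ \mathrm{id} \in C^\infty(M_1, M_2), \\
\mathrm{id} : M_2 \to M_1: \quad
  & (\psi_1 \circ \mathrm{id} \circ \psi_2^{-1})(z) = z^{1/3}
  && \Rightarrow\ \mathrm{id} \in C^0(M_2, M_1) \setminus C^1(M_2, M_1).
\end{align*}
% The cube root is continuous but has no derivative at 0.
```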


52.1.3 NOTATION: $C^k(M_1, M_2)$, for $k \in \overline{\mathbb{Z}}_0^+$ and $C^k$ manifolds $M_1$ and $M_2$, denotes the set of all $C^k$ maps
from $M_1$ to $M_2$.


52.1.4 NOTATION: $\mathring{C}^k(M_1, M_2)$, for $k \in \overline{\mathbb{Z}}_0^+$ and $C^k$ manifolds $M_1$ and $M_2$, denotes the set of all $C^k$ maps
from open subsets of $M_1$ to $M_2$. In other words, $\mathring{C}^k(M_1, M_2) = \bigcup_{\Omega \in \mathrm{Top}(M_1)} C^k(\Omega, M_2)$.
(See Definition 51.4.15 for the restriction of a differentiable manifold to an open subset.)


52.1.5 REMARK: The difficulty of representing higher derivatives of differentiable maps. 

Even though it is straightforward to define whether a map between manifolds is of class $C^k$, it is not so
easy to state just what the $k$-fold derivative is. This contrasts with real functions of a real variable, where
the derivative of a function resides in the same space as the function being differentiated. In differential
geometry, derivatives frequently fall into a completely different space to the original function. Each order of
derivative may require a separate space to be constructed for it. The derivative of a differentiable map, called
the “differential”, is presented in Section 58.4. It is possible to avoid the question of what a differential is in
Section 52.1 by defining differentiability in terms of charts, which moves the question to Cartesian spaces.


52.1.6 REMARK:  Avoidance of ambiguous language for differentiability classes. 

When the regularity class $C^k$ of a differentiable map in Definition 52.1.2 is not stated explicitly, some authors
assume this to mean $C^1$ while many others assume it means $C^\infty$. It is generally best to be explicit to avoid
misunderstanding.


52.1.7 REMARK: Testing the differentiability definition with the identity map. 

If the identity map on a $C^k$ manifold is not a $C^k$ map, then some rapid revision is required! According to
Theorem 52.1.8, the $C^k$ differentiability definition passes this minimal test. More importantly, the mechanical
details of the proof are an exercise in the application of the definition. Theorem 52.1.9 for the differentiability
of constant maps is a similarly useful exercise.


52.1.8 THEOREM:  Differentiability of the identity map on a manifold. 
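Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M$ be a $C^k$ manifold. Then $\mathrm{id}_M \in C^k(M, M)$.


PROOF: Let $\psi_1, \psi_2 \in \mathrm{atlas}(M)$. Then $\psi_2 \circ \mathrm{id}_M \circ \psi_1^{-1} = \psi_2 \circ \psi_1^{-1}$. But $\psi_2 \circ \psi_1^{-1}$ is $C^k$ differentiable by
Definition 51.3.2 (iii). Hence $\mathrm{id}_M$ is a $C^k$ map by Definition 52.1.2.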


52.1.9 THEOREM: Differentiability of constant maps between differentiable manifolds. 
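Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M_1$ and $M_2$ be $C^k$ manifolds. Let $q \in M_2$. Define $\phi : M_1 \to M_2$ by $\phi(p) = q$ for all $p \in M_1$.
Then $\phi \in C^k(M_1, M_2)$.


PROOF: Let $n_\ell = \dim(M_\ell)$ and $\psi_\ell \in \mathrm{atlas}(M_\ell)$ for $\ell = 1, 2$. Then $(\psi_2 \circ \phi \circ \psi_1^{-1})(x) = \psi_2(q) \in \mathbb{R}^{n_2}$ for
all $x \in \mathrm{Range}(\psi_1) \in \mathrm{Top}(\mathbb{R}^{n_1})$. Therefore $\psi_2 \circ \phi \circ \psi_1^{-1} \in C^k(\mathrm{Range}(\psi_1), \mathbb{R}^{n_2})$ by Theorem 42.6.2. Hence
$\phi \in C^k(M_1, M_2)$.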


52.1.10 REMARK: Reduction of the number of charts which need to be tested. 

Theorem 52.1.11 is particularly useful for testing the $C^k$ differentiability of vector fields as in Remark 57.2.2,
or the $C^k$ differentiability of cross-sections of differentiable fibre bundles as in Definition 64.7.2 because only
one chart on the total space needs to be tested for each chart on the base-point space.


52.1.11 THEOREM: Differentiability tests needed for only one target-space chart per source-space chart. 
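Let $M_1$ and $M_2$ be $C^k$ manifolds for some $k \in \overline{\mathbb{Z}}_0^+$. Then $\phi : M_1 \to M_2$ is a $C^k$ map if and only if


    $\forall \psi_1 \in \mathrm{atlas}(M_1),\ \forall p \in \mathrm{Dom}(\psi_1),\ \exists \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),$          (52.1.4)
    $\qquad \psi_2 \circ \phi \circ \psi_1^{-1}$ is of class $C^k$.

Hence $\phi$ is $C^k$ if and only if

    $\forall p \in M_1,\ \exists \psi_1 \in \mathrm{atlas}_p(M_1),\ \exists \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),$          (52.1.5)
    $\qquad \psi_2 \circ \phi \circ \psi_1^{-1}$ is of class $C^k$.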


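PROOF: Let $k \in \overline{\mathbb{Z}}_0^+$. Let $\phi : M_1 \to M_2$ for $C^k$ manifolds $M_1$ and $M_2$. If $\phi$ is $C^k$, then conditions (52.1.4)
and (52.1.5) both follow by elementary predicate calculus from Definition 52.1.2.

Assume condition (52.1.4). Let $\psi_\alpha \in \mathrm{atlas}(M_\alpha)$ and $n_\alpha = \dim(M_\alpha)$ for $\alpha = 1, 2$. Let $\Omega = \mathrm{Dom}(\psi_2 \circ \phi \circ \psi_1^{-1})$.
Then $\Omega \in \mathrm{Top}(\mathbb{R}^{n_1})$ because $\psi_2 \circ \phi \circ \psi_1^{-1}$ is continuous. Let $x \in \Omega$ and $p = \psi_1^{-1}(x) \in
\mathrm{Dom}(\psi_1) \in \mathrm{Top}_p(M_1)$. Then by condition (52.1.4), there is a $\tilde\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$ such that $\tilde\psi_2 \circ \phi \circ \psi_1^{-1}$ is of
class $C^k$. Let $\tilde\Omega = \psi_1(\phi^{-1}(\mathrm{Dom}(\tilde\psi_2)))$. Then $\tilde\Omega \in \mathrm{Top}(\mathbb{R}^{n_1})$ because $\mathrm{Dom}(\tilde\psi_2) \in \mathrm{Top}(M_2)$ and $\phi$ and $\psi_1^{-1}$
are continuous. So $\tilde\Omega \in \mathrm{Top}_x(\mathbb{R}^{n_1})$ because $\phi(p) \in \mathrm{Dom}(\tilde\psi_2)$ and so $x \in \tilde\Omega$. Then $(\phi \circ \psi_1^{-1})^{-1}(\mathrm{Dom}(\tilde\psi_2)) =
\psi_1(\phi^{-1}(\mathrm{Dom}(\tilde\psi_2))) = \tilde\Omega$ by Theorem 10.7.2 (ii). Therefore


    $\psi_2 \circ \phi \circ \psi_1^{-1}\big|_{\tilde\Omega} = \psi_2\big|_{\mathrm{Dom}(\tilde\psi_2)} \circ \phi \circ \psi_1^{-1}$
    $\qquad = \psi_2 \circ \mathrm{id}_{\mathrm{Dom}(\tilde\psi_2)} \circ \phi \circ \psi_1^{-1}$
    $\qquad = \psi_2 \circ \tilde\psi_2^{-1} \circ \tilde\psi_2 \circ \phi \circ \psi_1^{-1}$,

which is a $C^k$ map on $\Omega \cap \tilde\Omega$ because $\psi_2 \circ \tilde\psi_2^{-1}$ is $C^k$ by Definition 51.3.2 (iii), and $\tilde\psi_2 \circ \phi \circ \psi_1^{-1}$ is a $C^k$ map
by assumption. Thus $\psi_2 \circ \phi \circ \psi_1^{-1}$ is $C^k$ in a neighbourhood of each $x \in \Omega$. Consequently $\psi_2 \circ \phi \circ \psi_1^{-1}$ is a
$C^k$ map for all $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_2 \in \mathrm{atlas}(M_2)$. Hence $\phi$ is a $C^k$ map by Definition 52.1.2.


Now suppose that condition (52.1.5) holds. Let $\psi_1 \in \mathrm{atlas}(M_1)$ and $p \in \mathrm{Dom}(\psi_1)$. Then condition (52.1.5)
implies that there is a $\tilde\psi_1 \in \mathrm{atlas}_p(M_1)$ and $\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$ such that $\psi_2 \circ \phi \circ \tilde\psi_1^{-1}$ is of class $C^k$. Let
$\tilde\Omega = \psi_1(\mathrm{Dom}(\tilde\psi_1))$. Then $\tilde\Omega \in \mathrm{Top}_x(\mathbb{R}^{n_1})$, where $x = \psi_1(p)$, and


    $\psi_2 \circ \phi \circ \psi_1^{-1}\big|_{\tilde\Omega} = \psi_2 \circ \phi\big|_{\mathrm{Dom}(\tilde\psi_1)} \circ \psi_1^{-1}$
    $\qquad = \psi_2 \circ \phi \circ \mathrm{id}_{\mathrm{Dom}(\tilde\psi_1)} \circ \psi_1^{-1}$
    $\qquad = \psi_2 \circ \phi \circ \tilde\psi_1^{-1} \circ \tilde\psi_1 \circ \psi_1^{-1}$,

which is a $C^k$ map on $\mathrm{Range}(\psi_1) \cap \tilde\Omega \in \mathrm{Top}_x(\mathbb{R}^{n_1})$ because $\tilde\psi_1 \circ \psi_1^{-1}$ is $C^k$ by Definition 51.3.2 (iii),
and $\psi_2 \circ \phi \circ \tilde\psi_1^{-1}$ is a $C^k$ map by assumption. Thus $\psi_2 \circ \phi \circ \psi_1^{-1}$ is $C^k$ in a neighbourhood of each
$x \in \mathrm{Range}(\psi_1)$. So $\psi_2 \circ \phi \circ \psi_1^{-1}$ is a $C^k$ map. Therefore condition (52.1.4) is satisfied. Hence $\phi$ is a $C^k$
map by the above proof of the sufficiency of condition (52.1.4).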


52.1.12 REMARK: Equivalent $C^k$ map tests for $C^k$ equivalent atlases.

In Theorem 52.1.13, it is assumed that $A_j$ and $\mathrm{atlas}(M_j)$ are $C^k$ equivalent atlases on $M_j$ for $j = 1, 2$
according to Definition 51.4.10. Thus as one would expect, the $C^k$ differentiability of a map $\phi : M_1 \to M_2$
is independent of the choices of atlases on both manifolds, assuming that the atlases are $C^k$ compatible.


A significant example of a $C^k$ compatible atlas for $M_j$ is the $C^k$ atlas completion $\mathrm{atlas}^k(M_j)$ for $j = 1, 2$.
This example is presented as Theorem 52.1.15.


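52.1.13 THEOREM: Equivalent $C^k$ map tests for $C^k$ equivalent atlases.
Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M_1$ and $M_2$ be $C^k$ manifolds. Let $A_j$ be a $C^k$ equivalent atlas for $M_j$ for $j = 1, 2$. Then
$\phi : M_1 \to M_2$ is a $C^k$ map if and only if


    $\forall \psi_1 \in A_1,\ \forall \psi_2 \in A_2,\quad \psi_2 \circ \phi \circ \psi_1^{-1}$ is a $C^k$ map.


Hence $\phi \in C^k(M_1, M_2) \Leftrightarrow \forall \psi_1 \in A_1,\ \forall \psi_2 \in A_2,\ \psi_2 \circ \phi \circ \psi_1^{-1} \in C^k(\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))), \mathrm{Range}(\psi_2))$.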


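PROOF: Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. Let $\phi : M_1 \to M_2$ be a $C^k$ map. Let $\psi_1 \in A_1$ and $\psi_2 \in A_2$.
Let $\hat\phi = \psi_2 \circ \phi \circ \psi_1^{-1}$. Then by Theorem 10.10.13 (i, iii), $\mathrm{Dom}(\hat\phi) = \psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2)))$. If $\mathrm{Dom}(\hat\phi) = \emptyset$,
then $\hat\phi$ is a $C^k$ map by Definition 42.5.11. So assume that $\mathrm{Dom}(\hat\phi) \ne \emptyset$. The continuity of $\phi$ implies that
$\mathrm{Dom}(\hat\phi) \in \mathrm{Top}(\mathbb{R}^{n_1})$ by Definition 31.12.4.


Let $x \in \mathrm{Dom}(\hat\phi)$ and $p = \psi_1^{-1}(x)$. Then $p \in \phi^{-1}(\mathrm{Dom}(\psi_2)) \subseteq \mathrm{Dom}(\phi)$. Let $\tilde\psi_1 \in \mathrm{atlas}_p(M_1)$ and
$\tilde\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$. Let $\tilde\phi = \tilde\psi_2 \circ \phi \circ \tilde\psi_1^{-1}$. Then $\tilde\phi$ is a $C^k$ Cartesian space map which satisfies
$\tilde\phi \in C^k(\tilde\psi_1(\phi^{-1}(\mathrm{Dom}(\tilde\psi_2))), \mathrm{Range}(\tilde\psi_2))$ by Definition 52.1.2. Let $\phi' = (\psi_2 \circ \tilde\psi_2^{-1}) \circ \tilde\phi \circ (\tilde\psi_1 \circ \psi_1^{-1})$.
Then $\phi'$ is a $C^k$ Cartesian space map by Theorem 42.5.27, the chain rule for Cartesian space maps, because
$M_1$ and $M_2$ are $C^k$ manifolds and $\psi_1, \psi_2$ are $C^k$ compatible with $M_1, M_2$ respectively. (So the transition
maps $\psi_2 \circ \tilde\psi_2^{-1}$ and $\tilde\psi_1 \circ \psi_1^{-1}$ are $C^k$ Cartesian space maps.) The map $\phi'$ satisfies


    $\phi' = \psi_2 \circ \tilde\psi_2^{-1} \circ \tilde\psi_2 \circ \phi \circ \tilde\psi_1^{-1} \circ \tilde\psi_1 \circ \psi_1^{-1}$
    $\quad = \psi_2 \circ \mathrm{id}_{\mathrm{Dom}(\tilde\psi_2)} \circ \phi \circ \mathrm{id}_{\mathrm{Dom}(\tilde\psi_1)} \circ \psi_1^{-1}$          (52.1.6)
    $\quad = \psi_2 \circ \mathrm{id}_{\mathrm{Dom}(\tilde\psi_2)} \circ \phi \circ \psi_1^{-1} \circ \mathrm{id}_{\psi_1(\mathrm{Dom}(\tilde\psi_1))}$          (52.1.7)
    $\quad = \psi_2 \circ \phi \circ \psi_1^{-1} \circ \mathrm{id}_{\psi_1(\phi^{-1}(\mathrm{Dom}(\tilde\psi_2)))} \circ \mathrm{id}_{\psi_1(\mathrm{Dom}(\tilde\psi_1))}$          (52.1.8)
    $\quad = \psi_2 \circ \phi \circ \psi_1^{-1} \circ \mathrm{id}_{\psi_1(\phi^{-1}(\mathrm{Dom}(\tilde\psi_2))) \cap \psi_1(\mathrm{Dom}(\tilde\psi_1))}$          (52.1.9)
    $\quad = \psi_2 \circ \phi \circ \psi_1^{-1} \circ \mathrm{id}_{\psi_1(\mathrm{Dom}(\tilde\psi_1) \cap \phi^{-1}(\mathrm{Dom}(\tilde\psi_2)))}$          (52.1.10)
    $\quad = \psi_2 \circ \phi \circ \psi_1^{-1}\big|_{\Omega}$
    $\quad = \hat\phi\big|_{\Omega}$,


where $\Omega = \psi_1(\mathrm{Dom}(\tilde\psi_1) \cap \phi^{-1}(\mathrm{Dom}(\tilde\psi_2)))$, line (52.1.6) follows from Theorem 10.10.15 (x), line (52.1.7)
follows from Theorem 10.10.15 (v), line (52.1.8) follows from Theorem 10.10.15 (iv), line (52.1.9) follows
from Theorem 10.10.15 (ii), and line (52.1.10) follows from Theorem 10.10.15 (iii). Then $\Omega \in \mathrm{Top}(\mathbb{R}^{n_1})$
because $\phi$ is continuous, and $x \in \Omega$ because $p \in \mathrm{Dom}(\tilde\psi_1)$ and $\phi(p) \in \mathrm{Dom}(\tilde\psi_2)$. So $\Omega \in \mathrm{Top}_x(\mathbb{R}^{n_1})$. Thus
for all $x \in \mathrm{Dom}(\hat\phi)$, for some $\Omega \in \mathrm{Top}_x(\mathbb{R}^{n_1})$, $\hat\phi|_\Omega$ is a $C^k$ map. Hence $\psi_2 \circ \phi \circ \psi_1^{-1} = \hat\phi$ is a $C^k$ map by
Theorem 42.5.29 (iii). The converse follows from Definition 52.1.2 and the observation that $C^k$ equivalence
of atlases is a symmetric relation.


Since $\mathrm{Dom}(\psi_2 \circ \phi \circ \psi_1^{-1}) = \psi_1(\mathrm{Dom}(\psi_2 \circ \phi)) = \psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2)))$ by Theorem 10.10.13 (iii, i), it follows
that $\psi_2 \circ \phi \circ \psi_1^{-1}$ is a $C^k$ Cartesian space map if and only if $\hat\phi \in C^k(\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))), \mathrm{Range}(\psi_2))$. Hence
$\phi \in C^k(M_1, M_2)$ if and only if $\forall \psi_1 \in A_1,\ \forall \psi_2 \in A_2,\ \psi_2 \circ \phi \circ \psi_1^{-1} \in C^k(\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))), \mathrm{Range}(\psi_2))$.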

52.1.14 REMARK: Extension of $C^k$ map condition to complete atlases.
It is often useful to know that a $C^k$ map is $C^k$ via all $C^k$-compatible charts, not just the charts in the given
atlases for the domain and target manifolds. This is asserted in Theorem 52.1.15.


52.1.15 THEOREM: Equivalent condition for $C^k$ maps using complete atlases.
Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M_1$ and $M_2$ be $C^k$ manifolds. Then $\phi : M_1 \to M_2$ is a $C^k$ map if and only if


    $\forall \psi_1 \in \mathrm{atlas}^k(M_1),\ \forall \psi_2 \in \mathrm{atlas}^k(M_2),\quad \psi_2 \circ \phi \circ \psi_1^{-1}$ is a $C^k$ map.


In other words, $\forall \psi_1 \in \mathrm{atlas}^k(M_1),\ \forall \psi_2 \in \mathrm{atlas}^k(M_2),\ \psi_2 \circ \phi \circ \psi_1^{-1} \in C^k(\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))), \mathrm{Range}(\psi_2))$.


PROOF: The assertions follow from Theorems 52.1.13 and 51.4.12 (iv).


52.1.16 REMARK:  Differentiability of composites of differentiable manifold maps. 

Theorem 52.1.17 does not mention what the $k$th derivative of a chain $\phi_2 \circ \phi_1$ of $C^k$ maps between $C^k$
manifolds actually is. It is not at all easy to state what this derivative means even for a single map
for general $k \in \mathbb{Z}^+$. Therefore Theorem 52.1.17 is limited to merely asserting differentiability, like the
corresponding Theorem 42.5.27 for Cartesian spaces.


52.1.17 THEOREM: Chain rule for differentiability of maps between manifolds. 
Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M_1$, $M_2$ and $M_3$ be $C^k$ manifolds. Let $\phi_1 \in C^k(M_1, M_2)$ and $\phi_2 \in C^k(M_2, M_3)$. Then
$\phi_2 \circ \phi_1 \in C^k(M_1, M_3)$.


PROOF: The assertion follows by applying Theorem 42.5.27, the corresponding assertion for open subsets
of Cartesian spaces, to the maps $\psi_3 \circ \phi_2 \circ \phi_1 \circ \psi_1^{-1}\big|_{\psi_1(\phi_1^{-1}(\mathrm{Dom}(\psi_2)))} = (\psi_3 \circ \phi_2 \circ \psi_2^{-1}) \circ (\psi_2 \circ \phi_1 \circ \psi_1^{-1})$
for $\psi_\ell \in \mathrm{atlas}(M_\ell)$ for $\ell = 1, 2, 3$.


52.1.18 REMARK: Defining differentiability of maps in terms of pull-backs of global test functions.

The $C^k$ regularity of maps between manifolds is sometimes defined in terms of real-valued test functions. This
gives a more-or-less valid criterion which is formally neat and tidy. In the test-function style of differentiability
definition, a map $\phi : M_1 \to M_2$ is said to be $C^k$ if $f \circ \phi$ is in $C^k(M_1)$ whenever $f \in C^k(M_2)$.
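
A minimal sketch of the test-function criterion in use, added here as an illustration and not taken from the text: the manifolds below are both $\mathbb{R}$ with the identity chart, and the map and test functions are hypothetical choices. It suggests that a single test function can detect non-differentiability, while one test function passing is not enough.

```latex
% Assumptions (illustrative only): M_1 = M_2 = \mathbb{R} with identity charts,
% \phi(x) = |x|, and k = 1.
\begin{align*}
f_1(y) &= y \in C^1(M_2): & (f_1 \circ \phi)(x) &= |x| \notin C^1(M_1), \\
f_2(y) &= y^2 \in C^1(M_2): & (f_2 \circ \phi)(x) &= x^2 \in C^1(M_1).
\end{align*}
% f_1 shows that \phi fails the test-function criterion for k = 1.
% f_1 is also the coordinate function of the identity chart,
% so the chart test \psi_2 \circ \phi \circ \psi_1^{-1} = |x| gives the same verdict.
```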

Despite its formal neatness, the global test-function style of differentiability definition has various drawbacks.
As a minor philosophical concern, the set of test functions $f$ in such a definition is generally extremely infinite,
whereas for finite atlases, Definition 52.1.2 requires only a finite number of chart combinations to be tested
for differentiability. A more serious philosophical concern is that in practice, testing $f \circ \phi$ for regularity
requires the use of charts on each manifold, which implies that the test-function definition is very little
different to Definition 52.1.2 anyway. So any claims that the test-function approach is coordinate-free are
somewhat exaggerated. Nevertheless, just for the record, the equivalence of the coordinate-tainted and
“coordinate-free” definitions is demonstrated in Theorem 52.1.19 for maps which are known to be at least
continuous a-priori.


Various combinations of topological constraints on manifolds are quoted by various authors as sufficient
conditions for the validity of the assertion in Theorem 52.1.19 without needing to assume a-priori that the
map $\phi$ is continuous. The proofs of this validity are typically too difficult for presentation in elementary
textbooks. So one might ask whether the complexity of the theoretical basis is justified by the result which is
achieved. Extending functions is often problematic, often requiring some form of choice axiom for example.
For the assertion in Theorem 52.1.19, partitions of unity are often invoked, and topological constraints such
as second countability and paracompactness are often mentioned. By comparison, the benefit of the global
test-function style of definition is not very great. The differentiability of a map between manifolds can be
achieved in an elementary way by replacing condition (52.1.12) with the slightly stronger condition


    $\forall f \in \mathring{C}^k(M_2, \mathbb{R}),\quad f \circ \phi \in C^k(\phi^{-1}(\mathrm{Dom}(f)), \mathbb{R})$          (52.1.11)


for $\phi \in C^0(M_1, M_2)$. This would simplify the construction of test functions in the proof of Theorem 52.1.19,
although it does not remove the a-priori continuity requirement for $\phi$. This stronger test-function condition
is more closely related to the way differentiability between manifolds is generally thought of and applied,
which is in terms of local differentiability at all points. For general differentiable manifolds $M_1$ and $M_2$, and
general maps $\phi : M_1 \to M_2$, it is not justified to infer condition (52.1.11) from condition (52.1.12).


The notion of differentiability is quintessentially local. Therefore one might reasonably ask why global test
functions should be required for their definition. In much the same way, the classical analytic notion of a
tangent vector is replaced by many authors with a definition which uses derivations, which requires $C^\infty$ test
functions to act on. The motivation for using derivations is apparently to avoid mentioning coordinates,
although the theoretical justification of such an elegant alternative definition is not elementary. (See for
example Gómez-Ruiz [14], pages 21-22.) The derivation-based definition is also very restrictive because
it is well defined only for $C^\infty$ manifolds. Likewise when test functions are used for differentiability of
maps between manifolds, various constraints must be put on the underlying topologies of the manifolds. It
seems simpler and better to define differentiability in the most obvious way, which is via the charts as in
Definition 52.1.2.


52.1.19 THEOREM:  Test-function criterion for differentiability of a map between manifolds. 
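Let $\phi \in C^0(M_1, M_2)$ for $C^k$ manifolds $M_1$ and $M_2$ for some $k \in \overline{\mathbb{Z}}_0^+$. Then $\phi \in C^k(M_1, M_2)$ if and only if


    $\forall f \in C^k(M_2, \mathbb{R}),\quad f \circ \phi \in C^k(M_1, \mathbb{R})$.          (52.1.12)

In other words, $C^k(M_1, M_2) = \{\phi \in C^0(M_1, M_2);\ \forall f \in C^k(M_2, \mathbb{R}),\ f \circ \phi \in C^k(M_1, \mathbb{R})\}$.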


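PROOF: Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. Let $\phi : M_1 \to M_2$ be of class $C^k$. Let $\psi_1 \in \mathrm{atlas}(M_1)$
and $\psi_2 \in \mathrm{atlas}(M_2)$. Then by Definition 52.1.2, $\psi_2 \circ \phi \circ \psi_1^{-1}$ is of class $C^k$. A function $f : M_2 \to \mathbb{R}$ is of
class $C^k$ (by Definition 51.6.2) if and only if $f \circ \psi_2^{-1}$ is $C^k$ for all $\psi_2 \in \mathrm{atlas}(M_2)$. Let $f : M_2 \to \mathbb{R}$ be $C^k$.
Then $f \circ \psi_2^{-1}$ is $C^k$. So $(f \circ \psi_2^{-1}) \circ (\psi_2 \circ \phi \circ \psi_1^{-1})$ is $C^k$. But


    $\forall \psi_2 \in \mathrm{atlas}(M_2),\quad (f \circ \psi_2^{-1}) \circ (\psi_2 \circ \phi \circ \psi_1^{-1}) = f\big|_{\mathrm{Dom}(\psi_2)} \circ \phi \circ \psi_1^{-1}$
    $\qquad = f \circ \phi \circ \psi_1^{-1}\big|_{\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2)))}$.

So $f \circ \phi \circ \psi_1^{-1}$ is $C^k$ on all subsets $\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2)))$ of $\mathrm{Range}(\psi_1)$ for $\psi_2 \in \mathrm{atlas}(M_2)$. These subsets
are open because $\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))) = \psi_1(\phi^{-1}(\psi_2^{-1}(\mathbb{R}^{n_2}))) = (\psi_2 \circ \phi \circ \psi_1^{-1})^{-1}(\mathbb{R}^{n_2})$ and $\psi_2 \circ \phi \circ \psi_1^{-1}$ is
continuous. These open subsets cover all of $\mathrm{Range}(\psi_1)$ because the domains of $\psi_2$ cover all of $M_2$, and so
the sets $\phi^{-1}(\mathrm{Dom}(\psi_2))$ cover all of $M_1$. Therefore $f \circ \phi \circ \psi_1^{-1} \in C^k(\mathrm{Range}(\psi_1), \mathbb{R})$ for all $\psi_1 \in \mathrm{atlas}(M_1)$.
It then follows by Definition 51.6.2 that $f \circ \phi \in C^k(M_1, \mathbb{R})$.

For the converse, let $\phi : M_1 \to M_2$ satisfy line (52.1.12). Let $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_2 \in \mathrm{atlas}(M_2)$. Let
$x \in \Omega = \mathrm{Range}(\psi_2)$. Let $\rho = d(x, \mathbb{R}^{n_2} \setminus \Omega)$. Then $\rho \in \mathbb{R}^+$. Let $r, R \in \mathbb{R}^+$ with $r < R < \rho$. Let
$j \in \mathbb{N}_{n_2}$. Then by Theorem 51.8.5, there is a function $f^j \in C^k(M_2, \mathbb{R})$ such that $f^j(p) = \psi_2^j(p)$ for all
$p \in \psi_2^{-1}(B_{x,r})$ and $f^j(p) = 0$ for all $p \in M_2 \setminus \psi_2^{-1}(B_{x,R})$. So by line (52.1.12), $f^j \circ \phi \in C^k(M_1, \mathbb{R})$.
Therefore $f^j \circ \phi \circ \psi_1^{-1} \in C^k(\mathrm{Range}(\psi_1), \mathbb{R})$ by Definition 51.6.2. Since $f^j\big|_{\psi_2^{-1}(B_{x,r})} = \psi_2^j\big|_{\psi_2^{-1}(B_{x,r})}$, it
follows that


    $\psi_2^j \circ \phi \circ \psi_1^{-1}\big|_{\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r})))} = \psi_2^j\big|_{\psi_2^{-1}(B_{x,r})} \circ \phi \circ \psi_1^{-1}$
    $\qquad = f^j\big|_{\psi_2^{-1}(B_{x,r})} \circ \phi \circ \psi_1^{-1}$
    $\qquad = f^j \circ \phi \circ \psi_1^{-1}\big|_{\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r})))}$,


where $\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r})))$ is an open subset of $\mathrm{Range}(\psi_1)$ since $\psi_2 \circ \phi \circ \psi_1^{-1}$ is a continuous map from
$\mathrm{Range}(\psi_1)$ to $\mathrm{Range}(\psi_2)$ because $\phi \in C^0(M_1, M_2)$, and $\psi_1 \circ \phi^{-1} \circ \psi_2^{-1} = (\psi_2 \circ \phi \circ \psi_1^{-1})^{-1}$. Therefore


    $\psi_2 \circ \phi \circ \psi_1^{-1}\big|_{\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r})))} = \big(\psi_2^j \circ \phi \circ \psi_1^{-1}\big|_{\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r})))}\big)_{j=1}^{n_2}$
    $\qquad = \big(f^j \circ \phi \circ \psi_1^{-1}\big|_{\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r})))}\big)_{j=1}^{n_2}$
    $\qquad \in C^k(\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r}))), \mathbb{R}^{n_2})$.


The collection of sets $\{\psi_1(\phi^{-1}(\psi_2^{-1}(B_{x,r})));\ x \in \mathrm{Range}(\psi_2),\ 0 < r < d(x, \mathbb{R}^{n_2} \setminus \mathrm{Range}(\psi_2))\}$ is an open
cover for $\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2)))$ because $\psi_2(\phi(\psi_1^{-1}(y))) \in \mathrm{Range}(\psi_2)$ for all $y \in \psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2)))$. So by
Theorem 42.2.14, $\psi_2 \circ \phi \circ \psi_1^{-1} \in C^k(\psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))), \mathbb{R}^{n_2})$ for all $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_2 \in \mathrm{atlas}(M_2)$.
Hence $\phi \in C^k(M_1, M_2)$ by Definition 52.1.2.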


52.1.20 REMARK: Excessive hiding of coordinates. 
Figure 52.1.2 illustrates the sets and maps in Theorem 52.1.19.


[Figure 52.1.2: $C^k$ differentiability of a map via test functions]


The proof of Theorem 52.1.19 exemplifies how attempts to do differential geometry in a coordinate-free
fashion really only hide the coordinates. To be precise, whenever one invokes the space of $C^k$ functions
$f : M \to \mathbb{R}$ as a test space on which to work “coordinate-free”, the real-valued functions $f$ are themselves
coordinates. There is very little difference indeed between the individual coordinates $\psi^i$ of a $C^k$ chart $\psi$ and
a $C^k$ real-valued function.


This thinking may be applied similarly to tangent operators as defined in Section 54.11. These are defined
on $C^1$ test functions $f : M \to \mathbb{R}$, but this is equivalent to defining operators on chart coordinates $\psi^i :
M \to \mathbb{R}$. In fact, a tangent operator (in Definition 54.11.2) of the form $\partial_{p,v,\psi}$ acting on a function $f = \psi^i$
yields $\partial_{p,v,\psi}(f) = v^i$. This shows again the equivalence of chart coordinates and test functions.


52.2. Diffeomorphisms and pull-back atlases


52.2.1 REMARK: Differentiable manifold diffeomorphisms and other morphisms. 

Differentiable manifold diffeomorphisms, and other kinds of differentiable morphisms, can be defined in the
same way as for Cartesian spaces in Definition 42.7.2. A $C^k$ diffeomorphism from a $C^k$ manifold $M_1$ to a
$C^k$ manifold $M_2$ is simply a bijection $\phi : M_1 \to M_2$ such that $\phi : M_1 \to M_2$ and $\phi^{-1} : M_2 \to M_1$ are $C^k$
differentiable maps.


52.2.2 DEFINITION:  Differentiable morphisms between open subsets of differentiable manifolds. 
Let $M_1$, $M_2$ and $M$ be $C^k$ differentiable manifolds for some $k \in \overline{\mathbb{Z}}_0^+$.


A $C^k$ differentiable homomorphism from a set $\Omega_1 \in \mathrm{Top}(M_1)$ to a set $\Omega_2 \in \mathrm{Top}(M_2)$ is a map $\phi : \Omega_1 \to \Omega_2$
such that $\phi$ is $C^k$ differentiable.


A $C^k$ differentiable monomorphism from a set $\Omega_1 \in \mathrm{Top}(M_1)$ to a set $\Omega_2 \in \mathrm{Top}(M_2)$ is an injective $C^k$
differentiable homomorphism $\phi : \Omega_1 \to \Omega_2$.


A $C^k$ differentiable epimorphism from a set $\Omega_1 \in \mathrm{Top}(M_1)$ to a set $\Omega_2 \in \mathrm{Top}(M_2)$ is a surjective $C^k$
differentiable homomorphism $\phi : \Omega_1 \to \Omega_2$.


A $C^k$ differentiable isomorphism or $C^k$ diffeomorphism from a set $\Omega_1 \in \mathrm{Top}(M_1)$ to a set $\Omega_2 \in \mathrm{Top}(M_2)$
is a bijective $C^k$ differentiable homomorphism $\phi : \Omega_1 \to \Omega_2$ such that $\phi^{-1} : \Omega_2 \to \Omega_1$ is a $C^k$ differentiable
homomorphism from $\Omega_2$ to $\Omega_1$.


A $C^k$ differentiable endomorphism of a set $\Omega \in \mathrm{Top}(M)$ is a $C^k$ differentiable homomorphism $\phi : \Omega \to \Omega$.


A $C^k$ differentiable automorphism of a set $\Omega \in \mathrm{Top}(M)$ is a $C^k$ differentiable isomorphism $\phi : \Omega \to \Omega$.


52.2.3 DEFINITION: $C^k$ manifolds $M_1$ and $M_2$ are said to be $C^k$-diffeomorphic, for $k \in \overline{\mathbb{Z}}_0^+$, if there exists
a $C^k$ diffeomorphism from $M_1$ to $M_2$.
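

The requirement that the inverse also be differentiable is not automatic. The standard example below is an added illustration, assuming $M_1 = M_2 = \mathbb{R}$ with the identity chart on both sides. Here $y^{1/3}$ denotes the real cube root.

```latex
% Assumptions (illustrative only): M_1 = M_2 = \mathbb{R}, identity chart \psi(x) = x on both.
\begin{align*}
\phi(x) &= x^3: & \psi \circ \phi \circ \psi^{-1} &= x^3 \in C^\infty(\mathbb{R}, \mathbb{R}), \\
\phi^{-1}(y) &= y^{1/3}: & \psi \circ \phi^{-1} \circ \psi^{-1} &= y^{1/3} \in C^0(\mathbb{R}, \mathbb{R}) \setminus C^1(\mathbb{R}, \mathbb{R}).
\end{align*}
% \phi is a C^\infty bijection and a homeomorphism,
% but it is not a C^k diffeomorphism for any k \ge 1,
% because \phi^{-1} has no derivative at y = 0.
```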


52.2.4 REMARK: Ambiguous language for diffeomorphisms. 

If the regularity class $C^k$ is not specified, $k = \infty$ is often assumed. Thus two differentiable manifolds are often
said to be “diffeomorphic” when they are both $C^\infty$ manifolds and are $C^\infty$-diffeomorphic. It is preferable to
state the regularity class of a diffeomorphism explicitly.


52.2.5 REMARK: The pull-back and push-forth of charts and atlases via diffeomorphisms. 

It seems intuitively clear that if two manifolds are $C^k$ diffeomorphic, then any atlas on one manifold can
be pulled back or pushed forth to construct an equivalent atlas on the other manifold. This is asserted in
Theorem 52.2.6 (ii). (See Definition 49.11.7 and Theorem 49.11.8 for pull-back atlases for locally Cartesian
spaces.) In fact, the converse is true, which is also intuitively clear.


52.2.6 THEOREM: The pull-back of charts and atlases via a diffeomorphism. 
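Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M_1$ and $M_2$ be $C^k$ manifolds. Let $\phi : M_1 \to M_2$ be a bijection.


(i) $\phi$ is a $C^k$ diffeomorphism from $M_1$ to $M_2$ if and only if
    $\forall \psi_2 \in \mathrm{atlas}(M_2),\ \psi_2 \circ \phi \in \mathrm{atlas}^k(M_1)$.


(ii) $\phi$ is a $C^k$ diffeomorphism from $M_1$ to $M_2$ if and only if $\{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}(M_2)\}$ is a $C^k$ atlas for $M_1$
    which is $C^k$ compatible with $\mathrm{atlas}(M_1)$.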


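PROOF: For part (i), suppose that $\phi : M_1 \to M_2$ is a $C^k$ diffeomorphism. Let $\psi_2 \in \mathrm{atlas}(M_2)$. Then
$\mathrm{Dom}(\psi_2 \circ \phi) = \phi^{-1}(\mathrm{Dom}(\psi_2)) \in \mathrm{Top}(M_1)$ by Theorem 10.10.13 (i) and Definitions 52.1.2 and 31.12.4. Let
$\psi_1 \in \mathrm{atlas}(M_1)$. Let $\psi_1' = \psi_2 \circ \phi$. Then $\psi_1' \circ \psi_1^{-1} \in \mathring{C}^k(\mathbb{R}^{n_1}, \mathbb{R}^{n_2})$ by Definition 52.1.2 and Notation 42.5.14,
where $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. Similarly, $\psi_1 \circ (\psi_1')^{-1} = \psi_1 \circ \phi^{-1} \circ \psi_2^{-1} \in \mathring{C}^k(\mathbb{R}^{n_2}, \mathbb{R}^{n_1})$.
Therefore $\psi_2 \circ \phi \in \mathrm{atlas}^k(M_1)$ by Notation 51.4.7.

Conversely, suppose that $\forall \psi_2 \in \mathrm{atlas}(M_2),\ \psi_2 \circ \phi \in \mathrm{atlas}^k(M_1)$. Let $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_2 \in \mathrm{atlas}(M_2)$.
Then $(\psi_2 \circ \phi) \circ \psi_1^{-1} \in \mathring{C}^k(\mathbb{R}^{n_1}, \mathbb{R}^{n_2})$ and $\psi_1 \circ (\psi_2 \circ \phi)^{-1} \in \mathring{C}^k(\mathbb{R}^{n_2}, \mathbb{R}^{n_1})$ by Notation 51.4.7 and
Definition 51.3.2 (iii). Thus $\phi \in C^k(M_1, M_2)$ and $\phi^{-1} \in C^k(M_2, M_1)$ by Definition 52.1.2. Hence $\phi$ is a $C^k$
diffeomorphism from $M_1$ to $M_2$ by Definition 52.2.2.

For part (ii), suppose that $\phi : M_1 \to M_2$ is a $C^k$ diffeomorphism. Let $A_1 = \{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}(M_2)\}$.
Then $A_1 \subseteq \mathrm{atlas}^k(M_1)$ by part (i). Let $p_1 \in M_1$. Then $\phi(p_1) \in \mathrm{Dom}(\psi_2)$ for some $\psi_2 \in \mathrm{atlas}(M_2)$. So
$p_1 \in \phi^{-1}(\mathrm{Dom}(\psi_2)) = \mathrm{Dom}(\psi_2 \circ \phi)$ by Theorem 10.10.13 (i). Thus the charts in $A_1$ cover $M_1$. Therefore
$A_1$ is a $C^k$ atlas for $M_1$ by Definition 51.3.2, and $A_1$ is $C^k$ compatible with $\mathrm{atlas}(M_1)$ by Definition 51.4.10.


Conversely, suppose that $A_1 = \{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}(M_2)\}$ is a $C^k$ atlas for $M_1$ which is $C^k$ compatible
with $\mathrm{atlas}(M_1)$. Then $\psi_2 \circ \phi \in \mathrm{atlas}^k(M_1)$ for all $\psi_2 \in \mathrm{atlas}(M_2)$ by Definition 51.4.10. Hence $\phi$ is a $C^k$
diffeomorphism from $M_1$ to $M_2$ by part (i).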


52.2.7 REMARK:  Pull-back atlases for diffeomorphisms between differentiable manifolds. 

Definition 52.2.8 and Theorem 52.2.9 are the differentiable manifold versions of Definition 49.11.7 and The- 
orem 49.11.8, which are stated for locally Cartesian spaces. Differentiable manifolds always have a specified 
atlas. So Definition 52.2.8 uses that specified atlas, not an arbitrary atlas as in Definition 49.11.7. 


52.2.8 DEFINITION: The pull-back atlas for a $C^k$ manifold $M_1$, for $k \in \bar{\mathbb{Z}}^+_0$, from a $C^k$ manifold $M_2$ via a
$C^k$ diffeomorphism $\phi : M_1 \to M_2$, is the set $\{\psi \circ \phi;\ \psi \in \mathrm{atlas}(M_2)\}$.


Alternative name: induced atlas. 


52.2.9 THEOREM: Some basic properties of pull-back atlases for differentiable manifolds.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $\phi : M_1 \to M_2$ be a $C^k$ diffeomorphism between $C^k$ manifolds $M_1$ and $M_2$. Let $A_1$ be the
pull-back atlas on $M_1$ from $M_2$ via $\phi$.


(i) $A_1$ is a $C^k$ atlas for $M_1$.

(ii) $A_1 \subseteq \mathrm{atlas}^k(M_1)$.
(iii) $\forall p_1 \in M_1,\ \{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} = \{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}_{\phi(p_1)}(M_2)\}$.
(iv) If $\mathrm{atlas}(M_2) = \mathrm{atlas}^k(M_2)$, then $A_1 = \mathrm{atlas}^k(M_1)$.


PROOF: Part (i) follows from Theorem 52.2.6 (ii). 
Part (ii) follows from Theorem 52.2.6 (i). 


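For part (iii), let $p_1 \in M_1$ and $\psi_1 \in A_1$ with $p_1 \in \mathrm{Dom}(\psi_1)$. Then $\psi_1 = \psi_2 \circ \phi$ for some $\psi_2 \in \mathrm{atlas}(M_2)$,
and $\phi(p_1) \in \phi(\mathrm{Dom}(\psi_1)) = \mathrm{Dom}(\psi_1 \circ \phi^{-1})$ by Theorem 10.10.13 (iv). Therefore $\phi(p_1) \in \mathrm{Dom}(\psi_2)$. Thus
$\{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} \subseteq \{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}_{\phi(p_1)}(M_2)\}$.

For the reverse inclusion, let $\psi_1 \in \{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}_{\phi(p_1)}(M_2)\}$. Then $\psi_1 = \psi_2 \circ \phi$ for some $\psi_2 \in \mathrm{atlas}(M_2)$,
and $\phi(p_1) \in \mathrm{Dom}(\psi_2)$. Therefore $\psi_1 \in A_1$ and $p_1 \in \phi^{-1}(\mathrm{Dom}(\psi_2)) = \mathrm{Dom}(\psi_2 \circ \phi)$ by Theorem 10.10.13 (i).
So $p_1 \in \mathrm{Dom}(\psi_1)$. Thus $\{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} \supseteq \{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}_{\phi(p_1)}(M_2)\}$. Hence
$\{\psi_1 \in A_1;\ p_1 \in \mathrm{Dom}(\psi_1)\} = \{\psi_2 \circ \phi;\ \psi_2 \in \mathrm{atlas}_{\phi(p_1)}(M_2)\}$.

For part (iv), suppose that $\mathrm{atlas}(M_2) = \mathrm{atlas}^k(M_2)$. Let $\psi_1 \in \mathrm{atlas}^k(M_1)$. Then $\psi_1 : U_1 \to G_1$ is a
homeomorphism for some $U_1 \in \mathrm{Top}(M_1)$ and $G_1 \in \mathrm{Top}(\mathbb{R}^{n_1})$, where $n_1 = \dim(M_1)$. Let $\psi_2 = \psi_1 \circ \phi^{-1}$ and
$U_2 = \phi(U_1)$. Then $\mathrm{Dom}(\psi_2) = \phi(\mathrm{Dom}(\psi_1))$ by Theorem 10.10.13 (iii). So $\mathrm{Dom}(\psi_2) = \phi(U_1) \in \mathrm{Top}(M_2)$
by Definitions 31.14.2 and 31.12.4. But $\mathrm{Range}(\psi_2) = \psi_1(\mathrm{Dom}(\phi))$ by Theorem 10.10.13 (iv). Therefore
$\mathrm{Range}(\psi_2) = \mathrm{Range}(\psi_1) = G_1 \in \mathrm{Top}(\mathbb{R}^{n_1})$. So $\psi_2 \in \mathrm{atlas}^0(M_2)$. Then to show that $\psi_2 \in \mathrm{atlas}^k(M_2)$, let
$\psi \in \mathrm{atlas}(M_2)$. Then $\psi_2 \circ \psi^{-1} = \psi_1 \circ \phi^{-1} \circ \psi^{-1}$ is a $C^k$ map by Definition 52.1.2 because $\phi^{-1}$ is $C^k$.
Similarly, $\psi \circ \psi_2^{-1} = \psi \circ \phi \circ \psi_1^{-1}$ is a $C^k$ map because $\phi$ is $C^k$. Thus $\psi_2$ is $C^k$ compatible with $M_2$ by
Definition 51.4.2. So $\psi_2 \in \mathrm{atlas}^k(M_2)$ by Definition 51.4.5. Therefore $\psi_1 \in A_1$. Thus $A_1 \supseteq \mathrm{atlas}^k(M_1)$.
Hence $A_1 = \mathrm{atlas}^k(M_1)$ by part (ii).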
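
The correspondence in part (iii) of Theorem 52.2.9 can be illustrated by a small computational sketch. Everything in it is an assumption made for illustration only: the manifolds are both taken to be the real line, the map `phi` is the hypothetical diffeomorphism x -> x^3 + x, the two charts in `atlas_M2` are invented, and a chart is modelled naively as a domain predicate together with a coordinate map.

```python
import math

# A chart is modelled as a domain predicate plus a coordinate map (illustrative only).
def make_chart(in_domain, coord):
    return {"in_domain": in_domain, "coord": coord}

# Hypothetical diffeomorphism phi : M1 -> M2 with M1 = M2 = R.
def phi(x):
    return x ** 3 + x  # smooth and strictly increasing, so the inverse is smooth too

# Hypothetical atlas on M2 consisting of two overlapping charts.
atlas_M2 = [
    make_chart(lambda y: y < 1.0, lambda y: y),
    make_chart(lambda y: y > -1.0, lambda y: math.atan(y)),
]

def pull_back(chart, diffeo):
    """Return the chart psi2 o phi, whose domain is phi^{-1}(Dom(psi2))."""
    return make_chart(lambda x: chart["in_domain"](diffeo(x)),
                      lambda x: chart["coord"](diffeo(x)))

# The pull-back atlas A1 = {psi2 o phi; psi2 in atlas(M2)}.
A1 = [pull_back(psi2, phi) for psi2 in atlas_M2]

def charts_at(atlas, p):
    """Indices of the charts in the atlas whose domain contains p."""
    return [i for i, c in enumerate(atlas) if c["in_domain"](p)]

def check_part_iii(p1):
    # Charts of A1 at p1 correspond exactly to charts of atlas(M2) at phi(p1).
    return charts_at(A1, p1) == charts_at(atlas_M2, phi(p1))
```

For instance, `charts_at(A1, 1.0)` selects only the second chart, because phi(1) = 2 lies outside the domain of the first chart of `atlas_M2`.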


52.3. Differentiable submanifold point-sets 


52.3.1 REMARK: Spaces and maps for submanifolds of a differentiable manifold. 

Submanifolds, embeddings, immersions and submersions are defined for differentiable manifolds by adding 
differentiable structure to topological manifolds and defining the regularity of the various submanifold-related 
spaces and maps to make them locally fit differentiably with their ambient spaces. (See Section 50.2 for 
submanifolds of topological manifolds.)


The principal basic classes of spaces and maps related to submanifolds are summarised in Remark 50.2.3
for topological manifolds, although almost all of the book references in Remark 50.2.3 give definitions of
submanifolds, embeddings, immersions and submersions for differentiable manifolds, not topological.


An important difference between topological submanifolds and differentiable submanifolds is that in the
topological case, there is only one obvious natural choice for the topology on a subset, namely the relative 
topology, and the subset’s topology determines whether it is a topological manifold or not. In other words, 
charts and atlases are not required for the basic definitions. In the differentiable case, however, both the 
ambient space and the subset must have an atlas to specify the structure. Therefore it is not sufficient to 
examine only the subset of a manifold. One must also examine the atlas of that subset in relation to the 
atlas of the ambient space. 


52.3.2 REMARK:  Differentiable manifold inclusions in differentiable manifolds. 

The inclusion map $\mathrm{id}_S : S \to M$ in Definition 52.3.3, defined by $\mathrm{id}_S : p \mapsto p$ for $p \in S$, is a $C^k$ differentiable
function if and only if it is differentiable when viewed via charts in $\mathrm{atlas}(S)$ and $\mathrm{atlas}(M)$ for the domain
and range respectively. So the differentiability depends on both atlases.


The name “submanifold” is not used in Definition 52.3.3 because the concept described here is no more than 
an embedding of one manifold within another (without any regularity condition), where the embedding map 
happens to be the set inclusion map, which is the identity map on the subset. This definition is more useful 
when combined with a graph-style regularity condition as in Definition 52.4.2. 


52.3.3 DEFINITION: A $C^k$-included (differentiable) manifold in a $C^k$ manifold $M$, for $k \in \bar{\mathbb{Z}}^+_0$, is a $C^k$
manifold $S$ such that $S \subseteq M$ and $\mathrm{id}_S : S \to M$ is a $C^k$ map with respect to the atlases on $S$ and $M$.


The map $\mathrm{id}_S$ is then called a $C^k$ (differentiable) inclusion of the $C^k$ manifold $S$ in the $C^k$ manifold $M$.


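52.3.4 EXAMPLE: Let $M = \mathbb{R}^2$ and $A_M = \{\mathrm{id}_M\}$. Then $(M, A_M)$ is a $C^\infty$ manifold with $\dim(M) = 2$.
Let $S = \{(x_1, x_2) \in \mathbb{R}^2;\ x_2 = |x_1|\}$. For $\alpha \in \mathbb{Z}^+_0$, let $A_S^\alpha = \{\psi_\alpha\}$, where $\psi_\alpha : S \to \mathbb{R}$ is defined by
$\psi_\alpha((x_1, x_2)) = x_1^{1/(1+2\alpha)}$ for all $(x_1, x_2) \in S$. Then $(S, A_S^\alpha)$ is a $C^\infty$ manifold because it has only one chart. (The
case $\alpha = 2$ is illustrated in Figure 52.3.1.)


Figure 52.3.1 The chart $\psi_2 : S \to \mathbb{R}$, with $\psi_2 : (x_1, x_2) \mapsto x_1^{1/5}$


Define $\gamma_\alpha = \mathrm{id}_M \circ \mathrm{id}_S \circ \psi_\alpha^{-1} : \mathbb{R} \to \mathbb{R}^2$ for $\alpha \in \mathbb{Z}^+_0$. Then $\gamma_\alpha(t) = (t^{1+2\alpha}, |t^{1+2\alpha}|)$ for all $t \in \mathbb{R}$. So
$\gamma_\alpha \in C^{2\alpha}(\mathbb{R}, \mathbb{R}^2)$ for all $\alpha \in \mathbb{Z}^+_0$. Therefore by Definition 52.1.2, $\mathrm{id}_S : S \to M$ is a $C^{2\alpha}$ map, but not a
$C^{1+2\alpha}$ map. Hence $(S, A_S^\alpha)$ is a $C^{2\alpha}$-included manifold in $M$, but not a $C^{1+2\alpha}$-included manifold in $M$.

Thus the choice of atlas for $S$ determines the differentiability of its inclusion within $M$. (This is called an
“inclusion”, not an “embedding”, because according to Definition 52.5.3, an embedding is a more general
kind of map from one manifold to another. The inclusion map $\mathrm{id}_S : S \to M$ is a special kind of embedding.)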
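
The differentiability order claimed in Example 52.3.4 can be checked by a short exact computation. The following is only a sketch under stated assumptions: the function names are hypothetical, and the test used is that the second component g(t) = |t|^n of the curve, with n = 1 + 2*alpha, has a continuous derivative of a given order at t = 0 exactly when the one-sided limits of the derivative formulas on t > 0 and t < 0 agree there.

```python
from math import factorial

def one_sided_derivative_limits(alpha, order):
    """Limits at t = 0 of the order-th derivative of g(t) = |t|**n, n = 1 + 2*alpha.

    For t > 0, g(t) = t**n, and for t < 0, g(t) = (-t)**n.  Differentiating
    `order` times (order <= n) gives c * t**(n - order) on the right and
    (-1)**order * c * (-t)**(n - order) on the left, where c = n!/(n - order)!.
    Returns the pair (right limit, left limit).
    """
    n = 1 + 2 * alpha
    if order > n:
        raise ValueError("order must not exceed 1 + 2*alpha")
    c = factorial(n) // factorial(n - order)
    if order < n:
        return 0, 0  # both one-sided limits vanish at t = 0
    return c, (-1) ** order * c  # order == n: the limits are n! and (-1)**n * n!

def max_continuous_order(alpha):
    """Largest k such that the one-sided limits agree for all orders up to k."""
    n = 1 + 2 * alpha
    k = 0
    while k < n:
        right, left = one_sided_derivative_limits(alpha, k + 1)
        if right != left:
            break
        k += 1
    return k
```

Under these assumptions, `max_continuous_order(alpha)` evaluates to `2 * alpha`, in agreement with the statement that the inclusion is of class C^{2 alpha} but not C^{1 + 2 alpha}.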


52.3.5 REMARK: Natural differentiable structure for submanifolds. 

In Example 52.3.4, the natural chart $\psi_0$ for the subset $S$ of $M$ makes the manifold $(S, \{\psi_0\})$ a $C^0$-included
manifold of $M$, as one would expect. The charts $\psi_\alpha$ with $\alpha > 0$ are not so natural, in the sense that they
have infinite "velocity" near the vertex $(0,0)$ of $S$. (This corresponds to the observation that the inverse of
$\psi_\alpha$ is a curve $\gamma_\alpha$ which has zero velocity at the vertex for $\alpha \ge 1$.)


To ensure “naturality” of the atlas on a submanifold, one could either require the atlas on the subset S to 
be automatically derived from the atlas on the ambient set M in some way, such as by local projections of 
some kind, or one may allow any atlas to be provided for S, subject to some kind of test that the atlas has 
a “natural” relation to the ambient space. 


One argument against the use of an automatically generated atlas for a subset of a manifold is that there
are many ways to do this. For example, the subset $S^1 = \{x \in \mathbb{R}^2;\ |x| = 1\}$ may clearly be given many
different $C^\infty$ atlases which are “natural” in the sense that they have a smooth, bounded “velocity” relative
to the standard chart for $\mathbb{R}^2$. For example, projections may be defined in various directions, with various
"velocities" and domains. An atlas with two charts would suffice, but one may easily include countably
or uncountably infinitely many charts according to a range of criteria. When the subset has no obvious
symmetries, the choice of a specific atlas is even more difficult. One may require the atlas construction
to be rotation-independent, or $C^k$ diffeomorphism independent, or independent of various other classes
of transformations of the ambient space. Choosing a complete atlas with respect to some differentiability
criterion will make the submanifold fail to have higher levels of differentiability. (For example, $C^1$ completion
of an atlas makes it fail to be $C^2$, and $C^\infty$ completion makes it fail to be real-analytic.)
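
As one concrete instance of a two-chart atlas for the unit circle, the following sketch uses stereographic projection from the two poles. This particular choice of projections is an assumption made for illustration, not a construction taken from the text, and the function names are hypothetical. The transition map between the two charts turns out to be u -> 1/u, which is smooth on its domain u != 0.

```python
def stereo_north(x, y):
    """Chart on the circle minus (0, 1): project from the north pole onto the x-axis."""
    return x / (1.0 - y)

def stereo_south(x, y):
    """Chart on the circle minus (0, -1): project from the south pole onto the x-axis."""
    return x / (1.0 + y)

def stereo_north_inverse(u):
    """Inverse of stereo_north: the point of the circle with north coordinate u."""
    d = 1.0 + u * u
    return (2.0 * u / d, (u * u - 1.0) / d)

def transition(u):
    """The map stereo_south o stereo_north^{-1}, defined for u != 0."""
    x, y = stereo_north_inverse(u)
    return stereo_south(x, y)
```

Replacing either projection by one in a different direction, or reparametrising its range, gives another equally "natural" atlas, which is the non-uniqueness that the paragraph above describes.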


All things considered, it seems reasonable to distinguish two notions of what a submanifold is. It can be a
subset of the ambient space which is everywhere locally the graph of a $C^k$ function, as in Definition 52.3.7,
or it can be a subset with its own atlas, i.e. a manifold, for which the subset’s atlas is $C^k$ compatible with
the ambient space's atlas in some way, as in Definition 52.4.2. In the first case, there is no atlas, but an atlas
clearly can be chosen. (This is shown in Theorem 52.4.11.) In the second case, a particular differentiable
structure for the subset is specified by its own atlas, and the $C^k$ extensibility of the inclusion map signifies
that it is included in the ambient space in a $C^k$ compatible way. Since a substantial proportion of the
differential geometry literature is concerned with the classical geometry of manifolds embedded in Cartesian
spaces, the choice of definitions is not entirely without importance.


52.3.6 REMARK: Regular differentiable submanifold point-sets in manifolds. 

Definition 52.3.7 requires the existence of local charts via which a subset of the manifold is diffeomorphic to
a hyperplane in a Cartesian space. These local charts must be $C^k$ compatible with $\mathrm{atlas}(M)$, not necessarily
elements of $\mathrm{atlas}(M)$. (See Definition 51.4.5 and Notation 51.4.7 for $\mathrm{atlas}^k_p(M)$, the set of $C^k$ compatible
charts for $M$ whose domain contains $p$.) Since no atlas is provided for the set $S$, the dimension of $S$ must be inferred as an
integer for which condition (52.3.1) is satisfied. This integer is uniquely determined by $S$ if $S$ is non-empty.


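52.3.7 DEFINITION: Regular differentiable submanifold point-set.
An $m$-dimensional regular $C^k$ (differentiable) submanifold point-set in a $C^k$ manifold $M$, for $m \in \mathbb{Z}^+_0$ with
$m \le n = \dim(M)$, and $k \in \bar{\mathbb{Z}}^+_0$, is a subset $S$ of $M$ which satisfies


$\forall p \in S,\ \exists \psi \in \mathrm{atlas}^k_p(M),\quad \mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$. (52.3.1)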


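52.3.8 THEOREM: Equivalent zero-graph condition for regularity of a $C^k$ submanifold point-set.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold with $n = \dim(M)$. Let $S \subseteq M$ and $m \in \mathbb{Z}^+_0$ with $m \le n$.


(i) $S$ is an $m$-dimensional regular $C^k$ submanifold point-set of $M$ if and only if


$\forall p \in S,\ \exists \psi \in \mathrm{atlas}^k_p(M),\quad \psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.


(ii) $S$ is an $m$-dimensional regular $C^k$ submanifold point-set of $M$ if and only if


$\forall p \in S,\ \exists \psi \in \mathrm{atlas}^k_p(M),\ \exists \Omega \in \mathrm{Top}(\mathbb{R}^m),\quad \psi(S) = \{(x, 0_{\mathbb{R}^{n-m}});\ x \in \Omega\} = h_0|_\Omega$, (52.3.2)


where $h_0 : \mathbb{R}^m \to \mathbb{R}^{n-m}$ is defined by $h_0(x) = 0_{\mathbb{R}^{n-m}}$ for all $x \in \mathbb{R}^m$.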


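PROOF: For part (i), let $p \in S$ and $\psi \in \mathrm{atlas}^k_p(M)$. The condition $\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ is
equivalent to $\psi(\mathrm{Dom}(\psi) \cap S) = \psi(\psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}))$ because $\psi$ is injective. This is then equivalent to
$\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ by Theorems 10.9.12 (ii") and 10.10.2 (i). The assertion follows.

For part (ii), note that $\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\} = \{(x, 0_{\mathbb{R}^{n-m}});\ x \in \mathbb{R}^m\} = \{(x, h_0(x));\ x \in \mathbb{R}^m\} = h_0$.
Suppose that $S$ is an $m$-dimensional regular $C^k$ submanifold point-set of $M$. Let $p \in S$. Then $\psi(S) = h_0 \cap
\mathrm{Range}(\psi)$ for some $\psi \in \mathrm{atlas}^k_p(M)$ by part (i). Let $\Omega = \Pi_1^m(\psi(S)) = \Pi_1^m(h_0 \cap \mathrm{Range}(\psi))$. Then $\Omega \in \mathrm{Top}(\mathbb{R}^m)$
by Theorem 32.10.3 (i). Since $\psi(S) \subseteq \mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}$, it follows that $\psi(S) = \{(x, 0_{\mathbb{R}^{n-m}});\ (x, 0_{\mathbb{R}^{n-m}}) \in
\psi(S)\}$. So $\Pi_1^m(\psi(S)) = \{x;\ (x, 0_{\mathbb{R}^{n-m}}) \in \psi(S)\}$. Therefore $\psi(S) = \{(x, 0_{\mathbb{R}^{n-m}});\ x \in \Pi_1^m(\psi(S))\} = h_0|_\Omega$.
This verifies line (52.3.2).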


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




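Now assume condition (52.3.2). Let $p \in S$. Then $\psi(S) = h_0|_\Omega$ for some $\psi \in \mathrm{atlas}^k_p(M)$ and $\Omega \in \mathrm{Top}(\mathbb{R}^m)$.
Let $U = \psi^{-1}(\Omega \times \mathbb{R}^{n-m})$ and $\psi' = \psi|_U$. Then $\psi'(S) = \psi(U \cap S) = \psi(U) \cap \psi(S) = \psi(S) \cap (\Omega \times \mathbb{R}^{n-m}) = h_0|_\Omega$
because $\psi$ is injective, and

$$
\begin{aligned}
\mathrm{Range}(\psi') \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})
  &= \psi(U) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) \\
  &= (\mathrm{Range}(\psi) \cap (\Omega \times \mathbb{R}^{n-m})) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) \\
  &= \mathrm{Range}(\psi) \cap (\Omega \times \{0_{\mathbb{R}^{n-m}}\}),
\end{aligned}
$$

which equals $h_0|_\Omega = \psi'(S)$ because $h_0|_\Omega = \psi(S) \subseteq \mathrm{Range}(\psi)$.
Since $U \in \mathrm{Top}(M)$, and $p \in \mathrm{Dom}(\psi') = U \cap \mathrm{Dom}(\psi) \in \mathrm{Top}(M)$ because $p \in S$, it follows that $\psi' \in \mathrm{atlas}^k_p(M)$.
Hence $S$ is an $m$-dimensional regular $C^k$ submanifold point-set of $M$ by part (i).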


52.3.9 REMARK: Open subsets of manifolds are regular submanifold point-sets. 
In the special case m = n in Definition 52.3.7, a submanifold point-set is the same thing as an open subset 
of the ambient space M. This is shown in Theorem 52.3.10. 


52.3.10 THEOREM: Open subsets of manifolds are regular submanifold point-sets.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold with $n = \dim(M)$. Let $S \in \mathbb{P}(M)$. Then $S \in \mathrm{Top}(M)$ if and only if
$S$ is an $n$-dimensional regular $C^k$ submanifold point-set of $M$.


PROOF: Let $S \in \mathrm{Top}(M)$. Let $p \in S$. Let $\psi \in \mathrm{atlas}_p(M)$. Then $\mathrm{Dom}(\psi) = \psi^{-1}(\mathbb{R}^n)$ because $\mathbb{R}^n$ is the
target space for $\psi$ by Definition 51.3.2 (i). But $\psi \in \mathrm{atlas}^k_p(M)$ by Definition 51.4.5 and Theorem 51.4.3.
Therefore $S$ is an $n$-dimensional regular $C^k$ submanifold point-set of $M$ by Definition 52.3.7.


Now suppose that $S$ is an $n$-dimensional regular $C^k$ submanifold point-set of $M$. Let $p \in S$. Then there
exists $\psi \in \mathrm{atlas}^k_p(M)$ satisfying line (52.3.1) by Definition 52.3.7. By Definition 51.4.5, $\psi$ is a locally Cartesian chart on $M$ which
is $C^k$ compatible with $\mathrm{atlas}(M)$. So by Definition 51.4.2, $\mathrm{atlas}(M) \cup \{\psi\}$ is a $C^k$ atlas for $M$. Therefore
$\mathrm{Dom}(\psi) \in \mathrm{Top}(M)$ by Definition 51.3.2 (i). But by line (52.3.1), $\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^n) = \mathrm{Dom}(\psi)$. So
$\mathrm{Dom}(\psi) \subseteq S$ by Theorem 8.1.7 (v). Hence $S \in \mathrm{Top}(M)$ by Theorem 31.3.16 (i).


52.3.11 REMARK: Construction of chart through which a submanifold point-set has a given $C^k$ graph.
Theorem 52.3.12 and its proof are the same as Theorem 50.2.13 and its proof except that $C^k$ differentiability
is substituted for continuity. (Theorem 52.3.12 is illustrated in Figure 50.2.2.)


52.3.12 THEOREM: Interchangeability of $C^k$-graph conditions for submanifold point-set regularity.

Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold with $n = \dim(M)$. Let $S \subseteq M$ and $m \in \mathbb{Z}^+_0$ with $m \le n$. Let $\psi_1 \in
\mathrm{atlas}^k(M)$ with $\psi_1(S) = h_1$ for some $h_1 \in C^k(\Omega, \mathbb{R}^{n-m})$ with $\Omega \in \mathrm{Top}(\mathbb{R}^m)$. Let $h_2 \in C^k(\Omega, \mathbb{R}^{n-m})$. Then
there exists $\psi_2 \in \mathrm{atlas}^k(M)$ such that $\psi_2(S) = h_2$, $\Pi_1^m \circ \psi_1|_S = \Pi_1^m \circ \psi_2|_S$ and $\mathrm{Range}(\psi_2) \subseteq \Omega \times \mathbb{R}^{n-m}$.
(See Definition 14.6.11 for the component projection map $\Pi_1^m : (x_i)_{i=1}^n \mapsto (x_i)_{i=1}^m$.)


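PROOF: Let $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ with $h_1, h_2 \in C^k(\Omega, \mathbb{R}^{n-m})$, where $\psi_1 \in \mathrm{atlas}^k(M)$ satisfies $\psi_1(S) = h_1$. Define
$\phi : \Omega \times \mathbb{R}^{n-m} \to \Omega \times \mathbb{R}^{n-m}$ by


$\forall x \in \Omega \times \mathbb{R}^{n-m},\quad \phi(x) = x + (0_{\mathbb{R}^m},\ h_2(\Pi_1^m(x)) - h_1(\Pi_1^m(x)))$, (52.3.3)


where the addition operation “+” on $\mathbb{R}^n$ is defined in the usual componentwise fashion. The bijectivity of
$\phi : \Omega \times \mathbb{R}^{n-m} \to \Omega \times \mathbb{R}^{n-m}$ follows by swapping $h_1$ and $h_2$ to obtain its inverse. Also $\phi$ is $C^k$ because $h_1$ and
$h_2$ are $C^k$, and similarly $\phi^{-1}$ is $C^k$. So $\phi : \Omega' \to \Omega'$ is a $C^k$ diffeomorphism, where $\Omega' = \Omega \times \mathbb{R}^{n-m} \in \mathrm{Top}(\mathbb{R}^n)$.
Then $\phi(\mathrm{Range}(\psi_1)) \in \mathrm{Top}(\mathbb{R}^n)$ by Theorem 31.6.6 because $\mathrm{Range}(\psi_1) \in \mathrm{Top}(\mathbb{R}^n)$ and $\mathrm{Dom}(\phi) \in \mathrm{Top}(\mathbb{R}^n)$.
Let $\psi_2 = \phi \circ \psi_1$. Then $\psi_2$ is a $C^k$ diffeomorphism from $\mathrm{Dom}(\psi_2) = \psi_1^{-1}(\mathrm{Dom}(\phi)) = \psi_1^{-1}(\Omega') \in \mathrm{Top}(M)$ to
$\mathrm{Range}(\psi_2) = \phi(\mathrm{Range}(\psi_1)) \in \mathrm{Top}(\mathbb{R}^n)$. So $\psi_2 \in \mathrm{atlas}^k(M)$. Moreover,

$$
\begin{aligned}
\psi_2(S) &= \phi(\psi_1(S)) \\
          &= \phi(h_1) \\
          &= \{\phi(x, h_1(x));\ x \in \Omega\} \\
          &= \{(x, h_1(x) + h_2(x) - h_1(x));\ x \in \Omega\} \\
          &= h_2
\end{aligned}
$$

and

$$
\forall q \in S,\quad \Pi_1^m(\psi_2(q)) = \Pi_1^m(\phi(\psi_1(q))) = \Pi_1^m(\psi_1(q))
$$

by line (52.3.3). Since $\mathrm{Range}(\psi_2) \subseteq \Omega \times \mathbb{R}^{n-m}$, it follows that there exists $\psi_2 \in \mathrm{atlas}^k(M)$ such that
$\psi_2(S) = h_2$, $\Pi_1^m \circ \psi_1|_S = \Pi_1^m \circ \psi_2|_S$ and $\mathrm{Range}(\psi_2) \subseteq \Omega \times \mathbb{R}^{n-m}$.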
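
The shear map used in the proof of Theorem 52.3.12 can be tried out numerically. The sketch below is illustrative only and rests on assumptions not taken from the text: it fixes m = 1 and n = 2, takes the whole real line as the open set, and uses the hypothetical graph functions `h1` (sine) and `h2` (zero). It shows that the map carries the graph of `h1` onto the graph of `h2`, fixes the first coordinate, and is inverted by swapping the roles of `h1` and `h2`.

```python
import math

def h1(x):
    """Hypothetical smooth function whose graph plays the role of psi_1(S)."""
    return math.sin(x)

def h2(x):
    """Target function; here the zero function."""
    return 0.0

def shear(h_from, h_to):
    """Return phi with phi(x, y) = (x, y + h_to(x) - h_from(x)), for m = 1, n = 2."""
    def phi(point):
        x, y = point
        return (x, y + h_to(x) - h_from(x))
    return phi

phi = shear(h1, h2)      # maps the graph of h1 onto the graph of h2
phi_inv = shear(h2, h1)  # the inverse is obtained by swapping h1 and h2
```

Because the first coordinate is left unchanged, the projection of a point of S onto the first m coordinates is the same before and after applying `phi`, which corresponds to the second conclusion of the theorem.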


52.3.13 REMARK: Relaxing the zero-graph condition to a $C^k$-graph condition.
Theorem 52.3.14 gives a weaker-looking condition for regularity than in Definition 52.3.7. Instead of
requiring a zero-function graph, the point-set is required only to be locally the graph of a $C^k$ function.


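52.3.14 THEOREM: $C^k$ graph condition for regular differentiability of a submanifold point-set.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold with $n = \dim(M)$. Let $S \subseteq M$ and $m \in \mathbb{Z}^+_0$ with $m \le n$. Then $S$ is
an $m$-dimensional regular $C^k$ submanifold point-set of $M$ if and only if


$\forall p \in S,\ \exists \psi \in \mathrm{atlas}^k_p(M),\ \exists \Omega \in \mathrm{Top}(\mathbb{R}^m),\ \exists h \in C^k(\Omega, \mathbb{R}^{n-m}),\quad \psi(S) = \{(x, h(x));\ x \in \Omega\} = h$. (52.3.4)


PROOF: Let $S$ be an $m$-dimensional regular $C^k$ submanifold point-set of $M$. Let $p \in S$. Then by Theorem 52.3.8 (ii),
there exist $\psi \in \mathrm{atlas}^k_p(M)$ and $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ such that $\psi(S) = h_0|_\Omega$, where $h_0 : \mathbb{R}^m \to
\mathbb{R}^{n-m}$ is the zero function. Thus condition (52.3.4) is satisfied with $h = h_0|_\Omega \in C^k(\Omega, \mathbb{R}^{n-m})$.


Now let condition (52.3.4) be satisfied. Let $p \in S$. Then there exist $\psi_1 \in \mathrm{atlas}^k_p(M)$, $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ and
$h_1 \in C^k(\Omega, \mathbb{R}^{n-m})$ such that $\psi_1(S) = \{(x, h_1(x));\ x \in \Omega\} = h_1$. Let $h_2 = h_0|_\Omega$. Then $h_2 \in C^k(\Omega, \mathbb{R}^{n-m})$ by
Theorem 42.6.2. So by Theorem 52.3.12, there exists $\psi_2 \in \mathrm{atlas}^k(M)$ such that $\psi_2(S) = h_2$ and $\Pi_1^m \circ \psi_1|_S =
\Pi_1^m \circ \psi_2|_S$. Since $p \in \mathrm{Dom}(\psi_1) \cap S = \mathrm{Dom}(\Pi_1^m \circ \psi_1|_S)$, it follows that $p \in \mathrm{Dom}(\psi_2) \cap S = \mathrm{Dom}(\Pi_1^m \circ \psi_2|_S)$.
So $\psi_2 \in \mathrm{atlas}^k_p(M)$. Thus condition (52.3.2) is satisfied. Hence $S$ is an $m$-dimensional regular $C^k$ submanifold
point-set of $M$ by Theorem 52.3.8 (ii).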


52.3.15 REMARK: Arbitrariness of the zero tuple in the regular submanifold point-set definition.
Theorem 52.3.16 asserts that $0_{\mathbb{R}^{n-m}}$ in Definition 52.3.7 can be replaced by any real $(n - m)$-tuple. This
extra generality is not incorporated into Definition 52.3.7 because it would complicate the logic of all of the
theorems and definitions which use it. On the other hand, there are numerous situations where it is easy to
prove that a submanifold satisfies condition (52.3.5). Then Theorem 52.3.16, or some other kind of argument,
must be applied so as to verify that Definition 52.3.7 is satisfied.
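
A typical situation of this kind is a level set in polar coordinates. The sketch below is illustrative and its details are assumptions: the ambient space is the plane, the chart `polar_chart` is defined on the open right half-plane, and the point-set is the part of the unit circle lying there. Through this chart the point-set is the constant graph with last coordinate 1, and composing with a translation of the last coordinate turns it into the zero graph.

```python
import math

def polar_chart(p):
    """Hypothetical chart on the open right half-plane {x > 0}: (x, y) -> (theta, r)."""
    x, y = p
    return (math.atan2(y, x), math.hypot(x, y))

def shifted_chart(p, c=1.0):
    """Compose polar_chart with the translation (theta, r) -> (theta, r - c)."""
    theta, r = polar_chart(p)
    return (theta, r - c)
```

For a point `(cos t, sin t)` with `cos t > 0`, the first chart returns approximately `(t, 1)`, a constant-graph value as in the condition referred to above, while the shifted chart returns approximately `(t, 0)`, the zero-graph value.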


52.3.16 THEOREM: Regularity of submanifolds satisfying a constant-graph condition.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold with $n = \dim(M)$. Let $S \subseteq M$ and $m \in \mathbb{Z}^+_0$ with $m \le n$. Then $S$ is
an $m$-dimensional regular $C^k$ submanifold point-set of $M$ if and only if


$\forall p \in S,\ \exists \psi \in \mathrm{atlas}^k_p(M),\ \exists x \in \mathbb{R}^{n-m},\quad \mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{x\})$. (52.3.5)


PROOF: Let $S$ be an $m$-dimensional regular $C^k$ submanifold point-set of $M$. Then condition (52.3.5) is
satisfied with $x = 0_{\mathbb{R}^{n-m}}$ by Definition 52.3.7.


Now suppose that condition (52.3.5) is satisfied. Let $p \in S$. Then $\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{x\})$ for
some $\psi \in \mathrm{atlas}^k_p(M)$ and $x \in \mathbb{R}^{n-m}$. So $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{x\})$ by Theorem 10.7.1 (i). Let
$\Omega = \Pi_1^m(\psi(S))$. Then $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ by Theorem 32.10.3 (i) because $\Omega = \Pi_1^m(\mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{x\}))$, where
$\mathrm{Range}(\psi) \in \mathrm{Top}(\mathbb{R}^n)$. Let $h = \{(z, x);\ z \in \Omega\}$. Then $\psi(S) = h$, and $h \in C^k(\Omega, \mathbb{R}^{n-m})$ by Theorem 42.6.2.
Hence by Theorem 52.3.14, $S$ is an $m$-dimensional regular $C^k$ submanifold point-set of $M$.


52.4. Differentiable submanifolds


52.4.1 REMARK: Regular differentiable submanifolds. 

Whereas Definition 52.3.7 defines regular $C^k$ submanifold point-sets, Definition 52.4.2 defines regular $C^k$
submanifolds, which means that the point-set has its own atlas, which is required to be consistent with the
ambient space’s atlas. Condition (52.4.1) requires $S$ to be the graph of a zero-function with respect to the
ambient-compatible chart $\psi$, but it also requires the chart $\tilde\psi$ for the subset $S$ to be equal to a restriction of the
ambient-compatible chart. In other words, the subset’s chart $\tilde\psi$ must be extensible to an ambient-compatible
chart $\psi$ via which $S$ is the graph of a zero-function. (Definition 52.4.2 is illustrated in Figure 52.4.1.)


Figure 52.4.1 Definition of a regular differentiable submanifold


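52.4.2 DEFINITION: Regular differentiable submanifold.
An $m$-dimensional regular $C^k$ (differentiable) submanifold of a $C^k$ manifold $M$, for $k \in \bar{\mathbb{Z}}^+_0$ and $m \in \mathbb{Z}^+_0$
with $m \le n = \dim(M)$, is a $C^k$ differentiable manifold $(S, A_S)$ such that $S \subseteq M$ and


$\forall p \in S,\ \exists \tilde\psi \in A_S,\ \exists \psi \in \mathrm{atlas}^k_p(M),$
$\quad \psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$. (52.4.1)


(See Definition 14.6.11 for $\Pi_1^m : \mathbb{R}^n \to \mathbb{R}^m$ with $\Pi_1^m : (x_i)_{i=1}^n \mapsto (x_i)_{i=1}^m$.)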


52.4.3 REMARK: Motivation for testing submanifold atlases for regularity. 

Theorem 52.4.4 is not the differentiable version of some topological manifold theorem in Section 50.2 because 
topological submanifold structure is defined to be the relative topology, which does not require any “relative 
atlas” on the submanifold. (This issue is discussed in Remark 50.2.5.) In the case of topological manifolds, 
a regular submanifold is required to satisfy a local graph condition as in Definition 50.2.8, but this is a test 
for the subset’s relative topology, not for any atlas. 


In the case of differentiable manifolds, a subset which is a candidate to be a regular submanifold must also 
satisfy a local graph condition as in Definition 52.3.7. Then a kind of “relative atlas” can be constructed 
for the subset as in Theorem 52.4.11. (This is also discussed in Remark 52.4.9.) However, it is not always 
convenient to use this maximal atlas. For example, for fibre sets of differentiable fibre bundles, it is more 
convenient to use submanifold atlases which are induced by local trivialisations. The necessity then arises 
to test any given atlas for a subset to determine whether the subset is a regular submanifold with that atlas. 
This is the purpose of Definition 52.4.2. Theorem 52.4.4 gives some basic properties for this basic definition. 


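52.4.4 THEOREM: Some basic properties of regular differentiable submanifolds.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $S$ be a regular $C^k$ submanifold of a $C^k$ manifold $M$. Let $n = \dim(M)$, $m = \dim(S)$, and
define $A^k(p, \tilde\psi)$ for all $p \in S$ and $\tilde\psi \in \mathrm{atlas}_p(S)$ by


$\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),$ (52.4.2)
$\quad A^k(p, \tilde\psi) = \{\psi \in \mathrm{atlas}^k_p(M);\ \psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S\}$.


Then

(i) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \mathrm{Dom}(\psi) \cap S = \mathrm{Dom}(\tilde\psi) \cap \mathrm{Dom}(\psi)$.

(ii) $\forall p \in S,\ \exists \tilde\psi \in \mathrm{atlas}_p(S),\ A^k(p, \tilde\psi) \ne \emptyset$.
(In other words, “$A_S$” may be replaced by “$\mathrm{atlas}_p(S)$” in Definition 52.4.2.)

(iii) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

(iv) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \psi|_S = \mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi|_{\mathrm{Dom}(\psi)}$.
In other words, the value of $\psi$ on $S$ equals the concatenation of the value of $\tilde\psi$ with the tuple $0_{\mathbb{R}^{n-m}}$.
(See Definition 14.6.10 for the tuple concatenation map $\mathrm{concat} : \mathbb{R}^m \times \mathbb{R}^{n-m} \to \mathbb{R}^n$.)

(v) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \forall x \in \tilde\psi(\mathrm{Dom}(\psi)),\ \psi(\tilde\psi^{-1}(x)) = (x, 0_{\mathbb{R}^{n-m}})$.

(vi) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \forall G \in \mathbb{P}(M),\ \psi(G \cap S) = \tilde\psi(G \cap \mathrm{Dom}(\psi)) \times \{0_{\mathbb{R}^{n-m}}\}$.

(vii) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \forall G \in \mathbb{P}(M),\ G \cap S \cap \mathrm{Dom}(\psi) = \psi^{-1}(\tilde\psi(G \cap \mathrm{Dom}(\psi)) \times \{0_{\mathbb{R}^{n-m}}\})$.

(viii) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \forall G \in \mathbb{P}(M),\ \psi(G \cap S) = \psi(G) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

(ix) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \forall G \in \mathbb{P}(M),\ \tilde\psi^{-1}(\Pi_1^m(\psi(S \cap G))) = G \cap \mathrm{Dom}(\tilde\psi) \cap \mathrm{Dom}(\psi)$.

(x) $\mathrm{Top}(S) = \{\Omega \cap S;\ \Omega \in \mathrm{Top}(M)\}$. In other words, the topology on $S$ (induced by $\mathrm{atlas}(S)$) is the same
as the relative topology induced on $S$ by $\mathrm{Top}(M)$, where $\mathrm{Top}(M)$ is induced by $\mathrm{atlas}(M)$.

(xi) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \forall X \in \mathbb{P}(\mathbb{R}^n),\ \tilde\psi(\psi^{-1}(X)) = \Pi_1^m(X \cap \psi(S))$.

(xii) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),\ \forall X \in \mathbb{P}(\mathbb{R}^n),$
$\quad \tilde\psi(\psi^{-1}(X)) = \Pi_1^m(X \cap \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}))$.

(xiii) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ A^k(p, \tilde\psi) \ne \emptyset$.
(In other words, “$\exists \tilde\psi$” may be replaced by “$\forall \tilde\psi$” in Definition 52.4.2.)

(xiv) $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi_1, \psi_2 \in A^k(p, \tilde\psi),\ \psi_1|_{S \cap \mathrm{Dom}(\psi_2)} = \psi_2|_{S \cap \mathrm{Dom}(\psi_1)}$.

(xv) $\forall p \in S,\ \forall \tilde\psi_1, \tilde\psi_2 \in \mathrm{atlas}_p(S),\ \forall \psi_1 \in A^k(p, \tilde\psi_1),\ \forall \psi_2 \in A^k(p, \tilde\psi_2),$
$\quad \psi_2(\psi_1^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})) = \psi_2(\mathrm{Dom}(\psi_1)) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

(xvi) $\forall p \in S,\ \forall \tilde\psi_1, \tilde\psi_2 \in \mathrm{atlas}_p(S),\ \forall \psi_1 \in A^k(p, \tilde\psi_1),\ \forall \psi_2 \in A^k(p, \tilde\psi_2),$
$\quad \psi_1(\mathrm{Dom}(\psi_2) \cap S) \subseteq \tilde\psi_1(\mathrm{Dom}(\tilde\psi_2)) \times \{0_{\mathbb{R}^{n-m}}\}$.

(xvii) $\forall p \in S,\ \forall \tilde\psi_1, \tilde\psi_2 \in \mathrm{atlas}_p(S),\ \forall \psi_1 \in A^k(p, \tilde\psi_1),\ \forall \psi_2 \in A^k(p, \tilde\psi_2),$
$\quad \psi_2 \circ \psi_1^{-1}|_{\psi_1(\mathrm{Dom}(\psi_2) \cap S)} = \mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi_2 \circ \tilde\psi_1^{-1} \circ \Pi_1^m|_{\psi_1(\mathrm{Dom}(\psi_2) \cap S)}$.
In other words,
$\forall p \in S,\ \forall \tilde\psi_1, \tilde\psi_2 \in \mathrm{atlas}_p(S),\ \forall \psi_1 \in A^k(p, \tilde\psi_1),\ \forall \psi_2 \in A^k(p, \tilde\psi_2),\ \forall y \in \psi_1(\mathrm{Dom}(\psi_2) \cap S),$
$\quad \psi_2(\psi_1^{-1}(y)) = (\tilde\psi_2(\tilde\psi_1^{-1}(\Pi_1^m(y))),\ 0_{\mathbb{R}^{n-m}})$.
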
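PROOF: For part (i), let $p \in S$, $\tilde\psi \in \mathrm{atlas}_p(S)$ and $\psi \in A^k(p, \tilde\psi)$. Then $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$. It follows
that $\mathrm{Dom}(\tilde\psi) \cap \mathrm{Dom}(\psi) = \mathrm{Dom}(\tilde\psi|_{\mathrm{Dom}(\psi)}) = \mathrm{Dom}(\Pi_1^m \circ \psi|_S) = \mathrm{Dom}(\psi) \cap S$.

For part (ii), let $p \in S$. Then by Definition 52.4.2, there exists $\tilde\psi \in A_S$ such that $A^k(p, \tilde\psi) \ne \emptyset$. So
$p \in \mathrm{Dom}(\psi)$ for some $\psi \in A^k(p, \tilde\psi)$, and then by part (i), $p \in S \cap \mathrm{Dom}(\psi) = \mathrm{Dom}(\tilde\psi) \cap \mathrm{Dom}(\psi) \subseteq \mathrm{Dom}(\tilde\psi)$.
Hence $\tilde\psi \in \mathrm{atlas}_p(S)$.

For part (iii), let $p \in S$, $\tilde\psi \in \mathrm{atlas}_p(S)$ and $\psi \in A^k(p, \tilde\psi)$. Then $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$. So
$\psi^{-1}(\psi(S)) = \psi^{-1}(\mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}))$. But $\psi^{-1}(\psi(S)) = \mathrm{Dom}(\psi) \cap S$ by Theorem 10.10.2 (iii'),
and $\psi^{-1}(\mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})) = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ by Theorem 10.9.14 (vi). Consequently
$\mathrm{Dom}(\psi) \cap S = \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

For part (iv), let $p \in S$, $\tilde\psi \in \mathrm{atlas}_p(S)$ and $\psi \in A^k(p, \tilde\psi)$. Then $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$. It follows that
$\mathrm{Dom}(\tilde\psi|_{\mathrm{Dom}(\psi)}) = \mathrm{Dom}(\Pi_1^m \circ \psi|_S)$. Therefore the domains of the left and right hand sides of the equation
$\psi|_S = \mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi|_{\mathrm{Dom}(\psi)}$ are the same. Let $q \in S \cap \mathrm{Dom}(\psi)$. Then $\psi(q) \in \mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}$
by part (iii). But $\Pi_1^m(\psi(q)) = \tilde\psi(q)$ because $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$. Therefore by Theorem 14.6.12 (i),
$\psi(q) = \mathrm{concat}(\Pi_1^m(\psi(q)), \Pi_{m+1}^n(\psi(q))) = \mathrm{concat}(\tilde\psi(q), 0_{\mathbb{R}^{n-m}})$. Hence $\psi|_S = \mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi|_{\mathrm{Dom}(\psi)}$.

Part (v) follows directly from part (iv).
Part (vi) follows from part (iv) by applying both sides of the equation to $G$.

Part (vii) follows from part (vi) by applying $\psi^{-1}$ to the equation and then applying Theorem 10.10.2 (iii').

For part (viii), let $p \in S$, $\tilde\psi \in \mathrm{atlas}_p(S)$, $\psi \in A^k(p, \tilde\psi)$ and $G \in \mathbb{P}(M)$. Then $\psi(G \cap S) = \psi(G) \cap \psi(S)$ because
$\psi$ is injective, where $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$. Hence $\psi(G \cap S) = \psi(G) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

For part (ix), let $p \in S$, $\tilde\psi \in \mathrm{atlas}_p(S)$, $\psi \in A^k(p, \tilde\psi)$ and $G \in \mathbb{P}(M)$. Then $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$
implies that $\Pi_1^m(\psi(S \cap G)) = \tilde\psi(S \cap G \cap \mathrm{Dom}(\psi))$. So $\tilde\psi^{-1}(\Pi_1^m(\psi(S \cap G))) = \tilde\psi^{-1}(\tilde\psi(S \cap G \cap \mathrm{Dom}(\psi))) =
S \cap G \cap \mathrm{Dom}(\psi) \cap \mathrm{Dom}(\tilde\psi)$ by Theorem 10.10.2 (iii') because $\tilde\psi$ is injective. But $\mathrm{Dom}(\tilde\psi) \subseteq S$. Therefore
$\tilde\psi^{-1}(\Pi_1^m(\psi(S \cap G))) = G \cap \mathrm{Dom}(\tilde\psi) \cap \mathrm{Dom}(\psi)$.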

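For part (x), let $\Omega \in \mathrm{Top}(M)$. If $\Omega \cap S = \emptyset$, then $\Omega \cap S \in \mathrm{Top}(S)$. So suppose that $\Omega \cap S \ne \emptyset$. Then there
exists $p \in \Omega \cap S$. So by part (ii), there exists $\tilde\psi \in \mathrm{atlas}_p(S)$ with $A^k(p, \tilde\psi) \ne \emptyset$. So there exists $\psi \in \mathrm{atlas}^k_p(M)$
such that $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$. Let $G = \Omega \cap \mathrm{Dom}(\psi)$. Then
$G \in \mathrm{Top}_p(M)$. So $\psi(G) \in \mathrm{Top}(\mathbb{R}^n)$ by Definition 51.3.2 (i). So $\Pi_1^m(\psi(G) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})) \in \mathrm{Top}(\mathbb{R}^m)$ by
Theorem 32.10.3 (i). Thus $\Pi_1^m(\psi(G \cap S)) \in \mathrm{Top}(\mathbb{R}^m)$ by part (viii). But the assumption $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ
\psi|_S$ implies that $\tilde\psi(G) = \tilde\psi(G \cap \mathrm{Dom}(\psi)) = \Pi_1^m(\psi(G \cap S))$. So $\tilde\psi(G) \in \mathrm{Top}(\mathbb{R}^m)$. So by Definition 51.3.2 (i)
and Theorem 10.10.2 (iii'), $G \cap \mathrm{Dom}(\tilde\psi) = \tilde\psi^{-1}(\tilde\psi(G)) \in \mathrm{Top}(S)$. But $p \in G \cap \mathrm{Dom}(\tilde\psi)$ because $\tilde\psi \in \mathrm{atlas}_p(S)$.
So $G \cap \mathrm{Dom}(\tilde\psi) \in \mathrm{Top}_p(S)$. Consequently $\forall p \in \Omega \cap S,\ \exists U \in \mathrm{Top}_p(S),\ U \subseteq \Omega \cap S$. (The choice of $U$ here is
$U = G \cap \mathrm{Dom}(\tilde\psi) = \Omega \cap \mathrm{Dom}(\tilde\psi) \cap \mathrm{Dom}(\psi)$, which depends on choices of $\tilde\psi \in \mathrm{atlas}_p(S)$ and $\psi \in A^k(p, \tilde\psi)$.)
Therefore $\Omega \cap S \in \mathrm{Top}(S)$ by Theorem 31.3.16 (i). Thus $\mathrm{Top}(S) \supseteq \{\Omega \cap S;\ \Omega \in \mathrm{Top}(M)\}$.

To show the reverse inclusion, let $U \in \mathrm{Top}(S)$. Then for all $p \in U$, part (ii) implies that there exists
$\tilde\psi \in \mathrm{atlas}_p(S)$ with $A^k(p, \tilde\psi) \ne \emptyset$. So there exists $\psi \in \mathrm{atlas}^k_p(M)$ with $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$
and $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$. Define $G_{p,\tilde\psi,\psi}$ for any $p \in U$, $\tilde\psi \in \mathrm{atlas}_p(S)$ and $\psi \in A^k(p, \tilde\psi)$ by

$\forall p \in U,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ \forall \psi \in A^k(p, \tilde\psi),$
$\quad G_{p,\tilde\psi,\psi} = \psi^{-1}(\tilde\psi(U \cap \mathrm{Dom}(\psi)) \times \mathbb{R}^{n-m})$.

Then $G_{p,\tilde\psi,\psi} \in \mathrm{Top}(M)$ because $\tilde\psi(U \cap \mathrm{Dom}(\psi)) \times \mathbb{R}^{n-m} \in \mathrm{Top}(\mathbb{R}^n)$ by Theorem 32.9.6 (ii) and $\psi$ and $\tilde\psi$
are homeomorphisms. Let $\Omega = \bigcup \{G_{p,\tilde\psi,\psi};\ p \in U,\ \tilde\psi \in \mathrm{atlas}_p(S)$ and $\psi \in A^k(p, \tilde\psi)\}$. Then $\Omega \in \mathrm{Top}(M)$ by
Definition 31.3.2 (iii). By part (iii),

$$
\begin{aligned}
G_{p,\tilde\psi,\psi} \cap S
  &= G_{p,\tilde\psi,\psi} \cap \mathrm{Dom}(\psi) \cap S \\
  &= \psi^{-1}(\tilde\psi(U \cap \mathrm{Dom}(\psi)) \times \mathbb{R}^{n-m}) \cap \psi^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) \\
  &= \psi^{-1}(\tilde\psi(U \cap \mathrm{Dom}(\psi)) \times \{0_{\mathbb{R}^{n-m}}\}) \\
  &= U \cap S \cap \mathrm{Dom}(\psi) \qquad (52.4.3) \\
  &= U \cap \mathrm{Dom}(\psi),
\end{aligned}
$$

where line (52.4.3) follows from part (vii) by substituting $U$ for $G$. Therefore by Theorem 8.4.8 (iv),

$$
\begin{aligned}
\Omega \cap S &= \bigcup \{U \cap \mathrm{Dom}(\psi);\ p \in U,\ \tilde\psi \in \mathrm{atlas}_p(S) \text{ and } \psi \in A^k(p, \tilde\psi)\} \\
              &= U
\end{aligned}
$$

because for all $p \in U$, $A^k(p, \tilde\psi) \ne \emptyset$ for some $\tilde\psi \in \mathrm{atlas}_p(S)$, and then $\psi \in A^k(p, \tilde\psi)$ implies $\psi \in \mathrm{atlas}^k_p(M)$,
which implies $p \in \mathrm{Dom}(\psi)$. Thus $\mathrm{Top}(S) \subseteq \{\Omega \cap S;\ \Omega \in \mathrm{Top}(M)\}$. Hence $\mathrm{Top}(S) = \{\Omega \cap S;\ \Omega \in \mathrm{Top}(M)\}$.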


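For part (xi), let $p \in S$, $\tilde\psi \in \mathrm{atlas}_p(S)$, $\psi \in A^k(p, \tilde\psi)$ and $X \in \mathbb{P}(\mathbb{R}^n)$. Then $\psi^{-1}(X) \subseteq \mathrm{Dom}(\psi)$. So
$\tilde\psi(\psi^{-1}(X)) = \Pi_1^m(\psi(S \cap \psi^{-1}(X)))$ by the definition of $A^k(p, \tilde\psi)$. But $\psi(S \cap \psi^{-1}(X)) = \psi(S) \cap \psi(\psi^{-1}(X)) =
\psi(S) \cap \mathrm{Range}(\psi) \cap X = \psi(S) \cap X$ by the injectivity of $\psi$ and Theorem 10.10.2 (i). Hence $\tilde\psi(\psi^{-1}(X)) =
\Pi_1^m(X \cap \psi(S))$.

Part (xii) follows from part (xi) and the definition of $A^k(p, \tilde\psi)$.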

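For part (xiii), let $p \in S$ and $\tilde\psi \in \mathrm{atlas}_p(S)$. By part (ii), $A^k(p, \tilde\psi_0) \ne \emptyset$ for some $\tilde\psi_0 \in \mathrm{atlas}_p(S)$. So there
exists $\psi_0 \in A^k(p, \tilde\psi_0)$. Then $\psi_0 \in \mathrm{atlas}^k_p(M)$, $\psi_0(S) = \mathrm{Range}(\psi_0) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\tilde\psi_0|_{\mathrm{Dom}(\psi_0)} = \Pi_1^m \circ
\psi_0|_S$. It follows that $\mathrm{Dom}(\tilde\psi_0) \cap \mathrm{Dom}(\psi_0) = S \cap \mathrm{Dom}(\psi_0)$ and $\tilde\psi_0(\mathrm{Dom}(\psi_0)) = \Pi_1^m(\psi_0(S))$ by equating
domains and ranges respectively. (See Figure 52.4.2.)

Figure 52.4.2 Construction of additional charts for a regular submanifold

Let $U = \mathrm{Dom}(\tilde\psi_0) \cap \mathrm{Dom}(\psi_0) \cap \mathrm{Dom}(\tilde\psi)$ and $\Omega = \psi_0^{-1}((\Pi_1^m)^{-1}(\tilde\psi_0(U))) = \psi_0^{-1}(\tilde\psi_0(U) \times \mathbb{R}^{n-m})$. Then
$U \in \mathrm{Top}(S)$ because $\mathrm{Dom}(\psi_0) \cap S \in \mathrm{Top}(S)$ by part (x), and $\mathrm{Dom}(\tilde\psi_0), \mathrm{Dom}(\tilde\psi) \in \mathrm{Top}(S)$. Therefore
$U \in \mathrm{Top}_p(S)$ because $p \in U$. So $\tilde\psi_0(U) \in \mathrm{Top}(\mathbb{R}^m)$ by Definition 51.3.2 (i, iii). So $\tilde\psi_0(U) \times \mathbb{R}^{n-m} \in \mathrm{Top}(\mathbb{R}^n)$
by Theorem 32.9.6 (ii). Therefore $\Omega \in \mathrm{Top}(M)$ by Definition 31.12.4.

By part (vii), $U = U \cap S \cap \mathrm{Dom}(\psi_0) = \psi_0^{-1}(\tilde\psi_0(U \cap \mathrm{Dom}(\psi_0)) \times \{0_{\mathbb{R}^{n-m}}\}) = \psi_0^{-1}(\tilde\psi_0(U) \times \{0_{\mathbb{R}^{n-m}}\})$.
But by part (iii), $\Omega \cap S = \psi_0^{-1}(\tilde\psi_0(U) \times \mathbb{R}^{n-m}) \cap \psi_0^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) = \psi_0^{-1}(\tilde\psi_0(U) \times \{0_{\mathbb{R}^{n-m}}\})$ because
$\Omega \subseteq \mathrm{Dom}(\psi_0)$. Thus $\Omega \cap S = U$. Therefore $p \in \Omega$, and so $\Omega \in \mathrm{Top}_p(M)$.

Define $\psi : \Omega \to \mathbb{R}^n$ by

$\forall q \in \Omega,\quad \psi(q) = \mathrm{concat}\big(\tilde\psi(\tilde\psi_0^{-1}(\Pi_1^m(\psi_0(q)))),\ \Pi_{m+1}^n(\psi_0(q))\big)$.

(See Definition 14.6.10 for the tuple concatenation map $\mathrm{concat} : \mathbb{R}^m \times \mathbb{R}^{n-m} \to \mathbb{R}^n$. This map is not shown
explicitly when the need to concatenate is obvious.) Then $\psi$ is well defined because $\Omega \subseteq \mathrm{Dom}(\psi_0)$ and
$\Pi_1^m(\psi_0(\Omega)) \subseteq \tilde\psi_0(\mathrm{Dom}(\tilde\psi)) = \mathrm{Dom}(\tilde\psi \circ \tilde\psi_0^{-1})$ by Theorems 10.7.1 (i') and 10.10.13 (iii).

To show that $\psi \in \mathrm{atlas}^k_p(M)$, let $\tilde\phi = \tilde\psi \circ \tilde\psi_0^{-1}$, and $\tilde U_0 = \mathrm{Dom}(\tilde\phi) = \tilde\psi_0(\mathrm{Dom}(\tilde\psi))$ and $\tilde U_1 = \mathrm{Range}(\tilde\phi) =
\tilde\psi(\mathrm{Dom}(\tilde\psi_0))$. Then $\tilde U_0, \tilde U_1 \in \mathrm{Top}(\mathbb{R}^m)$ and $\tilde\phi : \tilde U_0 \to \tilde U_1$ is a $C^k$ diffeomorphism by Definition 51.3.2 (iii). Let
$\phi = \tilde\phi \times \mathrm{id}_{\mathbb{R}^{n-m}}$, and $U_0 = \mathrm{Dom}(\phi) = \tilde U_0 \times \mathbb{R}^{n-m}$ and $U_1 = \mathrm{Range}(\phi) = \tilde U_1 \times \mathbb{R}^{n-m}$. Then $U_0, U_1 \in \mathrm{Top}(\mathbb{R}^n)$
and $\phi : U_0 \to U_1$ is a $C^k$ diffeomorphism. (See Notation 10.14.4 for the double-domain function product
$\tilde\phi \times \mathrm{id}_{\mathbb{R}^{n-m}} : (x, y) \mapsto (\tilde\phi(x), y)$.) Since $\psi = \phi \circ \psi_0|_\Omega$, and $\mathrm{Dom}(\phi) = \tilde\psi_0(\mathrm{Dom}(\tilde\psi)) \times \mathbb{R}^{n-m} \supseteq \psi_0(\Omega)$
because $\psi_0(\Omega) = \mathrm{Range}(\psi_0) \cap (\tilde\psi_0(U) \times \mathbb{R}^{n-m}) \subseteq \tilde\psi_0(U) \times \mathbb{R}^{n-m} \subseteq \tilde\psi_0(\mathrm{Dom}(\tilde\psi)) \times \mathbb{R}^{n-m}$, it follows that
$\psi_0|_\Omega = \phi^{-1} \circ \psi$. So $\psi \in \mathrm{atlas}^k_p(M)$.

To show that $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$, let $z \in \psi(S)$. Then $z \in \mathrm{Range}(\psi)$ and $z = \psi(q)$ for some
$q \in S \cap \mathrm{Dom}(\psi)$. But $\Pi_{m+1}^n(\psi_0(q)) = 0_{\mathbb{R}^{n-m}}$ because $\psi_0 \in A^k(p, \tilde\psi_0)$. So $\psi(S) \subseteq \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$. Now
let $z \in \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$. Then $z = \psi(q)$ for some $q \in \mathrm{Dom}(\psi)$, and then $q \in \psi_0^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.
So $q \in S$ by part (iii). Therefore $z = \psi(q) \in \psi(S)$. Thus $\psi(S) \supseteq \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$. Hence $\psi(S) =
\mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

To show that $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$, first note that $\mathrm{Dom}(\tilde\psi|_{\mathrm{Dom}(\psi)}) = \mathrm{Dom}(\tilde\psi) \cap \mathrm{Dom}(\psi) = \mathrm{Dom}(\tilde\psi) \cap \Omega =
\mathrm{Dom}(\tilde\psi) \cap S \cap \Omega = \mathrm{Dom}(\tilde\psi) \cap U = U$ and $\mathrm{Dom}(\Pi_1^m \circ \psi|_S) = \mathrm{Dom}(\psi) \cap S = \Omega \cap S = U = \mathrm{Dom}(\tilde\psi|_{\mathrm{Dom}(\psi)})$.
Let $q \in U$. Then $(\Pi_1^m \circ \psi|_S)(q) = \Pi_1^m(\psi(q)) = \tilde\psi(\tilde\psi_0^{-1}(\Pi_1^m(\psi_0(q)))) = \tilde\psi(\tilde\psi_0^{-1}(\tilde\psi_0(q)))$ because
$\psi_0 \in A^k(p, \tilde\psi_0)$. So $(\Pi_1^m \circ \psi|_S)(q) = \tilde\psi(q)$. Thus $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$.

Since $\psi \in \mathrm{atlas}^k_p(M)$, $\psi(S) = \mathrm{Range}(\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\tilde\psi|_{\mathrm{Dom}(\psi)} = \Pi_1^m \circ \psi|_S$, it follows that $\psi \in A^k(p, \tilde\psi)$,
which implies that $A^k(p, \tilde\psi) \ne \emptyset$. Hence $\forall p \in S,\ \forall \tilde\psi \in \mathrm{atlas}_p(S),\ A^k(p, \tilde\psi) \ne \emptyset$.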
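For part (xiv), let $p \in S$, $\tilde\psi \in \mathrm{atlas}_p(S)$ and $\psi_1, \psi_2 \in A^k(p, \tilde\psi)$. Then by a double application of part (iv),

$\psi_1|_{S \cap \mathrm{Dom}(\psi_2)} = \mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi|_{\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)} = \psi_2|_{S \cap \mathrm{Dom}(\psi_1)}$.

For part (xv), let $p \in S$, $\tilde\psi_1, \tilde\psi_2 \in \mathrm{atlas}_p(S)$, $\psi_1 \in A^k(p, \tilde\psi_1)$ and $\psi_2 \in A^k(p, \tilde\psi_2)$. Then $\psi_1^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) =
\mathrm{Dom}(\psi_1) \cap S$ by part (iii). Hence by part (viii), $\psi_2(\psi_1^{-1}(\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})) = \psi_2(\mathrm{Dom}(\psi_1)) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$.

For part (xvi), let $p \in S$, $\tilde\psi_1, \tilde\psi_2 \in \mathrm{atlas}_p(S)$, $\psi_1 \in A^k(p, \tilde\psi_1)$ and $\psi_2 \in A^k(p, \tilde\psi_2)$. Then it follows from part (vi)
that $\psi_1(\mathrm{Dom}(\psi_2) \cap S) = \tilde\psi_1(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2) \cap S) \times \{0_{\mathbb{R}^{n-m}}\}$. But $\mathrm{Dom}(\psi_2) \cap S = \mathrm{Dom}(\tilde\psi_2) \cap \mathrm{Dom}(\psi_2)$
by part (i). Therefore $\psi_1(\mathrm{Dom}(\psi_2) \cap S) = \tilde\psi_1(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\tilde\psi_2) \cap \mathrm{Dom}(\psi_2)) \times \{0_{\mathbb{R}^{n-m}}\}$. So it follows
that $\psi_1(\mathrm{Dom}(\psi_2) \cap S) \subseteq \tilde\psi_1(\mathrm{Dom}(\tilde\psi_2)) \times \{0_{\mathbb{R}^{n-m}}\}$.

For part (xvii), let $p \in S$, $\tilde\psi_1, \tilde\psi_2 \in \mathrm{atlas}_p(S)$, $\psi_1 \in A^k(p, \tilde\psi_1)$ and $\psi_2 \in A^k(p, \tilde\psi_2)$. Then

$$
\begin{aligned}
\mathrm{Dom}(\psi_2 \circ \psi_1^{-1}|_{\psi_1(\mathrm{Dom}(\psi_2) \cap S)})
  &= \mathrm{Dom}(\psi_2 \circ \psi_1^{-1}) \cap \psi_1(\mathrm{Dom}(\psi_2) \cap S) \\
  &= \psi_1(\mathrm{Dom}(\psi_2)) \cap \psi_1(\mathrm{Dom}(\psi_2) \cap S) \\
  &= \psi_1(\mathrm{Dom}(\psi_2) \cap S).
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
\mathrm{Dom}(\mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi_2 \circ \tilde\psi_1^{-1} \circ \Pi_1^m|_{\psi_1(\mathrm{Dom}(\psi_2) \cap S)})
  &= \mathrm{Dom}(\tilde\psi_2 \circ \tilde\psi_1^{-1} \circ \Pi_1^m) \cap \psi_1(\mathrm{Dom}(\psi_2) \cap S) \\
  &= (\Pi_1^m)^{-1}(\tilde\psi_1(\mathrm{Dom}(\tilde\psi_2))) \cap \psi_1(\mathrm{Dom}(\psi_2) \cap S) \\
  &= (\tilde\psi_1(\mathrm{Dom}(\tilde\psi_2)) \times \mathbb{R}^{n-m}) \cap \psi_1(\mathrm{Dom}(\psi_2) \cap S) \\
  &= \psi_1(\mathrm{Dom}(\psi_2) \cap S)
\end{aligned}
$$

because $\psi_1(\mathrm{Dom}(\psi_2) \cap S) \subseteq \tilde\psi_1(\mathrm{Dom}(\tilde\psi_2)) \times \{0_{\mathbb{R}^{n-m}}\}$ by part (xvi). Thus the domains of the two sides
of the equation to be proved for part (xvii) are the same. Let $y_1 \in \psi_1(\mathrm{Dom}(\psi_2) \cap S)$, $q = \psi_1^{-1}(y_1)$ and
$y_2 = \psi_2(q) = \psi_2(\psi_1^{-1}(y_1))$. Let $x_1 = \Pi_1^m(y_1) = \Pi_1^m(\psi_1(q))$ and $x_2 = \Pi_1^m(y_2) = \Pi_1^m(\psi_2(q))$. Then $x_1 = \tilde\psi_1(q)$
and $x_2 = \tilde\psi_2(q)$ by line (54.6.1), and $\Pi_{m+1}^n(y_1) = \Pi_{m+1}^n(y_2) = 0_{\mathbb{R}^{n-m}}$ because $y_1 \in \psi_1(S)$ and $y_2 \in \psi_2(S)$.
So

$$
\begin{aligned}
(\mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi_2 \circ \tilde\psi_1^{-1} \circ \Pi_1^m)(y_1)
  &= (\tilde\psi_2(\tilde\psi_1^{-1}(x_1)),\ 0_{\mathbb{R}^{n-m}}) \\
  &= (x_2,\ 0_{\mathbb{R}^{n-m}}) \\
  &= y_2 \qquad (52.4.4) \\
  &= (\psi_2 \circ \psi_1^{-1})(y_1),
\end{aligned}
$$

where line (52.4.4) follows from Theorem 14.6.12 (i).

Hence $\psi_2 \circ \psi_1^{-1}|_{\psi_1(\mathrm{Dom}(\psi_2) \cap S)} = \mathrm{concat}(\,\cdot\,, 0_{\mathbb{R}^{n-m}}) \circ \tilde\psi_2 \circ \tilde\psi_1^{-1} \circ \Pi_1^m|_{\psi_1(\mathrm{Dom}(\psi_2) \cap S)}$.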


52.4.5 REMARK: Replacing the zero-graph condition with a $C^k$-graph condition.
Theorem 52.4.6 applies Theorem 52.3.12 to Definition 52.4.2 to show that the zero-graph condition can be
replaced by the $C^k$-graph condition (52.4.5), which could be written more briefly as follows.


    $\forall p \in S$, $\exists \psi \in A_S$, $\exists \phi \in \mathrm{atlas}^k_p(M)$, $\exists \Omega \in \mathrm{Top}(\mathbb{R}^m)$,
        $\phi(S) \in C^k(\Omega, \mathbb{R}^{n-m})$ and $\psi|_{\mathrm{Dom}(\phi)} = \Pi^m_1 \circ \phi|_S$.


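52.4.6 THEOREM: Equivalent $C^k$-graph condition for a regular differentiable submanifold.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold with $n = \dim(M)$. Let $m \in \mathbb{Z}^+_0$ with $m \le n$. Then $(S, A_S)$ is an
$m$-dimensional regular $C^k$ submanifold of $M$ if and only if $(S, A_S)$ is a $C^k$ manifold with $S \subseteq M$ and
    $\forall p \in S$, $\exists \psi \in A_S$, $\exists \phi \in \mathrm{atlas}^k_p(M)$, $\exists \Omega \in \mathrm{Top}(\mathbb{R}^m)$, $\exists h \in C^k(\Omega, \mathbb{R}^{n-m})$,    (52.4.5)
        $\phi(S) = \{(x, h(x));\ x \in \Omega\}$ and $\psi|_{\mathrm{Dom}(\phi)} = \Pi^m_1 \circ \phi|_S$.

PROOF: Let $(S, A_S)$ be an $m$-dimensional regular $C^k$ submanifold of $M$. Then by Definition 52.4.2, $(S, A_S)$
is a $C^k$ manifold with $S \subseteq M$ which satisfies condition (52.4.1). Let $p \in S$. Then there exist $\psi \in A_S$
and $\phi \in \mathrm{atlas}^k_p(M)$ such that $\phi(S) = \mathrm{Range}(\phi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\psi|_{\mathrm{Dom}(\phi)} = \Pi^m_1 \circ \phi|_S$. Let
$\Omega = \Pi^m_1(\phi(S))$. Then $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ by Theorem 32.10.3 (i). Let $h = \Omega \times \{0_{\mathbb{R}^{n-m}}\}$. Then $h \in C^k(\Omega, \mathbb{R}^{n-m})$
by Theorem 42.6.2, and $\phi(S) = h$. So $S$ satisfies condition (52.4.5).

Now let $(S, A_S)$ be a $C^k$ manifold with $S \subseteq M$ satisfying (52.4.5). Let $p \in S$. Then there exist $\psi \in A_S$,
$\phi_1 \in \mathrm{atlas}^k_p(M)$, $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ and $h_1 \in C^k(\Omega, \mathbb{R}^{n-m})$ such that $\phi_1(S) = h_1$ and $\psi|_{\mathrm{Dom}(\phi_1)} = \Pi^m_1 \circ \phi_1|_S$.
Let $h_2 = \Omega \times \{0_{\mathbb{R}^{n-m}}\}$. Then $h_2 \in C^k(\Omega, \mathbb{R}^{n-m})$ by Theorem 42.6.2. Therefore by Theorem 52.3.12, there
exists $\phi_2 \in \mathrm{atlas}^k(M)$ such that $\phi_2(S) = h_2$, $\Pi^m_1 \circ \phi_1|_S = \Pi^m_1 \circ \phi_2|_S$ and $\mathrm{Range}(\phi_2) \subseteq \Omega \times \mathbb{R}^{n-m}$. Then
$p \in \mathrm{Dom}(\phi_1) \cap S = \mathrm{Dom}(\Pi^m_1 \circ \phi_1|_S)$. So $p \in \mathrm{Dom}(\phi_2) \cap S = \mathrm{Dom}(\Pi^m_1 \circ \phi_2|_S)$. So $\phi_2 \in \mathrm{atlas}^k_p(M)$. Since
$\mathrm{Range}(\phi_2) \subseteq \Omega \times \mathbb{R}^{n-m}$, it follows that


    $\mathrm{Range}(\phi_2) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) = \mathrm{Range}(\phi_2) \cap (\Omega \times \{0_{\mathbb{R}^{n-m}}\})$
        $= \mathrm{Range}(\phi_2) \cap \phi_2(S)$
        $= \phi_2(S)$


because $\phi_2(S) \subseteq \mathrm{Range}(\phi_2)$. Thus for all $p \in S$, there exist $\psi \in A_S$ and $\phi_2 \in \mathrm{atlas}^k_p(M)$ which satisfy
$\phi_2(S) = \mathrm{Range}(\phi_2) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\psi|_{\mathrm{Dom}(\phi_2)} = \Pi^m_1 \circ \phi_2|_S$. Hence by Definition 52.4.2, $(S, A_S)$ is
an $m$-dimensional regular $C^k$ submanifold of $M$.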
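
The passage from a graph condition to the zero-graph condition can be illustrated by a small numerical sketch. It takes M = R^2 with the identity chart and S equal to the parabola {(x, x^2)}, so that n = 2, m = 1 and h(x) = x^2. The shear map (x, y) -> (x, y - h(x)) is an assumed choice of flattening chart; the construction in Theorem 52.3.12 is not reproduced in this section and may differ. All names (h, phi1, phi2, phi2_inv, proj) are hypothetical.

```python
# Sketch: flattening the graph of h by a shear, in M = R^2 with m = 1, n = 2.
# Assumption: the flattening chart is phi2(x, y) = (x, y - h(x)).

def h(x):
    """Graph function h on Omega = R with values in R^(n-m) = R."""
    return x * x

def phi1(p):
    """Ambient chart phi1: the identity chart on R^2, so phi1(S) is the graph of h."""
    return p

def phi2(p):
    """Flattened ambient chart: shears the graph of h onto R x {0}."""
    x, y = p
    return (x, y - h(x))

def phi2_inv(q):
    """Inverse of phi2, which shows that phi2 is a bijection of R^2."""
    x, y = q
    return (x, y + h(x))

def proj(q):
    """Projection onto the first m = 1 components."""
    return q[0]

# Sample points of S = {(x, h(x))}; quarter-integers keep the arithmetic exact.
S_points = [(x / 4.0, h(x / 4.0)) for x in range(-8, 9)]

# phi2(S) lies in R x {0}, and the projections of phi1|_S and phi2|_S agree.
zero_graph_ok = all(phi2(p)[1] == 0.0 for p in S_points)
projections_agree = all(proj(phi1(p)) == proj(phi2(p)) for p in S_points)
round_trip_ok = all(phi2_inv(phi2(p)) == p for p in S_points)
print(zero_graph_ok, projections_agree, round_trip_ok)
```

The point of the sketch is that the shear leaves the first m components unchanged, which is why the same submanifold chart works for both ambient charts.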


52.4.7 REMARK: Constant-graph condition for regularity of a differentiable submanifold.

Theorem 52.4.8 is the same as Theorem 52.3.16, except that it additionally proves that the given $C^k$ manifold
atlas $A_S$ for the submanifold set $S$ is consistent with the submanifold structure induced by the ambient space
atlas. If the set $S$ is only proved to be a regular $C^k$ submanifold point-set by Theorem 52.3.16, then a suitable
submanifold atlas $A_S$ can be constructed as in Theorem 52.4.11, but Theorem 52.4.8 tests a given atlas $A_S$.


As mentioned in Remark 52.3.15, an arbitrary tuple $x \in \mathbb{R}^{n-m}$ could have been incorporated into
Definition 52.4.2 instead of the fixed choice of the zero tuple $0_{\mathbb{R}^{n-m}}$. This would have the disadvantage of
complicating the logic of downstream theorems and definitions, which would be a distraction from the necessary
complexity of other aspects of regular submanifolds. However, in numerous situations it is straightforward
to demonstrate condition (52.4.6), and then Theorem 52.4.8 or some similar argument must be applied in
order to verify that Definition 52.4.2 is satisfied.


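52.4.8 THEOREM: Equivalent constant-graph condition for a regular differentiable submanifold.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold with $n = \dim(M)$. Let $m \in \mathbb{Z}^+_0$ with $m \le n$. Then $(S, A_S)$ is an
$m$-dimensional regular $C^k$ submanifold of $M$ if and only if $(S, A_S)$ is a $C^k$ manifold with $S \subseteq M$ and
    $\forall p \in S$, $\exists \psi \in A_S$, $\exists \phi \in \mathrm{atlas}^k_p(M)$, $\exists x \in \mathbb{R}^{n-m}$,    (52.4.6)
        $\phi(S) = \mathrm{Range}(\phi) \cap (\mathbb{R}^m \times \{x\})$ and $\psi|_{\mathrm{Dom}(\phi)} = \Pi^m_1 \circ \phi|_S$.


PROOF: Let $(S, A_S)$ be an $m$-dimensional regular $C^k$ submanifold of $M$. Then condition (52.4.6) is satisfied
by Definition 52.4.2 with $x = 0_{\mathbb{R}^{n-m}}$.

Now suppose that $(S, A_S)$ is a $C^k$ manifold with $S \subseteq M$ which satisfies condition (52.4.6). Let $p \in S$.
Then there exist $\psi \in A_S$, $\phi \in \mathrm{atlas}^k_p(M)$ and $x \in \mathbb{R}^{n-m}$ which satisfy $\phi(S) = \mathrm{Range}(\phi) \cap (\mathbb{R}^m \times \{x\})$
and $\psi|_{\mathrm{Dom}(\phi)} = \Pi^m_1 \circ \phi|_S$. Let $\Omega = \Pi^m_1(\phi(S))$. Then $\Omega \in \mathrm{Top}(\mathbb{R}^m)$ by Theorem 32.10.3 (i) because
$\Omega = \Pi^m_1(\mathrm{Range}(\phi) \cap (\mathbb{R}^m \times \{x\}))$, where $\mathrm{Range}(\phi) \in \mathrm{Top}(\mathbb{R}^n)$. Let $h = \{(z, x);\ z \in \Omega\}$. Then $\phi(S) = h$,
and $h \in C^k(\Omega, \mathbb{R}^{n-m})$ by Theorem 42.6.2. Hence $(S, A_S)$ is an $m$-dimensional regular $C^k$ submanifold of $M$
by Theorem 52.4.6.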


52.4.9 REMARK: Adding and removing regular submanifold atlases. The "relative atlas".

Theorem 52.4.10 makes the easy observation that if the atlas $A_S$ is removed from an $m$-dimensional regular
$C^k$ submanifold $(S, A_S)$ of a $C^k$ manifold $M$, then the set $S$ is an $m$-dimensional regular $C^k$ submanifold
point-set according to Definition 52.3.7.

Theorem 52.4.11 reverses the atlas removal in Theorem 52.4.10 by constructing an atlas to give the point-set a
suitable manifold structure. There are in general infinitely many choices for such an atlas. Theorem 52.4.11
constructs a particular atlas $A_S$ by including the projections $\Pi^m_1 \circ \phi|_S$ of all $C^k$ charts $\phi$ which satisfy
Definition 52.4.2 line (52.4.1). This will not usually be the same as the original atlas which was removed.
So some information will typically be lost by removing and reconstructing the atlas. (The construction
in Theorem 52.4.11 is essentially the same as the construction which is described much more briefly by
Lang [23], pages 25-26.)


The atlas $A_S$ constructed on line (52.4.7) in Theorem 52.4.11 could be thought of as a kind of "relative atlas"
on the submanifold point-set $S$. This is by analogy with the relative topology in Definition 31.6.2. However,
although a relative topology is a valid topology for all subsets of a given topological space, this "relative
atlas" is only valid if it is known that the set $S$ satisfies Definition 52.3.7. But to satisfy this definition, a
suitable atlas on $S$ must be provided, and then there will be no need to construct one. Therefore the analogy
with relative topologies is not very strong.


52.4.10 THEOREM: Submanifold atlas removal.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $(S, A_S)$ be a regular $C^k$ submanifold of a $C^k$ manifold $M$. Then $S$ is an $m$-dimensional
regular $C^k$ submanifold point-set in $M$.


PROOF: The assertion follows from Definition 52.4.2 and Theorem 52.3.8 (i).


52.4.11 THEOREM: Submanifold maximal atlas construction.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold. Let $m \in \mathbb{Z}^+_0$ with $m \le n = \dim(M)$. Let $S$ be a non-empty
$m$-dimensional regular $C^k$ submanifold point-set in $M$. Define


    $A_S = \{\Pi^m_1 \circ \phi|_S;\ \phi \in \mathrm{atlas}^k(M)$ and $\phi(S) = \mathrm{Range}(\phi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})\}$,    (52.4.7)


where $\Pi^m_1 : \mathbb{R}^n \to \mathbb{R}^m$ is defined in Notation 11.5.26. Then $(S, A_S)$ is a regular $C^k$ submanifold of $M$, and
the topology induced by $A_S$ on $S$ is the relative topology of $S$ in $M$.


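PROOF: Let $p \in S$. Let $H^{n,m} = \mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}$. By Theorem 52.3.8 (i), there is a chart $\phi \in \mathrm{atlas}^k_p(M)$
with $\phi(S) = \mathrm{Range}(\phi) \cap H^{n,m}$. Then $\psi = \Pi^m_1 \circ \phi|_S \in A_S$ by line (52.4.7), and $p \in \mathrm{Dom}(\psi)$ because
$p \in \mathrm{Dom}(\phi)$ and $p \in S$. Thus $S \subseteq \bigcup_{\psi \in A_S} \mathrm{Dom}(\psi)$. But $\mathrm{Dom}(\psi) \subseteq S$ for all $\psi \in A_S$. Consequently
$S = \bigcup_{\psi \in A_S} \mathrm{Dom}(\psi)$. Thus $A_S$ satisfies condition (ii) of Definition 51.3.2 for a $C^k$ atlas for $S$.


Let $T_S = \{\Omega \cap S;\ \Omega \in \mathrm{Top}(M)\}$, the relative topology of $S$ in $M$. Let $\psi \in A_S$. Then $\psi = \Pi^m_1 \circ \phi|_S$ for some
$\phi \in \mathrm{atlas}^k(M)$ with $\phi(S) = \mathrm{Range}(\phi) \cap H^{n,m}$. Therefore $\mathrm{Dom}(\psi) = \mathrm{Dom}(\phi|_S) = \mathrm{Dom}(\phi) \cap S \in T_S$ because
$\mathrm{Dom}(\phi) \in \mathrm{Top}(M)$, and $\mathrm{Range}(\psi) = \Pi^m_1(\mathrm{Range}(\phi|_S)) = \Pi^m_1(\phi(S)) = \Pi^m_1(\mathrm{Range}(\phi) \cap H^{n,m}) \in \mathrm{Top}(\mathbb{R}^m)$
by Theorem 32.10.3 (i). The continuity of $\psi$ follows from the continuity of $\Pi^m_1$ and $\phi$. To show that $\psi$ is
injective, let $p_1, p_2 \in \mathrm{Dom}(\phi) \cap S$ with $\psi(p_1) = \psi(p_2)$. Then $\Pi^m_1(\phi(p_1)) = \Pi^m_1(\phi(p_2))$. But $\Pi^n_{m+1}(\phi(p_1)) =
0_{\mathbb{R}^{n-m}} = \Pi^n_{m+1}(\phi(p_2))$. So $\phi(p_1) = \phi(p_2)$. So $p_1 = p_2$ because $\phi$ is injective. Thus $\psi : \mathrm{Dom}(\psi) \to \mathrm{Range}(\psi)$
is a continuous bijection between open sets. To show that $\psi^{-1}$ is continuous, let $\tilde\Omega \in T_S$. Then $\tilde\Omega = \Omega \cap S$ for
some $\Omega \in \mathrm{Top}(M)$. But $\phi(\Omega \cap S) = \phi(\Omega) \cap \phi(S)$ by Theorem 10.6.7 (iv') and the injectivity of $\phi$. Therefore
$\psi(\tilde\Omega) = \Pi^m_1(\phi(\Omega \cap S)) = \Pi^m_1(\phi(\Omega) \cap \mathrm{Range}(\phi) \cap H^{n,m}) = \Pi^m_1(\phi(\Omega) \cap H^{n,m})$. So $\psi(\tilde\Omega) \in \mathrm{Top}(\mathbb{R}^m)$ by
Theorem 32.10.3 (i). Consequently $\psi^{-1}$ is continuous. Thus $\psi : \mathrm{Dom}(\psi) \to \mathrm{Range}(\psi)$ is a homeomorphism
with respect to the topologies $T_S$ and $\mathrm{Top}(\mathbb{R}^m)$. So $A_S$ satisfies condition (i) of Definition 51.3.2 for a $C^k$
atlas for $S$.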

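Let $\psi_\alpha \in A_S$ and $\tilde U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$. Let $\tilde U = \tilde U_1 \cap \tilde U_2$. Then $\tilde U \in T_S$ because $\tilde U_1, \tilde U_2 \in T_S$.
Therefore $\mathrm{Dom}(\psi_2 \circ \psi_1^{-1}) = \psi_1(\tilde U) \in \mathrm{Top}(\mathbb{R}^m)$ and $\mathrm{Range}(\psi_2 \circ \psi_1^{-1}) = \psi_2(\tilde U) \in \mathrm{Top}(\mathbb{R}^m)$. Let $x \in \psi_1(\tilde U)$. Then
$\psi_1^{-1}(x) = (\phi_1|_S)^{-1}(x, 0_{\mathbb{R}^{n-m}})$, where $(x, 0_{\mathbb{R}^{n-m}})$ is the concatenation of the real tuples $x$ and
$0_{\mathbb{R}^{n-m}}$ as in 16.4.3. So $\psi_1^{-1}(x) = \phi_1^{-1}(x, 0_{\mathbb{R}^{n-m}})$ because $\phi_1(S) \subseteq H^{n,m}$. It then follows
that $\psi_2(\psi_1^{-1}(x)) = \Pi^m_1(\phi_2|_S(\phi_1^{-1}(x, 0_{\mathbb{R}^{n-m}})))$, which implies $\psi_2(\psi_1^{-1}(x)) = \Pi^m_1(\phi_2(\phi_1^{-1}(x, 0_{\mathbb{R}^{n-m}})))$ because
$\phi_1^{-1}(x, 0_{\mathbb{R}^{n-m}}) = \psi_1^{-1}(x) \in \psi_1^{-1}(\psi_1(\tilde U)) = \tilde U \subseteq S$. But since $\phi_2 \circ \phi_1^{-1} \in C^k(\phi_1(U), \phi_2(U))$, where
$U = \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)$, it follows that $\psi_2 \circ \psi_1^{-1} \in C^k(\psi_1(\tilde U), \psi_2(\tilde U))$. Thus $A_S$ satisfies condition (iii) of
Definition 51.3.2 for a $C^k$ atlas for $S$. So $(S, A_S) \mathrel{-\!<} ((S, T_S), A_S)$ is a $C^k$ manifold for which the topology
$T_S$ induced by the atlas $A_S$ is the relative topology of $S$ in $M$. But by line (52.4.7), $(S, A_S)$ satisfies
Definition 52.4.2. Hence $(S, A_S)$ is a regular $C^k$ submanifold of $M$, and the topology induced by $A_S$ on $S$ is
the relative topology of $S$ in $M$.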


52.4.12 REMARK: Replacing a given submanifold atlas with its $C^k$ completion.

Whereas Theorem 52.4.11 constructs a maximal atlas for a regular submanifold point-set when the point-set
has no given atlas at all, it is sometimes necessary to consider the consequences of replacing a given atlas with
the $C^k$ completion of that atlas for some $k \in \bar{\mathbb{Z}}^+_0$. It is shown in Theorem 52.4.13 that the atlas $A_S = \mathrm{atlas}(S)$
in Definition 52.4.2 can be replaced with $\mathrm{atlas}^k(S)$, which is the $C^k$ completion of $A_S$. This shows that the
zero-graph condition in the definition of a regular $C^k$ submanifold is equivalent to an apparently weaker
condition, which requires the submanifold to pass the test for some chart $\psi$ in a larger class of charts. This
makes it easier to prove that a submanifold is a regular $C^k$ submanifold.


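52.4.13 THEOREM: Equivalent definition for regular $C^k$ submanifold using its $C^k$-complete atlas.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold. Let $S \mathrel{-\!<} (S, A_S)$ be a $C^k$ manifold with $m = \dim(S) \le n = \dim(M)$
and $S \subseteq M$. Then $S$ is an $m$-dimensional regular $C^k$ submanifold of $M$ if and only if


    $\forall p \in S$, $\exists \psi \in \mathrm{atlas}^k(S)$, $\exists \phi \in \mathrm{atlas}^k_p(M)$,
        $\phi(S) = \mathrm{Range}(\phi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\psi|_{\mathrm{Dom}(\phi)} = \Pi^m_1 \circ \phi|_S$.    (52.4.8)

PROOF: Suppose that $S$ is an $m$-dimensional regular $C^k$ submanifold of $M$. Let $p \in S$. Then there exist
$\psi \in A_S$ and $\phi \in \mathrm{atlas}^k_p(M)$ such that line (52.4.8) is satisfied by Definition 52.4.2. But $A_S \subseteq \mathrm{atlas}^k(S)$ by
Definition 51.4.5. So there exist $\psi \in \mathrm{atlas}^k(S)$ and $\phi \in \mathrm{atlas}^k_p(M)$ such that line (52.4.8) is satisfied.

For the converse, suppose that for all $p \in S$, there exist $\psi \in \mathrm{atlas}^k(S)$ and $\phi \in \mathrm{atlas}^k_p(M)$ such that
line (52.4.8) is satisfied. Let $A^k_S = \mathrm{atlas}^k(S)$. Then $(S, A^k_S)$ is an $m$-dimensional regular $C^k$ submanifold of
$M$ by Definition 52.4.2. Let $p \in S$. Then by Theorem 52.4.4 (xiii), $\forall \psi \in A^k_{S,p}$, $A^k(p, \psi) \ne \emptyset$, where $A^k_{S,p} =
\{\psi \in A^k_S;\ p \in \mathrm{Dom}(\psi)\}$. Let $\psi_0 \in \mathrm{atlas}_p(S) = \{\psi \in A_S;\ p \in \mathrm{Dom}(\psi)\}$. Then $\psi_0 \in A^k_{S,p}$ because $A_S \subseteq A^k_S$.
So $A^k(p, \psi_0) \ne \emptyset$. Thus $\forall \psi_0 \in \mathrm{atlas}_p(S)$, $A^k(p, \psi_0) \ne \emptyset$. But $\mathrm{atlas}_p(S) \ne \emptyset$ by Definition 51.3.2 (ii). So
$\exists \psi_0 \in \mathrm{atlas}_p(S)$, $A^k(p, \psi_0) \ne \emptyset$ by Theorem 7.6.7. Hence $(S, A_S)$ is an $m$-dimensional regular $C^k$ submanifold
of $M$ by Definition 52.4.2.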


52.4.14 REMARK: Formula relating submanifold and ambient space chart transition maps.

Theorem 52.4.15 gives a formula for $m$ components of chart transition maps for the ambient space atlas in
terms of the chart transition maps for an $m$-dimensional $C^1$ submanifold. This is useful for computations
for tangent vector embedding maps, as in the proof of Theorem 54.6.4. Unsurprisingly, the formula only
applies to $m$ derivatives because these are the derivative directions which lie within the tangent bundle of
the submanifold. Nothing can be said about the tangential directions which lie outside this tangent bundle.


52.4.15 THEOREM: Relation between chart transition maps for a submanifold and ambient space.
Let $S$ be a regular $C^1$ submanifold of a $C^1$ manifold $M$, where $n = \dim(M)$ and $m = \dim(S)$. Then


Vp € S, Vii, d» € atlas, (S), Vii € A (p, V1), Vi» € Al (p, us), Vi € Nm, Vj € Nn, 
m Os (5 (91^ (2), s e, EIE Nm 
: J 1 = T 2N P1 z—a, 
y (3 (91^ |, Lu = E v (p) modd (52.4.9) 
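where $A^1(p, \psi)$ is defined for $\psi \in \mathrm{atlas}_p(S)$ as in the statement of Theorem 52.4.4, line (52.4.2).

PROOF: Let $p \in S$ and $\psi_1, \psi_2 \in \mathrm{atlas}_p(S)$. Then by Theorem 52.4.4 line (52.4.2), for $\ell = 1, 2$,
    $A^1(p, \psi_\ell) = \{\phi_\ell \in \mathrm{atlas}^1_p(M);\ \phi_\ell(S) = \mathrm{Range}(\phi_\ell) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\psi_\ell|_{\mathrm{Dom}(\phi_\ell)} = \Pi^m_1 \circ \phi_\ell|_S\}$.


Let $\phi_\ell \in A^1(p, \psi_\ell)$ for $\ell = 1, 2$. By Theorem 52.4.4 (xvii), $\phi_2(\phi_1^{-1}(y)) = (\psi_2(\psi_1^{-1}(\Pi^m_1(y))), 0_{\mathbb{R}^{n-m}})$ for all
$y \in \phi_1(\mathrm{Dom}(\phi_2) \cap S)$. So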


Vic Nm, Yj E Nn, 

8, GG GL, cu, cy = Dut (Car CIT (y))), On), 
zi (pil bi (09) ato) if j € Nm 
0 if j E Nm 
because $\Pi^m_1 \circ \phi_1|_S = \psi_1|_{\mathrm{Dom}(\phi_1)}$. This verifies line (52.4.9).


52.4.16 DEFINITION: The restriction of a manifold $M \mathrel{-\!<} (M, A_M)$ to an open subset $S$ of $M$ is the pair
$(S, A_S)$ where $S$ has the relative topology from $M$ and $A_S = \{\phi|_S;\ \phi \in A_M\}$.


52.4.17 THEOREM: Restrictions of $C^k$ manifolds to open sets are regular $C^k$ submanifolds.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold. Let $S \in \mathrm{Top}(M)$. Let $A_S = \{\phi|_S;\ \phi \in A_M\}$.


(i) $(S, A_S)$ is a $C^k$ manifold.
(ii) The restriction of $M$ to $S$ is a regular $C^k$ submanifold of $M$.


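PROOF: For part (i), let $\psi \in A_S$. Then $\psi = \phi|_S$ for some $\phi \in A_M$. So $\mathrm{Dom}(\psi) = \mathrm{Dom}(\phi) \cap S \in \mathrm{Top}(S)$ by
Definition 31.6.2. Let $n = \dim(M)$. Then $\mathrm{Range}(\psi) = \phi(S) \in \mathrm{Top}(\mathbb{R}^n)$ by Definition 31.12.4 because $\phi^{-1}$
is continuous, and $\psi : \mathrm{Dom}(\psi) \to \mathrm{Range}(\psi)$ is a homeomorphism by Theorem 31.14.14. Thus $A_S$ satisfies
Definition 51.3.2 (i) for a $C^k$ atlas on $S$. Since $\bigcup_{\phi \in A_M} \mathrm{Dom}(\phi) = M$, it follows from Theorem 10.8.14 (i)
that $\bigcup_{\psi \in A_S} \mathrm{Dom}(\psi) = \bigcup_{\phi \in A_M} (S \cap \mathrm{Dom}(\phi)) = S \cap \bigcup_{\phi \in A_M} \mathrm{Dom}(\phi) = S \cap M = S$. Thus $A_S$ satisfies
Definition 51.3.2 (ii).

Let $\phi_1, \phi_2 \in A_M$, and let $\psi_\alpha = \phi_\alpha|_S$ for $\alpha = 1, 2$. Let $U = \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)$. Then $\phi_2 \circ \phi_1^{-1} : \phi_1(U) \to \phi_2(U)$ is a $C^k$ map by
Definition 51.3.2 (iii). So $\psi_2 \circ \psi_1^{-1} = (\phi_2|_S) \circ (\phi_1|_S)^{-1} = (\phi_2 \circ \phi_1^{-1})|_{\phi_1(S \cap U)} : \phi_1(S \cap U) \to \phi_2(S \cap U)$
is $C^k$. (This follows from the locality of Definition 42.5.11 for $C^k$ maps between Cartesian spaces.) Thus
the transition map $\psi_2 \circ \psi_1^{-1} : \psi_1(\tilde U) \to \psi_2(\tilde U)$ is $C^k$, where $\tilde U = \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)$. Thus $A_S$ satisfies
Definition 51.3.2 (iii). So $A_S$ is a $C^k$ atlas on $S$. Hence $(S, A_S)$ is a $C^k$ manifold by Definition 51.3.8 and
Theorem 33.1.33 (iii).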

For part (ii), let p € S. Then there exists Y € atlas,(M). Let v= V|s. Then € Ag and y € atlas; (M) 
because atlas; (M) C atlas? (M). Let n = dim(M). Then (S) = Range(/) N R” because Range(7) C IR", 


and W Seat = V snpoma) = V|s —IIT o V|s because II? = idan. Hence S is a regular C^ submanifold of 
M by part (i) and Definition 52.4.2. 


52.4.18 THEOREM: Differentiability of restriction of manifolds to submanifolds.
Let $k \in \bar{\mathbb{Z}}^+_0$. Let $M$ be a $C^k$ manifold. Let $S$ be a regular $C^k$ submanifold of $M$. Then $\mathrm{id}_S \in C^k(S, M)$.


PROOF: Let $n = \dim(M)$ and $m = \dim(S)$. Let $p \in S$. Then Theorem 52.4.13 implies that there exist
$\psi \in \mathrm{atlas}^k(S)$ and $\phi \in \mathrm{atlas}^k_p(M)$ such that $\phi(S) = \mathrm{Range}(\phi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\})$ and $\psi|_{\mathrm{Dom}(\phi)} = \Pi^m_1 \circ \phi|_S$.


Then v o idg o ^ =yo V^ Ls =yo ^! by Theorem 10.10.15 (v) and the fact that YCS) C Range(). 
But then y o py! = y o I ie =o (If o g[,)7! — 9 o (IIP o ds) = v o (I ys) o Wt 
by Theorem 10.10.15 (viii), where I|, = VIT" |pange(y) MGR” (0s es) = TOY" P a Range(p)? Which 
Nees op= TOP unser c o v. But VIT eset has a well-defined inverse because it is 
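a bijection from $\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}$ to $\mathbb{R}^m$. Let $P = \Pi^m_1|_{\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}}$. Then $\phi \circ \psi^{-1} = \phi \circ \phi^{-1} \circ P^{-1} =
\mathrm{id}_{\mathrm{Range}(\phi)} \circ P^{-1}$ by Theorem 10.10.15 (xi). Since $P^{-1} : (x_i)_{i=1}^m \mapsto (z_i)_{i=1}^n$ with $z_i = x_i$ for $i \in \mathbb{N}_m$, and
$z_i = 0$ for $i \in \mathbb{N}_n \setminus \mathbb{N}_m$, it follows that $P^{-1} \in C^\infty(\mathbb{R}^m, \mathbb{R}^n)$.
Since $\mathrm{id}_{\mathrm{Range}(\phi)}$ is a $C^\infty$ map on $\mathrm{Range}(\phi) \in \mathrm{Top}(\mathbb{R}^n)$, it then follows that $\phi \circ \mathrm{id}_S \circ \psi^{-1} = \mathrm{id}_{\mathrm{Range}(\phi)} \circ P^{-1}$
is a $C^\infty$ function. Hence $\mathrm{id}_S \in C^k(S, M)$ by Theorems 52.1.15 and 52.1.11.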


implies IIT 


52.5. Differentiable embeddings, immersions and submersions 


52.5.1 REMARK: Styles of specification of the regularity of maps between manifolds. 
In the literature, there are three ways of specifying the regularity of an embedding or immersion between 
differentiable manifolds. 


(1) Injectivity of the differential. This style of definition is given by Bishop/Crittenden [2], page 185;
Darling [8], page 53; Do Carmo [9], page 11; Frankel [12], page 169; Gallot/Hulin/Lafontaine [13], page 12;
Kosinski [21], page 27; Szekeres [305], page 429; O'Neill [295], page 42; Kobayashi/Nomizu [19], page 9.
(For submersions, the injectivity condition is replaced by a surjectivity condition.)

(2) Constant rank of the differential. This style is given by Crampin/Pirani [7], page 243; Spivak [37],
Volume 1, page 46; Flanders [11], pages 52-53; EDM2 [113], page 385; Sulanke/Wintgen [40], page 31;
Gómez-Ruiz [14], page 27; Choquet-Bruhat [6], page 13; Malliavin [28], page 52.

(3) Local regularity of the graph of the map via some chart. This style is given by Bishop/Goldberg [3],
pages 40-41; Lang [23], pages 25-26.


As mentioned in Remark 50.2.2, requiring local regularity of the graph has substantial advantages, including 
simplicity, generality and extensibility, although most authors seem to favour definitions which are expressed 
in terms of the differential, requiring it to be injective or surjective, or else of constant rank. In the topological 
manifold case, only the local graph style of definition is meaningful because the differential is not defined. 


The approach taken here is to give local graph-style definitions for embeddings, immersions and submersions.
Then definition styles (1) and (2), which are expressed in terms of the differential of a $C^1$ map, may be shown
to be equivalent to the graph-based definitions for $C^k$ regularity classes with $k \in \bar{\mathbb{Z}}^+$.


Definition styles (1) and (2), based on the differential, are most convenient when the maps between manifolds 
are given as explicit functions which can be explicitly differentiated in local coordinates to determine the 
relevant properties of the differential. 


52.5.2 REMARK: Regular differentiable embeddings, immersions and submersions. 

Definitions 52.5.3, 52.5.4 and 52.5.6 extend Definitions 50.3.6, 50.3.8 and 50.3.13 respectively from topological 
manifolds to differentiable manifolds. The topological manifold definitions are the same as the special case 
k = 0 of the differentiable manifold definitions. 


Definition 52.5.3 relies upon Definition 52.3.7 to define a regular $C^k$ differentiable submanifold of the target
space $M_2$.


52.5.3 DEFINITION: A regular $C^k$ embedding of a $C^k$ manifold $M_1$ in a $C^k$ manifold $M_2$, for $k \in \bar{\mathbb{Z}}^+_0$, is a
$C^k$ diffeomorphism from $M_1$ to a regular $C^k$ submanifold of $M_2$.


52.5.4 DEFINITION: A regular $C^k$ immersion of a $C^k$ manifold $M_1$ in a $C^k$ manifold $M_2$, for $k \in \bar{\mathbb{Z}}^+_0$, is a
$C^k$ differentiable map $f : M_1 \to M_2$ such that


    $\forall p \in M_1$, $\exists \Omega \in \mathrm{Top}_p(M_1)$,
        $f|_\Omega : \Omega \to f(\Omega)$ is a $C^k$ diffeomorphism, and $f(\Omega)$ is a regular $C^k$ submanifold of $M_2$.


52.5.5 REMARK: Local regularity condition for submersions. 

The local regularity condition in Definition 52.5.6 requires the existence of $C^k$ compatible charts $\psi_1$ and $\psi_2$
such that when viewed via these charts, the map $f$ looks like a projection from the full tuple $x = (x_i)_{i=1}^{n_1}$
to the restricted tuple $\psi_2(f(\psi_1^{-1}(x))) = (x_i)_{i=1}^{n_2}$. This style of definition has the advantage of avoiding
computations of ranks of Jacobian matrices, which would be required by the constant-rank definition style (2)
in Remark 52.5.1.


52.5.6 DEFINITION: A regular $C^k$ submersion of a $C^k$ manifold $M_1$ in a $C^k$ manifold $M_2$, for $k \in \bar{\mathbb{Z}}^+_0$,
where $n_1 = \dim(M_1) \ge n_2 = \dim(M_2)$, is a $C^k$ differentiable map $f : M_1 \to M_2$ such that


    $\forall p \in M_1$, $\exists \psi_1 \in \mathrm{atlas}^k_p(M_1)$, $\exists \psi_2 \in \mathrm{atlas}^k_{f(p)}(M_2)$, $\forall x \in \mathrm{Dom}(\psi_2 \circ f \circ \psi_1^{-1})$,
        $\psi_2(f(\psi_1^{-1}(x))) = (x_i)_{i=1}^{n_2}$.

52.6. Products of differentiable manifolds 


52.6.1 REMARK: Extension of direct products of topological manifolds to differentiable manifolds.

Direct products of topological manifolds are presented in Section 50.4, both with and without atlases.
(See Definitions 50.4.2 and 50.4.8.) Since the underlying topological space of every differentiable manifold
$((M, T), A)$ is a topological manifold $(M, T)$, the definitions, notations and theorems for topological manifolds
are automatically applicable to differentiable manifolds.


Definition 52.6.2 applies the product of topological manifold atlases in Definition 50.4.6 to define the
underlying topology for a differentiable manifold product. However, whereas the product of topological manifolds
(without atlases) is a special case of the product of topological spaces, the product of differentiable
manifolds requires the construction of the product of the atlases which specify the differentiable structure on each
manifold.


The product atlas for differentiable manifolds in Definition 52.6.2 is the same as for topological manifolds
in Definition 50.4.6, but the expected basic properties of product atlases for differentiable manifolds do not
follow immediately from the corresponding properties for topological manifolds in Theorems 50.4.7, 50.5.4
and 50.5.8. Such properties must be proved, as in Theorem 52.6.5.


52.6.2 DEFINITION: The (direct) product of two differentiable manifolds $((M_1, T_1), A_1)$ and $((M_2, T_2), A_2)$
is the differentiable manifold $((M_1 \times M_2, T), A)$, where $M_1 \times M_2$ is the Cartesian product of sets $M_1$ and $M_2$,
$T$ is the product topology of $T_1$ and $T_2$, and $A$ is the product atlas of $A_1$ and $A_2$ as in Definition 50.4.6. (In
other words, $A = \{\psi_1 \times \psi_2;\ \psi_1 \in A_1$ and $\psi_2 \in A_2\}$.)


52.6.3 NOTATION: $M_1 \times M_2$, for differentiable manifolds $M_1$ and $M_2$, denotes the differentiable-manifold
direct product of $M_1$ and $M_2$ as in Definition 52.6.2.


52.6.4 REMARK: Validation of direct product of two differentiable manifolds.
Theorem 52.6.5 verifies that the differentiable manifold product in Definition 52.6.2 is a well-defined
differentiable manifold, and confirms the expected dimension.


52.6.5 THEOREM: $C^k$ manifold products are well-defined $C^k$ manifolds.
Let $M_1 \mathrel{-\!<} ((M_1, T_1), A_1)$ and $M_2 \mathrel{-\!<} ((M_2, T_2), A_2)$ be non-empty $C^k$ manifolds for some $k \in \bar{\mathbb{Z}}^+_0$. Let $M$
be the Cartesian product of sets $M_1$ and $M_2$. Let $T$ be the product of topologies $T_1$ and $T_2$. Let $A$ be the
product of atlases $A_1$ and $A_2$.

(i) $(M, T)$ is a non-empty topological manifold with $\dim(M, T) = \dim(M_1, T_1) + \dim(M_2, T_2)$.

(ii) $((M, T), A)$ is a non-empty $C^k$ manifold.


In other words, $M_1 \times M_2$ is a $C^k$ manifold with $\dim(M_1 \times M_2) = \dim(M_1) + \dim(M_2)$.


PROOF: For part (i), $(M_1, T_1)$ and $(M_2, T_2)$ are topological manifolds by Definition 51.3.8. So $(M, T)$ is a
non-empty topological manifold with $\dim(M, T) = \dim(M_1, T_1) + \dim(M_2, T_2)$ by Theorem 50.4.3.

For part (ii), $A_1$ and $A_2$ are $C^k$ manifold atlases for their respective underlying topological manifolds $(M_1, T_1)$
and $(M_2, T_2)$ by Definition 51.3.8. So $A_1$ and $A_2$ are continuous atlases for $(M_1, T_1)$ and $(M_2, T_2)$ by
Definition 49.7.3. So the product atlas $A = \{\psi_1 \times \psi_2;\ \psi_1 \in A_1, \psi_2 \in A_2\}$ is well defined by Definition 50.4.6.
Then by Theorem 50.4.7, $A$ is a topological manifold atlas for $M$ which induces the topology $T$ on $M$.

Let $\phi_1, \phi_2 \in A$. Then $\phi_i = \psi_{i,1} \times \psi_{i,2}$ for some $\psi_{i,j} \in A_j$ for $j = 1, 2$, for $i = 1, 2$. Then the transition map
from $\phi_1$ to $\phi_2$ is $\phi_2 \circ \phi_1^{-1} = (\psi_{2,1} \times \psi_{2,2}) \circ (\psi_{1,1} \times \psi_{1,2})^{-1}$, which equals $(\psi_{2,1} \circ \psi_{1,1}^{-1}) \times (\psi_{2,2} \circ \psi_{1,2}^{-1})$ by
Theorem 10.14.14 (iii). But $\psi_{2,1} \circ \psi_{1,1}^{-1} : \psi_{1,1}(U_{2,1}) \to \psi_{2,1}(U_{1,1})$ and $\psi_{2,2} \circ \psi_{1,2}^{-1} : \psi_{1,2}(U_{2,2}) \to \psi_{2,2}(U_{1,2})$
are $C^k$ diffeomorphisms by Definitions 51.3.2 (iii) and 42.7.2, where $U_{i,j} = \mathrm{Dom}(\psi_{i,j})$ for $i, j = 1, 2$. So
$\phi_2 \circ \phi_1^{-1} = (\psi_{2,1} \circ \psi_{1,1}^{-1}) \times (\psi_{2,2} \circ \psi_{1,2}^{-1}) : \psi_{1,1}(U_{2,1}) \times \psi_{1,2}(U_{2,2}) \to \psi_{2,1}(U_{1,1}) \times \psi_{2,2}(U_{1,2})$ is a $C^k$ diffeomorphism
by Theorem 42.7.4 (ii). Therefore $A$ is a $C^k$ atlas for $(M, T)$ by Theorem 51.3.4. Hence $((M, T), A)$ is a $C^k$
manifold by Definition 51.3.8.


Part (iii) follows from part (i) and Definition 51.3.16. 
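
The factorisation of a product-atlas transition map into a product of transition maps can be checked numerically. The following sketch is only an illustration: the manifolds M_1 = M_2 = R and the four charts (named psi11, psi21, psi12, psi22 here) are hypothetical choices, and the inverse of a product chart is assumed to be the product of the inverses.

```python
import math

# Two hypothetical charts on M1 = R.
def psi11(p):
    return 2.0 * p + 1.0

def psi11_inv(x):
    return (x - 1.0) / 2.0

def psi21(p):
    return p ** 3 + p  # strictly increasing, so a bijection of R

# Two hypothetical charts on M2 = R.
def psi12(q):
    return math.sinh(q)

def psi12_inv(y):
    return math.asinh(y)

def psi22(q):
    return math.exp(q)  # range is the open interval (0, infinity)

def product_map(f, g):
    """Double-domain product: (f x g)(a, b) = (f(a), g(b))."""
    return lambda pair: (f(pair[0]), g(pair[1]))

phi1 = product_map(psi11, psi12)              # product chart psi11 x psi12
phi1_inv = product_map(psi11_inv, psi12_inv)  # its inverse, componentwise
phi2 = product_map(psi21, psi22)              # product chart psi21 x psi22

def transition(xy):
    """Transition map phi2 o phi1^{-1} of the product atlas."""
    return phi2(phi1_inv(xy))

def product_of_transitions(xy):
    """(psi21 o psi11^{-1}) x (psi22 o psi12^{-1})."""
    x, y = xy
    return (psi21(psi11_inv(x)), psi22(psi12_inv(y)))

samples = [(-2.0, -1.5), (0.0, 0.0), (0.5, 2.0), (3.0, -0.25)]
factorises = all(transition(s) == product_of_transitions(s) for s in samples)
print(factorises)
```

Since each factor is a transition map between charts of a single manifold, its differentiability class carries over to the product, which is the step used in the proof of part (ii).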


52.6.6 REMARK: The negative consequences of notational ambiguity for differentiable manifolds.
Particularly in Definition 52.6.2, Notation 52.6.3 and Theorem 52.6.5, the ambiguity of abbreviated notations
for differentiable and topological manifolds is evident. The notation "$M_1 \times M_2$" can mean the Cartesian
set-product, or the topological manifold product $(M_1 \times M_2, T)$, or the differentiable manifold product
$((M_1 \times M_2, T), A)$. Strictly speaking, the expression "$M_1 \times M_2$" in Notation 52.6.3 is an abbreviation for
$((M_1 \times M_2, \mathrm{Top}(M_1 \times M_2)), \mathrm{atlas}(M_1 \times M_2))$. The ambiguity is mostly tolerable when the basic definitions and
theorems have been completed, but when working on the foundations, the ambiguity increases the mental
effort required to understand what has been written, especially when the formula "$M_1 \times M_2$" appears twice
in one sentence.

The expression "$M_1 \times M_2$" can have the following meanings when $M_\alpha \mathrel{-\!<} ((M_\alpha, T_\alpha), A_\alpha)$ are differentiable
manifolds for $\alpha = 1, 2$, and $T_{1,2} = \mathrm{Top}(M_1 \times M_2)$ is the product topology for $M_1 \times M_2$, and $A_{1,2} = \mathrm{atlas}(M_1 \times M_2)$
is the product atlas for $M_1 \times M_2$.


(1) $M_1 \times^S M_2 = M_1 \times M_2$, the Cartesian set-product of the point-sets $M_1$ and $M_2$. (See Definition 9.4.2
and Notation 9.4.10.)

(2) $M_1 \times^T M_2 = (M_1 \times^S M_2, T_{1,2}) = (M_1 \times M_2, T_{1,2})$, the topological space product of $(M_1, T_1)$ and $(M_2, T_2)$.
(See Definition 32.9.4.)

(3) $M_1 \times^M M_2 = (M_1 \times^T M_2, A_{1,2}) = ((M_1 \times^S M_2, T_{1,2}), A_{1,2}) = ((M_1 \times M_2, T_{1,2}), A_{1,2})$, the differentiable
manifold product of $((M_1, T_1), A_1)$ and $((M_2, T_2), A_2)$. (See Definition 52.6.2.)

(4) $M_1 \times^S M_2 \mathrel{-\!<} ((M_1, T_1), A_1) \times^S ((M_2, T_2), A_2) = ((M_1, T_1), A_1) \times ((M_2, T_2), A_2)$, the Cartesian
set-product of the differentiable manifolds $((M_1, T_1), A_1)$ and $((M_2, T_2), A_2)$.


Ambiguity is mostly harmless between cases (1), (2) and (3). Not so harmless is the ambiguity between (3)
and (4) in the context of function parameters. In all of these cases, when the expression "$\phi : M_1 \times M_2 \to M_0$" is
written for a map $\phi$ whose target space $M_0 \mathrel{-\!<} ((M_0, T_0), A_0)$ is a differentiable manifold, the map is typically understood
to be from the point-set $M_1 \times M_2$ to the point-set $M_0$. Then the topology and atlas are used to assess how
continuous or differentiable the map is, but the topology $T_{1,2}$ and atlas $A_{1,2}$ are not themselves mapped
by $\phi$. In case (3), the differentiability of the map "$\phi : M_1 \times M_2 \to M_0$" is assessed in terms of the atlases
$A_{1,2}$ and $A_0$. In case (4), the product atlas $A_{1,2}$ is absent, but the map "$\phi : M_1 \times M_2 \to M_0$" still has an
unambiguous meaning as a function with two parameters. (See Remark 10.2.31 for the subtle difference
between a function with two parameters and a function whose argument is a pair of objects.) In this case,
the differentiability can only be assessed with respect to the two input-atlases $A_1$ and $A_2$, and the one
output-atlas $A_0$. This issue becomes a serious concrete problem when tangent bundle structures are added,
as for example in Definitions 54.7.6 and 58.6.2, where the products $T(M_1) \times T(M_2)$ and $T(M_1 \times M_2)$ must
be distinguished. When differentials of maps between manifolds are then constructed and related to the
respective tangent bundles, it becomes very difficult indeed to guess which set-construction is indicated by
each expression including the product operation symbol "$\times$".


52.6.7 REMARK:  Differentiability of “partial maps” of maps between differentiable manifolds. 

Theorem 52.6.8 is the $C^k$ manifold analogue of Theorems 42.6.15 and 42.6.16, which state that the partial
maps of $C^k$ maps between Cartesian spaces are $C^k$ differentiable. Theorem 52.7.5 is the $C^k$ manifold analogue
of Theorem 42.7.6, which states that pointwise restrictions of a diffeomorphism between Cartesian spaces
are diffeomorphisms. (The function $\phi_1^{p_2}$ in Theorem 52.6.8 (i) is illustrated in Figure 52.6.1.)


Figure 52.6.1 "Partial map" $\phi_1^{p_2}$ of map $\phi$ from $M_1 \times M_2$ to $M_0$


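52.6.8 THEOREM: The "partial maps" of a $C^k$ map between $C^k$ manifolds are $C^k$ differentiable.
Let $M_1$, $M_2$ and $M_0$ be $C^k$ manifolds for some $k \in \bar{\mathbb{Z}}^+_0$. Let $\phi : M_1 \times M_2 \to M_0$ be a $C^k$ differentiable map.


(i) The function $\phi_1^{p_2} : M_1 \to M_0$ defined by $\phi_1^{p_2} : p_1 \mapsto \phi(p_1, p_2)$ is $C^k$ differentiable for all $p_2 \in M_2$.
(ii) The function $\phi_2^{p_1} : M_2 \to M_0$ defined by $\phi_2^{p_1} : p_2 \mapsto \phi(p_1, p_2)$ is $C^k$ differentiable for all $p_1 \in M_1$.

PROOF: For part (i), let $k \in \bar{\mathbb{Z}}^+_0$ and $\phi \in C^k(M_1 \times M_2, M_0)$. Let $p_2 \in M_2$. Let $\psi_\alpha \in \mathrm{atlas}(M_\alpha)$ for $\alpha = 0, 1$.
Define $\tilde\phi_1^{p_2} = \psi_0 \circ \phi_1^{p_2} \circ \psi_1^{-1}$. Let $\psi_2 \in \mathrm{atlas}_{p_2}(M_2)$. Let $\Omega_0 = \mathrm{Dom}(\psi_0)$ and $\Omega_\alpha = \psi_\alpha(\pi_\alpha(\phi^{-1}(\Omega_0)))$ for
$\alpha = 1, 2$. Then $\Omega_\alpha \in \mathrm{Top}(\mathbb{R}^{n_\alpha})$, where $n_\alpha = \dim(M_\alpha)$, for $\alpha = 0, 1, 2$.


Define $\tilde\phi = \psi_0 \circ \phi \circ (\psi_1 \times \psi_2)^{-1}$. Then $\mathrm{Dom}(\tilde\phi) = (\psi_1 \times \psi_2)(\phi^{-1}(\Omega_0)) = \Omega_1 \times \Omega_2$ and $\tilde\phi \in C^k(\Omega_1 \times \Omega_2, \Omega_0)$.
Let $x_1 \in \Omega_1$. Let $p_1 = \psi_1^{-1}(x_1)$. Let $x_2 = \psi_2(p_2)$. Then


    $\tilde\phi_1^{p_2}(x_1) = \psi_0(\phi_1^{p_2}(p_1))$
        $= \psi_0(\phi(p_1, p_2))$
        $= \psi_0(\phi(\psi_1^{-1}(x_1), \psi_2^{-1}(x_2)))$
        $= \psi_0(\phi((\psi_1^{-1} \times \psi_2^{-1})(x_1, x_2)))$
        $= \psi_0(\phi((\psi_1 \times \psi_2)^{-1}(x_1, x_2)))$
        $= \tilde\phi(x_1, x_2)$.


Thus $\forall x_1 \in \Omega_1$, $\tilde\phi_1^{p_2}(x_1) = \tilde\phi(x_1, x_2)$. Therefore $\tilde\phi_1^{p_2} \in C^k(\Omega_1, \Omega_0)$ by Theorem 42.6.16 (i). But this holds for
all $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_0 \in \mathrm{atlas}(M_0)$. Therefore $\phi_1^{p_2} \in C^k(M_1, M_0)$ by Definition 52.1.2.


Part (ii) may be proved as for part (i) by means of Theorem 42.6.16 (ii).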


52.6.9 EXAMPLE: $C^k$ partial maps do not imply a $C^k$ map.
The converse of Theorem 52.6.8 is not valid. The conditions (i) and (ii) do not imply that $\phi : M_1 \times M_2 \to M_0$
is $C^k$. A simple counterexample is Example 41.1.21, where $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by


    $f(x) = 2 x_1 x_2 / (x_1^2 + x_2^2)$ if $x \ne (0, 0)$, and $f(x) = 0$ if $x = (0, 0)$.


For $x_2 \in \mathbb{R} \setminus \{0\}$, the map $f_1^{x_2} : \mathbb{R} \to \mathbb{R}$ with $f_1^{x_2} : x_1 \mapsto f(x_1, x_2)$ is $C^\infty$ (and in fact real-analytic) with
maximum $f_1^{x_2}(x_1) = 1$ at $x_1 = x_2$ and minimum $f_1^{x_2}(x_1) = -1$ at $x_1 = -x_2$, whereas $f_1^0$ is the zero function.
Thus $f : M_1 \times M_2 \to M_0$ is a map between real-analytic manifolds $M_1 = M_2 = M_0 = \mathbb{R}$ with real-analytic
partial maps $f_1^{x_2}$ and $f_2^{x_1}$ for all $x_1, x_2 \in \mathbb{R}$, and yet $f$ is not even continuous from $M_1 \times M_2$ to $M_0$.
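
The failure of joint continuity can be seen numerically. The sketch below assumes that the function in question is f(x_1, x_2) = 2 x_1 x_2 / (x_1^2 + x_2^2) away from the origin, with f(0, 0) = 0, which is consistent with the maximum 1 at x_1 = x_2 and minimum -1 at x_1 = -x_2 stated above. The names f, along_axis, along_diagonal and along_antidiagonal are hypothetical.

```python
def f(x1, x2):
    """Assumed counterexample: separately analytic, not jointly continuous."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return 2.0 * x1 * x2 / (x1 * x1 + x2 * x2)

# Points approaching the origin.
ts = [1.0, 1e-1, 1e-3, 1e-6, 1e-9]

# The partial map through x2 = 0 is identically zero.
along_axis = [f(t, 0.0) for t in ts]

# Along the diagonals the value stays at +1 or -1 however close to the origin.
along_diagonal = [f(t, t) for t in ts]
along_antidiagonal = [f(t, -t) for t in ts]

print(along_axis)
print(along_diagonal)
print(along_antidiagonal)
```

Each one-variable slice is well behaved, but the limits along the axis and along the diagonal disagree at the origin, so no two-variable continuity, let alone differentiability, is possible there.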


52.6.10 REMARK: Two-variable differentiability is weaker than one-variable differentiability. 

From Example 52.6.9, it is clear that the combination of the two “partial map” C* conditions for M, and M3 
in Theorem 52.6.8 is weaker than the single C^ condition with respect to the direct product manifold structure 
on the combined domain M; x M». Therefore it is necessary to distinguish two alternative interpretations 
of a map $ : Mı x Mz > Mp as either (1) a function of a single variable p = (p1,p2) € Mı x M», or (2) a 
function of two variables py € Mı and po € M». (See also Remarks 58.6.3 and 58.10.1 for comments on some 
other ways in which the one-variable and two-variable interpretations differ.) 


Definition 52.6.11 is an attempt to give some vocabulary for the distinction between two-variable and
single-variable differentiability of maps whose domains are direct products of differentiable manifolds. Then
Notation 52.6.12 distinguishes the two-variable componentwise C^k maps in C^k(M_1, M_2; M_0) from the
single-variable jointly C^k maps in C^k(M_1 × M_2, M_0).


52.6.11 DEFINITION: Componentwise versus joint differentiability of maps on manifold products. 

A two-variable map φ : M_1 × M_2 → M_0 is componentwise C^k (differentiable) for C^k manifolds M_0, M_1
and M_2 when the maps p_1 ↦ φ(p_1, q_2) and p_2 ↦ φ(q_1, p_2) are C^k for all q_2 ∈ M_2 and q_1 ∈ M_1.

A two-variable map φ : M_1 × M_2 → M_0 is jointly C^k (differentiable) for C^k manifolds M_0, M_1 and M_2 when
the corresponding single-variable map φ : M_1 × M_2 → M_0 is C^k differentiable with respect to the C^k direct
product manifold structure on M_1 × M_2.


52.6.12 NOTATION: C*(Mı, Ms; Mo), for C! manifolds Mi, M» and Mo, for k € Zi, denotes the set of 
componentwise C^ differentiable two-variable maps from M, and Mə to Mo. 


52.6.13 THEOREM: The common-domain direct product of two C^k maps is a C^k map.
Let k € Zi. Let M, M_1 and M_2 be C^k manifolds. Let φ_α : M → M_α be C^k maps for α = 1, 2. Then
φ_1 × φ_2 : M → M_1 × M_2 is a C^k map.

PROOF: Let ψ_α ∈ atlas(M_α) for α = 1, 2. Let ψ ∈ atlas(M). Then ψ_α ∘ φ_α ∘ ψ^{-1} is a C^k map for
α = 1, 2 by Definition 52.1.2. So (ψ_1 × ψ_2) ∘ (φ_1 × φ_2) ∘ ψ^{-1} = ((ψ_1 ∘ φ_1) × (ψ_2 ∘ φ_2)) ∘ ψ^{-1} is a C^k map
by Theorem 42.6.8, where ψ_1 × ψ_2 denotes the double-domain direct product as in Definition 10.14.3, and
φ_1 × φ_2 denotes the common-domain direct product as in Definition 10.15.2. Hence φ_1 × φ_2 : M → M_1 × M_2
is a C^k map by Definitions 52.1.2 and 52.6.2.


52.6.14 REMARK: Differentiability of double-domain direct products of manifold maps. 
Theorem 52.6.15 is a double-domain version of Theorem 52.6.13. (See Definition 10.14.3 for double-domain 
function products.) 


52.6.15 THEOREM: The double-domain direct product of two C^k maps is a C^k map.
Let k € De Let M_{α,β} be a C^k manifold for α, β ∈ ℕ_2. Let φ_α : M_{α,1} → M_{α,2} be a C^k map for α = 1, 2.
Then φ_1 × φ_2 : M_{1,1} × M_{2,1} → M_{1,2} × M_{2,2} is a C^k map.

PROOF: Let ψ_{α,β} ∈ atlas(M_{α,β}) for α, β ∈ ℕ_2. Then ψ_{α,2} ∘ φ_α ∘ ψ_{α,1}^{-1} is a C^k map for α = 1, 2 by
Definition 52.1.2. So (ψ_{1,2} × ψ_{2,2}) ∘ (φ_1 × φ_2) ∘ (ψ_{1,1} × ψ_{2,1})^{-1} = (ψ_{1,2} ∘ φ_1 ∘ ψ_{1,1}^{-1}) × (ψ_{2,2} ∘ φ_2 ∘ ψ_{2,1}^{-1}) is
a C^k map by Theorem 42.6.9. (This function equality follows by a double application of Theorem 10.14.8.)
Hence φ_1 × φ_2 ∈ C^k(M_{1,1} × M_{2,1}, M_{1,2} × M_{2,2}) by Definitions 52.1.2 and 52.6.2.


52.6.16 REMARK: Regarding a manifold as a submanifold of a direct product of manifolds. 

It seems reasonable to say that a manifold M_1 is a submanifold of a direct product M_1 × M_2 of two manifolds,
but M_1 is not a subset of M_1 × M_2. So it is not a submanifold according to Definitions 52.3.7 and 52.4.2.
However, the “slice sets” M_1 × {p_2} are subsets of M_1 × M_2 for any p_2 ∈ M_2. (See Definition 10.12.5 for
slice sets.) Such slice sets can be given a natural submanifold structure as in Definition 52.6.17. The validity
and differentiability of these submanifolds are asserted in Theorem 52.6.18.


52.6.17 DEFINITION: Slice-set submanifolds of direct products of differentiable manifolds. 
The left slice (set) submanifolds of a product M_1 × M_2 of C^k differentiable manifolds M_1 and M_2 are the
manifolds (M_1^{p_2}, A_1^{p_2}) for p_2 ∈ M_2 defined by

    ∀p_2 ∈ M_2,  M_1^{p_2} = M_1 × {p_2},
    ∀p_2 ∈ M_2,  A_1^{p_2} = {ψ_1 ∘ Π_1; ψ_1 ∈ atlas(M_1)},

where Π_1 : M_1 × {p_2} → M_1 is defined by Π_1 : (q, p_2) ↦ q.

The right slice (set) submanifolds of a product M_1 × M_2 of C^k differentiable manifolds M_1 and M_2 are the
manifolds (M_2^{p_1}, A_2^{p_1}) for p_1 ∈ M_1, where

    ∀p_1 ∈ M_1,  M_2^{p_1} = {p_1} × M_2,
    ∀p_1 ∈ M_1,  A_2^{p_1} = {ψ_2 ∘ Π_2; ψ_2 ∈ atlas(M_2)},

where Π_2 : {p_1} × M_2 → M_2 is defined by Π_2 : (p_1, q) ↦ q.
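
The slice charts ψ_1 ∘ Π_1 inherit their transition maps from the charts of M_1, because the projection Π_1 cancels against its inverse. The following is a small sketch of this cancellation. It assumes M_1 = ℝ with two illustrative global charts psi_a and psi_b and an arbitrary label P2 for the fixed point of M_2; none of these names come from the book.

```python
# Two hypothetical global charts on M1 = R. Both are diffeomorphisms onto R.
def psi_a(q: float) -> float:
    return q


def psi_b(q: float) -> float:
    return q ** 3 + q


P2 = "p2"  # the fixed point of M2; only its identity matters for the left slice


def proj_1(point):
    """Pi_1 : M1 x {p2} -> M1, sending (q, p2) to q."""
    q, p2 = point
    if p2 != P2:
        raise ValueError("point is not in the slice M1 x {p2}")
    return q


def proj_1_inv(q: float):
    """Inverse of Pi_1, sending q to (q, p2)."""
    return (q, P2)


def slice_chart(psi):
    """The slice chart psi o Pi_1 on M1 x {p2}."""
    return lambda point: psi(proj_1(point))


def transition_on_slice(x: float) -> float:
    """(psi_b o Pi_1) o (psi_a o Pi_1)^{-1}; psi_a is the identity, so its inverse is too."""
    return slice_chart(psi_b)(proj_1_inv(x))


def transition_on_m1(x: float) -> float:
    """psi_b o psi_a^{-1} on M1 itself."""
    return psi_b(x)


samples = [-2.0, -0.5, 0.0, 1.0, 3.0]
print([(transition_on_slice(x), transition_on_m1(x)) for x in samples])
```

The two transition maps agree at every sample point, so in this sketch the differentiability class of the slice atlas is exactly that of the atlas of M_1.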


52.6.18 THEOREM: Direct product slice-set submanifold atlas differentiability and regularity. 
Let k € Zg. Let M_1 −< (M_1, A_1) and M_2 −< (M_2, A_2) be C^k manifolds. Let p_1 ∈ M_1 and p_2 ∈ M_2. Let
(M_1^{p_2}, A_1^{p_2}) and (M_2^{p_1}, A_2^{p_1}) be the left and right slice-set submanifolds of M_1 × M_2 as in Definition 52.6.17,
with projection maps Π_1 : M_1 × {p_2} → M_1 and Π_2 : {p_1} × M_2 → M_2.
(i) M_1^{p_2} −< (M_1^{p_2}, A_1^{p_2}) is a C^k manifold.
(ii) M_1^{p_2} −< (M_1^{p_2}, A_1^{p_2}) is a regular C^k submanifold of M_1 × M_2.
(iii) M_2^{p_1} −< (M_2^{p_1}, A_2^{p_1}) is a C^k manifold.
(iv) M_2^{p_1} −< (M_2^{p_1}, A_2^{p_1}) is a regular C^k submanifold of M_1 × M_2.


PROOF: For part (i), let ψ̃_1 ∈ A_1^{p_2}. Then ψ̃_1 = ψ_1 ∘ Π_1 for some ψ_1 ∈ A_1. By Definition 51.3.2 (i),
ψ_1 : U → Ω is a homeomorphism for some U ∈ Top(M_1) and Ω ∈ Top(ℝ^{n_1}). Since {p_2} ∈ Top({p_2}) by
Definition 31.3.2 (i), U × {p_2} ∈ Top(M_1 × {p_2}) by Theorem 32.9.6 (ii). But Dom(ψ_1 ∘ Π_1) = U × {p_2}. So
Dom(ψ̃_1) ∈ Top(M_1 × {p_2}). Also, Range(ψ̃_1) = Ω ∈ Top(ℝ^{n_1}). Thus A_1^{p_2} satisfies Definition 51.3.2 (i).
Since A_1 satisfies Definition 51.3.2 (ii), ⋃_{ψ̃_1 ∈ A_1^{p_2}} Dom(ψ̃_1) = ⋃_{ψ_1 ∈ A_1} Dom(ψ_1) × {p_2} = M_1 × {p_2}. (See also
Theorem 9.4.7 (i).) Therefore A_1^{p_2} satisfies Definition 51.3.2 (ii) also.

Let ψ̃_1, ψ̃_2 ∈ A_1^{p_2}, with ψ̃_α = ψ_α ∘ Π_1 for some ψ_α ∈ A_1 for α = 1, 2. Then ψ̃_2 ∘ (ψ̃_1)^{-1} = (ψ_2 ∘ Π_1) ∘ (ψ_1 ∘ Π_1)^{-1}
= ψ_2 ∘ (Π_1 ∘ Π_1^{-1}) ∘ ψ_1^{-1} = ψ_2 ∘ id_{M_1} ∘ ψ_1^{-1} = ψ_2 ∘ ψ_1^{-1}, which is a C^k map from Range(ψ_1) = Range(ψ̃_1)
to Range(ψ_2) = Range(ψ̃_2) by Definition 51.3.2 (iii). Therefore A_1^{p_2} satisfies Definition 51.3.2 (iii). Thus A_1^{p_2} is
a C^k atlas on M_1^{p_2} by Definition 51.3.2. Hence (M_1^{p_2}, A_1^{p_2}) is a C^k manifold by Definition 51.3.8.


For part (ii), clearly M_1^{p_2} ⊆ M_1 × M_2, and (M_1^{p_2}, A_1^{p_2}) is a C^k manifold by part (i). Let p ∈ M_1^{p_2}.
Then there is a chart ψ̃_1 ∈ A_1^{p_2} such that p ∈ Dom(ψ̃_1), and so ψ̃_1 = ψ_1 ∘ Π_1 for some ψ_1 ∈ A_1. Let
ψ_2 ∈ atlas_{p_2}(M_2) and ψ = ψ_1 × ψ_2. Then ψ ∈ atlas(M_1 × M_2) by Definition 52.6.2, and ψ(M_1^{p_2}) =
ψ(M_1 × {p_2}) = ψ_1(M_1) × ψ_2({p_2}) ⊆ ℝ^{n_1} × {ψ_2(p_2)}. So ψ(M_1^{p_2}) ⊆ Range(ψ) ∩ (ℝ^{n_1} × {ψ_2(p_2)}). Let
x ∈ Range(ψ) ∩ (ℝ^{n_1} × {ψ_2(p_2)}). Then x = ψ(q) = (ψ_1(q_1), ψ_2(q_2)) for some q = (q_1, q_2) ∈ M_1 × M_2, and
x = (x_1, ψ_2(p_2)) for some x_1 ∈ ℝ^{n_1}. So ψ_2(q_2) = ψ_2(p_2). Therefore q_2 = p_2 because ψ_2 is injective. Thus
q ∈ M_1 × {p_2} = M_1^{p_2}. So x ∈ ψ(M_1^{p_2}). Consequently ψ(M_1^{p_2}) = Range(ψ) ∩ (ℝ^{n_1} × {ψ_2(p_2)}).


To show that Pla =I o0 V| r2» first note that 


Dom()| pomy) = Dom(/) n Dom(v) 
= Dom(w o M1) n Dom(y) 
= (Dom(v1) x {p2}) n (Dom(v1) x Dom(v»)) 


= Dom(v1) x {p2} 
and 


Dom(II?! o vlr) = Dom(II}* o y) MP? 
= Dom(II7! o (Yı x v2)) N (Mi x {p2}) 
= (Dom(v1) x Dom(2)) N (Mı x {p2}) 
L3 Dom(v1) x {po} 
= Dom(Y| poma) 


Let (q, p2) € Dom(v1) x {p2}. Then Ploom) (4 92) = yı (Ilı (q, p2)) = Yı (q) and (I7 o V| uoa (a; p2) = 


II? ( (Vi(q), Y2(p2)) ) = Yı (q). Therefore Vow j^ = Il?" o V| uoa Hence it follows from Theorem 52.4.8 
that (M??, AÑ?) is a regular C^ submanifold of M; x Mg. 

Part (iii) may be proved exactly as for part (i). 

Part (iv) may be proved exactly as for part (ii). 


52.7. Product-structured manifolds and submanifolds 


52.7.1 REMARK:  Pull-back atlases for product-structured differentiable manifolds. 
Theorem 52.7.2 is the differentiable manifolds version of Theorem 50.5.2, which is for topological manifolds. 
For pull-back atlases via general diffeomorphisms, see Definition 52.2.8. 


52.7.2 THEOREM: Basic properties of pull-back atlases for product-structured differentiable manifolds. 
Let k € Z. Let M_0, M_1 and M_2 be C^k manifolds. Let φ_1 : M_0 → M_1 and φ_2 : M_0 → M_2 be C^k maps such
that φ_1 × φ_2 : M_0 → M_1 × M_2 is a C^k diffeomorphism. Let A_0 be the pull-back atlas for M_0 via φ_1 × φ_2
from the direct product manifold M_1 × M_2.
(i) A_0 = {(ψ_1 × ψ_2) ∘ (φ_1 × φ_2); ψ_1 ∈ atlas(M_1), ψ_2 ∈ atlas(M_2)}.
(ii) A_0 is a C^k atlas for M_0.
(iii) A_0 ⊆ atlas^k(M_0). In other words, A_0 is C^k compatible with atlas(M_0).
(iv) ∀p_0 ∈ M_0, A_{0,p_0} = {(ψ_1 × ψ_2) ∘ (φ_1 × φ_2); ψ_1 ∈ atlas_{φ_1(p_0)}(M_1), ψ_2 ∈ atlas_{φ_2(p_0)}(M_2)}, where A_{0,p_0}
denotes {ψ_0 ∈ A_0; p_0 ∈ Dom(ψ_0)} for all p_0 ∈ M_0.

PROOF: Part (i) follows from Definitions 52.2.8 and 52.6.2.
Part (ii) follows from Theorem 52.2.9 (i). 

Part (iii) follows from Theorem 52.2.9 (ii). 

Part (iv) follows from Theorem 52.2.9 (iii). 


52.7.3 REMARK: The importance of vertical submanifold diffeomorphisms for differential geometry. 
Theorem 52.7.5 is one of the core theorems underlying connections on differentiable fibre bundles, which are 
amongst the core concepts in differential geometry. Connections are defined as diffeomorphisms of fibre sets, 
which must be given a suitable submanifold structure. This theorem defines manifold atlases for vertical 
submanifolds (i.e. “fibre sets”) and proves some of their basic properties. 


52.7.4 REMARK:  Diffeomorphisms between horizontal or vertical submanifolds and component spaces. 
The scenario in Theorem 52.7.5 is encountered in the context of differentiable fibre bundles, where M_1 would
be the base space, M_2 would be the fibre space, φ_1 would be the projection map, φ_2 would be a fibre
chart, and M_0 would be the domain of φ_2, which is an open subset of the total space. Then φ_1 × φ_2 is a
common-domain direct product of maps as in Definition 10.15.2.


Theorem 52.7.5 extends Theorem 50.5.4 from topological manifolds to differentiable manifolds. (See also
Theorem 32.11.4 for analogous assertions for general topological spaces.) The topology on a horizontal or
vertical submanifold is the relative topology in the manifold M_0, and the differentiable structure is a kind of
“relative differentiable structure” in the sense that the atlas for each submanifold defined in parts (iii) and
(iv) is compatible with the ambient atlas for M_0. The atlases here are constructed from the maps ψ_1 and ψ_2
in the atlases for M_1 and M_2, not from the ambient atlas on M_0. So it is not truly a “relative atlas” in the
sense that the relative topology is “relative”. However, any atlas on a submanifold which is C^k compatible
with the ambient atlas will also be compatible with the atlas which is defined here.


Theorem 52.7.5 is illustrated in Figure 52.7.1.


Figure 52.7.1 Diffeomorphism from a horizontal submanifold φ_2^{-1}({p_2}) to M_1


52.7.5 THEOREM: Diffeomorphisms between submanifolds and direct product components.
Let M_0, M_1 and M_2 be C^k manifolds for some k € Zg. Let φ_1 : M_0 → M_1 and φ_2 : M_0 → M_2 be C^k maps
such that φ_1 × φ_2 : M_0 → M_1 × M_2 is a C^k diffeomorphism. Let n_α = dim(M_α) for α = 1, 2.
(i) φ_1|_{φ_2^{-1}({p_2})} : φ_2^{-1}({p_2}) → M_1 is a homeomorphism for all p_2 ∈ M_2.
(ii) φ_2|_{φ_1^{-1}({p_1})} : φ_1^{-1}({p_1}) → M_2 is a homeomorphism for all p_1 ∈ M_1.
(iii) Let M_1^{p_2} = φ_2^{-1}({p_2}). Let A_1^{p_2} = {ψ ∘ φ_1|_{φ_2^{-1}({p_2})}; ψ ∈ atlas(M_1)} for p_2 ∈ M_2. Then (M_1^{p_2}, A_1^{p_2}) is a
C^k manifold for all p_2 ∈ M_2.
(iv) Let M_2^{p_1} = φ_1^{-1}({p_1}). Let A_2^{p_1} = {ψ ∘ φ_2|_{φ_1^{-1}({p_1})}; ψ ∈ atlas(M_2)} for p_1 ∈ M_1. Then (M_2^{p_1}, A_2^{p_1}) is a
C^k manifold for all p_1 ∈ M_1.
(v) (M_1^{p_2}, A_1^{p_2}) is a regular C^k submanifold of M_0 for all p_2 ∈ M_2.
(vi) (M_2^{p_1}, A_2^{p_1}) is a regular C^k submanifold of M_0 for all p_1 ∈ M_1.
(vii) M_1^{p_2} is an n_1-dimensional regular C^k submanifold point-set in M_0 for all p_2 ∈ M_2.
(viii) M_2^{p_1} is an n_2-dimensional regular C^k submanifold point-set in M_0 for all p_1 ∈ M_1.
(ix) φ_1|_{φ_2^{-1}({p_2})} : φ_2^{-1}({p_2}) → M_1 is a C^k diffeomorphism for all p_2 ∈ M_2, assuming atlas A_1^{p_2} on φ_2^{-1}({p_2}).
(x) φ_2|_{φ_1^{-1}({p_1})} : φ_1^{-1}({p_1}) → M_2 is a C^k diffeomorphism for all p_1 ∈ M_1, assuming atlas A_2^{p_1} on φ_1^{-1}({p_1}).
(xi) (φ_1|_{φ_2^{-1}({p_2})})^{-1} : M_1 → φ_2^{-1}({p_2}) is a regular C^k embedding of M_1 in M_0 for all p_2 ∈ M_2, assuming atlas
A_1^{p_2} on φ_2^{-1}({p_2}).
(xii) (φ_2|_{φ_1^{-1}({p_1})})^{-1} : M_2 → φ_1^{-1}({p_1}) is a regular C^k embedding of M_2 in M_0 for all p_1 ∈ M_1, assuming atlas
A_2^{p_1} on φ_1^{-1}({p_1}).

PROOF: Part (i) follows from Theorem 50.5.4 (i). 

Part (ii) follows from Theorem 50.5.4 (ii). 


For part (iii), let p_2 ∈ M_2 and p ∈ M_1^{p_2}. Then p ∈ M_0 and φ_2(p) = p_2. But p = (φ_1 × φ_2)^{-1}(p'_1, p'_2) for some
unique (p'_1, p'_2) ∈ M_1 × M_2 because φ_1 × φ_2 : M_0 → M_1 × M_2 is a bijection. So p'_2 = p_2 and φ_1(p) = p'_1. But
p'_1 ∈ Dom(ψ) for some ψ ∈ atlas(M_1), and p'_1 = φ_1|_{φ_2^{-1}({p_2})}(p). So p ∈ Dom(ψ ∘ φ_1|_{φ_2^{-1}({p_2})}). Therefore
⋃_{ψ̃ ∈ A_1^{p_2}} Dom(ψ̃) = M_1^{p_2}. Thus A_1^{p_2} satisfies condition (ii) of Definition 51.3.2 for a C^k atlas for M_1^{p_2}.


Let p; € Ms and ) € AP. Then 9 = o PENARE for some 7 € atlas( Mı). Let U = Dom(v). Then 


U € Top(Mı) by Definition 51.3.2 (i). Let U = dil aian Then U € Top(M??) by part (i), where 
2 


Top(M??) is the relative topology of M?? in Mo. But Dom(y) = U and Range(i) = Range(w’) € Top(IR?:). 
So Aj? satisfies condition (i) of Definition 51.3.2 for a C* atlas for M??. 


Let p; € Ms and 41, %2 € AP". Then ba = Wa o eil; "m for some Ya € atlas( Mi), for a = 1,2. Let 
Uy = = Dom(i^,) and Ua = = a, ox) for T = zx m 2. Then Pal N U2) = (Wa O b1)(Uy N U2) = Va(U1 N U2) 
for a = 1,2, and we o v7} = 20 vil cabs (Uh N U2) — wv»(Ui N U2) is a C^ function because atlas( M1) 
is C^. So Af? satisfies condition (iii) of Definition 51.3.2 for a C* atlas for M?P?. Hence (M??, A}?) is a C* 
manifold by "Deam 51.3.8 because it is a Hausdorff space by Theorem 33.1.33 (iii). 

Part (iv) may be proved as for part (iii). 

For part (v), let p; € Mz and p € MẸ’. For a = 1,2, let Ya € atlasg„(p (Ma) and Ua = = Dom(v,) and 
Ua = és (Dom(y4)). Then Uy € Top, (Mo) because $4 is continuous for a = 1,2. Let U = 0, N Üz. Then 
p € Ü and Ü € Top(Mo). Define 9 : Ü + R™ x R™ = R™+™ by ẹ(x) = (vi(ói(z)), vo(2(z))) for 
all a € U. Then  — (V1 X v) © (1 X à2)|g, where v1 X Y2 denotes the double-domain direct product as in 
Definition 10.14.3, and $1 x $» denotes the common-domain direct product as in Definition 10.15.2. (Note that 
(1 X2) o ($1 x da)|o = (Ui X v») © (1% 2) because Dom((u X v») o (1x 2)) = U by Theorem 10.10.13 (i) 
since ($1 x $2) ! (Dom(v x v»)) = ($1 X ó2) ! (Dom(v) x Dom(u)) = ó; (Dom(Y1)) Nez * (Dom(v»)) = 
by Theorem 10.15.6 (iii).) 

Then w is injective. So wb : Ü — Range(/4) x Range(y2) is a bijection because ($1 x $3)(U) = U1 x Us. 
Thus Range(v) = v1(U1) x v2(Us5). Therefore 


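    Dom(ψ) ∩ M_1^{p_2} = Ũ ∩ φ_2^{-1}({p_2})
                       = φ_1^{-1}(Dom(ψ_1)) ∩ φ_2^{-1}(Dom(ψ_2)) ∩ φ_2^{-1}({p_2})
                       = φ_1^{-1}(Dom(ψ_1)) ∩ φ_2^{-1}({p_2})
                       = (φ_1 × φ_2)^{-1}(Dom(ψ_1) × {p_2})
                       = (φ_1 × φ_2)^{-1}(ψ_1^{-1}(ℝ^{n_1}) × ψ_2^{-1}({ψ_2(p_2)}))
                       = ((φ_1 × φ_2)^{-1} ∘ (ψ_1 × ψ_2)^{-1})(ℝ^{n_1} × {ψ_2(p_2)})
                       = ψ^{-1}(ℝ^{n_1} × {ψ_2(p_2)}).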
Let U = U, x Uz and Y = o ($1 X $3) = V4 X Wr: UA V1 (U1) x v»(Us). Then Y € atlas(Mi x M3) 


by Definitions 52.6.2 and 50.4.6. So v o ($1 x $2) o jp is C^ for all wo € atlas(Mo) by Definition 52.1.2 
because $1 X $z is a C^ diffeomorphism. Thus 4) o v? is C^ for all yo € atlas(Mo). This means that 7 is 


2 | (Dom(v)) n éz ! (£p3)) 
C^k compatible with atlas(M_0). In other words, ψ ∈ atlas^k(M_0) by Notation 51.4.7. (Note that at this point
in the proof of part (v), it has been shown that M_1^{p_2} is an n_1-dimensional regular C^k submanifold point-set
in M_0, which is the assertion of part (vii), and the atlas A_1^{p_2} has not yet been utilised. But the construction
of C^k-compatible charts ψ is required for testing the suitability of A_1^{p_2}. So it is more efficient to combine the
proofs, although the atlas A_1^{p_2} has no relevance to the definition of a regular C^k submanifold point-set.)

Let $ = Vio da [uma Then ) € A? and Dom() = $; '(Dom(y1)) n MP? = Ŭi n MP = Un MP by 


Theorem 10.10.13 (i) because MI? C Us since p; € Dom(v;). So Dom() = Dom(v) n M??. Then from the 

observation that II?" o i = v o $1, it follows that v = 4 = II? ow]|,,5. Hence (M??, AP) is a 
1 

regular C^ submanifold of Mo by Theorem 52.4.8. 


Part (vi) may be proved as for part (v). 


Dom(#) 


Part (vii) follows from part (v) and Theorem 52.4.10. 

Part (viii) follows from part (vi) and Theorem 52.4.10. 

For part (ix), let po € Ms. Let p € M??. Let € AT? with p € Dom(i). Then ) = yo iiie for 
some w € atlas( Mı), and ¢1(p) € Dom(w) because ¢1(p) € Range($1|4-1(¢p,4)): Thus v € atlas; (p) (M1). 
But then v o $i (uy owt = 40g! = idg, where Q = Range(V), which is a C^ map between 


Cartesian spaces. Therefore $; : MP? — M; is à C* map by Theorem 52.1.11 line (52.1.5). 


los? Coa 
Now let py € Mi. Let p = dih ap (PL): Let v € atlas,, (Mi). Let p=ywo 91) 6-(¢p9})" Then w € A”? 


- - -1 i = 

and p € Dom(v) because $1] 6-12) P) = pı and pı € Dom(w). But v o ZI o pt =% o yt = 

ido, where Q = Range(w), which is a C^ map between Cartesian spaces. Therefore $4 pem : Mı > MP? 
2 


is a C^ map by Theorem 52.1.11 line (52.1.5). Hence APET : MP? — M; is a C* diffeomorphism. 
y LI 2 


Part (x) may be proved as for part (ix). 


Part (xi) follows from part (ix) and Definition 52.5.3. 


Part (xii) follows from part (x) and Definition 52.5.3. 


52.7.6 REMARK: Submanifold regularity for product-structured versus direct-product manifolds. 

Theorem 52.7.5(v) for product-structured manifolds is closely related to Theorem 52.6.18 (ii) for direct 
products of manifolds. They both apply Theorem 52.4.8, which shows that a submanifold is regular if it 
satisfies a constant-graph condition. 


52.7.7 REMARK: Horizontal and vertical submanifolds of product-structured differentiable manifolds. 
Definition 52.7.8 extends Definition 50.5.10 from topological to differentiable manifolds. Whereas topological 
manifolds are fully specified by their topology, differentiable manifolds require an atlas for their specification. 
Thus Definition 52.7.8 adds an atlas to the underlying topological space. By Theorem 52.7.5 (v, vi), the
differentiable manifolds M_1^{p_2} and M_2^{p_1} in Definition 52.7.8 are regular C^k submanifolds of M_0 for all p_1 ∈ M_1
and p_2 ∈ M_2.


Product-structured differentiable manifolds are, in essence, trivial differentiable fibrations, and their vertical 
submanifolds are, in essence, fibre sets of these fibrations. 


52.7.8 DEFINITION: Product-structured differentiable manifolds and horizontal/vertical submanifolds.

A product-structured C^k (differentiable) manifold, for k € Zt, is a tuple M_0 −< (M_0, φ_1, φ_2, M_1, M_2) where
M_0, M_1 and M_2 are C^k manifolds and φ_1 : M_0 → M_1 and φ_2 : M_0 → M_2 are functions such that the
common-domain function product φ_1 × φ_2 : M_0 → M_1 × M_2 is a C^k diffeomorphism.


A horizontal submanifold of a product-structured C^k differentiable manifold (M_0, φ_1, φ_2, M_1, M_2), for k € Ze
is a differentiable manifold M_1^{p_2} −< (M_1^{p_2}, A_1^{p_2}) −< ((M_1^{p_2}, T_1^{p_2}), A_1^{p_2}) for some p_2 ∈ M_2, where

(i) M_1^{p_2} = φ_2^{-1}({p_2}),
(ii) T_1^{p_2} = {Ω ∩ M_1^{p_2}; Ω ∈ Top(M_0)},
(iii) A_1^{p_2} = {ψ ∘ φ_1|_{M_1^{p_2}}; ψ ∈ atlas(M_1)}.

A vertical submanifold of a product-structured C^k differentiable manifold (M_0, φ_1, φ_2, M_1, M_2), for k € Zg,
is a differentiable manifold M_2^{p_1} −< (M_2^{p_1}, A_2^{p_1}) −< ((M_2^{p_1}, T_2^{p_1}), A_2^{p_1}) for some p_1 ∈ M_1, where

(i) M_2^{p_1} = φ_1^{-1}({p_1}),
(ii) T_2^{p_1} = {Ω ∩ M_2^{p_1}; Ω ∈ Top(M_0)},
(iii) A_2^{p_1} = {ψ ∘ φ_2|_{M_2^{p_1}}; ψ ∈ atlas(M_2)}.
Chapter 53


PHILOSOPHY OF TANGENT BUNDLES


53.1 The true nature of tangent vectors
53.2 Tangent vectors on almost-differentiable manifolds
53.3 Styles of representation of tangent vectors
53.4 Tangent bundle construction methods


53.1. The true nature of tangent vectors 


53.1.1 REMARK: The tangent bundle is the most basic concept of differential geometry. 

If anything could be said to be the most basic concept of differential geometry, it must be the tangent bundle. 
Without the tangent bundle, there is no differential geometry. All other differential geometry concepts are 
built upon the tangent bundle. Therefore it is essential to have a correct and confident understanding of 
tangent bundles. Uncertainty in one's understanding of tangent vectors leads inevitably to uncertainty in 
one's understanding of all other concepts in differential geometry. 


As suggested in Remark 49.7.8, the tangent bundle has the same relation to differentiable manifold structure 
that the topology has to topological manifold structure. In the same way that the topology fully determines 
the set of all continuous charts and atlases on a locally Cartesian topological space, the tangent bundle, 
which is a kind of topological vector bundle, fully determines the set of all differentiable charts and atlases. 
Thus the true “differentiable structure" on a differentiable manifold is the tangent bundle, which plays the 
same role that the topology plays as the "topological structure". Thus in principle, atlases are not strictly 
required for the specification of differentiable structure on a manifold. The tangent bundle suffices. However 
in practice, the tangent bundle is specified with the aid of an atlas. On the other hand, the topology is usually 
specified in practice with the aid of an atlas also, not in terms of open bases or other purely topological 
concepts. Nevertheless, at least in a philosophical sense, it is nice to know that the "differentiable structure" 
can be something more abstract than a set of arbitrarily chosen charts. 


53.1.2 REMARK: Tangent vectors are essentially velocities. 

Tangent vectors are the quintessence of velocity. A tangent vector may be thought of as a rate of change 
of position of a point in a manifold. A tangent vector is not an infinitesimal displacement. In fact, tangent 
vectors are best thought of as space-time entities, not as spatial entities. Thus it would be preferable to think 
of a tangent space as a “velocity space" rather than a “vector space”, since the word “vector” is associated 
with displacements. 


The word “tangent” is derived from the Latin word "tangens" which means “touching”. Tangent bundles 
are modern differential abstractions from the tangent lines to curves and tangent planes to surfaces which 
have been familiar in flat space since classical Greek mathematics. (See Remark 53.3.12 for the Euclidean 
definition of tangent lines and tangent circles.) 


It is an interesting coincidence that the word "tangens" is an adjective which is derived directly from a 
verb "tangere", although the English word “tangent” is typically used as a noun. Thus a tangent originally 
referred to an action, described by a verb, and it has developed into a noun. In other words, the original
action-word has been frozen into a thing-word, but the word "tangent" still hints at the original active 
meaning which has a time parameter. 


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 
If the parameter of a differentiable curve in a manifold is thought of as a time parameter, then the derivative
of the curve is a velocity. This velocity may be identified with a tangent line to the graph of the curve in 
space-time. If this tangent line is projected down to the manifold, the result is a line showing the direction 
of the motion. This projected line does not indicate the speed of motion. However, most physics textbooks 
show velocity vectors in the same diagram as the space vectors, leading to a false impression that velocity 
vectors exist in the same space as the points in space. This is as wrong as claiming that the gradient of the
graph of a function y = f(x) is located in the X-axis. The correct location of this gradient is a line which
is tangent to the graph at the point (x, f(x)) in the Cartesian product space ℝ × ℝ.


53.1.3 REMARK: Tangent vectors are not infinitesimal displacements. 

If one interprets velocities within the context of special relativity, the same velocity will be measured differ- 
ently by different observers, because they have different clocks and measuring sticks. If Lorentz transforma- 
tions are applied to velocities, one must apply transformations to both the space-charts and the time-charts. 
This shows once again that time is an important aspect of tangent vectors. Therefore tangent vectors cannot 
be interpreted as infinitesimal displacements. 


The idea that tangent vectors are velocities is reminiscent of the Newtonian concept of “fluxions”. It might be 
useful to reintroduce this perspective into differential geometry. Within the pure mathematical applications 
of differential geometry to static geometries, it is easy to understand why the time parameter is ignored. For 
example, a length-parametrised geodesic uses distance as a parameter, not time. But one may describe such 
curves as constant-velocity or constant-speed curves. In general, it seems that thinking of tangent vectors 
as velocities with respect to some kind of time parameter does no harm, and sometimes it does some good, 
because it prevents one from thinking (falsely) that tangent vectors are infinitesimal displacements. 


53.1.4 REMARK: Tangent vectors are an intermediate concept between curves and functions. 

There is an analogy between tangent vectors and money. Money is itself useless and worthless. Money is 
earned by doing real work, or selling a real thing, or providing a real service. Money is expended to purchase 
real work, or a real thing, or a real service. But the money is only useful as an accounting mechanism for 
exchanges of goods and services. 


Similarly, tangent vectors are an intermediary between curves and real-valued functions on a manifold. If
one forms the expression ∂_t(φ(γ(t))) for a real-valued function φ : M → ℝ and a curve γ : ℝ → M, one is
differentiating a function φ ∘ γ : ℝ → ℝ, which is a purely mathematical calculation involving no geometry.
In the middle of the transaction is a set M of geometrical points. We can split this “barter transaction” into
two parts: the conversion γ′(t) of real-number changes to point changes, and the conversion dφ(p) from
point changes to real-number changes. That is, we trade real numbers for points, and then trade our points
for real numbers. The exchange rates between real-numbers and points do affect the prices γ′(t) and dφ(p)
at which we buy and sell points. But the exchange rates do not affect the barter rate ∂_t(φ(γ(t))) which we
obtain at the end of the full transaction.
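
The independence of the “barter rate” from the “exchange rates” can be illustrated with a short calculation. The following is only a sketch under stated assumptions: the curve gamma, the function phi and the choice of Cartesian and polar charts on the open first quadrant of the plane are all hypothetical examples, not taken from the text. The components of the velocity and of the differential differ between the two charts, but their pairing is the same number in both, and it agrees with a direct finite-difference derivative of phi ∘ gamma.

```python
import math


def gamma(t: float):
    """Hypothetical curve in the open first quadrant, in Cartesian coordinates."""
    return (1.0 + t, 1.0 + 2.0 * t)


def phi(x: float, y: float) -> float:
    """Hypothetical real-valued function on the same point set."""
    return x * y


t0 = 0.0
x, y = gamma(t0)
vx, vy = 1.0, 2.0  # velocity components of gamma in the Cartesian chart

# Cartesian chart: the components of d(phi) are the partial derivatives of x*y.
dphi_cart = (y, x)
rate_cart = dphi_cart[0] * vx + dphi_cart[1] * vy

# Polar chart (r, theta): the same velocity has different components.
r = math.hypot(x, y)
theta = math.atan2(y, x)
vr = (x * vx + y * vy) / r
vtheta = (x * vy - y * vx) / (r * r)

# In the polar chart, phi = r^2 sin(theta) cos(theta).
dphi_polar = (2.0 * r * math.sin(theta) * math.cos(theta),
              r * r * math.cos(2.0 * theta))
rate_polar = dphi_polar[0] * vr + dphi_polar[1] * vtheta

# Chart-free check: central finite difference of t -> phi(gamma(t)).
h = 1e-6
rate_fd = (phi(*gamma(t0 + h)) - phi(*gamma(t0 - h))) / (2.0 * h)

print((vx, vy), (vr, vtheta))
print(rate_cart, rate_polar, rate_fd)
```

The component pairs (vx, vy) and (vr, vtheta) are different “prices”, while rate_cart, rate_polar and rate_fd agree up to rounding error.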


Therefore we may define the space of tangent vectors, and its dual space of tangent covectors, as we wish, 
because tangent vectors are only a temporary, intermediate part of a full transaction. Just like money, 
tangent vectors have no real significance of their own. They have value only for the real things which you 
can “buy” with them. 


There is a similar analogy with the concept of a “force”. There is really no such thing as a force. It is a 
fiction. We measure forces by the resulting acceleration of a body which is acted on by the force, or by 
the displacement of an elastic body by the application of the force. In the case of gravity, we say that 
one body exerts a force on another, but what we actually observe is the acceleration of the body which 
is acted upon. We calculate the force by observing its accelerative effect. No force is ever seen. Forces 
are simply an intermediate concept between the exporters of forces, such as planets, and the importers of 
forces, such as apples. The behaviour of the system can be explained without forces by eliminating forces 
from the equations to obtain acceleration as a function of masses and distances. Coincidentally, forces may 
be modelled as tangent vectors in a space-time manifold. Tangent vectors have the same dubious claim to 
reality as forces. They are both intermediate concepts between observables. 


Some authors define tangent vectors as equivalence classes of curves, so that the vector is identified as the 
common “effect” of these curves on differentiable real-valued functions. If two curves have the same effect, 
then they belong to the same class. Other people define tangent vectors as linear operators acting on real- 
valued functions. Such operators are the “effect” of the vectors on real-valued functions. This is analogous 
to the idea of “force” in physics, which is abstract and not directly observable. Tangent curve classes and
differential operators are two sides of the same coin. 


The point of view taken in this book is that the many representations of tangent vectors which are found 
in the literature are co-existing, mutually isomorphic classes of objects. (See Table 53.3.1 in Remark 53.3.2 
for a survey of representations.) They are like multiple currencies or systems of physical units (such as 
Newtons, dynes or pounds-force). The most flexible and general representation of tangent vectors is in
terms of equivalence classes of pairs (ψ, L), where ψ is a chart and L is a linearly parametrised line. Such a
representation has broad generalisations which usefully extend the scope of tangent vectors to object classes
which are not so closely tied to C^1 differentiable manifolds. Not all manifolds are C^1.


53.1.5 REMARK: Tangent vectors should be thought of as velocity vectors. 

Tangent vectors are not vectors in the classical sense of displacements between points. They are velocities, 
not displacements. They are infinitesimal, not point-to-point vectors. The reader who wishes to think 
clearly should think “velocity” whenever “tangent vector” appears, and “velocity bundle” whenever “tangent 
bundle” appears. (Do Carmo [9], page 6, does explain that a tangent vector is a velocity.) 


53.1.6 REMARK: Arguments in favour of using extrinsic representations of tangent vectors. 

There is no compelling reason to reject extrinsic tangent vectors from consideration in modern differential 
geometry. The tangents to a two-sphere embedded in three-space are well defined. There is no reason to not 
use them. It is true that such vectors lie in the ambient space, not in the manifold's own point space, but 
tangent vectors never lie in the point space of any manifold. One cannot argue that an equivalence class of 
curves, or a differential operator, or a tuple of vector components, is “inside” the manifold. These objects 
are not elements or subsets of the manifold. An extrinsic tangent vector has the advantage that at least 
one point of a tangent line does intersect the manifold. Therefore extrinsic tangent vectors of embedded 
manifolds may be considered on equal terms with all of the other possible representations. 


It is not only tangent vectors whose “true nature" lies outside mathematics. Even the points of a manifold 
are typically “extra-mathematical”. In the definition of a differentiable manifold ((M, T_M), A_M), restrictions
are not placed on the point set M except to say that it is a set. It is tacitly assumed that M will be a ZF 
set, but in applications to geometry or physics, it usually is not. Ultimately differential geometry provides 
models for the real world, not for ZF sets. So if the points are extra-mathematical, the tangent vectors will 
likewise be extra-mathematical. There should be no surprise in this. 


Remark 54.3.10 presents a way to synthesise a point space M from given chart transition maps. This 
is how tangent vectors are defined also, namely as equivalence classes constructed from charts and their 
transition maps. Somehow such constructions are very rarely presented for the synthesis of the base points 
of a manifold, but there are numerous constructions for tangent vectors. 


53.1.7 REMARK: The real world may have its own native differentiable structure. 

The difficulty of identifying a credible mathematical construction to represent velocities (i.e. tangent vectors) 
on a differentiable manifold can be resolved in terms of the concept of “native velocity". Length, duration 
and velocity are observed in the real world, and physical models since the 17th century have successfully 
exploited the verisimilitude of Cartesian coordinates for describing and predicting motion. This implies that 
there is something in the observable physical world which corresponds to length, duration and velocity. 


The charts of differentiable manifolds are useful because the real world has, apparently, its own differentiable
structure. Manifold charts merely exploit this structure. The fact that manifold charts are required to be 
differentiably compatible with each other is a consequence of the assumption that there is some underlying 
"native geometry" which can be described by a mathematical model. Consistency with the underlying 
structure necessarily implies consistency between the charts. Similarly, there is a “native time" which 
is modelled by curve parameters. This yields a “native velocity" which has good test-retest correlation 
properties. (See also Remark 40.1.1 on this subject.) In other words, measurement of the native velocity 
observed in the real world has enough self-consistency to make it useful. 


From these comments, it is a reasonable conclusion that the “true tangent vector” is in the system which is
being modelled, not in the mathematical model. This is entirely analogous to the well-accepted idea that the
points of a manifold are abstract points which lie outside the framework of the model. Thus a manifold is a
set $M$, and charts $\psi : M \to \mathbb{R}^n$ map abstract points $p \in M$ to $n$-tuples $\psi(p) \in \mathbb{R}^n$. The “true point” is the
abstract object $p$, not the mathematical object $\psi(p)$. In the same way, a “true tangent vector” is an abstract
object $\Psi^{-1}(v)$, where $v \in T_{\psi(p)}(\mathbb{R}^n)$ is a mathematical tangent vector at $\psi(p)$, and $\Psi : T(M) \to T(\mathbb{R}^n)$ is
a map for tangent vectors corresponding to the map $\psi$ for points. Thus the “true tangent bundle” is an
abstract set. (Similar comments are made in Remark 53.1.15.)


Arguments for and against the existence of differentiable structure in the real world can be short-circuited by 
recognising that it is only our measurements of the real world which use Cartesian charts, and the supposed 
differentiability of the real world is actually the differentiability of the coordinates which we use in our 
frames of reference. In other words, the differentiability resides in our observations of the real world, which 
are expressed in the language of numerical tuples, and it is only these tuples which can be differentiated. 
We never differentiate the real world itself. 


53.1.8 REMARK: Seeking a natural mathematical representation for the “true tangent bundle”. 

The comments in Remark 53.1.7 may seem to resolve the “true tangent bundle” issue. However, it is not
quite clear how to define a natural tangent bundle on Cartesian spaces $\mathbb{R}^n$. (This question is also mentioned
in Section 40.1.) Tangent vectors for these spaces may be defined as straight lines in the corresponding
space-time set $\mathbb{R}^{n+1} = \mathbb{R}^n \times \mathbb{R}$, as $n$-tuple point-velocity pairs $(x,v) \in \mathbb{R}^n \times \mathbb{R}^n$, as equivalence classes of
curves, or as differential operators. Flat Cartesian spaces are, of course, a very particular class of manifolds.
The value of Cartesian spaces lies in their ability to model the observable geometry of the real world. So
none of the representations of the tangent bundle $T(\mathbb{R}^n)$ is the “true tangent bundle” on Cartesian spaces.


The tangent line in space-time has thousands of years of historical depth on its side. The space-time numerical 
quotient (velocity) has hundreds of years of historical depth on its side. Curve classes and tangent operators 
also have strong appeal for particular applications. But ultimately, “true tangent vectors" really exist only 
in modelled systems, just as “true points" exist only in modelled geometries. The choice of mathematical 
representation is purely a matter of expediency, efficacy and aesthetics. 


53.1.9 REMARK: The best mathematical representation for a tangent vector is a line. 

For Euclidean geometry, there are special charts which faithfully copy the translation-invariant, rotation- 
invariant structure of the space. Necessarily, these charts are related by combinations of translations and 
rotations. (This is illustrated on the left of Figure 53.1.1.) 


[Figure: two panels. Left, “Euclidean geometry”: numeric charts of the real geometry related by Cartesian transition maps. Right, “differential geometry”: numeric charts (coordinates) of the points related by differentiable transition maps.]


Figure 53.1.1 Real geometry versus numeric charts for Euclidean and differential geometry 


In the case of Euclidean geometry, a tangent (either a direction or velocity of a curve) is very naturally
represented as a straight line (either unparametrised or parametrised respectively). In the case of a differentiable
manifold, there are no special charts which faithfully copy the translation-invariant, rotation-invariant
structure of the space because there is no such structure. However, there are special charts which faithfully
copy the differentiability of curves, and these charts are related by differentiable transition maps. (This is
illustrated on the right of Figure 53.1.1.) In any particular such chart, the velocity of a curve is conveniently
represented as a parametrised straight line. If the line is mapped back to the manifold, the result will generally
not be a straight line, because a straight line is not well defined on a general differentiable manifold.
But this does not imply that the straight lines in the chart space are irrelevant. Differentiation of curves is
defined in terms of straight lines as approximations to the curves in the Cartesian chart space.
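The role of the parametrised straight line as an approximation to a curve in the chart space can be checked numerically. The following is a minimal sketch only: the sample curve, the finite-difference step and all function names are illustrative assumptions, not constructions from this book. It builds the line $L(s) = \gamma(t_0) + s\gamma'(t_0)$ in $\mathbb{R}^2$ and measures how fast the gap between the curve and the line shrinks relative to $s$.

```python
import math

def curve(t):
    """A sample differentiable curve in the chart space R^2 (hypothetical example)."""
    return (math.cos(t), math.sin(2.0 * t))

def velocity(f, t, h=1e-6):
    """Central-difference estimate of the velocity f'(t), component by component."""
    a, b = f(t - h), f(t + h)
    return tuple((bi - ai) / (2.0 * h) for ai, bi in zip(a, b))

def tangent_line(f, t0):
    """Return the parametrised straight line L(s) = f(t0) + s * f'(t0)."""
    x0, v0 = f(t0), velocity(f, t0)
    return lambda s: tuple(xi + s * vi for xi, vi in zip(x0, v0))

def gap(f, line, t0, s):
    """Euclidean distance between the curve at t0 + s and the line at s."""
    return math.dist(f(t0 + s), line(s))

line = tangent_line(curve, 0.3)
# For a differentiable curve, gap / s should tend to zero as s tends to zero.
ratios = [gap(curve, line, 0.3, s) / s for s in (1e-1, 1e-2, 1e-3)]
```

The shrinking ratios express "well approximated in the limit by a straight line" in purely chart-space terms; nothing in the computation refers to the manifold itself.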


A differentiable curve in a differentiable manifold is defined to be a curve which is differentiable through each 
chart, and this means that it is well approximated in the limit by a straight line. Therefore the most honest 
representation of a tangent vector is as a particular unique line $L_{\psi_\alpha}$ for each chart $\psi_\alpha$. Thus an equivalence
class of the form $[(\psi_\alpha, L_{\psi_\alpha})]$ is a very accurate representation of a tangent vector. It is democratic because
it gives equal weight to every chart. Within each chart, it describes the differentiability exactly in terms of 
the chart. The atlas of charts is the only structure defined on a general abstract differentiable manifold. 


Many authors replace the one-line-per-chart tangent vector with an infinite class of differentiable curves
which have the same velocity. However, this assumes a particular differentiability class (such as $C^2$) for the
charts and curves, and various other qualities for the curves. For example, the curves may be assumed to be
open or closed, the domains of the curves may be assumed to be intervals or general open sets or other kinds of
real-number sets, and they may be assumed to have no self-intersections, or finitely many self-intersections or arbitrary
self-intersections. Such general infinite curve classes introduce unnecessary and irrelevant technicalities, and
they may restrict the classes of differentiable manifolds so that some useful kinds of manifolds are excluded.


53.1.10 REMARK: Representation of vectors as parametrised straight lines in the literature.
As mentioned in Remark 26.12.4, the representation of vectors as parametrised straight lines is discussed by
Misner/Thorne/Wheeler [292], page 49.


53.1.11 REMARK: Straight lines have constant direction, but direction is defined relative to straight lines. 
Unparametrised lines in Euclidean spaces are those curves which have constant direction. Parametrised 
lines are those curves which have constant velocity. Differentiation of curves is performed by comparison to 
lines. Therefore if neither direction nor velocity is meaningful in a general differentiable manifold, there is no 
possibility of defining constant-direction or constant-velocity lines, and consequently there is no possibility 
of differentiating curves intrinsically within the manifold. One must simply accept that the coordinate charts 
are the only tool at hand. 


There is one possible way out of “coordinate-bound” tangent spaces. Instead of Cartesian-style coordinate
charts, one could use as chart spaces standard manifolds which have some kind of coordinate-free differentiable
structure on them. But this is a circular argument. If the “coordinate-free” manifolds have the desired
constant-direction or constant-velocity curves, these will provide grid lines which are equivalent to Cartesian
grid lines. Coordinate-free chart spaces are attractive philosophically, but in practice somewhat useless.


53.1.12 REMARK: Zeno’s arrow paradox. 
Since ancient times, notions such as velocities and infinitesimals have yielded paradoxes and logical quagmires. 
EDM2 [113], 319.C, page 1188, describes one of Zeno’s paradoxes as follows. 
A flying arrow occupies a certain point at each moment. In other words, at each moment the arrow 
stands still. Therefore the arrow can never move. 


It is not too difficult to sympathise with this argument. At each instant of time, there is a definite fixed 
location for the arrow. An object which has a fixed location cannot move. Even with the modern perspective, 
there is a paradox here. The object which is not moving cannot be distinguished from an identical object 
which is moving but located at the same point at a particular instant of time. If one had a perfect camera 
which could make an instantaneous exposure lasting zero seconds, the moving and not-moving objects would 
look the same. So what inner property of the moving arrow causes it to change its location at a particular 
rate? How does an object “know” how fast and in what direction it should move? Does it somehow 
remember its recent past history and therefore maintain its velocity by moving the same amount in the next 
nanosecond that it did in the last nanosecond? In other words, how is inertia “implemented”? How is the 
velocity state-parameter “encoded”? 


It is clear that velocity is a different kind of thing to a location. But tangent spaces are the sets which 
contain velocities. So tangent spaces and point spaces are necessarily quite different. 

In special relativity, the internal state of a moving object (relative to an observer’s frame of reference) differs
from the static state by having a slightly larger mass. But this inner state parameter does not specify the
direction of motion.

53.1.13 REMARK: Design of suitable specification tuples for velocity vectors.
A “tangent-line vector” at $p \in M$ for an $n$-dimensional $C^1$ manifold $M$ for $n \in \mathbb{Z}_0^+$ may be represented as
an equivalence class $[(\psi, L)]$ of ordered pairs $(\psi, L)$ with $\psi \in \mathrm{atlas}_p(M)$ and $L \in T_{\psi(p)}(\mathbb{R}^n)$.


A “tangent velocity vector” at a point $p$ of an $n$-dimensional $C^1$ manifold $M$, where $n \in \mathbb{Z}_0^+$, may
be represented as an equivalence class $[(\psi, (x, v))]$ of ordered triples $(\psi, (x, v))$ such that $\psi \in \mathrm{atlas}_p(M)$
and $(x, v) \in T_{\psi(p)}(\mathbb{R}^n)$.

The position of the chart $\psi$ as the first element in the ordered pairs in these styles of representation of
tangent vectors on manifolds is very convenient. It happens that functions are represented in the standard
development of ZF set theory as sets of ordered pairs, where the first element is from the domain set and the second
is from the range set. Thus an equivalence class $[(\psi, L)]$ is actually a function from $\mathrm{atlas}_p(M)$ to the tangent
bundle $T_{\psi(p)}(\mathbb{R}^n)$. Similarly, an equivalence class $[(\psi, (x, v))]$ of ordered triples is a function from $\mathrm{atlas}_p(M)$
to $T_{\psi(p)}(\mathbb{R}^n)$.


It follows that if $V = [(\psi, L)]$ represents a tangent-line vector, then $V(\psi) = L$. Similarly, if $V = [(\psi, (x, v))]$
represents a tangent velocity vector, then $V(\psi) = (x, v)$. This may be compared with the formula $\psi(p) = x$
for the coordinates $x$ of a point $p$ with respect to a chart $\psi$. The chart $\psi$ is itself a set of ordered pairs
$(p, x)$ for which $\psi(p) = x$. If one fixes $p$, one obtains the map $\psi \mapsto \psi(p)$ from charts to coordinates. Thus
$\{(\psi, \psi(p));\ \psi \in \mathrm{atlas}_p(M)\}$ may be thought of as the equivalence class of ordered pairs mapping charts to
coordinates for the fixed point $p$.
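The idea that a tangent vector is a function from charts to coordinate pairs can be made concrete with a dictionary keyed by chart. This is only a sketch under stated assumptions: the manifold $M = (0,\infty)$, the two charts named "id" and "log", and the use of floats to stand in for abstract points are all hypothetical choices made for illustration.

```python
import math

# Hypothetical atlas on M = (0, infinity): chart name -> (chart map, derivative of chart map).
ATLAS = {
    "id":  (lambda p: p,           lambda p: 1.0),
    "log": (lambda p: math.log(p), lambda p: 1.0 / p),
}

def point_coordinates(p):
    """The map psi |-> psi(p) for a fixed point p, stored as a set of ordered pairs."""
    return {name: chart(p) for name, (chart, _) in ATLAS.items()}

def tangent_velocity_vector(p, v_id):
    """The map psi |-> (x, v) for a fixed point p, given the velocity v_id in the 'id' chart."""
    return {name: (chart(p), dchart(p) * v_id)
            for name, (chart, dchart) in ATLAS.items()}

X = point_coordinates(2.0)             # X[psi] is the coordinate psi(p)
V = tangent_velocity_vector(2.0, 3.0)  # V[psi] is the pair (x, v) seen through psi
```

Here `V["log"]` plays the role of $V(\psi)$ for the logarithmic chart, and `X` is the corresponding chart-indexed family of point coordinates for the same fixed point.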


This shows that the charting of points and tangent vectors is done inconsistently. Points are charted as pairs
$(p, x)$ for each fixed chart $\psi$. Tangent vectors are charted here as pairs $(\psi, (x, v))$ for each fixed point $p$.
Obviously the coordinates of a point are a function of both the point and the chart. It is a matter of free
choice whether one thinks of the point or the chart as the primary independent variable, while the secondary
variable is regarded as a mere parameter.


In the case of point-charts, one usually regards the point as the primary parameter of the charts, while the
chart is a semi-fixed parameter. Generally one works within a single chart, and charts are changed rarely.
In the case of tangent vectors, one often fixes attention on a single point, and then the chart is a parameter
which determines multiple coordinate values. This makes the vector coordinates a function of the chart,
whereas the point coordinates are a function of the point.

It seems reasonable to try to standardise these
definitions to make them more consistent. In the case of points, the custom is very strongly to fix the chart
and ignore it most of the time. Therefore it seems reasonable to adapt the tangent vector chart maps to
this. Then one would not define tangent vectors for individual points. One would define charts for entire
tangent bundles. Thus one would want to define a tangent bundle chart $\Psi : T(M) \to T(\mathbb{R}^n)$ between the
respective tangent bundles corresponding to each point-chart $\psi : M \to \mathbb{R}^n$. These are necessarily partially
defined functions in general. The point-charts are not always defined on the whole manifold. This approach
bears a close resemblance to the way fibre bundle charts are defined on the total space of a fibre bundle.
This suggests that the best approach to defining tangent vectors is first to define one tangent vector bundle
chart for each point chart, and then to define tangent vectors at each point in the manifold consequentially
from the bundle definition. A big advantage of this approach is that it fairly closely matches the way that
general fibre bundles are defined. (See for example Definition 47.2.2 for topological fibre bundles.)


One fly in the ointment here is that the domain of the tangent vector bundle chart may be a proper subset 
of the domain of the corresponding point-set chart. This may occur when a manifold has points of non- 
differentiability. 


53.1.14 REMARK: Mathematical structures and notations for tangent vectors. 

Another way of viewing the tangent vector representation question is to define a function
$t : \mathrm{atlas}(M) \times T(\mathbb{R}^n) \to T(M)$ so that $t : (\psi, (x,v)) \mapsto V$ for $V \in T(M)$. In other words, the partially defined function $t$
maps chart/coordinate pairs to “real” tangent vectors. The fact that the real tangent bundle $T(M)$ is an
undefined, i.e. extra-mathematical, tangent bundle should not be disturbing because even the point set of
any differentiable manifold is undefined and extra-mathematical, and no one complains about that! (See
Remark 53.1.6 on this subject.)


Any function induces a partition on its domain, as mentioned in Theorem 10.16.3 and Remark 10.16.4. Therefore
$t$ partitions its domain into equivalence classes $[(\psi, (x,v))] = \{(\psi', (x',v'));\ t(\psi, (x,v)) = t(\psi', (x',v'))\}$.
This raises the spectre of defining $t_\psi(x,v)$ to mean the “real tangent vector” corresponding to the chart
$\psi$ and point/velocity coordinates pair $(x,v) \in T(\mathbb{R}^n)$. Then $V = t_\psi(x,v) = \Psi^{-1}(x,v)$ would be an
extra-mathematical tangent vector, and $[(\psi,(x,v))] = t^{-1}(V)$ would be an equivalence class which is in a
one-to-one relation with “real” extra-mathematical tangent vectors, where $\Psi : T(M) \to T(\mathbb{R}^n)$ maps real
tangent vectors to $n$-tuple pairs.

The situation is actually worse than this. Cartesian coordinate charts are a model for Euclidean geometry, 
but Euclidean geometry is itself an abstract model for physical “real geometry". (This idea is illustrated 
very roughly in Figure 53.1.2. The idea is also discussed at some length in Remark 26.11.6.) 


[Figure: three boxes labelled “real geometry”, “Euclidean geometry” and “Cartesian geometry”.]


Figure 53.1.2 Relation between real geometry, Euclidean geometry and Cartesian geometry 


Ultimately, it is fruitless to ask which representation of tangent vectors is the best or the most realistic. It 
is like asking which brand of computer hardware is the best, or whether little-endian integer representations 
are better or worse than big-endian representations. These are internal issues within the professions which 
work with such tools. In the end, anything that represents a tangent vector is only a model. 


53.1.15 REMARK: Tangent bundles may be either extra-mathematical or synthesised from the charts. 
There are many very different definitions of tangent vectors in the differential geometry literature. The 
concept of tangent vectors on manifolds deserves to be studied more closely to determine which definition is 
best. The basic context for defining tangent vectors is illustrated in Figure 53.1.3. 


Figure 53.1.3 Maps related to the definition of tangent vectors 


Differentiation of functions between real tuple spaces is unproblematic. If the function $\bar\gamma : \mathbb{R} \to \mathbb{R}^n$ is
$C^1$ differentiable, then $\bar\gamma'(t)$ is a well-defined vector in the tangent space $T_{\bar\gamma(t)}(\mathbb{R}^n)$ at the point $\bar\gamma(t) \in \mathbb{R}^n$.
Similarly, if the function $\bar\phi : \mathbb{R}^n \to \mathbb{R}$ is $C^1$ differentiable, then the differential $d\bar\phi(x)$ is a well-defined covector
in the tangent covector space $T^*_x(\mathbb{R}^n)$ at the point $x \in \mathbb{R}^n$.


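The composition of $\bar\gamma$ and $\bar\phi$ is a function $\bar\phi \circ \bar\gamma : \mathbb{R} \to \mathbb{R}$ which is $C^1$ differentiable and satisfies

\[
\frac{d}{dt}\bar\phi(\bar\gamma(t))
= \sum_{i=1}^n \frac{\partial\bar\phi(x)}{\partial x^i}\bigg|_{x=\bar\gamma(t)} \frac{\partial\bar\gamma^i(t)}{\partial t}
= \sum_{i=1}^n w_i v^i,
\]

where $w_i = \partial\bar\phi(x)/\partial x^i\,|_{x=\bar\gamma(t)}$ for $i \in \mathbb{N}_n$ are the components of a covector in $T^*_{\bar\gamma(t)}(\mathbb{R}^n)$, and $v^i = \partial\bar\gamma^i(t)/\partial t$
for $i \in \mathbb{N}_n$ are the components of a vector in $T_{\bar\gamma(t)}(\mathbb{R}^n)$. There is no problem with this at all. The problem
starts when the same procedure is carried out with $M$ in place of $\mathbb{R}^n$.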
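The covector-vector contraction in the chain rule for real tuple spaces can be verified numerically. The sketch below is illustrative only: the sample functions `gamma_bar` and `phi_bar`, the step size and the helper names are assumptions introduced here, and central differences stand in for exact derivatives.

```python
import math

def gamma_bar(t):
    """Sample C^1 curve R -> R^2 (illustrative choice)."""
    return (t * t, math.sin(t))

def phi_bar(x):
    """Sample C^1 function R^2 -> R (illustrative choice)."""
    return x[0] * x[1] + math.exp(x[0])

def derivative(f, t, h=1e-6):
    """Central-difference derivative of a real-valued function of one variable."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def covector(phi, x, h=1e-6):
    """Components w_i = d(phi)/dx^i at x, by central differences."""
    w = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += h
        dn[i] -= h
        w.append((phi(up) - phi(dn)) / (2.0 * h))
    return w

def vector(gamma, t):
    """Components v^i = d(gamma^i)/dt at t."""
    n = len(gamma(t))
    return [derivative(lambda s, i=i: gamma(s)[i], t) for i in range(n)]

t0 = 0.7
lhs = derivative(lambda t: phi_bar(gamma_bar(t)), t0)  # d/dt of the composition
w = covector(phi_bar, gamma_bar(t0))                   # covector components w_i
v = vector(gamma_bar, t0)                              # vector components v^i
rhs = sum(wi * vi for wi, vi in zip(w, v))             # contraction sum of w_i v^i
```

The two numbers `lhs` and `rhs` agree up to finite-difference error, which is all that the formula asserts when both maps live in real tuple spaces.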
The function $\phi \circ \gamma : \mathbb{R} \to \mathbb{R}$ is the same as $\bar\phi \circ \bar\gamma$, but the functions $\phi : M \to \mathbb{R}$ and $\gamma : \mathbb{R} \to M$ are more
difficult to differentiate. Differentiating these functions requires a $C^1$ differentiable structure on $M$ (if you
want an easy life), which must be provided by charts $\psi : M \to \mathbb{R}^n$. Differentiation of curves and functions
can be carried out with the aid of such charts (if the charts are $C^1$ differentiable and consistent with each
other).


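\[
\begin{aligned}
\frac{d}{dt}\phi(\gamma(t))
&= \frac{d}{dt}\,(\phi\circ\psi^{-1})\circ(\psi\circ\gamma)(t) \\
&= \sum_{i=1}^n \frac{\partial(\phi\circ\psi^{-1})(x)}{\partial x^i}\bigg|_{x=\psi(\gamma(t))} \frac{\partial\psi^i(\gamma(t))}{\partial t} \\
&= \sum_{i=1}^n \frac{\partial\bar\phi(x)}{\partial x^i}\bigg|_{x=\bar\gamma(t)} \frac{\partial\bar\gamma^i(t)}{\partial t}.
\end{aligned}
\]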


The result is exactly the same because the differentiable structure on $\mathbb{R}^n$ must be used. The abstract manifold
$M$ has no differentiable structure of its own. If $M$ did have its own intrinsic differentiable structure, this
could be used instead. The purpose of charts is to give manifolds a differentiable structure so that curves
and real-valued functions can be differentiated using the differentiable structure on real-tuple spaces $\mathbb{R}^n$ as
a proxy for the absent differentiable structure on the manifold.
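The use of a chart as a proxy for differentiation on the manifold, and the independence of the result from the choice of chart, can be illustrated numerically. The following sketch rests on several assumptions that are not in the text: the manifold is taken to be the open right half-plane with points stored as Cartesian pairs, the two charts are Cartesian and polar coordinates, and the function `phi` and curve `gamma` are arbitrary samples.

```python
import math

def phi(p):
    """Hypothetical real-valued function on M."""
    return p[0] * p[0] - p[1]

def gamma(t):
    """Hypothetical curve in M; it stays in the right half-plane."""
    return (2.0 + math.cos(t), math.sin(t))

# Each chart is stored with its inverse: name -> (psi, psi^{-1}).
CHARTS = {
    "cartesian": (lambda p: (p[0], p[1]),
                  lambda x: (x[0], x[1])),
    "polar": (lambda p: (math.hypot(p[0], p[1]), math.atan2(p[1], p[0])),
              lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))),
}

def d_dt(f, t, h=1e-6):
    """Central-difference derivative of a real-valued function of one variable."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def derivative_via_chart(psi, psi_inv, t, h=1e-6):
    """Sum over i of d(phi o psi^{-1})/dx^i at psi(gamma(t)) times d(psi^i o gamma)/dt."""
    x = psi(gamma(t))
    total = 0.0
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += h
        dn[i] -= h
        w_i = (phi(psi_inv(up)) - phi(psi_inv(dn))) / (2.0 * h)
        v_i = d_dt(lambda s: psi(gamma(s))[i], t)
        total += w_i * v_i
    return total

results = {name: derivative_via_chart(psi, psi_inv, 0.4)
           for name, (psi, psi_inv) in CHARTS.items()}
direct = d_dt(lambda t: phi(gamma(t)), 0.4)
```

Both entries of `results` agree with `direct` up to finite-difference error, although the individual components $w_i$ and $v^i$ differ between the two charts.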


In view of these comments, there are now two approaches to defining a tangent bundle on a manifold M. 
Either (1) the manifold may be required to provide its own tangent bundle $T(M)$ together with suitable
tangent bundle charts $\Psi : T(M) \to T(\mathbb{R}^n)$ which are consistent with the point space charts $\psi : M \to \mathbb{R}^n$,
or (2) a tangent bundle with the required properties may be constructed (or synthesised) for the manifold. 
The differential geometry literature contains numerous construction methods for tangent bundles. Some of 
these methods are presented in this book. 


The most “correct” answer to the question “What is the tangent bundle?” is probably that it is an abstract
set $T(M)$, just like the point set $M$ itself. For the point set, the fullest possible generality is achieved by
defining it to be abstract, although charts must be provided to describe its structure. Similarly for the 
tangent bundle, the fullest possible generality may be achieved by defining it to be an abstract set, although 
charts must be provided to describe the tangent bundle structure. 


A differentiable manifold may have multiple tangent bundles whose different definitions all satisfy the criteria 
of some metadefinition. For example, a single manifold may have one tangent bundle consisting of chart- 
tagged Cartesian tangent lines as in Definition 54.1.2, another tangent bundle consisting of (chart, point 
coordinates, velocity components) tuples as in Definition 54.10.4, another tangent bundle consisting of point- 
tagged differential operators as in Definition 54.15.3, and yet another tangent bundle consisting of equivalence 
classes of curves as in Remark 53.3.1 (iv). 


Thus one may speak of a tangent bundle rather than the tangent bundle for a given manifold. This may be 
thought of as a syncretistic approach. The many “belief systems" may be synthesised so as to obtain the 
benefits of them all. 


The lack of a straightforward, generally accepted standard definition for tangent vectors is not totally 
surprising. As mentioned in Remark 53.1.12, a velocity is a different kind of thing to a location, and a 
tangent vector is a quite different thing to a point. Therefore tangent vectors cannot be easily manufactured 
from points. There are many ways to construct tangent bundles, but none of them are totally convincing or 
satisfying. They all maximise some virtues at the expense of others. Therefore it is best to accept all of the 
tangent bundles as valid, utilising each tangent bundle according to which is best for each purpose. Each 
tangent bundle has additional attributes beyond merely meeting the minimum criteria for the definition of 
a tangent bundle. 


53.1.16 REMARK: Curve-equivalence-class representations of tangent vectors require $C^1$ charts.

The idea that a tangent vector at a point $p$ in a manifold $M$ may be represented as an equivalence class of
curves $\gamma : \mathbb{R} \to M$ with $\gamma(0) = p$ depends on the assumption that the manifold must be at least of class $C^1$.
This style of representation does not succeed in the case of non-$C^1$ Lipschitz manifolds such as are considered
in Section 53.2. The $C^1$ curve equivalence class representation maintains that two curves $\gamma_\alpha : \mathbb{R} \to M$ with
$\gamma_\alpha(0) = p$ for $\alpha = 1,2$ are equivalent if $\partial_t(\psi(\gamma_1(t)))|_{t=0} = \partial_t(\psi(\gamma_2(t)))|_{t=0}$ for some $\psi \in \mathrm{atlas}_p(M)$. This then
implies the equality for all $\psi \in \mathrm{atlas}_p(M)$. But this only holds true because the charts are $C^1$-compatible.
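The step from "some chart" to "all charts" may be sketched as follows. The symbol $\psi'$ for a second chart in $\mathrm{atlas}_p(M)$ and the symbol $D$ for the total derivative are notational assumptions introduced only for this sketch. Assuming that the transition map $\psi' \circ \psi^{-1}$ is differentiable at $\psi(p)$ and that $\psi \circ \gamma_\alpha$ is differentiable at $0$, the chain rule gives

```latex
\begin{align*}
\partial_t\bigl(\psi'(\gamma_\alpha(t))\bigr)\big|_{t=0}
  &= \partial_t\bigl((\psi'\circ\psi^{-1})(\psi(\gamma_\alpha(t)))\bigr)\big|_{t=0} \\
  &= D(\psi'\circ\psi^{-1})(\psi(p))\,
     \partial_t\bigl(\psi(\gamma_\alpha(t))\bigr)\big|_{t=0},
  \qquad \alpha = 1, 2.
\end{align*}
```

The right-hand side depends on $\gamma_\alpha$ only through its velocity in the chart $\psi$, so equal velocities through $\psi$ give equal velocities through $\psi'$. When the transition map is merely Lipschitz, the second equality is not available in general, which is where the argument breaks down.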
53.1.17 REMARK: Equivalence-class definitions avoid the question of what an object is.

Defining a tangent vector at a point to be an equivalence class of curves which have the same velocity at 
that point avoids the real question of what velocity is. The curve-class approach is like defining the number 
23 to be the class of all sets which contain 23 elements. Cardinality cannot be defined by equivalence classes. 


In the case of cardinality, the concept of equinumerosity is very well defined. (See Section 13.1.) But even 
when two sets are known to have the same cardinality (because a bijection between them can be found), this 
still does not define what cardinality is. Similarly, curves can be tested for “equi-velocity”, but this requires
the prior definition of spaces of test functions or charts in order to make the comparison. Since two curves 
have “equi-velocity” whenever their velocity tuples are the same, clearly the curve equivalence class concept 
does not escape from dependency on charts. 


53.1.18 REMARK: The chart-line-pair equivalence class representation is for measurements only. 

If one defines tangent vectors at a point $p$ in a differentiable manifold $M$ to be equivalence classes of
pairs $(\psi, L)$ with $\psi \in \mathrm{atlas}_p(M)$ and affine $L : \mathbb{R} \to \mathbb{R}^n$ satisfying $L(0) = \psi(p)$, the equivalence rule can
be written succinctly as $(\psi_1, L_1) \equiv (\psi_2, L_2)$ if and only if $L_2'(0) = (\psi_2 \circ \psi_1^{-1} \circ L_1)'(0)$. In other words,
$\partial_t L_2(t)|_{t=0} = \partial_t \psi_2(\psi_1^{-1}(L_1(t)))|_{t=0}$. This is equivalent to the standard transition rule for velocity vectors.
In the case of unidirectional tangent lines, one requires $\partial_t^+ L_2(t)|_{t=0} = \partial_t^+ \psi_2(\psi_1^{-1}(L_1(t)))|_{t=0}$. (This is
illustrated in Figure 53.1.4. For brevity, the function composition symbol “$\circ$” is omitted.)


[Figure: two panels. Left, “bidirectional tangent lines” on a bidirectionally differentiable manifold. Right, “unidirectional tangent lines” on a unidirectionally differentiable manifold. Each panel shows two charts into $\mathbb{R}^n$ with lines $L_1$ and $L_2$ related by the transition map.]
Figure 53.1.4 Tangent-line vector equivalence relations 
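The equivalence rule for chart-line pairs lends itself to a direct numerical test. The sketch below makes several assumptions for illustration only: the first chart is Cartesian, the second is polar, so that the transition map is `to_polar`, and derivatives at $t = 0$ are estimated by central differences. All names are hypothetical.

```python
import math

def to_polar(x):
    """Transition map psi_2 o psi_1^{-1} for a Cartesian chart psi_1 and polar chart psi_2."""
    return (math.hypot(x[0], x[1]), math.atan2(x[1], x[0]))

def affine_line(x0, v):
    """The affine map L(t) = x0 + t * v from R to R^n."""
    return lambda t: tuple(a + t * b for a, b in zip(x0, v))

def velocity(f, h=1e-6):
    """Central-difference estimate of f'(0) for an R^n-valued function f."""
    return tuple((b - a) / (2.0 * h) for a, b in zip(f(-h), f(h)))

def equivalent(line1, line2, transition, tol=1e-6):
    """Test the rule L2'(0) = (transition o L1)'(0)."""
    pushed = velocity(lambda t: transition(line1(t)))
    return math.dist(velocity(line2), pushed) < tol

x1 = (1.0, 1.0)                   # Cartesian coordinates of the base point
L1 = affine_line(x1, (1.0, 0.0))  # line seen through the Cartesian chart
# Polar velocity from the Jacobian of (r, theta) at (1, 1): dr/dx = x/r, dtheta/dx = -y/r^2.
L2 = affine_line(to_polar(x1), (1.0 / math.sqrt(2.0), -0.5))
L2_wrong = affine_line(to_polar(x1), (1.0, 0.0))

same_vector = equivalent(L1, L2, to_polar)
different_vector = equivalent(L1, L2_wrong, to_polar)
```

Only the pair `(L1, L2)` passes the test. The comparison happens entirely in the two chart spaces, in keeping with the view that the chart-line pair is the measurement rather than the object.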


In this book, the view is taken that tangent vectors are best represented as equivalence classes of chart-line 
pairs. This apparently contradicts the assertion that the only “true tangent vector" is extra-mathematical. 
The chart-line-pair-class approach says that the “true tangent vector" exists entirely in the observer’s co- 
ordinate space. But this is completely compatible with the view that the “true tangent vector" is extra- 
mathematical. In a physical observation situation, the observed physical tangent vector is indeed extra- 
mathematical, while the line viewed in the chart is part of the observation by the observer. In any observation 
situation, one must distinguish between objects and measurements. 


All attempts to construct native tangent vectors in extra-mathematical manifolds are doomed to fail. They 
are all constructed in the observer's coordinate chart and then “pulled back" onto the extra-mathematical 
manifold. But this is self-deception. Constructions such as differential operators and equivalence classes 
of differentiable curves are constructed in coordinate charts, and they stay inside the coordinate charts, no 
matter how much one pretends that they are really in the extra-mathematical manifold. This is a very good 
thing too. The great value of working in coordinate charts is the abstraction from the “real manifold”. The 
particularities of each kind of “real manifold” are not the concern of general differential geometry. 
53.2. Tangent vectors on almost-differentiable manifolds


53.2.1 REMARK: The best way to understand the tangent vector concept is to “make it break”. 

To better understand tangent vectors on differentiable manifolds, it is useful to first briefly consider tangent 
vectors on almost-differentiable manifolds, which could be referred to as “directionally differentiable”. If 
the directional derivatives are equal and opposite for opposite rays, one could describe the manifold as 
“bidirectionally differentiable”. A manifold where the unidirectional derivative is well defined along all rays 
from each point could be described as “unidirectionally differentiable”. 


Lipschitz manifolds are defined in Section 50.7. Such manifolds are of interest partly because they have 
“interesting” tangent space structure, and partly because they arise naturally as the solutions of some 
kinds of boundary value problems for partial differential equations. Lipschitz manifolds are not necessarily 
directionally differentiable everywhere, but these manifold classes are related. 


Examples 53.2.4, 53.2.5 and 53.2.6 show how a Lipschitz manifold may have directional derivatives which 
depend on direction in a nonlinear way. These bidirectionally differentiable examples may be easily modified 
to be only unidirectionally differentiable. 


53.2.2 REMARK: Regularity classes should be fitted to the manifolds and functions being modelled. 

The solutions of partial differential equations are sometimes Lipschitz rather than $C^1$. The usual linear
transformation rule for tangent vectors between charts for a $C^1$ manifold does not always apply in the case of Lipschitz
manifolds. (See Examples 53.2.4, 53.2.5 and 53.2.6.)


The vast majority of differential geometry and general relativity texts specify that differentiable manifolds
must be at least $C^1$. However, when one restricts the range of available models for mathematical convenience,
one risks overlooking in the physical world those aspects of models which have been eliminated a priori. It
is therefore better to maintain the maximum generality in models so as to avoid “blind spots” in the
modelling. An inadequate range of models can increase the risk of trying to fit reality to the models like a 
bed of Procrustes. 


53.2.3 REMARK: Non-$C^1$ maps which map tangent lines to tangent lines.

At a point $p$ in a $C^1$ manifold $M$, all tangent lines $L_1 \in T_{\psi_1(p)}(\mathbb{R}^n)$ are mapped to tangent lines $L_2 \in T_{\psi_2(p)}(\mathbb{R}^n)$ for
any charts $\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$. The map $L_1 \mapsto L_2$ is linear with coefficients equal to the partial derivatives
of $\psi_2 \circ \psi_1^{-1}$. (More precisely, the map $L_1 \mapsto L_2$ is the total derivative of $\psi_2 \circ \psi_1^{-1}$ at $\psi_1(p)$.) In the case of
non-$C^1$ manifolds, this map $L_1 \mapsto L_2$ may be well defined, but the map may not be linear. Examples 53.2.4,
53.2.5 and 53.2.6 show how this can happen. In fact, the point transition maps $\psi_k \circ \psi_1^{-1}$ may be described
as “starlike” because they map lines at each point to other lines, not just the tangent lines.


It seems quite plausible that manifolds in useful physical models could have tangent lines at a point which are 
well defined, but which do not transform linearly between natural charts. For example, physically meaningful 
boundary value problems for elliptic partial differential equations do sometimes have solutions which are of
class $C^{0,1}$ rather than $C^2$.


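53.2.4 EXAMPLE: A Lipschitz chart-pair with consistent vector direction.
Consider the set $M = \mathbb{R}^2$ and the atlas $A_{12} = \{\psi_1, \psi_2\}$ for $M$, where $\psi_1 : M \to \mathbb{R}^2$ is the identity
map $\psi_1 : x \mapsto x$, and $\psi_2 : M \to \mathbb{R}^2$ is defined by

\[
\psi_2(x) = \begin{cases}
(|x_1| + |x_2|)\,|x|^{-1}\,x & \text{if } x_1 x_2 > 0 \\
x & \text{if } x_1 x_2 \le 0.
\end{cases}
\]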


The coordinate lines of $\psi_2$ are illustrated in Figure 53.2.1. The version of the diagram on the right shows
how the vectors with length equal to 4 at the origin are transformed. They are drawn at 10 degree intervals.
Note that the directions of the vectors are not changed by the transformation. Only the lengths are affected.


The manifold $(M, A_{12})$ is $C^{0,1}$ (i.e. locally Lipschitz continuous) because $\psi_2$ and $\psi_2^{-1}$ are (globally) Lipschitz
continuous maps. This manifold is differentiable except on the axes (including the origin) of the chart $\psi_2$.
The inverse of $\psi_2$ is as follows.


\[
\psi_2^{-1}(x) = \begin{cases}
|x|\,(|x_1| + |x_2|)^{-1}\,x & \text{if } x_1 x_2 > 0 \\
x & \text{if } x_1 x_2 \le 0.
\end{cases}
\]

Figure 53.2.1 Coordinate lines of a Lipschitz chart $\psi_2$


This manifold has unidirectional tangent vectors on the axes of $\psi_2$ and bidirectional tangent vectors at the
origin. (See Section 54.16 for unidirectional tangent vectors.) However, the chart transition rule for the
tangent vectors at the origin is not a linear map. Likewise, the chart transition rule for the unidirectional
tangent vectors at each point on the axes is not linear.


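At the origin of the charts $\psi_1$ and $\psi_2$, the effect of the transition map $\psi_2 \circ \psi_1^{-1} = \psi_2$ is to multiply points
$x \in M$ by a number which depends on only the direction from the origin. Thus $\psi_2(x) = \psi_2(\lambda e_\theta) = \lambda\psi_2(e_\theta) = \lambda k_\theta e_\theta = \lambda k_\theta \psi_1(e_\theta)$
for a vector $x = \lambda e_\theta$, $e_\theta = (\cos\theta, \sin\theta) \in \mathbb{R}^2$ and $\lambda \in \mathbb{R}$, with $k : [0, 2\pi] \to \mathbb{R}$ defined by
$k_\theta = |\cos\theta| + |\sin\theta|$ for $\theta \in [0, \frac{\pi}{2}] \cup [\pi, \frac{3\pi}{2}]$ and $k_\theta = 1$ for $\theta \in [\frac{\pi}{2}, \pi] \cup [\frac{3\pi}{2}, 2\pi]$. This is a limited linearity
along lines through the origin.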
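The "limited linearity" of this example can be checked directly. The sketch below assumes the case split $x_1 x_2 > 0$ versus $x_1 x_2 \le 0$ for $\psi_2$ (the two branches agree on the axes away from the origin), and the helper names are illustrative. It tests homogeneity along a line through the origin, the failure of additivity, and the inverse formula.

```python
import math

def psi2(x):
    """Chart psi_2: scale by (|x1| + |x2|) / |x| where x1 * x2 > 0, identity elsewhere."""
    x1, x2 = x
    if x1 * x2 > 0.0:
        k = (abs(x1) + abs(x2)) / math.hypot(x1, x2)
        return (k * x1, k * x2)
    return (x1, x2)

def psi2_inv(y):
    """Inverse chart: scale by |y| / (|y1| + |y2|) where y1 * y2 > 0, identity elsewhere."""
    y1, y2 = y
    if y1 * y2 > 0.0:
        k = math.hypot(y1, y2) / (abs(y1) + abs(y2))
        return (k * y1, k * y2)
    return (y1, y2)

def k_theta(theta):
    """Direction-dependent scale factor at the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return abs(c) + abs(s) if c * s > 0.0 else 1.0

def scale(lam, x):
    return (lam * x[0], lam * x[1])

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

theta = math.radians(30.0)
e = (math.cos(theta), math.sin(theta))

# Homogeneity along a line through the origin, including a negative multiplier.
homogeneity_gap = math.dist(psi2(scale(-2.5, e)), scale(-2.5 * k_theta(theta), e))

# Additivity fails: psi2(e1 + e2) differs from psi2(e1) + psi2(e2).
e1, e2 = (1.0, 0.0), (0.0, 1.0)
additivity_gap = math.dist(psi2(add(e1, e2)), add(psi2(e1), psi2(e2)))

round_trip_gap = math.dist(psi2_inv(psi2((0.3, 0.4))), (0.3, 0.4))
```

A nonzero `additivity_gap` together with a vanishing `homogeneity_gap` is exactly the situation described above: lines through the origin map to lines, but the map between them is not linear.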


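53.2.5 EXAMPLE: Another Lipschitz chart-pair with consistent vector direction.
Consider the set $M = \mathbb{R}^2$ and the atlas $A_{13} = \{\psi_1, \psi_3\}$ for $M$, where $\psi_1 : M \to \mathbb{R}^2$ is the identity
map $\psi_1 : x \mapsto x$, and $\psi_3 : M \to \mathbb{R}^2$ is defined by

\[
\psi_3(x) = \begin{cases}
(|x_1| + |x_2|)\,|x|^{-1}\,x & \text{if } x_1 x_2 > 0 \\
|x|\,(|x_1| + |x_2|)^{-1}\,x & \text{if } x_1 x_2 < 0 \\
x & \text{if } x_1 x_2 = 0.
\end{cases}
\]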

The coordinate lines of $\psi_3$ are illustrated in Figure 53.2.2. The version of the diagram on the right shows
how the vectors with length equal to 4 at the origin are transformed. They are drawn at 10 degree intervals.
Note that the directions of the vectors are not changed by the transformation. Only the lengths are affected.
The manifold $(M, A_{13})$ is Lipschitz because $\psi_3$ and $\psi_3^{-1}$ are Lipschitz maps. This manifold is differentiable
except at the origin of the chart $\psi_3$. The inverse of $\psi_3$ is as follows.


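\[
\psi_3^{-1}(x) = \begin{cases}
|x|\,(|x_1| + |x_2|)^{-1}\,x & \text{if } x_1 x_2 > 0 \\
(|x_1| + |x_2|)\,|x|^{-1}\,x & \text{if } x_1 x_2 < 0 \\
x & \text{if } x_1 x_2 = 0.
\end{cases}
\]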


This manifold has bidirectional tangent vectors at all points, and is in fact $C^1$ except at the origin. The
chart transition rule for tangent vectors at the origin is not a linear map.
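The difference between the charts of Examples 53.2.4 and 53.2.5 on the axes can be seen by comparing one-sided difference quotients at a point of an axis. This is a rough numerical sketch, not a proof: the base point $(1, 0)$, the step size and the function names are assumptions chosen for illustration, and the chart formulas are transcribed from the two examples.

```python
import math

def psi2(x):
    """Chart of Example 53.2.4: rescaled where x1 * x2 > 0, identity elsewhere."""
    x1, x2 = x
    if x1 * x2 > 0.0:
        k = (abs(x1) + abs(x2)) / math.hypot(x1, x2)
        return (k * x1, k * x2)
    return (x1, x2)

def psi3(x):
    """Chart of Example 53.2.5: reciprocal rescalings on the two sides of each axis."""
    x1, x2 = x
    prod = x1 * x2
    if prod == 0.0:
        return (x1, x2)
    ratio = (abs(x1) + abs(x2)) / math.hypot(x1, x2)
    k = ratio if prod > 0.0 else 1.0 / ratio
    return (k * x1, k * x2)

def one_sided_derivative(chart, x, direction, sign, h=1e-7):
    """Difference quotient (chart(x + sign*h*d) - chart(x)) / (sign*h) for sign = +1 or -1."""
    step = sign * h
    moved = chart((x[0] + step * direction[0], x[1] + step * direction[1]))
    base = chart(x)
    return tuple((m - b) / step for m, b in zip(moved, base))

point, up = (1.0, 0.0), (0.0, 1.0)
right2 = one_sided_derivative(psi2, point, up, +1)
left2 = one_sided_derivative(psi2, point, up, -1)
right3 = one_sided_derivative(psi3, point, up, +1)
left3 = one_sided_derivative(psi3, point, up, -1)

jump2 = math.dist(right2, left2)  # mismatch of the one-sided derivatives for psi_2
jump3 = math.dist(right3, left3)  # mismatch of the one-sided derivatives for psi_3
```

For $\psi_2$ the two one-sided derivatives across the axis disagree, which matches the description of its tangent vectors on the axes as unidirectional. For $\psi_3$ they agree to within numerical error, which is consistent with the statement that the manifold is $C^1$ away from the origin.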


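Figure 53.2.2 Coordinate lines of a Lipschitz chart $\psi_3$


53.2.6 EXAMPLE: A Lipschitz chart-pair with consistent vector length.
Consider the set $M = \mathbb{R}^2$ and the atlas $A_{14} = \{\psi_1, \psi_4\}$ for $M$, where $\psi_1 : M \to \mathbb{R}^2$ is the identity
map $\psi_1 : x \mapsto x$, and $\psi_4 : M \to \mathbb{R}^2$ is defined by

\[
\psi_4(x) = \begin{cases}
\left(\dfrac{x_1^2 + x_2^2}{\alpha^2 x_1^2 + \beta^2 x_2^2}\right)^{1/2} (\alpha x_1, \beta x_2) & \text{if } x \ne 0 \\
0 & \text{if } x = 0
\end{cases}
\]

for some $\alpha, \beta \in \mathbb{R}^+$. Then $|\psi_4(x)| = |\psi_1(x)|$ for all $x \in M$. The chart $\psi_4$ is also linear along rays from the
origin. However, if $\alpha \ne \beta$, the chart is not a linear map. The inverse of $\psi_4$ satisfies

\[
\psi_4^{-1}(y) = \begin{cases}
\left(\dfrac{y_1^2 + y_2^2}{\beta^2 y_1^2 + \alpha^2 y_2^2}\right)^{1/2} (\beta y_1, \alpha y_2) & \text{if } y \ne 0 \\
0 & \text{if } y = 0.
\end{cases}
\]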


The coordinate lines of $\psi_4$ are illustrated in Figure 53.2.3 for $\alpha = 1$ and $\beta = 2$. The version of the diagram
on the right shows how the vectors with length equal to 4 at the origin are transformed. They are drawn at
10 degree intervals. Note that the lengths of the vectors are not changed by the transformation. Only the
directions are affected.

Figure 53.2.3 Coordinate lines of a Lipschitz chart $\psi_4$

This manifold has bidirectional tangent vectors at all points, and is $C^1$ except at the origin. The chart
transition rule for tangent vectors at the origin is clearly not a linear map since there is no linear map at
the origin which can map such irregularly spaced vectors to regularly spaced vectors.

At the origin of the charts $\psi_1$ and $\psi_4$, the transition map $\psi_4 \circ \psi_1^{-1} = \psi_4$ is homogeneous of degree 1
along each line through the origin, although it changes the direction of the line. Thus $\psi_4(\lambda x) = \lambda \psi_4(x)$ for all
$x \in M$ and $\lambda \in \mathbb{R}$. Therefore each line $L_x = \{\lambda x;\ \lambda \in \mathbb{R}\} \subseteq \mathbb{R}^2$ is mapped by $\psi_4 \circ \psi_1^{-1}$ to a line $L_{\psi_4(x)}$. So
tangent vectors at the origin make perfectly good sense although the transformation rule is non-linear.
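The length-preserving and ray-wise linear behaviour of this chart can be checked numerically in the same way. This is a sketch under the assumption that the chart and its inverse have the forms read from this example; the function names `psi4` and `psi4_inv` and the default values $\alpha = 1$, $\beta = 2$ (taken from the figure) are illustrative choices.

```python
import math

def psi4(x1, x2, alpha=1.0, beta=2.0):
    """Chart psi_4 as read from Example 53.2.6 (an assumed reconstruction)."""
    if x1 == 0 and x2 == 0:
        return (0.0, 0.0)
    scale = math.sqrt((x1 * x1 + x2 * x2)
                      / (alpha * alpha * x1 * x1 + beta * beta * x2 * x2))
    return (scale * alpha * x1, scale * beta * x2)

def psi4_inv(y1, y2, alpha=1.0, beta=2.0):
    """Inverse chart: the roles of alpha and beta are exchanged."""
    if y1 == 0 and y2 == 0:
        return (0.0, 0.0)
    scale = math.sqrt((y1 * y1 + y2 * y2)
                      / (beta * beta * y1 * y1 + alpha * alpha * y2 * y2))
    return (scale * beta * y1, scale * alpha * y2)

def norm(v):
    """Euclidean length of a pair."""
    return math.hypot(v[0], v[1])
```

With these definitions, `norm(psi4(x1, x2))` should agree with `norm((x1, x2))`, and `psi4` should commute with multiplication by any real scalar. Additivity fails when $\alpha \ne \beta$: the images of $(1,0)$ and $(0,1)$ are those same vectors, but the image of $(1,1)$ is not $(1,1)$.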
53.3. Styles of representation of tangent vectors


53.3.1 REMARK: Styles of mathematical representations for tangent vectors. 

The chart transition matrices for charts of $C^1$ manifolds are given by line (51.4.1) in Definition 51.4.18. Any
object which transforms according to these matrices may be accepted as a valid tangent vector representation
on a $C^1$ manifold. Some examples of intrinsic tangent vector constructions which satisfy this requirement
are as follows.


(i) An affine-parametrised line. A tangent vector is represented as an equivalence class of pairs $(\psi, L)$,
where $\psi : M \to \mathbb{R}^n$ is a Cartesian coordinate chart and $L$ is an affine-parametrised line in $\mathbb{R}^n$. (See
Definition 54.1.2.)


(ii) Coordinates. Computations mostly use this representation. A tangent vector may be represented as
an equivalence class of triples $(p, v, \psi)$, where $p \in M$ is the vector's base point, $v \in \mathbb{R}^n$ is the set of
tangent vector coordinates and $\psi$ is a chart. Alternatively, tangent vectors may be represented as pairs
$(\psi, (x, v))$ or triples $(\alpha, x, v)$, where $x$ is the set of coordinates of $p \in M$ and $\alpha \in I$ is the index of $\psi$ in
an indexed atlas $(\psi_\alpha)_{\alpha \in I}$. (See Remark 54.10.14.)


(iii) A differential operator. This is popular for analytical and theoretical applications. The operator has
the form $f \mapsto \sum_{i=1}^n v^i (\partial/\partial x^i)(f \circ \psi^{-1}(x))\big|_{x=\psi(p)}$ for $v \in \mathbb{R}^n$. This style of definition requires a space of
“test functions” to differentiate, in much the same way as generalised functions require a space of test
functions. (See also the comments in Remark 53.3.3.)


(iv) An equivalence class of curves. This representation defines a tangent vector at $p \in M$ to be an
equivalence class of curves $\gamma : \mathbb{R} \to M$ which have the same velocity vector $\gamma'(0)$ at $p = \gamma(0)$, where
$\gamma'(0) = (d/dt)(\psi(\gamma(t)))\big|_{t=0}$ for charts $\psi : M \to \mathbb{R}^n$. (See comments in Remark 53.3.4.)


(v) A derivation. This representation is based on linear functionals $L : C^\infty(M) \to \mathbb{R}$ which obey the
Leibniz rule. (See Remark 54.12.12.) These are essentially the same as the differential operators in (iii),
but are defined more algebraically. They don't work correctly for $C^k(M)$ spaces with $k < \infty$. (See for
example Gómez-Ruiz [14], page 22.)


(vi) A generalised function. Schwartz distributions, for example, represent generalised functions as
elements of the topological duals of spaces such as $C_0^\infty(M)$. In particular, points may be represented as
Dirac delta functions and tangent vectors may be represented as directional derivatives of delta
functions. This is a broad extension of the differential operator and derivation styles of definition. This style
of representation is restricted to $C^\infty$ manifolds.


(vii) Importation from the coordinate chart tangent bundle. In this style, tangent bundles are defined
first on Cartesian spaces, and then tangent vectors on the manifold are defined as images of those tangent
vectors, subject to equivalence relations between the charts. Thus if $T_x(\mathbb{R}^n)$ is the tangent space at
$x \in \mathbb{R}^n$ for a Cartesian space $\mathbb{R}^n$, then a tangent vector at $p \in M$ would be defined as an equivalence
class of pairs $(\psi, v)$ where $v \in T_{\psi(p)}(\mathbb{R}^n)$.
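The relation between the coordinate style (ii) and the differential operator style (iii) can be illustrated numerically. The sketch below assumes a concrete setting which is not part of the text: $M$ is the open right half-plane, `psi1` is the identity chart, `psi2` is a polar chart, and the component transformation rule is multiplication by the Jacobian matrix of the transition map `psi2` after the inverse of `psi1`. All function names are hypothetical, and the derivatives are approximated by central differences.

```python
import math

def psi1(p):            # identity (Cartesian) chart
    return p

def psi1_inv(x):
    return x

def psi2(p):            # polar chart (r, theta) on the right half-plane
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def psi2_inv(c):
    r, t = c
    return (r * math.cos(t), r * math.sin(t))

def transition_jacobian(p):
    """Jacobian matrix of psi2 o psi1^{-1} at the point with Cartesian coordinates p."""
    x, y = p
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))

def transform_components(p, v):
    """Components with respect to psi2 of the vector with components v in psi1."""
    J = transition_jacobian(p)
    return tuple(sum(J[i][j] * v[j] for j in range(2)) for i in range(2))

def apply_operator(f, p, v, chart, chart_inv, h=1e-6):
    """Approximate sum_i v^i d/dx^i (f o chart^{-1}) at chart(p)."""
    x = chart(p)
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += v[i] * (f(chart_inv(tuple(xp))) - f(chart_inv(tuple(xm)))) / (2 * h)
    return total
```

If the triples $(p, v, \psi_1)$ and $(p, w, \psi_2)$ with `w = transform_components(p, v)` represent the same tangent vector, then `apply_operator` should return the same number for both triples and any smooth test function, up to finite-difference error.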


53.3.2 REMARK: Survey of tangent vector representation styles in the literature. 

Table 53.3.1 is a survey of some styles of definitions of tangent spaces which may be found in the literature.
The notations in this table assume that $p \in M$ is a point on an $n$-dimensional manifold, $v \in \mathbb{R}^n$ is a
coordinate $n$-tuple, and $\psi \in \mathrm{atlas}_p(M)$ is a chart at $p$.


The pseudo-algebraic “derivations” style of tangent vector definition is very popular, although this merely
hides the analysis inside the non-trivial proof that the algebraic rules for derivations imply that derivations
are in fact differential operators relative to the coordinate charts. The “derivations” style is also unfortunate
because it is valid only for $C^\infty$ manifolds, which is particularly inconvenient for applications in physics
although, ironically, it is the physics textbooks which seem to prefer this style the most.


The “derivatives via curves” style of tangent vector definition is also very popular, and it is applicable to general
$C^1$ manifolds, which is clearly preferable for physics applications. The same is true of the “derivatives (via
charts)” style of definition. The “coordinate (tuple)” and the “curve class” styles of definition are amongst
the least popular, but they are not an insignificant minority.


For practical applications in physics, probably the differential operator (i.e. “derivatives”) style of definition
gives the best trade-off between computational convenience and philosophical (i.e. ontological) comfort.
Some authors give multiple definitions and identify them. For greatest clarity and logical self-consistency,
however, it seems preferable to permit all definitions to co-exist as different, but isomorphic, tangent spaces
and tangent bundles. Ultimately one must confess that there is no such thing as the tangent space. It is
only possible to define representations of the abstract concept, in the same way that the abstract concepts
of points, numbers and probabilities have many representations but no definitive definitions.


year  reference                     tangent vector definition
1918  Weyl [310]                    coordinates with transformation rules
1949  Synge/Schild [41]             coordinates with transformation rules
1957  Whitney [161]                 equivalence classes of differentiable curves through a point
1959  Kreyszig [22]                 coordinates with transformation rules
1959  Willmore [42]                 derivatives of $C^r$ functions, $r \in \mathbb{Z}^+$
1963  Auslander/MacKenzie [1]       equivalence classes of differentiable curves through a point
1963  Flanders [11]                 derivations of $C^\infty$ functions on the manifold
1963  Gelfand/Fomin [79]            [derivatives of curves]
1963  Guggenheimer [16]             pairs $(p,v) \in M \times T_p(M)$, for equivalence classes of curves $T_p(M)$
1963  Kobayashi/Nomizu [19]         derivatives of $C^\infty$ functions by $C^\infty$ curves
1964  Bishop/Crittenden [2]         derivations of $C^\infty$ functions on the manifold
1965  Postnikov [33]                derivations of $C^\infty$ functions on the manifold
1968  Bishop/Goldberg [3]           derivations of $C^\infty$ functions on the manifold
1968  Choquet-Bruhat [6]            equivalence classes of $C^\infty$ curves through a point
1969  Federer [69]                  [tangent cone of a general subset of $\mathbb{R}^n$ at a point]
1970  Misner/Thorne/Wheeler [292]   derivatives of $C^\infty$ functions by $C^\infty$ curves
1970  Spivak [37]                   equivalence classes of coordinate tuples $(\psi, v)$
1972  Malliavin [28]                derivatives of $C^\infty$ functions by $C^\infty$ curves
1972  Sulanke/Wintgen [40]          equivalence classes of coordinate pairs $(i,v) \in I \times \mathbb{R}^n$ [ill-defined]
1977  Drechsler/Mayer [262]         equivalence classes of $C^\infty$ curves through a point
1979  Do Carmo [9]                  derivatives of $C^\infty$ functions by $C^\infty$ curves
1980  Daniel/Viallet [317]          derivatives of $C^\infty$ functions by $C^\infty$ curves
1980  EDM2 [113]                    derivations of $C^\infty$ functions on the manifold
1980  Schutz [36]                   derivatives of curves
1981  Bleecker [254]                equivalence classes of $C^\infty$ curves through a point
1983  Nash/Sen [30]                 derivatives of curves
1986  Crampin/Pirani [7]            derivations of $C^\infty$ functions on the manifold
1987  Gallot/Hulin/Lafontaine [13]  equivalence classes of $C^\infty$ curves through a point
1988  Kay [18]                      [derivatives of curves]
1993  Kosinski [21]                 derivations of $C^\infty$ functions on the manifold
1994  Darling [8]                   equivalence classes of coordinate triples $(p, \alpha, \xi)$; chart index $\alpha$;
                                    $\xi \in T_{\psi_\alpha(p)}(\mathbb{R}^n)$, where $T_x(\mathbb{R}^n)$ is a copy of $\mathbb{R}^n$
1995  O'Neill [295]                 derivations of $C^\infty$ functions on the manifold
1996  Goenner [270]                 derivatives of $C^\infty$ functions by $C^\infty$ curves
1997  Frankel [12]                  coordinate tuples: maps from coordinate charts $(U, \psi)$ to $\mathbb{R}^n$
1997  Lee [24]                      (1) derivations of the ring of germs of $C^\infty$ functions
                                    (2) equivalence classes of curves
1999  Lang [23]                     equivalence classes of coordinate triples $(U, \psi, v)$ for $p \in U$
1999  Rebhan [299]                  coordinates with transformation rules
2004  Bump [57]                     derivations of $C^\infty$ functions on the manifold
2004  Szekeres [305]                derivations of $C^\infty$ functions on the manifold
2005  Penrose [297]                 derivatives of $C^\infty$ functions
2012  Sternberg [38]                set of partial derivative tuples of local graph for surface in $\mathbb{R}^n$
2015  Gómez-Ruiz [14]               derivations of the ring of germs of $C^\infty$ functions
      Kennington                    (1) equiv. classes of tagged parametrised Cartesian lines $(\psi, L)$
                                    (2) many classes of tangent vectors co-exist, different but isomorphic

Table 53.3.1 Survey of tangent vector definitions


53.3.3 REMARK: The most popular tangent vector representation is the differential operator. 

The differential operator representation of tangent vectors in part (iii) of Remark 53.3.1 is often claimed
to be chart-independent and “coordinate-free”, although clearly in order to specify which vector one is
talking about, one must give the vector components. The operator definition has the advantage that the
transformation rules follow automatically from its form. It is probably the style of representation which is
most widely regarded as the essence of tangent vectors in differential geometry texts.


One seldom-mentioned disadvantage of this representation is the “zero vector ambiguity problem”: the zero 
vectors at all points in the manifold are represented by the same operator. (See Remarks 54.12.6, 54.12.10 
and 54.15.1, and Definition 54.15.3.) 


Another problem is that the space of differentiable functions on a manifold must be defined beforehand. A
differentiable function $f$ is defined as one for which the derivatives $(\partial/\partial x^i)(f \circ \psi^{-1}(x))$ are well-defined
for charts $\psi$. This is uncomfortably close to being a circular definition. The space $C^1(M)$ is a very large
set of functions. Defining a tangent vector as a linear functional on an infinite-dimensional linear space of
functions is certainly not conceptually economical. In fact, the $n$ chart component functions $p \mapsto \psi^i(p)$ for
$i \in \mathbb{N}_n$ (for a fixed chart $\psi$ whose domain contains $p$) suffice as test functions to fully determine the operator,
but this is a reversion to coordinates again, which the operator definition is supposed to avoid. Clearly any
claim that differential operators provide the best, simplest or most natural representation of tangent vectors
is debatable. (Their main advantage is that they simplify the teaching of elementary differential geometry.)


One should regard tangent operators as being abstract differentiation procedures rather than actions on a 
particular class of test functions, because otherwise the chosen class will always be too large or small for 
some applications. 
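Two of the observations in this remark can be illustrated with a small numerical sketch: the chart component functions suffice to recover the components of an operator, and the zero operators at different base points cannot be told apart by their action on test functions. The polar chart, the finite-difference step `h` and all function names below are assumptions made only for this illustration.

```python
import math

def polar_chart(p):
    """A chart on the plane minus the non-positive x-axis: (x, y) -> (r, theta)."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def polar_chart_inv(c):
    r, t = c
    return (r * math.cos(t), r * math.sin(t))

def tangent_operator(p, v, chart, chart_inv, h=1e-6):
    """Return the map f -> sum_i v^i d/dx^i (f o chart^{-1}) at chart(p), by central differences."""
    x = chart(p)
    def op(f):
        total = 0.0
        for i, vi in enumerate(v):
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            total += vi * (f(chart_inv(tuple(xp))) - f(chart_inv(tuple(xm)))) / (2 * h)
        return total
    return op

def recover_components(op, chart, n):
    """Apply the operator to the n chart component functions p -> chart(p)[i]."""
    return tuple(op(lambda q, i=i: chart(q)[i]) for i in range(n))
```

Applying `recover_components` to an operator built from components `v` should return `v` again, which shows that only $n$ test functions are needed. Operators built with `v = (0.0, 0.0)` at two different points return zero for every test function, so the base point cannot be recovered from the action of the operator.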


53.3.4 REMARK: The intuitively appealing curve-class representation is not very useful. 

Since the curves in part (iv) of Remark 53.3.1 are embedded within the point space of the manifold, this
definition has a strong intuitive appeal which seems to be coordinate-free, but this is illusory because the
choice of curves depends heavily on the coordinate maps. The representation is quite uneconomical in
practice because a single vector is represented as an infinite number of curves. The curves themselves must
be tested to ensure that the expression $(d/dt)\psi^i(\gamma(t))$ is well-defined for all components $\psi^i$ of the coordinate
charts of the manifold. So the coordinates are not excluded from the definition. When one wishes to indicate
a particular tangent vector, in practice one must specify the components of the derivatives. So, like the
operator in part (iii), this style of definition has intuitive appeal but no practical value. (For “tangent curve
classes”, see for example Auslander/MacKenzie [1], page 9; Darling [8], page 147.)


53.3.5 REMARK: Completely coordinate-free tangent vector representations are impossible. 

It is hopeless to try to construct a truly coordinate-free structure to represent the tangent bundle for a 
general manifold because all of the available information about the differentiable structure on the manifold 
comes from the charts, which are never coordinate-free. The charts are the coordinates. So all general 
tangent bundle constructions have explicit or implicit dependence on the coordinates. In the interests of 
honesty and efficiency, it is best to choose a representation which makes use of coordinates in the most direct 
and open way possible. (See Penrose [297], pages 239-240, regarding the pros and cons of using coordinates 
versus coordinate-free notations.)


53.3.6 REMARK: The affine-parametrised line tangent vector representation is “ontologically correct”. 

All of the constructions in Remark 53.3.1 have the correct transformation rules. It is difficult to select 
one representation as morally superior to all others. The approach here will be to say that a tangent 
space is any construction which has the right properties and relations to the corresponding manifold. All 
such representations and constructions are isomorphic, equivalent and interchangeable. However, the affine- 
parametrised line option (i) is chosen here as the preferred representation because it is the best starting 
point for deriving all of the other representations, and it has the best prospects for natural generalisations, 
and the best “ontological correctness". (See Definition 54.1.2.) 


For comparison, the positive integers may be represented as Babylonian, Greek, Roman or Arabic numerals, 
in binary, octal, hexadecimal or sexagesimal. The best choice depends on context. The same is true for 
tangent vector representations. 
53.3.7 REMARK: The reason for requiring $C^1$ test functions for tangent differential operators.

One might ask why $C^1$ test functions are used instead of general differentiable functions. The reason is
to guarantee the equality of the expressions $\lim_{a \to 0} (f(x + av) - f(x))/a$ and $\sum_{i=1}^n v^i\, \partial f(x)/\partial x^i$. (This is
explained more fully in Remark 41.6.2.)
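A standard counterexample shows how the two expressions can differ when the test function is not $C^1$. The sketch below uses the function $f(x,y) = x^2 y/(x^2+y^2)$ with $f(0,0) = 0$, which is not taken from the text. This function has partial derivatives and directional derivatives everywhere, but it is not $C^1$ at the origin. The function names and step sizes are illustrative assumptions.

```python
def f(x, y):
    """Homogeneous of degree 1, continuous, with partial derivatives everywhere, but not C^1 at 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y)

def directional_quotient(func, p, v, a):
    """The difference quotient (func(p + a v) - func(p)) / a."""
    return (func(p[0] + a * v[0], p[1] + a * v[1]) - func(p[0], p[1])) / a

def partial_sum(func, p, v, h=1e-6):
    """The sum v^1 df/dx + v^2 df/dy at p, with central-difference partial derivatives."""
    dfdx = (func(p[0] + h, p[1]) - func(p[0] - h, p[1])) / (2 * h)
    dfdy = (func(p[0], p[1] + h) - func(p[0], p[1] - h)) / (2 * h)
    return v[0] * dfdx + v[1] * dfdy
```

At the origin with $v = (1,1)$, the difference quotient equals $1/2$ for every $a \ne 0$, because $f$ is homogeneous of degree 1, whereas both partial derivatives vanish there, so the coordinate sum is zero. At a point away from the origin, where $f$ is smooth, the two expressions agree up to discretisation error.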


53.3.8 REMARK: Conditions for well-definition of unidirectional tangent vectors. 

It is reasonable to ask whether the existence of unidirectional derivatives everywhere on a manifold would be
sufficient to define meaningful tangent vectors like $\lim_{a \to 0^+} (f(x + av) - f(x))/a$ which are unidirectional. The
examples in Section 50.6 give some hints on this subject. Example 50.6.8 shows that $C^{0,1}$ regularity is not
sufficient to guarantee existence everywhere of directional derivatives. However, if the chart transition maps
have well-defined unidirectional derivatives, it should be possible to define unidirectional tangent vectors.
This could be useful for some applications. (Such generalisations are discussed in Section 54.16.)


53.3.9 REMARK: Differential operator and curve representations are pruned-down coordinate charts.

In a sense, the tangent vector representations (iii) and (iv) are inverses or duals of each other. Just as a test
function $f : M \to \mathbb{R}$ is one coordinate of a coordinate chart $\psi : M \to \mathbb{R}^n$, each curve $\gamma : \mathbb{R} \to M$ is one
inverse coordinate of an inverse coordinate chart $\psi^{-1} : \mathbb{R}^n \to M$. Both the differential operator and curve
representations of tangent vectors are pruned-down versions of coordinate charts. Therefore one may as well
go the whole hog and use full coordinate charts. The moral of this story is that there is no such thing as a
coordinate-free tangent vector. Test functions and curves are thinly disguised coordinate charts.


53.3.10 REMARK: Coordinate representations of tangent vectors are unsatisfying. 

Sometimes vectors do not transform as they should. Some quantities in physics do not vary under all
transformations in GL(n) according to the standard matrix rules. In such cases, a different invariance group
may be required. So, all things considered, it may be best to define a vector as an equivalence class of
triples $(p, v, \psi)$, where $p$ is a point in a space, $v$ is a set of coordinates for the vector, $\psi$ is an element of
the permitted set of charts for the space, and a set of transformation rules is supplied for determining the
equivalence class.


The point/coordinates/chart form of vector definition is unsatisfying. Since the coordinates must be the
coordinates of something, it leaves open the question of what that something is. But then, one can equally
observe that the Cartesian coordinates for a point in space are just numbers which depend upon the
coordinate frame in a specified way. The coordinates are certainly not points. Nor is any equivalence class
of coordinate/chart pairs $(x, \psi)$ a point. A point is really something outside the scope of pure set theory.
But it is not really an empirical construct either. A point is a psychological construct within the minds of
mathematicians. This construct is useful for modelling the real world, and it may be given coordinates. But
the point itself is undefined, just as in the case of classical Euclidean geometry.


53.3.11 REMARK: If coordinates are good enough for points, they are good enough for vectors. 

A point is sometimes “defined” as something which has position but no extent. But this is not very
illuminating. Since points cannot ultimately be defined within mathematics, it is no surprise that vectors are not
defined either. So if it is good enough to define a point as somehow underlying the sets of coordinates that
describe it, then surely this must be good enough for vectors. One may as well define them as equivalence
classes of triples $(p, v, \psi)$ for $p, v \in \mathbb{R}^n$ just as points are really equivalence classes of pairs $(x, \psi)$ for $x \in \mathbb{R}^n$.
If coordinates are good enough for points, surely they are good enough for vectors too. It follows that for
consistency one should define vectors by coordinates rather than by differential operators on function spaces.


53.3.12 REMARK: The classical Greek definition of tangents. 
If one ignores the Cartesian coordinate version of geometry, one returns to the ruler-and-compass view of
geometry in Euclid's “Elements”. (See Euclid/Heath [213], [214], [215].) In Euclidean geometry, a tangent
is a line which touches a circle. The formal definition of a tangent (line) to a circle is given in Euclid's
“Elements”, Book III, definition 2. This is immediately followed by definition 3 for touching circles. (See
Euclid/Heath [214], pages 1-3.)

    2. A straight line is said to touch a circle which, meeting the circle and being produced, does not
       cut the circle.

    3. Circles are said to touch one another which, meeting one another, do not cut one another.

This immediately raises the question of what kind of object is tangent to a circle. Even Euclid defines two
kinds of such tangent objects, namely lines and circles. In the case of lines and circles, not cutting is an
ample criterion for tangentiality. For more general curves, tangentiality is more subtle. In the most literal
sense, the X-axis in Cartesian coordinates is not tangential to the graph $\{(x, y);\ y = x^3\}$ because it cuts the
graph at the origin. This disagrees with the modern definition. On the other hand, the vertex of a square which
touches a circle would be tangential in the sense of not cutting, but not according to the modern definition.
Thus the naive concepts of “touching” and “not cutting” do not provide a reliable basis for defining tangents
in differential geometry.


These examples suggest that the modern definition is more concerned with the direction of the two curves 
at the point of contact than whether the curves cross one another. This raises the question, then, of how 
“direction” may be defined for a general class of curves. The direction of a straight line may be specified by 
the line itself. But a cyclic pair of definitions results from specifying direction by a curve. Then for a given 
curve, the direction is specified by the same or a different curve. It is tempting to specify the direction by an 
equivalence class of curves, but this does not answer the question as to which curves are equivalent to which. 
One may note that if two curves are tangential at a point, then any rotation of one of the curves relative to 
the other will negate the tangential property. However, this does not uniquely define the property. 


53.3.13 REMARK: Tangent bundles for Banach spaces. Tangent bundles are imperfect fibre bundles. 

In any real Banach space $W$, including the finite-dimensional Euclidean normed spaces $\mathbb{R}^n$, differentiating a
$C^1$ curve $\gamma : \mathbb{R} \to W$ yields an element of $W$. In the differential quotient $(\gamma(t + h) - \gamma(t))/h$ for $h \in \mathbb{R} \setminus \{0\}$,
the difference $\gamma(t + h) - \gamma(t)$ is clearly an element of $W$, and $h^{-1}$ is an element of $\mathbb{R}$. So for a fixed $t \in \mathbb{R}$, the
map $Q : h \mapsto (\gamma(t + h) - \gamma(t))/h$ is from $\mathbb{R} \setminus \{0\}$ to $W$. The images $Q(B_\delta)$ of balls $B_\delta$ by the map $Q$ for $\delta \in \mathbb{R}^+$
are clearly non-increasing as $\delta$ decreases. If the diameter $\mathrm{diam}_W(Q(B_\delta))$ of these images converges to zero, then it follows
from the completeness of $W$ that there exists a unique $w \in W$ such that $\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ Q(B_\delta) \subseteq B_{w,\varepsilon}$.
In other words, $w = \lim_{h \to 0} (\gamma(t + h) - \gamma(t))/h$, which means that $\gamma'(t) = w$. Since the tangent space of a
manifold is the same as the set of all derivatives of $C^1$ curves in the manifold, it follows that $T(W) = W$.
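The shrinking of the image sets of the quotient map can be observed numerically in a finite-dimensional case. The sketch below assumes $W = \mathbb{R}^2$ with the Euclidean norm and the curve $\gamma(t) = (\cos t, \sin t)$, neither of which is taken from the text. The function `sampled_diameter` evaluates the quotient only on a finite sample of non-zero values of $h$, so it gives a lower estimate of the true diameter.

```python
import math

def gamma(t):
    """A smooth curve in W = R^2 (chosen only for illustration)."""
    return (math.cos(t), math.sin(t))

def Q(t, h):
    """Differential quotient (gamma(t + h) - gamma(t)) / h for non-zero h."""
    g1, g0 = gamma(t + h), gamma(t)
    return ((g1[0] - g0[0]) / h, (g1[1] - g0[1]) / h)

def sampled_diameter(t, delta, samples=50):
    """Largest distance between quotient values sampled with 0 < |h| <= delta."""
    hs = [delta * k / samples for k in range(-samples, samples + 1) if k != 0]
    pts = [Q(t, h) for h in hs]
    return max(math.dist(a, b) for a in pts for b in pts)
```

For a fixed `t`, the values `sampled_diameter(t, 1.0)`, `sampled_diameter(t, 0.1)` and `sampled_diameter(t, 0.01)` should decrease towards zero, and `Q(t, h)` for small `h` should approach the velocity $(-\sin t, \cos t)$, which is itself an element of $W$.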


The observation that $T(W) = W$ may seem obvious, but it is easy to find arguments to raise against it. In
the fibre bundle way of thinking, every tangent vector $V \in T(M)$, for a $C^1$ manifold $M$, is attached to a
base point $p \in M$. It is assumed that the fibre sets $T_p(M)$ are disjoint for distinct points $p \in M$. But if one
claims that $T_p(W) = T(W) = W$ for all $p \in W$, one of the most fundamental assumptions of fibre bundles
is broken.


The philosophical problem here is the distinction between abstract and concrete fibre bundles. This problem
arises surprisingly often, although it is almost always ignored. One elementary example is the tangent
operator in Definition 54.11.2. As mentioned in Remarks 53.3.3 and 54.15.1, the zero-valued tangent operators
at all points of a manifold are the same concrete object. So all of the sets $T_p(M)$ contain the zero tangent
operator, which contradicts the disjointness assumption for distinct fibre sets. It is also found in Section 59.6
that certain subsets of second-level tangent spaces overlap when one considers the concrete objects which
inhabit them, a fact which is rarely (if ever) mentioned in the literature.


In the case of tangent bundles of Banach spaces $W$, it is mostly accepted without question that $T_p(W) =
T(W) = W$ for all $p \in W$, because everyone knows that this is correct for the concrete objects which inhabit
the tangent bundle. One particular case where this assumption is made is the identification of the set of
all second-level tangent vectors with zero horizontal component with the tangent space of the corresponding
base point, as mentioned in Remark 59.2.8.


It may be concluded that the fibre bundle formalism is only imperfectly applicable to tangent bundles. The 
abstract assumptions are not valid when expressed in terms of the concrete constructions in the tangent 
bundle framework. 


The fibre bundle way of thinking puts every object into a distinct tidy space so that each object can be 
dealt with correctly, but a consequence of over-classification is that concretely sensible operations are not 
permitted. This is reminiscent of the Whitehead/Russell [400] way of classifying all sets into neat classes to 
prevent Russell's paradox, for example, but this stratification approach was later abandoned in favour of the 
Zermelo style of axiomatisation. It is difficult to manage objects in a way which forbids all bad operations 
and permits all good operations. One must be aware that the various formalisms for tangent bundles are all 
imperfect in one way or another. 


When $W$ is a Banach space and the tangent space $T(W)$ is not considered to be identical to the underlying
linear space $W$, one may refer to the identification map from $T(W)$ to $W$ as a “drop function”. (See
Definition 54.9.5 for drop functions on finite-dimensional linear spaces. Drop functions for linear spaces are
also discussed in Remarks 57.9.5 and 59.2.8.)


53.3.14 REMARK: Importation of tangent vectors from Cartesian chart tangent bundles. 
Figure 53.3.2 illustrates the “importation method” for defining tangent vectors on manifolds which is listed 
as option (vii) in Remark 53.3.1. 


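[Diagram: the chart $\psi : M \to \mathbb{R}^n$ on the base space is lifted to the chart $\Psi(\psi) : T(M) \to T(\mathbb{R}^n)$ on the
total space, with $(x, v) = \Psi(\psi)(V)$.]

Figure 53.3.2 Exploitation of flat-space tangent bundle to define tangent bundles on manifolds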


The idea is to use a flat-space tangent bundle $T(\mathbb{R}^n)$ to define the tangent bundle $T(M)$ on a manifold $M$.
Then the charts $\Psi(\psi)$ for the tangent bundle's total space $T(M)$ are required to have the same transformation
rules as the Cartesian chart-transition diffeomorphisms $\phi = \psi_2 \circ \psi_1^{-1}$ for charts $\psi_1, \psi_2 \in \mathrm{atlas}_p(M) \subseteq A_M$.
(See Section 42.7 for Cartesian space diffeomorphisms.)


53.3.15 REMARK: The often-confused terms “component”, “coordinate” and “coefficient”. 

The words “component”, “coordinate” and “coefficient” are often used interchangeably, and their meanings
are closely related, but they have different meanings.

(1) A component is an element of a list or array of numbers, for example the numbers $x^i$ in a vector
$(x^1, \ldots, x^n)$ or the numbers $a_{ij}$ in a matrix $[a_{ij}]_{i,j=1}^n$.

(2) A coordinate is a component of a map from a point set to a set of tuples, for example the functions (or
function values) $r$ and $\theta$ in polar coordinates $(r, \theta)$ for the plane.

(3) A coefficient is typically a constant multiplier of a term in an expression, for example the numbers $a$, $b$,
and $c$ in the expression $ax^2 + bx + c$, or the numbers $v^i$ in the expression $\sum_{i=1}^n v^i e_i$.


The numbers $v^i$ in Definition 54.11.2 for tangent operators may be described in all three ways, but with
slightly different interpretations. They are components of the $n$-tuple $v \in \mathbb{R}^n$. They are coordinates of
operators $\partial_{p,v,\psi}$ in the natural atlas on the total tangent operator space $T(M)$. And they are coefficients
of the first-order derivatives. These three words each suggest different relations of the numbers to their
mathematical objects rather than attributes of the numbers themselves.


In Theorem 54.1.11, the chart transition rule for tangent vectors, the velocity-component-tuple elements $v^i$
are coefficients because they are multipliers for terms in an expression. They are also components of the
$n$-tuple $v$. They may also be thought of as coordinates of tangent vectors with respect to an atlas for the
total space of a tangent bundle. The best choice of terminology depends on the role which the numbers play
in each particular context, and also the idea which one wishes to communicate.


For both tangent vectors and tangent operators, it is often preferable to refer to the numbers $v^i$ as coordinates.
In the case of tangent operators (Definition 54.11.2), the term “coefficients” may also be used. In the case
of the velocity of a tangent-line vector, the term “components” may be used for the coordinates.


The meanings may be briefly summarised as follows. 


(1) A component is an element of a tuple. 
(2) A coordinate is one component of a map from a point set to a tuple set. 


(3) A coefficient is a multiplicative factor of a term in an expression. 
53.4. Tangent bundle construction methods


53.4.1 REMARK: Summary of objects and spaces defined for differentiable manifolds. 

Table 53.4.1 is a summary of the objects and spaces which are defined for differentiable manifolds in this book.
These spaces require the differentiable manifold structure (i.e. a differentiable locally Cartesian atlas), but do 
not make use of any higher structures such as an affine connection or metric tensor field. For completeness, 
some related definitions for topological manifolds in Chapter 50 are also included. The reference numbers 
here are subsections of this book which give definitions or notations for the objects or spaces. 


53.4.2 REMARK: Notation for classes of manifolds. 

The non-standard Notations 53.4.3 and 53.4.4 are intended to assist the description of various tangent
bundle construction methods. Any tangent bundle with an $n$-dimensional base space has a $2n$-dimensional
total space, which is locally the product of the $n$-dimensional base space and $n$-dimensional fibre space.
Notation 53.4.4 indicates the dimensions of both the base space and fibre space. A tangent bundle is then
in the class $F(n,n,k)$ for some $n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$.


53.4.3 NOTATION: $M(n,k)$, for $n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, denotes the class of $n$-dimensional $C^k$ manifolds.


53.4.4 NOTATION: $F(n,m,k)$, for $n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, denotes the class of $C^k$ differentiable fibre bundles
with a base space which is an $n$-dimensional manifold and a fibre space which is an $m$-dimensional manifold.


53.4.5 REMARK: Construction methods for tangent bundles. 

There is a bewildering array of procedures for building spaces from a differentiable manifold. These space- 
construction procedures are presented in Chapters 54, 55, 56, 59 and 60. The following is a summary of 
these procedures. 


(1) Tangent-line bundle. INPUT: $M(n,k)$. OUTPUT: $F(n,n,k-1)$, with total space in $M(2n,k-1)$.
For any $C^1$ manifold $M$, construct the tangent-line space $T_p(M)$ for all $p \in M$ and the tangent-line
bundle $T(M)$.

(2) Tangent velocity bundle. INPUT: $M(n,k)$. OUTPUT: $F(n,n,k-1)$, with total space in $M(2n,k-1)$.
For any $C^1$ manifold $M$, construct the tangent velocity space $T_p(M)$ for all $p \in M$ and the tangent
velocity bundle $T(M)$.

(3) Tangent operator pseudo-bundle. INPUT: $M(n,k)$. OUTPUT: [$F(n,n,k-1)$, with total space in $M(2n,k-1)$.]
For any $C^1$ manifold $M$, construct the tangent operator space $T_p(M)$ for all $p \in M$ and the pseudo-bundle
of tangent operators $T(M)$. (The set $T(M)$ can be a true vector bundle if “tagged”.)

(4) Tangent covector space. INPUT: $M(n,k)$. OUTPUT: $F(n,n,k-1)$, with total space in $M(2n,k-1)$.
For any $C^1$ manifold $M$, construct the pointwise tangent covector space $T_p^*(M)$ for all $p \in M$ and the
total tangent covector space $T^*(M)$.

(5) Tangent vector-tuple space. INPUT: $M(n,k)$. OUTPUT: $F(n,nr,k-1)$, with total space in $M(n(r+1),k-1)$.
For any $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, construct the pointwise tangent $r$-tuple space $T_p^r(M)$ for all $p \in M$,
and the total tangent $r$-tuple space $T^r(M)$.

(6) Tensor space. INPUT: $M(n,k)$. OUTPUT: $F(n,n^{r+s},k-1)$, with total space in $M(n+n^{r+s},k-1)$.
For any $C^1$ manifold $M$, construct the pointwise tensor space $T_p^{r,s}(M)$ for all $p \in M$, and the total
tensor space $T^{r,s}(M)$, where $r, s \in \mathbb{Z}_0^+$.

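(7) Higher-order tangent vectors. INPUT: $M(n,k)$.
OUTPUT: $F\bigl(n, \sum_{j=1}^{\ell} \binom{n+j-1}{j}, k-\ell\bigr) \subseteq M\bigl(n + \sum_{j=1}^{\ell} \binom{n+j-1}{j}, k-\ell\bigr)$.
For any $C^k$ manifold $M$, construct the pointwise space of $\ell$th-order tangent vectors
$T^{[\ell]}_p(M)$ at $p \in M$, and the total space of $\ell$th-order tangent vectors $T^{[\ell]}(M)$.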


The above construction methods may be applied recursively. The output from each method may be an
input to other methods. Also, each $C^k$ vector bundle is also a $C^k$ manifold. So for example,
$T(T(M))$ results from applying method (1) twice to a $C^k$ manifold to yield a $C^{k-2}$ vector bundle,
because the $C^{k-1}$ vector bundle from the first step is also a $C^{k-1}$ manifold which can be input
to the second step.
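
The bookkeeping for recursive application of method (1) can be sketched in a few lines of Python. This is only an illustration of the arithmetic $(n,k) \mapsto (2n,k-1)$. The function names are hypothetical, and the differentiability order $k$ is assumed to be a finite integer.

```python
def tangent_bundle_class(n, k):
    """Method (1): a C^k n-manifold yields a C^(k-1) bundle whose total space has dimension 2n."""
    if k < 1:
        raise ValueError("the tangent bundle construction needs k >= 1")
    return 2 * n, k - 1


def iterate_tangent_bundle(n, k, times):
    """Apply method (1) repeatedly, recording (dimension, differentiability) at each stage."""
    stages = [(n, k)]
    for _ in range(times):
        n, k = tangent_bundle_class(n, k)
        stages.append((n, k))
    return stages


# T(T(M)) for a hypothetical 3-dimensional C^5 manifold M.
stages = iterate_tangent_bundle(3, 5, 2)
print(stages)  # [(3, 5), (6, 4), (12, 3)]
```

Each application doubles the dimension and lowers the differentiability order by one, which is why the recursion can be repeated at most $k$ times for finite $k$.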


53.4.6 REMARK: Construction methods for spaces of fields and maps. 
Another space-construction concept for objects based on tangent bundles is spaces of cross-sections of the 
reference object or space notation parameters
49.10.3 set of continuous real-valued functions C(M,R) 
49.10.6 set of continuous real-valued local functions C(M, R) 
51.5.15  C* differentiable manifold (M, Am) k € Zj , atlas Ay 
51.6.3 linear space of C" real-valued functions C"(M,R) r € Zj 
51.6.7 set of C" real-valued local functions C"(M,R) reZ 
51.7.3 linear space of C^ m-tuple-valued functions C'(M,R")  reZj,meZj 
54.1.2 tangent vector tooa pcM,vcm^",vecAy 
54.4.4 tangent vector space T,(M) peM 
54.4.10 tangent space chart-basis vector eb pe M, v € atlas (M), i € Nn 
54.1.4 tangent total space T(M) 
54.10.4 tangent velocity vector MNT pe€M,vemR^,ve€Ay 
54.10.7 tangent velocity vector space T,(M) peM 
54.10.10 tangent velocity vector bundle total space — T(M) 
54.11.2 tangent differential operator Op v vb pE M, v ER”, yE Am 
54.11.10 tangent differential operator space T,(M) peM 
54.12.1 tangent differential operator total space T(M) 
54.13.4 tangent operator space chart-basis vector pe pe M, v € atlas, (M), i € Nn 
54.15.3 tagged tangent differential operator (p, Op,v,w) pEM,veR",~eAnu 
54.15.5 tagged tangent differential operator space — T;(M) peM 
55.2.1 | tangent covector space T5 (M) peM 
55.2.0 tangent covector ood pcM,wcR", v c atlas,(M) 
55.3.3 tangent space chart-basis covector ei y pE M, v € atlasp(M), i € Nn 
55.4.11 tangent covector total space T*(M) 
56.1.5 tangent multilinear-map tensor space TYS(M) pEM,r,s € Z9 
56.1.6 tangent linear-map tensor space T3? (M) pEM,r,s € Z 
56.1.11 tangent multilinear-map tensor ba " a: NTS > R, pe Am, r,s € Zf 
56.1.12 tangent linear-map tensor Eri a:NL > R, Y € Am, r,s € Zp 
56.3.2 tangent multilinear-map tensor total space T"*(M) r,s € Zo 
56.3.3 tangent linear-map tensor total space T™S(M) r,s € Zo 
56.4.9 set of general r-linear functions L,(T(M)) rez 
56.5.14 set of antisymmetric r-linear functions L-(T(M)) reZ; 
56.6.3 set of symmetric r-linear functions LH(T(M)) rez 
57.1.5 space of C* tangent vector cross-sections X*(T(M)) k € Zo 
57.3.3 space of C^ tangent operator cross-sections X*(T(M)) k € Z 
57.5.6 space of C* tensor field cross-sections X*(T^*(M) ke Z,r,se Ze 
57.6.14 space of C* differential m-forms X*¥(An(M)) ke Ze 
59.1.19 second-level tangent space T.(T(M)) z ET(M) 
59.1.22 second-level tangent bundle T(T(M)) 
59.1.26 higher-level tangent bundle T (M) ke Zt 
60.2.2 second-order tangent operator 0 sui p € M, a E€ Sym(n, R), be R” 
60.2.4 second-order tangent operator space TH (M) peM 
60.2.4 set of second-order tangent operators TUM) 
60.5.6 | second-order tangent vector NM p € M, a € Sym(n, R), be R” 
60.5.8 second-order tangent vector space TP (M) pceM 
60.5.8 set of second-order tangent vectors TUM) 
Table 53.4.1 Summary of differentiable manifold objects and spaces 
bundles defined in Remark 53.4.5. Method (8) constructs spaces of cross-sections of fibre bundles from
differentiable manifolds. For example, $X^k(T^*(M))$ results from applying method (4) followed by method (8).


(8) Vector fields. INPUT: $C^k$ vector bundle. OUTPUT: Vector field algebra.
For any total space T on a C^ manifold M, construct the space of vector fields X*(T) for k € Zt. 
This is the space of $C^k$ cross-sections of $T$.


The following construction methods yield induced-map bundles. 


(9) Map forms. INPUT: Two $C^k$ manifolds. OUTPUT: $C^{k-1}$ vector bundle.
For any two $C^1$ manifolds $M_1$ and $M_2$, construct the pointwise space of $T(M_2)$-valued forms
$T^*_p(M_1, M_2)$ at $p \in M_1$, and the total space of $T(M_2)$-valued forms $T^*(M_1, M_2)$ on $M_1$.

(10) Map vectors. INPUT: Two $C^k$ manifolds. OUTPUT: $C^{k-1}$ vector bundle.
For any two $C^1$ manifolds $M_1$ and $M_2$, construct the pointwise space of $T^*(M_2)$-valued tangent
vectors $T_p(M_1, M_2)$ at $p \in M_1$, and the total space of $T^*(M_2)$-valued tangent vectors
$T(M_1, M_2)$ on $M_1$. (Note that $T(M_1, M_2) = T^*(M_2, M_1)$, roughly speaking.)

(11) Higher-order map vectors. INPUT: Two $C^k$ manifolds. OUTPUT: $C^{k-\ell}$ vector bundle.
For any $C^k$ manifolds $M_1$ and $M_2$, construct the pointwise space of $\ell$th-order
$T^*(M_2)$-valued tangent vectors $T^{[\ell]}_p(M_1, M_2)$ at $p \in M_1$, and the total space of
$\ell$th-order $T^*(M_2)$-valued tangent vectors $T^{[\ell]}(M_1, M_2)$.
54.0.1 REMARK: Relations between tangent bundles and fibre bundles. Avoiding definition-recursion.


Some relations between tangent bundles and fibre bundles are sketched in Figure 54.0.1. 


[Diagram omitted. Nodes: non-topological fibre bundle, topological fibre bundle, differentiable
fibre bundle, topological manifold, $C^1$ manifold, $C^2$ manifold, tangent bundle on $C^1$ manifold,
tangent bundle on $C^2$ manifold.]

Figure 54.0.1 Relations between fibre bundles and tangent bundles


The tangent bundle of a differentiable manifold satisfies the requirements for a topological fibre bundle.
After differentiable fibre bundles have been defined in Chapter 64, it can be asserted that the tangent bundle
of a differentiable manifold is a differentiable vector bundle. (See Theorem 65.9.5.)

A differentiable fibre bundle according to Definition 64.8.3 requires a differentiable structure group $G_1$,
which has a tangent bundle $T_1$. So if a tangent bundle is a differentiable fibre bundle, then the tangent
bundle $T_1$ will have its own differentiable structure group $G_2$, which will have its own tangent
bundle $T_2$, and so forth. This recursion between the two definitions is easily avoided by not including
the structure group of a fibre bundle as part of the specification tuple. (Additional reasons for not
including the structure group in fibre bundle specification tuples are discussed in Remark 21.8.16.)


The structure definition recursion issue for tangent bundles and differentiable fibre bundles may be compared
with a related scenario for linear spaces. For every linear space $V_0$, there is an automorphism space
$\mathrm{Aut}(V_0)$ consisting of all linear isomorphisms from $V_0$ to $V_0$. The space $\mathrm{Aut}(V_0)$
has a natural linear space structure. So one may define the linear space $V_1 = \mathrm{Aut}(V_0)$. Clearly
one may construct $V_{k+1} = \mathrm{Aut}(V_k)$ recursively. However this is not a problem. The automorphism
space is not included as part of the definition of the underlying linear space. The automorphism space
$\mathrm{Aut}(V_k)$ is in a sense a “property” of the linear space $V_k$.


In the same way, the structure group for a fibre bundle is a property of the fibre bundle structure which
may be discovered, for example, by testing various groups to find one which fits. The avoidance of
runaway structure recursion is one of the best reasons for specifying only minimal information in
structure specification tuples in general.


54.0.2 REMARK: Summary of tangent space concepts. 

Table 54.0.2 summarises some tangent space concepts which are presented in Chapter 54. The abbreviations
in the table are based on Definition 51.5.15 for a differentiable manifold $M$ with a $C^1$ atlas
$A_M = \mathrm{atlas}(M)$. (See also Table 55.0.1 for tangent covector, tensor bundle and higher-order
tangent spaces.)


reference concept symbol comments 

54.1.2 tangent vector Lv ib equiv. class [(V, Lyp),v)], p € M, ve R”, v € Au 

54.4.4 tangent vector space T,(M) {tp wyp; v € R^, v e atlas (M), pe M 

54.1.4 tangent total space T(M) {tp wv; p E M, v € R”, v € atlas,(M)} 

54.5.16 tangent bundle (T(M), Arm) T(M) = Use Tp(M) with atlas Arom) 

54.10.4 tangent velocity vector € equiv. class [(V, ((p),v)),pe M, v € R”, Y € Ay 

54.10.7 tangent velocity space T (M) {čp vy; v € R”, v € atlasp(M)}, pe M 

54.10.10 tangent velocity total space T(M) {čp vy; p E M, v € R^, y € atlas; (M)) 

54.11.2 tangent operator Op, vv pv : f. vid? f for f € C'(M,R) 

54.11.10 tangent operator space T,(M) {3p vy; p € M, v € R^, y € atlas,(M)} 

54.12.1 tangent operator total space T'(M) {3p vy; p E M, v € R^, v € atlas (M)) 

54.15.3 tagged tangent operator (p, Ou) tangent operator pv, with tag p € M 

54.15.5 tagged tang. operator sp. T,(M) {(p, Opus); p € M, v € R^, v € atlas; (M)) 
Table 54.0.2 Summary of tangent space concepts 


54.1. Tangent vectors 


54.1.1 REMARK: The primary representation of tangent vectors here is "tangent-line vectors". 
Tangent-line vectors are the chosen representation for tangent vectors in this book. Unless otherwise stated, 
all tangent vectors on differentiable manifolds are tangent line vectors. These are so called because they are 
imported from linearly parametrised lines in the Cartesian spaces which are used for manifold charts. The 
justification of this choice of representation is discussed in particular in Remark 53.1.9. 

To avoid lengthy discussions of equivalence classes of chart/line pairs and their properties, Definition 54.1.2 
is presented directly in terms of lines and chart transition maps, without mentioning equivalence classes. 
However, tangent vectors are in fact equivalence classes by this definition, as shown in Theorem 54.1.8 (v). 
The Cartesian space tangent space $T(\mathbb{R}^n)$ in Notation 26.14.3 is presented here in a simple closed
form. It represents the set of linearly parametrised lines in the Cartesian space $\mathbb{R}^n$.


54.1.2 DEFINITION: Representation of tangent vectors as sets of chart/line pairs.
The tangent vector at a point $p$ in a $C^1$ manifold $M$ with $n = \dim(M)$, with velocity component tuple
$v \in \mathbb{R}^n$ for a chart $\psi \in \mathrm{atlas}_p(M)$, is the set

\[
t_{p,v,\psi} = \bigl\{ (\psi_0, L_0) \in \mathrm{atlas}_p(M) \times T(\mathbb{R}^n);\
L_0(0) = \psi_0(p) \text{ and } \partial_t \psi(\psi_0^{-1}(L_0(t)))\big|_{t=0} = v \bigr\},
\]

where $T(\mathbb{R}^n) = \{ L : \mathbb{R} \to \mathbb{R}^n;\ \exists v \in \mathbb{R}^n,\ \forall t \in \mathbb{R},\ L(t) = L(0) + tv \}$.
A tangent vector on a $C^1$ manifold $M$ is a tangent vector $t_{p,v,\psi}$ for some $p \in M$,
$v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$, where $n = \dim(M)$.
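
The membership condition in Definition 54.1.2 can be checked numerically in a simple case. The following Python sketch is only an illustration under stated assumptions: the manifold is taken to be the open right half-plane in $\mathbb{R}^2$, the atlas consists of a Cartesian chart and a polar chart, derivatives are approximated by central differences, and all function names are hypothetical.

```python
import math


def psi_id(point):
    """Cartesian (identity) chart on the open right half-plane x > 0."""
    return point


def psi_id_inv(coords):
    return coords


def psi_pol(point):
    """Polar chart (r, theta) on the open right half-plane x > 0."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))


def psi_pol_inv(coords):
    r, theta = coords
    return (r * math.cos(theta), r * math.sin(theta))


def line(x, v):
    """The linearly parametrised line L_{x,v} : t -> x + t v."""
    return lambda t: tuple(xi + t * vi for xi, vi in zip(x, v))


def in_tangent_vector(p, v, psi, psi0, psi0_inv, L0, h=1e-6, tol=1e-6):
    """Approximate test of whether (psi0, L0) belongs to t_{p,v,psi}."""
    # Condition 1: L0(0) = psi0(p).
    base_ok = all(abs(a - b) < tol for a, b in zip(L0(0.0), psi0(p)))
    # Condition 2: d/dt psi(psi0^{-1}(L0(t))) at t = 0 equals v (central difference).
    curve = lambda t: psi(psi0_inv(L0(t)))
    deriv = tuple((a - b) / (2 * h) for a, b in zip(curve(h), curve(-h)))
    vel_ok = all(abs(d - vi) < tol for d, vi in zip(deriv, v))
    return base_ok and vel_ok


p = (3.0, 4.0)
v = (1.0, 2.0)  # velocity components with respect to the Cartesian chart
same_chart = line(psi_id(p), v)
good = line(psi_pol(p), (2.2, 0.08))  # polar components of the same velocity
bad = line(psi_pol(p), (1.0, 2.0))    # right base point, wrong velocity

print(in_tangent_vector(p, v, psi_id, psi_id, psi_id_inv, same_chart))  # True
print(in_tangent_vector(p, v, psi_id, psi_pol, psi_pol_inv, good))      # True
print(in_tangent_vector(p, v, psi_id, psi_pol, psi_pol_inv, bad))       # False
```

The polar components $(2.2, 0.08)$ are obtained by applying the Jacobian matrix of the map from Cartesian to polar coordinates at $p = (3,4)$ to the tuple $(1,2)$.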


54.1.3 REMARK:  Notations for sets of tangent vectors. 

It is clear that the tangent vectors in Definition 54.1.2 are well-defined sets of chart/line pairs. So the
sets $T_p(M)$ and $T(M)$ in Notation 54.1.4 are well-defined sets. Then it must be shown that these have
the expected properties.


54.1.4 NOTATION: Sets of tangent vectors on differentiable manifolds. 
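$T_p(M)$ denotes the set of tangent vectors at a point $p$ in a $C^1$ manifold $M$. Thus with $n = \dim(M)$,

\[
T_p(M) = \{ t_{p,v,\psi};\ v \in \mathbb{R}^n \text{ and } \psi \in \mathrm{atlas}_p(M) \}.
\]

$T(M)$ denotes the set of tangent vectors on a $C^1$ manifold $M$. In other words,

\[
T(M) = \bigcup_{p \in M} T_p(M)
= \{ t_{p,v,\psi};\ p \in M,\ v \in \mathbb{R}^n \text{ and } \psi \in \mathrm{atlas}_p(M) \}.
\]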


54.1.5 REMARK: Distinction between “components” and “coordinates”. 

In Definition 54.1.2 and Notation 54.1.4, $\psi(p)$ is the $n$-tuple of coordinates of the point $p$ via the
map $\psi$, whereas $v$ is the $n$-tuple of components of the velocity of the vector $t_{p,v,\psi}$.
Components are elements of tuples in general whereas coordinates are components of maps from point spaces to
tuple spaces. The “point space” of the coordinate map $\psi$ is $M$. (See Remark 53.3.15 for further comments
on this terminology issue.)


54.1.6 REMARK:  Tangent vectors are equivalence classes of chart/line pairs. 

Each tangent vector $t_{p,v,\psi}$ in Definition 54.1.2 contains the chart/line pair $(\psi, L_{\psi(p),v})$,
where $L_{x,v} : \mathbb{R} \to \mathbb{R}^n$ is defined for $x, v \in \mathbb{R}^n$ by
$L_{x,v} : t \mapsto x + tv$ as in Notation 26.13.4. But each $t_{p,v,\psi}$ also contains all of the
chart/line pairs $(\psi_0, L_0)$ which are equivalent to $(\psi, L_{\psi(p),v})$ in the sense that the curve
$\psi_0^{-1} \circ L_0$ has the same velocity $v$ at $p$. (This equivalence relation is presented in
Theorem 54.1.8 and illustrated in Figure 54.1.1.)


Figure 54.1.1 Tangent-line vector equivalence relation


Each tangent vector $t_{p,v,\psi}$ could be represented by the single chart/line pair
$(\psi, L_{\psi(p),v})$, which contains all of the necessary information to specify the vector. (Or it could
even be represented by the tuple $(p, v, \psi)$.) Although this has the advantage of being “smaller” than an
equivalence class of chart/line pairs, it has the disadvantage that equality of vectors
$t_{p,v_1,\psi_1}$ and $t_{p,v_2,\psi_2}$ must then be replaced by an equivalence relation.
By incorporating the equivalence class into the definition of a tangent vector, the equality of vectors is the
same as ZF set equality. Thus one may write “$t_{p,v_1,\psi_1} = t_{p,v_2,\psi_2}$” instead of
“$t_{p,v_1,\psi_1} \equiv t_{p,v_2,\psi_2}$”. Apart from this practical advantage, there is the further
advantage of “ontological clarity”. In other words, the real-world meaning of a tangent vector can be easily
guessed from its structure.
54.1.7 REMARK: Construction of tangent vectors from equivalence classes of chart/line pairs.

Theorem 54.1.8 shows how the tangent vector representation in Definition 54.1.2 can be constructed by first 
defining the global set of chart/line pairs, then defining an equivalence relation on these pairs, and then 
finally associating the equivalence classes with individual tangent vectors. 


54.1.8 THEOREM: Some basic properties of chart/line pair equivalence classes and tangent vectors. 
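Let $M$ be a $C^1$ manifold. Let $X = \{ (\psi, L) \in \mathrm{atlas}(M) \times T(\mathbb{R}^n);\ L(0) \in \mathrm{Range}(\psi) \}$,
where $n = \dim(M)$. Define the relation “$\equiv$” on $X$ such that $(\psi_1, L_1) \equiv (\psi_2, L_2)$ if and only if

\[
\psi_2^{-1}(L_2(0)) = \psi_1^{-1}(L_1(0)) \tag{54.1.1}
\]
and
\[
\partial_t L_2(t)\big|_{t=0} = \partial_t \psi_2(\psi_1^{-1}(L_1(t)))\big|_{t=0}. \tag{54.1.2}
\]

For all $p \in M$, let $X_p = \{ (\psi, L) \in X;\ L(0) = \psi(p) \}$. Then the following propositions hold.
(i) Both sides of equation (54.1.2) are well defined for all $(\psi_1, L_1), (\psi_2, L_2) \in X$.
(ii) “$\equiv$” is an equivalence relation on $X$.
(iii) $\forall p_1, p_2 \in M$, $(p_1 \ne p_2 \Rightarrow X_{p_1} \cap X_{p_2} = \emptyset)$.
Let $P$ be the set of equivalence classes of “$\equiv$”. Let $[(\psi, L)]$ denote the equivalence class
containing any chart/line pair $(\psi, L) \in X$. Define $L_{x,v} \in T(\mathbb{R}^n)$ for
$x, v \in \mathbb{R}^n$ by $L_{x,v} : t \mapsto x + tv$ for $t \in \mathbb{R}$.
(iv) $\forall p \in M$, $\forall v \in \mathbb{R}^n$, $\forall \psi \in \mathrm{atlas}_p(M)$, $(\psi, L_{\psi(p),v}) \in t_{p,v,\psi}$.
(v) $\forall p \in M$, $\forall v \in \mathbb{R}^n$, $\forall \psi \in \mathrm{atlas}_p(M)$, $t_{p,v,\psi} = [(\psi, L_{\psi(p),v})]$.
(vi) $T(M) = P$.
(vii) $\forall p \in M$, $\forall v_1, v_2 \in \mathbb{R}^n$, $\forall \psi \in \mathrm{atlas}_p(M)$, $t_{p,v_1,\psi} = t_{p,v_2,\psi} \Leftrightarrow v_1 = v_2$.
(viii) $\forall p_1, p_2 \in M$, $\forall v_1, v_2 \in \mathbb{R}^n$, $\forall \psi_1 \in \mathrm{atlas}_{p_1}(M)$, $\forall \psi_2 \in \mathrm{atlas}_{p_2}(M)$,
$t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2} \Leftrightarrow t_{p_1,v_1,\psi_1} \cap t_{p_2,v_2,\psi_2} \ne \emptyset$.
(ix) $\forall p_1, p_2 \in M$, $(p_1 \ne p_2 \Rightarrow T_{p_1}(M) \cap T_{p_2}(M) = \emptyset)$.
(x) $\forall p \in M$, $T_p(M) = \{ [(\psi, L)] \in P;\ L(0) = \psi(p) \}$.
(xi) $\forall p \in M$, $\forall \psi, \psi_0 \in \mathrm{atlas}_p(M)$, $\forall v \in \mathbb{R}^n$, $\exists' v_0 \in \mathbb{R}^n$, $t_{p,v,\psi} = t_{p,v_0,\psi_0}$.
(xii) $\forall p \in M$, $\forall \psi \in \mathrm{atlas}_p(M)$, $T_p(M) = \{ t_{p,v,\psi};\ v \in \mathbb{R}^n \}$.
(xiii) $\forall p \in M$, $\forall \psi \in \mathrm{atlas}_p(M)$, $t_{p,\cdot,\psi} : \mathbb{R}^n \to T_p(M)$ is a bijection,
where $t_{p,\cdot,\psi}$ is the map $v \mapsto t_{p,v,\psi}$.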


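PROOF: For part (i), let $(\psi_1, L_1), (\psi_2, L_2) \in X$. Then $L_\alpha = L_{x_\alpha,v_\alpha}$ for some
$x_\alpha, v_\alpha \in \mathbb{R}^n$ for $\alpha = 1, 2$. So $\partial_t L_2(t) = v_2$ for all $t \in \mathbb{R}$,
and so $\partial_t L_2(t)|_{t=0} = v_2$, which is clearly well defined. The expression
$\psi_2(\psi_1^{-1}(L_1(t)))$ is well defined for $t$ in an open neighbourhood of $0 \in \mathbb{R}$ because
$L_1(0) \in \mathrm{Range}(\psi_1)$ and $\mathrm{Range}(\psi_1) \in \mathrm{Top}(\mathbb{R}^n)$, and
$\psi_1^{-1}(L_1(0)) = \psi_2^{-1}(L_2(0)) \in \mathrm{Dom}(\psi_2)$ and $\mathrm{Dom}(\psi_2) \in \mathrm{Top}(M)$.
Then since $\psi_2 \circ \psi_1^{-1}$ is a $C^1$ map by Definition 51.3.2 (iii), the map
$t \mapsto \psi_2(\psi_1^{-1}(L_1(t)))$ is differentiable at $t = 0$.

For part (ii), the reflexivity condition in Definition 9.8.2 (i) follows from the observation that for
$(\psi, L) \in X$, line (54.1.2) is satisfied because $L(0) \in \mathrm{Range}(\psi)$ and $\mathrm{Range}(\psi)$
is an open subset of $\mathbb{R}^n$. Symmetry of the relation follows from the symmetry of line (54.1.2).
Transitivity follows easily from the observation that
$\psi_3 \circ \psi_1^{-1} = (\psi_3 \circ \psi_2^{-1}) \circ (\psi_2 \circ \psi_1^{-1})$ for
$(\psi_1, L_1), (\psi_2, L_2), (\psi_3, L_3) \in X$.

For part (iii), let $p_1, p_2 \in M$. Let $(\psi, L) \in X_{p_1} \cap X_{p_2}$. Then
$\psi(p_1) = L(0) = \psi(p_2)$. So $p_1 = p_2$ because $\psi$ is injective. Hence $p_1 \ne p_2$ implies
$X_{p_1} \cap X_{p_2} = \emptyset$.

Part (iv) follows from Definition 54.1.2 because $(\psi, L_{\psi(p),v}) \in \mathrm{atlas}_p(M) \times T(\mathbb{R}^n)$
and $L_{\psi(p),v}(0) = \psi(p)$, and
$\partial_t \psi(\psi^{-1}(L_{\psi(p),v}(t)))|_{t=0} = \partial_t\, \mathrm{id}_{\mathrm{Range}(\psi)}(L_{\psi(p),v}(t))|_{t=0} = \partial_t L_{\psi(p),v}(t)|_{t=0} = v$.

For part (v), let $(\psi_0, L_0) \in t_{p,v,\psi}$. Then by Definition 54.1.2,
$(\psi_0, L_0) \in \mathrm{atlas}_p(M) \times T(\mathbb{R}^n)$, $L_0(0) = \psi_0(p)$ and
$\partial_t \psi(\psi_0^{-1}(L_0(t)))|_{t=0} = v$. Therefore
$\psi_0^{-1}(L_0(0)) = p = \psi^{-1}(L_{\psi(p),v}(0))$ and
$\partial_t L_{\psi(p),v}(t)|_{t=0} = v = \partial_t \psi(\psi_0^{-1}(L_0(t)))|_{t=0}$.
So $(\psi_0, L_0) \equiv (\psi, L_{\psi(p),v})$. So $(\psi_0, L_0) \in [(\psi, L_{\psi(p),v})]$. Thus
$t_{p,v,\psi} \subseteq [(\psi, L_{\psi(p),v})]$.
For the reverse inclusion, let $(\psi_0, L_0) \in [(\psi, L_{\psi(p),v})]$. Then
$(\psi_0, L_0) \equiv (\psi, L_{\psi(p),v})$. So by line (54.1.1),
$\psi_0^{-1}(L_0(0)) = \psi^{-1}(L_{\psi(p),v}(0)) = p$. Thus $L_0(0) = \psi_0(p)$. Then by line (54.1.2),
$\partial_t \psi(\psi_0^{-1}(L_0(t)))|_{t=0} = \partial_t L_{\psi(p),v}(t)|_{t=0} = v$. So
$(\psi_0, L_0) \in t_{p,v,\psi}$ by Definition 54.1.2. Consequently $t_{p,v,\psi} \supseteq [(\psi, L_{\psi(p),v})]$.
Hence $t_{p,v,\psi} = [(\psi, L_{\psi(p),v})]$.

For part (vi), the inclusion $T(M) \subseteq P$ follows from Notation 54.1.4 and part (v). Now let
$[(\psi, L)] \in P$. Then $L = L_{x,v}$ for some $v \in \mathbb{R}^n$, where $x = L(0)$. But
$L(0) \in \mathrm{Range}(\psi)$ by the definition of $X$. So $p = \psi^{-1}(x) = \psi^{-1}(L(0))$ is well
defined, and then $x = \psi(p)$. So $L = L_{\psi(p),v}$. Thus
$[(\psi, L)] = [(\psi, L_{\psi(p),v})] = t_{p,v,\psi}$ by part (v). So $P \subseteq T(M)$. Hence $T(M) = P$.

For part (vii), let $t_{p,v_1,\psi} = t_{p,v_2,\psi}$. Then $(\psi, L_{\psi(p),v_1}) \in t_{p,v_2,\psi}$ because
$(\psi, L_{\psi(p),v_1}) \in t_{p,v_1,\psi}$ by part (iv). So by Definition 54.1.2,
$v_2 = \partial_t \psi(\psi^{-1}(L_{\psi(p),v_1}(t)))|_{t=0} = \partial_t L_{\psi(p),v_1}(t)|_{t=0} = v_1$.
The converse is obvious.

For part (viii), suppose that $t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2}$. Then
$t_{p_1,v_1,\psi_1} \cap t_{p_2,v_2,\psi_2} = t_{p_1,v_1,\psi_1} \ne \emptyset$ by part (iv).
Now suppose that $t_{p_1,v_1,\psi_1} \cap t_{p_2,v_2,\psi_2} \ne \emptyset$. Then
$[(\psi_1, L_{\psi_1(p_1),v_1})] \cap [(\psi_2, L_{\psi_2(p_2),v_2})] \ne \emptyset$ by part (v). So
$[(\psi_1, L_{\psi_1(p_1),v_1})] = [(\psi_2, L_{\psi_2(p_2),v_2})]$ by part (ii) and Definition 8.7.12 (ii).
Therefore $t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2}$ by part (v). Hence
$t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2} \Leftrightarrow t_{p_1,v_1,\psi_1} \cap t_{p_2,v_2,\psi_2} \ne \emptyset$.

For part (ix), suppose that $T_{p_1}(M) \cap T_{p_2}(M) \ne \emptyset$. Then by Notation 54.1.4, there exist
$v_1, v_2 \in \mathbb{R}^n$, and $\psi_1 \in \mathrm{atlas}_{p_1}(M)$ and $\psi_2 \in \mathrm{atlas}_{p_2}(M)$,
such that $t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2}$. So by part (viii) and Definition 54.1.2,
there exists $(\psi_0, L_0) \in t_{p_1,v_1,\psi_1} \cap t_{p_2,v_2,\psi_2}$. Then $L_0(0) = \psi_0(p_1)$ and
$L_0(0) = \psi_0(p_2)$. Hence $p_1 = p_2$.

For part (x), let $p \in M$ and $V \in T_p(M)$. By Notation 54.1.4, $V = t_{p,v,\psi}$ for some
$v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$. So $V = [(\psi, L_{\psi(p),v})]$ by part (v). But
$L_{\psi(p),v}(0) = \psi(p)$. Therefore $V \in \{ [(\psi, L)] \in P;\ L(0) = \psi(p) \}$. Thus
$T_p(M) \subseteq \{ [(\psi, L)] \in P;\ L(0) = \psi(p) \}$.

Now suppose that $p \in M$ and $V \in \{ [(\psi, L)] \in P;\ L(0) = \psi(p) \}$. Then $V = [(\psi, L)]$ for some
$(\psi, L) \in X$ with $L(0) = \psi(p)$, and $L \in T(\mathbb{R}^n)$ by the definition of $X$. So
$L = L_{\psi(p),v}$ for some $v \in \mathbb{R}^n$ by Notation 26.14.3, which implies $V = t_{p,v,\psi}$ by
part (v). Therefore $V \in T_p(M)$ by Notation 54.1.4. Consequently
$T_p(M) = \{ [(\psi, L)] \in P;\ L(0) = \psi(p) \}$ for all $p \in M$.

For part (xi), let $p \in M$, $\psi, \psi_0 \in \mathrm{atlas}_p(M)$ and $v \in \mathbb{R}^n$. Let
$v_0 = \sum_{i=1}^n v^i\, \partial_{x^i} \psi_0(\psi^{-1}(x))|_{x=\psi(p)}$ and $L = L_{\psi(p),v}$. Then
$L(0) = \psi(p)$ and $\partial_t \psi_0(\psi^{-1}(L(t)))|_{t=0} = v_0$ by the chain rule. So
$(\psi, L) \in t_{p,v_0,\psi_0}$ by Definition 54.1.2. But $(\psi, L) \in t_{p,v,\psi}$ by part (iv). So
$t_{p,v,\psi} = t_{p,v_0,\psi_0}$ by part (viii). The uniqueness of $v_0$ follows from part (vii).

For part (xii), let $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$. It follows from Notation 54.1.4 that
$T_p(M) \supseteq \{ t_{p,v,\psi};\ v \in \mathbb{R}^n \}$. Now suppose that $V \in T_p(M)$. Then
$V = t_{p,v_0,\psi_0}$ for some $v_0 \in \mathbb{R}^n$ and $\psi_0 \in \mathrm{atlas}_p(M)$ by Notation 54.1.4.
So $V = t_{p,v,\psi}$ for some $v \in \mathbb{R}^n$ by part (xi). Therefore
$V \in \{ t_{p,v,\psi};\ v \in \mathbb{R}^n \}$. Thus $T_p(M) \subseteq \{ t_{p,v,\psi};\ v \in \mathbb{R}^n \}$.
Hence $T_p(M) = \{ t_{p,v,\psi};\ v \in \mathbb{R}^n \}$.

For part (xiii), surjectivity follows from part (xii), and injectivity follows from part (vii).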


54.1.9 REMARK:  Non-overlap of tangent vector spaces and distinct base points. 
Theorem 54.1.8 (ix) ensures that the zero-vector ambiguity problem in Theorem 54.12.4 (ii) for tangent 
operator spaces cannot occur in the case of tangent vector spaces. (See also Remark 54.1.15.)


54.1.10 REMARK: Using extra-mathematical “native differentiable structure” to check some formulas. 
Line (54.1.2) in Theorem 54.1.8 suggests the more symmetric-looking line (54.1.3).

\[
\text{“}\partial_t \psi_2^{-1}(L_2(t))\big|_{t=0}\text{”} = \text{“}\partial_t \psi_1^{-1}(L_1(t))\big|_{t=0}\text{”} \tag{54.1.3}
\]

Unfortunately, the expressions in line (54.1.3) are only meaningful if the manifold already has its own native
differentiable structure, which would make the provision of charts superfluous. However, this formula does
give a kind of reality check, and a convenient mnemonic and motivation. It also closely resembles line (54.1.1),
which is philosophically desirable, but this is useless if it has no meaning!


54.1.11 THEOREM: Chart transition rule for tangent vector components. 
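Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $p_1, p_2 \in M$ and
$v_1, v_2 \in \mathbb{R}^n$. Let $\psi_1 \in \mathrm{atlas}_{p_1}(M)$ and $\psi_2 \in \mathrm{atlas}_{p_2}(M)$.
Then $t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2}$ if and only if $p_1 = p_2$ and

\[
\forall i \in \mathbb{N}_n, \quad
v_2^i = \sum_{j=1}^n v_1^j\, \partial_{x^j} \psi_2^i(\psi_1^{-1}(x))\big|_{x=\psi_1(p)} \tag{54.1.4}
\]
\[
\phantom{\forall i \in \mathbb{N}_n, \quad v_2^i} = \sum_{j=1}^n J_{21}(p)^i{}_j\, v_1^j, \tag{54.1.5}
\]

where $p = p_1 = p_2$. (See Definition 51.4.18 for the chart transition matrix $J_{21}(p)$.)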
PROOF: Suppose that $t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2}$. Then
$(\psi_1, L_{\psi_1(p_1),v_1}) \in t_{p_2,v_2,\psi_2}$ by Theorem 54.1.8 (iv). So by Definition 54.1.2,
$L_{\psi_1(p_1),v_1}(0) = \psi_1(p_2)$ and $\partial_t \psi_2(\psi_1^{-1}(L_{\psi_1(p_1),v_1}(t)))|_{t=0} = v_2$.
So $\psi_1(p_1) = \psi_1(p_2)$, which implies $p_1 = p_2$ since $\psi_1$ is injective, and
$v_2 = \partial_t \psi_2(\psi_1^{-1}(\psi_1(p_1) + v_1 t))|_{t=0} = \sum_{j=1}^n v_1^j\, \partial_{x^j} \psi_2(\psi_1^{-1}(x))|_{x=\psi_1(p_1)}$
by the chain rule, Theorem 41.7.4. This verifies line (54.1.4). Then line (54.1.5) follows from line (54.1.4)
and Definition 51.4.18.

Now suppose that $p_1 = p_2$ and line (54.1.4) holds. Then
$v_2 = \partial_t \psi_2(\psi_1^{-1}(\psi_1(p_1) + v_1 t))|_{t=0}$, which implies
$\partial_t \psi_2(\psi_1^{-1}(L_{\psi_1(p_1),v_1}(t)))|_{t=0} = v_2$, and
$L_{\psi_1(p_1),v_1}(0) = \psi_1(p_1) = \psi_1(p_2)$. Therefore
$(\psi_1, L_{\psi_1(p_1),v_1}) \in t_{p_2,v_2,\psi_2}$ by Definition 54.1.2. But
$(\psi_1, L_{\psi_1(p_1),v_1}) \in t_{p_1,v_1,\psi_1}$ by Theorem 54.1.8 (iv). Hence
$t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2}$ by Theorem 54.1.8 (viii).


54.1.12 REMARK: Mnemonic formulas for chart transition rule equations for vector components. 
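A useful mnemonic formula for equations (54.1.4) could be

\[
v_\beta^i = \frac{\partial \psi_\beta^i}{\partial \psi_\alpha^j}(p)\, v_\alpha^j.
\]

An essentially correct mnemonic would be
$v_\beta^i = \sum_{j=1}^n v_\alpha^j\, \partial_j(\psi_\beta^i \circ \psi_\alpha^{-1})(\psi_\alpha(p))$.
Using the chart transition matrix notation in Definition 51.4.18, line (54.1.4) may be written as the matrix
product $v_\beta = J_{\beta\alpha}(p)\, v_\alpha$.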
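
The chart transition rule for components can be illustrated numerically. The following Python sketch is not part of the formal development. It assumes a hypothetical two-chart atlas (Cartesian and polar) on the open right half-plane, approximates the chart transition matrix by central differences, and uses function names which are chosen only for this illustration.

```python
import math


def psi1(point):
    """Cartesian chart on the open right half-plane x > 0."""
    return point


def psi1_inv(coords):
    return coords


def psi2(point):
    """Polar chart (r, theta) on the open right half-plane x > 0."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))


def psi2_inv(coords):
    r, theta = coords
    return (r * math.cos(theta), r * math.sin(theta))


def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x, with entry [i][j] = d f^i / d x^j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(tuple(xp)), f(tuple(xm))
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def transition_matrix(psi_to, psi_from, psi_from_inv, p):
    """Approximate matrix of psi_to o psi_from^{-1} at the point psi_from(p)."""
    return jacobian(lambda x: psi_to(psi_from_inv(x)), psi_from(p))


def mat_vec(J, v):
    return tuple(sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J)))


p = (3.0, 4.0)
v1 = (1.0, 2.0)                               # components for the chart psi1
J21 = transition_matrix(psi2, psi1, psi1_inv, p)
v2 = mat_vec(J21, v1)                         # components for the chart psi2
J12 = transition_matrix(psi1, psi2, psi2_inv, p)
v1_again = mat_vec(J12, v2)                   # transforming back recovers v1

print(v2)        # approximately (2.2, 0.08)
print(v1_again)  # approximately (1.0, 2.0)
```

The round trip through both transition matrices returns the original components, which reflects the fact that the matrices for opposite chart transitions at the same point are mutually inverse.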


54.1.13 REMARK: Locality of the point/components/chart specification of tangent vectors. 
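An immediate consequence of Theorem 54.1.11 is the observation that if charts
$\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$ are identical on some neighbourhood of $p \in M$, then
$t_{p,v_1,\psi_1} = t_{p,v_2,\psi_2}$ if and only if $v_1 = v_2$. In other words,

\[
\forall p \in M,\ \forall v_1, v_2 \in \mathbb{R}^n,\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),\ \forall \Omega \in \mathrm{Top}_p(M),
\quad
\psi_1\big|_\Omega = \psi_2\big|_\Omega \Rightarrow (t_{p,v_1,\psi_1} = t_{p,v_2,\psi_2} \Leftrightarrow v_1 = v_2).
\]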


This observation has the consequence that tangent vectors on an open subset of a manifold, using the 
restrictions of charts to that subset, may be identified if they have the same components. (This is mentioned 
in more detail in Remark 54.6.7 and proved in Theorem 54.6.8.) All representations of tangent vectors are 
ultimately local in this sense, although for some representations (such as derivations), proving locality may 
be non-trivial. 


54.1.14 REMARK: Tangent vectors for finite-dimensional linear spaces. 

A finite-dimensional linear space $V$ is structured as a differentiable manifold $(V, A_V)$ in
Definition 51.4.21, where the atlas $A_V = \{ \kappa_B;\ B \text{ is a basis for } V \}$ is the set of all
component maps for the linear space. Then the chart transition rule matrices in Theorem 54.1.11 are constant
because of the special choice of atlas.


54.1.15 REMARK: Non-ambiguity of base points of tangent vectors. 

In view of Theorem 54.12.4 (i), which shows the base-point ambiguity for tangent operators, it is important 
to establish that this problem does not occur in the case of tangent line vectors. Therefore the non-ambiguity 
of base points of tangent line vectors is highlighted as Theorem 54.1.16. In particular, the zero vectors
$t_{p_1,0,\psi_1}$ and $t_{p_2,0,\psi_2}$ are distinct whenever $p_1$ and $p_2$ are distinct.
(See also Remark 54.1.9.)


54.1.16 THEOREM:  Tangent vectors at distinct points are distinct. 
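Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

\[
\forall p_1, p_2 \in M,\ \forall v_1, v_2 \in \mathbb{R}^n,\ \forall \psi_1 \in \mathrm{atlas}_{p_1}(M),\ \forall \psi_2 \in \mathrm{atlas}_{p_2}(M),
\quad
p_1 \ne p_2 \Rightarrow t_{p_1,v_1,\psi_1} \ne t_{p_2,v_2,\psi_2}.
\]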


PROOF: The assertion follows from Theorem 54.1.8 (ix) and Notation 54.1.4. 


54.1.17 REMARK: Zero vectors and zero-dimensional manifolds. 

In the (almost) trivial case $n = 0$ in Definition 54.1.2, the only tangent-line vector in $T(\mathbb{R}^n)$
is the zero line $L_{0,0} : t \mapsto 0_{\mathbb{R}^0}$ for $t \in \mathbb{R}$. (This looks more like a point
than a line!) So the only tangent vector at each point $p$ of a 0-dimensional manifold is the equivalence class
$\{ (\psi, L_{0,0});\ \psi \in \mathrm{atlas}_p(M) \} = \mathrm{atlas}_p(M) \times \{ L_{0,0} \}$.
Therefore $t_{p,0,\psi} = \mathrm{atlas}_p(M) \times \{ L_{0,0} \}$ for all $\psi \in \mathrm{atlas}_p(M)$.


For general $n \in \mathbb{Z}_0^+$, zero tangent-line vectors on $M$ are represented by equivalence classes of
the form $t_{p,0,\psi} = \{ (\psi_0, L_{\psi_0(p),0});\ \psi_0 \in \mathrm{atlas}_p(M) \}$.
54.2. Tangent vectors regarded as chart/line maps

54.2.1 REMARK: Tangent vectors can be regarded as functions from charts to lines. 


As mentioned in Remark 53.1.13, tangent vectors which are represented as equivalence classes of chart-tagged 
lines can be regarded as functions which map charts to lines. Using Theorem 54.1.8 (v), one may write 


\[
\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M), \quad t_{p,v,\psi}(\psi) = L_{\psi(p),v}.
\]


(This is asserted as Theorem 54.2.2 (v).) Since this relation is representation-dependent, it should only be 
used in the presentation of the foundations of tangent bundle theory. 


Theorem 54.2.2 is concerned with the viewpoint that each tangent vector on a C! manifold is a function 
which maps charts around its base point to the set of all Cartesian tangent-line vectors which have the same 
dimension as the manifold. 


54.2.2 THEOREM: Tangent vectors are functions from charts to a Cartesian tangent space. 
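Let $M$ be a $C^1$ manifold with $n = \dim(M)$.

(i) $\forall p \in M$, $\forall \psi, \psi_0 \in \mathrm{atlas}_p(M)$, $\forall v \in \mathbb{R}^n$, $\exists v_0 \in \mathbb{R}^n$, $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v,\psi}$.
(ii) $\forall p \in M$, $\forall \psi, \psi_0 \in \mathrm{atlas}_p(M)$, $\forall v \in \mathbb{R}^n$, $\forall L_1, L_2 \in T(\mathbb{R}^n)$,
$((\psi_0, L_1), (\psi_0, L_2) \in t_{p,v,\psi} \Rightarrow L_1 = L_2)$.
(iii) $\forall p \in M$, $\forall \psi, \psi_0 \in \mathrm{atlas}_p(M)$, $\forall v \in \mathbb{R}^n$, $\exists' v_0 \in \mathbb{R}^n$, $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v,\psi}$.
(iv) $\forall p \in M$, $\forall v \in \mathbb{R}^n$, $\forall \psi \in \mathrm{atlas}_p(M)$, $t_{p,v,\psi} : \mathrm{atlas}_p(M) \to T(\mathbb{R}^n)$ is a function.
(v) $\forall p \in M$, $\forall v \in \mathbb{R}^n$, $\forall \psi \in \mathrm{atlas}_p(M)$, $t_{p,v,\psi}(\psi) = L_{\psi(p),v}$.
(vi) $\forall p \in M$, $\forall v \in \mathbb{R}^n$, $\forall \psi \in \mathrm{atlas}_p(M)$, $p = \psi^{-1}(t_{p,v,\psi}(\psi)(0))$.
(vii) $\forall p \in M$, $\forall v \in \mathbb{R}^n$, $\forall \psi \in \mathrm{atlas}_p(M)$, $v = t_{p,v,\psi}(\psi)(1) - t_{p,v,\psi}(\psi)(0)$.
(viii) $\forall \psi \in \mathrm{atlas}(M)$, $\forall p \in \mathrm{Dom}(\psi)$, $\forall V \in T_p(M)$, $p = \psi^{-1}(V(\psi)(0))$.
(ix) $\forall p \in M$, $\forall V \in T_p(M)$, $\mathrm{Dom}(V) = \mathrm{atlas}_p(M)$.
(x) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \psi \in \mathrm{Dom}(V)$, $p = \psi^{-1}(V(\psi)(0))$.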


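PROOF: For part (i), let $v_0 = \sum_{i=1}^n v^i\, \partial_{x^i} \psi_0(\psi^{-1}(x))|_{x=\psi(p)}$. Then
$L_{\psi_0(p),v_0} \in T(\mathbb{R}^n)$, $L_{\psi_0(p),v_0}(0) = \psi_0(p)$ and

\begin{align*}
\partial_t \psi(\psi_0^{-1}(L_{\psi_0(p),v_0}(t)))\big|_{t=0}
&= \partial_t \psi(\psi_0^{-1}(\psi_0(p) + t v_0))\big|_{t=0} \\
&= \sum_{j=1}^n v_0^j\, \partial_{y^j} \psi(\psi_0^{-1}(y))\big|_{y=\psi_0(p)} \tag{54.2.1} \\
&= \sum_{j=1}^n \sum_{i=1}^n v^i\, \partial_{x^i} \psi_0^j(\psi^{-1}(x))\big|_{x=\psi(p)}\, \partial_{y^j} \psi(\psi_0^{-1}(y))\big|_{y=\psi_0(p)} \\
&= \sum_{i=1}^n v^i \sum_{j=1}^n \partial_{y^j} \psi(\psi_0^{-1}(y))\big|_{y=\psi_0(p)}\, \partial_{x^i} \psi_0^j(\psi^{-1}(x))\big|_{x=\psi(p)} \\
&= \sum_{i=1}^n v^i\, \partial_{x^i} \psi(\psi_0^{-1}(\psi_0(\psi^{-1}(x))))\big|_{x=\psi(p)} \tag{54.2.2} \\
&= \sum_{i=1}^n v^i\, \partial_{x^i} x\big|_{x=\psi(p)} \\
&= \sum_{i=1}^n v^i e_i \\
&= v,
\end{align*}

where lines (54.2.1) and (54.2.2) follow by chain rules, and $e_i$ denotes a unit vector as in Definition 22.7.9.
Thus by Definition 54.1.2, there exists $v_0 \in \mathbb{R}^n$ such that $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v,\psi}$.

For part (ii), let $L_1, L_2 \in T(\mathbb{R}^n)$ and $(\psi_0, L_1), (\psi_0, L_2) \in t_{p,v,\psi}$. Then
$L_1 = L_{x_1,v_1}$ and $L_2 = L_{x_2,v_2}$ for some $x_1, x_2, v_1, v_2 \in \mathbb{R}^n$, and by
Definition 54.1.2, $x_1 = L_1(0) = \psi_0(p) = L_2(0) = x_2$ and
$\partial_t \psi(\psi_0^{-1}(L_1(t)))|_{t=0} = v = \partial_t \psi(\psi_0^{-1}(L_2(t)))|_{t=0}$. So

\begin{align*}
v_1 &= \partial_t L_1(t)\big|_{t=0} \\
&= \partial_t \psi_0(\psi^{-1}(\psi(\psi_0^{-1}(L_1(t)))))\big|_{t=0} \\
&= \sum_{i=1}^n \partial_{x^i} \psi_0(\psi^{-1}(x))\big|_{x=\psi(p)}\, \partial_t \psi^i(\psi_0^{-1}(L_1(t)))\big|_{t=0} \\
&= \sum_{i=1}^n v^i\, \partial_{x^i} \psi_0(\psi^{-1}(x))\big|_{x=\psi(p)} \\
&= \sum_{i=1}^n \partial_{x^i} \psi_0(\psi^{-1}(x))\big|_{x=\psi(p)}\, \partial_t \psi^i(\psi_0^{-1}(L_2(t)))\big|_{t=0} \\
&= \partial_t \psi_0(\psi^{-1}(\psi(\psi_0^{-1}(L_2(t)))))\big|_{t=0} \\
&= \partial_t L_2(t)\big|_{t=0} \\
&= v_2.
\end{align*}

Hence $L_1 = L_2$.

For part (iii), the existence and uniqueness follow from parts (i) and (ii) respectively.

For part (iv), the set $t_{p,v,\psi}$ in Definition 54.1.2 is a subset of
$\mathrm{atlas}_p(M) \times T(\mathbb{R}^n)$, which implies that $t_{p,v,\psi}$ is a relation from
$\mathrm{atlas}_p(M)$ to $T(\mathbb{R}^n)$ by Definition 9.5.2.

To show existence, let $\psi_0 \in \mathrm{atlas}_p(M)$. Then by part (i), there exists
$v_0 \in \mathbb{R}^n$ with $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v,\psi}$. Thus
$(\psi_0, L_0) \in t_{p,v,\psi}$ for some $L_0 \in T(\mathbb{R}^n)$, where $L_0 = L_{\psi_0(p),v_0}$.

To show uniqueness, suppose that $(\psi_0, L_1), (\psi_0, L_2) \in t_{p,v,\psi}$ for some
$\psi_0 \in \mathrm{atlas}_p(M)$ and $L_1, L_2 \in T(\mathbb{R}^n)$. Then $L_1 = L_2$ by part (ii). Hence
$t_{p,v,\psi} : \mathrm{atlas}_p(M) \to T(\mathbb{R}^n)$ is a well-defined function by Definition 10.2.2.

For part (v), $(\psi, L_{\psi(p),v}) \in t_{p,v,\psi}$ by Theorem 54.1.8 (iv). So
$t_{p,v,\psi}(\psi) = L_{\psi(p),v}$ because $t_{p,v,\psi} : \mathrm{atlas}_p(M) \to T(\mathbb{R}^n)$ is a
function by part (iv).

For part (vi), $t_{p,v,\psi}(\psi) = L_{\psi(p),v}$ by part (v). But $L_{\psi(p),v}(0) = \psi(p)$ by
Notation 26.13.4. Therefore $\psi^{-1}(t_{p,v,\psi}(\psi)(0)) = \psi^{-1}(\psi(p)) = p$.

For part (vii), $t_{p,v,\psi}(\psi) = L_{\psi(p),v}$ by part (v). But $L_{\psi(p),v}(0) = \psi(p)$ and
$L_{\psi(p),v}(1) = \psi(p) + v$ by Notation 26.13.4. Hence
$t_{p,v,\psi}(\psi)(1) - t_{p,v,\psi}(\psi)(0) = v$.

For part (viii), let $\psi \in \mathrm{atlas}(M)$, $p \in \mathrm{Dom}(\psi)$ and $V \in T_p(M)$. Then
$V = t_{p,v_0,\psi_0}$ for some $v_0 \in \mathbb{R}^n$ and $\psi_0 \in \mathrm{atlas}_p(M)$ by Notation 54.1.4.
But $t_{p,v_0,\psi_0} = t_{p,v,\psi}$ for some $v \in \mathbb{R}^n$ by Theorem 54.1.8 (xi), and the function
$t_{p,v,\psi} : \mathrm{atlas}_p(M) \to T(\mathbb{R}^n)$ satisfies $t_{p,v,\psi}(\psi) = L_{\psi(p),v}$ by
parts (iv) and (v). Consequently $V(\psi)(0) = t_{p,v,\psi}(\psi)(0) = L_{\psi(p),v}(0) = \psi(p)$. Hence
$\psi^{-1}(V(\psi)(0)) = p$.

Part (ix) follows from part (iv) and Definition 54.1.2.

Part (x) follows from parts (viii), (iv) and (ix).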
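
The viewpoint of Theorem 54.2.2, where a tangent vector is a function from charts to lines, can be mimicked in Python. The sketch below makes several assumptions which are not part of the text: the manifold is the open right half-plane, the atlas is a hypothetical dictionary `CHARTS` with two named charts, the velocity components for another chart are computed as in the proof of part (i) but with a central-difference Jacobian, and each line is represented as a Python function of the parameter $t$.

```python
import math

# Hypothetical two-chart atlas: name -> (chart, inverse chart).
CHARTS = {
    "cartesian": (lambda q: q, lambda c: c),
    "polar": (
        lambda q: (math.hypot(q[0], q[1]), math.atan2(q[1], q[0])),
        lambda c: (c[0] * math.cos(c[1]), c[0] * math.sin(c[1])),
    ),
}


def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x, with entry [i][j] = d f^i / d x^j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(tuple(xp)), f(tuple(xm))
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def tangent_vector(p, v, chart_name):
    """Model t_{p,v,psi} as a function from chart names to lines in R^n."""
    psi, psi_inv = CHARTS[chart_name]

    def t(other_name):
        psi0, _ = CHARTS[other_name]
        # v0^j = sum_i v^i d/dx^i psi0^j(psi^{-1}(x)) at x = psi(p).
        J = jacobian(lambda x: psi0(psi_inv(x)), psi(p))
        v0 = [sum(J[j][i] * v[i] for i in range(len(v))) for j in range(len(J))]
        x0 = psi0(p)
        return lambda s: tuple(a + s * b for a, b in zip(x0, v0))

    return t


p = (3.0, 4.0)
v = (1.0, 2.0)
t = tangent_vector(p, v, "cartesian")

line_cart = t("cartesian")   # approximately the line L_{psi(p), v}, as in part (v)
line_pol = t("polar")        # the line for the polar chart, as in part (iii)

base_from_polar = CHARTS["polar"][1](line_pol(0.0))               # part (viii)
velocity = tuple(a - b for a, b in zip(line_cart(1.0), line_cart(0.0)))  # part (vii)
print(base_from_polar)  # approximately (3.0, 4.0)
print(velocity)         # approximately (1.0, 2.0)
```

Evaluating the function at any chart and then at parameter value 0 gives the coordinates of the base point for that chart, which is how parts (vi), (viii) and (x) recover $p$ from the vector itself.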


54.3. Tangent vector technicalities 


54.3.1 REMARK: Extended tangent vector parametrisation and philosophical notes. 

Section 54.3 mentions some technicalities and philosophical points regarding tangent vectors which can 
probably be ignored. The extended tangent vector specification in Notation 54.3.3 is required because 
in this book, locally Cartesian atlases are not maximal in any sense. Therefore an extended notation is 
required for $C^1$-consistent charts. Remarks 54.3.8, 54.3.9 and 54.3.10 are merely philosophical notes on
other possibilities for defining tangent vectors. 


54.3.2 REMARK: Extension of tangent vector point/components/chart notation to compatible charts. 
There are sometimes situations where it is desirable to be able to refer to a tangent vector by specifying
its point/components/chart triple for a compatible chart instead of a chart which is in the manifold's atlas.
Definition 54.1.2 for $t_{p,v,\psi}$ requires the chart $\psi$ to be in the atlas of the manifold $M$. But
tangent vectors in a $C^1$ manifold can be specified without ambiguity in terms of any $C^1$ compatible chart.
(See Definition 51.4.2 for general $C^k$ compatible charts.)
A maximal $C^1$ atlas, as in Definition 51.4.5 and Notation 51.4.7, would contain all $C^1$ compatible charts.
Then no extension of Definition 54.1.2 would be required. (Many books do assume that all manifold atlases
are maximal. See Remarks 50.4.9, 51.2.2, 51.4.4, 54.5.31 and 54.9.12 for the disadvantages of maximal
atlases.) The extended specification method in Notation 54.3.3 is required in the case of non-maximal $C^1$
atlases. (See Notation 51.4.7 for $\mathrm{atlas}^1_p(M)$.)

Notation 54.3.3 does not add new tangent vectors to the manifold $M$. Nor does it add charts to the atlas
of $M$. (Note that Notation 54.3.3 requires $\psi_0 \in \mathrm{atlas}_p(M)$, not
$\psi_0 \in \mathrm{atlas}^1_p(M)$.) It merely extends the notation for tangent vectors in Definition 54.1.2.
In other words, every vector which is written as $t_{p,v,\psi}$ for some $\psi \in \mathrm{atlas}^1_p(M)$ and
$v \in \mathbb{R}^n$ is in fact equal to a vector $t_{p,v_0,\psi_0}$ for some
$\psi_0 \in \mathrm{atlas}_p(M)$ and $v_0 \in \mathbb{R}^n$. (This is asserted by Theorem 54.3.5 (ii).)


54.3.3 NOTATION: Extended specification of tangent vectors in terms of compatible charts. 
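$t_{p,v,\psi}$, for $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}^1_p(M)$, for a $C^1$ manifold
$M$ with $n = \dim(M)$, denotes the set

\[
t_{p,v,\psi} = \bigl\{ (\psi_0, L_0) \in \mathrm{atlas}_p(M) \times T(\mathbb{R}^n);\
L_0(0) = \psi_0(p) \text{ and } \partial_t(\psi(\psi_0^{-1}(L_0(t))))\big|_{t=0} = v \bigr\}.
\]

(See Definition 54.1.2 or Notation 26.14.3 for $T(\mathbb{R}^n)$, the set of Cartesian space tangent-line
vectors.)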


54.3.4 REMARK: Consistency of basic and extended notations for tangent vectors. 

Notation 54.3.3 is an extension of Definition 54.1.2 from $\mathrm{atlas}(M)$ to the $C^1$-completion $\mathrm{atlas}^1(M)$. (This
is asserted as Theorem 54.3.5 (i).) Definition 54.1.2 could be replaced by Notation 54.3.3 without negative
consequences. The extended specification is often useful, and will be used in this book with little or no
comment. It is equivalent to temporarily adding the compatible chart $\psi$ to $\mathrm{atlas}(M)$.


54.3.5 THEOREM: Some basic properties of the extended notation for tangent vectors. 
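Let $M$ be a $C^1$ manifold with $n = \dim(M)$.
(i) Let $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$. Let $V_1$ be the tangent vector $t_{p,v,\psi}$ according to Definition 54.1.2,
and let $V_2$ be the tangent vector $t_{p,v,\psi}$ according to Notation 54.3.3. Then $V_1 = V_2$.
(ii) $\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}^1_p(M),\ \forall \psi_0 \in \mathrm{atlas}_p(M),\ \exists' v_0 \in \mathbb{R}^n,\quad t_{p,v,\psi} = t_{p,v_0,\psi_0}$.
(iii) $\forall p \in M,\ \forall v, v_0 \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}^1_p(M),\ \forall \psi_0 \in \mathrm{atlas}_p(M),\ \forall \lambda \in \mathbb{R}$,
$t_{p,v,\psi} = t_{p,v_0,\psi_0} \ \Rightarrow\ t_{p,\lambda v,\psi} = t_{p,\lambda v_0,\psi_0}$.
(iv) $\forall p \in M,\ \forall u, v, u_0, v_0 \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}^1_p(M),\ \forall \psi_0 \in \mathrm{atlas}_p(M)$,
$(t_{p,u,\psi} = t_{p,u_0,\psi_0} \text{ and } t_{p,v,\psi} = t_{p,v_0,\psi_0}) \ \Rightarrow\ t_{p,u+v,\psi} = t_{p,u_0+v_0,\psi_0}$.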


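PROOF: Part (i) follows directly from Definition 54.1.2 and Notation 54.3.3.
For part (ii), let $p \in M$, $v \in \mathbb{R}^n$, $\psi \in \mathrm{atlas}^1_p(M)$ and $\psi_0 \in \mathrm{atlas}_p(M)$. Let $v_0 = \sum_{i=1}^n v^i\, \partial_{x^i} \psi_0(\psi^{-1}(x))\big|_{x=\psi(p)}$.
Let $(\psi_1, L_1) \in t_{p,v,\psi}$. Then by Notation 54.3.3, $L_1(0) = \psi_1(p)$ and by the chain rule, Theorem 41.7.2,


$\partial_t \psi_0(\psi_1^{-1}(L_1(t)))\big|_{t=0} = \partial_t \psi_0(\psi^{-1}(\psi(\psi_1^{-1}(L_1(t)))))\big|_{t=0}$
$\qquad = \sum_{i=1}^n \partial_{x^i} \psi_0(\psi^{-1}(x))\big|_{x=\psi(p)}\, \partial_t \psi^i(\psi_1^{-1}(L_1(t)))\big|_{t=0}$
$\qquad = \sum_{i=1}^n v^i\, \partial_{x^i} \psi_0(\psi^{-1}(x))\big|_{x=\psi(p)}$ (54.3.1)
$\qquad = v_0$,


where line (54.3.1) follows by Notation 54.3.3 and $(\psi_1, L_1) \in t_{p,v,\psi}$. So $(\psi_1, L_1) \in t_{p,v_0,\psi_0}$ by Definition 54.1.2.
Thus $t_{p,v,\psi} \subseteq t_{p,v_0,\psi_0}$. To show the reverse inclusion, let $(\psi_1, L_1) \in t_{p,v_0,\psi_0}$. Then Definition 54.1.2 implies
$L_1(0) = \psi_1(p)$, and by the chain rule,


$\partial_t \psi(\psi_1^{-1}(L_1(t)))\big|_{t=0} = \partial_t \psi(\psi_0^{-1}(\psi_0(\psi_1^{-1}(L_1(t)))))\big|_{t=0}$
$\qquad = \sum_{j=1}^n \partial_{x^j} \psi(\psi_0^{-1}(x))\big|_{x=\psi_0(p)}\, \partial_t \psi_0^j(\psi_1^{-1}(L_1(t)))\big|_{t=0}$
$\qquad = \sum_{j=1}^n v_0^j\, \partial_{x^j} \psi(\psi_0^{-1}(x))\big|_{x=\psi_0(p)}$ (54.3.2)
$\qquad = v$, (54.3.3)


where line (54.3.2) follows by Definition 54.1.2 and $(\psi_1, L_1) \in t_{p,v_0,\psi_0}$, and line (54.3.3) follows from the
"inverse chain rule", Theorem 41.7.7. Thus $t_{p,v,\psi} \supseteq t_{p,v_0,\psi_0}$. Hence $t_{p,v,\psi} = t_{p,v_0,\psi_0}$. The uniqueness of $v_0$
follows from Theorem 54.1.8 (vii).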


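For part (iii), it follows from part (ii) that $t_{p,v,\psi} = t_{p,v_0,\psi_0}$ and $t_{p,\lambda v,\psi} = t_{p,v_1,\psi_0}$ for some unique $v_0, v_1 \in \mathbb{R}^n$.
By Theorem 54.1.8 (iv), $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v_0,\psi_0}$. So $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v,\psi}$. Therefore by Notation 54.3.3,
$v = \partial_t \psi(\psi_0^{-1}(L_{\psi_0(p),v_0}(t)))\big|_{t=0} = \sum_{j=1}^n v_0^j\, \partial_{x^j} \psi(\psi_0^{-1}(x))\big|_{x=\psi_0(p)}$ by the chain rule. It follows that
$\lambda v = \sum_{j=1}^n \lambda v_0^j\, \partial_{x^j} \psi(\psi_0^{-1}(x))\big|_{x=\psi_0(p)} = \partial_t \psi(\psi_0^{-1}(L_{\psi_0(p),\lambda v_0}(t)))\big|_{t=0}$, which implies that
$(\psi_0, L_{\psi_0(p),\lambda v_0}) \in t_{p,\lambda v,\psi}$ by Notation 54.3.3. So $(\psi_0, L_{\psi_0(p),\lambda v_0}) \in t_{p,v_1,\psi_0}$. But $(\psi_0, L_{\psi_0(p),\lambda v_0}) \in t_{p,\lambda v_0,\psi_0}$
by Theorem 54.1.8 (iv). Therefore $t_{p,v_1,\psi_0} = t_{p,\lambda v_0,\psi_0}$ by Theorem 54.1.8 (viii). Hence $t_{p,\lambda v,\psi} = t_{p,\lambda v_0,\psi_0}$.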


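For part (iv), it follows from part (ii) that $t_{p,u,\psi} = t_{p,u_0,\psi_0}$, $t_{p,v,\psi} = t_{p,v_0,\psi_0}$ and $t_{p,u+v,\psi} = t_{p,w_0,\psi_0}$ for
some unique $u_0, v_0, w_0 \in \mathbb{R}^n$. By Theorem 54.1.8 (iv), $(\psi_0, L_{\psi_0(p),u_0}) \in t_{p,u_0,\psi_0}$ and $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v_0,\psi_0}$.
So $(\psi_0, L_{\psi_0(p),u_0}) \in t_{p,u,\psi}$ and $(\psi_0, L_{\psi_0(p),v_0}) \in t_{p,v,\psi}$. Therefore
$u = \sum_{j=1}^n u_0^j\, \partial_{x^j} \psi(\psi_0^{-1}(x))\big|_{x=\psi_0(p)}$ and $v = \sum_{j=1}^n v_0^j\, \partial_{x^j} \psi(\psi_0^{-1}(x))\big|_{x=\psi_0(p)}$
by Notation 54.3.3 and the chain rule. By addition, it follows that
$u + v = \sum_{j=1}^n (u_0^j + v_0^j)\, \partial_{x^j} \psi(\psi_0^{-1}(x))\big|_{x=\psi_0(p)} = \partial_t \psi(\psi_0^{-1}(L_{\psi_0(p),u_0+v_0}(t)))\big|_{t=0}$,
which implies by Notation 54.3.3 that $(\psi_0, L_{\psi_0(p),u_0+v_0}) \in t_{p,u+v,\psi}$. So $(\psi_0, L_{\psi_0(p),u_0+v_0}) \in t_{p,w_0,\psi_0}$.
But $(\psi_0, L_{\psi_0(p),u_0+v_0}) \in t_{p,u_0+v_0,\psi_0}$ by Theorem 54.1.8 (iv). So $t_{p,w_0,\psi_0} = t_{p,u_0+v_0,\psi_0}$ by Theorem 54.1.8 (viii).
Hence $t_{p,u+v,\psi} = t_{p,u_0+v_0,\psi_0}$.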


54.3.6 REMARK: Tangent vectors parametrised by charts in the $C^1$ complete atlas.
Theorem 54.3.7 extends Theorem 54.1.8 (xii) to charts in the $C^1$ completion of the given atlas on $M$. This
makes use of the extended specification for tangent vectors in Notation 54.3.3.


54.3.7 THEOREM: Tangent vectors for charts in the $C^1$ complete atlas.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then


$\forall p \in M,\ \forall \psi \in \mathrm{atlas}^1_p(M),\quad T_p(M) = \{t_{p,v,\psi};\ v \in \mathbb{R}^n\}$.


PROOF: The assertion follows from Theorems 54.3.5 (ii) and 54.1.8 (xii). 


54.3.8 REMARK: Replacement of extra-mathematical points of a manifold with components. 

One may replace the point set $M$ from the manifold with the set of equivalence classes of ordered pairs $[(\psi, x)]$
for $x \in \mathrm{Range}(\psi)$ and $\psi \in \mathrm{atlas}(M)$, where $(\psi_1, x_1) \equiv (\psi_2, x_2)$ if and only if $x_2 = (\psi_2 \circ \psi_1^{-1})(x_1) = \psi_{1,2}(x_1)$,
where the transition maps $\psi_{1,2} = \psi_2 \circ \psi_1^{-1}$ may be defined abstractly in the absence of the point set $M$.
(This kind of charts-only approach is also discussed in Remark 54.3.10.) This abstract approach is not as 
abstract as it seems. The coordinates-only approach describes manifolds from the observer’s point of view, 
and observers are one half of every observation. The observer is as much a part of every physical observation 
as the system which is being observed. 


54.3.9 REMARK: Some advantages of the Cartesian space line representation for tangent vectors. 

Since a tangent-line vector $t_{p,v,\psi} = [(\psi, L_{\psi(p),v})]$ is a set of ordered pairs, it may be viewed as a function.
This is an artefact of the way in which functions are defined in the standard approach to relations in ZF set
theory. (A similar comment is made in Remark 54.10.13.) The domain $\mathrm{Dom}(t_{p,v,\psi})$ of this function is equal
to $\mathrm{atlas}_p(M)$. The value of this function for $\psi \in \mathrm{atlas}_p(M)$ is $t_{p,v,\psi}(\psi) = L_{\psi(p),v}$. This apparent simplicity
is due to the choice of the same chart $\psi$ as the function argument and as its chart parameter. When they
are not the same, a conversion must be made as in Theorem 54.1.11 to make the charts the same.


Since the line $L_{\psi(p),v}$ is itself (explicitly) a function, one obtains $t_{p,v,\psi}(\psi)(t) = L_{\psi(p),v}(t) = \psi(p) + tv \in \mathbb{R}^n$
for $\psi \in \mathrm{atlas}_p(M)$ and $t \in \mathbb{R}$. Then one may apply the inverse chart $\psi^{-1}$, which is a partial function
on $\mathbb{R}^n$, to the points on this line to obtain the partial function $\psi^{-1} \circ t_{p,v,\psi}(\psi) : \mathbb{R} \mathrel{\mathring{\to}} M$, which satisfies
$(\psi^{-1} \circ t_{p,v,\psi}(\psi))(t) = \psi^{-1}(\psi(p) + tv)$. This is a $C^1$ curve in $M$ which passes through $p$. Thus a tangent-line
vector may be thought of as a line in $\mathbb{R}^n$, as a $C^1$ curve in $M$, as an equivalence class of ordered pairs, or as
a function which maps charts to lines. There are no contradictions here. The wide range of interpretations
is intentional. Each of these interpretations is valid and useful.


It is noteworthy that the $C^1$ curve $\psi^{-1} \circ L_{\psi(p),v}$ is similar but different to the equivalence class of $C^1$ curves
which many authors define to be a tangent vector. The difference is that there is only one curve, and this
curve is affine when viewed through the chart for which it is defined. This avoids prejudicing the definition
in favour of the $C^1$ regularity class. This is a very intentional consequence of the design of Definition 54.1.2.


It should also be noted that whereas many authors test the equivalence of $C^1$ curves through $p \in M$
by applying either $C^1$ real-valued functions $f : M \to \mathbb{R}$ or the charts $\psi \in \mathrm{atlas}_p(M)$, the approach in
Definition 54.1.2 is to test the one single curve (per chart) in the Cartesian chart coordinates. There is no
need to apply test functions because the line is already defined in Cartesian space. The curve $\psi^{-1} \circ L_{\psi(p),v}$
in M is only a secondary feature of the definition, not the principal structure of the definition. This is 
also intentional. The coordinate chart represents the observer’s point of view of the manifold. It is in 
the observer’s frame of reference that curves are tested to determine their tangent vectors. There is no 
real benefit in pretending that one can define intrinsic tangent vectors on a manifold. Tangent vectors are 
observed in the observer’s reference frame, namely the coordinate chart. 


54.3.10 REMARK: Synthesis of both points and tangent vectors from a given atlas. 

If a manifold is defined by chart transition maps $\psi_{k,\ell} = \psi_k \circ \psi_\ell^{-1} : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^n$ for $k, \ell \in I$ for some index set $I$,
and the point set $M$ is defined abstractly as the set of equivalence classes $[(k, x)]$ of pairs $(k, x) \in I \times \mathbb{R}^n$ with
$(k_1, x_1) \equiv (k_2, x_2)$ whenever $x_2 = \psi_{k_2,k_1}(x_1)$, then Definition 54.1.2 may be adapted so that a tangent-line
vector becomes an equivalence class $[(k, L)]$ of index-tagged Cartesian space tangent-line vectors in the set
$\{(k, L) \in I \times T(\mathbb{R}^n);\ \exists \ell \in I,\ L(0) \in \mathrm{Range}(\psi_{k,\ell})\}$, where the equivalence $(k_1, L_1) \equiv (k_2, L_2)$ holds whenever


(1) $L_2(0) = \psi_{k_2,k_1}(L_1(0))$, and
(2) $\partial_t L_2(t)\big|_{t=0} = \partial_t \psi_{k_2,k_1}(L_1(t))\big|_{t=0}$.


Such a definition for a "point-less manifold" has a pleasing symmetry between the zeroth and first order
part of the equivalence relation, which suggests the possibility of some sort of generalisation. 
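
Conditions (1) and (2) can be tested numerically for a given family of transition maps. The following
Python sketch does this under assumptions which are not part of the text: the dimension is $n = 1$, the index
set is $I = \{0, 1\}$, the hypothetical transition maps are exp and log (as would arise from the charts
$\psi_0(p) = p$ and $\psi_1(p) = e^p$ on $M = \mathbb{R}$), and derivatives are approximated by central differences.
The names TRANSITION, line, derivative_at_zero and equivalent are illustrative only.

```python
import math

# Hypothetical transition maps TRANSITION[(k, l)] = psi_k o psi_l^{-1} for a
# one-dimensional example with index set I = {0, 1}. (Assumed, not from the text.)
TRANSITION = {
    (0, 0): lambda x: x,
    (1, 1): lambda x: x,
    (1, 0): math.exp,   # chart-0 coordinate -> chart-1 coordinate
    (0, 1): math.log,   # chart-1 coordinate -> chart-0 coordinate
}

def line(base, velocity):
    """Affinely parametrised line t -> base + t * velocity in R."""
    return lambda t: base + t * velocity

def derivative_at_zero(f, h=1e-6):
    """Central-difference approximation to the derivative of f at t = 0."""
    return (f(h) - f(-h)) / (2.0 * h)

def equivalent(k1, line1, k2, line2, tol=1e-6):
    """Test conditions (1) and (2) for the pairs (k1, L1) and (k2, L2)."""
    psi21 = TRANSITION[(k2, k1)]
    # Condition (1): the base points correspond under the transition map.
    same_point = abs(line2(0.0) - psi21(line1(0.0))) < tol
    # Condition (2): the velocities correspond under the transition map.
    pushed = derivative_at_zero(lambda t: psi21(line1(t)))
    same_velocity = abs(derivative_at_zero(line2) - pushed) < tol
    return same_point and same_velocity

# Chart 0: base point 0.5 with velocity 2.
# Chart 1: base point exp(0.5) with velocity 2 * exp(0.5).
L0 = line(0.5, 2.0)
L1 = line(math.exp(0.5), 2.0 * math.exp(0.5))
print(equivalent(0, L0, 1, L1))
print(equivalent(0, L0, 1, line(math.exp(0.5), 2.0)))
```

The first call compares two lines which represent the same tangent vector in the two charts. The second call
uses the correct base point but an untransformed velocity, so condition (2) fails.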


54.3.11 REMARK: Tangent-line vectors in non-$C^1$ manifolds.

As mentioned in Section 53.2, the tangent-line vector definition may be extended to manifolds which are not
$C^1$ differentiable. For such manifolds, the partial derivatives in Definition 54.1.2 may not be related to each
other by a simple linear isomorphism between the $n$-tuples of partial derivatives for the two charts. The
map may be a well-defined bijection which is non-linear. The $C^1$ condition ensures simple linear relations.


54.4. Tangent vector spaces 


54.4.1 REMARK: The linear space structure of the tangent space at each point. 
Section 54.4 presents the linear space structure which is imported onto the set of tangent vectors at a single 
point of a differentiable manifold from the corresponding tangent spaces at single points of Cartesian spaces. 


54.4.2 REMARK: The lack of linear space structure for tangent spaces in non-$C^1$ manifolds.

In the case of a $C^1$ manifold $M$, there is a natural linear space structure on the set $T_p(M)$ of tangent vectors
at a point $p \in M$. This linear space structure has a natural isomorphism to the corresponding structure on
the Cartesian tangent space $T_{\psi(p)}(\mathbb{R}^n)$. If the manifold is not of class $C^1$, it is not generally true that the
set of tangent vectors at a point has a natural $n$-dimensional linear space structure.


The set $T_p(M)$ of tangent vectors at $p \in M$ is well defined if the transition maps for the manifold at $p$ map
directional derivatives to directional derivatives in a one-to-one manner; in other words, if tangent lines are
mapped one-to-one to tangent lines. It is important to recognise that the linear structure on $T_p(M)$ is a
special property which is guaranteed by the total differential theorem only when the manifold is $C^1$. (See
Theorem 41.6.15.) Non-$C^1$ manifolds are not necessarily pathological. For example, at the boundary of a
set, the solution of a boundary value problem is not expected to be $C^1$ in general. Quite the opposite! It
is quite typical that the $C^1$ property is lost at boundaries. The pure-mathematical differential geometry
literature is largely biased in favour of smooth boundaryless manifolds, which happen to be convenient for 
the investigation of combinatorial topology. Applications of differential geometry in physics and engineering 
must contend with the boundaries of regions and singular sets in the interior of regions. 


54.4.3 REMARK: Tangent space linear space operations for various representations. 
The linear space algebraic operations for the tangent-line space $T_p(\mathbb{R}^n)$ for a Cartesian space $\mathbb{R}^n$ are presented
in Definition 26.13.11. The vector addition and scalar multiplication operations for linearly parametrised
lines in $T_p(\mathbb{R}^n)$ are not the same as pointwise addition and multiplication operations.
Although each tangent vector $t_{p,v,\psi} \in T_p(M)$ for a $C^1$ manifold $M$ is represented as $[(\psi, L_{\psi(p),v})]$, which
is an equivalence class of chart/line pairs, it is inconvenient to define linear combination operations on
tangent vectors in terms of Cartesian space lines $L_{\psi(p),v}$. Therefore Definition 54.4.4 is presented in terms
of point/components/chart triples instead, which are much simpler and more intuitive.


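54.4.4 DEFINITION: The tangent (line) vector space at a point $p$ in a $C^1$ differentiable manifold $M$ is the
set $T_p(M)$ with the operations of vector addition and scalar multiplication by the field $\mathbb{R}$ defined as follows.


(i) $\forall \psi \in \mathrm{atlas}_p(M),\ \forall v_1, v_2 \in \mathbb{R}^n,\quad t_{p,v_1,\psi} + t_{p,v_2,\psi} = t_{p,v_1+v_2,\psi}$,
(ii) $\forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\ \forall \lambda \in \mathbb{R},\quad \lambda t_{p,v,\psi} = t_{p,\lambda v,\psi}$.


The linear space specification tuple for this tangent-line vector space is $T_p(M) -\!\!< (\mathbb{R}, T_p(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \mu)$,
where $\mathbb{R} -\!\!< (\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ is the specification tuple for the field of real numbers.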


54.4.5 REMARK: Extension of linear operations from tangent vectors to $C^1$-complete atlas charts.

Notation 54.3.3 does not introduce an extended space of tangent vectors at a point in a manifold. It merely
permits charts from the $C^1$ completion of the atlas to be used in an extended notation to specify the same
set of tangent vectors. Therefore Theorem 54.4.6 does not extend the linear operations in Definition 54.4.4
to a larger set of vectors. It merely extends these operations to an extended notation for the same vectors.


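54.4.6 THEOREM: Linear operations extension from tangent vectors to the $C^1$ atlas-completion.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$.


(i) $\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}^1_p(M),\ \forall \lambda \in \mathbb{R},\quad \lambda t_{p,v,\psi} = t_{p,\lambda v,\psi}$.
(ii) $\forall p \in M,\ \forall v_1, v_2 \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}^1_p(M),\quad t_{p,v_1,\psi} + t_{p,v_2,\psi} = t_{p,v_1+v_2,\psi}$.


PROOF: For part (i), let $p \in M$, $v \in \mathbb{R}^n$, $\psi \in \mathrm{atlas}^1_p(M)$ and $\lambda \in \mathbb{R}$. Let $\psi_0 \in \mathrm{atlas}_p(M)$. Then
$t_{p,v,\psi} = t_{p,v_0,\psi_0}$ for some unique $v_0 \in \mathbb{R}^n$ by Theorem 54.3.5 (ii). So $t_{p,\lambda v,\psi} = t_{p,\lambda v_0,\psi_0}$ by Theorem 54.3.5 (iii).
But $\lambda t_{p,v_0,\psi_0} = t_{p,\lambda v_0,\psi_0}$ by Definition 54.4.4 (ii). Hence $\lambda t_{p,v,\psi} = t_{p,\lambda v,\psi}$.

For part (ii), let $p \in M$, $v_1, v_2 \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}^1_p(M)$. Let $\psi_0 \in \mathrm{atlas}_p(M)$. Then $t_{p,v_1,\psi} = t_{p,w_1,\psi_0}$
and $t_{p,v_2,\psi} = t_{p,w_2,\psi_0}$ for some unique $w_1, w_2 \in \mathbb{R}^n$ by Theorem 54.3.5 (ii). So $t_{p,v_1+v_2,\psi} = t_{p,w_1+w_2,\psi_0}$ by
Theorem 54.3.5 (iv). But $t_{p,w_1,\psi_0} + t_{p,w_2,\psi_0} = t_{p,w_1+w_2,\psi_0}$ by Definition 54.4.4 (i). Hence
$t_{p,v_1,\psi} + t_{p,v_2,\psi} = t_{p,v_1+v_2,\psi}$.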


54.4.7 THEOREM: The zero vector in each tangent space has zero component-tuple for all charts.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then


$\forall p \in M,\ \forall \psi \in \mathrm{atlas}^1_p(M),\quad 0_{T_p(M)} = t_{p,0_{\mathbb{R}^n},\psi}$,


where $0_{\mathbb{R}^n}$ denotes the zero vector in the linear space $\mathbb{R}^n$. In other words, the zero vector in $T_p(M)$ is $t_{p,0,\psi}$
for all $C^1$-compatible charts $\psi \in \mathrm{atlas}^1_p(M)$, for all $p \in M$.


PROOF: The assertion follows by letting $v_1 = 0_{\mathbb{R}^n}$ or $v_2 = 0_{\mathbb{R}^n}$ in Theorem 54.4.6 (ii).


54.4.8 REMARK: The tangent space chart-basis vectors form a basis for the tangent space. 
The family of tangent space chart-basis vectors $(e_i^{p,\psi})_{i=1}^n = (t_{p,e_i,\psi})_{i=1}^n$ in Definition 54.4.9 constitutes a
basis for the linear space $T_p(M)$ of tangent vectors at a fixed point $p$ of a differentiable manifold $M$. The
symbol $\delta_i^j$ is the Kronecker delta which is presented in Definition 14.7.10. The standard basis $(e_i)_{i=1}^n$ for the
Cartesian linear space $\mathbb{R}^n$ is introduced in Definition 22.7.9.


54.4.9 DEFINITION: The tangent space chart-basis vector for component $i \in \mathbb{N}_n$ at a point $p \in M$ with
respect to a $C^1$-compatible chart $\psi \in \mathrm{atlas}^1_p(M)$ for a $C^1$ manifold $M$ with $n = \dim(M)$ is the tangent vector
$t_{p,e_i,\psi} \in T_p(M)$, where $e_i \in \mathbb{R}^n$ is defined by $(e_i)^j = \delta_i^j$ for $i, j \in \mathbb{N}_n$.


54.4.10 NOTATION: Tangent space chart-basis vectors.
$e_i^{p,\psi}$, for $p \in M$, $\psi \in \mathrm{atlas}^1_p(M)$ and $i \in \mathbb{N}_n$, for a $C^1$ manifold $M$ with $n = \dim(M)$, denotes the tangent
space chart-basis vector $t_{p,e_i,\psi} \in T_p(M)$. In other words,


$\forall p \in M,\ \forall \psi \in \mathrm{atlas}^1_p(M),\ \forall i \in \mathbb{N}_n,\quad e_i^{p,\psi} = t_{p,e_i,\psi}$.


54.4.11 THEOREM: Expression for general tangent vectors as linear combinations of chart-basis vectors.
Let $M$ be a $C^1$ differentiable manifold with $n = \dim(M)$. Then


$\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}^1_p(M),\quad t_{p,v,\psi} = \sum_{i=1}^n v^i e_i^{p,\psi}$.


PROOF: Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $p \in M$, $\psi \in \mathrm{atlas}^1_p(M)$ and $v = (v^i)_{i=1}^n \in \mathbb{R}^n$.
Then $v = \sum_{i=1}^n v^i e_i$ by Theorem 22.7.11 (i). So


$t_{p,v,\psi} = t_{p,\sum_{i=1}^n v^i e_i,\psi}$
$\qquad = \sum_{i=1}^n t_{p,v^i e_i,\psi}$ (54.4.1)
$\qquad = \sum_{i=1}^n v^i t_{p,e_i,\psi}$ (54.4.2)
$\qquad = \sum_{i=1}^n v^i e_i^{p,\psi}$, (54.4.3)


where line (54.4.1) follows from Theorem 54.4.6 (ii) by induction on $n$, line (54.4.2) follows similarly from
Theorem 54.4.6 (i), and line (54.4.3) follows from Notation 54.4.10.


54.4.12 REMARK: Chart transition rule for tangent space chart-basis vectors. 

Comparison of Theorem 54.4.13 with Theorem 54.1.11 reveals that the matrix which is applied to the n-tuple 
of chart-basis vectors in Theorem 54.4.13 is effectively the transpose of the inverse of the matrix which is 
applied to n-tuples of vector components in Theorem 54.1.11. Thus the transformations applied in each case 
are effectively dual (or contragredient) to each other. (See Definition 23.11.20 for the dual representation of 
a linear space automorphism.) 


54.4.13 THEOREM: Chart transition rule for tangent space chart-basis vectors. 
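Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}^+_0$. Then


$\forall p \in M,\ \forall \psi_\alpha, \psi_\beta \in \mathrm{atlas}^1_p(M),\ \forall i \in \mathbb{N}_n$,


$e_i^{p,\psi_\alpha} = \sum_{j=1}^n e_j^{p,\psi_\beta}\, \partial_{x^i}(\psi_\beta^j(\psi_\alpha^{-1}(x)))\big|_{x=\psi_\alpha(p)}$ (54.4.4)
$\qquad = \sum_{j=1}^n e_j^{p,\psi_\beta}\, J_{\beta\alpha}(p)^j{}_i$, (54.4.5)


where the chart transition matrix notation $J_{\beta\alpha}(p)$ is given in Definition 51.4.18.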


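PROOF: By Notation 54.4.10, $e_i^{p,\psi_\alpha} = t_{p,e_i,\psi_\alpha}$. Let $t_{p,e_i,\psi_\alpha} = t_{p,v_\beta,\psi_\beta}$. Then it follows from Theorems
54.1.11 and 54.3.5 (iii, iv) that $v_\beta^j = \sum_{k=1}^n J_{\beta\alpha}(p)^j{}_k (e_i)^k = \sum_{k=1}^n J_{\beta\alpha}(p)^j{}_k \delta_i^k = J_{\beta\alpha}(p)^j{}_i$ for all $j \in \mathbb{N}_n$. But
from Theorem 54.4.11, it follows that $t_{p,v_\beta,\psi_\beta} = \sum_{j=1}^n v_\beta^j e_j^{p,\psi_\beta}$. Therefore $e_i^{p,\psi_\alpha} = \sum_{j=1}^n J_{\beta\alpha}(p)^j{}_i\, e_j^{p,\psi_\beta}$, which
verifies line (54.4.5). Then line (54.4.4) follows from Definition 51.4.18.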
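
The duality between the transition rules for components and for chart-basis vectors can be illustrated by
a small computation. The following Python sketch is only an illustration under stated assumptions: the
manifold is a region of the plane, $\psi_\alpha$ is a polar chart and $\psi_\beta$ is the Cartesian chart (neither chart is
prescribed by the text), and each tangent vector is stored as its pair of Cartesian components. The helper
names are hypothetical.

```python
import math

def jacobian_cart_from_polar(r, theta):
    """J_{beta alpha}: partial derivatives of (x, y) with respect to (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def mat_vec(J, v):
    """Matrix-vector product for a 2x2 matrix stored as nested lists."""
    return [J[0][0] * v[0] + J[0][1] * v[1],
            J[1][0] * v[0] + J[1][1] * v[1]]

def combine(coeffs, vectors):
    """Linear combination sum_i coeffs[i] * vectors[i] in R^2."""
    return [sum(c * vec[k] for c, vec in zip(coeffs, vectors)) for k in range(2)]

r, theta = 2.0, math.pi / 6
J = jacobian_cart_from_polar(r, theta)

# Chart-basis vectors of the Cartesian chart (beta), stored as Cartesian components.
e_beta = [[1.0, 0.0], [0.0, 1.0]]
# Line (54.4.5): e_i^alpha = sum_j e_j^beta * J[j][i], which is the i-th column of J.
e_alpha = [combine([J[0][i], J[1][i]], e_beta) for i in range(2)]

v_alpha = [0.5, -1.5]          # components with respect to the polar chart
v_beta = mat_vec(J, v_alpha)   # components with respect to the Cartesian chart

# The same tangent vector is obtained from either expansion.
print(combine(v_alpha, e_alpha))
print(combine(v_beta, e_beta))
```

The components are transformed by the matrix $J_{\beta\alpha}(p)$, whereas the basis vectors for chart $\alpha$ are obtained from
those for chart $\beta$ by the columns of the same matrix, so the two expansions of the vector agree.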


54.4.14 REMARK: The "drop function" from the tangent space of a tangent space to the tangent space. 
The linear space $T_p(M)$ can be its own tangent space. Instead of constructing a separate tangent space
for $T_p(M)$, the limit of an expression such as $\gamma'(0) = \lim_{t \to 0} t^{-1}(\gamma(t) - \gamma(0))$ for maps $\gamma : \mathbb{R} \to T_p(M)$ may
be regarded as an element of $T_p(M)$ rather than some abstract tangent space of the tangent space $T_p(M)$.
The map which sends an abstract tangent vector $\gamma'(0)$ to the corresponding concrete element of $T_p(M)$
will be called a "drop function". (See Section 59.2 for drop functions for double tangent bundles.) This is
important in defining Lie derivatives and covariant derivatives, especially because most textbooks "drop"
abstract tangents to concrete tangents without comment.


A second-level tangent vector on the tangent bundle of a $C^2$ manifold $M$ which in some sense stays
within $T_p(M)$, instead of "straying" into tangent spaces $T_q(M)$ with $q \neq p$, is a vertical tangent vector
in $T(T(M))$, which does have a well-defined "drop" operation, as shown in Theorem 59.2.11.
54.5. Tangent vector bundles


54.5.1 REMARK: Circularity caused by asserting tangent bundles to be differentiable fibre bundles.
Differentiable fibre bundles are defined in terms of differentiable structure groups, which are defined in 
terms of differentiable manifolds, which have tangent bundles. (See Section 65.9 for tangent bundles of 
differentiable manifolds.) Therefore tangent bundles cannot be asserted to be a sub-class of differentiable 
fibre bundles until these other two topics (differentiable groups in Chapters 62-63 and differentiable fibre 
bundles in Chapters 64-66) have been presented. As alluded to in Remark 54.0.1, tangent bundles must first 
be defined as a species of objects distinct from differentiable fibre bundles. Then later, when differentiable 
fibre bundles have been defined, tangent bundles may be “discovered” to be identifiable as a subspecies of 
differentiable fibre bundles. Thus tangent bundles and differentiable fibre bundles are presented initially as 
independent classes of objects. 


54.5.2 REMARK: The relation between tangent vector bundles and second-level tangent bundles. 
The main structures which are presented in Section 54.5 are summarised in Table 54.5.1, together with their 
natural progression to the second-level tangent bundle structures in Section 59.1. 


definition  name                                        notation
54.4.4      tangent vector space                        $T_p(M)$
54.1.4      tangent vector total space                  $T(M) = \bigcup_{p \in M} T_p(M)$
54.5.16     tangent vector bundle                       $(T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)})$
54.5.22     tangent bundle total space manifold atlas   $A_{T(M)}$
54.5.26     tangent bundle total space manifold         $(T(M), A_{T(M)})$
54.5.30     tangent vector bundle                       $(T(M), A_{T(M)}, \pi, M, A_M, A^{\mathbb{R}^n}_{T(M)})$
59.1.19     second-level tangent space                  $T_z(T(M))$, $z \in T(M)$
59.1.22 second-level tangent bundle (T(T(M)), Arcrany 79, T(M), Arm), AR) 


Table 54.5.1 Progression from tangent vector bundles to second-level tangent bundles 


Attempts to fit tangent bundles into the framework of abstract fibre bundles can be exceedingly tedious. 
Simple concepts are made exceedingly convoluted by such attempts. Structures which have an obvious 
motivation are thereby transformed into unnecessarily bewildering structures. Nevertheless, such an attempt 
is made here, in the interests of doing things “the right way”. 


A differentiable fibre bundle has five atlases. (This is illustrated in Figure 66.2.1 in Remark 66.2.4.) Therefore 
a tangent fibre bundle, which mimics the structure of a general differentiable fibre bundle, must also have 
five atlases. (This is illustrated in Figure 65.9.2 in Remark 65.9.2.) However, most of the information in 
these atlases is redundant. 


54.5.3 REMARK: Extraction of horizontal and vertical components of tangent vectors.

Definitions 54.5.4 and 54.5.6 respectively define horizontal and vertical component maps for the tangent
fibration of a $C^1$ manifold. The horizontal component map $\pi$ is chart-independent, whereas the vertical
component maps $\Phi(\psi)$ depend on the choice of chart $\psi$.


The projection map $\pi$ can be obtained as an explicit construction from the tangent vector representation in
Definition 54.1.2 as the map $\pi : V \mapsto \psi^{-1}(V(\psi)(0))$ for all $\psi \in \mathrm{Dom}(V)$ by Theorem 54.2.2 (viii). In other
words, $\forall V \in T(M),\ \forall \psi \in \mathrm{Dom}(V),\ \pi(V) = \psi^{-1}(V(\psi)(0))$. This is asserted in Theorem 54.5.5.


The map $\pi : t_{p,v,\psi} \mapsto p$ on line (54.5.1) in Definition 54.5.4 is uniquely determined by Theorem 54.1.16.
The map $\pi : V \mapsto p$ on line (54.5.2) in Definition 54.5.4 is uniquely determined because the fibre sets $T_p(M)$
are disjoint by Theorem 54.1.8 (ix).


For each $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$, the map $t_{p,v,\psi} \mapsto v$ in Definition 54.5.6 is a well-defined bijection from
$T_p(M)$ to $\mathbb{R}^n$ by Theorem 54.1.8 (xiii). The explicit construction $v = t_{p,v,\psi}(\psi)(1) - t_{p,v,\psi}(\psi)(0)$ is given by
Theorem 54.2.2 (vii).


54.5.4 DEFINITION: The tangent fibration of a $C^1$ manifold $M$ is the tuple $(T(M), \pi, M)$, where the map
$\pi : T(M) \to M$ is defined by


$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\quad \pi(t_{p,v,\psi}) = p$, (54.5.1)
where $n = \dim(M)$. In other words,


$\forall p \in M,\ \forall V \in T_p(M),\quad \pi(V) = p$. (54.5.2)


The projection map of the tangent fibration $(T(M), \pi, M)$ is the map $\pi$.


54.5.5 THEOREM: Explicit formula for base points of given tangent vectors. 
Let M be a C! manifold. Then 


$\forall V \in T(M),\ \forall \psi \in \mathrm{Dom}(V),\quad \pi(V) = \psi^{-1}(V(\psi)(0))$,


where $\pi$ is the projection map of the tangent fibration of $M$.


PROOF: The assertion follows directly from Theorem 54.2.2 (x) and Definition 54.5.4.
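
The explicit formulas $\pi(V) = \psi^{-1}(V(\psi)(0))$ and $v = V(\psi)(1) - V(\psi)(0)$ can be tried out on a small
model. The following Python sketch is an illustration only, under assumptions which are not in the text:
the manifold is the open right half-plane, the atlas consists of a hypothetical Cartesian chart "cart" and
polar chart "polar", and a tangent vector is modelled as a dictionary from chart names to lines in $\mathbb{R}^2$.

```python
import math

# Two hypothetical charts on the open right half-plane M = {(x, y); x > 0}.
# Each entry is a pair (chart, inverse chart). (Assumed for illustration.)
CHARTS = {
    "cart": (lambda p: p, lambda c: c),
    "polar": (lambda p: (math.hypot(p[0], p[1]), math.atan2(p[1], p[0])),
              lambda c: (c[0] * math.cos(c[1]), c[0] * math.sin(c[1]))),
}

def make_line(base, velocity):
    """Line t -> base + t * velocity in R^2."""
    return lambda t: tuple(b + t * v for b, v in zip(base, velocity))

def tangent_vector(p, v_cart):
    """Tangent vector at p, modelled as a map from chart names to lines."""
    x, y = p
    r = math.hypot(x, y)
    # Polar components obtained from the Jacobian of (x, y) -> (r, theta).
    v_polar = ((x * v_cart[0] + y * v_cart[1]) / r,
               (-y * v_cart[0] + x * v_cart[1]) / r**2)
    return {
        "cart": make_line(CHARTS["cart"][0](p), v_cart),
        "polar": make_line(CHARTS["polar"][0](p), v_polar),
    }

def base_point(V, chart):
    """pi(V) = psi^{-1}(V(psi)(0))."""
    return CHARTS[chart][1](V[chart](0.0))

def velocity(V, chart):
    """v = V(psi)(1) - V(psi)(0)."""
    return tuple(a - b for a, b in zip(V[chart](1.0), V[chart](0.0)))

V = tangent_vector((3.0, 4.0), (1.0, 0.0))
print(base_point(V, "cart"), base_point(V, "polar"))
print(velocity(V, "cart"), velocity(V, "polar"))
```

Both charts yield the same base point (up to rounding), which reflects the chart-independence of $\pi$, whereas
the velocity components depend on the chart which is chosen.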


54.5.6 DEFINITION: The velocity chart on the tangent fibration of a $C^1$ manifold $M$, for a point chart
$\psi \in \mathrm{atlas}(M)$, is the map from $\pi^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^n$ defined by $t_{p,v,\psi} \mapsto v$, where $n = \dim(M)$.


The velocity chart map for the tangent fibration of a $C^1$ manifold $M$ is the map from $\mathrm{atlas}(M)$ to $T(M) \mathrel{\mathring{\to}} \mathbb{R}^n$
which maps each chart in $\mathrm{atlas}(M)$ to its corresponding velocity chart on $T(M)$, where $n = \dim(M)$.


54.5.7 NOTATION: $\Phi$, for a $C^1$ manifold $M$, denotes the velocity chart map for $T(M)$. In other words,
$\forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall v \in \mathbb{R}^n,\quad \Phi(\psi)(t_{p,v,\psi}) = v$, (54.5.3)

where $n = \dim(M)$.

54.5.8 THEOREM: Component charts restricted to tangent spaces are linear space isomorphisms. 


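Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$. Then $\Phi(\psi)\big|_{T_p(M)} : T_p(M) \to \mathbb{R}^n$
is a linear space isomorphism with respect to the standard linear space structure on $\mathbb{R}^n$. In other words,


$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V_1, V_2 \in T_p(M),\ \forall \lambda_1, \lambda_2 \in \mathbb{R}$,
$\Phi(\psi)(\lambda_1 V_1 + \lambda_2 V_2) = \lambda_1 \Phi(\psi)(V_1) + \lambda_2 \Phi(\psi)(V_2)$.


PROOF: It follows from Theorem 54.1.8 (xiii) that $\Phi(\psi)\big|_{T_p(M)} : T_p(M) \to \mathbb{R}^n$ is a bijection. The linearity
follows from Definition 54.4.4 and Theorem 23.1.14.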


54.5.9 DEFINITION: The (pointwise) velocity chart on the tangent fibration of a $C^1$ manifold $M$ at a point
$p \in M$, for a chart $\psi \in \mathrm{atlas}_p(M)$, is the map from $T_p(M)$ to $\mathbb{R}^n$ defined by $t_{p,v,\psi} \mapsto v$, where $n = \dim(M)$.


The (pointwise) velocity chart map for the tangent fibration of a $C^1$ manifold $M$ at a point $p \in M$ is the
map from $\mathrm{atlas}_p(M)$ to $T_p(M) \to \mathbb{R}^n$ which maps each chart in $\mathrm{atlas}_p(M)$ to the corresponding pointwise
velocity chart for $T_p(M)$, where $n = \dim(M)$.


54.5.10 NOTATION: $\Phi_p$, for a $C^1$ manifold $M$, denotes the pointwise velocity chart map for $T(M)$ at $p \in M$.
In other words,


$\forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\quad \Phi_p(\psi)(t_{p,v,\psi}) = v$.
In other words, $\forall \psi \in \mathrm{atlas}_p(M),\ \Phi_p(\psi) = \Phi(\psi)\big|_{T_p(M)}$.


54.5.11 REMARK: Domains and ranges of velocity charts and velocity chart maps.

The velocity chart $\Phi(\psi)$ for a given point chart $\psi \in \mathrm{atlas}(M)$ has the domain $\pi^{-1}(\mathrm{Dom}(\psi))$, which is a
subset of $T(M)$. Since the range is $\mathbb{R}^n$, one may write $\Phi(\psi) : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$ or $\Phi(\psi) : T(M) \mathrel{\mathring{\to}} \mathbb{R}^n$.
So one may also write $\Phi : \mathrm{atlas}(M) \to (T(M) \mathrel{\mathring{\to}} \mathbb{R}^n)$ or $\Phi : \mathrm{atlas}(M) \to \bigcup_{\psi \in \mathrm{atlas}(M)} (\pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n)$.
Thus $\mathrm{Range}(\Phi) = \{\Phi(\psi);\ \psi \in \mathrm{atlas}(M)\} \subseteq \bigcup_{\psi \in \mathrm{atlas}(M)} (\pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n)$. It is $\mathrm{Range}(\Phi)$ which becomes
the standard fibre atlas for a tangent bundle in Definition 54.5.16 (iii). It is important to note that velocity charts
only provide fibre charts, which define homeomorphisms from the fibre sets $T_p(M)$ to the fibre space $\mathbb{R}^n$ for
the fibration $(T(M), \pi, M)$. (See Definitions 47.3.3 and 47.3.6 for fibre charts and fibrations respectively.)
The velocity charts $\Phi(\psi)$ do not constitute a manifold atlas for this fibration. Such a manifold atlas is
in fact provided by the combined base point coordinates and velocity coordinates in Definition 54.5.22.


54.5.12 REMARK: An explicit formula for the velocity chart map on a tangent fibration. 

The formula for the velocity chart map $\Phi(\psi)$ in Notation 54.5.7 line (54.5.3) is unsatisfyingly abstract.
It is far preferable, whenever possible, to have a direct explicit formula rather than an inverse formula.
Line (54.5.3) is an inverse formula because one must somehow determine which element $v$ of $\mathbb{R}^n$ was used
in the construction of the tangent vector $t_{p,v,\psi}$ according to Definition 54.1.2, when $t_{p,v,\psi}$ is given as some
equivalence class $[(\psi, L)]$ as in Theorem 54.1.8 (v). In other words, the parameter-triple $(p, v, \psi)$ determines a
set of chart/line pairs, and one must compute $v$ from this set of pairs, without knowing the parameter-triple.


As mentioned in Remark 54.3.9, equivalence classes $[(\psi, L)] \in T_p(M)$ have the serendipitous advantage that
they are functions from $\mathrm{atlas}_p(M)$ to $T(\mathbb{R}^n)$. Therefore by Notation 10.2.9 for function values, $t_{p,v,\psi}(\psi)$
very conveniently selects the correct element $L \in T_{\psi(p)}(\mathbb{R}^n)$ corresponding to any given $\psi \in \mathrm{atlas}_p(M)$. The
consequence of this is Theorem 54.5.13, which gives an explicit construction for velocity chart maps.


54.5.13 THEOREM: Formula for velocity chart map in terms of tangent-line bundle definition. 
Let M be a C! manifold. Then 


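$\forall p \in M,\ \forall V \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M)$,
$\Phi(\psi)(V) = \partial_t (V(\psi)(t))\big|_{t=0}$.


PROOF: Let $p \in M$, $V \in T_p(M)$ and $\psi \in \mathrm{atlas}_p(M)$. Then by Theorem 54.3.5 (i, ii), $V = t_{p,v,\psi}$ for some
$v \in \mathbb{R}^n$. But $t_{p,v,\psi} = [(\psi, L_{\psi(p),v})]$ by Theorem 54.1.8 (v), which is a function from $\mathrm{atlas}_p(M)$ to $T(\mathbb{R}^n)$.
So $V(\psi)$ is a well-defined element of $T(\mathbb{R}^n)$, and in fact $V(\psi) \in T_{\psi(p)}(\mathbb{R}^n)$ and $\partial_t L(t)\big|_{t=0} = v$ for $L = V(\psi)$. Hence


$\Phi(\psi)(V) = \partial_t (V(\psi)(t))\big|_{t=0}$.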


54.5.14 THEOREM: Chart transition formula for the velocity chart map for a tangent bundle. 
Let M be a C! manifold with n = dim(M). Then 


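$\forall p \in M,\ \forall V \in T_p(M),\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M)$,
$\Phi(\psi_2)(V) = J_{21}(p)\, \Phi(\psi_1)(V)$.


(See Definition 51.4.18 for the chart transition matrix $J_{21}(p)$.) In other words,


$\forall p \in M,\ \forall V \in T_p(M),\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n$,


$\Phi(\psi_2)(V)^i = \sum_{j=1}^n J_{21}(p)^i{}_j\, \Phi(\psi_1)(V)^j$.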


PROOF: The assertion follows from Notation 54.5.7 and Theorem 54.1.11. 
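
The transition formula can be checked numerically by comparing the matrix product with the derivative
which appears in Definition 54.1.2. The following Python sketch is an illustration under assumptions which
are not in the text: $\psi_1$ is taken to be a Cartesian chart and $\psi_2$ a polar chart on a region of the plane, the
matrix $J_{21}(p)$ is written out by hand for this pair of charts, and the derivative is approximated by central
differences. The function names are hypothetical.

```python
import math

def to_polar(c):
    """Transition map psi_2 o psi_1^{-1}: Cartesian -> polar coordinates."""
    x, y = c
    return (math.hypot(x, y), math.atan2(y, x))

def transition_matrix(c):
    """J_21(p) for the assumed charts, at Cartesian coordinates c = psi_1(p)."""
    x, y = c
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r], [-y / r2, x / r2]]

def velocity_in_chart2(c, v1, h=1e-6):
    """Central-difference derivative of t -> psi_2(psi_1^{-1}(c + t*v1)) at t = 0."""
    plus = to_polar((c[0] + h * v1[0], c[1] + h * v1[1]))
    minus = to_polar((c[0] - h * v1[0], c[1] - h * v1[1]))
    return [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]

c = (1.0, 2.0)       # psi_1(p)
v1 = [0.3, -0.7]     # Phi(psi_1)(V)
J = transition_matrix(c)
predicted = [J[i][0] * v1[0] + J[i][1] * v1[1] for i in range(2)]
measured = velocity_in_chart2(c, v1)
print(predicted, measured)
```

The two printed pairs should agree to within the accuracy of the finite-difference approximation.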


54.5.15 REMARK: Alternative notations for velocity charts.
In some contexts in this book, the velocity chart (wv) for a chart y may be denoted as w for brevity. 
Similarly, the manifold chart W(wv) in Notation 54.5.21 may be denoted by «^ for brevity. 


54.5.16 DEFINITION: The tangent (line) (vector) bundle for an $n$-dimensional $C^1$ differentiable manifold
$M -\!\!< (M, A_M)$ is the tuple $(T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)})$ as follows.


(i) $T(M) = \bigcup_{p \in M} T_p(M)$.
(ii) $\pi : T(M) \to M$ is the projection map for the tangent fibration $(T(M), \pi, M)$.
(iii) $A^{\mathbb{R}^n}_{T(M)} = \{\Phi(\psi);\ \psi \in A_M\}$, where $\Phi$ is the velocity chart map for $T(M)$.


54.5.17 REMARK: Diagram of spaces and maps for a tangent vector bundle. 

Some of the maps and spaces in Definition 54.5.16 are illustrated in Figure 54.5.2. The superscript $\mathbb{R}^n$ on
the atlas notation $A^{\mathbb{R}^n}_{T(M)}$ signifies that this is a fibre bundle atlas with fibre space $\mathbb{R}^n$, not a differentiable
manifold atlas.


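[Figure: diagram of the spaces $\pi^{-1}(\mathrm{Dom}(\psi)) \subseteq T(M)$, $\mathrm{Dom}(\psi) \subseteq M$ and $\mathbb{R}^n$, with the maps $\pi$, $\psi \in A_M$ and $\Phi(\psi) \in A^{\mathbb{R}^n}_{T(M)}$.]


Figure 54.5.2 Spaces and maps for a tangent vector bundle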


54.5.18 REMARK: Charts for a differentiable manifold structure on a tangent bundle total space.
The charts $\Phi(\psi)$ in the tangent vector bundle atlas $A^{\mathbb{R}^n}_{T(M)}$, combined with the maps $\psi \circ \pi$ for $\psi \in A_M$,
induce a differentiable manifold structure on the total space $T(M)$. The charts in $A^{\mathbb{R}^n}_{T(M)}$ define the vertical
components of tangent vectors, whereas the corresponding maps $\psi \circ \pi$ define the horizontal components.
Thus each tangent vector on the manifold can be identified uniquely by $2n$ real components, the first $n$
components identifying the base point and the second $n$ components identifying individual vectors within
the tangent space at each base point.


Strictly speaking, a tuple concatenation operator $Q_{n,n} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^{2n}$ as in Definition 16.4.3 is required
for the expression for the manifold chart maps in Notation 54.5.21 line (54.5.5). However, such concatenation
of real tuples is generally applied without comment.


54.5.19 DEFINITION: The tangent (vector) bundle (total space) manifold chart for a $C^1$ manifold $M$,
corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $\pi^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^n \times \mathbb{R}^n \equiv \mathbb{R}^{2n}$ defined by
$t_{p,v,\psi} \mapsto (\psi(p), v)$ for all $p \in \mathrm{Dom}(\psi)$ and $v \in \mathbb{R}^n$, where $\pi$ is the projection map for the tangent vector
fibration $(T(M), \pi, M)$.


54.5.20 DEFINITION: The manifold chart map for the tangent vector bundle total space $T(M)$ of a $C^1$
manifold $M$ is the map from $\mathrm{atlas}(M)$ to $\bigcup_{\psi \in \mathrm{atlas}(M)} (\pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{2n})$ which maps each chart in
$\mathrm{atlas}(M)$ to the corresponding tangent vector bundle total space manifold chart. (See Definition 54.5.19.)


54.5.21 NOTATION: Manifold chart map for tangent vector bundle total space.
$\Psi$, for a $C^1$ manifold $M$, denotes the manifold chart map for the tangent vector bundle total space $T(M)$.
In other words, for each $\psi \in \mathrm{atlas}(M)$, the map $\Psi(\psi) : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{2n}$ is defined by


$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\quad \Psi(\psi)(t_{p,v,\psi}) = (\psi(p), v)$, (54.5.4)


where $n = \dim(M)$. In other words,


$\forall \psi \in \mathrm{atlas}(M),\quad \Psi(\psi) = Q_{n,n} \circ ((\psi \circ \pi) \times \Phi(\psi))$ (54.5.5)
$\qquad \equiv (\psi \circ \pi) \times \Phi(\psi)$, (54.5.6)


where $Q_{n,n} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^{2n}$ is the concatenation map in Definition 16.4.3.


54.5.22 DEFINITION: The tangent bundle (total space) manifold atlas for the tangent-line vector bundle
$(T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)})$ of a $C^1$ manifold $M -\!\!< (M, A_M)$ with $\dim(M) = n$ is the manifold atlas $A_{T(M)}$ on
$T(M)$ defined by


$A_{T(M)} = \{Q_{n,n} \circ ((\psi \circ \pi) \times \Phi(\psi));\ \psi \in A_M\}$
$\qquad = \{\Psi(\psi);\ \psi \in A_M\}$.


54.5.23 THEOREM: Expressions for projections of tangent bundle manifold atlases. 
Let M be a C! manifold with n = dim(M). Then 


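$\forall \psi \in \mathrm{atlas}(M),\ \forall V \in \pi^{-1}(\mathrm{Dom}(\psi)),\quad \Pi_1^n(\Psi(\psi)(V)) = \psi(\pi(V))$.
$\forall \psi \in \mathrm{atlas}(M),\ \forall V \in \pi^{-1}(\mathrm{Dom}(\psi)),\quad \Pi_{n+1}^{2n}(\Psi(\psi)(V)) = \Phi(\psi)(V)$.
(See Notation 11.5.26 for the Cartesian-product projection maps $\Pi$.)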


PROOF: The equalities follow directly from Definition 54.5.22. 
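
The concatenation of base point coordinates and velocity coordinates, and the recovery of each part by
projection, may be sketched as follows. This Python sketch is an illustration only. It assumes $n = 2$, a
hypothetical polar chart on the open right half-plane, and a tangent vector which is given as a pair
consisting of its base point and its Cartesian components. None of these choices is prescribed by the text.

```python
import math

N = 2  # dimension of the hypothetical base manifold

def psi(p):
    """Polar chart on the open right half-plane (assumed example chart)."""
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))

def phi_psi(p, v_cart):
    """Velocity chart: polar components of a vector given by Cartesian components."""
    x, y = p
    r2 = x * x + y * y
    return ((x * v_cart[0] + y * v_cart[1]) / math.sqrt(r2),
            (-y * v_cart[0] + x * v_cart[1]) / r2)

def concat(a, b):
    """Tuple concatenation Q_{n,n} from R^n x R^n to R^{2n}."""
    return tuple(a) + tuple(b)

def bundle_chart(p, v_cart):
    """Psi(psi)(V) = Q_{n,n}(psi(pi(V)), Phi(psi)(V)) for V given as (p, v_cart)."""
    return concat(psi(p), phi_psi(p, v_cart))

coords = bundle_chart((1.0, 1.0), (0.0, 2.0))
# The projections onto the first n and the last n components recover the two parts.
base_part, velocity_part = coords[:N], coords[N:]
print(base_part, velocity_part)
```

The first $n$ components reproduce $\psi(\pi(V))$ and the last $n$ components reproduce $\Phi(\psi)(V)$, as in the two
equalities of Theorem 54.5.23.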


54.5.24 REMARK: The construction of one tangent bundle chart for each base space chart. 
The tangent bundle charts in Definition 54.5.22 are illustrated in Figure 54.5.3. 


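[Diagram: the tangent bundle chart $\Psi(\psi) = Q_{n,n} \circ ((\psi \circ \pi) \times \Phi(\psi))$ maps $\pi^{-1}(\mathrm{Dom}(\psi)) \subseteq T(M)$ to $\mathbb{R}^{2n}$; the projection $\pi$ maps $T(M)$ to $M$; the chart $\psi$ maps $\mathrm{Dom}(\psi) \subseteq M$ to $\mathbb{R}^n$.]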


Figure 54.5.3 Standard manifold atlas for a tangent bundle 


The map $\Psi : A_M \to A_{T(M)}$ is a bijection from base-space manifold charts to tangent bundle total space
manifold charts. (The total space manifold atlas in Definition 54.5.22 is related to the flat-space tangent
bundle which is illustrated in Figure 53.3.2 in Remark 53.3.14.)


54.5.25 REMARK: The differentiable manifold topology for a tangent bundle. 

To construct a differentiable structure for the tangent bundle in Definition 54.5.16 requires more than just the
manifold atlas in Definition 54.5.22. According to Definitions 51.3.7 and 51.3.8 for differentiable locally
Cartesian spaces and manifolds respectively, a topology is also required. The differentiable structure for
$T(M)$ must be a triple of the form $((T(M), T_{T(M)}), A_{T(M)})$, where $T_{T(M)}$ is a topology on $T(M)$.

In principle, the topology $T_{T(M)}$ could be constructed on $T(M)$ first, and then the atlas $A_{T(M)}$ could
be constructed on the topological manifold structure $(T(M), T_{T(M)})$. In practice, the topology is rarely
constructed on a manifold before the atlas. An atlas is an easily understood structure which requires little
abstract thinking for its definition. The topology is a high abstraction which is typically induced by an atlas
in practice. The task of constructing a differentiable structure for a tangent bundle is no exception. The most
convenient construction order is the atlas first, followed by the topology.


It might be thought that the topology $T_{T(M)}$ can be specified fairly easily in terms of the topology $T_M$
on $M$. An attempt in this direction might involve a direct product topology as in Definition 32.9.4. Then
the topology $T_{T(M)}$ would be generated by sets of the form $\Omega \times G$ for $\Omega \in T_M$ and $G \in \mathrm{Top}(\mathbb{R}^n)$. It is not
possible to use open sets $G \in \mathrm{Top}(T_p(M))$ for the products $\Omega \times G$ because the space $T_p(M)$ is “variable” with
respect to $p$. But $\mathbb{R}^n$ is also “variable” because the map which identifies it with $T_p(M)$ is a variable map
with respect to $p$. This identification map is in fact a tangent bundle chart. So this approach is actually the
same as defining the tangent bundle atlas first and then inducing the topology on the tangent bundle from
this atlas. So it seems to be inescapable that the atlas must be constructed before the topology.

Definition 54.5.26 combines the tangent bundle set $T(M)$ from Notation 54.1.4 with the tangent bundle
manifold atlas $A_{T(M)}$ from Definition 54.5.22 and the topology $T_{T(M)}$ which is induced by $A_{T(M)}$ on $T(M)$
according to Definition 49.8.12, or alternatively Definition 51.5.8, which is essentially identical.

54.5.26 DEFINITION: The tangent bundle (total space) manifold of a $C^1$ manifold $M$ is the differentiable
manifold $((T(M), T_{T(M)}), A_{T(M)})$, where $A_{T(M)}$ is the tangent bundle manifold atlas for $T(M)$ and $T_{T(M)}$
is the topology induced by $A_{T(M)}$ on $T(M)$.


54.5.27 REMARK: Verification of validity of tangent bundle differentiable manifold structure. 

The validity of the differentiable manifold structure in Definition 54.5.26 is verified in Theorem 54.5.28. Since
the topology $T_{T(M)}$ is constructed after the atlas $A_{T(M)}$, it is not possible to use Definition 51.3.2 to test the
validity of the atlas in terms of a pre-defined locally Cartesian space topology. Therefore Definition 51.5.2
is used for the test, since it is meaningful for the set $T(M)$.


Although it may seem that verification that $T_{T(M)}$ is a valid topology on $T(M)$ is an easy step, this is somewhat
illusory because the hard work of constructing a valid topology from an atlas is done in Theorem 49.8.8,
whose proof is not entirely trivial.


Concerning the Hausdorff condition, the proof of Theorem 54.5.28 (ii) fails if $T_M$ is not a Hausdorff topology
on $M$ because, for example, if two points $p_1, p_2 \in M$ do not have a disjoint open cover according to
Definition 33.1.24, then the zero vectors at these two points also will not have a disjoint open cover. For this
reason, if $M$ is not Hausdorff, then $T(M)$ is not Hausdorff.


54.5.28 THEOREM: The tangent bundle on a $C^{k+1}$ manifold is a $C^k$ manifold.
Let $M \mathrel{-\!\!<} (M, A_M) \mathrel{-\!\!<} ((M, T_M), A_M)$ be a $C^{k+1}$ manifold with $k \in \mathbb{Z}_0^+$.
(i) The tangent bundle manifold atlas $A_{T(M)}$ for $T(M)$ is a $C^k$ locally Cartesian atlas for the set $T(M)$.
(ii) The topology $T_{T(M)}$ induced on $T(M)$ by $A_{T(M)}$ is a Hausdorff topology on $T(M)$.
(iii) $((T(M), T_{T(M)}), A_{T(M)})$ is a $C^k$ differentiable manifold.
(iv) The projection map $\pi : T(M) \to M$ is $C^k$ differentiable.


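PROOF: For part (i), by Definition 54.5.22, $A_{T(M)} = \{\Psi(\psi);\ \psi \in A_M\}$ with $\Psi(\psi) = Q_{n,n} \circ ((\psi \circ \pi) \times \Phi(\psi))$
for all $\psi \in A_M$, where $n = \dim(M) \in \mathbb{Z}_0^+$. It is customary to tacitly apply the concatenation map $Q_{n,n}$ to
the product $\mathrm{Range}(\psi \circ \pi) \times \mathbb{R}^n$. Thus one may write $\Psi(\psi) = (\psi \circ \pi) \times \Phi(\psi)$.

Any element of $A_{T(M)}$ has the form $\hat\psi = \Psi(\psi)$ for some $\psi \in A_M$. Then $\hat\psi = (\psi \circ \pi) \times \Phi(\psi)$ is a function
with $\mathrm{Dom}(\hat\psi) = \pi^{-1}(\mathrm{Dom}(\psi))$ and $\mathrm{Range}(\hat\psi) = \mathrm{Range}(\psi \circ \pi) \times \mathbb{R}^n$, where $\mathrm{Range}(\psi \circ \pi) = \mathrm{Range}(\psi)$ by
Theorem 10.10.13 (viii) because $\mathrm{Dom}(\psi) \subseteq \mathrm{Range}(\pi) = M$. Let $\Omega = \mathrm{Range}(\psi)$. Then $\Omega \in \mathrm{Top}(\mathbb{R}^n)$. So
$\mathrm{Range}(\hat\psi) = \Omega \times \mathbb{R}^n \in \mathrm{Top}(\mathbb{R}^{2n})$. Therefore $\hat\psi : \pi^{-1}(\mathrm{Dom}(\psi)) \to \Omega \times \mathbb{R}^n$ is a bijection. So $A_{T(M)}$ satisfies
Definition 51.5.2 (i) for the point set $T(M)$.

$\bigcup_{\hat\psi \in A_{T(M)}} \mathrm{Dom}(\hat\psi) = \bigcup_{\psi \in A_M} \pi^{-1}(\mathrm{Dom}(\psi)) = T(M)$ because $\mathrm{Dom}(\pi) = T(M)$ and $\mathrm{Range}(\pi) = M =
\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi)$. So $A_{T(M)}$ satisfies Definition 51.5.2 (ii) for the point set $T(M)$.

Let $\hat\psi_1 = \Psi(\psi_1)$ and $\hat\psi_2 = \Psi(\psi_2)$. Then $\hat\psi_1(\mathrm{Dom}(\hat\psi_2)) = \hat\psi_1(\pi^{-1}(\mathrm{Dom}(\psi_2))) = \psi_1(\mathrm{Dom}(\psi_2)) \times \mathbb{R}^n$,
where $\psi_1(\mathrm{Dom}(\psi_2)) \in \mathrm{Top}(\mathbb{R}^n)$ because $\mathrm{Dom}(\psi_2) \in \mathrm{Top}(M)$ and $\psi_1$ is a homeomorphism. Consequently
$\hat\psi_1(\mathrm{Dom}(\hat\psi_2)) \in \mathrm{Top}(\mathbb{R}^{2n})$. So $A_{T(M)}$ satisfies Definition 51.5.2 (iii) for the point set $T(M)$.

Let $\hat\psi_1 = \Psi(\psi_1)$ and $\hat\psi_2 = \Psi(\psi_2)$ with $\psi_1, \psi_2 \in A_M$. Then $\hat\psi_2 \circ \hat\psi_1^{-1}$ maps $\Omega_1 \times \mathbb{R}^n$ to $\Omega_2 \times \mathbb{R}^n$, where
$\Omega_\alpha = \mathrm{Range}(\psi_\alpha)$ for $\alpha = 1, 2$, according to the rule


$$\forall (x, v) \in \Omega_1 \times \mathbb{R}^n, \quad (\hat\psi_2 \circ \hat\psi_1^{-1})(x, v) = \Bigl(\psi_2(\psi_1^{-1}(x)),\ \Bigl(\sum_{j=1}^n v^j\, \partial_{x^j}\bigl(\psi_2^i(\psi_1^{-1}(x))\bigr)\Bigr)_{i=1}^n\Bigr).$$


This is $C^k$ because $\psi_2 \circ \psi_1^{-1}$ is $C^{k+1}$ by Definition 51.3.2 (iii). So $A_{T(M)}$ satisfies Definition 51.5.2 (iv) for
the point set $T(M)$. Hence $A_{T(M)}$ is a $C^k$ locally Cartesian atlas for $T(M)$ by Definition 51.5.2.

For part (ii), the topology $T_{T(M)}$ is induced on $T(M)$ according to Definition 49.8.12. Then $T_{T(M)}$ is a valid
topology on $T(M)$ by Theorem 49.8.8 (ii). The Hausdorff property for $T_{T(M)}$ follows from the Hausdorff
property for $T_M$ by Theorem 33.1.35 (iii) applied to any chart $\hat\psi \in A_{T(M)}$ for two vectors $V_1, V_2 \in \mathrm{Dom}(\hat\psi)$. If
there is no common chart in $A_{T(M)}$ for $V_1, V_2 \in T(M)$, then a disjoint open cover for $V_1$ and $V_2$ can be the pair
$(\mathrm{Dom}(\hat\psi_1), \mathrm{Dom}(\hat\psi_2))$ for any $\hat\psi_1 \in \mathrm{atlas}_{V_1}(T(M))$ and $\hat\psi_2 \in \mathrm{atlas}_{V_2}(T(M))$, where $\mathrm{atlas}(T(M)) = A_{T(M)}$.
Part (iii) follows from parts (i) and (ii) and Definition 51.5.15.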

For part (iv), $\pi : T(M) \to M$ is $C^k$ differentiable if $\psi \circ \pi \circ \Psi(\psi)^{-1} : \mathrm{Range}(\Psi(\psi)) \to \mathrm{Range}(\psi)$ is a $C^k$ map
between the Cartesian spaces $\mathbb{R}^{2n}$ and $\mathbb{R}^n$, where $n = \dim(M)$. (This follows from Definition 52.1.2.) Let
$(x, v) \in \mathrm{Range}(\Psi(\psi))$ with $x, v \in \mathbb{R}^n$. Then by Notation 54.5.21, $\Psi(\psi)^{-1}(x, v) = t_{p,v,\psi}$, where $p = \psi^{-1}(x)$.
So $(\psi \circ \pi \circ \Psi(\psi)^{-1})(x, v) = \psi(\pi(t_{p,v,\psi})) = x$ by Definition 54.5.4. Therefore $\psi \circ \pi \circ \Psi(\psi)^{-1}$ is a $C^\infty$ map
between Cartesian spaces by Theorem 42.6.18. Hence $\pi : T(M) \to M$ is a $C^k$ map.
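
The chart transition rule used for part (i) can be checked numerically in a simple case. The following is a minimal sketch, not part of the book's formal development. It assumes two hypothetical charts on the punctured plane, a polar chart and the Cartesian identity chart, so that the base-space transition map is $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$. The function names `base_transition`, `jacobian` and `tangent_transition` are illustrative only.

```python
import math

def base_transition(x):
    """Base-space transition psi2 o psi1^{-1}: polar (r, theta) -> Cartesian (x, y)."""
    r, theta = x
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(x):
    """Matrix of partial derivatives of base_transition at x (row i, column j)."""
    r, theta = x
    return ((math.cos(theta), -r * math.sin(theta)),
            (math.sin(theta), r * math.cos(theta)))

def tangent_transition(x, v):
    """Tangent bundle chart transition: (x, v) -> (psi2(psi1^{-1}(x)), J(x) v)."""
    J = jacobian(x)
    w = tuple(sum(J[i][j] * v[j] for j in range(2)) for i in range(2))
    return base_transition(x), w

# A unit radial velocity at the point with polar components (2, pi/2).
y, w = tangent_transition((2.0, math.pi / 2), (1.0, 0.0))
```

The velocity components are transformed by the first derivatives of the base-space transition map. This is why the transition maps of the tangent bundle atlas have one order of differentiability less than those of the base-space atlas.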


54.5.29 REMARK: Tangent bundles may be regarded as a subspecies of differentiable fibre bundles. 
The atlases $A_M$, $A_{T(M)}$ and $A^{\mathbb{R}^n}_{T(M)}$ are required for the tangent bundle in Definition 54.5.30. Some of the
maps and spaces in Definition 54.5.30 are illustrated in Figure 54.5.4.


[Diagram of the tangent fibre bundle spaces and maps. The original figure is not recoverable from the extracted text.]


Figure 54.5.4 Tangent fibre bundle spaces and maps 


54.5.30 DEFINITION: The tangent bundle of an $n$-dimensional $C^1$ manifold $M \mathrel{-\!\!<} (M, A_M)$ is the tuple
$(T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)}) \mathrel{-\!\!<} (T(M), A_{T(M)}, \pi, M, A_M, A^{\mathbb{R}^n}_{T(M)})$ as follows.


(i) $(T(M), A_{T(M)}, \pi, M, A_M)$ is the tangent fibration of $M$ with the tangent bundle manifold atlas $A_{T(M)}$
as in Definition 54.5.22.


(ii) $A^{\mathbb{R}^n}_{T(M)} = \{\Phi(\psi);\ \psi \in A_M\}$, where for each $\psi \in A_M$, the fibre chart $\Phi(\psi) : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$ is
defined to be the velocity chart $\Phi(\psi) : t_{p,v,\psi} \mapsto v$ as in Notation 54.5.7.


54.5.31 REMARK: The tangent bundle total space manifold atlas maintains maximum information. 

It is tempting to think that a tangent bundle should not be defined to have a specific manifold atlas.
Definition 54.5.16 may seem perhaps to be overly restrictive in prescribing specific atlases. However, it would
not be strictly correct to define the differentiable structure on $T(M)$ as some equivalence class of atlases
equivalent to $A_{T(M)}$, or some sort of maximal atlas, because that would weaken the detailed regularity of
the specific atlas $\{\Psi(\psi);\ \psi \in A_M\}$. There are infinitely many levels and gradations of regularity between
classes $C^k$ and $C^{k+1}$. Any addition of charts to the differentiable structure would weaken the regularity,
possibly losing some property of interest.


The regularity inherited from the base space M might not be only some form of differentiability. For example, 
the atlas for M may satisfy some transformation group invariance property such as local orthogonality or 
conformality of the transition maps. 


54.5.32 REMARK: The topology on the tangent bundle total space is induced by its atlas. 

The definition of the induced topology on a tangent bundle total space is related to Theorem 49.8.8 and
Definition 49.8.12, which express a weak induced topology on a set in terms of inverses of arbitrary functions
on the set. The tangent bundle in Definition 54.5.30 becomes a topological $(GL(n), \mathbb{R}^n)$ fibre bundle when
the atlases $A_{T(M)}$ and $A_M$ are replaced with their induced topologies $T_{T(M)}$ and $T_M$. It cannot be shown
that the tangent bundle is a differentiable fibre bundle before this concept is defined in Section 64.8.


The proof given here for Theorem 54.5.33 is somewhat sketchy because the mechanical details are somewhat
tedious to write out in full. The interesting, and slightly non-trivial, part of the proof is the observation that
the chart transition rule for tangent vectors on a $C^1$ manifold is given by an invertible matrix in $GL(n)$, and
this matrix is $C^0$ because the manifold is $C^1$. Therefore Definition 47.6.5 (v) is satisfied.

In the case of the tangent bundle $T(M)$, the action by a matrix $g_{\phi_2,\phi_1}(p) \in GL(n, \mathbb{R})$ for fibre charts $\phi_1$
and $\phi_2$ is very simply the left action of the matrix on the real $n$-tuple $\phi_1(V) = \Phi(\psi_1)(V) \in \mathbb{R}^n$, which is
regarded as an $n \times 1$ column vector subjected to multiplication from the left as in Definition 25.3.7. Thus
it happens to be true that $g_{\phi_2,\phi_1}(p)(\Phi(\psi_1)(V)) = g_{\phi_2,\phi_1}(p)\,\Phi(\psi_1)(V)$ for the tangent bundle $T(M)$. In other
words, the group action is defined to be the matrix product from the left. This is not so for the associated
bundles which are built from the basic tangent bundle $T(M)$.


For associated bundles, such as the covector bundle T*(M) in Section 55.4, the tuple and frame bundles 
T" (M), JF" (M) and .F" (M) in Sections 55.5, 55.6 and 55.7, and the tensor bundles T^*(M) in Section 56.3, 
the same matrix $g_{\phi_2,\phi_1}(p) \in GL(n, \mathbb{R})$ is used for chart transition maps, but its action on fibre-set elements
is not so simple. However, the bundles are all “associated” by means of their common structure group. 


54.5.33 THEOREM: The tangent bundle of a $C^1$ manifold is a topological fibre bundle.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then the tangent bundle $T(M)$ for $M$ is a topological
$(GL(n), \mathbb{R}^n)$ fibre bundle.


PROOF: To show that $T(M)$ is a topological $(G, F)$ fibre bundle according to Definition 47.6.5 with
$G = GL(n)$ and $F = \mathbb{R}^n$, first note that $(G, F)$ is an effective topological left transformation group by
Theorem 39.6.4. The pair $(M, T_M)$ is a topological space, where $T_M$ is the underlying topology of the $C^1$
manifold $(M, A_M)$ according to Definition 51.3.14. Similarly, $(T(M), T_{T(M)})$ is a topological space, where
$T_{T(M)}$ is the topology underlying the $C^0$ manifold $(T(M), A_{T(M)})$ as in Theorem 54.5.28.


The projection map $\pi : T(M) \to M$ in Definition 54.5.30 is continuous by Theorem 32.10.7 (i) because, via
the charts in Definition 54.5.22, it is a projection map from a direct product to one of the components of
the direct product. Thus Definition 47.6.5 (i) is satisfied because $\pi(T(M)) = M$ by Definition 54.5.4.


Definition 47.6.5 (ii) is satisfied because the charts $\Psi(\psi)$ in Notation 54.5.21 are continuous. This follows
from the fact that the topology on $T(M)$ is defined by the charts themselves.


Definition 47.6.5 (iii) follows from the observation that the domains of the manifold charts for $T(M)$ cover
all of $M$ because they are the same as the domains for the charts for $M$.


For Definition 47.6.5 (iv), it must be shown that $\pi \times \Phi(\psi) : \pi^{-1}(U_\psi) \to U_\psi \times \mathbb{R}^n$ is a homeomorphism,
where $U_\psi = \pi(\mathrm{Dom}(\Phi(\psi)))$, for all $\psi \in \mathrm{atlas}(M)$. This follows from the observation that the differentiable
structure on $\pi^{-1}(U_\psi)$, and therefore also the topology on $\pi^{-1}(U_\psi)$, is induced by the charts in the manifold
atlas $A_{T(M)}$ in Definition 54.5.22.


For Definition 47.6.5 (v), it must be shown that any two fibre charts $\phi_1$ and $\phi_2$ on a given fibre set $\pi^{-1}(\{p\})$
of $T(M)$ are related to each other by a group element $g_{\phi_2,\phi_1}(p) \in G$, and that this group element is a
continuous function of $p \in M$. It follows from Theorem 54.5.14 and Definition 51.4.18 that


$$\forall p \in M,\ \forall V \in \pi^{-1}(\{p\}),\ \forall \psi_1, \psi_2 \in A_{M,p},$$
$$\Phi(\psi_2)(V) = J_{21}(p)\,\Phi(\psi_1)(V)$$
$$= g_{\phi_2,\phi_1}(p)(\Phi(\psi_1)(V)),$$


where $\phi_1 = \Phi(\psi_1)$, $\phi_2 = \Phi(\psi_2)$ and $g_{\phi_2,\phi_1}(p) = J_{21}(p) \in GL(n)$. The continuity of $g_{\phi_2,\phi_1}$ on $U_{\psi_1} \cap U_{\psi_2}$
follows from the formula in Definition 51.4.18 for $J_{21}(p)$, and Definition 42.5.11 for a $C^1$ function.


54.5.34 THEOREM: The tangent bundle of a $C^1$ manifold is a vector bundle.
Let $M$ be a $C^1$ manifold. Then $T(M)$ is a vector bundle.


PROOF: The assertion follows from Theorem 54.5.33 and Definition 24.11.2. 


54.5.35 REMARK: Possible confusion between tangent-line vector bundles and "line bundles". 

As mentioned in Remark 26.14.9, the $n$-dimensional “tangent-line bundles” on $n$-dimensional differentiable
manifolds in this book are not related to the one-dimensional “line bundles” in the fibre bundle literature.
(See Definition 65.2.12.)


54.6. Tangent vector embedding maps for submanifolds


54.6.1 REMARK: Identification of submanifold tangent vectors with ambient space tangent vectors. 

The various classes of submanifolds, embeddings, immersions and submersions in Sections 52.3, 52.4 and 52.5
differ in their ability to guarantee the meaningfulness of submanifold tangent vectors within the ambient
space. This topic arises in the context of differentiable fibre bundles $(E, \pi, M, A^F_E)$, where vectors on fibre
sets $E_p = \pi^{-1}(\{p\})$ are associated with vectors in the fibre space $F$, but the fibre sets are submanifolds of the
total space $E$. Each fibre set is diffeomorphic to the fibre space, but only with respect to the submanifold's
differentiable structure, which differs from the total space's differentiable structure. The dimensions of these
manifolds are different. Therefore tangent vectors on fibre sets have a different dimension to tangent vectors
on the total space. So clearly the vectors cannot be the same, although there is a clear identification between
them. In the example of $S^2$ embedded in $\mathbb{R}^3$, tangent vectors on $S^2$ have a clear identification with vectors
in $\mathbb{R}^3$. But they are not the same vectors. Consequently, identification maps must be defined, and conditions
on the embedding must be determined which imply the well-definition of these identification maps.


According to Definition 54.6.2, the embedding $\eta : T(S) \to T(M)$, for a regular $C^1$ submanifold $S$ of $M$,
simply adds an extra $n - m$ zeros to the $m$-tuple $v$ for any vector $t_{p,v,\psi} \in T(S)$, but this assumes that
the chart $\tilde\psi$ which is used for the ambient space $M$ is a suitable extension of $\psi$ from $\mathbb{R}^m$ to $\mathbb{R}^n$. (See
Definition 52.4.2 for regular $C^1$ submanifolds.)


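54.6.2 DEFINITION: The (submanifold) tangent vector embedding map for a regular $C^1$ submanifold $S$ of
a $C^1$ manifold $M$ is the map $\eta : T(S) \to T(M)$ given by


$$\forall p \in S,\ \forall v \in \mathbb{R}^m,\ \forall \psi \in \mathrm{atlas}_p(S),\ \forall \tilde\psi \in A(p, \psi),$$
$$\eta(t_{p,v,\psi}) = t_{p,(v,0_{\mathbb{R}^{n-m}}),\tilde\psi}, \tag{54.6.1}$$
where $n = \dim(M)$, $m = \dim(S)$, and


$$\forall p \in S,\ \forall \psi \in \mathrm{atlas}_p(S),$$
$$A(p, \psi) = \{\tilde\psi \in \mathrm{atlas}^1_p(M);\ \tilde\psi(S) = \mathrm{Range}(\tilde\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) \text{ and } \psi|_{\mathrm{Dom}(\tilde\psi)} = \Pi^n_m \circ \tilde\psi|_S\}.$$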


54.6.3 REMARK: The use of compatible charts for defining the tangent vector embedding map. 

Definition 54.6.2 uses $C^1$ compatible charts $\tilde\psi$ for the embedded tangent vectors $t_{p,(v,0_{\mathbb{R}^{n-m}}),\tilde\psi}$ in line (54.6.1).
This is not immediately meaningful according to Definition 54.1.2 because $\tilde\psi$ may not be in $\mathrm{atlas}(M)$. But
as mentioned in Remark 54.3.2, it is straightforward to extend the notation for particular tangent vectors to
allow the use of $C^1$ compatible charts as in Notation 54.3.3. Thus the tangent vector $t_{p,(v,0_{\mathbb{R}^{n-m}}),\tilde\psi}$ is well
defined, although $\tilde\psi$ may not be in $\mathrm{atlas}(M)$. This extended definition of tangent vectors is necessary because
$\mathrm{atlas}(M)$ may not possess any charts which are in the set $A(p, \psi)$. (For example, the circle $S^1$ cannot be
expressed at all points as the graph of a function with respect to a single chart for $\mathbb{R}^2$, and certainly no
point of $S^1$ has a neighbourhood which lies in the hyperplane $\mathbb{R}^1$ with respect to the identity chart on $\mathbb{R}^2$.)


Theorem 54.6.4 verifies that Definition 54.6.2 gives one and only one value for the tangent vector embedding
map $\eta : T(S) \to T(M)$.

54.6.4 THEOREM: Well-definition of the submanifold tangent vector embedding map.

Let $S$ be a regular $C^1$ submanifold of a $C^1$ manifold $M$. Then the submanifold tangent vector embedding
map $\eta : T(S) \to T(M)$ is well defined and independent of the choices for charts in line (54.6.1).


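PROOF: Let $n = \dim(M)$, $m = \dim(S)$, $p \in S$, $V \in T_p(S)$ and $\psi \in \mathrm{atlas}_p(S)$. Then $V = t_{p,v,\psi}$ for some
unique $v \in \mathbb{R}^m$ by Theorem 54.1.8 (xii, vii), and the set of charts $A(p, \psi)$ in Definition 54.6.2 is non-empty
by Theorem 52.4.4 (xiii). Therefore $t_{p,(v,0_{\mathbb{R}^{n-m}}),\tilde\psi} \in T_p(M)$ is well defined by Notation 54.1.4. Thus for each
choice of $\psi$ and $\tilde\psi$, the vector $\eta(V) \in T_p(M)$ is well defined for all $V \in T_p(S)$.


To show that $\eta(V)$ is independent of the choice of $\psi \in \mathrm{atlas}_p(S)$ and $\tilde\psi \in A(p, \psi)$, let $\psi_\ell \in \mathrm{atlas}_p(S)$ and
$\tilde\psi_\ell \in A(p, \psi_\ell)$ for $\ell = 1, 2$. Then $V = t_{p,v_\ell,\psi_\ell}$ for some unique $v_\ell \in \mathbb{R}^m$ for $\ell = 1, 2$. Then by Theorem 54.1.11,


$$\forall i \in \mathbb{N}_m, \quad v_2^i = \sum_{j=1}^m \partial_{x^j}\bigl(\psi_2^i(\psi_1^{-1}(x))\bigr)\Big|_{x=\psi_1(p)}\, v_1^j.$$


Let $w_\ell = (v_\ell, 0_{\mathbb{R}^{n-m}}) \in \mathbb{R}^n$ and $W_\ell = t_{p,w_\ell,\tilde\psi_\ell}$ for $\ell = 1, 2$. It must be shown that $W_1 = W_2$. By
Theorem 54.1.11, $W_1 = t_{p,\tilde w,\tilde\psi_2}$, where


$$\forall i \in \mathbb{N}_n, \quad \tilde w^i = \sum_{j=1}^n \partial_{x^j}\bigl(\tilde\psi_2^i(\tilde\psi_1^{-1}(x))\bigr)\Big|_{x=\tilde\psi_1(p)}\, w_1^j
= \sum_{j=1}^m \partial_{x^j}\bigl(\tilde\psi_2^i(\tilde\psi_1^{-1}(x))\bigr)\Big|_{x=\tilde\psi_1(p)}\, v_1^j.$$


By Theorem 52.4.15,
$$\forall i \in \mathbb{N}_n,\ \forall j \in \mathbb{N}_m, \quad
\partial_{x^j}\bigl(\tilde\psi_2^i(\tilde\psi_1^{-1}(x))\bigr)\Big|_{x=\tilde\psi_1(p)} =
\begin{cases}
\partial_{x^j}\bigl(\psi_2^i(\psi_1^{-1}(x))\bigr)\big|_{x=\psi_1(p)} & \text{if } i \in \mathbb{N}_m \\
0 & \text{if } i \notin \mathbb{N}_m.
\end{cases}$$
Therefore
$$\forall i \in \mathbb{N}_n, \quad \tilde w^i =
\begin{cases}
\sum_{j=1}^m \partial_{x^j}\bigl(\psi_2^i(\psi_1^{-1}(x))\bigr)\big|_{x=\psi_1(p)}\, v_1^j & \text{if } i \in \mathbb{N}_m \\
0 & \text{if } i \notin \mathbb{N}_m
\end{cases}
=
\begin{cases}
v_2^i & \text{if } i \in \mathbb{N}_m \\
0 & \text{if } i \notin \mathbb{N}_m.
\end{cases}$$


Thus $\tilde w = w_2$. So $W_1 = W_2$. Hence $\eta(V)$ is independent of the choices of charts in Definition 54.6.2.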
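
The chart-independence argument can be illustrated with a small numerical check. The sketch below is not taken from the book. It assumes a hypothetical example in which $S$ is the first coordinate axis of $M = \mathbb{R}^2$ (so $m = 1$ and $n = 2$), the first adapted chart is the identity, and the second adapted chart is $(x, y) \mapsto (x^3 + x, y e^x)$, which maps $S$ into $\mathbb{R} \times \{0\}$ and restricts to the chart $x \mapsto x^3 + x$ on $S$. The helper names `pad`, `jac_sub`, `jac_amb` and `apply` are illustrative only.

```python
import math

def pad(v, n):
    """Append n - len(v) zeros to the component tuple v."""
    return tuple(v) + (0.0,) * (n - len(v))

def jac_sub(a):
    """1x1 Jacobian of the submanifold transition x -> x**3 + x at x = a."""
    return ((3 * a * a + 1,),)

def jac_amb(a, y):
    """2x2 Jacobian of the ambient transition (x, y) -> (x**3 + x, y * exp(x)) at (a, y)."""
    return ((3 * a * a + 1, 0.0),
            (y * math.exp(a), math.exp(a)))

def apply(J, v):
    """Matrix-vector product J v, with J given as a tuple of rows."""
    return tuple(sum(row[j] * v[j] for j in range(len(v))) for row in J)

a = 0.5        # base point p = (a, 0) on S
v1 = (2.0,)    # components of V with respect to the first chart on S

# Pad with zeros in the first adapted chart, then change to the second adapted chart.
pad_then_transform = apply(jac_amb(a, 0.0), pad(v1, 2))
# Change chart on S first, then pad with zeros in the second adapted chart.
transform_then_pad = pad(apply(jac_sub(a), v1), 2)
```

In this example both routes give the same component tuple, because the lower-left entry of the ambient Jacobian vanishes on $S$. This block structure is the role played by Theorem 52.4.15 in the proof.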


54.6.5 REMARK: The “embedded tangent bundle” of a regular C! submanifold. 

An immediate consequence of Theorem 54.6.6 is the inclusion $\eta(T(S)) \subseteq \bigcup_{p \in S} T_p(M)$ for any regular $C^1$
submanifold $S$ of a $C^1$ manifold $M$. If $\dim(S) < \dim(M)$, this is a proper inclusion. (See Theorem 54.6.8
for the case $\dim(S) = \dim(M)$, where this inclusion is an equality.) The set $\eta(T(S))$ may be thought of as
the “embedded tangent bundle” of $S$ within $M$.


54.6.6 THEOREM: Submanifold tangent vector embedding maps are injective. 
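Let $M$ be a $C^1$ manifold. Let $S$ be a regular $C^1$ submanifold of $M$. Then the submanifold tangent vector
embedding map $\eta : T(S) \to \bigcup_{p \in S} T_p(M)$ for $S$ in $M$ is well-defined and injective.


PROOF: The well-definition of $\eta : T(S) \to \bigcup_{p \in S} T_p(M)$ follows from Definition 54.6.2 line (54.6.1). To
show injectivity, let $p_1, p_2 \in S$, $v_1, v_2 \in \mathbb{R}^m$, $\psi_1 \in \mathrm{atlas}_{p_1}(S)$ and $\psi_2 \in \mathrm{atlas}_{p_2}(S)$ satisfy $\eta(t_{p_1,v_1,\psi_1}) = \eta(t_{p_2,v_2,\psi_2})$, where
$m = \dim(S)$. Then $t_{p_1,(v_1,0_{\mathbb{R}^{n-m}}),\tilde\psi_1} = t_{p_2,(v_2,0_{\mathbb{R}^{n-m}}),\tilde\psi_2}$ for some $\tilde\psi_1 \in A(p_1, \psi_1)$ and $\tilde\psi_2 \in A(p_2, \psi_2)$, where
$n = \dim(M)$. So by Theorem 54.1.11,


$$\forall i \in \mathbb{N}_n, \quad (v_2, 0_{\mathbb{R}^{n-m}})^i = \sum_{j=1}^n \partial_{x^j}\bigl(\tilde\psi_2^i(\tilde\psi_1^{-1}(x))\bigr)\Big|_{x=\tilde\psi_1(p)}\, (v_1, 0_{\mathbb{R}^{n-m}})^j,$$


where $p = p_1 = p_2$. Therefore $t_{p_1,v_1,\psi_1} = t_{p_2,v_2,\psi_2}$ by Theorem 52.4.15. Hence $\eta : T(S) \to \bigcup_{p \in S} T_p(M)$ is
injective.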


54.6.7 REMARK: Open subsets of manifolds are submanifolds, but the tangent bundle is different. 

In the case dim(S) = dim(M) in Definition 54.6.2, the submanifold S is an open subset of the ambient 
space M. This follows from Theorems 52.3.10 and 52.4.10. Moreover, the submanifold's atlas is consistent 
with restrictions of the ambient space atlas. However, if $S \ne M$ then $T(S)$ is not a subset of $T(M)$. This
observation follows from the fact that all standard representations of tangent vectors are atlas-dependent.
(See Remark 53.3.1 for the most popular tangent vector representation styles.) The tangent vectors in $T(M)$
are parametrised, in one way or another, by charts in $\mathrm{atlas}(M)$, which must cover all of $M$. The tangent
vectors in $T(S)$ are parametrised by charts in $\mathrm{atlas}(S)$, which cannot cover all of $M$. Thus vectors in $T(M)$
have a parameter domain which is a strict superset of the parameter domain of vectors in $T(S)$, and two
functions which have a different domain are necessarily different functions.

There are some artificial ways of escaping from the assertion that $T(S) \not\subseteq T(M)$ when $S \ne M$. For example,
tangent vectors at $p \in M$ could be defined in terms of only those charts whose domain is included in the
topological component of $M$ which contains $p$. Then if $S$ is a topological component of $M$, the tangent
vectors in $T(S)$ will in fact be elements of $T(M)$. Thus this observation is not an absolute truth applying to
all possible representations of tangent vectors. However, any “truth” which depends on technical details of an
underlying representation is clearly unsatisfactory. This would be like making theorems depend on the choice
of representation of the real numbers. Abstraction and axiomatisation of definitions are intended to remove
such arbitrary, subjective “truth” from mathematics. If a question cannot be determined independent of an
arbitrary representation (or “model”), then the proposition can neither be proved true nor false, and must
be designated as “unknown” or “indeterminate”. Such propositions are non-removable gaps in knowledge.
One must simply grieve and move on.


Theorem 54.6.8 applies the submanifold tangent vector embedding map in Definition 54.6.2 to the special
case of an open subset of a $C^1$ manifold. The result is the very natural-looking formula on line (54.6.2),
which is how one would have defined the embedding map even in the absence of a tangent vector embedding
map for submanifolds with general dimension.


As mentioned in Remark 54.1.13, the locality of the point/components/chart specification for tangent vectors
ensures that the embedding map on line (54.6.2) gives the same vector $t_{p,v,\psi_1} = t_{p,v,\psi_2}$ if $\psi_1|_S = \psi_2|_S$.


54.6.8 THEOREM: Application of the tangent vector embedding map to open subsets of manifolds. 

Let $M$ be a $C^1$ manifold. Let $S$ be an open subset of $M$. Let $S \mathrel{-\!\!<} (S, A_S)$ be the restriction of the manifold
$M$ to $S$ as in Definition 52.4.16. Let $n = \dim(M)$. Then the submanifold tangent vector embedding map
for $S$ in $M$ is the unique map $\eta : T(S) \to T(M)$ which satisfies


$$\forall p \in S,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M), \quad \eta(t_{p,v,\psi|_S}) = t_{p,v,\psi}. \tag{54.6.2}$$


PROOF: Let $p \in S$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$. Let $\psi_S = \psi|_S$. Then $\psi_S \in \mathrm{atlas}_p(S)$ by Definition 52.4.16. As
in the proof of Theorem 52.4.17 (ii), $\psi \in \mathrm{atlas}^1_p(M)$ because $\mathrm{atlas}_p(M) \subseteq \mathrm{atlas}^1_p(M)$, $\psi(S) = \mathrm{Range}(\psi) \cap \mathbb{R}^n$
because $\mathrm{Range}(\psi) \subseteq \mathbb{R}^n$, and $\psi_S|_{\mathrm{Dom}(\psi)} = \psi|_S = \Pi^n_n \circ \psi|_S$. So $\psi \in A(p, \psi_S)$, where $A(p, \psi_S)$ is
given in Definition 54.6.2. Therefore by Definition 54.6.2, the submanifold tangent vector embedding map
$\eta : T(S) \to T(M)$ satisfies $\eta(t_{p,v,\psi|_S}) = t_{p,v,\psi}$, which verifies line (54.6.2). Since every vector in $T(S)$ is equal
to $t_{p,v,\psi|_S}$ for some $p \in S$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$, the map $\eta : T(S) \to T(M)$ is uniquely determined by
line (54.6.2).


54.6.9 THEOREM: The tangent vector embedding map of an open subset is a bijection. 
Let $M$ be a $C^1$ manifold. Let $S$ be an open subset of $M$. Then the submanifold tangent vector embedding
map $\eta : T(S) \to \bigcup_{p \in S} T_p(M)$ for $S$ in $M$ is a bijection. Hence $\eta(T(S)) = \bigcup_{p \in S} T_p(M)$.

PROOF: The well-definition and injectivity of the map $\eta : T(S) \to \bigcup_{p \in S} T_p(M)$ follow from Theorems
52.3.10 and 54.6.6. Surjectivity follows from line (54.6.2) in Theorem 54.6.8.


((2019-2-11. It seems clear that $\eta(T(S))$ is a regular $C^k$ submanifold of $T(M)$ for general regular $C^{k+1}$
submanifolds $S$ of $M$. This could possibly be useful when defining curvature of connections. If so, then the
natural location for the relevant theorem might be somewhere near here. It is clear that the map $\eta$ induces
a manifold structure from $T(S)$ onto $\eta(T(S))$. It must be shown that this is a regular $C^k$ embedding. This
could have some relevance to Theorem 58.5.6 also. ))


54.6.10 REMARK: Extended chart-parametrisation of submanifold vectors for the embedding map. 
Theorem 54.6.11 replaces $\mathrm{atlas}_p(S)$ in Definition 54.6.2 with $\mathrm{atlas}^1_p(S)$. This apparent extension of the map
$\eta : T(S) \to T(M)$ is not an extension at all. The only thing which is extended is the parametrisation of
submanifold tangent vectors $t_{p,v,\psi}$ by allowing the charts $\psi$ to be in the $C^1$ completion of the atlas on $S$.


The extension of tangent vector parametrisation from a given atlas on a $C^1$ manifold $S$ to its $C^1$ completion
$\mathrm{atlas}^1(S)$ is given by Notation 54.3.3, which makes $t_{p,v,\psi}$ meaningful for $\psi \in \mathrm{atlas}^1_p(S)$.


54.6.11 THEOREM: Extension of tangent vector embedding map to complete atlas on submanifold. 
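Let $M$ be a $C^1$ manifold. Let $S$ be a regular $C^1$ submanifold of $M$. Let $\eta : T(S) \to T(M)$ be the tangent
vector embedding map for $S$ in $M$. Then


$$\forall p \in S,\ \forall v \in \mathbb{R}^m,\ \forall \psi \in \mathrm{atlas}^1_p(S),\ \forall \tilde\psi \in A(p, \psi),$$
$$\eta(t_{p,v,\psi}) = t_{p,(v,0_{\mathbb{R}^{n-m}}),\tilde\psi},$$


where $n = \dim(M)$, $m = \dim(S)$, and


$$\forall p \in S,\ \forall \psi \in \mathrm{atlas}^1_p(S),$$
$$A(p, \psi) = \{\tilde\psi \in \mathrm{atlas}^1_p(M);\ \tilde\psi(S) = \mathrm{Range}(\tilde\psi) \cap (\mathbb{R}^m \times \{0_{\mathbb{R}^{n-m}}\}) \text{ and } \psi|_{\mathrm{Dom}(\tilde\psi)} = \Pi^n_m \circ \tilde\psi|_S\}.$$


PROOF: Let $p \in S$, $v \in \mathbb{R}^m$, $\psi \in \mathrm{atlas}^1_p(S)$ and $\tilde\psi \in A(p, \psi)$. Then $t_{p,v,\psi} \in T_p(S)$ is well defined by
Notation 54.3.3 and Theorem 54.3.5 (i).

Let $\psi_0 \in \mathrm{atlas}_p(S)$, which exists by Definition 51.3.2 (ii). Then $t_{p,v,\psi} = t_{p,v_0,\psi_0}$ for some $v_0 \in \mathbb{R}^m$ by
Theorem 54.1.8 (xi). Definition 54.6.2 implies that $\eta(t_{p,v_0,\psi_0}) = t_{p,(v_0,0_{\mathbb{R}^{n-m}}),\tilde\psi_0}$ for all $\tilde\psi_0 \in A(p, \psi_0)$. But
by Theorem 54.1.11, $t_{p,(v_0,0_{\mathbb{R}^{n-m}}),\tilde\psi_0} = t_{p,(w_1,w_2),\tilde\psi}$, where $(w_1, w_2) \in \mathbb{R}^m \times \mathbb{R}^{n-m}$ satisfies


$$\forall i \in \mathbb{N}_m, \quad w_1^i = \sum_{j=1}^m v_0^j\, \partial_{x^j}\bigl(\tilde\psi^i(\tilde\psi_0^{-1}(x))\bigr)\Big|_{x=\tilde\psi_0(p)} + \sum_{j=m+1}^n 0 \cdot \partial_{x^j}\bigl(\tilde\psi^i(\tilde\psi_0^{-1}(x))\bigr)\Big|_{x=\tilde\psi_0(p)}$$
$$= \sum_{j=1}^m v_0^j\, \partial_{x^j}\bigl(\tilde\psi^i(\tilde\psi_0^{-1}(x))\bigr)\Big|_{x=\tilde\psi_0(p)}$$
$$= \sum_{j=1}^m v_0^j\, \partial_{x^j}\bigl(\psi^i(\psi_0^{-1}(x))\bigr)\Big|_{x=\psi_0(p)} \tag{54.6.3}$$
$$= v^i, \tag{54.6.4}$$


where line (54.6.3) follows from Theorem 52.4.15, and line (54.6.4) follows from Theorem 54.1.11 because
$t_{p,v,\psi} = t_{p,v_0,\psi_0}$. Similarly,
$$\forall i \in \mathbb{N}_{n-m}, \quad w_2^i = \sum_{j=1}^m v_0^j\, \partial_{x^j}\bigl(\tilde\psi^{m+i}(\tilde\psi_0^{-1}(x))\bigr)\Big|_{x=\tilde\psi_0(p)} + \sum_{j=m+1}^n 0 \cdot \partial_{x^j}\bigl(\tilde\psi^{m+i}(\tilde\psi_0^{-1}(x))\bigr)\Big|_{x=\tilde\psi_0(p)}$$
$$= \sum_{j=1}^m v_0^j\, \partial_{x^j}\bigl(\tilde\psi^{m+i}(\tilde\psi_0^{-1}(x))\bigr)\Big|_{x=\tilde\psi_0(p)}$$
$$= 0$$


by Theorem 52.4.15. Hence $\eta(t_{p,v,\psi}) = \eta(t_{p,v_0,\psi_0}) = t_{p,(v_0,0_{\mathbb{R}^{n-m}}),\tilde\psi_0} = t_{p,(v,0_{\mathbb{R}^{n-m}}),\tilde\psi}$.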


54.7. Tangent bundle product identification maps 


54.7.1 REMARK: Direct products of tangent spaces. 

Direct products of tangent spaces often appear in differential geometry, particularly in various constructions 
for differentiable fibre bundles. Defining these direct products is not difficult, but some choices for notation 
and terminology are not completely obvious. The closely related situation for direct products of tangent 
spaces of Cartesian spaces is discussed in Remark 26.15.1. The Cartesian space direct products of tangent 
spaces suggest how to define tangent space direct products for differentiable manifolds. 


Definition 26.15.6 gives the name “concatenation of tangent vectors” or “tangent vector concatenation”
to the vector denoted $(L_1, L_2)$ in Notation 26.15.7 for the vector in the direct product space whose base
point and velocity are constructed as concatenations of the parameters of $L_1$ and $L_2$. The direct product
identification map in Definition 54.7.2 is the tangent space version of the identification map for Cartesian
spaces in Definition 26.15.2. (Definition 54.7.6 is the tangent bundle version of Definition 54.7.2.)


54.7.2 DEFINITION: Pointwise identification map between vector pairs and direct-product vectors.
The direct product identification map for tangent spaces $T_{p_1}(M_1)$ and $T_{p_2}(M_2)$ of $C^1$ manifolds $M_1$ and $M_2$,
at points $p_1 \in M_1$ and $p_2 \in M_2$, is the map $i : T_{p_1}(M_1) \times T_{p_2}(M_2) \to T_p(M_1 \times M_2)$, where $M_1 \times M_2$ is the
direct product of $M_1$ and $M_2$ (as in Definition 52.6.2), $p = (p_1, p_2) \in M_1 \times M_2$, and $i$ is defined by


$$\forall \psi_1 \in \mathrm{atlas}_{p_1}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{p_2}(M_2),\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall v_2 \in \mathbb{R}^{n_2},$$
$$i(t_{p_1,v_1,\psi_1}, t_{p_2,v_2,\psi_2}) = t_{(p_1,p_2),(v_1,v_2),\psi_1 \times \psi_2},$$


where $n_1 = \dim(M_1)$, $n_2 = \dim(M_2)$, and $(v_1, v_2)$ denotes the tuple concatenation $\mathrm{concat}(v_1, v_2)$ for all
$v_1 \in \mathbb{R}^{n_1}$ and $v_2 \in \mathbb{R}^{n_2}$.

54.7.3 REMARK: Applicability to common-domain direct products of maps between manifolds. 

As mentioned in Remark 52.7.4, common-domain direct products of maps are frequently encountered as local
trivialisations in the basic definitions for differentiable fibre bundles, namely as products of projection maps $\pi$
and fibre charts $\phi$. Then the question arises as to how to combine the outputs of the differentials of two such
maps in a single tangent bundle for the product space. (See Definition 58.4.5 for the pointwise differential of
a map between manifolds.) For example, the differential of $\pi$ would produce an output $(d\pi)_z(y) \in T_{\pi(z)}(M)$,
and the differential of $\phi$ would produce an output $(d\phi)_z(y) \in T_{\phi(z)}(F)$, where $y \in T_z(E)$, for differentiable
manifolds $E$, $M$ and $F$. These outputs must be related to the output $(d(\pi \times \phi))_z(y) \in T_{(\pi \times \phi)(z)}(M \times F)$
from the differential of $\pi \times \phi$, the common-domain product of $\pi$ and $\phi$. Definition 54.7.2 is clearly applicable
to this requirement since $(\pi \times \phi)(z) = (\pi(z), \phi(z))$. (See Theorem 58.7.7 for further details.)


54.7.4 REMARK: Comparison of direct products of tangent spaces for Cartesian spaces and manifolds. 
The chart products $\psi_1 \times \psi_2$ in Definition 54.7.2 are double-domain function products as in Definition 10.14.3.
Therefore $\mathrm{Dom}(\psi_1 \times \psi_2) = \mathrm{Dom}(\psi_1) \times \mathrm{Dom}(\psi_2)$, and so $(p_1, p_2) \in \mathrm{Dom}(\psi_1 \times \psi_2)$. This differs from the way
in which tangent-vector base-points in Cartesian spaces are concatenated in Definition 26.15.2. In the case
of manifolds, the points are abstract. Since their construction is unknown, it is not possible to construct an
explicit combined point $p$ from $p_1$ and $p_2$. So the abstract ordered pair $(p_1, p_2)$ is used instead. However,
the notation “$(v_1, v_2)$” does mean a concatenation of tuples, as described in Remark 26.15.1.


This is consistent with Definition 54.1.2, by which a tangent vector on a $C^1$ manifold is identified with
a tangent vector in the Cartesian space of a manifold chart. Thus Definition 54.7.2 maps the vector-pair
$(t_{p_1,v_1,\psi_1}, t_{p_2,v_2,\psi_2})$ to the vector $t_{(p_1,p_2),(v_1,v_2),\psi_1 \times \psi_2}$, which has the same velocity tuple $(v_1, v_2) \in \mathbb{R}^{n_1+n_2}$ as
it would have had if Definition 26.15.2 had been applied to the chart spaces $\mathbb{R}^{n_1}$, $\mathbb{R}^{n_2}$ and $\mathbb{R}^{n_1+n_2}$. In other
words, Definition 54.7.2 follows almost as a theorem from Definition 26.15.2 for Cartesian spaces by applying
the charts.
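
The difference between pairing base points and concatenating velocity tuples can be made concrete with a small data-structure sketch. The following is an illustration only, under simplifying assumptions. A tangent vector is modelled by a hypothetical `TangentVector` record holding a base point, a component tuple and a chart label, and the equivalence between representations of the same vector in different charts is ignored. The function name `identify` is also hypothetical.

```python
from typing import NamedTuple, Tuple

class TangentVector(NamedTuple):
    point: object                  # abstract base point p (any hashable label)
    velocity: Tuple[float, ...]    # component tuple v with respect to the chart
    chart: object                  # label standing in for the chart psi

def identify(t1: TangentVector, t2: TangentVector) -> TangentVector:
    """Model of the pointwise direct product identification map.

    Base points and chart labels are combined as ordered pairs, because
    abstract points cannot be concatenated. Velocity tuples are concatenated.
    """
    return TangentVector(
        point=(t1.point, t2.point),
        velocity=t1.velocity + t2.velocity,
        chart=(t1.chart, t2.chart),
    )

t = identify(TangentVector("p1", (1.0, 2.0), "psi1"),
             TangentVector("p2", (3.0,), "psi2"))
```

Here `t.velocity` has length $n_1 + n_2 = 3$, while `t.point` remains an ordered pair of the two abstract points.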


54.7.5 REMARK: Direct products of tangent bundles. 

It often happens in differential geometry that a differentiable manifold $M$ is the direct product of two
differentiable manifolds $M_1$ and $M_2$ as in Definition 52.6.2. The definition of the tangent bundle for such
a direct product is automatic because the direct product of $C^k$ manifolds $M_1$ and $M_2$ with $n_1 = \dim(M_1)$
and $n_2 = \dim(M_2)$ is a $C^k$ manifold $M = M_1 \times M_2$ with $\dim(M) = n_1 + n_2$ by Theorem 52.6.5. The point
of interest here is the relation between the tangent space $T(M)$ and the tangent spaces $T(M_1)$ and $T(M_2)$.
The direct product of the tangent spaces $T_{p_1}(M_1)$ and $T_{p_2}(M_2)$ is discussed in Remark 54.7.1.


The direct product identification map for tangent spaces $T_{p_1}(M_1)$ and $T_{p_2}(M_2)$ is given in Definition 54.7.2.
This may be extended automatically from pointwise tangent spaces to global tangent bundles. Strictly
speaking, the more correct formula for the identification map in Definition 54.7.6 is the one which is given in
line (54.7.1). But this is possibly less clear, while being only slightly more precise. An even more “precise”
formula would be


$$\forall V_1 \in T(M_1),\ \forall V_2 \in T(M_2),\ \forall \psi_1 \in \mathrm{atlas}_{\pi(V_1)}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\pi(V_2)}(M_2),$$
$$i(V_1, V_2) = t_{(\pi(V_1),\pi(V_2)),\,\mathrm{concat}(\Phi(\psi_1)(V_1),\,\Phi(\psi_2)(V_2)),\,\psi_1 \times \psi_2},$$


where $\pi : T(M) \to M$ is the projection map for $T(M)$ in Definition 54.5.4.

54.7.6 DEFINITION: Global identification map between vector pairs and direct-product vectors.

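The direct product identification map for tangent bundles $T(M_1)$ and $T(M_2)$ of $C^1$ manifolds $M_1$ and $M_2$
is the map $i : T(M_1)\times T(M_2)\to T(M_1\times M_2)$, where $M_1\times M_2$ is the direct product of $M_1$ and $M_2$ (as in
Definition 52.6.2), and $i$ is defined by

$\forall p_1\in M_1,\ \forall p_2\in M_2,\ \forall\psi_1\in\mathrm{atlas}_{p_1}(M_1),\ \forall\psi_2\in\mathrm{atlas}_{p_2}(M_2),\ \forall v_1\in\mathbb{R}^{n_1},\ \forall v_2\in\mathbb{R}^{n_2},$
$\quad i(t_{p_1,v_1,\psi_1}, t_{p_2,v_2,\psi_2}) = t_{(p_1,p_2),(v_1,v_2),\psi_1\times\psi_2},$

where $n_1=\dim(M_1)$, $n_2=\dim(M_2)$, and $(v_1,v_2)$ denotes the tuple concatenation $\mathrm{concat}(v_1,v_2)$ for all
$v_1\in\mathbb{R}^{n_1}$ and $v_2\in\mathbb{R}^{n_2}$. In other words,

$\forall p_1\in M_1,\ \forall p_2\in M_2,\ \forall V_1\in T_{p_1}(M_1),\ \forall V_2\in T_{p_2}(M_2),\ \forall\psi_1\in\mathrm{atlas}_{p_1}(M_1),\ \forall\psi_2\in\mathrm{atlas}_{p_2}(M_2),$
$\quad i(V_1,V_2) = t_{(p_1,p_2),\,\mathrm{concat}(\Phi(\psi_1)(V_1),\Phi(\psi_2)(V_2)),\,\psi_1\times\psi_2},$    (54.7.1)

where $\Phi$ is as in Notation 54.5.7.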


54.7.7 REMARK: Concatenation of tangent vectors in a direct product of tangent bundles. 

The name “concatenation” for the operation $i$ in Definition 54.7.6 is clearly incorrect. In Definition 26.15.6,
the name “concatenation” for a pair $(L_{p_1,v_1}, L_{p_2,v_2})$ of Cartesian space tangent vectors has some justification
because it has the form $L_{\mathrm{concat}(p_1,p_2),\mathrm{concat}(v_1,v_2)}$. This form is possible because direct products of Cartesian
spaces are explicitly defined to be tuple concatenations as in Definition 16.4.3. (See also Definition 14.6.10 for
a more explicit formula for tuple concatenation.) In the case of abstract point pairs, there is no concatenation
operation, as mentioned in Remark 54.7.4. The velocity parameter for $i(V_1,V_2)$ is only formed as the
concatenation of the velocity tuples of $V_1$ and $V_2$ if the chart $\psi_1\times\psi_2$ is used for $i(V_1,V_2)$, which makes
the concatenation rule chart-dependent. So the argument in favour of the name “concatenation” is not
overwhelming. However, it seems harmless enough. The naming is important because it suggests a suitable
notation. The notation “$V_1\otimes V_2$” is unsuitable because it suggests a tensor product. The notation “$V_1\cdot V_2$”
is unsuitable because it suggests an inner product. In fact, it is not a product. The entire set $M_1\times M_2$ is a
product of sets, but the individual elements of $M_1\times M_2$ are not products of elements of $M_1$ and $M_2$. The
notation chosen here is “$(V_1,V_2)$” because it suggests an ordered pair, which is not correct, or a concatenation
of vectors, which is also not correct. However, the “concatenation” of tangent vectors is some combination
of ordered pairs and concatenations, as shown in line (54.7.1) in Definition 54.7.6. So this is chosen as the
name in Definition 54.7.8.


54.7.8 DEFINITION: The concatenation of tangent vectors $V_\alpha\in T(M_\alpha)$ for $\alpha=1,2$, for $C^1$ manifolds $M_1$
and $M_2$, is the tangent vector $i(V_1,V_2)\in T(M_1\times M_2)$, where $i$ is the identification map in Definition 54.7.6.


Alternative name: tangent vector concatenation. 


54.7.9 NOTATION: $(V_1,V_2)$, for tangent vectors $V_1\in T(M_1)$ and $V_2\in T(M_2)$, denotes the tangent vector
concatenation $i(V_1,V_2)\in T(M_1\times M_2)$, where $i$ is the direct product identification map for $T(M_1)$ and $T(M_2)$
as in Definition 54.7.8.


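54.7.10 REMARK: Comparison of atlases for $T(M_1)\times T(M_2)$ and $T(M_1\times M_2)$.

The manifolds $T(M_1)\times T(M_2)$ and $T(M_1\times M_2)$ differ because in the first case, the tangent bundles $T(M_1)$
and $T(M_2)$ are constructed according to Definition 54.5.30 and then combined according to Definition 52.6.2,
whereas in the second case, the manifolds $M_1$ and $M_2$ are combined into $M_1\times M_2$ and then the tangent
bundle $T(M_1\times M_2)$ is constructed from this. To evaluate the differentiability of a map from $T(M_1)\times T(M_2)$
to $T(M_1\times M_2)$, it is necessary to examine and compare the atlases on these two manifolds.

(1) $\mathrm{atlas}(T(M_\alpha)) = \{\Psi(\psi_\alpha);\ \psi_\alpha\in\mathrm{atlas}(M_\alpha)\}$ for $\alpha=1,2$.
    $\mathrm{Dom}(\Psi(\psi_\alpha)) = \pi_\alpha^{-1}(\mathrm{Dom}(\psi_\alpha))$ for $\psi_\alpha\in\mathrm{atlas}(M_\alpha)$, for $\alpha=1,2$.
    $\Psi(\psi_\alpha) : t_{p_\alpha,v_\alpha,\psi_\alpha}\mapsto(\psi_\alpha(p_\alpha),v_\alpha)\in\mathbb{R}^{n_\alpha}\times\mathbb{R}^{n_\alpha}$ for $\alpha=1,2$.
    $\mathrm{atlas}(T(M_1)\times T(M_2)) = \{\tilde\psi_1\times\tilde\psi_2;\ \tilde\psi_1=\Psi(\psi_1)\in\mathrm{atlas}(T(M_1)),\ \tilde\psi_2=\Psi(\psi_2)\in\mathrm{atlas}(T(M_2))\}$.
    $\mathrm{Dom}(\tilde\psi_1\times\tilde\psi_2) = \pi_1^{-1}(\mathrm{Dom}(\psi_1))\times\pi_2^{-1}(\mathrm{Dom}(\psi_2))$ for $\psi_1\in\mathrm{atlas}(M_1)$, $\psi_2\in\mathrm{atlas}(M_2)$.
    $\tilde\psi_1\times\tilde\psi_2 : (t_{p_1,v_1,\psi_1}, t_{p_2,v_2,\psi_2})\mapsto(\psi_1(p_1),v_1,\psi_2(p_2),v_2)\in\mathbb{R}^{n_1}\times\mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^{n_2}$.
(2) $\mathrm{atlas}(M_1\times M_2) = \{\psi_1\times\psi_2;\ \psi_1\in\mathrm{atlas}(M_1),\ \psi_2\in\mathrm{atlas}(M_2)\}$.
    $\mathrm{Dom}(\psi_1\times\psi_2) = \mathrm{Dom}(\psi_1)\times\mathrm{Dom}(\psi_2)$ for $\psi_1\in\mathrm{atlas}(M_1)$, $\psi_2\in\mathrm{atlas}(M_2)$.
    $\psi_1\times\psi_2 : (p_1,p_2)\mapsto(\psi_1(p_1),\psi_2(p_2))\in\mathbb{R}^{n_1}\times\mathbb{R}^{n_2}$ for $\psi_1\in\mathrm{atlas}(M_1)$, $\psi_2\in\mathrm{atlas}(M_2)$.
    $\mathrm{atlas}(T(M_1\times M_2)) = \{\Psi(\psi_1\times\psi_2);\ \psi_1\in\mathrm{atlas}(M_1),\ \psi_2\in\mathrm{atlas}(M_2)\}$.
    $\mathrm{Dom}(\Psi(\psi_1\times\psi_2)) = \pi^{-1}(\mathrm{Dom}(\psi_1\times\psi_2))$ for $\psi_1\in\mathrm{atlas}(M_1)$, $\psi_2\in\mathrm{atlas}(M_2)$,
    where $\pi : T(M_1\times M_2)\to M_1\times M_2$ is the tangent bundle projection map for $T(M_1\times M_2)$.
    $\Psi(\psi_1\times\psi_2) : t_{(p_1,p_2),(v_1,v_2),\psi_1\times\psi_2}\mapsto(\psi_1(p_1),\psi_2(p_2),v_1,v_2)\in\mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^{n_1}\times\mathbb{R}^{n_2}$.

According to Theorem 52.1.11, it is sufficient to test $C^k$ differentiability of a map $\phi$ between manifolds for
only one chart for each of $p\in\mathrm{Dom}(\phi)$ and $\phi(p)\in\mathrm{Range}(\phi)$. So it may be assumed that the same base-space
chart-pair $(\psi_1,\psi_2)$ may be used for the charts for $T(M_1)\times T(M_2)$ and $T(M_1\times M_2)$. These observations
may now be applied to the differentiability of tangent-bundle direct-product identification maps.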


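54.7.11 THEOREM: The direct product identification map for $C^{k+1}$ manifolds is a $C^k$ map.
Let $k\in\mathbb{Z}_0^+$. Let $M_1$ and $M_2$ be $C^{k+1}$ manifolds. Then the direct product identification map for tangent
bundles $i : T(M_1)\times T(M_2)\to T(M_1\times M_2)$ satisfies $i\in C^k(T(M_1)\times T(M_2), T(M_1\times M_2))$.

PROOF: Let $V\in T(M_1)\times T(M_2)$. Then $V=(V_1,V_2)$ for some $V_1\in T(M_1)$ and $V_2\in T(M_2)$. Let
$p_\alpha=\pi_\alpha(V_\alpha)$, where $\pi_\alpha : T(M_\alpha)\to M_\alpha$ is the tangent bundle projection map for $T(M_\alpha)$ for $\alpha=1,2$.
Then $V_\alpha=t_{p_\alpha,v_\alpha,\psi_\alpha}$ for some $\psi_\alpha\in\mathrm{atlas}_{p_\alpha}(M_\alpha)$ and $v_\alpha\in\mathbb{R}^{n_\alpha}$ for $\alpha=1,2$. It follows that
$(\tilde\psi_1\times\tilde\psi_2)(V)=(\psi_1(p_1),v_1,\psi_2(p_2),v_2)\in\mathbb{R}^{n_1}\times\mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^{n_2}$,
where $\tilde\psi_1\times\tilde\psi_2=\Psi(\psi_1)\times\Psi(\psi_2)\in\mathrm{atlas}_V(T(M_1)\times T(M_2))$.
By Definition 54.7.6, $i(V)=t_{(p_1,p_2),(v_1,v_2),\psi_1\times\psi_2}\in T_{(p_1,p_2)}(M_1\times M_2)$. Since $(p_1,p_2)\in\mathrm{Dom}(\psi_1\times\psi_2)$, it
follows that $\Psi(\psi_1\times\psi_2)\in\mathrm{atlas}_{i(V)}(T(M_1\times M_2))$. Then $\Psi(\psi_1\times\psi_2)(i(V))=(\psi_1(p_1),\psi_2(p_2),v_1,v_2)\in
\mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^{n_1}\times\mathbb{R}^{n_2}$. Since this is equal to a simple permutation of the components of the tuple
$(\tilde\psi_1\times\tilde\psi_2)(V)=(\psi_1(p_1),v_1,\psi_2(p_2),v_2)$, it follows that $\Psi(\psi_1\times\psi_2)\circ i\circ(\tilde\psi_1\times\tilde\psi_2)^{-1}$ is a $C^\infty$ local map
between Cartesian spaces. Hence $i\in C^k(T(M_1)\times T(M_2), T(M_1\times M_2))$ by Theorem 52.1.11.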
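
The coordinate expression of the identification map in the proof of Theorem 54.7.11 can be illustrated with a minimal Python sketch. It is not part of the formal development. The function names are hypothetical, and coordinate tuples are modelled as plain Python tuples. The sketch only shows that the map (x1, v1, x2, v2) -> (x1, x2, v1, v2) is a fixed permutation of components, and hence linear.

```python
def product_of_bundle_charts(x1, v1, x2, v2):
    """Coordinates of a pair (V1, V2) in T(M1) x T(M2): (x1, v1, x2, v2)."""
    return tuple(x1) + tuple(v1) + tuple(x2) + tuple(v2)


def bundle_chart_of_product(x1, v1, x2, v2):
    """Coordinates of i(V1, V2) in T(M1 x M2): (x1, x2, v1, v2)."""
    return tuple(x1) + tuple(x2) + tuple(v1) + tuple(v2)


def identification_in_coordinates(t, n1, n2):
    """Permute (x1, v1, x2, v2) into (x1, x2, v1, v2).

    Hypothetical helper: n1 and n2 are the dimensions of M1 and M2.
    """
    t = tuple(t)
    if len(t) != 2 * (n1 + n2):
        raise ValueError("expected a tuple of length 2*(n1 + n2)")
    x1, v1 = t[:n1], t[n1:2 * n1]
    x2, v2 = t[2 * n1:2 * n1 + n2], t[2 * n1 + n2:]
    return x1 + x2 + v1 + v2


# Example with n1 = 2 and n2 = 1 (arbitrary illustrative values).
x1, v1, x2, v2 = (1.0, 2.0), (3.0, 4.0), (5.0,), (6.0,)
source = product_of_bundle_charts(x1, v1, x2, v2)
target = identification_in_coordinates(source, 2, 1)
```

Since the permutation does not depend on the point, the only regularity constraint on $i$ comes from the charts themselves, which is why the theorem asks for $C^{k+1}$ base manifolds.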


54.8. Product-structured manifold tangent vector embedding maps 


54.8.1 REMARK: Submanifold tangent vector embedding maps for horizontal/vertical submanifolds. 

The motivation for product-structured manifolds comes from fibre bundles, which are locally
product-structured via “local trivialisation maps”. (See for example Remarks 21.0.7 and 21.5.7 and Definition 47.6.5
for local trivialisations.) In the fibre bundle context, the submanifolds which arise by inverting product 
maps are called horizontal or vertical subsets, and it is the vertical subsets which are called “fibre sets”. 
Thus the motivation for submanifold tangent vector embedding maps for the submanifolds which result from 
product-structuring is from fibre sets. Therefore the term “fibre set” could be substituted for “inverse images 
of slice sets of products of manifolds”. The term “fibre set” is both shorter and more meaningful. 


Two very closely related kinds of submanifold are considered here, namely the slice-sets of manifold products 
in Theorem 54.8.3 (illustrated in Figure 54.8.1), and the inverse images of these slice-sets in Theorem 54.8.5 
(illustrated in Figure 54.8.2). It is the inverse image submanifolds which are important for differentiable 
fibre bundles. 


54.8.2 REMARK: Submanifold tangent vector embedding map for manifold-product slice-sets. 

Theorem 54.8.3 applies the submanifold tangent vector embedding map concept in Definition 54.6.2 to the 
slice-set submanifolds of direct products of differentiable manifolds in Definition 52.6.17 in the same way that 
Theorem 54.6.8 applies this concept to open subsets of manifolds in Definition 52.4.16. (Theorem 54.8.3 (i) 
is illustrated in Figure 54.8.1.)


54.8.3 THEOREM: Submanifold tangent vector embedding maps for slice sets of manifold products. 
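Let $M_1$ and $M_2$ be $C^1$ manifolds with $n_1=\dim(M_1)$ and $n_2=\dim(M_2)$. For $p_1\in M_1$ and $p_2\in M_2$, let
$(M_1^{p_2},A_1^{p_2})$ and $(M_2^{p_1},A_2^{p_1})$ be the slice-set submanifolds of $M=M_1\times M_2$ as in Definition 52.6.17.

(i) The submanifold tangent vector embedding map for the submanifold $M_1^{p_2}=M_1\times\{p_2\}$ within $M$ for
$p_2\in M_2$ is the unique map $\eta_1^{p_2} : T(M_1^{p_2})\to T(M)$ which satisfies

$\forall p_1\in M_1,\ \forall v_1\in\mathbb{R}^{n_1},\ \forall\psi_1\in\mathrm{atlas}_{p_1}(M_1),\ \forall\psi_2\in\mathrm{atlas}_{p_2}(M_2),$
$\quad \eta_1^{p_2}(t_{(p_1,p_2),v_1,\psi_1\circ\Pi_1}) = t_{(p_1,p_2),(v_1,0),\psi_1\times\psi_2},$    (54.8.1)

where $\Pi_1 : M_1^{p_2}\to M_1$ is defined by $\Pi_1 : (q,p_2)\mapsto q$, and $0$ means $0_{\mathbb{R}^{n_2}}$.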

Figure 54.8.1 Submanifold tangent vector embedding for product of manifolds


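(ii) The submanifold tangent vector embedding map for the submanifold $M_2^{p_1}=\{p_1\}\times M_2$ within $M$ for
$p_1\in M_1$ is the unique map $\eta_2^{p_1} : T(M_2^{p_1})\to T(M)$ which satisfies

$\forall p_2\in M_2,\ \forall v_2\in\mathbb{R}^{n_2},\ \forall\psi_1\in\mathrm{atlas}_{p_1}(M_1),\ \forall\psi_2\in\mathrm{atlas}_{p_2}(M_2),$
$\quad \eta_2^{p_1}(t_{(p_1,p_2),v_2,\psi_2\circ\Pi_2}) = t_{(p_1,p_2),(0,v_2),\psi_1\times\psi_2},$

where $\Pi_2 : M_2^{p_1}\to M_2$ is defined by $\Pi_2 : (p_1,q)\mapsto q$, and $0$ means $0_{\mathbb{R}^{n_1}}$.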


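PROOF: For part (i), let $p_1\in M_1$, $p_2\in M_2$, $v_1\in\mathbb{R}^{n_1}$, $\psi_1\in\mathrm{atlas}_{p_1}(M_1)$ and $\psi_2\in\mathrm{atlas}_{p_2}(M_2)$. Let
$\bar\psi_1=\psi_1\circ\Pi_1$. Then $\bar\psi_1\in\mathrm{atlas}_{(p_1,p_2)}(M_1^{p_2})$ by Definition 52.6.17.
Define $\psi : \mathrm{Dom}(\psi_1)\times\mathrm{Dom}(\psi_2)\to\mathbb{R}^{n_1+n_2}$ by $\psi(q)=(\psi_1(q_1),\psi_2(q_2)-\psi_2(p_2))$ for $q=(q_1,q_2)\in\mathrm{Dom}(\psi)$.
Let $p=(p_1,p_2)$. Then $\psi\in\mathrm{atlas}_p(M)$ and $\psi(M_1^{p_2})=\mathrm{Range}(\psi_1)\times\{0_{\mathbb{R}^{n_2}}\}=\mathrm{Range}(\psi)\cap(\mathbb{R}^{n_1}\times\{0_{\mathbb{R}^{n_2}}\})$, and
$\bar\psi_1=\psi_1\circ\Pi_1$ is the composite of $\psi|_{M_1^{p_2}}$ with the projection of $\mathbb{R}^{n_1+n_2}$ onto its first $n_1$ components.
So $\psi\in A(p,\bar\psi_1)$, with $A(p,\bar\psi_1)$ as in Definition 54.6.2.
Consequently $\eta_1^{p_2}(t_{p,v_1,\bar\psi_1})=t_{p,(v_1,0),\psi}$ by Definition 54.6.2, where $\eta_1^{p_2} : T(M_1^{p_2})\to T(M)$ is the tangent
vector embedding map for $M_1^{p_2}$ within $M$.
Since $\psi(q_1,q_2)=(\psi_1\times\psi_2)(q_1,q_2)-(0_{\mathbb{R}^{n_1}},\psi_2(p_2))$ for all $q=(q_1,q_2)\in\mathrm{Dom}(\psi)$, the Jacobian of the chart
transition map $\psi\circ(\psi_1\times\psi_2)^{-1}$ is the identity matrix. So $t_{p,(v_1,0),\psi}=t_{p,(v_1,0),\psi_1\times\psi_2}$. Thus $\eta_1^{p_2}$ satisfies
line (54.8.1) because $\eta_1^{p_2}(t_{p,v_1,\bar\psi_1})=t_{p,(v_1,0),\psi}$.

Since every vector in $T(M_1^{p_2})$ is equal to $t_{(p_1,p_2),v_1,\psi_1\circ\Pi_1}$ for some $p_1\in M_1$, $v_1\in\mathbb{R}^{n_1}$, $\psi_1\in\mathrm{atlas}_{p_1}(M_1)$ and
$\psi_2\in\mathrm{atlas}_{p_2}(M_2)$, the map $\eta_1^{p_2} : T(M_1^{p_2})\to T(M)$ is uniquely determined by line (54.8.1).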


Part (ii) may be proved exactly as for part (i). 
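
The key step in the proof of Theorem 54.8.3 is that a chart which differs from the product chart by a constant translation has an identity Jacobian relative to it, so the velocity tuple (v1, 0) is unchanged. The following Python sketch checks this numerically under stated assumptions: M1 = M2 = R, and the charts `psi1` and `psi2` are hypothetical examples chosen only because their inverses are explicit. It is an illustration, not part of the formal proof.

```python
import math


def psi1(q):
    """Hypothetical chart on M1 = R."""
    return 2.0 * q + 1.0


def psi1_inv(x):
    return (x - 1.0) / 2.0


def psi2(q):
    """Hypothetical chart on M2 = R (range is the positive reals)."""
    return math.exp(q)


def psi2_inv(x):
    return math.log(x)


def product_chart(q):
    """The chart psi1 x psi2 on M1 x M2."""
    return (psi1(q[0]), psi2(q[1]))


def product_chart_inv(x):
    return (psi1_inv(x[0]), psi2_inv(x[1]))


def adapted_chart(q, p2):
    """The translated chart psi: it maps the slice M1 x {p2} into R x {0}."""
    return (psi1(q[0]), psi2(q[1]) - psi2(p2))


def transition(x, p2):
    """Chart transition map psi o (psi1 x psi2)^{-1}."""
    return adapted_chart(product_chart_inv(x), p2)


def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x, as a list of rows."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(tuple(xp)), f(tuple(xm))
        cols.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def embed_velocity(v1, n2):
    """Velocity tuple of the embedded vector: (v1, 0) with n2 zeros."""
    return tuple(v1) + (0.0,) * n2


P2 = 0.5  # base point p2 of the slice (arbitrary choice)
J = jacobian(lambda x: transition(x, P2), product_chart((0.3, 0.7)))
```

Here `J` is the identity matrix up to finite-difference error, and `adapted_chart((q1, P2), P2)` has second component exactly zero for every `q1`.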


54.8.4 REMARK: Computation of tangent vector embedding maps for product-structured manifolds. 
Theorem 54.8.5 computes submanifold tangent vector embedding maps for product-structured manifolds.
This is an application of the general submanifold tangent vector embedding map in Definition 54.6.2 to
the horizontal and vertical submanifolds of product-structured manifolds in Definition 52.7.8, whose basic
properties are given in Theorem 52.7.5. The following theorems are analogous, showing properties of point or
vector embeddings for open subsets, direct-product slice sets, and the horizontal and vertical submanifolds
of product-structured manifolds.

                            open subset    direct product    product-structured
submanifold embedding       52.4.17        52.6.18           52.7.5
tangent vector embedding    54.6.8         54.8.3            54.8.5

Unsurprisingly, Theorems 54.8.3 and 54.8.5 have similar proofs. However, note that the pairs $(M_1^{p_2},A_1^{p_2})$
and $(M_2^{p_1},A_2^{p_1})$ are submanifolds of $M_1\times M_2$ in the former theorem, and submanifolds of $M_0$ in the latter
theorem. (Theorem 54.8.5 (i) is illustrated in Figure 54.8.2.)

Figure 54.8.2 Submanifold tangent vector embedding for product-structured manifolds


54.8.5 THEOREM:  Tangent vector embedding maps for product-structured manifolds. 

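Let $M_0$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M_0\to M_1$ and $\phi_2 : M_0\to M_2$ be $C^1$ maps such that
$\phi_1\times\phi_2 : M_0\to M_1\times M_2$ is a $C^1$ diffeomorphism. Let $(M_1^{p_2},A_1^{p_2})$ and $(M_2^{p_1},A_2^{p_1})$ be the horizontal and
vertical submanifolds respectively for the product-structured manifold $(M_0,\phi_1,\phi_2,M_1,M_2)$ for $p_1\in M_1$ and
$p_2\in M_2$ as in Definition 52.7.8. Let $n_1=\dim(M_1)$ and $n_2=\dim(M_2)$.

(i) The submanifold tangent vector embedding map for the horizontal submanifold $(M_1^{p_2},A_1^{p_2})$ of $M_0$ for
$p_2\in M_2$ is the unique map $\eta_1^{p_2} : T(M_1^{p_2})\to T(M_0)$ which satisfies

$\forall z\in M_1^{p_2},\ \forall v_1\in\mathbb{R}^{n_1},\ \forall\psi_1\in\mathrm{atlas}_{\phi_1(z)}(M_1),\ \forall\psi_2\in\mathrm{atlas}_{p_2}(M_2),$
$\quad \eta_1^{p_2}(t_{z,v_1,\bar\psi_1}) = t_{z,(v_1,0),(\psi_1\times\psi_2)\circ(\phi_1\times\phi_2)},$    (54.8.2)

where $\bar\psi_1=\psi_1\circ\phi_1|_{M_1^{p_2}}$ for all $\psi_1\in\mathrm{atlas}_{\phi_1(z)}(M_1)$ and $p_2\in M_2$, and $0$ means $0_{\mathbb{R}^{n_2}}$.

(ii) The submanifold tangent vector embedding map for the vertical submanifold $(M_2^{p_1},A_2^{p_1})$ of $M_0$ for
$p_1\in M_1$ is the unique map $\eta_2^{p_1} : T(M_2^{p_1})\to T(M_0)$ which satisfies

$\forall z\in M_2^{p_1},\ \forall v_2\in\mathbb{R}^{n_2},\ \forall\psi_1\in\mathrm{atlas}_{p_1}(M_1),\ \forall\psi_2\in\mathrm{atlas}_{\phi_2(z)}(M_2),$
$\quad \eta_2^{p_1}(t_{z,v_2,\bar\psi_2}) = t_{z,(0,v_2),(\psi_1\times\psi_2)\circ(\phi_1\times\phi_2)},$

where $\bar\psi_2=\psi_2\circ\phi_2|_{M_2^{p_1}}$ for all $\psi_2\in\mathrm{atlas}_{\phi_2(z)}(M_2)$ and $p_1\in M_1$, and $0$ means $0_{\mathbb{R}^{n_1}}$.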
2 


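PROOF: For part (i), let $p_2\in M_2$. Then $M_1^{p_2}=\phi_2^{-1}(\{p_2\})$ and $A_1^{p_2}=\{\psi_1\circ\phi_1|_{M_1^{p_2}};\ \psi_1\in\mathrm{atlas}(M_1)\}$ by
Definition 52.7.8. Let $z\in M_1^{p_2}$, $v_1\in\mathbb{R}^{n_1}$ and $\psi_1\in\mathrm{atlas}_{\phi_1(z)}(M_1)$. Let $\bar\psi_1=\psi_1\circ\phi_1|_{M_1^{p_2}}$. Then $\bar\psi_1\in A_1^{p_2}$
and $z\in\mathrm{Dom}(\bar\psi_1)$. Let $\psi_2\in\mathrm{atlas}_{p_2}(M_2)$.

Let $U=\mathrm{Dom}((\psi_1\times\psi_2)\circ(\phi_1\times\phi_2))=(\phi_1\times\phi_2)^{-1}(\mathrm{Dom}(\psi_1\times\psi_2))=(\phi_1\times\phi_2)^{-1}(\mathrm{Dom}(\psi_1)\times\mathrm{Dom}(\psi_2))=
\phi_1^{-1}(\mathrm{Dom}(\psi_1))\cap\phi_2^{-1}(\mathrm{Dom}(\psi_2))$. Then $U\in\mathrm{Top}_z(M_0)$. Define $\psi : U\to\mathbb{R}^{n_1+n_2}$ by

$\forall y\in U,\quad \psi(y) = ((\psi_1\times\psi_2)\circ(\phi_1\times\phi_2))(y) - (0_{\mathbb{R}^{n_1}},\psi_2(p_2))$    (54.8.3)
$\qquad\qquad\quad\ = (\psi_1(\phi_1(y)),\ \psi_2(\phi_2(y))-\psi_2(p_2)).$

Then $\psi\in\mathrm{atlas}^1(M_0)$ because $\phi_1\times\phi_2 : M_0\to M_1\times M_2$ is a $C^1$ diffeomorphism.

Since $\psi(M_1^{p_2})=\psi_1(\phi_1(M_1^{p_2}))\times\{0_{\mathbb{R}^{n_2}}\}=\mathrm{Range}(\psi)\cap(\mathbb{R}^{n_1}\times\{0_{\mathbb{R}^{n_2}}\})$ and $\bar\psi_1=\psi_1\circ\phi_1|_{M_1^{p_2}}$ is the composite
of $\psi|_{M_1^{p_2}}$ with the projection of $\mathbb{R}^{n_1+n_2}$ onto its first $n_1$ components, it follows that $\psi\in A(z,\bar\psi_1)$ for the set
$A(z,\bar\psi_1)$ as in Definition 54.6.2. Therefore $\eta_1^{p_2}(t_{z,v_1,\bar\psi_1})=t_{z,(v_1,0),\psi}$ by Definition 54.6.2, where
$\eta_1^{p_2} : T(M_1^{p_2})\to T(M_0)$ is the tangent vector embedding map for $M_1^{p_2}$ within $M_0$.

Since $\psi$ is a constant translate of $(\psi_1\times\psi_2)\circ(\phi_1\times\phi_2)$ by line (54.8.3), the Jacobian of the chart transition
map $\psi\circ((\psi_1\times\psi_2)\circ(\phi_1\times\phi_2))^{-1}$ is the identity matrix. Therefore $t_{z,(v_1,0),\psi}=t_{z,(v_1,0),(\psi_1\times\psi_2)\circ(\phi_1\times\phi_2)}$.
Thus $\eta_1^{p_2}$ satisfies line (54.8.2) because $\eta_1^{p_2}(t_{z,v_1,\bar\psi_1})=t_{z,(v_1,0),\psi}$.

Since every vector in $T(M_1^{p_2})$ is equal to $t_{z,v_1,\bar\psi_1}$ for some $z\in M_1^{p_2}$, $v_1\in\mathbb{R}^{n_1}$, $\psi_1\in\mathrm{atlas}_{\phi_1(z)}(M_1)$ and
$\psi_2\in\mathrm{atlas}_{p_2}(M_2)$, where $\bar\psi_1=\psi_1\circ\phi_1|_{M_1^{p_2}}$, the map $\eta_1^{p_2} : T(M_1^{p_2})\to T(M_0)$ is uniquely determined by
line (54.8.2).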

Part (ii) may be proved exactly as for part (i). 


54.9. Linear space tangent bundle drop functions 


54.9.1 REMARK: The standard tangent bundle for a finite-dimensional linear space. 

The standard topology $T_F$ for a finite-dimensional linear space $F$ is introduced in Definition 32.6.6. The
locally Cartesian space structure on $F$ is defined to be the topological space $(F,T_F)$ in Definition 49.4.17.
The standard atlas for $F$ is given by Definition 49.7.14. The $C^\infty$ differentiable manifold structure for $F$ is
given as $(F,A_F) \mathrel{-\!\!<} ((F,T_F),A_F)$ in Definition 51.4.21, where $A_F$ is the atlas of all coordinate maps for $F$.

The structures for the general linear groups $GL(F)$ are built up in parallel with the build-up for
finite-dimensional linear spaces $F$. The standard topology $T_{GL(F)}$ for $GL(F)$ is given by Definition 32.6.8. The
locally Cartesian space structure on $GL(F)$ is defined to be the topological space $(GL(F),T_{GL(F)})$ in
Definition 49.4.18. The standard atlas $A_{GL(F)}$ for $GL(F)$ is given by Definition 49.7.15. The differentiable
manifold structure $((GL(F),T_{GL(F)}),A_{GL(F)})$ for $GL(F)$ is given by Definition 51.4.24.

The standard tangent bundles for $F$ and $GL(F)$ are then given, as for all tangent bundles on differentiable
manifolds, by Definition 54.5.30. The build-up of structures underlying tangent bundles on abstract
finite-dimensional linear spaces and their general linear groups is summarised in Table 54.9.1. (See also the general
linear group “structure build-up table” in Remark 25.14.3.)


structure                   F          GL(F)
set                         22.5.7     23.1.12
topology                    32.6.6     32.6.8
locally Cartesian space     49.4.17    49.4.18
manifold atlas              49.7.14    49.7.15
differentiable manifold     51.4.21    51.4.24
tangent bundle              54.5.30    54.5.30

Table 54.9.1 Build-up of standard manifold structures on linear spaces


54.9.2 REMARK:  Dropping tangent vectors to the base space in linear-space manifolds. 

In the special case that a differentiable manifold is a finite-dimensional linear space, the tangent vector space 
at each point of the manifold may be identified with the underlying point-manifold in the sense that tangent 
vectors in T(F) and their corresponding base-space vectors in F obey the same transformation rules. 


It is often asserted explicitly or implicitly in textbooks that “if it transforms like a tangent vector, it is
a tangent vector”. In light of Theorem 54.9.3, one might suggest that “if it transforms like a base-space
point, it is a base-space point”. In other words, each element of $T(F)$ must thus be an element of $F$. This
is in fact more or less accepted in classical coordinate geometry for flat Euclidean space. All of the points
are coordinatised by some set $\mathbb{R}^n$, and the velocity vectors for such a space are also coordinatised by $\mathbb{R}^n$,
and the acceleration vectors also. But one would not suggest that a velocity $(3,1,2)\in\mathbb{R}^3$, for example, is
somehow associated with the point $(3,1,2)$. Under rotations and reflections, the vectors undergo the same
transformations, but under translations they do not.

In the case of derivatives of functions of real numbers, the values of the derivatives are real numbers, which
are elements of the same set as is used for the values of the function, but a derivative value 3.12, for example,
usually has no association at all with the function value 3.12 unless the task is to solve a differential equation
which associates the derivative and value of the function. But the value of the derivative of a real-valued
function of a real variable is said to be a real number, even though it is really a different “kind of object”.
Similarly, the same coordinate space is used for points, velocities and accelerations in kinetics, although
these also are different “kinds of objects”. Nevertheless, it is generally assumed, explicitly or implicitly,
that the derivative of a vector-valued function is a vector-valued function. Under linear transformations, the
derivative and the value of the function undergo the same transformations. So this seems harmless enough.
But the identification of the linear spaces for values and derivatives is effectively an identification of points
with velocities, which seems not to be valid. In the fibre bundle perspective, the point space and velocity
space are kept distinct and separate, but for expediency and convenience, they are sometimes identified!


Theorem 54.9.3 is illustrated in Figure 54.9.2. 


Figure 54.9.2 Drop function for the tangent bundle of a linear space


54.9.3 THEOREM: The “drop function” for a linear space is component-map-independent. 

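Let $F$ be a finite-dimensional linear space. Let $(F,A_F)$ be the differentiable manifold structure for $F$ as in
Definition 51.4.21, where $A_F$ is the standard atlas for $F$ consisting of all linear space component maps for $F$.
For $p\in F$ and $\psi\in\mathrm{atlas}(F)$, define maps $\phi_{p,\psi} : T_p(F)\to F$ by

$\forall p\in F,\ \forall\psi\in\mathrm{atlas}(F),\ \forall v\in\mathbb{R}^n,\quad \phi_{p,\psi}(t_{p,v,\psi}) = \psi^{-1}(v),$

where $n=\dim(F)$. In other words, $\phi_{p,\psi}=\psi^{-1}\circ\Phi(\psi)|_{T_p(F)}$. Then

$\forall p\in F,\ \forall\psi_1,\psi_2\in\mathrm{atlas}(F),\ \forall V\in T_p(F),\quad \phi_{p,\psi_1}(V) = \phi_{p,\psi_2}(V).$

In other words, $\forall p\in F$, $\forall\psi_1,\psi_2\in\mathrm{atlas}(F)$, $\phi_{p,\psi_1}=\phi_{p,\psi_2}$. That is, $\phi_{p,\psi}$ is independent of $\psi$.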


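PROOF: Let $p\in F$, $\psi_1,\psi_2\in\mathrm{atlas}(F)$ and $V\in T_p(F)$. Then $V=t_{p,v_1,\psi_1}=t_{p,v_2,\psi_2}$ for some $v_1\in\mathbb{R}^n$ and
$v_2=J_{21}(p)v_1\in\mathbb{R}^n$ by Theorem 54.1.11, where $n=\dim(F)$. That is, $\forall i\in\mathbb{N}_n$, $v_2^i=\sum_{j=1}^n J_{21}(p)^i{}_j v_1^j$, where
$\forall i,j\in\mathbb{N}_n$, $J_{21}(p)^i{}_j=\partial_{x^j}(\psi_2\circ\psi_1^{-1}(x))^i\big|_{x=\psi_1(p)}$ by Definition 51.4.18.

By Definition 51.4.21, $\psi_k=\kappa_{B_k}$ for some basis $B_k=(e^k_i)_{i=1}^n$ for $F$ for $k=1,2$. So by Theorem 22.9.11,
$(\psi_2\circ\psi_1^{-1}(x))^i=\kappa_{B_2}(\kappa_{B_1}^{-1}(x))^i=\kappa_{B_2}(\sum_{j=1}^n x^j e^1_j)^i=\sum_{j=1}^n\kappa_{B_2}(e^1_j)^i x^j$ for $i\in\mathbb{N}_n$. So $J_{21}(p)^i{}_j=\kappa_{B_2}(e^1_j)^i$
for all $i,j\in\mathbb{N}_n$. Thus $\forall i\in\mathbb{N}_n$, $v_2^i=\sum_{j=1}^n\kappa_{B_2}(e^1_j)^i v_1^j$. It follows that

$\phi_{p,\psi_2}(V) = \psi_2^{-1}(v_2) = \kappa_{B_2}^{-1}(v_2) = \sum_{i=1}^n v_2^i e^2_i$
$\qquad\quad\ \ = \sum_{j=1}^n v_1^j \sum_{i=1}^n \kappa_{B_2}(e^1_j)^i e^2_i$    (54.9.1)
$\qquad\quad\ \ = \sum_{j=1}^n v_1^j e^1_j = \kappa_{B_1}^{-1}(v_1) = \psi_1^{-1}(v_1) = \phi_{p,\psi_1}(V),$

where line (54.9.1) follows from Theorem 22.9.11.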
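
The chart-independence in Theorem 54.9.3 can be checked on a small example. The Python sketch below is only an illustration under stated assumptions: F is taken to be the two-dimensional rational space Q^2, the two bases `B1` and `B2` are arbitrary choices, and the function names are hypothetical stand-ins for the component map of a basis, its inverse, and the transition matrix whose (i, j) entry is the i-th component of the j-th vector of the first basis relative to the second basis.

```python
from fractions import Fraction as Fr


def from_components(basis, v):
    """Inverse component map: tuple v -> vector v[0]*e1 + v[1]*e2 in F = Q^2."""
    e1, e2 = basis
    return (v[0] * e1[0] + v[1] * e2[0], v[0] * e1[1] + v[1] * e2[1])


def to_components(basis, w):
    """Component map: vector w -> components relative to basis (Cramer's rule)."""
    e1, e2 = basis
    det = e1[0] * e2[1] - e2[0] * e1[1]
    return ((w[0] * e2[1] - e2[0] * w[1]) / det,
            (e1[0] * w[1] - w[0] * e1[1]) / det)


def transition_matrix(b1, b2):
    """Matrix J with J[i][j] = i-th component of b1[j] relative to b2."""
    cols = [to_components(b2, e) for e in b1]
    return [[cols[j][i] for j in range(2)] for i in range(2)]


def apply_matrix(J, v):
    return tuple(sum(J[i][j] * v[j] for j in range(2)) for i in range(2))


# Two arbitrary bases for F = Q^2, and a velocity tuple in the first chart.
B1 = ((Fr(1), Fr(0)), (Fr(1), Fr(1)))
B2 = ((Fr(2), Fr(1)), (Fr(0), Fr(3)))
v1 = (Fr(5), Fr(-2))

# The same tangent vector has velocity tuple v2 = J21 v1 in the second chart.
v2 = apply_matrix(transition_matrix(B1, B2), v1)

# Dropping via either chart gives the same vector of F.
drop1 = from_components(B1, v1)
drop2 = from_components(B2, v2)
```

Exact rational arithmetic is used so that the two dropped vectors can be compared with `==` rather than with a floating-point tolerance.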


54.9.4 REMARK: The drop function of the tangent bundle of a finite-dimensional linear space. 

By Theorem 54.9.3, the drop function in Definition 54.9.5 is chart-independent, assuming that the standard
atlas for a finite-dimensional linear space is used as in Definition 51.4.21. Note that $\mathrm{atlas}_p(F)$ is the same
as $\mathrm{atlas}(F)$ in Definition 54.9.5 because all of the (standard) charts are global.


54.9.5 DEFINITION: Vertical drop functions for tangent bundles of linear spaces. 
The pointwise drop function for the tangent bundle of a finite-dimensional real linear space $F$ at a point
$p\in F$ is the map $\varpi_p : T_p(F)\to F$ defined by $\varpi_p=\psi^{-1}\circ\Phi(\psi)|_{T_p(F)}$ for any $\psi\in\mathrm{atlas}(F)$. (See
Definition 51.4.21 for the standard atlas for $F$.) In other words, for $n=\dim(F)$,

$\forall p\in F,\ \forall v\in\mathbb{R}^n,\ \forall\psi\in\mathrm{atlas}(F),\quad \varpi_p(t_{p,v,\psi}) = \psi^{-1}(v).$

The drop function for the tangent bundle of a finite-dimensional real linear space $F$ is the
map $\varpi : T(F)\to F$ defined by $\varpi=\bigcup_{p\in F}\varpi_p$. In other words, $\varpi=\bigcup_{\psi\in\mathrm{atlas}(F)}\psi^{-1}\circ\Phi(\psi)$.


54.9.6 THEOREM: Pointwise drop functions are restrictions of the global drop function. 


Let $F$ be a finite-dimensional linear space. Then $\varpi_p=\varpi|_{T_p(F)}$ for all $p\in F$.

PROOF: The assertion follows from the disjointness of the sets $\mathrm{Dom}(\varpi_p)=T_p(F)$ by Theorem 54.1.16.


54.9.7 THEOREM: The drop function for a linear space is a linear space isomorphism. 
Let F be a finite-dimensional linear space. Let p € F. Then uL : T,(F) > F is a linear space isomorphism. 


PROOF: Let $B$ be a basis for $F$. Let $\psi=\kappa_B$. Then $\psi\in\mathrm{atlas}(F)$ and $p\in\mathrm{Dom}(\psi)=F$. Let $n=\dim(F)$.
Then $\Phi(\psi)|_{T_p(F)} : T_p(F)\to\mathbb{R}^n$ is a linear space isomorphism by Theorem 54.5.8. But $\psi : F\to\mathbb{R}^n$ is a
linear space isomorphism by Theorem 23.1.15. Therefore $\varpi_p=\psi^{-1}\circ\Phi(\psi)|_{T_p(F)} : T_p(F)\to F$ is a linear
space isomorphism.


54.9.8 THEOREM: Formula for inverse drop function for a linear space.
Let $F$ be a finite-dimensional linear space. Then

$\forall p,q\in F,\ \forall\psi\in\mathrm{atlas}(F),\quad (\varpi_p)^{-1}(q) = (\varpi|_{T_p(F)})^{-1}(q) = t_{p,\psi(q),\psi}.$

PROOF: Let $p,q\in F$ and $\psi\in\mathrm{atlas}(F)$. Then by Definition 54.9.5, $\varpi_p(t_{p,\psi(q),\psi})=\psi^{-1}(\psi(q))=q$. So by
Theorem 54.9.7, $(\varpi_p)^{-1}(q)=t_{p,\psi(q),\psi}$. The equality $(\varpi_p)^{-1}=(\varpi|_{T_p(F)})^{-1}$ follows by Theorem 54.9.6.


54.9.9 REMARK: Why coordinate maps are required for linear-space tangent bundle drop functions. 

One might perhaps ask why the map $\varpi$ should be defined via the coordinate space $\mathbb{R}^n$. The tangent vectors
on a general Banach space are typically defined directly as elements of the space itself without any need for 
a basis or coordinate charts. However, in the abstract fibre bundle framework, tangent vectors are elements 
of the total space of the tangent bundle, which is constructed via coordinate charts. Such tangent vectors 
are defined in a very abstract way via charts because in general a manifold cannot be assumed to have a 
Banach space structure. 


It happens that the standard definition for the tangent vector to a curve in a Banach space yields the 
same vector as would be obtained via coordinate charts. The principal benefit of Definition 54.9.5 is that 
it converts abstract tangent vectors in T(F) to the more concrete tangent vectors in F. (According to the 
"correspondence principle", a new improved theory must contain the well-known old classical realities within 
it. But demonstrating old results in terms of new improved more general theories may sometimes seem like 
using a helicopter to shop at the local supermarket.) 

54.9.10 REMARK: Some basic properties of the tangent bundle of a finite-dimensional linear space.
Theorem 54.9.11 gives some very basic properties of the tangent bundle of a finite-dimensional linear space.
The inverse relation $\Phi(\psi)^{-1}$ in part (i) is not a function in general, but the composite $\varpi\circ\Phi(\psi)^{-1}$ is a
function. This is a consequence of the fact that the vector $\varpi(V)$ depends only on the fibre-space component
$\Phi(\psi)(V)$ of $V$, not on the base-space component $\pi(V)$. In other words, $\varpi$ effectively extracts the vertical
component of $V\in T(F)$ and drops it to $\varpi(V)\in F$. When this is mapped to $\mathbb{R}^n$ by $\psi$, the result is the
same as $\Phi(\psi)(V)$. Thus $\psi(\varpi(V))=\Phi(\psi)(V)$.

Part (iii) of Theorem 54.9.11 shows the relation of the drop function $\varpi$ to the projection map $\pi : T(F)\to F$.
The manifold map for the total space $T(F)$ is the direct product of $\psi\circ\pi$ and $\psi\circ\varpi$, which verifies that $\pi$
and $\varpi$ split out the horizontal and vertical components of each tangent vector.

Although Theorem 54.9.11 may seem somewhat shallow, it throws some light on general tangent bundles,
which do not have such simple properties. Theorem 54.9.3, Definition 54.9.5 and Theorem 54.9.11 are not valid for tangent
bundles on general $C^1$ manifolds because they are in some sense “not flat”. More precisely, they are not
affine. In other words, there is no global linear space which may be regarded as “portable” from point to
point in the base space as in affine spaces.


54.9.11 THEOREM: Some basic map properties for the linear space drop function. 
Let $F$ be a finite-dimensional linear space with $n=\dim(F)$. Let $A_F$ be the standard atlas for $F$. Let
$\pi : T(F)\to F$ be the tangent bundle projection map for $F$.

(i) $\forall\psi\in A_F$, $\psi\circ\varpi\circ\Phi(\psi)^{-1}=\mathrm{id}_{\mathbb{R}^n}$.
(ii) $\forall\psi\in A_F$, $\Phi(\psi)=\psi\circ\varpi$.
(iii) $\forall\psi\in A_F$, $\Psi(\psi)=(\psi\circ\pi)\times(\psi\circ\varpi)=(\psi\times\psi)\circ(\pi\times\varpi)$.

PROOF: For part (i), let $\psi\in A_F$. Then $(W,v)\in\varpi\circ\Phi(\psi)^{-1}$ if and only if $(W,V)\in\varpi$ and $(v,V)\in\Phi(\psi)$
for some $V\in T(F)$ by Definition 9.6.2 for the composite of two relations and Definition 9.6.13 for the inverse
of a relation. In other words, $(W,v)\in\varpi\circ\Phi(\psi)^{-1}$ if and only if $W=\varpi(V)$ and $v=\Phi(\psi)(V)$ for some
$V\in T(F)$. But $V=t_{p,v',\psi}$ for some unique $(p,v')\in F\times\mathbb{R}^n$, and then $W=\psi^{-1}(v')$ by Definition 54.9.5
and $v'=v$ by Notation 54.5.7. So $(W,v)\in\varpi\circ\Phi(\psi)^{-1}$ if and only if $W=\psi^{-1}(v)$. Therefore $(\tilde v,v)\in\psi\circ
\varpi\circ\Phi(\psi)^{-1}$ if and only if $\tilde v=\psi(W)$ and $W=\psi^{-1}(v)$ for some $W\in F$. Hence $(\tilde v,v)\in\psi\circ\varpi\circ\Phi(\psi)^{-1}$ if and only if
$\tilde v=v$ because $\psi : F\to\mathbb{R}^n$ is a bijection. In other words, $\psi\circ\varpi\circ\Phi(\psi)^{-1}=\mathrm{id}_{\mathbb{R}^n}$.

For part (ii), let $\psi\in A_F$. Then $\varpi=\psi^{-1}\circ\Phi(\psi)$ by Definition 54.9.5. Hence $\Phi(\psi)=\psi\circ\varpi$ because
$\psi : F\to\mathbb{R}^n$ is a bijection.

Part (iii) follows from part (ii) and Notation 54.5.21 line (54.5.6).


54.9.12 REMARK: The advantages of manifold atlases which are “incomplete”. 

As alluded to in Remarks 50.4.9, 51.2.2, 51.4.4 and 54.5.31, complete or maximal atlases have the
disadvantage of “washing away” the detailed structure of the manifold. For example, an analytic manifold
(with positive dimension) whose atlas is replaced by the corresponding $C^\infty$-complete atlas will no longer be
analytic. It will be downgraded to being merely $C^\infty$. Similarly, a $C^k$-complete atlas will downgrade any
$C^\ell$ manifold for $k,\ell\in\mathbb{Z}_0^+$ with $k<\ell$ to a $C^k$ manifold. The standard atlas for a finite-dimensional linear
space $F$ in Definition 49.7.14 may be thought of as “linear-complete” or “linear-maximal” because it contains
all linearly compatible charts for $F$. Even extending this atlas to the analytic-complete atlas on $F$ would
“wash away” the linear structure of the underlying point-set. The use of “incomplete” or “non-maximal”
atlases has the advantage that the maximum information is retained. In this case, the linear structure is
important. So it would be counterproductive to “complete” or “maximalise” the atlas in any way.


54.10. Tangent velocity vectors 


54.10.1 REMARK:  Tangent-line vectors need to be numericised. 

Tangent-line vectors are “the real thing" (according to this book), but tangent velocity vectors are used for 
calculations. Nowadays, we numericise everything. This is because our ability to calculate with numbers is 
so vastly better than our ability to calculate with lengths of lines. But even quite recently, only a few decades 
ago, professional scientists and engineers were calculating with slide rules, which are a form of geometrical 
calculation, although slide rules are only reliable to about 3 decimal places. In the 18th century, Euler was 
in the habit of expressing algebraic relations and calculations in terms of geometrical manipulation of line
lengths. (See Euler [217], for example.) In the classical and Hellenic Greek world, it was usual to express 
algebra in terms of geometry. So it is not without precedent to express a definition in terms of lines, which 
may be thought of as a geometric concept. Geometric concepts are better for the intuition. Numerical 
concepts are better for the computer. 


54.10.2 REMARK: Tangent velocity vectors are really tangent vector component tuples. 

The “tangent velocity vectors” in Definition 54.10.4 are actually tangent vector component tuples.
Real-number tuples do not seem very much like abstract velocity vectors. However, the $n$-tuple space $\mathbb{R}^n$ can be
replaced with an abstract linear space $V$, and then the velocity becomes an abstract vector $v\in V$. In reality,
vectors are measured as numerical component tuples, although they are thought of as having an existence
independent of components.


54.10.3 REMARK: Tangent velocity vectors for Cartesian spaces.
The tangent velocity vectors in Definition 54.10.4 are closely related to the tangent velocity vectors which 
are defined for Cartesian space in Section 26.16. 


54.10.4 DEFINITION: A tangent velocity vector for an $n$-dimensional $C^1$ differentiable manifold $M$ is an
equivalence class $[(\psi,(x,v))]$ of chart-tagged point-velocity pairs in $\bigcup_{p\in M}(\mathrm{atlas}_p(M)\times(\{\psi(p)\}\times\mathbb{R}^n))$, where
$(\psi_1,(x_1,v_1))$ and $(\psi_2,(x_2,v_2))$ for $\psi_1,\psi_2\in A_M$ are said to be equivalent whenever $\psi_1^{-1}(x_1)=\psi_2^{-1}(x_2)$ and

$\forall i\in\mathbb{N}_n,\quad v_2^i = \sum_{j=1}^n \partial_{x^j}(\psi_2\circ\psi_1^{-1}(x))^i\big|_{x=x_1}\, v_1^j = \sum_{j=1}^n J_{21}(p)^i{}_j\, v_1^j,$

where $p=\psi_1^{-1}(x_1)$. (See Definition 51.4.18 for the transition matrix $J_{21}(p)$.)
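
The equivalence relation in Definition 54.10.4 can be made concrete with two charts on the open right half-plane of the Cartesian plane. The Python sketch below is an illustration only. It assumes that the first chart is the identity (Cartesian coordinates) and the second is the polar chart, and the function names are hypothetical. Two chart-tagged pairs are treated as equivalent when they have the same base point and their velocity tuples are related by the Jacobian of polar coordinates with respect to Cartesian coordinates.

```python
import math


def polar(p):
    """Polar chart: (x, y) -> (r, theta), assumed on the right half-plane."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))


def polar_inv(c):
    r, theta = c
    return (r * math.cos(theta), r * math.sin(theta))


def jacobian_polar_from_cartesian(p):
    """Transition matrix J21(p) = d(r, theta)/d(x, y) at the point p."""
    x, y = p
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))


def transform_velocity(p, v1):
    """Velocity tuple in the polar chart: v2^i = sum_j J21(p)^i_j v1^j."""
    J = jacobian_polar_from_cartesian(p)
    return tuple(sum(J[i][j] * v1[j] for j in range(2)) for i in range(2))


def equivalent(x1, v1, x2, v2, tol=1e-12):
    """Test equivalence of (cartesian, (x1, v1)) and (polar, (x2, v2))."""
    p1 = tuple(x1)          # inverse of the identity chart
    p2 = polar_inv(x2)      # inverse of the polar chart
    same_point = all(abs(a - b) <= tol for a, b in zip(p1, p2))
    w = transform_velocity(p1, v1)
    same_velocity = all(abs(a - b) <= tol for a, b in zip(w, v2))
    return same_point and same_velocity


# A unit velocity in the x-direction at the point (3, 4).
p = (3.0, 4.0)
v_cart = (1.0, 0.0)
v_polar = transform_velocity(p, v_cart)
```

At the point (3, 4), where r = 5, the Cartesian velocity (1, 0) corresponds to the polar velocity (3/5, -4/25), so the two tagged pairs lie in the same equivalence class.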


54.10.5 NOTATION: $t_{p,v,\psi}$, for $p\in M$, $v\in\mathbb{R}^n$ and $\psi\in\mathrm{atlas}_p(M)$, for an $n$-dimensional $C^1$ differentiable
manifold $M$, denotes the tangent velocity vector $[(\psi,(\psi(p),v))]$ in Definition 54.10.4.


54.10.6 REMARK: Equivalent expression for the set of chart-tagged point-velocity pairs. 
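The set $\bigcup_{p\in M}(\mathrm{atlas}_p(M)\times(\{\psi(p)\}\times\mathbb{R}^n))$ in Definition 54.10.4 is the same as the set
$\bigcup_{\psi\in A_M}(\{\psi\}\times(\mathrm{Range}(\psi)\times\mathbb{R}^n))$, where $A_M=\mathrm{atlas}(M)$.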


54.10.7 DEFINITION: The tangent velocity vector space at a point $p$ in a $C^1$ differentiable manifold $M$
is the set $V=\{[(\psi,(x,v))];\ \psi\in\mathrm{atlas}_p(M),\ x=\psi(p),\ v\in\mathbb{R}^n\}$, where the equivalence relation is as in
Definition 54.10.4, together with the operations of vector addition and scalar multiplication by the field $\mathbb{R}$
defined as follows.

(i) $\forall[(\psi,(x,v_1))],[(\psi,(x,v_2))]\in V$, $[(\psi,(x,v_1))]+[(\psi,(x,v_2))]=[(\psi,(x,v_1+v_2))]$.

(ii) $\forall\lambda\in\mathbb{R}$, $\forall[(\psi,(x,v))]\in V$, $\lambda[(\psi,(x,v))]=[(\psi,(x,\lambda v))]$.


54.10.8 NOTATION: $T_p(M)$ denotes the tangent velocity vector space at a point $p$ in a $C^1$ manifold $M$.


54.10.9 REMARK: Expressions for the set of vectors for a tangent velocity vector space. 

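The set of vectors (i.e. the module) of the tangent velocity vector space $T_p(M)$ for a point $p$ in a $C^1$
differentiable manifold $M$ is equal to the set of tangent velocity vectors $t_{p,v,\psi}$ on $M$ with $v\in\mathbb{R}^n$ and
$\psi\in\mathrm{atlas}_p(M)$. In other words,

$T_p(M) = \{t_{p,v,\psi};\ (v,\psi)\in\mathbb{R}^n\times\mathrm{atlas}_p(M)\}$
$\qquad\ \ = \{[(\psi,(\psi(p),v))];\ (\psi,v)\in\mathrm{atlas}_p(M)\times\mathbb{R}^n\}$
$\qquad\ \ = \{[(\psi,(x,v))];\ \psi\in\mathrm{atlas}_p(M),\ x=\psi(p),\ v\in\mathbb{R}^n\}.$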


54.10.10 DEFINITION: The tangent velocity vector bundle total space for a $C^1$ differentiable manifold $M$
is the set $\{[(\psi, (x, v))];\ \psi \in \mathrm{atlas}(M),\ x \in \mathrm{Range}(\psi),\ v \in \mathbb{R}^n\}$, where the equivalence relation is as in
Definition 54.10.4.


54.10.11 NOTATION: $\check{T}(M)$ denotes the tangent velocity vector bundle total space for a $C^1$ manifold $M$.


54.10.12 REMARK: Expressions for the tangent velocity vector bundle total space. 
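The tangent velocity vector bundle total space $\check{T}(M)$ for a $C^1$ differentiable manifold $M$ is equal to the set
$\bigcup_{p\in M} \check{T}_p(M)$ of all tangent velocity vectors $\check{t}_{p,v,\psi}$ on $M$. In other words,

    $\check{T}(M) = \{\check{t}_{p,v,\psi};\ p \in M,\ v \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M)\}$
    $\qquad = \{[(\psi, (x, v))];\ \psi \in \mathrm{atlas}(M),\ x \in \mathrm{Range}(\psi),\ v \in \mathbb{R}^n\}.$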

54.10.13 REMARK: The convenience of the representation of tangent velocity vectors. 

An equivalence class $[(\psi, (x, v))]$, for $\psi \in \mathrm{atlas}(M)$, $x \in \mathrm{Range}(\psi)$ and $v \in \mathbb{R}^n$, just happens to be (the
graph of) a function from $\mathrm{atlas}_p(M)$ to $\mathbb{R}^n \times \mathbb{R}^n$, where $p = \psi^{-1}(x)$. (This is a lucky consequence of the way functions
are represented in standard set theory. A similar comment is made in Remark 54.3.9.) Given a tangent
velocity vector $V = [(\psi, (x, v))] \in \check{T}_p(M)$, one may write $V(\psi) = (x, v)$, since $V$ is a function. For charts
$\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$, one may then write $V(\psi_1) = (x_1, v_1)$ and $V(\psi_2) = (x_2, v_2)$, where $\psi_1^{-1}(x_1) = \psi_2^{-1}(x_2)$
and $v_1, v_2$ satisfy line (54.1.4).


For a fixed point $p \in M$ and chart $\psi \in \mathrm{atlas}_p(M)$, the set of pairs $(x, v)$ in the tangent velocity vectors in the
tangent space $\check{T}_p(M)$ is the cross product $\{\psi(p)\} \times \mathbb{R}^n$. For any fixed $\psi \in \mathrm{atlas}(M)$, the pairs $(x, v)$ for tangent
velocity vectors in the tangent velocity bundle total space $\check{T}(M)$ lie in the cross product $\mathrm{Range}(\psi) \times \mathbb{R}^n$.
Most of the time in practical calculations, the chart is fixed and one works within such a cross product
$\mathrm{Range}(\psi) \times \mathbb{R}^n$, which looks very much like a Cartesian tangent bundle. Therefore the tangent velocity
vector representation in Definition 54.10.4 is convenient for calculations.


One particular example of the advantage of point-velocity pairs is in the definition of the tangent bundle
on a tangent bundle (i.e. the second-level tangent bundle) in Section 59.1 because the pairs $(x, v)$ are a
ready-made tuple of $2n$ chart coordinates, which is convenient for constructing second-level tangent vectors.


Roughly speaking, the tangent-line vectors in Definition 54.1.2 answer the ontological question: “What is
the tangent vector?” (At least, that is the claim of this author, given the constraints of what ZF set theory
can do.) The tangent velocity vectors in Definition 54.10.4 answer the question: “How can one conveniently
perform calculations with tangent vectors?” These are the author's answers to the questions in Remark 1.4.7
for tangent vectors on manifolds.


The “ontologically correct” tangent-line vectors in Definition 54.1.2 are equivalence classes of the form
$[(\psi, L)]$ by Theorem 54.1.8 (v), which are also fortuitously (graphs of) functions on the domain $\mathrm{atlas}_p(M)$
for some $p \in M$. If $V = [(\psi, L_{x,v})]$, then $V(\psi) = L_{x,v}$, where $L_{x,v} : \mathbb{R} \to \mathbb{R}^n$ is defined by $L_{x,v} : t \mapsto x + tv$
for $t \in \mathbb{R}$. (See Section 26.13 for tangent-line vectors in Cartesian spaces.) Therefore $V(\psi)(t) = x + tv$
for $t \in \mathbb{R}$. Hence in particular, $V(\psi)(0) = x = \psi(p)$.


54.10.14 REMARK: The convenience of numerical index/point/velocity triples for computer software. 

For practical computation with tangent velocity vectors, it is convenient to use an indexed atlas $(\psi_\alpha)_{\alpha\in I}$ for
some set $I$, and then replace the pairs $(\psi_\alpha, (x, v))$ with triples $(\alpha, x, v) \in I \times \mathbb{R}^n \times \mathbb{R}^n$ with $x \in \mathrm{Range}(\psi_\alpha)$.
Such a representation is straightforward to implement in computer software.
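
The index/point/velocity triples described above can be sketched in software as follows. This is only a minimal illustration under stated assumptions, not an implementation taken from this book: the manifold is assumed to be the open right half-plane in $\mathbb{R}^2$, the index set is the hypothetical pair of labels "cart" and "polar", and the names `Triple`, `transition_jacobian`, `change_chart` and `equivalent` are invented for the example. The transition matrix is approximated by central differences rather than computed exactly.

```python
import math
from dataclasses import dataclass

# Hypothetical indexed atlas on M = {(a, b) in R^2 : a > 0}; both charts cover M.
CHART = {
    "cart": lambda p: (p[0], p[1]),
    "polar": lambda p: (math.hypot(p[0], p[1]), math.atan2(p[1], p[0])),
}
CHART_INV = {
    "cart": lambda x: (x[0], x[1]),
    "polar": lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1])),
}


@dataclass(frozen=True)
class Triple:
    alpha: str  # chart index in the index set I
    x: tuple    # chart coordinates of the base point, x in Range(psi_alpha)
    v: tuple    # velocity components in R^n


def transition_jacobian(alpha, beta, x, h=1e-6):
    """Central-difference approximation of the Jacobian of psi_beta o psi_alpha^{-1} at x."""
    def phi(y):
        return CHART[beta](CHART_INV[alpha](y))
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = phi(xp), phi(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def change_chart(t, beta):
    """Return the triple in chart beta which is equivalent to the triple t."""
    J = transition_jacobian(t.alpha, beta, t.x)
    n = len(t.v)
    x_new = CHART[beta](CHART_INV[t.alpha](t.x))
    v_new = tuple(sum(J[i][j] * t.v[j] for j in range(n)) for i in range(n))
    return Triple(beta, x_new, v_new)


def equivalent(t1, t2, tol=1e-6):
    """Test the equivalence relation of Definition 54.10.4 up to a numerical tolerance."""
    s = change_chart(t1, t2.alpha)
    return all(abs(a - b) < tol for a, b in zip(s.x + s.v, t2.x + t2.v))


t = Triple("cart", (1.0, 1.0), (1.0, 0.0))
s = change_chart(t, "polar")
print(s, equivalent(t, s))
```

Storing a chart index instead of the chart itself keeps each triple a small immutable record, at the cost of requiring a global table of charts.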


54.10.15 REMARK: The relevance of patchwork spaces for tangent velocity bundles. 

The set $\bigcup_{p\in M} (\mathrm{atlas}_p(M) \times (\{\psi(p)\} \times \mathbb{R}^n)) = \bigcup_{\psi\in A_M} (\{\psi\} \times (\mathrm{Range}(\psi) \times \mathbb{R}^n))$ in Definition 54.10.4,
with $A_M = \mathrm{atlas}(M)$, fits the definition of a patchwork space. (See Sections 10.17 and 32.15 for non-topological
and topological patchwork spaces respectively.) This set may be said to be a “patchwork” of the
sets $\mathrm{Range}(\psi) \times \mathbb{R}^n$ for $\psi \in \mathrm{atlas}(M)$.


54.11. Tangent differential operators 


54.11.1 REMARK:  Tangent operators are not the primary definition for tangent vectors in this book. 

A tangent operator is the action of a tangent vector on real-valued differentiable functions on a manifold. 
Many authors use tangent operators as their primary definition of tangent vectors, but as discussed in 
Remark 53.3.3, tangent operators have both advantages and disadvantages as a primary definition. The 
disadvantages seem to outweigh the advantages. So the chart-tagged Cartesian lines in Definition 54.1.2 are
adopted here as the primary definition, whereas the tangent operators in Definition 54.11.2 are regarded as
merely tangent-vector-like objects which are not suitable for defining tangent bundles. They are without
doubt very useful objects, but they are not tangent vectors.


[Figure 54.11.1 Tangent operator definition: $\partial_{p,v,\psi}(f) = \sum_i v^i \partial_i(f \circ \psi^{-1})(\psi(p))$. The figure shows the point $\psi(p)$ and the level curves of $f \circ \psi^{-1}$.]


Definition 54.11.2 and Notation 54.11.3 are illustrated in Figure 54.11.1, where $\partial_i$ is shorthand for $\partial/\partial x^i$.


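54.11.2 DEFINITION: The tangent (differential) operator at a point $p$ in a $C^1$ differentiable manifold $M$,
with velocity $v \in \mathbb{R}^n$ for a chart $\psi \in \mathrm{atlas}_p(M)$, is the function from $C^1(M)$ to $\mathbb{R}$ which is given by the rule
$f \mapsto \sum_{i=1}^n v^i \partial_{x^i}(f \circ \psi^{-1}(x))\big|_{x=\psi(p)}$, where $n = \dim(M) \in \mathbb{Z}_0^+$.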


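54.11.3 NOTATION: $\partial_{p,v,\psi}$, for $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^1$ manifold $M$ with $n = \dim(M)$,
denotes the tangent differential operator at $p$ with velocity $v$ for a chart $\psi \in \mathrm{atlas}_p(M)$. In other words, the
map $\partial_{p,v,\psi} : C^1(M) \to \mathbb{R}$ is given by

    $\forall f \in C^1(M),\quad \partial_{p,v,\psi}(f) = \sum_{i=1}^n v^i \frac{\partial}{\partial x^i}(f \circ \psi^{-1}(x))\Big|_{x=\psi(p)}.$  (54.11.1)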


54.11.4 REMARK: Relations between points, functions, function spaces and tangent operators. 

Figure 54.11.2 shows tangent operators, real-valued functions, and points in a manifold. In this diagram, 
$f$ appears twice, first as a function from points $p \in M$ to $\mathbb{R}$, and then as a point $f \in C^1(M)$ being mapped
to $\mathbb{R}$ by the operator $\partial_{p,v,\psi}$. In one context, $f$ is a function, whereas in the other it is a “point” in the
domain of the operator $\partial_{p,v,\psi}$.


[Figure 54.11.2 Tangent operator map $\partial_{p,v,\psi} : C^1(M) \to \mathbb{R}$]


54.11.5 THEOREM: Linearity of tangent differential operators. 
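Let $M$ be a $C^1$ manifold with $n = \dim(M)$.


(i) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\ \forall \lambda \in \mathbb{R},\ \forall f \in C^1(M),\quad \partial_{p,v,\psi}(\lambda f) = \lambda \partial_{p,v,\psi}(f)$.
(ii) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\ \forall f_1, f_2 \in C^1(M),\quad \partial_{p,v,\psi}(f_1 + f_2) = \partial_{p,v,\psi}(f_1) + \partial_{p,v,\psi}(f_2)$.
(iii) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n$, $\partial_{p,v,\psi} : C^1(M) \to \mathbb{R}$ is a linear map.

PROOF: Part (i) follows from Theorem 41.1.18 (i).
Part (ii) follows from Theorem 41.1.18 (ii).
Part (iii) follows from parts (i) and (ii) and Definition 23.1.1.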


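54.11.6 THEOREM: Constant real-valued functions on $C^1$ manifolds have zero derivative.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $\lambda \in \mathbb{R}$. Define $f : M \to \mathbb{R}$ by $f(p) = \lambda$ for all $p \in M$. Then
$f \in C^1(M)$ and

    $\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\quad \partial_{p,v,\psi} f = 0.$


PROOF: Theorem 51.6.5 implies $f \in C^1(M)$. Let $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$. Then $f \circ \psi^{-1}(x) = \lambda$
for all $x \in \mathrm{Range}(\psi) \in \mathrm{Top}(\mathbb{R}^n)$, where $n = \dim(M)$. So $\partial_{x^i}(f \circ \psi^{-1})(x) = 0$ for all $i \in \mathbb{N}_n$, by
Theorem 41.1.16. Hence $\partial_{p,v,\psi}(f) = 0$ for all $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ by Notation 54.11.3.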


54.11.7 THEOREM: Expressions for tangent operators in terms of Cartesian chart derivatives. 
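Let $M$ be a $C^1$ manifold. Then for all $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$,

    $\forall f \in C^1(M),\quad \partial_{p,v,\psi}(f) = \partial_t \big(f(\psi^{-1}(\psi(p) + vt))\big)\big|_{t=0}$  (54.11.2)
    $\qquad = \partial_t \big(f(\psi^{-1}(L_{\psi(p),v}(t)))\big)\big|_{t=0}.$  (54.11.3)


PROOF: Line (54.11.2) follows from Notation 54.11.3 line (54.11.1), Theorem 41.6.21 (ii), Definition 41.4.4
and Notation 41.4.5. Line (54.11.3) then follows from Notation 26.13.4 and Definition 26.13.2.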
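
The agreement between the component sum in line (54.11.1) and the line-based derivative in line (54.11.2) can be checked numerically. The following sketch makes several assumptions which are not part of the text: the manifold is the open right half-plane in $\mathbb{R}^2$, the chart $\psi$ is the polar chart, derivatives are approximated by central differences, and the function names are hypothetical.

```python
import math


def chart(p):
    """Hypothetical polar chart psi on M = {(a, b) in R^2 : a > 0}."""
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))


def chart_inv(x):
    """Inverse of the polar chart."""
    return (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))


def tangent_operator(p, v, h=1e-6):
    """Return f -> sum_i v^i d/dx^i (f o psi^{-1})(x) at x = psi(p), as in line (54.11.1)."""
    x0 = chart(p)

    def op(f):
        total = 0.0
        for i, vi in enumerate(v):
            xp, xm = list(x0), list(x0)
            xp[i] += h
            xm[i] -= h
            total += vi * (f(chart_inv(xp)) - f(chart_inv(xm))) / (2 * h)
        return total

    return op


def line_derivative(p, v, f, h=1e-6):
    """Approximate d/dt f(psi^{-1}(psi(p) + t v)) at t = 0, as in line (54.11.2)."""
    x0 = chart(p)

    def g(t):
        return f(chart_inv([a + t * b for a, b in zip(x0, v)]))

    return (g(h) - g(-h)) / (2 * h)


f = lambda q: q[0] * q[1]                       # test function f(a, b) = a b
op = tangent_operator((1.0, 1.0), (1.0, 2.0))   # velocity components in the polar chart
print(op(f), line_derivative((1.0, 1.0), (1.0, 2.0), f))
```

In this chart $f \circ \psi^{-1}(r, \theta) = \frac{1}{2} r^2 \sin 2\theta$, so both expressions should be close to $\sqrt{2}$ at the point with $r = \sqrt{2}$, $\theta = \pi/4$.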


54.11.8 REMARK:  Line-based directional derivative operator. 

Theorem 54.11.7 line (54.11.2) has the advantage that it permits easy generalisation to non-$C^1$ manifolds,
including unidirectionally differentiable manifolds. (This can be useful on boundaries of regions, for example.)
Line (54.11.2) also avoids the inconvenience of summation symbols and vector components. However, the
decomposition of $\partial_{p,v,\psi}(f)$ into the product of the velocity tuple $(v^i)_{i=1}^n$ and the tuple of partial derivatives
of $f \circ \psi^{-1}$ in line (54.11.1) is enormously useful for computations. (This follows from Theorem 41.6.21 (ii).)
The fact that this product is linear with respect to both factors is the basis for a vast proportion of the
analysis of differentiable manifolds. Unfortunately, this simple linearity fails in the case of non-$C^1$ manifolds,
for which line (54.11.2) may still be valid.


Line (54.11.3) is convenient for various generalisations. For example, a unidirectional tangent vector may be
defined by the formula $\partial^+_{p,v,\psi} : f \mapsto \partial_t^+ f(\psi^{-1}(L_{\psi(p),v}(t)))\big|_{t=0}$. (See Section 41.5 for unidirectional derivatives.
See Section 54.16 for unidirectional tangent bundles.)


54.11.9 NOTATION: The set of tangent differential operators at a point in a manifold. 
$\mathring{T}_p(M)$, for $p \in M$, for a $C^1$ differentiable manifold $M$, denotes the set of tangent differential operators at $p$.
In other words,

    $\mathring{T}_p(M) = \{\partial_{p,v,\psi};\ v \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M)\}.$


54.11.10 DEFINITION: The tangent (differential) operator space at the point $p \in M$, for a $C^1$ differentiable
manifold $M$, is the set $\mathring{T}_p(M)$ of tangent differential operators at $p$ together with the operations of pointwise
addition and multiplication by real numbers. Thus

(i) $\forall \phi_1, \phi_2 \in \mathring{T}_p(M),\ \forall f \in C^1(M),\quad (\phi_1 + \phi_2)(f) = \phi_1(f) + \phi_2(f)$.

(ii) $\forall \lambda \in \mathbb{R},\ \forall \phi \in \mathring{T}_p(M),\ \forall f \in C^1(M),\quad (\lambda\phi)(f) = \lambda\phi(f)$.


The linear space specification tuple for this tangent operator space is $(\mathbb{R}, \mathring{T}_p(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \mu)$, where
$(\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ is the specification tuple for the field of real numbers.


54.11.11 THEOREM: Linearity of tangent operators with respect to components. 
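Let $M$ be a $C^1$ manifold with $n = \dim(M)$.
(i) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall \lambda \in \mathbb{R},\ \forall v \in \mathbb{R}^n,\quad \partial_{p,\lambda v,\psi} = \lambda \partial_{p,v,\psi}$.
(ii) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v_1, v_2 \in \mathbb{R}^n,\quad \partial_{p,v_1+v_2,\psi} = \partial_{p,v_1,\psi} + \partial_{p,v_2,\psi}$.
(iii) For all $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$, the map $v \mapsto \partial_{p,v,\psi}$ is a linear map from $\mathbb{R}^n$ to $\mathring{T}_p(M)$.

PROOF: For part (i), let $f \in C^1(M)$. Then by Notation 54.11.3 and Definition 22.2.19,
$\partial_{p,\lambda v,\psi}(f) = \sum_{i=1}^n (\lambda v)^i \partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)} = \lambda \sum_{i=1}^n v^i \partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)} = \lambda \partial_{p,v,\psi}(f)$.
Hence $\partial_{p,\lambda v,\psi} = \lambda \partial_{p,v,\psi}$ by Notation 54.11.3.

For part (ii), let $f \in C^1(M)$. Then by Notation 54.11.3 and Definition 22.2.19,

    $\partial_{p,v_1+v_2,\psi}(f) = \sum_{i=1}^n (v_1 + v_2)^i \partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)}$
    $\qquad = \sum_{i=1}^n \big( v_1^i \partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)} + v_2^i \partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)} \big)$
    $\qquad = \partial_{p,v_1,\psi}(f) + \partial_{p,v_2,\psi}(f).$

Hence $\partial_{p,v_1+v_2,\psi} = \partial_{p,v_1,\psi} + \partial_{p,v_2,\psi}$ by Notation 54.11.3.
Part (iii) follows from parts (i) and (ii) and Definition 23.1.1.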


54.11.12 NOTATION: The tangent differential operator corresponding to a given tangent vector. 
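$\partial_V$, for a tangent vector $V = t_{p,v,\psi} \in T(M)$, for a $C^1$ differentiable manifold $M$, denotes the corresponding
tangent differential operator $\partial_{p,v,\psi} \in \mathring{T}_p(M)$. In other words,

    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\quad \partial_{t_{p,v,\psi}} = \partial_{p,v,\psi}.$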


In terms of the tangent vector bundle notation in Definition 54.5.16, this may be written as
$\forall V \in T(M),\ \forall \psi \in \mathrm{atlas}_{\pi(V)}(M),\quad \partial_V = \partial_{\pi(V),\Phi(\psi)(V),\psi}$.


54.11.13 THEOREM: Linearity of tangent operator action with respect to the tangent vector. 
Let $M$ be a $C^1$ manifold.


(i) $\forall f \in C^1(M),\ \forall \lambda \in \mathbb{R},\ \forall V \in T(M),\quad \partial_{\lambda V} f = \lambda \partial_V f$.
(ii) $\forall f \in C^1(M),\ \forall p \in M,\ \forall V_1, V_2 \in T_p(M),\quad \partial_{V_1+V_2} f = \partial_{V_1} f + \partial_{V_2} f$.
(iii) For all $f \in C^1(M)$ and $p \in M$, the map $V \mapsto \partial_V f$ from $T_p(M)$ to $\mathbb{R}$ is a linear map.
(iv) For all $p \in M$, the map $V \mapsto \partial_V$ from $T_p(M)$ to $\mathring{T}_p(M)$ is a linear map.


PROOF: For part (i), let $V \in T(M)$ and $n = \dim(M)$. Then $V = t_{p,v,\psi}$ for some $p \in M$, $v \in \mathbb{R}^n$ and
$\psi \in \mathrm{atlas}_p(M)$ by Notation 54.1.4. So $\partial_V = \partial_{p,v,\psi}$ by Notation 54.11.12. Let $\lambda \in \mathbb{R}$ and $f \in C^1(M)$. Then

    $\partial_{\lambda V} f = \partial_{\lambda t_{p,v,\psi}} f$
    $\qquad = \partial_{t_{p,\lambda v,\psi}} f$  (54.11.4)
    $\qquad = \partial_{p,\lambda v,\psi} f$  (54.11.5)
    $\qquad = \lambda \partial_{p,v,\psi} f$  (54.11.6)
    $\qquad = \lambda \partial_V f,$  (54.11.7)

where line (54.11.4) follows by Definition 54.4.4 (ii), line (54.11.5) follows by Notation 54.11.12, line (54.11.6)
follows from Theorem 54.11.11 (i), and line (54.11.7) follows from Notation 54.11.12.


Part (ii) follows in a similar manner to part (i) from Notations 54.1.4 and 54.11.12, Definition 54.4.4 (i), and
Theorem 54.11.11 (ii).


Part (iii) follows from parts (i) and (ii) and Definition 23.1.1.
Part (iv) follows from parts (i) and (ii) and Definitions 54.11.10 and 23.1.1.


54.11.14 REMARK: Construction of tangent differential operators from tangent-line vectors. 

Theorem 54.11.15 shows how the tangent differential operator $\partial_{p,v,\psi}$ for a triple $(p, v, \psi)$ can be expressed in
terms of the corresponding tangent-line vector $t_{p,v,\psi}$. This shows how the built-in parametrised line inside
the tangent-line vector representation can be used to directly construct directional derivatives. Expressing
$t_{p,v,\psi}$ in terms of $\partial_{p,v,\psi}$ is not so simple.


54.11.15 THEOREM: Relation of tangent operators to tangent vectors with the tangent-line representation.
Let $M$ be an n-dimensional $C^1$ differentiable manifold. Then

    $\forall f \in C^1(M),\quad \partial_{p,v,\psi}(f) = \partial_t \big(f(\psi^{-1}(t_{p,v,\psi}(\psi)(t)))\big)\big|_{t=0}$  (54.11.8)

for all $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$.


PROOF: Let $M$ be an n-dimensional $C^1$ differentiable manifold. Let $p \in M$, $v \in \mathbb{R}^n$, $\psi \in \mathrm{atlas}_p(M)$
and $f \in C^1(M)$. Then $t_{p,v,\psi} = [(\psi, L_{\psi(p),v})]$ by Theorem 54.1.8 (v). So $t_{p,v,\psi}(\psi) = L_{\psi(p),v}$. Therefore
$t_{p,v,\psi}(\psi)(t) = L_{\psi(p),v}(t) = \psi(p) + tv$ for all $t \in \mathbb{R}$. So $f(\psi^{-1}(t_{p,v,\psi}(\psi)(t))) = f(\psi^{-1}(\psi(p) + tv))$. Hence
line (54.11.8) follows from line (54.11.2) in Theorem 54.11.7.


54.11.16 REMARK: Equality of tangent-line vectors implies equality of corresponding tangent operators. 
Let $V = t_{p,v,\psi} \in T(M)$ for a $C^1$ manifold $M$. Then it follows from Theorem 54.11.15 and Notation 54.11.12
that $\partial_V(f) = \partial_t f(\psi^{-1}(V(\psi)(t)))\big|_{t=0}$ for all $f \in C^1(M)$. This expression is independent of the choice
of $\psi \in \mathrm{atlas}_p(M)$. An interesting question here is how to determine the base point $p$ from an equivalence
class $[(\psi, L)]$. This is easily evaluated as $p = \psi^{-1}(L(0))$. So the choice of $\psi \in \mathrm{atlas}(M)$ for a given tangent-line
vector $V = [(\psi_0, L_0)]$ must be such that $\psi^{-1}(L(0)) = \psi_0^{-1}(L_0(0))$. Alternatively, and very much more
simply, one may choose any $\psi \in \mathrm{Dom}(V)$. Therefore

    $\forall V \in T(M),\ \forall f \in C^1(M),\ \forall \psi \in \mathrm{Dom}(V),$
    $\qquad \partial_V(f) = \partial_t f(\psi^{-1}(V(\psi)(t)))\big|_{t=0}.$


From this, it is clear that if the tangent-line vectors $t_{p_1,v_1,\psi_1}$ and $t_{p_2,v_2,\psi_2}$ are equal, then the tangent
differential operators $\partial_{p_1,v_1,\psi_1}$ and $\partial_{p_2,v_2,\psi_2}$ are equal.


54.11.17 REMARK: Ambiguities in the $\partial$-notation for derivatives on manifolds.

The $\partial$-notation for tangent operators in Notation 54.11.12 avoids the ambiguity of the more usual D-notation
for the naive derivative of a real-valued function with respect to a given tangent vector. Although naive
derivatives and covariant derivatives are equal when applied to real-number functions, it is best to use the
curly-dee $\partial$-notation when the intent is to apply a naive derivative, and use the big Latin letter D when
the intent is to apply the covariant derivative. Then there is less confusion when these symbols are tightly
mixed, as for example in Definition 71.6.9.


In Notation 54.11.12, $\partial$ is effectively a map from tangent vectors in $T(M)$ to tangent operators in $\mathring{T}(M)$.
But an operator such as $\partial_V$ may be applied to objects other than real-valued functions, such as vector fields
or tensor fields including differential forms. In such cases, the naive derivative $\partial_V$ is very different to the
covariant derivative $D^\theta_V$ with respect to an affine connection $\theta$. (Typically the naive derivative yields an
object in a higher-level tangent space, whereas the “output” of a covariant derivative would generally be in
the same level of space.) The meanings of both the $\partial_V$ and $D^\theta_V$ operators depend on the class of object they
are applied to.

The meanings of the operators $\partial_V$ and $D^\theta_V$ also depend on the object-class of the subscript $V$. For example,
$V$ may be a vector at a single point, or it could be a vector field, in which case these operators yield “outputs”
which are also fields of some kind. The $\partial$-operator is also often used with a real number or an integer index
as its subscript. One must simply be alert at all times to the class of every object in every expression!


54.11.18 THEOREM: Constant real-valued functions on $C^1$ manifolds have zero differential.
Let $M$ be a $C^1$ manifold. Let $\lambda \in \mathbb{R}$. Define $f : M \to \mathbb{R}$ by $f(p) = \lambda$ for all $p \in M$. Then $f \in C^1(M)$ and

    $\forall V \in T(M),\quad \partial_V f = 0.$


PROOF: The assertion follows from Notation 54.11.12 and Theorems 51.6.5 and 54.11.6.


54.11.19 REMARK: Rewritten stationarity condition for the local maximum of a function. 
Theorem 51.6.12 is rewritten in the language of tangent operators in Theorem 54.11.20. The expression
$(d/dt)(u \circ \gamma(t))\big|_{t=x} = 0$ in Theorem 51.9.5 for a $C^1$ curve $\gamma : \mathbb{R} \to M$, where $\gamma(x) = p$, equates to the
expression $\partial_{\gamma'(x)} u$, where $\gamma'(x)$ is as in Definition 57.9.2.


54.11.20 THEOREM: Tangent operators acting on real functions evaluate to zero at a maximum.
Let $M$ be a $C^1$ manifold. Let $p \in M$ be a local maximum of a function $f \in C^1(M)$. Then

    $\forall V \in T_p(M),\quad \partial_V f = 0.$


PROOF: Let $p \in M$ be a local maximum of $f \in C^1(M)$. Then $(\partial/\partial x^i)(f \circ \psi^{-1}(x))\big|_{x=\psi(p)} = 0$ for all
$i \in \mathbb{N}_n$, for all $\psi \in \mathrm{atlas}_p(M)$ by Theorem 51.6.12. Let $V \in T_p(M)$. Let $\psi \in \mathrm{atlas}_p(M)$. Then $V = t_{p,v,\psi}$ for
some $v \in \mathbb{R}^n$. So $\partial_V = \partial_{p,v,\psi}$ by Notation 54.11.12. Hence $\partial_V(f) = \sum_{i=1}^n v^i (\partial/\partial x^i)(f \circ \psi^{-1}(x))\big|_{x=\psi(p)} = 0$
by Definition 54.11.2.


54.12. Tangent operator total space 


54.12.1 DEFINITION: The tangent operator total space on a $C^1$ manifold $M$ is the set $\bigcup_{p\in M} \mathring{T}_p(M)$ of all
tangent differential operators at points of $M$.


54.12.2 NOTATION: The tangent operator total space for a differentiable manifold.
$\mathring{T}(M)$ denotes the tangent operator total space for a $C^1$ manifold $M$.


54.12.3 REMARK: Relation between tangent operator total space and the spaces at each point. 
From Definition 54.12.1 and Notation 54.12.2, it follows that 


    $\mathring{T}(M) = \bigcup_{p\in M} \mathring{T}_p(M) = \{\partial_{p,v,\psi};\ p \in M,\ v \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M)\}.$


Unfortunately, this set cannot be structured as a fibre bundle. The obstacle is that $\bigcap_{p\in M} \mathring{T}_p(M) \ne \emptyset$ by
Theorem 54.12.4 (ii). A fibre bundle must have pairwise disjoint fibre sets. The simplest solution to this
problem is to use point-tagged tangent operators as in Section 54.15.


54.12.4 THEOREM: Zero vectors at different points are indistinguishable. 
Let M be a C! manifold. 


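(i) $\forall p_1, p_2 \in M,\ \forall \psi_1 \in \mathrm{atlas}_{p_1}(M),\ \forall \psi_2 \in \mathrm{atlas}_{p_2}(M),\quad \partial_{p_1,0,\psi_1} = \partial_{p_2,0,\psi_2}$.
(ii) If $M \ne \emptyset$, then $\bigcap_{p\in M} \mathring{T}_p(M) \ne \emptyset$.

PROOF: For part (i), let $p_1, p_2 \in M$, $\psi_1 \in \mathrm{atlas}_{p_1}(M)$ and $\psi_2 \in \mathrm{atlas}_{p_2}(M)$. Let $f \in C^1(M)$. Then
$\partial_{p_1,0,\psi_1}(f) = 0 = \partial_{p_2,0,\psi_2}(f)$. Hence $\partial_{p_1,0,\psi_1} = \partial_{p_2,0,\psi_2}$ by Notation 54.11.3.
For part (ii), let $p_0 \in M$ and $\psi_0 \in \mathrm{atlas}_{p_0}(M)$. Let $p \in M$. Then there exists $\psi \in \mathrm{atlas}_p(M)$. So
$\partial_{p_0,0,\psi_0} = \partial_{p,0,\psi} \in \mathring{T}_p(M)$ by part (i). Therefore $\partial_{p_0,0,\psi_0} \in \bigcap_{p\in M} \mathring{T}_p(M)$. Hence $\bigcap_{p\in M} \mathring{T}_p(M) \ne \emptyset$.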


54.12.5 REMARK: The relation of tangent operator spaces to the space of C! functions. 

Each pointwise tangent operator space $\mathring{T}_p(M)$ is a linear subspace of the global function space $\mathrm{Lin}(C^1(M), \mathbb{R})$
of linear functionals on $C^1(M)$.

The union $\mathring{T}(M) = \bigcup_{p\in M} \mathring{T}_p(M)$ of these pointwise tangent operator spaces is not a linear subspace of
$\mathrm{Lin}(C^1(M), \mathbb{R})$ because the sum of non-zero tangent operators at two different points of a manifold will not
in general be a tangent operator at any point of the manifold.


However, the fact that the spaces $\mathring{T}_p(M)$ are linear subspaces of $\mathrm{Lin}(C^1(M), \mathbb{R})$ can be exploited by considering
instead the action of cross-sections $\phi$ of the fibre bundle consisting of differential operators, which act
on functions $f \in C^1(M)$ to give a distinct value $\phi(p)(f) \in \mathbb{R}$ for each $p \in M$. Such cross-sections have the
form $\phi : M \to \mathring{T}(M)$ with $\phi(p) \in \mathring{T}_p(M)$ for all $p \in M$. If the resulting function $p \mapsto \phi(p)(f)$ is an element
of $C^1(M)$ for every $f \in C^1(M)$, then $\phi$ may be regarded as a linear map from $C^1(M)$ to $C^1(M)$. In general, this does
not work very well unless one replaces the space $C^1(M)$ with $C^\infty(M)$, and then one must require $M$ to be a $C^\infty$ manifold.


54.12.6 REMARK: Extension of tangent differential operators to locally defined functions.

Although the operators in a pointwise tangent operator space $\mathring{T}_p(M)$ are applied to the global test-function
space $C^1(M)$ so that operators at all points of the manifold are acting on a common space, it is generally
assumed that the operators in $\mathring{T}_p(M)$ can also be applied to the space $\mathring{C}^1(M)$ of restricted $C^1$ functions
which are defined only on some neighbourhood of $p$. (See Notation 49.10.7 for $\mathring{C}^0(M)$. See Notation 51.6.8
for $\mathring{C}^1(M)$.)

Applying an operator $\partial_{p,v,\psi}$ to the restricted functions in $\mathring{C}^1(M, \mathbb{R}) = \bigcup_{\Omega\in\mathrm{Top}(M)} C^1(\Omega, \mathbb{R})$ may seem
at first to be unproblematic. Certainly this would be well defined if every function in $C^1(\Omega, \mathbb{R})$ could be
uniquely extended to a function in $C^1(M)$. But this is clearly not true. The application of $\partial_{p,v,\psi}$ to $\mathring{C}^1(M)$
is only valid because of the local character of the operator. It must be known a priori that the action of the
operator depends only on the values of the test function in a neighbourhood of $p$. If the parameters $p$, $v$ and
$\psi$ of $\partial_{p,v,\psi}$ have been “thrown away”, it will be impossible to know whether the operator is being applied to
a suitable open set.


The parameters $p$, $v$ and $\psi$ are used in the construction of a tangent operator, but the constructed object has
no explicit indication of those parameters. A tangent operator is a set of ordered pairs whose first elements
are $C^1$ functions on $M$ and whose second elements are real numbers. It is a machine which accepts $C^1$
functions as input and produces real numbers as output. From this one must somehow guess $p$ and $v$ for
given $\psi$.

Most differential geometry texts seem to implicitly assume that the three parameters are still available for
use at all times, which suggests that in fact the users of such mathematical objects are “keeping the tags
on”. It is typically assumed that if a map $h : C^1(M) \to \mathbb{R}$ is given, then the tuple $(p, v, \psi)$ can be easily
recovered from this map, knowing only that it was constructed in some particular way. This is a case where
the issue is confused by the notation. For example, if someone writes $X(p)$ for some vector field $X$, it is clear
from the notation that the base point is $p$, but if this is the zero vector, it is just the map $h : f \mapsto 0_{\mathbb{R}}$ for
$f \in C^1(M)$, which gives no information on the base point. In the case of non-zero tangent operators, one
must apply an infinite sequence of test functions to successively narrow down the location of the base point.


These considerations have an impact on the question of how to adapt the definition of a tangent operator
to locally differentiable test functions. The adaptation of $\partial_{p,v,\psi}$ from $C^1(M)$ to $\mathring{C}^1(M)$ is only well defined
if $p$ is known. Since $p$ is not explicit in a given operator $h : C^1(M) \to \mathbb{R}$, it is not possible to know whether
$h$ has the intended meaning on $\mathring{C}^1(M)$.


It is fairly clear that in practice, the base point p of a tangent operator is almost always attached to the 
operator. In other words, it is a “tagged operator” as described in Section 54.15. Thus in reality, the tags 
are almost never removed as they are claimed to be. 


In the Koszul formalism, where the fundamental object is a tangent operator field rather than a tangent
vector or tangent operator at a single point, the base point of a tangent operator is effectively available
because of the way in which the field is defined. If $X \in X(\mathring{T}(M))$ is a tangent operator field on $M$, then $X$
maps base points $p \in M$ to corresponding operators $X(p) \in \mathring{T}_p(M)$. This is in effect a kind of “tagging” for
operators because a vector field is a set of ordered pairs $(p, X(p))$ for $p \in M$.


54.12.7 REMARK: Difficulties of defining an atlas for the set of tangent differential operators. 

One of the big advantages of the tangent-line vector approach to tangent vectors is that the $C^\infty$ regularity
prerequisite on the manifold can be easily relaxed. Another advantage is that one doesn’t have the overhead of
constructing a space of differentiable functions to act as a test function space. Tangent differential operators 
are unquestionably essential in differential geometry. But they are not suitable as the primary representation 
for tangent vectors. 


One of the most unsatisfying aspects of tangent operators as a candidate for the representation of tangent
vectors is the difficulty of differentiating cross-sections of a bundle of such operators. The ability to differentiate
vector fields is essential for most applications. But to differentiate an operator-valued function
requires some sort of differentiable structure on the set of tangent operators. How does one coordinatise the
set of all tangent operators on a manifold? The obvious way to do this is to use the coefficient n-tuples $v$
of operators $\partial_{p,v,\psi}$. In other words, one arrives back at the coordinate representation of tangent vectors.
Differentiating in the general space of operators on spaces like $C^1(M)$ leads down complicated, confusing
pathways from which it is difficult to return. If most common operations on tangent operator bundles require
a reversion to coordinates anyway, one might as well use coordinates as the primary representation.


54.12.8 REMARK: Equality of tangent velocity vectors implies equality of corresponding tangent operators. 
Theorem 54.12.9 means that tangent velocity vectors and tangent differential operators are consistent with
each other. If the velocity vectors are equal, then the operators are equal. In other words, the choice of
representative of the equivalence class of coordinate triples is immaterial. So the $\partial$-operation in Notation
54.11.12 commutes with changes of chart. (Theorem 60.2.5 for higher-order tangent operators is similar to
Theorem 54.12.9.) 


54.12.9 THEOREM: Equal tangent velocity vectors correspond to equal tangent operators. 
If the tangent velocity vectors $\check{t}_{p_1,v_1,\psi_1}$ and $\check{t}_{p_2,v_2,\psi_2}$ of a $C^1$ manifold are equal, then the tangent operators
$\partial_{p_1,v_1,\psi_1}$ and $\partial_{p_2,v_2,\psi_2}$ are equal.


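PROOF: Suppose $\check{t}_{p_1,v_1,\psi_1} = \check{t}_{p_2,v_2,\psi_2}$. Then by Definition 54.10.4, $p_1 = p_2$ and
$v_2^i = \sum_{j=1}^n v_1^j \partial_{x^j}(\psi_2^i \circ \psi_1^{-1}(x))\big|_{x=\psi_1(p_1)}$ for all $i \in \mathbb{N}_n$. So by Definition 54.11.2 and the chain rule,

    $\forall f \in C^1(M),\quad \partial_{p_2,v_2,\psi_2}(f) = \sum_{i=1}^n v_2^i \partial_{x^i}(f \circ \psi_2^{-1}(x))\big|_{x=\psi_2(p_2)}$
    $\qquad = \sum_{i,j=1}^n v_1^j \partial_{x^i}(f \circ \psi_2^{-1}(x))\big|_{x=\psi_2(p_2)}\, \partial_{x^j}(\psi_2^i \circ \psi_1^{-1}(x))\big|_{x=\psi_1(p_1)}$
    $\qquad = \sum_{j=1}^n v_1^j \partial_{x^j}(f \circ \psi_1^{-1}(x))\big|_{x=\psi_1(p_1)}$
    $\qquad = \partial_{p_1,v_1,\psi_1}(f).$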
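
The chart-independence asserted by Theorem 54.12.9 can be illustrated numerically. The sketch below is an illustration only, under assumptions which are not in the text: the manifold is the open right half-plane in $\mathbb{R}^2$ with a Cartesian chart and a polar chart, the function names are hypothetical, and the derivatives are central-difference approximations. The Cartesian components $v_1$ are converted to polar components $v_2$ by the exact transition matrix, and the two resulting operators are compared on a few test functions.

```python
import math


def identity(p):
    """Hypothetical Cartesian chart (and its inverse) on M = {(a, b) : a > 0}."""
    return (p[0], p[1])


def polar(p):
    """Hypothetical polar chart on M."""
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))


def polar_inv(x):
    return (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))


def operator(p, v, chart, chart_inv, h=1e-6):
    """Central-difference version of the operator f -> sum_i v^i d_i(f o chart_inv)(chart(p))."""
    x0 = chart(p)

    def op(f):
        total = 0.0
        for i, vi in enumerate(v):
            xp, xm = list(x0), list(x0)
            xp[i] += h
            xm[i] -= h
            total += vi * (f(chart_inv(xp)) - f(chart_inv(xm))) / (2 * h)
        return total

    return op


def cart_to_polar_velocity(p, v1):
    """Apply the exact Jacobian of (a, b) -> (r, theta) at p to the components v1."""
    a, b = p
    r2 = a * a + b * b
    r = math.sqrt(r2)
    return (a / r * v1[0] + b / r * v1[1], -b / r2 * v1[0] + a / r2 * v1[1])


p = (2.0, 1.0)
v1 = (0.5, -1.5)
v2 = cart_to_polar_velocity(p, v1)
op1 = operator(p, v1, identity, identity)
op2 = operator(p, v2, polar, polar_inv)
tests = [
    lambda q: q[0] * q[1],
    lambda q: math.sin(q[0]) + q[1] ** 2,
    lambda q: math.exp(q[0] - q[1]),
]
differences = [abs(op1(f) - op2(f)) for f in tests]
print(differences)
```

The differences should be at the level of the finite-difference error, which is consistent with the equality of the two operators.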


54.12.10 REMARK: Equality of non-zero tangent operators implies equality of tangent vectors. 

Theorem 54.12.9 has a partial converse. Theorem 54.12.11 asserts that a given tangent operator uniquely
determines a tangent vector if the operator is non-zero. Since differential operators are defined as maps from
$C^1(M, \mathbb{R})$ to $\mathbb{R}$, it is not directly evident which point in $M$ they apply to. The effect of an operator $\partial_{p,v,\psi}$
on test functions is given, but the base point $p$, velocity component tuple $v$ and chart $\psi$ are “thrown away”
as discussed in Remark 54.12.6.

The proof of Theorem 54.12.11 must show the equality $\check{t}_{p_1,v_1,\psi_1} = \check{t}_{p_2,v_2,\psi_2}$ given only that the operators
$V_1$ and $V_2$ are non-zero elements of $\mathring{T}(M)$. Since the only information available about tangent operators
is their action on the space $C^1(M)$, the proof of Theorem 54.12.11 necessarily proceeds by constructing
test functions to apply the given tangent operators to. This makes the proof more difficult than the simple
computation in the proof of Theorem 54.12.9. This shows once again the inconvenience of using differential
operators to represent tangent vectors. When the coordinates are not given, it is technically onerous to
extract the coordinates from the operators for any kind of practical application.


It is also noteworthy that the proof of Theorem 54.12.11 explicitly depends on the Hausdorff topology for
the manifold $M$. In fact, the assertion fails if the manifold is not Hausdorff. For the manifold with a double
origin in Remark 49.5.3 and Example 49.5.5, all $C^1$ functions must have the same value and derivatives at
each of the two origins. So $\partial_{p_1,v_1,\psi_1} = \partial_{p_2,v_2,\psi_2} \ne 0$ does not imply $p_1 = p_2$. In the more interesting case of
two closed unit intervals in Remark 49.10.4, $C^1$ functions may have different values on separate branches,
even if required to be $C^\infty$, but they still must have the same zeroth and first derivatives at the branch
points, which makes them incapable of distinguishing non-zero tangent operators $\partial_{p_1,v_1,\psi_1}$ and $\partial_{p_2,v_2,\psi_2}$ with
$p_1 \ne p_2$ at such double-points.


54.12.11 THEOREM: Equal non-zero tangent operators correspond to equal tangent velocity vectors. 

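Let $M$ be a $C^1$ manifold. Let $\psi_k \in \mathrm{atlas}(M)$, $p_k \in \mathrm{Dom}(\psi_k)$ and $v_k \in \mathbb{R}^n$ for $k = 1, 2$. Suppose that
$\partial_{p_1,v_1,\psi_1} = \partial_{p_2,v_2,\psi_2} \ne 0$. Then $p_1 = p_2$ and $\check{t}_{p_1,v_1,\psi_1} = \check{t}_{p_2,v_2,\psi_2}$.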

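PROOF: Let $V_k = \partial_{p_k,v_k,\psi_k}$ for $k = 1, 2$. Then $V_k(f) = \sum_{i=1}^n v_k^i\, \partial/\partial x^i (f \circ \psi_k^{-1}(x))\big|_{x=\psi_k(p_k)}$ for $k = 1, 2$ for
all $f \in C^1(M)$ by Definition 54.11.2, where $n = \dim(M)$. Suppose that $p_1 \ne p_2$. Since $M$ is a Hausdorff
space by Definitions 51.3.8 and 50.1.1, there are $\Omega_k \in \mathrm{Top}_{p_k}(M)$ for $k = 1, 2$ such that $\Omega_1 \cap \Omega_2 = \emptyset$. Let
$G_k = \psi_k(\Omega_k)$ for $k = 1, 2$. Then $G_k \in \mathrm{Top}_{\psi_k(p_k)}(\mathbb{R}^n)$ for $k = 1, 2$. Therefore $B_{\psi_k(p_k),r_k} \subseteq G_k$ for some
$r_k \in \mathbb{R}^+$ for $k = 1, 2$. Define $h_k^j : G_k \to \mathbb{R}$ for $k = 1, 2$ and $j \in \mathbb{N}_n$ by

    $\forall k \in \mathbb{N}_2,\ \forall j \in \mathbb{N}_n,\ \forall x \in G_k,\quad h_k^j(x) = (x^j - \psi_k(p_k)^j)\, g_{r_k/2,r_k}(x - \psi_k(p_k)),$

where $g_{r,R} : \mathbb{R}^n \to \mathbb{R}_0^+$ is defined as in Example 44.1.12 for all $r, R \in \mathbb{R}^+$ with $0 < r < R$. Then $h_k^j \in C^\infty(G_k)$
for $k = 1, 2$ and $j \in \mathbb{N}_n$. Define $f_k^j : M \to \mathbb{R}$ for $k = 1, 2$ and $j \in \mathbb{N}_n$ by

    $\forall k \in \mathbb{N}_2,\ \forall j \in \mathbb{N}_n,\ \forall p \in M,\quad f_k^j(p) = h_k^j(\psi_k(p))$ if $p \in \psi_k^{-1}(G_k)$, and $f_k^j(p) = 0$ otherwise.

Then $f_k^j \in C^1(M)$ and so $V_i(f_k^j)$ is well defined for $i, k = 1, 2$ and $j \in \mathbb{N}_n$. The value of $V_k(f_k^j)$ may
be computed as $V_k(f_k^j) = \sum_{i=1}^n v_k^i\, \partial/\partial x^i (f_k^j \circ \psi_k^{-1}(x))\big|_{x=\psi_k(p_k)} = \sum_{i=1}^n v_k^i\, \partial/\partial x^i\, h_k^j(x)\big|_{x=\psi_k(p_k)} = v_k^j$ for
$k = 1, 2$ and $j \in \mathbb{N}_n$. However, $V_1(f_2^j) = 0$ for all $j \in \mathbb{N}_n$ because $f_2^j = 0$ in a neighbourhood of $p_1$. Similarly,
$V_2(f_1^j) = 0$ for all $j \in \mathbb{N}_n$. Therefore $\partial_{p_1,v_1,\psi_1} \ne \partial_{p_2,v_2,\psi_2}$ if $\partial_{p_1,v_1,\psi_1} \ne 0$ or $\partial_{p_2,v_2,\psi_2} \ne 0$. This contradicts
the assumption $\partial_{p_1,v_1,\psi_1} = \partial_{p_2,v_2,\psi_2} \ne 0$. Therefore $p_1 = p_2$.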


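To show that $\check{t}_{p_1,v_1,\psi_1} = \check{t}_{p_2,v_2,\psi_2}$, note that the assumption $\partial_{p_1,v_1,\psi_1} = \partial_{p_2,v_2,\psi_2}$ implies that $V_1(f_1^j) = V_2(f_1^j)$
for all $j \in \mathbb{N}_n$. So

    $\forall j \in \mathbb{N}_n,\quad v_1^j = V_2(f_1^j) = \sum_{i=1}^n v_2^i\, \partial_{x^i}(\psi_1^j \circ \psi_2^{-1}(x))\big|_{x=\psi_2(p_2)},$

because $f_1^j$ differs from the chart component $\psi_1^j$ only by a constant on a neighbourhood of $p_1 = p_2$.
Hence $\check{t}_{p_1,v_1,\psi_1} = \check{t}_{p_2,v_2,\psi_2}$ by Definition 54.10.4.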
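
The idea of recovering velocity components by applying an operator to coordinate-like test functions can be sketched numerically. The following is a simplified illustration, not the construction in the proof: it assumes that the manifold is the open right half-plane in $\mathbb{R}^2$, where the chart components are globally defined, so the cut-off function $g_{r,R}$ and the constant offsets are omitted. All function names are hypothetical, and the operator is approximated by central differences.

```python
import math


def polar(p):
    """Hypothetical polar chart on M = {(a, b) in R^2 : a > 0}."""
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))


def polar_inv(x):
    return (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))


def make_operator(p, v, h=1e-6):
    """Build an operator from (p, v, polar chart); the result does not expose p or v."""
    x0 = polar(p)

    def op(f):
        total = 0.0
        for i, vi in enumerate(v):
            xp, xm = list(x0), list(x0)
            xp[i] += h
            xm[i] -= h
            total += vi * (f(polar_inv(xp)) - f(polar_inv(xm))) / (2 * h)
        return total

    return op


def recover_components(V, coordinate_functions):
    """Apply the operator V to each coordinate function of a chart."""
    return tuple(V(fj) for fj in coordinate_functions)


# Coordinate functions of the two charts, regarded as real-valued functions on M.
polar_coords = (lambda q: math.hypot(q[0], q[1]), lambda q: math.atan2(q[1], q[0]))
cart_coords = (lambda q: q[0], lambda q: q[1])

V = make_operator((1.0, 1.0), (1.0, 2.0))
v_polar = recover_components(V, polar_coords)
v_cart = recover_components(V, cart_coords)
print(v_polar, v_cart)
```

The components recovered with the polar coordinate functions reproduce the tuple used to build the operator, while the Cartesian coordinate functions yield the components of the same operator for the Cartesian chart. The base point itself is not recovered by this sketch, which is the harder part of the proof.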


54.12.12 REMARK: Defining tangent vectors as derivations is tedious, cumbersome and restrictive. 

Many textbooks define tangent vectors as “derivations”, which are shown to be the same as differential
operators. Derivations are defined on the space of $C^\infty$ real-valued functions on a manifold. This restricts
their applicability to $C^\infty$ manifolds. This is a restriction which may sometimes be overlooked in applications
to non-smooth manifolds. Derivations are defined at a point $p$ of a $C^\infty$ manifold $M$ as those linear operators
$L : C^\infty(M) \to \mathbb{R}$ which satisfy the Leibniz rule:

    $\forall f, g \in C^\infty(M),\quad L(f.g) = L(f)g(p) + f(p)L(g).$


It may be shown that derivations on a $C^\infty$ manifold are the same thing as first-order differential operators.
(See for example Crampin/Pirani [7], pages 247-248; Gómez-Ruiz [14], pages 21-22.) Since it is straightforward
to show that a first-order differential operator is a derivation, but difficult to show the converse, it
seems more practical to define tangent operators in the classical fashion, and the direct definition for tangent
operators is then applicable to general $C^1$ manifolds.
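
The easy direction mentioned above, that a first-order differential operator satisfies the Leibniz rule, can be checked numerically. This is only a sketch under assumptions which are not in the text: the manifold is $\mathbb{R}^2$ with the identity chart, the names `operator` and `leibniz_defect` are hypothetical, and derivatives are central-difference approximations. For contrast, point evaluation $f \mapsto f(p)$ is linear but is not a derivation.

```python
import math


def operator(p, v, h=1e-6):
    """Central-difference version of f -> sum_i v^i d_i f(p) for the identity chart on R^2."""
    def op(f):
        total = 0.0
        for i, vi in enumerate(v):
            xp, xm = list(p), list(p)
            xp[i] += h
            xm[i] -= h
            total += vi * (f(xp) - f(xm)) / (2 * h)
        return total

    return op


def leibniz_defect(L, p, f, g):
    """Return L(f.g) - (L(f) g(p) + f(p) L(g)), which is zero for a derivation at p."""
    fg = lambda q: f(q) * g(q)
    return L(fg) - (L(f) * g(p) + f(p) * L(g))


p = (0.5, -1.0)
f = lambda q: math.sin(q[0]) + q[1] ** 2
g = lambda q: math.exp(q[0] * q[1])

L = operator(p, (2.0, 3.0))
point_evaluation = lambda u: u(p)  # linear on functions, but not a derivation

print(leibniz_defect(L, p, f, g), leibniz_defect(point_evaluation, p, f, g))
```

The first defect should vanish up to finite-difference error, whereas the defect for point evaluation equals $-f(p)g(p)$, which is non-zero here.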


54.13. Tangent operator basis transformations 


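54.13.1 THEOREM: Chart-transition matrix for tangent differential operators.

Let $M$ be a $C^1$ manifold with indexed atlas $(\psi_\alpha)_{\alpha\in I}$. Let $p \in M$, $\alpha, \beta \in I$, and $v_\alpha, v_\beta \in \mathbb{R}^n$ be such that
$\partial_{p,v_\alpha,\psi_\alpha} = \partial_{p,v_\beta,\psi_\beta}$. Then $v_\beta = J_{\beta\alpha}(p) v_\alpha$, where the matrix $J_{\beta\alpha}(p) \in GL(n)$ for $p \in \mathrm{Dom}(\psi_\alpha) \cap \mathrm{Dom}(\psi_\beta)$ is
as in Definition 51.4.18 with $n = \dim(M)$.


PROOF: The assertion follows from Theorem 54.12.11 and Definitions 54.10.4 and 51.4.18.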


54.13.2 REMARK: Multiplication on the left by chart-transition matrices. 

The matrix $J_{\beta\alpha}(p)$ in Theorem 54.13.1 may be thought of as the contravariant transformation rule from
$\alpha$-coordinates to $\beta$-coordinates. In accordance with the usual conventions of matrix multiplication, one may
write $J_{\gamma\alpha}(p) = J_{\gamma\beta}(p) J_{\beta\alpha}(p)$ for all $\alpha, \beta, \gamma \in I$ such that $p \in \mathrm{Dom}(\psi_\alpha) \cap \mathrm{Dom}(\psi_\beta) \cap \mathrm{Dom}(\psi_\gamma)$.


54.13.3 REMARK: Chart basis operators provide a basis for tangent differential operators. 
The chart basis operators in Definition 54.13.4 and Notation 54.13.5 provide a natural basis for the linear 
space of tangent operators at a fixed point of a differentiable manifold. 


54.13.4 DEFINITION: The tangent operator space chart-basis vector for component $i \in \mathbb{N}_n$ at a point $p \in M$
with respect to a chart $\psi \in \mathrm{atlas}_p(M)$ for an $n$-dimensional $C^1$ differentiable manifold $M$ is the tangent
operator $\partial_{p,e_i,\psi} \in \mathring{T}_p(M)$, where $e_i \in T_{\psi(p)}(\mathbb{R}^n)$ is the corresponding Cartesian chart-basis vector at $\psi(p)$.


54.13.5 NOTATION: Tangent operator space chart-basis vectors.
$\partial_i^{p,\psi}$ for $p \in M$, $\psi \in \mathrm{atlas}_p(M)$ and $i \in \mathbb{N}_n$, for a $C^1$ differentiable manifold $M$ with $n = \dim(M)$, denotes
the tangent operator space chart-basis vector $\partial_{p,e_i,\psi} \in \mathring{T}_p(M)$.


54.13.6 THEOREM: Formulas for tangent operator components. 
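Let $M$ be a $C^1$ differentiable manifold with $n = \dim(M)$. Then


$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\ \forall f \in C^1(M),\quad
\partial_i^{p,\psi}(f) = \frac{\partial}{\partial x^i}\bigl(f \circ \psi^{-1}(x)\bigr)\Big|_{x=\psi(p)} \tag{54.13.1}$$


and


$$\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\quad
\partial_{p,v,\psi} = \sum_{i=1}^n v^i \partial_i^{p,\psi}. \tag{54.13.2}$$


PROOF: Lines (54.13.1) and (54.13.2) both follow from Notations 54.13.5 and 54.11.3.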


54.13.7 THEOREM: Tangent differential operator expressed in terms of chart-basis tangent operators. 
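Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then


$$\forall p \in M,\ \forall V \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\quad
\partial_V = \sum_{i=1}^n \Phi(\psi)(V)^i\, \partial_i^{p,\psi}. \tag{54.13.3}$$


Hence


$$\forall p \in M,\ \forall V \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall f \in C^1(M),\quad
\partial_V(f) = \sum_{i=1}^n \Phi(\psi)(V)^i\, \frac{\partial}{\partial x^i}\bigl(f \circ \psi^{-1}(x)\bigr)\Big|_{x=\psi(p)}. \tag{54.13.4}$$


PROOF: Let $p \in M$, $V \in T_p(M)$ and $\psi \in \mathrm{atlas}_p(M)$. Then $V = t_{p,v,\psi}$ for some $v \in \mathbb{R}^n$ by Notation 54.1.4.
So $\partial_V = \partial_{p,v,\psi}$ by Notation 54.11.12. But $v = \Phi(\psi)(t_{p,v,\psi}) = \Phi(\psi)(V)$ by Notation 54.5.10. So
$\partial_V = \sum_{i=1}^n v^i \partial_i^{p,\psi} = \sum_{i=1}^n \Phi(\psi)(V)^i \partial_i^{p,\psi}$ by Theorem 54.13.6 line (54.13.2). This verifies line (54.13.3). Then
line (54.13.4) follows from Theorem 54.13.6 line (54.13.1).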


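54.13.8 THEOREM: Chart-transition matrix for tangent operator space chart-basis vectors.
Let $\psi_\alpha, \psi_\beta \in \mathrm{atlas}(M)$ be two charts for a $C^1$ manifold $M$ with $n = \dim(M)$. Then the tangent operator
space chart-basis vectors $(\partial_i^{p,\psi_\alpha})_{i=1}^n$ and $(\partial_i^{p,\psi_\beta})_{i=1}^n$ at any point $p \in \mathrm{Dom}(\psi_\alpha) \cap \mathrm{Dom}(\psi_\beta)$ satisfy


$$\forall i \in \mathbb{N}_n,\quad \partial_i^{p,\psi_\beta} = \sum_{j=1}^n \partial_j^{p,\psi_\alpha}\, J_{\alpha\beta}(p)^j{}_i, \tag{54.13.5}$$


where the chart transition matrix $J_{\alpha\beta}(p)$ is defined in Definition 51.4.18.


PROOF: Let $i \in \mathbb{N}_n$. Then $\partial_i^{p,\psi_\beta} = \partial_{p,e_i,\psi_\beta}$. By Theorem 54.13.1, $\partial_{p,e_i,\psi_\beta} = \partial_{p,v_\alpha,\psi_\alpha}$ when $e_i = J_{\beta\alpha}(p) v_\alpha$.
Then $J_{\alpha\beta}(p) e_i = J_{\alpha\beta}(p) J_{\beta\alpha}(p) v_\alpha = v_\alpha$. So $\sum_{k=1}^n J_{\alpha\beta}(p)^j{}_k\, e_i^k = v_\alpha^j$ for all $j \in \mathbb{N}_n$. But $e_i^k = \delta_i^k$ for
all $i, k \in \mathbb{N}_n$. So $J_{\alpha\beta}(p)^j{}_i = v_\alpha^j$ for all $j \in \mathbb{N}_n$. But $v_\alpha = \sum_{j=1}^n v_\alpha^j e_j$. So
$\partial_{p,e_i,\psi_\beta} = \sum_{j=1}^n v_\alpha^j\, \partial_{p,e_j,\psi_\alpha} = \sum_{j=1}^n J_{\alpha\beta}(p)^j{}_i\, \partial_{p,e_j,\psi_\alpha} = \sum_{j=1}^n \partial_j^{p,\psi_\alpha}\, J_{\alpha\beta}(p)^j{}_i$.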
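

Theorems 54.13.1 and 54.13.8 can be illustrated together on a concrete pair of charts. The sketch below is a
hedged example, not part of the book's development: it assumes the manifold is the open right half-plane, that
$\psi_\alpha$ is the Cartesian chart and $\psi_\beta$ is the polar chart $(r, \theta)$, and that the test
function is $f(x, y) = x^2 y$. All function names are illustrative. It checks that chart-basis operators
transform with $J_{\alpha\beta}(p)$ on the right, that components transform with $J_{\beta\alpha}(p)$ on the
left, and that the resulting action on $f$ is the same in both charts.

```python
import math


def matvec(A, v):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def jac_beta_alpha(x, y):
    """J_{beta alpha}(p): Jacobian of (r, theta) with respect to (x, y)."""
    r = math.hypot(x, y)
    return [[x / r, y / r], [-y / r**2, x / r**2]]


def jac_alpha_beta(x, y):
    """J_{alpha beta}(p): Jacobian of (x, y) with respect to (r, theta)."""
    r, th = math.hypot(x, y), math.atan2(y, x)
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th), r * math.cos(th)]]


def basis_alpha(x, y):
    """Cartesian chart-basis operators applied to f = x^2 y."""
    return [2 * x * y, x * x]


def basis_beta(x, y):
    """Polar chart-basis operators applied to f = r^3 cos^2(theta) sin(theta)."""
    r, th = math.hypot(x, y), math.atan2(y, x)
    c, s = math.cos(th), math.sin(th)
    return [3 * r**2 * c**2 * s, r**3 * (c**3 - 2 * c * s**2)]


x, y = 3.0, 4.0
v_alpha = [1.0, 2.0]
v_beta = matvec(jac_beta_alpha(x, y), v_alpha)     # components: matrix on the left

d_alpha, d_beta = basis_alpha(x, y), basis_beta(x, y)
J_ab = jac_alpha_beta(x, y)
# Basis operators: matrix on the right, as in line (54.13.5).
d_beta_from_alpha = [sum(d_alpha[j] * J_ab[j][i] for j in range(2))
                     for i in range(2)]

action_alpha = sum(v * d for v, d in zip(v_alpha, d_alpha))
action_beta = sum(v * d for v, d in zip(v_beta, d_beta))
print(action_alpha, action_beta)
```

The two printed values agree up to rounding, which is the chart-independence of the linear combination of
components and chart-basis operators for this particular example.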


54.13.9 REMARK: Multiplication on the right by chart-transition matrices. 

In Theorem 54.13.8, basis vectors are multiplied by the matrix $J_{\alpha\beta}(p)$ on the right, but in Theorem 54.13.1,
components are multiplied by the inverse matrix $J_{\beta\alpha}(p)$ on the left. There is no contradiction here. In
Theorem 54.13.1, the objects being transformed are contravariant components for tangent vectors, whereas
in Theorem 54.13.8, the objects are sequences of tangent vectors, which are essentially covariant in nature.
This ensures that the linear combination $\sum_{i=1}^n \partial_i^{p,\psi_\alpha} v_\alpha^i$ is independent of the chart $\psi_\alpha$. (This is related to the
fact that principal fibre bundle structure group elements have a right action on fibre sets and total spaces.)


Whenever a coordinate change occurs, the components $v_\alpha^i$ of a tangent operator $\partial_{p,v_\alpha,\psi_\alpha}$ (or vector $t_{p,v_\alpha,\psi_\alpha}$) are
transformed according to a linear transformation with matrix $J_{\beta\alpha}(p) \in \mathrm{GL}(n)$ on the left, which is just what
is required for the definition of a fibre bundle in Definition 47.6.5, condition (v).


54.13.10 REMARK: Mnemonics and shorthands for chart-basis tangent differential operators.
A useful mnemonic for the operator $\partial_i^{p,\psi}$ is $(\partial/\partial\psi^i)(p)$. (See Remark 54.13.11.) Then equation (54.13.5)
may be expressed as


$$\forall i \in \mathbb{N}_n,\quad
\frac{\partial}{\partial\psi_\beta^i}(p)
= \sum_{j=1}^n \frac{\partial}{\partial\psi_\alpha^j}(p)\, J_{\alpha\beta}(p)^j{}_i
= \sum_{j=1}^n \frac{\partial}{\partial\psi_\alpha^j}(p)\, \frac{\partial\psi_\alpha^j}{\partial\psi_\beta^i}(p)
= \sum_{j=1}^n \frac{\partial\psi_\alpha^j}{\partial\psi_\beta^i}(p)\, \frac{\partial}{\partial\psi_\alpha^j}(p).$$


A useful shorthand for the symbol $\partial_i^{p,\psi}$ is $\partial_i$ when the point $p$ and chart $\psi$ are implied.


54.13.11 REMARK: Mnemonics and shorthands for tangent differential operators. 
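The tangent operator $\partial_{p,v,\psi}$ in Definition 54.11.2 may be written more colloquially as


$$\sum_{i=1}^n v^i \frac{\partial}{\partial\psi^i}\Big|_p
\quad\text{or}\quad \sum_{i=1}^n v^i \frac{\partial}{\partial\psi^i}(p)
\quad\text{or}\quad v^i \partial_i^{p,\psi}.$$


When $p$ and $\psi$ are clear from the context, the simpler notation $v^i \partial_i$ may be used.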


54.14. Tangent operator action on vector-valued functions 


54.14.1 REMARK: Application of tangent differential operators to vector-valued functions. 

The tangent differential operators in Definition 54.11.2 are applied only to real-valued functions on a manifold.
This is because the purpose of Chapter 54 is to present tangent bundles, not the applications of those tangent
bundles more broadly. However, it is convenient to present here the application of tangent operators to some
function classes.


The formula on line (54.11.1) of Notation 54.11.3, which defines a tangent operator $\partial_{p,v,\psi}$ by its action on
a real-valued function $f \in C^1(M)$, may be extended to apply to real-tuple-valued functions in $C^1(M, \mathbb{R}^m)$
for $m \in \mathbb{Z}_0^+$, or to abstract vector-valued functions in $C^1(M, W)$ for finite-dimensional real linear spaces $W$.
(Extension to Banach spaces $W$ is also straightforward, but not required in this book.)


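54.14.2 DEFINITION: Action of tangent operators on Cartesian space valued functions on a manifold.
The action of a tangent operator $\partial_{p,v,\psi}$ on $\mathbb{R}^m$-valued functions on a $C^1$ manifold $M$, where $p \in M$, $v \in \mathbb{R}^n$,
$\psi \in \mathrm{atlas}_p(M)$, $n = \dim(M)$ and $m \in \mathbb{Z}_0^+$, is the map $\partial_{p,v,\psi} : C^1(M, \mathbb{R}^m) \to \mathbb{R}^m$ given by


$$\forall f \in C^1(M, \mathbb{R}^m),\ \forall j \in \mathbb{N}_m,\quad
\partial_{p,v,\psi}(f)^j = \sum_{i=1}^n v^i \frac{\partial}{\partial x^i}\bigl(f^j \circ \psi^{-1}(x)\bigr)\Big|_{x=\psi(p)}
= \partial_{p,v,\psi}(f^j).$$


(See Notation 54.11.3 for $\partial_{p,v,\psi}(f^j)$.)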


54.14.3 REMARK: Abstract vector-valued functions can be differentiated "through the charts". 

Lines (54.14.1) and (54.14.3) in Definition 54.14.4 show how vector-valued functions on $C^1$ manifolds can be
differentiated “through the charts”. In other words, linear space charts $\kappa_B$, whose values are the components
of vectors with respect to particular bases $B$ of $W$, can be applied to convert vector-valued functions into
Cartesian space valued functions as in Definition 54.14.2. (The use of component maps for vector-valued
function differentiation is also discussed in Remarks 41.8.3 and 41.8.7.)


The expression $\partial_i(\kappa_B \circ f \circ \psi^{-1})$ on line (54.14.1) uses Definition 41.2.7 because $\kappa_B \circ f \circ \psi^{-1} \in C^1(U, \mathbb{R}^m)$
with $U \in \mathrm{Top}(\mathbb{R}^n)$. The expression $\partial_{p,v,\psi}(\kappa_B \circ f)$ on line (54.14.3) uses Definition 54.14.2 because $\kappa_B \circ f \in
C^1(U, \mathbb{R}^m)$ with $U \in \mathrm{Top}(M)$.


54.14.4 DEFINITION: Action of tangent operators on vector-valued functions on a manifold.
The action of a tangent operator $\partial_{p,v,\psi}$ on $W$-valued functions on a $C^1$ manifold $M$, where $p \in M$, $v \in \mathbb{R}^n$
and $\psi \in \mathrm{atlas}_p(M)$ with $n = \dim(M)$, and $W$ is a finite-dimensional real linear space, is the map $\partial_{p,v,\psi} :
C^1(M, W) \to W$ given by


$$\forall f \in C^1(M, W),\quad \partial_{p,v,\psi}(f) = \sum_{i=1}^n v^i\, \partial_i(f \circ \psi^{-1})(\psi(p)).$$


(See Notation 41.8.5 for $\partial_i(f \circ \psi^{-1})(x)$. See Notation 41.8.10 for $C^1(U, W)$ for $U \in \mathrm{Top}(\mathbb{R}^n)$.)
In other words, for any basis $B$ of $W$,


$$\begin{aligned}
\forall f \in C^1(M, W),\quad \partial_{p,v,\psi}(f)
&= \sum_{i=1}^n v^i\, \kappa_B^{-1}\bigl(\partial_i(\kappa_B \circ f \circ \psi^{-1})(\psi(p))\bigr) &\text{(54.14.1)}\\
&= \kappa_B^{-1}\Bigl(\sum_{i=1}^n v^i\, \partial_i(\kappa_B \circ f \circ \psi^{-1})(\psi(p))\Bigr) &\text{(54.14.2)}\\
&= \kappa_B^{-1}\bigl(\partial_{p,v,\psi}(\kappa_B \circ f)\bigr). &\text{(54.14.3)}
\end{aligned}$$


(See Definition 22.8.8 for the component map $\kappa_B : W \to \mathbb{R}^m$, where $m = \dim(W)$. See Definition 54.14.2
for $\partial_{p,v,\psi}(\kappa_B \circ f)$.)
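

The phrase “for any basis $B$ of $W$” in Definition 54.14.4 means that the result of differentiating through
the component map $\kappa_B$ does not depend on $B$. The following minimal sketch illustrates this under
simplifying assumptions which are not in the text: $M = \mathbb{R}^2$ with the identity chart, $W = \mathbb{R}^2$
written in fixed reference coordinates, a basis stored as the columns of a $2 \times 2$ matrix, and partial
derivatives estimated by central differences. The names `kappa`, `kappa_inv` and `tangent_action` are hypothetical.

```python
import math


def matvec(A, v):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def inv2(B):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def kappa(B):
    """Component map kappa_B : W -> R^2 for the basis given by the columns of B."""
    B_inv = inv2(B)
    return lambda w: matvec(B_inv, w)


def kappa_inv(B):
    """Inverse component map: components -> vector of W."""
    return lambda c: matvec(B, c)


def f(x, y):
    """A W-valued function on M, written in the reference coordinates of W."""
    return [x * y, x + y * y]


def partial(g, x, i, h=1e-4):
    """Central-difference estimate of the i-th partial derivative of g at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(g(*xp), g(*xm))]


def tangent_action(B, x, v):
    """kappa_B^{-1}( sum_i v^i d_i(kappa_B o f)(x) ), following line (54.14.2)."""
    k, k_inv = kappa(B), kappa_inv(B)
    comp = lambda *q: k(f(*q))          # kappa_B o f (identity chart on M)
    total = [0.0, 0.0]
    for i, vi in enumerate(v):
        d = partial(comp, x, i)
        total = [t + vi * dj for t, dj in zip(total, d)]
    return k_inv(total)


x, v = (2.0, 3.0), (1.0, -1.0)
result_std = tangent_action([[1.0, 0.0], [0.0, 1.0]], x, v)
result_skew = tangent_action([[1.0, 1.0], [1.0, 2.0]], x, v)
print(result_std, result_skew)
```

Both bases give the same vector of $W$ up to rounding, namely the Jacobian of $f$ at $x$ applied to $v$.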
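

54.14.5 THEOREM: Constant vector-valued functions on $C^1$ manifolds have zero derivative.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $W$ be a finite-dimensional real linear space. Let $w_0 \in W$.
Define $f : M \to W$ by $f(p) = w_0$ for all $p \in M$. Then $f \in C^1(M, W)$ and


$$\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\quad \partial_{p,v,\psi}(f) = 0.$$


PROOF: Theorem 51.7.8 implies $f \in C^1(M, W)$.
Let $p \in M$, $v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$. Then $f \circ \psi^{-1}(x) = w_0$ for all $x \in \mathrm{Range}(\psi) \in \mathrm{Top}(\mathbb{R}^n)$. So
$\partial_i(f \circ \psi^{-1})(x) = 0$ for all $x \in \mathrm{Range}(\psi)$ and $i \in \mathbb{N}_n$ by Theorem 41.8.12. Hence $\partial_{p,v,\psi}(f) = 0$ for all $p \in M$,
$v \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ by Definition 54.14.4.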


54.14.6 REMARK:  Tangent operator application to connection forms. 

Theorem 54.14.7 is an application of the tangent operator action in Definition 54.14.4 to the kind of scenario 
which arises when computing the exterior derivative or curvature of a connection form on a principal bundle. 
(See Section 69.5 for connection forms. See Definition 70.5.2 for the curvature of a connection form.) 


A connection form is a Lie-algebra-valued function on a principal bundle, where the Lie algebra is a
finite-dimensional linear space consisting of tangent vectors at the identity element of a Lie group. In this scenario,
the point $q$ in Theorem 54.14.7 would be this identity element, $M_1$ would be the principal bundle total space,
$M_2$ would be the Lie group, $T_q(M_2)$ would be the Lie algebra, and $\phi$ would be the connection form. (For an
application, see Example 69.5.8.)


54.14.7 THEOREM: Differentiation of tangent-vector-valued functions on a manifold. 

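Let $M_1$ and $M_2$ be $C^1$ manifolds. Let $n_1 = \dim(M_1)$, $n_2 = \dim(M_2)$, $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_2 \in \mathrm{atlas}(M_2)$.
Let $U \in \mathrm{Top}(M_1)$, $\phi \in C^1(U, \mathbb{R}^{n_2})$ and $q \in \mathrm{Dom}(\psi_2)$. Define $f : U \to T_q(M_2)$ by $\forall p \in U$, $f(p) = t_{q,\phi(p),\psi_2}$.
Then


$$\forall p \in U \cap \mathrm{Dom}(\psi_1),\ \forall v \in \mathbb{R}^{n_1},\quad
\partial_{p,v,\psi_1}(f) = t_{q,\,\partial_{p,v,\psi_1}(\phi),\,\psi_2}
= \sum_{i=1}^{n_1} v^i\, t_{q,\,\partial_{p,e_i,\psi_1}(\phi),\,\psi_2},$$


where $e_i = (\delta_i^j)_{j=1}^{n_1} \in \mathbb{R}^{n_1}$ for $i \in \mathbb{N}_{n_1}$ are the standard basis vectors for $\mathbb{R}^{n_1}$ as in Definition 22.7.9. (See
Definition 54.14.2 for $\partial_{p,e_i,\psi_1}(\phi)$ and $\partial_{p,v,\psi_1}(\phi)$.)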


PROOF: Let $\psi_2 \in \mathrm{atlas}(M_2)$ and $q \in \mathrm{Dom}(\psi_2)$. Then $T_q(M_2) = \{t_{q,a,\psi_2};\ a \in \mathbb{R}^{n_2}\}$ by Theorem 54.1.8 (xii).
This is a real linear space by Definition 54.4.4, and $\dim(T_q(M_2)) = n_2$ by Theorem 54.4.11. Let $W = T_q(M_2)$.
Then $f$ is a $W$-valued function on $U$, where $W$ is a finite-dimensional real linear space. Also $f \in C^1(U, W)$
because $\phi \in C^1(U, \mathbb{R}^{n_2})$ and the map $a \mapsto t_{q,a,\psi_2}$ from $\mathbb{R}^{n_2}$ to $T_q(M_2)$ is linear and hence $C^\infty$. So the
action of $\partial_{p,v,\psi_1}$ on $f$ is well defined for all $p \in U \cap \mathrm{Dom}(\psi_1)$ and $v \in \mathbb{R}^{n_1}$ by Definition 54.14.4.

Let $B = (t_{q,e_j,\psi_2})_{j \in \mathbb{N}_{n_2}}$. Then $B$ is a basis for $T_q(M_2)$. (See Theorem 54.4.11 and Notation 54.4.10.) For
$p \in U \cap \mathrm{Dom}(\psi_1)$, $\kappa_B \circ f : U \to \mathbb{R}^{n_2}$ maps $p$ to $\phi(p)$. Therefore $\partial_{p,v,\psi_1}(\kappa_B \circ f) = \partial_{p,v,\psi_1}(\phi)$. Consequently
$\partial_{p,v,\psi_1}(f) = \kappa_B^{-1}\bigl(\partial_{p,v,\psi_1}(\kappa_B \circ f)\bigr) = \kappa_B^{-1}\bigl(\partial_{p,v,\psi_1}(\phi)\bigr) = t_{q,\,\partial_{p,v,\psi_1}(\phi),\,\psi_2}$ by Definition 54.14.4 line (54.14.3).
The second equality in the theorem then follows from the linearity of $\partial_{p,v,\psi_1}(\phi)$ with respect to $v$.


54.15. Tagged tangent operators


54.15.1 REMARK: Ad-hoc resolution of zero tangent vector ambiguity by point-tagging.

A “tagged” tangent operator on a manifold $M$ is a pair of the form $(p, \partial_{p,v,\psi})$, where $\partial_{p,v,\psi}$ is a tangent
operator as in Definition 54.11.2. The reason for tagging is to resolve the ambiguity of operators $\partial_{p,0,\psi}$
with zero coefficients. (See Remark 54.12.10.) It is not possible to construct a tangent bundle from tangent
operators without such tagging. (Malliavin [28], page 64, calls the false assumption $T_p(M) \cap T_q(M) = \emptyset$ “un
abus de langage commode”, which means “a convenient abuse of language”.)


Although non-zero tangent operators can be disambiguated in principle, in practice it is very burdensome
to determine the base point $p$ and coefficients $v$ (for a given chart $\psi$) from the action of an operator on test
functions. So even without the zero-vector ambiguity, it would be a good idea to use tagging.


54.15.2 REMARK: Tagged tangent vectors are unnatural and unnecessary. 

Tagged operators are a desperate measure to salvage differential operators as a viable representation for
tangent vectors. It is true that differential operators do have some convenient formulas for making
calculations. But one may easily obtain such convenience by invoking the notation $\partial_V = \partial_{p,v,\psi}$ for tangent-line
vectors $V = t_{p,v,\psi}$ as in Notation 54.11.12.


The tagged tangent operator definitions and notations in Section 54.15 could be argued to be the best
compromise between convenience, tradition and meaningfulness. However, they are too clumsy and unnatural.
Tangent-line vectors seem to offer the best compromise, and (untagged) differential operators may be easily
constructed as needed from tangent-line vectors. Consequently, the definitions and notations given here for
tagged differential operators may be safely ignored.


54.15.3 DEFINITION: A tagged tangent (differential) operator on a $C^1$ manifold $(M, A_M)$ is a pair $(p, \partial_{p,v,\psi})$
such that $p \in M$ and $\partial_{p,v,\psi} : C^1(M) \to \mathbb{R}$ is a tangent operator at $p$.


54.15.4 NOTATION: The set of tagged tangent operators at a given point.
$\hat{T}_p(M)$ for a point $p \in M$, for a $C^1$ differentiable manifold $M$, denotes the set of tagged tangent operators
on $M$ at $p$. In other words, $\hat{T}_p(M) = \{p\} \times \mathring{T}_p(M) = \{(p, \phi);\ \phi \in \mathring{T}_p(M)\}$.


54.15.5 DEFINITION: The tagged tangent operator space at a point $p$ in a $C^1$ manifold $M$ is the set $\hat{T}_p(M) =
\{(p, \phi);\ \phi \in \mathring{T}_p(M)\}$ together with the linear space operations of $\mathring{T}_p(M)$. Thus

(i) $\forall \phi_1, \phi_2 \in \mathring{T}_p(M)$, $(p, \phi_1) + (p, \phi_2) = (p, \phi_1 + \phi_2)$,

(ii) $\forall \lambda \in \mathbb{R}$, $\forall \phi \in \mathring{T}_p(M)$, $\lambda (p, \phi) = (p, \lambda\phi)$.


The linear space specification tuple for this tagged tangent operator space is $(\mathbb{R}, \hat{T}_p(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \mu)$, where
$(\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ is the specification tuple for the field of real numbers.


54.15.6 NOTATION: The set of all tagged tangent operators on a manifold.
$\hat{T}(M)$ for a $C^1$ manifold $M$ denotes the set $\bigcup_{p \in M} \hat{T}_p(M)$ of tagged tangent operators
$(p, \partial_{p,v,\psi})$ at points $p \in M$.


54.15.7 REMARK: The action of operators on function spaces is inefficient for computer software. 

A tangent bundle based on the tagged tangent operator in Definition 54.15.3 would not be an economical
tangent bundle definition. The space $C^1(M)$ is a very large space of functions. If $p$ is fixed, it is sufficient
to consider (instead of $C^1(M)$) only the functions $f^k \in C^1(M)$ defined by $f^k(q) = \psi(q)^k$ for $q \in \mathrm{Dom}(\psi)$
and $k \in \mathbb{N}_n$ (with appropriate smoothing at the boundary of $\mathrm{Dom}(\psi)$ to ensure extendability to all of $M$).
Then $\partial_{p,v,\psi}(f^k) = v^k$ for all $v \in \mathbb{R}^n$ and $k \in \mathbb{N}_n$. In other words, the action of $\partial_{p,v,\psi}$ on a well-chosen set of
$n$ functions contains the same information as the action on the entire space $C^1(M)$. But of course, the main
objective of this representation is usefulness in applications, not economy for data structures in computer
software.


It is possible to recover (or discover) the base point $p$ and the component vector $v \in \mathbb{R}^n$ from a tangent
operator $\partial_{p,v,\psi}$ if the point $p$ is not known, but this is quite difficult. If the operator is non-zero, it is sufficient
to apply the operator to the functions $f_k$ and $f_{k\ell}$ defined by $f_k : q \mapsto \psi(q)^k$ and $f_{k\ell} : q \mapsto \psi(q)^k \psi(q)^\ell$. Let
$\alpha_k = \partial_{p,v,\psi}(f_k)$ and $\beta_{k\ell} = \partial_{p,v,\psi}(f_{k\ell})$ for $p \in \mathrm{Dom}(\psi)$, and let $x = \psi(p)$. By the Leibniz rule,
$\beta_{k\ell} = v^k x^\ell + x^k v^\ell$. Then clearly $v^i = \alpha_i$ for all $i \in \mathbb{N}_n$, and $x^i = \beta_{ii} / 2\alpha_i$
for all $i$ such that $\alpha_i \ne 0$. Choose $j$ such that $\alpha_j \ne 0$. Then $x^i = \beta_{ij} / \alpha_j$ for all $i$ such that $\alpha_i = 0$. Thus
both the point $p = \psi^{-1}(x)$ and $v \in \mathbb{R}^n$ are determined from the action of $\partial_{p,v,\psi}$ on at most $n + n^2$ test
functions.
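

The recovery procedure described above can be sketched in code. This is only an illustration under stated
assumptions: the chart is taken to be the identity on $\mathbb{R}^n$ (so $p$ is identified with $x = \psi(p)$),
the operator is modelled as a black-box callable acting on functions of the chart coordinates, and the names
`Dual`, `make_operator` and `recover` are hypothetical. Dual numbers are used so that the operator acts exactly
on the coordinate functions and their pairwise products.

```python
class Dual:
    """Number val + der*eps with eps**2 == 0, supporting products only."""

    def __init__(self, val, der):
        self.val, self.der = val, der

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)


def make_operator(x, v):
    """Black-box operator f -> sum_i v[i] * (df/dx_i)(x); x and v stay hidden."""
    def op(f):
        return f([Dual(a, b) for a, b in zip(x, v)]).der
    return op


def recover(op, n):
    """Recover (x, v) from the action of a non-zero operator on test functions."""
    # alpha_k = op(f_k) with f_k : z -> z[k], so alpha_k = v^k.
    alpha = [op(lambda z, k=k: z[k]) for k in range(n)]
    if all(a == 0 for a in alpha):
        raise ValueError("zero operator: the base point cannot be recovered")

    def beta(k, l):
        # beta_kl = op(f_kl) with f_kl : z -> z[k] * z[l] = v^k x^l + x^k v^l.
        return op(lambda z: z[k] * z[l])

    j = next(i for i, a in enumerate(alpha) if a != 0)
    x = [beta(i, i) / (2 * alpha[i]) if alpha[i] != 0 else beta(i, j) / alpha[j]
         for i in range(n)]
    return x, alpha


hidden_op = make_operator((1.5, -2.0, 4.0), (2.0, 0.0, -1.0))
x_rec, v_rec = recover(hidden_op, 3)
print(x_rec, v_rec)
```

The failure branch for the zero operator reflects the ambiguity which motivates tagging: every base point gives
the same zero operator, so no amount of testing can recover $p$.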


54.15.8 DEFINITION: A tagged tangent operator space chart-basis vector is a tagged tangent operator
$\hat{\partial}_i^{p,\psi} = (p, \partial_i^{p,\psi})$ such that $\partial_i^{p,\psi}$ is a tangent operator space chart-basis vector.


54.16. Unidirectional tangent bundles 


54.16.1 REMARK: Unidirectionally differentiable manifolds and homeomorphisms. 
Unidirectionally differentiable manifolds are defined in Section 51.11. Some examples of unidirectionally 
differentiable homeomorphisms are presented in Examples 53.2.4, 53.2.5 and 53.2.6. 


54.16.2 REMARK: Unidirectional tangent bundles for embedded manifolds. 

A unidirectional tangent bundle looks like a cone at each point of a manifold which is embedded in a 
Euclidean space whereas the usual definition of a tangent bundle looks like a tangent plane at each point. 
An invariant class of curves for unidirectionally differentiable manifolds is the class of curves with one-sided 
derivatives everywhere. The set of one-sided derivatives at a point constitutes a cone of tangent vectors. 


54.16.3 REMARK: Contexts where unidirectionally differentiable manifolds might arise.

The examples in Section 50.6 give some idea of the issues which arise for manifolds with locally Lipschitz
transition maps. The locally Lipschitz, i.e. $C^{0,1}$, condition guarantees the existence of derivatives almost
everywhere, but does not guarantee the existence of unidirectional derivatives everywhere, as shown by
Example 50.6.8. Existence of tangent vectors almost everywhere may be adequate in contexts where vector
fields are to be integrated, for example. So the manifold classes $C^{k,1}$ for integer $k$ may be useful for some
applications. In other applications, directional derivatives may be well defined everywhere but not continuous
everywhere. Such manifolds may arise from differential equations whose force functions have well-defined
one-sided limits everywhere instead of full continuity.


54.16.4 REMARK: Unidirectional tangent bundle metadefinition. 
A unidirectional tangent bundle for an $n$-dimensional unidirectionally differentiable manifold $(M, A_M)$ must
provide a tuple $(T, \pi, A_T, \Phi)$, abbreviated to $(T, A_T)$ or $T$, which satisfies the following conditions.


(i) $\pi : T \to M$ is a surjective map.

(ii) $\Phi : A_M \to A_T$ is a bijection.

(iii) $\forall \psi \in A_M$, $\Phi(\psi) : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$.

(iv) $\forall p \in M$, $\forall \psi \in \mathrm{atlas}_p(M)$, $\Phi(\psi)\big|_{\pi^{-1}(\{p\})} : \pi^{-1}(\{p\}) \to \mathbb{R}^n$ is a bijection.

(v) $\forall p \in M$, $\forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M)$, $\forall V \in \pi^{-1}(\{p\})$,

$$\Phi(\psi_2)(V) = \partial_a^+\, \psi_2\bigl(\psi_1^{-1}(\psi_1(p) + a\,\Phi(\psi_1)(V))\bigr)\Big|_{a=0}. \tag{54.16.1}$$


$T$ is called the total space of the unidirectional tangent bundle.

An element of $T$ is called a unidirectional tangent vector.

$\pi$ is called the projection map of the unidirectional tangent bundle.

$A_T$ is called the unidirectional tangent bundle atlas of the unidirectional tangent bundle.

The maps $\Phi(\psi) \in A_T$ are called unidirectional tangent bundle charts.

A unidirectional tangent vector at $p \in M$ is any element of $\pi^{-1}(\{p\})$.

The unidirectional tangent vector at $p \in M$ with coordinates $v \in \mathbb{R}^n$ with respect to chart $\psi \in A_M$ is the
unidirectional tangent vector $\bigl(\Phi(\psi)\big|_{\pi^{-1}(\{p\})}\bigr)^{-1}(v) \in T$.
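

Condition (v) requires only a one-sided limit, so the induced coordinate transformation for tangent vectors need
not be linear. The following minimal sketch illustrates this with a one-dimensional example which is an
assumption of this illustration, not an example from the text: $M = \mathbb{R}$, chart $\psi_1$ is the identity,
and chart $\psi_2$ is a piecewise-linear homeomorphism with a kink at $0$. Because the transition map is
piecewise linear through $p = 0$, the difference quotient for any small $a > 0$ already equals the one-sided
limit there; for a general transition map a genuine limit would be needed.

```python
def psi1(p):
    """Chart 1 on M = R: the identity."""
    return p


def psi1_inv(x):
    return x


def psi2(p):
    """Chart 2 on M = R: a homeomorphism with a kink at 0."""
    return p if p >= 0 else 2 * p


def transition_quotient(p, v1, a):
    """(psi2(psi1^{-1}(psi1(p) + a*v1)) - psi2(p)) / a for a > 0."""
    return (psi2(psi1_inv(psi1(p) + a * v1)) - psi2(p)) / a


def one_sided(p, v1, a=2.0 ** -20):
    """One-sided derivative at p; exact here only because psi2 is piecewise linear."""
    return transition_quotient(p, v1, a)


print(one_sided(0.0, 1.0), one_sided(0.0, -1.0))
```

The chart-2 coordinates of the vectors with chart-1 coordinates $1$ and $-1$ at $p = 0$ are $1$ and $-2$. The
map is positively homogeneous but not additive, which is why the tangent set at such a point is better described
as a cone than as a linear space.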


54.16.5 REMARK: Expressions for unidirectional derivatives via the charts.
Equation (54.16.1) may be written more fully as follows.


$$\begin{aligned}
\Phi(\psi_2)(V)
&= \lim_{a \to 0^+} \frac{\psi_2(\psi_1^{-1}(x + a v)) - \psi_2(\psi_1^{-1}(x))}{a}\bigg|_{x=\psi_1(p),\ v=\Phi(\psi_1)(V)} \\
&= \lim_{a \to 0^+} \frac{\psi_2\bigl(\psi_1^{-1}(\psi_1(p) + a\,\Phi(\psi_1)(V))\bigr) - \psi_2(p)}{a} \\
&= \partial_a^+\, \psi_2\bigl(\psi_1^{-1}(\psi_1(p) + a\,\Phi(\psi_1)(V))\bigr)\Big|_{a=0}.
\end{aligned}$$


55.0.1 REMARK: Table of tangent tensor classes.

Chapter 55 presents covector bundles and frame bundles. Chapter 56 presents tensor bundles and multilinear
map bundles. Table 55.0.1 gives a rough overview of tangent spaces, vector fields and differentials. (See also
Table 53.4.1 for a summary of definitions of differentiable manifold objects and spaces.)


type linear space total space vector fields references 


real functions and maps 


real function CE(M) C*(M) 51.6 
manifold map CE (Mi, M3) C* (Mi, M3) 52.1 
tangent vectors, covectors and tensors 

tangent T,(M) T(M) X*(T(M)) 54.1 
tangent operator T,(M) T(M) 54.11 
tagged tangent operator T,(M) T(M) X*(T(M)) 54.15 
tangent covector T5 (M) T*(M) X*(T*(M)) 55.2 
tangent vector-tuple T; (M) T"(M) X*(T"(M)) 55.5 
tangent vector-frame F;(M) F'(M) X*(F"(M)) 55.6 
tangent space basis Fp (M) F”(M) X*(F"(M)) 55.7 
type (r, s) tensor TES (M) T”S(M) X"(T'*(M)) 56.1, 57.5 
higher order tangent spaces 

order-£ tangent TE (M) TUM) X*(TU(M)) 60.2 
order-¢ tangent operator T4(M) TU (M) 

tagged order-/ tang. op. T1 (M) TUM) X*(Tl4(M)) 


Table 55.0.1 Overview of tangent spaces, vector fields and differentials on manifolds 


55.0.2 REMARK: Covector, frame and tensor bundles are “associated” with tangent bundles. 
Like tangent vector bundles, tangent tensor bundles are differentiable fibre bundles. Tangent tensor bundles
share the same structure group as the corresponding tangent vector bundles. Hence they are associated fibre
bundles, and parallelism can therefore be ported between them. The class derivation relations are roughly
sketched in Figure 55.0.2.


[Figure 55.0.2: diagram relating non-topological fibre bundles, topological fibre bundles, differentiable fibre
bundles, tangent bundles on manifolds and tensor bundles on manifolds.]

Figure 55.0.2 Relations between fibre bundles, tangent bundles and tensor bundles


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg.html
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3]
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.
[draft: UTC 2023-1-3 Tuesday 00:13]


55.0.3 REMARK: Etymology of covectors and cotangents. 

The use of the prefix “co” for covectors and cotangents makes some sense. (The Latin prefix “co” derives 
from a Latin preposition which means “with”.) In trigonometry, the cosine, cotangent and cosecant of an 
angle $\theta$ are defined by swapping the X and Y coordinates in a right-angled triangle where Y is the side
opposite the angle $\theta$, and X is adjacent. In a similar way, tangent covectors (also known as “cotangent
vectors”) are defined as the dual of tangent vectors. Tangent covectors are not reciprocals or inverses of
tangent vectors. But the chart transition maps for tangent vectors and covectors are inverses of each other. 
Thus vectors and covectors go “with” each other. 


55.1. Philosophy of tangent covector representations 


55.1.1 REMARK: The plethora of clumsy representations for covectors. 

In principle, the space of tangent covectors is the dual of the space of tangent vectors. The properties of the 
duals of finite-dimensional linear spaces are very well known. (See Section 23.4.) On a finite-dimensional 
manifold, the tangent space is finite-dimensional. So the dual space has the same dimension as the primal 
space. (See Theorem 23.7.11.) 


As in the case of tangent vectors, there are many species of objects which are typically described as tangent 
covectors. Each tangent space representation has its own dual, which is the set of linear functionals on the 
linear space structure of each such representation. Additionally, one may define various concrete tangent 
covector space representations which are not simple duals. 


Tangent spaces are often quite cumbersome structures. Therefore defining linear functionals on them can 
also be quite cumbersome. For example, if $T_p(M)$ consists of differential operators or equivalence classes of
curves, one does not really want to specify the value of a linear map for elements of such spaces.


55.1.2 REMARK: Dual action spaces versus true dual spaces. 

It turns out that “dual spaces” are not always “true dual spaces”. Often they are merely “dual action
spaces”. Let $V$ be a linear space over a field $K$. The true dual of $V$ is the set $\{\lambda : V \to K;\ \lambda \text{ is linear}\}$,
together with the pointwise vector addition and scalar multiplication operations. (See Section 23.6 for dual
linear spaces.)

Bilinear maps are closely related to dual linear spaces. Let $W$ be a linear space over the same field $K$ as $V$.
Let $\mu : W \times V \to K$ be a bilinear map. (See Section 27.2 for multilinear maps.) For each element $w \in W$,
the map $\mu_w : V \to K$ defined by $\mu_w : v \mapsto \mu(w, v)$ is an element of the algebraic dual $V^*$. Thus $\mu_w \in V^*$ for
all $w \in W$. But this is not the same thing as saying that $w \in V^*$.

Each bilinear map $\mu : W \times V \to K$ may be thought of as a linear map $\bar{\mu} : W \to V^*$ where $\bar{\mu} : w \mapsto \mu_w$.
(This map may also be notated as $\bar{\mu} : w \mapsto (v \mapsto \mu(w, v))$.) There is therefore a linear map $\rho : \mathscr{L}(W, V) \to
\mathrm{Lin}(W, V^*)$ defined by $\rho(\mu)(w)(v) = \mu(w, v)$ for all $\mu \in \mathscr{L}(W, V)$, $w \in W$ and $v \in V$. (See Notation 27.2.19
for the bilinear function space notation $\mathscr{L}(W, V)$.)

It follows that every linear space of bilinear functions $\mathscr{L}(W, V)$ may be thought of as a space of linear
functionals on $V$. In other words, one may think of $\mathscr{L}(W, V)$ as a subspace of $V^*$. Therefore such spaces of
bilinear functions may be used as “dual action spaces”, which are almost equivalent to “true dual spaces”.


No matter which “dual action space” is chosen as the representation of the dual of a tangent space, one
usually uses the notation $V^*$ and calls it “the dual” as if it were the true dual. This is effectively harmless,
except that it is slightly confusing because it is not literally true.
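

The distinction between an element $w$ of a dual action space and the functional $\mu_w$ which it induces can be
made concrete in code. The following is a minimal sketch, assuming $W = V = \mathbb{R}^n$ represented by Python
tuples and the pairing $\mu(w, v) = \sum_i w_i v^i$; the function names `mu` and `rho` mirror the symbols above
but are otherwise illustrative.

```python
def mu(w, v):
    """Bilinear pairing mu : W x V -> R on component tuples, with W = V = R^n."""
    return sum(wi * vi for wi, vi in zip(w, v))


def rho(pairing):
    """Curry a bilinear map into w -> (v -> pairing(w, v)), a map W -> V*."""
    return lambda w: (lambda v: pairing(w, v))


w = (1.0, -2.0, 0.5)
mu_w = rho(mu)(w)                 # a functional on V, i.e. an element of V*

u, v = (1.0, 0.0, 2.0), (0.0, 3.0, 4.0)
combo = tuple(2 * a + b for a, b in zip(u, v))

print(mu_w(u), mu_w(v), mu_w(combo))
print(callable(mu_w), callable(w))
```

The object `mu_w` can be applied to vectors and is linear in its argument, whereas `w` itself is only a tuple of
components. This is the sense in which $\mu_w \in V^*$ holds while $w \in V^*$ does not.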


55.1.3 REMARK: Classes of tangent covector space representations. 
Tangent covector space representations may be classified as follows.


(1) True dual spaces: The standard algebraic dual of any linear space representation T,(M) for the 
tangent space at p € M. In this kind of representation, tangent covectors are linear functionals on the 
corresponding linear space. 

(2) Chart-tagged true dual spaces: Chart-tagged algebraic duals on the Cartesian coordinate space. 


(3) Chart-tagged dual action spaces: Chart-tagged concrete representations on Cartesian coordinate 
spaces which are not algebraic duals. 


(4) Dual action spaces: Concrete representations which are neither algebraic duals, nor chart-tagged 
algebraic duals, nor chart-tagged dual action spaces. 


True dual spaces have the disadvantage that they need to be coordinatised by means of a basis in order to 
make them useful in practice. So it is better to define a dual action space instead, on which coordinates are 
easier to define. Likewise, the native duals on a manifold have a disadvantage, namely that they must be 
defined from scratch instead of importing the ready-to-wear duals from the chart's tangent bundle. 


There are two principal concerns when choosing a representation for tangent covector spaces. 
(i) Which symbols should be used to indicate individual tangent covectors. In other words, how should 
tangent covectors be written? 
(ii) What should the symbols mean? In other words, what should tangent covectors be? 
One needs both a convenient method of pointing to individual covectors and also a definite idea of what
they are. Too often, it is unclear what the symbols mean. And sometimes a nicely defined representation
has inadequate “handles”. In Remark 55.1.4, representation (iii) seems to offer the best representation in
terms of extensibility and intuitive “correctness”, together with the most convenient “handles”.


55.1.4 REMARK: Some examples of tangent covector space representations. 
The following are examples of tangent covector space representations in the four classes in Remark 55.1.3. 


(i) In class (1), the tangent covectors are linear maps from $T_p(M)$ to $\mathbb{R}$. This sounds simple enough. Most
texts do initially define tangent covectors as the dual space of the tangent space. Although abstract
formulas generally do use this abstract algebraic dual, in more concrete applications and calculations,
the true dual is typically replaced by more concrete representations. The reason for this is that it
is onerous to have to specify a full linear map $\phi : T_p(M) \to \mathbb{R}$. In principle, this requires the
provision of an infinite number of ordered pairs $(z, \phi(z))$ for $z \in T_p(M)$. One usually provides a rule to
specify these ordered pairs. But this is still onerous. This true dual space may be denoted $T_p(M)^*$ to
distinguish it from chart-tagged or dual action spaces, which may be denoted $T_p^*(M)$ as in example (iii).


(ii) In class (2), the tangent covectors are equivalence classes of chart-tagged Cartesian-space tangent
covectors of the form $[(\psi, \phi)]$, where $\psi \in \mathrm{atlas}_p(M)$ and $\phi \in T_{\psi(p)}(\mathbb{R}^n)^*$. Thus $\phi : T_{\psi(p)}(\mathbb{R}^n) \to \mathbb{R}$ is a linear
functional on the Cartesian coordinate space's tangent space $T_{\psi(p)}(\mathbb{R}^n)$. If one uses the tangent-line
bundles in Sections 26.13 and 26.14 to represent tangent spaces on points of the Cartesian coordinate
space $\mathbb{R}^n$, then the tangent vectors in $T_p(M)$ will be represented by equivalence classes of pairs $[(\psi, L)]$
where $\psi \in \mathrm{atlas}_p(M)$ and $L \in T_{\psi(p)}(\mathbb{R}^n)$. Thus each $L$ is a parametrised line in $\mathbb{R}^n$. The action of
$[(\psi, \phi)]$ on $[(\psi, L)]$ is given by $[(\psi, \phi)]([(\psi, L)]) = \phi(L) \in \mathbb{R}$. This representation has the advantage
that everything is imported ready-made from Cartesian space, but it still has the disadvantage that it
is onerous to have to specify the linear map $\phi : T_{\psi(p)}(\mathbb{R}^n) \to \mathbb{R}$. One may denote this imported true dual
space as $T_p^*(M)$, which is not the same set construction as $T_p(M)^*$.


(iii) In class (2), the chart-tagged true dual $T_p^*(M)$ is equal to the set of $t^*_{p,w,\psi} = [(\psi, L^*_{\psi(p),w})]$ for $p \in \mathrm{Dom}(\psi)$
and $w \in \mathbb{R}^n$, where $L^*_{x,w} : T_x(\mathbb{R}^n) \to \mathbb{R}$ is the linear map defined for $x = \psi(p)$ and $v \in \mathbb{R}^n$ by
$L^*_{x,w} : L_{x,v} \mapsto \sum_{i=1}^n w_i v^i$. Then the dual action is defined by $t^*_{p,w,\psi}(t_{p,v,\psi}) = L^*_{\psi(p),w}(L_{\psi(p),v}) = \sum_{i=1}^n w_i v^i$.
This is the same set-construction $T_p^*(M)$ as in example (ii), but it is parametrised by coordinates
$x \in \mathbb{R}^n$ and components $w \in \mathbb{R}^n$. Thus $T_p^*(M) = \{t^*_{p,w,\psi};\ w \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M)\}$. This is equivalent
to $T_p^*(M) = \{[(\psi, L^*_{x,w})];\ \psi \in \mathrm{atlas}_p(M),\ x = \psi(p),\ w \in \mathbb{R}^n\}$. This is probably the best representation
ontologically and practically.


(iv) An example of class (3) is the set $W$ of tuple-classes $[(p, \lambda, \psi)]$ with $\lambda \in (\mathbb{R}^n)^*$, while tangent vectors
in $T_p(M)$ are represented as tuple-classes $t_{p,v,\psi} = [(p, v, \psi)]$. Here $\lambda$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$ and
the dual action map $\mu : W \times T_p(M) \to \mathbb{R}$ is defined by $\mu : ([(p, \lambda, \psi)], [(p, v, \psi)]) \mapsto \lambda(v)$.


(v) An example of class (3) is the space of vector component tuples for the covector space. This is not the
algebraic dual of the primal space of tangent vector component tuples. The dual of a tuple space is a set of
maps from tuples to the field $\mathbb{R}$. For such a concrete dual $W$, one may define a map $\mu : W \times T_p(M) \to \mathbb{R}$
such that $\mu$ is bilinear, but this is not the same thing as the algebraic dual. It is, however, closely related
to the algebraic dual. For each element $w \in W$, the map $\mu_w : T_p(M) \to \mathbb{R}$ defined by $\mu_w : v \mapsto \mu(w, v)$
is an element of the algebraic dual $T_p^*(M)$. Thus $\mu_w \in T_p^*(M)$ for all $w \in W$. But this is not the
same thing as saying that $w \in T_p^*(M)$. For example, in the case of component tuples, one may define
$\mu : W \times \mathbb{R}^n \to \mathbb{R}$ by $W = \mathbb{R}^n$ and $\mu : (w, v) \mapsto \sum_{i=1}^n w_i v^i$. Then $\mu_w \in (\mathbb{R}^n)^*$ for all $w \in W$. But it is clear
that the tuples $w \in W$ are not linear functionals on $\mathbb{R}^n$. They are just tuples.

(vi) As another example of class (3), let $T_p(M)$ consist of tuple-classes $t_{p,v,\psi} = [(p, v, \psi)]$ for $p \in M$, $v \in \mathbb{R}^n$
and $\psi \in \mathrm{atlas}_p(M)$. One may define a space $W$ of tangent covector tuple-classes $t^*_{p,w,\psi} = [(p, w, \psi)]$ for
$p \in M$, $w \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$. Then one may define $\mu : W \times T_p(M) \to \mathbb{R}$ so that $\mu(t^*_{p,w,\psi}, t_{p,v,\psi}) =
\sum_{i=1}^n w_i v^i$ for $t^*_{p,w,\psi} \in W$ and $t_{p,v,\psi} \in T_p(M)$. It is clear that $t^*_{p,w,\psi}$ is not itself a linear functional
on $T_p(M)$, but the map $\mu_{t^*_{p,w,\psi}} : T_p(M) \to \mathbb{R}$ with $\mu_{t^*_{p,w,\psi}} : t_{p,v,\psi} \mapsto \mu(t^*_{p,w,\psi}, t_{p,v,\psi})$ is a linear functional
for $t^*_{p,w,\psi} \in W$.


(vii) Another example of class (3) is the set of tuples $(\psi, \psi(p), L^*)$ or $(\psi, (\psi(p), L^*))$, where each $L^*$ is an
affine map from $\mathbb{R}^n$ to $\mathbb{R}$ with $L^*(\psi(p)) = 0$. (See Section 26.18 for details. This representation is too
clumsy to use, even in a simple Cartesian space.) This would then be a chart-tagged dual action space
representation.


(viii) An example of class (4) would be to define tangent covectors at $p \in M$ as equivalence classes of
functions $f \in C^1(M)$ which have the same differential. Thus two functions $f, g \in C^1(M)$ would
be considered equivalent if $\partial_V f = \partial_V g$ for all $V \in T_p(M)$. (Notation 54.11.12 defines $\partial_V$.)
A tangent covector would then be an equivalence class $[f]$ of a function $f \in C^1(M)$ with respect to this
equivalence relation. This representation for tangent covectors is analogous to the construction of tangent
vectors on manifolds as equivalence classes of curves with the same derivative. (See Remark 53.3.1, paragraphs
(iv) and (v).)

(ix) As another example of class (4), tangent covectors can also be defined as maps from the set of $C^1$
curves $\gamma : \mathbb{R} \to M$ with $\gamma(0) = p$ to the real numbers. This is a dual of the curve-equivalence-class
representation of tangent vectors. (See Remark 53.3.1 (iv).) Every function $f \in C^1(M)$ corresponds to
such a tangent covector. The map $\gamma \mapsto \partial_t f(\gamma(t))\big|_{t=0}$ has the same value for all curves $\gamma$ in a
curve equivalence class at $p \in M$.


The literal abstract dual, consisting of linear functionals on a linear space, is often almost useless for practical 
applications with concrete linear spaces. One does not want to refer to an element of a dual space by 
specifying its action on every element of the primal space. At the very least, it is preferable to set up a 
basis for the dual space and refer to dual space vectors by their basis components. This generally requires 
a corresponding basis to be defined for the primal space also. If choosing a basis for both primal and dual 
spaces is inconvenient, it is in practice often best to define a concrete linear space to act as the dual to a 
given concrete primal space. Then one may define the “dual action” of each space on the other. The literal 
dual space is best for abstract algebra. But concrete “dual action spaces” are generally used in practice
instead of abstract duals. 
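
The distinction between a concrete dual action space and the literal dual can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `dual_action` and `as_functional` are hypothetical, tuples of exact fractions stand in for component tuples in $\mathbb{R}^n$, and the pairing is the component formula $\mu(w, v) = \sum_i w_i v^i$ from example (v) of Remark 55.1.4.

```python
from fractions import Fraction


def dual_action(w, v):
    """Bilinear pairing mu(w, v) = sum_i w_i * v^i for equal-length tuples."""
    if len(w) != len(v):
        raise ValueError("tuples must have the same length")
    return sum(wi * vi for wi, vi in zip(w, v))


def as_functional(w):
    """Return the map mu_w : v -> mu(w, v), an element of the literal dual."""
    return lambda v: dual_action(w, v)


# A covector component tuple. It is just a tuple, not a function.
w = (Fraction(2), Fraction(-1), Fraction(3))
v1 = (Fraction(1), Fraction(0), Fraction(4))
v2 = (Fraction(0), Fraction(5), Fraction(1))

mu_w = as_functional(w)

# Linearity of mu_w: mu_w(a*v1 + v2) equals a*mu_w(v1) + mu_w(v2).
a = Fraction(7, 2)
combo = tuple(a * x + y for x, y in zip(v1, v2))
assert mu_w(combo) == a * mu_w(v1) + mu_w(v2)

# The tuple w itself cannot be applied to a vector, but mu_w can.
assert not callable(w) and callable(mu_w)
```

The tuple `w` and the functional `mu_w` carry the same information, but only `mu_w` is an element of the algebraic dual. The dual action map is what converts one into the other.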


55.1.5 REMARK: The plague of clumsy representations for covectors. 
Although there is a plethora of representations of tangent vectors on manifolds, the plethora of tangent 
covector representations is an even bigger plethora. Perhaps the word “plague” is more apt than “plethora”. 


Just as in the case of (contravariant) tangent vectors, one really wants to be able to use all representations
of tangent covectors interchangeably according to the application. But for practical reasons, it is necessary
to choose one of the representations as the standard and derive all of the others from the standard version.


This book tries to use the “economy principle” to help choose the best standard representation of each class
of object. This eliminates some of the more artistically pleasing representations involving curves, smooth
functions and linear functionals. The most economical representation is in terms of equivalence classes
$[(p, w, \psi)]$ for $w \in \mathbb{R}^n$ and charts $\psi \in \mathrm{atlas}_p(M)$. These are the same kinds of triples as
for tangent vectors, but the computation rules are different.


55.2. Tangent covectors 


55.2.1 DEFINITION: The (tangent) covector space or cotangent (vector) space at a point $p \in M$, for a $C^1$
manifold $M$, is the dual linear space $\mathrm{Lin}(T_p(M), \mathbb{R})$ of the tangent-line space $T_p(M)$ at $p$ in $M$.


A (tangent) covector or cotangent (vector) at p in M is any element of the tangent covector space at p in M. 


55.2.2 NOTATION: The tangent covector space at a point in a manifold.
$T_p^*(M)$, for a point $p \in M$, for a $C^1$ differentiable manifold $M$, denotes the tangent covector space at $p$ in $M$.
In other words, $T_p^*(M) = \mathrm{Lin}(T_p(M), \mathbb{R})$ together with the usual pointwise linear operations.


55.2.3 REMARK: The standard linear space structure for a tangent covector space. 

The full linear space specification tuple for the pointwise tangent covector space $T_p^*(M)$ in Definition 55.2.1
is $(\mathbb{R}, T_p^*(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \mu)$, where $\sigma_{\mathbb{R}}$ and $\tau_{\mathbb{R}}$ are the addition and
multiplication operations respectively for $\mathbb{R}$, $\sigma$ is the pointwise addition operation on $T_p^*(M)$, and
$\mu$ is the pointwise scalar multiplication operation of $\mathbb{R}$ on $T_p^*(M)$. (See Definition 22.1.1 for linear
space specification tuples. See Section 23.6 for dual linear spaces.)


The location of the superscript asterisk in Notation 55.2.2 could be altered to indicate an alternative
representation. Thus $T_p(M)^*$ could be defined as the true dual of the linear space $T_p(M)$ while $T_p^*(M)$ could
denote a different representation from amongst the alternatives in Remark 55.1.4. The advantages of two
slightly different notations for different representations seem to be outweighed by the disadvantages. In
particular, such an approach would be error-prone. Notation 55.2.2 has the advantage that $T_p^*(M)$ and
$T_p(M)^*$ are exactly the same.


55.2.4 REMARK: Escaping the “too much information” problem for true dual spaces. 

The first difficulty with Definition 55.2.1 is that the identification of a particular element of the true dual
space $T_p^*(M)$ requires “too much information”. To specify an element $\omega$ of $T_p^*(M)$, one must specify the
value of $\omega(V)$ for all $V \in T_p(M)$. A more compact way to specify a linear functional $\omega$ on the linear space
$T_p(M)$ is to specify only the value of $\omega(V)$ for $V$ in a basis for $T_p(M)$. This brings the great misfortune
of the arbitrary choice of the basis. To be fair to all bases, one must specify the value of $\omega$ on all bases
of $T_p(M)$, which is even worse!


To escape the “too much information” problem, one may observe that for any chart $\psi \in \mathrm{atlas}_p(M)$, the
tangent-line space $T_{\psi(p)}(\mathbb{R}^n)$ which is used in the construction of $T_p(M)$ has a natural ordered basis
$(L_{x,e_i})_{i=1}^n$ at each $x = \psi(p)$, where the sequence $(e_i)_{i=1}^n$ of velocity component tuples is defined by
$(e_i)^j = \delta_{ij}$ for all $i, j \in \mathbb{N}_n$. Therefore the sequence
$([(\psi, L_{\psi(p),e_i})])_{i=1}^n = (t_{p,e_i,\psi})_{i=1}^n$ is a basis for $T_p(M)$. The basis vectors $t_{p,e_i,\psi}$
for $T_p(M)$ are denoted $e_i^{p,\psi}$ in Notation 54.4.10. The canonical dual basis of $(e_i^{p,\psi})_{i=1}^n$ is
defined by Definition 23.7.3. The most efficient way to specify an individual covector in $T_p^*(M)$ is to specify
its components with respect to this canonical dual basis.


55.2.5 DEFINITION: The (tangent) covector component tuple of a tangent covector $\omega$ in the tangent covector
space $T_p^*(M)$ at a point $p$ in a $C^1$ differentiable manifold $M$ with respect to a chart $\psi \in \mathrm{atlas}_p(M)$
is the tuple $(\omega(e_i^{p,\psi}))_{i=1}^n \in \mathbb{R}^n$, where $n = \dim(M) \in \mathbb{Z}_0^+$.


55.2.6 REMARK: Reconstruction of a covector from its component tuple. 

Definition 55.2.5 associates a well-defined real $n$-tuple with each $\omega \in T_p^*(M)$. It must be shown that the
entire map $\omega : T_p(M) \to \mathbb{R}$ can be correctly reconstructed from this $n$-tuple. Definition 55.2.7 defines the
method of reconstruction of a covector from its component tuple.


[ www. geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13] 


1756 55. Covector bundles and frame bundles 


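55.2.7 DEFINITION: The tangent covector with coordinates $(w_i)_{i=1}^n \in \mathbb{R}^n$ with respect to
$\psi \in \mathrm{atlas}_p(M)$, for $p$ in a $C^1$ manifold $M$ with $n = \dim(M) \in \mathbb{Z}_0^+$, is the tangent covector
$\omega \in T_p^*(M)$ which satisfies

$$\forall v \in \mathbb{R}^n,\quad \omega(t_{p,v,\psi}) = \sum_{i=1}^n w_i v^i.$$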


55.2.8 REMARK: The bijection between tangent covectors and components is well defined. 

The map from $\mathbb{R}^n$ to $T_p(M)$ defined by $v \mapsto t_{p,v,\psi}$ is a bijection. Therefore the map in
Definition 55.2.7 is a well-defined map from $T_p(M)$ to $\mathbb{R}$. The map is also clearly linear. Therefore
Definition 55.2.7 defines an element of $T_p^*(M)$ for each coordinate $n$-tuple. Hence Notation 55.2.9 denotes a
well-defined covector for all $p \in M$, $w \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$.


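55.2.9 NOTATION: A specific tangent covector at a point in a manifold.
$t^*_{p,w,\psi}$, for $p \in M$, $w \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^1$ manifold $M$ with
$n = \dim(M) \in \mathbb{Z}_0^+$, denotes the tangent covector with coordinates $w$ with respect to $\psi$ at $p$.
In other words,

$$\forall p \in M,\ \forall w \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\quad
t^*_{p,w,\psi}(t_{p,v,\psi}) = \sum_{i=1}^n w_i v^i.$$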


55.2.10 REMARK: De-facto coordinate-tuple equivalence class representation for tangent covectors. 

Notation 55.2.9 is effectively the same as adopting equivalence classes $[(p, w, \psi)]$ of coordinate tuples
$(p, w, \psi)$ as the representation for tangent covectors as in option (iv) in Remark 55.1.4. Such tuple-classes
would be indistinguishable from the tangent vector tuple-classes $[(p, v, \psi)]$, although the chart-transition
coordinate transformations are different and the meaning is different. The tangent covector tuple-class
representation would be a “dual action space” which requires a dual action map as described in Remark 55.1.2.
All things considered, the simple dual representation as in Definitions 55.2.1 and 55.2.7 seems best. The
coordinate-tuple equivalence class is implicit in Notation 55.2.9, as it is for all kinds of structures on manifolds.


55.2.11 REMARK:  Pseudo-dual notation for covector component tuples. 

A subscript is used for the tuple $w = (w_i)_{i=1}^n \in \mathbb{R}^n$ in Notation 55.2.9, but there is really no
difference between these “covariant tuples” and the “contravariant tuples” $v = (v^i)_{i=1}^n \in \mathbb{R}^n$. They
are both exactly the same kinds of real-number tuples, distinguished only by their application context.
However, the use of superscripts and subscripts for these tuples is valuable as a mnemonic, and as an
assistance to some kinds of computations.


55.2.12 REMARK: Identification of component-based covectors with true dual space elements. 
Next, it must be shown that the well-defined covector $t^*_{p,w,\psi}$ in Notation 55.2.9 is the same as the original
covector $\omega \in T_p^*(M)$ whose coordinates are $w = (w_i)_{i=1}^n = (\omega(e_i^{p,\psi}))_{i=1}^n$.


55.2.13 THEOREM: Every element of the dual of the tangent space is a tangent covector.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $p \in M$ and $\omega \in T_p^*(M)$. Let
$w \in \mathbb{R}^n$ be the covector component tuple for $\omega$ with respect to a chart $\psi \in \mathrm{atlas}_p(M)$.
Then $\omega = t^*_{p,w,\psi}$.


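PROOF: Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $p \in M$ and $\omega \in T_p^*(M)$. Let
$w \in \mathbb{R}^n$ be the covector component tuple for $\omega$ with respect to a chart $\psi \in \mathrm{atlas}_p(M)$.
Let $v \in \mathbb{R}^n$. Then $t_{p,v,\psi} = \sum_{i=1}^n v^i e_i^{p,\psi}$ by Theorem 54.4.11. So
$\omega(t_{p,v,\psi}) = \omega(\sum_{i=1}^n v^i e_i^{p,\psi}) = \sum_{i=1}^n v^i \omega(e_i^{p,\psi}) = \sum_{i=1}^n v^i w_i$ by
Definition 55.2.5. But $t^*_{p,w,\psi}(t_{p,v,\psi}) = \sum_{i=1}^n w_i v^i$ for all $v \in \mathbb{R}^n$ by Definition 55.2.7
and Notation 55.2.9. Since $\omega$ and $t^*_{p,w,\psi}$ have the same value for all elements of $T_p(M)$, it follows
that $\omega = t^*_{p,w,\psi}$.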
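
The round trip in Theorem 55.2.13 can be illustrated numerically. The following sketch makes a simplifying assumption which is not part of the theorem: a chart $\psi$ is fixed, and $T_p(M)$ is identified with $\mathbb{R}^n$ via $v \mapsto t_{p,v,\psi}$, so that a covector is modelled as a Python function on tuples. The names `component_tuple`, `covector_from_components` and `omega` are hypothetical.

```python
from fractions import Fraction


def component_tuple(omega, n):
    """Components w_i = omega(e_i) with respect to the standard basis tuples."""
    basis = [tuple(Fraction(int(i == j)) for j in range(n)) for i in range(n)]
    return tuple(omega(e) for e in basis)


def covector_from_components(w):
    """The functional v -> sum_i w_i v^i, as in Definition 55.2.7."""
    return lambda v: sum(wi * vi for wi, vi in zip(w, v))


def omega(v):
    """A hypothetical linear functional on 3-tuples, chosen for illustration."""
    return 3 * v[0] - 2 * v[1] + Fraction(1, 2) * v[2]


# Extract the component tuple, then rebuild a functional from it.
w = component_tuple(omega, 3)
rebuilt = covector_from_components(w)

# The rebuilt functional agrees with the original on sample vectors.
samples = [
    (Fraction(1), Fraction(1), Fraction(1)),
    (Fraction(-4), Fraction(2, 3), Fraction(6)),
    (Fraction(0), Fraction(0), Fraction(0)),
]
assert all(omega(v) == rebuilt(v) for v in samples)
```

Agreement on finitely many samples is of course not a proof. The proof above shows agreement for all $v$ by linearity.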


55.2.14 REMARK: Expression for tangent covector spaces at points in terms of component-based vectors.
It follows from Theorem 55.2.13 that for any fixed chart $\psi \in \mathrm{atlas}_p(M)$, covectors $\omega \in T_p^*(M)$ are
in a one-to-one association with covectors $t^*_{p,w,\psi}$, where $w = (\omega(e_i^{p,\psi}))_{i=1}^n \in \mathbb{R}^n$. Hence
$T_p^*(M) = \{t^*_{p,w,\psi};\ w \in \mathbb{R}^n\}$ for any fixed $\psi \in \mathrm{atlas}_p(M)$.


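55.2.15 THEOREM: Tangent covectors depend linearly on their component tuples.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$. Then the
map from $\mathbb{R}^n$ to $T_p^*(M)$ which is defined by $w \mapsto t^*_{p,w,\psi}$ is a linear map.

PROOF: Let $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$. Let $w = (w_i)_{i=1}^n \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. Let
$V \in T_p(M)$. Then $V = t_{p,v,\psi}$ for some $v = (v^i)_{i=1}^n \in \mathbb{R}^n$ by Notation 54.1.4, and then
$t^*_{p,\lambda w,\psi}(V) = t^*_{p,\lambda w,\psi}(t_{p,v,\psi}) = \sum_{i=1}^n (\lambda w_i) v^i = \lambda \sum_{i=1}^n w_i v^i
= \lambda t^*_{p,w,\psi}(t_{p,v,\psi}) = \lambda t^*_{p,w,\psi}(V)$. Therefore $t^*_{p,\lambda w,\psi} = \lambda t^*_{p,w,\psi}$.
Similarly, let $w^k = (w^k_i)_{i=1}^n \in \mathbb{R}^n$ for $k = 1, 2$. Then
$t^*_{p,w^1+w^2,\psi} = t^*_{p,w^1,\psi} + t^*_{p,w^2,\psi}$ by the same form of argument. Hence the map
$w \mapsto t^*_{p,w,\psi}$ is linear from $\mathbb{R}^n$ to $T_p^*(M)$.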


55.2.16 REMARK: The contragredient chart transition rule for covector component tuples. 

Comparison of Theorem 55.2.17 with Theorem 54.1.11 reveals that the matrix which is applied to the 
covector component tuple in Theorem 55.2.17 is effectively the transpose of the inverse of the matrix which 
is applied to the vector component tuple in Theorem 54.1.11. The transformations applied in each case are
dual (or contragredient) to each other. (See Definition 23.11.20 for the dual representation of a linear space 
automorphism. See also Remark 54.4.12 for a similar comment about the transformation rule for tangent 
space chart-basis vectors. See also Remark 55.2.19.) 


In essence, Theorem 55.2.17 is a direct consequence of Definition 55.2.7, which makes covector components 
transform in a dual fashion to vector components. 


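55.2.17 THEOREM: Chart-transition rule for tangent covector components.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $p \in M$ and
$\psi_\alpha, \psi_\beta \in \mathrm{atlas}_p(M)$, and $w^\alpha, w^\beta \in \mathbb{R}^n$.
Then $t^*_{p,w^\alpha,\psi_\alpha} = t^*_{p,w^\beta,\psi_\beta}$ if and only if

$$\forall i \in \mathbb{N}_n,\quad w^\alpha_i
= \sum_{j=1}^n w^\beta_j \,\frac{\partial}{\partial x^i}\bigl(\psi_\beta^j(\psi_\alpha^{-1}(x))\bigr)\Big|_{x=\psi_\alpha(p)} \tag{55.2.1}$$
$$\phantom{\forall i \in \mathbb{N}_n,\quad w^\alpha_i}
= \sum_{j=1}^n w^\beta_j \, J_{\beta\alpha}(p)^j{}_i, \tag{55.2.2}$$

where the chart transition matrix $J_{\beta\alpha}(p)$ is defined in Definition 51.4.18.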
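PROOF: By Definition 55.2.7 and Notation 55.2.9, $t^*_{p,w^\alpha,\psi_\alpha} = t^*_{p,w^\beta,\psi_\beta}$ if and only if

$$\forall v_\alpha, v_\beta \in \mathbb{R}^n,\quad t_{p,v_\alpha,\psi_\alpha} = t_{p,v_\beta,\psi_\beta}
\ \Rightarrow\ \sum_{i=1}^n w^\alpha_i v_\alpha^i = \sum_{i=1}^n w^\beta_i v_\beta^i. \tag{55.2.3}$$

But according to Theorem 54.1.11, $t_{p,v_\alpha,\psi_\alpha} = t_{p,v_\beta,\psi_\beta}$ if and only if

$$\forall i \in \mathbb{N}_n,\quad v_\beta^i
= \sum_{j=1}^n \frac{\partial}{\partial x^j}\bigl(\psi_\beta^i(\psi_\alpha^{-1}(x))\bigr)\Big|_{x=\psi_\alpha(p)} v_\alpha^j
= \sum_{j=1}^n J_{\beta\alpha}(p)^i{}_j \, v_\alpha^j.$$

So by line (55.2.3), $t^*_{p,w^\alpha,\psi_\alpha} = t^*_{p,w^\beta,\psi_\beta}$ if and only if

$$\forall v_\alpha, v_\beta \in \mathbb{R}^n,\quad
\Bigl(\forall i \in \mathbb{N}_n,\ v_\beta^i = \sum_{j=1}^n J_{\beta\alpha}(p)^i{}_j \, v_\alpha^j\Bigr)
\ \Rightarrow\ \sum_{i=1}^n w^\alpha_i v_\alpha^i = \sum_{i=1}^n w^\beta_i v_\beta^i,$$

which holds if and only if

$$\forall v_\alpha \in \mathbb{R}^n,\quad
\sum_{i=1}^n w^\alpha_i v_\alpha^i = \sum_{i=1}^n w^\beta_i \sum_{j=1}^n J_{\beta\alpha}(p)^i{}_j \, v_\alpha^j,$$

which holds if and only if

$$\forall i \in \mathbb{N}_n,\quad w^\alpha_i = \sum_{j=1}^n w^\beta_j \, J_{\beta\alpha}(p)^j{}_i.$$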
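
The content of Theorem 55.2.17 is that the pairing $\sum_i w_i v^i$ does not depend on the chart. A small numerical sketch may make this concrete. The matrix `J` below is a hypothetical stand-in for $J_{\beta\alpha}(p)$, and the helper names are illustrative only. Vector components are transformed as in Theorem 54.1.11 and covector components as in line (55.2.2).

```python
from fractions import Fraction


def mat_vec(J, v):
    """Vector rule: (J v)^i = sum_j J[i][j] * v[j], matrix applied from the left."""
    return tuple(sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J)))


def row_mat(w, J):
    """Covector rule: (w J)_i = sum_j w[j] * J[j][i], matrix applied from the right."""
    return tuple(sum(w[j] * J[j][i] for j in range(len(w))) for i in range(len(J[0])))


def pairing(w, v):
    """The chart-independent value sum_i w_i v^i."""
    return sum(wi * vi for wi, vi in zip(w, v))


# Hypothetical invertible chart transition matrix J_{beta alpha}(p).
J = ((Fraction(2), Fraction(1)),
     (Fraction(1), Fraction(1)))

v_alpha = (Fraction(3), Fraction(-4))   # vector components in chart alpha
w_beta = (Fraction(5), Fraction(7))     # covector components in chart beta

v_beta = mat_vec(J, v_alpha)    # v_beta^i = sum_j J^i_j v_alpha^j
w_alpha = row_mat(w_beta, J)    # w^alpha_i = sum_j w^beta_j J^j_i

# The same covector applied to the same vector gives the same number in both charts.
assert pairing(w_alpha, v_alpha) == pairing(w_beta, v_beta)
```

Note that the vector components move from chart $\alpha$ to chart $\beta$ while the covector components move from $\beta$ to $\alpha$ under the same matrix. Solving for $w^\beta$ in terms of $w^\alpha$ therefore requires the inverse, which is the contragredient behaviour described in Remark 55.2.16.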


55.2.18 REMARK: Mnemonic formula for chart-change rule for tangent covectors. 
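The chart-change rule in Theorem 55.2.17 may be written in the mnemonic form:

$$\forall i \in \mathbb{N}_n,\quad w^\alpha_i = \sum_{j=1}^n \frac{\partial \psi_\beta^j}{\partial \psi_\alpha^i}\, w^\beta_j.$$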


55.2.19 REMARK: Dual (or contragredient) linear map action on covector spaces. 

Since the tangent space $T_p(M)$ for $p$ in a $C^1$ manifold $M$ is a finite-dimensional linear space with dimension
$n = \dim(M) \in \mathbb{Z}_0^+$, the dual (or contragredient) group action in Definition 23.11.20 is well defined on
$T_p^*(M)$. Unlike the chart-change rule in Theorem 55.2.17, the dual action of a linear map on $T_p^*(M)$
transforms points, not coordinates. However, there are some similarities. In each case, the matrix used for the
covector space is the transpose of the inverse of the matrix which is used for the vector space. This kind of
transpose of the inverse of a matrix is referred to as the dual or contragredient.


Linear change-of-coordinates transformations are important for establishing that tensorial objects are indeed 
tensorial. It is customary in differential geometry to say that an object is a tensor if it behaves like a tensor, 
which means that it must pass a coordinate transformation test. 


Linear transformations of the points of vector spaces and covector spaces play a role in definitions of horizontal 
lift maps (i.e. connections), parallel transport, holonomy, curvature and covariant derivatives. In particular, 
if a connection or covariant derivative is given for a tangent bundle T(M), it should be possible to construct 
a corresponding connection or covariant derivative for the covector bundle T*(M). The “dual action” of 
linear maps in Definition 55.2.20 establishes the relevant correspondence between action on tangent bundles 
and action on covector bundles. 


55.2.20 DEFINITION: The dual action of an invertible linear map $L \in \mathrm{Lin}(T_p(M), T_p(M))$ on the covector
space $T_p^*(M)$ for $p \in M$, where $M$ is a $C^1$ manifold, is the map $L^* \in \mathrm{Lin}(T_p^*(M), T_p^*(M))$ given by

$$\forall \omega \in T_p^*(M),\ \forall v \in T_p(M),\quad L^*(\omega)(v) = \omega(L^{-1} v).$$

In other words, $L^*(\omega) = \omega \circ L^{-1}$ for all $\omega \in T_p^*(M)$.
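
The effect of Definition 55.2.20 is that moving a vector by $L$ and a covector by $L^*$ leaves their pairing unchanged, since $L^*(\omega)(Lv) = \omega(L^{-1}Lv) = \omega(v)$. The following sketch checks this for a $2 \times 2$ example. It assumes that $T_p(M)$ has been identified with $\mathbb{R}^2$ via a fixed chart. The matrix `L`, the functional `omega` and the helper names are hypothetical.

```python
from fractions import Fraction


def inverse_2x2(L):
    """Inverse of a 2x2 matrix given as a tuple of rows."""
    (a, b), (c, d) = L
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return ((d / det, -b / det), (-c / det, a / det))


def apply_map(L, v):
    """Apply the matrix L to the component tuple v from the left."""
    return tuple(sum(L[i][j] * v[j] for j in range(2)) for i in range(2))


def dual_action(L, omega):
    """Return L*(omega) = omega o L^{-1}."""
    L_inv = inverse_2x2(L)
    return lambda v: omega(apply_map(L_inv, v))


# Hypothetical invertible linear map and linear functional.
L = ((Fraction(1), Fraction(2)),
     (Fraction(3), Fraction(5)))


def omega(v):
    return 4 * v[0] - v[1]


v = (Fraction(2), Fraction(7))
L_star_omega = dual_action(L, omega)

# Transforming both the vector and the covector preserves the pairing.
assert L_star_omega(apply_map(L, v)) == omega(v)
```

In components, `L_star_omega` has the component tuple obtained by multiplying the row tuple of `omega` by the inverse of `L` on the right, which is the transpose-inverse behaviour noted in Remark 55.2.19.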


55.3. Chart-induced bases for tangent covector spaces 


55.3.1 REMARK: Tangent space chart-basis covectors. 
Next, it is convenient to define tangent space chart-basis covectors analogous to the chart-basis vectors in 
Definition 54.4.9 and Notation 54.4.10. 


55.3.2 DEFINITION: A tangent space chart-basis covector at $p$ in $M$ for chart $\psi \in \mathrm{atlas}_p(M)$ with index
$i \in \mathbb{N}_n$ is the covector $t^*_{p,e^i,\psi}$ in $T_p^*(M)$, where $M$ is an $n$-dimensional $C^1$ differentiable
manifold with $n \in \mathbb{Z}_0^+$.


55.3.3 NOTATION: Specific tangent space chart-basis covectors.
$e^i_{p,\psi}$, for $p \in M$, $\psi \in \mathrm{atlas}_p(M)$ and $i \in \mathbb{N}_n$, for an $n$-dimensional $C^1$ manifold $M$
with $n \in \mathbb{Z}_0^+$, denotes the tangent space chart-basis covector at $p$ in $M$ for chart $\psi$ with index $i$.
In other words, $e^i_{p,\psi} = t^*_{p,e^i,\psi}$.


55.3.4 REMARK:  Pseudo-dual standard basis for covector component tuples. 

As mentioned in Remark 55.2.11, there is no real difference between real $n$-tuples with subscripts and those
with superscripts, apart from their application context. Similarly, there is no real difference between the
Cartesian space “covariant basis vectors” $(e^i)_{i=1}^n$ and the corresponding “contravariant basis vectors”
$(e_i)_{i=1}^n$. They are both defined as the $n$ elements of $\mathbb{R}^n$ which are $n$-tuples of the form
$(0, \ldots 1, \ldots 0)$ with 1 in the $i$th position. In other words, both $e^i$ and $e_i$ map $j \in \mathbb{N}_n$ to
$\delta_{ij}$. Thus $e^i = e_i$ for all $i \in \mathbb{N}_n$. The choice of notation is purely mnemonic, to assist
computations.

The pseudo-dual basis $(e^i)_{i=1}^n$ in Definition 55.3.2 should not be confused with the canonical dual basis
$(h^i)_{i=1}^n$ for $\mathbb{R}^n$ which is introduced in Definition 23.8.2 for the canonical dual basis for Cartesian
spaces. The true canonical dual basis is a sequence of linear functionals in $(\mathbb{R}^n)^*$ which satisfy
$h^j(e_i) = \delta^j_i$ for all $i, j \in \mathbb{N}_n$, as mentioned in Remark 23.8.1.


By contrast with the “pseudo-dual basis” $(e^i)_{i=1}^n$ for the Cartesian space $\mathbb{R}^n$, the basis
$(e^i_{p,\psi})_{i=1}^n$ for $T_p^*(M)$ is a true dual basis consisting of linear functionals on the primal space $T_p(M)$.


55.3.5 THEOREM: Tangent space chart-basis covectors form a basis for the tangent covector space.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Let $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$. Then
$(e^i_{p,\psi})_{i=1}^n$ is a basis for $T_p^*(M)$.


PROOF: Let $\omega \in T_p^*(M)$. Let $w = (w_i)_{i=1}^n \in \mathbb{R}^n$ be the covector component tuple for $\omega$ with
respect to $\psi$. Then $\omega = t^*_{p,w,\psi}$ by Theorem 55.2.13. But
$t^*_{p,w,\psi} = \sum_{i=1}^n w_i t^*_{p,e^i,\psi} = \sum_{i=1}^n w_i e^i_{p,\psi}$ by Theorem 55.2.15.
Therefore $T_p^*(M)$ is spanned by $(e^i_{p,\psi})_{i=1}^n$. Now suppose that $\sum_{i=1}^n w_i e^i_{p,\psi} = 0$ for some
$w \in \mathbb{R}^n$. Then $t^*_{p,w,\psi} = \sum_{i=1}^n w_i t^*_{p,e^i,\psi} = 0$. So
$\sum_{i=1}^n w_i v^i = t^*_{p,w,\psi}(t_{p,v,\psi}) = 0$ for all $v \in \mathbb{R}^n$ by Notation 55.2.9. Therefore
by choosing $v = e_j \in \mathbb{R}^n$ for each $j \in \mathbb{N}_n$, it follows that $w_j = 0$ for all $j \in \mathbb{N}_n$.
In other words, $w = 0$. Therefore the covectors $(e^i_{p,\psi})_{i=1}^n$ are linearly independent. Hence they are a
basis for $T_p^*(M)$.


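55.3.6 THEOREM: Some formulas for the components of tangent covectors.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Then

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\ \forall v \in \mathbb{R}^n,\quad
e^i_{p,\psi}(t_{p,v,\psi}) = t^*_{p,e^i,\psi}(t_{p,v,\psi}) = v^i, \tag{55.3.1}$$
$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i, j \in \mathbb{N}_n,\quad
e^i_{p,\psi}(e_j^{p,\psi}) = t^*_{p,e^i,\psi}(t_{p,e_j,\psi}) = \delta^i_j. \tag{55.3.2}$$

PROOF: By Notations 55.3.3 and 55.2.9,
$e^i_{p,\psi}(t_{p,v,\psi}) = t^*_{p,e^i,\psi}(t_{p,v,\psi}) = \sum_{j=1}^n (e^i)_j v^j = \sum_{j=1}^n \delta_{ij} v^j = v^i$.
This verifies line (55.3.1).
Line (55.3.2) follows similarly from the formula $e_i^{p,\psi} = t_{p,e_i,\psi}$ for $i \in \mathbb{N}_n$ in Notation 54.4.10.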


55.3.7 REMARK:  Notations for tangent space chart-basis covectors which suggest differentials. 

Tangent covectors are used for defining differentials of real-valued functions in Section 58.1. Many texts 
identify covectors so closely with the differentials of real-valued functions on manifolds that they are virtually 
assumed to be synonymous, and this is often reflected in the choice of notation. Thus, for example, $dx^i$ might
denote the chart-basis covector $e^i_{p,\psi} = t^*_{p,e^i,\psi}$ for some implicit chart $\psi$.


55.3.8 REMARK: Analogy between tangent covector components and tangent vector components. 
Theorem 55.3.9 is the covector analogue of Theorem 54.4.11. 


55.3.9 THEOREM: Expression for general tangent covectors in terms of components and basis covectors.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in \mathbb{R}^n,\quad
t^*_{p,w,\psi} = \sum_{i=1}^n w_i e^i_{p,\psi}.$$


PROOF: Let $M$ be an $n$-dimensional $C^1$ differentiable manifold for $n \in \mathbb{Z}_0^+$. Let $p \in M$,
$w \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$. Then $w = \sum_{i=1}^n w_i e^i$.

It follows from Notation 54.1.4 that two covectors in $T_p^*(M)$ are equal if and only if they have the same value
for $t_{p,v,\psi} \in T_p(M)$ for all $v \in \mathbb{R}^n$. Let $v \in \mathbb{R}^n$. Then


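$$t^*_{p,w,\psi}(t_{p,v,\psi}) = \sum_{i=1}^n w_i v^i = \sum_{i=1}^n w_i \, e^i_{p,\psi}(t_{p,v,\psi})$$

by line (55.3.1). Hence $t^*_{p,w,\psi} = \sum_{i=1}^n w_i e^i_{p,\psi}$.


55.3.10 REMARK: Transformation rule for chart-basis covectors.
The transformation rule for chart-basis covectors in Theorem 55.3.11 may be written slightly more simply
as $e^j_{p,\psi_\beta} = \sum_{i=1}^n \partial_i(\psi_\beta^j \circ \psi_\alpha^{-1})(\psi_\alpha(p)) \, e^i_{p,\psi_\alpha}$, or alternatively
in the mnemonic summary form

$$e^j_{p,\psi_\beta} = \sum_{i=1}^n \frac{\partial \psi_\beta^j}{\partial \psi_\alpha^i}\, e^i_{p,\psi_\alpha}.$$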

Theorem 55.3.11 is the covector analogue of Theorem 54.4.13. The alert reader will notice that chart-basis 
covectors obey a “double contragredient” transformation relative to the vector component transformation 
rule in Theorem 54.1.11. In other words, it is a double dual, which makes the transformation the same as the 
original. The transformation in Theorem 55.3.11 is the dual of the transformation in Theorem 55.2.17, which 
is the dual of the transformation in Theorem 54.1.11. The four relevant transformations are summarised in 
the following table. 


                 vector     covector

    components   primal     dual
    basis        dual       primal


In the primal cases, the chart transition matrix $J_{\beta\alpha}(p)$ is applied by matrix multiplication from the left.
In the dual cases, the inverse is applied from the right, which means that effectively the multiplication is by
the transpose of the inverse from the left. (The fact that this is a representation of $GL(n)$ follows from the
formula $((AB)^{-1})^T = (A^{-1})^T (B^{-1})^T$.)
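
The homomorphism property of the contragredient map $A \mapsto (A^{-1})^T$ is easy to check numerically. The sketch below does this for one pair of $2 \times 2$ matrices with exact rational arithmetic. The matrices and helper names are hypothetical, and a single example is an illustration, not a proof.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def transpose(A):
    """Transpose of a 2x2 matrix."""
    return tuple(tuple(A[j][i] for j in range(2)) for i in range(2))


def inverse_2x2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return ((d / det, -b / det), (-c / det, a / det))


def contragredient(A):
    """The transpose of the inverse of A."""
    return transpose(inverse_2x2(A))


# Two hypothetical non-commuting invertible matrices.
A = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(1)))
B = ((Fraction(1), Fraction(3)), (Fraction(0), Fraction(1)))

# The contragredient of a product is the product of the contragredients, in the same order.
assert contragredient(mat_mul(A, B)) == mat_mul(contragredient(A), contragredient(B))
```

The order of the factors is preserved because the inverse and the transpose each reverse it once. Either operation alone would give an anti-homomorphism rather than a representation.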


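55.3.11 THEOREM: Transformation rule for tangent space chart-basis covectors.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Then

$$\forall p \in M,\ \forall \psi_\alpha, \psi_\beta \in \mathrm{atlas}_p(M),\ \forall j \in \mathbb{N}_n,$$
$$e^j_{p,\psi_\beta}
= \sum_{i=1}^n \frac{\partial}{\partial x^i}\bigl(\psi_\beta^j(\psi_\alpha^{-1}(x))\bigr)\Big|_{x=\psi_\alpha(p)} \, e^i_{p,\psi_\alpha} \tag{55.3.3}$$
$$\phantom{e^j_{p,\psi_\beta}}
= \sum_{i=1}^n J_{\beta\alpha}(p)^j{}_i \, e^i_{p,\psi_\alpha}, \tag{55.3.4}$$

where the chart transition matrix $J_{\beta\alpha}(p)$ is defined in Definition 51.4.18.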


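PROOF: By Notation 55.3.3, $e^j_{p,\psi_\beta} = t^*_{p,e^j,\psi_\beta}$. Let $t^*_{p,w^\alpha,\psi_\alpha} = t^*_{p,e^j,\psi_\beta}$.
Then it follows from Theorem 55.2.17 that
$w^\alpha_i = \sum_{k=1}^n (e^j)_k J_{\beta\alpha}(p)^k{}_i = \sum_{k=1}^n \delta_{jk} J_{\beta\alpha}(p)^k{}_i = J_{\beta\alpha}(p)^j{}_i$
for all $i \in \mathbb{N}_n$. But from Theorem 55.3.9, it follows that
$t^*_{p,w^\alpha,\psi_\alpha} = \sum_{i=1}^n w^\alpha_i e^i_{p,\psi_\alpha}$. Therefore
$e^j_{p,\psi_\beta} = \sum_{i=1}^n J_{\beta\alpha}(p)^j{}_i \, e^i_{p,\psi_\alpha}$, which verifies line (55.3.4).
Then line (55.3.3) follows from Definition 51.4.18.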


55.4. Tangent covector bundles 


55.4.1 REMARK: Construction of the covector bundle from the tangent bundle. 

Definition 55.4.11 “automatically” constructs the tangent covector bundle for a $C^1$ manifold from the tangent
bundle which has already been constructed from the manifold itself in Definition 54.5.16. The individual
covector spaces $T_p^*(M)$ have been constructed in Definition 55.2.1 as the linear space duals of the tangent
spaces $T_p(M)$. So the only remaining part of the covector bundle which is required is the atlas for the total
space $T^*(M) = \bigcup_{p \in M} T_p^*(M)$. This atlas is constructed by making the elements $\omega$ of each covector space
$T_p^*(M)$ act on the basis vectors which are induced onto $T_p(M)$ by individual charts $\psi \in A_M$.

Each chart in the manifold atlas $A_{T^*(M)}$ maps each element of its domain to $\mathbb{R}^{2n}$, consisting of $n$
horizontal coordinates and $n$ vertical coordinates. Each chart in the fibre atlas $A^{\mathbb{R}^n}_{T^*(M)}$ maps each
element of its domain to $\mathbb{R}^n$, consisting only of the $n$ vertical coordinates. (See Definitions 47.5.2 and
64.4.2 for fibre atlases.)


Since the construction of the total space atlas for the covector bundle is the most complicated part of
Definition 55.4.11, this atlas is introduced first in Definition 55.4.8. The right side of line (55.4.1) is a tuple of
$2n$ coordinates. The first $n$ coordinates $\psi(p)$ are for the base point $p$. The second $n$ coordinates
$(\omega(t_{p,e_i,\psi}))_{i=1}^n$ are obtained as the values of $\omega$ for the basis vectors $t_{p,e_i,\psi} \in T_p(M)$ as
described in Definition 54.4.9.


Definitions 55.4.2, 55.4.4 and 55.4.6, and Notations 55.4.3 and 55.4.7, are analogous to Definitions 54.5.4
and 54.5.6 and Notations 54.1.4 and 54.5.7. The tangent covector bundle constructions are
“implementation-agnostic” because covectors are defined simply as linear functionals on linear spaces of tangent
vectors, and the fact that those vectors may be defined in terms of Cartesian space tangent-line vectors is an
unimportant “implementation detail”.


55.4.2 DEFINITION: The (tangent) covector total space for a $C^1$ manifold $M$ is the set $\bigcup_{p \in M} T_p^*(M)$.


55.4.3 NOTATION: The (tangent) covector total space on a differentiable manifold.
$T^*(M)$ denotes the tangent covector total space for a $C^1$ manifold $M$. That is, $T^*(M) = \bigcup_{p \in M} T_p^*(M)$.


55.4.4 DEFINITION: The (tangent) covector fibration of a $C^1$ manifold $M$, with $n = \dim(M) \in \mathbb{Z}_0^+$, is the
tuple $(T^*(M), \pi^*, M)$, where $\pi^* : T^*(M) \to M$ is defined by

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in \mathbb{R}^n,\quad \pi^*(t^*_{p,w,\psi}) = p.$$

The projection map of the (tangent) covector fibration $(T^*(M), \pi^*, M)$ is the map $\pi^*$.


55.4.5 REMARK: Velocity charts for covector fibrations. 

The word “velocity” is not so easy to justify for the vertical component of the component map for covectors
in $T^*(M)$ as it is for vectors in $T(M)$. The vectors in $T(M)$ correspond to velocities of curves in the base
point set $M$. The elements of $T^*(M)$ correspond to differentials of real-valued functions on $M$. Differentials
quantify the rate of change of a real-valued function for a given curve velocity. Nevertheless, it does not
seem to be necessary to invent a new word to replace “velocity” in this case. The elements of $T^*(M)$ signify
a kind of “dual velocity” or “covector velocity” or “covelocity”.


A strong similarity between the velocity charts $\Phi(\psi)$ and $\Phi^*(\psi)$ for vector and covector fibrations in
Notations 54.5.7 and 55.4.7 respectively is the fact that they are both chart-dependent. This is because they are
in some sense “vertical”, whereas the corresponding projection maps $\pi$ and $\pi^*$ are “horizontal”. So the
velocity charts could be referred to as “vertical component charts”, although this could be confused with the
higher-level vertical component concepts for bundles such as $T(T(M))$ and $T(T^*(M))$.


55.4.6 DEFINITION: The velocity chart on the covector fibration of a $C^1$ differentiable manifold $M$, for a
chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^*)^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^n$ defined by
$t^*_{p,w,\psi} \mapsto w$, where $n = \dim(M)$.


The velocity chart map for the covector fibration of a $C^1$ differentiable manifold $M$ is the map from
$\mathrm{atlas}(M)$ to the set of partially defined maps from $T^*(M)$ to $\mathbb{R}^n$ which maps each chart in
$\mathrm{atlas}(M)$ to its corresponding velocity chart on $T^*(M)$.


55.4.7 NOTATION: $\Phi^*$, for a $C^1$ manifold $M$, denotes the velocity chart map for $T^*(M)$. In other words,

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in \mathbb{R}^n,\quad \Phi^*(\psi)(t^*_{p,w,\psi}) = w,$$

where $n = \dim(M)$.


55.4.8 DEFINITION: The (tangent) covector bundle (total space) manifold atlas for a $C^1$ differentiable
manifold $M \mathrel{-\!\!<} (M, A_M)$ with $n = \dim(M) \in \mathbb{Z}_0^+$ is the set
$A_{T^*(M)} = \{\Psi^*(\psi);\ \psi \in A_M\}$, where
$\Psi^*(\psi) : \bigcup_{p \in \mathrm{Dom}(\psi)} T_p^*(M) \to \mathbb{R}^{2n}$ is defined for $\psi \in A_M$ by

$$\forall \psi \in A_M,\ \forall p \in \mathrm{Dom}(\psi),\ \forall \omega \in T_p^*(M),\quad
\Psi^*(\psi)(\omega) = \bigl(\psi(p), (\omega(t_{p,e_i,\psi}))_{i=1}^n\bigr), \tag{55.4.1}$$

where $(e_i)_{i=1}^n$ is the standard basis for $\mathbb{R}^n$, and $t_{p,e_i,\psi} \in T_p(M)$ denotes the tangent space
chart-basis vector for component $i \in \mathbb{N}_n$ at $p \in M$ with respect to $\psi \in A_M$. (See Definition 54.4.9.)
In other words,

$$\forall \psi \in A_M,\quad \Psi^*(\psi) = (\psi \circ \pi^*) \times \Phi^*(\psi). \tag{55.4.2}$$


55.4.9 REMARK: Adding a fibre atlas to the covector fibration. 

Since differentiable fibre bundles are not defined until Chapter 64, the atlas $A^{\mathbb{R}^n}_{T^*(M)}$ cannot yet be
described as the fibre atlas of a differentiable fibre bundle. However, the tuple in Definition 55.4.11 can be
described as a fibre atlas for a topological fibration according to Definition 47.5.2.

The fibre atlas $A^{\mathbb{R}^n}_{T^*(M)}$ differs from the manifold atlas $A_{T^*(M)}$ for the total space $T^*(M)$ by the
fact that the first $n$ base-space coordinates are omitted. This requirement is a consequence of the choice of
specification style for fibre charts which is discussed in Remark 21.5.11. The projection map $\Pi^{2n}_{n+1}$
removes the first $n$ components. (See Remark 11.5.27.)


55.4.10 REMARK: Spaces and maps for tangent vector and covector bundles. 
Some of the charts and projection maps in Definitions 55.4.8 and 55.4.11 are illustrated in Figure 55.4.1. 


[Figure 55.4.1: Charts and projection maps for tangent vector and covector spaces]


55.4.11 DEFINITION: The (tangent) covector bundle of a $C^1$ differentiable manifold $M \mathrel{-\!\!<} (M, A_M)$
with $n = \dim(M) \in \mathbb{Z}_0^+$ is the tuple
$T^*(M) \mathrel{-\!\!<} (T^*(M), A_{T^*(M)}, \pi^*, M, A_M, A^{\mathbb{R}^n}_{T^*(M)})$ which satisfies the following.

(i) $T^*(M) = \bigcup_{p \in M} T_p^*(M)$.

(ii) $\pi^* : T^*(M) \to M$ is the projection map for the covector fibration $(T^*(M), \pi^*, M)$.

(iii) $A_{T^*(M)}$ is the tangent covector bundle manifold atlas for $M$. (See Definition 55.4.8.)

(iv) $A^{\mathbb{R}^n}_{T^*(M)} = \{\Phi^*(\psi);\ \psi \in A_M\}$.


55.4.12 REMARK: Differentiability of the tangent covector bundle.
Theorem 55.4.13 is the tangent covector bundle version of Theorem 54.5.28 for tangent vector bundles.


55.4.13 THEOREM: The tangent covector bundle of a $C^{k+1}$ manifold is a $C^k$ manifold.
Let $M \mathrel{-\!\!<} (M, A_M) \mathrel{-\!\!<} ((M, T_M), A_M)$ be a $C^{k+1}$ manifold with $k \in \mathbb{Z}_0^+$.

(i) The tangent covector bundle manifold atlas $A_{T^*(M)}$ for $M$ is a $C^k$ locally Cartesian atlas on $T^*(M)$.
(ii) The topology $T_{T^*(M)}$ induced by $A_{T^*(M)}$ on $T^*(M)$ is a Hausdorff topology on $T^*(M)$.
(iii) $((T^*(M), T_{T^*(M)}), A_{T^*(M)})$ is a $C^k$ topological manifold.
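PROOF: For part (i), by Definition 55.4.8, $A_{T^*(M)} = \{\Psi^*(\psi);\ \psi \in A_M\}$ with
$\Psi^*(\psi) = (\psi \circ \pi^*) \times \Phi^*(\psi)$ for all $\psi \in A_M$. Any element of $A_{T^*(M)}$ has the form
$\tilde\psi = \Psi^*(\psi)$ for some $\psi \in A_M$. Then $\tilde\psi = (\psi \circ \pi^*) \times \Phi^*(\psi)$ is a function with
$\mathrm{Dom}(\tilde\psi) = (\pi^*)^{-1}(\mathrm{Dom}(\psi))$ and $\mathrm{Range}(\tilde\psi) = \mathrm{Range}(\psi \circ \pi^*) \times \mathbb{R}^n$,
where $\mathrm{Range}(\psi \circ \pi^*) = \mathrm{Range}(\psi)$ by Theorem 10.10.13 (viii) because
$\mathrm{Dom}(\psi) \subseteq \mathrm{Range}(\pi^*) = M$. Let $\Omega = \mathrm{Range}(\psi)$. Then $\Omega \in \mathrm{Top}(\mathbb{R}^n)$.
So $\mathrm{Range}(\tilde\psi) = \Omega \times \mathbb{R}^n \in \mathrm{Top}(\mathbb{R}^{2n})$. Therefore
$\tilde\psi : (\pi^*)^{-1}(\mathrm{Dom}(\psi)) \to \Omega \times \mathbb{R}^n$ is a bijection.
So $A_{T^*(M)}$ satisfies Definition 51.5.2 (i) for the point set $T^*(M)$.

$\bigcup_{\tilde\psi \in A_{T^*(M)}} \mathrm{Dom}(\tilde\psi) = \bigcup_{\psi \in A_M} (\pi^*)^{-1}(\mathrm{Dom}(\psi)) = T^*(M)$ because
$\mathrm{Dom}(\pi^*) = T^*(M)$ and $\mathrm{Range}(\pi^*) = M = \bigcup_{\psi \in A_M} \mathrm{Dom}(\psi)$.
So $A_{T^*(M)}$ satisfies Definition 51.5.2 (ii) for the point set $T^*(M)$.

Let $\tilde\psi_1 = \Psi^*(\psi_1)$ and $\tilde\psi_2 = \Psi^*(\psi_2)$. Then
$\tilde\psi_1(\mathrm{Dom}(\tilde\psi_2)) = \tilde\psi_1((\pi^*)^{-1}(\mathrm{Dom}(\psi_2))) = \psi_1(\mathrm{Dom}(\psi_2)) \times \mathbb{R}^n$,
where $\psi_1(\mathrm{Dom}(\psi_2)) \in \mathrm{Top}(\mathbb{R}^n)$ because $\mathrm{Dom}(\psi_2) \in \mathrm{Top}(M)$ and $\psi_1$ is a
homeomorphism. Consequently $\tilde\psi_1(\mathrm{Dom}(\tilde\psi_2)) \in \mathrm{Top}(\mathbb{R}^{2n})$.
So $A_{T^*(M)}$ satisfies Definition 51.5.2 (iii) for the point set $T^*(M)$.


Let $\tilde\psi_1 = \Psi^*(\psi_1)$ and $\tilde\psi_2 = \Psi^*(\psi_2)$ with $\psi_1, \psi_2 \in A_M$. Then by Theorem 55.2.17,
$\tilde\psi_2 \circ \tilde\psi_1^{-1}$ maps $\Omega_1 \times \mathbb{R}^n$ to $\Omega_2 \times \mathbb{R}^n$, where
$\Omega_k = \mathrm{Range}(\psi_k)$ for $k = 1, 2$, according to the rule

$$\forall (x, w) \in \Omega_1 \times \mathbb{R}^n,\quad
(\tilde\psi_2 \circ \tilde\psi_1^{-1})(x, w) = \Bigl(\psi_2(\psi_1^{-1}(x)),\
\Bigl(\sum_{j=1}^n w_j \,\frac{\partial}{\partial y^i}\bigl(\psi_1^j(\psi_2^{-1}(y))\bigr)\Big|_{y=\psi_2(\psi_1^{-1}(x))}\Bigr)_{i=1}^n\Bigr).$$

This is $C^k$ because $\psi_1 \circ \psi_2^{-1}$ and $\psi_2 \circ \psi_1^{-1}$ are $C^{k+1}$ by Definition 51.3.2 (iii). Therefore
$A_{T^*(M)}$ satisfies Definition 51.5.2 (iv) for the point set $T^*(M)$. Hence $A_{T^*(M)}$ is a $C^k$ locally
Cartesian atlas for $T^*(M)$ by Definition 51.5.2.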

For part (ii), the topology $T_{T^*(M)}$ is induced on $T^*(M)$ according to Definition 49.8.12. Then $T_{T^*(M)}$ is
a valid topology on $T^*(M)$ by Theorem 49.8.8 (ii). The Hausdorff property for $T_{T^*(M)}$ follows from the
Hausdorff property for $T_M$ by Theorem 33.1.35 (iii) applied to any chart $\tilde\psi \in A_{T^*(M)}$ for two covectors
$W_1, W_2 \in \mathrm{Dom}(\tilde\psi)$. If there is no common chart in $A_{T^*(M)}$ for $W_1, W_2 \in T^*(M)$, then a disjoint
open cover for $W_1$ and $W_2$ can be the pair $(\mathrm{Dom}(\tilde\psi_1), \mathrm{Dom}(\tilde\psi_2))$ for any
$\tilde\psi_1 \in \mathrm{atlas}_{W_1}(T^*(M))$ and $\tilde\psi_2 \in \mathrm{atlas}_{W_2}(T^*(M))$, where
$\mathrm{atlas}(T^*(M)) = A_{T^*(M)}$.


Part (iii) follows from parts (i) and (ii). 


55.4.14 REMARK: The full fibre bundle for tangent covectors. 

The differentiable fibre bundle structure is not introduced until Definition 64.8.3, whereas topological fibre 
bundles have already been defined in Definition 47.6.5. More importantly, associated topological fibre bundles 
have been introduced in Definition 47.9.7, which are relevant to the association between tangent vector 
bundles and tangent covector bundles. To meet the requirements for associated topological fibre bundles in 
Definition 47.9.7, for the tangent vector bundle (T'(M), Arm), T, M, Am, Aft) in Definition 54.5.30 and 


for the tangent covector bundle (T*(M), Ar«(uy, 1*, M, Am, AT my) in Definition 55.4.11 to be associated, 
there must be a bijection A : AR) > ARM) which meets conditions (i) and (ii) in Definition 47.9.5. 


The proof of Theorem 55.4.15 for $T^*(M)$ follows very much the pattern of the proof of Theorem 54.5.33
for the associated tangent bundle $T(M)$. The main distinction is the action of the common structure group
$\mathrm{GL}(n)$ on the fibre space. All chart transition maps for fibre sets must be equal, via the charts, to this group
action on the fibre space $\mathbb{R}^n$, but the group action is different for tangent vectors and tangent covectors. For
covectors, the n-tuples representing components must be multiplied by the transpose of the inverse of the
matrix in the structure group.


The notation “$F^*$” is employed in Theorems 55.4.15 and 55.4.17 for the dual linear space of $F = \mathbb{R}^n$. Both
$F$ and $F^*$ are identified with $\mathbb{R}^n$ here, although strictly speaking, $F$ is a space of column vectors and $F^*$ is
the corresponding space of row vectors. Thus $(G, F^*, \sigma, \mu^*) = (\mathrm{GL}(n), \mathbb{R}^n, \sigma, \mu^*)$ is a dual representation of
the transformation group $(G, F, \sigma, \mu) = (\mathrm{GL}(n), \mathbb{R}^n, \sigma, \mu)$.


55.4.15 THEOREM: The tangent covector bundle of a $C^1$ manifold is a topological fibre bundle.

Let M be a $C^1$ manifold with n = dim(M). Then the tangent covector bundle $T^*(M)$ for M is a topological
$(\mathrm{GL}(n), \mathbb{R}^n, \sigma, \mu^*)$ fibre bundle, where $\sigma : G \times G \to G$ is the matrix product operation on $G = \mathrm{GL}(n)$, and
$\mu^* : G \times F^* \to F^*$ is the action of G on $F^* = \mathbb{R}^n$ by matrix multiplication by inverse matrices on the right.


PROOF: To show that $T^*(M)$ is a topological $(G, F^*)$ fibre bundle according to Definition 47.6.5 with
$G = \mathrm{GL}(n)$ and $F^* = \mathbb{R}^n$, first note that $(G, F^*)$ is an effective topological left transformation group by
Theorem 39.6.4. The pair $(M, T_M)$ is a topological space, where $T_M$ is the underlying topology of the $C^1$
manifold $(M, A_M)$ according to Definition 51.3.14. Similarly, $(T^*(M), T_{T^*(M)})$ is a topological space, where
$T_{T^*(M)}$ is the topology underlying the $C^0$ manifold $(T^*(M), A_{T^*(M)})$ as in Theorem 55.4.13.


The projection map $\pi^* : T^*(M) \to M$ in Definition 55.4.4 is continuous by Theorem 32.10.7 (i) because, via
the charts in Definition 55.4.8, it is a projection map from a direct product to one of the components of the
direct product. Thus Definition 47.6.5 (i) is satisfied because $\pi^*(T^*(M)) = M$ by Definition 55.4.4.
Definition 47.6.5 (ii) is satisfied because the charts $\Psi^*(\psi)$ in Definition 55.4.8 are continuous. This follows
from the fact that the topology on $T^*(M)$ is defined by the charts themselves.

Definition 47.6.5 (iii) follows from the observation that the domains of the manifold charts for $T^*(M)$ cover
all of M because they are the same as the domains for the charts for M.

For Definition 47.6.5 (iv), it must be shown that $\pi^* \times \Phi^*(\psi) : (\pi^*)^{-1}(U_\psi) \to U_\psi \times \mathbb{R}^n$ is a homeomorphism,
where $U_\psi = \pi^*(\mathrm{Dom}(\Phi^*(\psi)))$, for all $\psi \in \mathrm{atlas}(M)$. This follows from the observation that the differentiable
structure on $(\pi^*)^{-1}(U_\psi)$, and therefore also the topology on $(\pi^*)^{-1}(U_\psi)$, is induced by the charts in the
manifold atlas $A_{T^*(M)}$ in Definition 55.4.8.

For Definition 47.6.5 (v), it must be shown that any two fibre charts $\phi_1$ and $\phi_2$ on a given fibre set $(\pi^*)^{-1}(\{p\})$
of $T^*(M)$ are related to each other by a group element $g_{\phi_2,\phi_1}(p) \in G$, and that this group element is a
continuous function of $p \in M$. It follows from Theorem 55.2.17 and Definition 51.4.18 that

    $\forall p \in M,\ \forall W \in (\pi^*)^{-1}(\{p\}),\ \forall \psi_1, \psi_2 \in A_{M,p},$
        $\Phi^*(\psi_2)(W) = \Phi^*(\psi_1)(W)\, J_{21}(p)^{-1}$
                          $= g_{\phi_2,\phi_1}(p)(\Phi^*(\psi_1)(W)),$

where $\phi_1 = \Phi^*(\psi_1)$, $\phi_2 = \Phi^*(\psi_2)$ and $g_{\phi_2,\phi_1}(p) = J_{21}(p) \in \mathrm{GL}(n)$, and the action of $\mathrm{GL}(n)$ on $\mathbb{R}^n$ is defined
to be multiplication of row vectors by inverses of square matrices on the right. (Or equivalently, column
vectors are multiplied on the left by the transpose of the inverse of the square matrices in $\mathrm{GL}(n)$.)

The continuity of $g_{\phi_2,\phi_1}$ on $U_{\psi_1} \cap U_{\psi_2}$ follows from the formula in Definition 51.4.18 for $J_{21}(p)$, by virtue of
Definition 42.5.11 for a $C^1$ function.


55.4.16 REMARK:  Tangent vector and covector bundles are associated topological fibre bundles. 

The action $\mu : G \times F \to F$ in Theorem 55.4.17 is the map $\mu : (A, v) \mapsto Av$ for $A \in G = \mathrm{GL}(n)$ and
$v \in F = \mathbb{R}^n$, where $Av$ denotes the usual action of a matrix on an n-tuple from the left. The dual action
$\mu^* : G \times F^* \to F^*$, where $F^* = \mathbb{R}^n$, is the dual map $\mu^* : (A, v) \mapsto (A^{-1})^T v = v A^{-1}$. (Some of the maps and
spaces in Theorem 55.4.17 are illustrated in Figure 55.4.2.)
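
The relation between the primal and dual actions can be checked numerically. The following is only a minimal sketch, assuming n = 2 and exact rational arithmetic; the helper names (mat_mul, inverse_2x2, primal_action, dual_action, pairing) are hypothetical and are not notation from this book. It checks that the pairing of a row vector with a column vector is unchanged when both are transformed by the same group element, and that the dual action composes as a left action.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix; raises if A is not in GL(2)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def primal_action(A, v):
    """mu(A, v) = A v, with v regarded as a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dual_action(A, w):
    """mu*(A, w) = w A^{-1}, with w regarded as a row vector."""
    A_inv = inverse_2x2(A)
    return [sum(w[i] * A_inv[i][j] for i in range(2)) for j in range(2)]

def pairing(w, v):
    """Value of the covector w on the vector v."""
    return sum(x * y for x, y in zip(w, v))

A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
B = [[Fraction(1), Fraction(2)], [Fraction(0), Fraction(1)]]
v = [Fraction(3), Fraction(-1)]
w = [Fraction(5), Fraction(7)]

# The pairing is invariant when vector and covector are transformed together.
assert pairing(dual_action(A, w), primal_action(A, v)) == pairing(w, v)
# The dual action is a left action: mu*(AB, w) = mu*(A, mu*(B, w)).
assert dual_action(mat_mul(A, B), w) == dual_action(A, dual_action(B, w))
```

Multiplying by the inverse on the right is what makes the dual map a left action; multiplying row vectors by A itself on the right would reverse the order of composition.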


Figure 55.4.2 Association of tangent vector and covector bundles


55.4.17 THEOREM:  Tangent vector and covector bundles are associated topological fibre bundles. 
Let M be a $C^1$ manifold with n = dim(M). Then the vector bundle $T(M)$ and covector bundle $T^*(M)$ are
associated topological $(\mathrm{GL}(n), \mathbb{R}^n, \sigma, \mu)$ and $(\mathrm{GL}(n), \mathbb{R}^n, \sigma, \mu^*)$ fibre bundles, where $\sigma : G \times G \to G$ is the
group operation of $G = \mathrm{GL}(n)$, and $\mu : G \times F \to F$ and $\mu^* : G \times F^* \to F^*$ are the standard primal and dual
actions of G on $F = \mathbb{R}^n$ and $F^* = \mathbb{R}^n$ respectively.


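PROOF: Define $h : A^{\mathbb{R}^n}_{T(M)} \to A^{\mathbb{R}^n}_{T^*(M)}$ by $h : \Phi(\psi) \mapsto \Phi^*(\psi)$ for $\psi \in A_M$, where $A^{\mathbb{R}^n}_{T(M)}$ and $A^{\mathbb{R}^n}_{T^*(M)}$ are the
fibre atlases for $T(M)$ and $T^*(M)$ in Definitions 54.5.16 and 55.4.11 (iv) respectively. Then h is a bijection,
and $\pi(\mathrm{Dom}(\Phi(\psi))) = \mathrm{Dom}(\psi) = \pi^*(\mathrm{Dom}(\Phi^*(\psi)))$ for all $\psi \in A_M$. So Definition 47.9.5 (i) is satisfied.
Let $\phi_1, \phi_2 \in A^{\mathbb{R}^n}_{T(M)}$. Then $\phi_1 = \Phi(\psi_1)$ and $\phi_2 = \Phi(\psi_2)$ for some $\psi_1, \psi_2 \in A_M$. Let $p \in \pi(\mathrm{Dom}(\phi_1)) \cap
\pi(\mathrm{Dom}(\phi_2)) = \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)$. Then $\phi_2(V) = J_{21}(p)\phi_1(V)$ for all $V \in T_p(M)$. (See Definition 51.4.18
for $J_{21}(p)$.) In other words, the fibre chart transition map in Definition 47.6.5 for $T(M)$ is the map $g_{\phi_2,\phi_1} :
\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2) \to \mathrm{GL}(n)$ which satisfies $g_{\phi_2,\phi_1}(p) = J_{21}(p)$. The corresponding fibre chart transition
map for charts $h(\phi_1)$ and $h(\phi_2)$ for $T^*(M)$ is $g_{h(\phi_2),h(\phi_1)} : \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2) \to \mathrm{GL}(n)$ which satisfies
$g_{h(\phi_2),h(\phi_1)}(p) = J_{21}(p)$. Therefore

    $\forall \phi_1, \phi_2 \in A^{\mathbb{R}^n}_{T(M)},\ \forall p \in \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2),$
        $g_{\phi_2,\phi_1}(p) = g_{h(\phi_2),h(\phi_1)}(p).$

Thus h satisfies Definition 47.9.5 (ii). So by Definition 47.9.5, h is a topological fibre bundle association
between $T(M)$ and $T^*(M)$. Hence by Definition 47.9.7, $T(M)$ and $T^*(M)$ are associated topological fibre
bundles.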


55.5. Tangent vector-tuple bundles 


55.5.1 REMARK: Comparison with direct sums and Whitney sums of vector bundles. 

A tangent vector-tuple bundle may be thought of as the “direct sum” or “Whitney sum” of vector bundles 
where each component vector bundle is a copy of the tangent bundle in Section 54.5. (See Section 65.7 for 
the extension of Section 55.5 to vector-tuple bundles constructed from general vector bundles.) 


The definitions and notations for tangent vector-tuple spaces, fibrations and bundles are mostly uninteresting 
“boilerplate” , following much the same pattern as for various other classes of objects which are constructed 
from base-level tangent bundles. However, they are unavoidable because they are required for defining 
differential forms and tensor fields on manifolds. 


55.5.2 REMARK:  Tangent vector tuples, multilinear functions and tensor bundles. 

Tensors are defined in terms of multilinear functions on Cartesian products of finite families of linear spaces. 
(See Definitions 27.6.3 and 28.1.2.) Tensor bundles may be constructed from the tangent vector bundle on 
a differentiable manifold M by first defining a Cartesian product of a family of linear spaces at each point 
of the manifold. 


The Cartesian product of most interest is $T_p(M)^r = \times_{i=1}^r T_p(M)$ for all $p \in M$, where $r \in \mathbb{Z}^+_0$. (Note that
$T_p(M)^0 = \{()\} = \{\emptyset\}$ by Theorem 14.6.3.) From this space, the covariant tensor space $\mathscr{L}_r(T_p(M))$ may be
constructed. (See Notation 27.3.4 for $\mathscr{L}_r(V)$ for linear spaces V.) Then the corresponding contravariant
tensor space may be constructed as the dual space $\bigotimes^r V = \mathscr{L}_r(T_p(M))^*$ as in Definition 28.1.2, and various
other kinds of tensor spaces may be constructed from these basic tensor spaces. (See Notation 28.1.7
for $\bigotimes^r V$.) The standard Cartesian product topology in Definition 32.12.2 is assumed in Definition 55.5.4.


55.5.3 REMARK: Applications of tangent vector-tuple bundles. 

The concepts in Section 55.5 are referred to in Sections 55.6, 56.4, 56.5, 56.7, 57.4, 57.7, 59.7, 61.10, 61.11,
61.12, 61.13, 65.7, 71.3 and 73.4. Tangent vector-tuple bundles are required for the definition of multilinear
function bundles in Section 56.4 and the short-cut versions of differential forms in Section 57.7. They are
also required for the definition of Riemannian metric functions in Section 73.4.


The tangent vector 2-tuple bundle, i.e. the tangent vector-pair bundle, provides a convenient domain for the
definition of Riemannian and Minkowskian metric tensor fields. (See for example Definition 73.4.2.) Instead
of defining a separate inner product $g(p)$ on the tangent space $T_p(M)$ at each point p of a manifold M, one
may define a metric function $g : T^2(M) \to \mathbb{R}$ on the 3n-dimensional total space of all tangent vector pairs,
where n = dim(M). (This simplifies the construction of differentials of the metric.)
Another application of vector-tuple bundles is in the special case of linearly independent n-tuples, which
are bases of the tangent space at each point. These are used as “coordinate frame bundles” or “principal
bundles” for the definition of affine connections.


55.5.4 DEFINITION: The tangent vector r-tuple space at $p \in M$, for a $C^1$ manifold M and $r \in \mathbb{Z}^+_0$, is the
Cartesian product $T_p(M)^r = \times_{i=1}^r T_p(M)$, together with its Cartesian product topology.


55.5.5 NOTATION: $T^r_p(M)$, for $p \in M$ and $r \in \mathbb{Z}^+_0$ for a $C^1$ manifold M, denotes the tangent vector r-tuple
space at p. In other words, $T^r_p(M) = T_p(M)^r$.


55.5.6 DEFINITION: The (tangent) vector r-tuple total space for a $C^1$ manifold M, for $r \in \mathbb{Z}^+_0$, is the
set $\bigcup_{p \in M} T^r_p(M)$.


55.5.7 NOTATION: $T^r(M)$ denotes the tangent vector r-tuple total space for a $C^1$ manifold M for $r \in \mathbb{Z}^+_0$.
In other words, $T^r(M) = \bigcup_{p \in M} T^r_p(M)$.


55.5.8 DEFINITION: The (tangent) vector r-tuple fibration for a $C^1$ manifold M, for $r \in \mathbb{Z}^+$, is the tuple
$(T^r(M), \pi^r, M)$, where $\pi^r : T^r(M) \to M$ is defined by

    $\forall p \in M,\ \forall (V_j)_{j=1}^r \in T^r_p(M),\quad \pi^r((V_j)_{j=1}^r) = p.$    (55.5.1)

The projection map of the (tangent) vector-tuple fibration $(T^r(M), \pi^r, M)$ is the map $\pi^r$.


55.5.9 THEOREM: Explicit expression for projection map of tangent vector-tuple fibration.
Let M be a $C^1$ manifold. Then

    $\forall r \in \mathbb{Z}^+,\ \forall (V_j)_{j=1}^r \in T^r(M),\quad \pi^r((V_j)_{j=1}^r) = \pi(V_1),$

where $\pi : T(M) \to M$ is the projection map for $T(M)$. In other words,

    $\forall r \in \mathbb{Z}^+,\ \forall V \in T^r(M),\quad \pi^r(V) = \pi(V_1).$


PROOF: The assertions follow from Definitions 55.5.8, 55.5.6, 55.5.4 and 54.5.4. 


55.5.10 REMARK: Problem: Projection maps cannot be defined for vector 0-tuple total spaces. 
Very unfortunately, the projection map $\pi^r$ in Definition 55.5.8 is meaningless for r = 0 and #(M) > 1. This
is because $T^0(M) = \bigcup_{p \in M} \{\emptyset\} = \{\emptyset\}$ when $M \ne \emptyset$, which follows from the observation that

    $\forall p \in M,\quad T^0_p(M) = T_p(M)^0 = \{()\} = \{\emptyset\}.$

Clearly no well-defined function can map a single value, the empty set, to all points of M if #(M) > 1.
(This follows from Theorem 13.1.15 (iv) although it may be proved very easily without it.)


Consequently all definitions which are based on vector-tuple bundles must be restricted to non-empty tuples.
However, the pointwise linear spaces $T^0_p(M) = \{()\} = \{\emptyset\}$ are meaningful and useful. Even the total space
$T^0(M) = \{\emptyset\}$ is meaningful, and it means the correct thing, but it is of limited value because it cannot be
given a projection map.

Definition 55.5.8 is able to extract the base point p from elements of $T^r(M)$ when $r \ge 1$ because each vector
in $T(M)$ has a built-in base-point value in its chart/map pair in Definition 54.1.2. Consequently the fibre
sets at different base points are disjoint. This is why $\pi^r$ is well defined by line (55.5.1).


An explicit expression which maps tangent vectors to their base points is given in Theorem 54.2.2 (viii) as
the formula $p = \psi^{-1}(V(\psi)(0))$ for all $\psi \in \mathrm{atlas}(M)$, $p \in \mathrm{Dom}(\psi)$ and $V \in T_p(M)$. By Theorem 54.5.5, this
can be expressed in terms of the tangent bundle projection map $\pi : T(M) \to M$ as $\psi^{-1}(V(\psi)(0)) = \pi(V)$
for all $V \in T(M)$ and $\psi \in \mathrm{Dom}(V)$. This formula is applied in Theorem 55.5.9.


Theorem 55.5.9 asserts that the base point $\pi^r(V)$ can be extracted explicitly from the first element $V_1$ of any
vector tuple V in the total space $T^r(M)$ by applying the explicit projection map $\pi$ for the ordinary tangent
bundle $T(M)$ to it. It is clear that the “first element” of the vector tuple V cannot be used in this way for
r = 0 because the first element does not exist for r = 0.
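
The degeneracy for r = 0 can be seen in a small toy model. The following is only an illustrative sketch, not the construction used in this book: the class TangentVector and the functions project, project_tuple and tuple_fibre are hypothetical names, points of M are represented by string labels, and each fibre is replaced by a small finite sample of vectors.

```python
from itertools import product
from typing import NamedTuple, Tuple

class TangentVector(NamedTuple):
    """Toy tangent vector which carries its own base point (an assumption)."""
    base_point: str
    components: Tuple[float, ...]

def project(V):
    """Toy analogue of the projection map for single tangent vectors."""
    return V.base_point

def project_tuple(Vs):
    """Toy analogue of the r-tuple projection: read the base point off the first element."""
    if len(Vs) == 0:
        raise ValueError("the empty tuple has no first element, so no base point")
    return project(Vs[0])

def tuple_fibre(vectors_at_point, r):
    """All r-tuples of vectors drawn from a finite sample of one fibre."""
    return set(product(vectors_at_point, repeat=r))

at_p = [TangentVector("p", (1.0, 0.0)), TangentVector("p", (0.0, 1.0))]
at_q = [TangentVector("q", (1.0, 0.0)), TangentVector("q", (0.0, 1.0))]

# For r >= 1 the fibres at different base points are disjoint.
assert tuple_fibre(at_p, 1).isdisjoint(tuple_fibre(at_q, 1))
assert tuple_fibre(at_p, 2).isdisjoint(tuple_fibre(at_q, 2))
# For r = 0 every fibre is the same one-element set containing the empty tuple.
assert tuple_fibre(at_p, 0) == tuple_fibre(at_q, 0) == {()}
# The base point is recoverable from the first element of a non-empty tuple.
assert project_tuple((at_p[0], at_p[1])) == "p"
```

Calling project_tuple(()) raises an error in this sketch, which mirrors the fact that no map can send the single empty tuple to more than one base point.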


In a sense, the problem which arises here is caused by the abandonment of the idea that a fibre bundle total
space should be a cross-product of the base space and the fibre space, or rather an equivalence class of local
cross-products of this kind. In a cross-product, the base point and fibre value are always extractable at any
point in the total space. The employment here of tangent vectors with built-in extractable base points only 
works when the tuple of such vectors is non-empty. The tagging of total space elements with their base 
points should be the normal way of doing things, not an arbitrary ad-hoc add-on whenever the built-in base 
point fails to appear. 


55.5.11 REMARK: Avoiding the use of vector 0-tuples in degree 0 differential form definitions. 

The principal motivation for defining vector-tuple bundles is to provide a space on which bundles containing 
differential forms and general tensor fields can be defined. The non-definition of vector 0-tuple bundles 
would make degree 0 differential form definitions more difficult. (See Notation 56.4.4 for example.) Degree 0 
differential forms are important because they are inputs for the exterior derivative in Definition 61.10.3, and 
are outputs from some other kinds of operations. So it is of some interest to see how differential forms of 
degree 0 are defined in the literature. 


Lang [23], page 125, wrote the following. (Here “functions” means real-valued functions on the base space.) 


It is convenient to agree that a differential form of degree 0 is a function. In the next proposition, 
we describe the exterior derivative of an r-form, and it is convenient to describe this situation 
separately in the case of functions. 


He then proceeds to define the exterior derivative of a real-valued function f on a manifold as the differential 
df as in Definitions 58.1.2 and 58.2.2, and he defines the exterior derivative for differential forms of positive 
degree as an extension of this operation. This is slightly untidy, but it seems to be an unavoidable untidiness 
because of the inconvenient degenerate total space of the bundle of tangent vector 0-tuples. 

Poor [32], page 32, simply says that “a section of $(\otimes^k TM) \otimes (\otimes^\ell T^*M)$ is called a tensor field of type $(k, \ell)$.
A (0,0)-tensor field is just a $C^\infty$ function on M”. This avoids discussing the impossibility of constructing
a projection map from the base-point-independent linear space $\bigotimes^0 T_p(M) = \mathrm{Lin}(K^{\{\emptyset\}}, K)$ mentioned in
Remark 28.1.15 to a variable base point p, where $K = \mathbb{R}$.

Frankel [12], page 66, wrote the following, where E denotes a vector space which is typically the tangent 
space at a point of a differentiable manifold. 


It is convenient to make the special definition $\bigwedge^0 E := \mathbb{R}$, that is, 0-forms are simply scalars. A
0-form field on a manifold is a differentiable function.


Once again, the idea is to replace the degenerate degree 0 differential forms with real functions on the base 
space. It is in fact not merely convenient, but also necessary so as to avoid using degenerate total spaces. 
This approach is well justified by the linear space isomorphism between scalars and degree-zero multilinear 
functions in Definition 27.3.7. 


55.5.12 REMARK: Coordinate charts for tangent vector-tuple fibrations and bundles. 

Definition 55.5.13 for the coordinatisation of the total space of a tangent vector-tuple fibration or bundle
is expressed in terms of the charts $\Phi(\psi)$ in Notation 54.5.7 for tangent fibrations. Then the coordinate
tuple for an r-tuple of tangent vectors is simply the r-tuple of coordinate n-tuples for the individual vectors.
Therefore under the usual linear space operations of pointwise addition and scalar multiplication, the space
of tangent vector r-tuples at each point has dimension nr.


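55.5.13 DEFINITION: The (tangent) vector r-tuple coordinate tuple of a tangent vector r-tuple $(V_j)_{j=1}^r$ in
the tangent vector r-tuple space $\times_{j=1}^r T_p(M)$ at a point p in a $C^1$ differentiable manifold M with respect to
a chart $\psi \in \mathrm{atlas}_p(M)$ is the tuple $(\Phi(\psi)(V_j))_{j=1}^r = ((\Phi(\psi)(V_j)^i)_{i=1}^n)_{j=1}^r \in (\mathbb{R}^n)^r$, where $n = \dim(M) \in \mathbb{Z}^+_0$
and $r \in \mathbb{Z}^+_0$.


55.5.14 NOTATION: $t^r_{p,w,\psi}$ for $r \in \mathbb{Z}^+_0$, $p \in M$, $w \in (\mathbb{R}^n)^r$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^1$ manifold M with
$n = \dim(M) \in \mathbb{Z}^+_0$, denotes the tangent vector r-tuple with coordinate tuple $w = (w_j)_{j=1}^r \in (\mathbb{R}^n)^r$ with
respect to $\psi$ at p. In other words,

    $\forall p \in M,\ \forall w \in (\mathbb{R}^n)^r,\ \forall \psi \in \mathrm{atlas}_p(M),\quad t^r_{p,w,\psi} = (t_{p,w_j,\psi})_{j=1}^r.$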


55.5.15 REMARK: Degenerate tangent vector 0-tuple coordinate tuples.

In Notation 55.5.14, $t^0_{p_1,w_1,\psi_1} = t^0_{p_2,w_2,\psi_2}$ for all $p_1, p_2 \in M$ and $w_1, w_2 \in (\mathbb{R}^n)^0 = \{\emptyset\}$. The degeneracy with
respect to the base point of 0-tuples in Definition 55.5.13 follows directly from Definition 55.5.4. However,
the situation is much worse than this. These zero-vectors are in fact the only elements of $T^0_p(M)$ at every
point $p \in M$, and $T^0_{p_1}(M) = T^0_{p_2}(M)$ for all $p_1, p_2 \in M$, as mentioned in Remark 55.5.10.


The tangent spaces $T^0_p(M)$ and vectors $t^0_{p,\emptyset,\psi} \in T^0_p(M)$ for $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$ are useful for some
purposes. So it is a good idea to define them despite the impossibility of defining a projection map $t^0_{p,\emptyset,\psi} \mapsto p$.


55.5.16 REMARK: Convenient notation for general tangent vector-tuples. 
For specific fixed values of r, it is convenient to list the r coordinate tuples $w_j \in \mathbb{R}^n$ individually, separated
by commas. For example, one may write $t^2_{p,w_1,w_2,\psi}$ for $t^2_{p,w,\psi}$ for $w = (w_j)_{j=1}^2 \in (\mathbb{R}^n)^2$ for
$p \in M$ for a manifold M with n = dim(M).


55.5.17 REMARK: Chart-basis tangent-vector tuples. 
If a set $T^r_p(M)$ is given linear space structure (as in a “direct sum of vector bundles” construction) with
addition operation $(V^1_j)_{j=1}^r + (V^2_j)_{j=1}^r = (V^1_j + V^2_j)_{j=1}^r$, then the natural choice of basis vectors would be vector
tuples $(e^{p,\psi}_i, 0, 0, \ldots 0)$, $(0, e^{p,\psi}_i, 0, \ldots 0)$, and so forth, for $i \in \mathbb{N}_n$. It is difficult to adapt Notation 54.4.10
for chart-basis vectors $e^{p,\psi}_i$ to such tangent vector tuples. These chart-basis vectors would have the form
$t^2_{p,e_i,0,\psi}$ or $t^2_{p,0,e_i,\psi}$ for r = 2. One could perhaps introduce a special notation for these, but the
benefits seem to be outweighed by the ambiguity and inconvenience.


The sets $T^r_p(M)$ are most often used as product spaces from which multilinear maps can be constructed.
In this multilinear context, the more natural concept of basis vectors would be vector tuples of the form
$(e^{p,\psi}_{i_k})_{k=1}^r$ for $i \in \mathbb{N}_n^r$. (See Definition 27.6.9 for the canonical basis for a very general kind of multilinear
function space.) Such tuples appear in Definition 55.5.18 and Notation 55.5.19, which are vector-tuple
versions of Definition 54.4.9 and Notation 54.4.10. Whether the index tuples i have the form $i : \mathbb{N}_r \to \mathbb{N}_n$
or $i : \mathbb{Z}[0, r-1] \to \mathbb{N}_n$, the tuple space is still denoted as $T^r_p(M)$.


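55.5.18 DEFINITION: The tangent space chart-basis vector-tuple for an index-tuple $i \in (\mathbb{N}_n)^r$ at $p \in M$
with respect to a chart $\psi \in \mathrm{atlas}_p(M)$ for a $C^1$ manifold M, where n = dim(M) and $r \in \mathbb{Z}^+_0$, is the vector
tuple $(e^{p,\psi}_{i_k})_{k=1}^r$. (See Notation 54.4.10 for $e^{p,\psi}_{i_k}$.)

Alternative meaning: $(e^{p,\psi}_{i_k})_{k=0}^{r-1}$ for index-tuple style $i : \mathbb{Z}[0, r-1] \to \mathbb{N}_n$.


55.5.19 NOTATION: Tangent space chart-basis vector-tuples.
$e^{p,\psi}_i$ for $p \in M$, $\psi \in \mathrm{atlas}_p(M)$ and $i \in (\mathbb{N}_n)^r$, for a $C^1$ manifold M with n = dim(M) and $r \in \mathbb{Z}^+_0$, denotes
the tangent space chart-basis vector-tuple for i at p with respect to $\psi$. In other words,

    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in (\mathbb{N}_n)^r,\quad e^{p,\psi}_i = (e^{p,\psi}_{i_k})_{k=1}^r$
                                                                      $= (t_{p,e_{i_k},\psi})_{k=1}^r.$

Alternative meaning: $(e^{p,\psi}_{i_k})_{k=0}^{r-1}$ for index-tuple style $i : \mathbb{Z}[0, r-1] \to \mathbb{N}_n$.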


55.5.20 REMARK: Expression for tangent vector-tuples as linear combinations of chart-basis vector-tuples.
Theorem 55.5.21 is the vector-tuple version of Theorem 54.4.11, which expresses tangent vectors in terms of
chart-basis vectors. This assumes the linear space structure outlined in Remark 55.5.17.


55.5.21 THEOREM: Components of tangent vector-tuples in terms of chart-basis vector-tuples.
Let M be a $C^1$ differentiable manifold with $n = \dim(M) \in \mathbb{Z}^+_0$. Then

    $\forall r \in \mathbb{Z}^+,\ \forall p \in M,\ \forall v \in (\mathbb{R}^n)^r,\ \forall \psi \in \mathrm{atlas}_p(M),$
        $t^r_{p,v,\psi} = \Bigl(\sum_{i=1}^n v_j^i\, e^{p,\psi}_i\Bigr)_{j=1}^r,$

where $v \in (\mathbb{R}^n)^r$ is indexed in the style $v = ((v_j^i)_{i=1}^n)_{j=1}^r$.


PROOF: The assertion follows from Notation 55.5.14 and Theorem 54.4.11.


55.5.22 REMARK: Tangent vector-tuple bundles of a differentiable manifold.

Definition 55.5.30 introduces the structure for the topological fibre bundle whose total space consists of vector
tuples at each point of the base-level manifold. The topologies on the sets M and $T^r(M)$ are implicit in
the specification tuple $(T^r(M), \pi^r, M, A^{(\mathbb{R}^n)^r}_{T^r(M)})$. This may be recognised as a $(\mathrm{GL}(n), (\mathbb{R}^n)^r)$ topological vector
bundle with the natural action of elements of $\mathrm{GL}(n)$ on the components of the vector tuples via the fibre
charts $\Phi^r(\psi)$. The group action transforms each component simultaneously in accordance with the same
linear map. In this way, the r-tuple bundle becomes an associated bundle of the base-level tangent bundle
for each $r \in \mathbb{Z}^+$. (Some of the maps and spaces in Notation 55.5.25 and Definitions 55.5.24 and 55.5.28 are
illustrated in Figure 55.5.1.)


Figure 55.5.1 Velocity chart map $\Phi^r$ for an r-tuple fibration


55.5.23 REMARK: Definition of velocity chart on tangent-vector 0-tuple fibration. 

Definition 55.5.24 first gives the general definitions for r-tuples with $r \ge 1$, and then gives a separate
definition for r = 0 because there is no projection map for r = 0. Notation 55.5.25 is valid for all $r \ge 0$ by
using the combined $r \ge 1$ and r = 0 cases in Definition 55.5.24.


55.5.24 DEFINITION: The velocity chart on the tangent vector r-tuple fibration for a $C^1$ manifold M and
$r \in \mathbb{Z}^+$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^r)^{-1}(\mathrm{Dom}(\psi)) = \bigcup_{p \in \mathrm{Dom}(\psi)} T^r_p(M)$ to
$(\mathbb{R}^n)^r$ defined by $t^r_{p,w,\psi} \mapsto w$ for $w \in (\mathbb{R}^n)^r$, where n = dim(M).

The velocity chart map for the tangent vector r-tuple fibration of a $C^1$ manifold M, for $r \in \mathbb{Z}^+$, is the map
from $\mathrm{atlas}(M)$ to $T^r(M) \mathrel{\mathring{\to}} (\mathbb{R}^n)^r$ which maps each chart in $\mathrm{atlas}(M)$ to its corresponding velocity chart
on $T^r(M)$.

The velocity chart on the tangent vector 0-tuple fibration for a $C^1$ manifold M, corresponding to any chart
$\psi \in \mathrm{atlas}(M)$, is the map from $T^0(M) = \{\emptyset\}$ to $(\mathbb{R}^n)^0$ defined by $t^0_{p,w,\psi} \mapsto w$ for $w \in (\mathbb{R}^n)^0 = \{\emptyset\}$,
where n = dim(M). (In other words, $t^0_{p,\emptyset,\psi} \mapsto \emptyset$ for all $p \in M$ and $\psi \in \mathrm{atlas}(M)$.)


The velocity chart map for the tangent vector 0-tuple fibration of a $C^1$ manifold M is the map from $\mathrm{atlas}(M)$
to $T^0(M) \mathrel{\mathring{\to}} (\mathbb{R}^n)^0 = \{\emptyset\}$ which maps each chart in $\mathrm{atlas}(M)$ to its corresponding velocity chart on $T^0(M)$.


55.5.25 NOTATION: $\Phi^r$, for a $C^1$ manifold M and $r \in \mathbb{Z}^+_0$, denotes the velocity chart map for $T^r(M)$. In
other words, $\forall \psi \in \mathrm{atlas}(M)$, $\mathrm{Dom}(\Phi^r(\psi)) = \bigcup_{p \in \mathrm{Dom}(\psi)} T^r_p(M)$ and

    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in (\mathbb{R}^n)^r,$
        $\Phi^r(\psi)(t^r_{p,w,\psi}) = w$
                                   $= (w_j)_{j=1}^r$
                                   $= ((w_j^i)_{i=1}^n)_{j=1}^r,$

where n = dim(M). In other words,

    $\forall V \in T^r(M),\ \forall \psi \in \mathrm{atlas}_{\pi^r(V)}(M),\quad \Phi^r(\psi)(V) = (\Phi(\psi)(V_j))_{j=1}^r.$

In other words,

    $\forall \psi \in \mathrm{atlas}(M),\quad \Phi^r(\psi) = \bigcup_{p \in \mathrm{Dom}(\psi)} \times_{j=1}^r \Phi_p(\psi).$    (55.5.2)


55.5.26 REMARK: Multiple-domain function products.
The formula in Notation 55.5.25 line (55.5.2) uses an obvious inductive extrapolation of the double-domain
function products in Notation 10.14.4 to arbitrary finite Cartesian products of domains. (See Notation 54.5.10
for $\Phi_p(\psi) = \Phi(\psi)|_{T_p(M)}$.)


55.5.27 THEOREM: Chart transition formula for velocity chart maps for tangent vector tuples. 
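Let M be a $C^1$ manifold. Then the r-tuple velocity charts $\Phi^r$ in Notation 55.5.25 satisfy

    $\forall r \in \mathbb{Z}^+_0,\ \forall p \in M,\ \forall V \in T^r_p(M),\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),$
        $\Phi^r(\psi_2)(V) = (J_{21}(p)\,\Phi(\psi_1)(V_j))_{j=1}^r,$

where n = dim(M). (See Definition 51.4.18 for the transition matrix $J_{21}(p)$.) In other words,

    $\forall r \in \mathbb{Z}^+_0,\ \forall p \in M,\ \forall V \in T^r_p(M),\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),\ \forall j \in \mathbb{N}_r,\ \forall i \in \mathbb{N}_n,$
        $\Phi(\psi_2)(V_j)^i = \sum_{k=1}^n J_{21}(p)^i{}_k\, \Phi(\psi_1)(V_j)^k.$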


PROOF: The assertion follows from Notation 55.5.25 and Theorem 54.5.14. 
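
The component-wise nature of this transition rule can be illustrated with a short computation. This is a minimal sketch under stated assumptions: coordinate tuples are plain Python tuples, the transition matrix is supplied directly as a list of rows rather than being computed from charts, and the function names (mat_vec, mat_mul, transition_tuple) are hypothetical.

```python
def mat_vec(J, w):
    """Apply the matrix J (list of rows) to the coordinate n-tuple w."""
    return tuple(sum(a * x for a, x in zip(row, w)) for row in J)

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_tuple(J, coords):
    """Apply the same matrix J to every n-tuple in an r-tuple of n-tuples."""
    return tuple(mat_vec(J, w) for w in coords)

# Assumed example data: n = 2, r = 2.
J21 = [[0, -1], [1, 0]]          # stands in for a transition matrix at one point
J32 = [[2, 0], [0, 3]]           # a second transition matrix at the same point
coords_1 = ((1, 0), (2, 3))      # coordinates of a vector pair in the first chart

coords_2 = transition_tuple(J21, coords_1)
assert coords_2 == ((0, 1), (-3, 2))

# Transitions compose by matrix multiplication, component by component.
assert transition_tuple(J32, coords_2) == transition_tuple(mat_mul(J32, J21), coords_1)
# The empty tuple (r = 0) is left unchanged by every transition.
assert transition_tuple(J21, ()) == ()
```

The same matrix acts on every component, which is why a single structure group element suffices for the whole tuple.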


55.5.28 DEFINITION: The tangent vector r-tuple bundle fibre atlas for a $C^1$ manifold M, for $r \in \mathbb{Z}^+_0$, is the
set $A^{(\mathbb{R}^n)^r}_{T^r(M)} = \{\Phi^r(\psi);\ \psi \in \mathrm{atlas}(M)\}$ of velocity charts for the tangent vector r-tuple fibration for M.


55.5.29 REMARK: The 0-tuple fibre bundle cannot be defined without a projection map.
The 0-tuple case of Definition 55.5.28 gives $A^{(\mathbb{R}^n)^0}_{T^0(M)} = \{\Phi^0(\psi);\ \psi \in \mathrm{atlas}(M)\} = \{(\emptyset, \emptyset)\}$ because the velocity
chart $\Phi^0(\psi)$ maps $\emptyset$ to $\emptyset$ for all $\psi \in \mathrm{atlas}(M)$.


It is not possible to use the tangent vector 0-tuple case of the fibre atlas in Definition 55.5.28 to construct a
tangent-vector 0-tuple bundle in Definition 55.5.30 because the required projection map is undefined.


55.5.30 DEFINITION: The tangent vector r-tuple topological fibre bundle of a $C^1$ manifold M, for $r \in \mathbb{Z}^+$,
is the tuple $(T^r(M), \pi^r, M, A^{(\mathbb{R}^n)^r}_{T^r(M)})$, where n = dim(M) and $A^{(\mathbb{R}^n)^r}_{T^r(M)}$ is the tangent vector r-tuple fibre atlas
for M.


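55.5.31 DEFINITION: The tangent vector r-tuple bundle manifold chart for a $C^1$ manifold M, for $r \in \mathbb{Z}^+$,
corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^r)^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^n \times (\mathbb{R}^n)^r = \mathbb{R}^{n+nr}$ defined
by $t^r_{p,w,\psi} \mapsto (\psi(p), w)$ for all $p \in \mathrm{Dom}(\psi)$ and $w \in (\mathbb{R}^n)^r$, where $\pi^r$ is the projection map for the tangent
vector r-tuple fibration $(T^r(M), \pi^r, M)$.


55.5.32 DEFINITION: The manifold chart map for the tangent vector r-tuple bundle total space $T^r(M)$ of
a $C^1$ manifold M, for $r \in \mathbb{Z}^+$, is the map from $\mathrm{atlas}(M)$ to $\bigcup_{\psi \in \mathrm{atlas}(M)} ((\pi^r)^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{n+nr})$ which
maps each chart in $\mathrm{atlas}(M)$ to the corresponding tangent vector r-tuple bundle manifold chart.


55.5.33 NOTATION: Manifold chart map for tangent vector-tuple bundle total space.

$\Psi^r$, for a $C^1$ manifold M and $r \in \mathbb{Z}^+$, denotes the manifold chart map for the tangent vector r-tuple bundle
total space $T^r(M)$. In other words, for each $\psi \in \mathrm{atlas}(M)$, the map $\Psi^r(\psi) : (\pi^r)^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{n+nr}$,
where n = dim(M), is defined by

    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in (\mathbb{R}^n)^r,\quad \Psi^r(\psi)(t^r_{p,w,\psi}) = (\psi(p), w).$

In other words,

    $\forall \psi \in \mathrm{atlas}(M),\quad \Psi^r(\psi) = (\psi \circ \pi^r) \times \Phi^r(\psi)$
                                       $= (\psi \circ \pi^r) \times \bigcup_{p \in \mathrm{Dom}(\psi)} \times_{j=1}^r \Phi_p(\psi).$    (55.5.3)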


55.5.34 DEFINITION: The tangent vector r-tuple bundle (total space) manifold atlas for a $C^1$ manifold M,
for $r \in \mathbb{Z}^+$, is the manifold atlas $A_{T^r(M)}$ on $T^r(M)$ defined by

    $A_{T^r(M)} = \{\Psi^r(\psi);\ \psi \in \mathrm{atlas}(M)\}.$


55.5.35 REMARK: Tangent vector-tuple bundle.

The atlas $A_{T^r(M)}$ in Definition 55.5.34 may be combined with the set $T^r(M)$ in Notation 55.5.7 to construct
the vector r-tuple bundle total space in Definition 55.5.36, which is a $C^k$ manifold if M is a $C^{k+1}$ manifold.
Then in Definition 55.5.37, this is combined with the projection map $\pi^r$ in Definition 55.5.8 and the fibre
atlas $A^{(\mathbb{R}^n)^r}_{T^r(M)}$ in Definition 55.5.28 to construct the tangent r-tuple bundle.


55.5.36 DEFINITION: The tangent vector r-tuple bundle total space for a $C^1$ manifold M, for $r \in \mathbb{Z}^+$, is
the $C^0$ differentiable manifold $T^r(M) -\!\!< (T^r(M), A_{T^r(M)})$.


55.5.37 DEFINITION: The tangent vector r-tuple bundle of a $C^1$ manifold M, for $r \in \mathbb{Z}^+$, is the tuple
$(T^r(M), \pi^r, M, A^{(\mathbb{R}^n)^r}_{T^r(M)}) -\!\!< (T^r(M), A_{T^r(M)}, \pi^r, M, A_M, A^{(\mathbb{R}^n)^r}_{T^r(M)})$ as follows.
(i) $A_M = \mathrm{atlas}(M)$.

(ii) $A_{T^r(M)}$ is the tangent r-tuple bundle manifold atlas for M as in Definition 55.5.34.

(iii) $A^{(\mathbb{R}^n)^r}_{T^r(M)} = \{\Phi^r(\psi);\ \psi \in \mathrm{atlas}(M)\}$ is the tangent vector r-tuple fibre atlas for M as in Definition 55.5.28.


55.5.38 REMARK: Differentiable fibre bundle structure for tangent vector-tuple bundles. 

Definition 55.5.37 specifies a structure which is in fact a differentiable fibre bundle, although this cannot
be formally asserted until such fibre bundles have been defined in Section 64.8. However, it is possible to
assert at this point that a tangent vector-tuple bundle $T^r(M)$ is a topological $(G, F^r)$ fibre bundle for $r \in \mathbb{Z}^+$
according to Definition 47.6.5, where $G = \mathrm{GL}(n)$ and $F^r = (\mathbb{R}^n)^r$, and the action of G on $F^r$ is defined by

    $\forall A \in \mathrm{GL}(n),\ \forall v \in (\mathbb{R}^n)^r,\quad A(v) = (A v_j)_{j=1}^r.$

In other words, the n × n matrix A acts individually on each tuple $v_j$ in the tuple-tuple $v = (v_j)_{j=1}^r$. Then
the fibre chart transition maps $g_{\phi_2,\phi_1} : \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2) \to \mathrm{GL}(n)$ for $T^r(M)$ in Definition 47.6.5, for
$\phi_1, \phi_2 \in A^{(\mathbb{R}^n)^r}_{T^r(M)}$, satisfy

    $\forall r \in \mathbb{Z}^+,\ \forall p \in M,\ \forall V \in (\pi^r)^{-1}(\{p\}),\ \forall \psi_1, \psi_2 \in A_{M,p},$
        $\Phi^r(\psi_2)(V) = J_{21}(p)(\Phi^r(\psi_1)(V))$
                          $= (J_{21}(p)\,\Phi(\psi_1)(V_j))_{j=1}^r$    (55.5.4)
                          $= g_{\phi_2,\phi_1}(p)(\Phi^r(\psi_1)(V)),$

where $\phi_1 = \Phi^r(\psi_1)$, $\phi_2 = \Phi^r(\psi_2)$ and $g_{\phi_2,\phi_1}(p) = J_{21}(p) \in \mathrm{GL}(n)$ for all $\psi_1, \psi_2 \in A_M$. (The chart $\Phi(\psi_1)$
in line (55.5.4) is a single-vector fibre chart in $A^{\mathbb{R}^n}_{T(M)}$ as in Notation 54.5.7.) It is then not very difficult to
show that $T^r(M)$ is a topological $(G, F^r)$ fibre bundle. A formal assertion of this is omitted here because it
is tedious and uninteresting, as may be seen in the proofs of Theorems 54.5.33 and 55.4.15.


It is equally straightforward, and even more tedious, to show that the tangent bundles $T(M)$ and $T^r(M)$ are
associated topological fibre bundles via the obvious map $h : A^{\mathbb{R}^n}_{T(M)} \to A^{(\mathbb{R}^n)^r}_{T^r(M)}$ defined by $h : \Phi(\psi) \mapsto \Phi^r(\psi)$
for $\psi \in A_M$. (See Theorem 55.4.17 for the corresponding association between $T(M)$ and $T^*(M)$.) However,
the general idea is illustrated in Figure 55.5.2.


55.5.39 REMARK:  Abbreviated notation for components of velocity charts for tangent vector-tuples. 

It is sometimes convenient to use the abbreviation $\Phi^r(\psi)(V)^i$ for a product such as $\prod_{k=1}^r \Phi(\psi)(V_k)^{i_k}$ for
tangent vector r-tuples $V = (V_k)_{k=1}^r \in T^r(M)$ and index r-tuples $i \in \mathbb{N}_n^r$ with $r \in \mathbb{Z}^+_0$. This could be
regarded as a pseudo-notation because the tuple $(\Phi^r(\psi)(V)^i)_{i \in \mathbb{N}_n^r} = (\prod_{k=1}^r \Phi(\psi)(V_k)^{i_k})_{i \in \mathbb{N}_n^r}$ is not the
component tuple for $V \in T_p(M)^r$ with respect to a basis for the direct sum space $T_p(M)^r$.


Figure 55.5.2 Association of tangent vector and vector-tuple bundles


The main obstacle here is the fact that the maps $V \mapsto \Phi^r(\psi)(V)^i$ are not linear functionals on the direct sum
space $T_p(M)^r$ when $r \ge 2$. This sum space has dimension nr, whereas $\#(\mathbb{N}_n^r) = n^r$. But more importantly,
the maps $V \mapsto \Phi^r(\psi)(V)^i$ are r-linear with respect to $T_p(M)^r$, not linear. Thus the maps $V \mapsto \Phi^r(\psi)(V)^i$
are elements of the multilinear function space $\mathscr{L}_r(T_p(M))$. They do in fact form a basis for that space.
Notation 55.5.40 introduces maps $\Phi^r(\psi)^i : V \mapsto \Phi^r(\psi)(V)^i$, where the superscript i is shifted from $\Phi^r(\psi)(V)$
to $\Phi^r(\psi)$ to make each expression $\Phi^r(\psi)^i$ represent a function in $\mathscr{L}_r(T_p(M))$ for some $p \in M$. This follows
the pattern of the notation in Definition 30.2.11 for the canonical basis of an abstract multilinear function
space. An example application of Notation 55.5.40 is the statement of Theorem 56.4.3.
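
The distinction between linear and r-linear behaviour can be illustrated numerically. The following is only a sketch under stated assumptions: a vector tuple is represented directly by its coordinate tuples with respect to one fixed chart, indices are 0-based, and the function names (component_product, index_tuples) are hypothetical.

```python
from itertools import product
from math import prod

def component_product(i, V):
    """Product over k of the i[k]-th coordinate of the k-th vector of V (0-based)."""
    return prod(V[k][i[k]] for k in range(len(i)))

def index_tuples(n, r):
    """All index r-tuples with entries in range(n)."""
    return list(product(range(n), repeat=r))

n, r = 3, 2
V = ((1, 2, 3), (4, 5, 6))       # coordinates of a pair of vectors, n = 3
i = (0, 2)

# There are n**r product maps, but the direct sum space has dimension n*r.
assert len(index_tuples(n, r)) == n ** r == 9
assert n * r == 6

base = component_product(i, V)
# Scaling one vector of the tuple scales the value once (linearity in each slot).
scaled_first = (tuple(2 * x for x in V[0]), V[1])
assert component_product(i, scaled_first) == 2 * base
# Scaling the whole tuple scales the value r times, so the map is not linear.
scaled_all = tuple(tuple(2 * x for x in Vk) for Vk in V)
assert component_product(i, scaled_all) == (2 ** r) * base
```

A linear functional on the direct sum space would double, not quadruple, when the whole pair is doubled, so these maps cannot be coordinate functions for that space when r is 2 or more.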


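55.5.40 NOTATION: $\Phi^r(\psi)^i$, for $r \in \mathbb{Z}^+_0$, $\psi \in \mathrm{atlas}(M)$ and $i \in \mathbb{N}_n^r$, where M is a $C^1$ manifold with
n = dim(M), denotes the map $V \mapsto \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k}$ for $V \in \bigcup_{p \in \mathrm{Dom}(\psi)} T^r_p(M)$. In other words,

    $\forall r \in \mathbb{Z}^+_0,\ \forall \psi \in \mathrm{atlas}(M),\ \forall V \in \bigcup_{p \in \mathrm{Dom}(\psi)} T^r_p(M),\ \forall i \in \mathbb{N}_n^r,$
        $\Phi^r(\psi)^i(V) = \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k}.$

$\Phi^r_p(\psi)^i$, for $r \in \mathbb{Z}^+_0$, $\psi \in \mathrm{atlas}(M)$, $p \in \mathrm{Dom}(\psi)$ and $i \in \mathbb{N}_n^r$, denotes the restriction $\Phi^r(\psi)^i|_{T^r_p(M)}$ of $\Phi^r(\psi)^i$
to $T^r_p(M)$. Thus

    $\forall r \in \mathbb{Z}^+_0,\ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall V \in T^r_p(M),\ \forall i \in \mathbb{N}_n^r,$
        $\Phi^r_p(\psi)^i(V) = \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k}.$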


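55.5.41 DEFINITION: The canonical basis for the multilinear function space $\mathscr{L}_r(T_p(M))$ corresponding to
a chart $\psi \in \mathrm{atlas}_p(M)$, for $r \in \mathbb{Z}^+_0$ and $p \in M$ for a $C^1$ manifold M, is the family $(\Phi^r_p(\psi)^i)_{i \in \mathbb{N}_n^r}$, where
$\Phi^r_p(\psi)^i \in \mathscr{L}_r(T_p(M))$ is defined for $i \in \mathbb{N}_n^r$ as in Notation 55.5.40.


55.5.42 THEOREM: The multilinear function space canonical basis for a tangent space is a basis.
Let $(\Phi^r_p(\psi)^i)_{i \in \mathbb{N}_n^r}$ be the multilinear function space canonical basis in Definition 55.5.41 for $\mathscr{L}_r(T_p(M))$,
corresponding to a chart $\psi \in \mathrm{atlas}_p(M)$, for $r \in \mathbb{Z}^+_0$ and $p \in M$ for a $C^1$ manifold M.
(i) $(\Phi^r_p(\psi)^i)_{i \in \mathbb{N}_n^r}$ is a basis for $\mathscr{L}_r(T_p(M))$.
(ii) $\forall A \in \mathscr{L}_r(T_p(M)),\ A = \sum_{i \in \mathbb{N}_n^r} A(e^{p,\psi}_i)\,\Phi^r_p(\psi)^i$, where $e^{p,\psi}_i = (e^{p,\psi}_{i_k})_{k=1}^r$ for $i \in \mathbb{N}_n^r$. In other words,

    $\forall A \in \mathscr{L}_r(T_p(M)),\quad A = \sum_{i \in \mathbb{N}_n^r} a_i\, \Phi^r_p(\psi)^i,$ where $\forall i \in \mathbb{N}_n^r,\ a_i = A(e^{p,\psi}_i).$


PROOF: Part (i) follows from Theorem 30.2.12 (i).
Part (ii) follows from Theorem 30.2.12 (ii).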
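
The expansion in part (ii) can be checked on a small example. This is a minimal sketch, assuming that vectors are identified with their coordinate tuples in one fixed chart, with 0-based indices; the names basis_tuple, coefficients, expand and det2 are hypothetical. The bilinear function used is the 2 × 2 determinant of a pair of vectors in the plane.

```python
from itertools import product
from math import prod

def basis_tuple(i, n):
    """Tuple of standard basis n-tuples selected by the index tuple i (0-based)."""
    return tuple(tuple(1 if a == ik else 0 for a in range(n)) for ik in i)

def coefficients(A, n, r):
    """Coefficients a_i = A(e_i) of an r-linear function A on n-tuples."""
    return {i: A(*basis_tuple(i, n)) for i in product(range(n), repeat=r)}

def expand(coeffs, V):
    """Evaluate sum_i a_i * prod_k V[k][i[k]] for a vector tuple V."""
    return sum(a * prod(V[k][i[k]] for k in range(len(i)))
               for i, a in coeffs.items())

def det2(u, v):
    """A bilinear function of two vectors in the plane."""
    return u[0] * v[1] - u[1] * v[0]

coeffs = coefficients(det2, n=2, r=2)
assert coeffs == {(0, 0): 0, (0, 1): 1, (1, 0): -1, (1, 1): 0}

V = ((3, 1), (2, 5))
# The canonical-basis expansion reproduces the original bilinear function.
assert expand(coeffs, V) == det2(*V) == 13
```

The coefficients are obtained by evaluating the function on tuples of basis vectors only, so n^r evaluations determine an r-linear function completely in this sketch.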
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55.6. Tangent vector-frame bundles 


55.6.1 REMARK: Applications of tangent vector-frame bundles. 
The concepts in Section 55.6 are referenced in Sections 55.7, 57.11, 65.7, 65.8 and 68.3. (See Section 65.8
for the extension of Section 55.6 to vector-frame bundles built on general vector bundles.)


The most prominent application of tangent vector-frame bundles is to the “repère mobile” or “moving frame”
approach to connections on principal bundles. (See Remark 55.7.12 and Section 57.11 for further comments
on this topic.) Moving frames are cross-sections of tangent vector n-frame bundles on n-dimensional
differentiable manifolds, which implies that they provide a basis at each point of the base space. The philosophy
behind this is the desire to free up tangent space basis fields to not necessarily be constructed as basis fields
for coordinate charts. In other words, it is an attempt to liberate tangent space basis fields from coordinates.
An easy way to construct examples of tangent vector-frame bundle cross-sections is to choose tuples of r
vectors from amongst the n basis vectors $e_i^{p,\psi}$ at each point $p \in M$ for some given chart $\psi \in \mathrm{atlas}(M)$ for a
$C^1$ manifold $M$ with $\dim(M) = n \ge r$ as in Notation 54.4.10. (This is also presented in Example 55.6.11.)
Tangent vector-frame bundles give such vector r-tuple fields a space to inhabit.


55.6.2 REMARK: Frame bundles are closely related to tangent vector-tuple bundles. 

Frame bundles on a differentiable manifold are in principle nothing more or less than tangent vector-tuple 
bundles where the vector-tuples are constrained to be linearly independent. The linear independence implies 
that each tuple is a basis for some linear subspace of the tangent space at each point. In the special case that
the number of vectors in each tuple equals the dimension of the manifold, these tuples form a basis for the whole
tangent space at each point. Generally the unqualified term “frame bundle” refers to this most important case of
linearly independent n-tuples on an n-dimensional differentiable manifold. (See Section 55.7 for this case.)


Since a frame, in the frame bundle context, is always a basis for some linear space or subspace, the word 
“frame” may be replaced with the word “basis” without changing its meaning. However, the word “frame” 
is the customary choice of terminology in differential geometry. 


The basic definitions for vector-frame bundles are very little different to the corresponding definitions for
tangent vector-tuple bundles in Section 55.5, but since the properties and applications are different, it is 
convenient to repeat all of the definitions, with some necessary minor adjustments, while choosing different 
notations. Thus the tedious “boilerplate” in Section 55.5 is almost exactly replicated in Section 55.6, which 
may be equally profitably skipped. 


The definitions for tangent vector tuples in Section 55.5 are adapted here for tangent vector frames mostly 
by substituting the word “frame” for the word “tuple”. This is not surprising because a vector frame is 
simply a vector tuple which happens to be linearly independent. Correspondingly, the notations with “T” 
in Section 55.5 are replaced here with the symbol “F”. 

It goes without saying that $F_p^r(M) = \emptyset$ in Notation 55.6.4 if $r > \dim(M)$, but constantly stating the
restriction $r \le \dim(M)$ is inconvenient. This restriction may be imposed whenever required for applications.
Similarly, $F^r(M) = \emptyset$ in Notation 55.6.6 if $r > \dim(M)$.


55.6.3 DEFINITION: The (tangent) (vector) r-frame space at $p \in M$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$,
is the set $\{B \in T_p(M)^r;\ B \text{ is linearly independent}\}$, together with the relative topology of the standard
Cartesian product topology on $T_p(M)^r = \times_{j=1}^r T_p(M)$.

The (tangent) (vector) frame space at $p \in M$, for a $C^1$ manifold $M$, is the tangent vector n-frame space
at $p$, where $n = \dim(M)$.


55.6.4 NOTATION: $F_p^r(M)$, for $p \in M$ and $r \in \mathbb{Z}_0^+$, for a $C^1$ manifold $M$, denotes the tangent vector
r-frame space at $p$. In other words,

  $\forall p \in M,\ \forall r \in \mathbb{Z}_0^+,\quad F_p^r(M) = \{B \in T_p(M)^r;\ B \text{ is linearly independent}\}$
  $\qquad = \{B \in T_p^r(M);\ B \text{ is linearly independent}\}$

with the relative topology from the Cartesian product $T_p(M)^r = \times_{j=1}^r T_p(M) = T_p^r(M)$.
$F_p(M)$, for $p \in M$, for a $C^1$ manifold $M$, denotes the tangent vector n-frame space at $p$, where $n = \dim(M)$.
In other words, $F_p(M) = F_p^{\dim(M)}(M)$.
55.6.5 DEFINITION: The (tangent) (vector) r-frame total space for a $C^1$ manifold $M$, for $r \in \mathbb{Z}_0^+$, is the
set $\bigcup_{p \in M} F_p^r(M)$.

The (tangent) (vector) frame total space for a $C^1$ manifold $M$ is the tangent vector n-frame total space
for $M$, where $n = \dim(M)$.


55.6.6 NOTATION: $F^r(M)$ denotes the tangent vector r-frame total space for a $C^1$ manifold $M$ for $r \in \mathbb{Z}_0^+$.
In other words, $F^r(M) = \bigcup_{p \in M} F_p^r(M)$.

$F(M)$ denotes the tangent vector frame total space for a $C^1$ manifold $M$.

In other words, $F(M) = \bigcup_{p \in M} F_p(M)$.


55.6.7 REMARK: Tangent vector 0-frame fibrations have no projection map. 

Sadly, for the reasons discussed in Remark 55.5.10, it is not possible to define a projection map for tangent
vector 0-frame fibrations in Definition 55.6.8. The problem is that the total space $F^0(M) = \{()\} = \{\emptyset\}$ is a
singleton, and by Theorem 13.1.15 (iv) this cannot be mapped to all points of a base space $M$ with $\#(M) > 1$.


55.6.8 DEFINITION: The (tangent) (vector) r-frame fibration for a $C^1$ manifold $M$, for $r \in \mathbb{Z}^+$, is the tuple
$(F^r(M), \pi^r, M)$, where $\pi^r : F^r(M) \to M$ is defined by

  $\forall p \in M,\ \forall (V_j)_{j=1}^r \in F_p^r(M),\quad \pi^r((V_j)_{j=1}^r) = p.$

The projection map of the (tangent) (vector) r-frame fibration for $M$ is the map $\pi^r$.

The (tangent) (vector) frame fibration for a $C^1$ manifold $M$ is the tangent vector n-frame fibration
$(F^n(M), \pi^n, M)$ for $M$, where $n = \dim(M)$.
The projection map of the (tangent) (vector) frame fibration for $M$ is the map $\pi^n$, where $n = \dim(M)$.


55.6.9 REMARK: Choice of notation for frame bundles. 

A possible alternative to the symbol “F” for sets of tangent vector frames in Notations 55.6.4 and 55.6.6, for
example, would be the letter “P”. Thus $F_p^r(M)$ could be written as $P_p^r(M)$ and so forth, which hints that the
frame bundle may be regarded as a principal bundle. This alternative is avoided here because it is preferable
to keep concrete (frame bundle) and abstract (principal bundle) concepts distinct. (See Remark 55.7.3 for
the difference between principal frame bundles and principal fibre bundles.)


55.6.10 REMARK: Omission of special-case definitions and notations for tangent vector frames. 
For brevity, it is assumed that all definitions and notations involving $F^r(M)$ may be substituted with $F(M)$
for the special case that $r = \dim(M)$. In other words, “r” may be omitted when it equals $\dim(M)$. (The
special properties for the case $r = \dim(M)$ are delayed until Section 55.7.)


55.6.11 EXAMPLE: Let $M$ be a $C^1$ manifold. Let $r \in \mathbb{Z}_0^+$ with $r \le n = \dim(M)$. Let $p \in M$ and
$\psi \in \mathrm{atlas}_p(M)$. Let $e^{p,\psi}$ denote the r-tuple of coordinate basis vectors $(e_i^{p,\psi})_{i=1}^r$. Then $e^{p,\psi} \in F_p^r(M)$.
Therefore if $r \in \mathbb{Z}^+$, the map $p \mapsto e^{p,\psi}$ is in the set of local cross-sections of the r-frame
fibration $F^r(M)$.

If $r = 0$, the map $p \mapsto e^{p,\psi} = \emptyset$ is a valid map from $M$ to the total space $F^0(M) = \{\emptyset\}$, and it could be
considered to be a “local cross-section”, but it does not have the full set of properties which are expected
from cross-sections. In particular, such a map is not a right inverse of the projection map because the
projection map is not definable. Such a “cross-section” would be of very limited utility anyway because its
value is constant, always equal to the empty set. So it does not contain useful information about anything.


55.6.12 REMARK: Coordinate charts for tangent vector-frame fibrations and bundles. 

Definition 55.6.14 for the coordinatisation of the total space of a tangent vector-frame fibration or bundle is 
expressed in terms of the charts $\Phi(\psi)$ in Notation 54.5.7 for tangent fibrations. Then the coordinate tuple
for an r-frame of tangent vectors is simply the r-tuple of coordinate n-tuples for the individual vectors. 
Therefore under the usual linear space operations of pointwise addition and scalar multiplication, the space 
of tangent vector r-tuples at each point has dimension nr. Thus Definition 55.6.14 is essentially identical 
to Definition 55.5.13, and Notation 55.6.15 is the same as Notation 55.5.14 except that it is restricted to 
linearly independent vector tuples, which implies that $r \le n$.


For convenience of reference, Notation 55.6.13 introduces a temporary notation for the set of all linearly
independent r-tuples of vectors in $\mathbb{R}^n$.

55.6.13 NOTATION: Temporary notation for tuples of linearly independent vectors.
$F_{n,r}$, for $n, r \in \mathbb{Z}_0^+$, denotes the set of linearly independent r-tuples of vectors in $\mathbb{R}^n$. In other words,

  $\forall n, r \in \mathbb{Z}_0^+,\quad F_{n,r} = \{w \in (\mathbb{R}^n)^r;\ w = (w_j)_{j=1}^r \text{ is a linearly independent r-tuple of vectors in } \mathbb{R}^n\}.$
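
Membership of the set of linearly independent r-tuples in Notation 55.6.13 can be tested mechanically by computing a rank. The following Python sketch is one possible test, not a construction used in this book: it assumes rational (or integer) components so that exact arithmetic is available, and the names rank and in_F are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a tuple of vectors, by exact Gaussian elimination on the rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def in_F(w, n):
    """True if w is a linearly independent r-tuple of vectors in R^n, with r = len(w)."""
    if any(len(v) != n for v in w):
        raise ValueError("every vector must have n components")
    return rank(w) == len(w)

print(in_F(((1, 0, 0), (0, 1, 0)), 3))   # two independent vectors in R^3
print(in_F(((1, 2), (2, 4)), 2))         # second vector is twice the first
print(in_F(((1, 0), (0, 1), (1, 1)), 2)) # r > n can never be independent
print(in_F((), 2))                       # the empty tuple is the only 0-frame
```

The last two calls mirror the observations that the frame space is empty when r exceeds the dimension, and that the 0-frame space is a singleton containing only the empty tuple.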


55.6.14 DEFINITION: The (tangent) (vector) r-frame coordinate tuple of a tangent vector r-frame $(V_j)_{j=1}^r$,
for $r \in \mathbb{Z}_0^+$, in the tangent vector r-frame space $F_p^r(M)$ at a point $p$ in a $C^1$ manifold $M$ with respect to a
chart $\psi \in \mathrm{atlas}_p(M)$ is the tuple $(\Phi(\psi)(V_j))_{j=1}^r = ((\Phi(\psi)(V_j)^i)_{i=1}^n)_{j=1}^r \in F_{n,r}$, where $n = \dim(M)$.


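55.6.15 NOTATION: $t^r_{p,w,\psi}$, for $r \in \mathbb{Z}_0^+$, $p \in M$, $w \in F_{n,r}$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^1$ manifold $M$
with $n = \dim(M) \in \mathbb{Z}_0^+$, denotes the tangent vector r-frame with linearly independent coordinate tuple
$w = (w_j)_{j=1}^r \in F_{n,r}$ with respect to $\psi$ at $p$. In other words,

  $\forall p \in M,\ \forall w \in F_{n,r},\ \forall \psi \in \mathrm{atlas}_p(M),\quad t^r_{p,w,\psi} = (t_{p,w_j,\psi})_{j=1}^r.$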


55.6.16 REMARK: Coordinate charts for vector-frame fibration total spaces. 

Definition 55.6.17 is the vector-frame version of the vector-tuple velocity chart in Definition 55.5.24. The
velocity chart on the vector-frame fibration in Definition 55.6.17 maps tangent vector-frames in $F^r(M)$ to
the corresponding r-tuples of coordinate vectors in $\mathbb{R}^n$ with respect to a given chart for $M$.


In the same way that Definition 55.5.24 is split into the $r \in \mathbb{Z}^+$ and $r = 0$ cases, Definition 55.6.17 is also
split into these two cases.

Notation 55.6.18 for vector-frames is the same as Notation 55.5.25 for vector-tuples because the concepts are
identical apart from the slightly restricted domains of the velocity charts.


55.6.17 DEFINITION: The velocity chart on the (tangent) (vector) r-frame fibration for a $C^1$ manifold $M$
and $r \in \mathbb{Z}^+$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^r)^{-1}(\mathrm{Dom}(\psi)) = \bigcup_{p \in \mathrm{Dom}(\psi)} F_p^r(M)$
to $F_{n,r}$ defined by $t^r_{p,w,\psi} \mapsto w$ for $w \in F_{n,r}$, where $n = \dim(M)$.


The velocity chart map for the (tangent) (vector) r-frame fibration of a $C^1$ manifold $M$, for $r \in \mathbb{Z}^+$, is the
map from $\mathrm{atlas}(M)$ to $F^r(M) \mathrel{\mathring{\to}} F_{n,r}$ which maps each chart in $\mathrm{atlas}(M)$ to its corresponding velocity chart
on $F^r(M)$.

The velocity chart on the (tangent) (vector) 0-frame fibration for a $C^1$ manifold $M$, corresponding to a
chart $\psi \in \mathrm{atlas}(M)$, is the map from $F^0(M) = \{\emptyset\}$ to $F_{n,0}$ defined by $t^0_{p,w,\psi} \mapsto w$ for $w \in F_{n,0} = \{\emptyset\}$,
where $n = \dim(M)$. (In other words, $t^0_{p,\emptyset,\psi} \mapsto \emptyset$ for all $p \in M$ and $\psi \in \mathrm{atlas}(M)$.)


The velocity chart map for the (tangent) (vector) 0-frame fibration of a $C^1$ manifold $M$ is the map from
$\mathrm{atlas}(M)$ to $F^0(M) \to F_{n,0}$ which maps each chart in $\mathrm{atlas}(M)$ to its corresponding velocity chart on $F^0(M)$.


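55.6.18 NOTATION: $\Phi^r$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the velocity chart map for $F^r(M)$. In
other words, $\forall \psi \in \mathrm{atlas}(M),\ \mathrm{Dom}(\Phi^r(\psi)) = \bigcup_{p \in \mathrm{Dom}(\psi)} F_p^r(M)$ and

  $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in F_{n,r},$
  $\Phi^r(\psi)(t^r_{p,w,\psi}) = w,$

where $n = \dim(M)$. In other words,

  $\forall V \in F^r(M),\ \forall \psi \in \mathrm{atlas}_{\pi^r(V)}(M),\quad \Phi^r(\psi)(V) = (\Phi(\psi)(V_j))_{j=1}^r$
  $\qquad = ((\Phi(\psi)(V_j)^i)_{i=1}^n)_{j=1}^r.$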


55.6.19 THEOREM: Chart transition formula for velocity chart maps for tangent vector frames.
Let $M$ be a $C^1$ manifold. Then the r-frame velocity chart maps $\Phi^r$ in Notation 55.6.18 satisfy

  $\forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall V \in F_p^r(M),\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),$
  $\Phi^r(\psi_2)(V) = (J_{21}(p)\, \Phi(\psi_1)(V_j))_{j=1}^r,$

where $n = \dim(M)$. (See Definition 51.4.18 for the transition matrix $J_{21}(p)$.) In other words,

  $\forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall V \in F_p^r(M),\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),\ \forall j \in \mathbb{N}_r,\ \forall i \in \mathbb{N}_n,$
  $\Phi(\psi_2)(V_j)^i = \sum_{k=1}^n J_{21}(p)^i{}_k\, \Phi(\psi_1)(V_j)^k.$

PROOF: The assertion follows as in Theorem 55.5.27 from Notation 55.6.18 and Theorem 54.5.14. 
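
The transition formula in Theorem 55.6.19 acts on a frame one vector at a time. The following Python sketch illustrates this for a two-dimensional example under assumptions chosen purely for illustration: the first chart is taken to be the identity on $\mathbb{R}^2$, the second chart is the hypothetical map $(x, y) \mapsto (x, y + x^2)$, and the transition matrix is taken to be the Jacobian matrix of the composite chart transition map, computed by hand. The function names are hypothetical.

```python
def transition_matrix(x, y):
    """Jacobian matrix at (x, y) of the chart transition map (x, y) -> (x, y + x*x)."""
    return [[1, 0], [2 * x, 1]]

def matvec(m, v):
    """Matrix-vector product m v."""
    return tuple(sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m)))

def transform_frame(j21, w):
    """Apply the transition matrix to each vector of the coordinate tuple w."""
    return tuple(matvec(j21, wj) for wj in w)

def det2(w):
    """Determinant of a pair of vectors in R^2 (non-zero iff linearly independent)."""
    (a, b), (c, d) = w
    return a * d - b * c

p = (3, 0)                       # coordinates of the base point in the first chart
w1 = ((1, 2), (3, -1))           # coordinate tuple of a 2-frame in the first chart
w2 = transform_frame(transition_matrix(*p), w1)
print(w2, det2(w1), det2(w2))
```

Since the transition matrix is invertible, the transformed tuple is again linearly independent, so the coordinate tuple of a frame with respect to the second chart is again a linearly independent tuple.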


55.6.20 REMARK: The specification tuple for a frame bundle. 

Definition 55.6.22 introduces the tangent vector frame bundle structure, which uses the tangent vector frame
fibre atlas in Definition 55.6.21. When a manifold atlas is provided for the total space of this frame bundle, it
can be asserted in Section 64.8 that this is a differentiable fibre bundle. It is assumed in Definition 55.6.22 that
the frame bundle total space $F^r(M)$ has the topology consistent with the velocity charts in Definition 55.6.17
and the projection map $\pi^r$, and that $M$ has the topology consistent with its own atlas. (The topology on
$F^r(M)$ is more conveniently induced by the charts in Definition 55.6.24.) Some of the maps and spaces in
Notation 55.6.18 and Definitions 55.6.17 and 55.6.21 are illustrated in Figure 55.6.1.


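[Figure 55.6.1: diagram. Top row: the velocity chart $\Phi^r(\psi)$ in the fibre atlas maps $(\pi^r)^{-1}(\mathrm{Dom}(\psi)) \subseteq F^r(M)$ to $F_{n,r} \subseteq (\mathbb{R}^n)^r$.
Bottom row: the chart $\psi \in \mathrm{atlas}(M)$ maps $\mathrm{Dom}(\psi) \subseteq M$ to $\mathbb{R}^n$. The rows are linked by the projection $\pi^r$.]
Figure 55.6.1 Velocity chart map $\Phi^r$ for an r-frame fibration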


55.6.21 DEFINITION: The (tangent) (vector) r-frame fibre atlas for a $C^1$ manifold $M$, for $r \in \mathbb{Z}_0^+$, is the
set $A^{F_{n,r}}_{F^r(M)} = \{\Phi^r(\psi);\ \psi \in \mathrm{atlas}(M)\}$ of velocity charts for the tangent vector r-frame fibration for $M$.


55.6.22 DEFINITION: The (tangent) (vector) r-frame topological fibre bundle of a $C^1$ manifold $M$, for
$r \in \mathbb{Z}^+$, is the tuple $(F^r(M), \pi^r, M, A^{F_{n,r}}_{F^r(M)})$, where $n = \dim(M)$ and $A^{F_{n,r}}_{F^r(M)}$ is the tangent vector r-frame
fibre atlas for $M$.


55.6.23 REMARK:  Differentiable manifold charts for the total space of a frame bundle. 

To define the differentiable manifold structure of the total space of a frame bundle, the manifold charts in
Definition 55.6.24 are required. These are associated with the charts for the base space $M$ by the “manifold
chart map” in Definition 55.6.25. This is denoted $\Psi^r$ in Notation 55.6.26, which is (slightly confusingly) the
same as in Notation 55.5.33.


55.6.24 DEFINITION: The (tangent) (vector) r-frame bundle manifold chart for a $C^1$ manifold $M$, for
$r \in \mathbb{Z}^+$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^r)^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^n \times F_{n,r}$ defined by
$t^r_{p,w,\psi} \mapsto (\psi(p), w)$ for every $p \in \mathrm{Dom}(\psi)$ and linearly independent vector-tuple $w \in F_{n,r}$, where $\pi^r$ is the
projection map for the tangent vector r-frame fibration $(F^r(M), \pi^r, M)$.


55.6.25 DEFINITION: The manifold chart map for the (tangent) (vector) r-frame bundle total space $F^r(M)$
of a $C^1$ manifold $M$, for $r \in \mathbb{Z}^+$, is the map from $\mathrm{atlas}(M)$ to $\bigcup_{\psi \in \mathrm{atlas}(M)} \big( (\pi^r)^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n \times F_{n,r} \big)$
which maps each chart in $\mathrm{atlas}(M)$ to the corresponding tangent vector r-frame bundle manifold chart.


55.6.26 NOTATION: Manifold chart map for tangent vector frame bundle total space.

$\Psi^r$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}^+$, denotes the manifold chart map for the tangent vector r-frame bundle
total space $F^r(M)$. In other words, for $\psi \in \mathrm{atlas}(M)$, the map $\Psi^r(\psi) : (\pi^r)^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n \times F_{n,r}$, where
$n = \dim(M)$, is defined by

  $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in F_{n,r},$
  $\Psi^r(\psi)(t^r_{p,w,\psi}) = (\psi(p), w).$

In other words,

  $\forall \psi \in \mathrm{atlas}(M),\quad \Psi^r(\psi) = (\psi \circ \pi^r) \times \Phi^r(\psi),$

where $\Phi^r$ is as in Notation 55.6.18.


55.6.27 REMARK: Manifold atlas for a frame bundle. 
The manifold atlas for a frame bundle is constructed in Definition 55.6.28 from the manifold chart map $\Psi^r$
in Notation 55.6.26.


55.6.28 DEFINITION: The (tangent) (vector) r-frame bundle (total space) manifold atlas for a $C^1$ manifold
$M$, for $r \in \mathbb{Z}^+$, is the manifold atlas $A_{F^r(M)}$ on $F^r(M)$ defined by

  $A_{F^r(M)} = \{\Psi^r(\psi);\ \psi \in \mathrm{atlas}(M)\},$

where $\Psi^r$ is as in Definition 55.6.25 and Notation 55.6.26.


55.6.29 REMARK: Tangent vector frame bundle. 

The atlas $A_{F^r(M)}$ in Definition 55.6.28 may be combined with the set $F^r(M)$ in Notation 55.6.6 to construct
the vector r-frame bundle total space in Definition 55.6.30, which is a $C^k$ manifold if $M$ is a $C^{k+1}$ manifold.
Then in Definition 55.6.31, this is combined with the projection map $\pi^r$ in Definition 55.6.8 and the fibre
atlas $A^{F_{n,r}}_{F^r(M)}$ in Definition 55.6.21 to construct the tangent r-frame bundle.


55.6.30 DEFINITION: The (tangent) (vector) r-frame bundle total space for a $C^1$ manifold $M$, for $r \in \mathbb{Z}^+$,
is the $C^0$ differentiable manifold $F^r(M) \mathrel{-\!\!<} (F^r(M), A_{F^r(M)})$.


55.6.31 DEFINITION: The (tangent) (vector) r-frame bundle of a $C^1$ manifold $M$, for $r \in \mathbb{Z}^+$, is the tuple
$(F^r(M), \pi^r, M, A^{F_{n,r}}_{F^r(M)}) \mathrel{-\!\!<} (F^r(M), A_{F^r(M)}, \pi^r, M, A_M, A^{F_{n,r}}_{F^r(M)})$ as follows.
(i) $A_M = \mathrm{atlas}(M)$.
(ii) $A_{F^r(M)}$ is the tangent r-frame bundle manifold atlas for $M$ as in Definition 55.6.28.
(iii) $A^{F_{n,r}}_{F^r(M)} = \{\Phi^r(\psi);\ \psi \in \mathrm{atlas}(M)\}$ is the tangent vector r-frame fibre atlas for $M$ as in Definition 55.6.21.


55.6.32 REMARK: Tangent vector-frame bundles are associated topological fibre bundles. 

As in the case of tangent vector-tuple bundles $T^r(M)$ mentioned in Remark 55.5.38, the tangent vector-frame
bundles $F^r(M)$ in Definition 55.6.31 satisfy the requirements for a topological $(G, F)$ fibre bundle in
Definition 47.6.5 with $G = GL(n)$ and $F = F_{n,r}$. It is also straightforward to show that the tangent vector-frame
bundles $F^r(M)$ are associated with the basic tangent bundle $T(M)$. (This association is illustrated
in Figure 55.6.2.)


Figure 55.6.2 Association of tangent vector and vector-frame bundles 
55.6.33 REMARK: Generalisation of vector frames from tangent bundles to vector bundles.

The vector frames in Definition 55.6.3 may be generalised without difficulty to the general differentiable 
vector bundles in Definition 65.1.3 whose fibre spaces are finite-dimensional linear spaces. Such fibre spaces 
have well-defined bases which may be chosen at each point of the base-point manifold via fibre charts. 
Associated principal bundles for such vector bundles may then be defined as bundles of vector frames. 


55.7. Principal frame bundles 


55.7.1 REMARK: Applications of principal frame bundles. 
The concepts in Section 55.7 are referenced in Sections 57.11, 65.7, 69.11 and 71.14. 


55.7.2 REMARK: Tangent space basis bundles. 

The frame bundle $F(M) = F^n(M)$ for an n-dimensional $C^1$ manifold $M$ could be referred to as the “tangent
space basis bundle” of $M$ because that is what it is. However, tangent space basis bundles are customarily
referred to as “frame bundles” or somewhat ambiguously as “principal bundles”.


55.7.3 REMARK: Frame bundles versus principal bundles. 

The most important property of the n-frame bundle $F^n(M)$ for an n-dimensional $C^1$ manifold $M$ which
distinguishes it from r-frame bundles $F^r(M)$ with $r \ne n$ is the fact that it can be regarded as a principal
bundle. However, this is not automatic from Definition 55.6.31 because the fibre space $F_{n,n}$ is not the same
as the group $GL(n)$ of invertible $n \times n$ matrices. A tuple of tuples is not the same as a matrix. This is not
a purely pedantic issue.


There are two obvious candidates for an identification map $\beta : F_{n,n} \to GL(n)$ between tuple-tuples and
matrices, namely $\beta_1 : (w_j)_{j=1}^n \mapsto [(w_j)^i]_{i,j=1}^n$ and $\beta_2 : (w_j)_{j=1}^n \mapsto [(w_i)^j]_{i,j=1}^n$. It turns out that $\beta_1$ is the
correct choice because Definition 47.8.3 for a topological principal bundle requires the structure group to act
on the fibre space from the left. This requirement originates in Definition 47.6.5 (v), which relates fibre charts
$\phi_1$ and $\phi_2$ via the formula $\phi_2(z) = \mu(g_{\phi_2,\phi_1}(\pi(z)), \phi_1(z))$, where $\mu$ is the action of an effective topological
left transformation group.
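
The difference between the two candidate identification maps can be seen in a small computation. The following Python sketch is an illustration only, with hypothetical function names beta1 and beta2 and an arbitrarily chosen invertible matrix g standing in for a chart transition matrix. It assumes that g acts on each frame vector separately, as in Theorem 55.6.19, and compares the result with matrix multiplication.

```python
def beta1(w):
    """Frame vectors become the columns: entry (i, j) is component i of vector j."""
    n = len(w)
    return [[w[j][i] for j in range(n)] for i in range(n)]

def beta2(w):
    """Frame vectors become the rows: entry (i, j) is component j of vector i."""
    return [list(v) for v in w]

def matvec(m, v):
    """Matrix-vector product m v."""
    return tuple(sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m)))

def matmul(a, b):
    """Matrix product a b of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

g = [[2, 1], [0, 1]]                          # invertible matrix (determinant 2)
w = ((1, 3), (4, 5))                          # linearly independent pair in R^2
gw = tuple(matvec(g, wj) for wj in w)         # g applied to each frame vector

left_action_ok_1 = beta1(gw) == matmul(g, beta1(w))
left_action_ok_2 = beta2(gw) == matmul(g, beta2(w))
right_action_2 = beta2(gw) == matmul(beta2(w), transpose(g))
print(left_action_ok_1, left_action_ok_2, right_action_2)
```

With the column convention, acting on each frame vector is the same as left multiplication of the matrix. With the row convention, the same operation becomes right multiplication by the transpose, which is not a left action of the group on itself by multiplication.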

The identification map $\beta : F_{n,n} \to GL(n)$ defined by $\beta : w = (w_j)_{j=1}^n \mapsto [(w_j)^i]_{i,j=1}^n$ is used for the
conversion of an n-frame bundle to a principal $GL(n)$ bundle. The result of this conversion is referred to in
Definition 55.7.8 as the “principal frame bundle” (of the tangent bundle), where the word “frame” indicates
that it is a particular concrete principal bundle which is associated with the tangent bundle $T(M)$. (Note
that vector bundles other than the tangent vector bundle $T(M)$ could also have “principal frame bundles”,
but that is not directly relevant here.)


Some authors equate the concepts of frame bundles and principal bundles. The frame bundle containing n 
vectors in each vector-tuple on an n-dimensional differentiable manifold does in fact satisfy the requirements 
for a principal fibre bundle for the general linear group GL(n). However, any other GL(n) principal bundle 
associated with the tangent bundle will satisfy the same principal bundle requirements. Since they are all 
isomorphic, there is no great harm in calling the n-vector frame bundle the principal bundle with structure 
group GL(n) for the tangent bundle. Certainly the tangent-vector frame bundle is the original model upon 
which all definitions of principal bundles are based. 


On the other hand, a principal bundle could have a very general kind of structure group, not necessarily 
the general linear group $GL(n, \mathbb{R})$ for the coordinate space $\mathbb{R}^n$ for the tangent bundle. For example, the
structure group could be a complex group such as $SU(m) = SU(m, \mathbb{C})$ for some $m \in \mathbb{Z}^+$. So it would be
confusing to refer to the very specific tangent vector n-frame bundle as the principal bundle. Therefore this 
kind of ambiguous terminology is avoided here. 


55.7.4 REMARK: Technicalities for definition of principal frame bundle for tangent bundle.
Definition 55.7.8 for the principal frame bundle $F(M)$ (for a tangent bundle $T(M)$) requires some minor
preliminary technicalities to define a suitable fibre atlas $A^{GL(n)}_{F(M)}$. This is the same as the fibre atlas $A^{F_{n,n}}_{F^n(M)}$
in Definition 55.6.31 except that the coordinate tuple-tuples have been replaced with square arrays. The
purpose of this replacement fibre atlas is to enable Theorem 55.7.11 to be proved.


The map $\beta : F_{n,n} \to GL(n)$ in Definition 55.7.5 is a bijection because an $n \times n$ matrix is invertible if and
only if its column rank equals $n$ (by Theorem 25.8.12 (ii)), which holds if and only if its column nullity equals
zero (by Theorem 25.6.5 (ii)), which holds if and only if the column vectors are linearly independent (by
Definition 25.6.3).


55.7.5 DEFINITION: The principal velocity chart on the (tangent) (vector) n-frame fibration $F^n(M)$ for a
$C^1$ manifold $M$ with $n = \dim(M)$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^n)^{-1}(\mathrm{Dom}(\psi))$
to $GL(n)$ defined by $t^n_{p,w,\psi} \mapsto \beta(w)$ for $w \in F_{n,n}$, where $\beta : F_{n,n} \to GL(n)$ is defined by

  $\forall w \in F_{n,n},\quad \beta(w) = [(w_j)^i]_{i,j=1}^n.$    (55.7.1)

The principal velocity chart map for the (tangent) (vector) n-frame fibration $F^n(M)$ of a $C^1$ manifold $M$
with $n = \dim(M)$, is the map from $\mathrm{atlas}(M)$ to $F^n(M) \mathrel{\mathring{\to}} GL(n)$ which maps each chart in $\mathrm{atlas}(M)$ to its
corresponding principal velocity chart on $F^n(M)$.


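55.7.6 NOTATION: $\Phi^F$, for a $C^1$ manifold $M$, denotes the principal velocity chart map for $F^n(M)$, where
$n = \dim(M)$. In other words, $\forall \psi \in \mathrm{atlas}(M),\ \mathrm{Dom}(\Phi^F(\psi)) = (\pi^n)^{-1}(\mathrm{Dom}(\psi)) = \bigcup_{p \in \mathrm{Dom}(\psi)} F_p^n(M)$ and

  $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in F_{n,n},$
  $\Phi^F(\psi)(t^n_{p,w,\psi}) = \beta(w).$

In other words,

  $\forall V \in F^n(M),\ \forall \psi \in \mathrm{atlas}_{\pi^n(V)}(M),\quad \Phi^F(\psi)(V) = [\Phi(\psi)(V_j)^i]_{i,j=1}^n,$

where $\Phi$ is the velocity chart map for $T(M)$ in Notation 54.5.7.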

55.7.7 DEFINITION: The (tangent) (vector) principal frame fibre atlas for a $C^1$ manifold $M$ is the set
$A^{GL(n)}_{F(M)} = \{\Phi^F(\psi);\ \psi \in \mathrm{atlas}(M)\}$ of principal velocity charts for $F^n(M)$, where $n = \dim(M)$.


55.7.8 DEFINITION: The (tangent) (vector) principal frame bundle of a $C^1$ manifold $M$ with $n = \dim(M)$
is the tuple $(F(M), \pi^F, M, A^{GL(n)}_{F(M)}) \mathrel{-\!\!<} (F(M), A_{F(M)}, \pi^F, M, A_M, A^{GL(n)}_{F(M)})$ as follows.
(i) $\pi^F : F(M) \to M$ is the same as the projection map $\pi^n : F(M) \to M$ in Definition 55.6.8.
(ii) $A_M = \mathrm{atlas}(M)$.
(iii) $A_{F(M)}$ is the tangent n-frame bundle manifold atlas for $M$ as in Definition 55.6.28.
(iv) $A^{GL(n)}_{F(M)} = \{\Phi^F(\psi);\ \psi \in \mathrm{atlas}(M)\}$ is the principal frame fibre atlas for $M$ as in Definition 55.7.7.

55.7.9 REMARK: The relation of principal frame bundle fibre charts to coordinate bases.
As mentioned in Example 55.6.11, the coordinate basis $e^{p,\psi} = (e_i^{p,\psi})_{i=1}^n$ is an element of the n-frame fibre
set $F_p^n(M)$ for all $p \in \mathrm{Dom}(\psi)$, for all $\psi \in \mathrm{atlas}(M)$. Then one naturally asks whether there is a relation
between an arbitrary n-frame $V \in F_p^n(M)$, coordinate bases, and the fibre chart value $\Phi^F(\psi)(V)$. It
follows from Notation 54.5.7 and Theorem 54.4.11 that $V_j = \sum_{i=1}^n \Phi(\psi)(V_j)^i\, e_i^{p,\psi}$ for all $j \in \mathbb{N}_n$. Therefore
$V_j = \sum_{i=1}^n e_i^{p,\psi}\, \Phi^F(\psi)(V)^i{}_j = (e^{p,\psi}\, \Phi^F(\psi)(V))_j$ for all $j \in \mathbb{N}_n$, where the juxtaposition “$e^{p,\psi}\, \Phi^F(\psi)(V)$”
should be interpreted as multiplication of the n-tuple $e^{p,\psi}$ by the matrix $\Phi^F(\psi)(V)$ on the right. In this way,
one may write $V = e^{p,\psi}\, \Phi^F(\psi)(V)$. In other words, the columns of the matrix $\Phi^F(\psi)(V)$ are, unsurprisingly
given their method of construction, the linear combinations of the coordinate basis vectors which yield the
vectors of $V$. More significantly, this is a kind of “right action” by elements of the structure group on elements
of the principal frame bundle total space. Let $\phi = \Phi^F(\psi)$. Then $V = R_{\phi(V)}\, e^{p,\psi}$ for all $V \in \mathrm{Dom}(\phi)$.
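
The right multiplication of a basis tuple by a matrix in Remark 55.7.9 can be made concrete as follows. This Python sketch is an illustration under stated assumptions: the basis vectors are represented by their components in some fixed reference coordinates (an arbitrary choice for this example), and the function names frame_times_matrix and matmul are hypothetical.

```python
def frame_times_matrix(e, a):
    """Right-multiply an n-tuple of vectors e by an n x n matrix a.

    The j-th vector of the result is the sum over i of e[i] * a[i][j].
    """
    n = len(e)
    dim = len(e[0])
    return tuple(
        tuple(sum(e[i][c] * a[i][j] for i in range(n)) for c in range(dim))
        for j in range(n)
    )

def matmul(a, b):
    """Matrix product a b of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

e = ((1, 1), (0, 2))          # a basis of R^2, standing in for a coordinate basis
a = [[2, 1], [3, -1]]         # invertible matrix whose columns are coordinate tuples
V = frame_times_matrix(e, a)  # V_1 = 2 e_1 + 3 e_2 and V_2 = e_1 - e_2

b = [[1, 2], [0, 1]]
lhs = frame_times_matrix(frame_times_matrix(e, a), b)   # (e a) b
rhs = frame_times_matrix(e, matmul(a, b))                # e (a b)
print(V, lhs == rhs)
```

The equality of the last two expressions is the compatibility condition for a right action: multiplying a frame on the right by two matrices in succession gives the same frame as multiplying once by their product.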


55.7.10 REMARK: Frame bundles are topological principal bundles. 

Theorem 55.7.11 shows only that a principal frame bundle is a topological principal bundle. The slightly
informal proof of Theorem 55.7.11 follows the pattern of the proof of the corresponding Theorem 54.5.33 for
tangent vector bundles and Theorem 55.4.15 for tangent covector bundles.


Most of the proof of Theorem 55.7.11 is a simple (but onerous) matter of “ticking boxes” for the conditions
of Definition 47.6.5. The most important part is observing on line (55.7.5) that the left action of the
matrix $g = J_{21}(p)$ on individual frame-vectors $V_j$ is equal to the left action of the matrix on the entire
matrix $\Phi^F(\psi_1)(V) = [\Phi(\psi_1)(V_j)^k]_{k,j=1}^n$ of coordinates of the vector-frame $V$. Thus this coordinate matrix
participates correctly in the algebra of the structure group $GL(n)$. Hence $F(M)$ is a principal bundle with
this style of matrix-valued fibre chart.
55.7.11 THEOREM: The principal frame bundle of a $C^1$ manifold is a topological principal bundle.
Let $M$ be a $C^1$ manifold. Then the principal frame bundle $F(M)$ for $M$ is a topological principal bundle
with structure group $GL(n)$, where $n = \dim(M)$.


PROOF: To show that $F(M)$ in Definition 55.7.8 is a topological $(G, G)$ fibre bundle with $G = GL(n)$
according to Definition 47.6.5, first note that $(G, G)$ is an effective topological left transformation group
by Theorem 36.10.11. The pair $(M, T_M)$ is a topological space, where $T_M$ is the underlying topology of
the $C^1$ manifold $(M, A_M)$ according to Definition 51.3.14. Similarly, $(F(M), T_{F(M)})$ is a topological space,
where $T_{F(M)}$ is the topology underlying the $C^0$ manifold $(F(M), A_{F(M)})$. (For the analogous derivation
for $T(M)$, see Theorem 54.5.28.) The projection map $\pi^F : F(M) \to M$ in Definition 55.6.8 is continuous by
Theorem 32.10.7 (i) because, via the charts in Definition 55.6.28, it is a projection map from a direct product
to one of the coordinates of the direct product. Thus Definition 47.6.5 (i) is satisfied because $\pi^F(F(M)) = M$
by Definition 55.6.8.

Definition 47.6.5 (ii) is satisfied because the charts $\Phi^F(\psi)$ in Notation 55.7.6 are continuous. This follows
from the fact that the topology on $F(M)$ is defined by the charts in $A_{F(M)}$, and the map $\beta$ in Definition 55.7.5
is a homeomorphism with respect to the standard topologies on $F_{n,n}$ and $GL(n)$.

Definition 47.6.5 (iii) follows from the observation that the domains of the fibre charts for $F(M)$ cover all of
$M$ because they are the same as the domains for the charts for $M$.


For Definition 47.6.5 (iv), it must be shown that $\pi^F \times \Phi^F(\psi) : (\pi^F)^{-1}(U_\psi) \to U_\psi \times G$ is a homeomorphism
with $U_\psi = \pi^F(\mathrm{Dom}(\Phi^F(\psi)))$ for all $\psi \in \mathrm{atlas}(M)$. This follows from the observation that the differentiable
structure on $(\pi^F)^{-1}(U_\psi)$, and therefore also the topology on $(\pi^F)^{-1}(U_\psi)$, is induced by the charts in the
manifold atlas $A_{F^n(M)}$ in Definition 55.6.28.


For Definition 47.6.5 (v), it must be shown that any two fibre charts $\phi_1$ and $\phi_2$ on a fibre set $(\pi^F)^{-1}(\{p\})$ of
$F(M)$ are related to each other on that fibre set by a group element $g_{\phi_2,\phi_1}(p) \in G$, and that this group element
is a continuous function of $p \in M$. Let $g = J_{21}(p)$ as in Definition 51.4.18 for charts $\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$,
where $\phi_1 = \Phi^F(\psi_1)$ and $\phi_2 = \Phi^F(\psi_2)$. Let $V \in (\pi^F)^{-1}(\{p\}) = F_p^n(M)$. Then $V = (V_j)_{j=1}^n$ for some linearly
independent vectors $V_j \in T_p(M)$. So $\Phi(\psi_2)(V_j) = g\, \Phi(\psi_1)(V_j)$ for $j \in \mathbb{N}_n$. Let $\mu : G \times G \to G$ denote the
action map of the left transformation group $G$ acting on itself. Then

  $\Phi^F(\psi_2)(V) = [\Phi(\psi_2)(V_j)^i]_{i,j=1}^n$    (55.7.2)
  $\qquad = [(g\, \Phi(\psi_1)(V_j))^i]_{i,j=1}^n$    (55.7.3)
  $\qquad = [\sum_{k=1}^n g^i{}_k\, \Phi(\psi_1)(V_j)^k]_{i,j=1}^n$    (55.7.4)
  $\qquad = [g^i{}_k]_{i,k=1}^n\, [\Phi(\psi_1)(V_j)^k]_{k,j=1}^n$    (55.7.5)
  $\qquad = g\, \Phi^F(\psi_1)(V)$
  $\qquad = \mu(g, \Phi^F(\psi_1)(V)),$

where line (55.7.2) follows from Notation 55.7.6, line (55.7.3) follows from Theorem 54.5.14, line (55.7.4)
follows from Definitions 25.2.18 and 25.3.7, and line (55.7.5) follows from Definition 25.3.7. But $g = J_{21}(p)$
is continuous on $\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)$ by Definition 51.4.18 and Definition 42.5.11 for a $C^1$ function.
Let $g_{\phi_2,\phi_1}(p) = J_{21}(p)$ for $p \in \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)$. Then $g_{\phi_2,\phi_1} \in C^0(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2), GL(n))$. So
the principal frame bundle $F(M)$ satisfies Definition 47.6.5 (v). Therefore $F(M)$ is a topological $(G, G)$
fibre bundle by Definition 47.6.5. Hence $F(M)$ is a topological principal fibre bundle with structure group
$G = GL(n)$ by Definition 47.8.3.


55.7.12 REMARK: Moving frames. 

The theory of connections on fibre bundles can be formulated in terms of “moving frames” (or “repères
mobiles” in French), which were invented by Gaston Darboux according to Darling [8], page 76, but are
generally associated with the name of Élie Cartan. (See for example Spivak [37], Volume 2, pages 259-304;
Sternberg [38], pages 161-187; Darling [8], pages 76-97; Darboux [208], pages 66-73.)

The fundamental idea of the moving frame is to replace holonomic (or “natural”) coordinate frames with
general frames. Thus a moving frame is a local section of the frame bundle $F(M)$. (See Section 57.11 for
tangent vector frame fields.) Then coordinates of vectors and tensors are computed relative to such a frame
rather than with respect to frames which are directly derived from a local chart. A frame bundle cross-section
is essentially the same thing as an n-tuple of vector fields which are everywhere linearly independent. The
moving frame formalism is particularly popular in theoretical physics.


To define a connection for the Cartan frame-field formalism, an element of the Lie algebra of the structure
group must be associated with each point in the domain of a local frame field. This requires the frames at
each point to be associated with elements of the structure group $GL(n)$ via a fibre chart, not merely elements
of the tuple-tuple space $F_{n,n}$. Therefore the principal frame bundle fibre atlas in Definition 55.7.8 (iv) is
required, not just the n-frame fibre atlas in Definition 55.6.31 (iii).


[ www. geometry. org/dg. html] [draft: UTC 2023-1-3 Tuesday 00:13] 


Chapter 56 


TENSOR BUNDLES AND MULTILINEAR MAP BUNDLES 
56.1. Tangent tensor spaces


56.1.1 REMARK: The Über-plethora of representations of tensor spaces on differentiable manifolds. 

It was mentioned in Remark 55.1.5 that the plethora of representations for tangent covectors on differentiable 
manifolds is even greater than the plethora of primal tangent vector representations. These considerations 
apply also to spaces of general tangent tensors, but to an even greater extent. Covariant tensors are formed 
by constructing spaces of multilinear functions from a basic linear space. Mixed tensors are formed from 
the basic linear space by a wide variety of constructions, including the linear duals of spaces of multilinear 
functions, but also including spaces of linear maps between tensor spaces. (See Sections 29.3 and 29.5 for 
mixed tensor space constructions.)


In the case of mixed tensor spaces on differentiable manifolds, the toolkit for constructing representations 
includes true dual spaces, dual action map spaces, chart-tagging Cartesian space tensor spaces, and tensor 
algebra constructions on the manifold itself. One may construct multilinear maps on the manifold's tangent 
space, on the Cartesian tangent space or on any true dual or dual action map space. One may then construct 
the dual of the space of multilinear maps in multiple ways. Then one may construct combinations of the 
covariant and contravariant parts of the tensor space according to the methods described in Remark 29.3.1. 
When one applies symmetry or antisymmetry at any stage, the picture becomes even more complicated. Yet 
another level of complexity is implied by the generalisation from the basic tangent bundle to other kinds of 
differentiable fibre bundles. 

Thus the plethora of representations for tangent tensor spaces on differentiable manifolds far exceeds the
corresponding plethora for tangent covector spaces (discussed in Remarks 55.1.3 and 55.1.4), which in turn 
considerably exceeds the plethora for tangent spaces. 


In view of the super-abundance of tools and building blocks available for constructing tensor spaces, it is 
clearly desirable to restrict one's attention to a small collection of representations so as to avoid confusion. 
As mentioned in Remark 55.1.3, the two main criteria for choosing representations are the ability to “point
to” individual tensors and the intuitive “correctness” and extensibility of the set-constructions.


The popular approach which says that all tensor spaces which transform the same are the same (or at least
may be “identified”) leads directly down the path to the much-maligned “coordinates” or tensor calculus.
Many tensor spaces transform in the same way, but have entirely different functionality. The classes of tensor 
objects are important, not just their numerical parameters. 


In Section 56.1, only a small selection of the range of possible tensor spaces is defined. Suitable tensor spaces 
may be tailored to each application. It is better to learn the techniques for constructing tensorial spaces 
rather than trying to make a comprehensive list, which would probably be almost useless, assuming that it
could be done at all. 


56.1.2 REMARK: Two styles of general mixed tensor spaces. 

The multilinear-map style of tensor space Q”? T,(M) = Y,s(Tp(M)) in Definition 56.1.3 is defined in 
Notation 29.5.15, whereas the linear-map style of tensor space 69^ T, (M) in Definition 56.1.4 is defined in 
Definition 29.5.4 and Notation 29.5.5. These two styles of mixed tensors are different set-constructions, but
they have the same transformation rules. 

The multilinear-map tensor space Q”? T,(M) = .Z,..(T,(M)) and the linear-map tensor space C9" ^ T (M) 
in Definitions 56.1.3 and 56.1.4 are as follows. (See Definitions 29.3.8 and 29.3.3 for these.) 


@"" T,(M) = Z(T0M)*) , T, (M); R) = £s (T, (M)) 
Q9 Tp(M) = Lin(Z,. (T; (M); IR), £,(T,(M); R)). 


56.1.3 DEFINITION: A (multilinear-map) tensor of type $(r,s)$ at a point $p$ in a $C^1$ manifold $M$ is an element
of the mixed multilinear map space $\mathscr{L}_{r,s}(T_p(M))$.


56.1.4 DEFINITION: A (linear-map) tensor of type $(r,s)$ at a point $p$ in a $C^1$ manifold $M$ is an element of
the mixed tensor space $\mathrm{Lin}(\mathscr{L}_r(T_p(M); \mathbb{R}), \mathscr{L}_s(T_p(M); \mathbb{R}))$.


56.1.5 NOTATION: The set of multilinear-map style tensors at a point in a manifold. 
T5*(M), for a point pina C 1 manifold M and r,s € Zf, denotes the set of multilinear-map tensors of type 
(r,s) at p. In other words, T7*(M) = &™° T,(M) = .£-.(T5(M)) = Z((T5(M)")", T5 (M)*; R). 


56.1.6 NOTATION: The set of linear-map style tensors at a point in a manifold. 
T5'*(M), for a point pina C! manifold M and r,s € Zj , denotes the set of linear-map tensors of type (r, s) 
at p. In other words, T7**(M) = Q'" T,(M) = Lin(Z.(T;(M); R), Z4 (T; (M); R)). 


56.1.7 REMARK: Converting tensors to components with respect to a coordinate basis. 

Theorem 56.1.8 shows that the value of a tensor $A \in \mathscr{L}_{r,s}(T_p(M))$ for general arguments can be determined
from its values for arguments which are coordinate basis vectors. Thus the evaluation of a tensor for given 
arguments may be split into two tasks, the first being to evaluate the tensor for basis vectors, the second being 
to determine the components of the arguments with respect to basis vectors. The outputs from these two 
tasks can then be combined by array multiplication as indicated in line (56.1.1). This is the manifold tangent 
bundle version of Theorem 29.6.2 for abstract multilinear-style tensors. Theorem 56.1.9 is the corresponding 
manifold tangent bundle version of Theorem 29.6.3 for abstract linear-style tensors. 


56.1.8 THEOREM: Components for multilinear-style tensor spaces. 
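Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

$\forall r,s \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_{r,s}(T_p(M)),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall (w^k)_{k=1}^r \in (\mathbb{R}^n)^r,\ \forall (v_\ell)_{\ell=1}^s \in (\mathbb{R}^n)^s,$

$$A\big((t^*_{p,w^k,\psi})_{k=1}^r, (t_{p,v_\ell,\psi})_{\ell=1}^s\big) = \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_n^s} A\big((e^{i_k}_{p,\psi})_{k=1}^r, (e^{p,\psi}_{j_\ell})_{\ell=1}^s\big) \prod_{k=1}^r w^k_{i_k} \prod_{\ell=1}^s v_\ell^{j_\ell}. \tag{56.1.1}$$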


PROOF: The assertion follows by Definition 56.1.3, Notation 29.5.15, and Theorems 54.4.11 and 55.3.9.
(See also Theorems 29.6.2 and 29.4.4.)
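
The array multiplication in line (56.1.1) can be illustrated numerically for small cases. The following Python sketch is not part of the formal development. It assumes that the values of a tensor on basis covectors and basis vectors are stored in a dictionary `a` keyed by pairs of multi-indices, and that indices run from 0 to n-1 rather than from 1 to n. The function name `evaluate` is a hypothetical choice made here.

```python
from itertools import product
from math import prod

def evaluate(a, n, r, s, ws, vs):
    """Evaluate the double sum on the right of line (56.1.1).

    a  : dict mapping (i, j) to a number, where i is an r-tuple and
         j is an s-tuple of indices in range(n) (0-based: an assumption)
    ws : sequence of r covector component tuples w^k
    vs : sequence of s vector component tuples v_l
    """
    total = 0
    for i in product(range(n), repeat=r):
        for j in product(range(n), repeat=s):
            total += (a[(i, j)]
                      * prod(ws[k][i[k]] for k in range(r))
                      * prod(vs[l][j[l]] for l in range(s)))
    return total

# A type (1,1) tensor on a 2-dimensional tangent space whose component
# array is the identity matrix, so that it pairs a covector with a vector.
n, r, s = 2, 1, 1
a = {((i,), (j,)): 1 if i == j else 0 for i in range(n) for j in range(n)}
w = (3, 5)   # covector components
v = (2, 7)   # vector components
value = evaluate(a, n, r, s, [w], [v])   # 3*2 + 5*7
```

Doubling the vector argument doubles the value, as multilinearity requires. The sketch only exercises the finite sum and says nothing about charts or tangent bundle structure.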


56.1.9 THEOREM: Components for linear-style tensor spaces. 
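Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

$\forall r,s \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathrm{Lin}(\mathscr{L}_r(T_p(M); \mathbb{R}), \mathscr{L}_s(T_p(M); \mathbb{R})),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall \lambda \in \mathscr{L}_r(T_p(M)),\ \forall (v_\ell)_{\ell=1}^s \in (\mathbb{R}^n)^s,$

$$A(\lambda)\big((t_{p,v_\ell,\psi})_{\ell=1}^s\big) = \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_n^s} A\big(\otimes_{k=1}^r e^{i_k}_{p,\psi}\big)\big((e^{p,\psi}_{j_\ell})_{\ell=1}^s\big)\, \lambda\big((e^{p,\psi}_{i_k})_{k=1}^r\big) \prod_{\ell=1}^s v_\ell^{j_\ell}. \tag{56.1.2}$$

(See Notation 29.6.7 for $\otimes_{k=1}^r e^{i_k}_{p,\psi} \in \mathscr{L}_r(T_p(M))$.)

PROOF: The assertion follows by Definition 56.1.4, Notation 29.5.5, and Theorems 54.4.11 and 27.6.13.
(See also Theorems 29.6.3 and 29.4.5.)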


56.1.10 REMARK:  Coordinatisation of tensor bundles. 

Theorems 56.1.8 and 56.1.9 imply that tensor spaces on differentiable manifolds can be coordinatised by
their effect on basis vectors for the tangent vector and covector spaces at each point. In this case, the basis
vectors are the tuples $(e^i_{p,\psi})_{i=1}^n$ and $(e^{p,\psi}_i)_{i=1}^n$ which are constructed from charts $\psi$ on the manifold $M$.


One very substantial advantage of the construction of coordinate charts for tangent tensor spaces is that
they yield coordinate charts for tangent tensor bundles, which can then be used to define $C^k$ differentiability
classes for cross-sections on the multilinear-style and linear-style tensor bundle total spaces in Section 56.3 by means of
Definition 52.1.2 for $C^k$ maps between manifolds. This requires a differentiable structure on tensor bundles.
For every tensor $A \in \mathscr{L}_{r,s}(T_p(M))$, Theorem 56.1.8 shows that the value of $A$ for all parameter sequences
is fully determined by the array of real numbers $a^i{}_j = A((e^{i_k}_{p,\psi})_{k=1}^r, (e^{p,\psi}_{j_\ell})_{\ell=1}^s)$ for indices $(i,j) \in \mathbb{N}_n^r \times \mathbb{N}_n^s$.
Conversely, each such array determines a unique tensor $A \in \mathscr{L}_{r,s}(T_p(M))$. Therefore the set $\mathscr{L}_{r,s}(T_p(M))$ is
equal to the set of tensors $t^{r,s}_{p,a,\psi}$ in Notation 56.1.11 for arrays $a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$, where $\psi \in \mathrm{atlas}_p(M)$.
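
The passage from a tensor to its array of components amounts to sampling the tensor on all tuples of basis elements. The following Python sketch illustrates this for a tensor of type (1,1) on a 2-dimensional space. It is only an illustration under stated assumptions: covectors and vectors are represented directly by their component tuples, indices are 0-based, and the names `unit`, `components` and `A` are hypothetical.

```python
from itertools import product

def unit(n, k):
    """The k-th unit tuple in R^n (0-based index: an assumption)."""
    return tuple(1 if m == k else 0 for m in range(n))

def components(A, n, r, s):
    """Sample a multilinear function A on all tuples of basis elements."""
    return {
        (i, j): A([unit(n, ik) for ik in i], [unit(n, jl) for jl in j])
        for i in product(range(n), repeat=r)
        for j in product(range(n), repeat=s)
    }

# Hypothetical type (1,1) tensor on R^2, given as a function of a list
# containing one covector w and a list containing one vector v.
def A(ws, vs):
    (w,), (v,) = ws, vs
    return 2 * w[0] * v[0] - w[0] * v[1] + 4 * w[1] * v[1]

a = components(A, 2, 1, 1)
```

The resulting dictionary `a` has one entry for each pair of multi-indices, which is the finite data from which the tensor can be rebuilt by the double sum in line (56.1.1).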


56.1.11 NOTATION: Particular multilinear-style tensors, specified by coordinates. 
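$t^{r,s}_{p,a,\psi}$, for $r,s \in \mathbb{Z}_0^+$, $p \in M$, $a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^1$ manifold $M$ with $n = \dim(M)$,
denotes the multilinear-style tensor in $\mathscr{L}_{r,s}(T_p(M))$ defined by

$\forall (w^k)_{k=1}^r \in (\mathbb{R}^n)^r,\ \forall (v_\ell)_{\ell=1}^s \in (\mathbb{R}^n)^s,$

$$t^{r,s}_{p,a,\psi}\big((t^*_{p,w^k,\psi})_{k=1}^r, (t_{p,v_\ell,\psi})_{\ell=1}^s\big) = \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_n^s} a^i{}_j \prod_{k=1}^r w^k_{i_k} \prod_{\ell=1}^s v_\ell^{j_\ell}.$$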


56.1.12 NOTATION: Particular linear-style tensors, specified by coordinates. 
tayp for r,s € Zi, p€ M,a:N7Z, x N$ > Rand v € atlas; (M), for a C! manifold M with n = dim(M), 
denotes the linear-style tensor in C9" ^ T, (M) defined by 


VA € Z(Ty(M)), (vota € (R°), v 
GO (os in) = E Ys ats (Yen) IE 


i€N7 jENs 


56.1.13 REMARK: Coordinate transformation rules for general tensors. 

The transformation rules for tensors under coordinate chart transitions are straightforward, but they are 
tedious to write out in general. They follow the same pattern as Theorem 29.6.4 for abstract tensor spaces. 
(The proof of Theorem 29.4.7 for mixed-component-space abstract tensors has similarities to the proof of 
Theorem 56.1.14.) 


56.1.14 THEOREM: Tensor coordinate transformation rule for base-space chart transitions. 
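Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $r,s \in \mathbb{Z}_0^+$. Let $p \in M$ and $\psi, \tilde\psi \in \mathrm{atlas}_p(M)$. Let
$a, \tilde a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$. Let $J = (J^i{}_j)_{i,j=1}^n \in \mathbb{R}^{n\times n}$ denote the chart transition matrix from $\psi$ to $\tilde\psi$, given by

$$\forall i,j \in \mathbb{N}_n,\quad J^i{}_j = \frac{\partial}{\partial x^j}\big(\tilde\psi\circ\psi^{-1}(x)\big)^i\Big|_{x=\psi(p)},$$

and let $\tilde J$ denote the inverse matrix of $J$. (See Definition 51.4.18 for chart transition matrices.)

(i) $t^{r,s}_{p,\tilde a,\tilde\psi} = t^{r,s}_{p,a,\psi}$ if and only if
$$\forall i \in \mathbb{N}_n^r,\ \forall j \in \mathbb{N}_n^s,\quad \tilde a^i{}_j = \sum_{i'\in\mathbb{N}_n^r} \sum_{j'\in\mathbb{N}_n^s} \Big(\prod_{k=1}^r J^{i_k}{}_{i'_k}\Big)\Big(\prod_{\ell=1}^s \tilde J^{j'_\ell}{}_{j_\ell}\Big)\, a^{i'}{}_{j'}. \tag{56.1.3}$$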
(ii) they = (7*5 z if and only if 


Vi € N^, Vj € NS, &j- Y. Y ILS 
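PROOF: For part (i), it follows from the definition of $\mathscr{L}_{r,s}(T_p(M))$ that $t^{r,s}_{p,\tilde a,\tilde\psi} = t^{r,s}_{p,a,\psi}$ if and only if

$\forall (w^k)_{k=1}^r \in (\mathbb{R}^n)^r,\ \forall (v_\ell)_{\ell=1}^s \in (\mathbb{R}^n)^s,$
$$t^{r,s}_{p,\tilde a,\tilde\psi}\big((t^*_{p,w^k,\psi})_{k=1}^r, (t_{p,v_\ell,\psi})_{\ell=1}^s\big) = t^{r,s}_{p,a,\psi}\big((t^*_{p,w^k,\psi})_{k=1}^r, (t_{p,v_\ell,\psi})_{\ell=1}^s\big).$$

By Theorem 55.2.17, $(t^*_{p,w^k,\psi})_{k=1}^r = (t^*_{p,\tilde w^k,\tilde\psi})_{k=1}^r$ if and only if $\forall k \in \mathbb{N}_r$, $\forall i \in \mathbb{N}_n$, $w^k_i = \sum_{j=1}^n \tilde w^k_j J^j{}_i$, and
by Theorem 54.1.11, $(t_{p,v_\ell,\psi})_{\ell=1}^s = (t_{p,\tilde v_\ell,\tilde\psi})_{\ell=1}^s$ if and only if $\forall \ell \in \mathbb{N}_s$, $\forall i \in \mathbb{N}_n$, $\tilde v_\ell^i = \sum_{j=1}^n J^i{}_j v_\ell^j$, or equivalently $v_\ell^i = \sum_{j=1}^n \tilde J^i{}_j \tilde v_\ell^j$. Therefore
by Notation 56.1.11, regarding $w$ and $v$ as functions of $\tilde w$ and $\tilde v$ respectively, $t^{r,s}_{p,\tilde a,\tilde\psi} = t^{r,s}_{p,a,\psi}$ if and only if

$$\begin{aligned}
\sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_n^s} \tilde a^i{}_j \prod_{k=1}^r \tilde w^k_{i_k} \prod_{\ell=1}^s \tilde v_\ell^{j_\ell}
&= \sum_{i'\in\mathbb{N}_n^r} \sum_{j'\in\mathbb{N}_n^s} a^{i'}{}_{j'} \prod_{k=1}^r w^k_{i'_k} \prod_{\ell=1}^s v_\ell^{j'_\ell} \\
&= \sum_{i'\in\mathbb{N}_n^r} \sum_{j'\in\mathbb{N}_n^s} a^{i'}{}_{j'} \Big(\prod_{k=1}^r \sum_{i_k=1}^n \tilde w^k_{i_k} J^{i_k}{}_{i'_k}\Big)\Big(\prod_{\ell=1}^s \sum_{j_\ell=1}^n \tilde J^{j'_\ell}{}_{j_\ell} \tilde v_\ell^{j_\ell}\Big) \\
&= \sum_{i'\in\mathbb{N}_n^r} \sum_{j'\in\mathbb{N}_n^s} a^{i'}{}_{j'} \Big(\sum_{i\in\mathbb{N}_n^r} \prod_{k=1}^r \tilde w^k_{i_k} J^{i_k}{}_{i'_k}\Big)\Big(\sum_{j\in\mathbb{N}_n^s} \prod_{\ell=1}^s \tilde J^{j'_\ell}{}_{j_\ell} \tilde v_\ell^{j_\ell}\Big) \qquad (56.1.4) \\
&= \sum_{i'\in\mathbb{N}_n^r} \sum_{j'\in\mathbb{N}_n^s} a^{i'}{}_{j'} \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_n^s} \Big(\prod_{k=1}^r \tilde w^k_{i_k} J^{i_k}{}_{i'_k}\Big)\Big(\prod_{\ell=1}^s \tilde J^{j'_\ell}{}_{j_\ell} \tilde v_\ell^{j_\ell}\Big) \\
&= \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_n^s} \sum_{i'\in\mathbb{N}_n^r} \sum_{j'\in\mathbb{N}_n^s} a^{i'}{}_{j'} \Big(\prod_{k=1}^r \tilde w^k_{i_k} J^{i_k}{}_{i'_k}\Big)\Big(\prod_{\ell=1}^s \tilde J^{j'_\ell}{}_{j_\ell} \tilde v_\ell^{j_\ell}\Big) \\
&= \sum_{i\in\mathbb{N}_n^r} \sum_{j\in\mathbb{N}_n^s} \Big(\sum_{i'\in\mathbb{N}_n^r} \sum_{j'\in\mathbb{N}_n^s} a^{i'}{}_{j'} \Big(\prod_{k=1}^r J^{i_k}{}_{i'_k}\Big)\Big(\prod_{\ell=1}^s \tilde J^{j'_\ell}{}_{j_\ell}\Big)\Big) \prod_{k=1}^r \tilde w^k_{i_k} \prod_{\ell=1}^s \tilde v_\ell^{j_\ell},
\end{aligned}$$

where line (56.1.4) follows from Theorem 16.7.6. This equality is clearly implied by line (56.1.3). Conversely,
to show that this equality implies line (56.1.3), let $i'' \in \mathbb{N}_n^r$ and $j'' \in \mathbb{N}_n^s$, and let $(\tilde w^k)_{k=1}^r = (e_{i''_k})_{k=1}^r$ and
$(\tilde v_\ell)_{\ell=1}^s = (e_{j''_\ell})_{\ell=1}^s$. (See Definition 22.7.9 for the unit vectors $e_a \in \mathbb{R}^n$ for $a \in \mathbb{N}_n$.) Then $\prod_{k=1}^r \tilde w^k_{i_k} = \delta(\text{“}i = i''\text{”})$
and $\prod_{\ell=1}^s \tilde v_\ell^{j_\ell} = \delta(\text{“}j = j''\text{”})$. (See Notation 14.7.21 for the Kronecker delta pseudo-notation
template.) Then the above equality is the same as line (56.1.3) for each choice
of $i'' \in \mathbb{N}_n^r$ and $j'' \in \mathbb{N}_n^s$. Thus part (i) is verified.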

Part (ii) may be proved in much the same way as part (i), or by applying the isomorphism in Theorem 29.3.11 
between linear-style and multilinear-style tensor spaces. 
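
The transformation rule in Theorem 56.1.14 (i) can be checked numerically in the simplest mixed case r = s = 1. The following Python sketch is an illustration only, not a proof. It assumes 0-based indices, a hypothetical 2×2 chart transition matrix `J` with exact rational entries, and the convention that vector components transform with `J` while covector components transform with its inverse `Jinv`.

```python
from fractions import Fraction as F

n = 2
J = [[F(2), F(1)],
     [F(1), F(1)]]            # hypothetical chart transition matrix, det = 1
Jinv = [[F(1), F(-1)],
        [F(-1), F(2)]]        # its inverse

a = [[F(1), F(2)],
     [F(3), F(4)]]            # a[i][j] stands for a^i_j in the first chart

# Transformation rule for r = s = 1:
#   a_new^i_j = sum over i', j' of J^i_{i'} * Jinv^{j'}_j * a^{i'}_{j'}
a_new = [[sum(J[i][ip] * Jinv[jp][j] * a[ip][jp]
              for ip in range(n) for jp in range(n))
          for j in range(n)] for i in range(n)]

w = [F(3), F(5)]              # covector components in the first chart
v = [F(2), F(7)]              # vector components in the first chart

# v_new^i = sum_j J^i_j v^j   and   w_new_i = sum_j w_j Jinv^j_i
v_new = [sum(J[i][j] * v[j] for j in range(n)) for i in range(n)]
w_new = [sum(w[j] * Jinv[j][i] for j in range(n)) for i in range(n)]

def contract(a, w, v):
    """The double sum of a^i_j w_i v^j over all indices."""
    return sum(a[i][j] * w[i] * v[j] for i in range(n) for j in range(n))

old_value = contract(a, w, v)
new_value = contract(a_new, w_new, v_new)
```

The two values agree, which is the numerical content of the statement that both component arrays describe the same tensor. Exact fractions are used so that the comparison does not depend on floating-point rounding.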


56.2. Tensors imported from Cartesian spaces to manifolds 


56.2.1 REMARK: Alternative coordinatisation of tensors by importation from a Cartesian space. 

Section 56.2 presents an alternative method for coordinatising tensors on a manifold by importing them
from the coordinate space used for its charts. This method is less satisfactory than defining tensors of
the form $t^{r,s}_{p,a,\psi}$ in terms of tangent vectors of the form $t_{p,v,\psi}$. Therefore the method described here is not
recommended. (It is very unlikely that the reader will regret skipping Section 56.2.)


56.2.2 REMARK: Construction methods for tensors on manifolds. 

The most straightforward way to construct tensor spaces on a $C^1$ manifold is to define them as algebraic
tensor spaces in the multilinear style $\mathscr{L}((V^*)^r, V^s)$ with $V = T_p(M)$ as in Notation 56.1.5 (which
is derived from Notation 29.5.3), or in the linear style $\mathrm{Lin}(\mathscr{L}(V^r), \mathscr{L}(V^s))$ as in Notation 56.1.6
(which is derived from Notation 29.5.5). In this book, the tangent vectors in $T_p(M)$ are chart-tagged
Cartesian space tangent vectors $[(\psi, L_{\psi(p),v})]$. This approach could be thought of as “import, then tag, then
construct”. In other words, tangent vectors are imported, then tagged with coordinate charts, and then
tensors are constructed algebraically from these tagged imports.


An alternative method of construction for tensor spaces on a $C^1$ manifold is to import pre-constructed tensors
from the Cartesian chart space as in Notations 56.2.4 and 56.2.5. This could be thought of as “construct,
then import, then tag” because the tensors are constructed in Cartesian space, then imported, and then
tagged with coordinate charts.
56.2.3 REMARK: Coordinatisation of tensors on manifolds.

As with all sets, one needs some way to “point to” individual elements of the sets of multilinear-style and
linear-style tensors on manifolds in Notations 56.1.5 and 56.1.6. This is where tensor components enter the picture. Mixed tensors on a manifold may
be coordinatised in terms of a single basis for the primal space $T_p(M)$ of vectors on which the tensor space is
based. The basic algebra of mixed tensors is presented in Section 29.6 for both multilinear-style mixed tensors
(Theorem 29.6.2) and linear-style mixed tensors (Theorem 29.6.3). The change-of-basis transformation rules
for mixed tensor components are presented in Theorem 29.6.4.


The multilinear-style mixed tensor L7? Joas T pip n (R”) on the Cartesian space R” in Notation 56.2.4 is 
defined in Notation 30.6.16. The linear-style mixed tensor Lic) , € Typ) (IR^) on the Cartesian space pace IR” 
in Notation 56.2.5 is defined in Notation 30.6.17. 

Notations 56.2.4 and 56.2.5 are generalisations of the tangent vectors $t_{p,v,\psi}$ in Definition 54.1.2 and the
tangent covectors $t^*_{p,w,\psi}$ in Notation 55.2.9. Notation 56.2.4 is illustrated in Figure 56.2.1.


[Figure 56.2.1: Mixed tensors on manifolds and Cartesian charts]


56.2.4 NOTATION: Coordinatised multilinear-style tensors, imported from Cartesian space.
b ap for pe M, a: N, x N; > R and y € atlas; (M), for an n-dimensional C! manifold M with n € Zf, 
denotes the thon style tensor in Tp (M) defined by 


Dp aa — = [v Lj ),aJ^ 
where the equivalence relation “=” is defined so that c = (Vo, Li) for V1, Y2 € atlas, (M), 
21,212 € R” and a1,a2 : IN, x IN, — IR, whenever ypy (zı) = z (x 2) and 
Vi € Nps Vj € Nh, (a;— E X (I Ia) J) la) y. 
WENT j'ENS k-l é=1 


where the chart transition matrix (J^5)5 ,_, € IR"*" is defined by 


"EET 
Va, B € Nn, 7*5 — gua Q1), y 


and J denotes the inverse matrix of J. 


56.2.5 NOTATION: Coordinatised linear-style tensors, imported from Cartesian space. 


tray for p E M, a: N; xX Ni, > R and Y € atlasp(M), for an n-dimensional C* manifold M with n € Zi 


denotes the linear-style tensor in T7**(M) defined by 
tpa = (Y, Doy); 
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where the equivalence relation “=” is defined so that (v1, Lsa) = (Vo, L72*,,) for v1, v» € atlas,(M), 
11,29 € R” and ay, az: N? x NS > R, whenever Y] (z1) = v; (z2) and 
Vi € NS, Vj € Nz, (ai= X D (MI) ie) (a1) y 
WENT J'EN; k=1 é=1 


where the chart transition matrix (J%g)% 5.., € IR"*" is defined by 


"EP Me 
Vo, B € Nn, J*g = age 2 (Vi lazy 


and J denotes the inverse matrix of J. 


56.3. Tangent tensor bundles 


56.3.1 REMARK: Tensor bundle total spaces. 
The most elementary stage in the construction of tensor bundles is the definition of the total space sets in
Definitions 56.3.2 and 56.3.3.


56.3.2 DEFINITION: The (multilinear-style) tensor fibration total space of type $(r,s)$ on a $C^1$ manifold $M$,
for $r,s \in \mathbb{Z}_0^+$, is the set $\bigcup_{p\in M} T_p^{r,s}(M)$.


56.3.3 DEFINITION: The linear-style tensor fibration total space of type (r,s) on a C! manifold M, for 


r,s € ZA, is the set Upem Tz ^ (M). 


56.3.4 NOTATION: Total spaces of multilinear-style tensors on a manifold. 
$T^{r,s}(M)$, for a $C^1$ manifold $M$ and $r,s \in \mathbb{Z}_0^+$, denotes the multilinear-style tensor fibration total space on $M$.
In other words, $T^{r,s}(M) = \bigcup_{p\in M} T_p^{r,s}(M)$.


56.3.5 NOTATION: Total spaces of linear-style tensors on a manifold. 
T"**(M), for a C! manifold M and r,s € Z; , denotes the linear-style tensor fibration total space on M. In 


other words, T**(M) = Upe m T7 (M). 


56.3.6 REMARK: Construction of non-topological tensor fibrations on manifolds. 

To construct non-topological tensor fibrations from the total spaces in Definitions
56.3.2 and 56.3.3, projection maps which attach the total spaces to the base-point set $M$ are required as in
Definitions 56.3.7 and 56.3.8. (This follows the pattern of Definition 54.5.4 for tangent vector fibrations.)


56.3.7 DEFINITION: The non-topological (multilinear-style) tensor fibration of type $(r,s)$ for a $C^1$
manifold $M$, where $r,s \in \mathbb{Z}_0^+$, is the tuple $(T^{r,s}(M), \pi^{r,s}, M)$, where $\pi^{r,s} : T^{r,s}(M) \to M$ is defined by

$$\forall p \in M,\ \forall A \in T_p^{r,s}(M),\quad \pi^{r,s}(A) = p.$$

The projection map of a non-topological multilinear-style tensor fibration $(T^{r,s}(M), \pi^{r,s}, M)$ is the map $\pi^{r,s}$.


56.3.8 DEFINITION: The non-topological linear-style tensor fibration of type (r,s) for a C! manifold M, 
where r,s € Zj , is the tuple (T"**(M), 175, M), where «7*5 : T'*5(M) — M is defined by 


Vp € M, VA € T7**(M), T” (A) =p. 


The projection map of a non-topological linear-style tensor fibration (T"?*(M),7"**, M) is the map «'**. 


56.3.9 REMARK:  Notations for sets of tensor components with respect to bases. 

It is convenient to let $\mathbb{R}^{n^r\times n^s}$ be an abbreviation for $\mathbb{R}^{\mathbb{N}_n^r\times\mathbb{N}_n^s}$. This is acceptable because “$n$” is often used
as a convenient abbreviation for “$\mathbb{N}_n$” when the index sets are understood within a context to commence
with 1 instead of 0. (See also Remarks 14.6.9, 14.12.4 and 16.4.2 for the vexed issue of the initial integer
for index sets.) The set membership relation $a \in \mathbb{R}^{\mathbb{N}_n^r\times\mathbb{N}_n^s} = \mathbb{R}^{n^r\times n^s}$ may be conveniently (and literally
correctly) written as $a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$.


A more systematic style of notation for sets such as $\mathbb{R}^{\mathbb{N}_n^r\times\mathbb{N}_n^s}$ is given in Notation 25.15.3, where $A_{r+s}(\mathbb{N}_n)$
or $A_{r+s}(\mathbb{N}_n; \mathbb{R})$ denotes the set of maps $a : \mathbb{N}_n^{r+s} \to \mathbb{R}$. Alternatively, one could write something like
$A(\mathbb{N}_n^r \times \mathbb{N}_n^s; \mathbb{R})$ for maps $a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$. This has no particular advantage until one considers symmetric
or antisymmetric subsets such as $A^+_r(\mathbb{N}_n)$ or $A^-_r(\mathbb{N}_n)$, where a superscript is easily added to indicate the
kind of symmetry.
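
The literal reading of a component array as a map on the index set can be made concrete. The following Python sketch enumerates the pairs of multi-indices for small n, r and s, and stores an array as a dictionary on that set. The function name `index_set` and the particular values of the array are hypothetical choices for illustration.

```python
from itertools import product

def index_set(n, r, s):
    """All pairs (i, j) of an r-tuple i and an s-tuple j over {1, ..., n}."""
    N = range(1, n + 1)
    return [(i, j) for i in product(N, repeat=r) for j in product(N, repeat=s)]

n, r, s = 3, 2, 1
pairs = index_set(n, r, s)

# A component array is then literally a real-valued map on these pairs.
# The values below are arbitrary and serve only as an example.
a = {(i, j): float(sum(i) - sum(j)) for (i, j) in pairs}
```

The index set has n to the power r+s elements, which is the dimension of the fibre space in this case.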
56.3.10 REMARK: Bundle charts and chart maps for non-topological tensor fibrations.

The fibrations in Definitions 56.3.7 and 56.3.8 have a horizontal component map, which is the projection 
map, but they have no vertical component maps, which are the bundle charts. These charts are introduced 
in Definitions 56.3.11 and 56.3.12. 


Although no prior differentiable or topological structure is explicitly defined on T^*(M) or T"**(M), the 
fibre charts $^*(v) and S(p) associate the linear space structure of the fibre space IR" *"' with the 
fibre sets Tp (M) and T7"*(M) at points p € M. This association is part of the fibre charts’ algebraic 
role in which they indicate the group-invariant structure of each fibre set. The structure group here is 
GL(n), with n = dim(M), and the fibre space IR" *"' is acted on by the group contravariantly for the first 
r components, and covariantly for the last s components. (See Notations 56.1.11 and 56.1.12 for tangent 
tensors £55 y € Tp (M) and tpay € Tp (M) respectively.) 


The map-rules is ap ^? à and tpa y a in Definitions 56.3.11 and 56.3.12 are meaningful only if the inverse 


maps a > £55, pe a TN are bijections from IR" *"^ to (7^5)-!((p)) and (n7**)-! (£p]) respectively. 
To verify surjectivity, it must be known that T7*(M) = (175 y; a € IR" *"'J for all Y € atlas,(M). For 
injectivity, it must be known that for different tensors in T^ (M ), the coordinate matrix a is always different. 
These requirements are established by Theorem 56.1.8 and Notation 56.1.11, as mentioned in Remark 56.1.10. 
The same comments apply to T7**(M). 


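56.3.11 DEFINITION: The bundle chart for the tensor bundle total space $T^{r,s}(M)$ of a $C^1$ manifold $M$,
for $r,s \in \mathbb{Z}_0^+$, corresponding to a point chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^{r,s})^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^{n^r\times n^s}$
defined by $t^{r,s}_{p,a,\psi} \mapsto a$ for all $p \in \mathrm{Dom}(\psi)$ and $a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$.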


56.3.12 DEFINITION: The bundle chart for the tensor bundle total space T'**(M) of a Ct manifold M, 
for r,s € Zi, corresponding to a point chart v» € atlas(M), is the map from (27**)-! (Dom(v)) to IR" *"" 
defined by (777 y — a for all p € Dom() and a: N7, x N; > R. 


56.3.13 NOTATION: Bundle chart map for the multilinear-style tensor bundle total space. 
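$\Phi^{r,s}$, for a $C^1$ manifold $M$ and $r,s \in \mathbb{Z}_0^+$, denotes the bundle chart map for the tensor bundle total
space $T^{r,s}(M)$. In other words,

$$\forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R},\quad \Phi^{r,s}(\psi)(t^{r,s}_{p,a,\psi}) = a,$$

where $n = \dim(M)$.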


56.3.14 NOTATION: Bundle chart map for the linear-style tensor bundle total space. 
$'^5, for a C! manifold M and r,s € Ze , denotes the bundle chart map for the tensor bundle total 
space T’’*(M). In other words, 


V € atlas(M), Vp € Dom(w), Va: IN, x IN > TR, 
e (vtr) = 


where n — dim(M). 


56.3.15 REMARK: Construction of non-topological tensor bundle from tensor fibration and fibre atlas. 
The next stage of construction for tensor fibration structure is to add the fibre atlases, which consist of the bundle
charts in Definitions 56.3.11 and 56.3.12 for all charts $\psi \in \mathrm{atlas}(M)$, to their respective fibrations to define the
vertical components described in Remark 56.3.10. The resulting
structures are then non-topological tensor bundles. (Moreover, they are non-topological vector bundles
associated with the non-topological tangent vector bundle.)


56.3.16 DEFINITION: The non-topological (multilinear-style) tensor bundle of type (r,s) on a C} manifold 
M, with n = dim(M) € Zj and r,s € Zf, is the tuple (T^*(M), 2", M, AB ont where 


(i) (T^5(M),«^*, M) is the non-topological multilinear-style tensor fibration of type (r,s) on M, and 


nxn 


(ii) AB ue = {9 (Y); v € Am}. 
56.3.17 DEFINITION: The non-topological linear-style tensor bundle of type $(r,s)$ on a $C^1$ manifold $M$,
with n = dim(M) € Zj and r,s € Zf, is the tuple (T"’*(M), "5, M, ARS, aD where 


(i) (T?5(M), "5, M) is the non-topological linear-style tensor fibration of type (r,s) on M, and 
(ii) AR eM) = = ($7 (Y); v € Am}. 


56.3.18 REMARK: Attachment of manifold atlases to tensor fibrations and tensor bundles. 

The final stage in the construction of tensor fibrations and tensor bundles is to add manifold atlases to the
base space and total space. The base space manifold atlas is already provided with the base space, and 
the total space atlas may be constructed by combining the bundle charts with the base space charts. The 
topologies on the base space and total space are induced by the manifold atlases. 


The bijectivity of the total space manifold charts V^*(4) : (1^*)-!(Dom(V)) > Range(v) x IR" *"' and 
Ps (p) : (175) 1 (Dom(v)) — Range(v) x (IR")" x (IR")* follows from the bijectivity of the fibre charts 
P(Y) and $"**(1) on their respective fibre sets, which is mentioned in Remark 56.3.10. 


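56.3.19 DEFINITION: The (multilinear-style) tensor bundle (total space) manifold chart of type $(r,s)$ for
a $C^1$ manifold $M$, for $r,s \in \mathbb{Z}_0^+$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^{r,s})^{-1}(\mathrm{Dom}(\psi))$
to $\mathbb{R}^n \times \mathbb{R}^{n^r\times n^s}$ defined by $t^{r,s}_{p,a,\psi} \mapsto (\psi(p), a)$ for all $p \in \mathrm{Dom}(\psi)$ and $a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$.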


56.3.20 DEFINITION: The linear-style tensor bundle (total space) manifold chart of type (r,s) for a C! 
manifold M, for r,s € Zt, corresponding to a chart v» € atlas(M), is the map from (2^?**) ! (Dom(4))) to 
R^ x R''*"' defined by tg y + (W(p),a) for all p € Dom(U) and a: Ny, x N; > R. 


56.3.21 NOTATION: Manifold chart map for multilinear-style tensor bundle total space. 
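$\Psi^{r,s}$, for a $C^1$ manifold $M$ and $r,s \in \mathbb{Z}_0^+$, denotes the manifold chart map for the multilinear-style tensor
bundle total space $T^{r,s}(M)$. In other words,

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall a : \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R},\quad \Psi^{r,s}(\psi)(t^{r,s}_{p,a,\psi}) = (\psi(p), a),$$

where $n = \dim(M)$. In other words,
$$\forall \psi \in \mathrm{atlas}(M),\quad \Psi^{r,s}(\psi) = (\psi \circ \pi^{r,s}) \times \Phi^{r,s}(\psi).$$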


56.3.22 NOTATION: Manifold chart map for linear-style tensor bundle total space. 
V5. for a C! manifold M and r,s € Zg , denotes the manifold chart map for the linear-style tensor bundle 
total space 7"**(M). In other words, 


Vp € M, Vy € atlas, (M), Va: N, x IN — IR, 
P(Y) (tpa) = (YP), a), 


where n = dim( M). In other words, 
Vw € atlas( M), PS (y) = (Y o n™*®) x P(Y). 


56.3.23 DEFINITION: The (multilinear-style) tensor bundle of type (r,s) on a C! manifold M < (M, Am), 
with n = dim(M) € Zj and r,s € Zj , is the tuple (T™°(M), ATrs(m 1^5, M, Am, AE, ou)» where 


(i) (T"^*(M), «^5. M, AR” int M) i) is the non-topological multilinear-style tensor bundle of type (r,s) on M, 
(ii) Arsim) = 19 (4); v € Am} 


56.3.24 DEFINITION: The linear-style tensor bundle of type (r,s) on a C! manifold M < (M, Am), with 
n = dim(M) € Zf and r,s € Zj , is the tuple (T5(M), Ar-s(m), 175, M, Am, AB where 


(i) (T7*(M), "75, M, ARA M 
(ii) Apr+s(as) = (97? (9); v € Ay]. 


m is the non-topological linear-style tensor bundle of type (r, s) on M, 
56.4. Multilinear function bundles


56.4.1 REMARK: Multilinear function bundles on tangent spaces of differentiable manifolds. 

“Mixed” contravariant/covariant tensor spaces are often either purely contravariant or purely covariant in 
practice. The multilinear-style representations for these two pure kinds of tensor spaces are T7 *9(M) and 
T, (M ) respectively, with r,s € Zg, for p in a C! manifold M. The linear-style representations of these two 
pure kinds of tensor spaces are T7" (M) and T;”*(M) respectively. (In the vast majority of the literature, 
only the multilinear-style tensor space representations are defined.) 

The space TPS (M) is conveniently identical to the multilinear function space .Z; (T;(M)). The space T7"? (M) 
is almost identical to the dual space Y,.(T,(M))* = &9' T,(M) of the multilinear function space Z, (T,(M)). 
Some specific notations are introduced for purely covariant spaces in Notation 56.4.9, and in Notations 
56.5.14 and 56.6.3 for the antisymmetric and symmetric subsets of these covariant spaces. 


Purely covariant multilinear function bundles have some special properties which contravariant and mixed 
contravariant/covariant tensors do not have. 


(1) Antisymmetric multilinear functions have a useful algebra. 
(2) Real-valued multilinear functions can be generalised to vector-valued multilinear maps as in Section 56.7. 


(3) Multilinear functions have equivalent “short-cut” versions as described in Remark 57.7.1. (See also 
Section 21.4 for general short-cuts of cross-sections of fibrations.) 


These reasons justify separate treatment for multilinear function bundles and multilinear map bundles. 


Similarly, the general, antisymmetric and symmetric sub-classes of multilinear function bundles have sufficient
differences in their characteristics to justify separate treatment of these three sub-classes. (These are
presented in Sections 56.4, 56.5 and 56.6 respectively.)


Although most definitions, notations and theorems for multilinear function bundles are almost automatic
specialisations of the corresponding more general cases for mixed contravariant/covariant tensor spaces in 
Sections 56.1 and 56.3, these specialisations are presented in Section 56.4 to provide a convenient reference 
point for their application to antisymmetric and symmetric multilinear function and map bundles. 


56.4.2 REMARK: Expressing multilinear functions on tangent spaces in terms of components. 

Before aggregating the sets of multilinear functions on pointwise tangent spaces to define multilinear function 
bundles in Notation 56.4.9, standard bases and coordinate maps must be defined at individual points of 
manifolds. These can then be used for defining manifold coordinate charts for the bundle total spaces. 


Theorem 56.4.3 and Notation 56.4.4 are specialisations of Theorem 56.1.8 and Notation 56.1.11 from general
mixed contravariant/covariant tensor spaces to purely covariant multilinear function spaces. 


56.4.3 THEOREM: Components for multilinear functions on tangent spaces. 
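Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

$\forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r(T_p(M)),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall (v_k)_{k=1}^r \in (\mathbb{R}^n)^r,$

$$A\big((t_{p,v_k,\psi})_{k=1}^r\big) = \sum_{i\in\mathbb{N}_n^r} A\big((e^{p,\psi}_{i_k})_{k=1}^r\big) \prod_{k=1}^r v_k^{i_k}.$$

In other words,

$\forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r(T_p(M)),\ \forall (V_k)_{k=1}^r \in T_p(M)^r,\ \forall \psi \in \mathrm{atlas}_p(M),$

$$A\big((V_k)_{k=1}^r\big) = \sum_{i\in\mathbb{N}_n^r} A\big((e^{p,\psi}_{i_k})_{k=1}^r\big) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k}.$$

(See Notation 54.5.7 for $\Phi(\psi)$ for $\psi \in \mathrm{atlas}(M)$.) Hence

$\forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r(T_p(M)),\ \forall V \in T_p^r(M),\ \forall \psi \in \mathrm{atlas}_p(M),$

$$A(V) = \sum_{i\in\mathbb{N}_n^r} A(e_i^{p,\psi})\, \Phi^r(\psi)^i(V), \tag{56.4.1}$$

where $e_i^{p,\psi} = (e^{p,\psi}_{i_k})_{k=1}^r$ for $r \in \mathbb{Z}_0^+$ and $i \in \mathbb{N}_n^r$. (See Notation 55.5.25 for $\Phi^r(\psi)$ for $r \in \mathbb{Z}_0^+$ and $\psi \in \mathrm{atlas}(M)$.
See Notation 55.5.40 for $\Phi^r(\psi)^i(V) = \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k}$ for $r \in \mathbb{Z}_0^+$ and $i \in \mathbb{N}_n^r$.)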
PROOF: The assertion follows by Theorem 56.1.8, or alternatively by Theorem 30.2.2. For line (56.4.1), see
also Theorem 55.5.42 (ii).


56.4.4 NOTATION: Particular multilinear functions on tangent spaces, specified by coordinates. 
Pu fore Zi, p€ M, a: Nt > R and v € atlas, (M), for a C! manifold M with n = dim(M), denotes 
the multilinear function in .Z;.(T,(M)) defined by 


T n\r Xr T ik 
V(vk)k=1 € (R”)”, toa (ipur b)k=1) = 2 a [II vk- 


In other words, 


YV € T7 (M), t? V= Y aj 9 (9) (V). 


p,a, 
i€NT, 


56.4.5 REMARK: The boundary case of 0-linear functions on tangent spaces. 

In the case r = 0 in Theorem 56.4.3 and Notation 56.4.4, the space Y%,.(Tp(M)) is the set R{}, which has 
elements () > t for t € R, together with the natural operations of addition and scalar multiplication. (This 
is also mentioned in Remark 27.3.6.) Also T)(M) = T,(M)° = (0) = (0) and N}, = (0) = (0). So 
a € N? must have the form a: Ø — a(0) € IR. Thus a is not the empty set as one might have expected from 
Remark 55.5.15 and Definition 55.5.24. 


It follows that bu = ((0,a(0))) maps Ø € T?(M) to a(0) € IR. The value of a may be identified with 
elements of R as in Definition 27.3.7. Thus .Zo(1,(M)) is a 1-dimensional linear space, as it must be by 
Theorem 27.6.14. This leads to the possibility of identifying differential forms of degree 0 with real-valued 
functions on the base space. This identification is probably the best way to work around the impossibility 


of defining a projection map for degree 0. (See Remark 55.5.11 for discussion of this work-around.) 
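
The boundary case r = 0 can be mimicked computationally. In the following Python sketch, which is only an analogy, the empty tuple plays the role of the single element of the index set for degree 0, and the component array of a 0-linear function is a dictionary with exactly one entry. The names `multi_indices` and `evaluate` and the value 2.5 are hypothetical.

```python
from itertools import product
from math import prod

n = 3

# The set of multi-indices of length 0 has exactly one element,
# namely the empty tuple.
multi_indices = list(product(range(1, n + 1), repeat=0))

a = {(): 2.5}   # hypothetical component array of a 0-linear function

def evaluate(a, vs):
    """Sum over multi-indices i of a[i] times the product of v_k^{i_k}."""
    r = len(vs)
    return sum(a[i] * prod(vs[k][i[k] - 1] for k in range(r))
               for i in product(range(1, n + 1), repeat=r))

# With no arguments, the product over k is empty and therefore equals 1,
# so the value is the single component of the array.
value = evaluate(a, [])
```

This agrees with the statement that the space of 0-linear functions is 1-dimensional: one real number determines the function completely.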


56.4.6 REMARK: Coordinate transformation rule for multilinear functions. 
Theorem 56.4.7 is an automatic specialisation of Theorem 56.1.14 to multilinear functions. 


56.4.7 THEOREM: Coordinate transformation rule for multilinear functions on a tangent space. 
Let M be a C! manifold. Let r € Zj. Let p € M and v,v € atlas,(M). Let a,à : N7, > IR, where 
n = dim(M). Let J = (J*;)?j;-1 € IR"*" denote the chart transition matrix from q to 4%, given by 


o 


Vi, j € Nn, Ti = aa EOT Oee 


and let J denote the inverse of the matrix J. (See Definition 51.4.18 for chart transition matrices.) Then 


ZL; L, " g 
tay = b ed if and only if 


T d 
Vi € NT a; = 5 ( II J'*j Jay. 
i€N5 k=1 


PROOF: The assertion follows from Theorem 56.1.14 (i). 
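
The purely covariant rule can be checked numerically for a bilinear function (r = 2). The following Python sketch is an illustration under stated assumptions rather than a proof: indices are 0-based, `J` is a hypothetical 2×2 chart transition matrix with inverse `Jinv`, the components of the bilinear function transform with one factor of `Jinv` for each index, and vector components transform with `J`.

```python
from fractions import Fraction as F

n = 2
J = [[F(2), F(1)],
     [F(1), F(1)]]            # hypothetical chart transition matrix
Jinv = [[F(1), F(-1)],
        [F(-1), F(2)]]        # its inverse

a = [[F(1), F(2)],
     [F(0), F(3)]]            # a[i][j] stands for a_{ij} in the first chart

# Covariant rule for r = 2:
#   a_new_{ij} = sum over i', j' of Jinv^{i'}_i * Jinv^{j'}_j * a_{i'j'}
a_new = [[sum(Jinv[ip][i] * Jinv[jp][j] * a[ip][jp]
              for ip in range(n) for jp in range(n))
          for j in range(n)] for i in range(n)]

u = [F(1), F(4)]              # components of two tangent vectors
v = [F(2), F(-1)]             # in the first chart

def transform(x):
    """Vector components in the second chart: x_new^i = sum_j J^i_j x^j."""
    return [sum(J[i][j] * x[j] for j in range(n)) for i in range(n)]

def bilinear(a, u, v):
    """The double sum of a_{ij} u^i v^j over all indices."""
    return sum(a[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

old_value = bilinear(a, u, v)
new_value = bilinear(a_new, transform(u), transform(v))
```

The agreement of the two values reflects the fact that the inverse matrix factors in the component rule cancel the matrix factors in the vector components.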


56.4.8 REMARK:  Total-space sets for multilinear function bundles. 

Notation 56.4.9 introduces total-space sets for general multilinear function bundles on a C! manifold. These 
sets have no topological or differentiable structure. The alternative notation T>” (M) for Y.(T(M)) suggests 
the alternative notation TŽ" (M) for £.(Tp(M)) for p€ M. 


In the degenerate degree 0 case, -2o(T(M)) = Upem -Zo(15(M)) = Use RIO = R. (This follows from 
Remark 56.4.5.) So the projection map z^ : IRÍÜ — M cannot be defined in Definition 56.4.10. 


56.4.9 NOTATION: Total spaces for multilinear function fibrations. 
£,(T(M)), for a C! manifold M and r € Zf, denotes the set LJ c Z, (T, (M)). 


Alternative: T" (M). 
56.4.10 DEFINITION: Non-topological multilinear function fibration.
The (non-topological) multilinear function fibration of degree r € Z+ for a C! manifold M is the tuple 
CET (M)), ^, M), where v? : Y.(T(M)) > M is defined by 


Vp € M, YA € Z.(T,(M)), m (A) =p. 


The projection map of a (non-topological) multilinear function fibration (Z. (T (M)), 7%", M) is v. 


56.4.11 REMARK: Fibre charts for multilinear function fibrations. 

The standard fibre charts $\Phi^{\mathscr{L},r}(\psi)$ for multilinear function fibrations $\mathscr{L}_r(T(M))$ have the coordinate chart
space $\mathbb{R}^{\mathbb{N}_n^r}$, where $n = \dim(M)$. Definition 56.4.12 for fibre charts for $\mathscr{L}_r(T(M))$ is effectively equivalent to
Definition 56.3.11 for the special case $T^{0,r}(M)$.


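56.4.12 DEFINITION: The fibre chart for the multilinear function fibration $\mathscr{L}_r(T(M))$ of a $C^1$ manifold $M$,
for $r \in \mathbb{Z}^+$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^{\mathscr{L},r})^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^{n^r}$ defined by
$t^{\mathscr{L},r}_{p,a,\psi} \mapsto a$ for all $p \in \mathrm{Dom}(\psi)$ and $a : \mathbb{N}_n^r \to \mathbb{R}$, where $n^r = \#(\mathbb{N}_n^r)$.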


56.4.13 NOTATION: Bundle chart map for the multilinear function fibration total space. 
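$\Phi^{\mathscr{L},r}$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the bundle chart map for the multilinear function fibration
total space $T^{\mathscr{L},r}(M)$. In other words,

\[ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall a : \mathbb{N}_n^r \to \mathbb{R}, \qquad \Phi^{\mathscr{L},r}(\psi)(t^{\mathscr{L},r}_{p,a,\psi}) = a, \]

where $n = \dim(M)$.
$\Phi_p^{\mathscr{L},r}(\psi)$, for $r \in \mathbb{Z}_0^+$, $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^1$ manifold $M$, denotes the restricted bundle chart
$\Phi^{\mathscr{L},r}(\psi) \big|_{T_p^{\mathscr{L},r}(M)}$. In other words,

\[ \forall a : \mathbb{N}_n^r \to \mathbb{R}, \qquad \Phi_p^{\mathscr{L},r}(\psi)(t^{\mathscr{L},r}_{p,a,\psi}) = a. \]

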
56.4.14 REMARK: Formulas for fibre charts for multilinear function fibrations. 
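It follows from Notations 56.4.4 and 56.4.13 that

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall a : \mathbb{N}_n^r \to \mathbb{R},\ \forall V \in T_p^r(M), \]
\begin{align*}
\Phi_p^{\mathscr{L},r}(\psi)^{-1}(a)(V) &= t^{\mathscr{L},r}_{p,a,\psi}(V) \\
&= \sum_{i \in \mathbb{N}_n^r} a_i \, \Phi^r(\psi)^i(V).
\end{align*}

This gives an unsatisfying formula for an element of $T_p^{\mathscr{L},r}(M)$ in terms of its coordinate array $a \in \mathbb{R}^{n^r}$. Of
greater interest would be a direct coordinate map (i.e. fibre chart) for multilinear functions. (The formulas
in Notation 56.4.13 are somewhat indirect.) A more direct map can be obtained by “testing” or “sampling”
a multilinear function $A \in T_p^{\mathscr{L},r}(M)$ for coordinate basis vectors as follows.

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall A \in T_p^{\mathscr{L},r}(M),\ \forall i \in \mathbb{N}_n^r, \qquad \Phi_p^{\mathscr{L},r}(\psi)(A)_i = A(e^{p,\psi}_i), \]

where $e^{p,\psi}_i$ denotes the vector $r$-tuple $(e^{p,\psi}_{i_k})_{k=1}^r$ for $i \in \mathbb{N}_n^r$. Therefore

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall A \in T_p^{\mathscr{L},r}(M), \qquad \Phi_p^{\mathscr{L},r}(\psi)(A) = \bigl( A(e^{p,\psi}_i) \bigr)_{i \in \mathbb{N}_n^r}. \tag{56.4.2} \]

In other words, the (vertical) coordinates for multilinear functions may be obtained by applying them to
$r$-tuples of coordinate basis vectors. Line (56.4.2) would be suitable as an alternative to Definition 56.4.12
for the fibre charts for multilinear function fibrations.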
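
The “sampling” procedure in line (56.4.2) can be illustrated by a small Python sketch. Everything named here (`make_multilinear`, `basis_vector`, `sample_components`, the sample array, and the 0-based indices) is a hypothetical choice for illustration only. The sketch builds a multilinear function on $\mathbb{R}^n$ from a component array and then recovers the array by applying the function to tuples of coordinate basis vectors.

```python
from itertools import product

def make_multilinear(a, n, r):
    """Return the function (v_1..v_r) -> sum_i a[i] * prod_k v_k[i_k]."""
    def A(vectors):
        total = 0.0
        for i in product(range(n), repeat=r):
            term = a[i]
            for k in range(r):
                term *= vectors[k][i[k]]
            total += term
        return total
    return A

def basis_vector(j, n):
    """Coordinate basis vector number j of R^n (0-based)."""
    return [1.0 if m == j else 0.0 for m in range(n)]

def sample_components(A, n, r):
    """Recover the component array by testing A on tuples of basis vectors."""
    return {i: A([basis_vector(ik, n) for ik in i])
            for i in product(range(n), repeat=r)}

n, r = 3, 2
a = {i: float(10 * i[0] + i[1]) for i in product(range(n), repeat=r)}
A = make_multilinear(a, n, r)
recovered = sample_components(A, n, r)
```

The recovered array equals the original array, which is the content of line (56.4.2) in this coordinate model.
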
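56.4.15 DEFINITION: The non-topological multilinear function bundle of degree $r$ on a $C^1$ manifold $M$, for
$r \in \mathbb{Z}_0^+$, is the tuple $(T^{\mathscr{L},r}(M), \pi^{\mathscr{L},r}, M, \mathcal{A}^{\mathbb{R}^{n^r}}_{T^{\mathscr{L},r}(M)})$, where $n = \dim(M)$ and


(i) $(T^{\mathscr{L},r}(M), \pi^{\mathscr{L},r}, M)$ is the non-topological multilinear function fibration of degree $r$ on $M$, and
(ii) $\mathcal{A}^{\mathbb{R}^{n^r}}_{T^{\mathscr{L},r}(M)} = \{ \Phi^{\mathscr{L},r}(\psi);\ \psi \in \mathcal{A}_M \}$.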


56.4.16 REMARK: Manifold charts for the total space of a multilinear function bundle. 

Although differentiable fibre bundles are not defined until Chapter 64, it is possible to define here the 
standard total space manifold atlases which will be used later for differentiable fibre bundle structures. As 
usual, the total space manifold charts may be obtained automatically by combining the projection map with 
fibre charts. This is done in Definition 56.4.17 and Notation 56.4.18. Then Definition 56.4.19 is in essence a 
differentiable fibre bundle whose total space consists of multilinear functions on the tangent bundle. 


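56.4.17 DEFINITION: The multilinear function bundle (total space) manifold chart of degree $r$ for a $C^1$
manifold $M$, for $r \in \mathbb{Z}_0^+$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^{\mathscr{L},r})^{-1}(\mathrm{Dom}(\psi))$ to
$\mathbb{R}^n \times \mathbb{R}^{n^r}$ defined by $t^{\mathscr{L},r}_{p,a,\psi} \mapsto (\psi(p), a)$ for all $p \in \mathrm{Dom}(\psi)$ and $a : \mathbb{N}_n^r \to \mathbb{R}$.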


56.4.18 NOTATION: Manifold chart map for multilinear function bundle total space. 
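$\Psi^{\mathscr{L},r}$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the manifold chart map for the multilinear function bundle
total space $T^{\mathscr{L},r}(M)$. In other words,

\[ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall a : \mathbb{N}_n^r \to \mathbb{R}, \qquad \Psi^{\mathscr{L},r}(\psi)(t^{\mathscr{L},r}_{p,a,\psi}) = (\psi(p), a), \]

where $n = \dim(M)$. In other words,

\[ \forall \psi \in \mathrm{atlas}(M), \qquad \Psi^{\mathscr{L},r}(\psi) = (\psi \circ \pi^{\mathscr{L},r}) \times \Phi^{\mathscr{L},r}(\psi). \]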


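56.4.19 DEFINITION: The multilinear function bundle of degree $r$ on a $C^1$ manifold $M \mathrel{-\!\!<} (M, \mathcal{A}_M)$, for
$r \in \mathbb{Z}_0^+$, is the tuple $(T^{\mathscr{L},r}(M), \mathcal{A}_{T^{\mathscr{L},r}(M)}, \pi^{\mathscr{L},r}, M, \mathcal{A}_M, \mathcal{A}^{\mathbb{R}^{n^r}}_{T^{\mathscr{L},r}(M)})$, where $n = \dim(M)$ and


(i) $(T^{\mathscr{L},r}(M), \pi^{\mathscr{L},r}, M, \mathcal{A}^{\mathbb{R}^{n^r}}_{T^{\mathscr{L},r}(M)})$ is the non-topological multilinear function bundle of degree $r$ on $M$, and
(ii) $\mathcal{A}_{T^{\mathscr{L},r}(M)} = \{ \Psi^{\mathscr{L},r}(\psi);\ \psi \in \mathcal{A}_M \}$.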


56.5. Antisymmetric multilinear function bundles 


56.5.1 REMARK: The effect of the antisymmetry constraint on multilinear functions. 

The task of Section 56.5 is to adapt the multilinear function bundles in Section 56.4 so that they contain 
only those multilinear functions which are antisymmetric. This requires the fibre and manifold charts to be 
replaced with corresponding submanifold charts. This is not automatic. The construction of submanifold 
charts requires understanding of the algebraic structure of the subset which satisfies antisymmetry. 


Section 56.5 is the differentiable manifold version of the algebraic treatment of antisymmetric multilinear 
map spaces in Section 30.3. In each case, the main objective is to coordinatise the spaces of antisymmetric 
multilinear maps or functions, but in the differentiable manifold context, the Cartesian coordinatisation of 
abstract multilinear maps is necessary so that the calculus of Cartesian space maps can be applied. 


56.5.2 REMARK: Expressing antisymmetric multilinear functions in terms of components. 
The adaptation of Theorem 56.4.3 to antisymmetric multilinear functions in Theorem 56.5.3 features the
replacement of simple multilinear functions $v \mapsto \prod_{k=1}^r v_k^{i_k}$ with antisymmetrised simple multilinear functions
$v \mapsto \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r v_k^{i_{P(k)}}$. (See Definition 27.4.6 for simple multilinear functions.)


56.5.3 THEOREM: Components for antisymmetric multilinear functions on tangent spaces. 
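Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r^-(T_p(M)),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall (v_k)_{k=1}^r \in (\mathbb{R}^n)^r, \]
\begin{align*}
A\bigl( (t_{p,v_k,\psi})_{k=1}^r \bigr)
&= \sum_{i \in \mathbb{N}_n^r} A\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \prod_{k=1}^r v_k^{i_k} \tag{56.5.1} \\
&= \sum_{\ell \in I_r^n} \sum_{i \in \rho(\ell)} A\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \prod_{k=1}^r v_k^{i_k} \tag{56.5.2} \\
&= \sum_{i \in I_r^n} A\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r v_k^{i_{P(k)}}. \tag{56.5.3}
\end{align*}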


(See Notation 14.10.3 for $I_r^n = \mathrm{Inc}(\mathbb{N}_r, \mathbb{N}_n)$. See Theorem 14.11.7 for $\rho(\ell) = \{ \ell \circ P;\ P \in \mathrm{perm}(\mathbb{N}_r) \}$ for all
$\ell \in I_r^n$. See Notation 14.8.23 for $\mathrm{parity}(P)$.)


PROOF: Line (56.5.1) follows from Theorem 56.4.3. Then lines (56.5.2) and (56.5.3) follow by applying 
Theorem 30.3.2 lines (30.3.1) and (30.3.2) respectively. 


56.5.4 REMARK: Lower dimensionality of antisymmetric multilinear function spaces. 
Theorem 56.5.3 line (56.5.3) implies that the effect of an antisymmetric multilinear function $A \in \mathscr{L}_r^-(T_p(M))$
acting on the vector $r$-tuple $(t_{p,v_k,\psi})_{k=1}^r$ can be computed by first evaluating $A$ for the vector $r$-tuple $(e^{p,\psi}_{i_k})_{k=1}^r$
for each $i \in I_r^n$, then multiplying this by the scalar factor $\sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r v_k^{i_{P(k)}} \in \mathbb{R}$, and finally
summing over $i$. This real-number factor is an antisymmetric multilinear function of the $r$-tuple $v \in (\mathbb{R}^n)^r$, which is
obtained from the coordinate $n$-tuples for the vectors $t_{p,v_k,\psi}$ for $k \in \mathbb{N}_r$.


The values $A((e^{p,\psi}_{i_k})_{k=1}^r)$ on line (56.5.3) are “sampled” only for increasing index $r$-tuples $i \in I_r^n$. A
similar formula could be written for the $A$-values for decreasing index-tuples instead, or any other set
of representatives of the equivalence classes of $\mathbb{N}_n^r$ under rearrangements. (See Section 14.11 for general
rearrangements of ordered selections.) Alternatively, the values $A((e^{p,\psi}_{i_k})_{k=1}^r)$ could be “sampled” for all
$i \in \mathbb{N}_n^r$ as in line (56.5.1). The motivation for using a restricted set of index tuples $i \in I_r^n$ is to remove
redundant information in the set of $A$-values. This redundancy cannot be reduced any further (unless further
constraints are placed on $A$). Therefore $\dim(\mathscr{L}_r^-(T_p(M))) = C^n_r$. More importantly, line (56.5.3) yields a
basis-and-coordinates structure for $\mathscr{L}_r^-(T_p(M))$, which can be used for the construction of a total-space
manifold atlas for the antisymmetric multilinear function bundle $\mathscr{L}_r^-(T(M))$.
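
The dimension count in this remark can be illustrated by enumerating index tuples. The following Python sketch is not part of the text. The function name `increasing_tuples` and the sample values of `n` and `r` are hypothetical. It lists the increasing $r$-tuples in $\mathbb{N}_n$ and checks that every injective $r$-tuple is a rearrangement of exactly one of them.

```python
from itertools import combinations, product
from math import comb

def increasing_tuples(n, r):
    """All strictly increasing r-tuples with entries in {1, ..., n}."""
    return list(combinations(range(1, n + 1), r))

n, r = 4, 2
I = increasing_tuples(n, r)

# Injective index tuples: only these can carry non-zero antisymmetric components.
injective = [i for i in product(range(1, n + 1), repeat=r) if len(set(i)) == r]

# Each injective tuple is a rearrangement of exactly one increasing tuple,
# namely its sorted version.
classes = {tuple(sorted(i)) for i in injective}
```

For these sample values there are 6 increasing tuples, which is the binomial coefficient $C^4_2$, and the set of sorted injective tuples coincides with the set of increasing tuples.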


56.5.5 REMARK: The natural basis for multilinear functions for a given base-space chart. 

Theorem 56.5.6 restates Theorem 56.5.3 line (56.5.3) in terms of the tangent bundle's “velocity” charts $\Phi(\psi)$
instead of parametrising tangent vectors $t_{p,v_k,\psi}$ explicitly by their components $v_k \in \mathbb{R}^n$. Each of the maps
$V \mapsto \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{P(k)}}$ for $V = (V_k)_{k=1}^r \in T_p(M)^r$ is a multilinear function in
$\mathscr{L}_r^-(T_p(M))$. Thus Theorem 56.5.6 gives a basis-and-coordinates expression for $A \in \mathscr{L}_r^-(T_p(M))$.


56.5.6 THEOREM: Antisymmetric multilinear function components using velocity charts. 
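Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r^-(T_p(M)),\ \forall (V_k)_{k=1}^r \in T_p(M)^r,\ \forall \psi \in \mathrm{atlas}_p(M), \]
\begin{align*}
A\bigl( (V_k)_{k=1}^r \bigr)
&= \sum_{i \in I_r^n} A\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{P(k)}} \\
&= \sum_{i \in I_r^n} A(e^{p,\psi}_i) \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \, \Phi^r(\psi)^{i \circ P}(V),
\end{align*}

where $e^{p,\psi}_i = (e^{p,\psi}_{i_k})_{k=1}^r$ for $i \in \mathbb{N}_n^r$. (See Notation 54.5.7 for $\Phi(\psi)$ for $\psi \in \mathrm{atlas}(M)$. See Notation 55.5.40
for $\Phi^r(\psi)^i$ for $i \in \mathbb{N}_n^r$.)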


PROOF: The assertion follows from Theorem 56.5.3 line (56.5.3) and Notation 54.5.7. 


56.5.7 REMARK: Chart-independence of the antisymmetry of multilinear function coordinates. 

The assertion that $A \in \mathscr{L}_r^-(T_p(M))$ makes no obvious reference to the base-space charts $\psi \in \mathrm{atlas}(M)$. So
the antisymmetry of $A$ is apparently chart-independent. Therefore it should be expected that the coordinate
array $\Phi_p^{\mathscr{L},r}(\psi)(A) = (A(e^{p,\psi}_i))_{i \in \mathbb{N}_n^r}$ in Remark 56.4.14 line (56.4.2) will be antisymmetric if $A \in \mathscr{L}_r^-(T_p(M))$,
independent of the choice of $\psi$. (Note that Notation 56.4.13 and Remark 56.4.14 for $\mathscr{L}_r(T_p(M))$ are valid
for elements of $\mathscr{L}_r^-(T_p(M))$ because it is a subset.) Consequently the antisymmetry of the coordinate array
must be invariant under chart transitions. In other words,

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall \psi, \tilde\psi \in \mathrm{atlas}_p(M),\ \forall a \in A_r^-(\mathbb{N}_n),\ \forall \tilde a \in A_r(\mathbb{N}_n), \]
\[ t^{\mathscr{L},r}_{p,a,\psi} = t^{\mathscr{L},r}_{p,\tilde a,\tilde\psi} \ \Rightarrow\ \tilde a \in A_r^-(\mathbb{N}_n), \]

where $n = \dim(M)$. (See Notations 25.15.3 and 25.15.9 for $A_r(\mathbb{N}_n)$ and $A_r^-(\mathbb{N}_n)$.) The validity of this
invariance of antisymmetry under chart transitions is easily demonstrated from the transformation rule in
Theorem 56.4.7.


56.5.8 REMARK: Parametrisation of antisymmetric multilinear functions by charts and coordinates. 
Notation 56.5.9 specifies individual antisymmetric multilinear functions on a tangent bundle following the
pattern of Notation 56.4.4 for general multilinear functions. The formulas for $t^{\Lambda,r}_{p,a,\psi}$ use the restriction
$a = (A(e^{p,\psi}_i))_{i \in I_r^n}$ of the full antisymmetric array $(A(e^{p,\psi}_i))_{i \in \mathbb{N}_n^r}$. The effect of the full array is reconstructed
by replacing $\prod_{k=1}^r v_k^{i_k}$ with $\sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r v_k^{i_{P(k)}}$.


It may seem illogical to remove entries from the full array and then have to effectively reconstruct the missing
elements by a kind of antisymmetrisation. However, this removal and reconstruction is required so that $a$
will be an array of coordinates with respect to a basis, which must be a linearly independent spanning set.
(See Definition 22.7.2 for linear space bases.) Most importantly, for every fixed $\psi \in \mathrm{atlas}_p(M)$, every element
of the set $\Lambda_r(T_p(M))$ is equal to $t^{\Lambda,r}_{p,a,\psi}$ for some $a \in \mathbb{R}^{I_r^n}$. (This is asserted in Theorem 56.5.10.)


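56.5.9 NOTATION: Antisymmetric multilinear functions on tangent spaces, specified by coordinates.
$t^{\Lambda,r}_{p,a,\psi}$, for $r \in \mathbb{Z}_0^+$, $p \in M$, $a : I_r^n \to \mathbb{R}$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^1$ manifold $M$ with $n = \dim(M)$, denotes
the (unique) antisymmetric multilinear function in $\Lambda_r(T_p(M)) = \mathscr{L}_r^-(T_p(M))$ which satisfies

\[ \forall (v_k)_{k=1}^r \in (\mathbb{R}^n)^r, \qquad t^{\Lambda,r}_{p,a,\psi}\bigl( (t_{p,v_k,\psi})_{k=1}^r \bigr) = \sum_{i \in I_r^n} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r v_k^{i_{P(k)}}. \]

In other words,

\begin{align*}
\forall V \in T_p^r(M), \qquad t^{\Lambda,r}_{p,a,\psi}(V)
&= \sum_{i \in I_r^n} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{P(k)}} \tag{56.5.4} \\
&= \sum_{i \in I_r^n} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \, \Phi^r(\psi)^{i \circ P}(V).
\end{align*}

Alternative notation: $t^{\mathscr{L}^-,r}_{p,a,\psi}$.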


56.5.10 THEOREM: Concrete set of antisymmetric multilinear functions on tangent spaces. 
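Let $M$ be a $C^1$ manifold. Then

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M, \qquad \Lambda_r(T_p(M)) = \{ t^{\Lambda,r}_{p,a,\psi};\ \psi \in \mathrm{atlas}_p(M) \text{ and } a \in \mathbb{R}^{I_r^n} \}. \]


PROOF: The inclusion $\Lambda_r(T_p(M)) \subseteq \{ t^{\Lambda,r}_{p,a,\psi};\ \psi \in \mathrm{atlas}_p(M) \text{ and } a \in \mathbb{R}^{I_r^n} \}$ follows from Theorem 56.5.6.
The reverse inclusion follows from the $r$-linearity with respect to $V$ of the expression for $t^{\Lambda,r}_{p,a,\psi}(V)$ in
line (56.5.4) in Notation 56.5.9.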


56.5.11 REMARK: Coordinate transformation rules for antisymmetric multilinear functions. 

A particular disadvantage of the restricted coordinate arrays $a : I_r^n \to \mathbb{R}$ in Notation 56.5.9 is the complexity
of the chart-transition rules. This array must first be expanded to a full antisymmetric array $\bar a : \mathbb{N}_n^r \to \mathbb{R}$,
then this must be transformed according to Theorem 56.4.7, and finally this array must be restricted to $I_r^n$.
The resulting transformation rule is not simple.


Theorem 56.5.12 verifies that if a restricted coordinate array $a : I_r^n \to \mathbb{R}$ is extended to $\bar a : \mathbb{N}_n^r \to \mathbb{R}$
according to the antisymmetrisation rule in line (56.5.5), then the result $\bar a$ is an antisymmetric array, and
it determines the same multilinear function $t^{\mathscr{L},r}_{p,\bar a,\psi}$ as the original multilinear function $t^{\Lambda,r}_{p,a,\psi}$. This procedure
may be thought of as “expanding” a “condensed” version of the antisymmetric component array to recover
the full array $\bar a : \mathbb{N}_n^r \to \mathbb{R}$ from the “compressed” array $a : I_r^n \to \mathbb{R}$. So this is a kind of “decompression”.
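
The three-step transformation procedure described in this remark (expand, transform, restrict) can be sketched in Python as follows. This is only an illustration under stated assumptions. The names `sort_parity`, `expand`, `transform_full`, `restrict` and `transform_restricted`, and the sample inverse matrix `J_inv`, are hypothetical. Index tuples are 1-based, and the full array is assumed to transform by the inverse chart transition matrix as in Theorem 56.4.7.

```python
from itertools import combinations, product

def sort_parity(i):
    """Parity (+1 or -1) of the permutation which sorts the injective tuple i."""
    inversions = sum(1 for x in range(len(i))
                     for y in range(x + 1, len(i)) if i[x] > i[y])
    return -1 if inversions % 2 else 1

def expand(a, n, r):
    """Expand a restricted array on increasing tuples to a full antisymmetric array."""
    full = {}
    for i in product(range(1, n + 1), repeat=r):
        injective = len(set(i)) == r
        full[i] = sort_parity(i) * a[tuple(sorted(i))] if injective else 0.0
    return full

def transform_full(full, J_inv, n, r):
    """Apply new[i] = sum_j (prod_k J_inv[j_k][i_k]) * full[j] (1-based tuples)."""
    out = {}
    for i in product(range(1, n + 1), repeat=r):
        total = 0.0
        for j in product(range(1, n + 1), repeat=r):
            coeff = 1.0
            for k in range(r):
                coeff *= J_inv[j[k] - 1][i[k] - 1]
            total += coeff * full[j]
        out[i] = total
    return out

def restrict(full, n, r):
    """Keep only the entries for increasing index tuples."""
    return {i: full[i] for i in combinations(range(1, n + 1), r)}

def transform_restricted(a, J_inv, n, r):
    """Expand, transform and restrict, in that order."""
    return restrict(transform_full(expand(a, n, r), J_inv, n, r), n, r)

n, r = 2, 2
J_inv = [[0.5, 0.0], [1.0, 1.0]]   # hypothetical inverse transition matrix
a = {(1, 2): 4.0}
a_new = transform_restricted(a, J_inv, n, r)
```

In this sample case $r = n = 2$, so the single component is multiplied by the determinant of `J_inv`, which is 0.5.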
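56.5.12 THEOREM: Expansion of components of an antisymmetric tensor.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $r \in \mathbb{Z}_0^+$, $p \in M$, $\psi \in \mathrm{atlas}_p(M)$, and $a : I_r^n \to \mathbb{R}$. Define the
expanded component array $\bar a : \mathbb{N}_n^r \to \mathbb{R}$ by

\begin{align*}
\forall i \in \mathbb{N}_n^r, \qquad \bar a_i
&= \begin{cases}
\mathrm{parity}(P_i) \, a(i \circ P_i^{-1}) & \text{if } i \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n) \\
0 & \text{if } i \notin \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)
\end{cases} \tag{56.5.5} \\
&= D_{n,r}(a)(i),
\end{align*}

where $P_i$ denotes the standard sorting-permutation for $i : \mathbb{N}_r \to \mathbb{N}_n$ as in Definition 14.11.3. (For the
“decompression map” $D_{n,r} : (I_r^n \to K) \to A_r^-(\mathbb{N}_n; K)$, see Definition 25.15.16.)


Then $\bar a \in A_r^-(\mathbb{N}_n)$ and $t^{\mathscr{L},r}_{p,\bar a,\psi} = t^{\Lambda,r}_{p,a,\psi}$. (See Notation 25.15.9 for the multidimensional matrix set $A_r^-(\mathbb{N}_n)$.)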


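PROOF: It follows from Notation 25.15.9, Definition 25.15.16 and Theorem 25.15.18 (ii) that $\bar a \in A_r^-(\mathbb{N}_n)$,
and then by Notation 56.4.4,

\begin{align*}
\forall V \in T_p^r(M), \qquad t^{\mathscr{L},r}_{p,\bar a,\psi}(V)
&= \sum_{i \in \mathbb{N}_n^r} \bar a_i \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k} \\
&= \sum_{i \in \mathbb{N}_n^r} D_{n,r}(a)(i) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k} \\
&= \sum_{i \in \mathrm{Inj}(\mathbb{N}_r, \mathbb{N}_n)} \mathrm{parity}(P_i) \, a(i \circ P_i^{-1}) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k} \\
&= \sum_{i \in I_r^n} \sum_{Q \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P_{i \circ Q}) \, a(i \circ Q \circ P_{i \circ Q}^{-1}) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{Q(k)}} \tag{56.5.6} \\
&= \sum_{i \in I_r^n} \sum_{Q \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P_i \circ Q) \, a(i) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{Q(k)}} \tag{56.5.7} \\
&= \sum_{i \in I_r^n} a(i) \, \mathrm{parity}(P_i) \sum_{Q \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(Q) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{Q(k)}} \\
&= \sum_{i \in I_r^n} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{P(k)}} \tag{56.5.8} \\
&= t^{\Lambda,r}_{p,a,\psi}(V), \tag{56.5.9}
\end{align*}

where line (56.5.6) follows from Theorem 14.11.7 (x), line (56.5.7) follows from Theorem 14.11.5 (v, vi, ix),
line (56.5.8) follows from Theorems 14.11.5 (vii) and 14.8.25 (v), and line (56.5.9) follows by Notation 56.5.9.
Hence $t^{\mathscr{L},r}_{p,\bar a,\psi} = t^{\Lambda,r}_{p,a,\psi}$.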


56.5.13 REMARK: Total-space sets for antisymmetric multilinear function bundles. 
Notation 56.5.14 introduces total-space sets for antisymmetric multilinear function bundles on a $C^1$ manifold.
These sets have no topological or differentiable structure. (See Notation 30.4.3 for $\Lambda_r(T_p(M))$.)


Definition 56.5.17 makes Definition 56.5.16 redundant because it has exactly the same meaning. 


56.5.14 NOTATION: Total spaces for antisymmetric multilinear function bundles. 
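$\mathscr{L}_r^-(T(M))$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the set $\bigcup_{p \in M} \mathscr{L}_r^-(T_p(M)) = \bigcup_{p \in M} \Lambda_r(T_p(M))$.


Alternatives: $T^{\mathscr{L}^-,r}(M)$, $\Lambda_r(T(M))$, $T^{\Lambda,r}(M)$.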


56.5.15 REMARK: Tensor bundles for general, symmetric and antisymmetric multilinear functions. 

Full tensor fibration and tensor bundle structures may be defined for the tensor fibration total space sets in 
Notation 56.4.9 in the same way as for the tensor fibration total space sets in Definitions 56.3.2 and 56.3.3. 
The main difference here is that the coordinate spaces are not so simple. Each symmetry or antisymmetry 
must be reflected in constraints on the coordinate spaces. One obvious way to achieve this is to remove 
redundant coordinates from the coordinate space $\mathbb{R}^{n^r}$. For $r = 2$, for example, half of the off-diagonal
array elements must be removed for symmetric tensors, and the diagonal elements must also be removed for 
antisymmetric tensors. 
When a coordinate chart must be constructed for a submanifold, it is sometimes a non-trivial task to construct
it from a chart for the ambient space. Fortunately, charts for symmetric and antisymmetric tensor spaces 
can be constructed by merely removing some of the coordinates. In the symmetric case, the redundancy of 
the ambient space charts can be removed by restricting coordinate tuples to those which have non-increasing 
index order. In the antisymmetric case, strictly increasing index order can be used. 


It is a fairly routine task to define non-topological fibrations, fibre charts, fibre chart maps, bundles, total
space manifold charts, total space manifold chart maps, and full fibre bundle tuples for general, antisymmetric
and symmetric multilinear function bundles, but the natural choice of notations is not entirely obvious. The
notation choices in Sections 56.4, 56.5 and 56.6 are non-standard and not entirely satisfactory.


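56.5.16 DEFINITION: The (non-topological) antisymmetric multilinear function fibration of degree $r \in \mathbb{Z}_0^+$
for a $C^1$ manifold $M$ is the tuple $(\mathscr{L}_r^-(T(M)), \pi^{\mathscr{L}^-,r}, M)$, where $\pi^{\mathscr{L}^-,r} : \mathscr{L}_r^-(T(M)) \to M$ is defined by

\[ \forall p \in M,\ \forall A \in \mathscr{L}_r^-(T_p(M)), \qquad \pi^{\mathscr{L}^-,r}(A) = p. \]

The projection map of a (non-topological) antisymmetric multilinear function fibration
$(\mathscr{L}_r^-(T(M)), \pi^{\mathscr{L}^-,r}, M)$ is the map $\pi^{\mathscr{L}^-,r}$.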


56.5.17 DEFINITION: The (non-topological) antisymmetric multilinear function fibration of degree $r \in \mathbb{Z}_0^+$
for a $C^1$ manifold $M$ is the tuple $(\Lambda_r(T(M)), \pi^{\Lambda,r}, M)$, where $\pi^{\Lambda,r} : \Lambda_r(T(M)) \to M$ is defined by

\[ \forall p \in M,\ \forall A \in \Lambda_r(T_p(M)), \qquad \pi^{\Lambda,r}(A) = p. \]

The projection map of a (non-topological) antisymmetric multilinear function fibration $(\Lambda_r(T(M)), \pi^{\Lambda,r}, M)$
is the map $\pi^{\Lambda,r}$.


56.5.18 REMARK: Fibre charts for antisymmetric multilinear function fibrations. 

The standard fibre charts $\Phi^{\mathscr{L}^-,r}(\psi) = \Phi^{\Lambda,r}(\psi)$ in Notation 56.5.20 for antisymmetric multilinear function
fibration total spaces $\mathscr{L}_r^-(T(M)) = \Lambda_r(T(M))$ have the coordinate chart space $\mathbb{R}^{I_r^n}$, where $n = \dim(M)$.
(See Notation 14.10.3 for the set $I_r^n$ of increasing $r$-tuples of elements of $\mathbb{N}_n$.)


The fact that $\Phi^{\Lambda,r}(\psi)(A)$ is well defined for all $A \in (\pi^{\Lambda,r})^{-1}(\mathrm{Dom}(\psi))$ follows from Theorem 56.5.10.


56.5.19 DEFINITION: The fibre chart for the antisymmetric multilinear function fibration $\Lambda_r(T(M))$ of a
$C^1$ manifold $M$, for $r \in \mathbb{Z}_0^+$, for a point chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^{\Lambda,r})^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^{I_r^n}$
defined by $t^{\Lambda,r}_{p,a,\psi} \mapsto a$ for all $p \in \mathrm{Dom}(\psi)$ and $a : I_r^n \to \mathbb{R}$, where $n = \dim(M)$ and $I_r^n$ is the set of increasing
$r$-tuples of elements of $\mathbb{N}_n$ as in Notation 14.10.3.


56.5.20 NOTATION: Bundle chart map for antisymmetric multilinear function fibration total space. 

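$\Phi^{\Lambda,r}$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the bundle chart map for the multilinear function fibration
total space $\Lambda_r(T(M)) = T^{\Lambda,r}(M) = T^{\mathscr{L}^-,r}(M) = \mathscr{L}_r^-(T(M))$. In other words, for all $r \in \mathbb{Z}_0^+$ and
$\psi \in \mathrm{atlas}(M)$, the map $\Phi^{\Lambda,r}(\psi) : (\pi^{\Lambda,r})^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{I_r^n}$ is defined by

\[ \forall p \in \mathrm{Dom}(\psi),\ \forall a : I_r^n \to \mathbb{R}, \qquad \Phi^{\Lambda,r}(\psi)(t^{\Lambda,r}_{p,a,\psi}) = a, \]

where $n = \dim(M)$.
Alternative notation: $\Phi^{\mathscr{L}^-,r}$.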


56.5.21 REMARK: Explicit formula for the bundle chart map for antisymmetric multilinear functions. 
Definition 56.5.19 and Notation 56.5.20 express the chart $\Phi^{\Lambda,r}(\psi)$ as the inverse of the map $a \mapsto t^{\Lambda,r}_{p,a,\psi}$.
For an inverse, one generally asks questions not only about existence and uniqueness (which follow from
Theorem 56.5.10), but also about what procedures may be used to compute the inverse. Theorem 56.5.22
gives a procedure for obtaining the component array $a$ from a given antisymmetric multilinear function $t^{\Lambda,r}_{p,a,\psi}$ by
exploiting its action on tuples of coordinate basis vectors. Thus given any $A \in T_p^{\Lambda,r}(M)$, one may compute
$\Phi^{\Lambda,r}(\psi)(A) = (A(e^{p,\psi}_i))_{i \in I_r^n}$ for $\psi \in \mathrm{atlas}_p(M)$. The vector tuples $e^{p,\psi}_i$ may be regarded as “test vector-tuples”,
analogously to the way in which tangent operator components may be evaluated by their action on suitably
differentiable test functions.
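56.5.22 THEOREM: Antisymmetric multilinear function component evaluation from “test vector-tuples”.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $r \in \mathbb{Z}_0^+$. Then

\[ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall a \in \mathbb{R}^{I_r^n},\ \forall i \in I_r^n, \]
\begin{align*}
a_i &= t^{\Lambda,r}_{p,a,\psi}\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \\
&= t^{\Lambda,r}_{p,a,\psi}(e^{p,\psi}_i),
\end{align*}

where $e^{p,\psi}_i$ denotes $(e^{p,\psi}_{i_k})_{k=1}^r \in T_p(M)^r$ for all $i \in I_r^n$, $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$. Hence

\[ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall A \in T_p^{\Lambda,r}(M), \qquad \Phi^{\Lambda,r}(\psi)(A) = \bigl( A(e^{p,\psi}_i) \bigr)_{i \in I_r^n}. \tag{56.5.10} \]
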
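PROOF: It follows from Notation 56.5.9 that

\[ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall a \in \mathbb{R}^{I_r^n},\ \forall i \in I_r^n, \]
\begin{align*}
t^{\Lambda,r}_{p,a,\psi}\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr)
&= \sum_{j \in I_r^n} a_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \Phi(\psi)(e^{p,\psi}_{i_k})^{j_{P(k)}} \\
&= \sum_{j \in I_r^n} a_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \delta_{i_k}^{j_{P(k)}} \tag{56.5.11} \\
&= \sum_{j \in I_r^n} a_j \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \, \delta_i^{j \circ P} \\
&= \sum_{j \in I_r^n} a_j \, \delta_i^j \tag{56.5.12} \\
&= a_i,
\end{align*}

where line (56.5.11) follows from Notation 54.5.7, and line (56.5.12) follows from Theorem 14.11.7 (vii). Then
line (56.5.10) follows from Notation 56.5.20 and the observation that for each $\psi \in \mathrm{atlas}_p(M)$, every element
of $T_p^{\Lambda,r}(M)$ is equal to $t^{\Lambda,r}_{p,a,\psi}$ for some unique $a \in \mathbb{R}^{I_r^n}$.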
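
The statement of Theorem 56.5.22 can be tested in a coordinate model. The following Python sketch is illustrative and makes several assumptions which are not in the text: tangent vectors are represented by their component lists, index tuples are 1-based, and the names `perm_parity`, `eval_antisymmetric`, `basis_tuple` and `sample` are hypothetical. The function `eval_antisymmetric` follows the formula in Notation 56.5.9, and `sample` applies it to “test vector-tuples” of coordinate basis vectors.

```python
from itertools import combinations, permutations

def perm_parity(P):
    """Parity (+1 or -1) of a permutation P of (0, ..., r-1), by counting inversions."""
    inversions = sum(1 for x in range(len(P))
                     for y in range(x + 1, len(P)) if P[x] > P[y])
    return -1 if inversions % 2 else 1

def eval_antisymmetric(a, vectors, n, r):
    """Sum over increasing i of a[i] * sum_P parity(P) * prod_k vectors[k][i_{P(k)}]."""
    total = 0.0
    for i in combinations(range(1, n + 1), r):
        for P in permutations(range(r)):
            term = perm_parity(P) * a[i]
            for k in range(r):
                term *= vectors[k][i[P[k]] - 1]
            total += term
    return total

def basis_tuple(i, n):
    """The r-tuple of coordinate basis vectors selected by the index tuple i."""
    return [[1.0 if m == ik else 0.0 for m in range(1, n + 1)] for ik in i]

def sample(a, n, r):
    """Recover the restricted component array by testing on basis vector tuples."""
    return {i: eval_antisymmetric(a, basis_tuple(i, n), n, r)
            for i in combinations(range(1, n + 1), r)}

n, r = 4, 3
a = {i: float(100 * i[0] + 10 * i[1] + i[2]) for i in combinations(range(1, n + 1), r)}
recovered = sample(a, n, r)

# Swapping two arguments should change the sign of the value.
V = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 1.0], [2.0, 0.0, 1.0, 1.0]]
value = eval_antisymmetric(a, V, n, r)
value_swapped = eval_antisymmetric(a, [V[1], V[0], V[2]], n, r)
```

The recovered array agrees with the array `a`, and swapping two vector arguments negates the value, as expected for an antisymmetric multilinear function.
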


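56.5.23 DEFINITION: The non-topological antisymmetric multilinear function bundle of degree $r$ on a $C^1$
manifold $M$, for $r \in \mathbb{Z}_0^+$, is the tuple $(T^{\Lambda,r}(M), \pi^{\Lambda,r}, M, \mathcal{A}^{\mathbb{R}^{I_r^n}}_{T^{\Lambda,r}(M)})$, where $n = \dim(M)$ and


(i) $(T^{\Lambda,r}(M), \pi^{\Lambda,r}, M)$ is the non-topological antisymmetric multilinear function fibration of degree $r$ on $M$, and
(ii) $\mathcal{A}^{\mathbb{R}^{I_r^n}}_{T^{\Lambda,r}(M)} = \{ \Phi^{\Lambda,r}(\psi);\ \psi \in \mathcal{A}_M \}$.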


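56.5.24 DEFINITION: The antisymmetric multilinear function bundle (total space) manifold chart of degree
$r$ for a $C^1$ manifold $M$, for $r \in \mathbb{Z}_0^+$, corresponding to a chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^{\Lambda,r})^{-1}(\mathrm{Dom}(\psi))$
to $\mathbb{R}^n \times \mathbb{R}^{I_r^n}$ defined by $t^{\Lambda,r}_{p,a,\psi} \mapsto (\psi(p), a)$ for all $p \in \mathrm{Dom}(\psi)$ and $a : I_r^n \to \mathbb{R}$.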


56.5.25 NOTATION: Manifold chart map for antisymmetric multilinear function bundle total space. 
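$\Psi^{\Lambda,r}$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the manifold chart map for the multilinear function bundle
total space $T^{\Lambda,r}(M)$. In other words,

\[ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall a : I_r^n \to \mathbb{R}, \qquad \Psi^{\Lambda,r}(\psi)(t^{\Lambda,r}_{p,a,\psi}) = (\psi(p), a), \]

where $n = \dim(M)$. In other words,

\[ \forall \psi \in \mathrm{atlas}(M), \qquad \Psi^{\Lambda,r}(\psi) = (\psi \circ \pi^{\Lambda,r}) \times \Phi^{\Lambda,r}(\psi). \tag{56.5.13} \]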


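56.5.26 DEFINITION: The antisymmetric multilinear function bundle of degree $r$ on a $C^1$ manifold $M \mathrel{-\!\!<} (M, \mathcal{A}_M)$,
for $r \in \mathbb{Z}_0^+$, is the tuple $(T^{\Lambda,r}(M), \mathcal{A}_{T^{\Lambda,r}(M)}, \pi^{\Lambda,r}, M, \mathcal{A}_M, \mathcal{A}^{\mathbb{R}^{I_r^n}}_{T^{\Lambda,r}(M)})$, where $n = \dim(M)$ and


(i) $(T^{\Lambda,r}(M), \pi^{\Lambda,r}, M, \mathcal{A}^{\mathbb{R}^{I_r^n}}_{T^{\Lambda,r}(M)})$ is the non-topological antisymmetric multilinear function bundle of degree
$r$ on $M$ (as in Definition 56.5.23), and
(ii) $\mathcal{A}_{T^{\Lambda,r}(M)} = \{ \Psi^{\Lambda,r}(\psi);\ \psi \in \mathcal{A}_M \}$.

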
56.6. Symmetric multilinear function bundles


56.6.1 REMARK: Symmetric multilinear function bundles on differentiable manifolds. 

The symmetric multilinear function bundles on manifolds are of less importance for differential geometry 
than the antisymmetric kind. It is technically true that metric tensor fields are cross-sections of symmetric 
multilinear function bundles of degree 2, but metric fields are really part of the infrastructure of a Riemannian 
manifold, not an “inhabitant” of the manifold. (In other words, metric fields are part of the “stage”, not 
an “actor” on the “stage”.) What the symmetric multilinear functions lack is a useful algebra of the kind 
which antisymmetric multilinear functions have. 


56.6.2 REMARK:  Total-space sets for symmetric multilinear function bundles. 
Notation 56.6.3 introduces total-space sets for symmetric multilinear function bundles on a $C^1$ manifold.
These sets have no topological or differentiable structure. 


56.6.3 NOTATION: Total spaces for symmetric multilinear function bundles. 

$\mathscr{L}_r^+(T(M))$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the set $\bigcup_{p \in M} \mathscr{L}_r^+(T_p(M))$.

Alternative: $T^{\mathscr{L}^+,r}(M)$.

56.6.4 REMARK: Fibre charts for symmetric multilinear function fibrations. 

The standard fibre charts $\Phi^{\mathscr{L}^+,r}(\psi)$ for symmetric multilinear function fibration total spaces $\mathscr{L}_r^+(T(M))$
have coordinate chart space $\mathbb{R}^{J_r^n}$, where $n = \dim(M)$. (See Notation 14.10.3 for the set $J_r^n$ of non-decreasing
$r$-tuples of elements of $\mathbb{N}_n$.)


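56.6.5 DEFINITION: The (non-topological) symmetric multilinear function fibration of degree $r \in \mathbb{Z}_0^+$ for a
$C^1$ manifold $M$ is the tuple $(\mathscr{L}_r^+(T(M)), \pi^{\mathscr{L}^+,r}, M)$, where $\pi^{\mathscr{L}^+,r} : \mathscr{L}_r^+(T(M)) \to M$ is defined by

\[ \forall p \in M,\ \forall A \in \mathscr{L}_r^+(T_p(M)), \qquad \pi^{\mathscr{L}^+,r}(A) = p. \]

The projection map of a (non-topological) symmetric multilinear function fibration $(\mathscr{L}_r^+(T(M)), \pi^{\mathscr{L}^+,r}, M)$
is the map $\pi^{\mathscr{L}^+,r}$.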


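56.6.6 DEFINITION: The fibre chart for the symmetric multilinear function fibration $\mathscr{L}_r^+(T(M))$ of a $C^1$
manifold $M$, for $r \in \mathbb{Z}_0^+$, for a point chart $\psi \in \mathrm{atlas}(M)$, is the map from $(\pi^{\mathscr{L}^+,r})^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^{J_r^n}$
defined by $t^{\mathscr{L}^+,r}_{p,a,\psi} \mapsto a$ for all $p \in \mathrm{Dom}(\psi)$ and $a \in \mathbb{R}^{J_r^n}$, where $J_r^n$ is the set of non-decreasing $r$-tuples of
elements of $\mathbb{N}_n$ as in Notation 14.10.3.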
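
The index set for the symmetric fibre chart can be enumerated in the same way as in the antisymmetric case. The following Python sketch is illustrative only, and the function name `nondecreasing_tuples` is hypothetical. The count $C^{n+r-1}_r$ used in the comparison is the standard combinatorial formula for multisets. It is not stated in the text.

```python
from itertools import combinations_with_replacement
from math import comb

def nondecreasing_tuples(n, r):
    """All non-decreasing r-tuples with entries in {1, ..., n}."""
    return list(combinations_with_replacement(range(1, n + 1), r))

n, r = 3, 2
J = nondecreasing_tuples(n, r)
symmetric_count = len(J)
antisymmetric_count = comb(n, r)   # number of strictly increasing tuples
```

For these sample values there are 6 non-decreasing index pairs (including the 3 diagonal pairs), compared with 3 strictly increasing pairs.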


56.7. Vector-valued multilinear map bundles 


56.7.1 REMARK:  Generalisation from real-valued to vector-valued multilinear functions. 

Notation 56.7.4 presents vector-valued versions of the multilinear function total spaces in Notation 56.4.9.
(See Notation 30.4.2 for $\Lambda_r(T_p(M), W)$.) The term “vector-valued multilinear function” means the same
as “multilinear map”. (See Remark 27.2.1 for the subtle distinction between multilinear functions and
multilinear maps.)


Since purely covariant tensors are identified with multilinear functions, it is possible to generalise these from 
real-valued to vector-valued maps. An important case is multilinear maps valued in a Lie algebra, which 
are used for defining connection forms and curvature forms on differentiable principal bundles, which are 
cross-sections of Lie-algebra-valued multilinear map bundles. (See Definitions 69.5.4 and 70.5.2.) 


56.7.2 REMARK: Expression for multilinear maps on tangent spaces in terms of a base-space basis.
Theorem 56.7.3 differs from Theorem 56.4.3 only in the replacement of the multilinear function space
$\mathscr{L}_r(T_p(M))$ with a multilinear map space $\mathscr{L}_r(T_p(M), W)$ for some real linear space $W$.


56.7.3 THEOREM:  Vector-valued components for multilinear maps on tangent spaces. 
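Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $W$ be a real linear space. Then

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r(T_p(M), W),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall (v_k)_{k=1}^r \in (\mathbb{R}^n)^r, \]
\[ A\bigl( (t_{p,v_k,\psi})_{k=1}^r \bigr) = \sum_{i \in \mathbb{N}_n^r} A\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \prod_{k=1}^r v_k^{i_k}. \]

In other words,

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r(T_p(M), W),\ \forall (V_k)_{k=1}^r \in T_p(M)^r,\ \forall \psi \in \mathrm{atlas}_p(M), \]
\[ A\bigl( (V_k)_{k=1}^r \bigr) = \sum_{i \in \mathbb{N}_n^r} A\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_k}. \]

(See Notation 54.5.7 for $\Phi(\psi)$ for $\psi \in \mathrm{atlas}(M)$.) In other words,

\[ \forall r \in \mathbb{Z}_0^+,\ \forall p \in M,\ \forall A \in \mathscr{L}_r(T_p(M), W),\ \forall V \in T_p^r(M),\ \forall \psi \in \mathrm{atlas}_p(M), \]
\[ A(V) = \sum_{i \in \mathbb{N}_n^r} A\bigl( (e^{p,\psi}_{i_k})_{k=1}^r \bigr) \, \Phi^r(\psi)^i(V). \]

(See Notation 55.5.25 for $\Phi^r(\psi)$ for $\psi \in \mathrm{atlas}(M)$. See Notation 55.5.40 for $\Phi^r(\psi)^i$ for $i \in \mathbb{N}_n^r$.)


PROOF: The assertion follows by Theorem 56.1.8, or alternatively by Theorem 30.2.2.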


56.7.4 NOTATION: Total spaces for general, symmetric and antisymmetric multilinear map bundles. 
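Let $M$ be a $C^1$ manifold, let $r \in \mathbb{Z}_0^+$ and let $W$ be a real linear space.

$\mathscr{L}_r(T(M), W)$ denotes the set $\bigcup_{p \in M} \mathscr{L}_r(T_p(M), W)$.

$\mathscr{L}_r^+(T(M), W)$ denotes the set $\bigcup_{p \in M} \mathscr{L}_r^+(T_p(M), W)$.

$\mathscr{L}_r^-(T(M), W)$ denotes the set $\bigcup_{p \in M} \mathscr{L}_r^-(T_p(M), W)$.

$\Lambda_r(T(M), W)$ denotes the set $\bigcup_{p \in M} \Lambda_r(T_p(M), W) = \bigcup_{p \in M} \mathscr{L}_r^-(T_p(M), W) = \mathscr{L}_r^-(T(M), W)$.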


56.7.5 REMARK:  Vector-valued multilinear function fibrations. 
For the vector-valued multilinear function fibrations, only the antisymmetric case is presented here, namely 
in Definition 56.7.6, because this is the only case which is of significant utility for this book. 


56.7.6 DEFINITION: The (non-topological) (vector-valued) antisymmetric multilinear function fibration of
degree $r \in \mathbb{Z}_0^+$ for a $C^1$ manifold $M$, valued in a finite-dimensional real linear space $W$, is the tuple
$(\Lambda_r(T(M), W), \pi^{\Lambda,r,W}, M)$, where $\pi^{\Lambda,r,W} : \Lambda_r(T(M), W) \to M$ is defined by

\[ \forall p \in M,\ \forall A \in \Lambda_r(T_p(M), W), \qquad \pi^{\Lambda,r,W}(A) = p. \]

The projection map of a (non-topological) (vector-valued) antisymmetric multilinear map fibration
$(\Lambda_r(T(M), W), \pi^{\Lambda,r,W}, M)$ is the map $\pi^{\Lambda,r,W}$.


56.7.7 REMARK: Notation for a single vector-valued antisymmetric multilinear map. 

Notation 56.7.8 does not decompose the vector w € W into components and basis vectors because the 
coordinatisation of W is usually unconnected to the chart transition régime for the manifold M. In principle, 
there is no compelling reason for W to have a specified basis. Quite often, W is a Lie algebra of some kind, 
which would typically have its own independent régime for managing coordinates and bases. (For the scalar- 
valued version of Notation 56.7.8, see Notation 56.5.9.) 


56.7.8 NOTATION: Antisymmetric multilinear maps on tangent spaces, specified by coordinates. 
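$t^{\Lambda,r,w}_{p,a,\psi}$, for $r \in \mathbb{Z}_0^+$, $w \in W$, $p \in M$, $a : I_r^n \to \mathbb{R}$ and $\psi \in \mathrm{atlas}_p(M)$, for a real linear space $W$ and a
$C^1$ manifold $M$ with $n = \dim(M)$, denotes the antisymmetric $W$-valued multilinear map in $\Lambda_r(T_p(M), W)$
which satisfies

\[ \forall (v_k)_{k=1}^r \in (\mathbb{R}^n)^r, \qquad t^{\Lambda,r,w}_{p,a,\psi}\bigl( (t_{p,v_k,\psi})_{k=1}^r \bigr) = w \sum_{i \in I_r^n} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r v_k^{i_{P(k)}}. \]

In other words,

\begin{align*}
\forall V \in T_p^r(M), \qquad t^{\Lambda,r,w}_{p,a,\psi}(V)
&= w \sum_{i \in I_r^n} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \prod_{k=1}^r \Phi(\psi)(V_k)^{i_{P(k)}} \\
&= w \sum_{i \in I_r^n} a_i \sum_{P \in \mathrm{perm}(\mathbb{N}_r)} \mathrm{parity}(P) \, \Phi^r(\psi)^{i \circ P}(V).
\end{align*}
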
icI? | P€perm(N,) 
56.7.9 REMARK: Juxtaposition products of vectors by multilinear maps.

As mentioned in Remark 27.2.31, products of vectors by multilinear maps may be formed notationally by
juxtaposing them. Thus one may write $t^{\Lambda,r,w}_{p,a,\psi} = w \, t^{\Lambda,r}_{p,a,\psi}$ in terms of Notations 56.7.8 and 56.5.9. This
seems to be a product of a vector $w \in W$ by a multilinear map $t^{\Lambda,r}_{p,a,\psi}$. Strictly speaking, it is the map
$V \mapsto t^{\Lambda,r}_{p,a,\psi}(V) \, w$, which is the scalar product of $t^{\Lambda,r}_{p,a,\psi}(V) \in \mathbb{R}$ by $w \in W$ for each $V \in T_p(M)^r$. Since this
does in fact yield a well defined multilinear map $t^{\Lambda,r,w}_{p,a,\psi} \in \mathscr{L}_r(T_p(M), W)$, there is no harm in it.


56.7.10 REMARK: Fibre charts for vector-valued antisymmetric multilinear maps. 

Definition 56.7.11 and Notation 56.7.12 are the vector-valued versions of the scalar-valued Definition 56.5.19 
and Notation 56.5.20 respectively. The increased complexity is due to the arbitrary choice of a basis $B_W$ for
the target space W. The coordinatisation of the fibration total space as a finite-dimensional manifold requires 
the target space to be finitely coordinatised. This implies the requirement for a finite basis for W. Tedious 
definitions and notations such as these are justified only by the fact that they are needed for applications 
which use vector-valued differential forms, such as Theorem 61.13.2. 


56.7.11 DEFINITION: Fibre chart for a vector-valued antisymmetric multilinear map fibration. 
The fibre chart for the antisymmetric multilinear map fibration $\Lambda_r(T(M), W)$ of a $C^1$ manifold $M$, for $r \in \mathbb{Z}_0^+$
and a finite-dimensional real linear space $W$, for a point chart $\psi \in \mathrm{atlas}(M)$ and a basis $B_W = (e_j)_{j=1}^m$ for $W$,
is the map from $(\pi^{\Lambda,r,W})^{-1}(\mathrm{Dom}(\psi))$ to $\mathbb{R}^{I_r^n} \times \mathbb{R}^m$ defined by $t^{\Lambda,r,w}_{p,a,\psi} \mapsto (a, \kappa_{B_W}(w))$ for all $p \in \mathrm{Dom}(\psi)$,
$a : I_r^n \to \mathbb{R}$ and $w \in W$, where $n = \dim(M)$ and $I_r^n$ is the set of increasing $r$-tuples of elements of $\mathbb{N}_n$
as in Notation 14.10.3, and $\kappa_{B_W} : W \to \mathbb{R}^m$ is the component map for $W$ with respect to basis $B_W$ as in
Definition 22.8.8.


56.7.12 NOTATION: Bundle chart map for antisymmetric multilinear map fibration total space. 

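$\Phi^{\Lambda,r,W}$, for a $C^1$ manifold $M$ and $r \in \mathbb{Z}_0^+$, denotes the bundle chart map for the multilinear map fibration
total space $\Lambda_r(T(M), W)$. In other words, $\Phi^{\Lambda,r,W} : \mathrm{atlas}(M) \times B(W) \to (\Lambda_r(T(M), W) \to \mathbb{R}^{I_r^n} \times \mathbb{R}^m)$, where
$B(W)$ denotes the set of bases of $W$, is defined by

\[ \forall \psi \in \mathrm{atlas}(M),\ \forall B_W \in B(W),\ \forall p \in \mathrm{Dom}(\psi),\ \forall a : I_r^n \to \mathbb{R},\ \forall w \in W, \]
\[ \Phi^{\Lambda,r,W}(\psi, B_W)(t^{\Lambda,r,w}_{p,a,\psi}) = (a, \kappa_{B_W}(w)), \]

where $n = \dim(M)$ and $\kappa_{B_W} : W \to \mathbb{R}^m$ is the component map for $W$ with respect to basis $B_W$.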


56.7.13 REMARK:  Vector-valued antisymmetric multilinear map bundle definition. 

The remaining steps in the procedure to define vector-valued antisymmetric multilinear map bundles are 
Definitions 56.7.14, 56.7.15 and 56.7.18 and Notation 56.7.16. A novelty here is the need to choose which 
bases should be used in the fibre atlas and manifold atlas for the total space. 


In principle the atlas for (7 ^:^W (M) could contain all of the charts 6" (Y, Bw) for v» € atlas(M) for 
which $B_W$ is a basis for $W$. This would generally be an infinite set of charts, which is undesirable for
computer representations. Alternatively, one could restrict the atlas to just a single basis $B_W$, but this
raises the question of which basis to use. The choice is clearly arbitrary. One could also make the fibre atlas
contain the fibre charts for all bases in a given set of bases for $W$. This also is very arbitrary.


It must be remembered that the fibre atlas Are = {@47(4); v € Am} in Definition 56.5.23 for a 
scalar-valued antisymmetric multilinear function bundle is constructed automatically from the given atlas 
$A_M$ for $M$. It is not a class of atlases, but rather a single construction. Therefore since the atlas in
Definition 56.7.14 is created automatically from a single given atlas $A_M$, it seems reasonable to construct
it also from a single given basis for $W$. In a sense, the coordinate map for a basis is a kind of “linear
space chart” for $W$, and the singleton set containing only this chart is a “linear space atlas” for $W$. It is
straightforward to generalise Definition 56.7.14 to any other “linear space atlas” for $W$, using multiple bases.


56.7.14 DEFINITION: The non-topological antisymmetric multilinear map bundle of degree r, with target 
space W with basis By, on a C! manifold M, for r € Zi and a finite-dimensional real linear space W, is 


the tuple (7^ W (M), a4" , M, aA b oss where n = dim(M), m = dim(W), and 


i) (TW (M), à ^V. M) is the non-topological multilinear map fibration of degree r on M, and 
g 
(ii) 2d Cw = (^W (y, Bw); V € Ay]. 


56.7.15 DEFINITION: The antisymmetric multilinear map bundle (total space) manifold chart of degree r,
with target space W, on a C! manifold M, for r € Zg and a finite-dimensional real linear space W, for 
a chart ù € atlas(M) and a basis Bw for W, is the map from (1^^W)-!(Dom(v)) to R” x RF x R” 
defined by t^" > (4(p),a, KBw(w)) for all p € Dom(v), a: I} — R and w € W, where n = dim(M) and 


a,p 
m= dim(W). 


56.7.16 NOTATION: Manifold chart map for antisymmetric multilinear map bundle total space. 

UAW for a C! manifold M and r € Zi and a finite-dimensional real linear space W, denotes the manifold 
chart map for the multilinear map bundle total space T^:^W (M). In other words, 

YAW : atlas(M) x B(W) > (A,(T(M),W) > R” x RF? x R"), where B(W) denotes the set of bases 
of W, and n = dim(M) and m = dim(W), is defined by 


Vp € M, Vv € atlas, (M), VBw € B(W), Va: I? — R, 
V^ (h, Bw) (tpa) = (V(P), a, Raw (w)), 
where n = dim(M). In other words, 


Vu € atlas(M), VBw € B(W), 
Ven (Y, Bw) = (Y o a) x 9^" (Y, By). 


56.7.17 REMARK: Manifold charts for vector-valued antisymmetric multilinear map bundles. 

Definition 56.7.18 defines standard manifold structures for the total spaces of vector-valued antisymmetric 
multilinear map bundles. It is simpler than it may appear. The manifold atlas for the base space M is 
the given atlas Ay = atlas(M) for the manifold M. The manifold atlas for the total space (T^^ W (M) is 
the set Ara." w(ar),p,, Of all manifold charts UAW (5. Bw) for v € Ay and the single “manifold chart” 
KBy : W — IR" for W, which is in reality the component map for a given basis Bw. The output of 
each chart PAW (p, Bg) is defined in Notation 56.7.16 as the concatenation (vp), à, K By, (w)) of the base 
space coordinate tuple (p) € R”, the multilinear map coordinate tuple a € IR/^ , and the vector coordinate 
tuple &g,, (w) € R™. The rest of the definition is just “boilerplate”. 


56.7.18 DEFINITION: The antisymmetric multilinear map bundle of degree r, with target space W with 
basis Bw, on a C! manifold M with atlas Am, for r € Za and a finite-dimensional real linear space W, 


is the tuple (TA"” (M), Ara w (ar) Bw 7 ^V , M, Am, AE; EC By)» Where n = dim(M), m = dim(W) 
and 


i) (TAW M), ha W ,M, AB, xR” is the non-topological antisymmetric multilinear map bundle of 
T (M),By 8 
degree r on M, with target space W (as in Definition 56.7.14), and 


(ii) Arawi), Bw = (IV (y, Bw); v € Am}. 


VECTOR FIELDS, TENSOR FIELDS, DIFFERENTIAL FORMS


57.0.1 REMARK: The experiential origins of vector fields.

Vector fields arise very naturally and inevitably as gradients of scalar fields. (See Definition 74.6.2 for the 
gradient of a real-valued function on a Riemannian manifold. Gradient fields are contravariant due to the 
application of the “index-raising” isomorphism in Definition 73.5.4.) An example which must have been 
fairly obvious ever since there have been hills on the Earth is the gradient of the altitude of points above 
sea level. Although the Earth's gravity field in three dimensions has a gradient which has a fairly uniform 
magnitude and is therefore fairly ignorable as a constant background fact of nature, in practical terms people 
experience gravity very consciously as the gradient of altitude which makes the ground seem to be either 
level or sloping up or down, steeply or gently, either in the direction of motion or transverse to the direction 
of motion. Thus the whole Earth has a gradient vector at every point, which is a prime example of the 
experience of vector fields in everyday life. Every walk or drive up or down a hill is a direct personal
experience of a vector field. Work, in the physics sense of the word, is required when walking uphill. 


Any scalar field yields a vector field which is the gradient of the scalar field. For example, temperature and 
pressure fields have meaningful gradient vector fields which are experienced indirectly due to their effects 
as heat diffusion or the flow of fluids. Similarly, chemical concentration gradients have effects which are 
perceived indirectly, as for example osmosis and the diffusion of electrolytes. 


Some scalar fields cannot be measured directly and must be inferred by the integration of conservative vector 
fields. Thus three-dimensional static electric and gravitational force fields, for example, can be integrated to 
calculate corresponding scalar fields, but there is no way to physically determine the unknown integration 
constant. In such cases, the scalar field is a mathematical convenience, not a physical observable. 


Not all forces can be expressed as gradients of scalar fields. For example, magnetic force fields are non- 
conservative, and also depend on such factors as the velocity of a charge or the orientation of a magnetic 
dipole, and are therefore not integrable to obtain scalar fields which can be differentiated to recover the 
original force fields. Non-conservative vector fields are at least as important in physics as conservative vector 
fields. As another example, fluid velocity fields are vector fields, but they are not in general conservative. 
So they cannot generally be integrated to discover underlying scalar fields of which they are gradients. 
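
The distinction between conservative and non-conservative fields can be checked numerically. The following is only a minimal sketch under assumptions which are not from the text: the plane is flat with the Euclidean metric, the scalar field h(x, y) = x*x + 3*y and the rigid rotation field are arbitrary illustrative choices, and the function names are hypothetical. The line integral of a gradient field around a closed loop should vanish, whereas the loop integral of the rotation field should not.

```python
import math

def loop_integral(field, radius=1.0, steps=20000):
    """Approximate the line integral of a planar field around a circle."""
    total = 0.0
    for k in range(steps):
        t0 = 2.0 * math.pi * k / steps
        t1 = 2.0 * math.pi * (k + 1) / steps
        tm = 0.5 * (t0 + t1)
        # Evaluate the field at the mid-angle point of each small arc.
        x, y = radius * math.cos(tm), radius * math.sin(tm)
        dx = radius * (math.cos(t1) - math.cos(t0))
        dy = radius * (math.sin(t1) - math.sin(t0))
        fx, fy = field(x, y)
        total += fx * dx + fy * dy
    return total

def gradient_field(x, y):
    # Gradient of the illustrative scalar field h(x, y) = x*x + 3*y.
    return (2.0 * x, 3.0)

def rotation_field(x, y):
    # Velocity field of a rigid rotation; it is not a gradient field.
    return (-y, x)

conservative_work = loop_integral(gradient_field)
rotational_work = loop_integral(rotation_field)
```

In this sketch the first loop integral is zero up to rounding error, while the second is close to twice the area of the unit disc, so no scalar field can have the rotation field as its gradient.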


The subject of vector fields is clearly at the core of physics. Hence comprehensive tools are required for their 
study, especially in curved point-space contexts where the underlying geometry is modelled as a differentiable
manifold. The study of vector fields is moderately complex for flat Euclidean underlying point-spaces, but 
they become much more “interesting” in the case of curved geometry. 


Physical vector fields are of two opposite kinds, namely covariant and contravariant. The differential of a 
scalar field is covariant, whereas fluid velocity vector fields are contravariant. The contravariant kind of vector 
field is the subject of Sections 57.1 and 57.3. The covariant kind of vector field is the subject of Section 57.6. 
Differentials of real-valued functions (which are always covariant) are the subject of Section 58.1. The term 
“vector field” is usually understood to mean a contravariant vector field, whereas a covariant vector field is 
often known as a differential form of degree 1, or a “differential 1-form”. Then roughly speaking, one may
say that velocities are vectors and differentials are forms. 


To maintain a clear distinction between the two kinds of vector fields, it is perhaps helpful to think of the 
example of the rate of climb of a vehicle travelling up a hill. (This is closely related to the rate of “work” 
of the vehicle.) If the (contravariant) velocity is v and the (covariant) differential of the altitude above sea 
level is w = dH, where H is the altitude function, then the rate of climb of the vehicle is w(v) = (dH)(v). 
If the coordinates for measuring the horizontal location of the vehicle are altered from metres to kilometres, 
the numerical value of the velocity of the vehicle will be divided by 1000, whereas the numerical value of 
the differential will be multiplied by 1000. For example, a speed of 60,000 metres per hour will become 
60 kilometres per hour, and a differential of 10 centimetres per metre will become 10,000 centimetres per 
kilometre. The rate of climb w(v) must be invariant under a change of horizontal length scale. Therefore v 
and w must vary reciprocally to each other. The effect of horizontal scaling is suggested by the names. A 
covariant vector is scaled proportionally to the length of the measuring unit, in this case by a factor of 1000 
from a metre to a kilometre. A contravariant vector is scaled inverse proportionally to the length of the unit 
measuring-stick. 
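
The reciprocal scaling in this example can be verified with a few lines of arithmetic. The following sketch uses the numbers quoted above; the function and variable names are hypothetical and serve only as illustration.

```python
# Components of a velocity (contravariant) and of a differential (covariant)
# under a change of horizontal length unit from metres to kilometres.

METRES_PER_KILOMETRE = 1000.0

def velocity_in_km(v_metres_per_hour):
    # Contravariant component: divided by the unit-length ratio.
    return v_metres_per_hour / METRES_PER_KILOMETRE

def differential_in_km(w_cm_per_metre):
    # Covariant component: multiplied by the unit-length ratio.
    return w_cm_per_metre * METRES_PER_KILOMETRE

v_m = 60000.0   # horizontal speed in metres per hour
w_m = 10.0      # centimetres of altitude per horizontal metre

v_km = velocity_in_km(v_m)        # kilometres per hour
w_km = differential_in_km(w_m)    # centimetres of altitude per kilometre

climb_m = w_m * v_m      # rate of climb in centimetres per hour
climb_km = w_km * v_km   # same quantity computed in the rescaled units
```

Both products give the same rate of climb in centimetres per hour, which is the invariance of w(v) under the change of horizontal unit.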


The discussion of this issue is made difficult by the confused meaning of the word “gradient”. In colloquial
English, a gradient of temperature would be measured as a ratio like “one degree per metre” or 1°/metre, 
which has covariant units, but the gradient operator according to Definition 74.6.2 is contravariant due to 
the application of an inverse metric tensor field. 


57.0.2 REMARK: The Koszul tangent operator calculus formalism and vector-tuple bundles. 

The popular Koszul formalism for differential geometry adopts tangent operator fields as the primary objects 
for definitions and computations. In this formalism, tangent vectors and tangent operators are identified 
with each other. So the tangent operator fields are referred to simply as vector fields. Coordinates are hidden 
by replacing them with vector fields, which act as proxies for coordinate basis vector fields. 


In the Koszul-style vector field calculus formalism, the formulas all seem to be chart-independent, but the 
vector fields are themselves proxies for the hidden coordinates. In practical computations, one typically 
chooses the fields $X$, $Y$, etc., to be basic vector fields like $X(p) = e_i(p)$, $Y(p) = e_j(p)$, etc. So the “hidden
coordinates” are really present in full view as vector fields. Every formula is effectively indicating a choice 
of chart through the vector fields. This is why vector fields are used instead of vectors! 


Thus the Koszul formalism may be described as the “differential operator field calculus”, which is not very 
much different in practice to the 19th century “tensor calculus”. When holonomic vector fields are used, as 
described in Remark 70.3.2, all of the vector field commutators equal zero, and the Koszul calculus is then 
reduced to old-fashioned tensor calculus in a coordinate chart for which the vector fields are unit vector fields 
$X = e_i$, $Y = e_j$ and so forth.


In the case of nonholonomic vector fields, the Koszul vector-field formalism is equivalent to a vector-tuple-bundle
formalism where a sequence of vector fields $(X_i)_{i=1}^m$ for some $m \in \mathbb{Z}_0^+$ is replaced by a tangent
vector $m$-tuple bundle cross-section $Y$ on a manifold $M$ such that $Y(p) = (X_i(p))_{i=1}^m$ for all $p \in M$. The
“multilinear dual” of a vector $m$-tuple bundle is a covariant tensor bundle of type $(0, m)$. (See Remark 27.6.4
for “multilinear dual” spaces.) Thus the vector-tuple bundle concept is better integrated with the fibre bundle
framework. It seems better to formulate differential geometry in terms of pointwise differential objects (such 
as tangent vectors) rather than global or extensive objects such as vector fields. It is generally the lower-order 
derivatives of vector fields which are used in the basic formulas of differential geometry rather than their 
extensions to global fields. 


57.1.1 REMARK: Vector fields versus vector-valued functions on manifolds.
Vector fields on a manifold are tangent-vector-valued functions whose value at any point of the manifold is a 
tangent vector at that point. Thus a vector field is a right inverse of the projection map of a tangent bundle. 


Note that “vector” means “tangent vector” here, not an element of a general linear space. If $W$ is a linear
space, one could define “vector-valued fields” $\phi : M \to W$ on a manifold $M$, but clearly such a function $\phi$
would not be a cross-section of a fibre bundle because $W$ is not the total space of a fibre bundle. Such “vector
fields” are best referred to as “vector-valued functions” to avoid ambiguity. The “vector fields” which appear
in physics books are most often vector-valued functions because such books work with Cartesian coordinates, not
the fibre bundle idiom where every vector is attached to a base point.


In Definition 57.1.2, there is no continuity or differentiability constraint on vector fields. Figure 57.1.1 
illustrates this kind of unconstrained “random” vector field. 


Figure 57.1.1 A pseudorandom vector field on a manifold


57.1.2 DEFINITION: A (tangent) vector field on a $C^1$ manifold $M$ is a function $X : M \to T(M)$ such that
$X(p) \in T_p(M)$ for all $p \in M$.


In other words, a tangent vector field on $M$ is a cross-section of the tangent fibration $(T(M), \pi, M)$, which
means that it is a function $X : M \to T(M)$ such that $\pi \circ X = \mathrm{id}_M$.


57.1.3 REMARK:  Cross-sections of tangent bundles. 

The cross-section of a tangent bundle in Definition 57.1.2 is based on Definition 47.4.2 for cross-sections of
topological fibrations. The definition depends only on the fibration triple $(T(M), \pi, M)$, where the projection
map $\pi$ defines only the horizontal component of each vector in $T(M)$. In other words, the vertical component
(defined by fibre charts) is not used. The vertical component is required, however, for the definition of $C^k$
differentiability of vector fields in Section 57.2.


The letter X is often used to denote a vector field or a space of vector fields. Since vector fields are the same 
thing as cross-sections of tangent vector bundles, a useful mnemonic for the letter X is the word "cross". 


57.1.4 REMARK: Alternative notations for spaces of cross-sections. 

Notation 57.1.5 seems to be slightly non-standard. Some authors (e.g. Malliavin [28], 7.4.1, page 69, Gallot/
Hulin/Lafontaine [13], 1.38, page 18, Darling [8], 7.1.1, page 144) use notations such as $\Gamma T(M)$, $\Gamma(TM)$
or $\Gamma TM$ instead of $X(T(M))$. Crampin/Pirani [7], page 252, uses a notation like $\mathcal{X}(M)$. EDM2 [113],
section 105.M, and Kobayashi/Nomizu [19], page 5, use the notation $\mathfrak{X}(M)$. Do Carmo [9], page 49, uses a
notation which looks more like $\mathcal{X}(M)$, but with a horizontal stroke through the centre of the “X”. Cheeger/
Ebin [5], page 1, use the notation $\chi(M)$ or $\chi$ for the linear space of smooth vector fields on a differentiable
manifold $M$.


Notations 57.1.5 and 57.1.11 are straightforward applications or extensions of Notation 21.3.4 by replacing
“$X(T(M), \pi, M)$” with “$X(T(M))$”. In other words, the tangent space fibration's projection map $\pi$ and
base space $M$ are suppressed because they are implicit. The same comment applies also to the tensor bundle
cross-section spaces $X(T^{r,s}(M))$ and $X(T^{r,s}(M) \,|\, S)$ in Notations 57.5.3 and 57.5.4, and various other kinds
of cross-section spaces.


57.1.5 NOTATION: Spaces of vector fields.
$X(T(M))$, for a $C^1$ manifold $M$, denotes the set of vector fields on $M$. In other words,

\[ X(T(M)) = \{X : M \to T(M);\ \forall p \in M,\ X(p) \in T_p(M)\}
           = \{X : M \to T(M);\ \pi \circ X = \mathrm{id}_M\}, \]

where $\pi : T(M) \to M$ is the tangent vector projection map for $T(M)$.

57.1.6 THEOREM: The set of vector fields on a manifold is a real linear space.
Let $M$ be a $C^1$ manifold. Let $S = (\mathbb{R}, X(T(M)), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_S, \mu_S)$, where $(\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ is the real number field,
and $\sigma_S : X(T(M)) \times X(T(M)) \to X(T(M))$ and $\mu_S : \mathbb{R} \times X(T(M)) \to X(T(M))$ are the usual pointwise
addition and scalar multiplication operations on $X(T(M))$.
(i) $S$ is a linear space of cross-sections on the vector bundle $T(M)$.
(ii) $S$ is a well-defined real linear space.


PROOF: Part (i) follows from Definition 24.11.4 and Theorem 54.5.34.
Part (ii) follows from Theorem 24.11.5 (iii) and Definition 24.11.2.
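
The cross-section condition and the pointwise operations can be made concrete in a toy model. The following sketch assumes (this is not from the text) that the manifold is the plane with the identity chart, and that a tangent vector is represented as a pair (base point, component tuple). All names are hypothetical.

```python
# Toy model: M = R^2 with the identity chart. A tangent vector is the pair
# (base_point, components), so the projection map returns the base point.

def projection(tangent_vector):
    base_point, _components = tangent_vector
    return base_point

def is_cross_section(field, sample_points):
    # Tests the condition projection(X(p)) == p on the sample points only.
    return all(projection(field(p)) == p for p in sample_points)

def add_fields(X, Y):
    # Pointwise addition, using the linear structure of each tangent space.
    def total(p):
        (_, u), (_, v) = X(p), Y(p)
        return (p, tuple(a + b for a, b in zip(u, v)))
    return total

def scale_field(c, X):
    # Pointwise multiplication of a field by a real number.
    def scaled(p):
        _, u = X(p)
        return (p, tuple(c * a for a in u))
    return scaled

def rotation(p):
    return (p, (-p[1], p[0]))

def radial(p):
    return (p, (p[0], p[1]))

def bad_map(p):
    # Not a vector field: the value is attached to the wrong base point.
    return ((p[0] + 1.0, p[1]), (0.0, 0.0))

points = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]
combined = add_fields(rotation, scale_field(2.0, radial))
```

In this model, sums and scalar multiples of cross-sections are again cross-sections, whereas `bad_map` fails the projection test.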


57.1.7 THEOREM: Some useful formulas for components of vector fields. 
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $X \in X(T(M))$. Then


$\forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\quad \Psi(\psi)(X(p)) = (\psi(p), \Phi(\psi)(X(p)))$,


Vi € atlas(M), Vp € Dom(v), — Il? (W(v)(X(p))) = v(»), 
Vv € atlas(M), Vp € Dom(v), Un. (V(v)(X(p))) = 9(9)(X (p). 
Hence 
$\forall \psi \in \mathrm{atlas}(M),\quad \Psi(\psi) \circ X = \psi \times (\Phi(\psi) \circ X)$, (57.1.1)
Vu € atlas(M), II? o V(y) o X — v, 
Vw € atlas(M), 12", o V(y) o X = (y) o X, 
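$\forall \psi \in \mathrm{atlas}(M),\ \forall j \in \mathbb{N}_n,\quad \Psi(\psi)^j \circ X = \psi^j$, (57.1.2)
$\forall \psi \in \mathrm{atlas}(M),\ \forall j \in \mathbb{N}_n,\quad \Psi(\psi)^{n+j} \circ X = \Phi(\psi)^j \circ X$. (57.1.3)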


PROOF: The assertions follow directly from Notations 10.15.3, 11.5.26, 54.5.7, 54.5.21 and 57.1.5, and 
Definitions 10.15.2, 54.5.4, 54.5.16, 54.5.30 and 57.1.2, and Theorem 54.5.23. 


57.1.8 REMARK: Local cross-sections of tangent bundles. 

The local cross-section of the tangent bundle in Definition 57.1.9 is constrained to have a base point which
lies in a specified set $S$, but if $X(p)$ is non-zero, it represents a vector which may not lie within the set
$S$, in the sense that there may be no curve in $S$ which passes through $p$ with velocity $X(p)$.


57.1.9 DEFINITION: A (tangent) vector field on a subset $S$ of a $C^1$ manifold $M$ is a function $X : S \to T(M)$
such that $X(p) \in T_p(M)$ for all $p \in S$.


In other words, a tangent vector field on a subset of a $C^1$ manifold is a local cross-section of the tangent
bundle $(T(M), \pi, M)$, which means a function $X : S \to T(M)$ such that $\pi \circ X = \mathrm{id}_S$.


57.1.10 REMARK:  Ad-hoc notation for local cross-sections. 

Since local (or restricted) cross-sections occur frequently in differential geometry, particularly in definitions of 
general connections on differentiable fibre bundles and affine connections on tangent bundles, it is convenient 
to introduce a notation for spaces of such local cross-sections as in Notation 57.1.11. 


57.1.11 NOTATION: The spaces of local vector fields on a fixed domain set.
$X(T(M) \,|\, S)$, for a $C^1$ manifold $M$ and a subset $S$ of $M$, denotes the set of local vector fields on $S$. Thus

\[ X(T(M) \,|\, S) = \{X : S \to T(M);\ \forall p \in S,\ X(p) \in T_p(M)\}
                 = \{X : S \to T(M);\ \pi \circ X = \mathrm{id}_S\}, \]

where $\pi : T(M) \to M$ is the tangent vector projection map for $T(M)$.


57.1.12 REMARK: Spaces of cross-sections on any domain set or any open domain set.

It is sometimes useful to aggregate all of the local vector fields with open domains into a single set as in 
Notation 57.1.13. Vector fields with non-open domains are rarely useful. Since the meaning of the notation 
“ X” is not so immediately clear as the notation “ Xioc”, Notation 57.1.13 defines X(T(M)) to mean the 
less useful set of local vector fields on general subsets of M. (See Notation 47.4.3 for local cross-sections of 
general topological fibre bundles.) 


57.1.13 NOTATION: X(T(M)), for a C! manifold M, denotes the set of all local vector fields on subsets 
of M. In other words, X(T(M)) = User) X (TM) |8). 


Xioc(T(M)), for a C! manifold M, denotes the set of all local vector fields on open subsets of M. In other 
words, Xioc(T(M)) = Uvemop(uy X(T(M) |U) = (X € X(T(M)); Dom(X) € Top(M)J. 


57.1.14 NOTATION: The action of a vector field on a differentiable real-valued function.
$\partial_X$, for a vector field $X \in X(T(M))$, for a $C^1$ manifold $M$, denotes the map $\partial_X : C^1(M) \to (M \to \mathbb{R})$
defined by


$\forall f \in C^1(M),\ \forall p \in M,\quad (\partial_X f)(p) = \partial_{X(p)} f$.
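
The action of a vector field on a function can be approximated numerically. The following sketch assumes (this is not from the text) that the manifold is the plane with the identity chart, so that a vector field is given by its component tuple at each point, and it replaces the exact directional derivative with a central difference. The names are hypothetical.

```python
def directional_derivative(f, p, v, h=1e-6):
    # Central difference approximation of the derivative of f at p along v.
    forward = tuple(x + h * c for x, c in zip(p, v))
    backward = tuple(x - h * c for x, c in zip(p, v))
    return (f(forward) - f(backward)) / (2.0 * h)

def apply_field(X, f):
    # Returns the function which maps p to the derivative of f along X(p).
    return lambda p: directional_derivative(f, p, X(p))

def rotation(p):
    # Component tuple of the rotation field at p in the identity chart.
    return (-p[1], p[0])

def squared_radius(p):
    return p[0] ** 2 + p[1] ** 2

def first_coordinate(p):
    return p[0]

d_radius = apply_field(rotation, squared_radius)
d_coordinate = apply_field(rotation, first_coordinate)
```

In this sketch, `d_radius` is approximately zero everywhere because the rotation field is tangent to circles about the origin, and `d_coordinate(p)` approximates the first component of the field at p.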


57.1.15 REMARK: Vector field notation with arguments swapped. 

One often wishes to be able to refer to the value of a vector field at a particular point in the manifold. Thus,
for example, $\partial_X(p)$ would mean a vector in $T_p(M)$ for a given vector field $X \in X(T(M))$. This suggests
Notation 57.1.16, which may potentially be confused with Notation 57.1.14. However, both forms of notation
are required, and points in manifolds are unlikely to be confused with real-valued functions.


57.1.16 NOTATION: The value of a vector field at a point.
$\partial_X$, for a vector field $X \in X(T(M))$, for a $C^1$ manifold $M$, denotes the map $\partial_X : M \to (C^1(M) \to \mathbb{R})$
defined by


$\forall p \in M,\ \forall f \in C^1(M),\quad \partial_X(p)(f) = \partial_{X(p)} f$.
In other words, $\partial_X(p) = \partial_{X(p)}$.


57.1.17 REMARK:  Chart-basis vector fields are defined in terms of chart-basis vectors. 
See Definition 54.4.9 and Notation 54.4.10 for the chart-basis vectors e? v - ty,v;, Which are referred to in 
Definition 57.1.18. Chart-basis vector fields could also be named “coordinate (chart) basis vector fields". 


Theorem 57.2.20 (i) asserts that $e^\psi_i \in X^k(T(M) \,|\, \mathrm{Dom}(\psi))$ for any $C^{k+1}$ manifold $M$, for all $\psi \in \mathrm{atlas}(M)$
and $i \in \mathbb{N}_n$, where $n = \dim(M)$.


57.1.18 DEFINITION: The chart-basis vector fields of a C! manifold M with respect to a C? chart w for 
M are the vector fields e? € X(T(M) | Dom(/)) defined for i € Nn by 


Vi € Nn, Vp € Dom(v), e” (p) = em 
where n = dim(M). 


57.1.19 REMARK: Extension of a vector to a “constant” vector field. 

It is sometimes necessary to extend a single vector at a point in a manifold to a vector field on a neighbourhood
of that point so that differentiation can be performed. Sometimes the particular choice of vector field is not
very important because a computation with that field will give (more or less) the same answer no matter
which field is chosen. (See for example Theorem 74.2.7.) In Definition 57.1.20, a vector $z \in T_p(M)$ is extended
to the domain of a chart $\psi \in \mathrm{atlas}_p(M)$ by applying the chart-basis vector fields in Definition 57.1.18. The
vector field $z_\psi$ is a constant linear combination of the chart-basis vectors $(e^\psi_i)_{i=1}^n$ in Definition 57.1.18, where
the linear combination is chosen so that $z_\psi(p) = z$. Since the notation “$z_\psi$” is easy to confuse with other
notations, a more rational alternative is given in Notation 57.1.21. (See also Notation 57.4.5 for constant
vector-tuple field extensions. See Definition 21.6.8 for constant cross-section extensions for non-topological
fibrations. See Theorem 64.7.16 for differentiability of constant cross-sections of differentiable fibrations.)


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




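57.1.20 DEFINITION: The constant vector field extension of a vector $z \in T_p(M)$ by a chart $\psi \in \mathrm{atlas}_p(M)$,
for a $C^1$ manifold $M$, is the vector field $z_\psi \in X(T(M) \,|\, \mathrm{Dom}(\psi))$ which is defined by

\[ \forall q \in \mathrm{Dom}(\psi),\quad z_\psi(q) = \sum_{i=1}^n \Phi(\psi)(z)^i \, e^\psi_i(q), \]

where $n = \dim(M)$. In other words, $z_\psi = \sum_{i=1}^n \Phi(\psi)(z)^i \, e^\psi_i$.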


57.1.21 NOTATION: Extensions of vectors to “constant” vector fields. 
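$\mathrm{Extn}_\psi(z)$, for $\psi \in \mathrm{atlas}(M)$ and $z \in T_p(M)$ for some $p \in \mathrm{Dom}(\psi)$, where $M$ is a $C^1$ manifold, means the
constant vector field extension of $z$ by $\psi$. In other words, $\mathrm{Extn}_\psi(z) \in X(T(M) \,|\, \mathrm{Dom}(\psi))$ and

\[ \forall z \in T(M),\ \forall \psi \in \mathrm{atlas}_{\pi(z)}(M),\ \forall q \in \mathrm{Dom}(\psi),\quad
   \mathrm{Extn}_\psi(z)(q) = \sum_{i=1}^n \Phi(\psi)(z)^i \, e^\psi_i(q), \]

where $n = \dim(M)$ and $\pi : T(M) \to M$ is the projection map for $T(M)$. In other words,

\[ \forall \psi \in \mathrm{atlas}(M),\ \forall z \in \pi^{-1}(\mathrm{Dom}(\psi)),\quad
   \mathrm{Extn}_\psi(z) = \sum_{i=1}^n \Phi(\psi)(z)^i \, e^\psi_i. \]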


57.1.22 REMARK: Properties of constant vector field extensions. 
Theorem 57.1.23 asserts some basic properties of constant vector field extensions. (See Definition 21.6.8 for
constant cross-section extensions for non-topological fibrations.)


It is shown in Theorem 57.2.20 (ii) that $\mathrm{Extn}_\psi(z) \in X^k(T(M) \,|\, \mathrm{Dom}(\psi))$ for any $C^{k+1}$ manifold $M$, for all
$z \in T(M)$ and $\psi \in \mathrm{atlas}_{\pi(z)}(M)$.


57.1.23 THEOREM: Some useful properties of constant vector field extensions. 
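Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

\[ \forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall q \in \mathrm{Dom}(\psi),\quad
   \mathrm{Extn}_\psi(t_{p,v,\psi})(q) = t_{q,v,\psi}. \qquad (57.1.4) \]

Hence

\[ \forall z \in T(M),\ \forall \psi \in \mathrm{atlas}_{\pi(z)}(M),\ \forall q \in \mathrm{Dom}(\psi),\quad
   \mathrm{Extn}_\psi(z)(q) = t_{q,\Phi(\psi)(z),\psi}. \qquad (57.1.5) \]

Therefore

\[ \forall z \in T(M),\ \forall \psi \in \mathrm{atlas}_{\pi(z)}(M),\ \forall q \in \mathrm{Dom}(\psi),\quad
   \Phi(\psi)(\mathrm{Extn}_\psi(z)(q)) = \Phi(\psi)(z). \qquad (57.1.6) \]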
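So

\[ \forall z \in T(M),\ \forall \psi \in \mathrm{atlas}_{\pi(z)}(M),\ \forall q \in \mathrm{Dom}(\psi),\quad
   \mathrm{Extn}_\psi(z)(q) = \bigl(\Phi(\psi)\big|_{T_q(M)}\bigr)^{-1}(\Phi(\psi)(z)). \qquad (57.1.7) \]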
Furthermore, 


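\[ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall i \in \mathbb{N}_n,\quad
   \mathrm{Extn}_\psi(e^\psi_i(p)) = e^\psi_i. \qquad (57.1.8) \]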


PROOF: Lines (57.1.4), (57.1.5), (57.1.6) and (57.1.7) follow from Definition 57.1.20, Notations 57.1.21
and 54.5.7, and Theorem 54.4.11. For line (57.1.8), it follows from Notation 57.1.21 that Extn, (e?”) = 


Di DEY Fe? = 377. 1e? = ef by Notations 54.4.10 and 54.5.7, and Definition 22.7.9. 
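
A constant vector field extension is “constant” only with respect to the chosen chart. The following sketch illustrates this under assumptions which are not from the text: the manifold is the plane with the non-positive horizontal axis removed, the chart is the polar coordinate chart, and vectors are represented by their Cartesian components. All function names are hypothetical.

```python
import math

def polar_chart(p):
    # Polar coordinates (r, theta) of a point of the slit plane.
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def chart_basis(q):
    # Cartesian components of the polar chart-basis vectors at q.
    r, theta = polar_chart(q)
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return e_r, e_theta

def chart_components(p, z):
    # Solve z = c1 * e_r(p) + c2 * e_theta(p) for (c1, c2) by Cramer's rule.
    (a, c), (b, d) = chart_basis(p)
    det = a * d - b * c
    return ((z[0] * d - b * z[1]) / det, (a * z[1] - c * z[0]) / det)

def constant_extension(p, z):
    # Keep the chart components of z fixed and vary only the base point.
    c1, c2 = chart_components(p, z)
    def field(q):
        e_r, e_theta = chart_basis(q)
        return (c1 * e_r[0] + c2 * e_theta[0],
                c1 * e_r[1] + c2 * e_theta[1])
    return field

p = (1.0, 0.0)
z = (0.0, 1.0)
field = constant_extension(p, z)
q = (0.0, 2.0)
```

In this sketch, `field(p)` recovers `z`, and the polar chart components of `field(q)` agree with those of `z`, in the spirit of line (57.1.6), although the Cartesian components of `field(q)` differ from those of `z`.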


57.2. Differentiability of vector fields


57.2.1 NOTATION: Spaces of differentiable vector fields.
$X^k(T(M))$, for a $C^{k+1}$ manifold $M$, for $k \in \overline{\mathbb{Z}}_0^+$, denotes the set of $C^k$ vector fields on $M$.


57.2.2 REMARK: Testing the differentiability of a vector field. 

It is not necessary to provide an additional definition for the differentiability of vector fields in Notation 57.2.1.
The $C^k$ differentiability of a map between two $C^k$ manifolds is already defined in Definition 52.1.2 for
any $k \in \overline{\mathbb{Z}}_0^+$, and by Theorem 54.5.28 (iii), $T(M)$ is a $C^k$ manifold if $M$ is a $C^{k+1}$ manifold. Since a vector
field on $M$ is a map from $M$ to $T(M)$, it is therefore a map between two $C^k$ manifolds. So Definition 52.1.2
is applicable.

According to Definition 52.1.2, a map $X : M \to T(M)$ is $C^k$ differentiable for $k \in \overline{\mathbb{Z}}_0^+$ when $\eta \circ X \circ \psi^{-1}$ is a
$C^k$ Cartesian space map for any $\psi \in A_M = \mathrm{atlas}(M)$ and $\eta \in A_{T(M)} = \mathrm{atlas}(T(M))$. By Theorem 52.1.11,
this test only needs to succeed for a single pair of charts $\psi \in \mathrm{atlas}_p(M)$ and $\eta \in \mathrm{atlas}_{X(p)}(T(M))$ for
each $p \in M$. So $\eta$ may be chosen to be $\Psi(\psi)$. (See Notation 54.5.21 for $\Psi$.) Thus the $C^k$ differentiability of
$X$ will follow from the $C^k$ differentiability of $\Psi(\psi) \circ X \circ \psi^{-1}$ for all $\psi \in \mathrm{atlas}(M)$. (See Figure 57.2.1.)


Figure 57.2.1 Differentiability test for a vector field


The manifold atlas $A_{T(M)}$ for the tangent bundle total space $T(M)$ of $M$ is defined in Definition 54.5.16. A
chart $\eta = \Psi(\psi) \in A_{T(M)}$ is associated with each chart $\psi \in A_M$.


57.2.3 THEOREM: Tests for differentiability of vector fields via the charts. 
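Let $M$ be a $C^{k+1}$ manifold for some $k \in \overline{\mathbb{Z}}_0^+$. Let $n = \dim(M)$. Then

\[ X^k(T(M)) = \{X \in X(T(M));\ \forall \psi \in \mathrm{atlas}(M),\ \Psi(\psi) \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}^{2n})\}
             = \{X \in X(T(M));\ \forall \psi \in \mathrm{atlas}(M),\ \Phi(\psi) \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}^n)\}. \]


PROOF: The equality for the manifold charts $\Psi(\psi)$ follows from Notations 57.2.1, 57.1.5 and 54.5.21, and
Definition 52.1.2. Then the equality for the velocity charts $\Phi(\psi)$ follows from Theorem 57.1.7 line (57.1.1)
and the observation that the first component $\psi \circ \psi^{-1} = \mathrm{id}_{\mathrm{Range}(\psi)}$ of $\Psi(\psi) \circ X \circ \psi^{-1}$ is necessarily $C^k$ differentiable.
Therefore $\Psi(\psi) \circ X \circ \psi^{-1} = (\psi \times (\Phi(\psi) \circ X)) \circ \psi^{-1}$ is $C^k$ if and only if $\Phi(\psi) \circ X \circ \psi^{-1}$ is $C^k$.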
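
As a worked illustration of the chart test in Theorem 57.2.3, consider the following sketch. It assumes (these choices are not from the text) that $M = \mathbb{R}$ with $\mathrm{atlas}(M) = \{\psi\}$ for the identity chart $\psi$, and it writes $t_{p,v,\psi}$ for the tangent vector at $p$ whose component with respect to $\psi$ is $v$.

```latex
\begin{align*}
X(p) &= t_{p,|p|,\psi}
  && \text{for all } p \in M = \mathbb{R}, \\
(\Phi(\psi) \circ X \circ \psi^{-1})(x) &= |x|
  && \text{for all } x \in \mathrm{Range}(\psi) = \mathbb{R}, \\
(\Psi(\psi) \circ X \circ \psi^{-1})(x) &= (x, |x|)
  && \text{for all } x \in \mathrm{Range}(\psi) = \mathbb{R}.
\end{align*}
```

Since the map $x \mapsto |x|$ is continuous but not differentiable at $0$, this vector field would satisfy the chart test for $k = 0$ but not for $k = 1$. Under the stated assumptions, $X \in X^0(T(M))$ and $X \notin X^1(T(M))$.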


57.2.4 REMARK: $C^k$ differentiability of vector fields via $C^{k+1}$ compatible charts.
It is sometimes useful to know that $C^k$ cross-sections have $C^k$ Cartesian space maps via general $C^{k+1}$ compatible
charts. This is shown in Theorem 57.2.5.


57.2.5 THEOREM: Tests for differentiability of vector fields via compatible charts.
Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M$ be a $C^{k+1}$ manifold with $n = \dim(M)$. Then

\[ X^k(T(M)) = \{X \in X(T(M));\ \forall \psi \in \mathrm{atlas}^{k+1}(M),\ \Psi(\psi) \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}^{2n})\}
             = \{X \in X(T(M));\ \forall \psi \in \mathrm{atlas}^{k+1}(M),\ \Phi(\psi) \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}^n)\}. \]


PROOF: The assertion follows from Theorems 57.2.3 and 52.1.15. 


57.2.6 REMARK: Test-function-based alternative definitions of differentiability of vector fields.

Many authors define vector fields to be differentiable when their action on differentiable real-valued test- 
functions yields differentiable output-functions. (See for example Bishop/Goldberg [3], page 117; Bishop/ 
Crittenden [2], page 13; Szekeres [305], page 422; Kobayashi/Nomizu [19], page 5; EDM2 [113], page 386. 
Differentiability of the manifold map X : M — T(M), or differentiability of the coordinate expression for 
this map, is given as the definition by Do Carmo [9], page 25; Spivak [37], Volume 1, page 82; Choquet- 
Bruhat [6], page 21; Gallot/Hulin/Lafontaine [13], page 18; Bump [57], page 39; Flanders [11], page 54; 
Frankel [12], page 30; Gómez-Ruiz [14], page 65; Sulanke/Wintgen [40], page 47; Malliavin [28], page 70.) 
The test-function style of definition is convenient and elegant, but it has various “issues”. The test-function 
approach gives the wrong answer for some kinds of manifolds. It is also not entirely trivial to show that 
the answer is correct for manifolds which do have the right properties. (See Theorem 61.1.5.) Although 
the test-function approach conveniently avoids explicit discussion of coordinate charts, it is not immediately 
obvious that it is the best generalisation of the classical notion of differentiability. (For more discussion of 
this issue, see Remark 61.1.4.) 


57.2.7 REMARK: The linear space of $C^k$ vector fields on a manifold.

The linear space of $C^k$ vector fields in Definition 57.2.8 is particularly useful when it is combined with the
Lie bracket in the case $k = \infty$ to define the Lie algebra of vector fields in Definition 61.5.16. Theorem 57.2.9
is the $C^k$ version of Theorem 57.1.6.


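57.2.8 DEFINITION: The linear space of $C^k$ vector fields on a $C^{k+1}$ manifold $M$ for $k \in \overline{\mathbb{Z}}_0^+$ is the tuple
$X^k(T(M)) \mathrel{-\!\!<} (\mathbb{R}, X^k(T(M)), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_{X^k(T(M))}, \mu)$, where $\mathbb{R} \mathrel{-\!\!<} (\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ is the field of real numbers,
$\sigma_{X^k(T(M))} : X^k(T(M)) \times X^k(T(M)) \to X^k(T(M))$ is the operation of pointwise addition on $X^k(T(M))$
(using the linear space addition of $T_p(M)$ for all $p \in M$), and $\mu : \mathbb{R} \times X^k(T(M)) \to X^k(T(M))$ is the
pointwise product operation of $\mathbb{R}$ on $X^k(T(M))$ (also using the linear space structure of $T_p(M)$ for all $p \in M$).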


57.2.9 THEOREM: The set of $C^k$ vector fields on a manifold is a real linear space.
Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M$ be a $C^{k+1}$ manifold.


(i) $X^k(T(M))$ is a linear space of cross-sections on the vector bundle $T(M)$.
(ii) $X^k(T(M))$ is a well-defined real linear space.


PROOF: Part (i) follows from Definition 24.11.4 and Theorem 54.5.34. 
Part (ii) follows from Theorem 24.11.5 (iii) and Definition 24.11.2. 


57.2.10 REMARK: Application of differentiability of products of real-valued functions and vector fields. 
Theorem 57.2.11 is applicable to the proof of a Leibniz rule for products of real-valued functions and vector
fields in Theorem 61.3.3. (See Theorem 65.2.4 for the generalisation of Theorem 57.2.11 to vector bundles.) 


57.2.11 THEOREM: Differentiability of the product of a real-valued function and a vector field.
Let $k \in \overline{\mathbb{Z}}_0^+$. Let $M$ be a $C^{k+1}$ manifold. Then


$\forall f \in C^k(M, \mathbb{R}),\ \forall X \in X^k(T(M)),\quad f.X \in X^k(T(M))$,
where $f.X : M \to T(M)$ denotes the pointwise product $f.X : p \mapsto f(p)X(p)$.


PROOF: Let $p \in M$. Then $(f.X)(p) \in T_p(M)$ because $X \in X(T(M))$. Therefore $f.X \in X(T(M))$ by
Notation 57.1.5. Let $\psi \in \mathrm{atlas}_p(M)$, $\Omega = \mathrm{Range}(\psi)$ and $n = \dim(M)$. Then $\Phi(\psi) \circ X \circ \psi^{-1} \in C^k(\Omega, \mathbb{R}^n)$
by Theorem 57.2.3, and $f \circ \psi^{-1} \in C^k(\Omega, \mathbb{R})$ by Notation 51.6.3. So by Theorem 42.6.11, the pointwise
product $(f \circ \psi^{-1}).(\Phi(\psi) \circ X \circ \psi^{-1})$ satisfies $(f \circ \psi^{-1}).(\Phi(\psi) \circ X \circ \psi^{-1}) \in C^k(\Omega, \mathbb{R}^n)$. But this is
equal to the map $\Phi(\psi) \circ (f.X) \circ \psi^{-1}$ by Theorem 54.5.8. Thus $\Phi(\psi) \circ (f.X) \circ \psi^{-1} \in C^k(\Omega, \mathbb{R}^n)$. Hence
$f.X \in X^k(T(M))$ by Theorem 57.2.3.


57.2.12 REMARK: The zero cross-section. 

The zero cross-section in Definition 57.2.13 is the zero element of the linear space of $C^k$ vector fields on any
$C^{k+1}$ manifold. It conveniently shows the existence of at least one global cross-section of the tangent bundle.
This is also trivially true for any vector bundle, but not for every class of fibre bundle. (Global cross-sections 
of principal bundles, for example, are topologically "interesting", in the sense that their existence strongly 
constrains the global topology of the bundle.) 


57.2.13 DEFINITION: The zero vector field of a $C^1$ manifold $M$ is the map $X : M \to T(M)$ defined by
$X(p) = 0_{T_p(M)}$ for all $p \in M$.


Alternative name: zero cross-section. 


57.2.14 THEOREM: Well-definition and differentiability of the zero vector field.
Let $X$ be the zero vector field of a $C^{k+1}$ manifold $M$ with $k \in \mathbb{Z}^+_0$. Then $X \in X^k(T(M))$.


PROOF: Since $0_{T_p(M)} \in T_p(M)$ for all $p \in M$, it follows from Definition 57.1.2 and Notation 57.1.5
that $X \in X(T(M))$. Let $\psi \in \mathrm{atlas}(M)$. Then $X(\psi^{-1}(x)) = 0_{T_p(M)} = t_{p,0,\psi}$ for all $p \in \mathrm{Dom}(\psi)$, where $x = \psi(p)$. Therefore
$\Phi(\psi)(X(\psi^{-1}(x))) = 0_{\mathbb{R}^n}$ for all $p \in \mathrm{Dom}(\psi)$ by Notation 54.5.7, where $n = \dim(M)$. Since this is a constant
function, $\Phi(\psi) \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}^n)$ by Theorem 42.6.2. It then follows from Theorem 57.2.3
that $X \in X^k(T(M))$.


57.2.15 REMARK:  Differentiability of zero/constant pair-valued functions. 
Theorem 57.2.16 is a technical assertion which is useful for demonstrating the differentiability of certain 
kinds of vector fields on differentiable fibre bundles. 


57.2.16 THEOREM: Differentiability of direct products of zero vector fields and constant maps.
For $k \in \mathbb{Z}^+_0$, let $M_1$ be a $C^{k+1}$ manifold, and let $M_2$ be a $C^k$ manifold. For $q \in M_2$, define $\phi^q : M_1 \to
T(M_1) \times M_2$ by $\phi^q : p \mapsto (0_p, q)$, where $0_p$ denotes the vector $0_{T_p(M_1)}$ for $p \in M_1$. Then


$\forall q \in M_2,\quad \phi^q \in C^k(M_1, T(M_1) \times M_2).$


PROOF: Let $q \in M_2$. Define $\phi_1 : M_1 \to T(M_1)$ and $\phi_2 : M_1 \to M_2$ by $\phi_1 : p \mapsto 0_p$ and $\phi_2 : p \mapsto q$.
Then $\phi_1 \in C^k(M_1, T(M_1))$ by Theorem 57.2.14, and $\phi_2 \in C^k(M_1, M_2)$ by Theorem 52.1.9. Consequently
$\phi^q = \phi_1 \times \phi_2 \in C^k(M_1, T(M_1) \times M_2)$ by Theorem 52.6.13.


57.2.17 REMARK: Spaces of differentiable local vector fields. 

Notations 57.2.18 and 57.2.19 are the differentiable versions of Notations 57.1.11 and 57.1.13 for spaces of
local vector fields. The notation $X^k(T(M)|S)$ for subdomains $S$ which are general sets is not defined because
$C^k$ differentiability is meaningful only on open subdomains. (The two set-expressions for $X^k(T(M)|U)$ in
Notation 57.2.18 may be justified as in the proof of Theorem 57.2.3. See Notation 64.7.3 for the generalisation
of Notations 57.2.1, 57.2.18 and 57.2.19 to general $C^k$ differentiable fibrations.)


57.2.18 NOTATION: Spaces of differentiable local vector fields on a specified open subdomain.
$X^k(T(M)|U)$, for $k \in \mathbb{Z}^+_0$, a $C^{k+1}$ manifold $M$, and $U \in \mathrm{Top}(M)$, denotes the set of $C^k$ local vector fields
on $U$. In other words, with $n = \dim(M)$,


$X^k(T(M)|U) = \{X \in X(T(M)|U);\ \forall \psi \in \mathrm{atlas}(M),\ \Psi(\psi) \circ X \circ \psi^{-1} \in C^k(\psi(U), \mathbb{R}^{2n})\}$
$= \{X \in X(T(M)|U);\ \forall \psi \in \mathrm{atlas}(M),\ \Phi(\psi) \circ X \circ \psi^{-1} \in C^k(\psi(U), \mathbb{R}^n)\}.$


57.2.19 NOTATION: Spaces of differentiable local vector fields with unspecified open subdomains.
$X^k_{\mathrm{loc}}(T(M))$, for $k \in \mathbb{Z}^+_0$ and a $C^{k+1}$ manifold $M$, denotes the set of all $C^k$ local vector fields on open subsets
of $M$. In other words,


$\forall k \in \mathbb{Z}^+_0,\quad X^k_{\mathrm{loc}}(T(M)) = \bigcup_{U \in \mathrm{Top}(M)} X^k(T(M)|U).$


57.2.20 THEOREM: Differentiability of chart-basis vector fields and locally constant vector fields.
Let $k \in \mathbb{Z}^+_0$. Let $M$ be a $C^{k+1}$ manifold. Let $\psi \in \mathrm{atlas}(M)$. Let $n = \dim(M)$.


(i) $\forall i \in \mathbb{N}_n,\ e^\psi_i \in X^k(T(M)|\mathrm{Dom}(\psi))$.
(ii) $\forall p \in \mathrm{Dom}(\psi),\ \forall z \in T_p(M),\ \mathrm{Extn}_\psi(z) \in X^k(T(M)|\mathrm{Dom}(\psi))$.


PROOF: For part (i), let $\psi \in \mathrm{atlas}(M)$ and $i \in \mathbb{N}_n$. Then $(\Phi(\psi) \circ e^\psi_i \circ \psi^{-1})(x) = e_i$ for all $x \in
\mathrm{Range}(\psi)$ by Definition 57.1.18, Notations 54.4.10 and 54.5.7, and Definition 22.7.9. So $\Phi(\psi) \circ e^\psi_i \circ \psi^{-1} :
\mathrm{Range}(\psi) \to \mathbb{R}^n$ is constant. Therefore $\Phi(\psi) \circ e^\psi_i \circ \psi^{-1} \in C^\infty(\mathrm{Range}(\psi), \mathbb{R}^n)$ by Theorem 42.6.2. Hence
$e^\psi_i \in X^k(T(M)|\mathrm{Dom}(\psi))$ by Notation 57.2.18.

Part (ii) follows from part (i) and Definition 57.1.20 because a constant linear combination of $C^k$ functions
is a $C^k$ function. (Alternatively note that a constant linear combination of constant functions is a constant
function.)


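57.2.21 THEOREM: Differentiability of vector fields with differentiable vector velocity parameter.

Let $k \in \mathbb{Z}^+_0$. Let $M$ be a $C^{k+1}$ manifold. Let $\psi \in \mathrm{atlas}(M)$. Let $U \in \mathrm{Top}(M)$ with $U \subseteq \mathrm{Dom}(\psi)$.
Let $f \in C^k(U, \mathbb{R}^n)$, where $n = \dim(M)$. Define $Y : U \to T(M)$ by $\forall p \in U,\ Y(p) = t_{p,f(p),\psi}$. Then
$Y \in X^k(T(M)|U)$.


PROOF: By Notation 57.1.11, $Y \in X(T(M)|U)$ because $Y(p) \in T_p(M)$ for all $p \in U$. To test $Y$ for
differentiability, let $F = \Psi(\psi) \circ Y \circ \psi^{-1} : \psi(U) \to \psi(U) \times \mathbb{R}^n$. Then


$\forall x \in \psi(U),\quad F(x) = \Psi(\psi)(Y(\psi^{-1}(x)))$
$= \Psi(\psi)(t_{\psi^{-1}(x),\,f(\psi^{-1}(x)),\,\psi})$
$= (x, f(\psi^{-1}(x)))$
$= (\mathrm{id}_{\psi(U)} \times (f \circ \psi^{-1}))(x).$


By Theorem 42.5.21 (ii), $\mathrm{id}_{\psi(U)} \in C^k(\psi(U), \mathbb{R}^n)$, and by Definition 51.7.2, $f \circ \psi^{-1} \in C^k(\psi(U), \mathbb{R}^n)$. So
$F \in C^k(\psi(U), \mathbb{R}^{2n})$ by Theorems 52.1.8 and 42.6.8. Hence $Y \in C^k(U, T(M))$ by Notation 57.2.18.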
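
As an illustration of Theorem 57.2.21, here is a worked instance under assumed data that does not come from the text: take $M = \mathbb{R}^2$ with the identity chart $\psi$, take $U = M$, and choose the hypothetical velocity parameter $f(p) = (-p_2, p_1)$. The chart-level map $F$ of the proof then has an explicit form:

```latex
\begin{align*}
Y(p) &= t_{p,\,(-p_2,\,p_1),\,\psi}
  && \text{for all } p = (p_1, p_2) \in \mathbb{R}^2, \\
F(x) &= \bigl(x,\ f(\psi^{-1}(x))\bigr) = (x_1,\ x_2,\ -x_2,\ x_1)
  && \text{for all } x = (x_1, x_2) \in \mathbb{R}^2 .
\end{align*}
```

Each component of $F$ is linear in $x$, so $F$ is $C^k$ for every $k$, which is the differentiability test applied in the proof. In this example $Y$ is the familiar rotation vector field on the plane.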


57.3. Tangent operator fields 


57.3.1 REMARK:  Distinct notations for vector fields and the corresponding differential operator fields. 
Many texts use the same notation for a vector field $X$ and the differential action $\partial_X$ of such a vector field.
Since this book is being more careful about such definitions, a separate set of notations is presented in 
Section 57.3 for tangent operator fields. 


It is certainly very convenient to be able to say that tangent vectors and tangent operators are the same 
thing. Then one does not need to convert between them, and many formulas have a simpler appearance when 
tangent vectors are identified with the corresponding operators. However, clarity and precision of thought 
are also important, perhaps more important than convenience. The popular vector field formalism adopts 
tangent operator fields as a primary object class. But many out-of-the-ordinary, degenerate or pathological 
situations require more careful analysis. In some applications, the manifolds are not necessarily smooth, and 
they may have "interesting" boundaries. 


57.3.2 REMARK: Spaces of cross-sections of untagged and tagged tangent operator bundles. 

In Notations 57.1.5 and 57.3.4, $X(T(M))$ and $X(\hat{T}(M))$ are spaces of cross-sections of the corresponding
tangent fibrations $T(M)$ and $\hat{T}(M)$, but $X(\mathring{T}(M))$ in Notation 57.3.3 is not a space of cross-sections because
the set $\mathring{T}(M)$ of untagged tangent operators is not a fibration. However, all of the spaces in Notations 57.3.3
and 57.3.4 may be regarded as linear spaces with respect to pointwise vector addition and scalar multiplication
similarly to Definition 57.2.8.


57.3.3 NOTATION: Spaces of “cross-sections” of untagged tangent operator “bundles”.
$X(\mathring{T}(M))$ for a $C^1$ manifold $M$ denotes the set of maps $X : M \to \mathring{T}(M)$ such that $X(p) \in \mathring{T}_p(M)$ for
all $p \in M$.

$X^k(\mathring{T}(M))$ for a $C^{k+1}$ manifold $M$ with $k \in \mathbb{Z}^+_0$ denotes the set $\{X \in X(\mathring{T}(M));\ X \text{ is } C^k\}$ of $C^k$ tangent
operator fields in $X(\mathring{T}(M))$.


57.3.4 NOTATION: Spaces of cross-sections of tagged tangent operator bundles.
$X(\hat{T}(M))$ for a $C^1$ manifold $M$ denotes the set of maps $X : M \to \hat{T}(M)$ such that $X(p) \in \hat{T}_p(M)$ for
all $p \in M$.

$X^k(\hat{T}(M))$ for a $C^{k+1}$ manifold $M$ with $k \in \mathbb{Z}^+_0$ denotes the set $\{X \in X(\hat{T}(M));\ X \text{ is } C^k\}$ of $C^k$ tagged
tangent operator fields in $X(\hat{T}(M))$.


57.3.5 DEFINITION: The chart-basis operator fields of a $C^1$ manifold $M$ with respect to a chart $\psi$ are the
maps $\partial^\psi_i : \mathrm{Dom}(\psi) \to \mathring{T}(M)$ defined by


$\partial^\psi_i(p) = \partial^{p,\psi}_i$


for all $p \in \mathrm{Dom}(\psi)$. (See Definition 54.13.4 and Notation 54.13.5 for the chart-basis operators $\partial^{p,\psi}_i$.)


57.4. Vector-tuple fields 


57.4.1 REMARK: Chart-basis vector-tuple fields. 

For the benefit of the short-cut differential forms used for the exterior derivative in Section 61.10, it is 
convenient to give definitions and notations here for chart-basis vector-tuple fields, which are based on the 
chart-basis vector fields in Definition 57.1.18. 


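57.4.2 DEFINITION: The chart-basis vector-tuple fields of degree $m \in \mathbb{Z}^+_0$ on a $C^1$ manifold $M$ for a chart
$\psi \in \mathrm{atlas}(M)$ are vector $m$-tuple fields in $X(T^m(M)|\mathrm{Dom}(\psi))$ defined by $p \mapsto (e^{p,\psi}_{i_k})_{k=1}^m$ for $i \in \mathbb{N}_n^m$. (See
Notation 54.4.10 for $e^{p,\psi}_i$.)


Alternative indexation: $p \mapsto (e^{p,\psi}_{i_k})_{k=0}^{m-1}$ for $i : \mathbb{Z}[0, m-1] \to \mathbb{N}_n$.


57.4.3 NOTATION: Chart-basis vector-tuple fields for specified charts and index-tuples.

$e^\psi_i$, for a chart $\psi \in \mathrm{atlas}(M)$ on a $C^1$ manifold $M$ with $n = \dim(M)$, and $i \in \mathbb{N}_n^m$ for some $m \in \mathbb{Z}^+_0$, denotes
the chart-basis vector-tuple field $p \mapsto (e^{p,\psi}_{i_k})_{k=1}^m$ for $p \in \mathrm{Dom}(\psi)$. In other words, $e^\psi_i \in X(T^m(M)|\mathrm{Dom}(\psi))$
is the vector field defined by


$\forall p \in \mathrm{Dom}(\psi),\quad e^\psi_i(p) = e^{p,\psi}_i = (e^{p,\psi}_{i_k})_{k=1}^m.$


Alternative indexation: $\forall p \in \mathrm{Dom}(\psi),\ e^\psi_i(p) = (e^{p,\psi}_{i_k})_{k=0}^{m-1}$ for $i : \mathbb{Z}[0, m-1] \to \mathbb{N}_n$.


(See Definition 55.5.18 and Notation 55.5.19 for $e^{p,\psi}_i$ for $i \in \mathbb{N}_n^m$ or $i : \mathbb{Z}[0, m-1] \to \mathbb{N}_n$.)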


57.4.4 REMARK: Chart-dependent “constant extensions” of vector-tuples to vector-tuple fields. 
The “constant extensions” of vector-tuples in Notation 57.4.5 are fields of vector-tuples, not tuples of vector
fields. Therefore it would be incorrect to write “$\mathrm{Extn}_\psi(V) = (\mathrm{Extn}_\psi(V_a))_{a=0}^{m-1}$”. (See Definition 57.1.20 and
Notation 57.1.21 for constant extensions of vectors $V_a$ to vector fields $\mathrm{Extn}_\psi(V_a)$.) To avoid ambiguity and
possible confusion, the corresponding vector-field tuple is denoted as $\mathrm{Extn}^T_\psi(V)$ in Notation 57.4.9.


57.4.5 NOTATION: Extensions of vector-tuples to “constant” vector-tuple-valued fields. 

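$\mathrm{Extn}_\psi(V)$, for a tangent vector $m$-tuple $V = (V_a)_{a=0}^{m-1} \in T_p(M)^m$, where $M$ is a $C^1$ manifold, $m \in \mathbb{Z}^+_0$, $p \in M$
and $\psi \in \mathrm{atlas}_p(M)$, denotes the map $q \mapsto (\mathrm{Extn}_\psi(V_a)(q))_{a=0}^{m-1}$, where $\mathrm{Extn}_\psi(V_a) \in X(T(M)|\mathrm{Dom}(\psi))$ is
the constant vector field extension of $V_a$ via $\psi$ for all $a \in \mathbb{Z}[0, m-1]$. In other words,


$\forall m \in \mathbb{Z}^+_0,\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V = (V_a)_{a=0}^{m-1} \in T_p(M)^m,\ \forall q \in \mathrm{Dom}(\psi),$
$\mathrm{Extn}_\psi(V)(q) = (\mathrm{Extn}_\psi(V_a)(q))_{a=0}^{m-1},$


where $\forall q \in \mathrm{Dom}(\psi),\ \forall a \in \mathbb{Z}[0, m-1],\ \mathrm{Extn}_\psi(V_a)(q) = \Phi_q(\psi)^{-1}(\Phi(\psi)(V_a))$. (See Notation 54.5.10 for $\Phi_q$.
See Definition 57.1.20 and Theorem 57.1.23 line (57.1.7) for extensions of vectors to “constant” vector fields.)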


57.4.6 REMARK: Some formulas for “constant” extensions of vector tuples to vector-tuple-fields. 
Theorem 57.4.7 is the vector-tuple version of Theorem 57.1.23, which gives some basic formulas for “constant”
extensions of vectors to vector fields. (See Notation 55.5.14 for $t^m_{p,v,\psi} \in T^m(M)$.)


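57.4.7 THEOREM: Some useful properties of locally constant vector-tuple fields.
Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then


$\forall m \in \mathbb{Z}^+_0,\ \forall p \in M,\ \forall v \in (\mathbb{R}^n)^m,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall q \in \mathrm{Dom}(\psi),$
$\mathrm{Extn}_\psi(t^m_{p,v,\psi})(q) = t^m_{q,v,\psi}.$ (57.4.1)
Hence
$\forall m \in \mathbb{Z}^+_0,\ \forall V \in T^m(M),\ \forall \psi \in \mathrm{atlas}_{\pi^m(V)}(M),\ \forall q \in \mathrm{Dom}(\psi),$
$\mathrm{Extn}_\psi(V)(q) = t^m_{q,\Phi^m(\psi)(V),\psi}.$ (57.4.2)
Therefore
$\forall m \in \mathbb{Z}^+_0,\ \forall V \in T^m(M),\ \forall \psi \in \mathrm{atlas}_{\pi^m(V)}(M),\ \forall q \in \mathrm{Dom}(\psi),$
$\Phi^m(\psi)(\mathrm{Extn}_\psi(V)(q)) = \Phi^m(\psi)(V).$ (57.4.3)
So
$\forall m \in \mathbb{Z}^+_0,\ \forall V \in T^m(M),\ \forall \psi \in \mathrm{atlas}_{\pi^m(V)}(M),\ \forall q \in \mathrm{Dom}(\psi),$
$\mathrm{Extn}_\psi(V)(q) = \Phi^m_q(\psi)^{-1}(\Phi^m(\psi)(V)).$ (57.4.4)
Furthermore,
$\forall m \in \mathbb{Z}^+_0,\ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall i \in \mathbb{N}_n^m,$
$\mathrm{Extn}_\psi(e^{p,\psi}_i) = e^\psi_i.$ (57.4.5)


PROOF: For line (57.4.1), $\mathrm{Extn}_\psi(t^m_{p,v,\psi})(q) = (\mathrm{Extn}_\psi(t_{p,v_a,\psi})(q))_{a=0}^{m-1} = (t_{q,v_a,\psi})_{a=0}^{m-1} = t^m_{q,v,\psi}$ by Notations
57.4.5 and 55.5.14, and Theorem 57.1.23 line (57.1.4).


For line (57.4.2), $\mathrm{Extn}_\psi(V)(q) = (\mathrm{Extn}_\psi(V_a)(q))_{a=0}^{m-1} = (t_{q,\Phi(\psi)(V_a),\psi})_{a=0}^{m-1} = t^m_{q,\Phi^m(\psi)(V),\psi}$ by Notations
57.4.5, 55.5.14 and 55.5.25, and Theorem 57.1.23 line (57.1.5). (Alternatively, line (57.4.2) follows from
line (57.4.1) and Notation 55.5.25.)


Line (57.4.3) follows from line (57.4.2) and Notation 55.5.25.
Line (57.4.4) follows from line (57.4.3).
For line (57.4.5), $\mathrm{Extn}_\psi(e^{p,\psi}_i)(q) = (\mathrm{Extn}_\psi(e^{p,\psi}_{i_a})(q))_{a=0}^{m-1} = (e^\psi_{i_a}(q))_{a=0}^{m-1} = (e^{q,\psi}_{i_a})_{a=0}^{m-1} = e^\psi_i(q)$ for all $q \in
\mathrm{Dom}(\psi)$ by Notation 57.4.5, Theorem 57.1.23 line (57.1.8), Definition 57.1.18 and Notation 57.4.3. Hence
$\mathrm{Extn}_\psi(e^{p,\psi}_i) = e^\psi_i$.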


57.4.8 REMARK:  Transposing “constant” vector-tuple fields to define “constant” vector-field tuples. 

The vector-field tuples in Notation 57.4.9 are transposes of the vector-tuple fields in Notation 57.4.5. This
is indicated by the superscript $T$. The vector-tuple fields in Notation 57.4.5 are map-valued maps of the
form $\mathrm{Extn}_\psi(V) : q \mapsto (a \mapsto \mathrm{Extn}_\psi(V_a)(q))$. The vector-field tuples in Notation 57.4.9 are map-valued
maps of the form $\mathrm{Extn}^T_\psi(V) : a \mapsto (q \mapsto \mathrm{Extn}_\psi(V_a)(q))$, which is the same as the vector-field-valued map
$\mathrm{Extn}^T_\psi(V) : a \mapsto \mathrm{Extn}_\psi(V_a)$, which is the same as the vector-field-family $\mathrm{Extn}^T_\psi(V) = (\mathrm{Extn}_\psi(V_a))_{a=0}^{m-1}$.
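
The distinction between a field of tuples and a tuple of fields can be made concrete in code. The sketch below is an illustration only, under simplifying assumptions that are not in the text: the manifold is modelled as a single coordinate chart, a tangent vector at a point q is modelled as the pair (q, components), and the names constant_extension, vector_tuple_field and vector_field_tuple are hypothetical.

```python
def constant_extension(components):
    """Chart-level 'constant' vector field: q -> (q, components).

    Assumption: a tangent vector is modelled as (base point, components).
    """
    return lambda q: (q, components)


def vector_tuple_field(V):
    """Field of tuples: a single map q -> (X_0(q), ..., X_{m-1}(q))."""
    fields = [constant_extension(Va) for Va in V]
    return lambda q: tuple(X(q) for X in fields)


def vector_field_tuple(V):
    """Tuple of fields: the m separate maps (X_0, ..., X_{m-1})."""
    return tuple(constant_extension(Va) for Va in V)


# Usage: the two objects carry the same data with the arguments swapped.
V = ((1.0, 0.0), (0.0, 2.0))
q = (5.0, 7.0)
as_field = vector_tuple_field(V)   # call with q first, then index by a
as_tuple = vector_field_tuple(V)   # index by a first, then call with q
print(as_field(q)[1], as_tuple[1](q))
```

The first object is called with a point and returns a tuple, whereas the second is a tuple whose entries are called with a point. This is the transposition that the superscript T records.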


57.4.9 NOTATION: Extensions of vector-tuples to “constant” vector-field tuples. 

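$\mathrm{Extn}^T_\psi(V)$, for a tangent vector $m$-tuple $V = (V_a)_{a=0}^{m-1} \in T_p(M)^m$, where $M$ is a $C^1$ manifold, $m \in \mathbb{Z}^+_0$, $p \in M$
and $\psi \in \mathrm{atlas}_p(M)$, denotes the vector-field tuple $(\mathrm{Extn}_\psi(V_a))_{a=0}^{m-1}$, where $\mathrm{Extn}_\psi(V_a) \in X(T(M)|\mathrm{Dom}(\psi))$
is the constant vector field extension of $V_a$ via $\psi$ for all $a \in \mathbb{Z}[0, m-1]$. In other words,


$\forall m \in \mathbb{Z}^+_0,\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V = (V_a)_{a=0}^{m-1} \in T_p(M)^m,$
$\mathrm{Extn}^T_\psi(V) = (\mathrm{Extn}_\psi(V_a))_{a=0}^{m-1},$


where $\forall q \in \mathrm{Dom}(\psi),\ \forall a \in \mathbb{Z}[0, m-1],\ \mathrm{Extn}_\psi(V_a)(q) = \Phi_q(\psi)^{-1}(\Phi(\psi)(V_a))$. (See Notation 54.5.10 for $\Phi_q$.
See Definition 57.1.20 and Theorem 57.1.23 line (57.1.7) for extensions of vectors to “constant” vector fields.)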


57.5. Tensor fields 


57.5.1 REMARK: The basic tensor field definition. 

The tensor fields in Definition 57.5.2 and Notation 57.5.3 are the obvious generalisation of the vector fields in 
Definition 57.1.2 and Notation 57.1.5. A tensor field is the same thing as a cross-section of the corresponding 
tensor fibration in Definition 56.3.7, which is based on the pointwise tensor space in Definition 56.1.3. 


57.5.2 DEFINITION: A tensor field of type $(r,s)$ on a subset $S$ of a $C^1$ manifold $M$ for $r,s \in \mathbb{Z}^+_0$ is a map
$Y : S \to T^{r,s}(M)$ such that $Y(p) \in T^{r,s}_p(M)$ for all $p \in S$.


57.5.3 NOTATION: $X(T^{r,s}(M))$, for $r,s \in \mathbb{Z}^+_0$ and a $C^1$ manifold $M$, denotes the set of tensor fields of
type $(r,s)$ on $M$. In other words,
$\forall r,s \in \mathbb{Z}^+_0,\quad X(T^{r,s}(M)) = \{Y : M \to T^{r,s}(M);\ \forall p \in M,\ Y(p) \in T^{r,s}_p(M)\}$
$= \{Y : M \to T^{r,s}(M);\ \pi^{r,s} \circ Y = \mathrm{id}_M\},$
where $\pi^{r,s} : T^{r,s}(M) \to M$ is the tangent vector projection map for $T^{r,s}(M)$ as in Definition 56.3.7.


57.5.4 NOTATION: $X(T^{r,s}(M)|S)$, for $r,s \in \mathbb{Z}^+_0$ and a subset $S$ of a $C^1$ manifold $M$, denotes the set of
tensor fields of type $(r,s)$ on $S$. In other words,
$\forall r,s \in \mathbb{Z}^+_0,\quad X(T^{r,s}(M)|S) = \{Y : S \to T^{r,s}(M);\ \forall p \in S,\ Y(p) \in T^{r,s}_p(M)\}$
$= \{Y : S \to T^{r,s}(M);\ \pi^{r,s} \circ Y = \mathrm{id}_S\},$
where $\pi^{r,s} : T^{r,s}(M) \to M$ is the tangent vector projection map for $T^{r,s}(M)$ as in Definition 56.3.7.


57.5.5 REMARK:  Differentiability of tensor fields. 
The differentiability of a tensor field may be tested in the same way as for vector fields as discussed in 
Remark 57.2.2. The maps and spaces for the tensor field differentiability test are illustrated in Figure 57.5.1. 


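[Diagram: the tensor field $Y : M \to T^{r,s}(M)$, the chart $\psi$ on $M$, the chart $\Psi^{r,s}(\psi)$ on $T^{r,s}(M)$, and the composite map $\Psi^{r,s}(\psi) \circ Y \circ \psi^{-1}$.]


Figure 57.5.1 Differentiability test for a tensor field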


Thus a $C^k$ tensor field of type $(r,s)$ on a $C^{k+1}$ manifold $M$, for $k \in \mathbb{Z}^+_0$ and $r,s \in \mathbb{Z}^+_0$, is a tensor field
$Y \in X(T^{r,s}(M))$ such that $Y : M \to T^{r,s}(M)$ is of class $C^k$ with respect to the manifold charts on
$M$ and $T^{r,s}(M)$. In other words, $Y \in X(T^{r,s}(M))$ is $C^k$ differentiable when $\Psi^{r,s}(\psi) \circ Y \circ \psi^{-1}$ is $C^k$
differentiable for all $\psi \in \mathrm{atlas}(M)$. So $C^k$ differentiability of tensor fields is well defined. (This is asserted
slightly more formally in Theorem 57.5.7.) Therefore Notation 57.5.6 is well defined.


57.5.6 NOTATION: $X^k(T^{r,s}(M))$, for $r,s \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0$ and a $C^{k+1}$ manifold $M$, denotes the set of $C^k$
tensor fields of type $(r,s)$ on $M$.


57.5.7 THEOREM: Through-the-charts differentiability criteria for tensor fields.
Let $Y \in X(T^{r,s}(M))$ be a tensor field on a $C^{k+1}$ manifold $M$ with $k \in \mathbb{Z}^+_0$ and $r,s \in \mathbb{Z}^+_0$. Then $Y \in
X^k(T^{r,s}(M))$ if and only if


$\forall \psi \in \mathrm{atlas}(M),\quad \Psi^{r,s}(\psi) \circ Y \circ \psi^{-1}$ is $C^k$ differentiable. (57.5.1)
In particular, if $Y \in X(T^*(M))$, then $Y \in X^k(T^*(M))$ if and only if
$\forall \psi \in \mathrm{atlas}(M),\quad \Psi^*(\psi) \circ Y \circ \psi^{-1}$ is $C^k$ differentiable. (57.5.2)


PROOF: Line (57.5.1) follows from Definitions 57.5.2, 56.3.23 and 52.1.2. Then line (57.5.2) follows from
Definition 55.4.8 and the identification of $T^*(M)$ with $T^{0,1}(M)$. (See for example Remarks 29.5.6 and 56.4.1
for this identification.)


57.5.8 REMARK: Tensor fields on open subsets of differentiable manifolds. 
See Definition 51.4.15 for the restriction of a manifold $M$ to an open subset $\Omega$ in Notation 57.5.9.


57.5.9 NOTATION: $X^k(T^{r,s}(\Omega))$, for $r,s \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0$, and $\Omega \in \mathrm{Top}(M)$ for a $C^{k+1}$ manifold $M$, denotes
the set of $C^k$ tensor fields of type $(r,s)$ on $\Omega$.


57.6. Differential forms


57.6.1 REMARK: Differential forms are antisymmetric covariant tensor fields. 

A differential form may be defined as a real-valued antisymmetric covariant tensor field. The base-point
manifold $M$ is required to be $C^1$ to ensure that $T(M)$ is well defined. The construction of $\Lambda_m(T_p(M))$
and $\Lambda_m(T_p(M), W)$ from $T_p(M)$ is purely algebraic for each $p \in M$. (See Notation 56.5.14 for the set
$\Lambda_m(T(M)) = \bigcup_{p \in M} \Lambda_m(T_p(M)) = \bigcup_{p \in M} \mathscr{L}^-_m(T_p(M))$ of antisymmetric covariant tensors of degree $m$ on a
$C^1$ manifold $M$, and Notation 56.7.4 for $\Lambda_m(T(M), W) = \bigcup_{p \in M} \Lambda_m(T_p(M), W) = \bigcup_{p \in M} \mathscr{L}^-_m(T_p(M), W)$
for general linear spaces $W$.)


57.6.2 DEFINITION: A (real-valued) differential form of degree $m \in \mathbb{Z}^+_0$ on a $C^1$ manifold $M$ is a cross-section
of the bundle $\Lambda_m(T(M))$ of antisymmetric covariant tensors of degree $m$ on $M$.


In other words, a differential form of degree $m$ on a $C^1$ manifold $M$ is a function $\omega : M \to \bigcup_{p \in M} \Lambda_m(T_p(M))$
such that $\omega(p) \in \Lambda_m(T_p(M))$ for all $p \in M$.


57.6.3 NOTATION: $X(\Lambda_m(T(M)))$, for $m \in \mathbb{Z}^+_0$ and a $C^1$ manifold $M$, denotes the set of real-valued
differential forms of degree $m$ on $M$. In other words,


$\forall m \in \mathbb{Z}^+_0,\quad X(\Lambda_m(T(M))) = \{Y : M \to \Lambda_m(T(M));\ \forall p \in M,\ Y(p) \in \Lambda_m(T_p(M))\}$
$= \{Y : M \to \Lambda_m(T(M));\ \pi^{\Lambda,m} \circ Y = \mathrm{id}_M\},$


where $\pi^{\Lambda,m} : \Lambda_m(T(M)) \to M$ is the projection map for $\Lambda_m(T(M))$ as in Definition 56.5.17.


57.6.4 NOTATION: $X(\Lambda_m(T(M))|S)$, for $m \in \mathbb{Z}^+_0$ and a subset $S$ of a $C^1$ manifold $M$, denotes the set of
real-valued differential forms of degree $m$ on $S$. In other words,


$\forall m \in \mathbb{Z}^+_0,\quad X(\Lambda_m(T(M))|S) = \{Y : S \to \Lambda_m(T(M));\ \forall p \in S,\ Y(p) \in \Lambda_m(T_p(M))\}$
$= \{Y : S \to \Lambda_m(T(M));\ \pi^{\Lambda,m} \circ Y = \mathrm{id}_S\},$


where $\pi^{\Lambda,m} : \Lambda_m(T(M)) \to M$ is the projection map for $\Lambda_m(T(M))$ as in Definition 56.5.17.


57.6.5 DEFINITION: A (vector-valued) differential form of degree $m$, valued in a linear space $W$, on a $C^1$
manifold $M$ for $m \in \mathbb{Z}^+_0$, is a cross-section of the bundle $\Lambda_m(T(M), W)$ of antisymmetric covariant tensors
of degree $m$ on $M$, valued in $W$.


In other words, a differential form of degree $m$, valued in a linear space $W$, on a $C^1$ manifold $M$, is a function
$\omega : M \to \bigcup_{p \in M} \Lambda_m(T_p(M), W)$ such that $\omega(p) \in \Lambda_m(T_p(M), W)$ for all $p \in M$.


57.6.6 NOTATION: $X(\Lambda_m(T(M), W))$, for $m \in \mathbb{Z}^+_0$, a $C^1$ manifold $M$ and a linear space $W$, denotes the
set of $W$-valued differential forms of degree $m$ on $M$. In other words,


$\forall m \in \mathbb{Z}^+_0,\quad X(\Lambda_m(T(M), W)) = \{Y : M \to \Lambda_m(T(M), W);\ \forall p \in M,\ Y(p) \in \Lambda_m(T_p(M), W)\}$
$= \{Y : M \to \Lambda_m(T(M), W);\ \pi^{\Lambda,m,W} \circ Y = \mathrm{id}_M\},$


where $\pi^{\Lambda,m,W} : \Lambda_m(T(M), W) \to M$ is the projection map for $\Lambda_m(T(M), W)$ as in Definition 56.7.6.


57.6.7 NOTATION: $X(\Lambda_m(T(M), W)|S)$, for $m \in \mathbb{Z}^+_0$, a subset $S$ of a $C^1$ manifold $M$ and a linear space $W$,
denotes the set of $W$-valued differential forms of degree $m$ on $S$. In other words,


$\forall m \in \mathbb{Z}^+_0,\quad X(\Lambda_m(T(M), W)|S) = \{Y : S \to \Lambda_m(T(M), W);\ \forall p \in S,\ Y(p) \in \Lambda_m(T_p(M), W)\}$
$= \{Y : S \to \Lambda_m(T(M), W);\ \pi^{\Lambda,m,W} \circ Y = \mathrm{id}_S\},$


where $\pi^{\Lambda,m,W} : \Lambda_m(T(M), W) \to M$ is the projection map for $\Lambda_m(T(M), W)$ as in Definition 56.7.6.


57.6.8 REMARK: “Differential forms” versus “multilinear forms”.

The term “differential form” arose historically from the fact that they were often written as “$dx$” or “$dy$”
or “$dx^i$” or “$df$”, or some such thing. Such expressions yielded tangent covector fields on differentiable
manifolds. So all such fields were called “differential forms”, since typically they were in fact constructed as
differentials. Then multilinear products such as “$dx\,dy$” would also be called “differential forms”.


The use of the word “form” no doubt derives from expressions such as “$u\,dx + v\,dy$” which are formulas or
“forms” by analogy with quadratic, polynomial or multinomial forms in algebra. The main problem with
this terminology is that very often the so-called “differential forms” are not in fact differentials of anything.
Strictly speaking, they are multilinear function fields or cross-sections of covariant tensor bundles. So the
term “multilinear form” would be more accurate than “differential form”, but the word “form” is also an
anachronism. The strongest argument in favour of the word “form” is that it is much shorter than “function
field” or “tensor bundle cross-section”. All things considered, the term “antisymmetric multilinear function
bundle cross-section” seems unlikely to become more popular than “form”.


57.6.9 REMARK: Antisymmetric multilinear forms are isomorphic to duals of wedge-product spaces.

For $p \in M$ and $m \in \mathbb{Z}^+_0$, the antisymmetric multilinear form space $\Lambda_m(T_p(M))$ is canonically linear-space
isomorphic to its double dual, $\Lambda_m(T_p(M))^{**} = \Lambda^m(T_p(M))^*$, which is the dual of the space $\Lambda^m(T_p(M))$ of
wedge-products of $m$-tuples of vectors in $T_p(M)$. (See Definition 30.4.8 and Notation 30.4.9.) This suggests
a useful interpretation of a differential form $\omega \in X(\Lambda_m(T(M)))$ as a function $\bar\omega \in X(\Lambda^m(T(M))^*)$, which
means that $\bar\omega(p)$ is a linear functional on $\Lambda^m(T_p(M))$ for each $p \in M$. This helps give geometric meaning
to differential forms. It associates a number $\bar\omega(p)(\lambda)$ with each area-element $\lambda \in \Lambda^m(T_p(M))$. Consequently
$\omega$ can be integrated over some $m$-dimensional surface, which suggests why it is called a “differential form”.
However, it is often mathematically more convenient to retain the antisymmetric multilinear form space
$\Lambda_m(T_p(M))$ as the primary definition for differential forms, as in Definition 57.6.2.


Similar comments apply to Definition 57.6.5, where the antisymmetric multilinear map bundle cross-sections
$\omega \in X(\Lambda_m(T(M), W))$ may be replaced by $\bar\omega \in X(\mathrm{Lin}(\Lambda^m(T(M)), W))$, where $\bar\omega(p)(\lambda) = \lambda(\omega(p))$ for all
$p \in M$ and $\lambda \in \Lambda^m(T_p(M))$. Then, by applying the short-cut concept in Section 57.7, $\bar\omega$ may be regarded
as a function from the bundle $\Lambda^m(T(M))$ to $W$. In other words, $\bar\omega$ maps area elements $\lambda \in \Lambda^m(T(M))$ to
vectors in $W$, linearly within the fibre set $\Lambda^m(T_p(M))$ at each point $p \in M$.


57.6.10 REMARK: Differential form expression in terms of coordinate array fields. 
Theorem 57.6.11 expresses real-valued differential forms $\omega$ in terms of coordinate array fields $\Phi^{\Lambda,m}(\psi) \circ \omega \circ
\psi^{-1}$ for given charts $\psi$.


57.6.11 THEOREM: Formulas for differential forms in terms of coordinate array fields.
Let $k \in \mathbb{Z}^+_0$. Let $M$ be a $C^1$ manifold. Let $m \in \mathbb{Z}^+_0$. Let $\omega \in X(\Lambda_m(T(M)))$. Let $\psi \in \mathrm{atlas}(M)$. Let
$a = \Phi^{\Lambda,m}(\psi) \circ \omega \circ \psi^{-1}$. Then


$\forall x \in \mathrm{Range}(\psi),\quad \omega(\psi^{-1}(x)) = t^{\Lambda,m}_{\psi^{-1}(x),\,a(x),\,\psi}$ (57.6.1)
$= \Psi^{\Lambda,m}(\psi)^{-1}(x, a(x)).$ (57.6.2)


PROOF: Line (57.6.1) follows from Notation 56.5.20. Line (57.6.2) follows from Notation 56.5.25.


57.6.12 REMARK:  Differentiability of differential forms on manifolds. 

The $C^k$ differential forms in Definition 57.6.13 and Notation 57.6.14 assume the differentiable structure which
is provided by the standard manifold charts $\Psi^{\Lambda,m}(\psi)$ for $\Lambda_m(T(M))$, and $\Psi^{\Lambda,m,W}(\psi)$ for $\Lambda_m(T(M), W)$.
(See Notation 56.5.25 for $\Psi^{\Lambda,m}$. See Notation 56.7.16 for $\Psi^{\Lambda,m,W}$.)


57.6.13 DEFINITION: A $C^k$ differential $m$-form on a $C^{k+1}$ manifold $M$, for $k \in \mathbb{Z}^+_0$ and $m \in \mathbb{Z}^+_0$, is a
cross-section $\omega \in X(\Lambda_m(T(M)))$ such that $\omega \in C^k(M, \Lambda_m(T(M)))$.

A $C^k$ $W$-valued differential $m$-form on a $C^{k+1}$ manifold $M$, for $k \in \mathbb{Z}^+_0$ and $m \in \mathbb{Z}^+_0$, for a finite-dimensional
real linear space $W$, is a cross-section $\omega \in X(\Lambda_m(T(M), W))$ such that $\omega \in C^k(M, \Lambda_m(T(M), W))$.

A $C^k$ (local) differential $m$-form on an open subset $U$ of a $C^{k+1}$ manifold $M$, for $k \in \mathbb{Z}^+_0$ and $m \in \mathbb{Z}^+_0$, is a
cross-section $\omega \in X(\Lambda_m(T(M))|U)$ such that $\omega \in C^k(U, \Lambda_m(T(M)))$.

A $C^k$ (local) $W$-valued differential $m$-form on an open subset $U$ of a $C^{k+1}$ manifold $M$, for $k \in \mathbb{Z}^+_0$ and
$m \in \mathbb{Z}^+_0$, for a finite-dimensional real linear space $W$, is a cross-section $\omega \in X(\Lambda_m(T(M), W)|U)$ such that
$\omega \in C^k(U, \Lambda_m(T(M), W))$.


57.6.14 NOTATION: $X^k(\Lambda_m(T(M)))$, for $m \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0$, and a $C^{k+1}$ manifold $M$, denotes the set of $C^k$
differential $m$-forms on $M$.

$X^k(\Lambda_m(T(M), W))$, for $m \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0$, a finite-dimensional real linear space $W$, and a $C^{k+1}$ manifold $M$,
denotes the set of $C^k$ differential $W$-valued $m$-forms on $M$.

$X^k(\Lambda_m(T(M))|U)$, for $m \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0$, and an open subset $U$ of a $C^{k+1}$ manifold $M$, denotes the set of
$C^k$ local differential $m$-forms on $U$.

$X^k(\Lambda_m(T(M), W)|U)$, for $m \in \mathbb{Z}^+_0$, $k \in \mathbb{Z}^+_0$, a finite-dimensional real linear space $W$, and an open subset
$U$ of a $C^{k+1}$ manifold $M$, denotes the set of $C^k$ local differential $W$-valued $m$-forms on $U$.


57.6.15 REMARK:  Differentiability of component array fields of differential forms. 
Theorem 57.6.16 shows that $C^k$ differential forms have $C^k$ component array fields.


57.6.16 THEOREM: Component array fields of differentiable differential forms are differentiable.
Let $k \in \mathbb{Z}^+_0$. Let $M$ be a $C^{k+1}$ manifold. Let $m \in \mathbb{Z}^+_0$. Let $\omega \in X^k(\Lambda_m(T(M)))$. Let $\psi \in \mathrm{atlas}(M)$. Let
$a = \Phi^{\Lambda,m}(\psi) \circ \omega \circ \psi^{-1}$. Then $a \in C^k(\mathrm{Range}(\psi), \mathbb{R}^{I^n_m})$.


PROOF: The $C^k$ differentiability $a \in C^k(\mathrm{Range}(\psi), \mathbb{R}^{I^n_m})$ of the component array field $a$ of $\omega$ for a chart
$\psi$ follows from Definitions 57.6.13 and 52.1.2 and Notation 56.5.25 line (56.5.13).


((2019-7-20. Sternberg [38] makes extensive use of an “interior product" i, which apparently has the property 
i(X)(A) = A o X for vector fields X and differential forms A. See Sternberg [38], pages 56-58; Frankel [12], 
pages 89-92; Bishop/Goldberg [3], pages 170-173. See also the to-do note at the end of Section 29.7. The 
interior product can be defined also for general vector bundles. )) 


57.7. Short-cut versions of differential forms 


57.7.1 REMARK: Informal short-cut versions of multilinear forms. 

In practice, cross-section definitions for differential forms are inconvenient. Although it is broadly agreed 
in principle that a differential form is a cross-section of a covariant tensor bundle of some kind, the formal 
definition of a cross-section of a non-topological fibration in Definition 21.3.3, or of a topological fibration in 
Definition 47.4.2, or of a tensor field in Definition 57.5.2, or of a differentiable fibration in Definition 64.7.2, 
requires that the cross-section must be a map from the base space to the total space of the bundle (or 
fibration). But in the case of covariant tensor bundles, a useful short-cut is available, and this is very often 
exploited, with or without explanation. (See Remark 58.11.6 for an example context where this kind of 
short-cut is used. See Section 21.4 for general cross-section short-cuts.) 


Let $\omega$ be a differential form of degree $m \in \mathbb{Z}^+_0$ on a $C^1$ manifold $M$. Then $\omega \in X(\Lambda_m(T(M)))$. According
to Notation 55.5.7, $T^m(M)$ denotes the total space of the tangent vector $m$-tuple bundle $(T^m(M), \pi^m, M)$
in Definition 55.5.37. So for each $p \in M$, $\omega(p)$ is an antisymmetric $m$-linear map from $T^m_p(M) = T_p(M)^m$
to $\mathbb{R}$ by Notations 30.4.3 and 30.1.10 and Definition 30.1.8. (This is the pedantically correct definition.)
Define $\bar\omega : T^m(M) \to \mathbb{R}$ by $\bar\omega(V) = \omega(p)(V) = \omega(p)(V_1, \ldots, V_m)$ for $V = (V_i)_{i=1}^m \in T^m_p(M)$ for all $p \in M$.
Then one may write $\bar\omega(V)$ as shorthand for $\omega(\pi^m(V))(V)$, where $\pi^m : T^m(M) \to M$ is the projection map
for $T^m(M)$. This shorthand has advantages in practical applications because the base point $p$ does not need
to be computed from $V$ and fed into $\omega$. (This is the expedient definition.) Thus


$\forall V \in T^m(M),\quad \bar\omega(V) = \omega(\pi^m(V))(V).$ (57.7.1)


Because of the convenience, most authors are content to use the short-cut differential form $\bar\omega : T^m(M) \to \mathbb{R}$
instead of the pedantically correct $\omega \in X(\Lambda_m(T(M)))$, and generally the same notation is used for these
technically different concepts.


It is possible to undo the short-cut. (This is demonstrated for cross-section short-cuts for general fibrations
in Theorem 21.4.13 (iv).) If $\bar\omega : T^m(M) \to \mathbb{R}$ has the required antisymmetric multilinearity properties on
the pointwise linear spaces $T^m_p(M)$, then $\omega$ may be recovered from $\bar\omega$ by the simple formula:


$\forall p \in M,\ \forall V \in T^m_p(M),\quad \omega(p)(V) = \bar\omega(V).$

In other words,


$\forall p \in M,\quad \omega(p) = \bar\omega\big|_{T^m_p(M)}.$


The short-cut procedure for the real-valued forms in Definition 57.6.2 may be applied to the vector-valued
forms in Definition 57.6.5 in exactly the same way. Then one defines $\bar\omega : T^m(M) \to W$ by the same formula
as in line (57.7.1) for a given differential form $\omega \in X(\Lambda_m(T(M), W))$. (The relevant spaces and maps are
illustrated in Figure 57.7.1.)


[Diagram: without the short-cut, $\omega$ maps $M$ to $\Lambda_m(T(M), W)$; with the short-cut, $\bar\omega$ maps $T^m(M)$ to $W$.]


Figure 57.7.1 Differential forms $\omega$ without the short-cut, and $\bar\omega$ with the short-cut


The differential form short-cut procedure is not restricted to antisymmetric covariant tensor fields. It may be
applied in the same way to cross-sections of all of the covariant tensor bundles outlined in Notations 56.4.9,
56.5.14, 56.6.3 and 56.7.4, such as $\mathscr{L}_m(T(M))$, which is canonically equivalent to the tensor bundle $T^{0,m}(M)$
in Definition 56.3.23, and symmetric vector-valued covariant tensor bundles such as $\mathscr{L}^+_m(T(M), W)$.


The technical discomfort caused by the short-cut versions of cross-sections of covariant tensor bundles is yet 
another indication that tangent bundles and tensor bundles do not fit perfectly within the general abstract 
fibre bundle framework. (Some other such “discomforts” are the constructions called “drop functions” in
Remark 59.2.8 and “swap functions” in Section 59.6.)


Definitions 57.7.2 and 57.7.3 are applications of the general cross-section short-cut concept for “form-style
fibrations” in Definition 21.4.9 to the special case of differential forms on manifolds.


57.7.2 DEFINITION: The short-cut version of a (real-valued) differential form $\omega \in X(\Lambda_m(T(M)))$, for a
$C^1$ manifold $M$ and $m \in \mathbb{Z}^+_0$, is the function $\bar\omega : T^m(M) \to \mathbb{R}$ defined by


$\forall V \in T^m(M),\quad \bar\omega(V) = \omega(\pi^m(V))(V),$
where $\pi^m : T^m(M) \to M$ is the projection map for $T^m(M)$ in Definition 55.5.8.
The short-cut version $\bar\omega$ is usually denoted the same as the original function $\omega$.


57.7.3 DEFINITION: The short-cut version of a (vector-valued) differential form $\omega \in X(\Lambda_m(T(M), W))$,
for a real linear space $W$, a $C^1$ manifold $M$ and $m \in \mathbb{Z}^+_0$, is the function $\bar\omega : T^m(M) \to W$ defined by


$\forall V \in T^m(M),\quad \bar\omega(V) = \omega(\pi^m(V))(V).$
The short-cut version $\bar\omega$ is usually denoted the same as the original function $\omega$.
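
The passage between a cross-section and its short-cut version amounts to moving the base point from a separate argument into the data of the tangent tuple. The following sketch illustrates this under assumptions that are not part of the text: a tangent m-tuple is modelled as a pair (p, vectors) that carries its base point, the manifold is a coordinate patch of R^2, and the names projection, shortcut, undo_shortcut and example_form are hypothetical.

```python
def projection(V):
    """Base-point projection: V = (p, vectors) -> p."""
    return V[0]


def shortcut(omega):
    """Short-cut version: V -> omega(projection(V))(vectors of V)."""
    def omega_bar(V):
        return omega(projection(V))(*V[1])
    return omega_bar


def undo_shortcut(omega_bar):
    """Recover the cross-section: p -> (vectors -> omega_bar((p, vectors)))."""
    def omega(p):
        return lambda *vectors: omega_bar((p, tuple(vectors)))
    return omega


def example_form(p):
    """Hypothetical 2-form on R^2: omega(p)(v, w) = p[0] * (v[0]*w[1] - v[1]*w[0])."""
    return lambda v, w: p[0] * (v[0] * w[1] - v[1] * w[0])


# Usage: the short-cut needs only V, because V already carries its base point.
omega_bar = shortcut(example_form)
V = ((2.0, 5.0), ((1.0, 0.0), (0.0, 1.0)))
print(omega_bar(V))
```

Undoing the short-cut is possible precisely because each tangent tuple determines its own base point, which is what the projection map provides.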


57.7.4 REMARK: Pointwise antisymmetric multilinearity of short-cut differential forms.
The almost completely trivial Theorem 57.7.5 asserts only the pointwise antisymmetric multilinearity of
short-cut differential forms.
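

57.7.5 THEOREM: Short-cut differential forms are antisymmetric multilinear at each point.


Let $M$ be a $C^1$ manifold. Let $m \in \mathbb{Z}^+_0$. Let $W$ be a real linear space.
(i) If $\omega \in X(\Lambda_m(T(M)))$, then $\bar\omega \in \{\theta : T^m(M) \to \mathbb{R};\ \forall p \in M,\ \theta\big|_{T^m_p(M)} \in \Lambda_m(T_p(M))\}$.
(ii) If $\omega \in X(\Lambda_m(T(M), W))$, then $\bar\omega \in \{\theta : T^m(M) \to W;\ \forall p \in M,\ \theta\big|_{T^m_p(M)} \in \Lambda_m(T_p(M), W)\}$.


PROOF: For part (i), let $\omega \in X(\Lambda_m(T(M)))$. Then $\bar\omega : T^m(M) \to \mathbb{R}$ by Definition 57.7.2. Let $p \in M$. Let
$V \in T^m_p(M)$. Then $\bar\omega(V) = \omega(p)(V)$. Thus $\bar\omega\big|_{T^m_p(M)} = \omega(p)$, where $\omega(p) \in \Lambda_m(T_p(M))$ by Definition 57.6.2.
Therefore $\bar\omega \in \{\theta : T^m(M) \to \mathbb{R};\ \forall p \in M,\ \theta\big|_{T^m_p(M)} \in \Lambda_m(T_p(M))\}$.


For part (ii), let $\omega \in X(\Lambda_m(T(M), W))$. Then $\bar\omega : T^m(M) \to W$ by Definition 57.7.3. Let $p \in M$
and $V \in T^m_p(M)$. Then $\bar\omega(V) = \omega(p)(V)$. Thus $\bar\omega\big|_{T^m_p(M)} = \omega(p)$, where $\omega(p) \in \Lambda_m(T_p(M), W)$ by
Definition 57.6.5. Therefore $\bar\omega \in \{\theta : T^m(M) \to W;\ \forall p \in M,\ \theta\big|_{T^m_p(M)} \in \Lambda_m(T_p(M), W)\}$.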


57.7.6 REMARK:  Differentiability of short-cut differential forms. 

Theorem 57.7.7 expresses the short-cut version $\bar\omega$ of a differential form $\omega$ in terms of the coordinate array
field $a = \Phi^{\Lambda,m}(\psi) \circ \omega \circ \psi^{-1}$ for $\omega$. (This is related to Theorem 57.6.11, which expresses $\omega$ in terms of $a$.)
Theorem 57.7.8 then shows that $\omega \in X^k(\Lambda_m(T(M)))$ implies $\bar\omega \in C^k(T^m(M), \mathbb{R})$.


57.7.7 THEOREM: Some useful conversion formulas for short-cut differential forms. 
Let k € Zg. Let M bea C! manifold. Let m € Zf. Let w € X(AS(T(M))). Let v € atlas(M). Let 
a = $^ (u) ou ow-!. Let n = dim(M). Then a: Range(v) > IR/», and à satisfies 


V(z,v) € Range(v) x (R”)”, 


(© o W^ (y) (z, v) = w( (2) (V (9)! (x, v)) (57.7.2) 
= wv (2) (551,9) (57.7.3) 
St E asus uud) (57.7.4) 
= Ya) E parity(P) [| v". (57.7.5) 
icIn P€perm(N,,) k=1 


PROOF: Line (57.7.2) follows from the observation that $\pi^m(\Psi^m(\psi)^{-1}(x,v)) = \pi^m((t_{\psi^{-1}(x),v_k,\psi})_{k=1}^m) = \psi^{-1}(x)$ for all $(x,v) \in \operatorname{Range}(\psi) \times (\mathbb{R}^n)^m$ by Notation 55.5.33 and Definition 55.5.8. Line (57.7.3) follows from Notation 55.5.33. Line (57.7.4) follows from Theorem 57.6.11 line (57.6.1). Then line (57.7.5) follows from Notation 56.5.9.
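
The expansion in line (57.7.5) can be checked numerically in small cases. The following Python sketch is illustrative only and is not part of the book's formal development. It assumes that $I^n_m$ is the set of increasing index tuples, it uses 0-based indices, and the names `parity` and `shortcut_value` are hypothetical.

```python
from itertools import combinations, permutations

def parity(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def shortcut_value(a, v, n):
    """Evaluate sum_i a[i] * sum_P parity(P) * prod_k v[k][i[P[k]]].

    a: dict mapping increasing index tuples (length m) to numbers
       (assumed layout of the coordinate array a(x) at one fixed x).
    v: list of m component vectors, each of length n.
    """
    m = len(v)
    total = 0.0
    for i in combinations(range(n), m):      # increasing index tuples
        inner = 0.0
        for P in permutations(range(m)):     # all permutations of m slots
            term = parity(P)
            for k in range(m):
                term *= v[k][i[P[k]]]
            inner += term
        total += a[i] * inner
    return total
```

For $n = 3$ and $m = 2$, the inner sum reduces to the $2 \times 2$ minor $v_1^{i_1} v_2^{i_2} - v_1^{i_2} v_2^{i_1}$, so exchanging the two vectors changes the sign of the result, as the antisymmetry in Theorem 57.7.5 requires.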


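57.7.8 THEOREM: If a differential form is $C^k$ then the corresponding short-cut differential form is $C^k$.
Let $k \in \mathbb{Z}_0^+$. Let $M$ be a $C^{k+1}$ manifold. Let $m \in \mathbb{Z}_0^+$. Let $\omega \in X^k(\Lambda_m(T(M)))$. Let $\psi \in \operatorname{atlas}(M)$. Then $\bar\omega \circ \Psi^m(\psi)^{-1} \in C^k(\operatorname{Range}(\psi) \times (\mathbb{R}^n)^m, \mathbb{R})$. Hence $\bar\omega \in C^k(T^m(M), \mathbb{R})$.


PROOF: Let $a = \Phi^{\Lambda_m}(\psi) \circ \omega \circ \psi^{-1}$. Then $a \in C^k(\operatorname{Range}(\psi), \mathbb{R}^{I^n_m})$ by Theorem 57.6.16. Then the $C^k$ differentiability of $\bar\omega \circ \Psi^m(\psi)^{-1} : \operatorname{Range}(\psi) \times (\mathbb{R}^n)^m \to \mathbb{R}$ follows from the observation that the maps $(x,v) \mapsto a(x)_i$ and $(x,v) \mapsto \prod_{k=1}^m v_k^{i_{P(k)}}$ in Theorem 57.7.7 line (57.7.5) are both $C^k$. So their product is $C^k$, as are any constant linear combinations of such products. The conclusion that $\bar\omega \in C^k(T^m(M), \mathbb{R})$ then follows from Definition 52.1.2.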


57.7.9 REMARK: A differential form is as differentiable as its short-cut. 
Theorem 57.7.10 asserts that $\bar\omega \in C^k(T^m(M), \mathbb{R})$ implies $\omega \in X^k(\Lambda_m(T(M)))$. Since this is the converse of Theorem 57.7.8, the differentiability levels of differential forms and their short-cuts are the same.


57.7.10 THEOREM: If a short-cut differential form is $C^k$ then the corresponding differential form is $C^k$.
Let $k \in \mathbb{Z}_0^+$. Let $M$ be a $C^{k+1}$ manifold. Let $m \in \mathbb{Z}_0^+$. Let $\omega \in X(\Lambda_m(T(M)))$. Assume that $\bar\omega \in C^k(T^m(M), \mathbb{R})$. Then $\omega \in X^k(\Lambda_m(T(M)))$.


PROOF: Let $\omega \in X(\Lambda_m(T(M)))$ be such that $\bar\omega \in C^k(T^m(M), \mathbb{R})$. Let $\psi \in \operatorname{atlas}(M)$ and $b = \bar\omega \circ \Psi^m(\psi)^{-1}$. Then $b \in C^k(\operatorname{Range}(\psi) \times (\mathbb{R}^n)^m, \mathbb{R})$ by Definitions 57.6.13 and 51.6.2. Let $a = \Phi^{\Lambda_m}(\psi) \circ \omega \circ \psi^{-1}$.


Let $x \in \operatorname{Range}(\psi)$, $n = \dim(M)$, $i \in I^n_m$, and $v = (e_{i_k})_{k=1}^m = e_i \in (\mathbb{R}^n)^m$. (See Definition 22.7.9 for the
standard basis vectors e;, € R”.) Then (z, v) € Range(v) x (R")™. So b(z,v) = i P-a), ala) as E-i), vp) 
by Theorem 57.7.7 line (57.7.4). Let p = el Then tfj iG uu = (trex ee = (e ym. = e” by 
Notations 55.5.14 and 54.4.10. So b(x,v) = pu 1 (a) a2) ab n = aj(x) by Theorem 56.5.22. Thus a may be 
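expressed in terms of $b$ as

$\forall i \in I^n_m,\ \forall x \in \operatorname{Range}(\psi),\quad a_i(x) = b(x, e_i)$.

Therefore $a_i \in C^k(\operatorname{Range}(\psi), \mathbb{R})$ for all $i \in I^n_m$, because each $e_i \in (\mathbb{R}^n)^m$ is constant. But $a(x) = \sum_{j \in I^n_m} a_j(x) e_j$ for all $x \in \operatorname{Range}(\psi)$. So $a = \sum_{j \in I^n_m} a_j e_j \in C^k(\operatorname{Range}(\psi), \mathbb{R}^{I^n_m})$. Hence $\omega \in X^k(\Lambda_m(T(M)))$ by Definitions 57.6.13 and 52.1.2.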


57.7.11 REMARK:  Notations for sets of short-cut scalar-valued differential forms. 
Notation 57.7.12 specialises the very general abstract style of short-cut cross-section space in Notation 21.4.10 
to differential forms on differentiable manifolds, and extends the notation to indicate differentiability level. 


57.7.12 NOTATION:  Scalar-valued short-cut differential form spaces. 
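$\bar X(\Lambda_m(T(M)))$, for $m \in \mathbb{Z}_0^+$ and a $C^1$ manifold $M$, denotes the set of short-cut versions of differential forms in $X(\Lambda_m(T(M)))$. In other words,

$\bar X(\Lambda_m(T(M))) = \{\phi : T^m(M) \to \mathbb{R};\ \exists \omega \in X(\Lambda_m(T(M))),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$.

$\bar X^k(\Lambda_m(T(M)))$, for $m \in \mathbb{Z}_0^+$, $k \in \mathbb{Z}_0^+$, and a $C^{k+1}$ manifold $M$, denotes the set of short-cut versions of $C^k$ differential forms in $X^k(\Lambda_m(T(M)))$. In other words,

$\bar X^k(\Lambda_m(T(M))) = \{\phi : T^m(M) \to \mathbb{R};\ \exists \omega \in X^k(\Lambda_m(T(M))),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$.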


57.7.13 THEOREM: Alternative expressions for scalar-valued short-cut differential form spaces. 
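(i) Let $M$ be a $C^1$ manifold and $m \in \mathbb{Z}_0^+$. Then

$\bar X(\Lambda_m(T(M))) = \{\phi : T^m(M) \to \mathbb{R};\ \forall p \in M,\ \phi|_{T_p^m(M)} \in \Lambda_m(T_p(M))\}$. (57.7.6)

(ii) Let $k \in \mathbb{Z}_0^+$. Let $M$ be a $C^{k+1}$ manifold. Let $m \in \mathbb{Z}_0^+$. Then

$\bar X^k(\Lambda_m(T(M))) = \bar X(\Lambda_m(T(M))) \cap C^k(T^m(M), \mathbb{R})$. (57.7.7)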


PROOF: For part (i), let $\phi \in \bar X(\Lambda_m(T(M)))$. Then $\phi : T^m(M) \to \mathbb{R}$, and for some $\omega \in X(\Lambda_m(T(M)))$, $\phi(V) = \omega(\pi^m(V))(V)$ for all $V \in T^m(M)$. So $\phi = \bar\omega$ by Definition 57.7.2. Therefore $\phi$ is an element of the right hand side of line (57.7.6) by Theorem 57.7.5 (i).

Conversely, suppose that $\phi : T^m(M) \to \mathbb{R}$ satisfies $\phi|_{T_p^m(M)} \in \Lambda_m(T_p(M))$ for all $p \in M$. Define the map $\omega : M \to \Lambda_m(T(M))$ by $\omega(p) = \phi|_{T_p^m(M)}$ for all $p \in M$. Then $\omega \in X(\Lambda_m(T(M)))$ by Definition 57.6.2 and Notation 56.5.14. But $\pi^m(V) = p$ for all $p \in M$ and $V \in T_p^m(M)$ by Definition 55.5.8. So $\phi$ equals the map $\bar\omega : V \mapsto \omega(\pi^m(V))(V)$ for $V \in T^m(M)$. Thus line (57.7.6) is verified.

For part (ii), let $\phi \in \bar X^k(\Lambda_m(T(M)))$. Then $\phi : T^m(M) \to \mathbb{R}$, and there exists $\omega \in X^k(\Lambda_m(T(M)))$ such that $\phi : V \mapsto \omega(\pi^m(V))(V)$ for $V \in T^m(M)$. Then $\omega \in X(\Lambda_m(T(M)))$ by Definition 57.6.13 and Notation 57.6.14. So $\phi \in \bar X(\Lambda_m(T(M)))$ by line (57.7.6). The property $\phi \in C^k(T^m(M), \mathbb{R})$ follows from Theorem 57.7.8 and Definition 57.7.2 because $\phi = \bar\omega$.

Conversely, let $\phi \in \bar X(\Lambda_m(T(M))) \cap C^k(T^m(M), \mathbb{R})$. Then $\phi \in \bar X(\Lambda_m(T(M)))$. Therefore $\phi : T^m(M) \to \mathbb{R}$ satisfies $\phi : V \mapsto \omega(\pi^m(V))(V)$ for some $\omega \in X(\Lambda_m(T(M)))$. So $\phi = \bar\omega$. Then $\phi \in C^k(T^m(M), \mathbb{R})$ implies $\omega \in X^k(\Lambda_m(T(M)))$ by Theorem 57.7.10. So $\phi \in \bar X^k(\Lambda_m(T(M)))$. Thus line (57.7.7) is verified.


57.7.14 REMARK: Notations for sets of short-cut vector-valued differential forms.

Notation 57.7.15 is the vector-valued version of Notation 57.7.12. Since $X(\Lambda_m(T(M),W))$ is essentially algebraic, the linear space $W$ does not need to be constrained in this case. But $X^k(\Lambda_m(T(M),W))$ is analytical. So $W$ is restricted to be finite-dimensional in this case.


Notation 57.7.15 makes use of Notation 56.7.4 and Definition 57.6.5. 


57.7.15 NOTATION:  Vector-valued short-cut differential form spaces. 
$\bar X(\Lambda_m(T(M),W))$, for $m \in \mathbb{Z}_0^+$, a $C^1$ manifold $M$, and a real linear space $W$, denotes the set of short-cut versions of differential forms in $X(\Lambda_m(T(M), W))$. In other words,

$\bar X(\Lambda_m(T(M),W)) = \{\phi : T^m(M) \to W;\ \exists \omega \in X(\Lambda_m(T(M),W)),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$.

$\bar X^k(\Lambda_m(T(M), W))$, for $m \in \mathbb{Z}_0^+$, $k \in \mathbb{Z}_0^+$, a $C^{k+1}$ manifold $M$, and a finite-dimensional real linear space $W$, denotes the set of short-cut versions of $C^k$ differential forms in $X^k(\Lambda_m(T(M), W))$. In other words,

$\bar X^k(\Lambda_m(T(M),W)) = \{\phi : T^m(M) \to W;\ \exists \omega \in X^k(\Lambda_m(T(M),W)),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$.


57.7.16 THEOREM: Alternative expressions for vector-valued short-cut differential form spaces. 
(i) Let $M$ be a $C^1$ manifold, $W$ be a real linear space, and $m \in \mathbb{Z}_0^+$. Then

$\bar X(\Lambda_m(T(M),W)) = \{\phi : T^m(M) \to W;\ \forall p \in M,\ \phi|_{T_p^m(M)} \in \Lambda_m(T_p(M),W)\}$. (57.7.8)

(ii) Let $k \in \mathbb{Z}_0^+$, $M$ be a $C^{k+1}$ manifold, $W$ be a finite-dimensional real linear space, and $m \in \mathbb{Z}_0^+$. Then

$\bar X^k(\Lambda_m(T(M), W)) = \bar X(\Lambda_m(T(M), W)) \cap C^k(T^m(M), W)$. (57.7.9)


PROOF: For part (i), line (57.7.8) may be verified as for line (57.7.6) in Theorem 57.7.13. 
For part (ii), line (57.7.9) may be verified as for line (57.7.7) in Theorem 57.7.13. 


57.7.17 REMARK:  Equi-informationality of differential forms and their short-cut versions 

Theorems 57.7.18 and 57.7.19 show that no information is gained or lost by replacing differential forms with 
their corresponding short-cut versions. The proofs of Theorems 57.7.18 and 57.7.19 are essentially identical.
(Note that Theorems 57.7.18 and 57.7.19 also follow directly from Theorem 21.4.13 (v).) 


57.7.18 THEOREM:  Equi-informationality of scalar-valued differential forms and their short-cut versions. 
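For $C^1$ manifolds $M$ and $m \in \mathbb{Z}_0^+$, define $\rho^{\Lambda_m} : X(\Lambda_m(T(M))) \to \bar X(\Lambda_m(T(M)))$ by

$\forall \omega \in X(\Lambda_m(T(M))),\ \forall V \in T^m(M),\quad \rho^{\Lambda_m}(\omega)(V) = \omega(\pi^m(V))(V)$.

Then $\rho^{\Lambda_m} : X(\Lambda_m(T(M))) \to \bar X(\Lambda_m(T(M)))$ is a bijection.

PROOF: The map $\rho^{\Lambda_m} : X(\Lambda_m(T(M))) \to \bar X(\Lambda_m(T(M)))$ is a well-defined function by Notation 57.7.12.
To show that $\rho^{\Lambda_m}$ is an injection, suppose that $\rho^{\Lambda_m}(\omega_1) = \rho^{\Lambda_m}(\omega_2)$ for some $\omega_1, \omega_2 \in X(\Lambda_m(T(M)))$. Then $\omega_1(\pi^m(V))(V) = \omega_2(\pi^m(V))(V)$ for all $V \in T^m(M)$. Let $p \in M$. Then $\omega_1(p)(V) = \omega_2(p)(V)$ for all $V \in T_p^m(M)$. Thus $\omega_1(p) = \omega_2(p)$ for all $p \in M$. So $\omega_1 = \omega_2$. Therefore $\rho^{\Lambda_m}$ is injective.

To show that $\rho^{\Lambda_m} : X(\Lambda_m(T(M))) \to \bar X(\Lambda_m(T(M)))$ is surjective, let $\phi \in \bar X(\Lambda_m(T(M)))$. This means, by Notation 57.7.12, that the function $\phi : T^m(M) \to \mathbb{R}$ satisfies $\forall V \in T^m(M),\ \phi(V) = \omega(\pi^m(V))(V)$ for some $\omega \in X(\Lambda_m(T(M)))$. Thus $\phi = \rho^{\Lambda_m}(\omega)$. So $\rho^{\Lambda_m} : X(\Lambda_m(T(M))) \to \bar X(\Lambda_m(T(M)))$ is a surjection, and is therefore a bijection.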


57.7.19 THEOREM: Equi-informationality of vector-valued differential forms and their short-cut versions.
Define $\rho^{\Lambda_m,W} : X(\Lambda_m(T(M),W)) \to \bar X(\Lambda_m(T(M),W))$ for $C^1$ manifolds $M$, real linear spaces $W$, and $m \in \mathbb{Z}_0^+$, by

$\forall \omega \in X(\Lambda_m(T(M),W)),\ \forall V \in T^m(M),\quad \rho^{\Lambda_m,W}(\omega)(V) = \omega(\pi^m(V))(V)$.

Then $\rho^{\Lambda_m,W} : X(\Lambda_m(T(M),W)) \to \bar X(\Lambda_m(T(M),W))$ is a bijection.

PROOF: By Notation 57.7.15, $\rho^{\Lambda_m,W} : X(\Lambda_m(T(M),W)) \to \bar X(\Lambda_m(T(M),W))$ is a well-defined function.
To show injectivity, suppose that $\rho^{\Lambda_m,W}(\omega_1) = \rho^{\Lambda_m,W}(\omega_2)$ for some $\omega_1, \omega_2 \in X(\Lambda_m(T(M),W))$. Then $\omega_1(\pi^m(V))(V) = \omega_2(\pi^m(V))(V)$ for all $V \in T^m(M)$. Let $p \in M$. Then $\omega_1(p)(V) = \omega_2(p)(V)$ for all $V \in T_p^m(M)$. Thus $\omega_1(p) = \omega_2(p)$ for all $p \in M$. So $\omega_1 = \omega_2$. Therefore $\rho^{\Lambda_m,W}$ is injective.

To show that $\rho^{\Lambda_m,W} : X(\Lambda_m(T(M),W)) \to \bar X(\Lambda_m(T(M),W))$ is surjective, let $\phi \in \bar X(\Lambda_m(T(M),W))$. Then by Notation 57.7.15, $\phi : T^m(M) \to W$ satisfies $\forall V \in T^m(M),\ \phi(V) = \omega(\pi^m(V))(V)$ for some $\omega \in X(\Lambda_m(T(M),W))$. Thus $\phi = \rho^{\Lambda_m,W}(\omega)$. So $\rho^{\Lambda_m,W} : X(\Lambda_m(T(M),W)) \to \bar X(\Lambda_m(T(M),W))$ is a surjection, and is therefore a bijection.


57.7.20 REMARK: Local versions of short-cut differential forms. 

In principle, there is no real need to formally define local versions of short-cut differential forms. The 
restriction of a $C^k$ manifold to an open subset is always itself a $C^k$ manifold. (See Definition 51.4.15 and Theorem 51.4.16.) However, a definite notation is required. Definition 57.7.21 is the obvious extension of Definitions 57.7.2 and 57.7.3 from general to local differential forms. The local differential forms, and the corresponding short-cuts, are defined as local functions on subsets of the full spaces, not as restrictions of global functions. (Not all local differential forms can be extended to be global with the same differentiability. Consider for example $\Omega = \mathbb{R}^2 \setminus (\mathbb{R}^1 \times \{0\})$ with $M = \mathbb{R}^2$.)


57.7.21 DEFINITION:  Short-cut versions of local differential forms. 
The short-cut version of a local (real-valued) differential form $\omega \in X(\Lambda_m(T(M))|U)$, for a $C^1$ manifold $M$, $U \in \operatorname{Top}(M)$ and $m \in \mathbb{Z}_0^+$, is the function $\bar\omega : T^m(U) \to \mathbb{R}$ defined by

$\forall V \in T^m(U),\quad \bar\omega(V) = \omega(\pi^m(V))(V)$,

where $\pi^m : T^m(M) \to M$ is the projection map for $T^m(M)$ in Definition 55.5.8.

The short-cut version of a local (vector-valued) differential form $\omega \in X(\Lambda_m(T(M),W)|U)$, for a real linear space $W$, a $C^1$ manifold $M$, $U \in \operatorname{Top}(M)$ and $m \in \mathbb{Z}_0^+$, is the function $\bar\omega : T^m(U) \to W$ defined by

$\forall V \in T^m(U),\quad \bar\omega(V) = \omega(\pi^m(V))(V)$.


57.7.22 REMARK: Notations and equivalences for local short-cut differential form spaces. 
Notation 57.7.23 is the obvious extension of Notations 57.7.12 and 57.7.15 from general to local differential 
forms. Lines (57.7.10), (57.7.11), (57.7.12) and (57.7.13) follow from Theorems 57.7.13 and 57.7.16. 


Notation 57.7.24 introduces unspecified-domain versions of spaces in Notation 57.7.23. 


57.7.23 NOTATION: Local short-cut differential form spaces with specified domain. 
$\bar X(\Lambda_m(T(M))|U)$, for $m \in \mathbb{Z}_0^+$, a $C^1$ manifold $M$ and $U \in \operatorname{Top}(M)$, denotes the set of short-cut versions of differential forms in $X(\Lambda_m(T(M))|U)$. In other words,

$\bar X(\Lambda_m(T(M))|U) = \{\phi : T^m(U) \to \mathbb{R};\ \exists \omega \in X(\Lambda_m(T(M))|U),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$
$= \{\phi : T^m(U) \to \mathbb{R};\ \forall p \in U,\ \phi|_{T_p^m(M)} \in \Lambda_m(T_p(M))\}$. (57.7.10)

$\bar X^k(\Lambda_m(T(M))|U)$, for $m \in \mathbb{Z}_0^+$, $k \in \mathbb{Z}_0^+$, a $C^{k+1}$ manifold $M$ and $U \in \operatorname{Top}(M)$, denotes the set of short-cut versions of $C^k$ differential forms in $X^k(\Lambda_m(T(M))|U)$. In other words,

$\bar X^k(\Lambda_m(T(M))|U) = \{\phi : T^m(U) \to \mathbb{R};\ \exists \omega \in X^k(\Lambda_m(T(M))|U),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$
$= \bar X(\Lambda_m(T(M))|U) \cap C^k(T^m(U), \mathbb{R})$. (57.7.11)

$\bar X(\Lambda_m(T(M),W)|U)$, for $m \in \mathbb{Z}_0^+$, a $C^1$ manifold $M$, $U \in \operatorname{Top}(M)$ and a real linear space $W$, denotes the set of short-cut versions of differential forms in $X(\Lambda_m(T(M),W)|U)$. In other words,

$\bar X(\Lambda_m(T(M),W)|U) = \{\phi : T^m(U) \to W;\ \exists \omega \in X(\Lambda_m(T(M),W)|U),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$
$= \{\phi : T^m(U) \to W;\ \forall p \in U,\ \phi|_{T_p^m(M)} \in \Lambda_m(T_p(M), W)\}$. (57.7.12)

$\bar X^k(\Lambda_m(T(M),W)|U)$, for $m \in \mathbb{Z}_0^+$, $k \in \mathbb{Z}_0^+$, a $C^{k+1}$ manifold $M$, $U \in \operatorname{Top}(M)$, and a finite-dimensional real linear space $W$, denotes the set of short-cut versions of $C^k$ differential forms in $X^k(\Lambda_m(T(M),W)|U)$. In other words,

$\bar X^k(\Lambda_m(T(M), W)|U) = \{\phi : T^m(U) \to W;\ \exists \omega \in X^k(\Lambda_m(T(M),W)|U),\ \phi : V \mapsto \omega(\pi^m(V))(V)\}$
$= \bar X(\Lambda_m(T(M),W)|U) \cap C^k(T^m(U), W)$. (57.7.13)


57.7.24 NOTATION: Local short-cut differential form spaces with unspecified domain. 
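$\bar X_{\mathrm{loc}}(\Lambda_m(T(M)))$, for $m \in \mathbb{Z}_0^+$ and a $C^1$ manifold $M$, denotes the set of short-cut differential forms in $\bar X(\Lambda_m(T(M))|U)$ for some $U \in \operatorname{Top}(M)$. In other words,

$\bar X_{\mathrm{loc}}(\Lambda_m(T(M))) = \bigcup_{U \in \operatorname{Top}(M)} \bar X(\Lambda_m(T(M))|U)$.

$\bar X^k_{\mathrm{loc}}(\Lambda_m(T(M)))$, for $m \in \mathbb{Z}_0^+$, $k \in \mathbb{Z}_0^+$ and a $C^{k+1}$ manifold $M$, denotes the set of short-cut differential forms in $\bar X^k(\Lambda_m(T(M))|U)$ for some $U \in \operatorname{Top}(M)$. In other words,

$\bar X^k_{\mathrm{loc}}(\Lambda_m(T(M))) = \bigcup_{U \in \operatorname{Top}(M)} \bar X^k(\Lambda_m(T(M))|U)$.

$\bar X_{\mathrm{loc}}(\Lambda_m(T(M), W))$, for $m \in \mathbb{Z}_0^+$, a $C^1$ manifold $M$, and a real linear space $W$, denotes the set of short-cut differential forms in $\bar X(\Lambda_m(T(M), W)|U)$ for some $U \in \operatorname{Top}(M)$. In other words,

$\bar X_{\mathrm{loc}}(\Lambda_m(T(M), W)) = \bigcup_{U \in \operatorname{Top}(M)} \bar X(\Lambda_m(T(M),W)|U)$.

$\bar X^k_{\mathrm{loc}}(\Lambda_m(T(M), W))$, for $m \in \mathbb{Z}_0^+$, $k \in \mathbb{Z}_0^+$, a $C^{k+1}$ manifold $M$, and a finite-dimensional real linear space $W$, denotes the set of short-cut differential forms in $\bar X^k(\Lambda_m(T(M), W)|U)$ for some $U \in \operatorname{Top}(M)$. In other words,

$\bar X^k_{\mathrm{loc}}(\Lambda_m(T(M),W)) = \bigcup_{U \in \operatorname{Top}(M)} \bar X^k(\Lambda_m(T(M), W)|U)$.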


57.7.25 REMARK: More general applicability of the “multilinear form short-cut” concept. 

The “short-cut” style of notation in Remark 57.7.1 is applicable whenever a global function $g : E \to X$ may be expressed as $g = \bigcup_{b \in B} g|_{E_b}$, for some fibration $(E, \pi, B)$ as in Definition 21.1.2, where $X$ is any set. Then for all $b_1, b_2 \in B$, $b_1 \ne b_2$ implies $g|_{E_{b_1}} \cap g|_{E_{b_2}} = \emptyset$ because $E_{b_1} \cap E_{b_2} = \emptyset$. Thus the validity of the equation $g = \bigcup_{b \in B} g|_{E_b}$ follows from the fact that $E$ is a disjoint union of the sets $E_b$ by Theorem 21.1.7. Therefore when the global “short-cut” function $g : E \to X$ is reconstructed from the function-valued function $f : B \to (E \mathrel{\mathring{\to}} X)$ defined by $f(b) = g|_{E_b}$ for all $b \in B$, the reconstructed function is well defined. Then $f$ is a “localisation” of $g$, whereas $g$ may be reconstructed as a “globalisation” of $f$. (See Section 21.4 for general cross-section short-cuts for non-topological form-style fibrations.)
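
The passage between a global function and its fibre-wise restrictions can be imitated on finite sets. The following Python sketch is only an illustration under stated assumptions: the total space $E$ is modelled by the keys of a dictionary, the projection is an ordinary function, and the names `localise` and `globalise` are hypothetical labels for the two directions described above.

```python
def localise(g, projection):
    """Split g : E -> X into f : B -> (partial functions on E).

    f[b] is the restriction of g to the fibre over b, stored as a dict.
    """
    f = {}
    for point, value in g.items():
        f.setdefault(projection(point), {})[point] = value
    return f

def globalise(f):
    """Rebuild g as the union of the restrictions f[b].

    The union is a well-defined function because distinct fibres are
    disjoint; a repeated key would signal that this assumption fails.
    """
    g = {}
    for restriction in f.values():
        for point, value in restriction.items():
            if point in g:
                raise ValueError("fibres are not disjoint")
            g[point] = value
    return g
```

Applying `globalise` after `localise` returns the original dictionary, which mirrors the statement that no information is gained or lost by changing between the two styles.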


If a global function $g : E \to X$ is given, the expression “$g_b$” may be thought of as an expedient abbreviation for $g|_{E_b}$. But the system of notation used generally in mathematics suggests that such an abbreviation should be interpreted as a value of the function-valued function $f : B \to (E \mathrel{\mathring{\to}} X)$ with $f(b) = g|_{E_b} : E_b \to X$ for all $b \in B$, and strictly speaking, $f$ must be given a different symbol to $g$ to distinguish distinct objects. This seems to be the cause of the notational ambiguity.


The function-valued function $f$ provides abbreviations for the restrictions of the global function $g$ to fibre sets, but this is confusing when the same letter $g$ is used for both the pointwise and global versions. In this sense, the abbreviation $g_b$ may be regarded as a kind of pseudo-notation because it clashes with the convention that $g_b$ means the value $g(b)$ of a function $g$ for an argument $b$. On the other hand, subscripts do not always signify function arguments. Subscripts can have many other kinds of meanings.


57.8. Vector fields and tensor fields along curves


57.8.1 REMARK: Applications of vector and tensor fields along curves. 

Vector and tensor fields along curves are relevant to the study of Jacobi fields, for example. A vector field 
according to Definition 57.8.2 does not require the curve to be non-self-intersecting because the domain of the 
vector field is the same as the domain of the curve. This freedom is particularly useful in Definition 70.3.8, 
for example, to avoid a common practice of defining concepts such as curvature for fully defined vector fields 
instead of the minimum required structure, which is in this case a two-parameter curve-family. 


57.8.2 DEFINITION: A vector field along a curve $\gamma : I \to M$ in a $C^1$ manifold $M$ is a function $Y : I \to T(M)$ such that $\forall t \in I,\ Y(t) \in T_{\gamma(t)}(M)$.


57.8.3 DEFINITION: A continuous vector field along a curve $\gamma : I \to M$, for a real-number interval $I$, in a $C^1$ manifold $M$, is a vector field $Y : I \to T(M)$ along $\gamma$ such that $Y$ is a continuous function with respect to the usual topology on $\mathbb{R}$ and the standard topology on $T(M)$.


57.8.4 REMARK:  Differentiability of vector fields along curves with respect to the curve parameter. 
Differentiable vector fields can be defined along differentiable curves. (See Definition 51.9.2 for differentiable 
curves.) Definition 57.8.5 uses the standard $C^k$ differentiable structure on the total tangent space $T(M)$ of a $C^{k+1}$ manifold $M$.


57.8.5 DEFINITION: A $C^k$ (differentiable) vector field along a $C^k$ curve $\gamma : I \to M$, for an open real-number interval $I$, in a $C^{k+1}$ manifold $M$ for $k \in \mathbb{Z}_0^+$, is a vector field $Y : I \to T(M)$ along $\gamma$ such that $Y$ is of class $C^k$ with respect to the usual differentiable structure on open intervals $I \subseteq \mathbb{R}$ and the standard $C^k$ differentiable structure on $T(M)$. In other words,

$\forall \psi \in \operatorname{atlas}(M),\quad \Psi(\psi) \circ Y \in C^k(\gamma^{-1}(\operatorname{Dom}(\psi)), \mathbb{R}^n \times \mathbb{R}^n)$,

where $n = \dim(M)$ and $\Psi(\psi) \in \operatorname{atlas}(T(M))$ denotes the manifold chart on $T(M)$ corresponding to each base-space chart $\psi \in \operatorname{atlas}(M)$.


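57.8.6 DEFINITION: A vector field on a family of curves $\gamma : \times_{k=1}^m I_k \to M$ for a family of intervals $(I_k)_{k=1}^m$ for $m \in \mathbb{Z}_0^+$, where $\gamma$ is continuous, is a function $Y : \times_{k=1}^m I_k \to T(M)$ which satisfies

$\forall t \in \times_{k=1}^m I_k,\quad Y(t) \in T_{\gamma(t)}(M)$.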


57.8.7 REMARK:  Tensor fields along curves. 
There is little difference between Definitions 57.8.8 and 57.8.2. There are many other classes of tensor bundles 
apart from $T^{r,s}(M)$. Definitions may be written for all such classes for various levels of differentiability.


57.8.8 DEFINITION: A tensor field of type $(r,s)$ along a curve $\gamma : I \to M$ in a $C^1$ manifold $M$, for $r, s \in \mathbb{Z}_0^+$, is a function $Y : I \to T^{r,s}(M)$ such that $\forall t \in I,\ Y(t) \in T^{r,s}_{\gamma(t)}(M)$.


57.9. Velocity vector fields of curves 


57.9.1 REMARK: The differential of a differentiable curve. 

The velocity vector field of a differentiable curve is the same as the differential of the curve. Differentiability 
of curves in a differentiable manifold is defined in Section 51.9. Differentiable vector fields along curves are 
defined in Section 57.8. 

The idea of Definition 57.9.2 is illustrated in Figure 57.9.1. (See Definition 54.1.2 and Theorem 54.1.8 (v) for 
tangent-line vectors such as $t_{\gamma(t),\partial_t(\psi(\gamma(t))),\psi} = [(\psi, L_{\psi(\gamma(t)),v(t)})]$, where $L_{\psi(\gamma(t)),v(t)} : s \mapsto \psi(\gamma(t)) + s\,v(t)$.)


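57.9.2 DEFINITION: The tangent vector field of a $C^1$ curve $\gamma : I \to M$ in a $C^1$ manifold $M$, for some interval $I \in \operatorname{Top}(\mathbb{R})$, where $n = \dim(M) \in \mathbb{Z}_0^+$, is the map $\gamma' : I \to T(M)$ defined by


$\forall t \in I,\ \forall \psi \in \operatorname{atlas}_{\gamma(t)}(M),\quad \gamma'(t) = t_{\gamma(t),\partial_t(\psi(\gamma(t))),\psi}$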


= [Q^ Lyre, apare) 
= [Qt Bp u) Jeep V]. 


The velocity (vector) field of a $C^1$ curve is the same as its tangent vector field.


Figure 57.9.1 Tangent-line vectors for the velocity field of a curve


57.9.3 REMARK: The necessity of coordinate charts for defining the velocity field of a curve. 

It would be nice to think that something so basic as the velocity field of a differentiable curve could be 
defined without resorting to aesthetically displeasing coordinate charts. Unfortunately, this is not so. If 
tangent vectors are formalised as C! equivalence classes of C! curves, Definition 57.9.2 could then be written 
without referring directly to charts, but then the ugly formulas would be hidden away inside the definition 
of a tangent vector. An abstract manifold simply does not have a differentiable structure of its own. The 
coordinate charts are the only way to define the derivative of points on a curve with respect to its parameter. 


57.9.4 REMARK: A tangent vector field is a well-defined tangent vector at each point. 
The fact that $\gamma'(t)$ in Definition 57.9.2 is a well-defined tangent vector according to Definition 54.1.2 is easily verified. Let $\psi_1, \psi_2 \in \operatorname{atlas}_{\gamma(t)}(M)$. Then

$\partial_t(\psi_2 \circ \gamma)^i(t) = \partial_t(\psi_2 \circ \psi_1^{-1} \circ \psi_1 \circ \gamma)^i(t)$
$= \sum_{j=1}^n \partial_j(\psi_2 \circ \psi_1^{-1})^i(\psi_1(\gamma(t)))\, \partial_t(\psi_1 \circ \gamma)^j(t)$,

which verifies equation (54.1.4) in Theorem 54.1.11. Since the tangent vector field is tagged by the curve parameter $t \in I$, there are no ambiguities at self-intersections of the curve. However, $\gamma'(t)$ is sometimes said to be “the tangent vector at $\gamma(t)$”, which is ambiguous if the curve parameter is discarded.
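
The chain-rule transformation of velocity components between two charts can be tested numerically. The following Python sketch is a hedged illustration, not part of the formal text: it takes the open right half-plane as the manifold, the identity map and polar coordinates as the two charts, and a sample curve chosen only for this example. All function names are hypothetical.

```python
import math

def gamma(t):
    """A sample C^1 curve in the open right half-plane (hypothetical example)."""
    return (1.0 + t * t, math.sin(t))

def psi1(point):
    """Cartesian chart (the identity map)."""
    return point

def psi2(point):
    """Polar chart (r, theta) on the open right half-plane."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))

def chart_velocity(psi, t, h=1e-6):
    """Central-difference approximation of d/dt (psi o gamma)(t)."""
    a, b = psi(gamma(t - h)), psi(gamma(t + h))
    return tuple((bi - ai) / (2.0 * h) for ai, bi in zip(a, b))

def transition_jacobian(point):
    """Jacobian matrix of psi2 o psi1^{-1} at psi1(point), computed by hand."""
    x, y = point
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))

def transformed_velocity(t):
    """Chain rule: sum over j of d_j(psi2 o psi1^{-1})^i times d_t(psi1 o gamma)^j."""
    jac = transition_jacobian(gamma(t))
    v1 = chart_velocity(psi1, t)
    return tuple(sum(jac[i][j] * v1[j] for j in range(2)) for i in range(2))
```

Comparing `chart_velocity(psi2, t)` with `transformed_velocity(t)` at a few parameter values shows agreement up to finite-difference error, which is the numerical counterpart of the chart-independence of the tangent vector.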


57.9.5 REMARK: Interpretation of the definition of the tangent vector field of a curve. 

The tangent vector $\gamma'(t)$ in Definition 57.9.2 is a substitute for a limit of the form $\lim_{h \to 0} (\gamma(t+h) - \gamma(t))/h$.
If $M$ has no linear or affine space structure, the difference and product in the expression $h^{-1}(\gamma(t+h) - \gamma(t))$
will not make sense. This is why it is necessary to first overlay a coordinate chart on the manifold and 
then differentiate the coordinates instead of the curve itself. To remove the arbitrariness of this procedure, 
equivalence classes of these derivatives are used. 


The situation becomes more interesting when the manifold M does have a linear space (or affine space) 
structure. In this case, the expression (y(t + h) — y(t))/h is well-defined, and if the limit exists, it would be 
interesting to compare the result with the corresponding tangent vector to the manifold. (It is assumed here 
that all finite-dimensional linear spaces are given the standard topology which makes them homeomorphic 
under a linear map to a space $\mathbb{R}^n$, which is topologically complete.) The results should be equivalent if the
coordinate charts match up in a differentiable manner with the linear space structure. 


Let $M$ be a linear space with the standard topology as in Definition 32.6.6, and let $\psi \in \operatorname{atlas}(M)$ be $C^1$ in the sense that for all $p, v \in M$, the derivative $b_{p,v} = \partial_t(\psi(p + tv))|_{t=0} \in \mathbb{R}^n$ is well-defined and continuous with respect to $p$ and $v$. Define $V_v = t_{p,b_{p,v},\psi} \in T_p(M)$. Then the map $L : M \to T_p(M)$ defined by $L : v \mapsto V_v$ is a linear space isomorphism. Therefore the inverse linear map $\varpi_p = L^{-1} : T_p(M) \to M$ is well-defined. The map $\varpi_p$ might be referred to as the “drop” of $T_p(M)$ onto $M$ (analogous to “lift” functions). A special case of this kind of drop-function is the canonical identification of $T_p(\mathbb{R}^n)$ with $\mathbb{R}^n$ for all $p \in \mathbb{R}^n$. Such drop functions arise in the calculation of covariant derivatives with respect to affine connections in Section 71.6. (See Definition 59.2.9 for drop functions.)


57.9.6 REMARK: Alternative notation for the velocity vector field of a differentiable curve.
The alternative Notation 57.9.7 for the velocity of a differentiable curve $\gamma$ at a given point $t \in \operatorname{Dom}(\gamma)$ has the advantage that it may be easily extended to curve families.


57.9.7 NOTATION: $\partial_t\gamma(t)$, for a $C^1$ open curve $\gamma : I \to M$ in a $C^1$ manifold $M$, and $t \in I$, denotes the value of the tangent vector field of $\gamma$ at $t$. In other words, $\forall t \in I,\ \partial_t\gamma(t) = \gamma'(t)$.


57.9.8 THEOREM: The velocity vector field of a constant curve is the zero vector field. 
Let $M$ be a $C^1$ manifold. Let $\gamma : I \to M$ be a constant curve in $M$ for some open real interval $I$. Then $\forall t \in I,\ \gamma'(t) = 0_{T_{\gamma(t)}(M)}$.


PROOF: Theorem 51.9.6 implies that $\gamma$ is a $C^1$ curve in $M$. So $\gamma'$ is well defined by Definition 57.9.2. By Definition 36.2.11, a constant curve $\gamma : I \to M$ satisfies $\forall s, t \in I,\ \gamma(s) = \gamma(t)$. If $I = \emptyset$, then $\gamma$ is the empty function, and consequently $\gamma'$ is also the empty function. Therefore the assertion of the theorem is (vacuously) true. So assume that $I \ne \emptyset$.

Let $p \in \operatorname{Range}(\gamma)$. Then $\forall t \in I,\ \gamma(t) = p$. Let $\psi \in \operatorname{atlas}_p(M)$. Let $J = \gamma^{-1}(\operatorname{Dom}(\psi))$. Then $J \in \operatorname{Top}(\mathbb{R})$ and $\psi \circ \gamma : J \to \mathbb{R}^n$ is constant, where $n = \dim(M)$, and in fact, $\forall t \in J,\ \psi(\gamma(t)) = \psi(p)$. Therefore $\partial_t((\psi \circ \gamma)(t)) = 0_{\mathbb{R}^n}$ by Theorem 40.5.5 and Definition 40.7.2. Thus $\gamma'(t) = t_{\gamma(t),0,\psi} = 0_{T_p(M)}$ for all $t \in J$ by Definition 57.9.2 and Theorem 54.4.7.


57.9.9 REMARK: Comparison of the differential of a curve with the differential of a map. 
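It is interesting to compare Definition 57.9.2 with the corresponding definitions for the differential of a differentiable map in Section 58.9. The set $\mathbb{R}$ may be regarded as a differentiable manifold $(\mathbb{R}, A_{\mathbb{R}})$ with $A_{\mathbb{R}} = \{\psi_0\}$, where $\psi_0 = \mathrm{id}_{\mathbb{R}}$ is the identity chart on $\mathbb{R}$. Then the differential $d\gamma : \mathbb{R} \to T(\mathbb{R}, M)$ and induced map $\gamma_* : T(\mathbb{R}) \to T(M)$ are defined as follows.

$\forall t \in \mathbb{R},\ \forall a \in \mathbb{R},\quad (d\gamma)_t(t_{t,a,\psi_0}) = a\gamma'(t)$
$\forall t \in \mathbb{R},\ \forall a \in \mathbb{R},\quad \gamma_*(t_{t,a,\psi_0}) = a\gamma'(t)$,

where $a\gamma'(t) = t_{\gamma(t),a\partial_t(\psi(\gamma(t))),\psi}$ for $\psi \in A_M$. Thus $(d\gamma)_t = \gamma_*|_{T_t(\mathbb{R})} \in \operatorname{Lin}(T_t(\mathbb{R}), T_{\gamma(t)}(M))$ for $t \in \mathbb{R}$. Conversely, the value $\gamma'(t)$ can be expressed in terms of $d\gamma$ and $\gamma_*$ as

$\forall t \in \mathbb{R},\quad \gamma'(t) = (d\gamma)_t(t_{t,1,\psi_0}) = \gamma_*(t_{t,1,\psi_0})$.

It is clear that $\gamma'$ contains all of the information in the maps $d\gamma$ and $\gamma_*$. Since $\mathbb{R}$ has such an obvious choice of chart and the pointwise linear spaces of $\mathbb{R}$ are 1-dimensional, it seems quite unnecessary to give the full differentials. It is entirely sensible to define $\gamma'(t)$ as $\gamma_*(t_{t,1,\psi_0})$, since the number 1 and chart $\psi_0$ are implicit.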


The idea of regarding a curve as a map whose domain is the manifold R is more than mere philosophical 
contemplation. It is applied in a practical way in Theorem 59.4.8 to determine the differential of a “tangent 
vector scaling curve", which is used in the proof of Theorem 61.3.3 (a Leibniz rule for naive derivatives 
of vector fields), which is used in the proof of Theorem 71.6.7 (a Leibniz rule for covariant derivatives on 
tangent bundles). 


57.9.10 REMARK: The velocity of a scaling curve, regarding a linear space as a manifold. 

A “scaling curve” is defined here to mean a curve $\lambda \mapsto \lambda V$ for $\lambda \in \mathbb{R}$, where $V$ is a vector in some linear space. A finite-dimensional real linear space can be given a standard manifold structure by adding the atlas consisting of all coordinate maps. (The standard atlas for a finite-dimensional real linear space is given in Definition 49.7.14, and the corresponding manifold structure is given in Definition 51.4.21.) Since the coordinate maps of linear spaces are linear maps, it is not surprising that the velocity of a scaling curve in Theorem 57.9.11 has a special, simple form. When the velocity vector is dropped from $T(F)$ to $F$ via the drop function $\varpi$, the result is the same as the result of the naive differentiation of $\lambda V$ with respect to $\lambda$, namely the vector $V$.


57.9.11 THEOREM: The velocity of a scaling curve in a linear-space manifold. 
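Let $F$ be a finite-dimensional real linear space with the standard manifold atlas $A_F$. For $V \in F$, define the curve $\gamma_V : \mathbb{R} \to F$ by $\lambda \mapsto \lambda V$. Then

$\forall V \in F,\ \forall \lambda \in \mathbb{R},\ \forall \psi \in \operatorname{atlas}(F),\quad \gamma_V'(\lambda) = t_{\lambda V,\psi(V),\psi}$.

Hence

$\forall V \in F,\ \forall \lambda \in \mathbb{R},\quad \varpi(\gamma_V'(\lambda)) = V$,

where $\varpi : T(F) \to F$ is the drop function for the tangent bundle of $F$ as in Definition 54.9.5.

PROOF: Let $m = \dim(F)$. By Definition 57.9.2, $\gamma_V'(\lambda) = t_{\lambda V,\partial_\lambda \psi(\gamma_V(\lambda)),\psi}$. But $\psi(\gamma_V(\lambda)) = \psi(\lambda V) = \lambda\psi(V) \in \mathbb{R}^m$ for all $\lambda \in \mathbb{R}$ because $\psi : F \to \mathbb{R}^m$ is a linear space isomorphism by Theorem 23.1.15. So $\partial_\lambda \psi(\gamma_V(\lambda)) = \psi(V)$. Hence $\gamma_V'(\lambda) = t_{\lambda V,\psi(V),\psi}$.

By Definition 54.9.5, $\varpi(t_{\lambda V,\psi(V),\psi}) = \psi^{-1}(\Phi(\psi)(t_{\lambda V,\psi(V),\psi})) = \psi^{-1}(\psi(V)) = V$. Hence $\varpi(\gamma_V'(\lambda)) = V$ for all $V \in F$ and $\lambda \in \mathbb{R}$.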
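
The computation in this proof can be imitated in coordinates. The following Python sketch is an illustration under stated assumptions: $F$ is taken to be $\mathbb{R}^2$, the chart is a hypothetical invertible linear map given by the matrix `A`, and the "drop" is modelled simply as the inverse chart applied to the velocity component tuple. None of these names come from the book.

```python
def mat_vec(a, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1],
            a[1][0] * v[0] + a[1][1] * v[1]]

def inverse_2x2(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical linear chart psi(v) = A v
V = [0.5, -1.5]                # the vector being scaled

def psi(v):
    """Linear coordinate chart on F = R^2 (assumed for this example)."""
    return mat_vec(A, v)

def gamma(lam):
    """Scaling curve lambda -> lambda * V."""
    return [lam * V[0], lam * V[1]]

def chart_velocity(lam, h=1e-5):
    """Central-difference approximation of d/d(lambda) psi(gamma(lambda))."""
    a, b = psi(gamma(lam - h)), psi(gamma(lam + h))
    return [(b[i] - a[i]) / (2.0 * h) for i in range(2)]

def drop(component):
    """Model of the drop function: apply psi^{-1} to the velocity components."""
    return mat_vec(inverse_2x2(A), component)
```

In this model the chart velocity agrees with `psi(V)` for every value of the parameter, and `drop` returns `V` itself, in line with the two displayed assertions of the theorem.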


57.9.12 REMARK: The relative merits of the dot and dash notations for differentials of curves. 

The notation $\gamma'$ is used for curves instead of $\dot\gamma$ because the dot is more difficult to see. An argument can be made for using the dot-notation $\dot\gamma$ for functions $\gamma : \mathbb{R} \to M$ from the real numbers to a manifold $M$, and the dash-notation $f'$ for functions $f : M \to \mathbb{R}$ from a manifold to the real numbers.

In the literature, both the dot and dash notations are used for differentials of curves. A particular convenience 
of the dot notation is that superscripts may be added to indicate a contravariant vector component index. 


57.9.13 DEFINITION: The tangent operator of a $C^1$ curve $\gamma : I \to M$ in a $C^1$ manifold $M$, for a parameter $t \in I$, where $I \in \operatorname{Top}(\mathbb{R})$, is the tangent operator $\partial_{\gamma'(t)} \in \mathring{T}_{\gamma(t)}(M)$, where $\gamma'(t) \in T_{\gamma(t)}(M)$ is the tangent vector in Definition 57.9.2.


(( 2018-11-12. The only application for Remark 57.9.14 and Definition 57.9.15 seems to be Definition 59.9.3 in 
Section 59.9, which itself appears to have no applications. The original book plan did have applications for 
partial derivative vector fields. Since these concepts are shallow, they may be removed soon. On the other 
hand, the tuple of partial derivatives of a curve-family does constitute some kind of frame field, which could 
be of some relevance to Section 57.11. ))


57.9.14 REMARK: Partial tangent vector fields with respect to coordinate charts. 

The differential $d\gamma$ for a $C^1$ family of curves $\gamma : \mathbb{R}^m \to M$ is not quite the same thing as the sequence $(\partial_k\gamma)_{k=1}^m$ in Definition 57.9.15, but it does contain essentially the same information.

The differentials in Definition 57.9.15 may be thought of as “transversal vector fields”. For example, when $m = 2$, the tangent vector $\partial_2\gamma(u^1, u^2)$ is transverse to the curve $t^1 \mapsto \gamma(t^1, u^2)$ at the point $\gamma(u^1, u^2)$ for $(u^1, u^2) \in \mathbb{R}^2$.


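57.9.15 DEFINITION: A partial derivative vector field of a $C^1$ family of curves $\gamma : \mathbb{R}^m \to M$ with $m \in \mathbb{Z}^+$ in a $C^1$ manifold $M$ is a vector field $\partial_k\gamma : \mathbb{R}^m \to T(M)$ defined for $k \in \mathbb{N}_m$ by

$\forall t \in \mathbb{R}^m,\ \forall \psi \in \operatorname{atlas}_{\gamma(t)}(M),\quad \partial_k\gamma(t) = t_{\gamma(t),\partial_{t^k}(\psi(\gamma(t))),\psi}$.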


57.10. Integral curves of vector fields 


57.10.1 REMARK: Integral curves of vector fields. 

In a C! manifold, a C! curve whose velocity at each point is equal to the value of a given vector field 
at that point is called an "integral curve" of the vector field. (For integral curves of vector fields, see 
Lang [23], pages 66-96; Spivak [37], Volume 1, pages 135-140; Crampin/Pirani [7], pages 59-63; Szekeres [305], 
pages 433-436; Bishop/Goldberg [3], pages 121-128; Frankel [12], pages 30-34; Gallot/Hulin/Lafontaine [13], 
pages 23-24; Bishop/Crittenden [2], pages 14-15; Sulanke/Wintgen [40], page 48; Nash/Sen [30], page 172; 
Kobayashi/Nomizu [19], page 12; Malliavin [28], pages 96-98; Gómez-Ruiz [14], pages 68-78.) 


57.10.2 DEFINITION: An integral curve of a vector field $X \in X(T(M))$ on a $C^1$ differentiable manifold $M$ is a differentiable curve $\gamma : I \to M$ for some open interval $I \in \operatorname{Top}(\mathbb{R})$ such that

$\forall t \in I,\quad \gamma'(t) = X(\gamma(t))$.


57.10.3 EXAMPLE: The integral curves of a vector field in Cartesian two-space. 
Figure 57.10.1 illustrates the integral curves of the naive vector field $X : \mathbb{R}^2 \to \mathbb{R}^2$ which is defined by $X(x,y) = (x + y, y - x)$ for all $(x,y) \in \mathbb{R}^2$.

Figure 57.10.1 Integral curves of the vector field $(x,y) \mapsto (x + y, y - x)$


Let $(x_0, y_0) \in \mathbb{R}^2 \setminus \{(0,0)\}$. Let $t_0 = \frac{1}{2}\ln(x_0^2 + y_0^2)$ and $\beta_0 = \arctan(x_0, y_0)$. (See Definition 44.2.9 for arctan with two parameters.) Define $\gamma_{(x_0,y_0)} : \mathbb{R} \to \mathbb{R}^2 \setminus \{(0,0)\}$ by

$\forall t \in \mathbb{R},\quad \gamma_{(x_0,y_0)}(t) = (e^{t+t_0}\cos(\beta_0 - t),\ e^{t+t_0}\sin(\beta_0 - t))$.

Then $\gamma_{(x_0,y_0)}(0) = (x_0^2 + y_0^2)^{1/2}(\cos(\beta_0), \sin(\beta_0)) = (x_0, y_0)$ and

$\forall t \in \mathbb{R},\quad \gamma_{(x_0,y_0)}'(t) = e^{t+t_0}(\cos(\beta_0 - t) + \sin(\beta_0 - t),\ \sin(\beta_0 - t) - \cos(\beta_0 - t))$
$= (x + y,\ y - x)$,

where $(x,y) = \gamma_{(x_0,y_0)}(t)$. Therefore $\gamma_{(x_0,y_0)}$ is an integral curve of $X$ with initial point $(x_0, y_0)$. (The “integral curve” for $(x_0, y_0) = (0,0)$ is the constant curve $\gamma_{(0,0)} : t \mapsto (0,0)$.)
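
The closed-form curve in this example can be checked numerically against the vector field. The following Python sketch is offered only as an illustration. It assumes that the two-parameter arctan of Definition 44.2.9 is the polar angle of the point $(x_0, y_0)$, which is modelled here by `math.atan2(y0, x0)`, and it uses a central difference in place of the exact derivative. The function names are hypothetical.

```python
import math

def X(x, y):
    """The vector field of the example: X(x, y) = (x + y, y - x)."""
    return (x + y, y - x)

def make_gamma(x0, y0):
    """Closed-form integral curve through (x0, y0) != (0, 0)."""
    t0 = 0.5 * math.log(x0 * x0 + y0 * y0)
    beta0 = math.atan2(y0, x0)  # assumed meaning of the two-parameter arctan

    def gamma(t):
        r = math.exp(t + t0)
        return (r * math.cos(beta0 - t), r * math.sin(beta0 - t))

    return gamma

def velocity(gamma, t, h=1e-6):
    """Central-difference approximation of gamma'(t)."""
    (xa, ya), (xb, yb) = gamma(t - h), gamma(t + h)
    return ((xb - xa) / (2.0 * h), (yb - ya) / (2.0 * h))

def max_residual(x0, y0, ts):
    """Largest component of gamma'(t) - X(gamma(t)) over the sample times."""
    gamma = make_gamma(x0, y0)
    worst = 0.0
    for t in ts:
        vx, vy = velocity(gamma, t)
        fx, fy = X(*gamma(t))
        worst = max(worst, abs(vx - fx), abs(vy - fy))
    return worst
```

A residual close to zero at every sample time is consistent with the curve being an integral curve of $X$, and evaluating the curve at $t = 0$ recovers the initial point.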


57.10.4 REMARK: Existence of integral curves. 

The existence of integral curves is one of the core assertions of differential geometry. Integral curve existence 
is required for the identification of Lie algebras of Lie groups with left invariant vector fields, and also for 
the generation of parallel transport on differentiable fibre bundles from connections. 


The existence and uniqueness of integral curves follows from the corresponding ODE existence and uniqueness 
theorems, such as Theorems 44.7.1 and 44.5.3. To satisfy the requirements of such ODE theorems, the manifold $M$ in Theorems 57.10.5 and 57.10.6 only needs to be $C^{1,1}$ differentiable because the vector field $X$ only needs to be of class $C^{0,1}$. Thus the transition maps in $\operatorname{atlas}(M)$ only need to have Lipschitz continuous first-order derivatives.


(( 2018-11-22. Theorem 57.10.5 uses Theorem 44.7.1, which has not yet been proved in this book, although it 
is a standard theorem which has been known for 128 years. )) 

(( 2018-11-24. Upgrade Theorem 57.10.5 to a C^*? manifold M with X € X**1(T(M)) and y € C^* (I, M) 
for $k \in \mathbb{Z}^+$. ))

57.10.5 THEOREM: Existence of integral curves for differentiable vector fields on manifolds.
Let $M$ be a $C^2$ manifold. Let $X \in X^1(T(M))$. Then for all $p \in M$, there exists a $C^1$ integral curve $\gamma$ of $X$
on $M$ such that $0 \in \mathrm{Dom}(\gamma)$ and $\gamma(0) = p$.


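PROOF: Let $n = \dim(M)$, $p \in M$, $\psi \in \mathrm{atlas}_p(M)$ and $x_0 = \psi(p)$. Let $\bar X = \Phi(\psi) \circ X \circ \psi^{-1}$, where $\Phi(\psi)$ is
defined by Notation 54.5.7. Then $\bar X \in C^1(\mathrm{Range}(\psi), \mathbb{R}^n)$ by Theorem 57.2.3.
Let $r_0 = \frac12 \min(1, d(x_0, \mathbb{R}^n \setminus \mathrm{Range}(\psi)))$. Then $r_0 \in \mathbb{R}^+$ and $\overline{B}_{x_0,r_0} \subseteq \mathrm{Dom}(\bar X)$ by Theorem 37.5.6 (iv)
because $\mathrm{Dom}(\bar X) \in \mathrm{Top}_{x_0}(\mathbb{R}^n)$. So $\overline{B}_{x_0,r_0}$ is a convex compact subset of $\mathrm{Dom}(\bar X)$. Therefore $\bar X$ is Lipschitz
continuous on $\overline{B}_{x_0,r_0}$ by Theorem 41.6.23.

Let $U = \mathbb{R} \times B_{x_0,r_0}$. Then $U \in \mathrm{Top}_{(0,x_0)}(\mathbb{R} \times \mathbb{R}^n)$ by Theorem 32.9.6 (ii). Define $\tilde X : U \to \mathbb{R}^n$ by
$\tilde X(t,x) = \bar X(x) = \Phi(\psi)(X(\psi^{-1}(x)))$ for all $(t,x) \in U$. Then $\tilde X \in C^{0,1}(U, \mathbb{R}^n)$. In other words, $\tilde X$ is
Lipschitz continuous on $U$. So by Theorem 44.7.1, there exist $\varepsilon \in \mathbb{R}^+$ and a curve $\bar\gamma \in C^1(B_{0,\varepsilon}, B_{x_0,r_0})$ such
that $\bar\gamma(0) = x_0$, and $\forall t \in B_{0,\varepsilon},\ \bar\gamma'(t) = \tilde X(t, \bar\gamma(t)) = \bar X(\bar\gamma(t))$.

Let $I = B_{0,\varepsilon}$ and define $\gamma : I \to M$ by $\gamma(t) = \psi^{-1}(\bar\gamma(t))$ for all $t \in I$. Then $\gamma(0) = p$, $\gamma \in C^1(I, M)$,
and $\gamma'(t) = t_{\gamma(t),\bar\gamma'(t),\psi} = \Psi(\psi)^{-1}(\bar\gamma(t), \bar\gamma'(t)) \in T_{\gamma(t)}(M)$ is well defined for all $t \in I$. (See Notation 54.5.21
for $\Psi(\psi)$.) So $\gamma'(t) = t_{\gamma(t),\bar X(\bar\gamma(t)),\psi} = X(\gamma(t))$ for all $t \in I$. Hence $\gamma \in C^1(I, M)$ is an integral curve of $X$
with $\gamma(0) = p$.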


57.10.6 THEOREM: Uniqueness of integral curves for differentiable vector fields on manifolds. 
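Let $M$ be a $C^2$ manifold with $\dim(M) \in \mathbb{Z}^+$. Let $X \in X^1(T(M))$. Let $\gamma_1, \gamma_2 \in C^1(I, M)$ be integral curves
of $X$ on $M$ with $I$ a connected set in $\mathrm{Top}(\mathbb{R})$ and $\gamma_1(t_0) = \gamma_2(t_0)$ for some $t_0 \in I$. Then $\gamma_1 = \gamma_2$.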


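PROOF: Suppose that $\gamma_1 \ne \gamma_2$. Then $\gamma_1(t) \ne \gamma_2(t)$ for some $t \in I$ with either $t < t_0$ or $t > t_0$. It may be
assumed that there exists $t \in I$ with $\gamma_1(t) \ne \gamma_2(t)$ and $t > t_0$. (Otherwise substitute $t \mapsto \gamma_k(2t_0 - t)$ for
$\gamma_k$ for $k = 1, 2$. Or just do the whole proof again, backwards.) Then $\{t \in I \cap (t_0, \infty);\ \gamma_1(t) \ne \gamma_2(t)\} \ne \emptyset$.
So $\tilde t_0 = \inf\{t \in I \cap (t_0, \infty);\ \gamma_1(t) \ne \gamma_2(t)\}$ is well defined and $\tilde t_0 \in I \cap [t_0, \infty)$. Since $\gamma_1(t) = \gamma_2(t)$ for all
$t \in (t', \tilde t_0)$ for some $t' \in (-\infty, \tilde t_0)$, the continuity of $\gamma_1$ and $\gamma_2$ implies that $\gamma_1(\tilde t_0) = \gamma_2(\tilde t_0)$.
Let $\tilde p_0 = \gamma_1(\tilde t_0) = \gamma_2(\tilde t_0)$. Let $\psi \in \mathrm{atlas}_{\tilde p_0}(M)$. Define $\bar\gamma_k = \psi \circ \gamma_k$ for $k = 1, 2$. Let $I_k = \mathrm{Dom}(\bar\gamma_k)$
for $k = 1, 2$. Then $I_k = \gamma_k^{-1}(\mathrm{Dom}(\psi)) \in \mathrm{Top}(\mathbb{R})$ for $k = 1, 2$ by Theorem 31.12.7 and Definition 31.12.4,
where $n = \dim(M)$. Let $\tilde x_0 = \psi(\tilde p_0)$. Then $\tilde t_0 \in I_k$ for $k = 1, 2$. Let $I_0 = I_1 \cap I_2$. Then $I_0 \in \mathrm{Top}_{\tilde t_0}(\mathbb{R})$ and
$I_0 \subseteq \mathrm{Dom}(\bar\gamma_k)$ for $k = 1, 2$. Let $\tilde I_0$ be an open interval with $\tilde t_0 \in \tilde I_0 \subseteq I_0$. (For example, let $\tilde I_0$ be the union of all
such intervals.) Let $\tilde\gamma_k = \bar\gamma_k|_{\tilde I_0}$ for $k = 1, 2$. Then $\tilde\gamma_k \in C^1(\tilde I_0, \mathbb{R}^n)$ for $k = 1, 2$.
Let $\bar X = \Phi(\psi) \circ X \circ \psi^{-1}$. (See Notation 54.5.7 for $\Phi(\psi)$.) Then $\mathrm{Dom}(\bar X) = \mathrm{Range}(\psi) \in \mathrm{Top}(\mathbb{R}^n)$, and
$\bar X \in C^1(\mathrm{Range}(\psi), \mathbb{R}^n)$ by Theorem 57.2.3. Since $\tilde x_0 \in \mathrm{Dom}(\bar X)$, there exists $r \in \mathbb{R}^+$ with $\overline{B}_{\tilde x_0,r} \subseteq \mathrm{Dom}(\bar X)$.
This closed ball is a convex compact subset of $\mathrm{Dom}(\bar X)$. So $\bar X$ is globally uniformly Lipschitz continuous on
$\overline{B}_{\tilde x_0,r}$ by Theorem 41.6.23. Thus there exists $L \in \mathbb{R}_0^+$ such that $\forall y, z \in \overline{B}_{\tilde x_0,r},\ |\bar X(y) - \bar X(z)| \le L\,|y - z|$ by
Definition 38.6.6.
Define $U = \tilde I_0 \times B_{\tilde x_0,r}$. Then $(\tilde t_0, \tilde x_0) \in U$ and $U \in \mathrm{Top}(\mathbb{R} \times \mathbb{R}^n)$. Define $\tilde X : U \to \mathbb{R}^n$ by $\tilde X(t, x) = \bar X(x)$ for
all $(t,x) \in U$. Then $\tilde X \in C^0(U, \mathbb{R}^n)$, and $\tilde X$ is Lipschitz with respect to $x \in B_{\tilde x_0,r}$, with a uniform Lipschitz
constant with respect to $t \in \tilde I_0$, as in line (44.5.3). Moreover, $\tilde\gamma_1(\tilde t_0) = \tilde\gamma_2(\tilde t_0) = \tilde x_0$ and

$$\forall k \in \mathbb{N}_2,\ \forall t \in \tilde I_0,\quad \tilde\gamma_k'(t) = (\psi \circ \gamma_k)'(t) = \bar X(\tilde\gamma_k(t)).$$

Therefore $\tilde\gamma_1 = \tilde\gamma_2$ by Theorem 44.5.3. This contradicts the definition of $\tilde t_0$. Hence $\gamma_1 = \gamma_2$.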


57.10.7 REMARK: $C^1$ manifold requirement for integral curve uniqueness for constant vector fields.

The uniqueness of integral curves, as shown in Theorem 57.10.6, generally requires the manifold to be of
class $C^{1,1}$. Theorem 57.10.8 requires the manifold to be merely $C^1$. This is not a paradox. Lipschitz
continuity of the vector field is required when it is non-constant because then the equation
$\forall t \in \mathbb{R},\ \gamma'(t) = X(\gamma(t))$ becomes an ODE when viewed through the charts, and in this ODE the solution $\gamma$
is on both the left and right hand sides of the equation. This causes a kind of “feedback” which can lead to
non-uniqueness of solutions, as shown by examples in Remark 44.3.2. Therefore to achieve uniqueness for
non-constant vector fields, the vector field must be $C^{0,1}$, and this implies that the manifold must be $C^{1,1}$.
Since there is no “feedback” in Theorem 57.10.8, the manifold is only required to be $C^1$, which ensures that
the tangent bundle exists, which allows $C^1$ curves to be defined.
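The “feedback” non-uniqueness can be illustrated by the standard scalar ODE $y'(t) = \sqrt{|y(t)|}$ with $y(0) = 0$. This is a textbook example and is only assumed to resemble the examples in Remark 44.3.2. The right-hand side is continuous but not Lipschitz at $0$, and both the zero curve and a curve which leaves $0$ along $t^2/4$ are solutions. The sketch below checks this numerically; all function names are illustrative.

```python
import math

def field(y):
    """Right-hand side of y'(t) = sqrt(|y(t)|): continuous, not Lipschitz at 0."""
    return math.sqrt(abs(y))

def zero_solution(t):
    """The constant solution through y(0) = 0."""
    return 0.0

def parabola_solution(t):
    """A second solution through y(0) = 0: zero until t = 0, then t**2 / 4."""
    return 0.25 * t * t if t > 0 else 0.0

def residual(solution, t, h=1e-6):
    """|y'(t) - field(y(t))| with y' estimated by a central difference."""
    dy = (solution(t + h) - solution(t - h)) / (2 * h)
    return abs(dy - field(solution(t)))

def lipschitz_quotient(y):
    """|field(y) - field(0)| / |y|, which grows without bound as y -> 0."""
    return abs(field(y) - field(0.0)) / abs(y)
```

Both candidate curves have small residuals away from $t = 0$ although they differ for $t > 0$, and `lipschitz_quotient` grows as its argument shrinks, which is the failure of the $C^{0,1}$ condition.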


57.10.8 THEOREM: All integral curves of zero vector fields are constant curves. 

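Let $M$ be a $C^1$ manifold. Let $X \in X^0(T(M))$ be the zero vector field on $M$. Then for all $p \in M$, there exists
a unique integral curve $\gamma \in C^1(\mathbb{R}, M)$ of $X$ which satisfies $\gamma(0) = p$, and this integral curve is constant. In
other words, $\forall t \in \mathbb{R},\ \gamma(t) = p$ is the only solution of $\forall t \in \mathbb{R},\ \gamma'(t) = X(\gamma(t))$ with $\gamma(0) = p$.


PROOF: Define $\gamma : \mathbb{R} \to M$ by $\forall t \in \mathbb{R},\ \gamma(t) = p$. Then $\gamma \in C^1(\mathbb{R}, M)$ by Theorem 51.9.6, and $\gamma'(t) =
0_{T_p(M)} = X(p) = X(\gamma(t))$ for all $t \in \mathbb{R}$ by Theorem 57.9.8. Thus $\gamma$ is an integral curve of $X$.


To show uniqueness, let $\tilde\gamma : \mathbb{R} \to M$ satisfy $\tilde\gamma(0) = p$ and $\forall t \in \mathbb{R},\ \tilde\gamma'(t) = X(\tilde\gamma(t))$. Then $\forall t \in \mathbb{R},\ \tilde\gamma'(t) = 0_{T_{\tilde\gamma(t)}(M)}$. Suppose
that $\tilde\gamma \ne \gamma$. Then $\tilde\gamma(t) \ne \gamma(t)$ for some $t \in \mathbb{R} \setminus \{0\}$. Suppose that $\tilde\gamma(t) \ne \gamma(t)$ for some $t \in \mathbb{R}^+$. Then let
$t_0 = \inf\{t \in \mathbb{R}^+;\ \tilde\gamma(t) \ne \gamma(t)\}$. Then $t_0 \in \mathbb{R}_0^+$ and $\tilde\gamma(t_0) = \gamma(t_0)$ by the continuity of $\tilde\gamma$ and $\gamma$, where $\gamma(t_0) = p$
by the definition of $\gamma$.

Let $\psi \in \mathrm{atlas}_p(M)$. Let $F = \psi \circ \gamma$ and $\tilde F = \psi \circ \tilde\gamma$. Then $\Omega = \mathrm{Dom}(F) \cap \mathrm{Dom}(\tilde F) \in \mathrm{Top}_{t_0}(\mathbb{R})$, and $F'(t) = 0_{\mathbb{R}^n}$
and $\tilde F'(t) = 0_{\mathbb{R}^n}$ for all $t \in \Omega$. Since $\tilde F(t_0) = F(t_0)$, it follows from Theorem 43.9.9 that $\tilde F|_{[\alpha,\beta]} = F|_{[\alpha,\beta]}$ for
any $\alpha, \beta \in \mathbb{R}$ such that $\alpha < t_0 < \beta$ and $[\alpha, \beta] \subseteq \Omega$. (Alternatively apply Theorem 40.6.8 to the components
of these differential equations.) This contradicts the definition of $t_0$.


Similarly, if $\tilde\gamma(t) \ne \gamma(t)$ for some $t \in \mathbb{R}^-$, then $t_0 = \sup\{t \in \mathbb{R}^-;\ \tilde\gamma(t) \ne \gamma(t)\}$ gives the same conclusion.
Hence the integral curve is unique.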


((2019-9-24. Here might be a good location for a section on calculus of variations on manifolds. )) 


57.11. Frame fields for tangent bundles 


57.11.1 REMARK: Frame bundle cross-sections can be used as “infrastructure”. 

Cross-sections of frame bundles are, in a formal sense, just one of the many classes of objects which inhabit 
a differentiable manifold. A frame bundle cross-section defines a linearly independent tuple of vectors at 
each point of its domain, and vector-tuples are, apparently, merely tuples of vectors. So a frame bundle 
cross-section is formally equivalent to an n-tuple of vector fields which are linearly independent at every 
point, where n is the dimension of the manifold. 


However, frame bundle cross-sections are most often found in an “infrastructure” role as a generalisation of 
the concept of a coordinate basis. Then other objects in the manifold may be coordinatised in terms of the 
basis provided for the tangent space at each point. 


This is similar to the way in which a metric tensor field is, in a formal sense, merely a cross-section of 
the doubly covariant tensor bundle on a manifold, but in reality the metric is regarded as part of the 
infrastructure or “fabric” of the manifold. 


Tangent vector frame bundles are defined in Sections 55.6 and 55.7. In the infrastructure role, a frame bundle 
cross-section is often referred to as a “moving frame” or “repère mobile” (in French). It is preferable to avoid
the words “moving” and “mobile” because they suggest a time parameter or some similar dependency, 
whereas in fact the parameter space of a frame bundle cross-section is the static point-manifold. 

A frame bundle cross-section is called a “frame field” by O’Neill [295], page 10. It is called a “local frame 
field” by Darling [8], page 144. A linearly independent tuple of vector fields is called a “frame of vector 
fields” by Frankel [12], page 243. 

See Notation 55.6.6 and Definition 55.6.31 for the frame bundle $F(M)$ in Definition 57.11.2. This is the same
as $F^n(M)$, where $n = \dim(M)$. See Definition 47.4.2 and Notation 47.4.3 for spaces of local cross-sections of
topological fibrations such as $X(F(M)|U)$. (Note that it is not possible to call $F(M)$ a differentiable fibre
bundle at this point because this concept is defined in Chapter 64.)


Since global frame fields are a special case of local frame fields, it is always sufficient to state definitions and 
theorems for the local case, as in Definition 57.11.4 and Theorem 57.11.6 for example.


57.11.2 DEFINITION: Tangent vector frame fields.

A (global) (tangent) (vector) frame field on a $C^1$ manifold $M$ is an element of $X(F(M))$. In other words, a
global frame field on $M$ is a global cross-section of the frame bundle $F(M)$.

A (local) (tangent) (vector) frame field on a $C^1$ manifold $M$ is an element of $X(F(M)|U)$ for some $U \in
\mathrm{Top}(M)$. In other words, a local frame field on $M$ is a local cross-section of the frame bundle $F(M)$.


57.11.3 REMARK: The dual of a frame field. 

To avoid the burden of defining covariant “dual frame bundles” analogously to the contravariant frame
bundles $F(M)$, the dual frame field in Definition 57.11.4 is specified as a “frame of differential forms”. In
other words, it is an $n$-tuple $(\sigma^i)_{i=1}^n$ of differential forms which are linearly independent at each point. Thus
the value of the $i$th differential form at $p \in U$ is $\sigma^i(p)$, not $\sigma(p)^i$. Inconsistently, the value of the $j$th vector
in the frame field $X$ at $p \in U$ is $X(p)_j$, not $X_j(p)$. (Luckily this transposition of parameters is of no great
importance.) The obvious alternative would be to define a “tangent covector frame bundle” $F^*(M)$ to act
as a dual space to $F(M)$. A presentation of such a dual space would be even less interesting to read (and to
write) than Sections 55.6 and 55.7. Nevertheless, it would be reasonable to think of the dual frame field in
Definition 57.11.4 as a “tangent covector frame field”.


In much of the literature, the dual frame field is denoted by the letter “$\theta$”. (See for example Spivak [37],
Volume 2, page 260.) It is denoted “$\sigma$” by Frankel [12], page 243.


57.11.4 DEFINITION: The dual frame field of a frame field $X \in X(F(M)|U)$ of a $C^1$ manifold $M$, where
$U \in \mathrm{Top}(M)$, is the tuple $(\sigma^i)_{i=1}^n$ of cross-sections $\sigma^i \in X(T^{0,1}(M)|U)$ for $i \in \mathbb{N}_n$, where $n = \dim(M)$,
which are given by

$$\forall i \in \mathbb{N}_n,\ \forall p \in U,\ \forall V \in T_p(M),\quad \sigma^i(p)(V) = \kappa_{X(p)}(V)^i,$$

where $\kappa_{X(p)} : T_p(M) \to \mathbb{R}^n$ is the component map for the linear space $T_p(M)$ with respect to the basis $X(p)$.
(See Definition 22.8.8 for $\kappa$.)


57.11.5 REMARK: Existence and well-definition of the dual frame field. 

The existence and well-definition of the dual frame field in Definition 57.11.4 follow from Theorem 22.8.3.
The component map $\kappa_{X(p)} : T_p(M) \to \mathbb{R}^n$ is a bijection by Theorem 22.8.11, and it is a linear space
isomorphism by Theorem 23.1.15. Thus the map $V \mapsto (\sigma^i(p)(V))_{i=1}^n = \kappa_{X(p)}(V)$ for $V \in T_p(M)$ is a linear
space isomorphism. Consequently $\sigma^i(p) \in T_p^*(M) = T_p^{0,1}(M)$ for all $p \in U$ and $i \in \mathbb{N}_n$.


57.11.6 THEOREM: Some basic properties of dual frame fields. 
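Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Let $U \in \mathrm{Top}(M)$. Let $X \in X(F(M)|U)$.


(i) $\forall p \in U,\ \forall i, j \in \mathbb{N}_n,\ \sigma^i(p)(X(p)_j) = \delta^i_j$.
(ii) $\forall p \in U,\ \forall V \in T_p(M),\ V = \sum_{i=1}^n \sigma^i(p)(V)\, X(p)_i$.
(iii) $\forall Y \in X(T(M)|U),\ \forall p \in U,\ Y(p) = \sum_{i=1}^n \sigma^i(p)(Y(p))\, X(p)_i$.
(iv) $\forall p \in U,\ \forall \alpha \in T_p^*(M),\ \alpha = \sum_{i=1}^n \alpha(X(p)_i)\, \sigma^i(p)$.
(v) $\forall \omega \in X(T^*(M)|U),\ \forall p \in U,\ \omega(p) = \sum_{i=1}^n \omega(p)(X(p)_i)\, \sigma^i(p)$.
PROOF: Part (i) follows from Theorem 22.8.9 (iii).
For part (ii), let $p \in U$ and $V \in T_p(M)$. Then $\sigma^i(p)(V) = \kappa_{X(p)}(V)^i$ by Definition 57.11.4. Consequently
$\sum_{i=1}^n \sigma^i(p)(V)\, X(p)_i = \sum_{i=1}^n \kappa_{X(p)}(V)^i\, X(p)_i = V$ by Definition 22.8.8.
Part (iii) follows from part (ii) by letting $V = Y(p)$ for each $p \in U$.
For part (iv), let $p \in U$ and $\alpha \in T_p^*(M)$. Then

$$\forall V \in T_p(M),\quad \Bigl(\sum_{i=1}^n \alpha(X(p)_i)\, \sigma^i(p)\Bigr)(V) = \sum_{i=1}^n \alpha(X(p)_i)\, \sigma^i(p)(V)$$
$$= \sum_{i=1}^n \alpha(X(p)_i)\, \kappa_{X(p)}(V)^i$$
$$= \alpha\Bigl(\sum_{i=1}^n \kappa_{X(p)}(V)^i\, X(p)_i\Bigr) \qquad (57.11.1)$$
$$= \alpha(V), \qquad (57.11.2)$$

where line (57.11.1) follows from the linearity of $\alpha$, and line (57.11.2) follows from Definition 22.8.8 for
component maps $\kappa_{X(p)}$. Hence $\sum_{i=1}^n \alpha(X(p)_i)\, \sigma^i(p) = \alpha$.


Part (v) follows from part (iv) by letting $\alpha = \omega(p)$ for each $p \in U$.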
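Parts (i) and (ii) of Theorem 57.11.6 can be made concrete at a single point. The following is a minimal sketch which assumes that the tangent space has been identified with $\mathbb{R}^2$ through a chart, so that the frame $X(p)$ is a pair of component tuples. Under that assumption the dual covectors are the rows of the inverse of the matrix whose columns are the frame vectors. The names `dual_frame`, `pair` and `expand` are illustrative, not the book's notation.

```python
def dual_frame(frame):
    """Return covectors (as component tuples) dual to a basis of R^2.

    frame = (X1, X2); the result (s1, s2) satisfies s_i(X_j) = delta_ij.
    The covectors are the rows of the inverse of the matrix with columns X1, X2.
    """
    (a, c), (b, d) = frame          # matrix [[a, b], [c, d]] has columns X1, X2
    det = a * d - b * c
    if det == 0:
        raise ValueError("frame vectors are linearly dependent")
    return ((d / det, -b / det), (-c / det, a / det))

def pair(covector, vector):
    """Apply a covector to a vector, both given by components."""
    return sum(ci * vi for ci, vi in zip(covector, vector))

def expand(frame, vector):
    """Part (ii): rebuild V as the sum over i of sigma^i(V) X_i."""
    sigma = dual_frame(frame)
    coeffs = [pair(s, vector) for s in sigma]
    return tuple(sum(coeffs[i] * frame[i][k] for i in range(2)) for k in range(2))
```

For example, `expand(frame, v)` should return `v` again, up to rounding, for any linearly independent `frame`.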


57.11.7 REMARK: Juxtaposition pseudo-products of frame field and dual frame field components.
Theorem 57.11.6 part (ii) may be written in a slightly cryptic form as $\forall p \in U,\ \mathrm{id}_{T_p(M)} = \sum_{i=1}^n \sigma^i(p)\, X(p)_i$.
This does not have an obvious meaning. The juxtaposition $\sigma^i(p) X(p)_i$ cannot be interpreted as a scalar
product or a real-number product because $\sigma^i(p) \in T_p^*(M)$ and $X(p)_i \in T_p(M)$. One might guess that the
intention is to apply $\sigma^i(p)$ to $X(p)_i$, which would yield $\sigma^i(p)(X(p)_i) = 1$, which gives $\sum_{i=1}^n \sigma^i(p) X(p)_i = n$,
which is presumably not what is intended. In fact, this kind of expression is intended to accept a vector
$V \in T_p(M)$ as the argument of $\sigma^i(p)$, which yields a real number, and this number is supposed to act as a
scalar multiplier for the vector $X(p)_i$. This yields Theorem 57.11.6 part (ii). (Note that the “wrong answer” $n$
equals the trace of $\mathrm{id}_{T_p(M)}$, which is not just a lucky coincidence.)


In general, “multiplying” vectors in $T_p(M)$ by covectors in $T_p^*(M)$ to obtain tensors in $T_p^{1,1}(M)$ is error-prone
at best and logically incorrect at worst. Therefore such “pseudo-product” expressions should be avoided,
although unfortunately, these kinds of “juxtaposition products” of covariant and contravariant tensors are
often seen in the literature. So some guesswork is required for their interpretation, particularly when tensor
terms in more complicated spaces $T_p^{r_1,s_1}(M)$ and $T_p^{r_2,s_2}(M)$ are “multiplied” by juxtaposition and then
summed over one or more indices.


57.11.8 REMARK: Frame fields for general differentiable vector bundles. 

The tangent vector frame fields in Definition 55.6.3 may be generalised without difficulty to the general
differentiable vector bundles in Definition 65.1.3 whose fibre spaces are finite-dimensional linear spaces. (See 
for example Darling [8], page 144.) The dual frame fields in Definition 57.11.4 are then equally straightforward 
to generalise to vector bundles. Similarly, connections and curvature may be defined on principal bundles 
constructed from general vector-frames, more or less in the Cartan connection style. (See for example 
Darling [8], pages 194-206.) However, concepts such as torsion, for example, are not well defined for general 
vector bundles.


Chapter 58


DIFFERENTIALS OF FUNCTIONS AND MAPS 


58.1 The pointwise differential of a real-valued function
58.2 The global differential of a real-valued function
58.3 Induced-map tangent spaces
58.4 Pointwise differentials of differentiable maps
58.5 Pointwise differentials of identity maps
58.6 Partial differentials of maps on direct products
58.7 Differentials of common-domain product-maps
58.8 Pointwise pull-back differentials of differentiable maps
58.9 Global differentials of differentiable maps
58.10 Global differentials of maps on products and products of maps
58.11 Global pull-back differentials of differentiable maps
58.12 Differentials and induced maps for differential operators


58.1. The pointwise differential of a real-valued function 


58.1.1 REMARK: The differential of a $C^1$ real-valued function has three equivalent formulations.

The derivative of a real-valued function of several real variables is typically specified in application contexts as
the sequence of partial derivatives of the function with respect to the variables. Tuples of partial derivatives
follow a simple covariant transformation rule under changes of coordinates if the function is $C^1$.


The pure mathematical tendency is to specify the directional derivative of the function in every direction at 
every point, not just the derivatives along the axes. (Ever since the advent of the numericisation of space in 
the 17th century, pure mathematicians have yearned to be free from axes and coordinates!) 


In modern abstract treatments of differential geometry, the differential of a real-valued function is specified
as a directional derivative. This associates with each $C^1$ function $f$ and direction $V$ at each point $p$ the
directional derivative $\partial_V f(p)$. (See Notation 54.11.12 for “$\partial_V$”.) If $f$ and $p$ are fixed, $\partial_V f(p)$ is a function
of the direction $V$. This is the “differential” of the function $f$ at $p$. The differential of a function is a map
from the set of all vectors $V$ at a point to the real-valued derivative of the function in the direction $V$. This
is linear with respect to $V$. Therefore a differential is a linear functional on the set of vectors $V$. In other
words, it is a member of the dual linear space.


Thus the differential of a $C^1$ real-valued function at a point $p$ in a $C^1$ manifold $M$ can be viewed in three
ways: as a sequence of partial derivatives $(\partial_i f)_{i=1}^n$, as a directional derivative $\partial_V f$ for $V \in T_p(M)$, or as a
linear functional or total differential map $V \mapsto \partial_V f$, which is an element of the dual linear space $T_p^*(M)$.
The equivalence of these perspectives is guaranteed for $C^1$ functions by Theorem 41.6.17. This equivalence
is not generally valid for non-$C^1$ functions.
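The three views can be compared numerically in the simplest setting. The sketch below assumes $M = \mathbb{R}^2$ with the identity chart and uses a sample $C^1$ function chosen only for illustration. The partial derivatives and directional derivatives are approximated by central differences, so agreement is only up to rounding and truncation error.

```python
import math

def f(x, y):
    """A sample C^1 function on R^2 (an arbitrary illustrative choice)."""
    return math.sin(x) * y + x * x

def partials(x, y, h=1e-6):
    """View 1: the tuple of partial derivatives (d_1 f, d_2 f) at (x, y)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def directional(x, y, v, h=1e-6):
    """View 2: the directional derivative of f at (x, y) in the direction v."""
    return (f(x + h * v[0], y + h * v[1]) - f(x - h * v[0], y - h * v[1])) / (2 * h)

def differential(x, y):
    """View 3: the linear functional V -> sum_i (d_i f) V^i at (x, y)."""
    d1, d2 = partials(x, y)
    return lambda v: d1 * v[0] + d2 * v[1]
```

For a $C^1$ function such as this one, `directional(x, y, v)` and `differential(x, y)(v)` should agree closely for every direction `v`.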


Although manifolds and functions are often assumed to be as smooth as desired in modern textbooks on
pure mathematical differential geometry, there are application scenarios where the luxury of $C^\infty$ smoothness
cannot be purchased. It is the $C^1$ level of smoothness which is the minimum price to pay for the modern
abstract treatment of differentials as covectors. In applications, it is important to know the minimum
smoothness required as input for each definition and theorem in order to obtain the desired outputs. When
that minimum smoothness is not available, more detailed analysis must be undertaken.


Definition 58.1.2 exploits the linear-map property of the differential of a $C^1$ function to express it as an
element of the dual space $T_p^*(M)$ at a single point $p \in M$. The notation $(df)_p$ for the differential at a single
point is slightly ambiguous. It could mean the value $(df)(p)$ of a map $df : M \to T^*(M)$, or it could mean the
restriction $(df)|_{T_p(M)}$ of a map $df : T(M) \to \mathbb{R}$. (Issues regarding the “globalisation” of the differential are
postponed until Remark 58.2.1.) The tangent operator version of the differential in Definition 58.1.3 is not
much used in this book, although the majority of modern textbooks do use tangent operators to represent
tangent vectors. (Some spaces and maps in Definitions 58.1.2 and 58.1.3 are illustrated in Figure 58.1.1.)


Figure 58.1.1 Differential of a real-valued function at a point for vectors and operators


58.1.2 DEFINITION: The differential of $f$ at $p$, for any function $f \in C^1(M)$ and point $p$ in a $C^1$ manifold $M$,
is the map $(df)_p : T_p(M) \to \mathbb{R}$ defined by

$$\forall V \in T_p(M),\quad (df)_p(V) = \partial_V f.$$
(See Notations 54.11.12 and 54.11.3 for $\partial_V : C^1(M) \to \mathbb{R}$.)


58.1.3 DEFINITION: The differential of f at p (for tangent operators) is the map (df), : T,(M) >R 
defined by 


VL € T,(M), (df)p(L) = L(f). 


58.1.4 THEOREM: Linearity of the differential of a real-valued function at a point. 
Let $M$ be a $C^1$ manifold. Then $(df)_p : T_p(M) \to \mathbb{R}$ is a linear map for all $p \in M$ and $f \in C^1(M)$.


PROOF: The assertion follows directly from Definition 58.1.2 and Theorem 54.11.13 (iii). 


58.1.5 REMARK: Notation for differentials of real-valued functions. 

The same notation $(df)_p$ is used for both the tangent vector and operator versions of differentials to economise
on notations. (A notation such as (d f)» would have been the natural choice for the operator version, 
but this would look silly. It increases the line spacing too much. But with such a notation, one could 
write (df), = (df), o O, where Ó is the map V — Oy as in Figure 58.1.1.) 


58.1.6 THEOREM: Expressions for differentials of real-valued functions. 
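Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Then

$$\forall f \in C^1(M,\mathbb{R}),\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,$$
$$(df)_p(t_{p,v,\psi}) = \partial_{p,v,\psi}(f) \qquad (58.1.1)$$
$$= \sum_{i=1}^n v^i\, \partial_{x^i}\bigl(f \circ \psi^{-1}(x)\bigr)\big|_{x=\psi(p)} \qquad (58.1.2)$$
and
$$\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\quad (d\psi^i)_p(t_{p,v,\psi}) = v^i \qquad (58.1.3)$$
and
$$\forall f \in C^1(M,\mathbb{R}),\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\quad (df)_p(e_i^{p,\psi}) = \partial_i^{p,\psi}(f). \qquad (58.1.4)$$
PROOF: It follows from Definition 54.1.2 that $t_{p,v,\psi} \in T_p(M)$. Therefore $(df)_p(t_{p,v,\psi}) = \partial_V f$ by Definition 58.1.2,
where $V = t_{p,v,\psi}$. But then $\partial_V f = \partial_{p,v,\psi}(f)$ by Notation 54.11.12, which verifies line (58.1.1).
Then line (58.1.2) follows from Notation 54.11.3. It follows that

$$\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,$$
$$(d\psi^i)_p(t_{p,v,\psi}) = \sum_{j=1}^n v^j\, \partial_{x^j}\bigl(\psi^i \circ \psi^{-1}(x)\bigr)\big|_{x=\psi(p)} = \sum_{j=1}^n v^j\, \partial_{x^j} x^i \big|_{x=\psi(p)} = \sum_{j=1}^n v^j \delta^i_j = v^i,$$

which verifies line (58.1.3).
For line (58.1.4), $e_i^{p,\psi} = t_{p,e_i,\psi}$ by Notation 54.4.10, and $\partial_i^{p,\psi}(f) = \partial_{p,e_i,\psi}(f)$ by Notation 54.13.5. Therefore
$(df)_p(e_i^{p,\psi}) = \partial_{p,e_i,\psi}(f) = \partial_i^{p,\psi}(f)$ by line (58.1.1).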


58.1.7 REMARK: The close relation between directional derivatives and differentials. 

Since $(df)_p = \{(V, \partial_V f);\ V \in T_p(M)\}$, whereas $\partial_V = \{(f, \partial_V f);\ f \in C^1(M)\}$, it seems that $\partial_V$ and $(df)_p$
are projections of the map $(f, V) \mapsto \partial_V f$ onto $f$ and $V$ respectively. So the derivative and the differential
are just different ways of viewing the combined map $(V, f) \mapsto \partial_V f$ from $T_p(M) \times C^1(M)$ to $\mathbb{R}$. Therefore
the maps $\partial : V \mapsto \partial_V$ and $d_p : f \mapsto (df)_p$ may be regarded as function-transposes of each other. The former,
“$\partial$”, is a map from $T_p(M)$ to the tangent operator space at $p$, whereas the latter, “$d_p$”, is a map from $C^1(M)$ to $T_p^*(M)$.


58.1.8 REMARK: Chart component differentials. 

Since each component $\psi^i$ of a chart $\psi \in \mathrm{atlas}_p(M)$ is a real-valued $C^1$ function on an open neighbourhood of
$p \in M$ if $M$ is a $C^1$ manifold, it seems that it should have a well-defined differential, and if so, it should have
an identifiable meaning. Let $U = \mathrm{Dom}(\psi)$. Then $U$ is a $C^1$ manifold with an atlas consisting of restrictions
of charts in atlas($M$) to $U$. So Definition 58.1.2 is directly applicable to the $C^1$ manifold $U$ with the atlas of
restricted charts. Therefore the “chart component differentials” $(d\psi^i)_p$ are well defined. (These are referred
to as “coordinate differentials” by Crampin/Pirani [7], page 37.) Theorem 58.1.9 shows that these are the
same as the “tangent space chart-basis covectors” $e^i_{p,\psi}$ in Definition 55.3.2.


58.1.9 THEOREM: The differential of a chart component equals the corresponding chart-basis covector.
Let $M$ be a $C^1$ manifold with $n = \dim(M) \in \mathbb{Z}_0^+$. Then

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\quad (d\psi^i)_p = e^i_{p,\psi} = t^*_{p,e^i,\psi}. \qquad (58.1.5)$$
PROOF: By Definition 55.3.2 and Notation 55.3.3, $e^i_{p,\psi} = t^*_{p,e^i,\psi}$. By Definition 55.2.1, $(d\psi^i)_p = t^*_{p,e^i,\psi}$ if
and only if $\forall V \in T_p(M),\ (d\psi^i)_p(V) = t^*_{p,e^i,\psi}(V)$. But by Theorem 54.1.8 (xii), $T_p(M) = \{t_{p,v,\psi};\ v \in \mathbb{R}^n\}$.
Therefore $(d\psi^i)_p = t^*_{p,e^i,\psi}$ if and only if $\forall v \in \mathbb{R}^n,\ (d\psi^i)_p(t_{p,v,\psi}) = t^*_{p,e^i,\psi}(t_{p,v,\psi})$. But by Theorem 58.1.6
line (58.1.3), $(d\psi^i)_p(t_{p,v,\psi}) = v^i$ for all $v \in \mathbb{R}^n$, and by Notation 55.2.9, $t^*_{p,e^i,\psi}(t_{p,v,\psi}) = \sum_{j=1}^n (e^i)_j v^j =
\sum_{j=1}^n \delta^i_j v^j = v^i$ for all $v \in \mathbb{R}^n$. It follows that $(d\psi^i)_p = e^i_{p,\psi} = t^*_{p,e^i,\psi}$ for all $p \in M$, $\psi \in \mathrm{atlas}_p(M)$
and $i \in \mathbb{N}_n$.


58.1.10 REMARK: Tangent space chart-basis covectors are equivalent to partial differentials.
For the tangent space chart-basis covectors $e^i_{p,\psi}$, which equal the chart component differentials $(d\psi^i)_p$, one
could also use the simplified notation $d^i_{p,\psi}$ or $d^i$. Then the identity $d^i_{p,\psi}(e_j^{p,\psi}) = \delta^i_j$ holds in terms of the
chart-basis vector notation $e_j^{p,\psi}$ in Notation 54.4.10. (See also Remark 58.1.14 for similar comments on
notation.) The tangent covectors $(d\psi^i)_p$ may also be written as $d\psi^i(p)$.


58.1.11 REMARK: Differentials of locally defined real-valued functions. 

It is often desirable to define (df), to act on functions in C1(M, R) instead of C! (M), because then it is not 
necessary to extend local functions to global functions with desired local properties. (See Notation 51.6.8 
for e (M,R) = U(C!(Q,R; Q € Top, (M )}.) Another advantage is the ability to write expressions such 


as (d^), = = a oP” (f)(dy")p for f € CÀ(M, R), with n = dim(M), as in Theorem 58.1.13. 


58.1.12 REMARK: The set of differentials at a point spans the tangent covector space.

The tangent covector set $T_p^*(M)$ is equal to $\{(df)_p : T_p(M) \to \mathbb{R};\ f \in C^1(M)\}$ because the set of differentials
$(df)_p$ spans $T_p^*(M)$. This can be shown with functions $f$ which are chart component functions $\psi^i$ for
$\psi \in \mathrm{atlas}(M)$ multiplied by suitable functions with compact support. (See Example 44.1.12 for smooth
functions with compact support.)

If the differentials $(df)_p$ are defined in terms of tangent operators rather than tangent vectors in $T_p(M)$, then the
tangent covector space may be defined as the set of differentials $(df)_p$ for $f \in C^1(M)$, regarded as maps from the
tangent operator space at $p$ to $\mathbb{R}$. The difference
between these tangent covector spaces (probably) may be glossed over in most situations.


58.1.13 THEOREM: The coordinate differentials span tangent covector spaces. 
The sequence of vectors $((d\psi^i)_p)_{i=1}^n$ is a basis for $T_p^*(M)$ for any $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$, for any $C^1$
manifold $M$ with $n = \dim(M) \in \mathbb{Z}_0^+$.


PROOF: The assertion follows from Theorems 55.3.5 and 58.1.9. 


58.1.14 REMARK: Linear combination expression for differentials at a point. 

Theorem 58.1.15 expresses the differential $(df)_p$ of a $C^1$ real-valued function in terms of the unit basis
vectors in Theorem 58.1.13. The tangent covectors $(d\psi^i)_p$ could plausibly be abbreviated to $d^i_{p,\psi}$ or $d^i$.
(See Remark 54.13.11 for the corresponding tangent operator abbreviations. See Remark 58.1.10 for the
chart-basis tangent covector notation.)


58.1.15 THEOREM: Expressions for differentials of functions in terms of chart-basis covectors. 
Let M be a C! manifold with n = dim(M) € Zj. Then 


Vp € M, Vw € atlas, (M), Vf € Ĉ}(M, R), 
$$(df)_p = \sum_{i=1}^n \partial_i^{p,\psi}(f)\, (d\psi^i)_p.$$


PROOF: Let p € M, v» € atlas; (M) and f - CM, R). Then f € C! (Q, R) for some € € Top,(M) by 
Notation 51.6.7. Let V € T,(M). Then V = typo "for some v € R” by Notation 54.1.4. So by Theorem 58.1.6 
lines (58.1.2) and (58.1.3), 


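$$(df)_p(V) = (df)_p(t_{p,v,\psi})$$
$$= \sum_{i=1}^n v^i\, \partial_{x^i}\bigl(f \circ \psi^{-1}(x)\bigr)\big|_{x=\psi(p)}$$
$$= \sum_{i=1}^n (d\psi^i)_p(V)\, \partial_i^{p,\psi}(f)$$

since $\partial_i^{p,\psi}(f) = \partial_{x^i}\bigl(f(\psi^{-1}(x))\bigr)\big|_{x=\psi(p)}$ by Theorem 54.13.6 line (54.13.1). So $(df)_p = \sum_{i=1}^n \partial_i^{p,\psi}(f)\, (d\psi^i)_p$.
The other two equalities then follow from Theorem 58.1.9.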
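The chart-component expression for $(df)_p$ can be tested in a non-trivial chart. The sketch below assumes $M = \mathbb{R}^2 \setminus \{(0,0)\}$ with a polar chart $\psi : (x,y) \mapsto (r,\theta)$ and a sample function chosen for illustration. A tangent vector with chart components $v = (v^r, v^\theta)$ is pushed to Cartesian components by the Jacobian of $\psi^{-1}$, which is an assumption standing in for the book's $t_{p,v,\psi}$ construction. All function names are hypothetical.

```python
import math

def f(x, y):
    """A sample C^1 function in Cartesian coordinates."""
    return x * x * y

def grad_f(x, y):
    """Cartesian partial derivatives of f, computed by hand."""
    return (2 * x * y, x * x)

def chart_inverse(r, theta):
    """Inverse of the polar chart: (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def chart_components(r, theta, h=1e-6):
    """Partial derivatives of f o chart_inverse at (r, theta), by central differences."""
    fr = (f(*chart_inverse(r + h, theta)) - f(*chart_inverse(r - h, theta))) / (2 * h)
    ft = (f(*chart_inverse(r, theta + h)) - f(*chart_inverse(r, theta - h))) / (2 * h)
    return (fr, ft)

def push_to_cartesian(r, theta, v):
    """Cartesian components of the tangent vector with polar chart components v."""
    vr, vt = v
    return (math.cos(theta) * vr - r * math.sin(theta) * vt,
            math.sin(theta) * vr + r * math.cos(theta) * vt)

def df_cartesian(r, theta, v):
    """(df)_p(V) computed with Cartesian components."""
    g = grad_f(*chart_inverse(r, theta))
    w = push_to_cartesian(r, theta, v)
    return g[0] * w[0] + g[1] * w[1]

def df_chart(r, theta, v):
    """(df)_p(V) computed as the sum over i of v^i d_i(f o chart_inverse)."""
    c = chart_components(r, theta)
    return c[0] * v[0] + c[1] * v[1]
```

The two functions `df_cartesian` and `df_chart` should return the same number up to finite-difference error, which reflects the chart-independence of the differential.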


58.1.16 REMARK: Expression for the differential in terms of tangent covector bundle charts. 

Since the differential of a function is a tangent covector, it can be coordinatised by the tangent covector bundle
charts $\Phi^*(\psi)$ in Notation 55.4.7 and the tangent covector bundle manifold charts $\Psi^*(\psi)$ in Definition 55.4.8.
This is done in Theorem 58.1.17.


58.1.17 THEOREM: Components of differentials of functions in terms of differential operators. 
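Let $M$ be a $C^1$ manifold with $n = \dim(M)$. Then

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\quad \Phi^*(\psi)((df)_p) = \bigl(\partial_i^{p,\psi}(f)\bigr)_{i=1}^n$$
$$= \bigl(\partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)}\bigr)_{i=1}^n$$

and

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\quad \Psi^*(\psi)((df)_p) = \bigl(\psi(p),\ (\partial_i^{p,\psi}(f))_{i=1}^n\bigr)$$
$$= \bigl(\psi(p),\ (\partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)})_{i=1}^n\bigr).$$

PROOF: Let $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$. Then $(df)_p = \sum_{i=1}^n \partial_i^{p,\psi}(f)\, t^*_{p,e^i,\psi}$ by Theorem 58.1.15, and
so $\Phi^*(\psi)((df)_p) = \Phi^*(\psi)\bigl(\sum_{i=1}^n \partial_i^{p,\psi}(f)\, t^*_{p,e^i,\psi}\bigr) = \sum_{i=1}^n \partial_i^{p,\psi}(f)\, \Phi^*(\psi)(t^*_{p,e^i,\psi}) = \sum_{i=1}^n \partial_i^{p,\psi}(f)\, e^i = (\partial_i^{p,\psi}(f))_{i=1}^n$ by
Notation 55.4.7. Therefore $\Phi^*(\psi)((df)_p) = \bigl(\partial_{x^i} f(\psi^{-1}(x))\big|_{x=\psi(p)}\bigr)_{i=1}^n$ by Theorem 54.13.6. The equalities for
$\Psi^*(\psi)$ then follow from Definition 55.4.8.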


58.1.18 REMARK: Computation of the differential of a function along a curve. 

The assertion of Theorem 58.1.19 may seem intuitively obvious, and the proof may seem needlessly tedious, 
but the use of coordinate charts in this computation is inescapable. It is important to sometimes see how 
the abstractions of differential geometry all ultimately boil down to differential calculus on Cartesian spaces. 
(The spaces and maps in Theorem 58.1.19 are illustrated in Figure 58.1.2.) 
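The reduction to differential calculus on Cartesian spaces can be seen in a small numerical check of the chain rule along a curve. This is a sketch only. It assumes $M = \mathbb{R}^2$ with the identity chart, and the function and curve are arbitrary illustrative choices whose derivatives are written out by hand.

```python
import math

def f(x, y):
    """A sample C^1 function on R^2."""
    return x * x * y + math.cos(y)

def grad_f(x, y):
    """Partial derivatives of f, computed by hand."""
    return (2 * x * y, x * x - math.sin(y))

def curve(t):
    """A sample C^1 curve in R^2."""
    return (math.cos(t), math.sin(2 * t))

def curve_velocity(t):
    """Velocity of the sample curve, computed by hand."""
    return (-math.sin(t), 2 * math.cos(2 * t))

def lhs(t0, h=1e-6):
    """The derivative of f(curve(t)) at t = t0, by a central difference."""
    return (f(*curve(t0 + h)) - f(*curve(t0 - h))) / (2 * h)

def rhs(t0):
    """The differential of f at curve(t0) applied to the velocity curve'(t0)."""
    gx, gy = grad_f(*curve(t0))
    vx, vy = curve_velocity(t0)
    return gx * vx + gy * vy
```

Comparing `lhs(t0)` with `rhs(t0)` for several values of `t0` gives agreement up to finite-difference error.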


Figure 58.1.2 Spaces and maps for differential of function along a curve 


58.1.19 THEOREM: Chain rule for differentiation of a real function composed with a curve. 
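Let $M$ be a $C^1$ manifold. Let $\gamma : I \to M$ be a $C^1$ curve in $M$ for some $I \in \mathrm{Top}(\mathbb{R})$. Let $f \in C^1(M, \mathbb{R})$.
Then

$$\forall t_0 \in I,\quad (df)_{\gamma(t_0)}(\gamma'(t_0)) = \partial_t\bigl(f(\gamma(t))\bigr)\big|_{t=t_0}.$$
PROOF: Let $t_0 \in I$, $p = \gamma(t_0)$ and $\psi \in \mathrm{atlas}_p(M)$. It follows from Definition 57.9.2 that $\gamma'(t_0) = t_{p,v,\psi}$
with $v = \partial_t\bigl(\psi(\gamma(t))\bigr)\big|_{t=t_0}$. So $(df)_p(\gamma'(t_0)) = \partial_V f$ by Definition 58.1.2, where $V = t_{p,v,\psi}$. But $\partial_V = \partial_{p,v,\psi}$ by
Notation 54.11.12, and $\partial_{p,v,\psi}(f) = \sum_{i=1}^n v^i\, \partial/\partial x^i \bigl(f \circ \psi^{-1}(x)\bigr)\big|_{x=\psi(p)}$ by Notation 54.11.3. Therefore

$$(df)_p(\gamma'(t_0)) = \sum_{i=1}^n \partial_t\bigl(\psi^i(\gamma(t))\bigr)\big|_{t=t_0}\ \partial_{x^i}\bigl(f \circ \psi^{-1}(x)\bigr)\big|_{x=\psi(p)}$$
$$= \partial_t\bigl(f \circ \psi^{-1} \circ \psi \circ \gamma(t)\bigr)\big|_{t=t_0}$$
$$= \partial_t\bigl(f(\gamma(t))\bigr)\big|_{t=t_0}.$$


58.2. The global differential of a real-valued function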
58.2.1 REMARK: Globalisation of the differential of a differentiable function. 


The pointwise differentials $(df)_p$ in Definition 58.1.2 may be “globalised” to a single object $df$ which combines
the information for all points of the manifold in two principal ways.


(1) Differential map: $df : T(M) \to \mathbb{R}$ with $df : V \mapsto (df)_{\pi(V)}(V)$ for $V \in T(M)$, where $\pi : T(M) \to M$ is
the projection map for $T(M)$ in Definition 54.5.4.
Then $\forall p \in M,\ (df)_p = (df)|_{T_p(M)}$. Thus $df = \{(V, (df)_{\pi(V)}(V));\ V \in T(M)\} = \bigcup_{p \in M} (df)_p$.
(2) Differential field: $df \in X(T^*(M))$ with $df : p \mapsto (df)_p$ for $p \in M$.
In other words, $df : p \mapsto (V \mapsto (df)_p(V))$ for $p \in M$ and $V \in T_p(M)$.
Then $\forall p \in M,\ (df)_p = (df)(p)$. Thus $df = \{(p, (df)_p);\ p \in M\}$.


The global differential in Definition 58.2.2 is a differential map as in (1), whereas the global differential in 
Definition 58.2.3 is a differential field as in (2). Thus Definitions 58.2.2 and 58.2.3 give inconsistent meanings 
to the notation “df”. 
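The two globalisations differ only in how the arguments are packaged, which can be sketched in code. The example below assumes $M = \mathbb{R}^2$ with the identity chart and models an element of $T(M)$ as a pair `(p, v)`, which is an illustrative simplification of the book's tangent bundle. The names `df_map` and `df_field` are hypothetical.

```python
def grad_f(p):
    """Partial derivatives of the sample function f(x, y) = x**2 * y at p = (x, y)."""
    x, y = p
    return (2 * x * y, x * x)

def df_map(tagged_vector):
    """Differential map: takes an element (p, v) of the tangent bundle to a number."""
    p, v = tagged_vector            # p plays the role of the projection pi(V)
    g = grad_f(p)
    return g[0] * v[0] + g[1] * v[1]

def df_field(p):
    """Differential field: takes a point p to the covector (df)_p."""
    g = grad_f(p)
    return lambda v: g[0] * v[0] + g[1] * v[1]
```

In this model, `df_map((p, v))` and `df_field(p)(v)` always agree. The first form is a single function on the total space, whereas the second is a curried form which returns a linear functional for each base point.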


58.2.2 DEFINITION: Let $M$ be a $C^1$ differentiable manifold. Let $f \in C^1(M, \mathbb{R})$.
The differential (map) of $f$ is the map $df : T(M) \to \mathbb{R}$ defined by

$$\forall p \in M,\ \forall V \in T_p(M),\quad (df)(V) = (df)_p(V).$$
In other words,
$$\forall V \in T(M),\quad (df)(V) = (df)_{\pi(V)}(V),$$
where $\pi : T(M) \to M$ is the projection map for the tangent fibration $(T(M), \pi, M)$.


58.2.3 DEFINITION: Let $M$ be a $C^1$ differentiable manifold. Let $f \in C^1(M, \mathbb{R})$.
The differential (field) of $f$ is the covector field $df \in X(T^*(M))$ defined by

$$\forall p \in M,\quad (df)(p) = (df)_p.$$
In other words,

$$\forall p \in M,\ \forall V \in T_p(M),\quad (df)(p)(V) = (df)_p(V).$$


58.2.4 REMARK: Local definitions of the differential of a real-valued function. 

If $f$ is defined only on an open subset $\Omega$ of $M$ in Definitions 58.2.2 and 58.2.3, then the domain of $df$ is
correspondingly restricted, and $\mathrm{Dom}(f)$ is understood to have a restricted atlas. (See Definition 51.4.15
for the restriction of a differentiable manifold to an open subset.) Then the differential map of $f$ has the
form $df : T(\Omega) \to \mathbb{R}$ in Definition 58.2.2, and the differential field of $f$ has the form $df \in X(T^*(\Omega))$ in
Definition 58.2.3. These kinds of restrictions are usually made without comment.


58.2.5 REMARK:  Differentiability of the differential of a real-valued function. 

Theorem 58.2.6 makes the unsurprising assertion that a $C^{k+1}$ real-valued function has a $C^k$ differential
field. To prove this, the abstract specification in Definition 58.2.3 must be converted to a concrete form as a
function on a Cartesian space patch so that the analysis for real-valued functions on Cartesian spaces may
be applied.


58.2.6 THEOREM: If a real function is $C^{k+1}$, then its differential is $C^k$.
Let $f \in C^{k+1}(M)$ for a $C^{k+1}$ manifold $M$ with $k \in \mathbb{Z}_0^+$. Let $df$ be the differential field of $f$. Then
$df \in X^k(T^*(M))$. Hence if $f \in C^\infty(M)$ for a $C^\infty$ manifold $M$, then $df \in X^\infty(T^*(M))$.

PROOF: Let $k \in \mathbb{Z}_0^+$. Let $M$ be a $C^{k+1}$ manifold. Then $T^*(M)$ is a $C^k$ manifold by Theorem 55.4.13 (iii).
Let $df \in X(T^*(M))$ be the differential field of $f$ as in Definition 58.2.3. Let $\psi \in \mathrm{atlas}(M)$. Let $U = \mathrm{Dom}(\psi)$
and $\Omega = \mathrm{Range}(\psi)$. Then $f \circ \psi^{-1} \in C^{k+1}(\Omega, \mathbb{R})$ by Definition 51.6.2. Let $x \in \Omega$ and $p = \psi^{-1}(x)$. Then

$$(\Psi^*(\psi) \circ df \circ \psi^{-1})(x) = \Psi^*(\psi)((df)_p)$$
$$= \bigl(x,\ (\partial_{x^i} f(\psi^{-1}(x)))_{i=1}^n\bigr)$$

by Theorem 58.1.17. But $f \circ \psi^{-1} \in C^{k+1}(\Omega, \mathbb{R})$ by Definition 51.6.2. So $\Psi^*(\psi) \circ df \circ \psi^{-1} \in C^k(\Omega, \mathbb{R}^{2n})$.
Therefore $df \in X^k(T^*(M))$ by Theorem 57.5.7 line (57.5.2). The assertion for $f \in C^\infty(M)$ then follows
from this.


58.2.7 REMARK: Tangent differential operator versions of the global differential. 

Definition 58.2.8 adapts the differential field in Definition 58.2.3 for tangent differential operators instead of 
plain tangent vectors. (See Notation 54.11.9 for T, (M). See Notation 54.15.4 for T,(M).) This is the global 
differential field corresponding to the pointwise differential for tangent operators in Definition 58.1.3. (Note 
that the untagged “total space” 7*(M) is not a true fibre bundle.) 


58.2.8 DEFINITION: Let M be a C! differentiable manifold. Let f € C! (M, IR). 
The differential (field) of f (for tangent operators) is the function df € X (T*(M)) defined by 


Vp € M, VL e T, (M), (df)(p)(L) = (df) (L). 


The differential (field) of f. (for tagged tangent operators) is the function df € X(T*(M)) defined by 


Vp € M, V(p,L) € T,(M), (df)(p)(p, L) = (df), (L). 


58.3. Induced-map tangent spaces 


58.3.1 REMARK: Spaces of induced maps of differentiable maps between manifolds. 

Section 58.3 defines spaces of induced maps of differentiable maps between manifolds. Each “induced-map
tangent space” could be thought of, more or less, as a kind of tangent bundle on the Cartesian product of
the domain and range tangent bundles, but this seems artificial. It is better to regard such spaces as mere
notational conveniences, since it is easier to write $T(M_1, M_2)$ than $\bigcup_{p_1 \in M_1,\, p_2 \in M_2} \mathrm{Lin}(T_{p_1}(M_1), T_{p_2}(M_2))$.
Such a space does not seem to have any obviously useful geometrical interpretation. If one attempts to
define cross-sections of such spaces, there seems to be no obvious application. So these are not really tangent
bundles. Nor are they really fibre bundles in any natural sense.


58.3.2 DEFINITION: The (pointwise) induced-map tangent space of a pair of $C^1$ manifolds $M_1$ and $M_2$ at
points $p_1 \in M_1$ and $p_2 \in M_2$ is the linear space $\mathrm{Lin}(T_{p_1}(M_1), T_{p_2}(M_2))$ of linear maps between the tangent
spaces $T_{p_1}(M_1)$ and $T_{p_2}(M_2)$.


58.3.3 NOTATION: $T_{p_1,p_2}(M_1, M_2)$, for $p_1 \in M_1$, $p_2 \in M_2$, for $C^1$ manifolds $M_1$ and $M_2$, denotes the
pointwise induced-map tangent space $\mathrm{Lin}(T_{p_1}(M_1), T_{p_2}(M_2))$.


58.3.4 DEFINITION: The (total) induced-map tangent space of a pair of $C^1$ manifolds $M_1$ and $M_2$ is the
set $\bigcup_{p_1 \in M_1,\, p_2 \in M_2} T_{p_1,p_2}(M_1, M_2)$ of all linear maps between tangent spaces $T_{p_1}(M_1)$ and $T_{p_2}(M_2)$.


58.3.5 NOTATION: $T(M_1, M_2)$, for $C^1$ manifolds $M_1$ and $M_2$, denotes the total induced-map tangent space
$\bigcup_{p_1 \in M_1,\, p_2 \in M_2} T_{p_1,p_2}(M_1, M_2)$.


58.3.6 REMARK: Duals of induced-map tangent spaces. 
The duals of the induced-map tangent spaces are the induced-map tangent covector spaces. The space
$T^*(M_1, M_2)$ is essentially equivalent to the space $T(M_2, M_1)$ for any $C^1$ manifolds $M_1$ and $M_2$.


58.3.7 DEFINITION: The (pointwise) induced-map tangent covector space of a pair of $C^1$ manifolds $M_1$ and
$M_2$ at points $p_1 \in M_1$ and $p_2 \in M_2$ is the space $\mathrm{Lin}(T^*_{p_1}(M_1), T^*_{p_2}(M_2))$ of linear maps between the tangent
covector spaces $T^*_{p_1}(M_1)$ and $T^*_{p_2}(M_2)$.


58.3.8 NOTATION: $T^*_{p_1,p_2}(M_1, M_2)$, for $p_1 \in M_1$, $p_2 \in M_2$, for $C^1$ manifolds $M_1$ and $M_2$, denotes the
pointwise induced-map tangent covector space $\mathrm{Lin}(T^*_{p_1}(M_1), T^*_{p_2}(M_2))$.


58.3.9 DEFINITION: The (total) induced-map tangent covector space of a pair of $C^1$ manifolds $M_1$ and $M_2$
is the set $\bigcup_{p_1 \in M_1,\, p_2 \in M_2} T^*_{p_1,p_2}(M_1, M_2)$ of all linear maps between spaces $T^*_{p_1}(M_1)$ and $T^*_{p_2}(M_2)$.


58.3.10 NOTATION: $T^*(M_1, M_2)$, for $C^1$ manifolds $M_1$ and $M_2$, denotes the total induced-map tangent
covector space $\bigcup_{p_1 \in M_1,\, p_2 \in M_2} T^*_{p_1,p_2}(M_1, M_2)$.


58.3.11 REMARK: General “two-point tensors”. 

Maps in Lin(75, (Mi), Tp, (M2)) may be thought of as “two-point tensors” of type ((1,0), (1,0)) or ie a 
General two-point tensors of type ((r1, 81), (r2, $2)) or Ms 2] are defined by Marsden/Hughes [289], page 70, 
to be elements of Lin(151*: (Mi), T5??? (M3)). 


58.4. Pointwise differentials of differentiable maps 


58.4.1 REMARK: Survey of terminology for differentials and induced maps. 

Some terminology variants for differentials and induced maps in the literature are listed in Table 58.4.1. 
(Regarding the terminology of Auslander/MacKenzie [1], it should be noted that they define their differential 
df to be an equivalence class of differentiable functions which have the same derivative at a given point.) 


year reference $df$ for $f : M \to \mathbb{R}$ $\phi_*$ for $\phi : M_1 \to M_2$
1963 Auslander/MacKenzie [1], pages 43, 46 differential induced linear transformation 
1963 Kobayashi/Nomizu [19], pages 6, 8 total differential differential 
1964 Bishop/Crittenden [2], pages 9, 12 differential differential 
1965 Postnikov [33], page 22, 24 differential differential 
1968 Bishop/Goldberg [3], pages 55, 58 differential differential 
1968 Choquet-Bruhat [6], pages 27, 29, 60 [fr] différentielle différentielle 
1970 Misner/Thorne/Wheeler [292], pages 59-63 gradient, differential 
1970 Spivak [37], Vol. I, page 109 differential 
1972 Malliavin [28], page 26 [fr] application dérivée 
1979 Do Carmo [9], page 10 differential 
1981 Bleecker [254], page 8 differential 
1981 Poor [32], page xiii induced tangent map, differential 
1986 Crampin/Pirani [7], pages 249, 250 differential induced map 
1987 Gallot/Hulin/Lafontaine [13], page 17 differential map 
1993 EDM2 [113], 105.I, 105.J, page 385 differential differential 
1993 Kosinski [21], page 14 differential 
1994 Darling [8], pages 35, 41 differential differential 
1995 O'Neill [295], page 4 differential differential map
1997 Frankel [12], pages 7, 40 differential differential 
1999 Lang [23], page 52 tangent map, induced map 
2004 Szekeres [305], pages 423, 426 differential tangent map, differential 
2005 Penrose [297], page 224 gradient 
2012 Sternberg [38], page 26 differential 
2015 Gómez-Ruiz [14], page 23 [es] diferencial 
Kennington differential differential, induced map 
Table 58.4.1 Survey of terminology for differentials and induced maps 


In many cases, the mathematical constructions which are given these names are defined only at a single 
point. When these constructions are aggregated into functions on the entire domain manifold, most authors 
define an aggregate function equivalent to a map $df : M \to T^*(M)$ for real functions $f : M \to \mathbb{R}$. For maps
$\phi : M_1 \to M_2$ between manifolds, they generally define an aggregate function of the form $d\phi \in T(M_1, M_2)$,
which maps from the first tangent bundle to the second.


Some authors refer to the differential of a real-valued function as the “exterior derivative". The differential 
of a real-valued function on a manifold M is in fact essentially the same as the exterior derivative if the 
real-valued function is thought of as a cross-section of the Ag(M) bundle. In other words, f € X°(A9(M, R)) 
and df € X! (A9(M, R)), where Ao(M,R) looks like C°(M, R) and A4(M,TR) looks like T*(M). 


58.4.2 REMARK: Relative advantages of terminologies for differentials and induced maps. 

The term “differential” is used in this book for the style of aggregate differential map whose domain is the
same as the domain of the function or map which is being differentiated. For example, $f : M \to \mathbb{R}$ has a
differential $df : M \to T^*(M)$. The term “induced map” is used here for a differential map whose domain and
range are the tangent bundles of the respective domain and range of the map which is being differentiated.
For example, $\phi : M_1 \to M_2$ has an induced map $d\phi : T(M_1) \to T(M_2)$.


Despite possible arguments in favour of the term “induced map”, however, in practice it seems preferable to
use the term “differential” in all cases. Differentials of the form $df : M \to T^*(M)$ are best regarded as just a
convenient way of packaging the more general kind of differential of the form $df : T(M) \to \mathbb{R}$, and the term
“differential” is shorter and more popular.


58.4.3 REMARK: The difference between “differential” and “gradient”.

As mentioned in Remark 74.6.1, the differential is a covariant vector (i.e. differential form) which can be 
defined without a connection or metric, whereas the gradient is a contravariant vector which is constructed 
from the differential with the assistance of a metric tensor. 
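
The distinction can be made concrete in local coordinates. The following is only a sketch: the metric components $g_{ij}$, the inverse components $g^{ij}$ and the polar-coordinate example are illustrative assumptions which do not appear in the remark above.

```latex
% Differential: needs only a chart (x^1, ..., x^n); no metric is involved.
\[
  df = \sum_{i=1}^n \partial_{x^i} f \, dx^i .
\]
% Gradient: the index is raised with the inverse metric g^{ij}.
\[
  (\operatorname{grad} f)^i = \sum_{j=1}^n g^{ij} \, \partial_{x^j} f .
\]
% Assumed example: polar coordinates (r, \theta) on a chart of the punctured
% plane, with g = dr^2 + r^2 d\theta^2 and f(r, \theta) = \theta.
\[
  df = d\theta,
  \qquad
  \operatorname{grad} f = r^{-2} \, \partial_\theta .
\]
```

The covector $df$ has constant components in this chart, whereas the components of the gradient depend on $r$ through the metric.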


58.4.4 REMARK: The differential of a map is uniquely defined by the chain rule for curves. 

As mentioned in Remark 58.4.10, there is really only one reasonable choice for the differential of a map 
between manifolds in Definition 58.4.5. Thus Definition 58.4.5 may be regarded as a theorem which follows 
directly from the expected chain rule for composition of maps with curves. The coordinate transformation 
rule on line (58.4.2) is simply the Cartesian map differential via the respective charts on the two manifolds. 


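58.4.5 DEFINITION: The differential at a point $p \in M_1$ of a $C^1$ map $\phi : M_1 \to M_2$, for $C^1$ manifolds $M_1$
and $M_2$ with $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$, is the linear map $(d\phi)_p : T_p(M_1) \to T_{\phi(p)}(M_2)$ defined by


$\forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),\ \forall v_1 \in \mathbb{R}^{n_1},$


$\qquad (d\phi)_p(t_{p,v_1,\psi_1}) = t_{\phi(p),v_2,\psi_2},$ (58.4.1)
where $v_2 \in \mathbb{R}^{n_2}$ is defined by

$\forall k \in \mathbb{N}_{n_2},\quad v_2^k = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}(\psi_2^k \circ \phi \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$ (58.4.2)
$\qquad\qquad\qquad = \partial_{p,v_1,\psi_1}(\psi_2^k \circ \phi).$ (58.4.3)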
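
As a concrete instance of lines (58.4.1) and (58.4.2), the following sketch takes $M_1 = M_2 = \mathbb{R}^2$ with identity charts, so that the sum on line (58.4.2) is just the Jacobian matrix applied to $v_1$. The map $\phi$, the point $p$ and the vector $v_1$ are illustrative assumptions, not taken from the text.

```latex
% Assumed example: \phi(x, y) = (x^2 - y^2, 2xy) on R^2,
% with identity charts \psi_1 = \psi_2 = id.
\[
  \bigl[ \partial_{x^i} \phi^k \bigr](x, y)
  = \begin{pmatrix} 2x & -2y \\ 2y & 2x \end{pmatrix} .
\]
% At p = (1, 2) with v_1 = (1, 1):
\[
  v_2 = \begin{pmatrix} 2 & -4 \\ 4 & 2 \end{pmatrix}
        \begin{pmatrix} 1 \\ 1 \end{pmatrix}
      = \begin{pmatrix} -2 \\ 6 \end{pmatrix},
  \qquad
  (d\phi)_p(t_{p,v_1,\psi_1}) = t_{\phi(p),v_2,\psi_2},
  \quad \phi(p) = (-3, 4) .
\]
```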


58.4.6 REMARK: Coefficients of the differential expressed as a differential operator. 

The expression for $v_2$ on line (58.4.3) in Definition 58.4.5 follows from Notation 54.11.3. This expression is
not often useful because it applies a vectorial operator $\partial_{p,v_1,\psi_1}$ to the non-vectorial real-valued components
$\psi_2^k \circ \phi : M_1 \to \mathbb{R}$. So this expression has only a limited application to computations such as in the proof of
Theorem 58.7.7 and some minor comments in Remarks 58.9.3 and 58.12.1.


58.4.7 REMARK: Tangent space for the range of a pointwise differential of a map. 
Although Theorem 58.4.8 is in the realm of the obvious, it is given here so as to remove doubt because it is 
required so often. (Some of the maps and spaces in Theorem 58.4.8 are illustrated in Figure 58.4.2.) 


58.4.8 THEOREM: The differential of a map at a point lies in the tangent space at the map’s value. 
Let $\phi : M_1 \to M_2$ be a $C^1$ map for $C^1$ manifolds $M_1$ and $M_2$. Then

$\forall p \in M_1,\ \forall V \in T_p(M_1),\quad (d\phi)_p(V) \in T_{\phi(p)}(M_2).$

In other words,
$\forall V \in T(M_1),\quad (d\phi)_{\pi_1(V)}(V) \in T_{\phi(\pi_1(V))}(M_2),$
where $\pi_1 : T(M_1) \to M_1$ is the tangent bundle projection map for $M_1$. In other words,


$\forall p \in M_1,\quad \mathrm{Range}((d\phi)_p) \subseteq T_{\phi(p)}(M_2).$


[Figure 58.4.2 Pointwise differential of a differentiable map]


PROOF: The assertion follows directly from Definition 58.4.5.


58.4.9 THEOREM:  Linearity of the differential of a map between manifolds. 
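Let $\phi : M_1 \to M_2$ be a $C^1$ map for $C^1$ manifolds $M_1$ and $M_2$.


(i) $\forall p \in M_1,\ \forall \lambda \in \mathbb{R},\ \forall V \in T_p(M_1),\quad (d\phi)_p(\lambda V) = \lambda (d\phi)_p(V)$.
(ii) $\forall p \in M_1,\ \forall V_1, V_2 \in T_p(M_1),\quad (d\phi)_p(V_1 + V_2) = (d\phi)_p(V_1) + (d\phi)_p(V_2)$.
(iii) $(d\phi)_p : T_p(M_1) \to T_{\phi(p)}(M_2)$ is a linear map for all $p \in M_1$.


PROOF: For part (i), let $p \in M_1$, $V \in T_p(M_1)$, $\psi_1 \in \mathrm{atlas}_p(M_1)$ and $n_1 = \dim(M_1)$. Then $V = t_{p,v_1,\psi_1}$ for
some $v_1 \in \mathbb{R}^{n_1}$ by Notation 54.1.4. Let $\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$ and $n_2 = \dim(M_2)$. Then $(d\phi)_p(V) = t_{\phi(p),v_2,\psi_2}$,
where $v_2 \in \mathbb{R}^{n_2}$ is given by

$v_2 = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}(\psi_2 \circ \phi \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}.$

Let $\lambda \in \mathbb{R}$. Then $\lambda V = t_{p,\lambda v_1,\psi_1}$ by Definition 54.4.4 (ii). So $(d\phi)_p(\lambda V) = t_{\phi(p),v_2',\psi_2}$, where $v_2' \in \mathbb{R}^{n_2}$ is
given by

$v_2' = \sum_{i=1}^{n_1} (\lambda v_1^i) \, \partial_{x^i}(\psi_2 \circ \phi \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$
$\quad = \lambda \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}(\psi_2 \circ \phi \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$
$\quad = \lambda v_2.$

Hence by Definition 54.4.4 (ii), $(d\phi)_p(\lambda V) = t_{\phi(p),\lambda v_2,\psi_2} = \lambda\, t_{\phi(p),v_2,\psi_2} = \lambda (d\phi)_p(V)$.
Part (ii) follows in a similar manner to part (i) from Definition 54.4.4 (i).
Part (iii) follows from parts (i) and (ii) and Definition 23.1.1.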


58.4.10 REMARK: The fundamental property of the differential of a map. 
Theorem 58.4.11 may be regarded as the fundamental property of differentials of maps between differentiable 
manifolds. In fact, some authors use this property as the definition of the differential of a map. 


58.4.11 THEOREM: Chain rule for composition of maps with curves. 
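Let $\phi : M_1 \to M_2$ be a $C^1$ map for $C^1$ manifolds $M_1$ and $M_2$. Let $\gamma : I \to M_1$ be a $C^1$ curve for some
interval $I \in \mathrm{Top}(\mathbb{R})$. Then


$\forall t \in I,\quad (\phi \circ \gamma)'(t) = (d\phi)_{\gamma(t)}(\gamma'(t)).$

PROOF: Let $t \in I$. Let $\psi_1 \in \mathrm{atlas}_{\gamma(t)}(M_1)$ and $\psi_2 \in \mathrm{atlas}_{\phi(\gamma(t))}(M_2)$. Then $\gamma'(t) = t_{\gamma(t),\partial_t(\psi_1(\gamma(t))),\psi_1}$ and
$(\phi \circ \gamma)'(t) = t_{\phi(\gamma(t)),\partial_t(\psi_2(\phi(\gamma(t)))),\psi_2}$ by Definition 57.9.2. Let $p = \gamma(t)$ and $v_1 = \partial_t(\psi_1(\gamma(t)))$. Then by
Definition 58.4.5, $(d\phi)_p(t_{p,v_1,\psi_1}) = t_{\phi(p),v_2,\psi_2}$ with


$v_2 = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}(\psi_2 \circ \phi \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$
$\quad = \sum_{i=1}^{n_1} \partial_{x^i}(\psi_2 \circ \phi \circ \psi_1^{-1})(x)\big|_{x=\psi_1(\gamma(t))} \, \partial_t(\psi_1^i \circ \gamma)(t)$
$\quad = \partial_t(\psi_2 \circ \phi \circ \psi_1^{-1} \circ \psi_1 \circ \gamma)(t)$ (58.4.4)
$\quad = \partial_t(\psi_2 \circ \phi \circ \gamma)(t)$
$\quad = \partial_t(\psi_2(\phi(\gamma(t)))),$


where $n_1 = \dim(M_1)$ and line (58.4.4) follows from Theorem 41.7.2. Hence $(d\phi)_{\gamma(t)}(\gamma'(t)) = (d\phi)_p(t_{p,v_1,\psi_1}) = (\phi \circ \gamma)'(t)$.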
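
The statement of Theorem 58.4.11 can be checked by hand in a simple Cartesian case. The following sketch assumes $M_1 = M_2 = \mathbb{R}^2$ with identity charts; the map $\phi$ and the curve $\gamma$ are illustrative choices only.

```latex
% Assumed example: \phi(x, y) = (x^2 - y^2, 2xy) and \gamma(t) = (t, t^2).
% Left-hand side: differentiate the composite curve directly.
\[
  (\phi \circ \gamma)(t) = (t^2 - t^4, \; 2t^3),
  \qquad
  \partial_t (\phi \circ \gamma)(t) = (2t - 4t^3, \; 6t^2) .
\]
% Right-hand side: apply the Jacobian of \phi at \gamma(t) to \gamma'(t) = (1, 2t).
\[
  \begin{pmatrix} 2t & -2t^2 \\ 2t^2 & 2t \end{pmatrix}
  \begin{pmatrix} 1 \\ 2t \end{pmatrix}
  = \begin{pmatrix} 2t - 4t^3 \\ 6t^2 \end{pmatrix} .
\]
```

Both sides give the same component tuple, as the theorem requires.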


58.4.12 REMARK: Chain rule for the differentials of maps between differentiable manifolds. 
Theorem 58.4.13 is the $C^1$ manifold version of the Cartesian space chain rule in Theorem 41.7.2. This is one
of the most frequently used theorems about differentials of maps.


As suggested in Remarks 41.6.2 and 41.6.3, it is the $C^1$ level of differentiability which makes the chain rule
work. The chain rule on its own is sufficient justification for employing $C^1$ differentiability rather than other
kinds of differentiability.
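
In coordinates, the chain rule for differentials of maps amounts to multiplication of Jacobian matrices. The following sketch uses Cartesian spaces with identity charts; the maps $\phi_1$ and $\phi_2$ are illustrative assumptions.

```latex
% Assumed example: \phi_1 : R^2 -> R^2, \phi_1(x, y) = (x + y, xy),
% and \phi_2 : R^2 -> R, \phi_2(u, v) = uv.
% Direct computation of the differential of the composite:
\[
  (\phi_2 \circ \phi_1)(x, y) = x^2 y + x y^2,
  \qquad
  \bigl[ \partial (\phi_2 \circ \phi_1) \bigr](x, y)
  = \begin{pmatrix} 2xy + y^2 & x^2 + 2xy \end{pmatrix} .
\]
% Chain rule: Jacobian of \phi_2 at \phi_1(x, y) times Jacobian of \phi_1 at (x, y).
\[
  \begin{pmatrix} xy & x + y \end{pmatrix}
  \begin{pmatrix} 1 & 1 \\ y & x \end{pmatrix}
  = \begin{pmatrix} 2xy + y^2 & x^2 + 2xy \end{pmatrix} .
\]
```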


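58.4.13 THEOREM: Chain rule for pointwise differentials of differentiable maps.
Let $\phi_1 : M_1 \to M_2$ and $\phi_2 : M_2 \to M_3$ be $C^1$ maps between $C^1$ manifolds $M_1$, $M_2$ and $M_3$. Then


$\forall p \in M_1,\quad (d(\phi_2 \circ \phi_1))_p = (d\phi_2)_{\phi_1(p)} \circ (d\phi_1)_p.$


PROOF: Let $n_\ell = \dim(M_\ell)$ for $\ell = 1, 2, 3$. Let $\psi_1 \in \mathrm{atlas}_p(M_1)$ and $\psi_2 \in \mathrm{atlas}_{\phi_1(p)}(M_2)$. Let $V_1 \in T_p(M_1)$. Then $V_1 = t_{p,v_1,\psi_1}$ for some (unique) $v_1 \in \mathbb{R}^{n_1}$
by Notation 54.1.4 and Theorem 54.1.8 (vii). So by Definition 58.4.5, $(d(\phi_2 \circ \phi_1))_p(V_1) = t_{\phi_2(\phi_1(p)),v_3,\psi_3}$,
where $v_3 \in \mathbb{R}^{n_3}$ satisfies


$\forall \psi_3 \in \mathrm{atlas}_{\phi_2(\phi_1(p))}(M_3),\ \forall k \in \mathbb{N}_{n_3},$

$v_3^k = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}(\psi_3^k \circ (\phi_2 \circ \phi_1) \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$

$\quad = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}\bigl((\psi_3^k \circ \phi_2 \circ \psi_2^{-1}) \circ (\psi_2 \circ \phi_1 \circ \psi_1^{-1})\bigr)(x)\big|_{x=\psi_1(p)}$

$\quad = \sum_{i=1}^{n_1} v_1^i \sum_{j=1}^{n_2} \partial_{y^j}(\psi_3^k \circ \phi_2 \circ \psi_2^{-1})(y)\big|_{y=\psi_2(\phi_1(p))} \, \partial_{x^i}(\psi_2^j \circ \phi_1 \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$ (58.4.5)

$\quad = \sum_{j=1}^{n_2} \partial_{y^j}(\psi_3^k \circ \phi_2 \circ \psi_2^{-1})(y)\big|_{y=\psi_2(\phi_1(p))} \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}(\psi_2^j \circ \phi_1 \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$

$\quad = \sum_{j=1}^{n_2} v_2^j \, \partial_{y^j}(\psi_3^k \circ \phi_2 \circ \psi_2^{-1})(y)\big|_{y=\psi_2(\phi_1(p))},$


where line (58.4.5) follows from Theorem 41.7.2, and $v_2 \in \mathbb{R}^{n_2}$ is defined by

$\forall j \in \mathbb{N}_{n_2},\quad v_2^j = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i}(\psi_2^j \circ \phi_1 \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}.$


Therefore by Definition 58.4.5, $(d(\phi_2 \circ \phi_1))_p(V_1) = (d\phi_2)_{\phi_1(p)}(V_2)$, where $V_2 = t_{\phi_1(p),v_2,\psi_2} = (d\phi_1)_p(V_1)$ by
Definition 58.4.5. Consequently $(d(\phi_2 \circ \phi_1))_p(V_1) = (d\phi_2)_{\phi_1(p)}((d\phi_1)_p(V_1))$ for all $V_1 \in T_p(M_1)$. Hence
$(d(\phi_2 \circ \phi_1))_p = (d\phi_2)_{\phi_1(p)} \circ (d\phi_1)_p$.


58.4.14 REMARK: Differentials of linear maps on linear spaces.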

Theorem 58.4.15 applies Definition 58.4.5 to finite-dimensional real linear spaces, regarded as differentiable
manifolds by using coordinate maps as manifold charts. Unsurprisingly, the differential of a linear space
automorphism has a linear effect on the components of a tangent vector which it is applied to. (Theorem 58.4.15
is used in the proof of Theorem 65.3.2.)


Theorem 58.4.16 shows that a linear space automorphism applied to the drop function of a tangent vector
gives the same result as applying the differential of the linear map to the tangent vector before dropping it.
(Theorem 58.4.16 is used in the proof of Theorem 69.10.4.)


((2019-4-14. The proof of Theorem 58.4.15 should be a two-liner which invokes a linear map components 
theorem from Section 23.2, but right now I don’t have the energy to write that theorem. )) 


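58.4.15 THEOREM: Formula for components of the differential of a linear map.
Let $F$ be a finite-dimensional real linear space. Let $A_F$ be the standard differentiable manifold atlas for $F$
as in Definition 51.4.21. (That is, $A_F$ is the set of all component maps for $F$.) Let $n = \dim(F)$. Then


$\forall L \in \mathrm{Lin}(F, F),\ \forall p \in F,\ \forall \psi \in A_F,\ \forall w \in \mathbb{R}^n,$
$\qquad (dL)_p(t_{p,w,\psi}) = t_{L(p),\psi(L(\psi^{-1}(w))),\psi}.$


PROOF: By Definition 58.4.5, $(dL)_p(t_{p,w,\psi}) = t_{L(p),\tilde{w},\psi}$, where

$\forall k \in \mathbb{N}_n,\quad \tilde{w}^k = \sum_{i=1}^n w^i \, \partial_{x^i}(\psi^k \circ L \circ \psi^{-1})(x)\big|_{x=\psi(p)}.$


Define $e_i \in \mathbb{R}^n$ for $i \in \mathbb{N}_n$ by $(e_i)_j = \delta_{ij}$ for $i, j \in \mathbb{N}_n$. Let $e_i^F = \psi^{-1}(e_i)$ for all $i \in \mathbb{N}_n$. Then
$\psi^{-1}(x) = \psi^{-1}(\sum_{j=1}^n x^j e_j) = \sum_{j=1}^n x^j e_j^F$, and then $L(\psi^{-1}(x)) = L(\sum_{j=1}^n x^j e_j^F) = \sum_{j=1}^n x^j L(e_j^F)$ by the
linearity of $L$. Therefore $\psi(L(\psi^{-1}(x))) = \psi(\sum_{j=1}^n x^j L(e_j^F)) = \sum_{j=1}^n x^j \psi(L(e_j^F))$ by Theorem 22.8.12. It
follows that $\partial_{x^i}(\psi^k(L(\psi^{-1}(x)))) = \sum_{j=1}^n \partial_{x^i}(x^j) \, \psi^k(L(e_j^F)) = \sum_{j=1}^n \delta_i^j \psi^k(L(e_j^F)) = \psi^k(L(e_i^F))$. Therefore
$\tilde{w}^k = \sum_{i=1}^n w^i \psi^k(L(e_i^F)) = \psi^k(L(\sum_{i=1}^n w^i e_i^F)) = \psi^k(L(\psi^{-1}(w)))$. Thus $\tilde{w} = \psi(L(\psi^{-1}(w)))$.
Hence $(dL)_p(t_{p,w,\psi}) = t_{L(p),\psi(L(\psi^{-1}(w))),\psi}$.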
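
For a concrete reading of this formula, the following sketch takes $F = \mathbb{R}^2$. The matrices $A$ and $B$ are hypothetical names introduced only for this illustration: $A$ represents $L$ in the standard basis, and the columns of $B$ form a second basis of $F$.

```latex
% Assumed setting: F = R^2, L(x) = A x for a 2 x 2 real matrix A.
% Chart 1: \psi = id (components with respect to the standard basis).
\[
  (dL)_p(t_{p,w,\psi}) = t_{Ap,\, Aw,\, \psi} .
\]
% Chart 2: \psi(x) = B^{-1} x (components with respect to the columns of B).
% Then \psi(L(\psi^{-1}(w))) = B^{-1} A B w, so
\[
  (dL)_p(t_{p,w,\psi}) = t_{Ap,\, B^{-1} A B w,\, \psi} .
\]
```

In both charts the component tuple of the image vector is obtained by applying the matrix of $L$ in that chart's basis, independently of the base point $p$.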


58.4.16 THEOREM: Effect of a linear map on the drop function of a linear space.
Let $F$ be a finite-dimensional real linear space. Then


$\forall L \in \mathrm{Lin}(F, F),\ \forall p \in F,\ \forall y \in T_p(F),\quad L(\varpi(y)) = \varpi((dL)_p(y)).$
(See Definition 54.9.5 for the linear space drop function $\varpi : T(F) \to F$.)


PROOF: Let $\psi \in A_F$, where $A_F$ is the standard manifold atlas for $F$ as in Definition 51.4.21, namely
the set of all linear space component maps for $F$. Then $\varpi(y) = \psi^{-1}(\Phi(\psi)(y))$ for all $y \in T(F)$ by
Definition 54.9.5. (See Notation 54.5.7 for $\Phi(\psi)$.) Let $p \in F$ and $y \in T_p(F)$. Then $y = t_{p,w,\psi}$ for some
$w \in \mathbb{R}^n$ by Theorem 54.1.8 (xii), where $n = \dim(F)$. So $\varpi(y) = \psi^{-1}(w)$.


By Theorem 58.4.15, $(dL)_p(y) = t_{L(p),\psi(L(\psi^{-1}(w))),\psi}$. So $\varpi((dL)_p(y)) = \psi^{-1}(\psi(L(\psi^{-1}(w)))) = L(\psi^{-1}(w))$.
Hence $L(\varpi(y)) = \varpi((dL)_p(y))$.


58.4.17 REMARK: Linearity of infinitesimal general linear group actions. 

Theorem 58.4.18 cannot be expressed in terms of Lie groups because general Lie groups are not introduced 
until Definition 62.2.4. However, the manifold atlases for general linear groups and finite-dimensional real 
linear spaces are already defined at this point. (The formal introduction of general linear Lie transformation 
groups is Definition 63.4.17.) The differentiable atlas for a finite-dimensional real linear space and its general 
linear group are introduced in Definitions 49.7.14 and 49.7.15 respectively. These are used to construct 
the corresponding differentiable manifold structures in Definitions 51.4.21 and 51.4.24 respectively. (See 
Theorem 63.6.25 for a reformulation of Theorem 58.4.18 in the context of Lie transformation groups.) 


For applications to vector bundles, it is useful to know that the vertical drop of an infinitesimal general
linear group action is linear. (See for example the proof of Theorem 68.1.3.)


58.4.18 THEOREM: Linearity of the vertical drop of an infinitesimal linear map.

Let $F$ be a finite-dimensional real linear space with its standard manifold atlas. (See Definition 51.4.21.)
Let $G = GL(F)$ be the general linear group of $F$ with its standard manifold atlas. (See Definition 51.4.24.)
For $p \in F$, let $R_p : G \to F$ denote the right action of $p$ on $G$, namely $R_p : g \mapsto gp$ for $g \in G$. For $u \in T_e(G)$,
define $\beta_u : F \to F$ by $\beta_u(p) = \varpi((dR_p)_e(u))$. Then $\beta_u$ is a linear map from $F$ to $F$ for all $u \in T_e(G)$.


PROOF: Let $B$ be a basis for $F$. Let $\psi_F = \kappa_B$ be the component map for $F$ with respect to $B$. Then
$\psi_F \in \mathrm{atlas}(F)$. Let $\psi_G = \kappa_{B,B}$ be the component map for $G$ with respect to $B$. Then $\psi_G \in \mathrm{atlas}(G)$.


Let $p \in F$. Let $u \in T_e(G)$. Then $u = t_{e,w,\psi_G}$ for some $w \in \mathbb{R}^{m \times m}$, where $m = \dim(F)$. Let $V = (dR_p)_e(u)$.
Then $V = t_{p,v,\psi_F} \in T_p(F)$, where $v = \sum_{i,j=1}^m w_{ij} \, \partial_{x_{ij}}(\psi_F(R_p(\psi_G^{-1}(x))))\big|_{x=\psi_G(e)}$ by Definition 58.4.5.


Let $B = (e_j)_{j=1}^m$. Let $z = \psi_F(p)$. Then $\forall j \in \mathbb{N}_m$, $\psi_F(\psi_G^{-1}(x)p)_j = \sum_{i=1}^m x_{ji} z_i$ by Definition 23.2.10. So


$\forall j \in \mathbb{N}_m,\quad v_j = \sum_{k,\ell=1}^m w_{k\ell} \, \partial_{x_{k\ell}}\bigl(\psi_F(\psi_G^{-1}(x)p)_j\bigr)\big|_{x=\psi_G(e)}$
$\qquad\qquad = \sum_{k,\ell=1}^m w_{k\ell} \, \partial_{x_{k\ell}} \sum_{i=1}^m x_{ji} z_i \Big|_{x=\psi_G(e)}$
$\qquad\qquad = \sum_{i=1}^m w_{ji} z_i.$


So $\beta_u(p) = \psi_F^{-1}(v) = \sum_{i,j=1}^m w_{ji} \psi_F(p)_i e_j$, which is linear with respect to $p$ by Theorem 22.8.12.


58.5. Pointwise differentials of identity maps 


58.5.1 REMARK: The differential of the identity map. 

A useful first test for Definition 58.4.5 is the identity map on any $C^1$ manifold $M$. (If the definition is
impractical for the most trivial $C^1$ map, there's not much hope for other maps!) The map $\mathrm{id}_M : M \to M$
with $\mathrm{id}_M : p \mapsto p$ is a $C^1$ map by Theorem 52.1.8. Its differential is computed in Theorem 58.5.2.


The proof of Theorem 58.5.2 is somewhat longer than it needs to be because all pairs of charts are tested
in accordance with Definition 58.4.5. In fact, it is only necessary to test one chart-pair $(\psi_1, \psi_2)$ with
$\psi_1 \in \mathrm{atlas}_p(M_1)$ and $\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$. (The sufficiency of a single choice of $\psi_1$ for the differentiability test
in Definition 52.1.2 is shown in Theorem 52.1.11.) So in fact, it is really only necessary to test for $\psi_2 = \psi_1$
in the proof of Theorem 58.5.2, which would make the chart transition matrix equal to the identity matrix.
However, proving the sufficiency of single chart choices is onerous, as seen in the proof of Theorem 52.1.11.


For the global version of Theorem 58.5.2, see Theorem 58.9.18. 


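58.5.2 THEOREM: The differential of the identity map on points is the identity map on the tangent space.
Let $M$ be a $C^1$ manifold. Then $(d\,\mathrm{id}_M)_p = \mathrm{id}_{T_p(M)}$ for all $p \in M$.


PROOF: Let $M$ be a $C^1$ manifold. Let $p \in M$. Let $\psi_1 \in \mathrm{atlas}_p(M)$ and $\psi_2 \in \mathrm{atlas}_{\mathrm{id}_M(p)}(M) = \mathrm{atlas}_p(M)$.
Then $\psi_2 \circ \mathrm{id}_M \circ \psi_1^{-1} = \psi_2 \circ \psi_1^{-1}$ is $C^1$ by Definition 51.3.2 (iii). Let $v_1 \in \mathbb{R}^n$, where $n = \dim(M)$. Then


$\forall k \in \mathbb{N}_n,\quad \sum_{i=1}^n v_1^i \, \partial_{x^i}(\psi_2^k \circ \mathrm{id}_M \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)} = \sum_{i=1}^n v_1^i \, \partial_{x^i}(\psi_2^k \circ \psi_1^{-1})(x)\big|_{x=\psi_1(p)}$
$\qquad\qquad = \sum_{i=1}^n J_{21}(p)^k{}_i \, v_1^i$


by Definition 51.4.18. Define $v_2 \in \mathbb{R}^n$ by $\forall k \in \mathbb{N}_n$, $v_2^k = \sum_{i=1}^n J_{21}(p)^k{}_i \, v_1^i$. Then $(d\,\mathrm{id}_M)_p(t_{p,v_1,\psi_1}) = t_{p,v_2,\psi_2}$
by Definition 58.4.5. But $t_{p,v_2,\psi_2} = t_{p,v_1,\psi_1}$ by Theorem 54.1.11. Therefore $(d\,\mathrm{id}_M)_p(t_{p,v_1,\psi_1}) = t_{p,v_1,\psi_1}$ for
all $v_1 \in \mathbb{R}^n$ and $\psi_1 \in \mathrm{atlas}_p(M)$. Hence $(d\,\mathrm{id}_M)_p = \mathrm{id}_{T_p(M)}$ by Notation 54.1.4.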


58.5.3 REMARK: The pointwise differential of the identity map is a “tautological tensor”. 
The trivial differential $(d\,\mathrm{id}_M)_p = \mathrm{id}_{T_p(M)}$ of the trivial map $\mathrm{id}_M$ may be regarded as a tensor in $T^{1,1}_p(M)$.
This follows from various canonical equivalences between tensor spaces.

Clearly $\mathrm{id}_{T_p(M)} \in \mathrm{Lin}(T_p(M), T_p(M))$. But $\mathrm{Lin}(T_p(M), T_p(M))$ is isomorphic via the linear space transpose
map to $\mathrm{Lin}(T_p^*(M), T_p^*(M))$ by Theorem 23.11.17. (Similarly, $\mathrm{Lin}(V, W)$ is isomorphic to $\mathrm{Lin}(W^*, V^*)$ for
finite-dimensional linear spaces V and W.) Then Lin(T7 (M), Tš (M)) is essentially identical to g”! T,(M). 
(See Notation 29.5.5.) And &'*! T; (M) is canonically isomorphic to the space e T, (M) in Notation 29.5.3 
by Theorem 29.3.11. But &  T;(M) = T (M) by Notation 56.1.5. Thus id, (rj may be thought of as an 
element of $T^{1,1}_p(M)$. Some authors refer to the global differential of $\mathrm{id}_M$ as the “tautological tensor field”.
(See for example Sternberg [38], page 162. For applications of this tensor field to frame fields, see for example
Spivak [37], Volume 2, pages 260, 264.) One could then refer to the pointwise differential $(d\,\mathrm{id}_M)_p = \mathrm{id}_{T_p(M)}$
as the “tautological tensor at $p$”. (See Remark 58.9.17 for the “tautological tensor field”.)


58.5.4 THEOREM: Inverse rule for the pointwise differential of a diffeomorphism. 
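Let $\phi : M_1 \to M_2$ be a $C^1$ diffeomorphism for $C^1$ manifolds $M_1$ and $M_2$. Then $(d\phi)_p : T_p(M_1) \to T_{\phi(p)}(M_2)$
is a bijection for all $p \in M_1$, and


$\forall p \in M_1,\quad (d(\phi^{-1}))_{\phi(p)} = ((d\phi)_p)^{-1}.$


PROOF: For $p \in M_1$, Theorem 58.5.2 implies $(d(\phi^{-1} \circ \phi))_p = \mathrm{id}_{T_p(M_1)}$ and $(d(\phi \circ \phi^{-1}))_{\phi(p)} = \mathrm{id}_{T_{\phi(p)}(M_2)}$
because $\phi^{-1} \circ \phi = \mathrm{id}_{M_1}$ and $\phi \circ \phi^{-1} = \mathrm{id}_{M_2}$. But Theorem 58.4.13 implies $(d(\phi^{-1} \circ \phi))_p = (d(\phi^{-1}))_{\phi(p)} \circ
(d\phi)_p$ and $(d(\phi \circ \phi^{-1}))_{\phi(p)} = (d\phi)_p \circ (d(\phi^{-1}))_{\phi(p)}$. It then follows that $(d(\phi^{-1}))_{\phi(p)} \circ (d\phi)_p = \mathrm{id}_{T_p(M_1)}$
and $(d\phi)_p \circ (d(\phi^{-1}))_{\phi(p)} = \mathrm{id}_{T_{\phi(p)}(M_2)}$. Hence by Theorem 10.5.14 (i, iv), $(d\phi)_p : T_p(M_1) \to T_{\phi(p)}(M_2)$ is a
bijection and $(d(\phi^{-1}))_{\phi(p)} = ((d\phi)_p)^{-1}$.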
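
In one dimension, the inverse rule reduces to the familiar reciprocal formula for the derivative of an inverse function. The following sketch assumes $M_1 = M_2 = \mathbb{R}$ with identity charts; the diffeomorphism $\phi$ is an illustrative choice.

```latex
% Assumed example: \phi(x) = x^3 + x, a C^1 diffeomorphism of R onto R
% because \phi'(x) = 3x^2 + 1 > 0 and \phi is unbounded in both directions.
\[
  (d\phi)_p : v \mapsto (3p^2 + 1) \, v,
  \qquad
  (d(\phi^{-1}))_{\phi(p)} : w \mapsto \frac{w}{3p^2 + 1} .
\]
% At p = 1: \phi(1) = 2, so the derivative of \phi^{-1} at 2 equals 1/4.
\[
  (\phi^{-1})'(2) = \frac{1}{\phi'(1)} = \frac{1}{4} .
\]
```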


58.5.5 REMARK: Inclusion maps from submanifolds to their ambient spaces. 

As a pure function, the identity map $\mathrm{id}_{M_1}$ for a $C^1$ manifold $M_1$ is nothing more than a set of ordered
pairs, namely $\mathrm{id}_{M_1} = \{(p, p);\ p \in M_1\}$. If $M_1$ is a regular $C^1$ submanifold of a $C^1$ manifold $M_2$, then the
inclusion map from $M_1$ to $M_2$ is the same set of ordered pairs $\mathrm{id}_{M_1}$. However, the pointwise differential
of $\mathrm{id}_{M_1} : M_1 \to M_2$ at $p \in M_1$ is a map $(d\,\mathrm{id}_{M_1})_p : T_p(M_1) \to T_p(M_2)$, which is different to the map
$(d\,\mathrm{id}_{M_1})_p : T_p(M_1) \to T_p(M_1)$ which is mentioned in Theorem 58.5.2 if $M_1 \neq M_2$. Thus the meaning of
“$(d\,\mathrm{id}_{M_1})_p$” depends on the source and target manifolds $M_1$ and $M_2$ which are implied in the surrounding
context. This highlights the fact that the definition of the pointwise differential $(d\phi)_p$ in Definition 58.4.5
depends on four inputs, namely the source and target manifolds $M_1$ and $M_2$, the map $\phi : M_1 \to M_2$ and the
point $p \in M_1$. It would be less ambiguous, but much more tedious, to denote such a differential as $(d\phi)_p^{M_1,M_2}$.


The submanifold tangent vector embedding map $\eta$ in Theorem 58.5.6 is introduced in Definition 54.6.2. (See
Sections 52.3 and 52.4 for submanifolds of differentiable manifolds.)


58.5.6 THEOREM: The tangent vector embedding map equals the differential of the point inclusion map. 
Let $S$ be a regular $C^1$ submanifold of a $C^1$ manifold $M$. Let $\eta : T(S) \to T(M)$ denote the tangent vector
embedding map for $S$ within $M$. Then


$\forall p \in S,\ \forall V \in T_p(S),\quad \eta(V) = (d\,\mathrm{id}_S)_p(V),$
where $\mathrm{id}_S : S \to M$ is the inclusion map of $S$ within $M$.
PROOF: Let m = dim(S) and n = dim(M). As in Definition 54.6.2, define A(p, v) for y € atlas;(S) by 

Vp € S, Vi) € atlas, (S), 

A(p, V) = {4% € atlas; (M); 9(S) = Range() n (R™ x (0gs-»]) and j|, (,,, = Hi" o [5]. 

Let ) € atlas, (S) and € A(p,%). Let V = 1,4," Then (dids)p(V) = tpw,y, where 

vie Nn, wi = Y vagi GdsF"@ ace 

" 2 vaj (7 Egoy 

(Note the use here of C! compatible charts € atlas; (M) instead of atlas charts i) € atlas,(M). This 


subtlety is explained in Remarks 54.6.3 and 54.3.2.) Then by Theorem 52.4.4 (v), v(P-(x)) = (£, 0g-^). 
So w = iuo uy 7 Haud , v6 53 Jo) = v’ for i € Nm, and w* = 0 for i € Ny \ Nm. Thus 
tow = (ty yp) by Definition 54.6.2. Hence $\eta(V) = (d\,\mathrm{id}_S)_p(V)$.


58.5.7 REMARK: Application of tangent vector embedding maps to embeddings between manifolds.
Theorem 58.5.8 uses temporary notations $(d\phi)_p^{M_2} : T(M_1) \to T(M_2)$ and $(d\phi)_p^{S_2} : T(M_1) \to T(S_2)$ to
distinguish pointwise differentials which are defined with respect to different target manifolds $M_2$ and $S_2$
respectively. This is necessary because a function is nothing more than a set of ordered pairs. In some
formalisms, a function is represented as a triple such as $(X, Y, \phi)$, where $X$ and $Y$ are the source and target
sets respectively, and $\phi$ is the graph (i.e. set of ordered pairs) of the function. (This issue is also mentioned in
Remark 9.1.3.) In some ways, such a formalism for functions is closer to real-world usage than the bare-bones
set-of-ordered-pairs formalism. However, the explicit indication of the source and target sets adds significant
“management overheads”.


A more reliable procedure in general is to add the source and target structure input parameters to the 
notations for the output objects which are constructed from functions. In the case of the differential, the 
necessary inputs are in fact the function, the point, and the source/target atlases, but the convention is 
to write manifold names only, leaving the atlases implicit. That is precisely what is done here with the
temporary notations $(d\phi)_p^{M_2}$ and $(d\phi)_p^{S_2}$. (For an alternative approach, see the range restriction operations
in Notation 9.6.22, by which $\phi|^{S_2}$ could be used to mean that the target manifold is $S_2$, which would then
enable the expression $(d\phi|^{S_2})_p$ to mean the differential of $\phi$ with $S_2$ as the target manifold. Similarly,
$(d\phi|_{S_1})_p$ in Theorem 58.5.10 means that the domain manifold $S_1$ is to be assumed.)


Theorem 58.5.8 may seem to be superfluous or artificial, but it is an inevitable consequence of the way in
which differentials of maps between manifolds are defined. The requirement for the tangent vector embedding 
map is very often overlooked because in typical practical scenarios, an embedding would be regular, and its 
range would be a regular submanifold. The necessity of Theorems 58.5.6 and 58.5.8 becomes evident when 
these plausible assumptions fail to hold. 


58.5.8 THEOREM: The differential of an embedding requires a submanifold vector embedding map. 

Let $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi : M_1 \to M_2$ be a regular $C^1$ embedding of $M_1$ in $M_2$. Let
$S_2 = \mathrm{Range}(\phi)$. Let $(d\phi)_p^{M_2} : T(M_1) \to T(M_2)$ denote the differential of $\phi : M_1 \to M_2$ at $p \in M_1$. Let
$(d\phi)_p^{S_2} : T(M_1) \to T(S_2)$ denote the differential of $\phi : M_1 \to S_2$ at $p \in M_1$. Then


$\forall p \in M_1,\quad (d\phi)_p^{M_2} = \eta_{S_2}^{M_2} \circ (d\phi)_p^{S_2},$
where $\eta_{S_2}^{M_2} : T(S_2) \to T(M_2)$ is the submanifold tangent vector embedding map as in Definition 54.6.2.

PROOF: By Definition 52.5.3, $S_2 = \mathrm{Range}(\phi)$ is a regular $C^1$ submanifold of $M_2$ and $\phi : M_1 \to S_2$ is
a $C^1$ diffeomorphism. Since $\mathrm{id}_{S_2} \in C^1(S_2, M_2)$ by Theorem 52.4.18, it follows by Theorem 58.4.13 that
$(d\phi)_p^{M_2} = (d\,\mathrm{id}_{S_2})_{\phi(p)} \circ (d\phi)_p^{S_2}$. Hence $(d\phi)_p^{M_2} = \eta_{S_2}^{M_2} \circ (d\phi)_p^{S_2}$ by Theorem 58.5.6.


58.5.9 REMARK: Differentials of submanifold restrictions require a submanifold vector embedding map. 
Whereas Theorem 58.5.8 uses the submanifold tangent vector embedding map to convert between differentials 
where the target set is either the ambient space or a submanifold, Theorem 58.5.10 uses it to convert between 
differentials where the domain is either the ambient space or a submanifold. 


The notation $\phi|_{S_1}$ in Theorem 58.5.10 does not, strictly speaking, indicate that the restricted domain set
$S_1$ must use the submanifold differentiable structure. However, it is a convenient informal shorthand which
gives a strong hint that this is intended. This is very similar to the target space differentiable structure
issue which is described in Remark 58.5.7. Unfortunately there is no standard notation to indicate which
additional structures should be chosen for the domains and target sets of maps. (Possibly Notation 9.6.22,
which indicates restrictions of both source and target sets, could be adapted so that $\phi|_{S_1,A_1}^{S_2,A_2}$ means that the
atlases $A_1$ and $A_2$ are to be used for the source and target spaces $S_1$ and $S_2$ respectively, but such concepts
are not required often enough to justify introducing new notations for them.)


58.5.10 THEOREM: The differential of a submanifold-restricted map requires an embedding map. 
Let $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi \in C^1(M_1, M_2)$. Let $S_1$ be a regular $C^1$ submanifold of $M_1$. Then


$\forall p \in S_1,\quad (d\phi|_{S_1})_p = (d\phi)_p \circ \eta_{S_1}^{M_1},$


where $\eta_{S_1}^{M_1} : T(S_1) \to T(M_1)$ is the submanifold tangent vector embedding map as in Definition 54.6.2.

PROOF: By Theorem 10.10.15 (iii), $\phi|_{S_1} = \phi \circ \mathrm{id}_{S_1}$. Since $\mathrm{id}_{S_1} \in C^1(S_1, M_1)$ by Theorem 52.4.18, it follows
by Theorem 58.4.13 that $(d\phi|_{S_1})_p = (d\phi)_p \circ (d\,\mathrm{id}_{S_1})_p$ for all $p \in S_1$. Hence $(d\phi|_{S_1})_p = (d\phi)_p \circ \eta_{S_1}^{M_1}$ for all
$p \in S_1$ by Theorem 58.5.6.


58.6. Partial differentials of maps on direct products 


58.6.1 REMARK: Pointwise differentials of maps between direct products of manifolds. 

Differentiable maps are often defined from a direct product $M_1 \times M_2$ of $C^1$ manifolds $M_1$ and $M_2$ to a $C^1$
manifold $M_0$. An example of this is the group operation $\sigma : G \times G \to G$ of a Lie group, or the left action
$L : G \times M \to M$ of a $C^1$ Lie transformation group. Differentiable maps also often arise from a single manifold
to a direct product of manifolds. Examples of this are “local trivialisations” of a differentiable fibre bundle.
It is then desirable to be able to decompose the differentials of such maps into differentials for maps between
single manifolds. As an example, consider a Lie transformation group right action map $\mu : P \times G \to P$.
Then one would expect a formula such as $(d\mu)_{(z,g)}(y, u) = (d\mu(z, \cdot))_g(u) + (d\mu(\cdot, g))_z(y)$ for all $(z, g) \in P \times G$
and $(y, u) \in T_z(P) \times T_g(G)$. The first task is to give meaning to the terms in such a formula.


58.6.2 DEFINITION: The direct product decomposition of the differential of a $C^1$ map $\phi : M_1 \times M_2 \to M_0$ at
a point-pair $(p_1, p_2)$, for $C^1$ manifolds $M_1$, $M_2$, $M_0$, is the map $(d\phi)_{p_1,p_2} : T_{p_1}(M_1) \times T_{p_2}(M_2) \to T_{\phi(p_1,p_2)}(M_0)$
defined by


$\forall V_1 \in T_{p_1}(M_1),\ \forall V_2 \in T_{p_2}(M_2),$
$\qquad (d\phi)_{p_1,p_2}(V_1, V_2) = (d\phi)_{(p_1,p_2)}(i(V_1, V_2)),$


where $i : T(M_1) \times T(M_2) \to T(M_1 \times M_2)$ is the global direct product identification map for tangent bundles
in Definition 54.7.6.
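
A minimal Cartesian illustration of this decomposition is sketched below. It assumes $M_1 = M_2 = M_0 = \mathbb{R}$ with identity charts, and it identifies $T(\mathbb{R}) \times T(\mathbb{R})$ with $T(\mathbb{R}^2)$ in the obvious way; the map $\mu$ is an illustrative choice and is not taken from the text.

```latex
% Assumed example: \mu : R x R -> R with \mu(z, g) = z g.
% Differential as a function of the pair (z, g), applied to the components (y, u):
\[
  (d\mu)_{z,g}(y, u) = g \, y + z \, u .
\]
% Partial differentials, obtained by freezing one argument:
\[
  (d\mu(z, \cdot))_g(u) = z \, u,
  \qquad
  (d\mu(\cdot, g))_z(y) = g \, y .
\]
% The two partial differentials add up to the full differential:
\[
  (d\mu)_{z,g}(y, u) = (d\mu(z, \cdot))_g(u) + (d\mu(\cdot, g))_z(y) .
\]
```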


58.6.3 REMARK: Direct product decomposition of the differential of the identity map. 

A perplexing situation arises when the manifold $M_0$ in Definition 58.6.2 is the manifold product $M_1 \times M_2$
and the map $\phi : M_1 \times M_2 \to M_0 = M_1 \times M_2$ is the identity map $\mathrm{id}_{M_1 \times M_2}$. One would normally expect the
differential of the identity map between two manifolds to have the form as in Theorem 58.5.2. But if the
manifold is expressible as the direct product of two manifolds, it is possible to regard the identity function as
a function of two variables! This ambiguity is inherent in the concept of “functions of several variables”, as
alluded to in Remark 10.2.31. Sometimes the distinction between a function of two variables and a function
of variable-pairs is significant. Definition 58.6.2 is one example of this, where the domain of $\phi$ is in fact the
direct product set $M_1 \times M_2$ (which signifies a function of two variables), not the direct product manifold
$M_1 \times M_2$ (which would signify a single variable).


Let $M_1$ and $M_2$ be $C^1$ manifolds. Then according to Definition 58.6.2,

    $(d\,\mathrm{id}_{M_1 \times M_2})_{p_1,p_2}(V_1, V_2) = (d\,\mathrm{id}_{M_1 \times M_2})_{(p_1,p_2)}(i(V_1, V_2))$
        $= \mathrm{id}_{T_{(p_1,p_2)}(M_1 \times M_2)}(i(V_1, V_2))$

by Theorem 58.5.2. Therefore $(d\,\mathrm{id}_{M_1 \times M_2})_{p_1,p_2}(V_1, V_2) = i(V_1, V_2)$. Thus $(d\,\mathrm{id}_{M_1 \times M_2})_{p_1,p_2} = i_{(p_1,p_2)}$, where
$i : T(M_1) \times T(M_2) \to T(M_1 \times M_2)$ is the global direct product identification map for tangent bundles in
Definition 54.7.6, and $i_{(p_1,p_2)} = i|_{T_{p_1}(M_1) \times T_{p_2}(M_2)} : T_{p_1}(M_1) \times T_{p_2}(M_2) \to T_{(p_1,p_2)}(M_1 \times M_2)$ is the pointwise
direct product identification map for tangent bundles in Definition 54.7.2. But applying Theorem 58.5.2 to
$\mathrm{id}_{M_1 \times M_2}$ directly gives $(d\,\mathrm{id}_{M_1 \times M_2})_{(p_1,p_2)} = \mathrm{id}_{T_{(p_1,p_2)}(M_1 \times M_2)}$.

Thus $(d\,\mathrm{id}_{M_1 \times M_2})_{p_1,p_2} = i_{(p_1,p_2)} : T_{p_1}(M_1) \times T_{p_2}(M_2) \to T_{(p_1,p_2)}(M_1 \times M_2)$ in the two-variable interpretation,
whereas $(d\,\mathrm{id}_{M_1 \times M_2})_{(p_1,p_2)} = \mathrm{id}_{T_{(p_1,p_2)}(M_1 \times M_2)} : T_{(p_1,p_2)}(M_1 \times M_2) \to T_{(p_1,p_2)}(M_1 \times M_2)$ in the one-variable
interpretation. This is not a logical contradiction. There is real confusion only when distinct objects are
“identified”. Then there will be some occasions when the difference must be recognised. In this case, it is
desirable to be able to say that $i$ is the differential of the map $\mathrm{id}_{M_1 \times M_2}$. (This yields a useful property of
the “concatenation” map $i$.) But this is only true when the map is regarded as a two-variable map.


58.6.4 THEOREM:  Pointwise differential of identity map on a direct product of manifolds. 
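Let $k \in \mathbb{Z}^+$. Let $M_1$ and $M_2$ be $C^k$ manifolds.
(i) $\forall p_1 \in M_1,\ \forall p_2 \in M_2,\ (d\,\mathrm{id}_{M_1 \times M_2})_{(p_1,p_2)} = \mathrm{id}_{T_{(p_1,p_2)}(M_1 \times M_2)} : T_{(p_1,p_2)}(M_1 \times M_2) \to T_{(p_1,p_2)}(M_1 \times M_2)$.

(ii) $\forall p_1 \in M_1,\ \forall p_2 \in M_2,\ (d\,\mathrm{id}_{M_1 \times M_2})_{p_1,p_2} = i_{(p_1,p_2)} : T_{p_1}(M_1) \times T_{p_2}(M_2) \to T_{(p_1,p_2)}(M_1 \times M_2)$, where
$i_{(p_1,p_2)} = i|_{T_{p_1}(M_1) \times T_{p_2}(M_2)} : T_{p_1}(M_1) \times T_{p_2}(M_2) \to T_{(p_1,p_2)}(M_1 \times M_2)$ is the pointwise direct product
identification map for tangent bundles in Definition 54.7.2.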


PROOF: Part (i) follows from Theorem 58.5.2. 
Part (ii) follows from Definition 58.6.2 and Theorem 58.5.2. 


58.6.5 REMARK: Partial differentials of maps from direct products of manifolds to manifolds. 

The identification map in Definition 58.6.2, like most identification maps, can usually be safely ignored
because it is fairly obvious and contains little real information. Therefore $(d\phi)_{p_1,p_2}$ and $(d\phi)_{(p_1,p_2)}$ are
used interchangeably in line (58.6.2), ignoring the identification map.


58.6.6 THEOREM: Decomposition of a differential into two “partial differentials”. 
Let $M_0$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi : M_1 \times M_2 \to M_0$ be a $C^1$ map. Then
    $\forall (p_1,p_2) \in M_1 \times M_2,\ \forall (V_1, V_2) \in T_{p_1}(M_1) \times T_{p_2}(M_2),$
        $(d\phi)_{p_1,p_2}(V_1, V_2) = (d\phi(\cdot, p_2))_{p_1}(V_1) + (d\phi(p_1, \cdot))_{p_2}(V_2)$    (58.6.1)
            $= (d\phi_1^{p_2})_{p_1}(V_1) + (d\phi_2^{p_1})_{p_2}(V_2),$
where $\phi_1^{p_2} : p_1 \mapsto \phi(p_1, p_2)$ and $\phi_2^{p_1} : p_2 \mapsto \phi(p_1, p_2)$ are the partial maps in the statement of Theorem 52.6.8.
This may be written slightly informally (ignoring the identification map $i$ in Definition 58.6.2) as
    $\forall (p_1,p_2) \in M_1 \times M_2,\ \forall (V_1, V_2) \in T_{p_1}(M_1) \times T_{p_2}(M_2),$
        $(d\phi)_{(p_1,p_2)}(V_1, V_2) = (d\phi(\cdot, p_2))_{p_1}(V_1) + (d\phi(p_1, \cdot))_{p_2}(V_2).$    (58.6.2)


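PROOF: Let $\psi_k \in \mathrm{atlas}_{p_k}(M_k)$ and $V_k = t_{p_k, v_k, \psi_k}$, with $v_k \in \mathbb{R}^{n_k}$ and $n_k = \dim(M_k)$, for $k = 1, 2$. Let
$p_0 = \phi(p_1, p_2)$ and $\psi_0 \in \mathrm{atlas}_{p_0}(M_0)$. Then by Definitions 54.7.6, 58.4.5 and 58.6.2,

    $(d\phi)_{p_1,p_2}(V_1, V_2) = (d\phi)_{(p_1,p_2)}(i(V_1, V_2))$
        $= (d\phi)_{(p_1,p_2)}(t_{(p_1,p_2), (v_1,v_2), \psi_1 \times \psi_2})$
        $= t_{p_0, v_0, \psi_0},$

where $v_0 \in \mathbb{R}^{n_0}$, with $n_0 = \dim(M_0)$, is defined by

    $\forall j \in \mathbb{N}_{n_0},\quad v_0^j = \sum_{i=1}^{n_1+n_2} (v_1, v_2)^i \, \partial_{x^i} \big( \psi_0^j \circ \phi \circ (\psi_1 \times \psi_2)^{-1} \big)(x) \big|_{x = (\psi_1(p_1), \psi_2(p_2))},$

where “$(v_1, v_2)$” is shorthand for $\mathrm{concat}(v_1, v_2)$, the tuple concatenation of $v_1$ and $v_2$. Thus

    $\forall j \in \mathbb{N}_{n_0},\quad v_0^j = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i} \big( (\psi_0^j \circ \phi \circ (\psi_1 \times \psi_2)^{-1})(x, \psi_2(p_2)) \big) \big|_{x = \psi_1(p_1)}$
        $+ \sum_{i=1}^{n_2} v_2^i \, \partial_{x^i} \big( (\psi_0^j \circ \phi \circ (\psi_1 \times \psi_2)^{-1})(\psi_1(p_1), x) \big) \big|_{x = \psi_2(p_2)}.$

But $(\phi \circ (\psi_1 \times \psi_2)^{-1})(x, \psi_2(p_2)) = \phi(\psi_1^{-1}(x), p_2) = (\phi(\cdot, p_2) \circ \psi_1^{-1})(x)$ and $(\phi \circ (\psi_1 \times \psi_2)^{-1})(\psi_1(p_1), x) =
\phi(p_1, \psi_2^{-1}(x)) = (\phi(p_1, \cdot) \circ \psi_2^{-1})(x)$. Therefore

    $\forall j \in \mathbb{N}_{n_0},\quad v_0^j = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i} \big( \psi_0^j \circ \phi(\cdot, p_2) \circ \psi_1^{-1} \big)(x) \big|_{x = \psi_1(p_1)}$
        $+ \sum_{i=1}^{n_2} v_2^i \, \partial_{x^i} \big( \psi_0^j \circ \phi(p_1, \cdot) \circ \psi_2^{-1} \big)(x) \big|_{x = \psi_2(p_2)}$
        $= \Phi(\psi_0)\big( (d\phi(\cdot, p_2))_{p_1}(t_{p_1, v_1, \psi_1}) \big)^j + \Phi(\psi_0)\big( (d\phi(p_1, \cdot))_{p_2}(t_{p_2, v_2, \psi_2}) \big)^j$
        $= \Phi(\psi_0)\big( (d\phi(\cdot, p_2))_{p_1}(V_1) + (d\phi(p_1, \cdot))_{p_2}(V_2) \big)^j.$

Hence $(d\phi)_{p_1,p_2}(V_1, V_2) = t_{p_0, v_0, \psi_0} = (d\phi(\cdot, p_2))_{p_1}(V_1) + (d\phi(p_1, \cdot))_{p_2}(V_2)$.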
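
As a numerical illustration of line (58.6.1), one may take all three manifolds to be Cartesian spaces with identity charts, so that differentials reduce to Jacobian matrices. The following sketch compares the differential of a two-variable map applied to a concatenated vector with the sum of the two partial differentials. The map `phi`, the points `p1`, `p2` and the vectors `V1`, `V2` are hypothetical choices made only for this example, and the Jacobians are approximated by central differences rather than computed exactly.

```python
import math

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x, returned as a list of rows."""
    rows = len(f(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        for j in range(rows):
            J[j][i] = (fp[j] - fm[j]) / (2 * h)
    return J

def matvec(J, v):
    """Apply the matrix J (a list of rows) to the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def phi(p1, p2):
    """Hypothetical C^1 map from R^2 x R^1 to R^2 (an assumption for this sketch)."""
    return [p1[0] * p2[0] + math.sin(p1[1]), p1[0] + p2[0] ** 2]

p1, p2 = [0.7, -0.3], [1.2]
V1, V2 = [0.5, 2.0], [-1.5]

# Left side of (58.6.1): phi as a function of the variable-pair (p1, p2),
# applied to the concatenated vector (V1, V2).
total = matvec(jacobian(lambda x: phi(x[:2], x[2:]), p1 + p2), V1 + V2)

# Right side of (58.6.1): freeze one argument and differentiate in the other.
part1 = matvec(jacobian(lambda x: phi(x, p2), p1), V1)
part2 = matvec(jacobian(lambda y: phi(p1, y), p2), V2)
summed = [a + b for a, b in zip(part1, part2)]

print(total, summed)
```

In this Cartesian setting the identification map of Definition 54.7.6 is modelled simply by list concatenation, which is why `V1 + V2` appears on the left side.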


58.6.7 THEOREM: Pointwise “partial differentials” of a function of two variables.
Let $M_0$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi : M_1 \times M_2 \to M_0$ be a $C^1$ map. Then

    $\forall (p_1,p_2) \in M_1 \times M_2,\ \forall V_1 \in T_{p_1}(M_1),$
        $(d\phi)_{p_1,p_2}(V_1, 0_{p_2}) = (d\phi(\cdot, p_2))_{p_1}(V_1)$    (58.6.3)
and
    $\forall (p_1,p_2) \in M_1 \times M_2,\ \forall V_2 \in T_{p_2}(M_2),$
        $(d\phi)_{p_1,p_2}(0_{p_1}, V_2) = (d\phi(p_1, \cdot))_{p_2}(V_2),$    (58.6.4)

where $0_{p_1} = 0_{T_{p_1}(M_1)}$ and $0_{p_2} = 0_{T_{p_2}(M_2)}$.


PROOF: Lines (58.6.3) and (58.6.4) follow directly from Theorem 58.6.6 line (58.6.1). 


58.6.8 THEOREM:  Differentials of linear operations on linear-space manifolds. 
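Let $F$ be a finite-dimensional real linear space with $m = \dim(F)$ and addition operation $\sigma_F : F \times F \to F$
and the standard manifold atlas for $F$. Then

(i) $\forall q_1, q_2 \in F,\ \forall \psi_F \in \mathrm{atlas}(F),\ \forall w_1, w_2 \in \mathbb{R}^m,\ (d\sigma_F)_{q_1,q_2}(t_{q_1,w_1,\psi_F}, t_{q_2,w_2,\psi_F}) = t_{q_1+q_2,\, w_1+w_2,\, \psi_F}$.
(ii) $\forall q_1, q_2 \in F,\ \forall \psi_F \in \mathrm{atlas}(F),\ \forall w_1, w_2 \in \mathbb{R}^m,\ (d\sigma_F)_{(q_1,q_2)}(i(t_{q_1,w_1,\psi_F}, t_{q_2,w_2,\psi_F})) = t_{q_1+q_2,\, w_1+w_2,\, \psi_F}$.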


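PROOF: For part (i), define $R_{q_2} : F \to F$ by $R_{q_2} = \sigma_F(\cdot, q_2) : q_1 \mapsto \sigma_F(q_1, q_2)$. Then by Definition 58.4.5,
$(dR_{q_2})_{q_1}(t_{q_1,w_1,\psi_F}) = t_{q_1+q_2,\, u,\, \psi_F}$, where

    $\forall i \in \mathbb{N}_m,\quad u^i = \sum_{j=1}^m w_1^j \, \partial_{x^j} \big( \psi_F^i(R_{q_2}(\psi_F^{-1}(x))) \big) \big|_{x = \psi_F(q_1)}$
        $= \sum_{j=1}^m w_1^j \, \partial_{x^j} \big( \psi_F^i(\psi_F^{-1}(x) + q_2) \big) \big|_{x = \psi_F(q_1)}$
        $= \sum_{j=1}^m w_1^j \, \partial_{x^j} \big( x^i + \psi_F^i(q_2) \big) \big|_{x = \psi_F(q_1)}$
        $= \sum_{j=1}^m w_1^j \, \delta^i_j$
        $= w_1^i.$

Thus $(dR_{q_2})_{q_1}(t_{q_1,w_1,\psi_F}) = t_{q_1+q_2,\, w_1,\, \psi_F}$. Similarly $(dL_{q_1})_{q_2}(t_{q_2,w_2,\psi_F}) = t_{q_1+q_2,\, w_2,\, \psi_F}$, where $L_{q_1} : q_2 \mapsto
q_1 + q_2$. Therefore $(d\sigma_F)_{q_1,q_2}(t_{q_1,w_1,\psi_F}, t_{q_2,w_2,\psi_F}) = t_{q_1+q_2,\, w_1+w_2,\, \psi_F}$ by Theorem 58.6.6 line (58.6.1) and
Definition 54.4.4 (i).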


Part (ii) follows from part (i) and Definition 58.6.2. 


58.7. Differentials of common-domain product-maps 


58.7.1 REMARK: The technical necessity of theorems about differentials for products of maps. 

The theorems in Section 58.7 are even more tedious to read than those in Section 58.6. These theorems 
have been included because they are technically necessary for various kinds of assertions about differentiable 
fibre bundles, which have a manifold-product nature by their definition. Since differentials of maps are also 
inescapable (because the word “differential” is part of the phrase “differential geometry”), it is inevitable 
that one must prove some basic properties for differentials of functions whose domains and/or ranges are 
products of manifolds. In particular, fibre bundles are defined in terms of “local trivialisations", which 
are common-domain function products, and all functions on total spaces are defined on product-structured 
manifolds, and all cross-sections have values in product-structured manifolds. 


The theorems in Sections 58.6 and 58.7 are necessary for filling gaps in a wide range of theorems about 
differentiable fibre bundles. However, there is no real need to read them unless they are referred to from 
an application context. Therefore the reader will lose little value by skipping or skimming over these two 
sections. This material has accumulated here because it is technically required for later gap-filling, but the 
author has endured the tedium of this work so that the reader will not need to. 
58.7.2 REMARK: Differentials of individual maps of a common-domain product-map.

In Theorem 58.7.3, $\psi_1 \times \psi_2 \in \mathrm{atlas}_{(\phi_1(z), \phi_2(z))}(M_1 \times M_2)$ by Definition 52.6.2. So $(\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2) \in
\mathrm{atlas}_z(M)$ by Theorem 52.2.6 (i) because $\phi_1 \times \phi_2 : M \to M_1 \times M_2$ is a $C^1$ diffeomorphism. Therefore the
tangent vector $t_{z, (v_1,v_2), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)}$ is well defined by Notation 54.3.3 and Theorem 54.3.5 (i).

The difference between Theorems 58.7.3 and 58.7.15 is that Theorem 58.7.3 gives formulas for the action of
the differentials $(d\phi_1)_z$ and $(d\phi_2)_z$ on general tangent vectors at $z$, whereas Theorem 58.7.15 gives formulas
for the action of the same differentials on horizontal and vertical submanifold tangent vectors which are
embedded in the ambient manifold at $z$.


58.7.3 THEOREM: Partial differentials of common-domain product-maps. 
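Let $M$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M \to M_1$ and $\phi_2 : M \to M_2$ be $C^1$ maps such that
$\phi_1 \times \phi_2 : M \to M_1 \times M_2$ is a $C^1$ diffeomorphism. Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. Then

    $\forall z \in M,\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall v_2 \in \mathbb{R}^{n_2},\ \forall \psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2),$

        $(d\phi_1)_z(t_{z, (v_1,v_2), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)}) = t_{\phi_1(z), v_1, \psi_1}$    (58.7.1)
and
        $(d\phi_2)_z(t_{z, (v_1,v_2), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)}) = t_{\phi_2(z), v_2, \psi_2}.$    (58.7.2)


PROOF: For line (58.7.1), let $\psi = (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)$. Then $\mathrm{Range}(\psi) \subseteq \mathbb{R}^{n_1+n_2}$ and by Definition 58.4.5,
$(d\phi_1)_z(t_{z, (v_1,v_2), \psi}) = t_{\phi_1(z), w_1, \psi_1}$, where

    $\forall j \in \mathbb{N}_{n_1},\quad w_1^j = \sum_{i=1}^{n_1} v_1^i \, \partial_{x^i} \big( \psi_1^j(\phi_1(\psi^{-1}(x))) \big) \big|_{x = \psi(z)} + \sum_{i=1}^{n_2} v_2^i \, \partial_{x^{n_1+i}} \big( \psi_1^j(\phi_1(\psi^{-1}(x))) \big) \big|_{x = \psi(z)}.$

By expressing $x \in \mathbb{R}^{n_1+n_2}$ as $(x_1, x_2)$ with $x_1 \in \mathbb{R}^{n_1}$ and $x_2 \in \mathbb{R}^{n_2}$, one obtains

    $\forall x_1 \in \mathbb{R}^{n_1},\ \forall x_2 \in \mathbb{R}^{n_2},$
        $\phi_1(\psi^{-1}(x_1, x_2)) = \phi_1((\phi_1 \times \phi_2)^{-1}((\psi_1 \times \psi_2)^{-1}(x_1, x_2)))$
            $= \phi_1((\phi_1 \times \phi_2)^{-1}(\psi_1^{-1}(x_1), \psi_2^{-1}(x_2)))$
            $= \psi_1^{-1}(x_1)$

by Theorem 10.15.15. Therefore

    $\forall j \in \mathbb{N}_{n_1},\quad w_1^j = \sum_{i=1}^{n_1} v_1^i \, \partial_{x_1^i} \big( \psi_1^j(\psi_1^{-1}(x_1)) \big) \big|_{x_1 = \psi_1(\phi_1(z))} + \sum_{i=1}^{n_2} v_2^i \, \partial_{x_2^i} \big( \psi_1^j(\psi_1^{-1}(x_1)) \big) \big|_{x_2 = \psi_2(\phi_2(z))} = v_1^j.$

Hence $(d\phi_1)_z(t_{z, (v_1,v_2), \psi}) = t_{\phi_1(z), v_1, \psi_1}$, which verifies line (58.7.1).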


Line (58.7.2) may be proved analogously to line (58.7.1). 
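
Lines (58.7.1) and (58.7.2) can be checked numerically in a simple Cartesian case. In the sketch below, the manifold is the plane, the charts on the two factor spaces are identity charts, and the component maps `phi1` and `phi2` are hypothetical choices for which the common-domain product is a diffeomorphism of the plane. A tangent vector with components `(v1, v2)` relative to the product chart is converted to standard coordinates by inverting the chart Jacobian, and the two differentials are then applied to it. The Jacobians are central-difference approximations, so the agreement is only up to a small tolerance.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x, returned as a list of rows."""
    rows = len(f(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        for j in range(rows):
            J[j][i] = (fp[j] - fm[j]) / (2 * h)
    return J

def matvec(J, v):
    """Apply the matrix J (a list of rows) to the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def solve2(J, b):
    """Solve the 2x2 linear system J x = b by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - J[1][0] * b[0]) / det]

def phi1(z):
    """Hypothetical first component map R^2 -> R."""
    return [z[0] + z[1] ** 3]

def phi2(z):
    """Hypothetical second component map R^2 -> R."""
    return [z[1]]

def chart(z):
    """The product chart (phi1 x phi2) on the plane, with identity factor charts."""
    return phi1(z) + phi2(z)

z = [0.4, 1.5]
v1, v2 = 2.0, -0.7

# Standard-coordinate components of the vector whose components
# relative to the product chart are (v1, v2).
V = solve2(jacobian(chart, z), [v1, v2])

w1 = matvec(jacobian(phi1, z), V)[0]   # component of (d phi1)_z(V)
w2 = matvec(jacobian(phi2, z), V)[0]   # component of (d phi2)_z(V)
print(w1, w2)
```

The printed values should agree with `v1` and `v2` up to discretisation error, which is the content of the two lines of the theorem in this special case.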


58.7.4 REMARK: Embedding “inverse partial differentials” in a product-structured manifold. 

Theorem 58.7.5 is useful for differentiable fibre bundles (for example in Theorem 67.11.6 (v)), to equate
the inverse of the differential of a trivialisation, which is a direct product of functions, with the inverse of
the differential of a single function. This is made possible by the regular $C^1$ submanifold structure on sets
of the form $\phi_2^{-1}(\{p_2\})$ in Theorem 52.7.5 (iii, v), and the fact that $\phi_1|_{\phi_2^{-1}(\{p_2\})} : \phi_2^{-1}(\{p_2\}) \to M_1$ is a $C^1$
diffeomorphism by Theorem 52.7.5 (ix, xi).

The expression $(d\phi_1|_{\phi_2^{-1}(\{\phi_2(z)\})})_z^{-1}(V_1)$ in line (58.7.3) yields a vector in the tangent space $T_z(M_1^{\phi_2(z)})$,
where $M_1^{\phi_2(z)} = \phi_2^{-1}(\{\phi_2(z)\})$ is a regular $C^1$ submanifold of $M$ by Theorem 52.7.5 (v). This is not exactly
the same thing as a vector in $T(M)$. A map which embeds the tangent space of a submanifold within an
ambient manifold is given by Definition 54.6.2. It is more or less safe to ignore this embedding map because
if it is well defined, there is only one natural choice for it, and it is well defined if the submanifold is a regular
$C^1$ submanifold. (Theorem 58.7.5 line (58.7.3) is illustrated in Figure 58.7.1.)

Figure 58.7.1 Inverse partial differential of common-domain map product


Lines (58.7.4) and (58.7.6), which arise incidentally during the proof of Theorem 58.7.5, are very closely
related to the “constant cross-section extension” concept in Definition 21.6.8 for fibre bundles.


58.7.5 THEOREM: Formulas for horizontal and vertical vectors on a product-structured manifold. 

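Let $M$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M \to M_1$ and $\phi_2 : M \to M_2$ be $C^1$ maps such that
$\phi_1 \times \phi_2 : M \to M_1 \times M_2$ is a $C^1$ diffeomorphism. Let $\eta_1 : T(M_1^{\phi_2(z)}) \to T(M)$ and $\eta_2 : T(M_2^{\phi_1(z)}) \to T(M)$
denote the submanifold tangent vector embedding maps for the submanifolds $M_1^{\phi_2(z)} = \phi_2^{-1}(\{\phi_2(z)\})$ and
$M_2^{\phi_1(z)} = \phi_1^{-1}(\{\phi_1(z)\})$ of $M$ respectively, as in Definition 54.6.2. Then

    $\forall z \in M,\ \forall V_1 \in T_{\phi_1(z)}(M_1),$
        $(d(\phi_1 \times \phi_2))_z^{-1}(V_1, 0_{\phi_2(z)}) = \eta_1\big( (d\phi_1|_{M_1^{\phi_2(z)}})_z^{-1}(V_1) \big)$    (58.7.3)
            $= (d(\phi_1 \times \phi_2)^{-1}(\cdot, \phi_2(z)))_{\phi_1(z)}(V_1)$    (58.7.4)

and

    $\forall z \in M,\ \forall V_2 \in T_{\phi_2(z)}(M_2),$
        $(d(\phi_1 \times \phi_2))_z^{-1}(0_{\phi_1(z)}, V_2) = \eta_2\big( (d\phi_2|_{M_2^{\phi_1(z)}})_z^{-1}(V_2) \big)$    (58.7.5)
            $= (d(\phi_1 \times \phi_2)^{-1}(\phi_1(z), \cdot))_{\phi_2(z)}(V_2),$    (58.7.6)

where $0_{\phi_1(z)} = 0_{T_{\phi_1(z)}(M_1)}$ and $0_{\phi_2(z)} = 0_{T_{\phi_2(z)}(M_2)}$.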


PROOF: Let $\phi = (\phi_1 \times \phi_2)^{-1}$. Then $\phi : M_1 \times M_2 \to M$ is a $C^1$ diffeomorphism. Let $z \in M$. Let
$p_1 = \phi_1(z)$ and $p_2 = \phi_2(z)$. Let $V_1 \in T_{p_1}(M_1)$. Then $(d\phi)_{p_1,p_2}(V_1, 0_{p_2}) = (d\phi(\cdot, p_2))_{p_1}(V_1)$ by Theorem 58.6.7
line (58.6.3). Since $\phi(\cdot, p_2) : M_1 \to M$ is a $C^1$ differentiable map by Theorem 52.6.8 (i), its pointwise
differential satisfies $(d(\phi(\cdot, p_2)))_{p_1} : T_{p_1}(M_1) \to T_{\phi(p_1,p_2)}(M) = T_z(M)$ by Definition 58.4.5.

Theorem 52.7.5 (ix) implies that $\phi_1|_{M_1^{p_2}} : M_1^{p_2} \to M_1$ is a $C^1$ diffeomorphism. So Definition 58.4.5 implies
$(d(\phi_1|_{M_1^{p_2}}))_z : T_z(M_1^{p_2}) \to T_{p_1}(M_1)$ and $(d(\phi_1|_{M_1^{p_2}})^{-1})_{p_1} : T_{p_1}(M_1) \to T_z(M_1^{p_2})$. But Theorem 58.5.4 implies
$(d(\phi_1|_{M_1^{p_2}})^{-1})_{p_1} = (d(\phi_1|_{M_1^{p_2}}))_z^{-1}$. Therefore $(d(\phi_1|_{M_1^{p_2}}))_z^{-1}(V_1) \in T_z(M_1^{p_2})$ for all $V_1 \in T_{p_1}(M_1)$.

By Theorem 52.7.5 (v, xi), $M_1^{p_2}$ is a regular $C^1$ submanifold of $M$, and $(\phi_1|_{M_1^{p_2}})^{-1} : M_1 \to M_1^{p_2}$ is a regular $C^1$
embedding of $M_1$ in $M$. So $\eta_1 \circ (d(\phi_1|_{M_1^{p_2}})^{-1})_{p_1} = (d\phi(\cdot, p_2))_{p_1}$ by Theorem 58.5.8 because $\phi(\cdot, p_2) = (\phi_1|_{M_1^{p_2}})^{-1}$
by Theorem 10.15.14 (ii). (Although $\phi(\cdot, p_2) = (\phi_1|_{M_1^{p_2}})^{-1}$ and $\mathrm{Dom}(\phi(\cdot, p_2)) = M_1 = \mathrm{Dom}((\phi_1|_{M_1^{p_2}})^{-1})$, these
two functions do not have the same target space. Therefore their differentials are different, as seen in
Remark 58.5.5. The function $\phi(\cdot, p_2)$ has the target space $M$, whereas $(\phi_1|_{M_1^{p_2}})^{-1}$ has the target space $M_1^{p_2}$,
which is a submanifold of $M$.) Consequently

    $(d(\phi_1 \times \phi_2))_z^{-1}(V_1, 0_{\phi_2(z)}) = (d(\phi^{-1}))_z^{-1}(V_1, 0_{\phi_2(z)})$
        $= (d\phi)_{(p_1,p_2)}(V_1, 0_{\phi_2(z)})$    (58.7.7)
        $= (d\phi)_{p_1,p_2}(V_1, 0_{\phi_2(z)})$    (58.7.8)
        $= (d\phi(\cdot, p_2))_{p_1}(V_1)$    (58.7.9)
        $= \eta_1\big( (d(\phi_1|_{M_1^{p_2}})^{-1})_{p_1}(V_1) \big)$
        $= \eta_1\big( (d(\phi_1|_{M_1^{p_2}}))_z^{-1}(V_1) \big),$    (58.7.10)

where line (58.7.7) follows from Theorem 58.5.4, and where line (58.7.8) follows from Definition 58.6.2
by ignoring the tangent bundle product identification map, as is the custom. Then line (58.7.10) verifies
line (58.7.3), and line (58.7.9) verifies line (58.7.4). Lines (58.7.5) and (58.7.6) may be proved in exactly the
same way.


58.7.6 REMARK: Gluing together pointwise differentials of maps with a common domain. 

Theorem 58.7.7 shows how the pointwise tangent-bundle direct-product identification map can be used to
“glue” together the pointwise differentials of two $C^1$ maps which have the same domain. (This issue, which
arises naturally for local trivialisations of differentiable fibre bundles, is also mentioned in Remark 54.7.3.
The proof of Theorem 58.7.7 is illustrated in Figure 58.7.2.)


Figure 58.7.2 Pointwise differential of common-domain product of maps


58.7.7 THEOREM: Pointwise differential of common-domain product of maps. 
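Let $M$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M \to M_1$ and $\phi_2 : M \to M_2$ be $C^1$ maps. Then

    $\forall p \in M,\quad (d(\phi_1 \times \phi_2))_p = i \circ ((d\phi_1)_p \times (d\phi_2)_p).$


PROOF: Let $p \in M$, $\psi \in \mathrm{atlas}_p(M)$ and $V \in T_p(M)$. Then $V = t_{p,v,\psi}$ for some $v \in \mathbb{R}^n$, where $n = \dim(M)$.
Let $\psi_\alpha \in \mathrm{atlas}_{\phi_\alpha(p)}(M_\alpha)$ for $\alpha = 1, 2$. Then $(d\phi_\alpha)_p(t_{p,v,\psi}) = t_{\phi_\alpha(p), v_\alpha, \psi_\alpha}$, where $v_\alpha = \partial_{p,v,\psi}(\psi_\alpha \circ \phi_\alpha)$ for
$\alpha = 1, 2$ by Definition 58.4.5 line (58.4.3). Since $\psi_1 \times \psi_2 \in \mathrm{atlas}_{(\phi_1(p), \phi_2(p))}(M_1 \times M_2)$ by Definition 52.6.2, it
follows similarly that $(d(\phi_1 \times \phi_2))_p(t_{p,v,\psi}) = t_{(\phi_1 \times \phi_2)(p), \bar{v}, \psi_1 \times \psi_2}$, where $\bar{v} = \partial_{p,v,\psi}((\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2))$.

However, $\partial_{p,v,\psi}((\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)) = (\partial_{p,v,\psi}(\psi_1 \circ \phi_1), \partial_{p,v,\psi}(\psi_2 \circ \phi_2)) = (v_1, v_2)$. So it follows
from Definition 54.7.2 that $(d(\phi_1 \times \phi_2))_p(V) = t_{(\phi_1(p), \phi_2(p)), (v_1,v_2), \psi_1 \times \psi_2} = i((d\phi_1)_p(V), (d\phi_2)_p(V))$ for all
$V \in T_p(M)$. Hence $(d(\phi_1 \times \phi_2))_p = i \circ ((d\phi_1)_p \times (d\phi_2)_p)$.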
58.7.8 REMARK: Obtaining component differentials from a common-domain product map differential.
Theorem 58.7.9 is little more than a paraphrase of Theorem 58.7.7. However, it is presented here in the
interests of abundant certainty because sometimes in fibre bundle contexts, the horizontal and vertical
components of local trivialisations do not separate so cleanly from each other. (See for example Theorem 64.8.12.)
So Theorem 58.7.9 is intended to remove all doubt.


Note that in practice, the tangent bundle direct-product identification map $i$ in both Theorems 58.7.7
and 58.7.9 is rarely shown.


58.7.9 THEOREM: Separation of common-domain product map differential into component differentials. 
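Let $M$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M \to M_1$ and $\phi_2 : M \to M_2$ be $C^1$ maps. Then
    $\forall p \in M,\ \forall y \in T_p(M),\ \forall (V_1, V_2) \in T(M_1) \times T(M_2),$
        $(d(\phi_1 \times \phi_2))_p(y) = i(V_1, V_2) \iff ((d\phi_1)_p(y) = V_1 \text{ and } (d\phi_2)_p(y) = V_2).$
Hence if $\phi_1 \times \phi_2 : M \to M_1 \times M_2$ is a $C^1$ diffeomorphism, then
    $\forall p \in M,\ \forall y \in T_p(M),\ \forall (V_1, V_2) \in T(M_1) \times T(M_2),$
        $y = (d(\phi_1 \times \phi_2))_p^{-1}(i(V_1, V_2)) \iff ((d\phi_1)_p(y) = V_1 \text{ and } (d\phi_2)_p(y) = V_2).$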


PROOF: The assertions follow directly from Theorem 58.7.7. 


58.7.10 REMARK: Unique determination of product-structured manifold vectors from differentials. 
Theorem 58.7.11 uses Theorem 58.7.9 to prove a converse of Theorem 58.7.3. This allows a vector on a 
product-structured manifold to be uniquely determined from differentials with respect to component maps. 
In the fibre bundle context, this means that a total space vector is uniquely determined by its horizontal 
and vertical component vectors on the base-space and fibre space manifolds. 


58.7.11 THEOREM: Correspondence between product-manifold vectors and their component differentials. 
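Let $M$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M \to M_1$ and $\phi_2 : M \to M_2$ be $C^1$ maps such that
$\phi_1 \times \phi_2 : M \to M_1 \times M_2$ is a $C^1$ diffeomorphism. Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. Then
    $\forall z \in M,\ \forall V \in T_z(M),\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall v_2 \in \mathbb{R}^{n_2},\ \forall \psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2),$
        $((d\phi_1)_z(V) = t_{\phi_1(z), v_1, \psi_1} \text{ and } (d\phi_2)_z(V) = t_{\phi_2(z), v_2, \psi_2}) \iff V = t_{z, (v_1,v_2), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)}.$


PROOF: The assertion follows from Theorems 58.7.9 and 58.7.3.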


58.7.12 REMARK: Differentials of maps with two “pathways” from source space to target space. 

In basic differential calculus (in Cartesian space), one often needs to differentiate a function of the form
$x \mapsto F(f_1(x), f_2(x))$ for differentiable functions $F$, $f_1$ and $f_2$. For example, one may wish to differentiate
the function $x \mapsto \ln(\cos(x) \exp(x))$. Whenever the argument appears in two places in an inline formula for a
function, one imagines that one appearance is fixed, while the other one varies, and then vice versa. This is
the same as regarding each instance of $x$ as the value of the identity function, which is a special case of the
function form $x \mapsto F(f_1(x), f_2(x))$. This may be expressed as $F \circ (f_1 \times f_2)$. (See Notation 10.15.3 for the
common-domain function product $f_1 \times f_2$.) In this form, it is clearly the composite of two functions, one of
which is a common-domain product of two functions.


Theorem 58.7.13 extends this concept from Cartesian spaces to C! manifolds, based on Theorems 58.6.6 
and 58.7.7 for differentials of the two-variable function and common-domain function product respectively. 
(For some applications, see the proofs of the naive derivative Leibniz rules in Theorems 61.3.3 and 65.6.2. 
Theorem 58.7.13 is illustrated in Figure 58.7.3.) 
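
The Cartesian example in this remark can be worked through numerically. The sketch below differentiates $x \mapsto \ln(\cos(x)\exp(x))$ by the "fix one appearance, vary the other" recipe, writing the function as `F` composed with the common-domain product of `f1` and `f2`, and compares the result with a central-difference derivative of the composite. The function names and the evaluation point are choices made for this illustration only.

```python
import math

def F(a, b):
    """Two-variable outer function F(a, b) = ln(a * b)."""
    return math.log(a * b)

f1 = math.cos   # first inner function
f2 = math.exp   # second inner function

def composite(x):
    """The composite F o (f1 x f2), that is x -> ln(cos(x) * exp(x))."""
    return F(f1(x), f2(x))

def chain_rule(x):
    """Sum of two terms, one for each 'pathway' from x to the value of F."""
    a, b = f1(x), f2(x)
    dF_da = 1.0 / a          # partial derivative of F in its first argument
    dF_db = 1.0 / b          # partial derivative of F in its second argument
    df1 = -math.sin(x)       # derivative of cos
    df2 = math.exp(x)        # derivative of exp
    return dF_da * df1 + dF_db * df2

x = 0.4                      # cos(x) > 0 here, so the logarithm is defined
h = 1e-6
numeric = (composite(x + h) - composite(x - h)) / (2 * h)
print(chain_rule(x), numeric)
```

Both values should be close to $1 - \tan(x)$, which is what the two-term formula simplifies to for this particular function.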


58.7.13 THEOREM: Differential of composite of a common-domain product with a two-variable function. 
Let $M$, $M_1$, $M_2$ and $M_0$ be $C^1$ manifolds. Let $\phi_1 : M \to M_1$, $\phi_2 : M \to M_2$ and $\phi : M_1 \times M_2 \to M_0$ be $C^1$
maps. Then

    $\forall p \in M,\quad (d(\phi \circ (\phi_1 \times \phi_2)))_p = (d\phi)_{(\phi_1(p), \phi_2(p))} \circ i \circ ((d\phi_1)_p \times (d\phi_2)_p),$    (58.7.11)

where $i : T(M_1) \times T(M_2) \to T(M_1 \times M_2)$ is the identification map in Definition 54.7.6. Hence
    $\forall p \in M,\ \forall w \in T_p(M),$
        $(d(\phi \circ (\phi_1 \times \phi_2)))_p(w) = (d\phi)_{\phi_1(p), \phi_2(p)}((d\phi_1)_p(w), (d\phi_2)_p(w))$    (58.7.12)
            $= (d\phi(\cdot, \phi_2(p)))_{\phi_1(p)}((d\phi_1)_p(w)) + (d\phi(\phi_1(p), \cdot))_{\phi_2(p)}((d\phi_2)_p(w)).$    (58.7.13)
Thus

    $\forall p \in M,\quad (d(\phi \circ (\phi_1 \times \phi_2)))_p = (d\phi(\cdot, \phi_2(p)))_{\phi_1(p)} \circ (d\phi_1)_p + (d\phi(\phi_1(p), \cdot))_{\phi_2(p)} \circ (d\phi_2)_p,$    (58.7.14)

assuming pointwise addition of functions. In other words,

    $\forall p \in M,\quad (d(\phi \circ (\phi_1 \times \phi_2)))_p = (dR_{\phi_2(p)})_{\phi_1(p)} \circ (d\phi_1)_p + (dL_{\phi_1(p)})_{\phi_2(p)} \circ (d\phi_2)_p,$    (58.7.15)

where $\forall q \in M_2,\ R_q = \phi(\cdot, q)$ and $\forall q \in M_1,\ L_q = \phi(q, \cdot)$.

Figure 58.7.3 Differential of composite of common-domain product and two-variable function


PROOF: Line (58.7.11) follows from Theorems 58.7.7 and 58.4.13. Then the elimination of $i$ in line (58.7.12)
follows from Definition 58.6.2. Line (58.7.13) then follows from Theorem 58.6.6. Lines (58.7.14) and (58.7.15)
are essentially identical to line (58.7.13).




58.7.14 REMARK:  Horizontal/vertical submanifold vectors have zero vertical/horizontal components. 
Theorem 58.7.15 implies that tangent vectors of horizontal subspaces have zero vertical component by
line (58.7.17), and tangent vectors of vertical subspaces have zero horizontal component by line (58.7.18). 
Therefore tangent vectors to horizontal and vertical submanifolds are respectively horizontal and vertical 
vectors. Submanifold tangent vector embedding maps are required in order to identify submanifold tangent 
vectors with ambient manifold tangent vectors. (Theorem 58.7.15 is illustrated in Figure 58.7.4.) 


58.7.15 THEOREM:  Horizontal/vertical components of horizontal/vertical submanifold tangent vectors. 
Let $M_0$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M_0 \to M_1$ and $\phi_2 : M_0 \to M_2$ be $C^1$ maps such that
$\phi_1 \times \phi_2 : M_0 \to M_1 \times M_2$ is a $C^1$ diffeomorphism. For $p_1 \in M_1$ and $p_2 \in M_2$, let $(M_1^{p_2}, A_1^{p_2})$ and $(M_2^{p_1}, A_2^{p_1})$
be the horizontal and vertical submanifolds respectively of the product-structured differentiable manifold
$(M_0, \phi_1, \phi_2, M_1, M_2)$ as in Definition 52.7.8. Let $\eta_1^{p_2} : T(M_1^{p_2}) \to T(M_0)$ and $\eta_2^{p_1} : T(M_2^{p_1}) \to T(M_0)$ be the
corresponding tangent vector embedding maps as in Definition 54.6.2. Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$.
Then

    $\forall p_2 \in M_2,\ \forall z \in M_1^{p_2},\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall \psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1),$
        $(d\phi_1|_{M_1^{p_2}})_z(t_{z, v_1, \tilde\psi_1}) = (d\phi_1)_z(\eta_1^{p_2}(t_{z, v_1, \tilde\psi_1})) = t_{\phi_1(z), v_1, \psi_1}$    (58.7.16)
    and $(d\phi_2|_{M_1^{p_2}})_z(t_{z, v_1, \tilde\psi_1}) = (d\phi_2)_z(\eta_1^{p_2}(t_{z, v_1, \tilde\psi_1})) = 0_{T_{p_2}(M_2)},$    (58.7.17)
where $\tilde\psi_1 = \psi_1 \circ \phi_1|_{M_1^{p_2}} \in A_1^{p_2}$. Similarly,
    $\forall p_1 \in M_1,\ \forall z \in M_2^{p_1},\ \forall v_2 \in \mathbb{R}^{n_2},\ \forall \psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2),$
        $(d\phi_1|_{M_2^{p_1}})_z(t_{z, v_2, \tilde\psi_2}) = (d\phi_1)_z(\eta_2^{p_1}(t_{z, v_2, \tilde\psi_2})) = 0_{T_{p_1}(M_1)}$    (58.7.18)
    and $(d\phi_2|_{M_2^{p_1}})_z(t_{z, v_2, \tilde\psi_2}) = (d\phi_2)_z(\eta_2^{p_1}(t_{z, v_2, \tilde\psi_2})) = t_{\phi_2(z), v_2, \psi_2},$    (58.7.19)

where $\tilde\psi_2 = \psi_2 \circ \phi_2|_{M_2^{p_1}} \in A_2^{p_1}$.

Figure 58.7.4 Horizontal/vertical components of submanifold tangent vectors


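PROOF: Let $p_2 \in M_2$, $z \in M_1^{p_2}$, $v_1 \in \mathbb{R}^{n_1}$ and $\psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1)$. Let $\psi_2 \in \mathrm{atlas}_{p_2}(M_2)$. Then it follows
from Theorem 54.8.5 (i) that $\eta_1^{p_2}(t_{z, v_1, \tilde\psi_1}) = t_{z, (v_1, 0), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)}$. So $(d\phi_1)_z(\eta_1^{p_2}(t_{z, v_1, \tilde\psi_1})) = t_{\phi_1(z), v_1, \psi_1}$
and $(d\phi_2)_z(\eta_1^{p_2}(t_{z, v_1, \tilde\psi_1})) = 0_{T_{\phi_2(z)}(M_2)} = 0_{T_{p_2}(M_2)}$ by Theorem 58.7.3 lines (58.7.1) and (58.7.2) respectively.
The equalities $(d\phi_1|_{M_1^{p_2}})_z = (d\phi_1)_z \circ \eta_1^{p_2}$ and $(d\phi_2|_{M_1^{p_2}})_z = (d\phi_2)_z \circ \eta_1^{p_2}$ follow from Theorem 58.5.10.

It follows similarly that $(d\phi_1)_z(\eta_2^{p_1}(t_{z, v_2, \tilde\psi_2})) = 0_{T_{\phi_1(z)}(M_1)}$ and $(d\phi_2)_z(\eta_2^{p_1}(t_{z, v_2, \tilde\psi_2})) = t_{\phi_2(z), v_2, \psi_2}$ for all
$p_1 \in M_1$, $z \in M_2^{p_1}$, $v_2 \in \mathbb{R}^{n_2}$ and $\psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2)$, where $\tilde\psi_2 = \psi_2 \circ \phi_2|_{M_2^{p_1}}$.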


58.7.16 REMARK: Inverse differentials of horizontal/vertical component maps. 

Theorem 58.7.17 gives the inverses of the invertible formulas in Theorem 58.7.15. To make the component
maps $\phi_1$ and $\phi_2$ invertible, they must first be restricted to horizontal and vertical submanifolds respectively.
For the benefit of applications, these inverse formulas are written out explicitly in Theorem 58.7.17.


Most notable here is that the tangent vector embedding maps are no longer required in order to convert
submanifold vectors to ambient vectors. This is because the charts $\phi_1$ and $\phi_2$ are restricted to submanifolds
$M_1^{p_2}$ and $M_2^{p_1}$, which implies that the domains of the differentials of these maps are the corresponding
submanifolds, not the ambient manifold. Therefore, as discussed in Remarks 58.5.5, 58.5.7 and 58.7.4, and
in Theorem 58.7.5, the outputs from the inverse differentials must be submanifold tangent vectors. Sadly the
equalities in lines (58.7.17) and (58.7.18) cannot be inverted. So they are not represented in Theorem 58.7.17.


58.7.17 THEOREM: Inverse horizontal/vertical component map differentials. 

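Let $M_0$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M_0 \to M_1$ and $\phi_2 : M_0 \to M_2$ be $C^1$ maps such that
$\phi_1 \times \phi_2 : M_0 \to M_1 \times M_2$ is a $C^1$ diffeomorphism. For $p_1 \in M_1$ and $p_2 \in M_2$, let $(M_1^{p_2}, A_1^{p_2})$ and $(M_2^{p_1}, A_2^{p_1})$
be the horizontal and vertical submanifolds respectively of the product-structured differentiable manifold
$(M_0, \phi_1, \phi_2, M_1, M_2)$ as in Definition 52.7.8. Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. Then

    $\forall p_2 \in M_2,\ \forall z \in M_1^{p_2},\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall \psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1),$
        $(d(\phi_1|_{M_1^{p_2}})^{-1})_{\phi_1(z)}(t_{\phi_1(z), v_1, \psi_1}) = t_{z, v_1, \tilde\psi_1},$    (58.7.20)

where $\tilde\psi_1 = \psi_1 \circ \phi_1|_{M_1^{p_2}} \in A_1^{p_2}$. Similarly,
    $\forall p_1 \in M_1,\ \forall z \in M_2^{p_1},\ \forall v_2 \in \mathbb{R}^{n_2},\ \forall \psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2),$
        $(d(\phi_2|_{M_2^{p_1}})^{-1})_{\phi_2(z)}(t_{\phi_2(z), v_2, \psi_2}) = t_{z, v_2, \tilde\psi_2},$    (58.7.21)
where $\tilde\psi_2 = \psi_2 \circ \phi_2|_{M_2^{p_1}} \in A_2^{p_1}$.


PROOF: For line (58.7.20), let $p_2 \in M_2$ and $z \in M_1^{p_2}$. Then $\phi_1|_{M_1^{p_2}} : M_1^{p_2} \to M_1$ is a $C^1$
diffeomorphism by Theorem 52.7.5 (ix), which implies that the differential $(d\phi_1|_{M_1^{p_2}})_z : T_z(M_1^{p_2}) \to T_{\phi_1(z)}(M_1)$
satisfies $(d\phi_1|_{M_1^{p_2}})_z^{-1} = (d(\phi_1|_{M_1^{p_2}})^{-1})_{\phi_1(z)}$ by Theorem 58.5.4.
Hence line (58.7.20) follows from Theorem 58.7.15 line (58.7.16). Similarly, line (58.7.21) follows from
Theorem 58.7.15 line (58.7.19).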


58.7.18 REMARK: Application of tangent vector embedding map to inverse horizontal/component maps. 
Theorem 58.7.19 is almost a carbon copy of Theorem 58.7.17. (And Theorem 58.7.17 is almost a carbon 
copy of Theorem 58.7.15.) Theorem 58.7.19 simply adds back the tangent vector embedding maps which 
were removed in Theorem 58.7.17. This somewhat shallow modification is intended to make certain kinds of 
applications to fibre bundles easier. The assertions may be thought of as partial inversions of the component 
projection formulas in Theorem 58.7.3. 


58.7.19 THEOREM: Embedded inverse horizontal/vertical component map differentials. 

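Let $M_0$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M_0 \to M_1$ and $\phi_2 : M_0 \to M_2$ be $C^1$ maps such that
$\phi_1 \times \phi_2 : M_0 \to M_1 \times M_2$ is a $C^1$ diffeomorphism. For $p_1 \in M_1$ and $p_2 \in M_2$, let $(M_1^{p_2}, A_1^{p_2})$ and $(M_2^{p_1}, A_2^{p_1})$
be the horizontal and vertical submanifolds respectively of the product-structured differentiable manifold
$(M_0, \phi_1, \phi_2, M_1, M_2)$ as in Definition 52.7.8. Let $\eta_1^{p_2} : T(M_1^{p_2}) \to T(M_0)$ and $\eta_2^{p_1} : T(M_2^{p_1}) \to T(M_0)$ be the
corresponding tangent vector embedding maps as in Definition 54.6.2. Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$.
Then

    $\forall p_2 \in M_2,\ \forall z \in M_1^{p_2},\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall \psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2),$
        $\eta_1^{p_2}\big( (d(\phi_1|_{M_1^{p_2}})^{-1})_{\phi_1(z)}(t_{\phi_1(z), v_1, \psi_1}) \big) = t_{z, (v_1, 0), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)}.$    (58.7.22)
Similarly,

    $\forall p_1 \in M_1,\ \forall z \in M_2^{p_1},\ \forall v_2 \in \mathbb{R}^{n_2},\ \forall \psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2),$
        $\eta_2^{p_1}\big( (d(\phi_2|_{M_2^{p_1}})^{-1})_{\phi_2(z)}(t_{\phi_2(z), v_2, \psi_2}) \big) = t_{z, (0, v_2), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)}.$    (58.7.23)


PROOF: Line (58.7.22) follows from Theorem 58.7.17 line (58.7.20) and Theorem 54.8.5 (i).
Line (58.7.23) follows from Theorem 58.7.17 line (58.7.21) and Theorem 54.8.5 (ii).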


58.7.20 REMARK:  Product-structured manifold tangent vector synthesis from components. 

Theorem 58.7.21 adds lines (58.7.22) and (58.7.23) in Theorem 58.7.19 to synthesise any tangent vector on 
a product-structured manifold in terms of its horizontal and vertical components via charts. Informally one 
may omit the tangent vector embedding maps.


58.7.21 THEOREM:  Decomposition of product-structured manifold tangent vectors into components. 

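Let $M_0$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M_0 \to M_1$ and $\phi_2 : M_0 \to M_2$ be $C^1$ maps such that
$\phi_1 \times \phi_2 : M_0 \to M_1 \times M_2$ is a $C^1$ diffeomorphism. For $p_1 \in M_1$ and $p_2 \in M_2$, let $(M_1^{p_2}, A_1^{p_2})$ and $(M_2^{p_1}, A_2^{p_1})$
be the horizontal and vertical submanifolds respectively of the product-structured differentiable manifold
$(M_0, \phi_1, \phi_2, M_1, M_2)$ as in Definition 52.7.8. Let $\eta_1^{p_2} : T(M_1^{p_2}) \to T(M_0)$ and $\eta_2^{p_1} : T(M_2^{p_1}) \to T(M_0)$ be the
corresponding tangent vector embedding maps as in Definition 54.6.2. Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$.
Then

    $\forall z \in M_0,\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall v_2 \in \mathbb{R}^{n_2},\ \forall \psi_1 \in \mathrm{atlas}_{\phi_1(z)}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi_2(z)}(M_2),$    (58.7.24)
        $t_{z, (v_1,v_2), (\psi_1 \times \psi_2) \circ (\phi_1 \times \phi_2)} = \eta_1^{\phi_2(z)}\big( (d(\phi_1|_{M_1^{\phi_2(z)}})^{-1})_{\phi_1(z)}(t_{\phi_1(z), v_1, \psi_1}) \big) + \eta_2^{\phi_1(z)}\big( (d(\phi_2|_{M_2^{\phi_1(z)}})^{-1})_{\phi_2(z)}(t_{\phi_2(z), v_2, \psi_2}) \big).$
Hence

    $\forall z \in M_0,\ \forall y \in T_z(M_0),$
        $y = \eta_1^{\phi_2(z)}\big( (d(\phi_1|_{M_1^{\phi_2(z)}})^{-1})_{\phi_1(z)}((d\phi_1)_z(y)) \big) + \eta_2^{\phi_1(z)}\big( (d(\phi_2|_{M_2^{\phi_1(z)}})^{-1})_{\phi_2(z)}((d\phi_2)_z(y)) \big).$    (58.7.25)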
PROOF: Line (58.7.24) follows from Theorem 58.7.19 and Definition 54.4.4 (i). Then line (58.7.25) follows
from line (58.7.24) and Theorems 52.7.2 (iii), 54.3.7 and 58.7.3.
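
The decomposition in line (58.7.25) can be illustrated numerically with polar coordinates on the open right half-plane, taking the radius as the first component map and the angle as the second. This is a hypothetical example chosen for this sketch, with identity charts on the factor spaces, so that the embedded inverse differentials reduce to solving a linear system with one component set to zero. The Jacobian is a central-difference approximation.

```python
import math

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x, returned as a list of rows."""
    rows = len(f(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        for j in range(rows):
            J[j][i] = (fp[j] - fm[j]) / (2 * h)
    return J

def matvec(J, v):
    """Apply the matrix J (a list of rows) to the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def solve2(J, b):
    """Solve the 2x2 linear system J x = b by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - J[1][0] * b[0]) / det]

def phi1(z):
    """Radius: hypothetical first component map on the right half-plane."""
    return [math.hypot(z[0], z[1])]

def phi2(z):
    """Angle: hypothetical second component map on the right half-plane."""
    return [math.atan2(z[1], z[0])]

def chart(z):
    """The common-domain product (phi1 x phi2), here the polar chart."""
    return phi1(z) + phi2(z)

z = [1.2, 0.9]       # base point in the right half-plane
y = [0.3, -2.0]      # tangent vector at z in standard coordinates

J = jacobian(chart, z)
v1, v2 = matvec(J, y)                 # components of (d phi1)_z(y) and (d phi2)_z(y)

horizontal = solve2(J, [v1, 0.0])     # vector tangent to the set where phi2 is constant
vertical = solve2(J, [0.0, v2])       # vector tangent to the set where phi1 is constant
recombined = [a + b for a, b in zip(horizontal, vertical)]
print(recombined, y)
```

In this example the horizontal part points along the ray through `z` and the vertical part is tangent to the circle through `z`, and their sum reproduces `y` up to discretisation error.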


58.8. Pointwise pull-back differentials of differentiable maps 


58.8.1 REMARK: The contragredient of the differential of a map between manifolds. 

The pull-back differential of a map at a point in Definition 58.8.2 is a kind of “contragredient” map between
dual spaces, relative to the (forward) differential in Definition 58.4.5. The pull-back differential “pulls back”
tangent covectors from $T^*_{\phi(p)}(M_2)$ to $T^*_p(M_1)$. (See Remark 23.11.3 for further comments on terminology for
pull-backs, contragredients and transposes.) Thus the “pull-back $(d\phi)^*_p(f)$ of $f$ via $(d\phi)_p$” pulls back the
function $f$ from $T_{\phi(p)}(M_2)$ to $T_p(M_1)$ via $(d\phi)_p$. (Definition 58.8.2 is illustrated in Figure 58.8.1.)


    $(d\phi)^*_p(f)(V) = f((d\phi)_p(V))$

Figure 58.8.1 Pull-back differential $(d\phi)^*_p(f)$ for target space $\mathbb{R}$ at a point $p$


58.8.2 DEFINITION: Pointwise pull-back differential for tangent covectors. 
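The pull-back differential at a point $p \in M_1$ of a $C^1$ map $\phi : M_1 \to M_2$, for $C^1$ manifolds $M_1$ and $M_2$, is the
linear map $(d\phi)^*_p : T^*_{\phi(p)}(M_2) \to T^*_p(M_1)$ defined by

    $\forall f \in T^*_{\phi(p)}(M_2),\ \forall V \in T_p(M_1),\quad (d\phi)^*_p(f)(V) = f((d\phi)_p(V)).$
In other words,
    $\forall f \in T^*_{\phi(p)}(M_2),\quad (d\phi)^*_p(f) = f \circ (d\phi)_p.$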


Alternative name: transposed differential at a point. 
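
For maps between Cartesian spaces with identity charts, Definition 58.8.2 reduces to the statement that the components of the pulled-back covector are obtained by applying the transposed Jacobian matrix. The following sketch checks this for a hypothetical map `phi` from the plane to three-dimensional space. The covector components `f`, the point `p` and the vector `V` are arbitrary values chosen for this illustration, and the Jacobian is a central-difference approximation.

```python
import math

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x, returned as a list of rows."""
    rows = len(f(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        for j in range(rows):
            J[j][i] = (fp[j] - fm[j]) / (2 * h)
    return J

def matvec(J, v):
    """Apply the matrix J (a list of rows) to the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def phi(p):
    """Hypothetical C^1 map from R^2 to R^3."""
    return [p[0] * p[1], math.sin(p[0]), p[1] ** 2]

p = [0.5, -1.0]
f = [2.0, -1.0, 0.5]     # components of a covector at phi(p)
V = [1.5, 0.25]          # components of a tangent vector at p

J = jacobian(phi, p)

# Components of the pulled-back covector f o (d phi)_p: transposed Jacobian applied to f.
pullback = [sum(J[j][i] * f[j] for j in range(len(f))) for i in range(len(p))]

lhs = sum(c * v for c, v in zip(pullback, V))        # pulled-back covector acting on V
rhs = sum(c * w for c, w in zip(f, matvec(J, V)))    # f acting on the pushed-forward V
print(lhs, rhs)
```

The two printed numbers agree because both compute the same double sum, which is the coordinate form of the defining identity in this definition.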


58.8.3 REMARK: The pointwise pull-back for a differentiable map is a transpose operator. 

The pointwise transposed differential for maps between differentiable manifolds in Definition 58.8.2 is related
to the transpose map for linear maps between linear spaces in Definition 23.11.2. Theorem 58.8.5 makes
the obscure (and possibly useless) point that $(d\phi)^*_p$ is a transpose of the forward differential $(d\phi)_p$, not an
inverse. An inverse would be a map from $T_{\phi(p)}(M_2)$ to $T_p(M_1)$, and would yield an identity map when
composed with $(d\phi)_p$. That would be a totally different concept.


However, $(d\phi)^*_p$ is not literally the transpose of $(d\phi)_p$. It is in fact the transpose of $h_{\phi(p)} \circ (d\phi)_p$, where the
canonical identification map $h_{\phi(p)} : T_{\phi(p)}(M_2) \to T_{\phi(p)}(M_2)^{**}$ identifies $T_{\phi(p)}(M_2)$ with its double dual as in
Definition 23.10.4. (See Remark 10.19.4 for the transpose of a function-valued function.) Since identification
maps are often thought of as essentially a precise equality in some sense, one could ignore the map $h_{\phi(p)}$ and
say that, essentially, $(d\phi)^*_p$ is indeed the transpose of $(d\phi)_p$.


58.8.4 REMARK: Terminology for pointwise pull-back differentials. 
The pointwise pull-back differential in Definition 58.8.2 is called the “transpose of $\phi_*$ at $p$” by Daniel/
Viallet [317], page 182. (See also Remark 58.11.1 for terminology for the global pull-back differential.)
58.8.5 THEOREM: Transposition of vector and function arguments for pointwise pull-back differentials.
Let $\phi \in C^1(M_1, M_2)$ for $C^1$ manifolds $M_1$ and $M_2$. Then
    $\forall p \in M_1,\ \forall f \in T^*_{\phi(p)}(M_2),\ \forall V \in T_p(M_1),$
        $(d\phi)^*_p(f)(V) = (h_{\phi(p)} \circ (d\phi)_p)(V)(f),$


where the identification map $h_q : T_q(M_2) \to T^*_q(M_2)^* = T_q(M_2)^{**}$ is defined by $h_q : V \mapsto (f \mapsto f(V))$ for
all $q \in M_2$. That is, $(d\phi)^*_p$ is the transpose of the function-valued function $h_{\phi(p)} \circ (d\phi)_p$ for all $p \in M_1$.


PROOF: Let $\phi \in C^1(M_1, M_2)$, $p \in M_1$, $f \in T^*_{\phi(p)}(M_2)$ and $V \in T_p(M_1)$. Then

    $(h_{\phi(p)} \circ (d\phi)_p)(V)(f) = ((h_{\phi(p)} \circ (d\phi)_p)(V))(f)$
        $= h_{\phi(p)}((d\phi)_p(V))(f)$
        $= f((d\phi)_p(V))$
        $= (d\phi)^*_p(f)(V)$
by Definition 58.8.2. Thus $(d\phi)^*_p$ is the function transpose of $h_{\phi(p)} \circ (d\phi)_p$ for all $p \in M_1$.


58.8.6 REMARK:  Vector-valued pull-back differentials. 

The tangent covector spaces $T^*_{\phi(p)}(M_2)$ and $T^*_p(M_1)$ in Definition 58.8.2 are the same as the linear spaces
$\mathrm{Lin}(T_{\phi(p)}(M_2), \mathbb{R})$ and $\mathrm{Lin}(T_p(M_1), \mathbb{R})$. This suggests that the target space $\mathbb{R}$ could be replaced with a general
real linear space $W$. This is both true and useful. (See for example Theorem 69.8.2 (ii).)


The linear map spaces $\mathrm{Lin}(T_{\phi(p)}(M_2), W)$ and $\mathrm{Lin}(T_p(M_1), W)$ may even be replaced with the multilinear
map spaces $\mathscr{L}_r(T_{\phi(p)}(M_2), W)$ and $\mathscr{L}_r(T_p(M_1), W)$ for general $r \in \mathbb{Z}_0^+$ as in Notation 27.2.19, together with
their linear space structure in Definition 27.6.3. Particularly applicable are the corresponding antisymmetric
subspaces in Notation 30.1.5.


It is difficult to describe the pull-back differential in Definition 58.8.7 as a “transposed differential” because
the “double dual” linear space $\mathrm{Lin}(\mathrm{Lin}(T_p(M_1), W), W)$ does not generally resemble $T_p(M_1)$ if $W \ne \mathbb{R}$.


The notation “$(d\phi)_p^*$” in Definition 58.8.7 is the same as is used in Definition 58.8.2. General pull-back maps
may be more accurately described as pull-back map templates because the class of all real linear spaces $W$
in Definition 58.8.7 is not a valid set in ZF set theory. So the aggregate of pull-back maps for all target
spaces $W$ cannot be a ZF function in the sense of Definition 10.2.2. However, for each individual space $W$,
Definition 58.8.7 defines a valid ZF function for each given $\phi$ and $p$.


A simpler, and more accurate, way to think about pull-backs is that they compose linear functions $f$ in
Definition 58.8.7 with the forward differential $(d\phi)_p$ on the right. This operation pulls back the domain of $f$
from $T_{\phi(p)}(M_2)$ to $T_p(M_1)$, while leaving the values of the function unchanged. Thus a more meaningful name
for a pull-back operation $(d\phi)_p^*$ would be the “domain pullback via $(d\phi)_p$”. (Definition 58.8.7 is illustrated
in Figure 58.8.2.)


Figure 58.8.2  Pull-back differential $(d\phi)_p^*(f)$ for linear target space $W$ at a point $p$, showing $(d\phi)_p^*(f)(V) = f((d\phi)_p(V))$


58.8.7 DEFINITION: Pointwise pull-back differential for vector-valued covectors.
The pull-back differential for target space $W$ at a point $p \in M_1$ of a $C^1$ map $\phi : M_1 \to M_2$, for a real linear
space $W$ and $C^1$ manifolds $M_1$ and $M_2$, is the map $(d\phi)_p^* : \mathrm{Lin}(T_{\phi(p)}(M_2), W) \to \mathrm{Lin}(T_p(M_1), W)$ given by

\[
\forall f \in \mathrm{Lin}(T_{\phi(p)}(M_2), W),\ \forall V \in T_p(M_1), \qquad
(d\phi)_p^*(f)(V) = f((d\phi)_p(V)).
\]

In other words,

\[
\forall f \in \mathrm{Lin}(T_{\phi(p)}(M_2), W), \qquad (d\phi)_p^*(f) = f \circ (d\phi)_p. \tag{58.8.1}
\]
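
As a rough coordinate illustration of line (58.8.1), the following Python sketch represents the forward differential at $p$ by a Jacobian matrix and a $W$-valued linear map $f$ by another matrix, so that the pull-back is simply "apply the Jacobian first, then $f$". The map `phi`, its hand-computed Jacobian, and the helper names `jacobian_phi`, `apply` and `pull_back` are assumptions made for this example only. They are not notation from this book.

```python
# Sketch: pull-back of a linear map f by the differential of phi at p,
# in coordinates.  All names here are hypothetical illustrations.

def jacobian_phi(p):
    """Jacobian of phi(x, y) = (x*y, x + y, y**2) at p = (x, y)."""
    x, y = p
    return [[y, x],
            [1.0, 1.0],
            [0.0, 2.0 * y]]

def apply(matrix, vector):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def pull_back(f, p):
    """Return V -> f((d phi)_p(V)), i.e. f composed with (d phi)_p."""
    d_phi_p = jacobian_phi(p)
    return lambda V: apply(f, apply(d_phi_p, V))

# f is a linear map from T_{phi(p)}(M_2) = R^3 to W = R^2.
f = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0]]
p = (2.0, 5.0)
V = [1.0, -1.0]

pulled = pull_back(f, p)   # a linear map from T_p(M_1) = R^2 to W
result = pulled(V)         # a vector in W
```

Only the domain of `f` changes under `pull_back`. The values still lie in $W$, which is the point made in Remark 58.8.6.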


58.9. Global differentials of differentiable maps 


58.9.1 REMARK: Extension of pointwise differential of a map to the whole domain manifold.

The pointwise differential in Definition 58.4.5 is extended to the whole manifold $M_1$ in Definition 58.9.2. The
result is a map $d\phi : M_1 \to T(M_1, M_2)$ such that $(d\phi)(p) \in T_{p,\phi(p)}(M_1, M_2)$ for all $p \in M_1$. The pointwise and
total induced-map tangent spaces $T_{p,q}(M_1, M_2)$ and $T(M_1, M_2)$ are given by Definitions 58.3.2 and 58.3.4
respectively.


58.9.2 DEFINITION: The (global) differential of a $C^1$ map $\phi : M_1 \to M_2$ for $C^1$ manifolds $M_1$ and $M_2$ is
the map $d\phi : M_1 \to T(M_1, M_2)$ defined by $(d\phi)(p) = (d\phi)_p$ for all $p \in M_1$.


58.9.3 REMARK:  Abbreviated expression for the differential of a map. 

The tangent operator $\partial_{p,v_1,\psi_1}$ in Definition 58.4.5 line (58.4.3) is as defined in Definition 54.11.2. It is
applied to each of the $n_2$ components of the function $\psi_2 \circ \phi : M_1 \to \mathbb{R}^{n_2}$. As a shorthand, one could write
$v_2 = \partial_{p,v_1,\psi_1}(\psi_2 \circ \phi)$. In fact, this extension of differential operators to $\mathbb{R}^n$-valued functions will be adopted
for convenience. Then one could write instead of equation (58.4.1):

\[
\forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2), \qquad
(d\phi)_p(t_{p,v_1,\psi_1}) = t_{\phi(p),\,\partial_{p,v_1,\psi_1}(\psi_2 \circ \phi),\,\psi_2}.
\]


58.9.4 DEFINITION: The (global) induced map of a map $\phi \in C^1(M_1, M_2)$ for $C^1$ manifolds $M_1$ and $M_2$ is
the map $\phi_* : T(M_1) \to T(M_2)$ defined by

\[
\forall z \in T(M_1), \qquad \phi_*(z) = (d\phi)_{\pi_1(z)}(z),
\]

where $(d\phi)_p$ is as in Definition 58.4.5 with $p = \pi_1(z)$, and $\pi_1$ is the projection map of $T(M_1)$.
Alternative name: (global) differential.


58.9.5 REMARK:  Tangent space for the range of a pointwise differential of a map. 
Although Theorem 58.9.6 is almost in the realm of the obvious, it is given here so as to remove doubt because 
it is required so often. 


58.9.6 THEOREM: The differential of a map at a point lies in the tangent space at the map's value.
Let $\phi : M_1 \to M_2$ be a $C^1$ map for $C^1$ manifolds $M_1$ and $M_2$. Then

\[
\forall p \in M_1,\ \forall V \in T_p(M_1), \qquad \phi_*(V) \in T_{\phi(p)}(M_2).
\]

In other words,

\[
\forall V \in T(M_1), \qquad \phi_*(V) \in T_{\phi(\pi_1(V))}(M_2),
\]

where $\pi_1 : T(M_1) \to M_1$ is the tangent bundle projection map for $M_1$. In other words,

\[
\forall p \in M_1, \qquad \phi_*(T_p(M_1)) \subseteq T_{\phi(p)}(M_2).
\]


PROOF: The assertion follows directly from Definition 58.9.4 and Theorem 58.4.8.


58.9.7 REMARK: The induced map is the union of the pointwise differentials.
Definition 58.9.4 joins together the pointwise differentials $(d\phi)_p$ of Definition 58.4.5 for all points $p \in M_1$. In
other words, $\phi_* = \bigcup_{p \in M_1} (d\phi)_p$.


58.9.8 THEOREM: Chain rule for global differentials of differentiable maps.
Let $\phi_1 : M_1 \to M_2$ and $\phi_2 : M_2 \to M_3$ be $C^1$ maps between $C^1$ manifolds $M_1$, $M_2$ and $M_3$. Then

\[
(\phi_2 \circ \phi_1)_* = (\phi_2)_* \circ (\phi_1)_*.
\]


PROOF: The assertion follows immediately from Theorem 58.4.13. 
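
The chain rule for induced maps can be checked numerically on a small example. The Python sketch below models a tangent vector on a Cartesian space as a tagged pair (base point, components), and builds an induced map from a map and its hand-written Jacobian. The maps `phi1`, `phi2` and the helper `induced` are hypothetical choices for illustration, and the Jacobian of the composite is written out independently so that the comparison is not circular.

```python
# Sketch: induced maps on T(R^n), with a tangent vector stored as a
# tagged pair (base point, components).  All names are hypothetical.

def induced(phi, jac):
    """Build phi_* from a map phi and a function returning its Jacobian."""
    def phi_star(z):
        p, v = z
        w = tuple(sum(a * b for a, b in zip(row, v)) for row in jac(p))
        return (phi(p), w)
    return phi_star

# phi1 : R^2 -> R^2 and phi2 : R^2 -> R, with hand-computed Jacobians.
phi1 = lambda p: (p[0] + p[1], p[0] * p[1])
jac1 = lambda p: [[1.0, 1.0], [p[1], p[0]]]
phi2 = lambda q: (q[0] * q[1],)
jac2 = lambda q: [[q[1], q[0]]]

# Composite map (phi2 o phi1)(x, y) = (x + y) * x * y and its Jacobian,
# differentiated by hand rather than via the chain rule.
phi21 = lambda p: ((p[0] + p[1]) * p[0] * p[1],)
jac21 = lambda p: [[2 * p[0] * p[1] + p[1] ** 2, p[0] ** 2 + 2 * p[0] * p[1]]]

z = ((1.0, 2.0), (3.0, -1.0))                       # vector (3, -1) at (1, 2)
lhs = induced(phi21, jac21)(z)                      # (phi2 o phi1)_*
rhs = induced(phi2, jac2)(induced(phi1, jac1)(z))   # (phi2)_* o (phi1)_*
```

Both sides agree on the base point as well as on the components, because the tag of the intermediate vector is exactly the point where the second Jacobian must be evaluated.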


58.9.9 REMARK: Criteria for using the differential versus the induced map. 

In the case of a differential $d\phi$ as in Definition 58.9.2, for $\phi : M_1 \to M_2$, one may specify a base point $p \in M_1$
to obtain the tangent vector map $(d\phi)(p) : T_p(M_1) \to T_{\phi(p)}(M_2)$ (which is a linear map by Definition 58.4.5).
One may then determine the fate of any tangent vector $V \in T_p(M_1)$ by applying $(d\phi)(p)$ to it. However, the
base point specification is superfluous, in principle, because given $V \in T(M_1)$, there is one and only one pair
$(V, W) \in T(M_1, M_2)$ which satisfies $\exists p' \in M_1,\ (d\phi)(p')(V) = W$, and this pair must satisfy $V \in T_{p'}(M_1)$,
from which it follows that $p' = p$. Hence $(d\phi)(p)(V) = W$. This kind of superfluity is a consequence of the
“tagging” of all tangent vectors in a tangent bundle with their base points. (This tagging may be explicit or
implicit, depending on the choice of tangent space representation. The tagging issue is discussed in Remarks
53.3.3 and 54.15.1 for example.)


In the case of the induced map $\phi_*$ as in Definition 58.9.4, for the same point-map $\phi : M_1 \to M_2$, there
is no superfluous tagging of the individual tangent-vector-maps as there is for the differential $d\phi$. So the
induced map is the same as the differential, but with the superfluity removed. Thus the differential and
induced map contain the same information. The choice of representation of this information depends on
factors such as syntactical convenience and the desired focus in a particular application. The differential
representation facilitates focus on the linear map $(d\phi)_p$ for a particular point $p \in M_1$, whereas the induced
map representation puts the focus on the tangent bundle $T(M_1)$ as a whole.


By contrast with the simple formula for $\phi_*$ in terms of $d\phi$ in Remark 58.9.7, the reverse is not so easy. One
may write, for example,

\[
\begin{aligned}
d\phi &= \{(p, \phi_*\big|_{T_p(M_1)});\ p \in M_1\} \\
&= \{(p, \{(V, W) \in T_p(M_1) \times T(M_2);\ W = \phi_*(V)\});\ p \in M_1\}.
\end{aligned}
\]


Such formulas verify that the information content is equivalent and recoverable, but have limited practical 
application. One may also write the simpler-looking formula 


\[
\forall p \in M_1,\ \forall V \in T_p(M_1), \qquad (d\phi)(p)(V) = \phi_*(V).
\]

This formula also verifies the “recoverability” of the “data”, but is more convenient to use. It shows clearly
that the requirement to specify $p \in M_1$ to obtain the map for $V \in T_p(M_1)$ from the differential representation
$d\phi$ is superfluous because the induced map representation $\phi_*$ yields the same “data” without it.
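
The contrast between the two representations can be imitated in code. In the following Python sketch, a tangent vector is a tagged pair (base point, components), `d_phi` is a curried function which takes the base point first, and `phi_star` reads the base point off the vector itself. The map `phi` and all function names are assumptions made for this example.

```python
# Sketch: the differential as a curried function of the base point, versus
# the induced map on tagged tangent vectors.  Names are hypothetical.

def phi(p):
    """The point-map phi(x, y) = (x**2, x*y)."""
    x, y = p
    return (x ** 2, x * y)

def jac(p):
    """Jacobian of phi at p = (x, y)."""
    x, y = p
    return [[2.0 * x, 0.0], [y, x]]

def push(p, v):
    """Apply the Jacobian at p to the components v; tag the result with phi(p)."""
    w = tuple(sum(a * b for a, b in zip(row, v)) for row in jac(p))
    return (phi(p), w)

def d_phi(p):
    """Differential: choose a base point first, then feed it vectors at p."""
    def at_p(V):
        base, v = V
        if base != p:
            raise ValueError("vector is not based at p")
        return push(p, v)
    return at_p

def phi_star(V):
    """Induced map: the base point is read off the tagged vector itself."""
    base, v = V
    return push(base, v)

V = ((3.0, 4.0), (1.0, 2.0))
same = d_phi((3.0, 4.0))(V) == phi_star(V)
```

The only base point which `d_phi` accepts for `V` is the one already stored in `V`, so the extra argument carries no new information.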


58.9.10 REMARK:  Suitability of the term "induced map". 

The term “induced map” suggests the way in which electric current is induced in a wire by a neighbouring 
wire via a magnetic field. The induced map between the tangent spaces is in some sense parallel to the map 
between the point spaces. This is illustrated in Figure 58.9.1. 


58.9.11 REMARK: The second induced map. 

Figure 58.9.1 shows both the first and second induced maps. The second induced map $\phi_{**}$ uses the fact that
the tangent bundle $T(M)$ of a $C^2$ manifold $M$ is a $C^1$ manifold which has its own tangent bundle $T(T(M))$.
(See Section 59.12 and Figure 59.12.1 for the corresponding double differentials $d^2\phi$.)


58.9.12 THEOREM: If a map is $C^{k+1}$, then the induced map is $C^k$.
If $\phi : M_1 \to M_2$ is a $C^{k+1}$ map between $C^{k+1}$ manifolds $M_1$ and $M_2$ for $k \in \mathbb{Z}_0^+$, then the induced map $\phi_*$
is of class $C^k$.


Figure 58.9.1  Induced maps for a manifold map


PROOF: The assertion follows as in the proof of Theorem 58.2.6 for real-valued functions on manifolds. For
$\alpha = 1, 2$, let $n_\alpha = \dim(M_\alpha)$ and let $\Psi_\alpha : A_{M_\alpha} \to A_{T(M_\alpha)}$ be the manifold chart map for the tangent bundle
total space $T(M_\alpha)$ as in Notation 54.5.21 and Definition 54.5.22. Then by Definitions 54.5.22 and 52.1.2, the
$C^k$ differentiability of $\phi_*$ means that $\Psi_2(\psi_2) \circ \phi_* \circ \Psi_1(\psi_1)^{-1} : \mathbb{R}^{2n_1} \to \mathbb{R}^{2n_2}$ is a $C^k$ map between Cartesian
spaces for all $\psi_1 \in \mathrm{atlas}(M_1)$ and $\psi_2 \in \mathrm{atlas}(M_2)$. By Definition 58.4.5 and Notation 54.5.21 line (54.5.4),


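\[
\begin{aligned}
&\forall \psi_1 \in \mathrm{atlas}(M_1),\ \forall \psi_2 \in \mathrm{atlas}(M_2),\ \forall x \in \mathrm{Range}(\psi_1),\ \forall v \in \mathbb{R}^{n_1}, \\
&\qquad \Psi_2(\psi_2) \circ \phi_* \circ \Psi_1(\psi_1)^{-1}(x, v)
= \Big( \psi_2(\phi(\psi_1^{-1}(x))),\ \Big( \sum_{i=1}^{n_1} v^i\, \partial_{x^i}\big(\psi_2^j(\phi(\psi_1^{-1}(x)))\big) \Big)_{j=1}^{n_2} \Big).
\end{aligned}
\]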


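Hence $\phi_*$ is a $C^k$ map because $\psi_2 \circ \phi \circ \psi_1^{-1}$ is $C^{k+1}$ differentiable and so the maps $x \mapsto \partial_{x^i}\big(\psi_2(\phi(\psi_1^{-1}(x)))\big)$
are $C^k$ differentiable for all $i \in \mathbb{N}_{n_1}$.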


58.9.13 THEOREM: Inverse rule for the global differential of a diffeomorphism.
Let $\phi : M_1 \to M_2$ be a $C^1$ diffeomorphism for $C^1$ manifolds $M_1$ and $M_2$. Then $\phi_* : T(M_1) \to T(M_2)$ is a
bijection and $(\phi^{-1})_* = (\phi_*)^{-1}$.


PROOF: The assertion follows from Theorem 58.5.4, the corresponding inverse rule for pointwise differentials 
of diffeomorphisms. 


58.9.14 THEOREM: The induced map of a $C^{k+1}$ diffeomorphism is a $C^k$ diffeomorphism.
Let $\phi : M_1 \to M_2$ be a $C^{k+1}$ diffeomorphism between $C^{k+1}$ manifolds $M_1$ and $M_2$ for some $k \in \mathbb{Z}_0^+$. Then
the induced map $\phi_* : T(M_1) \to T(M_2)$ is a $C^k$ diffeomorphism.


PROOF: The assertion follows from Theorems 58.9.12 and 58.9.13. 


58.9.15 REMARK: Why bijections are required for induced maps of vector fields. 

The global induced map $\phi_* : T(M_1) \to T(M_2)$ in Definition 58.9.4 does not require the point-map $\phi : M_1 \to M_2$
to be a bijection because for any given vector $z \in T(M_1)$, the evaluation of $\phi_*(z)$ proceeds by first
projecting $z$ down to $\pi_1(z) \in M_1$, and then pushing this point forward to $\phi(\pi_1(z))$, and finally evaluating a
vector in $T_{\phi(\pi_1(z))}(M_2)$ as in Definition 58.4.5. So a simple map is sufficient. A bijection is not required.

In the case of the “push-forth” of a vector field, on the other hand, the output vector field in $X(T(M_2))$ must
cover all of $M_2$. In this case, the “computation process” starts with a point $p \in M_2$, which must be mapped
to $\phi^{-1}(p)$ so that the input vector field $X$ can be evaluated there, and finally the “push-forth” differential $\phi_*$
can be evaluated with $X(\phi^{-1}(p))$ as input.


Although the comment may seem unnecessary, it is the reverse of the requirements for global pull-back
differentials as in Section 58.11. In the pull-back case, mapping individual covectors in Definition 58.11.3
from the target space $M_2$ to the source space $M_1$ requires a bijection, whereas the pull-back differentials for
covariant fields in Definitions 58.11.5, 58.11.8 and 58.11.10 do not require a bijection.


58.9.16 DEFINITION: The induced map for vector fields of a $C^1$ diffeomorphism $\phi : M_1 \to M_2$, where $M_1$
and $M_2$ are $C^1$ manifolds, is the map $\phi_* : X^0(T(M_1)) \to X^0(T(M_2))$ defined by

\[
\forall X \in X^0(T(M_1)),\ \forall p \in M_2, \qquad \phi_*(X)(p) = \phi_*(X(\phi^{-1}(p))),
\]

where $\phi_* : T(M_1) \to T(M_2)$ denotes the induced map of $\phi$. That is,

\[
\forall X \in X^0(T(M_1)), \qquad \phi_*(X) = \phi_* \circ (X \circ \phi^{-1}).
\]
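
The role of the inverse map in Definition 58.9.16 can be seen in a one-dimensional example. The Python sketch below pushes a vector field on $\mathbb{R}$ forward by the diffeomorphism $x \mapsto 2x + 1$, with tangent vectors stored as tagged pairs (base point, component). The particular map, the vector field `X` and the function names are hypothetical choices for illustration.

```python
# Sketch: push-forward of a vector field on R by the diffeomorphism
# phi(x) = 2x + 1.  Names are hypothetical.

def phi(x):
    return 2.0 * x + 1.0

def phi_inv(q):
    return (q - 1.0) / 2.0

def phi_star(z):
    """Induced map on tagged tangent vectors (base point, component)."""
    p, v = z
    return (phi(p), 2.0 * v)   # the derivative of phi is the constant 2

def X(p):
    """A vector field on the domain: X(p) is the vector p**2 at p."""
    return (p, p ** 2)

def push_forward(field):
    """phi_*(X) = phi_* o X o phi^{-1}; the inverse is what needs a bijection."""
    return lambda q: phi_star(field(phi_inv(q)))

Y = push_forward(X)
value = Y(5.0)   # the pushed-forward field evaluated at q = 5 in the target
```

The call to `phi_inv` is the step which has no counterpart when `phi_star` is applied to a single tangent vector, which is why the vector-field version needs a bijection.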


58.9.17 REMARK: The global differential of the identity map is the "tautological tensor field". 

In relation to the pointwise differential $(d\,\mathrm{id}_M)_p = \mathrm{id}_{T_p(M)} \in T_p^{1,1}(M)$ of the identity map $\mathrm{id}_M$ at $p \in M$, for
a $C^1$ manifold $M$, which is referred to as the “tautological tensor” in Remark 58.5.3, the corresponding
“tautological tensor field” $(\mathrm{id}_M)_* \in X^0(T^{1,1}(M))$ is not a “vector-valued 1-form” on $M$ because a vector-valued
1-form in $X(\Lambda_1(T(M), W))$ would need to have a fixed linear space $W$, whereas the “output space”
for $(\mathrm{id}_M)_*$ is the variable space $T_p(M)$ for $p \in M$.


It is also not strictly correct to say that the map $\mathrm{id}_{T(M)} \in C^0(T(M), T(M))$ is a cross-section of the
bundle $T^{1,1}(M)$. It is in fact a “short-cut” version of the cross-section $X \in X^0(T^{1,1}(M))$ defined by
$X : p \mapsto \mathrm{id}_{T_p(M)}$ for $p \in M$. (See Remark 57.7.25 and Section 21.4 for general “short-cuts” for functions on
total spaces of fibrations.)


58.9.18 THEOREM: The induced map of an identity map is the identity map on the tangent bundle. 
Let $M$ be a $C^1$ manifold. Then $(\mathrm{id}_M)_* = \mathrm{id}_{T(M)}$.


PROOF: The assertion follows from Definition 58.9.4 and Theorem 58.5.2. 


58.10. Global differentials of maps on products and products of maps 


58.10.1 REMARK: One-variable and two-variable interpretations of the differential of the identity map. 
Theorem 58.10.5 is the global version of the pointwise Theorem 58.6.4 for the one-variable and two-variable
interpretations of the differential of the identity map. This is expressed in terms of Definition 58.10.2, which 
is the global version of the pointwise Definition 58.6.2. 


58.10.2 DEFINITION: The (global) direct product decomposition of the differential of a $C^1$ map
$\phi : M_1 \times M_2 \to M_0$, for $C^1$ manifolds $M_1$, $M_2$, $M_0$, is the map $\phi_{*,*} : T(M_1) \times T(M_2) \to T(M_0)$ defined by

\[
\begin{aligned}
\forall V_1 \in T(M_1),\ \forall V_2 \in T(M_2), \qquad \phi_{*,*}(V_1, V_2) &= \phi_*(i(V_1, V_2)) \\
&= (d\phi)_{(\pi_1(V_1),\,\pi_2(V_2))}(i(V_1, V_2)),
\end{aligned}
\]

where $i : T(M_1) \times T(M_2) \to T(M_1 \times M_2)$ is the global direct product identification map for tangent bundles
in Definition 54.7.6, and $\pi_\alpha : T(M_\alpha) \to M_\alpha$ is the tangent bundle projection map for $T(M_\alpha)$ for $\alpha = 1, 2$.
In other words, $\phi_{*,*} = \phi_* \circ i$.


Alternative name: direct-product decomposed differential.


58.10.3 THEOREM: The direct product decomposition of the differential of a $C^{k+1}$ map is a $C^k$ map.
Let $k \in \mathbb{Z}_0^+$. Let $M_1$, $M_2$ and $M_0$ be $C^{k+1}$ manifolds. Let $\phi \in C^{k+1}(M_1 \times M_2, M_0)$.
Then $\phi_{*,*} \in C^k(T(M_1) \times T(M_2), T(M_0))$.


PROOF: By Theorem 54.7.11, $i \in C^k(T(M_1) \times T(M_2), T(M_1 \times M_2))$, and $\phi_* \in C^k(T(M_1 \times M_2), T(M_0))$
by Theorem 58.9.12. So $\phi_{*,*} = \phi_* \circ i \in C^k(T(M_1) \times T(M_2), T(M_0))$ by Theorem 52.1.17.


58.10.4 REMARK: Tangent-bundle direct-product identification map is the differential of the identity map.
Theorem 58.10.5 (ii) asserts that the two-variable differential of $\mathrm{id}_{M_1 \times M_2} : M_1 \times M_2 \to M_1 \times M_2$ equals
the direct product identification map for the tangent bundles of $M_1$ and $M_2$. This helps to give meaning to the
identification map.


58.10.5 THEOREM: Direct product decomposition of differential of identity map is an identification map.
Let $M_1$ and $M_2$ be $C^1$ manifolds.
(i) $(\mathrm{id}_{M_1 \times M_2})_* = \mathrm{id}_{T(M_1 \times M_2)}$.
(ii) $(\mathrm{id}_{M_1 \times M_2})_{*,*} = i$, where $i : T(M_1) \times T(M_2) \to T(M_1 \times M_2)$ is the global direct product identification
map for tangent bundles in Definition 54.7.6.


PROOF: Part (i) follows from Theorem 58.9.18.
For part (ii), $(\mathrm{id}_{M_1 \times M_2})_{*,*} = (\mathrm{id}_{M_1 \times M_2})_* \circ i$ by Definition 58.10.2. But $(\mathrm{id}_{M_1 \times M_2})_* = \mathrm{id}_{T(M_1 \times M_2)}$ by
part (i). So $(\mathrm{id}_{M_1 \times M_2})_{*,*} = \mathrm{id}_{T(M_1 \times M_2)} \circ i = i$.


58.10.6 REMARK: Gluing together global differentials of maps with a common domain. 

Theorem 58.10.7 shows how the global tangent-bundle direct-product identification map can be used to
“glue” together the global differentials of two $C^1$ maps which have the same domain. (See Theorem 58.7.7
for the pointwise version of Theorem 58.10.7.)


58.10.7 THEOREM: Differential of common-domain product of maps in terms of component differentials.
Let $M$, $M_1$ and $M_2$ be $C^1$ manifolds. Let $\phi_1 : M \to M_1$ and $\phi_2 : M \to M_2$ be $C^1$ maps. Then

\[
(\phi_1 \times \phi_2)_* = i \circ ((\phi_1)_* \times (\phi_2)_*).
\]


PROOF: Let $V \in T(M)$. Then $(\phi_1 \times \phi_2)_*(V) = i((\phi_1)_*(V), (\phi_2)_*(V))$ by Theorem 58.7.7. So it follows
that $(\phi_1 \times \phi_2)_* = i \circ ((\phi_1)_* \times (\phi_2)_*)$.
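
For Cartesian spaces, the identity in Theorem 58.10.7 amounts to stacking Jacobian rows. The Python sketch below checks this on one example, with tangent vectors stored as tagged pairs (base point, components) and with the identification map modelled by tuple concatenation. The maps `phi1`, `phi2` and the helpers `induced` and `identify` are hypothetical, and the concatenation model of the identification map is an assumption which is only reasonable for Cartesian spaces.

```python
# Sketch: (phi1 x phi2)_* = i o ((phi1)_* x (phi2)_*) on tagged vectors in R^n.
# All names are hypothetical illustrations.

def induced(phi, jac):
    """Build phi_* from a map phi and a function returning its Jacobian."""
    def phi_star(z):
        p, v = z
        w = tuple(sum(a * b for a, b in zip(row, v)) for row in jac(p))
        return (phi(p), w)
    return phi_star

def identify(z1, z2):
    """Model of i : T(M_1) x T(M_2) -> T(M_1 x M_2) for Cartesian spaces."""
    (q1, w1), (q2, w2) = z1, z2
    return (q1 + q2, w1 + w2)   # tuple concatenation, not addition

phi1 = lambda p: (p[0] * p[1],)
jac1 = lambda p: [[p[1], p[0]]]
phi2 = lambda p: (p[0] + p[1], p[0] - p[1])
jac2 = lambda p: [[1.0, 1.0], [1.0, -1.0]]

# Common-domain product map and its Jacobian, written out independently.
prod = lambda p: (p[0] * p[1], p[0] + p[1], p[0] - p[1])
jac_prod = lambda p: [[p[1], p[0]], [1.0, 1.0], [1.0, -1.0]]

z = ((2.0, 3.0), (1.0, 4.0))
lhs = induced(prod, jac_prod)(z)
rhs = identify(induced(phi1, jac1)(z), induced(phi2, jac2)(z))
```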

58.10.8 THEOREM: Global differential of common-domain product of $C^{k+1}$ maps is a $C^k$ map.
Let $k \in \mathbb{Z}_0^+$. Let $M$, $M_1$ and $M_2$ be $C^{k+1}$ manifolds. Let $\phi_\alpha : M \to M_\alpha$ be $C^{k+1}$ maps for $\alpha = 1, 2$. Then
$(\phi_1 \times \phi_2)_* : T(M) \to T(M_1 \times M_2)$ is a $C^k$ map.


PROOF: By Theorem 58.9.12, $(\phi_\alpha)_* : T(M) \to T(M_\alpha)$ are $C^k$ maps for $\alpha = 1, 2$. So by Theorem 52.6.13,
$(\phi_1)_* \times (\phi_2)_* : T(M) \to T(M_1) \times T(M_2)$ is a $C^k$ map. But $i : T(M_1) \times T(M_2) \to T(M_1 \times M_2)$ is a $C^k$ map
by Theorem 54.7.11. Therefore $i \circ ((\phi_1)_* \times (\phi_2)_*) : T(M) \to T(M_1 \times M_2)$ is a $C^k$ map by Theorem 52.1.17.
Hence $(\phi_1 \times \phi_2)_* : T(M) \to T(M_1 \times M_2)$ is a $C^k$ map by Theorem 58.10.7.

Alternatively, if $\phi_\alpha : M \to M_\alpha$ are $C^{k+1}$ maps for $\alpha = 1, 2$, then $\phi_1 \times \phi_2 : M \to M_1 \times M_2$ is a $C^{k+1}$ map by
Theorem 52.6.13. Hence $(\phi_1 \times \phi_2)_* : T(M) \to T(M_1 \times M_2)$ is a $C^k$ map by Theorem 58.9.12.


58.11. Global pull-back differentials of differentiable maps 


58.11.1 REMARK: Terminology for global pull-back differentials. 
The global pull-back in Definition 58.11.3 is referred to as the “induced cotangent map” by Poor [32], page xiii. 


58.11.2 REMARK: Globalisation of pointwise pull-back differentials of differentiable maps. 

Following the pattern of the “globalisation” procedure for forward differentials in Section 58.9, one might
hope that a global pull-back differential for a $C^1$ map may be defined as an aggregate of pointwise pull-back
differentials as in Definition 58.9.4. It turns out that this is not so easy.

The global differential $\phi_* : T(M_1) \to T(M_2)$ of a $C^1$ map $\phi : M_1 \to M_2$ in Definition 58.9.4 is an aggregate
of pointwise differentials $(d\phi)_p : T_p(M_1) \to T_{\phi(p)}(M_2)$ for $p \in M_1$. The union $\phi_* = \bigcup_{p \in M_1} (d\phi)_p$ defines a
value once and only once for each element of the domain $T(M_1)$ because $T(M_1)$ equals the disjoint union
$\bigcup_{p \in M_1} T_p(M_1)$ of the pointwise tangent spaces which are domains of pointwise differentials $(d\phi)_p$. Although
the ranges $T_{\phi(p)}(M_2)$ for $p \in M_1$ may overlap, and they may not cover all of $T(M_2)$, this does not prevent
$\phi_*$ from being a valid function.


In an attempt to construct a covector pull-back differential from $T^*(M_2)$ to $T^*(M_1)$, each covector in $T^*(M_2)$
must be mapped to one and only one covector in $T^*(M_1)$. For covectors $f \in T_q^*(M_2)$ for $q \in M_2$, the natural
choice for the range would be $T^*_{\phi^{-1}(q)}(M_1)$ in accordance with Definition 58.8.2, but the unique existence
of $\phi^{-1}(q)$ requires $\phi$ to be a bijection. This issue is avoided in Definition 58.8.2 by tagging the pull-back
differential $(d\phi)_p^*$ with an element $p$ of the domain $M_1$ of $\phi$. However, if these maps are aggregated, the
result is a union $\bigcup_{p \in M_1} (d\phi)_p^*$ of pointwise pull-back differentials which is ill-defined because the domain will
be $\bigcup_{p \in M_1} T^*_{\phi(p)}(M_2)$, which may cover some covector spaces $T_q^*(M_2)$ for some $q \in M_2$ more than once, and
other covector spaces not at all.


The most obvious way to work around these issues is to require the map to be a $C^1$ diffeomorphism as in
Definition 58.11.3. This is a valid definition, although it is not very useful.


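58.11.3 DEFINITION: Global pull-back differential for tangent covector bundles.
The (global) pull-back differential (for covectors) of a $C^1$ diffeomorphism $\phi : M_1 \to M_2$, for $C^1$ manifolds
$M_1$ and $M_2$, is the map $\phi^* : T^*(M_2) \to T^*(M_1)$ defined by

\[
\forall f \in T^*(M_2), \qquad \phi^*(f) = f \circ (d\phi)_{\phi^{-1}(\pi_2^*(f))},
\]

where $(d\phi)_p$ is the differential of $\phi$ at $p \in M_1$ as in Definition 58.4.5, $\pi_2^* : T^*(M_2) \to M_2$ is the tangent
covector fibration projection map as in Definition 55.4.4, and $(d\phi)_p^*$ denotes the pull-back differential of $\phi$ at
$p \in M_1$ as in Definition 58.8.2. In other words,

\[
\forall f \in T^*(M_2), \qquad \phi^*(f) \in T^*_{\phi^{-1}(\pi_2^*(f))}(M_1)
\]

and

\[
\forall f \in T^*(M_2),\ \forall V \in T_{\phi^{-1}(\pi_2^*(f))}(M_1), \qquad
\phi^*(f)(V) = f\big((d\phi)_{\phi^{-1}(\pi_2^*(f))}(V)\big).
\]

In other words,

\[
\begin{aligned}
\forall p \in M_1,\ \forall f \in T^*_{\phi(p)}(M_2),\ \forall V \in T_p(M_1), \qquad
\phi^*(f)(V) &= f(\phi_*(V)) \\
&= f((d\phi)_p(V)).
\end{aligned}
\]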


58.11.4 REMARK:  Pull-backs of cross-sections of covector bundles. No bijection required. 

Ironically, although the global pull-back differential for covector bundles in Definition 58.11.3 is not well 
defined unless the manifold map is a bijection, no such restriction is required for the pull-back of covector 
bundle cross-sections. 


As mentioned in Remark 58.9.15, bijections are required for Definition 58.11.3, but not for Definitions 58.11.5,
58.11.8 and 58.11.10. This is because a pull-back from the covector bundle $T^*(M_2)$ to $T^*(M_1)$ requires each
point in $M_2$ to be mapped to exactly one point in $M_1$ where the forward differential can be computed. In
the case of the pull-back of a covector field or differential form, the pull-back construction process merely
needs to find a single point in $M_2$ for each point in $M_1$ because the construction of the differential form
starts from $M_1$ and then seeks a corresponding point in $M_2$.


Given a covector bundle cross-section (i.e. a 1-form or covector field) $\omega$ on $M_2$, the construction task is to pull
back one covector to each point of the domain manifold. Thus for each point $p \in M_1$, the covector $\omega(\phi(p))$
must be pulled back to a covector at $p$. Some covectors $\omega(q)$ for $q \in M_2$ may be pulled back many times, and
some may not be pulled back at all. However, the task is not to construct a map with the target manifold
$M_2$ as the domain. The task is to construct a covector field consisting of one covector for each $p \in M_1$.


58.11.5 DEFINITION: Pull-back differential for tangent covector bundle cross-sections.
The pull-back differential (for real-valued 1-forms) of a $C^1$ map $\phi : M_1 \to M_2$, for $C^1$ manifolds $M_1$ and $M_2$,
is the map $\phi^* : X(T^*(M_2)) \to X(T^*(M_1))$ defined by

\[
\forall \omega \in X(T^*(M_2)), \qquad \phi^*(\omega) = \omega \circ \phi_*,
\]

where $\phi_*$ is the induced map of $\phi$ as in Definition 58.9.4. In other words,

\[
\forall \omega \in X(T^*(M_2)),\ \forall V \in T(M_1), \qquad \phi^*(\omega)(V) = \omega\big((d\phi)_{\pi_1(V)}(V)\big),
\]

where $\pi_1 : T(M_1) \to M_1$ is the tangent fibration projection map as in Definition 54.5.4. In other words,

\[
\forall \omega \in X(T^*(M_2)),\ \forall p \in M_1,\ \forall V \in T_p(M_1), \qquad
\phi^*(\omega)(V) = \omega((d\phi)_p(V)).
\]
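
That no bijection is needed for the pull-back of a 1-form can be seen with a map which is neither injective nor surjective. The Python sketch below pulls back a 1-form on $\mathbb{R}$ by the map $x \mapsto x^2$, using the “short-cut” style where a 1-form is a real-valued function of tagged tangent vectors (base point, component). The map, the 1-form `omega` and the function names are hypothetical choices for illustration.

```python
# Sketch: pull-back of a 1-form by the non-injective map phi(x) = x**2 on R.
# The 1-form uses the "short-cut" style: a real function of tagged vectors.
# All names are hypothetical.

def phi_star(V):
    """Induced map of phi(x) = x**2 on tagged vectors (base point, component)."""
    p, v = V
    return (p ** 2, 2.0 * p * v)

def omega(W):
    """A 1-form on the target: at q it sends the component w to (1 + q) * w."""
    q, w = W
    return (1.0 + q) * w

def pull_back(form):
    """phi^*(omega) = omega o phi_*; no inverse of phi is needed."""
    return lambda V: form(phi_star(V))

pulled = pull_back(omega)
at_plus = pulled((3.0, 1.0))     # uses the covector omega(9)
at_minus = pulled((-3.0, 1.0))   # uses the same covector omega(9) again
```

The covector at $q = 9$ is used twice and the covectors at negative $q$ are never used, yet `pulled` is defined at every point of the domain.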


58.11.6 REMARK:  Covector bundle cross-sections versus real functions of tangent bundles. 

As is customary in the literature, Definition 58.11.5 uses the “differential form short-cuts” in Section 57.7 to
simplify the presentation of formulas related to cross-sections of tangent covector bundles. Strictly speaking,
a cross-section $\omega \in X(T^*(M))$ of a tangent covector bundle $T^*(M)$ is a map from $M$ to $T^*(M)$ which
satisfies $\omega(p) \in T_p^*(M)$ for all $p \in M$. To obtain a real value from $\omega$ for a given vector $V \in T(M)$, one
must compute the expression $\omega(\pi(V))(V)$, where $\pi : T(M) \to M$ is the projection map for $T(M)$. Thus
$\omega \in X(T^*(M))$ would be a function-valued function which requires two steps to obtain a real value.

With the short-cut style of differential form $\omega \in X(T^*(M))$ such as is used in Definition 58.11.5, one writes
$\omega(V)$ instead of $\omega(\pi(V))(V)$. This short-cut version is a map $\omega : T(M) \to \mathbb{R}$ which is linear on each
tangent space $T_p(M)$. The short-cut function is usually written without any notational distinction. (See
also Remark 57.7.1 for general differential form short-cuts.)


This issue is the same as mentioned in Remark 58.2.1, namely that global differentials $df$ of real-valued
functions $f \in C^1(M, \mathbb{R})$ may be defined to be either differential maps $df : T(M) \to \mathbb{R}$ or differential fields
$df \in X(T^*(M))$. Typically such differentials are stated to be covariant fields (or cross-sections or differential
one-forms), but in practice they are often assumed to be real-valued functions on tangent bundles.


58.11.7 REMARK: Extension to pull-back differentials for vector-valued 1-forms. 

The generalisation of the pull-back in Definition 58.11.5 for real-valued 1-forms to the general vector-valued 
r-forms in Definition 58.11.10 poses no new difficulties. The formulas are the same, and the same observations 
as in Remark 58.11.6 are applicable. (One application of Definition 58.11.10 is Theorem 69.8.2 (ii), where
$W = T_e(G)$ is a Lie algebra.)


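58.11.8 DEFINITION: Pull-back differential for vector-valued 1-forms.
The pull-back differential for $W$-valued 1-forms of a $C^1$ map $\phi : M_1 \to M_2$, for $C^1$ manifolds $M_1$ and $M_2$
and a linear space $W$, is the map $\phi^* : X(\Lambda_1(T(M_2), W)) \to X(\Lambda_1(T(M_1), W))$ defined by

\[
\forall \omega \in X(\Lambda_1(T(M_2), W)), \qquad \phi^*(\omega) = \omega \circ \phi_*,
\]

where $\phi_*$ is the induced map of $\phi$ as in Definition 58.9.4. In other words,

\[
\forall \omega \in X(\Lambda_1(M_2, W)),\ \forall V \in T(M_1), \qquad \phi^*(\omega)(V) = \omega\big((d\phi)_{\pi_1(V)}(V)\big),
\]

where $\pi_1 : T(M_1) \to M_1$ is the tangent fibration projection map as in Definition 54.5.4. In other words,

\[
\forall \omega \in X(\Lambda_1(M_2, W)),\ \forall p \in M_1,\ \forall V \in T_p(M_1), \qquad
\phi^*(\omega)(V) = \omega((d\phi)_p(V)).
\]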


58.11.9 REMARK:  Pull-back differential for vector-valued differential forms of general degree. 

The pull-back differential for vector-valued differential forms of general degree has some difficulties in regard 
to the notations ¢, and (d$), for the push-forth differential. Multiple “copies” of this map must be supplied 
to the differential form. 


((2023-1-3. To be continued ... )) 


The differential expression “$\phi_*^r(V)$” for $V = (V_i)_{i=1}^r \in T^r(M_1)$ in Definition 58.11.10 is shorthand for the
tuple $(\phi_*(V_i))_{i=1}^r$, where $\phi_*(V_i)$ is the application of the induced map $\phi_*$ in Definition 58.9.4 to $V_i \in T(M_1)$
for all $i \in \mathbb{N}_r$.


There seems to be no tidy expression which means $\bigcup_{p \in M_1} \times_{i=1}^r (d\phi)_p$ with domain $\bigcup_{p \in M_1} \times_{i=1}^r \mathrm{Dom}((d\phi)_p)$
and range $\bigcup_{p \in M_1} \times_{i=1}^r \mathrm{Range}((d\phi)_p)$. So the expression “$\phi_*^r$” is used for this in Definition 58.11.10. Similarly
the expression “$(d\phi)_p^r$” is used with the intended meaning $\times_{i=1}^r (d\phi)_p$, with domain $\times_{i=1}^r \mathrm{Dom}((d\phi)_p)$ and
range $\times_{i=1}^r \mathrm{Range}((d\phi)_p)$. Hopefully such clumsy notations will never be generally adopted.

Flanders [11], page 23, defines the pull-back $\phi^*$ for differential $p$-forms with the parenthetical comment:
“(Strictly speaking we should index $\phi^*$ and write $\phi_p^*$, $p = 0, 1, \cdots$, but we shall skip this.)”. So the notations
“$\phi_*^r$” and “$(d\phi)_p^r$” in Definition 58.11.10 are perhaps not too far-fetched.


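58.11.10 DEFINITION: Pull-back differential for vector-valued forms.
The pull-back differential (for $W$-valued $r$-forms) of a $C^1$ map $\phi : M_1 \to M_2$, for $C^1$ manifolds $M_1$ and $M_2$,
a real linear space $W$ and $r \in \mathbb{Z}^+$, is the map $\phi^* : X(\Lambda_r(M_2, W)) \to X(\Lambda_r(M_1, W))$ defined by

\[
\forall \omega \in X(\Lambda_r(M_2, W)), \qquad \phi^*(\omega) = \omega \circ \phi_*^r,
\]

where $\phi_*$ is the induced map of $\phi$ as in Definition 58.9.4. In other words,

\[
\forall \omega \in X(\Lambda_r(M_2, W)),\ \forall V \in T^r(M_1), \qquad \phi^*(\omega)(V) = \omega\big((d\phi)^r_{\pi_1(V)}(V)\big),
\]

where $\pi_1 : T(M_1) \to M_1$ is the tangent fibration projection map as in Definition 54.5.4. In other words,

\[
\forall \omega \in X(\Lambda_r(M_2, W)),\ \forall p \in M_1,\ \forall V \in T_p^r(M_1), \qquad
\phi^*(\omega)(V) = \omega((d\phi)_p^r(V)).
\]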


58.12. Differentials and induced maps for differential operators 


58.12.1 REMARK: The tangent operator version of differentials has some advantages. 

By putting $L = \partial_{p,v_1,\psi_1}$ for $t_{p,v_1,\psi_1} \in T_p(M_1)$ and $f = \psi_2 \circ \phi$ in Definition 58.4.5, the operator form of
the differential is constructed in Definition 58.12.2. These are equivalent definitions. Definition 58.12.2
is simpler, whereas Definition 58.4.5 is more useful for computation. The operator definition provides a
convenient shorthand and mnemonic for the component version of the differential, which is defined so as to
be consistent with the operator version.


58.12.2 DEFINITION: The differential (for tangent operators) at a point $p \in M_1$ of a map $\phi \in C^1(M_1, M_2)$,
where $M_1$ and $M_2$ are $C^1$ manifolds, is the linear map $(d\phi)_p : T_p(M_1) \to T_{\phi(p)}(M_2)$ defined by

\[
\forall L \in T_p(M_1),\ \forall f \in C^1(M_2), \qquad ((d\phi)_p(L))(f) = L(f \circ \phi).
\]
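
The operator form of the differential can be tried out numerically. In the Python sketch below, a tangent operator on $\mathbb{R}$ at a point is modelled by a central difference quotient, which is only a numerical stand-in for a derivative, and the pushed-forward operator is obtained by precomposing test functions with the map. The result is compared with the component version computed by hand. The map `phi`, the test function `f` and the helper names are assumptions made for this example.

```python
import math

# Sketch: the operator form of the differential, (d phi)_p(L)(f) = L(f o phi).
# A tangent operator on R at t0 is modelled by a central difference quotient,
# which is only a numerical approximation.  Names are hypothetical.

def tangent_operator(t0, v, h=1e-5):
    """Operator L: g -> v * g'(t0), approximated numerically."""
    return lambda g: v * (g(t0 + h) - g(t0 - h)) / (2.0 * h)

def phi(t):
    """A map from R into R^2."""
    return (math.cos(t), math.sin(t))

def push_operator(L):
    """(d phi)_p(L): f -> L(f o phi), an operator on functions on R^2."""
    return lambda f: L(lambda t: f(phi(t)))

f = lambda q: q[0] * q[1]   # test function f(x, y) = x * y on R^2

t0, v = 0.3, 2.0
operator_value = push_operator(tangent_operator(t0, v))(f)

# Component version: w = v * phi'(t0), paired with the gradient (y, x) of f.
w = (-v * math.sin(t0), v * math.cos(t0))
x, y = phi(t0)
component_value = w[0] * y + w[1] * x
```

The two values agree up to the discretisation error of the difference quotient, in line with the consistency asserted by Theorem 58.12.5.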


58.12.3 REMARK: Illustration of spaces and maps for the tangent operator differential of a map. 
Figure 58.12.1 shows the spaces and maps which are relevant to Definition 58.12.2, including the projection
maps $\pi_k : T(M_k) \to M_k$ for $k = 1, 2$.


Figure 58.12.1  The differential of a map for first-order operators


58.12.4 REMARK: Relation between two versions of the differential of a map.

Theorem 58.12.5 gives the relation between the component version of the map differential in Definition 58.4.5
and the operator version in Definition 58.12.2. The same notation $(d\phi)_p$ is used for both versions, but this
should not cause confusion.


58.12.5 THEOREM: The operator of the differential equals the differential of the operator.
Definition 58.4.5 is consistent with Definition 58.12.2. In other words, for $C^1$ manifolds $M_1$ and $M_2$,

\[
\forall \phi \in C^1(M_1, M_2),\ \forall p \in M_1,\ \forall V \in T_p(M_1), \qquad
\partial_{(d\phi)_p(V)} = (d\phi)_p(\partial_V).
\]

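PROOF: Let $V = t_{p,v_1,\psi_1}$ and $f \in C^1(M_2)$. Then $\partial_V = \partial_{p,v_1,\psi_1}$. So by Definition 58.12.2,

\[
\begin{aligned}
(d\phi)_p(\partial_V)(f) &= \partial_{p,v_1,\psi_1}(f \circ \phi) \\
&= \sum_{i=1}^{n_1} v_1^i\, \partial_{x^i}\big(f \circ \phi \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} \\
&= \sum_{i=1}^{n_1} v_1^i\, \partial_{x^i}\big(f \circ \psi_2^{-1} \circ \psi_2 \circ \phi \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} \qquad (58.12.1) \\
&= \sum_{j=1}^{n_2} \partial_{y^j}\big(f \circ \psi_2^{-1}(y)\big)\Big|_{y=\psi_2(\phi(p))} \sum_{i=1}^{n_1} v_1^i\, \partial_{x^i}\big(\psi_2^j \circ \phi \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} \\
&= \partial_{\phi(p),v_2,\psi_2}(f) \qquad (58.12.2) \\
&= \partial_{(d\phi)_p(V)}(f),
\end{aligned}
\]

where line (58.12.1) follows from Notation 54.11.3, and $v_2$ on line (58.12.2) is computed as in Definition 58.4.5
line (58.4.2). Since this holds for all $f \in C^1(M_2)$, the assertion follows.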


58.12.6 REMARK: The problem of the “ubiquitous zero tangent vector” for induced maps. 

Definition 58.12.2 maps the “ubiquitous zero vector” of $M_1$ to the corresponding vector in $M_2$. The tangent
operator $L \in T_p(M_1)$ such that $L : f \mapsto 0$ for all $f \in C^1(M_1)$ is the same map independent of $p \in M_1$,
which justifies the name “ubiquitous zero vector”. Luckily, this vector is mapped to the corresponding zero
vector for $M_2$, no matter which point $p$ it is attached to. Therefore the union of the differential maps
$(d\phi)_p : T_p(M_1) \to T_{\phi(p)}(M_2)$ is a well-defined function $\phi_* = \bigcup_{p \in M_1} (d\phi)_p$. This is the induced map given in
Definition 58.12.8.


Although there is no logical contradiction in mapping the ubiquitous zero vector on the domain manifold to
the ubiquitous zero vector on the range manifold, this does not correspond to one's rational expectation that
the zero vector at a point $p \in M_1$ should be mapped to the zero vector at $\phi(p) \in M_2$. This is yet another
reason to reject tangent operators as the primary representation for tangent vectors on manifolds.


58.12.7 DEFINITION: The differential (for tangent operators) of a map $\phi \in C^1(M_1, M_2)$ for $C^1$ manifolds
$M_1$ and $M_2$ is the linear map $d\phi : M_1 \to T(M_1, M_2)$ defined by

\[
\forall p \in M_1,\ \forall L \in T_p(M_1),\ \forall f \in C^1(M_2), \qquad ((d\phi)(p)(L))(f) = L(f \circ \phi).
\]


58.12.8 DEFINITION: The induced map (for tangent operators) of a map $\phi \in C^1(M_1, M_2)$ for $C^1$ manifolds
$M_1$ and $M_2$ is the linear map $\phi_* : T(M_1) \to T(M_2)$ defined by $\phi_* = \bigcup_{p \in M_1} (d\phi)_p$. In other words,

\[
\forall L \in T(M_1), \qquad \phi_*(L) = (d\phi)_{\pi_1(L)}(L).
\]

That is, by Definition 58.12.2,

\[
\forall L \in T(M_1),\ \forall f \in C^1(M_2), \qquad (\phi_*(L))(f) = L(f \circ \phi).
\]


58.12.9 THEOREM: The operator of the induced map equals the induced map of the operator.
Definition 58.9.4 is consistent with Definition 58.12.8. In other words,

\[
\forall V \in T(M_1), \qquad \partial_{\phi_*(V)} = \phi_*(\partial_V).
\]


PROOF: Let $V \in T(M_1)$. Let $p = \pi_1(V)$. Then $\partial_{(d\phi)_p(V)} = (d\phi)_p(\partial_V)$ by Theorem 58.12.5. But $\phi_*(V) =
(d\phi)_p(V)$ by Definition 58.9.4, and $(d\phi)_p(\partial_V) = \phi_*(\partial_V)$ by Definition 58.12.8. So $\partial_{\phi_*(V)} = \phi_*(\partial_V)$.


58.12.10 REMARK: Tagged tangent operators avoid the “ubiquitous zero vector” issue.

Definition 58.12.11 has the advantage, relative to Definition 58.12.8, that it maps the zero operator at a
point $p \in M_1$ to the zero operator at $\phi(p) \in M_2$ instead of mapping ubiquitous zero vectors to ubiquitous
zero vectors. However, the nicety of tagging the tangent vectors to achieve this is rarely done. It is tacitly
assumed to be done in the background somehow. Tangent-line vectors do not have this problem.


58.12.11 DEFINITION: The induced map (for tagged tangent operators) of a map $\phi \in C^1(M_1, M_2)$ for $C^1$
manifolds $M_1$ and $M_2$ is the map $\phi_* : T(M_1) \to T(M_2)$ defined by

\[
\forall (p, L) \in T(M_1), \qquad \phi_*((p, L)) = (\phi(p), (d\phi)_p(L)),
\]

where $(d\phi)_p$ is the operator version of the differential given in Definition 58.12.2.


Chapter 59


RECURSIVE TANGENT BUNDLES AND DIFFERENTIALS


59.1 The tangent bundle of a tangent bundle
59.2 Horizontal components and vertical drop functions
59.3 Oblique drop functions
59.4 Scaling curves and constant-scale maps
59.5 Sprays on tangent bundles
59.6 Horizontal component swap functions
59.7 Tangent bundles of tangent vector-tuple bundles
59.8 Higher-order differentials of curves
59.9 Higher-order differentials of curve families
59.10 Higher-order differentials of real-valued functions
59.11 Hessian operators at critical points
59.12 Higher-order differentials of maps between manifolds


59.0.1 REMARK:  Higher-level tangent bundles and higher-order differentials. 

Chapter 59 presents tangent bundles of tangent bundles, differentials of differentials, and higher recursive 
levels of these structures. In this book, the former are called higher-level tangent bundles, while the latter 
are called higher-order differentials. 

Horizontal vector components and drop functions are defined for higher-level tangent bundles. Higher-order 
differentials make use of higher-level tangent bundles. 

This chapter presents two higher-order differentials topics: higher-order differentials and differentials for 
higher-order operators. These are distinct topics. For example, a second-order differential is the differential 
of a differential, whereas the differential for a second-order operator is the “push-forth” of the operator. 
These topics are presented together with the intention of clarifying the differences between them. 


59.0.2 REMARK: Confusion between second-level, second-degree and second-order concepts. 
The following concepts are easily confused. 


(i) The tangent bundle of a tangent bundle: $T^{(2)}(M) = T(T(M))$. This is also called the “second-level
tangent bundle” or “double tangent bundle”. (Section 59.1.)

(ii) The space $T^{[2]}(M)$ of second-order derivative operators $f \mapsto v^i w^j\, \partial^2 f(x) / \partial x^i \partial x^j$. (The second
order is discussed in Section 60.2. See also Section 60.5 for the corresponding higher-order tangent
vectors in spaces $T^{[k]}(M)$.)

(iii) The space of tensors of degree 2: $T^{2,0}(M) = T(M) \otimes T(M)$. (See Section 56.1.)

(iv) The set of ordered pairs of tangent vectors: $T(M)^2 = T(M) \times T(M)$.


The words “degree”, “order” and “level” are used in this book with the following meanings. 


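word     meaning                               example space or operator                                                                                      tangent spaces
degree   the multiplicity of a tensor          $\otimes^1 V$, $\otimes^2 V$, $\otimes^3 V$                                                                    $T^{r,s}(M)$
order    the order of derivatives              $\partial/\partial x^i$, $\partial^2/\partial x^i \partial x^j$, $\partial^3/\partial x^i \partial x^j \partial x^k$    $T^{[2]}(M)$, $T^{[k]}(M)$
level    recursive level of tangent bundles    $T(M)$, $T(T(M))$, $T(T(T(M)))$                                                                                $T^{(k)}(M)$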


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 

59.1. The tangent bundle of a tangent bundle


59.1.1 REMARK: Terminology for the tangent bundle of the tangent bundle of a manifold. 

Lang [23], pages 96-97, calls $T(T(M))$ the “double tangent bundle” of $M$, while referring to a vector field on
$T(M)$ as a “second-order vector field”. The space $T(T(M))$ is used (with the same notation) by Malliavin [28],
chapter II.4, page 112, but he does not give it a name.


Here the term “second-level tangent bundle” is used so as to distinguish it from second-order and second-degree
concepts. (Higher-order concepts are associated with higher-order derivatives. Higher-degree concepts
are associated with polynomials of higher degree.) The term “double tangent bundle” has the benefit that
it makes the recursive nature of the tangent bundle of a tangent bundle intuitively clear.


59.1.2 REMARK: Differential geometry applications require second-order derivative structures. 

Tangent vectors are mathematical objects which correspond to first-order derivatives, but most of the laws 
of fundamental physics are expressed in terms of second-order differential equations. So tangent vectors 
must be extended somehow to represent second-order derivatives to make differential geometry applicable to 
physics. The upgrade of differential geometry from first-order to second-order concepts is the point where 
differential geometry becomes significantly more complicated than Euclidean space. 


From the geometrical perspective, first-order tangent bundles are concerned with direction, whereas second- 
order derivatives are concerned with curvature, but curvature is not invariant (or covariant) under the 
transition maps of a differentiable manifold in the same way that direction is. It is possible to write down 
the transition rules for curvature between manifold charts, but there is no second-order “inertial frame" in 
which curvature can be calculated and then transformed between the charts. In the same way, there is no 
chart-independent definition of velocity, but at least the zero velocity is invariant under local transformations. 
In the case of curvature, it is never clear whether it is zero or not. What is needed is some kind of anchor 
in the differentiable manifold. The role of affine connections and Riemannian metrics is to anchor the 
manifold so that curvature (and parallelism) can be evaluated. Connections and metrics are like tracks in 
the manifold which act as a kind of “inertial frame" relative to which curvature (and parallelism) can be 
evaluated. It follows that the connection and metric structures must themselves be transformed according to 
the chart transition maps because they are supposed to provide fixed “snail trails” and “measuring sticks" 
in the manifold. If these structures are affixed to the manifold's points, they must necessarily have different 
coordinates with respect to different charts. 


The tangent bundle of a tangent bundle can be defined despite a total absence of anchored structure. One 
must not expect such a “second-level tangent bundle" to be immediately useful. It cannot say anything 
about parallelism or curvature. It cannot be directly useful for expressing the evolution equations of physics 
because there is no way to measure second-order derivatives in a chart-independent way. However, second-level
tangent bundles provide a convenient space for abstract second-order derivatives to be defined in. Then
connection and metric structures may be applied to such abstract second-order derivatives to yield concrete 
second-order derivatives in other spaces. 


In an otherwise unstructured differentiable manifold, it is possible to calculate the derivative of the derivative 
of a curve, but the result is abstract and essentially useless on its own. The tangent bundle of a tangent 
bundle becomes useful only when affine connection and Riemannian metric structures are added. 


59.1.3 REMARK: Second-order derivatives require first-order differentiation to be endomorphic. 

A second-order derivative is usually thought of as the first-order derivative of the first-order derivative. This
is fine if the output from the first-order derivative of a function space is the same space you started with. For
example, the first derivative of a $C^\infty$ function $f : \mathbb{R} \to \mathbb{R}$ is a $C^\infty$ function from $\mathbb{R}$ to $\mathbb{R}$. One may compute
derivatives up to any order in the same way. This property could be described by saying that differentiation
acts as a set endomorphism on the set of functions $C^\infty(\mathbb{R})$. (See Definition 10.5.21 for set endomorphisms.)


In the case of a function $f \in C^\infty(\mathbb{R}^n, \mathbb{R})$ for integer $n \ge 2$, the first-order derivative can be expressed as
a sequence of $n$ first-order partial derivatives $(\partial_i f)_{i=1}^n \in C^\infty(\mathbb{R}^n, \mathbb{R}^n)$. So even in this very basic flat-space
case, the second-order derivative cannot be the double application of the first-order derivative because the 
derivative yields an output in a different space to the input. In other words, differentiation is not endomorphic 
in this case. 
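
The change of output space can be seen numerically. The following Python sketch is illustrative only: the helper `gradient` and the sample function `f` are assumptions made for this example and do not appear in the text. It approximates the partial derivatives by central differences, so the derivative of a real-valued function on $\mathbb{R}^n$ comes back as an $n$-tuple, and a second application of the same helper fails because its input is no longer real-valued.

```python
from typing import Callable, Sequence, Tuple


def gradient(f: Callable[[Sequence[float]], float], h: float = 1e-5):
    """Return x -> (d_1 f(x), ..., d_n f(x)), approximated by central differences.

    Hypothetical helper: it expects f to return a single real number.
    """
    def grad(x: Sequence[float]) -> Tuple[float, ...]:
        out = []
        for i in range(len(x)):
            x_plus, x_minus = list(x), list(x)
            x_plus[i] += h
            x_minus[i] -= h
            out.append((f(x_plus) - f(x_minus)) / (2 * h))
        return tuple(out)
    return grad


def f(x: Sequence[float]) -> float:
    # Sample function f(x1, x2) = x1^2 * x2 on R^2 (an assumption for illustration).
    return x[0] ** 2 * x[1]


df = gradient(f)
value = f((1.0, 2.0))    # a real number
deriv = df((1.0, 2.0))   # an n-tuple, close to (4.0, 1.0)

# Differentiation is not endomorphic here: df is tuple-valued, so the same
# helper cannot be applied to it a second time.
try:
    gradient(df)((1.0, 2.0))
    second_application_failed = False
except TypeError:
    second_application_failed = True
```

A second derivative in this setting needs a different construction (for example an $n \times n$ array of second partial derivatives), which is the flat-space shadow of the difficulty described above.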


The situation is much more difficult when defining a second-order derivative on a differentiable manifold. 
In this case, the first-order derivative of a real-valued function yields a cross-section of the tangent covector 
bundle, which is a linear functional on the tangent vector space at each point of the manifold. One of the
purposes of Section 59.1 is to define the first-order derivative of such cross-sections so that second-order 
derivatives will be meaningful. 


59.1.4 REMARK: The steps to construct a second-level tangent bundle. 

A second-level tangent bundle $T(T(M))$ may be defined in terms of the differentiable manifold structure
of the tangent bundle $T(M)$. The tangent bundle total space $T(M)$ has a natural $C^{r-1}$ differentiable
structure if $M$ is $C^r$. The natural differentiable manifold atlas on $T(M)$ is defined in Definition 54.5.22. The
regularity of the tangent bundle's atlas is asserted in Theorem 54.5.28. The tangent bundle atlas can be used
to construct a second-level tangent bundle $T(T(M))$ of class $C^{r-2}$. The steps to construct a second-level
tangent bundle are as follows. (These steps are sketched in Figure 59.1.1.)


(i) Define a tangent bundle $T(M)$ on a $C^r$ manifold $M \mathrel{-\!\!<} (M, A_M)$.
(ii) Define a $C^{r-1}$ differentiable manifold atlas (i.e. differentiable structure) $A_{T(M)}$ on the total space
of $T(M)$.
(iii) Define a second-level tangent bundle $T^{(2)}(M) = T(T(M))$ on the total space of $T(M) \mathrel{-\!\!<} (T(M), A_{T(M)})$.
(iv) Define a $C^{r-2}$ differentiable manifold atlas $A_{T(T(M))}$ on the total space of $T(T(M))$.


[Diagram: the atlas $A_{T(M)}$ on $T(M)$ and the atlas $A_{T(T(M))}$ on $T(T(M))$.]

Figure 59.1.1 Building tangent bundles from manifolds and atlases


The differentiable structure on the tangent bundle of a differentiable manifold is presented in Definition 54.5.22.
The existence of a differentiable structure on the tangent bundle implies that tangent vectors
can be defined with base points in the tangent bundle $T(M)$ just as they were on the base space $M$ in
Definition 54.10.4. The manifold of these tangent vectors forms a new tangent bundle, namely $T(T(M))$.


59.1.5 REMARK: The dimension of the total space of a second-level tangent bundle. 

In the same way that first-level tangent bundles on manifolds are defined in terms of flat-space tangent 
bundles (as indicated in Figure 53.3.2 in Remark 53.3.14), second-level tangent bundles are defined in terms 
of the corresponding flat-space second-level tangent bundles. This is summarised in Figure 59.1.2. If M 
is an $n$-dimensional manifold, then $T(M)$ is a $2n$-dimensional manifold. Therefore $T(T(M))$ must have a
$4n$-dimensional manifold structure.


[Diagram: charts mapping $M$ to $\mathbb{R}^n$ and $T^{(2)}(M)$ to $\mathbb{R}^n \times \mathbb{R}^n \times (\mathbb{R}^n \times \mathbb{R}^n)$.]

Figure 59.1.2 Defining second-level tangent bundle in terms of flat space


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




59.1.6 DEFINITION: A second-level tangent vector on an $n$-dimensional $C^2$ manifold $M$ is a tangent vector
on the tangent bundle total space manifold $(T(M), A_{T(M)})$.


59.1.7 REMARK: Interpretation of the structure of second-level tangent vectors. 

The tangent bundle total space manifold $(T(M), A_{T(M)})$ in Definition 59.1.6 is defined in Definition 54.5.26.
The tangent bundle manifold atlas $A_{T(M)}$ is defined in Definition 54.5.22. A tangent vector on the manifold
$(T(M), A_{T(M)})$ is an equivalence class $[(\psi^{(1)}, L_{\psi^{(1)}(z),w})]$ for some $\psi^{(1)} \in A_{T(M)}$, $z \in T(M)$ and $w \in \mathbb{R}^{2n}$,
where $z \in \mathrm{Dom}(\psi^{(1)})$. But $z \in \mathrm{Dom}(\psi^{(1)})$ if and only if $\pi(z) \in \mathrm{Dom}(\psi)$, where $\psi^{(1)} = \Psi(\psi)$. Let $p = \pi(z)$.
Then $z = t_{p,v,\psi}$ for some $v \in \mathbb{R}^n$. So $z = [(\psi, L_{\psi(p),v})]$. Therefore the line $L_{\psi^{(1)}(z),w}$ is the map $L_{\psi^{(1)}(z),w} :
s \mapsto \psi^{(1)}(z) + sw = \psi^{(1)}(t_{p,v,\psi}) + sw = Q_{n,n}(\psi(p), v) + sw$ for $s \in \mathbb{R}$, where $Q_{n,n} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^{2n}$ is the
tuple concatenation map in Definition 16.4.3. Hence a second-level tangent vector on $M$ is a tangent vector
$t_{z,w,\psi^{(1)}}$, where $z = t_{p,v,\psi}$ and $w \in \mathbb{R}^{2n}$.


The component tuple $w \in \mathbb{R}^{2n}$ may be expressed as the concatenation $w = Q_{n,n}(\hat{w}, \check{w})$, where $\hat{w}, \check{w} \in \mathbb{R}^n$.
Then the line $L_{\psi^{(1)}(z),w}$ has the form $L_{\psi^{(1)}(z),w} : s \mapsto Q_{n,n}(\psi(p) + s\hat{w}, v + s\check{w})$. Since the component $\hat{w}$ affects
the base point $p \in M$, it is called the “horizontal component” of $w$. Since the component $\check{w}$ affects the fibre
set element $v \in \mathbb{R}^n$, it is called the “vertical component” of $w$. (The intended mnemonic for $\hat{w}$ and $\check{w}$ is that
the hat-accent suggests “horizontal”, whereas the V-shaped check-accent suggests “vertical”.)
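
The parametrised line $s \mapsto Q_{n,n}(\psi(p) + s\hat{w}, v + s\check{w})$ can be written out as a short Python sketch. The function name `second_level_line` and the sample numbers are hypothetical, chosen only for illustration; tuple concatenation stands in for the map $Q_{n,n}$.

```python
def second_level_line(x, v, w_hat, w_check):
    """Return the map s -> Q(x + s*w_hat, v + s*w_check) as a 2n-tuple.

    x       : chart coordinates psi(p) of the base point (n-tuple)
    v       : chart components of the first-level vector z (n-tuple)
    w_hat   : horizontal component, which moves the base point
    w_check : vertical component, which moves the fibre element v
    """
    def line(s):
        base = tuple(xi + s * wi for xi, wi in zip(x, w_hat))
        fibre = tuple(vi + s * wi for vi, wi in zip(v, w_check))
        return base + fibre  # tuple concatenation plays the role of Q_{n,n}
    return line


line = second_level_line(x=(1.0, 0.0), v=(0.0, 1.0),
                         w_hat=(0.5, 0.0), w_check=(0.0, -1.0))
start = line(0.0)   # the chart image of z itself
later = line(2.0)   # base point and velocity have both moved
```

At $s = 0$ the line passes through the chart image of $z$, and the first $n$ and last $n$ entries then change at the rates $\hat{w}$ and $\check{w}$ respectively.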


59.1.8 DEFINITION: The second-level tangent-line set at a tangent vector $z \in T(M)$ for an $n$-dimensional
$C^2$ differentiable manifold $M$ is the set $\{[(\psi^{(1)}, L_{\psi^{(1)}(z),w})];\ \psi^{(1)} \in \mathrm{atlas}_z(T(M)),\ w \in \mathbb{R}^{2n}\}$.


59.1.9 NOTATION: The second-level tangent-line set at a point on a manifold. 
$T_z(T(M))$ denotes the second-level tangent-line set at $z \in T(M)$ for a $C^2$ manifold $M$.


59.1.10 REMARK: Notations for particular second-level tangent vectors. 

Notation 59.1.11 is intended to follow the pattern of Definition 54.1.2 and Theorem 54.1.8 (v), which require
three subscripts for each particular first-level tangent vector $t_{p,v,\psi}$, namely the point $p \in M$ at which
the vector is “attached”, the Cartesian chart real-number component tuple $v \in \mathbb{R}^n$ for the vector, and
the chart $\psi \in \mathrm{atlas}_p(M)$. Following this pattern, one obtains $t_{z,w,\psi^{(1)}}$ for $z \in T(M)$, $w \in \mathbb{R}^{2n}$ and
$\psi^{(1)} \in \mathrm{atlas}_z(T(M))$. (Notation 59.1.11 is essentially superfluous because it is the same as the result
obtained by applying Definition 54.1.2 to $T(M)$ in place of $M$.)


Notation 59.1.12 splits the second-level “point” $z \in T(M)$ into two components, $p \in M$ and $v \in \mathbb{R}^n$, and
splits $w \in \mathbb{R}^{2n}$ into the horizontal and vertical components, $\hat{w} \in \mathbb{R}^n$ and $\check{w} \in \mathbb{R}^n$ respectively. It is assumed
that the choice of second-level chart $\psi^{(1)}$ in $\mathrm{atlas}_z(T(M))$ will always be $\Psi(\psi)$. Therefore a base-level chart
$\psi \in \mathrm{atlas}_p(M)$ is given as the last parameter for the vector.


59.1.11 NOTATION: Particular second-level tangent-line vectors. Two-component-parameters version. 
$t_{z,w,\psi^{(1)}}$, for $z \in T(M)$, $w \in \mathbb{R}^{2n}$ and $\psi^{(1)} \in \mathrm{atlas}_z(T(M))$, for a $C^2$ manifold $M$, denotes the tangent-line
vector $[(\psi^{(1)}, L_{\psi^{(1)}(z),w})] \in T(T(M))$.


59.1.12 NOTATION: Particular second-level tangent-line vectors. Four-component-parameters version. 
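$t_{p,v,\hat{w},\check{w},\psi}$, for $z = t_{p,v,\psi} \in T(M)$, $w = (\hat{w}, \check{w}) \in \mathbb{R}^{2n}$ and $\psi \in \mathrm{atlas}_p(M)$, for a $C^2$ manifold $M$, denotes the
tangent-line vector $[(\Psi(\psi), L_{\Psi(\psi)(z),w})] \in T(T(M))$.

In other words, $t_{p,v,\hat{w},\check{w},\psi} = t_{z,w,\Psi(\psi)}$.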

59.1.13 REMARK: Interpretation of the structure of second-level tangent bundles. 

The apparent complexity in the definition of second-level tangent bundles is more due to the difficulties 
of the notation than any intrinsic difficulty. A second-level tangent vector is a line whose start-point is 
(via a chart) a first-level tangent vector $z = t_{p,v,\psi} \in T(M)$, and whose velocity $w = Q_{n,n}(\hat{w}, \check{w}) \in \mathbb{R}^{2n}$
has two components, the horizontal component $\hat{w}$ varying (via a chart) the base point $p$, and the vertical
component $\check{w}$ varying the first-level velocity $v \in \mathbb{R}^n$. Since the component tuples $\psi(p) \in \mathbb{R}^n$ and $v \in \mathbb{R}^n$
of the first-level tangent vector are included as part of the second-level tangent vector, there are in total
four component tuples, which gives a four-part $4n$-tuple $(\psi(p), v, \hat{w}, \check{w})$. These four component tuples
are respectively the base-point components, the velocity components, the velocity of change of the
base point, and the velocity of change of the velocity.

It is very tempting to think that the velocity $v$ and velocity $\hat{w}$ in the $4n$-tuple $(\psi(p), v, \hat{w}, \check{w})$ are the same.
They are not the same. The difference is clear if one thinks of each level-one tangent vector as a line. The
level-one tangent-line vector $L_{\psi(p),v}$ in Cartesian space $\mathbb{R}^n$ is being moved as a whole so that the base point
$\psi(p)$ is being moved in Cartesian space at velocity $\hat{w}$, and the velocity $v$ is simultaneously being moved at
the rate $\check{w}$ relative to the Cartesian coordinates. This is illustrated in Figure 59.1.3.


[Diagram: the level-one line $L_{\psi(p),v}$ and its displacement along the second-level line $L_{\psi^{(1)}(z),w}$.]

Figure 59.1.3 Components of second-level tangent vector in parametrised line form


The fact that the vertical component $\check{w}$ of a second-level vector is not invariant under chart transition maps
is not a show-stopper. The component $n$-tuple $\check{w}$ is not the component tuple of a vector in $T(M)$. The
$n$-tuple $\check{w}$ is only $n$ components of a $4n$-component tuple for a vector in $T(T(M))$.


The vertical component $\check{w}$ may be thought of as the rate of deviation of the “line of lines” from parallel
motion. But this parallel motion is entirely chart-dependent. This is not a real problem. This arbitrariness
is a consequence of the lack of “anchoring” in a bare differentiable manifold. (Hermann Weyl referred
to a differentiable manifold with neither affine connection nor Riemannian metric as a “bare continuum”.
See Remark 49.2.10.) The arbitrariness of the vertical component provides the motivation to define affine
connections. So it is actually a boon in disguise.


The purpose of an affine connection is to provide reference vectors at different base points of $M$ so that the
fourth component $\check{w}$ may be compared with parallel-transported vectors to determine how much the vector
$z$ has changed relative to parallel transport. Unfortunately, affine connections generally define a different
parallel transport along each path between two points. So one can only compare $\check{w}$ to a fairly arbitrary
definition of parallel transport. However, pathwise parallel transport is well suited to most scenarios in
differential geometry and physics.


Figure 59.1.4 illustrates the same second-level tangent vector as in Figure 59.1.3, but in point/velocity form. 


[Diagram: the base point $\psi(p)$, the base velocity $v$, the “horizontal” component $\hat{w}$ of the second-level vector and the “vertical” component $\check{w}$ of the second-level vector.]

Figure 59.1.4 Components of second-level tangent vector in point/velocity form


59.1.14 REMARK: Chart transition rules for second-level tangent vectors. 

Theorem 59.1.15 gives chart transition rules for second-level tangent vectors on the total tangent space $T(M)$
which are analogous to the chart transition rules in Theorem 54.1.11 for first-level tangent vectors on the
base manifold $M$.


The chart transition rules for second-level tangent vectors show that verticality of vectors in $T(T(M))$ is
chart-independent, whereas horizontality is not. (This is because the horizontal component is essentially
vector-like, whereas the vertical component is not.) In other words, if the horizontal component $\hat{w}$ of a
vector $t_{z,(\hat{w},\check{w}),\Psi(\psi)} \in T_z(T(M))$ is zero for one chart, then it is zero for all charts.

Although horizontality is not chart-independent in general, if $z$ is a zero vector (that is, $z = t_{p,0,\psi}$ for
some $p \in M$ and $\psi \in \mathrm{atlas}_p(M)$), then horizontality is chart-independent. This observation has helpful
consequences when applied to critical points of real-valued functions on $C^2$ manifolds.


The sets, points and maps in Theorem 59.1.15 are illustrated in Figure 59.1.5. 


[Diagram: the charts $\psi_1, \psi_2 : M \to \mathbb{R}^n$ and the corresponding tangent bundle charts from $T(M)$ to $\mathbb{R}^{2n}$, with $z = t_{p,v_1,\psi_1}$, $x_1 = \psi_1(p)$ and $x_2 = \psi_2(p)$.]

Figure 59.1.5 Chart transition rule for tangent vectors to $T(M)$


59.1.15 THEOREM: Chart transition rules for a second-level tangent bundle. 
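Let $M$ be a $C^2$ manifold. Let $p \in M$ and $\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$. Let $n = \dim(M)$, $v_1 \in \mathbb{R}^n$ and $z = t_{p,v_1,\psi_1}$.
Let $\psi_\ell^{(1)} = \Psi(\psi_\ell) \in \mathrm{atlas}_z(T(M))$ be the tangent bundle charts corresponding to $\psi_\ell$ for $\ell = 1, 2$. (See
Notation 54.5.21.) Then for $w_1, w_2 \in \mathbb{R}^{2n}$, $t_{z,w_1,\psi_1^{(1)}} = t_{z,w_2,\psi_2^{(1)}}$ if and only if

$$\forall i \in \mathbb{N}_n,\quad w_2^i = \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} w_1^j \qquad (59.1.1)$$

and

$$\forall i \in \mathbb{N}_n,\quad w_2^{n+i} = \sum_{j,k=1}^n \partial_{x^j} \partial_{x^k}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} v_1^k w_1^j + \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} w_1^{n+j}. \qquad (59.1.2)$$

This can be expressed in abbreviated form, with $\hat{w}_\ell = (w_\ell^i)_{i=1}^n$ and $\check{w}_\ell = (w_\ell^{n+i})_{i=1}^n$ for $\ell = 1, 2$, as

$$\forall i \in \mathbb{N}_n,\quad \hat{w}_2^i = \sum_{j=1}^n \phi^i_{,j}\, \hat{w}_1^j, \qquad (59.1.3)$$

$$\forall i \in \mathbb{N}_n,\quad \check{w}_2^i = \sum_{j,k=1}^n \phi^i_{,jk}\, v_1^k \hat{w}_1^j + \sum_{j=1}^n \phi^i_{,j}\, \check{w}_1^j, \qquad (59.1.4)$$

where $\phi = \psi_2 \circ \psi_1^{-1}$ and $\forall i, j \in \mathbb{N}_n$, $\phi^i_{,j} = \partial_j \phi^i$ and $\forall i, j, k \in \mathbb{N}_n$, $\phi^i_{,jk} = \partial_j \partial_k \phi^i$, with all derivatives evaluated at $\psi_1(p)$.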

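PROOF: By Theorem 54.1.11 applied to the manifold $T(M)$ (instead of $M$), $t_{z,w_1,\psi_1^{(1)}} = t_{z,w_2,\psi_2^{(1)}}$ if and
only if

$$\forall i \in \mathbb{N}_{2n},\quad w_2^i = \sum_{j=1}^{2n} \partial_{y^j}\big((\psi_2^{(1)})^i \circ (\psi_1^{(1)})^{-1}(y)\big)\Big|_{y=\psi_1^{(1)}(z)} w_1^j. \qquad (59.1.5)$$

The dummy variable $y$ in line (59.1.5) must lie in $\psi_1^{(1)}(\mathrm{Dom}(\psi_2^{(1)}))$, which is a non-empty open subset of $\mathbb{R}^{2n}$.
Therefore $y = (\psi_1(\tilde{p}), \tilde{v})$ for some $\tilde{p} \in \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)$ and $\tilde{v} \in \mathbb{R}^n$. Let $\tilde{x} = \psi_1(\tilde{p})$ and $\tilde{z} = t_{\tilde{p},\tilde{v},\psi_1}$.
Then $\tilde{x} \in \mathbb{R}^n$ and $\tilde{z} \in T_{\tilde{p}}(M)$. So $y^j = \tilde{x}^j = \psi_1^j(\tilde{p})$ for $j \in \mathbb{N}_n$, and $y^j = \tilde{v}^{j-n}$ for $j \in \mathbb{N}_{2n} \setminus \mathbb{N}_n$. Thus by
applying Theorems 54.5.14 and 54.1.11 to the manifold $M$ (instead of $T(M)$),

$$\forall i \in \mathbb{N}_{2n},\quad (\psi_2^{(1)})^i \circ (\psi_1^{(1)})^{-1}(y) = \begin{cases} \psi_2^i \circ \psi_1^{-1}(\tilde{x}) & \text{for } i \in \mathbb{N}_n, \\ \sum_{k=1}^n \partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\tilde{x}}\, \tilde{v}^k & \text{for } i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n. \end{cases} \qquad (59.1.6)$$

Substituting (59.1.6) into the expression $\partial_{y^j}\big((\psi_2^{(1)})^i \circ (\psi_1^{(1)})^{-1}(y)\big)$ in (59.1.5) for $i \in \mathbb{N}_{2n}$ and $j \in \mathbb{N}_n$ gives

$$\partial_{y^j}\big((\psi_2^{(1)})^i \circ (\psi_1^{(1)})^{-1}(y)\big) = \begin{cases} \partial_{\tilde{x}^j}\big(\psi_2^i \circ \psi_1^{-1}(\tilde{x})\big) & \text{for } i \in \mathbb{N}_n, \\ \partial_{\tilde{x}^j} \sum_{k=1}^n \partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\tilde{x}}\, \tilde{v}^k & \text{for } i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n \end{cases}$$

$$= \begin{cases} \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\big|_{x=\tilde{x}} & \text{for } i \in \mathbb{N}_n, \\ \sum_{k=1}^n \partial_{x^j} \partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\tilde{x}}\, \tilde{v}^k & \text{for } i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n. \end{cases}$$

A similar substitution for $i \in \mathbb{N}_{2n}$ and $j \in \mathbb{N}_{2n} \setminus \mathbb{N}_n$ gives

$$\partial_{y^j}\big((\psi_2^{(1)})^i \circ (\psi_1^{(1)})^{-1}(y)\big) = \begin{cases} \partial_{\tilde{v}^{j-n}}\big(\psi_2^i \circ \psi_1^{-1}(\tilde{x})\big) & \text{for } i \in \mathbb{N}_n, \\ \partial_{\tilde{v}^{j-n}} \sum_{k=1}^n \partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\tilde{x}}\, \tilde{v}^k & \text{for } i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n \end{cases}$$

$$= \begin{cases} 0 & \text{for } i \in \mathbb{N}_n, \\ \partial_{x^{j-n}}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\tilde{x}} & \text{for } i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n. \end{cases}$$

Substitution of these expressions into (59.1.5) gives (59.1.1) and (59.1.2).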


59.1.16 REMARK: Summary of second-level chart transition rules. 
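The transformation rules (59.1.1) and (59.1.2) may be summarised as the following matrix equation.

$$\begin{bmatrix} \hat{w}_2 \\ \check{w}_2 \end{bmatrix} = \begin{bmatrix} B & 0 \\ A & B \end{bmatrix} \begin{bmatrix} \hat{w}_1 \\ \check{w}_1 \end{bmatrix},$$

where $\hat{w}_\ell$ and $\check{w}_\ell$ are respectively the horizontal and vertical parts of the component vector $w_\ell \in \mathbb{R}^{2n}$ and, with $\phi = \psi_2 \circ \psi_1^{-1}$ and all derivatives evaluated at $\psi_1(p)$ as in Theorem 59.1.15,

$$B = \big[\partial_j \phi^i\big]_{i,j=1}^n, \qquad A = \Big[\sum_{k=1}^n \partial_j \partial_k \phi^i\, v_1^k\Big]_{i,j=1}^n.$$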

59.1.17 REMARK: Similarity of chart-transition rules for second-level and second-order tangent vectors. 
The transformation rules for $T^{(2)}(M) = T(T(M))$ in Theorem 59.1.15 are eerily similar to the rules for
TU (M) and TP! (M) in Theorem 60.2.5 and Definition 60.5.5 respectively. This is probably not a coincidence. 
The space T (M) can be embedded inside TU!(M) so as to have the same transformation rules. Thus 
l scs and i od have essentially the same chart transition rules, where p € M, v € atlas,(M), 
$n = \dim(M)$, $v, \hat{w}, \check{w} \in \mathbb{R}^n$, and $v \otimes \hat{w}$ means the array $a \in \mathbb{R}^{n \times n}$ defined by $a^{ij} = v^i \hat{w}^j$ for $i, j \in \mathbb{N}_n$.
This is related to the close correspondence between the commutators of the first-order tangent operators in
Section 60.3 on the one hand, and the Lie bracket in Section 61.5 on the other. 


59.1.18 REMARK: The second-level tangent space at a point of the first-level tangent bundle. 

Definition 59.1.19 is based on Definition 59.1.8 for the second-level tangent-line set of a $C^2$ manifold. The
tangent space $T_z(T(M))$ in Definition 59.1.19 does not need to be defined here because it is already defined by
Definition 54.4.4.


59.1.19 DEFINITION: The second-level tangent space on $M$ at $z \in T(M)$, where $M$ is an $n$-dimensional
$C^2$ differentiable manifold, is the second-level tangent-line set $T_z(T(M))$ on $M$ at $z$ together with the usual
tangent space vector addition and scalar multiplication operations.


59.1.20 REMARK: Linear space operations for a second-level tangent space. 
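The vector addition and scalar multiplication operations in Definition 59.1.19 for the linear space $T_z(T(M))$
satisfy

$$\forall \lambda_1, \lambda_2 \in \mathbb{R},\ \forall w_1, w_2 \in \mathbb{R}^{2n},\ \forall \psi^{(1)} \in \mathrm{atlas}_z(T(M)),\quad \lambda_1 t_{z,w_1,\psi^{(1)}} + \lambda_2 t_{z,w_2,\psi^{(1)}} = t_{z,\lambda_1 w_1 + \lambda_2 w_2,\psi^{(1)}}.$$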

59.1.21 REMARK: Shorter names for second-level tangent spaces and tangent bundles. 

The double tangent bundle $T(T(M))$ in Definition 59.1.22 does not need to be defined here because it is
already defined by Definition 54.5.16. The purpose of Definitions 59.1.19 and 59.1.22 is to give new (shorter)
names for the structures $T_z(T(M))$ and $T(T(M))$.


59.1.22 DEFINITION: The second-level tangent bundle or double tangent bundle on an $n$-dimensional $C^2$
manifold $M$ is the tangent bundle $T(T(M)) \mathrel{-\!\!<} (T(T(M)), A_{T(T(M))}, \pi^{(1)}, T(M), A_{T(M)}, A^{\mathbb{R}^{2n}}_{T(T(M))})$.


59.1.23 REMARK: Interpretation of the structure of second-level tangent bundles. 
Definition 59.1.22 is illustrated in Figure 59.1.6. 


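[Diagram: the projection maps $\pi^{(1)} : T(T(M)) \to T(M)$ and $\pi : T(M) \to M$, together with the charts of $T(T(M))$, $T(M)$ and $M$ into Cartesian spaces.]

Figure 59.1.6 Charts and maps for total tangent space of total tangent space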

The total space $T(T(M))$, projection map $\pi^{(1)}$ and total-space manifold-atlas $A_{T(T(M))}$ of the second-level
tangent bundle in Definition 59.1.22 satisfy the following.
(i) $T(T(M)) = \bigcup_{z \in T(M)} T_z(T(M))$ is the disjoint union of the tangent spaces at points of $T(M)$.
(ii) r® : T(T(M)) > T(M) satisfies nU : ty, pwyo 9 tons (wy, where II? : IR?" > R” is the 
projection map with IIT : (z1,...2?") — (zl, ... a"). 
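(iii) $A_{T(T(M))} = \{\psi^{(2)};\ \psi \in A_M\}$, where for all $\psi \in A_M$, the chart $\psi^{(2)} : (\pi^{(1)})^{-1}(\pi^{-1}(\mathrm{Dom}(\psi))) \to \mathbb{R}^{4n}$
satisfies $\psi^{(2)} : t_{t_{p,v,\psi},w,\Psi(\psi)} \mapsto Q_{2n,2n}(Q_{n,n}(\psi(p), v), w)$.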


It should be noted that $\pi^{(1)} \ne \pi_*$, except in trivial cases. It is tempting to conjecture that these two maps are
the same because they have the same domain $T(T(M))$ and range $T(M)$. However, suppose that $\pi^{(1)} = \pi_*$ for
some $C^2$ manifold $M$. Then by Definitions 54.5.16 and 54.5.4 applied to the $C^1$ manifold $T(M)$, the second-level
projection map $\pi^{(1)} : T(T(M)) \to T(M)$ must map all vectors in $T_z(T(M))$ to $z$ for all $z \in T(M)$. The
pointwise differential $\pi_*|_{T_z(T(M))}$ of $\pi$ at $z$ has domain $T_z(T(M))$ and target space $T_{\pi(z)}(M)$ by Definitions
58.9.4 and 58.4.5. Let $W = 0_{T_z(T(M))}$ be the zero vector in $T_z(T(M))$. Then $\pi_*(W) = 0_{T_{\pi(z)}(M)}$. Therefore
$\pi^{(1)} = \pi_*$ then implies $z = \pi^{(1)}(W) = \pi_*(W) = 0_{T_{\pi(z)}(M)}$. Thus the equality of the maps can only be valid if
$z$ is the zero vector for all $z \in T(M)$. So $T_p(M)$ must be zero-dimensional for all $p \in M$. Therefore $M$ must
be a zero-dimensional manifold.


The relation between $\pi^{(1)}$ and $\pi_*$ is perhaps clarified to some extent by Figure 64.5.1 in Remark 64.5.2 with
$T(M)$ substituted for the fibration total space $E$. Consider also that in terms of Notation 59.2.5, $\pi^{(1)}(W) = z$
and $\pi_*(W) = V$ for all $W \in T_{z,V}(T(M))$, for all $z, V \in T_p(M)$, for all $p \in M$. Also possibly of some interest
is the observation that this implies $\pi \circ \pi^{(1)} = \pi \circ \pi_*$.
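
In chart components the two maps are easy to tell apart. The following Python sketch is a hedged illustration only: it models a second-level vector by its four component tuples $(\psi(p), v, \hat{w}, \check{w})$, and the function names `pi`, `pi_1` and `pi_star` are hypothetical stand-ins for $\pi$, $\pi^{(1)}$ and $\pi_*$ expressed in a single fixed chart.

```python
def pi(z):
    """Base-point map of T(M): (x, v) -> x."""
    x, v = z
    return x


def pi_1(W):
    """Base-point map of T(T(M)): keeps the first-level vector z = (x, v)."""
    x, v, w_hat, w_check = W
    return (x, v)


def pi_star(W):
    """Differential of pi: keeps the base point and the horizontal component."""
    x, v, w_hat, w_check = W
    return (x, w_hat)


# Sample 4n-tuple with n = 2 (numbers chosen arbitrarily for illustration).
W = ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0))

z = pi_1(W)        # the point of T(M) at which W is attached
V = pi_star(W)     # the horizontal component of W
same_base_point = pi(z) == pi(V)
```

The two images differ as soon as $v \ne \hat{w}$, but both lie over the same point of $M$, which is the content of $\pi \circ \pi^{(1)} = \pi \circ \pi_*$.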


59.1.24 REMARK: Terminology for higher-level tangent bundles. 

It is clumsy to have to refer to a space such as $T(T(T(M)))$ as the “tangent bundle of the tangent bundle of
the tangent bundle of $M$”. Since there is probably no standard term for this, it is convenient to refer to such
a space as the “third-level tangent bundle” or “triple tangent bundle” of $M$. Then a linear space such as
$T_z(T(T(M)))$ for $z \in T(T(M))$ may be referred to as a “third-level tangent space” or “triple tangent space”
of $M$ at $z$. Notation 59.1.26 will be used for $k$th-level tangent spaces for arbitrary $k \in \mathbb{Z}^+$ in this book. (The
parentheses in the superscript could be a mnemonic for the parentheses in the fully written-out definition.)


59.1.25 DEFINITION: The $k$th-level tangent bundle of a $C^\infty$ manifold $M$ is defined inductively for $k \in \mathbb{Z}^+$
as the tangent bundle of the $(k-1)$th-level tangent bundle of $M$, where the first-level bundle is $T(M)$.


59.1.26 NOTATION: Higher-level tangent bundle. 
$T^{(k)}(M)$ for $k \in \mathbb{Z}^+$ denotes the $k$th-level tangent bundle of a $C^\infty$ manifold $M$. Thus $T^{(1)}(M) = T(M)$ and

$$\forall \ell \in \mathbb{N}_k \setminus \{1\},\quad T^{(\ell)}(M) = T(T^{(\ell-1)}(M))$$

for any $C^\infty$ manifold $M$ with $k \in \mathbb{Z}^+$.


59.1.27 THEOREM: The $\ell$-level tangent bundle of a $C^k$ manifold is a $C^{k-\ell}$ manifold.


Let $k \in \mathbb{Z}^+$. Let $M$ be a $C^k$ manifold. Then $T^{(\ell)}(M)$ is a $C^{k-\ell}$ manifold for all $\ell \in \mathbb{N}_k$. Hence if $M$ is a
$C^\infty$ manifold, then $T^{(\ell)}(M)$ is a $C^\infty$ manifold for all $\ell \in \mathbb{Z}^+$.


PROOF: The assertions follow by induction from Theorem 54.5.28.
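
The induction can be traced with a small bookkeeping sketch in Python. The function name `tangent_bundle_level` is hypothetical. The regularity count follows the theorem above, and the dimension count uses the fact from Remark 59.1.5 that the total space of the tangent bundle of an $n$-dimensional manifold is $2n$-dimensional.

```python
def tangent_bundle_level(n, k, level):
    """Return (dimension, regularity) of the level-th tangent bundle of M.

    n     : dimension of the base manifold M
    k     : finite regularity class of M (M is assumed to be C^k)
    level : recursion level, restricted to 1 <= level <= k as in the theorem
    """
    if not 1 <= level <= k:
        raise ValueError("level must satisfy 1 <= level <= k")
    dim, reg = n, k
    for _ in range(level):
        # Each tangent bundle construction doubles the dimension of the
        # total space and costs one degree of differentiability.
        dim, reg = 2 * dim, reg - 1
    return dim, reg


first = tangent_bundle_level(n=3, k=2, level=1)    # T(M) for a 3-dimensional C^2 manifold
second = tangent_bundle_level(n=3, k=2, level=2)   # T(T(M)) for the same manifold
```

For a $C^2$ manifold the second-level bundle is therefore only a $C^0$ manifold, which is why a third level is not available in that case.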


59.2. Horizontal components and vertical drop functions 


59.2.1 REMARK: Affine connections are defined on second-level tangent bundles. 

An affine connection (i.e. differential parallelism) is constructed on a C? differentiable manifold M by defining, 
for each direction of motion v in the tangent space at each point p € M, the rate of adjustment of each 
vector w in the tangent space to keep the vector w moving in a parallel fashion. This is required to be linear 
with respect to both v and w. 


For example, if $\gamma : \mathbb{R} \to M$ is a $C^1$ curve in a $C^2$ manifold $M$, then $\gamma$ has a velocity vector $v(t) = \gamma'(t) \in
T_{\gamma(t)}(M)$ at the point $p(t) = \gamma(t)$ for all $t \in \mathbb{R}$. A parallel-transported vector function $w : \mathbb{R} \to T(M)$
along the curve must satisfy $w(t) \in T_{\gamma(t)}(M)$ for all $t \in \mathbb{R}$. An affine connection on $M$ must map the pair
$(v(t), w(t))$ linearly to a second-level tangent vector $u(t) \in T_{w(t)}(T(M))$ such that the horizontal component
of $u(t)$ equals $v(t)$. (This linear map is called a “horizontal lift function”. See Section 67.5.)


Parallelism is defined for a particular path between two points $p_1, p_2 \in M$ as a linear map between the
tangent spaces $T_{p_1}(M)$ and $T_{p_2}(M)$. When this is differentiated, it may yield an affine connection structure
on the tangent space of the tangent space of $M$.

The concepts of “horizontal components", “vertical vectors” and “drop functions” are required for defining 
connections on general differential fibre bundles and affine connections on differential manifolds. 


The differential of a map in Definition 59.2.2 is defined in Section 58.9. (See Definitions 64.5.3 and 64.5.4 
for the more general C! fibration versions of Definitions 59.2.2 and 59.2.3 respectively.) 


59.2.2 DEFINITION: The horizontal component of a vector $W$ in a total tangent space $T(T(M))$ for a $C^2$
manifold $M$ is the vector $\pi_*(W)$ in $T(M)$, where $\pi_*$ is the differential of the projection map $\pi$ of $T(M)$.


59.2.3 DEFINITION: A vertical vector in a total tangent space T(T(M)) for a C? manifold M is a vector 
W € T(T(M)) whose horizontal component is zero. 

59.2.4 REMARK: The horizontal component and verticality of second-level tangent vectors.

The horizontal component of a vector in a total tangent space is chart-independent. Therefore the verticality 
property is chart-independent. Definitions 59.2.2 and 59.2.3 follow the terminology for general differentiable 
fibre bundles in Section 64.5. Vertical components and horizontal vectors are not presented here because 
they are not well defined in the absence of a connection on a manifold. 


Since the horizontal components of second-level tangent vectors are well defined, independent of the choice 
of charts, Notation 59.2.5 is well defined. In the special case that the horizontal component $V$ equals zero,
one has $T_{z,0}(T(M)) = \ker((d\pi)_z)$.


59.2.5 NOTATION: $T_{z,V}(T(M))$, for a $C^2$ manifold $M$, and $V, z \in T_p(M)$ for some $p \in M$, denotes the set
$\{y \in T_z(T(M));\ (d\pi)_z(y) = V\}$, where $\pi : T(M) \to M$ is the standard projection map for $T(M)$. In other
words, $T_{z,V}(T(M))$ denotes the set of vectors in $T_z(T(M))$ which have horizontal component $V$. Thus

$$\forall p \in M,\ \forall V, z \in T_p(M),\quad T_{z,V}(T(M)) = \{y \in T_z(T(M));\ (d\pi)_z(y) = V\}.$$

59.2.6 THEOREM: The set of vertical vectors equals the kernel of the differential of the projection map.
Let $M$ be a $C^2$ manifold. Then

$$\forall z \in T(M),\quad T_{z,0}(T(M)) = \ker((d\pi)_z),$$


where $\pi : T(M) \to M$ is the projection map for $T(M)$ in Definition 54.5.4.


PROOF: The assertion follows from Notation 59.2.5 and Definition 59.2.2.


59.2.7 THEOREM: Component expressions for double tangent spaces for a given horizontal component. 
Let M be a C? manifold. Then 


Vp € M, Yz, V € T (M), Vv € atlas; (M), 
T: v (T(M)) = {tz (ww), w); W= S(y)(V), w E R”} (59.2.1) 


EC P 
= {6 De) V) wyi V ER”). (59.2.2) 


PROOF: Line (59.2.1) follows from Notations 59.2.5 and 59.1.11.
Line (59.2.2) follows from line (59.2.1) and Notation 59.1.12.


59.2.8 REMARK: The “vertical drop function” map for vertical second-level tangent vectors.

Definition 59.2.9 is important for constructing a covariant derivative from an affine connection. It may
be compared with the much simpler “vertical drop function” for a linear space such as $\mathbb{R}^n$. The tangent
space $T_p(\mathbb{R}^n)$ at a point $p \in \mathbb{R}^n$ is usually identified with $\mathbb{R}^n$ without comment. Thus if $\gamma : \mathbb{R} \to \mathbb{R}^n$ is a
differentiable curve in $\mathbb{R}^n$, the derivative of $\gamma$ at $p = \gamma(t)$ is defined as $\lim_{h\to0}(\gamma(t+h) - \gamma(t))/h$, which is
the limit of a vector $(\gamma(t+h) - \gamma(t))/h$ that just happens to be in $\mathbb{R}^n$ for $h \ne 0$. The fact that $\mathbb{R}^n$ has
a complete topology implies that the limit, when it exists, is an element of $\mathbb{R}^n$. Alternatively, one could think
of $\mathbb{R}^n$ as a manifold with an atlas consisting of a single chart, namely the identity map on $\mathbb{R}^n$. Then the
limit could be thought of as an element of $T_p(\mathbb{R}^n)$.


In the case of a general manifold, there is no linear space structure. Therefore only the tangent space version
of a vector can be defined. But if the manifold happens to be a linear space also, there will be two possible
definitions. This is exactly what occurs when one constructs tangent vectors in $T_z(T(M))$ for a $C^2$ manifold $M$
with $z \in T(M)$. Any vertical vector in $T_z(T(M))$ may be constructed either within the linear space of
vertical vectors or in the manifold structure of $T(M)$. Definition 59.2.9 gives the canonical map from the
latter to the former.


The verticality is important because a vertical vector $y \in T_z(T(M))$ represents a tangent vector with a constant
base point $p = \pi(z)$, and if the base point is not changing, the rest of the manifold might as well not exist, and
the vector therefore exists entirely within a simple linear space. It is always true for any differentiable fibre
bundle $(E, \pi, B)$ that the verticality of vectors is independent of the choice of fibre chart because verticality
depends only on the projection map $\pi : E \to B$, not in any way on the fibre charts $\phi : E \to F$. Another
way of thinking of this is to compare it to the velocities of curves in any finite-dimensional linear space $W$.
There is no ambiguity in identifying velocities in the tangent space $T(W)$ with elements of $W$. This kind of
“vertical drop function” is generally used without comment, and $T(W)$ is tacitly assumed to equal $W$. (See
Remark 53.3.13 for the identification of $T(W)$ with $W$ for general Banach spaces $W$.)
59.2.9 DEFINITION: The (pointwise) (vertical) drop function at a point $z \in T(M)$ for a $C^2$ manifold $M$ is
the linear map $\varpi_z : \ker((d\pi)_z) \to T_{\pi(z)}(M)$ which satisfies

    $\forall w \in \{0_{\mathbb{R}^n}\} \times \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_{\pi(z)}(M),$
    $\varpi_z(t_{z,w,\Psi(\psi)}) = t_{\pi(z),\Pi^{2n}_{n+1}(w),\psi},$

where $n = \dim(M)$ and $\pi : T(M) \to M$ is the standard projection map for $T(M)$, and $\Pi^{2n}_{n+1} : \mathbb{R}^{2n} \to \mathbb{R}^n$ is
the projection map $\Pi^{2n}_{n+1} : w \mapsto (w^{n+1}, \ldots w^{2n})$ as in Notation 11.5.26. In other words,

    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\ \forall w \in \mathbb{R}^n,$
    $\varpi_z(t^{(2)}_{p,v,0,w,\psi}) = t_{p,w,\psi},$    (59.2.3)

where $z$ is an abbreviation for $t_{p,v,\psi}$.


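59.2.10 THEOREM: Linearity of pointwise vertical drop functions.
Let $M$ be a $C^2$ manifold. Then $\varpi_z : T_{z,0}(T(M)) \to T_{\pi(z)}(M)$ is a linear map for all $z \in T(M)$.


PROOF: Let $\lambda \in \mathbb{R}$, $w \in \{0_{\mathbb{R}^n}\} \times \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_{\pi(z)}(M)$. Then by Definitions 59.2.9 and 54.4.4 (ii),

    $\varpi_z(\lambda t_{z,w,\Psi(\psi)}) = \varpi_z(t_{z,\lambda w,\Psi(\psi)})$
    $\qquad = t_{\pi(z),\Pi^{2n}_{n+1}(\lambda w),\psi}$
    $\qquad = t_{\pi(z),\lambda\Pi^{2n}_{n+1}(w),\psi}$
    $\qquad = \lambda t_{\pi(z),\Pi^{2n}_{n+1}(w),\psi}$
    $\qquad = \lambda \varpi_z(t_{z,w,\Psi(\psi)}).$

Similarly, $\varpi_z(t_{z,w_1,\Psi(\psi)} + t_{z,w_2,\Psi(\psi)}) = \varpi_z(t_{z,w_1,\Psi(\psi)}) + \varpi_z(t_{z,w_2,\Psi(\psi)})$ for all $w_1, w_2 \in \{0_{\mathbb{R}^n}\} \times \mathbb{R}^n$ by
Definitions 59.2.9 and 54.4.4 (i). Hence $\varpi_z$ is linear.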


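59.2.11 THEOREM: The drop function is a chart-independent linear isomorphism.
Let $M$ be a $C^2$ manifold. Then for all $z \in T(M)$, the drop function $\varpi_z : \ker((d\pi)_z) \to T_{\pi(z)}(M)$ is a
chart-independent linear isomorphism. Hence $\varpi_z^{-1} : T_{\pi(z)}(M) \to T_{z,0}(T(M))$ is well defined and

    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\ \forall w \in \mathbb{R}^n,$
    $\varpi^{-1}_{t_{p,v,\psi}}(t_{p,w,\psi}) = t^{(2)}_{p,v,0,w,\psi}.$    (59.2.4)

PROOF: Let $z \in T(M)$, $p = \pi(z) \in M$ and $\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$. It must be shown that
$t_{p,\Pi^{2n}_{n+1}(w_1),\psi_1} = t_{p,\Pi^{2n}_{n+1}(w_2),\psi_2}$ if $t_{z,w_1,\Psi(\psi_1)} = t_{z,w_2,\Psi(\psi_2)} \in \ker((d\pi)_z)$. The components
$w_1^1, \ldots w_1^n$ and $w_2^1, \ldots w_2^n$ are all zero. So by Theorem 59.1.15,

    $\forall i \in \{1, \ldots n\},\quad w_2^{n+i} = \sum_{j=1}^n \partial_j(\psi_2 \circ \psi_1^{-1})^i(\psi_1(p))\, w_1^{n+j},$

from which it follows that $\Pi^{2n}_{n+1}(w_1)$ and $\Pi^{2n}_{n+1}(w_2)$ satisfy the same chart transition rule, which happens to be
the correct transition rule for vectors in $T_p(M)$. So $t_{p,\Pi^{2n}_{n+1}(w_1),\psi_1} = t_{p,\Pi^{2n}_{n+1}(w_2),\psi_2}$. Thus $\varpi_z$ is
chart-independent. It is injective because, for a fixed chart $\psi$, the tuple $\Pi^{2n}_{n+1}(w)$ determines $w$ when the
first $n$ components of $w$ are zero.


The fact that $\varpi_z$ is a linear space isomorphism follows from Theorem 59.2.10 and the observation that $\Pi^{2n}_{n+1}$
is surjective and that, for $\psi \in \mathrm{atlas}_p(M)$, the map $(d\pi)_z$ sends $t_{z,w,\Psi(\psi)}$ to the vector in $T_p(M)$ whose components
with respect to $\psi$ are the first $n$ components of $w$. Line (59.2.4) then follows from
Definition 59.2.9 line (59.2.3).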
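
The transition rule used in the proof can be made explicit. The following is a sketch of the standard computation, assuming that the chart $\Psi(\psi)$ on $T(M)$ sends $t_{p,u,\psi}$ to $(\psi(p), u)$. The symbols $\phi$, $x$ and the primed tuples are names introduced here for illustration.

```latex
% Chart transition on M and the induced transition on T(M):
\[
  \phi = \psi_2 \circ \psi_1^{-1}, \qquad
  (x,u) \mapsto \bigl(\phi(x),\, D\phi(x)\,u\bigr), \qquad x = \psi_1(p) .
\]
% Differentiating the induced transition in the direction (v,w) gives the rule
% for second-level components (v,w) -> (v',w') at the point with components u:
\[
  v'^{\,i} = \sum_{j=1}^n \partial_j\phi^i(x)\, v^j, \qquad
  w'^{\,i} = \sum_{j,k=1}^n \partial_j\partial_k\phi^i(x)\, u^j v^k
           + \sum_{j=1}^n \partial_j\phi^i(x)\, w^j .
\]
% Vertical case v = 0: the second-derivative term vanishes, leaving
\[
  w'^{\,i} = \sum_{j=1}^n \partial_j\phi^i(x)\, w^j ,
\]
% which is the transition rule for the components of a vector in T_p(M).
```

The second-derivative term is the reason why the tuple $w$ does not transform like a vector when $v \ne 0$, and hence why the chart-independent drop function is defined only on vertical vectors.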


59.2.12 REMARK:  Generalisation of drop functions from tangent bundles to vector bundles. 
The drop function in Definitions 59.2.9 and 59.2.15 is generalised from tangent bundles to vector bundles in 
Definition 65.3.5, which utilises the linear space drop function in Definition 54.9.5. 


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




59.2.13 REMARK: Identification of vertical vectors with base-space tangent vectors.

The drop function $\varpi_z$ in Definition 59.2.9 “identifies” vectors in the space $T_{z,0}(T(M))$ in Notation 59.2.5 with
the space $T_p(M)$ for all $z \in T_p(M)$. This “identification” means that the spaces are formally distinct, but it
is often convenient to substitute one for the other without comment. Within the fibre bundle framework, the
substitution should be indicated explicitly, although this is rarely done. (An exception is Poor [32], pages
18-19, 67, who refers to the drop function as the “second factor projection”.)

The verticality of the second-level vector $t^{(2)}_{p,v,0,w,\psi} \in T(T(M))$ in Definition 59.2.9 line (59.2.3) is evident
from the fact that the horizontal component explicitly equals zero. The vertical component $w$ is copied
to the velocity parameter in $t_{p,w,\psi}$. A significant consequence of this is that all “memory” of the original
base-space velocity parameter $v$ is lost.


In the application of the drop function to the definition of the covariant derivative in terms of a given affine
connection in Section 71.6, the horizontal component is set to zero by subtracting the “horizontal lift” $\theta_V(z)$
of $z$ for base-space velocity $V$ from a second-level velocity $w \in T_z(T(M))$ to obtain a vertical second-level
velocity vector $w - \theta_V(z) \in \ker((d\pi)_z) = T_{z,0}(T(M))$ for some given $V \in T_p(M)$ and $z \in T_p(M)$. The drop
function is applied to this to obtain $\varpi(w - \theta_V(z)) \in T_p(M)$. Thus the velocity $V$ is used in the computation
of $\varpi(w - \theta_V(z))$, but the value of $V$ is absent as a parameter in the final output.


It should be mentioned here that a horizontal lift vector $\theta_V(z)$ of $z$ is not a horizontal vector itself because
in fact horizontality is not a chart-independent concept. It is called the “horizontal lift” of $z \in T(M)$ with
velocity $V$ because it lifts $z$ from $T(M)$ to $\theta_V(z) \in T_z(T(M))$, but it is the parameter $V$ which is horizontal,
not the output $\theta_V(z)$.


59.2.14 REMARK: The global vertical drop function for a double tangent bundle.

The global vertical drop function in Definition 59.2.15 is the simple globalised version of the pointwise
vertical drop function in Definition 59.2.9. An important difference is that the global vertical drop function
is not a bijection in general because for each $V \in T(M)$ there is a vector in every space $T_{z,0}(T(M))$, for
$z \in T_{\pi(V)}(M)$, which is mapped by $\varpi$ to $V$, and typically there is more than one element $z \in T_{\pi(V)}(M)$.
The map $\varpi_z : T_{z,0}(T(M)) \to T_{\pi(z)}(M)$ is a bijection for each $z \in T(M)$ by Theorem 59.2.11.


59.2.15 DEFINITION: The (global) (vertical) drop function for a $C^2$ differentiable manifold $M$ is the map
$\varpi : \bigcup_{z \in T(M)} T_{z,0}(T(M)) \to T(M)$ defined by

    $\forall z \in T(M),\ \forall y \in T_{z,0}(T(M)),\quad \varpi(y) = \varpi_z(y).$

In other words, $\varpi = \bigcup_{z \in T(M)} \varpi_z$.


59.2.16 REMARK: Connections are needed for extension of drop functions to all second-level vectors.

The map $\varpi_z$ in Definition 59.2.9 is a bijection from the subspace $\ker((d\pi)_z)$ of vertical vectors in $T_z(T(M))$
to $T_{\pi(z)}(M)$. Note that $\varpi_z$ cannot be extended to a chart-independent linear map on all of $T_z(T(M))$ without
some means of removing the arbitrary definition of horizontality. This is a service which is provided by
connections. A connection on a $C^2$ manifold is equivalent to extending $\varpi_z$ to all of $T_z(T(M))$ for all
$z \in T(M)$ in a way which is chart-independent. A connection effectively makes a more or less arbitrary
choice of “horizontal plane” for each $z \in T(M)$, relative to which vertical components can be computed.


Thus a connection may be thought of as a horizontal component removal rule. The connection determines
a horizontal vector which is subtracted from an element of $T_z(T(M))$ to give a vertical vector. This is the
basis of covariant differentiation. Definition 59.2.9 is an essential ingredient in defining covariant derivatives
from connections. The vertical vectors constructed from connections must be “dropped” from the vertical
space $\ker((d\pi)_z)$ down to the base tangent space $T_{\pi(z)}(M)$. (This is the procedure for covariant differentiation
of vector fields in $X^1(T(M))$. Similar procedures are followed for general tensor fields. If the field to be
differentiated is valued in a linear space, a similar drop function can generally be defined.)
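
In coordinates, the “horizontal component removal rule” can be sketched as follows. This is an illustration under assumptions which are not taken from this section: the Christoffel array $\Gamma^i_{jk}$, its index order and its sign follow one common textbook convention, and $X$ denotes a hypothetical $C^1$ vector field with component functions $X^i$ with respect to a chart $\psi$. The tuple names $u$, $v$, $w$, $h$ are introduced here.

```latex
% Second-level vector obtained by differentiating X in the direction V = t_{p,v,\psi},
% where u is the component tuple of z = X(p) and x = \psi(p):
\[
  y = t^{(2)}_{p,u,v,w,\psi}, \qquad
  w^i = \sum_{j=1}^n v^j\, \partial_j (X^i \circ \psi^{-1})(x) .
\]
% Assumed coordinate form of the horizontal lift of z = t_{p,u,\psi} for velocity V:
\[
  \theta_V(z) = t^{(2)}_{p,u,v,h,\psi}, \qquad
  h^i = -\sum_{j,k=1}^n \Gamma^i_{jk}(x)\, v^j u^k .
\]
% The difference has zero horizontal tuple, so the vertical drop function applies:
\[
  \varpi_z\bigl(y - \theta_V(z)\bigr) = t_{p,\,w-h,\,\psi}, \qquad
  (w-h)^i = \sum_{j=1}^n v^j\, \partial_j (X^i \circ \psi^{-1})(x)
          + \sum_{j,k=1}^n \Gamma^i_{jk}(x)\, v^j u^k .
\]
```

Under these assumptions the output is the familiar component formula for a covariant derivative. The velocity tuple $v$ enters the computation but does not appear as a separate parameter of the resulting vector in $T_p(M)$.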
59.3. Oblique drop functions


59.3.1 REMARK: Chart-dependent oblique drop functions. 

There are many situations where the chart-independence of the drop function is not required. For example, 
extracting the chart-dependent vertical component of a non-vertical second-level vector is specifically required 
when converting an affine connection on a tangent bundle into a Christoffel array. 


An oblique drop function for a tangent bundle is not the same as the chart-dependent style of drop function
for general differentiable fibre bundles in Definitions 64.6.2 and 64.6.6, although the same form of notation
is used. In the fibre bundle style of drop function $\varpi^\phi_z$ or $\varpi^\phi$, the superscript is a fibre chart, and the range
of the function lies in the tangent bundle of a general differentiable fibre space.


The global oblique drop function in Definition 59.3.5 is the “chartwise” version of the pointwise oblique drop 
function in Definition 59.3.2. This obviously cannot be fully “globalised” because it is chart-dependent. 


In the physics literature, chart-dependent functions and maps are often referred to as “local” because charts 
are typically labelled by their domains, which are in fact local neighbourhoods in the topological sense. This 
can be confusing because a chart is not uniquely determined by its domain. However, using such terminology, 
one may say that oblique drop functions are “local”. 

The oblique drop function in Definition 59.3.2 uses exactly the same expressions as for the vertical drop 
function in Definition 59.2.9. Thus it is an extension from vertical second-level vectors to general second- 
level vectors. It is this extension which makes it chart-dependent, not the expression which defines it. 
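
A one-dimensional example illustrates this chart-dependence. It is a sketch under the assumption that second-level components change under a chart transition $\phi = \psi_2 \circ \psi_1^{-1}$ by the usual rule $w' = \phi''(x)\,u\,v + \phi'(x)\,w$ for $n = 1$, and that $\exp$ is admitted as a chart for $\mathbb{R}$. The manifold, the charts and the point are chosen here purely for illustration, and $\varpi^{\psi}_z$ denotes the oblique drop function for the chart $\psi$.

```latex
% Example data: M = \mathbb{R}, \psi_1 = identity, \psi_2 = \exp, p = 0.
\[
  \phi = \psi_2 \circ \psi_1^{-1} = \exp, \qquad
  x = \psi_1(p) = 0, \qquad \phi'(0) = \phi''(0) = 1 .
\]
% Components of one second-level vector y at z = t_{p,u,\psi_1} in the two charts,
% and of first-level vectors at p (the Jacobian \phi'(0) equals 1):
\[
  y = t^{(2)}_{p,u,v,w,\psi_1} = t^{(2)}_{p,\,u,\,v,\,uv+w,\,\psi_2},
  \qquad t_{p,a,\psi_1} = t_{p,a,\psi_2} \ \text{ for all } a \in \mathbb{R} .
\]
% Applying the same defining expression in each chart:
\[
  \varpi^{\psi_1}_z(y) = t_{p,w,\psi_1}, \qquad
  \varpi^{\psi_2}_z(y) = t_{p,\,uv+w,\,\psi_2} = t_{p,\,uv+w,\,\psi_1} .
\]
```

The two results agree exactly when $uv = 0$, in particular when $v = 0$, which is the case of a vertical vector $y$.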


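59.3.2 DEFINITION: The (pointwise) (oblique) drop function at a point $z \in T(M)$, for a $C^2$ manifold $M$
and $\psi \in \mathrm{atlas}_{\pi(z)}(M)$, is the linear map $\varpi^\psi_z : T_z(T(M)) \to T_{\pi(z)}(M)$ defined by

    $\forall w \in \mathbb{R}^n \times \mathbb{R}^n,\quad \varpi^\psi_z(t_{z,w,\Psi(\psi)}) = t_{\pi(z),\Pi^{2n}_{n+1}(w),\psi},$

where $n = \dim(M)$ and $\pi : T(M) \to M$ is the standard projection map for $T(M)$, and $\Pi^{2n}_{n+1} : \mathbb{R}^{2n} \to \mathbb{R}^n$ is
the subsequence map $\Pi^{2n}_{n+1} : w \mapsto (w^{n+1}, \ldots w^{2n})$ as in Definition 14.6.11. In other words,

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall (w_1, w_2) \in \mathbb{R}^n \times \mathbb{R}^n,$
    $\varpi^\psi_z(t_{z,(w_1,w_2),\Psi(\psi)}) = t_{p,w_2,\psi}.$

In other words,

    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall u, w_1, w_2 \in \mathbb{R}^n,$
    $\varpi^\psi_{t_{p,u,\psi}}(t^{(2)}_{p,u,w_1,w_2,\psi}) = t_{p,w_2,\psi}.$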


59.3.3 REMARK: The range of oblique drop functions.
The pointwise vertical drop functions $\varpi_z$ in Definition 59.2.9 have range $T_{\pi(z)}(M)$. Thus it is observed
in Theorem 59.2.11 that the maps $\varpi_z : T_{z,0}(T(M)) \to T_{\pi(z)}(M)$ are bijections.


The pointwise oblique drop functions $\varpi^\psi_z$ in Definition 59.3.2 have the same range $T_{\pi(z)}(M)$ as $\varpi_z$, but
their domains are much larger, namely the set $T_z(T(M))$, whereas vertical drop functions are defined only
for vertical vectors. Consequently the oblique drop functions are not bijections in general. However, by
restricting them to subsets of the form $T_{z,V}(T(M))$ for $V \in T_{\pi(z)}(M)$, they do become bijective. This is
asserted in Theorem 59.3.4.


59.3.4 THEOREM: Bijectivity of restricted pointwise oblique drop functions.
Let $M$ be a $C^2$ manifold. Then

    $\forall p \in M,\ \forall z, V \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),$
    $\varpi^\psi_z\big|_{T_{z,V}(T(M))} : T_{z,V}(T(M)) \to T_p(M)$ is a bijection.

Hence

    $\forall p \in M,\ \forall z, V \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),$
    $\bigl(\varpi^\psi_z\big|_{T_{z,V}(T(M))}\bigr)^{-1} : T_p(M) \to T_{z,V}(T(M))$ is a bijection.

PROOF: Let $y \in T_{z,V}(T(M))$. Then by Definition 59.3.2, $y \in \mathrm{Dom}(\varpi^\psi_z)$ and $\varpi^\psi_z(y) \in T_p(M)$. So the
function $\varpi^\psi_z\big|_{T_{z,V}(T(M))} : T_{z,V}(T(M)) \to T_p(M)$ is well defined. To show surjectivity, let $W \in T_p(M)$. Then
$W = t_{p,w_2,\psi}$ for some $w_2 \in \mathbb{R}^n$ by Notation 54.1.4, where $n = \dim(M)$. Similarly, $z = t_{p,u,\psi}$ and $V = t_{p,w_1,\psi}$
for some $u, w_1 \in \mathbb{R}^n$. Let $y = t^{(2)}_{p,u,w_1,w_2,\psi}$. Then $y \in T_{z,V}(T(M))$ and $\varpi^\psi_z(y) = t_{p,w_2,\psi} = W$ by
Definition 59.3.2. Thus $W \in \mathrm{Range}(\varpi^\psi_z)$. Therefore $\varpi^\psi_z\big|_{T_{z,V}(T(M))} : T_{z,V}(T(M)) \to T_p(M)$ is surjective.

To show injectivity, let $W \in T_p(M)$, and let $y, y' \in T_{z,V}(T(M))$ satisfy $\varpi^\psi_z(y) = \varpi^\psi_z(y') = W$. Then
$y = t^{(2)}_{p,u,w_1,w_2,\psi} = y'$, where $u = \Phi(\psi)(z)$, $w_1 = \Phi(\psi)(V)$ and $w_2 = \Phi(\psi)(W)$. Thus $\varpi^\psi_z\big|_{T_{z,V}(T(M))} :
T_{z,V}(T(M)) \to T_p(M)$ is injective, and is therefore a bijection. Hence $\bigl(\varpi^\psi_z\big|_{T_{z,V}(T(M))}\bigr)^{-1} : T_p(M) \to T_{z,V}(T(M))$
is a well-defined bijection by Theorem 10.5.11.


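59.3.5 DEFINITION: The (chartwise) (oblique) drop function for a $C^2$ manifold $M$ and $\psi \in \mathrm{atlas}(M)$ is the
map $\varpi^\psi : \bigcup_{z \in \pi^{-1}(\mathrm{Dom}(\psi))} T_z(T(M)) \to T(M)$ defined by

    $\forall z \in \pi^{-1}(\mathrm{Dom}(\psi)),\ \forall y \in T_z(T(M)),\quad \varpi^\psi(y) = \varpi^\psi_z(y).$

In other words, $\varpi^\psi = \bigcup_{z \in \pi^{-1}(\mathrm{Dom}(\psi))} \varpi^\psi_z$.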


59.3.6 REMARK: Management of the four coordinate tuples for double tangent vectors.

Each vector in the double tangent space $T(T(M))$ of a $C^2$ manifold $M$ may be coordinatised by four
real $n$-tuples, where $n = \dim(M)$. For each $W \in T_z(T(M))$ and $\psi \in \mathrm{atlas}_{\pi(z)}(M)$, the manifold chart
$\Psi(\Psi(\psi)) : (\pi^{(2)})^{-1}(\pi^{-1}(\mathrm{Dom}(\psi))) \to (\mathbb{R}^n \times \mathbb{R}^n) \times (\mathbb{R}^n \times \mathbb{R}^n) \equiv \mathbb{R}^{4n}$ maps $W$ to some $(x,u,v,w) \in \mathbb{R}^{4n}$
with $x, u, v, w \in \mathbb{R}^n$, where $\pi^{(2)} : T(T(M)) \to T(M)$ is the projection map as in Definition 59.1.22.

The purpose of Theorem 59.3.7 is to give some ways of extracting the 4th $n$-tuple $w$ from a given second-level
vector $W \in T(T(M))$. The application of $\varpi^\psi$ to $W$ moves the 4th $n$-tuple to the second place.
Thus $(x,u,v,w)$ is effectively mapped to $(x,w)$. (More precisely, $t^{(2)}_{\psi^{-1}(x),u,v,w,\psi}$ is mapped to $t_{\psi^{-1}(x),w,\psi}$.)
The tangent bundle fibre chart $\Phi(\psi) : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$ then extracts the $n$-tuple $w$. This combined
operation $\Phi(\psi) \circ \varpi^\psi$ is on the left side of line (59.3.1). The procedure on the right side of line (59.3.1) is
to first (effectively) map $(x,u,v,w)$ to $(x,w)$, as on the left side, and then apply the manifold chart $\Psi(\psi)$
to extract $(x,w)$ from the vector $t_{\psi^{-1}(x),w,\psi}$, and then finally apply $\Pi^{2n}_{n+1}$ to extract $w$ from $(x,w)$. The other
operations are similar but different.


The operations in Theorem 59.3.7 are admittedly tedious and ponderous, but this is the price to be paid
for abstraction. Since coordinates cannot be avoided for concrete computations, such computations often
require abstract geometric structures to be mapped to Cartesian space structures so that some analysis
can be performed. Then the results must be re-packed into the abstract structures. Such unpacking and
repacking procedures are required in particular for affine connections and various curvature tensors.


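59.3.7 THEOREM: Extraction of the vertical-component coordinate-tuple from a second-level vector.
Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Then

    $\forall \psi \in \mathrm{atlas}(M),\quad \Phi(\psi) \circ \varpi^\psi = \Pi^{2n}_{n+1} \circ \Psi(\psi) \circ \varpi^\psi$    (59.3.1)
    $\qquad = \Pi^{2n}_{n+1} \circ \Phi(\Psi(\psi))$
    $\qquad = \Pi^{4n}_{3n+1} \circ \Psi(\Psi(\psi)).$

(See Definition 14.6.11 for the subsequence maps $\Pi^{2n}_{n+1} : \mathbb{R}^{2n} \to \mathbb{R}^n$ and $\Pi^{4n}_{3n+1} : \mathbb{R}^{4n} \to \mathbb{R}^n$.)


PROOF: Let $p \in M$ and $z \in T_p(M)$ and $W \in T_z(T(M))$. Let $\psi \in \mathrm{atlas}_p(M)$. Then $z = t_{p,u,\psi}$ for some
(unique) $u \in \mathbb{R}^n$ and $W = t^{(2)}_{p,u,v,w,\psi}$ for some (unique) $v, w \in \mathbb{R}^n$. By Definition 59.3.2, $\varpi^\psi(W) = t_{p,w,\psi}$.
Let $x = \psi(p)$. Then $\Phi(\psi)(\varpi^\psi(W)) = w$ and $\Psi(\psi)(\varpi^\psi(W)) = (x,w)$ by Notations 54.5.7 and 54.5.21. So
$\Pi^{2n}_{n+1}(\Psi(\psi)(\varpi^\psi(W))) = w = \Phi(\psi)(\varpi^\psi(W))$. Therefore $\Phi(\psi) \circ \varpi^\psi = \Pi^{2n}_{n+1} \circ \Psi(\psi) \circ \varpi^\psi$.

The second-level fibre chart $\Phi(\Psi(\psi))$ for $T(T(M))$ maps second-level vectors $t^{(2)}_{p,u,v,w,\psi}$ to corresponding
tuples $(v,w)$. Then $\Pi^{2n}_{n+1}$ extracts $w$ from such $2n$-tuples. Therefore $\Phi(\psi) \circ \varpi^\psi = \Pi^{2n}_{n+1} \circ \Phi(\Psi(\psi))$.


The second-level manifold chart $\Psi(\Psi(\psi))$ for $T(T(M))$ maps vectors $t^{(2)}_{p,u,v,w,\psi}$ to corresponding tuples
$(x,u,v,w)$. Then $\Pi^{4n}_{3n+1}$ extracts $w$ from such $4n$-tuples. So $\Phi(\psi) \circ \varpi^\psi = \Pi^{4n}_{3n+1} \circ \Psi(\Psi(\psi))$.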
59.3.8 REMARK: Inverse relations between vertical and oblique drop functions.

Theorem 59.3.9 is quite shallow. Since $\varpi^\psi_z$ is merely an extension of $\varpi_z$ from $T_{z,0}(T(M))$ to $T_z(T(M))$,
which is stated in line (59.3.2), and $\varpi_z : T_{z,0}(T(M)) \to T_{\pi(z)}(M)$ is a bijection, lines (59.3.3) and (59.3.4)
are effectively only a statement of the definition of an inverse map. Nevertheless, it is useful to package this
as a theorem so that it can be used without distracting attention from proofs where it is used.


Lines (59.3.5), (59.3.6) and (59.3.7) are slightly less shallow. But they are still not very deep.


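59.3.9 THEOREM: Oblique drop functions and inverses of vertical drop functions.
Let $M$ be a $C^2$ manifold. Then

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall y \in T_{z,0}(T(M)),$
    $\varpi_z(y) = \varpi^\psi_z(y).$    (59.3.2)

Hence

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall y \in T_{z,0}(T(M)),$
    $y = \varpi_z^{-1}(\varpi^\psi_z(y))$    (59.3.3)

and

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall y \in T_{z,0}(T(M)),$
    $y = \bigl(\varpi^\psi_z\big|_{T_{z,0}(T(M))}\bigr)^{-1}(\varpi_z(y)).$    (59.3.4)

Consequently $\varpi_z^{-1} \circ \varpi^\psi_z\big|_{T_{z,0}(T(M))} = \mathrm{id}_{T_{z,0}(T(M))}$ and $\bigl(\varpi^\psi_z\big|_{T_{z,0}(T(M))}\bigr)^{-1} \circ \varpi_z = \mathrm{id}_{T_{z,0}(T(M))}$ for all $z \in T_p(M)$ and
$\psi \in \mathrm{atlas}_p(M)$, for all $p \in M$. Similarly,

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall W \in T_p(M),$
    $\varpi_z^{-1}(W) = \bigl(\varpi^\psi_z\big|_{T_{z,0}(T(M))}\bigr)^{-1}(W).$    (59.3.5)

Hence

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall W \in T_p(M),$
    $W = \varpi_z\bigl(\bigl(\varpi^\psi_z\big|_{T_{z,0}(T(M))}\bigr)^{-1}(W)\bigr)$    (59.3.6)

and

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall W \in T_p(M),$
    $\varpi^\psi_z(\varpi_z^{-1}(W)) = W.$    (59.3.7)

Consequently $\varpi_z \circ \bigl(\varpi^\psi_z\big|_{T_{z,0}(T(M))}\bigr)^{-1} = \mathrm{id}_{T_p(M)}$ and $\varpi^\psi_z \circ \varpi_z^{-1} = \mathrm{id}_{T_p(M)}$ for all $z \in T_p(M)$ and $\psi \in \mathrm{atlas}_p(M)$,
for all $p \in M$.

PROOF: Line (59.3.2) follows from Definitions 59.2.9 and 59.3.2. Line (59.3.3) follows from line (59.3.2)
and Theorem 59.2.11. Line (59.3.4) follows from line (59.3.2) and Theorem 59.3.4. Line (59.3.5) follows from
line (59.3.2) and Theorems 59.2.11 and 59.3.4. Line (59.3.6) follows from line (59.3.5) and Theorem 59.2.11.
Line (59.3.7) follows from line (59.3.5) and Theorem 59.3.4.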


59.4. Scaling curves and constant-scale maps 


59.4.1 REMARK: Scaling a second-level vector also scales its horizontal component.

Theorem 59.4.2 (ii) confirms that multiplying a second-level vector by a real number yields a vector with a
scaled horizontal component. This fact may be written as $\lambda T_{z,V}(T(M)) \subseteq T_{z,\lambda V}(T(M))$. This has particular
relevance when the naive directional derivative of a vector field is multiplied by a real-valued function, as is
done for the Leibniz rule for naive derivatives in Theorem 61.3.3. The significant point here is that out of the
$4n$ components of a second-level vector only a particular $2n$ components are scaled. The rest are unchanged.
59.4.2 THEOREM: Scaling of components of second-level vectors.
Let $M$ be a $C^2$ manifold with $n = \dim(M)$.

(i) $\forall p \in M,\ \forall z \in T_p(M),\ \forall (h,v) \in \mathbb{R}^n \times \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall \lambda \in \mathbb{R},\quad \lambda t_{z,(h,v),\Psi(\psi)} = t_{z,(\lambda h,\lambda v),\Psi(\psi)}.$
(ii) $\forall p \in M,\ \forall z, V \in T_p(M),\ \forall \lambda \in \mathbb{R},\ \forall y \in T_{z,V}(T(M)),\quad \lambda y \in T_{z,\lambda V}(T(M)).$


PROOF: Part (i) follows from Definition 54.4.4 (ii).
For part (ii), let $p \in M$, $z, V \in T_p(M)$ and $y \in T_{z,V}(T(M))$. Let $\psi \in \mathrm{atlas}_p(M)$. Then $y = t_{z,(h,v),\Psi(\psi)}$ for
some $v \in \mathbb{R}^n$ by Theorem 59.2.7 line (59.2.1), where $h = \Phi(\psi)(V)$. Therefore $\lambda y = t_{z,(\lambda h,\lambda v),\Psi(\psi)}$ by part (i).
Hence $\lambda y \in T_{z,\lambda V}(T(M))$ by Theorem 59.2.7 line (59.2.1).


59.4.3 REMARK: The verticality of “tangent vector scaling curve” velocity fields.

A motivation for Theorem 59.4.4 arises naturally from the Leibniz rule for naive derivatives of vector fields in
Theorem 61.3.3. The most obvious value for the derivative $\partial_\lambda(\lambda t_{p,v,\psi})$ would be $t_{p,v,\psi}$. However, according
to Definition 57.9.2, the velocity field of a curve takes values in the tangent bundle of the manifold which
is the target space for the curve. Since the map $\lambda \mapsto \lambda t_{p,v,\psi}$ is a curve in $T(M)$, its vector field must take
values in $T(T(M))$. This vector field has vertical values for this special kind of curve. So it is possible to
apply a drop function to obtain the value $t_{p,v,\psi}$, which is exactly the same as the naive interpretation of the
derivative $\partial_\lambda(\lambda t_{p,v,\psi})$. (For a general vector bundle version of Theorem 59.4.4, see Theorem 65.5.2.)


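59.4.4 THEOREM: The velocity vector field of the “scaling curve” of a tangent vector.
Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Then

    $\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall \lambda_0 \in \mathbb{R},$
    $\partial_\lambda(\lambda t_{p,v,\psi})\big|_{\lambda=\lambda_0} = \partial_\lambda(t_{p,\lambda v,\psi})\big|_{\lambda=\lambda_0}$    (59.4.1)
    $\qquad = \gamma'(\lambda_0)$
    $\qquad = t_{\gamma(\lambda_0),(0,v),\Psi(\psi)}$    (59.4.2)
    $\qquad = t^{(2)}_{p,\lambda_0 v,0,v,\psi}$    (59.4.3)
    $\qquad = \varpi^{-1}_{t_{p,\lambda_0 v,\psi}}(t_{p,v,\psi}),$    (59.4.4)

where the curve abbreviation $\gamma : \mathbb{R} \to T(M)$ denotes the map $\lambda \mapsto \lambda t_{p,v,\psi} = t_{p,\lambda v,\psi}$. Hence

    $\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall \lambda_0 \in \mathbb{R},$
    $\varpi_{t_{p,\lambda_0 v,\psi}}\bigl(\partial_\lambda(\lambda t_{p,v,\psi})\big|_{\lambda=\lambda_0}\bigr) = t_{p,v,\psi}.$


PROOF: Line (59.4.1) follows from Definition 54.4.4 (ii). Line (59.4.2) follows from Definition 57.9.2 because
$\Psi(\psi)(\gamma(\lambda)) = \Psi(\psi)(t_{p,\lambda v,\psi}) = (\psi(p), \lambda v) \in \mathbb{R}^n \times \mathbb{R}^n$ for all $\lambda \in \mathbb{R}$, which implies $\partial_\lambda \Psi(\psi)(\gamma(\lambda)) = (0, v)$,
where $\Psi(\psi) \in \mathrm{atlas}_{\gamma(\lambda)}(T(M))$ for all $\lambda \in \mathbb{R}$ by Definition 54.5.22 and Notation 54.5.21. Line (59.4.3)
then follows from Notation 59.1.12. The well-definition of the inverse drop function on line (59.4.4) follows
from Theorem 59.2.11, and the equality $t^{(2)}_{p,\lambda_0 v,0,v,\psi} = \varpi^{-1}_{t_{p,\lambda_0 v,\psi}}(t_{p,v,\psi})$ then follows from Definition 59.2.9 and
Theorem 59.2.11 line (59.2.4).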


59.4.5 REMARK: Abstract vector formulas for scaling curve velocity vector fields.

Whereas Theorem 59.4.4 expresses tangent vectors $t_{p,v,\psi}$ in terms of components and charts, Theorem 59.4.6
presents some of the same formulas in terms of abstract vectors $V$. This makes the comparison with the
corresponding vector bundle formulas in Theorem 65.5.2 slightly easier.


59.4.6 THEOREM: The velocity vector field of the “scaling curve” of an abstract tangent vector.
Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Then

    $\forall V \in T(M),\ \forall \lambda_0 \in \mathbb{R},\quad \partial_\lambda(\lambda V)\big|_{\lambda=\lambda_0} = \varpi^{-1}_{\lambda_0 V}(V).$

Hence

    $\forall V \in T(M),\ \forall \lambda_0 \in \mathbb{R},\quad \varpi_{\lambda_0 V}\bigl(\partial_\lambda(\lambda V)\big|_{\lambda=\lambda_0}\bigr) = V.$


PROOF: The assertions follow from Theorem 59.4.4 line (59.4.4) and Notation 54.1.4.
59.4.7 REMARK: The differential of the “tangent vector scaling curve”, regarding $\mathbb{R}$ as a manifold.

If $\mathbb{R}$ is regarded as a manifold rather than a number system, it can be given the atlas $A_{\mathbb{R}} = \{\mathrm{id}_{\mathbb{R}}\}$, which
makes $(\mathbb{R}, A_{\mathbb{R}})$ a $C^\infty$ manifold. Then by defining the map $R_z : \mathbb{R} \to T(M)$ by $\lambda \mapsto \lambda z$ for some $z \in T(M)$,
where $M$ is a $C^2$ manifold, $R_z$ becomes a $C^1$ map between $C^1$ manifolds $\mathbb{R}$ and $T(M)$ instead of a $C^1$
curve in $T(M)$. Thus the curve $\gamma : \lambda \mapsto \lambda z$ with $z = t_{p,v,\psi}$ in Theorem 59.4.4 becomes a $C^1$ map between
manifolds for which the differential may be computed as in Definition 58.4.5. This point of view, taken in
Theorem 59.4.8, is applicable to the Leibniz formula for naive derivatives of vector fields in Theorem 61.3.3.


As one would expect, the “scaling curve velocity field” approach in Theorem 59.4.4 ultimately gives the same
answer as the “map on manifold $\mathbb{R}$” approach in Theorem 59.4.8. This can be seen by setting $u = 1$ because
the source vector $t_{\lambda_0,u,\mathrm{id}_{\mathbb{R}}}$ represents the old-fashioned differential “$u\,d\lambda$” located at $\lambda_0$. So $u = 1$ gives the
old-fashioned differential “$d\lambda$”, which is implicit in the definition of a velocity curve. (See also Remark 57.9.9
for this idea of regarding a curve as a map on the manifold $\mathbb{R}$.)


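59.4.8 THEOREM: The differential of the tangent vector scaling curve as a map on the manifold $\mathbb{R}$.
Let $M$ be a $C^2$ manifold. Then

    $\forall p \in M,\ \forall z \in T_p(M),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall \lambda_0, u \in \mathbb{R},$
    $(dR_z)_{\lambda_0}(t_{\lambda_0,u,\mathrm{id}_{\mathbb{R}}}) = t_{\lambda_0 z,(0,\,u\Phi(\psi)(z)),\Psi(\psi)}$    (59.4.5)
    $\qquad = t^{(2)}_{p,\lambda_0\Phi(\psi)(z),0,u\Phi(\psi)(z),\psi}$    (59.4.6)
    $\qquad = u\,\varpi^{-1}_{\lambda_0 z}(z),$    (59.4.7)

where $R_z : \mathbb{R} \to T(M)$ denotes the map $\lambda \mapsto \lambda z$ for $z \in T(M)$.


PROOF: Let $v = \Phi(\psi)(z)$. Then $z = t_{p,v,\psi}$, and by Definition 58.4.5, $(dR_z)_{\lambda_0}(t_{\lambda_0,u,\mathrm{id}_{\mathbb{R}}}) = t_{\lambda_0 z,w,\Psi(\psi)}$, where
$w \in \mathbb{R}^n \times \mathbb{R}^n$, with $n = \dim(M)$, is given by

    $w = u\,\partial_\lambda\bigl((\Psi(\psi) \circ R_z \circ \mathrm{id}_{\mathbb{R}}^{-1})(\lambda)\bigr)\big|_{\lambda=\mathrm{id}_{\mathbb{R}}(\lambda_0)}$
    $\quad = u\,\partial_\lambda\bigl(\Psi(\psi)(R_z(\lambda))\bigr)\big|_{\lambda=\lambda_0}$
    $\quad = u\,\partial_\lambda\bigl(\psi(\pi(\lambda z)), \Phi(\psi)(\lambda z)\bigr)\big|_{\lambda=\lambda_0}$    (59.4.8)
    $\quad = u\,\partial_\lambda\bigl(\psi(p), \lambda v\bigr)\big|_{\lambda=\lambda_0}$    (59.4.9)
    $\quad = u\,(0_{\mathbb{R}^n}, v)$
    $\quad = (0_{\mathbb{R}^n}, u\Phi(\psi)(z)),$

where line (59.4.8) follows from Notation 54.5.21, and line (59.4.9) follows from Theorem 54.5.8. This verifies
line (59.4.5). Then line (59.4.6) follows from Notation 59.1.12. For line (59.4.7), note that

    $t^{(2)}_{p,\lambda_0\Phi(\psi)(z),0,u\Phi(\psi)(z),\psi} = u\,t^{(2)}_{p,\lambda_0\Phi(\psi)(z),0,\Phi(\psi)(z),\psi}$    (59.4.10)
    $\qquad = u\,t^{(2)}_{p,\Phi(\psi)(\lambda_0 z),0,\Phi(\psi)(z),\psi}$    (59.4.11)
    $\qquad = u\,\varpi^{-1}_{\lambda_0 z}(z),$    (59.4.12)

where line (59.4.10) follows by Theorem 59.4.2 (i), line (59.4.11) follows by Theorem 54.5.8, and line (59.4.12)
follows by Theorem 59.2.11.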


59.4.9 REMARK: Component-scaling by the differential of a constant-scale tangent vector map. 

Perhaps one small surprise in Theorem 59.4.10 is that the two component n-tuples which are scaled are 
different to the two which are scaled in Theorem 59.4.2. The constant-scale map in Theorem 59.4.10 should 
not be confused with the variable-scale map in Theorem 59.4.8, which may appear similar at first sight. 


Theorem 59.4.10 is used in the proof of Theorem 61.3.3, which is a Leibniz rule for the naive derivative of the 
product of a real function and a vector field, and this Leibniz rule is then used to prove the corresponding 
Leibniz rule for covariant derivatives in Theorem 71.6.7 for affine connections on tangent bundles. 
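59.4.10 THEOREM: The differential of constant-scale tangent vector maps.
Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Then

    $\forall p \in M,\ \forall \lambda \in \mathbb{R},\ \forall u, w_1, w_2 \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),$
    $(dL_\lambda)_{t_{p,u,\psi}}(t^{(2)}_{p,u,w_1,w_2,\psi}) = t^{(2)}_{p,\lambda u,w_1,\lambda w_2,\psi},$    (59.4.13)

where $L_\lambda : T(M) \to T(M)$ is defined by $L_\lambda : z \mapsto \lambda z$. In other words,

    $\forall p \in M,\ \forall \lambda \in \mathbb{R},\ \forall z \in T_p(M),\ \forall w_1, w_2 \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),$
    $(dL_\lambda)_z(t_{z,(w_1,w_2),\Psi(\psi)}) = t_{\lambda z,(w_1,\lambda w_2),\Psi(\psi)}.$    (59.4.14)

Hence

    $\forall p \in M,\ \forall \lambda \in \mathbb{R},\ \forall z \in T_p(M),\ \forall y \in T_z(T(M)),\ \forall \psi \in \mathrm{atlas}_p(M),$
    $\varpi^\psi_{\lambda z}((dL_\lambda)_z(y)) = \lambda\varpi^\psi_z(y).$    (59.4.15)

Consequently

    $\forall p \in M,\ \forall \lambda \in \mathbb{R},\ \forall z, V \in T_p(M),\ \forall y \in T_{z,V}(T(M)),\ \forall \psi \in \mathrm{atlas}_p(M),$
    $(dL_\lambda)_z(y) = \bigl(\varpi^\psi_{\lambda z}\big|_{T_{\lambda z,V}(T(M))}\bigr)^{-1}(\lambda\varpi^\psi_z(y)).$    (59.4.16)


PROOF: Let $z = t_{p,u,\psi}$ and $w \in \mathbb{R}^{2n}$. Then by Definition 58.4.5, $(dL_\lambda)_z(t_{z,w,\Psi(\psi)}) = t_{\lambda z,w',\Psi(\psi)}$, where

    $w' = \sum_{i=1}^{2n} w^i\,\partial_{x^i}\bigl(\Psi(\psi)(L_\lambda(\Psi(\psi)^{-1}(x)))\bigr)\big|_{x=\Psi(\psi)(z)}$
    $\quad = \sum_{i=1}^{2n} w^i\,\partial_{x^i}\bigl(\Psi(\psi)(L_\lambda(t_{\psi^{-1}(x_1),x_2,\psi}))\bigr)\big|_{x_1=\psi(p),\,x_2=u}$    (59.4.17)
    $\quad = \sum_{i=1}^{2n} w^i\,\partial_{x^i}\bigl(\Psi(\psi)(t_{\psi^{-1}(x_1),\lambda x_2,\psi})\bigr)\big|_{x_1=\psi(p),\,x_2=u}$    (59.4.18)
    $\quad = \sum_{i=1}^{2n} w^i\,\partial_{x^i}(x_1, \lambda x_2)\big|_{x_1=\psi(p),\,x_2=u}$
    $\quad = \sum_{i=1}^{n} w^i (e_i, 0_{\mathbb{R}^n}) + \sum_{j=1}^{n} w^{n+j} (0_{\mathbb{R}^n}, \lambda e_j)$
    $\quad = \bigl(\sum_{i=1}^{n} w_1^i e_i,\ \lambda\sum_{j=1}^{n} w_2^j e_j\bigr)$
    $\quad = (w_1, \lambda w_2),$

where $w = (w_1, w_2) \in \mathbb{R}^n \times \mathbb{R}^n$ for $w \in \mathbb{R}^{2n}$, $x = (x_1, x_2) \in \mathbb{R}^n \times \mathbb{R}^n$ for $x \in \mathbb{R}^{2n}$, line (59.4.17) follows
from Notation 54.5.21, and line (59.4.18) follows from Definition 54.4.4 (ii). This verifies lines (59.4.13)
and (59.4.14). Then line (59.4.15) follows from Definitions 59.3.2 and 54.4.4 (ii).


For line (59.4.16), let $y \in T_{z,V}(T(M))$. Then $(dL_\lambda)_z(y) \in T_{\lambda z,V}(T(M))$ by line (59.4.13). Therefore by
Theorem 59.3.4, $(dL_\lambda)_z(y) \in \mathrm{Dom}\bigl(\varpi^\psi_{\lambda z}\big|_{T_{\lambda z,V}(T(M))}\bigr)$, and then $(dL_\lambda)_z(y) = \bigl(\varpi^\psi_{\lambda z}\big|_{T_{\lambda z,V}(T(M))}\bigr)^{-1}(\lambda\varpi^\psi_z(y))$ follows
from line (59.4.15) because $\varpi^\psi_{\lambda z}\big|_{T_{\lambda z,V}(T(M))} : T_{\lambda z,V}(T(M)) \to T_p(M)$ is a bijection. Hence line (59.4.16).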
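
The difference noted in Remark 59.4.9 can be seen by placing the two scaling rules side by side. The sketch below only restates Theorem 59.4.2 (i) and line (59.4.13), writing $t^{(2)}_{p,u,w_1,w_2,\psi}$ for $t_{z,(w_1,w_2),\Psi(\psi)}$ with $z = t_{p,u,\psi}$, and then composes them.

```latex
% Scalar multiplication in the linear space T_z(T(M)) scales the 3rd and 4th tuples:
\[
  \lambda\, t^{(2)}_{p,u,w_1,w_2,\psi} = t^{(2)}_{p,\,u,\,\lambda w_1,\,\lambda w_2,\,\psi} .
\]
% The differential of L_\lambda : z \mapsto \lambda z scales the 2nd and 4th tuples:
\[
  (dL_\lambda)_z\bigl(t^{(2)}_{p,u,w_1,w_2,\psi}\bigr)
  = t^{(2)}_{p,\,\lambda u,\,w_1,\,\lambda w_2,\,\psi} .
\]
% Composition of the two operations (the order is immaterial because the differential is linear):
\[
  (dL_\lambda)_z\bigl(\lambda\, t^{(2)}_{p,u,w_1,w_2,\psi}\bigr)
  = \lambda\,(dL_\lambda)_z\bigl(t^{(2)}_{p,u,w_1,w_2,\psi}\bigr)
  = t^{(2)}_{p,\,\lambda u,\,\lambda w_1,\,\lambda^2 w_2,\,\psi} .
\]
```

The factor $\lambda^2$ on the fourth tuple in the composed operation is the quadratic scaling which characterises sprays.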


59.4.11 REMARK: Velocity and differential of scaling curves for linear-space tangent vectors. 

The standard $C^\infty$ manifold atlas for finite-dimensional real linear spaces is introduced in Definition 49.7.14.
The full differentiable manifold structure for such spaces is given by Definition 51.4.21. The vertical drop
function for such spaces is given by Definition 54.9.5.


Since finite-dimensional real linear spaces can be given a natural $C^\infty$ manifold structure, it is possible to
apply Theorems 59.4.4, 59.4.6, 59.4.8 and 59.4.10 to them. This is useful when applied to the fibre spaces
of differentiable vector bundles. Theorem 57.9.11 is a linear space version of Theorem 59.4.6, which is used
in the proof of Theorem 65.5.2, the general vector bundle version of Theorems 59.4.4 and 59.4.6.


((2019-6-16. To be continued ... )) 
59.5. Sprays on tangent bundles


59.5.1 REMARK: Sprays on tangent bundles are second-level vector fields with quadratic scaling. 

A spray on a tangent bundle $T(M)$ is a second-level vector field, which means that it is a $T(T(M))$-valued
vector field on $T(M)$, whereas first-level vector fields are $T(M)$-valued vector fields on $M$. So a spray
$S$ on $T(M)$ is a function $S : T(M) \to T(T(M))$ which satisfies $S(z) \in T_z(T(M))$ for all $z \in T(M)$.
Therefore on line (59.5.1), $\lambda S(z) \in T_z(T(M))$ implies that the expression $(dL_\lambda)_z(\lambda S(z))$ is well defined.
(The well-definition of $(dL_\lambda)_z$ on line (59.5.1) follows from the proof of Theorem 59.4.10.) The equality on
line (59.5.2) follows from the linearity of the induced map $(dL_\lambda)_z : T_z(T(M)) \to T_{\lambda z}(T(M))$ for each $\lambda \in \mathbb{R}$
and $z \in T(M)$.

“Sprays” as in Definition 59.5.2, and the closely related “geodesic sprays”, are presented by Lang [23], 
pages 99-109; Poor [32], pages 95-102; Crampin/Pirani [7], pages 336-339. (According to Lang [23], page 103, 
sprays were introduced in a 1960 paper by W. Ambrose, R.S. Palais and I.M. Singer.) 


Definition 65.5.8 generalises Definition 59.5.2 from tangent bundles to vector bundles. 


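59.5.2 DEFINITION: A spray on the tangent bundle $T(M)$ of a $C^2$ manifold $M$ is a cross-section
$S \in X(T(T(M)))$ which satisfies

    $\forall \lambda \in \mathbb{R},\ \forall z \in T(M),\quad S(\lambda z) = (dL_\lambda)_z(\lambda S(z))$    (59.5.1)
    $\qquad = \lambda(dL_\lambda)_z(S(z)),$    (59.5.2)

where $L_\lambda : T(M) \to T(M)$ is defined by $L_\lambda : z \mapsto \lambda z$ for all $\lambda \in \mathbb{R}$ and $z \in T(M)$. In other words,

    $\forall \lambda \in \mathbb{R},\quad S \circ L_\lambda = (L_\lambda)_* \circ L^{(2)}_\lambda \circ S$
    $\qquad = L^{(2)}_\lambda \circ (L_\lambda)_* \circ S,$

where $L^{(2)}_\lambda : T(T(M)) \to T(T(M))$ is defined by $L^{(2)}_\lambda : y \mapsto \lambda y$ for all $\lambda \in \mathbb{R}$ and $y \in T(T(M))$.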
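
In components, the spray condition can be unpacked as follows. This is a sketch which assumes the component rules $\lambda\,t^{(2)}_{p,u,w_1,w_2,\psi} = t^{(2)}_{p,u,\lambda w_1,\lambda w_2,\psi}$ (Theorem 59.4.2 (i)) and $(dL_\lambda)_z(t^{(2)}_{p,u,w_1,w_2,\psi}) = t^{(2)}_{p,\lambda u,w_1,\lambda w_2,\psi}$ (Theorem 59.4.10). The component functions $a$ and $b$, and the Christoffel array $\Gamma^i_{jk}$ in the final comment, are names introduced here for illustration.

```latex
% Write the spray in a chart \psi, with x = \psi(p) and z = t_{p,u,\psi}:
\[
  S(t_{p,u,\psi}) = t^{(2)}_{p,\,u,\,a(x,u),\,b(x,u),\,\psi},
  \qquad a(x,u),\ b(x,u) \in \mathbb{R}^n .
\]
% Right-hand side of the spray condition S(\lambda z) = \lambda (dL_\lambda)_z(S(z)):
\[
  \lambda\,(dL_\lambda)_z\bigl(S(z)\bigr)
  = \lambda\, t^{(2)}_{p,\,\lambda u,\,a(x,u),\,\lambda b(x,u),\,\psi}
  = t^{(2)}_{p,\,\lambda u,\,\lambda a(x,u),\,\lambda^2 b(x,u),\,\psi} .
\]
% Left-hand side: S(\lambda z) = t^{(2)}_{p,\lambda u, a(x,\lambda u), b(x,\lambda u),\psi}.
% Comparing components, the spray condition is equivalent to
\[
  a(x,\lambda u) = \lambda\, a(x,u), \qquad
  b(x,\lambda u) = \lambda^2\, b(x,u)
  \qquad \text{for all } \lambda \in \mathbb{R} .
\]
% Example (assumed convention): a(x,u) = u and
% b^i(x,u) = -\sum_{j,k} \Gamma^i_{jk}(x) u^j u^k satisfy both conditions.
```

The second condition is the “quadratic scaling” of the vertical tuple.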


59.5.3 REMARK: The meaning of sprays.

The spray concept is applicable to definitions of symmetric affine connections on tangent bundles. Such
connections can be reconstructed from “geodesic sprays”, which are families of geodesics with the same
starting point. (This kind of reconstruction is related to the Schild’s ladder concept in Remark 72.2.2.)


The information in a symmetric (i.e. torsion-free) affine connection on a $C^2$ manifold $M$ may be extracted
by lifting every vector $V$ in the tangent bundle $T(M)$ in the direction of $V$ itself. This yields a second-level
vector $\theta_V(V) \in T_V(T(M))$ for all $V \in T(M)$. (See Definition 67.5.4 for horizontal lift functions $\theta$ on
general differentiable fibre bundles.) Since $V$ appears twice in the expression $\theta_V(V)$, the map $V \mapsto \theta_V(V)$ is
quadratic with respect to the scaling of $V$. Then, by using the cosine law for triangles, the original bilinear
connection can be recovered from this quadratic map.


The horizontal lift expression $\theta_V(V)$ may be obtained either directly from the horizontal lift function, or
else from families of geodesics which are integral curves of the horizontal lift function. In the latter case, the
term “geodesic spray” accurately describes the “spray” of geodesic curves emanating from each point in the
manifold $M$. The quadratic spray function $S \in X(T(T(M)))$ can be extracted from the sprays of geodesics,
and the entire horizontal lift function can be obtained from $S$.
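
The recovery of a symmetric bilinear map from its quadratic form, to which the “cosine law” remark alludes, is the polarisation identity. The sketch below states it for a generic symmetric bilinear map $B$ on a real linear space with quadratic form $Q$. Both names are introduced here, and the identification of $Q$ with the vertical part of the map from $V$ to its horizontal lift in the direction of $V$ is an assumption about how the remark is meant to be applied.

```latex
% B symmetric bilinear on a real linear space, Q(u) = B(u,u):
\[
  Q(u+v) = B(u,u) + 2B(u,v) + B(v,v) = Q(u) + 2B(u,v) + Q(v),
\]
\[
  B(u,v) = \tfrac12\bigl(Q(u+v) - Q(u) - Q(v)\bigr) .
\]
% Quadratic scaling, matching the \lambda^2 behaviour of a spray:
\[
  Q(\lambda u) = \lambda^2 Q(u) \qquad \text{for all } \lambda \in \mathbb{R} .
\]
```

Only the symmetric part of a bilinear map is determined by its quadratic form, which is consistent with the restriction to symmetric (torsion-free) connections.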


59.6. Horizontal component swap functions 


59.6.1 REMARK: Swapping velocity-vector parameters of second-level tangent spaces is “mostly harmless”.
Section 59.6 asserts that some subspaces of fibre sets in the second-level tangent bundle of a $C^2$ manifold
are “concretely identical”. This means that they are identical in much the same way that the spaces $T(V)$
and $V$ are “concretely identical” if $V$ is a Banach space. (See Remark 53.3.13.)

The issue here is not generally mentioned in differential geometry textbooks. In fact, it is rarely mentioned
in this book either. The drop function $\varpi$ in Section 59.2 is also generally glossed over in the literature.
Therefore it is probably harmless to ignore it. However, if one wishes to apply the fibre bundle formalism
in a logically consistent way, one should at least be aware of the issue. Section 59.6 can be safely skipped
because the standard informal way of adding and subtracting vectors in the formally disjoint linear spaces
$T_{z,V}(T(M))$ and $T_{V,z}(T(M))$ which are defined in Notation 59.2.5 is essentially safe, although it is somewhat
troubling to be adding vectors which are in disjoint spaces. In this case, two disjoint linear spaces happen to
always have the same transformation rules, and so a difference between elements of these spaces “transforms
like a vector”. This does not, however, necessarily imply that the difference is a vector.


59.6.2 REMARK: Symmetry of the second-level tangent space under velocity-vector swap. 

The motivation for introducing a “swap function” for the second-level tangent bundle $T(T(M))$, for any $C^2$
manifold $M$, is to identify certain subspaces of fibre sets within $T(T(M))$ which are “concretely identical”
but formally distinct.

To be specific, the linear spaces $T_{z,V}(T(M))$ and $T_{V,z}(T(M))$ are “concretely identical” in some sense for
all $z, V \in T_p(M)$, where $T_{z,V}(T(M))$ denotes the set of vectors in $T_z(T(M))$ with horizontal component $V$.
To see why this is so, let $\gamma : \mathbb{R}^2 \to M$ be a $C^2$ curve family in $M$, and define $X : \mathbb{R}^2 \to T(M)$ and
$Y : \mathbb{R}^2 \to T(M)$ by $X(s,t) = \partial_s\gamma(s,t)$ and $Y(s,t) = \partial_t\gamma(s,t)$ for all $(s,t) \in \mathbb{R}^2$. Then

\[
\forall (s,t) \in \mathbb{R}^2, \qquad
\begin{aligned}
\partial_t X(s,t) &= \partial_t\partial_s\gamma(s,t) \\
&= \partial_s\partial_t\gamma(s,t) \\
&= \partial_s Y(s,t).
\end{aligned}
\tag{59.6.1}
\]


This equivalence (ignoring the difference in $T(T(M))$ base points) is illustrated in Figure 59.6.1.


Figure 59.6.1 Equality of mixed derivatives of a $C^2$ curve family


The vertical components are the same, but the base points $z$ and $V$ are different. So line (59.6.1) is correct
only in the sense of the equality of vertical components.

Let $z = X(0,0)$ and $V = Y(0,0)$. Then $\partial_t X(0,t)|_{t=0} \in T_{z,V}(T(M))$ and $\partial_s Y(s,0)|_{s=0} \in T_{V,z}(T(M))$. But
$\partial_t X(0,t)|_{t=0} = \partial_s Y(s,0)|_{s=0}$. Consequently $T_{z,V}(T(M)) = T_{V,z}(T(M))$, since every vector in $T_{z,V}(T(M))$
is in $T_{V,z}(T(M))$, and vice versa. Although the elements of these spaces (which have the same vertical
component) are equal in this concrete sense, and therefore must have identical chart transition rules, they
are maintained as separate spaces in the fibre bundle idiom because each element is tagged differently.
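
The different “tagging” can be made visible in chart coordinates. The following sketch assumes that the four-component notation of Notation 59.1.12 lists, in order, the base point, the components of the first-level vector, the horizontal components and the vertical components, followed by the chart $\psi$. This ordering is inferred from the way the notation is used in Section 59.6, not quoted from Notation 59.1.12.

```latex
\begin{aligned}
\partial_t X(s,t) &= t^{(2)}_{\gamma(s,t),\;\partial_s(\psi\circ\gamma)(s,t),\;\partial_t(\psi\circ\gamma)(s,t),\;\partial_t\partial_s(\psi\circ\gamma)(s,t),\;\psi}, \\
\partial_s Y(s,t) &= t^{(2)}_{\gamma(s,t),\;\partial_t(\psi\circ\gamma)(s,t),\;\partial_s(\psi\circ\gamma)(s,t),\;\partial_s\partial_t(\psi\circ\gamma)(s,t),\;\psi}.
\end{aligned}
```

The last (vertical) component tuples agree because mixed partial derivatives of the $C^2$ function $\psi\circ\gamma$ commute, while the second and third component tuples appear in opposite order.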


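In terms of Notation 59.1.12, one has apparently $t^{(2)}_{p,w,v,\bar w,\psi} = t^{(2)}_{p,v,w,\bar w,\psi}$ for all $p \in M$, $v, w, \bar w \in \mathbb{R}^n$ and
$\tilde\psi \in \mathrm{atlas}_z(T(M))$, for a $C^2$ manifold $M$ with $n = \dim(M)$, where $z = t_{p,v,\psi}$, with $\psi \in \mathrm{atlas}_p(M)$. The
equality is only prevented by the fact that $t^{(2)}_{p,w,v,\bar w,\psi} \in T_{V,z}(T(M))$ and $t^{(2)}_{p,v,w,\bar w,\psi} \in T_{z,V}(T(M))$, where
$V = t_{p,w,\psi}$ and $z = t_{p,v,\psi}$, and $T_{V,z}(T(M)) \cap T_{z,V}(T(M)) = \emptyset$ if $V \ne z$. (It is assumed here that $\tilde\psi$ is the
chart for $T(M)$ which corresponds to $\psi$ for $M$. See Remark 59.1.7.)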

The concrete identification of the disjoint linear subspaces $T_{z,V}(T(M))$ and $T_{V,z}(T(M))$ may appear to be an
obscure matter, particularly since it is rarely (if ever) mentioned in differential geometry textbooks. However,
without this identification, it would be impossible to subtract the second-level tangent vectors $\partial_{X(p)}Y$ and
$\partial_{Y(p)}X$ to form Lie derivatives $(L_X Y)(p)$ in Section 61.8, or the formula $\tau(V,z) = \theta_z(V) - \theta_V(z)$ for the
torsion of a connection $\theta$ in Remark 71.12.2. Nor would it be possible to define the Riemann curvature
tensor components as $R^i{}_{jk\ell} = D_{e_k}\theta_{e_\ell}(e_j) - D_{e_\ell}\theta_{e_k}(e_j)$ in Remark 70.4.3. All such formulas would become
impossible in the fibre bundle idiom. Most authors content themselves with demonstrating merely that these
constructs are well defined because they transform correctly under chart transition maps. This is a necessary,
but not sufficient, condition for a construct to be well defined.


Generally one does require addition and subtraction to be applied only to vectors which are in the same linear
space. To override this “default protection”, the swap function $\Xi$ in Definition 59.6.3 may be applied between
the spaces $T_{z,V}(T(M))$ and $T_{V,z}(T(M))$ to bring the vectors in question into the same linear space so that
linear operations on them will be valid. In practice, one may then omit the application of such a “second-level
tangent space swap function”, but at least one should know in the background that the situation can
be “regularised”. (In other words, it should always be possible to rewrite informal mathematics as correct
mathematics. Preferably one should also know how to do this!)


It must not be forgotten that in general, the spaces $T_{z,V}(T(M))$ do transform differently, as indicated in
Theorem 59.1.15. So the swap function must be applied judiciously. It must also not be forgotten that the
spaces $T_{z,V}(T(M))$ and $T_{V,z}(T(M))$ are in fact formally disjoint, as are the spaces $T_{z,0}(T(M))$ and $T_p(M)$.
These pairs of spaces may be said to be ontologically identical, but formally distinct.


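59.6.3 DEFINITION: The swap function for a pair of vectors $z, V \in T_p(M)$ for some $p \in M$, for a $C^2$
manifold $M$ with $\dim(M) = n \in \mathbb{Z}_0^+$, is the map $\Xi_{z,V} : T_{z,V}(T(M)) \to T_{V,z}(T(M))$ defined by

\[
\forall \psi \in \mathrm{atlas}_p(M),\ \forall \bar w \in \mathbb{R}^n, \qquad
\Xi_{z,V}\bigl(t^{(2)}_{p,v,w,\bar w,\psi}\bigr) = t^{(2)}_{p,w,v,\bar w,\psi},
\]

where $z = t_{p,v,\psi}$ and $V = t_{p,w,\psi}$.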


59.6.4 REMARK: Domain and range of the pointwise swap function.
The domain $T_{z,V}(T(M))$ indicated for $\Xi_{z,V}$ in Definition 59.6.3 is the set $\{Y \in T_z(T(M));\ (d\pi)_z(Y) = V\}$.
(See Notation 59.2.5.) The requirement $Y \in T_z(T(M))$, with $z = t_{p,v,\psi}$, means that $Y = t^{(2)}_{p,v,\tilde w,\bar w,\psi}$ for
some $(\tilde w, \bar w) \in \mathbb{R}^n \times \mathbb{R}^n = \mathbb{R}^{2n}$. Then the requirement $(d\pi)_z(Y) = V$ means that $\tilde w = w$ since $V = t_{p,w,\psi}$.
Therefore $\Xi_{z,V}$ must be defined for $T_{z,V}(T(M)) = \{t^{(2)}_{p,v,w,\bar w,\psi};\ \bar w \in \mathbb{R}^n\}$, which in fact it is.


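59.6.5 REMARK: Global swap function.
One may define a global swap function for a second-level tangent bundle as the union of the individual swap
functions. Thus $\Xi = \bigcup_{p \in M} \bigcup_{z,V \in T_p(M)} \Xi_{z,V}$ and $\mathrm{Dom}(\Xi) = \mathrm{Range}(\Xi) = \bigcup_{p \in M} \bigcup_{z,V \in T_p(M)} T_{z,V}(T(M))$.


59.6.6 DEFINITION: The global swap function for a $C^2$ manifold $M$ with $\dim(M) = n \in \mathbb{Z}_0^+$ is the map
$\Xi : \bigcup_{p \in M} \bigcup_{z,V \in T_p(M)} T_{z,V}(T(M)) \to \bigcup_{p \in M} \bigcup_{z,V \in T_p(M)} T_{z,V}(T(M))$ defined by $\Xi = \bigcup_{p \in M} \bigcup_{z,V \in T_p(M)} \Xi_{z,V}$.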


59.6.7 REMARK: The swap function does not alter the chart-dependent drop function value.

Theorem 59.6.8 makes the technically useful assertion that the swap function replaces a second-level vector in
$T_{z,V}(T(M))$ with the corresponding unique vector in $T_{V,z}(T(M))$ which has the same oblique drop function
value. This observation has an application in the computation of the Levi-Civita connection for a Riemannian
metric function. (See Remark 74.2.12.)


59.6.8 THEOREM: The swap function does not alter the chart-dependent drop function value.
Let $M$ be a $C^2$ manifold. Then

\[
\begin{aligned}
&\forall p \in M,\ \forall z, V \in T_p(M),\ \forall W_1 \in T_{z,V}(T(M)),\ \forall W_2 \in T_{V,z}(T(M)),\ \forall \psi \in \mathrm{atlas}_p(M), \\
&\qquad \Xi(W_1) = W_2 \ \Leftrightarrow\ \varpi^\psi(W_1) = \varpi^\psi(W_2).
\end{aligned}
\]


PROOF: The assertion follows from Definitions 59.3.2, 59.3.5, 59.6.3 and 59.6.6. 


59.6.9 REMARK: Some basic technical properties of the swap function. 
Theorem 59.6.10 (iv) is useful for defining the Lie bracket in Definition 61.5.7, and then showing in the proof 
of Theorem 61.5.17 that the Lie bracket is antisymmetric. 
59.6.10 THEOREM: Some basic properties of the swap function.
Let $M$ be a $C^2$ manifold. Let $\Xi$ and $\varpi$ be the swap and drop functions for $M$.

(i) $\Xi \circ \Xi = \mathrm{id}_{T(T(M))}$.

(ii) $\forall p \in M,\ \forall z, V \in T_p(M),\ \forall y^1, y^2 \in T_{z,V}(T(M)),\ y^1 - y^2 \in T_{z,0}(T(M)) = \mathrm{Dom}(\varpi_z) \subseteq \mathrm{Dom}(\varpi)$.

(iii) $\forall p \in M,\ \forall z, V \in T_p(M),\ \forall y^1, y^2 \in T_{z,V}(T(M)),\ \varpi(\Xi(y^1) - \Xi(y^2)) = \varpi(y^1 - y^2)$.

(iv) $\forall p \in M,\ \forall z, V \in T_p(M),\ \forall y^1 \in T_{z,V}(T(M)),\ \forall y^2 \in T_{V,z}(T(M)),\ \varpi(y^1 - \Xi(y^2)) = -\varpi(y^2 - \Xi(y^1))$.

PROOF: For part (i), let $y \in T(T(M))$. Then by Theorem 59.2.7, $y = t^{(2)}_{p,u,v,w,\psi}$ for some $p \in M$, $u, v, w \in \mathbb{R}^n$
with $n = \dim(M)$, and $\psi \in \mathrm{atlas}_p(M)$. By Definitions 59.6.6 and 59.6.3, $\Xi(y) = t^{(2)}_{p,v,u,w,\psi}$, and thereby
$\Xi(\Xi(y)) = t^{(2)}_{p,u,v,w,\psi} = y$. Hence $\Xi \circ \Xi = \mathrm{id}_{T(T(M))}$.

For part (ii), let $y^1, y^2 \in T_{z,V}(T(M))$, $n = \dim(M)$ and $\psi \in \mathrm{atlas}_p(M)$. Then by Definition 59.2.15,
Notation 59.1.12 and Theorem 59.2.7, $y^j = t^{(2)}_{p,u,v,w^j,\psi}$ for some $w^1, w^2 \in \mathbb{R}^n$ for $j = 1, 2$,
where $u = \Phi(\psi)(z)$ and $v = \Phi(\psi)(V)$. Hence $y^1 - y^2 = t^{(2)}_{p,u,0,w^1-w^2,\psi} \in T_{z,0}(T(M)) = \mathrm{Dom}(\varpi_z) \subseteq \mathrm{Dom}(\varpi)$
by Theorem 59.2.6 and Definitions 59.2.9 and 59.2.15.

For part (iii), let $y^1, y^2 \in T_{z,V}(T(M))$, $n = \dim(M)$ and $\psi \in \mathrm{atlas}_p(M)$. Then by Definition 59.2.15,
Notation 59.1.12 and Theorem 59.2.7, $y^j = t^{(2)}_{p,u,v,w^j,\psi}$ for some $w^1, w^2 \in \mathbb{R}^n$ for $j = 1, 2$,
where $u = \Phi(\psi)(z)$ and $v = \Phi(\psi)(V)$. So $y^1 - y^2 = t^{(2)}_{p,u,0,w^1-w^2,\psi} \in \mathrm{Dom}(\varpi)$. Then $\varpi(y^1 - y^2) = t_{p,w^1-w^2,\psi}$
by Definition 59.2.9. But $\Xi(y^1) - \Xi(y^2) = t^{(2)}_{p,v,u,w^1,\psi} - t^{(2)}_{p,v,u,w^2,\psi} = t^{(2)}_{p,v,0,w^1-w^2,\psi} \in \mathrm{Dom}(\varpi)$.
So $\varpi(\Xi(y^1) - \Xi(y^2)) = t_{p,w^1-w^2,\psi}$ by Definition 59.2.9. Hence $\varpi(\Xi(y^1) - \Xi(y^2)) = \varpi(y^1 - y^2)$.

For part (iv), $\Xi(y^2) \in T_{z,V}(T(M))$ implies that $\varpi(\Xi(y^1) - \Xi(\Xi(y^2))) = \varpi(y^1 - \Xi(y^2))$ by part (iii). Hence
$\varpi(y^1 - \Xi(y^2)) = \varpi(\Xi(y^1) - y^2) = -\varpi(y^2 - \Xi(y^1))$ by part (i) and Theorem 59.2.10.


59.6.11 REMARK: Swapping derivatives of a curve family. 

Theorem 59.6.12 asserts that the derivatives of a curve family with respect to two of its parameters may be 
swapped if the swap function is applied. This is different to the situation for naive, fibre-free calculus, where 
no swap function is required. (See Theorems 42.3.4 and 42.3.6.) 


59.6.12 THEOREM: Using the swap function to swap derivatives of a curve family. 
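Let $M$ be a $C^2$ manifold. Let $\Omega \in \mathrm{Top}(\mathbb{R}^2)$. Let $\gamma \in C^2(\Omega, M)$. Then

\[
\forall (s_0,t_0) \in \Omega, \qquad \partial_t\partial_s\gamma(s_0,t_0) = \Xi(\partial_s\partial_t\gamma(s_0,t_0)).
\]

PROOF: Let $(s_0,t_0) \in \Omega$. Let $p = \gamma(s_0,t_0)$. Then $V_1 = \partial_s\gamma(s_0,t_0) \in T_p(M)$ and $V_2 = \partial_t\gamma(s_0,t_0) \in T_p(M)$
are well defined, and $W_1 = \partial_t\partial_s\gamma(s_0,t_0) \in T_{V_1,V_2}(T(M))$ and $W_2 = \partial_s\partial_t\gamma(s_0,t_0) \in T_{V_2,V_1}(T(M))$ are
also well defined. Let $\psi \in \mathrm{atlas}_p(M)$ and $n = \dim(M)$. Let $v_1 = \partial_s(\psi \circ \gamma)$ and $v_2 = \partial_t(\psi \circ \gamma)$.
Then $v_1, v_2 \in C^1(\Omega, \mathbb{R}^n)$, and $\partial_s\gamma(s,t) = t_{\gamma(s,t),v_1(s,t),\psi}$ and $\partial_t\gamma(s,t) = t_{\gamma(s,t),v_2(s,t),\psi}$ for all $(s,t) \in \Omega$ by
Definition 57.9.2. So

\[
\forall (s,t) \in \Omega, \qquad
\begin{aligned}
\partial_t\partial_s\gamma(s,t) &= \partial_t\, t_{\gamma(s,t),v_1(s,t),\psi} \\
&= t^{(2)}_{\gamma(s,t),v_1(s,t),v_2(s,t),w_1(s,t),\psi},
\end{aligned}
\tag{59.6.2}
\]

where $w_1 = \partial_t v_1 = \partial_t\partial_s(\psi \circ \gamma) \in C^0(\Omega, \mathbb{R}^n)$. (See Notation 59.1.12 for the four-component double tangent
vector notation. Line (59.6.2) follows from Definition 57.9.2 applied to Notation 59.1.11.) Similarly, let
$w_2 = \partial_s v_2 = \partial_s\partial_t(\psi \circ \gamma) \in C^0(\Omega, \mathbb{R}^n)$. Then

\[
\forall (s,t) \in \Omega, \qquad
\begin{aligned}
\partial_s\partial_t\gamma(s,t) &= \partial_s\, t_{\gamma(s,t),v_2(s,t),\psi} \\
&= t^{(2)}_{\gamma(s,t),v_2(s,t),v_1(s,t),w_2(s,t),\psi} \\
&= \Xi\bigl(t^{(2)}_{\gamma(s,t),v_1(s,t),v_2(s,t),w_1(s,t),\psi}\bigr) \\
&= \Xi(\partial_t\partial_s\gamma(s,t))
\end{aligned}
\]

by Definition 59.6.6 because $w_1 = w_2$ by Theorem 42.3.4. Hence $\partial_t\partial_s\gamma(s_0,t_0) = \Xi(\partial_s\partial_t\gamma(s_0,t_0))$ because $\Xi$
is the inverse of itself.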
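
As a concrete check of Theorem 59.6.12, the following sketch takes $M = \mathbb{R}^2$ with the identity chart $\psi$ and the curve family $\gamma(s,t) = (s, st)$. Both choices are illustrative assumptions and do not appear elsewhere in this book. The four-component notation is read in the order base point, first-level components, horizontal components, vertical components, then chart.

```latex
\begin{aligned}
v_1 &= \partial_s(\psi\circ\gamma) = (1,\,t), &
v_2 &= \partial_t(\psi\circ\gamma) = (0,\,s), \\
w_1 &= \partial_t v_1 = (0,\,1), &
w_2 &= \partial_s v_2 = (0,\,1), \\
\partial_t\partial_s\gamma(s,t) &= t^{(2)}_{(s,st),\,(1,t),\,(0,s),\,(0,1),\,\psi}, &
\partial_s\partial_t\gamma(s,t) &= t^{(2)}_{(s,st),\,(0,s),\,(1,t),\,(0,1),\,\psi}.
\end{aligned}
```

Exchanging the second and third entries of either vector gives the other, which is the action of the swap function, while the vertical entry $(0,1)$ is common to both.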
59.7. Tangent bundles of tangent vector-tuple bundles


59.7.1 REMARK: Relevance of tangent bundles of tangent vector-tuple bundles to affine connections. 

An affine connection $\theta$ in Definition 71.1.2 maps tangent vectors $z, V \in T_p(M)$ to values $\theta_V(z) \in T_z(T(M))$
in the second-level tangent bundle $T(T(M))$, but the associated connection $\theta^r$ on the associated tangent
vector $r$-tuple bundle $T^r(M)$ for $r \in \mathbb{Z}_0^+$ requires the corresponding values $\theta^r_V((z_j)_{j=1}^r)$ to lie in $T(T^r(M))$
for $(z_j)_{j=1}^r \in T_p(M)^r$. Therefore the second-level tangent bundles $T(T^r(M))$ must be defined, and a map
must be defined which can map a tuple of elements of $T_{z_j}(T(M))$ with $\pi(z_j) = p$ for all $j \in \mathbb{N}_r$ to an element
of $T_{(z_j)_{j=1}^r}(T^r(M))$. (See Definition 55.5.36 for the manifold $T^r(M)$.)


The task here is to convert an input $r$-tuple $(W_j)_{j=1}^r$ in $T(T(M))$ to a corresponding second-level output
vector in $T(T^r(M))$. Geometrically this is apparently easy to do. Let $(W_j)_{j=1}^r$ satisfy
$W_j \in T_{z_j}(T(M))$ with $\pi(z_j) = p$ for each $j \in \mathbb{N}_r$, for some $p \in M$. Then it seems that a corresponding vector can be
constructed in $T_{(z_j)_{j=1}^r}(T^r(M))$ because $\pi^r((z_j)_{j=1}^r) = p$, where the projection map $\pi^r : T^r(M) \to M$ is
given in Definition 55.5.8. It would seem that the “vector” $(W_j)_{j=1}^r$ should be an element of $T_{(z_j)_{j=1}^r}(T^r(M))$
without any modification. Unfortunately this is not correct because $\times_{j=1}^r T_{z_j}(T(M))$ is not the same space
as $T_{(z_j)_{j=1}^r}(T^r(M))$. Therefore some “construction work” is required here.


One could perhaps hope to find a differentiable map $f$ from the direct product manifold $T(M)^r$ to $T^r(M)$
such that $f$ maps vector $r$-tuples $(z_j)_{j=1}^r$ in $T(M)^r$ to the corresponding vector $r$-tuples $(z_j)_{j=1}^r$ in $T^r(M)$.
Then the differential $df$ (or induced map $f_*$) of $f$ would, hopefully, map differential-tuples $(W_j)_{j=1}^r$ with
$\forall j \in \mathbb{N}_r,\ W_j \in T_{z_j}(T(M))$ to tangent vectors in $T_{(z_j)_{j=1}^r}(T^r(M))$. The first obvious objection to this is
that the vectors $z_j$ must all lie in the same pointwise tangent space $T_p(M)$ for some $p \in M$. Otherwise
the space $T_{(z_j)_{j=1}^r}(T^r(M))$ would be meaningless. So any such map $f$ would need to be either restricted to
vector-tuples which have the same base point, or the “output” from the map would need to be given some
sort of meaningful value for vector-tuples for heterogeneous base points. A second objection is that the map
$f$ would need to be restricted to differentials $W_j$ which have the same horizontal component. This is because
each second-level vector in $T(T^r(M))$ has a single horizontal component. Each element of $T(T^r(M))$ is an
element of one and only one space $T_{z,V}(T^r(M))$, where $V \in T(M)$. (See Notation 59.2.5 for spaces
$T_{z,V}(T(M))$ for general $C^2$ manifolds $M$. In this case, $T^r(M)$ is substituted for $M$.)


The Levi-Civita connection definition requires a map from each tuple (t2, wj.) 5-1 to the corresponding 
vector ft, yr œ in T(T"(M)) for all v € atlas; (M), where v and w; denote the components of V 


Zj A C DA 
and Wj. Definition 59.7.2 is exceedingly clumsy. It probably has an exceedingly narrow range of applications. 


59.7.2 DEFINITION: The canonical immersion of second-level vector r-tuples for a C? manifold M and 
r € Zd, is the map A : Upem Uver, m) Ucs er, Qu x7 AT, v (M) > T(T'(M)) defined by 


Vp € M, VV € T (M), V(z;);-1 € Ti(M)", V(W;);. € 2 Pg 
h((W;)52) = GORTERA Q03)5 4 d? 


where Wj = £;, vw; y for all j € Ny. 


59.7.3 REMARK: Differentials of real-valued functions on tangent vector-tuple bundles.

Many kinds of differentiable manifolds are constructed from a base-level manifold $M$, such as (the total
spaces of) the covector bundle $T^*(M)$ in Definition 55.4.11, the vector-tuple bundle in Definition 55.5.37,
and the tensor bundles in Definitions 56.3.23 and 56.3.24. In each case, the concept of the differential
of a real-valued function in Definitions 58.2.2 and 58.2.3 may be applied to the total spaces, which are
differentiable manifolds, if they are at least of class $C^1$. Of particular interest for an application to the
Levi-Civita connection in Section 74.2 is the differential of a real-valued function on the total space of a
tangent vector-tuple bundle. (See for example Remarks 74.2.6, 74.2.8 and 74.2.10.)

The differential of a real-valued function on a vector-tuple bundle is an application of Definition 58.1.2 to
the special case where the manifold $M$ is replaced by a manifold $T^r(M)$ with $r \in \mathbb{Z}^+$. The differential of a
function $f \in C^1(T^r(M))$ at a vector $r$-tuple $V = (V_j)_{j=1}^r \in T^r(M)$, for a $C^2$ manifold $M$ and $r \in \mathbb{Z}^+$, is the
map $(df)_V : T_V(T^r(M)) \to \mathbb{R}$ defined by

\[
\forall W \in T_V(T^r(M)), \qquad (df)_V(W) = \partial_W f.
\]
59.8. Higher-order differentials of curves


59.8.1 REMARK: Velocity and acceleration of curves. 

Whereas the first-order differential of a curve, defined in Section 57.9, represents the velocity of a trajectory 
(if the parameter is interpreted as time), the second-order differential represents the acceleration. However, 
this is a very abstract kind of acceleration because there is no concept of an “inertial frame” here. In other 
words, there is no concept of “zero acceleration” relative to which an “absolute acceleration” can be defined. 
In physics, the “fixed stars” are invoked to provide inertial reference frames, but this is a kind of “majority 
vote” thinking, where the average vote of a trillion, trillion stars is taken as a basis. Presumably there must 
be some kind of local space-time property which an object can use to determine its absolute acceleration 
without needing to make an extensive opinion survey of the whole universe. (See Remark 48.2.4 for further 
comments on fixed stars and inertial frames.) 


The mathematical structure which represents the local concept of zero acceleration is the affine connection. 
To be practically useful, second and higher order differentials are generally “dropped” from higher-level 
tangent spaces to the first-level tangent space by means of an affine connection to make them covariant in 
some sense, but in Definition 59.8.2, second-order differentials are defined abstractly without any connection. 


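59.8.2 DEFINITION: The second-order tangent vector field of a $C^2$ curve $\gamma : I \to M$ for $I \in \mathrm{Top}(\mathbb{R})$ and a
$C^2$ manifold $M$ with $n = \dim(M)$ is the map $\gamma'' : I \to T(T(M))$ defined by

\[
\forall t \in I,\ \forall \psi \in \mathrm{atlas}_{\gamma(t)}(M), \qquad \gamma''(t) = t_{\gamma'(t),\,w_\psi(t),\,\Psi(\psi)},
\]

where $w_\psi : \gamma^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{2n}$ is defined for $\psi \in \mathrm{atlas}(M)$ by

\[
\forall \psi \in \mathrm{atlas}(M),\ \forall t \in \gamma^{-1}(\mathrm{Dom}(\psi)), \qquad
w_\psi(t)^i =
\begin{cases}
\partial_t(\psi^i \circ \gamma(t)) & \text{for } i \in \mathbb{N}_n \\
\partial_t^2(\psi^{i-n} \circ \gamma(t)) & \text{for } i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n,
\end{cases}
\]

and $\gamma'$ is the first-order tangent vector field in Definition 57.9.2, where $\Psi(\psi)$ is the manifold chart map for
$T(M)$ corresponding to $\psi \in \mathrm{atlas}(M)$. (See Notation 54.5.21 for $\Psi : \mathrm{atlas}(M) \to \mathrm{atlas}(T(M))$.)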
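
A small worked instance may help to fix the indexing in Definition 59.8.2. The following sketch assumes $M = \mathbb{R}^2$ with the identity chart $\psi$ and the unit circle $\gamma(t) = (\cos t, \sin t)$. These choices are illustrative assumptions, not taken from the text.

```latex
\begin{aligned}
\psi\circ\gamma(t) &= (\cos t,\ \sin t), \\
\bigl(w_\psi(t)^1,\ w_\psi(t)^2\bigr) &= \partial_t(\psi\circ\gamma(t)) = (-\sin t,\ \cos t), \\
\bigl(w_\psi(t)^3,\ w_\psi(t)^4\bigr) &= \partial_t^2(\psi\circ\gamma(t)) = (-\cos t,\ -\sin t).
\end{aligned}
```

Here $n = 2$, so the first two components of $w_\psi(t) \in \mathbb{R}^4$ are the velocity components and the last two are the second derivatives of the chart coordinates.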


59.8.3 REMARK: The second-order tangent vector field is the velocity field of the velocity field. 

Since the first-order tangent vector field $\gamma'$ for a $C^2$ curve $\gamma$ in Definition 57.9.2 is a $C^1$ curve in the tangent
bundle $T(M)$, one would expect that its differential would be a well-defined vector field with velocity values
in $T(T(M))$. This is in fact true. The second-order tangent vector field in Definition 59.8.2 is nothing more
or less than the differential of the differential of a $C^2$ curve. It only gives a name to a straightforward
computation. It does not introduce a new concept. This is asserted in Theorem 45.2.6.


59.8.4 THEOREM: The second-order tangent vector field of a curve equals the velocity of the velocity. 
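Let $\gamma : I \to M$ be a $C^2$ curve in a $C^2$ manifold $M$ with $I \in \mathrm{Top}(\mathbb{R})$. Then $\gamma'' = (\gamma')'$.


PROOF: $\gamma'(t) \in T_{\gamma(t)}(M)$ for all $t \in I$. So $\gamma' : I \to T(M)$ is a $C^1$ curve in the $C^1$ manifold $T(M)$. So
$(\gamma')'(t) \in T_{\gamma'(t)}(T(M))$ for all $t \in I$, and by Definition 57.9.2, $(\gamma')'(t) = t_{\gamma'(t),\,\partial_t\phi(\gamma'(t)),\,\phi}$, where $\phi = \Psi(\psi)$.
Here $\phi(\gamma'(t)) = (\psi(\gamma(t)), \partial_t\psi(\gamma(t)))$ by Definition 57.9.2 and Notation 54.5.21. Therefore $\partial_t\phi(\gamma'(t)) =
(\partial_t\psi(\gamma(t)), \partial_t^2\psi(\gamma(t)))$. Thus $t_{\gamma'(t),\,\partial_t\phi(\gamma'(t)),\,\phi}$ agrees with Definition 59.8.2 for $\gamma''(t) \in T_{\gamma'(t)}(T(M))$ for
all $t \in I$. Hence $\gamma'' = (\gamma')'$.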


59.8.5 THEOREM: Chart-independence of the second derivative of a curve. 

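The vector field $\gamma''$ in Definition 59.8.2 is chart-independent. In other words, for all $t \in I$ and $\psi_1, \psi_2 \in
\mathrm{atlas}_{\gamma(t)}(M)$, $t_{\gamma'(t),\,w_{\psi_1}(t),\,\tilde\psi_1} = t_{\gamma'(t),\,w_{\psi_2}(t),\,\tilde\psi_2}$, where $\tilde\psi_\alpha$ denotes the chart for $T(M)$ corresponding to the
chart $\psi_\alpha$ for $M$ for $\alpha = 1, 2$.


PROOF: By Theorem 59.1.15, $t_{\gamma'(t),\,w_{\psi_1}(t),\,\tilde\psi_1} = t_{\gamma'(t),\,w_{\psi_2}(t),\,\tilde\psi_2}$ if and only if equations (59.1.1) and (59.1.2)
are satisfied. Thus it must be shown that for all $i \in \mathbb{N}_n$,

\[
\begin{aligned}
w^i_{\psi_2}(t) &= \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, w^j_{\psi_1}(t), \\
w^{n+i}_{\psi_2}(t) &= \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, w^j_{\psi_1}(t)\, v^k_{\psi_1}(t)
 + \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, w^{n+j}_{\psi_1}(t),
\end{aligned}
\]

where $v_{\psi_1}(t) \in \mathbb{R}^n$ is the component vector for $\gamma'(t)$ defined by $\gamma'(t) = t_{\gamma(t),\,v_{\psi_1}(t),\,\psi_1}$. The first equation
follows as for the first-order tangent field from the calculation

\[
\begin{aligned}
w^i_{\psi_2}(t) &= \partial_t\bigl(\psi_2^i \circ \gamma(t)\bigr) \\
&= \partial_t\bigl(\psi_2^i \circ \psi_1^{-1} \circ \psi_1 \circ \gamma(t)\bigr) \\
&= \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, \partial_t\bigl(\psi_1^j \circ \gamma(t)\bigr) \\
&= \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, w^j_{\psi_1}(t).
\end{aligned}
\]

The second equation follows similarly from the calculation

\[
\begin{aligned}
w^{n+i}_{\psi_2}(t) &= \partial_t^2\bigl(\psi_2^i \circ \gamma(t)\bigr) \\
&= \partial_t^2\bigl(\psi_2^i \circ \psi_1^{-1} \circ \psi_1 \circ \gamma(t)\bigr) \\
&= \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, \partial_t\bigl(\psi_1^j \circ \gamma(t)\bigr)\, \partial_t\bigl(\psi_1^k \circ \gamma(t)\bigr) \\
&\qquad + \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, \partial_t^2\bigl(\psi_1^j \circ \gamma(t)\bigr) \\
&= \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, w^j_{\psi_1}(t)\, w^k_{\psi_1}(t)
 + \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, w^{n+j}_{\psi_1}(t).
\end{aligned}
\]

This is the correct answer because $w^k_{\psi_1}(t) = v^k_{\psi_1}(t)$ for $k \in \mathbb{N}_n$.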


59.8.6 REMARK: Interpretation of second-order tangent vector fields. 
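The expression for $\gamma''(t)$ in Definition 59.8.2 may be written out more fully as follows:

\[
\forall t \in I, \qquad \gamma''(t) = \bigl[\bigl(t_{\gamma(t),\,\partial_t(\psi\circ\gamma(t)),\,\psi},\ (\partial_t(\psi \circ \gamma(t)),\ \partial_t^2(\psi \circ \gamma(t))),\ \tilde\psi\bigr)\bigr],
\]

where $(\partial_t(\psi \circ \gamma(t)), \partial_t^2(\psi \circ \gamma(t))) \in \mathbb{R}^{2n}$ represents the concatenation of the vectors $\partial_t(\psi \circ \gamma(t)) \in \mathbb{R}^n$
and $\partial_t^2(\psi \circ \gamma(t)) \in \mathbb{R}^n$. Definition 59.8.2 is constructed from the double application of Definition 57.9.2.
When the coordinates $(\psi(\gamma(t)), \partial_t(\psi \circ \gamma(t))) \in \mathbb{R}^{2n}$ of $\gamma'(t)$ in the total tangent space $T(M)$ are differentiated
with respect to $t$, the result is simply $\partial_t(\psi(\gamma(t)), \partial_t(\psi \circ \gamma(t))) = (\partial_t(\psi \circ \gamma(t)), \partial_t^2(\psi \circ \gamma(t)))$.

It is perhaps interesting to note that the standard coordinates in $\mathbb{R}^{4n}$ for $\gamma''(t)$ with respect to the chart $\psi$
may be written as the quadruple concatenation $(\psi(\gamma(t)), \partial_t(\psi \circ \gamma(t)), \partial_t(\psi \circ \gamma(t)), \partial_t^2(\psi \circ \gamma(t)))$. The first
derivatives appear twice in this coordinate vector.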


59.8.7 REMARK: The four parts of the second-order derivative of a curve. 

The fact that $w^k_{\psi_1}(t) = v^k_{\psi_1}(t)$ for $k \in \mathbb{N}_n$ in the proof of Theorem 59.8.5 means that the horizontal component
of $\gamma''$ is the same as $\gamma'$. Thus $\pi_*(\gamma''(t)) = \gamma'(t)$, where $\pi : T(M) \to M$ is the projection map for $T(M)$. In
other words, the vector $\gamma''(t) \in T_{\gamma'(t)}(T(M))$ carries inside it a copy of the vector $\gamma'(t) \in T_{\gamma(t)}(M)$. This is
because the vector $\gamma'(t) = t_{p,v,\psi}$ contains a copy of the base point $p$. Variation of $p$ is regarded as horizontal
whereas variation of $v$ is regarded as vertical.


Figure 59.8.1 illustrates the four parts of the second-order tangent field of a curve. The first two parts are
the point $\gamma(t)$ and the component vector $v \in \mathbb{R}^n$ which are combined as $\gamma'(t) = t_{\gamma(t),v,\psi}$. The third part
is the sequence of $n$ horizontal components $w^i$ of the vector $\gamma''(t) \in T_{\gamma'(t)}(T(M))$. The fourth part is the
sequence of $n$ vertical components $w^{n+i}$ of $\gamma''(t)$, which indicate the rate of change of the components $v^i$
with respect to $t$. In the limit, the third part is the same as the second part. Only the fourth part is really
a second differential. The rest is just book-keeping.


Figure 59.8.1 Components of second-order tangent field of a curve


59.8.8 REMARK: Chart-dependence of the vertical component of the second derivative of a curve. 
The dotted arrow in Figure 59.8.1 represents a kind of parallel-translated copy of the vector $v$.


However, it is important to remember that this translated vector depends on the choice of coordinates. The
dotted arrow is constructed by copying the coordinates $v^i$ from $\gamma(t)$ to a nearby point on the curve. But the
transition rules are different at different points of the manifold. So these copied coordinates do not transform
correctly at any point except $\gamma(t)$. The purpose of an affine connection is to construct a true vector at each
point of the curve which transforms correctly at each point and which may be thought of as the parallel
translate of $\gamma'(t)$ to each point.


59.8.9 REMARK: The (dropped) acceleration of a trajectory is chart-independent if the velocity is zero. 

If $\gamma'(t) = 0$, then $\gamma''(t)$ is a vertical vector. This implies that the vertical part of $\gamma''(t)$ transforms in the
same way as vectors in $T(M)$ under changes of chart. If the “drop function” $\varpi_{\gamma'(t)}$ in Definition 59.2.9 is
applied to $\gamma''(t) \in \ker((d\pi)_{\gamma'(t)})$, the result is a well-defined vector in $T_{\gamma(t)}(M)$. This means that even in the
absence of a connection, the acceleration of a trajectory is chart-independent if the velocity is zero, and the
acceleration may then be interpreted as a true tangent vector to the manifold. One way of thinking about
this is to observe that the error introduced by chart variation (as in Remark 59.8.8) into the limiting process
when differentiating $\gamma'(t)$ with respect to $t$ converges to zero faster than $t$ converges.
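
The claim can be read off from the chain rule for a change of chart. In the following sketch, $\psi_1$ and $\psi_2$ are two charts at $\gamma(t)$, $v^j$ denotes the velocity components and $a^j$ the second derivatives of the coordinates with respect to $\psi_1$, and $\tilde a^i$ the second derivatives with respect to $\psi_2$. The symbols $a^j$ and $\tilde a^i$ are introduced here only for illustration.

```latex
\begin{aligned}
\tilde a^i
&= \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\bigl(\psi_2^i\circ\psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, v^j v^k
 + \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i\circ\psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, a^j, \\
v = 0 \quad\Longrightarrow\quad
\tilde a^i
&= \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i\circ\psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\, a^j .
\end{aligned}
```

The second line is the usual transformation rule for the components of a vector in $T_{\gamma(t)}(M)$, so the non-tensorial term is exactly the one which is quadratic in the velocity.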


59.8.10 NOTATION: Recursively defined higher-order derivatives of a curve. 
$d^k\gamma$ for $k \in \mathbb{Z}^+$ for a $C^k$ open curve $\gamma : I \to M$ in a $C^k$ manifold $M$ recursively denotes the differential
$d(d^{k-1}\gamma)$, where $d^0\gamma = \gamma$.


59.8.11 REMARK: The domain and range of higher-order differentials of curves.
Notation 59.8.10 implies the functional form $d^k\gamma : I \to T^{(k)}(M)$, where the higher tangent spaces $T^{(k)}(M)$
are defined in Definition 59.1.25 and Notation 59.1.26. For example, $d^2\gamma : I \to T(T(M))$ is defined by

\[
\forall t \in I, \qquad (d^2\gamma)(t) = \bigl[\bigl(t_{\gamma(t),\,\partial_t(\psi\circ\gamma)(t),\,\psi},\ (\partial_t(\psi \circ \gamma)(t),\ \partial_t^2(\psi \circ \gamma)(t)),\ \tilde\psi\bigr)\bigr],
\]

for all $\psi \in \mathrm{atlas}_{\gamma(t)}(M)$, where $\tilde\psi$ is the chart for $T(M)$ corresponding to $\psi$ as in Definition 54.5.16.


59.8.12 REMARK: The differentiability of higher-order derivatives of curves.

For any $k \in \mathbb{Z}_0^+$, it can be seen from Definition 57.9.2 that $\gamma'$ is $C^k$ if $\gamma$ is $C^{k+1}$. By the double application
of Definition 57.9.2, as in the proof of Theorem 59.8.4, it can be seen that $\gamma''$ is $C^k$ if $\gamma$ is $C^{k+2}$. Then the
curve $t \mapsto \partial_t^\ell\gamma(t)$ in $T^{(\ell)}(M)$ is $C^k$ if $\gamma$ is $C^{k+\ell}$ for any $k \in \mathbb{Z}_0^+$ and $\ell \in \mathbb{Z}^+$.


59.9. Higher-order differentials of curve families 


(2018-11-12. There seems to be no application for Section 59.9, although the original book plan did have
applications. The partial derivative concept for curve families is shallow. So it may be removed soon.)


59.9.1 REMARK: Two interpretations of the second-order differential of a curve.
The second-order differential of a curve may be interpreted either as the differential of the differential in
$T(T(M))$ or as a second-order operator in $T^{[2]}(M)$. The $T(T(M))$ interpretation is presented in Section 59.9.


59.9.2 REMARK: Noncommutativity of partial second-order derivatives of a curve family. 

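When derivatives do not commute, it is important to get the order correct. The notation $\gamma_{k\ell}$ means a
derivative with respect to $k$ then $\ell$. (This is the traditional order.) Thus in Definition 59.9.3, $\gamma_{k\ell}(t)$ means
$\partial_{t^\ell}(\partial_{t^k}\gamma(t))$, which is a vector with base point $\partial_{t^k}\gamma(t) = \gamma_k(t)$.
59.9.3 DEFINITION: A second-order partial derivative vector field of a $C^2$ map $\gamma : \mathbb{R}^m \to M$ in a $C^2$
manifold $M$ with $n = \dim(M)$ and $m \in \mathbb{Z}^+$ is a map $\gamma_{k\ell} : \mathbb{R}^m \to T(T(M))$ defined for $k, \ell \in \mathbb{N}_m$ and
$\psi \in \mathrm{atlas}_{\gamma(t)}(M)$ by

\[
\forall t \in \mathbb{R}^m, \qquad \gamma_{k\ell}(t) = t_{\gamma_k(t),\,w_\psi(t),\,\Psi(\psi)},
\]

where $w_\psi : \mathbb{R}^m \to \mathbb{R}^{2n}$ is defined by

\[
\forall t \in \mathbb{R}^m, \qquad
w_\psi(t)^i =
\begin{cases}
\partial_{t^\ell}(\psi^i \circ \gamma)(t) & \text{for } i \in \mathbb{N}_n \\
\partial_{t^\ell}\partial_{t^k}(\psi^{i-n} \circ \gamma)(t) & \text{for } i \in \mathbb{N}_{2n} \setminus \mathbb{N}_n,
\end{cases}
\]

and $\gamma_k$ is the $k$th first-order partial derivative vector field of $\gamma$ as in Definition 57.9.15.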


59.9.4 REMARK: Interpretation of the definition of partial second-order tangent vector fields. 
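The expression for $\gamma_{k\ell}(t)$ in Definition 59.9.3 may be written out more fully as:

\[
\forall t \in \mathbb{R}^m, \qquad \gamma_{k\ell}(t) = \bigl[\bigl(t_{\gamma(t),\,\partial_{t^k}(\psi\circ\gamma)(t),\,\psi},\ (\partial_{t^\ell}((\psi \circ \gamma)(t)),\ \partial_{t^\ell}\partial_{t^k}((\psi \circ \gamma)(t))),\ \tilde\psi\bigr)\bigr],
\]

where $(\partial_{t^\ell}((\psi \circ \gamma)(t)), \partial_{t^\ell}\partial_{t^k}((\psi \circ \gamma)(t))) \in \mathbb{R}^{2n}$ represents the concatenation of the vectors
$\partial_{t^\ell}((\psi \circ \gamma)(t)) \in \mathbb{R}^n$ and $\partial_{t^\ell}\partial_{t^k}((\psi \circ \gamma)(t)) \in \mathbb{R}^n$. Definition 59.9.3 is constructed from the double application
of Definition 57.9.2. When the coordinates $(\psi(\gamma(t)), \partial_{t^k}((\psi \circ \gamma)(t))) \in \mathbb{R}^{2n}$ of $\gamma_k(t)$ in the total tangent
space $T(M)$ are differentiated with respect to $t^\ell$, the result is
$\partial_{t^\ell}(\psi(\gamma(t)), \partial_{t^k}((\psi \circ \gamma)(t))) = (\partial_{t^\ell}((\psi \circ \gamma)(t)), \partial_{t^\ell}\partial_{t^k}((\psi \circ \gamma)(t)))$.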


59.9.5 REMARK: Proof strategy for chart-independence of second-order tangent vectors. 

The second-order tangent vector $\gamma_{k\ell}(t)$ in Definition 59.9.3 is a chart-independent vector in $T_{\gamma_k(t)}(T(M))$.
The proof of this is the same as for Theorem 59.8.5. In this case, though, the second and third parts $v^i$ and
$w^i$ of the vector are not generally the same unless $k = \ell$. That is, the horizontal part of the second-order
differential is not generally the same as the first-order differential.


Related to this is the fact that the partial tangent vector fields do not generally satisfy $\gamma_{k\ell} = \gamma_{\ell k}$ when $k \ne \ell$.
Such fields are not even comparable because they are in different tangent spaces $T_{\gamma_k(t)}(T(M))$ and $T_{\gamma_\ell(t)}(T(M))$.


59.10. Higher-order differentials of real-valued functions 


59.10.1 REMARK: The form of higher-order differentials of real-valued functions. 

Section 59.10 is about differentials of differentials $d^2 f$ of real-valued functions $f$, and higher-order versions
of this. By Definition 58.2.3, $df \in X(T^*(M))$, and by Theorem 58.2.6, $df \in X^1(T^*(M))$ for $f \in C^2(M)$.
Thus $df$ is a $C^1$ map from $M$ to $T^*(M)$. However, this formulation of the differential of a real-valued function
is exceptional amongst differentials, and it is not very useful for generalising to higher-order differentials,
as can be demonstrated by attempting to apply a differential to this formulation. Both $M$ and $T^*(M)$
have a $C^1$ manifold structure if $M$ is a $C^2$ manifold. Therefore $df : M \to T^*(M)$ is a $C^1$ map which
can be differentiated. Its differential must be a map of the form $d^2 f : T(M) \to T(T^*(M))$ which satisfies
$(d^2 f)_p : T_p(M) \to T_{(df)_p}(T^*(M))$. This would yield a tangent vector to the tangent covector bundle for each
base-space tangent vector. It is not entirely clear what this would mean.


To define a second-order differential of $f \in C^2(M, \mathbb{R})$ for a $C^2$ manifold $M$, it is preferable to formulate the
first-order differential of $f$ as $df : T(M) \to \mathbb{R}$, where $(df)_p = df|_{T_p(M)}$ is a linear map for all $p \in M$. This
is the very simple map $df : V \mapsto \partial_V f$ for $V \in T(M)$. Then the second differential is a map of the simple
form $d^2 f : T(T(M)) \to \mathbb{R}$, which is quite straightforward to convert to a tensorial map $D^2 f : T^{2,0}(M) \to \mathbb{R}$,
which turns out to be the covariant Hessian of $f$.
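
To see what a map of the form $d^2 f : T(T(M)) \to \mathbb{R}$ computes, one may evaluate it along a curve. The following sketch assumes, as an illustration which is not stated in this form above, that $d^2 f$ is applied to the second-order tangent vector $\gamma''(t)$ of a $C^2$ curve $\gamma$ in the sense $d^2 f(\gamma''(t)) = \partial_t\bigl(df(\gamma'(t))\bigr)$, with a chart $\psi$ at $\gamma(t)$.

```latex
\begin{aligned}
df(\gamma'(t)) &= \partial_t (f\circ\gamma)(t)
 = \sum_{i=1}^n \partial_{x^i}(f\circ\psi^{-1})(x)\Big|_{x=\psi(\gamma(t))}\, \partial_t(\psi^i\circ\gamma)(t), \\
\partial_t\bigl(df(\gamma'(t))\bigr) &= \partial_t^2 (f\circ\gamma)(t) \\
&= \sum_{i,j=1}^n \partial_{x^i}\partial_{x^j}(f\circ\psi^{-1})(x)\Big|_{x=\psi(\gamma(t))}\, \partial_t(\psi^i\circ\gamma)(t)\, \partial_t(\psi^j\circ\gamma)(t) \\
&\qquad + \sum_{i=1}^n \partial_{x^i}(f\circ\psi^{-1})(x)\Big|_{x=\psi(\gamma(t))}\, \partial_t^2(\psi^i\circ\gamma)(t).
\end{aligned}
```

The first sum involves only the horizontal components of $\gamma''(t)$, while the second sum pairs the first derivatives of $f$ with the vertical components. The second sum is the term which a connection is needed to control when $(df)_{\gamma(t)} \ne 0$.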
59.11. Hessian operators at critical points


59.11.1 REMARK: Terminology: Critical point versus stationary point. 

The term “stationary point” is generally used for functions of a single real variable. The word “stationary” 
means that a differentiable function has zero derivative with respect to a parameter which may be interpreted 
as time. In other words, a stationary point of a function is a point where its velocity is zero. Thus “stationary 
point” generally applies to single-parameter curves only. 


The term “critical point” refers to more general kinds of differentiable functions between Cartesian spaces 
or differentiable manifolds. A critical point of a function is a point where at least one directional derivative 
equals zero. The function may not be continuously partially (or totally) differentiable in a neighbourhood of 
a critical point. So the criterion for a critical point which is often given, in terms of the rank of the Jacobian 
matrix of the map, is not always applicable. 


In the case of a $C^1$ function, the total differential of the function is well defined, and in this case the rank of
the Jacobian matrix of the transformation may be examined to determine whether a point is a critical point.
At a non-critical point, a $C^1$ function has relatively simple-looking level curves which are fairly accurately
described in terms of the first-order partial derivatives. At a critical point, the higher derivatives must be
examined in order to characterise the level curves. No doubt the word “critical” is intended to suggest the
greater complexity of the function’s behaviour in a neighbourhood of a critical point.


59.11.2 REMARK: Tensoriality of the Hessian of a real-valued function at a critical point.

It is perhaps surprising that the Hessian of a real-valued function is, under the right circumstances, a well-defined
tensor even in the absence of a connection. The usual definition of the Hessian operator incorporates
a connection. (See Greene/Wu [87], page 7, for Hessians with a connection.) It will be shown here that
for a real-valued function $f \in C^2(M)$ on a $C^2$ manifold $M$, at a critical point of $f$, namely a
point $p \in M$ such that $(df)_p = 0$, the Hessian of $f$ at $p$ is a well-defined tensor in $T_p^{0,2}(M)$. Obviously,
since this “connectionless” version of the Hessian is restricted to critical points, it is not very effective at
generating interesting vector fields from real-valued functions in the way that the first-order differential
$df$ does. However, the Hessian operator plays an important role in the computation of Riemannian and
pseudo-Riemannian metric tensor fields from two-point distance functions. (See Section 73.9 for the relation
between distance functions and metric tensor fields.)

It is tempting to use a notation such as $(d^2 f)_p$ for the Hessian of $f$ at $p$, but this has some difficulties. The
differential of $df$ should be a map from $T(T(M))$ to $\mathbb{R}$, as discussed in Remark 59.10.1. In the presence
of a connection, it is usual to write $D^2 f$ for the Hessian of $f$. This is the same thing as the Hessian in
Theorem 59.11.3 if $(df)_p = 0$.


59.11.3 THEOREM: The Hessian at a critical point is a doubly covariant tensor.
Let $M$ be a $C^2$ manifold. Then for any real-valued function $f \in C^2(M)$ and point $p \in M$ such that $(df)_p = 0$,
the map $H_{f,p} : T_p(M) \times T_p(M) \to \mathbb{R}$ defined for all $t_{p,u,\psi}, t_{p,v,\psi} \in T_p(M)$ by

\[
H_{f,p}(t_{p,u,\psi}, t_{p,v,\psi}) = \sum_{i,j=1}^n u^i v^j \, \partial_{x^i}\partial_{x^j} f(\psi^{-1}(x))\big|_{x=\psi(p)}
\tag{59.11.1}
\]

where $n = \dim(M)$, is a tensor in $T^{0,2}_p(M)$ which is independent of the chart $\psi \in \mathrm{atlas}_p(M)$.


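PROOF: For a fixed chart $\psi$, the function $H_{f,p}$ is clearly bilinear with respect to $u, v \in \mathbb{R}^n$. Therefore for
this fixed chart, $H_{f,p} \in \mathscr{L}_2(T_p(M); \mathbb{R}) = T^{0,2}_p(M)$ by Notation 56.1.5.


To verify the tensorial transformation rule, let $\tilde\psi \in \mathrm{atlas}_p(M)$. Then for all $i, j \in \mathbb{N}_n$,

\begin{align*}
\partial_i\partial_j (f \circ \tilde\psi^{-1})
&= \partial_i\partial_j \big( (f \circ \psi^{-1}) \circ (\psi \circ \tilde\psi^{-1}) \big) \\
&= \sum_{k,\ell=1}^n \partial_k\partial_\ell (f \circ \psi^{-1}) \, \partial_i(\psi^k \circ \tilde\psi^{-1}) \, \partial_j(\psi^\ell \circ \tilde\psi^{-1})
 + \sum_{k=1}^n \partial_k (f \circ \psi^{-1}) \, \partial_i\partial_j(\psi^k \circ \tilde\psi^{-1}) \\
&= \sum_{k,\ell=1}^n \partial_k\partial_\ell (f \circ \psi^{-1}) \, \partial_i(\psi^k \circ \tilde\psi^{-1}) \, \partial_j(\psi^\ell \circ \tilde\psi^{-1}),
\end{align*}

where the derivatives of $f \circ \psi^{-1}$ are evaluated at $\psi(p)$ and the derivatives of $\psi \circ \tilde\psi^{-1}$ are evaluated
at $\tilde\psi(p)$, because $(df)_p = 0$ implies $\partial_k (f \circ \psi^{-1}) = \partial_{x^k} f \circ \psi^{-1}(x)\big|_{x=\psi(p)} = 0$ for all $k \in \mathbb{N}_n$. This is the correct
transformation rule for $T^{0,2}_p(M)$. Hence $H_{f,p} \in T^{0,2}_p(M)$.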

59.11.4 DEFINITION: The Hessian at a critical point $p$ of a $C^2$ real-valued function $f$ on a $C^2$ differentiable
manifold $M$ is the twice covariant tensor $H_{f,p} \in T^{0,2}_p(M)$ which satisfies

\[
\forall u, v \in \mathbb{R}^n, \quad
H_{f,p}(t_{p,u,\psi}, t_{p,v,\psi}) = \sum_{i,j=1}^n u^i v^j \, \partial_{x^i}\partial_{x^j} f(\psi^{-1}(x))\big|_{x=\psi(p)}
\tag{59.11.2}
\]

with respect to any $C^2$ chart $\psi$ for $M$, where $n = \dim(M) \in \mathbb{Z}^+$.


59.11.5 REMARK: Components of the Hessian with respect to a chart. 

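For $f$ and $\psi$ as in Theorem 59.11.3, define $f_{ij} = \partial_{x^i}\partial_{x^j} f(\psi^{-1}(x))\big|_{x=\psi(p)}$ for $i, j \in \mathbb{N}_n$. Then
$H_{f,p} = \sum_{i,j=1}^n f_{ij} \, e^i \otimes e^j$, where the tangent covectors $e^i = (d\psi^i)_p \in T^*_p(M)$ are combined to obtain
$e^i \otimes e^j \in T^{0,2}_p(M)$, and the right-hand side of equation (59.11.1) may be written as $\sum_{i,j=1}^n f_{ij} u^i v^j$.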
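
As an illustrative check (not part of the original text), consider the one-dimensional case $M = \mathbb{R}$ with the identity chart $\psi$ and a second chart $\tilde\psi$ whose transition map is assumed to be $\psi \circ \tilde\psi^{-1}(y) = y + y^2$ near $y = 0$. The functions and the chart below are hypothetical choices, made only to show why the critical-point condition matters.

```latex
% Critical point: f(x) = x^2 at p = 0, where (df)_p = 0.
\begin{align*}
 f_{11} &= \partial_x^2 (x^2)\big|_{x=0} = 2, \\
 \tilde f_{11} &= \partial_y^2 \big((y+y^2)^2\big)\big|_{y=0} = 2
   = f_{11}\,\big(\partial_y(y+y^2)\big|_{y=0}\big)^2 .
\end{align*}
% Non-critical point: g(x) = x at p = 0, where (dg)_p \ne 0.
\begin{align*}
 g_{11} &= \partial_x^2 (x)\big|_{x=0} = 0, \\
 \tilde g_{11} &= \partial_y^2 (y+y^2)\big|_{y=0} = 2
   \ne g_{11}\,\big(\partial_y(y+y^2)\big|_{y=0}\big)^2 = 0 .
\end{align*}
```

The discrepancy in the second case is the term $\partial_x g \cdot \partial_y^2 (y + y^2) = 2$, which is exactly the kind of term that vanishes when the first differential is zero at the point.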


59.12. Higher-order differentials of maps between manifolds 


59.12.1 REMARK: Second-order differential of a map between manifolds. 
The second-order differential (or induced map) of a map between manifolds $M_1$ and $M_2$ is formulated in
Definition 59.12.2 as a map between the double tangent bundles $T(T(M_1))$ and $T(T(M_2))$.


According to Theorem 59.12.3, Definition 59.12.2 follows from the double application of Definition 58.4.5.
This justifies the simple notation “$d^2$” for this kind of double differential. Higher-order differentials $d^k$ for
any $k \in \mathbb{Z}^+$ may therefore be defined inductively according to the same pattern.


The second-order differential of a map between two manifolds is applicable to the “push-forth” of
second-order differential operators. For example, one might wish to “push forth” the Laplacian from one Riemannian
manifold to another. This kind of application requires the naive second-order differential in Definition 59.12.2
to be converted to a covariant second-order differential, which is the subject of Section 71.10.


Figure 59.12.1 illustrates some of the spaces and maps in Definition 59.12.2 and Theorem 59.12.3. 


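[Figure: diagram of the manifolds $M_1$, $M_2$, the tangent bundles $T(M_1)$, $T(M_2)$ and the double tangent bundles $T(T(M_1))$, $T(T(M_2))$, connected by the maps $\phi$, $d\phi$ and $d^2\phi$, with charts into $\mathbb{R}^{n_1}$, $\mathbb{R}^{n_2}$, $\mathbb{R}^{2n_1}$, $\mathbb{R}^{2n_2}$, $\mathbb{R}^{4n_1}$ and $\mathbb{R}^{4n_2}$.]
Figure 59.12.1 Double differential of a map between two manifolds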


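59.12.2 DEFINITION: The second-order differential of a $C^2$ map $\phi : M_1 \to M_2$ between $C^2$ manifolds $M_1$
and $M_2$ is the map $d^2\phi : T(T(M_1)) \to T(T(M_2))$ defined by

\[
\forall p \in M_1,\ \forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),\ \forall v_1, \dot x_1, \dot v_1 \in \mathbb{R}^{n_1},
\]
\[
(d^2\phi)\big(t^{(2)}_{p,v_1,\dot x_1,\dot v_1,\psi_1}\big) = t^{(2)}_{\phi(p),v_2,\dot x_2,\dot v_2,\psi_2},
\]

where $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$, and

\begin{align*}
\forall i \in \mathbb{N}_{n_2}, \quad v_2^i &= \sum_{j=1}^{n_1} v_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}, \\
\forall i \in \mathbb{N}_{n_2}, \quad \dot x_2^i &= \sum_{j=1}^{n_1} \dot x_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}, \\
\forall i \in \mathbb{N}_{n_2}, \quad \dot v_2^i &= \sum_{j,k=1}^{n_1} \dot x_1^j v_1^k \, \partial_{x^j}\partial_{x^k}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}
 + \sum_{j=1}^{n_1} \dot v_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}.
\end{align*}

The second-order differential at a vector $z \in T(M_1)$ of a $C^2$ map $\phi : M_1 \to M_2$ between $C^2$ manifolds $M_1$
and $M_2$ is the map $(d^2\phi)_z : T_z(T(M_1)) \to T_{(d\phi)(z)}(T(M_2))$ which is the restriction of $d^2\phi$ to $T_z(T(M_1))$.
Thus

\[
\forall p \in M_1,\ \forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),\ \forall v_1, \dot x_1, \dot v_1 \in \mathbb{R}^{n_1},
\]
\[
(d^2\phi)_{t_{p,v_1,\psi_1}}\big(t^{(2)}_{p,v_1,\dot x_1,\dot v_1,\psi_1}\big) = t^{(2)}_{\phi(p),v_2,\dot x_2,\dot v_2,\psi_2}.
\]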

59.12.3 THEOREM: Justification of formula for the second-order differential of a map.
Let $\phi : M_1 \to M_2$ be a $C^2$ map between $C^2$ manifolds $M_1$ and $M_2$. Then $d^2\phi = d(d\phi)$.


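PROOF: Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. By Definitions 58.4.5 and 58.9.2, $d\phi : T(M_1) \to T(M_2)$
satisfies

\[
\forall p \in M_1,\ \forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),\ \forall v_1 \in \mathbb{R}^{n_1}, \quad
(d\phi)(t_{p,v_1,\psi_1}) = t_{\phi(p),v_2,\psi_2},
\]

where $v_2 \in \mathbb{R}^{n_2}$ satisfies

\[
\forall i \in \mathbb{N}_{n_2}, \quad v_2^i = \sum_{j=1}^{n_1} v_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}.
\]

These formulas must be re-applied with $M_1$, $M_2$, $\phi$, $\psi_1 \in \mathrm{atlas}(M_1)$, $p$ and $v_1 \in \mathbb{R}^{n_1}$ replaced by $T(M_1)$,
$T(M_2)$, $d\phi$, $\hat\psi_1 \in \mathrm{atlas}(T(M_1))$, $t_{p,v_1,\psi_1}$ and $w_1 = (\dot x_1, \dot v_1) \in \mathbb{R}^{2n_1}$ respectively, and other corresponding
replacements. Then $d(d\phi) : T(T(M_1)) \to T(T(M_2))$ satisfies

\[
\forall p \in M_1,\ \forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),\ \forall v_1, \dot x_1, \dot v_1 \in \mathbb{R}^{n_1},
\]
\[
(d(d\phi))\big(t^{(2)}_{p,v_1,\dot x_1,\dot v_1,\psi_1}\big) = t^{(2)}_{\phi(p),v_2,\dot x_2,\dot v_2,\psi_2},
\]

where $w_2 = (\dot x_2, \dot v_2) \in \mathbb{R}^{2n_2}$ satisfies

\[
\forall i \in \mathbb{N}_{n_2}, \quad v_2^i = \sum_{j=1}^{n_1} v_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}
\tag{59.12.1}
\]

and

\[
\forall i \in \mathbb{N}_{2n_2}, \quad w_2^i = \sum_{j=1}^{2n_1} w_1^j \, \partial_{\hat x^j}\big(\hat\psi_2^i \circ (d\phi) \circ \hat\psi_1^{-1}(\hat x)\big)\big|_{\hat x=\hat\psi_1(t_{p,v_1,\psi_1})}.
\tag{59.12.2}
\]

Line (59.12.2) may be split according to the components $x, v_1 \in \mathbb{R}^{n_1}$ of the variable $\hat x = (x, v_1) \in \mathbb{R}^{2n_1}$ as

\begin{align*}
\forall i \in \mathbb{N}_{n_2}, \quad \dot x_2^i
&= \sum_{j=1}^{n_1} \dot x_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}
 + \sum_{j=1}^{n_1} \dot v_1^j \, \partial_{v_1^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)} \\
&= \sum_{j=1}^{n_1} \dot x_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}
\end{align*}

and

\begin{align*}
\forall i \in \mathbb{N}_{n_2}, \quad \dot v_2^i
&= \sum_{j=1}^{n_1} \dot x_1^j \, \partial_{x^j} v_2^i + \sum_{j=1}^{n_1} \dot v_1^j \, \partial_{v_1^j} v_2^i \\
&= \sum_{j=1}^{n_1} \dot x_1^j \, \partial_{x^j}\Big( \sum_{k=1}^{n_1} v_1^k \, \partial_{x^k}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big) \Big)\Big|_{x=\psi_1(p)}
 + \sum_{j=1}^{n_1} \dot v_1^j \, \partial_{v_1^j}\Big( \sum_{k=1}^{n_1} v_1^k \, \partial_{x^k}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big) \Big)\Big|_{x=\psi_1(p)} \\
&= \sum_{j,k=1}^{n_1} \dot x_1^j v_1^k \, \partial_{x^j}\partial_{x^k}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}
 + \sum_{j=1}^{n_1} \dot v_1^j \, \partial_{x^j}\big(\psi_2^i \circ \phi \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}
\end{align*}

because $\partial_{v_1^j} v_1^k = \delta_j^k$ for all $j, k \in \mathbb{N}_{n_1}$. This agrees with the formulas for $d^2\phi$ in Definition 59.12.2.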
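
A minimal worked instance of these component formulas, assuming the hypothetical map $\phi : \mathbb{R} \to \mathbb{R}$ with $\phi(x) = x^2$ and identity charts on both copies of $\mathbb{R}$ (so that $n_1 = n_2 = 1$), and writing $(v_1, \dot x_1, \dot v_1)$ for the chart components of a second-level tangent vector at the point $p$ (a labelling adopted here for illustration):

```latex
\begin{align*}
 v_2 &= v_1\,\partial_x(x^2)\big|_{x=p} = 2p\,v_1, \\
 \dot x_2 &= \dot x_1\,\partial_x(x^2)\big|_{x=p} = 2p\,\dot x_1, \\
 \dot v_2 &= \dot x_1 v_1\,\partial_x^2(x^2)\big|_{x=p} + \dot v_1\,\partial_x(x^2)\big|_{x=p}
          = 2\,\dot x_1 v_1 + 2p\,\dot v_1 .
\end{align*}
```

The same values are obtained by differentiating the map $(x, v) \mapsto (x^2, 2xv)$, which represents $d\phi$ in these charts, in the direction $(\dot x_1, \dot v_1)$ at the point $(p, v_1)$. This is the double application of the first-order differential in the one-dimensional case.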

Chapter 60


HIGHER-ORDER TANGENT OPERATORS


60.1 Motivation for higher-order tangent vectors
60.2 Second-order tangent operators
60.3 Composition of first-order tangent operator fields
60.4 Tensorisation coefficients for second-order tangent operators
60.5 Second-order tangent vectors


60.0.1 REMARK: Terminology confusion for degree, order, rank and level. 

As mentioned in Remark 59.0.2, the words “degree”, “order” and “rank” are often used with different 
meanings by different authors. In this book, “degree” refers to tensor multilinearity, so that a trilinear 
function of a linear space has covariant degree 3 for example. The word “degree” is also used for polynomials, 
so that a cubic polynomial has degree 3 for example. This is a good match because multilinear functions have 
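a lot in common with higher-degree terms in polynomials. (For example, the degree-2 multilinear function
$f \in \mathscr{L}(\mathbb{R}^2, \mathbb{R}^3; \mathbb{R})$ in Example 27.2.5 has the form $f(u_1, u_2) = \sum_{i=1}^2 \sum_{j=1}^3 a_{ij} u_1^i u_2^j$ for $(u_1, u_2) \in \mathbb{R}^2 \times \mathbb{R}^3$ for
some array $[a_{ij}]_{i=1,j=1}^{2,3} \in \mathbb{R}^{2 \times 3}$. This is a degree-2 multinomial in the real variables $u_1^1$, $u_1^2$, $u_2^1$, $u_2^2$ and $u_2^3$.)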


The word “order” refers to the order of differentiation, so that a second-order differential equation contains 
derivatives of derivatives, for example. The word “rank” generally refers to the dimension of the span of rows 
or columns of a matrix, although some authors use the term “rank” where “degree” or “order” would be 
used here. Terms like “second-level” and “higher-level” are applied in this book to recursive tangent bundle 
constructions such as the tangent bundle of a tangent bundle in Section 59.1. 


60.1. Motivation for higher-order tangent vectors 


60.1.1 REMARK: Higher-order tangent vectors correspond to higher-order differential operators. 
Second-order differential operators are of importance for differential geometry to represent curvature of 
various kinds. In physics, second-order differential operators are of importance to represent the dynamics of 
systems. Since tangent vectors represent only first-order differential operators, it is necessary to additionally 
define second-order tangent vectors. (Many authors even state that tangent vectors are first-order differential 
operators.) In the spirit of generalisation, one naturally extends such concepts to arbitrary higher-order 
tangent vectors. 


Higher-order tangent vectors are extensions of ordinary tangent vectors to represent higher-order derivatives. 
These should be distinguished from higher-order differentials of functions and maps, which are multiple 
applications of a differential operator to real-valued functions on manifolds and maps between manifolds. 


In the case of first-order tangent vectors, the standard definition offered in Section 54.1 is based upon 
parametrised lines in charts. These are closely harmonised with tangent velocity (component tuple) vectors 
in Section 54.10 and tangent (differential) operators in Section 54.11. (Of less importance is the tagged 
tangent differential operator style in Section 54.15. A generalisation to unidirectional tangent-line vectors is 
presented in Section 54.16.) In the case of second and higher order tangent vectors, it is a rational expectation 
that the operator style of tangent vector can be generalised to second and higher order differential operators, 
and that it should be possible to harmonise such tangent vectors with the line-based and velocity (component 
tuple) styles of tangent vectors. These are the objectives of Chapter 60. 


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg. html 
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 

60.1.2 REMARK: Higher-order tangent operators cannot be tensorial.

Whenever the term “tensorial” is applied to higher-order tangent operator objects in Chapter 60, this means 
that the objects have specified component transformation rules which ensure that the same object is referred 
to in different coordinate systems. Unfortunately, second and higher order tangent operator objects cannot 
be made tensorial. 


A truly tensorial object is built from first-order tangent vectors using only tensor algebra. Thus a $C^1$
differentiable structure (i.e. atlas) is sufficient to build all kinds of tensorial objects. A second-order tangent
operator, on the other hand, requires a $C^2$ atlas. A general second-order tangent operator, for example,
cannot be constructed from a combination of tensorial objects because the transformation rules for
second-order tangent operators depend on the second-order derivatives of local chart-transition diffeomorphisms, as
shown in Theorem 60.2.5. (This is why the manifold must be $C^2$.)


60.1.3 REMARK: Higher-order tangent operators can be defined without a connection or metric. 
Higher-order tangent vectors do not require a connection or metric to make them tensorial. It is often
said that some constructions on manifolds are tensors because they transform correctly, and others are not
because they don’t. The tensorial transformation test is often applied to determine whether a given construction
“is a tensor”. It is generally believed that if it transforms like a tensor, it is a tensor.


One little fly in the ointment here is the fact that any array $A^{\psi_0} = (a^{\psi_0}_I)_{I \in \mathbb{N}_n^k}$ of $n^k$ real numbers for
a given chart $\psi_0 \in \mathrm{atlas}_p(M)$ can be made to “transform like a tensor” by defining the value of the array
$A^{\psi} = (a^{\psi}_I)_{I \in \mathbb{N}_n^k}$ for general $\psi \in \mathrm{atlas}_p(M)$ according to a standard tensorial transformation rule for the chart
transition from $\psi_0$ to $\psi$. One may even freely choose whether this “tensor” should be $k$-times covariant,
$k$-times contravariant, or any of the $2^k$ possible combinations of covariant and contravariant for the indices
constituting the multi-index $I \in \mathbb{N}_n^k$. There is, of course, the small inconvenience that one must specify the
array of $n^k$ numbers for a specific chart $\psi_0$, but the result is still a construction which passes the tensorial
transformation test. So it must be a tensor!


Logically and mathematically, there is nothing wrong with specifying a tensor at a point $p \in M$ as follows.
(1) Specify a reference chart $\psi_0 \in \mathrm{atlas}_p(M)$.
(2) Specify an array of $n^k$ real numbers $A^{\psi_0} = (a^{\psi_0}_I)_{I \in \mathbb{N}_n^k} \in \mathbb{R}^{\mathbb{N}_n^k}$.
(3) Specify which of the $k$ indices $i_j$ in the tuples $(i_1, \ldots i_j, \ldots i_k) \in \mathbb{N}_n^k$ should be covariant, and which
should be contravariant.
(4) Define $A^{\psi} = (a^{\psi}_I)_{I \in \mathbb{N}_n^k} \in \mathbb{R}^{\mathbb{N}_n^k}$ for all $\psi \in \mathrm{atlas}_p(M)$ according to the tensor transformation rules for
the covariance and contravariance choices in step (3).
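
For instance, step (4) can be written out for $k = 2$ with both indices chosen to be covariant in step (3). The following is a sketch under that assumption. The symbol $\phi_\psi = \psi_0 \circ \psi^{-1}$ for the chart transition map and the second general chart $\tilde\psi$ are introduced here only for illustration.

```latex
% Step (4) for a doubly covariant array, with \phi_\psi = \psi_0 \circ \psi^{-1}:
\begin{align*}
 a^{\psi}_{ij} &= \sum_{k,\ell=1}^n a^{\psi_0}_{k\ell}\,
   \partial_i \phi_\psi^k(x)\,\partial_j \phi_\psi^\ell(x)\Big|_{x=\psi(p)} .
\end{align*}
% Consistency between two general charts \psi and \tilde\psi follows from the
% chain rule, since \phi_{\tilde\psi} = \phi_\psi \circ (\psi \circ \tilde\psi^{-1}):
\begin{align*}
 a^{\tilde\psi}_{ij} &= \sum_{r,s=1}^n a^{\psi}_{rs}\,
   \partial_i(\psi^r \circ \tilde\psi^{-1})(\tilde x)\,
   \partial_j(\psi^s \circ \tilde\psi^{-1})(\tilde x)\Big|_{\tilde x=\tilde\psi(p)} .
\end{align*}
```

So the arrays defined from the reference chart automatically satisfy the doubly covariant transformation rule between any two charts, not only between the reference chart and one other chart.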


The resulting construction passes all of the objective tests for tensoriality of the kind specified in step (3). 
The real difficulty with such a procedure is philosophical, not mathematical. Geometers and physicists would 
find such a construction objectionable on the grounds that a specific chart is singled out as the reference 
chart, which seems arbitrary and unnatural, and perhaps one could even argue that it is undemocratic or 
not “culturally relative” enough. In other words, all perspectives should be equal. No point of view is the
“correct” point of view, while all others are only relative. All points of view should be equally “relative”. 


What one observes in practice in the literature is that tensors are constructed by procedures which work for
any given chart, and the same algorithmic procedure applied to different charts $\psi \in \mathrm{atlas}_p(M)$ is supposed
to yield a component array $A^{\psi} = (a^{\psi}_I)_{I \in \mathbb{N}_n^k} \in \mathbb{R}^{\mathbb{N}_n^k}$ which obeys the tensor transformation rules by virtue of
the chart-agnostic construction procedure alone. No value judgement should be made regarding a preferred
chart because all charts are equal. This kind of thinking is generally implicit or more or less explicit in most
of the literature. Chart-relativism is a kind of “article of faith” in geometry and physics.


Chart-relativism is not actually implemented as fully in practice as is often claimed. Out of the very large 
class of possible local chart maps from a manifold to a Cartesian space, by no means all of those charts 
are included in the relativity tests. For tensors of non-zero degree, one includes only the charts in some
particular $C^1$ equivalence class of charts. In other words, the manifold’s atlas must be a $C^1$ atlas. The set
of charts to test is equal to, or included in, some such equivalence class, but this equivalence class must be 
specified somehow. This is done by providing an atlas. This atlas is generally not specified as the set of 
all possible charts, but rather by giving a small finite number of charts which cover the manifold. Then 
any chart equivalent to this small covering set is regarded as acceptable. Thus in practice, one does specify 
particular reference charts. (It is true that if a manifold is a smoothly embedded submanifold of another
manifold, such as a Euclidean space, one can specify the charts on the submanifold by projection from the 
ambient space, but even in this case, there are usually specific standard projections which are used most 
often for convenience.) 


So the argument against chart-specific constructions is not as strong as it might seem at first. Nevertheless, 
the philosophical imperative for chart-agnostic constructions is strong and undeniable. One should define all 
constructions on manifolds in a chart-agnostic fashion, and then one must be certain that such constructions 
transform correctly as tensors. But it is important to not forget that this is a philosophical imperative, not 
a logical or mathematical imperative. 


The important conclusion to draw from the possibility of defining a tensor in terms of a specified chart and 
an array of numbers is that all higher-order derivative operators can be tensorised. The real question to be 
asked about these operators is how a chart-agnostic procedure can be determined by which “true tensors” 
can be produced without specifying a particular chart. 


60.1.4 REMARK: Second-order differential operators and the need for “grooves in space”. 

Although the Laplacian operator uses a metric for its specification, it is a second-order tangent operator 
which is well defined without any metric. This may seem a little paradoxical, but in fact the Laplacian needs 
the metric only for the choice of its coefficients. The space of second-order operators is well defined in the 
absence of a metric tensor field. The Laplacian operator continues to “exist” if the metric is removed. In 
other words, without the metric, one knows that the Laplacian is somewhere in the set of all second-order 
tangent operators, but one does not know which one it is. The correct choice for the Laplacian operator 
depends on the choice of metric tensor field. 


Second-order operators such as the Laplacian are typically specified (i.e. selected from the set of all possible 
second-order operators) with the assistance of a covariant derivative or a Christoffel array, which themselves 
are derived from a metric tensor field. The affine connection structure on a manifold provides “compensation 
terms” which have the effect of tensorising second-order operators. In effect, an affine connection compensates 
for distortions of the local coordinate chart so as to compute the correct values for a special class of preferred 
charts at each point which are somehow more “correct” than the others. Relativity in physics does not mean 
relativity with respect to all diffeomorphisms. There are, in effect, “grooves in space” which define parallel 
translation. Second-order derivatives such as the Laplacian are calculated relative to these “grooves”. 


In Chapter 60, affine connections are not yet defined. Therefore the “grooves in space” which enable
chart-independent (i.e. tensorial) second-order derivatives to be calculated are not available. So the values of
second-order derivatives must be defined with the assistance of arbitrary tensorisation coefficients, which
may or may not be derived from some kind of affine connection.


60.2. Second-order tangent operators 


60.2.1 REMARK: Transformation rules for higher-order vectors come from higher-order operators. 

The higher-order tangent vectors in Section 60.5 are an abstraction from the higher-order tangent operators 
in Section 60.2. The concrete differential operators are defined here first so as to determine the transformation 
rules which can guarantee some kind of “tensoriality” of the corresponding abstract vectors. 


Second-order operators are given in Definition 60.2.2 for a single chart only. Then the task is to determine 
transformation rules which can tensorise such operators. Such rules are given in Theorem 60.2.5. 


The coefficient matrix $a = [a^{ij}]_{i,j=1}^n$ in Definition 60.2.2 is assumed to be symmetric to eliminate redundancy
(since the matrix of second partial derivatives of $f$ is known to be symmetric if $f \in C^2(M)$). However, it
is sometimes convenient to permit operators $\partial^{[2]}_{p,a,b,\psi}$ with $a \in M_{n,n}(\mathbb{R}) = \mathbb{R}^{n \times n}$ because this automatically
constructs the same operator as if the matrix $a$ had been symmetrised first.


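60.2.2 DEFINITION: A second-order tangent operator on an $n$-dimensional $C^2$ manifold $M$ is any function
$\partial^{[2]}_{p,a,b,\psi} : C^2(M) \to \mathbb{R}$ defined for $p \in M$, $a \in \mathrm{Sym}(n, \mathbb{R})$, $b \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ by

\[
\forall p \in M,\ \forall a \in \mathrm{Sym}(n, \mathbb{R}),\ \forall b \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall f \in C^2(M),
\]
\[
\partial^{[2]}_{p,a,b,\psi}(f) = \sum_{i,j=1}^n a^{ij} \, \frac{\partial^2 (f \circ \psi^{-1}(x))}{\partial x^i \, \partial x^j}\bigg|_{x=\psi(p)}
 + \sum_{i=1}^n b^i \, \frac{\partial (f \circ \psi^{-1}(x))}{\partial x^i}\bigg|_{x=\psi(p)}.
\]

This may be written in abbreviated form (in terms of Notation 54.13.5) as:

\[
\forall p \in M,\ \forall a \in \mathrm{Sym}(n, \mathbb{R}),\ \forall b \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall f \in C^2(M),
\]
\[
\partial^{[2]}_{p,a,b,\psi}(f) = a^{ij} \, \partial_i^{p,\psi} \partial_j^{p,\psi}(f) + b^i \, \partial_i^{p,\psi}(f),
\]

or even more briefly (but imprecisely) as:

\[
\partial^{[2]}_{p,a,b,\psi} = a^{ij} \partial_i \partial_j + b^i \partial_i.
\]

The pair $(a, b)$ is called the component pair for, or the components of, the second-order tangent operator
$\partial^{[2]}_{p,a,b,\psi}$ with respect to the chart $\psi$ at $p$.
The tuple $(p, a, b, \psi)$ is called the coefficient tuple for the second-order tangent operator $\partial^{[2]}_{p,a,b,\psi}$.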
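
As a concrete instance (an illustrative choice, not taken from the text above), let $M = \mathbb{R}^2$ with the identity chart $\psi$, and take $a$ to be the $2 \times 2$ identity matrix and $b = 0$. The resulting operator is the Euclidean Laplacian evaluated at $p$. The test function $f$ below is a hypothetical example.

```latex
\begin{align*}
 \partial^{[2]}_{p,a,b,\psi}(f)
  &= \sum_{i,j=1}^2 \delta^{ij}\,\partial_{x^i}\partial_{x^j} f(x)\big|_{x=p}
   = \partial_{x^1}^2 f(p) + \partial_{x^2}^2 f(p), \\
 f(x) &= (x^1)^2 + (x^2)^2 \quad\Longrightarrow\quad
 \partial^{[2]}_{p,a,b,\psi}(f) = 4 \quad\text{for every } p \in \mathbb{R}^2 .
\end{align*}
```

Here the component pair is $(a, b) = (I, 0)$ with respect to the identity chart. With respect to a different chart, the same operator generally has different components.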


60.2.3 REMARK:  Abbreviated notations for second-order tangent operators. 
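More colloquially, the second-order tangent operator $\partial^{[2]}_{p,a,b,\psi}$ may be written as

\[
\partial^{[2]}_{p,a,b,\psi} = \sum_{i,j=1}^n a^{ij} \, \frac{\partial^2}{\partial x^i \, \partial x^j}\bigg|_{x=\psi(p)}
 + \sum_{i=1}^n b^i \, \frac{\partial}{\partial x^i}\bigg|_{x=\psi(p)}
\]

or just

\[
\partial^{[2]}_{p,a,b,\psi} = a^{ij} \frac{\partial^2}{\partial x^i \, \partial x^j} + b^i \frac{\partial}{\partial x^i}
\quad\text{or even}\quad
\partial^{[2]}_{p,a,b,\psi} = a^{ij} \partial_{ij} + b^i \partial_i
\quad\text{or}\quad
\partial^{[2]}_{p,a,b,\psi} = a^{ij} \partial^2_{ij} + b^i \partial_i.
\]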


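60.2.4 NOTATION: Spaces and total spaces of second-order differential operators.

$T^{[2]}_p(M)$, for a $C^2$ manifold $M$ and $p \in M$, denotes the set of all second-order tangent operators at $p$. Thus

\[
T^{[2]}_p(M) = \{ \partial^{[2]}_{p,a,b,\psi};\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M) \},
\]

where $n = \dim(M)$.

$T^{[2]}(M)$, for a $C^2$ manifold $M$, denotes the set of all second-order tangent operators on $M$. Thus

\begin{align*}
T^{[2]}(M) &= \bigcup_{p \in M} T^{[2]}_p(M) \\
&= \{ \partial^{[2]}_{p,a,b,\psi};\ p \in M,\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M) \}.
\end{align*}
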

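60.2.5 THEOREM: Transformation rule for second-order operators.
Second-order tangent operators $\partial^{[2]}_{p_1,a_1,b_1,\psi_1}$ and $\partial^{[2]}_{p_2,a_2,b_2,\psi_2}$ on an $n$-dimensional $C^2$ manifold $M$ are equal if
$p_1 = p_2 = p$ and

\[
\forall i, j \in \mathbb{N}_n, \quad
a_2^{ij} = \sum_{k,\ell=1}^n \frac{\partial}{\partial x^k}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\bigg|_{x=\psi_1(p)} \,
 \frac{\partial}{\partial x^\ell}\big(\psi_2^j \circ \psi_1^{-1}(x)\big)\bigg|_{x=\psi_1(p)} \, a_1^{k\ell},
\tag{60.2.1}
\]
\[
\forall i \in \mathbb{N}_n, \quad
b_2^i = \sum_{k,\ell=1}^n \frac{\partial^2}{\partial x^k \, \partial x^\ell}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\bigg|_{x=\psi_1(p)} \, a_1^{k\ell}
 + \sum_{k=1}^n \frac{\partial}{\partial x^k}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\bigg|_{x=\psi_1(p)} \, b_1^k.
\tag{60.2.2}
\]

In abbreviated form,

\[
\forall i, j \in \mathbb{N}_n, \quad a_2^{ij} = \sum_{k,\ell=1}^n \phi^i_{,k} \, \phi^j_{,\ell} \, a_1^{k\ell},
\tag{60.2.3}
\]
\[
\forall i \in \mathbb{N}_n, \quad b_2^i = \sum_{k,\ell=1}^n \phi^i_{,k\ell} \, a_1^{k\ell} + \sum_{k=1}^n \phi^i_{,k} \, b_1^k,
\tag{60.2.4}
\]

where $\phi = \psi_2 \circ \psi_1^{-1}$, $\forall i, j \in \mathbb{N}_n$, $\phi^i_{,j} = \partial_j \phi^i$ and $\forall i, j, k \in \mathbb{N}_n$, $\phi^i_{,jk} = \partial_j \partial_k \phi^i$.
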
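
PROOF: Suppose that $p_1 = p_2 = p \in M$. Let $\phi = \psi_2 \circ \psi_1^{-1}$. Then for all $f \in C^2(M)$,

\begin{align*}
\partial^{[2]}_{p,a_1,b_1,\psi_1}(f)
&= \sum_{k,\ell=1}^n a_1^{k\ell} \, \partial_{x^k}\partial_{x^\ell}\big(f \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)}
 + \sum_{k=1}^n b_1^k \, \partial_{x^k}\big(f \circ \psi_1^{-1}(x)\big)\big|_{x=\psi_1(p)} \\
&= \sum_{k,\ell=1}^n a_1^{k\ell} \, \partial_{x^k}\partial_{x^\ell}\big(f \circ \psi_2^{-1} \circ \phi(x)\big)\big|_{x=\psi_1(p)}
 + \sum_{k=1}^n b_1^k \, \partial_{x^k}\big(f \circ \psi_2^{-1} \circ \phi(x)\big)\big|_{x=\psi_1(p)} \\
&= \sum_{k,\ell=1}^n a_1^{k\ell} \Big( \sum_{i,j=1}^n \phi^i_{,k} \, \phi^j_{,\ell} \, \partial_{y^i}\partial_{y^j}\big(f \circ \psi_2^{-1}(y)\big)\big|_{y=\psi_2(p)}
 + \sum_{i=1}^n \phi^i_{,k\ell} \, \partial_{y^i}\big(f \circ \psi_2^{-1}(y)\big)\big|_{y=\psi_2(p)} \Big) \\
&\qquad + \sum_{k=1}^n b_1^k \sum_{i=1}^n \phi^i_{,k} \, \partial_{y^i}\big(f \circ \psi_2^{-1}(y)\big)\big|_{y=\psi_2(p)} \\
&= \sum_{i,j=1}^n a_2^{ij} \, \partial_{y^i}\partial_{y^j}\big(f \circ \psi_2^{-1}(y)\big)\big|_{y=\psi_2(p)}
 + \sum_{i=1}^n b_2^i \, \partial_{y^i}\big(f \circ \psi_2^{-1}(y)\big)\big|_{y=\psi_2(p)} \\
&= \partial^{[2]}_{p,a_2,b_2,\psi_2}(f),
\end{align*}

where the derivatives of $\phi$ are evaluated at $\psi_1(p)$, and $a_2$ and $b_2$ are as in equations (60.2.1) and (60.2.2).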
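
A standard check of equations (60.2.1) and (60.2.2) is the Laplacian in polar coordinates. The sketch below assumes that $\psi_1$ is the Cartesian chart $(x, y)$ on the plane, that $\psi_2$ is a polar chart $(r, \theta)$ on a suitable open subset of the punctured plane, and that the operator has components $a_1 = I$ and $b_1 = 0$ in the Cartesian chart. These choices are illustrative and are not taken from the text.

```latex
% Transition map \phi = \psi_2 \circ \psi_1^{-1} : (x, y) \mapsto (r, \theta),
% with r = \sqrt{x^2 + y^2} and \theta the polar angle.
\begin{align*}
 a_2^{rr} &= (\partial_x r)^2 + (\partial_y r)^2 = 1, &
 a_2^{\theta\theta} &= (\partial_x \theta)^2 + (\partial_y \theta)^2 = \frac{1}{r^2}, &
 a_2^{r\theta} &= \partial_x r\,\partial_x\theta + \partial_y r\,\partial_y\theta = 0, \\
 b_2^{r} &= \partial_x^2 r + \partial_y^2 r = \frac{1}{r}, &
 b_2^{\theta} &= \partial_x^2\theta + \partial_y^2\theta = 0 .
\end{align*}
% Hence the same operator in the polar chart is
\[
 \partial_r^2 + \frac{1}{r^2}\,\partial_\theta^2 + \frac{1}{r}\,\partial_r .
\]
```

The first-order coefficient $b_2^r = 1/r$ arises entirely from the second derivatives of the transition map, even though $b_1 = 0$ in the Cartesian chart.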


60.2.6 REMARK:  Abbreviated notations for some second-order calculations. 

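Differential geometry has many of the kinds of expressions and calculations seen in Theorem 60.2.5 and
its proof. One has the choice between writing the full onerous details or using ambiguous abbreviations.
Generally the abbreviations, such as in Remark 60.2.3, are to be preferred, provided that one does not forget how to
write down the full details. In abbreviated form, Theorem 60.2.5 becomes:

\[
a_1^{ij} \partial_{ij} + b_1^i \partial_i = a_2^{k\ell} \tilde\partial_{k\ell} + b_2^k \tilde\partial_k,
\]

where

\[
a_2^{k\ell} = \phi^k_{,i} \, \phi^\ell_{,j} \, a_1^{ij}
\]

and

\[
b_2^k = \phi^k_{,ij} \, a_1^{ij} + \phi^k_{,i} \, b_1^i.
\]

Therefore

\begin{align*}
a_1^{ij} \partial_{ij} + b_1^i \partial_i
&= \phi^k_{,i} \phi^\ell_{,j} a_1^{ij} \, \tilde\partial_{k\ell} + (\phi^k_{,ij} a_1^{ij} + \phi^k_{,i} b_1^i) \, \tilde\partial_k \\
&= a_1^{ij} \phi^k_{,i} \phi^\ell_{,j} \, \tilde\partial_{k\ell} + a_1^{ij} \phi^k_{,ij} \, \tilde\partial_k + b_1^i \phi^k_{,i} \, \tilde\partial_k \\
&= a_1^{ij} (\phi^k_{,i} \phi^\ell_{,j} \, \tilde\partial_{k\ell} + \phi^k_{,ij} \, \tilde\partial_k) + b_1^i (\phi^k_{,i} \, \tilde\partial_k).
\tag{60.2.5}
\end{align*}

Here $\phi = \psi_2 \circ \psi_1^{-1}$, and $\partial$ and $\tilde\partial$ denote partial derivatives with respect to the charts $\psi_1$ and $\psi_2$ respectively.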


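60.2.7 REMARK: Rearrangement of a second-order operator to make the coefficients tensorial.

The expression (60.2.5) looks simple enough. It suggests that $\partial_{ij} = \phi^k_{,i} \phi^\ell_{,j} \tilde\partial_{k\ell} + \phi^k_{,ij} \tilde\partial_k$ and $\partial_i = \phi^k_{,i} \tilde\partial_k$, which
is true and interesting. However, it is much more useful to rearrange (60.2.5) so that $a$ and $b$ transform like
tensors. Thus

\begin{align*}
a_1^{ij} \partial_{ij} + b_1^i \partial_i
&= (a_1^{ij} \phi^k_{,i} \phi^\ell_{,j}) (\tilde\partial_{k\ell} + \bar\phi^r_{,k} \bar\phi^s_{,\ell} \phi^m_{,rs} \tilde\partial_m) + (b_1^i \phi^k_{,i}) \tilde\partial_k \\
&= \tilde a^{k\ell} (\tilde\partial_{k\ell} + \bar\phi^r_{,k} \bar\phi^s_{,\ell} \phi^m_{,rs} \tilde\partial_m) + \tilde b^k \tilde\partial_k,
\end{align*}

where $\bar\phi = \phi^{-1}$,

\[
\tilde a^{k\ell} = a_1^{ij} \phi^k_{,i} \phi^\ell_{,j}
\]

and

\[
\tilde b^k = b_1^i \phi^k_{,i}.
\]

This gives us a nice tensorial form for the coefficients. It follows that the operators $\tilde\partial_{k\ell} + \bar\phi^r_{,k} \bar\phi^s_{,\ell} \phi^m_{,rs} \tilde\partial_m$ and $\tilde\partial_k$
are also tensorial. So we have constructed a tensorial kind of second-order derivative. The problem with
this is that the second-order derivative operator must be calculated in terms of a single special chart (or a
special subset of atlas-compatible charts).


An interesting question to ask now is how to define a second-order operator on any $C^2$ manifold so that it
looks like $\tilde\partial_{k\ell} + \bar\phi^r_{,k} \bar\phi^s_{,\ell} \phi^m_{,rs} \tilde\partial_m$ when transformed. This question leads to Theorem 60.4.7.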

60.2.8 REMARK: Difficulty of showing that operator equality implies equality of operator parameters.

To show the converse of Theorem 60.2.5 is not so easy. It must be shown first that $p_1 = p_2$ and
$v_2^i = v_1^j \, \partial_j(\psi_2^i \circ \psi_1^{-1})$ for all $i \in \mathbb{N}_n$, and then that the symmetric parts of the second-order coefficients are the
same. This requires the use of test functions.


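60.2.9 REMARK: Vanishing of the second-order part of an operator is chart-independent.
The set of operators $\partial^{[2]}_{p,0,b,\psi}$ is a closed subset of the set of operators $\partial^{[2]}_{p,a,b,\psi}$ in $T^{[2]}_p(M)$ under chart transitions,
because equation (60.2.1) gives $a_2 = 0$ whenever $a_1 = 0$.
The set of tangent operators $\partial^{[2]}_{p,a,0,\psi}$ is not closed, because equation (60.2.2) may give $b_2 \neq 0$ when $b_1 = 0$.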
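
The non-closure in the second statement can be seen in a one-dimensional sketch. Assume $M = \mathbb{R}$, $\psi_1 = \mathrm{id}$ and $\psi_2 = \exp$, so that the hypothetical transition map is $\phi(x) = e^x$ with chart variable $y = e^x$, and start from the components $a_1 = 1$ and $b_1 = 0$. The transformation rule of Theorem 60.2.5 then reads as follows.

```latex
\begin{align*}
 a_2 &= \big(\phi'(x)\big)^2 a_1 = e^{2x} = y^2, \\
 b_2 &= \phi''(x)\,a_1 + \phi'(x)\,b_1 = e^{x} = y \ne 0, \\
 \partial_x^2 &= y^2\,\partial_y^2 + y\,\partial_y .
\end{align*}
```

Conversely, if $a_1 = 0$ then $a_2 = 0$ for every transition map, which is the chart-independence asserted in the first statement.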


60.2.10 REMARK: Tagged second-order tangent operators. 

The second-order tangent operators in Definition 60.2.2 have the same ambiguity problem as the first-order
tangent operators in Definition 54.11.2. A work-around for this problem is given in Section 54.15, where 
tangent operators are tagged with their base points. The same work-around is provided by Definition 60.2.11. 


60.2.11 DEFINITION: A tagged second-order tangent operator, on a $C^2$ manifold $M$, is a pair $(p, \partial^{[2]}_{p,a,b,\psi})$
such that $p \in M$ and $\partial^{[2]}_{p,a,b,\psi} : C^2(M) \to \mathbb{R}$ is a second-order tangent operator at $p$.

The tuple $(p, a, b, \psi)$ is called the coefficient tuple for the tagged second-order tangent operator $(p, \partial^{[2]}_{p,a,b,\psi})$.


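60.2.12 NOTATION: Particular tagged second-order differential operators.
$\hat\partial^{[2]}_{p,a,b,\psi}$, for $p \in M$, $a \in \mathrm{Sym}(n, \mathbb{R})$, $b \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ for an $n$-dimensional $C^2$ manifold $M$, denotes
the ordered pair $(p, \partial^{[2]}_{p,a,b,\psi})$.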


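60.2.13 NOTATION: Spaces and total spaces of tagged second-order differential operators.
$\hat T^{[2]}_p(M)$, for a $C^2$ manifold $M$ and $p \in M$, denotes the set of all tagged second-order tangent operators at $p$.
Thus

\begin{align*}
\hat T^{[2]}_p(M) &= \{ \hat\partial^{[2]}_{p,a,b,\psi};\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M) \} \\
&= \{ (p, \partial^{[2]}_{p,a,b,\psi});\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M) \},
\end{align*}

where $n = \dim(M)$.
$\hat T^{[2]}(M)$, for a $C^2$ manifold $M$, denotes the set of all tagged second-order tangent operators on $M$. Thus

\begin{align*}
\hat T^{[2]}(M) &= \bigcup_{p \in M} \hat T^{[2]}_p(M) \\
&= \{ \hat\partial^{[2]}_{p,a,b,\psi};\ p \in M,\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M) \} \\
&= \{ (p, \partial^{[2]}_{p,a,b,\psi});\ p \in M,\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n,\ \psi \in \mathrm{atlas}_p(M) \}.
\end{align*}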


60.2.14 REMARK: Higher-order tangent operator bundles. 

It is clear that third and higher order tangent operators, tangent vectors and tangent bundles may be defined 
following the pattern of the second order. These are not presented here because the definitions and notations 
for them are excessively onerous in proportion to their utility. 


60.3. Composition of first-order tangent operator fields 


60.3.1 REMARK: The composite of two first-order operator fields is a second-order operator field. 

Theorem 60.3.3 shows that second-order tangent operator fields arise naturally when the actions of
first-order tangent operator fields on real-number functions are composed. It is well known that the second-order
component of the commutator $[X, Y]$ = “$XY - YX$” of tangent operator fields $X, Y \in X^1(T(M))$ is equal to
zero if the fields are $C^1$ and the function acted upon is $C^2$. (See Section 61.5.) Theorem 60.3.3 shows that $XY$
and $YX$ are well-defined second-order operator fields. Hence they can be meaningfully subtracted pointwise
as elements of the well-defined linear space $T^{[2]}_p(M)$ for each $p \in M$. The result of this subtraction may then
be identified with a first-order vector field, which is known as the “Lie bracket”. (See Definition 61.5.7.)

60.3.2 REMARK: Two interpretations of a “product” of tangent operator fields.
The product notation “$XY$” for two first-order tangent operator fields has two distinct interpretations.


(1) Second-order tangent operator in $T^{[2]}(M)$.
$\forall X, Y \in X^1(T(M))$, $(X, Y) \mapsto \partial_{X(p)} \circ \partial_Y \in T^{[2]}_p(M)$.
The composition expression $\partial_X \circ \partial_Y$ is obtained by making $Y$ act on a function $f \in C^2(M)$ to give
a function $\partial_Y f \in C^1(M)$, which is then acted on by $X$ to give a function $\partial_X(\partial_Y f) \in C^0(M)$. Thus
$\partial_X \circ \partial_Y$ maps $C^2(M)$ to $C^0(M)$. This is used in the definition of the Lie bracket. (See Section 61.5.)
(2) Second-level tangent vector in $T^{(2)}(M) = T(T(M))$.
$\forall X, Y \in X^1(T(M))$, $(X, Y) \mapsto \partial_{X(p)} Y \in T_{Y(p)}(T(M))$.
The naive derivative $\partial_{X(p)} Y$ is obtained by making $X(p)$ act on $Y \in X^1(T(M))$ to give a second-level
tangent vector $\partial_{X(p)} Y \in T_{Y(p)}(T(M))$ for each $p \in M$.


The composition interpretation $\partial_X \circ \partial_Y$ is suitable for use in the Lie bracket expression $[X, Y]$. The
flow-rate interpretation $\partial_{X(p)} Y$ is more suitable for use in covariant derivative expressions like the difference
$\partial_{X(p)} Y - \theta_{X(p)}(Y(p)) \in T_{Y(p)}(T(M))$ between a parallel transport term $\theta_{X(p)}(Y(p))$ and a naive derivative
term $\partial_{X(p)} Y$. (See Definition 71.6.9 for the covariant derivative.)

It is perhaps noteworthy that the Lie derivative $L_X Y$ is constructed more similarly to the covariant derivative
than to the Lie bracket. Thus $L_X Y = \partial_{X(p)} Y - \partial_{Y(p)} X$ is a difference between two elements of $T(T(M))$.
This difference expression requires the application of a “horizontal component swap function”, as described
in Section 59.6. Numerically it is true that $L_X Y = [X, Y]$, but the construction method is quite different,
and the terms which are subtracted are in different spaces.


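60.3.3 THEOREM: Composition of two first-order operator fields.
Let $X, Y \in X^1(T(M))$ be $C^1$ operator fields on a $C^2$ manifold $M$ with $n = \dim(M)$. Let $\psi \in \mathrm{atlas}(M)$. Let
$X$ and $Y$ satisfy $X(p) = \sum_{i=1}^n \xi_X^i(\psi(p)) \, \partial_i^{p,\psi}$ and $Y(p) = \sum_{i=1}^n \xi_Y^i(\psi(p)) \, \partial_i^{p,\psi}$ for all $p \in \mathrm{Dom}(\psi)$, for some
functions $\xi_X, \xi_Y : \mathrm{Range}(\psi) \to \mathbb{R}^n$. Then

\[
\forall f \in C^2(M),\ \forall p \in \mathrm{Dom}(\psi),
\]
\begin{align*}
(XY)(f)(p) &= \sum_{i,j=1}^n \xi_X^i(\psi(p)) \, \xi_Y^j(\psi(p)) \, \partial_{x^i}\partial_{x^j} f(\psi^{-1}(x))\big|_{x=\psi(p)} \\
&\qquad + \sum_{i,j=1}^n \xi_X^i(\psi(p)) \, \partial_{x^i} \xi_Y^j(x)\big|_{x=\psi(p)} \, \partial_{x^j} f(\psi^{-1}(x))\big|_{x=\psi(p)} \\
&= \sum_{i,j=1}^n a^{ij}(\psi(p)) \, \partial_{x^i}\partial_{x^j} f(\psi^{-1}(x))\big|_{x=\psi(p)}
 + \sum_{j=1}^n b^j(\psi(p)) \, \partial_{x^j} f(\psi^{-1}(x))\big|_{x=\psi(p)},
\end{align*}

where $a^{ij} \in C^1(\mathrm{Range}(\psi), \mathbb{R})$ and $b^j \in C^0(\mathrm{Range}(\psi), \mathbb{R})$ for $i, j \in \mathbb{N}_n$ are defined by $a^{ij}(x) = \xi_X^i(x) \, \xi_Y^j(x)$
for $i, j \in \mathbb{N}_n$ and $b^j(x) = \sum_{i=1}^n \xi_X^i(x) \, \partial_{x^i} \xi_Y^j(x)$ for $j \in \mathbb{N}_n$, for all $x \in \mathrm{Range}(\psi)$. Then

\[
\forall p \in \mathrm{Dom}(\psi), \quad
(XY)(p) = \sum_{i,j=1}^n a^{ij}(\psi(p)) \, \partial_i^{p,\psi} \partial_j^{p,\psi} + \sum_{j=1}^n b^j(\psi(p)) \, \partial_j^{p,\psi}.
\tag{60.3.1}
\]


PROOF: The assertion follows by straightforward computations of derivatives.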
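
A small worked case of this composition formula, assuming $M = \mathbb{R}^2$ with the identity chart and the hypothetical fields $X = \partial_1$ and $Y = x^1 \partial_2$, so that $\xi_X = (1, 0)$ and $\xi_Y = (0, x^1)$:

```latex
\begin{align*}
 a^{12}(x) &= \xi_X^1(x)\,\xi_Y^2(x) = x^1, \qquad a^{11} = a^{21} = a^{22} = 0, \\
 b^{2}(x) &= \xi_X^1(x)\,\partial_{x^1}\xi_Y^2(x) = 1, \qquad b^{1} = 0, \\
 (XY)(f) &= x^1\,\partial_1\partial_2 f + \partial_2 f, \qquad
 (YX)(f) = x^1\,\partial_2\partial_1 f, \\
 (XY - YX)(f) &= \partial_2 f .
\end{align*}
```

The second-order parts cancel for $f \in C^2(M)$ because mixed partial derivatives commute, leaving the first-order operator $\partial_2$. Note that the coefficient array $a^{ij}$ of $XY$ is not symmetric in this example, although it acts in the same way as its symmetrised version.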


60.4. Tensorisation coefficients for second-order tangent operators 


60.4.1 REMARK:  Tensorisation coefficients “correct” non-tensoriality of the second differential. 

Theorem 60.4.3 is an attempt to define a second-order version of the differential $(df)_p$ of a real-valued
function $f \in C^1(M)$ at a point $p \in M$. Since the second-order differential is not tensorial, it is necessary
to define “tensorisation coefficients” to adjust it to make it tensorial. Then one obtains a kind of tensorial
second-order differential “$(d^2 f)_p$”, but it depends upon an almost arbitrary choice for the tensorisation
coefficient array $\omega$. This array is required to obey a transformation law which is not at all tensorial. This
may seem odd at first, but the tensorisation coefficients must be viewed as a kind of “error correction factor”
for the second-derivative operator under chart transitions. In other words, the non-tensoriality of the array
$(\partial_{ij} f(p))_{i,j=1}^n$ is corrected by the non-tensoriality of the array $(\omega(\psi)^k_{ij})_{i,j,k=1}^n$.

The negative sign in the expression for $L(\psi)(f)_{ij}$ in Theorem 60.4.3 harmonises the sign of $\omega$ with the sign
of the Christoffel array $\Gamma$ in the expression $\partial_{ij} - \Gamma^k_{ij} \partial_k$ for the Hessian operator in Remark 71.9.7. However,
the Christoffel array is the negative of the coefficients of the horizontal lift map for an affinely connected
manifold. In other words, the Christoffel array indicates the negative of the direction of parallel transport for
tangent vectors. Consequently $\omega$ may be thought of as signifying parallel transport for tangent covectors. So
the differential operator $\partial_{ij} - \sum_{k=1}^n \omega(\psi)^k_{ij} \partial_k$ is in effect subtracting some kind of “parallel transport” $\omega$ from
the naive second partial derivative. This corresponds closely to the general concept of a covariant derivative
in Definition 71.6.4, which quantifies the difference between the naive derivative and parallel transport.


60.4.2 REMARK:  Abbreviated notations for chart-dependent differential operators. 
In Theorems 60.4.3, 60.4.5 and 60.4.7, “$\partial_k f(p)$” is an abbreviation for $\partial_{x^k}\big(f(\psi^{-1}(x))\big)\big|_{x=\psi(p)}$ and “$\partial_{ij} f(p)$” is
an abbreviation for $\partial_{x^i}\partial_{x^j}\big(f(\psi^{-1}(x))\big)\big|_{x=\psi(p)}$. So these abbreviated notations implicitly depend on the chart $\psi$.
The notations “$\partial^\psi_k f(p)$” and “$\partial^\psi_{ij} f(p)$” would be slightly more informative, but within such a narrow context,
some such “abuse of notation” is acceptable. Using Notation 54.13.5, one may write “$\partial_i f(p)$” more accurately
as “$\partial_i^{p,\psi}(f)$”, but in the corresponding second-order expression “$\partial_{ij}^{p,\psi}(f)$”, the tangent operator $\partial_{ij}^{p,\psi}$ would
be an element of $T^{[2]}_p(M)$ as in Notation 60.2.4 instead of $\mathring{T}_p(M)$.


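60.4.3 THEOREM: Pointwise tensorisation of the second-order tangent differential operator.

Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Let $p \in M$. For $\psi \in \mathrm{atlas}_p(M)$, let $\omega(\psi) \in \mathbb{R}^{n^3}$, and define
$L(\psi) : C^2(M) \to \mathbb{R}^{n^2}$ by $L(\psi)(f)_{ij} = \partial_{ij} f(p) - \sum_{k=1}^n \omega(\psi)^k_{ij}\,\partial_k f(p)$ for $f \in C^2(M)$ and $i,j \in \mathbb{N}_n$. Then the
array $(L(\psi)(f)_{ij})_{i,j=1}^n$ transforms like the component array of a tensor in $T^{0,2}_p(M)$ if and only if


    $\forall \psi, \tilde\psi \in \mathrm{atlas}_p(M),\ \forall i,j,\ell \in \mathbb{N}_n,$
    $\omega(\tilde\psi)^\ell_{ij} = \sum_{k,r,s=1}^n \tilde\phi^\ell_{,k}(x)\,\phi^r_{,i}(\tilde x)\,\phi^s_{,j}(\tilde x)\,\omega(\psi)^k_{rs} + \sum_{k=1}^n \tilde\phi^\ell_{,k}(x)\,\phi^k_{,ij}(\tilde x),$    (60.4.1)


where $\tilde\phi = \tilde\psi \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathrm{Range}(\tilde\psi)$ and $\phi = \psi \circ \tilde\psi^{-1} = \tilde\phi^{-1} : \mathrm{Range}(\tilde\psi) \to \mathrm{Range}(\psi)$, and
comma-subscripts such as "$_{,i}$" and "$_{,ij}$" signify the corresponding differential operators "$\partial_i$" and "$\partial_{ij}$", and
$x = \psi(p)$ and $\tilde x = \tilde\psi(p)$. In more abbreviated form, line (60.4.1) may be written as


    $\tilde\omega^\ell_{ij} = \tilde\phi^\ell_{,k}\,\phi^r_{,i}\,\phi^s_{,j}\,\omega^k_{rs} + \tilde\phi^\ell_{,k}\,\phi^k_{,ij}.$

PROOF: Let $p \in M$, $\psi, \tilde\psi \in \mathrm{atlas}_p(M)$, $f \in C^2(M)$ and $i,j \in \mathbb{N}_n$. Then by the definition of $L(\tilde\psi)$ and $L(\psi)$,


    $L(\tilde\psi)(f)_{ij} = \dfrac{\partial^2 (f \circ \tilde\psi^{-1}(\tilde x))}{\partial\tilde x^i\,\partial\tilde x^j} - \sum_{m=1}^n \omega(\tilde\psi)^m_{ij}\,\dfrac{\partial (f \circ \tilde\psi^{-1}(\tilde x))}{\partial\tilde x^m}$

    $= \dfrac{\partial^2 (f \circ \psi^{-1} \circ \psi \circ \tilde\psi^{-1}(\tilde x))}{\partial\tilde x^i\,\partial\tilde x^j} - \sum_{m=1}^n \omega(\tilde\psi)^m_{ij}\,\dfrac{\partial (f \circ \psi^{-1} \circ \psi \circ \tilde\psi^{-1}(\tilde x))}{\partial\tilde x^m}$

    $= \sum_{k,\ell=1}^n \phi^k_{,i}(\tilde x)\,\phi^\ell_{,j}(\tilde x)\,\dfrac{\partial^2 (f \circ \psi^{-1})}{\partial x^k\,\partial x^\ell}\Big|_{x=\psi(p)} + \sum_{k=1}^n \Big(\phi^k_{,ij}(\tilde x) - \sum_{m=1}^n \omega(\tilde\psi)^m_{ij}\,\phi^k_{,m}(\tilde x)\Big)\,\dfrac{\partial (f \circ \psi^{-1})}{\partial x^k}\Big|_{x=\psi(p)}$

    $= \sum_{k,\ell=1}^n \phi^k_{,i}(\tilde x)\,\phi^\ell_{,j}(\tilde x)\,L(\psi)(f)_{k\ell} + \sum_{k,r,s=1}^n \phi^r_{,i}(\tilde x)\,\phi^s_{,j}(\tilde x)\,\omega(\psi)^k_{rs}\,\partial_k f(p)$
    $\quad + \sum_{k=1}^n \Big(\phi^k_{,ij}(\tilde x) - \sum_{m=1}^n \omega(\tilde\psi)^m_{ij}\,\phi^k_{,m}(\tilde x)\Big)\,\partial_k f(p).$

This equals $\sum_{k,\ell=1}^n \phi^k_{,i}(\tilde x)\,\phi^\ell_{,j}(\tilde x)\,L(\psi)(f)_{k\ell}$ (i.e. transforms like components for $T^{0,2}_p(M)$) if and only if

    $\forall k \in \mathbb{N}_n,\quad \sum_{r,s=1}^n \phi^r_{,i}(\tilde x)\,\phi^s_{,j}(\tilde x)\,\omega(\psi)^k_{rs} + \phi^k_{,ij}(\tilde x) - \sum_{m=1}^n \omega(\tilde\psi)^m_{ij}\,\phi^k_{,m}(\tilde x) = 0.$


This may be multiplied by the matrix $[\tilde\phi^\ell_{,k}(x)]_{\ell,k=1}^n$ and rearranged as the equality


    $\forall i,j,\ell \in \mathbb{N}_n,\quad \omega(\tilde\psi)^\ell_{ij} = \sum_{k,r,s=1}^n \tilde\phi^\ell_{,k}(x)\,\phi^r_{,i}(\tilde x)\,\phi^s_{,j}(\tilde x)\,\omega(\psi)^k_{rs} + \sum_{k=1}^n \tilde\phi^\ell_{,k}(x)\,\phi^k_{,ij}(\tilde x).$


This agrees with line (60.4.1).


60.4.4 REMARK:  Tensorised second-order differential operators.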

Theorem 60.4.5 means that the map $L(\psi) : C^2(M) \to \mathbb{R}^{n^2}$ for generating component arrays for functions
$f \in C^2(M)$ at points $p \in M$ yields the same tensor in $T^{0,2}_p(M)$ for any choice of $\psi \in \mathrm{atlas}_p(M)$. In other
words, the procedure for constructing the tensor $t^{0,2}_{p,L(\psi)(f),\psi}$ from the given tensorisation coefficient array
$\omega(\psi)$ is independent of the choice of $\psi$.


It is often written in differential geometry textbooks that an object is a tensor of a given type if it “transforms” 
like a tensor of that type. This means that the construction method for the object must be chart-independent. 


It is perhaps noteworthy that if a construction method is given for only one chart $\psi_0$, the issue of
chart-independence does not arise. One may simply transform the tensor's components according to the rule for
that type of tensor to obtain correct component arrays for all charts in terms of the one special chart $\psi_0$. In
practice, however, it is usually most desirable to use construction procedures which are expressed in terms of
an arbitrary chart. Such procedures must be tested to determine whether the constructed component arrays
"transform" correctly for the intended tensor type.


60.4.5 THEOREM:  Tensorial transformation rule for tensorised second-order operators.

Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Let $p \in M$. Let $\omega : \mathrm{atlas}_p(M) \to \mathbb{R}^{n^3}$ satisfy line (60.4.1)
in the statement of Theorem 60.4.3. Define $L(\psi) : C^2(M) \to \mathbb{R}^{n^2}$ for all $\psi \in \mathrm{atlas}_p(M)$ by $L(\psi)(f)_{ij} =
\partial_{ij} f(p) - \sum_{k=1}^n \omega(\psi)^k_{ij}\,\partial_k f(p)$ for all $f \in C^2(M)$ and $i,j \in \mathbb{N}_n$.

Let $\psi, \tilde\psi \in \mathrm{atlas}_p(M)$. Let $f \in C^2(M)$. Define the array $w = (w_{ij})_{i,j=1}^n \in \mathbb{R}^{n^2}$ by $w = (L(\psi)(f)_{ij})_{i,j=1}^n$. Then
$t^{0,2}_{p,w,\psi} = t^{0,2}_{p,\tilde w,\tilde\psi}$ if and only if $\tilde w = (L(\tilde\psi)(f)_{ij})_{i,j=1}^n$.


PROOF: The assertion follows from Theorem 60.4.3 and the $T^{0,2}_p(M)$ component array transformation rule.
(See Theorem 29.6.4 for the general component array transformation rule for tensors of type $(r,s)$ with
$r,s \in \mathbb{Z}^+_0$. See Notation 56.2.4 for the general transformation rules for tensors $t^{r,s}_{p,a,\psi} \in T^{r,s}_p(M)$.)


60.4.6 REMARK: Globalisation of the tensorisation of the second-order differential. 

Theorem 60.4.7 "globalises" Theorem 60.4.3 by making both $\omega(\psi)$ and $L(\psi)(f)$ functions of a variable
point $p \in \mathrm{Dom}(\psi)$. Then Theorem 60.4.8 asserts that there is a well-defined second-order differential of a
function $f \in C^2(M)$ if the tensorisation coefficient array map $\omega$ transforms as indicated in line (60.4.2) of
Theorem 60.4.7. Such a differential could be denoted as "$d^2_\omega f$", for example. In fact, when the Christoffel
array $\Gamma$ for an affine connection acts as the tensorisation coefficient array for a manifold, a notation such as
"$D^2 f$" is typically used, although it would be more informative to write "$D^2_\Gamma f$".


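60.4.7 THEOREM: Global tensorisation of the second-order tangent differential operator.

Let $M$ be a $C^2$ manifold with $n = \dim(M)$. For $\psi \in \mathrm{atlas}(M)$, let $\omega(\psi) : U_\psi \to \mathbb{R}^{n^3}$, where $U_\psi = \mathrm{Dom}(\psi)$,
and define $L(\psi) : C^2(M) \to (U_\psi \to \mathbb{R}^{n^2})$ by $L(\psi)(f)(p)_{ij} = \partial_{ij} f(p) - \sum_{k=1}^n \omega(\psi)(p)^k_{ij}\,\partial_k f(p)$ for $f \in C^2(M)$,
$p \in U_\psi$ and $i,j \in \mathbb{N}_n$. Then the array $L(\psi)(f)(p)_{ij}$ transforms like the components of a tensor in $T^{0,2}_p(M)$
for all $f \in C^2(M, \mathbb{R})$ and $p \in U_\psi$ if and only if $\omega$ satisfies


    $\forall \psi, \tilde\psi \in \mathrm{atlas}(M),\ \forall p \in U_\psi \cap U_{\tilde\psi},\ \forall i,j,\ell \in \mathbb{N}_n,$
    $\omega(\tilde\psi)(p)^\ell_{ij} = \sum_{k,r,s=1}^n \tilde\phi^\ell_{,k}(x)\,\phi^r_{,i}(\tilde x)\,\phi^s_{,j}(\tilde x)\,\omega(\psi)(p)^k_{rs} + \sum_{k=1}^n \tilde\phi^\ell_{,k}(x)\,\phi^k_{,ij}(\tilde x),$    (60.4.2)


where $\tilde\phi = \tilde\psi \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathrm{Range}(\tilde\psi)$ and $\phi = \psi \circ \tilde\psi^{-1} = \tilde\phi^{-1} : \mathrm{Range}(\tilde\psi) \to \mathrm{Range}(\psi)$, and
comma-subscripts such as "$_{,i}$" and "$_{,ij}$" signify the corresponding differential operators "$\partial_i$" and "$\partial_{ij}$", and
$x = \psi(p)$ and $\tilde x = \tilde\psi(p)$.

PROOF: The assertion follows by the application of Theorem 60.4.3 at each point $p \in M$.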


60.4.8 THEOREM:  Chart-independence of tensorised second-order operator action on real function.
Let $M$ be a $C^2$ manifold with $n = \dim(M)$. For $\psi \in \mathrm{atlas}(M)$, let $\omega(\psi) : U_\psi \to \mathbb{R}^{n^3}$, where $U_\psi = \mathrm{Dom}(\psi)$,
satisfy line (60.4.2) in Theorem 60.4.7, and define $L(\psi) : C^2(M) \to (U_\psi \to \mathbb{R}^{n^2})$ by $L(\psi)(f)(p)_{ij} =
\partial_{ij} f(p) - \sum_{k=1}^n \omega(\psi)(p)^k_{ij}\,\partial_k f(p)$ for $f \in C^2(M)$, $p \in U_\psi$ and $i,j \in \mathbb{N}_n$. Then

    $\forall f \in C^2(M),\ \forall p \in M,\ \forall \psi, \tilde\psi \in \mathrm{atlas}_p(M),$
    $t^{0,2}_{p,L(\psi)(f)(p),\psi} = t^{0,2}_{p,L(\tilde\psi)(f)(p),\tilde\psi}.$    (60.4.3)

Hence $d^2_\omega f = \{(p, t^{0,2}_{p,L(\psi)(f)(p),\psi});\ p \in M,\ \psi \in \mathrm{atlas}_p(M)\}$ is a well-defined vector field in $X(T^{0,2}(M))$.


PROOF: Line (60.4.3) follows from Theorems 60.4.7 and 60.4.5. It then follows that there is a unique
pair $(p, t^{0,2}_{p,L(\psi)(f)(p),\psi})$ in the set $d^2_\omega f$ for each $p \in M$. So $d^2_\omega f \in X(T^{0,2}(M))$ by Notation 57.5.3 because
$t^{0,2}_{p,L(\psi)(f)(p),\psi} \in T^{0,2}_p(M)$ for all $p \in M$.


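60.4.9 DEFINITION: A tensorisation coefficient array map for a $C^2$ manifold $M$ with $n = \dim(M)$ is a
map $\omega : \mathrm{atlas}(M) \to \bigcup_{\psi \in \mathrm{atlas}(M)} (\mathrm{Dom}(\psi) \to \mathbb{R}^{n^3})$ which satisfies $\forall \psi \in \mathrm{atlas}(M)$, $\mathrm{Dom}(\omega(\psi)) = \mathrm{Dom}(\psi)$
and line (60.4.2) in Theorem 60.4.7.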


60.4.10 REMARK: Freedom in the choice of tensorisation coefficients.
The tensorisation coefficients $\omega(\psi)^k_{ij}$ in Definition 60.4.9 are completely arbitrary for any fixed chart $\psi$, but
then the transformation rules completely determine $\omega(\tilde\psi)^k_{ij}$ for all other charts $\tilde\psi$. The values of $\omega(\psi)(p)^k_{ij}$ are
completely independent at all points $p \in M$. In particular, the values need not be continuous, bounded
or even integrable.


60.4.11 REMARK: Some properties which tensorisation coefficients do not necessarily have. 

Although the operators $\partial_{ij}$ yield a symmetric matrix of values $\partial_{ij} f$ when applied to any function $f \in C^2(M)$,
the tensorisation coefficients $\omega(\psi)^k_{ij}$ are not necessarily symmetric. Therefore the operators $L(\psi)(f)_{ij}$ in
Theorem 60.4.7 do not necessarily yield a symmetric matrix. Although $\partial_{ij} f(p)$ is necessarily continuous
and symmetric, $L(\psi)(f)(p)_{ij}$ may be neither continuous nor symmetric. Asymmetry of the "tensorisation
coefficients" turns out to be interesting enough to have its own name: "torsion".


60.4.12 REMARK:  Coefficients of affine connections satisfy conditions for tensorisation coefficients. 

The Christoffel array $\Gamma^k_{ij} = \frac{1}{2} g^{k\ell}(\partial g_{j\ell}/\partial x^i + \partial g_{i\ell}/\partial x^j - \partial g_{ij}/\partial x^\ell)$ for the Levi-Civita connection on a
Riemannian manifold in Section 74.1 satisfies the requirements of Theorem 60.4.8 with $\omega = \Gamma$. (This is shown
in Theorem 74.3.7.) Therefore second covariant derivatives which are based on the Levi-Civita connection
in a Riemannian manifold are tensorial.
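

As a concrete check of the transformation rule in line (60.4.1), the following sketch works through the punctured plane with a Cartesian chart and a polar chart. The setup is an assumption made purely for illustration and is not taken from the text: $\psi$ is the Cartesian chart, $\tilde\psi$ is the polar chart, $\phi = \psi \circ \tilde\psi^{-1}$, $\tilde\phi = \phi^{-1}$, and the coefficients are chosen to vanish in the Cartesian chart.

```latex
% Illustrative sketch (assumed setup): M = R^2 \ {0}, psi = (x^1, x^2) Cartesian,
% \tilde\psi = (r, \theta) polar, and \omega(\psi) = 0 in the Cartesian chart.
\begin{align*}
  \phi(r,\theta) &= (r\cos\theta,\ r\sin\theta), \\
  \phi^1_{,r\theta} &= -\sin\theta, \quad \phi^2_{,r\theta} = \cos\theta, \quad
  \phi^1_{,\theta\theta} = -r\cos\theta, \quad \phi^2_{,\theta\theta} = -r\sin\theta, \quad
  \phi^k_{,rr} = 0, \\
  \tilde\phi^r_{,1} &= \cos\theta, \quad \tilde\phi^r_{,2} = \sin\theta, \quad
  \tilde\phi^\theta_{,1} = -\tfrac{1}{r}\sin\theta, \quad \tilde\phi^\theta_{,2} = \tfrac{1}{r}\cos\theta. \\
\intertext{With $\omega(\psi) = 0$, only the inhomogeneous term of the rule survives:}
  \omega(\tilde\psi)^\ell_{ij} &= \sum_{k=1}^2 \tilde\phi^\ell_{,k}\,\phi^k_{,ij}, \\
  \omega(\tilde\psi)^r_{\theta\theta} &= \cos\theta\,(-r\cos\theta) + \sin\theta\,(-r\sin\theta) = -r, \\
  \omega(\tilde\psi)^\theta_{r\theta} &= -\tfrac{1}{r}\sin\theta\,(-\sin\theta) + \tfrac{1}{r}\cos\theta\,\cos\theta = \tfrac{1}{r}, \\
  \omega(\tilde\psi)^r_{r\theta} &= \cos\theta\,(-\sin\theta) + \sin\theta\,\cos\theta = 0, \qquad
  \omega(\tilde\psi)^\theta_{\theta\theta} = \sin\theta\cos\theta - \cos\theta\sin\theta = 0. \\
\intertext{The tensorised second-order components in the polar chart are therefore}
  L(\tilde\psi)(f)_{rr} &= \partial_{rr} f, \qquad
  L(\tilde\psi)(f)_{r\theta} = \partial_{r\theta} f - \tfrac{1}{r}\,\partial_\theta f, \qquad
  L(\tilde\psi)(f)_{\theta\theta} = \partial_{\theta\theta} f + r\,\partial_r f.
\end{align*}
```

Under these assumptions the resulting coefficients coincide with the familiar Christoffel symbols of the flat metric in polar coordinates, which is consistent with the claim that the Levi-Civita Christoffel array is a valid choice of tensorisation coefficients.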


60.5. Second-order tangent vectors 


60.5.1 REMARK: Higher-order tangent vectors are abstractions from higher-order tangent operators. 

The second-order tangent vectors in Section 60.5 are based on the chart transition rules for second-order 
tangent operators in Section 60.2 in the same way that the familiar first-order tangent vectors in Section 54.4
have the same chart transition rules as first-order tangent operators in Section 54.11. This observation applies 
also to operators of the third and higher orders. 


60.5.2 REMARK: Symmetric coefficient matrices for second-order differential operators. 

Since the matrix $[\partial_{ij} f]_{i,j=1}^n$ of second-order derivatives of a $C^2$ function $f$ on an n-dimensional Cartesian space
is symmetric, the contraction $\sum_{i,j=1}^n a^{ij}\partial_{ij} f$ of this matrix with another matrix $a = [a^{ij}]_{i,j=1}^n$ depends only
on the symmetrant of $a$. Therefore the set of real symmetric $n \times n$ matrices $\mathrm{Sym}(n, \mathbb{R})$ in Notation 25.13.4
is employed in Definition 60.5.3 instead of general matrices in $M_{n,n}(\mathbb{R})$.
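

The dependence on the symmetric part alone can be verified in one line. The sketch below assumes that "symmetrant" means the symmetric part $\frac{1}{2}(a + a^{\mathsf T})$ of the matrix $a$; this reading is an assumption, not a statement from the text.

```latex
% Sketch: relabel the summation indices in half of the sum, then use \partial_{ji} f = \partial_{ij} f.
\begin{align*}
  \sum_{i,j=1}^n a^{ij}\,\partial_{ij} f
  &= \tfrac{1}{2}\sum_{i,j=1}^n a^{ij}\,\partial_{ij} f + \tfrac{1}{2}\sum_{i,j=1}^n a^{ji}\,\partial_{ji} f \\
  &= \sum_{i,j=1}^n \tfrac{1}{2}\,(a^{ij} + a^{ji})\,\partial_{ij} f.
\end{align*}
```

So the antisymmetric part $\frac{1}{2}(a^{ij} - a^{ji})$ contributes nothing to the contraction, and nothing is lost by restricting $a$ to symmetric matrices.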


60.5.3 DEFINITION: A second-order tangent (component) tuple for an n-dimensional $C^2$ manifold $(M, A_M)$
is a tuple $(p, a, b, \psi) \in \bigcup_{\psi \in A_M} (\mathrm{Dom}(\psi) \times \mathrm{Sym}(n, \mathbb{R}) \times \mathbb{R}^n \times \{\psi\})$.


60.5.4 REMARK:  Second-order tangent vector transformation rules are copied from tangent operators.
The second-order tangent vector component-tuple equivalence rules in Definition 60.5.5 are based on the
corresponding rules for second-order operators in Theorem 60.2.5. Although these vector transformation 
rules are defined to match the corresponding rules for differential operators, it does not follow that tangent 
objects are differential operators. 


For comparison, consider ordinary vector fields X € X'(T(M)) on manifolds. (See Section 57.1 for vector 
fields.) At each point p € M, the vector X(p) is an element of T,(M), but X(p) is not necessarily a 
differential operator. Similarly, if W € X!(T*(M)) is a covector field, the covector W(p) € T*(M) is not 
necessarily the differential W (p) = df (p) of some real-valued function f € C!(M). This would be a very 
strong constraint on the field W. However, first-order operators and differentials are used to conveniently
determine the transformation rules. Then vectors like X (p) are thought of simply as indicating a direction,
not an operator. It just happens that a first-order operator has a direction, but other things have a direction 
too. In the same way, all directional objects on a manifold use differential operators for their transformation 
rules without actually being operators. 


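60.5.5 DEFINITION: A second-order tangent vector for an n-dimensional $C^2$ manifold $M \mathrel{-\!\!<} (M, A_M)$ is an
equivalence class $[(p, a, b, \psi)]$ of tuples $(p, a, b, \psi) \in \bigcup_{\psi \in A_M} (\mathrm{Dom}(\psi) \times \mathrm{Sym}(n, \mathbb{R}) \times \mathbb{R}^n \times \{\psi\})$, where the
tuples $(p_1, a_1, b_1, \psi_1)$ and $(p_2, a_2, b_2, \psi_2)$ for $\psi_1, \psi_2 \in A_M$ are said to be equivalent whenever $p_1 = p_2 = p$ and


    $\forall i,j \in \mathbb{N}_n,\quad a_2^{ij} = \sum_{k,\ell=1}^n \dfrac{\partial}{\partial x^k}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\,\dfrac{\partial}{\partial x^\ell}\big(\psi_2^j \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)}\, a_1^{k\ell}$    (60.5.1)

    $\forall i \in \mathbb{N}_n,\quad b_2^i = \sum_{k,\ell=1}^n \dfrac{\partial^2}{\partial x^k\,\partial x^\ell}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)}\, a_1^{k\ell} + \sum_{k=1}^n \dfrac{\partial}{\partial x^k}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)}\, b_1^k.$    (60.5.2)


In abbreviated form,


    $\forall i,j \in \mathbb{N}_n,\quad a_2^{ij} = \phi^i_{,k}\,\phi^j_{,\ell}\,a_1^{k\ell},$    (60.5.3)


    $\forall i \in \mathbb{N}_n,\quad b_2^i = \phi^i_{,k\ell}\,a_1^{k\ell} + \phi^i_{,k}\,b_1^k,$    (60.5.4)


where $\phi = \psi_2 \circ \psi_1^{-1}$, $\forall i,j \in \mathbb{N}_n$, $\phi^i_{,j} = \partial_j \phi^i$ and $\forall i,j,k \in \mathbb{N}_n$, $\phi^i_{,jk} = \partial_k \partial_j \phi^i$.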

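60.5.6 NOTATION: $t^{[2]}_{p,a,b,\psi}$, for $p \in M$, $a \in \mathrm{Sym}(n, \mathbb{R})$, $b \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ for an n-dimensional
$C^2$ manifold $M$ with $n \in \mathbb{Z}^+_0$, denotes the equivalence class $[(p, a, b, \psi)]$ in Definition 60.5.5. In other words,
$t^{[2]}_{p,a,b,\psi} = [(p, a, b, \psi)]$.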

60.5.7 REMARK: Notation, linear space and tangent bundle structure for second-order tangent vectors. 

Notation 60.5.8 gives notations for sets of second-order tangent vectors. Definition 60.5.9 specifies the obvious 

linear structure on the pointwise set of second-order tangent vectors. Definition 60.5.12 gives the obvious 

tangent fibration structure for second-order tangent vectors. 

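60.5.8 NOTATION: $T^{[2]}_p(M)$ denotes the set of second-order tangent vectors $t^{[2]}_{p,a,b,\psi}$ at a point $p$ in a $C^2$
manifold $M$. That is,


    $T^{[2]}_p(M) = \{t^{[2]}_{p,a,b,\psi};\ \psi \in \mathrm{atlas}_p(M),\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n\}.$


$T^{[2]}(M) = \bigcup_{p \in M} T^{[2]}_p(M)$ denotes the set of all second-order tangent vectors for a $C^2$ manifold $M$. That is,

    $T^{[2]}(M) = \{t^{[2]}_{p,a,b,\psi};\ p \in M,\ \psi \in \mathrm{atlas}_p(M),\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n\}.$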


60.5.9 DEFINITION: The second-order tangent space at a point $p$ in a $C^2$ manifold $(M, A_M)$ is the set
$T^{[2]}_p(M)$, where $n = \dim(M)$ and $t^{[2]}_{p,a,b,\psi} = [(p, a, b, \psi)]$ denotes the equivalence class of $(p, a, b, \psi)$ with
respect to the equivalence relation in Definition 60.5.5, together with the linear space operations inherited
from $\mathrm{Sym}(n, \mathbb{R})$ and $\mathbb{R}^n$.


60.5.10 REMARK: The full specification tuple for second-order tangent spaces.
The second-order tangent space in Definition 60.5.9 has specification tuple $(\mathbb{R}, T^{[2]}_p(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_{T^{[2]}_p(M)}, \mu)$,
where $T^{[2]}_p(M)$ is as above, $\sigma_{\mathbb{R}}$ and $\tau_{\mathbb{R}}$ are the standard operations of addition and multiplication for $\mathbb{R}$,
$\sigma_{T^{[2]}_p(M)} : T^{[2]}_p(M) \times T^{[2]}_p(M) \to T^{[2]}_p(M)$ is the addition operation on $T^{[2]}_p(M)$ defined by
$(t^{[2]}_{p,a_1,b_1,\psi}, t^{[2]}_{p,a_2,b_2,\psi}) \mapsto t^{[2]}_{p,a_1+a_2,b_1+b_2,\psi}$, and $\mu : \mathbb{R} \times T^{[2]}_p(M) \to T^{[2]}_p(M)$ is the scalar multiplication
operation $(\lambda, t^{[2]}_{p,a,b,\psi}) \mapsto t^{[2]}_{p,\lambda a,\lambda b,\psi}$.


60.5.11 REMARK: The chart-transition rules ensure chart-independence of linear space structure.

The definitions of vector addition and scalar multiplication in Definition 60.5.9 are independent of the choice 
of coordinates. This is because the chart transition rule in equations (60.5.1) and (60.5.2) is linear with 
respect to the components a and b. 


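60.5.12 DEFINITION: Second-order tangent fibration of a manifold.
The second-order tangent fibration of a $C^2$ manifold $(M, A_M)$ is the tuple $(T^{[2]}(M), \pi, A_{T^{[2]}(M)})$, where


(i) $T^{[2]}(M) = \bigcup_{p \in M} T^{[2]}_p(M) = \{t^{[2]}_{p,a,b,\psi};\ \psi \in A_M,\ p \in \mathrm{Dom}(\psi),\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n\}$, where $n =
\dim(M)$,
(ii) $\pi : T^{[2]}(M) \to M$ is defined by $\pi : [(p, a, b, \psi)] \mapsto p$,
(iii) $A_{T^{[2]}(M)} = \{\tilde\psi;\ \psi \in A_M\}$, where for any chart $\psi \in A_M$, the chart $\tilde\psi : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{n+n^2+n}$ is
defined by $\tilde\psi : t^{[2]}_{p,a,b,\psi} \mapsto (\psi(p), a, b)$.


60.5.13 REMARK:  Consistency of second-order tangent vectors and second-order tangent operators.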


Theorem 60.2.5 implies that Definitions 60.5.5 and 60.2.2 are consistent with each other and the $\partial$-operation
in Notation 60.5.14 commutes with changes of chart.


60.5.14 NOTATION: $\partial_W$, for a second-order tangent vector $W = t^{[2]}_{p,a,b,\psi} \in T^{[2]}(M)$, denotes the tangent
operator $\partial^{[2]}_{p,a,b,\psi} \in \mathring{T}^{[2]}(M)$.


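60.5.15 NOTATION: $\hat\partial_W$, for a second-order tangent vector $W = t^{[2]}_{p,a,b,\psi} \in T^{[2]}(M)$, denotes the corresponding
tagged second-order tangent operator $(p, \partial_W) = (p, \partial^{[2]}_{p,a,b,\psi}) \in \hat{T}^{[2]}(M)$. Thus $\hat\partial_W = (p, \partial_W) = \hat\partial^{[2]}_{p,a,b,\psi} = (p, \partial^{[2]}_{p,a,b,\psi})$.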


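60.5.16 REMARK: Mnemonics and abbreviations for second-order tangent vector transformation rules.
A useful mnemonic for equations (60.5.1) and (60.5.2) is


    $a_2^{ij} = \dfrac{\partial \psi_2^i}{\partial \psi_1^k}\,\dfrac{\partial \psi_2^j}{\partial \psi_1^\ell}\,a_1^{k\ell},$

    $b_2^i = \dfrac{\partial^2 \psi_2^i}{\partial \psi_1^k\,\partial \psi_1^\ell}\,a_1^{k\ell} + \dfrac{\partial \psi_2^i}{\partial \psi_1^k}\,b_1^k.$

The equations are reduced to the first-order tangent vector transformation rule if $a_1$ is zero. If the first-order
component is ignored, the transformation rule reduces to that for contravariant coefficients of 2-tensors in
$T^{2,0}(M)$. The rules can be further abbreviated as follows.


    $\bar a^{ij} = \bar x^i_{,k}\,\bar x^j_{,\ell}\,a^{k\ell},$


    $\bar b^i = \bar x^i_{,k\ell}\,a^{k\ell} + \bar x^i_{,k}\,b^k.$

Equations (60.5.1) and (60.5.2) are reminiscent of the equations in Theorem 59.1.15.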
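

A one-dimensional example may help to show how the second-order component feeds into the first-order component under a chart change. The manifold, the two charts and the test function $g$ below are hypothetical choices made for illustration only; the rules applied are those of equations (60.5.1) and (60.5.2) with $\phi = \psi_2 \circ \psi_1^{-1}$.

```latex
% Illustrative sketch (assumed setup): M = (0, \infty), \psi_1(p) = p, \psi_2(p) = p^2,
% so that \phi(x) = \psi_2 \circ \psi_1^{-1}(x) = x^2, \phi'(x) = 2x, \phi''(x) = 2.
\begin{align*}
  a_2 &= \phi'(x)^2\, a_1 = 4x^2\, a_1, \\
  b_2 &= \phi''(x)\, a_1 + \phi'(x)\, b_1 = 2\, a_1 + 2x\, b_1. \\
\intertext{For the pure second-order tuple $(a_1, b_1) = (1, 0)$, this gives $(a_2, b_2) = (4x^2, 2)$.
This agrees with a direct computation for a $C^2$ function $g$ of the second coordinate $y = x^2$:}
  \frac{d^2}{dx^2}\, g(x^2) &= \frac{d}{dx}\big(2x\, g'(x^2)\big) = 4x^2\, g''(x^2) + 2\, g'(x^2).
\end{align*}
```

So a tuple with zero first-order component in one chart generally acquires a non-zero first-order component in another chart, whereas a tuple with $a_1 = 0$ keeps $a_2 = 0$ and transforms by the first-order rule alone.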


60.5.17 REMARK: First-order tangent vectors and operators are special cases of second-order. 

It is essentially true that $T(M) \subseteq T^{[2]}(M)$, and precisely true that $\mathring{T}(M) \subseteq \mathring{T}^{[2]}(M)$ and $\hat{T}(M) \subseteq \hat{T}^{[2]}(M)$,
since the second-order tangent vectors and operators reduce to the corresponding first-order vectors and
operators when the second-order coefficient matrix $a$ is zero.


The higher-order operators of order $k > 2$ may also be defined along the same pattern as the second-order
operators. The notations for these spaces would then be $T^{[k]}(M)$ and so forth. Each one of these spaces has
a corresponding total tangent space and can have a higher-order tangent bundle defined for it in Chapter 64.


60.5.18 REMARK: The choice of order for components of notations for second-order tangent objects. 

There are good arguments against the component order $(p, a, b, \psi)$ in Definition 60.5.5. The ordering
$(\psi, p, b, a)$ is preferable in some ways. For example, it is sometimes useful to consider all spaces $T^{[k]}(M)$
together in one combined set. It is useful to be able to say that $(\psi, p, a_1, a_2, a_3) \in T^{[3]}(M)$ and $(\psi, p, a_1, a_2) \in
T^{[2]}(M)$ are equivalent when the symmetric $n \times n \times n$ array $a_3$ equals zero. This could be done by considering
all such sequences to be elements of a set of infinite sequences for which all but a finite number of components
equal zero. This is much easier to do if the components have increasing order. Nevertheless, decreasing order
is used here for the time being because it matches the decreasing left to right order in which polynomials
are written.


Chapter 61


VECTOR FIELD CALCULUS 


61.1 Action of vector fields on real-valued functions
61.2 Naive derivatives of vector fields by vectors
61.3 Leibniz rule for naive derivatives of vector fields
61.4 Naive derivatives of vector fields by vector fields
61.6 The Lie bracket for map-related vector fields
61.8 Lie derivatives of vector fields
61.9 Lie derivatives of tensor fields
61.10 The exterior derivative for vectors
61.11 Exterior derivative applied to chart-basis vector fields
61.12 Exterior derivative relation to Cartesian space version
61.13 Differentiability of exterior derivative of forms
61.14 The exterior derivative for nonholonomic vector fields


61.0.1 REMARK: Vector field calculus for cross-sections of general differentiable fibre bundles. 

The functions on manifolds which one differentiates with respect to vectors and vector fields are cross-sections 
of various kinds of differentiable fibre bundles for which the manifold is the base-point space. This is difficult 
to present in full generality in Chapter 61 because the general differentiable fibre bundles in Chapter 64 
require the prior definition of Lie groups, Lie algebras and Lie transformation groups in Chapters 62-63, 
which themselves require various concepts from the vector field calculus in Chapter 61. Therefore the vector 
field calculus presented here is restricted to the vector fields, tensor fields and differential forms presented 
in Chapter 57, which are cross-sections of the tangent bundles, tensor bundles and multilinear map bundles 
presented in Chapters 54, 55 and 56, because these do not require general Lie groups for their definitions. 


61.1. Action of vector fields on real-valued functions 


61.1.1 REMARK: The action of vectors and vector fields on real-valued functions. 

The pointwise differential action $\partial_V$ in Definition 61.1.2 is the same as in Notations 54.11.12 and 54.11.3.
The differential action $\partial_X$ with respect to vector fields $X$ in Definition 61.1.3 is defined in terms of this
pointwise differential operator. (The action by vector fields is the same as in Notation 57.1.14.) The vector
field spaces $X(T(M))$ and $X^r(T(M))$ for $r \in \overline{\mathbb{Z}}^+_0$ are defined in Notation 57.1.5.


The tangent operators and tangent operator fields which were defined earlier are given the new name "action"
here to focus attention on their role as operations whose inputs are real-valued functions, and whose
outputs are real numbers or real-valued functions. In Notation 54.11.3, the input functions were merely "test
functions" whose purpose was to give concrete meaning to abstract differential operators.


61.1.2 DEFINITION: The action of a vector $V \in T(M)$ on $C^1$ real-valued functions on a $C^1$ manifold $M$ is
the map $\partial_V : C^1(M) \to \mathbb{R}$. (See Notations 54.11.12 and 54.11.3 for $\partial_V$.)


61.1.3 DEFINITION: The action of a vector field $X \in X(T(M))$ on $C^1$ real-valued functions on a $C^1$
manifold $M$ is the map $\partial_X : C^1(M) \to (M \to \mathbb{R})$ defined by


    $\forall f \in C^1(M),\ \forall p \in M,\quad (\partial_X f)(p) = \partial_{X(p)} f.$


61.1.4 REMARK:  Test-function alternative to chart-based definition for differentiability of vector fields.
The vector field action in Definition 61.1.3 is often used for testing the $C^k$ differentiability of vector fields.
A vector field $X \in X(T(M))$ is tested by applying it to functions $f \in C^{k+1}(M, \mathbb{R})$. If the output $\partial_X f$ is in
$C^k(M, \mathbb{R})$, then $X$ is said to be $C^k$ differentiable. This test gives the illusion that coordinates are not being
used, but it is not a practically effective procedure for determining differentiability. (See Remark 57.2.6 for
some references to texts which present either the test-function definition or the chart-based definition.)


Theorem 61.1.5 line (61.1.1) gives a global-test-function condition for differentiability of a vector field as an
alternative to the standard chart-based definition. This is given as a criterion of differentiability by many
authors because it is elegant, convenient and apparently coordinate-free. Nothing in differential geometry is
coordinate-free, but many proofs and definitions can be somewhat simplified by using global-test-function
definitions for differentiability. Perhaps the best description for line (61.1.1) is that it is a characterisation
of $C^k$ differentiability, not a definition.


The proof of Theorem 61.1.5 gives some hints as to why the definition based on global test functions is
unsatisfactory. It is straightforward to show that chart-based differentiability (as in Definition 52.1.2 and
Theorem 57.2.3) of the map $X : M \to T(M)$ implies line (61.1.1), but the converse is not so simple. The
chart-based definition performs tests locally within each chart, but the test-function definition requires global
test functions $f \in C^{k+1}(M, \mathbb{R})$ as inputs. In the proof of Theorem 61.1.5, global test functions must be
constructed from the local vector field within each chart so that the right kind of input can be provided
for line (61.1.1). The difficult part of the construction of these test functions is performed in the proof of
Theorem 51.8.3, where Hausdorff separation of the topology plays an important role.


The test-function style of definition is elegant, but it does not say what it means, which should be that
the map $X : M \to T(M)$ is differentiable. Many authors also define differentiability of maps between
differentiable manifolds in terms of test functions as in Theorem 52.1.19, which is similar to the elegant
style of criterion in line (61.1.1). If all differentiability definitions are in the test-function style, they are
compatible with each other and can be straightforward to work with.


Although the test-function style of differentiability definition for maps, vector fields and other kinds of
functions is equivalent to the "differentiability through the charts" style, its abstract nature makes it more
difficult to apply to practical situations. Ultimately the test-function definitions must use charts anyway
because $f \in C^k(M, \mathbb{R})$ means that $f \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$ for all $\psi \in \mathrm{atlas}(M)$. One could say that
the chart-based definitions use local $\mathbb{R}^n$-valued test functions (i.e. the charts) instead of global $\mathbb{R}$-valued test
functions. Conveniently, charts are an integral part of a manifold's definition. So they are always available.
The real-valued global test functions could be thought of as proxies for chart components, but they do require
some effort to prove their existence.


61.1.5 THEOREM: Equivalence of test-function and chart-based vector field differentiability definitions.
Let $k \in \overline{\mathbb{Z}}^+_0$. Let $M$ be a $C^{k+1}$ manifold. Let $X \in X(T(M))$ be a vector field on $M$. Then $X$ is a $C^k$
differentiable vector field on $M$ if and only if


    $\forall f \in C^{k+1}(M, \mathbb{R}),\quad \partial_X f \in C^k(M, \mathbb{R}).$    (61.1.1)


In other words, $X^k(T(M)) = \{X \in X(T(M));\ \forall f \in C^{k+1}(M, \mathbb{R}),\ \partial_X f \in C^k(M, \mathbb{R})\}$.
(See Notations 57.2.1 and 57.1.5, and Definition 52.1.2, for $X^k(T(M))$.)


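PROOF: Let $n = \dim(M)$. The case $n = 0$ can be ignored because all functions on a zero-dimensional
manifold are $C^\infty$. (See Remark 41.1.8.) So assume that $n \in \mathbb{Z}^+$.


Let $X \in X^k(T(M))$ and $f \in C^{k+1}(M, \mathbb{R})$. Let $\psi \in \mathrm{atlas}(M)$. Then by Definition 61.1.3,


    $\forall x \in \mathrm{Range}(\psi),\quad ((\partial_X f) \circ \psi^{-1})(x) = (\partial_X f)(\psi^{-1}(x))$
    $= \partial_{X(\psi^{-1}(x))} f$
    $= \sum_{i=1}^n \xi^i(x)\,\partial_i (f \circ \psi^{-1})(x),$    (61.1.2)

where by Notations 54.11.12, 54.5.7 and 54.5.21, $\xi = \Phi(\psi) \circ X \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathbb{R}^n$ satisfies

    $\forall x \in \mathrm{Range}(\psi),\quad \xi(x) = \Phi(\psi)(X(\psi^{-1}(x)))$
    $= \Pi^{2n}_{n+1}(\Psi(\psi)(X(\psi^{-1}(x))))$
    $= \Pi^{2n}_{n+1}((\Psi(\psi) \circ X \circ \psi^{-1})(x)),$    (61.1.3)


where $\Phi(\psi)$ and $\Psi(\psi)$ are the velocity and manifold charts respectively for $T(M)$ corresponding to $\psi$. (See
Notation 54.5.7 for $\Phi$. See Notation 54.5.21 for $\Psi$.)


By Theorem 57.2.3, $\Psi(\psi) \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}^{2n})$ because $X \in X^k(T(M))$. (This is in essence the
chart-based definition for $C^k$ differentiability of $X$.) So $\xi \in C^k(\mathrm{Range}(\psi), \mathbb{R}^n)$ by line (61.1.3).

By Theorem 42.2.11 (vii), $\partial_i (f \circ \psi^{-1}) \in C^k(\mathrm{Range}(\psi), \mathbb{R})$ for all $i \in \mathbb{N}_n$ since $f \circ \psi^{-1} \in C^{k+1}(\mathrm{Range}(\psi), \mathbb{R})$.
From $\xi \in C^k(\mathrm{Range}(\psi), \mathbb{R}^n)$ and $(\partial_i (f \circ \psi^{-1}))_{i=1}^n \in C^k(\mathrm{Range}(\psi), \mathbb{R}^n)$, it then follows from line (61.1.2)
that $(\partial_X f) \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$. This holds for all $\psi \in \mathrm{atlas}(M)$. So $\partial_X f \in C^k(M, \mathbb{R})$ by
Definition 51.6.2. This verifies line (61.1.1). Thus $X^k(T(M)) \subseteq \{X \in X(T(M));\ \forall f \in C^{k+1}(M, \mathbb{R}),\ \partial_X f \in
C^k(M, \mathbb{R})\}$.

For the converse, let $X \in X(T(M))$ satisfy line (61.1.1). Let $\psi \in \mathrm{atlas}(M)$. Let $x \in \mathrm{Range}(\psi)$. Let
$\rho = d(x, \mathbb{R}^n \setminus \mathrm{Range}(\psi))$. Then $\rho \in \mathbb{R}^+$. Let $r, R \in \mathbb{R}^+$ with $r < R < \rho$. Let $j \in \mathbb{N}_n$. Then by Theorem 51.8.5,
there is a function $f^j \in C^{k+1}(M, \mathbb{R})$ such that $f^j(p) = \psi^j(p)$ for all $p \in \psi^{-1}(B_{x,r})$ and $f^j(p) = 0$ for all
$p \in M \setminus \psi^{-1}(B_{x,R})$. So by line (61.1.1), $\partial_X f^j \in C^k(M, \mathbb{R})$. Therefore $\partial_X f^j \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$ by
Definition 51.6.2. Since $f^j\big|_{\psi^{-1}(B_{x,r})} = \psi^j\big|_{\psi^{-1}(B_{x,r})}$, it follows that


    $\forall y \in B_{x,r},\quad \partial_X f^j(\psi^{-1}(y)) = \partial_{X(\psi^{-1}(y))} f^j$
    $= \sum_{i=1}^n \xi^i(y)\,\partial_i (f^j \circ \psi^{-1})(y)$
    $= \sum_{i=1}^n \xi^i(y)\,\partial_i (\psi^j \circ \psi^{-1})(y)$
    $= \sum_{i=1}^n \xi^i(y)\,\delta^j_i$
    $= \xi^j(y),$


where $\xi : \mathrm{Range}(\psi) \to \mathbb{R}^n$ is defined as in line (61.1.3). Therefore


    $\Psi(\psi)^{n+j} \circ X \circ \psi^{-1}\big|_{B_{x,r}} = \xi^j\big|_{B_{x,r}}$
    $= \partial_X f^j \circ \psi^{-1}\big|_{B_{x,r}}$
    $\in C^k(B_{x,r}, \mathbb{R}).$


The set of such open balls $B_{x,r}$ covers $\mathrm{Range}(\psi)$. Therefore $\Psi(\psi)^{n+j} \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R})$ by
Theorem 42.2.14. Since $X \in X(T(M))$, Theorem 57.1.7 implies $\Psi(\psi)^j \circ X \circ \psi^{-1} = \psi^j \circ \psi^{-1}$, which is $C^k$
for all $j \in \mathbb{N}_n$. So $\Psi(\psi) \circ X \circ \psi^{-1} \in C^k(\mathrm{Range}(\psi), \mathbb{R}^{2n})$ for all $\psi \in \mathrm{atlas}(M)$. Therefore $X \in X^k(T(M))$
by Theorem 57.2.3. Hence $X^k(T(M)) = \{X \in X(T(M));\ \forall f \in C^{k+1}(M, \mathbb{R}),\ \partial_X f \in C^k(M, \mathbb{R})\}$.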


61.1.6 REMARK: Some basic properties of vector field actions. 

By Theorem 61.1.5, the action of a $C^k$ vector field for finite $k$ on a $C^{k+1}$ real-valued function yields a function
which is in a larger function space, namely the $C^k$ functions. So this action is not a $C^{k+1}$ function space
endomorphism. For $k = \infty$, the input and output spaces are both $C^\infty(M)$. So vector fields are in fact
endomorphisms in this case. However, this feature is only available for $C^\infty$ manifolds.
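

The loss of one order of differentiability for finite $k$ can be seen in a very small example. The manifold, vector field and test function below are hypothetical choices made for illustration, with $k = 0$.

```latex
% Illustrative sketch (assumed setup): M = R with the identity chart, X = d/dx, k = 0.
\begin{align*}
  f(x) &= x\,|x|, & f &\in C^1(\mathbb{R}) \setminus C^2(\mathbb{R}), \\
  (\partial_X f)(x) &= f'(x) = 2\,|x|, & \partial_X f &\in C^0(\mathbb{R}) \setminus C^1(\mathbb{R}).
\end{align*}
```

In this example the input lies in $C^{k+1}(M)$ and the output lies in $C^k(M)$ but not in $C^{k+1}(M)$, so the action of this smooth vector field does not map $C^1(M)$ into itself.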


Similarly, vector field actions lack such basic properties as injectivity and surjectivity. So invertibility is not 
on the cards, but as discussed in Remark 61.5.4, compositions and commutators of vector field actions do 
have some nice "features".


61.2. Naive derivatives of vector fields by vectors


61.2.1 REMARK: Parallelism-free derivatives of vector fields by vectors and vector fields. 

Naive derivatives on differentiable manifolds are derivatives with respect to vectors or vector fields in the 
absence of any definition of parallelism. This may be contrasted with Lie derivatives, which use a limited 
kind of parallel transport following the flow of a vector field, and covariant derivatives, which use parallel 
transport with respect to a connection. 


(1) Naive derivatives $\partial_V Y$ assume no concept of parallel transport at all.
(2) Lie derivatives $\mathcal{L}_X Y$ assume a limited "parallel transport" via the flow of a vector field $X$.
(3) Covariant derivatives $D^\theta_V Y$ use a parallel-transport connection $\theta$.


When there is some kind of parallel transport, as in (2) and (3), the horizontal component of the derivative 
of a vector field can be cancelled by a parallel transported vector. Then a “drop function" can be applied to 
drop the vertical component of the derivative minus the parallel transport from a double tangent space to the 
ordinary tangent vector space. The resultant dropped derivative-minus-transport has the same coordinate 
transformation rules as the original vector field which is being differentiated. 


By contrast, in the absence of any kind of parallel transport, as in (1), the field and its derivative are in 
different spaces, with different coordinate transformation rules. Therefore naive vector field derivatives are 
often of limited utility in themselves, but they provide the starting point for defining more useful derivatives, 
such as Lie derivatives and covariant derivatives. The unsatisfactory transformation properties of naive 
derivatives motivate the definitions of Lie derivatives and covariant derivatives. (See Theorem 59.1.15 for 
the unpleasant chart transition rules for second-level tangent bundles.) The task of Lie derivatives and 
covariant derivatives is to bring naive derivatives down from a second-level tangent space to the first level. 
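

The non-tensorial behaviour of the naive derivative can be made explicit in chart components. The following sketch uses symbols which are assumptions for this illustration only: $\phi = \tilde\psi \circ \psi^{-1}$ is a chart transition map with $\tilde x = \phi(x)$, $Y^i$ and $v^i$ are the components of a $C^1$ vector field $Y$ and a vector $V$ with respect to $\psi$, and $w^i = v^m \partial_m Y^i$ is the component array of the naive derivative in that chart, with summation over repeated indices.

```latex
% Sketch under the stated assumptions: compare the naive derivative components in two charts.
\begin{align*}
  \tilde Y^i(\tilde x) &= \phi^i_{,k}(x)\, Y^k(x), \qquad \tilde v^j = \phi^j_{,m}(x)\, v^m, \\
  \tilde w^i = \tilde v^j\, \tilde\partial_j \tilde Y^i
  &= v^m\, \partial_m \big(\phi^i_{,k}\, Y^k\big) \\
  &= \phi^i_{,k}\, w^k + \phi^i_{,km}\, v^m\, Y^k.
\end{align*}
```

The first term is the ordinary tangent vector rule. The second term, which involves second derivatives of the transition map, is the part which a Lie derivative or covariant derivative must cancel before the result can be dropped to the first-level tangent space.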


61.2.2 REMARK: The action of vectors on vector fields. 
The action by a single vector on a vector field in Definition 61.2.3 is analogous to the action by a single 
vector on a real-valued function in Definition 61.1.2. 


The partial derivative notation $\partial_V Y$ emphasises that this derivative is not the same as the covariant derivative
which is denoted as $D_V Y$, which is constructed from $\partial_V Y$ by subtracting suitable correction factors. (The
notational distinction between "$\partial$" and "$D$" for naive and covariant derivatives respectively is often seen
in the literature.)


61.2.3 DEFINITION: The action of a vector $V \in T(M)$ on $C^1$ vector fields on a $C^2$ manifold $M$ is the map
$\partial_V : X^1(T(M)) \to T(T(M))$ defined by


    $\forall V \in T(M),\ \forall Y \in X^1(T(M)),\quad \partial_V Y = Y_*(V)$
    $= (dY)_{\pi(V)}(V).$


(See Definition 58.9.4 for the induced map $Y_* : T(M) \to T(T(M))$ of the map $Y : M \to T(M)$. See
Definition 54.5.4 for $\pi : T(M) \to M$.)


61.2.4 REMARK: Actions of single vectors versus actions of vector fields. 

Both naive and covariant derivatives may be defined as the action of isolated vectors $V \in T(M)$ on a vector
field $Y \in X^1(T(M))$ (as in Definition 61.2.3). Then single vectors $V$ may be replaced by a vector field
$X \in X(T(M))$ to define action by a vector field by simply aggregating the pointwise derivatives with respect
to single vectors $X(p) \in T(M)$. (For naive derivatives, this aggregation is done in Definition 61.4.2.)


By contrast, the Lie derivative $\mathcal{L}_X Y$ in Section 61.8 (and the closely related Lie bracket $[X, Y]$ in Section 61.5)
cannot be defined first as the action by a single vector (and then for the action by a vector field as an 
aggregate) because the acting vector field must itself be differentiated in order to compute the local parallel 
transport. However, some authors say little or nothing about derivatives with respect to single vectors, 
preferring instead to use vector fields X in all cases instead of vectors V to define derivative actions. (The 
preference in this book is always to define both if possible.) 


61.2.5 THEOREM: Some basic properties of the action of a vector on a vector field.
Let $M$ be a $C^2$ manifold.

(i) $\forall V \in T(M),\ \forall Y \in X^1(T(M)),\ \pi_*(\partial_V Y) = V$.

(ii) $\forall V \in T(M),\ \forall Y \in X^1(T(M)),\ \partial_V Y \in T_{Y(\pi(V)),V}(T(M))$.
(iii) $\forall p \in M,\ \forall V \in T_p(M),\ \forall Y \in X^1(T(M)),\ \partial_V Y \in T_{Y(p),V}(T(M))$.
PROOF: For part (i), $Y \in X^1(T(M))$ implies $\pi \circ Y = \mathrm{id}_M$ by Definition 57.1.2. So $\pi_* \circ Y_* = \mathrm{id}_{T(M)}$ by
Theorems 58.9.8 and 58.9.18. Hence $\pi_*(\partial_V Y) = \pi_*(Y_*(V)) = V$ for all $V \in T(M)$.
Part (ii) follows from Definition 61.2.3, Theorem 58.9.6, Notation 59.2.5 and part (i).
Part (iii) follows from part (ii).


61.2.6 REMARK: Differential action of curves on vector fields. 

Theorem 61.2.7 uses $C^1$ curves $\gamma$ to “differentiate” $C^1$ vector fields $Y$. Some authors use such “differentiation
by curve” as their basic definition of the action of a vector on a map. Certainly “differentiation with respect
to curves” gives a strong intuitive meaning to the abstract Definition 61.2.3, and in fact most derivative
definitions in differential geometry could be explicitly defined in terms of curves to maximise their intuitive
clarity. One disadvantage of “differentiation with respect to curves” is that one must, in principle, verify in
each case that the definition is independent of the choice of curve.


As mentioned in Remarks 53.1.8, 53.1.9 and 54.3.9, the definition chosen for tangent vectors in this book 
(Definition 54.1.2) uses equivalence classes of curves which are imported via the charts from Cartesian spaces, 
which means that there is only one curve to represent each tangent vector in each chart. Then the equivalence 
class automatically adjusts to the differentiability level of the manifold. Thus this book does define tangent 
vectors in terms of curve classes, but in a very economical and flexible way. 


61.2.7 THEOREM: Differentiation of vector fields with respect to curves.
Let $M$ be a $C^2$ manifold. Let $\gamma : I \to M$ be a $C^1$ curve with $I \in \mathrm{Top}(\mathbb{R})$. Then


$\forall Y \in X^1(T(M)),\ \forall t \in I,\quad \partial_t Y(\gamma(t)) = \partial_{\gamma'(t)} Y.$
PROOF: Since $Y : M \to T(M)$ is a $C^1$ map, Theorem 58.4.11 implies that $\partial_t Y(\gamma(t)) = (dY)_{\gamma(t)}(\gamma'(t))$ for
all $t \in I$. But $(dY)_{\gamma(t)}(\gamma'(t)) = Y_*(\gamma'(t))$ by Definition 58.9.4, where $Y_*(\gamma'(t)) = \partial_{\gamma'(t)} Y$ by Definition 61.2.3.
Hence $\partial_t Y(\gamma(t)) = \partial_{\gamma'(t)} Y$.
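
As a concrete check of Theorem 61.2.7, the following worked computation takes $M = \mathbb{R}^2$ with the identity chart, so that a vector field is determined by its component function. The component function $\eta$ and the curve $\gamma$ are illustrative choices which do not appear in the text, and only the vertical (component-derivative) parts of the two sides are compared.

```latex
% Illustrative assumptions: M = R^2 with the identity chart,
% eta(x) = (x^1 x^2, (x^1)^2) is the component function of Y, and gamma(t) = (t, t^2).
\begin{align*}
\eta(\gamma(t)) &= (t^3,\ t^2),
 & \partial_t\bigl(\eta(\gamma(t))\bigr) &= (3t^2,\ 2t), \\
\gamma'(t) &= (1,\ 2t),
 & \sum_{i=1}^{2} \gamma'(t)^i\,(\partial_i \eta)(\gamma(t))
   &= 1\cdot(t^2,\ 2t) + 2t\cdot(t,\ 0) = (3t^2,\ 2t).
\end{align*}
```

The two vertical parts agree. The horizontal parts agree trivially because both second-level vectors project to $\gamma'(t)$.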


61.2.8 REMARK: The naive directional derivative of a vector field resides in the double tangent space.
The naive derivative $\partial_V Y$ in Definition 61.2.3 is the rate of change of the vector field $Y$ in the direction $V$.
This derivative is not in the tangent space $T(M)$. It is in the second-level tangent space $T^2(M) = T(T(M))$.
(See Definition 59.1.22 and Notation 59.1.26 for $T(T(M))$ and $T^2(M)$ respectively.) More specifically,


$\forall Y \in X^1(T(M)),\ \forall V \in T(M),\quad \partial_V Y \in T_{Y(\pi(V))}(T(M)),$


$\forall Y \in X^1(T(M)),\ \forall p \in M,\ \forall V \in T_p(M),\quad \partial_V Y \in T_{Y(p)}(T(M)).$
The vector $\partial_V Y$ is well defined because the map $Y : M \to T(M)$ is a $C^1$ map between the $C^1$ manifolds $M$
and $T(M)$, but the components of $\partial_V Y$ do not transform like a vector in $T(M)$. (See Theorem 59.1.15 for
the component transformation rules for second-level tangent vectors in $T(T(M))$.)
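
The failure of tensorial transformation can be made explicit by a short chain-rule computation. The sketch below assumes two overlapping charts $\psi$ and $\tilde\psi$ with $C^2$ transition map $\phi = \tilde\psi \circ \psi^{-1}$. The symbols $\phi$, $v$, $\eta$, $w$ and their tilded versions are shorthand introduced only for this illustration, and Theorem 59.1.15 remains the authoritative statement.

```latex
% Assumed shorthand: x = psi(p); v, eta(x) and w are the chart-psi components of V, Y(p)
% and the vertical part of \partial_V Y, with w^k = sum_i v^i d_i eta^k(x).
% Tildes denote the same objects for the chart \tilde\psi.
\begin{align*}
\tilde v^{\,j} &= \sum_{k=1}^n \partial_k\phi^j(x)\, v^k,
\qquad
\tilde\eta^{\,j}(\phi(x)) = \sum_{k=1}^n \partial_k\phi^j(x)\, \eta^k(x), \\
\tilde w^{\,j}
 &= \sum_{i=1}^n v^i\, \partial_{x^i}\Bigl(\tilde\eta^{\,j}(\phi(x))\Bigr)
  = \sum_{k=1}^n \partial_k\phi^j(x)\, w^k
  + \sum_{i,k=1}^n \partial_i\partial_k\phi^j(x)\, v^i\, \eta^k(x).
\end{align*}
```

The second sum, which involves second derivatives of the transition map, is the obstruction. It vanishes for all $v$ and $\eta$ only when the second derivatives of $\phi$ vanish at $x$.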


61.2.9 REMARK: Coordinates expressions for the naive derivative. 

The coordinates expressions for $\partial_V Y$ in Theorems 61.2.10 and 61.2.11, and their proofs, may seem excessively
convoluted, but it is better to know in advance what is hidden by the tidy formalism than to be surprised 
by it when abstract expressions must be converted to concrete formulas in an application context. (See 
Theorem 61.5.11 for an example application.) Since cross-sections of general fibre bundles are defined by 
analogy with vector fields on tangent bundles, Theorems 61.2.10 and 61.2.11 may be regarded as a kind of 
“practice run” for the general case. 


Theorem 61.2.10 expresses $\partial_V Y$ as a vector which looks like $t_{Y(p),\,U_{V,Y,\psi},\,\tilde\Psi(\psi)}$. In this “basic version”, $Y(p)$
is a point in the manifold $T(M)$ and $U_{V,Y,\psi}$ is a coordinate $2n$-tuple. This has some abstract interest.


Theorem 61.2.11 expresses $\partial_V Y$ as a vector like $t^{(2)}_{p,\eta,v,w,\psi}$, where $Y(p) = t_{p,\eta,\psi}$ is the base point of the
second-level vector $\partial_V Y$, and $(v, w)$ is a pair of coordinate $n$-tuples, where $v = \Phi(\psi)(V)$ is the horizontal
component and $w$ is the vertical component coordinate $n$-tuple. This is the “useful version”.


61.2.10 THEOREM: Coordinates expression for the action of a vector on a vector field. Basic version.
Let $M$ be a $C^2$ manifold. Then the action of vectors in $T(M)$ on vector fields in $X^1(T(M))$ satisfies


$\forall V \in T(M),\ \forall Y \in X^1(T(M)),\ \forall \psi \in \mathrm{atlas}_{\pi(V)}(M),$
$\partial_V Y = t_{Y(\pi(V)),\,U_{V,Y,\psi},\,\tilde\Psi(\psi)}$ (61.2.1)
where $\pi : T(M) \to M$ is the projection map for $T(M)$, and $U_{V,Y,\psi} \in \mathbb{R}^{2n}$, with $n = \dim(M)$, is defined by
$\forall p \in M,\ \forall V \in T_p(M),\ \forall Y \in X^1(T(M)),\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall j \in \mathbb{N}_{2n},$
$U^j_{V,Y,\psi} = \sum_{i=1}^n \Phi(\psi)(V)^i\, \partial_{x^i}\bigl(\tilde\Psi(\psi)^j(Y(\psi^{-1}(x)))\bigr)\big|_{x=\psi(p)}.$ (61.2.2)
PROOF: Let $V \in T(M)$ and $p = \pi(V)$, so that $V \in T_p(M)$. Let $Y \in X^1(T(M))$ and $\psi \in \mathrm{atlas}_p(M)$. Then $Y \in C^1(M, T(M))$
by Definition 57.1.2. By Definition 58.4.5, the differential map $(dY)_p : T_p(M) \to T_{Y(p)}(T(M))$ applied
to $V$ gives $(dY)_p(V) = t_{Y(p),\,U_{V,Y,\psi},\,\tilde\Psi(\psi)}$. Hence by Definitions 61.2.3 and 58.9.4, $\partial_V Y = (dY)_p(V) =
t_{Y(\pi(V)),\,U_{V,Y,\psi},\,\tilde\Psi(\psi)}$.


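61.2.11 THEOREM: Coordinates expression for the action of a vector on a vector field. Useful version.
Let $M$ be a $C^2$ manifold. Then the naive action of vectors in $T(M)$ on vector fields in $X^1(T(M))$ satisfies


$\forall p \in M,\ \forall V \in T_p(M),\ \forall Y \in X^1(T(M)),\ \forall \psi \in \mathrm{atlas}_p(M),$
$\partial_V Y = t^{(2)}_{p,\ \eta_\psi(\psi(p)),\ \Phi(\psi)(V),\ w_{V,Y,\psi}(\psi(p)),\ \psi}$ (61.2.3)


where $\eta_\psi = \Phi(\psi) \circ Y \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathbb{R}^n$ is the component function of $Y$ for charts $\psi \in \mathrm{atlas}(M)$, and
$w_{V,Y,\psi} : \mathrm{Range}(\psi) \to \mathbb{R}^n$, with $n = \dim(M)$, is defined for $\psi \in \mathrm{atlas}(M)$ by


$\forall x \in \mathrm{Range}(\psi),\quad w_{V,Y,\psi}(x) = \sum_{i=1}^n \Phi(\psi)(V)^i\, \partial_{x^i} \eta_\psi(x).$ (61.2.4)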


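PROOF: Let $p \in M$, $V \in T_p(M)$, $Y \in X^1(T(M))$ and $\psi \in \mathrm{atlas}_p(M)$. Then by Theorem 61.2.10,
$\partial_V Y = t_{Y(p),\,U_{V,Y,\psi},\,\tilde\Psi(\psi)}$, where $Y(\pi(V)) = t_{p,\,\eta_\psi(\psi(p)),\,\psi}$ and $U_{V,Y,\psi} \in \mathbb{R}^{2n}$ is the $2n$-tuple on line (61.2.2).
For the horizontal $n$-subtuple of $U_{V,Y,\psi}$, note that $\tilde\Psi(\psi)^j \circ Y = \psi^j$ for all $j \in \mathbb{N}_n$ by Theorem 57.1.7
line (57.1.2). So $U^j_{V,Y,\psi} = \sum_{i=1}^n \Phi(\psi)(V)^i\, \partial_{x^i}(\psi^j(\psi^{-1}(x))) = \sum_{i=1}^n \Phi(\psi)(V)^i\, \partial_{x^i} x^j = \sum_{i=1}^n \Phi(\psi)(V)^i\, \delta^j_i =
\Phi(\psi)(V)^j$ for all $j \in \mathbb{N}_n$. The vertical $n$-subtuple may be written by Theorem 57.1.7 line (57.1.3) as


$\forall j \in \mathbb{N}_n,\quad U^{n+j}_{V,Y,\psi} = \sum_{i=1}^n \Phi(\psi)(V)^i\, \partial_{x^i}\bigl(\Phi(\psi)^j(Y(\psi^{-1}(x)))\bigr)\big|_{x=\psi(p)}$
$= \sum_{i=1}^n \Phi(\psi)(V)^i\, \partial_{x^i} \eta^j_\psi(x)\big|_{x=\psi(p)}$
$= w^j_{V,Y,\psi}(\psi(p)).$


Hence $\partial_V Y = t^{(2)}_{p,\ \eta_\psi(\psi(p)),\ \Phi(\psi)(V),\ w_{V,Y,\psi}(\psi(p)),\ \psi}$.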


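61.2.12 THEOREM: Action of vector on a vector field. Point/velocity/chart computational version.
Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Then


$\forall \psi \in \mathrm{atlas}(M),\ \forall U \in \mathrm{Top}(M) \cap \mathbb{P}(\mathrm{Dom}(\psi)),\ \forall f \in C^1(U, \mathbb{R}^n),\ \forall p \in U,\ \forall v \in \mathbb{R}^n,$
$\partial_{t_{p,v,\psi}}(q \mapsto t_{q,f(q),\psi}) = t^{(2)}_{p,\ f(p),\ v,\ \sum_{i=1}^n v^i \partial_i (f \circ \psi^{-1})(\psi(p)),\ \psi}$
$= t^{(2)}_{p,\ f(p),\ v,\ w(p,v),\ \psi},$
where $w : U \times \mathbb{R}^n \to \mathbb{R}^n$ is given by $w(p,v) = \sum_{i=1}^n v^i \partial_i (f \circ \psi^{-1})(\psi(p))$ for all $(p, v) \in U \times \mathbb{R}^n$.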
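PROOF: Let $\psi \in \mathrm{atlas}(M)$. Let $U \in \mathrm{Top}(M)$ with $U \subseteq \mathrm{Dom}(\psi)$. Let $f \in C^1(U, \mathbb{R}^n)$. Define $Y : U \to T(M)$
by $Y(q) = t_{q,f(q),\psi}$ for all $q \in U$. Then $Y \in X^1(T(M)|_U)$ by Theorem 57.2.21. So $Y \in C^1(U, T(M))$. Then


$\forall p \in U,\ \forall v \in \mathbb{R}^n,\quad \partial_{t_{p,v,\psi}}(q \mapsto t_{q,f(q),\psi}) = \partial_{t_{p,v,\psi}} Y$
$= (dY)_p(t_{p,v,\psi})$
$= Y_*(t_{p,v,\psi})$
$= t_{Y(p),\ W(p,v),\ \tilde\Psi(\psi)},$


where $W : U \times \mathbb{R}^n \to \mathbb{R}^{2n}$ is given according to Definition 58.4.5 by


$\forall p \in U,\ \forall v \in \mathbb{R}^n,\quad W(p,v) = \sum_{i=1}^n v^i\, \partial_{x^i}\bigl(\tilde\Psi(\psi)(Y(\psi^{-1}(x)))\bigr)\big|_{x=\psi(p)}$
$= \sum_{i=1}^n v^i\, \partial_{x^i}\bigl(\tilde\Psi(\psi)(t_{\psi^{-1}(x),\ f(\psi^{-1}(x)),\ \psi})\bigr)\big|_{x=\psi(p)}$
$= \sum_{i=1}^n v^i\, \partial_{x^i}\bigl(x,\ f(\psi^{-1}(x))\bigr)\big|_{x=\psi(p)}$
$= \sum_{i=1}^n v^i\, \bigl(e_i,\ \partial_i (f \circ \psi^{-1})(\psi(p))\bigr)$
$= \bigl(v,\ \sum_{i=1}^n v^i\, \partial_i (f \circ \psi^{-1})(\psi(p))\bigr).$


Thus


$\forall p \in U,\ \forall v \in \mathbb{R}^n,\quad \partial_{t_{p,v,\psi}} Y = t_{Y(p),\ (v,\ \sum_{i=1}^n v^i \partial_i (f \circ \psi^{-1})(\psi(p))),\ \tilde\Psi(\psi)}$
$= t^{(2)}_{p,\ f(p),\ v,\ \sum_{i=1}^n v^i \partial_i (f \circ \psi^{-1})(\psi(p)),\ \psi}$ (61.2.5)
$= t^{(2)}_{p,\ f(p),\ v,\ w(p,v),\ \psi},$


where line (61.2.5) follows from Notation 59.1.11.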
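
For a concrete instance of Theorem 61.2.12, take $M = \mathbb{R}^2$ with $\psi$ the identity chart and $U = M$. The function $f$ below is an illustrative choice, not one used elsewhere in the text.

```latex
% Illustrative assumption: f(q) = (q^1 q^2, (q^1)^2) on M = R^2 with psi = id.
\begin{align*}
\partial_1 f(p) &= (p^2,\ 2p^1), \qquad \partial_2 f(p) = (p^1,\ 0), \\
w(p,v) &= v^1\,\partial_1 f(p) + v^2\,\partial_2 f(p)
        = (v^1 p^2 + v^2 p^1,\ 2 v^1 p^1), \\
\partial_{t_{p,v,\psi}}\bigl(q \mapsto t_{q,f(q),\psi}\bigr)
 &= t^{(2)}_{p,\ (p^1 p^2,\,(p^1)^2),\ v,\ (v^1 p^2 + v^2 p^1,\ 2 v^1 p^1),\ \psi}.
\end{align*}
```

For example, $p = (1,2)$ and $v = (1,0)$ give base vector components $(2,1)$, horizontal component $(1,0)$ and vertical component $(2,2)$.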


61.3. Leibniz rule for naive derivatives of vector fields 


((2019-5-16. Remark 61.3.1 is about four times as long as it should be. It is also very boring. This remark 
was written while trying to figure out which lemmas to prove in preparation for Theorem 61.3.3. Now all of 
the lemmas have been stated and proved. So discussion of the method can be removed. )) 


61.3.1 REMARK: Naive derivative Leibniz rule for products of real-valued functions and vector fields.
Given a real-valued function $f \in C^1(M)$ and a vector field $X \in X^1(T(M))$, one may form the pointwise scalar
product $f.X : M \to T(M)$ defined by $\forall p \in M,\ (f.X)(p) = f(p)X(p)$. Let $Y = f.X$. Then $Y \in X^1(T(M))$.
So the naive derivative $\partial_V Y$ is well defined by Definition 61.2.3 for $V \in T(M)$.


One naturally seeks some kind of product rule (also known as a Leibniz rule) which expresses $\partial_V Y$ in terms
of $\partial_V f$ and $\partial_V X$. However, some “issues” arise when one attempts this. The most obvious kind of product
rule would look something like “$\partial_V Y = f(p)\partial_V X + (\partial_V f)X(p)$” for $p \in M$ and $V \in T_p(M)$. An immediate
difficulty with this is that $f(p) \in \mathbb{R}$ and $\partial_V X \in T_{X(p)}(T(M))$, whereas $\partial_V f \in \mathbb{R}$ and $X(p) \in T_p(M)$.


It is not clear how to add such products. It seems that $f(p)\partial_V X \in T_{X(p)}(T(M))$ because $T_{X(p)}(T(M))$ is a
real linear space, whereas $(\partial_V f)X(p) \in T_p(M)$ because $T_p(M)$ is a real linear space. But these are different
linear spaces. So the sum of the terms is apparently not well defined. The sum should be $\partial_V Y$, which lies
in $T_{Y(p)}(T(M))$. So both of the two terms are apparently in the wrong tangent space.


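A Leibniz rule can be obtained for $\partial_V(f.X)$ by applying Theorem 58.7.13, but this can only be done if $\mathbb{R}$ is
regarded as a $C^1$ manifold which has a tangent bundle. Let $A_{\mathbb{R}} = \{\mathrm{id}_{\mathbb{R}}\}$. Then $(\mathbb{R}, A_{\mathbb{R}})$ is a $C^\infty$ manifold.
Define $\mu : \mathbb{R} \times T(M) \to T(M)$ by $\forall \lambda \in \mathbb{R},\ \forall p \in M,\ \forall V \in T_p(M),\ \mu(\lambda, V) = \lambda V$, where $\lambda V$ uses the scalar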


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 


1928 61. Vector field calculus 


product operation of the real linear space $T_p(M)$. Then one may write $\forall p \in M,\ (f.X)(p) = \mu(f(p), X(p))$.
In other words, $f.X = \mu \circ (f \times X)$. So by Theorem 58.7.13 line (58.7.15),


$\forall p \in M,\quad (dY)_p = (d(f.X))_p$
$= (d(\mu \circ (f \times X)))_p$
$= (dR^\mu_{X(p)})_{f(p)} \circ (df)_p + (dL^\mu_{f(p)})_{X(p)} \circ (dX)_p,$ (61.3.1)


where $\forall q \in T(M),\ R^\mu_q = \mu(\cdot, q)$ and $\forall q \in \mathbb{R},\ L^\mu_q = \mu(q, \cdot)$. This expression has the correct overall structure
to yield a Leibniz rule because $(df)_p(V) = \partial_V f$ and $(dX)_p(V) = \partial_V X$, although $\partial_V f$ is in this case an
element of $T_{f(p)}(\mathbb{R})$, not an element of $\mathbb{R}$. Nevertheless, it is worth pursuing.

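The map $R^\mu_{X(p)} : \mathbb{R} \to T(M)$ is a $C^1$ curve in the tangent bundle $T(M)$, which is a $C^1$ manifold if $M$ is a
$C^2$ manifold. So the differential $(dR^\mu_{X(p)})_{f(p)} : T_{f(p)}(\mathbb{R}) \to T_{Y(p)}(T(M))$ is the curve velocity map for the
curve parameter $f(p) \in \mathbb{R}$. By Theorem 54.1.8 (xii), a vector at $f(p)$ may be written as $t_{f(p),w,\mathrm{id}_{\mathbb{R}}}$ for some
$w \in \mathbb{R}$ since $\mathrm{id}_{\mathbb{R}} \in \mathrm{atlas}_{f(p)}(\mathbb{R})$. Let $\psi_1 \in \mathrm{atlas}_p(M)$ and $\tilde\psi_1 = \tilde\Psi(\psi_1)$. (See Notation 54.5.21 for $\tilde\Psi$.) Then
$\tilde\psi_1 \in \mathrm{atlas}_{X(p)}(T(M))$. So Definition 58.4.5 gives $(dR^\mu_{X(p)})_{f(p)}(t_{f(p),w,\mathrm{id}_{\mathbb{R}}}) = t_{Y(p),v_1,\tilde\psi_1}$, where $v_1 \in \mathbb{R}^{2n}$ is
given (as in the very similar proof of Theorem 59.4.8) by


$v_1 = w\, \partial_t \bigl(\tilde\psi_1 \circ R^\mu_{X(p)} \circ \mathrm{id}_{\mathbb{R}}^{-1}(t)\bigr)\big|_{t=f(p)}$
$= w\, \partial_t \bigl(\tilde\psi_1(R^\mu_{X(p)}(t))\bigr)\big|_{t=f(p)}$
$= w\, \partial_t \bigl(\tilde\psi_1(tX(p))\bigr)\big|_{t=f(p)}$
$= w\, \partial_t \bigl(\psi_1(\pi_{T(M)}(tX(p))),\ \Phi(\psi_1)(tX(p))\bigr)\big|_{t=f(p)}$
$= w\, \partial_t \bigl(\psi_1(p),\ t\,\Phi(\psi_1)(X(p))\bigr)\big|_{t=f(p)}$
$= (0_{\mathbb{R}^n},\ w\,\Phi(\psi_1)(X(p))).$


(See Notation 54.5.7 for $\Phi$. See Definition 54.5.4 for the tangent bundle projection map $\pi_{T(M)} : T(M) \to M$.)
Thus the horizontal component of $t_{Y(p),v_1,\tilde\psi_1}$ is zero, whereas the vertical component is equal to $w$ times the
component tuple of $X(p)$. This implies that $t_{Y(p),v_1,\tilde\psi_1} \in T_{Y(p),0}(T(M))$ by Notation 59.2.5, which implies that
the drop function value $\varpi_{Y(p)}(t_{Y(p),v_1,\tilde\psi_1}) = wX(p)$ is well defined by Definition 59.2.9.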




Here the input vector for $(dR^\mu_{X(p)})_{f(p)}$ is $t_{f(p),w,\mathrm{id}_{\mathbb{R}}} = \partial_V f = (df)_p(V) \in T_{f(p)}(\mathbb{R})$. This implies that
$w \in \mathbb{R}$ is equal to the traditional meaning of “$\partial_V f$” as in Definition 61.1.2 and Notation 54.11.12, which is
a real number. Unfortunately the notation “$\partial_V f$” has two different meanings here in the same context. To
distinguish them, temporarily let $\partial^0_V f = (d^0 f)_p(V) \in \mathbb{R}$ denote the traditional real-number derivative, and
let $\partial^1_V f = (d^1 f)_p(V) \in T_{f(p)}(\mathbb{R})$ denote the tangent-vector version of this derivative. It then follows that
$\varpi_{Y(p)}\bigl((dR^\mu_{X(p)})_{f(p)}(t_{f(p),w,\mathrm{id}_{\mathbb{R}}})\bigr) = \varpi_{Y(p)}\bigl((dR^\mu_{X(p)})_{f(p)}(\partial^1_V f)\bigr) = \varpi_{Y(p)}(t_{Y(p),v_1,\tilde\psi_1}) = (\partial^0_V f)X(p)$. This has
the form which is expected for one term of a Leibniz-style rule for $\partial_V(f.X)$, although significantly it does
require a drop function.

For the second term in line (61.3.1), $L^\mu_{f(p)}$ maps vectors $W \in T_p(M)$ to vectors $f(p)W \in T_p(M)$. This
kind of verticality property of $L^\mu_{f(p)}$ implies that $(dL^\mu_{f(p)})_{X(p)}$ maps vectors in $T_{X(p),V}(T(M))$ to vectors in
$T_{Y(p),V}(T(M))$ for any $V \in T_p(M)$. This is confirmed by Theorem 59.4.10. Thus the value $X(p)$ is scaled
by the factor $f(p)$, but the horizontal component $V$ is not.


The consequence, then, is that the first term $(dR^\mu_{X(p)})_{f(p)}(\partial_V f)$ gives an output in $T_{Y(p),0}(T(M))$, whereas
the second term $(dL^\mu_{f(p)})_{X(p)}(\partial_V X)$ gives an output in $T_{Y(p),V}(T(M))$. Thus the first term can be dropped
in a chart-independent manner to $T(M)$ because it is vertical, but the second term cannot. So the sum
in Theorem 61.3.3 line (61.3.3) has one chart-independent and one chart-dependent term. This motivates
the definition of an affine connection on a tangent bundle, which converts this chart-dependent sum to a
chart-independent sum in the Leibniz rule for covariant derivatives in Theorem 71.6.7.


61.3.2 REMARK: Drop functions and Leibniz rules for naive derivatives. 
Theorem 61.3.3 line (61.3.2) may not look like the expected form for a Leibniz rule, but if the drop functions
are removed, the result looks like $\partial_V(f.X) = (\partial_V f)X(p) + f(p)\partial_V X$, which is very much as one might
expect. This cannot be literally correct because $\partial_V(f.X) \in T_{f(p)X(p)}(T(M))$, $(\partial_V f)X(p) \in T_p(M)$ and
$f(p)\partial_V X \in T_{X(p)}(T(M))$, which are three different linear spaces. So the addition and equality operations
are not meaningful. However, when this simplified formula (without the drop functions) is combined with
a connection, it should look like $D_V(f.X) = (\partial_V f)X(p) + f(p) D_V X$, which has a very good chance of
being correct because covariant derivatives employ drop functions to convert vertical second-level vectors to
chart-independent base-space (i.e. first-level) vectors.


Line (61.3.3), which obliquely drops the vectors on line (61.3.2), looks very much more like the expected 
form for a Leibniz rule. 
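
In a single chart, the chart-dependent form of the rule is an ordinary product rule for component functions. The sketch below uses the shorthand $g = f \circ \psi^{-1}$, $\xi = \Phi(\psi) \circ X \circ \psi^{-1}$, $v = \Phi(\psi)(V)$ and $x = \psi(p)$, which is introduced here only for illustration and follows the pattern of Theorem 61.2.11.

```latex
% Assumed shorthand: g = f o psi^{-1}, xi = Phi(psi) o X o psi^{-1},
% v = Phi(psi)(V), x = psi(p).
% The component function of f.X for the chart psi is the pointwise product g xi.
\begin{align*}
\text{vertical part of } \partial_V(f.X):\quad
\sum_{i=1}^n v^i\, \partial_i (g\,\xi)(x)
 &= \Bigl(\sum_{i=1}^n v^i\, \partial_i g(x)\Bigr)\,\xi(x)
  + g(x) \sum_{i=1}^n v^i\, \partial_i \xi(x), \\
\text{horizontal part of } \partial_V(f.X):\quad
 & v \quad \text{(the same as for } \partial_V X \text{, not } g(x)\,v \text{)}.
\end{align*}
```

The first sum on the right is the real number $\partial_V f$ multiplying the components of $X(p)$, and the second is $f(p)$ multiplying the vertical components of $\partial_V X$. The horizontal part is not scaled, which is why the two terms cannot simply be added as second-level vectors.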


61.3.3 THEOREM: Leibniz rule for naive derivative of product of a real function and vector field. 
Let M be a C? manifold. Then 


Vf € C'(M,R), YX € a Vp € M, VV € T, (M), Vv € atlas; (M), 
Ov (fX) = (Ov fg X0) + Mn xe lr, an) f(D) PX q(OvX))- — (61.3.2) 


(See Definition 59.3.2 for oblique drop functions c.) Consequently 


Vf € C'(M,R), VX € da (M)), Vp e M, VV € T,(M), V € atlas, (M), 
Dw) xi (Ov (F-X)) = (Ov f) X(») + Fok (0v X). (61.3.3) 
PROOF: Define $Y : M \to T(M)$ by $\forall p \in M,\ Y(p) = f(p)X(p)$. That is, $Y = f.X$. Then $Y \in X^1(T(M))$
by Theorem 57.2.11. Let $\mu : \mathbb{R} \times T(M) \to T(M)$ denote the scalar product operation in Definition 54.4.4.
Then by Theorem 58.7.13 line (58.7.15),


$\forall p \in M,\quad (dY)_p = (d(f.X))_p$
$= (d(\mu \circ (f \times X)))_p$
$= (dR^\mu_{X(p)})_{f(p)} \circ (df)_p + (dL^\mu_{f(p)})_{X(p)} \circ (dX)_p,$


where $\forall q \in T(M),\ R^\mu_q = \mu(\cdot, q) : \mathbb{R} \to T(M)$ and $\forall q \in \mathbb{R},\ L^\mu_q = \mu(q, \cdot) : T(M) \to T(M)$. Therefore


Vp c M, VV € T,(M), 
óyY = 1 (V) 

AR oy) ptg (P) (V)) + (AL py) xq) (4X (V) 

AR (py) fo) E r(p)ov fias) + (AL (py) x(p) (0v X). 


(d 
= (dR 
=< 
= (8v APY) QC) + Plan era FOP (0v. X) 


by Theorem 59.4.8 line (59.4.7) and Theorem 59.4.10 line (59.4.16). This verifies line (61.3.2). 
Line (61.3.3) follows from line (61.3.2) and Theorem 59.3.9 line (59.3.7), and Theorem 59.2.10. 


61.4. Naive derivatives of vector fields by vector fields 


61.4.1 REMARK: Extension of action by a vector to action by a vector field. 
The action by a vector field on vector fields in Definition 61.4.2 is analogous to the action by a vector field 
on real-valued functions in Definition 61.1.3. 


Definition 61.4.2 extends Definition 61.2.3 from the action by a single vector $V \in T(M)$ on a vector field
$Y \in X^1(T(M))$ to the action by a vector field $X \in X(T(M))$ on $Y$. This extension is defined by simply
aggregating the actions by the single vectors $X(p) \in T(M)$.


This is in contrast to the impossibility of defining the Lie derivative $L_X Y$ with the vector field $X$ replaced
by a single vector $V$. There is no such thing as the Lie derivative “$L_V Y$” for a vector $V$ and vector field $Y$.


61.4.2 DEFINITION: The action of a vector field $X \in X(T(M))$ on $C^1$ vector fields on a $C^2$ manifold $M$
is the map $\partial_X : X^1(T(M)) \to (M \to T(T(M)))$ defined by


$\forall X \in X(T(M)),\ \forall Y \in X^1(T(M)),\ \forall p \in M,$
$(\partial_X Y)(p) = \partial_{X(p)} Y.$


61.4.3 REMARK: Action of vector fields on vector fields produces inconvenient kinds of cross-sections. 
The result of differentiating a vector field $Y$ with respect to a vector field $X$ in Definition 61.4.2 is not a
vector field in a space such as $X(T(T(M)))$ because the base-point space for $T(T(M))$ is $T(M)$. So a vector
field in $X(T(T(M)))$ would be a map from $T(M)$ to $T(T(M))$, whereas the map $\partial_X Y$ in Definition 61.4.2
is from $M$ to $T(T(M))$.


The action of $X$ on $Y$ produces a cross-section $\partial_X Y \in X(T(T(M)), \pi \circ \pi_*, M)$, where $(T(T(M)), \pi \circ \pi_*, M)$
is a kind of double tangent fibration of $M$. (See Notation 47.4.3 for general topological fibration cross-section
spaces $X(E, \pi, B)$ for fibrations $(E, \pi, B)$. See Notation 64.7.3 for spaces $X^k(E, \pi, B)$ of $C^k$ cross-sections of
differentiable fibrations $(E, \pi, B)$.) The map $\pi \circ \pi_*$ is the composition of the projection maps $\pi : T(M) \to M$
and $\pi_* : T(T(M)) \to T(M)$. Thus $\partial_X Y$ is a cross-section of the space of second-level tangents over $M$ as
a base space because $\partial_X Y$ specifies only a single vector in $T(T(M))$ for each point $p \in M$ rather than a
vector for each element of $T(M)$. So one could write “$\partial_X : X^1(T(M)) \to X(T(T(M)), \pi \circ \pi_*, M)$” instead
of “$\partial_X : X^1(T(M)) \to (M \to T(T(M)))$” in Definition 61.4.2. But this would be of very limited benefit.


Luckily the output from the expression $\partial_X Y$ for vector fields $X$ and $Y$ is not itself used much in practice
because its transformation properties are so inconvenient. To make this expression useful, its horizontal
component must be somehow cancelled by subtracting some object which has the same horizontal component.
This is achieved in the definitions of Lie derivatives and covariant derivatives for example. (Second-level
vectors in $T(T(M))$ with zero horizontal component can be “dropped” to usable first-level vectors in $T(M)$.
Then one obtains vector fields $L_X Y$ and $D_X Y$ in $X(T(M))$.)


61.4.4 THEOREM: Some very basic properties of the naive action of a vector field on a vector field.
Let $M$ be a $C^2$ manifold. Let $X \in X(T(M))$ and $Y \in X^1(T(M))$. Let $\pi : T(M) \to M$ denote the projection
map for $T(M)$.
(i) $\forall p \in M,\ (\partial_X Y)(p) = Y_*(X(p))$. In other words, $\partial_X Y = Y_* \circ X$.
(ii) $\forall p \in M,\ \pi_*((\partial_X Y)(p)) = X(p)$. In other words, $\pi_* \circ (\partial_X Y) = X$.
(iii) $\forall p \in M,\ (\partial_X Y)(p) \in T_{Y(p),X(p)}(T(M))$.
PROOF: Part (i) follows from Definitions 61.4.2 and 61.2.3.


For part (ii), $\pi_*((\partial_X Y)(p)) = \pi_*(\partial_{X(p)} Y) = X(p)$ by Theorem 61.2.5 (i) for all $p \in M$.
Part (iii) follows from Definition 61.4.2 and Theorem 61.2.5 (iii).


61.4.5 REMARK: The action of vectors and vector fields on tensor fields. 
Naive derivatives of vector fields may be generalised to $C^1$ cross-sections of any $C^1$ fibration. (See
Definition 64.2.2 for differentiable fibrations.) Of special interest are tensor fields of general type.


The space $X^k(T^{r,s}(M))$ for $k \in \mathbb{Z}^+_0$ consists of all $C^k$ tensor fields of type $(r,s)$ on a $C^{k+1}$ manifold $M$. If a
tensor field $K \in X^1(T^{r,s}(M))$ is differentiated naively with respect to $V \in T_p(M)$ for some $p \in M$, the result
is a vector $\partial_V K$ in the tangent space $T_{K(p)}(T^{r,s}(M))$ of $T^{r,s}(M)$. Therefore each operator $\partial_V$ has domain
$X^1(T^{r,s}(M))$ and range $T(T^{r,s}(M))$. It seems reasonable to label the operator with the tensor type. Thus
$\partial^{r,s}_V : X^1(T^{r,s}(M)) \to T(T^{r,s}(M))$. From this perspective, the operator on real-valued functions may be
notated as $\partial^{0,0}_V f = \partial_V f$, and the operator on vector fields may be notated as $\partial^{1,0}_V X = \partial_V X$. (The pedantic
reader will hopefully note that the spaces $X^k(T^{0,0}(M))$ and $X^k(T^{1,0}(M))$ are only canonically isomorphic
to $C^k(M)$ and $X^k(T(M))$ respectively. Also note that $X^k(T^{0,0}(M))$ and $C^k(M)$ only require $M$ to be $C^k$,
whereas $X^k(T^{1,0}(M))$ and $X^k(T(M))$ require $M$ to be $C^{k+1}$.)

The naive action of vectors on tensor fields in Definition 61.4.6 is defined in terms of induced maps as in 


Definition 61.2.3. The naive action of vector fields on tensor fields in Definition 61.4.7 is defined as the 
aggregate of actions by individual vectors on tensor fields in Definition 61.4.6, as in Definition 61.4.2. 




61.4.6 DEFINITION: The action of a vector $V \in T(M)$ on $C^1$ tensor fields of type $(r,s)$ on a $C^2$ manifold
$M$ is the map $\partial^{r,s}_V : X^1(T^{r,s}(M)) \to T(T^{r,s}(M))$ defined for $r,s \in \mathbb{Z}^+_0$ by


$\forall V \in T(M),\ \forall K \in X^1(T^{r,s}(M)),\quad \partial^{r,s}_V K = K_*(V),$


where $K_* : T(M) \to T(T^{r,s}(M))$ is the induced map of $K : M \to T^{r,s}(M)$.


61.4.7 DEFINITION: The action of a vector field $X \in X(T(M))$ on $C^1$ tensor fields of type $(r,s)$ on a $C^2$
manifold $M$ is the map $\partial^{r,s}_X : X^1(T^{r,s}(M)) \to (M \to T(T^{r,s}(M)))$ defined for $r,s \in \mathbb{Z}^+_0$ by


$\forall X \in X(T(M)),\ \forall K \in X^1(T^{r,s}(M)),\ \forall p \in M,$
$(\partial^{r,s}_X K)(p) = \partial^{r,s}_{X(p)} K.$


61.5. The Lie bracket 


61.5.1 REMARK: Literature and terminology for the Lie bracket. 
The Lie bracket is called the “bracket product” by Lang [23], pages 116-122. 


It is called the “bracket” by Bishop/Crittenden [2], pages 13-21; Crampin/Pirani [7], pages 73-76; Spivak [37],
Volume 1, pages 153-163; Kosinski [21], pages 18, 80; Do Carmo [9], pages 25-29; Darling [8], page 27; Gallot/
Hulin/Lafontaine [13], pages 22-26; Willmore [42], page 200; Kobayashi/Nomizu [19], page 5.


It is called the “Lie bracket” by Bishop/Goldberg [3], pages 133-138; Szekeres [305], page 432; Bump [57], 
pages 30-31; Sternberg [38], pages 59-60; Flanders [11], pages 179-183; Lovelock/Rund [27], pages 339-340; 
Frankel [12], pages 125-131; O’Neill [295], page 5; Cheeger/Ebin [5], pages 1, 48-49; Poor [32], page 67; 
Stillwell [143], pages 74, 80; Penrose [297], pages 268, 310-317; Schutz [36], pages 43-47, 74-77; Eriksson/ 
Hàggblad/Strómbom [264], pages 10-12. 

It is called the “Lie operation” by Postnikov [33], page 17.

It is called the “Lie corchete” (in Spanish, meaning “Lie square bracket”) by Gómez-Ruiz [14], pages 66-67.


It is called the “crochet” (in French, meaning “square bracket”) by Malliavin [28], pages 106-111;
Choquet-Bruhat [6], pages 25-28.


It is called the “commutator” by Szekeres [305], page 432; Penrose [297], page 311; EDM2 [113], page 1213. 
It is called the “Kommutator” (in German) by Sulanke/Wintgen [40], pages 51-52. 

It is called the “alternate function” or “Poisson’s parenthesis” by Levi-Civita [26], page 35.

It is called the “Poisson bracket” by EDM2 [113], page 1213. 


It is defined without a name by Auslander/MacKenzie [1], pages 142-143; Peskin/Schroeder [298], page 495. 


61.5.2 REMARK: The Lie bracket is related to the Poisson bracket, but it is not the same thing. 

The Lie bracket is related to the “Poisson bracket” and “Lagrange bracket" for phase space in Lagrangian 
and Hamiltonian mechanics. (See Flanders [11], pages 179-183; Bishop/Goldberg [3], page 257; Schutz [36], 
page 170; Corben/Stehle [258], pages 220-231; Greenwood [271], pages 241-251; Marsden/Hughes [289], 
pages 259, 263; Gregory [272], pages 416-417; Lanczos [280], pages 212-216; Itzykson/Zuber [277], pages 
3, 29; Lawden [284], page 131; Penrose [297], pages 321, 483; Rebhan [299], pages 263-266; EDM2 [113], 
pages 1007-1008, 1213.) 


61.5.3 REMARK: Interpretations of the Lie bracket for vector fields or matrices. 

There are four plausible interpretations for the expression “XY — Y X" for vector fields X and Y. A fifth 
interpretation arises from the Lie bracket operation of an abstract Lie algebra. (See Remark 19.10.1.) The 
matrix interpretation (4) is typically associated or identified with the commutator interpretation (1). 


(1) The commutator of compositions of two operator fields. 
“$XY$” means the composition $\partial_X \circ \partial_Y \in T^{[2]}(M)$ of $\partial_X$ and $\partial_Y$ acting on real-valued functions.
“$XY - YX$” means $\partial_X \circ \partial_Y - \partial_Y \circ \partial_X$, dropped from $T^{[2]}(M)$ to $T(M)$. (See Notation 60.5.8
for $T^{[2]}(M)$.)
(2) Antisymmetrant of naive derivatives of two vector fields.
“$XY$” means the naive derivative $\partial_X Y \in T(T(M))$ of $X$ acting on $Y$.
“$XY - YX$” means $\partial_X Y - \partial_Y X$, dropped from $T^2(M) = T(T(M))$ to $T(M)$.
(3) The Lie derivative. 
“XY” means the naive derivative Ox Y € T(T(M)) of X acting on Y. 
^Y X" means the “Lie horizontal lift" 0x Y € T(T'(M)) of Y by X. (See Definition 61.7.5.) 
“XY —Y X" means OxY — 0xY , dropped from T(T(M)) to T(M). (See Definition 61.8.4.) 


(4) Antisymmetrant of matrices. 
“XY” means the matrix product XY € M, (IR) for X,Y € Mn (IR). 
"XY — YX” means the matrix difference XY — Y X. 


Interpretation (1) is presented by Lang [23], pages 117-118; Bishop/Crittenden [2], page 13; Do Carmo [9], 
pages 26-27; Crampin/Pirani [7], page 73; Spivak [37], Volume 1, pages 153-154; Kosinski [21], page 18; 
Darling [8], page 27; Gallot/Hulin/Lafontaine [13], page 22; Willmore [42], page 200; Bishop/Goldberg [3], 
page 133; Szekeres [305], page 432; Bump [57], page 40; Flanders [11], page 180; Postnikov [33], page 12; 
Lovelock/Rund [27], page 339; Frankel [12], page 126; Penrose [297], page 311; Cheeger/Ebin [5], page 1; 
O'Neill [295], page 5; Gómez-Ruiz [14], page 66; Choquet-Bruhat [6], page 25; Sulanke/Wintgen [40], page 51; 
Levi-Civita [26], page 35; Schutz [36], page 44; Stillwell [143], page 104; Kobayashi/Nomizu [19], page 5; 
Eriksson/Hàggblad/Strómbom [264], page 12. 

Interpretation (2) is presented by Sternberg [38], pages 59-60. 

Interpretation (3) is presented by Malliavin [28], page 108. 

Interpretation (4) is presented by Stillwell [143], page 105; Hall [89], page 56; Moriyasu [293], page 165; 
Cahn [58], page 3. 


It seems that interpretation (1) wins the majority vote! However, this interpretation typically requires the
manifold to be $C^\infty$, which constrains its applicability. This is not the only objection to interpretation (1).
Even if the differential operators are applied to $C^2$ functions on a $C^2$ manifold, and the domain mismatches
are ignored (because $[X,Y] \in X^0(T(M))$ if $X$ and $Y$ are in $X^1(T(M))$), there is still the issue that test
functions and differential operators are not truly part of the quintessence of vector fields. Luckily the issue
is more philosophical than practical because interpretations (1) and (2) give essentially the same result.


61.5.4 REMARK: The serendipity of the Lie bracket. 
Interpretations (1) and (2) in Remark 61.5.3 seem to be serendipitous. They both lack clear geometric 
meaning. They are constructed as follows. 


Thus the Lie bracket operation seems to be a kind of computational magic which yields something meaningful
as the final output of computational steps which are not motivated by geometric intuition. Interpretation (3) 
is, however, geometrically motivated. Thus the Lie derivative seems to be the only real underlying geometric 
reality which justifies the Lie bracket. 


61.5.5 REMARK: Comparison of interpretations for the Lie bracket. 

For the definition of XY in Remark 61.5.3, interpretation (1) produces a second-order operator, whereas 
interpretation (2) produces a second-level velocity in the double tangent space. The coordinates of the 
commutators for both interpretations are essentially the same, after they have been suitably “dropped” to 
the first-level tangent bundle. So from the coordinates point of view, the difference is academic. But this 
tends to detract from the claims that coordinates should be avoided in differential geometry. It is only by
looking at the coordinates of the difference “$XY - YX$”, in either interpretation, that one discovers that
this object transforms like a vector field, and therefore “is” a vector field in some sense.
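
The coordinate computation for interpretation (1) of Remark 61.5.3 is short enough to display. It is sketched here for a single chart, with $\xi$ and $\eta$ denoting the component functions of $X$ and $Y$, and $h$ a $C^2$ test function in chart coordinates. These symbols are shorthand assumed for this illustration.

```latex
% Assumed shorthand: X = sum_j xi^j d_j and Y = sum_i eta^i d_i in one chart,
% with h a C^2 real-valued function of the chart coordinates.
\begin{align*}
X(Yh) &= \sum_{i,j=1}^n \xi^j\,(\partial_j \eta^i)\,\partial_i h
       + \sum_{i,j=1}^n \xi^j \eta^i\, \partial_j\partial_i h, \\
Y(Xh) &= \sum_{i,j=1}^n \eta^j\,(\partial_j \xi^i)\,\partial_i h
       + \sum_{i,j=1}^n \eta^j \xi^i\, \partial_j\partial_i h, \\
X(Yh) - Y(Xh) &= \sum_{i=1}^n \Bigl(\sum_{j=1}^n
      \bigl(\xi^j\,\partial_j \eta^i - \eta^j\,\partial_j \xi^i\bigr)\Bigr)\,\partial_i h.
\end{align*}
```

The second-order terms cancel because $\partial_j\partial_i h$ is symmetric in $i$ and $j$ for $C^2$ functions, which leaves a first-order operator whose coefficients are the coordinate expression of the bracket.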


Whenever possible, it is preferable to avoid definitions which require test functions for differential operators.
Such definitions typically require at least a rudimentary proof of the a-priori existence of enough test functions
to make the specification valid. Direct constructions are best. Thus option (1) in Remark 61.5.3 should be
avoided, and option (2) should be preferred. After a concept has been defined by some concrete construction
without differential operators, the concrete vectors $V$ can be applied in differential operators $\partial_V$.


Interpretation (2) has been chosen for Definition 61.5.7 because it abides by the basic principles of the 


fibre bundle approach to differential geometry. It uses neither test functions nor coordinates, except that 
coordinates are required for the initial definitions for all tangent bundles. 


Interpretation (4) is typically given in physics-related texts with some kind of justification in terms of 
commutators of “generators” and their associated differential actions on some kind of representation of a Lie 
group. Such an interpretation could perhaps be described as “non-mathematical” because of the difficulty 
of identifying it with a pure mathematical definition for vector fields. The Lie bracket in Definition 61.5.7 is 
applicable to general vector fields, not only to the left invariant vector fields which are encountered in the 


context of Lie groups. 


61.5.6 REMARK: Drop functions and swap functions are required for Lie bracket definitions. 

In Definition 61.5.7, a “swap function” $\Xi$ is applied to $YX$ to move it to the same linear space as $XY$ so
that they can be subtracted. Then a “drop function” $\varpi$ drops the difference from $T(T(M))$ to $T(M)$. The
swap and drop functions are required both for the naive derivative interpretation of $XY$ in Definition 61.5.7,
and for the (more popular) operator-field commutator interpretation of $XY$ in Remark 61.5.3 (1). Thus in
the fibre bundle way of thinking, drop functions and swap functions are unavoidable. By contrast, they are
superfluous in the coordinate calculus way of thinking. However, the formula in Definition 61.5.7 merely
hides the necessary coordinate computations by incorporating them into the drop and swap functions, which
cannot be defined without recourse to coordinates.


61.5.7 DEFINITION: Lie bracket for vector fields on differentiable manifolds.
The Lie bracket on a $C^2$ manifold $M$ is the operation $[\cdot, \cdot] : X^1(T(M)) \times X^1(T(M)) \to X(T(M))$ where


$\forall X, Y \in X^1(T(M)),\quad [X,Y] = \varpi \circ (\partial_X Y - \Xi \circ \partial_Y X).$ (61.5.1)
(See Definition 59.2.15 for the drop function $\varpi$. See Definition 59.6.6 for the swap function $\Xi$.)


61.5.8 THEOREM:  Well-definition of the Lie bracket on a C? manifold. 
The Lie bracket on a C? manifold is well defined. 


PROOF: It must be verified that the expression $\varpi \circ (\partial_X Y - \Xi \circ \partial_Y X)$ is well defined. Let $M$ be a $C^2$
manifold. Let $X, Y \in X^1(T(M))$. Then by Definitions 61.4.2 and 61.2.3, $\partial_X Y : M \to T(T(M))$ satisfies
$\partial_X Y : p \mapsto \partial_{X(p)} Y = Y_*(X(p))$ for $p \in M$, where $Y_*(X(p)) \in T_{Y(p)}(T(M))$ by Theorem 58.9.6. Similarly,
$\partial_Y X : M \to T(T(M))$ satisfies $\partial_Y X : p \mapsto \partial_{Y(p)} X = X_*(Y(p))$ for $p \in M$, where $X_*(Y(p)) \in T_{X(p)}(T(M))$.
The horizontal components of $(\partial_X Y)(p) = Y_*(X(p))$ and $(\partial_Y X)(p) = X_*(Y(p))$ are respectively $X(p)$ and
$Y(p)$ by Theorem 61.4.4 (ii). Therefore $(\partial_X Y)(p) \in T_{Y(p),X(p)}(T(M))$ and $(\partial_Y X)(p) \in T_{X(p),Y(p)}(T(M))$
by Notation 59.2.5. But then by Definitions 59.6.3 and 59.6.6, $\Xi((\partial_Y X)(p)) \in T_{Y(p),X(p)}(T(M))$. This
implies that $(\partial_X Y - \Xi \circ \partial_Y X)(p) = (\partial_X Y)(p) - \Xi((\partial_Y X)(p)) \in T_{Y(p),0}(T(M))$ is well defined as a difference
between two vectors in the double tangent space $T_{Y(p),X(p)}(T(M))$. (The horizontal component of the
difference equals zero by Definition 59.2.2 because the projection map $(d\pi)_{Y(p)} : T_{Y(p)}(T(M)) \to T_p(M)$ is
linear.)


Since $(\partial_X Y - \Xi \circ \partial_Y X)(p) \in T_{Y(p),0}(T(M))$ for all $p \in M$, it follows from Definition 59.2.9 and
Theorem 59.2.6 that $(\varpi_{Y(p)} \circ (\partial_X Y - \Xi \circ \partial_Y X))(p) = \varpi_{Y(p)}((\partial_X Y - \Xi \circ \partial_Y X)(p)) \in T_p(M)$ is well defined
for all $p \in M$. Hence $[X,Y] = \varpi \circ (\partial_X Y - \Xi \circ \partial_Y X) : M \to T(M)$ is a well defined function which
satisfies $\pi([X,Y](p)) = p$ for all $p \in M$. This implies that $[X,Y] \in X(T(M))$ by Notation 57.1.5 and
Definition 57.1.2.


61.5.9 THEOREM: Anticommutativity of the Lie bracket. 
The Lie bracket is anticommutative. In other words, for any $C^2$ manifold $M$,


$\forall X,Y \in X^1(T(M)),\quad [X,Y] = -[Y,X].$


PROOF: $[X,Y] = \varpi \circ (\partial_X Y - \Xi \circ \partial_Y X) = -\varpi \circ (\partial_Y X - \Xi \circ \partial_X Y) = -[Y,X]$ by Definition 61.5.7 and Theorem 59.6.10 (iv). Hence the Lie bracket is anticommutative.


61.5.10 REMARK: Coordinate calculus for the Lie bracket.

The Lie bracket is arguably one of the DG concepts which are simpler and clearer to define in coordinates
than in the more modern tensorial or fibre bundle idioms. The formulas in Theorem 61.5.11 express the
Lie bracket in terms of coordinates, which in the simplest form look like $\sum_{i=1}^n (\xi^i \partial_i \eta - \eta^i \partial_i \xi)$. This has
the advantage of being simple, well defined, concrete, applicable and comprehensible. A similar coordinate
computation is given by some authors as the definition of the Lie bracket.
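
The coordinate expression can be tried out numerically. The following is only a minimal sketch, assuming the manifold is $\mathbb{R}^n$ with the identity chart, so that a vector field is represented by its component function. The names `lie_bracket`, `translation` and `rotation` are hypothetical, and the partial derivatives are replaced by central differences, so the result is an approximation.

```python
def lie_bracket(xi, eta, x, h=1e-5):
    """Approximate sum_j (xi^j d_j eta - eta^j d_j xi) at the point x."""
    n = len(x)

    def directional(f, v):
        # Central difference for sum_j v^j d_j f at x.
        forward = f([x[k] + h * v[k] for k in range(n)])
        backward = f([x[k] - h * v[k] for k in range(n)])
        return [(forward[i] - backward[i]) / (2 * h) for i in range(n)]

    first = directional(eta, xi(x))    # xi^j d_j eta
    second = directional(xi, eta(x))   # eta^j d_j xi
    return [first[i] - second[i] for i in range(n)]


def translation(x):
    # Hypothetical component function (1, 0).
    return [1.0, 0.0]


def rotation(x):
    # Hypothetical component function (-x2, x1).
    return [-x[1], x[0]]


bracket = lie_bracket(translation, rotation, [0.7, -0.3])
print(bracket)
```

For these two fields the formula gives $\partial_1(-x_2, x_1) - 0 = (0,1)$ at every point, so the printed components should be close to 0 and 1.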


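61.5.11 THEOREM: Coordinates expression for the Lie bracket of vector fields on manifolds.


The Lie bracket $[\cdot,\cdot] : X^1(T(M)) \times X^1(T(M)) \to X(T(M))$ on a $C^2$ manifold $M$ satisfies

$\forall X,Y \in X^1(T(M)),\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,$
$\hat\psi^i([X,Y](p)) = \sum_{j=1}^n \Big( \hat\psi^j(X(p))\, \partial_{x^j} \hat\psi^i(Y(\psi^{-1}(x)))\big|_{x=\psi(p)} - \hat\psi^j(Y(p))\, \partial_{x^j} \hat\psi^i(X(\psi^{-1}(x)))\big|_{x=\psi(p)} \Big),$ (61.5.2)


where $n = \dim(M)$ and $\hat\psi = \Phi(\psi)$ is the velocity chart for $\psi$ as in Notation 54.5.7. Hence

$\forall X,Y \in X^1(T(M)),\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,$
$\hat\psi^i([X,Y](p)) = \sum_{j=1}^n \xi^j(\psi(p))\, \partial_{x^j} \eta^i(x)\big|_{x=\psi(p)} - \sum_{j=1}^n \eta^j(\psi(p))\, \partial_{x^j} \xi^i(x)\big|_{x=\psi(p)},$ (61.5.3)

where $\xi = \hat\psi \circ X \circ \psi^{-1}$ and $\eta = \hat\psi \circ Y \circ \psi^{-1}$. Hence

$\forall X,Y \in X^1(T(M)),\ \forall \psi \in \mathrm{atlas}(M),\ \forall x \in \mathrm{Range}(\psi),\ \forall i \in \mathbb{N}_n,$
$\hat\psi^i \circ [X,Y] \circ \psi^{-1}(x) = \sum_{j=1}^n \xi^j(x)\, \partial_j \eta^i(x) - \sum_{j=1}^n \eta^j(x)\, \partial_j \xi^i(x).$ (61.5.4)

Or more briefly, $\hat\psi \circ [X,Y] \circ \psi^{-1} = \sum_{j=1}^n (\xi^j \partial_j \eta - \eta^j \partial_j \xi)$.
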
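PROOF: Let $X,Y \in X^1(T(M))$ and $p \in M$. Then by Definitions 61.5.7, 61.4.2, 59.2.15 and 59.6.6,

$[X,Y](p) = \varpi((\partial_X Y)(p) - \Xi((\partial_Y X)(p)))$
$\qquad = \varpi_{Y(p)}(\partial_{X(p)} Y - \Xi_{X(p),Y(p)}(\partial_{Y(p)} X)).$

Let $\psi \in \mathrm{atlas}_p(M)$ and $V = X(p)$. Then by Theorem 61.2.11,

$\partial_{X(p)} Y = \partial_V Y = t^{(2)}_{p,\,\eta_\psi(\psi(p)),\,\hat\psi(V),\,w_{X,Y,\psi}(\psi(p)),\,\psi},$

where $\eta_\psi = \hat\psi \circ Y \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathbb{R}^n$ is the component function of $Y$ for the chart $\psi$, and $w_{X,Y,\psi}(x) = \sum_{j=1}^n \hat\psi^j(X(\psi^{-1}(x)))\, \partial_{x^j} \eta_\psi(x)$ for all $x \in \mathrm{Range}(\psi)$. Similarly, with $Z = Y(p)$, one obtains

$\Xi_{X(p),Y(p)}(\partial_{Y(p)} X) = \Xi_{X(p),Y(p)}(\partial_Z X)$
$\qquad = \Xi_{X(p),Y(p)}\big(t^{(2)}_{p,\,\xi_\psi(\psi(p)),\,\hat\psi(Z),\,w_{Y,X,\psi}(\psi(p)),\,\psi}\big)$
$\qquad = t^{(2)}_{p,\,\hat\psi(Z),\,\xi_\psi(\psi(p)),\,w_{Y,X,\psi}(\psi(p)),\,\psi},$ (61.5.5)

where $\xi_\psi = \hat\psi \circ X \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathbb{R}^n$ is the component function of $X$ for the chart $\psi$, and $w_{Y,X,\psi}(x) = \sum_{j=1}^n \hat\psi^j(Y(\psi^{-1}(x)))\, \partial_{x^j} \xi_\psi(x)$ for all $x \in \mathrm{Range}(\psi)$, and line (61.5.5) follows from Definition 59.6.3.


Since $\eta_\psi(\psi(p)) = \hat\psi(Z)$ and $\xi_\psi(\psi(p)) = \hat\psi(V)$, the vectors $\partial_{X(p)} Y$ and $\Xi_{X(p),Y(p)}(\partial_{Y(p)} X)$ are both elements of the double tangent space $T_Z(T(M))$. Therefore they may be subtracted to obtain

$\partial_{X(p)} Y - \Xi_{X(p),Y(p)}(\partial_{Y(p)} X) = t^{(2)}_{p,\,\hat\psi(Z),\,\hat\psi(V),\,w_{X,Y,\psi}(\psi(p)),\,\psi} - t^{(2)}_{p,\,\hat\psi(Z),\,\hat\psi(V),\,w_{Y,X,\psi}(\psi(p)),\,\psi}$
$\qquad = t^{(2)}_{p,\,\hat\psi(Z),\,0,\,w_{X,Y,\psi}(\psi(p)) - w_{Y,X,\psi}(\psi(p)),\,\psi}.$

This is a vertical vector in $T_{Z,0}(T(M)) = T_{Y(p),0}(T(M))$. So it may be dropped via $\varpi_{Y(p)}$ to obtain

$\varpi_{Y(p)}(\partial_{X(p)} Y - \Xi_{X(p),Y(p)}(\partial_{Y(p)} X)) = \varpi_{Y(p)}\big(t^{(2)}_{p,\,\hat\psi(Z),\,0,\,w_{X,Y,\psi}(\psi(p)) - w_{Y,X,\psi}(\psi(p)),\,\psi}\big)$
$\qquad = t_{p,\,w_{X,Y,\psi}(\psi(p)) - w_{Y,X,\psi}(\psi(p)),\,\psi}$

by Definition 59.2.9. Thus $[X,Y](p) = t_{p,\,w_{X,Y,\psi}(\psi(p)) - w_{Y,X,\psi}(\psi(p)),\,\psi}$ by Definition 61.5.7. Hence

$\forall i \in \mathbb{N}_n,$
$\hat\psi^i([X,Y](p)) = \hat\psi^i\big(t_{p,\,w_{X,Y,\psi}(\psi(p)) - w_{Y,X,\psi}(\psi(p)),\,\psi}\big)$
$\qquad = (w_{X,Y,\psi}(\psi(p)) - w_{Y,X,\psi}(\psi(p)))^i$
$\qquad = \sum_{j=1}^n \Big( \hat\psi^j(X(p))\, \partial_{x^j} \eta_\psi^i(x)\big|_{x=\psi(p)} - \hat\psi^j(Y(p))\, \partial_{x^j} \xi_\psi^i(x)\big|_{x=\psi(p)} \Big)$
$\qquad = \sum_{j=1}^n \Big( \hat\psi^j(X(p))\, \partial_{x^j} \hat\psi^i(Y(\psi^{-1}(x)))\big|_{x=\psi(p)} - \hat\psi^j(Y(p))\, \partial_{x^j} \hat\psi^i(X(\psi^{-1}(x)))\big|_{x=\psi(p)} \Big),$

which verifies line (61.5.2). Lines (61.5.3) and (61.5.4) follow immediately from this.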


61.5.12 REMARK: Using coordinates to prove differentiability of the Lie bracket. 
The coordinates version of the Lie bracket in Theorem 61.5.11 makes possible the proof of differentiability 
of the Lie bracket in Theorem 61.5.13. 


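61.5.13 THEOREM: The Lie bracket of $C^{k+1}$ vector fields is a $C^k$ vector field.
Let $M$ be a $C^{k+2}$ manifold for some $k \in \mathbb{Z}_0^+$. Then


$\forall X,Y \in X^{k+1}(T(M)),\quad [X,Y] \in X^k(T(M)).$


PROOF: Let $k \in \mathbb{Z}_0^+$. Let $M$ be a $C^{k+2}$ manifold. Let $X,Y \in X^{k+1}(T(M))$. Let $\psi \in \mathrm{atlas}(M)$ and $n = \dim(M)$. Let $\xi = \hat\psi \circ X \circ \psi^{-1}$ and $\eta = \hat\psi \circ Y \circ \psi^{-1}$. Then $\xi,\eta \in C^{k+1}(\mathrm{Range}(\psi), \mathbb{R}^n)$ by Theorem 57.2.3. Therefore for each $j \in \mathbb{N}_n$, the map $p \mapsto \partial_{x^j} \hat\psi(Y(\psi^{-1}(x)))|_{x=\psi(p)}$ is in $C^k(\mathrm{Dom}(\psi), \mathbb{R}^n)$. So the map $p \mapsto \hat\psi^j(X(p))\, \partial_{x^j} \hat\psi(Y(\psi^{-1}(x)))|_{x=\psi(p)}$ is in $C^k(\mathrm{Dom}(\psi), \mathbb{R}^n)$, and $p \mapsto \hat\psi^j(Y(p))\, \partial_{x^j} \hat\psi(X(\psi^{-1}(x)))|_{x=\psi(p)}$ is similarly in $C^k(\mathrm{Dom}(\psi), \mathbb{R}^n)$. Therefore the map $p \mapsto \hat\psi([X,Y](p))$ is in $C^k(\mathrm{Dom}(\psi), \mathbb{R}^n)$ by Theorem 61.5.11. Hence $[X,Y] \in X^k(T(M))$ by Theorem 57.2.3.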


61.5.14 EXAMPLE: Computation of Lie bracket of two vector fields using the manifold formalism. 
Let $M = \mathbb{R}^2$. Let $A_M = \{\psi\}$ with $\psi = \mathrm{id}_M$. Then $(M, A_M)$ is a $C^\infty$ manifold.


Define vector fields $Y, Z \in X^\infty(T(M))$ by $Y(p) = e_1^{p,\psi}$ and $Z(p) = p_1 e_2^{p,\psi} - p_2 e_1^{p,\psi}$ for $p = (p_1,p_2) \in M$. (See Notation 54.4.10 for $e_i^{p,\psi} = t_{p,e_i,\psi} \in T_p(M)$.) These are the differential vector fields of diffeomorphism families $\alpha \mapsto (p \mapsto (p_1 + \alpha, p_2))$ and $\theta \mapsto (p \mapsto (p_1 \cos\theta - p_2 \sin\theta,\ p_1 \sin\theta + p_2 \cos\theta))$ respectively. (See Section 57.10 for integral curves of vector fields.) Since the integral curves do not commute, one would expect the Lie bracket of the vector fields to be non-zero.


The naive derivative $\partial_{Y(p)} Z$ of $Z$ by $Y(p)$ is $\partial_{Y(p)} Z = \partial_{t_{p,e_1,\psi}}(q \mapsto t_{q,(-q_2,q_1),\psi})$. This yields the second-level vector $t^{(2)}_{p,(-p_2,p_1),e_1,e_2,\psi}$ because $\partial_{q_1}(q_1 e_2 - q_2 e_1)|_{q=p} = e_2 = (0,1)$.


Similarly the naive derivative $\partial_{Z(p)} Y$ of $Y$ by $Z(p)$ is $\partial_{t_{p,(-p_2,p_1),\psi}} Y = \partial_{t_{p,(-p_2,p_1),\psi}}(q \mapsto t_{q,e_1,\psi})$. This yields the second-level vector $t^{(2)}_{p,e_1,(-p_2,p_1),0,\psi}$ because $(-p_2 \partial_{q_1} + p_1 \partial_{q_2})(e_1)|_{q=p} = 0$. It is important to note that this is not a zero vector, although the derivative of a constant vector field is clearly zero. Only the vertical component of the velocity tuple $((-p_2,p_1),0)$ is zero. The horizontal component is the tuple $(-p_2,p_1)$, which clearly is not zero except at the origin of the manifold.


It follows that

$\partial_{Y(p)} Z - \Xi(\partial_{Z(p)} Y) = t^{(2)}_{p,(-p_2,p_1),e_1,e_2,\psi} - \Xi\big(t^{(2)}_{p,e_1,(-p_2,p_1),0,\psi}\big)$
$\qquad = t^{(2)}_{p,(-p_2,p_1),e_1,e_2,\psi} - t^{(2)}_{p,(-p_2,p_1),e_1,0,\psi}$
$\qquad = t^{(2)}_{p,(-p_2,p_1),0,e_2,\psi}$

because $(e_1,e_2) - (e_1,0) = (0,e_2)$. Then this vertical vector in $T_{Z(p),0}(T(M))$ must be dropped to $T_p(M)$ using the drop function. By Definitions 59.2.9 and 59.2.15, $\varpi\big(t^{(2)}_{p,(-p_2,p_1),0,e_2,\psi}\big) = t_{p,e_2,\psi} = e_2^{p,\psi}$.


Hence $[Y,Z](p) = e_2^{p,\psi}$ for all $p \in \mathbb{R}^2$.

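As a "sanity check" for this computation, let $(p_1,0)$ be the starting point for following the integral curves. For $\delta_\alpha \in \mathbb{R}^+$, follow the curve family of vector field $Y$. This takes the point $(p_1,0)$ to $(p_1 + \delta_\alpha, 0)$. Then follow vector field $Z$ for a "time" $\delta_\theta$. This moves $(p_1 + \delta_\alpha, 0)$ to $(p_1 + \delta_\alpha)(\cos\delta_\theta, \sin\delta_\theta)$. As the "third leg" of the journey, follow the vector field $Y$ for a "negative time" $-\delta_\alpha$. This moves $(p_1 + \delta_\alpha)(\cos\delta_\theta, \sin\delta_\theta)$ to $((p_1 + \delta_\alpha)\cos\delta_\theta - \delta_\alpha,\ (p_1 + \delta_\alpha)\sin\delta_\theta)$. Then as the "fourth leg", follow the vector field $Z$ for a "negative time" $-\delta_\theta$. This moves $((p_1 + \delta_\alpha)\cos\delta_\theta - \delta_\alpha,\ (p_1 + \delta_\alpha)\sin\delta_\theta)$ to

$\big(\cos\delta_\theta((p_1 + \delta_\alpha)\cos\delta_\theta - \delta_\alpha) + \sin\delta_\theta\,(p_1 + \delta_\alpha)\sin\delta_\theta,\ -\sin\delta_\theta((p_1 + \delta_\alpha)\cos\delta_\theta - \delta_\alpha) + \cos\delta_\theta\,(p_1 + \delta_\alpha)\sin\delta_\theta\big)$
$\qquad = (p_1 + \delta_\alpha(1 - \cos\delta_\theta),\ \delta_\alpha \sin\delta_\theta)$
$\qquad = (p_1 + 2\delta_\alpha \sin^2(\delta_\theta/2),\ \delta_\alpha \sin\delta_\theta).$


To first order in the product $\delta_\alpha\delta_\theta$, this is $(p_1, \delta_\alpha\delta_\theta)$. This is consistent with the value $e_2^{p,\psi}$ for $[Y,Z](p)$. To be more precise, $\lim_{\delta_\alpha,\delta_\theta \to 0} (\delta_\alpha\delta_\theta)^{-1} \big((p_1 + 2\delta_\alpha \sin^2(\delta_\theta/2),\ \delta_\alpha\sin\delta_\theta) - (p_1,0)\big) = (0,1) = [Y,Z]((p_1,0))$.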
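
The four-leg journey can also be followed numerically. The sketch below is an illustration only, under the assumption that the two flows are given by their closed forms (translation along the first axis and rotation about the origin). The function names and the choice $p_1 = 2$ are hypothetical.

```python
import math


def flow_y(p, a):
    # Flow of Y = e_1 for "time" a: translation along the first axis.
    return (p[0] + a, p[1])


def flow_z(p, theta):
    # Flow of Z = (-p2, p1) for "time" theta: rotation about the origin.
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def loop_quotient(p1, d_alpha, d_theta):
    # Follow Y, then Z, then Y backwards, then Z backwards, starting at (p1, 0),
    # and divide the displacement by the parameter "area" d_alpha * d_theta.
    start = (p1, 0.0)
    q = flow_y(start, d_alpha)
    q = flow_z(q, d_theta)
    q = flow_y(q, -d_alpha)
    q = flow_z(q, -d_theta)
    area = d_alpha * d_theta
    return ((q[0] - start[0]) / area, (q[1] - start[1]) / area)


for d in (1e-1, 1e-2, 1e-3):
    print(d, loop_quotient(2.0, d, d))
```

As the step sizes shrink, the first component of the quotient should tend to 0 and the second to 1, in agreement with the limit computed above.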


61.5.15 REMARK: Lie algebras of vector fields on differentiable manifolds. 

The Lie bracket in Definition 61.5.7 is the operation which distinguishes the Lie algebra of vector fields on a
manifold in Definition 61.5.16 from the linear space of $C^\infty$ vector fields in Definition 57.2.8. This is shown
to be a genuine Lie algebra over $\mathbb{R}$ in Theorem 61.5.17, which uses Theorem 61.5.13 to prove the closure of
the Lie bracket operation on $X^\infty(T(M))$.


A slightly different way to prove Theorem 61.5.17 would be to apply Theorem 19.10.10, which states
that an associative algebra becomes a Lie algebra by replacing its vector multiplication operation with the
commutator of this operation.


61.5.16 DEFINITION: The Lie algebra of vector fields on a manifold.
The Lie algebra of vector fields on a $C^\infty$ manifold $M$ is $X^\infty(T(M)) \mathrel{-\!\!<} (\mathbb{R}, X^\infty(T(M)), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \tau, \mu)$, where


(i) $(\mathbb{R}, X^\infty(T(M)), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \mu)$ is the linear space of $C^\infty$ vector fields in Definition 57.2.8,
(ii) the product operation $\tau = [\cdot,\cdot] : X^\infty(T(M)) \times X^\infty(T(M)) \to X^\infty(T(M))$ is given in Definition 61.5.7.


61.5.17 THEOREM: The Lie algebra of vector fields on a manifold is a real Lie algebra.
Let $M$ be a $C^\infty$ manifold. Let $A = (\mathbb{R}, X^\infty(T(M)), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \tau, \mu)$ be the Lie algebra of vector fields on $M$.
Then $X^\infty(T(M))$ is a Lie algebra over $\mathbb{R}$.


PROOF: It must be shown that Definition 61.5.16 satisfies Definition 19.10.3 for a Lie algebra. The tuple
$(\mathbb{R}, X^\infty(T(M)), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \mu)$ is a real linear space by Theorem 57.2.9 (ii). (Therefore it is a unitary left
module over $\mathbb{R}$, as mentioned in Remark 22.1.2.)

The algebraic closure of the vector product operation $\tau : X^\infty(T(M)) \times X^\infty(T(M)) \to X^\infty(T(M))$ follows
from Theorem 61.5.13. The bilinearity of the map $(X,Y) \mapsto [X,Y]$ in Definition 61.5.7 line (61.5.1) follows
from the bilinearity of the maps $(X,Y) \mapsto \partial_X Y$ and $(X,Y) \mapsto \partial_Y X$, and the linearity of the drop function
and swap function. Thus distributivity of $\tau$ over $\sigma$ is satisfied. The scalarity of $\tau$ follows from its bilinearity.
By Theorem 61.5.9, $\tau$ is anticommutative. Hence the Lie algebra of vector fields $X^\infty(T(M))$ is a real Lie
algebra by Definitions 19.10.3 and 19.10.14.


61.5.18 REMARK:  Holonomic families of vector fields on differentiable manifolds. 

The holonomic families of vector fields in Definition 61.5.19 are the differentiable manifold versions of the 
Cartesian space holonomic naive vector field families in Definition 46.5.3. They arise very naturally, for 
example as the coordinate basis vector fields for charts in Definitions 57.1.18 and 57.4.2. 


61.5.19 DEFINITION: A holonomic family of vector fields on a set $U \in \mathrm{Top}(M)$, for a $C^2$ manifold $M$, is
a family $(X_i)_{i \in I}$ with $X_i \in X^1(T(M)|_U)$ for all $i \in I$ which satisfies


$\forall i,j \in I,\ \forall p \in U,\quad [X_i, X_j](p) = 0.$
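
The vanishing-bracket condition can be illustrated numerically. The sketch below is a rough illustration, assuming $U = \mathbb{R}^2 \setminus \{0\}$ with the Cartesian identity chart. It compares the coordinate basis fields of polar coordinates, written in Cartesian components, with the pair obtained by normalising the angular field to unit length. The function names are hypothetical and the derivatives are central differences.

```python
import math


def lie_bracket(xi, eta, x, h=1e-5):
    """Approximate sum_j (xi^j d_j eta - eta^j d_j xi) at the point x."""
    n = len(x)

    def directional(f, v):
        # Central difference for sum_j v^j d_j f at x.
        forward = f([x[k] + h * v[k] for k in range(n)])
        backward = f([x[k] - h * v[k] for k in range(n)])
        return [(forward[i] - backward[i]) / (2 * h) for i in range(n)]

    first = directional(eta, xi(x))
    second = directional(xi, eta(x))
    return [first[i] - second[i] for i in range(n)]


def radial(x):
    # Polar coordinate basis field for r, in Cartesian components.
    r = math.hypot(x[0], x[1])
    return [x[0] / r, x[1] / r]


def angular(x):
    # Polar coordinate basis field for the angle, in Cartesian components.
    return [-x[1], x[0]]


def unit_angular(x):
    # The angular field rescaled to unit length (not a coordinate basis field).
    r = math.hypot(x[0], x[1])
    return [-x[1] / r, x[0] / r]


point = [3.0, 4.0]
holonomic = lie_bracket(radial, angular, point)
nonholonomic = lie_bracket(radial, unit_angular, point)
print(holonomic, nonholonomic)
```

The first pair should give a bracket close to zero at the sample point, whereas the rescaled pair should not, so only the first pair behaves like a holonomic family there.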


61.5.20 REMARK: Interpretation of the Lie bracket in terms of nonholonomy.
The purpose of Theorem 61.5.21 is to give an interpretation of the Lie bracket of vector fields $X_1$ and $X_2$ at a
point $p_0$ as the limit of the ratio of the nonholonomy of $X_1$ and $X_2$ around a rectangle divided by the “area”
of the rectangle. The deviation from holonomy is measured by constructing four integral curves as the sides
of a “curved rectangle”. (See Definition 57.10.2 for integral curves.) The one-sided derivative $d_a^+ p(a)|_{a=0}$
is defined as $\lim_{a \to 0^+} (p(a) - p_0)/a$, where $a$ is the area of the square $[0,a^{1/2}] \times [0,a^{1/2}]$ in the parameter
space, and $p(a) - p_0$ is a measure of the nonholonomy of $X_1$ and $X_2$ around the “curved rectangle” spanned
by this parameter square. (See Theorem 46.5.8 for a Cartesian space version of Theorem 61.5.21.)


Theorems which are exactly or roughly equivalent to Theorem 61.5.21 are proved (with variable levels of 
completeness) by Spivak [37], Volume 1, pages 159-163; Frankel [12], pages 129-131; Bishop/Goldberg [3], 
pages 135-138; Crampin/Pirani [7], pages 79-82; Schutz [36], pages 45-47.


((2019-8-22. Theorem 61.5.21 is not yet correct. Please ignore it. )) 


61.5.21 THEOREM: The Lie bracket equals the limit of holonomy divided by area. 

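Let $M$ be a $C^2$ manifold. Let $X_1,X_2 \in X^1(T(M))$. Let $p_0 \in M$ and $\delta_0 \in \mathbb{R}^+$. For $\delta \in (0,\delta_0]$, let
$\gamma_\delta : [0,4\delta] \to M$ be a continuous curve with $\gamma_\delta(0) = p_0$, which satisfies $\partial_s \gamma_\delta(s) = X_1(\gamma_\delta(s))$ and $\partial_s \gamma_\delta(s+2\delta) = -X_1(\gamma_\delta(s+2\delta))$ for all $s \in (0,\delta)$, and $\partial_t \gamma_\delta(t) = X_2(\gamma_\delta(t))$ and $\partial_t \gamma_\delta(t+2\delta) = -X_2(\gamma_\delta(t+2\delta))$ for all $t \in (\delta,2\delta)$.
Define $p : \mathbb{R}_0^+ \to M$ by $p(0) = p_0$ and $p(a) = \gamma_{a^{1/2}}(4a^{1/2})$ for all $a \in \mathbb{R}^+$. Then $p$ is a continuous curve
which satisfies $d_a^+ p(a)|_{a=0} = [X_1,X_2](p_0)$.


PROOF:
(( 2019-8-11. To be continued ... ))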


61.6. The Lie bracket for map-related vector fields 


61.6.1 REMARK: The map between vector fields which is induced by a point map between manifolds. 

For any map $\phi \in C^1(M_1,M_2)$, for $C^1$ manifolds $M_1$ and $M_2$, the pointwise “push-forth” map $V \mapsto \phi_*(V) = (d\phi)_{\pi(V)}(V)$ in Definitions 58.4.5 and 58.9.16 is a well-defined map from $T(M_1)$ to $T(M_2)$. This suggests
the idea that an entire vector field can be “pushed forth” from the source manifold to the target manifold.


Let $X_1 \in X(M_1)$. Then each vector $X_1(p) \in T_p(M_1)$ for $p \in M_1$ is mapped to a well-defined vector
$(d\phi)_p(X_1(p)) \in T_{\phi(p)}(M_2)$. However, if $\phi$ is not surjective, this will not define a vector at every point of the
target space $M_2$, and even if the map is surjective, it may not define a unique vector at each point of $M_2$. If
a vector field $X_2 \in X(T(M_2))$ can be defined so that $(d\phi)_p(X_1(p)) = X_2(\phi(p))$ for all $p \in M_1$, then the induced
vector $(d\phi)_p(X_1(p))$ is at least guaranteed to have a unique value at each point of $M_2$, and it is extended as
a vector field to the whole target space.


Fields $X_1$ and $X_2$ which are consistent in this way with a map $\phi \in C^1(M_1,M_2)$ are said to be “$\phi$-related” as in
Definition 61.6.2. (See for example Lang [23], page 119; Bishop/Crittenden [2], page 14; Bishop/Goldberg [3],
page 138; Crampin/Pirani [7], page 75; Kobayashi/Nomizu [19], page 10; Sulanke/Wintgen [40], page 51.)
The main purpose of this definition is to enable the push-forth of the Lie bracket in Theorem 61.6.6 to
be extended from diffeomorphisms to more general differentiable maps. The kinds of maps which are of
particular interest include the differentiable embeddings, immersions and submersions in Section 52.5, which
can be very non-surjective or non-injective.


61.6.2 DEFINITION: A $\phi$-related pair of vector fields, for a function $\phi \in C^1(M_1,M_2)$ between $C^1$ manifolds
$M_1$ and $M_2$, is a pair $(X_1,X_2)$ such that $X_1 \in X(T(M_1))$ and $X_2 \in X(T(M_2))$, and


$\forall p \in M_1,\quad (d\phi)_p(X_1(p)) = X_2(\phi(p)).$ (61.6.1)
In other words, $\phi_* \circ X_1 = X_2 \circ \phi$.


61.6.3 REMARK: Map-related fields, expressed in terms of coordinates. 

Expressed in terms of coordinates in Theorem 61.6.4, the criterion for vector fields to be map-related in 
Definition 61.6.2 becomes somewhat less elegant and less edifying. However, it is the coordinates version of 
this criterion in line (61.6.2) which is required for its application in Theorem 61.6.6. 


Line (61.6.2) could be expressed more elegantly as $\xi_2 \circ \bar\phi = \bar\phi_* \circ \xi_1$ if the expression $\bar\phi_* = \hat\psi_2 \circ \phi_* \circ \hat\psi_1^{-1}$
was well defined. This would be well defined if each vector $\xi_1(x)$ was tagged with its base point $x$, but
unfortunately, the inverse $\hat\psi_1^{-1}$ is not well defined because $\hat\psi_1$ maps all vectors in $T(M_1)$ to $\mathbb{R}^{n_1}$, which loses
the information about the base point. This can be remedied by defining $\bar\phi_*$ pointwise as $\bar\phi_*(x) = \hat\psi_2 \circ (d\phi)_{\psi_1^{-1}(x)} \circ \hat\psi_1^{-1}$, with $\hat\psi_1$ restricted to the tangent space at $\psi_1^{-1}(x)$. Then line (61.6.2) would become $\xi_2 \circ \bar\phi(x) = \bar\phi_*(x) \circ \xi_1(x)$, which is not elegant enough
to merit the effort.


61.6.4 THEOREM: Coordinates for related vector fields. 
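For $\alpha = 1,2$, let $M_\alpha$ be a $C^1$ manifold with $n_\alpha = \dim(M_\alpha)$, and let $X_\alpha \in X(T(M_\alpha))$. Let $\phi \in C^1(M_1,M_2)$.
Then $(X_1,X_2)$ is a $\phi$-related pair of vector fields if and only if


$\forall \psi_1 \in \mathrm{atlas}(M_1),\ \forall \psi_2 \in \mathrm{atlas}(M_2),\ \forall x \in \mathrm{Range}(\psi_1),\ \forall i \in \mathbb{N}_{n_2},$
$\xi_2^i(\bar\phi(x)) = \sum_{j=1}^{n_1} \bar\phi_*(x)^i{}_j\, \xi_1^j(x),$ (61.6.2)


where $\xi_\alpha = \hat\psi_\alpha \circ X_\alpha \circ \psi_\alpha^{-1}$ with $\hat\psi_\alpha = \Phi(\psi_\alpha)$ for $\alpha = 1,2$, $\bar\phi = \psi_2 \circ \phi \circ \psi_1^{-1}$, and $\bar\phi_* : \mathrm{Range}(\psi_1) \to M_{n_2,n_1}(\mathbb{R})$ is defined by $\bar\phi_*(x)^i{}_j = \partial_{x^j}\big(\psi_2^i(\phi(\psi_1^{-1}(x)))\big)$ for all $i \in \mathbb{N}_{n_2}$ and $j \in \mathbb{N}_{n_1}$.


PROOF: Let $p \in M_1$, $\psi_1 \in \mathrm{atlas}_p(M_1)$ and $\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$. Let $x = \psi_1(p)$. Applying $\hat\psi_2$ to the right side
of line (61.6.1) gives $\hat\psi_2(X_2(\phi(p))) = (\hat\psi_2 \circ X_2 \circ \psi_2^{-1})(\psi_2 \circ \phi \circ \psi_1^{-1}(x)) = \xi_2(\bar\phi(x))$. Applying $\hat\psi_2^i$ to the
left side gives $\hat\psi_2^i((d\phi)_p(X_1(p))) = \hat\psi_2^i(\phi_*(t_{p,\xi_1(x),\psi_1})) = \sum_{j=1}^{n_1} \bar\phi_*(x)^i{}_j\, \xi_1^j(x)$ for all $i \in \mathbb{N}_{n_2}$. Thus line (61.6.2)
asserts the equality of the velocity components of the left and right terms on line (61.6.1). Hence, since the
base points are the same also, the two assertions are equivalent.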


61.6.5 REMARK: Application of an induced map to the Lie bracket of two vector fields. 

If $\phi$ is a diffeomorphism, Theorem 61.6.6 implies $\phi_*([X_1,Y_1](p)) = [\phi_* \circ X_1 \circ \phi^{-1},\ \phi_* \circ Y_1 \circ \phi^{-1}](\phi(p))$ for
$p \in M_1$, where $X_2 = \phi_* \circ X_1 \circ \phi^{-1} \in X(M_2)$ and $Y_2 = \phi_* \circ Y_1 \circ \phi^{-1} \in X(M_2)$. This may be rewritten
as $\phi_* \circ [X_1,Y_1] \circ \phi^{-1} = [\phi_* \circ X_1 \circ \phi^{-1},\ \phi_* \circ Y_1 \circ \phi^{-1}]$ (if $\phi$ is a diffeomorphism), which could be written
informally as “$\phi_*[X_1,Y_1] = [\phi_* X_1, \phi_* Y_1]$”, where $\phi_* X$ means $\phi_* \circ X \circ \phi^{-1}$ for $X \in X(M_1)$.

Theorem 61.6.6 looks simpler than it is. There are two different Lie brackets in line (61.6.3), one for
$X^1(T(M_1))$, the other for $X^1(T(M_2))$. Since the Lie bracket for $M_2$ requires $C^1$ vector fields on $M_2$ as
inputs, the constructions $\phi_* \circ X_1 \circ \phi^{-1}$ and $\phi_* \circ Y_1 \circ \phi^{-1}$ may not be acceptable as inputs when $\phi$ is
not a diffeomorphism. (Consider, for example, that $\dim(M_1)$ and $\dim(M_2)$ may not be the same.) In the
general case, the pseudo-fields $\phi_* \circ X_1 \circ \phi^{-1}$ and $\phi_* \circ Y_1 \circ \phi^{-1}$ must be replaced with authentic vector
fields $X_2, Y_2 \in X^1(T(M_2))$ which are well-defined extensions of the pseudo-fields. The Lie bracket is then
computed for these authentic vector fields which are $\phi$-related to $X_1, Y_1 \in X^1(T(M_1))$. The result of this
computation must then be related back to the source-space fields. (Theorem 61.6.6 is applicable to the proof
of the closure under the Lie bracket of the Lie algebra of a Lie group in Theorem 62.4.11 (ii).)


61.6.6 THEOREM: Relation between source and target space Lie brackets for map-related field-pairs. 
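Let $M_1$ and $M_2$ be $C^2$ manifolds. Let $\phi \in C^2(M_1,M_2)$. Let $(X_1,X_2)$ and $(Y_1,Y_2)$ be $\phi$-related pairs of
vector fields $X_1,Y_1 \in X^1(T(M_1))$ and $X_2,Y_2 \in X^1(T(M_2))$. Then


$\forall p \in M_1,\quad (d\phi)_p([X_1,Y_1](p)) = [X_2,Y_2](\phi(p)).$ (61.6.3)
In other words, $\phi_* \circ [X_1,Y_1] = [X_2,Y_2] \circ \phi$.

PROOF: Let $p \in M_1$. Let $\psi_1 \in \mathrm{atlas}_p(M_1)$ and $\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$. For $\alpha = 1,2$, let $n_\alpha = \dim(M_\alpha)$, let
$\hat\psi_\alpha = \Phi(\psi_\alpha)$ and define $\xi_\alpha, \eta_\alpha : \mathrm{Range}(\psi_\alpha) \to \mathbb{R}^{n_\alpha}$ by $\xi_\alpha = \hat\psi_\alpha \circ X_\alpha \circ \psi_\alpha^{-1}$ and $\eta_\alpha = \hat\psi_\alpha \circ Y_\alpha \circ \psi_\alpha^{-1}$. Then

$\forall \alpha \in \mathbb{N}_2,\ \forall q \in \mathrm{Dom}(\psi_\alpha),\quad X_\alpha(q) = t_{q,\,\xi_\alpha(\psi_\alpha(q)),\,\psi_\alpha}$
$\forall \alpha \in \mathbb{N}_2,\ \forall q \in \mathrm{Dom}(\psi_\alpha),\quad Y_\alpha(q) = t_{q,\,\eta_\alpha(\psi_\alpha(q)),\,\psi_\alpha}$

and by Theorem 61.5.11 line (61.5.2),

$\forall y \in \psi_2(\phi(\mathrm{Dom}(\psi_1))),\ \forall i \in \mathbb{N}_{n_2},$
$\hat\psi_2^i([X_2,Y_2](\psi_2^{-1}(y))) = \sum_{j=1}^{n_2} \big( \xi_2^j(y)\, \partial_j \eta_2^i(y) - \eta_2^j(y)\, \partial_j \xi_2^i(y) \big).$

Then by Theorem 61.6.4, substituting $y = \bar\phi(x)$ with $\bar\phi = \psi_2 \circ \phi \circ \psi_1^{-1}$ gives

$\forall x \in \psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))),\ \forall i \in \mathbb{N}_{n_2},$
$\hat\psi_2^i([X_2,Y_2](\psi_2^{-1}(\bar\phi(x)))) = \sum_{j=1}^{n_2} \big( \xi_2^j(\bar\phi(x))\, \partial_j \eta_2^i(\bar\phi(x)) - \eta_2^j(\bar\phi(x))\, \partial_j \xi_2^i(\bar\phi(x)) \big)$
$\qquad = \sum_{j=1}^{n_2} \sum_{k=1}^{n_1} \big( \bar\phi_*(x)^j{}_k\, \xi_1^k(x)\, \partial_j \eta_2^i(\bar\phi(x)) - \bar\phi_*(x)^j{}_k\, \eta_1^k(x)\, \partial_j \xi_2^i(\bar\phi(x)) \big).$

The expression $\partial_j \eta_2^i(\bar\phi(x))$ satisfies, using the chain rule $\partial_{x^k} = \sum_{j=1}^{n_2} \bar\phi_*(x)^j{}_k\, \partial_{y^j}$ and the product rule,

$\forall i \in \mathbb{N}_{n_2},\ \forall k \in \mathbb{N}_{n_1},$
$\sum_{j=1}^{n_2} \bar\phi_*(x)^j{}_k\, \partial_j \eta_2^i(\bar\phi(x)) = \sum_{j=1}^{n_2} \bar\phi_*(x)^j{}_k\, \partial_{y^j} \eta_2^i(y)\big|_{y=\bar\phi(x)}$
$\qquad = \partial_{x^k}\big(\eta_2^i(\bar\phi(x))\big)$
$\qquad = \partial_{x^k}\Big(\sum_{\ell=1}^{n_1} \bar\phi_*(x)^i{}_\ell\, \eta_1^\ell(x)\Big),$

where $\bar\phi_*(x)^i{}_\ell = \partial_{x^\ell}\big(\psi_2^i(\phi(\psi_1^{-1}(x)))\big)$ by Definition 58.4.5. So $\partial_{x^k} \bar\phi_*(x)^i{}_\ell = \partial_{x^\ell} \bar\phi_*(x)^i{}_k$. Therefore

$\forall x \in \psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))),\ \forall i \in \mathbb{N}_{n_2},$
$\hat\psi_2^i([X_2,Y_2](\psi_2^{-1}(\bar\phi(x)))) = \sum_{k=1}^{n_1} \xi_1^k(x) \sum_{\ell=1}^{n_1} \big( \eta_1^\ell(x)\, \partial_{x^k} \bar\phi_*(x)^i{}_\ell + \bar\phi_*(x)^i{}_\ell\, \partial_{x^k} \eta_1^\ell(x) \big)$
$\qquad\quad - \sum_{k=1}^{n_1} \eta_1^k(x) \sum_{\ell=1}^{n_1} \big( \xi_1^\ell(x)\, \partial_{x^k} \bar\phi_*(x)^i{}_\ell + \bar\phi_*(x)^i{}_\ell\, \partial_{x^k} \xi_1^\ell(x) \big)$
$\qquad = \sum_{\ell=1}^{n_1} \bar\phi_*(x)^i{}_\ell \sum_{k=1}^{n_1} \big( \xi_1^k(x)\, \partial_{x^k} \eta_1^\ell(x) - \eta_1^k(x)\, \partial_{x^k} \xi_1^\ell(x) \big).$

But $\psi_2^{-1} \circ \bar\phi = \phi \circ \psi_1^{-1}$. So

$\forall x \in \psi_1(\phi^{-1}(\mathrm{Dom}(\psi_2))),\ \forall i \in \mathbb{N}_{n_2},$
$\hat\psi_2^i([X_2,Y_2](\phi(\psi_1^{-1}(x)))) = \hat\psi_2^i(\phi_*([X_1,Y_1](\psi_1^{-1}(x)))).$

Hence $\phi_* \circ [X_1,Y_1] = [X_2,Y_2] \circ \phi$.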
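
The relation between the two brackets can be checked on a simple non-injective map. The following sketch is a hypothetical special case, assuming Cartesian spaces with identity charts, a linear map $x \mapsto Px$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ and linear vector fields $x \mapsto Ax$. In this setting the coordinate criterion for map-related fields reduces to $P A_1 = A_2 P$, and the coordinate formula for the Lie bracket of the fields with matrices $A$ and $B$ gives the field with matrix $BA - AB$. All matrices below are invented sample data.

```python
def matmul(a, b):
    # Product of two matrices stored as lists of rows (shapes may differ).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def bracket(a, b):
    # Matrix of the Lie bracket of the linear fields x -> a x and x -> b x.
    ba, ab = matmul(b, a), matmul(a, b)
    return [[ba[i][j] - ab[i][j] for j in range(len(a))] for i in range(len(a))]


# Projection of R^3 onto its first two coordinates (a submersion).
P = [[1, 0, 0], [0, 1, 0]]

# Source fields on R^3 and target fields on R^2.
A1 = [[1, 2, 0], [3, 4, 0], [5, 6, 7]]
B1 = [[0, 1, 0], [-1, 2, 0], [2, 0, 3]]
A2 = [[1, 2], [3, 4]]
B2 = [[0, 1], [-1, 2]]

related_x = matmul(P, A1) == matmul(A2, P)
related_y = matmul(P, B1) == matmul(B2, P)
related_brackets = matmul(P, bracket(A1, B1)) == matmul(bracket(A2, B2), P)
print(related_x, related_y, related_brackets)
```

Both input pairs are map-related by construction, and the final comparison tests that the brackets are then map-related as well.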


61.7. Lie connections 


61.7.1 REMARK: The Lie derivative is the differential of a vector field relative to the flow. 
The Lie derivative in Section 61.8 is the derivative of a “passive vector field” with respect to an “active 
vector field”. The roles are interchangeable, but it is convenient to label vector fields with these role names. 


The first task for defining Lie derivatives of passive vector fields is to define the transport of a single vector 
(which is embedded in a passive vector field) by the flow generated by the active vector field. (This is 
illustrated in Figure 61.7.1.) This transport is subtracted from the naive derivative of the passive vector field 
(which contains the single vector) in the direction of the flow to define the Lie derivative of the passive vector 
field by the active vector field. The Lie derivative thus equals the naive derivative computed relative to the 
“with the flow” transport. Thus the active field flow provides a reference to compare the naive derivative of 
the passive field to. 


In Figure 61.7.1, for a given vector field $X \in X^1(T(M))$ and vector $V \in T_p(M)$, a $C^1$ curve $\beta : I \to M$
is constructed for an interval $I \in \mathrm{Top}_0(\mathbb{R})$, with $\beta(0) = p$ and $\beta'(0) = V$. Any such curve $\beta$ may be used
because the computations using it give the same result regardless of the choice of curve. Then the integral
curves $\gamma_{\beta(s)}$ of $X$, as in Definition 57.10.2, are constructed with points $\beta(s)$ for $s \in I$ as initial points. Thus
$\gamma_{\beta(s)}(0) = \beta(s)$ for $s \in I$, and $\gamma_{\beta(s)}'(t) = X(\gamma_{\beta(s)}(t))$ for $s \in I$ and $t \in J$, for some interval $J \in \mathrm{Top}_0(\mathbb{R})$.


Figure 61.7.1 The effect of flow by a vector field $X$ on a curve $\beta$ with velocity $V$


The existence of a curve $\beta$ with $\beta(0) = p$ and $\beta'(0) = V$ is never problematic because tangent vectors
are defined in Definition 54.1.2 in terms of constant velocity curves in Cartesian coordinate spaces as in
Notation 26.13.4, which are imported to the manifold via charts. So there is no need to invoke ODE theory
to guarantee the existence of $\beta$. One may simply choose $t \mapsto \psi^{-1}(\psi(p) + t\,\Phi(\psi)(V))$ for any $\psi \in \mathrm{atlas}_p(M)$.
By contrast, the existence and uniqueness of the $\gamma$ curves is problematic. ODE theory is required for this.


Given a particular $s_0 \in I$, the point $q = \beta(s_0)$ flows to a point $\tilde q = \gamma_q(t)$ after a “time” $t \in \mathbb{R}$. If this is
constructed for each $s \in I$, one obtains a transformed curve $\tilde\beta$ given by $\tilde\beta(s) = \gamma_{\beta(s)}(t)$ for all $s \in I$. This
curve passes through the points $\tilde p = \gamma_p(t)$ and $\tilde q = \gamma_q(t)$. The velocity $\tilde V = \tilde\beta'(0)$ of the transformed curve at
$\tilde p$ is the result of the flow along $X$ of the vector $V$ for “time” $t$. This flow of $V$ along the curve $\gamma_p$ is called
the “Lie transport” of $V$ by $X$ in Definition 61.7.11. The derivative of $\tilde V$ with respect to $t$ at $p$ is called the
“Lie connection” for $X$ in Definition 61.7.5. Then the Lie connection is used in Definition 61.8.4 to compute
the Lie derivative of a passive vector field $Y \in X^1(T(M))$ such that $Y(p) = V$ with respect to the active
vector field $X$.


The effect of the flow of an active vector field on a passive curve is similar to the way in which a row of 
untethered buoys move under the influence of an ocean current, except that the vector field here is static 
whereas the velocity field of an ocean current is generally time-dependent. Perhaps a more accurate analogy 
is a curve painted on the surface of a glacier. The curve moves under the flow of the glacier, transporting 
vectors in the curve according to the flow. 


Since the Lie connection $\theta_X V$ equals the naive derivative of the vector $V$ with respect to the parameter $t$,
it is necessarily an element of the double tangent space $T_V(T(M))$ with base point $V$ because $V$ is being
varied. This distinguishes it from the naive derivative $\partial_V X \in T_{X(p)}(T(M))$ of $X$ in the direction $V$, which
has base point $X(p)$ for $p \in M$ because $X$ is being varied. (See also Remark 61.7.7 on this topic.)


Something similar to the flow concept described here is also given by Crampin/Pirani [7], pages 64-69, with 
the name “Lie transport”. Many authors prefer to define a family of diffeomorphisms, which is the aggregate 
of all integral curves for all starting points in a neighbourhood of a given point. This serves the same purpose, 
but obscures the concept by unnecessarily extending the required curve family to a diffeomorphism family. 


61.7.2 REMARK: Formula for the effect of vector field flow on a vector. 
Theorem 61.7.3 gives a formula for the flow concept which is described in Remark 61.7.1. The expression
$\partial_t \partial_s \gamma_{\beta(s)}(t)|_{(s,t)=(0,0)}$ is the derivative of the function $t \mapsto V(t) = \partial_s \gamma_{\beta(s)}(t)|_{s=0}$ with respect to $t$, where
$V$ is the vector field along the curve $\gamma_p$, which is interpreted as the effect of the flow of the active vector
field $X$ on the passive vector $V$. Thus $\partial_t \partial_s \gamma_{\beta(s)}(t)|_{(s,t)=(0,0)}$ is equal to $\partial_t V(t)|_{t=0}$. The vector expression
$V(t) = \partial_s \gamma_{\beta(s)}(t)|_{s=0}$ is meaningful because the map $s \mapsto \gamma_{\beta(s)}(t)$ is a $C^2$ curve for all $t \in J$.
Theorem 61.7.3 asserts that the second-level vector $\partial_t V(t)|_{t=0} \in T_V(T(M))$, which is the derivative of the
transport of $V$ by $X$, is equal to $\partial_V X$ with the swap function $\Xi$ in Definition 59.6.6 applied to it, where
$\partial_V X \in T_{X(p)}(T(M))$ is the naive derivative of $X$ in the direction $V$ in Definition 61.2.3. Consequently the
flow-effect of $X$ on $V$ is independent of the choice of curve $\beta$.


Theorem 61.7.3 gives a kind of “a-priori” formula for the differential of transport of vectors which flow
according to a given vector field. (The term “a-priori” is also used in PDE theory for estimates of PDE 
solutions which are not yet known to exist, with the purpose of showing that they do exist.) This formula 
can then be used to test vector fields along curves to determine whether they transport vectors parallel to 
a given vector field. The formula can also be used to define the Lie derivative of vector fields, which is the 
immediate purpose here. 


The integral curve approach has the advantage of motivating the definition, but in practical terms it is 
difficult to obtain the required differential expressions by first integrating and then differentiating. Just as 
in the case of affine connections, it is better to work with differentials than parallel transport along curves. 
It is probably best to first define Lie derivatives differentially as in Definition 61.8.4, and then demonstrate 
their properties along integral curves, assuming that such integral curves exist. 


61.7.3 THEOREM: A-priori formula for flow of vectors due to transport by a vector field. 

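Let $M$ be a $C^2$ manifold. Let $X \in X^1(T(M))$. Let $p \in M$ and $V \in T_p(M)$. Let $\beta : I \to M$ be a $C^1$ curve
with $I \in \mathrm{Top}_0^{\ldots}(\mathbb{R})$, $\beta(0) = p$ and $\beta'(0) = V$. For all $q \in \mathrm{Range}(\beta)$, let $\gamma_q : J \to M$ be a $C^1$ curve with
$J \in \mathrm{Top}_0^{\ldots}(\mathbb{R})$, $\gamma_q(0) = q$ and $\forall t \in J,\ \gamma_q'(t) = X(\gamma_q(t))$. Define $\phi : I \times J \to M$ by $\phi(s,t) = \gamma_{\beta(s)}(t)$ for all
$(s,t) \in I \times J$. Suppose that $\phi \in C^2(I \times J, M)$. Then


$\forall s \in I,\ \forall t \in J,\quad \partial_t \partial_s \gamma_{\beta(s)}(t)\big|_{(s,t)=(0,0)} = \Xi(\partial_V X).$ (61.7.1)
(See Definition 59.6.6 for the swap function $\Xi$. See Notation 34.4.11 for $\mathrm{Top}_0^{\ldots}(\mathbb{R})$.)


PROOF: It follows from Theorem 59.6.12 that $\partial_t \partial_s \gamma_{\beta(s)}(t)|_{(s,t)=(0,0)} = \Xi\big(\partial_s \partial_t \gamma_{\beta(s)}(t)|_{(s,t)=(0,0)}\big)$, and the
equation for $\gamma_q'$ gives $\partial_t \gamma_{\beta(s)}(t) = X(\gamma_{\beta(s)}(t))$ for all $(s,t) \in I \times J$. But $\gamma_{\beta(s)}(0) = \beta(s)$ and $\beta'(0) = V$. So
$\partial_s \gamma_{\beta(s)}(0)|_{s=0} = V$. Therefore $\partial_s X(\gamma_{\beta(s)}(0))|_{s=0} = \partial_V X$ by Theorem 61.2.7. Hence line (61.7.1).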


((2018-10-4. After Theorem 61.7.3, show that $\phi \in C^2(I \times J, M)$ follows from the conditions of the theorem. This
requires a theorem on the continuous/differentiable dependence of ODE solutions on the initial conditions. 
This should be presented in or after Section 44.6. See Murray/Miller [119], pages 63-72, 77-102. )) 


61.7.4 REMARK: A “Lie connection” has similarities to an affine connection. 
The terminology “Lie connection” in Definition 61.7.5 is non-standard, but it seems reasonable because its 
integral curves in Definition 61.7.11 are called the “Lie transport” of the vector V by the vector field X. 


The “Lie connection” is the differential of parallel transport along an integral curve of a vector field. So it is 
in fact a connection of a limited kind. However, instead of being a per-path connection, it is a per-vector-field 
connection. Hence it is labelled with the vector field rather than a curve velocity vector as is the case for an 
affine connection. An affine connection is defined along general curves, whereas a Lie connection is defined 
only along integral curves which are determined by vector fields. 


The “Lie connection” plays the same role in the construction of the Lie derivative in Definition 61.8.4 as the 
affine connection plays in Definition 71.6.4 for covariant derivatives. The defining equations are similar. 


61.7.5 DEFINITION: The Lie connection on a $C^2$ manifold $M$ is the map $\theta : X^1(T(M)) \times T(M) \to T(T(M))$
defined by


$\forall X \in X^1(T(M)),\ \forall V \in T(M),\quad \theta_X V = \Xi(\partial_V X).$ (61.7.2)


The Lie connection with respect to a vector field $X \in X^1(T(M))$ on $M$ is the map $\theta_X : T(M) \to T(T(M))$
defined by


$\forall V \in T(M),\quad \theta_X V = \Xi(\partial_V X).$


Alternative name: Lie horizontal lift. 


61.7.6 THEOREM: Coordinates for the Lie connection. 
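Let $M$ be a $C^2$ manifold. Then the Lie connection on $M$ satisfies


$\forall p \in M,\ \forall V \in T_p(M),\ \forall Y \in X^1(T(M)),\ \forall \psi \in \mathrm{atlas}_p(M),$
$\theta_Y V = t^{(2)}_{p,\,\Phi(\psi)(V),\,\eta_\psi(\psi(p)),\,w_{V,Y,\psi}(\psi(p)),\,\psi},$

where $\eta_\psi = \Phi(\psi) \circ Y \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathbb{R}^n$ is the component function of $Y$ for charts $\psi \in \mathrm{atlas}(M)$, and
$w_{V,Y,\psi} : \mathrm{Range}(\psi) \to \mathbb{R}^n$, with $n = \dim(M)$, is defined for $\psi \in \mathrm{atlas}(M)$ by


$\forall x \in \mathrm{Range}(\psi),\quad w_{V,Y,\psi}(x) = \sum_{j=1}^n \Phi(\psi)^j(V)\, \partial_{x^j} \eta_\psi(x).$


PROOF: The formula follows from Theorem 61.2.11 and Definitions 59.6.3 and 61.7.5.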


61.7.7 REMARK: The relation between the Lie connection and a naive derivative. 

The parallel translation vector $\theta_X V$ is an element of $T_V(T(M))$ rather than the tangent space $T_p(M)$
where $p = \pi(V)$. The set $T_V(T(M)) \subseteq T^{(2)}(M) = T(T(M))$ contains “tangents to tangents” as defined in
Section 59.1. Like the affine connections in Chapter 71, these vectors may be thought of as “tensorisation
terms” which are applied to the actual rate of change of vector fields. The difference between them is then
a “vertical vector” which can be converted to a vector in $T(M)$ by using a “drop function”.


Figure 61.7.2 illustrates the flow velocities $\theta_X(V_1)$ and $\theta_X(V_2)$ for vectors $V_1, V_2 \in T_p(M)$ for a vector
field $X \in X^1(T(M))$. This emphasises that the flow velocities are not elements of the tangent space $T(M)$.


Figure 61.7.2 Base points for $\theta_X V$ and $\partial_V X$


Application of the swap function $\Xi$ to $\partial_V X$ in Definition 61.7.5 swaps the passive vector $V$ and the flow
velocity $X(p)$ as parameters for the second-order tangent space base points. Thus $\theta_X V \in T_{V,X(p)}(T(M))$,
whereas $\partial_V X \in T_{X(p),V}(T(M))$.


61.7.8 REMARK: The Lie horizontal lift of a vector field. 

It is sometimes convenient to have a simple definition and notation for the “lift” of an entire vector field 
by a Lie connection. This is really only a notational convenience, as also in Definition 61.4.2, where $\partial_X Y$
is defined to mean $p \mapsto \partial_{X(p)} Y$, which then permits $\partial_{X(p)} Y$ to be written as $(\partial_X Y)(p)$. Definition 61.7.9
replaces the cumbersome map $p \mapsto \theta_X(Y(p))$ with $\theta_X Y$.


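61.7.9 DEFINITION: The Lie connection for vector fields on a $C^2$ manifold $M$ is the map
$\theta : X^1(T(M)) \times X(T(M)) \to (M \to T(T(M)))$ defined by


$\forall X \in X^1(T(M)),\ \forall Y \in X(T(M)),\ \forall p \in M,\quad (\theta_X Y)(p) = \theta_X(Y(p)) = \Xi(\partial_{Y(p)} X).$


Thus $\forall X \in X^1(T(M)),\ \forall Y \in X(T(M)),\ \theta_X Y = \theta_X \circ Y = \Xi \circ \partial_Y X$.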


Alternative name: Lie horizontal lift for vector fields. 


61.7.10 REMARK: Lie transport is a kind of horizontal lift of vectors along curves.
The Lie transport curve $W$ in Definition 61.7.11 is a “horizontal lift” of the curve $\pi \circ W : I \to M$ in the
same way that curves are lifted by affine connections. (Here $\pi : T(M) \to M$ is the projection map of $T(M)$.)
An important difference is that the base-point curve $\pi \circ W$ cannot be freely chosen. The base-point curve is
determined by the vector field $X$. Parallelism is defined by this “Lie connection” only along integral curves
of $X$. Definition 61.7.11 does not require existence or uniqueness of integral curves. But if existence and
uniqueness are guaranteed, one may refer to the Lie transport of a vector by a vector field, whereas the Lie
connection in Definition 61.7.5 is well-defined even if integral curves of $X$ are non-existent or non-unique.
(See Section 57.10 for integral curves.)


The difference between the Lie connection and an affine connection as in Definition 67.4.2 is shown most
clearly in the fact that the Lie connection is given only for the direction $X(p) \in T_p(M)$ at each $p \in M$,
whereas an affine connection is given for all tangent vector directions in $T_p(M)$. This is why parallel transport
in Definition 61.7.11 is defined only along integral curves of the vector field $X$, whereas parallel transport
for affine connections is defined along completely general rectifiable curves.


61.7.11 DEFINITION: The Lie transport of a vector $V \in T(M)$ on a $C^2$ manifold $M$ by a vector field
$X \in X^1(T(M))$ along an integral curve of $X$, namely a curve $\gamma \in C^1(I, M)$ with $\forall t \in I,\ \gamma'(t) = X(\gamma(t))$ for
some interval $I \in \mathrm{Top}_0(\mathbb{R})$, is a curve $W \in C^1(I, T(M))$ such that $\pi \circ W = \gamma$ and


(i) $W(0) = V$,
(ii) $\forall t \in I,\ W'(t) = \theta_X(W(t))$,


where $\theta_X$ is the Lie connection with respect to $X$, and $\pi : T(M) \to M$ is the projection map for $T(M)$.
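
The following is a minimal numerical sketch of Lie transport in a single chart. It assumes (this is an assumption, not a statement taken from the text) that in chart coordinates the equation for the transported vector reads $w'(t)^i = \sum_j w(t)^j \partial_j \xi^i(\gamma(t))$ along the integral curve $\gamma'(t) = \xi(\gamma(t))$, where $\xi$ is the component function of the vector field. The rotation field `xi`, the step count and the helper names are hypothetical choices made only for illustration.

```python
import math

def xi(x):
    """Hypothetical example field on R^2: rigid rotation about the origin."""
    return [-x[1], x[0]]

def jacobian_apply(f, x, w, h=1e-6):
    """Directional derivative sum_j w^j d_j f^i at x, by central differences."""
    xp = [a + h * b for a, b in zip(x, w)]
    xm = [a - h * b for a, b in zip(x, w)]
    return [(p - m) / (2.0 * h) for p, m in zip(f(xp), f(xm))]

def rhs(state):
    """Right-hand side for the pair (base point, transported vector)."""
    n = len(state) // 2
    x, w = state[:n], state[n:]
    # Base point moves along the integral curve; vector follows the assumed
    # coordinate form of the Lie transport equation.
    return xi(x) + jacobian_apply(xi, x, w)

def rk4(state, t_end, steps):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt * (a + 2 * b + 2 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

# Start at the point (1, 0) with the vector (1, 0) and transport for a quarter turn.
final = rk4([1.0, 0.0, 1.0, 0.0], math.pi / 2, 1000)
print(final)
```

For this particular field the flow is a rotation, so the base point and the transported vector are both rotated by a quarter turn. The base-point curve is not chosen freely here: it is produced by the same integration that transports the vector.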


61.8. Lie derivatives of vector fields 


61.8.1 REMARK: The Lie derivative for vector fields is equal to the Lie bracket. 

In Section 61.8, the Lie derivative is defined for vector fields acting on vector fields. This is numerically 
equal to the Lie bracket. In Section 61.9, the Lie derivative is extended to define actions of vector fields on 
general tensor fields using the association between tensor bundles and tangent vector bundles. 


61.8.2 REMARK: Some literature for the Lie derivative. 

The Lie derivative is presented by Crampin/Pirani [7], pages 67-72, 76-79, 126-129, 253-254; Frankel [12],
pages 125-138; Szekeres [305], pages 436-439; Spivak [37], Volume 1, pages 149-157; Bishop/Goldberg [3],
pages 128-132; Lang [23], pages 122-124, 140-143; Bishop/Crittenden [2], pages 16-18; Lovelock/Rund [27],
pages 123-126, 345-347; Petersen [31], pages 375-380; Auslander/MacKenzie [1], pages 141-144;
Gallot/Hulin/Lafontaine [13], pages 39-40; Nash/Sen [30], pages 171-174; Choquet-Bruhat [6], pages 143-148;
Gómez-Ruiz [14], page 97; Guggenheimer [16], pages 126-127; Malliavin [28], pages 102-112; Penrose [297],
pages 310-316; Schutz [36], pages 76-79; Bleecker [254], page 9; Eriksson/Häggblad/Strömbom [264], page 12;
Kobayashi/Nomizu [19], pages 28-30.


61.8.3 REMARK: Illustration of the Lie derivative as a difference between actual and parallel flow. 

The Lie derivative of a vector field $Y$ with respect to a vector field $X$ on a $C^2$ manifold $M$ is defined as the
difference between the actual rate of change of $Y$ and the rate of change it would have if it were transported in
a parallel fashion by the flow of $X$. This difference is $\partial_{X(p)} Y - \theta_X(Y(p))$ for $p \in M$. This is a vertical vector in
$T_{Y(p)}(T(M))$, which means that $(d\pi)_{Y(p)}(\partial_{X(p)} Y - \theta_X(Y(p))) = 0$, where $(d\pi)_{Y(p)} : T_{Y(p)}(T(M)) \to T_p(M)$
is the differential at $Y(p) \in T_p(M)$ of the projection map $\pi : T(M) \to M$. Therefore a drop function
$\varpi : T(T(M)) \to T(M)$ may be applied to the difference to give $L_X(Y)(p) = \varpi(\partial_{X(p)} Y - \theta_X(Y(p))) \in T_p(M)$
for all $p \in M$. This is illustrated in Figure 61.8.1.


61.8.4 DEFINITION: The Lie derivative of a $C^1$ vector field $Y$ with respect to a $C^1$ vector field $X$ on a $C^2$
manifold $M$ is the vector field $L_X Y \in X(T(M))$ defined by


$\forall p \in M,\quad (L_X Y)(p) = \varpi_{Y(p)}(\partial_{X(p)} Y - \theta_X(Y(p))),$

where $\theta_X$ is the Lie connection on $M$ with respect to $X$ and $\varpi$ is the drop function given by Definition 59.2.9.


[Diagram: the vector $L_X(Y)(p) = \varpi_{Y(p)}(\partial_{X(p)} Y - \theta_X(Y(p))) \in T_p(M)$ drawn over the base manifold $M$.]
Figure 61.8.1 Lie derivative of vector field Y with respect to X


61.8.5 THEOREM: Some basic properties of the Lie derivative.
Let $M$ be a $C^2$ manifold. Let $X, Y \in X^1(T(M))$.


(i) $L_X Y = \varpi \circ (\partial_X Y - \theta_X Y)$.
(ii) $L_X Y = \varpi \circ (\partial_X Y - \varepsilon \circ \partial_Y X)$.
(iii) $L_X Y = [X, Y]$.


PROOF: Part (i) follows from Definitions 61.8.4, 61.4.2 and 61.7.9.
Part (ii) follows from part (i) and Definition 61.7.9.
Part (iii) follows from part (ii) and Definition 61.5.7.
Part (iv) follows from part (iii) and Theorem 61.5.9.


61.8.6 THEOREM: Coordinates expression for the Lie derivative of vector fields. 
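The Lie derivative $L_X Y$ for $X, Y \in X^1(T(M))$ on a $C^2$ manifold $M$ satisfies


$\forall X, Y \in X^1(T(M)),\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,$

$\dot\psi((L_X Y)(p))^i = \sum_{j=1}^n \Big( \dot\psi(X(p))^j\, \partial_{x^j} \dot\psi(Y(\psi^{-1}(x)))^i \big|_{x=\psi(p)} - \dot\psi(Y(p))^j\, \partial_{x^j} \dot\psi(X(\psi^{-1}(x)))^i \big|_{x=\psi(p)} \Big),$


where $n = \dim(M)$ and $\dot\psi = \Phi(\psi)$ is the velocity chart for $\psi$ as in Notation 54.5.7. Hence


$\forall X, Y \in X^1(T(M)),\ \forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,$

$\dot\psi((L_X Y)(p))^i = \sum_{j=1}^n \big( \xi^j(x)\, \partial_j \eta^i(x) - \eta^j(x)\, \partial_j \xi^i(x) \big) \big|_{x=\psi(p)},$


where $\xi = \dot\psi \circ X \circ \psi^{-1}$ and $\eta = \dot\psi \circ Y \circ \psi^{-1}$. Hence


$\forall X, Y \in X^1(T(M)),\ \forall \psi \in \mathrm{atlas}(M),\ \forall x \in \mathrm{Range}(\psi),\ \forall i \in \mathbb{N}_n,$

$(\dot\psi \circ L_X Y \circ \psi^{-1})(x)^i = \sum_{j=1}^n \big( \xi^j(x)\, \partial_j \eta^i(x) - \eta^j(x)\, \partial_j \xi^i(x) \big).$


Or more briefly, $\dot\psi \circ L_X Y \circ \psi^{-1} = \sum_{j=1}^n (\xi^j \partial_j \eta - \eta^j \partial_j \xi)$.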


PROOF: The assertion follows from Theorems 61.8.5 (iii) and 61.5.11. 
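
As a hedged illustration of the coordinate formula $\sum_j (\xi^j \partial_j \eta^i - \eta^j \partial_j \xi^i)$, the sketch below evaluates it numerically for component functions on $\mathbb{R}^n$, using central differences in place of exact partial derivatives. The two example fields and all function names are hypothetical and are not taken from the text.

```python
from typing import Callable, List, Sequence

Vec = Sequence[float]
Field = Callable[[Vec], List[float]]

def partial(f: Field, x: Vec, j: int, h: float = 1e-5) -> List[float]:
    """Central-difference estimate of the j-th partial derivative of f at x."""
    xp = list(x)
    xm = list(x)
    xp[j] += h
    xm[j] -= h
    return [(a - b) / (2.0 * h) for a, b in zip(f(xp), f(xm))]

def lie_derivative(xi: Field, eta: Field, x: Vec) -> List[float]:
    """Components sum_j (xi^j d_j eta^i - eta^j d_j xi^i) at the point x."""
    n = len(x)
    xi_x, eta_x = xi(x), eta(x)
    out = [0.0] * n
    for j in range(n):
        d_eta = partial(eta, x, j)
        d_xi = partial(xi, x, j)
        for i in range(n):
            out[i] += xi_x[j] * d_eta[i] - eta_x[j] * d_xi[i]
    return out

def xi(x):
    """Hypothetical rotation field (-y, x) on R^2."""
    return [-x[1], x[0]]

def eta(x):
    """Hypothetical stretching field (x, 0) on R^2."""
    return [x[0], 0.0]

print(lie_derivative(xi, eta, [2.0, 3.0]))  # approximately [-3.0, -2.0]
print(lie_derivative(eta, xi, [2.0, 3.0]))  # approximately [3.0, 2.0]
```

For these fields a hand computation gives $(-y, -x)$, and swapping the two arguments changes the sign, in line with the identification of the Lie derivative with the Lie bracket.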


61.8.7 THEOREM: The Lie derivative of a $C^{k+1}$ vector field by a $C^{k+1}$ vector field is a $C^k$ vector field.
Let $M$ be a $C^{k+2}$ manifold for some $k \in \mathbb{Z}_0^+$. Then


$\forall X, Y \in X^{k+1}(T(M)),\quad L_X Y \in X^k(T(M)).$


PROOF: The assertion follows from Theorems 61.5.13 and 61.8.5 (iv).


61.8.8 REMARK: Unlikely confusion of Lie derivative notation with left action notation.

The notation $L_X Y$ for Lie derivatives could be confused with the notation $L_g X$ for the left translation of
vector fields in Section 62.3. The former has a vector field as a subscript whereas the latter has a group
element. This will usually remove any ambiguity.


61.8.9 REMARK: Interpreting Lie derivatives as parallelism for a very restricted curve space. 

The Lie derivative with respect to a field $X$ may be thought of as the covariant derivative for a parallelism
which is defined for the curve class consisting of the integral curves of $X$. This is a very limited kind of
parallelism because it sets up isomorphisms only between tangent spaces at points which are connected by
these integral curves. Therefore this class of curves does not satisfy the requirements for a “parallelism path
class” in Definition 48.2.6.


61.8.10 REMARK: The differing essences of Lie derivatives and the Lie bracket. 
The identity LxY = [X,Y] in Theorem 61.8.5 (iii) hides the different ways in which these two expressions 
are constructed. The ZF set constructions are ultimately identical, but they have a different “essence”. 


The Lie bracket expression $[X, Y]$ in Definition 61.5.7 is constructed by first subtracting the naive derivatives
$\partial_X Y$ and $\partial_Y X$, and then “dropping” the difference to $T(M)$. As mentioned in Remark 61.5.3, the majority
of authors define the Lie bracket to be the commutator of first order differential operator fields. Thus $[X, Y]$
is thought of as $\partial_X \circ \partial_Y - \partial_Y \circ \partial_X$ acting on $C^2(M)$. This gives a second-order operator whose
second-order components equal zero. It is not immediately obvious how a differential operator $\partial_K$ can be defined on
$C^2(M)$ for a tensor field $K \in X^1(T^{r,s}(M))$ with general $(r, s) \ne (1, 0)$. This accurately reflects the “essence”
of the Lie bracket because of its historical origins in Lie group theory.
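
The cancellation of the second-order terms in the commutator can be checked numerically. The sketch below, written for component functions on $\mathbb{R}^2$ with central differences, compares $\partial_X(\partial_Y f) - \partial_Y(\partial_X f)$ with the first-order operator obtained from the bracket components. The fields `X`, `Y`, the test function `f` and the hand-computed `bracket_XY` are hypothetical choices for illustration only.

```python
def grad(f, x, h=1e-3):
    """Central-difference gradient of a scalar function f at x."""
    g = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def apply_field(X, f):
    """Return the function x -> sum_j X^j(x) d_j f(x)."""
    return lambda x: sum(a * b for a, b in zip(X(x), grad(f, x)))

def X(x):
    return [-x[1], x[0]]

def Y(x):
    return [x[0], 0.0]

def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3

def commutator(X, Y, f, x):
    """Value at x of X(Y f) - Y(X f)."""
    return apply_field(X, apply_field(Y, f))(x) - apply_field(Y, apply_field(X, f))(x)

def bracket_XY(x):
    """Bracket components for these two particular fields, computed by hand."""
    return [-x[1], -x[0]]

p = [2.0, 3.0]
lhs = commutator(X, Y, f, p)
rhs = sum(a * b for a, b in zip(bracket_XY(p), grad(f, p)))
print(lhs, rhs)  # both approximately -98.0
```

The two numbers agree up to discretisation error even though each of the two nested terms separately involves second derivatives of `f`.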


The Lie derivative expression $L_X Y$ for vector fields $X, Y \in X^1(T(M))$ for a $C^2$ manifold $M$ is formed
by subtracting a Lie connection term $\theta_X Y$ from the naive derivative $\partial_X Y$ to give a $T(T(M))$-valued field
$\partial_X Y - \theta_X Y$ which is then “dropped” to $T(M)$. In the case of the Lie derivative, extension of $L_X Y$ to
$L_X K$ is quite easy to imagine for tensor fields $K \in X^1(T^{r,s}(M))$ with general type $(r, s)$. One may use the
“associated parallelism” concept which is outlined in Sections 48.4 and 71.3.


61.9. Lie derivatives of tensor fields 


61.9.1 REMARK: Lie derivatives may be extended from vector fields to general tensor fields. 

Lie derivatives are an extension or generalisation of the Lie bracket from vector fields to general tensor fields. 
In other words, the Lie bracket of vector fields is a special case of the Lie derivative. More precisely, the Lie 
bracket is a particular construction from two vector fields, whereas the Lie derivative is a family of operations 
on fibre bundles which are associated with the tangent bundle of a manifold, and in the special case of the 
Lie derivative of a vector field (a cross-section of the tangent bundle), the value of the Lie derivative just 
happens to be the same as the Lie bracket. 


The Lie derivative is often expressed in an abstract intuitive way as a derivative of transport by families 
of local diffeomorphisms which are constructed from integral curves. In practice, concrete expressions are 
required for Lie derivatives acting on any kind of vector, covector or tensor field. To obtain an explicit 
expression for the Lie derivative for general tensor fields and other fibre bundle cross-sections on a given base 
manifold, it is necessary to use the concept of “associated parallelism”, which is defined in Section 48.4. The 
Lie derivative is straightforward to define for the tangent bundle of a differentiable manifold. Lie derivatives 
for cross-sections of other fibre bundles on the same manifold can then be defined in terms of fibre bundle 
associations, which are defined in Section 47.9.5. 


Just as covariant derivatives may be extended from cross-sections of tangent bundles (i.e. vector fields) to 
cross-sections of arbitrary associated tangent fibre bundles (such as tensor fields and differential forms) by 
using the concept of associated parallelism (and associated connections), so also Lie derivatives may be 
extended from vector fields to general associated fibre bundles. Textbooks often perform this extension 
by differentiating various contractions of vector and tensor fields, but this is just one way of defining the 
association between different types of tensor fields. 


61.10. The exterior derivative for vectors


61.10.1 REMARK: Adaptation of the Cartesian space exterior derivative to differentiable manifolds. 

The exterior derivative in Section 46.7 is straightforward to adapt from Cartesian spaces to differentiable
manifolds because it does not require a connection or metric field, and it has been shown in Theorem 46.7.8
that it is covariant under $C^2$ diffeomorphisms. A $C^2$ structure on a manifold is the minimum differentiability
requirement because a $C^1$ manifold is required for the definition of the tangent bundle (and differential forms),
and a further order of differentiability is required for the differentiation of differential forms.


For the construction of the exterior derivative for Cartesian spaces, Definition 46.7.4 uses globally constant
vector fields. In the case of manifolds, vector constancy can be defined using charts, but global charts are
unavailable if the manifold is topologically non-trivial, and in any case, the constancy is chart-dependent
(except for 0-dimensional manifolds). However, the “constant vector field extensions” in Definition 57.1.20 are
suitable substitutes for globally constant vector fields because the computation of derivatives of functions at a
point only requires them to be defined in some neighbourhood of the point, and the chart-dependencies cancel
each other out in this application. (Hence the value of $(d\omega)(V)$ in Definition 61.10.3 is chart-independent.)


Definition 61.10.3 line (61.10.1) is an adaptation to differentiable manifolds of the corresponding exterior 
derivative formula in Definition 46.7.4 line (46.7.1) for Cartesian spaces. (The proof of this correspondence 
in Theorem 61.12.3 is not entirely trivial. See Remark 61.12.1 for further details.) 


Since the definition of “constant” vector fields on a differentiable manifold is chart-dependent, it must
be shown that the exterior derivative in Definition 61.10.3 is chart-independent despite this. The
chart-independence of the formula in line (61.10.1) is demonstrated in Theorems 61.12.3 and 61.12.5.
(See Remark 61.12.2 for further comment.)

The differential forms $\omega : T^m(M) \to W$ and $d\omega : T^{m+1}(M) \to W$ in Definition 61.10.3 are short-cut versions
of differential forms as in Definition 57.7.3. This explains why the left-hand-side expression in line (61.10.1)
is $(d\omega)(V)$, not $(d\omega)(p)(V)$. (The relations between maps and spaces in Definition 61.10.3 are sketched in
Figure 61.10.1 without and with short-cuts. See Notation 57.7.23 for short-cut differential form spaces.)


[Diagram with two panels, “without short-cuts” and “with short-cuts”, each relating $\omega \in X^1(\Lambda_m(T(M), W)|U)$ to $d\omega \in X^0(\Lambda_{m+1}(T(M), W)|U)$.]
Figure 61.10.1 Exterior derivative of differential forms on a manifold


61.10.2 REMARK: Definition of the exterior derivative for differentiable manifolds. 

Definition 61.10.3 expresses the exterior derivative in terms of extensions $\mathrm{Extn}_\psi(V)$ of vector tuples $V$
to vector-tuple-valued fields. (See Notation 57.4.5.) Such an extension is constant with respect to the
chart $\psi$. Any holonomic $C^1$ extension will give the same value for $d\omega$, but chart-generated extensions are
convenient because they are guaranteed to exist and be $C^1$ when the manifold is $C^2$. (This follows from
Theorem 57.2.20 (ii).) Therefore the right hand side of line (61.10.1) is well defined.

In the expression “$\mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)$” in Definition 61.10.3, the vector-tuple output from $\mathrm{Extn}_\psi(V)$ is the
input for $\mathrm{omit}_\ell$. Thus $(\mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(q) = \mathrm{omit}_\ell(\mathrm{Extn}_\psi(V)(q)) \in T_q(M)^m$ for all $q \in \mathrm{Dom}(\psi)$.
The similar-looking expression “$\mathrm{omit}_\ell(\mathrm{Extn}_\psi(V))$” would be meaningless because the operator “$\mathrm{omit}_\ell$” in
Definition 14.12.6 (iv) acts on lists, not on list-valued functions.


See Remark 46.7.1 for some references to the literature regarding the exterior derivative. See Figure 46.7.1 
in Remark 46.7.6 for the geometric significance of Definition 61.10.3 line (61.10.1). 


61.10.3 DEFINITION: Exterior derivative for differentiable manifolds.

The exterior derivative for differential forms on a subset $U \in \mathrm{Top}(M)$ of a $C^2$ manifold $M$, valued in a
finite-dimensional real linear space $W$, is the map $d : X^1(\Lambda_m(T(M), W)|U) \to X(\Lambda_{m+1}(T(M), W)|U)$ for
general $m \in \mathbb{Z}_0^+$, defined by


$\forall \omega \in X^1(\Lambda_m(T(M), W)|U),\ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in U \cap \mathrm{Dom}(\psi),\ \forall V \in T_p(M)^{m+1},$


$(d\omega)(V) = \sum_{\ell=0}^m (-1)^\ell\, \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)),$ (61.10.1)


where $\mathrm{Extn}_\psi(V)$ is the “constant extension” of $V$ via $\psi$ as in Notation 57.4.5.


61.10.4 REMARK: Verification of the target space for the exterior derivative. 

Since it is not immediately obvious that the exterior derivative $d$ in Definition 61.10.3 has the claimed target
space, this must be verified. The claimed target space $X(\Lambda_{m+1}(T(M), W)|U)$ consists of local short-cut
differential forms which are defined by Notation 57.7.23 line (57.7.12).


Let $\omega \in X^1(\Lambda_m(T(M), W)|U)$, $\psi \in \mathrm{atlas}(M)$, $p \in U \cap \mathrm{Dom}(\psi)$ and $V \in T_p(M)^{m+1}$.
The chain-of-maps $\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)$ on line (61.10.1) takes a point in $U \cap \mathrm{Dom}(\psi)$ as input. The output
from $\mathrm{Extn}_\psi(V)$ is an $(m+1)$-tuple of vectors at each point of $U \cap \mathrm{Dom}(\psi)$. Then $\mathrm{omit}_\ell$ removes one of these
vectors and sends an $m$-tuple of vectors to $\omega$, which outputs an element of $W$. Thus $\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)$
is a $W$-valued function on $U \cap \mathrm{Dom}(\psi)$. The differential operator $\partial_{V_\ell}$ is applied to this function.


Since $\mathrm{Extn}_\psi(V)$ is a $C^1$ map because $\psi$ is a $C^2$ chart (because $M$ is a $C^2$ manifold), and $\omega$ is a $C^1$ map,
it follows that $\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)$ is a $C^1$ map from $U \cap \mathrm{Dom}(\psi)$ to $W$. Therefore the derivative
$\partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))$ is a well-defined element of $W$ by Definition 54.14.4.


Thus $d\omega$ is a map from $T^{m+1}(U)$ to $W$ which is an antisymmetric function of $V \in T_p(M)^{m+1}$ for all $p \in U$
due to the factor $(-1)^\ell$. Hence $d\omega \in X(\Lambda_{m+1}(T(M), W)|U)$ as claimed.

The target space $X(\Lambda_{m+1}(T(M), W)|U)$ in Definition 61.10.3 could be replaced by the corresponding space
of continuous differential forms $X^0(\Lambda_{m+1}(T(M), W)|U)$, but this is not proved until Theorem 61.13.2. (The
proof is somewhat non-trivial.)


61.10.5 REMARK: The exterior derivative of a 1-form. 

An important specialisation of the exterior derivative operator in Definition 61.10.3 is to 1-forms. The
curvature of a connection form in Definition 70.5.2 is the exterior derivative of a 1-form. (This is because
connection forms are always 1-forms.) Substituting $m = 1$ into line (61.10.1) gives the following formula.


$\forall \omega \in X^1(\Lambda_1(T(M), W)|U),\ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in U \cap \mathrm{Dom}(\psi),\ \forall (V_0, V_1) \in T_p(M)^2,$
$(d\omega)(V_0, V_1) = \partial_{V_0}(\omega \circ \mathrm{Extn}_\psi(V_1)) - \partial_{V_1}(\omega \circ \mathrm{Extn}_\psi(V_0)).$


This shows how the exterior derivative yields a 2-form $d\omega$ in this case. The function $d\omega$ is linear in both of
its arguments because the directional derivatives $\partial_{V_0}$ and $\partial_{V_1}$ are linear in $V_0$ and $V_1$ respectively, and the
extension function $\mathrm{Extn}_\psi$ is a linear map from vectors to vector-valued functions, and $\omega$ has the right kind of
linearity because it is a 1-form. (To be specific, a fixed directional derivative of $\omega$ is linear with respect to
constant linear transformations of the vector-valued function which it acts on.)


61.10.6 EXAMPLE: Application of differentiable manifold exterior derivative to Cartesian space. 
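Let $M = \mathbb{R}^n$ for some $n \in \mathbb{Z}_0^+$. Let $W = \mathbb{R}$ and $U = M$. Let $\psi = \mathrm{id}_M$. Define $\omega : T(M) \to W$ by


$\forall p \in M,\ \forall v \in \mathbb{R}^n,\quad \omega(t_{p,v,\psi}) = \sum_{i=1}^n a_i(p) v^i,$


where $a \in C^1(\mathbb{R}^n, \mathbb{R}^n)$. Then $\omega \in X^1(\Lambda_1(T(M), W)|U)$ by Notation 57.7.23 line (57.7.13).


Let $p \in M$ and $v_0, v_1 \in \mathbb{R}^n$. Let $V_k = t_{p,v_k,\psi}$ for $k = 0, 1$. Then $\mathrm{Extn}_\psi(V_0) : M \to T(M)$ and
$\mathrm{Extn}_\psi(V_1) : M \to T(M)$ are cross-sections of $T(M)$ by Notation 57.1.21. So $\omega \circ \mathrm{Extn}_\psi(V_k) : M \to W$ is a vector-valued
function on $M$ for $k = 0, 1$. By Remark 61.10.5, or by Definition 61.10.3 line (61.10.1),


$$\begin{aligned}
(d\omega)(V_0, V_1) &= \partial_{V_0}(\omega \circ \mathrm{Extn}_\psi(V_1)) - \partial_{V_1}(\omega \circ \mathrm{Extn}_\psi(V_0)) \\
&= \partial_{p,v_0,\psi}(\omega \circ \mathrm{Extn}_\psi(t_{p,v_1,\psi})) - \partial_{p,v_1,\psi}(\omega \circ \mathrm{Extn}_\psi(t_{p,v_0,\psi})) && (61.10.2) \\
&= \partial_{p,v_0,\psi}(q \mapsto \omega(t_{q,v_1,\psi})) - \partial_{p,v_1,\psi}(q \mapsto \omega(t_{q,v_0,\psi})) && (61.10.3) \\
&= \partial_{p,v_0,\psi}\Big(q \mapsto \sum_{j=1}^n a_j(q) v_1^j\Big) - \partial_{p,v_1,\psi}\Big(q \mapsto \sum_{j=1}^n a_j(q) v_0^j\Big) && (61.10.4) \\
&= \sum_{i=1}^n v_0^i\, \partial_{x^i}\Big(\sum_{j=1}^n a_j(x) v_1^j\Big)\Big|_{x=p} - \sum_{i=1}^n v_1^i\, \partial_{x^i}\Big(\sum_{j=1}^n a_j(x) v_0^j\Big)\Big|_{x=p} && (61.10.5) \\
&= \sum_{i=1}^n \sum_{j=1}^n v_0^i v_1^j\, \partial_i a_j(x)\big|_{x=p} - \sum_{i=1}^n \sum_{j=1}^n v_1^i v_0^j\, \partial_i a_j(x)\big|_{x=p} \\
&= \sum_{i=1}^n \sum_{j=1}^n v_0^i v_1^j\, \big(\partial_i a_j(x) - \partial_j a_i(x)\big)\big|_{x=p},
\end{aligned}$$


where line (61.10.2) follows by Notation 54.11.12, line (61.10.3) follows by Theorem 57.1.23 line (57.1.4), and
line (61.10.5) follows by Notation 54.11.3.

Since $W = \mathbb{R}$, the operators $\partial_{p,v_0,\psi}$ and $\partial_{p,v_1,\psi}$ can be applied to real-valued functions as in Notation 54.11.3
instead of vector-valued functions as in Definition 54.14.4. Thus on line (61.10.4), the maps $q \mapsto \sum_{j=1}^n a_j(q) v_1^j$
and $q \mapsto \sum_{j=1}^n a_j(q) v_0^j$ from $M$ to $W$ are thought of as real-valued functions on a manifold, not vector-valued.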
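
The final expression of this example, $\sum_{i,j} v_0^i v_1^j (\partial_i a_j - \partial_j a_i)$ evaluated at $p$, can be tried out numerically. The sketch below replaces the partial derivatives by central differences. The coefficient function `a` and the helper name are hypothetical choices for illustration, not part of the text.

```python
def exterior_derivative_1form(a, p, v0, v1, h=1e-5):
    """Approximate sum_{i,j} v0^i v1^j (d_i a_j - d_j a_i) at the point p."""
    n = len(p)
    # jac[i][j] approximates the partial derivative of a_j with respect to x^i at p.
    jac = []
    for i in range(n):
        xp = list(p)
        xm = list(p)
        xp[i] += h
        xm[i] -= h
        ap, am = a(xp), a(xm)
        jac.append([(ap[j] - am[j]) / (2.0 * h) for j in range(n)])
    return sum(v0[i] * v1[j] * (jac[i][j] - jac[j][i])
               for i in range(n) for j in range(n))

def a(x):
    """Hypothetical coefficients of the 1-form -y dx + x dy on R^2."""
    return [-x[1], x[0]]

value = exterior_derivative_1form(a, [0.5, -1.0], [1.0, 0.0], [0.0, 1.0])
swapped = exterior_derivative_1form(a, [0.5, -1.0], [0.0, 1.0], [1.0, 0.0])
print(value, swapped)  # approximately 2.0 and -2.0
```

Swapping the two vector arguments changes the sign, which is the antisymmetry expected of a 2-form.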


61.10.7 REMARK: The exterior derivative of a 0-form. 
A not-so-important specialisation of the exterior derivative operator in Definition 61.10.3 is to 0-forms.
Substituting $m = 0$ into line (61.10.1) gives the following formula.


$\forall \omega \in X^1(\Lambda_0(T(M), W)|U),\ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in U \cap \mathrm{Dom}(\psi),\ \forall (V_0) \in T_p(M)^1,$
$(d\omega)(V_0) = \partial_{V_0}(\omega \circ \mathrm{omit}_0 \circ \mathrm{Extn}_\psi((V_0))).$


This shows how the exterior derivative yields a 1-form $d\omega$ in this case. The function $d\omega$ is linear in its
argument because the directional derivative $\partial_{V_0}$ is linear in $V_0$.


The sub-expression $\mathrm{omit}_0 \circ \mathrm{Extn}_\psi((V_0))$ omits the only element of the 1-tuple extension to each point in
$U \cap \mathrm{Dom}(\psi)$ of the 1-tuple $(V_0)$. Therefore $(\mathrm{omit}_0 \circ \mathrm{Extn}_\psi((V_0)))(q) = \mathrm{omit}_0(\mathrm{Extn}_\psi((V_0))(q)) = () = \emptyset$ for
all $q \in U \cap \mathrm{Dom}(\psi)$. The 0-form $\omega$ maps this empty 0-tuple at each point to some vector in $W$. So the map
looks like $q \mapsto \emptyset \mapsto \omega(\emptyset)$.


((2022-9-8. Actually the short-cut version of the exterior derivative does not work when applied on 0-forms. 
So a few sections must be rewritten now to fix this problem. )) 


61.10.8 REMARK: Issues arising if the linear space W is regarded as a manifold. 

An alternative to the interpretation in Remark 61.10.4 of the differential operator $\partial_{V_\ell}$ giving an output in
$W$ is the idea of regarding $W$ as a differentiable manifold. Then $\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)$ would be a $C^1$ map
between manifolds, and its derivative would be an element of the tangent bundle $T(W)$ instead of $W$.


Since $W$ is a finite-dimensional real linear space, the elements of $T(W)$ may be “dropped” to $W$ using the
drop function $\varpi_W : T(W) \to W$ constructed as in Definition 54.9.5. So with an implicit drop function, the
right-hand side of line (61.10.1) would effectively be an element of $W$. This gives exactly the same result as
the assumption in Definition 61.10.3 that $W$ is a linear space without any differentiable manifold structure.


(( 2022-11-23. Remarks 61.10.8 and 61.10.9 are almost worthless. Consider deleting them! )) 


61.10.9 REMARK: The very fortunate non-necessity of swap-like functions for the exterior derivative. 
Under the linear space manifold structure assumption which is made in Remark 61.10.8, it could seem at
first sight that some kind of “swap function” as described in Section 59.6 would be required for defining
the exterior derivative of vector-valued differential forms. Fortunately, swap-like functions are rendered
unnecessary because the “drop function” for the tangent space of a linear space “drops” all tangent vector
terms in the exterior derivative formula down to the same linear space $W$.

To see why a swap-like function might be necessary, note that the terms of the sum on line (61.10.1) in
Definition 61.10.3 have the form $\partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))$, where the function $\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)$ in
$C^1(U \cap \mathrm{Dom}(\psi), W)$ is being differentiated at $V_\ell \in T_p(M)$.

Let $z_\ell = \omega(\mathrm{omit}_\ell(\mathrm{Extn}_\psi(V)(p)))$. (See Notation 57.4.5 for $\mathrm{Extn}_\psi(V)$ and $\mathrm{Extn}_\psi(V)(p)$.) Then strictly
speaking, $\partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)) \in T_{z_\ell}(W)$ for all $\ell \in \mathbb{Z}_{m+1}$. This implies that the formula for $(d\omega)(V)$
in Definition 61.10.3 would be attempting to add and subtract vectors which are in different pointwise
tangent spaces, and of course these tangent spaces are disjoint for distinct values of $z_\ell$. So strictly speaking,
line (61.10.1) would appear to be meaningless.


There are two ways out of this apparently erroneous situation. One could define swap-like functions which
identify vectors in the distinct tangent spaces $T_{z_\ell}(W)$. Or one can map all of these tangent spaces to $W$
using the standard finite-dimensional linear space drop function as described in Section 54.9. Dropping all
vectors to the common underlying linear space is clearly by far the best solution.


61.11. Exterior derivative applied to chart-basis vector fields 


61.11.1 REMARK: Application of the exterior derivative to coordinate basis vector tuples. 

The chart-independence of the formula for the exterior derivative given by Definition 61.10.3 line (61.10.1)
is shown in Theorem 61.12.5, based on Theorem 61.12.3. However, it is convenient to first give here some
clarification in Theorem 61.11.2 of this definition.


Theorem 61.11.2 replaces the abstract vector $(m+1)$-tuple $V$ in Definition 61.10.3 with the more concrete
vector $(m+1)$-tuple $e^{p,\psi}_\alpha = (e^{p,\psi}_{\alpha_\ell})_{\ell=0}^m$ for index-families $\alpha : \mathbb{Z}[0, m] \to \mathbb{N}_n$. (See Notation 55.5.19 for $e^{p,\psi}_\alpha$.
See Notation 54.4.10 for $e^{p,\psi}_i$.) Then the “constant” extension via a chart $\psi$ of this vector-tuple is the local
vector-tuple field $e^\psi_\alpha : q \mapsto (e^{q,\psi}_{\alpha_\ell})_{\ell=0}^m$ for $q \in \mathrm{Dom}(\psi)$. (See Notation 57.4.3 for $e^\psi_\alpha$.)


Theorem 61.11.2 is a differentiable manifold version of Theorem 46.7.11 for Cartesian spaces. The constant
basis-vector tuples $(e_i)_{i=1}^n$ are replaced by chart-basis vector-field tuples $(e^\psi_i)_{i=1}^n$ which are constant with
respect to a chart $\psi$. The partial derivatives $(\partial_i)_{i=1}^n$ with respect to coordinates are replaced by chart-basis
tangent operator tuples $(\partial^{p,\psi}_i)_{i=1}^n$.


61.11.2 THEOREM: Application of the exterior derivative to chart-basis vector-field tuples. 
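Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Let $U \in \mathrm{Top}(M)$ and $m \in \mathbb{Z}_0^+$. Let $W$ be a finite-dimensional
real linear space. Then the exterior derivative $d : X^1(\Lambda_m(T(M), W)|U) \to X(\Lambda_{m+1}(T(M), W)|U)$ satisfies


$\forall \omega \in X^1(\Lambda_m(T(M), W)|U),\ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in U \cap \mathrm{Dom}(\psi),\ \forall \alpha \in \mathbb{N}_n^{m+1},$


$(d\omega)(e^{p,\psi}_\alpha) = \sum_{\ell=0}^m (-1)^\ell\, \partial^{p,\psi}_{\alpha_\ell}(\omega \circ \mathrm{omit}_\ell \circ e^\psi_\alpha),$ (61.11.1)


where $(\partial^{p,\psi}_i)_{i=1}^n$ is the tuple of chart-basis tangent operators for $T_p(M)$ and chart $\psi$ as in Notation 54.13.5.


PROOF: Let $\psi \in \mathrm{atlas}(M)$, $p \in U \cap \mathrm{Dom}(\psi)$ and $\alpha \in \mathbb{N}_n^{m+1}$. (That is, $\alpha : \mathbb{Z}[0, m] \to \mathbb{N}_n$.) Let $V = e^{p,\psi}_\alpha$.
Then $V \in T_p^{m+1}(M)$, and $\mathrm{Extn}_\psi(V) = e^\psi_\alpha \in X^1(T^{m+1}(M)|\mathrm{Dom}(\psi))$ by Theorem 57.1.23 line (57.1.8).
Therefore $\mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V) = \mathrm{omit}_\ell \circ e^\psi_\alpha \in X^1(T^m(M)|\mathrm{Dom}(\psi))$ for all $\ell \in \mathbb{Z}[0, m]$. It then follows that
$\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V) \in C^1(U \cap \mathrm{Dom}(\psi), W)$. So $\partial^{p,\psi}_{\alpha_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)) \in W$ is well defined for all
$p \in U \cap \mathrm{Dom}(\psi)$ and $\ell \in \mathbb{Z}[0, m]$. But $\partial^{p,\psi}_i = \partial_{p,e_i,\psi} \in \mathring{T}_p(M)$ by Definition 54.13.4 and Notation 54.13.5.
So $\partial^{p,\psi}_{\alpha_\ell} = \partial_{V_\ell}$ by Notations 54.11.12 and 54.4.10. Hence line (61.11.1) follows from line (61.10.1).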


61.12. Exterior derivative relation to Cartesian space version 


61.12.1 REMARK: Converting the manifold exterior derivative to the Cartesian space exterior derivative. 
In a differentiable manifold without a connection or metric tensor field, there is very little real difference
between a locally Cartesian space $M$ with $n = \dim(M)$ and the corresponding globally Cartesian space $\mathbb{R}^n$.
Definition 61.10.3 is in essence the same as Definition 46.7.4. (This is asserted in Theorem 61.12.3.) This
allows theorems from Cartesian space, such as Theorem 46.7.8, to be applied to differentiable manifolds. (Note
that the naive differential forms in Notation 46.6.8 are used in the statement and proof of Theorem 61.12.3.)

The statement of Theorem 61.12.3 contains some notational ambiguities which are pointed out in the proof.
These are partly a consequence of using the customary “multilinear map short-cuts” in Definition 57.7.3, and
partly due to the lack of standard notations for some kinds of multi-parameter and function-valued functions.
But as always, it is easier to solve a puzzle if you know the answer in advance. (The proof of Theorem 61.12.3
is a dog’s breakfast. It is not a good advertisement for pure mathematics. More precisely, it is not a good
advertisement for abstract manifolds in the fibre bundle idiom. The inconvenience of converting backwards
and forwards between abstract manifolds and Cartesian spaces is very substantial if full details are given.)


The main purpose of Theorem 61.12.3 is to show that the exterior derivative for a differentiable manifold
can be expressed in terms of the exterior derivative for a Cartesian space. The conversion formula in
line (61.12.1) is cumbersome. The point $p$ must be transformed to $\psi(p)$, the vector $(m+1)$-tuple $V$ must
be transformed to $\Phi^{m+1}(\psi)(V)$, and the differential form $\omega$ must be transformed to $\omega \circ \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1}$.
Ideally, the properties of the flat-space exterior derivative in Sections 46.7 and 46.8 should be automatically
transferable via Theorem 61.12.3 to differentiable manifolds, although the complexity of line (61.12.1) could
inspire pessimism.


61.12.2 REMARK: Chart-independence of the exterior derivative must be proved. 

To avoid circularity of reasoning, it is assumed in the statement of Theorem 61.12.3 that the formula for
the exterior derivative in Definition 61.10.3 line (61.10.1) is potentially chart-dependent. This is indicated
by using the chart-dependent notation $d_\psi \omega$ instead of $d\omega$. The exterior derivative formula is shown to be
chart-independent in Theorem 61.12.5, which makes use of Theorem 61.12.3.


61.12.3 THEOREM: Manifold exterior derivative expressed in terms of Cartesian space exterior derivative. 
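Let $M$ be a $C^2$ manifold with $n = \dim(M)$. Let $U \in \mathrm{Top}(M)$ and $m \in \mathbb{Z}_0^+$. Let $W$ be a finite-dimensional
real linear space. Then
$\forall \omega \in X^1(\Lambda_m(T(M), W)|U),\ \forall p \in U,\ \forall V \in T_p(M)^{m+1},\ \forall \psi \in \mathrm{atlas}_p(M),$
$(d_\psi \omega)(V) = \big(d(\omega \circ \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1})\big)(\psi(p))(\Phi^{m+1}(\psi)(V)),$ (61.12.1)
where
(1) $(d_\psi \omega)(V)$ denotes the expression $\sum_{\ell=0}^m (-1)^\ell\, \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))$ in line (61.10.1),
(2) $\Phi^{m+1}(\psi)(V) = (\Phi(\psi)(V_k))_{k=0}^m \in (\mathbb{R}^n)^{m+1}$, as in Notation 55.5.25, is the tuple of component-tuples for
the vectors in the vector-tuple $V \in T_p^{m+1}(M) = T_p(M)^{m+1}$,
(3) $d : X^1(\Lambda_m(\mathbb{R}^n, W)|\tilde U) \to X(\Lambda_{m+1}(\mathbb{R}^n, W)|\tilde U)$ is the exterior derivative in Definition 46.7.4 for the
Cartesian space $\mathbb{R}^n$, where $\tilde U = \psi(U)$,
(4) $\Phi^m_p(\psi) = \Phi^m(\psi)\big|_{T_p^m(M)}$ for all $p \in \mathrm{Dom}(\psi)$, and
(5) $\omega \circ \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1}$ means the function-valued function $x \mapsto \omega \circ \Phi^m_{\psi^{-1}(x)}(\psi)^{-1}$ for $x \in \mathrm{Range}(\psi)$. (So
this is the double-map $x \mapsto (\tilde V \mapsto \omega(\Phi^m_{\psi^{-1}(x)}(\psi)^{-1}(\tilde V)))$ for $x \in \mathrm{Range}(\psi)$ and $\tilde V \in (\mathbb{R}^n)^m$.)
In other words,
$\forall \omega \in X^1(\Lambda_m(T(M), W)|U),\ \forall p \in U,\ \forall V \in T_p(M)^{m+1},\ \forall \psi \in \mathrm{atlas}_p(M),$
$(d_\psi \omega)(V) = (d\tilde\omega)(\tilde x)(\tilde V),$ (61.12.2)
where $\tilde\omega$ denotes $\omega \circ \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1} \in X^1(\Lambda_m(\mathbb{R}^n, W)|\tilde U)$, $\tilde x$ denotes the real $n$-tuple $\psi(p) \in \tilde U$, and $\tilde V$
denotes the $(m+1)$-tuple of real $n$-tuples $\Phi^{m+1}(\psi)(V) \in (\mathbb{R}^n)^{m+1}$.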


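PROOF: By Notation 57.1.21, $\mathrm{Extn}_\psi(V_k)(q) = \sum_{i=1}^n \Phi(\psi)(V_k)^i\, e^{q,\psi}_i \in T_q(M)$ for all $q \in \mathrm{Dom}(\psi)$, for all
$V_k \in T_p(M)$, for all $k \in \mathbb{Z}[0, m]$, for all $\psi \in \mathrm{atlas}_p(M)$. Then, more by design than by accident,


$$\begin{aligned}
\forall j \in \mathbb{N}_n,\quad \Phi(\psi)(\mathrm{Extn}_\psi(V_k)(q))^j
&= \Phi(\psi)\Big(\sum_{i=1}^n \Phi(\psi)(V_k)^i\, e^{q,\psi}_i\Big)^j \\
&= \sum_{i=1}^n \Phi(\psi)(V_k)^i\, \Phi(\psi)(e^{q,\psi}_i)^j \\
&= \sum_{i=1}^n \Phi(\psi)(V_k)^i\, \delta_i^j \\
&= \Phi(\psi)(V_k)^j.
\end{aligned}$$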
a 

Thus $\Phi(\psi)(\mathrm{Extn}_\psi(V_k)(q)) = \Phi(\psi)(V_k)$ for all $q \in \mathrm{Dom}(\psi)$. (In other words, $\Phi(\psi)(\mathrm{Extn}_\psi(V_k)(q))$, the
component tuple for the variable vector $\mathrm{Extn}_\psi(V_k)(q)$, equals the constant $n$-tuple $\Phi(\psi)(V_k)$ via the chart $\psi$,
independent of $q$.) Therefore $\Phi^{m+1}(\psi)(\mathrm{Extn}_\psi(V)(q)) = \Phi^{m+1}(\psi)(V) \in (\mathbb{R}^n)^{m+1}$ for all $q \in \mathrm{Dom}(\psi)$
by Notations 55.5.25 and 57.4.5. (In other words, the component tuple $\Phi^{m+1}(\psi)(\mathrm{Extn}_\psi(V)(q))$ for the
variable vector-tuple $\mathrm{Extn}_\psi(V)(q)$ equals the constant real $n(m+1)$-tuple $\Phi^{m+1}(\psi)(V)$ via $\psi$.) Therefore
$\Phi^m(\psi)((\mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(q)) = \mathrm{omit}_\ell(\Phi^{m+1}(\psi)(V)) \in (\mathbb{R}^n)^m$ for all $q \in \mathrm{Dom}(\psi)$.

Let $f = \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1}$. Then $f$ is the function-valued function defined by $f(x) = \Phi^m_{\psi^{-1}(x)}(\psi)^{-1}$ for all
$x \in \mathrm{Dom}(f) = \mathrm{Range}(\psi) \in \mathrm{Top}(\mathbb{R}^n)$. Let $x \in \mathrm{Range}(\psi)$ and $p = \psi^{-1}(x) \in \mathrm{Dom}(\psi) \in \mathrm{Top}(M)$. Then
$\mathrm{Dom}(\Phi^m_p(\psi)^{-1}) = \mathrm{Range}(\Phi^m_p(\psi)) = (\mathbb{R}^n)^m$ and $\mathrm{Range}(\Phi^m_p(\psi)^{-1}) = \mathrm{Dom}(\Phi^m_p(\psi)) = T_p^m(M) = T_p(M)^m$.
Thus $f$ has the form $f : \mathrm{Range}(\psi) \to ((\mathbb{R}^n)^m \to T^m(M))$, and $f(x) : (\mathbb{R}^n)^m \to T_p^m(M)$ is a bijection for
all $x \in \mathrm{Range}(\psi)$. (It should be noted that the expression $\Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1}$ is ambiguous because it is not
explicitly indicated whether it is a function of two variables or a function-valued function, and in either case,
it is also not indicated which variable should be applied first. So there are four possible interpretations.
However, the interpretation in this context is in the style $f(x)(V)$, not $f(V)(x)$, $f(x, V)$ or $f(V, x)$.)

Let $\tilde\omega = \omega \circ \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1}$. Let $x \in \mathrm{Range}(\psi)$. Then $\tilde\omega(x) = \omega \circ \Phi^m_{\psi^{-1}(x)}(\psi)^{-1}$ is well defined if and only if
$\mathrm{Range}(\Phi^m_{\psi^{-1}(x)}(\psi)^{-1}) \subseteq \mathrm{Dom}(\omega)$, which holds if and only if $T^m_{\psi^{-1}(x)}(M) \subseteq \mathrm{Dom}(\omega) = (\pi^m)^{-1}(U)$,
where $\pi^m : T^m(M) \to M$ is the projection map for $T^m(M)$. (This assumes that the “short-cut” version of
differential forms is being used here.) Thus $\tilde\omega(x)$ is well defined if and only if $x \in \mathrm{Range}(\psi)$ and $\psi^{-1}(x) \in U$,
which means that $x \in \psi(U) = \tilde U$. (The notation $\omega \circ \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1} = \omega \circ f$ is ambiguous because even when
it is known that $f$ has the function-valued function interpretation $f : x \mapsto (\tilde V \mapsto f(x)(\tilde V))$, it is not explicitly
indicated whether the composition “$\circ$” should be applied to $f$ directly or to $f(x)$ for each $x$ individually.
In fact, the interpretation here is the latter.)

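Thus $\tilde\omega$ is interpreted as $\tilde\omega : \tilde U \to ((\mathbb{R}^n)^m \to W)$ defined by $\tilde\omega(\tilde x)(\tilde V) = \omega(\Phi^m_{\psi^{-1}(\tilde x)}(\psi)^{-1}(\tilde V))$ for all $\tilde x \in \tilde U$
and $\tilde V \in (\mathbb{R}^n)^m$. The antisymmetry of $\tilde\omega(\tilde x)(\tilde V)$ with respect to permutations of the components of the tuple $\tilde V$
follows from the antisymmetry of $\omega$. So $\tilde\omega \in X(\Lambda_m(\mathbb{R}^n, W)|\tilde U)$. The $C^1$ differentiability of $\tilde\omega(\cdot)(\tilde V)$ for all
$\tilde V \in (\mathbb{R}^n)^m$ follows from the $C^1$ differentiability of $\omega$. (See Theorem 57.5.7.) Then by Definition 46.7.4,


$$\begin{aligned}
\forall \tilde V \in (\mathbb{R}^n)^{m+1},\quad (d\tilde\omega)(\tilde x)(\tilde V)
&= \sum_{\ell=0}^m (-1)^\ell \sum_{j=1}^n \tilde V_\ell^j\, \partial_{x^j}\, \tilde\omega(x)(\mathrm{omit}_\ell(\tilde V)) \Big|_{x=\tilde x} \\
&= \sum_{\ell=0}^m (-1)^\ell \sum_{j=1}^n \tilde V_\ell^j\, \partial_{x^j}\, \omega\big(\Phi^m_{\psi^{-1}(x)}(\psi)^{-1}(\mathrm{omit}_\ell(\tilde V))\big) \Big|_{x=\tilde x} && (61.12.3) \\
&= \sum_{\ell=0}^m (-1)^\ell \sum_{j=1}^n \tilde V_\ell^j\, \partial_{x^j}\, \omega\big(\mathrm{omit}_\ell(\Phi^{m+1}_{\psi^{-1}(x)}(\psi)^{-1}(\tilde V))\big) \Big|_{x=\tilde x}, && (61.12.4)
\end{aligned}$$


where line (61.12.3) follows from the assumption that $\tilde\omega = \omega \circ \Phi^m_{\psi^{-1}(\cdot)}(\psi)^{-1}$, and line (61.12.4) follows from
assumption (2). Let $\tilde V = \Phi^{m+1}(\psi)(V) \in (\mathbb{R}^n)^{m+1}$. Then $\mathrm{Extn}_\psi(V_k)(\psi^{-1}(x)) = \Phi_{\psi^{-1}(x)}(\psi)^{-1}(\tilde V_k)$ for all
$k \in \mathbb{Z}[0, m]$ by Notation 57.4.5. So


$$\begin{aligned}
(d\tilde\omega)(\tilde x)(\tilde V)
&= \sum_{\ell=0}^m (-1)^\ell \sum_{j=1}^n \tilde V_\ell^j\, \partial_{x^j}\, \omega\big(\mathrm{omit}_\ell(\mathrm{Extn}_\psi(V)(\psi^{-1}(x)))\big) \Big|_{x=\tilde x} \\
&= \sum_{\ell=0}^m (-1)^\ell \sum_{j=1}^n \tilde V_\ell^j\, \partial_{x^j}\, (\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(\psi^{-1}(x)) \Big|_{x=\tilde x} \\
&= \sum_{\ell=0}^m (-1)^\ell\, \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))
\end{aligned}$$


by Notations 54.11.3 and 54.11.12 because $\pi^{m+1}(V) = \psi^{-1}(\tilde x)$. Hence it follows from assumption (1) that


$(d\tilde\omega)(\tilde x)(\tilde V) = (d_\psi \omega)(V).$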


61.12.4 REMARK: Verification of covariance of the exterior derivative.

The first test of the applicability of Theorem 61.12.3 is whether it can be used to convert Theorem 46.7.8 
for the covariance of the Cartesian space exterior derivative in Definition 46.7.4 to the covariance of the 
manifold exterior derivative in Definition 61.10.3. In the differentiable manifold context, this “covariance” 
means that the output of the exterior derivative computation is chart-independent. 


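61.12.5 THEOREM: Chart-independence of the exterior derivative.
Let $M$ be a $C^2$ manifold. Let $U \in \mathrm{Top}(M)$. Let $W$ be a finite-dimensional real linear space. Then


\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall p \in U,\ \forall V \in T_p(M)^{m+1},\ \forall \psi', \psi'' \in \mathrm{atlas}_p(M), \]
\[ (d_{\psi'}\omega)(V) = (d_{\psi''}\omega)(V). \tag{61.12.5} \]


In other words,
\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall p \in U,\ \forall V \in T_p(M)^{m+1},\ \forall \psi', \psi'' \in \mathrm{atlas}_p(M), \]
\[ \sum_{\ell=0}^m (-1)^\ell \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_{\psi'}(V)) = \sum_{\ell=0}^m (-1)^\ell \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_{\psi''}(V)). \tag{61.12.6} \]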


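PROOF: Let $p \in U$, $V \in T_p(M)^{m+1}$ and $\psi', \psi'' \in \mathrm{atlas}_p(M)$. Then it follows from Theorem 61.12.3 that
$(d_{\psi'}\omega)(V) = (d\tilde{\omega}')(\tilde{x}')(\tilde{V}')$, where $\tilde{x}' = \psi'(p)$, $\tilde{V}' = \Phi^{m+1}(\psi')(V)$, and $\tilde{\omega}' = \omega \circ \Phi^m(\psi')^{-1}$.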

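Let $\tilde{U}' = \psi'(U \cap \mathrm{Dom}(\psi''))$ and $\tilde{U}'' = \psi''(U \cap \mathrm{Dom}(\psi'))$. Then $\phi = \psi'' \circ (\psi')^{-1} : \tilde{U}' \to \tilde{U}''$ is a $C^2$
diffeomorphism by Definition 51.3.2 (iii), and $\tilde{U}', \tilde{U}'' \in \mathrm{Top}(\mathbb{R}^n)$, where $n = \dim(M)$.


It follows from Theorem 61.12.3 that $(d_{\psi''}\omega)(V) = (d\tilde{\omega}'')(\tilde{x}'')(\tilde{V}'')$, where $\tilde{x}'' = \psi''(p)$, $\tilde{V}'' = \Phi^{m+1}(\psi'')(V)$,
and $\tilde{\omega}'' = \omega \circ \Phi^m(\psi'')^{-1}$. Then $\tilde{x}'' = \phi(\tilde{x}')$, and $\tilde{V}'' = (J_\phi(\tilde{x}')\tilde{V}'_\ell)_{\ell=0}^m = J_\phi(\tilde{x}')\tilde{V}'$, and $\tilde{\omega}'$ satisfies


\begin{align*}
\forall \tilde{x} \in \tilde{U}',\ \forall \tilde{V} \in (\mathbb{R}^n)^m, \quad
\tilde{\omega}'(\tilde{x})(\tilde{V}) &= \omega(\Phi^m(\psi')^{-1}(\tilde{x})(\tilde{V})) \\
&= \omega(\Phi^m(\psi'')^{-1}(\phi(\tilde{x}))(J_\phi(\tilde{x})\tilde{V})) \\
&= \tilde{\omega}''(\phi(\tilde{x}))(J_\phi(\tilde{x})(\tilde{V})).
\end{align*}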


So by Theorem 46.7.8, $(d\tilde{\omega}')(\tilde{x}')(\tilde{V}') = (d\tilde{\omega}'')(\phi(\tilde{x}'))(J_\phi(\tilde{x}')\tilde{V}')$. Hence $(d_{\psi'}\omega)(V) = (d_{\psi''}\omega)(V)$.


61.13. Differentiability of exterior derivative of forms 


61.13.1 REMARK: Choice of “constant” extension fields to prove exterior derivative differentiability. 

A particular difficulty with the formula for the exterior derivative in Definition 61.10.3 line (61.10.1) is the fact
that a different extension field $\mathrm{Extn}_\psi(V)$ is used on the right-hand side for each vector-tuple $V \in T(M)^{m+1}$.
So the formula cannot be easily converted to a simple Cartesian space map which can be differentiated.


There are two relevant points in $M$ in Definition 61.10.3 line (61.10.1). The first point is $p = \pi^{m+1}(V)$, the base
point of the vector-tuple $V \in T(M)^{m+1}$. But there is a second point $q \in M$ which must be varied in order to
evaluate the directional derivative expressions $\partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))$ for $\ell \in \mathbb{Z}[0,m]$. Unfortunately the
vector-tuple field $q \mapsto \mathrm{Extn}_\psi(V)(q)$ depends on both of these points $p$ and $q$. In other words, the vector-tuple
field which must be differentiated is a different field for each $p$. So one cannot assert in a simple way that
$d\omega$ is $C^k$ because “through the charts” it is the derivative of a single $C^{k+1}$ function. The function to be
differentiated is varying as the differentiating point $p$ varies.


The proof of Theorem 61.13.2 proceeds by first computing a specific formula in line (61.13.3). Luckily this
formula removes the dependence on the particular choice of “constant extension field” $\mathrm{Extn}_\psi(V)$ for $V$. After
this field has been differentiated, it yields the simple coefficient array $\mathrm{omit}_\ell(v)$, which is $C^\infty$ because it is
constant with respect to $x = \psi(p)$.


Theorem 61.13.2 is stated for $W$-valued differential forms on domains $U \in \mathrm{Top}(M)$. However, real differential
forms are obtained by substituting $\mathbb{R}$ for $W$, and global differential forms are obtained by substituting $M$
for $U$. (So there is no need to provide four almost identical versions of this theorem.)


61.13.2 THEOREM: Differentiability of the exterior derivative of a differential form.
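Let $k \in \mathbb{Z}_0^+$, $M$ be a $C^{k+2}$ manifold and $U \in \mathrm{Top}(M)$. Let $W$ be a finite-dimensional real linear space. Then


\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^{k+1}(\Lambda_m(T(M),W)|U), \quad d\omega \in X^k(\Lambda_{m+1}(T(M),W)|U), \]


where $d$ is the exterior derivative map in Definition 61.10.3.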


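PROOF: Let $\omega \in X^{k+1}(\Lambda_m(T(M),W)|U)$. Then $\omega \in C^{k+1}(T^m(U),W)$ by Notation 57.7.23 line (57.7.13).
Let $\psi \in \mathrm{atlas}(M)$ and $\tilde{\omega} = \omega \circ \Psi^m(\psi)^{-1}$. Then $\tilde{\omega} \in C^{k+1}(\psi(U) \times (\mathbb{R}^n)^m, W)$ by Notation 55.5.33 and
Definitions 55.5.37 and 52.1.2, and $\omega|_{(\pi^m)^{-1}(\mathrm{Dom}(\psi))} = \tilde{\omega} \circ \Psi^m(\psi)$, where $\pi^m : T^m(M) \to M$ is the projection
map for $T^m(M)$ as in Definition 55.5.8, and $(\pi^m)^{-1}(\mathrm{Dom}(\psi)) = \mathrm{Dom}(\Psi^m(\psi))$ by Notation 55.5.33. It must
be shown that $(d\omega) \circ \Psi^{m+1}(\psi)^{-1} \in C^k(\psi(U) \times (\mathbb{R}^n)^{m+1}, W)$.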


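Let $p \in \mathrm{Dom}(\psi)$ and $V \in T_p(M)^{m+1}$. Then $V = (V_j)_{j=0}^m = (\sum_{i=1}^n v_j^i e_i^{p,\psi})_{j=0}^m$ for the family of component
families $v = ((v_j^i)_{i=1}^n)_{j=0}^m = (\Phi(\psi)(V_j))_{j=0}^m \in (\mathbb{R}^n)^{m+1}$ by Notation 54.5.7 and Theorem 54.4.11, and then


\begin{align*}
\forall q \in \mathrm{Dom}(\psi), \quad
(\Phi^m(\psi) \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(q)
&= \Phi^m(\psi)(\mathrm{omit}_\ell(\mathrm{Extn}_\psi(V)(q))) \\
&= \mathrm{omit}_\ell(\Phi^{m+1}(\psi)(\mathrm{Extn}_\psi(V)(q))) \\
&= \mathrm{omit}_\ell(\Phi^{m+1}(\psi)(V)) \tag{61.13.1} \\
&= \mathrm{omit}_\ell(\Phi^{m+1}(\psi)((\textstyle\sum_{i=1}^n v_j^i e_i^{p,\psi})_{j=0}^m)) \\
&= \mathrm{omit}_\ell((\Phi(\psi)(\textstyle\sum_{i=1}^n v_j^i e_i^{p,\psi}))_{j=0}^m) \tag{61.13.2} \\
&= \mathrm{omit}_\ell(((v_j^i)_{i=1}^n)_{j=0}^m),
\end{align*}


where line (61.13.1) follows from Theorem 57.4.7 line (57.4.3), and line (61.13.2) follows from Notation 55.5.25
and Theorem 55.5.21. So by Notation 55.5.33,


\begin{align*}
\forall q \in \mathrm{Dom}(\psi), \quad
(\Psi^m(\psi) \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(q)
&= (\psi(q), \Phi^m(\psi)(\mathrm{omit}_\ell(\mathrm{Extn}_\psi(V)(q)))) \\
&= (\psi(q), \mathrm{omit}_\ell(v)).
\end{align*}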


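Let $(x, v), (y, v) \in \psi(U) \times (\mathbb{R}^n)^{m+1} = \mathrm{Range}(\Psi^{m+1}(\psi))$. Thus $x, y \in \psi(U)$ and $v = ((v_j^i)_{i=1}^n)_{j=0}^m \in (\mathbb{R}^n)^{m+1}$.
Let $p = \psi^{-1}(x)$, $q = \psi^{-1}(y)$ and $V = \Psi^{m+1}(\psi)^{-1}(x, v) \in T_p^{m+1}(M)$. Then


\begin{align*}
\forall \ell \in \mathbb{Z}[0,m], \quad
(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(q)
&= (\tilde{\omega} \circ \Psi^m(\psi) \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(q) \\
&= \tilde{\omega}(y, \mathrm{omit}_\ell(v)).
\end{align*}


Therefore
\begin{align*}
\forall \ell \in \mathbb{Z}[0,m], \quad
\partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))
&= \sum_{i=1}^n v_\ell^i \, \partial_{y^i} (\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))(\psi^{-1}(y)) \Big|_{y=x} \\
&= \sum_{i=1}^n v_\ell^i \, \partial_{y^i} \tilde{\omega}(y, \mathrm{omit}_\ell(v)) \Big|_{y=x}. \tag{61.13.3}
\end{align*}


The expression in line (61.13.3) is a well-defined $C^k$ function of $(x, v)$ because $\tilde{\omega} \in C^{k+1}(\psi(U) \times (\mathbb{R}^n)^m, W)$.
So the expression $\sum_{\ell=0}^m (-1)^\ell \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))$ for $(d\omega)(V)$ in Definition 61.10.3 line (61.10.1) is
a $C^k$ function of $(x, v) = \Psi^{m+1}(\psi)(V)$. In other words, $(d\omega) \circ \Psi^{m+1}(\psi)^{-1} \in C^k(\psi(U) \times (\mathbb{R}^n)^{m+1}, W)$. So
$d\omega \in C^k(T^{m+1}(U), W)$. Hence $d\omega \in X^k(\Lambda_{m+1}(T(M), W)|U)$ by Notation 57.7.23 line (57.7.13).


61.14. The exterior derivative for nonholonomic vector fields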


61.14.1 REMARK: The application of the exterior derivative to vector-field tuples. 

A popular way to avoid the technicalities of the chart-dependent constant extensions of vector-field tuples 
in Definition 61.10.3 is to replace these vector-field tuples with general vector-field tuples. As observed 
in Theorem 46.8.8 for Cartesian spaces, the uncorrected exterior derivative produces some erroneous extra 
terms when it is applied to nonholonomic vector-field tuples. These erroneous terms can be removed very 
easily by simply subtracting them as in Definition 46.8.10. The same “correction” procedure can be applied 
in the case of differentiable manifolds. 


An important application of the exterior derivative is to the computation of the curvature of a connection form 
on a principal bundle. (See for example Definition 70.5.2.) Connection forms are vector-valued differential 
forms which produce a vector output from a given tangent vector, but connection forms are also often 
formulated as maps which produce a vector-field output from a vector-field input. The exterior derivative 
for vector fields is particularly useful for such formulations. 


Definition 61.14.2 adapts Definition 61.10.3 to vector-field tuples by replacing the “constant” vector field
extensions $\mathrm{Extn}_\psi(V_\ell)$ of vectors $V_\ell$ with general vector fields $X_\ell$. Definition 61.14.2 is the manifold version
of Definition 46.8.3, which is the Cartesian space uncorrected exterior derivative for vector-field tuples.


61.14.2 DEFINITION: Uncorrected exterior derivative for vector fields on differentiable manifolds. 

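The uncorrected exterior derivative for vector fields on a $C^2$ manifold $M$, valued in a finite-dimensional
real linear space $W$, is the map $\bar{d} : X^1(\Lambda_m(T(M),W)|U) \to (X^1(T(M)|U)^{m+1} \to (U \to W))$, for degree
$m \in \mathbb{Z}_0^+$ and $U \in \mathrm{Top}(M)$, which is defined by


\[ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall X = (X_\ell)_{\ell=0}^m \in X^1(T(M)|U)^{m+1},\ \forall p \in U, \]
\[ (\bar{d}\omega)(X)(p) = \sum_{\ell=0}^m (-1)^\ell \partial_{X_\ell(p)}(\omega \circ \mathrm{omit}_\ell(X)). \]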

61.14.3 REMARK: The functional form of the output from the uncorrected exterior derivative. 

In Definition 61.14.2, $\bar{d}$ takes a $W$-valued differential $m$-form $\omega \in X^1(\Lambda_m(T(M), W)|U)$ as input, and
produces a function-valued map $\bar{d}\omega : X^1(T(M)|U)^{m+1} \to (U \to W)$ as output. This is inconvenient and
untidy, and does not seem to agree with the way it is defined by most authors. The desired output $d\omega$ should
ideally be an $(m+1)$-form in $X^0(\Lambda_{m+1}(T(M), W)|U)$. (This is in fact the output space for the exterior
derivative in Definition 61.10.3.) Such an $(m+1)$-form would be an antisymmetric map from $T(M)^{m+1}$
to $W$, with base points restricted to $U$.


Such a map may be constructed from $\bar{d}\omega$ as $V \mapsto (\bar{d}\omega)(\mathrm{Extn}_\psi(V))(\pi^{m+1}(V))$ for $V \in T^{m+1}(M)$. In
other words, a tangent vector $(m+1)$-tuple $V = (V_j)_{j=0}^m \in T_p^{m+1}(M)$, for some $p \in U$, can be first
mapped to a vector-field tuple $\mathrm{Extn}_\psi(V)$ which extends $V$ to $U \cap \mathrm{Dom}(\psi)$ for some $\psi \in \mathrm{atlas}_p(M)$. Then
$(\bar{d}\omega)(\mathrm{Extn}_\psi(V)) : U \to W$ is produced from $\omega$ and $\mathrm{Extn}_\psi(V)$ as inputs. And finally, this function is evaluated
at $p = \pi^{m+1}(V)$ to obtain an element of $W$.


The non-trivial step in this construction sequence is the construction of a vector-field tuple $\mathrm{Extn}_\psi(V)$ which
extends $V$ to a neighbourhood of $p$. In fact, some such construction is required in practical applications of
the exterior derivative. This can be achieved by means of a chart $\psi \in \mathrm{atlas}_p(M)$ as is suggested here, or
it can be achieved using vector fields which are found in a particular application context. For example, on
differentiable principal bundles, the “fundamental vector fields” in Definition 66.6.2 can often be used for
this purpose. When a connection is available on a differentiable fibre bundle, horizontal lifts of base-space
vector fields can be used, as in Definitions 67.5.10 and 69.1.16. This requirement for the extension of vectors
to vector fields is unavoidable because the exterior derivative is a derivative of vector fields, not a function
of vectors at a point.


When vector fields are used instead of vectors as inputs for the exterior derivative of a differential form,
it must be shown that the resulting construction depends only on the behaviour of the differential form
at a single point. This requirement can be expressed as the relation $(d\omega)(X) = (d\omega) \circ X$, which is not
satisfied by the uncorrected exterior derivative for general vector-field tuples $X$. When it is valid, it implies
$(d\omega)(X)(p) = (d\omega)(X(p))$ for $p$ in the relevant domain, which shows that $d\omega$ depends only on the
vector-tuple $X(p)$, not on its derivatives or values at other points, whereas the expression $(d\omega)(X)$ could have very
general dependencies on $X$. In fact, such dependencies do occur for the uncorrected exterior derivative.


61.14.4 REMARK: The uncorrected exterior derivative gives correct output for holonomic vector fields.
When the uncorrected exterior derivative in Definition 61.14.2 is computed for holonomic vector-field tuples, 
it gives the correct output, as for example in Theorem 61.14.5 for constant vector field extensions. (For the 
Cartesian space version of Theorem 61.14.5, see Theorem 46.8.5.) Holonomic vector-field tuples are those 
which have zero commutators between the individual vector fields. (Strictly speaking, holonomic means that 
the fields can be constructed as the differentials of a C? curve-family, but this is essentially equivalent to the 
zero commutator condition. See Schutz [36], pages 47-49, for a proof that vector fields with zero commutator 
may be equated to the differentials of curve-families.) 
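
The zero-commutator criterion can be checked numerically in a simple case. The following Python sketch is illustrative only and is not taken from the cited references. It approximates Lie brackets on the punctured plane by central differences, using the polar coordinate fields (which are holonomic) and the pair consisting of the radial field and the unit-length angular field (which is not). The function names `bracket`, `d_r`, `d_theta` and `e_theta`, the step size `h` and the sample point are all assumptions made for this example.

```python
import math

def bracket(X, Y, p, h=1e-5):
    """Approximate the Lie bracket [X, Y](p) for vector fields on R^2.

    Uses [X, Y]^i = sum_j (X^j d_j Y^i - Y^j d_j X^i), with central differences
    for the directional derivatives.
    """
    def directional(F, v):
        # Derivative of the vector-valued function F at p in the direction v.
        fwd = F((p[0] + h * v[0], p[1] + h * v[1]))
        bwd = F((p[0] - h * v[0], p[1] - h * v[1]))
        return tuple((a - b) / (2 * h) for a, b in zip(fwd, bwd))
    dXY = directional(Y, X(p))   # derivative of Y along X(p)
    dYX = directional(X, Y(p))   # derivative of X along Y(p)
    return tuple(a - b for a, b in zip(dXY, dYX))

# Polar coordinate fields on R^2 minus the origin, in Cartesian components.
def d_r(p):
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def d_theta(p):
    return (-p[1], p[0])

# Unit-length angular field, equal to (1/r) d_theta.
def e_theta(p):
    r = math.hypot(p[0], p[1])
    return (-p[1] / r, p[0] / r)

p = (1.0, 2.0)
holonomic_bracket = bracket(d_r, d_theta, p)     # approximately (0, 0)
nonholonomic_bracket = bracket(d_r, e_theta, p)  # approximately -(1/r) e_theta(p)
```

At the sample point the first bracket vanishes to within discretisation error, while the second is approximately $-(1/r)$ times the unit angular field, so the rescaled pair is not holonomic.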


61.14.5 THEOREM: Application of the uncorrected exterior derivative to constant vector fields. 
Let $M$ be a $C^2$ manifold. Let $U \in \mathrm{Top}(M)$. Let $W$ be a finite-dimensional real linear space. Then


\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall p \in U,\ \forall V \in T_p(M)^{m+1},\ \forall \psi \in \mathrm{atlas}_p(M), \]
\[ (\bar{d}\omega)(\mathrm{Extn}_\psi(V))(p) = (d\omega)(V), \tag{61.14.1} \]


where $\bar{d}$ is the uncorrected exterior derivative for vector fields in Definition 61.14.2, and $d$ is the exterior
derivative for vectors in Definition 61.10.3. (See Notation 57.4.5 for $\mathrm{Extn}_\psi(V)$.)


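PROOF: Definition 61.14.2 with $X = \mathrm{Extn}_\psi(V)$ implies that


\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall p \in U,\ \forall V \in T_p(M)^{m+1},\ \forall \psi \in \mathrm{atlas}_p(M), \]
\begin{align*}
(\bar{d}\omega)(\mathrm{Extn}_\psi(V))(p)
&= \sum_{\ell=0}^m (-1)^\ell \partial_{X_\ell(p)}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V)) \\
&= \sum_{\ell=0}^m (-1)^\ell \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell \circ \mathrm{Extn}_\psi(V))
\end{align*}


by Notation 57.4.5. This equals $(d\omega)(V)$ by Definition 61.10.3.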


61.14.6 REMARK: The “error term” for the uncorrected exterior derivative for general vector fields. 
Theorem 61.14.7 shows the “error term” which appears when the uncorrected exterior derivative operator 
in Definition 61.14.2 is applied to general vector-field tuples which are not necessarily holonomic. 


Theorem 61.14.7 follows from the analogous Theorem 46.8.8 for Cartesian spaces by the application of 
Theorem 61.12.3 to convert dw from the manifold version in Definition 61.10.3 to the Cartesian space 
version in Definition 46.7.4. Recycling Theorem 46.8.8 in this way actually requires more lines of proof for 
Theorem 61.14.7 than for the original theorem. An alternative would be to replicate the method of proof for 
Theorem 46.8.8 directly for differentiable manifolds. 


For Lie bracket expressions such as $[X_k, X_\ell]$ on line (61.14.2), see Definition 61.5.7.


61.14.7 THEOREM: Application of the uncorrected exterior derivative to general vector fields. 
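Let $M$ be a $C^2$ manifold. Let $U \in \mathrm{Top}(M)$. Let $W$ be a finite-dimensional real linear space. Then


\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^1(\Lambda_m(T(M), W)|U),\ \forall X \in X^1(T(M)|U)^{m+1},\ \forall p \in U, \]
\[ (\bar{d}\omega)(X)(p) = (d\omega)(X(p)) - \sum_{\substack{k,\ell=0 \\ k<\ell}}^m (-1)^{k+\ell} \omega([X_k, X_\ell](p), \mathrm{omit}_{k,\ell}(X(p))), \tag{61.14.2} \]


where $\bar{d}$ is the uncorrected exterior derivative for vector fields in Definition 61.14.2, $d$ is the exterior derivative
for vectors in Definition 61.10.3, and $X(p) = (X_\ell(p))_{\ell=0}^m$ for all $p \in U$.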
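
As a check on the signs, the case $m = 1$ of line (61.14.2) may be written out explicitly. This is only a worked special case, assuming that the sum in line (61.14.2) runs over index pairs $k < \ell$ with sign $(-1)^{k+\ell}$, and using $\bar{d}$ (a notation chosen here) for the uncorrected exterior derivative. For $m = 1$ the only index pair is $(k, \ell) = (0, 1)$ and $\mathrm{omit}_{0,1}(X(p))$ is the empty tuple.

```latex
% Case m = 1 of line (61.14.2), with X = (X_0, X_1).
% \bar{d} denotes the uncorrected exterior derivative (notation assumed here).
\begin{align*}
(\bar{d}\omega)(X)(p)
  &= \partial_{X_0(p)}(\omega \circ X_1) - \partial_{X_1(p)}(\omega \circ X_0), \\
(d\omega)(X_0(p), X_1(p))
  &= (\bar{d}\omega)(X)(p) + (-1)^{0+1}\,\omega([X_0, X_1](p)) \\
  &= \partial_{X_0(p)}(\omega \circ X_1) - \partial_{X_1(p)}(\omega \circ X_0)
     - \omega([X_0, X_1](p)).
\end{align*}
```

The last line is the familiar formula for the exterior derivative of a 1-form applied to two vector fields.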


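PROOF: Let $m \in \mathbb{Z}_0^+$, $\omega \in X^1(\Lambda_m(T(M),W)|U)$, $X \in X^1(T(M)|U)^{m+1}$, $p \in U$ and $\psi \in \mathrm{atlas}_p(M)$.
Then by Theorem 61.12.3,


\[ (d\omega)(X(p)) = (d(\omega \circ \Phi^m(\psi)^{-1}))(\psi(p))(\Phi^{m+1}(\psi)(X(p))) = (d\tilde{\omega})(\tilde{x})(\tilde{X}(\tilde{x})), \]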




where $\tilde{\omega} = \omega \circ \Phi^m(\psi)^{-1}$, $\tilde{x} = \psi(p)$, and $\tilde{X} = \Phi^{m+1}(\psi) \circ X \circ \psi^{-1}$. Then by Theorem 46.8.8,


(ds)(X)(@) = YS 0 Y; 


(=0 i=l y=} 
But for £ € Z[0, m], ; . 
oly) (omit (Xy) = w( BY (0) (omit(X (y))) 
= u(omit(975, (0) QO ()) 
= w(omit(X (67 (y)))). 
So »: " 

(dy) = 3 C1 Z Xe) 0yo omit QX (67 WN) ye 
= E Cn! 3 BX) ayoli XO WN ea 
= $ (-1)Ax,¢p)(w o omit(X)) 

£—0 
= (du) (X) (v) 


by Theorem 54.13.7 line (54.13.4) and Definition 61.14.2. But by Notation 46.4.8, for all $k, \ell \in \mathbb{Z}[0, m]$,


Xe, Xd) = Y X« G8) Xi)... — X Kelp Xa yas 


>. 
= 


n 


D B()(Xe(p))! 8:9 P) Xi Q7 U — E (5) X (9)) 2): (9) OX 67 9) 


$()( Xy, XQ ())) 
by Theorem 61.5.11. Therefore for all $k, \ell \in \mathbb{Z}[0, m]$,

e (à) (Xi, Xo] ( ), omit (X (2 ))) = o(ž) (8V) (Xr, X] (p)), omit(™*! (Y) (X (p)))) 
= (3) (BY) (Xr, X] (p)), 9"! (4 V)(omit(X (p)))) 


I 


y—i 


-. 
= 


Therefore 


This verifies line (61.14.2). 


61.14.8 REMARK: The correction term for the uncorrected exterior derivative for vector fields. 

An immediate consequence of Theorem 61.14.7 is the ability to recover the correct exterior derivative in 
Definition 61.10.3 from the uncorrected exterior derivative for vector fields in Definition 61.14.2 by adding 
the correction term which appears in line (61.14.2). (See Definition 46.8.10 for the Cartesian space version 
of Definition 61.14.9.) 


The vector-field version $(d\omega)(X)(p)$ of the exterior derivative in Definition 61.14.9 has a different functional
form to the vector version $(d\omega)(V)$ in Definition 61.10.3, although they give the same output when $V = X(p)$.
The greater flexibility of the vector-field version is purchased at the price of the correction term which features
Lie brackets of the vector fields.


61.14.9 DEFINITION: Corrected exterior derivative for vector fields on differentiable manifolds.

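The exterior derivative for vector fields on a $C^2$ manifold $M$, valued in a finite-dimensional real linear
space $W$, is the map $d : X^1(\Lambda_m(T(M), W)|U) \to (X^1(T(M)|U)^{m+1} \to (U \to W))$, for degree $m \in \mathbb{Z}_0^+$
and $U \in \mathrm{Top}(M)$, which is defined by


\[ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall X = (X_\ell)_{\ell=0}^m \in X^1(T(M)|U)^{m+1},\ \forall p \in U, \]
\[ (d\omega)(X)(p) = \sum_{\ell=0}^m (-1)^\ell \partial_{X_\ell(p)}(\omega \circ \mathrm{omit}_\ell(X)) + \sum_{\substack{k,\ell=0 \\ k<\ell}}^m (-1)^{k+\ell} \omega([X_k, X_\ell](p), \mathrm{omit}_{k,\ell}(X(p))). \tag{61.14.3} \]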


61.14.10 REMARK: Immunity to nonholonomy of the corrected vector-field exterior derivative. 

In lines (61.14.4) and (61.14.5) in Theorem 61.14.11, the left-hand-side instances of “$d\omega$” are applied to
vector-field tuples as in Definition 61.14.9, but the right-hand-side instances are applied to vector tuples
as in Definition 61.10.3. Thus these lines indicate the relation between two different formulations of the
exterior derivative, but more importantly, they show that the pointwise value of the exterior derivative can
be recovered from the vector-field version (on the left hand side) by evaluating the output function $(d\omega)(X)$
at $p$. In other words, $(d\omega)(X)$ is a function from $U$ to $W$ whose value at each $p \in U$ is equal to the exterior
derivative of $\omega$ applied to the vector-tuple $X(p)$.


As mentioned in Remark 61.14.3, the exterior derivative is often evaluated in contexts where the vector- 
field tuples which are conveniently available are not necessarily holonomic. (For example, fundamental 
vertical vector fields on principal bundles and horizontal vector-field lifts on fibre bundles which have a 
connection.) Theorem 61.14.11 shows that the corrected exterior derivative gives the correct output at each 
point, independent of any noncommutativity of the vector fields. 


Line (61.14.3) in Definition 61.14.9 may be written very informally as
$(d\omega)(X)(p)$ = “uncorrected exterior derivative” + “correction term”.


Thus Theorem 61.14.11 implies that when the correction term is added to the uncorrected exterior derivative
$(\bar{d}\omega)(X)$ in Definition 61.14.2, the resulting corrected exterior derivative $(d\omega)(X)$ in Definition 61.14.9 is
immune to the noncommutativity of the vector fields in the vector-field tuple $X$. In other words, it is
immune to the nonholonomy of its vector-field arguments. (See Theorem 46.8.12 for the Cartesian space 
version of Theorem 61.14.11.) 
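
The effect of the correction term can be seen in a small numerical experiment. The following Python sketch is an illustration under stated assumptions, not part of the formal development: it takes the 1-form $\omega = x\,dy$ on $\mathbb{R}^2$ (so that $d\omega = dx \wedge dy$) and the non-commuting fields $X_0 = \partial_x$ and $X_1 = x\,\partial_y$, whose bracket $[X_0, X_1] = \partial_y$ is entered by hand. Derivatives are approximated by central differences, and all function names are hypothetical.

```python
def directional(f, p, v, h=1e-5):
    """Central-difference derivative of the scalar function f at p along v."""
    fwd = f((p[0] + h * v[0], p[1] + h * v[1]))
    bwd = f((p[0] - h * v[0], p[1] - h * v[1]))
    return (fwd - bwd) / (2 * h)

# 1-form omega = x dy on R^2, so omega(p)(v) = p[0] * v[1] and d(omega) = dx ^ dy.
def omega(p, v):
    return p[0] * v[1]

# Vector fields X0 = d/dx and X1 = x d/dy, with Lie bracket [X0, X1] = d/dy.
def X0(p):
    return (1.0, 0.0)

def X1(p):
    return (0.0, p[0])

def bracket_X0_X1(p):
    return (0.0, 1.0)   # computed by hand for these two fields

def uncorrected(p):
    # Sum over l of (-1)^l times the derivative along X_l(p) of omega(omit_l(X)).
    term0 = directional(lambda q: omega(q, X1(q)), p, X0(p))
    term1 = directional(lambda q: omega(q, X0(q)), p, X1(p))
    return term0 - term1

def corrected(p):
    # The only pair with k < l is (0, 1), with sign (-1)^(0+1) = -1.
    return uncorrected(p) - omega(p, bracket_X0_X1(p))

def pointwise(p):
    # (dx ^ dy)(X0(p), X1(p)) evaluated directly on the vectors at p.
    a, b = X0(p), X1(p)
    return a[0] * b[1] - a[1] * b[0]

p = (2.0, 3.0)
values = (uncorrected(p), corrected(p), pointwise(p))
```

At the sample point the uncorrected value is approximately 4, while the corrected value and the pointwise value both come out at approximately 2, which is the behaviour that Theorem 61.14.11 asserts in general.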


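61.14.11 THEOREM: Equivalence of vector-field and constant-vector versions of the exterior derivative.
Let $M$ be a $C^2$ manifold. Let $U \in \mathrm{Top}(M)$. Let $W$ be a finite-dimensional real linear space. Then


\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall X \in X^1(T(M)|U)^{m+1},\ \forall p \in U, \]
\[ ((d\omega)(X))(p) = (d\omega)(X(p)), \tag{61.14.4} \]


where $X(p)$ means $(X_\ell(p))_{\ell=0}^m$ for all $p \in U$. In other words,


\[ \forall m \in \mathbb{Z}_0^+,\ \forall \omega \in X^1(\Lambda_m(T(M),W)|U),\ \forall X \in X^1(T(M)|U)^{m+1}, \]
\[ (d\omega)(X) = (d\omega) \circ X. \tag{61.14.5} \]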


PROOF: The assertion follows from Theorem 61.14.7 and Definition 61.14.9. 


61.14.12 REMARK: The vector-version exterior derivative may use general holonomic vector fields. 

It is more or less obvious from Theorem 61.14.11 that the somewhat unnatural-looking “constant” vector
extension fields in Definition 61.10.3 line (61.10.1) may be replaced by arbitrary holonomic tuples of vector
fields which have the specified values at the point where the differential is computed. The definition looks
more natural if holonomic vector-field tuples are specified instead of using vector extension fields which use
coordinate charts.


However, using constant vector extension fields in Definition 61.10.3, which are constructed from coordinate
charts, has the advantage that such fields are easily shown to exist and have the required differentiability. The
existence of suitable general holonomic vector-field tuples may be shown most easily by presenting constant
vector extension fields as examples! So once again, the attempt to avoid coordinate charts only hides them.
Elegant definitions very often have some quite inelegant constructions hiding beneath their shiny surface if
one takes the cover off to look inside.

Theorem 61.14.13 shows that the constant vector extension fields which are used to construct the vector- 
tuple version of the exterior derivative in Definition 61.10.3 can be replaced with general holonomic tuples 
of vector fields. 


61.14.13 THEOREM: General holonomic vector-field tuples to define vector-version exterior derivative.
Let $M$ be a $C^2$ manifold. Let $U \in \mathrm{Top}(M)$. Let $W$ be a finite-dimensional real linear space. Let $m \in \mathbb{Z}_0^+$.
Let $p \in U$. Let $V \in T_p(M)^{m+1}$. Let $X \in X^1(T(M)|U)^{m+1}$ be a holonomic $(m+1)$-tuple of $C^1$ vector fields
on $M$ such that $X_\ell(p) = V_\ell$ for all $\ell \in \mathbb{Z}[0,m]$.

Then the exterior derivative $d : X^1(\Lambda_m(T(M),W)|U) \to X^0(\Lambda_{m+1}(T(M), W)|U)$ for differential $m$-forms
on $U$, valued in $W$, satisfies


\[ \forall \omega \in X^1(\Lambda_m(T(M),W)|U), \quad (d\omega)(V) = \sum_{\ell=0}^m (-1)^\ell \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell(X)). \]


PROOF: Let $\omega \in X^1(\Lambda_m(T(M),W)|U)$. Let $\psi \in \mathrm{atlas}_p(M)$. Then


\begin{align*}
(d\omega)(V) &= (d\omega)(X(p)) \\
&= ((d\omega)(X))(p) \tag{61.14.6} \\
&= \sum_{\ell=0}^m (-1)^\ell \partial_{X_\ell(p)}(\omega \circ \mathrm{omit}_\ell(X)) \tag{61.14.7} \\
&= \sum_{\ell=0}^m (-1)^\ell \partial_{V_\ell}(\omega \circ \mathrm{omit}_\ell(X)),
\end{align*}


where line (61.14.6) follows from Theorem 61.14.11 line (61.14.4), and line (61.14.7) follows from Definitions
61.14.9 and 61.5.19.


62.0.1 REMARK: The inescapability of Lie groups for parallelism and curvature in differential geometry.
The core concept of differential geometry is curvature. Intrinsic curvature is defined as the limit of parallel 
transport around closed curves “divided by” the tensorial area enclosed by the curve. Differential parallel 
transport is defined on differentiable fibre bundles in terms of horizontal lift functions. (See Sections 67.4 
and 67.5 for horizontal lift functions.) Parallel transport, as the name suggests, preserves the structure group 
of the fibre bundle on which it is defined. In most systems of interest for physical applications of differential 
geometry, the structure group is a Lie group, and the differential of parallel transport (i.e. the horizontal lift 
function) is therefore constrained to be equivalent to an element of the Lie algebra of the structure group. 
(See Definition 67.5.4 (v).) Curvature is then defined as the limiting ratio of the integral of such a Lie algebra 
element divided by the tensorial area of the curve in the base space of the fibre bundle. (See Section 70.4 for 
curvature of general connections.) This yields the "ratio" of a Lie bracket of vector fields on the Lie group 
"divided by" the tensorial area of the base space path. 


The use of Lie groups and Lie algebras in differential geometry is inescapable. Even in the late 19th century 
tensor calculus formulation of parallelism and curvature, the Lie algebra of a Lie group was tacitly present. 
The Christoffel array numerically quantifies parallel transport of unit vectors along lines parallel to the axes
of Cartesian charts. Thus one may write $\Gamma^i_{jk} = -\theta_{e_k}(e_j)^i$, where $\theta$ denotes the “horizontal lift function”
which maps each direction $V \in T_p(M)$ and fibre element $z \in E_p$ to a “lifted velocity” $\theta_V(z) \in T(E)$ on the
total space $E$ of a differentiable fibre bundle with base space $M$. When this differential parallelism vector
$\theta_V(z)$ is mapped by a fibre chart $\phi$ to the tangent space on the fibre space $F = \mathbb{R}^n$ (where $n = \dim(M)$),
the result must be a vector $\phi_*(\theta_V(z)) \in T_{\phi(z)}(F)$. (See Definition 58.9.4 for the induced map $\phi_*$.) This idea
is illustrated in Figure 62.0.1.


The composite $\phi_* \circ \theta_V$ must be a vector field on $F$ corresponding to an element of the Lie algebra of the
structure group $G$ on $F$. In this case, $G = \mathrm{GL}(n,\mathbb{R})$. So for each $k \in \mathbb{N}_n$, the map $e_j \mapsto -\sum_{i=1}^n \Gamma^i_{jk} e_i$
determines an element of the Lie algebra of $\mathrm{GL}(n,\mathbb{R})$. This yields the horizontal lift function:


\[ \theta : V = \sum_{k=1}^n v^k e_k \mapsto \Bigl( z = \sum_{j=1}^n z^j e_j \mapsto \theta_V(z) = -\sum_{i,j,k=1}^n \Gamma^i_{jk} v^k z^j e_i \Bigr). \]


Figure 62.0.1 Correspondence between vector fields on a fibre set and the fibre space
(diagram: $\theta_V \in X(T(E)|E_p)$ is mapped by $\phi_*$ to $\phi_* \circ \theta_V \in X(T(F))$)


The Lie algebra element $\theta_V$ is the infinitesimal action (i.e. vector field) on the fibre space corresponding to
the base space velocity $V \in T_p(M)$. Clearly the Lie algebra of a Lie group has thus been implicit in notions
of parallelism for about 150 years. The Christoffel array is the array of components of an element of the Lie
algebra of $\mathrm{GL}(n, \mathbb{R})$. But when one wishes to generalise affine connections on tangent bundles to general
connections on differentiable fibre bundles, the role of Lie groups and Lie algebras in the interpretation of
parallelism and connections must be made explicit.
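
To make the identification of the Christoffel array with a Lie algebra element concrete, the following Python sketch builds the matrix $A_V$ with entries $(A_V)^i{}_j = -\sum_k \Gamma^i_{jk} v^k$ for polar coordinates on the plane. It is only an illustration: the index layout `Gamma[i][j][k]` for $\Gamma^i_{jk}$, the sign convention for the lift, and the function names are assumptions made for this example, while the polar-coordinate values of the Christoffel symbols are the standard ones.

```python
def christoffel_polar(r):
    """Christoffel array Gamma[i][j][k] for polar coordinates (r, theta) on the plane.

    Index 0 is r and index 1 is theta.  The nonzero entries are
    Gamma^r_{theta theta} = -r and Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r.
    """
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = 1.0 / r
    G[1][1][0] = 1.0 / r
    return G

def lift_matrix(G, v):
    """Matrix A with A[i][j] = -sum_k Gamma[i][j][k] * v[k], an element of gl(2)."""
    n = len(v)
    return [[-sum(G[i][j][k] * v[k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lift(G, v, z):
    """Components of theta_V(z) = -sum Gamma^i_{jk} v^k z^j e_i."""
    A = lift_matrix(G, v)
    return [sum(A[i][j] * z[j] for j in range(len(z))) for i in range(len(z))]

G = christoffel_polar(2.0)
A = lift_matrix(G, [0.0, 1.0])        # base-space velocity V = d/dtheta at r = 2
w = lift(G, [0.0, 1.0], [1.0, 0.0])   # fibre element z = e_r
```

For each base-space velocity the result is a plain $2 \times 2$ real matrix, which is how an element of the Lie algebra of $\mathrm{GL}(2, \mathbb{R})$ appears in coordinates, and the lifted velocity depends linearly on the fibre element through this matrix.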


The well-known tensor calculus expressions, in chart coordinates, for the Riemann curvature tensor
components (such as $R^i{}_{jk\ell}$) in terms of derivatives of the Christoffel array can be obtained by specialising the more
general abstract expressions for the curvature of general connections on differentiable fibre bundles to affine
connections on tangent bundles. (The general case is obtained essentially by computing the exterior
derivative of the Lie algebra element which specifies differential parallel transport with respect to “infinitesimal”
closed curves in the base space.)


It is possible to obtain curvature formulas as “rabbits out of hats” within the frameworks of various modern 
formalisations of connections and curvature. It is preferable, however, to first derive all curvature formulas 
in an ontologically clear way from parallelism and holonomy groups for general differentiable fibre bundles, 
and then specialise these as required. This gives not only confidence that the formulas are correct, but 
also insight into the meaning of each symbol, and furthermore makes it much more feasible to generalise all 
concepts to more general scenarios where the simplifying assumptions of even the most general definitions 
of connections no longer hold. (Consider, for example, how parallelism might be defined on non-smooth 
boundaries of manifolds.) The alternative is to be limited to formulas which have “fallen from the sky on 
golden tablets” due to lack of knowledge of their true origins. 


62.0.2 REMARK: The necessity of Lie groups for the dynamics of reversible systems. 

The motion of a rigid body in space satisfies invariance with respect to the group of possible motions of the
body. In other words, each state of the body is related to any other possible state by an element of a group.
If this group has a finite-dimensional differentiable manifold structure, it must be a Lie group of actions on 
the set of states of the body. Any rate of change of the state, in particular with respect to time, must be an 
element of the corresponding Lie algebra. Similarly, in the case of more general dynamical systems which 
are reversible and finitely-parametrisable, the groups of state transitions of such systems are Lie groups. 
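
The statement that a rate of change of state lies in the Lie algebra can be illustrated for rotations. In the following Python sketch, which is an example added for illustration with hypothetical function names, the state of a body spinning about the z-axis is a rotation matrix $R(t)$, and the product $R'(t) R(t)^T$ is approximated by a central difference. For a curve in the rotation group this product should be an antisymmetric matrix, which is an element of the Lie algebra of the rotation group.

```python
import math

def rotation_z(t):
    """Rotation about the z-axis by angle t, as a 3x3 matrix (list of rows)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def rate_element(state, t, h=1e-6):
    """Approximate (d/dt state(t)) * state(t)^T using a central difference."""
    fwd, bwd = state(t + h), state(t - h)
    deriv = [[(fwd[i][j] - bwd[i][j]) / (2 * h) for j in range(3)]
             for i in range(3)]
    return matmul(deriv, transpose(state(t)))

# State of a body spinning at 2 radians per unit time about the z-axis.
rate = rate_element(lambda t: rotation_z(2.0 * t), 0.7)
```

The computed matrix is antisymmetric to within discretisation error, and its only independent nonzero entry is the spin rate 2.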


62.0.3 REMARK:  Differentiable groups, and topological and differentiable groups of transformations. 
Three kinds of group are presented in Chapters 62-63. 


(1) Lie group: a group with a compatible analytic manifold structure. 


(2) Diffeomorphism group: a topological group of diffeomorphisms of a differentiable manifold. 
Sections 63.1, 63.2, 63.3. 


(3) Lie transformation group: an analytic group of diffeomorphisms of a differentiable manifold. 
Sections 63.4, 63.5, 63.6, 63.7. 


Since a group may be regarded as a transformation group of itself, (1) may be considered as a special case 
of (3), while (3) is clearly a special case of (2). Diffeomorphism groups which do not have a differentiable 
manifold structure in case (2) are not, strictly speaking, “differentiable groups”, but it is convenient to 
present them together with Lie groups and Lie transformation groups. 


62.0.4 REMARK: Circularity of definitions of differentiable fibre bundles and tangent bundles. 
There is a kind of circularity between the definitions of differentiable manifolds, differentiable fibre bundles
and Lie groups. On a sufficiently differentiable finite-dimensional manifold M, a differentiable tangent
bundle T(M) may be constructed. A differentiable tangent bundle is a differentiable fibre bundle. For 
a finite-dimensional differentiable fibre bundle, a finite-dimensional differentiable group G(T(M)) may be 
constructed. And a finite-dimensional differentiable structure group G is a finite-dimensional differentiable 
manifold which has a differentiable tangent bundle T(G). This cycle of relations is illustrated in Figure 62.0.2. 


Figure 62.0.2 Construction loop for manifolds, tangent bundles and Lie groups
(diagram with nodes: differentiable manifold, tangent bundle, structure group; edge labels: unique construction, unique construction, atlas)


One has thus a sequence of constructions $M \to T(M) \to G(T(M)) \to T(G(T(M))) \to G(T(G(T(M))))$,
and so forth, because finite-dimensionality is conserved by these constructions. For a general $n$-dimensional
manifold $M$, the group $G(T(M))$ will typically be $(n \times n)$-dimensional. It follows that $G(T(G(T(M))))$ will
be $(n \times n) \times (n \times n)$-dimensional, and so forth. The order of presentation of these structures in this book is
first differentiable manifolds in Chapter 51, then tangent bundles in Chapters 54, 55 and 56, differentiable 
groups in Chapter 62, differentiable transformation groups in Chapter 63, and differentiable fibre bundles in 
Chapters 64-66. 
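
The growth of dimension along this construction sequence can be tabulated with a few lines of Python. The sketch below is illustrative only and assumes that the structure group constructed for the tangent bundle of a $d$-dimensional manifold is the general linear group of dimension $d \times d$. The function name is hypothetical.

```python
def construction_dimensions(n, steps):
    """Dimensions of M, G(T(M)), G(T(G(T(M)))), ... for an n-dimensional M.

    Assumes the structure group of the tangent bundle of a d-dimensional
    manifold is GL(d), which has dimension d * d.
    """
    dims = [n]
    for _ in range(steps):
        dims.append(dims[-1] ** 2)
    return dims

dims = construction_dimensions(3, 2)   # dimensions of M, G(T(M)), G(T(G(T(M))))
```

For $n = 3$ the list is 3, 9, 81, in agreement with the $(n \times n)$ and $(n \times n) \times (n \times n)$ counts in the text.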

Importantly, the tangent bundle of a manifold is not specified as part of the definition of a differentiable 
manifold, and a structure group is not specified as part of the definition of a differentiable fibre bundle. 
Some texts do include the structure group in the specification tuple of a differentiable fibre bundle. But 
this is neither necessary nor beneficial. The cycle of constructions in Figure 62.0.2 is safe because they are 
merely constructions, not components of specification tuples. Figure 62.0.3 illustrates a sequence of such 
constructions for an n-dimensional differentiable manifold. 


Figure 62.0.3 Construction sequence for manifolds, tangent bundles and Lie groups
(diagram nodes include $T(\mathrm{GL}(n))$, $\mathrm{GL}(n \times n)$, $T(\mathrm{GL}(n \times n))$ and $\mathrm{GL}((n \times n) \times (n \times n))$)


62.1. Hilbert's fifth problem 


62.1.1 REMARK: Relations between various classes of topological groups and Lie groups. 
Figure 62.1.1 shows some family relations between topological groups and Lie groups. 


62.1.2 REMARK: History of the proof that continuous groups are analytic. 

Continuity can be substituted for analyticity in Definition 62.2.4, and the analyticity then follows. (See
Sulanke/Wintgen [40].) Kobayashi/Nomizu [19], page 38, require a Lie group to be a $C^\infty$ manifold and state
that it follows that the manifold is real analytic (page 43).


Figure 62.1.1 Family tree of topological and Lie groups
(diagram, from top to bottom: topological group $(G, T_G, \sigma_G)$; locally compact top. group and locally connected top. group, both $(G, T_G, \sigma_G)$; locally Euclidean group $(G, T_G, \sigma_G)$; Lie group $(G, A_G, \sigma_G)$, with the last link labelled EDM2, 423.N)


EDM2 [113], 423.N, says that Hilbert’s fifth problem (in 1900) posed precisely this question, and it was
resolved in the positive in 1952. “It was proved [in 1952] that any locally connected finite-dimensional
locally compact group is a Lie group.” This means that there exists a differentiable locally Cartesian
atlas $A_G$, compatible with the topology $T_G$ on the topological group, for which the group $(G, A_G, \sigma_G)$ is a Lie
group.

The solution of Hilbert’s 5th problem is presented in the book by Montgomery/Zippin [118]. In section 4.10, 
they give various theorems on this subject. Two of their theorems are as follows. 


(i) A locally Euclidean (i.e. Cartesian) group has no small subgroups and is isomorphic to a Lie group. 


(ii) A locally compact group which is finite-dimensional and locally-connected is a Lie group. 


Theorem (i) is stated in Montgomery/Zippin [118], pages 70 and 184. They attribute it to Gleason [179] and
Montgomery/Zippin [189].

Theorem (ii) is stated in Montgomery/Zippin [118], page 185. (The term “finite-dimensional” here apparently 
refers to topological dimension or Lebesgue dimension. See Section 33.8 for topological dimension.) 


62.1.3 REMARK: The regularity of Lie transformation groups. 

In the case of Lie transformation groups, if the space acted upon by the group is $C^k$ for some $k \in \mathbb{Z}_0^+$ and
the group action is likewise $C^k$ for a fixed group element, then the group action can be proved to be $C^k$
with respect to the group elements also. It does not seem that analyticity follows in this case. Therefore
the group actions of Lie transformation groups (Section 63.4) are defined here as having only $C^k$ regularity
for some $k \in \mathbb{Z}_0^+$, although the groups themselves are defined to be analytic.


62.1.4 REMARK: The relations between various classes of groups and transformation groups. 

Figure 62.1.2 shows some of the relations between topological transformation groups and Lie transformation 
groups. The symbols A_G and A_M refer to atlases on sets G and M respectively. The atlases imply 
corresponding topologies for the respective sets. 


EDM2 [113], 431.H (11), page 1637, says the following. 


Suppose that M is a C^1 manifold and G is a topological transformation group of M acting effectively 
on M. If G is locally compact and the mapping x ↦ g(x) of M is of class C^1 for each element g 
of G, then G is a Lie transformation group of M. 


This seems to cover all of the cases of interest. Thus if the group, the manifold and the group action are C^1, 
then the group is a Lie transformation group, which implies that it is analytic. This implies that there 
is not much point in defining C^k transformation groups for k > 1. However, it seems clear that there is 
a point in defining varying levels of regularity for the manifold M and for the action μ : G × M → M. 
Of particular relevance to this is the theorem statement in (1), which is paraphrased from 
Montgomery/Zippin [118], page 212. 


(1) Let (G, M, σ, μ) be a Lie transformation group of a manifold M. Let k ∈ ℤ̄⁺₀. Suppose that M is a 
C^k manifold and L_g : M → M is C^k (i.e. L_g ∈ C^k(M, M)) for all g ∈ G. Then μ ∈ C^k(G × M, M). If 
M is analytic and L_g : M → M is analytic, then the group action μ is analytic. 
[Figure 62.1.2: tree diagram with nodes "transformation group (G, X, σ_G, μ)", "topological group (G, T_G, σ_G)", 
"transf. group of top. space (G, X, T_X, σ_G, μ)", "Lie group (G, A_G, σ_G)", 
"top. transf. group of top. space (G, T_G, X, T_X, σ_G, μ)" and "C^k Lie transf. group (G, A_G, M, A_M, σ_G, μ)". 
Link labels: "EDM2, 431.H(10): M loc. compact, T_G = compact-open top., M loc. connected or unif. top. space, 
G acts equicontinuously on M" and "EDM2, 431.H(11): eff. top. transf. group, M is C^1, G loc. compact, 
μ is C^1 w.r.t. M".] 
Figure 62.1.2 Family tree of topological and Lie transformation groups 


62.2. Differentiable and analytic groups 


62.2.1 REMARK: Adding differentiable structure to groups. 

The C^k classes of differentiable groups in Definition 62.2.2 are derived from the plain algebraic groups in 
Definition 17.3.2 by adding a C^k atlas for differentiable structure and requiring the group operation to be 
consistent with this. The only difficulty which arises here is that even groups which are given consistent 
topological manifold structure are fairly inevitably “promoted” to analytic groups as alluded to in Section 62.1. 
However, the fact that one may obtain analyticity “for free” does not invalidate Definition 62.2.2. 


In practice, one does not examine a group, determine that it is continuous, and then infer from this that it 
must be analytic. In practical situations, it is usually obvious that a group is either analytic or not analytic. 
If there are no groups which are C^k but not analytic, that is not a problem because one rarely requires a 
group to be not analytic. In this way, one may avoid the need to understand the very difficult proof that 
analyticity is implied by continuity or differentiability under some technical assumptions. One may simply 
assume whichever level of differentiability or analyticity is required for any particular purpose. Consequently 
definitions for both differentiable and analytic groups are given here. 


62.2.2 DEFINITION: A C^k differentiable group, for k ∈ ℤ̄⁺₀, is a tuple G −< (G, A_G, σ) such that 

(i) G −< (G, σ) is a group, 

(ii) G −< (G, A_G) is a C^k differentiable manifold, 

(iii) the map σ : G × G → G is C^k differentiable, 

(iv) the inversion map g ↦ g⁻¹ from G to G is C^k differentiable. 


62.2.3 REMARK:  Differentiability conditions for differentiable groups. 

Malliavin [28], page 156, defines Lie groups to have a manifold structure of class C?, and a group action 
and inverse map of class C?. Gallot/Hulin/Lafontaine [13], page 27, require class C^. EDM2 [113], 249.A, 
requires an analytic manifold and analytic group operations. Kobayashi/Nomizu [19], page 38, require C^∞ 
regularity, but they say on page 43 that the C^∞ condition may be replaced with analyticity. 


The assumption of analyticity for Lie groups in Definition 64.8.3 can be proved from much weaker conditions, 
such as mere differentiability or even just continuity. However, the proof of this is extremely difficult. (This is 
discussed in Remark 62.1.2.) Therefore to avoid the issue, analyticity is assumed for the group's differentiable 
structure and its operations. 


To try to obtain maximum consistency between the definitions here and the definitions in the literature, 
the name “Lie” is generally used here when the group is analytic. Thus “Lie” can be read as a synonym 
for “analytic” for both Lie groups in Definition 62.2.4 and Lie transformation groups in Definition 63.4.2. 
However, in the case of Lie transformation groups, it is only the group structure and action on itself which 
are assumed analytic while the passive set and the action of the group on the passive set are assumed to be 
merely C^k differentiable for some k ∈ ℤ̄⁺₀. 
62.2.4 DEFINITION: A (real) Lie group or (real) analytic group is a triple G −< (G, A_G, σ) such that 

(i) G −< (G, σ) is a group, 

(ii) G −< (G, A_G) is a finite-dimensional real analytic manifold, 

(iii) the operation σ : G × G → G is real-analytic with respect to A_G, 

(iv) the inversion map g ↦ g⁻¹ from G to G is real-analytic with respect to A_G. 


62.2.5 REMARK: The implicit function theorem implies analyticity of the group inverse map. 

Many authors require the group operation (x, y) ↦ σ(x, y⁻¹) = xy⁻¹ from G × G to G to be analytic for a 
Lie group. The purpose of this is to imply that the map y ↦ y⁻¹ is analytic. However, this is superfluous 
since, as remarked in Montgomery/Zippin [118], page 49, the analyticity of the map g ↦ g⁻¹ follows from 
the analyticity of σ by the implicit function theorem. So condition (iv) of Definition 62.2.4 is superfluous. 


62.2.6 THEOREM: The left and right translation maps of elements of a Lie group are C^∞. 
Let G be a Lie group. For g ∈ G, define L_g : G → G and R_g : G → G by L_g : x ↦ gx and R_g : x ↦ xg. (In 
other words, L_g = σ(g, ·) and R_g = σ(·, g).) 
(i) ∀g ∈ G, L_g : G → G is a C^∞ diffeomorphism. 
(ii) ∀g ∈ G, R_g : G → G is a C^∞ diffeomorphism. 


PROOF: Part (i) follows from Theorem 52.6.8 (ii) applied to L_g and (L_g)⁻¹ = L_{g⁻¹}. 
Part (ii) follows from Theorem 52.6.8 (i) applied to R_g and (R_g)⁻¹ = R_{g⁻¹}. 


62.2.7 REMARK:  Differentiability of left and right infinitesimal actions of a Lie group. 

Theorem 62.2.8 implicitly asserts that the left and right invariant vector fields of a Lie group are C^∞. This 
is also shown using more direct computations in the proof of Theorem 62.4.16 (v). 

The functions σ(g1, ·) : G → G and σ(·, g2) : G → G in Theorem 62.2.8 are identical to the left and right 
translation operators L^G_{g1} and R^G_{g2} in Definitions 62.3.3 and 62.6.2 respectively. Thus line (62.2.1) asserts 
that (dσ)_{g1,g2}(u1, u2) = (dL^G_{g1})_{g2}(u2) + (dR^G_{g2})_{g1}(u1) = L^T_{g1}(u2) + R^T_{g2}(u1) in terms of the left and right 
translation operators L^T_{g1} and R^T_{g2} in Definitions 62.3.9 and 62.6.3 respectively. (See Theorem 62.8.11.) 


62.2.8 THEOREM: The differential of a Lie group operation equals a sum of left and right differentials. 
Let G −< (G, A_G, σ) be a Lie group. Then σ_* : T(G × G) → T(G) is a C^∞ map and 
∀(g1, g2) ∈ G × G, ∀(u1, u2) ∈ T_{g1}(G) × T_{g2}(G), 
(dσ)_{g1,g2}(u1, u2) = (dσ(g1, ·))_{g2}(u2) + (dσ(·, g2))_{g1}(u1) (62.2.1) 
= (dL_{g1})_{g2}(u2) + (dR_{g2})_{g1}(u1). 


PROOF: The differential map σ_* : T(G × G) → T(G) is C^∞ by Definition 62.2.4 (iii) and Theorem 58.9.12. 
The formulas for (dσ)_{g1,g2}(u1, u2) follow from Theorems 58.6.6 and 62.2.6. 
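
As a hedged numerical illustration of line (62.2.1), the following Python sketch assumes the concrete matrix group GL(2, ℝ) with matrix multiplication as σ. Under that assumption, tangent vectors are identified with 2×2 matrices, and the left and right differentials reduce to u2 ↦ g1 u2 and u1 ↦ u1 g2. The helper names are hypothetical and are not part of this book.

```python
# Assumption: G = GL(2, R), sigma = matrix product, tangent vectors = 2x2 matrices.

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_axpy(a, t, u):
    """Return a + t*u entrywise."""
    return [[a[i][j] + t * u[i][j] for j in range(2)] for i in range(2)]

def d_sigma(g1, g2, u1, u2, h=1e-6):
    """Central difference of t -> sigma(g1 + t*u1, g2 + t*u2) at t = 0."""
    plus = mat_mul(mat_axpy(g1, h, u1), mat_axpy(g2, h, u2))
    minus = mat_mul(mat_axpy(g1, -h, u1), mat_axpy(g2, -h, u2))
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

g1 = [[1.0, 2.0], [0.0, 1.0]]
g2 = [[2.0, 0.0], [1.0, 1.0]]
u1 = [[0.5, -1.0], [0.3, 0.2]]
u2 = [[-0.4, 0.7], [1.1, 0.0]]

lhs = d_sigma(g1, g2, u1, u2)
# Right side of (62.2.1) in matrix form: g1*u2 + u1*g2.
rhs = mat_axpy(mat_mul(g1, u2), 1.0, mat_mul(u1, g2))
sum_rule_error = max(abs(lhs[i][j] - rhs[i][j])
                     for i in range(2) for j in range(2))
```

The difference quotient agrees with the sum of the left and right terms up to rounding error, which is what the theorem predicts for this special case.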


62.2.9 THEOREM: The differential of the inversion map is the negative identity map. 
Let G be a Lie group. Let j : G → G be the inversion map j : g ↦ g⁻¹. Then (dj)_e = −id_{T_e(G)}. 


PROOF: Let u ∈ T_e(G). Let ψ ∈ atlas_e(G). Then u = t_{e,w,ψ} for some w ∈ ℝ^n, where n = dim(G). Define 
L : ℝ → ℝ^n by L : t ↦ ψ(e) + tw. Since Range(ψ) ∈ Top_{ψ(e)}(ℝ^n), it follows that L⁻¹(Range(ψ)) ∈ 
Top_0(ℝ). Therefore there is an open interval I ∈ Top_0(ℝ) with I ⊆ L⁻¹(Range(ψ)). Define γ : I → G by 
γ : t ↦ ψ⁻¹(L(t)) = ψ⁻¹(ψ(e) + tw). Then γ′(0) = t_{e,w,ψ} = u by Definition 57.9.2. 

Define γ̃ : I → G by γ̃ : t ↦ γ(t)⁻¹ = j(γ(t)). Then by the chain rule, γ̃′(0) = (dj)_e(γ′(0)) = (dj)_e(u). But 
σ(γ̃(t), γ(t)) = e for all t ∈ I. So ∂_t(σ(γ̃(t), γ(t))) = ∂_t e = 0_{T_e(G)} for all t ∈ I. Therefore by Theorem 62.2.8, 

0_{T_e(G)} = ∂_t(σ(γ̃(t), γ(t)))|_{t=0} = (dσ)_{e,e}(γ̃′(0), γ′(0)) = (dL_e)_e(γ′(0)) + (dR_e)_e(γ̃′(0)) = γ′(0) + γ̃′(0). 

So γ̃′(0) = −γ′(0) = −u. Therefore (dj)_e(u) = −u. Thus (dj)_e(u) = −u for all u ∈ T_e(G). Hence (dj)_e = −id_{T_e(G)}. 
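
The statement (dj)_e = −id can be checked numerically in a special case. The sketch below assumes the matrix group GL(2, ℝ), with the identity chart on its entries, and uses the curve γ(t) = e + tw from the proof. The function names are illustrative only.

```python
# Assumption: G = GL(2, R); the curve gamma(t) = e + t*w passes through the identity e.

def inverse(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def curve(w, t):
    """The curve gamma(t) = e + t*w through the identity matrix."""
    return ((1.0 + t * w[0][0], t * w[0][1]),
            (t * w[1][0], 1.0 + t * w[1][1]))

def inversion_derivative(w, h=1e-6):
    """Central difference of t -> gamma(t)^{-1} at t = 0."""
    plus = inverse(curve(w, h))
    minus = inverse(curve(w, -h))
    return tuple(tuple((plus[i][j] - minus[i][j]) / (2 * h) for j in range(2))
                 for i in range(2))

w = ((0.3, -1.2), (0.7, 0.5))
dj = inversion_derivative(w)
# Theorem 62.2.9 predicts dj = -w, so dj + w should vanish up to rounding.
inversion_error = max(abs(dj[i][j] + w[i][j])
                      for i in range(2) for j in range(2))
```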


62.2.10 REMARK: Complex Lie groups. 

Complex Lie groups have enormous importance in fundamental particle physics, especially the special unitary 
groups in gauge theory. So they cannot be ignored. However, they do not require definitions for complex 
manifolds and complex analyticity because an n-dimensional complex manifold can be regarded as a 2n- 
dimensional real manifold. So Definition 62.2.4 is adequate for describing complex Lie groups. 
62.2.11 REMARK: Concrete versus abstract Lie groups. 

In practical applications, Lie groups are often presented as concrete matrix groups as in Section 25.14, not 
as the abstract style of group with a differentiable atlas as in Definition 62.2.4. There are more than two 
levels of abstraction of Lie groups. (Four related styles of Lie algebras are outlined in Remark 19.10.1.) 


(1) Classical matrix groups containing invertible matrices in M_{n,n}(K) = K^{n×n} for a field K and n ∈ ℤ⁺. 
Here both the group elements and the implied linear space K^n are concrete. 


(2) Groups of linear automorphisms in Aut(V) for a linear space V over a field K. (See Notation 23.1.9.) 
Here the linear space is abstract, but the linear transformations are relatively concrete. 


(3) Fully abstract Lie groups as in Definition 62.2.4. 


These three levels of abstraction are often combined in particular contexts. With fully concrete matrix 
definitions, Lie algebra elements may be expressed as concrete matrices, and then the exponential map from 
the Lie algebra to the Lie group may be expressed as a power series of matrices. In physics texts, this very 
concrete approach has advantages for computation, but it also has the consequence that all other definitions 
are expressed concretely, which is sometimes difficult to reconcile with the corresponding abstract definitions. 
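
The concrete power-series form of the exponential map mentioned above can be sketched as follows. This is a minimal illustration which assumes a truncated series with 30 terms and a generator of plane rotations. The function names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=30):
    """Truncated power series: sum of a^k / k! for k = 0 .. terms-1."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, a)
        term = [[x / k for x in row] for row in term]  # now equals a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

theta = 0.8
generator = [[0.0, -theta], [theta, 0.0]]  # a Lie algebra element of SO(2)
rotation = mat_exp(generator)
expected = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
exp_error = max(abs(rotation[i][j] - expected[i][j])
                for i in range(2) for j in range(2))
```

The series reproduces the rotation matrix for angle theta, which is the group element corresponding to this antisymmetric generator.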


62.3. Left translation operators on Lie groups 


62.3.1 REMARK: Invariants of transformation groups are figures which are invariant under group actions. 
Invariants of groups are properties or structures which are invariant under group actions. Group actions 
are initially defined on the elements of a group, but they may be extended to actions on subsets, functions, 
operators, fields, and other kinds of "figures". (General invariant figures of transformation groups are 
discussed in Section 20.9.) Definition 62.3.3 presents the action of group elements on some of the kinds of 
figures which are relevant to Lie groups. In this case, the passive set for the group actions is the group itself. 
Invariance under such group actions is presented in Section 62.4. 


62.3.2 REMARK: Some extended left action definitions for Lie groups. 

The left translation operators in Definition 62.3.3 are required for the definition of the Lie algebra of a 
Lie group. The superscripts on the symbols for these operators distinguish the kinds of spaces which they 
operate on: G for group elements, C for continuous real-valued functions, T for tangent vectors, and F for 
fields. The superscript is omitted when the application space is obvious. As one would expect, all of the left 
translation operators in Definition 62.3.3 become identity maps when g equals the identity e of G. 


62.3.3 DEFINITION: Let G be a Lie group. Let g ∈ G. 
The left translation operator (for group elements) L^G_g : G → G is defined by 

∀x ∈ G, L^G_g(x) = gx. 

The left translation operator (for real-valued functions) L^C_g : C^0(G) → C^0(G) is defined by 

∀φ ∈ C^0(G), L^C_g(φ) = φ ∘ L^G_{g⁻¹}. 
That is, 
∀φ ∈ C^0(G), ∀x ∈ G, L^C_g(φ)(x) = φ(g⁻¹x). 

The left translation operator (for tangent operators) L̊^T_g : T̊(G) → T̊(G) is defined by 

∀V ∈ T(G), L̊^T_g(∂_V) = ∂_V ∘ L^C_{g⁻¹}. 
That is, 
∀V ∈ T(G), ∀φ ∈ C^1(G), L̊^T_g(∂_V)(φ) = ∂_V(L^C_{g⁻¹}(φ)) 
= ∂_V(φ ∘ L^G_g). 

The left translation operator (for tangent operator fields) L̊^F_g : X^0(T̊(G)) → X^0(T̊(G)) is defined by 

∀X ∈ X^0(T(G)), L̊^F_g(∂_X) = L̊^T_g ∘ ∂_X ∘ L^G_{g⁻¹}. 
That is, 
∀X ∈ X^0(T(G)), ∀x ∈ G, L̊^F_g(∂_X)(x) = L̊^T_g(∂_X(L^G_{g⁻¹}(x))) 
= L̊^T_g(∂_X(g⁻¹x)) 
= ∂_X(g⁻¹x) ∘ L^C_{g⁻¹}. 
That is, 
∀X ∈ X^0(T(G)), ∀x ∈ G, ∀φ ∈ C^1(G), 
L̊^F_g(∂_X)(x)(φ) = ∂_X(L^G_{g⁻¹}(x))(L^C_{g⁻¹}(φ)) 
= ∂_X(g⁻¹x)(φ ∘ L^G_g) 
= ∂_{X(g⁻¹x)}(φ ∘ L^G_g). 


62.3.4 REMARK: Illustration of four kinds of left actions of a Lie group on itself. 
The interpretation of Definitions 62.3.3 and 62.3.9 may be assisted by Figure 62.3.1. 


[Figure 62.3.1: diagram showing the left translation operators L^G_g, L^C_g, L^T_g and L^F_g acting on a point x, 
a function φ, a tangent vector V and a vector field X respectively, with the point x mapped to gx.] 
Figure 62.3.1 Left translation operators for Lie groups 


62.3.5 REMARK: Comparison of tangent vectors and tangent operators for left translation operators. 
Definition 62.3.3 shows the convenience of using tangent operators ∂_V ∈ T̊(G) instead of plain tangent 
vectors V ∈ T(G). Just as in distribution theory, the translation of differential operators may be easily 
expressed in terms of the reverse translation of test functions. For differentiable manifolds, it is necessary 
to convert the tangent operator back into a tangent vector. This conversion is discussed in Remark 54.15.77 
and elsewhere. 


In the case of tangent vector fields X ∈ X^0(T(G)), the notation ∂_X means the pointwise assignment of 
an operator to a vector. In other words, ∂_X ∈ X^0(T̊(G)) is defined by ∂_X(x) = ∂_{X(x)} for all x ∈ G 
and X ∈ X^0(T(G)). 

The left translation operators L̊^T_g and L̊^F_g apply to tangent operators ∂_V and tangent operator fields ∂_X, 
whereas L^T_g and L^F_g apply to non-operator tangent vectors V and vector fields X respectively. 


62.3.6 REMARK: The range of the left translation operator for tangent operators. 
In Definition 62.3.3, L̊^T_g(∂_V) ∈ T̊_{gx}(G) for all V ∈ T_x(G) (i.e. for all ∂_V ∈ T̊_x(G)). So L̊^T_g does not define an 
automorphism of T̊_x(G), but it is an automorphism of the total tangent space T̊(G). 


62.3.7 REMARK: Reverse translation of test function for left translations for vector fields. 
The left translation operator for vector fields L̊^F_g is defined by taking the value of the field X at g⁻¹x, but 
the test function φ must then also be moved back to g⁻¹x in order to apply the vector X(g⁻¹x) to it. 
62.3.8 REMARK: Conversion of left translation operators from tangent operators to tangent vectors. 
Definition 62.3.9 gives the non-operator tangent vector version of Definition 62.3.3 for the left translation 
functions L^T_g and L^F_g. Whereas the tangent operators in Definition 62.3.3 act on test functions, the tangent 
vectors in Definition 62.3.9 use the induced map (L^G_g)_*. 


The induced map (L^G_g)_* in Definition 62.3.9 may be written in terms of the pointwise differential. For 
all V ∈ T(G), (L^G_g)_*(V) = (dL^G_g)_{π(V)}(V) and (L^G_g)_*(X(g⁻¹x)) = (dL^G_g)_{g⁻¹x}(X(g⁻¹x)). Theorem 62.3.10 
asserts that the left translation operators in Definition 62.3.9 act on tangent vectors and vector fields in a 
correct correspondence to the tangent operator vectors and fields in Definition 62.3.3. 


62.3.9 DEFINITION: Let G be a Lie group. Let g ∈ G. 
The left translation operator (for tangent vectors) L^T_g : T(G) → T(G) is defined by 

∀V ∈ T(G), L^T_g(V) = (L^G_g)_*(V) 
= (dL^G_g)_{π(V)}(V). 
In other words, 
∀p ∈ G, ∀V ∈ T_p(G), L^T_g(V) = (dL^G_g)_p(V). 

The left translation operator (for vector fields) L^F_g : X^0(T(G)) → X^0(T(G)) is defined by 

∀X ∈ X^0(T(G)), L^F_g(X) = L^T_g ∘ X ∘ L^G_{g⁻¹} 
= (L^G_g)_* ∘ X ∘ L^G_{g⁻¹}. 
In other words, 
∀X ∈ X^0(T(G)), ∀x ∈ G, L^F_g(X)(x) = L^T_g(X(g⁻¹x)) 
= (L^G_g)_*(X(g⁻¹x)). 


62.3.10 THEOREM: Consistency of left translations of tangent operators and tangent vectors. 
Definition 62.3.9 is consistent with Definition 62.3.3. In other words, 

(i) ∀g ∈ G, ∀V ∈ T(G), ∂_{L^T_g(V)} = L̊^T_g(∂_V), and 
(ii) ∀g ∈ G, ∀X ∈ X^0(T(G)), ∂_{L^F_g(X)} = L̊^F_g(∂_X). 


PROOF: For part (i), ∀g ∈ G, ∀V ∈ T(G), ∀φ ∈ C^1(G), L̊^T_g(∂_V)(φ) = ∂_V(φ ∘ L^G_g) by Definition 62.3.3. 
This equals (L^G_g)_*(∂_V)(φ) by Definition 58.12.8. So ∀g ∈ G, ∀V ∈ T(G), L̊^T_g(∂_V) = (L^G_g)_*(∂_V). This equals 
∂_{(L^G_g)_*(V)} by Theorem 58.12.9. But (L^G_g)_*(V) = L^T_g(V) by Definition 62.3.9. So L̊^T_g(∂_V) = ∂_{L^T_g(V)}. 

For part (ii), L̊^F_g(∂_X)(x)(φ) = ∂_X(g⁻¹x)(φ ∘ L^G_g) by Definition 62.3.3. It follows from Definition 58.12.8 
that ∂_X(g⁻¹x)(φ ∘ L^G_g) = (L^G_g)_*(∂_X(g⁻¹x))(φ). So L̊^F_g(∂_X)(x) = (L^G_g)_*(∂_X(g⁻¹x)) for all x, g ∈ G 
and X ∈ X^0(T(G)). But ∂_X(g⁻¹x) = ∂_{X(g⁻¹x)} by Notation 57.1.16, and it follows from Theorem 58.12.9 
that (L^G_g)_*(∂_{X(g⁻¹x)}) = ∂_{(L^G_g)_*(X(g⁻¹x))}, and (L^G_g)_*(X(g⁻¹x)) = L^F_g(X)(x) by Definition 62.3.9. Therefore 
L̊^F_g(∂_X)(x) = ∂_{L^F_g(X)(x)}. But ∂_{L^F_g(X)(x)} = ∂_{L^F_g(X)}(x) by Notation 57.1.16. So L̊^F_g(∂_X)(x) = ∂_{L^F_g(X)}(x) for 
all x ∈ G. Hence ∂_{L^F_g(X)} = L̊^F_g(∂_X). 


62.3.11 REMARK: The range of the left translation operator for tangent vectors. 
In Definition 62.3.9, L^T_g(V) ∈ T_{gp}(G) for all V ∈ T_p(G). So L^T_g does not define an automorphism of T_p(G), 
but it is an automorphism of the total tangent space T(G) since L^T_g = (L^G_g)_*. 


62.3.12 THEOREM: Covariance of left translation operators. 
Let G be a Lie group. Let g1, g2 ∈ G. 

(i) L^G_{g1} ∘ L^G_{g2} = L^G_{g1g2}. 
(ii) L^C_{g1} ∘ L^C_{g2} = L^C_{g1g2}. 
(iii) L̊^T_{g1} ∘ L̊^T_{g2} = L̊^T_{g1g2}. 
(iv) L̊^F_{g1} ∘ L̊^F_{g2} = L̊^F_{g1g2}. 
(v) L^T_{g1} ∘ L^T_{g2} = L^T_{g1g2}. 
(vi) L^F_{g1} ∘ L^F_{g2} = L^F_{g1g2}. 


PROOF: For part (i), (L^G_{g1} ∘ L^G_{g2})(x) = L^G_{g1}(g2x) = g1g2x = L^G_{g1g2}(x) for all x ∈ G by Definition 62.3.3. 
Therefore L^G_{g1} ∘ L^G_{g2} = L^G_{g1g2}. 

For part (ii), (L^C_{g1} ∘ L^C_{g2})(φ) = L^C_{g1}(φ ∘ L^G_{g2⁻¹}) = φ ∘ L^G_{g2⁻¹} ∘ L^G_{g1⁻¹} = φ ∘ L^G_{g2⁻¹g1⁻¹} = φ ∘ L^G_{(g1g2)⁻¹} = L^C_{g1g2}(φ) 
for all φ ∈ C^0(G). Therefore L^C_{g1} ∘ L^C_{g2} = L^C_{g1g2}. 

For part (iii), (L̊^T_{g1} ∘ L̊^T_{g2})(∂_V) = L̊^T_{g1}(∂_V ∘ L^C_{g2⁻¹}) = ∂_V ∘ L^C_{g2⁻¹} ∘ L^C_{g1⁻¹} = ∂_V ∘ L^C_{g2⁻¹g1⁻¹} = ∂_V ∘ L^C_{(g1g2)⁻¹} = 
L̊^T_{g1g2}(∂_V) for all V ∈ T(G). Therefore L̊^T_{g1} ∘ L̊^T_{g2} = L̊^T_{g1g2}. 

For part (iv), 
∀X ∈ X^0(T(G)), (L̊^F_{g1} ∘ L̊^F_{g2})(∂_X) = L̊^F_{g1}(L̊^T_{g2} ∘ ∂_X ∘ L^G_{g2⁻¹}) 
= L̊^T_{g1} ∘ L̊^T_{g2} ∘ ∂_X ∘ L^G_{g2⁻¹} ∘ L^G_{g1⁻¹} 
= L̊^T_{g1g2} ∘ ∂_X ∘ L^G_{(g1g2)⁻¹} 
= L̊^F_{g1g2}(∂_X). 
Therefore L̊^F_{g1} ∘ L̊^F_{g2} = L̊^F_{g1g2}. 

For part (v), L^T_{g1} ∘ L^T_{g2} = (L^G_{g1})_* ∘ (L^G_{g2})_* = (L^G_{g1} ∘ L^G_{g2})_* = (L^G_{g1g2})_* = L^T_{g1g2} by Definition 62.3.9. 

For part (vi), 
∀X ∈ X^0(T(G)), (L^F_{g1} ∘ L^F_{g2})(X) = L^F_{g1}(L^T_{g2} ∘ X ∘ L^G_{g2⁻¹}) 
= L^T_{g1} ∘ L^T_{g2} ∘ X ∘ L^G_{g2⁻¹} ∘ L^G_{g1⁻¹} 
= L^T_{g1g2} ∘ X ∘ L^G_{(g1g2)⁻¹} 
= L^F_{g1g2}(X). 
Therefore L^F_{g1} ∘ L^F_{g2} = L^F_{g1g2}. 
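
The order of composition in the covariance rule for the function operator L^C_g can be checked on a non-commutative example. The following sketch assumes the matrix group GL(2, ℝ) and an arbitrary test function φ. All names are hypothetical.

```python
# Assumption: G = GL(2, R); L_g^C(phi) is the function x -> phi(g^{-1} x).

def mul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def inv(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def left_c(g, phi):
    """Left translation of a function: L_g^C(phi) = phi o L_{g^{-1}}^G."""
    g_inv = inv(g)
    return lambda x: phi(mul(g_inv, x))

def phi(x):
    """An arbitrary continuous test function on 2x2 matrices."""
    (a, b), (c, d) = x
    return a + 2.0 * b - 3.0 * c * d

g1 = ((1.0, 2.0), (0.0, 1.0))
g2 = ((2.0, 0.0), (1.0, 1.0))
x = ((0.5, -1.0), (3.0, 2.0))

composed = left_c(g1, left_c(g2, phi))(x)  # (L_{g1}^C o L_{g2}^C)(phi)(x)
direct = left_c(mul(g1, g2), phi)(x)       # L_{g1 g2}^C(phi)(x)
swapped = left_c(mul(g2, g1), phi)(x)      # L_{g2 g1}^C(phi)(x), for contrast
```

Here `composed` and `direct` agree, whereas `swapped` differs because g1 and g2 do not commute.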


62.3.13 THEOREM: Some basic properties of left translations by inverse group elements. 
Let G be a Lie group. 

(i) ∀g ∈ G, L^G_g ∘ L^G_{g⁻¹} = id_G = L^G_{g⁻¹} ∘ L^G_g. 

(ii) ∀g ∈ G, (dL^G_{g⁻¹})_g ∘ (dL^G_g)_e = id_{T_e(G)} = (dL^G_g)_{g⁻¹} ∘ (dL^G_{g⁻¹})_e. 

(iii) ∀g ∈ G, ((dL^G_g)_e)⁻¹ = (dL^G_{g⁻¹})_g. 


PROOF: Part (i) follows from Theorem 62.3.12 (i) because L^G_e = id_G. 


Part (ii) follows from Theorem 62.3.12 (v) and Definition 62.3.9. Alternatively apply Theorem 58.5.4, the 
chain rule for differentials of inverses of C^1 maps, to part (i). 


Part (iii) follows from part (ii). 


62.3.14 REMARK: Left translation operators for tangent covectors and differential forms. 

Left translation operators may be extended to tangent covectors and differential forms by observing that 
they are duals of tangent vectors and vector fields. In Definition 62.3.3, the operator L^C_g for continuous 
functions is constructed by regarding such functions as a kind of dual of the point space. So one first applies 
the operator L^G_{g⁻¹} to the point space to “go back” to the points where the “dual” continuous function can be 
applied, and the result is then “pushed forth” from the location g⁻¹p to the point p. The result is that the 
function value φ(g⁻¹p) is used as the value for L^C_g(φ) at p. The same thinking lies behind the construction 
of L̊^T_g for tangent operators, which are a kind of “double dual” of the point space. Then L^T_g for tangent 
vectors maps T_p(G) forward to T_{gp}(G), which is also a kind of “double pullback”. To compute the vector 
L^T_g(V) ∈ T_{gp}(G), one first goes back to V ∈ T_p(G) and then pushes this forth via (dL^G_g)_p to T_{gp}(G). 


The left translation operator for tangent covectors is then a kind of “triple dual” of the corresponding point 
space operator. One must send a vector V ∈ T_p(G) back to T_{g⁻¹p}(G) so that λ ∈ T*_{g⁻¹p}(G) can be applied 
to V. This must then be pushed forth to become the value for L^Λ_g(λ) ∈ T*_p(G). 


The validity of the formula for L^Λ_g(λ) may be checked by simultaneously left translating λ and V by g. Thus 
L^Λ_g(λ)(L^T_g(V)) = λ((L^G_{g⁻¹})_*((L^G_g)_*(V))) = λ((L^G_{g⁻¹} ∘ L^G_g)_*(V)) = λ(V). Therefore the computation of λ(V) 
is independent of the translation, as it should be. 

The left translation operator for differential forms ω ∈ X(T*(G)) follows the same pattern because the value 
ω(p) of a differential form at each point p ∈ G is a tangent covector. (The procedure is similar in the more 
general case of a differential form ω ∈ X(Λ_m(T(G), W)), where ω(p) is an antisymmetric multilinear map.) 
If a differential form ω ∈ X(T*(G)) is to be translated to the left by g, the value of the form L^Ω_g(ω)(p)(V) 
must be computed by applying the linear map ω(g⁻¹p) to V. (This means that ω is translated forward from 
g⁻¹p to L^G_g(g⁻¹p) = p.) This can be achieved by translating V back to g⁻¹p via (dL^G_{g⁻¹})_p so that ω(g⁻¹p) 
can be applied to it. (This is illustrated in Figure 62.3.2.) 


Figure 62.3.2 Left translation operator for differential forms on Lie groups 


62.3.15 DEFINITION: Let G be a Lie group. Let g ∈ G. 
The left translation operator (for tangent covectors) L^Λ_g : T*(G) → T*(G) is defined by 

∀λ ∈ T*(G), L^Λ_g(λ) = λ ∘ L^T_{g⁻¹}. 

In other words, 

∀p ∈ G, ∀λ ∈ T*_p(G), ∀V ∈ T_{gp}(G), L^Λ_g(λ)(V) = λ(L^T_{g⁻¹}(V)) 
= λ((L^G_{g⁻¹})_*(V)) 
= λ((dL^G_{g⁻¹})_{gp}(V)). 

The left translation operator (for differential forms) L^Ω_g : X(T*(G)) → X(T*(G)) is defined by 

∀ω ∈ X(T*(G)), L^Ω_g(ω) = L^Λ_g ∘ ω ∘ L^G_{g⁻¹}. 

In other words, 

∀ω ∈ X(T*(G)), ∀p ∈ G, L^Ω_g(ω)(p) = L^Λ_g(ω(g⁻¹p)) 
= ω(g⁻¹p) ∘ L^T_{g⁻¹}. 

In other words, 
62.3.16 REMARK: Left translation operators for more general tensor fields and differential forms. 
Definition 62.3.15 may be extended to left translation operators for differential forms of arbitrary degree, and 
likewise to tensor fields of arbitrary type. It is also possible to replace tangent covector fields with short-cut 
differential forms w € X(A1(T(G))) as in Definition 62.3.17. (See Definition 57.7.2 and Notation 57.7.12 
for short-cut differential forms.) Furthermore, Definitions 62.3.15 and 62.3.17 can both be generalised from 
scalar-valued to vector-valued differential forms in X (A; (T(G), W)) and X (A4, (T(G), W)) respectively, for 
any real linear space W. Of particular interest is W = T.(G). 


62.3.17 DEFINITION: Let G be a Lie group. Let g ∈ G. 
The left translation operator (for short-cut differential forms) L^Ω_g : X(Λ_1(T(G))) → X(Λ_1(T(G))) is defined 
by 

∀ω ∈ X(Λ_1(T(G))), L^Ω_g(ω) = ω ∘ L^T_{g⁻¹}. 

In other words, 

∀ω ∈ X(Λ_1(T(G))), ∀V ∈ T(G), L^Ω_g(ω)(V) = ω(L^T_{g⁻¹}(V)) 
= ω((L^G_{g⁻¹})_*(V)). 


62.4. Left invariant vector fields on Lie groups 


62.4.1 REMARK: Basic properties and benefits of left invariant vector fields. 

Unsurprisingly, Definition 62.4.2 states that a vector field on a Lie group is a “left invariant vector field” 
when it is invariant with respect to any left translation of the field. Slightly more surprising, perhaps, is 
that every left invariant vector field is completely determined by its value at the unit element of the group, 
as stated in Theorem 62.4.5 (iii). In fact, given the value of a left invariant vector field at any single point 
in the group, the value at all other points is uniquely determined, as stated in Theorem 62.4.5 (i). This is a 
consequence of the fact that the only orbit of a group acting on itself is the whole group. 


Perhaps more surprising is that the left invariant vector fields are algebraically closed under commutation, 
or more precisely, under the Lie bracket. (See Theorem 62.4.11 (ii).) These basic properties make possible 
the definition of the Lie algebra of a Lie group in Section 62.8. 


Left invariant vector fields mediate the very close link between a Lie group G and its Lie algebra T_e(G). 
Given a left invariant vector field X ∈ X(T(G)), the corresponding Lie algebra element is X(e) ∈ T_e(G). 
Conversely, given a vector V ∈ T_e(G), there is a unique left invariant X ∈ X(T(G)) such that X(e) = V. 
The first obvious advantage of this linkage is that one may freely choose between two constructions for the 
Lie algebra of a Lie group, the vectors in a finite-dimensional linear space T_e(G) or the vector fields in a 
finite-dimensional subspace of the space of vector fields X(T(G)). 


The second advantage is more important, namely that the algebra of the Lie algebra is determined by the 
Lie bracket operation for the left invariant vector fields. The finite-dimensional tangent space T_e(G) does 
not have any algebraic operations apart from linear space operations. To obtain the vector multiplication 
operation for T_e(G), one must first extend V1, V2 ∈ T_e(G) to left invariant vector fields X1, X2 ∈ X(T(G)), 
then compute the Lie bracket [X1, X2] in the usual way for vector fields as in Definition 61.5.7, and finally 
restrict the output to T_e(G) by evaluating [X1, X2](e) ∈ T_e(G). 
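
The extend/compute/restrict procedure can be sketched numerically for a matrix group. The sketch assumes that G is a group of 3×3 matrices, that the left invariant extension of V is the field g ↦ gV, and that the Lie bracket of vector fields has the usual coordinate form [X, Y] = DY[X] − DX[Y]. These conventions and all helper names are assumptions made for illustration, not definitions taken from this book.

```python
# Assumption: matrix group; left invariant field X_v(g) = g*v in matrix coordinates.

def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_comb(a, s, b):
    """Return a + s*b entrywise."""
    n = len(a)
    return [[a[i][j] + s * b[i][j] for j in range(n)] for i in range(n)]

def left_invariant_field(v):
    """Step 1 (extend): the field g -> g*v with value v at the identity."""
    return lambda g: mat_mul(g, v)

def directional_derivative(field, g, direction, h=1e-6):
    """Central difference of the field at g in the given direction."""
    plus = field(mat_comb(g, h, direction))
    minus = field(mat_comb(g, -h, direction))
    return [[(p - m) / (2 * h) for p, m in zip(pr, mr)]
            for pr, mr in zip(plus, minus)]

def lie_bracket_at(x_field, y_field, g):
    """Step 2 (compute): [X, Y](g) = DY(g)[X(g)] - DX(g)[Y(g)]."""
    dy = directional_derivative(y_field, g, x_field(g))
    dx = directional_derivative(x_field, g, y_field(g))
    return mat_comb(dy, -1.0, dx)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
e1 = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
e2 = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
e3 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# Step 3 (restrict): evaluate the bracket of the two fields at the identity.
bracket = lie_bracket_at(left_invariant_field(e1), left_invariant_field(e2),
                         identity)
bracket_error = max(abs(bracket[i][j] - e3[i][j])
                    for i in range(3) for j in range(3))
```

For these rotation generators the restricted bracket equals the matrix commutator e1 e2 − e2 e1 = e3, up to rounding error.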


The fact that the multiplication operation for T_e(G) must be determined in this extend/compute/restrict 
fashion may explain why so many authors define the Lie algebra of G to be the space X_L(T(G)) of left 
invariant vector fields rather than the simpler linear space T_e(G). (See Remark 19.10.1 for a literature list.) 
62.4.2 DEFINITION: A left invariant vector field on a Lie group G is a vector field X on G such that 

∀g ∈ G, L^F_g(X) = X. 
In other words, 

∀g ∈ G, L^T_g ∘ X = X ∘ L^G_g. 


62.4.3 NOTATION: X_L(T(G)) denotes the set of left invariant vector fields on a Lie group G. 

X^k_L(T(G)), for k ∈ ℤ̄⁺₀, denotes the set of left invariant C^k vector fields on a Lie group G. 
In other words, X^k_L(T(G)) = X_L(T(G)) ∩ X^k(T(G)) for any k ∈ ℤ̄⁺₀ and Lie group G. 


62.4.4 REMARK: Unique determination of left invariant vector field by single point values. 

Theorem 62.4.5 (ii) asserts that the value X(g) of a left invariant vector field X ∈ X_L(T(G)) at any point 
g ∈ G is uniquely determined by the value X(g0) at any point g0 ∈ G. In particular, Theorem 62.4.5 (iii) 
asserts that X(e) uniquely determines X(g) for every g ∈ G. 


62.4.5 THEOREM: Relations between left invariant vector field values at different points. 
Let G be a Lie group. Let X ∈ X_L(T(G)). 

(i) ∀g, h ∈ G, X(g) = L^T_h(X(h⁻¹g)) = (L^G_h)_*(X(h⁻¹g)) = (dL^G_h)_{h⁻¹g}(X(h⁻¹g)). 
(ii) ∀g, g0 ∈ G, X(g) = L^T_{gg0⁻¹}(X(g0)) = (L^G_{gg0⁻¹})_*(X(g0)) = (dL^G_{gg0⁻¹})_{g0}(X(g0)). 
(iii) ∀g ∈ G, X(g) = L^T_g(X(e)) = (L^G_g)_*(X(e)) = (dL^G_g)_e(X(e)), where e is the identity of G. 


PROOF: For part (i), let X be a left invariant vector field on G. Then by Definitions 62.4.2 and 62.3.9, 
X(g) = L^T_h(X(h⁻¹g)) = (L^G_h)_*(X(h⁻¹g)) = (dL^G_h)_{h⁻¹g}(X(h⁻¹g)) for any g, h ∈ G. 

Part (ii) follows from part (i) by substituting h = gg0⁻¹. 

Part (iii) follows from part (ii) by substituting g0 = e. 
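
Theorem 62.4.5 (i) can be illustrated on a matrix group. The sketch below assumes G = GL(2, ℝ), where the differential of left multiplication by h acts on a tangent matrix W as W ↦ hW, and where the field g ↦ gV is taken as the left invariant field generated by V. The field g ↦ Vg is included only as a contrasting example. All names are hypothetical.

```python
# Assumption: G = GL(2, R); (dL_h)_p(W) = h*W for a tangent matrix W at p.

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_gap(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

V = [[0.0, 1.0], [0.0, 0.0]]

def x_field(g):
    """Left invariant field generated by V: X(g) = g*V."""
    return mat_mul(g, V)

def y_field(g):
    """Contrasting field Y(g) = V*g, not left invariant in general."""
    return mat_mul(V, g)

g = [[1.0, 2.0], [0.0, 1.0]]
h = [[2.0, 0.0], [1.0, 1.0]]
h_inv_g = mat_mul(inverse(h), g)

# Theorem 62.4.5 (i): X(g) = (dL_h)_{h^{-1} g}(X(h^{-1} g)) = h * X(h^{-1} g).
left_error = max_gap(x_field(g), mat_mul(h, x_field(h_inv_g)))
# The same relation fails for the contrasting field Y.
contrast_gap = max_gap(y_field(g), mat_mul(h, y_field(h_inv_g)))
```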


62.4.6 REMARK: Linear spaces of left invariant vector fields. 
Theorem 62.4.7 (i) is the left invariant version of Theorem 57.1.6 (ii). Theorem 62.4.7 (ii) is the left invariant 
version of Theorem 57.2.9 (ii). 


62.4.7 THEOREM: Sets of left invariant vector fields with pointwise operations are linear spaces. 
Let G be a Lie group. 
(i) X_L(T(G)) −< (ℝ, X_L(T(G)), σ_ℝ, τ_ℝ, σ_L, μ_L) is a real linear space with the operations of pointwise 
addition and scalar multiplication. 
(ii) For all k ∈ ℤ̄⁺₀, X^k_L(T(G)) −< (ℝ, X^k_L(T(G)), σ_ℝ, τ_ℝ, σ^k_L, μ^k_L) is a real linear space with the operations 
of pointwise addition and scalar multiplication. 


PROOF: Part (i) follows by Theorems 57.1.6 (ii) and 22.1.13 from the algebraic closure of X_L(T(G)) under 
linear combinations, and the fact that X_L(T(G)) contains the zero vector field of X(T(G)). 

Part (ii) follows by Theorems 57.2.9 (ii) and 22.1.13 from the algebraic closure of X^k_L(T(G)) under linear 
combinations, and the fact that X^k_L(T(G)) contains the zero vector field of X(T(G)). 


62.4.8 DEFINITION: The linear space of left invariant C^∞ vector fields on a Lie group G is the tuple 
X^∞_L(T(G)) −< (ℝ, X^∞_L(T(G)), σ_ℝ, τ_ℝ, σ^∞_L, μ^∞_L), where σ^∞_L : X^∞_L(T(G)) × X^∞_L(T(G)) → X^∞_L(T(G)) and 
μ^∞_L : ℝ × X^∞_L(T(G)) → X^∞_L(T(G)) are the operations of pointwise addition and scalar multiplication. 


62.4.9 REMARK: Lie algebras of left invariant vector fields. 

Whereas it is shown in Theorem 61.5.17 that X^∞(T(M)) is a Lie algebra for any C^∞ manifold M, it is 
shown in Theorem 62.4.11 (iii) that X^∞_L(T(G)) is a Lie subalgebra of the Lie algebra X^∞(T(G)). Therefore 
X^∞_L(T(G)) is a Lie algebra. 

In the case of the classical Lie groups, X^∞(T(G)) is typically infinite-dimensional, whereas X^∞_L(T(G)) 
is usually finite-dimensional. (The Lie algebra X^∞_L(T(G)) is used in Definition 62.8.3 as the vector field 
interpretation for the Lie algebra of a Lie group.) 
62.4.10 DEFINITION: The Lie algebra of left invariant vector fields on a Lie group G is the tuple 
X^∞_L(T(G)) −< (ℝ, X^∞_L(T(G)), σ_ℝ, τ_ℝ, σ^∞_L, τ^∞_L, μ^∞_L), where 
(i) (ℝ, X^∞_L(T(G)), σ_ℝ, τ_ℝ, σ^∞_L, μ^∞_L) is the linear space of left invariant C^∞ vector fields on G, and 
(ii) τ^∞_L is the restriction to X^∞_L(T(G)) of [·, ·] : X^∞(T(G)) × X^∞(T(G)) → X^∞(T(G)), which is the Lie 
bracket operation (as in Definition 61.5.7). 


62.4.11 THEOREM: The Lie algebra of left invariant vector fields is a Lie subalgebra of all vector fields. 
Let G be a Lie group. Let X^∞_L(T(G)) −< (ℝ, X^∞_L(T(G)), σ_ℝ, τ_ℝ, σ^∞_L, τ^∞_L, μ^∞_L) be the Lie algebra of left 
invariant vector fields on G as in Definition 62.4.10. 

(i) (ℝ, X^∞_L(T(G)), σ_ℝ, τ_ℝ, σ^∞_L, μ^∞_L) is a real linear space. 
(ii) X^∞_L(T(G)) is closed under the Lie bracket operation for vector fields. 
(iii) X^∞_L(T(G)) is a Lie subalgebra of the real Lie algebra X^∞(T(G)) of C^∞ vector fields on G. 
(iv) X^∞_L(T(G)) is a real Lie algebra. 


PROOF: Part (i) follows from Theorem 62.4.7 (ii). 

For part (ii), let X, Y ∈ X^∞_L(T(G)) and g ∈ G. Then L^F_g([X, Y]) = L^T_g ∘ [X, Y] ∘ L^G_{g⁻¹} = (L^G_g)_* ∘ [X, Y] ∘ L^G_{g⁻¹} by 
Definition 62.3.9. So by Theorem 61.6.6 and Definition 62.4.2, 

L^F_g([X, Y]) = [(L^G_g)_* ∘ X ∘ L^G_{g⁻¹}, (L^G_g)_* ∘ Y ∘ L^G_{g⁻¹}] 
= [L^F_g(X), L^F_g(Y)] 
= [X, Y]. 

Therefore [X, Y] is a left invariant vector field on G. Hence X^∞_L(T(G)) is closed under the Lie bracket. 

For part (iii), note that X^∞(T(G)) −< (ℝ, X^∞(T(G)), σ_ℝ, τ_ℝ, σ, τ, μ), the Lie algebra of C^∞ vector fields 
on G, is a Lie algebra over ℝ by Theorem 61.5.17. Also X^∞_L(T(G)) ⊆ X^∞(T(G)) by Notation 62.4.3. 

The addition operation σ^∞_L and scalar multiplication μ^∞_L are closed on X^∞_L(T(G)) by part (i). In other 
words, σ^∞_L ⊆ σ and μ^∞_L ⊆ μ. The product operation τ^∞_L is closed on X^∞_L(T(G)) by part (ii). In other 
words, τ^∞_L ⊆ τ. The zero vector field is an element of X^∞_L(T(G)) by part (i). Hence it follows from 
Theorem 19.10.19 that X^∞_L(T(G)) is a Lie subalgebra of X^∞(T(G)). 

For part (iv), X^∞_L(T(G)) is a Lie algebra by part (iii) and Definition 19.10.17. 


62.4.12 REMARK: Left invariant vector fields are infinitesimal right actions. 

As mentioned also in Remarks 62.7.1, 63.6.2 and 63.7.1, left invariant vector fields are infinitesimal right 
actions, and vice versa. By Theorems 62.2.8 and 62.4.5 (iii), a left invariant vector field X ∈ X_L(T(G)) for 
a Lie group (G, σ) must satisfy X(g) = (dL^G_g)_e(X(e)) = (dσ(g, ·))_e(X(e)) = (dσ)_{g,e}(0_g, X(e)) for all g ∈ G, 
where 0_g is the zero element of T_g(G). This shows that X(g) is the result of varying e in the expression 
σ(g, e) by X(e). In more precise language, it is the output of the differential (dσ)_{g,e} for inputs 0_g and X(e). 


In very rough language, this may be thought of as the “infinitesimal right action” by X(e) on g because it is 
the result “g(e + δ.X(e))” of the right action of “e + δ.X(e)” on g for infinitesimal δ. (Although the language 
of infinitesimals helps to explain the name “infinitesimal right action”, it should be avoided because it is 
difficult to translate into rigorous mathematical language. The language of differentials, on the other hand, 
is not so difficult to convert to meaningful mathematics.) 


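Table 62.4.1 shows some relations between left/right invariant vector fields and right/left infinitesimal actions.


                  | Lie group, left invariant | Lie right transf. group | Lie group, right invariant | Lie left transf. group
vector field      | $X_V^L : g \mapsto (dL_g^G)_e(V)$ | $X_V^M : p \mapsto (dL_p^M)_e(V)$ | $X_V^R : g \mapsto (dR_g^G)_e(V)$ | $X_V^M : p \mapsto (dR_p^M)_e(V)$
space             | $X_V^L \in X_L^\infty(T(G))$ | $X_V^M \in X^{k-1}(T(M))$ | $X_V^R \in X_R^\infty(T(G))$ | $X_V^M \in X^{k-1}(T(M))$
left translation  | $L_g^X(X_V^L) = X_V^L$ | (not defined) | $L_g^X(X_V^R) = X^R_{\mathrm{Adj}(g)(V)}$ | $L_g(X_V^M) = X^M_{\mathrm{Adj}(g)(V)}$
right translation | $R_g^X(X_V^L) = X^L_{\mathrm{Adj}(g^{-1})(V)}$ | $R_g(X_V^M) = X^M_{\mathrm{Adj}(g^{-1})(V)}$ | $R_g^X(X_V^R) = X_V^R$ | (not defined)


Table 62.4.1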
In the first and third columns, the left and right invariant vector fields $X_V^L$ and $X_V^R$ on a Lie group $G$,
generated by a vector $V \in T_e(G)$, are introduced in Definitions 62.4.15 and 62.7.7 respectively. These vector
fields are shown to be in $X_L^\infty(T(G))$ and $X_R^\infty(T(G))$ in Theorems 62.4.16 (vii) and 62.7.8 (vii) respectively.
This implies that they are left and right invariant respectively, which means that $L_g^X(X_V^L) = X_V^L$ and
$R_g^X(X_V^R) = X_V^R$. However, these fields are not in general right and left invariant respectively because
$R_g^X(X_V^L) = X^L_{\mathrm{Adj}(g^{-1})(V)}$ and $L_g^X(X_V^R) = X^R_{\mathrm{Adj}(g)(V)}$ by Theorem 62.10.10 (i, ii).

The left and right invariance formulas, $L_g^X(X_V^L) = X_V^L$ and $R_g^X(X_V^R) = X_V^R$, follow from the fact that the
left and right actions of a Lie group commute. Since $X_V^L$ is actually a right infinitesimal transformation
by the vector $V \in T_e(G)$, despite the superscript “L”, it commutes with the left translation operator $L_g^G$.
The reason for the failure of right invariance for $X_V^L$ is the same, namely that $X_V^L$ is an infinitesimal right
action. Unsurprisingly this right invariance does succeed if the adjoint expression $\mathrm{Adj}(g^{-1})(V)$ equals $V$,
which means in essence that $g^{-1}$ and $V$ commute.


In the second and fourth columns, the infinitesimal right and left actions $X_V^M$ on a manifold $M$ by vectors
$V \in T_e(G)$, for a right or left transformation group $(G, M)$ respectively, do not have an entry in the row for
left or right translations respectively because these are not defined in general. This is because left translation
by a right transformation group is not defined, and likewise right translation by a left transformation group
is not defined. The principal observation here is that the formulas for right translation of $X_V^L$ and $X_V^M$ in the
first and second columns are the same. (The formula $R_g(X_V^M) = X^M_{\mathrm{Adj}(g^{-1})(V)}$ follows from Theorem 63.7.8.)
This is a consequence of the fact that these are both infinitesimal transformations by $V \in T_e(G)$. Likewise,
the formulas for left translation of $X_V^R$ and $X_V^M$ in the third and fourth columns are the same. (The formula
$L_g(X_V^M) = X^M_{\mathrm{Adj}(g)(V)}$ follows from Theorem 63.6.15.)
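
The first column of Table 62.4.1 can be checked numerically on a matrix group. The following is only a sketch under stated assumptions: it takes $G = SO(3)$, uses the matrix form $X_V^L(x) = xV$ of the left invariant field, writes the translation operators as $L_g^X(X)(x) = g\,X(g^{-1}x)$ and $R_g^X(X)(x) = X(xg^{-1})\,g$, and takes $\mathrm{Adj}(g^{-1})(V) = g^{-1}Vg$. The helper names (`left_field`, `rot_x`, `rot_z`) and the sample matrices are illustrative choices, not notation from the text.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose, which is the inverse for a rotation matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

def left_field(V):
    """Matrix form of the left invariant field generated by V: x -> x V."""
    return lambda x: matmul(x, V)

# Sample group elements and an antisymmetric V (an element of so(3)).
g = matmul(rot_z(0.7), rot_x(0.3))
x = matmul(rot_x(-1.1), rot_z(0.4))
g_inv = transpose(g)
V = [[0.0, -1.0, 0.5], [1.0, 0.0, -2.0], [-0.5, 2.0, 0.0]]

X = left_field(V)
left_translated = matmul(g, X(matmul(g_inv, x)))    # g X(g^{-1} x)
right_translated = matmul(X(matmul(x, g_inv)), g)   # X(x g^{-1}) g
adj_V = matmul(matmul(g_inv, V), g)                 # Adj(g^{-1})(V) = g^{-1} V g

err_left = max_abs_diff(left_translated, X(x))                  # left invariance
err_right = max_abs_diff(right_translated, left_field(adj_V)(x))
err_naive = max_abs_diff(right_translated, X(x))                # right invariance fails
print(err_left, err_right, err_naive)
```

For these sample values the first two errors are at rounding level, while the third is not small, which illustrates why the adjoint map appears in the right translation row.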


62.4.13 THEOREM: The left translation field of a vector is a left invariant vector field. 
Let $G$ be a Lie group with identity $e$. Let $V \in T_e(G)$. Then $X : G \to T(G)$ defined by $\forall g \in G$, $X(g) = L_g^T(V)$
is the unique left invariant vector field $X \in X_L(T(G))$ such that $X(e) = V$.


PROOF: Let $V \in T_e(G)$. Define $X : G \to T(G)$ by $X(g) = L_g^T(V)$ for all $g \in G$. Then $X(g) \in T_g(G)$ for
all $g \in G$. So $X \in X(T(G))$. To show that $X$ is left invariant, let $g \in G$. Then

    $\forall x \in G$,  $L_g^X(X)(x) = L_g^T(X(g^{-1}x))$      (62.4.1)
                        $= L_g^T(L_{g^{-1}x}^T(V))$
                        $= L_x^T(V)$                            (62.4.2)
                        $= X(x)$,

where line (62.4.1) follows from Definition 62.3.9, and line (62.4.2) follows from Theorem 62.3.12 (v). So $X$
is left invariant by Definition 62.4.2.

To show uniqueness, suppose that $X' \in X_L(T(G))$ and $X'(e) = V$. Let $g \in G$ and let $x = g$. Then by Definition 62.3.9,
$L_g^X(X')(x) = L_g^T(X'(g^{-1}x)) = L_g^T(X'(e)) = L_g^T(V)$, which implies $X'(g) = L_g^T(V)$ by Definition 62.4.2.
Thus $X' = X$. Hence $X$ is unique.


62.4.14 REMARK: The unique left invariant vector field with a given value at the identity. 

It follows from Theorem 62.4.13 that one may define the left invariant vector field $X \in X_L(T(G))$ with a
given value $X(e) = V$ at the identity of the group. This is given a name and notation in Definition 62.4.15.
It follows from Definition 62.3.9 that $L_g^T(V) = (L_g^G)_*(V) = (dL_g^G)_e(V)$ for $V \in T_e(G)$ in Theorem 62.4.13.
So the expression $(dL_g^G)_e(V)$ is used in Definition 62.4.15. (See Figure 62.4.2.)


[Figure 62.4.2: Left invariant vector field uses left translation operator. The differential $(dL_g^G)_e$ carries $V$ at $e$ to $X_V^L(g) = (dL_g^G)_e(V)$ at $g$.]

Left invariant vector fields play a fundamental role in the definition of the Lie algebra of a Lie group. In
fact, they are the exact analogue of the “fundamental vector fields” for principal bundles in Definition 66.6.2. 
The corresponding right invariant vector fields in Definition 62.7.7 are not usually used in this role. 


62.4.15 DEFINITION: The left invariant vector field generated by a vector $V \in T_e(G)$, for a Lie group $G$,
is the unique vector field $X_V^L \in X_L(T(G))$ which satisfies

    $\forall g \in G$,  $X_V^L(g) = (dL_g^G)_e(V)$.


62.4.16 THEOREM: Some basic properties of left invariant vector fields generated by a vector. 
Let G be a Lie group with identity e. 


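(i) $\forall V \in T_e(G)$, $X_V^L(e) = V$.

(ii) $\forall V \in T_e(G)$, $\forall g \in G$, $X_V^L(g) = (L_g^G)_*(V) = \partial_V \circ L_{g^{-1}}^C = L_g^T(V)$.

(iii) $\forall V \in T_e(G)$, $\forall g \in G$, $X_V^L(g) = (dL_g^G)_e(X_V^L(e)) = X^L_{X_V^L(e)}(g)$.

(iv) $X_L(T(G)) = \{X_V^L; V \in T_e(G)\}$. (So all left invariant vector fields on $G$ equal $X_V^L$ for some $V \in T_e(G)$.)

(v) $\forall V \in T_e(G)$, $X_V^L \in X^\infty(T(G))$.

(vi) $\forall V \in T_e(G)$, $X_V^L \in X_L^\infty(T(G))$.

(vii) $X_L(T(G)) = X_L^\infty(T(G))$.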


PROOF: Part (i) follows directly from Definition 62.4.15 and Theorem 62.4.13. 
Part (ii) follows from Definitions 62.4.15 and 62.3.9. 
Part (iii) follows from part (i) and Definition 62.4.15. 


For part (iv), it follows from Definition 62.4.15 and Theorem 62.4.13 that $X_L(T(G)) \supseteq \{X_V^L; V \in T_e(G)\}$.
To show the reverse inclusion, let $X \in X_L(T(G))$. Let $V = X(e)$. Then $X_V^L \in \{X_V^L; V \in T_e(G)\}$. But then
$X = X_V^L$ by Theorem 62.4.13. So $X_L(T(G)) \subseteq \{X_V^L; V \in T_e(G)\}$. Hence $X_L(T(G)) = \{X_V^L; V \in T_e(G)\}$.


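For part (v), to prove that $X_V^L$ is $C^\infty$, it must be shown (by Remark 57.2.2) that $\Psi \circ X_V^L \circ \psi^{-1}$ is $C^\infty$ for
all $\psi \in \mathrm{atlas}(G)$, where $\Psi \in \mathrm{atlas}(T(G))$ denotes the tangent space chart corresponding to each manifold
chart $\psi \in \mathrm{atlas}(G)$. (See Definition 54.5.16 for tangent space charts.) By Definition 58.4.5 for the differential
of a map, $X_V^L(g) = (dL_g^G)_e(V) = t_{g,w,\psi_2}$ for $\psi_2 \in \mathrm{atlas}_g(G)$, where $w \in \mathbb{R}^n$ is defined by

    $\forall k \in \mathbb{N}_n$,  $w^k = \sum_{i=1}^n v^i \partial_{x^i}(\psi_2^k \circ L_g^G \circ \psi_1^{-1})(x)|_{x=\psi_1(e)}$
                                   $= \sum_{i=1}^n v^i \partial_{x^i}(\psi_2^k(\sigma(g, \psi_1^{-1}(x))))|_{x=\psi_1(e)}$,

where $v \in \mathbb{R}^n$ satisfies $V = t_{e,v,\psi_1}$ for $\psi_1 \in \mathrm{atlas}_e(G)$, and $\sigma : G \times G \to G$ is the $C^\infty$ group operation of $G$.


Define $f : U \to \mathbb{R}^n$ by $f(x,y) = \psi_2(\sigma(\psi_2^{-1}(y), \psi_1^{-1}(x)))$ for $(x,y) \in U = \mathrm{Range}(\psi_1) \times \mathrm{Range}(\psi_2) \subseteq \mathbb{R}^{2n}$. Then
by the definition of a $C^\infty$ manifold map $\sigma : G \times G \to G$, $f$ is a $C^\infty$ function. Therefore the derivative
function $(x,y) \mapsto h(x,y) = \sum_{i=1}^n v^i \partial_{x^i} f(x,y)$ is also $C^\infty$. It follows that $w$ is $C^\infty$ with respect to $g$,
since $w = h(\psi_1(e), \psi_2(g))$. But for all $y \in \mathrm{Range}(\psi_2)$, $\Psi_2(X_V^L(\psi_2^{-1}(y))) = (y, h(\psi_1(e), y)) \in \mathbb{R}^{2n}$, which is now
obviously $C^\infty$ with respect to $y$. Therefore $X_V^L$ is $C^\infty$ as claimed.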


Alternative proof for part (v). 

Let $V \in T_e(G)$. Define $\phi_1, \phi_2 : G \to T(G)$ by $\phi_1 : g \mapsto 0_g = 0_{T_g(G)}$ and $\phi_2 : g \mapsto V$. Then $\phi_1 \in C^\infty(G, T(G))$
by Theorem 57.2.14 because $\phi_1$ is a zero cross-section, and $\phi_2 \in C^\infty(G, T(G))$ by Theorem 52.1.9 because
$\phi_2$ is constant. So $\phi = \phi_1 \times \phi_2 \in C^\infty(G, T(G) \times T(G))$ by Theorem 52.6.13. Then $X_V^L(g) = (dL_g^G)_e(V) =
(d\sigma)_{g,e}(0_g, V) = (d\sigma)_{g,e}(\phi(g))$ for $g \in G$, where $\sigma$ is the group operation. (See Theorem 62.2.8 for the
direct-product differential-decomposition $(d\sigma)_{g,e} : T_g(G) \times T_e(G) \to T_g(G)$.) But $\sigma_* : T(G) \times T(G) \to T(G)$ is a
$C^\infty$ map by Theorem 58.10.3 because $\sigma : G \times G \to G$ is $C^\infty$ by Definition 62.2.4 (iii). So by Theorem 52.1.17,
the chain rule for manifold maps, $X_V^L = \sigma_* \circ \phi \in C^\infty(G, T(G))$. Hence $X_V^L \in X^\infty(T(G))$.


Part (vi) follows from parts (iv) and (v). 


Part (vii) follows from parts (iv) and (vi), and Notation 62.4.3. 
62.4.17 REMARK: Left invariant vector fields generated as infinitesimal matrices.

On a concrete Lie group consisting of matrices, left invariant vector fields can be constructed by differentiating
the right action of an exponential matrix family on vectors. Specifically, a vector field $X_V \in X(T(G))$ may
be defined by $X_V(g) = \partial_t(g \exp(tV))|_{t=0}$ for all $g \in G$ and $V \in T_e(G)$, where $\exp : T_e(G) \to G$ is defined
as the standard exponential power series $V \mapsto \exp(V) = \sum_{i=0}^\infty V^i/i!$. By the chain rule, the computation of
$X_V(g)$ gives the same result as $(dL_g^G)_e(V)$ because $g \exp(tV) = L_g^G(\exp(tV))$ and $\partial_t \exp(tV)|_{t=0} = V$.
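
The matrix construction above can be tried out numerically. The sketch below is an illustration only: it assumes $G = SO(3)$, truncates the exponential power series after a fixed number of terms, and approximates $\partial_t(g \exp(tV))|_{t=0}$ by a central difference, which is then compared with the matrix product $gV$ (the matrix form of $(dL_g^G)_e(V)$). The function names, the step size `h` and the sample matrices are assumptions made for this example.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def scale(a, s):
    return [[s * a[i][j] for j in range(3)] for i in range(3)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]

def mat_exp(a, terms=30):
    """Truncated power series sum_{i=0}^{terms} a^i / i!."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for i in range(1, terms + 1):
        term = scale(matmul(term, a), 1.0 / i)   # now equals a^i / i!
        result = add(result, term)
    return result

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

g = matmul(rot_z(0.7), rot_x(0.3))
V = [[0.0, -1.0, 0.5], [1.0, 0.0, -2.0], [-0.5, 2.0, 0.0]]  # antisymmetric

h = 1e-5
plus = matmul(g, mat_exp(scale(V, h)))
minus = matmul(g, mat_exp(scale(V, -h)))
X_numeric = scale(add(plus, scale(minus, -1.0)), 1.0 / (2.0 * h))  # central difference
X_exact = matmul(g, V)                                             # g V

err = max(abs(X_numeric[i][j] - X_exact[i][j]) for i in range(3) for j in range(3))
print(err)
```

The difference is limited by the finite-difference step and floating-point rounding, so it is small but not exactly zero.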


The validity of this construction relies on the representation of Lie algebra elements as matrices which
can be exponentiated in this way. For abstract Lie groups, the exponential map $t \mapsto \exp(tV)$ must be
constructed as the solution of an ODE for each $V \in T_e(G)$. Then the differentiation of exponential curves
$t \mapsto \exp(tV)$ recovers the vector $V$ as $\partial_t \exp(tV)|_{t=0}$. This means that the vector $V$ is first exponentiated,
using solutions of ODEs, and then this is differentiated to recover the original vector $V$. This seems a
somewhat cumbersome way to achieve nothing. It is for this reason that left invariant vector fields are
constructed directly as differentials in Definition 62.4.15. The matrix approach may seem at first sight to be
more concrete. However, the exponential matrix power series construction is not as elementary as it looks,
and solutions of ODEs are certainly not elementary at all.


A minor advantage of the exponential matrix differential definition is that it makes explicit the idea that
it is an infinitesimal right action of the group. (This left/right conundrum is discussed in Remark 62.4.12.)
For “small $t$”, $\exp(tV)$ may be thought of as an infinitesimal group element, although infinitesimals tend to
be ill-defined, as they were in the 17th century.


Another minor advantage of exponential matrix differentials is that they can be applied in the same way to
the action of a group on a manifold. However, once again the differential $(dL_g^G)_e$ does have the advantage of
being well-defined, correct, compact, clear and general. The concreteness of matrices makes them useful for
elementary introductions, but for pure mathematical presentations, the abstract viewpoint is preferable.


For definitions of vector fields using expressions of the form $\partial_t(\exp(tA)p)|_{t=0}$ or $\partial_t(p \exp(tA))|_{t=0}$, where $A$
is a matrix or Lie algebra element, see for example Bleecker [254], pages 30-31, 38, 44, 47-50; Sternberg [38],
pages 332-333; Frankel [12], pages 455, 486; Bump [57], page 42; Szekeres [305], page 575; Crampin/Pirani [7],
page 314; Sulanke/Wintgen [40], pages 75, 85.


62.4.18 REMARK: The difficulty of proving smoothness of left invariant vector fields. 

One may try to prove the $C^\infty$ regularity in Theorem 62.4.16 (v) using the test-function version of the left
translation operator $L_g^T$. In this case, the left invariant tangent operator field $X_V^L \in X(T(G))$ would be
constructed as $X_V^L(g) = L_g^T(\partial_V)$ for $V \in T_e(G)$. In this case, the test for $C^\infty$ regularity would require
$X_V^L(\phi) \in C^\infty(G)$ for all $\phi \in C^\infty(G)$. For any $x \in G$, $X_V^L(\phi)(x) = X_V^L(x)(\phi) = L_x^T(\partial_V)(\phi) = \partial_V(\phi \circ L_x^G)$.
It could be argued that $\phi \circ L_x^G$ is $C^\infty$ with respect to $x$, and therefore $\partial_V(\phi \circ L_x^G)$ must be $C^\infty$ with
respect to $x$, and therefore the $C^\infty$ regularity of $X_V^L(\phi)$ is proved. However, although it is “obvious” that
$x \mapsto \partial_V(\phi \circ L_x^G)$ is $C^\infty$, it is not obvious how one would prove it. To show that $\partial_V(\phi(\sigma(x, \cdot)))$ is $C^\infty$
would probably require some analysis using charts as in the first proof of Theorem 62.4.16 (v).


The alternative second proof of Theorem 62.4.16 (v) uses charts from earlier theorems to hide the calculus.
The proof of Theorem 66.6.8 (ii, iii, iv) for the right action map of a principal G-bundle uses the same earlier
theorems because the proof there is essentially identical (because the theorem is very similar).


62.4.19 REMARK: Computation of left invariant vector fields generated by Lie algebra elements. 

Example 62.4.20 demonstrates the computation of left invariant vector fields on the Lie group SO(3) using 
abstract manifolds as in Definition 62.4.15 and Theorem 62.4.16. The computations are shown here in some 
detail to clarify (and validate) meanings of symbols and concepts, not as a demonstration of a routine method 
of Lie algebra computation. Real-life computations use matrix multiplication and basic calculus! 


62.4.20 EXAMPLE: Computation of SO(3) left invariant vector fields generated by Lie algebra elements. 
The quintessential non-abelian Lie group is SO(3), which may be defined (as in Remark 25.14.1) as the pure
matrix group $\{R \in M_{3,3}(\mathbb{R}); R^T R = I_3, \det(R) = 1\}$. (See Example 63.6.23 for the corresponding group of
transformations of the sphere $S^2$.)
As noted in Example 63.6.23 line (63.6.3), matrices $R(\alpha_1, \alpha_2, \alpha_3) \in SO(3)$ have the form

    $R(\alpha_1, \alpha_2, \alpha_3) = R_3(\alpha_3) R_2(\alpha_2) R_1(\alpha_1) = \begin{pmatrix} c_2c_3 & s_1s_2c_3 - c_1s_3 & c_1s_2c_3 + s_1s_3 \\ c_2s_3 & s_1s_2s_3 + c_1c_3 & c_1s_2s_3 - s_1c_3 \\ -s_2 & s_1c_2 & c_1c_2 \end{pmatrix}$,      (62.4.3)

where $s_\ell = \sin\alpha_\ell$ and $c_\ell = \cos\alpha_\ell$ for $\ell = 1,2,3$. The map $R : \mathbb{R}^3 \to SO(3)$ is obviously not injective, but it
is injective in some small enough neighbourhood of $(0,0,0) \in \mathbb{R}^3$, and $R(0,0,0) = I_3 = [\delta_{ij}]_{i,j=1}^3$ equals the
identity element $e \in G = SO(3)$. In such a neighbourhood, the restricted map $R$ is a $C^\infty$ isomorphism with
respect to the standard differentiable structure on $M_{3,3}(\mathbb{R})$. So its inverse provides a manifold chart on $G$
which faithfully represents the differentiable structure on the submanifold $G$ of $M_{3,3}(\mathbb{R})$.


The angles can be obtained from a matrix $R \in SO(3)$ in a neighbourhood of the identity matrix $I_3$ by
the formulas $\alpha_1 = \arctan(R_{33}, R_{32})$, $\alpha_2 = -\arcsin(R_{31})$ and $\alpha_3 = \arctan(R_{11}, R_{21})$. (See Definition 44.2.9
for the two-parameter arctan function.) Thus a coordinate chart $\psi : U \to \mathbb{R}^3$ may be defined in some
neighbourhood $U$ of $I_3$ by

    $\forall R \in U$,  $\psi(R) = (\arctan(R_{33}, R_{32}), -\arcsin(R_{31}), \arctan(R_{11}, R_{21}))$.      (62.4.4)


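A left invariant vector field $X \in X_L(T(G))$ satisfies $X(R) = (dL_R^G)_e(X(e))$ for all $R \in G$ by Definition 62.4.15
and Theorem 62.4.16 (i). Let $V \in T_{I_3}(G)$. Then $V = t_{I_3,v,\psi}$ for some $v \in \mathbb{R}^3$ by Notation 54.1.4.

Let $R \in U$. Then $\psi(R) = (\arctan(R_{33}, R_{32}), -\arcsin(R_{31}), \arctan(R_{11}, R_{21}))$. The value of the differential
$(dL_R^G)_e(X(e)) \in T_R(G)$ may be computed for $X(e) = V$ using Definition 58.4.5. One obtains $(dL_R^G)_e(V) =
(dL_R^G)_e(t_{e,v,\psi}) = t_{R,w,\psi}$, where for all $k \in \mathbb{N}_3$,

    $w^k = \sum_{j=1}^3 v^j \partial_{z_j} \psi^k(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)}$
    $= \sum_{j=1}^3 v^j \partial_{z_j} \psi^k\left( \begin{pmatrix} c_2c_3 & s_1s_2c_3 - c_1s_3 & c_1s_2c_3 + s_1s_3 \\ c_2s_3 & s_1s_2s_3 + c_1c_3 & c_1s_2s_3 - s_1c_3 \\ -s_2 & s_1c_2 & c_1c_2 \end{pmatrix} \begin{pmatrix} x_2x_3 & y_1y_2x_3 - x_1y_3 & x_1y_2x_3 + y_1y_3 \\ x_2y_3 & y_1y_2y_3 + x_1x_3 & x_1y_2y_3 - y_1x_3 \\ -y_2 & y_1x_2 & x_1x_2 \end{pmatrix} \right)|_{z=(0,0,0)}$,      (62.4.5)

where $x_j = \cos(z_j)$ and $y_j = \sin(z_j)$ for $j \in \mathbb{N}_3$. Let $E_{31} = (L_R^G(\psi^{-1}(z)))_{31}$. Then

    $\partial_{z_1} \psi^2(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = -\partial_{z_1} \arcsin(E_{31})|_{z=\psi(e)}$
    $= -\partial_{z_1} \arcsin(-s_2x_2x_3 + s_1c_2x_2y_3 - c_1c_2y_2)|_{z=(0,0,0)}$
    $= -(1 - E_{31}^2)^{-1/2} \partial_{z_1}(-s_2x_2x_3 + s_1c_2x_2y_3 - c_1c_2y_2)|_{z=(0,0,0)}$      (62.4.6)
    $= -c_2^{-1} \partial_{z_1}(-s_2x_2x_3 + s_1c_2x_2y_3 - c_1c_2y_2)|_{z=(0,0,0)}$      (62.4.7)
    $= -c_2^{-1} \cdot 0 = 0$,

where line (62.4.6) follows from Theorem 44.2.15, and line (62.4.7) follows from $E_{31}|_{z=(0,0,0)} = -s_2$. Similarly,

    $\partial_{z_2} \psi^2(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = -\partial_{z_2} \arcsin(E_{31})|_{z=(0,0,0)}$
    $= -c_2^{-1} \partial_{z_2}(-s_2x_2x_3 + s_1c_2x_2y_3 - c_1c_2y_2)|_{z=(0,0,0)}$
    $= -c_2^{-1} \cdot (-c_1c_2) = c_1$

and

    $\partial_{z_3} \psi^2(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = -\partial_{z_3} \arcsin(E_{31})|_{z=(0,0,0)}$
    $= -c_2^{-1} \partial_{z_3}(-s_2x_2x_3 + s_1c_2x_2y_3 - c_1c_2y_2)|_{z=(0,0,0)}$
    $= -c_2^{-1} \cdot (s_1c_2) = -s_1$.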
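Thus $w^2 = c_1v^2 - s_1v^3$. Similarly, $w^1$ may be computed from lines (62.4.4) and (62.4.5) by computing
derivatives of $\psi^1(R) = \arctan(E_{33}, E_{32})$, where $E_{33} = (L_R^G(\psi^{-1}(z)))_{33}$ and $E_{32} = (L_R^G(\psi^{-1}(z)))_{32}$. One
obtains $E_{33} = -s_2(x_1y_2x_3 + y_1y_3) + s_1c_2(x_1y_2y_3 - y_1x_3) + c_1c_2x_1x_2$, and a similar expression for $E_{32}$. Then

    $\partial_{z_1} \arctan(E_{33}, E_{32}) = (-E_{32}\partial_{z_1}E_{33} + E_{33}\partial_{z_1}E_{32})/(E_{33}^2 + E_{32}^2)$

by Theorem 44.2.15, where $E_{33}^2 + E_{32}^2 = 1 - E_{31}^2$ by the orthogonality of the matrices. The substitution
of $(0,0,0)$ for $z$ then gives $(E_{33}^2 + E_{32}^2)|_{z=(0,0,0)} = 1 - (-s_2)^2 = c_2^2$. Then $\partial_{z_1}E_{33}|_{z=(0,0,0)} = -s_1c_2$ and
$\partial_{z_1}E_{32}|_{z=(0,0,0)} = c_1c_2$. So $\partial_{z_1}\arctan(E_{33}, E_{32})|_{z=(0,0,0)} = (-(s_1c_2) \cdot (-s_1c_2) + c_1c_2 \cdot c_1c_2)/c_2^2 = 1$. Thus

    $\partial_{z_1} \psi^1(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = \partial_{z_1} \arctan(E_{33}, E_{32})|_{z=(0,0,0)} = 1$.

Similarly, $E_{33}|_{z=(0,0,0)} = c_1c_2$, $E_{32}|_{z=(0,0,0)} = s_1c_2$, $\partial_{z_2}E_{33}|_{z=(0,0,0)} = -s_2$ and $\partial_{z_2}E_{32}|_{z=(0,0,0)} = 0$ lead to

    $\partial_{z_2} \psi^1(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = \partial_{z_2} \arctan(E_{33}, E_{32})|_{z=(0,0,0)}$
    $= ((-E_{32}\partial_{z_2}E_{33} + E_{33}\partial_{z_2}E_{32})/(E_{33}^2 + E_{32}^2))|_{z=(0,0,0)}$
    $= (s_1c_2s_2 + c_1c_2 \cdot 0)/c_2^2 = s_1s_2/c_2$.

Similarly,

    $\partial_{z_3} \psi^1(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = \partial_{z_3} \arctan(E_{33}, E_{32})|_{z=(0,0,0)}$
    $= ((-E_{32}\partial_{z_3}E_{33} + E_{33}\partial_{z_3}E_{32})/(E_{33}^2 + E_{32}^2))|_{z=(0,0,0)}$
    $= (-s_1c_2 \cdot 0 + c_1c_2s_2)/c_2^2 = c_1s_2/c_2$.

Consequently $w^1 = v^1 + (s_1s_2/c_2)v^2 + (c_1s_2/c_2)v^3$. Finally, to compute $w^3$ from lines (62.4.4)
and (62.4.5), note that $E_{21}|_{z=0} = c_2s_3$, $E_{11}|_{z=0} = c_2c_3$, $\partial_{z_1}E_{11}|_{z=0} = 0$, $\partial_{z_2}E_{11}|_{z=0} = -(c_1s_2c_3 + s_1s_3)$,
$\partial_{z_3}E_{11}|_{z=0} = s_1s_2c_3 - c_1s_3$, $\partial_{z_1}E_{21}|_{z=0} = 0$, $\partial_{z_2}E_{21}|_{z=0} = -(c_1s_2s_3 - s_1c_3)$, $\partial_{z_3}E_{21}|_{z=0} = s_1s_2s_3 + c_1c_3$ and
$E_{11}^2 + E_{21}^2 = c_2^2$. Then one obtains

    $\partial_{z_1} \psi^3(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = \partial_{z_1} \arctan(E_{11}, E_{21})|_{z=(0,0,0)}$
    $= ((-E_{21}\partial_{z_1}E_{11} + E_{11}\partial_{z_1}E_{21})/(E_{11}^2 + E_{21}^2))|_{z=(0,0,0)}$
    $= (-c_2s_3 \cdot 0 + c_2c_3 \cdot 0)/c_2^2 = 0$,

    $\partial_{z_2} \psi^3(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = \partial_{z_2} \arctan(E_{11}, E_{21})|_{z=(0,0,0)}$
    $= ((-E_{21}\partial_{z_2}E_{11} + E_{11}\partial_{z_2}E_{21})/(E_{11}^2 + E_{21}^2))|_{z=(0,0,0)}$
    $= (-c_2s_3 \cdot (-(c_1s_2c_3 + s_1s_3)) + c_2c_3 \cdot (-(c_1s_2s_3 - s_1c_3)))/c_2^2 = s_1/c_2$,

    $\partial_{z_3} \psi^3(L_R^G(\psi^{-1}(z)))|_{z=\psi(e)} = \partial_{z_3} \arctan(E_{11}, E_{21})|_{z=(0,0,0)}$
    $= ((-E_{21}\partial_{z_3}E_{11} + E_{11}\partial_{z_3}E_{21})/(E_{11}^2 + E_{21}^2))|_{z=(0,0,0)}$
    $= (-c_2s_3 \cdot (s_1s_2c_3 - c_1s_3) + c_2c_3 \cdot (s_1s_2s_3 + c_1c_3))/c_2^2 = c_1/c_2$.

So now one may write $w = Av$ or $\forall k \in \mathbb{N}_3$, $w^k = \sum_{j=1}^3 a^k{}_j v^j$, where

    $A = \begin{pmatrix} 1 & s_1s_2/c_2 & c_1s_2/c_2 \\ 0 & c_1 & -s_1 \\ 0 & s_1/c_2 & c_1/c_2 \end{pmatrix}$.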

This matches the matrix given by Hughes [276], page 28 for the rotation $R_1(\alpha_1)R_2(\alpha_2)R_3(\alpha_3)$, which is in
the reverse order to line (62.4.3) because he is rotating basis frames, not points. (See Theorem 22.9.11 for
the relation between basis frame transformations and component transformations.)

It now follows that the left invariant vector field on $G = SO(3)$ with velocity $V = t_{e,v,\psi} \in T_e(G)$ at the
identity $e \in G$ satisfies

    $\forall \alpha \in \psi(U)$,  $X_V^L(R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1)) = t_{R,Av,\psi}$,

where $R$ is an abbreviation for $R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1)$. In other words,

    $\forall R \in U$, $\forall v \in \mathbb{R}^3$,  $X^L_{t_{e,v,\psi}}(R) = t_{R,Av,\psi}$.
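
The closed form of the matrix $A$ can be cross-checked numerically. The following sketch, which is only an illustration and not part of the derivation above, builds $R(\alpha_1,\alpha_2,\alpha_3)$ from line (62.4.3), implements the chart of line (62.4.4) under the assumption that the two-parameter $\arctan(x, y)$ corresponds to `math.atan2(y, x)`, and approximates $\partial_{z_j}\psi^k(L_R^G(\psi^{-1}(z)))$ at $z = (0,0,0)$ by central differences. The function names, the step size and the sample angles are choices made for this example.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(a1, a2, a3):
    """R(a1, a2, a3) = R3(a3) R2(a2) R1(a1), as in line (62.4.3)."""
    c1, s1 = math.cos(a1), math.sin(a1)
    c2, s2 = math.cos(a2), math.sin(a2)
    c3, s3 = math.cos(a3), math.sin(a3)
    return [[c2 * c3, s1 * s2 * c3 - c1 * s3, c1 * s2 * c3 + s1 * s3],
            [c2 * s3, s1 * s2 * s3 + c1 * c3, c1 * s2 * s3 - s1 * c3],
            [-s2, s1 * c2, c1 * c2]]

def chart(R):
    """Chart of line (62.4.4); arctan(x, y) is assumed to mean atan2(y, x)."""
    return (math.atan2(R[2][1], R[2][2]),   # arctan(R33, R32)
            -math.asin(R[2][0]),            # -arcsin(R31)
            math.atan2(R[1][0], R[0][0]))   # arctan(R11, R21)

def closed_form_A(a1, a2, a3):
    """The matrix A obtained at the end of Example 62.4.20."""
    c1, s1 = math.cos(a1), math.sin(a1)
    c2, s2 = math.cos(a2), math.sin(a2)
    return [[1.0, s1 * s2 / c2, c1 * s2 / c2],
            [0.0, c1, -s1],
            [0.0, s1 / c2, c1 / c2]]

def numeric_A(alpha, h=1e-6):
    """Central-difference Jacobian of z -> chart(R(alpha) R(z)) at z = 0."""
    R = rotation(*alpha)
    A = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        zp = [0.0, 0.0, 0.0]
        zm = [0.0, 0.0, 0.0]
        zp[j], zm[j] = h, -h
        fp = chart(matmul(R, rotation(*zp)))
        fm = chart(matmul(R, rotation(*zm)))
        for k in range(3):
            A[k][j] = (fp[k] - fm[k]) / (2.0 * h)
    return A

alpha = (0.3, 0.4, 0.5)
A_exact = closed_form_A(*alpha)
A_num = numeric_A(alpha)
err = max(abs(A_exact[k][j] - A_num[k][j]) for k in range(3) for j in range(3))
print(err)
```

The comparison only makes sense for angles inside the chart neighbourhood, in particular with $c_2 \neq 0$.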
62.5. Maurer-Cartan forms on Lie groups


62.5.1 REMARK: Triviality of the tangent bundle of a Lie group. 

The tangent bundle of a Lie group $G$ is trivial. A chart which demonstrates this is $\phi : T(G) \to T_e(G)$ defined
by $\phi : z \mapsto (dL_{\pi(z)^{-1}})_{\pi(z)}(z)$, where $e$ is the identity of $G$, $\pi$ is the projection map of $T(G)$, and $dL_{\pi(z)^{-1}}$
is the differential of the map $L_{\pi(z)^{-1}} : G \to G$. Then the map $\pi \times \phi : T(G) \to G \times T_e(G)$ is a global $C^\infty$
diffeomorphism.


62.5.2 REMARK: Absolute parallelism of tangent vectors of a Lie group defined by left translation. 

Left translation of vectors defines an absolute parallelism on a Lie group. More precisely, vectors $V_1 \in T_{g_1}(G)$
and $V_2 \in T_{g_2}(G)$ for $g_1, g_2$ in a Lie group may be said to be parallel if $V_2 = L^T_{g_2 g_1^{-1}}(V_1)$. Of course, right
translation also defines an absolute parallelism.


62.5.3 REMARK: The Maurer-Cartan form. 

The chart $\phi : z \mapsto (dL_{\pi(z)^{-1}})_{\pi(z)}(z)$ in Remark 62.5.1 is generally known as the Maurer-Cartan form, which
appears in differential transition maps for fibre bundle trivialisations. (See in particular Theorem 64.8.10.
For the mirror-image Maurer-Cartan form for right transformations, see also Theorems 64.14.5 and 67.6.8.)


The Maurer-Cartan form does not immediately appear to be a fibre bundle because it does not map
group elements $g \in G$ to elements of $T_g(G)$. It maps vectors in $T(G)$ to a single linear space $T_e(G)$.
However, it could be thought of as the short-cut version, as in Notation 57.7.15, of a cross-section of the
non-short-cut form bundle $X(\Lambda_1(T(G), T_e(G)))$ in Definition 57.6.5. The non-short-cut form corresponding
to Definition 62.5.4 would then be a form $\bar\omega \in X(\Lambda_1(T(G), T_e(G)))$ defined by

    $\forall g \in G$, $\forall V \in T_g(G)$,  $\bar\omega(g)(V) = (dL_{g^{-1}})_g(V)$.

Thus $\forall g \in G$, $\bar\omega(g) \in \Lambda_1(T_g(G), T_e(G)) = \mathrm{Lin}(T_g(G), T_e(G))$. Definition 62.5.4 is clearly a straightforward
short-cut version of this. (The equality $(dL^G_{g^{-1}})_g = (dL^G_g)_e^{-1}$ follows from Theorem 62.3.13 (iii).)


Note that the term “Maurer-Cartan form” sometimes refers to a closely related form on the base space of a
differentiable fibre bundle. (See Remarks 62.5.7 and 64.8.13.)


62.5.4 DEFINITION: The left (i.e. standard) Maurer-Cartan form. 
The (left) Maurer-Cartan form on a Lie group $G$ is the map $\omega : T(G) \to T_e(G)$ defined by

    $\forall g \in G$, $\forall V \in T_g(G)$,  $\omega(V) = (dL^G_{g^{-1}})_g(V)$
                                                $= (dL^G_g)_e^{-1}(V)$.

In other words, $\omega = \bigcup_{g \in G} (dL^G_{g^{-1}})_g$.


62.5.5 REMARK: Left invariance of the Maurer-Cartan form. 

Theorem 62.5.6 uses the obvious extension of Definition 62.3.17 from scalar-valued to vector-valued left 
translation operators. Similarly, Definition 62.4.2 is extended in the obvious way from left invariant vector 
fields to left invariant differential forms. 


62.5.6 THEOREM: The left Maurer-Cartan form on a Lie group is a left invariant differential form. 
Let $G$ be a Lie group. Then the left Maurer-Cartan form $\omega$ on $G$ is a left invariant differential form on $G$.
In other words,

    $\forall h \in G$,  $\omega \circ (L_{h^{-1}})_* = \omega$.

In other words,

    $\forall h \in G$, $\forall V \in T(G)$,  $\omega((dL_{h^{-1}})_{\pi(V)}(V)) = \omega(V)$,

where $\pi : T(G) \to G$ is the tangent bundle projection map for $G$.

PROOF: Let $h \in G$ and $V \in T(G)$. Let $g = \pi(V)$. Then $(dL_{h^{-1}})_g(V) \in T_{h^{-1}g}(G)$. So

    $\omega((dL_{h^{-1}})_g(V)) = (dL_{(h^{-1}g)^{-1}})_{h^{-1}g}((dL_{h^{-1}})_g(V))$
    $= (d(L_{(h^{-1}g)^{-1}} \circ L_{h^{-1}}))_g(V)$
    $= (dL_{g^{-1}})_g(V)$
    $= \omega(V)$.

Hence $\omega$ is a left invariant differential form on $G$ by Definition 62.3.17.
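
The invariance computed in this proof can be illustrated on a matrix group. The sketch below is an assumption-laden example rather than part of the proof: it takes $G = SO(3)$, represents a tangent vector at $g$ as a matrix $V = gA$ with $A$ antisymmetric, uses the matrix form $\omega(V) = g^{-1}V$ of the left Maurer-Cartan form, and uses the transpose as the inverse of a rotation matrix. The function `maurer_cartan` and the sample matrices are hypothetical names and values chosen for this illustration.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose, which is the inverse for a rotation matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

def maurer_cartan(g, V):
    """Matrix form of omega at g: V in T_g(G) is sent to g^{-1} V in T_e(G)."""
    return matmul(transpose(g), V)

g = matmul(rot_z(0.7), rot_x(0.3))
h = matmul(rot_x(1.2), rot_z(-0.5))
A = [[0.0, -1.0, 0.5], [1.0, 0.0, -2.0], [-0.5, 2.0, 0.0]]  # element of T_e(G)
V = matmul(g, A)                                            # tangent vector at g

h_inv = transpose(h)
pushed_point = matmul(h_inv, g)    # base point h^{-1} g
pushed_vector = matmul(h_inv, V)   # (dL_{h^{-1}})_g (V)

err_invariance = max_abs_diff(maurer_cartan(pushed_point, pushed_vector),
                              maurer_cartan(g, V))
err_value = max_abs_diff(maurer_cartan(g, V), A)
print(err_invariance, err_value)
```

In this matrix picture the form simply recovers the Lie algebra element $A$, whichever left translate of $V$ it is applied to.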


62.5.7 REMARK: The Maurer-Cartan form in the literature. 

In at least some of the literature, the form $(dL^G_{g^{-1}})_g : T_g(G) \to T_e(G)$ is denoted “$g^{-1}$”. Thus, for example,
the notation $g^{-1}dg$ used by Drechsler/Mayer [262], page 204, means $(dL^G_{g^{-1}})_g$(“$dg$”), where “$dg$” means any
element of $T_g(G)$. This kind of notation is encountered with great frequency in the gauge theory literature.
(See also Remark 64.8.13.) Consequently “$g^{-1}$” can mean either the inverse element $g^{-1} \in G$ of $g \in G$, or
it can mean the Maurer-Cartan form at $g$, which is $(dL^G_{g^{-1}})_g : T_g(G) \to T_e(G)$, according to context.


More generally, the notation “$g^{-1}$”, applied to vectors in $T(G)$, may be interpreted as $(L_{g^{-1}})_*$ if it is written
on the left, and “$g$” may be interpreted as $(R_g)_*$ if it is written on the right. So an expression such as
“$g^{-1}ug$” may be interpreted as $(L_{g^{-1}})_*((R_g)_*(u)) = (dL_{g^{-1}})_g((dR_g)_e(u)) = \mathrm{Adj}(g^{-1})(u)$ for all $g \in G$ and
$u \in T_e(G)$. (See Notation 62.10.3 for $\mathrm{Adj}(g)$.)

According to Chevalley [60], page 152: “A left-invariant Pfaffian form is called a form of Maurer-Cartan.” 
This agrees with Definition 62.5.4. 


62.6. Right translation operators on Lie groups 


62.6.1 REMARK: Semi-superfluous conversion of left-invariant to right-invariant vector field definitions. 

In the interests of fair play, Sections 62.6 and 62.7 replicate Sections 62.3 and 62.4 for the case of right
translation operators and right invariant vector fields. One substantial advantage of this apparently slightly
superfluous mirror-image replication of definitions and theorems is that it reduces the mental burden of
having to mirror-reverse them when applying them to other contexts. It also reduces errors which can occur
when multiple left/right “mirror reversals” must be made to multiple interacting structures simultaneously.


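62.6.2 DEFINITION: Let $G$ be a Lie group. Let $g \in G$.
The right translation operator (for group elements) $R_g^G : G \to G$ is defined by

    $\forall x \in G$,  $R_g^G(x) = xg$.

The right translation operator (for real-valued functions) $R_g^C : C^0(G) \to C^0(G)$ is defined by

    $\forall \phi \in C^0(G)$,  $R_g^C(\phi) = \phi \circ R_{g^{-1}}^G$.
That is,
    $\forall \phi \in C^0(G)$, $\forall x \in G$,  $R_g^C(\phi)(x) = \phi(xg^{-1})$.

The right translation operator (for tangent operators) $R_g^T : T(G) \to T(G)$ is defined by

    $\forall V \in T(G)$,  $R_g^T(\partial_V) = \partial_V \circ R_{g^{-1}}^C$.
That is,
    $\forall V \in T(G)$, $\forall \phi \in C^1(G)$,  $R_g^T(\partial_V)(\phi) = \partial_V(R_{g^{-1}}^C(\phi))$
                                                     $= \partial_V(\phi \circ R_g^G)$.

The right translation operator (for tangent operator fields) $R_g^X : X^0(T(G)) \to X^0(T(G))$ is defined by

    $\forall X \in X^0(T(G))$,  $R_g^X(\partial_X) = R_g^T \circ (\partial_X \circ R_{g^{-1}}^G)$.
That is,
    $\forall X \in X^0(T(G))$, $\forall x \in G$,  $R_g^X(\partial_X)(x) = R_g^T(\partial_X(xg^{-1}))$.
That is,
    $\forall X \in X^0(T(G))$, $\forall x \in G$, $\forall \phi \in C^1(G)$,  $R_g^X(\partial_X)(x)(\phi) = \partial_{X(xg^{-1})}(\phi \circ R_g^G)$.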
62.6.3 DEFINITION: Let $G$ be a Lie group. Let $g \in G$.
The right translation operator (for tangent vectors) $R_g^T : T(G) \to T(G)$ is defined by $R_g^T = (R_g^G)_*$:

    $\forall V \in T(G)$,  $R_g^T(V) = (R_g^G)_*(V)$
                           $= \partial_V \circ R_{g^{-1}}^C$.

In other words,

    $\forall p \in G$, $\forall V \in T_p(G)$,  $R_g^T(V) = (dR_g^G)_p(V)$.

The right translation operator (for vector fields) $R_g^X : X^0(T(G)) \to X^0(T(G))$ is defined by

    $\forall X \in X^0(T(G))$,  $R_g^X(X) = R_g^T \circ X \circ R_{g^{-1}}^G$
                                $= (R_g^G)_* \circ X \circ R_{g^{-1}}^G$.
That is,
    $\forall X \in X^0(T(G))$, $\forall x \in G$,  $R_g^X(X)(x) = R_g^T(X(xg^{-1}))$
                                                   $= (R_g^G)_*(X(xg^{-1}))$.


62.6.4 REMARK: Illustration of four kinds of right translation operators.

The interpretation of Definitions 62.6.2 and 62.6.3 may be assisted by Figure 62.6.1.


[Figure 62.6.1: diagram of the right translation operators $R_g^G$, $R_g^C$, $R_g^T$ and $R_g^X$ acting on the points $x$, $xg$ and $xg^{-1}$]
62.6.5 THEOREM: Consistency of right translation operators for tangent operators and vectors.
Definition 62.6.2 is consistent with Definition 62.6.3. That is, $\partial_{R_g^T(V)} = R_g^T(\partial_V)$ for all $g \in G$ and $V \in T(G)$,
and $\partial_{R_g^X(X)} = R_g^X(\partial_X)$ for all $g \in G$ and $X \in X^0(T(G))$.


PROOF: The proof is the same as for Theorem 62.3.10.


62.6.6 THEOREM: Covariance of right translation operators. 
Let G be a Lie group. Let gi, g2 € G. 


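(i) $R_{g_1}^G \circ R_{g_2}^G = R_{g_2 g_1}^G$.
(ii) $R_{g_1}^C \circ R_{g_2}^C = R_{g_2 g_1}^C$.
(iii) $R_{g_1}^T \circ R_{g_2}^T = R_{g_2 g_1}^T$ (for tangent operators).
(iv) $R_{g_1}^X \circ R_{g_2}^X = R_{g_2 g_1}^X$ (for tangent operator fields).
(v) $R_{g_1}^T \circ R_{g_2}^T = R_{g_2 g_1}^T$ (for tangent vectors).
(vi) $R_{g_1}^X \circ R_{g_2}^X = R_{g_2 g_1}^X$ (for vector fields).
PROOF: The assertions may be proved as for Theorem 62.3.12 (except that everything is reversed).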
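
The order reversal for group elements is easy to see in a small numerical experiment. The following sketch is only an illustration: it assumes $G = SO(3)$ with $R_g^G(x) = xg$ realised as matrix multiplication on the right, and the helper `right_translate` and the sample rotations are names and values introduced for this example.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

def right_translate(g):
    """Right translation by g for group elements: x -> x g."""
    return lambda x: matmul(x, g)

g1 = rot_x(0.9)
g2 = rot_z(-0.6)
x = matmul(rot_z(0.2), rot_x(1.3))

composed = right_translate(g1)(right_translate(g2)(x))    # (x g2) g1
same_order = right_translate(matmul(g2, g1))(x)           # x (g2 g1)
swapped_order = right_translate(matmul(g1, g2))(x)        # x (g1 g2)

err_reversed = max_abs_diff(composed, same_order)
err_unreversed = max_abs_diff(composed, swapped_order)
print(err_reversed, err_unreversed)
```

Since the two sample rotations do not commute, only the product $g_2 g_1$ reproduces the composition $R_{g_1}^G \circ R_{g_2}^G$.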


62.6.7 THEOREM: Some basic properties of right translations by inverse group elements. 
Let G be a Lie group. 


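(i) $\forall g \in G$, $R_{g^{-1}}^G \circ R_g^G = \mathrm{id}_G = R_g^G \circ R_{g^{-1}}^G$.
(ii) $\forall g \in G$, $(dR_{g^{-1}}^G)_g \circ (dR_g^G)_e = \mathrm{id}_{T_e(G)} = (dR_g^G)_{g^{-1}} \circ (dR_{g^{-1}}^G)_e$.
(iii) $\forall g \in G$, $(dR_g^G)_e^{-1} = (dR_{g^{-1}}^G)_g$.


PROOF: Part (i) follows from Theorem 62.6.6 (i) because $R_e^G = \mathrm{id}_G$.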


Part (ii) follows from Theorem 62.6.6 (v) and Definition 62.6.3. Alternatively apply Theorem 58.5.4, the 
chain rule for differentials of inverses of C! maps, to part (i). 


Part (iii) follows from part (ii). 


62.6.8 REMARK: Right translation operators for tangent covectors and differential forms. 

Definitions 62.6.9 and 62.6.10 are mirror images of Definitions 62.3.15 and 62.3.17 respectively. For comments 
on the corresponding left translation operators, see Remarks 62.3.14 and 62.3.16. (The construction of the 
right transformation operator for differential forms is illustrated in Figure 62.6.2.) 


[Figure 62.6.2: Right translation operator for differential forms on Lie groups]


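62.6.9 DEFINITION: Let $G$ be a Lie group. Let $g \in G$.
The right translation operator (for tangent covectors) $R_g^{T^*} : T^*(G) \to T^*(G)$ is defined by

    $\forall \lambda \in T^*(G)$,  $R_g^{T^*}(\lambda) = \lambda \circ R_{g^{-1}}^T$
                                   $= \lambda \circ (R_{g^{-1}}^G)_*$.

In other words,

    $\forall p \in G$, $\forall \lambda \in T_p^*(G)$, $\forall V \in T_{pg}(G)$,  $R_g^{T^*}(\lambda)(V) = \lambda(R_{g^{-1}}^T(V))$
                                                                                   $= \lambda((R_{g^{-1}}^G)_*(V))$
                                                                                   $= \lambda(\partial_V \circ R_g^C)$.

The right translation operator (for differential forms) $R_g^{X^*} : X(T^*(G)) \to X(T^*(G))$ is defined by

    $\forall \omega \in X(T^*(G))$,  $R_g^{X^*}(\omega) = R_g^{T^*} \circ \omega \circ R_{g^{-1}}^G$.
In other words,
    $\forall \omega \in X(T^*(G))$, $\forall p \in G$,  $R_g^{X^*}(\omega)(p) = R_g^{T^*}(\omega(pg^{-1}))$
                                                        $= \omega(pg^{-1}) \circ R_{g^{-1}}^T$
                                                        $= \omega(pg^{-1}) \circ (R_{g^{-1}}^G)_*$.
In other words,
    $\forall \omega \in X(T^*(G))$, $\forall p \in G$, $\forall V \in T_p(G)$,  $R_g^{X^*}(\omega)(p)(V) = R_g^{T^*}(\omega(pg^{-1}))(V)$
                                                                                $= \omega(pg^{-1})(R_{g^{-1}}^T(V))$
                                                                                $= \omega(pg^{-1})((R_{g^{-1}}^G)_*(V))$
                                                                                $= \omega(pg^{-1})(\partial_V \circ R_g^C)$.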


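62.6.10 DEFINITION: Let $G$ be a Lie group. Let $g \in G$.
The right translation operator (for short-cut differential forms) $R_g^{\Lambda} : X(\Lambda_1(T(G))) \to X(\Lambda_1(T(G)))$ is
defined by

    $\forall \omega \in X(\Lambda_1(T(G)))$,  $R_g^{\Lambda}(\omega) = \omega \circ R_{g^{-1}}^T$
                                              $= \omega \circ (R_{g^{-1}}^G)_*$.

In other words,

    $\forall \omega \in X(\Lambda_1(T(G)))$, $\forall V \in T(G)$,  $R_g^{\Lambda}(\omega)(V) = \omega(R_{g^{-1}}^T(V)) = \omega((R_{g^{-1}}^G)_*(V))$.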


62.7. Right invariant vector fields on Lie groups 


62.7.1 REMARK: Application of right-invariant vector fields to principal fibre bundles. 

Defining mirror images of left-invariant vector fields is not entirely redundant. Right invariant vector fields
are useful for defining connections on principal bundles, which must leave fibre set structure invariant. So 
the infinitesimal transformations specified by connections must be right invariant vector fields because these 
are equivalent to infinitesimal left actions of the group on itself. (See Remarks 62.4.12, 63.6.2 and 63.7.1 for 
the interpretation of left invariant vector fields as infinitesimal right action maps.) 


62.7.2 DEFINITION: A right invariant vector field on a Lie group $G$ is a vector field $X$ on $G$ such that

    $\forall g \in G$,  $R_g^X(X) = X$.

In other words,

    $\forall g \in G$,  $R_g^T \circ X = X \circ R_g^G$.


62.7.3 NOTATION: $X_R(T(G))$ denotes the set of right invariant vector fields on a Lie group $G$.
$X_R^k(T(G))$, for $k \in \bar{\mathbb{Z}}_0^+$, denotes the set of right invariant $C^k$ vector fields on a Lie group $G$.


In other words, $X_R^k(T(G)) = X_R(T(G)) \cap X^k(T(G))$ for any $k \in \bar{\mathbb{Z}}_0^+$ and Lie group $G$.
62.7.4 THEOREM: Relations between right invariant vector field values at different points.
Let $G$ be a Lie group. Then $X(g) = R_g^T(X(e)) = (R_g^G)_*(X(e)) = (dR_g^G)_e(X(e))$ for any right invariant
vector field $X$ on $G$ and any $g \in G$, where $e$ is the identity of $G$.


PROOF: The proof is the mirror image of the proof for Theorem 62.4.5 (iii).


62.7.5 THEOREM: The right translation field of a vector is a right invariant vector field. 
Let $G$ be a Lie group with identity $e$. Let $V \in T_e(G)$. Then $X : G \to T(G)$ defined by $\forall g \in G$, $X(g) = R_g^T(V)$
is the unique right invariant vector field $X \in X_R(T(G))$ such that $X(e) = V$.


PROOF: The assertion may be proved as for Theorem 62.4.13. 


62.7.6 REMARK: The (unique) right invariant vector field with a given value at the identity. 

It follows from Theorem 62.7.5 that one may define the right invariant vector field $X \in X_R(T(G))$ with a
given value $X(e) = V$ at the identity of the group. This is given a name and notation in Definition 62.7.7.
It follows from Definition 62.6.3 that $R_g^T(V) = (R_g^G)_*(V) = (dR_g^G)_e(V)$ for $V \in T_e(G)$ in Theorem 58.7.7.
So the expression $(dR_g^G)_e(V)$ is used in Definition 62.7.7.

This vector field does not play the same kind of fundamental role in the definition of the Lie algebra of a 
Lie group that the left invariant vector field does. (This is mentioned in Remark 62.4.14.) 


62.7.7 DEFINITION: The right invariant vector field generated by a vector $V \in T_e(G)$, for a Lie group $G$,
is the unique vector field $X_V^R \in X_R(T(G))$ which satisfies

    $\forall g \in G$,  $X_V^R(g) = (dR_g^G)_e(V)$.


62.7.8 THEOREM: Some basic properties of right invariant vector fields generated by a vector. 
Let G be a Lie group with identity e. 


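(i) $\forall V \in T_e(G)$, $X_V^R(e) = V$.

(ii) $\forall V \in T_e(G)$, $\forall g \in G$, $X_V^R(g) = (R_g^G)_*(V) = \partial_V \circ R_{g^{-1}}^C = R_g^T(V)$.

(iii) $\forall V \in T_e(G)$, $\forall g \in G$, $X_V^R(g) = (dR_g^G)_e(X_V^R(e)) = X^R_{X_V^R(e)}(g)$.

(iv) $X_R(T(G)) = \{X_V^R; V \in T_e(G)\}$.

(v) $\forall V \in T_e(G)$, $X_V^R \in X^\infty(T(G))$.

(vi) $\forall V \in T_e(G)$, $X_V^R \in X_R^\infty(T(G))$.

(vii) $X_R(T(G)) = X_R^\infty(T(G))$.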


PROOF: Part (i) follows directly from Definition 62.7.7 and Theorem 62.7.5. 
Part (ii) follows from Definitions 62.7.7 and 62.6.3. 

Part (iii) follows from part (i) and Definition 62.7.7. 

Parts (iv) and (v) may be proved as for Theorem 62.4.16 (iv, v).

Part (vi) follows from parts (iv) and (v).
Part (vii) follows from parts (iv) and (vi), and Notation 62.7.3.


62.7.9 THEOREM: The Lie algebra of right invariant vector fields is a Lie subalgebra of all vector fields.
The set $\mathfrak{X}_R^\infty(T(G))$ of right invariant $C^\infty$ vector fields on $G$ is a Lie subalgebra of the Lie algebra $\mathfrak{X}^\infty(T(G))$
of $C^\infty$ vector fields on $G$. Hence $\mathfrak{X}_R^\infty(T(G))$ is a Lie algebra.


PROOF: The assertion may be proved as for Theorem 62.4.11.


62.7.10 REMARK: Group elements can be acted on both from the left and from the right.

Group elements may be thought of as "two-port" objects because they can be multiplied both on the left and
on the right, whereas the elements of the passive set of a Lie transformation group are "one-port" objects
because they can only be multiplied from the left. This turns out to be the fundamental reason why principal
fibre bundles have some advantages over ordinary fibre bundles. Theorem 62.7.11 is an example of why the
two-port property is useful: the left and right actions commute. This follows from associativity.


62.7.11 THEOREM: Order-independence of mixed left and right translation operators.
Let G be a Lie group. Then 


(i) Lg o RẸ = RF o LẸ for all g,h € G. 
(ii) LC o RẸ = RẸ o LẸ for all g,h € G. 
(iii) LT o RẸ = RẸ o L1 for all g,h € G. 


PROOF: For part (i), $(L_g \circ R_h)(x) = g(xh) = (gx)h = (R_h \circ L_g)(x)$ for all $x, g, h \in G$.

For part (ii), (LF o RẸ)() = 6o LZ o RẸ = G0 RẸ- © LE $4 = (Ry o LG)(@) by part (i) for all 
$ € C°(G), for all g,h € G. 

For part ar (Lg o RA )(V) = (9). (5 «(V) = (LF o R)«(V) = (RẸ o LF )e(V) = (RE) (LF )-(V) = 
(RI o LTY(V ) hs: part (i) and Theorem 58.4.13, the chain rule for differentials of differentiable maps, for all 
Ve T(G), for all g,h € G. 




62.7.12 REMARK: The non-standard right invariant Maurer-Cartan form. 

Definition 62.7.13 and Theorem 62.7.14 are the mirror images of Definition 62.5.4 and Theorem 62.5.6 
respectively. See Remark 62.5.3 for some comments on the standard left-invariant Maurer-Cartan form. 
(The equality $(dR_{g^{-1}})_g = (dR_g)_e^{-1}$ follows from Theorem 62.6.7 (iii).)


62.7.13 DEFINITION: The right (i.e. non-standard) Maurer-Cartan form. 
The right Maurer-Cartan form on a Lie group $G$ is the map $\omega : T(G) \to T_e(G)$ defined by

$$\forall g \in G,\ \forall V \in T_g(G),\quad \omega(V) = (dR_{g^{-1}})_g(V) = (dR_g)_e^{-1}(V).$$


62.7.14 THEOREM: The right Maurer-Cartan form on a Lie group is a right invariant differential form. 
Let $G$ be a Lie group. Then the right Maurer-Cartan form $\omega$ on $G$ is a right invariant differential form on $G$.
In other words,

$$\forall h \in G,\quad \omega \circ (R_{h^{-1}})_* = \omega.$$

In other words,

$$\forall h \in G,\ \forall V \in T(G),\quad \omega((dR_{h^{-1}})_{\pi(V)}(V)) = \omega(V),$$

where $\pi : T(G) \to G$ is the tangent bundle projection map for $G$.


PROOF: Let $h \in G$ and $V \in T(G)$. Let $g = \pi(V)$. Then $(dR_{h^{-1}})_g(V) \in T_{gh^{-1}}(G)$. So

$$\omega((dR_{h^{-1}})_g(V)) = (dR_{(gh^{-1})^{-1}})_{gh^{-1}}((dR_{h^{-1}})_g(V))$$
$$= (d(R_{hg^{-1}} \circ R_{h^{-1}}))_g(V)$$
$$= (dR_{g^{-1}})_g(V)$$
$$= \omega(V).$$

Hence $\omega$ is a right invariant differential form on $G$ by Definition 62.6.10.


62.8. The Lie algebra of a Lie group 


62.8.1 REMARK: Lie algebras may be defined as tangent vectors or as vector fields. 
The Lie algebra of an abstract Lie group is generally defined in two different ways. 


(1) As tangent vectors in the tangent space $T_e(G)$ of the identity of the group (as in Definition 62.8.8).
(2) As left invariant vector fields in $\mathfrak{X}_L^\infty(T(G))$ (as in Definition 62.8.3).

By Theorems 62.4.13 and 62.4.16, these definitions are interchangeable and equi-informational because of
the one-to-one correspondence between them. (See Remark 19.10.1 parts (2) and (3) for some authors who
present these two definition styles.)


The tangent space version $T_e(G)$ is mostly preferred here as the primary definition of the Lie algebra of a Lie
group, but the product operation on this space comes from the vector field version, which defines the product
operation using the Lie bracket of left-invariant vector fields. So the vector field is truly the fundamental
object, although the tangent space version is best for many applications.


In practice, one must be able to switch back and forth between the two definitions as required. In the
literature, it is often not clear which definition is assumed, and it often does not matter. However, it is best
to be clear whether the assumed definition is $T_e(G)$ or $\mathfrak{X}_L^\infty(T(G))$ because these are not the same ZF set.


One may also define Lie algebras of Lie groups more concretely as matrices if the Lie group is itself defined
as a matrix group. Then the Lie algebra product operation is defined as matrix commutation. (For a short
literature list, see Remark 19.10.1 part (4).) If the Lie group is represented as linear operators on a Hilbert
space, the Lie algebra may be represented as linear operators also. However, Section 62.8 is concerned with
Lie groups as differentiable manifolds rather than the concrete representations which arise in physics.


Lie algebras may be studied as abstract algebraic structures in complete isolation from Lie groups. (See
Definition 19.10.3 for abstract Lie algebras. See Remark 19.10.1 part (1) for some references to abstract Lie
algebra presentations.)


62.8.2 REMARK: The vector field version of the Lie algebra of a Lie group. 
The well-definition of the Lie algebra of a Lie group follows from Theorem 62.4.11 (iii), which asserts that
the left invariant $C^\infty$ vector fields form a Lie subalgebra of the general $C^\infty$ vector fields.


62.8.3 DEFINITION: The Lie algebra of a Lie group. Vector-field version. 
The (vector-field) Lie algebra of a Lie group $G$ is the linear space $\mathfrak{X}_L^\infty(T(G))$ of all left invariant vector fields
on $G$ together with the operation $[\cdot,\cdot] : \mathfrak{X}_L^\infty(T(G)) \times \mathfrak{X}_L^\infty(T(G)) \to \mathfrak{X}_L^\infty(T(G))$ defined by


VX,Y e X%(T(G)), [X,Y] = w o (0xY — E o ðy X). (62.8.1) 
In other words, it is the Lie algebra of left invariant vector fields on G. (See Definition 62.4.10.) 


62.8.4 THEOREM: The vector-field Lie algebra of a Lie group is a real Lie algebra. 
The vector-field Lie algebra of a Lie group is a real Lie algebra. 


PROOF: The assertion follows from Theorem 62.4.11 (iv) and Definitions 62.4.10 and 61.5.7. 


62.8.5 REMARK: Computation of Lie algebra operations from an abstract manifold definition. 

Explicit computation of Lie algebra operations using Definition 62.8.3 (and using Definition 62.4.15 for the 
computation of left invariant vector fields as in Example 62.4.20) is clearly not an efficient practical method. 
In real life, one uses matrix multiplication and basic calculus because it is very quick, easy and simple. 
However, computation according to literal definitions is applied in Example 62.8.6 to demonstrate meanings 
of concepts, and to verify that abstract manifold definitions do ultimately yield computational outputs. 


62.8.6 EXAMPLE: Computation of vector-field version of the Lie algebra of the Lie group SO(3). 

The computation of the product operation of the Lie algebra of a Lie group (defined using manifolds, not 
matrices) requires the computation of the Lie bracket for left invariant vector fields on the Lie group. (For 
an example of Lie bracket computation for general vector fields on manifolds, see Example 61.5.14.) 


The quintessential non-abelian Lie group is $SO(3)$, which may be defined (as in Remark 25.14.1) as the
matrix group $\{R \in M_{3,3}(\mathbb{R});\ R^T R = I_3,\ \det(R) = 1\}$. Matrices $R = R(\alpha_1,\alpha_2,\alpha_3) \in SO(3)$ have the form

$$R(\alpha_1,\alpha_2,\alpha_3) = R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1) = \begin{pmatrix} c_2c_3 & s_1s_2c_3 - c_1s_3 & c_1s_2c_3 + s_1s_3 \\ c_2s_3 & s_1s_2s_3 + c_1c_3 & c_1s_2s_3 - s_1c_3 \\ -s_2 & s_1c_2 & c_1c_2 \end{pmatrix} \tag{62.8.2}$$

where $s_i = \sin\alpha_i$ and $c_i = \cos\alpha_i$ for $i = 1,2,3$, as in Example 63.6.23 line (63.6.3).

The map $R : \mathbb{R}^3 \to SO(3)$ is injective in a small enough neighbourhood $U \in \mathrm{Top}_0(\mathbb{R}^3)$ of $0 = (0,0,0) \in \mathbb{R}^3$.
The identity element $e \in G = SO(3)$ equals $R(0,0,0) = I_3 = [\delta_{ij}]_{i,j=1}^3$. The restriction of $R$ to such a
neighbourhood is a $C^\infty$ bidirectional map. So its inverse $\psi = (R|_U)^{-1} : R(U) \to U$ provides a manifold
chart on $\psi^{-1}(U) \in \mathrm{Top}_e(G)$ which faithfully represents the differentiable structure on the submanifold $G$
of $M_{3,3}(\mathbb{R})$.

To apply Definition 62.8.3, left invariant vector fields are required as inputs. The left invariant vector fields
for $SO(3)$ have been computed in Example 62.4.20 using Definition 58.4.5 for the differential of a map
between $C^1$ manifolds.


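A left invariant vector field $X \in \mathfrak{X}_L(T(G))$ satisfies $X(R) = (dL_R)_e(X(e))$ for all $R \in G$ by Definition 62.4.15
and Theorem 62.4.16 (i). Let $V \in T_e(G)$. Then $V = t_{e,v,\psi}$ for some $v \in \mathbb{R}^3$ by Notation 54.1.4. It follows
from Example 62.4.20 that

$$\forall \alpha \in U,\quad X_V^L(R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1)) = t_{R(\alpha),A(\alpha)v,\psi},$$

where $R(\alpha)$ is an abbreviation for $R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1)$, and

$$\forall \alpha \in U,\quad A(\alpha) = \begin{pmatrix} 1 & s_1s_2/c_2 & c_1s_2/c_2 \\ 0 & c_1 & -s_1 \\ 0 & s_1/c_2 & c_1/c_2 \end{pmatrix}.$$

In other words,

$$\forall R \in \psi^{-1}(U),\ \forall v \in \mathbb{R}^3,\quad (dL_R)_e(t_{e,v,\psi}) = t_{R,A(\psi(R))v,\psi}. \tag{62.8.3}$$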
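
Line (62.8.3) can be checked numerically. The following is a minimal sketch, not part of the original example. It assumes that the tangent vector with chart components $v$ at the identity is represented by the curve $t \mapsto R(tv)$, and it compares a central finite difference of the chart image of $R(\alpha)R(tv)$ with the product $A(\alpha)v$. The helper names (`rot`, `chart`, `coeff_matrix`, `pushed_velocity`) and the sample values are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot(alpha):
    """R(alpha) = R3(a3) R2(a2) R1(a1), as in line (62.8.2)."""
    a1, a2, a3 = alpha
    s1, c1 = math.sin(a1), math.cos(a1)
    s2, c2 = math.sin(a2), math.cos(a2)
    s3, c3 = math.sin(a3), math.cos(a3)
    return [[c2 * c3, s1 * s2 * c3 - c1 * s3, c1 * s2 * c3 + s1 * s3],
            [c2 * s3, s1 * s2 * s3 + c1 * c3, c1 * s2 * s3 - s1 * c3],
            [-s2, s1 * c2, c1 * c2]]

def chart(m):
    """Local inverse of rot near the identity (valid while cos(a2) > 0)."""
    a1 = math.atan2(m[2][1], m[2][2])
    a2 = -math.asin(m[2][0])
    a3 = math.atan2(m[1][0], m[0][0])
    return (a1, a2, a3)

def coeff_matrix(alpha):
    """The matrix A(alpha) quoted in Example 62.8.6."""
    a1, a2, _ = alpha
    s1, c1 = math.sin(a1), math.cos(a1)
    s2, c2 = math.sin(a2), math.cos(a2)
    return [[1.0, s1 * s2 / c2, c1 * s2 / c2],
            [0.0, c1, -s1],
            [0.0, s1 / c2, c1 / c2]]

def pushed_velocity(alpha, v, eps=1e-6):
    """Central difference of chart(R(alpha) R(t v)) with respect to t at t = 0."""
    base = rot(alpha)
    plus = chart(matmul(base, rot([eps * x for x in v])))
    minus = chart(matmul(base, rot([-eps * x for x in v])))
    return [(p - q) / (2 * eps) for p, q in zip(plus, minus)]

# Arbitrary sample point and velocity (assumed values, chosen inside the chart).
alpha = (0.3, -0.4, 0.5)
v = (0.7, -1.1, 0.2)
numeric = pushed_velocity(alpha, v)
a_mat = coeff_matrix(alpha)
exact = [sum(a_mat[i][j] * v[j] for j in range(3)) for i in range(3)]
max_error = max(abs(n - e) for n, e in zip(numeric, exact))
print(max_error)
```

The printed discrepancy should be at the level of finite-difference error, which supports the stated form of $A(\alpha)$ for this parametrisation.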


By Definition 62.8.3, the Lie algebra operation for $G$ acts on $\mathfrak{X}_L^\infty(T(G))$. By Theorem 62.4.16 (iv, vii),
$\mathfrak{X}_L^\infty(T(G))$ is the set of left invariant vector fields $X_V^L$ on $G$ such that $V \in T_e(G)$. Thus

$$\mathfrak{X}_L^\infty(T(G)) = \{X^L_{t_{e,v,\psi}};\ v \in \mathbb{R}^3\}, \tag{62.8.4}$$

where $\forall R \in G,\ \forall V \in T_e(G),\ X_V^L(R) = (dL_R)_e(V)$ by Definition 62.4.15.
To compute $[X,Y]$ according to line (62.8.1), first one must compute $\partial_X Y$. By line (62.8.4), $X = X^L_{t_{e,v,\psi}}$
and $Y = X^L_{t_{e,w,\psi}}$ for some $v, w \in \mathbb{R}^3$. Then for such $w$, line (62.8.3) implies

$$\forall R \in \psi^{-1}(U),\quad Y(R) = X^L_{t_{e,w,\psi}}(R)$$
$$= (dL_R)_e(t_{e,w,\psi})$$
$$= t_{R,A(\psi(R))w,\psi}.$$

This has a double dependency on $R$. The base point of the vector is $R$, and its tangent velocity parameter
is $A(\psi(R))w$. The map $R \mapsto t_{R,A(\psi(R))w,\psi}$ may be differentiated at $R = e$ by the vector $t_{e,v,\psi}$. Computation
of the derivative of a vector field by a vector can be achieved using Theorem 61.2.12.


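Define $f : \psi^{-1}(U) \to \mathbb{R}^3$ by $f(R) = A(\psi(R))w$ for all $R \in \psi^{-1}(U)$. Define $h : \psi^{-1}(U) \times \mathbb{R}^3 \to \mathbb{R}^3$ by
$h(R,v) = \sum_{i=1}^3 v_i\,\partial_i(f \circ \psi^{-1})(\psi(R))$ for all $(R,v) \in \psi^{-1}(U) \times \mathbb{R}^3$. Then

$$\forall R \in \psi^{-1}(U),\ \forall v \in \mathbb{R}^3,\quad h(R,v) = \sum_{i=1}^3 v_i\,\partial_{\alpha_i}\bigl(A(\alpha)w\bigr)\Big|_{\alpha=\psi(R)}$$
$$= \sum_{i=1}^3 v_i\,\partial_{\alpha_i}A(\alpha)\Big|_{\alpha=\psi(R)}\,w.$$

The partial derivatives of $A$ are easiest to compute at $\alpha = 0 = (0,0,0)$, which corresponds to $R = e_G$.

$$\partial_{\alpha_1}A(\alpha)\Big|_{\alpha=0} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},\quad \partial_{\alpha_2}A(\alpha)\Big|_{\alpha=0} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad \partial_{\alpha_3}A(\alpha)\Big|_{\alpha=0} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

From this one obtains

$$\forall v \in \mathbb{R}^3,\quad h(e_G,v) = v_1(0,-w_3,w_2) + v_2(w_3,0,0) + v_3(0,0,0)$$
$$= (v_2w_3,\ -v_1w_3,\ v_1w_2).$$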


Therefore by Theorem 61.2.12, 


3 _ (2) 
Vv, w € IR?, bew yp Y = DERIT. 


240 


6, ,U,(U2w3,—U113,U172),V"* 
Similarly, one obtains 
Vv, w € R8, ewy X = o 


€,U,w,(w2U3,—w1U3,w102),U" 


Consequently, for X — X d and Y = XL. " 


[X, Y](e) = (w o (Ox Y — Eo 0y X))(e) 
= w(Ox(gY — E(0y(e) X)) 
— w (ðe v pY = = (0e,w,X )) 


a (2) zu 0) 
~~ OES sy oi Cain, rane E SPENT) 
2 (2) (2) 

_ OL tai Ciatks tuli E © — 

- (2) 

zd p ———ma! 


$$= t_{e,(v_2w_3 - w_2v_3,\ -v_1w_3 + w_1v_3,\ v_1w_2 - w_1v_2),\psi}$$
$$= (v_2w_3 - w_2v_3)\,t_{e,e_1,\psi} + (-v_1w_3 + w_1v_3)\,t_{e,e_2,\psi} + (v_1w_2 - w_1v_2)\,t_{e,e_3,\psi}.$$


This expression shows very clearly the antisymmetric bilinear nature of the Lie bracket when applied to the
vector-field-style Lie algebra of a typical Lie group. Also, even though the parametrisation of the group is
not symmetric with respect to permutations of the three parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$, the Lie brackets of the
corresponding unit vectors show perfect symmetry (or antisymmetry).


Definition 62.8.3 apparently demands computation of [X, Y](R) for all R € SO(3). The asymmetric form 
of the parametrisation used here for SO(3) makes such a computation quite cumbersome. Luckily it is not 
necessary. By Theorem 62.4.11 (ii), the Lie bracket operation for the Lie algebra of left invariant vector fields 
of a Lie group is closed. In other words, the Lie bracket of two left invariant vector fields on a Lie group 
yields another left invariant vector field. 

By Theorem 62.4.5 (iii), a left invariant vector field $Z \in \mathfrak{X}_L(T(G))$ is uniquely determined by its value at the
group identity according to the rule $\forall p \in G,\ Z(p) = (dL_p)_e(Z(e))$. Therefore $[X,Y]$ is the unique element
of $\mathfrak{X}_L^\infty(T(SO(3)))$ which is defined by $\forall R \in SO(3),\ [X,Y](R) = (dL_R)_e([X,Y](e))$. So the full Lie algebra
product operation for $SO(3)$ in the vector-field style as in Definition 62.8.3 is given by


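$$\forall v,w \in \mathbb{R}^3,\quad [X^L_{t_{e,v,\psi}}, X^L_{t_{e,w,\psi}}] = X^L_{t_{e,(v_2w_3 - w_2v_3,\ -v_1w_3 + w_1v_3,\ v_1w_2 - w_1v_2),\psi}}. \tag{62.8.5}$$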
These kinds of computations suggest that the full left invariant vector fields in Definition 62.8.3 should not
be used as the practical computational definition of the Lie algebra of a Lie group. In practice, one computes
at most the action of a vector on such vector fields at the identity of the group. Therefore tangent space
elements at the identity are defined to be the underlying objects in the Lie algebra in Definition 62.8.8,
and the left invariant vector fields generated by them are used principally to determine the multiplication
operation, as mentioned in Remark 62.8.7.


If one uses the shorthand $e_k$ to denote the vector field $X^L_{t_{e,e_k,\psi}}$ for $k \in \mathbb{N}_3$, line (62.8.5) implies the often-seen
formulas $[e_1,e_2] = e_3$, $[e_2,e_3] = e_1$ and $[e_3,e_1] = e_2$.
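
These bracket relations can be compared with the matrix commutator mentioned in Remark 62.8.5. The sketch below is an illustration under an assumption which is not spelled out in this example: the chart unit vector $e_k$ at the identity is identified with the skew-symmetric matrix obtained by differentiating $R_k(t)$ at $t = 0$. The function names `hat`, `commutator` and `cross` are hypothetical helpers.

```python
def hat(v):
    """Skew-symmetric matrix of v; equals the derivative at t = 0 of R(t*v) in line (62.8.2)."""
    return [[0, -v[2], v[1]],
            [v[2], 0, -v[0]],
            [-v[1], v[0], 0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(a, b):
    """Matrix commutator ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

def cross(v, w):
    """Component formula read off from the final bracket expression of the example."""
    return [v[1] * w[2] - w[1] * v[2],
            -v[0] * w[2] + w[0] * v[2],
            v[0] * w[1] - w[0] * v[1]]

E1, E2, E3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Integer arithmetic is exact, so equality can be tested directly.
print(commutator(hat(E1), hat(E2)) == hat(E3))
print(commutator(hat(E2), hat(E3)) == hat(E1))
print(commutator(hat(E3), hat(E1)) == hat(E2))

# The same agreement for a generic pair of integer vectors.
v, w = (2, -3, 5), (7, 1, -4)
print(commutator(hat(v), hat(w)) == hat(cross(v, w)))
```

Under the stated identification, the commutator of the matrices for $v$ and $w$ is the matrix for the component formula of the example, so the manifold-style computation and the matrix computation agree.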


62.8.7 REMARK: The tangent space version of the Lie algebra of a Lie group. 
For the left invariant vector field $X_V^L$ generated by any $V \in T_e(G)$ in Definition 62.8.8, see Definition 62.4.15.
For the vector-field Lie bracket expression $[X_V^L, X_W^L]$ in line (62.8.6), see Definitions 61.5.7 or 62.8.3.


[ www. geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13] 




Definition 62.8.8 is the same as Definition 62.8.3 except that the left invariant vector fields $X_V^L$ are replaced by
their generators $V \in T_e(G)$. In essence, the tangent-space Lie algebra uses the vectors as "labels" for the left
invariant vector fields. They have a one-to-one correspondence by Theorem 62.4.13. So this can be thought
of as a kind of "information compression". However, the vector fields in Definition 62.8.3 are the "real" Lie
algebra elements because their commutators provide the vector product for the linear space $T_e(G)$. Thus
fields must be used to compute the commutators for the linear space, and then the fields can be discarded.


62.8.8 DEFINITION: The Lie algebra of a Lie group. Tangent-space version. 
The (tangent-space) Lie algebra of a Lie group G is the tuple T.(G) < (R, T.(G), or, TR, 0, T, H), where 


(i) (R, T £(G), or, TR, 0, p) is the tangent space of G at its identity e, 
(ii) the vector product operation T = [-, -] : Te(G) x T.(G) — T;(G) is defined by 


VV,W e T(G), [V, W] = (XI, XE Ce) (62.8.6), 
= (wo (Ox, Xi — o xi Xy))(e) 
= w(0y Xj, — E(0w Xy.)), 


where XL denotes the left invariant vector field generated by any V € T..(G). 


62.8.9 THEOREM: The tangent-space Lie algebra of a Lie group is a real Lie algebra. 
The tangent-space Lie algebra of a Lie group is a Lie algebra over IR. 


PROOF: Let $G$ be a Lie group. Let $A = (\mathbb{R}, T_e(G), \sigma_\mathbb{R}, \tau_\mathbb{R}, \sigma, \tau, \mu)$ denote the tangent-space Lie algebra of
$G$ according to Definition 62.8.8. To show that $A$ is a Lie algebra over $\mathbb{R}$ according to Definition 19.10.3,
note first that the tuple $(\mathbb{R}, T_e(G), \sigma_\mathbb{R}, \tau_\mathbb{R}, \sigma, \mu)$ is a linear space by Definitions 62.8.8 (i) and 54.4.4, and is
therefore a unitary left module over the commutative unitary ring $(\mathbb{R}, \sigma_\mathbb{R}, \tau_\mathbb{R})$ by Theorems 22.1.3 and 18.7.5.
So $A$ satisfies Definition 19.10.3 (i).

To show left distributivity of $\tau$, let $U,V,W \in T_e(G)$. Then $X^L_{U+V} = X^L_U + X^L_V$ because $(dL_g)_e$ is linear
for each $g \in G$. So $\tau(U+V,W) = [U+V,W] = [X^L_{U+V}, X^L_W](e) =
[X^L_U, X^L_W](e) + [X^L_V, X^L_W](e) = \tau(U,W) + \tau(V,W)$ by Theorems 61.5.17 or 62.8.4 or 62.4.11 (iv). Similarly,
$\tau$ has the right distributive, anticommutative, scalarity and Jacobi identity properties. (Since the properties
hold for left invariant fields, they are also valid for the "samples" of these fields evaluated at $e \in G$.) Thus
$A$ is a Lie algebra over $\mathbb{R}$ by Definition 19.10.3.


62.8.10 REMARK: Differential of the group operation in terms of translation operators. 

Theorem 62.8.11 expresses the differential $d\sigma$ of the group operation $\sigma : G \times G \to G$ in terms of left and right
translation operators, and left and right invariant fields. This is useful for expressing the vector product of
the Lie algebra of a Lie group locally in terms of differentials of the group operation in Theorem 62.8.13.


62.8.11 THEOREM: Lie group operation differential expressed as differentials of translation operators. 
Let $G$ be a Lie group with group operation $\sigma$. Then

$$\forall (g_1,g_2) \in G \times G,\ \forall (V_1,V_2) \in T_{g_1}(G) \times T_{g_2}(G),$$
$$(d\sigma)_{g_1,g_2}(V_1,V_2) = (dL_{g_1})_{g_2}(V_2) + (dR_{g_2})_{g_1}(V_1)$$
$$= (L_{g_1})_*(V_2) + (R_{g_2})_*(V_1)$$
$$= \partial_{V_2}L_{g_1} + \partial_{V_1}R_{g_2}$$
$$= L^T_{g_1}(V_2) + R^T_{g_2}(V_1).$$

Hence

$$\forall g \in G,\ \forall V \in T_e(G),\quad (d\sigma)_{g,e}(0_g,V) = (dL_g)_e(V) = (L_g)_*(V) = \partial_V L_g = L_g^T(V) = X_V^L(g)$$

and

$$\forall g \in G,\ \forall V \in T_e(G),\quad (d\sigma)_{e,g}(V,0_g) = (dR_g)_e(V) = (R_g)_*(V) = \partial_V R_g = R_g^T(V) = X_V^R(g).$$


PROOF: The equalities for $(d\sigma)_{g_1,g_2}(V_1,V_2)$ follow from Theorem 62.2.8 and Definitions 62.3.3, 62.3.9,
62.6.2 and 62.6.3. The equalities for $(d\sigma)_{g,e}(0_g,V)$ and $(d\sigma)_{e,g}(V,0_g)$ then follow from Definitions 62.4.15
and 62.7.7.


62.8.12 REMARK: Local construction for the Lie bracket for the Lie algebra of a Lie group. 

It seems somewhat indirect and inefficient that the bracket operation $\tau$ in Definition 62.8.8 is constructed
by first extending the vectors $V, W \in T_e(G)$ to global vector fields $X_V^L$ and $X_W^L$ on $G$, then computing the
Lie bracket of these global vector fields as in Definition 61.5.7, and finally sampling the value of the output
field $[X_V^L, X_W^L]$ at only one point, namely $e$. It seems more desirable to be able to define $\tau$ more locally, in
some neighbourhood of $e$, by means of some kind of differential formula. This is clearly a logical possibility
because the field $[X_V^L, X_W^L]$ in a neighbourhood of $e$ depends only on the local structure of $X_V^L$ and $X_W^L$.
Verifying this is the purpose of Theorem 62.8.13.


Theorem 62.8.13 demonstrates explicitly that the vector product operation for the Lie algebra of a Lie group 
is constructible from the addition operation of the group. Therefore the construction of the left invariant 
vector fields is in fact not essential for the construction of the vector product. Line (62.8.7) could be a useful 
alternative definition for the vector product in some applications. 


62.8.13 THEOREM: Local formula for the vector product operation for the Lie algebra of a Lie group. 
Let G be a Lie group with addition function o : G x G — G. Then 


VIL V eT. 
[U,V] = e ((d(g  (do)g,« (05, V) ))e(U) — E( (d( h = (da)e,s (U, 0n) )).(V)) ). (62.8.7) 


where (d(...)). means the differential for maps on G (as in Definition 58.4.5), and (do),,- and (do).,; denote 


the direct product differential for maps on G x G (as in Definition 58.6.2), and 0,,0; are the zero elements 
of the tangent spaces 7T,(G) and Th (G) respectively for g, h € G. 


Pnoor: Let U,V € T.(G). Define ¢: G > T(G) by à : g 4 (do)gc(0g,V). Then ó(g) = XŁ(g) € 
T,(G) for all g € G by Theorem 62.8.11. So (d$) (U) = 0pó = Ou XL = Oxe) Xy = Ox, Xy (e) by 
Definition 61.4.2. Thus (d(g > (do)g e(0g,V)))e(U) = 0x, Xy (e). The same argument applied to the 


map h +> (do)en(V,0n) gives (d( h  (do)e,4 (U,05) )) (V) = xr Xğ(e). Hence line (62.8.7) follows from 
Definitions 62.8.8 and 62.8.3. 


62.9. One-parameter subgroups 


62.9.1 REMARK:  Differentiable one-parameter subgroups of Lie groups. 
The one-parameter subgroups in Definition 62.9.2 are the differentiable analogues of the topological
one-parameter subgroups in Definition 36.9.6.


As alluded to in Remark 36.9.5, one-parameter subgroups of Lie groups may be thought of as a generalisation
of affinely parametrised lines through the origin in a Euclidean space with Cartesian coordinate space $\mathbb{R}^n$.
Such lines have a one-to-one correspondence with the velocity vectors $v$ in the tangent space $T_0(\mathbb{R}^n)$, which
is identified with $\mathbb{R}^n$. In the case of Lie groups, one-parameter subgroups are (hopefully) in a one-to-one
correspondence with the velocities $v \in T_e(G)$. It is not difficult to show that every $C^1$ one-parameter
subgroup $\gamma$ is completely determined by its initial velocity $\gamma'(0)$. It must also be shown that there is a
unique one-parameter subgroup for each initial velocity.


62.9.2 DEFINITION: A OF (differentiable) one-parameter subgroup of a Lie group G, for k € Zr is a C* 
curve y : IR — G which satisfies y(0) = eg and 


Vs,t € R, y(s +t) = y(s)y(t). 


62.9.3 THEOREM: The range of a one-parameter subgroup is a commutative subgroup. 
Let G be a Lie group. Let y be a C? one-parameter subgroup of G. Then Range(^) is a commutative 
subgroup of G. 


PROOF: The assertion follows from Theorem 36.9.7 (i).


62.9.4 THEOREM: One-parameter subgroups are integral curves of left invariant vector fields.
Let $\gamma$ be a $C^1$ one-parameter subgroup of a Lie group $G$. Let $X$ be a left invariant vector field on $G$ with
$X(e) = \gamma'(0)$. Then $\forall t \in \mathbb{R},\ \gamma'(t) = X(\gamma(t))$. In other words, $\gamma$ is an integral curve of some $X \in \mathfrak{X}_L(T(G))$.


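PROOF: Let $V = \gamma'(0)$. Then $V \in T_e(G)$. So by Theorem 62.4.13, there is a unique left invariant vector
field $X_V^L \in \mathfrak{X}_L(T(G))$ with $X_V^L(e) = V$, and by Theorem 62.4.16 (v), $X_V^L \in \mathfrak{X}^\infty(T(G))$ and $X_V^L(g) = L_g^T(V)$
for all $g \in G$. Therefore

$$\forall t \in \mathbb{R},\quad X_V^L(\gamma(t)) = L^T_{\gamma(t)}(V)$$
$$= (L_{\gamma(t)})_*(\gamma'(0)) \tag{62.9.1}$$
$$= (dL_{\gamma(t)})_e(\gamma'(0)) \tag{62.9.2}$$
$$= (L_{\gamma(t)} \circ \gamma)'(0), \tag{62.9.3}$$

where line (62.9.1) follows from Definition 62.3.9, line (62.9.2) follows from Definition 58.9.4 and the fact
that $\gamma'(0) \in T_{\gamma(0)}(G) = T_e(G)$ by Definition 57.9.2, and line (62.9.3) follows from Theorem 58.4.11. But
$(L_{\gamma(t)} \circ \gamma)(s) = L_{\gamma(t)}(\gamma(s)) = \gamma(t)\gamma(s) = \gamma(t+s)$ for all $s,t \in \mathbb{R}$ by Definition 62.9.2. Let $\gamma_t : \mathbb{R} \to G$ denote
the translated curve $\gamma_t : s \mapsto \gamma(t+s)$. Then $L_{\gamma(t)} \circ \gamma = \gamma_t$ for all $t \in \mathbb{R}$. Therefore $X_V^L(\gamma(t)) = \gamma_t'(0) = \gamma'(t)$
for all $t \in \mathbb{R}$.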


62.9.5 REMARK: Lie algebra elements generate Lie group elements via one-parameter subgroups. 

Lie algebra elements of a Lie group are often called "generators" of the group because they can be integrated
to generate one-parameter subgroups. In general contexts, a "generating set" means a set whose combinations
span an entire space, and often this is expected to be a minimal spanning set, like a basis of a linear space.
In the Lie group context, a set of generators is expected to span the entire Lie algebra of the group, and
then the connected component of the group containing the identity element can be spanned by generating
one-parameter subgroups. Thus "generators" can mean either elements of a minimal or general spanning
subset of the Lie algebra, or it could mean general elements of the Lie algebra. (See also Remark 67.6.6 for
this terminology issue.)


The term “generator” is certainly a convenient shorthand for “Lie algebra element”, and that is the sense 
in which it is used here. However, it must be kept in mind that these “generators” do not span the entire 
group if the group has more than one topological component. 


62.9.6 REMARK: Existence and uniqueness of one-parameter subgroups generated by Lie algebra elements. 
In practical applications, Lie groups are typically represented as matrices, and the generators are then also
matrices. So "generating" one-parameter subgroups from generators is a simple matter of exponentiation.
Then questions of existence and uniqueness are relatively easy to answer. In the case of abstract Lie groups,
however, existence and uniqueness of one-parameter subgroups requires ODE theory.
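
For the matrix case, the exponentiation step can be sketched as follows. This is an illustration only, under the assumption that the generator is the skew-symmetric $3 \times 3$ matrix of rotations about the third axis and that a truncated power series is an adequate stand-in for the matrix exponential. The names `expm_series`, `gamma` and `GEN_Z` are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def expm_series(a, terms=30):
    """Truncated power series: sum of a^n / n! for n = 0, ..., terms - 1."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

# Assumed generator: infinitesimal rotation about the third axis.
GEN_Z = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]

def gamma(t):
    """Candidate one-parameter subgroup t -> exp(t * GEN_Z)."""
    return expm_series([[t * x for x in row] for row in GEN_Z])

def max_abs_diff(a, b):
    """Largest absolute entrywise difference of two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

s, t = 0.4, 1.1
# Subgroup property gamma(s + t) = gamma(s) gamma(t) of Definition 62.9.2.
group_law_error = max_abs_diff(gamma(s + t), matmul(gamma(s), gamma(t)))
# Comparison with the closed-form rotation matrix about the third axis.
rotation = [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]
closed_form_error = max_abs_diff(gamma(t), rotation)
print(group_law_error, closed_form_error)
```

Both discrepancies should be at rounding level, which is the matrix-group counterpart of the subgroup property in Definition 62.9.2.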


Lie algebra elements $V \in T_e(G)$ are very convenient substitutes for left invariant vector fields $X_V^L$, and
the one-to-one correspondence is easy to construct for any given abstract Lie group. The existence and
uniqueness of left invariant vector fields do not require ODE theory.


Lie algebra elements also have a one-to-one correspondence with one-parameter subgroups. This requires 
ODE theory to demonstrate uniqueness for abstract Lie groups. One useful application of this uniqueness 
is that the effectiveness of the group action on a manifold implies a one-to-one correspondence between Lie 
algebra elements and the vector fields which they generate on the manifold, as in Theorem 63.6.17. This 
uniqueness then implies the well-definition of associated connections on differentiable fibre bundles as in 
Definition 67.12.3. 


Although the proof of Theorem 62.9.7 may seem very simple, the complexity is hidden in the proofs of 
Theorems 57.10.6, 44.5.3 and 43.9.5 (vii). The latter theorem ultimately depends on some subtle properties 
of the vector-valued Darboux integral. 


62.9.7 THEOREM: Uniqueness of one-parameter subgroups of Lie groups. 
Let G be a Lie group. Let V € Te(G). Let y1,72 : I — G be Ct curves with I € Topo?" (IR), such that 
^ (0) = e and Vt € I, y(t) = XE (y (t)) for k = 1,2. Then 4 = 72. 


PROOF: The assertion follows from Theorems 62.4.16 (i, v) and 57.10.6. 

(( 2018-11-24. Show in Theorem 62.9.8 that if $\gamma \in C^1(I,G)$, then $\gamma \in C^\infty(I,G)$. Thus Theorem 62.9.9 should
state that $\gamma \in C^\infty(\mathbb{R},G)$. ))


62.9.8 THEOREM: Some basic properties of integral curves of left invariant vector fields. 
Let G be a Lie group. Let V € T.(G). Let X¢ be the left invariant vector field on G with X¢(e) = V. Let 
y € C! (I, G) be an integral curve of XU with I € Top?" (IR) and 4(0) =e 


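(i) $\forall s,t \in I,\ (s+t \in I \Rightarrow \gamma(s+t) = \gamma(s)\gamma(t))$.
(ii) $\forall t \in I,\ (-t \in I \Rightarrow \gamma(-t) = \gamma(t)^{-1})$.
(iii) $\forall s,t \in I,\ \gamma(s)\gamma(t) = \gamma(t)\gamma(s)$.
(iv) There exists an integral curve $\bar\gamma \in C^1(\mathbb{R},G)$ of $X_V^L$ with $\bar\gamma|_I = \gamma$.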


Pnoor: For part (i), let y : I + G be a C' integral curve of X/ with I € Topo" (IR) and 7(0) = e. Let 
s € I and I+ = {s +t; t € I}. Define Ẹ : I+ — G by 4(t) = y(s)y(t — s) fort € I+. Let g = 7(s). Then 


Vt € I*, 4 (t) = &(Lgy(t — s)) 
= (dL),u-sQ' (t — 8)) (62.9.4) 
= (dLg),u- s) ((AL0—5))«(V)) 
= (dLss(-s))e(V) (62.9.5) 
= (dL4())e(V), 


where lines (62.9.4) and (62.9.5) follow from the chain rule, Theorem 58.4.13. Let Ig = IM I*. Then 
s € lo, 5(s) = g = 7(s), and 4/(t) = y(t) for all t € Io. Therefore 4|, = Ale by Theorem 57.10.6. 
Therefore y(t) = y(s)y(t — s) for all s,t € I with t — s € I. Hence by substituting s + t for t, one obtains 
qls +t) 2 *(s)y(t) whenever s,t € J and s4- tc I. 

Part (ii) follows from part (i) with s — —t. 

For part (iii), if s <0 € t or t € 0 € s, then s-F£ € I. So y(s)y(t) = g(s +t) = y(t + s) = vy(t)y(s) 
by part (i). If 0 < s € t, thent— s € I. So y(t) = y(t — s)y(s) = y(s)y(t — s) by part (i). Therefore 
y(t)y(s) t = 4(s) -!4(t). Thus 7(s)7(t) = y(t)y(s). The remaining cases follow similarly. 

For part (iv), let y € C! (I, G) be an integral curve of XL with I € Topo" (IR) and 4(0) =e. Let s € I and 
I+ = {s +t; t€ I}. Define 4: I+ > G by 4(t) = «(s gs s) for t € I*. Then as in the proof of part (i), 
VW, = = 711, , where Ip = IN I+. So yı = yU4F is a well-defined function on I, = IU I*, y € C!(I4,G), 
d y(t) = XE (o(t)) for all t € Jı. Since 2s € J, this procedure may be repeated to construct an integral 
curve y2 € C!(I5,G) of XU with 4s € I5 = L U (2s + t; t € I5), and so forth. Since s may be chosen to 
be either positive or negative, this construction yields, by induction, an integral curve y € C! (IR, G) of Xt 
with 3l Ly. 


62.9.9 THEOREM: Existence of one-parameter subgroups of Lie groups. 
Let $G$ be a Lie group. Let $V \in T_e(G)$. Then there exists an integral curve $\gamma \in C^1(\mathbb{R},G)$ of $X_V^L$ with $\gamma(0) = e$.


Proor: By Theorem 62.4.16 (v), Xf € C^*(T(G)). So by Theorem 57.10.5, there is an integral curve 
y: I> G of XL with I € Top?" (IR) and 7(0) = e. Hence by Theorem 62.9.8 (iv), there exists an integral 
curve y € C^(R, G) of XL with 7(0) =e 


62.10. Adjoint maps by Lie groups acting on Lie algebras 


62.10.1 REMARK: Adjoint maps by Lie group elements acting on their Lie algebra.

Adjoint maps associate each Lie group element with a linear map on the Lie algebra of the Lie group.
This is useful for the study of connection forms. (See Section 69.5.) The left and right actions $L_g$ and
$R_{g^{-1}} = (R_g)^{-1}$ commute with each other by Theorem 62.7.11 (i).


62.10.2 DEFINITION:
The (left) adjoint map by a Lie group element $g \in G$ is the linear map $(d(L_g \circ R_{g^{-1}}))_e : T_e(G) \to T_e(G)$.
The right adjoint map by a Lie group element $g \in G$ is the linear map $(d(R_g \circ L_{g^{-1}}))_e : T_e(G) \to T_e(G)$.


62.10.3 NOTATION: Adj(g), for $g$ in a Lie group $G$, denotes the (left) adjoint map by $g$. In other words,

$$\forall g \in G,\quad \mathrm{Adj}(g) = (d(L_g \circ R_{g^{-1}}))_e$$
$$= (d(R_{g^{-1}} \circ L_g))_e.$$


62.10.4 REMARK: Adjoint maps are differentials of conjugate maps. 

The map $L_g \circ R_{g^{-1}}$ is just a complicated way of writing the map $h \mapsto ghg^{-1}$ for $h \in G$. So the adjoint map
in Definition 62.10.2 is simply the differential of this conjugate map for a fixed $g \in G$. (See Section 17.8 for
conjugate maps.) If $G$ is commutative, this differential is $(d\,\mathrm{id}_G)_e = \mathrm{id}_{T_e(G)}$ by Theorem 58.5.2.


The conjugate map $h \mapsto ghg^{-1}$ is referred to as the left conjugation map in Definition 17.8.12. When $g$ is
replaced by $g^{-1}$, it is called the right conjugation map. Therefore the adjoint map Adj(g) could be thought
of as the "left adjoint map", whereas the adjoint map $\mathrm{Adj}(g^{-1})$ could be thought of as the "right adjoint
map". The latter is quite frequently encountered in the theory of connection forms on principal bundles.
(See Section 69.5.) In this book, the "right adjoint map" is always expressed as the left adjoint map of the
inverse group element, namely as $\mathrm{Adj}(g^{-1})$. So there is no need for two distinct notations for adjoints.
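
For a matrix group, the differential of the conjugation map $h \mapsto ghg^{-1}$ at the identity is $u \mapsto gug^{-1}$. The following sketch uses this formula for $SO(3)$ as an assumption about the matrix representation (it is not derived in this section). It checks numerically that the resulting map is multiplicative in $g$ and that it acts on skew-symmetric matrices by rotating the corresponding vector. The helper names (`hat`, `adj`, `rot_x`, `rot_z`) are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose, which equals the inverse for rotation matrices."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def hat(v):
    """Skew-symmetric matrix associated with the vector v."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]

def rot_x(t):
    """Rotation about the first axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    """Rotation about the third axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def adj(g, u_hat):
    """Assumed matrix form of the adjoint map: u -> g u g^{-1}."""
    return matmul(matmul(g, u_hat), transpose(g))

def max_abs_diff(a, b):
    """Largest absolute entrywise difference of two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

g = matmul(rot_z(0.7), rot_x(-0.4))
h = matmul(rot_x(1.2), rot_z(0.3))
u = (0.5, -1.0, 2.0)

# Adj(g) composed with Adj(h) agrees with Adj(gh).
homomorphism_error = max_abs_diff(adj(g, adj(h, hat(u))), adj(matmul(g, h), hat(u)))
# On so(3), conjugation by g rotates the underlying vector by g.
rotation_error = max_abs_diff(adj(g, hat(u)), hat(matvec(g, u)))
print(homomorphism_error, rotation_error)
```

Replacing `g` by its transpose in `adj` gives the map $\mathrm{Adj}(g^{-1})$, which is the "right adjoint map" of this remark.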


62.10.5 THEOREM: Some formulas for the adjoint map in terms of differentials of translation maps. 
Let G be a Lie group. 


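(i) $\forall g \in G,\ \mathrm{Adj}(g) \in \mathrm{Lin}(T_e(G), T_e(G))$.
(ii) $\forall g \in G,\ \forall u \in T_e(G),\ \mathrm{Adj}(g)(u) = (L_g \circ R_{g^{-1}})_*(u) = (L_g)_*(R_{g^{-1}})_*(u) = (dL_g)_{g^{-1}}(dR_{g^{-1}})_e(u)$.
(iii) $\forall g \in G,\ \forall u \in T_e(G),\ \mathrm{Adj}(g)(u) = (R_{g^{-1}} \circ L_g)_*(u) = (R_{g^{-1}})_*(L_g)_*(u) = (dR_{g^{-1}})_g(dL_g)_e(u)$.
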

PROOF: For part (i), the linearity of Adj(g) follows from the linearity of $(d(L_g \circ R_{g^{-1}}))_e : T_e(G) \to T_e(G)$
by Definition 58.4.5.

For part (ii), let $g \in G$ and $u \in T_e(G)$. Then $\mathrm{Adj}(g)(u) = (d(L_g \circ R_{g^{-1}}))_e(u) = (L_g \circ R_{g^{-1}})_*(u)$ by
Definitions 62.10.2 and 58.9.4. So $\mathrm{Adj}(g)(u) = (L_g)_*(R_{g^{-1}})_*(u) = (dL_g)_{g^{-1}}(dR_{g^{-1}})_e(u)$ by the chain rule
for differentials of $C^1$ maps, Theorem 58.4.13.

For part (iii), let $g \in G$ and $u \in T_e(G)$. Then $\mathrm{Adj}(g)(u) = (d(R_{g^{-1}} \circ L_g))_e(u) = (R_{g^{-1}} \circ L_g)_*(u)$ by
Definitions 62.10.2 and 58.9.4. So $\mathrm{Adj}(g)(u) = (R_{g^{-1}})_*(L_g)_*(u) = (dR_{g^{-1}})_g(dL_g)_e(u)$ by the chain rule for
differentials of $C^1$ maps, Theorem 58.4.13.


62.10.6 THEOREM: Adjoint maps are linear space isomorphisms on the Lie algebra. 
Let $G$ be a Lie group with identity $e$. Let $0_e$ denote $0_{T_e(G)}$.


(i) $\mathrm{Adj}(e) = \mathrm{id}_{T_e(G)}$.
(ii) $\forall g,h \in G,\ \mathrm{Adj}(g) \circ \mathrm{Adj}(h) = \mathrm{Adj}(gh)$.
(iii) $\forall g,h \in G,\ \forall u \in T_e(G),\ \mathrm{Adj}(g)(\mathrm{Adj}(h)(u)) = \mathrm{Adj}(gh)(u)$.
(iv) $\forall g \in G,\ \mathrm{Adj}(g) \circ \mathrm{Adj}(g^{-1}) = \mathrm{id}_{T_e(G)} = \mathrm{Adj}(g^{-1}) \circ \mathrm{Adj}(g)$.
In other words, $\forall g \in G,\ \mathrm{Adj}(g)^{-1} = \mathrm{Adj}(g^{-1})$.
(v) $\forall g \in G,\ \forall u \in T_e(G),\ \mathrm{Adj}(g)(u) = 0_e \Leftrightarrow u = 0_e$.
In other words, $\forall g \in G,\ \ker(\mathrm{Adj}(g)) = \{0_e\}$.
(vi) $\forall g \in G$, Adj(g) is a linear space automorphism on $T_e(G)$.


PROOF: For part (i), $L_e = \mathrm{id}_G$ and $R_{e^{-1}} = R_e = \mathrm{id}_G$ by Definitions 62.3.3 and 62.6.2. So $(dL_e)_e = \mathrm{id}_{T_e(G)}$
and $(dR_e)_e = \mathrm{id}_{T_e(G)}$ by Theorem 58.5.2. Hence $\mathrm{Adj}(e) = \mathrm{id}_{T_e(G)}$ by Theorem 62.10.5 (ii).

For part (ii),

$$\mathrm{Adj}(g) \circ \mathrm{Adj}(h) = (d(L_g \circ R_{g^{-1}}))_e \circ (d(L_h \circ R_{h^{-1}}))_e$$
$$= (d(L_g \circ R_{g^{-1}} \circ L_h \circ R_{h^{-1}}))_e$$
$$= (d(L_{gh} \circ R_{(gh)^{-1}}))_e$$
$$= \mathrm{Adj}(gh)$$

by the chain rule for differentials of $C^1$ maps, Theorem 58.4.13.

Part (iii) follows from part (ii).

Part (iv) follows from parts (i) and (ii).

For part (v), let $g \in G$ and $u \in T_e(G)$ satisfy $\mathrm{Adj}(g)(u) = 0_e$. Then $\mathrm{Adj}(g^{-1})(\mathrm{Adj}(g)(u)) = \mathrm{Adj}(g^{-1})(0_e)$.
So $\mathrm{Adj}(g^{-1})(0_e) = \mathrm{id}_{T_e(G)}(u) = u$ by part (iv). But $\mathrm{Adj}(g^{-1})(0_e) = 0_e$ because $\mathrm{Adj}(g^{-1})$ is a linear map by
Theorem 62.10.5 (i). So $u = 0_e$. The converse follows likewise from the linearity of Adj(g).

Part (vi) follows from part (iv), Definition 23.1.8 and Theorems 10.5.14 (iv) and 62.10.5 (i).


62.10.7 REMARK: Adjoint maps and right translation operators for left invariant fields. 

The adjoint map for a Lie group element $g \in G$ is expressed in Theorem 62.10.8 (ii) in terms of the right
translation by $g^{-1}$ of the left invariant field $X_u^L \in \mathfrak{X}(T(G))$ for which $X_u^L(e) = u$. (See Definition 62.4.15
for this field.) Assertions similar to Theorem 62.10.8 (i, ii) are given by Spivak [37], Volume II, page 309.


62.10.8 THEOREM: Some expressions for adjoint maps in terms of left invariant vector fields. 
Let G be a Lie group. For $u \in T_e(G)$, let $X^L_u$ be the left invariant field on G with $X^L_u(e) = u$. (See Definitions
62.4.2 and 62.4.15 for left invariant fields.)

(i) $\forall g \in G$, $\forall u \in T_e(G)$, $\mathrm{Adj}(g)(u) = ((R_{g^{-1}})_* \circ (L_g)_* \circ X^L_u)(e)$.
(ii) $\forall g \in G$, $\forall u \in T_e(G)$, $\mathrm{Adj}(g)(u) = ((R_{g^{-1}})_* \circ X^L_u \circ R_g)(e) = R_{g^{-1}}(X^L_u)(e)$.

PROOF: Part (i) follows from Theorem 62.10.5 (ii) because $X^L_u(e) = u$.
For part (ii), let $g \in G$ and $u \in T_e(G)$. Then $X^L_u(g) = (L_g)_*(u)$ by Theorem 62.4.16 (v). But $g = R_g(e)$ by
Definition 62.6.2. So $(L_g)_*(u) = X^L_u(R_g(e)) = (X^L_u \circ R_g)(e)$. Therefore by Definitions 62.3.9 and 62.4.2,
$((L_g)_* \circ X^L_u)(e) = (L_g)_*(u) = (X^L_u \circ R_g)(e)$. So $\mathrm{Adj}(g)(u) = ((R_{g^{-1}})_* \circ X^L_u \circ R_g)(e)$ by part (i). Hence
$\mathrm{Adj}(g)(u) = R_{g^{-1}}(X^L_u)(e)$ by Definition 62.6.3.


62.10.9 REMARK: Formulas for translations of translation-invariant vector fields on Lie groups. 
Theorem 62.10.10 gives some formulas for right and left translations of left and right invariant vector fields
respectively. These are shown in Table 62.4.1 in Remark 62.4.12 together with the contrasting left and right 
translations respectively. The formulas in Theorem 62.10.10 are essentially the same as the formulas in 
Theorems 63.7.8 and 63.6.15 for Lie right and left transformation groups respectively. This is, of course, not 
by pure coincidence. 


62.10.10 THEOREM:  Right/left translations of left/right invariant vector fields. 
Let G be a Lie group. 


(i) $\forall V \in T_e(G)$, $\forall g \in G$, $R_g(X^L_V) = X^L_{\mathrm{Adj}(g^{-1})(V)}$.
(ii) $\forall V \in T_e(G)$, $\forall g \in G$, $L_g(X^R_V) = X^R_{\mathrm{Adj}(g)(V)}$.

PROOF: For part (i),

$\forall x \in G$, $R_g(X^L_V)(x) = (dR_g)_{xg^{-1}}(X^L_V(xg^{-1}))$ (62.10.1)
$= (dR_g)_{xg^{-1}}((dL_{xg^{-1}})_e(V))$ (62.10.2)
$= (d(R_g \circ L_{xg^{-1}}))_e(V)$ (62.10.3)
$= (d(L_x \circ R_g \circ L_{g^{-1}}))_e(V)$
$= (dL_x)_e((d(R_g \circ L_{g^{-1}}))_e(V))$ (62.10.4)
$= (dL_x)_e(\mathrm{Adj}(g^{-1})(V))$ (62.10.5)
$= X^L_{\mathrm{Adj}(g^{-1})(V)}(x)$ (62.10.6)

where line (62.10.1) follows from Definition 62.6.3, line (62.10.2) follows from Definition 62.4.15, lines
(62.10.3) and (62.10.4) follow from the chain rule (Theorem 58.4.13), line (62.10.5) follows from Notation 62.10.3,
and line (62.10.6) follows from Definition 62.4.15. Hence $R_g(X^L_V) = X^L_{\mathrm{Adj}(g^{-1})(V)}$.
For part (ii),

$\forall x \in G$, $L_g(X^R_V)(x) = (dL_g)_{g^{-1}x}(X^R_V(g^{-1}x))$ (62.10.7)
$= (dL_g)_{g^{-1}x}((dR_{g^{-1}x})_e(V))$ (62.10.8)
$= (d(L_g \circ R_{g^{-1}x}))_e(V)$ (62.10.9)
$= (d(R_x \circ L_g \circ R_{g^{-1}}))_e(V)$
$= (dR_x)_e((d(L_g \circ R_{g^{-1}}))_e(V))$ (62.10.10)
$= (dR_x)_e(\mathrm{Adj}(g)(V))$ (62.10.11)
$= X^R_{\mathrm{Adj}(g)(V)}(x)$ (62.10.12)

where line (62.10.7) follows from Definition 62.3.9, line (62.10.8) follows from Definition 62.7.7, lines (62.10.9)
and (62.10.10) follow from the chain rule (Theorem 58.4.13), line (62.10.11) follows from Notation 62.10.3,
and line (62.10.12) follows from Definition 62.7.7. Hence $L_g(X^R_V) = X^R_{\mathrm{Adj}(g)(V)}$.
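
The two formulas can be spot-checked numerically. The sketch below assumes the matrix group GL(2), for which the left and right invariant fields with value V at the identity are $x \mapsto xV$ and $x \mapsto Vx$, and for which the differentials of left and right translation are left and right matrix multiplication. All function names are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction


def mat(rows):
    """Build a 2x2 matrix with exact rational entries."""
    return [[Fraction(x) for x in row] for row in rows]


def mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def adj(g, v):
    """Assumed matrix form of the adjoint map: Adj(g)(V) = g V g^{-1}."""
    return mul(mul(g, v), inv(g))


def left_field(v):
    """Left invariant field: x -> (dL_x)_e(V) = x V."""
    return lambda x: mul(x, v)


def right_field(v):
    """Right invariant field: x -> (dR_x)_e(V) = V x."""
    return lambda x: mul(v, x)


def right_translate(g, field):
    """Right translation of a field: x -> (dR_g)(X(x g^{-1})) = X(x g^{-1}) g."""
    return lambda x: mul(field(mul(x, inv(g))), g)


def left_translate(g, field):
    """Left translation of a field: x -> (dL_g)(X(g^{-1} x)) = g X(g^{-1} x)."""
    return lambda x: mul(g, field(mul(inv(g), x)))


g = mat([[2, 1], [1, 1]])
x = mat([[1, 2], [3, 7]])
v = mat([[0, 1], [-1, 2]])

part_i_ok = right_translate(g, left_field(v))(x) == left_field(adj(inv(g), v))(x)
part_ii_ok = left_translate(g, right_field(v))(x) == right_field(adj(g, v))(x)
```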


62.10.11 REMARK: Right/left translation bijectively maps left/right invariant vector field spaces. 

Although Theorem 62.10.10 implies that the right/left translation operators $R_g$ and $L_g$ are identity maps
on the respective left/right invariant vector field spaces $X_L(T(G))$ and $X_R(T(G))$ only if the Lie group
is commutative, it is perhaps some consolation that these translation operators are at least bijections in
general. The fact that $L_g$ and $R_g$ are identity maps on $X_L(T(G))$ and $X_R(T(G))$ respectively in Theorem 62.10.12 (i, iv) is just another way
of stating the definitions of these invariant vector field spaces.


62.10.12 THEOREM: Left/right translation operators are bijections on invariant vector field spaces. 
Let G be a Lie group. 
(i) $\forall g \in G$, $L_g|_{X_L(T(G))} : X_L(T(G)) \to X_L(T(G))$ is a bijection.
Moreover, $\forall g \in G$, $L_g|_{X_L(T(G))} = \mathrm{id}_{X_L(T(G))}$.
(ii) $\forall g \in G$, $R_g|_{X_L(T(G))} : X_L(T(G)) \to X_L(T(G))$ is a bijection.
(iii) $\forall g \in G$, $L_g|_{X_R(T(G))} : X_R(T(G)) \to X_R(T(G))$ is a bijection.
(iv) $\forall g \in G$, $R_g|_{X_R(T(G))} : X_R(T(G)) \to X_R(T(G))$ is a bijection.
Moreover, $\forall g \in G$, $R_g|_{X_R(T(G))} = \mathrm{id}_{X_R(T(G))}$.

PROOF: Part (i) follows from Theorem 62.4.16 (iv).

Part (ii) follows from Theorems 62.10.10 (i) and 62.10.6 (vi) because each vector field $X^L_V \in X_L(T(G))$ is
uniquely determined by its value at $e \in G$ by Theorem 62.4.16 (iii). So the bijection $\mathrm{Adj}(g^{-1}) : T_e(G) \to T_e(G)$
induces a bijection $R_g|_{X_L(T(G))} : X_L(T(G)) \to X_L(T(G))$.

Part (iii) follows from Theorems 62.10.10 (ii) and 62.10.6 (vi) because each vector field $X^R_V \in X_R(T(G))$ is
uniquely determined by its value at $e \in G$ by Theorem 62.7.8 (iii). So the bijection $\mathrm{Adj}(g) : T_e(G) \to T_e(G)$
induces a bijection $L_g|_{X_R(T(G))} : X_R(T(G)) \to X_R(T(G))$.

Part (iv) follows from Theorem 62.7.8 (iv).


62.10.13 REMARK: The left invariant Maurer-Cartan form is a connection form. 
Theorem 62.10.14 (i) has the curious consequence that the Maurer-Cartan form is a connection form on G, 
regarded as a principal bundle, by Theorem 69.8.2 (i). 


62.10.14 THEOREM: Right and left transformation rules for left and right Maurer-Cartan forms. 
Let G be a Lie group. Let $\omega^L : T(G) \to T_e(G)$ be the left invariant Maurer-Cartan form in Definition 62.5.4.
Let $\omega^R : T(G) \to T_e(G)$ be the right invariant Maurer-Cartan form in Definition 62.7.13.

(i) $\forall g \in G$, $R_g(\omega^L) = \mathrm{Adj}(g) \circ \omega^L$.
(ii) $\forall g \in G$, $L_g(\omega^R) = \mathrm{Adj}(g^{-1}) \circ \omega^R$.
(See Definitions 62.3.17 and 62.6.10 for the left and right transformation operators $L_g$ and $R_g$ respectively
for short-cut differential forms.)
PROOF: For part (i), let $g \in G$. Then by Definitions 62.5.4 and 62.6.10,

$\forall p \in G$, $\forall V \in T_p(G)$, $R_g(\omega^L)(V) = \omega^L((dR_{g^{-1}})_p(V))$
$= (dL_{(pg^{-1})^{-1}})_{pg^{-1}}((dR_{g^{-1}})_p(V))$
$= (d(L_{gp^{-1}} \circ R_{g^{-1}}))_p(V)$
$= (d(L_g \circ R_{g^{-1}}))_e((dL_{p^{-1}})_p(V))$
$= \mathrm{Adj}(g)(\omega^L(V))$

by Definition 62.10.2. Hence $R_g(\omega^L) = \mathrm{Adj}(g) \circ \omega^L$.
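For part (ii), let $g \in G$. Then by Definitions 62.7.13 and 62.3.17,

$\forall p \in G$, $\forall V \in T_p(G)$, $L_g(\omega^R)(V) = \omega^R((dL_{g^{-1}})_p(V))$
$= (dR_{(g^{-1}p)^{-1}})_{g^{-1}p}((dL_{g^{-1}})_p(V))$
$= (d(R_{p^{-1}g} \circ L_{g^{-1}}))_p(V)$
$= (d(L_{g^{-1}} \circ R_g))_e((dR_{p^{-1}})_p(V))$
$= \mathrm{Adj}(g^{-1})(\omega^R(V))$

by Definition 62.10.2. Hence $L_g(\omega^R) = \mathrm{Adj}(g^{-1}) \circ \omega^R$.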
Chapter 63


LIE TRANSFORMATION GROUPS


63.1 Groups of diffeomorphisms
63.2 Families of diffeomorphisms
63.3 Dual action of diffeomorphism families
63.4 Lie left transformation groups
63.5 Lie right transformation groups
63.6 Infinitesimal left transformations
63.7 Infinitesimal right transformations


63.0.1 REMARK: Diffeomorphism groups versus differentiable transformation groups. 

Sections 63.1, 63.2 and 63.3 present some very basic definitions and theorems concerning groups and families 
of diffeomorphisms. These are not assumed to have a finite-dimensional manifold structure on the group. 
Therefore they are not Lie transformation groups. However, they do have some “philosophical” interest 
because they give some hint of what is lost if the group has no specified manifold structure. 


The remaining sections are concerned with Lie transformation groups, which require both the group and
the passive set to have finite-dimensional manifold structure. These sections contain core definitions and 
theorems which are required for the presentation of differentiable fibre bundles, which are required for the 
presentation of general connections. Sections 63.1, 63.2 and 63.3 may be safely skipped, but the following 
sections cannot. 


63.1. Groups of diffeomorphisms 


63.1.1 REMARK: Relevance of diffeomorphism groups to connections on differentiable fibrations. 

The implicit structure group for a connection on a differentiable fibration in Definition 67.4.2 is a group of 
diffeomorphisms of the fibre space. This fibre space is a differentiable manifold by Definition 64.2.2. A group 
of diffeomorphisms of a differentiable manifold can be infinite-dimensional. Therefore it cannot be said
that the structure group for a differentiable fibration is a finite-dimensional Lie transformation group. This 
state of affairs for connections on differentiable fibrations is summarised on the first line of the following 
table, including references to definitions. 


structure group                      fibration/fibre bundle                 connection

group of diffeomorphisms (63.1.4)    differentiable fibration (64.2.2)      connection on fibration (67.4.2)
Lie transformation group (63.4.2)    differentiable fibre bundle (64.8.3)   connection on fibre bundle (67.5.4)


In the case of connections on differentiable fibre bundles in Definition 67.5.4, the structure group is explicitly 
required to be a Lie transformation group, which is a group of diffeomorphisms which has the structure 
of a differentiable manifold. Since differentiable manifolds are defined, at least in this book, to be finite- 
dimensional, this forces the structure group to be finite-dimensional. This state of affairs is summarised in 
the second line of the above table. 

The very general definition of a connection on a differentiable fibration is helpful to elucidate the role played 
by a structure group of a differentiable fibre bundle by replacing the Lie transformation group with a general 
group of diffeomorphisms as in Definition 63.1.2. 
63.1.2 DEFINITION: The $C^k$ diffeomorphism group of a $C^k$ manifold M for $k \in \mathbb{Z}^+_0$ is the transformation
group $G < (G, M) < (G, (M, A_M), \sigma_G, \mu)$ of all $C^k$ diffeomorphisms from M to M.


63.1.3 REMARK: The definition of groups of diffeomorphisms. 

In Definition 63.1.2, each group element is identified with its action on the $C^k$ manifold M. This implies
automatically that the group acts effectively on M because any two group elements which have the same
action on M are the same group element by definition. Since the identity function on M is always a $C^k$
diffeomorphism, the $C^k$ diffeomorphism group is well defined for any $C^k$ manifold. If the class is not specified,
it is assumed to be $C^1$.


63.1.4 DEFINITION: A group of $C^k$ diffeomorphisms of a $C^k$ manifold M for $k \in \mathbb{Z}^+_0$ is a transformation
group $G < (G, M) < (G, M, A_M, \sigma_G, \mu)$ of $C^k$ diffeomorphisms from M to M.


63.1.5 REMARK: Topology of diffeomorphism groups. 
A group of $C^k$ diffeomorphisms of a $C^k$ manifold according to Definition 63.1.4 is the same thing as a
subgroup of the $C^k$ diffeomorphism group of M according to Definition 63.1.2. The difference is that “a” is
substituted for “the”.

If the group G has a topology $T_G$, and the group action is continuous with respect to $T_G$ and the topology
on M, this kind of group may be called a topological group of $C^k$ diffeomorphisms as in Definition 63.1.6. A
suitable topology for G could be the relative topology on G from the compact-open topology on $C(M, M)$.
(See Definition 33.5.20 for compact-open topology.)


63.1.6 DEFINITION: A topological group of $C^k$ diffeomorphisms of a $C^k$ manifold M for $k \in \mathbb{Z}^+_0$ is a
topological transformation group $G < (G, M) < (G, T_G, M, A_M, \sigma_G, \mu)$ of $C^k$ diffeomorphisms from M to
M such that $\mu : G \times M \to M$ is continuous.


63.2. Families of diffeomorphisms 


63.2.1 REMARK:  Differentiable families of diffeomorphisms. 

A prime application of differentiable families of diffeomorphisms is to parallel transport along base-space 
curves for differentiable fibre bundles. When viewed through a single fibre chart, the parallel transport of a 
fibre set along a path is mapped to a family of diffeomorphisms of the fibre space. 


It seems reasonable to suppose that in Definition 63.2.2, condition (iii) should follow from conditions (i) 
and (ii), but it is not entirely obvious how to prove this in an elementary way. So it is assumed. (This issue 
may be related in some way to the inverse function theorem. See Theorem 41.10.4.) In other words, it is 
assumed that the family of inverses of a given $C^k$ family of $C^k$ diffeomorphisms is also such a family. (It
is very easy to confuse $\gamma(t)^{-1}$ with $\gamma(-t)$, which would make condition (iii) automatic. As mentioned in
Remark 63.2.11, the temptation to confuse the inverse with the reverse must be resisted. The author could
find neither proof nor counterexample that (iii) follows from (i) and (ii), even for k = 0.)


63.2.2 DEFINITION: A $C^k$ family of $C^k$ diffeomorphisms of a $C^k$ manifold M, for $k \in \mathbb{Z}^+_0$, is a map
$\gamma : I \to (M \to M)$ for some open interval $I \subseteq \mathbb{R}$ such that
(i) $\gamma(t) : M \to M$ is a $C^k$ diffeomorphism of M for all $t \in I$,
(ii) the map $(t, p) \mapsto \gamma(t)(p)$ is in $C^k(I \times M, M)$,
(iii) the map $(t, p) \mapsto \gamma(t)^{-1}(p)$ is in $C^k(I \times M, M)$.

A $C^k$ diffeomorphism family on a $C^k$ manifold M is the same as a $C^k$ family of $C^k$ diffeomorphisms on M.


63.2.3 DEFINITION: The vector field generated by a $C^1$ family $\gamma$ of $C^1$ diffeomorphisms of a $C^1$ manifold M
at a parameter $t_0 \in \mathrm{Dom}(\gamma)$ is the map $X : M \to T(M)$ defined by

$\forall p \in M$, $X(\gamma(t_0)(p)) = \partial_t \gamma(t)(p) |_{t=t_0}$.


63.2.4 REMARK: The “tangent algebra” of vector fields generated by families of diffeomorphisms. 
Definition 63.2.5 is intended to be a generalisation of the Lie algebra $T_e(G)$ of a Lie group G with unit
element e to groups of $C^1$ diffeomorphisms of a manifold. The fact that a vector field $X \in X^0(T(M))$ is
generated by a $C^1$ family of $C^1$ diffeomorphisms implies that it is in some sense “integrable” because it
conserves the structure of M in a differential sense. This makes it a plausible candidate for an “infinitesimal
group action” condition for fibration connections in Definition 67.4.2, corresponding to the standard
infinitesimal group action condition (v) in Definition 67.5.4.


63.2.5 DEFINITION: The tangent algebra of a group G of $C^1$ diffeomorphisms of a $C^1$ manifold M is the
set of all vector fields on M generated by $C^1$ families of elements of G.


63.2.6 REMARK: The velocity field family of a diffeomorphism family. 
According to Notation 63.2.8, $\gamma' : I \to X^0(T(M))$ denotes the family of vector fields which are generated at
each “time” $t_0$ by the family $\gamma$. Thus $\gamma'(t_0)$ is the velocity field induced on M by the family $\gamma$ at “time” $t_0$.

The expression $\gamma(t_0)^{-1}(p)$ in Definition 63.2.7 is the location of the point p before it was transformed up
until the current time $t_0$. Then $\gamma(t)$ is applied to this expression to obtain its location at any time $t \in I$.
This ensures that $\gamma'(t_0)(p)$ is a tangent vector in $T_p(M)$. So this velocity is attributed to the current location p,
which the diffeomorphism family has arrived at by time $t_0$, instead of the starting point $\gamma(t_0)^{-1}(p)$.


63.2.7 DEFINITION: The velocity field family of a $C^1$ diffeomorphism family $\gamma : I \to (M \to M)$ of a $C^1$
manifold M is the map from I to $X^0(T(M))$ given by the rule $t_0 \mapsto (p \mapsto \partial_t \gamma(t)(\gamma(t_0)^{-1}(p)) |_{t=t_0})$.


63.2.8 NOTATION: $\gamma'$, for a $C^1$ family $\gamma : I \to (M \to M)$ of $C^1$ diffeomorphisms of a $C^1$ manifold M,
denotes the velocity field family of $\gamma$. In other words, $\gamma' : I \to X^0(T(M))$ is given by

$\forall t_0 \in I$, $\forall p \in M$, $\gamma'(t_0)(p) = \partial_t \gamma(t)(\gamma(t_0)^{-1}(p)) |_{t=t_0}$.
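
The velocity field formula can be illustrated numerically. The sketch below assumes the one-dimensional manifold M = R and two hypothetical example families chosen for this illustration: the translations $\gamma(t)(p) = p + t^2$ and the dilations $\gamma(t)(p) = e^t p$. The derivative with respect to t is approximated by a central difference, so the results are approximate.

```python
import math


def velocity(family, family_inv, t0, p, h=1e-6):
    """Approximate d/dt family(t)(family(t0)^{-1}(p)) at t = t0."""
    q = family_inv(t0, p)  # location of p before the transformation up to time t0
    return (family(t0 + h, q) - family(t0 - h, q)) / (2 * h)


def shift(t, p):
    """Translation family: gamma(t)(p) = p + t^2."""
    return p + t * t


def shift_inv(t, p):
    """Pointwise inverse of the translation family."""
    return p - t * t


def scale(t, p):
    """Dilation family: gamma(t)(p) = exp(t) p."""
    return math.exp(t) * p


def scale_inv(t, p):
    """Pointwise inverse of the dilation family."""
    return math.exp(-t) * p


t0, p = 1.5, 0.7
v_shift = velocity(shift, shift_inv, t0, p)  # analytically 2 * t0
v_scale = velocity(scale, scale_inv, t0, p)  # analytically p
```

For the dilations the velocity at the current location p is p itself, independent of $t_0$, which reflects the fact that the velocity is attached to the current location rather than to the starting point.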


63.2.9 REMARK: Application of velocity field families to real-valued functions on manifolds. 

The velocity field family $\gamma'$ of a diffeomorphism family $\gamma$ on M may be applied to a real-valued function
f on M to obtain the rate of change of the value of f at each point in the manifold. Unsurprisingly, the
application of the velocity field family to f gives the same result as differentiating the value of f with respect
to the action of the diffeomorphism. This is asserted in Theorem 63.2.10.


63.2.10 THEOREM: Formula for derivative of real function by the velocity of a diffeomorphism family. 
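Let $\gamma : I \to (M \to M)$ be a $C^1$ diffeomorphism family on a $C^1$ manifold M. Then
$\forall t_0 \in I$, $\forall p \in M$, $\forall f \in C^1(M)$,
$\partial_{\gamma'(t_0)(p)} f = (df)_p(\gamma'(t_0)(p))$
$= \partial_t f(\gamma(t)(\gamma(t_0)^{-1}(p))) |_{t=t_0}$.


PROOF: The equality $\partial_{\gamma'(t_0)(p)} f = (df)_p(\gamma'(t_0)(p))$ follows from Definition 58.1.2. Then the equality
$(df)_p(\gamma'(t_0)(p)) = \partial_t f(\gamma(t)(\gamma(t_0)^{-1}(p))) |_{t=t_0}$ follows from Theorem 58.1.19.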


63.2.11 REMARK: The distinction between reverse and inverse of a diffeomorphism family. 

A diffeomorphism family $\gamma$ may be reversed by replacing t with $-t$. This is essentially the same as the reversal
of the curve $t \mapsto \gamma(t)(p)$ for any fixed $p \in M$. Consequently the velocity of the reverse of a diffeomorphism
family is the negative of the velocity of the original family. In other words, if $\bar\gamma : \bar{I} \to (M \to M)$ is defined by
$\bar{I} = \{t; -t \in I\}$ and $\bar\gamma : t \mapsto \gamma(-t)$, then

$\forall t_0 \in \bar{I}$, $\forall p \in M$, $\bar\gamma'(t_0)(p) = -\gamma'(-t_0)(p)$.

This simple reversal must be distinguished from the inverse $\gamma^{-1}$ of $\gamma$, which is the subject of Definition 63.3.3
and Theorem 63.3.8. The purpose of Theorem 63.2.12 is to assist the proof of Theorem 63.3.7, which assists
the proof of Theorem 63.3.8.
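
The difference between the reverse and the inverse can be seen in a small numerical sketch. It assumes M = R and the hypothetical family $\gamma(t)(p) = p + t^2$, chosen for illustration because its reverse $\gamma(-t)$ coincides with $\gamma(t)$, whereas its inverse is $p \mapsto p - t^2$. Derivatives are approximated by central differences.

```python
def velocity(family, family_inv, t0, p, h=1e-6):
    """Approximate d/dt family(t)(family(t0)^{-1}(p)) at t = t0."""
    q = family_inv(t0, p)
    return (family(t0 + h, q) - family(t0 - h, q)) / (2 * h)


def gamma(t, p):
    """Example family: gamma(t)(p) = p + t^2."""
    return p + t * t


def gamma_pointwise_inv(t, p):
    """The inverse map gamma(t)^{-1}(p) = p - t^2."""
    return p - t * t


def reverse(t, p):
    """Reversed family: t -> gamma(-t)."""
    return gamma(-t, p)


def reverse_pointwise_inv(t, p):
    """Pointwise inverse of the reversed family."""
    return gamma_pointwise_inv(-t, p)


t0, p = 0.5, 2.0
v_at_minus_t0 = velocity(gamma, gamma_pointwise_inv, -t0, p)   # gamma'(-t0)(p)
v_reverse = velocity(reverse, reverse_pointwise_inv, t0, p)    # velocity of the reverse
# The inverse family t -> gamma(t)^{-1} has gamma(t) as its pointwise inverse.
v_inverse = velocity(gamma_pointwise_inv, gamma, t0, p)        # velocity of the inverse
```

Here the reverse has velocity $2 t_0$ while the inverse has velocity $-2 t_0$, so the two families cannot be interchanged.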


63.2.12 THEOREM:  Diffeomorphism family velocity formulas to show dual differential action properties. 
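Let $\gamma_1 : I_1 \to (M \to M)$ and $\gamma_2 : I_2 \to (M \to M)$ be $C^1$ diffeomorphism families on a $C^1$ manifold M.
Define $\gamma_{1,t_2} : I_1 \to (M \to M)$ for all $t_2 \in I_2$, define $\gamma_{2,t_1} : I_2 \to (M \to M)$ for all $t_1 \in I_1$, and define
$\gamma_0 : I_1 \cap I_2 \to (M \to M)$ by

$\forall t_1 \in I_1$, $\forall t_2 \in I_2$, $\forall p \in M$, $\gamma_{1,t_2}(t_1)(p) = \gamma_2(t_2)(\gamma_1(t_1)(p))$
$\forall t_1 \in I_1$, $\forall t_2 \in I_2$, $\forall p \in M$, $\gamma_{2,t_1}(t_2)(p) = \gamma_2(t_2)(\gamma_1(t_1)(p))$
$\forall t \in I_1 \cap I_2$, $\forall p \in M$, $\gamma_0(t)(p) = \gamma_2(t)(\gamma_1(t)(p))$.
Then
$\forall t_1 \in I_1$, $\forall t_2 \in I_2$, $\forall p \in M$, $\gamma_{1,t_2}'(t_1)(p) = (d\gamma_2(t_2))_{\gamma_2(t_2)^{-1}(p)}(\gamma_1'(t_1)(\gamma_2(t_2)^{-1}(p)))$ (63.2.1)
$\forall t_1 \in I_1$, $\forall t_2 \in I_2$, $\forall p \in M$, $\gamma_{2,t_1}'(t_2)(p) = \gamma_2'(t_2)(p)$ (63.2.2)
$\forall t \in I_1 \cap I_2$, $\forall p \in M$, $\gamma_0'(t)(p) = (d\gamma_2(t))_{\gamma_2(t)^{-1}(p)}(\gamma_1'(t)(\gamma_2(t)^{-1}(p))) + \gamma_2'(t)(p)$. (63.2.3)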


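PROOF: For line (63.2.1), let $t_1 \in I_1$, $t_2 \in I_2$ and $p \in M$. Then by Notation 63.2.8,

$\gamma_{1,t_2}'(t_1)(p) = \partial_t \gamma_{1,t_2}(t)(\gamma_{1,t_2}(t_1)^{-1}(p)) |_{t=t_1}$
$= \partial_t \gamma_2(t_2)(\gamma_1(t)(\gamma_1(t_1)^{-1}(\gamma_2(t_2)^{-1}(p)))) |_{t=t_1}$
$= (d\gamma_2(t_2))_{\gamma_2(t_2)^{-1}(p)}(\partial_t \gamma_1(t)(\gamma_1(t_1)^{-1}(\gamma_2(t_2)^{-1}(p))) |_{t=t_1})$ (63.2.4)
$= (d\gamma_2(t_2))_{\gamma_2(t_2)^{-1}(p)}(\gamma_1'(t_1)(\gamma_2(t_2)^{-1}(p)))$,

where line (63.2.4) follows from Theorem 58.4.11.
Line (63.2.2) follows from Notation 63.2.8 as follows.

$\forall t_1 \in I_1$, $\forall t_2 \in I_2$, $\forall p \in M$, $\gamma_{2,t_1}'(t_2)(p) = \partial_t \gamma_{2,t_1}(t)(\gamma_{2,t_1}(t_2)^{-1}(p)) |_{t=t_2}$
$= \partial_t \gamma_2(t)(\gamma_1(t_1)(\gamma_1(t_1)^{-1}(\gamma_2(t_2)^{-1}(p)))) |_{t=t_2}$
$= \partial_t \gamma_2(t)(\gamma_2(t_2)^{-1}(p)) |_{t=t_2}$
$= \gamma_2'(t_2)(p)$.

Line (63.2.3) follows from lines (63.2.1) and (63.2.2).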


63.3. Dual action of diffeomorphism families 


63.3.1 REMARK: The dual differential action of a diffeomorphism family on a real-valued function. 

It is possible to define the “dual action” on functions $f : X \to Y$ by bijections $\phi : X \to X$ in a very
general way. The dual action $\phi^*$ of $\phi$ on f may be defined by $\forall x \in X$, $\phi^*(f)(x) = f(\phi^{-1}(x))$. This
concept is demonstrated by the left translation operator $L^F_g$ in Definition 62.3.3 and the right translation
operator $R^F_g$ in Definition 62.6.2. (See also the dual representation of groups of finite-dimensional linear
space automorphisms in Definition 23.11.20.)


In the case of a differentiable family of diffeomorphisms, the dual action concept may be extended to define 
dual differential actions. The dual differential action concept is applicable to the construction of a dual 
for the horizontal lift function on a differentiable fibration in Definition 67.4.2. In the tensor calculus 
context, it is the dual differential action for linear spaces which explains why the formula for the covariant
derivative of a covariant vector with coordinates $(w_i)_{i=1}^n$ is $(v^j \partial_j w_i - \Gamma^k_{ji} w_k v^j)_{i=1}^n$ in the direction with
coordinates $(v^j)_{j=1}^n$, whereas the corresponding formula for a contravariant vector with coordinates $(u^i)_{i=1}^n$
is $(v^j \partial_j u^i + \Gamma^i_{jk} u^k v^j)_{i=1}^n$. The sign reversal and the transposition of the linear transformation follow from
the definition of a dual differential action.


The essence of the general dual action idea is that the value of the function should be effectively held constant
by a combination of covariant and “contragredient” actions. Thus the points are made to move “under” a
function, and the value of the function is simultaneously modified “over” the points, but the value of the
transformed function acting on transformed points should remain the same. In the case of a differential 
action, the result is that a kind of vector field is induced on the function space corresponding to a given 
vector field on the point set. 


The function which is subjected to the dual action may be thought of as moving in the opposite direction 
to the primal action of the points under it. It would perhaps be more accurate to think of the dual action 
as being an up-and-down motion rather than a sideways motion. 
63.3.2 REMARK: The role of general dual action definitions for parallelism on general fibrations.

The kinds of dual action which are specified in Definitions 63.3.3 and 63.3.5 are closely related to the 
concept of associated parallelism for associated fibre bundles. (See Section 47.9 for associated topological 
fibre bundles. See Section 48.4 for associated parallelism on associated topological fibre bundles.) Dual 
action is only one kind of associated action which may be defined for fibre spaces. General mixed “tensor 
actions” may be defined in terms of various combinations of primal and dual actions. The kinds of dual 
action described here, however, do not require a Lie group as the structure group. (In other words, these 
general kinds of dual action are applicable to fibrations, not just differentiable fibre bundles.) The dual action 
concept is extended here to general groups of diffeomorphisms for which the group does not necessarily have 
any kind of differentiable manifold structure. The dual action concept does not require linear space structure 
on the passive set of the group action. (The passive set is the fibre space in fibre bundles and fibrations.) The 
dual action concept does not even require a differentiable structure on the passive set unless a dual differential 
action is required to be defined. The differential action is required for the definition of connections, which 
are differentials of parallel transport along curves in the base space. 


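63.3.3 DEFINITION: The dual action of a $C^0$ family $\gamma : I \to C^0(M, M)$ of $C^0$ diffeomorphisms on a $C^0$
manifold M for an open interval $I \in \mathrm{Top}(\mathbb{R})$ is the map $\gamma^* : I \to C^0(C^0(M, \mathbb{R}), C^0(M, \mathbb{R}))$ defined by

$\forall t \in I$, $\forall f \in C^0(M, \mathbb{R})$, $\forall p \in M$, $\gamma^*(t)(f)(p) = f(\gamma(t)^{-1}(p))$. (63.3.1)
In other words,
$\forall t \in I$, $\forall f \in C^0(M, \mathbb{R})$, $\gamma^*(t)(f) = f \circ \gamma(t)^{-1}$.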


63.3.4 REMARK: The dual action of a family of diffeomorphisms. 

Definition 63.3.3 closely follows the pattern of Definition 23.11.19 for linear space automorphisms. The dual 
action of y(t) is defined individually for each t € I without reference to the action for other values of t. In 
other words, Definition 63.3.3 could have been written in terms of individual diffeomorphisms in $C^0(M, M)$
instead of aggregating diffeomorphisms into a continuous family. However, the differential action of a family 
of diffeomorphisms does depend on the whole family. 


It is easily seen that line (63.3.1) implies $\gamma^*(t)(f)(\gamma(t)(p)) = f(p)$ for all $t \in I$, $f \in C^0(M, \mathbb{R})$ and $p \in M$.
This means that when $\gamma(t)$ is applied to p and $\gamma^*(t)$ is applied to f, the value of f(p) is unchanged.
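
As a minimal sketch of this invariance, assume M is the set of integers regarded as points of the real line, with the hypothetical translation family $\gamma(t)(p) = p + t$ and an arbitrary test function f. The names `gamma`, `gamma_inv` and `dual_action` are illustrative only. Integer arithmetic keeps the comparison exact.

```python
def gamma(t, p):
    """Example family of bijections: gamma(t)(p) = p + t."""
    return p + t


def gamma_inv(t, p):
    """Inverse map gamma(t)^{-1}(p) = p - t."""
    return p - t


def dual_action(t, f):
    """Dual action gamma*(t)(f) = f o gamma(t)^{-1}."""
    return lambda p: f(gamma_inv(t, p))


def f(p):
    """Arbitrary test function on the point set."""
    return p * p + 1


t, p = 3, 5
moved_point = gamma(t, p)            # the point moves "under" the function
moved_function = dual_action(t, f)   # the function is modified "over" the points
value_after = moved_function(moved_point)
value_before = f(p)
```

Moving only the function, without moving the point, does change the value: `moved_function(p)` equals `f(p - t)`.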


63.3.5 DEFINITION: The dual differential action of a $C^1$ family $\gamma : I \to C^1(M, M)$ of $C^1$ diffeomorphisms
on a $C^1$ manifold M for an open interval $I \in \mathrm{Top}(\mathbb{R})$ with $0 \in I$, which satisfies $\gamma(0) = \mathrm{id}_M$, is the map
$\gamma'(0)^* : C^1(M, \mathbb{R}) \to C^0(M, \mathbb{R})$ defined by

$\forall f \in C^1(M, \mathbb{R})$, $\forall p \in M$, $\gamma'(0)^*(f)(p) = -(df)_p(\gamma'(0)(p))$. (63.3.2)


63.3.6 REMARK: The dual differential action of a family of diffeomorphisms. 

The dual differential action $\gamma'(0)^*$ in Definition 63.3.5 can (and should) be interpreted as a “tangent vector”
on the space $C^1(M, \mathbb{R})$ in the same way that $\gamma'(0) \in X^0(T(M))$ can be interpreted as a “tangent vector”
on the whole manifold M.


A vector field on a differentiable manifold M moves the whole manifold at once. In other words, a vector 
on M is an action on the aggregate of all points in the manifold, not just on each point individually. In this 
sense any element of $X^0(T(M))$ may be thought of as a kind of “tangent vector” on the “point” M. In the
same way that tangent vectors at points of a manifold can be identified with equivalence classes of curves 
through that point which have the same first-order effect on the value of functions on the manifold, a vector 
field in X°(T(M)) can be identified with an equivalence class of families of diffeomorphisms which have the 
same first-order effect on functions on the whole manifold. (See Remark 53.3.4 for tangent curve classes.) 


By analogy, a dual differential action on $C^1(M, \mathbb{R})$ may be thought of as a tangent vector on the aggregate
space $C^1(M, \mathbb{R})$, not just on the individual functions in that space. Each family of diffeomorphisms on the
point space M has a first-order effect on all of the functions in $C^1(M, \mathbb{R})$. The equivalence class of these
diffeomorphism families may be thought of as a tangent vector on $C^1(M, \mathbb{R})$. The purpose of Definition 63.3.5
is to identify which of these “tangent vectors” is the dual of the “tangent vector” $\gamma'(0)$ which acts on M. In
other words, it tries to identify which differential change in the functions $f \in C^1(M)$ will keep the value of
$\gamma^*(t)(f)$ constant to first order with respect to t.


The purpose of Theorem 63.3.8 is to vindicate Definition 63.3.5. The purpose of Theorem 63.3.7 is to assist
Theorem 63.3.8.
63.3.7 THEOREM: Velocity of inverse of diffeomorphism family is the negative of the original velocity.
Let M be a $C^1$ manifold. Let $\gamma : I \to (M \to M)$ be a $C^1$ family of $C^1$ diffeomorphisms of M for some
interval $I \in \mathrm{Top}_0(\mathbb{R})$ such that $\gamma(0) = \mathrm{id}_M$. Then

$\forall p \in M$, $(\gamma^{-1})'(0)(p) = -\gamma'(0)(p)$.


PROOF: Let $\gamma_1 = \gamma$ and $\gamma_2 = \gamma^{-1}$, where $\gamma^{-1} : I \to (M \to M)$ is defined by $\gamma^{-1}(t)(p) = \gamma(t)^{-1}(p)$
for all $t \in I$ and $p \in M$. Define $\gamma_0 : I \to (M \to M)$ by $\gamma_0(t)(p) = \gamma_2(t)(\gamma_1(t)(p))$ for all $t \in I$ and
$p \in M$. Then $\gamma_0(t)(p) = p$ for all $t \in I$ and $p \in M$, and so $\gamma_0'(t)(p) = 0$ for all $t \in I$ and $p \in M$
by Notation 63.2.8. But by Theorem 63.2.12 line (63.2.3), $\gamma_0'(0)(p) = (d\gamma_2(0))_p(\gamma_1'(0)(p)) + \gamma_2'(0)(p)$ for
all $p \in M$, and since $(d\gamma_2(0))_p(V) = V$ for all $V \in T_p(M)$ for all $p \in M$ because $\gamma_2(0) = \gamma(0)^{-1} = \mathrm{id}_M$, it follows that
$(\gamma^{-1})'(0)(p) = -\gamma'(0)(p)$ for all $p \in M$.


63.3.8 THEOREM: Dual differential action on real functions counterbalances the differential action. 
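Let M be a $C^1$ manifold. Let $\gamma : I \to (M \to M)$ be a $C^1$ family of $C^1$ diffeomorphisms of M for some
interval $I \in \mathrm{Top}_0(\mathbb{R})$ such that $\gamma(0) = \mathrm{id}_M$. Then

$\forall p \in M$, $\forall f \in C^1(M, \mathbb{R})$, $\partial_t f(\gamma(t)^{-1}(p)) |_{t=0} = -\partial_t f(\gamma(t)(p)) |_{t=0}$
$= -(df)_p(\gamma'(0)(p))$.


PROOF: Let $p \in M$ and $f \in C^1(M, \mathbb{R})$. Then $\partial_t f(\gamma(t)(p)) |_{t=0} = (df)_p(\gamma'(0)(p))$ by Theorem 63.2.10.
Similarly, $\partial_t f(\gamma(t)^{-1}(p)) |_{t=0} = (df)_p((\gamma^{-1})'(0)(p))$, where $\gamma^{-1} : I \to (M \to M)$ is defined by $\gamma^{-1}(t) = \gamma(t)^{-1}$
for all $t \in I$. But $(\gamma^{-1})'(0)(p) = -\gamma'(0)(p)$ by Theorem 63.3.7. So $\partial_t f(\gamma(t)^{-1}(p)) |_{t=0} = -(df)_p(\gamma'(0)(p))$,
from which it follows that $\partial_t f(\gamma(t)^{-1}(p)) |_{t=0} = -\partial_t f(\gamma(t)(p)) |_{t=0}$.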
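
The counterbalancing property can be checked numerically in a simple case. The sketch below assumes M = R, the test function f = sin, and the hypothetical family $\gamma(t)(p) = e^{t + t^2} p + t$, which satisfies $\gamma(0) = \mathrm{id}$ but is not a one-parameter group, so its inverse differs from its reverse. Derivatives at t = 0 are approximated by central differences.

```python
import math


def gamma(t, p):
    """Example family: gamma(t)(p) = exp(t + t^2) p + t, with gamma(0) = id."""
    return math.exp(t + t * t) * p + t


def gamma_inv(t, q):
    """Inverse map gamma(t)^{-1}(q) = (q - t) exp(-(t + t^2))."""
    return (q - t) * math.exp(-(t + t * t))


def ddt_at_zero(fn, h=1e-5):
    """Central difference approximation of d/dt fn(t) at t = 0."""
    return (fn(h) - fn(-h)) / (2 * h)


f, df = math.sin, math.cos
p = 0.8

velocity0 = ddt_at_zero(lambda t: gamma(t, p))        # gamma'(0)(p), analytically p + 1
forward = ddt_at_zero(lambda t: f(gamma(t, p)))       # d/dt f(gamma(t)(p)) at t = 0
backward = ddt_at_zero(lambda t: f(gamma_inv(t, p)))  # d/dt f(gamma(t)^{-1}(p)) at t = 0
dual = -df(p) * velocity0                             # -(df)_p(gamma'(0)(p))
```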


63.3.9 REMARK: Vindication of definition of the dual differential action of a diffeomorphism family. 

In light of Theorem 63.3.8, it can be seen that Definition 63.3.5 line (63.3.2) is effectively the differential of
Definition 63.3.3 line (63.3.1). Thus the dual differential action $\gamma'(0)^*(f)$ on real-valued functions f exactly
counterbalances the differential action $\gamma'(0)$ of $\gamma$ on the points of M.


The dual differential action for this very general class of diffeomorphisms on differentiable manifolds is 
the concept which underlies the well-known dual differential action by the negative Christoffel array for 
covariant vectors, as mentioned in Remark 63.3.1. The Christoffel array is in fact, in component form, the 
dual differential action of parallel transport acting on an ordinary fibre bundle, which in this case is the 
(contravariant) tangent bundle of a manifold. Then the negative of this action is applied to the corresponding 
covariant tangent bundle, which is the covector bundle. Therefore Theorem 63.3.8 vindicates the well-known 
tensor calculus formula for the covariant derivative of covariant vectors, and also covariant derivatives of 
dual spaces in general. 


63.4. Lie left transformation groups 


63.4.1 REMARK: Application of Lie transformation groups to connections on differentiable fibre bundles. 
Lie transformation groups are useful for defining differentiable ordinary fibre bundles and connections on 
differentiable fibre bundles, in particular Lie right transformation groups as in Definition 66.2.7. (See Sec- 
tion 63.5 for Lie right transformation groups.) 


In essence, a Lie transformation group is a transformation group where every space is a differentiable manifold 
and every map is differentiable. In other words, it is a differentiable transformation group. 


If the differentiable manifold structures of a Lie left transformation group are replaced by the corresponding 
topological structures which they induce on the spaces, the result is a topological left transformation group 
as in Definition 36.10.4. If the topological structures are also removed, the result is a left transformation 
group as in Definition 20.1.2. 


63.4.2 DEFINITION: A $C^k$ Lie (left) transformation group for $k \in \mathbb{Z}^+$ is a tuple
$(G, M) < (G, M, \sigma_G, \mu) < (G, A_G, M, A_M, \sigma_G, \mu)$ which satisfies the following conditions.
(i) $(G, A_G, \sigma_G)$ is an analytic group.

(ii) $(M, A_M)$ is a $C^k$ manifold.

(iii) $(G, T_G, M, T_M, \sigma_G, \mu)$ is an effective left topological transformation group of the topological space
$(M, T_M)$, where the topologies $T_G$ and $T_M$ are induced by the atlases $A_G$ and $A_M$ respectively.

(iv) The map $\mu : G \times M \to M$ is $C^k$ with respect to the product differentiable structure on $G \times M$ and
the differentiable structure on M. In other words, $\mu \in C^k(G \times M, M)$.


63.4.3 REMARK: Alternative terminology for Lie transformation groups. 
A Lie transformation group is also known as a “differentiable transformation group". The manifold M in 
Definition 63.4.2 is sometimes called a G-manifold or G-space. 


If the regularity class $C^k$ of a Lie transformation group is unspecified, it is assumed to be $C^1$.


63.4.4 REMARK: Alternative conditions for the definition of Lie transformation groups. 

Definition 63.4.2 assumes minimal regularity as in EDM2 [113], 431.C, which corresponds to Montgomery/
Zippin [118], page 195. However, Malliavin [28], page 240 defines Lie groups of transformations to be $C^\infty$
transformations acting on the right of a $C^\infty$ manifold. Kobayashi/Nomizu [19], page 41 also define Lie
transformation groups to have a $C^\infty$ right action on a $C^\infty$ manifold.


63.4.5 THEOREM: Left actions by a $C^k$ transformation group are $C^k$ diffeomorphisms of the passive set.
Let $k \in \mathbb{Z}^+$. Let (G, M) be a $C^k$ Lie left transformation group. Then for all $g \in G$, the map $L_g : M \to M$,
defined by $L_g : x \mapsto gx$, is a $C^k$ diffeomorphism from M to M.


PROOF: Let $g \in G$. Then it follows from Theorem 52.6.8 (ii) that $L_g : M \to M$ is a $C^k$ map. Similarly,
$(L_g)^{-1} = L_{g^{-1}} : M \to M$ is a $C^k$ map. Hence $L_g : M \to M$ is a $C^k$ diffeomorphism.


63.4.6 REMARK: Left translation operators for a Lie left transformation group. 

Definition 63.4.7 is a generalisation to Lie left transformation groups of Definition 62.3.3 for Lie groups. The 
left translation operators in Definition 63.4.7 use the same notations as in Definition 62.3.3, but all act on the 
passive space M rather than G. A similar comment applies to Definition 63.4.8. The numerous definitions of
left and right translation operators for Lie groups and Lie transformation groups are listed in Table 63.4.1. 


system translation object definitions 

Lie group left point, function, operator 62.3.3 

Lie group left vector, vector field 62.3.9 

Lie group left covector, differential form 62.3.15, 62.3.17

Lie group right point, function, operator 62.6.2 

Lie group right vector, vector field 62.6.3 

Lie group right covector, differential form 62.6.9, 62.6.10 

Lie left transformation group left point, function, operator 63.4.7 

Lie left transformation group left vector, vector field 63.4.8 

Lie left transformation group left covector, differential form 63.4.10, 63.4.12 

Lie right transformation group right point, function, operator 63.5.6 

Lie right transformation group right vector, vector field 63.5.7 

Lie right transformation group right covector, differential form 63.5.9, 63.5.10 
Table 63.4.1 Overview of translation operator definitions


Although Figure 62.3.1 is intended to illustrate the Lie group left translation operators, it equally accurately 
illustrates the Lie transformation group left translation operators in Definitions 63.4.7 and 63.4.8. 


63.4.7 DEFINITION: Let (G, M) be a $C^1$ Lie left transformation group. Let $g \in G$.
The left translation operator (for manifold points) $L^M_g : M \to M$ is defined by

$\forall x \in M$, $L^M_g(x) = gx$.

The left translation operator (for real-valued functions) $L^F_g : C^0(M) \to C^0(M)$ is defined by

$\forall \phi \in C^0(M)$, $L^F_g(\phi) = \phi \circ L^M_{g^{-1}}$.
That is,
$\forall \phi \in C^0(M)$, $\forall x \in M$, $L^F_g(\phi)(x) = \phi(g^{-1}x)$.

The left translation operator (for tangent operators) $L^T_g : T(M) \to T(M)$ is defined by

$\forall V \in T(M)$, $L^T_g(\partial_V) = \partial_V \circ L^F_{g^{-1}}$.
That is,
$\forall V \in T(M)$, $\forall \phi \in C^1(M)$, $L^T_g(\partial_V)(\phi) = \partial_V(L^F_{g^{-1}}(\phi))$
$= \partial_V(\phi \circ L^M_g)$.


The left translation operator (for tangent operator fields) L : X°(T(M)) > X°(T(M)) is defined by 


VX e X°(T(M)), LP (Ax) = LT o (Ox o LM,). 
That is, 

YX € X°(T(M)), Vz € M, LE (Ax)(«) = LT (0x (g^!z)). 
That is, 


VX e X°(T(M)), Yz e M, Ve e C! (M), 
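The inverse in the function operator can be motivated by a short computation. The following is a sketch only, writing $L_g^M$ for the point map $x \mapsto gx$ and $L_g^C$ for the function operator $\phi \mapsto \phi \circ L_{g^{-1}}^M$, and using only the left action rule $L_a^M \circ L_b^M = L_{ab}^M$. It indicates that $g \mapsto L_g^C$ respects the order of group multiplication, which would fail without the inverse.

```latex
\[
\begin{aligned}
L_g^C\bigl(L_h^C(\phi)\bigr)
  &= \bigl(\phi \circ L_{h^{-1}}^M\bigr) \circ L_{g^{-1}}^M \\
  &= \phi \circ L_{h^{-1} g^{-1}}^M \\
  &= \phi \circ L_{(gh)^{-1}}^M \\
  &= L_{gh}^C(\phi) .
\end{aligned}
\]
```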


63.4.8 DEFINITION: Let $(G, M)$ be a $C^1$ Lie left transformation group. Let $g \in G$.
The left translation operator (for tangent vectors) $L_g^T : T(M) \to T(M)$ is defined by

    $\forall V \in T(M),\quad L_g^T(V) = (L_g^M)_*(V) = \partial_V L_g^M.$

In other words,

    $\forall p \in M,\ \forall V \in T_p(M),\quad L_g^T(V) = (dL_g^M)_p(V).$

The left translation operator (for vector fields) $L_g^F : X^0(T(M)) \to X^0(T(M))$ is defined by

    $\forall X \in X^0(T(M)),\quad L_g^F(X) = L_g^T \circ X \circ L_{g^{-1}}^M = (L_g^M)_* \circ X \circ L_{g^{-1}}^M.$

That is,

    $\forall X \in X^0(T(M)),\ \forall x \in M,\quad L_g^F(X)(x) = L_g^T(X(g^{-1}x)) = (L_g^M)_*(X(g^{-1}x)).$


63.4.9 REMARK: Left translation of tangent covectors and differential forms. 


Definition 63.4.10 substitutes tangent covectors and differential forms for the tangent vectors and vector
fields in Definition 63.4.8. Definition 63.4.10 generalises Definition 62.3.15 from Lie groups to Lie left
transformation groups. (Figure 62.3.2 illustrates both Definitions 62.3.15 and 63.4.10. The “concept of operation”
is explained in Remark 62.3.14 for both definitions.)
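

63.4.10 DEFINITION: Let $(G, M)$ be a $C^1$ Lie left transformation group. Let $g \in G$.
The left translation operator (for tangent covectors) $L_g^{T^*} : T^*(M) \to T^*(M)$ is defined by

    $\forall \lambda \in T^*(M),\quad L_g^{T^*}(\lambda) = \lambda \circ L_{g^{-1}}^T = \lambda \circ (L_{g^{-1}}^M)_*.$

In other words,

    $\forall p \in M,\ \forall \lambda \in T_p^*(M),\ \forall V \in T_{gp}(M),\quad L_g^{T^*}(\lambda)(V) = \lambda(L_{g^{-1}}^T(V)) = \lambda((L_{g^{-1}}^M)_*(V)) = \lambda(\partial_V L_{g^{-1}}^M).$

The left translation operator (for differential forms) $L_g^{F^*} : X(T^*(M)) \to X(T^*(M))$ is defined by

    $\forall \omega \in X(T^*(M)),\quad L_g^{F^*}(\omega) = L_g^{T^*} \circ \omega \circ L_{g^{-1}}^M.$

In other words,

    $\forall \omega \in X(T^*(M)),\ \forall p \in M,\quad L_g^{F^*}(\omega)(p) = L_g^{T^*}(\omega(g^{-1}p)) = \omega(g^{-1}p) \circ L_{g^{-1}}^T = \omega(g^{-1}p) \circ (L_{g^{-1}}^M)_*.$

In other words,

    $\forall \omega \in X(T^*(M)),\ \forall p \in M,\ \forall V \in T_p(M),$
    $L_g^{F^*}(\omega)(p)(V) = L_g^{T^*}(\omega(g^{-1}p))(V) = \omega(g^{-1}p)(L_{g^{-1}}^T(V)) = \omega(g^{-1}p)((L_{g^{-1}}^M)_*(V)) = \omega(g^{-1}p)(\partial_V L_{g^{-1}}^M).$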


63.4.11 REMARK: Left translation operators for short-cut differential forms. 

Definition 63.4.12 is the generalisation of Definition 62.3.17 from Lie groups to Lie transformation groups. In 
practice, the short-cut differential forms in Definition 63.4.12 are often easier to work with than the tangent 
covector fields in Definition 63.4.10. 


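63.4.12 DEFINITION: Let $(G, M)$ be a $C^1$ Lie left transformation group. Let $g \in G$.
The left translation operator (for short-cut differential forms) $L_g^\Omega : X(\Lambda_1(T(M))) \to X(\Lambda_1(T(M)))$ is
defined by

    $\forall \omega \in X(\Lambda_1(T(M))),\quad L_g^\Omega(\omega) = \omega \circ L_{g^{-1}}^T = \omega \circ (L_{g^{-1}}^M)_*.$

In other words,

    $\forall \omega \in X(\Lambda_1(T(M))),\ \forall V \in T(M),\quad L_g^\Omega(\omega)(V) = \omega(L_{g^{-1}}^T(V)) = \omega((L_{g^{-1}}^M)_*(V)).$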


63.4.13 REMARK: Left invariant vector fields on manifolds under a Lie transformation group subgroup. 

The reason for specifying a subgroup in Definition 63.4.14 instead of the whole group as in Definition 62.4.2
is that whole-group left invariant vector fields are typically quite trivial. As an example, there is only one
vector field on $S^2$ which is invariant under all of SO(3), namely the zero vector field. (To see this, apply any
non-trivial rotation around any point where the vector field is non-zero.) In general, “larger” groups
have “fewer” left invariant vector fields on a given manifold, whereas enlarging the manifold typically yields
“more” left invariant vector fields. (This is related to the effects of group “size” and passive set “size” on
the “size” of the set of orbits, mentioned in Remark 20.5.4.)


Theorem 63.4.15 (i) is the Lie transformation group version of Theorem 62.4.5 (i) for Lie groups. There is 
no transformation group version of Theorem 62.4.5 (ii) because the passive manifold set M has no “identity 
element” corresponding to the identity of the group G. Thus any left invariant vector field can be computed 
from any particular element of a given orbit of the subgroup H, but in general, no individual element of an 
orbit may be identified as “the identity”. Another contrast with the case of Lie groups acting on themselves
is that a Lie transformation group need not be transitive, so each orbit has its own independent left invariant
vector fields, whereas in the Lie group case there is only one orbit.


63.4.14 DEFINITION: A left invariant vector field on the manifold M with respect to a subgroup H of a 
Lie left transformation group (G, M) is a vector field X on M such that 


    $\forall g \in H,\quad L_g^F(X) = X.$


63.4.15 THEOREM: Some formulas for left invariant vector fields with respect to a subgroup. 
Let (G, M) be a Lie transformation group. Let X be a left invariant vector field on M with respect to a 
subgroup H of G. 


(i) $\forall h \in H,\ \forall x \in M,\quad X(x) = L_h^T(X(h^{-1}x)) = (L_h^M)_*(X(h^{-1}x)) = (dL_h^M)_{h^{-1}x}(X(h^{-1}x)).$


PROOF: For part (i), let $X$ be a left invariant vector field on $M$ with respect to $H$. Then by Definitions
63.4.14 and 63.4.8, $X(x) = L_h^T(X(h^{-1}x)) = (L_h^M)_*(X(h^{-1}x)) = (dL_h^M)_{h^{-1}x}(X(h^{-1}x))$ for all $h \in H$
and $x \in M$.
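
The formula in part (i) can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the text's development: it takes G = SO(3) acting on $S^2 \subset \mathbb{R}^3$ by matrix multiplication, H the subgroup of rotations about the z-axis, and the tangent field X(x) = e_z × x. Since each $L_h^M$ is linear here, $(dL_h^M)_{h^{-1}x}$ is just multiplication by h. All function names are hypothetical.

```python
import math

def rot_z(theta):
    # Rotation about the z-axis (an element of the subgroup H).
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    # Rotation about the x-axis (in G = SO(3) but not in H).
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def apply(m, v):
    # Matrix-vector product.
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    # For a rotation matrix, the transpose is the inverse.
    return [[m[j][i] for j in range(3)] for i in range(3)]

def field(x):
    # X(x) = e_z cross x, which is tangent to the sphere at x.
    return [-x[1], x[0], 0.0]

def translated_field(h, x):
    # L_h^F(X)(x) = (dL_h)_{h^{-1}x}(X(h^{-1}x)) = h X(h^{-1}x) for a linear action.
    return apply(h, field(apply(transpose(h), x)))

def max_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

x = [0.6, 0.0, 0.8]  # a point on the unit sphere
diff_z = max_diff(translated_field(rot_z(0.7), x), field(x))
diff_x = max_diff(translated_field(rot_x(0.7), x), field(x))
print(diff_z, diff_x)
```

In this sketch the difference for the z-rotation vanishes up to rounding, while the x-rotation moves the field, which is consistent with X being left invariant with respect to H but not with respect to all of SO(3).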


63.4.16 REMARK: General linear Lie groups. 

The most important kind of Lie left transformation group for differential geometry may be the group of
linear transformations of a finite-dimensional linear space over the field $\mathbb{R}$ or $\mathbb{C}$. For simplicity, real linear
spaces are assumed here. The full tuple for the linear space does not need to be included in the specification
tuple for the Lie group because the linear space is a parameter for the definition. (This is analogous to the
way in which the parameters G and F for a (G, F) fibre bundle do not need to appear in its specification
tuple $(E, \pi, B, A_E)$.)

The atlases for G and V in Definition 63.4.17 each require only one chart, which is a coordinate map for a 
single choice of basis. Then the charts for other choices of basis will be automatically compatible. It is also 
possible to add any number of analytic-compatible or $C^k$-compatible charts to these atlases for $k \in \overline{\mathbb{Z}}^+$, but
these are superfluous and could also downgrade the regularity of these differentiable manifolds. Since
there is no a-priori way to choose a basis for an unknown, unseen linear space, a unique specification tuple 
can be obtained only by including coordinate maps for all bases, which does not in any way reduce the 
manifold differentiability class. 


63.4.17 DEFINITION: The general linear (Lie) (transformation) group of a finite-dimensional real linear
space $V -< (\mathbb{R}, V, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_V, \mu_V)$ is the tuple $(G, V) -< (G, A_G, V, A_V, \sigma_G, \mu)$, where


(i) $G$ is the set of linear space automorphisms (i.e. invertible linear transformations) of $V$,
(ii) $(G, V, \sigma_G, \mu)$ is the left transformation group of $G$ acting on $V$ as in Definition 20.1.2,
(iii) $A_V = \{\kappa_B;\ B \text{ is a basis for } V\}$ as in Definition 51.4.21, where $\kappa_B$ denotes the component map for any
basis $B = (e_i)_{i=1}^n$ for $V$ as in Definition 22.8.8,
(iv) $A_G = \{\kappa_{B,B};\ B \text{ is a basis for } V\}$ as in Definition 51.4.24, where $\kappa_{B,B}$ denotes the linear-map component
map for any basis $B = (e_i)_{i=1}^n$ for $V$ as in Definition 23.2.8.


63.4.18 NOTATION: GL(V), for a finite-dimensional real linear space V, denotes the Lie group of general 
linear transformations of V. 


63.4.19 NOTATION: $GL(n, \mathbb{R})$ denotes the Lie group of general linear transformations of $\mathbb{R}^n$. That is, it is
the group of real invertible $n \times n$ matrices.


$GL(n)$ is an abbreviation for $GL(n, \mathbb{R})$.


63.4.20 THEOREM: Finite-dimensional general linear groups are $C^\infty$ Lie transformation groups.
$GL(V)$ is a $C^\infty$ Lie transformation group for any finite-dimensional real linear space $V$.


PROOF: Let $V$ be a finite-dimensional real linear space. Then $GL(V) = (G, A_G, V, A_V, \sigma_G, \mu)$ as in
Definition 63.4.17, where $A_G = \{\kappa_{B,B};\ B \text{ is a basis for } V\}$ as in Definition 23.2.8. So $(G, A_G)$ is an analytic
manifold by Definition 51.10.3 because the transition maps in $A_G$ are of the form $\kappa_{B_2,B_2} \circ \kappa_{B_1,B_1}^{-1}$, which are
real analytic functions from $\mathbb{R}^{n \times n}$ to $\mathbb{R}^{n \times n}$ because they are linear. Then $(G, A_G, \sigma_G)$ is an analytic group
by Definition 62.2.2 because $\sigma_G$ and the inverse operation $g \mapsto g^{-1}$ are analytic, since these operations
correspond to matrix multiplication and matrix inversion respectively, which are both analytic. Thus
Definition 63.4.2 condition (i) is satisfied. Condition (ii) is satisfied with $k = \infty$ because the transition maps
$\kappa_{B_2} \circ \kappa_{B_1}^{-1}$ for $(V, A_V)$ are linear functions from $\mathbb{R}^n$ to $\mathbb{R}^n$. Condition (iii) follows from Theorem 39.6.4 (iii).
For condition (iv), the $C^\infty$ differentiability of $\mu$ via the component maps follows from the bilinearity of the
matrix product formula in Definition 25.3.7 with respect to matrix elements.
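
As a hedged illustration of condition (iv), suppose (as an assumption for this sketch, not notation taken from the text) that relative to a fixed basis $B = (e_i)_{i=1}^n$ the automorphism $A$ has matrix components $a^i_j$ and the vector $v$ has components $v^j$. Then the action map reads as follows in these coordinates.

```latex
\[
  \mu(A, v)^i \;=\; \sum_{j=1}^{n} a^i_j \, v^j , \qquad i = 1, \dots, n .
\]
```

Each component is a polynomial of degree 2 in the $n^2 + n$ chart coordinates $(a^i_j, v^j)$, which is why the action is $C^\infty$ (indeed analytic) with respect to these charts.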


63.5. Lie right transformation groups 


63.5.1 REMARK: Definitions of Lie left and right transformation groups. 

To avoid confusion, Lie right transformation groups are presented in Section 63.5 as a separate topic. (For 
purely algebraic left and right transformation groups, see Sections 20.1 and 20.7 respectively. For topological 
left and right transformation groups, see Sections 36.10 and 36.11 respectively.) 


Definition 63.5.3 is the mirror image of Definition 63.4.2. 


63.5.2 REMARK: The relative significance of Lie left and right transformation groups. 

It could be argued that the difference between left and right transformation groups is purely formal. That 
argument seems weaker in contexts where a group acts on sets both from the left and the right. In the case of 
a group acting on itself, the left and right actions are both meaningful. In the case of differentiable principal 
fibre bundles as in Section 66.1, it is the Lie right transformation groups which are required because the right 
action map on a principal bundle commutes with fibre transition maps, which are specified as left actions 
on the fibre space in Definition 64.8.3 (iv), whereas the corresponding left action maps do not commute in 
this way, as mentioned in Remark 66.2.25. 


This possibly perplexing asymmetry arises from the fact that the structure group for a fibre bundle is a left 
transformation group (G, F), where a group G acts on a fibre space F. When that fibre space is the group 
G itself, as is the case for principal bundles, the “passive copy” of G in the left transformation group (G, G) 
can be acted on also from the right. This is exploited to construct a right transformation group on the total 
space as in Definition 66.2.7. 
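
The commutation property referred to above can be seen in the simplest case, where a matrix group acts on itself. The following sketch uses 2×2 integer matrices chosen purely for illustration (the names `a`, `b`, `x` and `matmul` are hypothetical). It checks that a left multiplication commutes with a right multiplication, whereas two left multiplications in general do not commute with each other.

```python
def matmul(p, q):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = [[1, 2], [0, 1]]   # group element acting from the left
b = [[1, 0], [3, 1]]   # group element acting from the right (or left)
x = [[2, 1], [1, 1]]   # point of the "passive copy" of the group

left_then_right = matmul(matmul(a, x), b)   # R_b(L_a(x)) = (a x) b
right_then_left = matmul(a, matmul(x, b))   # L_a(R_b(x)) = a (x b)

left_ab = matmul(a, matmul(b, x))           # L_a(L_b(x)) = a b x
left_ba = matmul(b, matmul(a, x))           # L_b(L_a(x)) = b a x

print(left_then_right == right_then_left)   # left and right actions commute
print(left_ab == left_ba)                   # two left actions need not commute
```

The first equality is just associativity of the product, which is why the right action is the one compatible with transition maps that act from the left.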


The relative significance of Lie left and right transformation groups is discussed particularly in Section 20.10, 
where topological and differentiable structure and the base space are removed from fibre bundles. Fibre spaces 
represent the results of measurements, and are acted on by left transformation groups. Structure groups 
represent sets of reference frames for making measurements, and are acted on by right transformation 
groups. These transformation groups are coupled in fibre bundles in a kind of “contragredient” relationship 
between ordinary and principal fibre bundles. (For contragredient relationships, see Remarks 23.11.18, 
54.4.12, 55.2.16, 55.2.19, 58.8.1 and 63.3.1.) 


63.5.3 DEFINITION: A $C^k$ Lie right transformation group for $k \in \overline{\mathbb{Z}}^+$ is a tuple
$(G, M) -< (G, A_G, M, A_M, \sigma_G, \mu)$ which satisfies the following conditions.


(i) $(G, A_G, \sigma_G)$ is an analytic group.
(ii) $(M, A_M)$ is a $C^k$ manifold.
(iii) $(G, T_G, M, T_M, \sigma_G, \mu)$ is an effective right topological transformation group of the topological space
$(M, T_M)$, where the topologies $T_G$ and $T_M$ are induced by the atlases $A_G$ and $A_M$ respectively.
(iv) The map $\mu : M \times G \to M$ is $C^k$ with respect to the product differentiable structure on $M \times G$ and
the differentiable structure on $M$. In other words, $\mu \in C^k(M \times G, M)$.


63.5.4 THEOREM: Right actions by $C^k$ transformation groups are $C^k$ diffeomorphisms of the passive set.
Let $k \in \overline{\mathbb{Z}}^+$. Let $(G, M)$ be a $C^k$ Lie right transformation group. Then for all $g \in G$, the map $R_g : M \to M$,
defined by $R_g : x \mapsto xg$, is a $C^k$ diffeomorphism from $M$ to $M$.


PROOF: Let $g \in G$. Then it follows from Theorem 52.6.8 (i) that $R_g : M \to M$ is a $C^k$ map. Similarly,
$(R_g)^{-1} = R_{g^{-1}} : M \to M$ is a $C^k$ map. Hence $R_g : M \to M$ is a $C^k$ diffeomorphism.


63.5.5 REMARK: Right translation operators for Lie right transformation groups. 

Definitions 63.5.6 and 63.5.7 for Lie transformation groups are mirror images of Definitions 63.4.7 and 63.4.8 
in the same way that the corresponding Definitions 62.6.2 and 62.6.3 for Lie groups are mirror images of 
Definitions 62.3.3 and 62.3.9. The apparent redundancy of information here has the benefit that it saves the 
mental effort of converting between these mirror-images, which can introduce occasional errors. 


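63.5.6 DEFINITION: Let $(G, M)$ be a $C^1$ Lie right transformation group. Let $g \in G$.
The right translation operator (for manifold points) $R_g^M : M \to M$ is defined by

    $\forall x \in M,\quad R_g^M(x) = xg.$

The right translation operator (for real-valued functions) $R_g^C : C^0(M) \to C^0(M)$ is defined by

    $\forall \phi \in C^0(M),\quad R_g^C(\phi) = \phi \circ R_{g^{-1}}^M.$

That is,

    $\forall \phi \in C^0(M),\ \forall x \in M,\quad R_g^C(\phi)(x) = \phi(xg^{-1}).$

The right translation operator (for tangent operators) $R_g^T : T(M) \to T(M)$ is defined by

    $\forall V \in T(M),\quad R_g^T(\partial_V) = \partial_V \circ R_{g^{-1}}^C.$

That is,

    $\forall V \in T(M),\ \forall \phi \in C^1(M),\quad R_g^T(\partial_V)(\phi) = \partial_V(R_{g^{-1}}^C(\phi)) = \partial_V(\phi \circ R_g^M).$

The right translation operator (for tangent operator fields) $R_g^F : X^0(T(M)) \to X^0(T(M))$ is defined by

    $\forall X \in X^0(T(M)),\quad R_g^F(\partial_X) = R_g^T \circ (\partial_X \circ R_{g^{-1}}^M).$

That is,

    $\forall X \in X^0(T(M)),\ \forall x \in M,\quad R_g^F(\partial_X)(x) = R_g^T(\partial_X(xg^{-1})).$

That is,

    $\forall X \in X^0(T(M)),\ \forall x \in M,\ \forall \phi \in C^1(M),\quad R_g^F(\partial_X)(x)(\phi) = \partial_X(xg^{-1})(\phi \circ R_g^M).$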


63.5.7 DEFINITION: Let $(G, M)$ be a $C^1$ Lie right transformation group. Let $g \in G$.
The right translation operator (for tangent vectors) $R_g^T : T(M) \to T(M)$ is defined by

    $\forall V \in T(M),\quad R_g^T(V) = (R_g^M)_*(V) = \partial_V R_g^M.$

In other words,

    $\forall p \in M,\ \forall V \in T_p(M),\quad R_g^T(V) = (dR_g^M)_p(V).$

The right translation operator (for vector fields) $R_g^F : X^0(T(M)) \to X^0(T(M))$ is defined by

    $\forall X \in X^0(T(M)),\quad R_g^F(X) = R_g^T \circ X \circ R_{g^{-1}}^M = (R_g^M)_* \circ X \circ R_{g^{-1}}^M.$

That is,

    $\forall X \in X^0(T(M)),\ \forall x \in M,\quad R_g^F(X)(x) = R_g^T(X(xg^{-1})) = (R_g^M)_*(X(xg^{-1})).$


63.5.8 REMARK: Right translation of tangent covectors and differential forms. 

Definition 63.5.9 is the mirror image of Definition 63.4.10. Definition 63.5.9 substitutes tangent covectors
and differential forms for the tangent vectors and vector fields in Definition 63.5.7. Definition 63.5.9
generalises Definition 62.6.9 from Lie groups to Lie right transformation groups. (Figure 62.6.2 illustrates both
Definitions 62.6.9 and 63.5.9. The “concept of operation” is explained in Remark 62.3.14 for both definitions.)


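63.5.9 DEFINITION: Let $(G, M)$ be a $C^1$ Lie right transformation group. Let $g \in G$.
The right translation operator (for tangent covectors) $R_g^{T^*} : T^*(M) \to T^*(M)$ is defined by

    $\forall \lambda \in T^*(M),\quad R_g^{T^*}(\lambda) = \lambda \circ R_{g^{-1}}^T.$

In other words,

    $\forall p \in M,\ \forall \lambda \in T_p^*(M),\ \forall V \in T_{pg}(M),\quad R_g^{T^*}(\lambda)(V) = \lambda(R_{g^{-1}}^T(V)) = \lambda((R_{g^{-1}}^M)_*(V)) = \lambda(\partial_V R_{g^{-1}}^M).$

The right translation operator (for differential forms) $R_g^{F^*} : X(T^*(M)) \to X(T^*(M))$ is defined by

    $\forall \omega \in X(T^*(M)),\quad R_g^{F^*}(\omega) = R_g^{T^*} \circ \omega \circ R_{g^{-1}}^M.$

In other words,

    $\forall \omega \in X(T^*(M)),\ \forall p \in M,\quad R_g^{F^*}(\omega)(p) = R_g^{T^*}(\omega(pg^{-1})) = \omega(pg^{-1}) \circ R_{g^{-1}}^T = \omega(pg^{-1}) \circ (R_{g^{-1}}^M)_*.$

In other words,

    $\forall \omega \in X(T^*(M)),\ \forall p \in M,\ \forall V \in T_p(M),$
    $R_g^{F^*}(\omega)(p)(V) = R_g^{T^*}(\omega(pg^{-1}))(V) = \omega(pg^{-1})(R_{g^{-1}}^T(V)) = \omega(pg^{-1})((R_{g^{-1}}^M)_*(V)) = \omega(pg^{-1})(\partial_V R_{g^{-1}}^M).$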


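63.5.10 DEFINITION: Let $(G, M)$ be a $C^1$ Lie right transformation group. Let $g \in G$.
The right translation operator (for short-cut differential forms) $R_g^\Omega : X(\Lambda_1(T(M))) \to X(\Lambda_1(T(M)))$ is
defined by

    $\forall \omega \in X(\Lambda_1(T(M))),\quad R_g^\Omega(\omega) = \omega \circ R_{g^{-1}}^T = \omega \circ (R_{g^{-1}}^M)_*.$

In other words,

    $\forall \omega \in X(\Lambda_1(T(M))),\ \forall V \in T(M),\quad R_g^\Omega(\omega)(V) = \omega(R_{g^{-1}}^T(V)) = \omega((R_{g^{-1}}^M)_*(V)).$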


63.5.11 REMARK: Left and right invariant transformations on a manifold.

Corresponding to left and right invariant vector fields on a Lie group, as outlined in Sections 62.4 and 62.7, 
one may define vector fields on a G-manifold M with similar properties as in Definition 63.4.14. (An example 
of this for G = SO(3) and M = $S^2$ is presented in Example 63.6.23 and Remark 76.8.6.) Fields on Lie groups
may have either left or right invariance because there are well defined left and right actions by group elements 
on the group, but in the case of the action of a group on a manifold, only one kind of invariance is meaningful, 
depending on whether the group acts on the manifold from the left or the right. 


Definition 63.5.12 is the mirror image of Definition 63.4.14. 


63.5.12 DEFINITION: A right invariant vector field on the manifold M with respect to a subgroup H of a 
Lie right transformation group (G, M) is a vector field X on M such that 


    $\forall g \in H,\quad R_g^F(X) = X.$


63.6. Infinitesimal left transformations 


63.6.1 REMARK: Connections may be thought of as infinitesimal transformations. 

In Section 63.6, vector fields on a G-manifold M are generated by differentiating transformations in a 
Lie transformation group G. These vector fields may be thought of as “infinitesimal transformations” or
“differential actions” of the group on the manifold, or “generators” of actions on the manifold. These
are used for defining connections on ordinary fibre bundles because differential parallelism is represented as 
differential actions on tangent fibre bundles. (See Definition 67.5.4.) 


An infinitesimal transformation of a manifold by a Lie group may be thought of as the differential of the
action by group elements along a differentiable curve within the Lie group. In other words, given a $C^1$ curve
$\gamma : I \to G$ with $0 \in I$ and $\gamma(0) = e_G$, an infinitesimal transformation may be thought of as the differential
$\partial_t L_{\gamma(t)} \big|_{t=0}$ of the action $L_{\gamma(t)} : M \to M$ given by $L_{\gamma(t)} : p \mapsto \gamma(t)p$ for $p \in M$. One may think of this as
the derivative $\gamma'(0)$. This concept must be precisely defined to make it meaningful. Ironically, it must be
defined in terms of the right action $R_p : G \to M$ by points $p \in M$ on the group $G$. (See Definition 63.6.5.)


It is intuitively fairly clear that an infinitesimal transformation must be a vector field on M which satisfies 
some kind of invariance condition with respect to some element of the Lie group G. In fact, one would expect 
that the diffeomorphism obtained from integral curves of the vector field should yield such invariance. The 
converse is presumably not valid. In particular, one would not expect any infinitesimal transformations to 
be invariant under elements of $G$ which are not in the connected component of $G$ which contains $e_G$.


Since parallel transport along a curve in the passive point space M is obtained as the integral of the differential 
parallel transport along the curve, it is unsurprising that connections on differentiable fibre bundles are 
defined to be infinitesimal transformations of the fibre space via a fibre chart. Thus the effect of parallel 
transport along a curve may be computed by integrating vector fields on the fibre space, or alternatively by 
integrating a Lie algebra element in the group and applying the result to the fibre space. 


63.6.2 REMARK: Paradoxical definition of left infinitesimal action as differential of a right action. 

The apparent paradox that an infinitesimal left action is expressed in Definition 63.6.5 as the differential
of a right action results from the fact that one must fix the right term $p$ in the product $gp$ so as to
construct the differential with respect to the left term $g$. This can be seen more clearly from the formula
$(dR_p)_e(U) = (d\mu(\cdot, p))_e(U)$ for $p \in M$ and $U \in T_e(G)$. (See also Remarks 62.4.12, 62.7.1 and 63.7.1 for this
“paradox”.)


63.6.3 REMARK: Motivation for infinitesimal transformations on manifolds. 
Definition 63.6.5 defines the infinitesimal transformation $X_U^M$ on a G-manifold corresponding to each element
$U$ of the Lie algebra $T_e(G)$.


To motivate Definition 63.6.5, consider a curve $\gamma : I \to G$ for some open interval $I \subseteq \mathbb{R}$ such that $0 \in I$
and $\gamma(0) = e \in G$. For each $p \in M$, one may define a curve $\gamma_p : I \to M$ by $\gamma_p : t \mapsto \gamma(t)p$. Since $M$ is a
manifold, differentiability of such curves is well-defined. So assume that $\gamma_p$ is a $C^1$ curve in $M$ for all $p \in M$.
If $G$ is a Lie group, one may differentiate both $\gamma$ and $\gamma_p$. Then
$\gamma_p'(0) = \partial_t(\gamma(t)p)\big|_{t=0} = \partial_t(R_p(\gamma(t)))\big|_{t=0} = (dR_p)_e(\gamma'(0))$,
where $R_p$ is the right action map in Definition 63.6.4. Thus if $G$ is a Lie group, then the
infinitesimal action of the curve $\gamma$ on $M$ is of the form $(dR_p)_e(U)$ with $U = \gamma'(0)$. However, even if $G$ is
not a Lie group, it may be that the derivatives $\gamma_p'(0)$ exist for all $p \in M$, in which case one may generalise
the definition to infinitesimal transformations of the form $Y_\gamma \in X(T(M))$, where $Y_\gamma : p \mapsto \gamma_p'(0)$. This more
general definition is applicable to connections on fibrations, whereas the Lie group version is applicable to
connections on differentiable fibre bundles.


63.6.4 DEFINITION: The right action on a Lie left transformation group $(G, M, \sigma_G, \mu)$ by a point $p \in M$
is the map $R_p : G \to M$ defined by $R_p : g \mapsto gp = \mu(g, p)$.


63.6.5 DEFINITION: The infinitesimal transformation map of a $C^1$ Lie left transformation group $(G, M)$
is the map $X^M : T_e(G) \to X(T(M))$ defined by

    $\forall U \in T_e(G),\ \forall p \in M,\quad X_U^M(p) = (dR_p)_e(U),$

where $R_p : G \to M$ is the right action on $G$ by $p$.
The infinitesimal transformation by a Lie algebra element $U \in T_e(G)$ on $M$ is the vector field $X_U^M$.
Alternative name: vector field generated by a Lie algebra element $U \in T_e(G)$ on $M$.
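
For a concrete feel for this definition, consider the following numerical sketch. It assumes, for illustration only, that G = GL(2, R) acts on M = R^2 by matrix multiplication, so that $R_p(g) = gp$ and therefore $(dR_p)_e(U) = Up$. The curve `gamma`, the helper `act` and the step size `h` are hypothetical choices made for this example.

```python
import math

def gamma(t):
    # A C^1 curve in GL(2,R) with gamma(0) = identity: rotation by angle t.
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]

# Velocity of the curve at t = 0, regarded as an element U of T_e(G).
U = [[0.0, -1.0], [1.0, 0.0]]

def act(g, p):
    # Left action of a 2x2 matrix on a point of R^2.
    return [g[0][0] * p[0] + g[0][1] * p[1],
            g[1][0] * p[0] + g[1][1] * p[1]]

def infinitesimal(p, h=1e-5):
    # Central-difference approximation to d/dt (gamma(t) p) at t = 0,
    # which approximates (dR_p)_e(U).
    plus, minus = act(gamma(h), p), act(gamma(-h), p)
    return [(a - b) / (2 * h) for a, b in zip(plus, minus)]

p = [2.0, 1.0]
approx = infinitesimal(p)
exact = act(U, p)   # for a linear action, (dR_p)_e(U) = U p
err = max(abs(a - b) for a, b in zip(approx, exact))
print(approx, exact, err)
```

The point p is held fixed while the group element varies, which is the sense in which the infinitesimal left action is computed from the right action map $R_p$.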


63.6.6 REMARK: Infinitesimal transformations may be called “fundamental vector fields”.
Some authors refer to the infinitesimal transformations in Definition 63.6.5 as “fundamental vector fields” 
on the passive manifold M, but the sign is often opposite. (See for example Sulanke/Wintgen [40], page 75.) 


63.6.7 REMARK: The special case of infinitesimal left action on a Lie group. 

In the special case that the “passive manifold” $M$ is the group $G$ itself, there is a potential for confusion with
the notations $X_U^L(p) = (dL_p)_e(U)$ and $X_U^R(p) = (dR_p)_e(U)$ which are given in Definitions 62.4.15 and 62.7.7
respectively, for $p \in G$ and $U \in T_e(G)$. When $M = G$, the fields $X_U^M$ and $X_U^R$ are the same. However, in
the case of the infinitesimal right action in Definition 63.7.3, the formula $X_U^M(p) = (dL_p)_e(U)$ implies that
$X_U^M$ and $X_U^L$ are the same if $M = G$. Thus the notation “$X_U^M$” has a context-dependent meaning. (In other
words, it's clear as mud.) Therefore when $M = G$, it is best to specify explicitly whether it is an infinitesimal
left or right action by using notations “$X_U^R$” or “$X_U^L$” as applicable.


63.6.8 REMARK: The set of vector fields generated on a manifold by Lie algebra elements. 

The set in Notation 63.6.9 may be regarded as a representation of the Lie algebra of G. More precisely, the
map $\rho : T_e(G) \to X_G(T(M))$ defined by $U \mapsto X_U^M$ is a representation of the Lie algebra $T_e(G)$. This set of
vector fields is important for its application in definitions of connections on differentiable fibre bundles. (See 
for example Definitions 67.5.4 (v), 67.12.3, 69.1.3 (v), 69.2.2 (iii) and 71.1.2 (v).) 


63.6.9 NOTATION: $X_G(T(M))$, for a $C^1$ Lie left transformation group $(G, M)$, denotes the set of vector
fields generated on $M$ by Lie algebra elements in $T_e(G)$. In other words, $X_G(T(M)) = \{X_U^M;\ U \in T_e(G)\}$.


63.6.10 REMARK: Well-definition of infinitesimal transformations of Lie left transformation groups.
Since the right action $R_p$ on $G$ by each element $p \in M$ is a $C^1$ map, the differential $dR_p$ is well-defined. So
$X_U^M(p) \in T_p(M)$ is well defined for all $p \in M$.

The differentiability of $X_U^M$ is shown in Theorem 63.6.11 from the differentiability of the group action in a
similar way to the demonstration in Theorem 62.4.16 (v) of differentiability of the left invariant vector field
$X_U^L \in X(T(G))$ in Definition 62.4.15, where the group acts on the group itself.


63.6.11 THEOREM: Infinitesimal transformations by $C^{k+1}$ transformation groups are $C^k$ vector fields.
Let $k \in \mathbb{Z}_0^+$. Let $(G, M)$ be a $C^{k+1}$ Lie left transformation group. Then $X_U^M \in X^k(T(M))$ for all $U \in T_e(G)$.
In other words, $X_G(T(M)) \subseteq X^k(T(M))$.


PROOF: Let $U \in T_e(G)$. Then $(dR_p)_e(U) \in T_{R_p(e)}(M) = T_p(M)$ for $p \in M$ by Theorem 58.4.8. Therefore
$X_U^M \in X(T(M))$ is a well-defined vector field on $M$. By Definition 63.4.2 (iv), $\mu \in C^{k+1}(G \times M, M)$, where
$\mu : G \times M \to M$ is the action map of $(G, M)$. Let $0_p$ denote $0_{T_p(M)}$ for $p \in M$.

Theorem 58.6.7 line (58.6.3) implies that $\mu_{\times *}(U, 0_p) = (d\mu)_{(e,p)}(U, 0_p) = (d\mu(\cdot, p))_e(U) = (dR_p)_e(U) = X_U^M(p)$, where
$\mu_{\times *} \in C^k(T(G) \times T(M), T(M))$ by Theorem 58.10.3. (See Definitions 58.6.2 and 58.10.2 for the direct
product differential decomposition $\mu_{\times *}$.)

Define $\theta_U : M \to T(G) \times T(M)$ by $\theta_U(p) = (U, 0_p)$ for $p \in M$. Then $\theta_U \in C^k(M, T(G) \times T(M))$ by
Theorem 57.2.16 because $M$ and $G$ are $C^{k+1}$ manifolds. So $X_U^M = \mu_{\times *} \circ \theta_U$ is a $C^k$ map from $M$ to
$T(M)$ by the chain rule, Theorem 52.1.17. Thus $X_U^M \in C^k(M, T(M))$. Hence $X_U^M \in X^k(T(M))$ because
$X_U^M(p) \in T_p(M)$ for all $p \in M$.


63.6.12 THEOREM: Infinitesimal transformations depend linearly on their generators.
Let $(G, M)$ be a $C^1$ Lie left transformation group.
(i) $\forall c_1, c_2 \in \mathbb{R},\ \forall U_1, U_2 \in T_e(G),\ \forall p \in M,\quad X^M_{c_1 U_1 + c_2 U_2}(p) = c_1 X^M_{U_1}(p) + c_2 X^M_{U_2}(p).$
In other words, $X^M : T_e(G) \to X_G(T(M))$ is a linear map.


PROOF: For part (i), $X^M_{c_1 U_1 + c_2 U_2}(p) = (dR_p)_e(c_1 U_1 + c_2 U_2) = c_1 (dR_p)_e(U_1) + c_2 (dR_p)_e(U_2) = c_1 X^M_{U_1}(p) +
c_2 X^M_{U_2}(p)$ because $(dR_p)_e : T_e(G) \to T_p(M)$ is a linear map by Definition 58.4.5.


63.6.13 THEOREM: Joint differentiability of infinitesimal transformations.
Let $k \in \mathbb{Z}_0^+$. Let $(G, M)$ be a $C^{k+1}$ Lie left transformation group. Then the map $(U, p) \mapsto X_U^M(p)$ from
$T_e(G) \times M$ to $T(M)$ is a $C^k$ map.


PROOF: Define $Y : T_e(G) \times M \to T(M)$ by $Y(U, p) = X_U^M(p)$ for all $(U, p) \in T_e(G) \times M$. Then

\[
\begin{aligned}
\forall (U, p) \in T_e(G) \times M,\quad Y(U, p) &= (dR_p)_e(U) \\
&= (d\mu)_{(e,p)}(i(U, 0_p)) \\
&= \mu_{\times *}(U, 0_p),
\end{aligned}
\]

where $\mu : G \times M \to M$ is the action map of $G$ on $M$, and $0_p = 0_{T_p(M)}$ is the zero vector at $p$ for all $p \in M$. (See
Definition 58.10.2 for the “decomposed differential map” $\mu_{\times *} : T(G) \times T(M) \to T(M)$. See Definition 54.7.6
for the identification map $i : T(G) \times T(M) \to T(G \times M)$.)

Define $\phi : T_e(G) \times M \to T_e(G) \times T(M)$ by $\phi : (U, p) \mapsto (U, 0_p)$. Then $\phi = \mathrm{id}_{T_e(G)} \times Z_M$, where $Z_M :
M \to T(M)$ is the zero vector field $Z_M : p \mapsto 0_p$ as in Definition 57.2.13. Since $Z_M \in C^k(M, T(M))$ by
Theorem 57.2.14, it follows that $\phi \in C^k(T_e(G) \times M, T_e(G) \times T(M))$ by Theorem 52.6.15.


((2020-2-21. To be continued ... )) 


63.6.14 REMARK: Infinitesimal left transformation vector fields are not in general left invariant.
Theorem 63.6.15 gives a formula for the left translation of infinitesimal transformations by a group element
in terms of its adjoint map given in Definition 62.10.2. Since the left translate of $X_U^M$ is not in general
the same as $X_U^M$, this implies that $X_U^M$ is not in general left invariant. (Theorem 63.6.15 is illustrated in
Figure 63.6.1. For an application to fibre bundles, see Figure 64.13.1 and Theorem 64.13.11.)


[Figure labels: $X_U^M(g^{-1}p) = (dR_{g^{-1}p})_e(U)$ and $X^M_{\mathrm{Adj}(g)(U)}(p) = (dR_p)_e(\mathrm{Adj}(g)(U))$]

Figure 63.6.1 Left translation rule for infinitesimal left translations


In applications to fibre bundles, the manifold M is the fibre space F of a Lie transformation group (G, F), 
where G is the structure group of a fibre bundle. The infinitesimal left action of the group on the fibre space 
is then used for defining transition maps between fibre charts, but more importantly, it is used for defining 
connections on fibre bundles as in Definition 67.5.4 (v). 
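

63.6.15 THEOREM: Formula for left translates of infinitesimal left transformations.
Let $(G, M)$ be a $C^1$ Lie left transformation group. Then

    $\forall g \in G,\ \forall U \in T_e(G),\quad L_g^F(X_U^M) = X^M_{\mathrm{Adj}(g)(U)}.$

In other words, $\forall g \in G$, $\forall U \in T_e(G)$, $\forall p \in M$, $X^M_{\mathrm{Adj}(g)(U)}(p) = (L_g^M)_*(X_U^M(g^{-1}p))$. (See Definition 63.4.8
for $L_g^F$, the left translation operator for fields.)


PROOF: Let $g \in G$ and $U \in T_e(G)$. Then

\[
\begin{aligned}
\forall p \in M,\quad X^M_{\mathrm{Adj}(g)(U)}(p) &= (dR_p)_e(\mathrm{Adj}(g)(U)) \\
&= (dR_p)_e\bigl(d(L_g^G \circ R_{g^{-1}}^G)_e(U)\bigr) \\
&= (d(R_p \circ L_g^G \circ R_{g^{-1}}^G))_e(U) \\
&= (d(L_g^M \circ R_{g^{-1}p}))_e(U) \\
&= (dL_g^M)_{g^{-1}p}\bigl((dR_{g^{-1}p})_e(U)\bigr) \\
&= (L_g^M)_*(X_U^M(g^{-1}p)) \\
&= ((L_g^M)_* \circ X_U^M \circ L_{g^{-1}}^M)(p) \\
&= L_g^F(X_U^M)(p).
\end{aligned}
\]

Hence $L_g^F(X_U^M) = X^M_{\mathrm{Adj}(g)(U)}$.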
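
The translation rule can be checked by hand in the matrix case. The sketch below assumes, for illustration only, that G = GL(2, R) acts on R^2 by matrix multiplication, that $X_U^M(p) = Up$ for this linear action, and that the adjoint map is conjugation, $\mathrm{Adj}(g)(U) = gUg^{-1}$, as is standard for matrix groups. The specific matrices and the helper names are hypothetical.

```python
def matmul(a, b):
    # Product of two 2x2 matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    # Matrix-vector product in R^2.
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def X(U, p):
    # Infinitesimal transformation for a linear action: X_U(p) = U p.
    return apply(U, p)

g = [[2, 1], [1, 1]]        # group element with determinant 1
g_inv = [[1, -1], [-1, 2]]  # its inverse
U = [[0, 1], [3, 0]]        # Lie algebra element
p = [1, 2]                  # point of the passive space

adj_U = matmul(matmul(g, U), g_inv)        # Adj(g)(U) = g U g^{-1}
lhs = X(adj_U, p)                          # X_{Adj(g)(U)}(p)
rhs = apply(g, X(U, apply(g_inv, p)))      # (L_g)_* (X_U(g^{-1} p))
print(lhs, rhs, X(U, p))
```

The two sides agree, while both differ from $X_U^M(p)$ itself, which is consistent with the observation that infinitesimal left transformations are not in general left invariant.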


63.6.16 REMARK: Identification of Lie algebra elements with infinitesimal transformations. 

Lie algebra elements are often identified with the left invariant vector fields $X_U^L$ which they generate on the
Lie group itself, and in the case of Lie transformation groups, the elements of a Lie algebra may also be
identified with the vector fields, i.e. infinitesimal transformations, which they generate on the passive set
which the group acts on. This is a one-to-one correspondence by Theorem 63.6.17. Thus the vector fields
$X_U^M : p \mapsto (dR_p)_e(U)$ in Definition 63.6.5 may be identified with the corresponding vectors $U \in T_e(G)$.


By Theorems 62.9.7 and 62.9.9, there is a one-to-one correspondence between vectors $U \in T_e(G)$ and
one-parameter subgroups $\gamma_U : \mathbb{R} \to G$ with $\gamma_U'(0) = U$. But the vector fields $X_U^M$ may also be identified
with the one-parameter family of diffeomorphisms $\phi_U : \mathbb{R} \to (M \to M)$ defined by $\phi_U(t) : p \mapsto \gamma_U(t)p$.
Thus the velocity $\gamma_U'(0) = U$ of a curve in $G$ is identified with the “velocity” of the action of the group on the passive
set. As mentioned in Remark 63.3.6, the vector field $\phi_U'(0) \in X(T(M))$ may be regarded as a “tangent
vector” to the whole manifold $M$ in the same way that an element of $T_p(M)$ is a tangent vector at a single
point $p \in M$. Thus the map from $T_e(G)$ to $X(T(M))$ is a kind of “differential” of the action of the group
on the manifold $M$. This differential map is linear in the sense that $U \mapsto (dR_p)_e(U)$ is a linear map from
$T_e(G)$ to $T_p(M)$ for each $p \in M$. Even though this “differential map” is not necessarily injective at each
individual point $p \in M$, the map from $U \in T_e(G)$ to the entire vector field $\phi_U'(0) : p \mapsto (dR_p)_e(U)$ in
$X(T(M))$ is injective.

Thus the following constructions may be identified for a $C^{k+1}$ Lie left transformation group $(G, M)$.


(1) $U \in T_e(G)$.

(2) $X_U^L \in X^\infty(T(G))$, the left invariant vector field on $G$ with $X_U^L(e) = U$.

(3) $\gamma_U \in C^\infty(\mathbb{R}, G)$, the integral curve of $X_U^L$ with $\gamma_U(0) = e$.

(4) $X_U^M \in X^k(T(M))$ defined by $X_U^M(p) = (dR_p)_e(U)$ for $p \in M$.

(5) The one-parameter family of diffeomorphisms $\phi_U : \mathbb{R} \to (M \to M)$ with $\phi_U(t) : p \mapsto \gamma_U(t)p$.


63.6.17 THEOREM: Infinitesimal transformations are in one-to-one correspondence with their generators.
Let $(G, M)$ be a $C^1$ Lie left transformation group with $\dim(M) \in \mathbb{Z}^+$. Then

    $\forall U_1, U_2 \in T_e(G),\quad X^M_{U_1} = X^M_{U_2} \Leftrightarrow U_1 = U_2.$

In other words, the infinitesimal transformation $X_U^M$ uniquely determines $U \in T_e(G)$, and vice versa.
In other words, $X^M : T_e(G) \to X_G(T(M))$ is a bijection.


PROOF: Let $U_1, U_2 \in T_e(G)$ with $X^M_{U_1} = X^M_{U_2}$ and $U_1 \ne U_2$. Let $U = U_1 - U_2$. Then $X_U^M(p) = 0_{T_p(M)}$
for all $p \in M$ by Theorem 63.6.12 (i), but $U \ne 0$. By Theorem 62.9.9, there exists a curve $\gamma \in C^1(\mathbb{R}, G)$
with $\gamma(0) = e$ and $\gamma'(t) = X_U^L(\gamma(t))$ for all $t \in \mathbb{R}$. Suppose that $\gamma(t) = e$ for all $t$ in some neighbourhood
of 0. Then $0 = \gamma'(0) = (dL_e)_e(U) = U$ by Definition 62.4.15 and Theorem 58.5.2. This contradicts the
assumption $U \ne 0$. Therefore for any $\Omega \in \mathrm{Top}_0(\mathbb{R})$, there exists $t_0 \in \Omega$ with $\gamma(t_0) \ne e$. Then $\gamma(t_0)p_0 \ne p_0$
for some $p_0 \in M$ by Definition 20.2.1 because $G$ acts effectively on $M$ by Definition 63.4.2 (iii).


Define $\hat\gamma : \mathbb{R} \to M$ by $\hat\gamma(t) = \gamma(t)p_0$ for all $t \in \mathbb{R}$. Then $\hat\gamma(0) = p_0$, and $\hat\gamma \in C^1(\mathbb{R}, M)$ and

\[
\begin{aligned}
\forall t \in \mathbb{R},\quad \hat\gamma'(t) &= (R_{p_0} \circ \gamma)'(t) \\
&= (dR_{p_0})_{\gamma(t)}(\gamma'(t)) && (63.6.1) \\
&= (dR_{p_0})_{\gamma(t)}\bigl((dL_{\gamma(t)})_e(U)\bigr) \\
&= (d(R_{p_0} \circ L_{\gamma(t)}))_e(U) \\
&= (d(L_{\gamma(t)} \circ R_{p_0}))_e(U) \\
&= (dL_{\gamma(t)})_{p_0}\bigl((dR_{p_0})_e(U)\bigr) && (63.6.2) \\
&= (dL_{\gamma(t)})_{p_0}(X_U^M(p_0)),
\end{aligned}
\]

where lines (63.6.1) and (63.6.2) follow from the pointwise chain rule, Theorem 58.4.13. Since $X_U^M(p_0) = 0_{T_{p_0}(M)}$,
the right-hand side is the zero vector for all $t \in \mathbb{R}$. This differential
equation for $\hat\gamma$ has a solution $\hat\gamma(t) = p_0$ for all $t \in \mathbb{R}$. This is the only solution because by Theorem 57.10.8,
the zero vector field $p \mapsto 0_{T_p(M)}$ on a $C^1$ manifold has only one integral curve through each point. So
$\gamma(t_0)p_0 = p_0$, which contradicts the choice of $p_0$. Thus the assumption $U_1 \ne U_2$ is false. Hence $U_1 = U_2$. The converse
follows from Definition 63.6.5.


63.6.18 REMARK: Maps from infinitesimal transformations to their generating Lie algebra elements. 

The one-to-one correspondence asserted in Theorem 63.6.17 implies that there is a well-defined map from
infinitesimal transformations to Lie algebra elements for any $C^1$ Lie left transformation group $(G, M)$ with
$\dim(M) \ge 1$. This map has particular value for the definition of associated connections in Section 67.12.
Therefore it is convenient to give this map a definition and notation.


63.6.19 DEFINITION: The generator of an infinitesimal transformation $X \in X(T(M))$ by a Lie algebra
element of $G$, where $(G, M)$ is a $C^1$ Lie left transformation group, is the $U \in T_e(G)$ which satisfies $X = X^M_U$.


63.6.20 DEFINITION: The generator map for infinitesimal transformations by Lie algebra elements of a $C^1$
Lie left transformation group $(G, M)$ is the map $X^M_U \mapsto U$ from $X_G(T(M)) = \{X^M_U;\ U \in T_e(G)\}$ to $T_e(G)$.
In other words, it is the inverse of the infinitesimal transformation map $X^M$ in Definition 63.6.5.


63.6.21 NOTATION: Generator maps from infinitesimal transformations to Lie algebra elements.
$\mathrm{Gen}^M$, for a $C^1$ Lie left transformation group $(G, M)$, denotes the generator map for $(G, M)$. In other words,
$\mathrm{Gen}^M$ is the map from the set of infinitesimal transformations $\{X^M_U;\ U \in T_e(G)\}$ to $T_e(G)$ which satisfies

$$\forall U \in T_e(G),\quad \mathrm{Gen}^M(X^M_U) = U.$$

In other words, $\mathrm{Gen}^M = (X^M)^{-1}$.
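The generator map can be illustrated in the simplest concrete setting. The following is a minimal sketch, assuming that the group is a matrix group acting on $\mathbb{R}^3$ by matrix multiplication and that base points of tangent vectors are dropped, so that the infinitesimal transformation by a matrix $U$ is the map $p \mapsto Up$. The function names `infinitesimal_transformation` and `gen` are hypothetical and are not notation from this book.

```python
import numpy as np

def infinitesimal_transformation(U):
    """Return the field p -> U p (assumption: matrix group acting on R^n,
    with base points of tangent vectors dropped)."""
    U = np.asarray(U, dtype=float)
    return lambda p: U @ np.asarray(p, dtype=float)

def gen(X, dim=3):
    """Recover the generating matrix from a linear field X by evaluating
    X on the standard basis vectors; column j of the result is X(e_j)."""
    basis = np.eye(dim)
    return np.column_stack([X(basis[:, j]) for j in range(dim)])

# Round trip: gen inverts infinitesimal_transformation.
U = np.array([[0.0, -2.0, 0.0],
              [2.0,  0.0, 0.0],
              [0.0,  0.0, 0.0]])
X_U = infinitesimal_transformation(U)
U_recovered = gen(X_U)
```

In this linear setting the injectivity of $U \mapsto X^M_U$ is immediate, because a matrix is determined by its values on a basis. The theorem covers the general case where no such shortcut is available.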


63.6.22 REMARK: Infinitesimal transformations by the general linear group on a linear space. 

The infinitesimal transformations in Example 63.6.23 are generated by elements of the Lie algebra of the
concrete matrix group SO(3) which acts on the concrete linear space $\mathbb{R}^3$. Definition 63.6.5 is applicable to
general Lie transformation groups, which include in particular the general linear Lie transformation groups
in Definition 63.4.17 which act on abstract finite-dimensional linear spaces. Such abstract spaces are given
a standard differentiable manifold structure in Definition 51.4.24. Therefore infinitesimal transformations
may be defined in accordance with Definition 63.6.5.

[Figure 63.6.2: Infinitesimal action by a linear Lie group on a linear space. The diagram relates $G = GL(F)$
and $T_e(GL(F))$, with coordinates in $\mathbb{R}^{m \times m}$ and $\mathbb{R}^{m \times m} \times \mathbb{R}^{m \times m}$, to $F$ and $T_p(F)$, with coordinates
in $\mathbb{R}^m$ and $\mathbb{R}^m \times \mathbb{R}^m$, via the maps $R_p$ and $(dR_p)_e$.]


Figure 63.6.2 illustrates the spaces and maps for infinitesimal action by a general linear Lie group on a linear
space $F$ with $\dim(F) = m \in \mathbb{Z}^+_0$.

The infinitesimal transformations in Example 63.6.23 exploit the fact that the points of the spaces are
themselves coordinates. So there is no need to apply charts, and there is no obvious need to provide base
points for all vectors as there is for abstract differentiable manifolds. The differentiable manifold formalism
where each tangent vector $t_{p,v,\psi}$ must have a base point $p$ and a chart $\psi$ has the consequence that various
algebraic properties, such as linearity, are obscured.


63.6.23 EXAMPLE: Infinitesimal transformations of $S^2$ by SO(3).
Two infinitesimal transformations of $S^2$ by SO(3) are illustrated in Figure 63.6.3.


[Figure 63.6.3: Vector fields $X^M_U$ induced on $S^2$ by Lie algebra elements $U \in T_e(SO(3))$.
Left panel: vector field induced by
$U = \begin{bmatrix} 0 & -a & 0 \\ a & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.
Right panel: vector field induced by
$U = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -a \\ 0 & a & 0 \end{bmatrix}$.]


The vectors on $S^2$ in Figure 63.6.3 are shown as embedded in the ambient space $\mathbb{R}^3$ to emphasise that they
are vectors, not short curves. Of course, it is impossible to illustrate truly infinitesimal translations of points
because tangent vectors do not inhabit the same manifold as the base-space points. (See Chapter 53 for the
"true nature" of tangent vectors.)


The group SO(3) is a set of linear automorphisms of the Euclidean space $\mathbb{R}^3$ whose matrices with respect to
the standard orthonormal basis are compositions of the three single-parameter families of rotation matrices
$R_\ell$ which are indicated in equations (76.8.2) in Remark 76.8.3. These families are generated by the Lie algebra
elements which are coordinatised by the three matrices in line (76.8.1). The vector fields in Figure 63.6.3
are induced on $S^2$ by the maps $p \mapsto (dR_p)_e(U)$ for the matrices $U$ as indicated.

If $p = (x_i)_{i=1}^3 \in M = \mathbb{R}^3$ is acted on by a group element $g = R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1) \in SO(3) \subseteq M_{3,3}(\mathbb{R})$ as
indicated in line (76.8.3) in Remark 76.8.6, the result is as follows.

$$L_g p = R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1)\,p =
\begin{bmatrix}
c_2c_3 & s_1s_2c_3 - c_1s_3 & c_1s_2c_3 + s_1s_3 \\
c_2s_3 & s_1s_2s_3 + c_1c_3 & c_1s_2s_3 - s_1c_3 \\
-s_2 & s_1c_2 & c_1c_2
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad (63.6.3)$$

where $s_\ell = \sin\alpha_\ell$ and $c_\ell = \cos\alpha_\ell$ for $\ell = 1, 2, 3$. (The notation for the three rotation matrix families $R_\ell$
for $\ell \in \mathbb{N}_3$ should not be confused with the right action map $R_p : G \to M$ for $p \in M = S^2 \subseteq \mathbb{R}^3$.) The
left action notation “$L_g$” for $g \in G$ is adopted here for contrast with the corresponding right action $R_p$
for $p \in M$. Thus $L_g : p \mapsto gp = R_p(g)$ for $g \in G$ and $p \in M$. Then to concretely compute $(dR_p)_e(U)$ for
$U \in T_e(G)$, one must differentiate the matrix product on line (63.6.3) with respect to $g$ in a neighbourhood
of $e$ in the direction of each choice of Lie algebra element $U$.

For the Lie algebra element $U = [-a\,\epsilon_{3ij}]_{i,j=1}^3$ which is illustrated on the left in Figure 63.6.3, one obtains
$(dR_p)_e(U) = \partial_t (R_3(at)p)\big|_{t=0} = (-ax_2, ax_1, 0) \in T_p(S^2)$ for $p = (x_1, x_2, x_3)$. (See Definition 14.8.34 for the
Levi-Civita alternating symbol $\epsilon$.) For example, with $p = (1,0,0)$, one obtains $(dR_p)_e(U) = (0,a,0)$, and with
$p = (0,-1,0)$, one obtains $(dR_p)_e(U) = (a,0,0)$. These are consistent with the anti-clockwise horizontal
vector field as indicated for some positive value of $a$.


For the Lie algebra element $U = [-a\,\epsilon_{1ij}]_{i,j=1}^3$ which is illustrated on the right in Figure 63.6.3, one obtains
$(dR_p)_e(U) = \partial_t (R_1(at)p)\big|_{t=0} = (0, -ax_3, ax_2) \in T_p(S^2)$ for $p = (x_1, x_2, x_3)$. For example, with $p = (1,0,0)$,
one obtains $(dR_p)_e(U) = (0,0,0)$, and with $p = (0,-1,0)$, one obtains $(dR_p)_e(U) = (0,0,-a)$. These are
consistent with the anti-clockwise vector field around the $x_1$ axis as indicated for some $a \in \mathbb{R}^+$.
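The sample values in this example can be checked numerically. The following is a minimal sketch, assuming that tangent vectors on $S^2$ are represented by their components in the ambient space $\mathbb{R}^3$, so that the induced vector at $p$ is the matrix product $Up$. The helper names `levi_civita`, `generator` and `induced_vector` are hypothetical.

```python
import numpy as np

def levi_civita():
    """The alternating symbol eps[i, j, k] with 0-based indices."""
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1.0   # even permutations
        eps[i, k, j] = -1.0  # odd permutations
    return eps

def generator(axis, a):
    """Matrix with entries U[i][j] = -a * eps[axis][i][j].
    axis = 2 gives the left-hand matrix of Figure 63.6.3,
    axis = 0 gives the right-hand matrix."""
    return -a * levi_civita()[axis]

def induced_vector(U, p):
    """Ambient-space components of the vector induced at p by U."""
    return U @ np.asarray(p, dtype=float)

a = 0.5
U3 = generator(2, a)  # rotation about the x_3 axis
U1 = generator(0, a)  # rotation about the x_1 axis
v_left_1 = induced_vector(U3, (1, 0, 0))
v_left_2 = induced_vector(U3, (0, -1, 0))
v_right_1 = induced_vector(U1, (1, 0, 0))
v_right_2 = induced_vector(U1, (0, -1, 0))
```

Each induced vector is orthogonal to its base point because $U$ is antisymmetric, which is why these ambient vectors are tangent to the sphere.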


63.6.24 REMARK: Linearity of general linear group infinitesimal group actions. 

Theorem 63.6.25 is effectively a restatement of Theorem 58.4.18 in the language of Lie transformation groups. 
Infinitesimal transformations are not, strictly speaking, linear themselves because they take values in the 
tangent bundle of the linear space, and the tangent bundle is not a linear space. However, the drop function
“throws away” the base points of tangent vectors, and the resulting vertical component vectors do depend
linearly on the base point, as would be expected from basic calculus.


63.6.25 THEOREM: Linearity of vertical drop of general linear group infinitesimal actions. 
Let $(G, V)$ be the general linear Lie transformation group of a finite-dimensional real linear space $V$ as in
Definition 63.4.17. (In other words, $G = GL(V)$ as in Notation 63.4.18.) Then

$$\forall U \in T_e(G),\quad \varpi^V \circ X^V_U \in \mathrm{Lin}(V, V),$$

where $\varpi^V : T(V) \to V$ is the vertical drop function for $V$ as in Definition 54.9.5, and $X^V$ is the infinitesimal
transformation map as in Definition 63.6.5.


PROOF: The assertion follows from Theorem 58.4.18. 


63.6.26 REMARK: Difficulty of obtaining vector fields on a group from vector fields on the passive set. 

In principle, one could try to invert the construction in Definition 63.6.5 to obtain a vector field on the
group $G$. This does not seem to be useful, however. For each $p \in S^2$ and $W \in T_p(S^2)$, one may attempt
to construct a vector field from the inverse of the map $(dR_p)_g$ at each $g \in G$, such as $(dR_p)_g^{-1}(W)$ or $\ker((dR_p)_g)$.
The problem with $(dR_p)_g^{-1}(W)$ is the fact that the linear map $(dR_p)_g$ is not generally injective. Therefore
$(dR_p)_g^{-1}(\{W\})$ would be some sort of hyperplane in $T_g(G)$. The subspace $\ker((dR_p)_g)$ of $T_g(G)$ could possibly
hold some interest, but it is difficult to see any immediate application.


This situation contrasts with the situation where $\sigma_G : G \times G \to G$ yields useful left and right invariant vector
fields on Lie groups $G$.


63.6.27 REMARK: Generation of vector fields by varying general group elements. 

The vector field map $X^M : T_e(G) \to X(T(M))$ in Definition 63.6.5 is generated, informally speaking, by varying
a group element near the identity $e \in G$ and computing the vector field which results at each point of a
manifold $M$. As mentioned in Remark 63.6.2, $X^M_U(p) = (dR_p)_e(U)$ is the same as $(d\mu(\cdot, p))_e(U)$. Sometimes
a vector field is generated by substituting a general group element $g \in G$ for $e$. Then it would be interesting
(and useful) to know whether the vector field generated by varying $g$ is equivalent to a field generated by
varying $e$. And if so, one would like to determine a conversion formula between the fields $p \mapsto (dR_p)_g(W)$
for $W \in T_g(G)$ and the fields $p \mapsto (dR_p)_e(U)$ for $U \in T_e(G)$. (By Theorem 63.6.29 (iii), $U = (dR_g)_e^{-1}(W)$.)


Technically speaking, there is no need to devise a new notation for infinitesimal transformation maps
generated by vectors $W \in T_g(G)$ for $g \ne e$. One may write $X^M_W$ for the map $p \mapsto (R_p)_*(W) = \partial_W R_p$ for
any $W \in T(G)$. However, the term “infinitesimal transformation map” strongly implies an infinitesimal
transformation by a group element which is very close to the identity $e$. So it is better to give this concept
a new name and a new notation.


More importantly, however, the map $p \mapsto (dR_p)_g(W)$ is not a vector field on $M$ if $g \ne e$ because it maps
each $p \in M$ to a vector in $T_{gp}(M)$. This can be easily remedied by using the map $p \mapsto (dR_{g^{-1}p})_g(W)$
instead. Then $(dR_{g^{-1}p})_g(W) \in T_p(M)$. This gives an even stronger reason to use a new name and notation.
(However, technically speaking, one could define the map $p \mapsto (R_{\pi_G(W)^{-1}p})_*(W)$, which uses the tangent
bundle projection map $\pi_G : T(G) \to G$ to automatically insert the “correction factor” $\pi_G(W)^{-1}$ before $p$.
Then the choice $g = e$ would make this factor disappear, thereby making a unified notation possible.)


63.6.28 DEFINITION: The (off-centre) infinitesimal transformation map of a $C^1$ Lie left transformation
group $(G, M)$ at $g \in G$ is the map $X^M_g : T_g(G) \to X(T(M))$ defined by

$$\forall W \in T_g(G),\ \forall p \in M,\quad X^M_W(p) = (dR_{g^{-1}p})_g(W),$$

where $R_p : G \to M$ is the right action of $p$ on $G$.
The infinitesimal transformation by a Lie algebra element $W \in T_g(G)$ on $M$ is the vector field $X^M_W$.


Alternative name: vector field generated by a Lie algebra element $W \in T_g(G)$ on $M$.


63.6.29 THEOREM: Some basic properties of off-centre infinitesimal transformation maps. 
Let $(G, M)$ be a $C^1$ Lie left transformation group.


(i) $\forall g \in G,\ \forall W \in T_g(G),\ \forall p \in M,\ X^M_W(p) \in T_p(M)$.
(ii) $\forall g \in G,\ \forall W \in T_g(G),\ X^M_W \in X(T(M))$.
(iii) $\forall g \in G,\ \forall W \in T_g(G),\ \forall p \in M,\ X^M_W(p) = X^M_{(dR_{g^{-1}})_g(W)}(p)$.
In other words, $X^M_W = X^M_U$, where $U = (dR_g)_e^{-1}(W) = (dR_{g^{-1}})_g(W)$.


PROOF: Part (i) follows by noting that $R_{g^{-1}p}(g) = g(g^{-1}p) = p$. So $\mathrm{Range}((dR_{g^{-1}p})_g) \subseteq T_p(M)$.
Part (ii) follows from part (i), Notation 57.1.5 and Definition 57.1.2.

For part (iii), note that $(dR_{g^{-1}p})_g = (dR_p)_e \circ (dR_{g^{-1}})_g$ because $R_{g^{-1}p} = R_p \circ R_{g^{-1}}$. Let $W \in T_g(G)$. Then
$(dR_{g^{-1}p})_g(W) = (dR_p)_e((dR_{g^{-1}})_g(W)) = (dR_p)_e(U)$, where $U = (dR_{g^{-1}})_g(W)$. But $(dR_p)_e(U) = X^M_U(p)$ by
Definition 63.6.5. Thus $X^M_W(p) = X^M_U(p)$ with $U = (dR_{g^{-1}})_g(W) = (dR_g)_e^{-1}(W)$ by Theorem 62.6.7 (iii).
Hence $U = (dR_g)_e^{-1}(W)$.


63.6.30 REMARK: Simultaneous variation of a group element and manifold element. 

It sometimes happens that a group element and the object which it acts on both vary with respect to an 
external variable. Thus one may wish to know the differential of a map $z \mapsto g(z)p(z)$, where both $g(z) \in G$
and $p(z) \in M$ depend on $z$. This computation is given in Theorem 63.6.31.


A particular application for this kind of computation occurs when a fibre chart is acted on by a fibre chart 
transition map, which is a variable element of the structure group which acts on the fibre space. (See the 
proof of Theorem 64.8.10 for an application.) 


63.6.31 THEOREM: Partial differentials of group action with respect to group element and passive point. 
Let $(G, M)$ be a $C^1$ Lie left transformation group. Let $E$ be a $C^1$ manifold. Let $g : E \to G$ and $p : E \to M$
be $C^1$ maps. Define $q : E \to M$ by $q(z) = g(z)p(z)$ for all $z \in E$. Then

$$\forall z \in E,\quad (dq)_z = (dR_{p(z)})_{g(z)} \circ (dg)_z + (dL_{g(z)})_{p(z)} \circ (dp)_z. \qquad (63.6.4)$$

Hence

$$\begin{aligned}
\forall z \in E,\ \forall w \in T_z(E),\quad (dq)_z(w) &= X^M_{(dg)_z(w)}(q(z)) + (dL_{g(z)})_{p(z)}((dp)_z(w)) \\
&= X^M_{u(w)}(q(z)) + (dL_{g(z)})_{p(z)}((dp)_z(w)), \qquad (63.6.5)
\end{aligned}$$

where $u : T(E) \to T_e(G)$ is defined by $u(w) = (dR_{g(\pi(w))})_e^{-1}(g_*(w))$ for all $w \in T(E)$, where $\pi : T(E) \to E$
is the tangent bundle projection map for $T(E)$.

PROOF: Let $\mu : G \times M \to M$ be the action map of $(G, M)$. Then $\forall z \in E,\ q(z) = \mu(g(z), p(z))$. Thus
$q = \mu \circ (g \times p)$. So by Theorem 58.7.13 line (58.7.14),

$$\begin{aligned}
\forall z \in E,\quad (dq)_z &= (d\mu(\cdot, p(z)))_{g(z)} \circ (dg)_z + (d\mu(g(z), \cdot))_{p(z)} \circ (dp)_z \\
&= (dR_{p(z)})_{g(z)} \circ (dg)_z + (dL_{g(z)})_{p(z)} \circ (dp)_z,
\end{aligned}$$

which verifies line (63.6.4). Then

$$\begin{aligned}
\forall z \in E,\ \forall w \in T_z(E),\quad (dR_{p(z)})_{g(z)}((dg)_z(w)) &= (dR_{g(z)^{-1}q(z)})_{g(z)}((dg)_z(w)) \\
&= X^M_{(dg)_z(w)}(q(z)) \\
&= X^M_{u(w)}(q(z))
\end{aligned}$$

by Definition 63.6.28 and Theorem 63.6.29 (iii), where $u(w) = (dR_{g(z)})_e^{-1}((dg)_z(w))$. In other words, $u(w) =
(dR_{g(\pi(w))})_e^{-1}(g_*(w))$ for all $w \in T(E)$. This verifies line (63.6.5).
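The formula for the differential of $z \mapsto g(z)p(z)$ can be checked by finite differences in a concrete case. The following is a minimal sketch, assuming $E = \mathbb{R}$, $G = SO(3)$ acting on $M = \mathbb{R}^3$ by matrix multiplication, and tangent vectors represented by their ambient components. With these assumptions the first term becomes $u\,q(z)$ with $u = g'(z)g(z)^{-1}$, and the second term becomes $g(z)p'(z)$. The curves `rot3` and `p_curve` are arbitrary illustrative choices.

```python
import numpy as np

def rot3(t):
    """Rotation by angle t about the x_3 axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def drot3(t):
    """Derivative of rot3 with respect to t."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[-s, -c, 0.0], [c, -s, 0.0], [0.0, 0.0, 0.0]])

def p_curve(z):
    """An arbitrary smooth curve in R^3."""
    return np.array([1.0, z, z * z])

def dp_curve(z):
    """Derivative of p_curve with respect to z."""
    return np.array([0.0, 1.0, 2.0 * z])

def dq_formula(z):
    """Derivative of q(z) = g(z) p(z) as the sum of two terms:
    u q(z) with u = g'(z) g(z)^{-1}, plus g(z) p'(z)."""
    g, dg = rot3(z), drot3(z)
    u = dg @ g.T             # g^{-1} equals g^T for a rotation matrix
    q = g @ p_curve(z)
    return u @ q + g @ dp_curve(z)

def dq_numeric(z, h=1e-6):
    """Central finite difference approximation of q'(z)."""
    q = lambda s: rot3(s) @ p_curve(s)
    return (q(z + h) - q(z - h)) / (2.0 * h)

z0 = 0.7
agreement = np.allclose(dq_formula(z0), dq_numeric(z0), atol=1e-6)
```

The matrix `u` computed inside `dq_formula` is antisymmetric, which is consistent with `u` being an element of the Lie algebra of SO(3).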


63.7. Infinitesimal right transformations 


63.7.1 REMARK: Left invariant vector fields are infinitesimal right actions, and vice versa. 

The form $(dR_p)_e$ for infinitesimal transformations in Definition 63.6.5 may seem a little surprising, but in
the case $M = G$, so that $G$ acts on $G$ by left translation, Theorem 62.7.4 shows that a right-invariant vector
field $X$ on $G$ satisfies $X(p) = (dR_p)_e(X(e))$ for all $p \in G$, which matches perfectly with Definition 63.6.5.
Therefore an infinitesimal transformation is a generalisation of right-invariant vector fields from Lie left 
transformation groups of the special form (G, G) to general Lie left transformation groups (G, M). 


It may seem odd that infinitesimal transformations of a left transformation group are so closely related to 
right invariant vector fields. The reason for this is that left and right actions of groups commute, as stated 
in Theorem 62.7.11. So an infinitesimal left action by a group is invariant under right actions of the group. 
So an infinitesimal left action is a right invariant vector field when the G-manifold M is the group G itself. 


Similarly, in the case of the Lie right transformation group $(G, G)$ -< $(G, G, \sigma_G, \sigma_G)$ (same specification
tuple as for the left transformation group but with a different object class), infinitesimal transformations are
infinitesimal right actions, and these are left invariant vector fields on $G$. In the case of general Lie right
transformation groups $(G, M)$, infinitesimal transformations are of the form $p \mapsto (dL_p)_e(U)$ for $U \in T_e(G)$.
Summarising this, one may say that on a Lie group, left invariant vector fields are infinitesimal right actions,
and right invariant vector fields are infinitesimal left actions. (See also Remarks 62.4.12, 62.7.1 and 63.6.2
for interpretation of left invariant vector fields as infinitesimal group actions.)

The abstract infinitesimal transformation concept in Definition 63.7.3 is applied in Definition 66.5.2 to the
right action map for differentiable principal fibre bundles in Definition 66.2.2.


63.7.2 DEFINITION: The left action on a Lie right transformation group $(G, M, \sigma_G, \mu)$ by a point $p \in M$
is the map $L_p : G \to M$ defined by $L_p : g \mapsto pg = \mu(p, g)$.


63.7.3 DEFINITION: The infinitesimal (right) transformation action of a $C^1$ Lie right transformation group
$(G, M)$ is the map $X^M : T_e(G) \to X(T(M))$ defined by

$$\forall U \in T_e(G),\ \forall p \in M,\quad X^M_U(p) = (dL_p)_e(U),$$

where $L_p : G \to M$ is the left action of $p$ on $G$.
The infinitesimal (right) transformation by a Lie algebra element $U \in T_e(G)$ on $M$ is the vector field $X^M_U$.
Alternative name: vector field generated by a Lie algebra element $U \in T_e(G)$ on $M$.


63.7.4 REMARK: An infinitesimal transformation may be called a “fundamental vector field”. 

As mentioned in Remark 63.6.6, an infinitesimal transformation of a Lie left transformation group may be 
called a “fundamental vector field”, with or without a negative sign. The same is even more true of Lie right
transformation groups. This is because the right action of the structure group on a differentiable principal
bundle is a Lie right transformation group. (See Sections 66.4, 66.5 and 66.6.) The fundamental vector
field in Section 66.6 is an infinitesimal transformation of this kind. This vector field is related to connection
forms.


The two different names reflect two different ways of thinking about the same thing. The “vector field” name 
emphasises the structure on the passive set. The “infinitesimal transformation” name emphasises the strong 
relation to the group action. As shown for Lie left transformation groups in Theorem 63.6.17, infinitesimal 
transformations are in one-to-one correspondence with the Lie algebra elements which generate them. So 
one may almost identify these vector fields with Lie algebra elements. 


63.7.5 REMARK: Well-definition of infinitesimal transformations of Lie right transformation groups. 

The well-definition of the infinitesimal transformations in Definition 63.7.3 follows in the same way as for
Lie left transformation groups in Remark 63.6.10. Since the left action $L_p$ on $G$ by each element $p \in M$ is a
$C^1$ map, the differential $dL_p$ is well-defined. So $X^M_U(p) \in T_p(M)$ is well defined for all $p \in M$.


63.7.6 THEOREM: Infinitesimal transformations by $C^{k+1}$ transformation groups are $C^k$ vector fields.
Let $(G, M)$ be a $C^{k+1}$ Lie right transformation group. Let $X^M$ be the infinitesimal transformation action
of $(G, M)$. Then $X^M_U \in X^k(T(M))$ for all $U \in T_e(G)$.


PROOF: The assertion may be proved as for Theorem 63.6.11. 


63.7.7 REMARK: Infinitesimal right transformation vector fields are not in general right invariant.
Theorem 63.7.8 gives a formula for the right translation of infinitesimal transformations by a group element
in terms of its adjoint map given in Definition 62.10.2. Since the right translate of $X^M_U$ is not in general
the same as $X^M_U$, this implies that $X^M_U$ is not in general right invariant. There are some subtle differences
between the proofs of Theorems 63.6.15 and 63.7.8. In particular, it should be noted that it is the right
adjoint map $\mathrm{Adj}(g^{-1})$ which is used for Lie right transformation groups, whereas the left adjoint map $\mathrm{Adj}(g)$
is used for Lie left transformation groups.
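The right translation formula of Theorem 63.7.8 can be checked numerically in a concrete case. The following is a minimal sketch, assuming that $G$ is a group of invertible $3 \times 3$ matrices acting from the right on row vectors $p \in \mathbb{R}^3$ by $p \mapsto pg$. With this assumption $X^M_U(p) = pU$, the adjoint map is $\mathrm{Adj}(g)(U) = gUg^{-1}$, and the differential of right translation by $g$ is $v \mapsto vg$. The function names are hypothetical.

```python
import numpy as np

def adj(g, U):
    """Adjoint map Adj(g)(U) = g U g^{-1} for matrix groups."""
    return g @ U @ np.linalg.inv(g)

def x_field(U, p):
    """Infinitesimal right transformation: differential at the identity
    of the map g -> p g, applied to U, which is p U for row vectors."""
    return p @ U

def right_translate_field(g, U, p):
    """Right translate of the field X_U by g, evaluated at p:
    push X_U(p g^{-1}) forward by the linear map v -> v g."""
    g_inv = np.linalg.inv(g)
    return x_field(U, p @ g_inv) @ g

g = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])   # invertible, determinant 5
U = np.array([[0.0, -1.0, 0.5],
              [1.0,  0.0, 2.0],
              [0.0,  4.0, 1.0]])
p = np.array([1.0, -2.0, 0.5])

lhs = x_field(adj(np.linalg.inv(g), U), p)   # field of Adj(g^{-1})(U) at p
rhs = right_translate_field(g, U, p)         # right translate of X_U at p
```

In this example `lhs` differs from `x_field(U, p)`, which illustrates that the field is not right invariant when $g$ and $U$ do not commute.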


63.7.8 THEOREM: Formula for right translation of infinitesimal transformations. 
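Let $(G, M)$ be a $C^1$ right transformation group. Then

$$\forall g \in G,\ \forall U \in T_e(G),\quad \bar{R}^M_g(X^M_U) = X^M_{\mathrm{Adj}(g^{-1})(U)}.$$

In other words, $\forall g \in G,\ \forall U \in T_e(G),\ \forall p \in M,\ X^M_{\mathrm{Adj}(g^{-1})(U)}(p) = (R^M_g)_*(X^M_U(pg^{-1}))$. (See Definition 62.6.3
for $\bar{R}^M_g$, the right translation operator for fields.)


PROOF: Let $g \in G$ and $U \in T_e(G)$. Then

$$\begin{aligned}
\forall p \in M,\quad X^M_{\mathrm{Adj}(g^{-1})(U)}(p) &= (dL_p)_e(\mathrm{Adj}(g^{-1})(U)) \\
&= (dL_p)_e\big((d(R^G_g \circ L^G_{g^{-1}}))_e(U)\big) \\
&= (d(L_p \circ R^G_g \circ L^G_{g^{-1}}))_e(U) \\
&= (d(R^M_g \circ L_{pg^{-1}}))_e(U) \\
&= (dR^M_g)_{pg^{-1}}\big((dL_{pg^{-1}})_e(U)\big) \\
&= (dR^M_g)_{pg^{-1}}\big(X^M_U(pg^{-1})\big) \\
&= ((dR^M_g) \circ X^M_U \circ R^M_{g^{-1}})(p) \\
&= \bar{R}^M_g(X^M_U)(p).
\end{aligned}$$

Hence $X^M_{\mathrm{Adj}(g^{-1})(U)} = \bar{R}^M_g(X^M_U)$.


Chapter 64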


DIFFERENTIABLE FIBRE BUNDLES 


64.1 Differentiable fibrations with unspecified fibre space
64.2 Differentiable fibrations with a specified fibre space
64.3 Fibre charts for differentiable fibrations
64.9 Differentiable fibre bundle pull-back atlases
64.12 Fibre-set tangent vector embedding maps
64.13 Vertical vector fields generated by the Lie algebra
64.14 Non-vertical vector fields generated by the Lie algebra


64.0.1 REMARK: Overview of components of specification tuples for fibre bundles. 
A family tree for differentiable fibre bundles is illustrated in Figure 64.0.1. 


[Figure 64.0.1: Family tree for differentiable fibre bundles. The nodes and their specification tuples are
as follows.]

                    fibration             fibre bundle                  principal fibre bundle
non-topological     (E, π, B)             (E, π, B, A_E^F)              (P, π, B, A_P^G)
topological         (E, T_E, π, B, T_B)   (E, T_E, π, B, T_B, A_E^F)    (P, T_P, π, B, T_B, A_P^G)
differentiable      (E, A_E, π, M, A_M)   (E, A_E, π, M, A_M, A_E^F)    (P, A_P, π, M, A_M, A_P^G)

Figure 64.0.1 Family tree for differentiable fibre bundles


Whereas topological fibre bundle specification tuples include a topology for each space, E, M, F and G, 
differentiable fibre bundle tuples include a locally Cartesian atlas for each space. Differentiability of maps 
requires atlases to indicate differentiable structure. (Base spaces which are denoted “B” for the topological
fibre bundles in Chapter 47 are denoted “M” for the differentiable fibre bundles in Chapters 64-66 as a hint
that they are manifolds which have atlases.)
Non-topological fibre bundles are defined in Chapter 21. Topological fibre bundles are defined in Chapter 47.


The following overview table compares the regularity conditions for topological, differentiable and analytic
fibre bundles. (See Remark 64.0.3 and Table 64.0.3 for more detail.)


                                     topological          C^k differentiable   analytic
component         symbol             fibre bundle         fibre bundle         fibre bundle

total space       E                  topological space    C^k manifold         analytic manifold
base space        M                  topological space    C^k manifold         analytic manifold
fibre space       F                  topological space    C^k manifold         analytic manifold
structure group   G                  topological group    Lie group            Lie group
projection map    π : E → M          continuous           C^k                  analytic
fibre charts      φ : E → F          continuous           C^k                  analytic
group operation   σ : G × G → G      continuous           analytic             analytic
left action       μ : G × F → F      continuous           C^k                  analytic

An analytic fibre bundle has the same kind of specification tuple as a differentiable fibre bundle. The
only difference is the regularity class. (See Definition 64.10.1.) As in the case of differentiable groups,
where a topological manifold structure comes with a “free upgrade” to an analytic group (as mentioned in
Remark 62.2.3), the group of a $C^0$ transformation group likewise comes with a “free upgrade”. Therefore
it is possible to assume much weaker conditions than analyticity for a structure group, and then prove its 
analyticity. But the proof is very far from elementary. To avoid giving theorems without proof, it is always 
assumed explicitly here that the structure group for a differentiable fibre bundle is analytic. 


A very abstract kind of continuous pathwise parallelism may be defined on a topological fibre bundle, but the 
definition of a connection (differential parallelism) requires a differentiable fibre bundle. Just as topological 
fibre bundles (Chapter 47) are the natural structures for defining parallelism (Chapter 48), differentiable 
fibre bundles (Chapters 64-66) are the natural structures for defining connections (Chapters 67-72). 


64.0.2 REMARK: Species of differentiable fibre bundles. Fibrations and ordinary/principal fibre bundles. 
As illustrated in Figure 47.1.2, there are three species of topological fibre bundles. Differentiable fibre
bundles have the same three species: fibrations, ordinary fibre bundles, and principal fibre bundles.
Differentiable fibrations are in Sections 64.1 and 64.2, differentiable ordinary fibre bundles are in Section 64.8, and
differentiable principal fibre bundles are in Section 66.1.


Numerous definitions of non-topological, topological and differentiable fibrations, ordinary fibre bundles 
and principal fibre bundles are listed in Table 21.0.1 in Remark 21.0.5. The differentiable definitions are 
summarised here in Table 64.0.2. 


definition   specification tuple          fibre bundle class                                 short term

64.1.2       (E, A_E, π, M, A_M)          C^k fibration with intrinsic fibre space           fibration
64.2.2       (E, A_E, π, M, A_M)          C^k fibration with fibre space F                   F-fibration
64.4.4       (E, A_E, π, M, A_M, A_E^F)   C^k fibre bundle for fibre space F                 F-bundle
64.8.3       (E, A_E, π, M, A_M, A_E^F)   C^k (G, F) fibre bundle                            (G, F)-bundle
64.10.1      (E, A_E, π, M, A_M, A_E^F)   analytic (G, F) fibre bundle                       (G, F)-bundle
66.1.2       (P, A_P, π, M, A_M, A_P^G)   C^k principal fibre bundle with structure group G  principal G-bundle

Table 64.0.2 Summary of differentiable fibre bundle definitions


Both fibrations and ordinary fibre bundles may be thought of as generalisations of tangent bundles. A 
fibration specifies how a total space is partitioned into fibre sets which are associated with individual points 
of a base space. A fibration has very limited geometric significance. It is the structure group which adds 
geometry to fibrations, in the same way that the Euclidean group of rotations, reflections and translations 
adds geometry to two and three dimensional sets of points. (The so-called “structure group” is in fact a
transformation group for a fibre space, not just a pure group. In the case of a principal bundle, the structure
group acts on both itself and on the total space. So it is also a transformation group.)


It is questionable whether a fibration earns the title “fibre bundle” by merely adding a fibre atlas, or
whether a structure group must be specified. Without a structure group, the fibre charts in the fibre
atlas are constrained only by the analytical requirement to be $C^k$ differentiable for some $k \in \mathbb{Z}^+_0$. The
(transformation) group of all $C^k$ automorphisms of a $C^k$ manifold has very little geometric significance.
Most importantly, it is difficult to define a Lie algebra for such a large group. A Lie algebra for the structure 
group is required for definitions of connections. (See Sections 63.1, 63.2 and 63.3 for some indications of how 
a “Lie algebra” might be defined for such infinite-dimensional groups. No attempt is made here to define 
connections for infinite-dimensional structure groups.) 


Differentiable fibre bundles are presented here as a gradual build-up of structure from fairly minimal fibrations 
to fully structured fibre bundles. This helps to clarify the roles of various aspects of the full definitions. 


64.0.3 REMARK: Overview of differentiability of maps and spaces of fibre bundles. 
Table 64.0.3 summarises regularity classes of maps and spaces of various classes of fibre bundles. 


          fibre bundle class                                  E     π     M     φ     G     F     σ     μ

non-topological
20.10.8   non-topological baseless figure/frame bundle        set   -     -     fn    gp    set   fn    fn
21.1.2    non-uniform non-topological fibration               set   fn    set
21.2.1    uniform non-topological fibration                   set   fn    set   fn
21.2.10   non-topological fibration with fibre space          set   fn    set   fn    -     set   -     -
21.7.5    non-topological fibre bundle                        set   fn    set   fn    -     set   -     -
21.8.3    non-topological ordinary fibre bundle               set   fn    set   fn    gp    set   fn    fn
21.9.4    non-topological principal fibre bundle              set   fn    set   fn    gp    set   fn    fn

topological
47.2.2    topological fibration                               top   cts   top   cts
47.3.6    topological fibration with fibre space              top   cts   top   cts   -     top   -     -
47.5.5    topological fibre bundle                            top   cts   top   cts   -     top   -     -
47.6.5    topological ordinary fibre bundle                   top   cts   top   cts   top   top   cts   cts
47.8.3    topological principal fibre bundle                  top   cts   top   cts   top   top   cts   cts
47.13.2   topological fibre/frame bundle                      top   cts   top   cts   top   top   cts   cts

differentiable
64.1.2    C^k fibration                                       C^k   C^k   C^k   C^k
64.2.2    C^k fibration with fibre space                      C^k   C^k   C^k   C^k   -     C^k   -     -
64.4.4    C^k F-bundle                                        C^k   C^k   C^k   C^k   -     C^k   -     -
64.8.3    C^k ordinary fibre bundle                           C^k   C^k   C^k   C^k   A     C^k   A     C^k
64.10.1   analytic ordinary fibre bundle                      A     A     A     A     A     A     A     A
65.1.3    C^k vector bundle                                   C^k   C^k   C^k   C^k   A     A     A     A
66.1.2    C^k principal fibre bundle                          C^k   C^k   C^k   C^k   A     A     A     A

Table 64.0.3 Summary of regularity classes of fibre bundles


The abbreviations in the table are: “fn” = function, “gp” = group, “top” = topological space, “cts” = continuous,
“A” = real-analytic, and “C^k” means a $C^k$ manifold or $C^k$ differentiable function between $C^k$ manifolds.


64.0.4 REMARK: Fibre bundles are a natural framework for defining distributed physical states. 

Fibre bundles provide an effective and natural framework within which distributed physical states such as 
fields can be defined. The fibre set at each base-space point of a fibre bundle is a container for the state of 
some distributed object at that point. The global uniformity of the fibre sets at base-space points follows 
from the assumed uniformity of the basic laws of physics. The charts of a fibre bundle are models for observer 
reference-frames. (See Section 20.10 for further comments on the significance of fibre bundles.) 


The differential geometry in this book is presented principally from the fibre bundle perspective. Both tensor 
calculus and Koszul-style vector field calculus lack geometric clarity. They are also difficult to extend from
affine connections on tangent bundles to general connections on fibre bundles. The fibre bundle framework
is suitable for the widest range of applications, despite its lack of user-friendliness.


64.1. Differentiable fibrations with unspecified fibre space 


64.1.1 REMARK:  Differentiable fibrations have atlases, whereas topological fibrations have topologies. 

Topological fibrations with an intrinsic fibre space are defined in Section 47.2. A topological fibration
specifies how the fibre sets at each point of the base space are “glued together” continuously. In the case of
differentiable fibrations, the fibre sets are glued together differentiably. Definition 64.1.2 is a differentiable
version of the topological fibration with an intrinsic fibre space in Definition 47.2.2. The topologies $T_E$ and
$T_M$ are replaced with manifold atlases $A_E$ and $A_M$.


64.1.2 DEFINITION: A $C^k$ (differentiable) fibration with an intrinsic fibre space, for $k \in \mathbb{Z}^+_0$, is a tuple
$(E, \pi, M)$ -< $(E, A_E, \pi, M, A_M)$ which satisfies the following conditions.


(i) $E$ -< $(E, A_E)$ and $M$ -< $(M, A_M)$ are $C^k$ manifolds. [differentiable manifold structure]
(ii) $\pi : E \to M$ is a $C^k$ differentiable surjective map. [projection map differentiability]
(iii) $\forall p \in M,\ \exists U \in \mathrm{Top}_p(M),\ \exists \phi : \pi^{-1}(U) \to \pi^{-1}(\{p\})$,
      $\pi \times \phi : \pi^{-1}(U) \to U \times \pi^{-1}(\{p\})$ is a $C^k$ diffeomorphism. [local triviality]
(iv) $\forall p_1, p_2 \in M$, $\pi^{-1}(\{p_1\})$ is $C^k$ diffeomorphic to $\pi^{-1}(\{p_2\})$. [global fibre-set uniformity]


$E$ is the total space of $(E, \pi, M)$.

$\pi$ is the projection map of $(E, \pi, M)$.

$M$ is the base space of $(E, \pi, M)$.

$\pi^{-1}(\{p\})$ is the fibre set of $(E, \pi, M)$ at $p \in M$.

$\pi \times \phi$ is the local trivialisation of $(E, \pi, M)$ by the chart $\phi \in A_E^F$.


64.1.3 REMARK: Manifold charts for a differentiable fibration with an intrinsic fibre space. 
Definition 64.1.2 is illustrated in Figure 64.1.1. It is assumed that $n = \dim(M)$ and $m = \dim(E)$.


[Figure 64.1.1: Differentiable fibration with an intrinsic fibre space]


Figure 64.1.1 is the same as Figure 47.2.1 except that charts have been added. In this case, there are charts
$\psi_E \in A_E$ and $\psi_M \in A_M$. The manifold charts have no specified association with the intrinsic fibre charts $\phi$.


64.1.4 REMARK: The difference between topological fibrations and $C^0$ differentiable fibrations.

The parameter $k$ of the differentiability class $C^k$ in Definition 64.1.2 is permitted to be zero, which means
that the fibration is only continuous, not differentiable. This is not exactly the same as a topological fibration
because the topologies on $E$ and $M$ in Definition 64.1.2 are specified by $C^0$ manifold atlases when $k = 0$,
not open-neighbourhood topologies. The word “manifold” here means the finite-dimensional real Cartesian
style of manifold. A $C^0$ fibration is thus essentially equivalent to a topological fibration whose component
spaces are manifolds rather than completely arbitrary topological spaces.


64.1.5 REMARK: The local triviality conditions for differentiable fibrations.

Condition (iii) in Definition 64.1.2 is known as the “local triviality” condition, which means that every point 
z in the total space has a neighbourhood 7~1(U) which is C*-diffeomorphic to a Cartesian differentiable 
manifold product of U € Top, (M) with 7~*({m(z)}). 


The local “triviality” refers in particular to combinatorial topology. In other words, although the set $\pi^{-1}(U)$
might not be connected, and might not be simply connected, and various tools of algebraic topology might
show something non-trivial about the topology of $\pi^{-1}(U)$, all of this non-triviality comes from the topological
attributes of $U$ and the fibre set $\pi^{-1}(\{p\})$. Both of these sets could have arbitrarily non-trivial topological
structure, but the set $\pi^{-1}(U)$ is formed from their direct product, which can obviously be decomposed back into
the components of the product. The local triviality applies also to differentiable structure.


Condition (iv) ensures that even if the base space $M$ is topologically disconnected, the fibre sets at all points
of $M$ are equivalent. So any fibre set $\pi^{-1}(\{p\})$ may be adopted as “the” fibre set. This “homogeneity condition”
is redundant if $M$ is connected.
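The role of condition (iv) may be seen in a small example which is not taken from the text. The sketch below assumes a base space with two connected components; the names $M_-$ and $M_+$ are introduced here purely for illustration.

```latex
% Illustrative example (an assumption, not from the text): conditions (i)-(iii)
% of Definition 64.1.2 hold, but condition (iv) fails on a disconnected base.
\begin{align*}
  M &= M_- \cup M_+, \qquad M_- = (-\infty, 0), \quad M_+ = (0, \infty), \\
  E &= (M_- \times \mathbb{R}) \cup (M_+ \times S^1), \qquad \pi(x, y) = x .
\end{align*}
% Local triviality: for p in M_-, take U = M_- and phi(x, y) = (p, y).
% Then pi x phi maps pi^{-1}(U) onto U x pi^{-1}({p}) diffeomorphically.
% The same construction works for p in M_+ with U = M_+.
\begin{align*}
  \pi^{-1}(\{-1\}) = \{-1\} \times \mathbb{R}, \qquad
  \pi^{-1}(\{1\}) = \{1\} \times S^1 .
\end{align*}
% The first fibre set is non-compact and the second is compact. So they are
% not homeomorphic, and hence not C^k diffeomorphic for any k.
```

In this sketch, the fibre sets are uniform on each connected component of $M$, but no single fibre set can serve as “the” fibre set for the whole of $M$.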


64.1.6 REMARK: Difficulties with differentiable fibrations with intrinsic fibre space. 

In the case of a topological fibration with intrinsic fibre space in Definition 47.2.2, the choice of topology on
a fibre set $E_p = \pi^{-1}(\{p\})$ is clear. It must be the relative topology of the total space $E$ restricted to $E_p$. The
relative topology is well defined for any subset of $E$. There is not such clarity in the case of the manifold
atlas on $E_p = \pi^{-1}(\{p\})$ in Definition 64.1.2. Restrictions of manifold charts $\psi \in \mathrm{atlas}(E)$ to $E_p$ are not
manifold charts in general because their ranges are typically not open subsets of any Cartesian space. The
definition of a submanifold is much less simple than the definition of a relative topology.


Definition 52.3.7 indicates what is required for a fibre set $E_p$ to be a well-defined submanifold of the total
space $E$. A fibre set can be shown to be a well-defined $C^k$ submanifold by Theorem 52.7.5 (viii) if it is known
that $\pi^{-1}(U)$ is $C^k$ diffeomorphic to a product of two $C^k$ manifolds as in Definition 64.1.2 (iii). But one of
these two manifolds is the fibre set itself. This is a cyclic argument.


Thus one must interpret Definition 64.1.2 to mean that all of the fibre sets are $C^k$ submanifold sets as in
Definition 52.3.7, and that they can therefore be given a $C^k$ manifold atlas. (A construction for this is given
in Theorem 52.4.11.) Then the rest of the definition is meaningful and (apparently) self-consistent. However,
since a manifold atlas for each fibre set is not required to be supplied, there is some uncertainty as to its
properties. This uncertainty is removed by the explicit specification of a fibre space in Definition 64.2.2.


64.2. Differentiable fibrations with a specified fibre space 


64.2.1 REMARK: Explicit specification of the fibre space for a differentiable fibration.

Since the fibre sets of the differentiable fibration in Definition 64.1.2 are uniform, any $C^k$ manifold $F$ which
is $C^k$-diffeomorphic to one fibre set is $C^k$-diffeomorphic to them all. Since intrinsic fibre sets are difficult
to manage, Definition 64.2.2 introduces an extrinsic fibre space $F$, which in practice is chosen to be a space
with which one has great familiarity. (The homogeneity condition (iv) in Definition 64.1.2 is not required
for Definition 64.2.2 because the uniformity of the fibre sets $\pi^{-1}(\{p\})$ is guaranteed by condition (iii) in
Definition 64.2.2.)


The fibre charts in Definition 64.2.2 are not limited to the elements of a specified fibre atlas. When a fibre
atlas is specified, as in Definition 64.8.3, such relatively unlimited fibre charts are referred to as $C^k$ compatible
fibre charts to distinguish them from the fibre charts in the specified fibre atlas.


The fibre space $F$ is not part of the specification tuple for a $C^k$ $F$-fibration. For any $C^k$ diffeomorphic
manifolds $F_1$ and $F_2$, $E$ is a $C^k$ $F_1$-fibration if and only if $E$ is a $C^k$ $F_2$-fibration. The fibre space is merely
a property of the fibration. Therefore strictly speaking, a fibre chart should be called an “$F$-fibre chart” to
remove ambiguity. For fibre bundles, on the other hand, the fibre space is uniquely determined by the fibre
charts in the fibre atlas, which all use the same fibre space for their range spaces.


The differentiability class $C^k$ is likewise merely a property of a fibration. So in fact, a fibre chart should be
referred to as a “$C^k$ $F$-fibre chart” to make the class explicit. This is done in Definition 64.3.2.


64.2.2 DEFINITION: A $C^k$ (differentiable) fibration with fibre space $F$, for a $C^k$ manifold $F < (F, A_F)$, for
$k \in \mathbb{Z}_0^+$, is a tuple $E < (E, \pi, M) < (E, A_E, \pi, M, A_M)$ such that

(i) $E < (E, A_E)$ and $M < (M, A_M)$ are $C^k$ manifolds,
(ii) $\pi : E \to M$ is a surjective $C^k$ map,
(iii) $\forall p \in M$, $\exists U \in \mathrm{Top}_p(M)$, $\exists \phi : \pi^{-1}(U) \to F$, $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a $C^k$ diffeomorphism.

Alternative name: $C^k$ (differentiable) $F$-fibration.


A fibre chart for a $C^k$ $F$-fibration $E < (E, \pi, M)$ is any map $\phi : \pi^{-1}(U) \to F$, for any $U \in \mathrm{Top}(M)$, for
which $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a $C^k$ diffeomorphism.
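A simple instance of Definition 64.2.2 may help to fix ideas. The following sketch is an illustration only, not taken from the text. It assumes the product fibration over the real line, and the chart names $\phi_1$ and $\phi_2$ are hypothetical. It shows that even for a trivial product, the fibre charts are far from unique.

```latex
% Illustrative example (an assumption, not from the text).
% Product fibration over M = R with fibre space F = R.
\begin{align*}
  E = \mathbb{R}^2, \qquad M = \mathbb{R}, \qquad F = \mathbb{R}, \qquad
  \pi(x, y) = x .
\end{align*}
% Two fibre charts with U = M, both satisfying condition (iii) for every k:
\begin{align*}
  \phi_1(x, y) &= y,       & (\pi \times \phi_1)(x, y) &= (x, y), \\
  \phi_2(x, y) &= y - x^3, & (\pi \times \phi_2)(x, y) &= (x, y - x^3) .
\end{align*}
% Each map pi x phi_l is a smooth bijection of R^2 with a smooth inverse:
\begin{align*}
  (\pi \times \phi_1)^{-1}(p, q) = (p, q), \qquad
  (\pi \times \phi_2)^{-1}(p, q) = (p, q + p^3) .
\end{align*}
```

Both $\phi_1$ and $\phi_2$ are fibre charts for the same fibration in the sense of Definition 64.2.2, because the definition only requires the existence of suitable charts and does not single out any of them.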


64.2.3 REMARK: Manifold charts for a differentiable fibration with an extrinsic fibre space. 
Definition 64.2.2 is illustrated in Figure 64.2.1. 


Figure 64.2.1 Differentiable fibration with fibre space F


It is assumed in Figure 64.2.1 that $n = \dim(M)$, $m = \dim(E)$ and $q = \dim(F)$. Then $m = n + q$. The
domains and definitions of the manifold charts $\psi_M \in A_M$, $\psi_E \in A_E$ and $\psi_F \in A_F$ are independent of each
other and also of the domains and definitions of the fibre charts $\phi : \pi^{-1}(U) \to F$. To reduce the chances of
confusing fibre charts with manifold charts, different mnemonic letters $\phi$ and $\psi$ are used. The letter $\phi$ sounds
like the first two letters of “fibre”, whereas the letter $\psi$ (“psi”) suggests the last two letters of “maps”.


64.2.4 REMARK:  Differentiability of vector fields on total spaces generated via fibre spaces. 

Theorems 64.2.5 and 64.2.6 are technical lemmas which are useful for showing that certain kinds of vector
fields on a fibre bundle total space are differentiable if they are generated by vectors or vector fields on the 
fibre space. (See the proof of Theorem 64.13.7 for an application.) 


64.2.5 THEOREM: Differentiability of vertical vector fields with constant vertical component.
Let $k \in \mathbb{Z}_0^+$. Let $(E, \pi, M)$ be a $C^{k+1}$ differentiable $F$-fibration. For $y \in T(F)$, define
$g^y : M \to T(M) \times T(F)$ and $h^y : E \to T(M) \times T(F)$ by

$\forall p \in M$, $g^y(p) = (0_p, y)$

$\forall z \in E$, $h^y(z) = (0_{\pi(z)}, y)$,


where $0_p = 0_{T_p(M)}$ for all $p \in M$.
(i) $\forall y \in T(F)$, $g^y \in C^k(M, T(M) \times T(F))$.
(ii) $\forall y \in T(F)$, $h^y \in C^k(E, T(M) \times T(F))$.
PROOF: Part (i) follows from Theorem 57.2.16.
Part (ii) follows from part (i) and Definition 64.1.2 (ii), and the chain rule, Theorem 52.1.17.


64.2.6 THEOREM: Differentiability of vertical vector fields with differentiable vertical component.
Let $k \in \mathbb{Z}_0^+$. Let $(E, \pi, M)$ be a $C^{k+1}$ differentiable $F$-fibration. For $\alpha \in C^k(M, T(F))$ and
$\beta \in C^k(E, T(F))$, define $g^\alpha : M \to T(M) \times T(F)$ and $h^\beta : E \to T(M) \times T(F)$ by

$\forall p \in M$, $g^\alpha(p) = (0_p, \alpha(p))$

$\forall z \in E$, $h^\beta(z) = (0_{\pi(z)}, \beta(z))$,


where $0_p = 0_{T_p(M)}$ for all $p \in M$.
(i) $\forall \alpha \in C^k(M, T(F))$, $g^\alpha \in C^k(M, T(M) \times T(F))$.
(ii) $\forall \beta \in C^k(E, T(F))$, $h^\beta \in C^k(E, T(M) \times T(F))$.
(iii) $\forall \alpha \in C^k(M, T(F))$, $h^{\alpha \circ \pi} \in C^k(E, T(M) \times T(F))$.
PROOF: For part (i), the map $p \mapsto 0_p$ is $C^k$ by Theorem 57.2.14. So $g^\alpha \in C^k(M, T(M) \times T(F))$ by
Theorem 52.6.13.

For part (ii), the map $z \mapsto 0_{\pi(z)}$ is $C^k$ because the maps $\pi : E \to M$ and $p \mapsto 0_p$ are $C^k$. Hence
$h^\beta \in C^k(E, T(M) \times T(F))$ by Theorem 52.6.13.
Part (iii) follows from part (ii) by substituting $\beta = \alpha \circ \pi$.


64.3. Fibre charts for differentiable fibrations 


64.3.1 REMARK:  Differentiable fibre charts. 

Since the differentiable fibre charts in Definition 64.3.2 are so strongly constrained by the differentiable 
manifold structure on the total space E, they do not, in principle, contribute any new information. On the 
other hand, the total space manifold structure is typically unknown apart from the fact that it is compatible 
with the structure induced via charts from the fibre space as in Definition 64.3.7. For such “abstract” total 
spaces, all of the knowledge of their structure comes from fibre charts. So these charts are effectively the 
primary part of the specification. 


64.3.2 DEFINITION: A $C^k$ (differentiable) $F$-fibre chart for a $C^k$ $F$-fibration $(E, \pi, M)$, for $k \in \mathbb{Z}_0^+$, is a
map $\phi : \pi^{-1}(U) \to F$ for some $U \in \mathrm{Top}(M)$, such that $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a $C^k$ diffeomorphism.


64.3.3 REMARK: Obtaining component differentials from the combined trivialisation differential.
Theorem 64.3.4 separates the differentials $d\pi$ and $d\phi$ of the component maps $\pi$ and $\phi$ out of the differential
$d(\pi \times \phi)$ of a $C^1$ local trivialisation $\pi \times \phi$. Usually the tangent bundle product identification map $i$ is not
shown explicitly, even though it is technically necessary. Theorem 64.3.5 says almost the same thing as
Theorem 64.3.4, but in a form which is sometimes more convenient.


64.3.4 THEOREM: Decomposition of local trivialisation differential into horizontal/vertical components.
Let $(E, \pi, M)$ be a $C^1$ fibration with fibre space $F$ and a $C^1$ fibre chart $\phi$. Then


$\forall z \in \mathrm{Dom}(\phi)$, $(d(\pi \times \phi))_z = i \circ ((d\pi)_z \times (d\phi)_z)$.


(See Definition 54.7.6 for the tangent bundle product identification map $i : T(M) \times T(F) \to T(M \times F)$.)


PROOF: The assertion follows from Definition 64.3.2 and Theorem 58.7.7.


64.3.5 THEOREM: Separation of local trivialisation differentials into horizontal and vertical components.
Let $(E, \pi, M)$ be a $C^1$ fibration with fibre space $F$ and a $C^1$ fibre chart $\phi$. Then
$\forall z \in \mathrm{Dom}(\phi)$, $\forall y \in T_z(E)$, $\forall (V, W) \in T_{\pi(z)}(M) \times T_{\phi(z)}(F)$,
$(d(\pi \times \phi))_z(y) = i(V, W) \Leftrightarrow ((d\pi)_z(y) = V \text{ and } (d\phi)_z(y) = W)$.
Hence
$\forall z \in \mathrm{Dom}(\phi)$, $\forall y \in T_z(E)$, $\forall (V, W) \in T_{\pi(z)}(M) \times T_{\phi(z)}(F)$,
$y = (d(\pi \times \phi))_z^{-1}(i(V, W)) \Leftrightarrow ((d\pi)_z(y) = V \text{ and } (d\phi)_z(y) = W)$.


(See Definition 54.7.6 for the tangent bundle product identification map $i : T(M) \times T(F) \to T(M \times F)$.)


PROOF: The assertion follows from Definition 64.3.2 and Theorem 58.7.9.
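The separation of the differential into horizontal and vertical components can be made concrete in coordinates. The following sketch is an illustration only. It assumes the product fibration $E = \mathbb{R}^2$ over $M = \mathbb{R}$ with the hypothetical fibre chart $\phi(x, y) = y - x^3$, and it writes tangent vectors in terms of the coordinate basis vectors $\partial_x$, $\partial_y$ on $E$, $\partial_p$ on $M$ and $\partial_q$ on $F$.

```latex
% Illustrative example (an assumption, not from the text).
% E = R^2, M = R, F = R, pi(x, y) = x, phi(x, y) = y - x^3.
% Tangent vector at z = (x, y):  Y = a d/dx + b d/dy.
\begin{align*}
  (d\pi)_z(Y)  &= a \,\partial_p , \\
  (d\phi)_z(Y) &= (b - 3x^2 a) \,\partial_q , \\
  (d(\pi \times \phi))_z(Y) &= i\bigl( a \,\partial_p ,\; (b - 3x^2 a) \,\partial_q \bigr) .
\end{align*}
% Conversely, given V = v d/dp and W = w d/dq, the unique Y with
% (d pi)_z(Y) = V and (d phi)_z(Y) = W is obtained by solving for a and b:
\begin{align*}
  Y = v \,\partial_x + (w + 3x^2 v) \,\partial_y .
\end{align*}
```

The second display is the coordinate form of the inverse formula $y = (d(\pi \times \phi))_z^{-1}(i(V, W))$ for this particular chart.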


64.3.6 REMARK: The manifold atlas induced on fibre sets by a fibre space. 

As mentioned in Remark 64.1.6, a fibration $E$ without a specified fibre space $F$ does not have an obvious
choice of manifold atlas on the fibre sets $E_p$. When a $C^k$ $F$-fibre chart is given, it induces a $C^k$ manifold
atlas on each fibre set within its domain in Definition 64.3.7.


64.3.7 DEFINITION: Submanifold atlas induced on a fibre set from a fibre space via a fibre chart.
The (fibre-chart-induced) atlas on a fibre set $E_p$ of a $C^0$ $F$-fibration $(E, \pi, M)$, by a $C^0$ $F$-fibre chart $\phi$ at a
point $p \in \pi(\mathrm{Dom}(\phi))$, is the set


$A_{E_p,\phi} = \{\psi \circ \phi|_{E_p};\ \psi \in \mathrm{atlas}(F)\}$.


64.3.8 REMARK: Fibre sets of differentiable fibrations are regular submanifolds. 

Theorem 64.3.9 asserts that each fibre set $E_p$ of a differentiable fibration $(E, \pi, M)$ is a submanifold of the
total space $E$, and that it is diffeomorphic to the fibre space $F$. This is useful for importing vectors or vector
fields from the fibre space to the fibre set. These imported vectors lie in the tangent bundle of each fibre
set $E_p$, not in the tangent bundle $T(E)$. To obtain a vector in $T(E)$, the tangent vector embedding map in
Definition 54.6.2 must be applied. Then the resulting tangent vector will be “vertical” in $T(E)$.


64.3.9 THEOREM: Fibre set submanifold atlases and regular embeddings.
Let $k \in \mathbb{Z}_0^+$. Let $(E, \pi, M)$ be a $C^k$ $F$-fibration. Let $\phi : \pi^{-1}(U) \to F$ be a $C^k$ $F$-fibre chart for $(E, \pi, M)$.
Let $A_{E_p,\phi}$ be the manifold atlas induced by $\phi$ on $E_p = \pi^{-1}(\{p\})$ for each $p \in U$.


(i) $(E_p, A_{E_p,\phi})$ is a regular $C^k$ submanifold of $E$.
(ii) $\phi|_{E_p} : E_p \to F$ is a $C^k$ diffeomorphism for all $p \in U$, assuming manifold atlas $A_{E_p,\phi}$ on $E_p$.
(iii) The map $\phi|_{E_p}^{-1} : F \to E_p$ is a regular $C^k$ embedding of $F$ in $E$, assuming manifold atlas $A_{E_p,\phi}$ on $E_p$.


PROOF: Part (i) follows from Theorem 52.7.5 (vi).
For part (ii), let $\phi : \pi^{-1}(U) \to F$ be a $C^k$ $F$-fibre chart for $(E, \pi, M)$. Then $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a $C^k$
diffeomorphism by Definition 64.3.2. Hence $\phi|_{\pi^{-1}(\{p\})} : \pi^{-1}(\{p\}) \to F$ is a $C^k$ diffeomorphism for all $p \in U$
by Theorem 52.7.5 (x).

Part (iii) follows from Theorem 52.7.5 (xii).


64.3.10 REMARK: Theorems related to morphisms between pointwise fibre sets and fibre spaces. 

Theorem 64.3.9 (ii) is the analogue for differentiable fibrations of the pointwise homeomorphism property 
for topological fibrations in Theorem 47.3.11 (i) and for topological spaces in Theorem 32.11.4 (ii), and the 
pointwise bijection property for non-topological fibrations in Theorem 21.5.6 (i) and for non-topological 
functions in Theorem 10.15.10 (i). Theorem 64.3.9 is similarly related to various other theorems which are 
listed in the following table. 


structure partial map pointwise morphism fibration morphism 


set 10.15.10 21.5.6 
topological space 32.10.10 32.11.4 47.3.11 
Cartesian space 42.6.16 42.7.6 
topological manifold 50.5.4 
differentiable manifold 52.6.8 52.7.5 64.3.9, 64.11.3 


The theorems in the “partial map” column show that if a function product $f \times g : Y \to X_1 \times X_2$ is continuous
or differentiable between a space $Y$ and a direct product of spaces $X_1$ and $X_2$, then the “fibre set restrictions”
$f|_{g^{-1}(\{p_2\})}$ and $g|_{f^{-1}(\{p_1\})}$ for $p_1 \in X_1$ and $p_2 \in X_2$ are continuous or differentiable respectively.


The theorems in the “morphism” columns show that if $f \times g : Y \to X_1 \times X_2$ is a bijection, homeomorphism
or diffeomorphism, then each “fibre set restriction” of $f$ or $g$ is likewise a bijection, homeomorphism or
diffeomorphism respectively.


These theorems require the choice of a topological or differentiable structure on each “fibre set” $g^{-1}(\{p_2\})$ or
$f^{-1}(\{p_1\})$. In the case of continuity and homeomorphisms, the natural topological structure on these “fibre
sets” is the relative topology induced by the “total space” $Y$. But for differentiability and diffeomorphisms,
the choice of differentiable structure is not clear. (This is discussed in more detail in Remark 64.1.6.)
One may either choose the maximal $C^k$ atlas which makes each fibre set a regular submanifold, as in
Theorem 52.4.11, or else the atlas may be induced from the atlas on the “fibre space” $X_1$ or $X_2$. This
manifold atlas is $\{\psi_1 \circ f|_{g^{-1}(\{p_2\})};\ \psi_1 \in \mathrm{atlas}(X_1)\}$ or $\{\psi_2 \circ g|_{f^{-1}(\{p_1\})};\ \psi_2 \in \mathrm{atlas}(X_2)\}$ for $g^{-1}(\{p_2\})$ or
$f^{-1}(\{p_1\})$ respectively.


64.3.11 REMARK: The importance of fibre-set regular submanifold structure and regular embeddings.
Although the products $\pi \times \phi : \pi^{-1}(U) \to U \times F$ are almost ubiquitously referred to as “local trivialisations”,
the emphasis is generally placed on the consequence that the total space structure is locally isomorphic to the
“trivial” direct product structure on the set $U \times F$. This is important for the study of global combinatorial
topology. However, a more important consequence for local geometry is that the local trivialisation forces
each fibre set to have a regular submanifold structure, and the fibre-set-restricted fibre chart is then an
isomorphism between the regular submanifold structure on the fibre set $\pi^{-1}(\{p\})$ and the structure on the
fibre space $F$. Most of the significant geometric structures (such as connections) on differentiable fibre bundles
are built on the regular submanifold structure and regular embeddings of fibre sets. Thus Theorem 64.3.9 is
one of the core theorems of differential geometry. (See also Theorem 64.11.3 for the corresponding assertions
for fibre bundles.)


64.3.12 REMARK: Fibre chart transition maps for a differentiable fibration.

Let $k \in \mathbb{Z}_0^+$. Let $\phi_\ell : \pi^{-1}(U_{\phi_\ell}) \to F$ be $C^k$ fibre charts for a $C^k$ fibration $(E, \pi, M)$ with fibre space $F$.
(The sets $U_{\phi_\ell}$ can be reconstructed from given fibre charts $\phi_\ell$ as $U_{\phi_\ell} = \pi(\mathrm{Dom}(\phi_\ell))$ for $\ell = 1, 2$.) Then the
function $(\pi \times \phi_2) \circ (\pi \times \phi_1)^{-1} : (U_{\phi_1} \cap U_{\phi_2}) \times F \to (U_{\phi_1} \cap U_{\phi_2}) \times F$ is a $C^k$ diffeomorphism which maps $(p, y)$
to $(p, g_{\phi_2,\phi_1}(p)(y))$ for all $p \in U_{\phi_1} \cap U_{\phi_2}$, for some function $g_{\phi_2,\phi_1} : U_{\phi_1} \cap U_{\phi_2} \to C^k(F, F)$. This function is
given the name “fibre chart transition map” in Definition 64.3.13.


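64.3.13 DEFINITION: The fibre chart transition map from $\phi_1$ to $\phi_2$, for $C^k$ $F$-fibre charts $\phi_1$ and $\phi_2$ for a
$C^k$ fibration $(E, \pi, M)$ with fibre space $F$ for $k \in \mathbb{Z}_0^+$, is the map $g_{\phi_2,\phi_1} : U_1 \cap U_2 \to C^k(F, F)$ defined by


$\forall p \in U_1 \cap U_2$, $\forall y \in F$, $g_{\phi_2,\phi_1}(p)(y) = \phi_2((\pi \times \phi_1)^{-1}(p, y))$, (64.3.1)

where $U_\ell = \pi(\mathrm{Dom}(\phi_\ell))$ for $\ell = 1, 2$. In other words,

$\forall p \in U_1 \cap U_2$, $g_{\phi_2,\phi_1}(p) = \phi_2 \circ \phi_1|_{\pi^{-1}(\{p\})}^{-1}$. (64.3.2)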
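Line (64.3.1) can be evaluated by hand in simple cases. The following sketch is an illustration only, not taken from the text. It assumes the product fibration $E = \mathbb{R}^2$ over $M = \mathbb{R}$ with fibre space $F = \mathbb{R}$, and two hypothetical fibre charts $\phi_1$ and $\phi_2$ defined on all of $E$.

```latex
% Illustrative example (an assumption, not from the text).
% E = R^2, M = R, F = R, pi(x, y) = x, phi_1(x, y) = y, phi_2(x, y) = y - x^3.
% Here U_1 = U_2 = M, and the inverse trivialisations are
\begin{align*}
  (\pi \times \phi_1)^{-1}(p, y) = (p, y), \qquad
  (\pi \times \phi_2)^{-1}(p, y) = (p, y + p^3) .
\end{align*}
% Substitution into line (64.3.1) gives the two transition maps:
\begin{align*}
  g_{\phi_2,\phi_1}(p)(y) &= \phi_2\bigl((\pi \times \phi_1)^{-1}(p, y)\bigr)
                           = \phi_2(p, y) = y - p^3 , \\
  g_{\phi_1,\phi_2}(p)(y) &= \phi_1\bigl((\pi \times \phi_2)^{-1}(p, y)\bigr)
                           = \phi_1(p, y + p^3) = y + p^3 .
\end{align*}
% For each fixed p, these are mutually inverse translations of F = R:
\begin{align*}
  g_{\phi_1,\phi_2}(p) \circ g_{\phi_2,\phi_1}(p) = \mathrm{id}_F .
\end{align*}
```

For each fixed $p$, the transition map in this sketch is a translation of $F$ by $-p^3$, which is a smooth diffeomorphism of $F$, and the map $p \mapsto g_{\phi_2,\phi_1}(p)(y)$ is smooth for each fixed $y$.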


64.3.14 REMARK: The implicit group of diffeomorphisms of the fibre space of a differentiable fibration.
The function $g_{\phi_2,\phi_1}$ in Definition 64.3.13 is of class $C^k$ in the weak sense that the map $p \mapsto g_{\phi_2,\phi_1}(p)(y)$ is $C^k$
from $U_{\phi_1} \cap U_{\phi_2}$ to $F$ for each fixed $y \in F$. Without a $C^k$ structure on $C^k(F, F)$, it is not possible to assert
differentiability in the stronger sense of the function $g_{\phi_2,\phi_1}$ being of class $C^k$ from $U_{\phi_1} \cap U_{\phi_2}$ to $C^k(F, F)$.
However, it is noteworthy that for any fixed $p \in M$, the set of maps $g_{\phi_2,\phi_1}(p) : y \mapsto g_{\phi_2,\phi_1}(p)(y)$, for all
$C^k$ fibre charts $\phi_1$ and $\phi_2$ for $(E, \pi, M)$ with $p \in \mathrm{Dom}(g_{\phi_2,\phi_1})$, is a group of $C^k$ automorphisms of $F$.
The smallest group containing all of these pointwise groups may be thought of as the “implicit group of
diffeomorphisms” of the differentiable fibration.


64.4. Fibre atlases for differentiable fibrations 


64.4.1 REMARK: The specification of an explicit fibre atlas for a differentiable fibration. 

An advantage of the absence of a fibre atlas in Definition 64.2.2 is the fact that a $C^k$ fibration $(E, \pi, M)$ for
a fibre space $F$ is also a $C^k$ fibration for any $C^k$ manifold $F'$ which is diffeomorphic to $F$. This is as one
would intuitively expect. Definition 64.4.4, on the other hand, specifies a particular atlas for a particular
fibre space $F$. It is straightforward to define equivalence relations among atlases so that this dependency on
a particular space $F$ is removed.


Definition 64.4.4 is preferable to Definition 64.2.2 because it includes it as a special case if one defines $A_E^F$ to
be the set of all $C^k$ $F$-fibre maps as in Definition 64.3.7. Then Definition 64.4.4 conditions (iii), (iv) and (v)
will be satisfied.


Although the specification of a fibre atlas for a differentiable fibration is in some ways superfluous, in practice 
it is often the fibre atlas which is specified, while the differentiable (and therefore topological) structure on 
the fibration is induced by the fibre atlas. 


It is possible to extend a fibre atlas to all fibre charts which are compatible with the structure on the total
space up to some level of differentiability, and consistent with some structure group for the bundle. However,
as mentioned in Remarks 50.4.9, 51.2.2, 51.4.4, 54.5.31 and 54.9.12, requiring differentiable structures to have
maximal atlases is usually a bad idea. This is even more true in the case of fibre bundle structure, where
not only the differentiable structure but also the group structure of the bundle may be “washed out” by
including all charts which are compatible at some level. Therefore “complete” or “maximal” atlases are
avoided here also.


64.4.2 DEFINITION: A $C^k$ fibre atlas for $k \in \mathbb{Z}_0^+$ for a fibre space $F$ for a $C^k$ fibration $(E, \pi, M)$ is a set
$A_E^F$ of $C^k$ fibre charts for fibre space $F$ for $(E, \pi, M)$ such that $\bigcup_{\phi \in A_E^F} \mathrm{Dom}(\phi) = E$.


An indexed $C^k$ fibre atlas for $k \in \mathbb{Z}_0^+$ for fibre space $F$ for a $C^k$ fibration $(E, \pi, M)$ is a family $(\phi_i)_{i \in I}$ of $C^k$
fibre charts for fibre space $F$ for $(E, \pi, M)$ such that $\bigcup_{i \in I} \mathrm{Dom}(\phi_i) = E$. (In other words, $\mathrm{Range}(\phi)$ is a $C^k$
fibre atlas for $F$.)


64.4.3 REMARK: Compatible fibre charts for a differentiable F-bundle. 

There is no need to define “compatible fibre charts” for the $F$-bundles in Definition 64.4.4. Any $C^k$ $F$-fibre
chart according to Definition 64.3.2 is automatically compatible with any specified $C^k$ $F$-fibre atlas on a
given $F$-fibration $(E, \pi, M)$. Consequently the $F$-bundle in Definition 64.4.4 is nothing more or less than
an $F$-fibration with an arbitrary choice of $F$-fibre atlas as in Definition 64.4.2.


It follows that Definition 64.4.4 is of very little value. The addition of a fibre atlas becomes useful, because
it adds useful information, in the case of a differentiable fibre bundle as in Definition 64.8.3, which specifies
a structure group which constrains the transition maps between fibre charts.


64.4.4 DEFINITION: A $C^k$ (differentiable) $F$ fibre bundle, for $k \in \mathbb{Z}_0^+$ and a $C^k$ manifold $F$, is a tuple
$E < (E, \pi, M, A_E^F) < (E, A_E, \pi, M, A_M, A_E^F)$ such that


(i) $E < (E, A_E)$ and $M < (M, A_M)$ are $C^k$ manifolds,
(ii) $\pi : E \to M$ is a $C^k$ surjection,
(iii) $\forall \phi \in A_E^F$, $\exists U_\phi \in \mathrm{Top}(M)$, $\phi : \pi^{-1}(U_\phi) \to F$ is a $C^k$ differentiable map,
(iv) $\bigcup_{\phi \in A_E^F} U_\phi = M$,
(v) $\forall \phi \in A_E^F$, $\pi \times \phi : \pi^{-1}(U_\phi) \to U_\phi \times F$ is a $C^k$ diffeomorphism.


Alternative names: $C^k$ (differentiable) $F$-bundle or $C^k$ (differentiable) fibration with an $F$-fibre atlas.
A fibre chart for $E < (E, \pi, M, A_E^F)$ is any element of $A_E^F$.


64.4.5 REMARK: The differences between fibrations and fibre bundles. 

It is somewhat questionable whether Definition 64.4.4 describes a fibration with an added fibre atlas, or a 
fibre bundle which lacks a structure group. These are both true, but for clarity, it is necessary to have some 
kind of clear borderlines between the categories of objects. 


(1) fibration or $F$-fibration: A tuple $(E, \pi, M)$ with “local triviality”. Local fibre charts exist, but are
not specified. (See Definitions 64.1.2 and 64.2.2.)


(2) $F$ fibre bundle or $F$-bundle: A tuple $(E, \pi, M, A_E^F)$ with a specified fibre atlas for a specified fibre
space $F$, but no structure group. (See Definition 64.4.4.)


(3) $(G, F)$ fibre bundle or $(G, F)$-bundle: A tuple $(E, \pi, M, A_E^F)$ with a specified fibre atlas for a specified
fibre space $F$ and structure group $G$. (See Definition 64.8.3.)


(4) principal $G$-bundle: A tuple $(P, \pi, M, A_P^G)$ with a specified fibre atlas for a specified structure group $G$.
(See Definition 66.1.2.)


There is considerable diversity of terminology in the literature for fibrations and fibre bundles. The view is 
taken here that a fibration has no fibre atlas, and that the word “bundle” means that there is a specified 
fibre atlas. One may imagine that the specified fibre charts are “bundling” together the fibre sets at different 
points in a specified way. In other words, it is the specified fibre charts which do the “bundling” of fibre 
sets. In the case of fibrations, the fibre sets are assumed to be “bundlable”, but not actually “bundled”. 


In practice, there is relatively little that can be accomplished with a fibre atlas which does not have a 
specified structure group. The structure group provides the link between associated fibre bundles, which 
require different fibre spaces to be acted on by the same structure group. Parallelism and connections cannot 
be copied from one fibre bundle to another if there is no specified structure group. Furthermore, associated 
connections are defined in terms of the Lie algebra of the structure group, which is easy to define when the 
group is a finite-dimensional Lie group, but is otherwise more difficult. Consequently Definition 64.4.4 is of 
limited value. (The useful differentiable $(G, F)$ fibre bundles are defined in Section 64.8.)


64.5. Horizontal components and verticality of total space vectors


64.5.1 REMARK: Fibre bundle concepts which can be defined without a structure group. 
Some concepts may be defined for differentiable fibrations without fibre atlases or structure groups. These 
include horizontal components and verticality of total space vectors, and differentiable cross-sections. 


64.5.2 REMARK: The induced map of a differentiable fibration projection map. 

The map $\pi_* : T(E) \to T(M)$ in Definition 64.5.3 is the induced map of a $C^1$ map between $C^1$ manifolds in
Definition 58.9.4. (See Figure 64.5.1.) The vector $\pi_*(w)$ can be expressed in terms of the differential map $d\pi$
as $\pi_*(w) = (d\pi)_z(w)$, where $z = \pi_E(w)$ and $\pi_E : T(E) \to E$ is the tangent bundle projection map for $T(E)$.


Figure 64.5.1 Induced map of projection map of a differentiable fibre bundle


Definitions 64.5.3 and 64.5.4 and Notation 64.5.6 are generalisations to differentiable fibrations of Definitions 
59.2.2 and 59.2.3 and Notation 59.2.5 respectively, which are for tangent bundles. 


64.5.3 DEFINITION: The horizontal component of a tangent vector $w \in T(E)$ for a $C^1$ fibration $(E, \pi, M)$
is the tangent vector $\pi_*(w) \in T(M)$.


64.5.4 DEFINITION: A vertical (tangent) vector on the total space $E$ of a $C^1$ fibration $(E, \pi, M)$ is a vector
in $T(E)$ whose horizontal component is zero.


64.5.5 REMARK: Vertical vectors on the total space are in the kernel of the projection-map induced map.
The set $\{w \in T_z(E);\ (d\pi)_z(w) = 0\} = \{w \in T_z(E);\ \pi_*(w) = 0\}$ of vertical tangent vectors in Definition 64.5.4
for a fixed point $z \in E$ is a linear subspace of $T_z(E)$, namely the kernel $\ker((d\pi)_z)$ of the pointwise differential
map $(d\pi)_z : T_z(E) \to T_{\pi(z)}(M)$. Both the horizontal component of tangent vectors in $T(E)$ and the linear
spaces of vertical tangent vectors on $E$ are independent of the choice of fibre charts.


The oblique total space tangent-vector sets in Notation 64.5.6 arise very frequently in relation to connections 
on differentiable fibre bundles. 


64.5.6 NOTATION: Oblique total space tangent-vector sets with specified horizontal component.
$T_{z,V}(E)$, for $z \in E$ and $V \in T_{\pi(z)}(M)$, for a $C^1$ fibration $(E, \pi, M)$, denotes the set of vectors in $T_z(E)$
which have horizontal component $V$. In other words,


$T_{z,V}(E) = \{y \in T_z(E);\ \pi_*(y) = V\}$.


64.5.7 DEFINITION: The vertical subspace of the tangent space $T_z(E)$ of a $C^1$ fibration $(E, \pi, M)$ at a total
space element $z \in E$ is the linear subspace $T_{z,0}(E) = \{y \in T_z(E);\ \pi_*(y) = 0\}$ of $T_z(E)$.


64.5.8 REMARK: Vertical components and horizontality of total-space tangent vectors are not well defined. 
There is no definition of vertical components or horizontality of vectors for differentiable fibrations because 
these concepts are not fibre-chart-independent in general. It is the purpose of connections in Chapters 67, 
68 and 69 to make vertical components and horizontality of vectors fibre-chart-independent by providing a 
fixed structure on the manifold relative to which these concepts can be calculated. 
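The chart-dependence of vertical components can be seen in a two-dimensional example. The following sketch is an illustration only, not taken from the text. It assumes the product fibration $E = \mathbb{R}^2$ over $M = \mathbb{R}$ with two hypothetical fibre charts $\phi_1$ and $\phi_2$, and it uses the coordinate basis vectors $\partial_x$, $\partial_y$ on $E$, $\partial_p$ on $M$ and $\partial_q$ on $F = \mathbb{R}$.

```latex
% Illustrative example (an assumption, not from the text).
% E = R^2, M = R, F = R, pi(x, y) = x, phi_1(x, y) = y, phi_2(x, y) = y - x^3.
% Consider the tangent vector Y = d/dx at the point z = (x, y).
\begin{align*}
  (d\pi)_z(Y)    &= \partial_p        && \text{(horizontal component, chart-independent)}, \\
  (d\phi_1)_z(Y) &= 0                 && \text{(vertical component via } \phi_1\text{)}, \\
  (d\phi_2)_z(Y) &= -3x^2 \,\partial_q && \text{(vertical component via } \phi_2\text{)} .
\end{align*}
% So Y is "horizontal" with respect to phi_1, but not with respect to phi_2
% when x is non-zero. By contrast, the vector d/dy satisfies
\begin{align*}
  (d\pi)_z(\partial_y) = 0 ,
\end{align*}
% which makes it vertical in the sense of Definition 64.5.4, with no chart involved.
```

In this sketch, verticality is decided by $\pi$ alone, whereas the statement “$Y$ has zero vertical component” changes its truth value when $\phi_1$ is replaced by $\phi_2$.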


64.5.9 REMARK: Horizontal and vertical components uniquely determine vectors. 

In Theorem 64.5.10, line (64.5.1) asserts that if the horizontal component $\pi_*(Y)$ of a vector $Y \in T_z(E)$ is
known, and the vertical component $\phi_*(Y)$ is known with respect to some fibre chart $\phi$, then the vector $Y$ is
uniquely determined. Line (64.5.2) asserts that this vector $Y$ is also unique in $T(E)$. Theorem 64.5.11 is an
immediate corollary of Theorem 64.5.10.


64.5.10 THEOREM: Unique existence of total space vector with given horizontal/vertical components.
Let $\phi$ be a $C^1$ $F$-fibre chart for a $C^1$ $F$-fibration $(E, \pi, M)$. Then


$\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$, $\forall W \in T_{\phi(z)}(F)$, $\exists' Y \in T_z(E)$,
$\pi_*(Y) = V$ and $\phi_*(Y) = W$. (64.5.1)


In other words, there is a unique $Y \in T_z(E)$ with horizontal and vertical components $V$ and $W$. Hence


$\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$, $\forall W \in T_{\phi(z)}(F)$, $\exists' Y \in T(E)$,
$\pi_*(Y) = V$ and $\phi_*(Y) = W$. (64.5.2)


In other words, there is a unique $Y \in T(E)$ with horizontal and vertical components $V$ and $W$.


PROOF: Line (64.5.1) follows from Theorem 64.3.5.


For line (64.5.2), there exists $Y \in T_z(E)$ with $\pi_*(Y) = V$ and $\phi_*(Y) = W$ by line (64.5.1). So $Y \in T(E)$ exists with
these properties. To show uniqueness within $T(E)$, let $Y' \in T(E)$ with $\pi_*(Y') = V$ and $\phi_*(Y') = W$.
By Notation 54.1.4, $Y' \in T_{z'}(E)$ for some $z' \in E$. Then $z' \in \mathrm{Dom}(\phi)$ because $\phi_*(Y')$ is well defined. So
$z' \in \mathrm{Dom}(\pi \times \phi)$, and then $(d(\pi \times \phi))_{z'}(Y')$ is well defined by Definition 64.3.2. So by Theorem 64.3.5,
$(d(\pi \times \phi))_{z'}(Y') = i(V, W) = (d(\pi \times \phi))_z(Y)$. Thus $(\pi \times \phi)_*(Y') = (\pi \times \phi)_*(Y)$. But
$(\pi \times \phi)_* : T(\mathrm{Dom}(\phi)) \to T(\pi(\mathrm{Dom}(\phi)) \times F)$ is a $C^0$ diffeomorphism by Theorem 58.9.14. So $Y' = Y$.
Thus $Y$ is unique within $T(E)$, which verifies line (64.5.2).


64.5.11 THEOREM: Bijection from oblique tangent-vector set to fibre space tangent space.
Let $\phi$ be a $C^1$ $F$-fibre chart for a $C^1$ $F$-fibration $(E, \pi, M)$. Then


$\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$,
$(d\phi)_z|_{T_{z,V}(E)} : T_{z,V}(E) \to T_{\phi(z)}(F)$ is a bijection.


PROOF: The assertion follows from Notation 64.5.6 and Theorem 64.5.10 line (64.5.1).


64.5.12 REMARK: Total space tangent vector which is horizontal via some chart. 

Although horizontality of tangent vectors on the total space is chart-dependent, it is sometimes useful to
construct a specific total space tangent vector such as $H_{V,\phi}(z)$ in Definition 64.5.13, which is “horizontal”
with respect to some fibre chart $\phi$. For connections on fibre bundles, as in Theorems 67.11.4 and 69.6.8, one
may employ such a vector to reconstruct horizontal lift maps from other representations of connections by
using it to cancel the chart-dependence of another vector.


64.5.13 DEFINITION: The horizontal fibre-set vector field with horizontal component $V \in T_p(M)$, for a $C^1$
fibre chart $\phi$ for a $C^1$ $F$-fibration $(E, \pi_E, M)$ with $p \in \pi_E(\mathrm{Dom}(\phi))$, is the vector field $H_{V,\phi} \in X(T(E)|_{E_p})$
given by


$\forall z \in E_p$, $H_{V,\phi}(z) = (d(\pi_E \times \phi))_z^{-1}(i(V, 0_{T_{\phi(z)}(F)}))$.


64.5.14 REMARK: Horizontal submanifolds with constant vertical coordinate via a fibre chart. 

Theorem 64.5.16 (iv) expresses the horizontal vector field $H_{V,\phi}$ on the submanifold $E_p$ in Definition 64.5.13
as the differential of the restriction of $\pi_E$ to the submanifold $E^{\phi,q} = \phi^{-1}(\{q\})$ of $E$ in Definition 64.5.15,
where $q = \phi(z)$. This is a kind of “horizontal submanifold” of $E$ with “altitude” $q \in F$ as measured via the
fibre chart $\phi$.

Theorem 64.5.16 (iv) requires a submanifold tangent vector embedding map $\iota : T(E^{\phi,q}) \to T(E)$ as in
Definition 54.6.2 to convert tangent vectors on the submanifold $E^{\phi,q}$ to tangent vectors in the ambient
space $E$. (Theorem 64.5.16 (iv) and Definitions 64.5.13 and 64.5.15 are illustrated in Figure 64.5.2.)


64.5.15 DEFINITION: The horizontal submanifold of a $C^0$ fibration $(E, \pi_E, M)$ with fibre space $F$, for a
fibre space element $q \in F$ via a fibre chart $\phi$, is the set


$E^{\phi,q} = \phi^{-1}(\{q\})$,


together with its induced submanifold atlas $\{\psi \circ \pi_E|_{E^{\phi,q}};\ \psi \in \mathrm{atlas}(M)\}$.


Figure 64.5.2 Horizontal fibre-set vector fields and horizontal submanifolds


64.5.16 THEOREM: Some basic properties of horizontal fibre-set vector fields.
Let $E < (E, \pi_E, M)$ be a $C^1$ $F$-fibration. Let $\phi$ be a $C^1$ $F$-fibre chart for $E$.


(i) $\forall p \in \pi_E(\mathrm{Dom}(\phi))$, $\forall V \in T_p(M)$, $\forall z \in E_p$, $(d\pi_E)_z(H_{V,\phi}(z)) = V$.
(ii) $\forall p \in \pi_E(\mathrm{Dom}(\phi))$, $\forall V \in T_p(M)$, $\forall z \in E_p$, $(d\phi)_z(H_{V,\phi}(z)) = 0_{T_{\phi(z)}(F)}$.
(iii) $\forall p \in \pi_E(\mathrm{Dom}(\phi))$, $\forall V \in T_p(M)$, $\forall z \in E_p$, $H_{V,\phi}(z) \in T_{z,V}(E)$.
(iv) $\forall p \in \pi_E(\mathrm{Dom}(\phi))$, $\forall V \in T_p(M)$, $\forall z \in E_p$,
$H_{V,\phi}(z) = \iota((d(\pi_E|_{E^{\phi,\phi(z)}}))_z^{-1}(V)) = \iota((d(\pi_E|_{E^{\phi,\phi(z)}}^{-1}))_p(V))$,
where $\iota : T(E^{\phi,\phi(z)}) \to T(E)$ is the tangent vector embedding map for the regular $C^1$ submanifold
$E^{\phi,\phi(z)} = \phi^{-1}(\{\phi(z)\})$ of $E$ as in Definition 64.5.15.


PROOF: Parts (i) and (ii) follow directly from Definition 64.5.13.
Part (iii) follows from Definition 64.5.13 and Notation 59.2.5.
Part (iv) follows from Definition 64.5.13 and Theorem 58.7.5 line (58.7.3).


64.5.17 REMARK: The derivative of a constant cross-section equals a horizontal vector field.
Theorem 64.5.18 is essentially equivalent to Theorem 64.7.18. It states in effect that the directional derivative
of a “constant cross-section” in Definition 21.6.6 is equal to the horizontal vector field in Definition 64.5.13.


Using the constant cross-section notation of Definition 21.6.6, one may write Theorem 64.5.18 as

\[ \forall p \in M,\ \forall V \in T_p(M),\ \forall q \in F,\quad
   \partial_V X_{\phi,q} = H_{V,\phi}\big(\phi|_{E_p}^{-1}(q)\big), \]

where the cross-section $X_{\phi,q} \in X(E, \pi_E, M \,|\, \pi_E(\mathrm{Dom}(\phi)))$ is the map
$X_{\phi,q} : \pi_E(\mathrm{Dom}(\phi)) \to \mathrm{Dom}(\phi)$ which is defined by $X_{\phi,q} : b \mapsto (\pi_E \times \phi)^{-1}(b, q)$.

One might reasonably expect the derivative of a constant function to equal zero, but Theorem 64.5.18
gives a non-zero formula as the derivative. This is not a contradiction because the vertical component
of the derivative does equal zero. Derivatives of cross-sections of fibre bundles always have a horizontal
component which projects down to the base manifold vector $V$ by Theorem 64.5.16 (i), and this component is
non-zero whenever $V$ is non-zero. The vertical component of the derivative of a constant cross-section then
equals zero by Theorem 64.5.16 (ii). Thus Theorem 64.5.18 is exactly as one should expect.
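
The split into a horizontal component equal to V and a zero vertical component can be checked numerically in
the simplest possible case. The following is only a sketch under stated assumptions: the total space is the
trivial product R x R over the base R, the fibre chart is the second coordinate, and derivatives are estimated
by central differences. The names proj, chart, constant_section and derivative are hypothetical helpers
introduced here for illustration; they do not appear elsewhere in this book.

```python
# Toy model (an assumption for illustration): trivial fibration E = R x R over M = R.

def proj(z):
    """Projection map pi_E : E -> M, taking (p, f) to p."""
    return z[0]

def chart(z):
    """Fibre chart phi : E -> F, taking (p, f) to f."""
    return z[1]

def constant_section(q):
    """Cross-section p -> (pi_E x phi)^{-1}(p, q), which is p -> (p, q) in this model."""
    return lambda p: (p, q)

def derivative(fn, p, V, h=1e-6):
    """Central-difference estimate of the derivative of a real function fn at p along V."""
    return (fn(p + h * V) - fn(p - h * V)) / (2 * h)

X = constant_section(3.0)
p, V = 1.5, 2.0

# Horizontal component: the image of the derivative of X under d(pi_E).
horizontal = derivative(lambda t: proj(X(t)), p, V)
# Vertical component as seen through the chart: the image under d(phi).
vertical = derivative(lambda t: chart(X(t)), p, V)
```

In this model the horizontal component reproduces V up to rounding error, while the chart-viewed vertical
component vanishes, which is the behaviour described by Theorem 64.5.16 (i) and (ii).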


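64.5.18 THEOREM: Formula for directional derivative of constant cross-section.
Let $(E, \pi_E, M)$ be a $C^1$ $F$-fibration. Let $\phi$ be a $C^1$ $F$-fibre chart for $E$. Then

\[ \forall p \in M,\ \forall V \in T_p(M),\ \forall q \in F, \]
\[ \partial_V\big((\pi_E \times \phi)^{-1}(\cdot, q)\big) = H_{V,\phi}\big((\pi_E \times \phi)^{-1}(p, q)\big)
   = H_{V,\phi}\big(\phi|_{E_p}^{-1}(q)\big). \]

PROOF: Let $z = (\pi_E \times \phi)^{-1}(p, q)$. Then $z = \phi|_{E_p}^{-1}(q) \in E_p$ and $\phi(z) = q$. So

\[ H_{V,\phi}\big((\pi_E \times \phi)^{-1}(p, q)\big) = H_{V,\phi}(z) \]
\[ \qquad = \big(d(\pi_E \times \phi)^{-1}(\cdot, \phi(z))\big)_{\pi_E(z)}(V) \qquad (64.5.3) \]
\[ \qquad = \big(d(\pi_E \times \phi)^{-1}(\cdot, q)\big)_p(V) \]
\[ \qquad = \partial_V\big((\pi_E \times \phi)^{-1}(\cdot, q)\big), \qquad (64.5.4) \]

where line (64.5.3) follows by Theorem 58.7.5 line (58.7.4), and line (64.5.4) follows from Definition 61.2.3.
This verifies the assertion.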


64.5.19 REMARK: Vertical fibre-set vector fields. 
It is not so simple to construct vertical vector fields on fibre sets similarly to the horizontal vector fields in
Definition 64.5.13. The analogous vertical vector field $U_{u,\phi} : E_p \to \bigcup_{z \in E_p} T_{z,0}(E)$ would
presumably satisfy

\[ \forall z \in E_p,\quad U_{u,\phi}(z) = \big(d(\pi_E \times \phi)^{-1}\big)_{(p,\phi(z))}\big(0_{T_p(M)},\, u(\phi(z))\big), \]

where $u \in X(T(F))$ is some vector field on $F$. The horizontal vector parameter $V$ in Definition 64.5.13 is
replaced here by a vector field parameter $u$. This is required because each point $z \in E_p$ is mapped by $\phi$ to
a different point $\phi(z) \in F$. It is difficult to replace $u$ with a “constant field” on $F$ in general, although in
the special case that $F$ is a Lie group, left invariant vector fields can be defined. (This is effectively what is
done in the case of the “fundamental vertical vector fields” in Section 66.6.)


64.5.20 REMARK: The importance of horizontal components and vertical vectors for covariant derivatives. 
The purpose of covariant derivatives of vector fields is to replace chart-dependent naive derivatives with
a chart-independent derivative. (See Section 68.2 for covariant derivatives on ordinary fibre bundles. See
Section 71.6 for covariant derivatives on tangent bundles.) The first step towards defining a covariant
derivative is to subtract some kind of “tensorisation term” from a naive derivative to produce a vertical
vector. One could think of this as a “verticalisation term”. (The verticality of the difference follows from the
equality of the horizontal components of the two terms.) The second step is to apply a “drop function” as
in Section 64.6, which should preferably map vertical vectors in a chart-independent fashion to fibre space
vectors. (Unfortunately this is not possible. See Theorem 64.6.8.) In the tangent bundle case, the range of
the drop function is the set of base-space vectors.


The same strategy is applied (successfully) to define Lie derivatives by subtracting suitable “verticalisation
terms” from naive derivatives of vector fields and tensor fields. (See Section 61.8 for Lie derivatives and Lie
transport.) For Lie derivatives, the verticalisation terms are derived from integral curves of vector fields,
whereas for covariant derivatives, they are derived from connections.


64.6. Drop functions for vertical total space vectors 


64.6.1 REMARK: The usefulness of chart-dependent drop functions. 

The drop functions for double tangent bundles in Section 59.2 are fibre-chart-independent because there is
effectively only one choice of fibre chart for such tangent bundles. (This is also mentioned in Remark 68.2.6.)
For a general ordinary fibre bundle $(E, \pi, M, A_E^F)$, there are typically infinitely many possible fibre charts,
even if one restricts attention to only the vertical vectors at a single point $p \in M$. Thus the vertical vectors
in $T_{z,0}(E)$ for $z \in E$ in Definition 64.5.7 cannot be mapped in an obvious unique manner to $T_{\pi(z)}(M)$. (For
the fibre-chart-independent drop function for vector bundles, see Definition 65.3.5.)


The main application of Definitions 64.6.2 and 64.6.6 in this book is to define a fibre-chart-dependent
covariant derivative for connections on general ordinary fibre bundles in Definition 68.2.5, but the purpose of
this covariant derivative is to show what goes wrong in such a general case. Vector bundles, by contrast, have
well defined fibre-chart-independent covariant derivatives (in Definition 68.2.9) because of the feasibility of
fibre-chart-independent drop functions.

Definition 64.6.2 is stated in terms of $C^1$ $F$-bundles, but it is clearly applicable to $C^1$ $(G, F)$ fibre bundles
as in Definition 64.8.3, which constrain fibre atlas transition maps to be in a specified structure group. In
other words, the structure group plays no role in Definition 64.6.2.

64.6.2 DEFINITION: Fibre-chart-dependent pointwise drop function for general fibre bundles.
The drop function at a point $z \in E$, for a $C^1$ $F$-bundle $(E, \pi, M, A_E^F)$, for a fibre chart $\phi \in A_E^F$, is the
linear map $\varpi^\phi_z : \ker((d\pi)_z) \to T_{\phi(z)}(F)$ which is defined by

\[ \forall y \in \ker((d\pi)_z),\quad \varpi^\phi_z(y) = (d\phi)_z(y). \]


64.6.3 REMARK: Restriction of differential of fibre chart to the vertical subspace. 

A subtle point in the statement and proof of Theorem 64.6.4 is that the differential $(d\phi)_z : T_z(E) \to T_{\phi(z)}(F)$
of a fibre chart $\phi$ is not generally an isomorphism (unless $M$ is zero-dimensional). This observation follows
from the fact that $\dim(T_z(E)) = \dim(E) = \dim(M) + \dim(F) = \dim(T_{\pi(z)}(M)) + \dim(T_{\phi(z)}(F))$. However,
since the pointwise fibre set $\pi^{-1}(\{p\})$ is $C^1$ diffeomorphic to the fibre space $F$ (via the restriction
$\phi|_{\pi^{-1}(\{p\})}$ of $\phi$), it is not at all surprising that the restricted tangent space $\ker((d\pi)_z)$ is
linear-space isomorphic to the linear space $T_{\phi(z)}(F)$ via the restricted differential $(d\phi)_z|_{\ker((d\pi)_z)}$
of the $C^1$ map $\phi : E \to F$.

The subset $\pi^{-1}(\{p\})$ of $E$ is a regular $C^1$ submanifold of $E$ by virtue of the diffeomorphism
$\phi|_{\pi^{-1}(\{p\})} : \pi^{-1}(\{p\}) \to F$. (See Definition 52.3.7 for regular submanifolds of differentiable manifolds.)


64.6.4 THEOREM: Isomorphism between vertical vectors and fibre space tangent vectors.
Let $(E, \pi, M, A_E^F)$ be a $C^1$ $F$-bundle. Let $\phi \in A_E^F$ and $z \in \mathrm{Dom}(\phi)$. Then the drop function
$\varpi^\phi_z : \ker((d\pi)_z) \to T_{\phi(z)}(F)$ is a linear space isomorphism.


PROOF: The linearity of $\varpi^\phi_z : \ker((d\pi)_z) \to T_{\phi(z)}(F)$ follows from the definition of the differential of the
$C^1$ map $\phi$ between the two $C^1$ manifolds $E$ and $F$ at a single point $z \in E$. (See Definition 58.4.5 for the
differential of a differentiable map between manifolds.)

Let $p = \pi(z)$. Then $\phi|_{\pi^{-1}(\{p\})} : \pi^{-1}(\{p\}) \to F$ is a $C^1$ diffeomorphism by Theorem 64.3.9 (ii). Therefore
$(d\phi)_z|_{\ker((d\pi)_z)} : \ker((d\pi)_z) \to T_{\phi(z)}(F)$ is a linear space isomorphism.


64.6.5 REMARK: The global drop function for a differentiable fibre bundle. 
The drop function in Definition 64.6.6 may be thought of as the “total drop function” for each $\phi \in A_E^F$. The
analogous construction for a double tangent bundle is given by Definition 59.2.15.


64.6.6 DEFINITION: Fibre-chart-dependent global drop function for general fibre bundles.
The drop function for a $C^1$ $F$-bundle $(E, \pi, M, A_E^F)$, for a fibre chart $\phi \in A_E^F$, is the linear map
$\varpi^\phi : \bigcup_{z \in \mathrm{Dom}(\phi)} T_{z,0}(E) \to T(F)$ which is defined by

\[ \forall z \in \mathrm{Dom}(\phi),\ \forall y \in T_{z,0}(E),\quad \varpi^\phi(y) = \varpi^\phi_z(y). \]

In other words, $\varpi^\phi = \bigcup_{z \in \mathrm{Dom}(\phi)} \varpi^\phi_z$.


64.6.7 REMARK: Transformation rules for fibre bundle drop functions. 

One naturally asks what the transformation rules are for the chart-dependent fibre bundle drop functions. 
Theorem 64.6.8 expresses these in terms of the fibre chart transition maps in Definition 64.3.13. (These 
transition maps are applicable to the definition of covariant derivatives and curvature for general connections.) 


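64.6.8 THEOREM: Transformation rule between fibre bundle drop functions.
Let $(E, \pi, M, A_E^F)$ be a $C^1$ fibre bundle with fibre space $F$. Then

\[ \forall \phi_1, \phi_2 \in A_E^F,\ \forall z \in \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2),\ \forall y \in T_{z,0}(E), \]
\[ \varpi^{\phi_2}(y) = \big(g_{\phi_2,\phi_1}(\pi(z))\big)_*\big(\varpi^{\phi_1}(y)\big), \]

where $g_{\phi_2,\phi_1} : U_1 \cap U_2 \to C^1(F, F)$ is the fibre chart transition map for $\phi_1$ and $\phi_2$, and
$\mathrm{Dom}(\phi_\ell) = \pi^{-1}(U_\ell)$ for $\ell = 1, 2$.

PROOF: Let $(E, \pi, M, A_E^F)$ be a $C^1$ $F$-bundle. Let $\phi_1, \phi_2 \in A_E^F$ and $z \in \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)$.
Let $p = \pi(z)$ and $E_p = \pi^{-1}(\{p\})$. Then

\[ \forall y \in T_{z,0}(E),\quad \varpi^{\phi_2}(y) = (d\phi_2)_z(y) \qquad (64.6.1) \]
\[ \qquad = \big(d(\phi_2 \circ \phi_1|_{E_p}^{-1} \circ \phi_1)\big)_z(y) \]
\[ \qquad = \big(d(\phi_2 \circ \phi_1|_{E_p}^{-1})\big)_{\phi_1(z)}\big((d\phi_1)_z(y)\big) \qquad (64.6.2) \]
\[ \qquad = \big(d(g_{\phi_2,\phi_1}(p))\big)_{\phi_1(z)}\big((d\phi_1)_z(y)\big) \qquad (64.6.3) \]
\[ \qquad = \big(g_{\phi_2,\phi_1}(p)\big)_*\big(\varpi^{\phi_1}(y)\big), \]

where line (64.6.1) follows from Definitions 64.6.6 and 64.6.2, line (64.6.2) follows from Theorem 58.4.13,
the chain rule for $C^1$ manifold maps, and line (64.6.3) follows from Definition 64.3.13 line (64.3.2).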
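
The transformation rule can be tested numerically on a toy example. The following is a sketch under stated
assumptions only: the total space is R x R over the base R, the two fibre charts phi1 and phi2 are invented
for illustration, and differentials are approximated by central differences. The transition map g(p) is
deliberately nonlinear in the fibre variable, since no structure group is assumed here.

```python
# Toy model (an assumption for illustration): E = R x R, base M = R, fibre space F = R.

def phi1(z):
    """First fibre chart: (p, f) -> f."""
    p, f = z
    return f

def phi2(z):
    """Second fibre chart: (p, f) -> f + p^2 f^3, a diffeomorphism of each fibre."""
    p, f = z
    return f + p * p * f ** 3

def transition(p):
    """Transition map g(p) = phi2 o (phi1 restricted to E_p)^{-1} : F -> F."""
    return lambda f: f + p * p * f ** 3

def derivative(fn, t0, h=1e-6):
    """Central-difference estimate of the derivative of a real function at t0."""
    return (fn(t0 + h) - fn(t0 - h)) / (2 * h)

def drop(chart, z, w):
    """Chart-dependent drop of the vertical vector (0, w) at z = (p, f)."""
    p, f = z
    return derivative(lambda t: chart((p, f + t * w)), 0.0)

z = (0.5, 2.0)   # point of the total space
w = 3.0          # fibre component of a vertical vector at z

lhs = drop(phi2, z, w)
# Push the phi1-drop forward by the differential of g(p) at phi1(z).
rhs = derivative(transition(z[0]), phi1(z)) * drop(phi1, z, w)
```

Both sides evaluate to approximately 12 for this choice of z and w, so the two chart-dependent drops differ
exactly by the differential of the transition map, as the theorem asserts.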


64.6.9 REMARK: Vector bundle drop functions. 

After a vertical vector at a total space element $z \in E$ has been “dropped” onto $T(F)$ as in Definition 64.6.2, it
is not possible to drop it further from $T(F)$ to $F$ in a meaningful way unless $F$ is a linear space. But if $F$ is a
linear space, one may apply a linear space drop function as discussed in Remark 53.3.13 and defined in
Definition 54.9.5. Thus if $\varpi : T(F) \to F$ is the natural concrete identification (i.e. drop function) from the
tangent bundle $T(F)$ to the finite-dimensional linear space $F$, then one obtains the composition
$\varpi \circ \varpi^\phi_z : T_{z,0}(E) \to F$ for each $\phi \in A_E^F$ and $z \in \mathrm{Dom}(\phi)$. (This is done in Definition 65.3.5.)
One could refer to this as a “vector bundle drop function”, as contrasted to the more general “fibre bundle
drop function” in Definition 64.6.2.


64.7. Differentiable cross-sections and naive derivative operators 


64.7.1 REMARK:  Differentiable cross-sections of differentiable fibrations. 

The set $X(E, \pi, M)$ of cross-sections of a non-topological fibration $(E, \pi, M)$ is defined in Definition 21.3.3
and Notation 21.3.4. Continuous cross-sections of topological fibrations are defined in Definition 47.4.6.
Differentiable cross-sections of differentiable fibrations follow the same pattern by adding a differentiability 
requirement to the basic cross-section definition. 


Definition 64.7.2 and Notation 64.7.3 apply also to $C^k$ fibre bundles. In general, any definition for fibrations
applies to the corresponding class of fibre bundles.


64.7.2 DEFINITION: A $C^k$ cross-section of a $C^k$ fibration $(E, \pi, M)$ for $k \in \bar{\mathbb{Z}}^+$ is a $C^k$ map $X : M \to E$
such that $\pi \circ X = \mathrm{id}_M$. In other words, $X \in X(E, \pi, M) \cap C^k(M, E)$.

A $C^k$ local cross-section of a $C^k$ fibration $(E, \pi, M)$ with domain $U \in \mathrm{Top}(M)$, for $k \in \bar{\mathbb{Z}}^+$, is a $C^k$ map
$X : U \to E$ such that $\pi \circ X = \mathrm{id}_U$. In other words, $X \in X(E, \pi, M \,|\, U) \cap C^k(U, E)$.


64.7.3 NOTATION: $X^k(E, \pi, M)$, for a $C^k$ fibration $(E, \pi, M)$ and $k \in \bar{\mathbb{Z}}^+$, denotes the set of $C^k$
cross-sections of $(E, \pi, M)$. In other words, $X^k(E, \pi, M) = X(E, \pi, M) \cap C^k(M, E)$.

$X^k(E, \pi, M \,|\, U)$, for a $C^k$ fibration $(E, \pi, M)$, $U \in \mathrm{Top}(M)$ and $k \in \bar{\mathbb{Z}}^+$, denotes the set of $C^k$ local
cross-sections of $(E, \pi, M)$ with domain $U$. In other words, $X^k(E, \pi, M \,|\, U) = X(E, \pi, M \,|\, U) \cap C^k(U, E)$.

$\mathring{X}^k(E, \pi, M)$, for a $C^k$ fibration $(E, \pi, M)$ and $k \in \bar{\mathbb{Z}}^+$, denotes the set of $C^k$ local cross-sections
of $(E, \pi, M)$ with any domain $U \in \mathrm{Top}(M)$. In other words,
$\mathring{X}^k(E, \pi, M) = \bigcup_{U \in \mathrm{Top}(M)} X^k(E, \pi, M \,|\, U)$.


64.7.4 REMARK: Replacing given-atlas tests with complete-atlas tests. 

The spaces $C^k(M, E)$ and $C^k(U, E)$ in Definition 64.7.2 and Notation 64.7.3 are defined in terms of charts in
$\mathrm{atlas}(M)$ and $\mathrm{atlas}(E)$. (See Definition 52.1.2.) But sometimes it is necessary to use charts which are only
$C^k$ compatible with those atlases. Theorem 64.7.5 asserts that $C^k$ cross-sections are $C^k$ maps when viewed
via any pair of $C^k$ compatible charts.


64.7.5 THEOREM: Cross-section $C^k$ differentiability tests using $C^k$ compatible charts.
Let $k \in \bar{\mathbb{Z}}^+$. Let $(E, \pi, M)$ be a $C^k$ fibration. Let $U \in \mathrm{Top}(M)$. Let $X \in X(E, \pi, M)$.

(i) $X \in X^k(E, \pi, M)$ if and only if
$\forall \psi_1 \in \mathrm{atlas}^k(M),\ \forall \psi_2 \in \mathrm{atlas}^k(E)$, $\psi_2 \circ X \circ \psi_1^{-1}$ is a $C^k$ map.
(ii) $X \in X^k(E, \pi, M)$ if and only if
$\forall \psi_1 \in \mathrm{atlas}^k(M),\ \forall \psi_2 \in \mathrm{atlas}^k(E)$, $\psi_2 \circ X \circ \psi_1^{-1} \in C^k(\psi_1(X^{-1}(\mathrm{Dom}(\psi_2))), \mathrm{Range}(\psi_2))$.
(iii) $X \in X^k(E, \pi, M \,|\, U)$ if and only if
$\forall \psi_1 \in \mathrm{atlas}^k(M),\ \forall \psi_2 \in \mathrm{atlas}^k(E)$, $\psi_2 \circ X \circ \psi_1^{-1}$ is a $C^k$ map.
(iv) $X \in X^k(E, \pi, M \,|\, U)$ if and only if
$\forall \psi_1 \in \mathrm{atlas}^k(M),\ \forall \psi_2 \in \mathrm{atlas}^k(E)$, $\psi_2 \circ X \circ \psi_1^{-1} \in C^k(\psi_1(X^{-1}(\mathrm{Dom}(\psi_2))), \mathrm{Range}(\psi_2))$.


PROOF: Part (i) follows from Notation 64.7.3 and Theorem 52.1.15. 
Part (ii) follows from Notation 64.7.3 and Theorem 52.1.15. 
Part (iii) follows from Notation 64.7.3 and Theorem 52.1.15. 
Part (iv) follows from Notation 64.7.3 and Theorem 52.1.15. 


64.7.6 REMARK:  Naive differentiation of fibration cross-sections by vectors. 
Definition 64.7.7 defines the action of a base-space vector on a cross-section of a differentiable fibration. This 
closely corresponds to the action of a base-space vector on a vector field on a manifold in Definition 61.2.3. 


The partial derivative notation $\partial_V Y$ emphasises that this naive derivative is not the same as the covariant
derivative which is denoted $D_V Y$ in Definition 68.2.9. Nor is it the same as the Lie derivative which is
denoted $L_V Y$. Both $D_V Y$ and $L_V Y$ are constructed from $\partial_V Y$ by subtracting suitable “correction factors”.


64.7.7 DEFINITION: The action of a vector $V \in T(M)$ on $C^1$ cross-sections of a $C^1$ fibration $(E, \pi, M)$ is
the map $\partial_V : X^1(E, \pi, M) \to T(E)$ defined by

\[ \forall V \in T(M),\ \forall Y \in X^1(E, \pi, M),\quad \partial_V Y = Y_*(V), \]

where $Y_* : T(M) \to T(E)$ is the induced map of $Y$.


64.7.8 REMARK: The action of a vector on a cross-section resides in the total-space tangent space. 

The naive differential derivative $\partial_V Y$ in Definition 64.7.7 is the rate of change of the differentiable
cross-section $Y$ in the direction $V$. This rate of change is not in the tangent space $T(M)$. It is in the tangent
space $T(E)$ of the total space $E$. In fact, $\forall p \in M,\ \forall V \in T_p(M),\ \forall Y \in X^1(E, \pi, M),\ \partial_V Y \in T_{Y(p)}(E)$.


64.7.9 THEOREM: Linearity of action of a vector on cross-sections of vector bundles. 
Let $(E, \pi, M)$ be a $C^1$ fibration. Let $Y \in X^1(E, \pi, M)$ and $p \in M$. Then the map $V \mapsto \partial_V Y$ from $T_p(M)$
to $T_{Y(p)}(E)$ is a linear map.


PROOF: The assertion follows from Definition 64.7.7 and Theorem 58.4.9 (iii).


64.7.10 THEOREM: The projection map differential is a left inverse of all cross-section differentials. 
Let $(E, \pi, M)$ be a $C^1$ fibration. Let $X \in \mathring{X}^1(E, \pi, M)$. Let $U = \mathrm{Dom}(X)$. Then

\[ \forall p \in U,\ \forall V \in T_p(M),\quad (d\pi)_{X(p)}\big((dX)_p(V)\big) = V. \]

In other words,

\[ \forall p \in U,\quad (d\pi)_{X(p)} \circ (dX)_p = \mathrm{id}_{T_p(M)}. \]

In other words, $\pi_* \circ X_* = \mathrm{id}_{T(U)}$.


PROOF: The assertions follow from the equation $\pi \circ X = \mathrm{id}_U$ in Definition 64.7.2 and Theorem 58.4.13,
the chain rule for $C^1$ manifold maps.
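
The left-inverse property can be illustrated numerically. The following sketch assumes a toy fibration with
total space R x R over the base R and an invented cross-section p -> (p, sin p); the helper names are
hypothetical and the differentials are central-difference approximations rather than exact tangent maps.

```python
import math

# Toy model (an assumption for illustration): E = R x R, base M = R.

def proj(z):
    """Projection map pi : E -> M, taking (p, f) to p."""
    return z[0]

def section(p):
    """A sample C^1 cross-section X : M -> E with pi(X(p)) = p."""
    return (p, math.sin(p))

def d_section(p, V, h=1e-6):
    """Approximate (dX)_p(V) as a pair of components in T(E)."""
    a, b = section(p + h * V), section(p - h * V)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

def d_proj(z, y, h=1e-6):
    """Approximate (d pi)_z(y) for a tangent vector y = (y0, y1) at z."""
    z_plus = tuple(zi + h * yi for zi, yi in zip(z, y))
    z_minus = tuple(zi - h * yi for zi, yi in zip(z, y))
    return (proj(z_plus) - proj(z_minus)) / (2 * h)

p, V = 0.7, 2.0
y = d_section(p, V)               # naive derivative of X along V, a vector in T(E)
recovered = d_proj(section(p), y)  # should reproduce V
```

Here y has a non-zero fibre component (approximately 2 cos(0.7)), yet projecting it back with the differential
of the projection map returns the original base-space vector V.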


64.7.11 REMARK: Substitution of cross-section velocities into oblique double tangent vector formulas. 

It often happens in the theory of connections that a vector $y \in T_{z,V}(E)$ is required, where $(E, \pi_E, M)$ is a
$C^1$ fibration. In such a situation, the input parameters $z \in E$ and $V \in T_{\pi_E(z)}(M)$ are given, and the task is
to find a convenient vector in $T_{z,V}(E)$. Theorem 64.7.12 asserts that wherever a formula requires an input
$y \in T_{z,V}(E)$, it is possible to substitute $\partial_V X$ for $y$ for any $X \in \mathring{X}^1(E, \pi_E, M)$ with $X(p) = z$. This satisfies
the requirement, although the new task is then to find $X$ satisfying $X(p) = z$.

64.7.12 THEOREM: Horizontal component of naive derivatives of fibration cross-sections.
Let $(E, \pi, M)$ be a $C^1$ fibration. Then

\[ \forall X \in \mathring{X}^1(E, \pi, M),\ \forall p \in \mathrm{Dom}(X),\ \forall V \in T_p(M), \]
\[ \partial_V X = (dX)_p(V) \in T_{X(p),V}(E). \]

(See Notation 64.5.6 for $T_{X(p),V}(E)$.)


PROOF: The assertion follows from Definition 64.7.7 and Theorems 64.7.10 and 64.7.9.


64.7.13 REMARK: Gauge pull-back of differential form on fibrations. 
Definition 64.7.14 introduces a very general kind of “gauge pull-back” of a differential form from the total 
space of a fibration to the base space. (This could be further generalised to all covariant tensor fields, but it 
is the differential forms which are most useful in gauge theory.) 


The gauge pull-back concept is an application to fibrations of the vector-valued differential form pull-back 
concept in Definition 58.11.10 for general C! maps between C! manifolds. 


In gauge theory, the cross-section X is referred to as the “gauge”. One or more such gauges may be used 
in the analysis of a field theory scenario to express equations in a convenient way. Physicists generally work 
with fields which are defined on the base space, whereas mathematicians typically work with fields on fibre 
bundle total spaces. 


Some typical applications of the “gauge pull-back” concept are Definitions 69.11.3 and 70.6.2, which are for 
differential forms of degree 1 and 2 respectively. 


64.7.14 DEFINITION: A gauge pull-back of a differential form $\omega \in X(\Lambda_m(T(E), W))$ on the total space of
a $C^1$ fibration $(E, \pi, M)$ by a cross-section $X \in X^1(E, \pi, M)$, where $m \in \mathbb{Z}^+$ and $W$ is a linear space, is the
differential form $\omega_X \in X(\Lambda_m(T(M), W))$ which satisfies

\[ \forall p \in M,\ \forall V = (V_i)_{i=1}^m \in T_p(M)^m,\quad
   \omega_X(V) = \omega\big((X_*(V_i))_{i=1}^m\big) = \omega\big((\partial_{V_i} X)_{i=1}^m\big). \]


64.7.15 REMARK: Constant cross-section extension differentiability. 

Theorem 57.2.20 (ii) asserts that constant vector field extensions of tangent vectors are $C^k$ for tangent
bundles of $C^{k+1}$ manifolds. Theorem 64.7.16 makes the corresponding assertion for constant extensions of
total space elements to cross-sections of general differentiable fibrations.


64.7.16 THEOREM: Differentiability of constant cross-sections of differentiable fibrations.
Let $k \in \bar{\mathbb{Z}}^+$. Let $(E, \pi, M)$ be a $C^k$ $F$-fibration. Let $\phi$ be a $C^k$ $F$-fibre chart on $E$. Then

\[ \forall z \in \mathrm{Dom}(\phi),\quad \mathrm{Ext}_{\pi,\phi}(z) \in X^k(E, \pi, M \,|\, \pi(\mathrm{Dom}(\phi))). \]

(See Definition 21.6.8 for the constant cross-section extension $\mathrm{Ext}_{\pi,\phi}(z)$ of $z$.)


PROOF: Let $z \in \mathrm{Dom}(\phi)$. By Definition 21.6.8, $\mathrm{Ext}_{\pi,\phi}(z) = (\pi|_{\mathrm{Dom}(\phi)} \times \phi)^{-1}(\cdot, \phi(z))$. By
Definition 64.3.2, the map $(\pi|_{\mathrm{Dom}(\phi)} \times \phi)^{-1} : \pi(\mathrm{Dom}(\phi)) \times F \to \mathrm{Dom}(\phi)$ is $C^k$. Therefore
$\mathrm{Ext}_{\pi,\phi}(z)$ is $C^k$ by Theorem 52.6.8 (i). Hence $\mathrm{Ext}_{\pi,\phi}(z) \in X^k(E, \pi, M \,|\, \pi(\mathrm{Dom}(\phi)))$ by Definition 64.7.2.


64.7.17 REMARK: The differential of a constant cross-section is a “zero vector” in some sense. 
Unsurprisingly, the differential of the constant cross-section in Theorem 64.7.16 is some kind of zero vector.
It is in fact the vector $H_{V,\phi}(z)$, as shown in Theorem 64.7.18. This is not precisely a zero vector, but rather
a vector with zero vertical component when viewed through the same fibre chart for which the cross-section
is constant. The horizontal component is $V$, which is typically non-zero.


64.7.18 THEOREM: The differential of a constant cross-section has zero vertical component.
Let $(E, \pi_E, M)$ be a $C^1$ $F$-fibration. Let $\phi$ be a $C^1$ $F$-fibre chart on $E$. Then

\[ \forall z \in \mathrm{Dom}(\phi),\ \forall V \in T_{\pi_E(z)}(M),\quad
   \big(d\,\mathrm{Ext}_{\pi_E,\phi}(z)\big)_{\pi_E(z)}(V) = H_{V,\phi}(z). \]

(See Definition 64.5.13 for $H_{V,\phi}(z)$.) Hence or otherwise,

\[ \forall z \in \mathrm{Dom}(\phi),\ \forall V \in T_{\pi_E(z)}(M),\quad
   \phi_*\big(\partial_V \mathrm{Ext}_{\pi_E,\phi}(z)\big) = 0_{T_{\phi(z)}(F)}. \qquad (64.7.1) \]


PROOF: Let $z \in \mathrm{Dom}(\phi)$, $p = \pi_E(z)$ and $V \in T_p(M)$. Then

\[ \big(d\,\mathrm{Ext}_{\pi_E,\phi}(z)\big)_p(V) = \big(d((\pi_E \times \phi)^{-1}(\cdot, \phi(z)))\big)_p(V) \]
\[ \qquad = \big(d(\pi_E \times \phi)^{-1}\big)_{(p,\phi(z))}\big(V, 0_{T_{\phi(z)}(F)}\big) \qquad (64.7.2) \]
\[ \qquad = H_{V,\phi}(z) \]

by Definition 64.5.13, where line (64.7.2) follows from Theorem 58.6.7 line (58.6.3). Then line (64.7.1) follows
from Definition 58.1.2.


64.8. Differentiable fibre bundles 


64.8.1 REMARK:  Differentiable manifolds with structure groups which are manifolds. 
The fibre bundles presented in Section 64.8 are assumed to have structure groups which are finite-dimensional 
differentiable manifolds. Therefore all of the relevant spaces are manifolds: the base space, total space, fibre 
space and structure group. Although this is very convenient for analysis, the more general case of structure 
groups which are not finite-dimensional manifolds could be useful in some applications. 


64.8.2 REMARK: Upgrading topological fibre bundles to differentiable fibre bundles. 

The definition of a topological fibre bundle in Section 47.3 involves four topological spaces: the total space $E$,
the base space $M$, the fibre space $F$, and the structure group $G$. The definition also specifies three maps:
the projection map $\pi : E \to M$, the group operation $\sigma : G \times G \to G$ and the group action $\mu : G \times F \to F$.
Additionally there is an atlas $A_E^F$ of fibre maps $\phi : \pi^{-1}(U) \to F$ for open sets $U \in \mathrm{Top}(M)$. A differentiable
fibre bundle is the same except that the topologies are replaced with atlases and continuity is replaced with
differentiability. The additional structures to be specified are differentiable manifold atlases for all four
spaces, and the maps are required to be suitably differentiable.


The differentiable $(G, F)$ fibre bundle in Definition 64.8.3 satisfies the conditions for a topological $(G, F)$
fibre bundle in Definition 47.6.5 if the atlases $A_E$, $A_M$, $A_G$ and $A_F$ are replaced with the corresponding
induced topologies $T_E = \mathrm{Top}(E)$, $T_M = \mathrm{Top}(M)$, $T_G = \mathrm{Top}(G)$ and $T_F = \mathrm{Top}(F)$. The other elements of
the specification tuple stay the same. (See Definition 63.4.2 for $C^k$ Lie left transformation groups.)


As mentioned in Remarks 51.3.9 and 49.2.4, the topology is always implied by a locally Cartesian atlas.
Therefore when one writes $(E, A_E)$, $(M, A_M)$, $(G, A_G)$ and $(F, A_F)$, these are always abbreviations for
$((E, T_E), A_E)$, $((M, T_M), A_M)$, $((G, T_G), A_G)$ and $((F, T_F), A_F)$ respectively.


64.8.3 DEFINITION: A $C^k$ (differentiable) $(G, F)$ (fibre) bundle, for a $C^k$ Lie left transformation group
$(G, F) \mathrel{-\!\!<} (G, A_G, F, A_F, \sigma_G, \mu)$, for $k \in \bar{\mathbb{Z}}^+$, is a tuple
$E \mathrel{-\!\!<} (E, \pi, M, A_E^F) \mathrel{-\!\!<} (E, A_E, \pi, M, A_M, A_E^F)$ which satisfies the following conditions.

(i) $(E, A_E)$ and $(M, A_M)$ are $C^k$ manifolds and $\pi : E \to M$ is $C^k$.

(ii) $\forall \phi \in A_E^F,\ \exists U \in \mathrm{Top}(M)$, $\phi \in C^k(\pi^{-1}(U), F)$ and $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a $C^k$ diffeomorphism.

(iii) $\bigcup_{\phi \in A_E^F} U_\phi = M$, where $U_\phi$ denotes $\pi(\mathrm{Dom}(\phi))$ for all $\phi \in A_E^F$.

(iv) $\forall \phi_1, \phi_2 \in A_E^F,\ \forall p \in U_{\phi_1} \cap U_{\phi_2},\ \exists g \in G,\ \phi_2 \circ \phi_1|_{E_p}^{-1} = L_g$.

(v) $\forall \phi_1, \phi_2 \in A_E^F$, $g_{\phi_2,\phi_1} \in C^k(U_{\phi_1} \cap U_{\phi_2}, G)$, where $g_{\phi_2,\phi_1}$ is defined by
$\forall p \in M,\ L_{g_{\phi_2,\phi_1}(p)} = \phi_2 \circ \phi_1|_{E_p}^{-1}$.

A $C^k$ (differentiable) ordinary fibre bundle, for $k \in \bar{\mathbb{Z}}^+$, is a $C^k$ $(G, F)$ fibre bundle for some $C^k$ Lie left
transformation group $(G, F)$.

$E$ is the total space of the fibre bundle.
$\pi$ is the projection map of the fibre bundle.
$M$ is the base space of the fibre bundle.
$A_E^F$ is the fibre atlas of the fibre bundle.
$G$ is the structure group of the fibre bundle.
$F$ is the fibre space of the fibre bundle.
$\phi$ is a fibre chart of the fibre bundle for $\phi \in A_E^F$.
$\pi^{-1}(\{p\})$ is the fibre set of the fibre bundle at $p \in M$.
Each element of $E$ is a fibre of the fibre bundle.
$\pi \times \phi$ is the local trivialisation of the fibre bundle by the fibre chart $\phi \in A_E^F$.
$g_{\phi_2,\phi_1}$ is the fibre chart transition map from $\phi_1$ to $\phi_2$, for $\phi_1, \phi_2 \in A_E^F$.

64.8.4 REMARK: Uniqueness of fibre chart transition map group element.

As mentioned in Remark 47.6.10 for the related case of topological fibre bundles, the group element $g \in G$
in Definition 64.8.3 (iv) is uniquely determined by $p$, $\phi_1$ and $\phi_2$ by Theorem 20.2.7 because the action of $G$
on $F$ is effective by Definition 63.4.2 (ii). Thus $g$ exists and is unique for all $p$, $\phi_1$ and $\phi_2$. So the function
$g_{\phi_2,\phi_1} : U_{\phi_1} \cap U_{\phi_2} \to G$ in Definition 64.8.3 (v) is well defined by
$\forall p \in M,\ L_{g_{\phi_2,\phi_1}(p)} = \phi_2 \circ \phi_1|_{E_p}^{-1}$.


64.8.5 THEOREM: Differentiable fibre bundles are derived from topological fibre bundles.
Let $(E, A_E, \pi, M, A_M, A_E^F)$ be a $C^k$ differentiable $(G, F)$ fibre bundle for a $C^k$ Lie left transformation group
$(G, A_G, F, A_F, \sigma_G, \mu_G^F)$ for some $k \in \bar{\mathbb{Z}}^+$.

(i) $(E, T_E, \pi, M, T_M, A_E^F)$ is a topological $(G, F)$ fibre bundle for topological left transformation group
$(G, T_G, F, T_F, \sigma_G, \mu_G^F)$, where $T_E$, $T_M$, $T_G$ and $T_F$ are the topologies induced by atlases $A_E$, $A_M$, $A_G$
and $A_F$ on $E$, $M$, $G$ and $F$ respectively.

(ii) $(E, \pi, M, A_E^F)$ is a non-topological $(G, F)$ fibre bundle for the transformation group $(G, F, \sigma_G, \mu_G^F)$.


PROOF: For part (i), Definition 47.6.5 conditions (i), (ii) and (iii) follow from Definition 64.8.3 conditions
(i), (ii) and (iii) respectively. Definition 47.6.5 condition (iv) follows from Definition 64.8.3 conditions (iv)
and (ii). Definition 47.6.5 condition (v) follows from Definition 64.8.3 condition (v).
Part (ii) follows from part (i) and Theorem 47.6.6.


64.8.6 REMARK: Manifold charts for spaces in a differentiable ordinary fibre bundle. 
The maps and spaces in Definition 64.8.3 are illustrated in Figure 64.8.1.

Figure 64.8.1 Differentiable fibre bundle maps and spaces

Figure 64.8.1 is essentially the same as Figure 47.6.2, except that manifold charts have been added. As
mentioned in Remark 47.6.4, the map $\bar\mu_G^F$ denotes the function-valued function $\bar\mu_G^F : G \to (F \to F)$ defined
by $\bar\mu_G^F(g)(f) = \mu_G^F(g, f)$, where $\mu_G^F : G \times F \to F$ is the action of $G$ on $F$. The circle on the arrow
for $\bar\mu_G^F$ in Figure 47.6.2 indicates that the map is of the form $\bar\mu_G^F : G \to (F \to F)$, not $\bar\mu_G^F : G \to F$. That
is, for all $g \in G$, $\bar\mu_G^F(g)$ is a map $\bar\mu_G^F(g) : F \to F$. In this case, the circled arrow $\bar\mu_G^F$ is the action of a
transformation group $G$ on $F$.

The manifold dimensions in Figure 64.8.1 are $n = \dim(M)$, $m = \dim(E)$, $n_F = \dim(F)$ and $n_G = \dim(G)$.
The domains and definitions of the manifold charts $\psi_M \in A_M$, $\psi_E \in A_E$, $\psi_F \in A_F$ and $\psi_G \in A_G$ are
independent of each other and of the domains and definitions of the fibre charts $\phi : \pi^{-1}(U) \to F$.


64.8.7 REMARK: Analyticity and differentiability of structure groups for fibre bundles. 
The Lie transformation group in Definition 64.8.3 is defined in Section 63.4. Although the group itself is
assumed to be analytic, the action on the fibre $F$ is only assumed to be $C^k$.


If the group is assumed to be only $C^k$, it will be analytic anyway although this is non-trivial to prove. It
also turns out that if the action $\mu : G \times F \to F$ of $G$ on $F$ is $C^k$ with respect to $F$, then it is also $C^k$ with
respect to $G$. (See Remark 62.1.2 for Hilbert's fifth problem.)

The structure group $G$ for a fibre bundle has the role of classifying parallelism according to how much
structure is preserved. If the structure group is large, not much fibre structure is preserved under parallel
translations. A small structure group ensures that more structure is preserved.


64.8.8 EXAMPLE: The trivial $((\mathbb{R}, +), \mathbb{R})$ fibre bundle on $\mathbb{R}^4$.

The trivial $((\mathbb{R}, +), \mathbb{R})$ fibre bundle on $\mathbb{R}^4$ is useful for deriving Maxwell’s equations for electromagnetism
from gauge theory. The additive group $(\mathbb{R}, +)$ must not be confused with the multiplicative matrix group
$GL(1, \mathbb{R})$. However, the Lie algebra of $(\mathbb{R}, +)$ is the same as the Lie algebra of $U(1) = U(1, \mathbb{C})$.

Let $M = \mathbb{R}^4$, $F = \mathbb{R}$, $E = M \times F$ and $G = \mathbb{R}$, together with the $C^\infty$ differentiable manifold atlases which
each contain only a single chart, namely the identity map. In other words, the atlases are $A_M = \{\mathrm{id}_M\}$,
$A_F = \{\mathrm{id}_F\}$, $A_E = \{\mathrm{id}_M \times \mathrm{id}_F\} = \{\mathrm{id}_E\}$ and $A_G = \{\mathrm{id}_G\}$. (The manifold $\mathbb{R}^4 \times \mathbb{R}$ is assumed to be
identified with $\mathbb{R}^5$ via the usual real-tuple concatenation map.)

Define $\pi_E : E \to M$ and $\phi : E \to F$ by $\pi_E : (p, f) \mapsto p$ and $\phi : (p, f) \mapsto f$ for $(p, f) \in E$. Let $A_E^F = \{\phi\}$. Then
$E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F) \mathrel{-\!\!<} (E, A_E, \pi_E, M, A_M, A_E^F)$ is a $C^\infty$ $(G, F)$ fibre bundle by Definition 20.1.2.

Let $G$ have the usual real-number addition operation $\sigma_G : G \times G \to G$ with $\sigma_G : (g_1, g_2) \mapsto g_1 + g_2$. Then the
identity of $G$ is $e = 0_{\mathbb{R}} = 0$, and $T_e(G) = \{t_{0,v,\psi};\ v \in \mathbb{R}\}$, where $\psi = \mathrm{id}_G$. So $T_e(G)$ may be identified with
$\mathbb{R}$ via the map $t_{0,v,\psi} \mapsto v$. Define the action of $G$ on $F$ to be the additive translation map $\mu : G \times F \to F$
with $\mu : (g, f) \mapsto g + f$. (See Definition 20.1.2.)

Regarding Definition 64.8.3 condition (iv), the fibre set $E_p$ equals $\{p\} \times \mathbb{R}$ for all $p \in \mathbb{R}^4$. It follows that
$\phi|_{E_p}^{-1} : F \to E_p$ maps $f$ to $(p, f)$ for all $f \in F$. So $\phi_2 \circ \phi_1|_{E_p}^{-1} = \phi \circ \phi|_{E_p}^{-1} = \mathrm{id}_F = \mathrm{id}_{\mathbb{R}} = L_e = L_0$ for
all $\phi_1, \phi_2 \in A_E^F$ and $p \in \mathbb{R}^4$. Thus condition (iv) is satisfied with $g = e = 0$. Hence $g_{\phi_2,\phi_1} : M \to G$ in
condition (v) is a constant function $g_{\phi_2,\phi_1} : p \mapsto e$, which is $C^\infty$ for all $\phi_1, \phi_2 \in A_E^F$, by Theorem 52.1.9.


(See Example 66.1.6 for a continuation of Example 64.8.8 to principal bundles.) 


64.8.9 REMARK: Transformation rules for differentials of fibre chart transition maps. 
The fibre chart transition map condition $\exists g \in G,\ \phi_2 \circ \phi_1|_{E_p}^{-1} = L_g$ in Definition 64.8.3 (iv) puts a strong
constraint on the transition map $\phi_2 \circ \phi_1|_{E_p}^{-1}$. This then places a strong constraint on the differential of the
transition map. This constraint depends on the variation of the group element $g$ with respect to the base
set $M$, which is unspecified except that it is of class $C^k$ by Definition 64.8.3 (v).


Theorem 64.8.10 line (64.8.1) indicates how the differentials $(d\phi_1)_z$ and $(d\phi_2)_z$ are related via the value and
differential of the function $g_{\phi_2,\phi_1} : \pi_E(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)) \to G$. Lines (64.8.2) and (64.8.3) show more
clearly the dependence of $(d\phi_2)_z$ on $\phi_1$ via a horizontal term depending on the value $\phi_1(z)$ and a vertical term
depending on the differential $(d\phi_1)_z$. (The expression $(dL_{g(p)^{-1}})_{g(p)}$ is the “Maurer-Cartan form” at $g(p)$.
See Definition 62.5.4. See also the comments about this differential in Remark 64.8.13.)


Note that Theorem 64.8.10 applies to individual vectors $w \in T_z(E)$ in the tangent bundle of the total space $E$.
It makes no assumptions about the existence or properties of vector fields on $E$. By contrast, Theorems
64.13.11 and 64.14.5 assume the existence of vertical and non-vertical vector fields on $E$ respectively, which
are generated by Lie algebra elements of the structure group via fibre charts.
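
64.8.10 THEOREM: Transformation rule for the differential of a fibre chart transition map.
Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle. Let $\phi_2 \circ \phi_1|_{E_p}^{-1} = L_{g_{\phi_2,\phi_1}(p)} : F \to F$ for all $p \in M$ and
$\phi_1, \phi_2 \in A_{E,p}^F$, as in Definition 64.8.3 (iv). Then

\[ \forall p \in M,\ \forall z \in E_p,\ \forall \phi_1, \phi_2 \in A_{E,p}^F,\ \forall w \in T_z(E), \]
\[ (d\phi_2)_z(w) = X_{u(g,p,z,w)}(\phi_2(z)) + (dL_{g(p)})_{\phi_1(z)}\big((d\phi_1)_z(w)\big), \qquad (64.8.1) \]

where $u(g, p, z, w) = (dR_{g(p)^{-1}})_{g(p)}\big((dg)_p((d\pi_E)_z(w))\big)$ for all $w \in T_z(E)$, and $g_{\phi_2,\phi_1}$ is abbreviated to $g$. Hence

\[ \forall p \in M,\ \forall z \in E_p,\ \forall \phi_1, \phi_2 \in A_{E,p}^F,\ \forall w \in T_z(E), \]
\[ (d\phi_2)_z(w) = (dL_{g(p)})_{\phi_1(z)}\big( X_{\mathrm{Adj}(g(p)^{-1})\,u(g,p,z,w)}(\phi_1(z)) + (d\phi_1)_z(w) \big) \qquad (64.8.2) \]
\[ \qquad\quad\ \, = (dL_{g(p)})_{\phi_1(z)}\big( X_{\tilde u(g,p,z,w)}(\phi_1(z)) + (d\phi_1)_z(w) \big), \qquad (64.8.3) \]

where $\tilde u(g, p, z, w) = (dL_{g(p)^{-1}})_{g(p)}\big((dg)_p((d\pi_E)_z(w))\big)$.


PROOF: Let $p \in M$, $z \in E_p$, $\phi_1, \phi_2 \in A_{E,p}^F$ and $w \in T_z(E)$. Then $\phi_2(\tilde z) = g(\pi_E(\tilde z))\,\phi_1(\tilde z)$ for all
$\tilde z \in \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)$. So by Theorem 63.6.31 line (63.6.5),

\[ (d\phi_2)_z(w) = X_u(\phi_2(z)) + (dL_{g(p)})_{\phi_1(z)}\big((d\phi_1)_z(w)\big), \]

where $u = (dR_{g(p)^{-1}})_{g(p)}\big((dg)_p((d\pi_E)_z(w))\big)$. This verifies line (64.8.1). Then

\[ X_u(\phi_2(z)) = \big(d(L_{g(p)} \circ L_{g(p)^{-1}})\big)_{\phi_2(z)}\big(X_u(g(p)\phi_1(z))\big) \]
\[ \qquad\quad\ \ = (dL_{g(p)})_{\phi_1(z)}\big(X_{\mathrm{Adj}(g(p)^{-1})u}(\phi_1(z))\big) \]

by Theorem 63.6.15. This verifies line (64.8.2). Let $V = (d\pi_E)_z(w)$. Then by Theorem 62.6.7 (iii),

\[ \mathrm{Adj}(g(p)^{-1})(u(g, p, z, w)) = \big(d(L_{g(p)^{-1}} \circ R_{g(p)})\big)_e\big((dR_{g(p)^{-1}})_{g(p)}((dg)_p(V))\big) \]
\[ \qquad = \big(d(L_{g(p)^{-1}} \circ R_{g(p)} \circ R_{g(p)^{-1}})\big)_{g(p)}\big((dg)_p(V)\big) \]
\[ \qquad = (dL_{g(p)^{-1}})_{g(p)}\big((dg)_p(V)\big). \]

This verifies line (64.8.3).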
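
For the additive group acting on the real line by translation, as in Example 64.8.8, the transformation rule
becomes simple enough to check by hand or numerically. The sketch below rests on several assumptions made
only for illustration: the base is one-dimensional, the second fibre chart is obtained from the first by an
invented gauge function g(p) = sin p, the fundamental vector field of a Lie algebra element u is the constant
field u, and the differential of every left translation is the identity. Under those assumptions line (64.8.1)
reduces to (d phi2)_z(w) = g'(p) V + (d phi1)_z(w) for w = (V, xi).

```python
import math

# Toy model (an assumption for illustration): E = R x R, M = R, G = (R, +), F = R.

def gauge(p):
    """Hypothetical transition map g : M -> G between the two fibre charts."""
    return math.sin(p)

def phi1(z):
    """First fibre chart: (p, f) -> f."""
    p, f = z
    return f

def phi2(z):
    """Second fibre chart: phi2(z) = g(pi(z)) + phi1(z), i.e. L_{g(p)} applied to phi1(z)."""
    p, f = z
    return gauge(p) + f

def differential(chart, z, w, h=1e-6):
    """Central-difference estimate of (d chart)_z(w) for a tangent vector w = (V, xi)."""
    plus = tuple(zi + h * wi for zi, wi in zip(z, w))
    minus = tuple(zi - h * wi for zi, wi in zip(z, w))
    return (chart(plus) - chart(minus)) / (2 * h)

z = (0.3, 1.0)    # point (p, f) of the total space
w = (2.0, -1.0)   # tangent vector with horizontal part V = 2 and fibre part xi = -1

lhs = differential(phi2, z, w)
# Gauge term: derivative of g at p along V, plus the unchanged phi1 term.
dg_V = (gauge(z[0] + 1e-6 * w[0]) - gauge(z[0] - 1e-6 * w[0])) / (2e-6)
rhs = dg_V + differential(phi1, z, w)
```

The term dg_V is the abelian counterpart of the Maurer-Cartan contribution: it vanishes exactly when the gauge
function is constant along V, in which case the two charts have the same differential on w.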


64.8.11 REMARK: Differential transition map between local trivialisations. 

Theorem 64.8.12 is an attempt to write an explicit formula for the map $(\pi_E \times \phi_2)_* \circ (\pi_E \times \phi_1)_*^{-1}$, which is
the transition map between differentials of local trivialisations $\pi_E \times \phi_1$ and $\pi_E \times \phi_2$. One could call this a
“differential transition map”. The domain and range of this differential map are both equal to the product
tangent bundle $T(U) \times T(F)$, where $U = \pi_E(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$.

It is important to note that $(\pi_E \times \phi_2)_* \circ (\pi_E \times \phi_1)_*^{-1}$, restricted to a subset $\{p\} \times T(F)$ of the domain,
is not equal to the moderately plausible map $\mathrm{id}_p \times (\phi_2 \circ \phi_1|_{E_p}^{-1})_*$. This fails because the composite map
$\phi_2 \circ \phi_1|_{E_p}^{-1} : F \to F$ varies with respect to $p \in M$ in general. Restricting this map to $E_p$ removes the
horizontal component, which is non-zero if $(dg)_p(V)$ is non-zero.

To put it another way, $((\pi_E \times \phi_2) \circ (\pi_E \times \phi_1)^{-1})(p, f) = (p, L_{g_{\phi_2,\phi_1}(p)} f)$ for all $(p, f) \in U \times F$, where the
fibre space element $L_{g_{\phi_2,\phi_1}(p)} f$ clearly varies with respect to both $p$ and $f$ in general. So the differential of
this transition map must take into account the differential of $g_{\phi_2,\phi_1}(p)$ with respect to $p$.
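
The origin of the extra term can be sketched informally with curves. The following is only an illustrative calculation under stated assumptions, not the proof used in this book: $t \mapsto (p(t), f(t))$ is assumed to be a $C^1$ curve in $U \times F$ with $p(0) = p$, $f(0) = f$, $p'(0) = V$ and $f'(0) = \xi$ (names introduced here for the illustration), and $g$ abbreviates the transition map of the two fibre charts.

```latex
\begin{align*}
\frac{d}{dt}\Big|_{t=0} g(p(t))\,f(t)
  &= \frac{d}{dt}\Big|_{t=0} g(p(t))\,f
   \;+\; \frac{d}{dt}\Big|_{t=0} g(p)\,f(t) \\
  &= \frac{d}{dt}\Big|_{t=0} \bigl(g(p(t))\,g(p)^{-1}\bigr)\bigl(g(p)f\bigr)
   \;+\; (dL_{g(p)})_f(\xi) \\
  &= X_u\bigl(g(p)f\bigr) \;+\; (dL_{g(p)})_f(\xi),
  \qquad
  u = (dR_{g(p)^{-1}})_{g(p)}\bigl((dg)_p(V)\bigr) \in T_e(G).
\end{align*}
```

The first term is the contribution of the variation of $g$ along the base space. It vanishes when $(dg)_p(V) = 0$, which is the only case in which the naive fibre-wise formula gives the right answer.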


64.8.12 THEOREM: Formula for transition map between differentials of trivialisations.
Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle. Let $\phi_2 \circ \phi_1|_{E_p}^{-1} = L_{g_{\phi_2,\phi_1}(p)} : F \to F$ for all $p \in M$ and
$\phi_1, \phi_2 \in A_{E,p}^F$, as in Definition 64.8.3 (iv). Then

$\forall p \in M,\ \forall z \in E_p,\ \forall \phi_1, \phi_2 \in A_{E,p}^F,\ \forall (V, \xi) \in T_p(M) \times T_{\phi_1(z)}(F)$,
$(d(\pi_E \times \phi_2))_z\big((d(\pi_E \times \phi_1))_z^{-1}(V, \xi)\big) = \big(V,\ (dL_{g(p)})_{\phi_1(z)}\big(X_{\tilde u(g,p,V)}(\phi_1(z)) + (d\phi_1)_z(w)\big)\big)$,

where $\tilde u(g,p,V) = (dL_{g(p)^{-1}})_{g(p)}((dg)_p(V))$, $w = (d(\pi_E \times \phi_1))_z^{-1}(V, \xi)$, and $g_{\phi_2,\phi_1}$ is abbreviated to $g$.

PROOF: The assertion follows from Theorem 64.8.10 line (64.8.3).


64.8.13 REMARK: The Maurer-Cartan form appears in differential transition maps. 

In the literature, one often sees the expression “$g^{-1}dg$”, sometimes referred to as the “Maurer-Cartan form”.
(See for example Frankel [12], page 476; Sternberg [38], pages 64-65; Drechsler /Mayer [262], pages 62-63, 204; 
Eriksson/Hàggblad/Strómbom [264], page 53; Daniel/Viallet [317], pages 176, 184, Sulanke/Wintgen [40], 
pages 128-130. See also Remark 69.13.11 for gauge transformation literature using this form.) 


However, it is often the expression “$g^{-1}$” which is called the “Maurer-Cartan form”. The composite function
$(dL_{g(p)^{-1}})_{g(p)} \circ (dg)_p$ in Theorem 64.8.12 corresponds to the expression “$g^{-1}dg$”. This may be considered to
be a $T_e(G)$-valued one-form on the base space tangent bundle $T(M)$, whereas the form “$g^{-1}$” appears to be
a one-form on the structure group tangent bundle $T(G)$.
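
For a matrix group, the composite $(dL_{g(p)^{-1}})_{g(p)} \circ (dg)_p$ is simply the matrix product of $g(p)^{-1}$ with the entry-wise differential of $g$. The following small calculation illustrates this. It assumes, purely for the example, that $M = \mathbb{R}$, $G = SO(2)$ and $g(\theta)$ is the rotation through angle $\theta$. None of these choices comes from the theorems above.

```latex
\[
g(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
\frac{dg}{d\theta} = \begin{pmatrix} -\sin\theta & -\cos\theta \\ \cos\theta & -\sin\theta \end{pmatrix},
\]
\[
g(\theta)^{-1}\,\frac{dg}{d\theta}
= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
  \begin{pmatrix} -\sin\theta & -\cos\theta \\ \cos\theta & -\sin\theta \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
```

So in this example “$g^{-1}dg$” is the constant Lie-algebra-valued one-form $J\,d\theta$ on the base space, where $J$ is the displayed skew-symmetric matrix, which is an element of $T_e(SO(2))$.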


64.9. Differentiable fibre bundle pull-back atlases 


64.9.1 REMARK:  Pull-back atlases for differentiable fibre bundles. 

Definition 64.9.2 extends Definition 52.2.8 from differentiable manifolds to differentiable fibre bundles. In 
practice, this pull-back atlas, constructed from the base space and fibre space via fibre charts, is more likely 
to be used than any native differentiable structure on the total space. The pull-back atlas is always available 
by the definition of a differentiable fibre bundle, whereas any native atlas is rarely discussed. (For the 
analogous pull-back atlas for product-structured differentiable manifolds, see Theorem 52.7.2 (i).) 


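64.9.2 DEFINITION: The pull-back atlas for a $C^k$ fibre bundle $(E, \pi_E, M, A_E^F)$, for $k \in \mathbb{Z}^+$, is the set
$\{(\psi_M \times \psi_F) \circ (\pi_E \times \phi);\ \psi_M \in \mathrm{atlas}(M),\ \psi_F \in \mathrm{atlas}(F),\ \phi \in A_E^F\}$.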


Alternative name: induced atlas. 
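
As a minimal illustration of this definition, consider the simplest possible case. The assumptions here are made only for the example: $E = M \times F$ is a product bundle, $\pi_E$ is the projection onto $M$, and the fibre atlas consists of the single fibre chart $\phi$ which projects $E$ onto $F$, so that $\pi_E \times \phi$ is the identity map on $M \times F$.

```latex
\[
\bigl((\psi_M \times \psi_F) \circ (\pi_E \times \phi)\bigr)(p, f)
  = (\psi_M \times \psi_F)(p, f)
  = \bigl(\psi_M(p),\, \psi_F(f)\bigr)
  \in \mathbb{R}^n \times \mathbb{R}^m
\]
for all $(p, f) \in \mathrm{Dom}(\psi_M) \times \mathrm{Dom}(\psi_F)$,
where $n = \dim(M)$ and $m = \dim(F)$.
```

Under these assumptions the pull-back atlas is just the usual product atlas on $M \times F$. For a general fibre bundle, the same formula holds locally after applying the local trivialisation $\pi_E \times \phi$.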


64.9.3 REMARK: Compatibility of pull-back atlases with total space differentiable structure. 

Theorem 64.9.4 (i) shows that the pull-back atlas for a $C^k$ fibre bundle is $C^k$ compatible with the given
native atlas on the total space, and part (ii) says the same thing in a slightly more technical way. (For the
product-structured differentiable manifold version of Theorem 64.9.4 (i, ii, iii), see Theorem 52.7.2 (ii, iii, iv).)

The pull-back atlas in Definition 64.9.2 does not depend on the differentiability class $C^k$. (Nor does it depend
on the structure group $G$, which is not even mentioned in the definition.) However, Theorem 64.9.4 (i, ii, iv)
seems to imply that some properties of the pull-back atlas do depend on the class $C^k$ of $E$. But if $E$
is $C^k$, then $E$ is $C^\ell$ for $\ell \le k$. So Theorem 64.9.4 makes a different assertion for the same fibre bundle for
multiple $C^\ell$ classes with $\ell \le k$. This does not in fact lead to a contradiction because $\mathrm{atlas}^\ell(E) \supseteq \mathrm{atlas}^k(E)$
whenever $\ell \le k$. So the multiple $C^\ell$ differentiability assertions are automatically compatible, as one would
hope and expect. The strongest assertions imply the weaker ones.


64.9.4 THEOREM:  Differentiability properties of pull-back atlases for differentiable fibre bundles. 
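Let $k \in \mathbb{Z}^+$. Let $E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F)$ be a $C^k$ $(G, F)$ fibre bundle. Let $A_0$ be the pull-back atlas for $E$.
(i) $A_0$ is a $C^k$ atlas for $E$ which is $C^k$ equivalent to $\mathrm{atlas}(E)$.
(ii) $A_0 \subseteq \mathrm{atlas}^k(E)$.
(iii) $\forall z \in E$, $A_{0,z} = \{(\psi_M \times \psi_F) \circ (\pi_E \times \phi);\ \psi_M \in \mathrm{atlas}_{\pi_E(z)}(M),\ \psi_F \in \mathrm{atlas}_{\phi(z)}(F),\ \phi \in A_{E,\pi_E(z)}^F\}$, where
$A_{0,z}$ denotes $\{\psi_0 \in A_0;\ z \in \mathrm{Dom}(\psi_0)\}$ for all $z \in E$.
(iv) $\forall z \in E$, $A_{0,z} \subseteq \mathrm{atlas}_z^k(E)$, where $A_{0,z}$ denotes $\{\psi_0 \in A_0;\ z \in \mathrm{Dom}(\psi_0)\}$ for all $z \in E$.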


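PROOF: For part (i), let $A_0 = \{(\psi_M \times \psi_F) \circ (\pi_E \times \phi);\ \psi_M \in \mathrm{atlas}(M),\ \psi_F \in \mathrm{atlas}(F),\ \phi \in A_E^F\}$, which is
the pull-back atlas for $E$ by Definition 64.9.2. Let $\psi_0 \in A_0$. Then $\psi_0 = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)$ for some $\psi_M \in
\mathrm{atlas}(M)$, $\psi_F \in \mathrm{atlas}(F)$ and $\phi \in A_E^F$. Let $\psi_E \in \mathrm{atlas}(E)$. Then $\psi_0 \circ \psi_E^{-1}$ is a $C^k$ map between Cartesian
spaces by Definitions 52.1.2, 52.6.2 and 64.8.3 (ii). Similarly, $\psi_E \circ \psi_0^{-1} = \psi_E \circ (\pi_E \times \phi)^{-1} \circ (\psi_M \times \psi_F)^{-1}$ is
a $C^k$ map between Cartesian spaces because $(\pi_E \times \phi)^{-1}$ is a $C^k$ map. So the chart $\psi_0$ on $E$ is $C^k$ compatible
with $E$ by Definition 51.4.2.

To show that the charts in $A_0$ cover $E$, let $z \in E$. Then there exists $\phi \in A_{E,\pi_E(z)}^F$ by Definition 64.8.3 (iii),
and there exist manifold charts $\psi_M \in \mathrm{atlas}_{\pi_E(z)}(M)$ and $\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$ by Definition 51.3.2 (ii). It
follows that $z \in \mathrm{Dom}(\psi_0)$, where $\psi_0 = (\psi_M \times \psi_F) \circ (\pi_E \times \phi) \in A_0$. Thus $\bigcup_{\psi_0 \in A_0} \mathrm{Dom}(\psi_0) = E$. Hence by
Theorem 51.4.13, $A_0$ is a $C^k$ atlas for $E$ which is $C^k$ equivalent to $\mathrm{atlas}(E)$.

Part (ii) follows from part (i) and Definitions 51.4.10 and 51.4.2 and Notation 51.4.7.

For part (iii), let $z \in E$. Let $\psi_0 \in A_{0,z}$. Then $\psi_0 \in A_0$ and $z \in \mathrm{Dom}(\psi_0)$. But then $\psi_0 = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)$
for some $\psi_M \in \mathrm{atlas}(M)$, $\psi_F \in \mathrm{atlas}(F)$ and $\phi \in A_E^F$. So $z \in \mathrm{Dom}((\psi_M \times \psi_F) \circ (\pi_E \times \phi))$, which implies
$z \in \mathrm{Dom}(\pi_E \times \phi)$ and $(\pi_E(z), \phi(z)) \in \mathrm{Dom}(\psi_M \times \psi_F)$. So $z \in \mathrm{Dom}(\phi)$, and $\psi_M \in \mathrm{atlas}_{\pi_E(z)}(M)$ and $\psi_F \in
\mathrm{atlas}_{\phi(z)}(F)$. Thus $A_{0,z} \subseteq \{(\psi_M \times \psi_F) \circ (\pi_E \times \phi);\ \psi_M \in \mathrm{atlas}_{\pi_E(z)}(M),\ \psi_F \in \mathrm{atlas}_{\phi(z)}(F),\ \phi \in A_{E,\pi_E(z)}^F\}$.
Now let $\psi_0 \in \{(\psi_M \times \psi_F) \circ (\pi_E \times \phi);\ \psi_M \in \mathrm{atlas}_{\pi_E(z)}(M),\ \psi_F \in \mathrm{atlas}_{\phi(z)}(F),\ \phi \in A_{E,\pi_E(z)}^F\}$. Then $\psi_0 \in A_0$,
and $z \in \mathrm{Dom}(\psi_0)$ because $(\pi_E(z), \phi(z)) \in \mathrm{Dom}(\psi_M \times \psi_F)$ and $z \in \mathrm{Dom}(\pi_E \times \phi)$. Thus $\psi_0 \in A_{0,z}$. This
verifies the reverse inclusion.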


Part (iv) follows from parts (ii) and (iii). 


64.9.5 REMARK: Extraction of vector components from total space vectors using the pull-back atlas. 

The pull-back atlas facilitates the extraction of component tangent vectors on the base space and fibre 
space in Theorem 64.9.6. This is the fibre bundle version of Theorem 58.7.3, which extracts horizontal
and vertical components from tangent vectors on product-structured manifolds. Theorem 64.9.7 combines 
Theorem 64.9.6 with its converse. 


64.9.6 THEOREM: Extraction of horizontal/vertical vector components from total space vectors. 
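Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle. Let $n = \dim(M)$ and $m = \dim(F)$. Then

$\forall p \in M,\ \forall z \in E_p,\ \forall \phi \in A_{E,p}^F,\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}_{\phi(z)}(F),\ \forall (v,w) \in \mathbb{R}^n \times \mathbb{R}^m$,

$(d\pi_E)_z\big(t_{z,(v,w),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}\big) = t_{p,v,\psi_M}$ (64.9.1)

and

$(d\phi)_z\big(t_{z,(v,w),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}\big) = t_{\phi(z),w,\psi_F}$. (64.9.2)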


PROOF: Lines (64.9.1) and (64.9.2) follow from Theorem 58.7.3 lines (58.7.1) and (58.7.2) respectively. 


64.9.7 THEOREM: Determination of total space vector from horizontal and vertical components. 
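Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle. Let $n = \dim(M)$ and $m = \dim(F)$. Then

$\forall p \in M,\ \forall z \in E_p,\ \forall \phi \in A_{E,p}^F,\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}_{\phi(z)}(F),\ \forall (v,w) \in \mathbb{R}^n \times \mathbb{R}^m,\ \forall V \in T_z(E)$,
$\big((d\pi_E)_z(V) = t_{p,v,\psi_M}$ and $(d\phi)_z(V) = t_{\phi(z),w,\psi_F}\big) \Leftrightarrow V = t_{z,(v,w),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}$.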


PROOF: The assertion follows from Theorem 58.7.11. 


64.9.8 REMARK: Expressions for sets of total space vectors with specified horizontal component. 
Theorem 64.9.9 line (64.9.4) generalises Theorem 59.2.7 line (59.2.1) from tangent bundles to fibre bundles. 


64.9.9 THEOREM: Component expressions for total space tangent spaces for a given horizontal component. 
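Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle with $n = \dim(M)$ and $m = \dim(F)$. Then

$\forall p \in M,\ \forall z \in E_p,\ \forall \phi \in A_{E,p}^F,\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}_{\phi(z)}(F)$,

$T_z(E) = \{t_{z,(v,w),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)};\ v \in \mathbb{R}^n,\ w \in \mathbb{R}^m\}$ (64.9.3)

and

$\forall p \in M,\ \forall z \in E_p,\ \forall V \in T_p(M),\ \forall \phi \in A_{E,p}^F,\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}_{\phi(z)}(F)$,
$T_{z,V}(E) = \{t_{z,(v,w),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)};\ v = \Phi(\psi_M)(V),\ w \in \mathbb{R}^m\}$. (64.9.4)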


PROOF: Line (64.9.3) follows from Theorems 54.3.7 and 64.9.4 (ii). 


For line (64.9.4), $T_{z,V}(E) = \{y \in T_z(E);\ (d\pi_E)_z(y) = V\}$ by Notation 64.5.6, and by line (64.9.3), $y \in T_z(E)$
if and only if $y = t_{z,(v,w),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}$ for some $v \in \mathbb{R}^n$ and $w \in \mathbb{R}^m$. But $(d\pi_E)_z(y) = t_{p,v,\psi_M}$ by
Theorem 64.9.6 line (64.9.1). So $V = t_{p,v,\psi_M}$, which holds if and only if $\Phi(\psi_M)(V) = v$.

64.9.10 REMARK: Differentiability test for cross-sections using the pull-back atlas.
Theorem 64.9.11 provides a practical test for differentiability of cross-sections in terms of the pull-back atlas 
on the total space of a differentiable fibre bundle. (See the proof of Theorem 65.2.4 for an application.) 


64.9.11 THEOREM:  Pull-back atlas differentiability test for fibre bundle cross-sections. 
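Let $k \in \mathbb{Z}^+$. Let $(E, \pi_E, M, A_E^F)$ be a $C^k$ $(G, F)$ fibre bundle. Let $n = \dim(M)$ and $m = \dim(F)$. Let
$U \in \mathrm{Top}(M)$ and $X \in X(E, \pi_E, M\,|\,U)$. Then $X \in X^k(E, \pi_E, M\,|\,U)$ if and only if

$\forall \phi \in A_E^F,\ \forall \psi_M \in \mathrm{atlas}(M),\ \forall \psi_F \in \mathrm{atlas}(F)$,
$(\psi_M \times \psi_F) \circ (\pi_E \times \phi) \circ X \circ \psi_M^{-1} \in C^k\big(\psi_M(X^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F)))),\ \mathbb{R}^{n+m}\big)$. (64.9.5)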
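PROOF: Suppose that $X \in X^k(E, \pi_E, M\,|\,U)$. Then by Theorem 64.7.5 (iv),

$\forall \psi_E \in \mathrm{atlas}^k(E),\ \forall \psi_M \in \mathrm{atlas}^k(M)$,
$\psi_E \circ X \circ \psi_M^{-1} \in C^k\big(\psi_M(X^{-1}(\mathrm{Dom}(\psi_E))),\ \mathbb{R}^{n+m}\big)$. (64.9.6)

Let $A_0$ be the pull-back atlas for $E$. Let $\phi \in A_E^F$, $\psi_M \in \mathrm{atlas}(M)$ and $\psi_F \in \mathrm{atlas}(F)$. Let $\psi_0 = (\psi_M \times \psi_F) \circ
(\pi_E \times \phi)$. Then $\mathrm{Dom}(\psi_0) = (\pi_E \times \phi)^{-1}(\mathrm{Dom}(\psi_M) \times \mathrm{Dom}(\psi_F)) = \pi_E^{-1}(\mathrm{Dom}(\psi_M)) \cap \phi^{-1}(\mathrm{Dom}(\psi_F))$, and
$\psi_0 \in A_0$ by Definition 64.9.2. Therefore $\psi_0 \in \mathrm{atlas}^k(E)$ by Theorem 64.9.4 (ii). So condition (64.9.6) holds
with $\psi_0$ in place of $\psi_E$. Therefore $(\psi_M \times \psi_F) \circ (\pi_E \times \phi) \circ X \circ \psi_M^{-1} \in C^k\big(\psi_M(X^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F)))),\ \mathbb{R}^{n+m}\big)$
because

$\psi_M(X^{-1}(\mathrm{Dom}(\psi_0))) = \psi_M\big(\mathrm{Dom}(\psi_M) \cap X^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F)))\big) = \psi_M(X^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F))))$.

Thus condition (64.9.5) is satisfied.

For the converse, suppose that $X \in X(E, \pi_E, M\,|\,U)$ satisfies condition (64.9.5). Then by Definition 64.9.2,

$\forall \psi_0 \in A_0,\ \forall \psi_M \in \mathrm{atlas}(M)$, $\psi_0 \circ X \circ \psi_M^{-1} \in C^k\big(\psi_M(X^{-1}(\mathrm{Dom}(\psi_0))),\ \mathbb{R}^{n+m}\big)$,

where $A_0$ is the pull-back atlas on $E$. So by Theorems 64.9.4 (i) and 52.1.13,

$\forall \psi_E \in \mathrm{atlas}(E),\ \forall \psi_M \in \mathrm{atlas}(M)$, $\psi_E \circ X \circ \psi_M^{-1} \in C^k\big(\psi_M(X^{-1}(\mathrm{Dom}(\psi_E))),\ \mathbb{R}^{n+m}\big)$.

Hence $X \in X^k(E, \pi_E, M\,|\,U)$ by Definition 64.7.2.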


64.10. Analytic fibre bundles 

64.10.1 DEFINITION: An analytic $(G, F)$ fibre bundle for an analytic Lie left transformation group $(G, F) \mathrel{-\!\!<}
(G, A_G, F, A_F, \sigma_G, \mu)$ is a differentiable $(G, F)$ fibre bundle $(E, \pi, M, A_E^F) \mathrel{-\!\!<} (E, A_E, \pi, M, A_M, A_E^F)$ with
(i) $(E, A_E)$ and $(M, A_M)$ are analytic manifolds,
(ii) the projection map $\pi$ is analytic,
(iii) for all $\phi \in A_E^F$, $\phi : U_\phi \to F$ is analytic and $\pi \times \phi : \pi^{-1}(U_\phi) \to U_\phi \times F$ is analytic,
(iv) the transition maps $g_{\phi_2,\phi_1}$ for $\phi_1, \phi_2 \in A_E^F$ are analytic.


64.10.2 REMARK: Analyticity of the action map of the structure group of an analytic fibre bundle. 

The analyticity of the map $\mu$ in Definition 64.10.1 is implied by the weaker condition that the action map
$\mu : G \times F \to F$ be analytic with respect to $F$ only. (See Remark 62.1.4.) But since this implication is
non-trivial, it is best to state the analyticity requirement explicitly for both $G$ and $F$.

64.11. Fibre-set submanifold differentiable structure


64.11.1 REMARK: Fibre sets are regular submanifolds which are diffeomorphic to the fibre space. 

Since fibre bundles are specifically defined so that the fibre sets will be submanifolds which are diffeomorphic
to a given fibre space, it is not surprising that this is in fact true. The surprise, if any, is that it is not entirely
trivial to prove that it is true. The lower-level Theorem 52.7.5, which supports Theorem 64.11.3, shows that
a manifold $E$ which is locally diffeomorphic to a direct product $U \times F$ of differentiable manifolds $U \in \mathrm{Top}(M)$
and $F$ can be decomposed pointwise into diffeomorphic copies of $F$. Achieving this requires the construction
of a $C^k$ manifold atlas on each fibre set $E_p$, and showing regular $C^k$ compatibility with the ambient space
$\pi^{-1}(U) \subseteq E$. This implies that the fibre set is a regular $C^k$ submanifold. It is then shown that the map
$\phi|_{E_p}^{-1}$ is a regular embedding of $F$ in $E$.

The technical hurdle which may not be immediately obvious is that each submanifold $E_p$ requires an atlas
so that it has a differentiable structure which can be called $C^k$ differentiable and regularly included in $E$.
Without an atlas, $E_p$ would be merely a regular $C^k$ submanifold point-set, as in Definition 52.3.7. It would
not be possible to claim that the external space $F$ is $C^k$ diffeomorphic to $E_p$ if it has no atlas. The atlas
$A_{E_p,\phi}$ is constructed so that it is a valid $C^k$ atlas for $E_p$, and is also regularly $C^k$ compatible with $E$.

The regularity of the inclusion of the manifold $(E_p, A_{E_p,\phi})$ in $E$ is necessary to avoid the kind of scenario
which is seen in Example 52.3.4, where the inclusion map may be highly continuously differentiable, even 
though the subset itself is very clearly not differentiably included in the ambient space. The intuitive notion 
of differentiable inclusion of a subset in an ambient set requires it to be everywhere locally expressible as the 
graph of a suitably differentiable function. It is only when the inclusion is regular that the tangent bundle of 
the included subset has a simple direct relation to the tangent bundle of the ambient set. This is necessary 
to ensure the validity of the basic differential calculus of fibre bundles. 


64.11.2 REMARK: Intrinsic versus extrinsic manifold atlases for fibre sets. 

To see why it is difficult to show that the restrictions of local trivialisations to fibre sets are
diffeomorphisms, it is perhaps helpful to consider the analogous circumstances for topological fibre bundles.
The general strategy there is as follows.

(1) Construct the relative topology $T_{E_p} = \{\Omega \cap E_p;\ \Omega \in \mathrm{Top}(E)\}$ on a fibre set $E_p$.
(2) Demonstrate that $\phi|_{E_p} : E_p \to F$ is a homeomorphism between topological spaces $(E_p, T_{E_p})$ and $(F, T_F)$
for the given topology $T_F$ on $F$.

The corresponding strategy for differentiable fibre bundles encounters a difficulty.

(1) Construct the “relative atlas” $A_{E_p} = \{\psi|_{E_p};\ \psi \in \mathrm{atlas}(E)\}$ on a fibre set $E_p$.
(2) Demonstrate that $\phi|_{E_p} : E_p \to F$ is a diffeomorphism between manifolds $(E_p, A_{E_p})$ and $(F, A_F)$ for the
given atlas $A_F$ on $F$.


The difficulty here is that $A_{E_p}$ is not generally a differentiable manifold atlas on $E_p$. This is because
$\mathrm{Range}(\psi|_{E_p})$ is not generally an open subset of an $m$-dimensional hyperplane of $\mathbb{R}^n$, where $m = \dim(F)$ and
$n = \dim(M) + \dim(F)$. Definition 52.3.7 for a regular differentiable submanifold point-set requires that the
chart range be some such hyperplane for some chart in the maximal $C^k$ atlas on the total space manifold for
some $k \in \mathbb{Z}^+$. The use of a maximal atlas is a technical way of transforming the coordinates so that there
exist charts such that $\mathrm{Range}(\psi|_{E_p})$ is an open subset of some $m$-dimensional hyperplane. This suggests
that one way out of this difficulty is to replace $A_{E_p}$ with some kind of maximal atlas and then use this for
the diffeomorphism test.
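
A very small example may make the hyperplane issue concrete. The following sketch assumes, purely for illustration, that $M = F = \mathbb{R}$, $E = \mathbb{R}^2$ and $\pi(x, y) = x$, so that the fibre set over $p = 0$ is the $y$-axis. The chart $\psi$ below is a hypothetical choice, not taken from this book.

```latex
\[
\psi : \mathbb{R}^2 \to \mathbb{R}^2, \qquad \psi(x, y) = (x + y^2,\; y),
\qquad \psi^{-1}(a, b) = (a - b^2,\; b).
\]
Since $\psi$ is a $C^\infty$ diffeomorphism of $\mathbb{R}^2$, it is $C^k$ compatible with the
identity chart for all $k$. But for the fibre set $E_0 = \{0\} \times \mathbb{R}$,
\[
\psi(E_0) = \{(y^2, y);\ y \in \mathbb{R}\},
\]
which is a parabola, and is therefore not an open subset of any $1$-dimensional
hyperplane of $\mathbb{R}^2$. By contrast, the identity chart $\psi'(x, y) = (x, y)$ gives
\[
\psi'(E_0) = \{0\} \times \mathbb{R}.
\]
```

So within one and the same maximal atlas, some charts restrict to fibre sets with hyperplane ranges and others do not. This is why an arbitrary restricted chart $\psi|_{E_p}$ cannot be expected to be a manifold chart for $E_p$.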


A suitable regular submanifold atlas is constructed in Theorem 52.4.11. This would give $E_p$ a kind of “relative
atlas” which can be used for the diffeomorphism test. This seems ideal. One little fly in the ointment here,
though, is that one must know in advance that the fibre set is a regular $C^k$ submanifold in order to construct
this “relative atlas”. So it would not then be possible to use this atlas to prove that it is a regular $C^k$
submanifold. One must use only the information which is given. (That’s how normal mathematical logic
works anyway.) All that is known about the atlas on $E$ is that it is locally diffeomorphic to direct products
of manifolds $U$ and $F$, where $U$ is an open subset of $M$.

Luckily an open subset $U$ of $M$ does have an automatic “relative atlas”. This is given by Definition 52.4.16,
which is shown to make $U$ a regular $C^k$ submanifold in Theorem 52.4.17. So the direct product $U \times F$ has
the correct kind of $n$-dimensional atlas by Definition 52.6.2 and Theorem 52.6.5. The atlas on $E$ is known
to make $\pi^{-1}(U) \subseteq E$ diffeomorphic to $U \times F$, but this does not force its charts to have the right kind of
hyperplane ranges when restricted to fibre sets $E_p$.


The solution given here to these difficulties is to follow a reverse strategy. Instead of defining a relative atlas
on $E_p$ and showing that this is diffeomorphic to $F$, an atlas is induced on $E_p$ via a fibre chart $\phi$ from $F$. It
is then shown that this makes $E_p$ a regular $C^k$ submanifold of $E$. This strategy effectively yields the same
result. The principal objective is achieved, which is to show that all fibre sets are regular $C^k$ submanifold
point-sets within the total space $E$ according to Definition 52.3.7. The particular choice of submanifold
atlas for $E_p$ is not the first concern. The main task is to prove that it is the right kind of point-set. The
secondary task is then to define a suitable submanifold atlas for applications. The submanifold atlas can
then be constructed from $\mathrm{atlas}(E)$ as in Theorem 52.4.11, which is very abstract, or it can be constructed
via a fibre chart $\phi$ from $\mathrm{atlas}(F)$ as in Definition 64.11.5, which is very concrete. These two atlases are $C^k$
compatible, but the concrete atlas is chosen here because it is more concrete!


Theorems 64.11.3 and 64.11.6 are almost identical. The purpose of Theorem 64.11.3 is to demonstrate that
any given fibre set is given a regular submanifold structure $A_{E_p,\phi}$ by a single fibre chart, and the restricted
fibre chart is then a regular embedding of the fibre space in the total space. This motivates Definition 64.11.5.
Then Theorem 64.11.6 shows that the atlas $A_{E_p}$ induced on $E_p$ by all fibre charts in $A_E^F$ has exactly the
same properties. This is the atlas which is adopted in this book as the standard manifold atlas for fibre sets
of differentiable fibre bundles. This atlas is implicitly assumed by default. Consequently the atlas depends
on the fibre space and the fibre atlas, and is not constructed intrinsically from the atlas on the total space.


64.11.3 THEOREM: Properties of a single-fibre-chart-induced manifold atlas on a fibre set.
Let $k \in \mathbb{Z}^+$. Let $(E, \pi, M, A_E^F)$ be a $C^k$ differentiable $(G, F)$ fibre bundle. Let $\phi \in A_E^F$. Let $p \in \pi(\mathrm{Dom}(\phi))$.
Let $E_p = \pi^{-1}(\{p\})$ and $A_{E_p,\phi} = \{\psi_F \circ \phi|_{E_p};\ \psi_F \in \mathrm{atlas}(F)\}$. Let $m = \dim(F)$.
(i) $E_p$ is an $m$-dimensional regular $C^k$ submanifold point-set in $E$.
(ii) $(E_p, A_{E_p,\phi})$ is an $m$-dimensional regular $C^k$ submanifold of $E$.
(iii) $(E_p, A_{E_p,\phi})$ is $C^k$ diffeomorphic to $F$ via $\phi|_{E_p} : E_p \to F$.
(iv) The map $\phi|_{E_p}^{-1} : F \to E_p$ is a regular $C^k$ embedding of $F$ in $E$, assuming manifold atlas $A_{E_p,\phi}$ on $E_p$.

PROOF: Part (i) follows from Theorem 52.7.5 (viii).
Part (ii) follows from Theorem 52.7.5 (vi).
Part (iii) follows from Theorem 52.7.5 (x).
Part (iv) follows from Theorem 52.7.5 (xii).


64.11.4 REMARK: Construction of a standard manifold atlas for fibre sets. 

The atlases $A_{E_p,\phi}$ for fibre sets $E_p$ in Theorem 64.11.3 have an inconvenient dependence on the choice of fibre
chart $\phi$. Since these atlases all make $(E_p, A_{E_p,\phi})$ a regular $C^k$ submanifold of $E$, it seems likely that they will
all be $C^k$ consistent with each other. In fact, the set $E_p$ without any atlas is itself a regular $C^k$ submanifold
point-set, as in Definition 52.3.7, and any $C^k$ atlas on this point-set will necessarily be $C^k$ consistent with
any other. So a union of such atlases is also a suitable atlas. Therefore it makes sense to define a standard
manifold atlas for each fibre set $E_p$ as the union of the atlases $A_{E_p,\phi}$ for which $p \in \pi(\mathrm{Dom}(\phi))$.

Note that Definition 64.11.5 does not mention the differentiability class $C^k$ because this is not required for
the construction. However, $(E_p, A_{E_p})$ will be a regular $C^k$ submanifold of $E$ if $E$ is a $C^k$ fibre bundle.

The atlas $A_{E_p}$ in Definition 64.11.5 is effectively the set of all pull-backs of manifold charts $\psi_F$ on $F$ via
all fibre set bijections of the form $\phi|_{E_p} : E_p \to F$. (See Theorem 52.7.5 (iv) and Definition 52.7.8 for the
product-structured differentiable manifold version of Definition 64.11.5.) Since this is such a natural and
obvious choice for the manifold atlas for a fibre set, and it has the highly desirable regular $C^k$ submanifold
and embedding properties, it will generally be assumed to be defined on all differentiable fibre bundles
without explicitly saying so.


64.11.5 DEFINITION: The (standard) (manifold) atlas for a fibre set $E_p$ of a $C^1$ differentiable $(G, F)$ fibre
bundle $(E, \pi, M, A_E^F)$ at a point $p \in M$ is the set

$A_{E_p} = \{\psi \circ \phi|_{E_p};\ \phi \in A_{E,p}^F$ and $\psi \in \mathrm{atlas}(F)\}$,

where $A_{E,p}^F = \{\phi \in A_E^F;\ p \in \pi(\mathrm{Dom}(\phi))\}$.


64.11.6 THEOREM: Properties of fibre-atlas-induced manifold atlas on a fibre set. 
Let $k \in \mathbb{Z}^+$. Let $(E, \pi, M, A_E^F)$ be a $C^k$ differentiable $(G, F)$ fibre bundle. Let $p \in M$. Let $A_{E_p}$ be the
standard manifold atlas for $E_p$. Let $n = \dim(M)$ and $m = \dim(F)$.
(i) $E_p$ is an $m$-dimensional regular $C^k$ submanifold point-set in $E$.
(ii) $(E_p, A_{E_p})$ is an $m$-dimensional regular $C^k$ submanifold of $E$.
(iii) $(E_p, A_{E_p})$ is $C^k$ diffeomorphic to $F$ via $\phi|_{E_p}$ for any $\phi \in A_{E,p}^F$.
(iv) The map $\phi|_{E_p}^{-1} : F \to E_p$ is a regular $C^k$ diffeomorphism from $F$ to $(E_p, A_{E_p})$ for any $\phi \in A_{E,p}^F$.
(v) The map $\phi|_{E_p}^{-1} : F \to E_p$ is a regular $C^k$ embedding of $F$ in $E$ for any $\phi \in A_{E,p}^F$, assuming the manifold
atlas $A_{E_p}$ on $E_p$.

PROOF: Part (i) is a restatement of Theorem 64.11.3 (i), which does not depend on the choice of atlas.

For part (ii), note that $A_{E_p} = \bigcup_{\phi \in A_{E,p}^F} A_{E_p,\phi}$, where each $A_{E_p,\phi}$ makes $E_p$ an $m$-dimensional regular $C^k$
submanifold of $E$ as in Theorem 64.11.3 (ii). To show $C^k$ compatibility of these atlases, let $\psi_1, \psi_2 \in A_{E_p}$.
Then $\psi_a = \tilde\psi_a \circ \phi_a|_{E_p}$ for some $\tilde\psi_a \in \mathrm{atlas}(F)$ and $\phi_a \in A_{E,p}^F$ for $a = 1, 2$. It then follows that
$\psi_2 \circ \psi_1^{-1} = \tilde\psi_2 \circ \phi_2|_{E_p} \circ (\tilde\psi_1 \circ \phi_1|_{E_p})^{-1} = \tilde\psi_2 \circ \phi_2 \circ \phi_1|_{E_p}^{-1} \circ \tilde\psi_1^{-1} \in C^k(\mathrm{Range}(\psi_1), \mathrm{Range}(\psi_2))$ by Definitions
64.8.3 (iv), 63.4.2 (v) and 52.1.2. So $A_{E_p}$ is a $C^k$ atlas for $E_p$, and so $(E_p, A_{E_p})$ is an $m$-dimensional regular
$C^k$ submanifold of $E$ by Theorem 52.7.5 (vi).

Part (iii) follows from Theorem 52.7.5 (x) because the atlas $A_{E_p}$ is $C^k$ compatible with $A_{E_p,\phi}$ for all $\phi \in A_{E,p}^F$
by part (ii).

Part (iv) follows from part (iii).

Part (v) follows from Theorem 52.7.5 (xii) because the atlas $A_{E_p}$ is $C^k$ compatible with $A_{E_p,\phi}$ for all $\phi \in A_{E,p}^F$
by part (ii).


64.11.7 THEOREM: Fibre chart transition maps are diffeomorphisms. 
Let $k \in \mathbb{Z}^+$. Let $(E, \pi, M, A_E^F)$ be a $C^k$ differentiable $(G, F)$ fibre bundle.
(i) $\forall p \in M,\ \forall \phi_1, \phi_2 \in A_{E,p}^F$, $\phi_2 \circ \phi_1|_{E_p}^{-1} : F \to F$ is a $C^k$ diffeomorphism.
(ii) $\forall p \in M,\ \forall \phi_1, \phi_2 \in A_{E,p}^F$, $\phi_1|_{E_p}^{-1} \circ \phi_2|_{E_p} : E_p \to E_p$ is a $C^k$ diffeomorphism.

PROOF: For part (i), let $p \in M$ and $\phi_1, \phi_2 \in A_{E,p}^F$. Then $\phi_2 \circ \phi_1|_{E_p}^{-1} = L_g : F \to F$ for some $g \in G$ by
Definition 64.8.3 (iv). Hence $\phi_2 \circ \phi_1|_{E_p}^{-1} : F \to F$ is a $C^k$ diffeomorphism by Theorem 63.4.5. Alternatively,
apply Theorem 64.11.6 (ii).

Part (ii) follows from Theorem 64.11.6 (iii).


64.12. Fibre-set tangent vector embedding maps 


64.12.1 REMARK:  Fibre-set tangent vector embedding maps. 
Definition 64.12.2 is based on the general submanifold tangent vector embedding map in Definition 54.6.2, 
which is applied to product-structured differentiable manifolds in Theorems 54.8.5, 58.7.5 and 58.7.15. 


64.12.2 DEFINITION: The fibre-set tangent vector embedding map for a $C^1$ differentiable $(G, F)$ fibre bundle
$(E, \pi_E, M, A_E^F)$ is the map $\eta : \bigcup_{p \in M} T(E_p) \to T(E)$ defined by

$\forall p \in M,\ \forall y \in T(E_p)$, $\eta(y) = \eta_p(y)$,

where $\eta_p : T(E_p) \to T(E)$ is the submanifold tangent vector embedding map for the submanifold $E_p$ of $E$
as in Definition 54.6.2 for all $p \in M$.

64.12.3 REMARK: Embedded fibre-set tangent vectors are vertical vectors on the total space.

The purpose of Theorem 64.12.4 is to assist Theorem 64.12.6, which asserts, in essence, that the tangent 
bundle of the fibre set above a given base point is equal to the set of all vertical second-level vectors above 
that base point. Since the fibre set is a submanifold, this assertion requires the application of a tangent 
vector embedding map. 

In Theorem 64.12.4, part (i) is a technical lemma which swaps the component tuples in the general tangent 
vector embedding map in Definition 54.6.2 so as to convert it from a horizontal submanifold definition to 
a vertical submanifold, and part (ii) gives an explicit formula for embedded fibre-set tangent vectors which 
makes it clear that these vectors are vertical. (Theorem 64.12.4 (ii) is illustrated in Figure 64.12.1.) 


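[Diagram: the map $\eta_p : t_{z,v,\hat\psi} \mapsto t_{z,(0,v),\psi}$, with $\hat\psi = \psi_F \circ \phi|_{E_p}$ and $\psi = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)$, shown for $E_p = \pi_E^{-1}(\{p\})$ in $\mathrm{Dom}(\phi) \in \mathrm{Top}(E)$, with $U \in \mathrm{Top}(M)$, $U \times F$ and the coordinate spaces $\mathbb{R}^n$, $\mathbb{R}^m$, $\mathbb{R}^{n+m}$.]
Figure 64.12.1 Fibre-set tangent vector embedding map formula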


64.12.4 THEOREM: Formula for fibre-set tangent vector embedding map. 
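Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ differentiable $(G, F)$ fibre bundle. Let $n = \dim(M)$ and $m = \dim(F)$.
(i) Then the submanifold tangent vector embedding map $\eta_p : T(E_p) \to T(E)$ for the submanifold $E_p$ of $E$
for $p \in M$ satisfies

$\forall p \in M,\ \forall z \in E_p,\ \forall v \in \mathbb{R}^m,\ \forall \hat\psi \in \mathrm{atlas}_z(E_p),\ \forall \psi \in A(z, \hat\psi)$,
$\eta_p(t_{z,v,\hat\psi}) = t_{z,(0_{\mathbb{R}^n},v),\psi}$,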

where A(-,-) is defined by 

Vp € M, Yz € Ep, Vp € atlas; (E), 

A(z, 9) = (9 € atlas; (E); Y(Ep) = Range() N ((0n«) x R”) and Vly omy) = Ir o |o } 
(See Notation 11.5.26 for Iz 17.) 
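(ii) Hence the fibre-set tangent vector embedding map $\eta$ for $(E, \pi_E, M, A_E^F)$ satisfies

$\forall p \in M,\ \forall z \in E_p,\ \forall v \in \mathbb{R}^m,\ \forall \phi \in A_{E,p}^F,\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}_{\phi(z)}(F)$,
$\eta\big(t_{z,v,\psi_F \circ \phi|_{E_p}}\big) = t_{z,(0_{\mathbb{R}^n},v),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}$.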


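PROOF: Part (i) follows by applying Definition 54.6.2 to Definition 64.12.2, and replacing $\psi$ with $\rho \circ \psi$,
where $\rho : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m \times \mathbb{R}^n$ is defined by $\rho(x_1, x_2) = (x_2, x_1)$ for all $x_1 \in \mathbb{R}^n$ and $x_2 \in \mathbb{R}^m$. (Then
$\psi \in \mathrm{atlas}_z^1(E)$ if and only if $\rho \circ \psi \in \mathrm{atlas}_z^1(E)$.)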

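For part (ii), let $p \in M$, $z \in E_p$, $v \in \mathbb{R}^m$, $\phi \in A_{E,p}^F$, $\psi_M \in \mathrm{atlas}_p(M)$ and $\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$. Let $\psi$
denote the map $(\psi_M \times \psi_F) \circ (\pi_E \times \phi)$. Then $\psi \in \mathrm{atlas}_z^1(E)$ by Theorem 64.9.4 (iii, iv). Since $\psi$ satisfies

$\psi(E_p) = (\psi_M \times \psi_F)((\pi_E \times \phi)(E_p))$
$= (\psi_M \times \psi_F)(\{p\} \times F)$
$= \{\psi_M(p)\} \times \mathrm{Range}(\psi_F)$

and

$\mathrm{Range}(\psi) \cap (\{\psi_M(p)\} \times \mathbb{R}^m) = (\psi_M \times \psi_F)(\pi_E(\mathrm{Dom}(\phi)) \times F) \cap (\{\psi_M(p)\} \times \mathbb{R}^m)$
$= (\psi_M(\pi_E(\mathrm{Dom}(\phi))) \times \mathrm{Range}(\psi_F)) \cap (\{\psi_M(p)\} \times \mathbb{R}^m)$
$= \{\psi_M(p)\} \times \mathrm{Range}(\psi_F)$,

it follows that $\psi(E_p) = \mathrm{Range}(\psi) \cap (\{\psi_M(p)\} \times \mathbb{R}^m)$. Let $\hat\psi = \psi_F \circ \phi|_{E_p}$. Then $\hat\psi \in \mathrm{atlas}_z(E_p)$ and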


T =F o $l, ouan 
= Vr 04], (64.12.1) 
=T o (Ym X Yr) o (TE x $)| p, 
= IGT o él, 
where line (64.12.1) follows from the inclusion E; C Dom(4). 


Define Yo : Dom(V) — IR**"" by wv : z' e v(z) — (V (p), 0g») for all 2’ € Dom(v). Then vo € atlas! (E) 
because constant translation does not affect C! differentiability, and v9(E,) = Range(vo) N ((0g») x R™), 


and Voss = S emit) = 1o vls, = T o volg, So vo € A(z, Y). Hence by part (i) and 
Theorem 54.6.11, n(tz v progie, ) = tz,Onn,v),o = tz,(Onn,v). = tz (Orn v) lpm XPr)o(mn xd)" 


64.12.5 REMARK: Similarity of fibre-set and product-structured submanifold embedding maps. 

Theorem 64.12.4 (ii) almost follows directly from Theorem 54.8.5 (ii), which is the corresponding assertion
for a product-structured manifold. A subtle difference here is that the atlas $A_{E_p}$ for $E_p$ in Definition 64.11.5
is in general a union of one or more product-structured manifold atlases of the kind which are given in
Definition 52.7.8 for horizontal and vertical submanifolds. Thus an alternative proof strategy would be to
first show that fibre-bundle fibre-set atlases are unions of product-structured manifold atlases, then show
that the tangent vector embedding map in Definition 64.12.2 can be expressed in terms of the corresponding
embedding maps for product-structured manifolds in Theorem 54.8.3 (ii) and Theorem 54.8.5 (ii), and then
finally deduce Theorem 64.12.4 (ii) from this.


64.12.6 THEOREM: Verticality of embedded tangent vectors of fibre sets. 
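Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ differentiable $(G, F)$ fibre bundle. Then

$\forall p \in M,\ \forall z \in E_p$, $\eta(T_z(E_p)) = \eta_p(T_z(E_p)) = T_{z,0}(E)$,

where $\eta = \bigcup_{p \in M} \eta_p$ is the fibre-set tangent vector embedding map in Definition 64.12.2. Therefore

$\forall p \in M$, $\eta(T(E_p)) = \eta_p(T(E_p)) = \bigcup_{z \in E_p} T_{z,0}(E)$.

Hence $\eta\big(\bigcup_{p \in M} T(E_p)\big) = \bigcup_{z \in E} T_{z,0}(E)$.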
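PROOF: Let $p \in M$, $z \in E_p$ and $\hat y \in T_z(E_p)$. Let $m = \dim(E_p)$. Then $\hat y = t_{z,v,\hat\psi}$ for some $v \in \mathbb{R}^m$
and $\hat\psi \in \mathrm{atlas}_z(E_p)$ by Notation 54.1.4. Then $\hat\psi = \psi_F \circ \phi|_{E_p}$ for some $\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$ and $\phi \in
A_{E,p}^F$ by Definition 64.11.5. Let $\psi_M \in \mathrm{atlas}_p(M)$. Then $\eta(t_{z,v,\hat\psi}) = t_{z,(0_{\mathbb{R}^n},v),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}$ by
Theorem 64.12.4 (ii). Therefore $(d\pi_E)_z\big(t_{z,(0_{\mathbb{R}^n},v),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}\big) = t_{p,0,\psi_M}$ by Theorem 58.7.3 line (58.7.1), and
so $t_{z,(0_{\mathbb{R}^n},v),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)} \in T_{z,0}(E)$ by Notation 64.5.6.

To show the reverse inclusion, let $p \in M$, $z \in E_p$ and $y \in T_{z,0}(E)$. Then $y \in T_z(E)$ with $(d\pi_E)_z(y) = 0$ by
Notation 64.5.6. Let $\phi \in A_{E,p}^F$, $\psi_M \in \mathrm{atlas}_p(M)$ and $\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$. Then Theorem 64.9.4 (iv) implies
$\psi = (\psi_M \times \psi_F) \circ (\pi_E \times \phi) \in \mathrm{atlas}_z^1(E)$. Let $n = \dim(M)$ and $m = \dim(F)$. Then by Theorem 54.3.7,
$y = t_{z,w,\psi}$ for some $w = \mathrm{concat}(w_1, w_2) \in \mathbb{R}^n \times \mathbb{R}^m = \mathbb{R}^{n+m}$. But $y \in T_{z,0}(E)$ implies $(d\pi_E)_z(y) =
0_{T_p(M)}$ by Notation 64.5.6, and $(d\pi_E)_z(y) = (d\pi_E)_z(t_{z,(w_1,w_2),\psi}) = t_{p,w_1,\psi_M}$ by Theorem 58.7.3 line (58.7.1).
Therefore $w_1 = 0_{\mathbb{R}^n}$. However, $\eta(t_{z,w_2,\hat\psi}) = t_{z,(0_{\mathbb{R}^n},w_2),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}$ by Theorem 64.12.4 (ii), where
$\hat\psi = \psi_F \circ \phi|_{E_p} \in \mathrm{atlas}_z(E_p)$. Thus $y = \eta(t_{z,w_2,\hat\psi})$ for some $\hat\psi \in \mathrm{atlas}_z(E_p)$. So $\eta(T_z(E_p)) \supseteq T_{z,0}(E)$. Hence
$\eta(T_z(E_p)) = T_{z,0}(E)$.

The equality $\eta(T(E_p)) = \eta_p(T(E_p)) = \bigcup_{z \in E_p} T_{z,0}(E)$ then follows from Notation 54.1.4, which implies
$T(E_p) = \bigcup_{z \in E_p} T_z(E_p)$. Hence $\eta\big(\bigcup_{p \in M} T(E_p)\big) = \bigcup_{z \in E} T_{z,0}(E)$.

64.12.7 REMARK: Identification of fibre-set tangent vectors with vertical vectors.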

Since the fibre-set tangent vector embedding map is injective, it follows from Theorem 64.12.6 that this map 
is a bijection, both pointwise and globally. (This is asserted in Theorem 64.12.8.) Consequently fibre-set 
tangent vectors may be “identified” with vertical vectors, which means that they are so close to being the 
same thing that it is not necessary to mention explicitly every time the embedding is applied, either in the 
forward or reverse direction. This identification is in fact generally assumed in differential geometry for 
various kinds of vector fields, connections and other structures. 
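
The identification can be seen concretely in the simplest case. The following sketch assumes, for illustration only, the product bundle $E = \mathbb{R} \times \mathbb{R}$ with $\pi_E(x, y) = x$, the fibre chart $\phi(x, y) = y$, and identity charts $\psi_M = \psi_F = \mathrm{id}_{\mathbb{R}}$, so that $n = m = 1$. It applies the formulas of Theorem 64.12.4 (ii) and Theorem 58.7.3 line (58.7.1) as they are quoted in the proof of Theorem 64.12.6.

```latex
Let $z = (p, f) \in E$, $\psi = (\psi_M \times \psi_F) \circ (\pi_E \times \phi) = \mathrm{id}_{\mathbb{R}^2}$
and $\hat\psi = \psi_F \circ \phi|_{E_p} : (p, y) \mapsto y$. Then for all $v \in \mathbb{R}$,
\[
\eta\bigl(t_{z,v,\hat\psi}\bigr) = t_{z,(0,v),\psi},
\qquad
(d\pi_E)_z\bigl(t_{z,(w_1,w_2),\psi}\bigr) = t_{p,w_1,\psi_M}
\quad\text{for all } (w_1, w_2) \in \mathbb{R}^2 .
\]
So a vector $t_{z,(w_1,w_2),\psi}$ is vertical if and only if $w_1 = 0$, and
\[
T_{z,0}(E) = \{\, t_{z,(0,v),\psi};\ v \in \mathbb{R} \,\} = \eta\bigl(T_z(E_p)\bigr).
\]
```

In this example the embedding merely inserts a zero horizontal component in front of the fibre-set component $v$, which is the sense in which the two kinds of vectors are “so close to being the same thing”.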


64.12.8 THEOREM: Embedding map identifies fibre-set tangent vectors with vertical vectors. 
Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ differentiable $(G, F)$ fibre bundle. Then

$\forall p \in M,\ \forall z \in E_p$, $\eta_p|_{T_z(E_p)} : T_z(E_p) \to T_{z,0}(E)$ is a bijection,

where $\eta = \bigcup_{p \in M} \eta_p$ is the fibre-set tangent vector embedding map in Definition 64.12.2. Moreover,

$\forall p \in M$, $\eta_p : T(E_p) \to \bigcup_{z \in E_p} T_{z,0}(E)$ is a bijection.

Furthermore,

$\eta : \bigcup_{p \in M} T(E_p) \to \bigcup_{z \in E} T_{z,0}(E)$ is a bijection.

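PROOF: Let $p \in M$. To show that $\eta_p : T(E_p) \to \bigcup_{z \in E_p} T_{z,0}(E)$ is injective, let $\hat{y}_1, \hat{y}_2 \in T(E_p)$ satisfy
$\eta_p(\hat{y}_1) = \eta_p(\hat{y}_2) = y$. Then $y \in \bigcup_{z \in E_p} T_{z,0}(E)$ by Theorem 64.12.6. So $y \in T_{z,0}(E)$ for some $z \in E_p$. Then
$\hat{y}_1, \hat{y}_2 \in T_z(E_p)$ by Definitions 64.12.2 and 54.6.2. Let $\phi \in A^F_{E,p}$ and $\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$. Then it follows that
$\psi_F \circ \phi|_{E_p} \in \mathrm{atlas}_z(E_p)$ by Definition 64.11.5. So $\hat{y}_i = t_{z,v_i,\psi_F \circ \phi|_{E_p}}$ for some $v_i \in \mathbb{R}^m$, for $i = 1, 2$, where
$m = \dim(F) = \dim(E_p)$. Let $\psi_M \in \mathrm{atlas}_p(M)$. Then $\eta_p(\hat{y}_i) = t_{z,(0_{\mathbb{R}^n},v_i),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}$ for $i = 1, 2$ by
Theorem 64.12.4 (ii). So $v_1 = v_2$ by Theorem 54.1.11. Therefore $\hat{y}_1 = \hat{y}_2$. Hence $\eta_p : T(E_p) \to \bigcup_{z \in E_p} T_{z,0}(E)$
is a bijection because it is a surjection by Theorem 64.12.6. Moreover, $\eta_p\big|_{T_z(E_p)} : T_z(E_p) \to T_{z,0}(E)$ is a
bijection because it is injective by the injectivity of $\eta_p$, and it is surjective because $\eta_p(T_z(E_p)) = T_{z,0}(E)$ by
Theorem 64.12.6.
To show that $\eta : \bigcup_{p \in M} T(E_p) \to \bigcup_{z \in E} T_{z,0}(E)$ is injective, let $\hat{y}_1, \hat{y}_2 \in \bigcup_{p \in M} T(E_p)$ with $\eta(\hat{y}_1) = \eta(\hat{y}_2) = y$. Then
$y \in \bigcup_{z \in E} T_{z,0}(E)$. So $y \in T_{z,0}(E)$ for some $z \in E$. Let $p = \pi_E(z)$. Then $\hat{y}_1, \hat{y}_2 \in T_z(E_p)$ by Definitions
64.12.2 and 54.6.2. So $\hat{y}_1 = \hat{y}_2$ by the injectivity of $\eta_p : T(E_p) \to \bigcup_{z \in E_p} T_{z,0}(E)$. Hence it follows that
$\eta : \bigcup_{p \in M} T(E_p) \to \bigcup_{z \in E} T_{z,0}(E)$ is a bijection because it is a surjection by Theorem 64.12.6.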
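
In chart coordinates, the embedding of fibre-set tangent vectors into vertical vectors amounts to prefixing the component tuple in R^m with a zero tuple in R^n. The following minimal Python sketch illustrates only this coordinate picture; the function names are hypothetical and are not taken from the text.

```python
def embed_components(v, n):
    """Map fibre-set components v in R^m to total-space components (0_{R^n}, v)."""
    return [0.0] * n + list(v)


def is_vertical(w, n):
    """A component tuple is vertical when its first n (base-space) entries vanish."""
    return all(c == 0.0 for c in w[:n])


def drop_components(w, n):
    """Inverse of embed_components, defined only on vertical component tuples."""
    if not is_vertical(w, n):
        raise ValueError("not a vertical component tuple")
    return list(w[n:])


n, m = 2, 3                      # n = dim(M), m = dim(F), chosen arbitrarily
v1 = [1.0, -2.0, 0.5]            # components of a fibre-set tangent vector
w1 = embed_components(v1, n)     # components of the embedded vertical vector
round_trip = drop_components(w1, n)
```

The round trip recovers the original components, which is the coordinate form of injectivity, and every vertical tuple arises in this way, which is the coordinate form of surjectivity.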


64.12.9 REMARK: Notations for fibre-set tangent vector embedding identifications. 

The fibre-set tangent vector embedding bijections in Theorem 64.12.8 are often required in applications 
although they are not often mentioned explicitly. Usually fibre set tangent vectors and vertical total space 
vectors are “identified”, which means that they can be used interchangeably as if they were identical. This 
kind of convenient identification is almost ubiquitous in mathematics, but sometimes the identification maps 
do need to be indicated explicitly. Notation 64.12.10 is provided for this purpose. 

Similarly, the total space tangent bundle $T(E) = \bigcup_{z \in E} T_z(E)$ may be identified with the union $\bigcup_{p \in M} T(E_p)$
of the fibre-set submanifold tangent spaces, although these sets are clearly different. The vectors in $T(E)$
have dimension $\dim(E)$, whereas the vectors in $\bigcup_{p \in M} T(E_p)$ have dimension $\dim(E_p) = \dim(F)$.


64.12.10 NOTATION: Identification maps for fibre-set submanifold tangent bundles. 

Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ differentiable $(G, F)$ fibre bundle. (The following maps refer to Definition 64.12.2
and Theorem 64.12.8.)

$\eta$ denotes the bijection $\eta : \bigcup_{p \in M} T(E_p) \to \bigcup_{z \in E} T_{z,0}(E)$.

$\eta_p$, for $p \in M$, denotes the bijection $\eta_p : T(E_p) \to \bigcup_{z \in E_p} T_{z,0}(E)$.

$\eta^z$, for $z \in E$, denotes the bijection $\eta\big|_{T_z(E_p)} : T_z(E_p) \to T_{z,0}(E)$ for $p = \pi_E(z)$.

$\bar{\eta}$ denotes the inverse of $\eta$. In other words, it is the bijection $\bar{\eta} : \bigcup_{z \in E} T_{z,0}(E) \to \bigcup_{p \in M} T(E_p)$.

$\bar{\eta}_p$, for $p \in M$, denotes the inverse of $\eta_p$. In other words, it is the bijection $\bar{\eta}_p : \bigcup_{z \in E_p} T_{z,0}(E) \to T(E_p)$.

$\bar{\eta}^z$, for $z \in E$, denotes the inverse of $\eta^z$. In other words, it is the bijection $\bar{\eta}^z : T_{z,0}(E) \to T_z(E_p)$.
64.12.11 REMARK: Verticality of differentials of inverses of fibre set restrictions of fibre charts.
Theorem 64.12.12 confirms that the differentials of inverses of fibre-set restrictions of fibre charts map vectors 
on the fibre space to vertical vectors on the total space of the fibre bundle, as one would expect. 


64.12.12 THEOREM: Differentials of inverses of fibre set diffeomorphisms. 
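Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ differentiable $(G, F)$ fibre bundle.


(i) $\forall p \in M,\ \forall \phi \in A^F_{E,p},\ \forall z \in E_p,\ \forall y \in T_{\phi(z)}(F),\quad (d(\phi|_{E_p})^{-1})_{\phi(z)}(y) \in T_z(E_p)$.
(ii) $\forall p \in M,\ \forall \phi \in A^F_{E,p},\ \forall z \in E_p,\ \forall y \in T_{\phi(z)}(F),\quad \eta\big((d(\phi|_{E_p})^{-1})_{\phi(z)}(y)\big) \in T_{z,0}(E)$.


PROOF: For part (i), $(\phi|_{E_p})^{-1} : F \to E_p$ is a $C^1$ diffeomorphism by Theorem 64.11.6 (iv). Therefore
$(d(\phi|_{E_p})^{-1})_q : T_q(F) \to T_{(\phi|_{E_p})^{-1}(q)}(E_p)$ is a well-defined map for all $q \in F$ by Theorem 58.4.8. Let $q = \phi(z)$. Then
$(\phi|_{E_p})^{-1}(q) = z$. So $(d(\phi|_{E_p})^{-1})_{\phi(z)} : T_{\phi(z)}(F) \to T_z(E_p)$ is a well-defined map. Hence
$\forall y \in T_{\phi(z)}(F),\ (d(\phi|_{E_p})^{-1})_{\phi(z)}(y) \in T_z(E_p)$.

Part (ii) follows from part (i) and Theorem 64.12.6.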


64.12.13 REMARK: Formulas for inverse differentials of horizontal/vertical component maps. 

Theorem 64.12.14 is an almost mechanical conversion of Theorem 58.7.17 from product-structured manifolds 
to fibre bundles. The standard induced fibre-set atlas in Definition 64.11.5 is assumed for the fibre sets Ep 
in line (64.12.3). 

The fibre-chart-dependent horizontal submanifolds $E^{\phi,q} = \phi^{-1}(\{q\}) = \phi^{-1}(\{\phi(z)\})$ on line (64.12.2) are
introduced in Definition 64.5.15 and illustrated in Figure 64.5.2. The induced atlas for each set $E^{\phi,q}$ is
$\mathrm{atlas}(E^{\phi,q}) = \{\psi_M \circ \pi_E|_{E^{\phi,q}};\ \psi_M \in \mathrm{atlas}(M)\}$, by analogy with Definition 64.11.5 for vertical submanifolds.


64.12.14 THEOREM: Inverse horizontal/vertical component map differential formulas. 
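Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ differentiable $(G, F)$ fibre bundle. Let $n = \dim(M)$ and $m = \dim(F)$. Then


$\forall q \in F,\ \forall \phi \in A^F_E,\ \forall z \in E^{\phi,q},\ \forall u \in \mathbb{R}^n,\ \forall \psi_M \in \mathrm{atlas}_{\pi_E(z)}(M)$,
$(d(\pi_E|_{E^{\phi,q}})^{-1})_{\pi_E(z)}(t_{\pi_E(z),u,\psi_M}) = t_{z,u,\tilde{\psi}_M}$, (64.12.2)
where $\tilde{\psi}_M = \psi_M \circ \pi_E|_{E^{\phi,q}} \in \mathrm{atlas}(E^{\phi,q})$, and $E^{\phi,q} = \phi^{-1}(\{q\})$ for $q \in F$ and $\phi \in A^F_E$. Similarly,
$\forall p \in M,\ \forall \phi \in A^F_{E,p},\ \forall z \in E_p,\ \forall v \in \mathbb{R}^m,\ \forall \psi_F \in \mathrm{atlas}_{\phi(z)}(F)$,
$(d(\phi|_{E_p})^{-1})_{\phi(z)}(t_{\phi(z),v,\psi_F}) = t_{z,v,\tilde{\psi}_F}$, (64.12.3)


where $\tilde{\psi}_F = \psi_F \circ \phi|_{E_p} \in \mathrm{atlas}(E_p)$.


PROOF: Line (64.12.2) follows from Theorem 58.7.17 line (58.7.20).
Line (64.12.3) follows from Theorem 58.7.17 line (58.7.21).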


64.12.15 REMARK: Formula for embedded inverse differential of vertical component map. 

In the interests of brevity, the embedded version of the inverse horizontal component map differential formula 
in Theorem 64.12.14 line (64.12.2) is not given in Theorem 64.12.16. (It would follow straightforwardly from 
Theorem 58.7.19 line (58.7.22).) The embedded inverse vertical component map differential formula in 
line (64.12.4) is applicable to the proof of Theorem 65.5.2. 


64.12.16 THEOREM: Embedded inverse vertical component map differential formula. 
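Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle. Let $n = \dim(M)$ and $m = \dim(F)$. Then


$\forall p \in M,\ \forall \phi \in A^F_{E,p},\ \forall z \in E_p,\ \forall v \in \mathbb{R}^m,\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}_{\phi(z)}(F)$,


$\eta\big((d(\phi|_{E_p})^{-1})_{\phi(z)}(t_{\phi(z),v,\psi_F})\big) = t_{z,(0_{\mathbb{R}^n},v),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)}$. (64.12.4)


PROOF: Line (64.12.4) follows from Theorem 64.12.14 line (64.12.3) and Theorem 64.12.4 (i). (Alternatively
by Theorem 58.7.19 line (58.7.23).)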
64.12.17 REMARK: Synthesis of total space tangent vectors from horizontal/vertical components.
If Theorem 58.7.21 line (58.7.25) is applied to fibre bundle trivialisations, the result looks something like: 


Vzc E, Vy € T.(£), Vo € AE z(e) 
y = gso Gris Je (dr) (0) + n, Al, ote (609). (0). 


where jM : E?) — E and TB ca) DES; > E are the submanifold tangent vector embedding maps 
for E?;9 and E,(z) respectively within the total space E of a C! fibre bundle (E, 7, M, AE). 


64.13. Vertical vector fields generated by the Lie algebra 


64.13.1 REMARK: Vector fields on differentiable fibre bundles. 

Vector fields may be generated by the Lie algebra $T_e(G)$ of a Lie group $G$ acting on the fibre space of a
differentiable $(G, F)$ fibre bundle $(E, \pi, M, A^F_E)$ as in Definition 63.6.5. The scope of this action is limited to
the Lie transformation group $(G, F)$. It is not applied to the fibre bundle structure itself.


Vector fields generated by $T_e(G)$ acting on $F$ may be applied to individual fibre sets $E_p = \pi^{-1}(\{p\})$ via
the differentials of fibre charts $\phi \in A^F_E$. These are “differential left actions” of $G$ acting on $E$ via charts.
They may be thought of as “infinitesimal transformations” by the group, but they are chart-dependent. (By
contrast, the corresponding differential right actions for principal bundles are chart-independent.)


The differential action by Lie algebra elements of the structure group on individual fibre sets extends very
simply and easily to entire domains of fibre charts, as in Definition 64.13.3.


64.13.2 REMARK: Applications of Lie-algebra-generated vector fields on OFB total spaces. 
The vector fields generated by Lie algebra elements are much more useful when these elements are variable.
For example, they could depend on the base point $p$ of each fibre set $E_p$, or they could depend on vectors
in the tangent spaces $T_p(M)$. When the Lie algebra element depends linearly on vectors in $T_p(M)$, the
generated vector fields form a horizontal lift function or “connection” as in Definition 67.5.4.


In gauge theory, vector fields like those in Definition 64.13.3, but with a variable Lie algebra element $u$ depending on
the base point $p \in M$, are applied to particle fields to “twist” them in accordance with a “gauge potential”
which is mathematically equivalent to a connection. Such a field seems to depend only on the base point
because it has been pulled back or “localised” via local bundle cross-sections. (See for example Definitions
69.13.3 and 69.14.3.) In fact, such gauge potentials are fibre bundle connections which have been projected
down to the base space so as to make them appear to be only base-point-dependent.


In applications of Lie-algebra-generated vector fields to connections, one of the many representations of a
connection in Table 69.15.1 in Remark 69.15.1 is called a “connection form”. In this representation, the Lie
algebra elements $u \in T_e(G)$ in Definition 64.13.3 are “outputs” from connection forms on principal bundles.
These outputs may be applied to functions on principal bundles in a chart-independent manner, but for an
associated OFB, the action by Lie algebra elements is typically chart-dependent. Then transformation rules
are required for transitions between fibre charts. In the gauge theory context, these chart transition rules
are called “gauge transformations”.


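64.13.3 DEFINITION: The vector field induced on the total space of a $C^1$ $(G, F)$ fibre bundle $(E, \pi, M, A^F_E)$
by a vector $u \in T_e(G)$ via a chart $\phi \in A^F_E$ is the vector field $X^E_{u,\phi} \in X(T(\mathrm{Dom}(\phi)))$ which satisfies


$\forall z \in \mathrm{Dom}(\phi),\quad X^E_{u,\phi}(z) = ((\phi|_{E_{\pi(z)}})^{-1})_*(X^F_u(\phi(z)))$. (64.13.1)


In other words, $X^E_{u,\phi}|_{E_p} = ((\phi|_{E_p})^{-1})_* \circ X^F_u \circ \phi|_{E_p}$ for all $p \in \pi(\mathrm{Dom}(\phi))$. (See Definition 63.6.5 for $X^F_u$.)


Alternative name: infinitesimal transformation induced on the total space.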


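64.13.4 THEOREM: Verification that the vector field induced on a total space is a vector field.
Let $(E, \pi, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle.
(i) $\forall u \in T_e(G),\ \forall \phi \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi),\quad X^E_{u,\phi}(z) \in T_z(E)$.
(ii) $\forall u \in T_e(G),\ \forall \phi \in A^F_E,\quad X^E_{u,\phi} \in X(T(\mathrm{Dom}(\phi)))$.
(iii) $\forall u \in T_e(G),\ \forall \phi \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi),\quad X^F_u = (\phi|_{E_{\pi(z)}})_* \circ X^E_{u,\phi} \circ (\phi|_{E_{\pi(z)}})^{-1}$.


PROOF: For part (i), let $u \in T_e(G)$, $\phi \in A^F_E$ and $z \in \mathrm{Dom}(\phi)$. Then $X^F_u(\phi(z)) \in T_{\phi(z)}(F)$ because
$X^F_u \in X(T(F))$ by Definition 63.6.5. So $X^F_u(\phi(z)) \in \mathrm{Range}((\phi|_{E_{\pi(z)}})_*)$ because $z \in E_{\pi(z)}$. Therefore
$X^F_u(\phi(z)) \in \mathrm{Dom}(((\phi|_{E_{\pi(z)}})^{-1})_*)$. So $X^E_{u,\phi}(z)$ is a well-defined element of $T(E)$. But $(\phi|_{E_{\pi(z)}})^{-1}(\phi(z)) = z$. Hence
$X^E_{u,\phi}(z) \in T_z(E)$.

Part (ii) follows from part (i) and Definition 57.1.2.

For part (iii), let $p = \pi(z)$. Then $(\phi|_{E_p})^{-1} : F \to E_p$ is a bijection by Definition 64.8.3 (ii), and $\phi \circ (\phi|_{E_p})^{-1} = \mathrm{id}_F$ by Theorem 64.11.7 (i).
So $X^E_{u,\phi} \circ (\phi|_{E_p})^{-1} : F \to T(E)$ is well defined and $X^E_{u,\phi} \circ (\phi|_{E_p})^{-1} = ((\phi|_{E_p})^{-1})_* \circ X^F_u$. Hence by applying $(\phi|_{E_p})_*$
to both sides, one obtains $(\phi|_{E_p})_* \circ X^E_{u,\phi} \circ (\phi|_{E_p})^{-1} = X^F_u$.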


64.13.5 REMARK: Proving differentiability of vector fields induced on the total space. 

At first sight, it might seem straightforward to show that the function $X^E_{u,\phi}$ in Definition 64.13.3 is $C^k$ for
a $C^{k+1}$ fibre bundle because $\phi$ is $C^{k+1}$ and $X^F_u$ is $C^k$. However, the map $z \mapsto \phi|_{E_{\pi(z)}}$ in line (64.13.1)
has a complicated dependence on $z$. The differential of the inverse of this map varies according to the
base point $\pi(z)$. This dependence is made explicit in Theorem 64.13.6, where the role of the projection
map $\pi$ is shown more fully. (This formula shows, by the way, that $X^E_{u,\phi}(z)$ is always a vertical vector.)
Equation (64.13.2) is applied in Theorem 64.13.7 to the map $z \mapsto (0_{\pi(z)}, X^F_u(\phi(z)))$ to show differentiability.


64.13.6 THEOREM: Local trivialisation formula for vertical vector field generated by Lie algebra element. 
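Let $(E, \pi, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle. Then


$\forall u \in T_e(G),\ \forall \phi \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi)$,
$X^E_{u,\phi}(z) = ((\pi \times \phi)^{-1})_*(0_{\pi(z)}, X^F_u(\phi(z)))$. (64.13.2)
PROOF: By Theorem 10.15.14 (i), $(\phi|_{E_{\pi(z)}})^{-1} = (\pi \times \phi)^{-1}(\pi(z), \cdot)$. So $((\phi|_{E_{\pi(z)}})^{-1})_* = ((\pi \times \phi)^{-1}(\pi(z), \cdot))_*$. Then
Theorem 58.6.7 line (58.6.4) implies


$\forall \phi \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi),\ \forall y \in T_{\phi(z)}(F)$,
$((\pi \times \phi)^{-1}(\pi(z), \cdot))_*(y) = ((\pi \times \phi)^{-1})_*(0_{\pi(z)}, y)$.


Hence line (64.13.2) follows from Definition 64.13.3 by substituting $y = X^F_u(\phi(z))$.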


64.13.7 THEOREM:  Differentiability of vertical vector fields generated by Lie algebra elements. 
Let $(E, \pi, M, A^F_E)$ be a $C^{k+1}$ $(G, F)$ fibre bundle for some $k \in \mathbb{Z}^+_0$. Then


$\forall u \in T_e(G),\ \forall \phi \in A^F_E,\quad X^E_{u,\phi} \in X^k(T(E) \,|\, \pi(\mathrm{Dom}(\phi)))$.


PROOF: Let $u \in T_e(G)$, $\phi \in A^F_E$ and $U = \pi(\mathrm{Dom}(\phi))$. Then $X^F_u \in X^k(T(F))$ by Theorem 63.6.11,
and $\phi \in C^{k+1}(\pi^{-1}(U), F)$ by Definition 64.8.3 (ii). So $X^F_u \circ \phi \in C^k(\pi^{-1}(U), T(F))$ by the chain rule
Theorem 52.1.17. Define $h : \pi^{-1}(U) \to T(U) \times T(F)$ by $h(z) = (0_{\pi(z)}, X^F_u(\phi(z)))$ for $z \in \pi^{-1}(U)$. Then
$h \in C^k(\pi^{-1}(U), T(U) \times T(F))$ by Theorem 64.2.6 (ii) because $(E, \pi, M)$ is a $C^{k+1}$ $F$-fibration.

By Theorem 64.13.6, $X^E_{u,\phi} = ((\pi \times \phi)^{-1})_* \circ h$. Therefore $X^E_{u,\phi} \in C^k(\pi^{-1}(U), T(E))$ by the chain rule because
$((\pi \times \phi)^{-1})_* : T(U) \times T(F) \to T(E)$ is $C^k$ by Definition 64.8.3 (ii) and Theorem 58.10.3. Consequently
$X^E_{u,\phi} \in X^k(T(E) \,|\, \pi(\mathrm{Dom}(\phi)))$.
64.13.8 THEOREM: Verification of verticality of vector fields generated by Lie algebra elements.
Let $(E, \pi, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle. Then


$\forall u \in T_e(G),\ \forall \phi \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi),\quad X^E_{u,\phi}(z) \in T_{z,0}(E)$.
Hence $X^E_{u,\phi}$ is a vertical local cross-section of $T(E)$ on $\mathrm{Dom}(\phi)$ for all $u \in T_e(G)$ and $\phi \in A^F_E$.


PROOF: Let $u \in T_e(G)$, $\phi \in A^F_E$ and $z \in \mathrm{Dom}(\phi)$. Then $(\pi \times \phi)_*(X^E_{u,\phi}(z)) = (0_{\pi(z)}, X^F_u(\phi(z)))$ by
Theorems 64.13.6 and 58.9.8. Therefore $\pi_*(X^E_{u,\phi}(z)) = 0_{\pi(z)}$ by Theorem 58.10.7. Hence $X^E_{u,\phi}(z) \in T_{z,0}(E)$
by Notation 64.5.6. Hence $X^E_{u,\phi}$ is a vertical cross-section of $T(E)$ on $\mathrm{Dom}(\phi)$ by Definition 64.5.7.
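
The verticality assertion can be checked numerically in a simple special case. The sketch below assumes a trivial bundle E = R x R^2 over M = R with the matrix group GL(2, R) acting on F = R^2, so that X^F_u(f) = u f, and it assumes a hypothetical chart phi(x, f) = g(x) f. None of these concrete choices comes from the text. The push-forward by the inverse trivialisation is approximated by a central difference.

```python
def matvec(a, v):
    """Product of a 2x2 matrix with a vector in R^2."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def transition(x):
    """Hypothetical chart data: phi(x, f) = g(x) f with g(x) invertible for all x."""
    return [[1.0, x], [0.0, 1.0 + x * x]]


def inverse_trivialisation(x, q):
    """(pi x phi)^{-1}(x, q) = (x, g(x)^{-1} q)."""
    return (x, matvec(inverse(transition(x)), q))


u = [[0.0, -1.0], [1.0, 0.5]]            # Lie algebra element (any 2x2 matrix)
x, f = 0.5, [1.0, -3.0]                  # the point z = (x, f) in E
q = matvec(transition(x), f)             # phi(z)
y = matvec(u, q)                         # X^F_u(phi(z)) = u phi(z)

# Central difference of (pi x phi)^{-1} along the direction (0, y).
h = 1e-6
plus = inverse_trivialisation(x, [q[i] + h * y[i] for i in range(2)])
minus = inverse_trivialisation(x, [q[i] - h * y[i] for i in range(2)])
base_component = (plus[0] - minus[0]) / (2 * h)
fibre_component = [(plus[1][i] - minus[1][i]) / (2 * h) for i in range(2)]

# Closed form for comparison: g(x)^{-1} u g(x) f.
g = transition(x)
expected = matvec(inverse(g), matvec(u, matvec(g, f)))
fibre_error = max(abs(a - b) for a, b in zip(fibre_component, expected))
```

The base-space component vanishes because the curve used for the difference quotient never leaves the fibre over x, which is the coordinate form of verticality in this example.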


64.13.9 REMARK: The tangent space direct product identification map technical requirement. 

The proof of Theorem 64.13.8 ignores the technical requirement to apply the “tangent space direct product
identification map” $i : T_{\pi(z)}(M) \times T_{\phi(z)}(F) \to T_{(\pi(z),\phi(z))}(M \times F)$ in Definition 54.7.2. Thus $(0_{\pi(z)}, X^F_u(\phi(z)))$
is identified with $i(0_{\pi(z)}, X^F_u(\phi(z)))$, which is technically different. The requirement for this map is also
ignored in line (64.13.2) and elsewhere in Chapters 64-66, and also in later chapters, because the meaning
is not made ambiguous by ignoring the issue, and it is somewhat difficult to explain why it is even required.


64.13.10 REMARK: Ordinary fibre bundle vector field fibre chart transition formula. 

The formula in Theorem 64.13.11 for fibre chart transitions for vector fields $X^E_{u,\phi}$ appears very generally in a
similar form in the theory of principal bundles, often in “gauge transformation” formulas. (See for example
Theorems 66.6.10, 69.8.2, 69.8.3 line (69.8.5), 69.12.5 and 69.12.10.) But Theorem 64.13.11 shows that the
formula is also applicable on the total space of an ordinary fibre bundle.


The equality on line (64.13.4) means that the “real” vector field on E is independent of the coordinate 
system. As discussed in Section 20.10, the total space of a fibre bundle represents the “real” system, whereas 
the fibre space represents the space of all possible measurements of that system. A fibre chart transition 
then signifies a change of “reference frame” with respect to which measurements are made. 


The structure group has two roles. It is the set of all permissible transformations of reference frames, but
it is also the set of possible real actions on the total space. The generation of a vector field on $E$ by
$u \in T_e(G)$ is first defined as an infinitesimal action on $F$, which is then copied to $E$ via fibre charts $\phi$. But of
course, to be “real”, the end result of this two-stage process must be independent of the reference frame $\phi$.
And this is exactly what line (64.13.4) means. The adjoint operator converts the infinitesimal action $u_1$ to
$u_2 = \mathrm{Adj}(g_{\phi_2,\phi_1}(p))(u_1)$ so that the same real effect is seen on $E$ after passing through charts $\phi_1$ and $\phi_2$.
(Theorem 64.13.11 is illustrated in Figure 64.13.1.)


[Diagram: the vector field on $E$ and its chart images on $F$, including $X^F_{u_2}(\phi_2(z)) = (dR_{\phi_2(z)})_e(u_2)$.]
Figure 64.13.1 Fibre chart transition rule for vector fields on an OFB total space


64.13.11 THEOREM: Fibre chart transition rule for vector fields generated by Lie algebra elements. 
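Let $(E, \pi, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle. Then
$\forall u_1, u_2 \in T_e(G),\ \forall \phi_1, \phi_2 \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)$,
$u_2 = \mathrm{Adj}(g_{\phi_2,\phi_1}(\pi(z)))(u_1) \;\Rightarrow\; X^E_{u_1,\phi_1}(z) = X^E_{u_2,\phi_2}(z)$, (64.13.3)


where $\phi_2 \circ (\phi_1|_{\pi^{-1}(\{p\})})^{-1} = L_{g_{\phi_2,\phi_1}(p)} : F \to F$ for all $p \in \pi(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$ as in Definition 64.8.3 (iv).
In other words,


$\forall u_1 \in T_e(G),\ \forall \phi_1, \phi_2 \in A^F_E,\ \forall p \in \pi(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)),\ \forall z \in \pi^{-1}(\{p\})$,
$X^E_{u_1,\phi_1}(z) = X^E_{\mathrm{Adj}(g_{\phi_2,\phi_1}(p))(u_1),\phi_2}(z)$. (64.13.4)


PROOF: Let $u_1 \in T_e(G)$, $\phi_1, \phi_2 \in A^F_E$, $z \in \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2)$ and $u_2 = \mathrm{Adj}(g)(u_1)$ with $g = g_{\phi_2,\phi_1}(\pi(z))$.
Let $E_p$ denote $\pi^{-1}(\{p\})$ for all $p \in M$. Then by Theorem 63.6.15,


$X^E_{u_2,\phi_2}(z) = ((\phi_2|_{E_{\pi(z)}})^{-1})_*(X^F_{u_2}(\phi_2(z)))$
$= ((\phi_2|_{E_{\pi(z)}})^{-1})_*((L_g)_*(X^F_{u_1}(g^{-1}\phi_2(z))))$
$= ((\phi_1|_{E_{\pi(z)}})^{-1})_*(X^F_{u_1}(\phi_1(z)))$
$= X^E_{u_1,\phi_1}(z)$.


Hence the assertion.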
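
The adjoint transition rule can be illustrated numerically in the simplest linear setting. The sketch below assumes the matrix group GL(2, R) acting on F = R^2, so that X^F_u(f) = u f and Adj(g)(u) = g u g^{-1}, and it assumes a single fibre with two linear charts related by phi_2 = g phi_1. These concrete choices and all function names are illustrative assumptions, not definitions from the text.

```python
def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(a, v):
    """Product of a 2x2 matrix with a vector in R^2."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def adjoint(g, u):
    """Adj(g)(u) = g u g^{-1} for a matrix group."""
    return matmul(matmul(g, u), inverse(g))


def field_on_fibre(u, chart, z):
    """(chart^{-1})_*(X^F_u(chart(z))) for a linear chart, with X^F_u(f) = u f."""
    return matvec(inverse(chart), matvec(u, matvec(chart, z)))


chart1 = [[2.0, 1.0], [0.0, 1.0]]        # phi_1 restricted to one fibre
g = [[1.0, 3.0], [-1.0, 2.0]]            # transition group element
chart2 = matmul(g, chart1)               # phi_2 = L_g after phi_1
u1 = [[0.0, -1.0], [1.0, 0.5]]           # generator seen through phi_1
u2 = adjoint(g, u1)                      # generator seen through phi_2
z = [0.7, -1.2]                          # a point of the fibre

x1 = field_on_fibre(u1, chart1, z)
x2 = field_on_fibre(u2, chart2, z)
max_diff = max(abs(a - b) for a, b in zip(x1, x2))
```

Up to rounding, the two charts give the same vector at z, as the transition rule predicts for this example.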


64.13.12 THEOREM: Uniqueness of generators of vertical vector fields on fibre sets. 
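Let $(E, \pi, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle. Then


$\forall p \in M,\ \forall \phi_1, \phi_2 \in A^F_{E,p},\ \forall u_1, u_2 \in T_e(G)$,
$u_2 = \mathrm{Adj}(g_{\phi_2,\phi_1}(p))(u_1) \;\Leftrightarrow\; X^E_{u_1,\phi_1}\big|_{E_p} = X^E_{u_2,\phi_2}\big|_{E_p}$, (64.13.5)


where $\phi_2 \circ (\phi_1|_{\pi^{-1}(\{p\})})^{-1} = L_{g_{\phi_2,\phi_1}(p)} : F \to F$ for all $p \in \pi(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$ as in Definition 64.8.3 (iv).


PROOF: The forward assertion of line (64.13.5) follows from Theorem 64.13.11 line (64.13.3). To show the
converse, let $p \in M$, $\phi_1, \phi_2 \in A^F_{E,p}$ and $u_1, u_2 \in T_e(G)$, and suppose that $X^E_{u_1,\phi_1}\big|_{E_p} = X^E_{u_2,\phi_2}\big|_{E_p}$. Then
$X^E_{u_2,\phi_2}\big|_{E_p} = X^E_{u_1,\phi_1}\big|_{E_p} = X^E_{\mathrm{Adj}(g_{\phi_2,\phi_1}(p))(u_1),\phi_2}\big|_{E_p}$ by Theorem 64.13.11 line (64.13.4). Therefore $X^F_{u_2} = X^F_{\mathrm{Adj}(g_{\phi_2,\phi_1}(p))(u_1)}$
by Theorem 64.13.4 (iii). Hence $u_2 = \mathrm{Adj}(g_{\phi_2,\phi_1}(p))(u_1)$ by Theorem 63.6.17.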


64.13.13 REMARK: Lie-algebra-induced vertical vector fields on ordinary fibre bundles via fibre charts. 
Vector fields may be defined on the C! manifolds E, M, G and F of a C! fibre bundle (E, r, M, AE) < 
(E, Ag, 1, M, Am, AE) with structure group (G, F) < (G, Ac, F, AF, oG, HE). Various kinds of invariant 
fields and infinitesimal transformations are of special interest on these manifolds. On the group G, left 
and right invariant vector fields are defined in Sections 62.4 and 62.7. On the fibre space $F$, infinitesimal
transformations are defined in Section 63.6. In Section 64.13, similar fields are defined on the total space E. 
Some vector fields for ordinary fibre bundles are summarised in Table 64.13.2. 


field                                          parameters                           formula                                                        description                    definition
$X^L_u \in X(T(G))$                            $u \in T_e(G)$                       $g \mapsto (dL_g)_e(u)$                                        left invariant vector field    62.4.15
$X^R_u \in X(T(G))$                            $u \in T_e(G)$                       $g \mapsto (dR_g)_e(u)$                                        right invariant vector field   62.7.7
$X^F_u \in X(T(F))$                            $u \in T_e(G)$                       $f \mapsto (dR_f)_e(u)$                                        infinitesimal transformation   63.6.5
$X^E_{u,\phi} \in X(T(E) \,|\, \mathrm{Dom}(\phi))$   $u \in T_e(G)$, $\phi \in A^F_E$     $z \mapsto ((\phi|_{E_{\pi(z)}})^{-1})_*(X^F_u(\phi(z)))$      infinitesimal transformation   64.13.3

Table 64.13.2 Lie-algebra-induced vector fields on ordinary fibre bundles


The local field $X^E_{u,\phi} : \mathrm{Dom}(\phi) \to T(E)$ on $E$ pushes the vector $X^F_u(\phi(z))$ back to $z$ via the differential
of the fibre chart $\phi$. To verify that $X^E_{u,\phi}$ is well defined for all $u \in T_e(G)$ and $\phi \in A^F_E$, let $z \in \mathrm{Dom}(\phi)$.
Then $\phi(z) \in F$ by Definition 64.8.3 (ii). Therefore $X^F_u(\phi(z))$ is a well-defined element of $T_{\phi(z)}(F)$ because
$X^F_u \in X(T(F))$ is well defined. (See Definition 63.6.5.) But $\phi|_{E_{\pi(z)}}$ is a $C^1$ diffeomorphism from $E_{\pi(z)}$
to $F$, where the notation $E_{\pi(z)}$ is an abbreviation for $\pi^{-1}(\{\pi(z)\})$. Let $\rho = (\phi|_{E_{\pi(z)}})^{-1}$. Then $\rho : F \to E_{\pi(z)}$
is a $C^1$ diffeomorphism, and $\rho(\phi(z)) = z$. Therefore $\rho_*(X^F_u(\phi(z))) = (d\rho)_{\phi(z)}(X^F_u(\phi(z))) \in T_z(E)$ is well
defined. Thus $X^E_{u,\phi} : z \mapsto ((\phi|_{E_{\pi(z)}})^{-1})_*(X^F_u(\phi(z)))$ is a well-defined map from $\mathrm{Dom}(\phi)$ to $T(E)$ such that
$X^E_{u,\phi}(z) \in T_z(E)$ for all $z \in \mathrm{Dom}(\phi)$. In other words, $X^E_{u,\phi} \in X(T(E) \,|\, \mathrm{Dom}(\phi))$.


The fields $X^L_u$ and $X^R_u$ are specific to Lie groups and have nothing to do with the fibre space or the
differentiable fibre bundle. The field $X^F_u$ is specific to the Lie transformation group $(G, F)$ and has nothing
to do with the differentiable manifold. Only the field $X^E_{u,\phi}$ is specific to the differentiable fibre bundle.
However, all of these fields are related.
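
For a matrix group, the first three formulas in the table reduce to matrix products, which can be checked by differentiating along a curve through the identity. The sketch below assumes G = GL(2, R) acting on F = R^2 with L_g(h) = g h, R_g(h) = h g and R_f(h) = h f, and it uses the hypothetical curve t -> I + t u, whose velocity at t = 0 is u. These conventions are assumptions made for the illustration only.

```python
def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(a, v):
    """Product of a 2x2 matrix with a vector in R^2."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def curve(u, t):
    """Curve I + t u in GL(2, R) for small t, with velocity u at t = 0."""
    return [[(1.0 if i == j else 0.0) + t * u[i][j] for j in range(2)]
            for i in range(2)]


def scaled_difference(a, b, s):
    """Entrywise (a - b) * s for nested lists of floats."""
    if isinstance(a, list):
        return [scaled_difference(x, y, s) for x, y in zip(a, b)]
    return (a - b) * s


def derivative_at_zero(func, h=1e-6):
    """Central difference approximation of d/dt func(t) at t = 0."""
    return scaled_difference(func(h), func(-h), 1.0 / (2.0 * h))


def max_abs_diff(a, b):
    """Largest entrywise absolute difference of two nested lists."""
    if isinstance(a, list):
        return max(max_abs_diff(x, y) for x, y in zip(a, b))
    return abs(a - b)


u = [[0.0, -1.0], [2.0, 0.5]]
g = [[1.0, 2.0], [0.0, 3.0]]
f = [1.0, -1.0]

left_field = derivative_at_zero(lambda t: matmul(g, curve(u, t)))    # (dL_g)_e(u)
right_field = derivative_at_zero(lambda t: matmul(curve(u, t), g))   # (dR_g)_e(u)
fibre_field = derivative_at_zero(lambda t: matvec(curve(u, t), f))   # (dR_f)_e(u)

left_error = max_abs_diff(left_field, matmul(g, u))
right_error = max_abs_diff(right_field, matmul(u, g))
fibre_error = max_abs_diff(fibre_field, matvec(u, f))
```

In this matrix setting the three fields are g u, u g and u f respectively, which makes the relation between them visible: all three are obtained from the same generator u by differentiating a group multiplication or a group action.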


64.13.14 REMARK: Generalisation of infinitesimal actions to non-Lie structure groups. 

The vector $u \in T_e(G)$ in Remark 64.13.13 may be set equal to the derivative $\gamma'(0) \in T_e(G)$ of a differentiable
curve $\gamma : \mathbb{R} \to G$ with $\gamma(0) = e \in G$. This suggests the natural generalisation to a differentiable fibre bundle
with non-Lie structure group. In this way, the vector field $X^E_{u,\phi}$ may be generalised to the vector field $X$ on
$\mathrm{Dom}(\phi)$ defined by


$\forall z \in \mathrm{Dom}(\phi),\quad X(z) = \partial_t\big(L^\phi_{\gamma(t)}(z)\big)\big|_{t=0}$,


where the left action $L^\phi_g : E_p \to E_p$ is defined by $L^\phi_g(z) = (\phi|_{E_p})^{-1}(g\phi(z))$. The corresponding generalisation
of the infinitesimal left action $X^F_u$ to non-manifold groups $G$ is discussed in Remark 63.6.3. The corresponding
left and right invariant fields $X^L_u$ and $X^R_u$, which do not require curves for their definition, are discussed in
Sections 62.4 and 62.7 respectively.
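
The curve-based definition can be compared with the Lie-algebra-based one in a small numerical sketch. It assumes the rotation curve gamma(t) in GL(2, R), whose velocity at t = 0 is the matrix u = [[0, -1], [1, 0]], and a hypothetical linear chart on a single fibre. The derivative with respect to t is approximated by a central difference. None of these choices comes from the text.

```python
import math


def matvec(a, v):
    """Product of a 2x2 matrix with a vector in R^2."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def rotation(t):
    """Curve gamma(t) in the group with gamma(0) equal to the identity."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def chart_action(chart, t, z):
    """z -> chart^{-1}(gamma(t) chart(z)), the action copied to the fibre via the chart."""
    return matvec(inverse(chart), matvec(rotation(t), matvec(chart, z)))


chart = [[2.0, 1.0], [0.0, 1.0]]         # hypothetical linear chart on one fibre
z = [0.7, -1.2]

# Curve-based vector: derivative of the chart action at t = 0.
h = 1e-5
plus = chart_action(chart, h, z)
minus = chart_action(chart, -h, z)
x_curve = [(plus[i] - minus[i]) / (2 * h) for i in range(2)]

# Lie-algebra-based vector for u = gamma'(0).
u = [[0.0, -1.0], [1.0, 0.0]]
x_algebra = matvec(inverse(chart), matvec(u, matvec(chart, z)))
max_diff = max(abs(a - b) for a, b in zip(x_curve, x_algebra))
```

The two vectors agree to within the accuracy of the difference quotient. The curve-based computation uses only the group action and the chart, which is why it still makes sense when no Lie algebra is available.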


64.14. Non-vertical vector fields generated by the Lie algebra 


64.14.1 REMARK: Non-vertical infinitesimal translations on fibre sets. 
Definition 64.14.2 is useful for connections on fibre bundles. Connections are in general non-vertical vector
fields which are generated by Lie algebra elements of the structure group.


Definition 64.14.2 line (64.14.1) is a natural extension of Theorem 64.13.6 line (64.13.2), replacing the zero
vector $0_p \in T_p(M)$ with a general vector $V \in T_p(M)$. The resulting vector field on $E_p$ is then no longer
vertical, as is the case with $X^E_{u,\phi}$ in Theorem 64.13.8. Instead, the field $X^E_{u,\phi,V}$ has a horizontal component
equal to $V$. Consequently, instead of the simple adjoint formula in Theorem 64.13.11 line (64.13.3), depending
only on $g_{\phi_2,\phi_1}(p)$, one obtains two terms instead, namely a term which depends on $g_{\phi_2,\phi_1}$ and a term which
depends on its first derivative in the horizontal direction $V$. None of this is very surprising. However, it
must all be computed and verified.


64.14.2 DEFINITION: The (non-vertical) vector field induced on a fibre set $E_p$ of a $C^1$ $(G, F)$ fibre bundle
$(E, \pi_E, M, A^F_E)$ by a vector $u \in T_e(G)$ via a chart $\phi \in A^F_{E,p}$, with base-space velocity $V \in T_p(M)$ at $p \in M$, is
the vector field $X^E_{u,\phi,V} \in X(T(E) \,|\, E_p)$ which satisfies


$\forall z \in E_p,\quad X^E_{u,\phi,V}(z) = ((\pi_E \times \phi)^{-1})_*(V, X^F_u(\phi(z)))$ (64.14.1)
$= ((\pi_E \times \phi)^{-1})_*(V, (dR_{\phi(z)})_e(u))$.
(See Definition 63.6.5 for $X^F_u$.)


Alternative name: non-vertical Lie-algebra-generated vector field.


64.14.3 THEOREM: Verification that non-vertical Lie-algebra-generated vector fields are vector fields. 
Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle.


(i) $\forall u \in T_e(G),\ \forall \phi \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi),\ \forall V \in T_{\pi_E(z)}(M)$,
(ii) $\forall u \in T_e(G),\ \forall \phi \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi),\ \forall V \in T_{\pi_E(z)}(M)$,
PROOF: For part (i), let $u \in T_e(G)$, $\phi \in A^F_E$, $z \in \mathrm{Dom}(\phi)$ and $V \in T_p(M)$, where $p = \pi_E(z)$. Then
$X^F_u(\phi(z)) \in T_{\phi(z)}(F)$ by Definition 63.6.5. Thus $(V, X^F_u(\phi(z))) \in T_{\pi_E(z)}(M) \times T_{\phi(z)}(F)$.

Since $\pi_E \times \phi : \mathrm{Dom}(\phi) \to \pi_E(\mathrm{Dom}(\phi)) \times F$ is a $C^1$ diffeomorphism by Definition 64.8.3 (ii), it follows that
$(d(\pi_E \times \phi))_z : T_z(E) \to T_{(\pi_E(z),\phi(z))}(M \times F) \equiv T_p(M) \times T_{\phi(z)}(F)$ is a bijection. (The equivalence between
these two tangent spaces uses the direct product identification map in Definition 54.7.2.) Consequently
$(d(\pi_E \times \phi))_z^{-1}(V, X^F_u(\phi(z)))$ is a well-defined element of $T_z(E)$.

Part (ii) follows from part (i), Definition 64.14.2 and Notation 64.5.6.


64.14.4 REMARK: Chart transition rules for non-vertical Lie-algebra-generated vector fields. 

Theorem 64.14.5 extends the fibre chart transition rules in Theorem 64.13.11 from vertical to non-vertical
vector fields which are generated by Lie algebra elements. This transition rule for $w \in T_{z,V}(E)$ necessarily
includes a term which depends on the first-order derivative of the chart transition group element $g_{\phi_2,\phi_1}(p)$
in the direction of the horizontal component $V \in T_p(M)$.


Theorem 64.8.10 gives a fibre chart transition rule between the differentials $(d\phi_1)_z(w)$ and $(d\phi_2)_z(w)$ for
$w \in T_z(E)$, without assuming that the vector field on the fibre set is generated by a Lie algebra element. This
differential fibre chart transition rule depends on the horizontal variation of $g_{\phi_2,\phi_1}$. Theorem 64.13.11 gives
a differential chart transition rule with the assumption that there is a Lie-algebra-generated vertical vector
field on each fibre set $E_p$. Theorem 64.14.5 uses Theorem 64.8.10 and Definition 64.14.2 to combine these
horizontal and vertical transition map formulas, assuming a non-vertical Lie-algebra-generated vector field
on the total space. Thus one sees two terms on line (64.14.2), a vertical term $\mathrm{Adj}(g(p))(u_1)$ depending only
on the value $g(p)$, and a horizontal term $(dR_{g(p)})_e^{-1}((dg)_p(V))$ depending on the horizontal differential $(dg)_p$.


64.14.5 THEOREM: Fibre chart transition rule for non-vertical Lie-algebra-generated vector fields. 
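Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle. Then


$\forall p \in M,\ \forall u_1, u_2 \in T_e(G),\ \forall \phi_1, \phi_2 \in A^F_{E,p},\ \forall V \in T_p(M),\ \forall z \in E_p$,
$u_2 = \mathrm{Adj}(g(p))(u_1) + (dR_{g(p)})_e^{-1}((dg)_p(V)) \;\Rightarrow\; X^E_{u_1,\phi_1,V}(z) = X^E_{u_2,\phi_2,V}(z)$, (64.14.2)


where $\phi_2 \circ (\phi_1|_{E_p})^{-1} = L_{g_{\phi_2,\phi_1}(p)} : F \to F$ for all $\phi_1, \phi_2 \in A^F_{E,p}$ as in Definition 64.8.3 (iv), and $g_{\phi_2,\phi_1}$ is
abbreviated to $g$.


PROOF: Let $p \in M$, $u_1 \in T_e(G)$, $\phi_1, \phi_2 \in A^F_{E,p}$, $V \in T_p(M)$ and $z \in E_p$. Let $w = X^E_{u_1,\phi_1,V}(z)$. (See
Definition 64.14.2.) Then $(\pi_E \times \phi_1)_*(w) = (V, X^F_{u_1}(\phi_1(z)))$. So $(d\pi_E)_z(w) = V$ and $(d\phi_1)_z(w) = X^F_{u_1}(\phi_1(z))$.
Then by Theorem 64.8.10 line (64.8.1) with $u = (dR_{g(p)})_e^{-1}((dg)_p(V))$,


$(d\phi_2)_z(w) = X^F_u(\phi_2(z)) + (dL_{g(p)})_{\phi_1(z)}((d\phi_1)_z(w))$
$= X^F_u(\phi_2(z)) + (dL_{g(p)})_{\phi_1(z)}(X^F_{u_1}(\phi_1(z)))$
$= X^F_u(\phi_2(z)) + (L_{g(p)})_*(X^F_{u_1}(g(p)^{-1}\phi_2(z)))$
$= X^F_u(\phi_2(z)) + X^F_{\mathrm{Adj}(g(p))(u_1)}(\phi_2(z))$ (64.14.3)
$= X^F_{u + \mathrm{Adj}(g(p))(u_1)}(\phi_2(z))$, (64.14.4)
where line (64.14.3) follows from Theorem 63.6.15. Hence by Definition 64.14.2, $w = X^E_{u_2,\phi_2,V}(z)$ with
$u_2 = \mathrm{Adj}(g(p))(u_1) + (dR_{g(p)})_e^{-1}((dg)_p(V))$.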
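
The two-term transition rule can be checked numerically in a simple special case. The sketch below assumes a trivial bundle E = R x R^2 over M = R, the matrix group GL(2, R) acting on F = R^2 with X^F_u(f) = u f, and two hypothetical charts phi_1(x, f) = f and phi_2(x, f) = g(x) f. In this setting Adj(g)(u) = g u g^{-1}, and the inverse right-translation differential sends a tangent matrix a at g to a g^{-1}. All of these concrete choices are assumptions made for the illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(a, v):
    """Product of a 2x2 matrix with a vector in R^2."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def add(a, b):
    """Sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def scale(a, s):
    """Scalar multiple of a 2x2 matrix."""
    return [[s * a[i][j] for j in range(2)] for i in range(2)]


def g(x):
    """Hypothetical transition function between the two charts."""
    return [[1.0, x], [0.0, 1.0 + x * x]]


def dg(x):
    """Derivative of g with respect to the base coordinate x."""
    return [[0.0, 1.0], [0.0, 2.0 * x]]


def field_chart1(u, x, f, v):
    """Vector w = (v, w_f) with (pi x phi_1)_*(w) = (v, u f), where phi_1(x, f) = f."""
    return (v, matvec(u, f))


def field_chart2(u, x, f, v):
    """Vector w = (v, w_f) with (pi x phi_2)_*(w) = (v, u g(x) f), where phi_2(x, f) = g(x) f.

    Since (d phi_2)(w) = v g'(x) f + g(x) w_f, this solves for w_f.
    """
    gx = g(x)
    target = matvec(u, matvec(gx, f))
    drift = matvec(scale(dg(x), v), f)
    return (v, matvec(inverse(gx), [target[i] - drift[i] for i in range(2)]))


x, v = 0.5, 2.0                          # base point and base-space velocity
f = [1.0, -3.0]
u1 = [[0.0, -1.0], [1.0, 0.5]]
gx = g(x)

# u2 = Adj(g)(u1) + (dR_g)^{-1}(dg(V)), in matrix form.
u2 = add(matmul(matmul(gx, u1), inverse(gx)),
         matmul(scale(dg(x), v), inverse(gx)))

w1 = field_chart1(u1, x, f, v)
w2 = field_chart2(u2, x, f, v)
max_diff = max(abs(a - b) for a, b in zip(w1[1], w2[1]))
```

Both charts produce the same tangent vector, with the same horizontal component v. Dropping the derivative term from u2 would leave a discrepancy proportional to v, which is the reason for the second term in the rule.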
64.14.6 THEOREM: Uniqueness of generators of non-vertical vector fields on fibre sets. 
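Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ $(G, F)$ fibre bundle. Then
$\forall p \in M,\ \forall \phi_1, \phi_2 \in A^F_{E,p},\ \forall V \in T_p(M),\ \forall u_1, u_2 \in T_e(G)$,
$u_2 = \mathrm{Adj}(g(p))(u_1) + (dR_{g(p)})_e^{-1}((dg)_p(V)) \;\Leftrightarrow\; X^E_{u_1,\phi_1,V} = X^E_{u_2,\phi_2,V}$, (64.14.5)


where $\phi_2 \circ (\phi_1|_{E_p})^{-1} = L_{g_{\phi_2,\phi_1}(p)} : F \to F$ for all $p \in \pi_E(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$ as in Definition 64.8.3 (iv),
and $g_{\phi_2,\phi_1}$ is abbreviated to $g$.
PROOF: The forward assertion of line (64.14.5) follows from Theorem 64.14.5. To show the converse, let
$p \in M$, $\phi_1, \phi_2 \in A^F_{E,p}$, $V \in T_p(M)$ and $u_1, u_2, u_3 \in T_e(G)$ with $u_3 = \mathrm{Adj}(g(p))(u_1) + (dR_{g(p)})_e^{-1}((dg)_p(V))$,
and suppose that $X^E_{u_1,\phi_1,V} = X^E_{u_2,\phi_2,V}$. Then $X^E_{u_3,\phi_2,V} = X^E_{u_1,\phi_1,V}$ by Theorem 64.14.5. So $((\pi_E \times \phi_2)^{-1})_*(V, X^F_{u_2}(\phi_2(z))) =
((\pi_E \times \phi_2)^{-1})_*(V, X^F_{u_3}(\phi_2(z)))$ for all $z \in E_p$ by Definition 64.14.2. So $X^F_{u_2}(f) = X^F_{u_3}(f)$ for all $f \in F$ because
$\phi_2(E_p) = F$. Therefore $u_2 = u_3$ by Theorem 63.6.17. This verifies the converse.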
Chapter 65


DIFFERENTIABLE VECTOR BUNDLES 
65.9 Tangent bundles of differentiable manifolds


65.0.1 REMARK: Relations between fibre bundles, vector bundles and principal bundles. 

In applications to physics, and particle physics in particular, the two main classes of fibre bundles are vector 
bundles and principal bundles. Very roughly speaking, vector bundles in particle physics contain the vector 
fields which represent fermionic matter wave functions, whereas principal bundles contain the connection 
forms which represent bosonic radiation gauge potentials. 


Some relations between the mathematical classes of fibre bundles are illustrated in Figure 65.0.1. 


                  ordinary fibre bundle
                   /                \
        vector bundle          principal fibre bundle
              |                       |
        tangent bundle         principal frame bundle
Figure 65.0.1 Family tree for differentiable fibre bundle classes


In gauge theory, a connection form on a principal bundle typically represents a background bosonic radiation 
field, which is then converted to a covariant derivative on an associated fermionic vector bundle, which is 
then used to construct a Lagrangian density function, from which equations of motion for the fermions 
and bosons can be derived. (Some typical examples of fermion/boson combinations are electrons in photon 
radiation fields, and quarks in gluon radiation fields.) 


From the mathematical point of view, vector bundles have the virtue that covariant derivatives can be defined 
on them, whereas principal bundles have the virtue that connection forms can be defined on them. 


As discussed in Remarks 65.1.1, 65.3.4 and 68.2.2, it is the linear space structure of the fibre space of a vector 
bundle which makes covariant derivatives well defined and fibre-chart-independent. Thus differentiable vector 
bundles are the natural structure for covariant derivatives in the same way that differentiable fibre bundles 
are the natural structure for connections. 


65.0.2 REMARK: Literature for vector bundles. 
General vector bundles are defined by Poor [32], pages 12-25; Sternberg [38], pages 337-338; Darling [8], 


Alan U. Kennington, "Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright (C) 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 
pages 125-132; Frankel [12], pages 413-419; Eriksson/Hággblad/Strómbom [264], pages 47-48; Drechsler/
Mayer [262], page 82; Crampin/Pirani [7], pages 357-360; Kobayashi/Nomizu [19], page 113; EDM2 [113], 
pages 570-571. (See Remark 68.2.1 for some literature on covariant derivatives for general vector bundles.) 


65.0.3 REMARK: The extraordinary linearity of fundamental physics. 

At several points in this book, some very basic and obvious assertions regarding vector bundles require an 
apparently disproportionate amount of work to prove them within the framework of general differentiable 
fibre bundles. This raises the question of whether the general framework is truly suitable, or whether vector 
bundles should be treated as a separate class of objects with their own particular definitions and theorems. 


In this book, the theory for connections on vector bundles is developed as a special case of general connections 
on differentiable fibre bundles. But even showing the most basic properties, such as linearity and the Leibniz 
rule for covariant derivatives (as in Theorems 68.2.11 and 68.2.16), requires inordinate preparation with 
numerous technical theorems. Such properties of covariant derivatives are typically given as fundamental 
properties in textbooks, and other properties are derived from them. Usually in mathematics, embedding 
one class of objects as a subclass within another reduces the overall workload, but in this case the workload 
is greatly increased. Nevertheless, the difficult path has been chosen here. It is quite satisfying to know that 
the fundamental properties of connections and covariant derivatives on vector bundles can be derived from 
more fundamental assumptions about general connections on differentiable fibre bundles. But the benefit is
largely philosophical, not very useful for computations or the discovery of new theorems. 


In addition to the substantial effort required to derive properties of covariant derivatives on vector bundles 
from properties of general connections, another consideration is that the applications to particle physics do 
not appear to have any experimentally valid generalisation from linear connections to non-linear. If
electromagnetic waves, for example, were only first-order approximations to some sort of non-linear phenomenon,
one would expect to see harmonics or “overtones” of light at multiples of fundamental frequencies, analogous 
to the way non-linear sound-producing mechanisms produce overtones. Since no such thing is seen, even 
when light has travelled through billions of light-years of inter-galactic space, one may assume that light is 
a purely linear phenomenon. Moreover, quantum mechanics is an entirely linear theory involving Hilbert 
spaces showing no hint of non-linearity. Thus particle physics and quantum field theory are apparently pure 
linear theories, not linearisations of slightly non-linear phenomena. Therefore there seems to be no benefit 
from embedding vector bundle theory inside a more general differentiable fibre bundle theory. 


Consequently one may say that much of the effort required to develop the theory in Chapter 65 and Sections 
68.1, 68.2 and 68.3 is a cost of the principle of deriving the particular from the general, not a benefit. 


65.1. Vector bundles 


65.1.1 REMARK: Vector bundles are differentiable fibre bundles whose fibre space is a linear space. 

The vector bundles in Definition 65.1.3 specialise the differentiable (G, F) fibre bundles in Definition 64.8.3 to 
the special case where F is a finite-dimensional linear space and G = GL(F), and (GL(F), F) is the general 
linear Lie transformation group for F. (See Notation 23.1.12 for GL(F). For the analogous specialisation of 
non-topological fibre bundles to non-topological vector bundles, see Section 24.11.) 


The more general case of infinite-dimensional linear spaces F is not considered here because the structure 
group GL(F) for such F is not a finite-dimensional manifold. The most immediate application of the 
vector bundle concept is to the tangent bundles of finite-dimensional manifolds, but the motivation for the 
generalisation from tangent bundles to vector bundles lies in quantum field theory. 


For the tangent bundle of a differentiable manifold M, the fibre space is typically chosen as the real linear
space $\mathbb{R}^n$ with n = dim(M), and the structure group is then GL(n, R). (Complex linear spaces $\mathbb{C}^n$, with
structure groups which are subgroups of GL(n, C), are of particular interest in gauge theory.)


When $\mathbb{R}^n$ is regarded as a $C^k$ manifold for some $k \in \overline{\mathbb{Z}}^+_0$, the charts for $\mathbb{R}^n$ can, in principle, be chosen
as general $C^k$ local diffeomorphisms from $\mathbb{R}^n$ to $\mathbb{R}^n$. However, the term "vector bundle" suggests that the
charts should be linear, which implies that each chart for F should be a component map $\kappa_B : F \to \mathbb{R}^m$ for
some basis B for F, where m = dim(F). (See Definition 22.8.8 for $\kappa_B$.) Similarly, the charts for GL(F)
would typically be the standard linear transformation matrices $\kappa_{B,B}$ for F with respect to some basis. (See
Definition 23.2.10 for $\kappa_{B,B}$.) A single chart is sufficient for the manifold atlases for F and GL(F), although
usually these atlases will contain all valid component charts.


In purely mathematical terms, the motivation for defining the class of vector bundles, which is intermediate
between the much more specialised tangent bundles and the much more general differentiable fibre bundles,
is to introduce the vector bundle drop function in Definition 65.3.5. (The fibre space must be a Banach
space to make this well defined.) This drop function makes it possible to define covariant derivatives of
cross-sections of vector bundles. This helps explain the importance of general vector bundles in physics,
because the covariant derivative is a core concept for the vector fields which are required for field theories.


65.1.2 REMARK: Tensor bundles are special kinds of vector bundles. 

Tensor bundles may also be considered to be vector bundles in the sense that tensor spaces are linear spaces,
but the structure group for tensors is typically not the general linear group. Tensor bundles are in fact
associated ordinary fibre bundles of (GL(F), F) fibre bundles. The structure group for a tensor bundle is
the same as the structure group for the primal linear space with which it is associated. For example, the
structure group for an (r,s) tensor bundle $T^{r,s}(M)$ would act individually on each of the r copies of $T_p(M)$
and s copies of $T^*_p(M)$ at each point $p \in M$. In other words, the structure group acts, via the fibre charts,
on the underlying primal space $T_p(M)$, not as the general linear group on $T^{r,s}_p(M)$.


Consequently tensor bundles meet the requirements for a vector bundle, but for a fibre chart to be compatible 
with the structure of a tensor bundle, it must be restricted so that the tensorial structure is faithfully mapped 
to the fibre space. A similar comment applies to orthogonal bundles, for example, whose structure group is 
O(n) or SO(n). This is yet another reason why “complete atlases” should not be used in primary definitions 
for bundles or manifolds. When forming a complete atlas of any kind, one must be careful that the atlas 
completion does not lose or obscure important properties of the structure. 


If the fibre space is $\mathbb{C}^n$ for some $n \in \mathbb{Z}^+$, then the structure group will be $GL(\mathbb{C}^n) = GL(n, \mathbb{C})$, which is not
isomorphic to the group $GL(2n, \mathbb{R})$ because, for example, $\dim(GL(n, \mathbb{C})) = 2n^2$ and $\dim(GL(2n, \mathbb{R})) = 4n^2$
due to the constraints on complex number multiplication. (See for example Gilmore [82], pages 47-48.)


There is, in principle, no need to add extra conditions to Definition 65.1.3 to enforce linearity beyond what
is required for general fibre bundles in Definition 64.8.3. Individual fibre sets of the total space inherit an
implied linear space structure via the fibre charts. But when a connection is defined on a vector bundle, it
is usual to add an extra condition to require the connection to have a linear relation (via the fibre charts)
to the fibre space.
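

The dimension count quoted above for complex fibre spaces can be checked by hand in the smallest case. The
following is only a sketch, assuming the standard identification of $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ via real and imaginary
parts. The block-matrix embedding shown is the usual textbook one and is not taken from the definitions in this book.

```latex
% Assumption: \mathbb{C}^n is identified with \mathbb{R}^{2n} via z = x + iy \mapsto (x, y).
\begin{align*}
  \dim_{\mathbb{R}} GL(n,\mathbb{C}) &= \dim_{\mathbb{R}} M_n(\mathbb{C}) = 2n^2, \\
  \dim_{\mathbb{R}} GL(2n,\mathbb{R}) &= \dim_{\mathbb{R}} M_{2n}(\mathbb{R}) = 4n^2, \\
  A + iB &\mapsto \begin{pmatrix} A & -B \\ B & A \end{pmatrix}
    \quad\text{embeds } GL(n,\mathbb{C}) \text{ in } GL(2n,\mathbb{R}).
\end{align*}
% Case n = 1: a + ib \neq 0 maps to the real matrix with rows (a, -b) and (b, a),
% a 2-parameter family inside the 4-parameter group GL(2,\mathbb{R}).
```

Under this identification, only those real matrices which commute with the matrix of multiplication by i arise
from complex-linear maps, which accounts for the missing $2n^2$ dimensions.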


It is convenient to always use linear charts as far as possible, so as to reflect the linearity of maps and spaces 
within a vector bundle. Every finite-dimensional linear space has a basis by Theorem 22.7.13, and every 
basis is associated with a unique component map as in Definition 22.8.8. It is always possible to constrain 
each manifold chart for a linear space to be a component map for some basis to reflect the linear structure.
If this structure does not coincide with the bundle's intended topology and other properties, then the name 
"vector bundle" is not justified and should not be used. 


Definition 63.4.17 requires the differentiable structures (i.e. manifold atlases) on GL(F) and F to contain
all charts which are component maps valued in $GL(n, \mathbb{R})$ and $\mathbb{R}^n$ respectively, where n = dim(F). Then
by Definition 64.8.3 (i, ii, v), the manifold atlases for the total space and base space must be $C^k$ compatible
with these.


65.1.3 DEFINITION: Differentiable vector bundle.
A $C^k$ (differentiable) vector bundle over F, for $k \in \overline{\mathbb{Z}}^+_0$, is a $C^k$ differentiable (G, F) fibre bundle, where


(i) F is a finite-dimensional real linear space,
(ii) G = GL(F),
(iii) $(G, F)$ -< $(G, A_G, F, A_F, \sigma_G, \mu)$ is the general linear Lie transformation group of F with component-map
atlases $A_F$ and $A_G$ as in Definition 63.4.17.


Alternative name: $C^k$ (differentiable) vector F-bundle.


65.1.4 REMARK: Atlases for vector bundle fibre spaces and structure groups. 
Definition 63.4.17 implies that the atlases in Definition 65.1.3 (iii) are given by


(1) $A_F = \{\kappa_B;\ B \text{ is a basis for } F\}$, with component maps $\kappa_B$ as in Definition 51.10.3,


(2) $A_G = \{\kappa_{B,B};\ B \text{ is a basis for } F\}$, with component maps $\kappa_{B,B}$ as in Definition 23.2.10.


65.1.5 REMARK: Linear structure of vector bundles.

By Definitions 65.1.3 and 64.8.3, a $C^k$ vector bundle, for $k \in \overline{\mathbb{Z}}^+_0$, has a specification tuple of the form
$(E, \pi, M, A^F_E)$ -< $(E, A_E, \pi, M, A_M, A^F_E)$, where $(E, A_E)$ -< $((E, T_E), A_E)$ and $(M, A_M)$ -< $((M, T_M), A_M)$ are
$C^k$ manifolds, and its structure group has a specification tuple of the form $(G, F)$ -< $(G, A_G, F, A_F, \sigma_G, \mu)$,
where $(G, A_G)$ -< $((G, T_G), A_G)$ and $(F, A_F)$ -< $((F, T_F), A_F)$ are $C^k$ manifolds, and G may be assumed (for
simplicity) to be the transformation group $GL(n, \mathbb{R})$ acting on $F = \mathbb{R}^n$ for some $n \in \mathbb{Z}^+$.


By Definition 64.8.3 (ii, iii), the fibre set $E_p = \pi^{-1}(\{p\})$ is $C^k$ diffeomorphic to $\mathbb{R}^n$ for each $p \in M$. This
is much weaker than linearity. The fibre sets are not required to have any linear space structure at all.
They have only a topology and differentiable manifold structure. An unambiguous linear space structure
is not induced on the fibre sets by the fibre charts in $A^F_E$ if one applies only conditions (ii) and (iii) of
Definition 64.8.3. For example, the zero element of $\mathbb{R}^n$ may be mapped to different elements of $E_p$ by
different fibre charts. However, condition (iv) of Definition 64.8.3 puts an additional constraint on any two
fibre charts $\phi_1, \phi_2 \in A^F_E$ such that $p \in \pi(\mathrm{Dom}(\phi_1)) \cap \pi(\mathrm{Dom}(\phi_2))$, namely that


$\exists g \in G,\ \forall z \in E_p,\quad \phi_2(z) = L_g(\phi_1(z)),$


where $L_g \in \mathrm{Lin}(\mathbb{R}^n, \mathbb{R}^n)$ is the left action of g on $\mathbb{R}^n$. In particular, if $\phi_1(z) = 0$, then $\phi_2(z) = 0$. More
importantly, $\phi_1|_{E_p}^{-1}(\lambda_1\phi_1(z_1) + \lambda_2\phi_1(z_2)) = \phi_2|_{E_p}^{-1}(\lambda_1\phi_2(z_1) + \lambda_2\phi_2(z_2))$ for all $\lambda_1, \lambda_2 \in \mathbb{R}$ and
$z_1, z_2 \in E_p$, so linear combinations computed via different fibre charts agree. It is for this
reason that Definition 65.1.3 does not need to explicitly require linear structure for a vector bundle beyond
stating what the structure (transformation) group is. It is the group action which makes the induced linear
structure on fibre sets unambiguous.
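

The role of the group action can be made explicit in a short derivation. This is a sketch under the assumption,
taken from condition (iv) of Definition 64.8.3 as quoted above, that $\phi_2|_{E_p} = L_g \circ \phi_1|_{E_p}$ for some $g \in G$ with
$L_g$ linear and invertible. The symbol $s_j$ is introduced here only for illustration.

```latex
% Hypothetical notation: s_j is the linear combination computed via the chart \phi_j.
\begin{align*}
  s_j &= \phi_j|_{E_p}^{-1}\bigl(\lambda_1\phi_j(z_1) + \lambda_2\phi_j(z_2)\bigr),
      \qquad j = 1, 2, \\
  s_2 &= \phi_2|_{E_p}^{-1}\bigl(\lambda_1 L_g\phi_1(z_1) + \lambda_2 L_g\phi_1(z_2)\bigr) \\
      &= \phi_2|_{E_p}^{-1}\bigl(L_g(\lambda_1\phi_1(z_1) + \lambda_2\phi_1(z_2))\bigr)
      && \text{(linearity of } L_g\text{)} \\
      &= \phi_1|_{E_p}^{-1}\bigl(\lambda_1\phi_1(z_1) + \lambda_2\phi_1(z_2)\bigr) = s_1
      && \text{(since } \phi_2|_{E_p}^{-1} \circ L_g = \phi_1|_{E_p}^{-1}\text{)}.
\end{align*}
```

If $L_g$ were merely a diffeomorphism of $\mathbb{R}^n$ rather than a linear map, the step labelled "linearity" would
fail, and the two charts could disagree about sums and scalar multiples.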


Some authors explicitly mention the linear space structure on the fibre sets of vector bundles as if the 
fibre sets had their own concrete linear space structure. In practical applications, such a concrete structure 
typically is in fact defined, but in the abstract theory, such structure is only implicit via the fibre charts, 
in the same way that differentiable structure is implicit. Abstraction has the advantage of great generality, 
and also the advantage that undefined concepts and structures are not accidentally used in definitions and 
theorems. So for the sake of the abstraction discipline, it is apparently better to avoid saying that the fibre 
sets have their own linear space structure. In other words, all assertions about linear space structure should 
be stated via the fibre charts, in the same way that all abstract differentiable manifold structure definitions 
and theorems are stated via manifold charts. (It should perhaps be noted that by contrast, the topology on 
a manifold is assumed to be intrinsic to the underlying set of a manifold. However, it could be argued that 
a topology is an external structure anyway. See Remarks 49.2.4 and 51.3.1 for this issue.) 


When it is assumed that fibre sets have their own intrinsic linear space structure, instead of using fibre 
charts to define linear operations, this can be excused as a form of short-cut or shorthand. Definition 65.1.6 
shows how to replace such a short-cut or shorthand with the explicit importation of linear structure from 
the linear fibre space. 


Definition 65.1.6 induces the linear space structure from vector bundle fibre spaces onto fibre sets so as to 
avoid having to mention fibre charts when discussing linear space structure on fibre sets. Then one may 
make use of the induced linear space structure as if it were explicitly defined on fibre sets, although in fact 
all operations are really performed via fibre charts. The fibre-chart-independence of the algebraic operations 
in Definition 65.1.6 is a direct consequence of Definitions 64.8.3 (iv) and 23.1.1. (This may be easily verified 
by the sceptical reader who has nothing better to do.)


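65.1.6 DEFINITION: The (induced) linear space of a fibre set $E_p$ of a $C^0$ vector bundle $(E, \pi, M, A^F_E)$ over
F -< $(\mathbb{R}, F, \sigma_\mathbb{R}, \tau_\mathbb{R}, \sigma_F, \mu_F)$ is the tuple $E_p$ -< $(\mathbb{R}, E_p, \sigma_\mathbb{R}, \tau_\mathbb{R}, \sigma_{E_p}, \mu_{E_p})$, where $\mathbb{R}$ -< $(\mathbb{R}, \sigma_\mathbb{R}, \tau_\mathbb{R})$ is the real
number system, and the operations $\sigma_{E_p} : E_p \times E_p \to E_p$ and $\mu_{E_p} : \mathbb{R} \times E_p \to E_p$ are defined by


$\forall p \in M,\ \forall z_1, z_2 \in E_p,\ \forall \phi \in A^F_{E,p},$
$\qquad \sigma_{E_p}(z_1, z_2) = \phi|_{E_p}^{-1}(\sigma_F(\phi(z_1), \phi(z_2)))$
and
$\forall p \in M,\ \forall \lambda \in \mathbb{R},\ \forall z \in E_p,\ \forall \phi \in A^F_{E,p},$
$\qquad \mu_{E_p}(\lambda, z) = \phi|_{E_p}^{-1}(\mu_F(\lambda, \phi(z))).$


In other words,
$\forall p \in M,\ \forall z_1, z_2 \in E_p,\ \forall \phi \in A^F_{E,p},\quad z_1 + z_2 = \phi|_{E_p}^{-1}(\phi(z_1) + \phi(z_2))$
and
$\forall p \in M,\ \forall \lambda \in \mathbb{R},\ \forall z \in E_p,\ \forall \phi \in A^F_{E,p},\quad \lambda z = \phi|_{E_p}^{-1}(\lambda\phi(z)).$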
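

As a small illustration of how the induced operations are used, the zero vector and additive inverses of a fibre
set can be written down directly. The sketch below assumes the rules $z_1 + z_2 = \phi|_{E_p}^{-1}(\phi(z_1) + \phi(z_2))$ and
$\lambda z = \phi|_{E_p}^{-1}(\lambda\phi(z))$ of Definition 65.1.6 for a fixed fibre chart $\phi$ at p. The symbols $0_{E_p}$ and $0_F$ are
notation assumed here for the zero vectors of the fibre set and the fibre space.

```latex
% Assumed notation: 0_{E_p} and 0_F denote the zero vectors of E_p and F respectively.
\begin{align*}
  0_{E_p} &= \phi|_{E_p}^{-1}(0_F), \\
  z + 0_{E_p} &= \phi|_{E_p}^{-1}\bigl(\phi(z) + \phi(0_{E_p})\bigr)
               = \phi|_{E_p}^{-1}\bigl(\phi(z) + 0_F\bigr) = z, \\
  (-1)z &= \phi|_{E_p}^{-1}\bigl(-\phi(z)\bigr), \qquad
  z + (-1)z = \phi|_{E_p}^{-1}\bigl(\phi(z) - \phi(z)\bigr) = 0_{E_p}.
\end{align*}
```

The remaining linear space axioms can be checked in the same way, by transporting each identity from F to the
fibre set through the chart.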


65.1.7 REMARK: Standard induced basis fields for fibre sets of vector bundles. 

For the purposes of defining coefficient arrays for connections on vector bundles, such as the Christoffel and 
Cartan connection coefficient arrays in Definitions 68.1.8 and 68.3.2 respectively, it is useful to introduce 
standard bases for the fibre sets of vector bundles as in Definition 65.1.8. 

Using Notation 65.8.5, one could write eP?*r € X(F"(E,g, M)) for all ¢ € Ai» and vr c atlas(F). 
Theorem 65.1.9 shows that, unsurprisingly, the individual vectors in these coordinate basis frame fields are 
horizontal vectors. Of course, this horizontality is fibre-chart-dependent in the absence of a connection. 


65.1.8 DEFINITION: Standard coordinate basis frames for fibre sets, induced from fibre space. 
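The standard induced basis field for fibre sets of a $C^k$ vector bundle $(E, \pi_E, M, A^F_E)$ with respect to charts
$\phi \in A^F_E$ and $\psi_F \in \mathrm{atlas}(F)$, is the function $e^{\phi,\psi_F} : \pi_E(\mathrm{Dom}(\phi)) \to \bigcup_{p \in M} E_p^m$ defined by


$\forall p \in \pi_E(\mathrm{Dom}(\phi)),\ \forall i \in \mathbb{N}_m,\quad e^{\phi,\psi_F}(p)_i = \phi|_{E_p}^{-1}(\psi_F^{-1}(e_i))$
$\qquad\qquad = (\pi_E \times \phi)^{-1}(p, \psi_F^{-1}(e_i)),$
where m = dim(F) and $(e_i)_{i=1}^m$ is the standard basis for $\mathbb{R}^m$.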


65.1.9 THEOREM: Differential horizontality of induced basis vector fields. 
Let (E, ng, M, AE) be a C* vector bundle with m = dim(F). Then 


Vp € M, VV € T,(M), Vo € Af p, Vir € atlas(F), Vi € Nm, 
Over 9r (-), = HE (er (p) 
= (np X $); (V, OTe (F))- 
(See Definition 64.5.13 for Hj) 
PROOF: It follows from Theorem 64.5.18 and Definition 65.1.8 that 
dye” PYF (.); = ðv (re x 4) 1(-, Yp (ei)) 
—1 — 
= His (6| p, Vr i) 
= HE (er (p); 
= (rg x 6), (V, OT i (P)): (65.1.1) 


where line (65.1.1) follows from Definition 64.5.13. 


65.1.10 REMARK: Formula for the component map of a standard induced fibre-set basis. 
Theorem 65.1.11 gives a simple formula for the component map for a standard induced basis on a vector 
bundle fibre set. Fibre space charts $\psi_F$ appear here because they are component maps by Definition 51.4.21.


65.1.11 THEOREM: Component map for standard induced coordinate basis frames on vector bundles. 

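Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle. Let m = dim(F), $\phi \in A^F_E$, $\psi_F \in \mathrm{atlas}(F)$ and $U = \pi_E(\mathrm{Dom}(\phi))$.
For $p \in U$, let $\kappa^{\phi,\psi_F}_p : E_p \to \mathbb{R}^m$ denote the component map (as in Definition 22.8.8) for the basis
$e^{\phi,\psi_F}(p) = (e^{\phi,\psi_F}(p)_i)_{i=1}^m$ of $E_p$ (as in Definition 65.1.8). Then


$\forall p \in U,\ \forall z \in E_p,\quad \kappa^{\phi,\psi_F}_p(z) = \psi_F(\phi(z)).$
Hence $\forall p \in U,\ \forall z \in E_p,\ \forall i \in \mathbb{N}_m,\quad \kappa^{\phi,\psi_F}_p(z)^i = \psi_F^i(\phi(z)).$


PROOF: Let $p \in U$ and $z \in E_p$. Then $z = \sum_{i=1}^m \kappa^{\phi,\psi_F}_p(z)^i\, e^{\phi,\psi_F}(p)_i$ by Definition 22.8.8. Therefore


$\psi_F(\phi(z)) = \psi_F\bigl(\phi\bigl(\sum_{i=1}^m \kappa^{\phi,\psi_F}_p(z)^i\, e^{\phi,\psi_F}(p)_i\bigr)\bigr)$
$\qquad = \sum_{i=1}^m \kappa^{\phi,\psi_F}_p(z)^i\, \psi_F(\phi(e^{\phi,\psi_F}(p)_i))$ (65.1.2)
$\qquad = \sum_{i=1}^m \kappa^{\phi,\psi_F}_p(z)^i\, e_i = \kappa^{\phi,\psi_F}_p(z),$


where line (65.1.2) follows from the linearity of $\psi_F$ and $\phi$ by Theorem 22.8.12 and Definition 65.1.6, and the
last line follows from $\psi_F(\phi(e^{\phi,\psi_F}(p)_i)) = e_i$ by Definition 65.1.8.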


65.2. Linear operations on vector bundles 


65.2.1 REMARK: Global extension of the induced total space scalar multiplication operation. 

The vector addition operation in Definition 65.1.6 cannot be automatically extended to a global operation
on the total space E because if $\pi_E(z_1) \ne \pi_E(z_2)$, it is not possible to choose a common value of p for the
inverse map $\phi|_{E_p}^{-1}$. However, the global extension of the scalar multiplication operation is both well defined
and useful. It is given in Definition 65.2.2.


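65.2.2 DEFINITION: The (global) (induced) scalar product operation on the total space E of a $C^0$ vector
bundle $(E, \pi_E, M, A^F_E)$ is the operation $\mu_E : \mathbb{R} \times E \to E$ defined by


$\forall \lambda \in \mathbb{R},\ \forall z \in E,\ \forall \phi \in A^F_{E,\pi_E(z)},\quad \mu_E(\lambda, z) = \phi|_{E_{\pi_E(z)}}^{-1}(\mu_F(\lambda, \phi(z))).$


In other words,


$\forall \lambda \in \mathbb{R},\ \forall z \in E,\ \forall \phi \in A^F_{E,\pi_E(z)},\quad \lambda z = \phi|_{E_{\pi_E(z)}}^{-1}(\lambda\phi(z)).$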


65.2.3 REMARK: Differentiability of product of real function and cross-section for vector bundles. 
Theorem 65.2.4 is a generalisation of Theorem 57.2.11 from tangent bundles to vector bundles. The pointwise 
product of real-valued functions and vector bundle cross-sections in Theorem 65.2.4 uses the induced scalar 
product operation in Definition 65.1.6. 


65.2.4 THEOREM: Differentiability of product of real-valued function and vector bundle cross-section. 
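Let $k \in \overline{\mathbb{Z}}^+_0$. Let $(E, \pi_E, M, A^F_E)$ be a $C^k$ vector bundle over F. Then


$\forall U \in \mathrm{Top}(M),\ \forall f \in C^k(U, \mathbb{R}),\ \forall X \in X^k(E, \pi_E, M\,|\,U),$
$\qquad f.X \in X^k(E, \pi_E, M\,|\,U),$


where f.X denotes the pointwise product $f.X : p \mapsto f(p)X(p)$ using the induced linear space on $E_p$.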


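PROOF: Let $U \in \mathrm{Top}(M)$, $f \in C^k(U, \mathbb{R})$ and $X \in X^k(E, \pi_E, M\,|\,U)$. Then $(f.X)(p) \in E_p$ for all $p \in U$.
Therefore $f.X \in X(E, \pi_E, M\,|\,U)$ by Notation 21.3.4.

Let $p \in U$, $\phi \in A^F_{E,p}$, $\psi_M \in \mathrm{atlas}_p(M)$ and $\psi_F \in \mathrm{atlas}_{\phi(X(p))}(F)$. Let $\psi_0 = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)$ and
$\Omega_0 = \psi_M(X^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F))))$. Then $\psi_0 \circ X \circ \psi_M^{-1} \in C^k(\Omega_0, \mathbb{R}^{n+m})$ by Theorem 64.9.11, where n = dim(M)
and m = dim(F), and $\Omega_0 \in \mathrm{Top}(\mathbb{R}^n)$ because $\psi_M^{-1}$, X and $\phi$ are continuous. In particular, the fibre
component $\psi_F \circ \phi \circ X \circ \psi_M^{-1}$ is in $C^k(\Omega_0, \mathbb{R}^m)$.

Let $\Omega = \mathrm{Dom}(f \circ \psi_M^{-1}) = \psi_M(U)$. Then $f \circ \psi_M^{-1} \in C^k(\Omega, \mathbb{R})$ by Notation 51.6.3, and $\Omega_0 \subseteq \Omega$. Consequently
by Theorem 42.6.11, $(f \circ \psi_M^{-1}).(\psi_F \circ \phi \circ X \circ \psi_M^{-1}) \in C^k(\Omega_0, \mathbb{R}^m)$, where this expression
denotes the pointwise product of $(f \circ \psi_M^{-1})$ and $(\psi_F \circ \phi \circ X \circ \psi_M^{-1})$ restricted to the intersection of their
domains, which is $\Omega_0$. But $\psi_0 \circ (f.X) \circ \psi_M^{-1} = \mathrm{id}_{\Omega_0} \times ((f \circ \psi_M^{-1}).(\psi_F \circ \phi \circ X \circ \psi_M^{-1}))$
by Definitions 65.1.6 and 64.8.3 (iv) since the structure group of E is GL(F) by Definition 65.1.3. Thus
$\psi_0 \circ (f.X) \circ \psi_M^{-1} \in C^k(\Omega_0, \mathbb{R}^{n+m})$. Hence $f.X \in X^k(E, \pi_E, M\,|\,U)$ by Theorem 64.9.11.


65.2.5 REMARK: Linear spaces of cross-sections on vector bundles.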
Definition 65.2.6 introduces linear space structure for sets of cross-sections on vector bundles in the most 
obvious pointwise manner. The pointwise operations on fibre sets Ep are introduced in Definition 65.1.6. 


Theorem 65.2.7 verifies the closure of the sets of cross-sections in Definition 65.2.6 under linear operations. 
The key step in the proof is the use of linear induced charts for the vector bundle total space, which permits 
linear operations on cross-sections to be expressed as linear operations via Cartesian space charts, which can 
then benefit from Theorem 42.6.13 to prove Cartesian space map differentiability. 


65.2.6 DEFINITION: Linear space structure for sets of cross-sections of vector bundles.
Let E -< $(E, \pi_E, M, A^F_E)$ be a $C^k$ vector bundle over a finite-dimensional real linear space F, where $k \in \overline{\mathbb{Z}}^+_0$,
and let $U \in \mathrm{Top}(M)$.


The scalar product operation on $C^k$ cross-sections of the vector bundle E is the map
$\mu_E : \mathbb{R} \times X^k(E, \pi_E, M\,|\,U) \to X^k(E, \pi_E, M\,|\,U)$ defined by


$\forall \lambda \in \mathbb{R},\ \forall X \in X^k(E, \pi_E, M\,|\,U),\ \forall p \in U,\quad \mu_E(\lambda, X)(p) = \lambda X(p).$


The (vector) addition operation on $C^k$ cross-sections of the vector bundle E is the map
$\sigma_E : X^k(E, \pi_E, M\,|\,U) \times X^k(E, \pi_E, M\,|\,U) \to X^k(E, \pi_E, M\,|\,U)$ defined by


$\forall X_1, X_2 \in X^k(E, \pi_E, M\,|\,U),\ \forall p \in U,\quad \sigma_E(X_1, X_2)(p) = X_1(p) + X_2(p).$


The linear space of $C^k$ cross-sections of the vector bundle E on U is the linear space
$X^k(E, \pi_E, M\,|\,U)$ -< $(\mathbb{R}, X^k(E, \pi_E, M\,|\,U), \sigma_\mathbb{R}, \tau_\mathbb{R}, \sigma_E, \mu_E)$.


65.2.7 THEOREM: Closure of sets of C* cross-sections under linear operations. 
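Let $k \in \overline{\mathbb{Z}}^+_0$. Let $(E, \pi_E, M, A^F_E)$ be a $C^k$ vector bundle over F. Then


$\forall U \in \mathrm{Top}(M),\ \forall \lambda \in \mathbb{R},\ \forall X \in X^k(E, \pi_E, M\,|\,U),$
$\qquad \lambda X \in X^k(E, \pi_E, M\,|\,U)$ (65.2.1)
and
$\forall U \in \mathrm{Top}(M),\ \forall X, Y \in X^k(E, \pi_E, M\,|\,U),$
$\qquad X + Y \in X^k(E, \pi_E, M\,|\,U).$ (65.2.2)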


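PROOF: For line (65.2.1), let $\lambda \in \mathbb{R}$ and $X \in X^k(E, \pi_E, M\,|\,U)$. Then since the pointwise scalar product is
closed in $E_p$ for all $p \in U$ by Definition 65.1.6, it follows from Notation 21.3.4 that $\lambda X \in X(E, \pi_E, M\,|\,U)$.
Let $p \in U$, $\phi \in A^F_{E,p}$, $\psi_M \in \mathrm{atlas}_p(M)$, $\psi_F \in \mathrm{atlas}(F)$ and $\psi_0 = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)$. Then $\psi_0 \in \mathrm{atlas}^k(E)$
by Theorem 64.9.4 (ii). Let $\Omega_0 = \psi_M(X^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F))))$. Then $\Omega_0 = \psi_M(\pi_E(\mathrm{Dom}(\phi))) \in \mathrm{Top}(\mathbb{R}^n)$,
where n = dim(M), and by Theorem 64.9.11,


$\forall x \in \Omega_0,\quad \psi_0(\lambda X(\psi_M^{-1}(x))) = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)(\lambda X(\psi_M^{-1}(x)))$
$\qquad = ((\psi_M \circ \pi_E) \times (\psi_F \circ \phi))(\lambda X(\psi_M^{-1}(x)))$ (65.2.3)
$\qquad = (x, \psi_F(\phi(\lambda X(\psi_M^{-1}(x)))))$
$\qquad = (x, \lambda\psi_F(\phi(X(\psi_M^{-1}(x)))))$ (65.2.4)
$\qquad = (\mathrm{id}_{\Omega_0} \times (L_\lambda \circ \psi_F \circ \phi \circ X \circ \psi_M^{-1}))(x),$ (65.2.5)


where line (65.2.3) follows from Theorem 10.15.8 (i), line (65.2.4) follows from the linearity of $\psi_F$ and
$\phi|_{E_{\psi_M^{-1}(x)}}$, and $L_\lambda : \mathbb{R}^m \to \mathbb{R}^m$ is the map $L_\lambda : y \mapsto \lambda y$ for $y \in \mathbb{R}^m$, where m = dim(F). Since $\psi_0 \circ X \circ \psi_M^{-1}$
is a $C^k$ map by Notation 64.7.3, it follows that $\psi_F \circ \phi \circ X \circ \psi_M^{-1}$ is a $C^k$ map, which implies by
Theorem 42.6.13 that $L_\lambda \circ \psi_F \circ \phi \circ X \circ \psi_M^{-1}$ is a $C^k$ map. Therefore $\psi_0 \circ (\lambda X) \circ \psi_M^{-1}$ is a $C^k$ map by
line (65.2.5). Hence $\lambda X \in X^k(E, \pi_E, M\,|\,U)$ by Theorem 64.9.11. This verifies line (65.2.1).


For line (65.2.2), let $X, Y \in X^k(E, \pi_E, M\,|\,U)$. Then $X + Y \in X(E, \pi_E, M\,|\,U)$ by Notation 21.3.4 because
pointwise vector addition is closed in $E_p$ for all $p \in U$ by Definition 65.1.6.
Let $p \in U$, $\phi \in A^F_{E,p}$, $\psi_M \in \mathrm{atlas}_p(M)$, $\psi_F \in \mathrm{atlas}(F)$ and $\psi_0 = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)$. Then $\psi_0 \in \mathrm{atlas}^k(E)$
by Theorem 64.9.4 (ii). Let $\Omega_0 = \psi_M(X^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F))))$. Then $\Omega_0 = \psi_M(\pi_E(\mathrm{Dom}(\phi))) =
\psi_M(Y^{-1}(\phi^{-1}(\mathrm{Dom}(\psi_F)))) \in \mathrm{Top}(\mathbb{R}^n)$, where n = dim(M), and by Theorem 64.9.11,


$\forall x \in \Omega_0,\quad \psi_0((X + Y)(\psi_M^{-1}(x))) = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)((X + Y)(\psi_M^{-1}(x)))$
$\qquad = ((\psi_M \circ \pi_E) \times (\psi_F \circ \phi))((X + Y)(\psi_M^{-1}(x)))$ (65.2.6)
$\qquad = (x, \psi_F(\phi((X + Y)(\psi_M^{-1}(x)))))$
$\qquad = (x, \psi_F(\phi(X(\psi_M^{-1}(x)))) + \psi_F(\phi(Y(\psi_M^{-1}(x)))))$ (65.2.7)
$\qquad = (\mathrm{id}_{\Omega_0} \times (\psi_F \circ \phi \circ X \circ \psi_M^{-1} + \psi_F \circ \phi \circ Y \circ \psi_M^{-1}))(x),$ (65.2.8)


where line (65.2.6) follows from Theorem 10.15.8 (i), line (65.2.7) follows from the linearity of $\psi_F$ and
$\phi|_{E_{\psi_M^{-1}(x)}}$, and “+” in line (65.2.8) denotes the pointwise sum of maps between Cartesian spaces as in
Theorem 42.6.4 (iv). Since $\psi_0 \circ X \circ \psi_M^{-1}$ and $\psi_0 \circ Y \circ \psi_M^{-1}$ are $C^k$ maps by Notation 64.7.3, it follows
that $\psi_F \circ \phi \circ X \circ \psi_M^{-1}$ and $\psi_F \circ \phi \circ Y \circ \psi_M^{-1}$ are $C^k$ maps, which implies by Theorem 42.6.13 that
$\psi_F \circ \phi \circ X \circ \psi_M^{-1} + \psi_F \circ \phi \circ Y \circ \psi_M^{-1}$ is a $C^k$ map. Therefore $\psi_0 \circ (X + Y) \circ \psi_M^{-1}$ is a $C^k$ map by
line (65.2.8). Hence $X + Y \in X^k(E, \pi_E, M\,|\,U)$ by Theorem 64.9.11.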


65.2.8 REMARK: Abstract dual vector bundles. 

In the case of an abstract (real) vector bundle $(E, \pi, M, A^F_E)$, it is not straightforward to define a specific
concrete dual vector bundle $(E^*, \pi^*, M, A^{F^*}_{E^*})$. The dual fibre space may be constructed very concretely as
the set $F^* = \mathrm{Lin}(F, \mathbb{R})$ of all linear maps on F, and the base space M is the same for the dual and primal
bundles, but the total space E is an abstract manifold. So it is not so easy to construct a dual total space $E^*$.
One possible construction is the orbit-space style of associated fibre bundle construction in Definition 66.7.12,
which is sometimes the least bad kind of construction.

In practical applications, the primal total space E is typically the tangent bundle T(M) or some other kind
of specific construction. Then whatever method was used to construct E can also be applied to construct
its dual $E^*$. In the special case E = T(M), the obvious candidate is the dual tangent bundle $E^* = T^*(M)$.
For a given abstract primal vector bundle, Definition 65.2.9 specifies properties which a corresponding dual
vector bundle must have. Such dual bundles are non-unique. (Definition 65.2.9 is illustrated in Figure 65.2.1.)


Figure 65.2.1 Primal and dual vector bundle association 


65.2.9 DEFINITION: A dual vector bundle of a $C^k$ vector bundle $(E, \pi, M, A^F_E)$ with fibre space F, for
$k \in \overline{\mathbb{Z}}^+_0$, is a $C^k$ vector bundle $(E^*, \pi^*, M, A^{F^*}_{E^*})$ with fibre space $F^*$, where:
(i) $F^* = \mathrm{Lin}(F, \mathbb{R})$ is the dual vector space of F.


(ii) $(G, F^*)$ -< $(G, F^*, \sigma, \mu^*)$ satisfies $\forall g \in G,\ \forall w \in F^*,\ \forall v \in F,\ \mu^*(g, w)(v) = w(\mu(g^{-1}, v))$. (That is,
$\forall g \in G,\ \forall w \in F^*,\ L^*_g(w) = w \circ L_{g^{-1}}$, where $L_g = \mu(g, \cdot)$ and $L^*_g = \mu^*(g, \cdot)$ for all $g \in G$.)


(iii) There is a differentiable fibre bundle association $h : A^F_E \to A^{F^*}_{E^*}$.


65.2.10 REMARK: Dual action on dual vector bundles.

From Definition 65.2.9 condition (ii), it is possible to construct a dual action on a dual vector bundle from
group actions on the primal bundle. Since the action of the structure group on the primal bundle is
fibre-chart dependent, the action on the dual bundle is also fibre-chart dependent, but the fibre bundle association
map h ensures that the dependency is consistent on the two bundles. An algebraic action by the group can
be replaced by a differential action in the case of a $C^1$ differentiable fibre bundle. Such dual differential
group actions are applicable to the definition of dual connections.
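

It may help to check that the rule in Definition 65.2.9 (ii) really does define a left action on the dual space.
The following sketch assumes only that $g \mapsto L_g$ is a left action of G on F by linear maps, so that
$L_{gh} = L_g \circ L_h$ and $L_{g^{-1}} = L_g^{-1}$, and that the dual maps are given by $L^*_g(w) = w \circ L_{g^{-1}}$.

```latex
% Assumption: L_{gh} = L_g \circ L_h on F, and L^*_g(w) = w \circ L_{g^{-1}} on F^*.
\begin{align*}
  L^*_g(L^*_h(w)) &= (w \circ L_{h^{-1}}) \circ L_{g^{-1}} \\
                  &= w \circ L_{h^{-1}g^{-1}} \\
                  &= w \circ L_{(gh)^{-1}} = L^*_{gh}(w).
\end{align*}
% Without the inverse, the rule w \mapsto w \circ L_g would give
% (w \circ L_h) \circ L_g = w \circ L_{hg}, which is a right action rather than a left action.
```

This is the reason for the inverse $g^{-1}$ in condition (ii): it restores the order of composition which is
reversed by passing to the dual space.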


The structure group of a differentiable (G, F) fibre bundle (E, v, M, AZ) can act on the total space by means 
of a chart-dependent function n? : Us + G, where Us = 7(Dom(¢)) for ¢ € AL, if the functions n? obey the 
chart transition rule r?? (p)go,, 4, (p)n® (p), 4, (p) for all p € Uy, where g¢,,¢, : Us — G is the fibre chart 
transition map in Definition 64.8.3 (v). 


65.2.11 REMARK: Line bundles. 
Definition 65.2.12 is not used in this book, but some such thing does appear in the literature. 


65.2.12 DEFINITION: A $C^k$ (differentiable) line bundle, for $k \in \overline{\mathbb{Z}}^+_0$, is a $C^k$ differentiable vector bundle
over a linear space F with dim(F) = 1.


65.3. Vertical drop functions for vector bundles 


65.3.1 REMARK: Application of the fibre space drop function to fibre chart differentials. 

Theorem 65.3.2 is proved for the benefit of Theorem 65.3.7, which asserts the fibre chart independence of 
the drop function for vector bundle total spaces. This drop function is important because it is required for 
the definition of the covariant derivative for general vector bundles. 


65.3.2 THEOREM: Transformation rule for dropped differentials of fibre charts. 
Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle of a finite-dimensional linear space F. Then


$\forall z \in E,\ \forall y \in T_{z,0}(E),\ \forall \phi_1, \phi_2 \in A^F_{E,\pi(z)},$
$\qquad \varpi^F((d\phi_2)_z(y)) = L_{g_{\phi_2,\phi_1}(\pi(z))}(\varpi^F((d\phi_1)_z(y))),$


where $g_{\phi_2,\phi_1} : U_{\phi_1} \cap U_{\phi_2} \to GL(F)$ is the fibre chart transition map for E as in Definition 64.8.3 (v), where
$U_\phi = \pi(\mathrm{Dom}(\phi))$ for all $\phi \in A^F_E$. (See Definition 54.9.5 for the linear space drop function $\varpi^F : T(F) \to F$.)


PROOF: Let $z \in E$, $y \in T_{z,0}(E)$ and $\phi_1, \phi_2 \in A^F_{E,\pi(z)}$. Let $\psi \in A_F$ and n = dim(F). Then $(d\phi_1)_z(y) =
t_{\phi_1(z),w,\psi}$ for some $w \in \mathbb{R}^n$. So $\Phi(\psi)((d\phi_1)_z(y)) = w$ by Notation 54.5.7. Therefore $\varpi^F((d\phi_1)_z(y)) =
\psi^{-1}(\Phi(\psi)((d\phi_1)_z(y))) = \psi^{-1}(w)$ by Definition 54.9.5.

By the chain rule, Theorem 58.4.13, $(d\phi_2)_z(y) = (d(\phi_2 \circ \phi_1^{-1}))_{\phi_1(z)}((d\phi_1)_z(y))$, and by Definitions 65.1.3
and 64.8.3 (v), $\phi_2 \circ \phi_1|_{E_{\pi(z)}}^{-1} = L_{g_{\phi_2,\phi_1}(\pi(z))} \in \mathrm{Lin}(F, F)$. Therefore


$(d\phi_2)_z(y) = (d(L_{g_{\phi_2,\phi_1}(\pi(z))}))_{\phi_1(z)}((d\phi_1)_z(y))$
$\qquad = (d(L_{g_{\phi_2,\phi_1}(\pi(z))}))_{\phi_1(z)}(t_{\phi_1(z),w,\psi})$
$\qquad = t_{\phi_2(z),\tilde{w},\psi},$


where $\tilde{w} = \psi(L_{g_{\phi_2,\phi_1}(\pi(z))}(\psi^{-1}(w)))$ by Theorem 58.4.15. Then $\Phi(\psi)((d\phi_2)_z(y)) = \tilde{w}$ by Notation 54.5.7.
So $\varpi^F((d\phi_2)_z(y)) = \psi^{-1}(\Phi(\psi)((d\phi_2)_z(y))) = \psi^{-1}(\tilde{w})$ by Definition 54.9.5. Therefore $\varpi^F((d\phi_2)_z(y)) =
L_{g_{\phi_2,\phi_1}(\pi(z))}(\psi^{-1}(w))$. Hence $\varpi^F((d\phi_2)_z(y)) = L_{g_{\phi_2,\phi_1}(\pi(z))}(\varpi^F((d\phi_1)_z(y)))$.


65.3.3 REMARK: Vector bundles over subgroups of general linear groups.

There is no need to provide a definition for vector bundles for subgroups, such as SO(n) or SO(F), of
general linear groups GL(n) or GL(F). Such subgroups are invariably defined with differentiable structures
which make them regular submanifolds of the corresponding general linear groups. So a $C^k$ differentiable
$(SO(n), \mathbb{R}^n)$ fibre bundle is then necessarily a $C^k$ vector bundle according to Definition 65.1.3. Significantly,
the group itself is not part of the specification tuple of the fibre bundle. So the fibre bundles for subgroups
of general linear structure groups are valid vector bundles.


65.3.4 REMARK: Drop functions for vertical vectors on total spaces of vector bundles.

Definition 65.3.5 is required for the definition of fibre-chart-independent covariant derivatives for general 
vector bundles. It is the linear structure of the fibre space of a vector bundle which makes possible a 
fibre-chart-independent drop function for vertical vectors on the total space. 


In the case of differentiable fibre bundles with no linear structure on the fibre space, one obtains
fibre-chart-dependent drop functions as in Definition 64.6.2, and the supposedly global version of this drop function in
Definition 64.6.6 is not even global because it is restricted to the domain of a particular choice of fibre chart.
Consequently the covariant derivative in Definition 68.2.5 falls far short of the minimum requirements for
a useful covariant derivative. (See Definition 68.2.9 for the corresponding fibre-chart-independent covariant
derivative on vector bundles.)


A covariant derivative is expected to give a value in the same space as the function being differentiated. In
the most general case in Definition 64.6.2 for $C^1$ ordinary fibre bundles $(E, \pi, M, A^F_E)$, there is no
fibre-chart-independent drop function from T(E) to E. The attempt to produce such a thing in Definition 64.6.2 fails
because it is not possible to “pull back” the vector $(d\phi)_z(y) \in T_{\phi(z)}(F)$ to E for $\phi \in A^F_{E,\pi(z)}$. By first dropping
this vector from $T_{\phi(z)}(F)$ to F in Definition 65.3.5, the resultant fibre space element $\varpi^F((d\phi)_z(y)) \in F$ can be
pulled back to $E_{\pi(z)}$ via the inverse chart $\phi|_{E_{\pi(z)}}^{-1}$. This three-step process effectively creates a drop function
from $T_{z,0}(E)$ to $E_{\pi(z)}$. (The apparently more direct one-step map in Definition 59.2.9 from $T_{z,0}(T(M))$
to $T_{\pi(z)}(M)$ is actually a three-step map if it is examined more closely, using the chart space $\mathbb{R}^n$ as the
linear fibre space. The T(T(M)) drop function is required for the Lie bracket, Lie derivative and covariant
derivatives on tangent bundles T(M), whereas the T(E) drop function is required for covariant derivatives
on general vector bundles E.)


The manifold-chart-independence of the linear space drop function $\varpi^F$, which is used in Definition 65.3.5,
is shown in Theorem 54.9.3. Additionally it is shown in Theorem 65.3.7 that $\varpi^E$ in Definition 65.3.5 is
fibre-chart-independent. (Definition 65.3.5 and Theorem 65.3.7 are illustrated in Figure 65.3.1.)




Figure 65.3.1 Vertical drop function for vector bundle total spaces 


65.3.5 DEFINITION: Fibre-chart-independent vertical drop function for vector bundles.
The (vertical) drop function at a point $z \in E$, for a $C^1$ vector bundle $(E, \pi, M, A^F_E)$ over a finite-dimensional
real linear space F is the map $\varpi^E_z : T_{z,0}(E) \to E_{\pi(z)}$ which satisfies


$\forall y \in T_{z,0}(E),\ \forall \phi \in A^F_{E,\pi(z)},\quad \varpi^E_z(y) = \phi|_{E_{\pi(z)}}^{-1}(\varpi^F((d\phi)_z(y))),$


where $\varpi^F$ is the drop function for F as in Definition 54.9.5. (See Notation 64.5.6 for $T_{z,0}(E) = \ker((d\pi)_z)$.)
In other words, $\forall \phi \in A^F_{E,\pi(z)},\ \varpi^E_z = \phi|_{E_{\pi(z)}}^{-1} \circ \varpi^F \circ (d\phi)_z|_{T_{z,0}(E)}$.


The (vertical) drop function for a $C^1$ vector bundle $(E, \pi, M, A^F_E)$ over a finite-dimensional real linear space
F is the map $\varpi^E : \bigcup_{z \in E} T_{z,0}(E) \to E$ which satisfies


$\forall z \in E,\ \forall y \in T_{z,0}(E),\quad \varpi^E(y) = \varpi^E_z(y).$


In other words, $\varpi^E = \bigcup_{z \in E} \varpi^E_z$.
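
A trivial bundle gives a quick sanity check of this definition. The sketch below assumes the product bundle
$E = M \times F$ with the single fibre chart $\phi(p, v) = v$, and it represents tangent vectors as velocities of curves,
which is a convenient shorthand rather than the tangent vector representation used in this book.

```latex
% Assumptions: E = M \times F, \pi(p, v) = p, \phi(p, v) = v, and z = (p, v).
\begin{align*}
  \gamma(t) &= (p, v + tw), \qquad y = \gamma'(0) \in T_{z,0}(E)
      && (\pi \circ \gamma \text{ is constant, so } (d\pi)_z(y) = 0) \\
  (d\phi)_z(y) &= \tfrac{d}{dt}\bigl(v + tw\bigr)\big|_{t=0} \in T_v(F), \\
  \varpi^F\bigl((d\phi)_z(y)\bigr) &= w, \\
  \varpi^E_z(y) &= \phi|_{E_p}^{-1}(w) = (p, w).
\end{align*}
```

So in this case the drop function converts the velocity of a straight line in the fibre over p into the
corresponding element (p, w) of the fibre itself.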

65.3.6 REMARK: Generalisation of vertical drop functions from tangent bundles to vector bundles. 
Definition 65.3.5 generalises the vertical drop function in Definitions 59.2.9 and 59.2.15 from tangent bundles 
to vector bundles. Theorem 65.3.7 is then the vector bundle version of Theorem 59.2.11, which shows manifold 
chart independence for tangent bundle vertical drop functions. 


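65.3.7 THEOREM: Fibre-chart-independence of the vertical drop function for vector bundle total spaces.
Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle of a finite-dimensional real linear space F. Then


$\forall z \in E,\ \forall y \in T_{z,0}(E),\ \forall \phi_1, \phi_2 \in A^F_{E,\pi(z)},$
$\qquad \phi_1|_{E_{\pi(z)}}^{-1}(\varpi^F((d\phi_1)_z(y))) = \phi_2|_{E_{\pi(z)}}^{-1}(\varpi^F((d\phi_2)_z(y))).$


PROOF: Let $z \in E$, $y \in T_{z,0}(E)$ and $\phi_1, \phi_2 \in A^F_{E,\pi(z)}$. Let $p = \pi(z)$. Then $\phi_2|_{E_p} \circ \phi_1|_{E_p}^{-1} = L_{g_{\phi_2,\phi_1}(p)}$, where
$g_{\phi_2,\phi_1} : U_{\phi_1} \cap U_{\phi_2} \to GL(F)$ is the fibre chart transition map for E as in Definition 64.8.3 (v), where $U_\phi =
\pi(\mathrm{Dom}(\phi))$ for $\phi \in A^F_E$. So $\phi_2|_{E_p}^{-1} = \phi_1|_{E_p}^{-1} \circ L_{g_{\phi_2,\phi_1}(p)}^{-1}$. But $\varpi^F((d\phi_2)_z(y)) = L_{g_{\phi_2,\phi_1}(p)}(\varpi^F((d\phi_1)_z(y)))$ by
Theorem 65.3.2. Hence $\phi_2|_{E_p}^{-1}(\varpi^F((d\phi_2)_z(y))) = \phi_1|_{E_p}^{-1}(\varpi^F((d\phi_1)_z(y)))$.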


65.3.8 REMARK: The pointwise vertical drop function for vector bundles is linear. 

Theorem 65.3.9 is the vector bundle version of Theorem 59.2.10. However, the proof is quite different because 
in the case of a vector bundle, the linear structure is induced by an external linear space, namely the fibre 
space of the vector bundle. 


65.3.9 THEOREM: Linearity of the pointwise vertical drop function. 
Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over F. Then $\varpi^E_z : T_{z,0}(E) \to E_{\pi_E(z)}$ is a linear map for all $z \in E$.


PROOF: Let n = dim(M), m = dim(F), $z \in E$, $p = \pi_E(z)$, $\phi \in A^F_{E,p}$, $\psi_M \in \mathrm{atlas}_p(M)$, $\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$,
$\psi_0 = (\psi_M \times \psi_F) \circ (\pi_E \times \phi)$, $y \in T_{z,0}(E)$ and $\lambda \in \mathbb{R}$. Then $y = t_{z,(0,w),\psi_0}$ for some $w \in \mathbb{R}^m$ by Theorem 64.9.9
line (64.9.4). So


$\varpi^E_z(\lambda y) = \varpi^E_z(\lambda t_{z,(0,w),\psi_0})$
$\qquad = \varpi^E_z(t_{z,(0,\lambda w),\psi_0})$ (65.3.1)
$\qquad = \phi|_{E_p}^{-1}(\varpi^F((d\phi)_z(t_{z,(0,\lambda w),\psi_0})))$ (65.3.2)
$\qquad = \phi|_{E_p}^{-1}(\varpi^F(t_{\phi(z),\lambda w,\psi_F}))$ (65.3.3)
$\qquad = \phi|_{E_p}^{-1}(\lambda\varpi^F(t_{\phi(z),w,\psi_F}))$ (65.3.4)
$\qquad = \lambda\phi|_{E_p}^{-1}(\varpi^F(t_{\phi(z),w,\psi_F}))$ (65.3.5)
$\qquad = \lambda\phi|_{E_p}^{-1}(\varpi^F((d\phi)_z(t_{z,(0,w),\psi_0})))$ (65.3.6)
$\qquad = \lambda\varpi^E_z(y),$ (65.3.7)


where line (65.3.1) follows from Definition 54.4.4 (ii), line (65.3.2) follows from Definition 65.3.5, line (65.3.3)
follows from Theorem 64.9.6 line (64.9.2), line (65.3.4) follows from Theorem 54.9.7, line (65.3.5) follows
from Definition 65.1.6, line (65.3.6) follows from Theorem 64.9.6 line (64.9.2), and line (65.3.7) follows from
Definition 65.3.5.


Similarly, $\varpi^E_z(t_{z,(0,w_1),\psi_0} + t_{z,(0,w_2),\psi_0}) = \varpi^E_z(t_{z,(0,w_1),\psi_0}) + \varpi^E_z(t_{z,(0,w_2),\psi_0})$ for all $w_1, w_2 \in \mathbb{R}^m$ follows
by the application of Definition 54.4.4 (i), Definition 65.3.5, Theorem 64.9.6 line (64.9.2), Theorem 54.9.7,
Definition 65.1.6, Theorem 64.9.6 line (64.9.2), and Definition 65.3.5. Hence $\varpi^E_z$ is linear.


65.3.10 REMARK: Concrete formula for the vertical drop function in terms of the pull-back atlas.
Theorem 65.3.11 gives a concrete expression for the vector bundle drop function expressed in terms of total 
space element components with respect to charts in the pull-back atlas in Definition 64.9.2. 


65.3.11 THEOREM: Formula for the vertical drop function using pull-back chart components. 
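Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over $F$. Let $m = \dim(F)$. Then

$\forall p \in M$, $\forall z \in E_p$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_M \in \mathrm{atlas}_p(M)$, $\forall\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$, $\forall v \in \mathbb{R}^m$,
\begin{align*}
\varpi^E_z\bigl(t_{z,(0,v),(\psi_M\times\psi_F)\circ(\pi_E\times\phi)}\bigr) &= \phi|_{E_p}^{-1}\bigl(\psi_F^{-1}(v)\bigr) \\
 &= (\psi_F \circ \phi|_{E_p})^{-1}(v).
\end{align*}
Hence
$\forall p \in M$, $\forall z \in E_p$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_M \in \mathrm{atlas}_p(M)$, $\forall\psi_F \in \mathrm{atlas}_{\phi(z)}(F)$,
\begin{align*}
\varpi^E_z\bigl(t_{z,(0,\psi_F(\phi(z))),(\psi_M\times\psi_F)\circ(\pi_E\times\phi)}\bigr) = z. \tag{65.3.8}
\end{align*}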


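PROOF: Let $y = t_{z,(0,v),(\psi_M\times\psi_F)\circ(\pi_E\times\phi)}$. Then $(d\phi)_z(y) = t_{\phi(z),v,\psi_F}$ by Theorem 64.9.6 line (64.9.2). So
$\varpi^F((d\phi)_z(y)) = \varpi^F(t_{\phi(z),v,\psi_F}) = \psi_F^{-1}(v)$ by Definition 54.9.5 and Notation 54.5.7. Consequently
$\varpi^E_z(y) = \phi|_{E_p}^{-1}(\psi_F^{-1}(v))$ by Definition 65.3.5. Hence $\varpi^E_z(y) = (\psi_F \circ \phi|_{E_p})^{-1}(v)$. Then line (65.3.8) follows
by substituting $\psi_F(\phi(z))$ for $v$.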


65.3.12 REMARK: Formula for the inverse of the pointwise vertical drop function. 

The formula on line (65.3.9) in Theorem 65.3.13 for the inverse of the pointwise vertical drop function for a
vector bundle ignores the difference between the subset $T_{z,0}(E)$ of the tangent bundle $T(E)$ and the tangent
bundle $T(E_{\pi(z)})$ of the submanifold $E_{\pi(z)}$ of $E$. The submanifold is embedded in the manifold by applying
the fibre set tangent vector embedding map in Definition 64.12.2. However, it is customary to ignore the
need for this identification map.


65.3.13 THEOREM: The pointwise vertical drop function is a chart-independent linear isomorphism. 
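Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over $F$. Then the drop function $\varpi^E_z : T_{z,0}(E) \to E_{\pi_E(z)}$ is a
chart-independent linear isomorphism for all $z \in E$. Hence $(\varpi^E_z)^{-1} : E_{\pi_E(z)} \to T_{z,0}(E)$ is well defined and

$\forall p \in M$, $\forall z \in E_p$, $\forall\zeta \in E_p$, $\forall\phi \in A^F_{E,p}$,
\begin{align*}
(\varpi^E_z)^{-1}(\zeta) = (d\phi|_{E_p}^{-1})_{\phi(z)}\Bigl(\bigl(\varpi^F|_{T_{\phi(z)}(F)}\bigr)^{-1}(\phi(\zeta))\Bigr). \tag{65.3.9}
\end{align*}

In other words, $(\varpi^E_z)^{-1} = (d\phi|_{E_p}^{-1})_{\phi(z)} \circ (\varpi^F|_{T_{\phi(z)}(F)})^{-1} \circ \phi|_{E_p}$.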


PROOF: Fibre chart independence and linearity follow from Theorems 65.3.7 and 65.3.9 respectively. To
show injectivity, let $z \in E$ and $y_1, y_2 \in T_{z,0}(E)$ satisfy $\varpi^E_z(y_1) = \varpi^E_z(y_2)$. Let $p = \pi_E(z)$ and $\phi \in A^F_{E,p}$. Then
$\varpi^F((d\phi)_z(y_1)) = \phi(\varpi^E_z(y_1)) = \phi(\varpi^E_z(y_2)) = \varpi^F((d\phi)_z(y_2))$ by Definition 65.3.5. Since $\varpi^F|_{T_{\phi(z)}(F)} : T_{\phi(z)}(F) \to F$
is a bijection by Theorem 54.9.7, and $(d\phi)_z(y_j) \in T_{\phi(z)}(F)$ for $j = 1, 2$, it follows that $(d\phi)_z(y_1) = (d\phi)_z(y_2)$.
So $y_1 = y_2$ by Theorem 64.3.9 (ii). This verifies the injectivity of $\varpi^E_z$.

To show the surjectivity of $\varpi^E_z : T_{z,0}(E) \to E_p$ for $z \in E_p$, let $\zeta \in E_p$. Then $\phi(\zeta) \in F$ and $\phi(z) \in F$.
So $(\varpi^F|_{T_{\phi(z)}(F)})^{-1}(\phi(\zeta)) \in T_{\phi(z)}(F)$ is well defined by Theorem 54.9.7. Then it follows
that $y = (d\phi|_{E_p}^{-1})_{\phi(z)}\bigl((\varpi^F|_{T_{\phi(z)}(F)})^{-1}(\phi(\zeta))\bigr) \in T_{z,0}(E)$ is well defined by Theorem 64.12.12 (ii). (This ignores
the need to apply the fibre set tangent vector embedding map in Definition 64.12.2, as is the custom.)
Inverting and reversing this sequence of three function applications, one obtains $\varpi^E_z(y) = \zeta$, which verifies
surjectivity. Consequently $(\varpi^E_z)^{-1} : E_{\pi_E(z)} \to T_{z,0}(E)$ is well defined and satisfies line (65.3.9). Hence
$\varpi^E_z : T_{z,0}(E) \to E_{\pi_E(z)}$ is a chart-independent linear isomorphism for all $z \in E$ by Definition 23.1.8. (The
linearity of $(\varpi^E_z)^{-1}$ follows from Theorem 23.1.14.)
65.4. Oblique drop functions for vector bundles


65.4.1 REMARK: Vertical versus oblique drop functions for vector bundles. 
Section 65.4 is the extension of Section 59.3 from tangent bundles to general vector bundles. 


Whereas Section 65.3 defines drop functions for vertical vectors on the total space only, Section 65.4 defines 
a fibre-chart-dependent generalisation, which has more limited applications.


The oblique drop function in Definition 65.4.2 uses exactly the same expressions as for the vertical drop 
function in Definition 65.3.5. Thus it is an extension from vertical tangent vectors to general tangent vectors 
on the total space. It is this extension which makes it chart-dependent, not the expression which defines it. 
Since the oblique drop function is fibre-chart-dependent, it requires a tag to indicate this. 


65.4.2 DEFINITION: Fibre-chart-dependent oblique drop function for vector bundles. 
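The oblique drop function at a point $z \in E$, for a $C^1$ vector bundle $(E, \pi, M, A^F_E)$ over a finite-dimensional
real linear space $F$, via a chart $\phi \in A^F_{E,\pi(z)}$, is the map $\varpi^{E,\phi}_z : T_z(E) \to E_{\pi(z)}$ given by

\[
\forall y \in T_z(E), \quad \varpi^{E,\phi}_z(y) = \phi|_{E_{\pi(z)}}^{-1}\bigl(\varpi^F((d\phi)_z(y))\bigr),
\]

where $\varpi^F$ is the drop function for $F$ as in Definition 54.9.5.
In other words, $\forall z \in E$, $\forall\phi \in A^F_{E,\pi(z)}$, $\varpi^{E,\phi}_z = \phi|_{E_{\pi(z)}}^{-1} \circ \varpi^F \circ (d\phi)_z$.

The oblique drop function for a $C^1$ vector bundle $(E, \pi, M, A^F_E)$ over a finite-dimensional real linear space
$F$, via a fibre chart $\phi \in A^F_E$, is the map $\varpi^{E,\phi} : \bigcup_{z\in\mathrm{Dom}(\phi)} T_z(E) \to \mathrm{Dom}(\phi)$ given by

\[
\forall z \in \mathrm{Dom}(\phi),\ \forall y \in T_z(E), \quad \varpi^{E,\phi}(y) = \varpi^{E,\phi}_z(y).
\]

In other words, $\varpi^{E,\phi} = \bigcup_{z\in\mathrm{Dom}(\phi)} \varpi^{E,\phi}_z$ and $\forall z \in \mathrm{Dom}(\phi)$, $\varpi^{E,\phi}_z = \varpi^{E,\phi}|_{T_z(E)}$.
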
65.4.3 THEOREM: Formula for the oblique drop function using the pull-back atlas. 


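Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Let $n = \dim(M)$ and
$m = \dim(F)$. Then

$\forall z \in E$, $\forall\phi \in A^F_{E,\pi(z)}$, $\forall\psi_M \in \mathrm{atlas}_{\pi(z)}(M)$, $\forall\psi_F \in \mathrm{atlas}(F)$, $\forall w_1 \in \mathbb{R}^n$, $\forall w_2 \in \mathbb{R}^m$,
\begin{align*}
\varpi^{E,\phi}_z\bigl(t_{z,(w_1,w_2),(\psi_M\times\psi_F)\circ(\pi\times\phi)}\bigr) &= \phi|_{E_{\pi(z)}}^{-1}\bigl(\psi_F^{-1}(w_2)\bigr) \tag{65.4.1} \\
 &= (\psi_F \circ \phi|_{E_{\pi(z)}})^{-1}(w_2).
\end{align*}
Consequently
$\forall p \in M$, $\forall z, \zeta \in E_p$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_M \in \mathrm{atlas}_p(M)$, $\forall\psi_F \in \mathrm{atlas}(F)$, $\forall w_1 \in \mathbb{R}^n$,
\begin{align*}
\varpi^{E,\phi}_z\bigl(t_{z,(w_1,\psi_F(\phi(\zeta))),(\psi_M\times\psi_F)\circ(\pi\times\phi)}\bigr) = \zeta. \tag{65.4.2}
\end{align*}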
PROOF: Let $y = t_{z,(w_1,w_2),(\psi_M\times\psi_F)\circ(\pi\times\phi)}$. Then $(d\phi)_z(y) = t_{\phi(z),w_2,\psi_F}$ by Theorem 64.9.6 line (64.9.2).
So $\varpi^F((d\phi)_z(y)) = \psi_F^{-1}(w_2)$ by Definition 54.9.5. So $\varpi^{E,\phi}_z(y) = \phi|_{E_{\pi(z)}}^{-1}(\psi_F^{-1}(w_2))$ by Definition 65.4.2.
Hence $\varpi^{E,\phi}_z(y) = (\psi_F \circ \phi|_{E_{\pi(z)}})^{-1}(w_2)$. This verifies line (65.4.1).
Line (65.4.2) follows from line (65.4.1) by substituting $\psi_F(\phi(\zeta))$ for $w_2$.
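
As an illustrative sketch only, consider the product bundle $E = M \times F$ with $\pi(p,f) = p$ and the single global fibre chart $\phi(p,f) = f$. This trivial bundle is an assumption made here for illustration and is not taken from the text. Then $\phi|_{E_p}^{-1}(f) = (p,f)$, and line (65.4.1) reduces to reading off the fibre component of the tangent vector.

```latex
% Assumed example (not from the text): E = M \times F, \pi(p,f) = p, \phi(p,f) = f.
\begin{align*}
\varpi^{E,\phi}_{(p,f)}\bigl(t_{(p,f),(w_1,w_2),(\psi_M\times\psi_F)\circ(\pi\times\phi)}\bigr)
  &= \phi|_{E_p}^{-1}\bigl(\psi_F^{-1}(w_2)\bigr) \\
  &= \bigl(p,\ \psi_F^{-1}(w_2)\bigr).
\end{align*}
% The base-space components w_1 are discarded, and the result does not depend on f.
```

In this example the components $w_1$ play no role in the output, which is consistent with the oblique drop function agreeing with the vertical drop function when $w_1 = 0$.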


65.4.4 THEOREM: Linearity of oblique drop functions. 
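Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Then

\[
\forall z \in E,\ \forall\phi \in A^F_{E,\pi(z)}, \quad \varpi^{E,\phi}_z \in \mathrm{Lin}(T_z(E), E_{\pi(z)}).
\]

PROOF: $\mathrm{Dom}(\varpi^{E,\phi}_z) = T_z(E)$ and $\mathrm{Range}(\varpi^{E,\phi}_z) \subseteq E_{\pi(z)}$ for all $z \in E$ and $\phi \in A^F_{E,\pi(z)}$ by Definition 65.4.2.
Since $(d\phi)_z : T_z(E) \to T_{\phi(z)}(F)$ is linear by Theorem 58.4.9 (iii), and $\varpi^F|_{T_{\phi(z)}(F)} : T_{\phi(z)}(F) \to F$ is linear by
Theorem 54.9.7, and $\phi|_{E_{\pi(z)}}^{-1} : F \to E_{\pi(z)}$ is linear by Definition 65.1.6, it follows that $\varpi^{E,\phi}_z : T_z(E) \to E_{\pi(z)}$
is linear. Thus $\varpi^{E,\phi}_z \in \mathrm{Lin}(T_z(E), E_{\pi(z)})$.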
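65.4.5 THEOREM: Oblique drop function and inverse expressed as composite of three bijections.
Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Then

$\forall p \in M$, $\forall z \in E_p$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,
\begin{align*}
\varpi^{E,\phi}_z\big|_{T_{z,V}(E)} &= \phi|_{E_p}^{-1} \circ \varpi^F\big|_{T_{\phi(z)}(F)} \circ (d\phi)_z\big|_{T_{z,V}(E)} \tag{65.4.3} \\
\text{and}\quad \bigl(\varpi^{E,\phi}_z\big|_{T_{z,V}(E)}\bigr)^{-1} &= \bigl((d\phi)_z\big|_{T_{z,V}(E)}\bigr)^{-1} \circ \bigl(\varpi^F\big|_{T_{\phi(z)}(F)}\bigr)^{-1} \circ \phi|_{E_p}, \tag{65.4.4}
\end{align*}
where $\phi|_{E_p}^{-1} : F \to E_p$, $\varpi^F|_{T_{\phi(z)}(F)} : T_{\phi(z)}(F) \to F$ and $(d\phi)_z|_{T_{z,V}(E)} : T_{z,V}(E) \to T_{\phi(z)}(F)$ are all bijections.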

PROOF: The map $\phi|_{E_p}^{-1} : F \to E_p$ is a bijection by Theorem 21.5.6 (i), and $\varpi^F|_{T_{\phi(z)}(F)} : T_{\phi(z)}(F) \to F$ is a
bijection by Theorem 54.9.7, and $(d\phi)_z|_{T_{z,V}(E)} : T_{z,V}(E) \to T_{\phi(z)}(F)$ is a bijection by Theorem 64.5.11. Since
the relevant domains and ranges of these bijections exactly match, line (65.4.3) follows from Definition 65.4.2,
and the inverse oblique drop function $(\varpi^{E,\phi}_z|_{T_{z,V}(E)})^{-1}$ in line (65.4.4) is equal to the reverse chain of inverse
bijections by Theorems 10.5.6 (iii) and 10.5.11.

65.4.6 REMARK: Construction of bijections by restricting pointwise oblique drop functions. 
Theorem 65.4.7 is an extension of Theorem 59.3.4 from tangent bundles to general vector bundles. 


65.4.7 THEOREM:  Bijectivity of restricted pointwise oblique drop function. 
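Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Then

$\forall p \in M$, $\forall z \in E_p$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,

$\varpi^{E,\phi}_z|_{T_{z,V}(E)} : T_{z,V}(E) \to E_p$ is a bijection

and $(\varpi^{E,\phi}_z|_{T_{z,V}(E)})^{-1} : E_p \to T_{z,V}(E)$ is a bijection.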


PROOF: The assertions follow from Theorem 65.4.5. 


65.4.8 REMARK: Some useful bijections for oblique subspaces of total space tangent bundles. 
Theorem 65.4.9 is a fairly obvious corollary of Theorem 65.4.7. It has some applicability to connections on 
vector bundles. (See for example Definition 68.1.8 and Theorem 68.1.10.) 


65.4.9 THEOREM: Some oblique bijections from total space tangent bundles to the fibre space. 
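Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Then

$\forall p \in M$, $\forall z \in E_p$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,
\[
\varpi^F\big|_{T_{\phi(z)}(F)} \circ (d\phi)_z\big|_{T_{z,V}(E)} = \phi|_{E_p} \circ \varpi^{E,\phi}_z\big|_{T_{z,V}(E)} : T_{z,V}(E) \to F \ \text{ is a bijection.}
\]

Hence
$\forall p \in M$, $\forall z \in E_p$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,
\[
\bigl((d\phi)_z\big|_{T_{z,V}(E)}\bigr)^{-1} \circ \bigl(\varpi^F\big|_{T_{\phi(z)}(F)}\bigr)^{-1} = \bigl(\varpi^{E,\phi}_z\big|_{T_{z,V}(E)}\bigr)^{-1} \circ \phi|_{E_p}^{-1} : F \to T_{z,V}(E) \ \text{ is a bijection.}
\]

PROOF: The assertions follow from Definition 65.4.2, Theorems 65.4.7 and 21.5.6 (i), the observation that
$(d\phi)_z(T_{z,V}(E)) = T_{\phi(z)}(F)$, and Theorem 54.9.7.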


65.4.10 REMARK: Slightly more concrete formula for the oblique drop function. 
Theorem 65.4.3 expresses oblique drop functions in terms of the pull-back atlas in Definition 64.9.2. This is 
an extension of the formulas in Theorem 65.3.11 for the vertical drop function. 


In the more concrete case of oblique drop functions on tangent bundles in Definition 59.3.2, it is possible to
write the output as a concrete tangent vector which looks like $t_{\pi(z),w_2,\psi} \in T_{\pi(z)}(M)$. For the more abstract
vector bundles in Theorem 65.4.3, one can say that the output has the form $\phi|_{E_{\pi(z)}}^{-1}(\psi_F^{-1}(w_2)) \in E_{\pi(z)}$,
but no specific representation can be given for this.


Theorem 65.4.11 gives some corresponding formulas for the inverse oblique drop function.
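65.4.11 THEOREM: Formulas for the inverse of the oblique drop function.
Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Then
$\forall p \in M$, $\forall z, \zeta \in E_p$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_M \in \mathrm{atlas}_p(M)$, $\forall\psi_F \in \mathrm{atlas}(F)$,
\begin{align*}
\bigl(\varpi^{E,\phi}_z\big|_{T_{z,V}(E)}\bigr)^{-1}(\zeta) &= t_{z,(v,\psi_F(\phi(\zeta))),(\psi_M\times\psi_F)\circ(\pi\times\phi)} \tag{65.4.5} \\
 &= \bigl((d\phi)_z\big|_{T_{z,V}(E)}\bigr)^{-1}\Bigl(\bigl(\varpi^F\big|_{T_{\phi(z)}(F)}\bigr)^{-1}(\phi(\zeta))\Bigr) \tag{65.4.6} \\
 &= \bigl((d\phi)_z\big|_{T_{z,V}(E)}\bigr)^{-1}\bigl(t_{\phi(z),\psi_F(\phi(\zeta)),\psi_F}\bigr), \tag{65.4.7}
\end{align*}
where $v \in \mathbb{R}^n$ denotes the component tuple of $V$ with respect to $\psi_M$, so that $V = t_{p,v,\psi_M}$.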


PROOF: Line (65.4.5) follows from Theorem 65.4.3 line (65.4.2), Theorem 65.4.7, Notation 64.5.6 and 
Theorem 64.9.6 line (64.9.1). Line (65.4.6) follows from Theorem 65.4.5 line (65.4.4). Line (65.4.7) follows 
from line (65.4.6) by Theorem 54.9.8.


65.4.12 REMARK: Some relations between vertical and oblique drop functions. 
Theorem 65.4.13 is an extension of Theorem 59.3.9 from tangent bundles to vector bundles. These almost 
obvious assertions have some applications to proofs of Leibniz rules. (See for example Theorem 68.2.16.) 


65.4.13 THEOREM: Oblique drop functions and inverses of vertical drop functions. 
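Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Then

$\forall p \in M$, $\forall z \in E_p$, $\forall\phi \in A^F_{E,p}$, $\forall y \in T_{z,0}(E)$,
\begin{align*}
\varpi^{E,\phi}_z(y) &= \varpi^E_z(y) \tag{65.4.8} \\
\text{and}\quad y &= (\varpi^E_z)^{-1}\bigl(\varpi^{E,\phi}_z(y)\bigr) \tag{65.4.9} \\
\text{and}\quad y &= \bigl(\varpi^{E,\phi}_z\big|_{T_{z,0}(E)}\bigr)^{-1}\bigl(\varpi^E_z(y)\bigr). \tag{65.4.10}
\end{align*}
Consequently $(\varpi^E_z)^{-1} \circ \varpi^{E,\phi}_z|_{T_{z,0}(E)} = \mathrm{id}_{T_{z,0}(E)}$ and $(\varpi^{E,\phi}_z|_{T_{z,0}(E)})^{-1} \circ \varpi^E_z = \mathrm{id}_{T_{z,0}(E)}$ for all $z \in E_p$ and $\phi \in A^F_{E,p}$ for
all $p \in M$. Similarly,
$\forall p \in M$, $\forall z \in E_p$, $\forall\phi \in A^F_{E,p}$, $\forall W \in E_p$,
\begin{align*}
(\varpi^E_z)^{-1}(W) &= \bigl(\varpi^{E,\phi}_z\big|_{T_{z,0}(E)}\bigr)^{-1}(W) \tag{65.4.11} \\
\text{and}\quad W &= \varpi^E_z\Bigl(\bigl(\varpi^{E,\phi}_z\big|_{T_{z,0}(E)}\bigr)^{-1}(W)\Bigr) \tag{65.4.12} \\
\text{and}\quad \varpi^{E,\phi}_z\bigl((\varpi^E_z)^{-1}(W)\bigr) &= W. \tag{65.4.13}
\end{align*}
Consequently $\varpi^E_z \circ (\varpi^{E,\phi}_z|_{T_{z,0}(E)})^{-1} = \mathrm{id}_{E_p}$ and $\varpi^{E,\phi}_z \circ (\varpi^E_z)^{-1} = \mathrm{id}_{E_p}$ for all $z \in E_p$ and $\phi \in A^F_{E,p}$
for all $p \in M$.


PROOF: Line (65.4.8) follows by Definitions 65.3.5 and 65.4.2. Line (65.4.9) follows by line (65.4.8) and
Theorem 65.3.13. Line (65.4.10) follows by line (65.4.8) and Theorem 65.4.7. Line (65.4.11) follows from
line (65.4.8) and Theorems 65.3.13 and 65.4.7. Line (65.4.12) follows from line (65.4.11) and Theorem 65.3.13.
Line (65.4.13) follows from line (65.4.11) and Theorem 65.4.7.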


65.4.14 THEOREM: The oblique drop of the derivative of a “constant cross-section” equals zero.
Let $(E, \pi, M, A^F_E)$ be a $C^1$ vector bundle over a finite-dimensional real linear space $F$. Then

$\forall p \in M$, $\forall\phi \in A^F_{E,p}$, $\forall V \in T_p(M)$, $\forall a \in F$,
\[
\varpi^{E,\phi}\bigl(\partial_V((\pi\times\phi)^{-1}(\cdot,a))\bigr) = 0_{E_p}.
\]
PROOF: It follows from Theorem 64.5.18, Definitions 65.4.2 and 64.5.13, and Theorem 64.5.16 (ii) that 
a? (Oy (me x 6) 1 (,a)) = w”? (Hel lp. (a 
= olz (œ (co (6, (Hy, is ))))) (65.4.14) 
= lp, (e (0r, 05) 


by Definition 65.1.6. This verifies the assertion. 
65.4.15 REMARK: The zero oblique drop of derivatives of induced basis vector fields.

Theorem 65.4.16 applies Theorem 65.4.14 to the induced basis fields for the fibre sets of a vector bundle in
Definition 65.1.8. The fact that the oblique drop, with respect to fibre chart $\phi$, of naive derivatives of fibre-set
basis vector fields $e^{E,\phi,\psi_F}(\cdot)_i$, induced by $\phi$, is equal to a zero vector is not very surprising. This means
that these basis vector fields are horizontal with respect to $\phi$, which is fully expected. Theorem 65.4.16 is
useful for showing the equality of Cartan and Christoffel coefficient arrays for connections on vector bundles
in Theorem 68.3.6.


It should be noted that Theorem 65.4.16 is not valid if a general frame field $e^E \in X^1(F^m(E, \pi_E, M))$ is
substituted for the horizontal frame field $e^{E,\phi,\psi_F}$. The theorem “works” because the directional derivatives
$\partial_V e^{E,\phi,\psi_F}(\cdot)_i$ have zero vertical component with respect to $\phi$.


It should also be noted that if different fibre charts are used in Theorem 65.4.16 for the drop function
$\varpi^{E,\phi}$ and the frame field $e^{E,\phi,\psi_F}$, the assertion will not be valid in general. It can be seen from the fibre
chart transformation rule in Theorem 64.8.10 that replacing $\phi$ with $\phi'$ in line (65.4.14) of the proof of
Theorem 65.4.14, for a different $\phi' \in A^F_{E,p}$, would typically introduce a non-zero term which would invalidate
the assertion.
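
A rough sketch of where such an extra term could come from is given below. It assumes (this is not stated in the text) that the two fibre charts are related on a common domain by $\phi'|_{E_q} = g(q) \circ \phi|_{E_q}$ for a differentiable map $g$ from a neighbourhood of $p$ to the invertible linear maps on $F$, and that $\partial_V$ applied to an $F$-valued function means its ordinary directional derivative. The names $g$ and $X_a$ are introduced here for illustration only.

```latex
% Assumption: \phi'|_{E_q} = g(q)\circ\phi|_{E_q} with g(q) an invertible linear map on F.
% X_a(q) = (\pi\times\phi)^{-1}(q,a) is the cross-section which is constant with respect to \phi.
\begin{align*}
\phi'(X_a(q)) &= g(q)\,\phi(X_a(q)) = g(q)\,a, \\
\varpi^{E,\phi'}(\partial_V X_a)
  &= \phi'|_{E_p}^{-1}\bigl(\partial_V(\phi'\circ X_a)\bigr)
   = \phi'|_{E_p}^{-1}\bigl((\partial_V g)(p)\,a\bigr).
\end{align*}
% Since \phi'|_{E_p}^{-1} is a linear bijection, this vanishes for all a only if (\partial_V g)(p) = 0.
% For \phi' = \phi one has g(q) = id_F for all q, and the term is zero.
```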


65.4.16 THEOREM: Differential horizontality of induced basis vector fields. 
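Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle with $m = \dim(F)$. Then

$\forall p \in M$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_F \in \mathrm{atlas}(F)$, $\forall i \in \mathbb{N}_m$,
\[
\varpi^{E,\phi}\bigl(\partial_V e^{E,\phi,\psi_F}(\cdot)_i\bigr) = 0_{E_p}.
\]

(See Definition 65.1.8 for $e^{E,\phi,\psi_F}$.)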


PROOF: The assertion follows from Theorem 65.4.14 and Definition 65.1.8. 


((2019-7-20. Could possibly give a transformation formula near here between $\varpi^{E,\phi}$ and $\varpi^{E,\phi'}$. This would
probably make use of Theorem 64.8.10. However, this formula is not currently required for any application.))


65.5. Scaling curves and constant-scale maps for vector bundles 


65.5.1 REMARK: Scaling curves and Leibniz rules. 

Theorem 65.5.2 generalises Theorem 59.4.4 from tangent bundles to vector bundles. In both cases, the 
motivation is to provide computations to use in Leibniz rules. In each case, the velocity of the scaling 
curve is equal to the naive expected value, but only after applying a drop function. When a linear space is 
structured as a manifold, its curve velocity vectors lie in the tangent bundle of the manifold. Such vectors 
must be “dropped” to obtain vectors in the base space. 


In the proof of Theorem 65.5.2 line (65.5.1), the differential $(d\phi|_{E_p}^{-1})_{\lambda_0\phi(z)}$ of the map $\phi|_{E_p}^{-1}$ yields a tangent
vector of the submanifold $E_p$. This is merely a convenient convention, as discussed in Remarks 58.5.7
and 58.5.9. Unfortunately, adherence to this convention makes a submanifold tangent vector embedding
map necessary so as to convert the fibre set tangent vector to a total space tangent vector.


The vector $\partial_\lambda(\lambda z)|_{\lambda=\lambda_0}$ may be interpreted as a tangent vector of either $E_p$ or $E$ because $\lambda z \in E_p$ and
$\lambda z \in E$ for all $\lambda \in \mathbb{R}$. Here it is interpreted as a tangent vector of the total space $E$. If it is interpreted as a
tangent vector of $E_p$, then the tangent vector embedding map on line (65.5.1) will not be needed. But then
the total space tangent vector on line (65.5.3) will not be equal to this. To resolve this issue, it would be
necessary to distinguish two meanings for the operator “$\partial_\lambda$”, depending on whether it operates on a curve
in $E$ or a curve in $E_p$. Since the equality is quantified by “$\forall z \in E_p$”, one might guess that the curve lies
in $E_p$. Regrettably, a tedious range of notations and definitions would be required to fully clarify this issue.
So the matter is dealt with somewhat informally here. It is assumed that the curve $\lambda \mapsto \lambda z$ lies in $E$ despite
the fact that Definition 65.1.6 would suggest that it is a curve in $E_p$. Theorem 65.5.2 does at least have the
virtue of consistency with Theorem 59.4.4, the corresponding assertion for tangent bundles.


Note that $\mathrm{atlas}(F)$ is used instead of $\mathrm{atlas}_{\phi(z)}(F)$ in Theorems 65.5.2, 65.5.4 and 65.5.6 because all charts
of the linear space $F$ are global by Definition 49.7.14. So $\mathrm{atlas}_{\phi(z)}(F) = \mathrm{atlas}(F)$ for all $z \in E$.
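65.5.2 THEOREM: The velocity vector field of the “scaling curve” of a vector bundle element.
Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over $F$. Then

$\forall p \in M$, $\forall z \in E_p$, $\forall\lambda_0 \in \mathbb{R}$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_M \in \mathrm{atlas}_p(M)$, $\forall\psi_F \in \mathrm{atlas}(F)$,
\begin{align*}
\partial_\lambda(\lambda z)\big|_{\lambda=\lambda_0} &= \iota^{E,p}\Bigl((d\phi|_{E_p}^{-1})_{\lambda_0\phi(z)}\bigl(t_{\lambda_0\phi(z),\psi_F(\phi(z)),\psi_F}\bigr)\Bigr) \tag{65.5.1} \\
 &= \iota^{E,p}\bigl(t_{\lambda_0 z,\psi_F(\phi(z)),\psi_F\circ\phi|_{E_p}}\bigr) \tag{65.5.2} \\
 &= t_{\lambda_0 z,(0,\psi_F(\phi(z))),(\psi_M\times\psi_F)\circ(\pi_E\times\phi)} \tag{65.5.3} \\
 &= (\varpi^E_{\lambda_0 z})^{-1}(z). \tag{65.5.4}
\end{align*}
(See Notation 64.12.10 for the embedding map, written $\iota^{E,p} : T_{\lambda_0 z}(E_p) \to T_{\lambda_0 z,0}(E)$ here.) Hence
\begin{align*}
\forall p \in M,\ \forall z \in E_p,\ \forall\lambda_0 \in \mathbb{R}, \quad \varpi^E_{\lambda_0 z}\bigl(\partial_\lambda(\lambda z)\big|_{\lambda=\lambda_0}\bigr) = z. \tag{65.5.5}
\end{align*}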


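PROOF: By Definition 65.1.6, $\lambda z = \phi|_{E_p}^{-1}(\lambda\phi(z))$. This is interpreted here as a curve in $E$. So the expression
$\partial_\lambda(\lambda z)|_{\lambda=\lambda_0}$ is interpreted as a vector in $T_{\lambda_0 z,0}(E)$, not in $T_{\lambda_0 z}(E_p)$. The output from $(d\phi|_{E_p}^{-1})_{\lambda_0\phi(z)}$, however,
is a vector in $T_{\lambda_0 z}(E_p)$. Therefore one obtains
\begin{align*}
\partial_\lambda(\lambda z)\big|_{\lambda=\lambda_0} &= \iota^{E,p}\Bigl((d\phi|_{E_p}^{-1})_{\lambda_0\phi(z)}\bigl(\partial_\lambda(\lambda\phi(z))\big|_{\lambda=\lambda_0}\bigr)\Bigr) \tag{65.5.6} \\
 &= \iota^{E,p}\Bigl((d\phi|_{E_p}^{-1})_{\lambda_0\phi(z)}\bigl(t_{\lambda_0\phi(z),\psi_F(\phi(z)),\psi_F}\bigr)\Bigr), \tag{65.5.7}
\end{align*}
where $\iota^{E,p}$ denotes the embedding map in Notation 64.12.10, line (65.5.6) follows from the chain rule, and line (65.5.7) follows from Theorem 57.9.11. This verifies
line (65.5.1). Then line (65.5.2) follows from line (65.5.1) by Theorem 64.12.14 line (64.12.3), and line (65.5.3)
follows from line (65.5.1) by Theorem 64.12.16. It then follows from line (65.5.3) and Theorem 65.3.11
line (65.3.8) that $\varpi^E_{\lambda_0 z}(\partial_\lambda(\lambda z)|_{\lambda=\lambda_0}) = z$, which verifies line (65.5.5). Hence $\partial_\lambda(\lambda z)|_{\lambda=\lambda_0} = (\varpi^E_{\lambda_0 z})^{-1}(z)$,
which verifies line (65.5.4).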


65.5.3 REMARK: Conversion of a curve into a map between two manifolds. 
Theorem 65.5.4 is an extension of Theorem 59.4.8 from tangent bundles to general vector bundles. 


Whereas Theorem 65.5.2 computes the velocity of a scaling curve, Theorem 65.5.4 regards the parameter 
space IR as a differentiable manifold. This perspective converts the curve into a map between two manifolds. 
This is more directly applicable to the computations for a Leibniz rule. By letting $u = 1$ in Theorem 65.5.4
line (65.5.9), the resulting differential is clearly the same as in Theorem 65.5.2 line (65.5.4). However, the
construction which is used to obtain this result is different.


Note that $\mathrm{atlas}(\mathbb{R}) = \{\mathrm{id}_{\mathbb{R}}\}$ by Definition 49.7.12. So all vectors in $T_{\lambda_0}(\mathbb{R})$ are equal to $t_{\lambda_0,u,\mathrm{id}_{\mathbb{R}}}$ for some
$u \in \mathbb{R}$ by Notation 54.1.4.


65.5.4 THEOREM: The differential of vector scaling curves regarded as maps on the manifold R. 
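Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over $F$. Then
$\forall p \in M$, $\forall z \in E_p$, $\forall\lambda_0, u \in \mathbb{R}$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_M \in \mathrm{atlas}_p(M)$, $\forall\psi_F \in \mathrm{atlas}(F)$,
\begin{align*}
(dR_z)_{\lambda_0}\bigl(t_{\lambda_0,u,\mathrm{id}_{\mathbb{R}}}\bigr) &= t_{\lambda_0 z,(0,u\psi_F(\phi(z))),(\psi_M\times\psi_F)\circ(\pi_E\times\phi)} \tag{65.5.8} \\
 &= u\,(\varpi^E_{\lambda_0 z})^{-1}(z), \tag{65.5.9}
\end{align*}
where $R_z : \mathbb{R} \to E$ denotes the map $\lambda \mapsto \lambda z$ for $z \in E$. (See Definition 65.2.2 for this scalar product.)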
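PROOF: Let $p \in M$ and $z \in E_p$. Let $\phi \in A^F_{E,p}$ and $W = \phi(z)$. Then $z = (\pi_E\times\phi)^{-1}(p,W)$. So
$R_z(\lambda) = \lambda z = (\pi_E\times\phi)^{-1}(p,\lambda W) \in E_p$ for all $\lambda \in \mathbb{R}$ by Definition 65.2.2.


Let $u, \lambda_0 \in \mathbb{R}$, $\psi_M \in \mathrm{atlas}_p(M)$ and $\psi_F \in \mathrm{atlas}(F)$. Let $\psi_0 = (\psi_M\times\psi_F)\circ(\pi_E\times\phi)$ and $n = \dim(M)$. Then $\psi_0 \in \mathrm{atlas}_{\lambda_0 z}(E)$,
and by Definition 58.4.5, $(dR_z)_{\lambda_0}(t_{\lambda_0,u,\mathrm{id}_{\mathbb{R}}}) = t_{\lambda_0 z,v,\psi_0}$, where
\begin{align*}
v &= \sum_{i=1}^{1} u^i\,\partial_{x^i}\bigl(\psi_0\circ R_z\circ\mathrm{id}_{\mathbb{R}}^{-1}\bigr)(x)\Big|_{x=\mathrm{id}_{\mathbb{R}}(\lambda_0)} \\
 &= u\,\partial_\lambda\,\psi_0(R_z(\lambda))\big|_{\lambda=\lambda_0} \\
 &= u\,\partial_\lambda\bigl((\psi_M\times\psi_F)\circ(\pi_E\times\phi)\bigr)\bigl((\pi_E\times\phi)^{-1}(p,\lambda W)\bigr)\big|_{\lambda=\lambda_0} \\
 &= u\,\partial_\lambda(\psi_M\times\psi_F)(p,\lambda W)\big|_{\lambda=\lambda_0} \\
 &= u\,\partial_\lambda\bigl(\psi_M(p),\psi_F(\lambda W)\bigr)\big|_{\lambda=\lambda_0} \\
 &= u\,\partial_\lambda\bigl(\psi_M(p),\lambda\psi_F(W)\bigr)\big|_{\lambda=\lambda_0} \\
 &= u\,\bigl(0_{\mathbb{R}^n},\psi_F(W)\bigr) \\
 &= \bigl(0_{\mathbb{R}^n},u\,\psi_F(\phi(z))\bigr).
\end{align*}
This verifies line (65.5.8). Then line (65.5.9) follows from Theorem 65.3.11 line (65.3.8).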


65.5.5 REMARK: The differential of a scalar/vector product with respect to the vector. 
Theorem 65.5.6 is an extension of Theorem 59.4.10 from tangent bundles to general vector bundles.


Whereas Theorem 65.5.4 gives the differential of a product $\lambda z$ with respect to $\lambda$, Theorem 65.5.6 gives the
differential with respect to $z$. In a Leibniz rule, terms containing such differentials are added to give the
differential of the product of a real-valued function and a cross-section.


65.5.6 THEOREM: The differential of a constant-scale vector-map. 
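Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over $F$. Let $n = \dim(M)$ and $m = \dim(F)$. Then
$\forall p \in M$, $\forall z \in E_p$, $\forall\lambda \in \mathbb{R}$, $\forall\phi \in A^F_{E,p}$, $\forall\psi_M \in \mathrm{atlas}_p(M)$, $\forall\psi_F \in \mathrm{atlas}(F)$, $\forall w_1 \in \mathbb{R}^n$, $\forall w_2 \in \mathbb{R}^m$,
\begin{align*}
(dL_\lambda)_z\bigl(t_{z,(w_1,w_2),(\psi_M\times\psi_F)\circ(\pi_E\times\phi)}\bigr) = t_{\lambda z,(w_1,\lambda w_2),(\psi_M\times\psi_F)\circ(\pi_E\times\phi)}, \tag{65.5.10}
\end{align*}
where $L_\lambda : E \to E$ is defined by $L_\lambda : z \mapsto \lambda z$ as in Definition 65.2.2. Hence
$\forall p \in M$, $\forall z \in E_p$, $\forall\lambda \in \mathbb{R}$, $\forall\phi \in A^F_{E,p}$, $\forall y \in T_z(E)$,
\begin{align*}
\varpi^{E,\phi}_{\lambda z}\bigl((dL_\lambda)_z(y)\bigr) = \lambda\,\varpi^{E,\phi}_z(y). \tag{65.5.11}
\end{align*}
(See Definition 65.4.2 for the pointwise oblique drop function $\varpi^{E,\phi}_z$.) Consequently
$\forall p \in M$, $\forall z \in E_p$, $\forall\lambda \in \mathbb{R}$, $\forall\phi \in A^F_{E,p}$, $\forall V \in T_p(M)$, $\forall y \in T_{z,V}(E)$,
\begin{align*}
(dL_\lambda)_z(y) = \bigl(\varpi^{E,\phi}_{\lambda z}\big|_{T_{\lambda z,V}(E)}\bigr)^{-1}\bigl(\lambda\,\varpi^{E,\phi}_z(y)\bigr). \tag{65.5.12}
\end{align*}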


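PROOF: For line (65.5.10), let $w = (w_1, w_2) \in \mathbb{R}^n \times \mathbb{R}^m = \mathbb{R}^{n+m}$ and $\psi_0 = (\psi_M\times\psi_F)\circ(\pi_E\times\phi)$. Then
by Definition 58.4.5 for the differential of a map, $(dL_\lambda)_z(t_{z,w,\psi_0}) = t_{\lambda z,w',\psi_0}$, where
\begin{align*}
w' &= \sum_{i=1}^{n+m} w^i\,\partial_{x^i}\bigl(\psi_0\circ L_\lambda\circ\psi_0^{-1}\bigr)(x)\Big|_{x=\psi_0(z)} \\
 &= \sum_{i=1}^{n+m} w^i\,\partial_{x^i}\,\psi_0\Bigl(L_\lambda\bigl((\pi_E\times\phi)^{-1}(\psi_M^{-1}(x_1),\psi_F^{-1}(x_2))\bigr)\Bigr)\Big|_{x=\psi_0(z)} \tag{65.5.13} \\
 &= \sum_{i=1}^{n+m} w^i\,\partial_{x^i}\,\psi_0\Bigl(\lambda\,(\pi_E\times\phi)^{-1}(\psi_M^{-1}(x_1),\psi_F^{-1}(x_2))\Bigr)\Big|_{x=\psi_0(z)} \\
 &= \sum_{i=1}^{n+m} w^i\,\partial_{x^i}\,\psi_0\Bigl((\pi_E\times\phi)^{-1}(\psi_M^{-1}(x_1),\lambda\psi_F^{-1}(x_2))\Bigr)\Big|_{x=\psi_0(z)} \tag{65.5.14} \\
 &= \sum_{i=1}^{n+m} w^i\,\partial_{x^i}\,\psi_0\Bigl((\pi_E\times\phi)^{-1}(\psi_M^{-1}(x_1),\psi_F^{-1}(\lambda x_2))\Bigr)\Big|_{x=\psi_0(z)} \tag{65.5.15} \\
 &= \sum_{i=1}^{n+m} w^i\,\partial_{x^i}(x_1,\lambda x_2)\Big|_{x=\psi_0(z)} \\
 &= \sum_{i=1}^{n} w^i e_i + \lambda\sum_{i=n+1}^{n+m} w^i e_i \tag{65.5.16} \\
 &= (w_1,\lambda w_2),
\end{align*}
where on line (65.5.13), $x = (x_1, x_2) \in \mathbb{R}^n \times \mathbb{R}^m = \mathbb{R}^{n+m}$, line (65.5.14) follows from Definition 65.1.6,
line (65.5.15) follows from Definition 51.4.21 and Theorem 23.1.15, and on line (65.5.16), $(e_i)_{i=1}^{n+m}$ is the
standard basis for $\mathbb{R}^{n+m}$ as in Definition 22.7.9. This verifies line (65.5.10).


For line (65.5.11), let $y \in T_z(E)$. Let $\psi_M \in \mathrm{atlas}_{\pi_E(z)}(M)$, $\psi_F \in \mathrm{atlas}(F)$ and $\psi_0 = (\psi_M\times\psi_F)\circ(\pi_E\times\phi)$.
Then $y = t_{z,(w_1,w_2),\psi_0}$ for some $(w_1, w_2) \in \mathbb{R}^n \times \mathbb{R}^m$. So $\varpi^{E,\phi}_z(y) = \phi|_{E_p}^{-1}(\psi_F^{-1}(w_2))$ by Theorem 65.4.3.
So $\lambda\,\varpi^{E,\phi}_z(y) = \phi|_{E_p}^{-1}(\psi_F^{-1}(\lambda w_2))$ by Definitions 65.1.6 and 51.4.21 and Theorem 23.1.15. Similarly by
line (65.5.10) and Theorem 65.4.3, $\varpi^{E,\phi}_{\lambda z}((dL_\lambda)_z(y)) = \phi|_{E_{\pi_E(\lambda z)}}^{-1}(\psi_F^{-1}(\lambda w_2)) = \phi|_{E_p}^{-1}(\psi_F^{-1}(\lambda w_2))$. Hence
$\varpi^{E,\phi}_{\lambda z}((dL_\lambda)_z(y)) = \lambda\,\varpi^{E,\phi}_z(y)$. This verifies line (65.5.11). Line (65.5.12) follows by Theorem 65.4.7.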
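
The chain of equalities for line (65.5.10) can be summarised in matrix form. The following is only a sketch: the notation $D$ for the Jacobian matrix and $I_k$ for the $k \times k$ identity matrix is introduced here for illustration and is not taken from the text. In the chart $\psi_0$, the map $L_\lambda$ acts as $(x_1, x_2) \mapsto (x_1, \lambda x_2)$, so its Jacobian is block diagonal.

```latex
% Sketch: coordinate form of L_\lambda in the chart \psi_0 = (\psi_M\times\psi_F)\circ(\pi_E\times\phi).
\[
  D\bigl(\psi_0\circ L_\lambda\circ\psi_0^{-1}\bigr)(x)
  = \begin{pmatrix} I_n & 0 \\ 0 & \lambda I_m \end{pmatrix},
  \qquad
  \begin{pmatrix} I_n & 0 \\ 0 & \lambda I_m \end{pmatrix}
  \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
  = \begin{pmatrix} w_1 \\ \lambda w_2 \end{pmatrix}.
\]
% The base-space components w_1 are unchanged and the fibre components w_2 are scaled by \lambda.
```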


65.5.7 REMARK: Sprays on vector bundles. 
Definition 65.5.8 generalises Definition 59.5.2 from tangent bundles to vector bundles. Sprays on vector 
bundles make use of the same constant-scale vector-maps $L_\lambda$ as in the statement of Theorem 65.5.6.


Regrettably, the general vector bundle definition of a spray seems to have little value. The tangent bundle 
version has some value because an affine connection can be constructed from it. This is done by constructing 
a bilinear map from a quadratic map, exploiting the cosine rule for triangles. This cannot be done when the 
two linear spaces in the domain of the bilinear map are different. 
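
The construction alluded to here is presumably the usual polarisation of a quadratic map. The following sketch states that standard identity. The names $Q$, $B$, $W$ and $Z$ are illustrative and do not appear in the text. It indicates why both arguments must lie in the same linear space: the sum $u + v$ must be meaningful.

```latex
% Q : W \to Z is quadratic, meaning Q(u) = B(u,u) for a symmetric bilinear map B : W \times W \to Z.
\[
  B(u,v) \;=\; \tfrac12\bigl(Q(u+v) - Q(u) - Q(v)\bigr), \qquad u, v \in W .
\]
% For Z = \mathbb{R} and Q(u) = |u|^2 this is the cosine rule |u+v|^2 = |u|^2 + |v|^2 + 2\,u\cdot v.
```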


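65.5.8 DEFINITION: A spray on a $C^1$ vector bundle $(E, \pi, M, A^F_E)$ is a cross-section $S \in X(E, \pi, M)$ which
satisfies

\begin{align*}
\forall\lambda \in \mathbb{R},\ \forall z \in E, \quad S(\lambda z) &= (dL_\lambda)_z(\lambda S(z)) \\
 &= \lambda\,(dL_\lambda)_z(S(z)),
\end{align*}

where $L_\lambda : E \to E$ is defined by $L_\lambda : z \mapsto \lambda z$ for all $\lambda \in \mathbb{R}$ and $z \in E$. In other words,

\begin{align*}
\forall\lambda \in \mathbb{R}, \quad S \circ L_\lambda &= (dL_\lambda) \circ \mu_\lambda \circ S \\
 &= \mu_\lambda \circ (dL_\lambda) \circ S,
\end{align*}

where $\mu_\lambda : T(E) \to T(E)$ is defined by $\mu_\lambda : y \mapsto \lambda y$ for all $\lambda \in \mathbb{R}$ and $y \in T(E)$.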


65.6. Naive derivative Leibniz rule for vector bundle cross-sections 


65.6.1 REMARK: Generalising naive derivative Leibniz rule from tangent bundles to vector bundles. 
Theorem 65.6.2 extends Theorem 61.3.3 from tangent bundles to general vector bundles. Line (65.6.2) is 
recognisable as some kind of Leibniz rule. The oblique drop functions can only be removed by providing 
a connection on the fibre bundle, as in Definition 67.5.4, from which a fibre-chart-independent covariant 
derivative can be constructed, as in Definition 68.2.9. 


Theorem 65.6.2 is used in the proof of Theorem 68.2.16 for the corresponding Leibniz rule for covariant
derivatives on vector bundles. 


65.6.2 THEOREM: Leibniz rule for the naive derivative of a cross-section of a vector bundle. 
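Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over $F$. Then

$\forall U \in \mathrm{Top}(M)$, $\forall f \in C^1(U, \mathbb{R})$, $\forall X \in X^1(E, \pi_E, M\,|\,U)$, $\forall p \in U$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,
\begin{align*}
\partial_V(f.X) = (\partial_V f)\,(\varpi^E_{f(p)X(p)})^{-1}(X(p)) + \bigl(\varpi^{E,\phi}_{f(p)X(p)}\big|_{T_{f(p)X(p),V}(E)}\bigr)^{-1}\bigl(f(p)\,\varpi^{E,\phi}_{X(p)}(\partial_V X)\bigr). \tag{65.6.1}
\end{align*}
(See Definition 65.4.2 for oblique drop functions $\varpi^{E,\phi}_z$.) Consequently

$\forall U \in \mathrm{Top}(M)$, $\forall f \in C^1(U, \mathbb{R})$, $\forall X \in X^1(E, \pi_E, M\,|\,U)$, $\forall p \in U$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,
\begin{align*}
\varpi^{E,\phi}_{f(p)X(p)}\bigl(\partial_V(f.X)\bigr) = (\partial_V f)\,X(p) + f(p)\,\varpi^{E,\phi}_{X(p)}(\partial_V X). \tag{65.6.2}
\end{align*}
In other words,
\begin{align*}
\varpi^{E,\phi}\bigl(\partial_V(f.X)\bigr) = (\partial_V f)\,X(p) + f(p)\,\varpi^{E,\phi}(\partial_V X). \tag{65.6.3}
\end{align*}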
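PROOF: Let $U \in \mathrm{Top}(M)$. Define $Y : U \to E$ by $\forall p \in U$, $Y(p) = f(p)X(p)$. That is, $Y = f.X$. Then
$Y \in X^1(E, \pi_E, M\,|\,U)$ by Theorem 65.2.4. Let $\mu : \mathbb{R} \times E \to E$ denote the global induced scalar product
operation on $E$ as in Definition 65.2.2. Then by Theorem 58.7.13 line (58.7.15),
\begin{align*}
\forall p \in U, \quad (dY)_p &= (d(f.X))_p \\
 &= (d(\mu\circ(f\times X)))_p \\
 &= (dR^\mu_{X(p)})_{f(p)} \circ (df)_p + (dL^\mu_{f(p)})_{X(p)} \circ (dX)_p,
\end{align*}
where $\forall q \in E$, $R^\mu_q = \mu(\cdot, q) : \mathbb{R} \to E$ and $\forall q \in \mathbb{R}$, $L^\mu_q = \mu(q, \cdot) : E \to E$. Therefore

$\forall p \in U$, $\forall V \in T_p(M)$,
\begin{align*}
\partial_V Y &= (dY)_p(V) \\
 &= (dR^\mu_{X(p)})_{f(p)}\bigl((df)_p(V)\bigr) + (dL^\mu_{f(p)})_{X(p)}\bigl((dX)_p(V)\bigr) \\
 &= (dR^\mu_{X(p)})_{f(p)}\bigl(t_{f(p),\partial_V f,\mathrm{id}_{\mathbb{R}}}\bigr) + (dL^\mu_{f(p)})_{X(p)}(\partial_V X) \\
 &= (\partial_V f)\,(\varpi^E_{f(p)X(p)})^{-1}(X(p)) + \bigl(\varpi^{E,\phi}_{f(p)X(p)}\big|_{T_{f(p)X(p),V}(E)}\bigr)^{-1}\bigl(f(p)\,\varpi^{E,\phi}_{X(p)}(\partial_V X)\bigr)
\end{align*}
by Theorem 65.5.4 line (65.5.9) and Theorem 65.5.6 line (65.5.12). This verifies line (65.6.1).
Line (65.6.2) follows from line (65.6.1) and Theorem 65.4.13 line (65.4.13), and Theorem 65.3.9.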
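
One way to read line (65.6.3) is offered here as a sketch, not as part of the proof. Assume that $\varpi^F$ applied to the differential of an $F$-valued function gives its ordinary directional derivative, written $\partial_V$ below. Then Definition 65.4.2 and the chain rule give $\varpi^{E,\phi}(\partial_V X) = \phi|_{E_p}^{-1}(\partial_V(\phi\circ X))$, and the Leibniz rule becomes the ordinary product rule for the $F$-valued function $\phi\circ X$, carried back to $E_p$ by the linear map $\phi|_{E_p}^{-1}$.

```latex
% Sketch under the stated assumption; \phi(f(q)X(q)) = f(q)\,\phi(X(q)) by the induced linear structure on fibres.
\begin{align*}
\varpi^{E,\phi}\bigl(\partial_V(f.X)\bigr)
  &= \phi|_{E_p}^{-1}\bigl(\partial_V(f\cdot(\phi\circ X))\bigr) \\
  &= \phi|_{E_p}^{-1}\bigl((\partial_V f)\,\phi(X(p)) + f(p)\,\partial_V(\phi\circ X)\bigr) \\
  &= (\partial_V f)\,X(p) + f(p)\,\varpi^{E,\phi}(\partial_V X).
\end{align*}
```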


65.6.3 REMARK: Closure of cross-sections of vector bundles under linear space operations. 

By setting f to a real constant in Theorem 65.6.2, it is not difficult to show the scalarity of naive derivatives 
on vector bundles. (See Theorem 65.6.6.) Alternatively this could be deduced from Theorem 65.5.6. The 
other linear space operation is addition, but it is not immediately obvious how to derive the additivity of 
differentials of cross-sections from scalar-product theorems. In principle, one should expect an easy proof 
that the derivative of a sum equals the sum of the derivatives because this is so straightforward for Cartesian 
space maps, and cross-sections are in essence merely maps between Cartesian spaces, if one ignores all of the 
"structural packaging" which is required by definitions of fibre bundles. 


The difficulty with the “additivity of differentials” proof for vector bundle cross-sections is that differentials
such as $\partial_V X_1$ and $\partial_V X_2$ are in different linear spaces, namely $T_{X_1(p)}(E)$ and $T_{X_2(p)}(E)$, where $V \in T_p(M)$,
and the differential $\partial_V(X_1 + X_2)$ of the sum is expected to be in a third linear space, namely $T_{X_1(p)+X_2(p)}(E)$.
However, much the same kinds of difficulties are encountered in the proof of scalarity in Theorem 65.5.6. So
much the same style of analysis is required for the proof of additivity.


The sum of cross-sections $X_1$ and $X_2$ may be written as $\sigma \circ (X_1 \times X_2)$, which very much resembles the
expression $\mu \circ (f \times X)$ in the proof of Theorem 65.6.2, which is designed to be amenable to Theorem 58.7.13.
Theorem 58.7.13 is also used in the proof of Theorem 63.6.31, which is a kind of Leibniz formula, or product
rule, for functions of the form $z \mapsto g(z)p(z)$ which are variable actions of group elements $g(z)$ on variable
points $p(z)$. Thus the computation of the differential of a sum of vector bundle cross-sections has much in
common with Leibniz rules because both of the cross-sections are variable.


The fly in the ointment here is that the vector sum operation is not global. The domain of $\sigma$ is the limited
set of total space pairs $\bigcup_{p\in M}(E_p \times E_p)$, not the full set $E \times E$ of all total space pairs. Although it is true that
cross-sections $X, Y \in X^1(E, \pi_E, M\,|\,U)$ satisfy $X + Y = \sigma \circ (X \times Y) \in X^1(E, \pi_E, M\,|\,U)$, and it is also true by
Theorem 58.7.7 that $(d(X \times Y))_p = i \circ ((dX)_p \times (dY)_p)$, which implies that $\partial_V(X \times Y) = (d(X \times Y))_p(V) =
i(\partial_V X, \partial_V Y) \in T_{(X(p),Y(p))}(E \times E)$, it is not true that $(d(X + Y))_p = (d\sigma)_{(X(p),Y(p))} \circ ((dX)_p \times (dY)_p)$. The
problem here is that $(d\sigma)_{(X(p),Y(p))}$ can only act on vertical vector-pairs, whereas when $V \ne 0$, these pairs are not
vertical. Thus in general, an attempt to apply Theorem 58.7.13 line (58.7.15) arrives at a dead end:

\begin{align*}
\forall p \in U, \quad (d(X + Y))_p &= (d(\sigma \circ (X \times Y)))_p \\
 &= (dR^\sigma_{Y(p)})_{X(p)} \circ (dX)_p + (dL^\sigma_{X(p)})_{Y(p)} \circ (dY)_p,
\end{align*}

where $\forall q \in E$, $R^\sigma_q = \sigma(\cdot, q) : E_{\pi_E(q)} \to E_{\pi_E(q)}$ and $\forall q \in E$, $L^\sigma_q = \sigma(q, \cdot) : E_{\pi_E(q)} \to E_{\pi_E(q)}$. The problem
here is that $R^\sigma_q$ and $L^\sigma_q$ do not have domains and ranges equal to $E$. Therefore $\mathrm{Dom}((dR^\sigma_{Y(p)})_{X(p)}) =
T_{X(p),0}(E) \not\supseteq T_{X(p)}(E) \supseteq \mathrm{Range}((dX)_p)$ and $\mathrm{Dom}((dL^\sigma_{X(p)})_{Y(p)}) = T_{Y(p),0}(E) \not\supseteq T_{Y(p)}(E) \supseteq \mathrm{Range}((dY)_p)$.

One way out of this difficulty is to artificially extend the sum operation $\sigma$ to $E \times E$ using a fibre chart
to construct a fibre-chart-dependent addition operation. Then the horizontal components will (hopefully)
disappear in the result on the last line. A more natural approach is to derive a formula for $(d(\sigma \circ (X \times Y)))_p$
which takes advantage of the simultaneous variation of $(X(p), Y(p))$ with respect to $p$ instead of attempting
to vary $X(p)$ and $Y(p)$ independently as in Theorem 58.7.13.

In fact, the additivity of the naive derivative of cross-sections depends very much on the special vector
bundle structure. So it is not really worthwhile to attempt to derive it from some kind of general property 
of differentials of common-domain function products. The proof of Theorem 65.6.4 therefore makes use of 
the specific nature of the induced linear structure on fibre sets. 
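
The role of the induced linear structure can be sketched in terms of the $F$-valued component functions $\phi \circ X_1$ and $\phi \circ X_2$. This is an informal illustration which assumes that $\partial_V$ of an $F$-valued function denotes its ordinary directional derivative. Because each $\phi|_{E_q}$ is linear, the sum of cross-sections becomes a pointwise sum in the fixed linear space $F$, where additivity of derivatives is elementary.

```latex
% Sketch: addition on fibres is transported to addition in F by the fibre chart \phi.
\begin{align*}
\phi\circ(X_1 + X_2) &= \phi\circ X_1 + \phi\circ X_2
  \quad\text{on } U \cap \pi_E(\mathrm{Dom}(\phi)), \\
\partial_V\bigl(\phi\circ(X_1 + X_2)\bigr) &= \partial_V(\phi\circ X_1) + \partial_V(\phi\circ X_2).
\end{align*}
% Applying the linear map \phi|_{E_p}^{-1} to both sides of the second line gives an identity in E_p.
```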


65.6.4 THEOREM: Formula for naive derivative of sum of two cross-sections of a vector bundle. 
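Let $(E, \pi_E, M, A^F_E)$ be a $C^1$ vector bundle over $F$. Then

$\forall U \in \mathrm{Top}(M)$, $\forall X_1, X_2 \in X^1(E, \pi_E, M\,|\,U)$, $\forall p \in U$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,
\begin{align*}
\partial_V(X_1 + X_2) = \bigl(\varpi^{E,\phi}_{X_1(p)+X_2(p)}\big|_{T_{X_1(p)+X_2(p),V}(E)}\bigr)^{-1}\bigl(\varpi^{E,\phi}(\partial_V X_1) + \varpi^{E,\phi}(\partial_V X_2)\bigr). \tag{65.6.4}
\end{align*}

(See Definition 65.4.2 for oblique drop functions $\varpi^{E,\phi}_z$ and $\varpi^{E,\phi}$.) Consequently

$\forall U \in \mathrm{Top}(M)$, $\forall X_1, X_2 \in X^1(E, \pi_E, M\,|\,U)$, $\forall p \in U$, $\forall V \in T_p(M)$, $\forall\phi \in A^F_{E,p}$,
\begin{align*}
\varpi^{E,\phi}\bigl(\partial_V(X_1 + X_2)\bigr) = \varpi^{E,\phi}(\partial_V X_1) + \varpi^{E,\phi}(\partial_V X_2). \tag{65.6.5}
\end{align*}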


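PROOF: Let $U \in \mathrm{Top}(M)$ and $X_1, X_2 \in X^1(E, \pi_E, M\,|\,U)$. Then $X_1 + X_2 \in X^1(E, \pi_E, M\,|\,U)$ by Theorem 65.2.7
line (65.2.2). Let $\sigma = \bigcup_{p\in M}\sigma_p$, where $\sigma_p : E_p \times E_p \to E_p$ denotes the pointwise induced vector
addition operation on $E_p$ as in Definition 65.1.6 for all $p \in M$.

Let $p \in U$, $V \in T_p(M)$, $\phi \in A^F_{E,p}$, $\psi_M \in \mathrm{atlas}_p(M)$ and $\psi_F \in \mathrm{atlas}(F)$. Let $n = \dim(M)$ and $m = \dim(F)$.
Then by Definition 64.9.2 and Theorem 64.9.6 line (64.9.1), $\partial_V X_i = (dX_i)_p(V) = t_{X_i(p),(v,w_i),\psi_0}$ for some
$w_i \in \mathbb{R}^m$ for $i = 1, 2$, where $V = t_{p,v,\psi_M}$ and $\psi_0 = (\psi_M\times\psi_F)\circ(\pi_E\times\phi)$. Then by Theorem 64.9.6
line (64.9.2), $(d\phi)_{X_i(p)}((dX_i)_p(V)) = t_{\phi(X_i(p)),w_i,\psi_F}$ for $i = 1, 2$. So by Theorem 58.6.8 (i),
\begin{align*}
(d(\sigma_F\circ(\phi\times\phi)\circ(X_1\times X_2)))_p(V)
 &= (d\sigma_F)_{(\phi(X_1(p)),\phi(X_2(p)))}\bigl((d\phi)_{X_1(p)}((dX_1)_p(V)),\ (d\phi)_{X_2(p)}((dX_2)_p(V))\bigr) \\
 &= (d\sigma_F)_{(\phi(X_1(p)),\phi(X_2(p)))}\bigl(t_{\phi(X_1(p)),w_1,\psi_F},\ t_{\phi(X_2(p)),w_2,\psi_F}\bigr) \\
 &= t_{\phi(X_1(p)+X_2(p)),w_1+w_2,\psi_F},
\end{align*}
where $\sigma_F : F \times F \to F$ is the addition operation on $F$. But $(X_1 + X_2)(q) = \phi|_{E_q}^{-1}\bigl((\sigma_F\circ(\phi\times\phi)\circ(X_1\times X_2))(q)\bigr)$
for all $q \in U$ by Definition 65.1.6. Let $z = X_1(p) + X_2(p)$. Then
\begin{align*}
\partial_V(X_1 + X_2) &= (d(X_1 + X_2))_p(V) \\
 &= \bigl((d\phi)_z\big|_{T_{z,V}(E)}\bigr)^{-1}\bigl((d(\sigma_F\circ(\phi\times\phi)\circ(X_1\times X_2)))_p(V)\bigr) \tag{65.6.6} \\
 &= \bigl((d\phi)_z\big|_{T_{z,V}(E)}\bigr)^{-1}\bigl(t_{\phi(X_1(p)+X_2(p)),w_1+w_2,\psi_F}\bigr) \\
 &= t_{X_1(p)+X_2(p),(v,w_1+w_2),\psi_0} \tag{65.6.7} \\
 &= \bigl(\varpi^{E,\phi}_{X_1(p)+X_2(p)}\big|_{T_{z,V}(E)}\bigr)^{-1}\bigl(\phi|_{E_p}^{-1}(\psi_F^{-1}(w_1 + w_2))\bigr) \tag{65.6.8} \\
 &= \bigl(\varpi^{E,\phi}_{X_1(p)+X_2(p)}\big|_{T_{z,V}(E)}\bigr)^{-1}\bigl(\varpi^{E,\phi}(\partial_V X_1) + \varpi^{E,\phi}(\partial_V X_2)\bigr), \tag{65.6.9}
\end{align*}
where line (65.6.6) follows from the chain rule for differentials, line (65.6.7) follows from Theorem 64.9.7,
and lines (65.6.8) and (65.6.9) follow from Theorem 65.4.3 line (65.4.1). This verifies line (65.6.4). Then
line (65.6.5) follows from line (65.6.4).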


65.6.5 REMARK: Scalarity of the naive derivative of cross-sections of vector bundles. 

Theorem 65.6.6 could be proved in the same way as for Theorem 65.6.4. However, the much more general 
case of the naive derivative of a variable scalar multiple of a cross-section has already been shown in the 
Leibniz rule in Theorem 65.6.2. So it is simpler to apply that. 
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65.6.6 THEOREM: Formula for naive derivative of scalar multiple of cross-section of a vector bundle. 
Let (E, 5, M, AZ) be a C! vector bundle over F. Then 


VU € Top(M), VA € R, VX € X'(E,ng, M |U), Vp € U, VV € T,(M), Vo € Af p, 
-1 
av (AX) = vy) [M ay Aw”? (y X)). (65.6.10) 


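(See Definition 65.4.2 for oblique drop functions $\varpi^{E,\phi}$ and $\varpi^{E,\phi}_z$.) Consequently
$\forall U \in \mathrm{Top}(M),\ \forall \lambda \in \mathbb{R},\ \forall X \in X^1(E, \pi_E, M \mid U),\ \forall p \in U,\ \forall V \in T_p(M),\ \forall \phi \in A^F_{E,p},$
$\varpi^{E,\phi}(\partial_V(\lambda X)) = \lambda\, \varpi^{E,\phi}(\partial_V X).$ (65.6.11)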
PROOF: To show line (65.6.10), let $\lambda \in \mathbb{R}$ and define $f : U \to \mathbb{R}$ by $f(p) = \lambda$ for all $p \in U$. Then
$f \in C^1(U, \mathbb{R})$ by Theorem 51.6.5, and $\partial_V f = 0$ for all $V \in T(U)$ by Theorem 54.11.18. Therefore $\partial_V(\lambda X) =$
eu os as (Aw? (ðv X)) by Theorem 65.6.2 line (65.6.1). This verifies line (65.6.10), and then 
line (65.6.11) follows from this. 


65.6.7 REMARK:  Naive derivatives of coordinatised cross-sections of vector bundles. 

Theorems 65.6.4 and 65.6.6 may be combined to obtain a simple formula for naive derivatives of general linear 
combinations of cross-sections of vector bundles. By combining the Leibniz rule in Theorem 65.6.2 with the 
addition rule in Theorem 65.6.4, similar formulas may be obtained for general variable linear combinations 
of cross-sections. This is particularly useful for differentiating coordinatised cross-sections. In other words, 
if a cross-section $X \in X^1(E, \pi_E, M)$ is expressed as $X(p) = \sum_{i=1}^m x^i(p)\, e^\phi_i(p)$ for some $x : M \to \mathbb{R}^m$ and a
frame field $e^\phi \in X^1(\mathcal{F}^m(E, \pi_E, M))$, then $\partial_V X$ can be expressed in terms of the directional derivatives of
$x$ and $e^\phi$. In particular, if the frame field is induced by a fibre chart, one of the terms in the formula can be
made to disappear.


Theorem 65.4.16 is applied in Theorem 65.6.8 to obtain a formula for the naive derivative of a cross-section
of a vector bundle, expressed in terms of induced basis fields. The Leibniz rule makes two terms appear
in the formula for this naive derivative in line (65.6.15), but one of them disappears in line (65.6.16) because the
fibre-chart-induced basis vectors vary horizontally relative to the fibre chart. Consequently, the formula in
line (65.6.13) has the form one would expect if the basis vectors were constant, which they effectively are,
relative to the fibre chart.


65.6.8 THEOREM: Formula for naive derivative of cross-section in terms of an induced basis field. 
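Let $(E, \pi_E, M, A_E^F)$ be a $C^1$ vector bundle with $m = \dim(F)$. Then


$\forall U \in \mathrm{Top}(M),\ \forall X \in X^1(E, \pi_E, M \mid U),\ \forall p \in U,\ \forall V \in T_p(M),\ \forall \phi \in A^F_{E,p},\ \forall \psi_F \in \mathrm{atlas}(F),$
$\varpi^{E,\phi}(\partial_V X) = \sum_{i=1}^m \big(\partial_V (\kappa^{E,\phi,\psi_F} \circ\circ X)^i\big)\, e^{E,\phi,\psi_F}_i(p),$ (65.6.12)

where $\kappa^{E,\phi,\psi_F} : \mathrm{Dom}(\phi) \to \mathbb{R}^m$ is the component map for the basis $e^{E,\phi,\psi_F}$ for $E_p$ in Definition 65.1.8. (See
Definition 10.4.26 for the pointwise function composition operator “$\circ\circ$”.)
In other words, if $X(p) = \sum_{i=1}^m a^i(p)\, e^{E,\phi,\psi_F}_i(p)$ for all $p \in U \cap \pi_E(\mathrm{Dom}(\phi))$, where $U \in \mathrm{Top}(M)$,
$X \in X^1(E, \pi_E, M \mid U)$, $\phi \in A_E^F$ and $\psi_F \in \mathrm{atlas}(F)$, then

$\forall p \in U \cap \pi_E(\mathrm{Dom}(\phi)),\ \forall V \in T_p(M),$
$\varpi^{E,\phi}(\partial_V X) = \sum_{i=1}^m (\partial_V a^i)\, e^{E,\phi,\psi_F}_i(p).$ (65.6.13)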


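PROOF: To show line (65.6.13), note that

$\varpi^{E,\phi}(\partial_V X) = \varpi^{E,\phi}\big(\partial_V \big(\sum_{i=1}^m a^i e^{E,\phi,\psi_F}_i\big)\big)$
$= \sum_{i=1}^m \varpi^{E,\phi}\big(\partial_V (a^i e^{E,\phi,\psi_F}_i)\big)$ (65.6.14)
$= \sum_{i=1}^m \big((\partial_V a^i)\, e^{E,\phi,\psi_F}_i(p) + a^i(p)\, \varpi^{E,\phi}(\partial_V e^{E,\phi,\psi_F}_i)\big)$ (65.6.15)
$= \sum_{i=1}^m (\partial_V a^i)\, e^{E,\phi,\psi_F}_i(p),$ (65.6.16)

where line (65.6.14) follows from Theorem 65.6.4 line (65.6.5), applied inductively, line (65.6.15) follows from
Theorem 65.6.2 line (65.6.3), and line (65.6.16) follows from Theorem 65.4.16.
Line (65.6.12) follows from line (65.6.13) because by Definition 10.4.26, $(\kappa^{E,\phi,\psi_F} \circ\circ X)(p) = \kappa^{E,\phi,\psi_F}(X(p))$
for all $p \in U \cap \pi_E(\mathrm{Dom}(\phi))$, which gives $X(p) = \sum_{i=1}^m a^i(p)\, e^{E,\phi,\psi_F}_i(p)$ for all $p \in U \cap \pi_E(\mathrm{Dom}(\phi))$.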


65.7. Vector-tuple bundles built from vector bundles 


65.7.1 REMARK: Relevance of vector-tuple bundles to multilinear maps and tensors. 

As mentioned in Remark 55.5.2 in the context of vector-tuple bundles on tangent bundles of differentiable 
manifolds, vector-tuple bundles are relevant to the construction of multilinear maps and tensors. In the 
context of vector bundles, these multilinear maps and tensors are based on the vectors in a general vector 
bundle rather than a tangent bundle of a manifold. 


The task of Section 65.7 is to replicate Section 55.5, but replacing tangent bundles with general vector 
bundles. Not many surprises are expected from this. Section 65.7 may be safely ignored. 


65.7.2 REMARK: Notation quandaries for vector-tuple bundles on vector bundles. 
The definitions for vector-tuple bundles on general vector bundles are not a particularly burdensome extension
of the corresponding concepts for tangent bundles of manifolds, but the notations do offer some difficulties. 


The fibre set $E_p$ at $p$, for a vector bundle $(E, \pi_E, M, A_E^F)$, corresponds to the pointwise tangent space
$T_p(M)$ for a manifold $M$. The notation chosen here for the vector $r$-tuple space $E_p^r$ is $\mathcal{T}^r_p(E, \pi_E, M)$, by
analogy with Notation 55.5.5 which denotes the set $T_p(M)^r$ by $T^r_p(M)$. This in itself is a bad idea because
it replaces the short, obvious, correct expression $E_p^r$ with the long, obscure, slightly incorrect expression
$\mathcal{T}^r_p(E, \pi_E, M)$. The motivation for this bad choice of notation is that in the case of the total space set
$\bigcup_{p \in M} \mathcal{T}^r_p(E, \pi_E, M) = \bigcup_{p \in M} E_p^r$ in Definition 65.7.5, the notation $E^r$ would be incorrect and confusing.
The expression “$E^r$” already has a well-defined meaning, which is a Cartesian product of $r$ copies of $E$,
which is not what is intended here. The letter “$\mathcal{T}$” may be read as an abbreviation for “tuple”.


Part of the reason for the difficulties here is that the notation “$E_p$” for $\pi_E^{-1}(\{p\})$ is somewhat ill-designed,
although it is quite widely used in the literature. If $E_p$ had been denoted instead as $\mathrm{Fib}_p(E)$, for example,
one could write $\mathrm{Fib}^r_p(E)$ for $\mathrm{Fib}_p(E)^r$, and $\mathrm{Fib}^r(E)$ for the vector $r$-tuple total space $\bigcup_{p \in M} \mathrm{Fib}^r_p(E)$, by
(almost) exact analogy with $T^r_p(M)$ and $T^r(M)$. In fact, $T^r_p(M) = \mathrm{Fib}^r_p(T(M))$ and $T^r(M) = \mathrm{Fib}^r(T(M))$.


65.7.3 DEFINITION: The vector $r$-tuple space at $p \in M$ of a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$,
is the Cartesian product $E_p^r = \times_{j=1}^r E_p$, together with its Cartesian product topology.


65.7.4 NOTATION: $\mathcal{T}^r_p(E, \pi_E, M)$, for $p \in M$ and $r \in \mathbb{Z}_0^+$, for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, denotes
the vector $r$-tuple space of $E$ at $p$. In other words, $\mathcal{T}^r_p(E, \pi_E, M) = E_p^r$.


65.7.5 DEFINITION: The vector $r$-tuple total space for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$, is
the set $\bigcup_{p \in M} \mathcal{T}^r_p(E, \pi_E, M) = \bigcup_{p \in M} E_p^r$.


65.7.6 NOTATION: $\mathcal{T}^r(E, \pi_E, M)$, for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$ and $r \in \mathbb{Z}_0^+$, denotes its vector
$r$-tuple total space. In other words, $\mathcal{T}^r(E, \pi_E, M) = \bigcup_{p \in M} \mathcal{T}^r_p(E, \pi_E, M) = \bigcup_{p \in M} E_p^r$.

65.7.7 DEFINITION: The vector $r$-tuple fibration for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$, is the
tuple $(\mathcal{T}^r(E, \pi_E, M), \pi^r_E, M)$, where $\pi^r_E : \mathcal{T}^r(E, \pi_E, M) \to M$ is defined by


$\forall p \in M,\ \forall (V_j)_{j=1}^r \in \mathcal{T}^r_p(E, \pi_E, M),\quad \pi^r_E((V_j)_{j=1}^r) = p.$


The projection map of the vector-tuple fibration $(\mathcal{T}^r(E, \pi_E, M), \pi^r_E, M)$ is the map $\pi^r_E$.


65.7.8 REMARK: Fibre chart constructor for vector-tuple bundles on vector bundles. 

The map 97, in Definition 65.7.9 constructs fibre charts for vector-tuple bundles by analogy with the fibre 
chart constructors ® in Notation 54.5.7, ®* in Notation 55.4.7, 9" in Notations 55.5.25 and 55.6.18, $7 in 
Notation 55.7.6, 6^5 and $"** in Notations 56.3.13 and 56.3.14, P>" in Notation 56.4.13, $^ and 67:"— 
in Notation 56.5.20, $-7;^^ in Remark 56.6.4, and $^^ in Notation 56.7.12. 
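65.7.9 DEFINITION: The vector $r$-tuple bundle fibre atlas for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$,
is the set $A^{F^r}_{\mathcal{T}^r(E,\pi_E,M)} = \{\Phi^r_E(\phi);\ \phi \in A_E^F\}$, where $\Phi^r_E(\phi) : \bigcup_{p \in \pi_E(\mathrm{Dom}(\phi))} E_p^r \to F^r$ is given by


$\forall \phi \in A_E^F,\ \forall p \in \pi_E(\mathrm{Dom}(\phi)),\ \forall (V_i)_{i=1}^r \in E_p^r,$
$\Phi^r_E(\phi)((V_i)_{i=1}^r) = (\phi(V_i))_{i=1}^r.$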


65.7.10 REMARK: Manifold chart constructor for vector-tuple bundles on vector bundles. 

A novelty of the fibre chart constructor in Definition 65.7.9 is that the vector-tuple bundle fibre space $F^r$
is not automatically mappable to a real Cartesian space. Therefore it cannot be directly used to construct
manifold charts for the total space $\mathcal{T}^r(E, \pi_E, M)$. By Definitions 65.1.3, 63.4.17, 51.4.21 and 49.7.14, the
atlas which is assumed for $F$ is the set of all component maps for the linear space $F$. These are therefore
used in Definition 65.7.11 for the corresponding vector-tuple fibre bundles.


The map Y% in Definition 65.7.11 constructs manifold charts for vector-tuple bundle total spaces following the 
pattern of the manifold chart constructors V in Notation 54.5.21, V* in Definition 55.4.8, V^ in Notations 
55.5.33 and 55.6.26, W^* and W'** in Notations 56.3.21 and 56.3.22, W>" in Notation 56.4.18, W^" in 
Notation 56.5.25, and W^^W in Notation 56.7.16. 


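As a formalistic curiosity, one might like to note that $\Phi^r_E(\phi) = \bigcup_{p \in \pi_E(\mathrm{Dom}(\phi))} \times_{i=1}^r \phi|_{E_p}$ for all $\phi \in A_E^F$, and

$\forall \phi \in A_E^F,\ \forall \psi_M \in \mathrm{atlas}(M),\ \forall \psi_F \in \mathrm{atlas}(F),$
$\Psi^r_E(\phi, \psi_M, \psi_F) = (\psi_M \circ \pi^r_E) \times \bigcup_{p \in \pi_E(\mathrm{Dom}(\phi))} \big((\times_{j=1}^r \psi_F) \circ (\times_{i=1}^r \phi|_{E_p})\big)$
$= (\psi_M \circ \pi^r_E) \times \bigcup_{p \in \pi_E(\mathrm{Dom}(\phi))} \times_{i=1}^r (\psi_F \circ \phi|_{E_p}).$

This is related to the formula in Notation 54.5.21 line (54.5.6) for tangent bundle manifold charts, and the
formulas in Notation 55.5.25 line (55.5.2) and Notation 55.5.33 line (55.5.3) for tangent r-tuple bundle fibre
and manifold charts.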


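65.7.11 DEFINITION: The vector $r$-tuple bundle manifold atlas for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$,
for $r \in \mathbb{Z}_0^+$, is the set $A_{\mathcal{T}^r(E,\pi_E,M)} = \{\Psi^r_E(\phi, \psi_M, \psi_F);\ \phi \in A_E^F,\ \psi_M \in \mathrm{atlas}(M),\ \psi_F \in \mathrm{atlas}(F)\}$, where
$\Psi^r_E(\phi, \psi_M, \psi_F) : \bigcup_{p \in \mathrm{Dom}(\psi_M) \cap \pi_E(\mathrm{Dom}(\phi))} E_p^r \to \mathbb{R}^n \times (\mathbb{R}^m)^r = \mathbb{R}^{n+mr}$ with $n = \dim(M)$ and $m = \dim(F)$
is given by


$\forall \phi \in A_E^F,\ \forall p \in \pi_E(\mathrm{Dom}(\phi)),\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}(F),\ \forall (V_i)_{i=1}^r \in E_p^r,$
$\Psi^r_E(\phi, \psi_M, \psi_F)((V_i)_{i=1}^r) = \big(\psi_M(p),\ (\psi_F(\phi(V_i)))_{i=1}^r\big).$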
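
The manifold chart formula in Definition 65.7.11 may be illustrated by a minimal sketch for a trivial bundle. The following Python fragment assumes (hypothetically, not from the text) that E = M x F with M = R^2 and F = R^3, that the manifold chart on M and the component map on F are identity maps, and that the fibre chart simply forgets the base point. All function names are illustrative only.

```python
# Sketch of the r-tuple manifold chart (psi_M(p), (psi_F(phi(V_i)))_i) for a
# trivial bundle E = M x F. All names and the choice M = R^2, F = R^3 are
# assumptions made for illustration.

def psi_M(p):
    """Manifold chart on M = R^2 (identity chart)."""
    return tuple(p)

def psi_F(f):
    """Component map on the fibre space F = R^3 (identity chart)."""
    return tuple(f)

def phi(vec):
    """Fibre chart on E = M x F: drop the base point, keep the fibre part."""
    _, f = vec
    return f

def tuple_chart(vectors):
    """Map an r-tuple of vectors in a single fibre E_p to R^n x (R^m)^r."""
    if not vectors:
        raise ValueError("this sketch reads the base point from the tuple, so r >= 1")
    p = vectors[0][0]
    if any(q != p for q, _ in vectors):
        raise ValueError("all vectors of an r-tuple must lie in the same fibre")
    coords = list(psi_M(p))
    for v in vectors:
        coords.extend(psi_F(phi(v)))
    return tuple(coords)

p = (1.0, 2.0)
V = [(p, (1.0, 0.0, 0.0)), (p, (0.0, 5.0, -1.0))]
x = tuple_chart(V)
print(len(x))  # n + m*r coordinates
```

With n = 2, m = 3 and r = 2, the chart value has n + mr = 8 real coordinates, in agreement with the target space in the definition.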


65.7.12 REMARK: Construction of a vector-tuple bundle from a vector bundle. 

Definition 65.7.13 combines the various components of a vector-tuple bundle into a tuple in the form which 
is required for the specification of a differentiable fibre bundle. In fact, it may be easily (but tediously) 
verified that this tuple is a differentiable vector bundle with the same differentiability class as the original 
vector bundle E. Addition and scalar multiplication on F” are defined componentwise in the usual way. 


65.7.13 DEFINITION: The vector $r$-tuple bundle of a $C^0$ vector bundle $E -\!< (E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$,
is the tuple $(\mathcal{T}^r(E, \pi_E, M), \pi^r_E, M, A^{F^r}_{\mathcal{T}^r(E,\pi_E,M)}) -\!< (\mathcal{T}^r(E, \pi_E, M), A_{\mathcal{T}^r(E,\pi_E,M)}, \pi^r_E, M, A_M, A^{F^r}_{\mathcal{T}^r(E,\pi_E,M)})$
with the following components.

(i) $\mathcal{T}^r(E, \pi_E, M)$ is the vector $r$-tuple total space for $E$ as in Definition 65.7.5 and Notation 65.7.6.

(ii) $A_{\mathcal{T}^r(E,\pi_E,M)}$ is the vector $r$-tuple bundle manifold atlas for $E$ as in Definition 65.7.11.

(iii) $\pi^r_E$ is the projection map for $\mathcal{T}^r(E, \pi_E, M)$ as in Definition 65.7.7.

(iv) $A_M = \mathrm{atlas}(M)$.

(v) $A^{F^r}_{\mathcal{T}^r(E,\pi_E,M)}$ is the vector $r$-tuple fibre atlas for $E$ as in Definition 65.7.9.
65.8. Vector-frame bundles built from vector bundles


65.8.1 REMARK: Vector frame bundles for vector bundles. 
Section 65.8 is an extension of Section 55.6 from tangent bundles to general vector bundles. Section 65.8 is
an almost exact carbon copy of Section 65.7. So there is not much value in reading it.


The set $\mathcal{F}^r_p(E, \pi_E, M)$ is empty if $r > \dim(F)$ by Theorem 22.7.15. So all assertions about vector $r$-frame
bundles for $r > \dim(F)$ are properties of the empty set. The case of most interest is $r = \dim(F)$.
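
The emptiness of the frame space for r > dim(F) can be checked concretely in a small case. The following Python sketch assumes (for illustration only) that a fibre has been identified with R^2 by a fibre chart, and tests linear independence by exact Gaussian elimination over the rationals. The helper names `rank` and `is_frame` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of a list of coordinate vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a pivot row at or below the current rank.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def is_frame(vectors):
    """True if the r-tuple of vectors is linearly independent."""
    return rank(vectors) == len(vectors)

# Fibre identified with R^2, so dim(F) = 2.
print(is_frame([(1, 0), (1, 1)]))          # True: a 2-frame
print(is_frame([(1, 0), (1, 1), (0, 1)]))  # False: three vectors in R^2

# Exhaustive check on a small integer grid: no 3-tuple in R^2 is a frame.
grid = list(product(range(-1, 2), repeat=2))
no_three_frames = not any(is_frame(t) for t in product(grid, repeat=3))
print(no_three_frames)
```

The grid search is only a finite sanity check. The general statement is the one cited from Theorem 22.7.15.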


65.8.2 DEFINITION: The vector $r$-frame space at $p \in M$ of a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$,
is the set of linearly independent $r$-tuples in $E_p^r = \times_{j=1}^r E_p$, together with the relative topology induced by
the Cartesian product topology on $E_p^r$.


65.8.3 NOTATION: $\mathcal{F}^r_p(E, \pi_E, M)$, for $p \in M$ and $r \in \mathbb{Z}_0^+$, for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, denotes
the vector $r$-frame space of $E$ at $p$. In other words,
$\forall r \in \mathbb{Z}_0^+,\ \forall p \in M,$
$\mathcal{F}^r_p(E, \pi_E, M) = \{(V_i)_{i=1}^r \in E_p^r;\ (V_i)_{i=1}^r \text{ is a linearly independent family in } E_p\}.$


65.8.4 DEFINITION: The vector $r$-frame total space for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$, is
the set $\bigcup_{p \in M} \mathcal{F}^r_p(E, \pi_E, M)$.


65.8.5 NOTATION: $\mathcal{F}^r(E, \pi_E, M)$, for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$ and $r \in \mathbb{Z}_0^+$, denotes its vector
$r$-frame total space. In other words, $\mathcal{F}^r(E, \pi_E, M) = \bigcup_{p \in M} \mathcal{F}^r_p(E, \pi_E, M)$.


65.8.6 DEFINITION: The vector $r$-frame fibration for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$, is
the tuple $(\mathcal{F}^r(E, \pi_E, M), \pi^r_E, M)$, where $\pi^r_E : \mathcal{F}^r(E, \pi_E, M) \to M$ is defined by


$\forall p \in M,\ \forall (V_j)_{j=1}^r \in \mathcal{F}^r_p(E, \pi_E, M),\quad \pi^r_E((V_j)_{j=1}^r) = p.$


The projection map of the vector-frame fibration $(\mathcal{F}^r(E, \pi_E, M), \pi^r_E, M)$ is the map $\pi^r_E$.


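65.8.7 DEFINITION: The vector $r$-frame bundle fibre atlas for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$, for
$r \in \mathbb{Z}_0^+$, is the set $A^{F^r}_{\mathcal{F}^r(E,\pi_E,M)} = \{\Phi^r_E(\phi);\ \phi \in A_E^F\}$, where $\Phi^r_E(\phi) : \bigcup_{p \in \pi_E(\mathrm{Dom}(\phi))} \mathcal{F}^r_p(E, \pi_E, M) \to F^r$
is given by

$\forall \phi \in A_E^F,\ \forall p \in \pi_E(\mathrm{Dom}(\phi)),\ \forall (V_i)_{i=1}^r \in \mathcal{F}^r_p(E, \pi_E, M),$

$\Phi^r_E(\phi)((V_i)_{i=1}^r) = (\phi(V_i))_{i=1}^r.$

65.8.8 DEFINITION: The vector $r$-frame bundle manifold atlas for a $C^0$ vector bundle $(E, \pi_E, M, A_E^F)$,
for $r \in \mathbb{Z}_0^+$, is the set $A_{\mathcal{F}^r(E,\pi_E,M)} = \{\Psi^r_E(\phi, \psi_M, \psi_F);\ \phi \in A_E^F,\ \psi_M \in \mathrm{atlas}(M),\ \psi_F \in \mathrm{atlas}(F)\}$, where
$\Psi^r_E(\phi, \psi_M, \psi_F) : \bigcup_{p \in \mathrm{Dom}(\psi_M) \cap \pi_E(\mathrm{Dom}(\phi))} \mathcal{F}^r_p(E, \pi_E, M) \to \mathbb{R}^n \times (\mathbb{R}^m)^r = \mathbb{R}^{n+mr}$ with $n = \dim(M)$ and
$m = \dim(F)$ is given by

$\forall \phi \in A_E^F,\ \forall p \in \pi_E(\mathrm{Dom}(\phi)),\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}(F),\ \forall (V_i)_{i=1}^r \in \mathcal{F}^r_p(E, \pi_E, M),$

$\Psi^r_E(\phi, \psi_M, \psi_F)((V_i)_{i=1}^r) = \big(\psi_M(p),\ (\psi_F(\phi(V_i)))_{i=1}^r\big).$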


65.8.9 REMARK: Vector-frame bundles constructed from vector bundles. 
Definition 65.8.10 is an extension of Definition 55.6.31 from tangent bundles to general vector bundles. 


65.8.10 DEFINITION: The vector $r$-frame bundle of a $C^0$ vector bundle $E -\!< (E, \pi_E, M, A_E^F)$, for $r \in \mathbb{Z}_0^+$,
is the tuple $(\mathcal{F}^r(E, \pi_E, M), \pi^r_E, M, A^{F^r}_{\mathcal{F}^r(E,\pi_E,M)}) -\!< (\mathcal{F}^r(E, \pi_E, M), A_{\mathcal{F}^r(E,\pi_E,M)}, \pi^r_E, M, A_M, A^{F^r}_{\mathcal{F}^r(E,\pi_E,M)})$
with the following components.

(i) $\mathcal{F}^r(E, \pi_E, M)$ is the vector $r$-frame total space for $E$ as in Definition 65.8.4 and Notation 65.8.5.

(ii) $A_{\mathcal{F}^r(E,\pi_E,M)}$ is the vector $r$-frame bundle manifold atlas for $E$ as in Definition 65.8.8.

(iii) $\pi^r_E$ is the projection map for $\mathcal{F}^r(E, \pi_E, M)$ as in Definition 65.8.6.

(iv) $A_M = \mathrm{atlas}(M)$.

(v) $A^{F^r}_{\mathcal{F}^r(E,\pi_E,M)}$ is the vector $r$-frame fibre atlas for $E$ as in Definition 65.8.7.
((2019-7-18. Create one or more new sections after Section 65.8 to present various kinds of tensor bundles
built from vector bundles. Since each fibre set is a finite-dimensional linear space, all of the various species 
of tensors can be (1) defined pointwise, (2) aggregated into vector bundles on the whole base space, (3) used 
to define sets of cross-sections, which will form linear spaces under pointwise linear operations. All of these 
will require new notations, extending the notations for tangent bundles. For example, for a vector bundle 
(E, Te, M, AL), could define the dual vector bundle (E*,77,, M, A) This seems very onerous if there are 
no applications though. Could try to define notations such as 9" E, Am(E), and N (E). 


2019-7-20. An application for cross-sections of antisymmetric covariant tensor bundles (i.e. “differential 
forms") would be the definition of the interior product, particularly the operation i(X)(A) = A o X for 
vector fields X and differential forms A. This should be a map i(X) : X(A,(T(M))) > X(A, 1(T(M))) or 
i(X) : X(AS(E,n, M)) > X(N. ai(E, m, M)), or something like that. )) 


65.9. Tangent bundles of differentiable manifolds 
65.9.1 REMARK: Diagrams of spaces and maps for tangent bundles. 


Figure 65.9.1 gives some idea of the spaces and maps which are required for the specification of fibrations 
and fibre bundles based on the tangent spaces on differentiable manifolds. 


Figure 65.9.1 Spaces and maps for tangent bundle structures
(Panels: tangent fibration, ordinary tangent bundle, principal tangent bundle.)


The right action map uë : T(M) x GL(n) — T(M) in Figure 65.9.1 is constructed from the other maps. 
The important point is that it can be constructed in the case of a principal tangent bundle. For ordinary
tangent bundles, the right action map uë : IR" x GL(n) — R” is not well defined in general. 


65.9.2 REMARK: Specialisation of fibre bundles to tangent fibre bundles. 

When the maps and spaces for a fibre bundle in Figure 64.8.1 are specialised to the case of a tangent fibre 
bundle, the result is Figure 65.9.2. 

One notable aspect of Figure 65.9.2 is the fact that the coordinate chart space R” for the fibre space F = IR" 
is the same as the fibre space F itself. Therefore one may conveniently choose the manifold map vg» to be 
the identity map on R”. 

Another notable aspect of Figure 65.9.2 is the fact that the coordinate and fibre charts for T(M) may be 
chosen to be very closely aligned to the charts for M, which is not the case for general fibre bundles. These 
closely aligned charts are of the form 4» : &-!(U) > IR?" and ¢y : 47! (U) — R” for each ù € Am, the 
set of manifold charts on M, where w : (p, V) +> (v(p), (v;).,) and dy : (p, V) — (vi), for vectors 
pex vie? in T,(M) for some p € M. It is these closely aligned charts which permit most of the basic 
concepts of affine connections to be defined, such as for example the torsion in Remark 71.12.4. 


65.9.3 REMARK: A tangent fibre bundle is a particular kind of fibre bundle. 

Definition 54.5.30 for a tangent bundle is chosen to match Definition 64.8.3 for a differentiable fibre bundle. 
In fact, the tangent bundle is a vector bundle according to Definitions 65.1.3 and 64.8.3 with suitable 
linear space structure on the fibre sets. This is shown in Theorem 65.9.5 by “ticking all the boxes” in 
Definition 64.8.3 with F = R” and G = GL(n) for n = dim(M). 
Figure 65.9.2 Fibre bundle spaces and maps for a tangent bundle


The differentiable manifold of tangent vectors $(T(M), A_{T(M)})$ is defined in Section 54.5 as the total tangent
space for the manifold $M$. The tangent bundle in Definition 54.5.30 is the same as the tangent fibration of
a manifold except that a fibre atlas $A^{\mathbb{R}^n}_{T(M)}$ is added to indicate how the structure group interacts with the
total space of the fibre bundle.


65.9.4 REMARK:  Differentiable structure on the total space of a tangent fibre bundle. 
As stated in Theorem 54.5.28, the pair $(T(M), A_{T(M)})$ in Definition 54.5.30 is a $C^k$ differentiable manifold
if $(M, A_M)$ is a $C^{k+1}$ differentiable manifold for $k \in \mathbb{Z}_0^+$.


65.9.5 THEOREM: The tangent fibre bundle is a differentiable vector bundle. 

Let $M -\!< (M, A_M)$ be a $C^{k+1}$ differentiable manifold for some $k \in \mathbb{Z}_0^+$. Define the tuple $T(M) -\!<
(T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)}) -\!< (T(M), A_{T(M)}, \pi, M, A_M, A^{\mathbb{R}^n}_{T(M)})$ as in Definition 54.5.30. Then $T(M)$ is a $C^k$
vector bundle.


PROOF: By Theorem 54.5.28 (iii), $T(M) -\!< (T(M), A_{T(M)})$ is a $C^k$ manifold. Define $\pi : T(M) \to M$ by
$\pi(V) = p$ for all $V \in T_p(M)$. Then $\pi$ is a $C^k$ map because by Theorem 52.1.11, only the chart $\Psi(\psi)$ in
Notation 54.5.21 needs to be tested for each chart $\psi \in A_M$. Thus only the map $\tilde{\pi} = \psi \circ \pi \circ \Psi(\psi)^{-1}$
needs to be shown to be $C^k$ for $\psi \in A_M$. But $\tilde{\pi} : \mathrm{Range}(\psi) \times \mathbb{R}^n \to \mathbb{R}^n$ satisfies $\tilde{\pi} : (x, v) \mapsto x$ for all
$(x, v) \in \mathrm{Range}(\psi) \times \mathbb{R}^n$, where $n = \dim(M)$, which is a $C^\infty$ map. Therefore Definition 64.8.3 (i) is satisfied.
For $\psi \in A_M$, let $\phi = \Phi(\psi)$ as in Notation 54.5.7. Then $\mathrm{Dom}(\phi) = \pi^{-1}(U)$ with $U = \pi(\mathrm{Dom}(\phi)) \in \mathrm{Top}(M)$,
and $\phi : \pi^{-1}(U) \to \mathbb{R}^n$ is a $C^k$ map by Definition 52.1.2 because $\phi \circ \Psi(\psi)^{-1} : (x, v) \mapsto v$ is a $C^\infty$ map.
(The single chart for any open subset of $\mathbb{R}^n$, regarded as a manifold, is the identity map.) The map
$\pi \times \phi : \pi^{-1}(U) \to U \times \mathbb{R}^n$ is then a $C^k$ diffeomorphism because it is a $C^k$ map and the inverse map
$(\pi \times \phi)^{-1} : U \times \mathbb{R}^n \to \pi^{-1}(U)$ is of class $C^k$ since it maps $(p, v)$ to $t_{p,v,\psi}$, and through the charts, this is the
map from $(x, v)$ to $(x, v)$. Thus Definition 64.8.3 (ii) is satisfied.


Definition 64.8.3 (iii) is satisfied because $\bigcup_{\phi \in A^{\mathbb{R}^n}_{T(M)}} \pi(\mathrm{Dom}(\phi)) = \bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$.


Let $\phi_1, \phi_2 \in A^{\mathbb{R}^n}_{T(M)}$. Then $\phi_1 = \Phi(\psi_1)$ and $\phi_2 = \Phi(\psi_2)$ for some $\psi_1, \psi_2 \in A_M$. Let $p \in \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)$.
Then $\phi_2 \circ \phi_1|_{T_p(M)}^{-1}(v_1) = J_{21}(p) v_1 \in \mathbb{R}^n$ is a linear function of $v_1 \in \mathbb{R}^n$ by Theorem 54.1.11. (See
Definition 51.4.18 for $J_{21}$.) Therefore $\phi_2 \circ \phi_1|_{T_p(M)}^{-1} = J_{21}(p) \in \mathrm{GL}(n)$. Thus Definition 64.8.3 (iv) is
satisfied.

Definition 64.8.3 (v) is satisfied because the matrix $J_{21}(p)$ is a $C^k$ differentiable function of $p$ as shown in the proof
of Theorem 54.5.28 (i). Hence $T(M)$ is a $C^k$ vector bundle by Definitions 64.8.3 and 65.1.3.
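
The role of the Jacobian matrix as the fibre chart transition map can be checked numerically in a simple case. The following Python sketch assumes (for illustration only) the manifold M = R^2 with the origin and negative x-axis removed, a Cartesian chart as the first chart and a polar chart as the second. The function names are hypothetical. The Jacobian of the chart transition map is compared with a finite-difference derivative along a straight line, and its determinant is confirmed to be non-zero.

```python
import math

def to_polar(x, y):
    """Chart transition map from Cartesian to polar coordinates."""
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian_21(x, y):
    """Jacobian matrix of to_polar at the Cartesian point (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))

def apply(J, v):
    """Matrix-vector product for a 2x2 matrix."""
    return tuple(sum(J[i][j] * v[j] for j in range(2)) for i in range(2))

def det(J):
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]

p = (1.0, 2.0)    # base point in Cartesian coordinates
v1 = (0.3, -0.7)  # tangent vector components for the Cartesian chart
J = jacobian_21(*p)
v2 = apply(J, v1)  # tangent vector components for the polar chart

# Central finite difference of the polar coordinates along t -> p + t*v1.
h = 1e-6
plus = to_polar(p[0] + h * v1[0], p[1] + h * v1[1])
minus = to_polar(p[0] - h * v1[0], p[1] - h * v1[1])
fd = tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

print(all(abs(a - b) < 1e-6 for a, b in zip(v2, fd)))  # components agree
print(det(J) != 0)                                     # J lies in GL(2)
```

The components transform linearly in v1 for fixed p, and the matrix is invertible at each point, which is the content of conditions (iv) and (v) in this special case.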
Chapter 66


DIFFERENTIABLE PRINCIPAL BUNDLES


66.1 Differentiable principal fibre bundles
66.2 The differentiable principal bundle right action map
66.3 Identity cross-section and identity chart differentials
66.4 Left group action maps on principal bundles
66.5 Infinitesimal group actions on principal bundles
66.6 Fundamental vertical vector fields on principal bundles
66.7 Associated differentiable fibre bundles
66.8 Differentiable short-cut orbit-space associated cross-sections

66.1. Differentiable principal fibre bundles 


66.1.1 REMARK: The usefulness of differentiable principal fibre bundles. 

Differentiable principal fibre bundles are the customary structure on which to define a connection. The 
reason for this is that parallel transport for all associated fibre bundles of a given PFB may be defined in 
terms of such a connection. However, this advantage of PFBs relative to ordinary fibre bundles is partly 
illusory. Connections may be defined on any OFB, and parallel transport is then defined on associated fibre 
bundles by copying the fibre chart transition maps. This all becomes clearer in Chapters 67, 68 and 69. 


A principal fibre bundle with structure group G is nothing more or less than a (G, G) ordinary fibre bundle. 
Definition 66.1.2 intentionally does not repeat the conditions in Definition 64.8.3 for a differentiable ordinary 
fibre bundle so as to make the "political point" that a principal bundle is merely a special case of an ordinary 
fibre bundle. Unfortunately this has the inconvenience that one must consult the OFB definition to find all 
of the conditions for any applications. 


66.1.2 DEFINITION: A $C^k$ (differentiable) principal (fibre) bundle with structure group $G$, for $k \in \mathbb{Z}_0^+$ and
a Lie group $G -\!< (G, A_G, \sigma_G)$, is a $C^k$ $(G, G)$ fibre bundle $(P, \pi, M, A_P^G) -\!< (P, A_P, \pi, M, A_M, A_P^G)$, where
$(G, G) -\!< (G, A_G, G, A_G, \sigma_G, \sigma_G)$ is the Lie transformation group $G$ acting on itself.

Alternative name: $C^k$ (differentiable) (principal) $G$-bundle.


66.1.3 REMARK: A differentiable PFB has underlying topological and non-topological principal bundles. 

Theorem 66.1.4 (i) asserts that the differentiable principal bundle structure in Definition 66.1.2 is effectively
an extension of the corresponding structure for topological principal bundles in Definition 47.8.3. In other
words, the underlying topological principal bundle can be recovered from the differentiable principal bundle
structure. Hence all theorems, definitions and notations for topological principal bundles are automatically
applicable to differentiable principal bundles.

Likewise, Theorem 66.1.4 (ii) asserts that there is an underlying non-topological principal bundle structure
(as in Definition 21.9.4) which can be recovered from any differentiable principal bundle.

66.1.4 THEOREM: Differentiable principal bundles are derived from topological principal bundles. 
Let $k \in \mathbb{Z}_0^+$. Let $(P, A_P, \pi, M, A_M, A_P^G)$ be a $C^k$ principal bundle with structure group $(G, A_G, \sigma_G)$.


(i) $(P, T_P, \pi, M, T_M, A_P^G)$ is a topological principal bundle with structure group $(G, T_G, \sigma_G)$, where $T_P$, $T_M$
and $T_G$ are the topologies induced by atlases $A_P$, $A_M$ and $A_G$ on $P$, $M$ and $G$ respectively.
(ii) $(P, \pi, M, A_P^G)$ is a non-topological principal bundle with structure group $(G, \sigma_G)$.


PROOF: Part (i) follows from Theorem 64.8.5 (i) and Definition 66.1.2.
Part (ii) follows from Theorem 64.8.5 (ii) and Definition 66.1.2.


66.1.5 REMARK: Comparison of topological, differentiable and analytic principal bundles. 
The requirements for principal bundle spaces and maps are summarised in Table 66.1.1. Topological principal 
bundles are defined in Section 47.8. Non-topological principal bundles are defined in Section 21.9. 


component        symbol                        non-topological    topological         $C^k$ principal   analytic
                                               principal bundle   principal bundle    bundle            principal bundle
total space      $P$                           set                topological space   $C^k$ manifold    analytic manifold
base space       $M$                           set                topological space   $C^k$ manifold    analytic manifold
structure group  $G$                           group              topological group   Lie group         Lie group
projection map   $\pi : P \to M$               surjection         continuous          $C^k$             analytic
fibre charts     $\phi : P \mathrel{\mathring{\to}} G$   function           continuous          $C^k$             analytic
group operation  $\sigma : G \times G \to G$   bijection          continuous          analytic          analytic
right action     $\mu : P \times G \to P$                         continuous          $C^k$             analytic


Table 66.1.1 Summary of requirements for principal fibre bundle spaces and maps


66.1.6 EXAMPLE: The trivial principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$.

The principal bundle in Example 66.1.6 is associated with the ordinary fibre bundle in Example 64.8.8.
The trivial principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$ is useful for deriving Maxwell's equations from gauge theory. The
additive group $(\mathbb{R}, +)$ is different to the multiplicative matrix group $\mathrm{GL}(1, \mathbb{R})$. However, the Lie algebra of
$(\mathbb{R}, +)$ is essentially the same as the Lie algebra of $U(1) = U(1, \mathbb{C})$.

Let $M = \mathbb{R}^4$, $G = \mathbb{R}$ and $P = M \times G$, together with the $C^\infty$ manifold atlases which each contain only the
identity map as a chart. Thus the atlases are $A_M = \{\mathrm{id}_M\}$, $A_G = \{\mathrm{id}_G\}$, and $A_P = \{\mathrm{id}_M \times \mathrm{id}_G\} = \{\mathrm{id}_P\}$.
The identity of $G$ with the usual real-number addition operation is $e = 0_G = 0$. So $T_e(G) = \{t_{0,v,i};\ v \in \mathbb{R}\}$,
where $i = \mathrm{id}_G$. Then $T_e(G)$ may be identified with $\mathbb{R}$ via the map $t_{0,v,i} \mapsto v$. Define the action of $G$ on $G$
to be the additive translation map $\mu : G \times G \to G$ with $\mu : (g, f) \mapsto g + f$. (See Definition 20.1.2.)

By Definitions 66.1.2 and 64.8.3, $P -\!< (P, \pi, M, A_P^G) -\!< (P, A_P, \pi, M, A_M, A_P^G)$ is a $C^\infty$ principal bundle with
projection map $\pi : P \to M$, fibre chart $\phi : P \to G$ and fibre atlas $A_P^G = \{\phi\}$, where $\pi : (p, g) \mapsto p$ and
$\phi : (p, g) \mapsto g$ for $(p, g) \in P = M \times G$.


(See Example 66.2.19 for a continuation of Example 66.1.6.) 


66.2. The differentiable principal bundle right action map 


66.2.1 REMARK: The right action map is an automatic property of a principal bundle. 

It is always possible to define a right action $\mu^P_G : P \times G \to P$ by group elements in $G$ acting on the total space
$P$ of a principal bundle. This right action adds no new information because it is defined in terms of the other
components of the definition. (See Definition 47.8.7 for topological principal bundles. See Definition 21.11.4
for non-topological principal bundles.)

In Section 66.2, the notations $L^G_g$ and $R^G_g$ are shorthand for the left and right actions of group elements
respectively. So for $g$ in a group $G$, $L^G_g : g' \mapsto gg'$ and $R^G_g : g' \mapsto g'g$.


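66.2.2 DEFINITION: The right action (map) for a $C^0$ principal $G$-bundle $(P, \pi, M, A_P^G)$ is the operation
$\mu^P_G : P \times G \to P$ which is defined by $\mu^P_G(z, g) = \phi|_{P_{\pi(z)}}^{-1}(\sigma_G(\phi(z), g))$ for $(z, g) \in P \times G$ for any $\phi \in A_P^G$ with
$z \in \mathrm{Dom}(\phi)$. In other words,


$\forall z \in P,\ \forall g \in G,\ \forall \phi \in A^G_{P,\pi(z)},\quad \mu^P_G(z, g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g).$ (66.2.1)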
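
The formula on line (66.2.1) can be tried out on the trivial bundle of Example 66.1.6. The following Python sketch models P = M x G with M = R^4 and G = (R, +), using pairs (p, g) for points of P. The function names are hypothetical and chosen only for this illustration. The three assertions at the end correspond to the associativity, projection and fibre chart properties which appear as parts (ii), (iii) and (iv) of Theorem 66.2.12.

```python
# Hypothetical model of the trivial principal (R, +)-bundle on R^4.
# A point of P = M x G is a pair (p, g) with p a 4-tuple and g a real number.

def proj(z):
    """Projection map pi : P -> M."""
    p, _ = z
    return p

def phi(z):
    """Fibre chart phi : P -> G."""
    _, g = z
    return g

def chart_inverse(p, g):
    """Inverse of the map pi x phi : P -> M x G."""
    return (p, g)

def sigma(g1, g2):
    """Group operation of G = (R, +)."""
    return g1 + g2

def right_action(z, g):
    """mu(z, g) = (pi x phi)^{-1}(pi(z), phi(z) g), following line (66.2.1)."""
    return chart_inverse(proj(z), sigma(phi(z), g))

z = ((0.0, 1.0, 2.0, 3.0), 5.0)
g1, g2 = 2.0, -0.5

assert proj(right_action(z, g1)) == proj(z)          # pi(zg) = pi(z)
assert phi(right_action(z, g1)) == sigma(phi(z), g1)  # phi(zg) = phi(z)g
assert right_action(z, sigma(g1, g2)) == right_action(right_action(z, g1), g2)
print(right_action(z, g1))
```

In this model the right action is simply translation in the fibre coordinate, which leaves the base point unchanged.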


66.2.3 NOTATION: $zg$, for $z \in P$ and $g \in G$, for a $C^0$ principal $G$-bundle $(P, \pi, M, A_P^G)$, denotes $\mu^P_G(z, g)$.

Figure 66.2.1 Principal fibre bundle with $\mathrm{Dom}(\phi) = \pi^{-1}(U)$


66.2.4 REMARK: Diagram of definition of differentiable principal fibre bundles. 
Definitions 66.1.2 and 66.2.2 are illustrated in Figure 66.2.1. 


Figure 66.2.1 differs from Figure 64.8.1 in some names of maps and spaces, but the main difference is the 
addition of the right action map R? : P — P for g € G, which cannot be defined (in general) in a fibre- 
chart-independent manner for ordinary fibre bundles. The charts Ya, and vg,» for the passive and active 
copies of the group G respectively are not necessarily the same. 


66.2.5 THEOREM: Fibre-chart-independence of the right action map on a principal fibre bundle. 
The right action map for a C? principal fibre bundle is chart-independent. 


PROOF: The assertion follows as for topological principal fibre bundles in Theorem 47.8.10.
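
Chart-independence of the right action can be seen in a small non-abelian example. The following Python sketch is an illustration under stated assumptions, not the construction used in the text: the group G is taken to be the affine maps x -> ax + b with a non-zero, stored as pairs (a, b); the total space is P = Q x G with rational base points; and the second fibre chart differs from the first by a left translation by a hypothetical transition function h(p). Because left and right translations commute, the right action computed through either chart is the same, whereas the analogous left action depends on the chart.

```python
from fractions import Fraction as Fr

def mul(g, k):
    """Composition of affine maps: (a1, b1)(a2, b2) = (a1*a2, a1*b2 + b1)."""
    return (g[0] * k[0], g[0] * k[1] + g[1])

def inv(g):
    """Inverse of the affine map x -> a*x + b."""
    a, b = g
    return (1 / a, -b / a)

def h(p):
    """Hypothetical transition function M -> G between the two fibre charts."""
    return (1 + p * p, p)

# First fibre chart and its fibrewise inverse.
def phi1(z):
    return z[1]

def phi1_inv(p, f):
    return (p, f)

# Second fibre chart: phi2(z) = h(pi(z)) * phi1(z).
def phi2(z):
    p, g = z
    return mul(h(p), g)

def phi2_inv(p, f):
    return (p, mul(inv(h(p)), f))

def right_action(z, g, phi, phi_inv):
    """z g computed through a chosen fibre chart."""
    return phi_inv(z[0], mul(phi(z), g))

def left_action(g, z, phi, phi_inv):
    """Attempted left action g z computed through a chosen fibre chart."""
    return phi_inv(z[0], mul(g, phi(z)))

z = (Fr(3), (Fr(2), Fr(-1)))
g = (Fr(1, 2), Fr(7))

same_right = right_action(z, g, phi1, phi1_inv) == right_action(z, g, phi2, phi2_inv)
same_left = left_action(g, z, phi1, phi1_inv) == left_action(g, z, phi2, phi2_inv)
print(same_right, same_left)
```

The use of exact rational arithmetic avoids rounding effects in the equality tests.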


66.2.6 THEOREM: Lie right transformation group using right action map. 
Let $(P, \pi, M, A_P^G)$ be a $C^k$ principal bundle with structure group $G -\!< (G, \sigma_G)$. Let $\mu^P_G : P \times G \to P$ be its
right action map. Then $(G, P, \sigma_G, \mu^P_G)$ is a $C^k$ Lie right transformation group.


PROOF: By Theorem 47.8.13, $(G, P, \sigma_G, \mu^P_G)$ (using the topologies on $G$ and $P$, but ignoring their atlases)
is a topological right transformation group. This transformation group is effective by Theorem 47.8.16. Also
$G$ is a Lie group by Definition 66.1.2.

To show the $C^k$ differentiability of $\mu^P_G : P \times G \to P$, it is sufficient to show this for each fibre chart $\phi \in A_P^G$
in the definition of $\mu^P_G$ on line (66.2.1) because the right action map is chart-independent by Theorem 66.2.5.
The $C^k$ differentiability of $\pi$, $\phi$ and $(\pi \times \phi)^{-1}$ follows from Definitions 66.1.2 and 64.8.3 (i, ii). The expression
“$\phi(z)g$” on line (66.2.1) is $C^k$ with respect to $\phi(z)$ and $g$ because $\sigma_G$ is $C^k$ by Definitions 66.1.2 and 64.8.3.
Therefore the composition of functions on line (66.2.1) yields a $C^k$ function $\mu^P_G : P \times G \to P$. Hence
$(G, P, \sigma_G, \mu^P_G)$ is a $C^k$ Lie right transformation group by Definition 63.5.3.


66.2.7 DEFINITION: The right transformation group of a $C^k$ principal bundle $(P, \pi, M, A_P^G)$, for $k \in \mathbb{Z}_0^+$,
is the $C^k$ Lie right transformation group $(G, P) -\!< (G, A_G, P, A_P, \sigma_G, \mu^P_G)$.


66.2.8 THEOREM: Free and effective action of right transformation group of a principal bundle. 
The right transformation group of a differentiable principal bundle acts freely on the total space. Hence the 
action is effective if the total space is non-empty. 


PROOF: The assertion follows from Theorem 47.8.16 for topological principal bundles, or alternatively from 
Theorem 21.11.10 for non-topological principal bundles. 
66.2.9 THEOREM: Differentiability of the right action map.
Let $P -\!< (P, \pi, M, A_P^G)$ be a $C^k$ principal $G$-bundle for some $k \in \mathbb{Z}_0^+$. Then the right action map for $P$ is a
$C^k$ map from $P \times G$ to $P$.


PROOF: Let $\phi \in A_P^G$. Let $U = \pi(\mathrm{Dom}(\phi))$. Then $\phi \in C^k(\pi^{-1}(U), G)$ by Definition 64.8.3 (ii). So the
map $(z, g) \mapsto \phi(z)g$ is $C^k$ from $\pi^{-1}(U) \times G$ to $G$ because $\sigma_G : G \times G \to G$ is $C^k$ by Definition 66.1.2.
But $\pi : P \to M$ is also $C^k$ by Definition 66.1.2. So the map $(z, g) \mapsto (\pi(z), \phi(z)g)$ is $C^k$ from $\pi^{-1}(U) \times G$
to $U \times G$. Therefore $\mu^P_G|_{\pi^{-1}(U) \times G}$ is $C^k$ from $\pi^{-1}(U) \times G$ to $\pi^{-1}(U)$ because $\pi \times \phi : \pi^{-1}(U) \to U \times G$ is a $C^k$
diffeomorphism by Definition 64.8.3 (ii). This holds for all $\phi \in A_P^G$. Hence $\mu^P_G \in C^k(P \times G, P)$.


66.2.10 DEFINITION: The (pointwise) right action (map) for a $C^0$ principal $G$-bundle $(P, \pi, M, A_P^G)$ for
$g \in G$ is the operation $R^P_g : P \to P$ which is defined by $R^P_g(z) = \mu^P_G(z, g)$ for all $z \in P$. In other words,


$\forall z \in P,\ \forall g \in G,\ \forall \phi \in A^G_{P,\pi(z)},\quad R^P_g(z) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g).$


66.2.11 REMARK: Alternative characterisation of the right action map. 

Theorem 66.2.12 (v) shows that the explicit construction for the right action map $\mu^P_G$ in Definition 66.2.2
may be replaced by the equivalent characterisation of the formulas $\pi(zg) = \pi(z)$ and $\phi(zg) = \phi(z)g$ for
$z \in P$, $g \in G$ and $\phi \in A_P^G$.


66.2.12 THEOREM: Some basic properties of the right action map of a differentiable principal bundle. 

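Let $(P,\pi,M,A_P^G)$ be a $C^0$ principal $G$-bundle, where $G \mathrel{-\!<} (G, A_G, \sigma_G)$.

(i) $\forall g \in G$, $\mathrm{Dom}(R^P_g) = P$.

(ii) $\forall z \in P$, $\forall g_1, g_2 \in G$, $z(g_1 g_2) = (z g_1) g_2$.
(That is, $\forall z \in P$, $\forall g_1, g_2 \in G$, $R^P_{g_1 g_2}(z) = R^P_{g_2}(R^P_{g_1}(z))$. Thus $\forall g_1, g_2 \in G$, $R^P_{g_1 g_2} = R^P_{g_2} \circ R^P_{g_1}$.)

(iii) $\forall z \in P$, $\forall g \in G$, $\pi(zg) = \pi(z)$. (That is, $\pi(\mu_P^G(z,g)) = \pi(z)$.)

(iv) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\forall g \in G$, $\phi(zg) = \phi(z)g$. (That is, $\phi(\mu_P^G(z,g)) = \sigma_G(\phi(z),g)$.)

(v) Let $\mu : P \times G \to P$ be a function which satisfies (iii) $\forall z \in P$, $\forall g \in G$, $\pi(\mu(z,g)) = \pi(z)$ and (iv) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\forall g \in G$, $\phi(\mu(z,g)) = \sigma_G(\phi(z),g)$. Then $\mu = \mu_P^G$. Hence $\mu_P^G$ is the unique function from $P \times G$ to $P$ which satisfies parts (iii) and (iv).

(vi) $\forall g \in G$, $\pi \circ R^P_g = \pi$.

(vii) $\forall \phi \in A_P^G$, $\forall g \in G$, $\phi \circ R^P_g = R^G_g \circ \phi$.

(viii) $\forall p \in M$, $\forall z \in P_p$, $P_p = \{zg;\ g \in G\}$.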


PROOF: Part (i) follows from Theorems 21.11.7 (i) and 64.8.5 (ii).
Part (ii) follows from Theorems 21.11.7 (ii) and 64.8.5 (ii). 


( 

Part (iv) follows from Theorems 21.11.7 (vi) and 64.8.5 (ii).

Part (v) follows from Theorems 21.11.7 (vii) and 64.8.5 (ii).

Part (vi) follows from Theorems 21.11.7 (viii) and 64.8.5 (ii).

Part (vii) follows from Theorems 21.11.7 (ix) and 64.8.5 (ii).

Part (viii) follows from Theorems 21.11.7 (x) and 64.8.5 (ii).


66.2.13 REMARK: Some basic functional properties of pointwise right action maps. 
Theorem 66.2.14 gives some basic functional properties for right action maps $R^P_g$ on $C^0$ principal bundles.
Some related functional properties for their differentials $(dR^P_g)_z$ are given in Theorem 66.2.18.


66.2.14 THEOREM: Some basic properties of the pointwise right action map on a principal bundle. 
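Let $(P,\pi,M,A_P^G)$ be a $C^0$ principal bundle.

(i) $\forall g \in G$, $(R^P_g)^{-1} = R^P_{g^{-1}}$. In other words, $\forall g \in G$, $R^P_{g^{-1}} \circ R^P_g = \mathrm{id}_P = R^P_g \circ R^P_{g^{-1}}$.

(ii) $R^P_g : P \to P$ is a bijection for all $g \in G$.

(iii) $\forall g \in G$, $\forall p \in M$, $(R^P_g|_{P_p})^{-1} = R^P_{g^{-1}}|_{P_p}$.
In other words, $\forall g \in G$, $\forall p \in M$, $R^P_{g^{-1}}|_{P_p} \circ R^P_g|_{P_p} = \mathrm{id}_{P_p} = R^P_g|_{P_p} \circ R^P_{g^{-1}}|_{P_p}$.

(iv) $R^P_g|_{P_p} : P_p \to P_p$ is a bijection for all $g \in G$ and $p \in M$.

(v) $\forall g \in G$, $\forall p \in M$, $\forall \phi \in A^G_{P,p}$, $R^P_g|_{P_p} = \phi|_{P_p}^{-1} \circ R^G_g \circ \phi|_{P_p}$.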


PROOF: For part (i), let $g \in G$ and $z \in P$. Then $R^P_g(R^P_{g^{-1}}(z)) = (zg^{-1})g = z$ by Theorem 66.2.12 (ii). So $R^P_g \circ R^P_{g^{-1}} = \mathrm{id}_P$. Similarly, $R^P_{g^{-1}} \circ R^P_g = \mathrm{id}_P$. Hence $(R^P_g)^{-1} = R^P_{g^{-1}}$ by Theorem 10.5.14 (i).

Part (ii) follows from part (i) and Theorem 10.5.14 (iv).

For part (iii), let $g \in G$, $p \in M$ and $z \in P_p = \pi^{-1}(\{p\})$. Then $\pi(R^P_g(z)) = \pi(z) = p$ by Theorem 66.2.12 (vi). So $\mathrm{Range}(R^P_g|_{P_p}) \subseteq P_p$. Similarly, $\mathrm{Range}(R^P_{g^{-1}}|_{P_p}) \subseteq P_p$. So $R^P_{g^{-1}}|_{P_p} \circ R^P_g|_{P_p} : P_p \to P_p$ is a valid function by Definition 10.4.17. Let $z \in P_p$. Then $R^P_g|_{P_p}(z) = R^P_g(z) \in P_p$. So $(R^P_{g^{-1}}|_{P_p} \circ R^P_g|_{P_p})(z) = R^P_{g^{-1}}(R^P_g(z)) = z$ by Theorem 66.2.12 (ii). Thus $R^P_{g^{-1}}|_{P_p} \circ R^P_g|_{P_p} = \mathrm{id}_{P_p}$. Similarly $R^P_g|_{P_p} \circ R^P_{g^{-1}}|_{P_p} = \mathrm{id}_{P_p}$. Hence $(R^P_g|_{P_p})^{-1} = R^P_{g^{-1}}|_{P_p}$ by Theorem 10.5.14 (i).

Part (iv) follows from Theorem 10.5.14 (iv) because $R^P_g|_{P_p}$ has an inverse by part (iii).

Part (v) follows from Theorem 66.2.12 (vii) by applying $\phi|_{P_p}^{-1}$ to both sides.


66.2.15 REMARK:  Diffeomorphisms on fibre sets use the submanifold’s differentiable structure. 

The per-fibre-set diffeomorphism in Theorem 66.2.16 part (ii) appears to be a simple restriction of the
diffeomorphism in part (i). However, the differentiable structure and tangent bundle on each fibre set $P_p$ are
not simple restrictions of the structures on the total space $P$. Each fibre set $P_p$ has a regular $C^k$ submanifold
structure as described in Theorem 52.7.5 and Definition 52.4.2. It is necessary to be careful when defining
differentials of maps using a submanifold differentiable structure because the tangent vectors belong to the
submanifold’s tangent bundle, not the ambient space’s tangent bundle.


66.2.16 THEOREM: Right action maps are diffeomorphisms on the principal bundle total space. 
Let $k \in \bar{\mathbb{Z}}^+$. Let $P \mathrel{-\!<} (P,\pi,M,A_P^G)$ be a $C^k$ principal $G$-bundle.

(i) $\forall g \in G$, $R^P_g : P \to P$ is a $C^k$ diffeomorphism.
(ii) $\forall g \in G$, $\forall p \in M$, $R^P_g|_{P_p} : P_p \to P_p$ is a $C^k$ diffeomorphism.

PROOF: Part (i) follows from Theorems 66.2.9 and 52.6.8 (i) applied to $R^P_g$ and $(R^P_g)^{-1} = R^P_{g^{-1}}$.
Part (ii) follows from Theorems 66.2.14 (v), 64.11.6 (iii) and 62.2.6 (ii) applied to the restricted maps $R^P_g|_{P_p}$ and $(R^P_g|_{P_p})^{-1} = R^P_{g^{-1}}|_{P_p}$.


66.2.17 REMARK: The differential of the right action map on a principal bundle. 

In the theory of connections on differentiable principal bundles, it is the differential of the right action map
which is important. Theorem 66.2.18 (v) shows that the differential $(R^P_g)_* = (dR^P_g)_z$ of $R^P_g$ at $z \in P$ maps
vertical vectors to vertical vectors. This is a consequence ultimately of Definition 64.8.3 (ii), which ensures
that the action by the structure group via the charts in Definition 66.2.2 transforms elements of a fibre set
to elements of the same fibre set.


66.2.18 THEOREM: Some properties of the differential of the right action map on a principal bundle. 
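Let $(P,\pi,M,A_P^G)$ be a $C^1$ principal bundle.

(i) $\forall g \in G$, $\forall z \in P$, $\forall y \in T_z(P)$, $(R^P_g)_*(y) \in T_{zg}(P)$.
In other words, $\forall g \in G$, $\forall z \in P$, $\mathrm{Range}((dR^P_g)_z) \subseteq T_{zg}(P)$.

(ii) $\forall g \in G$, $\forall z \in P$, $(dR^P_{g^{-1}})_{zg} \circ (dR^P_g)_z = \mathrm{id}_{T_z(P)}$ and
$\forall g \in G$, $\forall z \in P$, $(dR^P_g)_z \circ (dR^P_{g^{-1}})_{zg} = \mathrm{id}_{T_{zg}(P)}$.
Hence $\forall g \in G$, $\forall z \in P$, $((dR^P_g)_z)^{-1} = (dR^P_{g^{-1}})_{zg}$.

(iii) $(dR^P_g)_z : T_z(P) \to T_{zg}(P)$ is a bijection for all $g \in G$ and $z \in P$.

(iv) $\forall g \in G$, $\forall z \in P$, $(d\pi)_{zg} \circ (dR^P_g)_z = (d\pi)_z$.

(v) $\forall g \in G$, $\forall z \in P$, $\forall y \in T_{z,0}(P)$, $(R^P_g)_*(y) \in T_{zg,0}(P)$.
In other words, $\forall g \in G$, $\forall z \in P$, $\mathrm{Range}((dR^P_g)_z|_{T_{z,0}(P)}) \subseteq T_{zg,0}(P)$.

(vi) $\forall g \in G$, $\forall z \in P$, $(dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)} \circ (dR^P_g)_z|_{T_{z,0}(P)} = \mathrm{id}_{T_{z,0}(P)}$ and
$\forall g \in G$, $\forall z \in P$, $(dR^P_g)_z|_{T_{z,0}(P)} \circ (dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)} = \mathrm{id}_{T_{zg,0}(P)}$.
Hence $\forall g \in G$, $\forall z \in P$, $((dR^P_g)_z|_{T_{z,0}(P)})^{-1} = (dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)}$.

(vii) $(dR^P_g)_z|_{T_{z,0}(P)} : T_{z,0}(P) \to T_{zg,0}(P)$ is a bijection for all $g \in G$ and $z \in P$.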


PROOF: Part (i) follows from Theorem 58.9.6.

For part (ii), let $g \in G$ and $z \in P$. Then $R^P_{g^{-1}} \circ R^P_g = \mathrm{id}_P$ by Theorem 66.2.14 (i). So by the chain rule, Theorem 58.4.13, $(dR^P_{g^{-1}})_{zg} \circ (dR^P_g)_z = (d\,\mathrm{id}_P)_z = \mathrm{id}_{T_z(P)}$. Similarly, $(dR^P_g)_z \circ (dR^P_{g^{-1}})_{zg} = \mathrm{id}_{T_{zg}(P)}$. Hence $((dR^P_g)_z)^{-1} = (dR^P_{g^{-1}})_{zg}$ by Theorem 10.5.14 (i).

Part (iii) follows from part (ii) and Theorem 10.5.14 (iv).

For part (iv), let $g \in G$ and $z \in P$. Then $(d\pi)_{zg} \circ (dR^P_g)_z = (d(\pi \circ R^P_g))_z = (d\pi)_z$ by Theorem 66.2.12 (vi) and Theorem 58.4.13, the chain rule for differentials of $C^1$ maps. (See Theorem 41.7.4.)

For part (v), let $g \in G$, $z \in P$ and $y \in T_{z,0}(P)$. Then $(R^P_g)_*(y) \in T_{zg}(P)$ by part (i). But $(d\pi)_z(y) = 0$ by Notation 64.5.6. Therefore $\pi_*((R^P_g)_*(y)) = (d\pi)_{zg}((dR^P_g)_z(y)) = (d\pi)_z(y) = 0$ by part (iv). Hence $(R^P_g)_*(y) \in T_{zg,0}(P)$ by Notation 64.5.6.

For part (vi), let $g \in G$, $z \in P$ and $y \in T_{z,0}(P)$. Then part (v) implies $\mathrm{Range}((dR^P_g)_z|_{T_{z,0}(P)}) \subseteq T_{zg,0}(P)$. Therefore $(dR^P_g)_z(y) \in T_{zg,0}(P)$. Similarly, $\mathrm{Range}((dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)}) \subseteq T_{z,0}(P)$ by part (v). Consequently $((dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)} \circ (dR^P_g)_z|_{T_{z,0}(P)})(y)$ is well defined, and is equal to $y$ by part (ii). It then follows that $(dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)} \circ (dR^P_g)_z|_{T_{z,0}(P)} = \mathrm{id}_{T_{z,0}(P)}$. Similarly, $(dR^P_g)_z|_{T_{z,0}(P)} \circ (dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)} = \mathrm{id}_{T_{zg,0}(P)}$. So it follows that $((dR^P_g)_z|_{T_{z,0}(P)})^{-1} = (dR^P_{g^{-1}})_{zg}|_{T_{zg,0}(P)}$ by Theorem 10.5.14 (i).

Part (vii) follows from part (vi) and Theorem 10.5.14 (iv).


66.2.19 EXAMPLE: The right action map on the trivial principal $(\mathbb{R},+)$-bundle on $\mathbb{R}^4$.

This is a continuation of Example 66.1.6 for the trivial principal $(\mathbb{R},+)$-bundle on $\mathbb{R}^4$. The specification
tuple for this principal bundle is $(P,\pi,M,A_P^G)$, where $M = \mathbb{R}^4$, $G = \mathbb{R}$, $P = M \times G = \mathbb{R}^4 \times \mathbb{R} = \mathbb{R}^5$,
$A_P^G = \{\phi\}$, and $\pi : P \to M$ and $\phi : P \to G$ are defined by $\pi : (p,g) \mapsto p$ and $\phi : (p,g) \mapsto g$.

By Definitions 66.2.2 and 66.2.10, the right action map $R^P_g : P \to P$ by $g \in G$ on $P$ satisfies

$$\forall g \in G,\ \forall z = (p,h) \in P,\quad R^P_g(z) = (p, hg) = (p, h + g)$$

because $hg$ denotes $\sigma(h,g) = h + g$, where $\sigma$ is the group operation on $G$, which is real-number addition,
not real-number multiplication.

To evaluate the differential $(dR^P_g)_z : T_z(P) \to T_{zg}(P)$ of $R^P_g : P \to P$ for $g \in G$ and $z \in P$, let $y \in T_z(P)$.
Then $y = \sum_{j=0}^4 c^j t_{z,e_j,\psi} = t_{z,c,\psi}$ for some $c \in \mathbb{R}^5$, where $\psi = \mathrm{id}_P$ and $e_j = (\delta^i_j)_{i=0}^4 \in \mathbb{R}^5 = \mathbb{R}^4 \times \mathbb{R}$ for
$j \in \mathbb{Z}_5 = \{0,1,2,3,4\}$. (Here $j = 0$ represents time, $j \in \{1,2,3\}$ represents space, and $j = 4$ represents the
vertical component of $y$. The vertical component is chart-dependent, as mentioned in Remark 59.3.1.)

By Definition 58.4.5, $(dR^P_g)_z(t_{z,c,\psi}) = t_{zg,\bar{c},\psi}$, where $\bar{c} \in \mathbb{R}^5$ satisfies

$$\forall k \in \mathbb{Z}_5,\quad \bar{c}^k = \sum_{j=0}^4 c^j\, \partial_{x^j} (\mathrm{id}_P \circ R^P_g \circ \mathrm{id}_P^{-1})^k(x) \Big|_{x = \mathrm{id}_P(z)} = \sum_{j=0}^4 c^j\, \partial_{x^j} (x^k + \delta^k_4\, g) = \sum_{j=0}^4 c^j \delta^k_j = c^k.$$

Therefore $(dR^P_g)_z(t_{z,c,\mathrm{id}_P}) = t_{zg,c,\mathrm{id}_P}$ for all $z \in P$, $g \in G$ and $c \in \mathbb{R}^5$. This differential map clearly satisfies
all of the assertions of Theorem 66.2.18.


(See Example 66.2.24 for a continuation of Example 66.2.19.) 
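
The computations in Example 66.2.19 can be checked numerically. The following is only a minimal sketch, assuming that points of $P = \mathbb{R}^4 \times \mathbb{R}$ are stored as 5-tuples of floats; the function names `proj`, `chart`, `right_action` and `jacobian` are hypothetical and are not notation from this book. It tests the composition rule, the fibre-preserving property, the chart formula, and the claim that the component matrix of the differential in the chart $\mathrm{id}_P$ is the identity.

```python
# Illustrative sketch only: P = R^4 x R with the additive group G = R.

def proj(z):
    """Projection pi : (p, h) -> p."""
    return z[:4]

def chart(z):
    """The single chart phi : (p, h) -> h."""
    return z[4]

def right_action(z, g):
    """R_g(z) = (p, h + g); the group operation is real-number addition."""
    return z[:4] + (z[4] + g,)

def jacobian(f, z, eps=1e-6):
    """Central-difference Jacobian of f : R^5 -> R^5 at z (rows k, columns j)."""
    n = len(z)
    cols = []
    for j in range(n):
        zp = tuple(z[i] + (eps if i == j else 0.0) for i in range(n))
        zm = tuple(z[i] - (eps if i == j else 0.0) for i in range(n))
        fp, fm = f(zp), f(zm)
        cols.append([(fp[k] - fm[k]) / (2 * eps) for k in range(n)])
    return [[cols[j][k] for j in range(n)] for k in range(n)]

z = (1.0, 2.0, 3.0, 4.0, 0.5)
g1, g2 = 0.25, -1.5

# z(g1 g2) = (z g1) g2, where "g1 g2" means g1 + g2 here.
assoc_ok = right_action(z, g1 + g2) == right_action(right_action(z, g1), g2)
# pi(zg) = pi(z) and phi(zg) = phi(z) g.
fibre_ok = proj(right_action(z, g1)) == proj(z)
chart_ok = chart(right_action(z, g1)) == chart(z) + g1
# The differential has identity components in the chart id_P.
J = jacobian(lambda x: right_action(x, g1), z)
identity_ok = all(
    abs(J[k][j] - (1.0 if k == j else 0.0)) < 1e-6
    for k in range(5) for j in range(5)
)
print(assoc_ok, fibre_ok, chart_ok, identity_ok)
```

The finite-difference Jacobian is a numerical stand-in for the chart expression in Definition 58.4.5, so agreement is only up to rounding error.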


66.2.20 REMARK: Application of right action maps to vector fields on principal bundles. 

The right transformation group $(G, P)$ in Definition 66.2.7 satisfies the requirements of Definition 63.5.3 for
a Lie right transformation group, and the right action map $R^P_g$ by group elements $g \in G$ acting on elements
of a principal bundle $P$ in Definition 66.2.10 fits Definition 63.5.6 for the “right translation operator for
manifold points”. Therefore the induced map $(R^P_g)_*$ in Theorem 66.2.18 fits Definition 63.5.7 for the “right
translation operator for tangent vectors” in $T(P)$.


Consequently, one naturally wishes to define a “translation operator for vector fields” on $P$ corresponding
to Definition 63.5.7 for abstract Lie right transformation groups. This is done in Definition 66.2.21, which
is applicable to the interpretation of the expression $(R^P_g)_* \circ \lambda_u \circ R^P_{g^{-1}}$ in Theorem 66.6.10 (i) as the right
translation of the vector field $\lambda_u$.

The right translation operator $Y \mapsto (R^P_g)_* \circ Y \circ R^P_{g^{-1}}$ in Definition 66.2.21 has the same form as
the corresponding right translation operator $Y \mapsto (R^G_g)_* \circ Y \circ R^G_{g^{-1}}$ acting on vector fields on Lie groups in
Definition 62.6.3, although they operate on vector fields on different manifolds.


In the principal bundle case, a vector field Y € X(T(P)) is mapped to another vector field on P. In the Lie 
group case, a vector field Y € X(T(G)) is mapped to another vector field on G. Thus Definition 66.2.21 is 
merely a special application of Definition 63.5.7 for general C! Lie right transformation groups to the right 
transformation group in Definition 66.2.7 for C! principal bundles. 


Another difference between these two right translation operators is that for the operator on X(T(P)), the 
translation is effected by a group element g which is external to the passive space P, whereas in the X (T(G)) 
case, g is internal to the passive space G. A closely related difference is the fact that only a right translation 
operator is available in general for differentiable principal bundles, whereas both left and right translation 
operators are available for differentiable groups. 


If, as in Definition 63.5.7, one denotes the operator $Y \mapsto (R^P_g)_* \circ Y \circ R^P_{g^{-1}}$ for $Y \in X(T(P))$ by $R^{P,F}_g$,
then the map $R^{P,F}_g : X(T(P)) \to X(T(P))$ may be regarded as a “right translation operator for vector
fields on principal bundles”. The letter “F” is a hint for the word “field”, which could be confused with
“fibre” or “frame”. A notation such as “$R_g^{X(T(P))}$” would be easier to guess, but is too unwieldy. A suitable


compromise could be * RP ”. 


66.2.21 DEFINITION: The right translation operator for vector fields on a $C^1$ principal bundle $P$ by an
element $g$ of its structure group $G$ is the map $R^{P,F}_g : X(T(P)) \to X(T(P))$ defined by

$$\forall g \in G,\ \forall Y \in X(T(P)),\quad R^{P,F}_g(Y) = (R^P_g)_* \circ Y \circ R^P_{g^{-1}}.$$


66.2.22 REMARK: Validity of definition of right translation operator for vector fields. 
Theorem 66.2.23 (i, ii) verifies that the right translation operator for vector fields on a principal bundle
in Definition 66.2.21 yields a valid cross-section as output. This confirms that composition with $R^P_{g^{-1}}$
on the right is not superfluous. Without this composition, the expression $(R^P_g)_* \circ Y$ would map $z$ to
$(R^P_g)_*(Y(z)) \in T_{zg}(P)$ because $Y(z) \in T_z(P)$. Therefore $(R^P_g)_* \circ Y \notin X(T(P))$ if $g \ne e$ (because $G$ acts
freely on $P$).


66.2.23 THEOREM: The right translation operator maps vector fields to vector fields. 
Let $(P,\pi,M,A_P^G)$ be a $C^1$ principal $G$-bundle.

(i) $\forall g \in G$, $\forall Y \in X(T(P))$, $\forall z \in P$, $R^{P,F}_g(Y)(z) \in T_z(P)$.

(ii) $\forall g \in G$, $\forall Y \in X(T(P))$, $R^{P,F}_g(Y) \in X(T(P))$.

PROOF: For part (i), let $g \in G$, $Y \in X(T(P))$ and $z \in P$. Then $R^{P,F}_g(Y)(z) = (R^P_g)_*(Y(R^P_{g^{-1}}(z)))$ by
Definition 66.2.21. Here $R^P_{g^{-1}}(z) = zg^{-1} \in P$ by Definition 66.2.10, and so $Y(R^P_{g^{-1}}(z)) \in T_{zg^{-1}}(P)$ by
Notation 57.1.5 and Definition 57.1.2 for a vector field. Therefore $(R^P_g)_*(Y(R^P_{g^{-1}}(z))) \in T_{zg^{-1}g}(P) = T_z(P)$
by Theorem 58.9.6 for induced maps and Definition 66.2.10 for $R^P_g$. Thus $R^{P,F}_g(Y)(z) \in T_z(P)$.
Part (ii) follows from part (i) and Definition 57.1.2.


66.2.24 EXAMPLE: Vector field right translation operator on the trivial principal $(\mathbb{R},+)$-bundle on $\mathbb{R}^4$.
This is a continuation of Example 66.2.19 for the principal $(\mathbb{R},+)$-bundle $(P,\pi,M,A_P^G)$ on $M = \mathbb{R}^4$ with
$P = M \times G$, $G = \mathbb{R}$, $A_P^G = \{\phi\}$, $\pi : (p,g) \mapsto p$ and $\phi : (p,g) \mapsto g$.

Let $g \in G$ and $Y \in X(T(P))$. Then $\forall z \in P$, $\exists c \in \mathbb{R}^5$, $Y(z) = t_{z,c,\mathrm{id}_P}$ by Notation 54.1.4. The uniqueness
of the choice of $c \in \mathbb{R}^5$ for each $z \in P$ implies that there exists a unique function $y : P \to \mathbb{R}^5$ such that
$\forall z \in P$, $Y(z) = t_{z,y(z),\mathrm{id}_P}$. (In other words, there is no need for an axiom of choice here!) Then

$$\begin{aligned}
\forall z \in P,\quad R^{P,F}_g(Y)(z) &= ((R^P_g)_* \circ Y \circ R^P_{g^{-1}})(z) \\
&= (R^P_g)_*(Y(R^P_{g^{-1}}(z))) \\
&= (R^P_g)_*(Y(zg^{-1})) \\
&= (R^P_g)_*(t_{zg^{-1},\,y(zg^{-1}),\,\mathrm{id}_P}) \\
&= t_{z,\,y(zg^{-1}),\,\mathrm{id}_P} \qquad (66.2.2) \\
&= t_{z,\,\bar{y}(z),\,\mathrm{id}_P},
\end{aligned}$$

where line (66.2.2) follows from Example 66.2.19, and $\bar{y} : P \to \mathbb{R}^5$ is defined by $\bar{y} : z \mapsto y(zg^{-1})$.

Note that $t_{z,y(zg^{-1}),\mathrm{id}_P}$ is not the same as $Y(zg^{-1})$ because $Y(zg^{-1}) = t_{zg^{-1},y(zg^{-1}),\mathrm{id}_P} \in T_{zg^{-1}}(P)$, whereas
$R^{P,F}_g(Y)(z) = t_{z,y(zg^{-1}),\mathrm{id}_P} \in T_z(P)$. For the special chart $\mathrm{id}_P$ on $P$, the action of $R^{P,F}_g$ on $Y$ outputs the
vector field $z \mapsto t_{z,y(zg^{-1}),\mathrm{id}_P}$, which has the same vector component tuple $y(zg^{-1})$ at $z$ which $Y$ has at $zg^{-1}$.
This corresponds to a shift of the vector field forward by $g$ because the value of $R^{P,F}_g(Y)$ at $zg$ is the same
as the value of $Y$ at $z$.


(See Example 66.5.4 for a continuation of Example 66.2.24.) 
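
The “shift forward by $g$” in Example 66.2.24 can be illustrated with a small sketch. It assumes, as in that example, that a vector field is represented by its component function $y : P \to \mathbb{R}^5$ in the chart $\mathrm{id}_P$, and it uses the fact from Example 66.2.19 that the differential of the right action has identity components in this chart. The names `translate_field` and `y` are hypothetical, and the particular field `y` is made up purely for illustration.

```python
# Illustrative sketch only: components of the right-translated vector field
# on P = R^4 x R with G = (R, +), using the chart id_P.

def right_action(z, g):
    """R_g(z) = (p, h + g)."""
    return z[:4] + (z[4] + g,)

def translate_field(y, g):
    """Component function z -> y(z g^{-1}) of the translated field.

    Since the differential of R_g has identity components in the chart id_P,
    only the base point is shifted; g^{-1} is -g in the additive group.
    """
    return lambda z: y(right_action(z, -g))

def y(z):
    """A made-up component function, chosen to depend on the fibre coordinate."""
    return (1.0, 0.0, 0.0, z[4], z[0] * z[4])

z = (1.0, 2.0, 3.0, 4.0, 0.5)
g = 0.25
shifted = translate_field(y, g)

# The translated field at zg has the components which Y has at z.
print(shifted(right_action(z, g)) == y(z))
# At z itself, the translated field has the components which Y has at z g^{-1}.
print(shifted(z))
```

For a non-abelian structure group, or a chart other than $\mathrm{id}_P$, the differential would no longer have identity components, so this shortcut would not apply.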


66.2.25 REMARK:  Chart-dependent left action. 
The left action analogous to the right action $\mu_P^G$ in Definition 66.2.2 is $L^G_\phi : G \times \pi^{-1}(U_\phi) \to \pi^{-1}(U_\phi)$
defined by $L^G_\phi : (g,z) \mapsto \phi|_{P_{\pi(z)}}^{-1}(\sigma_G(g, \phi(z)))$. This left action is chart-dependent. (By contrast, the left
action $L^P_z : G \to P$ in Definition 66.4.2 is chart-independent.)


66.3. Identity cross-section and identity chart differentials 


66.3.1 REMARK: Simplified formula for a special case of the differential of an inverse trivialisation map. 
Theorem 66.3.2 (ii) gives a simplified formula for the differential of the inverse of a trivialisation map for a 
principal bundle in the special case that this map is evaluated at an identity cross-section value. This is 
useful for some applications to connection generators, such as Theorem 69.7.11, which is applicable to the 
proof of Theorem 69.10.4. 


66.3.2 THEOREM: Differentials of inverse trivialisation maps at identity cross-sections. 
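Let $(P,\pi,M,A_P^G)$ be a $C^1$ principal bundle.

(i) $\forall \phi \in A_P^G$, $\forall p \in \pi(\mathrm{Dom}(\phi))$, $\forall V \in T_p(M)$, $\forall u \in T_e(G)$,
$$(d(\pi \times \phi))_{X_\phi(p)}\bigl((dX_\phi)_p(V) + (d(\phi|_{P_p}))^{-1}_{X_\phi(p)}(u)\bigr) = (V, u).$$

(ii) $\forall \phi \in A_P^G$, $\forall p \in \pi(\mathrm{Dom}(\phi))$, $\forall V \in T_p(M)$, $\forall u \in T_e(G)$,
$$(d(\pi \times \phi))^{-1}_{X_\phi(p)}(V, u) = (dX_\phi)_p(V) + (d(\phi|_{P_p}))^{-1}_{X_\phi(p)}(u).$$

(See Definition 21.10.3 for identity cross-sections $X_\phi : \pi(\mathrm{Dom}(\phi)) \to \mathrm{Dom}(\phi)$ with $X_\phi : p \mapsto (\pi \times \phi)^{-1}(p, e)$.)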
PROOF: For part (i), let $U = \pi(\mathrm{Dom}(\phi))$. Then $(d\pi)_{X_\phi(p)}((dX_\phi)_p(V)) = (d(\pi \circ X_\phi))_p(V) = (d\,\mathrm{id}_U)_p(V) = \mathrm{id}_{T_p(U)}(V) = V$ by the chain rule and Theorem 21.10.5 (ii). Similarly, the chain rule and Theorem 21.10.5 (iii) imply $(d\phi)_{X_\phi(p)}((dX_\phi)_p(V)) = (d(\phi \circ X_\phi))_p(V) = 0_{T_e(G)}$ because $\phi \circ X_\phi$ is the constant map with value $e$, and the differential of a constant map equals the zero vector. Thus $(d(\pi \times \phi))_{X_\phi(p)}((dX_\phi)_p(V)) = (V, 0_{T_e(G)})$.

By Theorem 64.12.12 (ii), $(d(\phi|_{P_p}))^{-1}_{X_\phi(p)}(u) \in T_{X_\phi(p),0}(P)$. (This assumes the tacit application of the submanifold tangent vector embedding map $\eta^P_z : T_z(P_p) \to T_{z,0}(P)$ in Notation 64.12.10.) So it follows from Notation 64.5.6 that $(d\pi)_{X_\phi(p)}\bigl((d(\phi|_{P_p}))^{-1}_{X_\phi(p)}(u)\bigr) = 0_{T_p(M)}$. Also, $(d\phi)_{X_\phi(p)}\bigl((d(\phi|_{P_p}))^{-1}_{X_\phi(p)}(u)\bigr) = u$ by the chain rule. Thus $(d(\pi \times \phi))_{X_\phi(p)}\bigl((d(\phi|_{P_p}))^{-1}_{X_\phi(p)}(u)\bigr) = (0_{T_p(M)}, u)$. Hence

$$\begin{aligned}
(d(\pi \times \phi))_{X_\phi(p)}\bigl((dX_\phi)_p(V) + (d(\phi|_{P_p}))^{-1}_{X_\phi(p)}(u)\bigr) &= (V, 0_{T_e(G)}) + (0_{T_p(M)}, u) \\
&= (V, u).
\end{aligned}$$

Part (ii) follows from part (i).


((2016-12-22. Give topological versions of Definition 21.6.6 for constant cross-sections of fibrations, Defini- 
tion 21.6.8 for constant cross-section extensions of fibrations, Definition 21.10.3 for identity cross-sections of 
principal bundles, and Definition 21.10.7 for identity charts for principal bundle cross-sections. )) 


66.3.3 REMARK: Formula for differential of identity chart transition maps. 
For the purpose of obtaining a “gauge transformation” formula in Theorem 69.12.5, it is useful to determine 
formulas for the differential of the identity chart transition map which appears in Definition 21.10.11. 


Theorem 66.3.4 computes first the identity chart transition map $\phi_{XY}$ for elements of the total space. Then it
computes the transition map $\psi_{XY} = \phi_{XY} \circ \pi^{-1}$ for elements of $M$. This is well defined by Theorem 21.10.10.

The expression $(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (d\phi_{XY})_z$ in Theorem 66.3.4 (ii) may seem to have no particular utility.
However, the differential $(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)}$ is precisely what is required to transport the output from the
differential $(d\phi_{XY})_z$ back to the group identity $e$. In other words, the output from this expression lies
in $T_e(G)$, which means that it is a Lie algebra element. Theorem 66.3.4 parts (iv) and (v) are purely
utilitarian computations for the benefit of Theorems 69.12.5 and 69.12.7.


66.3.4 THEOREM: Basic properties of transition maps between identity charts on principal bundles. 
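Let $(P,\pi,M,A_P^G)$ be a $C^1$ principal $G$-bundle. Let $X$ and $Y$ be $C^1$ cross-sections of $P$. Let
$U = \mathrm{Dom}(X) \cap \mathrm{Dom}(Y)$. Define $\phi_{XY} : \pi^{-1}(U) \to G$ by $\phi_{XY} : z \mapsto \phi_X(z)\phi_Y(z)^{-1}$. Let $\psi_{XY} : U \to G$ be
the identity chart transition map $\psi_{XY} = \phi_{XY} \circ \pi^{-1}$ for $X$ and $Y$, as in Definition 21.10.11.

(i) $\forall z \in \pi^{-1}(U)$, $(d\phi_{XY})_z = (dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} \circ \bigl((d\phi_X)_z - (dL^G_{\phi_{XY}(z)})_{\phi_Y(z)} \circ (d\phi_Y)_z\bigr)$.

(ii) $\forall z \in \pi^{-1}(U)$,
$$\begin{aligned}
(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (d\phi_{XY})_z
&= (dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z \\
&= (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z \\
&= (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ \bigl((dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (d\phi_Y)_z\bigr).
\end{aligned}$$

(iii) $\forall p \in U$, $\forall z \in \pi^{-1}(\{p\})$, $(d\phi_{XY})_z = (d\psi_{XY})_p \circ (d\pi)_z$.

(iv) $\forall p \in U$, $\forall z \in \pi^{-1}(\{p\})$,
$$(dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p \circ (d\pi)_z = (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ \bigl((dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (d\phi_Y)_z\bigr).$$

(v) $\forall p \in U$, $\forall z \in \pi^{-1}(\{p\})$,
$$\mathrm{Adj}(\phi_Y(z)^{-1}) \circ (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p \circ (d\pi)_z = (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z.$$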


PROOF: For part (i), let $z \in \pi^{-1}(U)$. Then $\phi_{XY}(z)\phi_Y(z) = \phi_X(z)$. Thus $(L^G_{\phi_{XY}(z)} \circ \phi_Y)(z) = \phi_X(z)$. One
may also write this as $(R^G_{\phi_Y(z)} \circ \phi_{XY})(z) = \phi_X(z)$. So by the chain rule, Theorem 58.4.13, and the product
rule for $C^1$ maps,

$$(dL^G_{\phi_{XY}(z)})_{\phi_Y(z)} \circ (d\phi_Y)_z + (dR^G_{\phi_Y(z)})_{\phi_{XY}(z)} \circ (d\phi_{XY})_z = (d\phi_X)_z.$$




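Since $(dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} \circ (dR^G_{\phi_Y(z)})_{\phi_{XY}(z)} = (d(R^G_{\phi_Y(z)^{-1}} \circ R^G_{\phi_Y(z)}))_{\phi_{XY}(z)} = (d\,\mathrm{id}_G)_{\phi_{XY}(z)} = \mathrm{id}_{T_{\phi_{XY}(z)}(G)}$,

$$\begin{aligned}
(d\phi_{XY})_z &= (dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} \circ (dR^G_{\phi_Y(z)})_{\phi_{XY}(z)} \circ (d\phi_{XY})_z \\
&= (dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} \circ \bigl((d\phi_X)_z - (dL^G_{\phi_{XY}(z)})_{\phi_Y(z)} \circ (d\phi_Y)_z\bigr).
\end{aligned}$$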


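For part (ii), let $z \in \pi^{-1}(U)$. Then

$$\begin{aligned}
(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} \circ (dL^G_{\phi_{XY}(z)})_{\phi_Y(z)}
&= (d(L^G_{\phi_{XY}(z)^{-1}} \circ R^G_{\phi_Y(z)^{-1}} \circ L^G_{\phi_{XY}(z)}))_{\phi_Y(z)} \\
&= (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)}.
\end{aligned}$$

So $(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (d\phi_{XY})_z = (dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z$
follows from part (i). Since $(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (dR^G_{\phi_Y(z)^{-1}})_{\phi_X(z)} = (d(L^G_{\phi_{XY}(z)^{-1}} \circ R^G_{\phi_Y(z)^{-1}}))_{\phi_X(z)} = (d(R^G_{\phi_Y(z)^{-1}} \circ L^G_{\phi_{XY}(z)^{-1}}))_{\phi_X(z)} = (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)}$, it then also follows that
$(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (d\phi_{XY})_z = (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z$.
Hence $(dL^G_{\phi_{XY}(z)^{-1}})_{\phi_{XY}(z)} \circ (d\phi_{XY})_z = (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ \bigl((dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (d\phi_Y)_z\bigr)$.

For part (iii), let $p \in U$ and $z \in \pi^{-1}(\{p\})$. Then $(d\phi_{XY})_z = (d\psi_{XY})_{\pi(z)} \circ (d\pi)_z = (d\psi_{XY})_p \circ (d\pi)_z$ because
$\phi_{XY} = \psi_{XY} \circ \pi$.

Part (iv) follows from parts (ii) and (iii).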

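For part (v), note that by Theorem 62.10.5 (ii),

$$\begin{aligned}
\mathrm{Adj}(\phi_Y(z)^{-1}) \circ (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)}
&= (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (dR^G_{\phi_Y(z)})_e \circ (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \\
&= (d(L^G_{\phi_Y(z)^{-1}} \circ R^G_{\phi_Y(z)} \circ R^G_{\phi_Y(z)^{-1}}))_{\phi_Y(z)} \\
&= (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)}.
\end{aligned}$$

Therefore $\mathrm{Adj}(\phi_Y(z)^{-1}) \circ (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} = (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} = (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)}$. Hence the assertion follows from part (iv).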
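
As a sanity check on the formulas of Theorem 66.3.4, it may help to specialise them to the abelian structure group $G = (\mathbb{R},+)$. This is only an illustrative sketch under the assumption that every tangent space $T_g(\mathbb{R})$ is identified with $\mathbb{R}$ in the usual way, so that all differentials of left and right translations, and the adjoint map, become the identity map on $\mathbb{R}$.

```latex
% Special case G = (\mathbb{R},+), written additively, so that
% \phi_{XY}(z) = \phi_X(z)\phi_Y(z)^{-1} becomes \phi_X(z) - \phi_Y(z).
% Assumption: T_g(\mathbb{R}) \cong \mathbb{R} for all g, so that
% (dL^G_a)_b = (dR^G_a)_b = \mathrm{Adj}(a) = \mathrm{id}_{\mathbb{R}}.
\begin{align*}
\text{(i), (ii):}\quad & (d\phi_{XY})_z = (d\phi_X)_z - (d\phi_Y)_z, \\
\text{(iii):}\quad & (d\phi_{XY})_z = (d\psi_{XY})_p \circ (d\pi)_z, \\
\text{(iv), (v):}\quad & (d\psi_{XY})_p \circ (d\pi)_z = (d\phi_X)_z - (d\phi_Y)_z .
\end{align*}
```

The first line agrees with differentiating $\phi_{XY} = \phi_X - \phi_Y$ directly, which is what one expects when the group operation is addition.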


(2016-12-21. Give differentiable version of Theorem 21.11.12 for a principal bundle which is constructed from 
a Lie right transformation group. )) 


66.4. Left group action maps on principal bundles 


66.4.1 REMARK: Infinitesimal action of Lie algebra elements on a principal fibre bundle. 
To define connection forms on principal bundles in Section 69.5, it is convenient to define infinitesimal actions 
of Lie algebra elements on the vertical vectors of the bundle’s total space. Such actions, oddly, are most 
simply expressed in terms of the left action of the total space on the structure group as in Definition 66.5.2. 
The left action map $z \mapsto (g \mapsto L^P_z(g))$ in Definition 66.4.2 is the function-transpose of the right action map
$g \mapsto (z \mapsto R^P_g(z))$. (See Remark 10.19.4 for the transposition of function-valued functions.)


Some of the maps and spaces in Definition 66.4.2 and Theorem 66.4.3 are illustrated in Figure 66.4.1. 


Figure 66.4.1 Left action map by a principal bundle element
(Diagram not reproduced. It relates the total space $P$, the structure group $G$ and the base space $M$, with $\pi(z) = \pi(zg)$.)
66.4.2 DEFINITION: The left action (map) by a principal bundle element $z \in P$, for a $C^0$ principal $G$-bundle
$P \mathrel{-\!<} (P,\pi,M,A_P^G)$ is the map $L^P_z : G \to P$ defined by $L^P_z : g \mapsto zg$. In other words,

$$\forall z \in P,\ \forall g \in G,\quad L^P_z(g) = R^P_g(z).$$


66.4.3 THEOREM: Basic properties of the left action map by principal bundle elements. 
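Let $(P,\pi,M,A_P^G)$ be a $C^k$ principal $G$-bundle for some $k \in \bar{\mathbb{Z}}_0^+$.

(i) $\forall z \in P$, $\forall g \in G$, $\forall \phi \in A^G_{P,\pi(z)}$, $L^P_z(g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g)$.

(ii) $L^P_z : G \to P$ is a $C^k$ map for all $z \in P$.

(iii) $\mathrm{Range}(L^P_z) = P_{\pi(z)}$ for all $z \in P$.

(iv) $L^P_z : G \to P_{\pi(z)}$ is a bijection for all $z \in P$.

(v) $\forall z \in P$, $\forall g \in G$, $L^P_z \circ R^G_g = R^P_g \circ L^P_z$.

(vi) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $\phi \circ L^P_z = L^G_{\phi(z)}$.

(vii) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $L^P_z = \phi|_{P_{\pi(z)}}^{-1} \circ L^G_{\phi(z)}$.

(viii) $\forall p \in M$, $\forall \phi \in A^G_{P,p}$, $\phi \circ L^P_{X_\phi(p)} = \mathrm{id}_G$. (See Definition 21.10.3 for the identity cross-section $X_\phi$ for $\phi$.)

(ix) $\forall z \in P$, $(L^P_z)^{-1}(z) = e$. (Note that $\forall z \in P$, $\mathrm{Dom}((L^P_z)^{-1}) = P_{\pi(z)}$.)

(x) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $\forall z' \in P_{\pi(z)}$, $(L^P_z)^{-1}(z') = \phi(z)^{-1}\phi(z')$.
In other words, $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $(L^P_z)^{-1} = L^G_{\phi(z)^{-1}} \circ \phi|_{P_{\pi(z)}}$.
(Note that $(L^P_z)^{-1}$ is the inverse of a map $L^P_z$, whereas $\phi(z)^{-1}$ is the inverse of a group element $\phi(z) \in G$.)

(xi) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $(L^P_z)^{-1} \circ \phi|_{P_{\pi(z)}}^{-1} = L^G_{\phi(z)^{-1}}$.

(xii) $L^P_z : G \to P_{\pi(z)}$ is a $C^k$ diffeomorphism for all $z \in P$, assuming the standard submanifold atlas on $P_{\pi(z)}$
as in Definition 64.11.5.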


PROOF: Part (i) follows from Definitions 66.4.2 and 66.2.2. ($A^G_{P,\pi(z)}$ means $\{\phi \in A_P^G;\ z \in \mathrm{Dom}(\phi)\}$. The
assertion alternatively follows from Theorems 21.11.19 (i) and 66.1.4 (ii).)

For part (ii), let $z \in P$. Let $\phi \in A^G_{P,\pi(z)}$. Then $L^P_z(g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g)$ for all $g \in G$ by part (i).
Since $\pi$, $\phi$ and the group operation $\sigma_G$ are all $C^k$ by Definitions 64.8.3 and 66.1.2, it follows that $L^P_z$ is a $C^k$
map from $G$ to $P$. (Alternatively, apply Theorems 66.2.9 and 52.6.8 (ii).)

Part (iii) follows from Theorems 21.11.19 (ii) and 66.1.4 (ii).

Part (iv) follows from Theorems 21.11.19 (iii) and 66.1.4 (ii).

Part (v) follows from Theorems 21.11.19 (iv) and 66.1.4 (ii).

For part (vi), let $g \in \mathrm{Dom}(L^P_z) = G$. Then $\phi(L^P_z(g)) = \phi(zg) = \phi(z)g = L^G_{\phi(z)}(g)$ by Theorem 66.2.12 (iv).
Hence $\phi \circ L^P_z = L^G_{\phi(z)}$.

Part (vii) follows from parts (vi) and (iii) and the equality $\phi|_{P_{\pi(z)}}^{-1} \circ \phi|_{P_{\pi(z)}} = \mathrm{id}_{P_{\pi(z)}}$.

For part (viii), let $p \in M$ and $\phi \in A^G_{P,p}$. Let $z = X_\phi(p)$. Then $\pi(z) = p$ by Definition 21.10.3. So
$\phi \circ L^P_z = L^G_{\phi(z)}$ by part (vi). But $\phi(z) = e$ by Definition 21.10.3. So $\phi \circ L^P_{X_\phi(p)} = L^G_e = \mathrm{id}_G$.

Part (ix) follows from Theorems 21.11.19 (v) and 66.1.4 (ii).

Part (x) follows from Theorems 21.11.19 (vi) and 66.1.4 (ii).

Part (xi) follows from Theorems 21.11.19 (vii) and 66.1.4 (i).

For part (xii), $\phi|_{P_{\pi(z)}} : P_{\pi(z)} \to G$ is a $C^k$ diffeomorphism by Theorem 64.11.6 (iii), using the fibre-set
submanifold atlas in Definition 64.11.5. So $\phi|_{P_{\pi(z)}}^{-1} : G \to P_{\pi(z)}$ is a $C^k$ diffeomorphism by Definition 52.2.2.
Also, $L^G_{\phi(z)} : G \to G$ is a $C^k$ diffeomorphism by Theorem 62.2.6 (i). Hence $L^P_z = \phi|_{P_{\pi(z)}}^{-1} \circ L^G_{\phi(z)} : G \to P_{\pi(z)}$ is a
$C^k$ diffeomorphism by the chain rule, Theorem 52.1.17.
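
Several of the algebraic parts of Theorem 66.4.3 can be exercised on a toy example with a non-abelian structure group. The sketch below is an illustration only: it assumes a trivial bundle $P = \mathbb{R} \times G$, where $G$ is the group of affine maps $x \mapsto ax + b$ of the real line with $a \ne 0$, stored as pairs `(a, b)` of exact fractions. All function names are hypothetical and are not notation from this book.

```python
# Illustrative sketch only: trivial bundle P = R x G over M = R, where G is the
# (non-abelian) affine group of the line, with exact rational arithmetic.
from fractions import Fraction as F

E = (F(1), F(0))  # identity element of G

def mul(g, h):
    """(a, b)(c, d) = (ac, ad + b): composition of x -> ax + b after x -> cx + d."""
    return (g[0] * h[0], g[0] * h[1] + g[1])

def inv(g):
    """Inverse of x -> ax + b, namely x -> x/a - b/a."""
    return (1 / g[0], -g[1] / g[0])

def proj(z):
    """Projection pi : (p, h) -> p."""
    return z[0]

def phi(z):
    """The single global chart phi : (p, h) -> h."""
    return z[1]

def right_action(z, g):
    """R_g(z) = zg = (p, hg)."""
    return (z[0], mul(z[1], g))

def left_action(z):
    """L_z : G -> P with L_z(g) = zg."""
    return lambda g: right_action(z, g)

def left_action_inverse(z):
    """(L_z)^{-1}(z') = phi(z)^{-1} phi(z'), as in part (x)."""
    return lambda zp: mul(inv(phi(z)), phi(zp))

z = (F(3), (F(2), F(1)))
g = (F(5), F(-1))
h = (F(1, 2), F(4))

non_abelian = mul(g, h) != mul(h, g)
# Part (iii): L_z maps into the fibre set over pi(z).
part_iii = proj(left_action(z)(g)) == proj(z)
# Part (v): L_z o R^G_g = R^P_g o L_z, evaluated at h.
part_v = left_action(z)(mul(h, g)) == right_action(left_action(z)(h), g)
# Part (vi): phi o L_z = L^G_{phi(z)}, evaluated at g.
part_vi = phi(left_action(z)(g)) == mul(phi(z), g)
# Parts (ix) and (x): the inverse formula recovers g from zg, and e from z.
part_ix = left_action_inverse(z)(z) == E
part_x = left_action_inverse(z)(left_action(z)(g)) == g
print(non_abelian, part_iii, part_v, part_vi, part_ix, part_x)
```

Exact fractions are used so that the equalities hold literally rather than only up to rounding error.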
66.4.4 REMARK: The choices for the differentiable structure on fibre sets.

A technical difficulty of Theorem 66.4.5 (vii) is the choice of differentiable structure for fibre sets $P_{\pi(z)}$. One
may use either the manifold structure of the ambient manifold $P$ or the submanifold structure of $P_{\pi(z)}$. The
need to make a choice is inherent in the way differentials of maps are defined, as mentioned in Remark 58.5.5.
The difficulty can be resolved by using the submanifold tangent vector embedding map in Definition 54.6.2
(as is done in Theorem 58.5.8), which effectively permits submanifold tangent vectors to be identified with
ambient space tangent vectors. In this case, the inverse embedding $\bar{\eta}^P_z$ in Notation 64.12.10 is required.


In Theorem 66.4.5 (i), the domain of the differential $(dL^P_z)_e$ is clearly $T_e(G)$, but the range of $(dL^P_z)_e$ could
be either $T_z(P)$ or $T_z(P_{\pi(z)})$, depending on whether the target space of $L^P_z$ is considered to be the ambient
manifold $P$ or the submanifold $P_{\pi(z)}$. The domain of the differential $(d\phi)_z$ seems to be less ambiguous
because the domain of $\phi$ is an open subset of $P$. (There is still some ambiguity here, which is resolved
by the embedding map in Theorems 54.6.8 and 54.6.9, but the ambiguities for tangent bundles of open
subsets may be safely ignored.) Thus one may say that the domain of $(d\phi)_z$ is $T_z(P)$. (More precisely, it is
$T_z(\mathrm{Dom}(\phi))$, but this difference may be ignored without bad consequences.) This suggests that the target
set of $(dL^P_z)_e$ should be regarded as $T_z(P)$ (or $T_z(\mathrm{Dom}(\phi))$), not the submanifold tangent space $T_z(P_{\pi(z)})$.
This interpretation works perfectly well for Theorem 66.4.5 parts (i, ii, iii, iv).


The opposite interpretation of the target space for $(dL^P_z)_e$ is required in formulas in which the inverse $(dL^P_z)_e^{-1}$
of the differential $(dL^P_z)_e$ appears. Theorem 66.4.5 parts (v, vi) show that the map $(dL^P_z)_e : T_e(G) \to T_{z,0}(P)$
is a bijection, where $T_{z,0}(P)$ is defined in Notation 64.5.6 to be a subset of $T_z(P)$, which is a subset of $T(P)$
by Notation 54.1.4. However, although the map $L^P_z : G \to P_{\pi(z)}$ is a bijection, it is not a $C^1$ diffeomorphism
unless $P_{\pi(z)}$ has a $C^1$ manifold structure. According to Definition 52.2.2, a $C^1$ diffeomorphism must be
between open subsets of two $C^1$ manifolds, but $P_{\pi(z)}$ is not in general an open subset of $P$. Consequently
the standard fibre-set submanifold atlas is used for $P_{\pi(z)}$ as in Definition 64.11.5.


All in all, the effort required to ensure that the correct domains and target sets are used for all maps and 
differential maps is excessive. The identification maps in Notation 64.12.10 are shown in Theorem 64.12.8 to 
make valid conversions between fibre-set submanifold tangent bundles and the corresponding sets of vertical 
tangent vectors on the total space. Consequently, the identification maps will be tacitly assumed in most 
situations in this book unless some particular point needs to be made. 


66.4.5 THEOREM: Basic properties of the differential of the principal bundle element left action map. 
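Let $(P,\pi,M,A_P^G)$ be a $C^1$ principal $G$-bundle.

(i) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $(d\phi)_z \circ (dL^P_z)_e = (dL^G_{\phi(z)})_e$.

(ii) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $\forall u \in T_e(G)$, $(d\phi)_z((dL^P_z)_e(u)) = X^G_u(\phi(z))$.
(See Definition 62.4.15 for left invariant vector fields $X^G_u$.)

(iii) $\forall p \in M$, $\forall \phi \in A^G_{P,p}$, $(d\phi)_{X_\phi(p)} \circ (dL^P_{X_\phi(p)})_e = \mathrm{id}_{T_e(G)}$. (See Definition 21.10.3 for $X_\phi$.)

(iv) $\forall z \in P$, $\forall u \in T_e(G)$, $(d\pi)_z((dL^P_z)_e(u)) = 0_{T_{\pi(z)}(M)}$. In other words, $\forall z \in P$, $(d\pi)_z \circ (dL^P_z)_e = 0$, where
“0” means the zero map $u \mapsto 0_{T_{\pi(z)}(M)}$ for $u \in T_e(G)$.

(v) $\forall z \in P$, $\mathrm{Range}((dL^P_z)_e) = T_{z,0}(P) = \ker((d\pi)_z)$.

(vi) $\forall z \in P$, $(dL^P_z)_e : T_e(G) \to T_{z,0}(P)$ is a bijection.

(vii) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$,
$$(dL^P_z)_e^{-1} = (d((L^P_z)^{-1}))_z \circ \bar{\eta}^P_z = (dL^G_{\phi(z)^{-1}})_{\phi(z)} \circ (d(\phi|_{P_{\pi(z)}}))_z \circ \bar{\eta}^P_z = (dL^G_{\phi(z)})_e^{-1} \circ (d(\phi|_{P_{\pi(z)}}))_z \circ \bar{\eta}^P_z.$$
(See Notation 64.12.10 for the inverse embedding map $\bar{\eta}^P_z : T_{z,0}(P) \to T_z(P_{\pi(z)})$.)

(viii) $\forall z \in P$, $\forall \phi \in A^G_{P,\pi(z)}$, $(dL^P_z)_e^{-1} \circ \eta^P_z \circ (d(\phi|_{P_{\pi(z)}}))_z^{-1} = (dL^G_{\phi(z)^{-1}})_{\phi(z)} = (dL^G_{\phi(z)})_e^{-1}$.
(See Notation 64.12.10 for the embedding map $\eta^P_z : T_z(P_{\pi(z)}) \to T_{z,0}(P)$.)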


PROOF: Part (i) follows from Theorem 66.4.3 (vi) and the chain rule. (The domains and ranges are assumed
to be $(d\phi)_z : T_z(P) \to T_{\phi(z)}(G)$, $(dL^P_z)_e : T_e(G) \to T_z(P)$ and $(dL^G_{\phi(z)})_e : T_e(G) \to T_{\phi(z)}(G)$.)

Part (ii) follows from part (i) and Definition 62.4.15.

Part (iii) follows from Theorem 66.4.3 (viii) and the chain rule and Theorem 58.5.2.

For part (iv), let $z \in P$. Then $(\pi \circ L^P_z)(g) = \pi(zg) = \pi(z)$ for all $g \in G$ by Theorem 66.2.12 (iii).
Thus $\pi \circ L^P_z : G \to M$ is constant. So $(d(\pi \circ L^P_z))_g(u) = 0$ for all $g \in G$ and $u \in T_g(G)$. Therefore
$(d\pi)_{zg}((dL^P_z)_g(u)) = (d(\pi \circ L^P_z))_g(u) = 0$ for all $g \in G$ and $u \in T_g(G)$ by Theorem 58.4.13, the chain
rule for differentials of $C^1$ maps. Substituting $g = e$ gives $(d\pi)_z((dL^P_z)_e(u)) = 0$ for all $u \in T_e(G)$. Hence
$(d\pi)_{ze} \circ (dL^P_z)_e = (d\pi)_z \circ (dL^P_z)_e$ is equal to the zero map.

For part (v), let $z \in P$. Then $\mathrm{Range}((dL^P_z)_e) \subseteq T_{z,0}(P)$ by part (iv) and Notation 64.5.6. To show
the reverse inclusion, let $y \in T_{z,0}(P)$. Let $\bar{y} = \bar{\eta}^P_z(y)$, where the bijection $\bar{\eta}^P_z : T_{z,0}(P) \to T_z(P_{\pi(z)})$ is
defined in Notation 64.12.10. Then $\bar{y} \in T_z(P_{\pi(z)})$. By Theorem 66.4.3 (xii), $\bar{L}^P_z : G \to P_{\pi(z)}$ is a $C^1$
diffeomorphism, where $\bar{L}^P_z$ denotes the function $L^P_z : G \to P$ with the target space $P$ replaced by $P_{\pi(z)}$. Then
$(\bar{L}^P_z)_* : T(G) \to T(P_{\pi(z)})$ is a bijection by Theorem 58.9.13. Let $u = (\bar{L}^P_z)_*^{-1}(\bar{y})$. Then $u \in T_e(G)$ because
$\bar{y} \in T_z(P_{\pi(z)})$. Thus $y = \eta^P_z((\bar{L}^P_z)_*(u)) = (dL^P_z)_e(u)$. So $y \in \mathrm{Range}((dL^P_z)_e)$. So $T_{z,0}(P) \subseteq \mathrm{Range}((dL^P_z)_e)$.
Hence $\mathrm{Range}((dL^P_z)_e) = T_{z,0}(P)$. By Notation 64.5.6, $T_{z,0}(P) = \ker((d\pi)_z)$.

For part (vi), let $z \in P$. Then $(d((L^P_z)^{-1}))_z((dL^P_z)_e(u)) = (d((L^P_z)^{-1} \circ L^P_z))_e(u) = (d\,\mathrm{id}_G)_e(u) = \mathrm{id}_{T_e(G)}(u) = u$ for all $u \in T_e(G)$. Thus $(dL^P_z)_e$ has a left inverse. So by Theorem 10.5.14 (ii), $(dL^P_z)_e$
is injective. Therefore by part (v), $(dL^P_z)_e : T_e(G) \to T_{z,0}(P)$ is a bijection.

For part (vii), let $\phi \in A_P^G$ and $z \in \mathrm{Dom}(\phi)$. Let $p = \pi(z)$. Then

$$\begin{aligned}
(dL^P_z)_e^{-1} &= (d((L^P_z)^{-1}))_z \circ \bar{\eta}^P_z && (66.4.1) \\
&= (dL^G_{\phi(z)^{-1}})_{\phi(z)} \circ (d(\phi|_{P_p}))_z \circ \bar{\eta}^P_z && (66.4.2) \\
&= (dL^G_{\phi(z)})_e^{-1} \circ (d(\phi|_{P_p}))_z \circ \bar{\eta}^P_z, && (66.4.3)
\end{aligned}$$

where line (66.4.1) follows from Theorem 58.5.4, the inverse rule for $C^1$ diffeomorphisms, and Theorem 58.5.8,
line (66.4.2) follows from the chain rule, Theorem 58.4.13, applied to $(L^P_z)^{-1} = L^G_{\phi(z)^{-1}} \circ \phi|_{P_p}$ given by
Theorem 66.4.3 (x) at $z$, and line (66.4.3) follows from Theorem 62.3.13 (iii).

Part (viii) follows from Theorem 66.4.3 (xi), or alternatively from part (vii).


66.5. Infinitesimal group actions on principal bundles 


66.5.1 REMARK: The infinitesimal action map on a principal bundle by Lie algebra elements. 

The infinitesimal action in Definition 66.5.2 is an application of the more abstract Definition 63.7.3 to the
special case of the right action map for differentiable principal fibre bundles in Definition 66.2.2, which is
a specialisation of the right action map for topological principal fibre bundles in Definition 47.8.7. The
infinitesimal action map $\lambda$ is well defined for $C^1$ principal bundles by Theorem 66.4.3 (ii).

The ability to define a right action map by the structure group on the total space of a principal bundle
makes it possible to define an infinitesimal action of the group on the total space. In the case of ordinary
fibre bundles, the structure group acts only on the fibre space. It is for this reason that the infinitesimal
action map in Definition 66.5.2 is restricted to principal fibre bundles.


66.5.2 DEFINITION: The infinitesimal action (map) of Lie algebra elements on a $C^1$ principal fibre bundle
$(P,\pi,M,A_P^G)$ is the map $\lambda : P \to (T_e(G) \to \bigcup_{z \in P} T_{z,0}(P))$ with

$$\forall z \in P,\ \forall u \in T_e(G),\quad \lambda_z(u) = (dL^P_z)_e(u),$$

where $L^P_z : G \to P$ is as in Definition 66.4.2. Thus the functions $\lambda_z : T_e(G) \to T_{z,0}(P)$ are defined by

$$\forall z \in P,\quad \lambda_z = (dL^P_z)_e.$$


66.5.3 REMARK: Similarity of infinitesimal action map and Lie group left translation operator. 

The infinitesimal action map in Definition 66.5.2 is very similar to the restriction to $T_e(G)$ of the left 
translation operator on $T(G)$ in Definition 62.3.9. The infinitesimal action of a Lie algebra 
element $u \in T_e(G)$ at a principal bundle element $z \in P$ is the natural generalisation of the left translation 
of $u$ by a group element $g \in G$. In both cases, there is a left translation, although in the principal bundle 
case, this translation goes outside of the group. 


66.5.4 EXAMPLE: Infinitesimal action map for the trivial principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$. 

This is a continuation of Example 66.2.24 for the principal $(\mathbb{R}, +)$-bundle $(P, \pi, M, A^G_P)$ on $M = \mathbb{R}^4$ with 
$P = M \times G$, $G = \mathbb{R}$, $A^G_P = \{\phi\}$, $\pi : (p,g) \mapsto p$ and $\phi : (p,g) \mapsto g$. 

By Definition 66.4.2, the left action map $L^P_z : G \to P$ satisfies $L^P_{(p,g)}(h) = R^P_h((p,g)) = (p, gh) = (p, g+h)$ 
for all $z = (p,g) \in P$ and $h \in G$. (The juxtaposition “gh” denotes the group operation in G, which is 
real-number addition, not multiplication.) 


Let $u \in T_0(G)$. Then $u = t_{0,w,\mathrm{id}_G}$ for some $w \in \mathbb{R}$. So by Definition 66.5.2, the pointwise infinitesimal action 
map $\lambda_z : T_e(G) \to T_{(p,g),0}(P)$ satisfies $\lambda_z(u) = (dL^P_{(p,g)})_0(t_{0,w,\mathrm{id}_{\mathbb{R}}}) = t_{(p,g),v,\mathrm{id}_P}$, where by Definition 58.4.5, 


$v = w\,\partial_h \big(\mathrm{id}_P \circ L^P_{(p,g)} \circ \mathrm{id}_{\mathbb{R}}^{-1}\big)(h)\big|_{h=0} = (0, w)$. 


Thus $\lambda_z(u) = t_{(p,g),(0,w),\mathrm{id}_P}$, where $(0, w) \in \mathbb{R}^5$ is an abbreviation for $(0,0,0,0,w)$ with $w \in \mathbb{R}$. So 

$\forall z \in P,\ \forall w \in \mathbb{R},\ \lambda_z(t_{0_G,w,\mathrm{id}_G}) = t_{z,(0,w),\mathrm{id}_P}$. (66.5.1) 


By identifying $u$ with $w$, one may write $\lambda_z(u) = u\, t_{z,(0,1),\mathrm{id}_P}$ for all $z \in P$ and $u \in T_e(G)$. This can be used 
to define connection forms on P. (See Example 70.8.7 for an application to classical electromagnetism.) 
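
The computation in this example can be checked numerically. The following is a minimal sketch, assuming the identity chart on $P = \mathbb{R}^4 \times \mathbb{R}$ and a central-difference approximation of the differential. The function names are hypothetical and are not part of the text.

```python
# Finite-difference sketch for the trivial principal (R,+)-bundle P = R^4 x R.
# All names (left_action, flatten, differential_at_identity) are illustrative.

def left_action(z, h):
    """Left action map L_z(h) = zh = (p, g + h) for z = (p, g)."""
    p, g = z
    return (p, g + h)

def flatten(z):
    """Coordinates of z = (p, g) in the identity chart on R^5."""
    p, g = z
    return list(p) + [g]

def differential_at_identity(z, w, eps=1e-6):
    """Central-difference approximation of (dL_z)_0 applied to the velocity w."""
    plus = flatten(left_action(z, eps))
    minus = flatten(left_action(z, -eps))
    return [w * (a - b) / (2 * eps) for a, b in zip(plus, minus)]

z = ((1.0, -2.0, 0.5, 3.0), 7.0)
components = differential_at_identity(z, w=2.5)
# The first four components vanish and the last one approximates w,
# which matches the component tuple (0, w) in line (66.5.1).
```

The base-space components are exactly zero because the left action map does not move the base point, which is the numerical counterpart of the verticality of $\lambda_z(u)$.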


66.5.5 REMARK: Implicit application of the fibre-set tangent bundle embedding map. 

The submanifold embedding map 7 and its inverse 7j are shown explicitly in Theorem 66.4.5 (vii, viii), but 
not in the equivalent Theorem 66.5.7 (vi, viii). (The difference between these equivalent theorems is that 
Theorem 66.5.7 is written in terms of the notation A, in Definition 66.5.2.) 

Hiding these embedding maps is equivalent to letting A optionally mean 7 o A, and 2s Az optionally 
mean fj o A. (See Notation 64.12.10 for 5j : LJ; cp Tz,0(P)) > Upem T(Ep) and f : Tz,0(P)) > Tz(Ep)-) 
Therefore it will be assumed from this point on that fibre-set tangent bundle embedding maps, and their 
inverses, are applied wherever required. 


It is often convenient to identify subsets $T_{z,0}(P)$ of the total space tangent bundle $T(P)$ with corresponding 
submanifold tangent bundles $T_z(P_{\pi(z)})$. (This identification is validated by Theorem 64.12.8.) This is no 
more harmful than identifying the two-dimensional tangent vectors in $S^2$ with their corresponding 
three-dimensional embedded vectors in $\mathbb{R}^3$. 

Identifications are almost ubiquitous in mathematics. They are often so unconscious that no remark is even 
made about their necessity. This book is unusual for the number of identification maps which are given 
explicit definitions, notations and theorems. Many books provide at most a brief comment when they are 
introduced. In this book, the various kinds of identifications are presented in some detail, but are then 
gradually dropped for the sake of convenience. 


66.5.6 REMARK: Properties of the infinitesimal action map on a principal bundle. 

The basic properties of the infinitesimal action map in Theorem 66.5.7 are relevant to the understanding of 
connection forms on principal bundles in Definition 69.5.4. In particular, part (v) asserts that the Lie algebra 
$T_e(G)$ of the structure group is linear-space isomorphic to the vertical subspace $\ker((d\pi)_z)$ of the tangent 
space $T_z(P)$ at each point $z \in P$. (It is perhaps noteworthy that the expression $(dL^G_{\phi(z)})_e^{-1}$ in part (viii) is 
the same as the Maurer-Cartan form in Definition 62.5.4.) 

Some of the maps and spaces in Definition 66.5.2 and Theorem 66.5.7 are illustrated in Figure 66.5.1. 


66.5.7 THEOREM: Basic properties of the infinitesimal action map by Lie algebra elements. 
Let $(P, \pi, M, A^G_P)$ be a $C^1$ principal $G$-bundle. 


(i) $\forall z \in P,\ \forall \phi \in A^G_{P,\pi(z)},\ (d\phi)_z \circ \lambda_z = (dL^G_{\phi(z)})_e$. 
(ii) $\forall z \in P,\ \forall \phi \in A^G_{P,\pi(z)},\ \forall u \in T_e(G),\ (d\phi)_z(\lambda_z(u)) = X^L_u(\phi(z))$. (See Definition 62.4.15 for $X^L_u$.) 
(iii) $\forall p \in M,\ \forall \phi \in A^G_{P,p},\ (d\phi)_{X_\phi(p)} \circ \lambda_{X_\phi(p)} = \mathrm{id}_{T_e(G)}$. (See Definition 21.10.3 for identity cross-sections $X_\phi$.) 

Figure 66.5.1 Differential of left action map by a principal bundle element 


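(iv) $\forall z \in P,\ \forall u \in T_e(G),\ (d\pi)_z(\lambda_z(u)) = 0$. In other words, $\forall z \in P,\ (d\pi)_z \circ \lambda_z = 0$. 
(v) $\lambda_z : T_e(G) \to \ker((d\pi)_z) = T_{z,0}(P)$ is a linear space isomorphism for all $z \in P$. 
(vi) $\forall z \in P,\ \forall \phi \in A^G_{P,\pi(z)},\ \lambda_z^{-1} = (dL^G_{\phi(z)^{-1}})_{\phi(z)} \circ (d(\phi|_{P_{\pi(z)}}))_z = (dL^G_{\phi(z)})_e^{-1} \circ (d(\phi|_{P_{\pi(z)}}))_z$. 
(vii) $\forall z \in P,\ \forall \phi \in A^G_{P,\pi(z)},\ (dL^G_{\phi(z)^{-1}})_{\phi(z)} \circ (d(\phi|_{P_{\pi(z)}}))_z \circ \lambda_z = \mathrm{id}_{T_e(G)}$. 
(viii) $\forall z \in P,\ \forall \phi \in A^G_{P,\pi(z)},\ \lambda_z^{-1} \circ (d(\phi|_{P_{\pi(z)}}))_z^{-1} = (dL^G_{\phi(z)})_e^{-1}$. (Maurer-Cartan form.) 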


PROOF: Part (i) follows from Theorem 66.4.5 (i) and Definition 66.5.2. 
Part (ii) follows from Theorem 66.4.5 (ii) and Definition 66.5.2. 
Part (iii) follows from Theorem 66.4.5 (iii) and Definition 66.5.2. 
Part (iv) follows from Theorem 66.4.5 (iv) and Definition 66.5.2. 
Part (v) follows from part (iv) and Theorems 66.4.5 (vi) and 23.1.14. 
Part (vi) follows from Theorem 66.4.5 (vii) and Definition 66.5.2. 

Part (vii) follows from part (vi). Alternatively, apply the chain rule, Theorem 58.4.13, to the equality 
$L^G_{\phi(z)^{-1}} \circ \phi|_{P_{\pi(z)}} \circ L^P_z = (L^P_z)^{-1} \circ L^P_z = \mathrm{id}_G$, which follows from Theorem 66.4.3 (x, iv). 


Part (viii) follows from Theorem 66.4.5 (viii) and Definition 66.5.2. Alternatively, it follows from part (vi). 


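66.5.8 EXAMPLE: Infinitesimal action map inverse for the trivial principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$. 

This is a continuation of Example 66.5.4 for the principal $(\mathbb{R}, +)$-bundle $(P, \pi, M, A^G_P)$ on $M = \mathbb{R}^4$ with 
$P = M \times G$, $G = \mathbb{R}$, $A^G_P = \{\phi\}$, $\pi : (p,g) \mapsto p$ and $\phi : (p,g) \mapsto g$. 

Let $z = (p,g) \in P$. Then $t_{(p,g),(0,w),\mathrm{id}_P} \in T_{z,0}(P)$ for all $w \in \mathbb{R}$ because $(d\pi)_z(t_{(p,g),(0,w),\mathrm{id}_P}) = t_{p,0,\mathrm{id}_M}$. 
Therefore $\lambda_z^{-1}(t_{(p,g),(0,w),\mathrm{id}_P}) \in T_e(G) = T_0(\mathbb{R})$ is well defined by Theorem 66.5.7 (v). So it follows from 
Example 66.5.4 line (66.5.1) that 


$\forall z \in P,\ \forall w \in \mathbb{R},\ \lambda_z^{-1}(t_{z,(0,w),\mathrm{id}_P}) = t_{0_G,w,\mathrm{id}_G} = t_{0_{\mathbb{R}},w,\mathrm{id}_{\mathbb{R}}}$. 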


66.5.9 REMARK: Extension of properties of infinitesimal action map to “identity charts”. 

The “identity charts” $\phi_X$ in Definition 21.10.7 for local cross-sections $X$ of a principal bundle $(P, \pi, M, A^G_P)$ 
are not necessarily elements of the fibre atlas $A^G_P$, although they can be shown to be compatible under suitable 
assumptions on the cross-sections and the regularity requirements for the atlas. Nevertheless, identity charts 
obey the same right action rule as charts in the atlas by Theorem 21.11.16 (i). Consequently they have the 
same kind of relation to the infinitesimal action map as the charts in $A^G_P$, under suitable assumptions. This 
is shown in Theorem 66.5.10. 


66.5.10 THEOREM: Basic properties of infinitesimal action maps applied to identity charts. 
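Let $(P, \pi, M, A^G_P)$ be a $C^1$ principal $G$-bundle. Let $X$ be a $C^1$ local cross-section of $(P, \pi, M)$. Let $U = \mathrm{Dom}(X)$. Define the 
identity chart $\phi_X : \pi^{-1}(U) \to G$ as in Definition 21.10.7. 


(i) $\forall z \in \pi^{-1}(U),\ (L^P_z)^{-1} = L^G_{\phi_X(z)^{-1}} \circ \phi_X|_{P_{\pi(z)}}$. (This matches Theorem 66.4.3 (x).) 
(ii) $\forall z \in \pi^{-1}(U),\ \lambda_z^{-1} = (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d(\phi_X|_{P_{\pi(z)}}))_z = (dL^G_{\phi_X(z)})_e^{-1} \circ (d(\phi_X|_{P_{\pi(z)}}))_z$. 
(iii) $\forall z \in \pi^{-1}(U),\ (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d(\phi_X|_{P_{\pi(z)}}))_z \circ \lambda_z = \mathrm{id}_{T_e(G)}$. 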


PROOF: For part (i), let $z \in \pi^{-1}(U)$. Let $z' \in P_{\pi(z)}$. Then $z' = L^P_z(g) = zg$ for some unique $g \in G$ by 
Theorem 66.4.3 (iv) and Definition 66.4.2. It follows that $(L^G_{\phi_X(z)^{-1}} \circ \phi_X|_{P_{\pi(z)}})(z') = L^G_{\phi_X(z)^{-1}}(\phi_X(z')) = 
\phi_X(z)^{-1}\phi_X(zg) = \phi_X(z)^{-1}\phi_X(z)g = g = (L^P_z)^{-1}(z')$ by Theorems 21.11.16 (i) and 66.4.3 (v). 

For part (ii), let $z \in \pi^{-1}(U)$. Note that $\phi_X$ is a $C^1$ map by Definition 21.10.7 because $X$ and $\pi$ are $C^1$ 
maps, and $\phi$ is $C^1$ for all $\phi \in A^G_P$. Then $(dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d(\phi_X|_{P_{\pi(z)}}))_z = (d(L^G_{\phi_X(z)^{-1}} \circ \phi_X|_{P_{\pi(z)}}))_z = 
(d((L^P_z)^{-1}))_z = \lambda_z^{-1}$ by part (i), Theorem 66.5.7 (vi), and the chain rule for $C^1$ manifold 
maps. It then follows by Theorem 62.3.13 (iii) that $\lambda_z^{-1} = (dL^G_{\phi_X(z)})_e^{-1} \circ (d(\phi_X|_{P_{\pi(z)}}))_z$. 


Part (iii) follows from part (ii). 


66.6. Fundamental vertical vector fields on principal bundles 


66.6.1 REMARK: The vector field induced on a principal bundle by a Lie algebra element. 

If the parameters of the function-valued function $\lambda$ in Definition 66.5.2 are swapped, the result is the 
transposed map $\bar\lambda : T_e(G) \to (P \to T(P))$ in Definition 66.6.2. In fact, $\bar\lambda_u \in X(T(P))$ for all $u \in T_e(G)$, 
and $\forall u \in T_e(G),\ \forall z \in P,\ \bar\lambda_u(z) \in T_{z,0}(P)$ by Theorem 66.4.5 (v). In other words, $\bar\lambda_u$ is a vector field on $P$ 
which is vertical at every point. 


The field $\bar\lambda_u$ is called the “fundamental vector field” corresponding to $u \in T_e(G)$ by Spivak [37], Volume 2, 
page 311; Bishop/Crittenden [2], page 91; Frankel [12], page 455; Crampin/Pirani [7], page 319; Kobayashi/ 
Nomizu [19], page 51; Daniel/Viallet [317], page 182. It is called a “fundamental field” by Bleecker [254], 
page 30. It is more accurately called the “fundamental vertical field” by Poor [32], page 283. It is called “das 
fundamentale Vektorfeld” (in German) by Sulanke/Wintgen [40], pages 75, 85. 


66.6.2 DEFINITION: The fundamental (vertical) vector field generated by a Lie algebra element. 
The transposed infinitesimal action (map) of Lie algebra elements on a $C^1$ principal fibre bundle $(P, \pi, M, A^G_P)$ 
is the map $\bar\lambda : T_e(G) \to (P \to \bigcup_{z \in P} T_{z,0}(P))$ defined by 


$\forall u \in T_e(G),\ \forall z \in P,\ \bar\lambda_u(z) = \lambda_z(u)$, 


where $\lambda$ is the infinitesimal action map of Lie algebra elements on $P$ in Definition 66.5.2. 


Alternative name: fundamental (vertical) (vector) field (generated by a Lie algebra element). 


66.6.3 EXAMPLE: Fundamental vertical vector field for the trivial principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$. 
This is a continuation of Example 66.5.4 for the principal $(\mathbb{R}, +)$-bundle $(P, \pi, M, A^G_P)$ on $M = \mathbb{R}^4$ with 
$P = M \times G$, $G = \mathbb{R}$, $A^G_P = \{\phi\}$, $\pi : (p,g) \mapsto p$ and $\phi : (p,g) \mapsto g$. Then 


$\forall z \in P,\ \forall w \in \mathbb{R},\ \lambda_z(t_{0_G,w,\mathrm{id}_G}) = t_{z,(0,w),\mathrm{id}_P}$. 

So by Definition 66.6.2, 

$\forall w \in \mathbb{R},\ \forall z \in P,\ \bar\lambda_{t_{0_G,w,\mathrm{id}_G}}(z) = t_{z,(0,w),\mathrm{id}_P}$. 

Alternatively, this can be written as: 

$\forall u = t_{0_G,w,\mathrm{id}_G} \in T_0(\mathbb{R}),\ \forall z \in P,\ \bar\lambda_u(z) = t_{z,(0,w),\mathrm{id}_P}$. 

Thus $\bar\lambda_u$ affects only the vertical component $w$ of the tangent vector $t_{z,(0,w),\mathrm{id}_P}$ at $z$ on $P$. 


66.6.4 REMARK: Transposed infinitesimal action maps are cross-sections of the PFB tangent bundle. 
Theorem 66.6.5 (iv) does not assert the equality of $\mathrm{Range}(\bar\lambda_u)$ and $\bigcup_{z \in P} T_{z,0}(P)$ for each $u \in T_e(G)$ because 
this is not generally true. In fact, $\bar\lambda_u$ selects only one vector from each vertical vector set $T_{z,0}(P)$. In other 
words, it is a cross-section of the tangent bundle of the principal bundle total space. This is asserted in 
Theorem 66.6.8 (iv) together with its differentiability. 


66.6.5 THEOREM: Simple relations between transposed and untransposed infinitesimal action maps. 
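Let $(P, \pi, M, A^G_P)$ be a $C^1$ principal $G$-bundle. 


(i) $\forall u \in T_e(G),\ \mathrm{Dom}(\bar\lambda_u) = P$. 
(ii) $\forall u \in T_e(G),\ \forall z \in P,\ \bar\lambda_u(z) \in T_{z,0}(P)$. 
(iii) $\forall u \in T_e(G),\ \bar\lambda_u \in X(T(P))$. 
(iv) $\forall u \in T_e(G),\ \mathrm{Range}(\bar\lambda_u) \subseteq \bigcup_{z \in P} T_{z,0}(P)$. 
(v) $\forall z \in P,\ \forall y \in T_{z,0}(P),\ \bar\lambda_{\lambda_z^{-1}(y)}(z) = y$. 
(vi) $\forall z \in P,\ \forall u \in T_e(G),\ \lambda_z^{-1}(\bar\lambda_u(z)) = u$. 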


PROOF: Part (i) follows from Definition 66.6.2. 

Part (ii) follows from Definitions 66.6.2 and 66.5.2. 

Part (iii) follows from part (ii) and Notations 64.5.6 and 57.1.5. 

Part (iv) follows from parts (i) and (ii). 

For part (v), let $z \in P$ and $y \in T_{z,0}(P)$. Then by Theorem 66.5.7 (v), $u = \lambda_z^{-1}(y) \in T_e(G)$ is well defined. 
So $\bar\lambda_u(z) = \lambda_z(u) = \lambda_z(\lambda_z^{-1}(y)) = y$ because $\mathrm{Dom}(\lambda_z) = T_e(G)$ by Definition 66.5.2. Hence $\bar\lambda_{\lambda_z^{-1}(y)}(z) = y$. 


For part (vi), let $z \in P$ and $u \in T_e(G)$. Then $\lambda_z^{-1}(\bar\lambda_u(z)) = \lambda_z^{-1}(\lambda_z(u)) = u$ by Definition 66.6.2. 


66.6.6 REMARK: Similarity of fundamental vector fields to left invariant vector fields. 

As mentioned in Remark 66.5.3, the infinitesimal action map $\lambda$ is the natural generalisation of the Lie group 
left translation operator for vectors to left translation by a principal bundle element. In the same way, the 
transposed map $\bar\lambda$ in Definition 66.6.2 is very similar to the left invariant vector field $X^L_u \in X^\infty(T(G))$ 
which appears in Definition 62.4.15, where $X^L_u$ is defined by $X^L_u(g) = (dL^G_g)_e(u)$ for all $g \in G$. (This is the 
transpose of the Lie group left translation operator.) 


Thus the transposed infinitesimal action map $\bar\lambda$ of a Lie algebra element acting on a principal bundle is the 
natural generalisation of left invariant vector fields on Lie groups. 


Although the left invariant vector fields on Lie groups are $C^\infty$ differentiable, this is not generally true for 
fundamental vector fields because the action of the structure group on the total space of a $C^k$ principal 
bundle is in essence a Lie transformation group action, not a Lie group action. Thus Theorem 66.6.8 (iv) 
delivers only $C^k$ differentiability for a $C^{k+1}$ principal bundle. (See Theorem 63.6.11 for a Lie transformation 
group version of Theorem 66.6.8 (iv).) 


66.6.7 REMARK: Fundamental vector fields are useful as vertical vector-extension fields. 

For the purposes of computing the vector-field style of exterior derivative in Definition 61.14.9, it is desirable 
to use input vector fields which have given values at a point. One way to do this is to use the “constant” 
extension fields in Notations 57.4.5 or 57.1.21 as inputs to the exterior derivative as in Definition 61.10.3. 


In the special case that the desired inputs are vertical vectors on a principal bundle, Theorem 66.6.5 (v) 
provides a useful construction method. Given any vertical vector $y \in T_{z,0}(P)$, the global vector field $\bar\lambda_{\lambda_z^{-1}(y)}$ 
on $P$ has the specified value $y$ at $z$. To compute the exterior derivative $d\omega$ of an $m$-form $\omega$ on $P$, one requires 
$m + 1$ vector field inputs with given values $(y_k)_{k=0}^m \in T_{z,0}(P)^{m+1}$, assuming that one only wishes to compute 
$d\omega$ for vertical vector inputs. These inputs can be provided by the vector fields $\bar\lambda_{\lambda_z^{-1}(y_k)}$ for $k \in \mathbb{Z}_{m+1}$. (See 
the proof of Theorem 70.5.7 for an example application of this procedure.) 


The differentiability of these “constant extension fields” is asserted by Theorem 66.6.8 (iv). In general, tuples 
of such vector fields will not be holonomic. (See Definition 61.5.19 for holonomic families of vector fields.) 
Therefore when used as inputs to the exterior derivative, the “correction terms", which are Lie brackets, are 
typically non-zero. 
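
As a hedged illustration of these correction terms, consider the case $m = 1$. Write $\bar\lambda_u$ for the fundamental vector field of Definition 66.6.2 and $\lambda_z$ for the pointwise infinitesimal action map of Definition 66.5.2, and put $u_k = \lambda_z^{-1}(y_k)$. The sketch below assumes the common normalisation of the vector-field formula for the exterior derivative without a factor of $\tfrac12$ (the convention of Definition 61.14.9 may differ), and assumes a $C^2$ principal bundle so that Theorem 66.6.13 is applicable.

```latex
\begin{align*}
d\omega(\bar\lambda_{u_0}, \bar\lambda_{u_1})
  &= \bar\lambda_{u_0}\bigl(\omega(\bar\lambda_{u_1})\bigr)
   - \bar\lambda_{u_1}\bigl(\omega(\bar\lambda_{u_0})\bigr)
   - \omega\bigl([\bar\lambda_{u_0}, \bar\lambda_{u_1}]\bigr) \\
  &= \bar\lambda_{u_0}\bigl(\omega(\bar\lambda_{u_1})\bigr)
   - \bar\lambda_{u_1}\bigl(\omega(\bar\lambda_{u_0})\bigr)
   - \omega\bigl(\bar\lambda_{[u_0, u_1]}\bigr).
\end{align*}
```

The last term is the Lie bracket correction. It vanishes for all $u_0, u_1$ only in special cases, for example when the Lie algebra of $G$ is abelian.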
66.6.8 THEOREM: Basic properties of the transposed infinitesimal action map by Lie algebra elements. 
Let $k \in \mathbb{Z}^+_0$. Let $(P, \pi, M, A^G_P)$ be a $C^{k+1}$ principal $G$-bundle. 
(i) $\forall u \in T_e(G),\ \forall z \in P,\ \bar\lambda_u(z) = (dL^P_z)_e(u) \in T_{z,0}(P)$. 
(ii) $\forall u \in T_e(G),\ \forall z \in P,\ \bar\lambda_u(z) = (d\mu^P_G)_{z,e}(0_z, u)$, where $0_z = 0_{T_z(P)}$ for all $z \in P$. 
(iii) $\forall u \in T_e(G),\ \bar\lambda_u = (\mu^P_G)_{*,*} \circ \phi^u$, where $\phi^u \in C^k(P, T(P) \times T(G))$ is defined by $\phi^u(z) = (0_z, u)$ 
for all $u \in T_e(G)$ and $z \in P$. (See Definition 58.10.2 for the direct-product decomposed differential 
$(\mu^P_G)_{*,*} : T(P) \times T(G) \to T(P)$ of $\mu^P_G$.) 
(iv) $\forall u \in T_e(G),\ \bar\lambda_u \in X^k(T(P))$. 


PROOF: Part (i) follows from Definition 66.5.2 and Theorem 66.4.5 (v). 
For part (ii), Theorem 58.6.7 line (58.6.4) implies $(d\mu^P_G)_{z,e}(0_z, u) = (d\mu^P_G(z, \cdot))_e(u)$. So by Notation 10.12.18, 
$(d\mu^P_G)_{z,e}(0_z, u) = (dL^P_z)_e(u) = \bar\lambda_u(z)$. (See Definition 58.6.2 for $(d\mu^P_G)_{z,e} : T_z(P) \times T_e(G) \to T_z(P)$, the direct 
product decomposition of the differential of $\mu^P_G : P \times G \to P$.) 

For part (iii), the formula $\bar\lambda_u = (\mu^P_G)_{*,*} \circ \phi^u$ follows from part (ii). The assertion $\phi^u \in C^k(P, T(P) \times T(G))$ 
follows from Theorem 57.2.16 because $P$ and $G$ are $C^{k+1}$ manifolds. 

For part (iv), the direct-product decomposed differential $(\mu^P_G)_{*,*} : T(P) \times T(G) \to T(P)$ in Definition 58.10.2 
is a $C^k$ map by Theorem 58.10.3 because $\mu^P_G : P \times G \to P$ is $C^{k+1}$ by Theorem 66.2.9. So by part (iii) and 
Theorem 52.1.17, the chain rule for differentiable maps, $\bar\lambda_u \in C^k(P, T(P))$. Hence $\bar\lambda_u \in X^k(T(P))$. 


66.6.9 REMARK: Right translation of “fundamental vector fields” on principal bundles. 

Theorem 66.6.10 (i) expresses the right translation by a group element $g \in G$ of the “fundamental vector 
field” $\bar\lambda_u$ in Definition 66.6.2, or the infinitesimal action map $\lambda_z$ in Definition 66.5.2, in terms of the adjoint 
map $\mathrm{Adj}(g^{-1})$ acting on a Lie algebra element $u \in T_e(G)$. (See Definition 62.10.2 for the adjoint map on a Lie 
algebra.) Theorem 66.6.10 (i) is essentially the same as propositions by Sternberg [38], page 333; Spivak [37], 
Volume 2, page 311; Bishop/Crittenden [2], page 45; Poor [32], page 283. 


66.6.10 THEOREM: The effect of the right action map on fundamental (vertical) vector fields. 
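Let $(P, \pi, M, A^G_P)$ be a $C^1$ principal $G$-bundle. 
(i) $\forall u \in T_e(G),\ \forall g \in G,\ (R^P_g)_* \circ \bar\lambda_u \circ R^P_{g^{-1}} = \bar\lambda_{\mathrm{Adj}(g^{-1})(u)}$. 
(ii) $\forall u \in T_e(G),\ \forall g \in G,\ \forall z \in P,\ (R^P_g)_*(\lambda_{zg^{-1}}(u)) = \lambda_z(\mathrm{Adj}(g^{-1})(u))$. 
(iii) $\forall g \in G,\ \forall z \in P,\ (R^P_g)_* \circ \lambda_{zg^{-1}} = \lambda_z \circ \mathrm{Adj}(g^{-1})$. 

PROOF: For part (i), let $u \in T_e(G)$ and $g \in G$. Then by Definition 66.5.2, 

$\forall z \in P,\ ((R^P_g)_* \circ \bar\lambda_u \circ R^P_{g^{-1}})(z) = (R^P_g)_*(\bar\lambda_u(zg^{-1}))$ 
$= (R^P_g)_*(\lambda_{zg^{-1}}(u))$ 
$= ((R^P_g)_* \circ (L^P_{zg^{-1}})_*)(u)$ 
$= (R^P_g \circ L^P_{zg^{-1}})_*(u)$ 


and 

$\forall z \in P,\ \bar\lambda_{\mathrm{Adj}(g^{-1})(u)}(z) = \lambda_z(\mathrm{Adj}(g^{-1})(u))$ 
$= (L^P_z)_*(\mathrm{Adj}(g^{-1})(u))$ 
$= ((L^P_z)_* \circ (L^G_{g^{-1}})_* \circ (R^G_g)_*)(u)$ (66.6.1) 
$= ((L^P_{zg^{-1}})_* \circ (R^G_g)_*)(u)$ 
$= (L^P_{zg^{-1}} \circ R^G_g)_*(u)$ 
$= (R^P_g \circ L^P_{zg^{-1}})_*(u)$, (66.6.2) 


where line (66.6.1) follows from Theorem 62.10.5 (ii), and line (66.6.2) follows from Theorem 66.4.3 (v). 
Hence $(R^P_g)_* \circ \bar\lambda_u \circ R^P_{g^{-1}} = \bar\lambda_{\mathrm{Adj}(g^{-1})(u)}$. 

Part (ii) follows from part (i) and Definition 66.6.2 by noting that $(\bar\lambda_u \circ R^P_{g^{-1}})(z) = \lambda_{zg^{-1}}(u)$ for all 
$u \in T_e(G)$, $g \in G$ and $z \in P$. 

Part (iii) is a paraphrase of part (ii), which is equivalent by Theorem 10.2.13. 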
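
The identity in part (ii) can be spot-checked in a simple matrix model. The following is a minimal sketch, assuming (as an illustration only, not the general setting of the theorem) that $P = G = GL(2, \mathbb{R})$ is regarded as a principal bundle over a single point with right action by matrix multiplication. Under that assumption the infinitesimal action is $z \mapsto zu$, the push-forward by the right action is $v \mapsto vg$, and the adjoint map is $u \mapsto g u g^{-1}$. All function names are hypothetical.

```python
# Matrix-model check of (R_g)_*(lambda_{z g^-1}(u)) = lambda_z(Adj(g^-1)(u)).
# Assumption: P = G = GL(2,R), acting on itself by right multiplication.

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def infinitesimal_action(z, u):
    """lambda_z(u): derivative of h -> z h at the identity, applied to u."""
    return matmul(z, u)

def push_right(g, v):
    """(R_g)_* v: derivative of z -> z g applied to the tangent vector v."""
    return matmul(v, g)

def adjoint(g, u):
    """Adj(g)(u) = g u g^-1 for a matrix group."""
    return matmul(matmul(g, u), inverse(g))

z = [[2.0, 1.0], [0.0, 1.0]]
g = [[1.0, 3.0], [1.0, 4.0]]
u = [[0.0, 1.0], [-1.0, 2.0]]

lhs = push_right(g, infinitesimal_action(matmul(z, inverse(g)), u))
rhs = infinitesimal_action(z, adjoint(inverse(g), u))
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Both sides reduce to $z g^{-1} u g$ in this model, so `max_diff` should be zero up to rounding.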
66.6.11 REMARK: Formula for right translation of fundamental vector fields on principal bundles. 

The map À, => (RP). o Au o R? in Theorem 66.6.10 (i) has the same form as the right translation operator 
map X > (RG). oXo RS, for vector fields in Definition 63.5.7. This is not surprising because (G, P) in 
Definition 66.2.7 is a right transformation group. 

Using the right translation operator Ho : X(T(P)) ^ X(T(P)) for vector fields in Definition 66.2.21, 
which is an application of the abstract Definition 63.5.7 for Lie right transformation groups to principal 
bundles, Theorem 66.6.10 (i) may be re-expressed as 


Vu € T,(G), Vg € G, Rg” (Xu) = Aaditg-t)(u)- 


That is, the right translation by $g$ of the fundamental vector field $\bar\lambda_u$ for Lie algebra element $u \in T_e(G)$ is 
equal to the fundamental vector field $\bar\lambda_{\mathrm{Adj}(g^{-1})(u)}$ for the Lie algebra element $\mathrm{Adj}(g^{-1})(u) \in T_e(G)$ obtained 
by computing the adjoint of $u$ with respect to $g^{-1} \in G$. 


66.6.12 REMARK: Relation of the structure group's Lie algebra to the principal bundle’s Lie bracket. 
Since each Lie algebra element $u \in T_e(G)$ for a $C^{k+1}$ principal $G$-bundle corresponds to a $C^k$ vector field 
$\bar\lambda_u$ on the total space $P$ by Theorem 66.6.8 (iv), the vector fields $\bar\lambda_u, \bar\lambda_v$ for $u, v \in T_e(G)$ can be combined 
with a Lie bracket as in Definition 61.5.7 if $k \ge 1$. But every vector $u \in T_e(G)$ also corresponds to a vector 
field $X^L_u \in X^\infty(T(G))$ by Theorem 62.4.16 (v). So the vector fields $X^L_u, X^L_v$ can also be combined with a 
Lie bracket. Then it seems likely that these Lie brackets will be related as in Theorem 61.6.6. This is the 
assertion of Theorem 66.6.13. 

Each fibre set $P_p$, for $p \in M$, is a regular $C^k$ submanifold of $P$ by Theorem 64.3.9 (i). So Theorem 66.6.13 
for global fundamental vector fields $\bar\lambda_u$ would also be valid if the fields $\bar\lambda_u$ are restricted to $P_p$ for some 
fixed $p \in M$. 


66.6.13 THEOREM: Lie bracket of transposed fundamental vector fields. 
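Let $(P, \pi, M, A^G_P)$ be a $C^2$ principal $G$-bundle. Then 


$\forall u, v \in T_e(G),\ \bar\lambda_{[u,v]} = [\bar\lambda_u, \bar\lambda_v]$. 

In other words, $\forall u, v \in T_e(G),\ \forall z \in P,\ \bar\lambda_{[u,v]}(z) = [\bar\lambda_u, \bar\lambda_v](z)$. 


PROOF: Let $u, v \in T_e(G)$. Then $[u, v] = [X^L_u, X^L_v](e)$ by Definition 62.8.8 (ii), where $X^L_u, X^L_v \in X^\infty(T(G))$ 
by Definition 62.4.15 and Theorem 62.4.16 (v). 
Let $u \in T_e(G)$ and $z \in P$. Then $L^P_z \in C^2(G, P)$ by Theorem 66.4.3 (ii). Let $g \in G$. By Definition 62.4.15, 
$(dL^P_z)_g(X^L_u(g)) = (dL^P_z)_g((dL^G_g)_e(u)) = (d(L^P_z \circ L^G_g))_e(u) = (dL^P_{zg})_e(u) = \bar\lambda_u(zg)$ by Theorem 58.4.13, the 
chain rule. But $\bar\lambda_u(L^P_z(g)) = \bar\lambda_u(zg)$ also. Therefore $(dL^P_z)_g(X^L_u(g)) = \bar\lambda_u(L^P_z(g))$ for all $g \in G$. This implies that 
$X^L_u$ and $\bar\lambda_u$ are $L^P_z$-related vector fields by Definition 61.6.2. So $(dL^P_z)_e([X^L_u, X^L_v](e)) = [\bar\lambda_u, \bar\lambda_v](L^P_z(e))$ for 
all $u, v \in T_e(G)$ by Theorem 61.6.6. Thus $\bar\lambda_{[u,v]}(z) = \lambda_z([u, v]) = [\bar\lambda_u, \bar\lambda_v](z)$. Hence $\bar\lambda_{[u,v]} = [\bar\lambda_u, \bar\lambda_v]$. 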
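
The bracket relation can be illustrated numerically in a matrix model. The following is a minimal sketch, assuming (for illustration only) that $P = G = GL(2, \mathbb{R})$ acts on itself by right multiplication, so that the fundamental vector field of $u$ is $z \mapsto zu$ and the Lie algebra product is the matrix commutator $uv - vu$. The vector field bracket is computed in coordinates as $DY(z)[X(z)] - DX(z)[Y(z)]$ with central differences. All function names are hypothetical.

```python
# Matrix-model check that the bracket of the fields z -> z u and z -> z v
# equals the field z -> z (u v - v u).

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b, scale=1.0):
    """Entrywise a + scale * b."""
    return [[a[i][j] + scale * b[i][j] for j in range(2)] for i in range(2)]

def fundamental_field(u):
    """Vector field z -> z u on the space of 2x2 matrices."""
    return lambda z: matmul(z, u)

def directional_derivative(field, z, h, eps=1e-4):
    """Central-difference derivative of the field at z in the direction h."""
    plus = field(add(z, h, eps))
    minus = field(add(z, h, -eps))
    return [[(plus[i][j] - minus[i][j]) / (2 * eps) for j in range(2)]
            for i in range(2)]

def bracket_of_fields(x_field, y_field, z):
    """Coordinate formula [X, Y](z) = DY(z)[X(z)] - DX(z)[Y(z)]."""
    d_y = directional_derivative(y_field, z, x_field(z))
    d_x = directional_derivative(x_field, z, y_field(z))
    return add(d_y, d_x, -1.0)

def commutator(u, v):
    """Matrix commutator u v - v u."""
    return add(matmul(u, v), matmul(v, u), -1.0)

z = [[2.0, 1.0], [0.0, 1.0]]
u = [[0.0, 1.0], [-1.0, 2.0]]
v = [[1.0, 0.0], [3.0, -1.0]]

numeric = bracket_of_fields(fundamental_field(u), fundamental_field(v), z)
expected = fundamental_field(commutator(u, v))(z)
error = max(abs(numeric[i][j] - expected[i][j])
            for i in range(2) for j in range(2))
```

Because the fields in this model are linear in $z$, the central difference is exact up to rounding, so `error` should be very small.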


66.6.14 THEOREM: Lie bracket of fundamental vector fields maps to Lie algebra product. 
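Let $(P, \pi, M, A^G_P)$ be a $C^2$ principal $G$-bundle. Then 


$\forall u, v \in T_e(G),\ \forall z \in P,\ \lambda_z^{-1}([\bar\lambda_u, \bar\lambda_v](z)) = [u, v]$. 


PROOF: Let $u, v \in T_e(G)$ and $z \in P$. Then $[\bar\lambda_u, \bar\lambda_v](z) = \bar\lambda_{[u,v]}(z)$ by Theorem 66.6.13. So $[\bar\lambda_u, \bar\lambda_v](z) = \lambda_z([u, v])$ by 
Definition 66.6.2. Hence $\lambda_z^{-1}([\bar\lambda_u, \bar\lambda_v](z)) = \lambda_z^{-1}(\lambda_z([u, v])) = [u, v]$ by Theorem 66.6.5 (vi). 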


66.7. Associated differentiable fibre bundles 


66.7.1 REMARK: Associated differentiable fibre bundles are defined as for topological fibre bundles. 
Associated fibre bundles are fibre bundles which have equivalent chart transition maps. See Sections 47.9, 
47.10 and 47.11 for associated topological fibre bundles. Associated differentiable fibre bundles are defined 
in the same way, in terms of the topologies induced by the differentiable atlases. The definitions in those 
sections are therefore valid for differentiable fibre bundles with induced topologies. 


66.7.2 REMARK: The role of associated fibre bundles for porting parallelism. 

Associated fibre bundles are a mechanism for copying definitions of parallelism between fibre bundles. Thus 
parallelism can be defined on one fibre bundle and copied to many associated fibre bundles. A typical example 
of this is the copying of parallelism from the tangent vector bundle of a manifold to tangent tensor bundles 
of arbitrary type. A common example in the literature is copying parallelism from a principal fibre bundle 
to an ordinary fibre bundle. Since a connection on a fibre bundle is a differential representation of pathwise 
parallelism, connections may also be copied between associated fibre bundles, and this is the purpose of 
defining them. Associated parallelism for general topological fibre bundles is presented in Section 48.4. 


66.7.3 REMARK: Construction of associated fibre bundles. 

Most textbooks define associated fibre bundles in terms of a particular method of construction, namely the 
orbit-space construction method presented in Section 47.11. The orbit-space method constructs abstract 
ordinary fibre bundles from principal fibre bundles. Another construction method in the literature uses 
patchwork spaces as presented in Section 47.10. In practice, associated fibre bundles are rarely constructed 
by the methods in Definitions 66.7.10 and 66.7.12. Instead, they are constructed independently and an 
association map h is constructed between their atlases as in Definition 66.7.5. 


66.7.4 REMARK: Extension of association from topological to differentiable fibre bundles. 

Definitions 66.7.5 and 66.7.6 are almost identical to Definitions 47.9.5 and 47.9.7 respectively because regu- 
larity constraints are not specified in the definition of fibre bundle associations. The differences lie in the fact 
that topological fibre bundles specify topologies for each space whereas differentiable fibre bundles specify 
atlases to indicate regularity. By Theorem 21.12.8, Definition 66.7.5 (ii) may be expressed as: 


Yé, $2 € Ab, Vp € Us, N Uga, Vg € G, 
daly, = Lo o dlp, e h(ó2)|]g, = Lg o h($1)| g 


66.7.5 DEFINITION: A (differentiable) fibre bundle association (map) is a bijection h : A > AE between 
the fibre atlases of differentiable (G, F) and (G, F) fibre bundles (E, v, M, A5) and (E, 7, M, AE) respectively 
such that: 
(i) Ve € Ap, r(Dom(9)) = 7(Dom(h(¢))), 
(ii) Vo1,¢2 € AE, Vp € Us, WU 45, 942,61 (P) = Gh(do),h(41)(P), Where Ug denotes 7(Dom(¢)), and g, g denote 
the fibre chart transition functions for the respective fibre bundle atlases. 


Alternative name: fibre chart association (map). 


66.7.6 DEFINITION: . 
Associated differentiable fibre bundles are differentiable (G, F) and (G, F) fibre bundles (E, v, M, A5) and 


(E 7, M, AP) for which there is specified a differentiable fibre bundle association h : AZ, > AF. 


66.7.7 REMARK: Diagram of associated fibre bundles. 

In the same way that Definition 47.9.11 defines a pair of $C^0$-associated topological fibre bundles to be 
$C^0$-equivalent to a pair of associated topological fibre bundles, Definition 66.7.8 defines a pair of $C^k$-associated 
differentiable fibre bundles to be $C^k$-equivalent to a pair of associated differentiable fibre bundles. This is 
illustrated in Figure 47.9.2. 


66.7.8 DEFINITION: C* associated differentiable fibre bundles are differentiable fibre bundles &,6& with 
the same base space which are C^ equivalent to associated differentiable fibre bundles £5, £2 respectively. 


66.7.9 REMARK: Construction of associated differentiable fibre bundles with patchwork spaces. 
Definition 66.7.10 is the differentiable analogue of Definition 47.10.4. This defines a method of constructing 
associated fibre bundles with the help of patchwork spaces. 


66.7.10 DEFINITION: The patchwork associated C^ differentiable (G, F) fibre bundle for k € Zij of a given 
C* differentiable (G, F) fibre bundle (E, m, M, AZ) < (E, Ag, v, M, Am, AL), for C* Lie left transformation 


groups (G, F) < (G, Aa, F, Ar, o, u5) and (G, F) < (G, Aa, F, Ag, o, ub), is the C^ differentiable (G, F) 
fibre bundle (E, 7, M, AE) < (E, Ag, t, M, Am, AE) defined by: 
(i) E = {[(p,y,6)]; p e M,y € F, o € AE}, where [(p,y,4)] = {(p,94,0(p)y. d); ?' € AZ}, the 
transition maps g4;,9, : Ug, Us, — G are defined by Ly, , (y) = ¢2 © (bilr) and Dom(¢) = 


nx +(U¢) for @ € AE, 
(ii) 7: E — M is defined by st : [(p, y, 9)] > p, 
(iii) AL = {h(d); ó € AE), where h(ó) : &- (Us) > F is defined for ¢ € AE by h(¢) : [(p, y, )] ^ v. 


(iv) Ag — {pho pe Ati Im, j € Ip}, where Im and Ip are index sets for the atlases Am and Ap 
respectively, and the charts pin? : Dom(h(@)) — R'™ +", with ny = dim(M) and ng = dim(F), are 


defined for ó € Ab, i € Im and j € Ig by 


Vp € Ug, Vy € F, YE? (v, 9)]) = (M (9), v (y)), 


where YM € Am and m € Aj correspond to indices i € I, and j € Ip respectively. 


66.7.11 REMARK: Construction of associated differentiable fibre bundles with orbit spaces. 

Definition 66.7.12 is the differentiable analogue of Definition 47.11.5. This defines a method of constructing 
associated fibre bundles with the help of orbit spaces. The manifold atlases for the constructed total spaces 
in conditions 66.7.10 (iv) and 66.7.12 (iv) are essentially the same. 


The fine details of regularity of the original total space P in Definition 66.7.12 are lost in this method of 
construction because the atlas $A_E$ does not depend in any way on the original atlas $A_P$. (The same loss of 
regularity information from the original total space occurs in Definition 66.7.10.) In fact, it does not seem to 
be possible to retain such information in general since the spaces F and G may be very different. The only 
hope for retaining such information is to retain some of the irregularity with respect to the base space M, 
but this must be communicated through the fibre charts somehow. 


The total space $E$ in Definition 66.7.12 is often denoted as $(P \times F)/G$ or $P \times_G F$. Definition 66.7.12 is 
illustrated in Figure 66.7.1. 


Figure 66.7.1 Associated differentiable fibre bundle, orbit-space construction method (panels: principal fibre bundle, associated fibre bundle) 


66.7.12 DEFINITION: The orbit-space associated differentiable $(G, F)$ fibre bundle with $C^k$ structure group 
$(G, F) < (G, A_G, F, A_F, \sigma, \mu^F_G)$ for a given $C^k$ principal $G$-bundle $(P, \pi_P, M, A^G_P) < (P, A_P, \pi_P, M, A_M, A^G_P)$ 
for $k \in \mathbb{Z}^+_0$ is the differentiable $(G, F)$ fibre bundle $(E, \pi_E, M, A^F_E) < (E, A_E, \pi_E, M, A_M, A^F_E)$ defined by: 
(i) $E = \{[(z,y)];\ z \in P,\ y \in F\}$, where for all $(z,y) \in P \times F$, 
$[(z,y)] = \{(z',y') \in P \times F;\ \pi_P(z') = \pi_P(z) \text{ and } \exists \phi \in A^G_{P,\pi_P(z)},\ \phi(z')y' = \phi(z)y\}$. 
(ii) $\pi_E : E \to M$ is defined by $\pi_E : [(z,y)] \mapsto \pi_P(z)$, 
(iii) $A^F_E = \{h(\phi);\ \phi \in A^G_P\}$, where $h(\phi) : \pi_E^{-1}(U_\phi) \to F$ is defined for $\phi \in A^G_P$ by $h(\phi) : [(z,y)] \mapsto \phi(z)y$, 
(iv) $A_E = \{\psi^{h(\phi)}_{i,j};\ \phi \in A^G_P,\ i \in I_M,\ j \in I_F\}$, where $I_M$ and $I_F$ are index sets for the atlases $A_M$ and $A_F$ 
respectively, and the charts $\psi^{h(\phi)}_{i,j} : \mathrm{Dom}(h(\phi)) \to \mathbb{R}^{n_M + n_F}$, with $n_M = \dim(M)$ and $n_F = \dim(F)$, are 
defined for $\phi \in A^G_P$, $i \in I_M$ and $j \in I_F$ by 


$\forall z \in \pi_P^{-1}(U_\phi),\ \forall y \in F,\ \psi^{h(\phi)}_{i,j}([(z,y)]) = (\psi^M_i(\pi_P(z)),\ \psi^F_j(\phi(z)y))$, 

where $\psi^M_i \in A_M$ and $\psi^F_j \in A_F$ correspond to indices $i \in I_M$ and $j \in I_F$ respectively. 


((2019-4-9. The objective of Remark 66.7.13 is to interpret each part of Definition 66.7.12 in the relatively 
simple case of a single global fibre chart. Theorem 20.5.13 should be of some relevance to this because 
the fibre sets of P are effectively identified with G. Show that the fibre sets of F are diffeomorphic to F. 
Maybe should give a theorem which asserts that Definition 66.7.12 satisfies the requirements of an abstract 
associated fibre bundle. )) 


66.7.13 REMARK: Orbit-space associated fibre bundles for a singleton fibre atlas. 

One way to descramble Definition 66.7.12 is to consider the case of a singleton fibre atlas. Suppose that 
$A^G_P = \{\phi_0\}$ for some fibre chart $\phi_0$. Then $\mathrm{Dom}(\phi_0) = P$ by Definition 64.8.3 (iii). So $\pi_P \times \phi_0 : P \to M \times G$ 
is a $C^k$ diffeomorphism by Definition 64.8.3 (ii). Therefore 


$\forall (z,y) \in P \times F,\ [(z,y)] = \{(z',y') \in P \times F;\ z' \in P_{\pi_P(z)} \text{ and } y' = \phi_0(z')^{-1}\phi_0(z)y\}$. 
({ 2019-4-9. To be continued ... )) 


((2019-4-9. To further clarify Definition 66.7.12, give the example M = S? with a two-chart fibre atlas, where 
G is equal to GL(2) or SO(2). Then let F = IR?. It would be nice to have a diagram of this. )) 


66.8. Differentiable short-cut orbit-space associated cross-sections 


66.8.1 REMARK: Short-cut versions of $C^k$ orbit-space associated fibre bundle cross-sections. 

Definition 66.8.2 and Notation 66.8.3 are adaptations of Definition 47.12.3 and Notation 47.12.4 from 
topological to $C^k$ contravariant fibre-space-valued principal bundle functions on a $C^k$ principal bundle total 
space $P$. Theorem 66.8.4 is the differentiable version of the corresponding Theorem 47.12.6 for topological 
principal bundles. 


66.8.2 DEFINITION:  Contravariant fibre-space-valued principal bundle functions. 
A contravariant principal bundle function on a $C^k$ principal bundle $(P, \pi, B, A^G_P)$ for a fibre space $F$, where 
$(G, F)$ is a $C^k$ Lie left transformation group, is a function $Y : P \to F$ which satisfies 


$\forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)$. 


66.8.3 NOTATION: Differentiable short-cut orbit-space associated cross-section space. 
$X^k((P \times F)/G)$, for a $C^k$ principal $G$-bundle $P$ and a $C^k$ Lie left transformation group $(G, F)$, where $k \in \mathbb{Z}^+_0$, 
denotes the set of $C^k$ contravariant $F$-valued principal bundle functions on $P$. In other words, 


$\forall k \in \mathbb{Z}^+_0,\ X^k((P \times F)/G) = \{Y \in C^k(P, F);\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$. 

Alternative notation: $X^k(P \times_G F)$. 
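
The contravariance rule can be illustrated with a small computation. The following is a minimal sketch, assuming a trivial bundle $P = \mathbb{R} \times GL(2, \mathbb{R})$ with right action $(p, a)g = (p, ag)$ and fibre space $F = \mathbb{R}^2$ with the usual matrix action. The particular function $Y(p, a) = a^{-1} f(p)$ and all names below are hypothetical choices made for this illustration.

```python
# Sketch of a contravariant F-valued function on a trivial principal bundle.
# Assumptions: P = R x GL(2,R), F = R^2, and Y(p, a) = a^-1 f(p).

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def act(a, y):
    """Left action of the matrix a on the vector y in R^2."""
    return [a[0][0] * y[0] + a[0][1] * y[1],
            a[1][0] * y[0] + a[1][1] * y[1]]

def right_action(z, g):
    """Right action of g on z = (p, a), giving (p, a g)."""
    p, a = z
    return (p, matmul(a, g))

def base_values(p):
    """An arbitrary F-valued function f on the base space."""
    return [p, 1.0]

def contravariant_function(z):
    """Y(p, a) = a^-1 f(p), which satisfies Y(z g) = g^-1 Y(z)."""
    p, a = z
    return act(inverse(a), base_values(p))

z = (0.5, [[2.0, 1.0], [0.0, 1.0]])
g = [[1.0, 3.0], [1.0, 4.0]]

lhs = contravariant_function(right_action(z, g))
rhs = act(inverse(g), contravariant_function(z))
```

A non-abelian group is used here so that the order of the factors in $(ag)^{-1} = g^{-1}a^{-1}$ is visible in the check.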


66.8.4 THEOREM: Bijection from contravariant PFB functions to orbit-space associated cross-sections. 
Let $(P, \pi_P, B, A^G_P)$ be a $C^k$ principal bundle with $k \in \mathbb{Z}^+_0$. Let $(G, F)$ be a $C^k$ left transformation group. 
Let $(E, \pi_E, B, A^F_E)$ be the orbit-space associated $C^k$ differentiable $(G, F)$ bundle for $(P, \pi_P, B, A^G_P)$. Let 
$S^k = \{Y \in C^k(P, F);\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$. Define $\rho : S^k \to X^k(E, \pi_E, B)$ by 

$\forall Y \in S^k,\ \forall b \in B,\ \rho(Y)(b) = \{(z, Y(z));\ z \in \pi_P^{-1}(\{b\})\}$. 


Then $\rho : S^k \to X^k(E, \pi_E, B)$ is a well-defined bijection, and its inverse satisfies 


$\forall X \in X^k(E, \pi_E, B),\ \forall z \in P,\ \rho^{-1}(X)(z) = X(\pi_P(z))(z)$ 

and 


$\forall X \in X^k(E, \pi_E, B),\ \forall b \in B,\ \rho^{-1}(X)|_{P_b} = X(b)$. 


Proor: LetY € S^andb e B. Let zo € P, = np! ((b]). Then P, = (zog; g € G} by Theorem 66.2.12 (viii). 


i (GG, Y (2); z € ng ((9)) = {(2, Y (2); z € (2089€ Gh} 
= {(209, Y (209)): 9 € G} 
= {(z09,9 !Y (29); g € G} 
= ((2og  ,gY (29); g € G} 
= [(2o, Y (29))] € Es 
by Definition 66.7.12 (i, ii). 
((2022-12-1. To be continued ... )) 


Thus $\mathrm{Range}(\phi) \subseteq X^k(E, \pi_E, B)$. Now let $X \in X^k(E, \pi_E, B)$. Then for all $b \in B$, $X(b) = [(z, f)]$ for some
$z \in P$ and $f \in F$ by Definition 66.7.12 (i), and $\pi_E(X(b)) = b$ implies that $z \in P_b$. By Theorem 47.11.10 (v),
$[(z, f)]$ is then a well-defined function from $P_b$ to $F$ for each $b \in B$. So the function $Y : P \to F$ is well defined
by $\forall z' \in P,\ Y(z') = X(\pi_P(z'))(z')$. But Theorem 47.11.10 (viii) then implies that $Y(z'g) = g^{-1} Y(z')$ for
all $g \in G$, for all $z' \in P$. Therefore $Y \in S^k$. Thus $\mathrm{Range}(\phi) \supseteq X^k(E, \pi_E, B)$. Consequently $\mathrm{Range}(\phi) =
X^k(E, \pi_E, B)$. In other words, $\phi : S^k \to X^k(E, \pi_E, B)$ is a surjection.

To show that $\phi$ is injective, let $Y_1, Y_2 \in S^k$ satisfy $\phi(Y_1) = \phi(Y_2)$. Then $Y_1|_{P_b} = Y_2|_{P_b}$ for all $b \in B$. So
$Y_1 = Y_2$. Thus $\phi$ is injective. Hence $\phi : S^k \to X^k(E, \pi_E, B)$ is a well-defined bijection.
To verify the formulas for $\phi^{-1}$, let $X \in X^k(E, \pi_E, B)$. Define $Y : P \to F$ by $\forall z \in P,\ Y(z) = X(\pi_P(z))(z)$.
Then $Y \in S^k$ by Theorem 47.11.10 (v, vii) as above. Let $b \in B$. Then $\phi(Y)(b) = Y|_{P_b} = X(b)$. So $\phi(Y) = X$.
So $Y = \phi^{-1}(X)$. Hence $\phi^{-1}(X)(z) = X(\pi_P(z))(z)$ for all $z \in P$ and $\phi^{-1}(X)|_{P_b} = X(b)$ for all $b \in B$.


((2022-12-1. To be continued ... ))
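
The bijection in Theorem 66.8.4 can be checked by brute force in a finite toy model, where differentiability plays no role. The following is only a sketch under stated assumptions: the group $\mathbb{Z}_3$, the two-point base space, the four-point fibre space and all function names are hypothetical choices made for illustration and do not come from the text.

```python
from itertools import product

N = 3                                  # G = Z_3, written additively (assumed toy group)
G = range(N)
B = [0, 1]                             # two-point base space (assumption)
F = [0, 1, 2, 3]                       # four-point fibre space (assumption)
P = [(b, h) for b in B for h in G]     # trivial principal bundle P = B x G

def act_P(z, g):
    """Right action of G on P: (b, h) g = (b, h + g)."""
    return (z[0], (z[1] + g) % N)

def act_F(g, f):
    """Left action of G on F: cycles 0, 1, 2 and fixes 3."""
    return (f + g) % N if f < N else f

def inv(g):
    """Group inverse in Z_3."""
    return (-g) % N

def proj(z):
    """Projection map from P to B."""
    return z[0]

def orbit(z, f):
    """Orbit [(z, f)] of the right action (z, f) g = (z g, g^{-1} f) on P x F."""
    return frozenset((act_P(z, g), act_F(inv(g), f)) for g in G)

def is_contravariant(Y):
    """Test the rule Y(z g) = g^{-1} Y(z) for all z in P and g in G."""
    return all(Y[act_P(z, g)] == act_F(inv(g), Y[z]) for z in P for g in G)

# S: all contravariant functions Y : P -> F, found by exhaustive search.
S = [Y for Y in (dict(zip(P, vals)) for vals in product(F, repeat=len(P)))
     if is_contravariant(Y)]

# E[b]: the fibre over b of the orbit space (P x F)/G.
E = {b: {orbit(z, f) for z in P if proj(z) == b for f in F} for b in B}

def phi(Y):
    """phi(Y)(b) = {(z, Y(z)); z in P_b}, as in the theorem."""
    return {b: frozenset((z, Y[z]) for z in P if proj(z) == b) for b in B}

def phi_inv(X):
    """phi^{-1}(X)(z) = X(proj(z))(z), reading each orbit as a function graph on P_b."""
    return {z: dict(X[proj(z)])[z] for z in P}
```

In this model each value `phi(Y)[b]` is an element of `E[b]`, so `phi(Y)` is a cross-section of the orbit space, and `phi_inv(phi(Y))` recovers `Y`. The exhaustive search is feasible only because the model is tiny.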


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




Copyright (C) 2022, Alan U. Kennington, All Rights Reserved. [...] this book draft for personal use. Public redistribution of this book draft in electronic or printed form is forbidden. You may not charge [...]




Chapter 67 


CONNECTIONS ON ORDINARY FIBRE BUNDLES 


67.1 History of parallel transport and connections
67.2 Specification styles for differential parallelism
67.3 Reconstruction of parallel transport from its differential
67.4 Horizontal lift functions on differentiable fibrations
67.5 Horizontal lift functions on ordinary fibre bundles
67.6 Connection generator functions
67.7 Differentiability of horizontal lift functions
67.8 Transposed horizontal lift functions
67.9 Horizontal component maps and horizontal subspaces
67.10 Vertical component maps
67.11 Connection definition conversions for ordinary bundles
67.12 Associated connections on ordinary fibre bundles


67.1. History of parallel transport and connections 


67.1.1 REMARK: Levi-Civita’s parallel transport became Hermann Weyl’s “affine connection”. 
Historically, affine connections on tangent bundles (the subject of Chapter 71) came before general connections
on differentiable fibre bundles (the subject of Chapters 67, 68, 69 and 70). Therefore the historical
origins of connections must be sought first for affine connections.


Affine connections are closely associated with the concept of parallel transport of tangent vectors. The first 
publication of the concept of parallel transport of vectors in a Riemannian manifold is generally agreed to 
be a 1917 paper by Levi-Civita [187]. (Covariant differentiation had been published much earlier, in a 1900
paper by Ricci-Curbastro and Levi-Civita [194].) In the 1917 paper, Levi-Civita arrived at the equations 
for parallel transport by adopting normal coordinates in a Riemannian metric space which is embedded in 
a Euclidean space. He showed that the orthogonal transport along a curve is an intrinsic property of the 
manifold, independent of the embedding, and that the transport equation may be expressed in terms of 
a Christoffel array. (This kind of array had been introduced in 1869 by Christoffel [175] for a related but 
different purpose. See Remark 67.1.4.) Then he showed that length-minimising geodesics are self-parallel in 
terms of this definition of parallel transport. 


Levi-Civita did not express parallelism in terms of a connection between a point and its neighbourhood in 
the 1917 paper, nor in his 1925 book (Levi-Civita [26]), even though he was fully aware of Hermann Weyl's 
affine connection concept by 1925. Levi-Civita seems to have expressed the concept of parallelism only in 
terms of transport along curves, not as the more abstract connection concept. 


The concept of a connection apparently originates with Hermann Weyl, who emphasised the idea of an
“affinely connected manifold”. He described three levels of connection on a manifold, namely the continuous
connection, the affine connection and the metric connection. (See Weyl [310], pages 94, 104, 113, 124. See also
Remark 49.2.10 for further details of his “three-storey building” metaphor for the three connection layers.)
Each of these three layers was, in his view, a kind of connection between a point and its neighbourhood in
the same sense that local connectedness and neighbourhoods are defined in general topology. Weyl strongly
asserted the value of this intermediate layer in the understanding of the physical space of general relativity.
The philosophical implications were substantial in a time when physics was abandoning the well-known
Euclidean geometry in favour of the very unfamiliar non-Euclidean geometries, in particular Riemannian
geometry, which Weyl pointed out is inconveniently inhomogeneous. (See Weyl [310], pages 98-104.)


67.1.2 REMARK: Levi-Civita’s parallel transport in Riemannian manifolds. 

In the case of a Riemannian manifold (in Chapter 73), the Levi-Civita connection (in Chapter 74) may be 
calculated from the metric tensor field. The metric tensor field may be calculated by differentiating the 
point-to-point distance function. The distance function has a clear intuitive geometric meaning, but it is 
not intuitively obvious how parallel transport should be calculated from the metric. The formula for parallel 
transport was first pointed out by Levi-Civita [187] in 1917, long after the formula for geodesics in terms 
of the Christoffel array had been well known. (See for example the 1900 article by Ricci/Levi-Civita [194], 
and the 1902 book by Bianchi [204], page 334.) An earlier attempt to define parallel translation of vectors 
had been made by Clifford. (The Clifford parallelism is described by Bianchi [204], pages 448-454, and 
Levi-Civita [187] in [225], pages 16-18.) 

Levi-Civita [26], pages 100-111, explained a method for deriving parallel translation of a vector along a curve 
on an embedded surface by approximating it with a “developable surface” at each point of the curve, then 
transporting a vector along the approximating surface, and finally projecting this transported vector onto 
the given surface. He extended the definition of parallelism to general intrinsic Riemannian geometry by 
thinking of the manifold as being embedded in a higher-dimensional ambient space. (See Levi-Civita [26], 
pages 137-141.) At more than one point, he mentioned the similarity between geometric parallelism and the 
concept of “virtual work”, hinting at the mechanical significance. (See Levi-Civita [26], pages 103, 107.) 


In a footnote, Levi-Civita [26], page 102, suggests that parallelism for an embedded surface may be determined 
by rolling it along a plane. 
A simple and so to speak automatic way of constructing parallel directions is to roll the surface $\sigma$
along a plane.


A more modern version of this is offered by Spivak in regard to covariant derivatives. (See Spivak [37], 
Volume 2, page 211.) He asks the question: “What does the covariant derivative really mean?”, and 
responds that a partial answer is given by the fact that the covariant derivative is the same as “ordinary 
partial derivatives” in Riemannian normal coordinates. Normal coordinates are a pretty good approximation 
to rolling the surface along a plane! (If one does in fact “roll a surface along a plane”, there will be some 
danger of rotational slippage. This corresponds to torsion of the connection. See Example 71.12.6 and 
Figure 71.12.2.) 


Thus in a Riemannian manifold, one obtains the differential of parallel translation at each point along a 
curve by setting up Riemannian normal coordinates at that point and setting the differential equal to the 
flat-space parallelism definition in the local coordinate chart. In this way, the differential of parallel transport 
is defined at each point, and the parallel transport itself is calculated along a curve by integration. 


In Chapters 67, 68 and 69, no Riemannian metric is assumed. (Any Riemannian metric which is defined is 
ignored.) In the absence of a Riemannian metric to provide a Levi-Civita connection, some other kind of 
connection must be defined. 


The simplest way to generate parallelism is by integrating a connection. Then it is guaranteed that the 
differential of the parallel transport will be the given connection, and that the integral of the connection 
will be the computed parallel transport. If one attempts to specify parallelism for all curves without first 
specifying a connection, it is difficult to guarantee that differentiating and then integrating will reconstruct 
the specified parallelism. 
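
As a minimal numerical sketch of obtaining parallel transport by integrating a connection, assume the unit sphere with colatitude and longitude coordinates $(\theta, \phi)$ and its standard Christoffel array. The transport equation $dV^i/dt = -\Gamma^i_{jk}\,\dot{x}^j V^k$ is integrated once around a circle of constant colatitude. The function names, the choice of curve and the Runge-Kutta step count are illustrative assumptions, not part of the text.

```python
import math

def christoffel(theta):
    """Christoffel array G[i][j][k] of the unit sphere at colatitude theta.
    Index 0 is theta (colatitude) and index 1 is phi (longitude)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def rhs(x, xdot, V):
    """Right side of the transport equation dV^i/dt = -G^i_jk xdot^j V^k."""
    G = christoffel(x[0])
    return [-sum(G[i][j][k] * xdot[j] * V[k] for j in range(2) for k in range(2))
            for i in range(2)]

def transport_around_latitude(theta0, V0, steps=2000):
    """Integrate the transport equation once around the circle theta = theta0,
    using the classical fourth-order Runge-Kutta method with phi as parameter."""
    h = 2.0 * math.pi / steps
    # theta is constant on this curve and the array does not depend on phi,
    # so the base point x does not need to be updated inside the loop.
    x, xdot = [theta0, 0.0], [0.0, 1.0]
    V = list(V0)
    for _ in range(steps):
        k1 = rhs(x, xdot, V)
        k2 = rhs(x, xdot, [V[i] + 0.5 * h * k1[i] for i in range(2)])
        k3 = rhs(x, xdot, [V[i] + 0.5 * h * k2[i] for i in range(2)])
        k4 = rhs(x, xdot, [V[i] + h * k3[i] for i in range(2)])
        V = [V[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
             for i in range(2)]
    return V

# Transport the coordinate vector (1, 0) around the circle of colatitude pi/3.
V_end = transport_around_latitude(math.pi / 3, [1.0, 0.0])
```

For colatitude $\pi/3$, the exact solution is a rotation through the angle $2\pi\cos(\pi/3) = \pi$ relative to the coordinate frame, so `V_end` should be close to $(-1, 0)$. The transported vector does not return to its initial value, which is the holonomy of the connection around the closed curve.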


Reconstructing parallelism from its differential is the subject of Section 67.3. In practice one typically does 
not compute the connection from the parallel transport along all curves. The connection of a notional parallel 
transport is typically computed first from the geometry or physics of infinitesimal translations (as in the case 
of the rolling surfaces). Then parallel transport is obtained by integrating this. However, in some simple 
geometries, the affine geodesics may be determined from symmetry arguments, and a torsion-free connection 
may then be computed from the geodesics.


67.1.3 REMARK: Parallel transport and the conservation of angular momentum. 
The conservation of linear momentum distinguishes the special class of curves which are (affine) geodesic. 
Light rays, in particular, follow paths which obey the conservation of linear momentum, and light rays are
our primary means of perceiving geometry. Linear momentum effectively induces a differentiable structure 
on real-world geometry which would be only a topological structure without momentum. 


In the same way that the conservation of linear momentum gives concrete real-world meaning to velocities 
in tangent bundles (by singling out the trajectories which have finite velocity), the conservation of angular 
momentum gives concrete real-world meaning to parallel transport. If one mounts a circular disc on a 
frictionless axle at the equator of a hypothetical perfectly spherical non-rotating planet, and if this disc is 
initially at rest when commencing a journey in a straight line (i.e. geodesic) to the north pole, keeping the 
axle always vertical, the northernmost point of the disc will still point north as one arrives at the north pole. 
The geometric notion of parallel transport corresponds to the conservation of angular momentum. Without 
the conservation of angular momentum, discs on vertical axles could rotate arbitrarily without any force 
acting on them. 


Although Levi-Civita explained parallel transport in terms of rolling a manifold on a flat surface, this does 
not exclude the possibility of twisting or torsion while rolling the manifold. It is the conservation of angular 
momentum which distinguishes true parallel transport from twisting parallel transport. 


67.1.4 REMARK: Christoffel’s 1869 article on the conditions for developable surface transformations. 
The two Christoffel arrays originated in an 1869 article by Christoffel [175]. He wrote that the investigations
in the article arose from the problem of the most general way to transform a surface without changing a
distance function on the surface. The following appears in [175], page 47.


[...] die gegenwärtigen Untersuchungen ursprünglich durch die Ausdehnung des Problems der
aufeinander abwickelbaren Flächen auf Gebiete von n Dimensionen veranlasst worden sind.


This may be translated as follows. 
[...] the present investigations arose from the extension to regions of n dimensions of the problem 
of surfaces developable into each other. 
The Riemannian manifold distance context is only mentioned in the final paragraph of the article [175], 
page 70, as follows. 
Ueber die beim vorigen Lehrsatze ausgeschlossenen Differentialausdrücke F, zu denen unter andern
das Quadrat des Linienelementes im Raume von drei Dimensionen gehört, liegt eine Abhandlung
aus dem Nachlasse Riemanns *) vor, zu welcher Herr Dedekind die dort unterdrückten analytischen
Entwicklungen in Aussicht gestellt hat.


*) Ueber die Hypothesen, welche der Geometrie zu Grunde liegen. Abh. der Göttinger Ges. d. W.
vom Jahre 1867, Band XIII.


This may be translated as follows.


Regarding the differential expressions F which are excluded in the above theorem, to which belongs, 
among other things, the square of the line element in a space of three dimensions, a treatise from 
the estate of Riemann *) is available, where Mr. Dedekind has put in prospect the analytical 
developments held back there. 


*) Ueber die Hypothesen, welche der Geometrie zu Grunde liegen. Abhandlungen der Göttinger
Gesellschaft der Wissenschaften, 1867, Volume XIII. 


The “above theorem” here doubtless refers to the long, complicated theorem in [175], pages 68-69, which 
presents some specialised existence assertions for IR?. This article gives an interesting perspective of the 
state of differential geometry at the time. Riemann had died on 20 July 1866. His famous habilitation thesis 
was delivered on 10 June 1854, but was first published in the Transactions of the Academy of Sciences in 
Göttingen in 1868. (See Riemann/Weyl [230], pages III-IV. Dedekind was an editor of that posthumous
publication.) Christoffel's article was apparently submitted on 3 January 1869. 


Christoffel's two arrays were not originally thought of as the coefficients of affine connections, nor even related
directly to any concept of parallelism. They were applied much later by Levi-Civita (1917) and Weyl (1919)
to describe parallel translation (the Levi-Civita connection) in Riemannian manifolds. (See Levi-Civita [187]
in [225], pages 1-39; Weyl [310], pages 88-94; Levi-Civita [26], pages 107-114.) Christoffel's two arrays (of
the “first kind” and the “second kind”) arise inevitably from any attempt to tensorise arrays of second-order
derivatives. (See Section 60.4 for tensorisation of second-order derivative operators.)


67.2. Specification styles for differential parallelism 


67.2.1 REMARK: Alternative representations for connections. 

Some styles of representation of connections are listed in Table 67.2.1. (Strictly speaking, the very general 
“pathwise parallelism” style is not a connection because it is defined on a topological fibre bundle. It is 
included for comparison. See Table 69.15.1 in Remark 69.15.1 for ten representations of connections.) 


representation              bundle type       info style    notation        reference
pathwise parallelism        topological       parallelism   $\Theta_\gamma$ 48.3
horizontal lift function    differentiable    parallelism   $\theta_V(z)$   67.5
horizontal lift transposed  differentiable    parallelism   $\theta_z(V)$   67.8
horizontal component map    differentiable    parallelism   $h_z$           67.9
horizontal subspace map     differentiable    parallelism   $Q_z$           67.9
vertical component map      differentiable    difference    $v_z$           67.10
Christoffel array           vector            difference    $\Gamma$        68.1.9
covariant derivative        vector            difference    $D_V Y$         68.2
connection form             principal         difference    $\omega_z$      69.5
gauge potential             principal         difference    $A$             69.11
Koszul connection           tangent           difference    $D_X Y$         71.6
Cartan moving-frame         principal frame   difference    x               71.14
affine geodesics            tangent           parallelism   $\gamma$        72.1
Table 67.2.1 Representations of parallelism


The styles with “tangent” bundle type are defined only for affine connections on tangent bundles, not for 
general connections on differentiable fibre bundles. The “connection form” and “gauge potential” are defined 
only on differentiable principal fibre bundles. The Cartan-style moving-frame connection is a specialisation 
of the gauge potential to principal frame bundles. 


Representations of connections have two “information styles”. They may either indicate parallel translation 
for given curves or velocities, or they may indicate the difference between a given translation and parallel 
translation. These are indicated respectively as “parallelism” and “difference” in the “info style” column. 
Thus a “difference” representation yields the value zero for translation which is parallel. 


The “difference” styles of connection representation have the form “actual motion minus parallel motion”. 
So formulas expressed in the “difference” and “parallelism” styles often have perplexing opposite signs in 
one or more terms. This is a consequence of the “minus” sign in the “difference” styles. In much of the 
literature, the word “connection” is identified with a covariant derivative or a connection form. So some 
formulas, especially involving curvature, acquire a minus sign. Therefore one must always be aware which 
style of connection is being used when comparing formulas from different literature sources. 
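
The sign reversal between the two styles can be seen in a single local formula. The following LaTeX sketch assumes a vector bundle with a local trivialisation, base coordinates $x^j$, fibre coordinates $f^i$ and a Christoffel array $\Gamma^i_{jk}$ with the usual sign convention. These symbols and this convention are assumptions introduced here for illustration.

```latex
% "Parallelism" style: the horizontal lift of the base vector V at the state f.
\theta_V(f) \;=\; V^j \frac{\partial}{\partial x^j}
            \;-\; \Gamma^i_{jk}\, V^j f^k \,\frac{\partial}{\partial f^i}

% "Difference" style: the actual motion of a cross-section Y in the direction V
% minus the parallel motion \theta_V(Y), which leaves a vertical vector with
% fibre components
(D_V Y)^i \;=\; V^j \frac{\partial Y^i}{\partial x^j}
          \;+\; \Gamma^i_{jk}\, V^j Y^k
```

Subtracting the parallel motion turns the minus sign in the horizontal lift into the plus sign in the covariant derivative, and $D_V Y = 0$ exactly when the motion of $Y$ in the direction $V$ is parallel.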


67.2.2 REMARK: The wide variety of structures which are called a “connection” in the literature. 
Table 67.2.2 is a mini-survey of styles of representation of connections in the differential geometry literature. 


A large proportion of these authors additionally define connection forms, horizontal lifts and other structures 
which are equi-informational to their main definition of a connection. (See Remark 69.5.1 for some authors 
who additionally define connection forms for principal bundles or frame fields.) However, Table 67.2.2 shows 
the structure which the indicated references either explicitly define by name as a “connection”, or very 
strongly hint that the structure is what they call a “connection”. (The term “cross-section” is abbreviated 
here to “section” to save space. The indication “Koszul” in parentheses indicates that covariant derivatives 
are defined in the Koszul style with respect to vector fields instead of pointwise with respect to vectors.) 


Some authors in this survey define the connection as a point-to-point parallelism relation between two tangent 
spaces, where the parallelism is carried by a path. However, most authors define the connection as some 
year  reference                      connection definition
1918  Weyl [310]                     Christoffel array
1925  Levi-Civita [26]               Christoffel array
1949  Synge/Schild [41]              Christoffel array
1950  Struik [39]                    Christoffel array
1959  Kreyszig [22]                  Christoffel array
1959  Willmore [42]                  Christoffel array
1962  Lawden [283]                   Christoffel array
1963  Flanders [11]                  connection form array for frame fields (Cartan)
1963  Guggenheimer [16]              parallel transport of vectors along paths
1963  Kobayashi/Nomizu [19]          horizontal subspace of principal bundle tangent space
1964  Bishop/Crittenden [2]          horizontal subspace of principal bundle tangent space
1965  Postnikov [33]                 covariant derivative of vector fields by vector fields (Koszul)
1968  Bishop/Goldberg [3]            covariant derivative of vector fields by tangent vectors
1968  Choquet-Bruhat [6]             (1) covariant derivative of vector fields (absolute differential)
                                     (2) horizontal lift function for principal bundle
1970  Misner/Thorne/Wheeler [292]    [covariant derivative by tangent vectors; and Christoffel array]
1970  Spivak [37], Volume 2          Christoffel array
                                     covariant derivative of vector fields by vector fields (Koszul)
                                     connection form array for frame fields (Cartan)
1972  Sulanke/Wintgen [40]           horizontal subspace of principal bundle tangent space
1975  Cheeger/Ebin [5]               covariant derivative of vector fields by vector fields (Koszul)
1975  Lovelock/Rund [27]             covariant derivative of vector fields by vector fields (Koszul)
1977  Drechsler/Mayer [262]          (1) horizontal subspace of principal bundle
                                     (2) horizontal lift function for principal bundle
1979  Do Carmo [9]                   covariant derivative of vector fields by vector fields (Koszul)
1980  EDM2 [1183]                    horizontal subspace of principal bundle tangent space
1980  Daniel/Viallet [317]           horizontal subspace of principal bundle tangent space
1980  Schutz [36]                    parallel transport of vectors along paths
1981  Bleecker [254]                 horizontal subspace of principal bundle tangent space
1981  Poor [32]                      (1) horizontal subspace of principal bundle tangent space
                                     (2) covariant derivative of vector fields by vector fields (Koszul)
1986  Crampin/Pirani [7]             parallel transport of vectors along paths
1987  Gallot/Hulin/Lafontaine [13]   covariant derivative of vector fields by vector fields (Koszul)
1994  Darling [8]                    covariant derivative of vector bundle sections by vector fields (Koszul)
1996  Goenner [270]                  covariant derivative of vector fields by tangent vectors
1997  Frankel [12]                   covariant derivative of vector bundle sections by vector fields (Koszul)
1997  Lee [24]                       covariant derivative of vector bundle sections by vector fields (Koszul)
1998  Petersen [31]                  (1) covariant derivative of vector fields by vector fields (Koszul)
                                     (2) covariant derivative for vector bundle sections
1999  Lang [23]                      horizontal lift function for vector bundle
1999  Rebhan [299]                   [Christoffel array]
2004  Szekeres [305]                 covariant derivative of vector fields by vector fields (Koszul)
2005  Penrose [297]                  covariant derivative of vector fields by tangent vectors
2007  Morgan/Tian [29]               covariant derivative of vector fields by vector fields (Koszul)
2012  Sternberg [38]                 horizontal subspace of principal bundle tangent space
2015  Gómez-Ruiz [14]                (1) covariant derivative map $\nabla : X^k(T(M)) \to X^k(T^{1,1}(M))$
                                     (2) covariant derivative of vector fields by vector fields (Koszul)
      Kennington                     horizontal lift function for ordinary fibre bundle
Table 67.2.2
kind of differential of parallelism or a covariant derivative. About half of this sample are differentials of
parallelism, while the other half are covariant derivatives (i.e. the difference between a given vector field and
a parallel vector field).


The connection styles differ also according to the choice of active component, which may be a vector at
a point or a vector field, and the choice of passive component, which may be a vector field or a general
vector bundle cross-section. A more abstract approach is to define the horizontal subspace or the horizontal lift
function. These are sometimes encountered in the context of general fibre bundles.


Some authors do not use the word “connection” for any particular mathematical structure, preferring instead 
to use covariant derivatives or the Christoffel array without calling them connections. (The definitions of 
these authors are indicated in square brackets in Table 67.2.2.) 


67.2.3 REMARK: The choice of the best representation for the connection concept. 

Some of the factors to take into consideration in choosing a standard structure to be called “the connection” 
are popularity, ease of conversion into and out of other representations, generality, intuitive clarity, and 
convenience for calculation and manipulation. 


67.3. Reconstruction of parallel transport from its differential 


67.3.1 REMARK: A connection is a differential of a parallel transport. 

A connection is a differential of some kind of parallel transport along curves in a differentiable manifold. 
(In the earlier history of differential geometry, a “connection” often meant the parallel transport itself, not 
its differential.) Importantly, the parallel transport must be exactly reconstructible from the differential by 
integrating it. If parallel transport can be reconstructed from its differential, then clearly the differential 
contains all of the information in the curve-wise parallel transport. This is analogous to the way in which a 
Riemannian metric contains all of the information in the corresponding point-to-point distance function. 


The fundamental assumption of the concept of a connection on an ordinary fibre bundle $(E, \pi, B)$ is that
the derivative of a “lifted” curve $\hat\gamma$ depends only on the location of the point $p$ on the original curve $\gamma$, the
velocity $V$ of $\gamma$ at $p$, and the "state" $z \in E$ of the lifted curve $\hat\gamma = \Theta(\gamma)$ corresponding to the point $p \in B$.
This is illustrated in Figure 67.3.1.


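[Figure: a lifted curve $\hat\gamma$ in $E$ with states $z_1 = \hat\gamma(t_1)$ and $z_2 = \hat\gamma(t_2)$, above a base-space curve $\gamma$ in $B$ with points $p_1 = \gamma(t_1)$ and $p_2 = \gamma(t_2)$]
Figure 67.3.1 Parallelism lift function for an ordinary fibre bundle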


In Figure 67.3.1, the lifted vector at $z_1$ is assumed to depend only on the point $p_1$, the velocity $V_1$ of $\gamma$ at $p_1$,
and the state $z_1 = \hat\gamma(t_1)$ of the lifted curve $\hat\gamma = \Theta(\gamma)$ for the curve parameter $t_1 \in \mathrm{Dom}(\gamma) = \mathrm{Dom}(\hat\gamma)$. If
the dependency of the lifted curve on the base-space curve cannot be constrained in this way, then a more
general concept of a connection must be considered.


If the lifted velocity does depend only on the base-curve velocity and the state of the lifted curve at each
point, then the full lifted curve can be obtained from a single state value $z_1 = \hat\gamma(t_1)$ by solving a first order
differential equation for the given base-space curve $\gamma$. Thus the formulation of the lift of a curve $\gamma$ as the
value $\hat\gamma = \Theta(\gamma)$ of a lift-map $\Theta : \mathcal{C}^1(B) \to \mathcal{C}^1(E)$ is actually incorrect. (See Section 48.3 for a more careful
formalism for general parallelism.) The lifted curve depends on some kind of “initial value”. That is, the
state $\hat\gamma(t)$ must be specified for some $t \in \mathrm{Dom}(\gamma)$ to determine which lifted curve is intended. So the “lift”
of a curve is generally an infinite number of lifted curves which correspond to different choices of an “initial
value” at some point. The totality of such lifted curves is fully determined by the specification of the lifted
velocity for each base-space velocity $V \in T_p(B)$ and state $z \in E_p = \pi^{-1}(\{p\})$, for each $p \in B$.


67.3.2 REMARK: The horizontal component of the lift of a curve equals the velocity of the curve. 

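In Remark 67.3.1, the equality $\pi(\hat\gamma(t)) = \gamma(t)$ for all $t \in \mathrm{Dom}(\gamma)$ implies that $d\pi(\hat\gamma'(t)) = \gamma'(t)$ for all $t \in
\mathrm{Int}(\mathrm{Dom}(\gamma))$. In other words, if $\gamma'(t) = V$, then the horizontal component $d\pi(\hat\gamma'(t))$ of $\hat\gamma'(t)$ must also
equal $V$.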


67.3.3 REMARK: The differential of parallel transport for rectifiable curves. 

In the case of rectifiable base-space curves, which have a well defined velocity almost everywhere, lifts of 
curves may be obtained by integration in the same way as for C! differentiable curves. (See Sections 38.9 
and 50.7 for rectifiable curves.) This is based on the Lebesgue differentiation theorem in Section 45.7, which 
implies that a rectifiable curve is differentiable almost everywhere with respect to its curve parameter. (This 
does not require any axiom of choice.) In principle, one could permit the connection to have some kind of
$L^p_{\mathrm{loc}}$ property, but continuity of the connection is probably a permissive enough constraint for most physically
interesting situations.


67.3.4 REMARK: Mathematical representation of the differential of parallel transport. 

The most intuitively clear representation of a connection is a linear map from velocities of curves in the
manifold's point-space to velocities of parallel-transport curves in a differentiable manifold over that point
space. This is a “horizontal lift map”. This idea may be interpreted more concretely as follows.

If $M$ is a suitably differentiable manifold and $p \in M$, the velocities of curves at $p$ live in the tangent
space $T_p(M)$. If $(E, \pi_E, M, A_E^F)$ is a suitably differentiable fibre bundle on $M$, then $\pi_E^{-1}(\{p\})$ is a set of
states (or orientations) of objects which are located at $p$. Suppose that $\gamma : I \to M$ is a suitably differentiable
curve in $M$ for some open interval $I \subseteq \mathbb{R}$. Suppose that $\gamma(t_0) = p$ for some $t_0 \in I$. Then parallel transport of
an object with state $z \in \pi_E^{-1}(\{p\})$ along $\gamma$ yields a “horizontal lift” curve $\hat\gamma : I \to E$ such that $\hat\gamma(t_0) = z$ and
$\pi_E(\hat\gamma(t)) = \gamma(t)$ for all $t \in I$. Let $\gamma'(t)$ and $\hat\gamma'(t)$ denote the velocities of $\gamma$ and $\hat\gamma$ at $t \in I$ respectively. The
differential of horizontal lifts will contain all of the information in all horizontal lifts if $\hat\gamma'(t) = \theta(\gamma'(t))(\hat\gamma(t))$
for all $t \in I$, for all suitably differentiable curves $\gamma$ in $M$, where


(1) $\theta : T(M) \to \bigcup_{p \in M} (\pi_E^{-1}(\{p\}) \to T(E))$ is a function for which

(2) $\mathrm{Dom}(\theta(V)) = \pi_E^{-1}(\{p\})$ for all $V \in T_p(M)$, for all $p \in M$, and

(3) $\theta(V)(z) \in T_z(E)$ for all $V \in T_p(M)$ and $z \in \pi_E^{-1}(\{p\})$, for all $p \in M$, and

(4) the map $V \mapsto \theta(V)(z)$ from $T_p(M)$ to $T_z(E)$ is linear for all $z \in \pi_E^{-1}(\{p\})$, for all $p \in M$.
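
Conditions (1) to (4) can be illustrated numerically. The following is a minimal sketch, assuming a trivial line bundle $E = \mathbb{R}^2 \times \mathbb{R}$ over $M = \mathbb{R}^2$ and a hypothetical connection 1-form $A = y\,dx$. The names `A`, `theta` and `lift` are illustrative and do not come from the text. A tangent vector to $E$ is represented as a pair (base part, fibre part).

```python
import math

def A(p, V):
    """Hypothetical connection 1-form A = y dx on the base R^2, evaluated at p on V."""
    return p[1] * V[0]

def theta(V, z):
    """Horizontal lift of the base velocity V at the state z = (p, f).
    Returns the pair (base part, fibre part); the base part equals V."""
    p, f = z
    return (V, -A(p, V) * f)

def lift(gamma, dgamma, f0, t0, t1, steps=4000):
    """Integrate f'(t) = fibre part of theta(gamma'(t))((gamma(t), f(t)))
    with the classical fourth-order Runge-Kutta method, starting from f(t0) = f0."""
    h = (t1 - t0) / steps

    def rhs(t, f):
        return theta(dgamma(t), (gamma(t), f))[1]

    t, f = t0, f0
    for _ in range(steps):
        k1 = rhs(t, f)
        k2 = rhs(t + h / 2, f + h / 2 * k1)
        k3 = rhs(t + h / 2, f + h / 2 * k2)
        k4 = rhs(t + h, f + h * k3)
        f += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return f

def gamma(t):
    """Base-space curve: the unit circle traversed once anticlockwise."""
    return (math.cos(t), math.sin(t))

def dgamma(t):
    """Velocity of the unit circle curve."""
    return (-math.sin(t), math.cos(t))

# Fibre value after one circuit, starting from the initial state f = 1.
# Here f' = sin(t)^2 f, so the exact value is exp(pi).
f_end = lift(gamma, dgamma, 1.0, 0.0, 2 * math.pi)
```

The lifted curve is determined only after the initial state `f0` is chosen, which matches the observation in Remark 67.3.1 that a single base-space curve has many lifts. The fibre value does not return to its initial value after one circuit.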

A straightforward natural-language description of the differential of parallel transport seems somewhat cumbersome
when expressed with more mathematical precision. An alternative to the above function-valued
function representation is the following function-of-two-variables representation.


(1') $\theta_2 : \bigcup_{p \in M} (T_p(M) \times \pi_E^{-1}(\{p\})) \to T(E)$ is a function for which

(3') $\theta_2(V, z) \in T_z(E)$ for all $V \in T_p(M)$ and $z \in \pi_E^{-1}(\{p\})$, for all $p \in M$, and

(4') the map $V \mapsto \theta_2(V, z)$ is linear for all $V \in T_p(M)$ and $z \in \pi_E^{-1}(\{p\})$, for all $p \in M$.


This function-of-two-variables formulation has the advantage that a single technical condition has been
removed from the formal specification. However, the representation $\theta_2$ is less convenient in applications.
The representation $\theta$ has the advantage that the first parameter may be notated as a subscript, which then
yields a useful vector field $\theta_V$ on the subset $\pi_E^{-1}(\{p\})$ of $E$ for each $V \in T_p(M)$ and $p \in M$. It turns out
that this vector field is closely related to the Lie algebra of the structure group for the differentiable fibre
bundle (under some reasonable conditions on the connection).


The inevitable clumsiness of formulations of general connections (and affine connections in particular) has 
led to a proliferation of mathematical representations, some of which are discussed in Section 67.2. Some of 




the more user-friendly formulations lack clear geometric significance, or restrict the range of applications, as 
the price of elegance and convenience of computation. 


67.3.5 REMARK: Connection forms on ordinary fibre bundles. Not successful. 
It is useful to (unsuccessfully) attempt to represent connections on ordinary fibre bundles as differential 
forms to motivate their (successful) definition in the case of principal fibre bundles. 


The concept of a vector-valued differential form in Definition 57.6.5 permits a completely general linear 
space as the target space. However, the target space must be the same space at all points of the base-point 
manifold. A connection on an ordinary fibre bundle $(E, \pi, B)$ is a linear map from the tangent space $T_p(B)$ at
each point $p \in B$ to the set of vector fields on the fibre set $E_p = \pi^{-1}(\{p\})$ with values in the tangent
bundle $T(E)$. It is therefore not possible to represent connections in this way.


Another possibility for using differential forms to represent connections on ordinary fibre bundles is to attempt 
to define them on the total space E. This cannot succeed in the obvious way because a connection must map 
pairs $(V, z)$ to velocities in $T_z(E)$, where $V \in T_p(B)$ and $z \in E_p = \pi^{-1}(\{p\})$. However, one may define a map
$\omega_z : T_z(E) \to T_z(E)$ for each $z \in E$ by $\omega_z(W) = W - \theta(d\pi(W))(z)$ for all $W \in T_z(E)$. This has the interesting
property that $d\pi(\omega_z(W)) = d\pi(W) - d\pi(\theta(d\pi(W))(z)) = d\pi(W) - d\pi(W) = 0$ because the horizontal
component of $\theta(V)(z)$ equals $V$ for all $V \in T_p(B)$ by Remark 67.3.2. Therefore $\omega_z(W)$ is a “vertical vector”
for all $z \in E$ and $W \in T_z(E)$. (See Section 64.5 for vertical vectors. The verticality of vectors is independent
of the choice of fibre chart.) For each choice of fibre chart $\phi$, there is a well-defined diffeomorphism from
the fibre set $\pi^{-1}(\{p\})$ to the fibre space $F$, namely the map $\phi|_{\pi^{-1}(\{p\})} : \pi^{-1}(\{p\}) \to F$. (See Definition 64.8.3 for
differentiable fibre bundles.) Let $\phi_p = \phi|_{\pi^{-1}(\{p\})}$ for $p \in \pi(\mathrm{Dom}(\phi))$. Then the differential $d\phi_p$ of $\phi_p$ maps
$\omega_z(W)$ to an element of $T(F)$. Unfortunately, the mapped vector $d\phi_p(\omega_z(W)) \in T(F)$ is in different tangent
spaces for different points $z \in E_p$. So once again, this procedure does not yield a differential form on $E$.
Nevertheless, it does demonstrate a style of construction which is successful in the case of principal fibre
bundles because then one may translate the image $d\phi_p(\omega_z(W)) \in T(G)$ for structure group $G$ to the origin
of $G$. This then yields an element of the Lie algebra of $G$ for each $z \in P$, and this is chart-independent,
where $P$ is the total space of the associated principal fibre bundle $(P, q, B)$.


67.3.6 REMARK: Requirements for connections and parallel transport. 

Parallel transport is not necessarily differentiable. (See Chapter 48 for general topological parallelism. See 
Sections 21.15 and 21.16 for general non-topological parallelism.) If parallel transport on a manifold is 
not differentiable, it clearly cannot be reconstructed from its differential, but in a wide range of physical 
applications, the parallel transport is differentiable. However, differentiability alone is not sufficient to 
guarantee that there will be one and only one parallel transport curve passing through each point with a 
given velocity. 


In most practical situations, the connection is given and the parallel transport is calculated by integration. 
Therefore requirements are placed on the connection to try to ensure that the generated parallel transport 
always exists and is unique. 


It is perhaps obvious that if parallel transport must be differentiable, then the underlying fibre bundle 
must be differentiable. (Differentiable fibre bundles are presented in Chapters 64-66.) The differentiability 
requirement can be weakened a little. But the effort to make such a generalisation is probably not profitable 
enough to justify the investment. 


The definition of a general connection does not require the fibre bundle to have a structure group, although
connections are customarily defined in relation to a structure group. In this case, the structure group is 
typically required to be differentiable, which implies that it must be a Lie group. Then the connection 
typically maps base-space velocities to total-space velocities which are related to elements of the Lie algebra 
of the structure group. Hence Lie groups and Lie algebras are often incorporated into the definitions of 
general connections. (Lie groups are presented in Chapters 62-63.) 
67.4. Horizontal lift functions on differentiable fibrations


67.4.1 REMARK: First define connections on fibrations to motivate the structure group’s role. 

Although it is customary to define connections on fibre bundles which have structure groups to express the 
conservation of some property (such as the lengths of vectors, for example), it is preferable to first define 
connections on differentiable fibrations, with no fibre atlas and no structure group, to motivate the role of 
the structure group. (See Definition 64.2.2 for differentiable fibrations.) 


Definition 67.4.2 has very little value in itself. Its purpose is purely motivational. Definition 67.5.4 is the
useful version of Definition 67.4.2.


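67.4.2 DEFINITION: A horizontal lift function on a $C^1$ fibration $(E, \pi, M) \mathrel{-\!\!<} (E, A_E, \pi, M, A_M)$ with $C^1$
fibre space $F \mathrel{-\!\!<} (F, A_F)$ is a function $\theta : T(M) \to \bigcup_{p \in M} (E_p \to T(E))$ which satisfies the following.

(i) $\forall p \in M$, $\forall V \in T_p(M)$, $\mathrm{Dom}(\theta_V) = E_p$.

(ii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall z \in E_p$, $\theta_V(z) \in T_z(E)$.

(iii) $\forall z \in E$, $(V \mapsto \theta_V(z)) \in \mathrm{Lin}(T_{\pi(z)}(M), T_z(E))$. [linearity]

(iv) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall z \in E_p$, $(d\pi)_z(\theta_V(z)) = V$. [horizontal component]


Alternative name: connection on a $C^1$ fibration.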


67.4.3 REMARK: Interpretation of the horizontal lift function on a differentiable fibration. 
The abbreviation $E_p$ in Definition 67.4.2 denotes $\pi^{-1}(\{p\})$ as in Notation 21.1.3. Some of the expressions
in Definition 67.4.2 are illustrated in Figure 67.4.1.


[Figure 67.4.1 shows a vector $V \in T_p(M)$ in the base space $M$ and the lifted vector $\theta_V(z) \in T_z(E)$ in the total space $E$.]

Figure 67.4.1 Horizontal lift function on a differentiable fibration


Since $\theta$ in Definition 67.4.2 is a function-valued function, it is necessary to notate its first parameter as a
subscript. The notational form “$\theta(V, z)$” would have implied that the domain of $\theta$ is $T(M) \times E$, which is
completely untrue. (This product-space-domain style of horizontal lift function would be valid as an old-fashioned
coordinates-style definition. Such a functional style is in fact achieved in the “localisations” in
Definitions 67.7.2 and 69.1.12, which are required for defining differentiability of connections.)


Condition (i) in Definition 67.4.2 implies that $\theta_V : E_p \to T(E)$ is a function for all $V \in T_p(M)$, for all $p \in M$,
because $\theta$ has the functional form $\theta : T(M) \to \bigcup_{p \in M} (E_p \to T(E))$.


As mentioned in Remark 67.3.4, the function-valued function style of definition is a trade-off between various 
objectives. For each $p \in M$ and $V \in T_p(M)$, the function $\theta_V$ is effectively a vector field on the fibre set $E_p$
at $p$. For each $z \in E_p$, the vector $\theta_V(z)$ is the velocity of a parallel translation curve through $z$ when the
base point has velocity $V$. Thus one could write $\theta : T(M) \to \bigcup_{p \in M} X(T(E_p))$ as in Notation 57.1.5, or
$\theta : T(M) \to \bigcup_{p \in M} X(T(E) | E_p)$ as in Notation 57.1.11. But for simplicity, the vector field conditions are
written out explicitly in Definition 67.4.2. 


Condition (ii) constrains the value $\theta_V(z)$ to lie in the tangent space at $z$. Therefore $\theta_V$ is a vector field on the
submanifold $E_p$ of $E$.


Condition (iii) requires the lift of $V$ to $\theta_V(z) \in T(E)$ to depend linearly on $V$. The scalarity of this map
(its compatibility with scalar multiplication) is needed to ensure that parallel transport will be independent
of curve parametrisation. The full linearity is not quite so obviously necessary, but it is customarily required.
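The role of scalarity can be checked by a short calculation. The following is only a sketch, assuming that a lifted curve $\hat\gamma$ of a $C^1$ curve $\gamma$ in $M$ satisfies $\hat\gamma'(t) = \theta_{\gamma'(t)}(\hat\gamma(t))$ on an interval $I$, and that $s : J \to I$ is a $C^1$ change of parameter. The names $s$, $\beta$ and $\hat\beta$ are introduced here for illustration only.

```latex
% Assumptions: \hat\gamma'(t) = \theta_{\gamma'(t)}(\hat\gamma(t)) for t in I,
% s : J -> I is C^1, \beta = \gamma \circ s, \hat\beta = \hat\gamma \circ s.
\begin{align*}
\hat\beta'(\tau)
  &= s'(\tau)\,\hat\gamma'(s(\tau))
     && \text{(chain rule)} \\
  &= s'(\tau)\,\theta_{\gamma'(s(\tau))}\bigl(\hat\gamma(s(\tau))\bigr)
     && \text{(lift equation for $\gamma$)} \\
  &= \theta_{s'(\tau)\gamma'(s(\tau))}\bigl(\hat\beta(\tau)\bigr)
     && \text{(scalarity part of condition (iii))} \\
  &= \theta_{\beta'(\tau)}\bigl(\hat\beta(\tau)\bigr)
     && \text{(chain rule for $\beta$)}
\end{align*}
```

So the reparametrised lift is a lift of the reparametrised curve, and only the scalar-multiplication part of condition (iii) is used in this argument.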
Condition (iv) in Definition 67.4.2 means that the horizontal component $(d\pi)_z(\theta_V(z))$ of $\theta_V(z)$ equals $V \in
T_p(M)$ for any $z \in E_p$. In other words, the lifted velocity vector follows any curve which has velocity $V$. Thus
the connection only determines the vertical component of the lifted vector. It does not have the freedom to 
make the lifted vector deviate away from a curve in which the base-point velocity vector V is defined. This 
is implicit in the word “lift”, which means that the horizontal motion of the lifted vector follows a curve. 


However, the phrase “horizontal lift” is a little confusing. The lifting is more literally a vertical lift because 
the English-language word “lift” clearly implies a vertical motion. This apparent contradiction is resolved 
if one considers that it is a horizontal vector (the base-space velocity) which is lifted. So it is a “lift of a 
horizontal vector which keeps the horizontal component unchanged", not a "lift in a horizontal direction". 
The true origin of the term “horizontal lift” is that it lifts a base-space vector $V$ to a horizontal vector $\theta_V(z)$.
The lifted vector $\theta_V(z)$ is referred to as “horizontal” because it lies in a kind of “horizontal plane” $Q_z$. (This
set $Q_z = \{\theta_V(z);\ V \in T_{\pi(z)}(M)\}$ is a linear subspace of $T_z(E)$ by condition (iii). See Definition 67.9.6.)
The comments made in Remark 67.5.5 regarding Definition 67.5.4 (i, ii, iii, iv) for horizontal lift functions on 
differentiable fibre bundles are also applicable to Definition 67.4.2 because they make no reference to any 
structure group or fibre atlas. 


67.4.4 REMARK: The irrelevance of the fibre space for connections on differentiable fibrations. 

Neither the fibre space F nor any explicit fibre atlas are referred to in the conditions of Definition 67.4.2. 
It is not surprising that no structure group is mentioned because none is defined, and the fibre space serves 
only to constrain the differentiable structure of a differentiable fibration. By contrast, condition (v) for a 
differentiable fibre bundle in Definition 67.5.4 places a strong constraint on connections. 


The relatively unconstrained kind of connection in Definition 67.4.2 cannot be regarded as the special case 
of Definition 67.5.4 where the fibre atlas is the set of all possible $C^1$ fibre charts, and the structure group
$G$ is the group of all diffeomorphisms of the fibre space $F$, because Definition 67.4.2 does not even require
continuity of $\theta_V(z)$ with respect to $z$ or $\pi(z)$. So integration of vector fields along curves is not well defined
in general. Consequently general connections on differentiable fibrations are of little practical value. 


67.4.5 REMARK: Existence and uniqueness of parallel transport for connections on fibrations. 

The existence and uniqueness of parallel transport for a connection on a differentiable fibration according
to Definition 67.4.2 is not guaranteed. Further differentiability conditions are required in order to give such 
guarantees. When viewed through local charts, the equations for parallel transport have the appearance of 
first-order differential equations. Hence the theory of such equations is applicable here. 
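As an illustration of the chart-level equations, here is a hedged sketch. It assumes local coordinates $(x, y)$ on $E$ which are adapted to the projection, so that $\pi(x, y) = x$ with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. The matrix-valued function $A$ is a hypothetical name for the vertical part of $\theta$ in these coordinates. It is not notation from this book.

```latex
% Assumptions: adapted coordinates (x, y) on E with \pi(x, y) = x.
% A(x, y) is a hypothetical m-by-n matrix giving the vertical part of \theta.
\begin{align*}
\theta_V(x, y) &= \bigl(V,\; A(x, y)\,V\bigr)
  && \text{(conditions (ii), (iii), (iv))} \\
\hat\gamma(t) &= \bigl(x(t),\, y(t)\bigr), \qquad \gamma(t) = x(t)
  && \text{(lifted curve over $\gamma$)} \\
y'(t) &= A\bigl(x(t), y(t)\bigr)\,x'(t)
  && \text{(from $\hat\gamma'(t) = \theta_{\gamma'(t)}(\hat\gamma(t))$)}
\end{align*}
```

If $A$ is continuous and locally Lipschitz in $y$, the Picard-Lindelöf theorem gives local existence and uniqueness of $y(t)$ for each initial value. Definition 67.4.2 by itself imposes no such regularity on $A$.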


67.4.6 REMARK: Differentials of families of diffeomorphisms provide a substitute for a Lie algebra. 

In the case of a general differentiable fibration, neither a Lie structure group nor any other kind of structure 
group is specified. Nevertheless the connection should at least define vector fields on the total space which are 
consistent with some kind of parallel transport along curves in the base space. When viewed through a fibre 
chart, the connection should at least induce a vector field on the fibre space which is the differential of some 
family of diffeomorphisms of that space. (See Section 63.1 for groups of diffeomorphisms. See Section 63.2 for 
families of diffeomorphisms.) An additional condition such as the following, for some suitable fibre atlas $A_E^F$,
would help to ensure that parallel transport is feasible.


(v) For all $p \in M$, for all $V \in T_p(M)$, for all $\phi \in A^F_{E,p}$, for some $C^1$ family $\psi : I \to (F \to F)$ of $C^1$
diffeomorphisms on $F$ with $0 \in I$ and $\psi(0) = \mathrm{id}_F$, for all $z \in E_p$, $(d\phi)_z(\theta_V(z)) = \psi'(0)(\phi(z))$.


Such a condition seems to be necessary for the well-definition of parallel transport along $C^1$ curves in $M$,
but the test in condition (v) could be cumbersome to apply in practice. This difficulty can be overcome by 
requiring the structure group to be a Lie transformation group, as in Definition 67.5.4, and then requiring 
the horizontal lift function to conform to the differential action of differentiable families of diffeomorphisms 
in that group, as in condition (v) of Definition 67.5.4. This Lie algebra approach narrowly constrains the 
possible diffeomorphisms of the fibre space, but finite-dimensional groups of diffeomorphisms do cover most 
cases of interest. 


Comparison of condition (v) as given above with the corresponding condition (v) in Definition 67.5.4 shows
that the differential $\psi'(0)(\phi(z))$ of a diffeomorphism family $\psi$ plays very much the same role here as a Lie
algebra element plays in the fibre bundle definition. Therefore one could think of the space of all
diffeomorphism family differentials $\psi'(0)$ as the implicit “Lie algebra” for the implicit structure group consisting of all
diffeomorphisms of the fibre space.
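Two simple cases may help to make this comparison concrete. This is a sketch under the assumption that $F = \mathbb{R}^n$, writing $\psi$ for a $C^1$ family of $C^1$ diffeomorphisms of $F$ with $\psi(0) = \mathrm{id}_F$. The matrix $a$ and the vector field $h$ are hypothetical examples. For the second family, $h$ is assumed to be $C^1$ with bounded derivative, and $t$ is restricted to a sufficiently small interval around $0$ so that each $\psi(t)$ is a diffeomorphism.

```latex
% Assumptions: F = R^n; a is an n-by-n real matrix; h : R^n -> R^n is C^1 with
% bounded derivative; |t| is small enough that f -> f + t h(f) is a diffeomorphism.
\begin{align*}
\text{linear family:}\quad  & \psi(t)(f) = \exp(t a)\,f, & \psi'(0)(f) &= a f, \\
\text{general family:}\quad & \psi(t)(f) = f + t\,h(f),  & \psi'(0)(f) &= h(f).
\end{align*}
```

In the first case, the differentials $\psi'(0)$ range over the finite-dimensional space of matrices $a$, which is the Lie algebra of the general linear group. In the second case, they range over an infinite-dimensional space of vector fields $h$.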


[www.geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13]




67.4.7 REMARK: Dual connections on differentiable fibrations. 

As described in Section 63.3, a “dual action” can be defined for a family of diffeomorphisms. If the connection 
on a differentiable fibration is constrained to be the differential of a family of diffeomorphisms, as outlined in 
Remark 67.4.6, then a dual connection may be defined for a “dual fibration” so that the “primal connection” 
uniquely determines the dual connection. 


According to Definition 63.3.3, a dual action may be defined for families of diffeomorphisms, but more 
importantly, Definition 63.3.5 defines a “dual differential action” corresponding to a given $C^1$ family of $C^1$
diffeomorphisms. This uses the kind of differential which is alluded to in condition (v) in Remark 67.4.6 as 
the “primal action". Thus in the complete absence of any concept of a tensor, it is possible to define a kind of 
dualism between covariant and contravariant “vectors”. In this case, the “contravariant vectors" are vector 
fields on the fibre space whereas the “covariant vectors" are real-valued functions on the fibre space. In the 
special case of connections on vector bundles, these concepts become the familiar vectors and covectors on 
differentiable manifolds, and the dual connection becomes the familiar parallel transport rule for covariant 
vector fields corresponding to a given parallel transport rule for contravariant vector fields. 


To define a dual connection for such a general kind of connection as appears in Definition 67.4.2, first one 
would have to define a dual differentiable fibration. A dual fibre space $F^*$ could be defined as the set
$C^1(F)$ of all $C^1$ functions on the $C^1$ manifold $F$. (The values of these functions could be real, or could be
in some other space.) Then corresponding to all families $\psi$ of diffeomorphisms of $F$, there would be dual
actions $\psi^*$ on $F^* = C^1(F)$ as in Definition 63.3.3. More importantly, there would be dual differential actions
$(\psi^*)'(0)$ as in Definition 63.3.5. In this way, one could define a dual (associated) connection $\theta^*$ on the dual
differentiable fibration. This is not carried out formally here, although the concept does throw some light on
how connections and dual connections may be developed in the absence of a differentiable structure group. 


67.5. Horizontal lift functions on ordinary fibre bundles 


67.5.1 REMARK: Representation of a connection as vector fields on the total space of a fibre bundle. 
Section 63.6 discusses vector fields (infinitesimal actions) generated on a G-manifold by the action of a Lie 
group $G$. The extension of this concept to differentiable fibre bundles is discussed in Section 64.13. A
connection may be represented as a family of vector fields on the total space of a differentiable fibre bundle, 
which are generated by structure group actions and parametrised by base space direction vectors. 


67.5.2 REMARK: Lifting a base-space tangent vector to the total space. 

Definition 67.5.4 specifies a connection by fixing a vector V in the base space and stating how all points in 
the fibre attached to the base point p move in a parallel fashion when the point p is moved in the direction V. 
This may be thought of as a vector field on the total space for each base space velocity.


Definition 67.8.2 does the reverse. In Definition 67.8.2, an element of the fibre at a point in the base space 
is fixed, and then the parallel motion of that element is specified for each velocity of the base point. This 
may be thought of as a linear map from the space of base space velocities to the space of fibre velocities for 
each fibre element. 


67.5.3 REMARK: Definition of horizontal lift functions as differential right actions. 

Part (v) of Definition 67.5.4 is specified in terms of the differential of the right action $R_f$ of points $f \in F$
on elements of the group $G$. (See Definition 63.6.4 for the right action $R_f$. See Definition 63.6.5 for the
corresponding differential actions $(dR_f)_e$.) For all $f \in F$, $R_f : G \to F$ is defined by $R_f : g \mapsto gf$. Such
“differential actions” $dR_f$ are discussed more fully in Section 63.6.


The maps and spaces in Definition 67.5.4 are illustrated in Figure 67.5.1. 


67.5.4 DEFINITION: A horizontal lift function on a $C^1$ $(G, F)$ fibre bundle $(E, \pi_E, M, A_E^F)$ is a function
$\theta : T(M) \to \bigcup_{p \in M} (E_p \to T(E))$ which satisfies the following conditions.


(i) $\forall p \in M$, $\forall V \in T_p(M)$, $\mathrm{Dom}(\theta_V) = E_p$.

(ii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall z \in E_p$, $\theta_V(z) \in T_z(E)$.

(iii) $\forall z \in E$, $(V \mapsto \theta_V(z)) \in \mathrm{Lin}(T_{\pi_E(z)}(M), T_z(E))$. [linearity]

(iv) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall z \in E_p$, $(d\pi_E)_z(\theta_V(z)) = V$. [horizontal component]

(v) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\exists u \in T_e(G)$, $\forall z \in E_p$, $(d\phi)_z(\theta_V(z)) = (dR_{\phi(z)})_e(u)$,
where $R_f : G \to F$ denotes the right action $R_f : g \mapsto gf$ of $f \in F$ on $G$ as in Definition 63.6.4.
In other words, $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\exists u \in T_e(G)$, $\phi_* \circ \theta_V = X_u^F \circ \phi|_{E_p}$.
(See Definition 63.6.5 for the vector field $X_u^F$ generated on $F$ by $u \in T_e(G)$.)


Alternative name: connection on a $C^1$ $(G, F)$ fibre bundle.


Figure 67.5.1 Maps and spaces for a horizontal lift function on an ordinary fibre bundle


67.5.5 REMARK: Comments on the group-independent conditions for a horizontal lift function.

The following comments refer to the conditions of Definition 67.5.4 which do not involve the structure group.
These are essentially the same as the corresponding conditions for a connection on a differentiable fibration
in Definition 67.4.2, which are commented on in Remark 67.4.3.


(i) Condition (i) is required because the functional form $T(M) \to \bigcup_{p \in M} (E_p \to T(E))$ puts only a weak
constraint on $\theta$. Condition (i) can also be written as $\forall V \in T(M)$, $\mathrm{Dom}(\theta_V) = E_{\pi_{T(M)}(V)}$, where
$\pi_{T(M)} : T(M) \to M$ is the tangent bundle projection map for $T(M)$.

(ii) $E_p$ in Definition 67.5.4 means $\pi_E^{-1}(\{p\})$. So condition (ii) implies that $\theta_V$ is a vector field on $E_p$ for
$p \in M$ and $V \in T_p(M)$. (The vectors in this vector field are in $T(E)$, but the domain of the vector
field is $E_p$.) Thus the combination of conditions (i) and (ii) means that $\theta_V \in X(T(E) | E_p)$ for all
$V \in T_p(M)$.

(iii) Condition (iii) means that for fixed $p \in M$ and $z \in E_p$, the map $V \mapsto \theta_V(z)$ is linear from $T_p(M)$
to $T_z(E)$. This implies that $\theta|_{T_p(M)}$ is a linear map from $T_p(M)$ to the linear space $X(T(E) | E_p)$ of
vector fields on $E_p$ with the linear space structure of pointwise addition and scalar multiplication.

(iv) Condition (iv) means that the horizontal component of $\theta_V$ is $V$. This is because a fibre moving in a
parallel fashion along a path must move so that the base point of the fibre follows the path.


67.5.6 REMARK: Differentials of right actions are in fact infinitesimal left actions. 

In Definition 67.5.4 (v), although the action of a horizontal lift function (on the fibre space via a fibre chart)
is required to be the differential $(dR_{\phi(z)})_e$ of a right action, it is in fact an infinitesimal left transformation
by the group acting on the fibre space. (This apparent paradox is explained more fully in Remarks 62.4.12, 
62.7.1, 63.6.2 and 63.7.1.) 


Definition 67.5.4 condition (v) is illustrated in Figure 67.5.2. (The PFB version is Figure 69.1.2.) 


67.5.7 REMARK: Horizontal and vertical components of the horizontal lift. 

The differential map $(d\phi)_z$ in Definition 67.5.4 (v) is the vertical component of $\theta_V(z)$ relative to a fibre chart $\phi$.
In other words, $(d\phi)_z$ removes the horizontal component. Each fibre chart typically defines a different vertical
component. The map $(d\phi)_z$ is “orthogonal” to $(d\pi_E)_z$ in the sense that $\ker (d\phi)_z \cap \ker (d\pi_E)_z = \{0\}$. (This
follows from the $C^1$ diffeomorphism $\pi_E \times \phi : \pi_E^{-1}(U) \to U \times F$ for $U = \pi_E(\mathrm{Dom}(\phi))$.) The map $(d\pi_E)_z$
similarly removes the vertical component of vectors in $T_z(E)$.
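The splitting described here can be written out in a local trivialisation. The following is a sketch, assuming that $T_z(E)$ is identified with $T_p(M) \times T_f(F)$ by the differential of $\pi_E \times \phi$ at $z$, where $p = \pi_E(z)$ and $f = \phi(z)$. The component names $a$ and $b$ are hypothetical.

```latex
% Assumption: W in T_z(E) corresponds to the pair (a, b) under d(\pi_E \times \phi)_z,
% with p = \pi_E(z) and f = \phi(z).
\begin{align*}
W &\leftrightarrow (a, b), \qquad a = (d\pi_E)_z(W) \in T_p(M), \qquad b = (d\phi)_z(W) \in T_f(F), \\
\ker (d\pi_E)_z &\leftrightarrow \{(0, b)\}, \qquad \ker (d\phi)_z \leftrightarrow \{(a, 0)\}, \\
\ker (d\phi)_z \cap \ker (d\pi_E)_z &\leftrightarrow \{(0, 0)\}, \\
\theta_V(z) &\leftrightarrow \bigl(V,\; (d\phi)_z(\theta_V(z))\bigr).
\end{align*}
```

The first component of $\theta_V(z)$ is fixed by condition (iv). Only the second component depends on the connection and on the choice of fibre chart.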
Figure 67.5.2 Right action map condition for OFB horizontal lift function


Definition 67.5.4(v) is the quintessential property of connections on general differentiable fibre bundles. 
Therefore it is highlighted in its own section, namely Section 67.6. Essentially all significant properties of 
connections follow from this condition. 


67.5.8 REMARK: Lifting vector fields from the base-point manifold to the total space. 

Definition 67.5.9 matches up the base points $\pi_{T(M)}(V)$ and $\pi(z)$ for $V$ and $z$ in the horizontal lift expression
$\theta_V(z)$ by defining a vector at each point of the base space $M$, and feeding this as input twice into the
expression $\theta_{X(\pi(z))}(z)$. Then the input of the lift function is extended from pairs $(V, z) \in T(M) \times E$ to pairs
$(X, z) \in X(T(M)) \times E$. When $z$ is regarded as the independent variable for the output, the result is a
map from X(T(M)) to X(T(E)). This formal elegance is purchased at the price of a loss of clarity of the 
meaning of the lift function and a loss of locality. If the lift function is defined primarily in this way, as it 
is in some texts, the meaning of a connection at a single point is lost. The loss is partly illusory because 
the pointwise connection is defined in terms of tangent vectors, which are defined in terms of differentiation, 
which requires an open neighbourhood around each point for its definition. The loss of locality can be 
remedied by extending Definition 67.5.9 to local vector fields as in Definition 67.5.10. (See Definition 51.4.15 
for restrictions of differentiable manifolds to open subsets.) 


If Definition 67.5.10 is used as an alternative definition of a connection on a fibre bundle, and if all possible 
vector fields are permitted as inputs, then the information in the alternative definition is clearly the same 
as in the pointwise Definition 67.5.4. However, some texts define a connection for only $C^\infty$ global vector
fields. The smoothness constraint excludes manifolds which are not so smooth. The globality constraint has 
various issues in regard to the extensibility of local to global vector fields. 


67.5.9 DEFINITION: The lift of a vector field $X \in X(T(M))$ by a horizontal lift function $\theta$ on a $C^1$
differentiable $(G, F)$ fibre bundle $(E, \pi, M, A_E^F)$ is the vector field $\mathrm{lift}_\theta(X) \in X(T(E))$ defined by


$\forall z \in E$, $\mathrm{lift}_\theta(X)(z) = \theta_{X(\pi(z))}(z)$.


67.5.10 DEFINITION: The lift of a local vector field $X \in X(T(M) | U)$ by a horizontal lift function $\theta$
on a $C^1$ differentiable $(G, F)$ fibre bundle $(E, \pi, M, A_E^F)$, where $U \in \mathrm{Top}(M)$, is the local vector field
$\mathrm{lift}_\theta(X) \in X(T(E) | \pi^{-1}(U))$ defined by


$\forall z \in \pi^{-1}(U)$, $\mathrm{lift}_\theta(X)(z) = \theta_{X(\pi(z))}(z)$.


67.5.11 REMARK: Lifts of vector fields versus lifts of curves. 
The lift of a vector field does not require the solution of any differential equation. Definitions 67.5.9 
and 67.5.10 merely apply the connection 0 to compute a subset of its graph for a particular choice of 
combinations of inputs in accordance with a given vector field. 
By contrast, when a curve $\gamma$ in $M$ is lifted by a connection $\theta$, the output of the lift (which is known as
“parallel transport”) is generally understood to be a curve $\hat\gamma$ in $E$, satisfying $\pi \circ \hat\gamma = \gamma$, whose tangent
vector $\hat\gamma'(t)$ equals $\theta_{\gamma'(t)}(\hat\gamma(t))$ for all $t \in \mathrm{Dom}(\gamma)$. This is a system of ordinary differential equations when
viewed through the charts.


67.6. Connection generator functions 


67.6.1 REMARK: The importance of connection generator functions. 

Section 67.6 presents “connection generator functions” for horizontal lift functions on differentiable ordinary 
fibre bundles. Connection generators are the Lie algebra elements u in Definition 67.5.4 (v). Connection 
generator functions are the Lie-algebra-valued functions in Definition 67.6.5. 


Connection generators are avoidable in the case of differentiable principal bundles. (This avoidability is shown 
in Theorem 69.1.7, which states that the existence of connection generators is equivalent to a differential 
right action invariance condition.) Explicit connection generators are also avoidable in the case of vector 
bundles. (This is because the Lie algebra elements of a general linear group are linear maps, which implies 
that linearity may be specified as the invariance condition.) 


In the case of general differentiable OFBs, it is difficult to state an invariance condition which can replace 
connection generators as a constraint on the dependence of a horizontal lift function on elements of a fibre 
set. This could help explain why so few texts give full pointwise definitions for connections on general 
differentiable OFBs. Some texts define such connections without condition (v) for Definition 67.5.4. (See 
for example Sternberg [38], pages 325-327.) Some texts avoid the issue by defining horizontal lift functions 
only on vector bundles or principal bundles. (For vector bundles, see for example Lang [23], pages 103-105. 
For principal bundles, see for example Choquet-Bruhat [6], pages 254-255.) Most texts avoid the issue by 
defining covariant derivatives on vector bundles, or connection forms or horizontal subspaces on principal 
bundles, or various other constructions as listed in Table 67.2.2. 


Some methods of avoiding the necessity to define connection generators are summarised as follows. 


(1) For a principal bundle, the connection generator existence requirement can be replaced by an equivalent 
right-invariance condition. (This is asserted by Theorem 69.1.7.) Therefore the connection generator 
function does not need to be defined for this case. This method is applicable to general Lie structure 
groups, but is restricted to principal bundles. 


(2) For a vector bundle (general) linear connection, the connection generator existence requirement can 
be replaced by an equivalent linearity condition. (The necessity of the linearity condition is asserted 
by Theorem 68.1.3. Alternatively, see Theorem 63.6.25.) This linearity condition may be stated ei- 
ther explicitly, or else implicitly by defining the connection in terms of Christoffel arrays or covariant 
derivatives. (See Definition 68.1.8 for Christoffel arrays for vector bundle connections.) 


(3) For a vector bundle connection which is not a general linear connection, the connection generator 
existence requirement can be replaced by an equivalent linearity condition with additional conditions 
according to the structure group. For example, if the group is special orthogonal, then the connection 
must satisfy an antisymmetry condition. This is most easily expressed in terms of Christoffel arrays. 


(4) General affine connections on tangent bundles are a special case of the general linear connections on 
vector bundles in (2). 


(5) Connections on tangent bundles which are not unconstrained affine connections are a special case of the 
non-general linear connections on vector bundles in (3). 


Connection generator definitions for connections are almost never seen in the literature. Instead, connections 
are defined either on principal bundles (as connection forms, horizontal subspaces or horizontal lifts), or on 
vector bundles (as covariant derivatives or Christoffel arrays). 


The connection generator concept is not just valuable as a gap-filler for defining connections on OFBs which
are not vector bundles or principal bundles. The connection generator condition is a unifying concept for all 
of the connection styles for all kinds of fibre bundles. 


The connection generator condition has a simple intuitive meaning, which is that for any “infinitesimal 
displacement” V € T,(M) of a base point p € M, there is a corresponding “infinitesimal displacement field” 
$X_u^F$ of the fibre set $E_p$ for some $u = \varpi_\theta(V, \phi) \in T_e(G)$, as viewed through a chart $\phi$. (For the notation
$\varpi_\theta(V, \phi)$, see Definition 67.6.5.)


It may seem that the great generality of the right invariance condition (or connection forms, or horizontal 
subspaces, or horizontal lift functions) in (1) make principal bundles the natural structure on which to define 
connections. There is much truth in this, which justifies the broad popularity of this approach. However, to 
obtain connections on ordinary fibre bundles, one must then define associated connections, which are often
inadequately presented in those textbooks which do define them. (See Remark 67.12.1.)


Connection generator functions are also the most suitable functions to test for $C^k$ differentiability when one
wishes to define $C^k$ connections. (Such differentiability is presented in Section 67.7.)


67.6.2 REMARK: The vertical component of a horizontal lift is a structure group Lie algebra element.
Condition (v) of Definition 67.5.4 is a kind of invariance condition. It means that the vertical component
(with respect to a particular choice of fibre chart $\phi$) of the connection value $\theta_V(z)$ for fixed $V$ is an element
of the Lie algebra $T_e(G)$ acting on the fibre space $F$. Thus the connection value $\theta_V(z)$ is an “infinitesimal group
action”. (This concept is discussed in Remark 63.6.16 and elsewhere in Section 63.6.) Since each Lie algebra
element $u \in T_e(G)$ may be identified with the corresponding infinitesimal transformation $X_u^F \in X(T(F))$ of
the form $X_u^F : f \mapsto (dR_f)_e(u)$ for $f \in F$ as in Definition 63.6.5, condition (v) may be written as


    $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\exists u \in T_e(G)$, $\forall z \in E_p$,
    $(d\phi)_z(\theta_V(z)) = (dR_{\phi(z)})_e(u) = X_u^F(\phi(z))$.


In other words, by identifying $X_u^F$ with the corresponding Lie algebra element $u$,


    $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\exists u \in T_e(G)$,
    $\phi_* \circ \theta_V = X_u^F \circ \phi|_{E_p}$.

This may be written more compactly and suggestively as:

    $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} \in T_e(G)$.


This means that, via the fibre chart, the vertical component of the horizontal lift $\theta_V$ is an element of the Lie
algebra of the structure group. The composition of $\phi_*$ with $\theta_V$ in effect removes the horizontal component.
This “loss of information” is unimportant because by Definition 67.5.4 (iv) the horizontal component is known
to equal $V$. However, the “loss of information” implies that $\theta_V$ cannot be expressed as $\phi_*^{-1} \circ u \circ \phi|_{E_p}$
because the horizontal component would be missing. This may be remedied by noting that

    $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\forall z \in E_p$,
    $(d(\pi \times \phi))_z(\theta_V(z)) = ((d\pi)_z(\theta_V(z)), (d\phi)_z(\theta_V(z))) = (V, X^F_{u(V,\phi)}(\phi(z)))$.

(The technical requirement for the identification map $i : T(M) \times T(F) \to T(M \times F)$ as in Definition 54.7.6 is
ignored here.) Thus $\theta_V(z)$ can be expressed as $\theta_V(z) = (d(\pi \times \phi))_z^{-1}(V, X^F_{u(V,\phi)}(\phi(z)))$, where $u(V, \phi) \in T_e(G)$
depends on $V \in T(M)$ and $\phi \in A^F_{E,\pi(z)}$. This is somewhat clumsy, but it does demonstrate that the
connection can be expressed in terms of a Lie algebra element which depends only on $V$ and $\phi$.

Condition (v) links connections on an ordinary fibre bundle to its structure group, which would otherwise 
play no role in the OFB except to constrain the overlaps of fibre charts. The fact that differential parallelism 
(i.e. the horizontal lift function) is required to act in accordance with an element of the Lie algebra of the 
structure group gives this group a very important role. In particular, it forces the set of possible actions of 
parallel translation on the fibre space to be a finite-dimensional manifold. In the absence of any structure 
group constraint, this set of parallel transport actions could include arbitrary diffeomorphisms of the fibre 
space, which would certainly not fall within the scope of a finite-dimensional Lie group. (See Remark 67.4.6 
for a substitute condition for when there is no specified structure group.) 
67.6.3 THEOREM: Uniqueness of the generator of a horizontal lift vertical component.
Let $\theta$ be a horizontal lift function on a $C^1$ $(G, F)$ fibre bundle $(E, \pi_E, M, A_E^F)$.
(i) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\exists' u \in T_e(G)$, $\phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} = X_u^F$.
(ii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^F_{E,p}$, $\exists' u \in T_e(G)$, $\forall z \in E_p$, $\theta_V(z) = (d(\pi_E \times \phi))_z^{-1}(V, X_u^F(\phi(z)))$.
(iii) Vp € M, VV € T,(M), Vó € Ag ,, Vue T.(G), 0v = XT, y. 


PROOF: For part (i), existence of $u \in T_e(G)$ follows from Definitions 67.5.4 (v) and 63.6.5. Uniqueness
follows from Theorem 63.6.17. 


Part (ii) follows from part (i), Definition 64.8.3 (ii) and Theorem 58.7.7. 
Part (iii) follows from part (ii) and Definition 64.14.2. 


67.6.4 REMARK: Generator functions for horizontal lift functions on differentiable fibre bundles. 

Since the generator (via a fibre chart) of a horizontal lift function on a differentiable fibre bundle is shown 
to exist and be unique in Theorem 67.6.3 (i) for each pair $(V, \phi)$ with $V \in T(M)$ and $\phi \in A^F_{E,\pi(V)}$, it can be
given a name. Definition 67.6.5 gives a name to the map from $(V, \phi)$ pairs to the corresponding generators
(i.e. Lie algebra elements). Theorem 67.6.7 (iii) gives a formula for this generator function. 


Theorem 67.6.7 (iv) states, unsurprisingly, that if the velocity of a curve in the base-point manifold is zero, 
then the rate of transformation of the observed object is zero for all observation frames. 

Theorem 67.6.8 gives a formula for how infinitesimal parallel translations appear differently with respect to
different reference frames. The adjoint typically arises whenever Lie-algebra-generated transformations of 
manifolds are viewed within different reference frames. 


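67.6.5 DEFINITION: The (connection) generator function for a connection $\theta$ on a $C^1$ $(G,F)$ fibre bundle
$(E,\pi_E,M,A_E^F)$ is the function $\tilde u_\theta : \{(V,\phi)\in T(M)\times A_E^F;\ \phi\in A^F_{E,\pi(V)}\}\to T_e(G)$ which satisfies

    $\forall V\in T(M)$, $\forall\phi\in A^F_{E,\pi(V)}$, $\phi_*\circ\theta_V\circ\phi\big|_{E_{\pi(V)}}^{-1} = X^F_{\tilde u_\theta(V,\phi)}$,

where $\pi : T(M)\to M$ is the standard projection map for the tangent bundle $T(M)$. In other words,

    $\forall\phi\in A_E^F$, $\forall p\in\pi_E(\mathrm{Dom}(\phi))$, $\forall V\in T_p(M)$, $\forall z\in E_p$,
    $(d\phi)_z(\theta_V(z)) = (dR_{\phi(z)})_e(\tilde u_\theta(V,\phi))$.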


67.6.6 REMARK: Terminology for connection generator functions. 

It eventuates that the connection generator function in Definition 67.6.5 provides a very precise and simple 
linkage between associated OFB and PFB connections. This is not surprising because $\tilde u_\theta(V,\phi)$ describes
the way in which a parallel-translated fibre set transforms for base-space velocity $V$ when viewed via a fibre
chart $\phi$. All fibre bundles associated with this fibre bundle transform in the same way because they are
all transforming in accordance with the same reference-frame parallel transport. (See Definition 67.12.3 for 
associated connections.) 


As mentioned in Remark 62.9.5, the word “generator” is used in this book to mean any Lie algebra element 
$u\in T_e(G)$, where the generated vector field is an infinitesimal transformation $X^F_u : f\mapsto(dR_f)_e(u)$ for
$f\in F$, for some differentiable Lie left transformation group $(G,F)$. In other words, $u$ “generates” $X^F_u$. (See also
Remark 63.6.1.) This is different to the terminology in much of the Lie groups and physics literature, where 
a generator of a Lie group is an element of a basis for the Lie algebra. Despite the potential for confusion, 
it is difficult to find a better term than "generator" for this concept. 


67.6.7 THEOREM: Some basic properties of connection generator functions. 
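Let $\theta$ be a horizontal lift function on a $C^1$ $(G,F)$ fibre bundle $(E,\pi_E,M,A_E^F)$.
(i) $\forall p\in M$, $\forall V\in T_p(M)$, $\forall\phi\in A^F_{E,p}$, $\forall z\in E_p$, $\theta_V(z) = (d(\pi_E\times\phi))_z^{-1}(V, X^F_{\tilde u_\theta(V,\phi)}(\phi(z)))$.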
(ii) Vp € M, VV € T (M), Vo € AE p Ov = (TE x $); (V; XZ v y (6C )))- 
(iii) Vp e M, VV € T (M), Vo € Ag, üg(V, $) = Gen" (à, o 0y o lp): (See Notation 63.6.21 for Gen.) 
(iv) $\forall p\in M$, $\forall\phi\in A^F_{E,p}$, $\tilde u_\theta(0_{T_p(M)},\phi) = 0_{T_e(G)}$.

PROOF: Part (i) follows from Definition 67.6.5 and Theorem 67.6.3 (ii).
Part (ii) is a paraphrase of part (i).

Part (iii) follows from Definitions 67.6.5 and 64.14.2, and Theorem 67.6.3 (i).
For part (iv), let $V = 0_{T_p(M)}$. Then $\theta_V(z) = 0_{T_z(E)}$ for all $z\in E_p$ by Definitions 67.5.4 (iii) and 23.1.1 (ii). Let $\phi\in A^F_{E,p}$.
Then $\phi_*(\theta_V(\phi\big|_{E_p}^{-1}(f))) = 0_{T_f(F)}$ for all $f\in F$ because $\phi_*$ is pointwise linear by Definition 58.4.5. So $X^F_{\tilde u_\theta(V,\phi)}$
is the zero vector field on $F$. Therefore $\tilde u_\theta(V,\phi) = 0_{T_e(G)}$ by Theorem 63.6.17.


67.6.8 THEOREM: Chart transition rule for connection generator functions. 
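Let $\theta$ be a horizontal lift function on a $C^1$ $(G,F)$ fibre bundle $(E,\pi_E,M,A_E^F)$. Then

    $\forall p\in M$, $\forall V\in T_p(M)$, $\forall\phi_1,\phi_2\in A^F_{E,p}$,
    $\tilde u_\theta(V,\phi_2) = \mathrm{Adj}(g(p))(\tilde u_\theta(V,\phi_1)) + (dR_{g(p)})_e^{-1}((dg)_p(V))$,

where $L_{g_{\phi_2,\phi_1}(p)} = \phi_2\circ\phi_1\big|_{E_p}^{-1}$ for all $p\in\pi_E(\mathrm{Dom}(\phi_1)\cap\mathrm{Dom}(\phi_2))$ as in Definition 64.8.3 (iv), and $g_{\phi_2,\phi_1}$
is abbreviated to $g$.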


PROOF: The assertion follows from Definition 67.6.5 and Theorem 64.14.6. 
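
The chart transition rule can be checked by hand in the matrix case. The following is only a sketch, under assumptions which are not part of Theorem 67.6.8: the structure group $G$ is taken to be a matrix Lie group acting effectively on $F = \mathbb{R}^m$ by matrix multiplication, so that $(dR_f)_e(u) = uf$, $\mathrm{Adj}(g)(u) = g\,u\,g^{-1}$ and $(dR_g)_e^{-1}(w) = w\,g^{-1}$. The symbols $u_1$ and $u_2$ are shorthand introduced here for the generators with respect to the fibre charts $\phi_1$ and $\phi_2$.

```latex
% Sketch only: matrix structure group G acting on F = R^m by g.f = gf (an assumption).
% u_1, u_2 are hypothetical abbreviations for the generators for \phi_1, \phi_2.
\begin{align*}
\phi_2(z) &= g(\pi_E(z))\,\phi_1(z)
  && \text{(transition map on the chart overlap)} \\
(d\phi_1)_z(\theta_V(z)) &= u_1\,\phi_1(z)
  && \text{(generator for $\phi_1$, using $(dR_f)_e(u) = uf$)} \\
(d\phi_2)_z(\theta_V(z))
  &= (dg)_p(V)\,\phi_1(z) + g(p)\,(d\phi_1)_z(\theta_V(z))
  && \text{(product rule, using $(d\pi_E)_z(\theta_V(z)) = V$)} \\
  &= \bigl((dg)_p(V)\,g(p)^{-1} + g(p)\,u_1\,g(p)^{-1}\bigr)\,\phi_2(z) \\
u_2 &= g(p)\,u_1\,g(p)^{-1} + (dg)_p(V)\,g(p)^{-1}
  && \text{(comparing with $(d\phi_2)_z(\theta_V(z)) = u_2\,\phi_2(z)$)}
\end{align*}
```

Under these assumptions, the two terms in the last line are the matrix forms of the adjoint term and the differential-of-transition-map term in the chart transition rule.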


67.7. Differentiability of horizontal lift functions 


((2017-1-14. Need definition of a $C^k$ horizontal lift function on a $C^{k+1}$ $(G,F)$ fibre bundle for $k\in\mathbb{Z}_0^+$. Also
define regularity classes for all other kinds of connections, including Definitions 67.8.2, 69.1.3 and 69.2.2. The
transposed versions of horizontal lift functions actually should be defined in terms of a given untransposed
lift function. Then only Definitions 67.5.4 and 69.1.3 should be given regularity classes. Must also give
theorems showing that $h_z$, $v_z$ are $C^k$ in some sense, and for a PFB that $\omega$ is $C^k$.))


67.7.1 REMARK: Definition of differentiability of horizontal lift functions on fibre bundles. 

One of the difficulties encountered in the definition of differentiability for a horizontal lift function $\theta$ is the
fact that the domain of $\theta$ is not a direct product of differentiable manifolds. Such products of manifolds
have a well-defined standard differentiable structure which can be used to define differentiability. The value
$\theta_V(z)$ is defined only when the base points of $V$ and $z$ are the same. That is, $\pi_{T(M)}(V)$ must equal $\pi(z)$,
where $\pi_{T(M)} : T(M)\to M$ is the projection map for $T(M)$. So $\mathrm{Dom}(\theta) = \bigcup_{p\in M}(T_p(M)\times E_p)$, which may
be visualised as a kind of diagonal-line subset of the Cartesian product $T(M)\times E$.

Definition 67.7.2 converts the domain of $\theta$ to a direct product of differentiable manifolds by means of fibre
charts $\phi\in A_E^F$. The total space elements $z\in\mathrm{Dom}(\phi)$ have a bijective association via $\pi\times\phi$ with pairs
$(p,q)\in U\times F$, where $U = \pi(\mathrm{Dom}(\phi))$. Since both $U$ and $F$ have well-defined differentiable structure,
the direct product differentiable structure is well defined. (See Definition 51.4.15 for the restriction of a
differentiable manifold to an open subset.)


The “localisation” $\theta^\phi$ is then used to define differentiability classes in Definition 67.7.6. (For the misnomer
“localisation”, see Remark 21.6.3. The term “localisation” can be justified because it employs a fibre chart,
which is often referred to as a “local trivialisation”.)


The spaces and maps in Definition 67.7.2 are illustrated in Figure 67.7.1.


67.7.2 DEFINITION: The localisation of a horizontal lift function $\theta$ on a $C^1$ differentiable $(G,F)$ fibre
bundle $(E,\pi,M,A_E^F)$ via a fibre chart $\phi\in A_E^F$ is the map $\theta^\phi : T(\pi(\mathrm{Dom}(\phi)))\times F\to T(F)$ defined by

    $\forall V\in T(\pi(\mathrm{Dom}(\phi)))$, $\forall q\in F$,
    $\theta^\phi(V,q) = \phi_*\bigl(\theta_V\bigl((\pi\times\phi)^{-1}(\pi_{T(M)}(V), q)\bigr)\bigr)$,

where $\pi_{T(M)} : T(M)\to M$ is the tangent bundle projection map for $M$.
The map $\theta^\phi(V,\cdot) : F\to T(F)$ may also be denoted as $\theta^\phi_V$ for $\phi\in A_E^F$ and $V\in T(\pi(\mathrm{Dom}(\phi)))$.

Figure 67.7.1 Localisation of a horizontal lift function (diagram not reproduced; it includes the map
$\pi_{T(M)}\times\mathrm{id}_F : T(M)\times F\to M\times F$ which sends $(V,q)$ to $(p,q)$)


67.7.3 REMARK: Connections and their localisations are equi-informational. 

Theorem 67.7.4 shows that the “information” in a horizontal lift function can be easily recovered from its 
localisations. So a connection is equi-informational to its set of localisations. Therefore one could equally 
well represent abstract connections in terms of their localisations alone. In fact, this is much closer to the way
in which connections are represented in practical applications. 


67.7.4 THEOREM: Reconstruction of horizontal lift functions from localisations. 
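Let $\theta$ be a horizontal lift function on a $C^1$ differentiable $(G,F)$ fibre bundle $(E,\pi,M,A_E^F)$. Then

    $\forall\phi\in A_E^F$, $\forall p\in\pi(\mathrm{Dom}(\phi))$, $\forall V\in T_p(M)$, $\forall z\in E_p$,
    $\theta_V(z) = (d(\pi\times\phi))_z^{-1}(V, \theta^\phi(V,\phi(z)))$
    $\phantom{\theta_V(z)} = (d(\pi\times\phi))_z^{-1}(V, \theta^\phi_V(\phi(z)))$.

(See Notation 54.7.9 for the tangent vector concatenations $(V,\theta^\phi(V,\phi(z)))$ and $(V,\theta^\phi_V(\phi(z)))$.)


PROOF: Let $\phi\in A_E^F$, $p\in\pi(\mathrm{Dom}(\phi))$, $V\in T_p(M)$ and $z\in E_p$. Then $\theta_V(z)\in T_z(E)$ and $(d\pi)_z(\theta_V(z)) = V$
by Definition 67.5.4 (ii, iv). So $(d(\pi\times\phi))_z(\theta_V(z))$ is well defined and by Theorem 58.7.9,

    $\theta_V(z) = (d(\pi\times\phi))_z^{-1}\bigl((d\pi)_z(\theta_V(z)), (d\phi)_z(\theta_V(z))\bigr)$
    $\phantom{\theta_V(z)} = (d(\pi\times\phi))_z^{-1}\bigl(V, \phi_*(\theta_V(z))\bigr)$.

But $z = (\pi\times\phi)^{-1}((\pi\times\phi)(z)) = (\pi\times\phi)^{-1}(p,\phi(z))$. So $\phi_*(\theta_V(z)) = \phi_*(\theta_V((\pi\times\phi)^{-1}(p,\phi(z)))) = \theta^\phi(V,\phi(z))$
by Definition 67.7.2. Hence $\theta_V(z) = (d(\pi\times\phi))_z^{-1}(V,\theta^\phi(V,\phi(z))) = (d(\pi\times\phi))_z^{-1}(V,\theta^\phi_V(\phi(z)))$.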
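
The reconstruction formula can be illustrated on a trivial bundle. The following is a sketch under assumptions introduced only for this example: $E = M\times\mathbb{R}^m$ with $\pi(p,q) = p$, a single global fibre chart $\phi(p,q) = q$, tangent spaces identified as $T_{(p,q)}(E) = T_p(M)\times\mathbb{R}^m$, and a hypothetical family of linear maps $A_p$ from $T_p(M)$ to the $m\times m$ real matrices. The sign convention for $A_p$ is a choice made here, not one taken from the text.

```latex
% Sketch only: trivial bundle E = M x R^m, global fibre chart \phi(p,q) = q.
% A_p : T_p(M) -> gl(m,R) is a hypothetical linear map defining the lift.
\begin{align*}
\theta_V(p,q) &= (V,\ A_p(V)\,q) \in T_p(M)\times\mathbb{R}^m
  && \text{(the assumed horizontal lift)} \\
(\pi\times\phi)(p,q) &= (p,q)
  && \text{(so $\pi\times\phi$ is the identity map on $E$)} \\
\theta^\phi(V,q) &= \phi_*\bigl(\theta_V((\pi\times\phi)^{-1}(p,q))\bigr) = A_p(V)\,q
  && \text{(localisation: the vertical part only)} \\
(d(\pi\times\phi))_z^{-1}\bigl(V,\ \theta^\phi(V,\phi(z))\bigr) &= (V,\ A_p(V)\,q) = \theta_V(z)
  && \text{(reconstruction, for $z = (p,q)$)}
\end{align*}
```

In this example the localisation simply discards the horizontal component $V$, which is already known from the input, and the reconstruction re-attaches it.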


67.7.5 REMARK: Using localisations to define connection differentiability classes. 

In Definition 67.7.6, the Cartesian product manifold $T(\pi(\mathrm{Dom}(\phi)))\times F$ is of class $C^k$ because $M$ is of class
$C^{k+1}$ by Definition 64.8.3 (i), and $F$ is of class $C^{k+1}$ by Definitions 64.8.3 and 63.4.2 (ii). Also $T(F)$ is of
class $C^k$ by Theorem 54.5.28 (iii). Therefore the set of maps $C^k(T(\pi(\mathrm{Dom}(\phi)))\times F, T(F))$ is well defined as
in Notation 52.1.3.


67.7.6 DEFINITION: A $C^k$ (differentiable) horizontal lift function on a $C^{k+1}$ differentiable $(G,F)$ fibre
bundle $(E,\pi,M,A_E^F)$, for $k\in\mathbb{Z}_0^+$, is a horizontal lift function $\theta$ on $(E,\pi,M,A_E^F)$ such that

    $\forall\phi\in A_E^F$, $\theta^\phi\in C^k(T(\pi(\mathrm{Dom}(\phi)))\times F, T(F))$,

where $\theta^\phi : T(\pi(\mathrm{Dom}(\phi)))\times F\to T(F)$ is the localisation of $\theta$ via $\phi$.


Alternative name: $C^k$ (differentiable) connection.

67.7.7 REMARK: Application of horizontal lift function differentiability to fibrations.

Definition 67.7.6 does not refer to the structure group G. In particular, the Lie algebra of G is not used in any 
way. In other words, condition (v) of Definition 67.5.4 is not used. Therefore Definition 67.7.6 is applicable 
to the horizontal lift functions on differentiable fibrations (as opposed to fibre bundles) in Definition 67.4.2. 


67.7.8 REMARK: Equivalent differentiability condition using connection generator functions. 

Let $\theta$ be a horizontal lift function on a $C^{k+1}$ $(G,F)$ fibre bundle $E \mathrel{-\!\!\!<} (E,\pi,M,A_E^F)$ for some $k\in\mathbb{Z}_0^+$. Then for
each $\phi\in A_E^F$, the fibre-chart-specific connection generator function $\tilde u_\theta(\cdot,\phi)$ for $\theta$, as in Definition 67.6.5, has
domain $T(\pi(\mathrm{Dom}(\phi)))$, which is a $C^k$ manifold. The linear space $T_e(G)$ is a $C^\infty$ manifold because it is finite-dimensional.
Therefore $C^k$ maps from $T(\pi(\mathrm{Dom}(\phi)))$ to $T_e(G)$ are well defined. So the functions $\tilde u_\theta(\cdot,\phi) :
T(\pi(\mathrm{Dom}(\phi)))\to T_e(G)$ can be meaningfully tested for $C^k$ differentiability. It is shown in Theorem 67.7.9
that this test is equivalent to the $C^k$ differentiability of the connection $\theta$ according to Definition 67.7.6.


67.7.9 THEOREM: Connection differentiability expressed in terms of connection generator functions. 
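Let $k\in\mathbb{Z}_0^+$. Let $\theta$ be a horizontal lift function on a $C^{k+1}$ $(G,F)$ fibre bundle $E \mathrel{-\!\!\!<} (E,\pi,M,A_E^F)$. Then $\theta$ is
a $C^k$ horizontal lift function on $E$ if and only if

    $\forall\phi\in A_E^F$, $\tilde u_\theta(\cdot,\phi)\in C^k(T(\pi(\mathrm{Dom}(\phi))), T_e(G))$.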


PROOF: 
((2020-2-16. To be continued ... )) 


67.7.10 REMARK: Coordinatisation of horizontal lift functions in order to define differentiability. 

To define the differentiability of a horizontal lift function (i.e. a connection), it is necessary to express it in 
terms of Cartesian coordinates. (Differentiability is a numerical concept which can only be applied if the 
geometry is arithmetised first.) For the spaces in Definition 67.5.4, let n = dim(M) and m = dim(F). Then 
$\dim(E) = n + m$, $\dim(T(M)) = 2n$, and $\dim(T(E)) = 2(n+m)$. However, much of the “input” information
is duplicated in the “output”, and some of the input information is redundant also. Therefore, even though
it may seem that the lift function is a map which looks like $\mathbb{R}^{2n}\times\mathbb{R}^{n+m}\to\mathbb{R}^{2(n+m)}$, the map looks more
like $\mathbb{R}^{2n}\times\mathbb{R}^m\to\mathbb{R}^m$ when the duplications are eliminated. This agrees with the definition of a connection
on a vector bundle in terms of the Christoffel array, which effectively maps $(x,v,w)\in\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^m$ to
$\bigl(\sum_{j,a}\Gamma^b_{ja}(x)\,v^j w^a\bigr)_{b=1}^m\in\mathbb{R}^m$.

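For $\psi_M\in\mathrm{atlas}(M)$, $x\in\mathrm{Range}(\psi_M)$ and $v\in\mathbb{R}^n$, let $p = \psi_M^{-1}(x)$ and $V = t_{p,v,\psi_M} = t_{\psi_M^{-1}(x),v,\psi_M}$. Then
$\mathrm{Dom}(\theta_{t_{\psi_M^{-1}(x),v,\psi_M}}) = \mathrm{Dom}(\theta_V) = E_p = E_{\psi_M^{-1}(x)} = \pi^{-1}(\{p\}) = \pi^{-1}(\{\psi_M^{-1}(x)\})$ by Definition 67.5.4 (i).
(See Theorem 67.7.11 part (iii).) In terms of the tangent bundle chart $\Psi(\psi_M)$ in Notation 54.5.21,
$t_{\psi_M^{-1}(x),v,\psi_M}$ may be written as $\Psi(\psi_M)^{-1}(x,v)$. Then $\mathrm{Dom}(\theta_{\Psi(\psi_M)^{-1}(x,v)}) = E_{\psi_M^{-1}(x)} = \pi^{-1}(\{\psi_M^{-1}(x)\})$.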

A fibre set element $z\in E_p$ may be coordinatised via a chart $\psi_E\in\mathrm{atlas}(E)$, but this would give very little
information about $z$. It is more informative to first map $z$ to $F$ via a fibre chart $\phi\in A_E^F$ and then apply
a chart $\psi_F\in\mathrm{atlas}(F)$ to the result. By inverting this, one obtains a coordinatisation of $z$ in terms of
horizontal and vertical components. Let $z = (\pi\times\phi)^{-1}(\psi_M^{-1}(x),\psi_F^{-1}(y))$. Then $z$ is a well-defined element
of $E$ if $(\psi_M^{-1}(x),\psi_F^{-1}(y))\in\mathrm{Range}(\pi\times\phi) = U\times F$, where $U = \pi(\mathrm{Dom}(\phi))$ by Definition 64.3.2. Thus $z$ is
well defined if $x\in\psi_M(\pi(\mathrm{Dom}(\phi))) = \psi_M(U)$ and $y\in\mathrm{Range}(\psi_F)$. (See Theorem 67.7.11 part (iv).)

Given z € E, one may define vectors tz wys € T;(E) for vg € atlas(E) and w € R"*™, where n +m = 
dim(£) = dim(M) + dim(F). The map tz wyp — (Vz(z), w) is de = Vg) = (We 0 Tg) x (Ug). Ideally 
one would wish to decompose the component tuple $w$ into a horizontal component in $\mathbb{R}^n$ and a vertical
component in $\mathbb{R}^m$. Unfortunately, the charts on $E$ do not necessarily have any such direct relation to
components with respect to $M$ and $F$. This may be contrasted with the situation for affine connections
on tangent bundles, where the total space $E$ is the tangent bundle $T(M)$, which does have standard charts
expressed in terms of horizontal and vertical components. The only way to perform the decomposition in
the case of general differentiable fibre bundles is to utilise the diffeomorphisms $\pi\times\phi$ for fibre charts $\phi\in A_E^F$.


Vectors in $T_z(E)$ may be converted to coordinate form via a fibre chart $\phi$ by applying the differential $(d\phi)_z$ to
them. Then local Cartesian coordinates can be determined for the linear space $T_{\phi(z)}(F)$. (This is reflected in
Theorem 67.7.11 (vi).) Definition 67.5.4 (v) gives a very specific expression $(dR_{\phi(z)})_e(u)$ for $(d\phi)_z(\theta_V(z))$ in
terms of some $u\in T_e(G)$. (This is reflected in Theorem 67.7.11 (vii).) Then the “output” from the horizontal
lift function can be converted to an element of the Lie algebra of the Lie group $G$. However, this is not
necessarily convenient for defining the $C^k$ differentiability of the lift function.


Thus there are three immediately obvious candidates for Cartesian coordinatisation of the output of a
horizontal lift function which could be used for the definition of $C^k$ differentiability.

(1) $\theta_V(z)\in T_z(E)$ may be coordinatised by the tangent bundle chart $\Psi(\psi_E)\in\mathrm{atlas}(T(E))$.

(2) $\theta_V(z)\in T_z(E)$ may be coordinatised by applying $\Psi(\psi_F)\in\mathrm{atlas}(T(F))$ to $(d\phi)_z(\theta_V(z))$.

(3) $\theta_V(z)\in T_z(E)$ may be coordinatised by applying $\Psi(\psi_G)\in\mathrm{atlas}(T(G))$ to $u\in T_e(G)$ such that
$(d\phi)_z(\theta_V(z)) = (dR_{\phi(z)})_e(u)$.


The $C^k$ differentiability of a horizontal lift function according to options (1) or (2) will determine the same
class of lift functions because fibre charts of $C^k$ fibrations and fibre bundles must be of class $C^k$. (See
Definitions 64.2.2 (iii), 64.4.4 (v) and 64.8.3 (ii).) If a Lie structure group is specified for a fibre bundle,
$C^k$ differentiability with respect to its Lie algebra in option (3) will be equivalent to options (1) or (2). (See
Definition 64.8.3 (v).) Option (2) is closely related to the Christoffel array style of coordinatisation, which
suggests that this may be the most meaningful way to coordinatise a lift function. If the structure group
is GL(n), then the coordinatisation for options (2) and (3) will be essentially identical. All in all, it seems
best to initially define $C^k$ differentiability of horizontal lift functions in terms of the most basic option (1),
and then show that this is equivalent to the other options.
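
The dimension bookkeeping in this remark can be made concrete with a small worked count. The values $n = 2$ and $m = 3$ below are illustrative choices only, and the tally of which coordinates are duplicated is a sketch of the informal argument, not a formal statement.

```latex
% Worked count for n = dim(M) = 2 and m = dim(F) = 3 (illustrative values only).
\begin{align*}
\dim(T(M)) &= 2n = 4, \qquad \dim(E) = n + m = 5, \qquad \dim(T(E)) = 2(n+m) = 10 \\
\text{naive form:}\quad & \mathbb{R}^{4}\times\mathbb{R}^{5}\to\mathbb{R}^{10} \\
\text{inputs:}\quad & (x,v)\in\mathbb{R}^{2n} \text{ and the fibre coordinates } y\in\mathbb{R}^{m}
   \text{ of } z \text{ (the base point of } z \text{ is already } x\text{)} \\
\text{outputs:}\quad & 2(n+m) - (n+m) - n = m = 3
   \text{ (discard the base point } z \text{ and the horizontal part } v\text{)} \\
\text{reduced form:}\quad & \mathbb{R}^{4}\times\mathbb{R}^{3}\to\mathbb{R}^{3}
\end{align*}
```

Only the $m$ vertical components of the output carry information which is not already present in the input.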


67.7.11 THEOREM: Interpretation of horizontal lift function in terms of coordinates. 
Let $\theta$ be a horizontal lift function on a $C^1$ $(G,F)$ fibre bundle $(E,\pi,M,A_E^F)$. Let $n = \dim(M)$, $m = \dim(F)$
and $q = \dim(G)$. Let $A_M = \mathrm{atlas}(M)$, $A_E = \mathrm{atlas}(E)$, $A_F = \mathrm{atlas}(F)$ and $A_G = \mathrm{atlas}(G)$. Let $\pi_M$, $\pi_E$, $\pi_F$
and $\pi_G$ be the tangent bundle projection maps for $T(M)$, $T(E)$, $T(F)$ and $T(G)$ respectively.

(i) $\dim(E) = n + m$.

(ii) $\dim(T(M)) = 2n$, $\dim(T(F)) = 2m$, $\dim(T(E)) = 2(n+m)$, and $\dim(T(G)) = 2q$.
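(iii) $\forall\psi_M\in A_M$, $\forall x\in\mathrm{Range}(\psi_M)$, $\forall v\in\mathbb{R}^n$,
    $\mathrm{Dom}(\theta_{\Psi(\psi_M)^{-1}(x,v)}) = \mathrm{Dom}(\theta_{t_{\psi_M^{-1}(x),v,\psi_M}}) = E_{\psi_M^{-1}(x)} = \pi^{-1}(\{\psi_M^{-1}(x)\})$.

(iv) $\forall\psi_M\in A_M$, $\forall\psi_F\in A_F$, $\forall\phi\in A_E^F$, $\forall x\in\psi_M(\pi(\mathrm{Dom}(\phi)))$, $\forall y\in\mathrm{Range}(\psi_F)$,
    $(\pi\times\phi)^{-1}(\psi_M^{-1}(x),\psi_F^{-1}(y))\in\pi^{-1}(\mathrm{Dom}(\psi_M))\cap\phi^{-1}(\mathrm{Dom}(\psi_F))\in\mathrm{Top}(E)$.

(v) $\forall\psi_M\in A_M$, $\forall\psi_F\in A_F$, $\forall\phi\in A_E^F$, $\forall x\in\psi_M(\pi(\mathrm{Dom}(\phi)))$, $\forall v\in\mathbb{R}^n$, $\forall y\in\mathrm{Range}(\psi_F)$,
    $\theta_{t_{\psi_M^{-1}(x),v,\psi_M}}\bigl((\pi\times\phi)^{-1}(\psi_M^{-1}(x),\psi_F^{-1}(y))\bigr)\in T_{(\pi\times\phi)^{-1}(\psi_M^{-1}(x),\psi_F^{-1}(y))}(E)$.

(vi) $\forall\psi_M\in A_M$, $\forall\psi_F\in A_F$, $\forall\phi\in A_E^F$, $\forall x\in\psi_M(\pi(\mathrm{Dom}(\phi)))$, $\forall v\in\mathbb{R}^n$, $\forall y\in\mathrm{Range}(\psi_F)$,
    $\phi_*\bigl(\theta_{t_{\psi_M^{-1}(x),v,\psi_M}}\bigl((\pi\times\phi)^{-1}(\psi_M^{-1}(x),\psi_F^{-1}(y))\bigr)\bigr)\in T_{\psi_F^{-1}(y)}(F)$.

(vii) $\forall\psi_M\in A_M$, $\forall\psi_F\in A_F$, $\forall\phi\in A_E^F$, $\forall x\in\psi_M(\pi(\mathrm{Dom}(\phi)))$, $\forall v\in\mathbb{R}^n$, $\forall y\in\mathrm{Range}(\psi_F)$, $\exists u\in T_e(G)$,
    $\phi_*\bigl(\theta_{t_{\psi_M^{-1}(x),v,\psi_M}}\bigl((\pi\times\phi)^{-1}(\psi_M^{-1}(x),\psi_F^{-1}(y))\bigr)\bigr) = (dR_{\psi_F^{-1}(y)})_e(u)\in T_{\psi_F^{-1}(y)}(F)$.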

PROOF: Part (i) follows from Definition 64.8.3 (iv). 

Part (ii) follows from Definitions 64.8.3, 54.5.19 and 54.5.22. 

Part (iii) follows from Definition 67.5.4 (i). 

Part (iv) follows from Definition 64.8.3 (iv). 

Part (v) follows from parts (iii) and (iv), and Definition 67.5.4 (i, ii). 

Part (vi) follows from part (v) and Definition 64.8.3 (ii) because $z = (\pi\times\phi)^{-1}(\psi_M^{-1}(x),\psi_F^{-1}(y))$ implies that
$\phi(z) = \psi_F^{-1}(y)$.
Part (vii) follows from part (vi) and Definition 67.5.4 (v). 


67.7.12 REMARK: Using vector fields to define differentiability of horizontal lift functions. 

It is possible to evade the difficulties of the non-product structure of the domain of a horizontal lift function
by defining it to be $C^k$ if it lifts vector fields in $X^k(T(M))$ to vector fields in $X^k(T(E))$. Although this is
elegant on paper, it defines a very local property in terms of global structures, which adds further difficulties
which are irrelevant to the concept to be defined. (This is comparable to the use of test functions to define
$C^k$ maps between manifolds as discussed in Remark 52.1.18. Such use of test functions is validated, with
some difficulty, in Theorem 52.1.19.)


67.8. Transposed horizontal lift functions 


67.8.1 REMARK:  Transposition of the horizontal lift function. 

The horizontal lift function in Definition 67.5.4 is transposed in Definition 67.8.2, which contains the same
information. If $\theta$ satisfies Definition 67.5.4, then $\bar\theta$ defined by $\bar\theta_z(V) = \theta_V(z)$ satisfies Definition 67.8.2, and
vice versa. Definition 67.8.2 uses the same notation $R_f$ for the right action of an element $f$ of the fibre space
$F$ as Definition 67.5.4.


The horizontal lift function in Definition 67.5.4 is useful for most purposes, for example for the Riemann 
curvature formula in Remark 70.4.3. The transposed horizontal lift function in Definition 67.8.2 has a 
particular application in the computation of parallel transport along rectifiable base-space curves, as in 
equation (68.5.1) in Remark 68.5.1. 


The untransposed horizontal lift function $\theta$ in Definition 67.5.4 has the advantage that it effectively maps
base-space vectors $V$ to elements of the Lie algebra of the structure group, as discussed in Remark 67.6.2.
(In fact, it is the vertical component which is a Lie algebra element via the fibre charts, but the horizontal
component equals $V$, which adds no information.) This advantage becomes even clearer if one considers
manifolds with boundary. At a boundary point, there is a starlike subset of vectors $V$ which are “interior
vectors” at $p\in M$. Complete vector fields $\theta_V\in X(T(E)\,|\,E_p)$ can easily be defined for such vectors. But if
the transposed style of lift function in Definition 67.8.2 is adopted, each $z\in E$ is associated with a partial
linear map from $T_p(M)$ to $T_z(E)$ because $T_p(M)$ is typically only a starlike set if $p$ is a boundary point.
Such a “starlike-linear map” at each point of $E$ would be quite untidy and clumsy.


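67.8.2 DEFINITION: A transposed horizontal lift function on a $C^1$ $(G,F)$ fibre bundle $(E,\pi_E,M,A_E^F)$ is a
map $\bar\theta : E\to\bigcup_{z\in E}\mathrm{Lin}(T_{\pi_E(z)}(M), T_z(E))$ which satisfies

(i) $\forall z\in E$, $\bar\theta_z\in\mathrm{Lin}(T_{\pi_E(z)}(M), T_z(E))$, [linearity]
(ii) $\forall z\in E$, $(d\pi_E)_z\circ\bar\theta_z = \mathrm{id}_{T_{\pi_E(z)}(M)}$, [horizontal component]
(iii) $\forall p\in M$, $\forall V\in T_p(M)$, $\forall\phi\in A^F_{E,p}$, $\exists u\in T_e(G)$, $\forall z\in\pi_E^{-1}(\{p\})$, $(d\phi)_z(\bar\theta_z(V)) = (dR_{\phi(z)})_e(u)$.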


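67.8.3 NOTATION: $\bar\theta$, for a horizontal lift function $\theta$, denotes the function-transpose of $\theta$. In other words,
$\bar\theta : E\to(T(M)\to T(E))$ is defined by

    $\forall p\in M$, $\forall z\in E_p$, $\forall V\in T_p(M)$, $\bar\theta_z(V) = \theta_V(z)$,

for any horizontal lift function $\theta : T(M)\to(E\to T(E))$ on a $C^1$ $(G,F)$ fibre bundle $(E,\pi_E,M,A_E^F)$.

67.8.4 THEOREM: The transpose of a horizontal lift-function is a transposed horizontal lift-function.
Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ $(G,F)$ fibre bundle. Let $\theta$ be a horizontal lift-function on $E$. Then the
function-transpose of $\theta$ is a transposed horizontal lift-function on $E$ according to Definition 67.8.2.

PROOF: Let $\theta$ satisfy Definition 67.5.4. Let $\bar\theta$ denote the function-transpose of $\theta$ as in Notation 67.8.3. Let
$z\in E$ and $V\in T_{\pi_E(z)}(M)$. Then $\bar\theta_z(V) = \theta_V(z)\in T_z(E)$ by Definition 67.5.4 (ii). So $\bar\theta$ has the functional
form $\bar\theta : E\to\bigcup_{z\in E}(T_{\pi_E(z)}(M)\to T_z(E))$. The linearity, horizontal component and structure group
conditions of Definition 67.8.2 then follow from the corresponding conditions of Definition 67.5.4. Hence $\bar\theta$
is a transposed horizontal lift-function on $E$.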


67.8.5 REMARK: Horizontal lift of vector fields by transposed horizontal lift functions. 
Definition 67.8.6 is the obvious conversion of Definitions 67.5.9 and 67.5.10 to use a transposed horizontal
lift function.

67.8.6 DEFINITION: The lift of a vector field $X\in X(T(M))$ by a transposed horizontal lift function $\bar\theta$ on
a $C^1$ differentiable $(G,F)$ fibre bundle $(E,\pi,M,A_E^F)$ is the vector field $\mathrm{lift}_{\bar\theta}(X)\in X(T(E))$ defined by

    $\forall z\in E$, $\mathrm{lift}_{\bar\theta}(X)(z) = \bar\theta_z(X(\pi(z)))$.

The lift of a local vector field $X\in X(T(M)\,|\,U)$ by a transposed horizontal lift function $\bar\theta$ on a $C^1$ $(G,F)$
fibre bundle $(E,\pi,M,A_E^F)$, where $U\in\mathrm{Top}(M)$, is the vector field $\mathrm{lift}_{\bar\theta}(X)\in X(T(E)\,|\,\pi^{-1}(U))$ defined by

    $\forall z\in\pi^{-1}(U)$, $\mathrm{lift}_{\bar\theta}(X)(z) = \bar\theta_z(X(\pi(z)))$.


67.9. Horizontal component maps and horizontal subspaces 


67.9.1 REMARK: Alternative structures containing connection information. 

There are many ways to structure the information in the connection on a fibre bundle. The horizontal
lift functions $\theta$ and $\bar\theta$ in Definitions 67.5.4 and 67.8.2 are essentially the same. (See also Definitions 69.1.3
and 69.2.2 for the corresponding connections on principal bundles.) Four connection structures
which are not quite the same as the horizontal lift are the horizontal component map, the horizontal subspace
map, the vertical component map, and the connection form.


The horizontal component map $h_z$ in Definition 67.9.2 maps each vector in the tangent space $T_z(E)$ to
the corresponding horizontal vector in $T_z(E)$ with the same base-space velocity. The pointwise maps are
typically more useful in practice.


67.9.2 DEFINITION: Horizontal component map for an ordinary fibre bundle.
The (pointwise) horizontal component map at a point $z\in E$ for a transposed horizontal lift map $\bar\theta$ on a $C^1$
$(G,F)$ fibre bundle $(E,\pi,M,A_E^F)$ is the map $h_z : T_z(E)\to T_z(E)$ defined by

    $\forall y\in T_z(E)$, $h_z(y) = \bar\theta_z((d\pi)_z(y))$.

In other words, $\forall z\in E$, $h_z = \bar\theta_z\circ(d\pi)_z$.


The (global) horizontal component map for a transposed horizontal lift map $\bar\theta$ on a $C^1$ $(G,F)$ fibre bundle
$(E,\pi,M,A_E^F)$ is the map $h : T(E)\to T(E)$ defined by

    $\forall y\in T(E)$, $h(y) = \bar\theta_{\pi_E(y)}\bigl((d\pi)_{\pi_E(y)}(y)\bigr)$,

where $\pi_E : T(E)\to E$ is the projection map for $T(E)$.


67.9.3 REMARK: Some properties of horizontal component maps. 

In Theorem 67.9.4, part (iii) means that the vector $y - h_z(y)$ is vertical. (See Definition 64.5.7 for verticality.)
The chart-dependent drop function in Definition 64.6.2 may be applied to such vectors. If the total space
$E$ is equal to the tangent bundle $T(M)$ of $M$, then the chart-independent drop function in Definition 59.2.9
may be applied.


Theorem 67.9.4 (iv) is an attempt to recover the information in the original lift map $\bar\theta_z$ from the horizontal
component map $h_z$. If an arbitrary vector $y\in T_{z,V}(E)$ is chosen, the value of $h_z(y)$ is equal to $\bar\theta_z(V)$. (See
Notation 64.5.6 for $T_{z,V}(E)$.) The choice of $y$ is straightforward in any given fibre chart containing $z$. Thus
one may say that $\bar\theta$ and $h$ contain the same information. (Theorem 67.9.4 is illustrated in Figure 67.9.1.)


Figure 67.9.1 Horizontal and vertical components of vector on total space (diagram not reproduced; it
shows $T_{z,0}(E)$, $Q_z$, $y$, $h_z(y)$, $y - h_z(y)$, $(d\pi)_z$ and $\bar\theta_z$)

67.9.4 THEOREM: Some basic properties of horizontal component maps.
Let $\bar\theta$ be a transposed horizontal lift map on a $C^1$ $(G,F)$ fibre bundle $(E,\pi,M,A_E^F)$.

(i) $\forall z\in E$, $\forall y\in T_z(E)$, $(d\pi)_z(h_z(y)) = (d\pi)_z(y)$. In other words, $\forall z\in E$, $(d\pi)_z\circ h_z = (d\pi)_z$.
(ii) $\forall z\in E$, $\forall V\in T_{\pi(z)}(M)$, $\forall y\in T_{z,V}(E)$, $h_z(y)\in T_{z,V}(E)$. (See Notation 64.5.6 for $T_{z,V}(E)$.)
(iii) $\forall z\in E$, $\forall y\in T_z(E)$, $y - h_z(y)\in\ker((d\pi)_z) = T_{z,0}(E)$.
(iv) $\forall z\in E$, $\forall V\in T_{\pi(z)}(M)$, $\forall y\in T_{z,V}(E)$, $\bar\theta_z(V) = h_z(y)$.


PROOF: For part (i), let $z\in E$ and $y\in T_z(E)$. Then $(d\pi)_z(h_z(y)) = (d\pi)_z(\bar\theta_z((d\pi)_z(y))) = (d\pi)_z(y)$ by
Definition 67.8.2 (ii).

For part (ii), let $z\in E$, $V\in T_{\pi(z)}(M)$ and $y\in T_{z,V}(E)$. Then $(d\pi)_z(y) = V$ by Notation 64.5.6. So
$(d\pi)_z(h_z(y)) = V$ by part (i). Therefore $h_z(y)\in T_{z,V}(E)$.

For part (iii), let $z\in E$ and $y\in T_z(E)$. Then $(d\pi)_z(y - h_z(y)) = 0$ by part (i). So $y - h_z(y)\in\ker((d\pi)_z)$
by Definition 23.1.22.


For part (iv), let $z\in E$, $V\in T_{\pi(z)}(M)$ and $y\in T_{z,V}(E)$. Then $(d\pi)_z(y) = V$. So by Definition 67.9.2,
$h_z(y) = \bar\theta_z((d\pi)_z(y)) = \bar\theta_z(V)$.


67.9.5 REMARK: The horizontal subspace map for a connection on a differentiable fibre bundle. 

One of the more popular ways of structuring information about connections on principal bundles is the 
horizontal subspace map, but this concept is well defined for ordinary fibre bundles also. The horizontal 
subspace at a point z in the total space E is the set of all horizontal (i.e. parallel) tangent vectors at that 
point as in Definition 67.9.6. Another way to express this is that the horizontal subspace contains all vectors 
whose vertical component equals zero with respect to the given connection. 


The set of all horizontal vectors in the tangent bundle T(E) contains the same information as each of 
the other structures which represent a connection on a $C^1$ fibre bundle $(E,\pi,M,A_E^F)$. The horizontal lift
$\theta_V(z)$ may be recovered as the unique vector in the horizontal subspace $Q_z$ which has a given base-space
velocity $V\in T_{\pi(z)}(M)$. In other words, $\theta_V(z)$ is the unique $y\in Q_z$ which satisfies $(d\pi)_z(y) = V$.

A mnemonic for the letter Q in Definition 67.9.6 could be the German word “quer”, which means “crossways;
diagonally; at right angles”. (The notation-letter Q is also used by Kobayashi/Nomizu [19], page 63; Daniel/
Viallet [317], page 182; EDM2 [113], page 301.)


Some authors define connections initially as horizontal subspace maps, but as pointed out in Remark 67.11.1, 
this is a technically inconvenient starting-point for deriving other kinds of connection definitions from. 


67.9.6 DEFINITION: Horizontal subspace for an ordinary fibre bundle. 

The horizontal subspace at a point $z\in E$ for a transposed horizontal lift map $\bar\theta$ on a $C^1$ differentiable $(G,F)$
fibre bundle $(E,\pi,M,A_E^F)$ is the set $Q_z = \mathrm{Range}(\bar\theta_z)$.

The horizontal subspace map for a transposed horizontal lift map $\bar\theta$ on a $C^1$ differentiable $(G,F)$ fibre bundle
$(E,\pi,M,A_E^F)$ is the map $Q : E\to\mathbb{P}(T(E))$ defined by $Q_z = \mathrm{Range}(\bar\theta_z)$ for all $z\in E$.


67.9.7 REMARK: Maps and spaces for the horizontal lift, horizontal component and horizontal subspace.
The maps and spaces in Theorem 67.9.9 are illustrated in Figure 67.9.2. (See Definition 64.5.7 for the vertical
subspace $T_{z,0}(E)$.)


67.9.8 REMARK: Literature for the horizontal subspace connection representation style. 

The horizontal subspace style of connection representation is presented, either as a primary or secondary 
definition, by Kobayashi/Nomizu [19], pages 63-64; Bishop/Crittenden [2], page 75; Spivak [37], Volume 2, 
page 346; Sulanke/Wintgen [40], page 126; Drechsler/Mayer [262], pages 87-88, 203; EDM2 [113], page 301; 
Daniel/Viallet [317], page 182; Bleecker [254], page 29; Poor [32], page 54; Sternberg [38], page 334. 


67.9.9 THEOREM: Summary of basic properties of horizontal components and horizontal subspaces. 
Let $\bar\theta$ be a transposed horizontal lift map for a $C^1$ $(G,F)$ fibre bundle $(E,\pi,M,A_E^F)$. Let $z\in E$.
(i) $\mathrm{Range}((d\pi)_z) = T_{\pi(z)}(M)$.
(ii) $\ker((d\pi)_z) = T_{z,0}(E)$.
(iii) $\mathrm{Range}(\bar\theta_z) = Q_z$.
(iv) $\ker(\bar\theta_z) = \{0\}$.
(v) $\bar\theta_z\circ(d\pi)_z = h_z$.
(vi) $(d\pi)_z\circ\bar\theta_z = \mathrm{id}_{T_{\pi(z)}(M)}$.
(vii) $\mathrm{Range}(h_z) = Q_z$.
(viii) $\ker(h_z) = T_{z,0}(E)$.
(ix) $h_z\circ h_z = h_z$.
(x) $h_z\circ\bar\theta_z = \bar\theta_z$.
(xi) $(d\pi)_z\circ h_z = (d\pi)_z$.
(xii) $Q_z = \{y\in T_z(E);\ y = h_z(y)\}$.
(xiii) $Q_z\cap T_{z,0}(E) = \{0\}$.
(xiv) $\mathrm{Range}(\mathrm{id}_{T_z(E)} - h_z) = T_{z,0}(E)$.
(xv) $\ker(\mathrm{id}_{T_z(E)} - h_z) = Q_z$.

Figure 67.9.2 Horizontal lift, horizontal component map and horizontal subspace (diagram not reproduced)


PROOF: Part (i) follows from Theorem 64.5.10.

Part (ii) follows from Notation 64.5.6.

Part (iii) follows from Definition 67.9.6.

For part (iv), let $V\in\ker(\bar\theta_z)$. Then $\bar\theta_z(V) = 0$. So $(d\pi)_z(\bar\theta_z(V)) = 0$. So $V = 0$ by Definition 67.8.2 (ii).
Hence $\ker(\bar\theta_z) = \{0\}$.

Part (v) follows from Definition 67.9.2.

Part (vi) follows from Definition 67.8.2 (ii).

Part (vii) follows from parts (v), (i) and (iii).

For part (viii), let $y\in\ker(h_z)$. Then $\bar\theta_z((d\pi)_z(y)) = 0$ by part (v). So $(d\pi)_z(y) = 0$ by part (iv). So
$y\in T_{z,0}(E)$ by part (ii). Conversely, let $y\in T_{z,0}(E)$. Then $(d\pi)_z(y) = 0$. So $h_z(y) = 0$. So $y\in\ker(h_z)$.
Hence $\ker(h_z) = T_{z,0}(E)$.

For part (ix), $h_z\circ h_z = \bar\theta_z\circ(d\pi)_z\circ\bar\theta_z\circ(d\pi)_z$ by part (v). So $h_z\circ h_z = \bar\theta_z\circ\mathrm{id}_{T_{\pi(z)}(M)}\circ(d\pi)_z = \bar\theta_z\circ
(d\pi)_z = h_z$ by parts (vi) and (v).

For part (x), $h_z\circ\bar\theta_z = \bar\theta_z\circ(d\pi)_z\circ\bar\theta_z = \bar\theta_z\circ\mathrm{id}_{T_{\pi(z)}(M)} = \bar\theta_z$ by parts (v) and (vi).

For part (xi), $(d\pi)_z\circ h_z = (d\pi)_z\circ\bar\theta_z\circ(d\pi)_z = \mathrm{id}_{T_{\pi(z)}(M)}\circ(d\pi)_z = (d\pi)_z$ by parts (v) and (vi).

For part (xii), let $y\in Q_z$. Then $y = \bar\theta_z(V)$ for some $V\in T_{\pi(z)}(M)$ by part (iii). So $h_z(y) = h_z(\bar\theta_z(V)) =
\bar\theta_z(V) = y$ by part (x). So $Q_z\subseteq\{y\in T_z(E);\ y = h_z(y)\}$. Now suppose that $y\in T_z(E)$ and $y = h_z(y)$. Then
$y\in\mathrm{Range}(h_z) = Q_z$ by part (vii). Hence $Q_z = \{y\in T_z(E);\ y = h_z(y)\}$.

For part (xiii), let $y\in Q_z\cap T_{z,0}(E)$. Then $y = h_z(y)$ by part (xii). But $T_{z,0}(E) = \ker(h_z)$ by part (viii). So
$h_z(y) = 0$. Therefore $y = 0$. Hence $Q_z\cap T_{z,0}(E) = \{0\}$.

Part (xiv) follows from Theorem 67.9.4 (iii).
Part (xv) follows from part (xii).

67.10. Vertical component maps


67.10.1 REMARK: The vertical component map for a given connection.

The expression “$\mathrm{id}_{T_z(E)} - h_z$” which appears in Theorem 67.9.9 (xiv, xv) is the vertical component map.
This is given a name and notation in Definition 67.10.2. Some properties of this vertical component map are
listed in Theorem 67.10.3. (See also Theorem 69.4.3 for the corresponding properties for a principal bundle.)
It is important to remember that this is connection-dependent. Without a horizontal component definition,
the vertical component would be undefined.


67.10.2 DEFINITION: Vertical component map for an ordinary fibre bundle.
The (pointwise) vertical component map at a point $z\in E$ for a transposed horizontal lift map $\bar\theta$ on a $C^1$
differentiable $(G,F)$ bundle $(E,\pi,M,A_E^F)$ is the map $v_z : T_z(E)\to T_z(E)$ defined by

    $\forall y\in T_z(E)$, $v_z(y) = y - h_z(y)$.

In other words, $\forall z\in E$, $v_z = \mathrm{id}_{T_z(E)} - h_z$.


The (global) vertical component map for a transposed horizontal lift map $\bar\theta$ on a $C^1$ differentiable $(G,F)$
bundle $(E,\pi,M,A_E^F)$ is the map $v : T(E)\to T(E)$ defined by

    $\forall y\in T(E)$, $v(y) = v_{\pi_E(y)}(y)$,

where $\pi_E : T(E)\to E$ is the projection map for $T(E)$.


67.10.3 THEOREM: Some basic properties of the vertical component map for an ordinary fibre bundle. 
Let $\bar\theta$ be a transposed horizontal lift map for a $C^1$ $(G,F)$ fibre bundle $(E,\pi,M,A_E^F)$. Let $z\in E$.

(i) $\mathrm{Range}(v_z) = T_{z,0}(E)$.
(ii) $\ker(v_z) = Q_z$.
(iii) $v_z\circ h_z = 0$.
(iv) $v_z\circ\bar\theta_z = 0$.
(v) $v_z\circ v_z = v_z$.
(vi) $h_z\circ v_z = 0$.
(vii) $(d\pi)_z\circ v_z = 0$.


PROOF: Part (i) follows from Theorem 67.9.9 (xiv) and Definition 67.10.2.
Part (ii) follows from Theorem 67.9.9 (xv).
Part (iii) follows from Theorem 67.9.9 (ix) and Definition 67.10.2.
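
The horizontal and vertical component maps can be written out explicitly in the simplest possible case. The following is a sketch under assumptions made only for this example: $E = \mathbb{R}^2$ with coordinates $(x,q)$, $M = \mathbb{R}$, $\pi(x,q) = x$, tangent vectors at $z$ written as column vectors in $\mathbb{R}^2$, and a hypothetical real coefficient $a = a(z)$ defining the transposed lift. The structure group condition is ignored here, as for the fibrations in Remark 67.7.7.

```latex
% Sketch only: E = R^2, M = R, \pi(x,q) = x; a = a(z) is a hypothetical real coefficient.
\begin{align*}
(d\pi)_z &= \begin{pmatrix} 1 & 0 \end{pmatrix}, \qquad
\bar\theta_z = \begin{pmatrix} 1 \\ a \end{pmatrix}, \qquad
(d\pi)_z\,\bar\theta_z = 1 \\
h_z &= \bar\theta_z\,(d\pi)_z = \begin{pmatrix} 1 & 0 \\ a & 0 \end{pmatrix}, \qquad
v_z = \mathrm{id} - h_z = \begin{pmatrix} 0 & 0 \\ -a & 1 \end{pmatrix} \\
Q_z &= \mathrm{Range}(h_z) = \ker(v_z) = \{(t, at);\ t\in\mathbb{R}\} \\
T_{z,0}(E) &= \mathrm{Range}(v_z) = \ker(h_z) = \{(0, s);\ s\in\mathbb{R}\} \\
h_z h_z &= h_z, \qquad v_z v_z = v_z, \qquad v_z h_z = 0, \qquad h_z v_z = 0 \\
v_z\,\bar\theta_z &= \begin{pmatrix} 0 \\ -a + a \end{pmatrix} = 0, \qquad
(d\pi)_z\,v_z = \begin{pmatrix} 0 & 0 \end{pmatrix}
\end{align*}
```

In this example, the unique vector in $Q_z$ with base-space velocity $V$ is $(V, aV)$, which is how the lift is recovered from the horizontal subspace.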


67.11. Connection definition conversions for ordinary bundles 


67.11.1 REMARK: Conversion rules for four definitions of a connection on an ordinary fibre bundle. 
Theorem 67.11.2 aggregates the conversion rules between four different ways of encapsulating the information
in a connection for an ordinary fibre bundle, namely the objects $\theta_z$, $h_z$, $v_z$ and $Q_z$. (See Theorem 69.6.3
for analogous conversion rules for principal bundles.) The main difficulties are constructing $\theta_z$ as an output,
and constructing anything from $Q_z$ as an input. This suggests that $\theta_z$ is the best choice of a fundamental
definition from which other encapsulations can be easily constructed, and $Q_z$ is the worst. (And yet some
authors use $Q_z$ as their basic definition!)

67.11.2 THEOREM: Conversion rules for four encapsulations of a connection on ordinary fibre bundles.
Let $\theta$ be a transposed horizontal lift map for a $C^1$ $(G, F)$ fibre bundle $(E, \pi, M, A_E^F)$. Let $z \in E$.
(i) $\forall V \in T_{\pi(z)}(M)$, $\{\theta_z(V)\} = T_{z,V}(E) \cap \mathrm{Range}(h_z)$.
(ii) $\forall V \in T_{\pi(z)}(M)$, $\{\theta_z(V)\} = T_{z,V}(E) \cap \ker(v_z)$.
(iii) $\forall V \in T_{\pi(z)}(M)$, $\{\theta_z(V)\} = T_{z,V}(E) \cap Q_z$.
(iv) $h_z = \theta_z \circ (d\pi)_z$.
(v) $h_z = \mathrm{id}_{T_z(E)} - v_z$.
(vi) $\forall y \in T_z(E)$, $\{h_z(y)\} = \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$.
(vii) $v_z = \mathrm{id}_{T_z(E)} - \theta_z \circ (d\pi)_z$.
(viii) $v_z = \mathrm{id}_{T_z(E)} - h_z$.
(ix) $\forall y \in T_z(E)$, $\{v_z(y)\} = \{y' \in T_{z,0}(E);\ y - y' \in Q_z\}$.
(x) $Q_z = \mathrm{Range}(\theta_z)$.
(xi) $Q_z = \mathrm{Range}(h_z)$.
(xii) $Q_z = \ker(v_z)$.

PROOF: For part (i), let $z \in E$, $p = \pi(z)$ and $V \in T_p(M)$. Suppose that $y \in \{\theta_z(V)\}$. Then $y = \theta_z(V)$. So
$y \in T_{z,V}(E)$ by Definition 67.8.2 (i, ii) and Notation 64.5.6, and $y \in \mathrm{Range}(h_z)$ by Theorem 67.9.9 (iii, vii).
So $y \in T_{z,V}(E) \cap \mathrm{Range}(h_z)$. To show the reverse inclusion, suppose that $y \in T_{z,V}(E) \cap \mathrm{Range}(h_z)$. Then
$y \in T_{z,V}(E) \cap \mathrm{Range}(\theta_z)$ by Theorem 67.9.9 (iii, vii). So $y = \theta_z(V')$ for some $V' \in \mathrm{Dom}(\theta_z) = T_p(M)$. But
then $y \in T_{z,V'}(E)$ by Definition 67.8.2 (i, ii). Therefore $V' = V$ by Notation 64.5.6 and the well-definition
of $(d\pi)_z$. So $y = \theta_z(V)$. Hence $\{\theta_z(V)\} = T_{z,V}(E) \cap \mathrm{Range}(h_z)$.

Part (ii) follows from part (i) and Theorem 67.9.9 (xv, vii).

Part (iii) follows from part (ii) and Theorem 67.9.9 (xv).

Part (iv) follows from Theorem 67.9.9 (v).

Part (v) follows from Definition 67.10.2.

For part (vi), let $z \in E$ and $y \in T_z(E)$. Suppose that $y' \in \{h_z(y)\}$. Then $y' = h_z(y)$. So $y' \in Q_z$ by
Theorem 67.9.9 (vii), and $y - y' = y - h_z(y) = v_z(y) \in T_{z,0}(E)$ by Theorem 67.9.9 (xiv). Therefore $y - y' \in
\ker((d\pi)_z)$ by Notation 64.5.6. Thus $y' \in \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$. To show the reverse inclusion,
suppose that $y' \in \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$. Then $y' \in Q_z$ and $y - y' \in \ker((d\pi)_z)$. So $y' = h_z(y')$
by Theorem 67.9.9 (xii), and $(d\pi)_z(y) = (d\pi)_z(y')$ by the linearity of $(d\pi)_z$. Therefore by Definition 67.9.2,
$h_z(y) = \theta_z((d\pi)_z(y)) = \theta_z((d\pi)_z(y')) = h_z(y') = y'$. Hence $\{h_z(y)\} = \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$.

Part (vii) follows from Definitions 67.10.2 and 67.9.2.

Part (viii) follows from Definition 67.10.2.

For part (ix), let $y \in T_z(E)$. Then it follows from part (vi) and Definition 67.10.2 that

$\{v_z(y)\} = \{y - h_z(y)\}$
$= \{y - y';\ y' \in Q_z \text{ and } y - y' \in \ker((d\pi)_z)\}$
$= \{y';\ y - y' \in Q_z \text{ and } y' \in \ker((d\pi)_z)\}$
$= \{y' \in T_{z,0}(E);\ y - y' \in Q_z\}.$

Part (x) follows from Theorem 67.9.9 (iii).
Part (xi) follows from Theorem 67.9.9 (vii).
Part (xii) follows from Theorem 67.9.9 (xv).
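
To make the conversion rules concrete, here is a minimal coordinate sketch. All of its ingredients are illustrative assumptions and are not taken from the text: the trivial bundle $E = \mathbb{R} \times \mathbb{R}$ over $M = \mathbb{R}$ with $\pi(x, a) = x$, a hypothetical coefficient function $A : \mathbb{R} \to \mathbb{R}$, and the identification of $T_z(E)$ with pairs $(\xi, \eta) \in \mathbb{R}^2$. The structure-group condition for a connection is not verified here. Only the linear algebra of Theorem 67.11.2 is illustrated.

```latex
% Illustrative assumptions: z = (x, a), T_z(E) \cong \mathbb{R}^2, (d\pi)_z(\xi, \eta) = \xi,
% and A is a hypothetical coefficient function.
\begin{align*}
\theta_z(V) &= (V,\, -A(x)\, a\, V), \\
h_z(\xi, \eta) &= \theta_z((d\pi)_z(\xi, \eta)) = (\xi,\, -A(x)\, a\, \xi), \\
v_z(\xi, \eta) &= (\xi, \eta) - h_z(\xi, \eta) = (0,\, \eta + A(x)\, a\, \xi), \\
Q_z &= \mathrm{Range}(\theta_z) = \{ (\xi,\, -A(x)\, a\, \xi);\ \xi \in \mathbb{R} \} = \ker(v_z), \\
T_{z,V}(E) \cap Q_z &= \{ (V, \eta);\ \eta = -A(x)\, a\, V \} = \{ \theta_z(V) \}.
\end{align*}
```

The last line recovers $\theta_z$ from $Q_z$ by intersection, in the manner of part (iii). The second line builds $h_z$ from $\theta_z$ by composition, as in part (iv).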


67.11.3 REMARK: Construction of horizontal lift function via cross-sections. 

There is a partial solution to the problem mentioned in Remark 67.11.1, that the transposed horizontal lift
function $\theta$ cannot be easily constructed from the horizontal or vertical component maps. The difficulty is
apparently the fact that the equation $h_z = \theta_z \circ (d\pi)_z$ in Theorem 67.11.2 (iv) cannot be “solved” for $\theta_z$ by
“dividing through” by $(d\pi)_z$ because $(d\pi)_z$ is not injective.

What is needed is some kind of lift-map from vectors $V \in T_{\pi(z)}(M)$ to $T_z(E)$ which has the correct horizontal
component, so that the lifted vector can have its vertical component removed by $h_z$. In fact, any $C^1$ cross-section
$X$ of $(E, \pi, M)$ which satisfies $X(\pi(z)) = z$ will fulfil this requirement by Theorem 64.7.10. This idea is
applied in Theorem 67.11.4 (i, ii). However, a $C^1$ cross-section $X$ must be chosen such that $X(p) = z$ for any
given $z \in E$, where $p = \pi(z)$. This choice is made in Theorem 67.11.4 (iii, iv), which makes use of the “constant
cross-section extensions” of fibre elements in Definition 21.6.8. Unfortunately, these constant cross-sections require fibre
charts, which are not much less awkward than the cross-sections. So ultimately Theorem 67.11.4 does not
provide better formulas than Theorem 67.11.2 (i, ii).


67.11.4 THEOREM: Formulas for transposed horizontal lift in terms of horizontal/vertical components. 
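Let $\theta$ be a transposed horizontal lift map for a $C^1$ $(G, F)$ fibre bundle $(E, \pi, M, A_E^F)$.
(i) $\forall X \in \mathring{X}^1(E, \pi, M)$, $\forall p \in \mathrm{Dom}(X)$, $\theta_{X(p)} = h_{X(p)} \circ (dX)_p$.
(ii) $\forall X \in \mathring{X}^1(E, \pi, M)$, $\forall p \in \mathrm{Dom}(X)$, $\theta_{X(p)} = (dX)_p - v_{X(p)} \circ (dX)_p$.
(iii) $\forall \phi \in A_E^F$, $\forall z \in \mathrm{Dom}(\phi)$, $\theta_z = h_z \circ (d\,\mathrm{Ext}_{\pi,\phi}(z))_{\pi(z)}$.
(iv) $\forall \phi \in A_E^F$, $\forall z \in \mathrm{Dom}(\phi)$, $\theta_z = (d\,\mathrm{Ext}_{\pi,\phi}(z))_{\pi(z)} - v_z \circ (d\,\mathrm{Ext}_{\pi,\phi}(z))_{\pi(z)}$.
(v) $\forall \phi \in A_E^F$, $\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$, $\theta_z(V) = h_z(H_{V,\phi}(z))$.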
In other words, Vz € E, Vo € AE (2)? 0, = h;((m x p) Or, 0»))- 
(vi) $\forall \phi \in A_E^F$, $\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$, $\theta_z(V) = H_{V,\phi}(z) - v_z(H_{V,\phi}(z))$.
In other words, Vz € E, Vo € AE (2)? 0, = (m x 6); 1( Or, 0») — vs Rapt S 0r, 0»))- 
(See Definition 21.6.8 for $\mathrm{Ext}_{\pi,\phi}(z)$. See Definition 64.5.13 for $H_{V,\phi}(z)$.)


PROOF: For part (i), let $X \in \mathring{X}^1(E, \pi, M)$ and $p \in \mathrm{Dom}(X)$. Let $z = X(p)$. Let $V \in T_p(M)$.
Then $(dX)_p(V) \in T_{X(p)}(E)$ by Theorem 58.4.8. Let $y = (h_{X(p)} \circ (dX)_p)(V) = h_z((dX)_p(V))$. Then
$y \in \mathrm{Range}(h_z)$, and $y \in T_z(E)$ by Definition 67.9.2. But $(d\pi)_z(y) = (d\pi)_z(h_z((dX)_p(V))) = (d\pi)_z((dX)_p(V))$
by Theorem 67.9.4 (i). Therefore $(d\pi)_z(y) = V$ by Theorem 64.7.10. So $y \in T_{z,V}(E)$ by Definition 64.5.7.
Therefore $y = \theta_z(V)$ by Theorem 67.11.2 (i). Thus $\theta_{X(p)}(V) = h_{X(p)}((dX)_p(V))$ for all $V \in T_p(M)$. Hence
$\theta_{X(p)} = h_{X(p)} \circ (dX)_p$.

Part (ii) follows from part (i) and Definition 67.10.2.

Part (iii 




67.11.5 REMARK: Expressing connections in terms of the connection generator function. 

The connection generator function tig in Definition 67.6.5 faithfully encodes all of the information in a 
connection $\theta$. However, this generator function is fibre-chart-dependent, and its relations to $\theta$, $\theta_z$, $h_z$, $v_z$
and Q; are not simple. Theorem 67.6.7 (i) gives a formula for 0 in terms of dig. Theorem 67.11.6 gives some 
more formulas. 


67.11.6 THEOREM: Expressions for connections in terms of the connection generator function. 
Let 0 be a horizontal lift map for a C! (G, F) fibre bundle (E, r, M, A‘). 


(i) Vp € M, VV € T, (M), Yọ € Afp, Vz € Ep, Óv(z) = 6,(V) = (d(x x $))z (V, be (vg) (6 (2))). 
(ii) Vp € M, Yz € Ep, Vy € T.(E), Vb € A ps hely) = (d(T x 6) (mlu), X2, 6), OED). 
(iii) Vz € E, VV € Tr(a (M), Vy € Tz,v(E), Vó € AZ ay ely) = (dr x $)71(V, X. (v, (6(2))). 
(iv) Vz € E, Vy € TE), Vé € AE is valy) = (dlr X 8) i (tei 9) XE eee a) (8(9)). 
(v) Ve € E, Vy € T), VO € AE c. vey) = m Gd], )z (0) — XE, oC), 
where n” : TZ(Ez(4)) > Tz,0(£) is the fibre-set tangent on embedding map as in Notation 64.12.10. 


(vi) Vz € E, Wy € T(E), V € A5 ny os (y) = m (Co De (be) — XE cast. (90))- 

PROOF: Part (i) follows from Theorem 67.6.7 (i) and Definition 67.8.2.

Part (ii) follows from part (i) and Definition 67.9.2. 

Part (iii) is a restatement of part (ii) using Notation 64.5.6. 

For part (iv), let $z \in E$, $y \in T_z(E)$ and $\phi \in A_{E,\pi(z)}^F$. Then $y \in \mathrm{Dom}((d(\pi \times \phi))_z) = T_z(E)$. Therefore
$y = (d(\pi \times \phi))_z^{-1}((d(\pi \times \phi))_z(y))$ because by Theorem 58.5.4, $(d(\pi \times \phi))_z : T_z(E) \to T_{(\pi(z),\phi(z))}(M \times F)$ is a
bijection. But $(d(\pi \times \phi))_z(y) = ((d\pi)_z(y), (d\phi)_z(y))$ by Theorem 58.7.7 (ignoring the tangent space product
identification map in the customary manner). Hence by part (ii) and Definition 67.10.2,


ve(y) = y — he(y) 

= (d(v x 6))2" (dz): (y), (dd) 2(y)) — (d ae P))z (T (Y), X2, (602) 

= (d(m x $))z (T (y) — T (Y), Oxy P ào(n.(),) (6 (2) (67.1.1) 
= (d(T x $))z (0r, 0, P(Y) — Xis), (P(2))), 

where line (67.11.1) follows from the linearity of $(d(\pi \times \phi))_z^{-1} : T_{\pi(z)}(M) \times T_{\phi(z)}(F) \to T_z(E)$.


Part (v) follows from part (iv) and Theorem 58.7.5 line (58.7.5). 
Part (vi) follows from part (v) and Theorem 58.9.13. 


67.12. Associated connections on ordinary fibre bundles 


67.12.1 REMARK: Associated connections literature and history. 

Associated connections are presented by Sternberg [38], pages 335-336; Poor [32], pages 276-279, 290-293; 
Frankel [12], pages 481-487; Drechsler/Mayer [262], pages 92-95; Daniel/Viallet [317], page 186; Spivak [37], 
Volume 2, pages 346-348; Eriksson/Hàggblad/Strómbom [264], page 55; Bishop/Crittenden [2], pages 83- 
84. They are also mentioned briefly by Nash/Sen [30], page 184. Associated connections are attributed by 
Poor [32], page 290, to a 1950 paper by Ehresmann. (See Ehresmann [178], pages 39-42.) 


67.12.2 REMARK: Associated connections on associated differentiable ordinary fibre bundles. 

The motivation for defining associated differentiable ordinary fibre bundles in Section 66.7 is to prepare the 
path towards definitions of associated connections. Most often, a connection is defined first on a PFB, and 
is then “ported” to an associated OFB. In Definition 67.12.3, a kind of “peer-to-peer port” is defined, which 
bypasses the need for a PFB connection as an intermediary. Associated connections between peer OFBs 
always imply the existence of an associated PFB connection in the background. The PFB connection is 
usually presented explicitly as the source of the OFB connections, but it is not strictly necessary. 


Associated fibre bundles are typically constructed according to the requirements of particular applications, 
for example by the methods in Definitions 66.7.10 and 66.7.12. However, Definition 67.12.3 assumes abstract 
fibre bundle associations as in Definition 66.7.5, independent of the construction method. Connections are 
defined to be horizontal lift functions as in Definition 67.5.4. 


Definition 67.12.3 means that with respect to associated fibre charts, the infinitesimal transformations of the 
two fibre spaces which define the connections on two fibre bundles are generated by the same Lie algebra 
element of the structure group. In terms of the figure/frame bundle concepts in Section 20.10, this means that 
the associated infinitesimal transformations of the measurement spaces $F^1$ and $F^2$ are both generated by the
same infinitesimal adjustment $u \in T_e(G)$ to the reference frame with respect to which the two measurements
are made. There is only one (adjustable) reference frame, but two kinds of measurements are made with 
respect to the same (adjustable) reference frame. For example, one measurement could be a vector and the 
other could be a tensor, but each measurement is acted on by the same infinitesimal matrix adjustment. 
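
The vector/tensor example can be sketched explicitly. The following is an illustration under assumptions that are not taken from the text: the group $G = GL(\mathbb{R}^n)$ acts from the left on $F^1 = \mathbb{R}^n$ by $q \mapsto g q$ and on $F^2 = \mathbb{R}^{n \times n}$ by $T \mapsto g\, T\, g^{\mathsf{T}}$, and $T_e(G)$ is identified with the $n \times n$ matrices. The sign and ordering conventions of Definition 63.6.5 may differ from this sketch.

```latex
% Assumed actions: g \cdot q = g q on F^1 and g \cdot T = g T g^{\mathsf{T}} on F^2.
% The same Lie algebra element u generates both infinitesimal transformations.
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0} (I + t u)\, q &= u\, q, \\
\left.\frac{d}{dt}\right|_{t=0} (I + t u)\, T\, (I + t u)^{\mathsf{T}} &= u\, T + T\, u^{\mathsf{T}}.
\end{align*}
```

The two velocity fields look very different, but each is determined by the single matrix $u$. This is the sense in which both measurements are acted on by the same infinitesimal adjustment.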


67.12.3 DEFINITION: Associated connections have the same connection generator for each velocity. 
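Associated connections on $C^1$ $(G, F^k)$ fibre bundles $(E^k, \pi^k, M, A_{E^k}^{F^k})$ for $k = 1, 2$, with fibre chart association
map $h : A_{E^1}^{F^1} \to A_{E^2}^{F^2}$, are connections $\theta^k : T(M) \to \bigcup_{p \in M} (E^k_p \to T(E^k))$ which satisfy


$\forall \phi \in A_{E^1}^{F^1},\ \forall p \in \pi^1(\mathrm{Dom}(\phi)),\ \forall V \in T_p(M),\ \exists u \in T_e(G),$
$\phi_* \circ \theta^1_V \circ \phi|_{E^1_p}^{-1} = X^{F^1}_u$ and $h(\phi)_* \circ \theta^2_V \circ h(\phi)|_{E^2_p}^{-1} = X^{F^2}_u$. (67.12.1)

(See Definition 63.6.5 for infinitesimal transformations $X^{F^1}_u \in X(T(F^1))$ and $X^{F^2}_u \in X(T(F^2))$.)


In other words,


$\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi^1 \in A_{E^1,p}^{F^1},\ \forall \phi^2 \in A_{E^2,p}^{F^2},$
$h(\phi^1) = \phi^2 \Rightarrow \mathrm{Gen}^{F^1}(\phi^1_* \circ \theta^1_V \circ \phi^1|_{E^1_p}^{-1}) = \mathrm{Gen}^{F^2}(\phi^2_* \circ \theta^2_V \circ \phi^2|_{E^2_p}^{-1})$. (67.12.2)


(See Notation 63.6.21 for maps $\mathrm{Gen}^{F^1} : X^{F^1}_u \mapsto u$ and $\mathrm{Gen}^{F^2} : X^{F^2}_u \mapsto u$ from infinitesimal transformations
to the Lie algebra elements which generate them.)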


67.12.4 REMARK: Associated connections have matching connection generator functions. 

Theorem 67.12.5 expresses the association of connections in Definition 67.12.3 in terms of the connection 
generator functions tig in Definition 67.6.5. Theorem 67.12.5 is nothing more than a short, snappy paraphrase 
of Definition 67.12.3 line (67.12.1). It says nothing new. 


67.12.5 THEOREM: Connections are associated if and only if their connection generator functions are equal. 
For k = 1,2, let E* < (EF, n", M, AE be associated C! (G, F^) fibre bundles with fibre chart association 
map h: AP, > ae Let 6” be a connection on E* for k = 1,2. Then 6! and 6? are associated connections 


if and only if 
YV e T(M), Y$ € Abi, dig: (V, d) = tiga (V, h(9)). 


PROOF: The assertion follows by Definition 67.12.3 line (67.12.1), Theorem 67.6.3 (i) and Definition 67.6.5. 
Alternatively apply Theorem 67.6.7 (iii) to Definition 67.12.3 line (67.12.2). 


67.12.6 REMARK: Associated connections induce associated vector fields on their fibre spaces. 

According to Definition 67.5.4 (v), a connection $\theta_V$, for $V \in T(M)$, is equal to the vector field $X^F_{u(V,\phi)}$
generated by some Lie algebra element $u(V, \phi) \in T_e(G)$ when viewed through a fibre chart $\phi$. This implies
that the connection preserves the invariants which are enforced by the structure group.

Any two associated fibre bundles have the same structure group. So their connections $\theta^k_V$ for $k = 1, 2$ must
equal vector fields $X^{F^k}_{u^k(V,\phi^k)}$ when viewed through fibre charts $\phi^k$ for some $u^k(V, \phi^k) \in T_e(G)$, for the same $G$.
This is inevitable from the definition of a connection and the definition of associated fibre bundles. What is
not inevitable is that $u^1(V, \phi^1)$ will equal $u^2(V, \phi^2)$ for all associated charts $\phi^k \in A_{E^k}^{F^k}$ for $k = 1, 2$. This
is the significance of lines (67.12.1) and (67.12.2).

Associated connections thus make only one requirement which is “not inevitable”, namely that the generators
of the vector fields on the respective fibre spaces $F^1$ and $F^2$ must be the same Lie algebra element. So it is
not necessary to write $u$ as $u^k$ to indicate the dependence of $u$ on $F^k$ in Definition 67.12.3 because they are
the same, even if the fibre spaces are very different.

The equality of the generators $u^k(V, \phi^k)$ for different fibre spaces $F^k$ is implicit in the way in which gauge
potentials are applied to different OFBs in gauge theory. The gauge potentials, generally denoted as something
similar to “$A_\mu$”, are kept the same for different associated fibre bundles. (This assumes that the same
“gauge” is used. The “gauge” is a cross-section of the principal bundle which pulls Lie algebra elements
down to the space-time manifold. See Remark 69.11.2 for further details.)


67.12.7 REMARK: Associated connections for principal bundles. 

The “connection generator equality criterion" for associated connections in Definition 67.12.3 line (67.12.1) 
requires no change when it is applied to associated principal bundles. This is confirmed by Theorem 69.10.4, 
which asserts that the style of orbit-space associated vector bundle connection which is often seen in the 
literature matches Definition 67.12.3 without a change of sign or any other adjustment. This confirms also 
that principal bundles truly are special cases of ordinary fibre bundles, as implied by Definition 66.1.2. Hence 
all definitions and theorems for general fibre bundles are valid when applied to principal bundles. 


Chapter 68


CONNECTIONS ON VECTOR BUNDLES


68.1 Connections on vector bundles
68.2 Covariant derivatives of vector bundle cross-sections
68.3 Moving-frame coefficients for vector bundle connections
68.4 Covariant derivatives of functions on the total space
68.5 Parallel transport on ordinary fibre bundles


68.1. Connections on vector bundles 


68.1.1 REMARK: Interpretation of the infinitesimal action condition for a vector bundle connection. 
Figure 68.1.1 illustrates the specialisation of connections (i.e. horizontal lift functions) in Definition 67.5.4 
from general differentiable fibre bundles to vector bundles. 


Figure 68.1.1 Connection on a vector bundle

A specific feature of the vector bundles in Definition 65.1.3 is that the manifold charts $\psi_F$ on the fibre
space $F$ are linear space component maps $\kappa_B$ with respect to bases $B$ of $F$. Consequently all fibre space
charts are global, which implies that $\mathrm{atlas}_q(F) = \mathrm{atlas}(F)$ for all $q \in F$. Similarly, the charts $\psi_G$ for the
structure group $GL(F)$ are component maps $\kappa_{B,B}$ as in Definition 23.2.10. Therefore $\mathrm{atlas}_g(G) = \mathrm{atlas}(G)$
for all $g \in G$. (See Definition 63.4.17 for the atlases which are specified for a general linear Lie transformation
group $(GL(F), F)$ for finite-dimensional real linear spaces $F$.)


68.1.2 REMARK: Linearity of connections on differentiable vector bundles.

Connections $\theta$ on general differentiable fibre bundles $(E, \pi, M, A_E^F)$ in Definition 67.4.2 are linear with respect
to the base-space velocity vector. That is, the map $V \mapsto \theta_V(z)$ is linear with respect to $V \in T_{\pi(z)}(M)$ for
fixed $z \in E$ by Definition 67.4.2 (iii). For the vector bundles in Definition 65.1.3, the parameter $z$ lies in
a linear space $E_{\pi(z)}$, with linear structure induced from the fibre space as in Definition 65.1.6. Thus the
domain $E_p$ of $\theta_V$ has a well-defined linear structure for all $V \in T_p(M)$, for all $p \in M$. (This domain is stated
explicitly in Definition 67.4.2 (i).)

However, the range of $\theta_V$ does not have a simple linear space structure in general. By Definition 67.4.2 (ii, iv),
the range of $\theta_V$ is $\bigcup_{z \in E_p} T_{z,V}(E)$, where $V \in T_p(M)$. (See Notation 64.5.6 for $T_{z,V}(E)$.) There are two
difficulties here. First, the linear spaces $T_z(E)$ are disjoint for distinct $z \in E_p$. Secondly, the restriction of
$T_z(E)$ to $T_{z,V}(E)$ implies that only a hyperplane of $T_z(E)$ is used for the range, and the linear combination
operation is not algebraically closed on such subsets. In order to assert linearity for connections on vector
bundles, the difficulties with both the $z$ parameter and $V$ parameter must be removed somehow.

The connection map $\theta_V : E_p \to \bigcup_{z \in E_p} T_{z,V}(E)$ is clearly not linear because its range is not a linear space.
Nor is the range even a subset of a single linear space. But if the vertical component of $\theta_V(z)$ is mapped
to $F$ via the differential $(d\phi)_z$ of a chart $\phi \in A_{E,p}^F$, the mapped vertical component is shown to depend
linearly on $z$ in Theorem 68.1.3. This mapped value depends very much on the chart, as shown explicitly in
Theorem 64.8.10. So this linearity is valid only for fixed fibre charts.

One could assert that each set $T_{z,V}(E)$ has a linear space structure which is induced by the chart differentials
$(d\phi)_z|_{T_{z,V}(E)}$, but unfortunately these spaces are disjoint for distinct $z$. Therefore the claim that connections
on vector bundles are linear is a very limited claim. (The corresponding linearity issues for affine connections
on tangent bundles are discussed informally in Remark 71.1.3.)

It should perhaps also be noted that Definition 67.5.4 (v) implies that the map $\phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} : F \to T(F)$
is a cross-section of the tangent bundle $T(F)$, and in fact it has the specific form $\phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} = X^F_u$ for
a Lie algebra element $u \in T_e(G)$ which depends on $V$ and $\phi$. So Theorem 68.1.3 is effectively asserting that
this infinitesimal transformation of $F$ is linear when dropped from $T(F)$ to $F$, which is not surprising. So
the vertical component linearity is a consequence of the definitions of vector bundles and connections. Thus
the definition of a connection does not require additional ad-hoc conditions to force it to be linear.

The maps in lines (68.1.1) and (68.1.2) of Theorem 68.1.3 are not bijections in general. In fact, the ranges
of these linear functions could contain only the zero vector in the case of a “flat space”.


68.1.3 THEOREM: Vertical component linearity for connections on differentiable vector bundles. 
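Let $\theta$ be a connection on a $C^1$ vector bundle $(E, \pi, M, A_E^F)$ over a finite-dimensional real linear space $F$ with
$m = \dim(F)$. Then

$\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A_{E,p}^F,$
$\varpi^F \circ \phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} : F \to F$ is linear, (68.1.1)

where $\varpi^F : T(F) \to F$ is the linear-space tangent-bundle drop function in Definition 54.9.5. Hence

$\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A_{E,p}^F,\ \forall \psi_F \in \mathrm{atlas}(F),$
$\Phi(\psi_F) \circ \phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} : F \to \mathbb{R}^m$ is linear. (68.1.2)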


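PROOF: To show that $\varpi^F \circ \phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} : F \to F$ is well defined for $p \in M$, $V \in T_p(M)$ and $\phi \in A_{E,p}^F$,
first note that $\phi|_{E_p}^{-1} : F \to E_p$ is well defined because $\phi|_{E_p} : E_p \to F$ is a bijection by Theorem 21.5.6 (i).
So $\theta_V \circ \phi|_{E_p}^{-1} : F \to \bigcup_{z \in E_p} T_z(E)$ is well defined because $\mathrm{Dom}(\theta_V) = E_p$ by Definition 67.5.4 (i) and
$\mathrm{Range}(\theta_V) \subseteq \bigcup_{z \in E_p} T_z(E)$ by Definition 67.5.4 (ii). Since $\mathrm{Dom}(\phi_*) = \bigcup_{z \in \mathrm{Dom}(\phi)} T_z(E) \supseteq \bigcup_{z \in E_p} T_z(E)$
and $\mathrm{Range}(\phi_*) \subseteq T(F)$, it follows that $\phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} : F \to T(F)$ is well defined. Then finally, since
$\mathrm{Dom}(\varpi^F) = T(F)$ and $\mathrm{Range}(\varpi^F) \subseteq F$, it follows that $\varpi^F \circ \phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} : F \to F$ is well defined. This
chain of maps is illustrated in Figure 68.1.2.

Figure 68.1.2 Maps for linearity of a connection on a vector bundle. (The chain is $F \to E_p \to \bigcup_{z \in E_p} T_{z,V}(E) \to T(F) \to F$, with the successive maps $\phi|_{E_p}^{-1}$, $\theta_V$, $\phi_*$ and $\varpi^F$.)

To show linearity, let $p \in M$, $V \in T_p(M)$ and $\phi \in A_E^F$ with $p \in \pi(\mathrm{Dom}(\phi))$. Then by Definition 67.5.4 (v)
there is a $u \in T_e(G)$, depending on $V$ and $\phi$, such that $\forall z \in E_p$, $(d\phi)_z(\theta_V(z)) = (dR_{\phi(z)})_e(u)$. (See
Definition 63.6.5 for infinitesimal transformations $X^F_u : q \mapsto (dR_q)_e(u)$ for $u \in T_e(G)$ and $q \in F$.)

Let $q \in F$. Let $z = \phi|_{E_p}^{-1}(q)$. Then $z \in E_p$. So $(d\phi)_z(\theta_V(z)) = (dR_{\phi(z)})_e(u) = (dR_q)_e(u)$.
Define $X^F_u : F \to T(F)$ by $X^F_u : q \mapsto (dR_q)_e(u)$ as in Definition 63.6.5. Then $\varpi^F \circ X^F_u : F \to F$ is linear by
Theorem 63.6.25. Thus $\varpi^F \circ \phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} = \varpi^F \circ X^F_u$ is linear. This verifies line (68.1.1).

For line (68.1.2), Definition 54.9.5 implies $\varpi^F = \psi_F^{-1} \circ \Phi(\psi_F)$ for $\psi_F \in \mathrm{atlas}(F)$. So $\Phi(\psi_F) = \psi_F \circ \varpi^F$. But
$\psi_F : F \to \mathbb{R}^m$ is linear by Theorem 22.8.12. So $\Phi(\psi_F) \circ \phi_* \circ \theta_V \circ \phi|_{E_p}^{-1} : F \to \mathbb{R}^m$ is linear by line (68.1.1)
and Theorem 23.1.13. This verifies line (68.1.2).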
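
The key step, that the dropped infinitesimal transformation is linear in $q$, can be seen in a minimal matrix sketch. The following assumptions are illustrative and are not taken from the text: $F = \mathbb{R}^m$, $G = GL(\mathbb{R}^m)$ acting by $R_q : g \mapsto g q$, and $T_e(G)$ identified with the $m \times m$ real matrices, so that the drop of a tangent vector at $q$ is simply its velocity in $\mathbb{R}^m$.

```latex
% Assumed: R_q(g) = g q and T_e(GL(\mathbb{R}^m)) \cong \mathbb{R}^{m \times m}.
\begin{align*}
\varpi^F\bigl((dR_q)_e(u)\bigr)
  &= \left.\frac{d}{dt}\right|_{t=0} (I + t u)\, q = u\, q, \\
\varpi^F\bigl(X^F_u(\lambda q_1 + q_2)\bigr)
  &= u\,(\lambda q_1 + q_2)
   = \lambda\, u\, q_1 + u\, q_2
   = \lambda\, \varpi^F\bigl(X^F_u(q_1)\bigr) + \varpi^F\bigl(X^F_u(q_2)\bigr).
\end{align*}
```

In this sketch the map in line (68.1.1) is just multiplication by the matrix $u$, which depends on $V$ and $\phi$ but not on $q$.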


68.1.4 REMARK: Linear connections on vector bundles. 

One of the many oddities of differential geometry vocabulary is the term “linear connections” to mean linear 
connections on vector bundles. By Theorem 68.1.3, connections on vector bundles are automatically linear. 
So it is superfluous to require this. But the greater oddity is that connections on tangent bundles are called 
“affine connections”. Affine suggests the combination of a translation and a linear transformation, whereas 
connections on tangent bundles are necessarily linear because they are vector bundles. 


The historical origins of the term “affine connection” are quite well known. (See for example Remarks 71.0.1 
and 71.0.5.) But affine maps are more general than linear maps, whereas an affine connection is a special 
kind of linear connection. The solution adopted for this issue here is to use the term “affine connection” to 
mean a connection on a tangent bundle, and then avoid the term “linear connection” as much as possible. 
However, both terms are actually useful shorthands. The term “affine connection” implies that the bundle is 
a tangent bundle, and the term “linear connection” implies that the bundle is a vector bundle. Thus saying
what kind of connection it is tells you what kind of bundle it is. 


68.1.5 REMARK: The oblique drop of a vector bundle connection is linear. 

Theorem 68.1.6 applies Theorem 68.1.3 to show that the oblique drop of a connection on a vector bundle is 
linear. This is useful for showing that covariant derivatives on vector bundles are linear in Theorem 68.2.11. 
Then this makes possible the Cartan-style coefficient array formula for covariant derivatives on vector bundles 
in Theorem 68.3.4. 


Theorems 68.1.3 and 68.1.6 are very closely related to each other because the linear structure on fibre sets
of vector bundles is induced from fibre spaces in Definition 65.1.6. Therefore the maps $\phi|_{E_p}$ and $\phi|_{E_p}^{-1}$ are
linear space isomorphisms by construction. So application of these maps always preserves linearity.


68.1.6 THEOREM: Linearity of oblique drops of connections on vector bundles. 
Let 0 be a connection on a C? vector bundle (E, r, M, A‘) over a finite-dimensional real linear space F. 
Then 


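$\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A_{E,p}^F,\quad \varpi^{E,\phi} \circ \theta_V \in \mathrm{Lin}(E_p, E_p),$

where $\varpi^{E,\phi} : \bigcup_{z \in \mathrm{Dom}(\phi)} T_z(E) \to \mathrm{Dom}(\phi)$ is the oblique drop function for vector bundles $E$ and fibre charts
$\phi \in A_E^F$ in Definition 65.4.2.


PROOF: Let $p \in M$, $V \in T_p(M)$ and $\phi \in A_{E,p}^F$. Let $z \in E_p$. Then $\theta_V(z) \in T_{z,V}(E) \subseteq T_z(E)$. Therefore
$\varpi^{E,\phi}(\theta_V(z)) \in E_p$ is well defined by Definition 65.4.2.

To show linearity, note that $\varpi^{E,\phi}(\theta_V(z)) = \phi|_{E_p}^{-1}(\varpi^F((d\phi)_z(\theta_V(z))))$ for all $z \in E_p$ by Definition 65.4.2. (See
Figure 68.1.3 for comparison between $\varpi^{E,\phi} \circ \theta_V$ and the map $\varpi^F \circ \phi_* \circ \theta_V \circ \phi|_{E_p}^{-1}$ in the proof of Theorem 68.1.3.)

Figure 68.1.3 Maps for linearity of oblique drop of vector bundle connection. (Upper chain: $F \to E_p \to \bigcup_{z \in E_p} T_{z,V}(E) \to T(F) \to F$. Lower chain: $E_p \to \bigcup_{z \in E_p} T_{z,V}(E) \to T(F) \to F \to E_p$.)

So by Definition 65.1.6 for the linear structure on $E_p$,

$\forall p \in M,\ \forall \lambda \in \mathbb{R},\ \forall z \in E_p,\ \forall \phi \in A_{E,p}^F,$

$\varpi^{E,\phi}(\theta_V(\lambda z)) = \phi|_{E_p}^{-1}\bigl(\varpi^F((d\phi)_{\lambda z}(\theta_V(\phi|_{E_p}^{-1}(\lambda \phi(z)))))\bigr)$
$= \phi|_{E_p}^{-1}\bigl(\lambda\, \varpi^F((d\phi)_z(\theta_V(\phi|_{E_p}^{-1}(\phi(z)))))\bigr)$ (68.1.3)
$= \lambda\, \phi|_{E_p}^{-1}\bigl(\varpi^F((d\phi)_z(\theta_V(z)))\bigr)$ (68.1.4)
$= \lambda\, \varpi^{E,\phi}(\theta_V(z)),$

where line (68.1.3) follows by Theorem 68.1.3, and line (68.1.4) follows by Definition 65.1.6. Similarly,

$\forall p \in M,\ \forall z_1, z_2 \in E_p,\ \forall \phi \in A_{E,p}^F,$
$\varpi^{E,\phi}(\theta_V(z_1 + z_2)) = \varpi^{E,\phi}(\theta_V(z_1)) + \varpi^{E,\phi}(\theta_V(z_2)).$

Hence $\varpi^{E,\phi} \circ \theta_V \in \mathrm{Lin}(E_p, E_p)$ for all $\phi \in A_{E,p}^F$, for all $V \in T_p(M)$, for all $p \in M$.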


68.1.7 REMARK:  Generalisation of the Christoffel array to general vector bundles. 

The Christoffel array (or connection coefficient array field) for affine connections on tangent bundles in 
Definition 71.2.2 is generalised to general connections on differentiable vector bundles in Definition 68.1.8. 
As an over-simplified mnemonic rule, line (68.1.6) may be abbreviated as follows. 


Vi, j € Nm, Vk € Nn, DÀ = be, (ej). (68.1.5) 


As always, the use of coordinates (or components or coefficients) in differential geometry is dependent on 
the linearity of the expressions which use them. Thus the simple reconstruction of vector bundle connections 
from their Christoffel coefficient array fields in Theorem 68.1.10 is a consequence of the linearity which is 
shown in Theorems 68.1.3 and 68.1.6. 


68.1.8 DEFINITION: The Christoffel array for a connection on a vector bundle. 

The (Christoffel) coefficient array of a connection $\theta$ on a $C^1$ vector bundle $(E, \pi, M, A_E^F)$ over $F$, for a fibre
chart $\phi \in A_E^F$, manifold charts $\psi_M \in \mathrm{atlas}(M)$ and $\psi_F \in \mathrm{atlas}(F)$, at $p \in \pi(\mathrm{Dom}(\phi)) \cap \mathrm{Dom}(\psi_M)$, is the
array dy our (D)jk)tj-ikzi € R™*<™*” given by 


Vo € AE, YYm € atlas(M), Vir € atlas(F), Vp € bond N Wine ^ Vi, j € Nm, Vk € IN, 
p. VM; y, (P ) jr = —~,(¢(w* ?(8 e?M (p COFI DE (68.1.6) 


where $n = \dim(M)$ and $m = \dim(F)$, and $e_i^{\psi_F} = \psi_F^{-1}(e_i) \in F$ is a basis vector for $F$, corresponding to the
standard basis vector $e_i$ for $\mathbb{R}^m$.


Alternative name: Christoffel array for a (vector bundle) connection 0. 


68.1.9 REMARK: The Christoffel array is a covariant derivative array, not a parallelism array. 

The negative sign on lines (68.1.5) and (68.1.6) arises because the Christoffel array signifies the covariant 
derivative, not parallel transport. This choice of meaning for the Christoffel array makes good sense if it 
is noted that covariant derivatives were introduced into differential geometry a long time before parallelism 
was introduced in 1917 by Levi-Civita [187]. (This negative sign issue is also discussed in Remarks 70.4.3, 
71.4.1 and 71.12.2.) 
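
The sign convention can be summarised in a short component sketch. It rests on two assumptions which are hedged here. First, the covariant derivative is the naive derivative minus the dropped horizontal lift. Second, the index placement follows the ranges in line (68.1.5), with $i, j$ as fibre indices and $k$ as a base-space index. The component symbols $z^j$, $V^k$ and $X^i$ are illustrative names for chart components of $z \in E_p$, $V \in T_p(M)$ and a cross-section $X$.

```latex
% Assumed component names: z^j (fibre point), V^k (base velocity), X^i (cross-section).
\begin{align*}
\text{dropped horizontal lift:}\quad
  & -\sum_{j=1}^{m} \sum_{k=1}^{n} \Gamma^i_{jk}\, z^j\, V^k, \\
\text{covariant derivative:}\quad
  & (D_V X)^i
    = \sum_{k=1}^{n} V^k\, \partial_k X^i
      + \sum_{j=1}^{m} \sum_{k=1}^{n} \Gamma^i_{jk}\, X^j\, V^k .
\end{align*}
```

Because the lift is subtracted, the negative sign in the parallelism term becomes the familiar positive sign in the covariant derivative formula.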

68.1.10 THEOREM: Reconstruction of a vector bundle connection from its coefficient array field.
Let 0 be a connection on a C! vector bundle (E, s, M, AZ) over F. Then 


Vp € M, VV € T,(M), Vz € Ep, YỌ € Ape Vu € atlas,(M), Vir € atlas(F), 


z^*(y(2) =— YD TL as Pie Bre)! MV olay (Pr) (681.7 


4,j—1lk 
and w" ((d$).(8v (z))) = n 
=- X M Tonne (Dj PF (6C) Vct e” (68.1.8) 
7,2— = 
and (dg) 2(6v (z)) = — D» TS war yr (0) jk VECO) Vin (V) tole) cbr (68.1.9) 
7,2]— — 
m (68.1.10) 
where w < PN V) € R” is given by 
Vi € Nm, we — — 3 Y IS uu yr P) vrlo) ou (V)*. (68.1.11) 
j=l k=l 
Hence 
Yp € M, YV €T,(M), Yz € Ep, Vo € Aip» Vym € atlas,(M), Vv € atlas(F), 
4 
0y (z )= (dé) "m v GE) (tata) enue) (68.1.12) 
= (d(T x 6); (V, tae) wybr)) (68.1.13) 
= t, (ww), (barbe )o( d)? (68.1.14) 


where v € R” is given by v* = v (V)* for all k € Nn. 


PROOF: From Definition 68.1.8 line (68.1.6), and Definition 22.8.8, it follows that 


Vo € AE, Vu € atlas(M), Vip € atlas(F), Vp € t(Dom(¢)) n Dom(va), Vj € Nm, Vk € Nn, 


ES F uL i F 
PP? (O vas (p) (lz, (ej )) = » I5 ous (p) ji e ? 


è . i —1 m i —1 
which implies v (0 em "Cm (e¥®))) ROUES D Tur (p) js él, (ei^ 
because co P: ?(0 ei" (p el; E, ) € E, and ez. p, F > E is linear by definition of the induced linear 


structure on Ep. * (See a a 1.6.) Then line (68.1.7) follows from the linearity properties for Oy (z) 
and c P? o 6v(z ) with respect to V and z in Definition 67.5.4 (iii) and Theorem 68.1.6 respectively. 


Line (68.1.8) follows from line (68.1.7) by expanding Definition 65.4.2 for c^? and applying the linear map 
$| E, : Ep — F to both sides of the resulting equation. 


Line (68.1.9) follows from line (68.1.8) by applying vL Nn to both sides of the equation. This is a linear 


space isomorphism by Theorem 54.9.7. (Note that ty(z)¢,,6. = epe is a coordinate chart basis vector 


for T4(,)(F) with respect to Yr by Notation 54.4.10. By contrast, e!" = yp! (ei) € F.) Then line (68.1.10) 
follows from Definition 54.4.4. 


Lines (68.1.12) and (68.1.13) follow from line (68.1.10). 
Line (68.1.14) follows from line (68.1.10) and Theorem 64.9.7. 


68.2. Covariant derivatives of vector bundle cross-sections


68.2.1 REMARK: Literature for covariant derivatives on general vector bundles. 

Covariant derivatives of cross-sections of general vector bundles are defined by Poor [32], pages 71-80; 
Frankel [12], pages 428-430; Crampin/Pirani [7], pages 371-372; Kobayashi/Nomizu [19], pages 114-116; 
Darling [8], pages 194-195. 


A covariant derivative on vector bundles is also given by Daniel/Viallet [317], page 186, by first defining 
a covariant derivative of vector-valued functions on the associated principal bundle total space, and then 
applying this to the vector bundle using a bijection as in Theorem 47.12.6. (This style of associated covariant 
derivative is presented in Theorem 69.10.4.) Essentially the same style of covariant derivative construction 
is given also by Eriksson/Hàggblad/Strómbom [264], page 55; Drechsler/Mayer [262], pages 92-95. (See also 
the covariant derivative on vector bundles which is obtained from a connection on an associated principal 
bundle by Frankel [12], pages 483-485.) 


68.2.2 REMARK:  Covariant derivatives require Banach space drop functions. 
Section 68.2 is concerned with covariant derivatives of local cross-sections $X \in \mathring{X}^1(E, \pi, M)$ of fibre bundles
$(E, \pi, M, A_E^F)$. It is not at all clear how to define such covariant derivatives in a fibre-chart-independent
manner unless the fibre space $F$ is a Banach space.

The construction of a covariant derivative requires a drop function $\varpi^E : T_{z,0}(E) \to E$ to drop vertical vectors
in $T_{z,0}(E)$ to the total space $E$. The reason for this requirement is that a covariant derivative is always the
difference between a naive derivative and a horizontal vector which represents parallel transport. Thus the 
covariant derivative is a measure of deviation of a vector from parallel transport. Since the drop function 
in Definition 65.3.5 requires at least a Banach space for the fibre space, this requirement is also imposed on 
covariant derivatives. It is assumed here for simplicity that the fibre space is a finite-dimensional real linear 
space, but the theory is valid for general real Banach spaces. 


Consequently covariant derivatives of cross-sections require vector bundles for their definition. However, a 
kind of fibre-chart-dependent covariant derivative is presented as Definition 68.2.5, which probably has no 
use except to show how useless it is. 


68.2.3 REMARK: Covariant derivatives constructed by lifting vector fields. 

For a cross-section $X \in \mathring{X}^1(E, \pi, M)$, there is a “naive derivative” $\partial_V X$ for $V \in T_p(M)$ with $p \in \mathrm{Dom}(X)$,
which is independent of manifold and fibre charts. However, it is a vector in $T_{X(p)}(E)$. In the case of
covariant derivatives for tangent bundles, $E = T(M)$ and $T(E) = T(T(M))$, and there is a fibre-chart-independent
“vertical drop function” $\varpi$ (in Definition 59.2.15) which drops vertical vectors in $T(T(M))$ to
vectors in $T(M)$, which is the relatively concrete space where the covariant derivative is expected to be.
Such a convenient drop function is not available for general fibre bundles. (Consider for example that the
vertical space $T_{z,0}(E)$ doesn't generally even have the same dimension as $T_{\pi(z)}(M)$.)

It is not difficult to convert the vector $\partial_V X \in T_{X(p)}(E)$ to a vertical vector $\partial_V X - \theta_V(X(p)) \in T_{X(p),0}(E)$
by subtracting the parallel transport term $\theta_V(X(p))$, but it is difficult to map this vertical vector in general
to something meaningful in $E$.


68.2.4 REMARK:  Covariant derivative of vector field equals naive derivative minus horizontal lift. 

The covariant derivative for a differentiable ordinary fibre bundle is defined as the difference between the
naive derivative and the parallel transport in Definitions 68.2.5 and 68.2.9. In other words, the covariant
derivative equals the naive derivative minus the horizontal lift. As is customary, the first parameter $V$ of
$D^{\theta,\phi}(V, X)$ is written as a subscript, and parentheses are not used around $X$ unless it is a complicated
expression. Thus $D^{\theta,\phi}(V, X)$ is customarily written as $D^{\theta,\phi}_V X$. (For the naive action $\partial_V X$ of a vector $V$ on a
differentiable cross-section $X$ of a fibration, see Definition 64.7.7. For the chart-dependent drop function $\varpi^\phi$,
see Definition 64.6.6.)


Definition 68.2.5 is an attempt to generalise the covariant derivative of tangent vector fields with respect to 
affine connections on tangent bundles in Definition 71.6.4. (Definition 68.2.9 is a successful attempt to do 
the same thing.) 


68.2.5 DEFINITION: Fibre-chart-dependent covariant derivative for general fibre bundles. (Not useful.)
The covariant derivative (operator) on a $C^1$ $(G,F)$ fibre bundle $E -\!\!< (E,\pi_E,M,A_E^F)$, for a horizontal lift
function $\theta$ on $E$, for a fibre chart $\phi \in A_E^F$, is the map $D^{\theta,\phi} : T(U_\phi) \times X^1(E,\pi_E,M) \to T(F)$ defined by


$\forall p \in U_\phi,\ \forall V \in T_p(M),\ \forall X \in X^1(E,\pi_E,M),$
$D^{\theta,\phi}_V X = \varpi^\phi(\partial_V X - \theta_V(X(p))),$ (68.2.1)

where $U_\phi = \pi_E(\mathrm{Dom}(\phi))$. (See Definition 64.6.6 for $\varpi^\phi$.) In other words,

$\forall V \in T(U_\phi),\ \forall X \in X^1(E,\pi_E,M),$
$D^{\theta,\phi}_V X = \varpi^\phi(\partial_V X - \theta_V(X(\pi_{T(M)}(V)))),$ (68.2.2)


where $\pi_{T(M)} : T(M) \to M$ denotes the projection map for the tangent bundle $T(M)$.


((2016-7-13. After Definition 68.2.5, show that the covariant derivative (fibre space projection) $D^{\theta,\phi}_V X$ has
fibre chart transition properties in accordance with Theorem 64.6.8. ))


68.2.6 REMARK: Interpretation of fibre-chart-dependent covariant derivatives. 

The two terms which are subtracted to define the covariant derivatives in Definitions 68.2.5 and 68.2.9 have
clear geometric significance, namely the actual rate of change $\partial_V X$ of $X$ in the direction $V$ and the parallel-
transport rate of change $\theta_V(X(p))$ of $X$ in the direction $V$. Both of these rate-of-change terms are elements
of $T_{X(p)}(E)$. They also both have horizontal component $V$. So a drop function $\varpi^\phi$ can be validly applied to
their difference. The resulting vector $D^{\theta,\phi}_V X$ resides in $T_{\phi(X(p))}(F)$, which is not very useful. Its principal
value is to show how badly the covariant derivative behaves when one attempts to define it for fibre bundles
which are not vector bundles.


68.2.7 REMARK: The chart-dependence of the covariant derivative for general fibre bundles. 

One might ask why the covariant derivative for a fibre bundle is fibre-chart-dependent, considering that the 
covariant derivative for an affine connection on a tangent bundle in Definition 71.6.4 is chart-independent.
The reason is that the double tangent bundle $T(T(M))$ has a specified single standard fibre chart. Therefore
there is a single specified drop function $\varpi$ from the vertical vectors in $T(T(M))$ to vectors in $T(M)$. In the
case of tangent bundles, the total space E equals T(M), and so T(E) = T(T(M)). (This is also discussed 
in Remark 64.6.1.) 


68.2.8 REMARK:  Covariant derivative for vector bundles using fibre-chart-independent drop function. 
Definition 68.2.9 makes the unsatisfactory covariant derivative in Definition 68.2.5 fibre-chart-independent
by replacing the fibre-chart-dependent drop function $\varpi^\phi$, which is valued in $T(F)$, with the fibre-chart-
independent drop function $\varpi^E$, which is valued in $E$.


It is the well-definition requirement for the rarely-mentioned vertical drop function which is the real barrier 
preventing the generalisation of covariant derivatives in a well-defined way beyond vector bundles. 


68.2.9 DEFINITION: Fibre-chart-independent covariant derivative for vector bundles. 
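The covariant derivative (operator) on a $C^1$ vector bundle $E -\!\!< (E,\pi_E,M,A_E^F)$, for a horizontal lift function
$\theta$ on $E$, is the map $D^\theta : \bigcup_{U \in \mathrm{Top}(M)} (T(U) \times X^1(E,\pi_E,M\,|\,U)) \to E$ defined by


$\forall U \in \mathrm{Top}(M),\ \forall p \in U,\ \forall V \in T_p(M),\ \forall X \in X^1(E,\pi_E,M\,|\,U),$
$D^\theta_V X = \varpi^E(\partial_V X - \theta_V(X(p))).$


(See Definition 65.3.5 for the vector bundle drop function $\varpi^E : \bigcup_{z \in E} T_{z,0}(E) \to E$.)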

68.2.10 REMARK: Some technicalities of the covariant derivative definition. 

The space of local cross-sections $X^1(E,\pi_E,M\,|\,U)$ in Definition 68.2.9 is defined in Notation 64.7.3. In
a vector bundle, global cross-sections always exist because the global zero cross-section always exists. So 
Definition 68.2.9 includes global cross-sections as a special case. Local cross-sections often arise when they 
are constructed from local fibre charts, which in turn may be constructed from fibre charts on an associated 
principal bundle, which may not possess any global fibre charts. 
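

68.2.11 THEOREM: Linearity of the covariant derivative.

Let $E -\!\!< (E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle. Let $\theta$ be a horizontal lift function (i.e. connection) on $E$.
Then the covariant derivative $D^\theta$ is a linear map with respect to both of its two arguments. In other words,
the following identities hold.


(i) $\forall U \in \mathrm{Top}(M),\ \forall p \in U,\ \forall \lambda \in \mathbb{R},\ \forall V \in T_p(M),\ \forall X \in X^1(E,\pi_E,M\,|\,U),\quad D^\theta_{\lambda V} X = \lambda D^\theta_V X.$
(ii) $\forall U \in \mathrm{Top}(M),\ \forall p \in U,\ \forall V_1, V_2 \in T_p(M),\ \forall X \in X^1(E,\pi_E,M\,|\,U),\quad D^\theta_{V_1+V_2} X = D^\theta_{V_1} X + D^\theta_{V_2} X.$
(iii) $\forall U \in \mathrm{Top}(M),\ \forall p \in U,\ \forall V \in T_p(M),\ \forall \lambda \in \mathbb{R},\ \forall X \in X^1(E,\pi_E,M\,|\,U),\quad D^\theta_V(\lambda X) = \lambda D^\theta_V X.$
(iv) $\forall U \in \mathrm{Top}(M),\ \forall p \in U,\ \forall V \in T_p(M),\ \forall X_1, X_2 \in X^1(E,\pi_E,M\,|\,U),\quad D^\theta_V(X_1 + X_2) = D^\theta_V X_1 + D^\theta_V X_2.$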


PROOF: For parts (i) and (ii), let $p \in U$. For $V \in T_p(M)$, the map $V \mapsto \partial_V X$ is linear by Theorem 64.7.9, and
the map $V \mapsto \theta_V(X(p))$ is linear by Definition 67.5.4 (iii). Therefore the map $V \mapsto \partial_V X - \theta_V(X(p))$ from
$T_p(M)$ to $T_{X(p),0}(E)$ is linear. But the map $\varpi^E_{X(p)} : T_{X(p),0}(E) \to E_p$ is linear by Theorem 65.3.9. So the
map $V \mapsto \varpi^E_{X(p)}(\partial_V X - \theta_V(X(p)))$ from $T_p(M)$ to $E_p$ is linear. Thus the map $V \mapsto D^\theta_V X$ is linear by
Definition 68.2.9, which verifies parts (i) and (ii).


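For part (iii), let $p \in U$, $V \in T_p(M)$, $\lambda \in \mathbb{R}$ and $X \in X^1(E,\pi_E,M\,|\,U)$. Define $f : U \to \mathbb{R}$ by $f(q) = \lambda$ for
all $q \in U$. Then $f \in C^1(U,\mathbb{R})$ by Theorem 51.6.5. Let $\phi \in A^F_{E,p}$. Then $\varpi^{E,\phi}(\partial_V(\lambda X)) = \lambda \varpi^{E,\phi}(\partial_V X)$ by
the Leibniz rule in Theorem 65.6.2 line (65.6.2) because $\partial_V f = 0$ by Theorem 54.11.18. It follows from the
linearity property in Theorem 68.1.6 that $\varpi^{E,\phi}(\theta_V((\lambda X)(p))) = \lambda \varpi^{E,\phi}(\theta_V(X(p)))$. Then by the linearity of
$\varpi^{E,\phi}_z : T_z(E) \to E_p$, by Theorem 65.4.4, with $z = \lambda X(p)$, it follows by Theorem 65.4.13 line (65.4.8) that


$\varpi^E(\partial_V(\lambda X) - \theta_V(\lambda X(p))) = \varpi^{E,\phi}_{\lambda X(p)}(\partial_V(\lambda X) - \theta_V(\lambda X(p)))$
$\quad = \varpi^{E,\phi}_{\lambda X(p)}(\partial_V(\lambda X)) - \varpi^{E,\phi}_{\lambda X(p)}(\theta_V(\lambda X(p)))$
$\quad = \lambda \varpi^{E,\phi}_{X(p)}(\partial_V X) - \lambda \varpi^{E,\phi}_{X(p)}(\theta_V(X(p)))$
$\quad = \lambda \varpi^E_{X(p)}(\partial_V X - \theta_V(X(p))).$


Hence $D^\theta_V(\lambda X) = \lambda D^\theta_V X$ by Definition 68.2.5.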


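For part (iv), $\varpi^{E,\phi}(\partial_V(X_1 + X_2)) = \varpi^{E,\phi}(\partial_V X_1) + \varpi^{E,\phi}(\partial_V X_2)$ by Theorem 65.6.4 line (65.6.5), and
$\varpi^{E,\phi}(\theta_V((X_1 + X_2)(p))) = \varpi^{E,\phi}(\theta_V(X_1(p))) + \varpi^{E,\phi}(\theta_V(X_2(p)))$ by the linearity property in Theorem 68.1.6.
Then by the linearity of $\varpi^{E,\phi}_z : T_z(E) \to E_p$ by Theorem 65.4.4, with $z = X_1(p) + X_2(p)$, it follows by
Theorem 65.4.13 line (65.4.8) that


$\varpi^E(\partial_V(X_1 + X_2) - \theta_V((X_1 + X_2)(p)))$
$\quad = \varpi^{E,\phi}(\partial_V(X_1 + X_2) - \theta_V(X_1(p) + X_2(p)))$
$\quad = \varpi^{E,\phi}(\partial_V(X_1 + X_2)) - \varpi^{E,\phi}(\theta_V(X_1(p) + X_2(p)))$
$\quad = \varpi^{E,\phi}(\partial_V X_1) + \varpi^{E,\phi}(\partial_V X_2) - \varpi^{E,\phi}(\theta_V(X_1(p))) - \varpi^{E,\phi}(\theta_V(X_2(p)))$
$\quad = \varpi^{E,\phi}(\partial_V X_1 - \theta_V(X_1(p))) + \varpi^{E,\phi}(\partial_V X_2 - \theta_V(X_2(p)))$
$\quad = \varpi^E(\partial_V X_1 - \theta_V(X_1(p))) + \varpi^E(\partial_V X_2 - \theta_V(X_2(p))).$


Hence $D^\theta_V(X_1 + X_2) = D^\theta_V X_1 + D^\theta_V X_2$ by Definition 68.2.5.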


68.2.12 REMARK: Alternative expressions for the covariant derivative.
In terms of the differential $dX$ and the transposed horizontal lift function $\bar\theta$, the covariant derivative formula
for $\theta$ may be written as $D^\theta_V X = \varpi^E((dX)_p(V) - \bar\theta_{X(p)}(V))$. So the transposed covariant derivative defined
by $\bar{D}^\theta_X V = D^\theta_V X$ is a map $\bar{D}^\theta_X : T(U) \to \pi_E^{-1}(U)$ which satisfies $\bar{D}^\theta_X|_{T_p(M)} = \varpi^E \circ ((dX)_p - \bar\theta_{X(p)})$ for
all $p \in U$. This formula is probably not very useful.


Theorem 68.2.13 expresses the covariant derivative for a connection $\theta$ in terms of the vertical and horizontal
component maps $v^\theta$ and $h^\theta$ which represent the same connection $\theta$.


68.2.13 THEOREM: The covariant derivative expressed in terms of horizontal/vertical component maps.
Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle with horizontal lift function $\theta$.


(i) $\forall U \in \mathrm{Top}(M),\ \forall p \in U,\ \forall V \in T_p(M),\ \forall X \in X^1(E,\pi_E,M\,|\,U),$
$\quad D^\theta_V X = \varpi^E(v^\theta_{X(p)}(\partial_V X)),$
where $v^\theta$ is the vertical component map for $\theta$ as in Definition 67.10.2.
(ii) $\forall U \in \mathrm{Top}(M),\ \forall p \in U,\ \forall V \in T_p(M),\ \forall X \in X^1(E,\pi_E,M\,|\,U),$
$\quad D^\theta_V X = \varpi^E(\partial_V X - h^\theta_{X(p)}(\partial_V X)),$
where $h^\theta$ is the horizontal component map for $\theta$ as in Definition 67.9.2.


PROOF: For part (i), Definition 67.10.2 implies that $v^\theta_{X(p)}(\partial_V X) = \partial_V X - \bar\theta_{X(p)}((d\pi_E)_{X(p)}(\partial_V X))$ because
$\partial_V X \in T_{X(p)}(E)$. But $(d\pi_E)_{X(p)}(\partial_V X) = V$. So by Notation 67.8.3, $\bar\theta_{X(p)}((d\pi_E)_{X(p)}(\partial_V X)) = \theta_V(X(p))$.
Hence $\varpi^E(\partial_V X - \theta_V(X(p))) = \varpi^E(v^\theta_{X(p)}(\partial_V X))$.


Part (ii) follows from part (i) and Theorem 67.11.2 (viii). 


68.2.14 REMARK: The unreality of covariant derivatives. All dropped vectors are "fake". 

Definition 68.2.9 delivers a well-defined element of $E$ which is dropped from a well-defined difference between
elements of $T_{z,0}(E)$. However, this well-defined element of $E$ is not a true element of $E$. If the cross-section
$X$ represents an electric field, for example, the rate of change of the electric field vector in a direction
$V \in T_p(M)$ is not itself an electric field vector in $E_p$. It is a different kind of object. (This issue is also
discussed in Remark 54.9.2.) It is for this reason that the covariant derivative is initially computed as a
vertical vector in $T_{z,0}(E)$, not in $E_{\pi(z)}$. The value is very rightly computed in the tangent bundle of $E$
because it is a rate of change of the cross-section $X$, which is not a value in $E$.


This line of argument is equally valid for all kinds of drop functions. Even though a vertical vector on
a total space does transform like an element of the total space, this does not truly make it an element
of the total space. To put it simplistically, all dropped vectors are “fake”. However, the Lie bracket, Lie
derivatives and covariant derivatives all require drop functions. So in this sense, they are all “fake”. One
must always keep in mind that covariant derivatives live in a different space to the underlying total space.
This is like distinguishing the elements of $\mathbb{R}^3$ which describe position, velocity and acceleration in classical
dynamics. These are different kinds of objects which just happen to lie in the same mathematical structure for
convenience. Thus for example, a point might be described in classical physics by components $(x, y, z) \in \mathbb{R}^3$,
and the velocity of an object at this point might have components $(x, y, z, u, v, w) \in \mathbb{R}^6$. In the fibre bundle
perspective, every velocity is “tagged” by its position. A drop function would map $(x, y, z, u, v, w) \in \mathbb{R}^6$ to
$(u, v, w) \in \mathbb{R}^3$, which looks very similar to the position tuple $(x, y, z) \in \mathbb{R}^3$. However, it is not the position of
a point. To put it another way, drop functions remove the “location tag”. This is a real loss of information,
which makes the “true nature” of the object ambiguous.
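
The loss of the “location tag” can be written out in a minimal sketch, assuming the identification $T(\mathbb{R}^3) \cong \mathbb{R}^3 \times \mathbb{R}^3$ of the classical dynamics example. The base points $a$ and $b$ are hypothetical names introduced only for this illustration.

```latex
% Drop function on T(R^3), identified with R^3 x R^3 (an assumption of this sketch).
\varpi : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3,
\qquad \varpi(x, y, z, u, v, w) = (u, v, w).

% Two velocities with equal components at distinct base points a \ne b
% have the same dropped value, so the base point cannot be recovered.
\varpi(a_1, a_2, a_3, u, v, w) = \varpi(b_1, b_2, b_3, u, v, w) = (u, v, w).
```

Since this map is not injective, the dropped tuple alone does not determine where the velocity was measured.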


68.2.15 REMARK: Generalisation of Leibniz rule from tangent bundles to vector bundles. 
Table 68.2.1 lists some of the stages leading up to the proofs of Leibniz rules for naive and covariant derivatives 
on tangent bundles and vector bundles. 


The Leibniz rule in Theorem 68.2.16 is the covariant derivative version of Theorem 65.6.2, which is a 
Leibniz rule for naive derivatives of cross-sections of vector bundles. Theorem 68.2.16 is also effectively a 
generalisation of the covariant derivative Leibniz rule in Theorem 71.6.7 from affine connections on tangent 
bundles to linear connections on vector bundles. 


68.2.16 THEOREM: Leibniz rule for covariant derivatives on vector bundles.
Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle with horizontal lift function $\theta$. Then


$\forall U \in \mathrm{Top}(M),\ \forall f \in C^1(U,\mathbb{R}),\ \forall X \in X^1(E,\pi_E,M\,|\,U),\ \forall p \in U,\ \forall V \in T_p(M),$
$D^\theta_V(f.X) = (\partial_V f)X(p) + f(p) D^\theta_V X,$


where $D^\theta$ is the covariant derivative for $\theta$, and $f.X$ denotes the pointwise product of $f$ and $X$.


tangent bundle   vector bundle   stage
54.4.4           65.1.6          linear space structure on fibre sets
57.2.3           64.9.11         differentiability test for cross-sections via charts
57.2.11          65.2.4          differentiability of scalar/vector function product
59.2.9           65.3.5          vertical drop function definition
59.2.11          65.3.13         vertical drop function pointwise linear isomorphism
59.3.2           65.4.2          oblique drop function definition
59.4.8           65.5.4          scaling curve map-differential computation
59.4.10          65.5.6          constant-scale map differential computation
                 65.6.2          Leibniz rule for naive derivative
71.6.4           68.2.9          definition of covariant derivative
71.6.7           68.2.16         Leibniz rule for covariant derivative

Table 68.2.1 Stages in the proofs of Leibniz rules


PROOF: Let $U \in \mathrm{Top}(M)$, $f \in C^1(U,\mathbb{R})$ and $X \in X^1(E,\pi_E,M\,|\,U)$. Then $f.X \in X^1(E,\pi_E,M\,|\,U)$
by Theorem 65.2.4. Let $p \in U$ and $V \in T_p(M)$. Then $D^\theta_V(f.X) = \varpi^E(\partial_V(f.X) - \theta_V(f(p)X(p)))$ by
Definition 68.2.9. Let $\psi \in \mathrm{atlas}_p(M)$. Then


$\varpi^E(\partial_V(f.X) - \theta_V(f(p)X(p))) = \varpi^{E,\psi}_{f(p)X(p)}(\partial_V(f.X) - \theta_V(f(p)X(p)))$ (68.2.3)
$\quad = (\partial_V f)X(p) + f(p)\,\varpi^{E,\psi}_{X(p)}(\partial_V X - \theta_V(X(p)))$ (68.2.4)
$\quad = (\partial_V f)X(p) + f(p) D^\theta_V X,$ (68.2.5)


where line (68.2.3) follows from Definition 65.3.5 and Theorem 65.4.13 line (65.4.8), line (68.2.4) follows
from Theorem 65.6.2 line (65.6.2), and line (68.2.5) follows from Definition 68.2.9 and Theorem 65.4.13
line (65.4.8). Hence $D^\theta_V(f.X) = (\partial_V f)X(p) + f(p) D^\theta_V X$.
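
As a sanity check of Theorem 68.2.16, the following sketch assumes the simplest non-trivial setting: a line bundle $E = \mathbb{R} \times \mathbb{R}$ over $M = \mathbb{R}$, a cross-section $X(q) = (q, x(q))$, a vector $V = v\,\partial_p$ at $p$, and a covariant derivative of the coordinate form $D^\theta_V X = (p, v(x'(p) + a(p)x(p)))$ for some continuous function $a$. The function $a$ and this coordinate form are assumptions of the illustration, not notation from the text.

```latex
% Leibniz rule checked in the assumed line-bundle coordinates.
D^\theta_V (f.X)
  = \bigl(p,\ v\,((f x)'(p) + a(p) f(p) x(p))\bigr)
  = \bigl(p,\ v f'(p) x(p) + f(p)\, v\,(x'(p) + a(p) x(p))\bigr)
  = (\partial_V f)\, X(p) + f(p)\, D^\theta_V X,
\qquad \text{since } \partial_V f = v f'(p).
```

The connection term $a(p)x(p)$ is linear in $x$, so it contributes only to the second summand, which is why the first summand involves no connection data.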


68.2.17 REMARK:  Equi-informationality of covariant derivative and horizontal lift function. 

Theorem 68.2.18 shows that the information in a horizontal lift function can be recovered from its covariant
derivative operator. One small technical hurdle here is that the horizontal lift is defined pointwise. It
produces an output $\theta_V(z)$ for a single total space value $z \in E_p$, where $V \in T_p(M)$. By contrast, the
covariant derivative operation $D^\theta_V X$ requires a cross-section $X$ on some neighbourhood of $p$. Therefore to
recover any information at all from the covariant derivative, a cross-section must be supplied as its input.
This is why Theorem 68.2.18 constructs a “constant cross-section extension” $\mathrm{Ext}_{\pi_E}(z)$ of the total space
element $z$ as in Definition 21.6.8, relative to a fibre chart $\phi$.


68.2.18 THEOREM: Reconstruction of horizontal lift function from its covariant derivative operator.
Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle with horizontal lift function $\theta$. Then


$\forall p \in M,\ \forall V \in T_p(M),\ \forall z \in E_p,\ \forall \phi \in A^F_{E,p},$
$\theta_V(z) = \partial_V \mathrm{Ext}_{\pi_E}(z) - (\varpi^E)^{-1}(D^\theta_V \mathrm{Ext}_{\pi_E}(z)).$ (68.2.6)


PROOF: Let $X = \mathrm{Ext}_{\pi_E}(z)$. Let $U = \pi_E(\mathrm{Dom}(\phi))$. Then $X \in X^1(E,\pi_E,M\,|\,U)$ by Theorem 64.7.16. So
$D^\theta_V X = \varpi^E(\partial_V X - \theta_V(X(p)))$ is well defined by Definition 68.2.9. Since $\partial_V X - \theta_V(X(p)) \in T_{X(p),0}(E) =
T_{z,0}(E)$, it follows from Theorem 65.3.13 that the inverse of the linear isomorphism $\varpi^E_z : T_{z,0}(E) \to E_p$ may
be applied to obtain $(\varpi^E_z)^{-1}(D^\theta_V X) = \partial_V X - \theta_V(X(p))$. So $\theta_V(z) = \theta_V(X(p)) = \partial_V X - (\varpi^E_z)^{-1}(D^\theta_V X)$,
which implies line (68.2.6).
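
The reconstruction in line (68.2.6) can be traced in a small worked sketch. It assumes the trivial line bundle $E = \mathbb{R} \times \mathbb{R}$ over $M = \mathbb{R}$ with $\pi_E(p, x) = p$, tangent vectors on $E$ written in coordinates as $(p, x;\ \dot p, \dot x)$, a vector $V = v\,\partial_p$ at $p$, and a connection determined by a continuous function $a$. The function $a$, the coordinates and the point $z = (p, x_0)$ are hypothetical choices made only for this illustration.

```latex
% Assumed horizontal lift and vertical drop function in coordinates.
\theta_V(p, x) = (p, x;\ v,\ -a(p)\, x\, v), \qquad
\varpi^E(p, x;\ 0,\ \dot x) = (p, \dot x).

% Constant extension of z = (p, x_0):  X(q) = (q, x_0) for all q.
\partial_V X = (p, x_0;\ v,\ 0), \qquad
D^\theta_V X = \varpi^E\bigl(\partial_V X - \theta_V(X(p))\bigr) = (p,\ a(p)\, x_0\, v).

% Right-hand side of line (68.2.6) recovers the horizontal lift.
\partial_V X - (\varpi^E)^{-1}(D^\theta_V X)
  = (p, x_0;\ v,\ 0) - (p, x_0;\ 0,\ a(p)\, x_0\, v)
  = (p, x_0;\ v,\ -a(p)\, x_0\, v)
  = \theta_V(z).
```

The naive derivative of the constant extension carries only the horizontal component $v$, so subtracting the lifted covariant derivative restores exactly the vertical correction $-a(p)x_0 v$ of the horizontal lift.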


((2017-1-12. Give formula for the covariant derivative in terms of the Christoffel array in Definition 68.1.8. 
Maybe also give formulas for the covariant derivatives of differential forms and general tensor fields in terms 
of Christoffel arrays. )) 


68.3. Moving-frame coefficients for vector bundle connections


68.3.1 REMARK: Cartan coefficient arrays for connections on vector bundles. 

The moving-frame-based coefficient array fields in Definition 68.3.2 are very closely related to Christoffel
array fields. (See Definitions 68.1.8 and 71.2.2.) In fact, they are the same concept if the manifold basis field
$e^M$ is derived from a manifold chart as in Definition 54.4.9.


Some authors use the reverse order for the lower two indices of the array $\omega$ in Theorem 68.3.4. (See for
example Frankel [12], page 243.) This is very similar to the analogous index order issue for the Christoffel
array. (See Remark 71.12.2 for discussion of this issue.) In the case of general vector bundles, this index
order is even more important. The convention adopted here is to show the differentiating index last, namely
on the right.


Although the coefficient array in Definition 68.3.2 is an array-valued function $\omega(p)^k_{ij}$ of the base point $p$, it
is typically written with the base point dependence suppressed as $\omega^k_{ij}$, and here it is notated as $\omega^k_{ij}(p)$, as if
it were an array of functions of the base point, which is also often seen in the literature. Part of the reason
for this is confusion with other kinds of connection forms which also conventionally use the letter $\omega$. The
use of two or three indices, such as “$\omega^k_i$” or “$\omega^k_{ij}$”, is a hint that this is the Cartan style of connection, not
the Ehresmann-style connection form in Section 69.5.


68.3.2 DEFINITION: Coefficient array for Cartan-style connection on a vector bundle. 

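The (Cartan) coefficient array for a connection $\theta$ on a $C^1$ vector bundle $(E,\pi_E,M,A_E^F)$ with respect to a
vector frame field $e^E \in X^1(\mathcal{F}^m(E,\pi_E,M)\,|\,U)$ for $E$ and a vector frame field $e^M \in X(\mathcal{F}^n(M)\,|\,U)$ for $M$,
where $U \in \mathrm{Top}(M)$, $n = \dim(M)$ and $m = \dim(F)$, is the function $\omega : U \to (\mathbb{N}_m \times \mathbb{N}_m \times \mathbb{N}_n \to \mathbb{R})$, or
equivalently $\omega : U \to \mathbb{R}^{m \times m \times n}$, given by


$\forall p \in U,\ \forall k, i \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,\quad \omega^k_{ij}(p) = \kappa^E_p(D^\theta_{e^M_j(p)} e^E_i)^k,$
where $\kappa^E_p : E_p \to \mathbb{R}^m$ is the component map for the basis $e^E(p)$ of $E_p$ for all $p \in U$. (See Definition 22.8.8
for $\kappa^E_p$. See Definition 65.8.6 for $\mathcal{F}^m(E,\pi_E,M)$. See Definition 55.6.8 for $\mathcal{F}^n(M)$.)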


68.3.3 REMARK: Meaning of the Cartan-style connection coefficient array. 
The formula for $\omega^k_{ij}(p)$ in Definition 68.3.2 commences with a $C^1$ vector-frame field $e^E$, which is a local
cross-section of the total space vector-frame bundle $\mathcal{F}^m(E,\pi_E,M)$, then covariantly differentiates this vector-frame
field with respect to a base-space vector frame $e^M(p)$ for $T_p(M)$, and then finally computes the components
of the resulting derivative-vector in $E_p$ with respect to the total space vector frame $e^E(p)$. Thus one may
say that the order of operations is $i$, then $j$, then $k$.


The role of the early 20th century Cartan-style connection coefficient array is the same as the role of the mid
19th century Christoffel array, which is to tensorise a naive derivative with respect to coordinates. The term
$\partial_{e^M_j(p)} x^k$ is a naive derivative with respect to a vector-frame field of a cross-section $X \in X^1(E,\pi_E,M\,|\,U)$.
This is tensorised by the term $\sum_{i=1}^m \omega^k_{ij}(p)x^i(p)$. The difference between this and the Christoffel array’s role
is that the vector-frame field $e^M \in X(\mathcal{F}^n(M)\,|\,U)$ for the Cartan-style coefficient array is not necessarily
the coordinate basis field of a manifold chart.


From the point of view of the style of covariant derivative operator in Definition 68.2.9, which is derived
from a horizontal lift function on the vector bundle, the coefficient array in Definition 68.3.2 is not much
more than a convenient computational tool for tensor calculus. But from the Cartan formalism point of
view, the “moving frames” $e^M$ and $e^E$ are primary analytical and definitional tools, and the coefficient array
is therefore of some importance.


68.3.4 THEOREM: Cartan-style coefficient array formula for covariant derivatives on vector bundles. 

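Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle with connection $\theta$. Let $n = \dim(M)$ and $m = \dim(F)$. Let
$U \in \mathrm{Top}(M)$. Let $(e^E_i(p))_{i=1}^m \in E_p^m$ be a basis for $E_p$ for each $p \in U$, such that $e^E_i \in X^1(E,\pi_E,M\,|\,U)$ for
all $i \in \mathbb{N}_m$. Let $(e^M_j(p))_{j=1}^n \in T_p(M)^n$ be a basis for $T_p(M)$ for each $p \in U$. Then


$\forall X \in X^1(E,\pi_E,M\,|\,U),\ \forall p \in U,\ \forall V \in T_p(M),$
$D^\theta_V X = \sum_{k=1}^m e^E_k(p) \sum_{j=1}^n v^j(p) \bigl( \partial_{e^M_j(p)} x^k + \sum_{i=1}^m \omega^k_{ij}(p) x^i(p) \bigr),$

where $V = \sum_{j=1}^n v^j(p) e^M_j(p)$ and $X(p) = \sum_{k=1}^m x^k(p) e^E_k(p)$ for all $p \in U$, and $(\omega^k_{ij}(p))$ is the Cartan
coefficient array for $\theta$ with respect to $e^E$ and $e^M$.


PROOF: Let $X \in X^1(E,\pi_E,M\,|\,U)$, $p \in U$ and $V \in T_p(M)$. Then


$D^\theta_V X = D^\theta_V \bigl( \sum_{k=1}^m x^k e^E_k \bigr)$
$\quad = \sum_{k=1}^m D^\theta_V (x^k e^E_k)$ (68.3.1)
$\quad = \sum_{k=1}^m \bigl( (\partial_V x^k) e^E_k(p) + x^k(p) D^\theta_V e^E_k \bigr)$ (68.3.2)
$\quad = \sum_{k=1}^m (\partial_V x^k) e^E_k(p) + \sum_{i=1}^m x^i(p) D^\theta_V e^E_i$
$\quad = \sum_{k=1}^m e^E_k(p) \sum_{j=1}^n v^j(p) \partial_{e^M_j(p)} x^k + \sum_{i=1}^m x^i(p) \sum_{j=1}^n v^j(p) D^\theta_{e^M_j(p)} e^E_i$ (68.3.3)
$\quad = \sum_{k=1}^m e^E_k(p) \sum_{j=1}^n v^j(p) \partial_{e^M_j(p)} x^k + \sum_{j=1}^n v^j(p) \sum_{i=1}^m x^i(p) D^\theta_{e^M_j(p)} e^E_i$
$\quad = \sum_{k=1}^m e^E_k(p) \sum_{j=1}^n v^j(p) \partial_{e^M_j(p)} x^k + \sum_{j=1}^n v^j(p) \sum_{i=1}^m x^i(p) \sum_{k=1}^m \omega^k_{ij}(p) e^E_k(p)$ (68.3.4)
$\quad = \sum_{k=1}^m e^E_k(p) \sum_{j=1}^n v^j(p) \partial_{e^M_j(p)} x^k + \sum_{k=1}^m e^E_k(p) \sum_{j=1}^n v^j(p) \sum_{i=1}^m \omega^k_{ij}(p) x^i(p)$
$\quad = \sum_{k=1}^m e^E_k(p) \sum_{j=1}^n v^j(p) \bigl( \partial_{e^M_j(p)} x^k + \sum_{i=1}^m \omega^k_{ij}(p) x^i(p) \bigr),$


where line (68.3.1) follows by Theorem 68.2.11 (iv) by induction, line (68.3.2) follows by Theorem 68.2.16,
line (68.3.3) follows by the linearity of $\partial_V$ with respect to $V$, which follows from Definitions 61.2.3 and 58.9.4
and the linearity of Definition 58.4.5, and the linearity of $D^\theta$ by Theorem 68.2.11 (ii), and line (68.3.4) follows
from Definition 68.3.2.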
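
The formula in Theorem 68.3.4 can be illustrated in the lowest-dimensional case. The sketch below assumes $m = n = 1$, $M = \mathbb{R}$ with $e^M_1 = \partial/\partial p$, and a constant coefficient $\omega^1_{11}(p) = a$. The constant $a$ is a hypothetical choice made only for this example.

```latex
% Theorem 68.3.4 with m = n = 1 and V = v e^M_1(p):
D^\theta_V X = e^E_1(p)\, v\, \bigl( x'(p) + a\, x(p) \bigr).

% X is parallel (D^\theta_V X = 0 for all V) if and only if x' + a x = 0, that is,
x(p) = x(0)\, e^{-a p}.
```

So in this case the single coefficient $a$ is the exponential decay rate of parallel cross-sections relative to the chosen frame $e^E_1$. A different frame field would give a different coefficient for the same connection.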


68.3.5 REMARK: Equi-informationality of covariant derivative operators and Cartan array fields. 

The fact that the covariant derivative of any C! cross-section can be reconstructed from the Cartan coefficient 
array field in Theorem 68.3.4 implies that this array field is equi-informational to the covariant derivative. 
So it may be regarded as an alternative representation of the covariant derivative. This is very similar to the 
way in which Theorem 68.1.10 demonstrates that Christoffel coefficient array fields are equi-informational 
to connections on vector bundles because they can be reconstructed from their Christoffel array fields. 


Theorem 68.3.6 shows that the Cartan coefficient array field concept is a relatively modest extension of the 
Christoffel coefficient array field concept. The only real difference is that the basis field can be more freely 
chosen in the Cartan case. 


The Cartan array $(\omega^k_{ij}(p))$ may seem to be less encumbered by dependencies on arbitrary coordinate
charts than the Christoffel array $\Gamma^\theta$ in Theorem 68.3.6, but when the dependencies of
$\omega$ on the frame fields $e^E$ and $e^M$ are made explicit, it is clear that the Cartan arrays are just as dependent on arbitrary
constructions as Christoffel arrays are.


68.3.6 THEOREM: Equality of Cartan and Christoffel coefficients for vector bundle connections.

Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle with connection $\theta$. Let $\phi \in A_E^F$, $\psi_M \in \mathrm{atlas}(M)$ and $\psi_F \in \mathrm{atlas}(F)$.
Let $n = \dim(M)$ and $m = \dim(F)$. Let $U = \pi_E(\mathrm{Dom}(\phi)) \cap \mathrm{Dom}(\psi_M)$.

Let $e^E(p) = (e^E_i(p))_{i=1}^m$ be the induced basis for $E_p$ defined for $p \in U$ by $e^E_i(p) = \phi|_{E_p}^{-1}(\psi_F^{-1}(e_i))$, with
standard basis vectors $e_i \in \mathbb{R}^m$. Let $e^M(p) = (e^M_j(p))_{j=1}^n$ be the basis for $T_p(M)$ defined by $e^M_j(p) = t_{p,e_j,\psi_M}$
for $p \in U$ and $j \in \mathbb{N}_n$. Let $(\omega^k_{ij}(p))$ be the Cartan coefficient array for $\theta$ with respect to $e^E$ and $e^M$
in Definition 68.3.2. Then


$\forall p \in \pi_E(\mathrm{Dom}(\phi)) \cap \mathrm{Dom}(\psi_M),\ \forall i, k \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,$
$\Gamma^\theta_{\phi,\psi_M,\psi_F}(p)^k_{ij} = \omega^k_{ij}(p),$

where $\Gamma^\theta$ is the Christoffel coefficient array for $\theta$ with respect to $\phi$, $\psi_M$ and $\psi_F$ in Definition 68.1.8.


PROOF: Let [o € AE p 'Then IeM (y) eF = Oem (p) (TE x p), Yp (e;)). So wE? (3; M(E Aj = On, by 
Theorem 65.4.14. (Alternatively apply Theorem 65.4.16.) Therefore by Definitions 68.3.2 2 and 682.9 .9, 


Wi E (p )= Ke (De M (p)ĉi Pi 
Ky (v7 (IeM (pei — Oem (plei (p))))* 


KEPA (Opa a- wE (0u p (E09) (68.3.5) 

kp (we #0, mp) (ei (p))))* (68.3.6) 

be (o MOST (p))))) (68.3.7) 

= I$ uarie (Phi (68.3.8) 


where line (68.3.5) follows from Theorem 65.4.13 line (65.4.8) and Theorem 65.4.4, line (68.3.6) follows 
from Theorem 22.8.12, and line (68.3.7) follows from Theorem 65.1.11, and line (68.3.8) follows from Defi- 
nition 68.1.8. 


68.3.7 REMARK:  Cartan-style connection form arrays on vector bundles. 

Definition 68.3.8 converts the coefficient arrays in Definition 68.3.2 into arrays of real-valued one-forms.
These are used extensively in the Cartan version of connections on vector bundles. The same symbol $\omega$ is
used for both the scalar coefficient array and the one-form array. (The intended meaning in each case can
be determined by counting the indices!)


Once again, the integer indices precede the base point in the notation, which makes this function appear
to be an $m \times m$ array of cross-sections of the dual bundle $T^*(M)$. In other words, the function $\omega$ has the
appearance of a map $\omega : \mathbb{N}_m \times \mathbb{N}_m \to X(T^*(M))$.


68.3.8 DEFINITION:  Cartan-style connection form array on a vector bundle. 

The (Cartan) connection form array for a connection $\theta$ on a $C^1$ vector bundle $(E,\pi_E,M,A_E^F)$ with respect
to a vector frame field $e^E \in X^1(\mathcal{F}^m(E,\pi_E,M)\,|\,U)$ for $E$ and a vector frame field $e^M \in X(\mathcal{F}^n(M)\,|\,U)$
for $M$, where $U \in \mathrm{Top}(M)$, $n = \dim(M)$ and $m = \dim(F)$, is the function $\omega : U \to (\mathbb{N}_m \times \mathbb{N}_m \to T^*(M))$
given by


$\forall p \in U,\ \forall k, i \in \mathbb{N}_m,\ \forall V \in T_p(M),\quad \omega^k_i(p)(V) = \kappa^E_p(D^\theta_V e^E_i)^k,$


where $\kappa^E_p : E_p \to \mathbb{R}^m$ is the component map for the basis $e^E(p)$ of $E_p$ for all $p \in U$. (See Definition 22.8.8
for $\kappa^E_p$. See Definition 65.8.6 for $\mathcal{F}^m(E,\pi_E,M)$. See Definition 55.6.8 for $\mathcal{F}^n(M)$.)


68.3.9 REMARK: Connection forms versus covariant derivative forms. 

Although the forms in Definition 68.3.8 are generally called “connection forms”, and the coefficients in 
Definition 68.3.2 are generally called “connection coefficients”, they are in fact “covariant derivative forms” 
and “covariant derivative coefficients” respectively. However, in much of the modern literature, covariant 
derivative operators are referred to as “connections”. (This is evident from Table 67.2.2 in Remark 67.2.2.) 
This is unfortunate because covariant derivatives are specific to vector bundles, whereas connections are 
much more general. For example, connections may be defined on principal bundles, which are not included 
in the class of vector bundles. 


68.3.10 REMARK: Relations between connection coefficient arrays and one-form arrays. 
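The relations between Definitions 68.3.2 and 68.3.8 are quite simple and obvious.


$\forall p \in U,\ \forall k, i \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,\quad \omega^k_{ij}(p) = \omega^k_i(p)(e^M_j(p))$

and

$\forall p \in U,\ \forall k, i \in \mathbb{N}_m,\ \forall V \in T_p(M),$
$\omega^k_i(p)(V) = \sum_{j=1}^n \kappa^M_p(V)^j\, \omega^k_i(p)(e^M_j(p))$
$\quad = \sum_{j=1}^n \omega^k_{ij}(p)\, \kappa^M_p(V)^j,$

where $\kappa^M_p : T_p(M) \to \mathbb{R}^n$ is the component map for the basis $e^M(p)$ of $T_p(M)$ for all $p \in U$.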


68.3.11 THEOREM: Reconstruction of covariant derivative from Cartan connection forms. 

Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle with connection $\theta$. Let $n = \dim(M)$ and $m = \dim(F)$. Let
$U \in \mathrm{Top}(M)$. Let $(e^E_i(p))_{i=1}^m \in E_p^m$ be a basis for $E_p$ for each $p \in U$, such that $e^E_i \in X^1(E,\pi_E,M\,|\,U)$ for
all $i \in \mathbb{N}_m$. Let $(e^M_j(p))_{j=1}^n \in T_p(M)^n$ be a basis for $T_p(M)$ for each $p \in U$. Then


$\forall X \in X^1(E,\pi_E,M\,|\,U),\ \forall p \in U,\ \forall V \in T_p(M),$
$D^\theta_V X = \sum_{k=1}^m e^E_k(p) \bigl( \partial_V x^k + \sum_{i=1}^m \omega^k_i(p)(V)\, x^i(p) \bigr),$ (68.3.9)


where $V = \sum_{j=1}^n v^j(p) e^M_j(p)$ and $X(p) = \sum_{k=1}^m x^k(p) e^E_k(p)$ for all $p \in U$, and $(\omega^k_i(p))_{i,k=1}^m$ is the Cartan
connection form array for $\theta$ with respect to $e^E$ and $e^M$.


PROOF: Line (68.3.9) follows from Theorem 68.3.4 by substitution of $\omega^k_i(p)(e^M_j(p))$ for $\omega^k_{ij}(p)$.


68.3.12 REMARK: Relation of Cartan connection forms and coefficients to horizontal lift function. 
Theorem 68.3.13 shows the close relation between the Cartan connection forms (and coefficients) and the 
horizontal lift function of a connection. (In this book, the horizontal lift function is the connection.) If 
the frame bundle cross-sections $e^M$ and $e^E$ are induced by charts, then the horizontal lift function can be
expressed very directly in terms of the Cartan connection forms or coefficients. 


68.3.13 THEOREM: Reconstruction of horizontal lift function from Cartan connection forms. 

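Let $(E,\pi_E,M,A_E^F)$ be a $C^1$ vector bundle with connection $\theta$. Let $\phi \in A_E^F$, $\psi_M \in \mathrm{atlas}(M)$ and $\psi_F \in \mathrm{atlas}(F)$.
Let $n = \dim(M)$ and $m = \dim(F)$. Let $U = \pi_E(\mathrm{Dom}(\phi)) \cap \mathrm{Dom}(\psi_M)$.

Let $e^E(p) = (e^E_i(p))_{i=1}^m$ be the induced basis for $E_p$ defined for $p \in U$ by $e^E_i(p) = \phi|_{E_p}^{-1}(\psi_F^{-1}(e_i))$, with
standard basis vectors $e_i \in \mathbb{R}^m$. Let $e^M(p) = (e^M_j(p))_{j=1}^n$ be the basis for $T_p(M)$ defined by $e^M_j(p) = t_{p,e_j,\psi_M}$
for $p \in U$ and $j \in \mathbb{N}_n$. Let $(\omega^k_{ij}(p))$ be the Cartan coefficient array for $\theta$ with respect to $e^E$ and $e^M$
in Definition 68.3.2. Let $(\omega^k_i(p))_{i,k=1}^m$ be the Cartan connection form array for $\theta$ with respect to $e^E$ and $e^M$
in Definition 68.3.8. Then


$\forall p \in M,\ \forall V \in T_p(M),\ \forall z \in E_p,\ \forall \phi \in A^F_{E,p},\ \forall \psi_M \in \mathrm{atlas}_p(M),\ \forall \psi_F \in \mathrm{atlas}(F),$
$\theta_V(z) = t_{z,(v,w),(\psi_M \times \psi_F) \circ (\pi_E \times \phi)},$ (68.3.10)


where $v \in \mathbb{R}^n$ is given by $v^j = \kappa^M_p(V)^j$ for all $j \in \mathbb{N}_n$, and $w \in \mathbb{R}^m$ is given by


$\forall k \in \mathbb{N}_m,\quad w^k = -\sum_{i=1}^m \sum_{j=1}^n \omega^k_{ij}(p)\, \psi_F(\phi(z))^i\, v^j$ (68.3.11)
$\quad = -\sum_{i=1}^m \omega^k_i(p)(V)\, \psi_F(\phi(z))^i.$ (68.3.12)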

PROOF: Lines (68.3.10) and (68.3.11) follow from Theorem 68.1.10 lines (68.1.14) and (68.1.11), with the 
substitution of Cartan coefficients for Christoffel coefficients according to Theorem 68.3.6. Then line (68.3.12) 
follows from Definitions 68.3.8 and 68.3.2. 


68.4. Covariant derivatives of functions on the total space 


68.4.1 REMARK: Covariant derivatives of contravariant principal bundle functions. 

The covariant derivative for contravariant principal bundle functions in Definition 69.10.2 is very useful. It is 
shown in Theorem 69.10.4 that such covariant derivatives are essentially identical to the standard covariant 
derivatives of cross-sections of the associated ordinary fibre bundle. This raises the possibility that such 
covariant derivatives of functions on total spaces could have a broader relevance, for functions on the total 
spaces of more general fibre bundles. If the attempt fails, this will show that the contravariance of the 
functions is important to make such a covariant derivative usable. 
The kind of more general covariant derivative envisaged here would be of the form $(dY)_z(\theta_V(z))$ for a
connection $\theta$ on a fibre bundle $(E,\pi_E,M,A_E^F)$, where $Y \in C^1(E,\bar{F})$, $z \in E$ and $V \in T_{\pi_E(z)}(M)$. The space
$\bar{F}$ could be $F$ itself, or some other such finite-dimensional real linear space.

To make the expression $(dY)_z(\theta_V(z)) \in T_{Y(z)}(\bar{F})$ more closely resemble a covariant derivative, its value
should lie in $E_p$, where $p = \pi_E(z)$. This can be achieved by dropping $(dY)_z(\theta_V(z))$ from $T_{Y(z)}(\bar{F})$ to $\bar{F}$ and
then mapping this back to $E_p$ via $\phi|_{E_p}^{-1} : F \to E_p$ for some $\phi \in A_E^F$. This is in fact almost exactly what is
done in Definition 69.10.2, where $f = \varpi^F((dY)_z(\theta_V(z)))$ is mapped to the orbit $\phi|_{E_p}^{-1}(f) = [(z_0, f)]$, where
$z_0 = h(\phi)|_{P_p}^{-1}(e)$ for a fibre chart association $h : A_E^F \to A_P^G$.

This kind of construction uses a chart $\phi \in A_E^F$, whose target space is $F$, the fibre space of $E$. So it seems
that the linear space $\bar{F}$ should be chosen to be the same as $F$. This is done in Definition 68.4.2.


68.4.2 DEFINITION: Covariant derivative for functions on OFB total spaces. (Not useful.) 
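The covariant derivative for $F$-valued functions on the total space $E$ of a $C^1$ vector bundle $(E,\pi_E,M,A_E^F)$
with fibre space $F$, for a connection $\theta$ on $E$, a fibre chart $\phi \in A_E^F$, $z \in \mathrm{Dom}(\phi)$ and $V \in T_{\pi_E(z)}(M)$, is the
map $D^{\theta,\phi}_{V,z} : C^1(E,F) \to E_{\pi_E(z)}$ given by


$\forall \phi \in A_E^F,\ \forall z \in \mathrm{Dom}(\phi),\ \forall V \in T_{\pi_E(z)}(M),\ \forall Y \in C^1(E,F),$
$D^{\theta,\phi}_{V,z}(Y) = \phi|_{E_{\pi_E(z)}}^{-1}\bigl(\varpi^F((dY)_z(\theta_V(z)))\bigr)$
$\quad = \phi|_{E_{\pi_E(z)}}^{-1}\bigl(\varpi^F(\partial_{\theta_V(z)} Y)\bigr).$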


68.4.3 REMARK: Difficulties with covariant derivatives of functions on OFB total spaces. 
Definition 68.4.2 is far from satisfactory. The map DM. : CI(E, F) > Ez,( is well defined because 
0v (z) € Tzv (E) for all z e Dom(¢) and V € T,,,(2)(M), and then (dY),(@v(z)) € Ty(;(F) is well defined, 


and so c^" ((dY);(0v(z))) € F is well defined, which implies that ó|z. (zz ((dY): (6v (z)))) € Ez, (; is well 


defined. However, De (Y) depends on ¢, which makes it not very useful. 


This is similar to the difficulties which arise for Definition 68.2.5, which is a fibre-chart-dependent covariant
derivative of cross-sections $X \in X^1(E, \pi_E, M)$. In that case, the naive/parallel differential-difference $\partial_V X - \theta_V(X(p))$
is an element of $T_{X(p)}(E)$, which cannot be dropped to $E_p$ because $E_p$ does not have a linear
space structure. In the case of Definition 68.4.2, the differential (dY );(0y (z) lies in Ty (;) (F), which is easily 
dropped to w” ((dY );(0v(z)) € F, but then this must be re-imported to the linear space Ej somehow. 


Vectors in $T_z(E_{\pi(z)})$ can be dropped to $E_{\pi(z)}$ by exporting them to $F$ and then re-importing them to $E$
via some fibre chart. The vertical drop function from $T_z(E_{\pi(z)})$ to $E_{\pi(z)}$ in Definition 65.3.5, which is
the composite of the export map, the drop to $F$ and the import map, is chart-independent because the two uses
of $\phi$ cancel each other, namely the export map $(d\phi)_z|_{T_z(E_{\pi(z)})}$ and the import map $\phi|_{E_{\pi(z)}}^{-1}$, as shown in
Theorem 65.3.7. Unfortunately, the export map $(dY)_z : T_z(E) \to T_{Y(z)}(F)$ in Definition 68.4.2 is
chart-independent. So it cannot be used to cancel the import map $\phi|_{E_{\pi(z)}}^{-1}$.


This obstacle is overcome in Definition 69.10.2 by pairing z € P with c^ ((dY );(8v(z))) € F to construct an 
orbit [(z, w® ((dY)-(Gv(z))))] € (P x F)/G, where the z-dependence of the P and F elements compensate 
for each other, which results in a z-independent orbit, which is also fibre chart independent. There seems to 
be no way to design any such self-compensating structure for ordinary fibre bundles. 

If real-valued functions Y : E — IR are used in Definition 68.4.2, the expression (dY);(0v(z)) = Oo, (2) Y 
lies in TZ(E), which has no chart-dependence issues. No export or import maps are required. However, it 
is not easy to map T7(E) to E,,(. in a meaningful way. Such a map would be a kind of “index-raising 
isomorphism" or "sharp musical isomorphism" as in Definition 73.5.4, which requires a metric tensor field, 
which is not available for general connections. 


It may be concluded that there is no obvious definition of a useful covariant derivative for functions on 
ordinary fibre bundle total spaces. By contrast, the covariant derivative for contravariant principal bundle 
functions in Definition 69.10.2 has great importance. 
68.5. Parallel transport on ordinary fibre bundles


((2016-7-13. In Section 68.5, apply the Lebesgue differentiation theorem to define horizontal lift for suitable 
vector fields along rectifiable curves for a general continuous connection. 


2022-10-9. Section 68.5 probably should be either deleted or moved to the end of Chapter 67. )) 


68.5.1 REMARK: Obtaining parallel transport on ordinary fibre bundles from horizontal lift functions. 

Parallel transport for ordinary fibre bundles is obtained from a connection by solving the parallel transport 
system of ordinary differential equations. The Picard iteration method which is described in Section 44.6 
may be applied to this ODE system to demonstrate existence and uniqueness along rectifiable curves.


The Picard iteration method may be applied to the parallel transport differential equation by converting
it to an integral equation of the form $\forall s \in [a,b],\ z(s) = z(a) + \int_a^s A(z(t))\, d\gamma(t)$ along a rectifiable curve
$\gamma : [a,b] \to \mathbb{R}^n$ for a linear-operator-valued function $A : \mathbb{R}^m \to \mathrm{Lin}(\mathbb{R}^n, \mathbb{R}^m)$, to solve for $z : [a,b] \to \mathbb{R}^m$.
(Note that the linear operator $A(z(t))$ acts on the integrator differential $d\gamma(t)$ for $t \in [a,b]$. See Section 43.12
for Stieltjes integrals of linear-operator-valued integrands.) If the curve is $C^1$ differentiable, the simpler
vector-valued Riemann integral $\int_a^s A(z(t))(\gamma'(t))\, dt$ is adequate to describe parallel transport, although the
Picard iteration method is applied in the same way.
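
The iteration for the $C^1$ case can be sketched numerically. The following is only an illustration under stated assumptions, not the method of Section 44.6: the uniform grid, the trapezoidal quadrature and the names `picard_transport`, `A` and `dgamma` are all introduced here for the purpose of the sketch.

```python
import math

def picard_transport(A, dgamma, z0, a, b, steps=200, iterations=30):
    """Picard iteration for z(s) = z0 + integral_a^s A(z(t)) gamma'(t) dt.

    A(z) returns an m x n matrix as a list of rows, dgamma(t) returns
    gamma'(t) as a list of n numbers, and z0 is z(a) as a list of m numbers.
    Returns the grid and the final iterate sampled on that grid.
    """
    m = len(z0)
    h = (b - a) / steps
    ts = [a + h * k for k in range(steps + 1)]
    z = [list(z0) for _ in ts]                 # zeroth iterate: constant function

    def integrand(zt, t):
        M, v = A(zt), dgamma(t)                # operator A(z(t)) acting on gamma'(t)
        return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(m)]

    for _ in range(iterations):
        f = [integrand(z[k], ts[k]) for k in range(steps + 1)]
        new = [list(z0)]
        for k in range(1, steps + 1):          # cumulative trapezoidal rule
            new.append([new[-1][i] + 0.5 * h * (f[k - 1][i] + f[k][i])
                        for i in range(m)])
        z = new                                # the curve is fixed; only z changes
    return ts, z

# Illustrative data (m = 2, n = 1): A(z) rotates z, and gamma(t) = t.
ts, z = picard_transport(lambda z: [[-z[1]], [z[0]]], lambda t: [1.0],
                         [1.0, 0.0], 0.0, 1.0)
error = max(abs(z[-1][0] - math.cos(1.0)), abs(z[-1][1] - math.sin(1.0)))
```

For this test problem the exact solution is $z(s) = (\cos s, \sin s)$, so `error` measures the combined quadrature and iteration error at $s = 1$.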


To put the horizontal lift function into the form required by the Picard iteration theorem for vector-valued
functions along rectifiable curves, one must use the transposed lift functions in Definition 67.8.2. If $\dim(M) = n$
and $\dim(F) = m$, a transposed horizontal lift function $\bar\theta$ on a $(G, F)$ fibre bundle $(E, \pi_E, M, A_E^F)$ has
the form $\bar\theta(z) \in \mathrm{Lin}(T_{\pi_E(z)}(M), T_z(E))$ for $z \in E$, where $\dim(E) = m+n$, $\dim(T_{\pi_E(z)}(M)) = n$ and
$\dim(T_z(E)) = m+n$. The horizontal components here are redundant because they equal the base-space
velocity by Definition 67.8.2 (ii). So to solve the parallel transport equation, one may map $\bar\theta_z(V)$ to $F$ via
a fibre chart $\phi \in A_E^F$ for all $z \in E$. Then the parallel transport equation may be solved in terms of the
(fibre-chart-dependent) vertical component in a “local trivialisation” $U \times F$ for some $U \in \mathrm{Top}(M)$. This
leads to an equation which roughly has the form


$$\phi(z(s)) = \phi(z(a)) + \int_a^s \phi_*\, \bar\theta_{z(t)}\, d\gamma(t), \tag{68.5.1}$$


where the vector integrator differential “$d\gamma(t)$” is acted on by the linear operator $\bar\theta_{z(t)}$ as mentioned in
Remark 43.12.1. Thus the Picard iteration is to be executed for a vector-valued function $z : [a,b] \to \mathbb{R}^m$,
which is thought of as being a vector field along a rectifiable curve $\gamma : [a,b] \to \mathbb{R}^n$. (This is a “lift” of the
curve from the base space to the total space.) The curve is fixed during the iterations. The differential of
the curve acts in the role of a kind of density function for the Stieltjes integral.
69.1. Horizontal lift functions for principal fibre bundles


69.1.1 REMARK: Choice of definition for connections on principal fibre bundles. 

Connections on principal fibre bundles are defined here as horizontal lift functions. In some texts, it is the 
PFB connections which are considered primary, and OFB connections are induced via associations between 
OFBs and PFBs. (See Table 67.2.2 in Remark 67.2.2 for a survey of definitions of connections.) Of those 
authors who define connections on PFBs, most define them in terms of horizontal subspaces rather than 
horizontal lift functions. Those authors who do define connections to be horizontal lift functions mostly 
define them on OFBs. 


Since a pointwise connection has two inputs, a base-point velocity $V$ and a total space element $z$, a choice
of order of parameters must be made. For OFBs, Definition 67.5.4 puts $V$ first, whereas Definition 67.8.2
puts $z$ first. When $V$ is placed first in the expression $\theta_V(z)$, there are apparently more conditions in the
definition than if $z$ is placed first in the expression $\bar\theta_z(V)$. The combined conditions (i), (ii) and (iii) for
$\theta_V(z)$ in Definition 67.5.4 are effectively equivalent to the single condition (i) for $\bar\theta_z(V)$ in Definition 67.8.2.
This apparent difference in complexity is a consequence of the relative availability of pre-defined function
spaces for the specification of these structures.


For horizontal lift functions on PFBs, the standard definition here, Definition 69.1.3, puts the base-space
velocity parameter $V$ first and the total space element $z$ second, as for OFBs. This is consistent with a desire
to present the lift function as a velocity-dependent vector field on the fibre set $P_p$ at each point $p$, rather than
as a separate vector-to-vector lift function for each element $z$ of the total space $P$ as in Definition 69.2.2.


69.1.2 REMARK: Specialisation of right action map from ordinary to principal fibre bundles. 
In Definition 67.5.4 for a horizontal lift function on a $(G, F)$ fibre bundle, condition (v) uses a right action
map $R_f : G \to F$ which is defined by $R_f : g \mapsto gf$ for $f \in F$ as in Definition 63.6.4. But in Definition 69.1.3
for a principal $G$-bundle, $F$ is the same as $G$. Consequently $R_f$ is the same as the right action $R^G_g$ for Lie
groups as in Definition 62.6.2. Thus Definition 69.1.3, ignoring the alternative condition (v′), is an exact
application of Definition 67.5.4 to the special case of principal fibre bundles.


Figure 69.1.1 illustrates the spaces and maps in Definition 69.1.3 for PFB horizontal lift functions, with 
special emphasis on the infinitesimal generator condition (v). (See Figure 67.5.1 for the almost identical 
diagram for OFB horizontal lift functions in Definition 67.5.4.) 


Figure 69.1.1 Maps and spaces for a horizontal lift function on a principal fibre bundle


69.1.3 DEFINITION: A horizontal lift function on a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$ is a map
$\beta : T(M) \to \bigcup_{p \in M} (P_p \to T(P))$ which satisfies:


(i) $\forall p \in M,\ \forall V \in T_p(M),\ \mathrm{Dom}(\beta_V) = P_p$.

(ii) $\forall p \in M,\ \forall V \in T_p(M),\ \forall z \in P_p,\ \beta_V(z) \in T_z(P)$.

(iii) $\forall z \in P,\ (V \mapsto \beta_V(z)) \in \mathrm{Lin}(T_{\pi(z)}(M), T_z(P))$. [linearity]

(iv) $\forall p \in M,\ \forall V \in T_p(M),\ \forall z \in P_p,\ (d\pi)_z(\beta_V(z)) = V$. [horizontal component]

(v) $\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A^G_{P,p},\ \exists u \in T_e(G),\ \forall z \in P_p,\ (d\phi)_z(\beta_V(z)) = (dR^G_{\phi(z)})_e(u)$, [generator]


where $R^G_g : G \to G$ for $g \in G$ denotes the right action $R^G_g : x \mapsto xg$ on $G$ as in Definition 62.6.2.
In other words, $\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A^G_{P,p},\ \exists u \in T_e(G),\ \phi_* \circ \beta_V = X^G_u \circ \phi$.
(See Definition 62.7.7 for the right-invariant vector field $X^G_u$ generated on $G$ by $u \in T_e(G)$.)


Alternatively, condition (v) may be replaced with the equivalent condition (v′).


(v′) $\forall V \in T(M),\ \forall z \in P_{\pi(V)},\ \forall g \in G,\ \beta_V(zg) = (dR^P_g)_z(\beta_V(z))$, [right action map invariance]
where $R^P_g : P \to P$ is the right action $R^P_g : z \mapsto \mu^G_P(z, g)$ of $g \in G$ on $P$. (See Definition 66.2.2 for $\mu^G_P$.
Or see Definition 66.2.10 for $R^P_g$.)


A connection on a $C^1$ principal $G$-bundle is a horizontal lift function on the bundle.


69.1.4 REMARK: Interpretation of non-structure-group horizontal lift function conditions. 
For interpretation of Definition 69.1.3 conditions (i, ii, iii, iv), see Remark 67.4.3 concerning the corresponding 
conditions of Definition 67.4.2 for horizontal lift functions on C! fibrations. 


69.1.5 REMARK:  Well-definition of differentials of the projection map and right action maps. 

Suppose $(P, \pi, M, A_P^G)$ is a $C^1$ principal $G$-bundle. Then the map $\pi : P \to M$ is a $C^1$ map between the $C^1$
manifolds $P$ and $M$ by Definitions 66.1.2 and 64.8.3 (i). Hence the differential $(d\pi)_z : T_z(P) \to T_{\pi(z)}(M)$ of
the map $\pi$ is well defined for every $z \in P$ by Definition 58.4.5. Similarly, if $R^P_g$ denotes the action of a group
element $g \in G$ on the manifold $P$, then $R^P_g$ is a $C^1$ map from $P$ to $P$ by Definition 62.2.2. Therefore the
differential $(dR^P_g)_z : T_z(P) \to T_{zg}(P)$ of the map $R^P_g$ is well defined for every $z \in P$ and $g \in G$. Thus the
differentials $(d\pi)_z$ and $(dR^P_g)_z$ in Definition 69.1.3 are both well defined.


69.1.6 REMARK: Equivalent condition for identification of right action maps with Lie algebra elements. 
Definition 69.1.3 condition (v) is illustrated in Figure 69.1.2. (The OFB version is Figure 67.5.2.) 


Figure 69.1.2 Right action map condition for PFB horizontal lift function


Condition (v) states that a horizontal lift function must be generated by an infinitesimal group element via
a fibre chart. This is exactly the same as the corresponding condition (v) in Definition 67.5.4 for ordinary
fibre bundles. One of the advantages of this infinitesimal generator style of condition is that it works exactly
the same for all differentiable fibre bundles.


By contrast, the “differential right action map invariance” condition (v′) in Definition 69.1.3 has no analogue
for ordinary fibre bundles because OFBs do not possess a corresponding right action map.

It is not immediately obvious that conditions (v) and (v′) in Definition 69.1.3 are equivalent. This equivalence
is asserted as Theorem 69.1.7. It emerges during the proof of this theorem that the quite unsatisfying Lie
algebra element $u$, which is only known to “exist” in Definition 69.1.3 (v), does have a concrete value.


For any given $p \in M$, $V \in T_p(M)$ and $\phi \in A^G_{P,p}$, the value of $u$ is $u_{V,\phi} = (d\phi)_{z_{p,\phi}}(\beta_V(z_{p,\phi})) = \phi_*(\beta_V(z_{p,\phi}))$,
where $z_{p,\phi} = (\pi \times \phi)^{-1}(p, e)$. The total-space element $z_{p,\phi}$ depends on $\phi$, and the Lie algebra element $u_{V,\phi}$
additionally depends (linearly) on the base-space velocity $V$. These dependencies are not unexpected.


The dependence of $u$ on $\phi$ in condition (v) might raise suspicions that the condition as a whole depends
on the choice of fibre chart, but Theorem 69.1.7 implies chart-independence of this condition because it is
equivalent to condition (v′), which is chart-independent. The right action map $R^P_g$ of $G$ on $P$ in condition (v′)
might seem to be chart-dependent because by Definition 66.2.10,


$$\forall p \in M,\ \forall z \in P_p,\ \forall g \in G,\ \forall \phi \in A^G_{P,p},\quad R^P_g(z) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g).$$


However, this right action map is shown to be chart-independent in Theorem 66.2.5. 


69.1.7 THEOREM: Equivalence of right action differential and right action invariance conditions. 
Assume all of the conditions of Definition 69.1.3 except (v) and (v′). Then (v) holds if and only if (v′) holds.


PROOF: Assume that all of the conditions of Definition 69.1.3 hold except condition (v′). Let $V \in T(M)$,
$p = \pi(V)$, $z \in P_p = \pi^{-1}(\{p\})$ and $g \in G$. Let $\phi \in A_P^G$ with $p \in \pi(\mathrm{Dom}(\phi))$. Then by condition (v), there
exists $u \in T_e(G)$ such that $(d\phi)_{z'}(\beta_V(z')) = (dR^G_{\phi(z')})_e(u)$ for all $z' \in P_p$. The substitutions $z' = z$ and
$z' = zg$, where $zg = R^P_g(z)$, then yield

\begin{align*}
(d\phi)_{zg}(\beta_V(zg)) &= (dR^G_{\phi(zg)})_e(u) \\
&= (dR^G_{\phi(z)g})_e(u) \tag{69.1.1} \\
&= (d(R^G_g \circ R^G_{\phi(z)}))_e(u) \tag{69.1.2} \\
&= (dR^G_g)_{\phi(z)}((dR^G_{\phi(z)})_e(u)) \tag{69.1.3} \\
&= (dR^G_g)_{\phi(z)}((d\phi)_z(\beta_V(z))) \\
&= ((dR^G_g)_{\phi(z)} \circ (d\phi)_z)(\beta_V(z)) \\
&= (d(R^G_g \circ \phi))_z(\beta_V(z)) \tag{69.1.4} \\
&= (d(\phi \circ R^P_g))_z(\beta_V(z)) \tag{69.1.5} \\
&= ((d\phi)_{zg} \circ (dR^P_g)_z)(\beta_V(z)) \tag{69.1.6} \\
&= (d\phi)_{zg}((dR^P_g)_z(\beta_V(z))),
\end{align*}

where line (69.1.1) follows by Theorem 66.2.12 (iv), line (69.1.2) follows from Definition 62.6.2, lines (69.1.3),
(69.1.4) and (69.1.6) follow from the chain rule, Theorem 58.4.13, for the differentials of differentiable maps,
and line (69.1.5) follows from Theorem 66.2.12 (vii).


By condition (iv), $(d\pi)_{zg}(\beta_V(zg)) = V$ because $zg \in P_p$ by Theorem 66.2.12 (iii), and similarly

\begin{align*}
(d\pi)_{zg}((dR^P_g)_z(\beta_V(z))) &= (d(\pi \circ R^P_g))_z(\beta_V(z)) \\
&= (d\pi)_z(\beta_V(z)) \\
&= V
\end{align*}

because $\pi \circ R^P_g = \pi$ by Theorem 66.2.12 (vi). Thus $\beta_V(zg)$ and $(dR^P_g)_z(\beta_V(z))$ have the same vertical and
horizontal components. Therefore $\beta_V(zg) = (dR^P_g)_z(\beta_V(z))$ by Theorem 64.5.10. This verifies condition (v′).
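Now assume that all of the conditions of Definition 69.1.3 hold except condition (v). In particular, assume
that condition (v′) holds. Let $p \in M$, $V \in T_p(M)$ and $\phi \in A_P^G$ with $p \in \pi(\mathrm{Dom}(\phi))$. Let $z_0 = (\pi \times \phi)^{-1}(p, e)$.
Then $\pi(z_0) = p$ and $\phi(z_0) = e$. So $T_{\phi(z_0)}(G) = T_e(G)$. Let $u = (d\phi)_{z_0}(\beta_V(z_0))$. Then

\begin{align*}
\forall z \in P_p,\quad (dR^G_{\phi(z)})_e(u) &= (dR^G_{\phi(z)})_e((d\phi)_{z_0}(\beta_V(z_0))) \\
&= ((dR^G_{\phi(z)})_e \circ (d\phi)_{z_0})(\beta_V(z_0)) \\
&= (d(R^G_{\phi(z)} \circ \phi))_{z_0}(\beta_V(z_0)) \\
&= (d(\phi \circ R^P_{\phi(z)}))_{z_0}(\beta_V(z_0)) \tag{69.1.7} \\
&= ((d\phi)_{z_0\phi(z)} \circ (dR^P_{\phi(z)})_{z_0})(\beta_V(z_0)) \\
&= (d\phi)_{z_0\phi(z)}((dR^P_{\phi(z)})_{z_0}(\beta_V(z_0))) \\
&= (d\phi)_{z_0\phi(z)}(\beta_V(z_0\phi(z))), \tag{69.1.8}
\end{align*}

where line (69.1.7) follows from Theorem 66.2.12 (vii), and line (69.1.8) follows from condition (v′). But
$\phi(z_0\phi(z)) = \phi(R^P_{\phi(z)}(z_0)) = R^G_{\phi(z)}(\phi(z_0)) = R^G_{\phi(z)}(e) = \phi(z)$ by Theorem 66.2.12 (vi), and $\pi(z_0\phi(z)) = \pi(z_0) = p = \pi(z)$.
So $z_0\phi(z) = (\pi \times \phi)^{-1}((\pi \times \phi)(z_0\phi(z))) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)) = z$. Therefore
$\forall z \in P_p,\ (dR^G_{\phi(z)})_e(u) = (d\phi)_z(\beta_V(z))$. Hence condition (v) holds.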


69.1.8 REMARK: Relation of connections on principal bundles to the right action of the structure group. 
Condition (v′) in Definition 69.1.3 may be written as

$$\forall V \in T(M),\ \forall z \in P_{\pi(V)},\ \forall g \in G,\quad \beta_V(R^P_g(z)) = (dR^P_g)_z(\beta_V(z)).$$

This may be written in terms of map compositions as follows.

$$\forall V \in T(M),\ \forall z \in P_{\pi(V)},\ \forall g \in G,\quad (\beta_V \circ R^P_g)(z) = ((dR^P_g)_z \circ \beta_V)(z).$$

The pointwise differential $(dR^P_g)_z$ may be replaced with the global differential $(R^P_g)_*$ because the base point
of the differential is effectively selected by $\beta_V(z) \in T_z(P)$. Then one may write condition (v′) as

$$\forall V \in T(M),\ \forall g \in G,\quad \beta_V \circ R^P_g = (R^P_g)_* \circ \beta_V.$$


Thus a horizontal lift function on a principal fibre bundle is closely related to the right action of the structure 
group on the total space. 
69.1.9 EXAMPLE: Horizontal lift function for a trivial principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$.

The complexity of computations in Example 69.1.9 does not arise from the fibre bundle formalism. Nor does
it arise from the difficulties of computations with coordinates. The complexity arises from the difficulties of
converting between the abstract fibre bundle formalism and the concrete coordinates framework.

Example 69.1.9 is a continuation of Example 66.5.4 for the principal $(\mathbb{R}, +)$-bundle $(P, \pi, M, A_P^G)$ on $M = \mathbb{R}^4$
with $P = M \times G$, $G = \mathbb{R}$, $A_P^G = \{\phi\}$, $\pi : (p, g) \mapsto p$ and $\phi : (p, g) \mapsto g$. Define manifold atlases $A_M = \{\psi_M\}$,
$A_G = \{\psi_G\}$ and $A_P = \{\psi_P\}$ with $\psi_M = \mathrm{id}_M$, $\psi_G = \mathrm{id}_G$ and $\psi_P = \mathrm{id}_P$.

Let $\beta$ be a horizontal lift function on $P$. Then by Definition 69.1.3 (i, ii), $\beta_V(z) \in T_z(P)$ for all $z \in P$ and
$V \in T_p(M)$ with $p = \pi(z)$.

Let $p \in M$. Then $p = (x^0, x^1, x^2, x^3) = (x^i)_{i=0}^3$ for some $x \in \mathbb{R}^4$. Let $V \in T_p(M)$. Then $V = t_{p,v,\mathrm{id}_M}$
for some $v \in \mathbb{R}^4$. So by Notation 54.5.7, $\Phi(\psi_M)(V) = \Phi(\mathrm{id}_M)(t_{p,v,\mathrm{id}_M}) = v$, and by Notation 54.5.21,
$\Psi(\psi_M)(V) = \Psi(\mathrm{id}_M)(t_{p,v,\mathrm{id}_M}) = (\psi_M(p), v) = (x, v) = (x^0, x^1, x^2, x^3, v^0, v^1, v^2, v^3) \in \mathbb{R}^4 \times \mathbb{R}^4$.

Let $z \in P$. Then $z = (p, g) = (x^0, x^1, x^2, x^3, g) \in \mathbb{R}^4 \times \mathbb{R}$ for some $g \in \mathbb{R}$. So $\beta_V(z) \in T_z(P) = T_{(p,g)}(P)$
and $\beta_V(z) = t_{(p,g),w,\mathrm{id}_P}$ for some $w \in \mathbb{R}^4 \times \mathbb{R} = \mathbb{R}^5$. So once again by Notation 54.5.7, $\Phi(\psi_P)(\beta_V(z)) = w$,
and by Notation 54.5.21, $\Psi(\psi_P)(\beta_V(z)) = ((p, g), w)$.

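Definition 69.1.3 (iv) implies $(d\pi)_z(\beta_V(z)) = V$. So $(d\pi)_z(t_{(p,g),w,\mathrm{id}_P}) = t_{p,v,\mathrm{id}_M}$. So by Definition 58.4.5,

\begin{align*}
\forall k \in \mathbb{Z}_4,\quad v^k &= \sum_{i=0}^{4} w^i\, \partial_{y^i} (\mathrm{id}_M \circ \pi \circ \mathrm{id}_P^{-1})^k(y) \big|_{y = \mathrm{id}_P(z)} \\
&= \sum_{i=0}^{4} w^i\, \partial_{y^i} \pi^k(y) \big|_{y = z} \\
&= \sum_{i=0}^{4} w^i\, \delta_i^k (1 - \delta_i^4) \\
&= w^k.
\end{align*}

Therefore $w = (v^0, v^1, v^2, v^3, w^4)$ for some $w^4 \in \mathbb{R}$. So

\begin{align*}
\Psi(\psi_P)(\beta_V(z)) &= ((p, g), w) \\
&= (((x^0, x^1, x^2, x^3), g), (v^0, v^1, v^2, v^3, w^4)) \\
&= (x^0, x^1, x^2, x^3, g, v^0, v^1, v^2, v^3, w^4).
\end{align*}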


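Since $\beta_V(z)$ depends on both $V$ and $z$, the coordinate $w^4$ can be written as a function $y : \mathbb{R}^5 \times \mathbb{R}^4 \to \mathbb{R}$
because the coordinates of $p$ are a common subsequence of the coordinate tuples for $V$ and $z$. Thus the
coordinates of $\beta_V(z)$ with respect to $\psi_P$ are

\begin{align*}
\Psi(\psi_P)(\beta_V(z)) &= (x^0, x^1, x^2, x^3, g, v^0, v^1, v^2, v^3, y(x^0, x^1, x^2, x^3, g, v^0, v^1, v^2, v^3)) \\
&= (p, g, v, y(p, g, v)) \\
&= (z, v, y(z, v)).
\end{align*}

Definition 69.1.3 (iii) requires $y$ to be linear with respect to $v$. Therefore $y(z, v) = \sum_{j=0}^3 y(z, e_j) v^j$ for all
$(z, v) \in \mathbb{R}^5 \times \mathbb{R}^4$ by Theorem 23.4.6. Define coefficients $c^4 : \mathbb{R}^5 \to \mathbb{R}^4$ by $c^4_j(z) = y(z, e_j)$ for $j \in \mathbb{Z}_4$ and
$z \in \mathbb{R}^5$. Then $y(z, v) = \sum_{j=0}^3 c^4_j(z) v^j$ for all $(z, v) \in \mathbb{R}^5 \times \mathbb{R}^4$. Thus $y(p, g, v) = \sum_{j=0}^3 c^4_j(p, g) v^j$ for all
$(p, g, v) \in \mathbb{R}^4 \times \mathbb{R} \times \mathbb{R}^4$.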

Definition 69.1.3 (v) requires y to be generated by some right-invariant vector field on G = R. This puts a 
constraint on the dependence of y(p, g, v) with respect to g. 

For the fibre chart ¢ € AG defined as 6: P — G with $ : (p, g) > g for all (p, g) € M x G, the map Re) 
in Definition 69.1.3 (v) satisfies Ra) = Ree. 90) = RẸ (h) = h o g = h +g for all h € G, where “o” 
is the additive group operation on G. This right action map RẸ : R — R satisfies (dRF)e(u) = 3u RF for 
g € Rand u € T.(G) by Definition 58.1.2. This gives (GR, Jeu) = (dRẸ)e(u) aO RẸ (h) = uty aga 
It follows that (dR, g) )e(u) = (ARF )e(u) = tywide- 

The left side of the generator equation in Definition 69.1.3 (v) is (d@).(6v(z)), where Bv(z) = t~,9),w,idp 
with w € IR^ x R, which has the form w = (v, y(z, v)). Then by Definition 58.4.5, (d$). (Bv (z)) = ty,y(z,v),ide 
because O;(p,g + h) = (0,0,0,0,1). So tg y(z,v) ida = tg,u idg; Which implies y(p, g, v) = y(z,v) = u. Most 
importantly, u is independent of g, although in general, u will depend on p and be linear with respect to v. 


The constancy of $y(p, g, v) = u$ with respect to $g$ may seem surprising because it is not a linear dependency.
It must be remembered that the horizontal lift function has the character of a vector field on each fibre
set $P_p$, not a vector, and the strength of this vector field is uniform along each fibre set in this case. The
fibre set $P_p$ at each base point $p \in \mathbb{R}^4$ may be thought of as a copy of $\mathbb{R}$ with a uniform vector field on it
which has strength proportional to the base space vector $V \in T_p(\mathbb{R}^4)$.


Since $y(p, g, v) = \sum_{j=0}^3 c^4_j(p, g) v^j$ is independent of $g$, the coefficients $c^4_j(p, g)$ may be replaced with $a^4_j(p)$
for $j \in \mathbb{Z}_4$ and $p \in M$. Thus $y(p, g, v) = \sum_{j=0}^3 a^4_j(p) v^j$ for all $(p, g) \in P$ and $v \in \mathbb{R}^4$. This implies

$$\Psi(\psi_P)(\beta_V(z)) = \Bigl(p, g, v, \sum_{j=0}^{3} a^4_j(p) v^j\Bigr),$$

where $p = \pi_{T(M)}(V) = \pi(z)$, $g = \phi(z)$, $v = \Phi(\psi_M)(V)$, and $a^4 : M \to \mathbb{R}^4$. This gives some idea of what a
valid horizontal lift function on a principal bundle looks like in the special case of the additive real-number
structure group and the $\mathbb{R}^4$ base space. All of the components of the coordinate tuple of $\beta_V(z)$ have the
same form for all horizontal lift functions except for the vertical component $\sum_{j=0}^3 a^4_j(p) v^j$, which depends
on the choice of $\beta$.
(See Example 70.8.7 for a continuation of Example 69.1.9.)
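
The coordinate form reached in Example 69.1.9 can be exercised in a few lines of code. This is only a sketch: the helper `make_lift` and the coefficient function `a_example` are hypothetical names and values chosen for illustration, and a tangent vector is represented simply by its component tuple $w \in \mathbb{R}^5$.

```python
def make_lift(a):
    """Coordinate form of a horizontal lift function on P = R^4 x R.

    a(p) returns four coefficients (a_0(p), ..., a_3(p)). The lift of the
    base velocity v at z = (p, g) has components (v, sum_j a_j(p) v^j).
    """
    def beta(p, g, v):
        vertical = sum(c * vj for c, vj in zip(a(p), v))
        return tuple(v) + (vertical,)      # w = (v^0, v^1, v^2, v^3, w^4)
    return beta

def a_example(p):
    # Hypothetical coefficients, chosen only for illustration.
    return (p[1], -p[0], 0.0, 1.0 + p[3] ** 2)

beta = make_lift(a_example)
p = (1.0, 2.0, 3.0, 4.0)
v, v2 = (0.5, -1.0, 2.0, 0.25), (1.0, 1.0, 0.0, -2.0)

w = beta(p, 0.0, v)
horizontal_ok = w[:4] == v                             # condition (iv)
fibre_uniform = beta(p, 0.0, v) == beta(p, 7.5, v)     # no dependence on g
v_sum = tuple(x + y for x, y in zip(v, v2))
w_add = tuple(x + y for x, y in zip(beta(p, 0.0, v), beta(p, 0.0, v2)))
linear_ok = all(abs(x - y) < 1e-12
                for x, y in zip(beta(p, 0.0, v_sum), w_add))  # additivity, part of (iii)
```

The three flags correspond to the horizontal component condition, the uniformity of the vertical component along each fibre set, and additivity in $v$.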


69.1.10 EXAMPLE: Horizontal lift function right action invariance for a principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$.
The alternative condition (v′) in Definition 69.1.3 to constrain the vertical component of a horizontal lift
function on a principal bundle requires right invariance under group actions on the total space rather than
generatability by Lie algebra elements as in condition (v). So the form of vertical component function arrived
at in Example 69.1.9 will hopefully satisfy condition (v′).


It is shown in Example 66.2.19 that $(dR^P_g)_z(t_{z,w,\mathrm{id}_P}) = t_{zg,w,\mathrm{id}_P}$ for all $z \in P$, $g \in G$ and $w \in \mathbb{R}^4 \times \mathbb{R}$.
From Example 69.1.9, $\beta_V(z) = t_{(p,h),w,\mathrm{id}_P}$, where $w = (v, y(p, h, v))$ with $y(p, h, v) = \sum_{j=0}^3 a^4_j(p) v^j$ for all
$z = (p, h) \in P$ and $v \in \mathbb{R}^4$. So $(dR^P_g)_z(\beta_V(z)) = t_{(p,h+g),w,\mathrm{id}_P}$.

The left side of the right action invariance condition (v′) is $\beta_V(zg)$, where $zg = (p, h)g = (p, h \circ g) = (p, h + g)$.
This gives $\beta_V(zg) = t_{zg,w,\mathrm{id}_P} = t_{(p,h+g),w,\mathrm{id}_P}$. Hence $\beta_V(zg) = (dR^P_g)_z(\beta_V(z))$, as expected.

The most important objective of an example such as this is to verify that the theorems and definitions within 
the abstract fibre bundle formalism are still valid when they are converted to coordinates and calculus. This 
tends to confirm not only the abstract formalism but also the procedures for conversion from abstract to 
concrete. In practice, one does not write out the conversions and computations in as much detail as in 
Examples 69.1.9 and 69.1.10. The excessive detail given here is for the purpose of “abundant verification". 
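
The right action invariance check can also be run mechanically. The sketch below assumes the representation of a tangent vector $t_{z,w,\mathrm{id}_P}$ as a pair `(z, w)`; the function names and both coefficient choices are hypothetical. The second lift deliberately lets its coefficients depend on the fibre coordinate, which is expected to break the invariance condition.

```python
def right_action(z, g):
    """R_g on P = R^4 x R: (p, h) -> (p, h + g)."""
    p, h = z
    return (p, h + g)

def d_right_action(tangent, g):
    """Differential of R_g in the identity chart: the base point moves,
    the components w are unchanged."""
    z, w = tangent
    return (right_action(z, g), w)

def lift_from(coeff):
    """Lift whose vertical component is sum_j coeff(p, h)[j] * v[j]."""
    def beta(v, z):
        p, h = z
        vertical = sum(c * vj for c, vj in zip(coeff(p, h), v))
        return (z, tuple(v) + (vertical,))
    return beta

def is_right_invariant(beta, v, z, g):
    """Compare beta_V(zg) with (dR_g)_z(beta_V(z))."""
    return beta(v, right_action(z, g)) == d_right_action(beta(v, z), g)

# Hypothetical coefficients: the first ignores h, the second does not.
beta_good = lift_from(lambda p, h: (p[0], 2.0, 0.0, -p[2]))
beta_bad = lift_from(lambda p, h: (p[0] + h, 2.0, 0.0, -p[2]))

z, v, g = ((1.0, 0.0, 3.0, 2.0), 0.5), (1.0, 2.0, 0.0, -1.0), 1.5
good = is_right_invariant(beta_good, v, z, g)
bad = is_right_invariant(beta_bad, v, z, g)
```

With these values, `good` holds and `bad` fails, which agrees with the conclusion of Example 69.1.9 that the coefficients cannot depend on $g$.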


69.1.11 REMARK:  Differentiability definition for connections on principal bundles. 

Definition 69.1.12 for the localisation of horizontal lift functions on a principal bundle is a direct conversion of 
the corresponding Definition 67.7.2 for an ordinary fibre bundle. (The spaces and maps in Definition 69.1.12 
are illustrated in Figure 69.1.3, which is a direct conversion of Figure 67.7.1.) 


Similarly, Definition 69.1.13 for the differentiability of horizontal lift functions on a principal bundle is a 
direct conversion of the corresponding Definition 67.7.6 for an ordinary fibre bundle. (See Section 67.7 for a 
more detailed presentation of horizontal lift function differentiability for ordinary fibre bundles.) 


69.1.12 DEFINITION: The localisation of a horizontal lift function $\beta$ on a $C^1$ differentiable principal $G$-bundle
$(P, \pi, M, A_P^G)$ via a fibre chart $\phi \in A_P^G$ is the map $\beta^\phi : T(\pi(\mathrm{Dom}(\phi))) \times G \to T(G)$ defined by

$$\forall V \in T(\pi(\mathrm{Dom}(\phi))),\ \forall g \in G,\quad \beta^\phi(V, g) = \phi_*\bigl(\beta_V((\pi \times \phi)^{-1}(\pi_{T(M)}(V), g))\bigr),$$

where $\pi_{T(M)} : T(M) \to M$ is the tangent bundle projection map for $M$.
The map $\beta^\phi(V, \cdot) : G \to T(G)$ may also be denoted as $\beta^\phi_V$ for $\phi \in A_P^G$ and $V \in T(\pi(\mathrm{Dom}(\phi)))$.


Figure 69.1.3 Localisation of a horizontal lift function


69.1.13 DEFINITION: A $C^k$ (differentiable) horizontal lift function on a $C^{k+1}$ differentiable principal $G$-bundle
$(P, \pi, M, A_P^G)$, for $k \in \mathbb{Z}^+_0$, is a horizontal lift function $\beta$ on $(P, \pi, M, A_P^G)$ such that

$$\forall \phi \in A_P^G,\quad \beta^\phi \in C^k(T(\pi(\mathrm{Dom}(\phi))) \times G, T(G)),$$

where $\beta^\phi : T(\pi(\mathrm{Dom}(\phi))) \times G \to T(G)$ is the localisation of $\beta$ via $\phi$.


Alternative name: $C^k$ (differentiable) connection.


69.1.14 EXAMPLE: Localisation of horizontal lift function for a principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$.

Let $\beta$ be the horizontal lift function on the principal bundle $(P, \pi, M, A_P^G)$ in Examples 69.1.9 and 69.1.10,
where $M = \mathbb{R}^4$, $G = \mathbb{R}$ with the additive group structure $(\mathbb{R}, +)$, $P = M \times G$, and $\pi : P \to M$ with $\pi : (p, g) \mapsto p$.

Let $\phi \in A_P^G$, $V \in T(\pi(\mathrm{Dom}(\phi)))$ and $g \in G$. Then $\beta^\phi(V, g)$ can be evaluated from Definition 69.1.12. Let
$p = \pi_{T(M)}(V)$, which means that $V \in T_p(M)$. Let $z = (\pi \times \phi)^{-1}(\pi_{T(M)}(V), g) = (\pi \times \phi)^{-1}(p, g) = (p, g) \in P$.
Then $\beta_V(z) \in T_z(P)$. So $\phi_*(\beta_V(z)) \in T_{\phi(z)}(G)$ is well defined.

By Example 69.1.9, $\phi_*(\beta_V(z)) = (d\phi)_z(\beta_V(z)) = t_{g,y(z,v),\mathrm{id}_G}$, where $\beta_V(z) = t_{(p,g),(v,y(z,v)),\mathrm{id}_P}$. Thus
$\beta^\phi(V, g)$ is the vector $t_{g,y(z,v),\mathrm{id}_G} \in T_g(G)$ with “velocity” $y(z, v)$, which is the vertical component of $\beta_V(z)$.


The localisation $\beta^\phi$ of $\beta$ maps the product manifold $T(\pi(\mathrm{Dom}(\phi))) \times G$ to the manifold $T(G)$. Therefore
this map can be tested for $C^k$ differentiability. In effect, this test is applied to the vertical component of $\beta$.
The construction $\beta^\phi$ extracts this component from $\beta$ so that it can be tested.
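
For the trivial bundle of Example 69.1.14, the localisation reduces to a very small function. The following sketch assumes that a vector $t_{p,v,\mathrm{id}_M}$ is stored as the pair `(p, v)` and that $t_{g,u,\mathrm{id}_G}$ is stored as `(g, u)`; the names `localise` and `y_example` and the coefficients inside `y_example` are hypothetical.

```python
def localise(vertical_component):
    """Localisation beta^phi for the trivial bundle with phi(p, g) = g.

    vertical_component(p, g, v) plays the role of y(p, g, v).
    """
    def beta_phi(V, g):
        p, v = V                       # p is the base point of V
        z = (p, g)                     # z = (pi x phi)^{-1}(p, g) = (p, g)
        return (z[1], vertical_component(p, g, v))
    return beta_phi

def y_example(p, g, v):
    # Hypothetical coefficients a_j(p) = (p[0], 3, 0, -p[2]); no dependence on g.
    return p[0] * v[0] + 3.0 * v[1] - p[2] * v[3]

beta_phi = localise(y_example)
V = ((2.0, 0.0, 1.0, 0.0), (1.0, 1.0, 5.0, 4.0))
result = beta_phi(V, -0.5)
```

Testing $\beta^\phi$ for $C^k$ differentiability then amounts, in this special case, to testing the function `y_example` of the coordinates of `p` and `v`.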


69.1.15 REMARK: Application of the horizontal lift function to base-manifold vector fields. 
Definition 69.1.16 is obtained by substituting a PFB horizontal lift function $\beta$ into Definitions 67.5.9
and 67.5.10 for lifts of vector fields by an OFB horizontal lift function $\theta$.


69.1.16 DEFINITION: The lift of a vector field $X \in X(T(M))$ by a horizontal lift function $\beta$ on a $C^1$
principal $G$-bundle $(P, \pi, M, A_P^G)$ is the vector field $\mathrm{lift}_\beta(X) \in X(T(P))$ defined by

$$\forall z \in P,\quad \mathrm{lift}_\beta(X)(z) = \beta_{X(\pi(z))}(z). \tag{69.1.9}$$

The lift of a local vector field $X \in X(T(M)|U)$ by a horizontal lift function $\beta$ on a $C^1$ principal $G$-bundle
$(P, \pi, M, A_P^G)$, where $U \in \mathrm{Top}(M)$, is the local vector field $\mathrm{lift}_\beta(X) \in X(T(P)|\pi^{-1}(U))$ defined by

$$\forall z \in \pi^{-1}(U),\quad \mathrm{lift}_\beta(X)(z) = \beta_{X(\pi(z))}(z).$$
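
The lift of a vector field is a one-line composition once a horizontal lift function is available in coordinates. The sketch below reuses the coordinate conventions of Example 69.1.9 for the trivial $(\mathbb{R}, +)$-bundle; the names `lift`, `beta`, `X` and `projection`, and the particular coefficients and vector field, are assumptions made for illustration only.

```python
def lift(beta, X, projection):
    """Return the vector field z -> beta_{X(pi(z))}(z) on the total space."""
    def lifted(z):
        V = X(projection(z))           # base-space vector at pi(z)
        return beta(V, z)
    return lifted

def projection(z):
    p, g = z
    return p

def beta(v, z):
    # Hypothetical lift with coefficients a_j(p) = (p[1], -p[0], 0, 0).
    p, g = z
    a = (p[1], -p[0], 0.0, 0.0)
    return tuple(v) + (sum(aj * vj for aj, vj in zip(a, v)),)

def X(p):
    # Hypothetical base-space vector field: rotation in the first two coordinates.
    return (-p[1], p[0], 0.0, 0.0)

lifted_X = lift(beta, X, projection)
w = lifted_X(((3.0, 4.0, 0.0, 0.0), 2.0))
```

The first four components of `w` reproduce `X` at the base point, as condition (iv) requires, and the last component is the vertical part contributed by the connection.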
69.2. Transposed horizontal lift functions


69.2.1 REMARK:  Transposed horizontal lift functions on principal bundles 
In Definition 69.2.2, conditions (i) and (ii) are essentially identical to the corresponding conditions (i) and (ii) 
in Definition 67.8.2 for a transposed horizontal lift function on an ordinary fibre bundle. 


Definition 69.2.2 conditions (iii) and (iii′) are the same as the corresponding conditions (v) and (v′) for
Definition 69.1.3, but with the lift function transposed. Therefore by Theorem 69.1.7, Definition 69.2.2 is
equivalent to Definition 67.8.2 for the special case that $F = G$. Hence all four of the horizontal lift functions,
$\theta$, $\bar\theta$, $\beta$ and $\bar\beta$, are equivalent for principal fibre bundles. They contain equivalent information. In other words,
they are “equi-informational”.


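69.2.2 DEFINITION: A transposed horizontal lift function on a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$ is a
map $\bar\beta : P \to \bigcup_{z \in P} \mathrm{Lin}(T_{\pi(z)}(M), T_z(P))$ which satisfies


(i) $\forall z \in P,\ \bar\beta_z \in \mathrm{Lin}(T_{\pi(z)}(M), T_z(P))$, [linearity]

(ii) $\forall z \in P,\ (d\pi)_z \circ \bar\beta_z = \mathrm{id}_{T_{\pi(z)}(M)}$, [horizontal component]

(iii) $\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A^G_{P,p},\ \exists u \in T_e(G),\ \forall z \in P_p,\ (d\phi)_z(\bar\beta_z(V)) = (dR^G_{\phi(z)})_e(u)$.


In other words, $\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A^G_{P,p},\ \exists u \in T_e(G),\ \phi_* \circ \bar\beta_{\cdot}(V) = X^G_u \circ \phi$.
(See Definition 62.7.7 for the right-invariant vector field $X^G_u$ generated on $G$ by $u \in T_e(G)$.)


Alternatively, condition (iii) may be replaced by condition (iii′), which is equivalent by Theorem 69.1.7.

(iii′) $\forall z \in P,\ \forall g \in G,\ \bar\beta_{zg} = (dR^P_g)_z \circ \bar\beta_z$,
where $R^P_g : P \to P$ is the right action $R^P_g : z \mapsto \mu^G_P(z, g)$ of $g \in G$ on $P$. (See Definition 66.2.2 for $\mu^G_P$.)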


69.2.3 REMARK: The spaces and maps for the definition of the horizontal lift function. 
Definition 69.2.2 condition (iii) is illustrated in Figure 69.2.1. 


Figure 69.2.1 Maps for transposed horizontal lift function on a PFB


69.2.4 REMARK: Low-value expression for right-invariant vector field condition.

The equation $\phi_* \circ \bar\beta_{\cdot}(V) = X^G_u \circ \phi$ in Definition 69.2.2 condition (iii) is of low value. It is an attempt to
replicate the more useful equation $\phi_* \circ \beta_V = X^G_u \circ \phi$ in Definition 69.1.3 condition (v). However, whereas
$z$ is clearly a function argument in the expression “$\beta_V(z)$”, it is not so clearly a function argument in the
expression “$\bar\beta_z(V)$”. This suggests that the transposed function $\bar\beta$ is less natural than $\beta$.


69.2.5 REMARK: Interpretation of the definition of horizontal lift function. 

If Definition 69.2.2 is studied line by line, it is less complex than it seems at first. In part (i), $\bar\beta_z(v)$ means
the direction in which to move $z$ relative to the “coordinates” when moving the point $p = \pi(z)$ in the
direction $v \in T_{\pi(z)}(M)$. In other words, $\bar\beta_z(v)$ means the rate of change of $z$ required in the direction $v$ in
order to keep $z$ parallel to the starting value. A practical example of this would be an airplane flying along
a geodesic from New York to Paris. To keep the airplane in an orientation parallel to the initial orientation,
it is necessary to adjust the bearing of the airplane relative to the longitude/latitude coordinate system. In
this case, $z$ would be the airplane’s orientation, $v$ is the direction of travel of the airplane, and $\bar\beta_z(v)$ is the
rate at which the orientation must be changed to keep the airplane moving in a parallel fashion.


Figure 69.2.2 illustrates roughly how the connection determines parallelism at a short distance from a 
point z € P. 


Figure 69.2.2 Local parallelism determined by a connection $\beta$


The projection of $z$ onto $M$ is $\pi(z) \in M$. The vector $v \in T_{\pi(z)}(M)$ is a tangent vector at $\pi(z)$ which indicates
a direction of movement for the point $p = \pi(z)$. If this point $p$ is moved by the amount $v$ to $p + v$, then
the vector in $\pi^{-1}(\{p + v\})$ which is parallel to $z$ will look like $z + \bar\beta_z(v) + o(v)$. In other words, the vector
$z \in P$ is moved by the small amount $\bar\beta_z(v) + o(v) \in T_z(P)$. (This interpretation is not very rigorous. It is
only intended to give an intuitive interpretation of the connection.) The term $o(v)$ is of order smaller than $v$
as $v \to 0$.


Note that $\bar\beta_z(v)$ has horizontal and vertical components. The horizontal component is the displacement
horizontally by the vector $v$, which implies that it contains no real information. The vertical component is
the deviation of $z$ away from the point it would have been at $p + v$ if the vertical component of $z$ had been
left unchanged relative to the coordinate system. Similarly, when the point $p$ is moved in the direction $-v$
to $p - v$, the vertical component of $z$ is translated by $\bar\beta_z(-v) + o(-v)$ to $z + \bar\beta_z(-v) + o(-v)$. The dashed
vertical lines at $p \pm v$ represent the parallel transport of $z$ in the case that $\bar\beta$ has no vertical component. The
horizontal dashed lines represent the parallel transportation of the vector $z$ under translations $\pm v$ of $p$, which
consist of a horizontal component (the straight portion) and a vertical component (the curved portion). This
explains part (i) of Definition 69.2.2.


The fact that condition (i) requires linearity of the connection with respect to tangent vectors in $T_{\pi(z)}(M)$
implies a reduced amount of information in the connection. The whole map $\bar\beta_z$ is fully specified on any
$T_{\pi(z)}(M)$ if it is known for any $n$ linearly independent vectors in that space.


Part (ii) of Definition 69.2.2 says that $(d\pi)_z(\bar\beta_z(v)) = v$ for all $v \in T_{\pi(z)}(M)$. This just means that if $p$ is
translated to $p + v$, then the value $z + \bar\beta_z(v) + o(v)$ has a horizontal component equal to $v$. In other words,
$\bar\beta_z$ always translates $z$ to a point in $P$ which has the same base point as the point $p + v$. (Once again, this
is a very rough first-order description. This description can be made precise, but it is more useful to think
here in the language of small displacements which is usual in physics texts.) This condition implies that the
horizontal component of the connection actually contains no information.

Part (iii′) of Definition 69.2.2 states that the parallel transport vector β_{zg} can be obtained from β_z by
applying the linear transformation (dR_g)_z, which is (very roughly speaking) a vertical linear transformation
factor of g. In other words, all vectors zg + β_{zg}(v) + o(v) can be obtained from z + β_z(v) + o(v) by applying
the transformation g. The situation when a group element g is applied is illustrated in Figure 69.2.3.


Elements g of the group G act on the set P. The vector z.g is therefore an element of P such that
π(z.g) = π(z) = p, as shown. The figure shows how the connection β transports a vector zg in a parallel
fashion for small displacements ±v from the base point p. For instance, zg is transported to zg + β_{zg}(v) + o(v)
for a base point translation v. Just as for parallel transport of the vector z, the parallel transport of zg has a
horizontal and vertical component. The vertical component of the connection is invariant under the group G.
This implies a high level of redundancy of the information in the connection.


Condition (iii′) specifies not so much invariance as conservation or preservation. It specifies that the structure
of the fibre space must be preserved under parallel translations. The connection β is not really invariant in
any sense, but it does guarantee that the invariants of the structure group are preserved. This is analogous
to the fact that the Christoffel array for a connection is not itself a tensor, but its use does preserve the
tensor property of tensors which are differentiated with it.


Figure 69.2.3 Local parallelism under group action


69.2.6 REMARK: Horizontal lift of vector fields by transposed horizontal lift functions. 
Definition 69.2.7 is the obvious conversion of Definition 69.1.16 to use a transposed horizontal lift function. 
Figure 69.2.4 illustrates Definition 69.2.7 combined with Definition 69.2.2 (iii′), where v = X(π(z)).


Figure 69.2.4 Lift of a vector field by a transposed horizontal lift function


69.2.7 DEFINITION: The lift of a vector field X ∈ X(T(M)) by a transposed horizontal lift function β on
a C^1 principal G-bundle (P, π, M, A_P^G) is the vector field lift_β(X) ∈ X(T(P)) defined by


∀z ∈ P, lift_β(X)(z) = β_z(X(π(z))).


The lift of a local vector field X ∈ X(T(M)|U) by a horizontal lift function β on a C^1 principal G-bundle
(P, π, M, A_P^G), where U ∈ Top(M), is the local vector field lift_β(X) ∈ X(T(P)|π^{-1}(U)) defined by


∀z ∈ π^{-1}(U), lift_β(X)(z) = β_z(X(π(z))).


69.2.8 REMARK: Combined diagram of vector-field lift with the right action map rule. 

Figure 69.2.5 shows a combination of Definition 69.2.7 for lift_β(X) for a vector field X with the right action
map rule Definition 69.2.2 (iii′) for a transposed horizontal lift function on a principal bundle. (This is the
same as Figure 69.2.1 with the addition of arrow-labels for the vector-field lift.)

The lifted vector field lift_β(X) ∈ X(T(P)) for a vector field X ∈ X(T(M)), by a transposed horizontal lift
function β, is the same as for ordinary fibre bundles in Definition 67.8.6.

First the expression β_z(X(π(z))) on line (69.1.9) maps z ∈ P to π(z) ∈ M, which is then mapped to
v = X(π(z)) ∈ T_{π(z)}(M) by the cross-section X. Then the lifted vector field value lift_β(X)(z) = β_z(X(π(z)))
maps z ∈ P to β_z(X(π(z))) = β_z(v) ∈ T_z(P).


69.3. Horizontal component map and horizontal subspace 


69.3.1 REMARK: Definition of horizontal component map for a principal fibre bundle. 
Definition 69.3.2 specialises Definition 67.9.2 to principal fibre bundles. 
Figure 69.2.5 Spaces for vector-field lift by a transposed horizontal lift function


69.3.2 DEFINITION: Horizontal component map for a principal fibre bundle.
The (pointwise) horizontal component map at a point z ∈ P for a transposed horizontal lift map β on a C^1
principal G-bundle (P, π, M, A_P^G) is the map h_z : T_z(P) → T_z(P) defined by


∀y ∈ T_z(P), h_z(y) = β_z((dπ)_z(y)).


In other words, ∀z ∈ P, h_z = β_z ∘ (dπ)_z.


The (global) horizontal component map for a transposed horizontal lift map β on a C^1 principal G-bundle
(P, π, M, A_P^G) is the map h : T(P) → T(P) defined by


∀y ∈ T(P), h(y) = β_{π_P(y)}((dπ)_{π_P(y)}(y)),
where π_P : T(P) → P is the projection map for T(P).


69.3.3 EXAMPLE: Horizontal component map for a trivial principal (ℝ,+)-bundle on ℝ^4.

Let β be the horizontal lift function on the principal bundle (P, π, M, A_P^G) in Example 69.1.9, where M = ℝ^4,
G = ℝ −< (ℝ,+), P = M × G, and π : P → M with π : (p,g) ↦ p. Then β_V(z) = t_{(p,g),(v,f(z,v)),id_P} for p ∈ M,
V = t_{p,v,id_M} ∈ T_p(M) and z = (p,g) ∈ P_p, where f : ℝ^4 × ℝ × ℝ^4 → ℝ has the form f(p,g,v) = Σ_{j=0}^3 a_j(p) v^j
for all z = (p,g) ∈ P and v ∈ ℝ^4, where a = (a_j)_{j=0}^3 : M → ℝ^4 is the tuple of coefficients of the linear
dependence of f on v.


It follows that β_z(V) = t_{(p,g),(v,f(z,v)),id_P} for p ∈ M, V = t_{p,v,id_M} ∈ T_p(M) and z = (p,g) ∈ P_p. Let
y ∈ T_z(P) and V = (dπ)_z(y). Then y = t_{(p,g),(v,w),id_P} for some w ∈ ℝ, and Definition 69.3.2 implies that
h_z(y) = β_z((dπ)_z(y)) = β_z(V) = t_{(p,g),(v,f(z,v)),id_P}.
Thus h_z maps y = t_{(p,g),(v,w),id_P} ∈ T_z(P) to t_{(p,g),(v,f(z,v)),id_P} ∈ T_z(P). The effect is clearly to substitute
f(z,v) ∈ ℝ for the arbitrary input w ∈ ℝ. This is how the horizontal component map works in general.
It replaces the arbitrary vertical component of y ∈ T_z(P) with the vertical component of the horizontal
vector which has the same base-space velocity. Then the
difference v_z(y) = y − h_z(y) is the vertical component relative to the horizontal value. This difference is
defined to be the "vertical component map" in Definition 69.4.2. (See Example 69.4.5 for the computation
of the vertical component map for the same connection on the same principal bundle.)
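
The substitution of f(z,v) for w can be tried out numerically. The following Python fragment is only a sketch under stated assumptions: a tangent vector t_{(p,g),(v,w),id_P} is represented by its coordinate pair (v, w), and the coefficient functions in `coeffs` are hypothetical choices made for illustration, not values taken from Example 69.1.9.

```python
def coeffs(p):
    """Hypothetical coefficients a(p) = (a_0(p), ..., a_3(p)); any functions of p would do."""
    p0, p1, p2, p3 = p
    return (p1, -p0, p2 * p3, 1.0)


def f(p, g, v):
    """f(p, g, v) = sum_j a_j(p) v^j: linear in v and independent of g."""
    return sum(a_j * v_j for a_j, v_j in zip(coeffs(p), v))


def horizontal_component(z, y):
    """h_z in coordinates: keep the base-space velocity v and replace w by f(z, v)."""
    p, g = z
    v, w = y
    return (v, f(p, g, v))


z = ((1.0, 2.0, 3.0, 4.0), 0.5)       # z = (p, g)
y = ((1.0, 0.0, 2.0, -1.0), 7.0)      # coordinates (v, w) of y in T_z(P)
hy = horizontal_component(z, y)
hhy = horizontal_component(z, hy)     # applying h_z twice changes nothing
```

In this representation the input w = 7.0 is discarded and replaced by f(z,v), while v is left unchanged, so a second application of the map has no further effect.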


69.3.4 REMARK: The effect of the right action map on the horizontal component map. 

The rule in line (69.3.2) in Theorem 69.3.5 for the effect of the right action map of a principal bundle on the 
horizontal component map has very much the appearance of an adjoint expression. An adjoint expression 
appears more explicitly in the corresponding rule for the connection form in Theorem 69.8.2. 
69.3.5 THEOREM: Effect of right action map on the horizontal component map.
Let β be a transposed horizontal lift map for a C^1 principal G-bundle (P, π, M, A_P^G). Then

∀z ∈ P, ∀g ∈ G, h_{zg} ∘ (dR_g^P)_z = (dR_g^P)_z ∘ h_z. (69.3.1)
In other words,

∀z ∈ P, ∀g ∈ G, h_{zg} = (dR_g^P)_z ∘ h_z ∘ (dR_{g^{-1}}^P)_{zg}. (69.3.2)
PROOF: Let z ∈ P and g ∈ G. Then β_{zg} = (dR_g^P)_z ∘ β_z by Definition 69.2.2 (iii′). Hence h_{zg} ∘ (dR_g^P)_z =
β_{zg} ∘ (dπ)_{zg} ∘ (dR_g^P)_z = (dR_g^P)_z ∘ β_z ∘ (dπ)_z = (dR_g^P)_z ∘ h_z by Theorem 66.2.18 (iv).


Line (69.3.2) follows from line (69.3.1) and the observation that (dR_g^P)_z ∘ (dR_{g^{-1}}^P)_{zg} = (d(R_g^P ∘ R_{g^{-1}}^P))_{zg} =
(d id_P)_{zg} = id_{T_{zg}(P)} by Theorem 58.4.13 (the chain rule for C^1 maps), and Theorem 58.5.2.
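
For the trivial (ℝ,+)-bundle of Example 69.3.3, the rule on line (69.3.1) can be checked in coordinates. The Python sketch below assumes that the right action is the translation (p, g0) ↦ (p, g0 + g) of the fibre coordinate, so that its differential leaves the coordinate pair (v, w) of a tangent vector unchanged. The coefficient functions are hypothetical.

```python
def coeffs(p):
    """Hypothetical coefficients a(p); they depend on p only, not on g."""
    p0, p1, p2, p3 = p
    return (p1, -p0, p2 * p3, 1.0)


def f(p, g, v):
    """f(p, g, v) = sum_j a_j(p) v^j, independent of the fibre coordinate g."""
    return sum(a_j * v_j for a_j, v_j in zip(coeffs(p), v))


def h(z, y):
    """Horizontal component map h_z in coordinates (v, w)."""
    p, g = z
    v, w = y
    return (v, f(p, g, v))


def right_action(z, g):
    """Assumed right action R_g: translate the fibre coordinate by g."""
    p, g0 = z
    return (p, g0 + g)


def d_right_action(y):
    """Differential of a fibre translation: identity on the coordinates (v, w)."""
    return y


z = ((1.0, 2.0, 3.0, 4.0), 0.5)
g = 2.25
y = ((1.0, 0.0, 2.0, -1.0), 7.0)

lhs = h(right_action(z, g), d_right_action(y))   # h_{zg} o (dR_g)_z
rhs = d_right_action(h(z, y))                    # (dR_g)_z o h_z
```

The two sides agree here precisely because f does not depend on g, which is the coordinate form of the right action rule for this bundle.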


69.3.6 REMARK: The effect of the right action map on horizontal subspaces. 

Definition 69.3.7 is an automatic specialisation to principal bundles of the horizontal subspace concept in 
Definition 67.9.6. The right action map may be applied to horizontal subspaces for principal bundles, whereas 
there is no such right action map for ordinary fibre bundles. 


69.3.7 DEFINITION: Horizontal subspace for a principal fibre bundle.

The horizontal subspace at a point z ∈ P for a transposed horizontal lift map β on a C^1 principal G-bundle
(P, π, M, A_P^G) is the set Q_z = Range(β_z).

The horizontal subspace map for a transposed horizontal lift map β on a C^1 principal G-bundle (P, π, M, A_P^G)
is the map Q : P → ℙ(T(P)) defined by Q_z = Range(β_z) for all z ∈ P.


69.3.8 THEOREM: Effect of right action map on horizontal subspaces.
Let β be a transposed horizontal lift map for a C^1 principal G-bundle (P, π, M, A_P^G). Let Q : P → ℙ(T(P))
be the horizontal subspace map for β. Then


∀z ∈ P, ∀g ∈ G, Q_{zg} = (dR_g^P)_z(Q_z).


PROOF: Let z ∈ P and g ∈ G. Then Q_{zg} = Range(β_{zg}) = Range((dR_g^P)_z ∘ β_z) = (dR_g^P)_z(Range(β_z)) =
(dR_g^P)_z(Q_z) by Definitions 69.3.7 and 69.2.2 (iii′) and Theorem 10.10.13 (ii).


69.4. Vertical component maps for principal bundles 


69.4.1 REMARK: The vertical component map for a given connection. 

The difference expression "id_{T_z(P)} − h_z" plays an important role in Definition 69.5.4 for connection forms.
It is given the name "vertical component map" and notation v_z in Definition 69.4.2. (This is a specialisation
from ordinary fibre bundles to principal bundles of the corresponding map in Definition 67.10.2.) Some
properties of this vertical component map are listed in Theorems 69.4.3 and 69.4.4. (See also Theorem 67.10.3
for the corresponding properties for general ordinary fibre bundles.)


69.4.2 DEFINITION: Vertical component map for a principal fibre bundle.
The (pointwise) vertical component map at a point z ∈ P for a transposed horizontal lift map β on a C^1
principal G-bundle (P, π, M, A_P^G) is the map v_z : T_z(P) → T_z(P) defined by
∀y ∈ T_z(P), v_z(y) = y − β_z((dπ)_z(y))
= y − h_z(y).


In other words, ∀z ∈ P, v_z = id_{T_z(P)} − h_z.


The (global) vertical component map for a transposed horizontal lift map β on a C^1 principal G-bundle
(P, π, M, A_P^G) is the map v : T(P) → T(P) defined by


∀y ∈ T(P), v(y) = y − β_{π_P(y)}((dπ)_{π_P(y)}(y))
= y − h_{π_P(y)}(y),


where π_P : T(P) → P is the projection map for T(P).
69.4.3 THEOREM: Some basic properties of the vertical component map for a principal bundle.
Let β be a transposed horizontal lift map for a C^1 principal G-bundle (P, π, M, A_P^G). Let z ∈ P.

(i) Range(v_z) = T_{z,0}(P).
(ii) ker(v_z) = Q_z.
(iii) v_z ∘ h_z = 0.
(iv) v_z ∘ β_z = 0.
(v) v_z ∘ v_z = v_z.
(vi) h_z ∘ v_z = 0.
(vii) (dπ)_z ∘ v_z = 0.
(viii) v_z ∘ Λ_z = Λ_z.


PROOF: Part (i) follows from Theorem 67.10.3 (i) by substituting P for E.
Part (ii) follows from Theorem 67.10.3 (ii).

Part (iii) follows from Theorem 67.10.3 (iii).

Part (iv) follows from Theorem 67.10.3 (iv) by substituting β for θ.

Part (v) follows from Theorem 67.10.3 (v).

Part (vi) follows from Theorem 67.10.3 (vi).

Part (vii) follows from Theorem 67.10.3 (vii).


For part (viii), h_z ∘ Λ_z = 0 because ker(h_z) = T_{z,0}(P) by Theorem 67.9.9 (viii), whereas Range(Λ_z) = T_{z,0}(P)
by Definition 66.5.2 and Theorem 66.4.5 (v). Hence v_z ∘ Λ_z = (id_{T_z(P)} − h_z) ∘ Λ_z = Λ_z − 0 = Λ_z.


69.4.4 THEOREM: Effect of the right action map on the vertical component map. 
Let β be a transposed horizontal lift map for a C^1 principal G-bundle (P, π, M, A_P^G). Then


∀z ∈ P, ∀g ∈ G, v_{zg} ∘ (dR_g^P)_z = (dR_g^P)_z ∘ v_z. (69.4.1)
In other words,


∀z ∈ P, ∀g ∈ G, v_{zg} = (dR_g^P)_z ∘ v_z ∘ (dR_{g^{-1}}^P)_{zg}. (69.4.2)
PROOF: Let z ∈ P and g ∈ G. Then


v_{zg} = id_{T_{zg}(P)} − h_{zg}
= id_{T_{zg}(P)} − (dR_g^P)_z ∘ h_z ∘ (dR_{g^{-1}}^P)_{zg} (69.4.3)
= id_{T_{zg}(P)} − (dR_g^P)_z ∘ (id_{T_z(P)} − v_z) ∘ (dR_{g^{-1}}^P)_{zg}
= id_{T_{zg}(P)} − (dR_g^P)_z ∘ id_{T_z(P)} ∘ (dR_{g^{-1}}^P)_{zg} + (dR_g^P)_z ∘ v_z ∘ (dR_{g^{-1}}^P)_{zg}
= (dR_g^P)_z ∘ v_z ∘ (dR_{g^{-1}}^P)_{zg}, (69.4.4)


where line (69.4.3) follows from Theorem 69.3.5 line (69.3.2), and line (69.4.4) follows from the equations
(dR_g^P)_z ∘ id_{T_z(P)} ∘ (dR_{g^{-1}}^P)_{zg} = (dR_g^P)_z ∘ (dR_{g^{-1}}^P)_{zg} = (d(R_g^P ∘ R_{g^{-1}}^P))_{zg} = (d id_P)_{zg} = id_{T_{zg}(P)}. This
verifies line (69.4.2). Line (69.4.1) then follows by composing each side of line (69.4.2) on the right with
(dR_g^P)_z and noting that (dR_{g^{-1}}^P)_{zg} ∘ (dR_g^P)_z = (d(R_{g^{-1}}^P ∘ R_g^P))_z = (d id_P)_z = id_{T_z(P)}.


69.4.5 EXAMPLE: Vertical component map for a trivial principal (ℝ,+)-bundle on ℝ^4.

Let β be the horizontal lift function on the principal bundle (P, π, M, A_P^G) in Example 69.1.9, where M = ℝ^4,
G = ℝ −< (ℝ,+), P = M × G, and π : P → M with π : (p,g) ↦ p. Then β_V(z) = t_{(p,g),(v,f(z,v)),id_P} for p ∈ M,
V = t_{p,v,id_M} ∈ T_p(M) and z = (p,g) ∈ P_p, where f : ℝ^4 × ℝ × ℝ^4 → ℝ has the form f(p,g,v) = Σ_{j=0}^3 a_j(p) v^j
for all z = (p,g) ∈ P and v ∈ ℝ^4, where a = (a_j)_{j=0}^3 : M → ℝ^4 is the tuple of coefficients of the linear
dependence of f on v.
Let y ∈ T_z(P) and V = (dπ)_z(y). Then y = t_{(p,g),(v,w),id_P} for some w ∈ ℝ. By Example 69.3.3, the
horizontal component map h_z maps t_{(p,g),(v,w),id_P} to t_{(p,g),(v,f(z,v)),id_P}. So by Definition 69.4.2, the vertical
component map v_z maps t_{(p,g),(v,w),id_P} to y − h_z(y) = t_{(p,g),(0,w−f(z,v)),id_P}.

Thus the output from the vertical component map is the difference v_z(y) = y − h_z(y), which is the vertical
component relative to the horizontal value. When this difference is viewed through a Cartesian product
coordinate chart ψ_P = ψ_M × ψ_G on a locally trivialised total space P = M × G as it is here, the third
(or "horizontal") component v of t_{(p,g),(v,w),id_P} is set to zero, and the fourth (or "vertical") component w
undergoes the map w ↦ w − f(z,v).
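
The algebraic relations between the horizontal and vertical component maps can be verified in these coordinates. The following Python sketch represents a tangent vector by its coordinate pair (v, w) and uses hypothetical coefficient functions. It is an illustration of Definition 69.4.2 for this one bundle, not a general implementation.

```python
def coeffs(p):
    """Hypothetical coefficients a(p) = (a_0(p), ..., a_3(p))."""
    p0, p1, p2, p3 = p
    return (p1, -p0, p2 * p3, 1.0)


def f(p, g, v):
    """f(p, g, v) = sum_j a_j(p) v^j."""
    return sum(a_j * v_j for a_j, v_j in zip(coeffs(p), v))


def h(z, y):
    """Horizontal component: (v, w) -> (v, f(z, v))."""
    p, g = z
    v, w = y
    return (v, f(p, g, v))


def vert(z, y):
    """Vertical component y - h_z(y): (v, w) -> (0, w - f(z, v))."""
    p, g = z
    v, w = y
    return (tuple(0.0 for _ in v), w - f(p, g, v))


def add(y1, y2):
    """Coordinate-wise sum of two tangent vectors at the same point."""
    return (tuple(a + b for a, b in zip(y1[0], y2[0])), y1[1] + y2[1])


z = ((1.0, 2.0, 3.0, 4.0), 0.5)
y = ((1.0, 0.0, 2.0, -1.0), 7.0)
zero = ((0.0, 0.0, 0.0, 0.0), 0.0)

decomposition = add(h(z, y), vert(z, y))   # y = h_z(y) + v_z(y)
```

With these inputs, the vertical part of a horizontal vector and the horizontal part of a vertical vector both vanish, and the vertical component map is idempotent.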


69.5. Connection forms on principal bundles 


69.5.1 REMARK: History and literature for connection forms. 

The definition of connection forms on principal bundles is generally attributed to Ehresmann [178] in 1950. 
(See for example Spivak [37], page 305.) These must be distinguished from the related connection-form 
matrices for frame fields, which are generally attributed to Élie Cartan in the 1920s. (See for example 
Bishop/Goldberg [3], page 222.) 


According to Sulanke/Wintgen [40], page 126, the basic ideas of the theory of general connections originate 
from Élie Cartan's papers in 1924, 1926 and 1937, whereas the modern global form of the theory in the 
language of fibre bundles was given by Ehresmann [178] in a 1950 paper. (Fibre bundles were introduced in 
1932 by Seifert [197]. Affine connections were defined for tangent bundles in 1918 by Weyl [310].) 


For Ehresmann-style connection forms, see Spivak [37], Volume 2, pages 305—341; Frankel [12], pages 460—462, 
479-487; Choquet-Bruhat [6], pages 256-267; Bishop/Crittenden [2], pages 76-87; Poor [32], pages 282-288, 
293-300; Sulanke/Wintgen [40], pages 126-130; Drechsler/Mayer [262], page 204; Kobayashi/Nomizu [19], 
pages 63-71. 


For Cartan-style connection-form-matrices, see Spivak [37], Volume 2, pages 259-289; Bishop/Goldberg [3], 
pages 222-237; Darling [8], pages 76-97; O'Neill [295], pages 49-55; Flanders [11], pages 32-48; Frankel [12], 
pages 247-267; Sternberg [38], pages 161-173, 315-324; Szekeres [305], pages 527—533; Sulanke/Wintgen [40], 
pages 159-166; Gómez-Ruiz [14], pages 161-167; Misner/Thorne/Wheeler [292], pages 348-358; Crampin/
Pirani [7], pages 133-134, 223-224, 277-283, 372-381; Postnikov [33], pages 75-79. 


69.5.2 REMARK: Connection forms are really “covariant derivative forms”. 

The name “connection form” is, strictly speaking, a misnomer. It should be called a “covariant derivative 
form” because it is a measure of the difference between a given vector on the total space and the corresponding 
horizontal (i.e. parallel) vector with the same base-space velocity. (See for example Definition 69.5.4 or 
Theorem 69.6.3 (xvii, xviii).) The connection form has much in common with the vertical component map 
in Definition 69.4.2, which equals the difference between a total space tangent vector and its horizontal 
component. The connection form merely converts the vertical component to a Lie algebra element via a 
linear space isomorphism, namely the fundamental vertical vector field. 


Covariant derivatives were introduced in 1900 by Ricci/Levi-Civita [194]. Parallel transport was introduced 
for Riemann manifolds in 1917 by Levi-Civita [187], and developed into more general affine connections in 
1918 by Weyl [310]. Connections for general fibre bundles could not be developed before the introduction 
of fibre bundles in 1932 by Seifert [197]. The so-called “connection form” corresponds more closely to the 
earlier concept of a covariant derivative. Both concepts contain the same information in the case of a PFB, 
but connections are well defined on general differentiable OFBs, whereas the connection form requires a PFB 
for its definition. 


69.5.3 REMARK: Connection forms on principal fibre bundles. 

Definition 69.5.4 gives an explicit formula for the connection form on a principal fibre bundle P for a
given transposed connection β on P. The formula on line (69.5.1) first computes the vertical component
y − β_z((dπ)_z(y)) of a given vector y ∈ T_z(P) by subtracting the horizontal component h_z(y) = β_z((dπ)_z(y))
of y from y. The second step is to map this vertical vector in T_{z,0}(P) to a Lie algebra element in T_e(G),
which effectively measures the deviation of y from parallel motion as an infinitesimal transformation. (See
Section 66.5 for infinitesimal transformations on principal bundles. See Definition 64.5.7 for the vertical
space T_{z,0}(P).) Thus the connection form value ω_z(y) is the infinitesimal structure group transformation
which must be added to the parallel vector to obtain the vector y. If y is a parallel vector (i.e. horizontal
vector), then ω_z(y) = 0. In other words, when the value of the connection form is the zero Lie algebra
element, the vector y is equal to the corresponding parallel vector h_z(y).


For general y ∈ T_z(P), the connection form at z may be thought of as answering the question: "Which
infinitesimal action u ∈ T_e(G) must be applied to T_z(P) to transform the horizontal vector h_z(y) to y?"
(This is not literally true because only vertical vectors are acted on by u via ((dL_z^P)_e)^{-1}.)

The map L_z^P : G → P_{π(z)} in Definition 69.5.4 is a C^1 diffeomorphism by Theorem 66.4.3 (xii). Therefore
the inverse differential map (dL_z^P)_e^{-1} : T_{z,0}(P) → T_e(G) is well defined for each z ∈ P. (This assumes
the tacit application of the inverse of the fibre-set tangent vector embedding map, which is a bijection by
Theorem 64.12.8.) A more or less explicit expression for Λ_z^{-1} = (dL_z^P)_e^{-1} is given in Theorem 66.5.7 (vi) in
terms of fibre charts.


Definitions 69.3.2 and 69.5.4 are illustrated in Figure 69.5.1. 


Figure 69.5.1 Horizontal and vertical components and connection form of vector on total space


69.5.4 DEFINITION: Connection form on a principal fibre bundle.
The (pointwise) connection form at a point z ∈ P for a transposed horizontal lift map β on a C^1 principal
G-bundle (P, π, M, A_P^G) is the map ω_z : T_z(P) → T_e(G) defined by


∀y ∈ T_z(P), ω_z(y) = (dL_z^P)_e^{-1}(y − β_z((dπ)_z(y))) (69.5.1)
= Λ_z^{-1}(y − β_z((dπ)_z(y)))
= Λ_z^{-1}(v_z(y)).
In other words,
∀z ∈ P, ω_z = Λ_z^{-1} ∘ (id_{T_z(P)} − β_z ∘ (dπ)_z). (69.5.2)


(See Definitions 66.4.2 and 66.5.2 for the left action map L_z^P and infinitesimal left action map Λ_z = (dL_z^P)_e.
See Definition 69.4.2 for the vertical component map v_z.)


The (global) connection form for a transposed horizontal lift map β on a C^1 principal G-bundle (P, π, M, A_P^G)
is the map ω : T(P) → T_e(G) defined by


∀y ∈ T(P), ω(y) = ω_{π_P(y)}(y)
= Λ_{π_P(y)}^{-1}(v_{π_P(y)}(y)),


where π_P : T(P) → P is the projection map for T(P). In other words,


∀z ∈ P, ∀y ∈ T_z(P), ω(y) = ω_z(y).


In other words, ω = ⋃_{z∈P} ω_z.
69.5.5 REMARK: Names for connection forms on principal fibre bundles.

The connection form ω in Definition 69.5.4 is called the "Ehresmann connection" by Spivak [37], Volume 2,
page 315. It is called the "Ehresmann connection 1-form" by Eriksson/Häggblad/Strömbom [264], page 53.
It is called the "Cartan-Ehresmann connection" or "Cartan connection" by Poor [32], page 294.


69.5.6 EXAMPLE: Connection form for a trivial principal (ℝ,+)-bundle on ℝ^4.

Let β be the horizontal lift function on the principal bundle (P, π, M, A_P^G) in Example 69.1.9, where M = ℝ^4,
G = ℝ −< (ℝ,+), P = M × G, and π : P → M with π : (p,g) ↦ p. Then β_V(z) = t_{(p,g),(v,f(z,v)),id_P} for p ∈ M,
V = t_{p,v,id_M} ∈ T_p(M) and z = (p,g) ∈ P_p, where f : ℝ^4 × ℝ × ℝ^4 → ℝ has the form f(p,g,v) = Σ_{j=0}^3 a_j(p) v^j
for all z = (p,g) ∈ P and v ∈ ℝ^4, where a = (a_j)_{j=0}^3 : M → ℝ^4 is the tuple of coefficients of the linear
dependence of f on v.

Let y ∈ T_z(P) and V = (dπ)_z(y). Then y = t_{(p,g),(v,w),id_P} for some w ∈ ℝ. By Example 69.4.5, the vertical
component map v_z maps t_{(p,g),(v,w),id_P} to t_{(p,g),(0,w−f(z,v)),id_P}. So by Definition 69.5.4 and Example 66.5.8,
the connection form ω_z maps y to


ω_z(y) = Λ_z^{-1}(v_z(y))
= t_{0,w−f(z,v),id_ℝ}.


Thus roughly speaking, the vertical component w − f(z,v) of v_z(y) ∈ T_{z,0}(P) is first computed, and this is
then used as the tangent velocity parameter for the vector ω_z(y) ∈ T_e(G) = T_0(ℝ).


Most importantly, the velocity parameter w − f(z,v) ∈ ℝ of the vector t_{0,w−f(z,v),id_ℝ} ∈ T_0(ℝ) is a linear
function of the tuple (v,w) ∈ ℝ^4 × ℝ for each z ∈ P = ℝ^4 × ℝ. This confirms that when the abstract fibre
bundle formalism is translated to classical differential geometry using coordinates, the connection form truly
is a differential form, i.e. a linear functional at each point of the manifold P.
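
The linearity of the velocity parameter in (v, w) can be checked directly. In the Python sketch below, the vector in T_0(ℝ) is identified with its velocity parameter, so the connection form becomes the real-valued map (v, w) ↦ w − f(z, v). The coefficient functions are hypothetical, and the representation u ↦ (0, u) of the infinitesimal left action is an assumption made for this illustration.

```python
def coeffs(p):
    """Hypothetical coefficients a(p) = (a_0(p), ..., a_3(p))."""
    p0, p1, p2, p3 = p
    return (p1, -p0, p2 * p3, 1.0)


def f(p, g, v):
    """f(p, g, v) = sum_j a_j(p) v^j."""
    return sum(a_j * v_j for a_j, v_j in zip(coeffs(p), v))


def omega(z, y):
    """Connection form in coordinates: velocity parameter w - f(z, v)."""
    p, g = z
    v, w = y
    return w - f(p, g, v)


def horizontal(z, v):
    """Horizontal vector with base-space velocity v: (v, f(z, v))."""
    p, g = z
    return (v, f(p, g, v))


def fundamental(u):
    """Assumed coordinate form of the infinitesimal action: u -> (0, u)."""
    return ((0.0, 0.0, 0.0, 0.0), u)


def combine(c1, y1, c2, y2):
    """Linear combination c1*y1 + c2*y2 of two tangent vectors at one point."""
    v = tuple(c1 * a + c2 * b for a, b in zip(y1[0], y2[0]))
    return (v, c1 * y1[1] + c2 * y2[1])


z = ((1.0, 2.0, 3.0, 4.0), 0.5)
y1 = ((1.0, 0.0, 2.0, -1.0), 7.0)
y2 = ((0.0, 1.0, 1.0, 2.0), -2.0)

lhs = omega(z, combine(2.0, y1, -3.0, y2))
rhs = 2.0 * omega(z, y1) - 3.0 * omega(z, y2)
```

The same functions also show that horizontal vectors are annihilated by the connection form and that a vertical vector (0, u) is mapped back to u.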


69.5.7 REMARK: The ambiguous differentiable structure on the target space of a connection form. 
Example 69.5.8 shows the difficulties which arise when attempting to compute the exterior derivative of a
connection form. This exterior derivative is required by Definition 70.5.2 for the curvature of the connection
form. The target space T_e(G) for ω in Definition 69.5.4 may be regarded as a subset of the tangent space
manifold T(G), as a submanifold of the tangent space manifold T(G), or as the linear space of tangent
vectors at e ∈ G. (This linear space T_e(G) has a special differentiable structure as in Definition 51.4.21.)


When functions from P to T_e(G) are to be differentiated, as they must be in the computation of the exterior
derivative dω, the derivatives land in different spaces according to which differentiable structure is assumed
for T_e(G). These derivatives will be in the tangent bundle of the manifold structure which is assumed
for T_e(G). This tangent bundle will be different in each case. Therefore the exterior derivative will be
different in each case. Consequently the curvature will be different in each case! (These differences are only
formal, of course. The numerical value of the curvature is still the same in each case.)


When the group G is itself a finite-dimensional linear space, there are further opportunities for ambiguity. In
this case, G itself (as opposed to T_e(G)) will have the special differentiable structure as in Definition 51.4.21.
Then there will once again be multiple choices for the structure to be used for T_e(G), which can be either the
subset T_e(G) of the full manifold T(G), the submanifold T_e(G) of T(G), or the linear space T_e(G) of tangent
vectors at e ∈ G.


This gives, in the case of a linear space G, a total of six interpretations for the manifold structure for T_e(G).


69.5.8 EXAMPLE: Exterior derivative of connection form for a trivial principal (ℝ,+)-bundle on ℝ^4.
The exterior derivative of the connection form ω in Example 69.5.6 may be computed using Definition 61.10.3
or the specialisation to 1-forms in Remark 61.10.5. Example 61.10.6 shows how to apply the manifold exterior
derivative to a real-valued 1-form on a manifold which is effectively a Cartesian space. In this case, the
manifold is P = M × G = ℝ^4 × ℝ, and the 1-form is Lie-algebra-valued.


For the connection form ω : T(P) → T_0(ℝ) in Example 69.5.6, assume that a ∈ C^1(ℝ^4, ℝ^4) so that
f ∈ C^1(ℝ^4 × ℝ × ℝ^4, ℝ) and ω ∈ X(Λ_1(T(P), T_e(G))). Let ψ = id_P.
Let z = (p,g) ∈ P and (v_0,w_0), (v_1,w_1) ∈ ℝ^4 × ℝ. Let Y_k = t_{z,(v_k,w_k),ψ} ∈ T_z(P) for k = 0, 1. The two terms
of (dω)(Y_0, Y_1) are in T_0(ℝ), which is a one-dimensional real linear space.


By Remark 61.10.5, or by Definition 61.10.3 line (61.10.1),


(dω)(Y_0, Y_1) = ∂_{Y_0}(ω ∘ Ext_ψ(Y_1)) − ∂_{Y_1}(ω ∘ Ext_ψ(Y_0))
= ∂_{z,(v_0,w_0),ψ}(ω ∘ Ext_ψ(t_{z,(v_1,w_1),ψ})) − ∂_{z,(v_1,w_1),ψ}(ω ∘ Ext_ψ(t_{z,(v_0,w_0),ψ}))
= ∂_{z,(v_0,w_0),ψ}(z′ ↦ ω(t_{z′,(v_1,w_1),ψ})) − ∂_{z,(v_1,w_1),ψ}(z′ ↦ ω(t_{z′,(v_0,w_0),ψ})) (69.5.3)
= ∂_{z,(v_0,w_0),ψ}(z′ ↦ t_{0,w_1−f(z′,v_1),id_ℝ}) − ∂_{z,(v_1,w_1),ψ}(z′ ↦ t_{0,w_0−f(z′,v_0),id_ℝ}) (69.5.4)
= t_{0,α_1,id_ℝ} − t_{0,α_0,id_ℝ},


where α_1 = ∂_{z,(v_0,w_0),ψ}(w_1 − f(·,v_1)) and α_0 = ∂_{z,(v_1,w_1),ψ}(w_0 − f(·,v_0)) by Theorem 54.14.7, so that


α_1 = (Σ_{j=0}^3 v_0^j ∂_{p^j} + w_0 ∂_g)(w_1 − f(p,g,v_1))
= −Σ_{j=0}^3 v_0^j ∂_{p^j} f(p,g,v_1) − w_0 ∂_g f(p,g,v_1)


and


α_0 = (Σ_{j=0}^3 v_1^j ∂_{p^j} + w_1 ∂_g)(w_0 − f(p,g,v_0))
= −Σ_{j=0}^3 v_1^j ∂_{p^j} f(p,g,v_0) − w_1 ∂_g f(p,g,v_0).

Thus (dω)(Y_0, Y_1) = t_{0,α_1−α_0,id_ℝ}, where

α_1 − α_0 = −Σ_{j=0}^3 (v_0^j ∂_{p^j} f(p,g,v_1) − v_1^j ∂_{p^j} f(p,g,v_0)) − (w_0 ∂_g f(p,g,v_1) − w_1 ∂_g f(p,g,v_0)).
j=0 


On line (69.5.3), v_1 and w_1 (and v_0 and w_0) are constants which are "frozen" at their values at z because the
functions Ext_ψ(t_{z,(v_1,w_1),ψ}) and Ext_ψ(t_{z,(v_0,w_0),ψ}) are constant with respect to the chart ψ. This implies
that when the derivative operators ∂_{z,(v_0,w_0),ψ} and ∂_{z,(v_1,w_1),ψ} are applied, no terms arise due to the variation
of v_1, w_1, v_0 and w_0 with respect to the dummy variable z′. Thus on line (69.5.4), z′ appears only as the
first parameter of the function f. This considerably simplifies the computations.

The function f : ℝ^4 × ℝ × ℝ^4 → ℝ in Example 69.5.6 has the form f(p,g,v) = Σ_{j=0}^3 a_j(p) v^j for all
z = (p,g) ∈ P and v ∈ ℝ^4, where a = (a_j)_{j=0}^3 : M → ℝ^4 is the tuple of coefficients of the linear dependence
of f on v. This form for f is a consequence of the definition of a connection form, not an arbitrary constraint.
(This form is shown in Example 69.1.9 for a horizontal lift function and is applied to the connection form in
Example 69.5.6.) Thus any connection form with structure group (ℝ,+) on ℝ^4 will have this form.


In particular, f(p,g,v) is independent of g and is linear with respect to v. Consequently


α_1 − α_0 = −Σ_{j=0}^3 (v_0^j ∂_{p^j} f(p,g,v_1) − v_1^j ∂_{p^j} f(p,g,v_0))
= Σ_{j,k=0}^3 (∂_{p^k} a_j(p) − ∂_{p^j} a_k(p)) v_0^j v_1^k.


This expression is clearly bilinear and antisymmetric with respect to the pair (v_0, v_1), as expected.


This formula for α_1 − α_0 gives the impression that dω could be a real function of points p ∈ M = ℝ^4 and
v_0, v_1 ∈ ℝ^4, whereas it is in fact a Lie-algebra-valued function on the tangent vector-pair total space T^2(P)
as in Notation 55.5.5. A more accurate impression is given by the formula


∀z = (p,g) ∈ P, ∀(v_0,w_0), (v_1,w_1) ∈ ℝ^4 × ℝ,


(dω)(t_{z,(v_0,w_0),ψ}, t_{z,(v_1,w_1),ψ}) = t_{0,F(p,g,v_0,w_0,v_1,w_1),id_ℝ}


= Kp (F(p, g, v_0, w_0, v_1, w_1)),
where


∀z = (p,g) ∈ P, ∀(v_0,w_0), (v_1,w_1) ∈ ℝ^4 × ℝ,


F(p, g, v_0, w_0, v_1, w_1) = Σ_{j,k=0}^3 (∂_{p^k} a_j(p) − ∂_{p^j} a_k(p)) v_0^j v_1^k.


This makes it clearer that (dω)_z : T_z^2(P) → T_e(G), and in fact (dω)_z ∈ Λ_2(T_z(P), T_e(G)) for all z ∈ P, and
then dω ∈ X(Λ_2(T(P), T_e(G))).
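
The coordinate expression for dω can be compared with a direct finite-difference evaluation of the two directional derivatives α_1 and α_0. The Python sketch below identifies T_0(ℝ) with ℝ and uses hypothetical coefficient functions a_j(p) which are at most quadratic in p, so that central differences are exact up to rounding. The function names and the sample point are assumptions chosen for illustration only.

```python
def a(p):
    """Hypothetical coefficients a(p) = (a_0(p), ..., a_3(p))."""
    return (p[1], -p[0], p[2] * p[3], 1.0)


def jacobian(p):
    """Analytic partial derivatives: jacobian(p)[j][k] = d a_j / d p^k."""
    return ((0.0, 1.0, 0.0, 0.0),
            (-1.0, 0.0, 0.0, 0.0),
            (0.0, 0.0, p[3], p[2]),
            (0.0, 0.0, 0.0, 0.0))


def omega(z, y):
    """Connection form in coordinates: w - sum_j a_j(p) v^j."""
    p, g = z
    v, w = y
    return w - sum(aj * vj for aj, vj in zip(a(p), v))


def directional(func, z, direction, eps=1e-3):
    """Central-difference derivative of func at z along the coordinates (v, w)."""
    p, g = z
    dv, dw = direction
    plus = (tuple(pi + eps * di for pi, di in zip(p, dv)), g + eps * dw)
    minus = (tuple(pi - eps * di for pi, di in zip(p, dv)), g - eps * dw)
    return (func(plus) - func(minus)) / (2.0 * eps)


def d_omega_numeric(z, y0, y1):
    """alpha_1 - alpha_0 with the components of y0 and y1 frozen."""
    alpha1 = directional(lambda zz: omega(zz, y1), z, y0)
    alpha0 = directional(lambda zz: omega(zz, y0), z, y1)
    return alpha1 - alpha0


def d_omega_formula(z, y0, y1):
    """sum_{j,k} (d_k a_j - d_j a_k) v0^j v1^k."""
    p, _ = z
    v0, v1 = y0[0], y1[0]
    jac = jacobian(p)
    return sum((jac[j][k] - jac[k][j]) * v0[j] * v1[k]
               for j in range(4) for k in range(4))


z = ((1.0, 2.0, 3.0, 4.0), 0.5)
y0 = ((1.0, 0.0, 2.0, -1.0), 3.0)
y1 = ((0.0, 1.0, 1.0, 2.0), -2.0)
```

Neither result depends on w_0, w_1 or g, which reflects the fact that f is independent of g for this structure group.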


69.5.9 REMARK: The underlying tensor bundle for connection forms on principal bundles. 

In terms of Notation 56.7.4 for multilinear map total spaces, the global connection form ω : T(P) → T_e(G)
in Definition 69.5.4 inhabits the set X(Λ_1(T(P), T_e(G))) of short-cut T_e(G)-valued 1-forms on P, which are
defined in Notation 57.7.15. So ω ∈ X(Λ_1(T(P), T_e(G))) = X(ℒ_1(T(P), T_e(G))) = X(ℒ(T(P), T_e(G)))
because the antisymmetry constraint is redundant for singly covariant tensor fields.


69.5.10 THEOREM: Connection forms are short-cut differential forms. 
Let (P, π, M, A_P^G) be a C^1 principal G-bundle. Let ω : T(P) → T_e(G) be a global connection form on P.


(i) ω ∈ X(Λ_1(T(P), T_e(G))). Hence ω is a short-cut differential form as in Notation 57.7.15.
(ii) Define ω̄ : P → Λ_1(T(P), T_e(G)) by ω̄(z) = ω|_{T_z(P)} for all z ∈ P. Then ω̄ ∈ X(Λ_1(T(P), T_e(G))).


PROOF: For part (i), Λ_1(T(P), T_e(G)) = ⋃_{p∈P} Λ_1(T_p(P), T_e(G)) by Notation 56.7.4. So it follows from
Theorem 57.7.16 (i) that


X(Λ_1(T(P), T_e(G))) = {φ : T(P) → T_e(G); ∀z ∈ P, φ|_{T_z(P)} ∈ Λ_1(T_z(P), T_e(G))}.


Since ω_z : T_z(P) → T_e(G) is linear for all z ∈ P, it follows that ω|_{T_z(P)} = ω_z ∈ Lin(T_z(P), T_e(G)) =
Λ_1(T_z(P), T_e(G)) for all z ∈ P. Therefore ω ∈ X(Λ_1(T(P), T_e(G))) by Notation 57.7.15.
For part (ii), ω̄(z) ∈ Λ_1(T_z(P), T_e(G)) for all z ∈ P by the linearity of ω|_{T_z(P)}. Hence


ω̄ ∈ {X : P → Λ_1(T(P), T_e(G)); ∀z ∈ P, X(z) ∈ Λ_1(T_z(P), T_e(G))}
= X(Λ_1(T(P), T_e(G)))


by Notation 57.6.6.


69.5.11 REMARK: Alternative tensor bundle for connection forms. 

Since Lin(T_z(P), T_e(G)) is canonically isomorphic to the tensor product space T_z^*(P) ⊗ T_e(G) by Theorems
29.2.26 and 29.2.4, and T_z^*(P) ⊗ T_e(G) is isomorphic to T_e(G) ⊗ T_z^*(P) by permuting components, one may
think of ω_z as an element of T_e(G) ⊗ T_z^*(P). If ⋃_{z∈P}(T_e(G) ⊗ T_z^*(P)) is regarded as a fibre bundle with
base space P, this may be informally denoted as T_e(G) ⊗ T^*(P). (See Eriksson/Häggblad/Strömbom [264],
page 53, for something similar.)


Slightly more accurately, one could write the triple (T_e(G) ⊗ T^*(P), π̃, P) for this fibre bundle, with π̃ :
T_e(G) ⊗ T^*(P) → P defined by π̃ : u ⊗ ζ ↦ π^*(ζ) for all u ∈ T_e(G) and ζ ∈ T^*(P), where π^* : T^*(P) → P
is the standard projection map for the dual tangent bundle T^*(P). (Note that this only defines π̃ for simple
tensors. This must be extended as a bilinear map to all of T_e(G) ⊗ T^*(P).)


Then one could write informally ω ∈ X(T_e(G) ⊗ T^*(P), π̃, P). In other words, ω may be regarded as a cross-section
of the fibre bundle (T_e(G) ⊗ T^*(P), π̃, P). Such an informal notation creates numerous confusions
and complexities which are best avoided. It is much less confusing to think of the connection form as a Lie
algebra valued function on the tangent bundle T(P), which is exactly what it is. However, the connection
form has numerous special properties, some of which are presented in Theorems 69.5.13 and 69.5.15.


Another possible way to formulate the connection form as a cross-section of a tensor bundle is to consider
Lin(T_z(P), T_e(G)) to be a multilinear map space ℒ(T_z(P), T_e(G)) as in Notation 27.2.19 for the special case
of a single domain component. From this, a T_e(G)-valued tangent covector bundle "X(ℒ(T(P), T_e(G)))"
could be defined analogously to X(T^*(P)). Once again, this approach is somewhat clumsy, but the idea that
a connection form is a Lie-algebra-valued one-form does have some applicability to quantum field theory.
69.5.12 REMARK: Connection forms are linear maps.

Since the maps Λ_z^{-1}, id_{T_z(P)}, β_z and (dπ)_z are linear, it follows from line (69.5.2) that ω_z is linear. (The
linearity of Λ_z^{-1} follows from the linearity of Λ_z by Theorem 23.1.14.) These linear maps are organised into
an exact double sequence in Theorem 69.5.13. (See Definition 64.5.7 for the vertical space T_{z,0}(P).)


69.5.13 THEOREM:  Ezact sequence of maps for connection forms. 
Let (P, v, M, AG) be a C! principal G-bundle. Let z € P. Let p = n(z). 
(i) ker(8.) = {0}. (In other words, 8 : T;(M) — T-.(P) is injective.) 
(ii) ker(w,) = Range(5.) = Q.. 
(iii) Range(w,) = T.(G). (In other words, wz : T.(P) — T.(G) is surjective.) 
(iv) ker(A;) = (0). (In other words, A, : T.(G) —> T;(P) is injective.) 
(v) ker((dz);) = Range(A;) = Tz o(P). 
(vi) Range((dz);) = T,(M). (In other words, (dz); : T,(P) — T,(M) is surjective.) 


In other words, the following diagram shows a doubly exact sequence of linear maps. 


Bz Wz 
O21) > T,(P) £ TG 2 0. 
(dr)z Az 


(See Definition 24.5.2 for exact sequences of linear maps.) 


PROOF: Part (i) follows from Definition 69.2.2 (ii) and Theorem 10.5.14 (i).
For part (ii), let $y \in \mathrm{Range}(\beta_z)$. Then $y = \beta_z(V)$ for some $V \in T_p(M)$. So by Definition 69.2.2 (ii),
$\beta_z((d\pi)_z(y)) = \beta_z((d\pi)_z(\beta_z(V))) = \beta_z(V) = y$. So $\omega_z(y) = \Lambda_z^{-1}(y - \beta_z((d\pi)_z(y))) = \Lambda_z^{-1}(0) = 0$. Therefore
$\ker(\omega_z) \supseteq \mathrm{Range}(\beta_z)$. Now suppose that $y \in \ker(\omega_z)$. Then $\omega_z(y) = 0$. So $y = \beta_z((d\pi)_z(y))$. But
$(d\pi)_z(y) \in T_p(M)$. So $y \in \mathrm{Range}(\beta_z)$. Therefore $\ker(\omega_z) \subseteq \mathrm{Range}(\beta_z)$. Hence $\ker(\omega_z) = \mathrm{Range}(\beta_z)$. The
equality $\mathrm{Range}(\beta_z) = Q_z$ follows from Theorem 67.9.9 (iii).

For part (iii), let $u \in T_e(G)$. Let $y = \Lambda_z(u)$. Then $y \in \ker((d\pi)_z)$ by Theorem 66.5.7 (v). Therefore
$\beta_z((d\pi)_z(y)) = 0$. So $\omega_z(y) = \Lambda_z^{-1}(y) = u$ by Definition 69.5.4. It follows that $\mathrm{Range}(\omega_z) \supseteq T_e(G)$. But
$\mathrm{Range}(\omega_z) \subseteq T_e(G)$ by Definition 69.5.4. Therefore $\mathrm{Range}(\omega_z) = T_e(G)$.

Parts (iv) and (v) follow from Theorems 66.5.7 (v) and 67.9.9 (ii).

For part (vi), let $V \in T_p(M)$. Let $y = \beta_z(V)$. Then $(d\pi)_z(y) = (d\pi)_z(\beta_z(V)) = V$ by Definition 69.2.2 (ii).
Hence $\mathrm{Range}((d\pi)_z) = T_p(M)$.
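
The double exactness in Theorem 69.5.13 may be read as a direct-sum splitting of $T_z(P)$ into horizontal and vertical parts. The following derivation is only a sketch. It assumes nothing beyond the formula $\omega_z(y) = \Lambda_z^{-1}(y - \beta_z((d\pi)_z(y)))$ from line (69.5.2), as it is used in the proof above, together with parts (ii) and (v) of the theorem and the identity $(d\pi)_z \circ \beta_z = \mathrm{id}$ from Definition 69.2.2 (ii).

```latex
% The argument of \Lambda_z^{-1} is vertical, since (d\pi)_z(\beta_z(W)) = W for all W:
\[ (d\pi)_z\bigl(y - \beta_z((d\pi)_z(y))\bigr) = (d\pi)_z(y) - (d\pi)_z(y) = 0. \]
% Applying \Lambda_z to both sides of line (69.5.2) therefore gives, for all y in T_z(P):
\begin{align*}
\Lambda_z(\omega_z(y)) &= y - \beta_z((d\pi)_z(y)), \\
y &= \beta_z((d\pi)_z(y)) + \Lambda_z(\omega_z(y)).
\end{align*}
% If y = \beta_z(V) is also vertical, then V = (d\pi)_z(\beta_z(V)) = 0, so y = 0. Hence
\[ T_z(P) = \mathrm{Range}(\beta_z) \oplus \mathrm{Range}(\Lambda_z) = Q_z \oplus T_{z,0}(P). \]
```

The first summand is the horizontal component of $y$ and the second is its vertical component, which is how the two rows of the diagram fit together.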


69.5.14 REMARK: Maps/spaces for horizontal lift/component/subspace, connection form, group action. 
The maps and spaces in Theorem 69.5.15 are illustrated in Figure 69.5.2.


Figure 69.5.2 Horizontal lift /component/subspace, connection form and group action 


In Theorem 69.5.15 parts (xviii), (xix), (xxi), (xxii), (xxv) and (xxvii), the symbol “0” denotes the zero 
function with relevant domain and range. In parts (iv), (x) and (xxxii), “0” denotes the relevant zero vector. 
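69.5.15 THEOREM: Summary of properties of connection forms and related maps on principal bundles.
Let $\beta$ be a transposed horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Let $z \in P$.

(i) $\mathrm{Range}((d\pi)_z) = T_{\pi(z)}(M)$.

(xv) $(d\pi)_z \circ h_z = (d\pi)_z$.

(xvi) $h_z \circ \beta_z = \beta_z$.

(xvii) $h_z \circ h_z = h_z$.

(xviii) $h_z \circ v_z = 0$.

(xx) $v_z \circ v_z = v_z$.

(xxi) $(d\pi)_z \circ \Lambda_z = 0$.

(xxii) $h_z \circ \Lambda_z = 0$.

(xxiii) $\omega_z \circ \Lambda_z = \mathrm{id}_{T_e(G)}$.

(xxviii) $\mathrm{Dom}(\Lambda_z^{-1}) = T_{z,0}(P)$.

(xxix) $\omega_z = \Lambda_z^{-1} \circ (\mathrm{id}_{T_z(P)} - h_z) = \Lambda_z^{-1} \circ v_z$.

(xxx) $\omega_z|_{T_{z,0}(P)} = \Lambda_z^{-1}$.

(xxxi) $\Lambda_z \circ \omega_z|_{T_{z,0}(P)} = \mathrm{id}_{T_{z,0}(P)}$.

(xxxii) $Q_z \cap T_{z,0}(P) = \{0\}$.
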

PROOF: Part (i) follows from Theorem 67.9.9 (i).
Part = follows from Theorem 67.9.9 D. 

Part (v) follows from Theorem 67.9.9 (vii).
Part (vi) follows from Theorem 67.9.9 (viii).
Part (vii) follows from Theorem 69.4.3 (i).
um follows from Theorem 69.4.3 i 
Part (xi) follows from Theorem 69.5.13 (iii).
Part (xii) follows from Theorem 69.5.13 (ii). 
Part (xiii) follows from Theorem 67.9.9 (v). 
Part (xiv) follows from Theorem 67.9.9 (vi). 
Part (xv) follows from Theorem 67.9.9 (xi). 
Part (xvi) follows from Theorem 67.9.9 (x). 
Part (xvii) follows from Theorem 67.9.9 (ix). 
v 


(vij: 


Part (xix) follows from Theorem 69.4.3 (iii). 


Part (xviii) follows from Theorem 69.4.3 ( 


Part (xx) follows from Theorem 69.4.3 (v). 


Part (xxi) follows from parts (ix) and (ii). 


Part (xxii) follows from parts (ix) and (vi). 


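For part (xxiii), let $u \in T_e(G)$. Then $\omega_z(\Lambda_z(u)) = \Lambda_z^{-1}(\Lambda_z(u) - \beta_z((d\pi)_z(\Lambda_z(u)))) = \Lambda_z^{-1}(\Lambda_z(u)) = u$ by
Definition 69.5.4 and part (xxi). Hence $\omega_z \circ \Lambda_z = \mathrm{id}_{T_e(G)}$.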


Part (xxiv) follows from parts (vii) and (ix) and Definition 69.5.4.

Part (xxv) follows from parts (v) and (xii).

Part (xxvi) follows from part (xxv) and Definition 69.4.2.

Part (xxvii) follows from parts (iii) and (xii).

Part (xxviii) follows from part (ix).


Part (xxix) follows from Definition 69.5.4 and part (xiii). Equality to $\Lambda_z^{-1} \circ v_z$ follows from Definition 69.4.2.


For part (xxx), $h_z|_{T_{z,0}(P)} = 0$ by part (vi). So $\omega_z|_{T_{z,0}(P)} = \Lambda_z^{-1}$ by part (xxix).


Part (xxxi) follows from parts (xxx) and (xxviii). 


Part (xxxii) follows from Theorem 67.9.9 (xiii). 


69.5.16 REMARK: Application of connection form to Lie brackets of fundamental vector fields. 
Let $\Lambda_u, \Lambda_v : P \to T(P)$ be fundamental vector fields on a $C^2$ principal bundle $P$. Then $\Lambda_u, \Lambda_v \in X^1(T(P))$ by
Theorem 66.6.8 (iv). So the Lie bracket $[\Lambda_u, \Lambda_v]$ is a well-defined vector field on $P$. Theorem 69.5.17 asserts
that when any connection form $\omega$ for $P$ is applied to $[\Lambda_u, \Lambda_v](z)$ at any $z \in P$, the result is always $[u, v]$,
namely the Lie algebra vector product of $u$ and $v$.


69.5.17 THEOREM: Connection form on Lie bracket of fundamental fields gives Lie bracket of generators. 
Let $(P, \pi, M, A_P^G)$ be a $C^2$ principal $G$-bundle. Let $\omega : T(P) \to T_e(G)$ be the connection form corresponding
to a transposed horizontal lift function $\beta$ on $P$. Then


\[ \forall z \in P,\ \forall u, v \in T_e(G),\quad \omega([\Lambda_u, \Lambda_v](z)) = [u, v]. \]


PROOF: Theorem 66.6.13 implies that $[\Lambda_u, \Lambda_v] = \Lambda_{[u,v]}$. But $\Lambda_{[u,v]}(z) = \Lambda_z([u, v])$ by Definition 66.6.2. So
$\omega([\Lambda_u, \Lambda_v](z)) = \omega(\Lambda_z([u, v])) = [u, v]$ by Theorem 69.5.15 (xxiii).


69.5.18 THEOREM: Composition of a connection form with a fundamental vector field is a constant. 
Let $(P, \pi, M, A_P^G)$ be a $C^1$ principal $G$-bundle. Let $\omega : T(P) \to T_e(G)$ be a connection form on $P$. Then


\[ \forall u \in T_e(G),\ \forall z \in P,\quad \omega_z(\Lambda_u(z)) = u. \]


PROOF: Let $u \in T_e(G)$ and $z \in P$. Then $\omega_z(\Lambda_u(z)) = \omega_z(\Lambda_z(u)) = u$ by Theorem 69.5.15 (xxiii).
69.6. Connection definition conversions for principal bundles


69.6.1 REMARK: Reconstruction of the horizontal lift map from a connection form. 

It is fairly straightforward to construct both the connection form $\omega_z$ and the horizontal subspace $Q_z$ from
a horizontal lift function $\beta_z$. It is not so easy to recover the information in $\beta_z$ from either $\omega_z$ or $Q_z$. The
subspace $Q_z$ is easily obtained from $\omega_z$ as $Q_z = \{y \in T_z(P);\ \omega_z(y) = 0\} = \omega_z^{-1}(\{0\}) = \ker(\omega_z)$. The lift
function value $\beta_z(V)$ is then the unique vector in the set $Q_z \cap T_{z,V}(P)$ for any $V \in T_p(M)$. Thus one may
write $\{\beta_z(V)\} = \{y \in T_z(P);\ \omega_z(y) = 0 \text{ and } (d\pi)_z(y) = V\}$. This is not a convenient form for further
analysis. Similarly, the expression $\{\beta_z(V)\} = \{y \in Q_z;\ (d\pi)_z(y) = V\}$ is not convenient for analysis. This
suggests that both the connection form and horizontal subspace representations should be rejected as primary
definitions, even though the horizontal lift information is in principle recoverable by solving some equations.


The difficulty of recovering the horizontal lift function from the connection form is a further justification for
representing connections as horizontal lift functions. The connection form is also specific to principal fibre
bundles and it lacks intuitive clarity. Since the connection form may be constructed in a fairly straightforward
way from the lift functions, it seems best to regard it as a secondary construction, not the primary definition.


69.6.2 REMARK: Conversion rules for five definitions of a connection on a principal bundle. 

Theorem 69.6.3 collects together various conversion rules between the different ways of encapsulating the
information in a connection on a principal bundle, namely the representations $\beta_z$, $h_z$, $v_z$, $Q_z$ and $\omega_z$. (The
fundamental vector field $\Lambda_z$ is connection-independent.) All of these encapsulations are well defined for
general ordinary fibre bundles except $\omega_z$. (See Theorem 67.11.2 for the corresponding list of conversions for
ordinary fibre bundles.)


69.6.3 THEOREM: Conversion rules for five encapsulations of a connection on a principal bundle. 
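Let $\beta$ be a transposed horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Let $z \in P$.


(i) $\forall V \in T_p(M)$, $\{\beta_z(V)\} = T_{z,V}(P) \cap \mathrm{Range}(h_z)$.
(ii) $\forall V \in T_p(M)$, $\{\beta_z(V)\} = T_{z,V}(P) \cap \ker(v_z)$.
(iii) $\forall V \in T_p(M)$, $\{\beta_z(V)\} = T_{z,V}(P) \cap Q_z$.
(iv) $\forall V \in T_p(M)$, $\{\beta_z(V)\} = T_{z,V}(P) \cap \ker(\omega_z)$.
(v) $h_z = \beta_z \circ (d\pi)_z$.
(vi) $h_z = \mathrm{id}_{T_z(P)} - v_z$.
(vii) $\forall y \in T_z(P)$, $\{h_z(y)\} = \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$.
(viii) $h_z = \mathrm{id}_{T_z(P)} - \Lambda_z \circ \omega_z$.
(ix) $v_z = \mathrm{id}_{T_z(P)} - \beta_z \circ (d\pi)_z$.
(x) $v_z = \mathrm{id}_{T_z(P)} - h_z$.
(xi) $\forall y \in T_z(P)$, $\{v_z(y)\} = \{y' \in T_{z,0}(P);\ y - y' \in Q_z\}$.
(xii) $v_z = \Lambda_z \circ \omega_z$.
(xiii) $Q_z = \mathrm{Range}(\beta_z)$.
(xiv) $Q_z = \mathrm{Range}(h_z)$.
(xv) $Q_z = \ker(v_z)$.
(xvi) $Q_z = \ker(\omega_z)$.
(xvii) $\omega_z = \Lambda_z^{-1} \circ (\mathrm{id}_{T_z(P)} - \beta_z \circ (d\pi)_z)$.
(xviii) $\omega_z = \Lambda_z^{-1} \circ (\mathrm{id}_{T_z(P)} - h_z)$.
(xix) $\omega_z = \Lambda_z^{-1} \circ v_z$.
(xx) $\forall y \in T_z(P)$, $\{\omega_z(y)\} = \{\Lambda_z^{-1}(y');\ y' \in T_{z,0}(P) \text{ and } y - y' \in Q_z\}$.


PROOF: For part (i), let $z \in P$, $p = \pi(z)$ and $V \in T_p(M)$. Suppose that $y \in \{\beta_z(V)\}$. Then $y = \beta_z(V)$. So
$y \in T_{z,V}(P)$ by Definition 69.2.2 (i, ii) and Notation 64.5.6, and $y \in \mathrm{Range}(h_z)$ by Theorem 69.5.15 (iii, v).
So $y \in T_{z,V}(P) \cap \mathrm{Range}(h_z)$.

To show the reverse inclusion, suppose that $y \in T_{z,V}(P) \cap \mathrm{Range}(h_z)$. Then $y \in T_{z,V}(P) \cap \mathrm{Range}(\beta_z)$
by Theorem 69.5.15 (iii, v). So $y = \beta_z(V')$ for some $V' \in \mathrm{Dom}(\beta_z) = T_p(M)$. But then $y \in T_{z,V'}(P)$ by
Definition 69.2.2 (i, ii). Therefore $V' = V$ by Notation 64.5.6 and the well-definition of $(d\pi)_z$. So $y = \beta_z(V)$.
Hence $\{\beta_z(V)\} = T_{z,V}(P) \cap \mathrm{Range}(h_z)$.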


Part (ii) follows from part (i) and Theorems 69.4.3 (ii) and 69.5.15 (v). 
Part (iii) follows from part (ii) and Theorem 69.4.3 (ii).

Part (iv) follows from part (iii) and Theorem 69.5.15 (xii). 

Part (v) follows from Theorem 69.5.15 (xiii). 

Part (vi) follows from Definition 69.4.2. 


For part (vii), let $z \in P$ and $y \in T_z(P)$. Suppose that $y' \in \{h_z(y)\}$. Then $y' = h_z(y)$. So $y' \in Q_z$ by
Theorem 69.5.15 (v), and $y - y' = y - h_z(y) = v_z(y) \in T_{z,0}(P)$ by Theorem 69.4.3 (i). So $y - y' \in \ker((d\pi)_z)$
by Notation 64.5.6. Thus $y' \in \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$.

To show the reverse inclusion, suppose that $y' \in \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$. Then $y' \in Q_z$ and
$y - y' \in \ker((d\pi)_z)$. So $y' = h_z(y')$ by Theorem 69.5.15 (viii) and Definition 69.4.2, and $(d\pi)_z(y) = (d\pi)_z(y')$
by the linearity of $(d\pi)_z$. Therefore by Definition 69.3.2, $h_z(y) = \beta_z((d\pi)_z(y)) = \beta_z((d\pi)_z(y')) = h_z(y') = y'$.
Hence $\{h_z(y)\} = \{y' \in Q_z;\ y - y' \in \ker((d\pi)_z)\}$.


Part (viii) follows from part (vi) and Theorem 69.5.15 (xxiv). 

Part (ix) follows from Definitions 69.4.2 and 69.3.2. 

Part (x) follows from Definition 69.4.2. 

For part (xi), let y € T;(P). Then it follows from part (vii) and Definition 69.4.2 that 


\begin{align*}
\{v_z(y)\} &= \{y - h_z(y)\} \\
&= \{y - y';\ y' \in Q_z \text{ and } y - y' \in \ker((d\pi)_z)\} \\
&= \{y';\ y - y' \in Q_z \text{ and } y' \in \ker((d\pi)_z)\} \\
&= \{y' \in T_{z,0}(P);\ y - y' \in Q_z\}.
\end{align*}


Part (xii) follows from Theorem 69.5.15 (xxiv).
Part (xiii) follows from Theorem 69.5.15 (iii).
Part (xiv) follows from Theorem 69.5.15 (v).
Part (xv) follows from Theorem 69.4.3 (ii).
Part (xvi) follows from Theorem 69.5.15 (xii).
Part (xvii) follows from Definition 69.5.4 line (69.5.2).
Part (xviii) follows from part (xvii) and Definition 69.3.2.
Part (xix) follows from Theorem 69.5.15 (xxix).
Part (xx) follows from parts (xi) and (xix).


69.6.4 THEOREM: Expression for a horizontal lift in terms of its connection form. 
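Let $\beta$ be a horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Then


\[
\begin{gathered}
\forall p \in M,\ \forall z \in P_p,\ \forall V \in T_p(M),\ \forall y \in T_{z,V}(P), \\
\beta_V(z) = \beta_z(V) = y - \Lambda_z(\omega_z(y)).
\end{gathered}
\]


PROOF: Let $y \in T_{z,V}(P)$. Then by Theorem 69.6.3 (xvii), $\omega_z(y) = \Lambda_z^{-1}(y - \beta_z(V))$. Hence it follows from
Definition 69.2.2 that $\beta_V(z) = \beta_z(V) = y - \Lambda_z(\omega_z(y))$.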
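
The proof of Theorem 69.6.4 compresses three small steps, which are spelled out in the following sketch. It assumes only Theorem 69.6.3 (xvii) and the defining property $(d\pi)_z(y) = V$ of vectors $y \in T_{z,V}(P)$. The final lines check that the right-hand side does not depend on the choice of $y$, where $y'$ is a second, hypothetical element of $T_{z,V}(P)$ introduced only for this check.

```latex
\begin{align*}
\omega_z(y) &= \Lambda_z^{-1}\bigl(y - \beta_z((d\pi)_z(y))\bigr) = \Lambda_z^{-1}(y - \beta_z(V)), \\
\Lambda_z(\omega_z(y)) &= y - \beta_z(V), \\
\beta_z(V) &= y - \Lambda_z(\omega_z(y)).
\end{align*}
% Independence of y: if y, y' \in T_{z,V}(P), then y - y' \in \ker((d\pi)_z) = \mathrm{Range}(\Lambda_z),
% so y - y' = \Lambda_z(u) for some u \in T_e(G), and \omega_z(y) - \omega_z(y') = \omega_z(\Lambda_z(u)) = u. Hence
\[ y - \Lambda_z(\omega_z(y)) = y' + \Lambda_z(u) - \Lambda_z(\omega_z(y') + u) = y' - \Lambda_z(\omega_z(y')). \]
```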


69.6.5 REMARK: Substitution of "gauges" into total space vector formulas. 

Wherever a formula requires an input $y \in T_{z,V}(E)$, for $z \in E$ and $V \in T_{\pi_E(z)}(M)$, where $(E, \pi_E, M, A_E^F)$ is
a $C^1$ fibre bundle, it is possible to substitute $\partial_V X$ for $y$ for any $X \in X^1(E, \pi_E, M)$ satisfying $X(p) = z$.
(This is shown in Theorem 64.7.12.)


In the case of principal bundles, such cross-sections are referred to as “gauges” or “local gauges” in gauge
theory. Theorem 69.6.6 applies this idea of substituting a “local gauge” to Theorem 69.6.4.

Another useful source of vectors in $T_{z,V}(E)$ is a lifted vector $\theta_V(z)$ for ordinary bundles, or $\beta_V(z)$ for principal
bundles. However, if these vectors are part of the formula to be substituted, the result is generally $0 - 0$, or
some such useless formula.
69.6.6 THEOREM: Obtaining the horizontal lift from a connection form and a cross-section.
Let $\beta$ be a horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Then


\[
\begin{gathered}
\forall X \in X^1_{\mathrm{loc}}(P, \pi, M),\ \forall p \in \mathrm{Dom}(X),\ \forall V \in T_p(M), \\
\beta_V(X(p)) = \beta_{X(p)}(V) = \partial_V X - \Lambda_{X(p)}(\omega_{X(p)}(\partial_V X)).
\end{gathered}
\]


PROOF: The assertion follows from Theorems 69.6.4 and 64.7.12. 


69.6.7 REMARK: Reconstruction of horizontal lift functions via cross-sections. 

A very imperfect method for reconstructing a horizontal lift function $\beta_z$ from the horizontal component map
$h_z$ is given in Theorem 69.6.8 (iv, v, vi) in terms of cross-sections $X \in X^1_{\mathrm{loc}}(P, \pi, M)$ which happen to have
the right value $X(p) = z$ at $p = \pi(z)$.


As mentioned in Remark 67.11.3 and asserted in Theorem 67.11.4 (ii, iv), the cross-section $X$ which is
required in Theorem 69.6.8 may be chosen to be a constant cross-section with some specified value $z = X(p)$
at the point $p = \pi(z) \in M$, where “constant” means constant when viewed through some fibre chart.
The corresponding assertions may be made here also. (See also Definition 21.6.8 for constant cross-section
extensions $\mathrm{Extn}_\phi(z)$. See Definition 64.5.13 for horizontal fibre-set vector fields $H_{V,\phi}$.)


69.6.8 THEOREM: Formulas for transposed horizontal lift in terms of horizontal/vertical components. 
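Let $\beta$ be a transposed horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$.


(i) $\forall X \in X^1_{\mathrm{loc}}(P, \pi, M)$, $\forall p \in \mathrm{Dom}(X)$, $\beta_{X(p)} = h_{X(p)} \circ (dX)_p$.

(ii) $\forall X \in X^1_{\mathrm{loc}}(P, \pi, M)$, $\forall p \in \mathrm{Dom}(X)$, $\beta_{X(p)} = (dX)_p - v_{X(p)} \circ (dX)_p$.

(iii) $\forall X \in X^1_{\mathrm{loc}}(P, \pi, M)$, $\forall p \in \mathrm{Dom}(X)$, $\beta_{X(p)} = (dX)_p - \Lambda_{X(p)} \circ \omega_{X(p)} \circ (dX)_p$.

(iv) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\beta_z = h_z \circ (d\,\mathrm{Extn}_\phi(z))_{\pi(z)}$.

(v) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\beta_z = (d\,\mathrm{Extn}_\phi(z))_{\pi(z)} - v_z \circ (d\,\mathrm{Extn}_\phi(z))_{\pi(z)}$.

(vi) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\beta_z = (d\,\mathrm{Extn}_\phi(z))_{\pi(z)} - \Lambda_z \circ \omega_z \circ (d\,\mathrm{Extn}_\phi(z))_{\pi(z)}$.

(vii) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$, $\beta_z(V) = h_z(H_{V,\phi}(z))$.
In other words, $\forall z \in P$, $\forall \phi \in A_{P,\pi(z)}^G$, $\beta_z = h_z((\pi \times \phi)_*^{-1}(\cdot, 0_{T_{\phi(z)}(G)}))$.

(viii) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$, $\beta_z(V) = H_{V,\phi}(z) - v_z(H_{V,\phi}(z))$.
In other words, $\forall z \in P$, $\forall \phi \in A_{P,\pi(z)}^G$, $\beta_z = (\pi \times \phi)_*^{-1}(\cdot, 0_{T_{\phi(z)}(G)}) - v_z((\pi \times \phi)_*^{-1}(\cdot, 0_{T_{\phi(z)}(G)}))$.

(ix) $\forall \phi \in A_P^G$, $\forall z \in \mathrm{Dom}(\phi)$, $\forall V \in T_{\pi(z)}(M)$, $\beta_z(V) = H_{V,\phi}(z) - \Lambda_z(\omega_z(H_{V,\phi}(z)))$.
In other words, $\forall z \in P$, $\forall \phi \in A_{P,\pi(z)}^G$, $\beta_z = (\pi \times \phi)_*^{-1}(\cdot, 0_{T_{\phi(z)}(G)}) - \Lambda_z(\omega_z((\pi \times \phi)_*^{-1}(\cdot, 0_{T_{\phi(z)}(G)})))$.
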
PROOF: Part (i) follows from Theorem 67.11.4 (i) and Definition 66.1.2, and the fact that connections on
principal bundles are a subspecies of connections on ordinary fibre bundles.

Part (ii) follows from part (i) and Definition 69.4.2.

Part (iii) follows from part (i) and Theorem 69.6.3 (viii).

Part (iv) follows from part (i) and Theorem 64.7.16. 

Part (v) follows from part (ii) and Theorem 64.7.16. 

Part (vi) follows from part (iii) and Theorem 64.7.16. 

Part (vii) follows from part (iv), Theorem 64.7.18 and Definition 64.5.13. 
Part (viii) follows from part (v), Theorem 64.7.18 and Definition 64.5.13. 
Part (ix) follows from part (vi), Theorem 64.7.18 and Definition 64.5.13. 


69.7. Connection generator functions for principal bundles 


69.7.1 REMARK: Expressions for PFB connections in terms of their connection generator function. 

Theorem 69.7.2 expresses various representations of connections (namely $\beta_V$, $\beta_z$, $h_z$, $v_z$ and $\omega_z$) in terms of the
connection generator function $\tilde u_\beta$ which is introduced in Definition 67.6.5. Since principal bundles inherit
their connection definitions (except for the connection form) from ordinary fibre bundles, Theorem 69.7.2
parts (i), (ii), (iii) and (iv) follow directly from Theorem 67.11.6. Part (v) follows directly from part (iv), but
the formula is not immediately meaningful. However, it is capable of being simplified as in Theorem 69.7.3.
69.7.2 THEOREM: Expressions for connections in terms of the connection generator function.
Let $\beta$ be a horizontal lift map for a $C^2$ principal $G$-bundle $(P, \pi, M, A_P^G)$.


(i) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$, $\forall z \in P_p$, $\beta_V(z) = \beta_z(V) = (d(\pi \times \phi))_z^{-1}(V, X^R_{\tilde u_\beta(V,\phi)}(\phi(z)))$.


(i) Yp € M, Yz € Py, Vy € TP), YỌ € AG, he(y) = (dlr X à): Gr). XE s ay (OC): 
(ii) Vz € P, vy € Te(P), Vb € AG Gs. Vel) = (da X e NUDO XP, gy gy (6(2))): 
(iv) Vz € P, Vy e T.(P), Vo € AS aip Y ve(y) = 07 (( (Olea) ' E MO HS a (902) ). 

where n% : T; (Er(2)) > T;,o(E) is the fibre-set pg dta embedding map as in Notation 64.12.10. 
(V) Ve € P, Vy € T(P), VÓ € AG. s) = A7 (9| O) — XB ca.) e O) 


(See Definition 62.7.7 for the right invariant vector field XP on G for u € T(G). See Remark 63.6.7 for the 
equality of XS and XP.) 


PROOF: Part (i) follows from Theorem 67.11.6 (i) and Definitions 66.1.2, 69.1.3 and 67.5.4. 

Part (ii) follows from Theorem 67.11.6 (ii) and Definitions 66.1.2, 69.1.3 and 67.5.4. 

Part (iii) follows from Theorem 67.11.6 (iv) and Definitions 66.1.2, 69.1.3 and 67.5.4. 

Part (iv) follows from Theorem 67.11.6 (vi) and Definitions 66.1.2, 69.1.3 and 67.5.4. 


Part (v) follows from part (iv) and Theorem 69.6.3 (xix). 


69.7.3 THEOREM: Formula for a connection form in terms of its connection generator function. 
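Let $\beta$ be a horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Then


\[
\begin{gathered}
\forall z \in P,\ \forall y \in T_z(P),\ \forall \phi \in A_{P,\pi(z)}^G, \\
\omega_z(y) = (dL_{\phi(z)})_e^{-1}\bigl(\phi_*(y) - X^R_{\tilde u_\beta(\pi_*(y),\phi)}(\phi(z))\bigr). \qquad (69.7.1)
\end{gathered}
\]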


PROOF: The assertion follows from Theorems 69.7.2 (v) and 66.5.7 (viii). 


69.7.4 REMARK: The relation between connection forms and connection generators. 

The inverse differential map $(dL_{\phi(z)})_e^{-1} : T_{\phi(z)}(G) \to T_e(G)$ in Theorems 69.7.3 and 69.7.5 is recognisable as
the Maurer-Cartan form in Definition 62.5.4. This form maps “off-centre” tangent vectors on $G$ to $T_e(G)$.
It is used particularly for “centring” differential transition rules for fibre charts. (See Remark 64.8.9.) The
Maurer-Cartan form disappears in Theorem 69.7.7 because the identity cross-section $X_\phi$ in some sense centres
the formula. So it needs no map to move it to the “centre”, which is $e \in G$.


The quantifiers in Theorem 69.7.3 equation (69.7.1) may be rearranged as follows.


\[
\begin{gathered}
\forall \phi \in A_P^G,\ \forall z \in \mathrm{Dom}(\phi),\ \forall y \in T_z(P), \\
\omega_z(y) = (dL_{\phi(z)})_e^{-1}\bigl(\phi_*(y) - X^R_{\tilde u_\beta(\pi_*(y),\phi)}(\phi(z))\bigr).
\end{gathered}
\]


Then using the dot-notation for functions (in Notation 10.12.18), this may be written as follows.


\[
\forall \phi \in A_P^G,\ \forall z \in \mathrm{Dom}(\phi),\quad
\omega_z = (dL_{\phi(z)})_e^{-1}\bigl(\phi_*(\cdot) - X^R_{\tilde u_\beta(\pi_*(\cdot),\phi)}(\phi(z))\bigr).
\]


When this fibre-chart-dependent formula for $\omega_z$ is combined with the conversion rules in Theorem 69.6.3,
one obtains some fibre-chart-dependent conversion rules as in Theorem 69.7.5.


69.7.5 THEOREM: Fibre-chart-dependent conversion rules for connections and connection generator. 
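Let $\omega$ be a connection form for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Let $\phi \in A_P^G$ and $z \in \mathrm{Dom}(\phi)$.


(i) $\omega_z = (dL_{\phi(z)})_e^{-1}\bigl(\phi_*(\cdot) - X^R_{\tilde u_\beta(\pi_*(\cdot),\phi)}(\phi(z))\bigr)$.
(ii) $v_z = \Lambda_z \circ (dL_{\phi(z)})_e^{-1}\bigl(\phi_*(\cdot) - X^R_{\tilde u_\beta(\pi_*(\cdot),\phi)}(\phi(z))\bigr)$.
(iii) $h_z = \mathrm{id}_{T_z(P)} - \Lambda_z \circ (dL_{\phi(z)})_e^{-1}\bigl(\phi_*(\cdot) - X^R_{\tilde u_\beta(\pi_*(\cdot),\phi)}(\phi(z))\bigr)$.

PROOF: Part (i) follows from Theorem 69.7.3.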
Part (ii) follows from part (i) and Theorem 69.6.3 (xii). 
Part (iii) follows from part (i) and Theorem 69.6.3 (v). 


69.7.6 REMARK: The connection form is the difference between a vector and the connection generator. 
Theorem 69.7.7 shows that the connection form is a “difference signal” between a vector $y \in T(P)$ and the
connection generator $\tilde u_\beta(\pi_*(y), \phi)$ for that vector, which confirms once again that the connection form is
really a “covariant derivative form”.


69.7.7 THEOREM: Connection form expressed as vector minus connection generator for the vector. 
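Let $\beta$ be a horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Then


\[
\begin{gathered}
\forall p \in M,\ \forall \phi \in A_{P,p}^G,\ \forall y \in T_{X_\phi(p)}(P), \\
\omega_{X_\phi(p)}(y) = \phi_*(y) - \tilde u_\beta(\pi_*(y), \phi),
\end{gathered}
\]


where $X_\phi : p \mapsto \phi|_{P_p}^{-1}(e)$ is the identity cross-section for $\phi \in A_P^G$ and $p \in \pi(\mathrm{Dom}(\phi))$ as in Definition 21.10.3.


PROOF: The assertion follows from Theorem 69.7.3 by substituting $z = X_\phi(p)$ for $z$ in line (69.7.1), which
gives $\omega_z(y) = (dL_e)_e^{-1}\bigl(\phi_*(y) - X^R_{\tilde u_\beta(\pi_*(y),\phi)}(e)\bigr) = \phi_*(y) - \tilde u_\beta(\pi_*(y), \phi)$.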
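
The substitution in this proof can be unpacked one identity at a time. The sketch below assumes three facts which are used elsewhere in this section: $\phi(X_\phi(p)) = e$ (as in the proof of Theorem 69.7.9 (iii)), $X^R_u(e) = (dR_e)_e(u)$ (Definition 62.7.7, as cited in the proof of Theorem 69.7.11), and $L_e = R_e = \mathrm{id}_G$. The abbreviation $u$ is introduced here only to shorten the display.

```latex
% Abbreviation (hypothetical, for this display only): u = \tilde u_\beta(\pi_*(y), \phi), z = X_\phi(p).
\begin{align*}
\omega_z(y)
  &= (dL_{\phi(z)})_e^{-1}\bigl(\phi_*(y) - X^R_u(\phi(z))\bigr)
     && \text{line (69.7.1)} \\
  &= (dL_e)_e^{-1}\bigl(\phi_*(y) - X^R_u(e)\bigr)
     && \text{since } \phi(X_\phi(p)) = e \\
  &= \phi_*(y) - (dR_e)_e(u)
     && \text{since } L_e = \mathrm{id}_G \\
  &= \phi_*(y) - u
     && \text{since } R_e = \mathrm{id}_G.
\end{align*}
```

So the Maurer-Cartan factor and the right-invariant vector field both collapse to identity maps at the sample point $z = X_\phi(p)$.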


69.7.8 REMARK: Obtaining the connection generator map from a connection or connection form. 

Since a connection form can be obtained from the connection generator, as in Theorems 69.7.2 (v), 69.7.3, 
69.7.5 (i) and 69.7.7, it should be possible to invert these expressions to obtain the connection generator from 
a connection form. This is done in Theorem 69.7.9 (iii). 


The connection generator for base-space velocity $V$ is in some sense equal to the negative of the connection
form if the connection form is evaluated for a suitable horizontal vector with base-space velocity $V$. Moreover,
via the push-forth differential $\phi_*$, the connection generator equals the negative vertical component map in
Theorem 69.7.9 (v), and the positive horizontal component map in Theorem 69.7.9 (vii).


Theorem 69.7.9 (i) inverts Definition 67.6.5 to obtain the connection generator from a PFB connection $\beta$.
This simple inversion is possible for PFBs, but not for OFBs because the fibre space of an OFB does not 
generally contain an identity element. (The difficulty of extracting the generator from an OFB connection 
is also mentioned in Remark 69.9.1.) 


69.7.9 THEOREM: Conversion from a connection to its connection generator. 
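Let $\omega$ be the connection form for a connection $\beta$ on a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$.


(i) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$, $\tilde u_\beta(V, \phi) = \phi_*(\beta_V(\phi|_{P_p}^{-1}(e))) = \phi_*(\beta_V(X_\phi(p)))$,

where $\forall p \in M$, $\forall \phi \in A_{P,p}^G$, $X_\phi(p) = \phi|_{P_p}^{-1}(e)$ as in Definition 21.10.3 for the “identity cross-section”.

(ii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$, $\forall y \in T_{X_\phi(p),V}(P)$, $\tilde u_\beta(V, \phi) = \phi_*(y) - \omega_{X_\phi(p)}(y)$.

(iii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$,

\begin{align*}
\tilde u_\beta(V, \phi) &= -\omega_{X_\phi(p)}\bigl((\pi \times \phi)_*^{-1}(V, 0_{T_e(G)})\bigr) \\
&= -\omega_{X_\phi(p)}\bigl(H_{V,\phi}(X_\phi(p))\bigr),
\end{align*}

where $H_{V,\phi} : z \mapsto (\pi \times \phi)_*^{-1}(V, 0_{T_{\phi(z)}(G)})$ is the horizontal fibre-set vector field in Definition 64.5.13.

(iv) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$, $\forall y \in T_{X_\phi(p),V}(P)$, $\tilde u_\beta(V, \phi) = \phi_*\bigl(y - v_{X_\phi(p)}(y)\bigr)$.

(v) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$,

\begin{align*}
\tilde u_\beta(V, \phi) &= -\phi_*\bigl(v_{X_\phi(p)}((\pi \times \phi)_*^{-1}(V, 0_{T_e(G)}))\bigr) \\
&= -\phi_*\bigl(v_{X_\phi(p)}(H_{V,\phi}(X_\phi(p)))\bigr).
\end{align*}

(vi) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$, $\forall y \in T_{X_\phi(p),V}(P)$, $\tilde u_\beta(V, \phi) = \phi_*\bigl(h_{X_\phi(p)}(y)\bigr)$.

(vii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$,

\begin{align*}
\tilde u_\beta(V, \phi) &= \phi_*\bigl(h_{X_\phi(p)}((\pi \times \phi)_*^{-1}(V, 0_{T_e(G)}))\bigr) \\
&= \phi_*\bigl(h_{X_\phi(p)}(H_{V,\phi}(X_\phi(p)))\bigr).
\end{align*}


PROOF: For part (i), let $p \in M$, $V \in T_p(M)$ and $\phi \in A_{P,p}^G$. Then $\phi_*(\beta_V(X_\phi(p))) = \phi_*(\beta_V(\phi|_{P_p}^{-1}(e))) = X^R_{\tilde u_\beta(V,\phi)}(e) = \tilde u_\beta(V, \phi)$ by Definitions 21.10.3, 67.6.5 and 62.4.15.


For part (ii), let $p \in M$, $V \in T_p(M)$, $\phi \in A_{P,p}^G$ and $y \in T_{X_\phi(p),V}(P)$. Then by Theorem 69.6.3 (xvii),

\begin{align*}
\omega_{X_\phi(p)}(y) &= \Lambda_{X_\phi(p)}^{-1}\bigl(y - \beta_{X_\phi(p)}(\pi_*(y))\bigr) \\
&= \Lambda_{X_\phi(p)}^{-1}\bigl(y - \beta_{X_\phi(p)}(V)\bigr) \\
&= \Lambda_{X_\phi(p)}^{-1}\bigl(y - \beta_V(X_\phi(p))\bigr).
\end{align*}

So $\beta_V(X_\phi(p)) = y - \Lambda_{X_\phi(p)}(\omega_{X_\phi(p)}(y))$. Therefore $\tilde u_\beta(V, \phi) = \phi_*\bigl(y - \Lambda_{X_\phi(p)}(\omega_{X_\phi(p)}(y))\bigr)$ by part (i). Hence
$\tilde u_\beta(V, \phi) = \phi_*(y) - \omega_{X_\phi(p)}(y)$ by Theorem 66.5.7 (iii).


For part (iii), let $p \in M$, $V \in T_p(M)$ and $\phi \in A_{P,p}^G$. Let $y = H_{V,\phi}(X_\phi(p))$. Then $y = (\pi \times \phi)_*^{-1}(V, 0_{T_e(G)})$
because $\phi(X_\phi(p)) = e$, and $\phi_*(y) = 0_{T_e(G)}$ by Theorem 64.5.16. So $\tilde u_\beta(V, \phi) = -\omega_{X_\phi(p)}(y)$ by part (ii).
Hence $\tilde u_\beta(V, \phi) = -\omega_{X_\phi(p)}(H_{V,\phi}(X_\phi(p)))$, which implies $\tilde u_\beta(V, \phi) = -\omega_{X_\phi(p)}((\pi \times \phi)_*^{-1}(V, 0_{T_e(G)}))$.

Part (iv) follows from part (ii) and Theorems 69.6.3 (xii) and 66.5.7 (iii).

Part (v) follows from part (iii) and Theorems 69.6.3 (xii) and 66.5.7 (iii).

Part (vi) follows from part (iv) and Theorem 69.6.3 (x).

For part (vii), $\tilde u_\beta(V, \phi) = \phi_*\bigl(h_{X_\phi(p)}((\pi \times \phi)_*^{-1}(V, 0_{T_e(G)})) - (\pi \times \phi)_*^{-1}(V, 0_{T_e(G)})\bigr)$ follows from part (v) and
Theorem 69.6.3 (x). But $\phi_*((\pi \times \phi)_*^{-1}(V, 0_{T_e(G)})) = 0_{T_e(G)}$. So $\tilde u_\beta(V, \phi) = \phi_*\bigl(h_{X_\phi(p)}((\pi \times \phi)_*^{-1}(V, 0_{T_e(G)}))\bigr)$,
which implies $\tilde u_\beta(V, \phi) = \phi_*\bigl(h_{X_\phi(p)}(H_{V,\phi}(X_\phi(p)))\bigr)$.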


69.7.10 REMARK: Special formulas for connections in terms of identity cross-sections.

If the total space element $z \in P_p$ in Theorem 69.7.2 (i) is chosen to be the value $X_\phi(p) = \phi|_{P_p}^{-1}(e)$ for each
$p \in M$, then the formula for $\beta_V(z)$ can be simplified. (See Definition 21.10.3 for identity cross-sections $X_\phi$.)
In some situations, this single “sample point” $z = X_\phi(p)$ at each $p \in M$ is adequate. By Definition 69.1.3 (v′),
all of the information in a PFB connection is contained in such “sample points” anyway.


69.7.11 THEOREM: Connection formulas in terms of connection generators and identity cross-sections. 
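Let $\beta$ be a horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$.


(i) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A_{P,p}^G$, $\beta_V(X_\phi(p)) = (dX_\phi)_p(V) + (d(\phi|_{P_p}))_{X_\phi(p)}^{-1}(\tilde u_\beta(V, \phi))$.


PROOF: For part (i), $\beta_V(z) = (d(\pi \times \phi))_z^{-1}(V, X^R_{\tilde u_\beta(V,\phi)}(\phi(z)))$ by Theorem 69.7.2 (i) for all $z \in P$. Therefore


\begin{align*}
\beta_V(X_\phi(p)) &= (d(\pi \times \phi))_{X_\phi(p)}^{-1}\bigl(V, X^R_{\tilde u_\beta(V,\phi)}(e)\bigr) && (69.7.2) \\
&= (d(\pi \times \phi))_{X_\phi(p)}^{-1}\bigl(V, (dR_e)_e(\tilde u_\beta(V, \phi))\bigr) && (69.7.3) \\
&= (d(\pi \times \phi))_{X_\phi(p)}^{-1}\bigl(V, \tilde u_\beta(V, \phi)\bigr) && (69.7.4) \\
&= (dX_\phi)_p(V) + (d(\phi|_{P_p}))_{X_\phi(p)}^{-1}(\tilde u_\beta(V, \phi)), && (69.7.5)
\end{align*}


where line (69.7.2) follows from Theorem 21.10.5 (iii), line (69.7.3) follows from Definition 62.7.7, line (69.7.4)
follows from the observation that $R_e = \mathrm{id}_G$, and line (69.7.5) follows from Theorem 66.3.2 (ii).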
69.8. Effect of the right action map on connection forms


69.8.1 REMARK: The effect of the right action map of a principal bundle on a connection form.

Every differentiable principal bundle has one and only one right action map. But it may have any number
of connections. The right action map $R^P_g : P \to P$ is a bijection on each fibre set of the bundle. In other
words, $R^P_g|_{P_p} : P_p \to P_p$ is a bijection for all $p \in M$, where $P_p = \pi^{-1}(\{p\})$. Therefore its differential $(dR^P_g)_z$
for any $z \in P$ has no effect on the horizontal component of vectors $y \in T_z(P)$. However, $(dR^P_g)_z$ does have
an effect on the vertical component. It also modifies the base point $z \in P$ so that $(dR^P_g)_z(y) \in T_{zg}(P)$.


The combined effect of the change in vertical component and change in base point yields the formula in
Theorem 69.8.2 (i). The value $\omega((R^P_g)_*(y))$ of the connection form $\omega$ at $(R^P_g)_*(y)$ is obtained from $\omega(y)$ by
applying the adjoint operation $\mathrm{Adj}(g^{-1})$ to the Lie algebra element $\omega(y) \in T_e(G)$.


The right action transformation property in Theorem 69.8.2 (i) is sometimes combined with the left inverse
property $\forall z \in P$, $\omega_z \circ \Lambda_z = \mathrm{id}_{T_e(G)}$ in Theorem 69.5.15 (xxiii) as the definition of a connection form on a
principal bundle. (See for example Spivak [37], Volume 2, page 315; Eriksson/Hággblad/Strómbom [264], 
page 53; Daniel/Viallet [317], page 183.) The connection form is also defined in this way by Bishop/ 
Crittenden [2], page 76, but they call it the “1-form of a connection" if it satisfies only Theorem 69.5.15 (xxiii), 
and an “equivariant” 1-form of a connection if it also satisfies Theorem 69.8.2 (i). 

Other authors merely assert and prove that Theorems 69.5.15 (xxiii) and 69.8.2 (i) are both satisfied if and 
only if the form is a connection form. (See for example Sternberg [38], page 334; Poor [32], page 283; Choquet- 
Bruhat [6], pages 256-257; Sulanke/Wintgen [40], pages 127-128; Crampin/Pirani [7], page 379; Kobayashi/
Nomizu [19], page 64.) 

Theorem 69.8.2 part (ii) expresses the first equation in part (i) in terms of the pull-back map (R7)* 
Ai(T(P),Te(G)) > A1(T(P), Te(G)), which is the same as the reverse right translation operator Re, for 
differential forms in Definition 63.5.10. So part (ii) means that the effect of translating a connection form w 
by g^! is the same as acting on the output of w with the adjoint operation Adj(g7!). 


Theorem 69.8.2 part (iii) expresses the assertions in parts (i) and (ii) explicitly in terms of the right translation 
operator RY : X(Ai(T(P),Te(G))) > X(A1(T(P), T.(G))) for short-cut differential forms for Lie right 
—À groups (G, P). (This operator is a straightforward generalisation of Definition 63.5.10 from 
scalar to vector valued differential forms.) 

-1 


The inverse group element g~* in parts (i) and (ii) is replaced with g in part (iii). This shows that the 
appearance of g^! in part (i (i) is an artefact of the way in which the backwards right transformation operator 
Res is made to look as if it is the forwards operator. As explained in Remark 62.3.14, g^! appears in the 
case of translation by g because the value of the differential form must be translated from zg~! to z. 


((2019-1-2. The proof of Theorem 69.8.2 (ii) should be a single line using Definition 58.11.10 and part (i). )) 


69.8.2 THEOREM: Formulas relating connection forms to right action maps. 

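Let $\omega : T(P) \to T_e(G)$ be the connection form for a transposed horizontal lift map $\beta$ on a $C^1$ principal
$G$-bundle $(P, \pi, M, A_P^G)$.

(i) $\forall g \in G$, $\forall z \in P$, $\forall y \in T_z(P)$, $\omega_{zg}((dR^P_g)_z(y)) = \mathrm{Adj}(g^{-1})(\omega_z(y))$.
In other words, $\forall g \in G$, $\forall y \in T(P)$, $\omega((R^P_g)_*(y)) = \mathrm{Adj}(g^{-1})(\omega(y))$.
In other words, $\forall g \in G$, $\omega \circ (R^P_g)_* = \mathrm{Adj}(g^{-1}) \circ \omega$.
In other words, $\forall g \in G$, $\forall z \in P$, $\omega_{zg} \circ (dR^P_g)_z = \mathrm{Adj}(g^{-1}) \circ \omega_z$.
In other words, $\forall g \in G$, $\forall z \in P$, $\omega_{zg} = \mathrm{Adj}(g^{-1}) \circ \omega_z \circ (dR^P_{g^{-1}})_{zg}$.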

(ii) Vg € G, (RP)*(w) = Adj(g ^!) o w, where (RP)* is the pull-back map in Definition 58.11.10 for the 
T. (G)-valued form bundle A: (T(P), T.(G)) = A(T(P), Te(G)) in Notation 56.7.4. 

(iii) Vg € G, Rọ (w) = Adj(g) ow 
(See Definition 63.5.10 for the right translation operator Rọ for short-cut differential forms.) 


PROOF: For part (i), let $g \in G$, $z \in P$ and $y \in T_{z,0}(P)$. Then $(R^P_g)_*(y) \in T_{zg,0}(P)$ by Theorem 66.2.18 (v).
Therefore $\omega_{zg}((R^P_g)_*(y))$ is well defined by Definition 69.5.4, and then $\omega_{zg}((R^P_g)_*(y)) = \Lambda_{zg}^{-1}((R^P_g)_*(y))$ by
Theorem 69.5.15 (xxx). Let $u = \Lambda_z^{-1}(y)$, which is well defined by Theorem 69.5.15 (xxviii). Let $z' = zg$.
Then $(R^P_g)_*(\Lambda_z(u)) = (R^P_g)_*(\Lambda_{z'g^{-1}}(u)) = \Lambda_{z'}(\mathrm{Adj}(g^{-1})(u)) = \Lambda_{zg}(\mathrm{Adj}(g^{-1})(u))$ by Theorem 66.6.10 (ii). So
$\omega_{zg}((R^P_g)_*(y)) = \Lambda_{zg}^{-1}((R^P_g)_*(\Lambda_z(u))) = \Lambda_{zg}^{-1}(\Lambda_{zg}(\mathrm{Adj}(g^{-1})(u))) = \mathrm{Adj}(g^{-1})(\Lambda_z^{-1}(y)) = \mathrm{Adj}(g^{-1})(\omega_z(y))$ by a
double application of Theorem 69.5.15 (xxx). Thus $\omega((R^P_g)_*(y)) = \mathrm{Adj}(g^{-1})(\omega(y))$ for all $y \in T_{z,0}(P)$.
Now consider general $y \in T_z(P)$. Then $y = h_z(y) + v_z(y)$ by Definition 69.4.2. So $y = h_z(y) + \Lambda_z(\omega_z(y))$ by
Theorem 69.5.15 (xxiv). Let $y' = \Lambda_z(\omega_z(y))$. Then $y' \in T_{z,0}(P)$ by Theorem 66.5.7 (iv), and $y = h_z(y) + y'$.
By Definition 69.3.2, $h_z = \beta_z \circ (d\pi)_z$. Therefore $(dR^P_g)_z \circ h_z = (dR^P_g)_z \circ \beta_z \circ (d\pi)_z = \beta_{zg} \circ (d\pi)_z$ by
Definition 69.2.2 (ii). So $\omega \circ (dR^P_g)_z \circ h_z = \omega_{zg} \circ \beta_{zg} \circ (d\pi)_z = 0$ by Theorem 69.5.15 (xxvii). Therefore
$\omega((R^P_g)_*(y)) = \omega((R^P_g)_*(y'))$ by the linearity of $\omega_{zg}$ and $(dR^P_g)_z$. By Theorem 69.5.15 (xxv), $\omega(h_z(y)) = 0$
also. So $\mathrm{Adj}(g^{-1})(\omega(y)) = \mathrm{Adj}(g^{-1})(\omega(y'))$. Hence $\omega((R^P_g)_*(y)) = \mathrm{Adj}(g^{-1})(\omega(y))$ for all $y \in T_z(P)$.
For part (ii), the global pull-back map (RP)* : Ai(I(P),Te(G)) > A1 (T(P), T. (G)) has the pointwise form 
(ARE): : M (TZ (P), T. (G)) > Ai (Tz (P), T. (G)) for all z € P, defined by (dR?)*(n)(y) = n((dRP)2(y)) for 
all n € Ai(T(P ) T; «(G)) and y € T,(P) aroording to the template in Definition 58.8.7. Let n = wz g. Then 
(dR); a ) = eas ((dR$)«(y)) = Adj(g~*)(wz(y)) = (Adj(g™) o wz)(y) by part (i) for all y € T.(P). 
But (R7)*()|7, (P) = (aR? D. (Weg) for all z € P. So (Rọ )* (w) = Uzep (Rg )* (9). (p) = Uie p (RT): (weg). 
Similarly, (Adj(g^!) o ur. (p) = = Adj(g !) o w, for all z € P. So Adj(g ^!) o w = U,cp(Adj(g !) o 
wirp) = Uep Adi(g^ 1) ow,. For all y € T,(P), therefore, (RP)*(w w)(y) = (aR?) (wzg)(y) = (Adj(g~*) o 
wz)(y) = (Adj(g~*) o w) (y). Hence (R7)*(w) = Adj(g !) o w. 
Part (iii) follows from part (i) and Definition 63.5.10. 
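
The two properties discussed in Remark 69.8.1, namely the left inverse property and the right action property, can be checked by hand in the simplest possible setting. The following is a sketch under assumptions which are not taken from the text: the bundle is a product $P = M \times G$, the group $G$ is a matrix Lie group, $A_p : T_p(M) \to T_e(G)$ is a hypothetical linear map for each $p \in M$, and a tangent vector at $z = (p, g)$ is written as $(V, g\xi)$ with $V \in T_p(M)$ and $\xi \in T_e(G)$. The group element acting on the right is called $k$ to avoid a clash with the base point component $g$.

```latex
% Assumed setup (illustration only, not the book's general construction):
\begin{align*}
R^P_k(p, g) &= (p, gk), &
(dR^P_k)_z(V, g\xi) &= (V, g\xi k) = \bigl(V, (gk)(k^{-1}\xi k)\bigr), \\
\Lambda_z(u) &= (0, gu), &
\mathrm{Adj}(g)(u) &= g u g^{-1}, \\
\omega_{(p,g)}(V, g\xi) &= \mathrm{Adj}(g^{-1})(A_p(V)) + \xi.
\end{align*}
% Left inverse property (compare Theorem 69.5.15 (xxiii)):
\[ \omega_z(\Lambda_z(u)) = \omega_{(p,g)}(0, gu) = u. \]
% Right action property (compare Theorem 69.8.2 (i)):
\begin{align*}
\omega_{zk}\bigl((dR^P_k)_z(V, g\xi)\bigr)
  &= \mathrm{Adj}((gk)^{-1})(A_p(V)) + k^{-1}\xi k \\
  &= k^{-1}\bigl(g^{-1} A_p(V)\, g + \xi\bigr) k \\
  &= \mathrm{Adj}(k^{-1})\bigl(\omega_z(V, g\xi)\bigr).
\end{align*}
```

In this illustration the horizontal part of $(dR^P_k)_z(V, g\xi)$ still projects to $V$, while the vertical part $\xi$ is conjugated by $k^{-1}$, which is the source of the adjoint operation in the formula.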


69.8.3 THEOREM: Summary of effects of the right action map on connection definitions. 
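Let $\beta$ be a transposed horizontal lift map for a $C^1$ principal $G$-bundle $(P, \pi, M, A_P^G)$. Then


\begin{align*}
\forall z \in P,\ \forall g \in G,\quad \beta_{zg} &= (dR^P_g)_z \circ \beta_z && (69.8.1) \\
\forall z \in P,\ \forall g \in G,\quad h_{zg} &= (dR^P_g)_z \circ h_z \circ (dR^P_{g^{-1}})_{zg} && (69.8.2) \\
\forall z \in P,\ \forall g \in G,\quad v_{zg} &= (dR^P_g)_z \circ v_z \circ (dR^P_{g^{-1}})_{zg} && (69.8.3) \\
\forall z \in P,\ \forall g \in G,\quad Q_{zg} &= (dR^P_g)_z(Q_z) && (69.8.4) \\
\forall z \in P,\ \forall g \in G,\quad \omega_{zg} &= \mathrm{Adj}(g^{-1}) \circ \omega_z \circ (dR^P_{g^{-1}})_{zg}. && (69.8.5)
\end{align*}


PROOF: Line (69.8.1) follows from Definition 69.2.2 (iii′).
Line (69.8.2) follows from Theorem 69.3.5 line (69.3.2).
Line (69.8.3) follows from Theorem 69.4.4 line (69.4.2).
Line (69.8.4) follows from Theorem 69.3.8.
Line (69.8.5) follows from Theorem 69.8.2 (i).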


69.9. Associated connections for principal bundles 


69.9.1 REMARK: Application of principal bundle connections to ordinary fibre bundles. 

Definition 67.12.3 gives a general rule for associating connections between ordinary fibre bundles. Since a 
principal bundle is a special case of an ordinary fibre bundle according to Definition 66.1.2, it can be expected 
that associated connections between principal and ordinary fibre bundles will follow the same rule. In other 
words, an additional definition is not needed for the special case of principal bundles. 


One significant difference between principal and ordinary bundles is that connection forms can be defined 
on the former but not on the latter. Connections on PFBs are often defined as connection forms, but then 
they must be converted to other connection representations on associated OFBs because connection forms 
cannot be defined for general OFBs. 


An important advantage of principal bundles is that the generators u € T.(G) of infinitesimal transformations 
XL € X(T(G)) in Definition 62.4.15 for a Lie group G can be determined by computing the vector field 
value u = X} (e) for the identity element e € G. (This follows from Theorem 62.4.16 (i).) 

Thus when the fibre space is G, the generator of any infinitesimal transformation on that space can be 
recovered directly as its value at the identity e rather than by inferring it indirectly from the entire vector field. 
(Such indirect inference is not entirely trivial. See Theorem 63.6.17.) Every connection is an infinitesimal 
transformation of the fibre space via the fibre charts. Consequently the task of extracting the connection 
generator from a connection is much easier for principal bundles than for general OFBs. 

For a connection $\beta$ on a principal bundle $P \mathrel{-\!\!<} (P, \pi_P, M, A_P^G)$, the generator $u = \hat{u}_\beta(V, \phi)$ of $\beta_V$ for $V \in T(M)$, 
via any fibre chart $\phi \in A_P^G$, can be obtained by evaluating $u = (d\phi)_{z_0}(\beta_V(z_0))$ for the single total space 
element $z_0 = \phi|_{P_p}^{-1}(e) \in P_p$, where $p = \pi_P(z) \in M$. (See Definition 67.6.5 for $\hat{u}_\beta(V, \phi)$.) The connection 
value $\beta_V(z)$ for all other $z \in P_p$ can then be reconstructed using the formulas $(d\pi_P)_z(\beta_V(z)) = V$ and 
$(d\phi)_z(\beta_V(z)) = (dR_{\phi(z)})_e(u)$ in Definition 69.1.3 (iv, v) to obtain 

$$\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A_{P,p}^G,\ \forall z \in P_p,$$
$$\beta_V(z) = (d(\pi_P \times \phi))_z^{-1}\bigl(V, (dR_{\phi(z)})_e(\hat{u}_\beta(V, \phi))\bigr).$$


The associated connections $\theta$ on all associated fibre bundles $E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F)$ can be constructed from 
the same Lie algebra element $u$, for each $V \in T(M)$ via $h(\phi) \in A_{E,p}^F$, by means of the association rule 
in Definition 67.12.3, applying the formulas $(d\pi_E)_y(\theta_V(y)) = V$ and $(dh(\phi))_y(\theta_V(y)) = (dR^F_{h(\phi)(y)})_e(u)$ in 
Definition 67.5.4 (iv, v), where $h : A_P^G \to A_E^F$ is the fibre chart association map from $P$ to $E$. Then 

$$\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A_{P,p}^G,\ \forall y \in E_p,$$
$$\theta_V(y) = (d(\pi_E \times h(\phi)))_y^{-1}\bigl(V, (dR^F_{h(\phi)(y)})_e(\hat{u}_\beta(V, \phi))\bigr).$$

Thus although a principal bundle is nothing more than an ordinary fibre bundle which just happens to have 
the structure group as its fibre space, the ability to reconstruct the connection for each base point vector V 
and fibre chart ¢ from a single Lie algebra element u € T.(G) makes principal bundles very special for the 
purposes of defining connections on all associated fibre bundles. 


Another difference between PFBs and OFBs is that a principal bundle can be used as the input to construct 
orbit-space associated fibre bundles, as described in Section 47.11 and Definition 66.7.12. This associated 
OFB construction method is assumed by many authors for defining associated connections on associated 
vector bundles. (See Theorem 69.10.4 for the validity of such associated connections.) 


Theorem 69.9.4 shows that the orbit-space associated OFB construction method is not indispensable for the 
construction of associated connections. 


The construction formula on line (69.9.4) may seem unwieldy, but it is composed of three simple steps. 
(1) For $V \in T_p(M)$ and $\phi \in A_{P,p}^G$, extract the generator $u = \hat{u}_\beta(V, \phi) = \phi_*(\beta_V(\phi|_{P_p}^{-1}(e))) \in T_e(G)$ from $\beta_V$. 

(2) For each $y \in E_p$, construct the vector field value $X^F_u(h(\phi)(y)) = (dR^F_{h(\phi)(y)})_e(\hat{u}_\beta(V, \phi)) \in T_{h(\phi)(y)}(F)$ at 
the point $h(\phi)(y)$ on the fibre space $F$. 

(3) Pull the vector pair $(V, X^F_u(h(\phi)(y)))$ back to $T_y(E)$ via $(d(\pi_E \times h(\phi)))_y^{-1}$. 
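
Step (2) can be checked numerically in the simplest matrix setting. The following Python sketch assumes, purely for illustration, that G = GL(2, R) acts on F = R^2 by matrix-vector multiplication, so that the map g -> gf has differential u -> uf at the identity. The helper names are hypothetical and are not part of the notation of this book. 

```python
# Illustration only: G = GL(2, R) acting on F = R^2 by matrix-vector product.
# All function names here are hypothetical helpers, not notation from the text.

def mat_vec(g, f):
    """Left action of a 2x2 matrix g on a vector f in R^2."""
    return [g[0][0] * f[0] + g[0][1] * f[1],
            g[1][0] * f[0] + g[1][1] * f[1]]

def fundamental_vector(u, f):
    """Differential at the identity of g -> g f, applied to u (equals u f for matrices)."""
    return mat_vec(u, f)

def finite_difference(u, f, t=1e-6):
    """Approximate the derivative of (e + t u) f at t = 0, where e is the identity."""
    g_t = [[1.0 + t * u[0][0], t * u[0][1]],
           [t * u[1][0], 1.0 + t * u[1][1]]]
    moved = mat_vec(g_t, f)
    return [(moved[i] - f[i]) / t for i in range(2)]

u = [[0.0, -1.0], [1.0, 0.0]]   # a rotation generator in the Lie algebra gl(2, R)
f = [2.0, 3.0]                  # stands in for the chart value h(phi)(y) in F
exact = fundamental_vector(u, f)
approx = finite_difference(u, f)
```

Here `exact` is the vector uf, and `approx` agrees with it up to rounding error, which is the content of step (2) in this special case. 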


The generator $\hat{u}_\beta(V, \phi)$ depends on the first-order differential of $\phi$ in the direction of the horizontal component 
of $\beta_V(z_0) \in T_{z_0}(P)$. Since $\beta_V(z_0)$ is not a vertical vector in general, it is necessary to pull back the 
vector pair $(V, X^F_u(h(\phi)(y))) \in T_p(M) \times T_{h(\phi)(y)}(F)$ to $T_y(E)$, not only the vertical vector component 
$X^F_u(h(\phi)(y)) \in T_{h(\phi)(y)}(F)$. 


69.9.2 REMARK:  Chart-independence of the formula for an associated OFB connection. 

The expression for $\theta_V$ in line (69.9.4) in the statement of Theorem 69.9.4 is guaranteed to be independent of 
the choice of fibre chart $\phi$ because $\theta_V$ and $\beta_V$ are both chart-independent. However, it is of some technical 
interest (and also a wise precaution) to verify directly that the right side of line (69.9.4) is chart-independent 
before using it. This computation is given in the proof of Theorem 69.9.3. 


69.9.3 THEOREM:  Well-definition and chart-independence of associated OFB connection formula. 

Let $P \mathrel{-\!\!<} (P, \pi_P, M, A_P^G)$ be a $C^1$ principal $G$-bundle. Let $E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle 
associated with $P$ via a fibre chart association map $h : A_P^G \to A_E^F$. Let $e$ be the identity of $G$. Let $\beta$ be a 
horizontal lift function on $P$. Let $p \in M$, $V \in T_p(M)$ and $y \in E_p$. Define $w : A_{P,p}^G \to T_y(E)$ by 

$$\forall \phi \in A_{P,p}^G,\quad w(\phi) = (d(\pi_E \times h(\phi)))_y^{-1}\bigl(V, (dR^F_{h(\phi)(y)})_e(\phi_*(\beta_V(\phi|_{P_p}^{-1}(e))))\bigr). \tag{69.9.1}$$

(i) $w(\phi)$ is a well-defined element of $T_y(E)$ for all $\phi \in A_{P,p}^G$. 
(ii) $w(\phi)$ is independent of $\phi$. 

PROOF: For part (i), let $\phi \in A_{P,p}^G$. Let $z_0 = \phi|_{P_p}^{-1}(e) \in P_p$. Then $\beta_V(z_0) \in T_{z_0}(P)$ is well defined. So 
$(d\phi)_{z_0}(\beta_V(z_0)) \in T_{\phi(z_0)}(G) = T_e(G)$ is well defined. Therefore $(dR^F_{h(\phi)(y)})_e((d\phi)_{z_0}(\beta_V(z_0))) \in T_{h(\phi)(y)}(F)$ is 
well defined. Consequently $w(\phi)$ is a well-defined element of $T_y(E)$. 

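For part (ii), let $\phi_1, \phi_2 \in A_{P,p}^G$. By Theorem 67.6.3 (i), there are unique generators $\hat{u}_\beta(V, \phi_k) \in T_e(G)$ with 
$(\phi_k)_* \circ \beta_V \circ \phi_k|_{P_p}^{-1} = X^G_{\hat{u}_\beta(V, \phi_k)}$ for $k = 1, 2$. (See Definition 63.6.5 for $X^G$. See Definition 67.6.5 for $\hat{u}_\beta$.) 

Let $z_k = \phi_k|_{P_p}^{-1}(e)$ for $k = 1, 2$. Then 

\begin{align*}
\forall k \in \mathbb{N}_2,\quad (d\phi_k)_{z_k}(\beta_V(z_k)) &= (\phi_k)_*(\beta_V(\phi_k|_{P_p}^{-1}(e))) \\
&= X^G_{\hat{u}_\beta(V, \phi_k)}(e) \\
&= \hat{u}_\beta(V, \phi_k). \tag{69.9.2}
\end{align*}

By Theorem 67.6.8, $\hat{u}_\beta(V, \phi_2) = \mathrm{Adj}(g(p))(\hat{u}_\beta(V, \phi_1)) + (dR_{g(p)})_e^{-1}((dg)_p(V))$, where $g(p) = \phi_2 \circ \phi_1|_{P_p}^{-1}$ for 
$p \in \pi_P(\mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2))$. Therefore 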


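\begin{align*}
w(\phi_2) &= (d(\pi_E \times h(\phi_2)))_y^{-1}\bigl(V, (dR^F_{h(\phi_2)(y)})_e((d\phi_2)_{z_2}(\beta_V(z_2)))\bigr) \\
&= (d(\pi_E \times h(\phi_2)))_y^{-1}\bigl(V, (dR^F_{h(\phi_2)(y)})_e(\hat{u}_\beta(V, \phi_2))\bigr) \\
&= (d(\pi_E \times h(\phi_2)))_y^{-1}\bigl(V, (dR^F_{h(\phi_2)(y)})_e\bigl(\mathrm{Adj}(g(p))(\hat{u}_\beta(V, \phi_1)) + (dR_{g(p)})_e^{-1}((dg)_p(V))\bigr)\bigr),
\end{align*}

where by Definition 62.10.2, and lines (69.9.1) and (69.9.2), 

\begin{align*}
(dR^F_{h(\phi_2)(y)})_e\bigl(\mathrm{Adj}(g(p))(\hat{u}_\beta(V, \phi_1))\bigr)
&= (d(R^F_{h(\phi_2)(y)} \circ L_{g(p)} \circ R_{g(p)^{-1}}))_e(\hat{u}_\beta(V, \phi_1)) \\
&= (dL^F_{g(p)})_{h(\phi_1)(y)}\bigl((dR^F_{g(p)^{-1}h(\phi_2)(y)})_e(\hat{u}_\beta(V, \phi_1))\bigr) \\
&= (dL^F_{g(p)})_{h(\phi_1)(y)}\bigl((dR^F_{h(\phi_1)(y)})_e(\hat{u}_\beta(V, \phi_1))\bigr) \\
&= (dL^F_{g(p)})_{h(\phi_1)(y)}\bigl((dh(\phi_1))_y(w(\phi_1))\bigr),
\end{align*}

and by Theorem 63.6.29 (iii) and Definition 63.6.5, 
$$(dR^F_{h(\phi_2)(y)})_e\bigl((dR_{g(p)})_e^{-1}((dg)_p(V))\bigr) = X^F_{u(V)}(h(\phi_2)(y)),$$
where $u(V) = (dR_{g(p)})_e^{-1}((dg)_p(V))$. Thus 
\begin{align*}
(dh(\phi_2))_y(w(\phi_2)) &= (dL^F_{g(p)})_{h(\phi_1)(y)}\bigl((dh(\phi_1))_y(w(\phi_1))\bigr) + X^F_{u(V)}(h(\phi_2)(y)) \\
&= (dh(\phi_2))_y(w(\phi_1))
\end{align*}

by Theorem 64.8.10 line (64.8.1). So $(d(\pi_E \times h(\phi_2)))_y(w(\phi_1)) = (d(\pi_E \times h(\phi_2)))_y(w(\phi_2))$. Therefore $w(\phi_2) = w(\phi_1)$ 
because $(d(\pi_E \times h(\phi_2)))_y : T_y(E) \to T_p(M) \times T_{h(\phi_2)(y)}(F)$ is a bijection. Hence $w(\phi)$ is independent of $\phi$. 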


69.9.4 THEOREM: Expression for an associated OFB connection in terms of a PFB connection. 
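Let $P \mathrel{-\!\!<} (P, \pi_P, M, A_P^G)$ be a $C^1$ principal $G$-bundle. Let $E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle 
associated with $P$ via a fibre chart association map $h : A_P^G \to A_E^F$. Let $e$ be the identity of $G$. 


Let $\beta$ and $\theta$ be associated connections (i.e. horizontal lift functions) on $P$ and $E$ respectively. Then 
$$\forall p \in M,\ \forall V \in T_p(M),\ \forall z \in E_p,\ \forall \phi \in A_{P,p}^G,$$
\begin{align*}
\theta_V(z) &= (d(\pi_E \times h(\phi)))_z^{-1}\bigl(V, (dR^F_{h(\phi)(z)})_e(\hat{u}_\beta(V, \phi))\bigr) \tag{69.9.3} \\
&= (d(\pi_E \times h(\phi)))_z^{-1}\bigl(V, (dR^F_{h(\phi)(z)})_e(\phi_*(\beta_V(\phi|_{P_p}^{-1}(e))))\bigr). \tag{69.9.4}
\end{align*}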
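PROOF: For all $p \in M$, $V \in T_p(M)$, $z \in E_p$ and $\phi \in A_{P,p}^G$, the right-hand expression on line (69.9.4) is 
well defined and chart-independent by Theorem 69.9.3 (i, ii). Therefore the expression for $\theta_V(z)$ determines 
a well-defined element of $T_z(E)$, as required for a horizontal lift function on $E$ in Definition 67.5.4 (ii). 
By Definition 67.12.3 line (67.12.1), the associated connection $\theta$ must satisfy 

$$\forall \phi \in A_P^G,\ \forall p \in \pi_P(\mathrm{Dom}(\phi)),\ \forall V \in T_p(M),\ \exists u \in T_e(G),$$
$$\phi_* \circ \beta_V \circ \phi|_{P_p}^{-1} = X^G_u \quad\text{and}\quad h(\phi)_* \circ \theta_V \circ h(\phi)|_{E_p}^{-1} = X^F_u.$$

So by the uniqueness of $\hat{u}_\beta(V, \phi)$ satisfying $\phi_* \circ \beta_V \circ \phi|_{P_p}^{-1} = X^G_{\hat{u}_\beta(V, \phi)}$, 

$$\forall \phi \in A_P^G,\ \forall p \in \pi_P(\mathrm{Dom}(\phi)),\ \forall V \in T_p(M),$$
$$h(\phi)_* \circ \theta_V \circ h(\phi)|_{E_p}^{-1} = X^F_{\hat{u}_\beta(V, \phi)}.$$

By Definition 63.6.5, $\forall u \in T_e(G)$, $\forall f \in F$, $X^F_u(f) = (dR^F_f)_e(u)$, where $R^F_f : g \mapsto gf$ for $g \in G$. Therefore 

$$\forall p \in M,\ \forall V \in T_p(M),\ \forall z \in E_p,\ \forall \phi \in A_{P,p}^G,$$
\begin{align*}
\theta_V(z) &= (d(\pi_E \times h(\phi)))_z^{-1}\bigl(V, X^F_{\hat{u}_\beta(V, \phi)}(h(\phi)(z))\bigr) \\
&= (d(\pi_E \times h(\phi)))_z^{-1}\bigl(V, (dR^F_{h(\phi)(z)})_e(\hat{u}_\beta(V, \phi))\bigr) \\
&= (d(\pi_E \times h(\phi)))_z^{-1}\bigl(V, (dR^F_{h(\phi)(z)})_e(\phi_*(\beta_V(\phi|_{P_p}^{-1}(e))))\bigr).
\end{align*}

This verifies lines (69.9.3) and (69.9.4). 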


69.9.5 REMARK: Further formulas for associated connections on OFBs and PFBs. 

Theorem 69.9.6 combines Theorem 69.9.4 with Theorems 67.11.6 and 69.7.9 to derive further formulas for 
connections on ordinary fibre bundles in terms of associated connections on associated principal bundles, by 
simply substituting conversion formulas between the various representations of connections. 


in terms of $\omega$. This could provide a pathway to express the covariant derivative for an orbit- 
space associated OFB in terms of a connection form on the PFB. This is a style of associated connection 
which appears in the gauge theory literature. However, this particular combination of conversion formulas 
apparently does not give a simple result, at least not simple enough to justify presenting it here. (See 
Theorem 69.10.4 for the computation of the covariant derivative of an orbit-space associated vector bundle 
cross-section in terms of a connection on the principal bundle.) 


69.9.6 THEOREM: Formulas for associated connections on ordinary and principal fibre bundles. 
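Let $P \mathrel{-\!\!<} (P, \pi_P, M, A_P^G)$ be a $C^1$ principal $G$-bundle. Let $E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F)$ be a $C^1$ $(G, F)$ fibre bundle 
associated with $P$ via a fibre chart association map $h : A_P^G \to A_E^F$. Let $e$ be the identity of $G$. 


Let $\beta$ be a connection on $P$. Let $\theta$ be a connection on $E$ which is associated with $\beta$. Let $h$ and $v$ be the 
horizontal and vertical component maps corresponding to $\theta$. Then 

$$\forall z \in E,\ \forall V \in T_{\pi_E(z)}(M),\ \forall \phi \in A_{P,\pi_E(z)}^G,$$
$$\theta_z(V) = (d(\pi_E \times h(\phi)))_z^{-1}\bigl(V, (dR^F_{h(\phi)(z)})_e(\phi_*(\beta_V(\phi|_{P_{\pi_E(z)}}^{-1}(e))))\bigr). \tag{69.9.5}$$
$$\forall z \in E,\ \forall y \in T_z(E),\ \forall \phi \in A_{P,\pi_E(z)}^G,$$
$$h_z(y) = (d(\pi_E \times h(\phi)))_z^{-1}\bigl((d\pi_E)_z(y), (dR^F_{h(\phi)(z)})_e(\phi_*(\beta_{(d\pi_E)_z(y)}(\phi|_{P_{\pi_E(z)}}^{-1}(e))))\bigr). \tag{69.9.6}$$
$$\forall z \in E,\ \forall y \in T_z(E),\ \forall \phi \in A_{P,\pi_E(z)}^G,$$
$$v_z(y) = \eta\bigl((d(h(\phi)|_{E_{\pi_E(z)}}))_z^{-1}\bigl((dh(\phi))_z(y) - (dR^F_{h(\phi)(z)})_e(\phi_*(\beta_{(d\pi_E)_z(y)}(\phi|_{P_{\pi_E(z)}}^{-1}(e))))\bigr)\bigr). \tag{69.9.7}$$

(See Notation 64.12.10 for the fibre-set tangent vector embedding map $\eta : T_z(E_{\pi_E(z)}) \to T_z(E)$.) 
Let $\hat{u}_\beta$ be the connection generator function for $\beta$. Then 

$$\forall z \in E,\ \forall V \in T_{\pi_E(z)}(M),\ \forall \phi \in A_{P,\pi_E(z)}^G,$$
$$\theta_z(V) = (d(\pi_E \times h(\phi)))_z^{-1}\bigl(V, (dR^F_{h(\phi)(z)})_e(\hat{u}_\beta(V, \phi))\bigr). \tag{69.9.8}$$
$$\forall z \in E,\ \forall y \in T_z(E),\ \forall \phi \in A_{P,\pi_E(z)}^G,$$
$$h_z(y) = (d(\pi_E \times h(\phi)))_z^{-1}\bigl((d\pi_E)_z(y), (dR^F_{h(\phi)(z)})_e(\hat{u}_\beta((d\pi_E)_z(y), \phi))\bigr). \tag{69.9.9}$$
$$\forall z \in E,\ \forall y \in T_z(E),\ \forall \phi \in A_{P,\pi_E(z)}^G,$$
$$v_z(y) = \eta\bigl((d(h(\phi)|_{E_{\pi_E(z)}}))_z^{-1}\bigl((dh(\phi))_z(y) - (dR^F_{h(\phi)(z)})_e(\hat{u}_\beta((d\pi_E)_z(y), \phi))\bigr)\bigr). \tag{69.9.10}$$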


PROOF: Line (69.9.5) follows from Theorem 69.9.4 line (69.9.4) and Definition 67.8.2 for $\theta$. 
Line (69.9.6) follows from line (69.9.5) and Theorem 67.11.2 (iv). 
Line (69.9.7) follows from line (69.9.6) and Theorem 67.11.6 (vi 


Lines (69.9.8), (69.9.9) and (69.9.10) follow from Theorem 69.7.9 (i) and lines (69.9.5), (69.9.6) and (69.9.7) 
respectively. 


— 


((2019-7-20. In gauge theory, a connection form is typically defined first, and the associated covariant derivative 
on an associated vector bundle is defined in terms of this. Therefore should write a formula for DO, X 
for X € X!(E,s, M) in terms of w, on an associated PFB via the connection generator equality. Have 
6v (s) = (dlr X h(2)))z (V, XE wnay 9(9)(2))) from Theorem 67.6.7 (i), where üg(V,h(6)) = tig(V, 9). 
Then have tig(V, 6) = —wx,(py((1 X9); ! (V, Or, (gy)) from Theorem 69.7.9 (iii). This should give a formula like 
Theorem 69.9.6 line (69.9.5), but using w instead of 8. This is significant because gauge theory introductions 
generally present PFB connections as connection forms rather than horizontal lift functions. Should find 
0v(z) = (d(x x h($)))z (V, —(dRaoyy))e (wx, qy((7 X $)7 (V, 0r,())))). Maybe can simplify this. Then 
can obtain D? X = w” (ðv X — 0v (X(p))) from Definition 68.2.9. This seems to be a more direct kind of 
formulation than the pulled-back lifted vector field rigmarole in Sections 69.11 and 69.13. )) 


69.10. Orbit-space associated vector bundle connections 


69.10.1 REMARK:  Covariant derivative operator for contravariant functions on principal bundles. 
Definition 69.10.2 is a pure mathematical rendition of a fairly natural way to define the associated covariant 
derivative for an associated vector bundle from a connection on a principal bundle. (This approach is 
presented by Daniel/Viallet [317], page 186, section II.H.1.) It is not the usual way to define associated 
connections pure mathematically, but Theorem 69.10.4 shows that it is equivalent to the covariant derivative 
which would be obtained by equating connection generators as in Theorem 67.12.5. 

A function $Y \in X^1((P \times F)/G)$ gives the measurement $Y(z) \in F$ for each $z \in P$. Definition 69.10.2 simply 
applies the parallel translation operator $\partial_{\beta_V(z)}$ to $Y$ to obtain $\partial_{\beta_V(z)}Y = (dY)_z(\beta_V(z))$. This is dropped 
from $T(F)$ to $F$ by the drop function $\varpi^F$. Then the equivalence class $[(z, \varpi^F((dY)_z(\beta_V(z))))] \in (P \times F)/G$ 
is an element of $E_p$, where $E$ is the orbit-space associated fibre bundle over $F$ and $p = \pi_P(z)$. 


69.10.2 DEFINITION:  Covariant derivative for contravariant principal bundle functions. 

The covariant derivative for contravariant principal bundle functions, for a connection $\beta$ on a $C^1$ principal 
bundle $(P, \pi_P, M, A_P^G)$, for a Lie left transformation group $(G, F)$ over a finite-dimensional real linear space $F$, 
is the map $D^\beta : T(M) \to (X^1((P \times F)/G) \to (P \times F)/G)$ given by 

$$\forall Y \in X^1((P \times F)/G),\ \forall p \in M,\ \forall V \in T_p(M),\ \forall z \in P_p,$$
$$D^\beta_V Y = [(z, \varpi^F((dY)_z(\beta_V(z))))],$$

where $X^1((P \times F)/G) = \{Y \in C^1(P, F);\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$ is the set of $C^1$ contravariant 
$F$-valued principal bundle functions on $P$ as in Notation 66.8.3. 


69.10.3 REMARK: Application of general associated connection definition to the orbit-space construction. 
Theorem 69.10.4 verifies that the style of associated connection which is defined by several authors for 
orbit-space associated vector bundles is consistent with Definition 67.12.3 for general associated connections. 
This general criterion requires the connection generator functions of associated connections to be equal, 
irrespective of whether the fibre bundles are ordinary or principal. It transpires that no special modification 
(such as a minus sign for example) is required for principal bundle connections. The connection generator 
equality criterion is applied at line (69.10.6) in the proof of Theorem 69.10.4. 


The usefulness of line (69.10.1) in Theorem 69.10.4 is that the covariant derivative $\varpi^F(\partial_V\bar{Y} - \theta_V(\bar{Y}(p)))$ of a 
cross-section $\bar{Y} \in X^1(E, \pi_E, M)$ of an orbit-space associated vector bundle $(E, \pi_E, M, A_E^F)$ can be computed 
by applying the associated principal bundle connection $\beta$ to the associated contravariant principal bundle 
function $Y : P \to F$. (Such functions are introduced in Definition 43.5.12.) 


Such $F$-valued functions on the principal bundle can be used as an alternative to the formal definition of an 
orbit-space associated vector bundle. $Y(z)$, for each $z \in P$, is the value of $f$ which should be used in the 
orbit-space element $[(z, f)]$ which represents the cross-section value $\bar{Y}(\pi_P(z)) = [(z, f)]$. The constructions $Y$ and $\bar{Y}$ 
are equi-informational. Line (69.10.1) shows how to compute the covariant derivative of the cross-section of 
$E$ by computing $(dY)_z(\beta_V(z)) = \partial_{\beta_V(z)}Y$ instead. This shows that associated vector bundles do not need 
to be constructed. Functions $Y : P \to F$ satisfying the contravariance condition $Y(zg) = g^{-1}Y(z)$ can be 
used instead. (Some authors refer to this as an “equivariance” condition.) 
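
The contravariance condition can be illustrated with a small numerical sketch. The Python code below assumes, for illustration only, that a frame z is an invertible 2x2 matrix, that the object represented by an orbit [(z, f)] is the product zf, and that Y(z) is the component tuple of a fixed vector y relative to the frame z. These modelling choices and all helper names are hypothetical. 

```python
# Illustration only: frames are invertible 2x2 matrices, G = GL(2, R), F = R^2.
# The function names are hypothetical helpers, not notation from the text.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(a, v):
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def components(z, y):
    """Y(z): the components of the fixed vector y relative to the frame z."""
    return mat_vec(mat_inv(z), y)

def orbit_value(z, f):
    """The object represented by the orbit [(z, f)], modelled here as z f."""
    return mat_vec(z, f)

z = [[2.0, 1.0], [0.0, 1.0]]   # a frame
g = [[1.0, 1.0], [0.0, 2.0]]   # a structure group element
y = [3.0, 4.0]                 # the object being measured
zg = mat_mul(z, g)

lhs = components(zg, y)                        # Y(zg)
rhs = mat_vec(mat_inv(g), components(z, y))    # g^{-1} Y(z)

f = [1.0, -1.0]
moved_frame = orbit_value(zg, f)               # object for (zg, f)
moved_value = orbit_value(z, mat_vec(g, f))    # object for (z, gf)
```

In this model `lhs` and `rhs` coincide, which is the contravariance condition, and `moved_frame` equals `moved_value`, which reflects the fact that (zg, f) and (z, gf) lie in the same orbit. 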


69.10.4 THEOREM: Covariant derivative equivalence for principal bundle and vector bundle cross-sections. 
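Let $(F, A_F)$ be a finite-dimensional real linear space with standard atlas $A_F$. Let $(G, F)$ be the $C^\infty$ Lie 
left transformation group with $G = GL(F)$. Let $P \mathrel{-\!\!<} (P, \pi_P, M, A_P^G)$ be a $C^1$ principal $G$-bundle. Let 
$E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F)$ be the orbit-space associated $(G, F)$ fibre bundle for $P$. Let $\theta$ be the connection on the 
vector bundle $E = (P \times F)/G$ which is associated with a connection $\beta$ on $P$. Let $v^\theta$ denote the vertical 
component map for $\theta$. Then 

$$\forall Y \in X^1((P \times F)/G),\ \forall p \in M,\ \forall V \in T_p(M),\ \forall z \in P_p,$$
\begin{align*}
[(z, \varpi^F((dY)_z(\beta_V(z))))] &= \varpi^F(\partial_V\bar{Y} - \theta_V(\bar{Y}(p))) \tag{69.10.1} \\
&= \varpi^F(v^\theta(\partial_V\bar{Y})) \tag{69.10.2} \\
&= D^\theta_V\bar{Y}, \tag{69.10.3}
\end{align*}

where $X^1((P \times F)/G) = \{Y : P \to F;\ \forall z \in P,\ \forall g \in G,\ Y(zg) = g^{-1}Y(z)\}$, and $\bar{Y} \in X(E, \pi_E, M)$ is defined 
for $Y \in X^1((P \times F)/G)$ by 

$$\forall Y \in X^1((P \times F)/G),\ \forall p \in M,\ \forall z \in P_p,\quad \bar{Y}(p) = [(z, Y(z))].$$

Hence 

$$\forall Y \in X^1((P \times F)/G),\ \forall V \in T(M),\quad D^\beta_V Y = D^\theta_V\bar{Y}. \tag{69.10.4}$$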
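PROOF: By Definition 66.7.12 (iii), the chart association map $h : A_P^G \to A_E^F$ satisfies 

$$\forall \phi \in A_P^G,\ \forall p \in M,\ \forall z \in P_p,\ \forall f \in F,\quad h(\phi)([(z, f)]) = \phi(z)f.$$

Let $Y \in X^1((P \times F)/G)$, $p \in M$, $V \in T_p(M)$ and $\phi \in A_{P,p}^G$. Let $z_0(p) = \phi|_{P_p}^{-1}(e) = X_\phi(p) \in P_p$ 
for $p \in \pi_P(\mathrm{Dom}(\phi))$. (See Definition 21.10.3 for $X_\phi$.) Then $\pi_P(z_0(p)) = p$ and $\phi(z_0(p)) = e$. So by 
Theorem 69.7.11 (i), substituting $z_0(p)$ into the left hand side of line (69.10.1) gives 
\begin{align*}
h(\phi)\bigl([(z_0(p), \varpi^F((dY)_{z_0(p)}(\beta_V(z_0(p)))))]\bigr) &= \varpi^F\bigl((dY)_{z_0(p)}(\beta_V(z_0(p)))\bigr) \\
&= \varpi^F\bigl((dY)_{z_0(p)}\bigl((dz_0)_p(V) + (d(\phi|_{P_p}^{-1}))_e(\hat{u}_\beta(V, \phi))\bigr)\bigr).
\end{align*}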


To simplify the right hand side of line (69.10.1), note that the differential structure on $E$ must match the 
differential structure on $M \times F$ via local trivialisations. The local trivialisation $\pi_E \times h(\phi)$ maps $\bar{Y}(p)$ to 
$(p, \phi(z)Y(z))$ for any $z \in P_p$, for any $p \in \pi_P(\mathrm{Dom}(\phi))$. Therefore $(\pi_E \times h(\phi))(\bar{Y}(p)) = (p, Y(z_0(p)))$. So by 
the chain rule, 
\begin{align*}
(d(\pi_E \times h(\phi)))_{\bar{Y}(p)}((d\bar{Y})_p(V)) &= \bigl(\mathrm{id}_{T_p(M)} \times ((dY)_{z_0(p)} \circ (dz_0)_p)\bigr)(V) \\
&= \bigl(V, (dY)_{z_0(p)}((dz_0)_p(V))\bigr).
\end{align*}
Therefore 
\begin{align*}
\partial_V\bar{Y} &= (d\bar{Y})_p(V) \\
&= (d(\pi_E \times h(\phi)))_{\bar{Y}(p)}^{-1}\bigl(V, (dY)_{z_0(p)}((dz_0)_p(V))\bigr).
\end{align*}


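Let $y = \bar{Y}(p)$. Then 

\begin{align*}
\theta_V(\bar{Y}(p)) &= \theta_V([(z_0(p), Y(z_0(p)))]) \\
&= (d(\pi_E \times h(\phi)))_y^{-1}\bigl(V, X^F_{\hat{u}_\theta(V, h(\phi))}(h(\phi)(y))\bigr) \tag{69.10.5} \\
&= (d(\pi_E \times h(\phi)))_y^{-1}\bigl(V, X^F_{\hat{u}_\beta(V, \phi)}(Y(z_0(p)))\bigr) \tag{69.10.6} \\
&= (d(\pi_E \times h(\phi)))_y^{-1}\bigl(V, (dR^F_{Y(z_0(p))})_e(\hat{u}_\beta(V, \phi))\bigr),
\end{align*}

where line (69.10.5) follows from Theorem 67.11.6 (i), and line (69.10.6) follows from Theorem 67.12.5. So 

\begin{align*}
\partial_V\bar{Y} - \theta_V(\bar{Y}(p)) &= (d(\pi_E \times h(\phi)))_{\bar{Y}(p)}^{-1}\bigl(0, (dY)_{z_0(p)}((dz_0)_p(V)) - (dR^F_{Y(z_0(p))})_e(\hat{u}_\beta(V, \phi))\bigr) \\
&= \eta\bigl((d(h(\phi)|_{E_p}))_{\bar{Y}(p)}^{-1}\bigl((dY)_{z_0(p)}((dz_0)_p(V)) - (dR^F_{Y(z_0(p))})_e(\hat{u}_\beta(V, \phi))\bigr)\bigr)
\end{align*}

by Theorem 58.7.5 line (58.7.5). (See Definition 54.6.2 for the fibre set submanifold tangent vector embedding 
map $\eta : T(E_p) \to T(E)$.) Therefore by Definition 65.3.5, 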


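\begin{align*}
\varpi^F(\partial_V\bar{Y} - \theta_V(\bar{Y}(p))) &= h(\phi)|_{E_p}^{-1}\bigl(\varpi^F\bigl((dh(\phi))_{\bar{Y}(p)}(\partial_V\bar{Y} - \theta_V(\bar{Y}(p)))\bigr)\bigr) \\
&= h(\phi)|_{E_p}^{-1}\bigl(\varpi^F\bigl((dY)_{z_0(p)}((dz_0)_p(V)) - (dR^F_{Y(z_0(p))})_e(\hat{u}_\beta(V, \phi))\bigr)\bigr).
\end{align*}

But $Y \in X^1((P \times F)/G)$ implies $R^F_{Y(z_0(p))}(g) = gY(z_0(p)) = Y(z_0(p)g^{-1})$ for all $g \in G$, where $z_0(p)g^{-1} = 
\phi|_{P_p}^{-1}(e)g^{-1} = \phi|_{P_p}^{-1}(g^{-1})$ by Theorem 21.11.7 (xiii). So $R^F_{Y(z_0(p))}(g) = Y(\phi|_{P_p}^{-1}(g^{-1}))$ for all $g \in G$. Define 
$j : G \to G$ by $j : g \mapsto g^{-1}$. Then $R^F_{Y(z_0(p))} = Y \circ \phi|_{P_p}^{-1} \circ j$. Therefore by the chain rule and Theorem 62.2.9, 
$$(dR^F_{Y(z_0(p))})_e = (dY)_{z_0(p)} \circ (d(\phi|_{P_p}^{-1}))_e \circ (dj)_e = (dY)_{z_0(p)} \circ (d(\phi|_{P_p}^{-1}))_e \circ (-\mathrm{id}_{T_e(G)}) = -(dY)_{z_0(p)} \circ (d(\phi|_{P_p}^{-1}))_e.$$
So 
\begin{align*}
\varpi^F(\partial_V\bar{Y} - \theta_V(\bar{Y}(p))) &= h(\phi)|_{E_p}^{-1}\bigl(\varpi^F\bigl((dY)_{z_0(p)}\bigl((dz_0)_p(V) + (d(\phi|_{P_p}^{-1}))_e(\hat{u}_\beta(V, \phi))\bigr)\bigr)\bigr) \\
&= [(z_0(p), \varpi^F((dY)_{z_0(p)}(\beta_V(z_0(p)))))].
\end{align*}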


This verifies line (69.10.1) for the special case $z = z_0(p)$. For general $z \in P_p$, there exists a linear map 
$g \in G = GL(F)$ such that $z_0(p) = zg$ by Theorem 21.11.7 (xi). Then 

\begin{align*}
\varpi^F(\partial_V\bar{Y} - \theta_V(\bar{Y}(p))) &= [(zg, \varpi^F((dY)_{zg}(\beta_V(zg))))] \\
&= [(z, g\varpi^F((dY)_{zg}((dR^P_g)_z(\beta_V(z)))))]
\end{align*}

by Definition 69.1.3 (v′) and Theorem 47.11.10 (iii). But $L_g \circ \varpi^F = \varpi^F \circ (L_g)_*$ by Theorem 58.4.16, 
and $L_g \circ Y \circ R^P_g = Y$ because $L_g(Y(R^P_g(z'))) = gY(z'g) = Y(z')$ for all $z' \in P$ by the definition of 
$X^1((P \times F)/G)$. Therefore 

\begin{align*}
\varpi^F(\partial_V\bar{Y} - \theta_V(\bar{Y}(p))) &= [(z, \varpi^F((dL_g)_{Y(zg)}((dY)_{zg}((dR^P_g)_z(\beta_V(z))))))] \\
&= [(z, \varpi^F((d(L_g \circ Y \circ R^P_g))_z(\beta_V(z))))] \\
&= [(z, \varpi^F((dY)_z(\beta_V(z))))].
\end{align*}

This confirms line (69.10.1) for all $z \in P_p$. Then line (69.10.2) follows from Theorem 68.2.13 (i). Line (69.10.3) 
follows from line (69.10.1) by Definition 68.2.9. 


Line (69.10.4) follows from line (69.10.3) and Definition 69.10.2. 


— 
69.10.5 REMARK: The significance of the associated connection matching rule validation theorem. 
Theorem 69.10.4 tests the associated connection defined by Daniel/Viallet [317], page 186, subsection II.H.1, 
for consistency with the associated connection matching rule in Definition 67.12.3, which states that 
associated connections must have the same connection generator function, irrespective of whether they are 
ordinary bundles or principal bundles. 


It could be argued that the connection generator of an ordinary bundle should be the negative of the 
connection generator for an associated principal bundle. According to this argument, principal bundles 
represent reference frames, whereas ordinary bundles represent the objects which are observed via these 
reference frames. If you are sitting in a train which moves to the East, a stationary train on a nearby track 
seems to be moving to the West. If one rotates a camera to the right, the objects in the image seem to be 
rotated to the left. (This kind of issue is also discussed in Example 20.10.6.) 


The camera-versus-object argument cannot be applied in this way because although ordinary bundles do 
represent objects under observation, the outputs from fibre charts represent measurements of objects under 
observation. A connection generator $\hat{u}_\theta(V, h(\phi)) = \hat{u}_\beta(V, \phi)$ defines the rate of change of measurements in a 
fibre space $F$ as the base point moves with velocity $V$. A measurement is an output from the observation of 
an object by an observer via a particular reference frame. In the $(z, f)$ pairs in Theorem 69.10.4, $z$ represents 
a reference frame and $h(\phi)[(z, f)] = \phi(z)f$ represents the measurement of object state $[(z, f)]$ for reference 
frame $z$ via a PFB chart $\phi$ and associated OFB chart $h(\phi)$. The equivalence class (or orbit) $[(z, f)]$ represents 
an object under observation. The transformation $g$ applied to $z$ in $[(z, f)]$ gives $[(zg, f)]$, whereas the same 
transformation $g$ applied to $f$ in $[(z, f)]$ gives $[(z, gf)]$, which oddly enough, is the same orbit or object. (See 
Theorem 47.11.10 (iii).) 


A real-world example would be an ant sitting on a ruler at the 10cm notch. If the ruler is moved 1cm to 
the right, the ant (which is the real object whose location is being measured) moves 1 cm to the right. Thus 
moving the reference frame z to the right, with a fixed measurement f = 10cm moves the ant 1cm to the 
right. If instead the measurement of the ant’s location increases from 10cm to 11 cm because it walks, the 
ant likewise moves 1cm to the right. In other words, increasing either z or f gives the same movement of 
the real object under observation, which is the ant. 


It is for this reason that the PFB connection generator $\hat{u}_\beta$ and the OFB connection generator $\hat{u}_\theta$ must be 
the same for associated charts. The generator $\hat{u}_\beta$ transforms the reference frame, whereas the generator $\hat{u}_\theta$ 
transforms measurements made relative to a reference frame. They represent the same parallel transport. 

As another example, consider a turntable on top of a train which moves from West to East along the 30°N 
latitude curve (which is not a geodesic). For parallel motion, the turntable must rotate 1° clockwise relative 
to the train (viewed from above) for each 2° of longitude travelled. But if the turntable is locked to the top 
of the train, an ant on top of the turntable must rotate 1° clockwise relative to the turntable for each 2° of 
longitude travelled. The same parallel motion of the ant is achieved, whether it is the reference frame (the 
turntable) which rotates with the ant fixed to the turntable, or the ant rotates relative to a fixed turntable. 
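
The figure of 1° per 2° of longitude can be checked with a one-line computation. The sketch below assumes the standard result for the Levi-Civita connection on a round sphere, namely that a parallel-transported vector turns relative to the eastward direction at a rate of sin(latitude) degrees per degree of longitude. This result is not derived in this remark, and the function name is a hypothetical helper. 

```python
import math

def rotation_per_degree_of_longitude(latitude_deg):
    """Rotation (in degrees) of a parallel-transported vector relative to the
    eastward direction, per degree of longitude travelled along a latitude
    circle of a round sphere. Assumes the standard sin(latitude) rate."""
    return math.sin(math.radians(latitude_deg))

rate_30n = rotation_per_degree_of_longitude(30.0)
rotation_for_two_degrees = 2.0 * rate_30n   # close to 1 degree at 30 degrees North
rate_equator = rotation_per_degree_of_longitude(0.0)
```

At the equator, which is a geodesic, the rate vanishes, so no rotation of the turntable is needed there. 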


69.10.6 THEOREM: Formula for covariant derivative in terms of the connection form. 

Let $(G, F)$ be the $C^\infty$ Lie left transformation group with $G = GL(F)$ over a finite-dimensional real linear 
space $F$. Let $(P, \pi_P, M, A_P^G)$ be a $C^1$ principal $G$-bundle. Let $(E, \pi_E, M, A_E^F)$ be the orbit-space associated 
$(G, F)$ fibre bundle for $P$. Let $\theta$ be the connection on the vector bundle $E = (P \times F)/G$ which is associated 
with a connection $\beta$ on $P$. Then 

$$\forall Y \in X^1((P \times F)/G),\ \forall X \in X^1_{\mathrm{loc}}(P, \pi_P, M),\ \forall p \in \mathrm{Dom}(X),\ \forall V \in T_p(M),$$
$$(dY)_{X(p)}(\beta_V(X(p))) = \partial_V(Y \circ X) + (dR_{Y(X(p))})_e(\omega(X_*(V))). \tag{69.10.7}$$
PROOF: Let $z = X(p)$. Then by Theorem 69.6.6, $\beta_V(z) = \partial_V X - (dL^P_z)_e(\omega_z(X_*(V)))$. So $(dY)_z(\beta_V(z)) = 
\partial_V(Y \circ X) - (d(Y \circ L^P_z))_e(\omega_z(X_*(V)))$. But $(Y \circ L^P_z)(g) = Y(zg) = g^{-1}Y(z)$ for all $g \in G$. Therefore 
$Y \circ L^P_z = R_{Y(z)} \circ j$, where $j : G \to G$ is defined by $j : g \mapsto g^{-1}$. So by Theorem 62.2.9, 
\begin{align*}
(d(Y \circ L^P_z))_e &= (d(R_{Y(z)} \circ j))_e \\
&= (dR_{Y(z)})_e \circ (dj)_e \\
&= (dR_{Y(z)})_e \circ (-\mathrm{id}_{T_e(G)}) \\
&= -(dR_{Y(z)})_e.
\end{align*}
Hence $(dY)_{X(p)}(\beta_V(X(p))) = \partial_V(Y \circ X) + (dR_{Y(X(p))})_e(\omega(X_*(V)))$. 


69.10.7 REMARK: The relation between contravariant function connections and connection forms. 
Line (69.10.7) may be written in terms of $A^\omega_X = \omega \circ X_*$ in Definition 69.11.3 with $z = X(p)$ as 

$$(dY)_z(\beta_V(z)) = \partial_V(Y \circ X) + (dR_{Y(z)})_e(A^\omega_X(V)). \tag{69.10.8}$$

The opposite sign for the covariant derivative of a short-cut associated vector bundle cross-section $Y$, relative 
to the gauge covariant derivative formula for principal bundles in Theorem 69.14.6 line (69.14.2), is consistent 
with the contravariant relation between PFB cross-sections and OFB cross-sections. The term $A^\omega_X(V)$ has 
the same sign as $\omega$, which is a covariant derivative form, not a connection form. 
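
The sign in line (69.10.7) comes from the differential of group inversion at the identity. The following Python sketch checks this numerically under illustrative assumptions: frames are invertible 2x2 matrices, G = GL(2, R), and Y(z) is the component tuple of a fixed vector y relative to the frame z. The finite-difference derivative of g -> Y(zg) at the identity in the direction u is compared with -uY(z). All helper names are hypothetical. 

```python
# Illustration only: finite-difference check that the derivative of
# g -> Y(zg) at the identity in direction u equals -u Y(z),
# where Y(z) = z^{-1} y satisfies Y(zg) = g^{-1} Y(z).

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(a, v):
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def components(z, y):
    """Y(z): components of the fixed vector y relative to the frame z."""
    return mat_vec(mat_inv(z), y)

def derivative_along_fibre(z, y, u, t=1e-6):
    """Finite-difference derivative of g -> Y(zg) at g = e in the direction u."""
    g_t = [[1.0 + t * u[0][0], t * u[0][1]],
           [t * u[1][0], 1.0 + t * u[1][1]]]
    shifted = components(mat_mul(z, g_t), y)
    base = components(z, y)
    return [(shifted[i] - base[i]) / t for i in range(2)]

z = [[2.0, 1.0], [0.0, 1.0]]
y = [3.0, 4.0]
u = [[0.0, -1.0], [1.0, 0.0]]

numerical = derivative_along_fibre(z, y, u)
expected = [-c for c in mat_vec(u, components(z, y))]   # -u Y(z)
```

The two results agree to within the finite-difference error, which illustrates why the vertical term enters line (69.10.7) with a plus sign after the two minus signs cancel. 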


A suitable abbreviation for the associated vector bundle covariant derivative expression on line (69.10.8) 
would be “$\partial_\mu + A_\mu$”, corresponding to the abbreviation “$\partial_\mu - A_\mu$” for principal bundle covariant derivatives 
in Remark 69.14.7. Both of these forms appear in the gauge theory literature, although it is sometimes not 
clear whether PFB or OFB covariant derivatives are intended, and often the $A_\mu$ term is replaced by various 
expressions such as “$-iqA_\mu$”, with various signs and constants in place of the charge or coupling parameter $q$. 
This diversity of formulas is a consequence of the fact that the structure group is typically SU(n) for n = 1, 
2 or 3, and various styles of bases for its complex Lie algebra are employed. 


((2020-2-8. Make a list of covariant derivative formulas like $\partial_\mu + A_\mu$ for PFBs and associated OFBs in the 
literature, showing how they relate to connection forms on principal bundles. This has already been started 
in Remark 69.14.8. )) 


((2019-4-9. Show that the methods in the literature for defining associated connections in terms of horizontal 
subspaces for orbit-space associated fibre bundles are consistent with Theorems 69.9.3, 69.9.4 and 69.9.6, 


and Definition 67.12.3. See for example Drechsler/Mayer [262], pages 92-93; Sternberg [38], pages 335—336. )) 


((2019-4-1. Perhaps show how to define associated connections for the patchwork construction. But this is 
possibly too tedious, and is probably not useful for anything. 
2019-11-28. Probably it's better to just write a remark about how associated connections for patchwork 
associated fibre bundles can be constructed, and why it doesn't really matter. )) 


((2019-7-20. Show the relation of the Cartan-style connection forms for vector bundles in Definition 68.3.8 to 
connections on the associated frame bundles. Probably should also show how to construct these connection 
forms from connection generators. )) 


69.11. Gauge potential localisations of connection forms 


69.11.1 REMARK: Gauge potentials are connection form localisations via pull-backs of cross-sections. 

In physics terminology, Sections 69.11 and 69.12 are concerned with “gauge potentials”. In mathematical 
terminology, they are concerned with “localisations of connection forms via pull-backs of local cross-sections 
of principal bundles”. The physics terminology certainly has the advantage of brevity. 


The physicist represents connections as gauge potentials, which satisfy certain gauge invariance rules, and a 
connection form on a PFB may be constructed from these if such a level of abstraction is desired. For the 
mathematician, the connection form is the starting point, from which gauge potentials may be constructed 
as in Definition 69.11.3. Many physics texts do not present PFB connection forms at all because such 
abstraction is not directly helpful for the practicalities of quantum field theory. 


It is understandable that for particle physics, the connection forms, covariant derivatives, Lagrangians and 
equations of motion are all pulled back from abstract principal bundles, where mathematicians would define 
them, to the space-time base manifold where physical intuition is more direct. This pull-back is achieved by 
means of local cross-sections of the principal bundle. In the case of principal frame bundles, as in Section 55.7, 
such a cross-section is a Cartan-style “moving frame”, and the connection form then induces differentials 
of that frame for each velocity vector in the base manifold. The literature on quantum field theory makes 
extensive use of this formalism, using “gauge potentials” instead of Ehresmann-style connection forms on 
principal bundles. However, the advantage of concreteness is combined with the disadvantage of dependence 
on the “choice of gauge”, which means the choice of local cross-section of the principal bundle. 


The gauge potentials of particle physics were shown in 1977 by Drechsler/Mayer [262] to be localisations of 
connection forms via pull-backs of local cross-sections of principal bundles. (See Definition 69.11.3.) Gauge 
potentials are consequently yet another equi-informational way to represent connections on fibre bundles. 


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




Gauge potentials are local in general because only a trivial principal bundle has a global cross-section. In 
the case of non-trivial principal bundles, a global connection on the bundle can only be recovered from local 
gauge potentials if the entire base manifold is covered by them. The consistency requirement for gauge 
potentials on their overlaps is known as the “gauge transformation” rule. Any set of gauge potentials which 
cover the base space, and are consistent on overlaps according to this rule, specifies a unique connection on 
the principal bundle by inverting the pull-back. (See Theorems 69.12.5 and 69.12.7.) 


69.11.2 REMARK: Projecting connection forms to the base-point manifold via cross-sections. 

Connection forms may be defined on general differentiable principal bundles in the Ehresmann style, or they 
can be defined as Lie-algebra-valued connection forms corresponding to given cross-sections of the principal 
bundle in the Cartan style. (See Remark 69.5.1 for literature references for both styles.) 


The purpose of Definition 69.11.3 is to attempt to construct the Cartan style from the Ehresmann style for 
general differentiable principal bundles, based on the pattern of Definition 71.14.2 for connection forms on 
principal frame bundles for the basic tangent bundle. (For applications of this generalisation, see for example 
Daniel/Viallet [317], page 183; Eriksson/Haggblad/Strömbom [264], page 53.) 


The Cartan-style connection forms for moving frames in Definition 71.14.2 are constructed as projections via 
cross-sections of the principal frame bundle, which is quite specific to affine connections on tangent bundles. 
The principal frame bundle of the base-point manifold is a principal bundle by Theorem 55.7.11. So for every 
choice of principal bundle cross-section (or “moving frame”), there is a connection form localisation which is 
defined on the base space. But in the case of general principal bundles, the principal bundle cross-sections 
are totally unrelated to the “moving frames”. Principal bundle cross-sections provide a substitute for the 
moving frames in the general case. 


Connection forms can be constructed for local cross-sections of a principal bundle from a given horizontal 
lift function $\theta$ on an ordinary fibre bundle $E$ as follows. 

(1) Convert $\theta$ to a horizontal lift function $\beta$ for a principal bundle $P$ associated with $E$. 

(2) Transpose $\beta$ to make a transposed horizontal lift function $\beta$ for $P$. 

(3) From $\beta$, construct a connection form $\omega$ on $P$ as in Definition 69.5.4. 

(4) “Localise” the connection form via local cross-sections of the principal bundle as in Definition 69.11.3. 


The motivation for this style of “localisation” comes from gauge theory in particle physics. 
The space $X^1_{\mathrm{loc}}(P, \pi, M)$ mentioned in Definition 69.11.3 is the set of $C^1$ local cross-sections of $P$ on open 
subsets of $M$ according to the pattern of Notation 47.4.3. Similarly, $X_{\mathrm{loc}}(\Lambda_1(T(M), T_e(G)))$ is the set of local 
short-cut $T_e(G)$-valued 1-forms on open subsets of $M$. (Definition 69.11.3 is illustrated in Figure 69.11.1.) 


[Diagram relating $V \in T_p(M)$, $X_*(V)$ and $A^\omega_X(V) = \omega(X_*(V))$ over $p \in M$ and $X(p) \in P$.]
Figure 69.11.1 Connection form localisation map via local cross-sections


The “short-cut version” of cross-sections of the covariant tensor bundle $\Lambda_1(T(M),T_e(G))$ is assumed here, as
discussed in Remark 57.7.1. Without the short-cut, it would be necessary to write $A^\omega_X(p)(V) = \omega(p)(X_*(V))$
in place of line (69.11.2). The short-cut makes the formula $A^\omega_X = \omega \circ X_*$ meaningful. The equation on
line (69.11.1) would be $\mathrm{Dom}(A^\omega_X) = \mathrm{Dom}(X)$ if the short-cut was not used. Here it is written as
$\mathrm{Dom}(A^\omega_X) = T(\mathrm{Dom}(X))$, but strictly speaking, $T_p(M)$ and $T_p(\mathrm{Dom}(X))$ are not the same space for
$p \in \mathrm{Dom}(X)$. This subtlety is ignored in Definition 69.11.3 because it is an excessively pedantic point.
69.11.3 DEFINITION: Gauge potentials constructed from connection forms.
The connection form localisation map via (local) cross-sections, for a connection form $\omega$ on a $C^1$ principal
$G$-bundle $(P,\pi,M,A^G_P)$, is the map $A^\omega : X^1_{\mathrm{loc}}(P,\pi,M) \to X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ defined by

$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \mathrm{Dom}(A^\omega_X) = T(\mathrm{Dom}(X))$, (69.11.1)
and
$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall p \in \mathrm{Dom}(X),\ \forall V \in T_p(M),\ A^\omega_X(V) = \omega(X_*(V))$. (69.11.2)

In other words, $\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ A^\omega_X = \omega \circ X_* = X^*(\omega)$.

The connection form localisation via a (local) cross-section $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ of a connection form $\omega$ on
$(P,\pi,M,A^G_P)$ is the function $A^\omega_X = \omega \circ X_* \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$.
Alternative name: the localisation of the connection form $\omega$ via the cross-section $X$.
Alternative name: the gauge potential for a given connection form $\omega$, for a given gauge $X$.


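69.11.4 EXAMPLE: Gauge potential for a trivial principal $(\mathbb{R},+)$-bundle on $\mathbb{R}^4$.

Let $\beta$ be the horizontal lift function on the principal bundle $(P,\pi,M,A^G_P)$ in Example 69.1.9, where $M = \mathbb{R}^4$,
$G = \mathbb{R} \mathrel{-\!\!<} (\mathbb{R},+)$, $P = M \times G$, and $\pi : P \to M$ with $\pi : (p,g) \mapsto p$. Then
$\beta_V(z) = t_{(p,g),(v,f(z,v)),\mathrm{id}_P}$ for $p \in M$, $V = t_{p,v,\mathrm{id}_M} \in T_p(M)$ and $z = (p,g) \in P_p$, where
$f : \mathbb{R}^4 \times \mathbb{R} \times \mathbb{R}^4 \to \mathbb{R}$ has the form $f(p,g,v) = \sum_{i=0}^{3} a_i(p) v^i$ for all $z = (p,g) \in P$ and $v \in \mathbb{R}^4$,
where $a = (a_i)_{i=0}^{3} : M \to \mathbb{R}^4$ is the tuple of coefficients of the linear dependence of $f$ on $v$.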


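It is shown in Example 69.5.6 that the connection form $\omega$ on $P$ which corresponds to $\beta$ satisfies
$\forall z \in P,\ \forall (v,w) \in \mathbb{R}^4 \times \mathbb{R},\ \omega_z(t_{z,(v,w),\mathrm{id}_P}) = t_{0_{\mathbb{R}},\,w - f(z,v),\,\mathrm{id}_{\mathbb{R}}}$.

This fully defines $\omega$ because for all $z \in P$, every $y \in T_z(P)$ may be expressed as $y = t_{z,(v,w),\mathrm{id}_P}$ for some
unique $w \in \mathbb{R}$, where $v \in \mathbb{R}^4$ is defined by $t_{\pi(z),v,\mathrm{id}_M} = \pi_*(y)$.
Define $X : M \to P$ by $X(p) = (p, 0_{\mathbb{R}})$ for all $p \in M$. Then $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ by Notation 64.7.3, and its
differential $X_* : T(M) \to T(P)$ satisfies $X_*(t_{p,v,\mathrm{id}_M}) = t_{(p,0),(v,0),\mathrm{id}_P}$ for all $p \in M$ and $v \in \mathbb{R}^4$. Therefore

$\forall p \in M,\ \forall v \in \mathbb{R}^4,$
$A^\omega_X(t_{p,v,\mathrm{id}_M}) = \omega(X_*(t_{p,v,\mathrm{id}_M}))$
$= \omega_{(p,0)}(t_{(p,0),(v,0),\mathrm{id}_P})$
$= t_{0_{\mathbb{R}},\,-f(p,0,v),\,\mathrm{id}_{\mathbb{R}}}$
$= -f(p,0,v)\, t_{0_{\mathbb{R}},1,\mathrm{id}_{\mathbb{R}}}$.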


The minus sign arises from the fact that connection forms are “difference-type” connections, as summarised 
in Remark 67.2.1 and Table 67.2.1. 


In this very simple scenario, the gauge potential $t_{0_{\mathbb{R}},\,-f(p,0,v),\,\mathrm{id}_{\mathbb{R}}} \in T_0(\mathbb{R})$ for a vector $t_{p,v,\mathrm{id}_M} \in T_p(M)$
has the “velocity” parameter $-f(p,0,v) = -\sum_{i=0}^{3} a_i(p) v^i$ for some coefficient tuple $(a_i(p))_{i=0}^{3} \in \mathbb{R}^4$ at
each $p \in M$. Thus it is a Lie-algebra-valued 1-form on the base space, as required by Definition 69.11.3.
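
The computation in Example 69.11.4 can be mimicked numerically. The following Python sketch is only an illustration under stated assumptions: a tangent vector to P is represented by its coefficient pair (v, w), an element of the Lie algebra of (R, +) by its velocity parameter, and the coefficient tuple a(p) is a made-up example which does not come from the text. All function names are hypothetical.

```python
# Sketch only. The coefficient functions in a(p) are an arbitrary illustrative
# choice, not taken from the text. Tangent vectors on P = R^4 x R are modelled
# by coefficient pairs (v, w); elements of T_0(R) by their velocity parameter.

def a(p):
    """Hypothetical coefficient tuple (a_0(p), ..., a_3(p)) at a point p of R^4."""
    return (p[0], 2.0 * p[1], -p[2], 1.0 + p[3])

def f(p, g, v):
    """f(p, g, v) = sum_i a_i(p) v^i, which is independent of the fibre coordinate g."""
    return sum(ai * vi for ai, vi in zip(a(p), v))

def omega(z, v, w):
    """Connection form: the vector (v, w) at z is mapped to the parameter w - f(z, v)."""
    p, g = z
    return w - f(p, g, v)

def push_forward(p, v):
    """Differential of the zero cross-section X(p) = (p, 0): v is sent to (v, 0) at (p, 0)."""
    return (p, 0.0), v, 0.0

def gauge_potential(p, v):
    """Velocity parameter of omega(X_*(V)) for the base-space vector V with coefficients v."""
    z, v_lift, w_lift = push_forward(p, v)
    return omega(z, v_lift, w_lift)
```

For any such choice of coefficients, gauge_potential(p, v) returns minus the sum of a_i(p) v^i, and it is linear in v, in agreement with the last line of the example.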


69.11.5 REMARK: Terminology: “gauge potential” versus “gauge field”. Avoid “gauge field”. 

The terms “gauge potential” and “gauge field” are both used by a large proportion of authors for the Lie- 
algebra-valued functions representing bosonic radiation fields in gauge theory, corresponding to connection 
forms in differential geometry. However, some authors use the term “gauge field” to mean the “field strength” 
field, which is mathematically the curvature of the gauge potential. (See for example Daniel/Viallet [317], 
page 185.) Therefore the term “gauge potential” is preferred here for various representations of the connection 
form, and the term “field strength” is used for its curvature. The term “gauge field” is mostly avoided because 
it has contradictory meanings in the literature. 


The use of the word “potential” for a gauge potential arises from the potential energy function of a charged 
particle, which can be differentiated to compute the field strength. It was the Aharonov-Bohm effect, 
predicted/discovered in 1959/1960, which showed that the electromagnetic potential has an observable effect 
in quantum mechanics. (See Aharonov/Bohm [314]; Chambers [316].) In other words, the EM potential was
shown to be more than just a convenient mathematical construct. 


The word “field” in the EM context tends to suggest the idea of the electric force field (which is a vector 
field) rather than its potential energy function (which is a scalar function). So it is preferable to not use 
the term “gauge field” for a connection in gauge theory, although numerous authors do use it, and in fact 
it is mathematically a Lie-algebra-valued vector field. (But as physics texts always note, the transformation 
rules for its localisation to the base space are different to those of ordinary vector fields. Hence the term 
“gauge transformation” for their unusual transformation rules.) 


69.11.6 REMARK: Interpretation of connection forms for cross-sections of principal bundles. 

The style of connection form in Definition 69.11.3 uses a local cross-section $X$ of a principal bundle $P$ to
generate vectors $X_*(V)$ in the tangent bundle $T(P)$ to which the style of connection form $\omega : T(P) \to T_e(G)$
in Definition 69.5.4 can be applied.


For example, if the $C^\infty$ manifold $M$ is the embedded 2-sphere $S^2 \subseteq \mathbb{R}^3$, and $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ is a
local cross-section of a $C^1$ principal $G$-bundle $(P,\pi,M,A^G_P)$ as in Definition 66.1.2, where $G = SO(3)$ and
$U = \mathrm{Dom}(X) \in \mathrm{Top}(M)$, then $X$ is a $C^1$ map from $U$ to $P$ with $X(p) \in P_p = \pi^{-1}(\{p\})$ for all $p \in U$.
If $p$ is varied with velocity $V \in T_p(M)$, then $X(p)$ varies with velocity $(dX)_p(V) = X_*(V) \in T_{X(p)}(P)$. Then
$\omega$ maps $X_*(V)$ to an element of $T_e(G)$, the Lie algebra of $G$. The Lie algebra element $\omega(X_*(V))$ is a kind
of “correction signal” to indicate how unparallel the transport of the cross-section $X$ is in the direction $V$.
(It could be thought of as a kind of gyroscope signal which is provided to a servo-motor to maintain correct
orientation.) Thus if $\omega(X_*(V)) = 0$, this would indicate that if a $C^1$ curve $\gamma$ in the direction $V = \gamma'(0)$ at
$p = \gamma(0)$ adopts the value $X(\gamma(t))$ for each $t \in \mathrm{Dom}(\gamma)$, then the differential of $X(\gamma(t))$ at $t = 0$ would be a
horizontal vector $X_*(\gamma'(0))$ in the linear space $T_{X(p)}(P)$.

If $\omega(X_*(V)) \ne 0$, then the Lie algebra element $\omega(X_*(V))$ indicates how the value $X(p)$ deviates from parallel
motion. As mentioned in Remark 69.5.2, the connection form is really a “covariant derivative form”, which
indicates deviation from parallelism, whereas a horizontal lift function indicates the correct rate of change
of state to achieve parallel motion.


In summary, a connection form for a local cross-section yields a Lie-algebra-valued 1-form on the base-point 
manifold which indicates the deviation from parallel motion of the given cross-section in any given direction. 


69.11.7 REMARK: Similarities between connection form localisation and cross-section localisation. 

The localisation of connection forms in Definition 69.11.3 has some similarities to the localisation of
cross-sections in Definition 21.6.2. However, in the latter case, cross-sections $X$ are localised via fibre charts $\phi$ to
construct functions $\phi \circ X$, whereas in the former case, connection forms $\omega$ are localised by cross-sections $X$
to construct functions $\omega \circ X_*$. On principal bundles, there is a one-to-one correspondence between
cross-sections and fibre charts, as shown in Theorem 21.10.14. So localisation via a cross-section is effectively
equivalent to localisation via a fibre chart. Thus $X$ in the expression $\omega \circ X_*$ can be replaced with the
identity cross-section $X_\phi$ for some fibre chart $\phi$. But a significant remaining difference is that the domain of
$\omega$ is the tangent space $T(P)$, not the total space $P$, and the target space is $T_e(G)$, not $G$.


69.12. Gauge potential globalisation 


69.12.1 REMARK: Localisation and globalisation of connection forms. 

The pull-back $A^\omega_X$ of a connection form $\omega : T(P) \to T_e(G)$ via a local cross-section $X \in X(P,\pi,M\,|\,U)$ is
sometimes called “localisation” of the connection form because it is defined only on an open subset $U$ of
the base-space $M$, and also because it is dependent on an arbitrary choice of cross-section to express the
connection form in “coordinates”. (See also Remark 21.6.3 for comments on the term “localisation”.)


The Lie algebra $T_e(G)$ is a fixed global space for the whole principal bundle, whereas the fibre sets $P_p$ are
specific to points $p \in M$. But the cost of expressing a connection form in terms of a Lie-algebra-valued
function of base points is that it is non-global and cross-section-dependent. In gauge theory, it is usual to
commence with local versions of connections, then write consistency conditions for “gauge transformations”
for converting between arbitrary cross-section choices, and then construct a global connection form on the
principal bundle from the local pull-backs. In other words, the global form is constructed from local forms,
whereas in pure mathematics, one would generally construct local forms from global forms. For the
local-to-global construction, one needs to know the consistency conditions which make “globalisation” possible.
69.12.2 REMARK: Undoing the localisation of connection forms via cross-sections.

Theorem 69.12.3 gives a formula for reconstructing the original connection form $\omega \in X(\Lambda_1(T(P),T_e(G)))$ on
a principal $G$-bundle $(P,\pi,M,A^G_P)$ from localisations $A^\omega_X \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ which are constructed
via cross-sections $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ according to Definition 69.11.3. It is necessary to know both a
localisation $A^\omega_X$ and the cross-section $X$ which was used in the construction of $A^\omega_X$ in order to reconstruct $\omega$.


Although line (69.12.1) reconstructs $\omega$ only on the subset $\pi^{-1}(\mathrm{Dom}(X))$ of $P$, the entire connection form
can be reconstructed from cross-sections whose domains cover the base space $M$.


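69.12.3 THEOREM: Reconstruction of a connection form from localisations via cross-sections.
Let $\omega$ be a connection form on a $C^1$ principal $G$-bundle $(P,\pi,M,A^G_P)$. Then

$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall z \in \pi^{-1}(\mathrm{Dom}(X)),\ \forall y \in T_z(P),$
$\omega(y) = \mathrm{Adj}(\phi_X(z)^{-1})(A^\omega_X((d\pi)_z(y))) + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)}((d\phi_X)_z(y))$. (69.12.1)
(See Definition 21.10.7 for $\phi_X : \pi^{-1}(\mathrm{Dom}(X)) \to G$.) In other words,
$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall z \in \pi^{-1}(\mathrm{Dom}(X)),$
$\omega_z = \mathrm{Adj}(\phi_X(z)^{-1}) \circ A^\omega_X \circ (d\pi)_z + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z$.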


PROOF: Let $X \in X^1_{\mathrm{loc}}(P,\pi,M)$. Let $U_X = \mathrm{Dom}(X)$. Define $\alpha \in X(\Lambda_1(T(P),T_e(G))\,|\,\pi^{-1}(U_X))$ by

$\forall z \in \pi^{-1}(U_X),\ \forall y \in T_z(P),$
$\alpha(y) = \mathrm{Adj}(\phi_X(z)^{-1})(A^\omega_X((d\pi)_z(y))) + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)}((d\phi_X)_z(y))$. (69.12.2)

(This uses the “multilinear form short-cut” for $\alpha$ as in Remark 57.7.1. In other words, $\alpha(y)$ is an abbreviation
for $\alpha(z)(y)$, which is the same as $\alpha_z(y)$.) Let $p \in U_X$. Let $z = X(p) \in P$. Then $\phi_X(z) = e \in G$ by
Definition 21.10.7. Let $y \in T_z(P)$. Let $y_h = X_*(\pi_*(y)) = (dX)_p((d\pi)_z(y))$. Let $y_v = y - y_h$. Then

$(d\pi)_z(y_h) = (d\pi)_z((dX)_p((d\pi)_z(y)))$
$= (d(\pi \circ X))_p((d\pi)_z(y))$
$= (d\,\mathrm{id}_{U_X})_p((d\pi)_z(y))$
$= \mathrm{id}_{T_p(M)}((d\pi)_z(y))$
$= (d\pi)_z(y)$.

Also, $(d\phi_X)_z(y_h) = (d\phi_X)_z((dX)_p((d\pi)_z(y))) = (d(\phi_X \circ X))_p((d\pi)_z(y)) = (de)_p((d\pi)_z(y)) = 0$ because
$\phi_X \circ X$ is constant since $\phi_X(X(p)) = e$ for all $p \in \mathrm{Dom}(X)$ by Theorem 21.10.8 (iv). From $\phi_X(z)^{-1} = e$,
it then follows that $\alpha(y) = A^\omega_X((d\pi)_z(y_h)) + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)}((d\phi_X)_z(y_v))$. But $y_v \in T_{z,0}(P)$ because
$(d\pi)_z(y_v) = (d\pi)_z(y) - (d\pi)_z(y_h) = 0$. So $(dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)}((d\phi_X)_z(y_v)) = \lambda_z^{-1}(y_v)$ by Theorem 66.5.10 (ii).
But $\lambda_z^{-1} = \omega_z|_{T_{z,0}(P)}$ by Theorem 69.5.15. So $\alpha(y) = A^\omega_X((d\pi)_z(y_h)) + \omega_z(y_v)$. However,
$A^\omega_X((d\pi)_z(y_h)) = A^\omega_X((d\pi)_z((dX)_p((d\pi)_z(y)))) = \omega_z((dX)_p((d\pi)_z((dX)_p((d\pi)_z(y)))))$ by Definition 69.11.3,
and $(dX)_p \circ (d\pi)_z \circ (dX)_p \circ (d\pi)_z = (dX)_p \circ (d(\pi \circ X))_p \circ (d\pi)_z = (dX)_p \circ \mathrm{id}_{T_p(M)} \circ (d\pi)_z = (dX)_p \circ (d\pi)_z$.
So $A^\omega_X((d\pi)_z(y_h)) = \omega_z((dX)_p((d\pi)_z(y))) = \omega_z(y_h)$. Therefore $\alpha(y) = \omega_z(y_h) + \omega_z(y_v) = \omega_z(y_h + y_v) = \omega_z(y)$.
Consequently $\forall p \in U_X,\ \forall y \in T_{X(p)}(P),\ \alpha(y) = \omega(y)$. In other words, $\forall z \in \mathrm{Range}(X),\ \alpha_z = \omega_z$.

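Let $p \in U_X$ and $z \in \pi^{-1}(U_X)$. Let $z_0 = X(\pi(z))$. Then $z_0 \in \mathrm{Range}(X)$. So $\alpha_{z_0} = \omega_{z_0}$. But $z = z_0 g$ for some
unique $g \in G$ by Theorem 66.4.3 (iv). Then $\omega_z = \mathrm{Adj}(g^{-1}) \circ \omega_{z_0} \circ (dR^P_{g^{-1}})_z$ by Theorem 69.8.3 line (69.8.5).

To calculate $\mathrm{Adj}(g^{-1}) \circ \alpha_{z_0} \circ (dR^P_{g^{-1}})_z$, the first right-hand term of line (69.12.2) for $\alpha_{z_0}$ gives

$\mathrm{Adj}(g^{-1}) \circ (\mathrm{Adj}(\phi_X(z_0)^{-1}) \circ A^\omega_X \circ (d\pi)_{z_0}) \circ (dR^P_{g^{-1}})_z$
$= \mathrm{Adj}(g^{-1}\phi_X(z_0)^{-1}) \circ \omega_{z_0} \circ (dX)_p \circ (d(\pi \circ R^P_{g^{-1}}))_z$
$= \mathrm{Adj}(\phi_X(z)^{-1}) \circ \omega_{z_0} \circ (dX)_p \circ (d\pi)_z$
$= \mathrm{Adj}(\phi_X(z)^{-1}) \circ A^\omega_X \circ (d\pi)_z$
by Theorem 62.10.6 (ii) for the adjoint and Theorem 66.2.12 (vi) for the equality $\pi \circ R^P_g = \pi$. For the
second right-hand term of line (69.12.2) for $\alpha_{z_0}$, one obtains similarly

$\mathrm{Adj}(g^{-1}) \circ ((dL^G_{\phi_X(z_0)^{-1}})_{\phi_X(z_0)} \circ (d\phi_X)_{z_0}) \circ (dR^P_{g^{-1}})_z$
$= \mathrm{Adj}(g^{-1}) \circ (d(L^G_{\phi_X(z_0)^{-1}} \circ \phi_X \circ R^P_{g^{-1}}))_z$
$= (d(L^G_{g^{-1}} \circ R^G_g \circ L^G_{\phi_X(z_0)^{-1}} \circ \phi_X \circ R^P_{g^{-1}}))_z$
$= (d(R^G_g \circ L^G_{\phi_X(z)^{-1}} \circ R^G_{g^{-1}} \circ \phi_X))_z$
$= (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z$
by Theorem 21.11.16 (ii) for the equality $R^G_g \circ \phi_X = \phi_X \circ R^P_g$. Thus $\mathrm{Adj}(g^{-1}) \circ \alpha_{z_0} \circ (dR^P_{g^{-1}})_z = \alpha_z$,
which implies that $\alpha_z = \omega_z$ for all $z \in \pi^{-1}(U_X)$. This verifies line (69.12.1).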
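
The reconstruction formula on line (69.12.1) can be checked by hand in the abelian case. The following Python sketch assumes the trivial principal (R, +)-bundle P = R^4 x R with right action (p, g)h = (p, g + h), for which Adj and the differentials of left translations are identity maps. It also assumes that the identity chart satisfies z = X(pi(z)) phi_X(z), which gives phi_X(p, g) = g - s(p) for a cross-section X(p) = (p, s(p)). The coefficient tuple a(p), the linear function s(p) = C.p and all function names are illustrative assumptions, not taken from the text.

```python
# Sketch for the abelian case G = (R, +) on the trivial bundle P = R^4 x R.
# The coefficients a(p) and the linear cross-section X(p) = (p, C.p) are
# arbitrary illustrative choices. Tangent vectors on P are pairs (v, w).

def dot(c, v):
    return sum(ci * vi for ci, vi in zip(c, v))

def a(p):
    """Hypothetical coefficients a_i(p) which define the connection."""
    return (p[0], 2.0 * p[1], -p[2], 1.0 + p[3])

def omega(z, v, w):
    """Given connection form: velocity parameter w - sum_i a_i(p) v^i at z = (p, g)."""
    p, _g = z
    return w - dot(a(p), v)

C = (0.5, -1.0, 2.0, 3.0)  # X(p) = (p, C.p), hence (dX)_p(v) = (v, C.v)

def localisation(p, v):
    """A_X(v) = omega(X_*(v)), evaluated at the point X(p) = (p, C.p)."""
    return omega((p, dot(C, p)), v, dot(C, v))

def reconstructed_omega(z, v, w):
    """Right side of line (69.12.1) with Adj and dL replaced by identity maps.

    Here (d pi)_z(v, w) = v, and phi_X(p, g) = g - C.p gives
    (d phi_X)_z(v, w) = w - C.v.
    """
    p, _g = z
    return localisation(p, v) + (w - dot(C, v))
```

Under these assumptions, reconstructed_omega agrees with omega at every point z = (p, g), not only at points in the range of X. This is the content of the second half of the proof in this special case.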


69.12.4 REMARK:  Consistency requirements for overlapping connection-form localisations. 

Wherever the domains of two cross-sections $X, Y \in X^1_{\mathrm{loc}}(P,\pi,M)$ overlap, the values of $\omega$ reconstructed
from localisations $A^\omega_X$ and $A^\omega_Y$ as in Theorem 69.12.3 evidently must be the same. This implies a consistency
requirement for localisations if they are not known a priori to have been correctly constructed from the same
connection. The consistency condition is a mere academic curiosity when the connection form is given and
the localisations $A^\omega_X$ and $A^\omega_Y$ are known to have been correctly constructed from $\omega$ via cross-sections $X$ and
$Y$ according to Definition 69.11.3. However, in the physics context, it is more usual to define localisations in
$X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ first, and then attempt to construct a connection form $\omega$ by combining these local
Lie-algebra-valued forms. The consistency requirement for the logical possibility of this construction (or
reconstruction) is the subject of Theorem 69.12.5.


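69.12.5 THEOREM: Compatibility rule for connection form localisations via cross-sections.
Let $\omega$ be a connection form on a $C^1$ principal $G$-bundle $(P,\pi,M,A^G_P)$. Then

$\forall X, Y \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(Y),\ \forall V \in T_p(M),$
$A^\omega_Y(V) = \mathrm{Adj}(\psi_{XY}(p)^{-1})(A^\omega_X(V)) + (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)}((d\psi_{XY})_p(V))$, (69.12.3)

where $\phi_X : \pi^{-1}(\mathrm{Dom}(X)) \to G$ and $\phi_Y : \pi^{-1}(\mathrm{Dom}(Y)) \to G$ are the identity charts for $X$ and $Y$ as
in Definition 21.10.7, and $\psi_{XY} : U = \mathrm{Dom}(X) \cap \mathrm{Dom}(Y) \to G$ is the identity chart transition map in
Definition 21.10.11 which satisfies $\forall p \in U,\ \forall z \in \pi^{-1}(\{p\}),\ \psi_{XY}(p) = \phi_X(z)\phi_Y(z)^{-1}$. Thus

$\forall X, Y \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(Y),$
$A^\omega_Y|_{T_p(M)} = \mathrm{Adj}(\psi_{XY}(p)^{-1}) \circ A^\omega_X|_{T_p(M)} + (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p$. (69.12.4)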


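PROOF: Let $X, Y \in X^1_{\mathrm{loc}}(P,\pi,M)$. Let $U_X = \mathrm{Dom}(X)$, $U_Y = \mathrm{Dom}(Y)$ and $U = U_X \cap U_Y$. Let $\omega$ be a
connection form on $(P,\pi,M,A^G_P)$. Both $A^\omega_X$ and $A^\omega_Y$ satisfy Theorem 69.12.3 line (69.12.1) for $z \in \pi^{-1}(U)$.
So

$\forall z \in \pi^{-1}(U),\ \omega_z = \mathrm{Adj}(\phi_X(z)^{-1}) \circ A^\omega_X \circ (d\pi)_z + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z$
$= \mathrm{Adj}(\phi_Y(z)^{-1}) \circ A^\omega_Y \circ (d\pi)_z + (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z$.
Therefore for all $p \in U$ and $z \in \pi^{-1}(\{p\})$, it follows from Theorem 62.10.5 (iii) that
$A^\omega_Y \circ (d\pi)_z = \mathrm{Adj}(\phi_{XY}(z)^{-1}) \circ A^\omega_X \circ (d\pi)_z$
$+ \mathrm{Adj}(\phi_Y(z)) \circ ((dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z)$
$= \mathrm{Adj}(\phi_{XY}(z)^{-1}) \circ A^\omega_X \circ (d\pi)_z$
$+ (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (dL^G_{\phi_Y(z)})_e \circ ((dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z)$
$= \mathrm{Adj}(\phi_{XY}(z)^{-1}) \circ A^\omega_X \circ (d\pi)_z + (dR^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ ((dL^G_{\phi_{XY}(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z - (d\phi_Y)_z)$
$= \mathrm{Adj}(\phi_{XY}(z)^{-1}) \circ A^\omega_X \circ (d\pi)_z + (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p \circ (d\pi)_z$
by Theorem 66.3.4 (iv), where $\phi_{XY} : \pi^{-1}(U) \to G$ is defined by $\forall z \in \pi^{-1}(U),\ \phi_{XY}(z) = \phi_X(z)\phi_Y(z)^{-1}$.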


Since $(d\pi)_z$ maps $T_z(P)$ onto $T_p(M)$, it follows that

$A^\omega_Y|_{T_p(M)} = \mathrm{Adj}(\psi_{XY}(p)^{-1}) \circ A^\omega_X|_{T_p(M)} + (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p$.
Since this holds for all $p \in U$ and $z \in \pi^{-1}(\{p\})$, lines (69.12.3) and (69.12.4) follow from this.
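
The compatibility rule on line (69.12.3) becomes very simple for an abelian structure group. The following Python sketch assumes the trivial principal (R, +)-bundle on R^4 with connection form parameter w - a(p).v, and two linear cross-sections X(p) = (p, c_x.p) and Y(p) = (p, c_y.p). In this additive setting, Adj and dL are identity maps and the transition map is psi_XY(p) = phi_X(z) - phi_Y(z) = (c_y - c_x).p. The coefficient tuple a(p), the cross-sections and the function names are illustrative assumptions only.

```python
# Sketch: compatibility of two localisations on the trivial (R, +)-bundle over R^4.
# All data below are arbitrary illustrative choices, not taken from the text.

def dot(c, v):
    return sum(ci * vi for ci, vi in zip(c, v))

def a(p):
    """Hypothetical coefficients a_i(p) of the connection form."""
    return (p[0], 2.0 * p[1], -p[2], 1.0 + p[3])

def localisation(c, p, v):
    """A_X(v) for X(p) = (p, c.p): omega applied to X_*(v) = (v, c.v) is c.v - a(p).v."""
    return dot(c, v) - dot(a(p), v)

def d_psi(c_x, c_y, v):
    """(d psi_XY)_p(v) for the transition map psi_XY(p) = (c_y - c_x).p."""
    return dot(c_y, v) - dot(c_x, v)

def compatible(c_x, c_y, p, v, tol=1e-12):
    """Check line (69.12.3) with Adj and dL replaced by identity maps."""
    lhs = localisation(c_y, p, v)
    rhs = localisation(c_x, p, v) + d_psi(c_x, c_y, v)
    return abs(lhs - rhs) < tol
```

In this special case the rule reduces to the familiar statement that two gauge potentials for the same connection differ by the differential of the function which relates the two gauges.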
69.12.6 REMARK: Construction of a connection form from given localisations via cross-sections.
Theorems 69.12.3 and 69.12.5 both assume that a connection form $\omega$ is given on a principal bundle, and
that the localisations $A^\omega_X$ are computed according to Definition 69.11.3 from this known connection form
and cross-sections $X$. Theorem 69.12.3 then states how to recover $\omega$ from $A^\omega_X$ and $X$, and Theorem 69.12.5
states a necessary condition which localisations $A^\omega_X$ and $A^\omega_Y$ must satisfy if they are correctly computed from
given cross-sections $X$ and $Y$.

In the reverse of this process, it is the Lie-algebra-valued fields $A_X, A_Y \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ and
cross-sections $X$ and $Y$ which are given, and the task is to construct a connection form $\omega$ such that $A_X = A^\omega_X$
and $A_Y = A^\omega_Y$. Then it transpires that the necessary condition for compatibility in Theorem 69.12.5 is also
a sufficient condition for the existence of a suitable connection form.


The name “reconstruction theorem” is given by Daniel/Viallet [317], page 183, to a theorem (with a long
proof) which is almost identical to Theorem 69.12.7.


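69.12.7 THEOREM: Construction of connection form from compatible gauge potentials.

Let $(P,\pi,M,A^G_P)$ be a $C^1$ principal $G$-bundle. Let $(A_X)_{X \in S}$ be a family of Lie-algebra-valued fields with an
index set $S$ satisfying the following conditions. (See Definition 21.10.11 for $\psi_{XY}$.)

(1) $\forall X \in S,\ X \in X^1_{\mathrm{loc}}(P,\pi,M)$.

(2) $\bigcup_{X \in S} \mathrm{Dom}(X) = M$.

(3) $\forall X \in S,\ A_X \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$.

(4) $\forall X, Y \in S,\ \forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(Y),\ A_Y = \mathrm{Adj}(\psi_{XY}(p)^{-1}) \circ A_X + (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p$.
Then there is a global connection form $\omega \in X(\Lambda_1(T(P),T_e(G)))$ such that $\forall X \in S,\ A^\omega_X = A_X$, and this
connection form satisfies

$\forall X \in S,\ \forall z \in \pi^{-1}(\mathrm{Dom}(X)),$
$\omega_z = \mathrm{Adj}(\phi_X(z)^{-1}) \circ A_X \circ (d\pi)_z + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z$. (69.12.5)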


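PROOF: For $z \in P$, let $S_z = \{X \in S;\ \pi(z) \in \mathrm{Dom}(X)\}$. Define $\omega \in X(\Lambda_1(T(P),T_e(G)))$ by
$\forall z \in P,\ \forall X \in S_z,\ \omega_z = \mathrm{Adj}(\phi_X(z)^{-1}) \circ A_X \circ (d\pi)_z + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z$. (69.12.6)
To show that $\omega_z$ is well defined by line (69.12.6), it must be shown to be independent of $X \in S_z$. Let $z \in P$ and $p = \pi(z)$.
For cross-sections $Z \in S_z$, let $\omega^Z_z = \mathrm{Adj}(\phi_Z(z)^{-1}) \circ A_Z \circ (d\pi)_z + (dL^G_{\phi_Z(z)^{-1}})_{\phi_Z(z)} \circ (d\phi_Z)_z$. Let $X, Y \in S_z$.
Define $\phi_{XY} : \pi^{-1}(\mathrm{Dom}(X) \cap \mathrm{Dom}(Y)) \to G$ by $\phi_{XY} : z \mapsto \phi_X(z)\phi_Y(z)^{-1}$. Then $\phi_{XY} = \psi_{XY} \circ \pi$ and
$\mathrm{Adj}(\phi_X(z)^{-1}) \circ A_X \circ (d\pi)_z = \mathrm{Adj}(\phi_Y(z)^{-1}) \circ \mathrm{Adj}(\phi_{XY}(z)^{-1}) \circ A_X \circ (d\pi)_z$
$= \mathrm{Adj}(\phi_Y(z)^{-1}) \circ \mathrm{Adj}(\psi_{XY}(p)^{-1}) \circ A_X \circ (d\pi)_z$
$= \mathrm{Adj}(\phi_Y(z)^{-1}) \circ (A_Y - (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p) \circ (d\pi)_z$
$= \mathrm{Adj}(\phi_Y(z)^{-1}) \circ A_Y \circ (d\pi)_z - \mathrm{Adj}(\phi_Y(z)^{-1}) \circ (dL^G_{\psi_{XY}(p)^{-1}})_{\psi_{XY}(p)} \circ (d\psi_{XY})_p \circ (d\pi)_z$
$= \mathrm{Adj}(\phi_Y(z)^{-1}) \circ A_Y \circ (d\pi)_z - (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z + (dL^G_{\phi_Y(z)^{-1}})_{\phi_Y(z)} \circ (d\phi_Y)_z$

by assumption (4) and Theorem 66.3.4 (v). Therefore $\omega^X_z = \omega^Y_z$. So by assumption (2), line (69.12.6) defines
a unique value for all $z \in P$.

Let $X \in S$. Then $A^\omega_X = \omega \circ X_*$ by Definition 69.11.3. Let $p \in \mathrm{Dom}(X)$, $V \in T_p(M)$ and $z = X(p)$. Then

$A^\omega_X|_{T_p(M)} = \omega_z \circ (dX)_p$
$= \mathrm{Adj}(\phi_X(z)^{-1}) \circ A_X \circ (d\pi)_z \circ (dX)_p + (dL^G_{\phi_X(z)^{-1}})_{\phi_X(z)} \circ (d\phi_X)_z \circ (dX)_p$
$= \mathrm{Adj}(\phi_X(z)^{-1}) \circ A_X$

because $(d\pi)_z \circ (dX)_p = (d(\pi \circ X))_p = \mathrm{id}_{T_p(M)}$, and $(d\phi_X)_z \circ (dX)_p = (d(\phi_X \circ X))_p = (de)_p = 0$ by
Theorem 21.10.8 (iv). But $\mathrm{Adj}(\phi_X(z)^{-1}) = \mathrm{Adj}(\phi_X(X(p))^{-1}) = \mathrm{Adj}(e) = \mathrm{id}_{T_e(G)}$. Hence $A^\omega_X = A_X$.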
69.12.8 REMARK: Equi-informational structures for connections on principal bundles.

It follows from Theorems 69.12.3, 69.12.5 and 69.12.7 that a connection form on a principal bundle can 
be projected or “localised” via bundle cross-sections, and the original information can be recovered from 
the localisations (if the cross-sections are known). Similarly, if a consistent set of Lie-algebra-valued fields 
covering the tangent space of the base-point space is given, a connection form may be constructed from 
these, and the original fields may be recovered from this global structure. Thus the connection form and the 
local fields contain the same information. 


By Theorem 69.6.3, the connection form also contains the same information as the horizontal lift function, 
the horizontal component map, the vertical component map, and the horizontal subspace map. It is also true 
that the covariant derivative contains the same information, and that associated connections on associated 
fibre bundles contain the same information. (In the special case of affine connections on tangent bundles, 
there are yet further equi-informational connection structures.) 


The existence of so many different constructions containing the same information about a connection can be 
a source of great confusion because the formulas to convert between them are not always easy to apply in 
practice. And more importantly, the many formulas which are used in differential geometry can look very 
different in the different formalisms. In effect, the subject must be learned in multiple languages, and skill 
in translation between these languages is essential. 


69.12.9 REMARK: The effect of the right action by the structure group on connection form localisations. 
The localisation $A^\omega_X \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ of a connection form $\omega$ via a given $X \in X^1_{\mathrm{loc}}(P,\pi,M)$
can be transformed by acting on $X(p)$ on the right with a Lie group element $g(p)$ for each $p \in \mathrm{Dom}(X)$.
Theorem 69.12.10 gives a formula for this.


A connection form can be constructed for any cross-sections $X$ and $Y$ which are compatible according to
Theorem 69.12.7 condition (4), but this connection form does not appear explicitly in line (69.12.7) if the
fields $A^\omega_X$ and $A^\omega_Y$ are replaced by $A_X$ and $A_Y$ respectively. This transformation rule depends only on $X$ and
$Y$ because $g$ is uniquely determined by $X$ and $Y$, but a “globalisation” $\omega$ for the fields $A_X$ and $A_Y$ must at
least exist, although its values do not need to be known for line (69.12.7) to be valid. If $\omega$ is not given, it is
straightforward to construct it by means of Theorem 69.12.7, and then insert it into line (69.12.7).


The first term on the right side of line (69.12.7) may be regarded as the “vertical term” because $\ker(\omega_z) = Q_z$
by Theorem 69.5.15 (xii), and the second term may be thought of as the “horizontal term” because $g$ is
effectively constant on each fibre set. The vertical term thus depends only on the “value” of $g$ at $p$, whereas
the horizontal term depends on the “differential” with respect to the base point in $M$. As a “sanity test”
for Theorem 69.12.10, the case that $g$ is constant makes the “horizontal term” disappear because $(dg)_p = 0$.
This corresponds to the global application of the right action $R^P_g$ on $P$. The resulting formula then includes
the “vertical term” only, which matches Theorem 69.8.2 (i).


69.12.10 THEOREM: Effect of structure group right action map on gauge potentials.
Let $(P,\pi,M,A^G_P)$ be a $C^1$ principal $G$-bundle. Let $\omega : T(P) \to T_e(G)$ be a connection form on $P$. Let
$X, Y \in X^1_{\mathrm{loc}}(P,\pi,M)$. Let $U = \mathrm{Dom}(X) \cap \mathrm{Dom}(Y)$. Define a $C^1$ map $g : U \to G$ so that
$\forall p \in U,\ Y(p) = X(p)g(p)$. (This is uniquely determined by $X$ and $Y$.) Then

$\forall p \in U,\ \forall V \in T_p(M),\ A^\omega_Y(V) = \mathrm{Adj}(g(p)^{-1})(A^\omega_X(V)) + (dL^G_{g(p)^{-1}})_{g(p)}((dg)_p(V))$, (69.12.7)
where $A^\omega_X$ and $A^\omega_Y$ are localisations of $\omega$ via $X$ and $Y$ respectively as in Definition 69.11.3. In other words,

$\forall p \in U,\ A^\omega_Y|_{T_p(M)} = \mathrm{Adj}(g(p)^{-1}) \circ A^\omega_X|_{T_p(M)} + (dL^G_{g(p)^{-1}})_{g(p)} \circ (dg)_p$. (69.12.8)

PROOF: By Theorem 21.10.12, $\psi_{XY} = g$. So line (69.12.7) follows from Theorem 69.12.5 line (69.12.3).


69.12.11 REMARK: Shorthand version of gauge transformation rule. 

The formula in Theorem 69.12.10 lines (69.12.7) and (69.12.8) is essentially the same as the components ver- 
sion given in Theorem 69.13.10 line (69.13.4). Therefore the comments about “shorthand” in Remark 69.13.11 
apply also to Theorem 69.12.10. 
69.13. Gauge potential coordinate chart components


69.13.1 REMARK: The gauge potential component function. 

Coordinatisation of the localisations $A^\omega_X \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ of connection forms $\omega : T(P) \to T_e(G)$
on $C^1$ principal $G$-bundles $(P,\pi,M,A^G_P)$ by means of coordinate charts for the base-point space $M$ is simple.
One merely feeds the coordinate basis vector fields $e^\psi_i \in X^0(T(M)\,|\,U_\psi)$, defined for $p \in U_\psi = \mathrm{Dom}(\psi)$ by
$e^\psi_i : p \mapsto e^{p,\psi}_i$, into the localisations $A^\omega_X$ according to the formula $A^\omega_{X,\psi,i} = A^\omega_X \circ e^\psi_i$. (See Notation 54.4.10
for the chart-basis vectors $e^{p,\psi}_i \in T_p(M)$.)

Definition 69.13.2 applies this coordinatisation procedure to general fields $A \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$,
which are here referred to as “gauge potentials”, whether they are constructed from connection forms or not.
Definition 69.13.3 then specialises the general procedure to connection form localisations.


69.13.2 DEFINITION: Components of general gauge potentials.

The gauge potential component function for a Lie-algebra-valued 1-form $A \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ on the
base space $M$ of a principal $G$-bundle $(P,\pi,M,A^G_P)$, for a base-point manifold chart $\psi \in \mathrm{atlas}(M)$, is the
function $A_\psi : \mathrm{Dom}(A) \cap \mathrm{Dom}(\psi) \to (\mathbb{N}_n \to T_e(G))$ with $\dim(M) = n$ which satisfies

$\forall p \in \mathrm{Dom}(A) \cap \mathrm{Dom}(\psi),\ \forall i \in \mathbb{N}_n,\ A_\psi(p)_i = A(e^{p,\psi}_i)$. (69.13.1)


69.13.3 DEFINITION: Components of gauge potentials which are constructed from connection forms.

The (connection form) (localisation) component function for a connection form $\omega$ on a principal $G$-bundle
$(P,\pi,M,A^G_P)$ via a cross-section $X \in X^1_{\mathrm{loc}}(P,\pi,M)$, for a base-point manifold chart $\psi \in \mathrm{atlas}(M)$, is the
function $A^\omega_{X,\psi} : \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi) \to (\mathbb{N}_n \to T_e(G))$ defined by

$\forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),\ \forall i \in \mathbb{N}_n,\ A^\omega_{X,\psi}(p)_i = A^\omega_X(e^{p,\psi}_i) \in T_e(G)$, (69.13.2)
where $n = \dim(M)$ and $A^\omega_X \in X_{\mathrm{loc}}(\Lambda_1(T(M),T_e(G)))$ is the localisation of $\omega$ via $X$ in Definition 69.11.3.


69.13.4 REMARK: The connection form localisation component function.
Definition 69.13.3 is illustrated in Figure 69.13.1, which is almost the same as Figure 69.11.1. The main
difference is that dependence of $A^\omega_X(V)$ on $V$ has been replaced by dependence of $A^\omega_{X,\psi}(p)_i$ on $\psi$ and $i$.

[Diagram relating $e^{p,\psi}_i$, $X_*(e^{p,\psi}_i)$ and $A^\omega_{X,\psi}(p)_i = \omega(X_*(e^{p,\psi}_i))$ over $p \in M$ and $X(p) \in P$.]
Figure 69.13.1 Connection form localisation component function


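69.13.5 EXAMPLE: Gauge potential component function for a trivial principal $(\mathbb{R},+)$-bundle on $\mathbb{R}^4$.

Let $\beta$ be the horizontal lift function on the principal bundle $(P,\pi,M,A^G_P)$ in Example 69.1.9, where $M = \mathbb{R}^4$,
$G = \mathbb{R} \mathrel{-\!\!<} (\mathbb{R},+)$, $P = M \times G$, and $\pi : P \to M$ with $\pi : (p,g) \mapsto p$. Then
$\beta_V(z) = t_{(p,g),(v,f(z,v)),\mathrm{id}_P}$ for $p \in M$, $V = t_{p,v,\mathrm{id}_M} \in T_p(M)$ and $z = (p,g) \in P_p$, where
$f : \mathbb{R}^4 \times \mathbb{R} \times \mathbb{R}^4 \to \mathbb{R}$ has the form $f(p,g,v) = \sum_{i=0}^{3} a_i(p) v^i$ for all $z = (p,g) \in P$ and $v \in \mathbb{R}^4$,
where $a = (a_i)_{i=0}^{3} : M \to \mathbb{R}^4$ is the tuple of coefficients of the linear dependence of $f$ on $v$.
It is shown in Example 69.5.6 that the connection form $\omega$ on $P$ which corresponds to $\beta$ satisfies
$\forall z \in P,\ \forall (v,w) \in \mathbb{R}^4 \times \mathbb{R},\ \omega_z(t_{z,(v,w),\mathrm{id}_P}) = t_{0_{\mathbb{R}},\,w - f(z,v),\,\mathrm{id}_{\mathbb{R}}}$.

Define $X : M \to P$ by $X(p) = (p, 0_{\mathbb{R}})$ for all $p \in M$. Then $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ by Notation 64.7.3, and its
differential $X_* : T(M) \to T(P)$ satisfies $X_*(t_{p,v,\mathrm{id}_M}) = t_{(p,0),(v,0),\mathrm{id}_P}$ for all $p \in M$ and $v \in \mathbb{R}^4$. Therefore

$\forall p \in M,\ \forall v \in \mathbb{R}^4,\ A^\omega_X(t_{p,v,\mathrm{id}_M}) = t_{0_{\mathbb{R}},\,-f(p,0,v),\,\mathrm{id}_{\mathbb{R}}}$.
Then $\mathrm{Dom}(X) \cap \mathrm{Dom}(\mathrm{id}_M) = M$. So

$\forall p \in M,\ \forall i \in \mathbb{Z}_4,$
$A^\omega_{X,\mathrm{id}_M}(p)_i = A^\omega_X(e^{p,\mathrm{id}_M}_i)$
$= A^\omega_X(t_{p,e_i,\mathrm{id}_M})$
$= t_{0_{\mathbb{R}},\,-f(p,0,e_i),\,\mathrm{id}_{\mathbb{R}}}$
$= -a_i(p)\, t_{0_{\mathbb{R}},1,\mathrm{id}_{\mathbb{R}}}$.
As shorthand, one could write $A_i(p) = -a_i(p)$ for $p \in M$ and $i \in \mathbb{Z}_4$, or simply $A_i = -a_i$ for $i \in \mathbb{Z}_4$.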
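
The component computation in this example can be imitated by feeding the standard basis vectors of R^4 into the gauge potential. The following Python sketch is an illustration only. It assumes that elements of the Lie algebra of (R, +) are represented by their velocity parameters, that the indices run from 0 to 3, and that the coefficient tuple a(p) is a made-up example. The function names are hypothetical.

```python
# Sketch: components of the gauge potential for the trivial (R, +)-bundle on R^4.
# The coefficient functions in a(p) are an arbitrary illustrative choice.

def a(p):
    """Hypothetical coefficient tuple (a_0(p), ..., a_3(p))."""
    return (p[0], 2.0 * p[1], -p[2], 1.0 + p[3])

def gauge_potential(p, v):
    """Velocity parameter -f(p, 0, v) = -sum_i a_i(p) v^i of the localised connection form."""
    return -sum(ai * vi for ai, vi in zip(a(p), v))

# Standard basis vectors e_0, ..., e_3 of R^4, used as chart-basis vectors for id_M.
BASIS = [tuple(1.0 if j == i else 0.0 for j in range(4)) for i in range(4)]

def components(p):
    """Component tuple (A_0(p), ..., A_3(p)), obtained by evaluating on the basis vectors."""
    return tuple(gauge_potential(p, e) for e in BASIS)

def from_components(p, v):
    """Rebuild the value on a general vector v from the components by linearity."""
    return sum(A_i * v_i for A_i, v_i in zip(components(p), v))
```

With these assumptions, components(p) equals the tuple of values -a_i(p), and from_components(p, v) agrees with gauge_potential(p, v) because the gauge potential is linear in v.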


69.13.6 REMARK: Alternative notation for the Lie algebra field of a connection. 
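As a minor convenience, Notation 69.13.7 adds an index to the Lie algebra component function $A^\omega_{X,\psi}$.


69.13.7 NOTATION: Components of a gauge potential which is constructed from a connection form.
$A^\omega_{X,\psi,i}$, for a connection form $\omega$ on a principal $G$-bundle $(P,\pi,M,A^G_P)$, $X \in X^1_{\mathrm{loc}}(P,\pi,M)$, $\psi \in \mathrm{atlas}(M)$ and
$i \in \mathbb{N}_n$, denotes the function from $\mathrm{Dom}(X) \cap \mathrm{Dom}(\psi)$ to $T_e(G)$ defined by

$\forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),\ A^\omega_{X,\psi,i}(p) = A^\omega_{X,\psi}(p)_i$,
where $n = \dim(M)$. In other words,

$\forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),\ A^\omega_{X,\psi,i}(p) = A^\omega_X(e^{p,\psi}_i)$.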


69.13.8 REMARK: Reconstruction of a connection form from localisation component functions. 

Since the connection form $\omega$ can be reconstructed from localisations $A^\omega_X$ according to Theorem 69.12.3 (if the
set of cross-sections $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ covers $M$), and the connection form localisation component functions
$A^\omega_{X,\psi}$ contain the same information as the localisations $A^\omega_X$, it follows that $\omega$ can also be reconstructed from
the set of $A^\omega_{X,\psi}$ if these functions are known for all $\psi \in \mathrm{atlas}(M)$. This is obvious because any $V \in T_p(M)$
may be expressed as a linear combination of coordinate basis vectors $e^{p,\psi}_i$ for some $\psi \in \mathrm{atlas}(M)$.

Thus it is possible to convert, with some difficulty and inconvenience, between the single connection form $\omega$,
which is favoured by pure mathematicians, and the local Lie-algebra-valued tuple-fields $A^\omega_{X,\psi}$ which are
favoured by physicists.


69.13.9 REMARK: The effect of right action map fields on connection form localisation components. 
The connection form localisation component function in Definition 69.13.3 satisfies

$\forall p \in \mathrm{Dom}(\psi),\ A^\omega_X|_{T_p(M)} = \sum_{i=1}^{n} A^\omega_{X,\psi}(p)_i \cdot e^i_{p,\psi}$ (69.13.3)

by Theorem 55.3.6 line (55.3.2) because $A^\omega_X|_{T_p(M)}$ is linear. (For the chart-basis covectors $e^i_{p,\psi} \in T^*_p(M)$,
see Notation 55.3.3.) The terms in the sum-expression in line (69.13.3) are “juxtaposition products” of the
linear functionals $e^i_{p,\psi}$ on the linear space $T_p(M)$ multiplied by vectors $A^\omega_{X,\psi}(p)_i \in T_e(G)$ as described in
Remarks 23.4.9 and 27.2.31. Thus each term $A^\omega_{X,\psi}(p)_i \cdot e^i_{p,\psi}$ is a linear map from $T_p(M)$ to $T_e(G)$. Therefore
$\sum_{i=1}^{n} A^\omega_{X,\psi}(p)_i \cdot e^i_{p,\psi} \in \mathrm{Lin}(T_p(M),T_e(G))$, which agrees with $A^\omega_X|_{T_p(M)} \in \mathrm{Lin}(T_p(M),T_e(G))$.


If Theorem 69.12.10 is applied to gauge potential component functions $A_{X,\psi}$ and $A_{Y,\psi}$ for compatible gauge
potentials $A_X$ and $A_Y$, the result is Theorem 69.13.10. Since compatibility implies that there is a unique
connection form $\omega$ from which $A_X$ and $A_Y$ are projected via $X$ and $Y$, it can be assumed that $A_X = A^\omega_X$
and $A_Y = A^\omega_Y$ for some connection form $\omega$ on $(P,\pi,M,A^G_P)$.
69.13.10 THEOREM: Gauge transformation. Effect of variable group element action on gauge potential.
Let $(P,\pi,M,A^G_P)$ be a $C^1$ principal $G$-bundle. Let $\omega : T(P) \to T_e(G)$ be a connection form on $P$. Let
$X, Y \in X^1_{\mathrm{loc}}(P,\pi,M)$. Let $U = \mathrm{Dom}(X) \cap \mathrm{Dom}(Y)$. Define a $C^1$ map $g : U \to G$ so that $\forall p \in U,\ Y(p) =
X(p)g(p)$. (This is uniquely determined by $X$ and $Y$.) Let $n = \dim(M)$. Then

$\forall p \in U,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,$
$A^\omega_{Y,\psi}(p)_i = \mathrm{Adj}(g(p)^{-1})(A^\omega_{X,\psi}(p)_i) + (dL^G_{g(p)^{-1}})_{g(p)}(\partial^{p,\psi}_i(g))$. (69.13.4)

PROOF: Line (69.13.4) follows from Theorem 69.12.10 because $\partial^{p,\psi}_i(g) = (dg)_p(e^{p,\psi}_i)$ by Theorem 58.1.6
line (58.1.4).


69.13.11 REMARK: Shorthand for gauge transformation formulas. Maurer-Cartan form. 

Some authors would abbreviate a formula such as line (69.13.4) to “Ai = g^! A;g + g !O;g^. This is useful 
as a mnemonic, but is difficult for pure mathematicians to decode. (This particular shorthand style of 
notation is apparently attributed to Élie Cartan by Frankel [12], page 476. For some equivalent formulas, see 
Nash/Sen [30], page 178; Daniel/Viallet [317], pages 176, 184; Bleecker [254], page 31; Frankel [12], page 540; 
Peskin/Schroeder [298], pages 488, 490; Itzykson/Zuber [277], page 565; Moriyasu [293], page 40; Eriksson/ 
Hággblad/Strómbom [264], page 53; Atiyah [251], page 8.) 


The abbreviated formula is essentially correct if standard coordinates are used for a matrix group and its
Lie algebra, but coordinates are supposedly to be shunned in a modern approach to differential geometry.
The left and right multipliers $g^{-1}$ and $g$ in the expression “$g^{-1} A_i g$” really mean $(L^G_{g(p)^{-1}})_*$ and $(R^G_{g(p)})_*$
respectively, which in this context combine to give $(dL^G_{g(p)^{-1}})_{g(p)} \circ (dR^G_{g(p)})_e$, which equals $\mathrm{Adj}(g(p)^{-1})$.


The term “$g^{-1} \partial_i g$” in gauge transformation formulas is sometimes known as the “Maurer-Cartan form”.
(See Remarks 62.5.7 and 64.8.13.)
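

For a matrix group, the decoding of the shorthand can be checked by hand. The following is a minimal sketch,
assuming that $G$ is a group of invertible matrices with $L_h(k) = hk$ and $R_h(k) = kh$, so that tangent vectors
are matrices. The symbols $h$, $a$ and $b$ are introduced only for this illustration.

```latex
% Assumption: G is a matrix group, so tangent vectors are matrices.
% Fix h \in G, a \in T_e(G), b \in T_h(G).
\begin{align*}
  (dR_h)_e(a) &= a\,h, \\
  (dL_{h^{-1}})_h(b) &= h^{-1} b, \\
  \mathrm{Adj}(h^{-1})(a)
    = (dL_{h^{-1}})_h\bigl((dR_h)_e(a)\bigr) &= h^{-1} a\, h .
\end{align*}
% With h = g(p), a = A_i and b = \partial_i g, the two terms of the
% transformation rule become  g^{-1} A_i g  and  g^{-1} \partial_i g.
```

Under this assumption, both differentials act by ordinary matrix multiplication, which is why the shorthand
does not need to distinguish between a group action and its differential.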


69.13.12 EXAMPLE: Typical gauge theory shorthand for the gauge transformation rule. 
A typical gauge theory shorthand for line (69.13.4) is given by Daniel/Viallet [317], page 176, as follows.


“$A_\mu(x) \to {}^g\!A_\mu(x) = g^{-1}(x) A_\mu(x) g(x) + g^{-1}(x) \partial_\mu g(x)$” (69.13.5)


This is quite easy to reconcile. The variable base point here is denoted $x$ rather than $p \in M$. The function
$g : M \to G$ is chosen so that the right action of $g(p)$ on $X(p)$ yields $Y(p)$, as in the statement of Theorem 69.13.10.
Then “${}^g\!A_\mu(x)$” corresponds to $A^\omega_{Y,\psi}(p)_i$ with $i$ replaced by “$\mu$”, with $p$ replaced by “$x$”, and $A^\omega_{Y,\psi}$ replaced
by “${}^g\!A$”, where $\omega$ and $\psi$ are both suppressed. Similarly the sub-expression “$A_\mu(x)$” on line (69.13.5)
corresponds to $A^\omega_{X,\psi}(p)_i$ on line (69.13.4) with $\omega$, $X$ and $\psi$ suppressed.


The expression $\mathrm{Adj}(g(p)^{-1}) \circ A^\omega_{X,\psi}(p)_i$ on line (69.13.4) equals $(dL^G_{g(p)^{-1}})_{g(p)} \circ (dR^G_{g(p)})_e \circ A^\omega_{X,\psi}(p)_i$ by
Theorem 62.10.5 (ii). This implies that the first “$g^{-1}(x)$” on line (69.13.5) represents the differential left
action $(dL^G_{g(p)^{-1}})_{g(p)}$ of $g(p)^{-1}$ on vectors in $T_{g(p)}(G)$, not the left group action of $g(p)^{-1}$ on elements of $G$.


Similarly, the first “$g(x)$” on line (69.13.5) represents the differential right action $(dR^G_{g(p)})_e$ of $g(p)$ on tangent
vectors in $T_e(G)$, not the right action of $g(p)$ on elements of $G$.


The expression “$\partial_\mu g(x)$” on line (69.13.5) corresponds to $\partial_i^{p,\psi}(g)$. Since $g$ is a map from $M$ to $G$, this
expression yields an element of $T_{g(p)}(G)$, which happens to be the domain of $(dL^G_{g(p)^{-1}})_{g(p)}$. Therefore it
seems very reasonable to assume that the second expression “$g^{-1}(x)$” also corresponds to $(dL^G_{g(p)^{-1}})_{g(p)}$.


Thus it is apparent that the shorthand line (69.13.5) may be interpreted to mean line (69.13.4) exactly.


69.14. Gauge covariant derivatives 


69.14.1 REMARK: Lifting base-space chart-basis operators to the principal bundle. 

Let $\psi \in \mathrm{atlas}_p(M)$ be a base-space manifold chart for a $C^1$ principal $G$-bundle $(P,\pi,M,A^G_P)$, where $p \in M$.
Then by Notations 54.4.10, 54.11.3, 54.11.12 and 54.13.5, $\partial_i^{p,\psi} = \partial_{p,e_i,\psi}$ is the chart-basis tangent differential
operator corresponding to the tangent vector $e_i^{p,\psi} = t_{p,e_i,\psi}$ for each $i \in \mathbb{N}_n$, where $n = \dim(M)$. Let
$V = e_i^{p,\psi}$. Then $\partial_i^{p,\psi} = \partial_V$ by Notation 54.11.12. Any vector $V \in T_p(M)$ may be lifted to $\beta_z(V) \in T_z(P)$
for $z \in \pi^{-1}(\{p\})$ via a transposed horizontal lift function $\beta$ for $P$. In fact, the entire local vector field
$e_i^\psi \in X^0(T(M)|U_\psi)$ defined by $e_i^\psi : p \mapsto e_i^{p,\psi}$ may be lifted to $\hat e_i^\psi = \mathrm{lift}_\beta(e_i^\psi) \in X^0(T(P)|\pi^{-1}(U_\psi))$ defined
in Definition 69.2.7 by $\hat e_i^\psi : z \mapsto \beta_z(e_i^\psi(\pi(z)))$, where $U_\psi = \mathrm{Dom}(\psi)$.

Then it seems reasonable that the differential operator field $\partial_i^\psi \in X^0(\mathring{T}(M)|U_\psi)$ defined by $\partial_i^\psi : p \mapsto \partial_i^{p,\psi}$
should be lifted to the operator field $\hat\partial_i^\psi \in X^0(\mathring{T}(P)|\pi^{-1}(U_\psi))$ defined by $\hat\partial_i^\psi(z) = \partial_{\hat e_i^\psi(z)}$ for all $z \in \pi^{-1}(U_\psi)$.
(See Notation 54.12.2 for tangent operator total spaces $\mathring{T}(M)$ and $\mathring{T}(P)$. As mentioned in Remark 54.12.3,
such spaces are not, strictly speaking, the total spaces of fibre bundles. So the spaces $X^0(\mathring{T}(M)|U_\psi)$ and
$X^0(\mathring{T}(P)|\pi^{-1}(U_\psi))$ are in fact abuses of notation which must not appear in formal definitions or theorems.
However, the operator fields $\partial_i^\psi : U_\psi \to \mathring{T}(M)$ and $\hat\partial_i^\psi : \pi^{-1}(U_\psi) \to \mathring{T}(P)$ are well defined.)


69.14.2 REMARK:  Pulled-back lifted chart-basis vector fields via cross-sections of principal bundles. 

The vector and operator field lifts $\hat e_i^\psi$ and $\hat\partial_i^\psi$ in Remark 69.14.1 are defined for every $z$ in every fibre
set $\pi^{-1}(\{p\})$ for every $p \in U_\psi$. This would be useful for differentiating functions which are defined on
all of $P$. But for some purposes, it is desirable to pull these lifted fields back to $M$ via cross-sections
$X \in X^1_{\mathrm{loc}}(P,\pi,M)$, which implies that only the $z$-values $z = X(p) \in \pi^{-1}(\{p\})$ which are sampled by $X$ are
included in the pulled-back lifted fields.


Definition 69.14.3 introduces pulled-back lifted chart-basis vector fields $\hat e^\omega_{X,\psi,i}$. Definition 69.14.4 introduces
the corresponding pulled-back lifted differential operator fields $\hat\partial^\omega_{X,\psi,i}$. These are both constructed from a
connection form $\omega$ (which is itself constructed from a transposed horizontal lift function $\beta$), a particular choice
of principal bundle cross-section $X$, a particular choice of base space manifold chart $\psi$, and a particular choice
of coordinate component $i$. The steps in this construction are as follows for each $p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi)$.


(1) Pull back the Cartesian space basis vector $e_i$ from $\psi(p)$ to $p$ to define $e_i^{p,\psi}$.
(2) Lift $e_i^{p,\psi}$ via $\beta$ from $T_p(M)$ to $\hat e_i^\psi(z) = \beta_z(e_i^{p,\psi}) \in T_z(P)$ for every $z \in \pi^{-1}(\{p\})$.
(3) Sample the values $\beta_z(e_i^{p,\psi}) \in T_z(P)$ for $z \in \pi^{-1}(\{p\})$ by choosing $z = X(p)$.
(4) Pull back $\beta_{X(p)}(e_i^{p,\psi}) \in T_{X(p)}(P)$ to $M$ by defining it as $\hat e^\omega_{X,\psi,i}(p)$ for $p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi)$.
(5) Define the corresponding differential operator field $\hat\partial^\omega_{X,\psi,i}$ as in line (69.14.1).


The motivation for this somewhat convoluted sequence of coordinatisations comes from the gauge theory
context for the “standard model” in particle physics. This is taken even further in Theorem 69.14.6 by
expressing pulled-back lifted chart-basis vector fields in terms of Lie-algebra-valued “connection fields” $A^\omega_{X,\psi}$,
which are subsequently further coordinatised in terms of bases for the Lie algebra $T_e(G)$.


69.14.3 DEFINITION: Pulled-back lifted chart-basis vector fields on principal bundles.

The pulled-back lifted chart-basis vector field on a $C^1$ principal $G$-bundle $(P,\pi,M,A^G_P)$ with connection
form $\omega$, via a $C^1$ cross-section $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ from a chart $\psi \in \mathrm{atlas}(M)$ with index $i \in \mathbb{N}_n$, where
$n = \dim(M)$, is the map $\hat e^\omega_{X,\psi,i} : \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi) \to T(P)$ defined by


$\forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),\quad \hat e^\omega_{X,\psi,i}(p) = \beta_{X(p)}(e_i^{p,\psi}) \in T_{X(p)}(P),$


where $\beta$ is the transposed horizontal lift function corresponding to $\omega$.


69.14.4 DEFINITION:  Pulled-back lifted chart-basis operator fields on principal bundles. 
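The pulled-back lifted chart-basis operator field on a $C^1$ principal $G$-bundle $(P,\pi,M,A^G_P)$ with connection
form $\omega$, via a $C^1$ cross-section $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ from a chart $\psi \in \mathrm{atlas}(M)$ with index $i \in \mathbb{N}_n$, where
$n = \dim(M)$, is the map $\hat\partial^\omega_{X,\psi,i} : \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi) \to \mathring{T}(P)$ defined by


$\forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),\quad \hat\partial^\omega_{X,\psi,i}(p) = \partial_{\hat e^\omega_{X,\psi,i}(p)} \in \mathring{T}_{X(p)}(P).$ (69.14.1)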


Alternative name: components of a gauge covariant derivative on a principal bundle.


69.14.5 REMARK: Formula for pulled-back lifted fields in terms of connection form localisations.

The connection form localisation $A^\omega_X$ in Definition 69.11.3 for a local cross-section $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ samples
the values of a connection form $\omega : T(P) \to T_e(G)$ on subsets $T_z(P)$ of $T(P)$ for $z = X(p)$. This is the
same connection information which is sampled by $\beta_{X(p)}$ in Definition 69.14.3. So these connection form
localisations can provide the information which is required for the construction of the fields $\hat e^\omega_{X,\psi,i}$.


The specific application of the connection form localisations $A^\omega_X$ to the chart-basis vector fields $e_i^{p,\psi}$ is denoted
$A^\omega_{X,\psi}(p)_i$ in Definition 69.13.3. Combining this with Definition 69.14.3 yields Theorem 69.14.6.


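69.14.6 THEOREM: Formula for gauge covariant derivative in terms of localised connection form.
Let $(P,\pi,M,A^G_P)$ be a $C^1$ principal $G$-bundle with connection form $\omega$. Let $n = \dim(M)$. Then


$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall \psi \in \mathrm{atlas}(M),\ \forall i \in \mathbb{N}_n,\ \forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),$
$\hat e^\omega_{X,\psi,i}(p) = (dX)_p(e_i^{p,\psi}) - \Lambda_{X(p)}(A^\omega_{X,\psi}(p)_i) \in T_{X(p)}(P).$ (69.14.2)
In other words,
$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall \psi \in \mathrm{atlas}(M),\ \forall i \in \mathbb{N}_n,$
$\hat e^\omega_{X,\psi,i} = X_* \circ e_i^\psi - (\Lambda \circ X) \circ\circ A^\omega_{X,\psi,i},$ (69.14.3)
where “$\circ\circ$” is the pointwise composition operator for function-valued functions in Definition 10.4.26.


PROOF: It follows from Definition 69.14.3 and Theorem 69.6.8 (iii) that


$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),\ \forall i \in \mathbb{N}_n,$
$\beta_{X(p)}(e_i^{p,\psi}) = (dX)_p(e_i^{p,\psi}) - \Lambda_{X(p)}(\omega_{X(p)}((dX)_p(e_i^{p,\psi}))).$


Then line (69.14.2) follows from Definitions 69.11.3 and 69.13.3. Line (69.14.3) follows from line (69.14.2)
and Definitions 66.5.2 and 69.13.3.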
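

The key step of this proof can be spelled out as follows. This is only a sketch under stated assumptions: the
symbol $\Lambda_z$ is used here for the map whose inverse appears in the relation between the connection form and the
transposed horizontal lift quoted from Theorem 69.6.3 (xvii) in Remark 69.14.13, and the abbreviations
$z = X(p)$ and $V = e_i^{p,\psi}$ are introduced only for this illustration.

```latex
% Assumptions of this sketch:
%   z = X(p),  V = e_i^{p,\psi},
%   \omega_z = \Lambda_z^{-1} \circ (\mathrm{id}_{T_z(P)} - \beta_z \circ (d\pi)_z).
\begin{align*}
  \Lambda_z \circ \omega_z
    &= \mathrm{id}_{T_z(P)} - \beta_z \circ (d\pi)_z
    && \text{(rearranged relation)} \\
  (d\pi)_z\bigl((dX)_p(V)\bigr)
    &= d(\pi \circ X)_p(V) = V
    && \text{since } \pi \circ X = \mathrm{id}_{\mathrm{Dom}(X)} \\
  \beta_z(V)
    &= (dX)_p(V) - \Lambda_z\bigl(\omega_z((dX)_p(V))\bigr)
    && \text{(first line applied to } (dX)_p(V)\text{)}
\end{align*}
```

The last line has the shape of the displayed formula in the proof: the lifted vector is the naive push-forward
of $V$ by the cross-section, minus its vertical part as measured by the connection form.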


69.14.7 REMARK: The “gauge covariant derivative” for real-valued functions. 

The differential operator field $\hat\partial^\omega_{X,\psi,i}$ in Definition 69.14.4 line (69.14.1) is obtained from the corresponding
vector field $\hat e^\omega_{X,\psi,i}$ by making it act on $C^1$ functions on $P$. This differential operator field is interpreted
as a “covariant derivative” in the gauge theory literature. It is called the “gauge covariant derivative” by
Moriyasu [293], page 39.

By looking very closely at Theorem 69.14.6 line (69.14.3) (so that it goes out of focus), one may observe
that the vector field $\hat e^\omega_{X,\psi,i}$ may be abbreviated to “$e_i - A_i$”. This austere abbreviation hides $\omega$, $X$, $\psi$ and $\Lambda$.
In the gauge theory literature, the operator field $\hat\partial^\omega_{X,\psi,i}$ corresponding to this vector field is often referred to
as the “covariant derivative”. It is a particular kind of covariant derivative which acts on functions whose
domain is the total space of the principal bundle.


The covariant derivative operator field $\hat\partial^\omega_{X,\psi,i}$ typically appears with notations similar to “$\partial_\mu - A_\mu$” in the
gauge theory literature. (See for example Itzykson/Zuber [277], pages 64, 564; Daniel/Viallet [317], page 184;
Peskin/Schroeder [298], pages 78, 483, 487, 490; Mandl/Shaw [288], pages 70, 221, 227; Moriyasu [293],
page 39; Frankel [12], pages 440-441, 535, 538-540; Atiyah [251], page 7; Scharf [303], page 45; Scharf [304],
page 165; Drechsler/Mayer [262], page 158; Messiah [291], pages 880, 885-886, 893-895; Nash/Sen [30],
page 179; Penrose [297], pages 349-351, 450-453, 649; Actor [313], pages 473, 509-510; Bruzzo [315], page 111;
Schutz [36], pages 220-222; Weyl [311], pages 100, 213.)


The first and second terms of the highly abbreviated operator expression “$\partial_\mu - A_\mu$” are respectively a
naive derivative and the vertical component map for the connection. (See Definition 69.4.2 and Theorems
69.6.8 (iii) and 69.6.3 (viii, xii).)


69.14.8 REMARK: Gauge covariant derivatives in the literature. 
There is a wide variety of expressions in the literature for gauge covariant derivatives. They are typically
expressed as formulas which resemble $\partial_\mu + A_\mu$, $\partial_\mu \pm iA_\mu$, $\partial_\mu \pm iqA_\mu$, $\partial_\mu + iqA^k_\mu F_k$, $\partial_\mu \pm qA^k_\mu F_k$, and so forth,
with various choices for the sign, and the “coupling constant” “q” possibly replaced by “e” or “g”. Usually
either “$F_k$” or “$iF_k$” is a Lie algebra element for the structure group, which is very often a unitary matrix
group, which would imply that the matrices $F_k$ are either anti-hermitian or hermitian.
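

The hermitian versus anti-hermitian alternative can be seen from a one-line calculation. The following sketch
assumes a $C^1$ curve $t \mapsto U(t)$ of unitary matrices with $U(0) = I$. The names $U$ and $F = U'(0)$ are illustrative
and are not taken from any of the cited texts.

```latex
% Assumption: U(t) is unitary for all t, with U(0) = I and F = U'(0).
\begin{align*}
  U(t)^\dagger U(t) &= I \quad \text{for all } t, \\
  \left.\frac{d}{dt}\right|_{t=0}\bigl(U(t)^\dagger U(t)\bigr)
    &= F^\dagger + F = 0, \\
  (iF)^\dagger &= -i\,F^\dagger = iF .
\end{align*}
% So F is anti-hermitian, and iF is hermitian.
```

Thus whether the matrices in a formula are hermitian or anti-hermitian depends only on whether the factor $i$
has been absorbed into them, which accounts for part of the variety of sign and factor conventions.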


Gauge covariant derivatives may be applicable either to functions on principal bundle total spaces or to 
cross-sections of vector bundles. The vector bundles may represent either classical or quantum models. 
Usually the vector bundle fibre space will be a real linear space in the classical case, or a complex linear 
space in the quantum case. 

The following are some specific examples of gauge covariant derivatives in the literature. 

(( 2019-11-29. The following list is intended to be a broad collection of samples of gauge covariant derivatives
from the gauge theory literature, relating their notations and meanings to the definitions and notations in
this book. This may be considered to be a core “output” from this book because it will unify a very disparate
collection of concepts from the physics and geometry literatures. Part of this mini-project will be to add
formal definitions to this book for all of the concepts in the list. This will take some time.... ))


(1) “$\partial_\mu + ieA_\mu$”, $\mu \in \{0,1,2,3\}$, $e \in \mathbb{R}^-$. (Itzykson/Zuber [277], page 64.)
These are the components of an OFB covariant derivative “applied” to $C^\infty(\mathbb{R}^4, \mathbb{C}^4)$.
Each component $\partial_\mu + ieA_\mu$ of the tuple $(\partial_\mu + ieA_\mu)_{\mu=0}^3$ is applied to a different linear combination
of the components of the tuple $(\psi_k)_{k=1}^4$. Thus in the Dirac representation of the gamma matrices,
$\partial_2 + ieA_2 = \partial_2 - ieA^2$ is applied to the column vector $[-i\psi_4(x), i\psi_3(x), i\psi_2(x), -i\psi_1(x)]$ for example,
whereas $\partial_3 + ieA_3 = \partial_3 - ieA^3$ is applied to the column vector $[\psi_3(x), -\psi_4(x), -\psi_1(x), \psi_2(x)]$.
Thus the combined tuple $(\partial_\mu + ieA_\mu)_{\mu=0}^3$ is not directly applied to functions in the space $C^\infty(\mathbb{R}^4, \mathbb{C}^4)$.
Each component of the tuple acts on a different function in $C^\infty(\mathbb{R}^4, \mathbb{C}^4)$.

(2) “$\partial_\mu - A_\mu(x)$”, $\mu \in \{0,1,2,3\}$, $x \in \mathbb{R}^4$. This is an abbreviation for:
“$\partial_\mu \delta_{ab} - (T^c)_{ab} A_{\mu c}(x)$”, $\mu \in \{0,1,2,3\}$, $a,b,c \in \mathbb{N}_n$, $x \in \mathbb{R}^4$. (Itzykson/Zuber [277], page 564.)
These are the components of an OFB covariant derivative applied to $C^\infty(\mathbb{R}^4, \mathbb{R}^N)$ or $C^\infty(\mathbb{R}^4, \mathbb{C}^N)$.


(( 2019-11-29. To be continued ... )) 


69.14.9 REMARK: The gauge covariant derivative for vector-valued functions on principal bundles. 

As suggested by Definition 69.10.2 and Theorem 69.10.4, the covariant derivative of relevance to gauge theory 
acts on vector-valued functions on a principal bundle. For this application, Definition 69.14.4 requires a drop 
function to be incorporated into the derivative. This is the motivation for Definition 69.14.10. 


69.14.10 DEFINITION: Gauge covariant derivative components for vector-valued functions. 
The gauge covariant derivative components on a $C^1$ principal $G$-bundle $(P,\pi,M,A^G_P)$ with connection form $\omega$,
for a left Lie transformation group $(G,F)$ of a finite-dimensional real linear space $F$, via a $C^1$ cross-section
X € XL,.(P, v, M) and a chart 7 € atlas(M) with index i € Nn, where n = dim(M), is the map 9s : 
Dom(X) n Dom(v) — (C1(P, F) — P) defined by 

Vp € Dom(X) n Dom(y), VY € C!(P, F), 


(Y) = 0 , y (Y)- 


((2019-7-27. To be continued ... )) 


69.14.11 REMARK: Alternative formula for pulled-back lifted chart-basis vector fields. 

Theorem 69.14.12 gives a slightly alternative version of Definition 69.14.3, showing that the pulled-back
lifted chart-basis vector field $\hat e^\omega_{X,\psi,i}$ is the same as the horizontal lift via $\beta$ of the chart-basis vector field $e_i^\psi$,
sampled at $X(p)$ via the cross-section $X \in X^1_{\mathrm{loc}}(P,\pi,M)$.


69.14.12 THEOREM: Alternative formula for gauge covariant derivative field.
Let $(P,\pi,M,A^G_P)$ be a $C^1$ principal $G$-bundle with transposed horizontal lift function $\beta$. Let $\omega$ be the
connection form corresponding to $\beta$. Then


$\forall X \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall \psi \in \mathrm{atlas}(M),\ \forall i \in \mathbb{N}_n,\ \forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi),$
$\hat e^\omega_{X,\psi,i}(p) = \mathrm{lift}_\beta(e_i^\psi)(X(p)) \in T_{X(p)}(P),$
where $e_i^\psi \in X^0(T(M)|U_\psi)$ is defined by $e_i^\psi : p \mapsto e_i^{p,\psi}$ for $p \in U_\psi = \mathrm{Dom}(\psi)$.


PROOF: By Definition 69.3.2, $\mathrm{lift}_\beta(e_i^\psi)(X(p)) = \beta_{X(p)}(e_i^\psi(\pi(X(p)))) = \beta_{X(p)}(e_i^\psi(p)) = \beta_{X(p)}(e_i^{p,\psi})$ for all
$p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(\psi)$, which is the same as $\hat e^\omega_{X,\psi,i}(p)$ by Definition 69.14.3.


69.14.13 REMARK: Dependence of pulled-back lifted vector fields on the choice of cross-section. 

According to Theorem 69.12.3, a connection form on a principal bundle can be reconstructed from
localisations which are given for at least one cross-section $X \in X^1_{\mathrm{loc}}(P,\pi,M)$ covering each $p \in M$. So the
“sampling” of $z = X(p) \in \pi^{-1}(\{p\})$ in Definition 69.14.3 is not throwing away as much information about
the connection $\omega$ as it may seem. This suggests that the localisations for all cross-sections should be
expressible in terms of a single given set of “covering localisations”, i.e. a family $(A_X)_{X \in S}$ as in the statement
of Theorem 69.12.7.

The transformation rule for pulled-back lifted vector fields via cross-sections $X,Y \in X^1_{\mathrm{loc}}(P,\pi,M)$ is very
simple if $X$ and $Y$ satisfy $\forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(Y)$, $Y(p) = X(p)g(p)$ for some “group element field”
$g : \mathrm{Dom}(X) \cap \mathrm{Dom}(Y) \to G$.


$\forall X,Y \in X^1_{\mathrm{loc}}(P,\pi,M),\ \forall \psi \in \mathrm{atlas}(M),\ \forall i \in \mathbb{N}_n,\ \forall p \in \mathrm{Dom}(X) \cap \mathrm{Dom}(Y) \cap \mathrm{Dom}(\psi),$


$\hat e^\omega_{Y,\psi,i}(p) = (dR^P_{g(p)})_{X(p)}(\hat e^\omega_{X,\psi,i}(p)).$ (69.14.4)


This follows directly from Definition 69.14.3 and Theorem 69.8.3 line (69.8.1). The simplicity of this rule
follows from the simplicity of the transformation rule for $\beta$ under right action maps. The transformation
rule for the “connection form localisation components” $A^\omega_{X,\psi}(p)_i$ in Theorem 69.14.6 line (69.14.2) is not
so simple. This transformation rule is given in Theorem 69.13.10 line (69.13.4). (The complexity of this
rule helps to explain why it is often excessively abbreviated, as mentioned in Remark 69.13.11.) In general,
the complexity of transformation rules expressed in terms of connection forms may be understood from the
complicated relation $\omega_z = \Lambda_z^{-1} \circ (\mathrm{id}_{T_z(P)} - \beta_z \circ (d\pi)_z)$ in Theorem 69.6.3 (xvii) between $\omega$ and $\beta$.
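

The short chain of equalities behind line (69.14.4) may be sketched as follows. It is assumed here that the
transformation rule for the transposed horizontal lift under right actions has the usual equivariant form
$\beta_{zh} = (dR^P_h)_z \circ \beta_z$ for $z \in P$ and $h \in G$. This is presumably the content of Theorem 69.8.3 line (69.8.1), but
that is an assumption of this sketch.

```latex
% Assumption: \beta_{zh} = (dR^P_h)_z \circ \beta_z  for z \in P, h \in G.
\begin{align*}
  \hat e^{\omega}_{Y,\psi,i}(p)
    &= \beta_{Y(p)}(e_i^{p,\psi}) \\
    &= \beta_{X(p)g(p)}(e_i^{p,\psi}) \\
    &= (dR^P_{g(p)})_{X(p)}\bigl(\beta_{X(p)}(e_i^{p,\psi})\bigr) \\
    &= (dR^P_{g(p)})_{X(p)}\bigl(\hat e^{\omega}_{X,\psi,i}(p)\bigr).
\end{align*}
% No derivative of g appears, because \beta_z is evaluated pointwise in z.
```

By contrast, the localisation of the connection form involves the differential of $Y = Xg$, which brings in the
differential of $g$ and hence the inhomogeneous second term of line (69.13.4).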


Line (69.14.4) shows that no information has been lost by making an arbitrary choice of the cross-section X.
In fact, the full global connection on the principal bundle can be recovered from these pulled-back lifted
chart-basis fields. However, one must of course ask why the original simple information should be scrambled
in this way. Hopefully there is some benefit for the formulation of quantum field theory.


69.14.14 REMARK: Commutators of the gauge theory version of covariant derivatives. 

One of the first things that is typically done with the “covariant derivatives” $D_i = \partial_i - A_i$ is to compute their
commutators. Thus apparently $[D_i, D_j] = [\partial_i - A_i, \partial_j - A_j] = -[\partial_i, A_j] + [\partial_j, A_i] + [A_i, A_j]$, or something of
that nature. Therefore terms such as $\partial_i A_j$ must be interpreted somehow.
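

In the naive matrix reading of the shorthand, the commutator can be expanded by applying it to a test function.
The following sketch assumes that $\partial_i$ is an ordinary partial derivative, that each $A_i$ is a $C^1$ matrix-valued
function acting by multiplication, and that $f$ is a $C^2$ vector-valued function. The function $f$ is introduced only
for this illustration, and none of this resolves the interpretation questions raised in this remark.

```latex
% Assumptions: \partial_i is a partial derivative, A_i is a C^1
% matrix-valued function, f is a C^2 vector-valued test function.
\begin{align*}
  D_i D_j f
    &= (\partial_i - A_i)(\partial_j f - A_j f) \\
    &= \partial_i \partial_j f - (\partial_i A_j) f - A_j\,\partial_i f
       - A_i\,\partial_j f + A_i A_j f, \\
  [D_i, D_j] f
    &= -(\partial_i A_j - \partial_j A_i) f + [A_i, A_j] f .
\end{align*}
% The second-order and first-order derivative terms of f cancel by
% symmetry in (i, j), leaving a multiplication operator.
```

In this reading, the operator commutator $[\partial_i, A_j]$ is multiplication by the matrix of partial derivatives $\partial_i A_j$,
which is how a term such as “$\partial_i A_j$” arises.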


Tangent vectors $V = t_{p,v,\psi} \in T_p(M)$, for general $C^1$ manifolds $M$, are converted to tangent differential
operators $\partial_V = \partial_{p,v,\psi} \in \mathring{T}_p(M)$ in Notation 54.11.12 by applying the chart and coordinates of the vector to
differentiate $C^1$ real functions as in Notation 54.11.3 line (54.11.1). (One of the difficulties with apparently
abstract differential operators is that they are often extended to functions which are not real-valued, for
which sometimes it is not entirely clear what the meaning of the derivative is. This is why tangent vectors
are not defined to be differential operators in this book.)


It is clear that the operator $\hat\partial^\omega_{X,\psi,i}(p)$ may be applied to a real-valued function $f \in C^1(P,\mathbb{R})$ to obtain a
directional derivative of $f$ at $X(p) \in P$ since $\hat e^\omega_{X,\psi,i}(p) \in T_{X(p)}(P)$. It is not quite so clear how one would
apply the vector $(dX)_p(e_i^{p,\psi})$, which is the term “$\partial_i$”, as a differential operator to differentiate the expression
$\Lambda_{X(p)}(A^\omega_{X,\psi}(p)_j)$, which is the term “$A_j$”. Not all juxtapositions of symbols have well-defined meanings!


There is a general problem with the interpretation of the terms of a commutator of differential operators,
as mentioned in the case of the Lie bracket in Remark 61.5.5. The two individual terms of a commutator
$XY - YX$ have an ambiguous meaning. Only when the symmetric non-tensorial parts have been cancelled by
antisymmetrisation, and the remainder has been suitably “dropped” into the right space, is it then possible
to say that the output is well defined.

In this particular scenario, the term $(dX)_p(e_i^{p,\psi})$ is a vector in $T_{X(p)}(P)$, whereas the term $\Lambda_{X(p)}(A^\omega_{X,\psi}(p)_i)$
is its vertical component in $T_{X(p),0}(P)$, which must be subtracted from it to make it horizontal. The
commutator of local vector fields on $P$ is well defined if $P$ is a $C^2$ manifold. Unfortunately, the maps
$p \mapsto (dX)_p(e_i^{p,\psi})$ and $p \mapsto \Lambda_{X(p)}(A^\omega_{X,\psi}(p)_i)$ are not local vector fields on $P$. They are only “samples” of
vector fields on $P$. Their common domain is an open subset of $M$, not $P$. This contributes some additional
complexity to the interpretation of the commutator “$[\partial_i - A_i, \partial_j - A_j]$”.


((2017-1-8. To be continued ... ))


((2019-7-20. Try to construct the Cartan forms $\omega$ in Definition 68.3.8 and the corresponding coefficients in
Definition 68.3.2 as pulled-back lifted chart-basis vector fields as in Definition 69.14.3, or something similar.
The Cartan forms and coefficients are constructed from basis fields, which are in essence cross-sections of
principal frame bundles. (See the frame bundles F(M) in Definition 57.11.2 and F(E, π_E, M) in Notation 65.8.5.)
So something like this should be possible. Then this should justify the claim in Remark 69.15.1 that: “The
gauge potential is the natural generalisation of the Cartan-style moving-frame style of connection to general
principal bundles.” ))


69.15. Connection definition conversions overview 


69.15.1 REMARK: The many ways to represent connections on differentiable fibre bundles. 

Including gauge potentials, Table 69.15.1 lists nine distinct ways to represent connections. (The moving- 
frame connection is a special case of the gauge potential. See also the similar-looking Table 67.2.1 for 
representations of general parallelism.) 


representation                                scope   notation        section
Covariant derivative for vector V, field Y    VB      $D_V Y$         68.2
Christoffel array                             VB      $\Gamma$        68.1
Koszul connection for vector fields X, Y      T(M)    $D_X Y$         71.6
Horizontal lift function                      FB      $\theta_V(z)$   67.5
Horizontal lift function transposed           FB      $\beta_z(V)$    67.8
Horizontal component map                      FB      $h_z$           67.9
Horizontal subspace                           FB      $Q_z$           67.9
Vertical component map                        FB      $v_z$           67.10
Connection form (Ehresmann)                   PFB     $\omega_z$      69.5
Gauge potential                               PFB     $A_X$           69.11
Moving-frame connection (Cartan)              F(M)    $A_X$           71.14

Table 69.15.1 Representations of connections on differentiable fibre bundles


The abbreviations VB, T(M) and F(M) in Table 69.15.1 mean respectively vector bundles, tangent bundles 
and principal frame bundles. PFB means general principal fibre bundles, and FB means general fibre bundles, 
which includes both ordinary and principal fibre bundles. (The superset relation for these fibre bundle classes 
is illustrated in Figure 69.15.2.) 


[Figure: FB = OFB (fibre bundle) at the top; VB (vector bundle) and PFB (principal fibre bundle) on the middle
level; T(M) (tangent bundle) below VB, and F(M) (principal frame bundle) below PFB.]


Figure 69.15.2 Superset relation for some classes of differentiable fibre bundles


The gauge potential is the natural generalisation of the Cartan-style moving-frame style of connection to 
general principal bundles. (See Definition 71.14.2.) The role of moving frames is then played by cross-sections 
of the principal bundle. In both cases, a Lie-algebra-valued field is defined on the base space, and this field 
depends on the choice of local cross-section. 


Since the Cartan-style frame-field affine connection is a special case of the gauge potential, there are only
nine independent connection styles in Table 69.15.1. However, gauge potentials are rarely mentioned in pure
mathematical differential geometry texts. So the moving-frame style is listed separately here.


The two most popular connection styles in Table 69.15.1 are the Koszul and Cartan styles, which also
have the narrowest scope. They are meaningful only for affine connections on tangent bundles and principal
frame bundles respectively, whereas the other styles are meaningful for general connections on vector bundles,
ordinary fibre bundles or principal bundles, which are much more general than the tangent bundles T(M)
for Koszul connections, and the frame bundles F(M) for Cartan connections.


69.15.2 REMARK: Conversion rules between connection representation styles. (Rosetta stone.) 

Table 69.15.3 is an overview of conversion rules between representation styles for connections on differentiable 
fibre bundles. The abbreviations for the connection styles are as in the notation column of Table 69.15.1 in 
Remark 69.15.1. (Row labels indicate inputs. Column labels indicate outputs.) 


IN \ OUT        $D_V Y$         $\Gamma$   $D_X Y$   $\theta_V(z)$    $h_z$            $Q_z$            $v_z$             $\omega_z$        $A_X$
$D_V Y$         -               -          -         68.2.18          -                -                -                 -                 -
$\Gamma$        -               -          -         68.1.10          -                -                -                 -                 -
$D_X Y$         -               -          -         71.6.13          -                -                -                 -                 -
$\theta_V(z)$   68.2.9          68.1.8     71.6.9    -                67.11.2 (iv)     67.11.2 (x)      67.11.2 (vii)     69.6.3 (xvii)     -
$h_z$           68.2.13 (ii)    -          -         67.11.2 (i)      -                67.11.2 (xi)     67.11.2 (viii)    69.6.3 (xviii)    -
$Q_z$           -               -          -         67.11.2 (iii)    67.11.2 (vi)     -                67.11.2 (ix)      69.6.3 (xx)       -
$v_z$           68.2.13 (i)     -          -         67.11.2 (ii)     67.11.2 (v)      67.11.2 (xii)    -                 69.6.3 (xix)      -
$\omega_z$      -               -          -         69.6.3 (iv)      69.6.3 (viii)    69.6.3 (xvi)     69.6.3 (xii)      -                 69.11.3
$A_X$           -               -          -         -                -                -                -                 69.12.3           -

Table 69.15.3 Conversion rules (Rosetta stone) between connection styles


The conversions in Table 69.15.3 are valid only within the common scope of the input and output styles. 
Therefore a sequence of two or more conversions may be valid for only a sub-scope of a direct conversion. 


The Cartan-style moving-frame connection for principal frame bundles of tangent bundles, which is listed
in Table 69.15.1, is one of the most popular representations of affine connections. However, this connection
style is not listed in Table 69.15.3 because it is, technically speaking, a special case of “gauge potentials”,
which extend the concept from principal frame bundles to general principal bundles.


Also omitted in Table 69.15.3 are the connection conversions which are possible between associated fibre
bundles by defining associated connections. (See Sections 67.12, 69.9 and 69.10.) With associated
connections, it is possible to “copy” connections from PFBs to OFBs and vice versa, and sometimes the connection
styles on the exporting and importing fibre bundles may be different, as in Theorems 69.9.4 and 69.9.6. For
example, a connection form on a PFB is often exported to an OFB as a covariant derivative. Thus the
mechanism of associated connections is additional to the conversion methods listed in Table 69.15.3.


69.15.3 REMARK: Connection generator functions may be considered to be a connection representation. 
The connection generator functions in Definition 67.6.5 may be considered to be a style of representation for 
connections in addition to those listed in Remarks 69.15.1 and 69.15.2. In fact, it seems that the generator 
functions are the most fundamental representation of connections. They are quite likely not generally used as 
a basic definition because they are chart-dependent. However, the gauge potential representation, including 
the Cartan-style moving-frame style as a special case, is effectively chart-dependent via the arbitrary choice 
of a local cross-section of the principal bundle, which is equivalent to the choice of a PFB chart. Likewise, the 
Koszul-style vector field formalism replaces an arbitrary chart with an arbitrary vector field as its principal 
parameter. In these cases, the chart-dependence is only moderately well camouflaged. 


The horizontal lift function, horizontal and vertical component functions, and horizontal subspace connection 
styles are based on a chart-dependent connection generator function. The essence of a connection is an 
infinitesimal action by the structure group on the fibre space, which is the differential of parallel transport 
observed via some fibre chart. So it would seem reasonable to consider the connection generator function 
as a connection style on an equal footing with the others. All of the other styles may be considered to be 
merely encapsulations or “convenient packaging” for the more fundamental concept of Lie algebra elements 
acting on the fibre space. 


The following table gives some conversions from and to the connection generator function.


IN \ OUT   $D_V Y$   $\Gamma$   $D_X Y$   $\theta_V(z)$   $h_z$   $Q_z$   $v_z$   $\omega_z$   $A_X$
ig > 67.6.7 (ii) 67.1.6 (ii) 67.11.6 (iv) 69.7.3 
üg 4— 69.7.9 (i) 69.7.9 (vi) 69.7.9 (iv) — 69.7.9 (iii) 


These conversion rules may be combined with the rules in Table 69.15.3 to obtain further conversions. The
intention here is to provide a kind of “Rosetta stone” or “polyglot dictionary” to help translate between the
numerous idioms of differential geometry, which differ most conspicuously in their definitions of connections.


CURVATURE OF CONNECTIONS ON FIBRE BUNDLES


70.1 Holonomy
70.2 Curvature of connections on ordinary fibre bundles
70.3 Curvature of parallel transport on ordinary fibre bundles
70.4 Curvature of connections on vector bundles
70.5 Curvature of connections on principal bundles
70.6 Gauge localisation of curvature forms
70.7 Justification of definitions of curvature of connections
70.8 Gauge theory


70.0.1 REMARK: Literature for curvature of connections. 
Some references for presentations of curvature of general connections are listed in Table 70.0.1. 


year   reference                                                 VB    PFB    T(M)    F(M)
1918   Weyl [310], pages 117-121                                 -     -      T(M)    -
1949   Synge/Schild [41], pages 292-296                          -     -      T(M)    -
1959   Willmore [42], pages 214-221                              -     -      T(M)    F(M)
1963   Flanders [11], pages 143-148                              -     -      T(M)    F(M)
1963   Kobayashi/Nomizu [19], pages 75-79, 118-146               -     PFB    T(M)    F(M)
1964   Bishop/Crittenden [2], pages 80-87, 99-102                -     PFB    -       F(M)
1968   Bishop/Goldberg [3], pages 231-235                        -     -      -       F(M)
1968   Choquet-Bruhat [6], pages 239-241, 258-260                -     PFB    T(M)    F(M)
1970   Misner/Thorne/Wheeler [292], pages 265-282, 348-352       -     -      T(M)    F(M)
1970   Spivak [37], Volume 2, page 324                           -     PFB    -       -
1972   Sulanke/Wintgen [40], pages 144-166                       -     PFB    -       -
1977   Drechsler/Mayer [262], pages 90-91                        -     PFB    -       -
1980   Schutz [36], pages 210-214                                -     -      T(M)    -
1981   Bleecker [254], page 37                                   -     PFB    -       -
1981   Poor [32], pages 67-71, 280-288                           VB    PFB    -       -
1983   Nash/Sen [30], pages 174-181                              -     PFB    -       -
1986   Crampin/Pirani [7], pages 273-283, 373-382                VB    -      T(M)    F(M)
1994   Darling [8], pages 202-206                                VB    -      -       -
1997   Frankel [12], pages 244-255, 431                          VB    -      T(M)    F(M)
1999   Lang [23], pages 231-239                                  -     -      T(M)    -
2004   Szekeres [305], pages 513-514, 528-529                    -     -      T(M)    F(M)
2012   Sternberg [38], pages 91-93, 334-335                      -     PFB    T(M)    -
2015   Gómez-Ruiz [14], pages 160-166                            -     -      T(M)    F(M)

Table 70.0.1 Survey of presentations of curvature of connections


This rough list excludes curvature definitions which are applicable only to Levi-Civita connections. The
column marked “F(M)” includes curvature of connection forms on tangent frame bundles, and also curvature
of Cartan-style connection form matrices for tangent frame fields.


The column marked “VB” in Table 70.0.1 refers to vector bundles. 


70.0.2 REMARK: Curvature is definable for vector bundles, but not for general ordinary fibre bundles. 
The curvature of an OFB connection essentially requires a covariant derivative of some kind, from which
the curvature can be computed as a commutator. As mentioned in Remark 68.2.2, it is difficult to define a
covariant derivative for general ordinary fibre bundles, but the linearity of vector bundle connections, asserted
in Theorem 68.1.3, implies that vertical tangent vectors in $T_{z,0}(E)$ for a $C^2$ $(G,F)$ fibre bundle $(E,\pi,M,A^F_E)$
can be meaningfully mapped to elements of $E$ by first mapping $y \in T_{z,0}(E)$ to $(d\phi)_z(y) \in T(F)$ via a fibre
chart $\phi$ whose domain contains $z$, then mapping this to $F$ via the implicit tangent bundle drop function for
Banach spaces which is mentioned in Remark 53.3.13, and then mapping the result back to $E$ via $\phi$.


By contrast, it is very difficult to see how to “drop” vectors from the tangent bundle of a general C! fibre 
space to the fibre space itself. Such a drop function is needed so that the commutator of compositions of 
covariant derivatives will be meaningful. Thus although a covariant derivative in the total-space tangent 
space can be defined, its output cannot be dropped to the total space itself so that it can be acted on by a 
second covariant derivative to compute the curvature as the commutator of the covariant derivatives. 


70.0.3 REMARK: Curvature is well defined for principal fibre bundles. 

Remark 70.0.2 explains why there is no column marked “OFB” for ordinary fibre bundles in Table 70.0.1. 
Oddly enough, there is no requirement for the fibre space to be a linear space in the case of principal fibre 
bundles. This is because a PFB connection can be expressed as a connection form, which is a Lie-algebra- 
valued differential 1-form on the principal bundle total space. A PFB connection form is thus vectorial in 
nature because Lie algebras are linear spaces. 


The curvature of a connection 1-form can be computed as a corresponding 2-form. Such a 2-form can be 
applied to pairs of base-space vectors to produce a Lie algebra element which is then the curvature for that 
vector-pair. Connection forms are specific to principal bundles, which is why they are not available for 
general OFBs. 
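

For orientation, the usual textbook form of this computation is the structure equation shown below. It is quoted
here as a standard formula under stated conventions, not as the definition adopted in this book. The convention
assumed for the exterior derivative is $d\omega(U,V) = U(\omega(V)) - V(\omega(U)) - \omega([U,V])$ for vector fields $U$, $V$, and
factor conventions differ between authors.

```latex
% \omega: connection 1-form on P with values in the Lie algebra T_e(G).
% \Omega: curvature 2-form on P.  U, V: tangent vectors at a point z of P.
% Convention assumed: d\omega(U,V) = U(\omega(V)) - V(\omega(U)) - \omega([U,V]).
\begin{align*}
  \Omega(U, V) &= d\omega(U, V) + [\omega(U), \omega(V)] .
\end{align*}
% If U and V are horizontal, then \omega(U) = \omega(V) = 0, so the
% bracket term vanishes and \Omega(U, V) = d\omega(U, V).
```

Under these conventions, evaluating the 2-form on the horizontal lifts of a pair of base-space vectors gives
the Lie algebra element which is described above as the curvature for that vector-pair.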


Since principal bundles are not generally vector bundles, their covariant derivatives are not suitable for 
composition and commutation. However, the curvature can be computed from PFB covariant derivatives 
by first expressing them in terms of the connection form, which can have an exterior derivative applied to it 
to compute the curvature form. This ultimately yields a vertical vector on the principal bundle total space, 
which corresponds to a Lie algebra element. This can then be applied as an “infinitesimal transformation” 
to the corresponding fibre set of an associated fibre bundle. 


70.1. Holonomy 


70.1.1 REMARK: The two opposite meanings of “holonomy” of parallel transport. 

The original meaning of “holonomic” parallel transport, given in 1925 by Elie Cartan [171, 174], was parallel
transport for which the holonomy group at each point consisted of the identity group element only. In other 
words, it meant parallel transport where every vector is transported back to the same vector, irrespective of 
the choice of path. Then the word “holonomy” meant the property possessed by a space which is holonomic. 
This was expressed very clearly by Elie Cartan [171], page 11. (Also published as [174], page 89.) 


En définitive, à l'espace de Riemann donné est associé un sous-groupe g déterminé du groupe G
des déplacements euclidiens, sous-groupe qui peut se confondre avec le groupe G lui-même, mais
qui peut aussi se réduire à la transformation identique; dans ce dernier cas il est bien évident que 
l'espace de Riemann est complètement holonome et ne diffère qu'en apparence de l'espace euclidien 
proprement dit. Il est naturel de donner au groupe g le nom de “groupe d’holonomie” de l'espace 
de Riemann. 


Plus généralement, à tout espace non holonome de groupe fondamental G est associé un sous- 
groupe g de G qui est son groupe d'holonomie et qui ne se réduit à la transformation identique que 
si l'espace est parfaitement holonome. 


Le groupe d'holonomie d'un espace mesure en quelque sorte le degré de non holonomie de cet 
espace, de même que le groupe de Galois d'une équation algébraique mesure en quelque sorte le
degré d'irrationalité des racines de cette équation. 

This may be translated into English as follows. (Note that “Euclidean translations” here effectively means
transport by an orthogonal, i.e. Riemannian, connection, and “fundamental group” means “structure group”,
although fibre bundles are not mentioned in the article, which is not surprising because they hadn’t been 
invented yet.) 


So finally, to the given Riemann space is associated a subgroup g which is determined from the 
group G of Euclidean translations, a subgroup which may coincide with the group G itself, but may 
also reduce to the identity transformation. In this latter case, it is quite clear that the Riemann 
space is completely holonomic and does not differ except in appearance from the actual Euclidean 
space. It is natural to give to the group g the name “holonomy group” of the Riemann space. 


More generally, to every non-holonomic space with fundamental group G is associated a subgroup 
g of G which is its holonomy group, and which reduces to the identity transformation only if the 
space is perfectly holonomic. 


The holonomy group of a space measures in some way the degree of non-holonomy of this space, 
just as the Galois group of an algebraic equation measures in some way the degree of irrationality 
of the roots of this equation. 


Some modern authors use the word “holonomy” to mean the deviation from holonomy, which is a measure 
of nonholonomy. This seems to be a confusion due to the term “holonomy group”, which is the group 
of deviations from holonomy, not the “group of holonomies”. So it would be more accurate to call it 
the “nonholonomy group”. In this book, the word “holonomy” is used in its original sense, which closely 
corresponds to its meaning in the context of vector fields whose Lie bracket equals zero. 


70.1.2 REMARK: The importance of nonholonomy for curvature of connections. 
Very roughly speaking, one could say that holonomy deviation is the integral version of curvature in the 
same way that parallelism is the integral version of connections. However, the analogy is not perfect. 


Parallel translation along curves may be differentiated with respect to the curve parameter to obtain the 
connection value in the direction of the curve’s velocity at each point. Then the parallel transport may 
be recovered by integration. Similarly, curvature may be obtained from nonholonomy by taking limits of 
nonholonomy-to-area ratios for small surfaces, but it is not generally possible to recover holonomy deviation 
from the curvature at each point by simple integration. (This is because Lie algebras are typically
non-commutative.)
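
The obstruction can be made explicit with a short sketch in standard notation which is not taken from this book: $\omega$ denotes a connection form, $\Omega$ its curvature form, and $S$ a small surface bounded by the loop $\partial S$. Signs and orientations depend on conventions, so this is an illustration under those assumptions rather than a statement of the definitions used here.

```latex
% Abelian structure group (for example U(1)): Stokes' theorem applies directly.
\mathrm{hol}(\partial S) = \exp\Bigl(\oint_{\partial S} \omega\Bigr)
                         = \exp\Bigl(\iint_S \Omega\Bigr),
\qquad \Omega = d\omega .

% Non-abelian structure group: the holonomy is a path-ordered exponential,
\mathrm{hol}(\partial S) = \mathcal{P}\exp\Bigl(\oint_{\partial S} \omega\Bigr),
% and the contributions of neighbouring surface elements do not simply add,
% because of the Baker-Campbell-Hausdorff formula:
\exp(X)\exp(Y) = \exp\bigl(X + Y + \tfrac{1}{2}[X,Y] + \cdots\bigr).
```

In the abelian case the bracket terms vanish, which is why simple integration of the curvature suffices there.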


Holonomy deviation is the geometric concept which underlies curvature. So it is important for motivating 
definitions of curvature, but holonomy groups are not so important for setting up the basic structures of 
differential geometry. Holonomy groups are important for classification of global combinatorial topology, not 
so much for defining purely local concepts. 


70.1.3 REMARK: Basic properties of holonomy groups. 

Holonomy deviation is the parallel transport of fibre bundle elements along closed curves from and to a single 
point. The set of all closed-curve parallel-transport actions on a fibre set $E_b = \pi^{-1}(\{b\})$ at a point $b \in B$
for a fibre bundle $(E, \pi, B)$ constitutes a group because the composition of two actions is obtained by the
concatenation of two closed curves, the inverse of an action is obtained by reversing the curve, and the 
identity action corresponds to a constant curve (or the concatenation of any curve with its reverse). 
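
In symbols, writing $\tau_\gamma : E_b \to E_b$ for parallel transport around a closed curve $\gamma$ based at $b$ (a notation assumed here for illustration only, as are $\bar{\gamma}$, $c_b$ and $\mathrm{Hol}_b$), the three group properties read roughly as follows.

```latex
\tau_{\gamma_2 * \gamma_1} = \tau_{\gamma_2} \circ \tau_{\gamma_1},
\qquad
\tau_{\bar{\gamma}} = (\tau_\gamma)^{-1},
\qquad
\tau_{c_b} = \mathrm{id}_{E_b},
% where \gamma_2 * \gamma_1 traverses \gamma_1 first and then \gamma_2,
% \bar{\gamma} is \gamma traversed in reverse, and c_b is the constant curve at b.
\mathrm{Hol}_b = \{\, \tau_\gamma \;;\; \gamma \text{ is a closed curve based at } b \,\}.
```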


If the holonomy group at a point contains only the identity action, then the connection is flat. In the case 
of the Levi-Civita connection on a Riemannian manifold, the holonomy group at each point is a subgroup 
of the orthogonal group on the tangent space at the point. 


70.1.4 REMARK: Holonomy group history and literature. 

The introduction of the concept of holonomy groups is generally attributed to two papers in 1926/1927 by 
Elie Cartan [172, 173]. 

Holonomy groups are presented and discussed by Petersen [31], pages 252-262; Sulanke/Wintgen [40], 
pages 177-186; Poor [32], pages 51-71, 129-130, 280-288; Crampin/Pirani [7], pages 378-382; Kobayashi/ 
Nomizu [19], Volume 1, pages 71-74, 89-91, 94-102, 179-197, 244-247; Kobayashi/Nomizu [20], Volume 2,
pages 204-209; Guggenheimer [16], pages 320-327; Frankel [12], pages 259-263.


((2020-7-4. Chapter 70 is a dog's breakfast after this point. Please ignore it. )) 

70.2. Curvature of connections on ordinary fibre bundles


70.2.1 REMARK: Seeking a unified definition of curvature for all kinds of fibre bundles. 

As illustrated in the family tree in Figure 65.0.1 in Remark 65.0.1, ordinary fibre bundles include the other 
kinds of fibre bundles as special cases. So it seems reasonable to try to define curvature first for ordinary 
fibre bundles, and then apply this top-level definition to all of the other cases. 


The difficulty with this strategy is that although curvature is relatively straightforward to define for the 
special cases, namely vector bundles, tangent bundles, principal bundles and principal frame bundles, it is 
not easy even to say what it means for the top level, namely ordinary fibre bundles.


In the case of principal bundles (including principal frame bundles), the use of connection forms to represent 
connections implies that curvature can be represented by a Lie-algebra-valued two-form, which does not 
need to be dropped to a lower manifold. In the case of vector bundles (including tangent bundles), it is 
straightforward to drop the effect of a Lie algebra element to the total space. But in the case of general 
ordinary fibre bundles, there is no drop function, and the connection cannot be represented as a connection 
form. So it is somewhat problematic to even represent the curvature as a reasonable kind of object. 


70.2.2 REMARK: Criteria for choosing definitions of curvature. 

It is not too difficult to give an explicit, broadly applicable definition for curvature. However, it is important 
to know the criteria which are used to decide how curvature is defined. To deserve the name “curvature”, 
the mathematical definition should preferably correspond to some geometric idea which is already familiar. 


There was a similar requirement for geometric criteria for a definition of parallelism in Riemannian manifolds 
in 1917 when the article by Levi-Civita [187] introduced the definitive modern version of the definition. (There 
had been earlier attempts to define parallelism which were not satisfactory. The earlier attempt by Clifford 
is mentioned in Remark 67.1.2.) 


The Levi-Civita parallelism criterion assumed the prior definition of a Riemannian metric tensor field. Then 
the parallelism definition was required to preserve lengths and angles of vectors, and also have zero torsion. 
Of course one must then have criteria for the choice of the metric tensor field. In the case of hypersurfaces 
embedded in Euclidean space, the choice of metric is automatic. In general relativity, the pseudo-Riemannian 
metric is given as a solution of an initial-boundary-value problem, which then determines parallel transport. 
In the absence of a metric or pseudo-metric, the notion of parallelism must be decided by external criteria, 
which must be found in the context of the situation which one wishes to model. 


The criteria for defining curvature are simpler in principle, assuming that a connection is given. As mentioned
in Remark 70.2.3, the definition of curvature is determined as the infinitesimal limit of the quotient
of holonomy deviation around a loop divided by the area of the loop. This nonholonomy-per-area limit 
characterisation of curvature is generally attributed to Elie Cartan. In the case of Riemannian manifolds, 
curvature had already been defined in the tensor calculus literature of the 19th century. (See Spivak [37], 
Volume 2, pages 184-199; Riemann/Weyl [230], pages 33-34; Bianchi [204], pages 72-74.) Such a curvature
concept was necessarily based on the covariant derivative concept, particularly because it was expressed in 
terms of the Christoffel array. So it was a kind of exterior derivative of the covariant derivative operator 
around infinitesimal loops. Since parallel transport and covariant differentiation are effectively two sides of
the same coin, the parallelism holonomy deviation is equal to the negative of the covariant derivative holonomy
deviation. But even though the Cartan concept is numerically just the negative of the 19th century
concept, it is in fact a more modern way of thinking. 


70.2.3 REMARK: The nonholonomy-per-area-limit vindication for curvature definitions. 

The fundamental justification for general definitions of curvature is the correspondence between curvature 
at a point and the infinitesimal limit of the quotient of holonomy deviation around a loop divided by the 
area of the loop. This holonomy-deviation-per-area-limit characterisation of curvature is attributed to Elie 
Cartan by Petersen [31], page 256, who gives a sketch proof for Riemannian manifolds, and by Poor [32], 
page 67, who gives a proof for general vector bundles. (See also Ambrose/Singer [168], pages 433-435.) 
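
The holonomy-deviation-per-area characterisation can be checked numerically in the simplest curved example. The following is a minimal sketch, not a construction from this book: it assumes the unit sphere with its Levi-Civita connection, transports a tangent vector once around a latitude circle with a hand-written Runge-Kutta integrator, and divides the resulting rotation angle by the area of the enclosed polar cap. The function names `transport_around_latitude` and `holonomy_per_area` are hypothetical, and the sign of the angle depends on the orientation convention chosen.

```python
import math


def transport_around_latitude(theta0, steps=2000):
    """Parallel-transport a tangent vector once around the latitude circle
    at colatitude theta0 on the unit sphere (Levi-Civita connection).

    Components (a, b) are taken with respect to the orthonormal frame
    (d/dtheta, (1/sin theta) d/dphi).  Along the circle, parametrised by
    phi, the transport ODE is  da/dphi = cos(theta0) * b,
    db/dphi = -cos(theta0) * a.
    """
    c = math.cos(theta0)

    def rhs(a, b):
        return c * b, -c * a

    a, b = 1.0, 0.0                  # initial vector: unit vector along d/dtheta
    h = 2.0 * math.pi / steps
    for _ in range(steps):           # classical fourth-order Runge-Kutta steps
        k1a, k1b = rhs(a, b)
        k2a, k2b = rhs(a + 0.5 * h * k1a, b + 0.5 * h * k1b)
        k3a, k3b = rhs(a + 0.5 * h * k2a, b + 0.5 * h * k2b)
        k4a, k4b = rhs(a + h * k3a, b + h * k3b)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return a, b


def holonomy_per_area(theta0):
    """Rotation angle of the transported vector divided by the cap area."""
    a, b = transport_around_latitude(theta0)
    angle = math.atan2(b, a)                         # deviation from the identity
    area = 2.0 * math.pi * (1.0 - math.cos(theta0))  # area of the polar cap
    return angle / area


if __name__ == "__main__":
    for theta0 in (0.3, 0.1):
        print(theta0, holonomy_per_area(theta0))
```

For small caps the printed ratio is close to 1, which is the Gaussian curvature of the unit sphere. On a surface of non-constant curvature the ratio would only approach the curvature in the limit of small loops.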


It could be argued that the Cartan justification for curvature definitions is of mere philosophical value, and 
that computations and constructions are unaltered by the knowledge that a definition is equivalent to some 
complicated analytical-geometric formula in terms of solutions of an ODE for parallel transport around a 
loop which converges to a point. The value of the justification becomes even more questionable if it requires 
substantial analytical effort. However, there are occasions when the purported formulas for curvature have
little resemblance to intuitive geometric notions. Do Carmo [9], page 88, wrote something similar. 


As frequently happens in mathematics, a “workable” formulation of the concept of curvature required 
a long time for its development. When such a formulation finally appeared it had the advantage of 
being easy to use to prove theorems but it had the disadvantage of being so far removed from the 
initial intuitive concept that it looked as if it were some kind of arbitrary creation. 


So it is comforting to know that a valid vindication for curvature definitions does exist. 


70.2.4 REMARK: The inevitability of Lie algebras for curvature on ordinary fibre bundles. 

Since the Lie algebra of the structure group of a differentiable ordinary fibre bundle is constructed from the
structure group, and the structure group is the fibre space for a principal fibre bundle, it might seem that this
Lie algebra can be avoided in the definition of curvature for OFBs. However, the transformation of a fibre set
$E_p$ of a $(G, F)$ fibre bundle $(E, \pi, M, A_E^F)$ by parallel transport around an almost everywhere differentiable
loop at $p$ must be equal to the action of some element $g \in G$ acting on $E_p$ via some chart $\phi \in A_E^F$. (The
element $g$ depends on $\phi$ in general.) When the limit of this holonomy deviation for infinitesimal loops, divided
by some kind of “area” of the loop, is computed, the result must be equal to an element of the Lie algebra
acting on the vertical space via the inverse of $\phi$. It is apparently impossible to avoid this role for
the Lie algebra in the definition of curvature for OFBs.


One possible counter-argument to this “inevitable” role for Lie algebras in curvature for OFBs is that in the 
case of connections on differentiable fibrations in Definition 67.4.2, there is no explicit structure group at all, 
and no fibre atlas. The horizontal lift function $\theta$ in Definition 67.4.2 may be consistent with some implicit
infinite group of $C^1$ diffeomorphisms of the fibre space $F$, for example. In this case, the usual meaning of
a Lie algebra would not be applicable. Then it would be necessary to define curvature in terms of a much 
more general kind of differential of some implicit structure group. This counter-argument is not very strong 
because structure groups which are not subgroups of the general linear group for a finite-dimensional linear 
space are rarely encountered. 


Another possible counter-argument to the inevitability of the Lie algebra role could be that one does not
see Lie algebras in the classical tensor calculus formalism for differential geometry. The Christoffel array
$[\Gamma^i_{jk}]_{i,j,k=1}^n$ seems to encapsulate all of the information about a connection on a tangent bundle, but it seems
to act only on a tangent space, not on a structure group. However, closer inspection reveals that the
square array $a = [a^i_j]_{i,j=1}^n = [\sum_{k=1}^n \Gamma^i_{jk} v^k]_{i,j=1}^n$ for $(v^k)_{k=1}^n \in \mathbb{R}^n$ is an element of the Lie algebra of $GL(n)$.
Thus the Christoffel array is in fact an array of $n$ Lie algebra elements, one for each chart-basis velocity
vector in $T_p(M)$. Since the connection itself is in fact comprised of Lie algebra elements, the curvature
is also inevitably most naturally expressed in terms of the Lie algebra of the structure group.
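
As a hedged illustration of this point, the familiar coordinate formula for the Riemann curvature can be rewritten in matrix form. The notation $A_k$ and the index convention $\nabla_{\partial_k} \partial_j = \Gamma^i_{kj} \partial_i$ are assumptions made for this sketch, and the sign is that of the covariant derivative version, which may differ from the parallel transport convention used in this chapter.

```latex
% One gl(n) matrix per coordinate direction k:
(A_k)^i{}_j = \Gamma^i_{kj}, \qquad k = 1, \ldots, n.

% Component form of the curvature:
R^i{}_{jkl} = \partial_k \Gamma^i_{lj} - \partial_l \Gamma^i_{kj}
            + \Gamma^i_{km} \Gamma^m_{lj} - \Gamma^i_{lm} \Gamma^m_{kj}.

% The same formula as a gl(n)-valued expression with a matrix commutator:
R_{kl} = \partial_k A_l - \partial_l A_k + [A_k, A_l],
\qquad (R_{kl})^i{}_j = R^i{}_{jkl}.
```

The commutator $[A_k, A_l]$ is the Lie bracket of $\mathfrak{gl}(n)$, which is one way of seeing that the tensor calculus formula is already a Lie algebra expression.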


In general, connections on differentiable (G, F) fibre bundles are inevitably expressed in terms of Lie algebras 
via condition (v) of Definition 67.5.4, and this implies the inescapability of the role for Lie algebras in any 
definition of curvature. 


This conclusion then leads to the question of whether definitions of curvature for OFBs should be abandoned 
in favour of curvature definitions for their associated PFBs. This is effectively what is done in most of 
the literature. Even the familiar formula for Riemann curvature in tensor calculus is in fact a Lie algebra 
expression expressed in terms of Lie-algebra-valued differential forms. It seems reasonable to at least make the 
attempt, since failure will help motivate the abandonment in favour of principal bundle curvature definitions, 
but failure is not inevitable! 


In the case of general vector bundles, Lie algebra elements which represent the curvature may be identified 
with matrices, which look superficially like the elements of the structure group itself. So the appearance that 
one is writing the curvature in terms of formulas for the ordinary fibre bundle can be maintained as in the 
case of affine connections on tangent bundles, where the Christoffel arrays are superficially acting on the 
ordinary fibre bundle. In the case of more general OFBs, the Lie algebra element is no longer identifiable 
as a matrix of the same kind as in the structure group. Then the action of the Lie algebra element on fibre 
sets must be expressed as infinitesimal actions of Lie algebra elements. Therefore the illusion of avoiding the 
associated PFB is removed. 


70.2.5 REMARK: Curvature of general connections on general differentiable fibrations. 
It is philosophically interesting, but probably of little or no practical value, to consider how one might define 
curvature for general connections for the general differentiable fibrations in Definition 67.4.2. Failure to
define curvature in such a very general case could give some added understanding of why various constraints 
are introduced for the more successful specific cases. 


As mentioned in Remark 67.4.6, it is difficult to even define a meaningful kind of connection for a differentiable 
fibration which does not have a finite-dimensional structure group. The “tangent algebra” in Definition 63.2.5 
for a group of $C^1$ diffeomorphisms of a $C^1$ manifold is almost not worth the paper it is written on. Such
a “tangent algebra” could be used to constrain the horizontal lift function of a $C^1$ fibration so that some
notion of parallel transport is well defined at least locally. But even if parallel transport is well defined, the 
even more difficult task is to compute the commutator or infinitesimal holonomy deviation of such transport. 
Definition of such a commutator requires a covariant derivative which can be applied twice to the same real- 
valued function on the total space. Then this composition must be commutated and identified with some 
kind of Lie algebra for a very general group which does not have a finite-dimensional manifold structure. 


From this “thought-experiment”, some of the minimal prerequisites for a meaningful curvature definition 
begin to emerge. 


((2017-1-13. To be continued ... )) 


70.3. Curvature of parallel transport on ordinary fibre bundles 
((2017-1-14. Section 70.3 needs to be thoroughly overhauled. Please ignore it for now. )) 


70.3.1 REMARK: Applicability of “toy Riemann curvature” to lift functions on ordinary fibre bundles. 

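To apply Theorem 70.7.6 to the genuine horizontal lift functions in Definition 67.5.4, a lift function of the
form $\theta : T(M) \to \bigcup_{p \in M} (E_p \to T(E))$ on a $C^1$ $(G, F)$ fibre bundle $(E, \pi_E, M, A_E^F)$ must be converted to
a Cartesian space lift map of the form $L : \Omega_M \times \mathbb{R}^n \to (\Omega_F \to \mathbb{R}^m)$ as in Definition 70.7.2. This can be
achieved by a conversion rule such as


$\forall (p, v, z) \in \Omega_M \times \mathbb{R}^n \times \Omega_F$,
$L(p,v)(z) = \Pi^{2m}_m\bigl(\Psi_F\bigl((d\phi)\bigl(\theta(\Psi_M^{-1}(p,v))\bigl((\pi_E \times \phi)^{-1}(\psi_M^{-1}(p), \psi_F^{-1}(z))\bigr)\bigr)\bigr)\bigr)$, (70.3.1)


where $n = \dim(M)$, $m = \dim(F)$, $\psi_M \in \mathrm{atlas}(M)$, $\psi_F \in \mathrm{atlas}(F)$, $\Omega_M = \mathrm{Range}(\psi_M)$, $\Omega_F = \mathrm{Range}(\psi_F)$,
$\Psi_M = \Psi(\psi_M) \in \mathrm{atlas}(T(M))$, $\Psi_F = \Psi(\psi_F) \in \mathrm{atlas}(T(F))$, $\mathrm{Range}(\Psi_M) = \Omega_M \times \mathbb{R}^n$, $\mathrm{Range}(\Psi_F) = \Omega_F \times \mathbb{R}^m$,
$\phi \in A_E^F$, and $\Pi^{2m}_m : \mathbb{R}^{2m} \to \mathbb{R}^m$ is defined by $x \mapsto (x_{m+i})_{i=1}^m$. For any given $\psi_M \in \mathrm{atlas}(M)$, $\psi_F \in \mathrm{atlas}(F)$
and $\phi \in A_E^F$, line (70.3.1) defines $L$ in terms of $\theta$ without “losing information”. In other words, $\theta$ can be
reconstructed from $L$. Therefore the parallel transport curvature definition for $L$ as in Theorem 70.7.6 will
induce the parallel transport curvature for $\theta$. It must then be verified that this is well defined, i.e. independent
of both manifold charts and fibre charts.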


((2016-7-13. To be continued ... )) 


((2015-10-6. The lift function $\theta$ in Remark 70.3.1 must be $C^1$. See Definition 67.7.6. ))


70.3.2 REMARK: Parallelism curvature for ordinary fibre bundles, using parallel transport ODE solutions. 
The parallel transport curvature $\rho^\theta$ for a lift function $\theta$ on a general ordinary fibre bundle $(E, \pi_E, M, A_E^F)$
has the following form, although the covariant derivative expressions need careful interpretation.


$\forall p \in M$, $\forall u, v \in T_p(M)$, $\forall z \in E_p$, $\rho^\theta(u,v)(z) = D_u \theta_V(z) - D_v \theta_U(z)$. (70.3.2)


The subscripts $U$ and $V$ of the lift function $\theta$ in line (70.3.2) are the velocity vector fields of a two-parameter
curve-family passing through $p$ with $U(p) = u$ and $V(p) = v$. Thus $U$ and $V$ are extensions of $u$ and $v$ to the
range of the curve-family. (Vector fields which are obtained from curve-families in this way are sometimes
called “holonomic”. See Misner/Thorne/Wheeler [292], pages 204, 210, 239; Postnikov [33], pages 17, 52.
Vector fields which are not holonomic may be called either “anholonomic” or “nonholonomic”.)


This somewhat ascetic, minimalist style of definition for the parallel transport curvature is more or less 
equivalent to a conceptually simpler method which utilises extensions of the vectors u and v to vector fields 
in a neighbourhood of $p$ in $M$ which satisfy $[u,v] = 0$ in a neighbourhood of $p$. This “full vector field”
approach is most often seen in the literature, and a “correction term” using the covariant derivative $D_{[u,v]}$
is typically subtracted from the curvature expression so that the constraint $[u, v] = 0$ can be relaxed.
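
For comparison, the “full vector field” formula which is customary in the literature can be sketched as follows. The section $s$ and the symbol $R$ are notation assumed for this illustration, and the sign is that of the covariant derivative version of the curvature, not the parallel transport version discussed in this section.

```latex
% Customary vector-field formula, valid for arbitrary vector fields U and V:
R(U,V)\,s = D_U D_V s - D_V D_U s - D_{[U,V]}\, s .

% When U and V are velocity fields of a two-parameter curve-family,
% [U,V] = 0 and the correction term drops out:
[U,V] = 0 \quad\Longrightarrow\quad R(U,V)\,s = D_U D_V s - D_V D_U s .
```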


The “most correct” interpretation of the formula for parallel transport curvature on line (70.3.2), which is
the negative of the Riemann curvature, would solve the ODEs for parallel transport as follows.


(1) Construct a $C^2$ curve-family $\gamma : I \to M$ with $I = (-1,1)^2 \subseteq \mathbb{R}^2$ (or some other product of open
intervals containing $0 \in \mathbb{R}$) such that $\gamma(0,0) = p$, $\partial_1 \gamma(0,0) = u$ and $\partial_2 \gamma(0,0) = v$.

(2) Define the vector fields $U : I \to T(M)$ and $V : I \to T(M)$ by $U(s,t) = \partial_1 \gamma(s,t)$ and $V(s,t) = \partial_2 \gamma(s,t)$
for all $(s,t) \in I$.

(3) Define the $C^2$ curve-family $P : I \to E$ by $P(0,0) = z$, and $\forall (s,0) \in I$, $\partial_1 P(s,0) = \theta_{U(s,0)}(P(s,0))$ and
$\forall (s,t) \in I$, $\partial_2 P(s,t) = \theta_{V(s,t)}(P(s,t))$.

(4) Define the $C^2$ curve-family $Q : I \to E$ by $Q(0,0) = z$, and $\forall (0,t) \in I$, $\partial_2 Q(0,t) = \theta_{V(0,t)}(Q(0,t))$ and
$\forall (s,t) \in I$, $\partial_1 Q(s,t) = \theta_{U(s,t)}(Q(s,t))$.

(5) Define the parallel transport curvature $\rho^\theta(u,v)(z)$ to be the expression $\partial_1 \partial_2 P(0,0) - \partial_2 \partial_1 Q(0,0)$, which
is an element of $T_z(E)$.


The curve-family $\gamma$ in step (1) can be chosen quite freely, but the curve-families in steps (2), (3) and (4) are
completely determined by $\gamma$ and $\theta$. It is fairly straightforward to show that curve-families $\gamma$ of the required
kind do exist. It follows, more or less, from Theorem 70.7.6 that the value obtained for the parallel transport
curvature by this procedure is independent of the choice of $\gamma$.

The two terms of the expression $\partial_1 \partial_2 P(0,0) - \partial_2 \partial_1 Q(0,0)$ in step (5) are elements of $T(T(E))$ because they
are second derivatives of curve-families in $E$. However, their difference is a vertical vector in $T(T(E))$, which
therefore may be “dropped” to $T_z(E)$.


70.3.3 REMARK: Parallel transport curvature for ordinary fibre bundles, using lifted curve-families. 

The result of the construction procedure in Remark 70.3.2 may be thought of more abstractly as the expression
$\partial_{\theta_u(z)} \theta_V(z) - \partial_{\theta_v(z)} \theta_U(z)$, where the maps $\theta_V : z \mapsto \theta_{V(\pi_E(z))}(z)$ and $\theta_U : z \mapsto \theta_{U(\pi_E(z))}(z)$ define vector
fields on $E$ which are differentiated in the directions $\theta_u(z)$ and $\theta_v(z)$ respectively. The difference between
the results of these differentiations is once again a vertical vector in $T(T(E))$ which may be dropped to a
vector in $T_z(E)$. The vector fields $\theta_U$ and $\theta_V$ are defined only on a two-dimensional submanifold of $E$ which
passes through $z$, spanned by the vectors $\theta_u(z)$ and $\theta_v(z)$ in $T_z(E)$, but this is sufficient to give meaning to
the expression $\partial_{\theta_u(z)} \theta_V(z) - \partial_{\theta_v(z)} \theta_U(z)$. (This idea is illustrated in Figure 70.3.1.)


[Figure 70.3.1 Curvature of parallel transport is a kind of “exterior derivative”. Panels: $\partial_{\theta_u(z)} \theta_V(z)$; $\partial_{\theta_v(z)} \theta_U(z)$; $\partial_{\theta_u(z)} \theta_V(z) - \partial_{\theta_v(z)} \theta_U(z)$.]


It is not necessary to solve the parallel transport ODEs. It is possible to compute curvature on an OFB using 
just a two-parameter curve-family in the base space and derivatives of vector fields defined on a codimension
$n - 2$ submanifold of $E$.


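(1) Construct a $C^2$ curve-family $\gamma : I \to M$ with $I = (-1,1)^2 \subseteq \mathbb{R}^2$ (or some other product of open
intervals containing $0 \in \mathbb{R}$) such that $\gamma(0,0) = p$, $\partial_1 \gamma(0,0) = u$ and $\partial_2 \gamma(0,0) = v$.

(2) Define the vector fields $U : I \to T(M)$ and $V : I \to T(M)$ by $U(s,t) = \partial_1 \gamma(s,t)$ and $V(s,t) = \partial_2 \gamma(s,t)$
for all $(s,t) \in I$.

(3) Define a $C^1$ curve-family $\hat{\gamma} : I \to E$ such that $\pi_E(\hat{\gamma}(s,t)) \in \mathrm{Range}(\gamma)$ for all $(s,t) \in I$, $\hat{\gamma}(0,0) = z$,
$\partial_1 \hat{\gamma}(0,0) = \theta_u(z)$ and $\partial_2 \hat{\gamma}(0,0) = \theta_v(z)$. (Hence $(d\pi_E)(\partial_1 \hat{\gamma}(0,0)) = u$ and $(d\pi_E)(\partial_2 \hat{\gamma}(0,0)) = v$.)

(4) Define the parallel transport curvature $\rho^\theta(u,v)(z)$ as $\partial_s \theta_{V(\pi_E(\hat{\gamma}(s,t)))}(\hat{\gamma}(s,t)) - \partial_t \theta_{U(\pi_E(\hat{\gamma}(s,t)))}(\hat{\gamma}(s,t))$ at $(s,t) = (0,0)$,
which is an element of $T_z(E)$.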


Although this method may seem complicated, its purpose is to demonstrate that the curvature in direction 
$u \wedge v$ may be defined in terms of a two-parameter curve in the base space rather than a full velocity field on
the base space. 


Construction step (3) is reminiscent of the way in which connections are defined by some authors as the 
subspace of horizontal vectors at a point z in the total space E. In this case, one does not define a vector 
at $z$ to be horizontal by requiring it to lie in a specified horizontal subspace and have the correct horizontal
component. The curve $\hat{\gamma}$ is “horizontal at $z$” if its projection to the base space lies in the range of $\gamma$ and its
first derivatives lie in the horizontal subspace at z. 


70.3.4 REMARK: The “arbitrary choice” of curve-family for defining parallel transport curvature. 

If the arbitrariness of the choice of a two-parameter curve-family in Remark 70.3.3 seems unsatisfactory or 
excessive, it should be considered that the familiar directional derivatives of real functions on manifolds are 
defined similarly, in terms of an arbitrary choice of curve through a point with velocity matching a given 
tangent vector. It is then shown that the differential of the real-valued function is independent of the choice 
of curve, depending only on the given tangent vector. In the case of curvature, one must construct an 
arbitrary two-parameter curve-family, and the resulting curvature value is shown to be independent of the 
curve-family, depending only on the two given tangent vectors. In both cases, one maintains a fictional view 
of the construction. For the derivative of real-valued functions on manifolds, the fiction is the direction
vector, which is in fact an equivalence class of curves. For the parallel transport curvature, the fiction is the 
area element, or wedge product of two vectors, which is in fact an equivalence class of curve-families. 
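
The first-order case of this analogy can be written out explicitly. This is the standard definition of the directional derivative via curves, stated here only as an illustration; the curves $\gamma$, $\gamma_1$ and $\gamma_2$ are arbitrary choices of the kind described above.

```latex
% Directional derivative via an arbitrary C^1 curve \gamma with
% \gamma(0) = p and \gamma'(0) = v:
(df)_p(v) = \frac{d}{dt} f(\gamma(t)) \Big|_{t=0} .

% Independence of the choice of curve: if \gamma_1(0) = \gamma_2(0) = p
% and \gamma_1'(0) = \gamma_2'(0) = v, then
\frac{d}{dt} f(\gamma_1(t)) \Big|_{t=0} = \frac{d}{dt} f(\gamma_2(t)) \Big|_{t=0} .
```

The curvature construction has the same shape one order higher, with two-parameter curve-families in place of curves and the pair $(u, v)$ in place of the single vector $v$.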


70.3.5 REMARK: Parallelism curvature for ordinary fibre bundles, using lifted chart basis vector fields. 
Instead of defining vector fields $U \circ \pi_E$ and $V \circ \pi_E$ on only the subset $\pi_E^{-1}(\mathrm{Range}(\gamma))$ of $E$, one may define
these vector fields on all of the points of an open neighbourhood of $z \in E$ by using a chart to generate a
coordinate basis vector field on all points of an open neighbourhood of $p \in M$, whereas $U$ and $V$ are defined
only on the subset $\mathrm{Range}(\gamma)$ of $M$.


One little philosophical issue with this coordinate basis vector field approach is its uncannily close resemblance 
to the old-fashioned, much-maligned tensor calculus way of defining curvature. In fact, there is not much 
difference at all. However, there are not many general procedures for choosing a local vector field around 
a point in a differentiable manifold about which one knows only that it is a differentiable manifold. One 
does know for sure that a differentiable atlas is given for the manifold. It is therefore known that at least 
one suitably differentiable chart does exist. So it is safe to suppose that some kind of suitable vector field 
may be defined in an open neighbourhood of $p \in M$. If $n = \dim(M) \ge 2$, one may choose or construct a
chart $\psi \in \mathrm{atlas}_p(M)$ for which $\psi(p) = 0$, $\Psi(u) = e_1(0)$ and $\Psi(v) = e_2(0)$, where $(e_i(x))_{i=1}^n$ is the usual basis
for $T_x(\mathbb{R}^n)$ for $x \in \mathbb{R}^n$, and $\Psi = \Psi(\psi) : T(\mathrm{Dom}(\psi)) \to T(\mathrm{Range}(\psi))$ maps tangent vectors on points in
$\mathrm{Dom}(\psi)$ to their components with respect to $\psi$. This yields vector fields $U$ and $V$ on $\mathrm{Dom}(\psi)$ defined by
$U : q \mapsto \Psi^{-1}(e_1(\psi(q)))$ and $V : q \mapsto \Psi^{-1}(e_2(\psi(q)))$.

In terms of such a vector field on a neighbourhood of $p \in M$, one may define the parallelism curvature
$\rho^\theta(u,v)(z)$ as $\partial_{\theta_u(z)} \theta_{V(\pi_E(z))}(z) - \partial_{\theta_v(z)} \theta_{U(\pi_E(z))}(z)$, where “$\theta_{V(\pi_E(z))}(z)$” denotes the vector field
$z \mapsto \theta_{V(\pi_E(z))}(z)$ and “$\theta_{U(\pi_E(z))}(z)$” denotes the vector field $z \mapsto \theta_{U(\pi_E(z))}(z)$ for $z \in \pi_E^{-1}(\mathrm{Dom}(\psi))$. This
expression for $\rho^\theta(u,v)(z)$ is vertical, and is therefore vectorial if it is dropped to the tangent bundle on $E$. It
is also independent of the choice of chart $\psi$. This is fairly close to the customary way of defining curvature
in terms of vector fields, except that in order to ensure that the vector fields $U$ and $V$ satisfy $[U,V] = 0$, a
coordinate chart is used for their construction.


The directional derivatives $\partial_{\theta_u(z)}$ and $\partial_{\theta_v(z)}$ acting on vector fields on $E$ may be notated as $D_u$ and $D_v$
respectively, with the understanding that these signify “covariant derivatives” in the sense of differentiating
“along the flow” of parallel transport. If $u = e_k \in T_p(M)$ and $v = e_\ell \in T_p(M)$, the resulting expression for
$\rho^\theta(e_k, e_\ell)(z)$ is $D_{e_k} \theta_{e_\ell}(z) - D_{e_\ell} \theta_{e_k}(z)$, where the subscripts $e_k$ and $e_\ell$ for $\theta$ are understood as coordinate basis
vector fields, whereas the subscripts of “$D$” are understood to be the vectors $e_k$ and $e_\ell$ in $T_p(M)$. In the case
of a vector bundle with an $m$-dimensional fibre space $F$, this leads to the formula $D_{e_k} \theta_{e_\ell}(e_j) - D_{e_\ell} \theta_{e_k}(e_j)$
for a basis $(e_j)_{j=1}^m$ for $F$. This gives a fairly close match for the formula in line (70.4.4) in Remark 70.4.3.

70.3.6 REMARK: Parallelism curvature for ordinary fibre bundles, using a lifted pair of vector fields.

To escape from the kind of explicit dependence on coordinate charts which is proposed in Remark 70.3.5, 
one may either construct the fields U and V in some other manner so that U(0,0) = u and V(0,0) = v and 
$[U, V](p) = 0$, or one may permit general vector fields $U$ and $V$ on $M$, which will lead to an “error term”
in the curvature computation. This is because the concept underlying parallelism curvature is the notion of
parallel transport around small closed loops through a given point $p$. If $[U, V](p) \ne 0$, it is not possible that
$U$ and $V$ are the vector fields of a two-parameter curve-family through $p$. The error term which appears is
proportional to $[U, V](p)$.


Since the parallelism curvature $\rho^\theta(u,v)(z)$ for a given connection depends only on the values of $u$, $v$ and $z$, it
is certainly unsatisfactory to define the parallelism curvature as an expression of the form $R(X, Y)(z)$ which
depends on vector fields $X$ and $Y$ on $M$, where $X(p) = u$ and $Y(p) = v$. This would be like defining the
derivative of a real-valued function on a differentiable manifold in terms of a curve. For example, instead
of defining $(df)_p$ for $f \in C^1(M)$ as the differential at $p$ and then defining $(df)_p(v) = \partial_v f$ as the directional
derivative of $f$ in the direction $v \in T_p(M)$, one could define $(d_\gamma)(f)$ for $C^1$ curves $\gamma : \mathbb{R} \to M$ with $\gamma(0) = p$
and $\gamma'(0) = v$. The “user” of a definition of derivatives should not have to provide a complete curve $\gamma$ when
it is known that the derivative depends only on the vector $v$. Therefore the parallelism curvature is defined
here in the format $\rho^\theta(u,v)(z)$ for given vectors $u, v \in T_p(M)$ and $z \in E_p$, not in terms of vector fields or any
other such structure which might be used in the computation of the curvature.


70.3.7 REMARK: Parallelism curvature for ordinary fibre bundles, using a total-space curve-family. 

The best, simplest, easiest, clearest, most ontologically correct procedure for defining the parallelism curvature
on general differentiable fibrations is to choose a $C^2$ two-parameter curve-family $\hat\gamma$ in the total space
with specified initial position and velocities, and then define a corresponding curve-family $\gamma = \pi_E \circ \hat\gamma$ on
the base space by projecting the total space curve-family down to the base space as in Definition 70.3.8.


This method is similar to the lifted curve-family approach in Remark 70.3.3, but it is very significantly
simpler and easier in several ways. The lifted curve-family procedure commences with a choice of base-space
curve-family $\gamma$, and then lifts it to a total space curve-family $\hat\gamma$ so that it satisfies the dual requirement of
the correct horizontal and vertical components for the lifted vectors u and v. Then this chosen lifted
two-dimensional submanifold must be used for differentiation of $\theta_{V(\pi_E(z))}(z)$ and $\theta_{U(\pi_E(z))}(z)$ in the direction
of vectors which are tangent to this submanifold, but do not properly lie in it. This may be contrasted
with the projected total-space curve-family approach which commences with $\hat\gamma$, which is not difficult, and
then very easily constructs $\gamma$ by projection. All differentiation is automatically within the two-dimensional
submanifold which has been chosen.

It should be emphasised here that Definition 70.3.8 is the “parallel transport version” of the Riemann
curvature, which is the negative of the “covariant derivative version”. (This is explained in Remark 70.7.7.)
Definition 70.3.8 is illustrated in Figure 70.3.2.

Figure 70.3.2 Parallelism curvature definition using a total-space curve-family

((2015-10-9. Need two theorems near here for the curve-family $\hat\gamma$ in Definition 70.3.8. The first must show
that it exists. The second must show that the curvature is independent of the choice of curve. Must also
show that the curvature is correctly related to infinitesimal parallel transport. This should be a slightly
surprising result because the total-space curve-family does not need to satisfy the parallel transport ODE. It
only needs to have the correct initial location and velocities. Furthermore, it must be shown that the value
of $\rho^\theta(u, v)$ is in fact in $X(T(E_p))$, in other words that the horizontal component is zero. Theorem 70.7.6
may be helpful for proving these assertions. ))


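70.3.8 DEFINITION: The parallel transport version of Riemann curvature.
The parallelism curvature at a point $p \in M$ of a connection $\theta$ on a $C^2$ fibration $(E, \pi_E, M)$ is the map
$\rho^\theta_p : T_p(M) \times T_p(M) \to X(T(E_p))$ given by

$\forall u, v \in T_p(M),\ \forall z \in E_p,\quad \rho^\theta_p(u, v)(z) = \varpi\big( \partial_s \theta_{V(s,t)}(\hat\gamma(s,t)) - \partial_t \theta_{U(s,t)}(\hat\gamma(s,t)) \big)\big|_{s=t=0}$ (70.3.3)

where $\hat\gamma : I \to E$ is a $C^2$ two-parameter curve-family with $I = (-1, 1)^2$ which satisfies

(i) $\hat\gamma(0,0) = z$,
(ii) $\partial_1\hat\gamma(0,0) = \theta_u(z)$ and $\partial_2\hat\gamma(0,0) = \theta_v(z)$,
and $U : I \to T(M)$ and $V : I \to T(M)$ are defined in terms of $\gamma = \pi_E \circ \hat\gamma$ by
(iii) $\forall (s,t) \in I$, $U(s,t) = \partial_1\gamma(s,t)$,
(iv) $\forall (s,t) \in I$, $V(s,t) = \partial_2\gamma(s,t)$.

In other words,

$\forall u, v \in T_p(M),\ \forall z \in E_p,$
$\rho^\theta_p(u, v)(z) = \varpi\big( \partial_s \theta_{\partial_t \pi_E(\hat\gamma(s,t))}(\hat\gamma(s,t)) - \partial_t \theta_{\partial_s \pi_E(\hat\gamma(s,t))}(\hat\gamma(s,t)) \big)\big|_{s=t=0}$ (70.3.4)
for any $C^2$ curve-family $\hat\gamma : I \to E$ which satisfies (i) and (ii).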


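70.3.9 DEFINITION: The parallelism curvature of a connection $\theta$ on a $C^2$ fibration $(E, \pi_E, M)$ is the map
$\rho^\theta : \bigcup_{p \in M} (T_p(M) \times T_p(M)) \to \bigcup_{p \in M} X(T(E_p))$ which is given by

$\forall p \in M,\ \forall u, v \in T_p(M),\ \forall z \in E_p,\quad \rho^\theta(u, v)(z) = \rho^\theta_p(u, v)(z),$

where $\rho^\theta_p$ is the parallelism curvature of $\theta$ at each point $p \in M$. In other words, $\rho^\theta = \bigcup_{p \in M} \rho^\theta_p$.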


70.3.10 REMARK: Definition of Riemann curvature of connection on a differentiable fibration. 

Definitions 70.3.8 and 70.3.9 are stated in terms of fibrations rather than ordinary fibre bundles because the 
structure group plays no role in the definition of the curvature, although clearly the structure group does 
influence the properties of the curvature. (See Definition 67.4.2 for connections on differentiable fibrations.) 


The drop function $\varpi$ is applied in Definition 70.3.8 to map vertical vectors in T(T(E)) to vectors in T(E) in a
chart-independent fashion. (See Theorem 59.2.11 and Definition 59.2.15 for the drop function on second-level
tangent spaces.)

From Definition 70.3.8 conditions (i) and (ii) it follows that $\gamma(0,0) = p$, $\partial_1\gamma(0,0) = u$ and $\partial_2\gamma(0,0) = v$.


70.3.11 REMARK: The Riemann curvature of a connection on a fibration is not a tensor. 

The Riemann curvature (or parallelism curvature) is not referred to here as the “Riemann curvature tensor”
because it is not a tensor. It is a linear map from the space $\Lambda^2 T_p(M)$ of area-elements at $p \in M$ to the
space $X(T(E_p))$ of vector fields on the fibre set $E_p$. If E is a vector bundle, then $X(T(E_p))$ is essentially
identified with the linear space $E_p$, and then one may say that the Riemann curvature is some kind of tensor.
However, except in the case of the tangent bundle of a manifold, such a linear space $E_p$ is not equal to the
tangent space $T_p(M)$. Then the Riemann curvature is identifiable as a mixed tensor, essentially of the form
$E_p \otimes E_p^* \otimes T_p^*(M) \otimes T_p^*(M)$. More precisely, in the case of a vector bundle, the tensorial form would be more
like $\mathrm{Lin}(\Lambda^2 T_p(M), \mathrm{Lin}(E_p, E_p))$.

It could be argued that $\mathrm{Lin}(\Lambda^2 T_p(M), X(T(E_p)))$ is some kind of tensor space, but this is a difficult case to
make. The linear transformations for $\Lambda^2 T_p(M)$ are quite conventional, since this transforms like a doubly
contravariant tensor space, but the transformations for the space $X(T(E_p))$ of vector fields on the fibre set
$E_p$ are better described by translation operators on Lie groups, as described in Definitions 62.3.9 and 62.6.3.
Since this can be thought of as a Lie algebra, which is a linear space, one could argue that the Riemann
curvature inhabits a tensor space which resembles $X(T(E_p)) \otimes T_p^*(M) \otimes T_p^*(M)$. This is not the usual
kind of mixed tensor space consisting of a mixture of the spaces $T_p(M)$ and $T_p^*(M)$, which are relatively
straightforward to manage.


70.3.12 REMARK: The antisymmetry of the Riemann curvature. 

Since the parallelism curvature in Definition 70.3.8 is explicitly antisymmetric by its construction, the space
which it inhabits could be described as either $\mathrm{Lin}(\Lambda^2 T_p(M), X(T(E_p)))$ or $\Lambda_2(T_p(M), X(T(E_p)))$. (This
issue is also mentioned in Remark 71.11.3 for the case of affine connections on tangent bundles.) The former
case is preferable because it corresponds more closely to the fact that the parallelism curvature is a function
of area elements, not a function of pairs of vectors as in the latter case. However, in practice, pairs of tangent
vectors are used for computations. The area element is an abstraction of the area spanned by two vectors.

The vector-pair (u, v) is effectively a label for the equivalence class of $C^2$ curve-families which have the given
initial velocity vectors u and v, as alluded to in Remark 70.3.4. The area element $u \wedge v$ spanned by u and
v is then a higher abstraction which labels the set of all curve-families with direction vectors $(u', v')$ such
that $u' \wedge v' = u \wedge v$. Many textbooks state that a tangent vector in a manifold is an equivalence class of $C^1$
curves. Not so many books explicitly state that the wedge product $u \wedge v$ of two vectors is an equivalence
class of $C^2$ curve-families.

One must not forget, however, that the values of the individual terms $\partial_s\theta_{V(s,t)}(\hat\gamma(s,t))$ and $\partial_t\theta_{U(s,t)}(\hat\gamma(s,t))$
in Definition 70.3.8 do in fact depend on the choice of $C^2$ curve-family $\hat\gamma$. It is only the antisymmetrised
combination of these two terms which is the same for all curve-families corresponding to a given area element.


70.3.13 REMARK: Geometric significance of dependence of parallelism curvature on area elements. 

The fact that the parallelism curvature $\rho^\theta(u, v)$ depends only on the area element $u \wedge v$ makes good geometric
sense because ultimately this curvature is defined in terms of parallel transport along curves which lie in the
plane spanned by u and v, and it is this plane (together with a scaling factor $|u \wedge v|$ and the orientation
of the plane) which determines the class of curves along which parallel transport must be computed. Any
vectors which span the same plane will determine the same class of parallel-transport curves.


70.3.14 REMARK: Parallelism curvature induced onto the fibre space. 

The target space $X(T(E_p))$ for $\rho^\theta(u, v)$ in Definition 70.3.8 has a natural isomorphism to the space $X(T(F))$
of vector fields on the fibre space F for the fibration $(E, \pi_E, M)$. This is because a fibre chart $\phi : E \to F$
induces a diffeomorphism between $E_p$ and F. Consequently, in terms of a given fibre chart, the curvature
$\rho^\theta(u, v)$ may be thought of as an element of the Lie algebra of the structure group G acting on F.


70.3.15 REMARK: Application of parallelism curvature to principal fibre bundles. 

In the application of the parallelism curvature in Definition 70.3.8 to a principal fibre bundle $(P, \pi_P, M, A_P^G)$,
it will transpire that the value of $\rho^\theta(u, v)$ may be regarded as an element of the Lie algebra of its structure
group G because the fibre set is in this case $P_p = \pi_P^{-1}(\{p\})$, which is diffeomorphic to G.


70.3.16 REMARK:  Definability of various curvature concepts. 

It is apparently impossible to extend the Ricci curvature concept to connections on general differentiable 
fibre bundles. The following table summarises the scope of some curvature concepts, using abbreviations VB 
for “(linear connection on) vector bundle” and TB for “(affine connection on) tangent bundle”. 


curvature    OFB   PFB   VB    TB    metric
Riemann      yes   yes   yes   yes   yes
Ricci                          yes   yes
sectional                            yes
scalar                               yes

In the case of linear connections on vector bundles, it is possible to express the Riemann curvature in terms
of an m × m × n × n matrix, where n is the dimension of the base space and m is the dimension of the fibre
space. Then formulas as in Remark 70.4.3 become meaningful.


70.4. Curvature of connections on vector bundles
((2017-1-14. Section 70.4 needs to be thoroughly overhauled. Please ignore it for now. )) 


70.4.1 REMARK: Generalisation of curvature to connections on general fibre bundles. 

Curvature is, broadly speaking, the deviation from “flatness” of parallel transport around closed curves. In 
the case of an affine connection on the tangent bundle of a manifold, the curvature may be measured with the 
Riemann curvature tensor, but in the case of connections on general fibre bundles, a more general measure 
of curvature is required. If the fibre bundle is not a tangent bundle, the space in which parallel translation 
occurs is not the same as the base space containing the curves along which a fibre set element is transported. 


In the case of a vector bundle, curvature may be expressed at each point as a tensor with mixed spaces, one 
space being the tangent space of the base space, the other space being the fibre space. 


In the case of OFBs which are not vector bundles, the curvature may be expressed in terms of vector fields 
on fibre sets. If the structure group of the OFB is a finite-dimensional Lie group, and thus typically has a 
well-defined finite-dimensional Lie algebra, the curvature may be expressed in terms of a bilinear map from 
the tangent space to the Lie algebra. If the structure group is not a finite-dimensional Lie group, one must 
resort to the “lowest common denominator” of vector fields on fibre sets.


70.4.2 REMARK:  Curvature for connections on general ordinary fibre bundles. (Rough sketch.) 

Suppose $\gamma : \mathbb{R}^2 \to M$ is a 2-parameter $C^2$ curve-family in the base space M of a $C^2$ differentiable (G, F)
fibre bundle $E \mathrel{-\!\!<} (E, \pi, M, A_E^F)$. Suppose that $\theta$ is a $C^1$ horizontal lift function on E. Then one may define
parallel transport on E along two different paths starting at $\gamma(0,0)$ and ending at $\gamma(s,t)$, the first path $\gamma_1$
initially following the s axis, the other path $\gamma_2$ following the t axis. Let $\hat\gamma^z_\ell : \mathbb{R}^2 \to E$ denote the lift
function with value $\hat\gamma^z_\ell(0,0) = z \in E_{\gamma(0,0)}$ along path $\gamma_\ell$ for $\ell = 1, 2$. Then roughly speaking,

$\hat\gamma^z_1(s,t) = z + \int_0^s \theta_{U(s',0)}(\hat\gamma^z_1(s',0))\,ds' + \int_0^t \theta_{V(s,t')}(\hat\gamma^z_1(s,t'))\,dt'$ (70.4.1)

and

$\hat\gamma^z_2(s,t) = z + \int_0^t \theta_{V(0,t')}(\hat\gamma^z_2(0,t'))\,dt' + \int_0^s \theta_{U(s',t)}(\hat\gamma^z_2(s',t))\,ds',$ (70.4.2)

where $U(s,t) = \partial_s\gamma(s,t)$ and $V(s,t) = \partial_t\gamma(s,t)$ for $(s,t) \in \mathbb{R}^2$. The difference between $\hat\gamma^z_1$ and $\hat\gamma^z_2$ is a
measure of curvature. (This is roughly illustrated in Figure 70.4.1. See also Figures 71.4.2 and 71.4.3.)
Using the Stokes Theorem, it is possible to express this difference in terms of a suitable differential of the
horizontal lift function.


Figure 70.4.1 Curvature concept for connection on ordinary fibre bundle

Equations (70.4.1) and (70.4.2) are integral equations which determine unique solutions $\hat\gamma^z_1$ and $\hat\gamma^z_2$ via the
Picard iteration method. (See Section 44.6.) If lines (70.4.1) and (70.4.2) are subtracted, divided by st, and
the limit as $s, t \to 0$ is taken for this expression, the result is roughly as follows.

$\lim_{s,t \to 0} \frac{1}{st}\big(\hat\gamma^z_2(s,t) - \hat\gamma^z_1(s,t)\big) = \lim_{s \to 0} \frac{1}{s} \int_0^s \lim_{t \to 0} \frac{\theta_{U(s',t)}(\hat\gamma^z_2(s',t)) - \theta_{U(s',0)}(\hat\gamma^z_1(s',0))}{t}\,ds'$ (70.4.3)
$\qquad - \lim_{t \to 0} \frac{1}{t} \int_0^t \lim_{s \to 0} \frac{\theta_{V(s,t')}(\hat\gamma^z_1(s,t')) - \theta_{V(0,t')}(\hat\gamma^z_2(0,t'))}{s}\,dt'$


Now it must be confessed that the integrals in lines (70.4.1) and (70.4.2) are not integrals in the usual sense.
They are perhaps best described as anti-derivatives because differentiating both sides of these equations
with respect to s or t yields correct formulas for the derivatives of the lifted curve families $(s,t) \mapsto \hat\gamma^z_\ell(s,t)$.
Consequently the difference expressions in the limits within the integrals in equation (70.4.3) do not have a
real meaning. For example, $\theta_{U(s',t)}(\hat\gamma^z_2(s',t)) \in T_{\hat\gamma^z_2(s',t)}(E)$ and $\theta_{U(s',0)}(\hat\gamma^z_1(s',0)) \in T_{\hat\gamma^z_1(s',0)}(E)$. Since these
two vectors are in different tangent spaces of the total space E, they cannot be subtracted. However, the
difference can be given meaning via charts.
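
The two-path comparison can be tried numerically in a chart. The following is a minimal sketch under stated assumptions, not a general implementation. The bundle is taken to be the tangent bundle of the unit sphere in $(\theta, \phi)$ coordinates with its Levi-Civita connection, the two paths follow coordinate lines, and the transport ODE $dV^i/dr = -\Gamma^i_{jk} V^j \dot{x}^k$ is integrated with the classical Runge-Kutta scheme. The names `christoffel`, `transport` and `path_difference` are hypothetical.

```python
import math


def christoffel(theta):
    """Nonzero symbols Gamma^i_{jk} of the unit sphere (index 0 = theta, 1 = phi)."""
    cot = math.cos(theta) / math.sin(theta)
    return {
        (0, 1, 1): -math.sin(theta) * math.cos(theta),
        (1, 0, 1): cot,
        (1, 1, 0): cot,
    }


def transport(point, vec, axis, length, steps=200):
    """Parallel-transport vec along the coordinate line point + r*e_axis, 0 <= r <= length.

    Integrates dV^i/dr = -Gamma^i_{jk} V^j (e_axis)^k with the classical RK4 scheme.
    """

    def rhs(pt, v):
        out = [0.0, 0.0]
        for (i, j, k), g in christoffel(pt[0]).items():
            if k == axis:
                out[i] -= g * v[j]
        return out

    def shifted(pt, dr):
        q = list(pt)
        q[axis] += dr
        return q

    def step(v, k, c):
        return [a + c * b for a, b in zip(v, k)]

    dr = length / steps
    pt, v = list(point), list(vec)
    for _ in range(steps):
        k1 = rhs(pt, v)
        k2 = rhs(shifted(pt, dr / 2), step(v, k1, dr / 2))
        k3 = rhs(shifted(pt, dr / 2), step(v, k2, dr / 2))
        k4 = rhs(shifted(pt, dr), step(v, k3, dr))
        v = [a + dr / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(v, k1, k2, k3, k4)]
        pt = shifted(pt, dr)
    return pt, v


def path_difference(theta0, s, t):
    """Chart components of (path 1 result - path 2 result) / (s*t)."""
    start, v0 = [theta0, 0.0], [0.0, 1.0]   # v0 is the coordinate vector d/dphi
    p1, v1 = transport(start, v0, 0, s)     # path 1: s axis (theta) first,
    p1, v1 = transport(p1, v1, 1, t)        #         then t axis (phi)
    p2, v2 = transport(start, v0, 1, t)     # path 2: t axis first,
    p2, v2 = transport(p2, v2, 0, s)        #         then s axis
    return [(a - b) / (s * t) for a, b in zip(v1, v2)]


theta0 = 1.0
defect = path_difference(theta0, 0.01, 0.01)
print(defect, -math.sin(theta0) ** 2)
```

Both paths end at the same base point, so the two transported vectors lie in the same tangent space and their chart components may be subtracted. For small s and t, the $\theta$-component of the quotient is close to $-\sin^2\theta_0$, whose magnitude is the classical curvature component $\sin^2\theta$ of the unit sphere. The sign depends on which path is subtracted from which, and on the curvature sign convention.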


((2016-7-13. For Remark 70.4.3, do a rigorous computation of the Riemann curvature components from the 
OFB connection components. )) 


((2019-6-30. The generalisation of connection coefficient arrays from tangent bundles to vector bundles in 
Definition 68.1.8 is partly motivated by the possibility of defining a Riemann curvature tensor component 
formula for connections on vector bundles, analogous to the better-known Riemann curvature tensor formula 
for tangent bundles in Theorem 71.11.7. Prove lines (70.4.4) and (70.4.5) as theorems. )) 


70.4.3 REMARK: The tensor calculus formula for Riemann curvature. 
The well-known formula for the Cartesian-chart components of the Riemann curvature tensor for an affine
connection may be generalised to linear connections on ordinary vector bundles as follows.

$\forall i, j \in \mathbb{N}_m,\ \forall k, \ell \in \mathbb{N}_n,\quad R^i{}_{jk\ell} = D_{e_\ell}\theta_{e_k}(e_j)^i - D_{e_k}\theta_{e_\ell}(e_j)^i$ (70.4.4)
$\qquad = -D_{e_\ell}(\Gamma^i_{jk}) + D_{e_k}(\Gamma^i_{j\ell})$
$\qquad = -(\Gamma^i_{jk,\ell} + \Gamma^i_{a\ell}\Gamma^a_{jk}) + (\Gamma^i_{j\ell,k} + \Gamma^i_{ak}\Gamma^a_{j\ell})$
$\qquad = \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \Gamma^i_{ak}\Gamma^a_{j\ell} - \Gamma^i_{a\ell}\Gamma^a_{jk}$ (70.4.5)
for a differentiable (G, F) vector bundle $(E, \pi, M, A_E^F)$ with m = dim(F) and n = dim(M), where the repeated
index a is summed over $\mathbb{N}_m$. (For fibre
bundles which are not vector bundles, the formula is much more complicated. Note also that the “swap
function” in Section 59.6 must be applied to one of the terms in line (70.4.4) before applying the “drop
function” $\varpi$ in Section 59.2 to obtain a valid tensorial result.) This explains why the indices k and $\ell$ seem
to be in the wrong order. The cause of this is the minus-sign in the formula $\Gamma^i_{jk} = -\theta_{e_k}(e_j)^i$, which is a
consequence of the convention that the Christoffel array compensates for non-parallel naive transport along
coordinate axes by subtracting the true parallel transport from the naive derivative of a contravariant vector
in a given direction. In other words, by adding a Christoffel array term, one is subtracting the parallel
transport. (See Definition 68.1.8 for the Christoffel array for a connection on a differentiable vector bundle.
See Remark 71.4.1 for comments on the Christoffel array's minus-sign.)


In the case of an affine connection on a tangent bundle, E = T(M) and $F = \mathbb{R}^n$, and so m = n. Then the
formula in line (70.4.5) matches the classical tensor calculus formula for the Riemann curvature. The formula
in line (70.4.4) is an abstract expression, viewed through a Cartesian chart, for the exterior derivative of the
connection, which is represented as the horizontal lift function for the parallel transport.
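
The classical component formula can be checked numerically in the tangent bundle case. The following is a minimal sketch, assuming the formula in the form $R^i{}_{jk\ell} = \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \Gamma^i_{ak}\Gamma^a_{j\ell} - \Gamma^i_{a\ell}\Gamma^a_{jk}$ with the last Christoffel index as the direction index. The index order and overall sign are assumptions which may differ from other conventions. The example uses the Levi-Civita connection of the unit sphere in $(\theta, \phi)$ coordinates, and the names `gamma` and `riemann` are hypothetical.

```python
import math


def gamma(x):
    """Christoffel array G[i][j][k] of the unit sphere at x = (theta, phi)."""
    th = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)
    return G


def riemann(x, h=1e-5):
    """R[i][j][k][l] = G^i_{jl,k} - G^i_{jk,l} + G^i_{ak} G^a_{jl} - G^i_{al} G^a_{jk}."""
    n = len(x)
    G = gamma(x)

    def d_gamma(k):
        # Central difference of the whole Christoffel array in coordinate direction k.
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        Gp, Gm = gamma(xp), gamma(xm)
        return [[[(Gp[i][j][l] - Gm[i][j][l]) / (2 * h) for l in range(n)]
                 for j in range(n)] for i in range(n)]

    dG = [d_gamma(k) for k in range(n)]   # dG[k][i][j][l] approximates G^i_{jl,k}
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    quad = sum(G[i][a][k] * G[a][j][l] - G[i][a][l] * G[a][j][k]
                               for a in range(n))
                    R[i][j][k][l] = dG[k][i][j][l] - dG[l][i][j][k] + quad
    return R


theta = 1.0
R = riemann([theta, 0.3])
print(R[0][1][0][1], math.sin(theta) ** 2)
```

The component with $(i, j, k, \ell) = (\theta, \phi, \theta, \phi)$ agrees with the known value $\sin^2\theta$ for the unit sphere, and the array is antisymmetric in its last two indices.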


The four indices in lines (70.4.4) and (70.4.5) have different significance and character. (The formula in 
line (70.4.5) is easier to memorise when one knows what the indices mean!) 


• i is the component index for vectors in $T(\mathbb{R}^m) \equiv \mathbb{R}^m$, which are computed as the differential of the
map $\psi_F \circ \phi : E \to \mathbb{R}^m$, where $\phi : E \to F$ is a fibre map on the total space E, and $\psi_F : F \to \mathbb{R}^m$ is a
Cartesian chart for the fibre space F.

• j is the component index for elements of $\mathbb{R}^m$, which are Cartesian coordinate tuples via $\psi_F \circ \phi$ for
elements of E.

• k, in the first term $D_{e_\ell}\theta_{e_k}(e_j)^i$, is the component index for the velocity part of the coordinate basis vector
field $e_k$ in $X(T(\mathbb{R}^n)) \subseteq (\mathbb{R}^n \to (\mathbb{R}^n \times \mathbb{R}^n))$, which is induced by a Cartesian chart $\psi_M : M \to \mathbb{R}^n$
for the base-point space M. This is a velocity field along a curve-family in the direction $e_k$ in the
base-point space.

• $\ell$, in the first term $D_{e_\ell}\theta_{e_k}(e_j)^i$, refers to a coordinate basis vector $e_\ell$ at a point in the base space, not
a vector field. This vector specifies the direction of differentiation, whereas the vector field $e_k$ specifies
the direction in which the infinitesimal transport $\theta_{e_k}$ must be calculated.


It is particularly notable that the (i, j) index pair refers to vertical components of elements of the total
space E, whereas the pair $(k, \ell)$ refers to horizontal velocities in the base space. The indices i and j are
further distinguished from each other in that j is the index for the location of a particular element $e_j \in E_p$
in the total space fibre above some $p \in M$, whereas i is the index for a velocity $\theta_{e_k}(e_j) \in T_{e_j}(E)$ for
some $e_k \in T_p(M)$. There is not so much real difference between the roles of k and $\ell$ because the expression
$D_{e_\ell}\theta_{e_k}(e_j)^i$ is antisymmetrised with respect to $(k, \ell)$.

The Riemann curvature tensor is often presented, at least initially, for the special case of the Levi-Civita
connection on a Riemannian manifold. Then there are various simplifications and symmetries. In particular,
m = n, the array $(\Gamma^i_{jk})_{i,j,k=1}^n$ is symmetric in (j, k), and the array $(R^i{}_{jk\ell})_{i,j,k,\ell=1}^n$ is antisymmetric in $(k, \ell)$.
In the general case, one must forget what one has learned in the special case. (Of course, the antisymmetry
with respect to $(k, \ell)$ holds even in the most general case.)


70.4.4 REMARK: Why curvature is defined for vector fields, not vectors. 

In the familiar old tensor calculus version of differential geometry, formulas for the Riemann curvature tensor 
can give the impression that its inputs are all vectors at a fixed point. In fact, at least some of the inputs 
must be vector fields. 


In the formula on line (70.4.4) in Remark 70.4.3, the subscripts $e_\ell$ and $e_k$ in the covariant derivative operators
$D_{e_\ell}$ and $D_{e_k}$ are only basis vectors, not vector fields. Only the vectors $e_\ell(p)$ and $e_k(p)$ are used in the formula.
But the expressions $\theta_{e_k}(e_j)$ and $\theta_{e_\ell}(e_j)$ which are being differentiated are functions of a variable point $p \in M$,
although the dependence on p is usually suppressed for brevity and clarity. It would be more accurate to
write these expressions as $p \mapsto \theta_{e_k(p)}(e_j(p))$ and $p \mapsto \theta_{e_\ell(p)}(e_j(p))$. Then the covariant derivative operators
differentiate with respect to the variable p. When one knows that the standard formulas are already expressed
in terms of the vector fields $p \mapsto e_k(p)$ and $p \mapsto e_\ell(p)$, then it is not so surprising when one sees vector fields
in the more modern formulations. Thus for example, one often sees the Riemann curvature tensor written
as $(X, Y, Z) \mapsto D_X D_Y Z - D_Y D_X Z - D_{[X,Y]} Z$ for vector fields X, Y and Z. This formula may be written as
$D_{X(p)} D_Y Z - D_{Y(p)} D_X Z - D_{[X,Y](p)} Z$ at each point p because the covariant derivatives on the left of each
term depend only on the vector at a point p.


70.4.5 REMARK: Curvature at the boundary of a manifold. 

In the case of a manifold which has a boundary, the loops along which one may test the curvature in 
a neighbourhood of a given point on the boundary are constrained. This is not much different to the 
issues which arise when attempting to calculate derivatives of functions at a boundary point. In real- 
world applications, manifolds often do have boundaries, and these issues must be resolved, but in the 
pure mathematical theory of differential geometry, one typically restricts attention to manifolds without 
boundaries so as to avoid the issues. However, if the limit of parallel transport, divided by the area enclosed 
by the curve, for curves converging to a point is well defined, there is no good reason to reject this as a 
definition of curvature at a boundary point. 


70.5. Curvature of connections on principal bundles 


70.5.1 REMARK: Curvature of connection forms on principal bundles. 

Since all covariant derivative and curvature concepts are ultimately Lie-algebra-valued functions, as suggested 
in Remark 70.2.4, it is not surprising that the most general way to define curvature on fibre bundles is, 
apparently, by constructing a curvature form from a connection form. These forms are defined on principal 
bundles, which is also not surprising because the fibre space of a principal bundle is the same as the structure 
group from which the Lie algebra is constructed. It may seem that PFBs are a small subclass of the class
of OFBs, but via fibre bundle associations, both the connection form and curvature form can be “copied
across”. Therefore the best starting point for the most general concept of curvature seems to be the curvature
of connection forms on general differentiable principal bundles.

Definitions similar to Definition 70.5.2 are given by Spivak [37], Volume 2, page 324; Poor [32], page 282; 
Bishop/Crittenden [2], page 80; Bleecker [254], page 37; Sulanke/Wintgen [40], page 143; Choquet-Bruhat [6], 
page 259; Kobayashi/Nomizu [19], page 76. 

Definition 70.5.2 is illustrated in Figure 70.5.1. 


Figure 70.5.1 Curvature of a connection form


((2017-3-17. D", defined by (D“¢)(X) = (dw)((h? o X4,)?-0), is called the “exterior covariant derivative" 
or "exterior covariant differentiation" by Bleecker [254], page 37; Daniel/Viallet [317], page 185. Define this 
near here. Possibly use a notation such as D”¢ = (d$) o (X h*)?)) 


70.5.2 DEFINITION: The curvature (form) of the connection form $\omega^\theta$ for a $C^1$ connection $\theta$ on a $C^2$
principal G-bundle $(P, \pi, M, A_P^G)$ is the form $D\omega^\theta \in X(\Lambda_2(T(P), T_e(G)))$ which is defined by

$\forall z \in P,\ \forall y_1, y_2 \in T_z(P),\quad (D\omega^\theta)_z(y_1, y_2) = (d\omega^\theta)(h^\theta_z(y_1), h^\theta_z(y_2)),$

where $h^\theta$ is the horizontal component map for $\theta$, and $d : X^1(\Lambda_1(T(P), T_e(G))) \to X^0(\Lambda_2(T(P), T_e(G)))$ is
the exterior derivative for $X^1(\Lambda_1(T(P), T_e(G)))$.


70.5.3 EXAMPLE: Curvature of connection form for a trivial principal $(\mathbb{R}, +)$-bundle on $\mathbb{R}^4$.

The curvature form in Definition 70.5.2 is computed as the exterior derivative of horizontal components
of vectors in the tangent bundle T(P) of the principal bundle P. These horizontal components are in the
same tangent bundle T(P). So the curvature form is very closely related to the exterior derivative which is
computed in Example 69.5.8.

As in Example 69.5.6, let $\theta$ be the horizontal lift function on the principal bundle $(P, \pi, M, A_P^G)$, where
$M = \mathbb{R}^4$, $G = \mathbb{R} \mathrel{-\!\!<} (\mathbb{R}, +)$, $P = M \times G$, and $\pi : P \to M$ with $\pi : (p, g) \mapsto p$. Then $\theta_V(z) = t_{(p,g),(v,f(z,v)),\mathrm{id}_P}$
for $p \in M$, $V = t_{p,v,\mathrm{id}_M} \in T_p(M)$ and $z = (p, g) \in P_p$, where $f : \mathbb{R}^4 \times \mathbb{R} \times \mathbb{R}^4 \to \mathbb{R}$ has the form
$f(p, g, v) = \sum_{j=0}^3 a_j(p) v^j$ for all $z = (p, g) \in P$ and $v \in \mathbb{R}^4$, where $a = (a_j)_{j=0}^3 \in C^1(M, \mathbb{R}^4)$, so that
$f \in C^1(\mathbb{R}^4 \times \mathbb{R} \times \mathbb{R}^4, \mathbb{R})$. It is shown in Example 69.5.6 that the connection form $\omega \in X(\Lambda_1(T(P), T_e(G)))$
corresponding to $\theta$ satisfies

$\forall z \in P,\ \forall (v, w) \in \mathbb{R}^4 \times \mathbb{R},\quad \omega(t_{z,(v,w),\mathrm{id}_P}) = t_{0,\,w - f(z,v),\,\mathrm{id}_{\mathbb{R}}}.$

Let $Y_k \in T_z(P)$ for k = 0, 1. Then for k = 0, 1, $Y_k = t_{z,(v_k,w_k),\mathrm{id}_P}$ for some $(v_k, w_k) \in \mathbb{R}^4 \times \mathbb{R}$. So by the
computations in Example 69.3.3, $h_z(Y_k) = t_{z,(v_k,f(z,v_k)),\mathrm{id}_P}$ for k = 0, 1. This effectively replaces $w_k$ with
$f(z, v_k)$, which does not depend on $w_k$ at all. This is not surprising because the explicit purpose of the
horizontal component map is to remove the vertical component and replace it with a value which makes the
vector horizontal.

Note also that the exterior derivative value $(d\omega)(Y_0, Y_1)$ does not depend on the vertical components $w_0$ and
$w_1$ of $Y_0$ and $Y_1$. As shown in Example 69.5.8,

$(d\omega)(Y_0, Y_1) = t_{0,\alpha,\mathrm{id}_{\mathbb{R}}},$

where

$\alpha = -\sum_{j,k=0}^3 \big(\partial_{p^j} a_k(p) - \partial_{p^k} a_j(p)\big) v_0^j v_1^k,$ (70.5.1)

which does not depend on $w_0$ or $w_1$. Thus the application of $h_z$ to $Y_0$ and $Y_1$ makes no difference in this
case. (This is not surprising because the vertical component term in Theorem 70.5.7 line (70.5.2) vanishes
when the structure group is abelian.) One obtains therefore

$(D\omega)(Y_0, Y_1) = (d\omega)(h_z(Y_0), h_z(Y_1))$
$\qquad = t_{0,\alpha,\mathrm{id}_{\mathbb{R}}}$
$\qquad = -\sum_{j,k=0}^3 \big(\partial_{p^j} a_k(p) - \partial_{p^k} a_j(p)\big) v_0^j v_1^k \, t_{0,1,\mathrm{id}_{\mathbb{R}}}.$

The minus sign comes from the fact that the connection form $\omega$ is a “difference-style” connection according
to the classification in Remark 67.2.1. So the curvature of $\omega$ acquires a minus sign. (This curvature sign
issue is also mentioned in Remark 70.4.3 regarding the Riemann curvature tensor, which is the curvature of
the Christoffel array field, which is also a “difference-style” connection as mentioned in Remark 68.1.9.)
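
The link between this abelian curvature and parallel transport around small loops can be checked numerically. The following is a minimal sketch only. It assumes that a horizontal lift in this example has vertical velocity $f(z, v) = \sum_j a_j(p) v^j$, so that the fibre coordinate changes by the loop integral of $\sum_j a_j\,dp^j$ around a closed loop in the base space. The coefficient tuple `a` is a hypothetical choice, as are the names `antisym_derivative` and `loop_integral`.

```python
def a(p):
    """A hypothetical C^1 coefficient tuple (a_0, ..., a_3) on R^4."""
    x0, x1, x2, x3 = p
    return [x1 * x2, x0 * x0, x3, x0 * x1]


def antisym_derivative(p, j, k, h=1e-5):
    """Central-difference value of d_j a_k - d_k a_j at p."""

    def partial(axis, comp):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        return (a(plus)[comp] - a(minus)[comp]) / (2 * h)

    return partial(j, k) - partial(k, j)


def loop_integral(p, j, k, side, steps=1000):
    """Midpoint-rule integral of sum_i a_i dp^i around a square in the (j, k) plane.

    The square has corner p and is traversed along +e_j, +e_k, -e_j, -e_k.
    """
    q, total, dx = list(p), 0.0, side / steps
    for axis, sign in [(j, 1), (k, 1), (j, -1), (k, -1)]:
        for _ in range(steps):
            mid = list(q)
            mid[axis] += sign * dx / 2
            total += a(mid)[axis] * sign * dx
            q[axis] += sign * dx
    return total


p = [1.0, 2.0, 3.0, 4.0]
side = 1e-3
per_area = loop_integral(p, 0, 1, side) / side ** 2
pointwise = antisym_derivative(p, 0, 1)
print(per_area, pointwise)
```

The fibre displacement per unit area around the small square approaches the antisymmetrised derivative $\partial_{p^0} a_1 - \partial_{p^1} a_0$, which is the same combination which appears in line (70.5.1). The overall sign depends on the orientation of the loop and on the “difference-style” convention discussed in this example.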


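70.5.4 NOTATION: $\Omega^\theta$, for a $C^1$ connection $\theta$ on a $C^2$ principal G-bundle $(P, \pi, M, A_P^G)$, denotes the
curvature form $D\omega$ of the connection form $\omega$ corresponding to $\theta$.

$\Omega^\theta_z$, for a $C^1$ connection $\theta$ on a $C^2$ principal G-bundle $(P, \pi, M, A_P^G)$, for $z \in P$, denotes the restriction of
$\Omega^\theta$ to $T_z(P) \times T_z(P)$. In other words, $\Omega^\theta_z = \Omega^\theta\big|_{T_z(P) \times T_z(P)}$. That is,

$\forall z \in P,\ \forall y_1, y_2 \in T_z(P),\quad \Omega^\theta_z(y_1, y_2) = (d\omega)(h^\theta_z(y_1), h^\theta_z(y_2)),$

where $h^\theta$ is the horizontal component function for $\theta$.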


70.5.5 REMARK: Notation for curvature forms. 

The curvature form in Definition 70.5.2 is customarily denoted simply as “$\Omega$”, but this symbol is also very
often used for open sets in topology. Therefore a superscript $\theta$ is added in Notation 70.5.4, partly to indicate
that it is constructed from a connection $\theta$, but also to distinguish it from typical open set notations. The
optional subscript z to restrict focus to a single total space element is also customarily not indicated.
Strictly speaking, for consistency, the notation $\omega$ for connection forms should also have a superscript to
indicate the horizontal lift function, as for example “$\omega^\theta$”. However, Theorem 69.6.3 implies that each of
the principal bundle connection information containers $\theta$, P, h, v, Q and $\omega$ could be taken as the primary
specification, whereas the curvature form $\Omega$ is specifically constructed from one of these equi-informational
sources. The superscript indicates that it is a secondary object, constructed from a primary object.

The differential “D” in Definition 70.5.2 should, strictly speaking, be denoted “$D^\theta$” or “$D^\omega$” because the
horizontal component function “h” should, strictly speaking, be denoted “$h^\theta$” or “$h^\omega$” to show its dependence
on $\theta$, or equivalently on $\omega$. Thus one arrives at a notation such as “$D^\omega\omega$” for the curvature form. So the
curvature is doubly dependent on the connection because the connection itself is used to construct the
differential operator which acts on $\omega$.

It is true that the formula for $D\omega^\theta$ in Definition 70.5.2 can be written without $h^\theta$ as

$\forall z \in P,\ \forall y_1, y_2 \in T_z(P),\quad (D\omega^\theta)_z(y_1, y_2) = (d\omega^\theta)\big(y_1 - \lambda_z(\omega^\theta_z(y_1)),\ y_2 - \lambda_z(\omega^\theta_z(y_2))\big)$

using the formula $h_z = \mathrm{id}_{T_z(P)} - \lambda_z \circ \omega_z$ in Theorem 69.6.3 (viii). But this does not seem natural.


70.5.6 REMARK: The “structure equation” for curvature forms computed from connection forms. 
Theorems which are similar to Theorem 70.5.7 are stated and proved by Poor [32], pages 284-285; Spivak [37],
Volume 2, pages 327-328; Bishop/Crittenden [2], pages 81-82; Bleecker [254], pages 37-39; Kobayashi/Nomizu [19],
pages 77-78; Sulanke/Wintgen [40], pages 144-145.

The exterior derivative operator in Definition 61.10.3 is shown to be chart-independent in Theorem 61.12.5.
If the connection form $\omega^\theta$ in Theorem 42.6.2 is $C^1$, then $d\omega^\theta$ is $C^0$ by Theorem 61.13.2.

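For Lie algebra operations such as $[\omega^\theta(y_1), \omega^\theta(y_2)]$ on line (70.5.2), see Definition 62.8.8.

70.5.7 THEOREM: The “structure equation” for the curvature form.
Let $P \mathrel{-\!\!<} (P, \pi, M, A_P^G)$ be a $C^2$ principal bundle. Let $\omega^\theta : T(P) \to T_e(G)$ be the connection form which
corresponds to a $C^1$ horizontal lift function $\theta$ on P. Then

$\forall z \in P,\ \forall y_1, y_2 \in T_z(P),\quad \Omega^\theta_z(y_1, y_2) = (d\omega^\theta)(y_1, y_2) + [\omega^\theta(y_1), \omega^\theta(y_2)].$ (70.5.2)

PROOF: Let $z \in P$ and $y_1, y_2 \in T_z(P)$. Let $u_k = \omega^\theta(y_k) \in T_e(G)$ for k = 1, 2. Then $\bar\lambda_{u_k} \in X(T(P))$ for
k = 1, 2 by Theorem 66.6.8 (iv). (See Definition 66.6.2 for the fundamental vertical vector field $\bar\lambda$.)

The Lie bracket $[\bar\lambda_{u_1}, \bar\lambda_{u_2}]$ is equal to $\bar\lambda_{[u_1,u_2]}$ by Theorem 66.6.13. Then since $[u_1, u_2] = \lambda_z^{-1}(\bar\lambda_{[u_1,u_2]}(z))$ by
Theorem 66.6.5 (vi), it follows that $[u_1, u_2] = \lambda_z^{-1}([\bar\lambda_{u_1}, \bar\lambda_{u_2}](z))$.

Theorem 61.5.13 implies that $[\bar\lambda_{u_1}, \bar\lambda_{u_2}] \in X^0(T(P))$. Suppressing (for now) the required swap and drop
functions, Definition 61.5.7 gives

$[\bar\lambda_{u_1}, \bar\lambda_{u_2}] = \partial_{\bar\lambda_{u_1}} \bar\lambda_{u_2} - \partial_{\bar\lambda_{u_2}} \bar\lambda_{u_1}.$

Then by Definition 61.4.2,

$[\bar\lambda_{u_1}, \bar\lambda_{u_2}](z) = \partial_{\bar\lambda_{u_1}(z)} \bar\lambda_{u_2} - \partial_{\bar\lambda_{u_2}(z)} \bar\lambda_{u_1}.$

This converts naive derivatives by vector fields to naive derivatives by the vectors $\bar\lambda_{u_1}(z)$ and $\bar\lambda_{u_2}(z)$.
By Definition 66.6.2, $\bar\lambda_{u_k}(z) = \lambda_z(u_k) = \lambda_z(\omega^\theta(y_k)) = v^\theta_z(y_k)$ for k = 1, 2 by Theorem 69.5.15 (xxiv). So

$[\bar\lambda_{u_1}, \bar\lambda_{u_2}](z) = \partial_{v^\theta_z(y_1)} \bar\lambda_{u_2} - \partial_{v^\theta_z(y_2)} \bar\lambda_{u_1}.$

Consequently

$[\omega^\theta(y_1), \omega^\theta(y_2)] = \lambda_z^{-1}([\bar\lambda_{u_1}, \bar\lambda_{u_2}](z))$ (70.5.3)
$\qquad = \lambda_z^{-1}\big( \partial_{v^\theta_z(y_1)} \bar\lambda_{\omega^\theta(y_2)} - \partial_{v^\theta_z(y_2)} \bar\lambda_{\omega^\theta(y_1)} \big),$

where $\bar\lambda_{\omega^\theta(y_k)} \in X^1(T(P))$ satisfies $\bar\lambda_{\omega^\theta(y_k)}(z) = \lambda_z(\omega^\theta(y_k)) = v^\theta_z(y_k)$ for k = 1, 2. By applying the exterior
derivative formula for $d\omega^\theta$ in Definition 61.14.9 and Theorem 61.14.11 to these cross-sections, one obtains:

$\forall z \in P,\ \forall y_1, y_2 \in T_z(P),$ (70.5.4)
$(d\omega^\theta)(v^\theta_z(y_1), v^\theta_z(y_2)) = \partial_{v^\theta_z(y_1)}(\omega^\theta \circ \bar\lambda_{\omega^\theta(y_2)}) - \partial_{v^\theta_z(y_2)}(\omega^\theta \circ \bar\lambda_{\omega^\theta(y_1)}) - \omega^\theta([\bar\lambda_{\omega^\theta(y_1)}, \bar\lambda_{\omega^\theta(y_2)}](z)).$

If $z' \in P$ then $(\omega^\theta \circ \bar\lambda_{\omega^\theta(y_k)})(z') = \omega^\theta(\lambda_{z'}(\omega^\theta(y_k))) = \omega^\theta(y_k)$ by Theorem 69.5.15 (xxiii). This implies that
the cross-section $\omega^\theta \circ \bar\lambda_{\omega^\theta(y_k)} \in X(\Lambda_1(T(P), T_e(G)))$ is constant for k = 1, 2. So by Theorem 54.14.5,

$\forall z \in P,\ \forall y_1, y_2 \in T_z(P),\quad (d\omega^\theta)(v^\theta_z(y_1), v^\theta_z(y_2)) = -\omega^\theta([\bar\lambda_{\omega^\theta(y_1)}, \bar\lambda_{\omega^\theta(y_2)}](z))$
$\qquad = -\omega^\theta(\lambda_z([\omega^\theta(y_1), \omega^\theta(y_2)]))$ (70.5.5)
$\qquad = -[\omega^\theta(y_1), \omega^\theta(y_2)],$ (70.5.6)

where line (70.5.5) follows from line (70.5.3), and line (70.5.6) follows from Theorem 69.5.15 (xxiii).
By Definition 69.4.2, $y_\ell = h^\theta_z(y_\ell) + v^\theta_z(y_\ell)$ for $\ell = 1, 2$. So the bilinearity of $d\omega^\theta$ at z implies that

$(d\omega^\theta)(y_1, y_2) = (d\omega^\theta)(h^\theta_z(y_1), h^\theta_z(y_2)) + (d\omega^\theta)(v^\theta_z(y_1), v^\theta_z(y_2))$
$\qquad + (d\omega^\theta)(h^\theta_z(y_1), v^\theta_z(y_2)) + (d\omega^\theta)(v^\theta_z(y_1), h^\theta_z(y_2)).$

Therefore by line (70.5.6) and the antisymmetry of $d\omega^\theta$, one obtains

$(d\omega^\theta)(h^\theta_z(y_1), h^\theta_z(y_2)) = (d\omega^\theta)(y_1, y_2) + [\omega^\theta(y_1), \omega^\theta(y_2)].$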


Hence line (70.5.2) follows by Notation 70.5.4. 
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70.5.8 REMARK: Tricks used in the proof of the structure equation for the curvature form. 
The proof of Theorem 70.5.7 uses some noteworthy tricks. One trick is that the functions $\Lambda_{\omega^\beta_z(y_k)} \in X^1(T(P))$
for $k = 1,2$ provide a very convenient pair of vector fields on $P$ to which the vector-field version of the exterior
derivative in Definition 61.14.9 can be applied on line (70.5.4).


By Definition 69.5.4, the connection form values wf (yp) = A7 + (v8 (yp)) are the same as the vertical compo- 
nent values vf (yp) except that the simple connection-independent map A7! = (dLP)z! : T; o(P) > T«(G) 
has been applied to them. (See Definition 66.5.2 for Az.) 

The map Az! has the very limited domain T; o(P) because A, maps the Lie algebra set T.(G) (which is 
common to all of the principal bundle P) to only the vertical vectors at z € P. However, the transposed 
function $\Lambda_{\omega^\beta_z(y_k)}$ is a global cross-section of $T(P)$. Thus a global cross-section of $T(P)$ is constructed from a
single vector $y_k \in T_z(P)$ at the single point $z \in P$.


As a bonus, this cross-section has the value $\Lambda_{\omega^\beta_z(y_k)}(z) = \lambda_z(\omega^\beta_z(y_k)) = v^\beta_z(y_k)$ at $z$. So it very conveniently
provides the right kind of input for the vector-field exterior derivative in Definition 61.14.9. Most importantly,
it is not necessary to artificially manufacture such a cross-section as in the vector version of the exterior
derivative in Definition 61.10.3, where local manifold charts are used to “manufacture” vector fields with the
desired value at a given point.


Since the vector fields $\Lambda_{\omega^\beta_z(y_k)}$ can be fed into the exterior derivative $d\omega^\beta$ to evaluate its double-vertical
component, and the horizontal-vertical cross-terms cancel because of the antisymmetry, what is left is the
double-horizontal component, which is defined to be the curvature form. Consequently all of the real work
in the proof of Theorem 70.5.7 is related to the double-vertical component.

The second trick applied in the proof of Theorem 70.5.7 is the conversion of the Lie bracket $[\Lambda_{\omega^\beta_z(y_1)}, \Lambda_{\omega^\beta_z(y_2)}]$
of vector fields $\Lambda_{\omega^\beta_z(y_\ell)}$ into a product of elements of the Lie algebra $T_e(G)$. This Lie bracket arises as a kind
of “correction term” from the application of the exterior derivative $d\omega^\beta$ to nonholonomic vector fields. (See
Remark 61.14.8 for the interpretation of this Lie bracket as a “correction term”.)

It is perhaps surprising that a Lie bracket of principal bundle vector fields can be converted to a product 
of Lie algebra elements. The reason it works is that Lie algebra products are defined to be Lie brackets of 
vector fields in Definitions 62.8.3 and 62.8.8. Vector fields on the Lie group and on the principal bundle 
are map-related as in Definition 61.6.2 via the fundamental vertical vector field A. So the Lie brackets are 
related by Theorem 61.6.6. This is what connects PFB vector fields to Lie algebra element products. 


A third fortuitous circumstance is that the derivative terms on line (70.5.4) are derivatives of constants.
This is because $\omega^\beta_{z'}$ is a left inverse of $\lambda_{z'}$ for all $z' \in P$. So a connection form applied to the output of a
fundamental vector field $\Lambda_u$ yields $\omega^\beta_{z'}(\Lambda_u(z')) = \omega^\beta_{z'}(\lambda_{z'}(u)) = u$ for all $u \in T_e(G)$, which is clearly constant.
So differentiating the composition of a connection form with a fundamental vector field always yields zero.
Thus two complicated terms in the exterior derivative computation disappear without computing them.


The final result of applying these “tricks” is that a Lie algebra product appears in the structure equation
on line (70.5.2) alongside the exterior derivative $d\omega^\beta$. This apparent incongruity arises from the Lie bracket
“correction term” of the exterior derivative, which is converted into a Lie algebra product by a fundamental
vector field via the map-related vector fields concept, combined with the fact that a Lie algebra product is
in fact a Lie bracket of vector fields.


70.5.9 REMARK: The confusing [w,w] notation for the self-commutator of a connection form. 

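Much of the literature concerned with formulas such as the one in Theorem 70.5.7 uses a notation such
as $\Omega = d\omega + [\omega,\omega]$. If taken literally, the commutator expression $[\omega,\omega]$ would always equal zero. To
clarify the issue here, consider the two function composites $[\cdot,\cdot] \circ (\omega \mathbin{\dot\times} \omega)$ and $[\cdot,\cdot] \circ (\omega \times \omega)$, where the
common-domain and double-domain function product operations “$\dot\times$” and “$\times$” are as in Notations 10.15.3
and 10.14.4 respectively. These two function composites are calculated as follows.

$$\forall y \in \mathrm{Dom}(\omega),\quad ([\cdot,\cdot] \circ (\omega \mathbin{\dot\times} \omega))(y) = [\omega(y), \omega(y)]. \qquad \text{[useless]}$$
$$\forall y_1, y_2 \in \mathrm{Dom}(\omega),\quad ([\cdot,\cdot] \circ (\omega \times \omega))(y_1, y_2) = [\omega(y_1), \omega(y_2)]. \qquad \text{[useful]}$$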


It is the double-domain version which is intended in the equation $\Omega = d\omega + [\omega,\omega]$. Similarly, the curvature
form in Definition 70.5.2 may be expressed as $D\omega = (d\omega) \circ (h \times h)$. Thus one may write

$$\Omega = D\omega = (d\omega) \circ (h \times h) = d\omega + [\cdot,\cdot] \circ (\omega \times \omega).$$

(Admittedly this is not going to go down well with the kids in the hood!)

The notation $\omega \wedge \omega$ for the function $(y_1, y_2) \mapsto [\omega(y_1), \omega(y_2)]$ has the disadvantage that wedge products
are defined in two equally popular ways which differ by a factor of 2 (for 1-forms). So one must write
$\Omega = d\omega + \frac{1}{2}\omega \wedge \omega$ to remove an extraneous factor of 2 if it appears.

Some people even write $\Omega = d\omega + \frac{1}{2}[\omega \wedge \omega]$ or $\Omega = d\omega + [\omega \wedge \omega]$ or $\Omega = d\omega + \frac{1}{2}[\omega,\omega]$. In every context, one
must look up what all of the definitions and notations mean in that context. Even the exterior derivative
may incur positive or negative integer factors according to the varying styles chosen by particular authors.
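
The difference between the two composites in Remark 70.5.9 can be checked concretely. The following is only a minimal sketch under stated assumptions: the toy “connection form” `omega` is a hypothetical linear map from vectors in R^3 to antisymmetric 3×3 matrices (the Lie algebra so(3) with the matrix commutator as its product), and none of the helper names come from this book. The common-domain composite vanishes identically, whereas the double-domain composite generally does not.

```python
def hat(a):
    """Map a vector in R^3 to an antisymmetric 3x3 matrix (an element of so(3))."""
    return [[0, -a[2], a[1]],
            [a[2], 0, -a[0]],
            [-a[1], a[0], 0]]

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def bracket(A, B):
    """Lie algebra product [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]

# Hypothetical toy "connection form": a linear map into the Lie algebra.
omega = hat

def common_domain(y):
    """Common-domain composite: y -> [omega(y), omega(y)]."""
    return bracket(omega(y), omega(y))

def double_domain(y1, y2):
    """Double-domain composite: (y1, y2) -> [omega(y1), omega(y2)]."""
    return bracket(omega(y1), omega(y2))

ZERO = [[0, 0, 0] for _ in range(3)]
print(common_domain((1, 2, 3)) == ZERO)      # the "useless" composite
print(double_domain((1, 0, 0), (0, 1, 0)))   # the "useful" composite
```

Integer entries are used so that the comparisons are exact. Any other matrix Lie algebra would serve equally well for the illustration.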


70.6. Gauge localisation of curvature forms 


((2023-1-2. Section 70.6 and Definition 70.6.2 are new today. Work in progress! ))


70.6.1 REMARK: The pull-back of a curvature form from a principal bundle to the base space. 
Definition 70.6.2 is the gauge pull-back of a curvature form in Definition 70.5.2 by analogy with the gauge 
pull-back in Definition 69.11.3 of a connection form in Definition 69.5.4. 


70.6.2 DEFINITION: Gauge localisation of a curvature form. 
The curvature form localisation map via (local) cross-sections , for a curvature form Q on a C? principal 
G-bundle (P, x, M, AG), is the map F? : XL. (P,m, M)  Xisc(Ag(T(M), T. (G))) defined by 


loc 


VX € XL, (P, m, M), Dom(F2) = T(Dom(X)), 
and 
YX € XL.(P, m, M), Vp € Dom(X), Vi, V2 € T,(M), 
FY (Vi, V2) = 2(X.(Vi), X. (V2)). 


The curvature form localisation via a (local) cross-section X € X{,.(P,7,M) of a curvature form Q on 


(P, v, M, AG) is the function FÌ € Xjoc(A2(T(M), Te(G))) which satisfies 


V(Vi, V2) € T*(M), FX (Vi, Vo) = U(X, (V1), X. (V2)). 


Alternative name: the localisation of the curvature form Q via the cross-section X . 


70.7. Justification of definitions of curvature of connections 
((2017-1-14. Section 70.7 needs to be thoroughly overhauled. Please ignore it for now. ))


70.7.1 REMARK: Ad-hoc Cartesian space horizontal lift map definition. 

For Theorem 70.7.6, it is convenient to introduce an ad-hoc definition for a “Cartesian space horizontal 
lift map” which uses Cartesian coordinates instead of manifolds. Since Theorem 70.7.6 has a very local 
character, coordinates are adequate. So this “toy theorem” in coordinate charts may be transferred to true 
differentiable fibre bundles. Definition 70.7.2 is a coordinates-only “toy version” of Definition 67.5.4. 


Definition 70.7.2 (i) requires that $L(p,v)(z)$ be linear with respect to $v$ for all fixed $p \in \Omega_M$ and $z \in \Omega_F$.
The first parameter $p$ of $L(p,v)(z)$ is a point in a “base space” $\Omega_M$. The second parameter $v$ is a “tangent
vector” at $p$, and $z$ is an element of the “fibre space” $\Omega_F$. Thus in terms of the familiar Christoffel array for
a vector bundle, one may identify L(p,v)(z) with (577 , 5; 4 [j (p) 0" yr, € R”, with the important 
difference that for general connections, L(p,v)(z) is not always linear with respect to z because the fibre 
bundle may not be a vector bundle. Therefore it is not assumed that the map z — L(p, v)(z) is linear. 


Linearity of the lift map $L(p,v)(z)$ with respect to $v$ ensures that parallel transport is independent of the
parametrisation of curves. At least “scalarity” is required for parametrisation independence, but life is
slightly simpler if full linearity is specified as the requirement. The third parameter $z$ in $L(p,v)(z)$ refers to
a point in the fibre space of a differentiable fibre bundle, while $L(p,v)(z)$ refers to an element of the tangent
space at $z$ in $\Omega_F$.


The $C^k$ condition in Definition 70.7.2 (ii), jointly with respect to $p$, $v$ and $z$, is required in order to take
advantage of the total and directional differentiability which is guaranteed by Theorem 41.6.17.


70.7.2 DEFINITION: Coordinates-only horizontal lift map. (Toy version.)
A $C^k$ (differentiable) Cartesian space horizontal lift map from $\mathbb{R}^n$ to $\mathbb{R}^m$, for $k \in \mathbb{Z}^+$ and $n, m \in \mathbb{Z}^+$, is a
map $L : \Omega_M \times \mathbb{R}^n \to (\Omega_F \to \mathbb{R}^m)$, for some $\Omega_M \in \mathrm{Top}(\mathbb{R}^n)$ and $\Omega_F \in \mathrm{Top}(\mathbb{R}^m)$, which satisfies
(i) $\forall p \in \Omega_M,\ \forall z \in \Omega_F,\ (v \mapsto L(p,v)(z)) \in \mathrm{Lin}(\mathbb{R}^n, \mathbb{R}^m)$,
(ii) $((p,v,z) \mapsto L(p,v)(z)) \in C^k(\Omega_M \times \mathbb{R}^n \times \Omega_F, \mathbb{R}^m)$.


70.7.3 REMARK: The covariant derivative in the Riemann curvature justification theorem. 

The operator $D_u$ in Theorem 70.7.6 signifies the “covariant derivative” in the direction $u \in \mathbb{R}^n$ of a function
$(p,v,z) \mapsto L(p,v)(z)$ for fixed $v$, which gives the expression on line (70.7.6). This is a kind of derivative
“following the flow” specified by the lift map $L$. The term $\sum_{j=1}^m L(p,u)(z)^j \partial_{z^j} h(p,v,z)$ in line (70.7.1) uses
the lift map $L$ to determine a “preferred direction” for $z$ to vary in, so that the whole expression for $D_u h$:


(1) differentiates $h(p,v,z)$ by varying $p$ in the direction $u \in \mathbb{R}^n$,
(2) does not vary $v$ at all, and


(3) differentiates $h(p,v,z)$ by varying $z$ in the direction $L(p,u)(z) \in \mathbb{R}^m$, which is the direction in which $z$
would move if it were parallel transported in the direction of motion $u$ of the “base space point” $p$.


This kind of derivative will get the “right answer” for the derivative of $h$ along a curve in $\Omega_F$ (such as $P$ and
$Q$ in Theorem 70.7.6) if it has been horizontally lifted by $L$ from a curve in $\Omega_M$ (such as $\gamma$ in Theorem 70.7.6).


In terms of the significance for fibre bundles, the pair $(p,z)$ represents the two components of a local
trivialisation of a fibre bundle total space, and the function $h$ represents a cross-section of the tangent
bundle of that total space, which is the same thing as a vector field on the total space. Thus this “covariant
derivative” is differentiating a vector field at a point in the total space, given the “velocity” $u$ in the base
space to indicate the direction of base space motion, which in turn determines the “correct velocity” for total
space motion for parallel motion.


Although vector fields are mentioned here in the interpretation of the function $h$ in Definition 70.7.4, it
should be noted that the expression for $D_u h$ in line (70.7.1) depends only on the values of the first partial
derivatives of $h$ at the point $(p,v,z) \in \Omega_M \times \mathbb{R}^n \times \Omega_F$, not on any kinds of integrals of the flow which might
be implied by the vector field $h$. In other words, $D_u h$ is a strictly local property of the field $h$.


The “velocity” parameter v in h(p,v,z) is not varied because it is irrelevant from the point of view of 
differentiating a vector field on the total space of a fibre bundle, which utilises only the parameters p and z. 
It is included in Definition 70.7.4 only because it appears in the proof of Theorem 70.7.6. However, in the case 
of genuine ordinary fibre bundles, there is no such thing as a “constant velocity" in an open neighbourhood 
of a point in the base space. So some kind of vector field is then necessary in order to construct a vector 
field on the total space of the form z ++ 0y(,,(z))(z), where V is a vector field on the base space of an OFB 
(E, te, M, AL). On the other hand, Theorem 70.7.6 indicates that the curvature may be computed from 
a vector field which is defined only on a two-parameter curve-family in the base space, and the choice of 
vector fields which extend u and v from p to a neighbourhood of p has no influence on the result of the 
computation. 


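70.7.4 DEFINITION: The covariant derivative with respect to a Cartesian space lift map from $\mathbb{R}^n$ to $\mathbb{R}^m$,
$L : \Omega_M \times \mathbb{R}^n \to (\Omega_F \to \mathbb{R}^m)$, of a $C^1$ function $h : \Omega_M \times \mathbb{R}^n \times \Omega_F \to \mathbb{R}^m$ in direction $u \in \mathbb{R}^n$ is the function
$D_u h : \Omega_M \times \mathbb{R}^n \times \Omega_F \to \mathbb{R}^m$ given by

$$\forall (p,v,z) \in \Omega_M \times \mathbb{R}^n \times \Omega_F,$$
$$(D_u h)(p,v,z) = \sum_{i=1}^n u^i \,\partial_{p^i} h(p,v,z) + \sum_{j=1}^m L(p,u)(z)^j \,\partial_{z^j} h(p,v,z), \tag{70.7.1}$$

where $\partial_{p^i} h(p,v,z)$ denotes the partial derivative of $h$ with respect to the $i$th component of the Cartesian
space $\mathbb{R}^{2n+m} = \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^m$, and $\partial_{z^j} h(p,v,z)$ denotes the partial derivative of $h$ with respect to the
$(2n+j)$th component, namely the $j$th component of the $\mathbb{R}^m$ component-space.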
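
The formula on line (70.7.1) can be illustrated numerically in the simplest case $n = m = 1$. The following is a minimal sketch, not a definitive implementation: the lift map `lift`, the function `h` and the helper names are hypothetical examples chosen for illustration, and the partial derivatives are approximated by central differences. Note that the parameter `v` is passed through but never varied.

```python
def partial(f, args, index, eps=1e-6):
    """Central-difference estimate of the partial derivative of f in one argument."""
    hi, lo = list(args), list(args)
    hi[index] += eps
    lo[index] -= eps
    return (f(*hi) - f(*lo)) / (2 * eps)

def covariant_derivative(h, lift, u):
    """Return the function D_u h for n = m = 1, following line (70.7.1)."""
    def d_u_h(p, v, z):
        dh_dp = partial(h, (p, v, z), 0)   # vary p in the direction u
        dh_dz = partial(h, (p, v, z), 2)   # vary z in the direction L(p, u)(z)
        return u * dh_dp + lift(p, u, z) * dh_dz   # v is not varied at all
    return d_u_h

# Hypothetical example data (assumptions, not taken from the text).
def lift(p, v, z):
    return v * p * z        # linear in v, as Definition 70.7.2 (i) requires

def h(p, v, z):
    return p * p * z + v

d_u_h = covariant_derivative(h, lift, u=0.5)
print(d_u_h(1.5, 0.7, 2.0))
```

For this example the exact value is $u \cdot 2pz + (upz) \cdot p^2$, which the finite-difference estimate reproduces up to rounding error.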


70.7.5 REMARK: Illustration of Riemann curvature justification theorem. 
Some of the constructions in Theorem 70.7.6 are illustrated in Figure 70.7.1. (The curvature which is 
computed here is the “curvature of parallel transport”, which equals the negative of the standard Riemann
curvature because the Riemann curvature computes the covariant derivative of a given vector field around a
loop, whereas the “parallel transport curvature” shown here computes the net effect of parallel transporting 
a given fibre set element around the loop. See Remark 70.7.7 for more comment on this.) 


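[Figure: the two-parameter family $\gamma(s,t)$ in $\Omega_M \subseteq \mathbb{R}^n$, with base point $p = \gamma(0,0)$ and corner points $\gamma(s,0)$, $\gamma(0,t)$ and $\gamma(s,t)$, parametrised by $I \subseteq \mathbb{R}^2$; above it, the lifted curves $P$ and $Q$ in $\Omega_F \subseteq \mathbb{R}^m$, starting at $z = P(0,0) = Q(0,0)$ and passing through $P(s,0)$ and $Q(0,t)$ to the end points $P(s,t)$ and $Q(s,t)$.]
Figure 70.7.1 Curvature of “toy” parallel transport around a loop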


70.7.6 THEOREM: The Riemann curvature justification theorem. (Toy version.) 


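Let $n, m \in \mathbb{Z}^+$. Let $L : \Omega_M \times \mathbb{R}^n \to (\Omega_F \to \mathbb{R}^m)$ be a $C^1$ Cartesian space lift map from $\mathbb{R}^n$ to $\mathbb{R}^m$. Let
$p \in \Omega_M$. Let $z \in \Omega_F$. Let $I = (-1,1)^2 \subseteq \mathbb{R}^2$. Let $\gamma \in C^2(I, \mathbb{R}^n)$ satisfy $\gamma(0,0) = p$ and $\gamma(I) \subseteq \Omega_M$. Let
$P, Q \in C^2(I, \Omega_F)$ satisfy $P(0,0) = Q(0,0) = z$ and

$$\forall (s,0) \in I,\quad \partial_1 P(s,0) = L(\gamma(s,0), \partial_1\gamma(s,0))(P(s,0)) \tag{70.7.2}$$
$$\forall (s,t) \in I,\quad \partial_2 P(s,t) = L(\gamma(s,t), \partial_2\gamma(s,t))(P(s,t)) \tag{70.7.3}$$
$$\forall (0,t) \in I,\quad \partial_2 Q(0,t) = L(\gamma(0,t), \partial_2\gamma(0,t))(Q(0,t)) \tag{70.7.4}$$
$$\forall (s,t) \in I,\quad \partial_1 Q(s,t) = L(\gamma(s,t), \partial_1\gamma(s,t))(Q(s,t)). \tag{70.7.5}$$

Let $u = \partial_1\gamma(0,0) = \partial_s\gamma(s,0)\big|_{s=0}$ and $v = \partial_2\gamma(0,0) = \partial_t\gamma(0,t)\big|_{t=0}$. Then

$$\lim_{(s,t)\to(0,0)} \frac{P(s,t) - Q(s,t) - st\,\bigl(D_u L(p,v)(z) - D_v L(p,u)(z)\bigr)}{|(s,t)|^2} = 0,$$

where

$$\forall u, v \in \mathbb{R}^n,\quad D_u L(p,v)(z) = \sum_{i=1}^n u^i \,\partial_{p^i} L(p,v)(z) + \sum_{j=1}^m L(p,u)(z)^j \,\partial_{z^j} L(p,v)(z). \tag{70.7.6}$$

In other words,

$$\forall \varepsilon \in \mathbb{R}^+,\ \exists \delta \in \mathbb{R}^+,\ \forall (s,t) \in I \setminus \{(0,0)\},$$
$$\max(|s|,|t|) < \delta \;\Rightarrow\; \Bigl| P(s,t) - Q(s,t) - st \sum_{k,\ell=1}^n \bigl(D_{e_k} L(p,e_\ell)(z) - D_{e_\ell} L(p,e_k)(z)\bigr) u^k v^\ell \Bigr| < \varepsilon\,|(s,t)|^2.$$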


PROOF: Let $n$, $m$, $I$, $\Omega_M$, $\Omega_F$, $p$, $z$, $L$, $\gamma$, $P$, $Q$, $u$ and $v$ be as described. By Definition 70.7.2 and the twice
differentiability of $\gamma$, the expression for $\partial_1 P(s,0)$ in line (70.7.2) is differentiable for all $(s,0) \in I$, and by
Theorem 41.6.17 and the second-order chain rule for $C^2$ maps between Cartesian spaces, Theorem 42.5.25,

$$\forall (s,0) \in I,\quad \partial_1^2 P(s,0) = \sum_{i=1}^n \partial_1\gamma(s,0)^i \,\partial_{p^i} L(\gamma(s,0), \partial_1\gamma(s,0))(P(s,0))$$
$$+ L(\gamma(s,0), \partial_1^2\gamma(s,0))(P(s,0)) \tag{70.7.7}$$
$$+ \sum_{j=1}^m \partial_1 P(s,0)^j \,\partial_{z^j} L(\gamma(s,0), \partial_1\gamma(s,0))(P(s,0)),$$


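where the pseudo-notations “$\partial_{p^i}$” and “$\partial_{z^j}$” indicate partial derivatives of the maps $p \mapsto L(p,v)(z)$ and
$z \mapsto L(p,v)(z)$ respectively. The term $\partial_1^2\gamma(s,0)$ appears as the second argument of $L$ on line (70.7.7) due to
the linearity of the map $(v \mapsto L(p,v)(z))$. It follows from equation (70.7.7) with $s = 0$ that

$$\partial_1^2 P(0,0) = \sum_{i=1}^n u^i \,\partial_{p^i} L(p,u)(z) + L(p, \partial_1^2\gamma(0,0))(z) + \sum_{j=1}^m \partial_1 P(0,0)^j \,\partial_{z^j} L(p,u)(z)$$
$$= \sum_{i=1}^n u^i \,\partial_{p^i} L(p,u)(z) + L(p, a_{11})(z) + \sum_{j=1}^m L(p,u)(z)^j \,\partial_{z^j} L(p,u)(z), \tag{70.7.8}$$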


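because $\partial_1 P(0,0) = L(p,u)(z)$ by line (70.7.2), where the vector-valued matrix $[a_{k\ell}]_{k,\ell=1}^2 \in (\mathbb{R}^n)^{2\times 2}$ is
defined by $a_{k\ell} = \partial_k \partial_\ell \gamma(0,0)$ for $k, \ell \in \mathbb{N}_2$. (This matrix is symmetric because $\gamma$ is $C^2$.)
Let $\varepsilon_1 = \frac{1}{8}\varepsilon$. Then for some $\delta_1 \in \mathbb{R}^+$,

$$\forall s \in (-\delta_1, \delta_1),\quad |P(s,0) - z - s L(p,u)(z) - \tfrac{1}{2} s^2 \partial_1^2 P(0,0)| \le \varepsilon_1 |s|^2. \tag{70.7.9}$$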


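From line (70.7.3) and Definition 70.7.2, it follows that

$$\forall (s,t) \in I,\quad \partial_2^2 P(s,t) = \sum_{i=1}^n \partial_2\gamma(s,t)^i \,\partial_{p^i} L(\gamma(s,t), \partial_2\gamma(s,t))(P(s,t))$$
$$+ L(\gamma(s,t), \partial_2^2\gamma(s,t))(P(s,t)) \tag{70.7.10}$$
$$+ \sum_{j=1}^m \partial_2 P(s,t)^j \,\partial_{z^j} L(\gamma(s,t), \partial_2\gamma(s,t))(P(s,t)).$$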


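For any given $s \in (-\delta_1, \delta_1)$, substitution of $t = 0$ into equation (70.7.10) gives

$$\partial_2^2 P(s,0) = \sum_{i=1}^n \bar v^i \,\partial_{p^i} L(\bar p, \bar v)(\bar z) + L(\bar p, \partial_2^2\gamma(s,0))(\bar z) + \sum_{j=1}^m L(\bar p, \bar v)(\bar z)^j \,\partial_{z^j} L(\bar p, \bar v)(\bar z), \tag{70.7.11}$$

where $\bar p = \gamma(s,0)$, $\bar v = \partial_2\gamma(s,0)$, $\bar z = P(s,0)$, because $\partial_2 P(s,0) = L(\bar p, \bar v)(\bar z)$ by line (70.7.3). (For brevity,
the dependence of $\bar p$, $\bar v$ and $\bar z$ on $s$ is not indicated, although analysis of this dependence on $s$ is the most
important part of the proof!)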
The value of $P(s,t)$ can now be estimated for small $t$ from estimates for $P(s,0)$, $\partial_2 P(s,0)$ and $\partial_2^2 P(s,0)$.
An estimate for $P(s,0) = \bar z$ is already given in line (70.7.9). An estimate for $\partial_2 P(s,0)$ can be obtained
from line (70.7.3) in terms of estimates for $\bar p$, $\bar v$ and $\bar z$. An estimate for $\partial_2^2 P(s,0)$ may be obtained from
line (70.7.11) and estimates for $\bar p$, $\bar v$, $\bar z$ and $\partial_2^2\gamma(s,0)$.


By differentiating line (70.7.3) with respect to $s$ and setting $s = t = 0$, it follows that

$$\partial_1 \partial_2 P(0,0) = \sum_{i=1}^n u^i \,\partial_{p^i} L(p,v)(z) + L(p, \partial_1\partial_2\gamma(0,0))(z) + \sum_{j=1}^m \partial_1 P(0,0)^j \,\partial_{z^j} L(p,v)(z)$$
$$= \sum_{i=1}^n u^i \,\partial_{p^i} L(p,v)(z) + L(p, a_{12})(z) + \sum_{j=1}^m L(p,u)(z)^j \,\partial_{z^j} L(p,v)(z). \tag{70.7.12}$$

Let $\varepsilon_2 = \frac{1}{8}\varepsilon$. Then for some $\delta_2 \in \mathbb{R}^+$,

$$\forall s \in (-\delta_2, \delta_2),\quad |\partial_2 P(s,0) - L(p,v)(z) - s\,\partial_1\partial_2 P(0,0)| \le \varepsilon_2 |s|, \tag{70.7.13}$$

because $\partial_2 P(0,0) = L(p,v)(z)$ by line (70.7.3), and the uniformity of $\varepsilon_2$ with respect to $s$ follows from the
fact that $\gamma$ and $P$ are $C^2$ functions.


An estimate is now required for the deviation of the expression in line (70.7.11) from its value for $s = 0$. The
factor $\bar v^i = \partial_2\gamma(s,0)^i$ is continuous with respect to $s$ for all $i \in \mathbb{N}_n$ because $\gamma$ is a $C^2$ function. The factor
$\partial_{p^i} L(\bar p, \bar v)(\bar z)$ is continuous with respect to $\bar p$, $\bar v$ and $\bar z$ for each $i \in \mathbb{N}_n$ because $L$ is a $C^1$ function jointly
in these parameters by Definition 70.7.2 (ii). But $\bar p$, $\bar v$ and $\bar z$ are all continuous with respect to $s$ because
$\gamma$ is $C^2$ and $\bar z = P(s,0)$ is a $C^2$ function of $s$. So the map $s \mapsto \partial_{p^i} L(\bar p, \bar v)(\bar z)$ is continuous on $(-1,1)$ for
each $i \in \mathbb{N}_n$. Therefore the map $s \mapsto \sum_{i=1}^n \bar v^i \,\partial_{p^i} L(\bar p, \bar v)(\bar z)$ is continuous on $(-1,1)$.


For the second term on line (70.7.11), the map $s \mapsto L(\bar p, \partial_2^2\gamma(s,0))(\bar z)$ is continuous on $(-1,1)$ because $L$
is $C^1$ and the maps $s \mapsto \bar p = \gamma(s,0)$ and $s \mapsto \partial_2^2\gamma(s,0)$ are continuous since $\gamma$ is $C^2$.

The third term on line (70.7.11) is continuous with respect to $s$ because the maps $s \mapsto L(\bar p, \bar v)(\bar z)^j$ and
$s \mapsto \partial_{z^j} L(\bar p, \bar v)(\bar z)$ are continuous since $\bar p$, $\bar v$ and $\bar z$ are continuous with respect to $s$ and $L$ is $C^1$ jointly in
its three parameters by Definition 70.7.2 (ii). Consequently the map $s \mapsto \sum_{j=1}^m L(\bar p, \bar v)(\bar z)^j \,\partial_{z^j} L(\bar p, \bar v)(\bar z)$ is
continuous on $(-1,1)$.


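Since all three terms on line (70.7.11) are continuous in $s$, it follows that $\partial_2^2 P(s,0)$ is continuous in $s$. Let
$\varepsilon_3 = \frac{1}{8}\varepsilon$. Then for some $\delta_3 \in \mathbb{R}^+$,

$$\forall s \in (-\delta_3, \delta_3),\quad |\partial_2^2 P(s,0) - \partial_2^2 P(0,0)| < \varepsilon_3, \tag{70.7.14}$$

where

$$\partial_2^2 P(0,0) = \sum_{i=1}^n v^i \,\partial_{p^i} L(p,v)(z) + L(p, a_{22})(z) + \sum_{j=1}^m L(p,v)(z)^j \,\partial_{z^j} L(p,v)(z). \tag{70.7.15}$$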


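Let $\varepsilon_4 = \frac{1}{8}\varepsilon$. Then since $P$ is a $C^2$ function, for some $\delta_4 \in \mathbb{R}^+$,

$$\forall t \in (-\delta_4, \delta_4),\quad |P(s,t) - P(s,0) - t\,\partial_2 P(s,0) - \tfrac{1}{2} t^2 \partial_2^2 P(s,0)| \le \varepsilon_4 |t|^2. \tag{70.7.16}$$

By substituting lines (70.7.9), (70.7.13) and (70.7.14) into line (70.7.16), one obtains

$$\forall (s,t) \in (-\delta', \delta')^2,$$
$$|P(s,t) - z - s L_u - \tfrac{1}{2} s^2 P_{11} - t L_v - st P_{12} - \tfrac{1}{2} t^2 P_{22}| < \varepsilon_1 |s|^2 + \varepsilon_2 |st| + \varepsilon_3 |t|^2 + \varepsilon_4 |t|^2, \tag{70.7.17}$$

where $L_u = L(p,u)(z)$, $L_v = L(p,v)(z)$ and $P_{k\ell} = \partial_k \partial_\ell P(0,0)$ for $k, \ell \in \mathbb{N}_2$ and $\delta' = \min(\delta_1, \delta_2, \delta_3, \delta_4)$.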
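One may obtain analogous formulas for $Q$. Let $\varepsilon_5 = \frac{1}{8}\varepsilon$. Then for some $\delta_5 \in \mathbb{R}^+$,

$$\forall t \in (-\delta_5, \delta_5),\quad |Q(0,t) - z - t L(p,v)(z) - \tfrac{1}{2} t^2 \partial_2^2 Q(0,0)| \le \varepsilon_5 |t|^2, \tag{70.7.18}$$

because $t \mapsto Q(0,t)$ is $C^2$ on $(-1,1)$ and $\partial_2 Q(0,0) = L(p,v)(z)$, and differentiating line (70.7.4) with respect
to $t$ gives

$$\partial_2^2 Q(0,0) = \sum_{i=1}^n v^i \,\partial_{p^i} L(p,v)(z) + L(p, \partial_2^2\gamma(0,0))(z) + \sum_{j=1}^m \partial_2 Q(0,0)^j \,\partial_{z^j} L(p,v)(z)$$
$$= \sum_{i=1}^n v^i \,\partial_{p^i} L(p,v)(z) + L(p, a_{22})(z) + \sum_{j=1}^m L(p,v)(z)^j \,\partial_{z^j} L(p,v)(z). \tag{70.7.19}$$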


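Let $\varepsilon_6 = \frac{1}{8}\varepsilon$. By differentiating line (70.7.5) with respect to $t$, it follows that for some $\delta_6 \in \mathbb{R}^+$,

$$\forall t \in (-\delta_6, \delta_6),\quad |\partial_1 Q(0,t) - L(p,u)(z) - t\,\partial_2\partial_1 Q(0,0)| < \varepsilon_6 |t|, \tag{70.7.20}$$

where the uniformity of $\varepsilon_6$ with respect to $t$ follows from the fact that $\gamma$ and $Q$ are $C^2$ functions, and by
differentiating line (70.7.5) with respect to $t$ and setting $s = t = 0$, it follows that

$$\partial_2 \partial_1 Q(0,0) = \sum_{i=1}^n v^i \,\partial_{p^i} L(p,u)(z) + L(p, \partial_2\partial_1\gamma(0,0))(z) + \sum_{j=1}^m \partial_2 Q(0,0)^j \,\partial_{z^j} L(p,u)(z)$$
$$= \sum_{i=1}^n v^i \,\partial_{p^i} L(p,u)(z) + L(p, a_{21})(z) + \sum_{j=1}^m L(p,v)(z)^j \,\partial_{z^j} L(p,u)(z). \tag{70.7.21}$$

The map $t \mapsto \partial_1^2 Q(0,t)$ is continuous by the same arguments as for the map $s \mapsto \partial_2^2 P(s,0)$. Let $\varepsilon_7 = \frac{1}{8}\varepsilon$.
Then for some $\delta_7 \in \mathbb{R}^+$,

$$\forall t \in (-\delta_7, \delta_7),\quad |\partial_1^2 Q(0,t) - \partial_1^2 Q(0,0)| < \varepsilon_7, \tag{70.7.22}$$

where differentiation of line (70.7.5) with respect to $s$ and substituting $s = t = 0$ gives

$$\partial_1^2 Q(0,0) = \sum_{i=1}^n u^i \,\partial_{p^i} L(p,u)(z) + L(p, a_{11})(z) + \sum_{j=1}^m L(p,u)(z)^j \,\partial_{z^j} L(p,u)(z). \tag{70.7.23}$$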
i=1 j=l 
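Let $\varepsilon_8 = \frac{1}{8}\varepsilon$. Then since $Q$ is a $C^2$ function, for some $\delta_8 \in \mathbb{R}^+$,

$$\forall s \in (-\delta_8, \delta_8),\quad |Q(s,t) - Q(0,t) - s\,\partial_1 Q(0,t) - \tfrac{1}{2} s^2 \partial_1^2 Q(0,t)| < \varepsilon_8 |s|^2. \tag{70.7.24}$$

By substituting lines (70.7.18), (70.7.20) and (70.7.22) into line (70.7.24), one obtains

$$\forall (s,t) \in (-\delta'', \delta'')^2,$$
$$|Q(s,t) - z - t L_v - \tfrac{1}{2} t^2 Q_{22} - s L_u - st Q_{21} - \tfrac{1}{2} s^2 Q_{11}| < \varepsilon_5 |t|^2 + \varepsilon_6 |st| + \varepsilon_7 |s|^2 + \varepsilon_8 |s|^2, \tag{70.7.25}$$

where $Q_{k\ell} = \partial_k \partial_\ell Q(0,0)$ for $k, \ell \in \mathbb{N}_2$ and $\delta'' = \min(\delta_5, \delta_6, \delta_7, \delta_8)$. Let $\delta = \min(\delta', \delta'')$. By subtracting
formula (70.7.25) from formula (70.7.17), one obtains

$$\forall (s,t) \in (-\delta, \delta)^2,$$
$$\bigl|P(s,t) - Q(s,t) - \bigl(\tfrac{1}{2} s^2 (P_{11} - Q_{11}) + st (P_{12} - Q_{21}) + \tfrac{1}{2} t^2 (P_{22} - Q_{22})\bigr)\bigr| \le \varepsilon\,|(s,t)|^2.$$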


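But $P_{11} = Q_{11}$ by lines (70.7.8) and (70.7.23), and $P_{22} = Q_{22}$ by lines (70.7.15) and (70.7.19), and by lines
(70.7.12) and (70.7.21), together with the symmetry $a_{12} = a_{21}$,

$$P_{12} - Q_{21} = \sum_{i=1}^n \bigl(u^i \,\partial_{p^i} L(p,v)(z) - v^i \,\partial_{p^i} L(p,u)(z)\bigr) + \sum_{j=1}^m \bigl(L(p,u)(z)^j \,\partial_{z^j} L(p,v)(z) - L(p,v)(z)^j \,\partial_{z^j} L(p,u)(z)\bigr)$$
$$= \sum_{k,\ell=1}^n u^k v^\ell \Bigl(\partial_{p^k} L(p,e_\ell)(z) - \partial_{p^\ell} L(p,e_k)(z) + \sum_{j=1}^m \bigl(L(p,e_k)(z)^j \,\partial_{z^j} L(p,e_\ell)(z) - L(p,e_\ell)(z)^j \,\partial_{z^j} L(p,e_k)(z)\bigr)\Bigr)$$
$$= \sum_{k,\ell=1}^n u^k v^\ell \bigl(D_{e_k} L(p,e_\ell)(z) - D_{e_\ell} L(p,e_k)(z)\bigr)$$
$$= D_u L(p,v)(z) - D_v L(p,u)(z).$$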


Therefore

$$\forall (s,t) \in (-\delta, \delta)^2,\quad |P(s,t) - Q(s,t) - st\,(D_u L(p,v)(z) - D_v L(p,u)(z))| \le \varepsilon\,|(s,t)|^2.$$


This completes the proof of Theorem 70.7.6. 
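
As an informal numerical sanity check of Theorem 70.7.6 (not part of the proof), the two lifted curves can be integrated for a simple lift map. The following is a minimal sketch under stated assumptions: the lift map `lift`, the family `gamma` and all helper names are hypothetical, with $n = 2$, $m = 1$, $\gamma(s,t) = (s,t)$, $u = e_1$ and $v = e_2$. The transport equations (70.7.2) to (70.7.5) are integrated with the classical fourth-order Runge-Kutta method, and the expression on line (70.7.6) is approximated by central differences.

```python
import math

def lift(p, v, z):
    """Hypothetical toy lift map L(p, v)(z) = v^2 * p^1 * z, linear in v."""
    return v[1] * p[0] * z

def gamma(s, t):
    """Two-parameter family in the base space, so that u = e1 and v = e2."""
    return (s, t)

E1, E2 = (1.0, 0.0), (0.0, 1.0)

def rk4(f, z0, a, b, steps=200):
    """Integrate dz/dr = f(r, z) from r = a to r = b with classical RK4."""
    step = (b - a) / steps
    z, r = z0, a
    for _ in range(steps):
        k1 = f(r, z)
        k2 = f(r + step / 2, z + step * k1 / 2)
        k3 = f(r + step / 2, z + step * k2 / 2)
        k4 = f(r + step, z + step * k3)
        z += step * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        r += step
    return z

def transport_P(z, s, t):
    """Lines (70.7.2) and (70.7.3): first along s at t = 0, then along t."""
    z1 = rk4(lambda r, w: lift(gamma(r, 0.0), E1, w), z, 0.0, s)
    return rk4(lambda r, w: lift(gamma(s, r), E2, w), z1, 0.0, t)

def transport_Q(z, s, t):
    """Lines (70.7.4) and (70.7.5): first along t at s = 0, then along s."""
    z1 = rk4(lambda r, w: lift(gamma(0.0, r), E2, w), z, 0.0, t)
    return rk4(lambda r, w: lift(gamma(r, t), E1, w), z1, 0.0, s)

def cov_deriv(u, v, p, z, eps=1e-5):
    """Central-difference estimate of D_u L(p, v)(z) as on line (70.7.6)."""
    total = 0.0
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        total += u[i] * (lift(hi, v, z) - lift(lo, v, z)) / (2 * eps)
    dz = (lift(p, v, z + eps) - lift(p, v, z - eps)) / (2 * eps)
    return total + lift(p, u, z) * dz

def curvature(u, v, p, z):
    """D_u L(p, v)(z) - D_v L(p, u)(z)."""
    return cov_deriv(u, v, p, z) - cov_deriv(v, u, p, z)

s = t = 0.01
z0 = 2.0
ratio = (transport_P(z0, s, t) - transport_Q(z0, s, t)) / (s * t)
print(ratio, curvature(E1, E2, (0.0, 0.0), z0))
print(transport_P(z0, 0.3, 0.5), z0 * math.exp(0.3 * 0.5))  # numerical vs closed form
```

For this particular lift map the transport equations can be solved in closed form, giving $P(s,t) = z e^{st}$ and $Q(s,t) = z$, so that $P(s,t) - Q(s,t) = z\,st + O(|(s,t)|^4)$, in agreement with the value $D_u L(p,v)(z) - D_v L(p,u)(z) = z$.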


((2015-10-6. The proof of Theorem 70.7.6 needs a kind of Taylor's theorem to expand twice continuously
differentiable maps between Cartesian spaces in terms of the 0th, 1st and 2nd derivatives with an error
bound. This must be presented in Chapter 40. ))


70.7.7 REMARK: Riemann curvature integrates the covariant derivative, not the parallel transport. 

At first sight, there may seem to be a sign error in the statement of Theorem 70.7.6. The reason for this 
discrepancy is that the covariant derivative subtracts parallel transport from a given vector field along a 
curve to compute how far the vector field deviates from parallel transport. The Riemann curvature tensor, 
in the Koszul formalism, applies two vector fields X and Y to a given vector field Z to obtain an expression 
R(X,Y)(Z), which equals the limit of the integral of the covariant derivative of Z around an area element 
which looks like $X(p) \wedge Y(p)$. This gives the negative of the value obtained by parallel transporting $Z(p)$
around the area element and comparing the result with the initial value of $Z(p)$.


To put this another way, the covariant derivative curvature, which is most often seen in textbooks, extends
the fibre set element z to a cross-section of the total space in a neighbourhood of the base point p, and 
then covariantly differentiates this cross-section around a loop through p. By contrast, the parallel transport 
version of the curvature transports the single fibre set element z around the loop and compares the initial 
and final values. Thus the covariant derivative version requires a cross-section as input, whereas the parallel 
transport version requires only a single element z as input. 


To put this in yet another way, the standard “vector field covariant derivative” version of Riemann curvature 
cumulatively integrates the deviation of a given vector field from parallel transport around a loop, but when 
the loop arrives back at the starting point, the value of the given vector field is, of course, the same as 
when the integration started. The loop-integral computes how the transport should have varied, whereas in 
fact the net variation is zero. When this vector field covariant derivative integration “reports” the outcome, 
it reports the observed change (namely zero) minus the desired parallel-transport change, but zero minus 
anything is equal to the negative of that thing. So it is inevitable that the covariant derivative of a vector 
field around a loop must deliver the negative of parallel transport around that loop. 


The misfortune here is that most modern differential geometry is written in terms of vector fields and 
covariant derivatives. Therefore the Christoffel array is the negative of the horizontal lift function, and the 
Riemann curvature is the negative of the parallel transport curvature. 


70.7.8 REMARK: The relation between Riemann curvature and parallel transport. 

Theorem 70.7.6 is (more or less) the central theorem of this book. It says, in essence, that the Riemann 
curvature equals (the negative of) the limit of the parallel transport around a rectangle divided by its area. 
All intrinsic differential geometry follows from this! 


70.8. Gauge theory 
(( 2018-10-25. Section 70.8 is work in progress. Please ignore it for now. )) 


70.8.1 REMARK: Equations of motion for gauge potentials. 
equations of motion for gauge theories. Such equations of motion are generally derived from a Lagrangian, 
which is in turn derived from a covariant derivative which is derived from a Lie-algebra-valued function (the 
gauge potential) on the base space. The Lie algebra elements act on cross-sections of the OFB via local fibre 
charts which are constructed from cross-sections of the PFB. 


For examples of gauge theory equations of motion, see Itzykson/Zuber [277], pages 562-569; Peskin/Schroeder [298],
pages 489, 491, 500; Mandl/Shaw [288], pages 70, 264, 412; Frankel [12], page 547; Bleecker [254], page 63; 
Moriyasu [293], pages 53, 70; Drechsler/Mayer [262], page 156. 


In classical physics, the equations of motion are the starting point for mathematical analysis. From these 
equations, the state of a system at future times is determined by the initial state on some space-like manifold. 
One might reasonably hope that the same holds true in the case of particle physics. Unfortunately, the 
quantum nature of fundamental particles implies that observations of the initial state disturb the future 
state. So initial value problems for quantum systems are, by their nature, impossible to test experimentally. 
Thus the equations of motion are rendered ineffective as a basis for realistic initial value problems. In 
practice, gauge theories for quantum systems are tested using scattering theory methods, which are in some 
sense non-causal. It is probably for this reason that gauge theory books often do not even present equations 
of motion, and when they do, they do not use them as the basis for computations of expected observable 
outcomes of experiments. 


Unquantized solutions of gauge theory equations of motion are called “classical solutions”, which are typically 
“non-physical solutions”. In the case of electromagnetic fields, classical solutions are useful because they 
are essentially the same as solutions of Maxwell’s equations, but in the case of the weak and strong forces, 
classical solutions have very limited realism. 


70.8.2 REMARK: The link between Yang-Mills gauge theories and connections on principal bundles. 
There are many gauge theory histories in the literature. See for example Drechsler/Mayer [262], pages 7-10; 
Atiyah [251], pages 5-7; Itzykson/Zuber [277], pages 630-631; Bleecker [254], pages xi-xiv; Moriyasu [293],
pages 5-33, 73-85, 102-138; Mayer [329], pages 83-92; Bruzzo [315], pages 104-105, 109-110; Sternberg [38],
pages 347-348.

Histories of gauge theory generally trace its origin to Hermann Weyl’s 1918 attempt, soon abandoned, to
unify gravity and electromagnetism by exploiting the scale or “gauge” invariance of general relativity. (See
Weyl [310], pages 298-317.) Ten years later, Weyl [311], pages 100-101, noted that an invariance property
for Maxwell’s equations had some similarity to his abandoned gravity/EM unification ideas. So he used the
same term “gauge invariance” (“Eichinvarianz” in German) for this more-or-less related concept.


The Yang-Mills theory, a non-abelian gauge theory based on isotopic spin invariance, was introduced in 1954. (See
Yang/Mills [338].) The physical reality of gauge potentials for quantum physics (but not for classical physics)
was theorised in 1959 by Aharonov/Bohm [314], and experimentally confirmed by Chambers [316] in 1960.


During 1975-1977, Mayer and Drechsler gave lecture series which explicitly described the strong links between 
the mathematicians’ concept of connections on general principal bundles and the physicists’ gauge theory 
concepts. (See Drechsler/Mayer [262].) After this time, there was a clear recognition by both mathematicians 
and physicists that there was a substantial overlap between their concepts, and that there could be mutual 
benefit in combining the two areas. 


70.8.3 REMARK: A very brief summary of gauge theory. 

The following very brief summary of “gauge theory” is given by Daniel/Viallet [317], page 176.

    The basic idea of a gauge theory is that an isospin rotation at any point of space-time, affecting
    gauge potentials and fields, leads to a different description of the same physical reality.

    Any such change is represented by a smooth assignment of an element of the gauge group to any
    point of space-time, that is, a map: $\mathbb{R}^4 \to G$. The graphs of these maps live in a space $P = \mathbb{R}^4 \times G$.

Expressed in terms of principal bundles, these “graphs” are cross-sections X € X?(P, x, R4) with m : P > IR* 

defined by 7 : (z, g) A x for (z,g) € P = Rf x G. Then X : IR^ P satisfies 7 o X =idps. 

The “isospin rotation” means a change of cross-section from X € X?(P, r, Rf) to Y € X?(P,m,IR?), 
related by $Y(x) = X(x)g(x) = R_{g(x)}(X(x))$ for some function $g : \mathbb{R}^4 \to G$, where the right multiplication
$R_{g(x)} : P_x \to P_x$ is defined by $R_{g(x)} : (x,h) \mapsto (x, h g(x))$ for $h \in G$. Then $g$ is the “isospin rotation” function.
The “gauge potentials and fields” are cross-sections AY € Xi; «(A1(T (IR2), T-(G))) as in Definition 69.11.3 
line (69.11.2) and their curvature fields. (See Daniel/Viallet [317], page 185, for "gauge fields", but see 
Remark 69.11.5 for reasons to avoid the term “gauge field”.) The “different description of reality" refers to 
the results of applying gauge transformations. 


70.8.4 REMARK: Gauge terminology. 
The use of the word “gauge” for gauge theory, gauge transformations, gauge invariance, gauge potentials 
and so forth, is an unfortunate misnomer. The poor choice of terminology originates in the apparent desire 
of Weyl [311], pages 100-101, to associate these properties for EM, where the “gauge” is really a complex- 
number phase, with his abandoned earlier attempt at a unification of gravity with EM, where the gauge was 
a real-number scaling factor. Sciama [334], page 418, made the following comment about this in a footnote. 
Unfortunately the phase transformations are usually still called gauge transformations. 
From about 1929 onwards, gauge transformations were in fact unitary transformations, or at least norm- 
preserving transformations for some kind of linear space, consistent with some compact Lie structure group. 
In the context of gauge theory, a local cross-section of a principal bundle is sometimes called a “gauge”. (See 
for example Drechsler/Mayer [262], page 117; Bleecker [254], page x; Trautman [335], page 181.) 
Alternatively, a “gauge” may mean a connection form on a principal bundle as in Definition 69.5.4. (See
for example Frankel [12], page 553; Garrity [269], page 232.) But this is more commonly called a “gauge
connection”, or sometimes even a “gauge potential”.
The term “gauge” sometimes also refers to the localisation of a connection on a principal bundle via a local
cross-section of the bundle to the base-point manifold as in Definition 69.11.3. (See for example Szekeres [305],
pages 248, 542; Scharf [303], page 146.) But this is more commonly called a “gauge potential”.


Although the concept of “gauge invariance” originated in electromagnetism much earlier, the specific term
“gauge invariance” is usually attributed to the gauge theory of Weyl [310], pages 298-317, according to
which Maxwell's electromagnetism equations could be unified with general relativity by allowing the
length-measure to vary. He subsequently abandoned this theory, although he later noted its relation to
gauge invariance for electromagnetism in the context of Dirac’s equation. (See Weyl [311], pages 100-101, 213;
Synge/Schild [41], pages 296-305.)

Identity charts $\phi_X : P \to G$ for cross-sections $X$ of principal bundles $P$ are called “gauge functions” by
Drechsler/Mayer [262], page 120. (See Definition 21.10.7.)

The term “gauge field” is sometimes used for the curvature tensor field of a gauge potential. (See for example 
Daniel/Viallet [317], page 185, for this usage.) However, this term is also sometimes used for localisations of 
connection forms which are referred to in this book as “gauge potentials". 


Thus one arrives at the following kinds of interpretations of gauge terminology.


term interpretation 
gauge X local cross-section X € X? (P, m, M) 
gauge potential A localisation A% € X5. (A1(T(M), Te(G))) of w 
gauge field F localisation FY € XL.(A2(T(M), T.(G))) of curvature Dw 


) 
gauge transformation Ag, (p); = Adj(g(p)-!) o A& y (p): + (ALG) -1) a(n) 0t" (9)) 


70.8.5 REMARK: Construction of a Lagrangian density. 

The first obstacle to the construction of a Lagrangian density is that gauge theory Lagrangians are defined 
for cross-sections of ordinary bundles with space-time as the base space, whereas the “pulled-back lifted 
chart-basis operator fields” in Definition 69.14.4 are defined on a principal bundle. 

The covariant derivatives OF iy mentioned in Remark 69.14.7, which are typically denoted like *9,, — Ap”, 
must be converted so that they act on ordinary bundles instead of principal bundles. This conversion can 
be done either by back-tracking and then constructing the covariant derivative for ordinary bundles instead, 
or by attempting to make a direct conversion from the principal bundle version. 


((2018-10-29. To be continued ... ))


70.8.6 REMARK: List of examples of fibre bundles on Minkowski space. 
Table 70.8.1 lists some examples which are related to Example 70.8.7. All of these examples use Minkowski 
space-time as the base space and the additive real numbers as the structure group. 


example    topic
64.8.8     OFB, additive real group on Minkowski space
66.1.6     PFB, additive real group on Minkowski space
66.2.19    PFB right action map
66.2.24    PFB vector field right translation operator
66.5.4     PFB infinitesimal action map
66.5.8     PFB infinitesimal action map inverse
           PFB fundamental vertical vector field
           PFB horizontal lift function (detailed)
69.1.10    PFB horizontal lift function right invariance
69.1.14    PFB horizontal lift function localisation
69.3.3     PFB horizontal component map
69.4.5     PFB vertical component map
69.5.6     PFB connection form
69.5.8     PFB connection form exterior derivative
69.11.4    gauge potential from PFB connection form
69.13.5    gauge potential component function
70.5.3     curvature of connection form
70.8.7     gauge theory for classical electromagnetism

Table 70.8.1 Examples related to connections on Minkowski space


70.8.7 EXAMPLE: Gauge theory for classical electromagnetism. 
Let $(P, \pi, M, A^G_P)$ be the principal $(\mathbb{R},+)$-bundle in Example 69.1.9, where $M = \mathbb{R}^4$, $G = \mathbb{R} \mathrel{-\!\!<} (\mathbb{R},+)$ is
the additive group of real numbers, $P = M \times G$, and $\pi : P \to M$ with $\pi : (p,g) \mapsto p$. Let $A^G_P = \{\phi\}$ with
$\phi : M \times G \to G$ defined by $\phi : (p,g) \mapsto g$.

Let 8 be the horizontal lift function on (P, v, M, AG) as in Example 69.1.9. Then By (z) = ln o\ tu plea aap 
for p € M, V = tpvidau € Tp(M) and z = (p,g) € Pp, where f : R^ x IR x R > R has the form 
fpe) Ex o a5 (p)v? for all z = (p,g) € P and v € Rf, where a = (a;)3_9 : M > IR* is the tuple of 
coefficients of the linear dependence of f on v. 


((2022-11-1. There are some incorrect assertions in Example 70.8.7. Please ignore it for now. The difficult 
task here is to translate abstract fibre bundle theory into concrete calculus. The objective here is to derive 
Maxwell's equations from the curvature of a connection form on the principal bundle P.)) 


Let $p = (x_0, x_1, x_2, x_3)$ and $z = (p, g) = (x_0, x_1, x_2, x_3, g)$ for some $g \in \mathbb{R}$. Let $\psi_M = \mathrm{id}_M$ be a chart on $M$,
and $\psi_P = \mathrm{id}_P$ be a chart on $P$.

Let $(x_0, x_1, x_2, x_3, v_0, v_1, v_2, v_3) \in \mathbb{R}^4 \times \mathbb{R}^4$ be the coordinates of $V$ with respect to $\psi_M$. Then the coordinates
of $\theta_V(z) \in T_z(P)$ with respect to $\psi_P$ must be $(x_0, x_1, x_2, x_3, g, v_0, v_1, v_2, v_3, w) \in \mathbb{R}^5 \times \mathbb{R}^5$ for some $w \in \mathbb{R}$
by Definition 69.1.3 (i, ii, iv). Since $\theta_V(z)$ depends on both $V$ and $z$, the coordinate $w$ can be written as a
function $y : \mathbb{R}^5 \times \mathbb{R}^4 \to \mathbb{R}$ because the coordinates of $p$ are a common subsequence of the coordinate tuples
for $V$ and $z$. Thus $\theta_V(z)$ has coordinates $(x_0, x_1, x_2, x_3, g, v_0, v_1, v_2, v_3, y(x_0, x_1, x_2, x_3, g, v_0, v_1, v_2, v_3)) \in \mathbb{R}^5 \times \mathbb{R}^5$
with respect to $\psi_P$.

A natural choice for a fibre chart is 6 : P — G defined by ¢: (p, g) ^ g for all p € M and g € G. Then 
the map Ra) in Definition 69.1.3 (v) satisfies Ra (h) = R$, o) ) = RG(h) hog h4gforallh eG, 
where *o" is the group operation on G. By Definition 58.1.2, (ARG )e (u) = Ou RE for u € Te(G). Therefore 
(dR )e(u) = u On RE (h) =u. 

The vector u in Definition 69.1.3 (v) depends on V and ¢, as mentioned in Remark 69.1.6. So one may write 
uy,» to show these dependencies as in Remark 69.1.6. Then the formula (d$);(8v(z)) = (ARG. )e(u (u) in 
Definition 69.1.3 (v) implies y(xo, £1, £2, 23, 9, Vo, V1, V2, U3) = uva for all V € T(M) and g € G. 

Let $\omega : T(P) \to T_e(G)$ be the connection form on $P$ constructed from $\theta$ as in Definition 69.5.4.

Let X € X!(P,x, M) be a C? cross-section of P as in Definition 64.7.2. 

Define a gauge potential on M as A = wo X, : T(M) > T.(G). 

Let  : M — IR^ be a C? chart on M. For example, let v» = idm. 


For u € Z4 = {0,1,2,3}, define A, = Ao e% : M > T(G). (See Definition 57.1.18 for ev.) 
71.0.1 REMARK: The relation of affine connections to general connections.
Affine connections are an application of the connections on general differentiable fibre bundles which are 
presented in Chapters 67, 68 and 69 to the special case of tangent bundles on differentiable manifolds. 


In the case of a general differentiable fibre bundle $(E, \pi, M, A^F_E)$, parallel transport is the transport of
elements of fibre sets $E_p$ from one base point $p$ to another, where $E_p$ is diffeomorphic to a fixed fibre space
$F$ for all $p \in M$. A general connection determines the velocity of parallel transport of vectors in $E_p$ for given
velocities of points $p$ along curves in the base space $M$. Thus a general (linear) connection specifies a (linear)
map from $T_p(M)$ to $T_z(E)$, where $z$ is an element of $E_p$ which is transported by the base-space curve.


In the special case of tangent bundles, the fibre set $E_p$ and the tangent space $T_p(M)$ are the same space. So
$E = T(M)$. Therefore a connection specifies a map from $T_p(M)$ to $T_z(T(M))$, where $z \in T_p(M)$.


One important consequence of making the tangent bundle the total space of the differentiable fibre bundle 
is that many definitions and theorems are considerably simplified. But a more important consequence is 
that many definitions become meaningful for connections on tangent bundles which would have either no 
meaning or very ambiguous meaning for general differentiable fibre bundles. For example, geodesic curves 
are "self-parallel" curves whose velocity is parallel transported along the curve, which makes no sense unless 
the fibre set at each point is the same as the tangent space. Similarly, torsion is effectively a comparison 
of parallel transport to geodesic transport, which requires geodesic curves. (See Remark 72.2.3 for more 
detailed comments on this.) 


For historical reasons, linear connections on tangent bundles are referred to as “affine connections". As 
mentioned in Remark 67.1.1, it was Hermann Weyl who introduced affinely connected manifolds in 1918 and 
gave them this name. (The particular parallelism induced by a Riemannian metric was introduced in 1917 
by Levi-Civita.) Connections on general differentiable fibre bundles arose historically as a generalisation 
of affine connections, but general fibre bundles were not introduced until 1932 by Seifert. According to 
Steenrod [142], page v: 
The recognition of the domain of mathematics called fibre bundles took place in the period 1935-1940.


Thus the historical progression was from absolute parallelism on flat Euclidean metric spaces, to the Levi- 
Civita connection on Riemannian manifolds, to affine connections on tangent bundles, to general connections 
on fibre bundles. 


71.0.2 REMARK: Curvature is evidenced by path-dependent parallelism. 
Figure 71.0.1 illustrates the general idea that parallel transport of coordinate frames from point A to point 
B in a manifold may result in disagreement at B. 


Figure 71.0.1 Path-dependent parallel transport of vectors


The difference in parallel transport between two paths with the same pair of end-points is attributed to
curvature of the manifold. In fact, this is the general definition of curvature in the sense of intrinsic geometry.
Very roughly speaking, the “amount” of path-dependence of parallel transport, divided by the
area bounded by the two curves, is some kind of measure of the “amount” of curvature. (The meaningfulness
of this ratio, in the limit, is a consequence of the assumed concatenation property of parallelism. See for
example Definition 48.3.2.) Thus the curvature of a manifold equals zero everywhere if and only if any two paths
between the same pair of end-points which can be continuously deformed into each other give the same
parallel transport. A space where curvature is everywhere
zero is called “flat”. This notion of intrinsic curvature must be very clearly distinguished from the extrinsic
curvature of a manifold which is embedded in another space. Affine connections are intrinsic attributes of
manifolds, and the curvature of such a connection is therefore an intrinsic kind of curvature.


One could perhaps say that without curvature, differential geometry would be a shallow subject. Without 
curvature, there would be no need for the vast majority of difficult concepts which differ from those in flat 
space. Since curvature of a manifold is determined by its affine connection, it is therefore the properties of 
the affine connection which make differential geometry an "interesting" subject. 


71.0.3 REMARK: Path-dependence of parallelism on the two-sphere $S^2$.

In addition to the extrinsic curvature of a two-sphere due to its embedding in $\mathbb{R}^3$, the two-sphere has an
intrinsic curvature which can be discovered by considering parallel transport of tangent frames within the
manifold. This is illustrated in Figure 71.0.2.


If a vector is transported in a parallel fashion from the equator to the north pole of the Earth, the vector
will not be parallel to the same vector transported along a different path. If you stand on the Earth
at longitude 0°, latitude 0°, assuming that the oceans are temporarily drained and the surface is made
perfectly spherical, holding a very long spear pointed northwards, and then walk towards the north pole in a
parallel fashion, the spear will point towards longitude 180° when you arrive at the pole. But if you move in
a parallel fashion first towards longitude 90° and then to the north pole, the spear will be pointing towards
longitude −90° when you arrive at the north pole. This proves that the Earth is not flat!


The very familiar example of $S^2$ creates an unfortunate false expectation regarding parallelism and curvature.
The difference in orientation of vectors transported along different paths in $S^2$ is proportional to the area
between the two paths. (This is in essence due to the commutativity of the structure group SO(2).) Such
proportionality of parallel transport difference for pairs of paths is not valid in general. So one must be wary
of using $S^2$ as a source of inspiration for conjectures regarding the properties of general affine connections.
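
The spear experiment can be checked numerically. The following is only a sketch under stated assumptions: the unit sphere is embedded in $\mathbb{R}^3$, and parallel transport is approximated by repeatedly projecting the vector onto the tangent plane at the next point of the path and restoring its length. The function names `transport` and `great_arc` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

def transport(points, v):
    """Approximate parallel transport of a tangent vector v along a polyline
    of points on the unit sphere (assumed discretisation: project, rescale)."""
    v = np.asarray(v, dtype=float)
    for q in points[1:]:
        length = np.linalg.norm(v)
        v = v - np.dot(v, q) * q               # project onto tangent plane at q
        v = v * (length / np.linalg.norm(v))   # restore the original length
    return v

def great_arc(a, b, steps=2000):
    """Points on the great-circle arc from unit vector a to unit vector b."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    ts = np.linspace(0.0, 1.0, steps + 1)
    return [(np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
            for t in ts]

start = np.array([1.0, 0.0, 0.0])   # longitude 0 degrees, latitude 0 degrees
east = np.array([0.0, 1.0, 0.0])    # longitude 90 degrees, latitude 0 degrees
pole = np.array([0.0, 0.0, 1.0])    # north pole
spear = np.array([0.0, 0.0, 1.0])   # tangent vector pointing north at the start

# Path 1: straight up the meridian of longitude 0.
direct = transport(great_arc(start, pole), spear)
# Path 2: along the equator to longitude 90, then up that meridian.
detour = transport(great_arc(east, pole),
                   transport(great_arc(start, east), spear))

# Angle between the two results, and the area of the enclosed octant.
angle = np.degrees(np.arccos(np.clip(np.dot(direct, detour), -1.0, 1.0)))
octant_area = 4 * np.pi / 8
print(direct.round(6), detour.round(6), angle, np.degrees(octant_area))
```

Under these assumptions the two transported spears should differ by approximately 90°, which is the area of the enclosed octant of the unit sphere expressed as an angle. This agrees with the proportionality to area which is special to $S^2$.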


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]




Figure 71.0.2 Path-dependent parallel transport on a 2-sphere
[Figure labels: “Start here”; “Finish here” (twice).]


71.0.4 REMARK: A differentiable manifold without parallelism is unsuitable for physics. 

Parallelism connects up the definitions of direction at different points in a space by “carrying” vectors from 
one place to another. Layer 2 (the differentiable layer of differential geometry) has no concept of parallel 
motion. This is totally unsatisfactory for physics. Momentum, for example, requires parallel motion of 
objects to be defined. Newton’s first law says that in the absence of forces, an object’s momentum does 
not change. It travels in a straight line at an even speed. But if you apply a diffeomorphism to the space 
coordinates, a straight line may become curved and vice versa. If the notion of “straight ahead” is not 
well defined, an object does not know which direction to travel in. An affine connection must be added 
to the differentiable structure in order to define parallelism at a distance. It is sufficient to define parallel 
translation along curves. This permits objects to determine their trajectories. Without an affine connection 
to guide them, physical objects would have no idea even how to obey the laws of physics. 


71.0.5 REMARK: Historical origin of the word “affine” in mathematics. 

The word “affine” apparently arose from a thinking error by Euler when he was writing about similarity 
transformations. (See Section 77.2 for history of the word “affine”.) It is related to the word “affinity”. 
(See Remark 26.1.3 for the wide range of meanings of “affine” in different contexts.) The use of the word 
“affine” for connections is unfortunate. A better terminology would have been “linear connection”. On the 
other hand, the introduction of the term “affinely connected manifold” by Weyl [310], page 94, does seem 
reasonably well justified since parallel translation is a kind of combination of an infinitesimal translation of a 
point with a linear translation between tangent spaces analogous to the concept of an affine transformation 
on a linear space as in Definition 24.4.6. 


71.1. Horizontal lift functions on tangent bundles 


71.1.1 REMARK: Affine connections are a special case of horizontal lift functions.

If the tangent bundle $T(M)$ is substituted for the total space $E$ in Definition 67.5.4 for the horizontal lift
function of a connection on a $C^1$ ordinary fibre bundle, the result is Definition 71.1.2. (For the maps and
spaces of a tangent bundle, viewed as an ordinary fibre bundle, see Figures 65.9.1 and 65.9.2.)


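71.1.2 DEFINITION: An affine connection on the tangent bundle $T(M) \mathrel{-\!\!<} (T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)})$ of a $C^2$
manifold $M$ with $n = \dim(M)$ is a map $\theta : T(M) \to \bigcup_{p \in M} (T_p(M) \to T(T(M)))$ satisfying the following.


(i) $\forall p \in M$, $\forall V \in T_p(M)$, $\mathrm{Dom}(\theta_V) = T_p(M)$. [domain]
(ii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall z \in T_p(M)$, $\theta_V(z) \in T_z(T(M))$. [vector field]
(iii) $\forall z \in T(M)$, $(V \mapsto \theta_V(z)) \in \mathrm{Lin}(T_{\pi(z)}(M), T_z(T(M)))$. [linearity]
(iv) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall z \in T_p(M)$, $(d\pi)_z(\theta_V(z)) = V$. [horizontal component]
(v) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^{\mathbb{R}^n}_{T(M),p}$, $\exists u \in T_e(GL(n))$, $\forall z \in T_p(M)$, $(d\phi)_z(\theta_V(z)) = (dR_{\phi(z)})_e(u)$,
where $R_w : GL(n) \to \mathbb{R}^n$ denotes the right action $R_w : A \mapsto Aw$ of $w \in \mathbb{R}^n$ on $GL(n)$. [group action]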


An affine connection is also called a horizontal lift function for the tangent bundle. 
71.1.3 REMARK: Interpretation of the definition of an affine connection.
The tangent bundle $(T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)})$ in Definition 71.1.2 is a $C^1$ differentiable fibre bundle (with fibre
space $\mathbb{R}^n$ and structure group $GL(n)$) because the standard fibre atlas $A^{\mathbb{R}^n}_{T(M)}$ for this fibre bundle is taken
from the $C^2$ manifold atlas for the base space $M$. Each fibre chart $\phi : T(M) \to \mathbb{R}^n$ maps vectors $z \in \mathrm{Dom}(\phi)$
to their component tuples with respect to a manifold chart $\psi : \pi(\mathrm{Dom}(\phi)) \to \mathbb{R}^n$.


The functional form of the affine connection map $\theta : T(M) \to \bigcup_{p \in M} (T_p(M) \to T(T(M)))$ means that $\theta$
maps each vector $V$ in the tangent bundle $T(M)$ of the base space $M$ to a map from $T_p(M)$ to $T(T(M))$ for
some $p \in M$. More specifically, by condition (i), for any $p \in M$ and $V \in T_p(M)$, the value of $\theta_V$ is a map
from $T_p(M)$ to $T(T(M))$. This is made even more specific by the “vector field” condition (ii), which states
that $\theta_V$ must map $z$ to an element of $T_z(T(M))$ for each $z \in T_p(M)$. (The parameter $V$ for $\theta$ is denoted as
a subscript to avoid using the slightly less convenient function-valued-function notation $\theta(V)(z)$.)


Condition (iii) in Definition 71.1.2 says that for fixed $z \in T(M)$, the map $V \mapsto \theta_V(z)$ is linear with respect
to $V \in T_{\pi(z)}(M)$ which is in the tangent space of the base space. A particular consequence of this is that
parallel transport along a curve is independent of the parametrisation of the curve.


There are two copies of T(M) in condition (iii). The first copy appears in the role of the tangent bundle for 
the base-point space M. The second copy appears in the role of the tangent bundle's total space. In the 
abstract theory of differentiable fibre bundles, these are distinct spaces, denoted in Definition 64.8.3 as T(B) 
and E respectively. In the case of a tangent bundle, the spaces T(B) and E are the same because B = M
and E = T(M), but they play different roles.


By contrast with the linearity with respect to $V$ in condition (iii), the value $\theta_V(z)$ cannot be said to be linear
with respect to $z$ because $\theta_V(z) \in T_z(T(M))$ by condition (ii), which implies that $\theta_V(z)$ is an element of a
different tangent space $T_z(T(M))$ for each $z \in T_p(M)$.


Condition (iv) states that the projected horizontal component $(d\pi)_z(\theta_V(z))$ of $\theta_V(z)$ must equal $V$. This
justifies the word “lift” in “horizontal lift function” because it means that $\theta_V(z)$ is in some sense directly
“above” $V$. In other words, $\theta$ “lifts” $V$ to a vector $\theta_V(z)$ which is “above” $V$. (As mentioned in Remark 67.4.3,
this suggests that the term “vertical lift” could be more accurate, although the “output” of a horizontal lift
function is indeed a horizontal vector on the total space, which justifies its customary name.)


Whereas condition (iv) determines the value of the projected horizontal component $(d\pi)_z(\theta_V(z))$ of $\theta_V(z)$,
condition (v) constrains the form of the “projected” vertical component $(d\phi)_z(\theta_V(z))$. (The fibre chart $\phi$
may be thought of as a “vertical projection”.) To interpret condition (v), choose the chart $\psi_F$ on $\mathbb{R}^n$ to be
the identity function, and coordinatise $GL(n)$ by the invertible matrices in $M^{\mathrm{inv}}_{n,n}(\mathbb{R})$. (See Notation 25.8.8
for $M^{\mathrm{inv}}_{n,n}(\mathbb{R})$.) Then the coordinate matrix of $e$ is the unit matrix $I_n \in M_{n,n}(\mathbb{R})$, and $u \in T_e(GL(n))$ has
a coordinate matrix $\Phi(\psi_G)(u) = U \in M_{n,n}(\mathbb{R})$, where $\Phi(\psi_G)$ is the “velocity chart” on the tangent bundle
$T(GL(n))$ corresponding to the chart $\psi_G = \mathrm{id}_G$ as in Definition 54.5.6 and Notation 54.5.7. (For the maps
and spaces in condition (v), see Figure 71.1.1. This is similar to Figure 67.5.1 for connections on general
ordinary fibre bundles and Figure 68.1.1 for connections on general vector bundles.) The fibre chart map
$\Phi(\psi_G) : T(GL(n)) \to \mathbb{R}^{n \times n}$ is related to the total space manifold map $\Psi(\psi_G) : T(GL(n)) \to \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n}$
by the equality $\Psi(\psi_G) = (\psi_G \circ \pi_G) \times \Phi(\psi_G)$. (See Notation 54.5.21.)


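For $w \in \mathbb{R}^n$, the right action map $R_w : A \mapsto Aw$ is linear. So its differential $(dR_w)_e : T_e(GL(n)) \to \mathbb{R}^n$ maps
$U \in M_{n,n}(\mathbb{R})$ to $Uw \in \mathbb{R}^n$ via the charts (which are identity functions). Therefore condition (v) implies


$\exists U \in M_{n,n}(\mathbb{R}),\ \forall w \in \mathbb{R}^n,\quad (d\phi)_{t_{p,w,\psi}}(\theta_V(t_{p,w,\psi})) = Uw.$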


Thus condition (v) means that, via the charts, $\theta_V$ is a linear map on $\mathbb{R}^n$. However, this does not
exactly imply that $\theta_V$ is a linear map from $T_p(M)$ to $T_p(M)$. By Definition 71.1.2 (i,ii), $\theta_V$ is a map from
$T_{\pi(V)}(M)$ to $T(T(M))$, and $\theta_V$ maps each vector $z \in T_p(M)$ to an element of $T_z(T(M))$. So strictly
speaking, $\theta_V$ is a vector field in $X(T(T(M)) | T_p(M))$. (See Notation 57.1.11 for the set of local vector
fields $X(T(T(M)) | T_p(M))$.) Each vector $\theta_V(z)$ has a different base point $z$ in $T_p(M)$, but the vertical
component of $\theta_V(z)$ (via a fibre chart) is linearly related to $z$. Thus the meaning of condition (v) is simply
that the vertical component of $\theta_V(z)$ is a linear function of $z$ via the charts.


The principal justification for the fairly abstract form of Definition 71.1.2, particularly condition (v), is 
the fact that it is derived directly from the corresponding Definition 67.5.4 for connections on general 
differentiable fibre bundles. This places affine connections on tangent bundles within a broader framework. 
Figure 71.1.1 Affine connection on a tangent bundle


In the special case of affine connections, condition (v) is made less comprehensible by its abstract form, 
but it usefully indicates that for the generalisation to connections on general differentiable fibre bundles, 
the horizontal lift should equal an infinitesimal action u € Te(GL(n)) by the structure group on a fibre set 
via the charts. Each such infinitesimal action may then be viewed as an element of the Lie algebra of the 
structure group, via the charts. 


71.2. Horizontal lift functions in tensor calculus 


71.2.1 REMARK: Coefficients of an affine connection. 

Since it is necessary to differentiate the affine connection in Definition 71.1.2 to obtain the Riemann curvature 
tensor field and various other kinds of curvature, the abstract definition must be converted to a map between 
Cartesian spaces to make it concretely differentiable. In other words, it must be converted to “coordinates”. 


The conversion of an abstract connection to a map between Cartesian spaces may be effected via the charts for 
a C? manifold. Definition 71.2.2 gives the coefficient array field for an affine connection, often referred to as 
the “Christoffel symbol” for the connection. Line (71.2.1) may be summarised informally as T}, = —6¢, (e;)’- 


71.2.2 DEFINITION: The coefficient array field of an affine connection $\theta$ on a $C^2$ manifold $M$ for a chart
$\psi \in \mathrm{atlas}(M)$ is the map $\Gamma_\psi : \mathrm{Dom}(\psi) \to \mathbb{R}^{n \times n \times n}$, where $n = \dim(M)$, defined by


Vp € Dom(y), Vi, jk e Nn, Ty (p) jk = -E (8,0 (EF P). (71.2.1) 
In other words, r, (p) j is the negative of the ith vertical component of 0y (z), where V = ep? and z = ge. 
(See Definition 59.3.5 for the oblique drop function cz".) 


71.2.3 REMARK: Reconstruction of an affine connection from its coefficient array field.
Theorem 71.2.4 gives a formula to reconstruct an affine connection from its Christoffel array field.


71.2.4 THEOREM: Expression for horizontal lift in terms of the Christoffel array field. 
Let $\Gamma_\psi$ be the coefficient array field of an affine connection $\theta$ on a $C^2$ manifold $M$ for a chart $\psi \in \mathrm{atlas}(M)$.
Let $p \in \mathrm{Dom}(\psi)$ and $V, z \in T_p(M)$. Then


ES 
0v (2) = t5 56) jv) us? 
where
vie Nn, w=- X Tho Gy", 
where n = dim(M) and ù = (v). (See Notation 59.1.12 for t® , |. |.) 
Pnoor: Let p € M and V,z € T,(M). Let W = indies: Then W € T.(T(M)) and (dr)(W) = 
V = (dz)(8v (z)), and 
Vi € Nn, P(Y) (W))* = w 
=- X ThA 
= = DND (8,5. (65 ))) e V (71.2.2) 
= m (a (8v (2)))', (71.2.3) 


where line (71.2.2) follows from Definition 71.2.2, line (71.2.3) follows from the linearity of ®(7) and c" and 
the linearity specified in Definition 71.1.2 (iii, v). (Linearity of the vertical component of 0v (z) with respect 
to z follows from Definition 71.1.2 (v) as discussed in Remark 71.1.3.) Hence W = 6y(z). 


71.3. Associated affine connections on vector-tuple bundles 


71.3.1 REMARK: Application of associated affine connection on vector-tuple bundles. 

When the Levi-Civita connection on a Riemannian manifold is “solved for” in Section 74.2 in terms of
“simultaneous equations” for the connection, equation (74.2.2) in Definition 74.2.4 is expressed in terms of
the associated connection $\theta_V(y, z)$, for $V, y, z \in T_p(M)$, on the tangent vector-pair bundle for the “unknown”
base-level connection $\theta_V(z)$ for $V, z \in T_p(M)$. The reason for this is that the Riemannian metric function
is defined on the vector-pair bundle $T^2(M)$ of a differentiable manifold $M$, not directly on the base-level
tangent bundle $T(M)$. The metric compatibility formula $(dg)(\theta_V(y, z)) = 0$ is applied to the differentials
of tangent vector pairs $(y, z)$ because the metric function $g \in C^1(T^2(M), \mathbb{R})$ is in fact an inner product
function defined for pairs of vectors, not a distance function as the customary name “metric” would suggest.


The base-level affine connection in Definition 71.1.2 specifies only the parallel-transport differentials $\theta_V(z)$
for individual vectors $z \in T_p(M)$ in a direction $V \in T_p(M)$. It is quite obvious that the natural choice of
associated connection for vector tuples should vary vector pairs by varying each individual vector according
to the base-level connection. It is a mere technical formality to define this. (See Definition 55.5.37 for tangent
vector-tuple bundles.)


((2018-11-26. Interpret Definition 71.3.2 as a special case of Definition 67.12.3.))


71.3.2 DEFINITION: The associated connection on the r-tuple bundle T" (M) for an affine connection 0 on 
the tangent bundle T(M) of a C? manifold M, for r € Zj , is the connection 6” on T"(M) defined by 


Vp € M, VV € T,(M), V(z;)j-1 € Top, (M), 
0v ((5)524) = Ahv (z) =), 


where h is the canonical immersion of second-level vector r-tuples in Definition 59.7.2. 


71.4. Parallel transport via horizontal lift functions 


71.4.1 REMARK: Affine connections determine parallelism. 

The reason for the negative sign in line (71.2.1) is the weight of tradition in favour of defining the Christoffel 
array for terms to add to partial derivatives of contravariant vectors to yield covariant derivatives. The 
affine connection determines how a contravariant vector should vary so that it implements parallel transport 
in a specified direction. The covariant derivative is defined as the difference between the actual variation
(in a specified direction) and the parallel transport variation. Therefore the covariant derivative equals the 
difference between actual and parallel variation, whereas expressed in terms of the Christoffel array, the 
covariant derivative is calculated as the sum of the actual variation and a correction term. 


The relation between parallel transport and the Christoffel array is given a very clear classical explanation by 
Willmore [42], pages 205-210. (He introduces the negative sign on page 206.) The issue of the negative sign 
for the Christoffel array is also mentioned by Lang [23], pages 213-214. (See also Remark 70.4.3 regarding 
the minus-sign.) 


71.4.2 REMARK: Parallel transport of the tangent space along a curve. 

The parallel transport of the tangent space of an affinely connected differentiable manifold along a curve
in Definition 71.4.3 is a particular kind of parallelism as introduced in Definition 48.3.2. For each pair
$(t_1, t_2)$ of parameters $t_1, t_2 \in I$, the parallel transport map $\Theta^\gamma(t_1, t_2)$ defines an isomorphism from $T_{\gamma(t_1)}(M)$
to $T_{\gamma(t_2)}(M)$ for some $C^1$ curve $\gamma$.


By Definition 71.1.2, an affine connection $\theta$ on $M$ is a map $\theta : T(M) \to \bigcup_{p \in M} (T_p(M) \to T(T(M)))$ which
satisfies $\theta_V(z) \in T_z(T(M))$ for all $V \in T_p(M)$ and $z \in T_p(M)$, for all $p \in M$. The integral of this connection
along a $C^1$ curve $\gamma$ in $M$ must be a map $\Theta^\gamma(t_1, t_2) \in \mathrm{Iso}(T_{\gamma(t_1)}(M), T_{\gamma(t_2)}(M))$ for all $t_1, t_2 \in I$. The group isomorphism
space $\mathrm{Iso}(T_p(M), T_q(M))$ denotes here the set of all invertible linear maps between the linear spaces $T_p(M)$
and $T_q(M)$, for all $p, q \in M$.

The equality in line (71.4.1) means that the velocity of the parallel-transported vector $\Theta^\gamma(t_1, t_2)(z)$ is equal
to the lift $\theta_{\gamma'(t_2)}(\Theta^\gamma(t_1, t_2)(z))$ of the base space vector $\gamma'(t_2)$ to the point $\Theta^\gamma(t_1, t_2)(z)$ on the tangent
bundle of $M$. The complexity of this formalism may seem excessive, but it does have the advantage that it
is fairly straightforward to generalise to very general kinds of fibre bundles. Definition 71.4.3 is illustrated
in Figure 71.4.1.


Figure 71.4.1 Parallel transport by an affine connection


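71.4.3 DEFINITION: The parallel transport of the tangent space along a $C^1$ curve $\gamma : I \to M$ in a $C^2$ manifold
$M$ with $I \in \mathrm{Top}(\mathbb{R})$ by a $C^0$ affine connection $\theta$ on $M$ is the map $\Theta^\gamma : I \times I \to \bigcup_{p,q \in M} \mathrm{Iso}(T_p(M), T_q(M))$
which satisfies $\forall t_1, t_2 \in I$, $\Theta^\gamma(t_1, t_2) \in \mathrm{Iso}(T_{\gamma(t_1)}(M), T_{\gamma(t_2)}(M))$, and satisfies the equation:


$\forall t_1, t_2 \in I,\ \forall z \in T_{\gamma(t_1)}(M),\quad \partial_t \big(\Theta^\gamma(t_1, t)(z)\big)\big|_{t=t_2} = \theta_{\gamma'(t_2)}\big(\Theta^\gamma(t_1, t_2)(z)\big).$ (71.4.1)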


71.4.4 REMARK: Verification of “space-coherence” of parallel transport definition. 

The map $t \mapsto \Theta^\gamma(t_1, t)(z)$ in Definition 71.4.3 is a curve in $T(M)$ for all $t_1 \in I$ and $z \in T_{\gamma(t_1)}(M)$. Therefore
$\partial_t(\Theta^\gamma(t_1, t)(z))|_{t=t_2} \in T_{z_2}(T(M))$, where $z_2 = \Theta^\gamma(t_1, t_2)(z)$. By Definition 71.1.2, it is also true that
$\theta_{\gamma'(t_2)}(\Theta^\gamma(t_1, t_2)(z)) \in T_{z_2}(T(M))$. So the spaces on the two sides of line (71.4.1) are “coherent”.
71.4.5 REMARK: Parallel transport on rectifiable curves.

Definition 71.4.3 excessively constrains parallel transport to $C^1$ curves. The curve only needs to be rectifiable.
(See Section 38.9 for rectifiable curves.) A rectifiable curve must be continuous, but the velocity may be
undefined on a subset of measure zero of the curve’s parameter interval.


Similarly, a connection only needs to be integrable in order to make line (71.4.1) meaningful. 


71.4.6 REMARK: Why differential geometry is difficult. Why principal fibre bundles are unavoidable. 
Definition 71.4.3 gives a hint as to why differential geometry is so difficult. The differential equation in
line (71.4.1) is effectively a linear system of coupled first-order differential equations with variable coefficients.
This is not, in itself, a difficult situation to deal with. However, in this case, when it is time to find an
expression for the curvature of the connection, one must in essence solve the equation for four sides of a
rectangle in point-coordinate space to determine how an initial vector $z \in T_p(M)$ is affected by transport
around a “loop”. The effect of parallel transport around such a loop must be determined in the limit as
the loop converges to the point $p$. This should yield a linear transformation of $T_p(M)$ which is proportional
to the area of the loop. But the problem here is that one cannot make estimates of first and second order
changes along short paths very easily because the components of the vector $\Theta^\gamma(t_1, t)(z) \in T_{\gamma(t)}(M)$ are both
inputs and outputs for the connection lift function $\theta_{\gamma'(t)}$.


In the case of ordinary differential equations of the form $f'(x) = g(x) f(x)$ for $x \in \mathbb{R}$, one simplifies the
problem by expressing it as $(d/dx) \ln |f(x)| = g(x)$, which gives something like $f(x) = \pm\exp(\int g(x)\,dx + k)$.
In the case of coupled equations, one may do the same sort of thing, except that a matrix exponential
function must be used. This is a clue that instead of solving equation (71.4.1) for individual values of $z$, one
should instead solve for all vectors in $T_p(M)$ simultaneously. (Although $p$ is variable, one may regard the
tangent space as a single space via a chart.) In other words, one must move the problem to the principal
fibre bundle. Then the connection lift function $\theta_{\gamma'(t)}$ is converted to a Lie algebra element via a fibre atlas
for the principal fibre bundle of the tangent bundle.
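

The idea of solving for all initial vectors simultaneously can be sketched numerically. The following is only an illustration under stated assumptions: the coefficient matrix `B(t)`, the step count and all function names are hypothetical choices, not taken from this book. It integrates the matrix equation $\Phi'(t) = B(t)\Phi(t)$ with $\Phi(t_0) = I$ by the classical Runge-Kutta method, and then compares $\Phi(t_1)z$ with the result of integrating a single initial vector $z$ directly.

```python
# Minimal sketch (hypothetical names): solve Phi'(t) = B(t) Phi(t), Phi(t0) = I,
# once, and reuse Phi for every initial vector instead of re-solving per vector.

def mat_mul(a, b):
    """Product of an n x n matrix with an n x m matrix (nested lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(len(b[0]))]
            for i in range(n)]

def axpy(x, y, c):
    """Return x + c*y entrywise for equal-shaped nested lists."""
    return [[xi + c * yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]

def rk4_linear(B, Y0, t0, t1, steps):
    """Integrate Y' = B(t) Y from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    Y, t = Y0, t0
    for _ in range(steps):
        k1 = mat_mul(B(t), Y)
        k2 = mat_mul(B(t + h / 2), axpy(Y, k1, h / 2))
        k3 = mat_mul(B(t + h / 2), axpy(Y, k2, h / 2))
        k4 = mat_mul(B(t + h), axpy(Y, k3, h))
        incr = axpy(axpy(axpy(k1, k2, 2.0), k3, 2.0), k4, 1.0)  # k1+2k2+2k3+k4
        Y = axpy(Y, incr, h / 6)
        t += h
    return Y

def B(t):
    # Hypothetical variable-coefficient matrix, chosen only for illustration.
    return [[0.0, t], [-1.0, 0.0]]

identity = [[1.0, 0.0], [0.0, 1.0]]
Phi = rk4_linear(B, identity, 0.0, 1.0, 200)   # transports all vectors at once
z = [[0.3], [-1.2]]                            # one initial vector, as a column
direct = rk4_linear(B, z, 0.0, 1.0, 200)       # the same vector solved individually
via_phi = mat_mul(Phi, z)
```

The two results agree up to rounding because the equation is linear in the unknown. The matrix $\Phi(t)$ is a group element, which is the sense in which the problem has been moved to the principal bundle.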


71.4.7 REMARK: Expressing parallel transport in terms of coordinate charts.

It is not possible to prove existence and uniqueness for the parallel transport in Definition 71.4.3 because
line (71.4.1) is too abstract. The “equation of motion” for parallel transport must be converted via coordinate
charts to differential equations for real-valued functions of real variables.


Let $M \mathrel{-\!\!<} (M, A_M)$ be an $n$-dimensional $C^2$ manifold for $n \in \mathbb{Z}^+_0$. In other words, $A_M$ is a $C^2$ atlas on $M$. For
any point $p \in M$, there is at least one coordinate chart $\psi \in \mathrm{atlas}_p(M) \subseteq A_M$. Then $p \in \mathrm{Dom}(\psi) \in \mathrm{Top}(M)$
for such a chart. For the tangent vector component map $v : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$ for the chart $\psi$, alluded to in
Definition 54.5.16, the $n$-tuple $v(V) \in \mathbb{R}^n$ is the component $n$-tuple for each tangent vector $V \in \pi^{-1}(\mathrm{Dom}(\psi))$
with respect to the chart $\psi$. For some $C^1$ curve $\gamma$, for some $t_0$ in the open interval $I = \mathrm{Dom}(\gamma) \in \mathrm{Top}(\mathbb{R})$,
and some $z \in T_{\gamma(t_0)}(M)$, define $A : I \to \mathbb{R}^n$ by $A(t) = v(\Theta^\gamma(t_0,t)(z))$ for all $t \in I$. Then

\[
  \forall t \in I,\ \forall i \in \mathbb{N}_n,\quad
  \partial_t A^i(t) = -\sum_{j,k=1}^n \Gamma^i_{jk}(\gamma(t))\, A^j(t)\, \gamma'(t)^k,
  \tag{71.4.2}
\]

where $\Theta^\gamma$ satisfies line (71.4.1) in Definition 71.4.3 and $\Gamma^i_{jk}$ is as in line (71.2.1) of Definition 71.2.2.
Line (71.4.2) has the form of a system of first-order linear differential equations with variable coefficients.
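

A system of the kind in line (71.4.2) can be integrated numerically once the connection coefficients are known in a chart. The following sketch is an illustration only. It assumes the Levi-Civita connection of the unit 2-sphere in coordinates $(\theta, \phi)$, whose Christoffel symbols are symmetric in the lower indices, so the result does not depend on the index-order convention. The function names and step count are hypothetical. A vector is transported once around the latitude circle $\theta = \pi/3$.

```python
import math

def christoffel(theta):
    """Gamma[i][j][k] on the unit sphere, coordinates (x^0, x^1) = (theta, phi)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def rhs(t, A, curve, velocity):
    """Right-hand side -Gamma^i_jk(gamma(t)) A^j gamma'(t)^k of the transport ODE."""
    G = christoffel(curve(t)[0])
    dx = velocity(t)
    return [-sum(G[i][j][k] * A[j] * dx[k] for j in range(2) for k in range(2))
            for i in range(2)]

def transport(A0, curve, velocity, t0, t1, steps=2000):
    """Integrate the transport ODE from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    A, t = list(A0), t0
    for _ in range(steps):
        k1 = rhs(t, A, curve, velocity)
        k2 = rhs(t + h / 2, [a + h / 2 * k for a, k in zip(A, k1)], curve, velocity)
        k3 = rhs(t + h / 2, [a + h / 2 * k for a, k in zip(A, k2)], curve, velocity)
        k4 = rhs(t + h, [a + h * k for a, k in zip(A, k3)], curve, velocity)
        A = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(A, k1, k2, k3, k4)]
        t += h
    return A

theta0 = math.pi / 3

def latitude(t):
    return (theta0, t)

def latitude_velocity(t):
    return (0.0, 1.0)

# Transport the coordinate vector d/dtheta once around the latitude circle.
A_end = transport([1.0, 0.0], latitude, latitude_velocity, 0.0, 2 * math.pi)
```

For this latitude the vector returns rotated by the angle $2\pi\cos\theta_0 = \pi$, so `A_end` is approximately `[-1.0, 0.0]`. The transported vector does not return to its initial value, which is the loop effect described in Remark 71.4.6.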


71.4.8 REMARK: Existence and uniqueness of parallel transport. 

Since line (71.4.2) in Remark 71.4.7 expresses parallel transport as a system of first-order linear ordinary 
differential equations via local charts, one may demonstrate existence and uniqueness for parallel transport 
in terms of the general theory of such ODE systems. (See Section 44.6 for systems of ODEs.) Line (71.4.2) 
may be written as 


\[
  \forall t \in I,\quad \partial_t A(t) = B(t)(A(t)), \tag{71.4.3}
\]

where $A(t) = (A^i(t))_{i=1}^n \in \mathbb{R}^n$ for all $t \in I$, and $B(t) \in \mathrm{Lin}(\mathbb{R}^n, \mathbb{R}^n)$ is defined by


\[
  \forall t \in I,\ \forall v \in \mathbb{R}^n,\ \forall i \in \mathbb{N}_n,\quad
  B(t)(v)^i = -\sum_{j,k=1}^n \Gamma^i_{jk}(\gamma(t))\, v^j\, \gamma'(t)^k.
  \tag{71.4.4}
\]


71.4.9 REMARK: Parallel transport on families of curves.
Given a curve-family $\gamma : I_1 \times I_2 \to M$ for non-empty open intervals $I_1, I_2 \in \mathrm{Top}(\mathbb{R})$ and a $C^2$ manifold $M$
with $C^2$ affine connection $\theta$, one may define parallel transport for a start-point $(s_0,t_0) \in I_1 \times I_2$ by

\[
  \forall s_1 \in I_1,\ \forall z \in T_{\gamma(s_0,t_0)}(M),\quad
  \partial_s\bigl(\Theta^\gamma_{12}(s,t_0)(z)\bigr)\Big|_{s=s_1}
  = \theta_{\gamma_1(s_1,t_0)}\bigl(\Theta^\gamma_{12}(s_1,t_0)(z)\bigr),
\]

where $\Theta^\gamma_{12}(s,t_0) : T_{\gamma(s_0,t_0)}(M) \to T_{\gamma(s,t_0)}(M)$ is a linear space isomorphism for all $s \in I_1$, and $\gamma_1(s,t) \in T_{\gamma(s,t)}(M)$
denotes the vector $(\partial/\partial s)\gamma(s,t)$ for $(s,t) \in I_1 \times I_2$. This defines parallel transport only for parameters $(s,t)$
in the subset $I_1 \times \{t_0\}$ of $I_1 \times I_2$. Parallel transport for any value $(s_1,t_1)$ of $(s,t)$ may be defined by using
the pair $(s_1,t_0)$ as the start-point for the parameter subset $\{s_1\} \times I_2$ as follows:

\[
  \forall s_1 \in I_1,\ \forall t_1 \in I_2,\ \forall z \in T_{\gamma(s_0,t_0)}(M),\quad
  \partial_t\bigl(\Theta^\gamma_{12}(s_1,t)(z)\bigr)\Big|_{t=t_1}
  = \theta_{\gamma_2(s_1,t_1)}\bigl(\Theta^\gamma_{12}(s_1,t_1)(z)\bigr),
\]

where $\gamma_2(s,t) \in T_{\gamma(s,t)}(M)$ denotes the vector $(\partial/\partial t)\gamma(s,t)$ for $(s,t) \in I_1 \times I_2$. Let $\Theta^\gamma_{12}(s_0,t_0)(z) = z$ for
all $z \in T_{\gamma(s_0,t_0)}(M)$. Then $\Theta^\gamma_{12}(s,t) : T_{\gamma(s_0,t_0)}(M) \to T_{\gamma(s,t)}(M)$ is a linear space isomorphism for all $(s,t) \in I_1 \times I_2$.
This defines parallel transport for the curve-family $\gamma$ by first transporting vectors “left and right” from a
start-point $(s_0,t_0)$, and then transporting vectors “up and down” from each secondary start-point $(s_1,t_0)$.
This is illustrated in Figure 71.4.2.


Figure 71.4.2 Parallel transport of a vector over a curve-family


The “first left-right, then up-down” parallel transport $\Theta^\gamma_{12}$ may be very easily modified in an obvious way
to produce a “first up-down, then left-right” parallel transport $\Theta^\gamma_{21}$, commencing with the same
start-point $(s_0,t_0)$. This is illustrated in Figure 71.4.3.


Figure 71.4.3 Swapped parallel transport of a vector over a curve-family


In general, these two parallel transport constructions will yield different values when $s \ne s_0$ and $t \ne t_0$. The
difference $(\Theta^\gamma_{12}(s,t) - \Theta^\gamma_{21}(s,t)) : T_{\gamma(s_0,t_0)}(M) \to T_{\gamma(s,t)}(M)$ between these two parallel transport constructions is a
primary definition of curvature of the connection. This kind of definition has the advantage of generality,
since it does not require the connection to be differentiable. (In fact, it doesn't even need to be continuous,
although it does need to satisfy some integrability criteria.)

By taking the double limit of the ratio $(\Theta^\gamma_{12}(s,t) - \Theta^\gamma_{21}(s,t))(s - s_0)^{-1}(t - t_0)^{-1}$ as $(s,t) \to (s_0,t_0)$, one
obtains the standard definition of the curvature of a $C^2$ affine connection $\theta$ at $\gamma(s_0,t_0)$ for the
vector-pair $(\gamma_1(s_0,t_0), \gamma_2(s_0,t_0)) \in T_{\gamma(s_0,t_0)}(M)^2$. If it can be shown that this limit is well defined, it will yield an
antisymmetric bilinear form on $T_{\gamma(s_0,t_0)}(M)$ which takes values in the Lie algebra of $\mathrm{GL}(n)$.
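

The double-limit ratio can be approximated numerically for a small coordinate rectangle. The sketch below is only an illustration under assumptions that are not in the text: the manifold is the unit 2-sphere with its Levi-Civita connection in coordinates $(\theta, \phi)$, the curve-family is the coordinate grid $\gamma(s,t) = (\theta_0 + s, \phi_0 + t)$, and the helper names are hypothetical. The sign of the limit depends on the curvature sign convention in use, so only its magnitude is interpreted here.

```python
import math

def christoffel(theta):
    """Gamma[i][j][k] on the unit sphere, coordinates (x^0, x^1) = (theta, phi)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def transport(A, point, direction, length, steps=400):
    """Transport components A along the coordinate line point + u*direction,
    0 <= u <= length, using RK4 on dA^i/du = -Gamma^i_jk A^j direction^k."""
    def rhs(u, vec):
        G = christoffel(point[0] + u * direction[0])
        return [-sum(G[i][j][k] * vec[j] * direction[k]
                     for j in range(2) for k in range(2)) for i in range(2)]
    h = length / steps
    A, u = list(A), 0.0
    for _ in range(steps):
        k1 = rhs(u, A)
        k2 = rhs(u + h / 2, [a + h / 2 * k for a, k in zip(A, k1)])
        k3 = rhs(u + h / 2, [a + h / 2 * k for a, k in zip(A, k2)])
        k4 = rhs(u + h, [a + h * k for a, k in zip(A, k3)])
        A = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(A, k1, k2, k3, k4)]
        u += h
    end = (point[0] + length * direction[0], point[1] + length * direction[1])
    return A, end

p0 = (math.pi / 2, 0.0)      # start-point (s0, t0) on the equator
z = [1.0, 0.0]               # initial vector d/dtheta
ds = dt = 0.01

# "First left-right, then up-down": along theta, then along phi.
A, q = transport(z, p0, (1.0, 0.0), ds)
A12, _ = transport(A, q, (0.0, 1.0), dt)

# "First up-down, then left-right": along phi, then along theta.
A, q = transport(z, p0, (0.0, 1.0), dt)
A21, _ = transport(A, q, (1.0, 0.0), ds)

ratio = [(a - b) / (ds * dt) for a, b in zip(A12, A21)]
```

Here `ratio` is close to `[0.0, 1.0]`. The unit magnitude reflects the Gaussian curvature 1 of the unit sphere, and shrinking `ds` and `dt` moves the ratio closer to this limit.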


71.5. Transposed horizontal lift functions on tangent bundles 


71.5.1 REMARK:  Transposed affine connections are a special case of transposed horizontal lift functions. 
If the tangent bundle T(M) is substituted for the total space E in Definition 67.8.2 for the transposed 
horizontal lift function of a connection on a general ordinary fibre bundle, the result is Definition 71.5.2. 
The vector bundle structure is used here. (See Section 65.9 for vector bundles.) 


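71.5.2 DEFINITION: A transposed affine connection on the tangent bundle $(T(M), \pi, M, A^{\mathbb{R}^n}_{T(M)})$ for a $C^2$
differentiable manifold $M$ is a map $\hat\theta : T(M) \to \bigcup_{z \in T(M)} \mathrm{Lin}(T_{\pi(z)}(M), T_z(T(M)))$ such that

 (i) $\forall z \in T(M),\ \hat\theta_z \in \mathrm{Lin}(T_{\pi(z)}(M), T_z(T(M)))$, [linearity]
 (ii) $\forall z \in T(M),\ (d\pi)_z \circ \hat\theta_z = \mathrm{id}_{T_{\pi(z)}(M)}$, [horizontal component]
 (iii) $\forall p \in M,\ \forall V \in T_p(M),\ \forall \phi \in A^{\mathbb{R}^n}_{T(M),p},\ \exists u \in \mathrm{gl}(n),\ \forall z \in T_p(M),\ (d\phi)_z(\hat\theta_z(V)) = (dR_{\phi(z)})_e(u)$, where for
 all $y \in \mathbb{R}^n$, $R_y : \mathrm{GL}(n) \to \mathbb{R}^n$ is defined by $R_y : g \mapsto gy$. [group action]


The function $\hat\theta$ is called the transposed lift function of the connection.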


71.5.3 REMARK: Interpretation of the definition of a transposed affine connection. 
The conditions of Definition 71.5.2 may be summarised as (i) linearity with respect to the translation vector,
(ii) equality of the horizontal component of the connection to the translation vector, and (iii) preservation of
fibre space structure under the connection.


71.6. Covariant derivatives for affine connections 


71.6.1 REMARK: Covariant derivatives specify the deviation of vector fields from parallelism. 

Towards the end of the 19th century, covariant derivatives were defined in terms of Christoffel arrays in 
the tensor calculus formalism. The title “absolute differential calculus” means that the covariant derivative 
was independent of the choice of coordinates. In 1917, Levi-Civita introduced a notion of parallelism for 
Riemannian manifolds which gave a more geometric interpretation to the covariant derivative. In 1918, 
Hermann Weyl introduced affine connections, thereby liberating Levi-Civita’s notion of parallelism from 
its tight link to Riemannian manifolds. From that time until the present, connections (i.e. differential 
parallelism) and covariant derivatives have coexisted as alternative perspectives for the same underlying 
geometric concept. 


In the Cartan formalism, a connection is defined to be a Lie-algebra-valued field for each given vector-frame 
field. This more or less liberates affine connections from coordinate-basis vector fields, but at the cost of 
introducing vector frame fields which are even more arbitrary than the coordinate-basis vector fields which 
they replace. 


In the modern vector-field calculus formalism, affine connections on tangent bundles are defined as maps
$D : X^0(T(M)) \times X^1(T(M)) \to X^0(T(M))$ which map vector field pairs $(X,Y)$ to covariant derivative
expressions $D_X Y$ which are also vector fields. (To avoid analytical issues, the fields are usually assumed to
be $C^\infty$.) This could be viewed as a return to the 19th century focus on covariant derivatives, but arbitrary
coordinate charts are replaced with arbitrary vector fields, which in applications are typically replaced with
the original coordinate basis vector fields.


There are also many representations of connections on general differentiable fibre bundles. Ordinary fibre
bundle connections have at least 4 representations. (See Theorem 67.11.2 for conversion rules.) Principal
bundle connections have at least 5 representations. (See Theorem 69.6.3 for conversion rules. In addition,
connections on principal bundles may be defined by localisations as in Sections 69.11, 69.12 and 69.13.)
Since the literature is split between these numerous formalisms for the same underlying geometric concept,
unfortunately one must understand them all.


The view is taken here that connections on differentiable fibre bundles provide the best broad framework 
within which all other formalisms may be understood, including tensor calculus, frame-field calculus and 
vector-field calculus. No matter which framework is taken as primary, all of the other frameworks must be 
defined anyway because the literature is written in many mathematical languages and dialects. 


The task in Section 71.6 is to define covariant derivatives for tangent bundles in terms of affine connections. 
Each kind of function to be differentiated requires its own covariant derivative operation. The most basic 
non-trivial kind of function to differentiate is the vector field, which is a cross-section of the tangent bundle. 


The covariant derivative of a vector field is in some sense the opposite of a connection because the covariant 
derivative computes the deviation of a vector field from parallelism, whereas a connection gives a definition of 
parallelism itself. Therefore covariant derivatives of vector fields are defined by subtracting parallel transport 
from the naive derivative to compute the “absolute” derivative. 


71.6.2 REMARK: Confusing terminology for covariant derivatives and covariant vectors. 

The term “covariant derivative” is yet another unfortunate choice of words in the subject of differential 
geometry. On a differentiable manifold without any connection, a covariant vector transforms as the dual 
of contravariant vectors, and the contravariant vectors are the ordinary tangent vectors of the manifold’s 
tangent bundle. Thus “covariant” and “contravariant” vectors are duals of each other. It happens that the 
differential of a real-valued function on a manifold is a covariant vector. But this has nothing at all to do 
with affine connections or parallel transport. 


The “covariant derivative” of a real-valued function, when an affine connection is defined, is identical to
the differential of the real-valued function in the differential layer 2 with no connection. But the differential
of a vector field (in the absence of a connection) is a mixed covariant/contravariant tensor which is
chart-dependent (unless it is “anchored” to a particular chart as suggested in Remark 60.1.4).


When an affine connection is available, the “covariant derivative” of a contravariant vector field on a manifold 
has a value which is a contravariant vector or vector field. Thus the so-called “covariant derivative” yields a 
contravariant object, which seems somewhat confusing. However, the value of the covariant derivative of a 
contravariant vector field varies as the dual of the vector (or vector field) which is used for the differentiating. 
In this sense, the value of the covariant derivative does vary “covariantly”. 


Unfortunately there is a further confusion of terminology here because a “covariant vector” varies as the 
dual of the “contravariant vectors”, and the contravariant vectors are the ordinary tangent vectors of the 
tangent bundle. So “covariant” really means “contravariant” and vice versa. (This source of confusion is 
also discussed in Remarks 27.1.2 and 28.5.7.) 


In tensor calculus, the term “covariant” is applied to anything which uses a subscript index and the term 
“contravariant” corresponds to superscript indices. Since the application of the “covariant derivative” adds 
a subscript index to the components of the object being differentiated, the word “covariant” does seem 
appropriate. But the important attribute of the “covariant derivative” is that it takes into account the
affine connection. Thus $\partial_j v^i$ is not the covariant derivative of $v$, whereas the components
$\partial_j v^i + \Gamma^i_{kj} v^k$ do correspond to the covariant derivative of a vector field $v$.


71.6.3 REMARK: A covariant derivative may be defined in terms of an affine connection. 

The covariant derivative of a vector field is defined in terms of an affine connection $\theta$ on a $C^2$ manifold in
Definition 71.6.4. (This is a special case of the conversion of a general horizontal lift function to a covariant
derivative.) Since the horizontal components of $\partial_V X$ and $\theta_V(X(\pi(V)))$ are both equal to $V$, the horizontal
component of the difference $\partial_V X - \theta_V(X(\pi(V)))$ is zero, and so this difference is a vertical vector. (See
Definition 61.2.3 for the naive derivative $\partial_V X \in T_{X(\pi(V))}(T(M))$.) Therefore the drop function is well defined
for this difference-expression.


The covariant derivative $D_V X$ quantifies the deviation of $X$ from parallel transport in the direction of $V$.
A covariant derivative is thus not a differential parallelism, but it does contain the same information. One 
may perhaps draw an analogy between the covariant derivative and an accelerometer, which tells you how 
much you are deviating from an inertial frame which is determined by the affine connection.


71.6.4 DEFINITION: Construction of covariant derivative from an affine connection.
The covariant derivative of vector fields by vectors on a $C^2$ manifold $M$ for an affine connection $\theta$ on $M$ is
the map $D : T(M) \to (X^1(T(M)) \to T(M))$ defined by

\[
  \forall V \in T(M),\ \forall X \in X^1(T(M)),\quad
  D_V X = \varpi(\partial_V X - \theta_V(X(\pi(V)))),
\]

where $\varpi$ is the “vertical drop function” for $T(M)$ and $\pi$ is the projection map for $T(M)$. (See Definition 59.2.9
for the vertical drop function.)
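

In a chart, subtracting the horizontal lift from the naive derivative amounts to adding a Christoffel correction to the directional derivative of the components. The sketch below illustrates this under assumptions that are not in the text: the manifold is the flat plane in polar coordinates $(r, \phi)$ with its standard flat connection, derivatives are approximated by central differences, and all names are hypothetical. The field is the constant Cartesian field $\partial/\partial x$, which is parallel, so its covariant derivative should vanish even though its naive component derivative does not.

```python
import math

def christoffel_polar(r):
    """Gamma[i][j][k] for the flat plane, coordinates (x^0, x^1) = (r, phi)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def covariant_derivative(X, p, V, h=1e-6):
    """Return (naive, covariant) components at p in the direction V, where
    naive^i = V^k d_k X^i and covariant^i = naive^i + Gamma^i_jk X^j V^k."""
    n = len(p)
    G = christoffel_polar(p[0])
    Xp = X(p)
    naive = [0.0] * n
    for k in range(n):
        fwd, bwd = list(p), list(p)
        fwd[k] += h
        bwd[k] -= h
        dX = [(a - b) / (2 * h) for a, b in zip(X(fwd), X(bwd))]
        for i in range(n):
            naive[i] += V[k] * dX[i]
    correction = [sum(G[i][j][k] * Xp[j] * V[k] for j in range(n) for k in range(n))
                  for i in range(n)]
    return naive, [a + c for a, c in zip(naive, correction)]

def e_x(p):
    """The constant Cartesian field d/dx written in polar components."""
    r, phi = p
    return [math.cos(phi), -math.sin(phi) / r]

naive, DVX = covariant_derivative(e_x, [2.0, 0.7], [0.5, -1.3])
```

The naive derivative `naive` is visibly non-zero because the polar components of a parallel field vary from point to point, whereas `DVX` vanishes up to discretisation error. This is the sense in which the covariant derivative measures deviation from parallelism.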


71.6.5 REMARK: Interpretation of the covariant derivative definition at a single point. 
Definition 71.6.4 is probably clearer if expressed in terms of a base point p = 7(V). 


\[
  \forall p \in M,\ \forall V \in T_p(M),\ \forall X \in X^1(T(M)),\quad
  D_V X = \varpi(\partial_V X - \theta_V(X(p))).
\]


This is illustrated in Figure 71.6.1. It is not necessary that $V$ be a vector field. Definition 71.6.9 gives the
case that a vector $V$ is replaced by a vector field $Y$.


Figure 71.6.1 Covariant derivative of a vector field X by a vector V


71.6.6 REMARK: The role of the Leibniz rule for defining covariant derivatives. 

Many authors give primary definitions of connections as covariant derivatives. Then typically the Leibniz 
rule is one of the principal defining properties of such covariant derivatives. Since covariant derivatives are 
only meaningful for vector bundles, they offer a less general style of connection definition than horizontal lift 
functions. Consequently covariant derivatives are defined in this book in terms of horizontal lift functions 
for the special case of vector bundles. So the Leibniz rule is presented in Theorem 71.6.7 as a property of 
covariant derivatives, not as a condition in a definition of covariant derivatives. 


In applications of differential geometry to particle physics, the preferred representations of connections are 
typically connection forms on principal bundles and covariant derivatives on vector bundles. In gauge theory, 
these correspond roughly to boson radiation fields and fermion matter fields respectively. So covariant 
derivatives offer a perfectly adequate representation of connections for the matter fields, but not for the 
radiation fields. Horizontal lift functions have the advantage of unifying these two styles of connections in a 
single concept, which is then suitably specialised for the two roles as required. 


Theorem 71.6.7 is generalised from affine connections on tangent bundles to linear connections on vector 
bundles in Theorem 68.2.16. 


71.6.7 THEOREM: Leibniz rule for covariant derivatives for affine connections. 
Let $M$ be a $C^2$ manifold. Let $\theta$ be an affine connection on $M$. Then

\[
  \forall f \in C^1(M,\mathbb{R}),\ \forall X \in X^1(T(M)),\ \forall p \in M,\ \forall V \in T_p(M),\quad
  D_V(f.X) = (\partial_V f) X(p) + f(p)\, D_V X,
\]

where $D$ is the covariant derivative for $\theta$, and $f.X$ denotes the pointwise product of $f$ and $X$.


PROOF: Let $f \in C^1(M,\mathbb{R})$ and $X \in X^1(T(M))$. Then $f.X \in X^1(T(M))$ by Theorem 57.2.11. Let $p \in M$
and $V \in T_p(M)$. Then $D_V(f.X) = \varpi(\partial_V(f.X) - \theta_V(f(p)X(p)))$ by Definition 71.6.4. Let $\psi \in \mathrm{atlas}_p(M)$.
Then


w(ðv (F-X) — 6v (FPX (»))) = Trox (Ov (F-X) — 9v G0) X (0) (71.6.1) 
= (ðv f) X (p) + f (ny, (0v. X — 6v (X (p) (71.6.2) 
= (8v f)X(p) + f(») D$ X, (71.6.3) 


where line (71.6.1) follows from Definition 59.2.15 and Theorem 59.3.9 line (59.3.2), line (71.6.2) follows from 
Theorem 61.3.3 line (61.3.3), and line (71.6.3) follows from Definition 71.6.4 and Theorem 59.3.9 line (59.3.2). 


Hence $D_V(f.X) = (\partial_V f)X(p) + f(p)\, D_V X$.
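

The Leibniz rule can also be checked in chart components. The following is a sketch under an assumption that is not stated in this section: that the covariant derivative has the component form $(D_V X)^i = V^k \partial_k X^i + \Gamma^i_{jk} X^j V^k$, with the summation convention and with this particular index placement. The product rule for the partial derivative supplies the first term, and linearity of the Christoffel correction in $X$ supplies the factor $f(p)$.

```latex
% Assumed component form (index placement is an assumption):
%   (D_V X)^i = V^k \partial_k X^i + \Gamma^i_{jk} X^j V^k .
\begin{align*}
(D_V(f.X))^i
  &= V^k\,\partial_k(f X^i)(p) + \Gamma^i_{jk}(p)\, f(p)\, X^j(p)\, V^k \\
  &= (V^k\,\partial_k f)(p)\, X^i(p)
     + f(p)\,\bigl(V^k\,\partial_k X^i(p) + \Gamma^i_{jk}(p)\, X^j(p)\, V^k\bigr) \\
  &= (\partial_V f)\, X^i(p) + f(p)\,(D_V X)^i .
\end{align*}
```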


71.6.8 REMARK: The covariant derivative with respect to a vector field is calculated pointwise. 

The covariant derivative of a vector field $X$ by a vector field $Y$ in Definition 71.6.9 is nothing more than the
pointwise covariant derivative of a vector field $X$ by the value $Y(p)$ for each $p \in M$. This can be seen clearly
on the right hand side of line (71.6.4), where every part of the expression depends only on the pointwise
values $X(p)$ and $Y(p)$, except for the sub-expression $\partial_{Y(p)} X$, which does depend on the function $X$ in a
neighbourhood of $p$. (This style of covariant derivative is used in the Koszul formalism.)


71.6.9 DEFINITION: Construction of Koszul-style connection from an affine connection.
The covariant derivative of vector fields by vector fields or Koszul connection on a $C^2$ manifold $M$ for an
affine connection $\theta$ on $M$ is the map $D : X^0(T(M)) \times X^1(T(M)) \to X^0(T(M))$ defined by

\[
  \forall Y \in X^0(T(M)),\ \forall X \in X^1(T(M)),\ \forall p \in M,\quad
  (D_Y X)(p) = \varpi(\partial_{Y(p)} X - \theta_{Y(p)}(X(p))). \tag{71.6.4}
\]


71.6.10 REMARK: Reconstructing the horizontal lift function from a Koszul connection. 

Theorem 71.6.11 shows some of the difficulty of reconstructing a horizontal lift function from the Koszul
connection which is constructed from it. The Koszul-style connection accepts vector fields as inputs, but
the horizontal lift function requires vectors as inputs. The reconstruction must be achieved by finding fields
$Y$ and $X$ which have the required values $V$ and $z$ respectively at a given point $p \in M$. This is not very
difficult because every vector in the tangent space $T_p(M)$ has a “constant vector field extension” as in
Definition 57.1.20 if $M$ is a $C^1$ manifold. This is constructed from a chart $\psi \in \mathrm{atlas}_p(M)$, which shows
how dependent the Koszul formalism is on charts. The test functions for Koszul-style connections are vector
fields, but in practice these are constructed from coordinate charts.


71.6.11 THEOREM: Recovery of horizontal lift function from a Koszul-style connection. 
Let $D : X^0(T(M)) \times X^1(T(M)) \to X^0(T(M))$ be the Koszul connection for a horizontal lift function $\theta$ on
a $C^2$ manifold $M$. Then


Vp € M, VV € T,(M), Vz € T,(M), VY e X°(T(M)), VX e X! (T(M)), 
(Y(p) 2V and X(p) 22) = Oy(z) = w((8v X)(p) - (Dy X)(). (71.6.5) 


PROOF: The assertion follows directly from Definition 71.6.9.


71.6.12 REMARK: Explicit reconstruction of a horizontal lift function from a Koszul connection.
Theorem 71.6.11 is unsatisfactory because it requires “the user” to somehow discover and provide fields $Y$
and $X$ having the required properties. In principle, these fields might not even exist. In practice, there are
usually infinitely many such fields. So then one has a choice issue, namely how to choose a suitable pair of
fields from infinitely many possibilities. Theorem 71.6.13 provides a simple choice of fields which are suitable
for Theorem 71.6.11. This choice applies the constant vector field extensions in Definition 57.1.20. (This
shows once again that coordinates cannot be avoided in differential geometry. Coordinates can be hidden,
but never removed. And often they can't even be hidden either!)


The more systematic notations $\mathrm{Extn}_\psi(z)$ and $\mathrm{Extn}_\psi(V)$ in line (71.6.6) for constant vector field extensions
are replaced with the slightly more memorisable $z_\psi$ and $V_\psi$ in line (71.6.7).


71.6.13 THEOREM: Explicit recovery of a horizontal lift function from a Koszul-style connection.
Let $D : X^0(T(M)) \times X^1(T(M)) \to X^0(T(M))$ be the Koszul connection for a horizontal lift function $\theta$ on
a $C^2$ manifold $M$. Then


Vp € M, VV € T,(M), Vz € T (M), Vv € T;,(M), 
0y (z) = w((Ov Extny(z))(p) — (Dg, (v) Extny (z))(). (71.6.6) 
= w((8vzy)(p) — (Dv, 24) (Pp): (71.6.7) 
PROOF: Let p € M, V,z € T,(M) and v € T,(M). Let Y = Vy = Extn,(V) and X = zy = Extny(z) 
be the constant vector field extensions of V and z respectively, by the chart w, as in Definition 57.1.20 
and Notation 57.1.21. Then Y,z € X!(T(M)| Dom(V)) by Theorem 57.2.20 (ii), and Y(p) = Vy(p) = 


Extny(V)(p) = V and X(p) = zy(p) = Extn,(z)(p) = z by Theorem 57.1.23. So by Theorem 71.6.11, 
0v (z) = w((Ov X)(p) — (Dy X)(p), which implies lines (71.6.6) and (71.6.7). 


71.7. Covariant derivatives of vector fields along curves 


71.7.1 REMARK: Adaptation of the covariant derivative to vector fields along curves. 

Definition 71.7.2, which adapts the covariant derivative of vector fields on manifolds to vector fields on curves,
is illustrated in Figure 71.7.1. (Note that Definition 57.8.5 for a differentiable vector field $X$ along a curve
$\gamma$ in $M$ requires $X(t) \in T_{\gamma(t)}(M)$ for all $t \in \mathrm{Dom}(\gamma)$.)


Figure 71.7.1 Covariant derivative of a vector field along a curve


71.7.2 DEFINITION: Covariant derivative of a vector field along a curve. 
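The covariant derivative of a $C^1$ vector field $X : \mathbb{R} \to T(M)$ along a curve $\gamma : \mathbb{R} \to M$ in a $C^1$ manifold $M$
with respect to an affine connection $\theta$ on $M$ is the function $t \mapsto (DX)(t)$ defined by

\[
  \forall t \in \mathbb{R},\quad (DX)(t) = \varpi_{X(t)}\bigl(X'(t) - \theta_{\gamma'(t)}(X(t))\bigr). \tag{71.7.1}
\]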
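

In chart components, line (71.7.1) may be expected to take the following form. This is a sketch under the assumption that the connection coefficients $\Gamma^i_{jk}$ enter with the same index placement as in line (71.4.2), which is not confirmed in this section. With this form, the condition $(DX)(t) = 0$ for all $t$ is exactly the parallel transport equation in line (71.4.2).

```latex
% Assumed component form of line (71.7.1); the index placement follows line (71.4.2).
\[
  (DX)^i(t) = \frac{d}{dt} X^i(t)
  + \sum_{j,k=1}^n \Gamma^i_{jk}(\gamma(t))\, X^j(t)\, \gamma'(t)^k .
\]
```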


71.7.3 REMARK: Notation for covariant derivatives of vector fields on curves. 

A notation such as “$(D_{\gamma'(t)} X)(t)$” is not suitable for the covariant derivative of a vector field on a curve
in Definition 71.7.2 because this would seem to imply that a more general vector $V \in T_{\gamma(t)}(M)$ could be
used in place of $\gamma'(t)$. The right hand side of line (71.7.1) is fully determined by $X$ and $t$ because $\gamma$ (and
hence $\gamma'(t)$) can be computed as $\gamma = \pi \circ X : \mathrm{Dom}(X) \to M$, where $\pi$ is the projection map for the tangent
bundle $T(M)$.

Definition 71.7.2 is significantly different to Definition 71.6.4, despite the obvious similarities. The “field”
$X$ along $\gamma$ is not a vector field on $M$. So it is not possible to define $(\partial_V X)(t)$ for arbitrary $V \in T_{\gamma(t)}(M)$.
Even the naive derivative $(\partial_{\gamma'(t)} X)(t)$ is not well defined unless the coordinate chart is exactly aligned with
the curve in some neighbourhood of $\gamma(t)$. (If the chart and the curve are not aligned, the vector “field” will
not be defined on the tangent line in the chart space.) A $C^1$ curve $X : \mathbb{R} \to T(M)$, on the other hand, has a
well defined velocity $X'(t) \in T_{X(t)}(T(M))$ for all $t \in \mathrm{Dom}(X) = \mathrm{Dom}(\gamma)$. The curve $X$ effectively generates
its own coordinate system and base-space velocity vector for the covariant differentiation.

One could informally write $(DX)(t)$ as “$(D_{\gamma'(t)} X)(t)$”, but such pseudo-notation should be avoided when
doing careful computations. Possibly acceptable would be a notation such as “$D_t X(t)$” as in Notation 71.7.4.
This has the advantage of being usefully extendable to vector fields on families of curves.


71.7.4 NOTATION: $D_t X(t)$, for a $C^1$ vector field $X : \mathbb{R} \to T(M)$ along an open $C^1$ curve $\gamma : \mathbb{R} \to M$ in a
$C^1$ manifold $M$, for a parameter $t \in \mathrm{Dom}(\gamma)$, denotes the covariant derivative of $X$ along $\gamma$ at $t$ with respect
to an implicit affine connection $\theta$ on $M$. In other words,

\[
  \forall t \in \mathbb{R},\quad D_t X(t) = \varpi_{X(t)}\bigl(X'(t) - \theta_{\gamma'(t)}(X(t))\bigr).
\]


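71.7.5 DEFINITION: The (covariant) acceleration of a $C^2$ open curve $\gamma : I \to M$ in a $C^2$ manifold $M$ with
respect to an affine connection $\theta$ on $M$ is the map $D_t(\gamma') : \mathrm{Int}(I) \to T(M)$ defined by:

\[
  \forall t \in I,\quad D_t \gamma'(t) = \varpi_{\gamma'(t)}\bigl(\partial_t^2 \gamma(t) - \theta_{\gamma'(t)}(\gamma'(t))\bigr),
\]

where $\partial_t^2 \gamma : I \to T(T(M))$ satisfies $\partial_t^2 \gamma(t) \in T_{\gamma'(t)}(T(M))$ for $t \in I$, and $\theta_{\gamma'(t)}(\gamma'(t)) \in T_{\gamma'(t)}(T(M))$ also. The
function $\varpi_z : \ker((d\pi)_z) \to T_{\pi(z)}(M)$ for $z \in T(M)$ is the drop function for $T(M)$. (See Definition 59.2.9.)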
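

The covariant acceleration can be illustrated in a chart, where it may be expected to take the familiar form $\gamma''^i + \Gamma^i_{jk}\gamma'^j\gamma'^k$. The sketch below assumes this component form, the unit 2-sphere with its Levi-Civita connection in coordinates $(\theta, \phi)$, and finite-difference derivatives. None of these choices come from the text, and the function names are hypothetical. It compares the equator, which is a geodesic, with the latitude circle $\theta = \pi/4$, which is not.

```python
import math

def christoffel(theta):
    """Gamma[i][j][k] on the unit sphere, coordinates (x^0, x^1) = (theta, phi)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def acceleration(curve, t, h=1e-4):
    """Chart components gamma''^i + Gamma^i_jk gamma'^j gamma'^k at parameter t,
    with first and second derivatives taken by central differences."""
    x0, xp, xm = curve(t), curve(t + h), curve(t - h)
    vel = [(a - b) / (2 * h) for a, b in zip(xp, xm)]
    acc = [(a - 2 * c + b) / h ** 2 for a, c, b in zip(xp, x0, xm)]
    G = christoffel(x0[0])
    return [acc[i] + sum(G[i][j][k] * vel[j] * vel[k]
                         for j in range(2) for k in range(2)) for i in range(2)]

def equator(t):
    return (math.pi / 2, t)

def latitude(t):
    return (math.pi / 4, t)

acc_equator = acceleration(equator, 0.3)     # close to [0, 0]: a geodesic
acc_latitude = acceleration(latitude, 0.3)   # theta-component close to -1/2
```

Both curves have zero naive second derivative in these coordinates, so the whole difference between them comes from the connection term.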


71.8. Divergence of a vector field 


71.8.1 REMARK: The divergence of a vector field. 

The divergence of a vector field is one of the differential geometry concepts which are much easier to express
in coordinates than in abstract. Many textbooks give definitions which evaluate the covariant derivative of 
a volume element following the velocity of a given vector field. While this gives geometrical meaning to the 
divergence, it is technically quite demanding. 


Definition 71.8.3 uses the relatively elementary concept of the trace of a linear space endomorphism from 
Definition 23.3.3 to construct the divergence of a vector field. Although the divergence may be interpreted 
geometrically in terms of volume elements and flows, when it is applied in the construction of the Laplacian, 
volume elements and flows seem to have no relevance. Therefore the simple linear algebraic concept of the 
trace is adequate to the task. (See Theorem 23.3.2 for basis-independence of the trace.) 


Since so many authors define the divergence operator in terms of a Riemannian metric tensor field, it seems
wise to check that it is well-defined in the absence of a metric. From the proof of Theorem 71.8.2, it can be
seen that the linearity of the affine connection guarantees the linearity of the map $V \mapsto D_V X$, from which
it follows that the trace of this map is well-defined. No metric tensor is required.


One may also investigate whether the divergence is well defined with less structure than an affine connection
on a tangent bundle. If there is no connection at all, then there is no covariant derivative and so the
directional derivative $\partial_V X$ of a vector field $X$ in the direction $V$ cannot be made vertical by balancing it
with a horizontal lift of $V$. If a connection is defined on a general vector bundle, the vector $D_V X$ is a vertical
element of the tangent bundle of the total space $E$, which cannot be identified with the base-space tangent
space $T_p(M)$. The trace of a linear map is guaranteed by Theorem 23.3.2 to be basis-independent only if
the map is a linear space endomorphism. So the fibre set $E_p$ at each point $p \in M$ must be identical with
the tangent space $T_p(M)$. In other words, the vector bundle must be the tangent bundle. It is fairly clear,
then, that an affine connection on a tangent bundle is necessary and sufficient for the meaningfulness of the
divergence in Definition 71.8.3.


71.8.2 THEOREM:  Well-definition of the trace of the covariant derivative map. 

Let $M$ be a $C^2$ affinely connected manifold with connection $\theta$. Let $X \in X^1(T(M))$ be a vector field on $M$.
For $p \in M$, define $\phi_p : T_p(M) \to T_p(M)$ by $\phi_p(V) = D_V X$ for all $V \in T_p(M)$. Then $\phi_p \in \mathrm{Lin}(T_p(M), T_p(M))$ is a
well-defined linear map and $\mathrm{Tr}(\phi_p) \in \mathbb{R}$ is well defined.


PROOF: By Definition 71.6.4, the map $D : T(M) \times X^1(T(M)) \to T(M)$ is defined by

\[
  \forall V \in T(M),\ \forall X \in X^1(T(M)),\quad
  D_V X = \varpi(\partial_V X - \theta_V(X(\pi(V)))), \tag{71.8.1}
\]

where $\varpi$ is the “drop” function for $T(M)$ and $\pi$ is the projection map for $T(M)$. (See Definition 59.2.9
for the drop function.) Let $X \in X^1(T(M))$ be a vector field on $M$. Then for any given $p \in M$ and
$V \in T_p(M)$, line (71.8.1) maps $V$ to $\pi(V) = p$, which is then mapped to the vector $X(p)$ in $T_p(M)$, which
is then mapped to the element $\theta_V(X(p))$ of $T_{z,V}(T(M))$, where $z = X(p)$ and $T_{z,V}(T(M)) \subseteq T_z(T(M))$
is the set of elements of $T_z(T(M))$ with horizontal component $V$. (See Notation 59.2.5.) But $\partial_V X$ is
also an element of $T_{z,V}(T(M))$. So the difference $\partial_V X - \theta_V(X(\pi(V)))$ is a well-defined vertical vector in
$T_{z,0}(T(M)) = \ker((d\pi)_z)$. Therefore the value of the drop function $\varpi$ of this difference is a well-defined
element of $T_p(M)$. (See Definition 59.2.15 for the drop function.) So $D_V X$ is a well-defined element of
$T_p(M)$ for all $V \in T_p(M)$.

The linearity of the map $V \mapsto D_V X$ for $V \in T_p(M)$, for fixed $X$ and $p$, follows from the linearity of $\partial_V X$
and $\theta_V$ with respect to $V$ and the linearity of $\varpi$ on $T_{z,0}(T(M))$. Since $T_p(M)$ is finite-dimensional, it then follows
from Theorem 23.3.2 that $\mathrm{Tr}(\phi_p)$ is a well-defined real number, independent of the choice of basis for the
computation of the trace.


((2016-7-13. Explain the divergence in Definition 71.8.3. Express it in coordinates. Apply it to obtain the
Laplace-Beltrami operator. Give example to interpret divergence in terms of integral curves. ))


71.8.3 DEFINITION: The divergence of a vector field $X \in X^1(T(M))$ at a point $p$ in a $C^2$ affinely connected
manifold $M$ is the trace of the map $V \mapsto D_V X$ from $T_p(M)$ to $T_p(M)$.


The divergence of a vector field $X \in X^1(T(M))$ on a $C^2$ affinely connected manifold $M$ is the real-valued
function on $M$ defined as the divergence of $X$ at $p$ for each point $p \in M$.


71.8.4 NOTATION: $\mathrm{div}\, X$, for a vector field $X \in X^1(T(M))$ on a $C^2$ affinely connected manifold $M$, denotes
the divergence of $X$.
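

The trace construction can be tried out in a chart. The sketch below is an illustration under assumptions not made in the text: the manifold is the flat plane in polar coordinates $(r, \phi)$ with its standard flat connection, the matrix of $V \mapsto D_V X$ is assumed to have entries $\partial_k X^i + \Gamma^i_{jk} X^j$, and derivatives are approximated by central differences. All names are hypothetical.

```python
def christoffel_polar(r):
    """Gamma[i][j][k] for the flat plane, coordinates (x^0, x^1) = (r, phi)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def covariant_derivative_matrix(X, p, h=1e-6):
    """Matrix M[i][k] of the linear map V -> D_V X at p, so (D_V X)^i = M[i][k] V^k."""
    n = len(p)
    G = christoffel_polar(p[0])
    Xp = X(p)
    M = [[0.0] * n for _ in range(n)]
    for k in range(n):
        fwd, bwd = list(p), list(p)
        fwd[k] += h
        bwd[k] -= h
        Xf, Xb = X(fwd), X(bwd)
        for i in range(n):
            M[i][k] = ((Xf[i] - Xb[i]) / (2 * h)
                       + sum(G[i][j][k] * Xp[j] for j in range(n)))
    return M

def divergence(X, p):
    """Trace of the linear map V -> D_V X at the point p."""
    M = covariant_derivative_matrix(X, p)
    return sum(M[i][i] for i in range(len(p)))

def radial(p):
    """The field x d/dx + y d/dy, which is r d/dr in polar components."""
    return [p[0], 0.0]

def rotation(p):
    """The field -y d/dx + x d/dy, which is d/dphi in polar components."""
    return [0.0, 1.0]

div_radial = divergence(radial, [2.0, 0.7])
div_rotation = divergence(rotation, [2.0, 0.7])
```

The values agree with the Cartesian computations: the radial field has divergence 2 and the rotation field has divergence 0. No metric is used anywhere, only the connection coefficients.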


((2016-7-13. Present and prove Jacobi's formula $\partial_t \log(\det(\exp(tA))) = \mathrm{Tr}(A)$ for Remark 71.8.5. ))


71.8.5 REMARK: Intuitive motivation for the divergence. 

Consider a vector field on $\mathbb{R}^n$ defined by $X(p)^i = b^i + A^i{}_j p^j$ for $b \in \mathbb{R}^n$ and $A \in M_{n,n}(\mathbb{R})$. For $b = 0$, the
integral curves look like $y(t) = \exp(At)y(0)$. Then the relative rate of increase of the volume of a region such
as a cube or sphere (or rectangular solid or ellipsoid) is equal to $\mathrm{Tr}(A)$, which “coincidentally” equals the
divergence of the vector field. (Geometric interpretation of the trace of a linear map or matrix is discussed
in Remarks 23.3.1, 23.3.5 and 25.9.5.)


As a first-order approximation to the integration of the vector fields, put $\tilde y = y + A^{-1}b$, if $A$ is invertible,
and solve for $\tilde y$, where $\partial_t y^i(t) = b^i + A^i{}_j y^j(t)$. The solution resembles $\tilde y(t) = \exp(At)\tilde y(0)$. Jacobi’s formula
$\partial_t \log(\det(\exp(tA))) = \mathrm{Tr}(A)$ may be applied to this to show that the first-order rate of expansion corresponds
to the trace of $A$. Then the divergence equals the relative rate of expansion of volumes if they are expanded
according to the integral flow of the vector field. (The example shape can be a rectangular or ellipsoidal
region.) This does not require a metric because only relative rate of change of volume is required. The metric
is only required if you want the absolute volume. This possibly explains why many texts express divergence
in terms of the square root of the determinant of the metric tensor.
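

Jacobi's formula can be checked numerically for a particular matrix. The sketch below is only an illustration: the $2 \times 2$ matrix `A`, the power-series matrix exponential and the finite-difference step are hypothetical choices, not taken from the text. The determinant of the flow map $\exp(tA)$ is the factor by which the flow scales volumes, so the derivative of its logarithm is the relative rate of volume expansion.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=60):
    """exp(A) by its power series; adequate for small matrices of modest norm."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]   # A^m / m!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1.0, 2.0], [3.0, -0.5]]        # hypothetical matrix, chosen for illustration
trace_A = A[0][0] + A[1][1]

def log_det_flow(t):
    """log det exp(tA): the logarithm of the volume scaling factor of the flow."""
    return math.log(det2(mat_exp([[t * x for x in row] for row in A])))

h = 1e-5
rate = (log_det_flow(h) - log_det_flow(-h)) / (2 * h)   # relative expansion rate
```

The computed `rate` agrees with `trace_A` up to rounding. In fact $\log\det\exp(tA) = t\,\mathrm{Tr}(A)$ for all $t$, which `log_det_flow(1.0)` also reproduces.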


71.9. Hessian operators 


71.9.1 REMARK: The Hessian operator on a differentiable manifold. 

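In flat Euclidean space, the Hessian operator is easy to define. For any function $f \in C^2(\mathbb{R}^n, \mathbb{R})$, for $n \in \mathbb{Z}^+$,
the Hessian of $f$ is the cross-section $H(f) \in X^0(T^{0,2}(\mathbb{R}^n))$ of the doubly covariant tangent bundle $T^{0,2}(\mathbb{R}^n)$,
where $H(f)$ is defined by

\[
  \forall p \in \mathbb{R}^n,\ \forall u,v \in T_p(\mathbb{R}^n),\quad
  H(f)(p)(u,v) = \sum_{i,j=1}^n u^i v^j\, \partial_{x^i}\partial_{x^j} f(x)\Big|_{x=p}.
\]

Thus the Euclidean space Hessian operator $H : C^2(\mathbb{R}^n, \mathbb{R}) \to X^0(T^{0,2}(\mathbb{R}^n))$ is defined by

\[
  \forall f \in C^2(\mathbb{R}^n, \mathbb{R}),\ \forall p \in \mathbb{R}^n,\ \forall u,v \in T_p(\mathbb{R}^n),\quad
  H(f)(p)(u,v) = \sum_{i,j=1}^n u^i v^j\, \partial_{x^i}\partial_{x^j} f(x)\Big|_{x=p}.
\]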
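

The Euclidean Hessian bilinear form can be approximated directly from its definition by central differences. The following sketch is an illustration only. The test function, the step size and the names are hypothetical, and tangent vectors are represented simply by their component tuples.

```python
def hessian_form(f, p, u, v, h=1e-4):
    """Approximate H(f)(p)(u, v) = sum_ij u^i v^j d_i d_j f(p) by central differences."""
    n = len(p)

    def shifted(i, a, j, b):
        # Evaluate f at p with coordinate i moved by a and coordinate j moved by b.
        q = list(p)
        q[i] += a
        q[j] += b
        return f(q)

    total = 0.0
    for i in range(n):
        for j in range(n):
            d2 = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                  - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4 * h * h)
            total += u[i] * v[j] * d2
    return total

def f(x):
    """Hypothetical test function f(x, y) = x^2 y."""
    return x[0] ** 2 * x[1]

p = [1.0, 2.0]
mixed = hessian_form(f, p, [1.0, 0.0], [0.0, 1.0])     # d_x d_y f = 2x = 2 at p
swapped = hessian_form(f, p, [0.0, 1.0], [1.0, 0.0])   # symmetry in (u, v)
pure = hessian_form(f, p, [1.0, 0.0], [1.0, 0.0])      # d_x d_x f = 2y = 4 at p
```

The equality of `mixed` and `swapped` reflects the symmetry of second partial derivatives for a $C^2$ function, which is special to the flat case with its coordinate derivative.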
When the Euclidean space IR" is replaced by a general C? differentiable manifold M with n = dim(M), 
the first issue which arises is that the output from the first-order differentiation of a real-valued function 
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f € C?(M,R) is either a cross-section df € X1(T°+(M)) of the covariant tangent bundle T?:1 (M), or else 
an induced map df : T(M) — IR. (This issue is also mentioned in Remark 58.9.9.) The induced map style of 
differential is not suitable for defining the Hessian operator because a second derivative would yield a map 
d(df) : T(T(M)) > R, which admittedly may be “dropped” to a lower tangent space after applying an affine 
connection, but this is not the intuitively clearest way to do it. The cross-section df € X1(T°1(M)) is much 
easier to apply a second derivative, connection and drop function to. 


The second issue which arises in the case of a differentiable manifold is that the output $df \in X^1(T^{0,1}(M))$ from
the first-order differentiation is not in the same class of objects as the original space $C^2(M, \mathbb{R})$. So the Hessian
cannot be obtained by simply differentiating twice. A derivative operation is required for $df \in X^1(T^{0,1}(M))$,
and this derivative must be made covariant so that it “drops into” the space $X^0(T^{0,2}(M))$. Thus a dual
connection $\theta^*$ is required for the covariant tangent bundle $T^*(M)$. Then it should be possible to generalise
the formula in Definition 71.6.4 from contravariant vector fields $X \in X^1(T(M))$ to differentiate covariant
vector fields in $X^1(T^*(M))$. Then one should obtain an expression which resembles $\partial_V(df) - \theta^*_V((df)_p)$
for $V \in T_p(M)$.


71.9.2 REMARK: Making double differentials of real-valued functions covariant. 

Conversion of the naive double differentials of real-valued functions on a manifold $M$ in Section 59.10 to
covariant double differentials requires a tensorial “rise map” from $T^{2,0}(M)$ to $T(T(M))$. (This is illustrated
in Figure 71.9.1, which is closely related to Figure 71.10.1.)


Figure 71.9.1 Covariant double differential of a real-valued function


The “rise map” maps $z \otimes V \in T^{2,0}(M)$ to $\theta_V(z) \in T_z(T(M))$.


71.9.3 REMARK: The Hessian of a real-valued function is its second covariant derivative. 

As mentioned in Section 60, second-order tangent operators are well-defined in the absence of a connection, 
but they have a first-order term which depends on the choice of coordinate chart because the transition 
rules for second-order operators have a term depending on the second derivatives of the chart transition 
maps. When a connection is defined, however, second-order derivatives may be defined with reference to 
local parallel transport so that the chart-dependent first-order component of the operator is hidden. If a 
second-order derivative is written in terms of coordinate derivatives instead of covariant derivatives, the
first-order term reappears. Thus covariant second-order operators are special cases of the differentiable manifold
operators in Section 60.2, but they are written with the assistance of the connection to hide the first-order 
derivatives. 


71.9.4 REMARK: The Hessian is well-defined at critical points in the absence of a connection. 

As mentioned in Section 59.11, the first-order derivative term in a second-order operator at a critical point of a
real-valued function is zero with respect to one chart of a $C^2$ manifold if and only if it is zero with respect to all
charts. Therefore the Hessian operator is well-defined at a critical point even in the absence of a connection,
but when the derivative of a real-valued function is non-zero, the Hessian is chart-dependent unless some
arbitrary choice of special coordinates is made. A connection effectively selects geodesic coordinates at 
each point as the “right” coordinates. The Hessian is calculated with respect to these coordinates, and the 
Hessian is calculated in all other coordinates by including correction terms to make the calculation agree 
with geodesic coordinates. 


71.9.5 DEFINITION: The Hessian at a point $p$ of a function $f \in C^2(M, \mathbb{R})$ in a $C^2$ manifold $M$ with a $C^1$
connection $\theta$ is the tensor $H_{f,p} \in T^{0,2}_p(M)$ defined by

\[ \forall (V_1, V_2) \in T_p(M) \times T_p(M),\quad H_{f,p}(V_1, V_2) = D_{V_1}(df)(V_2), \]

where $D$ denotes the covariant derivative on $M$ with respect to $\theta$.


71.9.6 DEFINITION: The Hessian operator at a point $p$ in a $C^2$ manifold $M$ with a $C^1$ connection is the
map $H_p : T_p(M) \times T_p(M) \to T^{[2]}_p(M)$ defined by $H_p : (V_1, V_2) \mapsto (f \mapsto D_{V_1} D_{V_2} f)$, for $V_1, V_2 \in T_p(M)$
and $f \in C^2(M)$, where $D$ denotes the covariant derivative on $M$.


71.9.7 REMARK: Expression for the Hessian operator in coordinates.

In terms of the Christoffel array, the Hessian operator looks like $\partial_{ij} - \Gamma^k_{ij}\partial_k$. Therefore
$H_p(V_1, V_2) = v_1^i v_2^j (\partial_{ij} - \Gamma^k_{ij}(p)\,\partial_k)$ for vectors $V_m = t_{p,v_m,\psi} \in T_p(M)$ for $m = 1, 2$.

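
The coordinate expression for the Hessian can be tested on a case where the answer is known in advance. The sketch below is an illustration only. It assumes the flat plane in polar coordinates (r, theta) with the usual symmetric Christoffel array for the Euclidean connection, and it uses central differences for the partial derivatives. The function names are hypothetical. The function x = r cos(theta) has zero Euclidean Hessian, so its covariant Hessian components should vanish in polar coordinates too, although the raw second partial derivatives do not.

```python
import math


def christoffel_polar(r, theta):
    """gamma[k][i][j] for the flat plane in polar coordinates (r, theta)."""
    g = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    g[0][1][1] = -r                      # Gamma^r_{theta theta}
    g[1][0][1] = g[1][1][0] = 1.0 / r    # Gamma^theta_{r theta}
    return g


def partial(f, x, i, h=1e-4):
    """Central-difference estimate of the i-th partial derivative of f at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)


def second_partial(f, x, i, j, h=1e-4):
    """Nested central differences for the mixed second partial derivative."""
    return partial(lambda y: partial(f, y, j, h), x, i, h)


def covariant_hessian(f, x):
    """Components d_i d_j f - Gamma^k_{ij} d_k f at the coordinate point x."""
    gamma = christoffel_polar(*x)
    n = len(x)
    return [[second_partial(f, x, i, j)
             - sum(gamma[k][i][j] * partial(f, x, k) for k in range(n))
             for j in range(n)] for i in range(n)]


def cartesian_x(q):
    """The Cartesian coordinate x = r cos(theta), written in polar coordinates."""
    return q[0] * math.cos(q[1])


def radius_squared(q):
    """The function x^2 + y^2 = r^2, written in polar coordinates."""
    return q[0] ** 2


point = [2.0, 0.7]
print(covariant_hessian(cartesian_x, point))
print(covariant_hessian(radius_squared, point))
```

For r squared, the Euclidean Hessian is twice the identity, so the polar components should be twice the polar metric components, namely 2 and 2 r squared on the diagonal.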
71.10. Covariant differentials of maps between manifolds 


((2016-7-13. In Section 71.10, apply affine connection and drop function to convert the second-order differential 
in Definition 59.12.2 to a covariant second-order differential. )) 


71.10.1 REMARK: Application of affine connections to tensorise higher-order differentials of maps. 

Naive higher-order differentials of maps between differentiable manifolds are the subject of Section 59.12.
In the case of second-order differentials of $C^2$ maps $\phi : M_1 \to M_2$ between $C^2$ manifolds $M_1$ and $M_2$, the
naive differential $d^2\phi$, which is simply the differential of the differential, is defined as a map from $T(T(M_1))$
to $T(T(M_2))$, which is fairly abstract and not very useful. But dropping such differentials down to $T(M_1)$
and $T(M_2)$ requires tensorisation of some kind. Naturally one prefers to use a geometrically meaningful
affine connection for this tensorisation task. The output from this task is a covariant double differential
between $T^{2,0}(M_1)$ and $T(M_2)$. This is a kind of Hessian for maps between manifolds, which is analogous
to the covariant Hessian of a real-valued function on a $C^2$ manifold. (This is illustrated in Figure 71.10.1,
which is closely related to Figure 71.9.1.)


Figure 71.10.1 Covariant double differential of a map between two manifolds


The “rise map” maps $z \otimes V \in T^{2,0}(M_1)$ to $\theta^{(1)}_V(z) \in T_z(T(M_1))$. The “fall map” maps $w \in T_z(T(M_2))$
to $\varpi(w - \theta^{(2)}_{(d\pi)_z(w)}(z))$. The affine connections $\theta^{(1)}$ and $\theta^{(2)}$ may be quite different in the two manifolds.
In particular, $n_1 = \dim(M_1)$ may not equal $n_2 = \dim(M_2)$. In the case of a map $\phi : \mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ which is
expressed in terms of the coordinate charts, the Hessian has the form

\[ \forall x \in \mathbb{R}^{n_1},\ \forall i,j \in \mathbb{N}_{n_1},\ \forall k \in \mathbb{N}_{n_2}, \]
\[ D_iD_j\phi^k(x) = \partial_i\partial_j\phi^k(x) - \sum_{m=1}^{n_1} \Gamma^{(1)m}_{ij}(x)\,\partial_m\phi^k(x) + \sum_{i',j'=1}^{n_2} \Gamma^{(2)k}_{i'j'}(\phi(x))\,\partial_i\phi^{i'}(x)\,\partial_j\phi^{j'}(x). \tag{71.10.1} \]

This follows from Definition 59.12.2. (It is perhaps noteworthy that the implicit summation convention
applied to line (71.10.1) would need to have different ranges for $m$, $j'$ and $i'$.)


71.10.2 DEFINITION: The parallelism rise map for a $C^2$ manifold $M$ with affine connection $\theta$ is the map
$\rho^{\uparrow} : \bigcup_{p \in M} T_p(M) \times T_p(M) \to T(T(M))$ which satisfies $\rho^{\uparrow}(z, V) = \theta_V(z)$ for all $z, V \in T_p(M)$, for all $p \in M$.


71.10.3 DEFINITION: The covariant fall map for a $C^2$ manifold $M$ with affine connection $\theta$ is the map
$\rho^{\downarrow} : T(T(M)) \to T(M)$ which satisfies $\rho^{\downarrow}(w) = \varpi(w - \theta_{(d\pi)_z(w)}(z))$ for all $w \in T_z(T(M))$, for all $z \in T(M)$,
where $\pi : T(M) \to M$ is the tangent bundle projection map for $M$.


71.11. Curvature of affine connections 
((2019-5-13. Apply the curvature formulas from Chapter 70 in Section 71.11. )) 


71.11.1 REMARK: The choice of tensor space where affine connection curvature should “live”. 

The tensor space which is required for the general definition of curvature of an affine connection follows from
an analysis of surfaces embedded in Euclidean spaces. Perhaps the most accurate definition of the space
in which affine connection curvature “lives” would be $\operatorname{Lin}(\Lambda^2 T_p(M), T_e(GL(T_p(M))))$. In other words, the
curvature at a point $p$ in a manifold $M$ may be expressed as a linear map from the space of area-elements
at $p$ to the Lie algebra of the Lie group of general linear transformations of the tangent space $T_p(M)$ at $p$.
(Here $e$ denotes the unit of the general linear group $GL(T_p(M))$, and so $T_e(GL(T_p(M)))$ denotes the Lie
algebra of $GL(T_p(M))$ at $e$.)


To be compatible with the more general definition of curvature of connections on ordinary fibre bundles, it
would be more accurate to describe the space which the curvature inhabits as $\operatorname{Lin}(\Lambda^2 T_p(M), X(T(T_p(M))))$,
where $X(T(T_p(M)))$ means the linear space of cross-sections (i.e. vector fields) on the submanifold $T_p(M)$
of $T(M)$. This space is mapped via a fibre chart to the Lie algebra $T_e(GL(T_p(M)))$ of the structure group
$GL(T_p(M))$ which acts on the fibre set $T_p(M)$. There is a bijection between this Lie algebra and the linear
space of vector fields on $T_p(M)$ which are generated by infinitesimal actions of the structure group.


The choice of tensor space $\operatorname{Lin}(\Lambda^2 T_p(M), T_e(GL(T_p(M))))$ is related to the observation that parallel transport
of the tangent space around loops through $p$ yields automorphisms of $T_p(M)$, and these automorphisms are
linear, to first order, with respect to the area element enclosed by the curve. Since this space of linear maps is
adequate for smooth surfaces embedded in Euclidean spaces, it is assumed that the space will continue to be
adequate for much more general kinds of manifolds. In a sense, it is a self-fulfilling prophecy because the
parallel transport around curves is defined in terms of linear maps in the space $\operatorname{Lin}(\Lambda^1 T_p(M), T_e(GL(T_p(M))))$,
which is essentially the same as $\operatorname{Lin}(T_p(M), T_e(GL(T_p(M))))$. Then the Stokes theorem ensures that the
curvature will live in $\operatorname{Lin}(\Lambda^2 T_p(M), T_e(GL(T_p(M))))$. (This topic is also discussed in Remark 30.4.14.)


71.11.2 REMARK: Similarity between curvature of a connection and the curl of a vector field. 

The curvature of an affine connection is something like the “curl” of the connection. That is, it measures 
how much the parallel transport varies around the boundary of an infinitesimal area element divided by the 
area of the element. This is what the exterior derivative calculates. 


71.11.3 REMARK: Spaces and maps for the Riemann curvature tensor. 
The curvature tensor may be considered to be an element of $\operatorname{Lin}(\Lambda^2 T_p(M), \operatorname{Aut}(T_p(M)))$, or equivalently as
an element of $A_2(T_p(M), \operatorname{Aut}(T_p(M)))$. Figure 71.11.1 illustrates some spaces and maps which are relevant
to the definition of the Riemann curvature tensor in an affine manifold.


Figure 71.11.1 Riemann curvature tensor spaces and maps for an affine connection


The first space $\operatorname{Lin}(\Lambda^2 T_p(M), \operatorname{Aut}(T_p(M)))$ may be thought of as the space of linear maps from infinitesimal
loops (whose area approximates an area element in $\Lambda^2 T_p(M)$) to the automorphism group $\operatorname{Aut}(T_p(M)) =
GL(T_p(M)) \subseteq \operatorname{Lin}(T_p(M), T_p(M))$. The second space $A_2(T_p(M), \operatorname{Aut}(T_p(M)))$ may be thought of as the
linear space of antisymmetric bilinear maps from pairs of tangent vectors in $T_p(M)$ to the automorphism
group of $T_p(M)$. The double bar on the arrow in the diagram indicates a bilinear map. The zig-zag on the
arrow indicates an antisymmetric (alternating) map.


Although $\operatorname{Lin}(\Lambda^2 T_p(M), \operatorname{Aut}(T_p(M)))$ and $A_2(T_p(M), \operatorname{Aut}(T_p(M)))$ are related by a natural isomorphism,
natural isomorphisms between tensor spaces are a dime a dozen. The inscrutability of the Riemann curvature
concept is partly attributable to the replacement of the more insightful space $\operatorname{Lin}(\Lambda^2 T_p(M), \operatorname{Aut}(T_p(M)))$ by
very practical but meaningless spaces such as $T_p(M) \otimes T_p(M)^* \otimes T_p(M)^* \otimes T_p(M)^*$ or $\bigotimes^{1,3} T_p(M) =
\operatorname{Lin}(\mathscr{L}(T_p(M)^1; \mathbb{R}), \mathscr{L}(T_p(M)^3; \mathbb{R}))$ or $\bigotimes^{1,3} T_p(M) = \mathscr{L}((T_p(M)^*)^1, T_p(M)^3; \mathbb{R})$ which are more typically
used in the literature. (See Remark 29.5.16 for these spaces.)

The difference of meaning between $\operatorname{Lin}(\Lambda^2 T_p(M), \operatorname{Aut}(T_p(M)))$ and $A_2(T_p(M), \operatorname{Aut}(T_p(M)))$ is that the
domain of the former space consists of wedge products of tangent vectors, whereas the domain of the latter
space consists of pairs of vectors. In practice, one does in fact compute curvature for pairs of vectors, but the
curvature is actually a function of equivalence classes of regular two-dimensional submanifolds of $M$ which
pass through $p \in M$ and have a tangent space at $p$ which is spanned by a pair of vectors. (The curvature
is effectively computed by integrating parallel transport around loops within each such submanifold which
converge to the point.) Such an equivalence class of submanifolds is best described by the area element
abstraction, although this abstraction is specified by two vectors in practice.


71.11.4 REMARK: The Riemann curvature tensor of an affine connection. 

The negative sign for the Riemann curvature tensor field $R$ in Definition 71.11.5 is due to the fact that most
authors define it to be the limit of the covariant derivative of a vector field around a loop through a point,
whereas the Riemann curvature $\rho^\theta$ in Definition 70.3.8 computes the effect of parallel transport around the
same loop.


71.11.5 DEFINITION: The Riemann (curvature) tensor at a point $p$ of a $C^2$ manifold $M$ with a $C^1$ affine
connection $\theta$ is the map $R : T_p(M) \times T_p(M) \to \operatorname{Lin}(T_p(M), T_p(M))$ given by

\[ \forall u,v,z \in T_p(M),\quad R(u,v)(z) = -\rho^\theta(u,v)(z), \]

where $\rho^\theta$ is given by Definition 70.3.8.


The Riemann (curvature) tensor field of a $C^2$ manifold $M$ with a $C^1$ affine connection $\theta$ is the map $R :
\bigcup_{p \in M} (T_p(M) \times T_p(M) \to \operatorname{Lin}(T_p(M), T_p(M)))$ given by

\[ \forall p \in M,\ \forall u,v,z \in T_p(M),\quad R(u,v)(z) = -\rho^\theta(u,v)(z). \]


71.11.6 REMARK: Interpretation of Riemann curvature tensor components with respect to coordinates. 
In the usual expression $R^i{}_{jk\ell} = \partial\Gamma^i_{j\ell}/\partial x^k - \partial\Gamma^i_{jk}/\partial x^\ell + \Gamma^i_{mk}\Gamma^m_{j\ell} - \Gamma^i_{m\ell}\Gamma^m_{jk}$ for the components of the Riemann
curvature tensor with respect to a given coordinate frame, the first two indices indicate the components of
the linear transformation in $\operatorname{Lin}(T_p(M), T_p(M))$ which is associated with each area element at $p$, whereas
the last two indices indicate the components of the area element in $\Lambda^2(T_p(M))$. Thus, viewed as a linear
map from $\Lambda^2(T_p(M))$ to $\operatorname{Lin}(T_p(M), T_p(M))$, the usual tensor calculus expression indicates the input by the
last two indices, and the output by the first two indices.


71.11.7 THEOREM: Formula for curvature tensor components in terms of Christoffel array field. 
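The curvature tensor components $R^i{}_{jk\ell} = R(e_k, e_\ell)(e_j)^i$ with respect to a coordinate chart $\psi \in \operatorname{atlas}(M)$ on
a $C^2$ manifold $M$ are given by

\[ \forall x \in \operatorname{Range}(\psi),\ \forall (i,j,k,\ell) \in \mathbb{N}_n^4, \]
\begin{align}
R^i{}_{jk\ell}(x) &= \frac{\partial\Gamma^i_{j\ell}(x)}{\partial x^k} - \frac{\partial\Gamma^i_{jk}(x)}{\partial x^\ell} + \Gamma^i_{mk}(x)\Gamma^m_{j\ell}(x) - \Gamma^i_{m\ell}(x)\Gamma^m_{jk}(x) \\
&= \Gamma^i_{j\ell,k}(x) - \Gamma^i_{jk,\ell}(x) + \Gamma^i_{mk}(x)\Gamma^m_{j\ell}(x) - \Gamma^i_{m\ell}(x)\Gamma^m_{jk}(x). \tag{71.11.1}
\end{align}

PROOF: By Definitions 71.6.4 and 68.1.8, $D_{e_k}(e_j) = \varpi(\partial_{e_k}e_j - \theta_{e_k}(e_j)) = \varpi(-\theta_{e_k}(e_j)) = \sum_i \Gamma^i_{jk} e_i$. (See
Definition 59.3.5 for $\varpi$.) Therefore

\[ D_{e_k}D_{e_\ell}(e_j) = D_{e_k}\Big(\sum_m \Gamma^m_{j\ell}\, e_m\Big) = \sum_i \Big(\Gamma^i_{j\ell,k} + \sum_m \Gamma^i_{mk}\Gamma^m_{j\ell}\Big) e_i. \]

Hence $R^i{}_{jk\ell} = D_{e_k}D_{e_\ell}(e_j)^i - D_{e_\ell}D_{e_k}(e_j)^i = \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \sum_m (\Gamma^i_{mk}\Gamma^m_{j\ell} - \Gamma^i_{m\ell}\Gamma^m_{jk})$.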


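71.11.8 DEFINITION: The component family for the Riemann curvature tensor in an affinely connected
$C^2$ differentiable $n$-dimensional manifold $M$ for $n \in \mathbb{Z}^+_0$, with respect to a $C^2$ chart $\psi$ for $M$, is the family
$(R^i{}_{jk\ell}(x))_{i,j,k,\ell=1}^n \in \mathbb{R}^{\mathbb{N}_n^4}$ for $x \in \operatorname{Range}(\psi)$ defined by the formula

\[ \forall x \in \operatorname{Range}(\psi),\ \forall (i,j,k,\ell) \in \mathbb{N}_n^4, \]
\[ R^i{}_{jk\ell}(x) = \partial_{x^k}\Gamma^i_{j\ell}(x) - \partial_{x^\ell}\Gamma^i_{jk}(x) + \sum_{m=1}^n \big(\Gamma^i_{mk}(x)\Gamma^m_{j\ell}(x) - \Gamma^i_{m\ell}(x)\Gamma^m_{jk}(x)\big), \]

where the family $(\Gamma^i_{jk}(x))_{i,j,k=1}^n \in \mathbb{R}^{\mathbb{N}_n^3}$ at each $x \in \operatorname{Range}(\psi)$ is the Christoffel array for the affine connection
with respect to $\psi$.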

71.11.9 REMARK: Diversity of notations for the Riemann curvature component array. 
The notations for the component array of the Riemann curvature tensor show considerable diversity in the 
literature. Some of these are summarised in Table 71.11.2. 


The brace-notation $\{ji, \ell k\}$ used by Bianchi and Levi-Civita was known as “Riemann’s symbols”, no doubt
by analogy with “Christoffel’s symbols”. These were attributed to a posthumous work by Riemann.


Of some interest in Table 71.11.2 is the order of the indices $k$ and $\ell$. Reversing this order negates the
curvature, which is equivalent to computing the parallelism style of curvature in Definition 70.3.8 instead of
the modern covariant derivative style of curvature in Definition 71.11.5. The fact that “Riemann’s symbols”
have the reversed order “$\ell k$” for Levi-Civita hints that it was parallelism rather than covariant derivatives
which motivated the definition, although the notation dated back to Riemann's time. The derivation of
the curvature tensor from infinitesimal parallel transport, as opposed to alternating covariant derivatives, is
explicit in the presentations of the meaning of the Riemann curvature tensor by Levi-Civita [187] in [225],
pages 28-35; Pauli [296], pages 42-43; Levi-Civita [25], pages 198-201; Levi-Civita [26], pages 172-176;
Guggenheimer [16], pages 314-322; Bishop/Goldberg [3], pages 233-234; Lovelock/Rund [27], pages 84-91;
Rebhan [299], pages 958-960. All of these authors use the “reverse order” $\ell k$ for their Riemann curvature
notation. It was not possible to describe curvature in terms of parallel transport before 1917 because that
was when Levi-Civita's “notion of parallelism” was introduced.

year reference $R(e_k, e_\ell)(e_j)^i$
1902 Bianchi [204], pages 73, 349 $\{ji, \ell k\}$
1916 Einstein [323], page 800 Bi 
1917 Levi-Civita [225], pages 28-35 $\{ji, \ell k\}$
1918 Weyl[310], page 119 am 
1921 Pauli [296], page 42 R’ jek 
1925 Levi-Civita [25], page 201 $\{ji, \ell k\}$
1940 Eisenhart [10], page 100 R’ jke 
1949 Synge/Schild [41], page 83 Riike 
1959 Kreyszig [22], page 144 R’ jke 
1959 Willmore [42], page 232 Rire 
1962  Landau/Lifschitz [282], page 309 R’ jke 
1962 Lawden [283], page 105 Bi pe 
1963 Guggenheimer [16], page 322 R5! ox 
1963 Kobayashi/Nomizu [19], page 145 Rive 
1965 Postnikov [33], page 52 Rive 
1968 Bishop/Goldberg [3], page 234 Rio 
1968 Choquet-Bruhat [6], page 239 Rive 
1970 Misner/Thorne/Wheeler [292], page 219 R’ jx¢ 
1970 Spivak [37], Volume 2, page 188 Ru 
1974 Gilmore [82], page 132 Rej 
1975  Lovelock/Rund [27], page 257 no 
1979 Do Carmo [9], page 92 Low 
1980 EDM2 [113], page 1350 Io"
1980 Schutz [36], pages 211-214 R’ jke 
1981 Bleecker [254], page 114 R’ jke 
1983 Nash/Sen [30], page 188 Rire 
1986 Crampin/Pirani [7], pages 230, 295 R’ jke 
1987 Gallot/Hulin/Lafontaine [13], page 110 Rp; 
1988 Kay [18], page 101 Rie 
1994 Darling [8], page 211 Ri ne; 
1995 O'Neill [295], page 18 Rie 
1996 Goenner [270], page 253 Pm 
1997 Frankel[12], page 244 Rone 
1997 Lee [24], page 118 "nm 
1998 Petersen [31], page 41 Rie; 
1999 Rebhan [299], page 957 Rio 
2004 Szekeres [305], page 514 Rise 
2005 Penrose [297], page 302 Rye; 
2007 Morgan/Tian [29], page 5 Ree’; 
2012 Sternberg [38], page 92 Rone 
2013 Katti [279], page 256 Ra 
2015 Gómez-Ruiz [14], page 161 Ries
Kennington $R^i{}_{jk\ell}$
Table 71.11.2 Survey of Riemann curvature tensor component array notations 


71.11.10 REMARK: Interpretation of the Ricci curvature tensor. 

In a Riemannian manifold, the geometrical significance of the Ricci curvature is that its diagonal elements 
are essentially the sums of the sectional curvatures in the hyperplanes passing through the tangent vectors 
corresponding to the diagonal elements. 
The Ricci tensor is formed from the Riemann tensor by contraction of a covariant input with a contravariant
input. Therefore the Ricci tensor is well defined in the absence of a metric. Consider the set $\operatorname{Aut}(T_p(M))$ of
invertible linear transformations from the tangent space to itself. A linear map $\phi : T_p(M) \to T_p(M)$ may be
contracted by summing $\kappa_B(\phi(e_i))_i$ over basis vectors $e_i$ in a basis $B$, using the notation in Definition 22.8.8
for the component of $\phi(e_i)$ with respect to the basis vector $e_i$. Then $\operatorname{Tr}(\phi) = \sum_{i=1}^n \kappa_B(\phi(e_i))_i$ is an invariant
of $GL(n)$. (See Definition 23.3.3 for the trace of a linear map.) It is very significant that the trace is an
invariant of $GL(n)$, not just $SO(n)$. It implies that the Ricci tensor is well defined in the absence of a metric.


71.11.11 DEFINITION: The Ricci (curvature) tensor at a point $p$ of a $C^2$ manifold $M$ with a $C^1$ affine
connection $\theta$ is the map $R : T_p(M) \times T_p(M) \to \mathbb{R}$ given by

\[ \forall u, z \in T_p(M),\quad R(u, z) = \operatorname{Tr}(v \mapsto R(u, v)(z)), \]

where $R$ on the right-hand side is given by Definition 71.11.5.


The Ricci (curvature) tensor field of a $C^2$ manifold $M$ with a $C^1$ affine connection $\theta$ is the map $R \in
X(T^{0,2}(M))$ given by

\[ \forall p \in M,\ \forall u, z \in T_p(M),\quad R(u, z) = \operatorname{Tr}(v \mapsto R(u, v)(z)). \]


71.11.12 REMARK: The trace of a linear endomorphism. 

Definition 71.11.11 uses the trace of a finite-dimensional linear space endomorphism. For any linear space
endomorphism $\phi \in \operatorname{Lin}(V, V)$, for a linear space $V$ with $\dim(V) = n \in \mathbb{Z}^+_0$, the value $\operatorname{Tr}(\phi) = \sum_{i=1}^n \phi(e_i)^i$
is independent of the choice of basis $(e_i)_{i=1}^n$ for $V$. (See Theorem 23.3.2.) In this case, the linear map is
from $T_p(M)$ to $T_p(M)$ for given values of $u$ and $z$. Thus one may write $R(u, z) = \sum_{i=1}^n R(u, e_i)(z)^i$ for
all $u, z \in T_p(M)$, for any basis $(e_i)_{i=1}^n$ for $T_p(M)$.


71.11.13 REMARK: Choice of notation for the Ricci tensor. 

It is an unfortunate accident of history that the names of both Riemann and Ricci start with the letter “R”. 
Some authors distinguish the fourth-degree Riemann curvature tensor from the second-degree Ricci curvature 
tensor by using the full names “Riemann” and “Ricci” in mathematical formulas. (See for example Misner/
Thorne/Wheeler [292].) Some authors use the notation “Rc” for the Ricci tensor. (See for example Lee [24],
page 124.) And some use “Ric” for the Ricci tensor. (See for example Frankel [12], page 315.) In this book,
the notation “R” will be used. 


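71.11.14 DEFINITION: The component family for the Ricci curvature tensor in an affinely connected $C^2$
differentiable $n$-dimensional manifold $M$ for $n \in \mathbb{Z}^+_0$, with respect to a $C^2$ chart $\psi$ for $M$, is the family
$(R_{j\ell}(x))_{j,\ell=1}^n \in \mathbb{R}^{\mathbb{N}_n^2}$ for $x \in \operatorname{Range}(\psi)$ defined by the formula

\[ R_{j\ell} = R^k{}_{jk\ell} = \partial_{x^k}\Gamma^k_{j\ell} - \partial_{x^\ell}\Gamma^k_{jk} + \Gamma^k_{mk}\Gamma^m_{j\ell} - \Gamma^k_{m\ell}\Gamma^m_{jk}, \]

where the family $(\Gamma^i_{jk}(x))_{i,j,k=1}^n \in \mathbb{R}^{\mathbb{N}_n^3}$ at each $x \in \operatorname{Range}(\psi)$ is the Christoffel array for the affine connection
with respect to $\psi$.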
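
The contracted formula can be checked on the same kind of example as the full curvature components. The sketch below is an illustration only, assuming the unit 2-sphere in coordinates (theta, phi) with the usual symmetric Christoffel array, central differences for the derivatives, and the array layout gamma[i][j][k] for upper index i and lower indices j, k. All names are hypothetical.

```python
import math


def christoffel_sphere(x):
    """gamma[i][j][k] on the unit 2-sphere, coordinates x = (theta, phi)."""
    theta = x[0]
    g = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    g[0][1][1] = -math.sin(theta) * math.cos(theta)
    g[1][0][1] = g[1][1][0] = math.cos(theta) / math.sin(theta)
    return g


def ricci(gamma_fn, x, h=1e-5):
    """ricci[j][l] = sum_k (d_k G^k_{jl} - d_l G^k_{jk}) + sum_{k,m} (G^k_{mk} G^m_{jl} - G^k_{ml} G^m_{jk})."""
    n = len(x)
    g = gamma_fn(x)

    def d(k, i, j, l):
        """Central-difference estimate of the k-th partial derivative of gamma[i][j][l]."""
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        return (gamma_fn(xp)[i][j][l] - gamma_fn(xm)[i][j][l]) / (2 * h)

    return [[sum(d(k, k, j, l) - d(l, k, j, k)
                 + sum(g[k][m][k] * g[m][j][l] - g[k][m][l] * g[m][j][k]
                       for m in range(n))
                 for k in range(n))
             for l in range(n)] for j in range(n)]


point = [1.0, 0.3]
ric = ricci(christoffel_sphere, point)
print(ric)
```

For the unit sphere the result should be diagonal with entries 1 and the square of sin(theta), which are the components of the round metric itself in these coordinates.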


71.11.15 REMARK: The scalar curvature is not well defined without a Riemannian metric. 

The scalar curvature is not well defined in a general affinely connected manifold. A Riemannian metric 
is required for its definition. (See Definitions 74.4.7 and 74.4.8.) Likewise, the Einstein tensor requires a 
Riemannian metric for its definition. (See Definition 75.3.3.) 


71.12. Torsion of affine connections 


71.12.1 REMARK: The geometric meaning of torsion. 

Since torsion is a property of connections, and connections are differentials of parallel transport, torsion must 
be a property of parallel transport. Therefore to interpret torsion, one must consider how torsion affects 
parallel transport. 


Torsion is related to the parallel transport of vectors which are not parallel to the direction of a curve. Parallel 
transport of tangent vectors which are parallel to the direction of a curve is unaffected by torsion because 
torsion is in essence the antisymmetrant of a connection which alternates the direction of transport with the
direction of the vector which is being transported. If these are the same vector, then the antisymmetrant is 
obviously zero. 


Torsion is equal to the rate of change of tangent vectors as they are parallel-transported along curves, but 
one must ask: “Rate of change relative to what?” This becomes clearer when expressed in terms of geodesic 
curves and Jacobi fields. (See Section 72.2 for torsion in the context of geodesics.) The rate of change of 
parallel-transported tangent vectors along a geodesic must be measured relative to the direction of nearby 
geodesic curves. Roughly speaking, parallel transport must be compared with Jacobi fields. 


71.12.2 REMARK: The correct order for the lower indices of the Christoffel array. 

The torsion in the Koszul idiom is $T(X, Y) = D_XY - D_YX$ for commuting vector fields $X$ and $Y$. (For
general vector fields, $T(X, Y) = D_XY - D_YX - [X, Y]$.) Since the $k$ in $\Gamma^i_{jk}$ signifies differentiation, this
gives the components $T^i_{jk} = \Gamma^i_{kj} - \Gamma^i_{jk}$ for the torsion tensor by letting $X = e_j$ and $Y = e_k$. This agrees
with Szekeres [305], pages 509, 513, and also with Weyl [310], page 113, Levi-Civita [26], pages 107-114, and
Bishop/Goldberg [3], page 223. However, Poor [32], page 94, and Choquet-Bruhat [6], pages 230-237, use the
opposite convention for the two lower indices of the Christoffel array. It is pointed out explicitly by Misner/
Thorne/Wheeler [292], page 209, that the “differentiating index” on the Christoffel array must “come last”.
(See also Remark 71.13.1 on this subject, including a longer list of authors.)


The apparent reversal of the indices “$j$” and “$k$” in the formula $T^i_{jk} = \Gamma^i_{kj} - \Gamma^i_{jk}$ does not seem so mysterious
when one considers that the Christoffel array is the array of components of the negative of the horizontal lift
function. (See also Remarks 70.4.3 and 71.4.1 regarding the minus-sign.) Since $\Gamma^i_{jk} = -\theta_{e_k}(e_j)^i$, one obtains

\[ T^i_{jk} = \theta_{e_k}(e_j)^i - \theta_{e_j}(e_k)^i. \]

In general, $(D_XY)^i = X^j\partial_jY^i + \Gamma^i_{kj}Y^kX^j$, and so $(D_XY - D_YX)^i = [X, Y]^i + (\Gamma^i_{kj} - \Gamma^i_{jk})X^jY^k$, which equals
$[X, Y]^i + T^i_{jk}X^jY^k$. Therefore $T(X, Y)^i = T^i_{jk}X^jY^k$. This gives an equivalence for the three formalisms,
namely tensor calculus, Koszul-style vector field calculus, and fibre bundles as follows.

\begin{align*}
T^i_{jk} &= \Gamma^i_{kj} - \Gamma^i_{jk} && \text{[tensor calculus]} \\
T(X, Y) &= T^i_{jk}X^jY^k e_i = (\Gamma^i_{kj} - \Gamma^i_{jk})X^jY^k e_i && \text{[tensor calculus]} \\
T(X, Y) &= D_XY - D_YX - [X, Y] && \text{[vector field calculus]} \\
T(X, Y)(p) &= \theta_{Y(p)}(X(p)) - \theta_{X(p)}(Y(p)) && \text{[fibre bundles]} \\
T(V, z) &= \theta_z(V) - \theta_V(z), && \text{[fibre bundles]}
\end{align*}


where V = X(p) and z = Y(p). The value of the torsion of two vector fields at a point depends only on the 
affine connection at that point, and on the values of the vector fields at that point. 
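
The agreement between the tensor calculus and Koszul-style expressions can be illustrated with exact integer arithmetic. The sketch below rests on assumptions chosen only for illustration: a constant Christoffel array equal to the alternating symbol on three-dimensional Euclidean space, linear vector fields X(x) = A x and Y(x) = B x with hypothetical integer matrices A and B, and the array layout GAMMA[i][j][k] for upper index i and lower indices j, k with the differentiating index last.

```python
def levi_civita(i, j, k):
    """Alternating symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


N = 3
# GAMMA[i][j][k]: upper index i, transported index j, differentiating index k.
GAMMA = [[[levi_civita(i, j, k) for k in range(N)] for j in range(N)]
         for i in range(N)]


def torsion_components(gamma):
    """T[i][j][k] = gamma[i][k][j] - gamma[i][j][k]."""
    n = len(gamma)
    return [[[gamma[i][k][j] - gamma[i][j][k] for k in range(n)]
             for j in range(n)] for i in range(n)]


def apply(mat, x):
    """Value at x of the linear vector field x -> mat x."""
    return [sum(mat[i][j] * x[j] for j in range(len(x))) for i in range(len(mat))]


def covariant_derivative(gamma, a, b, x):
    """(D_X Y)^i = X^j d_j Y^i + gamma[i][k][j] Y^k X^j for X = a x, Y = b x."""
    n = len(x)
    xv, yv = apply(a, x), apply(b, x)
    return [sum(xv[j] * b[i][j] for j in range(n))
            + sum(gamma[i][k][j] * yv[k] * xv[j]
                  for j in range(n) for k in range(n))
            for i in range(n)]


def lie_bracket(a, b, x):
    """[X, Y]^i = X^j d_j Y^i - Y^j d_j X^i for X = a x, Y = b x."""
    n = len(x)
    xv, yv = apply(a, x), apply(b, x)
    return [sum(xv[j] * b[i][j] - yv[j] * a[i][j] for j in range(n))
            for i in range(n)]


def torsion_koszul(gamma, a, b, x):
    """D_X Y - D_Y X - [X, Y] evaluated at x."""
    dxy = covariant_derivative(gamma, a, b, x)
    dyx = covariant_derivative(gamma, b, a, x)
    br = lie_bracket(a, b, x)
    return [dxy[i] - dyx[i] - br[i] for i in range(len(x))]


def torsion_tensor(gamma, a, b, x):
    """T^i_{jk} X^j Y^k evaluated at x."""
    t = torsion_components(gamma)
    n = len(x)
    xv, yv = apply(a, x), apply(b, x)
    return [sum(t[i][j][k] * xv[j] * yv[k] for j in range(n) for k in range(n))
            for i in range(n)]


A = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]   # hypothetical sample data
B = [[0, 1, 1], [1, 0, 2], [3, 1, 0]]
p = [1, 2, 3]
print(torsion_koszul(GAMMA, A, B, p), torsion_tensor(GAMMA, A, B, p))
```

The derivative terms cancel between the covariant derivatives and the Lie bracket, so the two printed vectors coincide, and only the values X(p) and Y(p) affect the result.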


The Koszul formalism unnecessarily stipulates a computation involving derivatives of vector fields in the 
formula 7(X,Y) = DxY — Dy X — [X,Y]. (This is an example of how sometimes the Koszul-style vector 
field calculus makes expressions more complicated and onerous, but less insightful and less correct.) 


71.12.3 REMARK: Validity of subtraction of second-level tangents in disjoint linear spaces.

The difference-expression $\theta_z(V) - \theta_V(z)$ in Remark 71.12.2 is a valid element of both $T_z(T(M))$ and
$T_V(T(M))$, and the horizontal component of the difference is equal to zero. This follows by application
of the “swap function” $\Xi$ for $T(T(M))$ which is introduced in Section 59.6. This issue is generally glossed
over in the literature because vectors such as $\theta_z(V) \in T_V(T(M))$ and $\theta_V(z) \in T_z(T(M))$ are easily verified
to obey the same transformation rules for any chart transition. The fact that $T_z(T(M)) \cap T_V(T(M)) = \emptyset$
for $z \neq V$ is somewhat troubling, but essentially harmless. This issue is a consequence of imperfections in
the application of the fibre bundle concept to tangent bundles. (A similar rarely mentioned issue for tangent
bundles in the fibre bundle formalism is the drop function $\varpi$ in Section 59.2.)


71.12.4 REMARK: Application of the abstract affine connection to define torsion. 

The torsion of an affine connection $\theta$ is, roughly speaking, the difference between $\theta_V(z)$ and $\theta_z(V)$. Even
though $V$ and $z$ are in the same set $T_p(M)$ for some $p \in M$, the values $\theta_V(z)$ and $\theta_z(V)$ are not in the same
set because $\theta_V(z) \in T_z(T(M))$ and $\theta_z(V) \in T_V(T(M))$, and $T_z(T(M)) \cap T_V(T(M)) = \emptyset$ whenever $V \neq z$.
This is illustrated in Figure 71.12.1. (The maps and spaces for the relevant abstract tangent bundle are
illustrated in Figure 65.9.2.)


Figure 71.12.1 The two terms in the torsion expression are in different tangent spaces


Ironically, a fundamental requirement of fibre bundles, namely that fibre sets at different points must be
disjoint, makes some of the most basic tangent bundle concepts difficult to define, even though tangent
bundles are the canonical structures which underlie the development of general fibre bundles. So one might
perhaps conclude that the general abstract formalism for connections on fibre bundles is useful only when
not applied to affine connections on tangent bundles.


The way in which this issue was resolved before the abstract fibre bundle formalism was developed was to
subtract the coordinate-map components of the two values $\theta_V(z)$ and $\theta_z(V)$. This is a clue to a solution. It
is possible, using a coordinate chart, to manufacture two vectors in $T(T(M))$ which inhabit the same double
tangent spaces $T_z(T(M))$ and $T_V(T(M))$, but which make no difference to the numerical value of the difference
between the components of $\theta_V(z)$ and $\theta_z(V)$.


Let $\psi : U \to \mathbb{R}^n$ be a $C^2$ chart on the open set $U \subseteq M$ such that $p \in U$. Define the chart $\tilde\psi : \pi^{-1}(U) \to \mathbb{R}^n$ for
$T(M)$ by $\tilde\psi : \sum_{i=1}^n v^i e_i \mapsto (v^i)_{i=1}^n$ for all $v = (v^i)_{i=1}^n \in \mathbb{R}^n$, where $(e_i)_{i=1}^n$ is the standard basis of $T_p(M)$ for
the chart $\psi$. If $V \in T_p(M)$ and $z \in T_p(M)$ are linearly independent, the chart $\psi$ may be chosen so that $e_1 = V$
and $e_2 = z$. Define tangent operator fields $X, Y \in X(\mathring{T}(M)|_U)$ by $X(q) = \partial_1$ and $Y(q) = \partial_2$ for all $q \in U$.
Then $X(p)Y \in T_z(T(M))$ and $Y(p)X \in T_V(T(M))$, and $(d\pi)_z(X(p)Y) = V$ and $(d\pi)_V(Y(p)X) = z$. (The
tangent operator space $\mathring{T}(M)$ may be identified with the tangent vector space $T(M)$ in the usual way, and
$(\partial_i)_{i=1}^n$ may be identified with $(e_i)_{i=1}^n$.) So $\theta_V(z) - X(p)Y \in T_z(T(M))$ and $\theta_z(V) - Y(p)X \in T_V(T(M))$ are
well-defined vector differences, and $(d\pi)_z(\theta_V(z) - X(p)Y) = 0$ and $(d\pi)_V(\theta_z(V) - Y(p)X) = 0$. Therefore
$\theta_V(z) - X(p)Y$ is a well-defined vertical vector in $T_z(T(M))$ and $\theta_z(V) - Y(p)X$ is a well-defined vertical
vector in $T_V(T(M))$. The “drop functions” $\varpi_z : \ker((d\pi)_z) \to T_{\pi(z)}(M)$ and $\varpi_V : \ker((d\pi)_V) \to T_{\pi(V)}(M)$
at $z$ and $V$ may be applied to these difference-vectors. (See Definition 59.2.9 for drop functions.) Since
$\pi(z) = \pi(V) = p$, this yields vectors $\varpi_z(\theta_V(z) - X(p)Y) \in T_p(M)$ and $\varpi_V(\theta_z(V) - Y(p)X) \in T_p(M)$. These
vectors may be subtracted to yield the tangent vector $\varpi_z(\theta_V(z) - X(p)Y) - \varpi_V(\theta_z(V) - Y(p)X) \in T_p(M)$
on the manifold $M$. This then becomes the definition of the torsion of an abstract affine connection.


By Remark 59.2.8 and Theorem 59.2.11, the construction of the abstract torsion expression using coordinate 
charts and drop functions is chart-independent. 


The abstract formula $\varpi_z(\theta_V(z) - X(p)Y) - \varpi_V(\theta_z(V) - Y(p)X)$ is reminiscent of the standard formula
$D_XY - D_YX - [X, Y]$ for torsion in terms of the Koszul connection formalism. This is not pure coincidence
of course. This standard formula may be obtained from the coordinate-aligned $X$ and $Y$ fields version as a
generalisation to the case that $[X, Y]$ is non-zero. It may seem that coordinate charts should not be part of
a definition which is supposedly chart-independent. However, the meaning of torsion is in fact the capability
of a parallelism to be “laid flat” on a local chart which corresponds roughly to a family of geodesics.


71.12.5 DEFINITION: The torsion field of $C^1$ vector fields $X, Y \in X^1(T(M))$ on a $C^2$ manifold $M$ with a
$C^1$ connection $\theta$ is the vector field $Z \in X^0(T(M))$ which satisfies $Z(p) = D^\theta_X Y(p) - D^\theta_Y X(p) - [X, Y](p)$
for all $p \in M$.


71.12.6 EXAMPLE: Parallel transport for an affine connection with non-zero torsion. 
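Let $M = \mathbb{R}^3$ with the identity chart $\psi = \mathrm{id}_M : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $\psi : x \mapsto x$. Define a Christoffel array
field $\Gamma : \mathbb{R}^3 \to (\mathbb{N}_3^3 \to \mathbb{R})$ by $\Gamma^k_{ij}(x) = \epsilon_{kij}$ for all $x \in M$ and $(k, i, j) \in \mathbb{N}_3^3$. (See Notation 14.8.35 for the
Levi-Civita alternating symbol.) In other words,

\[ \forall x \in M,\quad \Gamma(x) = \left( \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \right). \]

Consider the curve $\gamma : \mathbb{R} \to M$ defined by $\forall t \in \mathbb{R}$, $\gamma(t) = (t, 0, 0)$. Let $Y : \operatorname{Dom}(\gamma) \to T(M)$ with
$\forall t \in \mathbb{R}$, $Y(t) \in T_{\gamma(t)}(M)$ be a suitably differentiable vector field along $\gamma$. Then $Y$ is parallel transported by
$\Gamma$ if and only if

\[ \forall t \in \mathbb{R},\ \forall k \in \mathbb{N}_3,\quad \dot{Y}^k(t) + \Gamma^k_{ij}(\gamma(t))\, Y^i(t)\, \dot\gamma^j(t) = 0. \tag{71.12.1} \]

Since $\dot\gamma(t) = (1, 0, 0)$ for all $t \in \mathbb{R}$, line (71.12.1) implies $\dot{Y}^1(t) = 0$, $\dot{Y}^2(t) + Y^3(t) = 0$ and $\dot{Y}^3(t) - Y^2(t) = 0$
for all $t \in \mathbb{R}$. Hence $Y(t) = (k_1, c_1 \cos t + c_2 \sin t, c_1 \sin t - c_2 \cos t)$ for all $t \in \mathbb{R}$, for some $k_1, c_1, c_2 \in \mathbb{R}$. This
is illustrated in Figure 71.12.2 for $k_1 = 0$, $c_1 = 1$ and $c_2 = 0$.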


Figure 71.12.2 Parallel transport of tangent vector for affine connection with non-zero torsion
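

As a rough numerical cross-check of Example 71.12.6, the sketch below integrates the parallel transport equation along $\gamma(t) = (t, 0, 0)$ with a classical Runge-Kutta scheme and compares the result with the closed form for $k_1 = 0$, $c_1 = 1$, $c_2 = 0$. It assumes the transport equation in the form $\dot{Y}^k + \Gamma^k_{ij} Y^i \dot{\gamma}^j = 0$ with $\Gamma^k_{ij} = \epsilon_{kij}$, which reproduces the three component equations stated in the example. The function names and the step count are illustrative choices, and indices are zero-based in the code.

```python
import math


def levi_civita(k, i, j):
    """Levi-Civita alternating symbol for zero-based indices in {0, 1, 2}."""
    return (k - i) * (i - j) * (j - k) // 2


def transport_rhs(Y, velocity):
    """dY^k/dt = -Gamma^k_ij Y^i velocity^j with Gamma^k_ij = epsilon_kij (assumed form)."""
    return [-sum(levi_civita(k, i, j) * Y[i] * velocity[j]
                 for i in range(3) for j in range(3)) for k in range(3)]


def parallel_transport(Y0, t_end, steps=1000):
    """Classical RK4 along gamma(t) = (t, 0, 0), whose velocity is (1, 0, 0)."""
    velocity = (1.0, 0.0, 0.0)
    h = t_end / steps
    Y = list(Y0)
    for _ in range(steps):
        k1 = transport_rhs(Y, velocity)
        k2 = transport_rhs([y + 0.5 * h * k for y, k in zip(Y, k1)], velocity)
        k3 = transport_rhs([y + 0.5 * h * k for y, k in zip(Y, k2)], velocity)
        k4 = transport_rhs([y + h * k for y, k in zip(Y, k3)], velocity)
        Y = [y + h * (a + 2 * b + 2 * c + d) / 6
             for y, a, b, c, d in zip(Y, k1, k2, k3, k4)]
    return Y


t = 2.0
numeric = parallel_transport([0.0, 1.0, 0.0], t)
exact = [0.0, math.cos(t), math.sin(t)]  # closed form with k1 = 0, c1 = 1, c2 = 0
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
```

The transported vector rotates in the plane transverse to the curve while its component along the curve stays constant.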


71.12.7 REMARK:  Torsion-free connections. 

The torsion-free property for an affine connection on a differentiable manifold has no relation to any metric on the
manifold. The swap function $\Xi$ in Definition 71.12.8 is introduced in Definition 59.6.6. The swap function
for second-level tangent bundles $T(T(M))$ is meaningful for general $C^2$ differentiable manifolds.


71.12.8 DEFINITION: A torsion-free connection on a $C^2$ manifold $M$ is a horizontal lift function $\theta$ for $M$
which satisfies


\[
\forall p \in M, \forall V, z \in T_p(M), \quad \theta_V(z) = \Xi(\theta_z(V)).
\]


71.13. Coefficients of affine connections on tangent bundles 
((2016-7-13. Combine the old Section 71.13 with Section 71.27)) 


71.13.1 REMARK: The unstandardised order of subscripts for the Christoffel array. 
There seems to be no consensus on the subscript order for the Christoffel array. The earliest authors seem 
to have preferred the order given in Definition 71.2.2. (See also Remark 71.12.2 on this subject.) 


The left index $j$ of $\Gamma^i_{jk}$ refers to the vector to be parallel transported, and the right index $k$ refers to the direction
of transport according to Szekeres [305], page 507; Misner/Thorne/Wheeler [292], page 209; Weyl [310],
page 90; Levi-Civita [26], pages 107-114; Bishop/Goldberg [3], page 223. Then the torsion component family
is given as $T^i_{jk} = \Gamma^i_{kj} - \Gamma^i_{jk}$ by Szekeres [305], page 513.


The opposite Christoffel array convention (the right index indicating the vector to be transported and the
left index indicating the translation vector) is used by Frankel [12], page 241; Gallot/Hulin/Lafontaine [13],
page 71; Lee [24], page 51; Choquet-Bruhat [6], pages 230-237; Poor [32], page 94; Do Carmo [9], page 52;
Petersen [31], pages 30-31. With this opposite convention, the torsion tensor components are defined as
$T^i_{jk} = \Gamma^i_{jk} - \Gamma^i_{kj}$ by Frankel [12], page 245; Gallot/Hulin/Lafontaine [13], page 69; Lee [24], pages 63, 68;
Choquet-Bruhat [6], page 237. This gives the same torsion tensor because it is a double reversal.
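

The “double reversal” can be checked mechanically. The sketch below assumes the two torsion formulas in the forms $T^i_{jk} = \Gamma^i_{kj} - \Gamma^i_{jk}$ (left index transported) and $T^i_{jk} = \Gamma^i_{jk} - \Gamma^i_{kj}$ (right index transported), and that the two conventions store the same connection with the lower indices swapped. The sample array `G` is an arbitrary hypothetical choice.

```python
def swap_lower_indices(G):
    """Re-express a Christoffel array in the opposite convention: H^i_jk = G^i_kj."""
    n = len(G)
    return [[[G[i][k][j] for k in range(n)] for j in range(n)] for i in range(n)]


def torsion_left_transported(G):
    """T^i_jk = G^i_kj - G^i_jk, for arrays whose left lower index is the transported vector."""
    n = len(G)
    return [[[G[i][k][j] - G[i][j][k] for k in range(n)] for j in range(n)]
            for i in range(n)]


def torsion_right_transported(H):
    """T^i_jk = H^i_jk - H^i_kj, for arrays whose right lower index is the transported vector."""
    n = len(H)
    return [[[H[i][j][k] - H[i][k][j] for k in range(n)] for j in range(n)]
            for i in range(n)]


# Arbitrary non-symmetric sample array, indexed as G[i][j][k] for Gamma^i_jk.
G = [[[(i + 1) * (j + 2) + 3 * k * k - i * j * k for k in range(3)]
      for j in range(3)] for i in range(3)]
H = swap_lower_indices(G)

T_left = torsion_left_transported(G)
T_right = torsion_right_transported(H)
```

Swapping the index roles and swapping the order of subtraction cancel each other, so both conventions produce the same component array.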


The two original 1869 Christoffel arrays $\genfrac{[}{]}{0pt}{}{jk}{i}$ and $\genfrac{\{}{\}}{0pt}{}{jk}{i}$ were explicitly symmetric in the indices $j$ and $k$ because
they were defined as coefficients of differentials of quadratic forms. (See Christoffel [175], pages 48-49.) These
arrays were regarded effectively as “tensorisation coefficients” to ensure covariance of derivatives of forms
defined on manifolds which obtained their metric from an ambient space. (See Section 60.4 for tensorisation
of second-order operators.) The same comment applies to Bianchi’s 1894 lectures. (See Bianchi [203],
pages 41-42. See also the 1902 second edition, Bianchi [204], pages 62-66.) At that time, Christoffel arrays
were not presented as coefficients for parallel transport.


The 1900 paper by Ricci-Curbastro and Levi-Civita likewise defines the Christoffel array in terms of quadratic
forms, but clearly interprets the right-hand index $k$ of $\genfrac{\{}{\}}{0pt}{}{jk}{i}$ as the differentiating (i.e. transport) index. (See
Levi-Civita [222], page 492.) However, the 1917 paper by Levi-Civita [187] equally clearly makes the left-hand
index $j$ of $\genfrac{\{}{\}}{0pt}{}{jk}{i}$ the differentiating index. (See Levi-Civita [225], page 3.) The 1918 book by Weyl [310],
page 90, uses the notation $\Gamma^i_{jk}$ and makes the right index $k$ the parallel transport index. The 1925 book
by Levi-Civita returns to the original index order for the arrays $\genfrac{[}{]}{0pt}{}{jk}{i}$ and $\genfrac{\{}{\}}{0pt}{}{jk}{i}$, where the right index $k$
indicates the transport vector, although these arrays are still assumed to be symmetric. (See Levi-Civita [25],
pages 126-127.) In the 1926 English-language edition, the notations $[jk, i]$ and $\{jk, i\}$ are used instead,
but the right index is still tied to the transport vector. (See Levi-Civita [26], pages 109-114.)


The 1959 book by Willmore links the right subscript of the Christoffel array to the transport vector, and (in
a different notation) defines the torsion as $\frac{1}{2}(\Gamma^i_{jk} - \Gamma^i_{kj})$. (See Willmore [42], pages 205-216.)


Postnikov [33], page 50, gives the opposite subscript order to Definition 71.2.2. 


71.13.2 REMARK: Expression for Christoffel array in terms of covariant derivatives of basis vector fields. 
The Christoffel array for a given connection on a $C^1$ manifold $M$ with respect to a $C^1$ chart $\psi$ for $M$ may be
expressed as the family of components of the tangent vectors $(D_{e_k}(e_j))_{j,k=1}^n$ with respect to the chart-basis.
In other words,

\[
D_{e_k}(e_j) = \sum_{i=1}^n \Gamma^i_{jk} e_i,
\]

where $n = \dim(M)$ and $(e_i)_{i=1}^n = (e_i^{p,\psi})_{i=1}^n$ denotes the family of chart-basis vectors at $p \in \mathrm{Dom}(\psi)$.


71.14. Affine connections on frame fields 


71.14.1 REMARK: Projecting affine connections from tangent bundles to frame fields. 

Connection forms may be defined on general differentiable principal bundles in the Ehresmann style, or they 
can be defined on local vector-frame fields in the Cartan style. (See Remark 69.5.1 for literature references.) 
The Ehresmann style is very general because connection forms on principal bundles can be induced onto 
any associated fibre bundle. The Cartan style, on the other hand, requires frame fields as a “substrate” for 
matrices of connection forms, for which a vector bundle structure is a prerequisite. The customary formulas 
in the literature for the Cartan connection form matrices often rely on Riemannian manifold structure, which 
restricts the scope even further, from vector bundles to tangent bundles. Section 71.14 is concerned with the 
definition of Cartan-style connection form matrices on the tangent bundles of affinely connected manifolds. 


Affine connections can be constructed for the frame fields of a differentiable manifold from a given horizontal
lift function $\theta$ as follows.

(1) Convert $\theta$ to a horizontal lift function for a principal bundle $P$ associated with $T(M)$.

(2) Transpose this horizontal lift function to make a transposed horizontal lift function for $P$.

(3) From the transposed horizontal lift function, construct a connection form $\omega$ on $P$ as in Definition 69.5.4.

(4) Construct an affine connection for local frame fields as in Definition 71.14.2.


$X_{\mathrm{loc}}(F(M))$ means the set of local frame fields $\bigcup_{U \in \mathrm{Top}(M)} X(F(M)|_U)$ and $X_{\mathrm{loc}}(\Lambda_1(T(M), T_e(G)))$ means
the set of local $T_e(G)$-valued fields $\bigcup_{U \in \mathrm{Top}(M)} X(\Lambda_1(T(M), T_e(G))|_U)$. (See Notation 57.7.24.) Note that
the short-cut version of cross-sections of the covariant tensor bundle $\Lambda_1(T(M), T_e(G))$ is assumed here, as
discussed in Remark 57.7.1. Without the short-cut, it would be necessary to write $A^\omega_X(p)(V) = \omega(p)(X_*(V))$
in place of line (71.14.2). The short-cut makes the formula $A^\omega_X = \omega \circ X_*$ meaningful. However, the equation
$\mathrm{Dom}(A^\omega_X) = \mathrm{Dom}(X)$ on line (71.14.1) is only meaningful if this short-cut is not used.

The frame bundle $F(M)$ in Definition 71.14.2 is assumed to be the principal frame bundle in Definition 55.7.8,
whose fibre charts have values in $GL(n)$, where $n = \dim(M)$.


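71.14.2 DEFINITION: The affine connection on (local) frame fields, corresponding to a connection form $\omega$
for a $C^2$ manifold $M$, is the map $A^\omega : X^1_{\mathrm{loc}}(F(M)) \to X_{\mathrm{loc}}(\Lambda_1(T(M), T_e(G)))$ defined by


\[
\forall X \in X^1_{\mathrm{loc}}(F(M)), \quad \mathrm{Dom}(A^\omega_X) = \mathrm{Dom}(X) \tag{71.14.1}
\]
and
\[
\forall X \in X^1_{\mathrm{loc}}(F(M)), \forall p \in \mathrm{Dom}(X), \forall V \in T_p(M), \quad A^\omega_X(V) = \omega(X_*(V)). \tag{71.14.2}
\]


In other words, $\forall X \in X^1_{\mathrm{loc}}(F(M)), A^\omega_X = \omega \circ X_* = X^*(\omega)$.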


71.14.3 REMARK: Interpretation of an affine connection on a frame field.

The style of affine connection in Definition 71.14.2 effectively uses a frame field $X$ to generate vectors $X_*(V)$
in the tangent bundle $T(F(M))$ of the principal frame bundle $F(M)$ to which the connection form $\omega$ can be
applied. For example, if the $C^2$ manifold $M$ is the embedded 2-sphere $S^2 \subset \mathbb{R}^3$, and $X$ is the coordinate
frame field for some local $C^2$ manifold chart $\psi \in \mathrm{atlas}(M)$ with domain $U = \mathrm{Dom}(\psi) \in \mathrm{Top}(S^2)$, then $X$
is a $C^1$ map from $U$ to $F(M)$ such that $X(p) \in F_p(M)$ for all $p \in U$. (In fact, $X(p) = (e_i^{p,\psi})_{i=1}^2$ for
all $p \in U$. See Notation 54.4.10 for chart-basis vectors $e_i^{p,\psi}$.) If $p$ is varied in the direction $V \in T_p(M)$, then
$X(p)$ varies with differential $(dX)_p(V) = X_*(V) \in T_{X(p)}(F(M))$. A connection form $\omega : T(F(M)) \to T_e(G)$,
as in Definition 69.5.4, maps $X_*(V)$ to an element of $T_e(G)$, the Lie algebra of $G = GL(2, \mathbb{R})$. (Such a
connection form is well defined because $F(M)$ is a principal bundle by Theorem 55.7.11.) The Lie algebra
element $\omega(X_*(V))$ is a kind of “correction signal” to indicate how unparallel the transport of the frame
field is in the direction $V$. Thus if $\omega(X_*(V)) = 0$, this would indicate that if a $C^1$ curve $\gamma$ in the direction
$V = \gamma'(0)$ at $p = \gamma(0)$ adopts the vector-frame $X(\gamma(t))$ for each $t \in \mathrm{Dom}(\gamma)$, then the differential of $X(\gamma(t))$
at $t = 0$ would be a horizontal vector in the linear space $T_{X(p)}(F(M))$.


If $\omega(X_*(V)) \ne 0$, then the Lie algebra element $\omega(X_*(V))$ indicates how the frame $X(p)$ deviates from
parallel motion. As mentioned in Remark 69.5.2, the connection form is really a “covariant derivative form”,
which indicates deviation from parallelism, whereas a horizontal lift function does indicate the correct rate
of change of vectors to achieve parallel motion.


In summary, an affine connection on a frame field yields a Lie-algebra-valued 1-form on a manifold which 
indicates the deviation from parallel motion of the given frame field in any given direction. 


71.14.4 REMARK: Extension of connections on frame fields to general principal bundle cross-sections. 

If the principal frame bundle F(M) in Definition 71.14.2 is replaced with a general principal bundle, the 
frame fields must be replaced by cross-sections of the principal bundle. The result of this substitution is 
shown in Definition 69.11.3.


Chapter 72


GEODESICS AND JACOBI FIELDS


72.1 Geodesic curves
72.2 Geodesics and torsion
72.3 Exponential maps
72.4 Jacobi fields
72.5 Equations of geodesic deviation


72.0.1 REMARK: Geodesics cannot be defined for general differentiable fibre bundles. 

A geodesic is defined quintessentially as a self-parallel curve. In other words, the velocity of a geodesic is
parallel-transported along the curve. Equivalently, the covariant derivative of the velocity along a geodesic
equals zero. A connection which parallel-transports the velocity of a curve in a differentiable manifold must
be defined on the tangent bundle of that manifold, not a general fibre bundle. It is difficult to see how one 
could generalise the concept of a geodesic from tangent bundles to more general differentiable fibre bundles. 


Geodesics can be meaningfully defined in the following contexts, arranged in increasing order of generality. 


(1) Various special geometries, such as Cartesian space, Cartesian toruses (such as sets $[0, 1)^k$ with the torus
topology, for $k \in \mathbb{Z}^+$), Minkowski spaces $\mathbb{R} \times \mathbb{R}^k$ for $k \in \mathbb{Z}^+$, and spheres $S^k$ for $k \in \mathbb{Z}^+$. In these cases,
one may easily specify the geodesics explicitly in terms of simple geometry, without needing calculus.

(2) Riemannian manifolds. (Here geodesics may be equivalently defined to be either locally distance- 
minimising or self-parallel according to the Levi-Civita connection.) 

(3) Torsion-free affinely connected manifolds. (This includes pseudo-Riemannian manifolds because all 
Levi-Civita connections are torsion free by definition.) 


(4) Affinely connected manifolds with possibly non-zero torsion. 


To avoid incorrectly applying specialised rules for narrow contexts to wider contexts, it is preferable to first
present the facts for the most general context, and then progressively accumulate rules which become possible 
in narrower contexts. Ideally each of the above contexts should have its own chapter, presented in order from 
the most general to the most specific. Both the unconstrained-torsion and zero-torsion contexts are presented 
in Chapter 72. (For each topic, the unconstrained-torsion facts are presented before the zero-torsion facts.) 
The Riemannian manifold context is presented in Chapters 73 and 74. The spherical geometry context is 
presented in Chapter 76. 


72.0.2 REMARK:  Geodesics are closely related to the core concepts of differential geometry. 

Geodesics are strongly related to many of the most important concepts in differential geometry. 

(1) Jacobi fields are defined in terms of geodesics as differentials of families of geodesics. 

(2) The equation of geodesic deviation gives a strong relation between Riemann curvature and Jacobi fields. 


(3) Torsion is very closely related to geodesics and Jacobi fields. Roughly speaking, torsion is the difference
between “Jacobi field transport” and parallel transport along geodesics.


(4) The convexity of sets and functions is defined in terms of geodesics. (To be convex, a set must have at 
least one geodesic joining each pair of points, and some kind of uniqueness is also typically required.) 


(5) Geodesics have a special role in Lie groups to generate invariant vector fields from Lie algebra elements.

(6) In general relativity, a timelike geodesic in pseudo-Riemannian space-time is the “world line of a
neutral test particle” or of a “freely falling, neutral test body”. (See Misner/Thorne/Wheeler [292],
pages 244, 246.)


Thus the geodesic is one of the principal concepts of differential geometry, strongly related to many of the
core concepts, including especially the central concept of curvature. 


72.1. Geodesic curves 


72.1.1 REMARK: Geodesic curves are “self-parallel”. 

Geodesic curves are specified in Definition 72.1.2 in terms of the horizontal lift function $\theta$ of an affine
connection on a tangent bundle. Equation (72.1.1) literally means that a geodesic curve is “self-parallel”. In
other words, the velocity of the curve is everywhere parallel-transported along the curve by the connection.
(See Definition 57.9.2 for the first-order differential $\gamma'$ of a curve $\gamma$. See Definition 59.8.2 for the second-order
differential $\gamma''$. See Definition 67.4.2 for horizontal lift functions for general differentiable fibrations. See
Definition 71.1.2 for horizontal lift functions, i.e. affine connections, for tangent bundles.)


72.1.2 DEFINITION: Geodesic curves in the horizontal lift formalism. 
An (affinely parametrised) geodesic (curve) in a $C^2$ differentiable manifold $M$ with an affine connection $\theta$ is
a $C^2$ curve $\gamma : I \to M$ for some open real-number interval $I$ such that


\[
\forall t \in I, \quad \gamma''(t) = \theta_{\gamma'(t)}(\gamma'(t)). \tag{72.1.1}
\]


72.1.3 REMARK: Constant geodesic curves. 
Definition 72.1.2 does not exclude the possibility of constant geodesic curves. 


72.1.4 REMARK: Basic assumptions for the definition of a geodesic curve. 

The open real-number interval $I$ in Definition 72.1.2 has the advantage that a $C^2$ curve $\gamma : I \to M$ is well
defined. If $I$ is a general real-number interval, the expression “$C^2$ curve $\gamma : I \to M$” is generally understood
to mean a continuous curve $\gamma : I \to M$ such that $\gamma : \mathrm{Int}(I) \to M$ is a $C^2$ curve in the literal sense.
Differentiable curves are defined in Section 51.9. Curves and families of curves are always assumed to be
continuous unless otherwise stated. (See Section 36.2 for curves.)


Both sides of line (72.1.1) are elements of $T_{\gamma'(t)}(T(M))$, which is a fibre set of the second-level tangent
bundle $T(T(M))$. (See Section 59.1 for second-level tangent bundles.)


72.1.5 REMARK:  Geodesics, length-minimisation, and continuity of the affine connection. 

In a Riemannian manifold, geodesics (for the Levi-Civita connection) correspond fairly closely to
minimal-length curves. But Riemannian manifolds only need $C^1$ differentiability for their definition, and
length-minimisation is meaningful even in metric spaces with no differentiable structure at all.


The concept of self-parallelism in a general affinely connected manifold, i.e. the idea that the velocity vector of
a geodesic curve is parallel-transported along the curve, requires $C^2$ differentiability of the manifold because
a connection (i.e. a horizontal lift function) maps pairs of base-space vectors and tangent bundle elements
to the second-level tangent bundle, which at least requires twice differentiability of $M$ for its definition. In
the case of affine connections on tangent bundles, the total space is $T(M)$ and the tangent bundle on the
total space is $T(T(M))$, which requires a $C^2$ manifold $M$ for its definition. But even though it is clear that
$M$ must be $C^2$, this does not immediately imply that geodesics must be $C^2$. If the horizontal lift function
is continuous, however, a geodesic curve is forced to be $C^2$ by the self-parallelism equation.


Definition 71.1.2 for a horizontal lift function (i.e. affine connection) does not require continuity. Discontinuity 
of the connection is not physically absurd. In an optical medium with a discontinuous refractive index such 
as at the interface between a lens and the air, the “geodesics” are not even differentiable at the interface, 
and the “connection” in this case is not strictly linear at the interface. (In this case, rays are mapped to 
rays, but not according to a linear transformation. This kind of ray-to-ray transformation is related to the 
unidirectional tangent concepts in Sections 53.2 and 54.16.) However, if the connection is both linear and 
discontinuous, it is difficult to obtain existence of geodesics. Therefore it is preferable to require continuity 
of affine connections, because it helps to avoid the geodesic curve existence issue. One may conclude, then, 
that the $C^2$ requirement for geodesic curves in Definition 72.1.2 is not unduly restrictive.
72.1.6 THEOREM: Geodesic curves in the tensor calculus formalism.

Let $M$ be a $C^2$ manifold with affine connection $\theta$. Let $\Gamma^\psi$ denote the Christoffel array for $\theta$ for any chart
$\psi \in \mathrm{atlas}(M)$. Let $\gamma : I \to M$ be a $C^2$ curve in $M$ for some open real-number interval $I$. Then $\gamma$ is an
affinely parametrised geodesic curve in $M$ if and only if


\[
\forall t \in I, \forall \psi \in \mathrm{atlas}_{\gamma(t)}(M), \quad
(\psi \circ \gamma)''(t)^i = -(\Gamma^\psi)^i_{jk}(\gamma(t)) \, (\psi \circ \gamma)'(t)^j \, (\psi \circ \gamma)'(t)^k. \tag{72.1.2}
\]

In other words, in one of the customary abbreviated notations of tensor calculus,
\[
\forall t \in I, \quad \ddot{\gamma}(t)^i = -\Gamma^i_{jk}(\gamma(t)) \, \dot{\gamma}(t)^j \, \dot{\gamma}(t)^k,
\]
or even more briefly,
\[
\ddot{\gamma}^i = -\Gamma^i_{jk} \dot{\gamma}^j \dot{\gamma}^k.
\]


PROOF: The assertion follows from Definition 71.2.2.


72.1.7 REMARK: Applying the “drop function” to a vertical difference of vectors. 

Both sides of equation (72.1.1) in Definition 72.1.2 are elements of the linear space $T_{\gamma'(t)}(T(M))$. To express
line (72.1.1) in terms of the covariant derivative, the first step is to observe that the left and right hand
sides of the equation have the same horizontal component. The horizontal component of $\gamma''(t) \in T_{\gamma'(t)}(T(M))$
is $\gamma'(t) \in T_{\gamma(t)}(M)$. (See Definition 59.1.19 for the second-level tangent space $T_z(T(M))$ for $z \in T(M)$. See
Definition 59.8.2 for the second-order tangent vector field $t \mapsto \gamma''(t)$ of a curve $\gamma$.) The horizontal component
of $\theta_{\gamma'(t)}(\gamma'(t))$ is $\gamma'(t)$ by Definition 71.1.2 (iv). Therefore the drop function $\varpi$ in Definition 59.2.9 may
be applied to the difference of these two vectors. This yields the covariant derivative of the vector field
$X : t \mapsto \gamma'(t)$ along $\gamma$ as in Definition 71.6.9.


72.1.8 REMARK: Notation for the covariant derivative of the velocity of a curve. 

The expression $D\gamma'$ in line (72.1.3) in Theorem 72.1.9 means the covariant derivative of the velocity vector
field $\gamma'$ along $\gamma$. (See Definition 71.7.2 for this terse notation.) The expression $(D\gamma')(t)$ may also be notated
as $D\gamma'(t)$, using Notation 71.7.4, or as $D\partial_t\gamma(t)$, using Notation 57.9.7. One may feel tempted to write
something like “$D_{\gamma'(t)}\gamma'(t)$”, but one must resist this temptation! (See Remark 71.7.3 for more commentary
on this issue.)


72.1.9 THEOREM: Geodesic curves in the covariant derivative formalism.
A $C^2$ curve $\gamma : I \to M$ for some open real-number interval $I$ in a $C^2$ manifold $M$ is an affinely parametrised
geodesic curve for an affine connection $\theta$ on $M$ if and only if


\[
\forall t \in I, \quad (D\gamma')(t) = 0. \tag{72.1.3}
\]


PROOF: The assertion follows from Definitions 72.1.2 and 71.7.2.


72.1.10 EXAMPLE: Geodesic curves for a nonlinear connection. 

An affine connection function $\theta_V : T_p(M) \to T(T(M))$ as in Definition 71.1.2 effectively tells a geodesic “how
to twist” if the geodesic knows the direction $V$ in which it “wants to go”. In essence, it maps velocities to
accelerations. As alluded to in Remark 72.1.5, the “straight lines” followed by light rays in media which have
a discontinuous refractive index generally have a well-defined geodesic curve for each specified “direction to
go”, but the dependence may not be linear. It is also possible to construct examples where shortest paths,
which are geodesics in the metric space sense, are (1) not uniquely determined by the “direction to go”, and
(2) may not exist for some “directions to go”. A simple example of this is a metric space $X = (\mathbb{R}^2, d)$ where
$d(x, y) = ((x_1 - y_1)^2 + (x_2^3 - y_2^3)^2)^{1/2}$ for $x, y \in X$. This space is isometric to the Euclidean metric space
$\bar{X} = (\mathbb{R}^2, \bar{d})$ via the map $\phi : \bar{X} \to X$ with $\phi : (x_1, x_2) \mapsto (x_1, x_2^{1/3})$. Therefore the geodesics have the form
$t \mapsto (k_1(t - c), k_2 t^{1/3})$ for some $c, k_1, k_2 \in \mathbb{R}$. These curves have either zero or infinite velocity for $t = c$, and
velocities in directions other than horizontal and vertical are impossible. One may define a generalised kind
of “connection” for such a set of geodesics which tells the geodesics “where to go” if the initial generalised
“starting direction” is known, but the affine connection formalism does not easily apply to this kind of
situation. It is not unthinkable that such twisted geodesics could occur in physically meaningful scenarios,
but such situations lie slightly outside the bounds of differential geometry for differentiable manifolds.
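

The metric-space claims in Example 72.1.10 can be tested numerically. The sketch below assumes the metric $d(x, y) = ((x_1 - y_1)^2 + (x_2^3 - y_2^3)^2)^{1/2}$, the map $(x_1, x_2) \mapsto (x_1, x_2^{1/3})$ and curves of the form $t \mapsto (k_1(t - c), k_2 t^{1/3})$. It checks that the map preserves distances and that distances add up along the curve, which is the metric-space meaning of a geodesic. The parameter values are arbitrary illustrative choices.

```python
import math


def cbrt(value):
    """Real cube root, valid for negative arguments too."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


def distance(x, y):
    """d(x, y) = ((x1 - y1)^2 + (x2^3 - y2^3)^2)^(1/2) on R^2."""
    return math.hypot(x[0] - y[0], x[1] ** 3 - y[1] ** 3)


def phi(x):
    """Map from the Euclidean plane to the space with metric d."""
    return (x[0], cbrt(x[1]))


def curve(t, k1=2.0, k2=1.5, c=0.5):
    """Candidate geodesic t -> (k1 (t - c), k2 t^(1/3))."""
    return (k1 * (t - c), k2 * cbrt(t))


# Isometry check on two sample points of the Euclidean plane.
a, b = (1.0, -8.0), (4.0, 27.0)
euclidean = math.hypot(a[0] - b[0], a[1] - b[1])
mapped = distance(phi(a), phi(b))

# Additivity of distance along the curve for parameters s < t < u.
s, t, u = -1.0, 0.25, 3.0
d_st = distance(curve(s), curve(t))
d_tu = distance(curve(t), curve(u))
d_su = distance(curve(s), curve(u))
speed = math.hypot(2.0, 1.5 ** 3)  # distance covered per unit parameter
```

Although the curve is a constant-speed shortest path for the metric $d$, its second chart coordinate $k_2 t^{1/3}$ is not differentiable at $t = 0$, which is why the affine connection formalism does not apply directly.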


[www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 


2272 72. Geodesics and Jacobi fields 


72.1.11 THEOREM: Formulas for affinely parametrised geodesics in terms of the Christoffel array. 

Let $x : [a, b] \to M$ be a $C^2$ curve in a $C^2$ manifold $M$ with a $C^2$ affine connection. Let $\Gamma^i_{jk}$ be the Christoffel
array of the connection with respect to a coordinate map $\psi : \Omega \to \mathbb{R}^n$ for some open subset $\Omega$ of $M$ such
that $x([a, b]) \subseteq \Omega$. Then


(i) $x$ is an affinely parametrised geodesic curve in $M$ if and only if

\[
\forall t \in (a, b), \quad \frac{d^2 x^i}{dt^2}(t) + \Gamma^i_{jk}(x(t)) \frac{dx^j}{dt}(t) \frac{dx^k}{dt}(t) = 0.
\]


(ii) $x$ is a freely parametrised geodesic curve in $M$ if and only if

\[
\forall u \in (a, b), \exists k(u) \in \mathbb{R}, \quad
\frac{d^2 x^i}{du^2}(u) + \Gamma^i_{jk}(x(u)) \frac{dx^j}{du}(u) \frac{dx^k}{du}(u) = k(u) \frac{dx^i}{du}(u).
\]

Furthermore, $k$ is $C^1$ if $x$ is $C^3$, and the re-parametrisation $u = f(t)$ makes $t$ an affine parameter for
the curve if $f$ satisfies

\[
f''(u) + k(u) f'(u)^2 = 0.
\]

This equation has solution

f(u)- T k(z) dz) B


(iii) If $\dim(M) = 2$, and the curve can be locally expressed as a graph with respect to $x^1$, then the function
$x^2 = h(x^1)$ satisfies

\[
h'' - \Gamma^1_{22} (h')^3 + (\Gamma^2_{22} - \Gamma^1_{12} - \Gamma^1_{21}) (h')^2 + (\Gamma^2_{12} + \Gamma^2_{21} - \Gamma^1_{11}) h' + \Gamma^2_{11} = 0.
\]


PROOF: Part (i) follows from Theorem 72.1.6 by substituting $x$ for $\psi \circ \gamma$.


Part (ii) follows from part (i) on substitution of $u = f(t)$, for any $C^2$ function $f$ for which $f'(t) \ne 0$ for
all $t \in (a, b)$. This gives

\[
\frac{d^2 x^i(t)}{dt^2} + \Gamma^i_{jk}(x(t)) \frac{dx^j(t)}{dt} \frac{dx^k(t)}{dt} = -\frac{f''(t)}{f'(t)^2} \frac{dx^i(t)}{dt}.
\]

This may be interpreted as the parallelism of $D\dot{x}$ and $\dot{x}$. In other words, the covariant derivative of the
tangent vector along the curve is parallel to the tangent vector.


Part (iii) follows from a comparison of the two component equations ($i = 1$ and $i = 2$) specified in part (ii). Since
they both must have the same value for $k(u)$ for each $u$, $k(u)$ may be eliminated between the two equations to solve
for $x^2$ in terms of $x^1$. Indeed, if $x^1 = u$ and $x^2 = h(u)$ are substituted into the two equations in part (ii), they simplify to

\[
\Gamma^1_{22} (h')^2 + (\Gamma^1_{12} + \Gamma^1_{21}) h' + \Gamma^1_{11} = k(u)
\]
\[
h'' + \Gamma^2_{22} (h')^2 + (\Gamma^2_{12} + \Gamma^2_{21}) h' + \Gamma^2_{11} = k(u) h'.
\]

Now substituting $k(u)$ from the first equation into the second gives

\[
h'' - \Gamma^1_{22} (h')^3 + (\Gamma^2_{22} - \Gamma^1_{12} - \Gamma^1_{21}) (h')^2 + (\Gamma^2_{12} + \Gamma^2_{21} - \Gamma^1_{11}) h' + \Gamma^2_{11} = 0,
\]

as claimed.
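

Part (iii) of Theorem 72.1.11 can be sanity-checked on a familiar case. The sketch below is an illustration under assumptions that are not in the text: it uses the Euclidean plane in polar coordinates $(x^1, x^2) = (r, \varphi)$ with the usual Levi-Civita Christoffel array ($\Gamma^1_{22} = -r$, $\Gamma^2_{12} = \Gamma^2_{21} = 1/r$), where geodesics are straight lines $\varphi = \varphi_0 + \arccos(p/r)$. Derivatives of $h$ are approximated by central differences. All function names are hypothetical.

```python
import math


def graph_geodesic_residual(h, christoffel, u, step=1e-4):
    """Left-hand side of the graph equation in part (iii), evaluated at x^1 = u.

    christoffel(x1, x2) returns G with G[i][j][k] standing for Gamma^(i+1)_(j+1)(k+1).
    """
    h0 = h(u)
    h1 = (h(u + step) - h(u - step)) / (2 * step)      # approximates h'
    h2 = (h(u + step) - 2 * h0 + h(u - step)) / step ** 2  # approximates h''
    G = christoffel(u, h0)
    return (h2
            - G[0][1][1] * h1 ** 3
            + (G[1][1][1] - G[0][0][1] - G[0][1][0]) * h1 ** 2
            + (G[1][0][1] + G[1][1][0] - G[0][0][0]) * h1
            + G[1][0][0])


def polar_christoffel(r, phi):
    """Christoffel array of the flat plane in polar coordinates (assumed example)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r        # Gamma^r_{phi phi}
    G[1][0][1] = 1.0 / r   # Gamma^phi_{r phi}
    G[1][1][0] = 1.0 / r   # Gamma^phi_{phi r}
    return G


def straight_line(r, p=1.0, phi0=0.3):
    """A straight line at distance p from the origin, written as phi = h(r)."""
    return phi0 + math.acos(p / r)


def not_a_geodesic(r):
    """The spiral phi = r, used as a negative control."""
    return r


line_residuals = [abs(graph_geodesic_residual(straight_line, polar_christoffel, r))
                  for r in (2.0, 2.5, 3.0)]
spiral_residual = graph_geodesic_residual(not_a_geodesic, polar_christoffel, 2.0)
```

In this chart the equation reduces to $h'' + r (h')^3 + (2/r) h' = 0$, so the spiral gives a residual close to $r + 2/r$ while the straight line gives a residual near zero.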
72.2. Geodesics and torsion


72.2.1 REMARK: The relation between geodesics, torsion and the affine connection.

Note that the torsion of an affine connection has no influence on the geodesics of the manifold. So if the
connection is regarded as a way of generating geodesics, then the torsion component of the connection is
superfluous. Thus knowledge of just the geodesics suffices to regenerate the connection apart from its
torsion component. If the torsion is required to be zero, then the connection is uniquely determined
by the set of geodesics. However, knowledge of the covariant derivative determines the whole connection,
including the torsion component.
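

A small computation makes the first claim concrete. The sketch below assumes the geodesic equation in the chart form $\ddot{x}^i + \Gamma^i_{jk} \dot{x}^j \dot{x}^k = 0$ and splits a Christoffel array into its symmetric and antisymmetric parts in the lower indices. Since $\dot{x}^j \dot{x}^k$ is symmetric, only the symmetric part contributes to the acceleration. The sample array `G` and velocity `v` are hypothetical.

```python
def geodesic_acceleration(G, v):
    """Chart acceleration -Gamma^i_jk v^j v^k demanded by the geodesic equation."""
    n = len(v)
    return [-sum(G[i][j][k] * v[j] * v[k] for j in range(n) for k in range(n))
            for i in range(n)]


def symmetric_part(G):
    """S^i_jk = (G^i_jk + G^i_kj) / 2."""
    n = len(G)
    return [[[(G[i][j][k] + G[i][k][j]) / 2 for k in range(n)] for j in range(n)]
            for i in range(n)]


def antisymmetric_part(G):
    """A^i_jk = (G^i_jk - G^i_kj) / 2, which carries the torsion."""
    n = len(G)
    return [[[(G[i][j][k] - G[i][k][j]) / 2 for k in range(n)] for j in range(n)]
            for i in range(n)]


# Hypothetical non-symmetric array at one point, indexed as G[i][j][k].
G = [[[0.5, 1.0], [-2.0, 0.25]], [[1.5, 3.0], [0.0, -1.0]]]
v = [0.75, -1.25]

acc_full = geodesic_acceleration(G, v)
acc_symmetric = geodesic_acceleration(symmetric_part(G), v)
acc_torsion_only = geodesic_acceleration(antisymmetric_part(G), v)
```

Two connections with the same symmetric part therefore share all geodesics, however different their torsion.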


72.2.2 REMARK: Schild’s ladder construction for torsion-free affine connections. 

If the geodesics (together with affine parameters on all geodesics) are known, then the Schild’s ladder 
approach regenerates the definition of parallelism. The minimal unit of such a ladder is a “cross-brace”. A 
cross-brace is a pair of geodesics which meet in their mid-points. Then two pairs of opposite ends of the 
resulting “X” are joined by geodesics, giving a bow tie shape: ⋈. In the limit of a small cross-brace, the
opposing geodesics are supposed to be parallel. If the torsion is zero, then this is so. But otherwise, the 
torsion causes such pairs of geodesics to be non-parallel to first order with respect to the distance between 
them. The first-order rate of deviation of the two geodesics from parallel transport is in fact equal to the 
torsion in the plane of the cross-brace. 


If the torsion is non-zero, then a Schild construction will generate a twisting (or expanding etc.) ladder 
which may resemble a DNA double helix or may simply expand or compress without rotating. It would be 
interesting to know how the behaviour of the double helix changes with respect to orientation. A better way 
of thinking of the Schild’s ladder is as an extendable handle ><><><, similar to the type used in old-fashioned
lift doors (as in the Hôtel des Vosges in Strasbourg).


A useful way to demonstrate torsion is to change coordinates around a point in a 2-manifold so that the 
Christoffel array is antisymmetric. Then a pair of basis vectors at the origin gets distorted in a clear way as 
it is moved around under parallel transport. A bow tie can be formed in geodesic coordinates from the four 
points $(\pm 1, \pm 1)$ or from the four points $(1, 0)$, $(1, 1)$, $(-1, 0)$ and $(-1, -1)$. If the torsion is non-zero, then a
bow tie with parallel ends can be formed by varying the cross-over ratio from 0.5 to something else which 
varies linearly with distance apart of the end lines. But in greater than 2 dimensions, this variation of the 
cross-over ratio does not suffice. It is useful to demonstrate explicitly how the geodesics are unaffected by 
the torsion in the case that all Christoffel array entries are zero except for r = -T4 =a. Then a geodesic 
in the direction (1, 1) is still a straight line in geodesic coordinates. All geodesics through the origin are still 
radial. 


72.2.3 REMARK: The difficulty of generalising torsion to general connections. 

The meaning of torsion can perhaps be slightly better appreciated if one attempts to define it for a general 
connection. In the general case, it is difficult to see which vector parameters to alternate to obtain a
torsion-like commutator. In the generalisation of the connection coefficients $\Gamma^i_{jk}$ to a general differentiable fibre
bundle, the indices $i$ and $j$ typically have a different range to the index $k$. Obviously it is not easy to
transpose a non-square matrix. Even if the matrix is square because the fibre space and base space have the 
same dimension, one cannot meaningfully swap a base space vector with a fibre space vector unless they are 
in fact the same space. 


Torsion for an affine connection is in essence the difference between the actual parallel transport and the 
parallel transport which you would obtain if it was defined entirely by geodesic curves in the manner of 
Schild's ladder. (See Remark 72.2.2 regarding Schild's ladder.) But geodesic curves are defined as self- 
parallel curves. In other words, a geodesic is a curve for which the velocity is parallel-transported by the 
curve itself. Such a concept of a geodesic has no obvious generalisation to general connections. The set of 
possible velocity vectors for geodesics is precisely the tangent bundle of the manifold in which the geodesics 
dwell, and affine connections are the connections which are defined on the tangent bundle. Thus torsion is 
quintessentially tied to the concepts of geodesic curves, affine connections and tangent bundles. 


Another way to think of the torsion is that it is the deviation between parallel transport and the exponential 
map. But the exponential map is defined in terms of families of self-parallel curves, in other words geodesics. 
But the concept of a geodesic belongs to affine connections, not general connections.


It is perhaps tempting to think that because curvature is well defined for general connections, and the
formulas for torsion have some similarities to formulas for curvature, then surely torsion must be generalisable 
to connections on general vector bundles also. However, torsion is not as similar to curvature as it may at 
first seem. Torsion is observable in parallel transport along any curve, not just closed curves. Torsion is a 
measure of deviation of parallel transport from the parallel transport which you would obtain for the tangent 
bundle if it was reconstructed from the collection of all geodesics using a Schild’s ladder style of procedure. 
(This is related to Jacobi fields.) By contrast, curvature is only defined on closed curves, whether those 
be finitesimal or infinitesimal. Curvature becomes truly evident only when vectors are transported around 
closed loops to compare them with the initial vector values. If one “follows” the parallel transport of a vector 
along an open curve, it is not possible to determine whether curvature is present because no comparison is 
possible. 


Ironically perhaps, even though the evidence for curvature requires a special kind of loop, namely a closed
loop, to test for its presence, and torsion can be evident on any differentiable curve, the curvature is gener- 
alisable to general connections while the torsion is not. The reason for this is that the torsion only becomes 
evident along arbitrary curves by essentially parallel-transporting the velocity vectors along the curve itself 
out sideways to generate a field of local curves by which a comparison can be made between forward trans- 
port of transverse vectors and transverse transport of forward vectors. For the definition of curvature, no 
such transverse transport is required. One merely transports fibres along the curve, ignoring the immediate
environment outside the image of the curve. This may be summarised as follows. 


(1) Torsion compares longitudinal parallel transport along curves to transverse parallel transport. 


(2) Curvature compares final parallel transport of a vector around a closed curve with the initial vector. 


72.3. Exponential maps 


72.3.1 REMARK: Justification of the name “exponential map”. 

There are two main ways to think of exponentials, namely algebraically in terms of the sum-to-product rule,
which means that the exponential of a sum of values equals the product of the exponentials of the individual
values, or analytically in terms of power series. The power series metaphor for exponential maps is not very
credible for general affinely connected manifolds, but at least there is some sort of additivity property as in
Definition 72.3.3 part (ii).


72.3.2 REMARK: Existence and uniqueness of exponential maps. 

Definition 72.3.3 assumes neither existence nor uniqueness of exponential maps. These desirable properties 
are very much dependent on the structure of the affinely connected manifold in question. Since the equation
of a geodesic is a second-order ordinary differential equation when viewed through charts, the existence and
uniqueness questions for the exponential map can only be answered within the context of the basic analysis 
for ODEs. 


((2016-11-5. Give existence, uniqueness and other properties for the exponential map in Definition 72.3.3. 
Also compare it to the style of exponential map for Lie groups and Lie algebras. )) 


72.3.3 DEFINITION: An exponential map at a point p in a $C^2$ affinely connected manifold M is a map
$\phi : T_p(M) \to M$ which satisfies the following conditions.


(i) $\phi(0) = p$.


(ii) For all $V \in T_p(M)$, the map $\phi_V : \mathbb{R} \to M$ defined by $\phi_V : t \mapsto \phi(tV)$ is a geodesic curve with respect
to the connection on M.


(iii) $\forall V \in T_p(M),\ \phi_V'(0) = V$.
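

As a concrete illustration of Definition 72.3.3, the following minimal sketch approximates an exponential map numerically on the unit sphere. Everything specific here is an assumption made for illustration only: the chart (theta, phi), the Christoffel symbols of the Levi-Civita connection of the round sphere in that chart, the fourth-order Runge-Kutta integrator, and the hypothetical function names. It is not a construction taken from this book.

```python
import math

def christoffel_sphere(theta):
    # Non-zero Christoffel symbols of the unit sphere in (theta, phi) coordinates:
    # Gamma^theta_{phi phi} = -sin(theta) cos(theta), Gamma^phi_{theta phi} = cot(theta).
    return -math.sin(theta) * math.cos(theta), math.cos(theta) / math.sin(theta)

def geodesic_rhs(state):
    # Geodesic equation x'' = -Gamma(x)(x', x'), written as a first-order system in (x, x').
    theta, phi, dtheta, dphi = state
    g_tpp, g_ptp = christoffel_sphere(theta)
    return (dtheta, dphi, -g_tpp * dphi * dphi, -2.0 * g_ptp * dtheta * dphi)

def exp_map(p, v, steps=1000):
    # Follow the geodesic with initial point p and initial velocity v over the
    # parameter interval [0, 1], using classical RK4 steps (assumed integrator).
    state = (p[0], p[1], v[0], v[1])
    h = 1.0 / steps
    for _ in range(steps):
        k1 = geodesic_rhs(state)
        k2 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = geodesic_rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h * (a + 2 * b + 2 * c + d) / 6.0
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state[0], state[1]

# Usage: start on the equator and move along it, or along a meridian.
p = (math.pi / 2, 0.0)
print(exp_map(p, (0.0, 1.0)))
print(exp_map(p, (0.5, 0.0)))
```

The zero vector is mapped to p, which is condition (i). Conditions (ii) and (iii) hold approximately by construction, because the map is obtained by following the geodesic whose initial velocity is V.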


72.3.4 REMARK: Exponential maps and geodesic coordinates.
Exponential maps are closely related to geodesic coordinates.


72.4. Jacobi fields


72.4.1 REMARK: Bounds on Jacobi fields and derivatives for an affine connection. 

The purpose of Sections 72.4 and 72.5 is to obtain estimates for the Jacobi field and its first and second 
derivatives in terms of bounds on the curvature of an affine connection. In the absence of a metric, only very 
limited kinds of bounds can be expected, in particular relating to the sign of eigenvalues of the Hessian and 
the sign of sectional curvatures. 


In a Riemannian metric space, the magnitudes of the derivatives can be bounded, and also the amount of 
rotation and skewing of Jacobi fields relative to parallel transport along a path. 


72.4.2 REMARK: Families of geodesic curves. 
Definition 72.4.3 is a pragmatic ad-hoc definition which is not intended to be the most general kind of family 
of geodesics. It is useful for some basic calculations regarding geodesics. 


72.4.3 DEFINITION: An m-parameter family of geodesic curves in a $C^2$ n-dimensional manifold M, with
$m, n \in \mathbb{Z}^+$, is a $C^2$ map $\gamma : I \times G \to M$, where $I \subseteq \mathbb{R}$ is an open interval of $\mathbb{R}$ and $G \subseteq \mathbb{R}^m$ is an open subset
of $\mathbb{R}^m$ such that $0_{\mathbb{R}^m} \in G$, and the map $t \mapsto \gamma(t, u_1, \ldots, u_m)$ is a geodesic curve for all $u = (u_j)_{j=1}^m \in G$.


72.4.4 REMARK: Applicability of properties of families of geodesics to higher-order families. 

It is clear that any $m_1$-parameter family of geodesics may be reduced to an $m_2$-parameter curve-family
with $m_2 < m_1$ by fixing values $u_k$ of u for k in some subset of $m_1 - m_2$ indices in the set $\mathbb{N}_{m_1}$, thereby
obtaining a map $\tilde\gamma : I \times \tilde{G} \to M$ defined by $\tilde\gamma : (t, \tilde{u}) \mapsto \gamma(t, u)$, where u has the chosen fixed values for
$m_1 - m_2$ indices, and the values in the $m_2$-tuple $\tilde{u}$ for the remaining indices. Hence any statement about
an $m_2$-parameter family of geodesic curves is immediately applicable to $m_1$-parameter families of curves for
any $m_1 > m_2$.


72.4.5 DEFINITION: A Jacobi field along a geodesic curve y in a C® manifold M is a vector field Y along 
the curve $\gamma$ such that
$Y'' = R(\gamma', Y)(\gamma')$.


72.5. Equations of geodesic deviation 


((2016-7-13. In Section 72.5, present the second-order equations of geodesic deviation. )) 


72.5.1 REMARK: First and second order equations of geodesic deviation. 
Section 72.5 presents first and second order equations of geodesic deviation for affinely connected manifolds. 


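For any $C^1$ curve $\gamma : \mathbb{R} \to M$ in a $C^1$ manifold M, the velocity of the curve at a point $\gamma(t) \in M$ is strictly
defined as the vector $t_{\gamma(t),\,\partial_t(\psi(\gamma(t))),\,\psi} \in T_{\gamma(t)}(M)$ for any chart $\psi \in \mathrm{atlas}_{\gamma(t)}(M)$. In tensor calculus, the
coordinates $\partial_t(\psi^i(\gamma(t)))$ for $i \in \mathbb{N}_n$ with n = dim(M), are brought into the foreground. For simplicity, $\psi^i \circ \gamma$
is written as $\gamma^i$ for $i \in \mathbb{N}_n$, and $\partial_t(\psi^i \circ \gamma)$ is written as $\gamma^i_t$. So strictly speaking, the components $\gamma^i_t$
represent the vector $t_{\gamma(t),\,\partial_t(\psi(\gamma(t))),\,\psi}$.


For a $C^2$ curve $\gamma$ in a $C^2$ manifold M, the derivatives $\gamma^i_{tt}$ are well defined, but they are not components of
a vector in $T_{\gamma(t)}(M)$.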


72.5.2 THEOREM: Formula for second derivative of transverse field of a family of geodesics. 
Let y : IR? — M be a one-parameter family of geodesics in a C? manifold with a torsion-free C? connection 
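with Christoffel array $\Gamma$. Then the transverse field $\gamma_t$ satisfies
$D_s^2 \gamma_t = R(\gamma_s, \gamma_t)\gamma_s$, (72.5.1)

where $\gamma_s$ denotes $\partial_s\gamma(s,t) \in T_{\gamma(s,t)}(M)$ and $\gamma_t$ denotes $\partial_t\gamma(s,t) \in T_{\gamma(s,t)}(M)$.
PROOF: A family of geodesics satisfies $\gamma^i_{ss} + \Gamma^i_{jk}\gamma^j_s\gamma^k_s = 0$. The derivative with respect to t is: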

Yast + Doi Ve + Wy TerVe = 0. (72.5.2) 
For a general one-parameter C? family of curves in a C? manifold, 

(Di) = Yat + ete te Ye DOR TI) tee eI Te 
Substitution of $\gamma^i_{sst}$ from (72.5.2) into this gives:


(DÈ 44)* = Che Vo xi ves + Caer’ + Teh ee ye 


j 


Substitution of 42, = —L77 4147" into this and swapping j with m gives: 


i i i i i j ke 

(D3 m) = (Tjk — Tjen Flee jk — Imk 5778 Ts 
j i i i \ 5 ke 

= (Lik j — Tijk - P'ogLg — UP SC 


This is the same as the expression 


(Rye, HY = Riri rers 
= (Tik j — Tije + Peng — LAH DS ole Mas 


obtained from Theorem 71.11.7. 
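

The coordinate calculation can be cross-checked by a short coordinate-free sketch. The sketch assumes the curvature convention $R(X,Y)Z = D_X D_Y Z - D_Y D_X Z - D_{[X,Y]} Z$, which may differ in sign or argument order from the convention of Theorem 71.11.7. The symbols $D_s$ and $D_t$ for covariant differentiation along the two parameter directions, and V for an arbitrary vector field along the family, are likewise assumptions of this sketch.

```latex
\begin{align*}
D_s \gamma_t &= D_t \gamma_s
  && \text{(torsion-free connection)} \\
D_s D_t V - D_t D_s V &= R(\gamma_s, \gamma_t) V
  && \text{(assumed convention; $[\partial_s, \partial_t] = 0$)} \\
D_s^2 \gamma_t = D_s D_t \gamma_s
  &= D_t D_s \gamma_s + R(\gamma_s, \gamma_t)\gamma_s \\
  &= R(\gamma_s, \gamma_t)\gamma_s
  && \text{(geodesic equation $D_s \gamma_s = 0$)}
\end{align*}
```

Under these assumptions, the transverse field of a family of geodesics satisfies the same equation as a Jacobi field along each geodesic of the family.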


((2016-8-14. Overhaul/rewrite Chapter 73 because it is incomplete and inconsistent, and sometimes incorrect. ))

73.1. Construction of Riemannian manifolds from other structures 


73.1.1 REMARK:  Coordinatised metric spaces with positive definite Hessian of the distance squared. 

A Riemannian manifold is a differentiable manifold which has a point-to-point distance function such that 
the Hessian of the square of the distance is well defined and positive definite. The Riemannian metric is 
then defined to be half the Hessian of the square of the distance function. Thus a Riemannian manifold is 
a special case of the familiar metric space studied in topology (as in Chapter 37), with the constraint that 
the distance function must satisfy a differentiability condition with respect to a differentiable structure on 
the set. Thus one may say that a Riemannian manifold is a metric space which has been given a compatible 
differentiable manifold atlas. This is illustrated in Figure 73.1.1. 


[Diagram omitted. Nodes: topological space; topological manifold; metric space; differentiable manifold;
topological manifold with distance function; differentiable manifold with distance function; Riemannian manifold.]


Figure 73.1.1 Inheritance tree for Riemannian manifolds


Also shown in Figure 73.1.1 are topological and differentiable manifolds with metric-space distance functions. 
A Riemannian manifold is a differentiable manifold with a metric-space distance function which is required 
to have a very particular kind of relation to the differentiable structure. Riemannian manifolds automatically 
inherit all definitions, notations and theorems from topological spaces (Chapters 31-36), metric spaces 
(Chapter 37), topological manifolds (Chapters 49-50) and differentiable manifolds (Chapters 51-61). 
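

The statement that the Riemannian metric is half the Hessian of the square of the distance function can be checked numerically in a simple case. The following sketch assumes the upper half-plane model of the hyperbolic plane, whose closed-form distance function and metric $(dx^2 + dy^2)/y^2$ are standard facts rather than material from this chapter. The function names and the finite-difference step are hypothetical choices.

```python
import math

def hyperbolic_distance(p, q):
    # Closed-form distance in the upper half-plane model {(x, y) : y > 0}.
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.acosh(1.0 + (dx * dx + dy * dy) / (2.0 * p[1] * q[1]))

def half_hessian_of_squared_distance(p, h=1e-3):
    # Central finite differences of the function q -> d(p, q)^2 at q = p.
    def f(ex, ey):
        return hyperbolic_distance(p, (p[0] + ex, p[1] + ey)) ** 2
    fxx = (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h**2
    fyy = (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h**2
    fxy = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h**2)
    return [[0.5 * fxx, 0.5 * fxy], [0.5 * fxy, 0.5 * fyy]]

# Usage: at the point (0.3, 2.0) the metric components are expected to be
# close to 1/y^2 = 0.25 on the diagonal and 0 off the diagonal.
print(half_hessian_of_squared_distance((0.3, 2.0)))
```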


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework". www.geometry.org/dg.html 
Copyright (C) 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3] 
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft. 


[ www .geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




73.1.2 REMARK: A Riemannian metric tensor field may be recovered from its Levi-Civita connection. 
Since the Levi-Civita connection which is calculated from a Riemannian metric tensor field is an orthogonal 
connection, the entire metric field in a connected Riemannian manifold may be recovered if the metric tensor 
is known at a single point. If the holonomy group at that point includes the entire special orthogonal group, 
the metric tensor at the point may be inferred, up to a scalar multiple, from the holonomy group. This scalar 
multiple has no real influence on the geometry of the manifold. Therefore the following three representations 
of a Riemannian metric on a differentiable manifold effectively contain the same information. 


(1) A global distance function. This may be obtained from the Riemannian metric field by extremising 
the lengths of curves between points. 


(2) A metric tensor field. This may be obtained from the distance function as half the Hessian of the 
square of the distance function. It can also be obtained from an orthogonal connection by parallel 
transport from the holonomy group at a single point. 


(3) An orthogonal connection. This may be obtained as the Levi-Civita connection from the metric 
tensor field. 


Roughly speaking, a Riemannian manifold is a differentiable manifold which is equipped with an orthogonal 
connection. There exists a Riemannian metric tensor field on a manifold whose Levi-Civita connection is a 
given connection if and only if the structure group of the given connection is the orthogonal group modulo 
some affine transformation. (This is shown by Auslander/MacKenzie [1], page 179.) Thus a Riemannian 
manifold is a special kind of affinely connected manifold. The construction of the Levi-Civita connection 
is then, in a sense, a reconstruction of the underlying orthogonal connection from which the Riemannian 
metric tensor field construction is inferred. 


Since Riemannian manifolds are derived from affinely connected manifolds by applying an orthogonality 
constraint, Riemannian manifolds automatically inherit all definitions, notations and theorems from affinely 
connected manifolds (Chapters 71-72), which in turn inherit definitions, notations and theorems from general 
connections (Chapters 67, 68, 69 and 70). This inheritance is shown in Figure 73.1.2, combined with the 
inheritance paths in Figure 73.1.1. 


Figure 73.1.2 Inheritance tree for Riemannian manifolds


A reverse arrow has been added to Figure 73.1.2 to indicate that the distance function of the metric space 
can be constructed from the metric tensor field on a Riemannian manifold. Because of this circularity, 
the starting point for "inheritance" may be chosen in different ways. In fact, it is possible to start the 
"inheritance tree" with an orthogonal affine connection (an affine connection whose holonomy group is a 
subgroup of the orthogonal group for some inner product), from which a Riemannian metric tensor field may 
be constructed. This circularity makes the order of presentation of the concepts of Riemannian geometry 
somewhat arbitrary and subjective. However, the weight of tradition dictates that the metric tensor field 
should be the starting point. 


73.1.3 REMARK:  Quintessences of the differential layer, connection layer and metric layer. 
In terms of the differential geometry structure layers which are presented in Section 1.1, the Riemannian 
metric tensor is the quintessential concept in layer 4, the affine connection is the quintessential concept in 
layer 3, and the tangent vector is the quintessential concept in layer 2. A Riemannian metric tensor is a
kind of differential of the distance function along a curve, an affine connection is a kind of differential of 
parallel transport along a curve, and a tangent vector is the differential of displacement along a curve. So 
the quintessential concept in each layer is the differential of something which varies along a curve. 


layer   pathwise concept      differential concept
2       displacement          tangent vector
3       parallel transport    affine connection
4       distance              metric tensor


In each case, the pathwise concept may be reconstituted from the differential concept by integrating it. 
Fundamental physics is written mostly in terms of differential equations. So it is the differential structures 
in layers 2, 3 and 4 which are of greatest relevance to the formulation of physics theories. However, it is the 
corresponding pathwise concepts which have the most immediate intuitive meaning. 


73.1.4 REMARK: Levi-Civita connections have some properties which general affine connections do not. 
There are many properties and formulas which are valid in Riemannian manifolds, but not valid in manifolds 
which do not possess a Riemannian metric. This is not surprising. One does expect that different structures 
will have different properties. But usually it will be evident that extra structure is required in order to 
assert a property because the statement of the property refers explicitly to the structure which is required. 
For example, if the statement of a property includes a metric tensor expression which looks like $g_{ij}$, it is
clear that a metric is required. However, there are some formulas which do require a metric although they 
apparently make no reference to any metric tensor. 


A Riemannian metric places constraints on the lower structures which may not be evident because the metric 
tensor field is not mentioned explicitly in many formulas and facts about those lower structures. The most 
obvious influence of a Riemannian metric on lower-layer structures is via the Levi-Civita connection. In 
particular, this constrains the torsion of the connection to be everywhere zero. A Riemannian metric also 
constrains the differentiable and topological structures. Consequently there are lower-layer properties which 
may be expressed without any direct reference to the metric layer, but whose validity requires the provision 
of a Riemannian metric. 


Because of the danger of learning lower-layer formulas in the context of Riemannian manifolds, and then 
accidentally applying metric-layer-only formulas to non-Riemannian manifolds, the Riemannian metric has 
been delayed until this late chapter. As a result, this book is almost at the end when Riemannian manifolds 
first appear. This is not a bad thing. It is a consequence of the fact that actually most of differential 
geometry does not require a metric at all. This may be somewhat surprising to anyone who might think 
that differential geometry is mostly about Riemannian manifolds. 


If a textbook introduces Riemannian manifolds very early (as some indeed do), it is difficult to forget the 
earlier formulas and facts when later chapters deal with manifolds which have no Riemannian metric. It 
is understandable that one wishes to present Riemannian manifolds early. The Riemannian metric has the 
richest set of formulas and facts. Riemannian manifolds are also intuitively easier to grasp than manifolds 
with at most a differentiable structure or an affine connection. 


73.2. Riemannian metric tensor fields 


73.2.1 REMARK: A Riemannian metric tensor field is an “inner product field”. 

In essence, a Riemannian metric tensor field is a kind of “variable inner product”. In other words, it is
an inner product which is defined on the tangent space at each point of a differentiable manifold, so that
the tangent space at each point is given its own inner product.


In strict ontological terms, it is quite dubious whether metric tensors should be considered to be tensors, 
despite the name. A more accurate name for them would be “inner product fields”. A metric tensor 
field is part of the basic infrastructure of a manifold, not an actor in the structure which is built on that 
infrastructure. The fact that something “transforms like a tensor” does not necessarily mean that it is a 
tensor. (For example, a basis of the tangent space at a point transforms like a singly covariant tensor, but 
it is not a tensor.) It is best to think of the term “metric tensor” as yet another of the numerous misnomers 
in differential geometry. It is really an “inner product field”. 


73.2.2 REMARK: A Riemannian metric permits the definition of both distances and angles.

In principle, a Riemannian manifold is a $C^1$ differentiable manifold M where a positive definite symmetric
bilinear real-valued function $g(p) : T_p(M) \times T_p(M) \to \mathbb{R}$ is defined at every point $p \in M$. This is called a
Riemannian metric tensor field. This field defines an inner product at each point p. (See Section 24.9 for
inner products on linear spaces.) This inner product defines both distances and angles on the tangent space
$T_p(M)$ at each point p. One may think of a Riemannian metric tensor field as defining rulers at each point
to measure the lengths of tangent vectors, but metric tensors also measure angles between tangent vectors. 
This is illustrated in Figure 73.2.1. 


Figure 73.2.1 Riemannian metric defines distance and angles on tangent spaces


In general, angles cannot be determined from a length function alone. For example, the general kind of norm 
on a linear space in Definition 24.7.2 defines a distance function, but angles between vectors are not implied 
by such a norm. The specification of angles in a linear space is typically defined in terms of an inner product 
function. A Riemannian metric tensor field effectively specifies an inner product on the tangent space at 
each point, from which both distances and angles are easily calculated. 


In view of the assertion in Remark 73.1.2 that a Riemannian metric tensor field contains the same information 
as the corresponding point-to-point distance function, there seems at first sight to be a contradiction here. 
However, the distance function is constrained so that the Hessian of its square is well defined and positive 
definite, and this Hessian matrix determines an inner product on the tangent space, which in turn determines 
angles between tangent vectors. 


73.2.3 REMARK: Reasonable regularity constraints on Riemannian metric tensor fields. 

Figure 73.2.1 gives the impression that a metric tensor at each point of a Riemannian manifold is entirely 
independent of the value at other points. This would require “too much information” to specify, and it would
also not be very useful because the first application of a metric tensor field g on M is to calculate lengths
of differentiable curves $\gamma : \mathbb{R} \to M$ by integrating the square root of the non-negative real-number function
$t \mapsto g(\gamma(t))(\gamma'(t), \gamma'(t))$ with respect to the parameter t of the curve. To guarantee that this integral is well
defined, the metric tensor field should be at least locally Riemann integrable in some sense.


To guarantee that the metric tensor field is locally Riemann integrable, it is sufficient to require it to be 
continuous, but certainly not necessary. Most texts require $C^1$ or even $C^\infty$ differentiability of the metric
tensor field. But it is more enlightening to first place very weak requirements on the regularity of the metric 
tensor field, and then ratchet up the regularity requirements as necessary for particular applications. (As 
an example of an application where the metric tensor field is naturally non-differentiable, one may consider 
the example of optical refraction at an air-glass interface, which may be modelled as a discontinuity in the 
metric tensor field at the interface, which causes minimal-“length” paths to bend in order to minimise the 
end-to-end path length.) 


73.2.4 DEFINITION: A Riemannian metric (tensor) (field) on a $C^1$ manifold M is a positive definite
symmetric covariant tensor field of degree 2 on M. In other words, a Riemannian metric on M is a tensor field
$g \in X(T^{0,2}(M))$ such that


(i) $\forall p \in M,\ \forall V_1, V_2 \in T_p(M),\ g(p)(V_1, V_2) = g(p)(V_2, V_1)$, [symmetric]
(ii) $\forall p \in M,\ \forall V \in T_p(M) \setminus \{0\},\ g(p)(V, V) > 0$. [positive definite]


73.2.5 DEFINITION: A Riemannian manifold is a pair (M, g) such that M is a $C^1$ manifold and g is a
Riemannian metric tensor field on M. 


73.2.6 REMARK: Interpretation of the definition of a Riemannian metric tensor field. 

A covariant tensor field of degree 2 on a $C^1$ manifold M is an element of $X(T^{0,2}(M))$, the space of
cross-sections of the (0,2) tensor bundle $T^{0,2}(M)$ on M. (See Notation 57.5.3 for spaces of cross-sections.) The
covariant tensor field $g \in X(T^{0,2}(M))$ in Definition 73.2.4 specifies a bilinear function
$g(p) : T_p(M) \times T_p(M) \to \mathbb{R}$ at each point $p \in M$. The symmetry of g implies that $g(p) \in \mathscr{L}^+_2(T_p(M); \mathbb{R})$ for all $p \in M$.
(See Definition 30.1.2 and Notation 30.1.4 for symmetric bilinear functions.) The tensor field g is said to be
positive definite if g(p) is a positive definite bilinear map for all $p \in M$. (See Definition 30.5.3 for positive
definite bilinear functions.) Thus Definition 73.2.4 specifies that a Riemannian metric g on a $C^1$ manifold
M satisfies the following conditions.


(1) $g \in X(T^{0,2}(M))$.
(2) $\forall p \in M,\ g(p) \in \mathscr{L}^+_2(T_p(M); \mathbb{R})$.
(3) $\forall p \in M,\ \forall v \in T_p(M) \setminus \{0\},\ g(p)(v, v) > 0$.


It is not entirely obvious that condition (3) should be enforced. There are situations where a positive
semi-definite bilinear function would be sufficient and useful. (This is discussed in Section 73.10.) If the sign of
g(p)(v, v) is unrestricted, the tensor field becomes pseudo-Riemannian. (This is discussed in Chapter 75.)


The symmetry in condition (2) is likewise not obviously essential. For the purpose of calculating distances, 
the antisymmetric component of the bilinear function makes no difference. 
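

The claim that the antisymmetric component does not affect lengths can be illustrated by a small calculation. The sketch below uses hypothetical helper names and an arbitrarily chosen non-symmetric component matrix. It only demonstrates the algebraic identity b(v, v) = sym(b)(v, v) for a bilinear function b and its symmetric part.

```python
def bilinear(b, v, w):
    # Value of the bilinear function with component matrix b on the pair (v, w).
    n = len(v)
    return sum(b[i][j] * v[i] * w[j] for i in range(n) for j in range(n))

def symmetric_part(b):
    # Symmetric part (b + b^T) / 2 of a square component matrix.
    n = len(b)
    return [[0.5 * (b[i][j] + b[j][i]) for j in range(n)] for i in range(n)]

# Usage: a non-symmetric matrix and its symmetric part give the same
# quadratic form, so they assign the same length to every vector.
b = [[1.0, 2.0], [0.0, 3.0]]
v = (1.0, -2.0)
print(bilinear(b, v, v), bilinear(symmetric_part(b), v, v))
```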


Since Riemannian metric tensor fields may be regarded as half Hessians of squares of distance functions, it 
would seem reasonable to ask which constraints on the metric tensor field are consistent with such Hessians. 
Generally such Hessians are symmetric, but they are not necessarily positive definite. (See Example 73.10.1.) 


It is often convenient to denote the parameter $p \in M$ of a metric tensor field $g \in X(T^{0,2}(M))$ as a subscript,
so that g(p)(v, v) may be written as $g_p(v, v)$ for example.


73.2.7 REMARK: The inner product function determined by a Riemannian metric tensor field. 

Since a metric tensor field g defines a positive definite bilinear function on the tangent space $T_p(M)$ at each
point p of a differentiable manifold M, it defines an inner product at each point. (See Section 24.9 for
inner products on linear spaces.) Then the map $v \mapsto g(p)(v, v)^{1/2}$ defines a norm on $T_p(M)$. The vectors
$v \in T_p(M)$ represent velocities (or rates of displacement). So the expression $g_{\gamma(t)}(\gamma'(t), \gamma'(t))^{1/2}$ in the length
integral $L_g(\gamma) = \int_a^b g_{\gamma(t)}(\gamma'(t), \gamma'(t))^{1/2}\,dt$ represents the rate of change of distance along a differentiable
curve $\gamma : \mathbb{R} \to M$.
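

The length integral can be approximated numerically once the metric components are known in a chart. The following sketch makes several assumptions which are not part of the text: the midpoint quadrature rule, the two example metrics (the Euclidean plane in polar coordinates and the hyperbolic upper half-plane), and all function names.

```python
import math

def curve_length(metric, curve, velocity, a, b, steps=2000):
    # Midpoint-rule approximation of the integral over [a, b] of
    # g(c(t))(c'(t), c'(t))^(1/2), using chart components throughout.
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        g, v = metric(curve(t)), velocity(t)
        n = len(v)
        speed_sq = sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
        total += math.sqrt(speed_sq) * h
    return total

def polar_metric(x):
    # Euclidean plane in polar coordinates (r, theta): g = diag(1, r^2).
    return [[1.0, 0.0], [0.0, x[0] ** 2]]

def half_plane_metric(x):
    # Hyperbolic upper half-plane in coordinates (x, y): g = diag(1, 1) / y^2.
    return [[1.0 / x[1] ** 2, 0.0], [0.0, 1.0 / x[1] ** 2]]

# Usage: a circle of radius 2, and a vertical segment from y = 1 to y = e.
print(curve_length(polar_metric, lambda t: (2.0, t), lambda t: (0.0, 1.0), 0.0, 2 * math.pi))
print(curve_length(half_plane_metric, lambda t: (0.0, t), lambda t: (0.0, 1.0), 1.0, math.e))
```

The first value should be close to the circumference $4\pi$. The second should be close to 1, since the integrand reduces to 1/y along the vertical segment.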


73.2.8 REMARK: A conjectural historical origin for the hint-notation “g” for the metric tensor. 

It is possible that the lower-case hint-notation “g” for the metric tensor has its origin in the capital letter 
“G” which Gauß used for the third component of the first fundamental form of a surface embedded in $\mathbb{R}^3$.
Many works in the 19th century followed the use by Gauß [219], page 22, of the letters E, F and G for the
first fundamental form:


$\sqrt{dx^2 + dy^2 + dz^2} = \sqrt{E\,dp^2 + 2F\,dp\cdot dq + G\,dq^2}$,


where p and q denote coordinates for the surface. (See Spivak [37], Volume 2, pages 55-111, for an English
translation of this 1827 work.) The Riemannian metric is a generalisation of this first fundamental form for
embedded surfaces. In an 1894 work by Bianchi [203], pages 153-158, he let E = 1 and F = 0, so that the
quadratic form became simply $dp^2 + G\,dq^2$, which concentrated the information contained within the form
into the component G. It is easy to imagine that over time this could have evolved into a lower case “g”.


73.2.9 REMARK: Spaces and maps for a differentiable Riemannian metric tensor field.
In terms of Notation 57.5.6, a $C^k$ Riemannian metric tensor field g on a $C^{k+1}$ manifold M for $k \in \mathbb{Z}^+_0$
satisfies $g \in X^k(T^{0,2}(M))$. This means that g is a $C^k$ cross-section of the (0,2) tensor bundle on M.


It is inconvenient that a $C^k$ Riemannian manifold has only a $C^{k-1}$ metric tensor field in Definition 73.2.11.
The obvious alternative would be to call it a $C^k$ Riemannian manifold if the metric is $C^k$, but then the
manifold itself would need to be $C^{k+1}$.


As a consequence of the naming trade-off in Definition 73.2.11, there is no such thing as a $C^0$ Riemannian
manifold. This seems reasonable because a $C^0$ manifold has no tangent vector bundle. So there is nothing
to define the metric field on. Thus Definition 73.2.11 simply requires the metric field to be differentiable to
the highest order possible for the given order of differentiability of the manifold.


73.2.10 DEFINITION: A $C^k$ Riemannian metric (tensor) (field) on a $C^{k+1}$ manifold M, for $k \in \mathbb{Z}^+_0$, is a
Riemannian metric tensor field g on M such that $g \in C^k(M, T^{0,2}(M))$.


73.2.11 DEFINITION: A $C^k$ Riemannian manifold, for $k \in \mathbb{Z}^+$, is a pair (M, g) such that M is a $C^k$
differentiable manifold and g is a $C^{k-1}$ Riemannian metric tensor field on M.


73.2.12 DEFINITION: The length of a vector $v \in T_p(M)$ at a point p in a Riemannian manifold M with
metric tensor field $g \in X(T^{0,2}(M))$, is the non-negative real number $g_p(v, v)^{1/2}$.


73.2.13 NOTATION: $\|v\|_g$ denotes the length of a vector v in the tangent bundle T(M) of a Riemannian
manifold M, where $g \in X(T^{0,2}(M))$ is the metric tensor field on M.


73.2.14 REMARK: Abbreviation of the notation for vector length.
The vector length $\|v\|_g$ in Notation 73.2.13 is usually abbreviated to $\|v\|$ or $|v|$. (See Remark 24.7.9 for the
optional distinction between the notations $\|\cdot\|$ and $|\cdot|$.)


73.2.15 REMARK: The angle between non-zero vectors at a point in a Riemannian manifold. 

The angle between vectors in Definition 73.2.16 is well defined because $\|v\|$ and $\|w\|$ are non-zero, since
$g_p$ is positive definite, and $|g_p(v, w)| \le \|v\| \cdot \|w\|$ by the Cauchy-Schwarz inequality. (See Section 19.8 and
Theorem 24.9.14.)


73.2.16 DEFINITION: The angle between two vectors $V, W \in T_p(M) \setminus \{0\}$ at a point p in a Riemannian
manifold M with metric tensor field g is the real number $\arccos(g(p)(V, W)\,\|V\|_g^{-1}\,\|W\|_g^{-1})$.
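

In chart components, the length and angle of Definitions 73.2.12 and 73.2.16 reduce to elementary arithmetic with the component matrix of the metric at a single point. The sketch below assumes that the vectors and the metric are already given as component tuples and a component matrix with respect to one chart. The function names are hypothetical, and the clamping step is only a guard against floating-point rounding.

```python
import math

def inner(g, v, w):
    # g(p)(v, w) as the double sum of g_ij v^i w^j over the component indices.
    return sum(g[i][j] * v[i] * w[j] for i in range(len(v)) for j in range(len(w)))

def length(g, v):
    # Length of v, namely g(p)(v, v)^(1/2).
    return math.sqrt(inner(g, v, v))

def angle(g, v, w):
    # Angle between non-zero vectors v and w with respect to the metric g.
    c = inner(g, v, w) / (length(g, v) * length(g, w))
    return math.acos(max(-1.0, min(1.0, c)))

# Usage: the coordinate basis vectors are orthogonal for the identity matrix,
# but not for a metric with non-zero off-diagonal components.
e1, e2 = (1.0, 0.0), (0.0, 1.0)
print(angle([[1.0, 0.0], [0.0, 1.0]], e1, e2))
print(angle([[1.0, 0.5], [0.5, 1.0]], e1, e2))
```

The same pair of component tuples therefore subtends different angles under different metrics, which is why a length function alone does not determine angles.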


73.3. Metric tensor fields in tensor calculus 


73.3.1 REMARK: The tensor calculus formalism for Riemannian metric tensor fields. 

Historically the tensor calculus formalism, using the much-maligned “coordinates”, preceded the modern 
abstract so-called “coordinate-free” formalism by some decades. An important first task for metric tensor
fields is to convert the abstract notations to “coordinates”, particularly so that they can be differentiated. 
Differentiation is well defined for differentiable maps between Cartesian spaces. So it is desirable to convert 
abstract maps into differentiable Cartesian space maps. 


The frequent swapping back and forth between concrete numeric and abstract geometric notations is a 
considerable burden which is unavoidable in differential geometry because this subject is a mixture of the 
“differential” and the “geometric”. A further range of burdens arises if one chooses to apply several abstract 
formalisms simultaneously. It may sometimes seem appealing to work entirely in the concrete numeric 
formalism, namely tensor calculus, because this facilitates all of the calculus operations. However, the
“debauch of indices” in the tensor calculus requires interpretation because each number-array has a different
nature, and different kinds of objects permit different kinds of operations. (For the “debauch of indices”,
see Spivak [37], Volume 1, page 39; Volume 2, pages 209-226.) 


The hard discipline of an abstract geometric notation encapsulates arrays of numbers into “objects” in
well-defined “classes”, each of which has a specific set of permitted operations. (In computer programming,
a class-oriented language is called “strongly typed” or “object-oriented”. This recognises the fact that
the zeros and ones in a computer require interpretation and meaning so that incorrect operations can be
avoided.) The formalism conversions in Definition 73.3.2 and Theorem 73.3.4 are part of the unfortunate, but
unavoidable, cost of combining concrete numerical and abstract geometrical viewpoints. Instead of trying to
banish the “coordinates” language from differential geometry, it is more profitable to become “multilingual”,
with the capability to translate efficiently between the various concrete and abstract formalisms. No single
“monolingual” approach to differential geometry can deliver all of the benefits of “multilingualism”.


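73.3.2 DEFINITION: The component array (field) for a Riemannian metric tensor field g on a $C^1$ manifold
M with n = dim(M), for a chart $\psi \in \mathrm{atlas}(M)$, is the map $g^\psi : \mathrm{Range}(\psi) \to \mathbb{R}^{n \times n}$ defined by


$\forall \psi \in \mathrm{atlas}(M),\ \forall x \in \mathrm{Range}(\psi),\ \forall i, j \in \mathbb{N}_n,$
$g^\psi_{ij}(x) = g(\psi^{-1}(x))(t_{\psi^{-1}(x),e_i,\psi},\ t_{\psi^{-1}(x),e_j,\psi}),$
where $e_i$ is unit vector i in $\mathbb{R}^n$. (See Definition 22.7.9 for $e_i$.) In other words,


$\forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall i, j \in \mathbb{N}_n,$
$g^\psi_{ij}(\psi(p)) = g(p)(t_{p,e_i,\psi},\ t_{p,e_j,\psi}) = g(p)(e_i^{p,\psi},\ e_j^{p,\psi}).$


(See Notation 54.4.10 for $e_i^{p,\psi}$.)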


73.3.3 REMARK: Notation for components of a Riemannian metric tensor field. 

The indices i and j in Definition 73.3.2 are indicated before the x-dependency rather than after, although
strictly speaking, $g^\psi(x)$ is a real tuple. Thus $g^\psi(x)_{ij}$ is more strictly correct, but $g^\psi_{ij}(x)$ is more convenient.
It is customary to suppress the chart $\psi$ and coordinate tuple x from a metric tensor’s component array field
notation $g^\psi_{ij}(x)$. These dependencies are usually clear from the context.


The formulas in Definition 73.3.2 and Theorem 73.3.4 may seem excessively formal. In typically informal
tensor calculus notation, one may write much more simply $g_{ij} = g(e_i, e_j)$ and $g(v_1, v_2) = g_{ij} v_1^i v_2^j$.


73.3.4 THEOREM: Value of Riemannian metric expressed in terms of vector and metric components. 
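Let $g^\psi : \mathrm{Range}(\psi) \to \mathbb{R}^{n \times n}$ be the component array for a Riemannian metric g on a $C^1$ manifold M with
n = dim(M). Then

$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v_1, v_2 \in \mathbb{R}^n,$
$g(p)(t_{p,v_1,\psi},\ t_{p,v_2,\psi}) = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, v_1^i v_2^j.$

Hence
$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V_1, V_2 \in T_p(M),$
$g(p)(V_1, V_2) = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, \Phi(\psi)(V_1)^i\, \Phi(\psi)(V_2)^j.$

(See Notation 54.5.7 for $\Phi$.)


PROOF: The assertions follow from the bilinearity of g(p) for all $p \in M$.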


73.3.5 REMARK: The comma notation for partial derivatives of metric tensor component fields. 

Notation 73.3.6 is an example of the “comma notation” convention for naive partial derivatives (as opposed
to covariant derivatives) of tensor component fields. It is applied here to metric tensor component fields
$g^\psi \in C^1(\mathrm{Range}(\psi), \mathbb{R}^{n \times n})$. The $C^1$ differentiability of $g^\psi$ follows from the requirement $g \in X^1(T^{0,2}(M))$.


When further indices are appended, further partial derivatives are applied in the same order, which means
that the left-most index is applied first and the right-most index is applied last as in the example of $g^\psi_{ij,k\ell}$
in line (73.3.1). This gives the superficial appearance of reversing the index order. However, if one defines
$g^\psi_{ij,k}$ in the most obvious way as $(g^\psi_{ij})_{,k}$, the successive appending of “,k” and “,ℓ” yields the expected
result, which is $(g^\psi_{ij,k})_{,\ell}$. (By Theorem 42.3.6, the order of naive partial derivatives only affects the “output”
if the differentiability is not continuous.)


73.3.6 NOTATION: $g^\psi_{ij,k}$, for a Riemannian metric tensor field $g \in X^1(T^{0,2}(M))$ on a $C^2$ manifold M with
n = dim(M), $\psi \in \mathrm{atlas}(M)$ and $i, j, k \in \mathbb{N}_n$, denotes the kth derivative of the (i, j) component of the
component array field $g^\psi$ for g on M. In other words, $g^\psi_{ij,k} \in C^0(\mathrm{Range}(\psi), \mathbb{R})$ is defined by


$\forall x \in \mathrm{Range}(\psi),\ \forall i, j, k \in \mathbb{N}_n,\quad g^\psi_{ij,k}(x) = \partial_{x^k} g^\psi_{ij}(x).$


$g^\psi_{ij,k\ell}$, for a Riemannian metric tensor field $g \in X^2(T^{0,2}(M))$ on a $C^3$ manifold M with n = dim(M),
$\psi \in \mathrm{atlas}(M)$ and $i, j, k, \ell \in \mathbb{N}_n$, denotes the ℓth derivative of the kth derivative of the (i, j) component of
the component array field $g^\psi$ for g on M. In other words, $g^\psi_{ij,k\ell} \in C^0(\mathrm{Range}(\psi), \mathbb{R})$ is defined by


$\forall x \in \mathrm{Range}(\psi),\ \forall i, j, k, \ell \in \mathbb{N}_n,\quad g^\psi_{ij,k\ell}(x) = \partial_{x^\ell}(\partial_{x^k} g^\psi_{ij}(x)).$ (73.3.1)


73.3.7 REMARK: Notation for components of the “inverse” of a metric tensor. 

As mentioned in Remark 24.9.10, it is not strictly correct to say that an inner product has an inverse. An
inner product is a bilinear map on a linear space, not a map between linear spaces. However, a metric
tensor does have a square matrix with respect to any given $C^1$ chart, and this matrix is always invertible
by Theorem 24.9.11 (ii). So the inverse matrix is always well defined. This inverse matrix is not the matrix
of the “inverse” of the metric tensor because there is no such inverse, but it is the matrix of the
index-raising isomorphism in Definition 73.5.4. It is customary to distinguish the inverse component array in
Definition 73.3.8 from the component array in Definition 73.3.2 by “raising the indices”.


73.3.8 DEFINITION: The inverse component array (field) for a Riemannian metric tensor field $g$ on a $C^1$
manifold $M$ with $n = \dim(M)$, for a chart $\psi \in \mathrm{atlas}(M)$, is the map $g_\psi : \mathrm{Dom}(\psi) \to \mathbb{R}^{n\times n}$ defined by


    $\forall x \in \mathrm{Range}(\psi),\quad g_\psi(x) = g^\psi(x)^{-1}.$


In other words, the matrix $[g_\psi^{ij}(x)]_{i,j=1}^n$ is the inverse of the matrix $[g^\psi_{ij}(x)]_{i,j=1}^n$ for all $x \in \mathrm{Range}(\psi)$.
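As a small numerical illustration of Definition 73.3.8, the following sketch inverts a component matrix at a single point and checks that the product with the original matrix is the identity. The matrix entries and the helper names `inverse_2x2` and `matmul` are assumptions made purely for illustration; they do not come from the text, and the sketch is restricted to dimension 2.

```python
from fractions import Fraction

def inverse_2x2(g):
    """Invert a 2x2 matrix given as nested lists (exact when entries are Fractions)."""
    (a, b), (c, d) = g
    det = a * d - b * c
    if det == 0:
        raise ValueError("the component matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical values of g_ij(x) at one point x: symmetric and positive definite.
g_lower = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
g_upper = inverse_2x2(g_lower)      # plays the role of the inverse array g^{ij}(x)
product = matmul(g_upper, g_lower)  # expected to be the identity matrix
```

Exact rational arithmetic is used so that the identity check does not depend on floating-point tolerances.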


73.4. Riemannian metric functions 


73.4.1 REMARK: Alternative Riemannian metric representation as real function on a vector-pair bundle.
As mentioned in Remark 58.2.1, the differential $df$ of a real-valued function $f$ on a $C^1$ manifold $M$ may
be represented either as a differential map $df : T(M) \to \mathbb{R}$ or as a differential field $df \in X(T^*(M)) =
X(T^{0,1}(M))$. The main difference in practice is that for the differential map, one may write $(df)(V)$ for
$V \in T(M)$, whereas for the differential field, one must write $(df)_p(V)$, where $V \in T_p(M)$. These two
representations each have advantages and disadvantages.


Similar alternatives exist for the representation of the Riemannian metric. According to Definition 73.2.4, a
Riemannian metric is a tensor field $g \in X(T^{0,2}(M))$, but it can be equivalently represented as a real-valued
function $\eta : T^2(M) \to \mathbb{R}$, where $T^2(M)$ is the bundle of vector pairs in $T(M)$ which have the same base
point. (See Definition 55.5.37.) This representation is utilised in Definition 73.4.2. It is very easy to convert
between the two representations with the formulas $\eta(V_1, V_2) = g(\pi(V_1))(V_1, V_2)$ whenever $\pi(V_1) = \pi(V_2)$,
and $g(p)(V_1, V_2) = \eta(V_1, V_2)$.

Differences between the two representations become evident when one constructs differentials while defining
the Levi-Civita connection. For the metric field in Definition 73.2.4, one obtains a differential in $T(T^{0,2}(M))$,
whereas for Definition 73.4.2, one obtains a differential in $T^*(T^2(M))$ or $T(T^2(M)) \to \mathbb{R}$.


73.4.2 DEFINITION: A Riemannian metric (function) on a $C^1$ manifold $M$ is a function $\eta : T^2(M) \to \mathbb{R}$
such that


(i) $\forall (V_1, V_2), (V_1', V_2) \in T^2(M),\ \forall \lambda, \lambda' \in \mathbb{R},\ \eta(\lambda V_1 + \lambda' V_1', V_2) = \lambda\eta(V_1, V_2) + \lambda'\eta(V_1', V_2)$,  [left bilinear]
(ii) $\forall (V_1, V_2), (V_1, V_2') \in T^2(M),\ \forall \lambda, \lambda' \in \mathbb{R},\ \eta(V_1, \lambda V_2 + \lambda' V_2') = \lambda\eta(V_1, V_2) + \lambda'\eta(V_1, V_2')$,  [right bilinear]
(iii) $\forall (V_1, V_2) \in T^2(M),\ \eta(V_1, V_2) = \eta(V_2, V_1)$,  [symmetric]
(iv) $\forall V \in T(M),\ \eta(V, V) \ge 0$,  [nonnegative]
(v) $\forall V \in T(M) \setminus \{0\},\ \eta(V, V) > 0$.  [positive definite]

73.4.3 DEFINITION: The Riemannian metric function corresponding to a Riemannian metric tensor field
$g \in X(T^{0,2}(M))$ on a $C^1$ manifold $M$ is the map $\eta : T^2(M) \to \mathbb{R}$ defined by


    $\forall p \in M,\ \forall V_1, V_2 \in T_p(M),\quad \eta(V_1, V_2) = g(p)(V_1, V_2).$


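73.4.4 REMARK: The tensor calculus formalism for Riemannian metric functions.
When the component array field in Definition 73.3.2 is combined with Definition 73.4.3, the result is


    $\forall \psi \in \mathrm{atlas}(M),\ \forall x \in \mathrm{Range}(\psi),\ \forall i,j \in \mathbb{N}_n,$
    $g^\psi_{ij}(x) = \eta(t_{\psi^{-1}(x),e_i,\psi},\ t_{\psi^{-1}(x),e_j,\psi}).$
In other words,
    $\forall \psi \in \mathrm{atlas}(M),\ \forall p \in \mathrm{Dom}(\psi),\ \forall i,j \in \mathbb{N}_n,$
    $g^\psi_{ij}(\psi(p)) = \eta(t_{p,e_i,\psi},\ t_{p,e_j,\psi})$
    $\qquad\qquad\ = \eta(e_i^{p,\psi}, e_j^{p,\psi}).$
Then Theorem 73.4.5 follows from this.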


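73.4.5 THEOREM: Short-cut version of Riemannian metric expressed in terms of components.
Let $g^\psi : \mathrm{Dom}(\psi) \to \mathbb{R}^{n\times n}$ be the component array for a Riemannian metric tensor field $g$ on a $C^1$ manifold
$M$ with $n = \dim(M)$. Let $\eta$ be the corresponding Riemannian metric function. Then


    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v_1, v_2 \in \mathbb{R}^n,$
    $\eta(t_{p,v_1,\psi}, t_{p,v_2,\psi}) = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, v_1^i v_2^j.$
Hence


    $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V_1, V_2 \in T_p(M),$
    $\eta(V_1, V_2) = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, \Phi(\psi)(V_1)^i\, \Phi(\psi)(V_2)^j.$


PROOF: The assertions follow directly from Theorem 73.3.4.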
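The component formula in Theorem 73.4.5 can be tried out numerically. The sketch below assumes, purely for illustration, the Euclidean plane in polar coordinates $x = (r, \theta)$, where the component matrix is diag$(1, r^2)$. The function names `polar_components` and `eta` are hypothetical.

```python
def polar_components(x):
    """Component matrix of the Euclidean plane metric in polar coordinates x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def eta(components, x, v1, v2):
    """Evaluate the double sum over i, j of g_ij(x) * v1^i * v2^j."""
    g = components(x)
    n = len(v1)
    return sum(g[i][j] * v1[i] * v2[j] for i in range(n) for j in range(n))

# Chart coordinates of the base point and component tuples of two tangent vectors.
x = (2.0, 0.3)
value = eta(polar_components, x, (1.0, 1.0), (0.0, 3.0))
```

Here the vectors are represented only by their chart components, which is the role played by $\Phi(\psi)(V_1)$ and $\Phi(\psi)(V_2)$ in the theorem.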


73.4.6 REMARK: Directional derivative of Riemannian metric function for “constant vector fields”.

In Theorem 73.4.7 the expression “$\eta(y_\psi, z_\psi)$” denotes the map $\eta \circ (y_\psi \times z_\psi) : \mathrm{Dom}(\psi) \to \mathbb{R}$ for $\psi \in \mathrm{atlas}(M)$
and $y, z \in T_p(M)$ for some $p \in \mathrm{Dom}(\psi)$. In other words, “$\eta(y_\psi, z_\psi)$” denotes the map from $q \in \mathrm{Dom}(\psi)$ to
$\eta(y_\psi(q), z_\psi(q)) = g(q)(y_\psi(q), z_\psi(q))$. Theorem 73.4.7 is applicable to the computation of the coefficient array
for the Levi-Civita connection of a Riemannian manifold in Theorem 74.3.2. The derivative in direction $V$
of $\eta(y_\psi, z_\psi)$ is expressed in terms of coordinates.


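73.4.7 THEOREM: Directional derivative of metric function evaluated on constant vector fields.
Let $(M, g)$ be a $C^2$ Riemannian manifold with $n = \dim(M)$. Let $\eta$ be the corresponding Riemannian metric
function. Then


    $\forall \psi \in \mathrm{atlas}(M),\ \forall x \in \mathrm{Range}(\psi),\ \forall y, z, V \in T_{\psi^{-1}(x)}(M),$
    $\partial_V \eta(y_\psi, z_\psi) = \sum_{i,j,k=1}^n y^i z^j v^k\, g^\psi_{ij,k}(x),$


where $(y^i)_{i=1}^n = \Phi(\psi)(y)$, $(z^i)_{i=1}^n = \Phi(\psi)(z)$ and $(v^k)_{k=1}^n = \Phi(\psi)(V)$. (See Definition 57.1.20 for $y_\psi$ and $z_\psi$.
See Notation 54.5.7 for $\Phi$.) In other words,


    $\forall \psi \in \mathrm{atlas}(M),\ \forall x \in \mathrm{Range}(\psi),\ \forall y, z, V \in T_{\psi^{-1}(x)}(M),$
    $\partial_V \eta(y_\psi, z_\psi) = \sum_{i,j,k=1}^n \Phi(\psi)(y)^i\, \Phi(\psi)(z)^j\, \Phi(\psi)(V)^k\, g^\psi_{ij,k}(x).$

PROOF: Let $\psi \in \mathrm{atlas}(M)$, $x \in \mathrm{Range}(\psi)$, $p = \psi^{-1}(x)$, and $y, z, V \in T_p(M)$. Then $\eta(y_\psi, z_\psi)$ satisfies


    $\forall q \in \mathrm{Dom}(\psi),\quad \eta(y_\psi, z_\psi)(q) = \eta(y_\psi(q), z_\psi(q))$
    $\qquad\qquad = g(q)(y_\psi(q), z_\psi(q))$
    $\qquad\qquad = \sum_{i,j=1}^n y^i z^j\, g^\psi_{ij}(\psi(q)),$    (73.4.1)


where line (73.4.1) follows from the bilinearity of $g$. Therefore


    $\partial_V \eta(y_\psi, z_\psi) = \sum_{k=1}^n v^k\, \partial_{x^k} \sum_{i,j=1}^n y^i z^j\, g^\psi_{ij}(x)$
    $\qquad\qquad = \sum_{i,j,k=1}^n y^i z^j v^k\, g^\psi_{ij,k}(x).$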
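The formula of Theorem 73.4.7 can be compared with a finite-difference approximation. The following is a minimal sketch under assumed data: the polar-coordinate component matrix diag$(1, r^2)$, whose only nonzero derivative is $g_{22,1} = 2r$, together with arbitrary component tuples for $y$, $z$ and $V$. None of the function names come from the text.

```python
import math

def metric(x):
    """Component matrix diag(1, r^2) in polar coordinates x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def metric_derivatives(x):
    """Return dg with dg[i][j][k] = g_{ij,k}(x); only g_{22,1} = 2r is nonzero."""
    r, _theta = x
    dg = [[[0.0, 0.0] for _j in range(2)] for _i in range(2)]
    dg[1][1][0] = 2.0 * r
    return dg

def eta_constant_fields(x, y, z):
    """Value at chart point x of the metric applied to fields with constant components y, z."""
    g = metric(x)
    return sum(g[i][j] * y[i] * z[j] for i in range(2) for j in range(2))

def derivative_by_formula(x, y, z, v):
    """Triple sum of y^i z^j v^k g_{ij,k}(x)."""
    dg = metric_derivatives(x)
    return sum(y[i] * z[j] * v[k] * dg[i][j][k]
               for i in range(2) for j in range(2) for k in range(2))

def derivative_by_difference(x, y, z, v, h=1e-5):
    """Central difference of the same function along the coordinate direction v."""
    forward = tuple(x[k] + h * v[k] for k in range(2))
    backward = tuple(x[k] - h * v[k] for k in range(2))
    return (eta_constant_fields(forward, y, z)
            - eta_constant_fields(backward, y, z)) / (2.0 * h)

x, y, z, v = (2.0, 0.5), (1.0, 2.0), (3.0, 4.0), (5.0, 6.0)
agree = math.isclose(derivative_by_formula(x, y, z, v),
                     derivative_by_difference(x, y, z, v), rel_tol=1e-6)
```

The components of $y$ and $z$ are held fixed while the base point moves, which is what “constant vector fields” means with respect to the chart.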


73.4.8 THEOREM: Directional derivative of metric function evaluated on chart-basis vector fields.
Let $(M, g)$ be a $C^2$ Riemannian manifold with $n = \dim(M)$. Let $\eta$ be the corresponding Riemannian
metric function. Let $\psi \in \mathrm{atlas}(M)$, $x \in \mathrm{Range}(\psi)$, $p = \psi^{-1}(x)$, $y = e_i^{p,\psi}$, $z = e_j^{p,\psi}$ and $V = e_k^{p,\psi}$
for $i, j, k \in \mathbb{N}_n$. Then


    $\partial_V \eta(y_\psi, z_\psi) = g^\psi_{ij,k}(x).$


PROOF: The assertion follows directly from Theorem 73.4.7.


73.5. Lowering and raising indices of tensors 


73.5.1 REMARK: Metric tensors have no inverse. Long live the inverses of metric tensors. 

Strictly speaking, there is no such thing as the inverse of a metric tensor. However, it has been part of 
Riemannian geometry from the beginning of the subject. It is frequently used to “raise indices” on tensors 
in component form. (Vectors may be converted from contravariant to covariant using the metric tensor, which 
is referred to as “lowering the indices”. The inverse of the metric tensor performs the opposite function, 
which is “raising the indices”.) Therefore some kind of meaning must be given to this concept. 


The metric tensor $g(p)$ at a point $p$ of a Riemannian manifold $(M, g)$ is a real-valued symmetric bilinear map
on the linear space $T_p(M)$. In other words, $g(p) \in \mathscr{L}^+_2(T_p(M); \mathbb{R})$. Such a map $g(p) : T_p(M) \times T_p(M) \to \mathbb{R}$
clearly cannot be inverted in the way that a linear map can.


The fact that a bilinear form can be expressed as a square matrix with respect to a basis does not imply 
that a bilinear form is the same kind of object as a linear space endomorphism, which can also be expressed 
as a square matrix. The wide range of “kinds of objects” which are associated with matrices is discussed 
in Remark 25.1.2, where inner products are listed as application context (4). (This, by the way, is one of 
the good reasons to avoid “coordinates”. The kind of object which is represented is easily forgotten when 
an object is replaced by a nondescript array of numbers.) The distinction between bilinear forms and linear 
maps is expressed by Frankel [12], page 61, as follows. 


A point of confusion in elementary linear algebra arises since the matrix of a linear transformation 
there is usually written $A_{ij}$ and they make no distinction between linear transformations and
bilinear forms. We must make the distinction. 


When a bilinear form is written as a matrix, it is not at all difficult to invert it as one does with a matrix 
representing a linear map, but one must then ask what the “inverse of a bilinear map” could possibly mean. 
Fortunately it is in fact possible to give an authentic meaning to inverses of metric tensors, but some careful 
interpretation is required. 


It is possible to convert a bilinear map on $T_p(M)$ to a linear map from $T_p(M)$ to $\mathbb{R}$ by holding one of the inputs
constant. Then by varying this “constant” input, one obtains a map from $T_p(M)$ to $T_p^*(M)$. Thus a map
$\mu^\flat_{g,p} \in \mathrm{Lin}(T_p(M), T_p^*(M))$ may be defined by $\mu^\flat_{g,p} : u \mapsto (v \mapsto g(p)(u, v))$. It is shown in Theorem 24.9.11 (i)
that this is a linear map from $T_p(M)$ to $T_p^*(M)$. Since a metric tensor field $g \in X(T^{0,2}(M))$ is an inner
product at each point $p \in M$, it follows from Theorem 24.9.11 (ii) that $\mu^\flat_{g,p}$ is an invertible linear map for
all $p \in M$. However, it is incorrect to call the map $(\mu^\flat_{g,p})^{-1} \in \mathrm{Lin}(T_p^*(M), T_p(M))$ “the inverse of the metric
tensor”. Some authors call $(\mu^\flat_{g,p})^{-1}$ the “sharp musical isomorphism” because it can be used to raise indices
of tensors, and $\mu^\flat_{g,p}$ is then referred to correspondingly as the “flat musical isomorphism”.


73.5.2 REMARK: Index lowering and raising really transforms tensors, doesn’t just change notation. 
Although the colloquial terminology for lowering and raising indices of tensors might suggest that these 
modifications are purely notational, and in practical usage indices are raised and lowered almost as if this is 
merely a matter of notation, the “musical isomorphisms” do in fact transform tensors to different tensors, 
not just the same tensors with different components or notation. A clear example of the effect of these 
isomorphisms is the gradient operator which is discussed in Remark 74.6.1, which is the “sharp” version 
of the differential operator. The differential operator yields a tangent covector which quantifies the rate of 
change of a real-valued function in a given direction, whereas the gradient operator yields a tangent vector 
which indicates the direction of most rapid increase of the real-valued function per distance moved. If a
basis $(e_i)_{i=1}^n$ is chosen at a point $p$ so that $g(p)_{ij} = g(p)(e_i, e_j) = \delta_{ij}$ for $i, j \in \mathbb{N}_n$, then the components of
the gradient and differential will be the same. But if the basis is changed, the gradient and differential will
have different components. Thus one could say that they are “the same vector” only in the sense of having
the same components in normal coordinates, although they are clearly not the same object. If it could be
arranged that $g(p)_{ij} = \delta_{ij}$ for all $p \in M$, it would be difficult to distinguish the sharp and flat versions of
tensors, but in such a flat locally Euclidean space, many other distinctions vanish also.


73.5.3 DEFINITION: The index-lowering isomorphism or flat musical isomorphism at a point $p$ in a $C^1$
Riemannian manifold $(M, g)$ is the map $\mu^\flat_{g,p} : T_p(M) \to T_p^*(M)$ defined by $\mu^\flat_{g,p} : u \mapsto (v \mapsto g(p)(u, v))$. In
other words,


    $\forall u \in T_p(M),\ \forall v \in T_p(M),\quad \mu^\flat_{g,p}(u)(v) = g(p)(u, v).$
In other words,


    $\forall u \in T_p(M),\quad \mu^\flat_{g,p}(u) = g(p)(u, \cdot).$


The index-lowering isomorphism or flat musical isomorphism for a $C^1$ Riemannian manifold $(M, g)$ is the
map $\mu^\flat_g : T(M) \to T^*(M)$ defined by


    $\forall p \in M,\ \forall u \in T_p(M),\quad \mu^\flat_g(u) = \mu^\flat_{g,p}(u),$
where $\mu^\flat_{g,p}$ is the index-lowering isomorphism at each point $p$ in $M$.

73.5.4 DEFINITION: The index-raising isomorphism or sharp musical isomorphism at a point $p$ in a $C^1$
Riemannian manifold $(M, g)$ is the map $\mu^\sharp_{g,p} = (\mu^\flat_{g,p})^{-1} : T_p^*(M) \to T_p(M)$, where $\mu^\flat_{g,p}$ is the index-lowering
isomorphism at $p$ in $(M, g)$.


The index-raising isomorphism or sharp musical isomorphism for a $C^1$ Riemannian manifold $(M, g)$ is the
map $\mu^\sharp_g : T^*(M) \to T(M)$ defined by


    $\forall p \in M,\ \forall w \in T_p^*(M),\quad \mu^\sharp_g(w) = \mu^\sharp_{g,p}(w),$


where $\mu^\sharp_{g,p}$ is the index-raising isomorphism at each point $p$ in $M$.


73.5.5 REMARK: A simple application of the index-raising isomorphism.
Theorem 73.5.6 is almost obvious, but it is a typical step in the utilisation of the index-raising isomorphism.
Given that a map $f : T_p(M) \to \mathbb{R}$ satisfies line (73.5.1) for some fixed $u \in T_p(M)$, the value of $u$ can
be obtained by applying the index-raising isomorphism to $f$. It is not necessary to prove a priori that
$f \in T_p^*(M)$ because the linearity of $f$ is implied by the equation (73.5.1).


In the traditional tensor calculus style, the assumption of Theorem 73.5.6 would look like “$g_{ij}u^i = f_j$”, and
the assertion following from this would look like “$u^i = g^{ij} f_j$”. The assertion “$u = \mu^\sharp_g(f)$” is a more modern
way of saying the same thing.

73.5.6 THEOREM: Applying the index-raising isomorphism to solve a linear equation.
Let $(M, g)$ be a $C^1$ Riemannian manifold. Let $p \in M$, $u \in T_p(M)$ and $f : T_p(M) \to \mathbb{R}$ satisfy


    $\forall v \in T_p(M),\quad g(p)(u, v) = f(v).$    (73.5.1)
Then $u = \mu^\sharp_g(f)$.
PROOF: From line (73.5.1) and Definition 73.5.3, it follows that $\mu^\flat_g(u)(v) = f(v)$ for all $v \in T_p(M)$. Then
since $\mu^\flat_g(u) \in T_p^*(M)$, it follows that $f = \mu^\flat_g(u) \in T_p^*(M)$. So $\mu^\sharp_g(f)$ is well defined by Definition 73.5.4,
and $\mu^\sharp_g(f) = \mu^\sharp_g(\mu^\flat_g(u)) = u$ by Definition 73.5.4.
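In components, Theorem 73.5.6 amounts to solving $g_{ij}u^i = f_j$ for $u$ by applying the inverse matrix. The sketch below works at a single point in dimension 2 with an assumed component matrix. The names `flat` and `sharp_2d` are hypothetical stand-ins for the index-lowering and index-raising isomorphisms acting on component tuples.

```python
from fractions import Fraction

def flat(g, u):
    """Lower the index: f_j = sum over i of g_ij * u^i."""
    n = len(u)
    return [sum(g[i][j] * u[i] for i in range(n)) for j in range(n)]

def sharp_2d(g, f):
    """Raise the index in dimension 2: u^i = sum over j of g^{ij} * f_j."""
    (a, b), (c, d) = g
    det = a * d - b * c
    g_inv = [[d / det, -b / det], [-c / det, a / det]]
    return [sum(g_inv[i][j] * f[j] for j in range(2)) for i in range(2)]

# Hypothetical component matrix at one point, and a vector u to be recovered from f.
g = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
u = [Fraction(1), Fraction(-2)]
f = flat(g, u)              # covector components satisfying g_ij u^i = f_j
recovered = sharp_2d(g, f)  # raising the index of f should give back u
```

The round trip from `u` to `f` and back is the component form of $\mu^\sharp_g(\mu^\flat_g(u)) = u$.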


73.5.7 REMARK: Lowering indices using a Riemannian metric.
The practice of “lowering an index” on an $n$-tuple of components $(v^i)_{i=1}^n$ of a tangent vector $v \in T_p(M)$, using
a metric tensor $g \in \mathscr{L}^+_2(T_p(M))$ with components $(g_{ij})_{i,j=1}^n$ with respect to the same basis vectors $(e_i)_{i=1}^n$,
converts the contravariant components $(v^i)_{i=1}^n$ (which are the components of a linear combination of basis
vectors) to a covariant components $n$-tuple $(v_j)_{j=1}^n$ with $v_j = \sum_{i=1}^n g_{ij}v^i$ for $j \in \mathbb{N}_n$. These covariant
components are calculated as the inner product of the vector $v$ with the basis vectors. This can be seen by
calculating the inner product $v_i = g(e_i, v)$ for $i \in \mathbb{N}_n$. Then in terms of components,
$v_i = g(e_i, \sum_{j=1}^n v^j e_j) = \sum_{j=1}^n v^j g(e_i, e_j) = \sum_{j=1}^n g_{ij}v^j$.


73.5.8 REMARK: A Riemannian metric permits tangent vector components to be specified covariantly. 
One way to think of Riemannian metric tensors (and inner products in general) is that they define a covariant 
component chart on the set of tangent vectors at a point, as opposed to the way in which a basis defines a 
contravariant component chart on a tangent space. 


Using a basis $(e_i)_{i=1}^n$ for a linear space $T_p(M)$, one may write vectors $v \in T_p(M)$ as linear combinations
$v = \sum_{i=1}^n v^i e_i$, where the component tuple $(v^i)_{i=1}^n$ varies contravariantly to the basis tuple $(e_i)_{i=1}^n$. (For example,
if the basis vectors are scaled, the components must be inversely scaled.) With an inner product
$g(p) \in \mathscr{L}^+_2(T_p(M))$, one may define covariant components $v_i = g(p)(e_i, v)$ for $v \in T_p(M)$, for $i \in \mathbb{N}_n$. Then
$v_i = g(p)(e_i, \sum_{j=1}^n v^j e_j) = \sum_{j=1}^n v^j g(p)(e_i, e_j) = \sum_{j=1}^n g(p)_{ij}v^j$. More briefly, $v_i = g_{ij}v^j$. In other words,
the metric tensor converts contravariant components into covariant components. (For example, if the basis
vectors are scaled, the covariant components $(v_i)_{i=1}^n$ are scaled in the same way.)
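The opposite scaling behaviour of the two kinds of components can be seen in a small computation. The sketch below assumes the plane with the standard dot product as inner product and an arbitrary non-orthogonal basis. The function names are illustrative only.

```python
from fractions import Fraction

def dot(a, b):
    """Standard dot product, used here as the inner product."""
    return sum(s * t for s, t in zip(a, b))

def covariant(v, basis):
    """v_i = <e_i, v>: one independent computation per index."""
    return [dot(e, v) for e in basis]

def contravariant_2d(v, basis):
    """Solve v = v^1 e_1 + v^2 e_2 by Cramer's rule; the result depends on both basis vectors."""
    e1, e2 = basis
    det = e1[0] * e2[1] - e1[1] * e2[0]
    return [(v[0] * e2[1] - v[1] * e2[0]) / det,
            (e1[0] * v[1] - e1[1] * v[0]) / det]

v = (Fraction(3), Fraction(2))
basis = [(Fraction(1), Fraction(0)), (Fraction(1), Fraction(1))]
scaled = [(Fraction(2), Fraction(0)), basis[1]]  # e_1 scaled by the factor 2

upper, lower = contravariant_2d(v, basis), covariant(v, basis)
upper_scaled, lower_scaled = contravariant_2d(v, scaled), covariant(v, scaled)
```

Doubling $e_1$ halves $v^1$ and doubles $v_1$, while the components for the unchanged basis vector $e_2$ stay the same in this example.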


73.5.9 REMARK: The relative simplicity of computation of covariant components of vectors. 

The covariant component tuple $(v_i)_{i=1}^n$ of a vector $v \in T_p(M)$ with respect to an inner product $g(p) \in T_p^{0,2}(M)$
is simpler to compute than the corresponding contravariant component tuple $(v^i)_{i=1}^n$. Given a basis $(e_i)_{i=1}^n$
for $T_p(M)$, the covariant tuple is given by $v_i = g(p)(v, e_i)$ for $i \in \mathbb{N}_n$, which is a single computation for
each index $i$ individually, independent of the other indices. But the contravariant tuple effectively requires
the solution of the equation $v = \sum_{i=1}^n v^i e_i$ for the components $(v^i)_{i=1}^n$. If any one of the basis vectors is
altered, then in general, the entire contravariant component tuple must be recomputed because the tuple
depends on all of the basis vectors. So it seems that covariant coordinates are simpler to compute. However,
in applications the contravariant component tuple is more useful because most of the useful functions of
vectors are linear. Contravariant components are also well defined for the broad category of differentiable
manifolds, whereas covariant components require a non-singular metric tensor field.


The classical Euclidean construction for covariant components of vectors is also relatively simple. The most
obvious method is to simply “drop a perpendicular” $PP'$ from the end-point $P$ of a vector $OP$ to the line
extending a given basis vector $OA$ as in Figure 26.11.1 in Remark 26.11.3. The length of the projection $OP'$
may then be multiplied by the length of $OA$ to obtain the covariant component. (In classical Euclidean geometry the length of $OA$
must be a rational number.) By contrast, the classical Euclidean computation for contravariant components 
requires the construction of a parallelepiped, whose edges must then be divided by the corresponding unit 
vector lengths. This parallelepiped depends on all of the unit vectors. Constructions in plane Euclidean 
geometry for contravariant and covariant coordinates are illustrated in Figure 73.5.1. 

The use of right-angle symbols for the covariant coordinate constructions in Figure 73.5.1 is significant.
This is direct “evidence” that covariant coordinates require the metric, whereas the contravariant coordinate
constructions do not. Contravariant coordinates require only the affine space structure, and are invariant
under affine transformations.

[Figure: two panels of two-step plane constructions. Left panel, contravariant coordinates x and y: x = |OX|/|OA|, y = |OY|/|OB|. Right panel, covariant coordinates x and y: x = |OX|·|OA|, y = |OY|·|OB|. In each panel, Step 1 constructs X and Step 2 constructs Y.]
Figure 73.5.1 Contravariant versus covariant coordinate constructions


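73.5.10 THEOREM: Application of musical isomorphisms to general and chart-basis vectors and covectors.
Let $M$ be a $C^1$ manifold with Riemannian metric tensor field $g$ and $n = \dim(M)$.


(i) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall v \in \mathbb{R}^n,\ \mu^\flat_g(t_{p,v,\psi}) = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, v^i e^j_{p,\psi}$.
(ii) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall w \in \mathbb{R}^n,\ \mu^\sharp_g(t^*_{p,w,\psi}) = \sum_{i,j=1}^n g_\psi^{ij}(\psi(p))\, w_i e_j^{p,\psi}$.
(iii) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\ \mu^\flat_g(e_i^{p,\psi}) = \sum_{j=1}^n g^\psi_{ij}(\psi(p))\, e^j_{p,\psi}$.
(iv) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall i \in \mathbb{N}_n,\ \mu^\sharp_g(e^i_{p,\psi}) = \sum_{j=1}^n g_\psi^{ij}(\psi(p))\, e_j^{p,\psi}$.
PROOF: For part (i), let $p \in M$, $\psi \in \mathrm{atlas}_p(M)$ and $v \in \mathbb{R}^n$. Let $U \in T_p(M)$. Then $U = t_{p,u,\psi}$ for some
$u \in \mathbb{R}^n$. So by Definition 73.5.3,


    $\mu^\flat_g(t_{p,v,\psi})(U) = g(p)(t_{p,v,\psi}, t_{p,u,\psi})$
    $\qquad = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, v^i u^j$    (73.5.2)
    $\qquad = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, v^i e^j_{p,\psi}(U)$    (73.5.3)
    $\qquad = \bigl(\sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, v^i e^j_{p,\psi}\bigr)(U),$    (73.5.4)


where line (73.5.2) follows from Theorem 73.3.4, and line (73.5.3) follows from Theorem 55.3.6 line (55.3.1),
and line (73.5.4) follows from Definition 23.6.4. Hence $\mu^\flat_g(t_{p,v,\psi}) = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, v^i e^j_{p,\psi}$.
For part (ii), let $p \in M$, $\psi \in \mathrm{atlas}_p(M)$ and $w \in \mathbb{R}^n$. Let $U = \mu^\sharp_g(t^*_{p,w,\psi})$. Then $\mu^\flat_g(U) = t^*_{p,w,\psi}$
by Definition 73.5.4. Let $U = t_{p,u,\psi}$ with $u \in \mathbb{R}^n$. Then $t^*_{p,w,\psi} = \sum_{i,j=1}^n g^\psi_{ij}(\psi(p))\, u^i e^j_{p,\psi}$ by part (i).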


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 


2290 73. Riemannian manifolds 


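Therefore $\forall j \in \mathbb{N}_n,\ w_j = \sum_{i=1}^n g^\psi_{ij}(\psi(p))\, u^i$ by Theorem 55.3.9. So $\forall k \in \mathbb{N}_n,\ \sum_{j=1}^n g_\psi^{jk}(\psi(p))\, w_j = u^k$ by
Definition 73.3.8. Hence $\mu^\sharp_g(t^*_{p,w,\psi}) = t_{p,u,\psi} = \sum_{k=1}^n u^k e_k^{p,\psi} = \sum_{j,k=1}^n g_\psi^{jk}(\psi(p))\, w_j e_k^{p,\psi}$.
Part (iii) follows from part (i) and Notation 54.4.10.
Part (iv) follows from part (ii) and Notation 55.3.3.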


73.6. Curve length 


73.6.1 REMARK: Requirements for the definition of curve length. 
Continuous curves are well defined in any topological space. (See Definition 36.2.3.) Therefore curves are 
well defined in differentiable manifolds, in particular in Riemannian manifolds. 


Curve length in Riemannian manifolds is well defined for curves which are differentiable almost everywhere.
(See Section 51.9 for differentiable curves. See Section 50.7 for Lipschitz manifolds and rectifiable curves.
See Notation 56.6.3 for $\mathscr{L}^+_2(T(M))$.)


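73.6.2 DEFINITION: The length of a $C^1$ curve $\gamma : I \to M$ in a $C^1$ Riemannian manifold $M$ is

    $\int_I g_{\gamma(t)}(\gamma'(t), \gamma'(t))^{1/2}\, dt,$

where $g \in X^0(\mathscr{L}^+_2(T(M)))$ is the metric field on $M$.


73.6.3 REMARK: Differentiation of the length of a curve with respect to its parameter.

It is clear that $(d/dt)L_\gamma(t) = g_{\gamma(t)}(\gamma'(t), \gamma'(t))^{1/2}$ for all $t \in I$, where $L_\gamma(t') = \int_a^{t'} g_{\gamma(t)}(\gamma'(t), \gamma'(t))^{1/2}\, dt$
for $t' \in I = (a, b)$, whenever $\gamma$ is a $C^1$ curve in a Riemannian manifold $M$. It is perhaps not so immediately
obvious that $\frac{1}{2}(d^2/dt^2)\bigl(L_\gamma(t) - L_\gamma(t')\bigr)^2\big|_{t=t'} = g_{\gamma(t')}(\gamma'(t'), \gamma'(t'))$.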
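The length integral of Definition 73.6.2 can be approximated numerically once a curve is expressed in chart coordinates. The following sketch assumes the polar-coordinate component matrix diag$(1, r^2)$ of the Euclidean plane and uses a midpoint rule. The function `curve_length` and the two sample curves are illustrative assumptions, not constructions from the text.

```python
import math

def polar_metric(x):
    """Component matrix diag(1, r^2) in polar coordinates x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def curve_length(metric, curve, velocity, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of g(gamma', gamma')**0.5 over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        g = metric(curve(t))   # component matrix at the point gamma(t)
        w = velocity(t)        # chart components of gamma'(t)
        n = len(w)
        speed_squared = sum(g[i][j] * w[i] * w[j] for i in range(n) for j in range(n))
        total += math.sqrt(speed_squared) * h
    return total

# Half circle of radius 2: gamma(t) = (2, t) for t in [0, pi].
half_circle = curve_length(polar_metric, lambda t: (2.0, t), lambda t: (0.0, 1.0),
                           0.0, math.pi)
# Radial segment: gamma(t) = (1 + t, 0.3) for t in [0, 3].
radial = curve_length(polar_metric, lambda t: (1.0 + t, 0.3), lambda t: (1.0, 0.0),
                      0.0, 3.0)
```

Both sample curves have constant speed with respect to the metric, so the midpoint rule reproduces the Euclidean lengths $2\pi$ and $3$ up to rounding.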


73.7. Distance functions 


73.7.1 REMARK: Conditions for the well-definition of distance on a Riemannian manifold. 

A distance function may be constructed from the metric tensor field on a Riemannian manifold. Some 
immediate issues which arise are whether this distance function is well defined, and how regular it is. A 
distance function $d : M \times M \to \mathbb{R}_0^+$ on a manifold $M$ must satisfy Definition 37.2.3.


Another interesting kind of question about Riemannian manifold distance functions is whether they may 
be differentiated to recover the original metric tensor field, and whether a distance function on a manifold 
which is not obtained by integrating a metric tensor field may be differentiated and then integrated again 
to obtain the original distance function. (These questions are reminiscent of the fundamental theorem of 
calculus, which says that integration and differentiation are essentially inverse operations.) These further 
questions are discussed in Section 73.9. The corresponding questions about pseudo-Riemannian manifolds 
will be discussed in Section 75.2. 


The basic formula for the distance function in a Riemannian manifold $(M, g)$ is


    $\forall x, y \in M,\quad d(x, y) = \inf_{\gamma \in \mathcal{C}_{x,y}} \int_{\mathrm{Dom}(\gamma)} \sqrt{g_{\gamma(t)}(\gamma'(t), \gamma'(t))}\, dt,$    (73.7.1)

where $\mathcal{C}_{x,y}$ is some class of continuous curves from $x$ to $y$ for each $x, y \in M$. To obtain any curves at all,
$x$ and $y$ must be in the same topological component of $M$. One could define the distance $d(x, y)$ to equal
$+\infty$ when $x$ and $y$ are in different components, but it is customary to assume that the manifold is connected
so as to avoid such questions.


To obtain a well-defined tangent bundle with an $n$-dimensional tangent space at each point, for $n = \dim(M)$,
the manifold must be $C^1$ differentiable. (A slightly weaker condition might suffice, but it is probably not
worth the effort.)


Each $\gamma \in \mathcal{C}_{x,y}$ must be a continuous curve of the form $\gamma : [a, b] \to M$ with $\gamma(a) = x$ and $\gamma(b) = y$, and the
tangent vector $\gamma'(t) \in T_{\gamma(t)}(M)$ must at least be defined almost everywhere (with respect to the Lebesgue
measure on $\mathbb{R}$) on the interval $(a, b)$.


The metric tensor field $g \in X(\mathscr{L}^+_2(T(M)))$ must be Riemann integrable on all curves in the curve class $\mathcal{C}_{x,y}$
for all $x, y \in M$, when applied as indicated in line (73.7.1). (See Notation 56.6.3 for $\mathscr{L}^+_2(T(M))$.)

73.7.2 REMARK: The Hopf-Rinow geodesic completeness theorem.

It was shown by Hopf and Rinow in 1931 that under specific conditions on a differentiable manifold, every pair
of points in the manifold can be joined by a $C^2$ curve whose length is equal to the distance between the two
points. (See Heinz Hopf, Willi Rinow, “Über den Begriff der vollständigen differentialgeometrischen Fläche”,
Commentarii Mathematici Helvetici, 3 (1931), 209-225. See also Lee [24], pages 91-113; Cheeger/Ebin [5],
pages 9-11; Bishop/Crittenden [2], pages 152-158; Lang [23], pages 225-226; Gómez-Ruiz [14], pages 233-237;
Do Carmo [9], pages 144-149; Gallot/Hulin/Lafontaine [13], pages 94-95; Poor [32], page 136; Spivak [37],
Volume 1, pages 342-343; Guggenheimer [16], pages 285-286; Choquet-Bruhat [6], page 120; Kobayashi/
Nomizu [19], pages 172-179; Frankel [12], page 564.)


73.8. Calculus of variations for extremisation of distance 
((2016-8-14. In Section 73.8, derive the Christoffel array by extremising the length of a curve. )) 


73.8.1 REMARK:  Geodesic curves are defined in terms of the calculus of variations. 

Section 73.8 deals with existence and uniqueness for calculus of variations. This is required for deriving the 
equations for geodesic curves from the property that they locally extremise a distance integral. (For some 
references to the literature on calculus of variations, see Remark 44.8.1.) 


A derivation of the Levi-Civita connection from length-extremisation via the Euler-Lagrange equations is 
given, for example, by Synge/Schild [41], pages 37-41. However, although the calculus of variations does 
yield geodesics, the affine connection obtained in this way is in general not unique because the torsion 
is not determined by length-extremisation alone. The torsion is set to zero by arbitrarily symmetrising 
the Christoffel-like array of numbers which arise from the extremisation procedure. Since the asymmetric 
component has no influence on geodesics, it may be freely chosen. This leaves open the question of how 
to justify geometrically the choice of the affine connection which has zero torsion from amongst all of the 
connections which yield the same set of affine geodesics on a Riemannian manifold. 


73.8.2 REMARK: History of geodesic curves based on the calculus of variations.

It is pointed out by Do Carmo [9], pages 60-61, that Johann Bernoulli (also known as Jean or John Bernoulli) 
knew in 1697 the characterisation of geodesics on convex surfaces as curves for which the acceleration within 
the ambient space is normal to the tangent plane of the surface, and that in 1732, Euler investigated equations
for geodesics on surfaces $S \subset \mathbb{R}^3$ of the form $S = \{x \in \mathbb{R}^3;\ f(x) = 0\}$ for some function $f : \mathbb{R}^3 \to \mathbb{R}$. (The
Bernoulli characterisation is derived by Frankel [12], pages 233-234.) Boyer/Merzbach [237], pages 396-397, 
wrote the following. 


Jean Bernoulli wrote prolifically on many advanced aspects of analysis—the isochrone, solids of 
least resistance, the catenary, the tractrix, trajectories, caustic curves, isoperimetric problems— 
achieving a reputation that led to his being called to Basel in 1705 to fill the chair left vacant by his 
brother’s death. He is frequently regarded as the inventor of the calculus of variations, because of 
his proposal in 1696-1697 of the problem of the brachistochrone, and he contributed to differential 
geometry through his work on geodesic lines on a surface. 


Cajori [241], page 234, describes how Euler presented the calculus of variations in a book in 1744, 


[...] which, displaying an amount of mathematical genius seldom rivalled, contained his researches 
on the calculus of variations to the invention of which Euler was led by the study of the researches of 
Johann and Jakob Bernoulli. [...] Johann Bernoulli’s problem of the brachistochrone, solved by him 
in 1697, and by his brother Jakob in the same year, stimulated Euler. The study of isoperimetrical 
curves, the brachistochrone in a resisting medium and the theory of geodesics, previously treated by 
the elder Bernoullis and others, led to the creation of this new branch of mathematics, the Calculus 
of Variations. 
Ball [232], page 396, made a similar comment. 

The classic problems on isoperimetrical curves, the brachistochrone in a resisting medium, and the 
theory of geodesics (all of which had been suggested by his master, John Bernoulli) had engaged 
Euler’s attention at an early date; and in solving them he was led to the calculus of variations. 


Despite the close association between geodesics and extremisation problems in their early history, these
topics are disassociated in the modern abstract treatment in affinely connected manifolds. The calculus of
variations, applied to the minimisation of distance between two points, yields local equations in terms of an
affine connection, which has been generalised beyond the metric space context.


73.9. Metric tensors versus distance functions 


((2016-8-14. Section 73.9. Reinstate Theorems t46 and t451 on the Hessian formula for a Riemannian metric 
tensor. )) 


73.9.1 REMARK: Recovery of the metric tensor field from the point-to-point distance function. 
The Riemannian metric tensor’s component matrix $g = [g_{ij}]_{i,j=1}^n$ and the point-to-point metric function $d$
(presented in Chapter 37) are related by the following formula.


    $g_{ij}(x) = \frac{1}{2}\, \frac{\partial^2 d(x, y)^2}{\partial y^i\, \partial y^j}\Big|_{y=x}.$    (73.9.1)


This may be abbreviated to $g_{ij}(x) = \frac{1}{2}\partial_{ij}d^2(x, \cdot)$. That is, the Riemannian metric tensor's component
matrix $g$ at a point $x$ equals half the matrix of second-order derivatives of the square of the two-point
distance function $d$ with respect to the coordinates. (To avoid technicalities, points and coordinates are used
interchangeably here.)


A Riemannian manifold is a combination of differentiable manifold and metric space structures such that
the square of the distance function is twice differentiable. The two-point distance function may be recovered
from the metric tensor by integrating and minimising.


    $d(x, y) = \min_{\gamma} \int_\gamma \sqrt{g_{ij}\, dx^i dx^j}$
    $\qquad\quad = \min_{\gamma} \int \sqrt{g_{\gamma(t)}(\gamma'(t), \gamma'(t))}\, dt.$


The point-to-point distance from x to y is defined to be the minimum integral of the metric tensor over 
all differentiable curves from x to y. A Riemannian manifold may therefore be defined in terms of either 
a metric tensor g or a two-point distance function d. These are two representations of the same distance 
function. This is similar to the way in which parallelism may be specified either as a point-to-point pathwise 
parallel transport function (as described in Sections 21.15 and 21.16 and Chapter 48), or differentially as a 
connection on a fibre bundle (as described in Chapters 67, 68 and 69). 
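Formula (73.9.1) can be checked numerically in the simplest case. The sketch below assumes a flat metric whose component matrix `G` is constant in the chosen coordinates, so that the distance function is $d(x,y) = ((y-x)^T G\,(y-x))^{1/2}$, and then recovers `G` as half the Hessian of $d(x,\cdot)^2$ at $y = x$ by central differences. The matrix values and function names are assumptions made only for this illustration.

```python
import math

G = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical constant, positive definite component matrix

def distance(x, y):
    """d(x, y) = ((y - x)^T G (y - x))**0.5 for the constant matrix G."""
    w = [y[i] - x[i] for i in range(2)]
    return math.sqrt(sum(G[i][j] * w[i] * w[j] for i in range(2) for j in range(2)))

def half_hessian_of_squared_distance(x, h=1e-3):
    """Central-difference estimate of half the second derivatives of d(x, y)^2 at y = x."""
    def f(y):
        return distance(x, y) ** 2

    def shifted(i, si, j, sj):
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return y

    result = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            second = (f(shifted(i, 1, j, 1)) - f(shifted(i, 1, j, -1))
                      - f(shifted(i, -1, j, 1)) + f(shifted(i, -1, j, -1))) / (4.0 * h * h)
            result[i][j] = 0.5 * second
    return result

recovered = half_hessian_of_squared_distance([0.7, -1.2])
```

Because the squared distance is exactly quadratic in this flat example, the finite-difference estimate agrees with `G` up to rounding. For a curved metric the same computation would only be approximate.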


73.9.2 REMARK: Analysis of the relation between distance functions and Riemannian metrics. 
The task here is to determine the precise relation between the Riemannian metric and the point-to-point 
distance function of topological space theory. The latter is defined as a function d : M x M > Ri on a set 
M which satisfies the three conditions of identity d(r,x) = 0, symmetry d(x, y) = d(y,x) and the triangle 
inequality d(x,z) € d(x,y) + d(y, z). This may be referred to as a “two-point distance function". 


When the set M is a manifold, it is possible to regard the distance function as a function d : Range(w) x 
Range(w) > IR" defined by d : (x,y)  d(V- 1 (x), Yt (y)), where v) is a chart for M. Then for the distance 
function d to correspond with a Riemannian metric g = (gi;)7;-, in terms of local coordinates, one would 
expect equation (73.9.2) to be satisfied. 


Va,y € Range(w), d(x,y) = (gij (x) (y! — a*)(y? — zi)? +o(jy—2|) as y> z. (73.9.2) 
This implies, and (probably) is implied by equation (73.9.3). 

Yz, y € Range(), d(x,y)? = gi (zx)(y! — z*)(y? — x) + o(ly — z|?) asy a. (73.9.3) 
Here |y — x| denotes the standard Euclidean norm in R”. It is probably true that the manifold M is a 
Riemannian manifold whose metric tensor is g if and only if equation (73.9.3) is satisfied for all x in every 


chart and g is continuous and positive definite. 
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It may follow from (73.9.3) that the second derivatives of d(x, y)? with respect to y exist for all x by using 
the continuity of g. With a bit of luck, these derivatives might be continuous with respect to x too. 


The above equations may be equivalent to the following. 


8; (d(x, -)) = (gila jovi, (73.9.4) 
which means 


lim t !d(z,z + tv) = (gi; (z)viv?)/2, 
tot 


for all v € IR". It may be that if all conditions are taken together, such as continuity of d and the triangle 
inequality, then the second derivatives of d? may exist and be continuous. 


In the other direction, the distance function can be generated from the Riemannian metric by minimisation
over curves as follows.

$$d(x,y) = \min_{\gamma} \int_{\gamma} \sqrt{g_{ij}\,dx^i\,dx^j},$$

where the minimisation is over suitable $C^1$ curves $\gamma$ from $x$ to $y$ in the usual way. If $g$ is continuous, then for
$y$ in a small enough neighbourhood of $x$, the minimising curve $\gamma$ should be unique and $C^2$. By aligning the
minimising curves with the radial directions out of $x$ (normal coordinates), equations (73.9.2) and (73.9.3)
should be recovered.
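The expected relation (73.9.3) can be checked numerically in a simple case. The following Python sketch is not taken from the text. It assumes the unit sphere with (colatitude, longitude) coordinates, the round metric $g = \mathrm{diag}(1, \sin^2\theta)$ and the great-circle distance, and all function names are illustrative. It compares $d(x, x+tv)^2$ with $t^2 g_{ij}(x)v^iv^j$ for decreasing $t$.

```python
import math

def sphere_distance(x, y):
    """Great-circle distance on the unit sphere (an assumed example space).

    Points are (colatitude, longitude) pairs. The haversine form is used
    because it stays accurate for nearby points.
    """
    th1, ph1 = x
    th2, ph2 = y
    hav = (math.sin((th2 - th1) / 2.0) ** 2
           + math.sin(th1) * math.sin(th2) * math.sin((ph2 - ph1) / 2.0) ** 2)
    return 2.0 * math.asin(math.sqrt(hav))

def metric_quadratic_form(x, v):
    """g_ij(x) v^i v^j for the round metric g = diag(1, sin^2(colatitude))."""
    return v[0] ** 2 + math.sin(x[0]) ** 2 * v[1] ** 2

def squared_distance_ratio(x, v, t):
    """d(x, x + t v)^2 / (t^2 g_ij(x) v^i v^j); (73.9.3) predicts the limit 1."""
    y = (x[0] + t * v[0], x[1] + t * v[1])
    return sphere_distance(x, y) ** 2 / (t ** 2 * metric_quadratic_form(x, v))

x = (1.0, 0.5)   # base point, away from the coordinate poles
v = (0.3, 0.4)   # direction in coordinate space
steps = [1e-1, 1e-2, 1e-3]
errors = [abs(squared_distance_ratio(x, v, t) - 1.0) for t in steps]
for t, err in zip(steps, errors):
    print(f"t = {t:g}: |ratio - 1| = {err:.2e}")
```

The deviation of the ratio from 1 should shrink roughly in proportion to $t$, which is what the $o(|y-x|^2)$ remainder in (73.9.3) requires.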


73.9.3 EXAMPLE: Riemannian metric calculation for p-norm distance functions. 

A simple example shows that a Riemannian distance function is a very special kind of distance function.
Consider the set $\mathbb{R}^n$ for $n \ge 2$ with the distance function $d : (x,y) \mapsto |y - x|_p$, where the $p$-norm is defined
as usual by $|x|_p = (\sum_{i=1}^n |x^i|^p)^{1/p}$ for $1 \le p < \infty$, and $|x|_\infty = \max_{i=1}^n |x^i|$ in the $p = \infty$ case. Clearly $d$
corresponds to a Riemannian metric if and only if $p = 2$. (See Definition 24.7.11 for the $p$-norm.)

Consider the value of $d(0,y)$ for $n = 2$. The value is $(|y^1|^p + |y^2|^p)^{1/p}$. For a Riemannian metric, the square
$d(0,y)^2$ must converge to a quadratic function of $y$ as $y \to 0$, as in equation (73.9.3). This can only happen for $p = 2$.

A change of coordinates can remove the problem at a single point, but not at all points in $\mathbb{R}^n$. This kind
of example makes it clear that distance functions can only be Riemannian if they are in some sense locally
affine distortions of Euclidean space with the 2-norm.
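The failure for $p \ne 2$ can be made concrete with the parallelogram identity $Q(u+v) + Q(u-v) = 2Q(u) + 2Q(v)$, which every quadratic form $Q$ satisfies. The Python sketch below is an illustration added here, not part of the text, and the function names are hypothetical. It evaluates the defect of this identity for $Q(y) = d(0,y)^2 = |y|_p^2$ in $\mathbb{R}^2$. Since $|y|_p^2$ is homogeneous of degree 2, a non-zero defect cannot disappear in the limit $y \to 0$.

```python
import math

def p_norm(y, p):
    """The p-norm of a tuple y for 1 <= p <= infinity."""
    if math.isinf(p):
        return max(abs(c) for c in y)
    return sum(abs(c) ** p for c in y) ** (1.0 / p)

def parallelogram_defect(p, u=(1.0, 0.0), v=(0.0, 1.0)):
    """Q(u+v) + Q(u-v) - 2Q(u) - 2Q(v) for Q(y) = |y|_p^2.

    The defect vanishes for every quadratic form, so a non-zero value
    shows that Q is not a quadratic function of y.
    """
    def q(y):
        return p_norm(y, p) ** 2
    u_plus_v = tuple(a + b for a, b in zip(u, v))
    u_minus_v = tuple(a - b for a, b in zip(u, v))
    return q(u_plus_v) + q(u_minus_v) - 2.0 * q(u) - 2.0 * q(v)

for p in (1.0, 1.5, 2.0, 3.0, math.inf):
    print(f"p = {p}: defect = {parallelogram_defect(p):+.6f}")
```

For the default vectors the defect equals $2 \cdot 2^{2/p} - 4$, which is zero only for $p = 2$. A vanishing defect for one pair of vectors would not by itself prove that $Q$ is quadratic, but a non-zero defect is enough to rule it out.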


73.9.4 DEFINITION: The Riemannian metric tensor field induced by a distance function $d : M \times M \to \mathbb{R}_0^+$
on a $C^2$ manifold $M$ for which the map $Q_p : M \to \mathbb{R}_0^+$ given by $Q_p : q \mapsto \frac{1}{2} d(p,q)^2$ has a well-defined Hessian
at $p$, is the tensor field $g \in X(T^{0,2}(M))$ given as the Hessian of $Q_p$ at $p$ for each $p \in M$. In other words,

$$\forall p \in M,\quad g(p)(u,v) = H_{Q_p,p}(u,v),$$

where the Hessian $H_{Q_p,p}$ of $Q_p$ at $p$ is as in Definition 71.9.5.


73.9.5 EXAMPLE: Distance function and metric tensor field for a torus. 

The torus distance function in Definition 37.2.12 gives a metric space structure to the set $M = \times_{k=1}^n I_k$,
where for each $k \in \mathbb{N}_n$, the set $I_k$ is a semi-open interval $I_k = [a_k, a_k + L_k)$ or $I_k = (a_k, a_k + L_k]$ for some
$a_k \in \mathbb{R}$ and $L_k \in \mathbb{R}^+$. If half the square of this distance function $d : M \times M \to \mathbb{R}_0^+$, given by

$$\forall x,y \in M,\quad d(x,y) = \Big(\sum_{k=1}^n \min\big(|y_k - x_k|,\ L_k - |y_k - x_k|\big)^2\Big)^{1/2},$$

is twice differentiated with respect to $y$ at $y = x$, the result is $g_{ij}(x) = \delta_{ij}$ for all $x \in M$. This is because at
short distances, the distance function on such a torus is identical to the distance function on the Euclidean
space $\mathbb{R}^n$.
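This calculation can be reproduced numerically. The Python sketch below is an added illustration with hypothetical function names. It assumes $n = 2$, $a_k = 0$ and side lengths $L = (1, 2)$, and approximates the Hessian of $y \mapsto \frac{1}{2} d(x,y)^2$ at $y = x$ by central differences.

```python
import math

def torus_distance(x, y, lengths):
    """Torus distance: per-coordinate wrap-around gap, combined with the 2-norm."""
    total = 0.0
    for xk, yk, side in zip(x, y, lengths):
        gap = abs(yk - xk)
        total += min(gap, side - gap) ** 2
    return math.sqrt(total)

def half_squared_distance(x, y, lengths):
    """The function Q_x(y) = d(x, y)^2 / 2 of Definition 73.9.4."""
    return 0.5 * torus_distance(x, y, lengths) ** 2

def hessian_at_diagonal(x, lengths, h=1e-3):
    """Central-difference Hessian of y -> Q_x(y), evaluated at y = x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def q(si, sj):
                # Shift coordinates i and j of x by si*h and sj*h.
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return half_squared_distance(x, y, lengths)
            hess[i][j] = (q(1, 1) - q(1, -1) - q(-1, 1) + q(-1, -1)) / (4.0 * h * h)
    return hess

lengths = (1.0, 2.0)   # assumed side lengths L_1, L_2
x = (0.2, 1.9)         # assumed base point
hessian = hessian_at_diagonal(x, lengths)
for row in hessian:
    print([round(entry, 6) for entry in row])
```

The printed matrix should be the identity up to rounding, in agreement with $g_{ij}(x) = \delta_{ij}$.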
73.10. Slightly non-Riemannian metric tensor fields


73.10.1 EXAMPLE: Geodesics for a slightly non-Riemannian metric 

Define a $C^\infty$ manifold $M \mathrel{-\!\!<} (M, A_M)$ with point set $M = \mathbb{R}^2$ and differentiable manifold atlas $A_M = \{\psi\}$
where $\psi = \mathrm{id}_{\mathbb{R}^2}$. Define a tangent bundle on $M$ by $T(M) = \{t_{p,v,\psi};\ p \in M,\ v \in \mathbb{R}^2\}$. Define a metric
tensor field $g \in X(T^{0,2}(M))$ by $g(p)(v,w) = 9p_1^4 v_1 w_1 + v_2 w_2$ for $p \in M$ and $v,w \in \mathbb{R}^2$. Then $g(p)$ is positive
semi-definite for $p \in M$, and positive definite for $p \in M' = \{x \in \mathbb{R}^2;\ x_1 \ne 0\}$. The Levi-Civita connection
for $(M', g)$ has coefficient array $\Gamma^i_{jk}(p) = 2p_1^{-1}\delta_{i1}\delta_{j1}\delta_{k1}$ for $p \in M'$ and $i,j,k \in \mathbb{N}_2$. (See Definition 74.3.4.)
Define curves $\gamma_{k,c} : \mathbb{R} \to M$ by $\gamma_{k,c} : t \mapsto ((kt + c)^{1/3}, t)$ for $k,c \in \mathbb{R}$. (See Figure 73.10.1.) Then the curves
$\gamma_{k,c}$ are affinely parametrised geodesics in $M'$ for all $k,c \in \mathbb{R}$ by Theorem 72.1.6. (See Example 74.3.5 for a
class of metric tensor fields which generalises this example.)


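[Figure 73.10.1: plot of the curves $\gamma_{1,-0.5}(t) = ((t - 0.5)^{1/3}, t)$, $\gamma_{1,0}(t) = (t^{1/3}, t)$, $\gamma_{2,0}(t) = ((2t)^{1/3}, t)$ and $\gamma_{-1,0}(t) = ((-t)^{1/3}, t)$.]

Figure 73.10.1 Affinely parametrised geodesics for a slightly non-Riemannian metric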
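The claims of Example 73.10.1 can be checked away from the singular line $x_1 = 0$. The Python sketch below is an added illustration with hypothetical function names. It uses finite differences to confirm that $\Gamma^1_{11}(p) = 2/p_1$ for $g_{11} = 9x_1^4$, and that the first component of $\gamma_{k,c}$ satisfies the geodesic equation $\ddot{x}_1 + \Gamma^1_{11}\dot{x}_1^2 = 0$. The second component $x_2 = t$ satisfies $\ddot{x}_2 = 0$ trivially because all other connection coefficients vanish.

```python
import math

def cube_root(s):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(s) ** (1.0 / 3.0), s)

def gamma_first(t, k, c):
    """First component of the curve gamma_{k,c}(t) = ((k t + c)^(1/3), t)."""
    return cube_root(k * t + c)

def g11(x1):
    """Metric component g_11 = 9 x_1^4 of Example 73.10.1."""
    return 9.0 * x1 ** 4

def christoffel_111(x1, h=1e-5):
    """Gamma^1_11 = (1/2) g^11 d_1 g_11, with a central difference for d_1 g_11."""
    dg11 = (g11(x1 + h) - g11(x1 - h)) / (2.0 * h)
    return 0.5 * dg11 / g11(x1)

def geodesic_residual(t, k, c, h=1e-4):
    """Finite-difference value of x1'' + Gamma^1_11 (x1')^2 along gamma_{k,c}."""
    x_minus, x_mid, x_plus = (gamma_first(t + s * h, k, c) for s in (-1, 0, 1))
    velocity = (x_plus - x_minus) / (2.0 * h)
    acceleration = (x_plus - 2.0 * x_mid + x_minus) / (h * h)
    return acceleration + christoffel_111(x_mid) * velocity ** 2

# Sample (k, c, t) values chosen so that k t + c is non-zero.
samples = [(1.0, -0.5, 1.5), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]
residuals = [geodesic_residual(t, k, c) for (k, c, t) in samples]
for (k, c, t), res in zip(samples, residuals):
    print(f"k = {k}, c = {c}, t = {t}: residual = {res:.2e}")
```

The residuals should be zero up to discretisation and rounding error. This reflects the fact that the substitution $u = x_1^3$ turns $g$ into the Euclidean metric $du^2 + dx_2^2$, for which the curves $u = kt + c$, $x_2 = t$ are straight lines.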


73.10.2 REMARK: Difficulties with metric tensor fields which are not positive definite. 

The curves $\gamma_{k,c}$ in Example 73.10.1 are affinely parametrised geodesics because they are distance-minimising
curves. The metric function $g$ is defined on all of $M$. So the curve length $L_{a,b}(\gamma) = \int_a^b g(\gamma'(t), \gamma'(t))^{1/2}\,dt$
is well defined for all $C^1$ curves in $M$. The only thing that fails here is the calculation of the Levi-Civita
connection coefficients, so that the geodesics cannot be calculated in terms of these coefficients.


Two obvious responses to this situation are to (1) exclude metric functions which are not positive definite, 
or (2) extend the affine connection coefficients concept to situations where the metric tensor vanishes. The 
second approach seems preferable because singularities are encountered in general relativity scenarios where 
the connection coefficients are not defined, but where geodesics are nevertheless well defined. A fly in the 
ointment here is that changes of coordinates which transform a positive definite metric tensor field to a 
metric which is not positive definite do not have an invertible Jacobian matrix. This would then require a 
generalisation of the definition of a differentiable manifold. However, this might not be such a bad thing. 


73.10.3 REMARK: The advantages of calculating a metric tensor field from the distance function. 
Examples 73.10.1 and 74.3.5 show how to generate interesting classes of metric tensor fields for purposes of 
demonstrating concepts or providing counterexamples to conjectures by calculating half the Hessian of the 
square of point-to-point distance functions. Such classes of metric tensor fields have the advantage that it is 
often very much easier to determine the geodesics which correspond to the Levi-Civita connection.

((2016-8-14. Overhaul/rewrite Chapter 74 because it is incomplete and inconsistent, and sometimes incorrect.))


74.1. Levi-Civita parallelism 


74.1.1 REMARK: The historical origin of Levi-Civita parallelism. 
Do Carmo [9], page 48, stresses the importance of the Levi-Civita connection. 


A fundamental event in the development of differential geometry was the introduction, in 1917, of 
the Levi-Civita parallelism. 


The justification of this parallelism which was given in 1926 by Levi-Civita [26], pages 100-105, is somewhat 
unsatisfactory because it relies heavily on geometric intuition of an infinitesimal association of curved surfaces 
with “developable surfaces” (which are essentially folded planes with zero Gaussian curvature). Do Carmo [9],
page 48, explains the covariant derivative of a vector field along a curve as the “orthogonal projection" of the 
extrinsic derivative onto the tangent plane at each point of the curve. The geometrical construction described 
by both Levi-Civita [26], pages 100-105, and Do Carmo [9], page 49, requires a ruled surface which is tangent 
to the curve at every point, which should be possible within a neighbourhood of the curve if it has bounded 
curvature. Then the projection of the derivative of a vector field along the curve onto this ruled surface 
is effectively defined on a flat Euclidean space. This very subtle, apparently extrinsic construction yields a 
derivative which depends only on the intrinsic metric tensor field. But most importantly, the Levi-Civita 
parallelism is the foundation of covariant differentiation, which is probably the most important unifying 
concept of Riemannian geometry. 


74.1.2 REMARK: The relation of the Levi-Civita connection to the Riemannian metric. 

A Riemannian metric induces a canonical affine connection called the Levi-Civita connection. (There is 
also a more general class of affine connections called “metric connections" which are more weakly consistent 
with a Riemannian metric.) The Levi-Civita connection is an orthogonal connection, which means that it 
preserves angles and length. The Levi-Civita connection does not uniquely determine the metric tensor. 
This is why the Riemannian metric is in a higher layer than the affine connection. 


74.1.3 REMARK: The relation between geodesics and the Riemannian metric. 

Since the Riemannian metric determines a canonical parallelism, one may use this to define geodesics. A
geodesic is a curve whose direction is parallel transported at all points by the curve itself. In other words,
geodesics are “self-parallel”. Geodesic curves also locally minimise distance. Thus the distance between two
points A and B in a Riemannian manifold may be obtained as the length of a shortest geodesic curve from A to B,
when such a curve exists. This is illustrated in Figure 74.1.1.


Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg.html
Copyright © 2023, Alan U. Kennington. All rights reserved. You may print this book draft for personal use. [9bb89a22f3]
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.

Figure 74.1.1 Levi-Civita connection determines geodesic curves and distances


The metric tensor field determines parallel transport. Parallel transport determines the geodesics. The 
geodesics minimise path length. The minimum path length determines the point-to-point distance. And the 
point-to-point distance may be differentiated to obtain the metric tensor field. 


74.2. The Levi-Civita connection and horizontal lift functions 


74.2.1 REMARK: Derivation of the Levi-Civita connection using horizontal lift functions. 

The Levi-Civita connection may be derived in several ways. 

1) Using a kind of geometric intuition, as Levi-Civita did in 1917. (See Levi-Civita [187].) 

2) Using calculus of variations to derive the equations for an extremal-length curve. (See Synge/Schild [41], 
pages 37-41.) 

3) Using covariant derivatives of the metric field applied to vector fields. (See for example Do Carmo [9], 
pages 53-56.) 

4) Using the fibre bundle formalism and horizontal lift functions. 


The intention here is to use approach (4). 

A Riemannian connection on a Riemannian manifold must preserve the metric when it is used for parallel
transport of pairs of vectors along curves. Therefore the Riemannian metric function $\eta : T^2(M) \to \mathbb{R}$
in Definition 73.4.2 must have zero differential when it is applied to any pair of vectors $y$ and $z$ whose
differentials in a direction $V$ are equal to the horizontal lift function values $\theta_V(y)$ and $\theta_V(z)$.


Definition 74.2.2 uses the associated connection on vector-tuple bundles in Definition 71.3.2. 


74.2.2 DEFINITION: A metric-compatible connection on a $C^2$ Riemannian manifold $(M, g)$ is a horizontal
lift function $\theta$ for $M$ which satisfies

$$\forall p \in M,\ \forall V,y,z \in T_p(M),\quad (d\eta)(\theta^2_V(y,z)) = 0, \tag{74.2.1}$$

where $\eta : T^2(M) \to \mathbb{R}$ is the Riemannian metric function for $M$ corresponding to $g$, and $\theta^2$ is the associated
connection on $T^2(M)$ corresponding to $\theta$.


74.2.3 REMARK: A Levi-Civita connection is both metric-compatible and torsion-free. 

The torsion-free property for a connection in Definition 71.12.8 is applicable to general affine connections.
This property is applied in Definition 74.2.4 as a constraint on metric-compatible connections. Then lines (74.2.2)
and (74.2.3) are effectively “simultaneous equations” which a Levi-Civita connection must satisfy. It turns
out that such a connection exists and is unique under some reasonable conditions on the Riemannian manifold.


74.2.4 DEFINITION: A Levi-Civita connection on a $C^2$ Riemannian manifold $(M, g)$ is a horizontal lift
function $\theta$ for $M$ which is torsion-free and metric-compatible on $(M, g)$. In other words, a Levi-Civita
connection $\theta$ on a $C^2$ Riemannian manifold with metric function $\eta$ is one which satisfies

$$\forall p \in M,\ \forall V,y,z \in T_p(M),\quad (d\eta)(\theta^2_V(y,z)) = 0 \tag{74.2.2}$$

and

$$\forall p \in M,\ \forall V,z \in T_p(M),\quad \theta_V(z) = \Xi(\theta_z(V)). \tag{74.2.3}$$


74.2.5 REMARK: The inescapability of coordinates for Levi-Civita connections. 
The “simultaneous equations” in lines (74.2.2) and (74.2.3) of Definition 74.2.4 have a quite complicated
geometrical structure. Line (74.2.2) creates a coupling between the lift values $\theta_V(y)$ and $\theta_V(z)$ for all
$V,y,z \in T_p(M)$. The geometrical significance of this is that the inner product of $y$ and $z$ must be
preserved under parallel translation in the direction $V$. Line (74.2.3) creates a coupling between the
lift values $\theta_V(z)$ and $\theta_z(V)$ for all $V,z \in T_p(M)$. The geometrical significance of this is that consistency of
parallel translation with respect to geodesic translation must be maintained.


It appears that it is not possible to solve equations (74.2.2) and (74.2.3) without resorting to coordinates 
in one way or another. (In the Koszul formalism, it is apparently possible to solve these equations in a 
coordinate-free manner, but the vector fields employed in that formalism act as proxies for coordinate basis 
vector fields, thereby formally hiding the coordinates.) The difficulty, or impossibility, of avoiding the use of 
coordinate charts to solve the Levi-Civita connection equations is not entirely surprising, at least when one 
has thought about it for a while. 


The role of a connection is to quantify the deviation of a coordinate chart from parallel transport. The 
horizontal lift function is employed to tensorise non-tensorial objects by indicating the direction of parallel 
transport so that this direction can be subtracted from a naive derivative (or other non-tensorial object) 
to make it tensorial. In normal coordinates at a point, there is no difference between naive derivatives and 
covariant derivatives. Thus the vanishing of the connection signifies that the “error” or “deviation” between 
parallel transport and naive differentiation equals zero. This is a statement about the coordinate chart! In 
general, a connection is always a property of the choice of chart relative to some intrinsic notion of parallelism 
on a manifold. In the case of a Levi-Civita connection, this intrinsic notion is derived from the metric tensor 
field, which is a fixed chart-independent structure. 


Although in theory the role of a connection is to specify the direction of parallel transport on a manifold, in
practice a connection quantifies the difference between parallel transport and naive transport by coordinate
basis vector fields. Hence it is not surprising that the various formulas in the literature for connections
on manifolds are expressed in terms of the relation of the metric tensor field to some coordinate chart.
(For the Koszul formalism, the coordinate chart's basis vectors are replaced by vector field proxies.) In
the classical tensor calculus formalism, one obtains a formula such as $\Gamma^k_{ij} = \frac{1}{2} g^{k\ell}(g_{\ell i,j} + g_{\ell j,i} - g_{ij,\ell})$, which
is a kind of divergence formula for $g$ in terms of the coordinates. For the Koszul formalism, one obtains
$g(D_X Y, Z) = \frac{1}{2}(\partial_X g(Y,Z) + \partial_Y g(X,Z) - \partial_Z g(X,Y))$ for holonomic vector fields $X$, $Y$ and $Z$. This expression
quantifies the deviation from parallelism of the vector fields, which are proxies for coordinate basis fields.
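The classical formula for $\Gamma^k_{ij}$ illustrates how the computation starts from chart-dependent derivatives of the metric components. The Python sketch below is an added illustration, not the author's method. It assumes the Euclidean plane in polar coordinates $(r, \theta)$ with $g = \mathrm{diag}(1, r^2)$, approximates $g_{ij,\ell}$ by central differences, and uses hypothetical helper names.

```python
def polar_metric(x):
    """Components g_ij of the Euclidean metric in polar coordinates x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def christoffel(metric, x, h=1e-5):
    """Return coeff[k][i][j] = (1/2) g^{kl} (g_{li,j} + g_{lj,i} - g_{ij,l}) in 2 dimensions."""
    n = len(x)
    g_inv = inverse_2x2(metric(x))
    dg = []   # dg[l][i][j] approximates the partial derivative of g_ij along x^l
    for l in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[l] += h
        x_minus[l] -= h
        g_plus, g_minus = metric(x_plus), metric(x_minus)
        dg.append([[(g_plus[i][j] - g_minus[i][j]) / (2.0 * h) for j in range(n)]
                   for i in range(n)])
    return [[[0.5 * sum(g_inv[k][l] * (dg[j][l][i] + dg[i][l][j] - dg[l][i][j])
                        for l in range(n))
              for j in range(n)]
             for i in range(n)]
            for k in range(n)]

coefficients = christoffel(polar_metric, (2.0, 0.7))
print("Gamma^r_{theta theta} =", round(coefficients[0][1][1], 6))
print("Gamma^theta_{r theta} =", round(coefficients[1][0][1], 6))
```

At $r = 2$ the well-known values are $\Gamma^r_{\theta\theta} = -r = -2$ and $\Gamma^\theta_{r\theta} = 1/r = 0.5$. Every input to the function is a coordinate expression, which is the point made in the remark.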


In the case of the fibre bundle formalism, the vertical components of the horizontal lift function for a Levi- 
Civita connection can be expressed in terms of a similar formula involving coordinate basis fields or general 
vector fields. After the connection has been computed, it may be used as an abstract object which hides the 
coordinate charts, but it does not seem to be possible to compute the Levi-Civita connection without making 
a direct comparison between the metric tensor field and some kind of reference vector fields. Thus the metric 
tensor field is applied to some kind of vector field, then this is differentiated with respect to another vector 
field, and then the results are combined to compute the connection relative to those vector fields. 


Equations (74.2.2) and (74.2.3) are apparently coordinate-free, but to solve these equations, the most obvious
computation strategy is to consider separately the horizontal and vertical components of the second-level
vector $\theta^2_V(y,z) \in T_{(y,z)}(T^2(M))$. The combined equations $(d\eta)(\theta^2_V(y,z)) = 0$ and $\theta_V(z) = \Xi(\theta_z(V))$ have
a similar structure to a linear system $\forall j \in \mathbb{N}_m,\ \sum_{i=1}^m a_i^j x^i - y^j = 0$ for $a_i, x, y \in \mathbb{R}^m$ for $i \in \mathbb{N}_m$ for
$m = n^2$ with $n = \dim(M)$, which may have a unique solution (under the usual constraints on $a$). This
system may be expressed more abstractly as $Lx = y$. The solution may be written as $x = L^{-1}y$, but
this is an abstract expression for the result of the computation. It hides the complexity of the algorithms
required to compute the solution. The expression “$L^{-1}y$” means nothing more or less than “the solution
of the equation $Lx = y$”. Inevitably, one must use “coordinates” to solve $Lx = y$ since some knowledge of
the structure of the equation is required for its solution, and there is no information about it which is not
equivalent to a coordinate expression such as $\forall j \in \mathbb{N}_m,\ \sum_{i=1}^m a_i^j x^i - y^j = 0$.

One of the slightly puzzling aspects of the Levi-Civita connection is that its horizontal lift function takes only
vectors as inputs, whereas the computation of the horizontal lift apparently requires vector fields. One would
perhaps expect that expressions for the horizontal lift could be written without recourse to vector fields, since
the result seems to dispense with them. In fact, only the first-order derivatives of vector fields are required as
inputs, and the output of a horizontal lift function $\theta$ is a second-level vector $\theta_V(z)$, which strongly resembles
the first derivative of a vector. Thus the vector fields of the Koszul formalism could be replaced with vectors
and their first-order derivatives, although this could seem somewhat clumsy. Then the kinds of objects
for the computation and the structure of a horizontal lift function would be better harmonised. However,
even with vectors in $T(T(M))$ replacing vector fields in $X^1(T(M))$ in the computations, the Levi-Civita
connection is still ultimately computed as vertical components expressed in terms of horizontal differentials
of the metric field. In other words, not much has changed since 1917.


74.2.6 REMARK: Analysis of the differential of a metric function applied to a connection. 

In rough outline, Theorem 74.2.7 states that $(d\eta)(\theta^2_V(y,z)) = \eta(\varpi(\theta_V(y)), z) + \eta(y, \varpi(\theta_V(z))) + \partial_V\eta(y,z)$.
The drop function $\varpi$ drops the vertical components of $\theta_V(y)$ and $\theta_V(z)$ from $T(T(M))$ to $T(M)$, and then
$\eta$ is applied to the results to give $\eta(\varpi(\theta_V(y)), z)$ and $\eta(y, \varpi(\theta_V(z)))$. This can be justified by noting that in the
vertical direction, $\eta$ is bilinear by Definition 73.4.2 (i, ii). Then the term $\partial_V\eta(y,z)$ contributes the horizontal
component of the differential $(d\eta)(\theta^2_V(y,z))$ because the horizontal component of $\theta^2_V$ is $V$ by Definitions
71.3.2 and 59.7.2. An obvious difficulty with this is that $y$ and $z$ are only vectors, not vector fields. So
the term $\partial_V\eta(y,z)$ is not well defined. The “constant extension” of a vector $y$ or $z$ to a neighbourhood of
$p$ is not well defined except as a chart-dependent concept. Although the derivative $\partial_V\eta(y,z)$ depends only
on the first derivatives of the extension fields $y_\psi$ and $z_\psi$, it is somewhat artificial to abstract these fields
as second-level tangent vectors $t_{p,y,0,0,\psi}$ and $t_{p,z,0,0,\psi}$ for example. The coordinate basis vector fields $e^\psi_i$
are well defined and “natural”, and there is no real benefit in attempting to replace them with abstract
second-level tangent vectors.


It is apparently inevitable that the metric field differential for an affine connection must be analysed into 
horizontal and vertical components because the available input information regarding a Riemannian metric 
is generally given in terms of horizontal and vertical components. 


(1) The vertical differential of a connection is easy to compute because the differential of a linear function 
equals the linear function of the differential. (See Remarks 53.3.13 and 59.2.8 for discussion of this.) 
This is a compelling reason to compute the differential in the vertical direction. 


(2) The horizontal component of $\theta_V(z)$ for a connection $\theta$ is known to equal $V$. So it seems natural to
compute $d\eta$ in this guaranteed known direction. The lift value $\theta^2_V(y,z)$ may then be decomposed
into horizontal and vertical components, and $d\eta$ may then be applied to these components. In practice,
the differential $\partial_V\eta(y_\psi, z_\psi)$ for “constant extensions” of $y$ and $z$ is likely to be an available input for
some chart $\psi$.


The differential of $\eta$ is easy to compute in the vertical direction, whereas the differential of $\theta$ is in some sense
easy to compute in the horizontal direction. Since the sum of the horizontal and vertical components equals
zero, this yields a formula for the vertical component in terms of the horizontal component. The vertical
component depends on the connection and the zeroth derivative of $\eta$, while the horizontal component depends
on the first derivative of $\eta$. Thus the metric compatibility yields a formula for the connection in terms of
the horizontal derivatives of $\eta$, acted upon by the sharp (index-raising) musical isomorphism.


The abstract concept of a connection, like most of the abstract concepts in differential geometry, is a kind
of “encapsulation” of the output of particular kinds of computations. It has a geometric significance which
assists its interpretation, but concrete computations are required for the input. In Theorem 74.2.7, it is seen
that chart-dependent computations are required for the evaluation of the differential $d\eta$ acting on a given
connection output. When the chart-dependent computation has been performed, the result may be written
abstractly as $(d\eta)(\theta^2_V(y,z))$, but this is merely a label which hides the technical complexity.


Classical Euclidean geometry can be optionally developed either in the synthetic style (based on axioms for 
lines, points and other kinds of objects), or in the analytic style (based on Cartesian coordinates, algebra 
and calculus). Differential geometry, by contrast, can really only be developed as an analytic subject, based 
on not just one Cartesian coordinate chart but many, and the synthetic perspective is useful only for the 
interpretation of concepts, not for their computation. It is true that some simple kinds of differential geometry 
computations can be performed by apparently “coordinate-free methods", but these are limited to a narrow 
menu of user-friendly operations which are pre-cooked and packaged for easy consumption. Anything which 
is not on this menu must be cooked up from raw ingredients. In other words, there is no “synthetic road” 
to differential geometry. 


74.2.7 THEOREM: Differential of a metric tensor applied to a connection. 
Let $M$ be a $C^2$ Riemannian manifold with metric function $\eta : T^2(M) \to \mathbb{R}$. Let $\theta$ be a $C^1$ affine connection
on $M$. Let $\theta^2$ be the connection on $T^2(M)$ which is associated with $\theta$ as in Definition 71.3.2. Then

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V,y,z \in T_p(M),$$
$$(d\eta)(\theta^2_V(y,z)) = \eta(\varpi^\psi(\theta_V(y)), z) + \eta(y, \varpi^\psi(\theta_V(z))) + \partial_V\eta(y_\psi, z_\psi), \tag{74.2.4}$$

where the vector fields $y_\psi, z_\psi \in X^1(T(M)|_{\mathrm{Dom}(\psi)})$ are defined (as in Definition 57.1.20) by

$$\forall q \in \mathrm{Dom}(\psi),\quad y_\psi(q) = \sum_{i=1}^n \Phi(\psi)(y)^i\, e^\psi_i(q),$$
$$\forall q \in \mathrm{Dom}(\psi),\quad z_\psi(q) = \sum_{i=1}^n \Phi(\psi)(z)^i\, e^\psi_i(q).$$

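PROOF: Let $p \in M$ and $V,y,z \in T_p(M)$. By Definition 71.3.2, $\theta^2_V(y,z) = h((\theta_V(y), \theta_V(z)))$, where $h$ is
given in Definition 59.7.2. Let $\psi \in \mathrm{atlas}_p(M)$ and $V = t_{p,v,\psi}$. Let $\theta_V(y) = t_{y,v,w_1,\psi}$ and $\theta_V(z) = t_{z,v,w_2,\psi}$.
Then $\theta^2_V(y,z) = t^2_{(y,z),v,(w_1,w_2),\psi}$ by Definition 59.7.2. But $t^2_{(y,z),v,(w_1,w_2),\psi}$ may be decomposed into a sum
of components as $t^2_{(y,z),v,(0,0),\psi} + t^2_{(y,z),0,(w_1,0),\psi} + t^2_{(y,z),0,(0,w_2),\psi}$. It follows that

$$(d\eta)(\theta^2_V(y,z)) = (d\eta)(t^2_{(y,z),0,(w_1,0),\psi}) + (d\eta)(t^2_{(y,z),0,(0,w_2),\psi}) + (d\eta)(t^2_{(y,z),v,(0,0),\psi}).$$

By Definitions 73.4.2 (i, ii) and 59.3.5, it follows that $(d\eta)(t^2_{(y,z),0,(w_1,0),\psi}) = \eta(w_1, z) = \eta(\varpi^\psi(\theta_V(y)), z)$ and
$(d\eta)(t^2_{(y,z),0,(0,w_2),\psi}) = \eta(y, w_2) = \eta(y, \varpi^\psi(\theta_V(z)))$. Let $\bar{y} = \Phi(\psi)(y)$ and $\bar{z} = \Phi(\psi)(z)$. Then it follows from
Theorems 58.1.6 .6 and 57.1.

$$(d\eta)(t^2_{(y,z),v,(0,0),\psi}) = \sum_{i=1}^n v^i\, \partial_{x^i}\, \eta\big(t_{\psi^{-1}(x),\bar{y},\psi},\ t_{\psi^{-1}(x),\bar{z},\psi}\big)\Big|_{x=\psi(p)}$$
$$= \sum_{i=1}^n v^i\, \partial_{x^i}\, \eta\big(y_\psi(\psi^{-1}(x)),\ z_\psi(\psi^{-1}(x))\big)\Big|_{x=\psi(p)}$$
$$= \partial_V\eta(y_\psi, z_\psi).$$

Hence $(d\eta)(\theta^2_V(y,z)) = \eta(\varpi^\psi(\theta_V(y)), z) + \eta(y, \varpi^\psi(\theta_V(z))) + \partial_V\eta(y_\psi, z_\psi)$.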


74.2.8 REMARK: Decomposition of the differential of a metric function into components. 

Theorem 74.2.7 decomposes the differential $d\eta$ of a metric function $\eta$ acting on the output of the vector-pair
connection $\theta^2$ into vertical and horizontal components. For the single-vector connection $\theta$ to be
metric-compatible, $(d\eta)(\theta^2_V(y,z))$ must equal zero for all $V,y,z \in T_p(M)$ for all $p \in M$. Unfortunately the expression
in line (74.2.4) is a constraint on the values of $\theta$ for two different vectors $y$ and $z$. The desired kind of formula
for a metric-compatible connection should have a single term $\theta_V(z)$ expressed in terms of the given metric
function $\eta$.


In tensor calculus, the right side of line (74.2.4) would look like $(-g_{j\ell}\Gamma^\ell_{ik} - g_{i\ell}\Gamma^\ell_{jk} + g_{ij,k})\,y^i z^j v^k$. If this
is made equal to zero for all $y,z,v \in \mathbb{R}^n$, the result would be $\forall i,j,k \in \mathbb{N}_n,\ g_{j\ell}\Gamma^\ell_{ik} + g_{i\ell}\Gamma^\ell_{jk} = g_{ij,k}$ as the
condition for metric compatibility in Definition 74.2.2 line (74.2.1). In the absence of a symmetry constraint
on $\Gamma$, the solutions of this system of equations for $\Gamma$ in terms of $g$ are not unique in general. In the case of
the abstract formula $(d\eta)(\theta^2_V(y,z)) = 0$ for metric-compatibility, the manipulations which are required for
its solution are more onerous than for tensor calculus. In principle, this price should be worth paying for
the sake of obtaining geometric insight, but the net benefits in this case are somewhat questionable.
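The non-uniqueness without a symmetry constraint can be seen in a small example. The Python sketch below is an added illustration with hypothetical names. It assumes the Euclidean plane in polar coordinates $(r, \theta)$ with $g = \mathrm{diag}(1, r^2)$, and stores $\Gamma^\ell_{ik}$ as `coeff[l][i][k]`, with the last index $k$ as the direction of differentiation. Two coefficient arrays are tested against the condition $g_{j\ell}\Gamma^\ell_{ik} + g_{i\ell}\Gamma^\ell_{jk} = g_{ij,k}$: the symmetric Levi-Civita array, and a non-symmetric array whose only non-zero entry is $\Gamma^\theta_{\theta r} = 1/r$.

```python
def metric(r):
    """Components g_ij of the Euclidean metric in polar coordinates (r, theta)."""
    return [[1.0, 0.0], [0.0, r * r]]

def metric_derivative(r):
    """dg[k][i][j] = partial derivative of g_ij along x^k, with x = (r, theta)."""
    return [[[0.0, 0.0], [0.0, 2.0 * r]],   # k = r
            [[0.0, 0.0], [0.0, 0.0]]]       # k = theta

def zero_array():
    return [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]

def levi_civita(r):
    """Symmetric solution: coeff[l][i][k] = Gamma^l_{ik}."""
    coeff = zero_array()
    coeff[0][1][1] = -r          # Gamma^r_{theta theta}
    coeff[1][0][1] = 1.0 / r     # Gamma^theta_{r theta}
    coeff[1][1][0] = 1.0 / r     # Gamma^theta_{theta r}
    return coeff

def nonsymmetric_solution(r):
    """A second metric-compatible array, with torsion (assumed example)."""
    coeff = zero_array()
    coeff[1][1][0] = 1.0 / r     # Gamma^theta_{theta r} only
    return coeff

def compatibility_defect(coeff, r):
    """Largest violation of g_jl Gamma^l_ik + g_il Gamma^l_jk = g_ij,k."""
    g, dg = metric(r), metric_derivative(r)
    worst = 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                lhs = sum(g[j][l] * coeff[l][i][k] + g[i][l] * coeff[l][j][k]
                          for l in range(2))
                worst = max(worst, abs(lhs - dg[k][i][j]))
    return worst

def is_symmetric(coeff):
    """True if Gamma^l_{ik} = Gamma^l_{ki} for all indices."""
    return all(coeff[l][i][k] == coeff[l][k][i]
               for l in range(2) for i in range(2) for k in range(2))

r = 2.0
for name, coeff in (("Levi-Civita", levi_civita(r)),
                    ("non-symmetric", nonsymmetric_solution(r))):
    print(name, "defect:", compatibility_defect(coeff, r),
          "symmetric:", is_symmetric(coeff))
```

Both arrays satisfy the metric-compatibility condition, but only the first is symmetric in its lower indices. This is why the torsion-free condition is needed to single out one solution.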


74.2.9 THEOREM: Condition for metric-compatibility of a connection. 
Let $M$ be a $C^2$ Riemannian manifold with metric function $\eta : T^2(M) \to \mathbb{R}$. Let $\theta$ be a $C^1$ affine connection
on $M$. Then $\theta$ is a metric-compatible connection on $M$ if and only if

$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V,y,z \in T_p(M),$$
$$\eta(\varpi^\psi(\theta_V(y)), z) + \eta(y, \varpi^\psi(\theta_V(z))) + \partial_V\eta(y_\psi, z_\psi) = 0,$$

where the “constant extension fields” $y_\psi, z_\psi \in X^1(T(M)|_{\mathrm{Dom}(\psi)})$ are as in Definition 57.1.20.


[ www. geometry.org/dg.html] [ draft: UTC 2023-1-3 Tuesday 00:13] 


2300 74. Levi-Civita parallelism and curvature 


PROOF: The assertion follows from Definition 74.2.2 and Theorem 74.2.7. 


74.2.10 REMARK: Interpretation of the torsion-free condition for affine connections. 

The metric-compatibility condition (74.2.2) in Definition 74.2.4 has a clear meaning. It means that the first
differential of the inner product $\eta$ of two vectors $y,z \in T_p(M)$ in any direction $V \in T_p(M)$ must equal
zero. In other words, the inner product of a vector-pair is invariant under translations to first order. (Then
presumably when this first-order differential is integrated along curves, the result will also be invariant.)


The torsion-free condition (74.2.3) is not so clear. It may be explained geometrically in terms of the
“non-sliding” of a sphere rolled over an embedded manifold, or as parallel translation in a “ruled surface” or
“developable” which approximates an embedded manifold at a point, but these metaphors are indirect and
ambiguous. It is difficult to express such metaphors as mathematical formulas. In tensor calculus, one
specifies that the Christoffel array $\Gamma$ must be symmetric, which is not a geometric style of condition. In
vector field calculus, one specifies that the difference between $D_X Y$ and $D_Y X$ is equal to the Poisson bracket
of $X$ and $Y$, which likewise has a not very clear geometric significance. The condition $\theta_V(z) = \Xi(\theta_z(V))$ on
line (74.2.3) has perhaps even less geometrical clarity.


The central issue here is that in a non-embedded manifold, there is no ambient geometry with which to 
give an interpretation to torsion. In a non-flat Riemannian manifold, the Levi-Civita parallel transport is 
path-dependent. So it is not clear why a path-dependent parallelism should be given the name “parallel”. 
In Euclidean geometry, parallel transport of vectors along a straight line generates a parallelogram whose 
opposite sides are equal, and whose opposite angles are equal. Since the most important properties of classical 
Euclidean parallelism are not preserved by path-dependent parallelism in curved spaces, one must ask just 
which properties are preserved by torsion-free translation, but are not preserved if torsion is non-zero. 


One way to solve this, perhaps, is to find a replacement for the parallelogram which is traced out by Euclidean
parallel translation. Instead of a parallelogram, one could perhaps think of a $C^2$ family of curves $\gamma : \mathbb{R}^2 \to M$
which has the property that $Y(s,t) = \partial_t\gamma(s,t)$ is a parallel vector field along the curve $s \mapsto \gamma(s,0)$. This
would imply that $\theta_{X(s,0)}(Y(s,0)) = \partial_{X(s,0)}(Y(s,0))$ for all $s \in \mathbb{R}$, where $X(s,t) = \partial_s\gamma(s,t)$ for $(s,t) \in \mathbb{R}^2$,
and $\partial_{X(s,0)}(Y(s,0))$ is a “naive derivative” as in Definition 61.2.3. In Euclidean geometry, if two sides of a
parallelogram have equal length and are parallel, then the other two sides also have equal length and are
parallel. (This is Euclid's proposition I-33. See Euclid/Heath [213], pages 322-323; Euclid [216], page 25.)
In Riemannian manifolds, one would expect that this would be true at least to the first order. Therefore
one expects $\theta_{Y(s,0)}(X(s,0)) = \partial_{Y(s,0)}(X(s,0))$ for all $s \in \mathbb{R}$. If this were not true, then the “infinitesimal
parallelogram” bounded by $Y(s_1,0)$ and $Y(s_2,0)$ for small positive $|s_2 - s_1|$ would have two opposite sides
parallel, but the other two sides not parallel, and the “angle” (or the “direction cosines”) of the discrepancy
would converge to a non-zero value as the parallelogram “shrinks”. (This kind of situation is illustrated in
Figure 71.12.2 in Example 71.12.6.)


Thus for any $C^2$ two-parameter curve-family, the equality $\theta_{X(s,0)}(Y(s,0)) = \partial_{X(s,0)}(Y(s,0))$ (parallelism
of $Y$ in the $X$-direction) implies $\theta_{Y(s,0)}(X(s,0)) = \partial_{Y(s,0)}(X(s,0))$ (parallelism of $X$ in the $Y$-direction)
if the parallelism has the desirable Euclidean-space property that parallelism of two opposite sides of an
“infinitesimal parallelogram” implies parallelism of the other two opposite sides. This may be referred to as
the “torsion-free property”.

Since $\partial_{X(s,t)}(Y(s,t)) = \Xi(\partial_{Y(s,t)}(X(s,t)))$ for all $(s,t) \in \mathbb{R}^2$, as discussed in Section 59.6, it follows that
$\theta_{X(s,0)}(Y(s,0)) = \Xi(\theta_{Y(s,0)}(X(s,0)))$ if and only if the parallelism has the torsion-free property. In other
words, the condition $\theta_V(z) = \Xi(\theta_z(V))$ signifies that the parallelism is at least Euclidean in the limited sense
that infinitesimal quadrilaterals obey the basic rules of parallelism in Euclidean geometry.


Although torsion of connections is well defined for general affine connections, as presented in Section 71.12, 
the torsion of an abstract affine connection is just one property among many, which may or may not be 
interesting, but in the case of the Levi-Civita connection on a Riemannian manifold, the torsion-free property 
is a core fact. Since the covariant derivative and various curvature tensors are defined in terms of the Levi- 
Civita connection, the torsion-free property is implicated in most of the geometry of the manifold. 


74.2.11 REMARK: Euclid’s proposition I-33, torsion-free connections, and non-zero curvature. 

Curiously, Euclid’s proposition I-33 depends heavily on the famous fifth postulate, which effectively states 
that parallel lines never meet and never diverge. In other words, the fifth postulate says that space is “flat” 
or has zero curvature. In general Riemannian manifolds, the fifth postulate is not valid, but a torsion-free
connection is nevertheless accepted as the definition of the standard parallelism. Significantly, torsion is
defined as a differential, not “at a distance”. So there is no direct contradiction.


74.2.12 REMARK: Solving the simultaneous equations for the Levi-Civita connection.

Since the connection sought in Definition 74.2.4 must be torsion-free, this implies that any term of the form
$\theta_V(y)$ may be replaced by the corresponding term $\Xi(\theta_y(V))$. This certainly assists in the solution of the
simultaneous equations. From Theorem 74.2.9, a Levi-Civita connection must satisfy

\begin{align}
&\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V, y, z \in T_p(M), \notag\\
&\qquad \eta(\varpi^\psi(\theta_V(y)), z) = -\eta(y, \varpi^\psi(\theta_V(z))) - \partial_V\eta(y_\psi, z_\psi). \tag{74.2.5}
\end{align}

This expresses the “unknown” term $\eta(\varpi^\psi(\theta_V(y)), z)$ in terms of a “known” term $\partial_V\eta(y_\psi, z_\psi)$ which depends
only on the metric function, and an unknown term $\eta(y, \varpi^\psi(\theta_V(z)))$. Since the torsion-free property
implies $\theta_V(z) = \Xi(\theta_z(V))$, it is possible to substitute $\varpi^\psi(\theta_z(V))$ for $\varpi^\psi(\theta_V(z))$. (See Theorem 59.6.8.) A
second application of Theorem 74.2.9 gives $\eta(y, \varpi^\psi(\theta_z(V))) = -\eta(\varpi^\psi(\theta_z(y)), V) - \partial_z\eta(y_\psi, V_\psi)$. This can be
substituted into line (74.2.5) to give:

\begin{align*}
&\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V, y, z \in T_p(M), \\
&\qquad \eta(\varpi^\psi(\theta_V(y)), z) = \eta(\varpi^\psi(\theta_z(y)), V) + \partial_z\eta(y_\psi, V_\psi) - \partial_V\eta(y_\psi, z_\psi).
\end{align*}

This expresses $\theta_V(y)$ in terms of a “sideways” horizontal lift $\theta_z(y)$. This may not seem like good progress
because now there are three terms on the right side of the equation. However, two of the terms are “knowns”.
The number of “unknowns” has not increased. Now the entangled lift values $\theta_V(y)$ and $\theta_z(y)$ apply to the
same tangent bundle element $y$, but the velocities $V$ and $z$ which are applied are different. The next step is
to use the torsion-free property to replace $\varpi^\psi(\theta_z(y))$ with $\varpi^\psi(\theta_y(z))$ and then apply Theorem 74.2.9 again
to introduce the other “sideways” lift value as follows.

\begin{align*}
&\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V, y, z \in T_p(M), \\
&\qquad \eta(\varpi^\psi(\theta_V(y)), z) = -\eta(z, \varpi^\psi(\theta_y(V))) - \partial_y\eta(z_\psi, V_\psi) + \partial_z\eta(y_\psi, V_\psi) - \partial_V\eta(y_\psi, z_\psi).
\end{align*}

Semi-miraculously, the two unknowns may now be combined by applying the symmetry of the metric function.

\begin{align*}
&\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V, y, z \in T_p(M), \\
&\qquad \eta(\varpi^\psi(\theta_V(y)), z) = \tfrac12\bigl(\partial_z\eta(V_\psi, y_\psi) - \partial_V\eta(y_\psi, z_\psi) - \partial_y\eta(V_\psi, z_\psi)\bigr).
\end{align*}


As a “sanity check”, it may be noted that the two sides of this equation are both symmetric with respect to
permutations of the pair $(V, y)$. This procedure for the solution of the equations in Definition 74.2.4 may now
be streamlined as in the proof of Theorem 74.2.13.


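74.2.13 THEOREM: Abstract Levi-Civita connection with “lowered indices”.
Let $M$ be a $C^2$ Riemannian manifold with metric function $\eta : T^2(M) \to \mathbb{R}$. Let $\theta$ be a $C^1$ Levi-Civita
connection on $M$. Then

\begin{align}
&\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V, y, z \in T_p(M), \notag\\
&\qquad \eta(\varpi^\psi(\theta_V(y)), z) = \tfrac12\bigl(\partial_z\eta(V_\psi, y_\psi) - \partial_V\eta(y_\psi, z_\psi) - \partial_y\eta(V_\psi, z_\psi)\bigr). \tag{74.2.6}
\end{align}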


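PROOF: By Definition 74.2.4 line (74.2.3) and Theorem 59.6.8, $\varpi^\psi(\theta_V(y)) = \varpi^\psi(\theta_y(V))$. Therefore

\begin{align}
&\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V, y, z \in T_p(M), \notag\\
2\eta(\varpi^\psi(\theta_V(y)), z) &= \eta(\varpi^\psi(\theta_V(y)), z) + \eta(\varpi^\psi(\theta_y(V)), z) \notag\\
&= -\eta(y, \varpi^\psi(\theta_V(z))) - \partial_V\eta(y_\psi, z_\psi) - \eta(V, \varpi^\psi(\theta_y(z))) - \partial_y\eta(V_\psi, z_\psi) \tag{74.2.7}\\
&= -\eta(\varpi^\psi(\theta_z(V)), y) - \eta(V, \varpi^\psi(\theta_z(y))) - \partial_V\eta(y_\psi, z_\psi) - \partial_y\eta(V_\psi, z_\psi) \tag{74.2.8}\\
&= \partial_z\eta(V_\psi, y_\psi) - \partial_V\eta(y_\psi, z_\psi) - \partial_y\eta(V_\psi, z_\psi), \tag{74.2.9}
\end{align}

where lines (74.2.7) and (74.2.9) follow from Theorem 74.2.9, and line (74.2.8) follows from Definitions
73.4.2 (iii) and 74.2.4 line (74.2.3).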
Figure 74.2.1 Computation of the Levi-Civita connection


74.2.14 REMARK: Visualisation of computation of the Levi-Civita connection. 

The result of the computation in Theorem 74.2.13 is illustrated in Figure 74.2.1. 

The expression on line (74.2.9) is effectively the sum of the results of “probing” the variation of the
metric function $\eta$ in directions $z$, $-V$ and $-y$. The sum of these three derivatives of $\eta$ produces the
value $2\eta(\varpi^\psi(\theta_V(y)), z)$, which is the same as $\eta(\varpi^\psi(\theta_V(y)), z) + \eta(\varpi^\psi(\theta_y(V)), z)$. This may be thought of
as the “covariant $z$-component” of $\theta_V(y)$ or $\theta_y(V)$.


It is probably easier to geometrically interpret the negative of the Levi-Civita horizontal lift function. The
reason for this is that the lift function value $\theta_V(y)$ is a “compensation” or “correction” term which must be
added to the “constant extension field” $y_\psi$ of the vector $y$ in order to transport it in a parallel manner in
the $V$ direction, but $-\theta_V(y)$ is effectively the covariant derivative of the constant extension field. In other
words, $-\theta_V(y)$ measures the deviation from parallelism of $y_\psi$ in the $V$ direction.


To measure the deviation of $y_\psi$ from parallelism in the $V$ direction, one would expect to have to evaluate
an expression which looks something like $\partial_V\eta(y_\psi, \cdot)$. For example, if the length of $y_\psi$ increases in the $V$
direction, as measured by $\eta$, then $\theta_V(y)$ should correct this “error” by shortening the vector, which would
be a negative change. In other words, $\theta_V(y)$ should look like $-\partial_V\eta(y_\psi, \cdot)$. If the correction of the error
is negative, then the error itself must be positive. (This is why the correction of errors is called “negative
feedback” in engineering!) Therefore $-\theta_V(y)$, the measure of the “error”, should look like $+\partial_V\eta(y_\psi, \cdot)$.
Consequently the covariant $z$-component $-\eta(\varpi^\psi(\theta_V(y)), z)$ of the “error measure function” $-\theta_V(y)$, which
is in essence a covariant derivative of a constant extension function, is (half) the negative of line (74.2.9).
This is illustrated in Figure 74.2.2.


[Figure 74.2.2: diagram with arrows labelled $+\partial_V\eta(y_\psi, z_\psi)$ and $-\partial_z\eta(V_\psi, y_\psi)$]

Figure 74.2.2 Computation of the negative of the Levi-Civita connection


Thus Figure 74.2.2 is a visualisation of the deviation from parallelism of $y_\psi$ in the $V$ direction, which is the
same as the deviation from parallelism of $V_\psi$ in the $y$ direction because torsion equals zero. This “deviation”
or “error” should then look like $\partial_V\eta(y_\psi, \cdot)$ or $\partial_y\eta(V_\psi, \cdot)$, or both. In other words, $2\eta(\varpi^\psi(\theta_V(y)), z) =
\eta(\varpi^\psi(\theta_V(y)), z) + \eta(\varpi^\psi(\theta_y(V)), z)$ should look something like $\partial_V\eta(y_\psi, z_\psi) + \partial_y\eta(V_\psi, z_\psi)$. Since this includes
the effects of transporting $z$ in addition to transporting $y$ and $V$, this dependence must be removed. This is
the reason for the term $-\partial_z\eta(V_\psi, y_\psi)$ in the equation

\begin{align*}
-2\eta(\varpi^\psi(\theta_V(y)), z) &= -\eta(\varpi^\psi(\theta_V(y)), z) - \eta(\varpi^\psi(\theta_y(V)), z) \\
&= \partial_V\eta(y_\psi, z_\psi) + \partial_y\eta(V_\psi, z_\psi) - \partial_z\eta(V_\psi, y_\psi).
\end{align*}

Since $\eta(\varpi^\psi(\theta_V(y)), z) = \eta(\varpi^\psi(\theta_y(V)), z)$, one might ask why both directions $V$ and $y$ must be “probed” in
Figure 74.2.2. The reason is that the term $-\partial_z\eta(V_\psi, y_\psi)$ compensates for both the $V$ and $y$ directions. The
metric function is a function of two vectors. So it is not possible to evaluate it for one vector only. Thus
$-\partial_z\eta(V_\psi, y_\psi)$ cancels, in some sense, the unwanted $z$ dependence of the sum $\partial_V\eta(y_\psi, z_\psi) + \partial_y\eta(V_\psi, z_\psi)$.


Since the negative of the Levi-Civita connection (i.e. parallel transport) is more readily interpreted than the 
connection itself, it is unsurprising that the negative of the connection is more prominent in Riemannian 
geometry. The Christoffel array is the coefficient array for the negative of the parallel transport, and 
the covariant derivative subtracts parallel transport from the naive derivative. Furthermore, the Riemann 
curvature is typically defined as the curvature of the covariant derivative around an infinitesimal rectangle, 
not the curvature of the parallel transport. (This sign issue for the Riemann curvature is also discussed in 
Remark 70.7.7. See also Definition 71.11.5.) 


74.2.15 REMARK: Application of index-raising musical isomorphism to obtain the Levi-Civita connection. 
Equation (74.2.6) in Theorem 74.2.13 is unsatisfying because it does not give a formula for $\theta_V(y)$ on its own
on the left side. Theorem 74.2.16 takes one further step towards isolating the term $\theta_V(y)$ on the left side.
In a sense, this is not real progress because the index-raising isomorphism $\mu^\sharp$ applies the inverse of a linear
map, which itself is the solution of some equation. However, the inverse of a linear map is a familiar concept
which makes the formula for $\theta_V(y)$ slightly easier to understand.


Equation (74.2.10) has a form which is not totally straightforward. The index-raising isomorphism $\mu^\sharp$ must
be applied to the linear map “$z \mapsto (\partial_z\eta(V_\psi, y_\psi) - \partial_V\eta(y_\psi, z_\psi) - \partial_y\eta(V_\psi, z_\psi))$”, which is not easy to interpret
as a familiar kind of function. At this point (and probably at very numerous other points), it becomes
clear that the fibre bundle approach to differential geometry is singularly ineffective for computations. The 
computation of the Levi-Civita connection for a Riemannian manifold is one of the most fundamental of all 
computations in the subject, but it is very tedious and onerous in the fibre bundle framework compared to 
the corresponding computation within the tensor calculus and vector field calculus frameworks. 


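74.2.16 THEOREM: Abstract Levi-Civita connection formula with “raised indices”.
Let $M$ be a $C^2$ Riemannian manifold with metric function $\eta : T^2(M) \to \mathbb{R}$. Let $\theta$ be a $C^1$ Levi-Civita
connection on $M$. Then

\begin{align}
&\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall V, y \in T_p(M), \notag\\
&\qquad \varpi^\psi(\theta_V(y)) = \tfrac12\mu^\sharp\bigl(z \mapsto (\partial_z\eta(V_\psi, y_\psi) - \partial_V\eta(y_\psi, z_\psi) - \partial_y\eta(V_\psi, z_\psi))\bigr). \tag{74.2.10}
\end{align}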


PROOF: The assertion follows from Theorems 74.2.13 and 73.5.6. 


((2016-8-14. Derive the connection form on the principal bundle from the OFB horizontal lift function. )) 


74.3. The Levi-Civita connection in tensor calculus 


74.3.1 REMARK: The coefficient array for the Levi-Civita connection. 
The coefficient array for a general affine connection in Definition 71.2.2 may be applied to the Levi-Civita 
connection in Theorem 74.2.13 to obtain Theorem 74.3.2. 


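74.3.2 THEOREM: The Levi-Civita connection in the tensor calculus style.
Let $M$ be a $C^2$ Riemannian manifold with metric function $\eta : T^2(M) \to \mathbb{R}$. Let $\theta$ be a $C^1$ Levi-Civita
connection for $\eta$. Then the coefficient array $\Gamma$ for $\theta$ satisfies:

\begin{align*}
&\forall \psi \in \mathrm{atlas}(M),\ \forall x \in \mathrm{Dom}(\psi),\ \forall i, j, k \in \mathbb{N}_n, \\
\Gamma_\psi(x)^k_{ij} &= \frac12 \sum_{\ell=1}^n g_\psi^{k\ell}(x)\bigl(\partial_{x^j} g^\psi_{\ell i}(x) + \partial_{x^i} g^\psi_{\ell j}(x) - \partial_{x^\ell} g^\psi_{ij}(x)\bigr) \\
&= \frac12 \sum_{\ell=1}^n g_\psi^{k\ell}(x)\bigl(g^\psi_{\ell i,j}(x) + g^\psi_{\ell j,i}(x) - g^\psi_{ij,\ell}(x)\bigr),
\end{align*}

where $n = \dim(M)$. (See Notation 73.3.6 for derivatives $g^\psi_{ij,k}(x)$ of components $g^\psi_{ij}$ of a metric tensor field $g$.
See Definition 73.3.8 for the inverse component array field $g_\psi^{ij}$ for a Riemannian metric tensor field $g$.)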


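PROOF: Let $\psi \in \mathrm{atlas}(M)$, $x \in \mathrm{Range}(\psi)$, $p = \psi^{-1}(x)$, $y = e_i^{p,\psi}$ and $V = e_j^{p,\psi}$. Then by Theorem 74.2.13,

\begin{align}
\forall z \in T_p(M),\quad -\eta(\varpi^\psi(\theta_V(y)), z)
&= \tfrac12\bigl(\partial_V\eta(y_\psi, z_\psi) + \partial_y\eta(V_\psi, z_\psi) - \partial_z\eta(V_\psi, y_\psi)\bigr) \notag\\
&= \frac12 \sum_{\ell=1}^n z^\ell \bigl(\partial_{x^j} g^\psi_{i\ell}(x) + \partial_{x^i} g^\psi_{j\ell}(x) - \partial_{x^\ell} g^\psi_{ji}(x)\bigr) \tag{74.3.1}\\
&= \frac12 \sum_{\ell=1}^n z^\ell \bigl(g^\psi_{i\ell,j}(x) + g^\psi_{j\ell,i}(x) - g^\psi_{ji,\ell}(x)\bigr), \notag
\end{align}

where line (74.3.1) follows from Theorem 73.4.8, where $(z^\ell)_{\ell=1}^n = \Phi(\psi)(z)$. So by Theorem 73.5.6,

\begin{align*}
-\varpi^\psi(\theta_V(y))
&= \mu^\sharp\Bigl(z \mapsto \frac12 \sum_{\ell=1}^n z^\ell \bigl(g^\psi_{i\ell,j}(x) + g^\psi_{j\ell,i}(x) - g^\psi_{ji,\ell}(x)\bigr)\Bigr) \\
&= \frac12 \sum_{\ell=1}^n \bigl(g^\psi_{i\ell,j}(x) + g^\psi_{j\ell,i}(x) - g^\psi_{ji,\ell}(x)\bigr)\,\mu^\sharp(z \mapsto z^\ell) \\
&= \frac12 \sum_{k,\ell=1}^n g_\psi^{\ell k}(x)\bigl(g^\psi_{i\ell,j}(x) + g^\psi_{j\ell,i}(x) - g^\psi_{ji,\ell}(x)\bigr)\, e_k^{p,\psi}
\end{align*}

by Theorem 73.5.10 (iv). Therefore by Definition 71.2.2 and Notation 54.5.7,

\begin{align*}
\forall i, j, k \in \mathbb{N}_n,\quad \Gamma_\psi(x)^k_{ij} &= -\Phi(\psi)(\varpi^\psi(\theta_V(y)))^k \\
&= \frac12 \sum_{\ell=1}^n g_\psi^{\ell k}(x)\bigl(g^\psi_{i\ell,j}(x) + g^\psi_{j\ell,i}(x) - g^\psi_{ji,\ell}(x)\bigr) \\
&= \frac12 \sum_{\ell=1}^n g_\psi^{k\ell}(x)\bigl(g^\psi_{\ell i,j}(x) + g^\psi_{\ell j,i}(x) - g^\psi_{ij,\ell}(x)\bigr)
\end{align*}

by the symmetry of $g$.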


74.3.3 REMARK: Christoffel array for the Levi-Civita connection. 
Definition 74.3.4 is a simple combination of Definitions 74.2.4 and 71.2.2 with Theorem 74.3.2 to arrive at 
the coefficient array of a Levi-Civita connection with respect to any chart in the atlas of the manifold. 


The name “Christoffel symbol” or “Christoffel’s symbols” was given to the coefficients $\frac12 g^{k\ell}(g_{\ell i,j} + g_{\ell j,i} - g_{ij,\ell})$
in recognition of their first appearance in an 1869 paper by Christoffel [175]. They were often applied 
thereafter to covariant derivatives and geodesics, but their significance for parallelism was not specifically 
recognised before a 1917 paper by Levi-Civita [187]. 


The symbol “$\Gamma$” is also used for general affine connections, even though Christoffel’s original array was very
specifically given by the formula in line (74.3.2), and it was not used for general affine connections until 1918 
by Weyl [310], pages 113-121. 

It seems reasonable to refer to the array “$\Gamma$” as the Christoffel symbol or array in the broad context of general
affine connections (and even for general connections on vector bundles), while the array in Definition 74.3.4 
is referred to as the Christoffel (coefficient) array or Christoffel symbol for the Levi-Civita connection. 
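74.3.4 DEFINITION: The Levi-Civita connection coefficient array for a $C^2$ Riemannian manifold $(M, g)$,
with respect to a chart $\psi \in \mathrm{atlas}(M)$, is the map $\Gamma_\psi : \mathrm{Range}(\psi) \to \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n$, where $n = \dim(M)$,
which is given by

\begin{align}
&\forall x \in \mathrm{Dom}(\psi),\ \forall i, j, k \in \mathbb{N}_n, \notag\\
&\qquad \Gamma_\psi(x)^k_{ij} = \frac12 \sum_{\ell=1}^n g_\psi^{k\ell}(x)\bigl(g^\psi_{\ell i,j}(x) + g^\psi_{\ell j,i}(x) - g^\psi_{ij,\ell}(x)\bigr). \tag{74.3.2}
\end{align}

This may be abbreviated as $\Gamma^k_{ij} = \frac12 g^{k\ell}(g_{\ell i,j} + g_{\ell j,i} - g_{ij,\ell})$.
The Christoffel (coefficient) array or the Christoffel symbol for the connection are alternative names.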
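
The abbreviated formula $\Gamma^k_{ij} = \frac12 g^{k\ell}(g_{\ell i,j} + g_{\ell j,i} - g_{ij,\ell})$ can be checked numerically. The following is only a minimal sketch, not part of the formal development. It assumes a two-dimensional chart, approximates the metric derivatives by central differences, and uses the Euclidean plane in polar coordinates $(r, \vartheta)$ as a test metric. The names `christoffel` and `polar_metric` are hypothetical.

```python
def christoffel(metric, x, h=1e-5):
    """Return G with G[k][i][j] ~ Gamma^k_ij at the chart point x (2 dimensions only).

    Implements Gamma^k_ij = 1/2 sum_l g^{kl} (g_{li,j} + g_{lj,i} - g_{ij,l}),
    with the metric derivatives approximated by central differences.
    """
    n = 2

    def d_metric(l):
        # d_metric(l)[a][b] approximates the partial derivative g_{ab,l}.
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)] for a in range(n)]

    d = [d_metric(l) for l in range(n)]
    g = metric(x)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    inv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    return [[[0.5 * sum(inv[k][l] * (d[j][l][i] + d[i][l][j] - d[l][i][j])
                        for l in range(n))
              for j in range(n)] for i in range(n)] for k in range(n)]


def polar_metric(x):
    # Assumed test metric: dr^2 + r^2 dtheta^2, with x = (r, theta).
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]


gamma = christoffel(polar_metric, [2.0, 0.3])
```

For this test metric, the only non-zero coefficients should be $\Gamma^r_{\vartheta\vartheta} = -r$ and $\Gamma^\vartheta_{r\vartheta} = \Gamma^\vartheta_{\vartheta r} = 1/r$, which the sketch reproduces up to the finite-difference error.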


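74.3.5 EXAMPLE: Define a $C^2$ differentiable manifold $M \mathrel{-\!\!<} (M, A_M)$ with $M = \mathbb{R}^2$ and manifold atlas
$A_M = \{\psi_1, \psi_2\}$, where $\psi_1 = \mathrm{id}_{\mathbb{R}^2}$ and $\psi_2 : \mathbb{R}^2 \to \mathbb{R}^2$ is defined by $\psi_2 : (x_1, x_2) \mapsto (h(x_1), x_2)$ for some $C^2$
bijection $h : \mathbb{R} \to \mathbb{R}$. Define a tangent bundle on $M$ by $T(M) = \{[(p, v, \psi)];\ p \in M,\ v \in \mathbb{R}^2,\ \psi \in A_M\}$,
with the usual equivalence class for the point/components/chart triples. Then $M$ is a $C^2$ differentiable
manifold if $h'(t) \neq 0$ for all $t \in \mathbb{R}$.

Define a metric field $g \in X(T^{0,2}(M))$ by $g^{\psi_1}(\psi_1(p))(v, w) = g(p)([(p, v, \psi_1)], [(p, w, \psi_1)]) = v_1 w_1 + v_2 w_2$ for
$p \in M$ and $v, w \in \mathbb{R}^2$. Then in terms of the chart $\psi_2$,

\begin{align*}
g^{\psi_2}(x)(v, w) &= g^{\psi_2}(\psi_2(p))(v, w) \\
&= g(p)([(p, v, \psi_2)], [(p, w, \psi_2)]) \\
&= h'(x_1)^2 v_1 w_1 + v_2 w_2
\end{align*}

for $p \in M$ and $v, w \in \mathbb{R}^2$, where $x = \psi_2(p)$. The bilinear function $g^{\psi_1}(x)$ is positive definite for all $x \in \mathbb{R}^2$.
If $h'(t) \neq 0$ for all $t \in \mathbb{R}$, then $g^{\psi_2}(x)$ is also positive definite for all $x \in \mathbb{R}^2$. Otherwise, $g^{\psi_2}(x)$ is only
positive semi-definite for all $x \in \mathbb{R}^2$. However, $g^{\psi_2}(x)$ is positive definite for $x \in \psi_2(M')$, where
$M' = \{p \in M;\ h'(p_1) \neq 0\}$.

With respect to the $\psi_1$ chart, the Levi-Civita connection for $(M, g)$ has coefficient array $\Gamma_{\psi_1}(p)^k_{ij} = 0$ for
$p \in M$ and $i, j, k \in \mathbb{N}_2$. With respect to the $\psi_2$ chart, $g^{\psi_2}(x)_{ij,k} = 2h'(x_1)h''(x_1)\delta_{i1}\delta_{j1}\delta_{k1}$ and $\Gamma_{\psi_2}(x)^k_{ij} =
h''(x_1)h'(x_1)^{-1}\delta_{i1}\delta_{j1}\delta_{k1}$ for $p \in M'$, $x = \psi_2(p)$ and $i, j, k \in \mathbb{N}_2$. The coefficients $\Gamma_{\psi_2}(\psi_2(p))^k_{ij}$ are undefined
for $p \in M \setminus M'$.

Define curves $\gamma_{k,c} : \mathbb{R} \to M$ by $\gamma_{k,c} : t \mapsto (kt + c, t)$ for $k, c \in \mathbb{R}$. Then the curves $\gamma_{k,c}$ are affinely
parametrised geodesics in $M$ for all $k, c \in \mathbb{R}$. With respect to the $\psi_2$ chart, the coordinates of these curves
are given by $\gamma^{\psi_2}_{k,c} : \mathbb{R} \to \mathbb{R}^2$ with $\gamma^{\psi_2}_{k,c} : t \mapsto (h^{-1}(kt + c), t)$ for $k, c \in \mathbb{R}$. To verify that these curves are
geodesic by Theorem 72.1.6 in terms of the $\psi_2$ chart, write $x_1 = h^{-1}(kt + c)$ and note that

\[
\partial_t^2 \gamma^{\psi_2}_{k,c}(t)^1 + \Gamma_{\psi_2}(\psi_2(p))^1_{11}\, \partial_t \gamma^{\psi_2}_{k,c}(t)^1\, \partial_t \gamma^{\psi_2}_{k,c}(t)^1
= -k^2 h''(x_1) h'(x_1)^{-3} + h''(x_1) h'(x_1)^{-1} \cdot k h'(x_1)^{-1} \cdot k h'(x_1)^{-1} = 0,
\]

by Theorem 42.1.19 parts (i) and (ii). Similarly,
$\partial_t^2 \gamma^{\psi_2}_{k,c}(t)^2 + \Gamma_{\psi_2}(\psi_2(p))^2_{ij}\, \partial_t \gamma^{\psi_2}_{k,c}(t)^i\, \partial_t \gamma^{\psi_2}_{k,c}(t)^j = 0 + 0 = 0$. (See Example 73.10.1 and Figure 73.10.1 for the
particular case $h(t) = t^3$ for $t \in \mathbb{R}$.)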
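
The geodesic verification in this example can also be checked numerically. The sketch below is illustrative only and rests on two assumptions which are not in the example itself. The first is the hypothetical choice $h = \sinh$, for which $h'(t) \neq 0$ everywhere. The second is that the first chart coordinate of the curve is $x_1(t) = h^{-1}(kt + c)$, as in the stated curve coordinates. The residual of the first component of the geodesic equation, $\ddot x_1 + h''(x_1)h'(x_1)^{-1}\dot x_1^2$, is then estimated by central differences.

```python
import math


def h1(x):
    # h'(x) for the assumed choice h = sinh
    return math.cosh(x)


def h2(x):
    # h''(x) for the assumed choice h = sinh
    return math.sinh(x)


def x1(t, k, c):
    # First chart coordinate of the curve: h^{-1}(k t + c) with h^{-1} = asinh.
    return math.asinh(k * t + c)


def geodesic_residual(t, k, c, dt=1e-3):
    """Central-difference estimate of x1'' + Gamma^1_11(x1) * (x1')^2."""
    a, b, d = x1(t - dt, k, c), x1(t, k, c), x1(t + dt, k, c)
    velocity = (d - a) / (2 * dt)
    acceleration = (d - 2 * b + a) / (dt * dt)
    gamma_111 = h2(b) / h1(b)  # Gamma^1_11 = h''(x1) / h'(x1)
    return acceleration + gamma_111 * velocity * velocity


residual = geodesic_residual(0.7, 1.5, -0.2)
```

The residual should vanish up to the discretisation error of the difference quotients. The second component of the curve is $t$ itself, and all coefficients $\Gamma^2_{ij}$ are zero, so its equation is satisfied trivially.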


74.3.6 REMARK:  Non-tensoriality of the Christoffel array for a Levi-Civita connection. 

Although the Christoffel array is not the array of a tensor, it is required to transform in a specified way
under changes of chart $\psi$. According to Theorem 60.4.7, the differential of a parallelism must satisfy equation
(60.4.2) in order for the operator $L(\psi) = \partial_{ij} - \Gamma^k_{ij}(\psi)\partial_k$ to be a second-degree covariant tensor operator
when applied to functions $f \in C^2(M)$.


74.3.7 THEOREM: The Levi-Civita connection coefficients are tensorisation coefficients.
The Levi-Civita connection coefficient array $\Gamma^k_{ij}$ in Definition 74.3.4 satisfies condition (60.4.2) for
tensorisation coefficients for the second-order operator $\partial_{ij} - \Gamma^k_{ij}(\psi)\partial_k$ in Theorem 60.4.7.


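PROOF: Let $\psi, \tilde\psi \in \mathrm{atlas}_p(M)$ be charts at $p \in M$ for a $C^2$ Riemannian manifold $M$. Denote the respective
Levi-Civita connection coefficient arrays by $\Gamma$ and $\tilde\Gamma$. Then

\begin{align*}
\Gamma^k_{ij} &= \frac12 g^{k\ell}\Bigl(\frac{\partial g_{\ell i}}{\partial x^j} + \frac{\partial g_{\ell j}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^\ell}\Bigr), \\
\tilde\Gamma^k_{ij} &= \frac12 \tilde g^{k\ell}\Bigl(\frac{\partial \tilde g_{\ell i}}{\partial \tilde x^j} + \frac{\partial \tilde g_{\ell j}}{\partial \tilde x^i} - \frac{\partial \tilde g_{ij}}{\partial \tilde x^\ell}\Bigr),
\end{align*}

where $\tilde g^{k\ell} = \phi^k_{,a}\phi^\ell_{,b}\, g^{ab}$, $\tilde g_{ij} = \bar\phi^r_{,i}\bar\phi^s_{,j}\, g_{rs}$, $\phi = \tilde\psi \circ \psi^{-1}$ and $\bar\phi = \psi \circ \tilde\psi^{-1} = \phi^{-1}$. (Here $\phi^k_{,a} = \partial\phi^k/\partial x^a$,
$\bar\phi^r_{,i} = \partial\bar\phi^r/\partial\tilde x^i$ and $\bar\phi^r_{,ij} = \partial^2\bar\phi^r/\partial\tilde x^i\partial\tilde x^j$.) Then

\begin{align*}
\frac{\partial \tilde g_{\ell i}}{\partial \tilde x^j} &= \frac{\partial}{\partial \tilde x^j}\bigl(\bar\phi^m_{,\ell}\bar\phi^s_{,i}\, g_{ms}\bigr) \\
&= \bar\phi^m_{,\ell j}\bar\phi^s_{,i}\, g_{ms} + \bar\phi^m_{,\ell}\bar\phi^s_{,ij}\, g_{ms} + \bar\phi^m_{,\ell}\bar\phi^s_{,i}\bar\phi^r_{,j}\frac{\partial g_{ms}}{\partial x^r}.
\end{align*}

Therefore

\begin{align}
\tilde\Gamma^k_{ij} &= \frac12 \phi^k_{,a}\phi^\ell_{,b}\, g^{ab}\Bigl(
\boxed{\bar\phi^m_{,\ell j}\bar\phi^s_{,i}\, g_{ms}} + \bar\phi^m_{,\ell}\bar\phi^s_{,ij}\, g_{ms} + \bar\phi^m_{,\ell}\bar\phi^s_{,i}\bar\phi^r_{,j}\frac{\partial g_{ms}}{\partial x^r} \notag\\
&\qquad + \boxed{\bar\phi^m_{,\ell i}\bar\phi^s_{,j}\, g_{ms}} + \bar\phi^m_{,\ell}\bar\phi^s_{,ji}\, g_{ms} + \bar\phi^m_{,\ell}\bar\phi^s_{,j}\bar\phi^r_{,i}\frac{\partial g_{ms}}{\partial x^r} \notag\\
&\qquad - \boxed{\bar\phi^m_{,i\ell}\bar\phi^s_{,j}\, g_{ms}} - \boxed{\bar\phi^m_{,i}\bar\phi^s_{,j\ell}\, g_{ms}} - \bar\phi^m_{,i}\bar\phi^s_{,j}\bar\phi^r_{,\ell}\frac{\partial g_{ms}}{\partial x^r}\Bigr) \notag\\
&= \frac12 \phi^k_{,a}\phi^\ell_{,b}\, g^{ab}\Bigl(2\bar\phi^m_{,\ell}\bar\phi^s_{,ij}\, g_{ms}
+ \bar\phi^m_{,\ell}\bar\phi^s_{,i}\bar\phi^r_{,j}\frac{\partial g_{ms}}{\partial x^r}
+ \bar\phi^m_{,\ell}\bar\phi^s_{,j}\bar\phi^r_{,i}\frac{\partial g_{ms}}{\partial x^r}
- \bar\phi^m_{,i}\bar\phi^s_{,j}\bar\phi^r_{,\ell}\frac{\partial g_{ms}}{\partial x^r}\Bigr) \notag\\
&= \phi^k_{,a}\, g^{am} g_{ms}\,\bar\phi^s_{,ij}
+ \frac12 \phi^k_{,a}\bar\phi^r_{,i}\bar\phi^s_{,j}\, g^{am}\Bigl(\frac{\partial g_{mr}}{\partial x^s} + \frac{\partial g_{ms}}{\partial x^r} - \frac{\partial g_{rs}}{\partial x^m}\Bigr) \notag\\
&= \phi^k_{,m}\bar\phi^r_{,i}\bar\phi^s_{,j}\,\Gamma^m_{rs} + \phi^k_{,m}\bar\phi^m_{,ij}. \tag{74.3.3}
\end{align}

(The boxed terms cancel to zero.) This matches equation (60.4.2).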


74.3.8 REMARK: The Christoffel array is a family of "tensorisation coefficients”. 

It is the term in equation (74.3.3) containing second derivatives of the chart transition map which makes
the Christoffel array a non-tensorial object. That is, the array does not correspond to the coefficients of
a tensor of any type. The Christoffel array is in
fact a family of “tensorisation coefficients”. (See Definition 60.4.9 for general tensorisation coefficients.)
The non-tensorial Christoffel array may be combined with non-tensorial higher-order derivatives to produce 
tensorial objects. The second-order derivative term in (74.3.3) is what allows the Christoffel array to convert 
non-tensorial partial derivatives into tensorial covariant derivatives. 


Equation (60.4.2) does not uniquely determine the Levi-Civita connection. In fact, equation (60.4.2) places 
a fairly weak constraint on the tensorisation coefficients which specify a connection. 


74.3.9 REMARK: Length-parametrised geodesics. 

A length-parametrised geodesic is a geodesic curve for which the curve parameter corresponds to the length 
of the curve. This is called a “length-parametrised geodesic” by Gallot/Hulin/Lafontaine [13], page 116. It 
is called a “normal geodesic” by Greene/Wu [87], page 6. 


In the case of a non-empty interval $I \subseteq \mathbb{R}$ with finite $\inf(I)$ or finite $\sup(I)$, it is possible to translate, and
reverse if necessary, the parametrisation so that $\inf(I) = 0$ and $\sup(I) > 0$. Then the parameter $t \in I$ is the
partial length of the curve up to the parameter value $t$. However, if $\inf(I) = -\infty$ and $\sup(I) = \infty$, such a
literal kind of length-parametrisation is not possible.


74.3.10 DEFINITION: A length-parametrised geodesic in a Riemannian $C^1$ differentiable manifold $M$ with
metric tensor field $g \in X(T^{0,2}(M))$ is a $C^1$ geodesic curve $\gamma : I \to M$ for some interval $I \subseteq \mathbb{R}$ such that

\[ \forall t \in \mathrm{Int}(I),\quad g_{\gamma(t)}(\gamma'(t), \gamma'(t)) = 1. \]


((2016-8-14. After Section 74.3, present a vector-field formalism version of the Levi-Civita connection. )) 


[ www. geometry. org/dg. html] [ draft: UTC 2023-1-3 Tuesday 00:13] 




74.4. Curvature tensors 


74.4.1 REMARK: Various curvature tensors in Riemannian manifolds. 
Section 74.4 includes Riemann curvature, Ricci curvature and scalar curvature. The Riemann curvature 
tensor is defined in terms of the Levi-Civita connection. 


Frankel [12], page 229, gives the alternative name “Riemann-Christoffel tensor” for the Riemann tensor. 


74.4.2 REMARK: Spaces and maps for the Riemann curvature tensor. 

Figure 74.4.1 illustrates spaces and maps which are relevant to the definition of the Riemann curvature tensor 
for the Levi-Civita connection on a Riemannian manifold. The double bar on an arrow in Figure 74.4.1 
indicates a bilinear map. The zig-zag on the same arrow indicates an antisymmetric (alternating) map. The 
squares on two arrows indicate symmetric maps, which are in fact (special) orthogonal. 


[Figure 74.4.1: diagram of maps between the spaces $\mathbb{R}$, $T_p(M)$, $\Lambda^2 T_p(M)$, $SO(T_p(M))$,
$\mathrm{Lin}(\Lambda^2 T_p(M), SO(T_p(M)))$ and $A_2(T_p(M), SO(T_p(M)))$]

Figure 74.4.1 Riemann curvature tensor spaces and maps for a Levi-Civita connection


As mentioned in Remark 71.11.3, the Riemann curvature tensor in the more general context of affinely
connected manifolds may be considered to be an element of $\mathrm{Lin}(\Lambda^2 T_p(M), \mathrm{Aut}(T_p(M)))$, or equivalently as
an element of $A_2(T_p(M), \mathrm{Aut}(T_p(M)))$, both of which are illustrated in Figure 71.11.1.

The group $SO(T_p(M)) = \{\phi \in \mathrm{Aut}(T_p(M));\ \forall v, w \in T_p(M),\ g_p(\phi(v), \phi(w)) = g_p(v, w)\}$ is used in the
Riemannian manifold context instead of $\mathrm{Aut}(T_p(M)) = GL(T_p(M)) \subseteq \mathrm{Lin}(T_p(M), T_p(M))$ because the
Levi-Civita connection preserves the metric.


The space $\mathrm{Lin}(\Lambda^2 T_p(M), SO(T_p(M)))$ seems to nicely capture the geometric essence of the Riemann curvature
tensor at each point, but for practical reasons, the equivalent space $A_2(T_p(M), SO(T_p(M)))$ is more often
used because it is more convenient to write a pair of vectors $(u, v)$ than the corresponding wedge-product $u \wedge v$.


74.4.3 REMARK: The Riemann curvature tensor for a Riemannian manifold. 

The Riemann curvature tensor in Definition 74.4.4 is essentially identical to the Riemann curvature for 
affinely connected manifolds in Definition 71.11.8. In fact, the Riemann curvature is well defined for general 
differentiable fibre bundles, as described in Remark 70.4.3. 


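74.4.4 DEFINITION: The Riemann curvature (tensor field) (component version) in a $C^2$ Riemannian manifold
$M$ with $n = \dim(M) \in \mathbb{Z}^+$, with respect to a $C^2$ chart $\psi$ for $M$, is the second-degree covariant tensor
with components $(R^i{}_{jk\ell}(x))_{i,j,k,\ell=1}^n$ for $x \in \mathrm{Range}(\psi)$ which is defined by the formula

\[ R^i{}_{jk\ell} = \partial_{x^k}\Gamma^i_{j\ell} - \partial_{x^\ell}\Gamma^i_{jk} + \Gamma^m_{j\ell}\Gamma^i_{mk} - \Gamma^m_{jk}\Gamma^i_{m\ell}, \]

where the family $(\Gamma^k_{ij}(x))_{i,j,k=1}^n \in \mathbb{R}^{\mathbb{N}_n^3}$ at each $x \in \mathrm{Range}(\psi)$ is the Christoffel array for the Levi-Civita
connection on $M$ with respect to $\psi$.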


74.4.5 REMARK: The Ricci curvature tensor for a Riemannian manifold. 
The Ricci curvature tensor in Definition 74.4.6 is essentially identical to the Ricci curvature for affinely 
connected manifolds in Definitions 71.11.11 and 71.11.14. 


74.4.6 DEFINITION: The Ricci curvature (field) (component version) in a $C^2$ $n$-dimensional Riemannian
manifold $M$ for $n \in \mathbb{Z}^+$, with respect to a $C^2$ chart $\psi$ for $M$, is the second-degree covariant tensor with
components $(R_{ij}(x))_{i,j=1}^n$ for $x \in \mathrm{Range}(\psi)$ which is defined by the formula

\[ R_{ij} = \partial_{x^k}\Gamma^k_{ij} - \partial_{x^j}\Gamma^k_{ik} + \Gamma^m_{ij}\Gamma^k_{mk} - \Gamma^k_{mj}\Gamma^m_{ik}, \]

where the family $(\Gamma^k_{ij}(x))_{i,j,k=1}^n \in \mathbb{R}^{\mathbb{N}_n^3}$ at each $x \in \mathrm{Range}(\psi)$ is the Christoffel array for the Levi-Civita
connection on $M$ with respect to $\psi$.


74.4.7 DEFINITION: The scalar curvature (function) for a $C^2$ Riemannian manifold $(M, g)$ is the map
$R : M \to \mathbb{R}$ defined by

\[ \forall p \in M,\quad R(p) = \sum_{i=1}^n R(e_i(p), e_i(p)), \]

where $n = \dim(M)$ and $(e_i(p))_{i=1}^n$ is a basis of $T_p(M)$ for all $p \in M$, such that $\forall i, j \in \mathbb{N}_n$, $g(e_i(p), e_j(p)) = \delta_{ij}$.


74.4.8 DEFINITION: The scalar curvature (field) (component version) in a $C^2$ $n$-dimensional Riemannian
manifold $M$ for $n \in \mathbb{Z}^+$, with respect to a $C^2$ chart $\psi$ for $M$, is the real number $R(x)$ for $x \in \mathrm{Range}(\psi)$
which is defined by the formula


where the family $(\Gamma^k_{ij}(x))_{i,j,k=1}^n \in \mathbb{R}^{\mathbb{N}_n^3}$ at each $x \in \mathrm{Range}(\psi)$ is the Christoffel array for the Levi-Civita
connection on $M$ with respect to $\psi$.
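
The component formulas for the Ricci and scalar curvature can be exercised on a familiar case. The following sketch is illustrative only and makes several assumptions which should be stated loudly. It assumes the index convention $R_{ij} = \partial_k\Gamma^k_{ij} - \partial_j\Gamma^k_{ik} + \Gamma^m_{ij}\Gamma^k_{mk} - \Gamma^k_{mj}\Gamma^m_{ik}$ and the contraction $R = g^{ij}R_{ij}$, which agrees with the orthonormal-basis sum in Definition 74.4.7. It uses the unit 2-sphere in coordinates $(\vartheta, \varphi)$ with its well-known Christoffel array, and differentiates that array by central differences. All function names are hypothetical.

```python
import math


def gamma(x):
    """Christoffel array G[k][i][j] of the unit 2-sphere at x = (theta, phi)."""
    th = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)               # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)   # Gamma^phi_{theta phi}
    return G


def d_gamma(x, l, h=1e-5):
    # Central-difference derivative of the whole array with respect to x^l.
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    Gp, Gm = gamma(xp), gamma(xm)
    return [[[(Gp[k][i][j] - Gm[k][i][j]) / (2 * h) for j in range(2)]
             for i in range(2)] for k in range(2)]


def ricci(x):
    G = gamma(x)
    dG = [d_gamma(x, l) for l in range(2)]  # dG[l][k][i][j] ~ d_l Gamma^k_ij
    R = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            s = 0.0
            for k in range(2):
                s += dG[k][k][i][j] - dG[j][k][i][k]
                for m in range(2):
                    s += G[m][i][j] * G[k][m][k] - G[k][m][j] * G[m][i][k]
            R[i][j] = s
    return R


def scalar_curvature(x):
    th = x[0]
    inv = [[1.0, 0.0], [0.0, 1.0 / math.sin(th) ** 2]]  # inverse of diag(1, sin^2 theta)
    R = ricci(x)
    return sum(inv[i][j] * R[i][j] for i in range(2) for j in range(2))


ric = ricci([1.1, 0.4])
scal = scalar_curvature([1.1, 0.4])
```

Under these assumptions the Ricci components should equal the metric components $\mathrm{diag}(1, \sin^2\vartheta)$, and the scalar curvature should equal 2 at every point away from the poles. With the opposite sign convention for the curvature, both results would change sign.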


74.5. Sectional curvature 


74.5.1 REMARK: The bilinear sectional curvature map. 
The antisymmetric bilinear sectional curvature map in Definition 74.5.2 is defined in preparation for defining
the standard sectional curvature quotient map in Definition 74.5.4, which divides the bilinear sectional
curvature by the area of the antisymmetric forms in the domain of the map.

The space $X(A_2(M, \mathbb{R}))$ contains maps $K : M \to \bigcup_{p \in M} A_2(T_p(M), \mathbb{R})$ such that $K(p) \in A_2(T_p(M), \mathbb{R})$ for
all $p \in M$. These are cross-sections of the tangent bundle $A_2(T_p(M), \mathbb{R})$. Therefore the bilinear sectional
curvature value $K(p)$ is a tensor at each point $p \in M$. This contrasts with the sectional curvature quotient
in Definition 74.5.4, which is not a tensor at each point of the manifold.


74.5.2 DEFINITION: The bilinear sectional curvature map on a $C^2$ Riemannian manifold $(M, g)$ is the map
$K \in X(A_2(M, \mathbb{R}))$ defined by

\[ \forall p \in M,\ \forall u, v \in T_p(M),\quad K(p)(u, v) = g(p)(R(u, v)(u), v). \]


74.5.3 REMARK: The sectional curvature quotient map. 

The standard sectional curvature quotient map in Definition 74.5.4 is more difficult to write a specification
for than the corresponding bilinear map in Definition 74.5.2. Since this map is a quotient, it is not linear,
and it is not even defined for some pairs of vectors. The quotient map $Q$ is in the untidy space

\[ \bigl\{Q : M \to \textstyle\bigcup_{p \in M}(A_p \to \mathbb{R});\ \forall p \in M,\ \mathrm{Dom}(Q(p)) = A_p\bigr\}, \]

where

\[ \forall p \in M,\quad A_p = \{(u, v) \in T_p(M)^2;\ u \wedge v \neq 0\}. \]

In other words, for each $p \in M$, $Q(p)$ is a real-valued function on $\{(u, v) \in T_p(M)^2;\ u \wedge v \neq 0\}$. Whenever
$u \wedge v = 0$, the denominator in the quotient on line (74.5.1) equals zero.


74.5.4 DEFINITION: The sectional curvature (quotient) (map) on a $C^2$ Riemannian manifold $(M, g)$ is the
map $Q : M \to \bigcup_{p \in M}(\{(u, v) \in T_p(M)^2;\ u \wedge v \neq 0\} \to \mathbb{R})$ defined by


$\forall p \in M,\ \forall (u, v) \in \{(u, v) \in T_p(M)^2;\ u \wedge v \neq 0\},$


Q(p)(u, v) = 


(74.5.1) 
74.5.5 REMARK: The Riemann curvature tensor may be regarded as the “true” sectional curvature.

The Riemann curvature tensor itself may be thought of as the “true” sectional curvature tensor because 
R(u, v) yields the holonomy deviation action on the tangent space at a given point as a result of infinitesimal 
parallel transport around an area element $u \wedge v$ at the point. Each such wedge product determines a
two-dimensional “section” of the tangent space at the point. The sectional curvature, on the other hand, throws
away much of the curvature information, yielding instead only a real-number quantification of a projection 
of the holonomy deviation action into the plane of the area element. A metric tensor is required to make 
this quantification. 


74.6. Differential operators 


74.6.1 REMARK: The differential versus the gradient. 

The differential $df$ of a function $f \in C^1(M, \mathbb{R})$ for a $C^1$ manifold $M$ in Definition 58.1.2 has the advantage
that it is well defined for any differentiable function on a differentiable manifold. It has the disadvantage
that it is not a tangent vector. It is a tangent covector. So it does not transform as one would expect a
“gradient” operator to transform under local diffeomorphisms.


In the concrete context of a geographical map, consider arrows representing the differential of a real-valued
function such as altitude, temperature or air pressure. If the view is “zoomed in” by a factor of two, the differential
vectors would shrink by a factor of two, although the arrows would still appear to have the correct direction.
(The reason for shrinkage is that the contour lines would now be further apart. So the differential would
be smaller.) More seriously, if the map is zoomed in by a factor of two in the X direction, a vector which
was previously pointing to the north-east would now be pointing between north and north-east, whereas the
actual “gradient” should now appear between east and north-east. (See Figure 41.6.1 in Remark 41.6.9 for
visualisation of the differential of a real-valued function.)


In physical terms on the ground, the gradient is the line of steepest ascent per distance travelled, and this is 
the clue to the problem. The distance is not defined on differentiable manifolds which lack a metric function. 
One may define an ad-hoc metric function, but the line of steepest ascent per distance would depend on 
this ad-hoc choice of metric function. An arrow indicating the direction of steepest ascent on a map should, 
at least for linear transformations, join the same two points on the map, no matter how it is distorted. In 
the case of non-linear diffeomorphisms, such an arrow should have the correct direction at its base, and its 
length should scale like a tangent vector at that point. 


It is clear that a metric function of some kind is required for the definition of gradient vectors, and it is 
also clear that the gradient should be a true tangent vector, not a covector. This provides the motivation 
for Definition 74.6.2. Since the metric tensor field $g \in X(T^{0,2}(M))$ of a Riemannian manifold is an inner
product at each point, it follows from Theorem 24.9.11 (ii) that the linear map from $T_p(M)$ to $T_p^*(M)$ which
is defined by $u \mapsto (v \mapsto g(p)(u, v))$ is invertible for all $p \in M$. This is exploited in Definition 74.6.2 to
construct a gradient field for a differentiable real-valued function.


74.6.2 DEFINITION: The gradient of a differentiable real-valued function $f \in C^1(M, \mathbb{R})$ on a $C^1$ Riemannian
manifold $(M, g)$ is the vector field $\phi \in X(T(M))$ defined by

\[ \forall p \in M,\ \forall v \in T_p(M),\quad g(p)(\phi(p), v) = (df)_p(v). \]

In other words, the gradient of $f$ is $\mu^\sharp \circ df$, where $\mu^\sharp$ is the index-raising isomorphism in Definition 73.5.4.
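
In chart components, raising the index of the differential amounts to $(\mathrm{grad}\, f)^i = g^{ij}\partial_j f$. The following minimal sketch illustrates this under stated assumptions. It uses the Euclidean plane in polar coordinates $(r, \vartheta)$, where the inverse metric is $\mathrm{diag}(1, r^{-2})$, and a hypothetical test function $f(r, \vartheta) = r^2\vartheta$ whose differential is supplied in closed form. The function names are illustrative only.

```python
def gradient(metric_inverse, df, x):
    """Raise the index of the differential: (grad f)^i = sum_j g^{ij} d_j f."""
    ginv = metric_inverse(x)
    d = df(x)
    n = len(d)
    return [sum(ginv[i][j] * d[j] for j in range(n)) for i in range(n)]


def polar_inverse(x):
    # Inverse of the assumed metric dr^2 + r^2 dtheta^2, with x = (r, theta).
    r = x[0]
    return [[1.0, 0.0], [0.0, 1.0 / (r * r)]]


def df_example(x):
    # Components of df for the test function f(r, theta) = r^2 * theta.
    r, th = x
    return [2.0 * r * th, r * r]


grad = gradient(polar_inverse, df_example, [2.0, 0.5])
```

At $(r, \vartheta) = (2, 0.5)$ the covector components are $(2, 4)$, whereas the gradient components are $(2, 1)$. The angular component is rescaled by $r^{-2}$, which is exactly the metric dependence that the differential lacks.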


74.6.3 NOTATION: $\mathrm{grad}\, f$, for a function $f \in C^1(M, \mathbb{R})$ on a $C^1$ Riemannian manifold $(M, g)$, denotes the
gradient of $f$.


74.6.4 REMARK: The Laplacian operator is the trace of the Hessian or the divergence of the gradient. 
The Laplacian operator, or Laplace-Beltrami operator, may be constructed in (at least) two different ways, 
as either the trace of the Hessian or the divergence of the gradient. (See Section 71.9 for the Hessian. 
See Section 71.8 for the divergence.) In each case, there are two operations, and in each case, one of the 
operations requires only an affine connection, while the other operation requires a metric. Both the Hessian 
and the divergence are well defined in the absence of a metric. 


The construction of the Laplacian as the trace of the Hessian emphasises the idea that it is a sum of
orthogonal second derivatives, which has very fundamental significance in physics, particularly in electrostatics,
gravitation theory, the heat equation and the diffusion equation. Needless to say, extending the Laplacian
to “curved spaces” is important for the extension of these physical models to general relativistic scenarios.


It is not quite so easy to motivate the construction of the Laplacian operator as $\Delta = \mathrm{div} \circ \mathrm{grad}$. The
gradient of a real-valued function has a clear physical interpretation, and the divergence of a vector field has 
a clear physical interpretation, but it is not quite so immediately clear that the divergence of the gradient 
field should be meaningful in terms of the original real-valued function. 


74.6.5 REMARK: The Laplacian operator as the trace of the Hessian. 

The Laplacian operator at a point $p$ in an $n$-dimensional $C^2$ Riemannian manifold $M$ is the sum of the
second covariant derivatives of a real-valued function in $n$ orthogonal directions at $p$. The individual second
derivatives are effectively calculated along geodesic curves passing through $p$.


For a function $f \in C^2(M, \mathbb{R})$, one may attempt to define $\Delta f = g^{ij} D_{e_i} D_{e_j} f$, but this formula has some
difficulties. An attempt is made here to employ abstract covariant derivatives to avoid using the Christoffel
array, but component indices $i$ and $j$ are still employed, and a basis vector field $(e_i)_{i=1}^n$ is required for
the summation expression. Another difficulty is that $D_{e_i}$ acts on the covariant vector $D_{e_j} f$. Hence the
operator $D_{e_i}$ must be interpreted as acting on the covariant vector bundle instead of the usual contravariant
vector bundle. Therefore the expression in line (74.6.1) in Definition 74.6.7 makes no attempt to “hide the
coordinates”.


The factor $g^{ij}$ in Definition 74.6.7 takes care of orthogonality and scaling of the second derivatives, no matter
which basis vector field $(e_i)_{i=1}^n$ is employed. The term $-\Gamma^k_{ij}\partial_k$ ensures that the second derivatives “follow
the geodesics”. Thus the Laplacian operator requires both the affine connection and the Riemannian metric
for its definition, whereas general elliptic operators require only an affine connection.


74.6.6 DEFINITION: The Laplace-Beltrami operator on a $C^2$ Riemannian manifold $M$ is the operator
$\Delta : C^2(M, \mathbb{R}) \to C^0(M, \mathbb{R})$ defined by

\[ \forall u \in C^2(M, \mathbb{R}),\ \forall p \in M,\quad \Delta u(p) = g^{ij}(p)\bigl(\partial_{ij}u(p) - \Gamma^k_{ij}(p)\,\partial_k u(p)\bigr), \]

where $\Gamma$ is the Christoffel array for the Levi-Civita connection for the metric tensor field $g$ on $M$.


The Laplacian (operator) on a $C^2$ Riemannian manifold is the same as the Laplace-Beltrami operator.


((2016-8-14. Give a “component-free” Laplacian in Definition 74.6.6. This will probably require gradient and 
divergence operators to be defined first. )) 


74.6.7 DEFINITION: The Laplace-Beltrami operator on a $C^2$ Riemannian manifold $M$ is the operator
$\Delta : C^2(M,\mathbb{R}) \to C^0(M,\mathbb{R})$ defined by


$\Delta = g^{ij}(\partial_{ij} - \Gamma^k_{ij}\partial_k)$,    (74.6.1)


where $\Gamma$ is the Christoffel array for the Levi-Civita connection for the metric tensor field $g$ for $M$.


The Laplacian (operator) on a $C^2$ Riemannian manifold is the same as the Laplace-Beltrami operator.
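

As a concrete check of the component formula $g^{ij}(\partial_{ij} - \Gamma^k_{ij}\partial_k)$, the following minimal sketch evaluates it in polar coordinates on the Euclidean plane. The choice of example is an assumption made here for illustration: the metric is $g = \mathrm{diag}(1, r^2)$, the only Christoffel component entering the diagonal terms is $\Gamma^r_{\theta\theta} = -r$, and the derivatives are approximated by central finite differences. The function name `laplace_beltrami_polar` is hypothetical.

```python
import math


def laplace_beltrami_polar(u, r, theta, h=1e-4):
    """Evaluate g^{ij}(d_ij u - Gamma^k_ij d_k u) at (r, theta) for g = diag(1, r^2)."""
    # Central finite differences for the partial derivatives that are needed.
    u_r = (u(r + h, theta) - u(r - h, theta)) / (2 * h)
    u_rr = (u(r + h, theta) - 2 * u(r, theta) + u(r - h, theta)) / h**2
    u_tt = (u(r, theta + h) - 2 * u(r, theta) + u(r, theta - h)) / h**2
    # Inverse metric: g^{rr} = 1, g^{theta theta} = 1/r^2, off-diagonal terms vanish.
    g_rr_inv = 1.0
    g_tt_inv = 1.0 / r**2
    # Gamma^k_{rr} = 0 for all k, so the rr term needs no correction.
    # Gamma^r_{theta theta} = -r and Gamma^theta_{theta theta} = 0.
    gamma_r_tt = -r
    return g_rr_inv * u_rr + g_tt_inv * (u_tt - gamma_r_tt * u_r)


# u = r^2 is x^2 + y^2 in Cartesian coordinates, whose flat Laplacian is 4.
value_quadratic = laplace_beltrami_polar(lambda r, t: r**2, 1.5, 0.3)
# u = r^2 cos(2 theta) is x^2 - y^2, which is harmonic.
value_harmonic = laplace_beltrami_polar(lambda r, t: r**2 * math.cos(2 * t), 1.5, 0.3)
```

The correction term contributes $u_r / r$, which is exactly the first-order term of the familiar polar Laplacian, so the sketch agrees with the flat-space result up to discretisation error.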
Chapter 75


PSEUDO-RIEMANNIAN MANIFOLDS 


75.1 Pseudo-Riemannian geometry concepts
75.2 Pseudo-Riemannian metric tensor fields
75.3 General relativity
75.4 Gauge theory on pseudo-Riemannian spaces


(( 2016-7-19. Overhaul/rewrite Chapter 75 because it is incomplete and inconsistent, and sometimes incorrect. )) 


75.0.1 REMARK: Pseudo-Riemannian versus semi-Riemannian manifolds. 

Some authors use the term “semi-Riemannian manifolds” instead of “pseudo-Riemannian manifolds”. (See 
for example O’Neill [295], page 12; Petersen [31], pages 4-5; Bishop/Goldberg [3], page 208; Sternberg [38], 
page 117.) However, the qualities of pseudo-Riemannian manifolds are so different to those of Riemannian 
manifolds that the prefix “pseudo” is richly deserved. A Riemannian manifold is a special kind of metric 
space, whereas a pseudo-Riemannian manifold is not a special case of anything that looks like a metric 
space. Efforts to make pseudo-Riemannian spaces look like Riemannian spaces by using an imaginary time 
parameter seem artificial and unconvincing. There are many commonalities, but there are also many “faux 
amis”. The prefix “pseudo” comes from a classical Greek word meaning “lying” or “false”. 


Pseudo-Riemannian manifolds are not Riemannian. Neither topic is a restriction, extension or application 
of the other. So the theorems and definitions of Chapters 73 and 74 cannot be automatically applied to 
Chapter 75. Each theorem and definition must be carefully examined to determine whether it depends on 
the special assumptions for Riemannian manifolds. A large proportion of theorems and definitions do not 
pass the examination! However, many of the concepts for pseudo-Riemannian manifolds make little sense in 
a Riemannian manifold. So it is not necessarily beneficial to present pseudo-Riemannian before Riemannian. 


One could, in principle, present first all of the concepts which are meaningful in both Riemannian and 
pseudo-Riemannian manifolds, and then present the specific cases in separate chapters, referring back to 
the common-concepts chapter. However, the commonality is not great enough to justify such an approach. 
It is more efficient to present the Riemannian and pseudo-Riemannian cases as separate topics, using the 
Riemannian case as a model to attempt to mimic in the pseudo-Riemannian case. The fact that many 
concepts cannot be mimicked gives valuable insight into the nature of both cases. Riemannian manifolds are 
perhaps best thought of as belonging to static geometry, which is a mathematical topic, whereas pseudo- 
Riemannian manifolds (with a single time parameter) are best thought of as belonging to physics, which 
is a more dynamic subject. The commonalities and clashes between these two contexts are (currently) an 
inescapable and undeniable reality of the interaction between the two disciplines. 


75.0.2 REMARK: Literature for pseudo-Riemannian manifolds. 
Presentations of pseudo-Riemannian manifolds are rarely given by pure mathematicians. The great majority 
of presentations are given in theoretical physics texts, integrated with applications to general relativity. 


Some presentations of pseudo-Riemannian geometry are given by Frankel [12], pages 291-332; Sternberg [38],
pages 117-148, 161-187, 231-289; O'Neill [295], pages 12-55; Misner/Thorne/Wheeler [292], pages 304-358;
Szekeres [305], pages 516-542; Bishop/Goldberg [3], pages 208, 210-211, 238-244, 249-250; Goenner [270],
pages 217-264; Rebhan [299], pages 910-985; Choquet-Bruhat [6], pages 94-100, 120-124; Levi-Civita [26],
pages 320-359, 369-383.


75.0.3 REMARK: Physics motivates Minkowski space-time and pseudo-Riemannian geometry.
Pseudo-Riemannian manifolds are an outgrowth of Minkowski space-time, whose motivation lies almost 
entirely in its application to the physics of relativistic field theories, both quantum and non-quantum, in 
particular to relativistic electromagnetic field theory. The motivation for pseudo-Riemannian geometry lies 
almost entirely in its application to various theories of gravity and cosmology. So it seems desirable to 
give here a summary of the basic concepts of special and general relativity to motivate the mathematics of 
pseudo-Riemannian manifolds. However, it would be difficult to know where to stop. Therefore apart from 
some brief comments on physical interpretation, delving into the physics will be avoided as much as possible 
in Chapter 75. Although this book attempts to recursively present the logical and mathematical background 
of all differential geometry concepts, such a recursive presentation of the physics which motivates Minkowski 
space-time and pseudo-Riemannian geometry would unduly expand the scope of this book. 


Differential geometry was a minor backwater of mathematics until its applicability to gravity theory became 
evident. Thus this book ends with the DG topic which is most applicable, but the scope of the book does 
not permit any serious discussion of the physics which principally motivates the entire tower of concepts. 


75.1. Pseudo-Riemannian geometry concepts 


75.1.1 REMARK: The relation of pseudo-Riemannian manifolds to other mathematical structures. 
Pseudo-Riemannian manifolds are an extension of the concept of a Riemannian manifold. They are part of 
the mathematical framework of general relativity. 


Pseudo-Riemannian manifolds use Lorentz transformations as the structure group for affine connections 
instead of the orthogonal transformations which are used in Riemannian manifolds. Lorentz transformations 
were introduced by Voigt in 1887 as an invariance group for the wave equation. (See Voigt [336], and 
Pauli [296], page 1.) Perhaps they should be called Voigt-Lorentz transformations! 


Pseudo-Riemannian manifolds are also a generalisation of flat Minkowski spaces, which are significant
because of their application in the formalisation of relativistic field theories. Thus pseudo-Riemannian
manifolds may be regarded as a hybrid of Riemannian manifolds and Minkowski space-time. In other words, a
pseudo-Riemannian manifold may be thought of as a Riemannian manifold to which a time parameter has
been added, or as a flat Minkowski space-time to which curvature has been added. This is illustrated in
Figure 75.1.1.


[Diagram: Euclidean space at the top, linked to Riemannian manifold and Minkowski space-time in the middle row, which are both linked to pseudo-Riemannian manifold at the bottom]


Figure 75.1.1  Pseudo-Riemannian manifolds, Riemannian manifolds and Minkowski space-time 


From the pure mathematical point of view, a pseudo-Riemannian manifold may have any number of “time
parameters”. In other words, the signature may be any pair of non-negative integers $(m,n)$. But in practice,
there is relatively little interest in the case of multiple “time parameters”. Thus it is generally tacitly
assumed that the signature is $(1,n)$ or $(n,1)$, where $n$ is the number of “space parameters”. So one could
perhaps refer instead to “time-dependent Riemannian manifolds”, although this would not be accurate.
Pseudo-Riemannian manifolds are not a simple cross-product of a Riemannian manifold with a “flat” time
parameter. The time parameter is also “curved”.


Very roughly, one may say that Minkowski space-time is a mathematical framework for electromagnetic
field theory under the assumption that the speed of light is the same for all “inertial frames” or “inertial
observers”. (These are effectively frames or observers for which Newton’s laws are observed to hold true. In
other words, momentum is conserved in the absence of any identifiable forces acting on objects. Of course, 
this is a circular definition because force is measured as the discrepancy between the motion of objects and 
the motion they would have if momentum was conserved.) The Minkowski space-time model essentially 
says that two figures are the same if they are related by Lorentz transformations, in much the same way 
that Euclidean geometry says that two objects are the same (or “congruent”) if they are related by some 
combination of rotations and translations. Any figure or property which is not invariant under Lorentz 
transformations is rejected as not being “relativistically invariant”. 


Equally roughly, one may say that Riemannian geometry is a mathematical framework for curved space. 
When Minkowski space-time and Riemannian geometry are combined, the result is a mathematical framework 
in which both gravity and electromagnetism may be modelled. 


75.1.2 REMARK: Pseudo-Riemannian geometry is not a true geometry. 

From the mathematical point of view, it is difficult to call pseudo-Riemannian geometry a true geometry, 
but due to its importance as the mathematical framework for general relativity, it is desirable to at least 
attempt to “fit a round peg to a square hole”. In traditional geometry, one may measure a distance between 
two points as often as one wishes, but a space-time interval happens once and only once in all of history. 
Space-time intervals cannot be examined in any way because they are either in the past or in the future, or 
else they are in the very rapid process of happening, after which they will become permanently inaccessible 
in the past. 

The space-time concept is a kind of pseudo-geometry whose properties are probably more closely related to 
methods of measurement than to the nature of the physical universe itself. Observers with different velocities 
are in some way analogous to observers at different locations observing objects in three-dimensional space. 
But intuitions from space-geometries are mostly misleading. It is perhaps best to regard Minkowski space- 
time and pseudo-Riemannian manifolds as time-dependent geometries with flexible coupling between space 
and time rather than as a four-dimensional space with an unusual invariance group. 


75.1.3 REMARK: The relation of pseudo-Riemannian manifolds to pseudo-distance functions. 

As discussed in Section 73.9, a Riemannian manifold is a metric space which can be given a $C^2$ differentiable
manifold structure in such a way that the square of the distance from each fixed point is twice differentiable
at that point. This definition can be extended in a limited way to pseudo-Riemannian manifolds, which can
be constructed from “pseudo-metric spaces” which can be given a $C^2$ differentiable manifold structure so that
the square of the “pseudo-distance” from each fixed point is twice differentiable at that point. Recovering
the pseudo-metric tensor from the point-to-point interval function is only possible under certain technical 
conditions. For example, such recovery succeeds if extremal paths are unique between each pair of points, 
and the paths depend sufficiently smoothly on the points. 


To see how a pseudo-Riemannian manifold may fail to have a global “pseudo-distance” function from which 
the pseudo-metric tensor field can be derived, such manifolds should first be defined in order to give examples. 
However, Example 75.1.5 gives some idea of how a differentiable manifold can be locally pseudo-Riemannian 
while there is no possibility of defining a global pseudo-distance which is consistent with the pseudo-metric 
tensor field. For comparison, Example 75.1.4 shows that global pseudo-metric tensor fields may be equi- 
informational and interconvertible with global pseudo-distance functions. 


75.1.4 EXAMPLE:  Spacelike and timelike zones of a pseudo-Riemannian distance-squared function. 

The generalisation from Riemannian to pseudo-Riemannian manifolds permits the distance-squared function
to be negative. This raises the rather sensitive question of how one defines the distance function itself. For
example, if the point set is $M = \mathbb{R}^2$ and the distance-squared function of a point $x \in \mathbb{R}^2$ from $0 \in \mathbb{R}^2$ is
specified as $d(x)^2 = x_1^2 - x_2^2$, then the distance for $x$ with $|x_2| \le |x_1|$ is easily obtained as
$d^+(x) = (d(x)^2)^{1/2} = (x_1^2 - x_2^2)^{1/2}$. A corresponding “imaginary distance” may be defined as
$d^-(x) = (-d(x)^2)^{1/2} = (x_2^2 - x_1^2)^{1/2}$ for $x$ such that $|x_2| \ge |x_1|$. This is illustrated in Figure 75.1.2.

In this example, one may consider that there is a spacelike pseudo-distance $d^+(0,x)$ for spacelike intervals
with $|x_2| \le |x_1|$ and a timelike pseudo-distance $d^-(0,x)$ for timelike intervals with $|x_2| \ge |x_1|$. When these
pseudo-distances are squared and joined, the result is a pseudo-distance-squared function $d(x)^2 = x_1^2 - x_2^2$.
The two interval zones, spacelike and timelike, and the two pseudo-distance functions $d^+$ and $d^-$, are
unambiguously recovered from the combined pseudo-distance-squared function.


Figure 75.1.2  Pseudo-distance function on $\mathbb{R}^2$


The pseudo-distance-squared function may also be converted to a pseudo-Riemannian metric tensor field
and then converted back again into the pseudo-distance-squared function. Half the Hessian of
$d(x)^2$ yields the $2 \times 2$ matrix $(g_{ij}(0))_{i,j=1}^2$ with coefficients $g_{ij}(0) = \delta_i^1\delta_j^1 - \delta_i^2\delta_j^2$ for $i,j \in \mathbb{N}_2$. This matrix
may be extended so that $g_{ij}(y) = \delta_i^1\delta_j^1 - \delta_i^2\delta_j^2$ for all $y \in \mathbb{R}^2$. Then by using the calculus of variations, the
pseudo-distance functions $d^+$ and $d^-$ may be recovered from this pseudo-metric tensor field by extremising
the integral $\int_{\mathrm{Dom}(\gamma)} \big(\pm g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)\big)^{1/2}\,dt$ for differentiable curves $\gamma$ between a given pair of points.


This admittedly incurs some technical complexity, but the important consequence is that the pseudo-metric 
tensor field contains the same information as the pair of pseudo-distance functions. 


Consequently there are, in this example, three equi-informational ways of representing the pseudo-distance, 
namely the pseudo-metric tensor field, the pseudo-distance-squared function, and the pair of pseudo-distance 
functions (spacelike and timelike). 
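

The conversion from the pseudo-distance-squared function to the metric components in this example can be sketched numerically. The following is a minimal illustration, assuming $d(x)^2 = x_1^2 - x_2^2$ as above. The helper names `d_squared`, `half_hessian` and `quadratic_form` are hypothetical, and the second derivatives are approximated by central differences, which happen to be exact for a quadratic function apart from rounding.

```python
def d_squared(x):
    """Pseudo-distance-squared of x = (x1, x2) from the origin: x1^2 - x2^2."""
    return x[0] ** 2 - x[1] ** 2


def half_hessian(f, x, h=1e-3):
    """Half the matrix of second partial derivatives of f at x (central differences)."""
    n = len(x)

    def shifted(i, si, j, sj):
        # Evaluate f at x displaced by si*h along axis i and sj*h along axis j.
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return f(y)

    # The four-point stencil divided by 4h^2 gives d_i d_j f; a further factor 1/2
    # gives half the Hessian, hence the divisor 8h^2.
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (8 * h * h)
             for j in range(n)] for i in range(n)]


def quadratic_form(g, x):
    """Evaluate g_ij x^i x^j for a constant component matrix g."""
    return sum(g[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))


g = half_hessian(d_squared, [0.0, 0.0])
recovered = quadratic_form(g, [3.0, 1.0])   # compare with d_squared([3.0, 1.0])
```

For this flat example the straight line from 0 to $x$ is the extremal curve, so evaluating the quadratic form on $x$ itself is enough to return to $d(x)^2$. In a curved example the variational integral would be needed.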


75.1.5 EXAMPLE: Confused pseudo-distance functions on manifolds with torus-like topology. 

Define a $C^\infty$ manifold with point set $M = [0,1)^2$, and the torus topology and torus differentiable structure.
(See Definition 32.6.15 for the torus topology. See Example 50.1.7 and Remark 51.10.8 for the torus
differentiable structure.) Define $g \in T^{0,2}(M)$ by $g_{ij}(x) = g(\psi^{-1}(x))(e_i, e_j) = \delta_i^1\delta_j^1 - \delta_i^2\delta_j^2$ for $x \in M$, where
$\psi = \mathrm{id}_M : M \to M$ is the identity map. The loci of pseudo-distances $r = 0$ and $r^+ = 1$ are illustrated in
Figure 75.1.3 for the case $M = [0,4) \times [0,3)$.


[Two panels. Left: points at distance $r$ from $y = (0,0)$. Right: pseudo-distances $r = 0$ and $r^+ = 1$ from $y = (0,0)$.]


Figure 75.1.3  Riemannian and pseudo-Riemannian torus metric tensor field comparison


Figure 75.1.3 contrasts the orderly progression of distance away from a fixed point $y \in M$ on the left with
the disorderly confusion of pseudo-distances on the right. (See Definition 37.2.12 for the distance function on
the left.) The line labelled “1” in the diagram on the right satisfies $r^2 = x_1^2 - x_2^2 = 0$. The line labelled “2” is
a continuation of line “1”. The other lines continue in numerical order and then cycle back to the beginning.
The concatenation of these lines is a geodesic curve through the point $y \in M$. All points on this curve may
be said to have pseudo-distance equal to zero from $y$. The mirror-image geodesic curves with $r = 0$ which
go to the north-west (i.e. upper left) and south-east (i.e. lower right) of the point $y$ are not shown because
these would excessively clutter the diagram.


The four curves labelled “$r^+ = 1$” satisfy $x_1^2 - x_2^2 = 1$. The points on these curves have “spacelike
pseudo-distance” equal to 1 from $y$. The straight-line path from $y = (0,0)$ to $(1,0)$ is a geodesic curve which
apparently assigns the pseudo-distance $r^+ = 1$ to the point $(1,0)$, but a diagonal geodesic apparently assigns
the pseudo-distance $r^+ = 0$ to this same point.
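

The clash between the two geodesics can be checked by elementary arithmetic for the case $M = [0,4) \times [0,3)$. The sketch below is only an illustration under that assumption. It follows the diagonal null curve $t \mapsto (t \bmod 4, t \bmod 3)$ from $(0,0)$ and searches integer parameter values, which suffices because the target point has integer coordinates. The function name `first_null_hit` is hypothetical.

```python
def first_null_hit(target, widths, max_t=1000):
    """Smallest integer t > 0 with (t mod w1, t mod w2) == target, or None.

    The curve t -> (t mod w1, t mod w2) is the diagonal null geodesic through (0, 0)
    on the torus [0, w1) x [0, w2).
    """
    w1, w2 = widths
    for t in range(1, max_t + 1):
        if (t % w1, t % w2) == target:
            return t
    return None


widths = (4, 3)
t_hit = first_null_hit((1, 0), widths)

# Along the diagonal, both coordinates advance by t, so the interval is t^2 - t^2 = 0.
r_squared_diagonal = t_hit**2 - t_hit**2
# Along the straight segment from (0, 0) to (1, 0), the interval is 1^2 - 0^2 = 1.
r_squared_straight = 1**2 - 0**2
```

The diagonal curve wraps around the torus several times before it reaches $(1,0)$, which is why the same point acquires two incompatible pseudo-distance values.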


It is clear that on a manifold with a torus-like topology, a pseudo-Riemannian metric tensor field does 
not necessarily partition the manifold into well-defined spacelike and time-like regions with well-defined 
pseudo-distances from a given point. To achieve such an orderly situation, one apparently requires at 
least uniqueness of geodesics between given pairs of points. In the case of a Riemannian metric tensor field, 
however, uniqueness of geodesics between point pairs is not necessary to give a well defined distance function. 
For example, uniqueness fails for the two-sphere, but its distance function is very well defined. To define 
the distance on a Riemannian manifold, one simply selects the shortest geodesic between two points (or the 
infimum of distances in case the Hopf-Rinow theorem fails). 


75.1.6 REMARK: The absence of a triangle inequality for pseudo-distance functions. 

To see that the triangle inequality in Definitions 37.1.2 and 37.2.3 for a distance function cannot be generalised
to the pseudo-metric case, it is sufficient to consider pairs of points $y, z \in \mathbb{R}^2$ with $d(y) = d(z) = 0$ for the
pseudo-distance-squared function $x \mapsto d(x)^2 = x_1^2 - x_2^2$ in Example 75.1.4. In the following table, the
parameter $t \in \mathbb{R}_0^+$ is unbounded above. Therefore $d(y+z)^2$ is unbounded both above and below.


  y        z         y+z        d(y+z)^2    d^+(y+z)    d^-(y+z)
  (t,t)    (t,t)     (2t,2t)    0           0           0
  (t,t)    (t,-t)    (2t,0)     4t^2        2t          --
  (t,t)    (-t,t)    (0,2t)     -4t^2       --          2t


The double unboundedness of $d(y+z)^2$ in the case $d(y) = d(z) = 0$ also occurs in the general case of arbitrary
$d(y)^2$ and $d(z)^2$. For any fixed values of $d(y)^2$ and $d(z)^2$, the vectors $y$ and $z$ may be chosen to make
$d(y+z)^2 \in \mathbb{R}$ arbitrarily large in either the positive or the negative direction.
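

The rows of the table can be reproduced mechanically. The following sketch assumes the pseudo-distance-squared function $d(x)^2 = x_1^2 - x_2^2$ of Example 75.1.4. The helper names `d_pair` and `table_rows` are hypothetical, and an undefined pseudo-distance is represented by `None`.

```python
import math


def d_squared(x):
    """Pseudo-distance-squared x1^2 - x2^2 of x = (x1, x2) from the origin."""
    return x[0] ** 2 - x[1] ** 2


def d_pair(x):
    """Return (d_plus, d_minus); an entry is None where that pseudo-distance is undefined."""
    s = d_squared(x)
    d_plus = math.sqrt(s) if s >= 0 else None
    d_minus = math.sqrt(-s) if s <= 0 else None
    return d_plus, d_minus


def table_rows(t):
    """Rows (y, z, y+z, d(y+z)^2, d_plus(y+z), d_minus(y+z)) for the three null pairs."""
    y = (t, t)
    rows = []
    for z in [(t, t), (t, -t), (-t, t)]:
        total = (y[0] + z[0], y[1] + z[1])
        rows.append((y, z, total, d_squared(total)) + d_pair(total))
    return rows


rows = table_rows(3)
```

Since $y$ and $z$ are both null in every row, any bound on $d(y+z)^2$ in terms of $d(y)$ and $d(z)$ would have to be a constant, and the second and third rows grow without limit in opposite directions as $t$ increases.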


75.1.7 REMARK:  Pseudo-distance and pseudo-distance-squared functions. 

In view of Remark 75.1.6, perhaps the best that can be done towards defining a general class of pseudo- 
distance functions is to require at least the identity and symmetry conditions as in Definitions 37.1.2 
and 37.2.3, while making no attempt to specify any kind of triangle inequality. Such functions may be 
defined on arbitrary point sets, although they are so general as to be useless. But if they are defined on 
a differentiable manifold, they can be made to correspond roughly to the concept of a pseudo-Riemannian 
manifold. 


Definition 75.1.8 is adapted from Definition 37.2.3 by 


(1) specifying the square of the distance instead of the distance itself, 

(2) permitting the square of the distance to be negative, 

(3) weakening the identity condition to permit zero distance between distinct points, 
(4) omitting the triangle inequality. 


The use of the form “$d^2$” for the pseudo-distance-squared function is a minor abuse of notation.


75.1.8 DEFINITION: A pseudo-distance-squared function on a set $M$ is a function $d^2 : M \times M \to \mathbb{R}$ such
that


(i) $\forall x \in M,\ d^2(x,x) = 0$, [identity]
(ii) $\forall x,y \in M,\ d^2(x,y) = d^2(y,x)$. [symmetry]


75.1.9 REMARK: The relation of pseudo-distance function pairs to pseudo-distance-squared functions.
The pseudo-distance function pair $(d^+, d^-)$ in Definition 75.1.10 is intended to correspond to the idea of the
square root of the pseudo-distance-squared function in Definition 75.1.8. The set $S^+$ is the set of pairs $(x,y)$
such that $d^2(x,y) \ge 0$, while $S^-$ is the set of pairs $(x,y)$ such that $d^2(x,y) \le 0$.


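75.1.10 DEFINITION: A pseudo-distance function pair on a set $M$ is a pair $(d^+, d^-)$ such that $d^+ : S^+ \to \mathbb{R}_0^+$
and $d^- : S^- \to \mathbb{R}_0^+$ for some subsets $S^+$ and $S^-$ of $M \times M$, and

(1) $S^+ \cup S^- = M \times M$,
(2) $\forall x \in M,\ (x,x) \in S^+ \cap S^-$,
(3) $\forall x \in M,\ d^+(x,x) = d^-(x,x) = 0$,
(4) $\forall x,y \in M,\ (x,y) \in S^+ \Leftrightarrow (y,x) \in S^+$,
(5) $\forall (x,y) \in S^+,\ d^+(x,y) = d^+(y,x)$,
(6) $\forall x,y \in M,\ (x,y) \in S^- \Leftrightarrow (y,x) \in S^-$,
(7) $\forall (x,y) \in S^-,\ d^-(x,y) = d^-(y,x)$,
(8) $\forall x,y \in M,\ (x,y) \in S^+ \cap S^- \Leftrightarrow d^+(x,y) = d^-(x,y) = 0$.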
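

The eight conditions can be checked on a finite sample of points. The sketch below is a minimal illustration, not a proof. It assumes the two-point pseudo-distance-squared function $d^2(x,y) = (x_1 - y_1)^2 - (x_2 - y_2)^2$ on $\mathbb{R}^2$, builds the sets $S^+$ and $S^-$ and the functions $d^+$ and $d^-$ as in Remark 75.1.9, and tests each condition on an integer grid. The names `d2`, `make_pair` and `satisfies_definition` are hypothetical.

```python
import itertools
import math


def d2(x, y):
    """Two-point pseudo-distance-squared (x1 - y1)^2 - (x2 - y2)^2 (assumed example)."""
    return (x[0] - y[0]) ** 2 - (x[1] - y[1]) ** 2


def make_pair(points):
    """Build (S_plus, d_plus, S_minus, d_minus) from d2 on a finite sample of points."""
    pairs = list(itertools.product(points, repeat=2))
    d_plus = {p: math.sqrt(d2(*p)) for p in pairs if d2(*p) >= 0}
    d_minus = {p: math.sqrt(-d2(*p)) for p in pairs if d2(*p) <= 0}
    return set(d_plus), d_plus, set(d_minus), d_minus


def satisfies_definition(points):
    """Check conditions (1) to (8) of the definition on the sample."""
    s_plus, d_plus, s_minus, d_minus = make_pair(points)
    all_pairs = set(itertools.product(points, repeat=2))
    return all([
        s_plus | s_minus == all_pairs,                                  # (1)
        all((x, x) in s_plus & s_minus for x in points),                # (2)
        all(d_plus[(x, x)] == d_minus[(x, x)] == 0 for x in points),    # (3)
        all((y, x) in s_plus for (x, y) in s_plus),                     # (4)
        all(d_plus[(x, y)] == d_plus[(y, x)] for (x, y) in s_plus),     # (5)
        all((y, x) in s_minus for (x, y) in s_minus),                   # (6)
        all(d_minus[(x, y)] == d_minus[(y, x)] for (x, y) in s_minus),  # (7)
        all((p in s_plus and p in s_minus)
            == (d_plus.get(p) == 0 and d_minus.get(p) == 0)
            for p in all_pairs),                                        # (8)
    ])


grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
ok = satisfies_definition(grid)
```

Pairs of distinct points on a common diagonal lie in both $S^+$ and $S^-$, which is the situation that condition (8) is designed to describe.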


75.1.11 REMARK: Definition of pseudo-Riemannian metric field in terms of pseudo-metric functions. 

As in the case of Riemannian metric tensor fields, a pseudo-Riemannian metric tensor field may be defined
either explicitly or as half the Hessian of the square of a two-point distance function as in
equation (73.9.1). In the pseudo-Riemannian case, the distance is hyperbolic rather than elliptic. Thus the
symmetric metric tensor component matrix has eigenvalues with mixed sign whereas the eigenvalues are all
positive in the case of a Riemannian manifold.


75.1.12 REMARK: Pseudo-Riemannian metric tensors are not necessarily positive definite. 

Special relativity is formulated in terms of Minkowski space-time, which is a hyperbolic version of Euclidean 
space. When flat Minkowski space-time is generalised to manifolds, the corresponding concept is a pseudo- 
Riemannian metric. This is the mathematical framework of general relativity. Geodesic curves are defined 
by self-parallelism, not by minimising or maximising the length of the curve. 


75.1.13 REMARK: Pseudo-Riemannian manifolds are the final layer of differential geometry structure. 
Pseudo-Riemannian geometry is the final stage in the presentation of differential geometry. The most 
important concept to generalise to a pseudo-Riemannian metric (for the purposes of general relativity) is 
the Riemann curvature tensor. When all of the machinery of differential geometry has been generalised 
from the Riemannian metric to the pseudo-Riemannian metric, the framework is then finally ready for the 
presentation of general relativity. 


75.2. Pseudo-Riemannian metric tensor fields 


((2016-7-19. In Section 75.2 give a coordinate-free definition for the Levi-Civita connection for a Minkowskian 
metric. )) 


75.2.1 DEFINITION: A pseudo-Riemannian metric (tensor) (field) on a $C^1$ manifold $M$ is a nondegenerate
symmetric continuous covariant tensor field of degree 2 on $M$. In other words, a pseudo-Riemannian metric
on $M$ is a tensor field $g \in X^0(T^{0,2}(M))$ such that


(i) $\forall p \in M,\ \forall V_1, V_2 \in T_p(M),\ g(p)(V_1, V_2) = g(p)(V_2, V_1)$. [symmetric]
(ii) $\forall p \in M,\ \forall V_1 \in T_p(M) \setminus \{0\},\ \exists V_2 \in T_p(M),\ g(p)(V_1, V_2) \ne 0$. [nondegenerate]


75.2.2 DEFINITION: A $C^k$ (differentiable) pseudo-Riemannian metric on a $C^{k+1}$ manifold $M$, for $k \in \mathbb{Z}_0^+$,
is a pseudo-Riemannian metric $g$ on $M$ such that $g \in X^k(T^{0,2}(M))$.


75.2.3 REMARK: The possibility of weakening the regularity for the pseudo-Riemannian metric. 
The $C^1$ condition in Definition 75.2.1 could possibly be weakened to a Hölder $C^{0,1}$ condition with the metric
tensor being defined almost everywhere. This would still permit distance to be calculated.


75.2.4 REMARK: Minkowskian metric tensor fields.

A pseudo-Riemannian metric with index 1 is sometimes known as a “Minkowskian metric”. Since the sign
of the metric tensor for time-like vectors could be either positive or negative, the more specific name
“space-time metric” is used here to suggest that it is positive for space-like vectors and negative for time-like vectors.
Then a metric which uses the opposite convention may be given the name “time-space metric”.

Definition 75.2.5 is based on the concept of a Minkowskian inner product in Definition 24.10.6. It follows 
from Theorem 24.10.8 that the combination of conditions (i) and (ii) in Definition 75.2.5 is equivalent to the 
customary characterisation of a Minkowskian metric field in terms of eigenvalues of its component matrix. 
The conditions in Definition 75.2.5 are apparently "coordinate-free". 


75.2.5 DEFINITION: A space-time (Minkowskian) metric (tensor) (field) on a $C^1$ manifold $M$ is a
pseudo-Riemannian metric field on $M$ which is a Minkowskian inner product at each point. In other words, a
space-time metric field on $M$ is a pseudo-Riemannian metric field on $M$ which satisfies


(i) $\forall p \in M,\ \exists V \in T_p(M),\ g(p)(V, V) < 0$. [index $\ge 1$]
(ii) $\forall p \in M,\ \forall V_1, V_2 \in T_p(M),\ \big((g(p)(V_1, V_1) < 0 \text{ and } g(p)(V_2, V_2) < 0) \Rightarrow g(p)(V_1, V_2) \ne 0\big)$. [index $< 2$]
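

The way in which conditions (i) and (ii) pin down the index can be illustrated at a single point with diagonal component matrices. The sketch below is only a finite-sample illustration under assumptions made here: the inner products are $\mathrm{diag}(-1,1,1)$, $\mathrm{diag}(-1,-1,1)$ and $\mathrm{diag}(1,1,1)$ on $\mathbb{R}^3$, and the quantifiers are tested over an integer grid of vectors rather than the whole tangent space. The function names are hypothetical.

```python
import itertools


def inner(signs, v, w):
    """Inner product for a diagonal metric with diagonal entries `signs`."""
    return sum(s * a * b for s, a, b in zip(signs, v, w))


def condition_i(signs, vectors):
    """Some sampled vector has negative square (index >= 1)."""
    return any(inner(signs, v, v) < 0 for v in vectors)


def condition_ii(signs, vectors):
    """No two sampled negative-square vectors are orthogonal (index < 2)."""
    timelike = [v for v in vectors if inner(signs, v, v) < 0]
    return all(inner(signs, v, w) != 0 for v in timelike for w in timelike)


grid = list(itertools.product(range(-3, 4), repeat=3))
minkowski = (-1, 1, 1)    # index 1: both conditions hold on the sample
two_times = (-1, -1, 1)   # index 2: (1,0,0) and (0,1,0) are orthogonal and negative
euclidean = (1, 1, 1)     # index 0: condition (i) fails

results = {
    "minkowski": (condition_i(minkowski, grid), condition_ii(minkowski, grid)),
    "two_times": (condition_i(two_times, grid), condition_ii(two_times, grid)),
    "euclidean": (condition_i(euclidean, grid), condition_ii(euclidean, grid)),
}
```

Condition (ii) holds vacuously in the Euclidean case because there are no negative-square vectors, which is why condition (i) is needed as well.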


((2016-7-19. After Definition 75.2.5, show from Theorem 24.10.8 that in a space-time manifold $M$, the inner
product on $T_p(M)$ at each $p \in M$ can be diagonalised with diagonal entries $(-1, 1, \ldots, 1)$, and that the basis
vectors can be chosen to be $C^k$ if the metric field is $C^k$ on a $C^{k+1}$ manifold. ))


75.2.6 DEFINITION: A time-space (Minkowskian) metric (tensor) (field) on a $C^1$ manifold $M$ is a
pseudo-Riemannian metric field on $M$ which is a hyperbolic inner product with index $n - 1$ at each point, where
$n = \dim(M)$. In other words, a time-space metric field on $M$ is a pseudo-Riemannian metric field on $M$
which satisfies

(i) $\forall p \in M,\ \exists V \in T_p(M),\ g(p)(V, V) > 0$. [index $\le n - 1$]
(ii) $\forall p \in M,\ \forall V_1, V_2 \in T_p(M),\ \big((g(p)(V_1, V_1) > 0 \text{ and } g(p)(V_2, V_2) > 0) \Rightarrow g(p)(V_1, V_2) \ne 0\big)$. [index $> n - 2$]


75.2.7 REMARK: Why timelike vectors should have negative distance. 

Although it is sometimes convenient to define Minkowski time-space as in Definition 75.2.6, so that time-like 
vectors have positive length and space-like vectors have negative length, this seems to be the less natural 
way to define the pseudo-metric. Time is physically very abstract, although it is mathematically simple. In 
practice, a space-like interval can be measured at leisure, whereas a time-like interval must be inferred from 
distance, such as for example the distance between positions of a needle on a clock, or the distance between 
frames of a movie, or distance on an audio tape. Time intervals cannot be directly perceived, only inferred. 


((2016-7-19. Define time-like curves and proper time along time-like curves. ))


75.3. General relativity 


((2016-7-19. Convert the old-style components version of the general relativity equation in Remark 75.3.1 to 
the modern style. )) 


75.3.1 REMARK: The general relativity evolution equations. 
The equations of general relativity look something like


$R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu} + \Lambda g_{\mu\nu} = \kappa T_{\mu\nu}$,


where $\Lambda$ is the famous fudge factor which “explains” the discrepancy between theory and observations by
invoking an ad-hoc variation in the cosmological expansion rate.


75.3.2 REMARK: The Einstein tensor and Einstein spaces. 

Although the Einstein tensor in Definition 75.3.3 is of more interest in pseudo-Riemannian manifolds, it is 
well defined in Riemannian manifolds. Likewise, the Einstein spaces in Definition 75.3.4 are well defined for 
both Riemannian and pseudo-Riemannian metric tensor fields. 


75.3.3 DEFINITION: The Einstein (curvature) tensor (field) in a $C^2$ differentiable Riemannian manifold $M$
with $\dim(M) = n \in \mathbb{Z}_0^+$ is the twice-covariant tensor field $G \in X(T^{0,2}(M))$ whose components with respect
to any $C^2$ chart on $M$ satisfy the formula $G_{ij} = R_{ij} - \tfrac{1}{2} R\, g_{ij}$ for $i,j \in \mathbb{N}_n$.


75.3.4 DEFINITION: An Einstein space is a Riemannian manifold in which the Ricci tensor is a scalar 
multiple of the metric tensor. 
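

The two definitions interact in a simple way: if the Ricci tensor equals $\lambda g$, then the scalar curvature is $n\lambda$ and the Einstein tensor $R_{ij} - \tfrac{1}{2} R\, g_{ij}$ equals $(1 - n/2)\lambda\, g_{ij}$. The sketch below checks this at a single point for diagonal component arrays, using exact rational arithmetic. The values $n = 4$ and $\lambda = 3$ and the function name `einstein_tensor_diag` are assumptions chosen purely for illustration.

```python
from fractions import Fraction


def einstein_tensor_diag(g_diag, ricci_diag):
    """Diagonal components G_ii = R_ii - (1/2) R g_ii at one point.

    Both arguments are the diagonal entries of diagonal component matrices.
    """
    # Scalar curvature R = g^{ij} R_ij; for a diagonal metric the inverse is entrywise.
    scalar = sum(Fraction(r) / Fraction(g) for g, r in zip(g_diag, ricci_diag))
    return [Fraction(r) - Fraction(1, 2) * scalar * Fraction(g)
            for g, r in zip(g_diag, ricci_diag)]


g = [-1, 1, 1, 1]                  # diagonal metric components at a point
lam = 3                            # assumed constant of proportionality
ricci = [lam * gi for gi in g]     # Einstein-space condition: Ricci = lam * g
G = einstein_tensor_diag(g, ricci)
expected = [(1 - Fraction(len(g), 2)) * lam * gi for gi in g]
```

In dimension 2 the factor $(1 - n/2)$ vanishes, so the Einstein tensor of such a space is identically zero there.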


75.4. Gauge theory on pseudo-Riemannian spaces 
((2019-9-14. Remark 75.4.1 is very preliminary. Please ignore it. )) 


75.4.1 REMARK: Stages in construction of gauge theory equations of motion. 
Roughly speaking, the stages in the construction of gauge theory equations of motion are as follows. (See 
for example Mandl/Shaw [288], pages 222-225; Itzykson/Zuber [277], pages 562-569.) 


(1) Base space $(M, g)$. Example: Minkowski space-time $M = \mathbb{R} \times \mathbb{R}^3$ with metric $g \in X(T^{0,2}(M))$.

(2) Structure group's fibre space $F_0$. A finite-dimensional linear space. Example: $\mathbb{C}^3$.

(3) Structure group $(G, F_0)$. Example: $G = SU(3)$.

(4) Structure group's Lie algebra $T_e(G)$. For $G = SU(3)$, this is spanned by eight matrices $i\hat{F}_j = i t_j$
for $j \in \mathbb{N}_8$, where the Hermitian matrices $\hat{F}_j$ are called “colour charges”, satisfying $[\hat{F}_j, \hat{F}_k] = i f_{jk\ell} \hat{F}_\ell$
for some array of structure constants $f \in \mathbb{R}^{8 \times 8 \times 8}$.

(5) Principal bundle $P \mathrel{-\!\!<} (P, \pi_P, M, A_P^G)$ with structure group $G$, fibre atlas $A_P^G$.


(6) Connection form $\omega : T(P) \to T_e(G)$ on $P$. This “gauge potential” represents a bosonic radiation
field, which may be specified macroscopically by the design of an experiment, or may be “emitted by”
fermions. This is the mathematician’s version of the “localised” gauge potential in step (8).


(7) Cross-section of principal bundle $X \in X(P, \pi_P, M)$. Called a “gauge”.

(8) Connection form localisation component function $A^{X,\psi}_\mu$, defined by $A^{X,\psi}(p)_\mu = \omega(X_*(e^\psi_\mu(p)))$ for
$X \in X(P, \pi_P, M)$, $\psi \in \mathrm{atlas}(M)$, $p \in \mathrm{Dom}(\psi) \cap \mathrm{Dom}(X)$, $\mu \in \{0,1,2,3\}$. This is the
Lie-algebra-valued gauge potential “$A_\mu$” on $M$ which is usually employed by physicists instead of $\omega$ on $P$. (See
Definition 69.13.3.)


(9) Vector bundle's fibre space $F$. A finite-dimensional linear space to represent pointwise state for
matter fields. Example: $((\mathbb{C}^4)^3)^6$. This gives $\Psi^f_c(p) \in \mathbb{C}^4$ for local cross-sections $\Psi \in X(E, \pi_E, M)$ of
vector bundle $E$, for $p \in M$ and quark flavours $f \in (u,d,c,s,b,t)$, where $c \in (r,g,b)$ is a quark colour.
(The notation $f$ clashes with part (4). The notation $c$ for colour likewise has an obvious clash.)


(10) Vector bundle $E \mathrel{-\!\!<} (E, \pi_E, M, A_E^F)$ with base space $M$, fibre space $F$, and fibre atlas $A_E^F$.

(11) Cross-section of vector bundle $\Psi \in X(E, \pi_E, M)$. This is a fermionic “classical particle field”.


(12) Covariant derivative on vector bundle. Apply connection form $\omega$ to vector bundle $E$ to obtain a
covariant derivative “$D_\mu = \partial_\mu + A_\mu$” using “$A_\mu$” from step (8).


(13) Lagrangian density function. Use the covariant derivative operator to construct a Lagrangian density
function representing the radiation field and the matter field(s). Includes terms like “$F_{\mu\nu}F^{\mu\nu}$”. Example:
$\bar\psi(i\slashed{D})\psi - \tfrac{1}{4}(F^i_{\mu\nu})^2 - m\bar\psi\psi$, Y-M for QED. (See Peskin/Schroeder [298], page 489.)

(14) Euler-Lagrange equations of motion. Convert the Lagrangian density function to equations of 
motion for the combined matter field(s) and the radiation field. 


(15) Quantization. Quantize the foreground and background fields alternately until the system converges. 
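

The commutation relations in step (4) can be made concrete with one particular basis. The sketch below assumes the Gell-Mann matrices $\lambda_1, \ldots, \lambda_8$ and the generators $F_j = \lambda_j / 2$, which is a standard choice but is an assumption here, since the list above does not say which basis is intended. Under that assumption the array $f$ is recovered from $f_{jk\ell} = -2i\,\mathrm{tr}([F_j, F_k] F_\ell)$, which follows from $\mathrm{tr}(F_k F_\ell) = \tfrac{1}{2}\delta_{k\ell}$. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def commutator(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]


def trace(a):
    return sum(a[i][i] for i in range(3))


s = 1 / math.sqrt(3)
# Gell-Mann matrices lambda_1 .. lambda_8 (assumed basis choice).
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[s, 0, 0], [0, s, 0], [0, 0, -2 * s]],
]
# Hermitian generators F_j = lambda_j / 2.
F = [[[entry / 2 for entry in row] for row in lam] for lam in GELL_MANN]


def structure_constants():
    """Array f[j][k][l] = -2i tr([F_j, F_k] F_l), with zero-based indices."""
    return [[[-2j * trace(matmul(commutator(F[j], F[k]), F[l]))
              for l in range(8)] for k in range(8)] for j in range(8)]


f = structure_constants()
```

The computed entries are complex numbers whose imaginary parts vanish up to rounding, consistent with $f$ being a real array, and they are antisymmetric in the first two indices. With zero-based indexing, `f[0][1][2]` corresponds to $f_{123}$.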


One oddity here, from the pure mathematical point of view, is the use of two different fibre spaces $F_0$ and
$F$ in the above list. Quantum chromodynamics, the theory of quarks and gluons, is nominally based on
$SU(3)$ as the structure group, which nominally acts on the fibre space $\mathbb{C}^3$. However, the particles of this
theory, i.e. quarks, have wave functions which are represented by a fibre space which is suitable for the Dirac
equation. For each of the six flavours of quarks, the state at each point is represented as an element of $(\mathbb{C}^4)^3$,
with one copy of $\mathbb{C}^4$ for each of the three quark colours. So effectively one could say that $F$ looks like $\mathbb{C}^{72}$.
Thus the fermions have a very different fibre space to the bosons, which appear in the model as elements of
the 8-dimensional Lie algebra of $SU(3)$.


Although the basic outline of steps in quantum field theory can be described pure mathematically, the choices
of structures and functions are largely determined by physics, which is outside the scope of this book. The 
general principle here is that mathematics is concerned with tools, the design, organisation, improvement 
and documentation of these tools. Physics applies existing tools and invents new ones, which can then 
be investigated by mathematicians to attempt to provide something better if possible. Understanding the 
purposes of tools in their applications is important for mathematicians, but in the case of physics, it is not 
always possible to understand the requirements or objectives. 


In the case of the “colour charges”, i.e. Lie algebra basis matrices, in step (4), mathematically speaking any
basis is as good as any other. But in physics, each such matrix has a physical significance.


The gauge potential $\omega$ in step (6), or equivalently $A$ in step (8), may be defined by some experimental set-up,
or it may be “emitted” in some sense by fermions, or even by bosons. The dependence of these potential
functions on location may be empirical. They could follow some kind of inverse distance law, but this law
could be modified in some way by phenomena which are well outside the domain of mathematics.


Likewise the Lagrangian density function in step (13) is chosen by some kind of guesswork because in general 
there is no fixed “correct” Lagrangian. Terms are added as required. 


The role of the mathematician stops, roughly speaking, at the point where the physics commences. It is 
possible to describe and formulate and formalise gauge theory in terms of fibre bundles up to the point 
where many arbitrary choices must be made by physicists. (At least they are “arbitrary” from the pure 
mathematical viewpoint.) Attempting to advance further into the domain of physics means entering a non- 
axiomatic realm, where the truth of propositions is not determined by pure predicate calculus. The inference 
from one line to the next is then no longer logical inference, but rather a kind of reverse inference, where the 
intended logical outputs of the theory determine the logical inputs, which is the opposite to the axiomatic 
method. Thus in physics, the goal is to infer axioms from theorems. 


In some areas of physics, the “inference of axioms from theorems” may be said to be essentially complete, 
for example in many areas of classical physics which are now fully integrated into engineering as “settled 
science”. But in scientific research, particularly in physics, the “axioms” are still only a preliminary, tentative 
approximation to reality. The role of the mathematician in such areas can be the timely determination of 
consequences arising from variable axioms. The axioms of fundamental physics are a moving target for 
mathematicians to work on. Instead of one “settled” corpus of laws of physics, there are very large numbers 
of combinations of “laws” which can be mixed and matched for various purposes. 


Thus the “mathematicisation” of physics is a continuous process, determining the mathematical consequences 
of large numbers of theoretical frameworks as they evolve. It is far too early in the history of physics to start 
converting it all to mathematics in the way that much classical physics has been. The mathematicisation 
of gauge theory in terms of fibre bundles may or may not be helpful, but it is potentially helpful, as are all 
attempts to mathematicise physics. In return, mathematicians obtain interesting topics to work on. The 
majority of mathematics developed in the last 400 years has arisen from the demands of physics. 
Chapter 76 


SPHERICAL GEOMETRY 


76.1 Higher-dimensional spherical coordinates 
76.2 Terrestrial coordinates for the two-sphere 
76.3 The embedding of the two-sphere in Euclidean space 
76.4 The differential and the critical-point Hessian operator 
76.5 Metric tensor calculation from the distance function 
76.6 Affine connection and curvatures in terrestrial coordinates 
76.8 Isometries of the 2-sphere 


76.0.1 REMARK: The two-sphere is the most important non-trivial differentiable manifold. 

The 2-sphere $S^2$ is arguably the simplest non-trivial geometry to which differential geometry can be applied. 
Many curved-space concepts have a clear intuitive meaning on the two-sphere because of its familiarity. Many 
questions can be answered in two ways, either by theory or by intuition. The answers can then be compared 
to ensure the validity of the theory and to fine-tune one's interpretation of the theory in preparation for 
applications to manifolds where intuition delivers no clear answers. Any concept which cannot be applied 
to the 2-sphere probably requires revision or rejection, since the 2-sphere is the quintessential manifold. 


Although the geometry of the 2-sphere has been studied in detail since ancient times, its rich variety of 
properties provides a valuable testing ground for modern theoretical concepts. Understanding the 2-sphere 
is also fundamental to many areas of physics from the quantum mechanics of atoms and elementary particles 
to astrophysics and cosmology. And finally, the word “geometry” means measurement of the Earth, and the 
Earth is a 2-sphere (approximately). Therefore the 2-sphere deserves to be investigated in some detail. 


76.0.2 REMARK: Coordinate charts versus curve families. 

As mentioned in Remark 49.6.9, manifolds are generally specified in terms of points-to-coordinates maps 
for theoretical purposes, but in practical situations, coordinates-to-points maps are often more convenient. 
This is particularly true for spheres embedded in Cartesian spaces. Therefore coordinates-to-points maps 
are most frequently used in Chapter 76. 


76.0.3 REMARK: To coordinatise or not to coordinatise? That is the question. 

The answer is “yes”. The much-maligned “coordinates” are the only game in town when the tyres hit 
the road. When abstract differential geometry is applied to a particular differentiable manifold, all of the 
abstract “coordinate-free” definitions must be translated into hard-nosed tensor calculus. Coordinates are 
ugly, but the alternative is to stay in the armchair and send someone else out to do the dirty work. 


76.1. Higher-dimensional spherical coordinates 


76.1.1 REMARK: The point set for unit spheres. 

The n-sphere $S^n$ is defined for $n \in \mathbb{Z}^+$ to have the base set $S^n = \{x \in \mathbb{R}^{n+1};\ |x| = 1\}$. This will be 
given the differentiable structure of a $C^\infty$ n-dimensional manifold. (Of course, the tuples in the set $S^n$ are 
not “points”, strictly speaking. The point set is actually extra-mathematical.) It is useful to also extend 
spherical coordinates to the ambient space $\mathbb{R}^{n+1}$ in which the n-sphere is embedded. 
76.1.2 REMARK: Recursive embedding of lower-order spheres inside higher-order spheres. 

In the case of the Cartesian tuple spaces $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, there is an obvious embedding of $\mathbb{R}^k$ inside $\mathbb{R}^{k+1}$ 
for each $k \in \mathbb{Z}_0^+$. One may easily extend the standard chart for $\mathbb{R}^k$ to $\mathbb{R}^{k+1}$ by adding an extra parameter 
to each tuple, and one may easily project $\mathbb{R}^{k+1}$ down to $\mathbb{R}^k$ by removing the last tuple element. 

It is not entirely obvious how lower-order spheres can be embedded inside higher-order spheres in such a 
clean, simple way as for the Cartesian spaces. For example, the space $S^1$ may be identified with a great 
circle within $S^2$, and one may then add a latitude parameter to extend this embedding to the whole of $S^2$, 
but this is not quite so tidy as the flat Cartesian space case. One unsatisfactory aspect of this extension 
method is that it does not deliver the expected chart for $S^2$ as an extension of $S^1$, nor the corresponding 
extension from the closed ball $\{x \in \mathbb{R}^2;\ |x| \le 1\}$ to the closed ball $\{x \in \mathbb{R}^3;\ |x| \le 1\}$. 


76.1.3 REMARK: A recursive formula for spherical coordinates. 
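A fairly typical style of coordinate system for $S^1$ is as follows. 
\[
\begin{aligned}
x^1 &= \cos\theta_1 \\
x^2 &= \sin\theta_1
\end{aligned}
\]
One of the various styles of coordinates for $S^2$ is as follows. 
\[
\begin{aligned}
x^1 &= \cos\theta_2 \cos\theta_1 \\
x^2 &= \cos\theta_2 \sin\theta_1 \\
x^3 &= \sin\theta_2
\end{aligned}
\]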


This could be referred to as “terrestrial” or “astronomical” spherical coordinates because the value of $\theta_2$ is 
zero at the equator and is $\pm\frac{\pi}{2}$ at the north and south poles. The labels for the coordinates are chosen so 
as to suggest a recursive formula, whereby $S^k$ will be naturally embedded inside $S^{k+1}$ for all $k \in \mathbb{Z}^+$. The 
following general formula for $S^n$ for $n \ge 1$ is suggested by the above formulas for $S^1$ and $S^2$. 
\[
x^i = \begin{cases}
\prod_{j=1}^{n} \cos\theta_j & \text{for } i = 1 \\
\sin\theta_{i-1} \prod_{j=i}^{n} \cos\theta_j & \text{for } 2 \le i \le n+1.
\end{cases}
\tag{76.1.1}
\]
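To generate a general set of spherical coordinates for all dimensions, it is convenient to look for a pattern in 
the following formulas for $\mathbb{R}^4$. 
\[
\begin{aligned}
x^1 &= \cos\theta_3 \cos\theta_2 \cos\theta_1 \\
x^2 &= \cos\theta_3 \cos\theta_2 \sin\theta_1 \\
x^3 &= \cos\theta_3 \sin\theta_2 \\
x^4 &= \sin\theta_3.
\end{aligned}
\]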


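This is the same as the formulas for $\mathbb{R}^3$ with the difference that the first three coordinates are multiplied by 
$\cos\theta_3$ and the fourth coordinate is $\sin\theta_3$. This suggests the following recursion formula. 
\[
x_n^i = \begin{cases}
x_{n-1}^i \cos\theta_n & \text{for } 1 \le i \le n \\
\sin\theta_n & \text{for } i = n+1,
\end{cases}
\]
where $x_n = (x_n^i)_{i=1}^{n+1} = (x_n^1, x_n^2, \ldots, x_n^{n+1}) \in \mathbb{R}^{n+1}$ is the (n + 1)-tuple for $S^n \subseteq \mathbb{R}^{n+1}$. Since $|x_{n-1}| = 1$, it 
follows that $|x_n|^2 = |x_{n-1}|^2 \cos^2\theta_n + \sin^2\theta_n = 1$. In other words, if $x_{n-1} \in S^{n-1}$, then $x_n \in S^n$. 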
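
The recursion and the product formula (76.1.1) can be compared numerically. The following is a minimal sketch, not part of the original text: the function names `sphere_point` and `sphere_point_closed_form` and the sample angles are hypothetical choices made for illustration only.

```python
import math

def sphere_point(thetas):
    """Map angles (theta_1, ..., theta_n) to a point of S^n by the recursion."""
    # Base case S^1: the tuple (cos theta_1, sin theta_1).
    x = [math.cos(thetas[0]), math.sin(thetas[0])]
    # Recursion step: scale the previous tuple by cos(theta_n), then append sin(theta_n).
    for theta_n in thetas[1:]:
        x = [xi * math.cos(theta_n) for xi in x] + [math.sin(theta_n)]
    return x

def sphere_point_closed_form(thetas):
    """The same map computed from the product formula (76.1.1)."""
    n = len(thetas)

    def cos_product(start):
        # Product of cos(theta_j) for j = start, ..., n (1-based); empty product is 1.
        return math.prod(math.cos(t) for t in thetas[start - 1:])

    return [cos_product(1)] + [
        math.sin(thetas[i - 2]) * cos_product(i) for i in range(2, n + 2)
    ]

angles = [0.3, -0.7, 1.1]  # arbitrary sample angles for S^3 (assumption)
p = sphere_point(angles)
q = sphere_point_closed_form(angles)
norm = math.sqrt(sum(c * c for c in p))
```

With these definitions, `p` and `q` should agree to rounding error, and `norm` should be 1 to rounding error, which is the statement that the recursion keeps each tuple on the unit sphere.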


76.1.4 REMARK: Choice of domain for spherical coordinates. 
A suitable domain for the coordinates-to-points map in equation (76.1.1) would be $(-\pi, \pi] \times [-\frac{\pi}{2}, \frac{\pi}{2}]^{n-1}$, 
although the map is not injective for this domain when $n \ge 2$. To achieve injectivity, one must ensure that 
there is only one coordinate tuple at each of the poles, in each of the recursion steps. Thus for n = 2, a 
suitable domain would be $((-\pi, \pi] \times (-\frac{\pi}{2}, \frac{\pi}{2})) \cup (\{0\} \times \{-\frac{\pi}{2}, \frac{\pi}{2}\})$. Then for n = 3, one obtains 
\[
\Big( \big( ((-\pi, \pi] \times (-\tfrac{\pi}{2}, \tfrac{\pi}{2})) \cup (\{0\} \times \{-\tfrac{\pi}{2}, \tfrac{\pi}{2}\}) \big) \times (-\tfrac{\pi}{2}, \tfrac{\pi}{2}) \Big) \cup \big( \{0\} \times \{0\} \times \{-\tfrac{\pi}{2}, \tfrac{\pi}{2}\} \big)
\]
and so forth. Since the poles are irregular embedding points of the map, injectivity is of limited value. So 
it is generally best to restrict the domain to a set such as $(-\pi, \pi] \times (-\frac{\pi}{2}, \frac{\pi}{2})^{n-1}$, omitting the poles entirely. 
The domain $(-\pi, \pi] \times (-\frac{\pi}{2}, \frac{\pi}{2})^{n-1}$ corresponds to terrestrial coordinates, where the longitude traditionally 
lies between −180° and +180°. For astronomical coordinates, the range $[0, 2\pi) \times (-\frac{\pi}{2}, \frac{\pi}{2})^{n-1}$ is more suitable, 
since the Right Ascension is generally given in the range from 0° up to 360° (or 0 to 24 hours). 


76.1.5 REMARK: Domain for spherical coordinates in quantum mechanics. 

In quantum mechanics, the domain $[0, 2\pi) \times [0, \pi]$ is often used for $S^2$. (See for example Bohm [255], 
page 313; Cohen-Tannoudji/Diu/Laloé [257], page 661; Penrose [297], page 563; Flügge [267], page 144; 
Haken/Wolf [273], page 157.) Then the factors $\cos\theta_2$ and $\sin\theta_2$ are swapped, the north pole is at $\theta_2 = 0$ and 
the south pole is at $\theta_2 = \pi$. Thus one obtains the recursion formula 
\[
x_n^i = \begin{cases}
x_{n-1}^i \sin\theta_n & \text{for } 1 \le i \le n \\
\cos\theta_n & \text{for } i = n+1
\end{cases}
\]
for $n \ge 3$, and the general formula 
\[
x^i = \begin{cases}
\cos\theta_1 \prod_{j=2}^{n} \sin\theta_j & \text{for } i = 1 \\
\prod_{j=1}^{n} \sin\theta_j & \text{for } i = 2 \\
\cos\theta_{i-1} \prod_{j=i}^{n} \sin\theta_j & \text{for } 3 \le i \le n+1
\end{cases}
\tag{76.1.2}
\]
instead of equation (76.1.1). Clearly formula (76.1.2) is not as neat and tidy as formula (76.1.1). The domain 
for this style of inverse coordinate chart would have a form such as $(0, 2\pi) \times (0, \pi)^{n-1}$. 
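
The quantum-mechanical style of coordinates can be checked in the same way as the terrestrial style. The sketch below is illustrative only. It assumes that the recursion step is applied starting from the usual $S^1$ tuple $(\cos\theta_1, \sin\theta_1)$, which reproduces the swapped $S^2$ formulas at the first step. The names `qm_point` and `qm_point_closed_form` are hypothetical.

```python
import math

def qm_point(thetas):
    """Quantum-mechanical style coordinates: polar angles measured from the pole."""
    x = [math.cos(thetas[0]), math.sin(thetas[0])]
    # Recursion step: scale by sin(theta_n), then append cos(theta_n).
    for theta_n in thetas[1:]:
        x = [xi * math.sin(theta_n) for xi in x] + [math.cos(theta_n)]
    return x

def qm_point_closed_form(thetas):
    """The same map computed from the product formula (76.1.2)."""
    n = len(thetas)

    def sin_product(start):
        # Product of sin(theta_j) for j = start, ..., n (1-based); empty product is 1.
        return math.prod(math.sin(t) for t in thetas[start - 1:])

    first = math.cos(thetas[0]) * sin_product(2)
    second = sin_product(1)
    rest = [math.cos(thetas[i - 2]) * sin_product(i) for i in range(3, n + 2)]
    return [first, second] + rest

angles = [0.3, 0.7, 2.1]  # arbitrary sample angles (assumption)
p = qm_point(angles)
q = qm_point_closed_form(angles)
```

For $S^2$, a polar angle $\theta_2$ in this style corresponds to the latitude $\frac{\pi}{2} - \theta_2$ in the terrestrial style, since $\sin\theta_2 = \cos(\frac{\pi}{2} - \theta_2)$.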


76.2. Terrestrial coordinates for the two-sphere 


76.2.1 REMARK:  Terrestrial-style coordinates for the 2-sphere. 
Define the 2-sphere to be the subset $S^2 = \{x \in \mathbb{R}^3;\ |x| = 1\}$ of $\mathbb{R}^3$. Define terrestrial spherical coordinates 
for $S^2$ by 
\[
\begin{aligned}
\psi &: S^2 \to ((-\pi, \pi] \times (-\pi/2, \pi/2)) \cup \{(0, -\pi/2), (0, \pi/2)\} \\
\psi &: x \mapsto (\theta_1, \theta_2) = (\arctan(x_1, x_2), \arcsin(x_3)).
\end{aligned}
\tag{76.2.1}
\]
(See Section 44.2 for definitions of trigonometric functions.) The range of $\psi$ is illustrated in Figure 76.2.1. 
$\theta_1$ is called the longitude and $\theta_2$ is called the latitude. 


Figure 76.2.1 Range of terrestrial coordinates for $S^2$ (horizontal axis: longitude $\theta_1$; vertical axis: latitude $\theta_2$ from $-\pi/2$ to $\pi/2$; marked points: North Pole, München, Melbourne) 


In Definition 44.2.9, the range of the arctangent function is arbitrarily chosen to be the set $(-\pi, \pi]$, and the 
value of $\arctan(0, 0)$ is chosen as zero, so that the range of the map $\psi$ is as indicated in line (76.2.1). 


The map $\psi$ has a left inverse $\tilde\psi : \mathbb{R}^2 \to S^2$ defined by 
\[
\forall \theta_1, \theta_2 \in \mathbb{R}, \quad \tilde\psi(\theta_1, \theta_2) = (\cos\theta_1 \cos\theta_2, \sin\theta_1 \cos\theta_2, \sin\theta_2). \tag{76.2.2}
\]
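
The terrestrial coordinate map and its left inverse are easy to experiment with. The following sketch is an illustration under stated assumptions, not the book's definition: it uses the standard library function `math.atan2`, whose argument order is (ordinate, abscissa), as a stand-in for the two-argument arctangent of Definition 44.2.9. The names `psi` and `psi_tilde` are hypothetical. One caveat is that `math.atan2` can return $-\pi$ when the ordinate is a negative zero, which lies outside $(-\pi, \pi]$.

```python
import math

def psi(x):
    """Terrestrial coordinates (longitude, latitude) of a point x = (x1, x2, x3) on S^2."""
    x1, x2, x3 = x
    # math.atan2(y, x) takes the ordinate first, and math.atan2(0.0, 0.0) == 0.0,
    # which matches the convention that the longitude is zero at the poles.
    longitude = math.atan2(x2, x1)
    # Clamp to [-1, 1] to guard against rounding error before taking the arcsine.
    latitude = math.asin(max(-1.0, min(1.0, x3)))
    return (longitude, latitude)

def psi_tilde(theta1, theta2):
    """Coordinates-to-points map from R^2 onto S^2, as in line (76.2.2)."""
    return (
        math.cos(theta1) * math.cos(theta2),
        math.sin(theta1) * math.cos(theta2),
        math.sin(theta2),
    )

# Round trip from coordinates to a point and back, inside the coordinate range.
point = psi_tilde(0.8, -0.4)
coords = psi(point)

# Left-inverse property: psi_tilde(psi(z)) recovers z even when z was produced
# from coordinates outside the range of psi.
z = psi_tilde(4.0, 2.0)
z_again = psi_tilde(*psi(z))
```

Here `coords` should reproduce (0.8, −0.4) to rounding error, and `z_again` should reproduce `z`, although `psi(z)` is not equal to (4.0, 2.0).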


Figure 76.2.2 illustrates lines of constant longitude and latitude for a 2-sphere. The curves $\theta_1 = 0$ and $\theta_2 = 0$ 
are indicated. The longitude intervals are 15°. The latitude intervals are 10°. 


Figure 76.2.2 Domain of terrestrial coordinates for $S^2$ (labelled locations include Palo Alto, München, Ram Allah and La Habana) 


76.2.2 REMARK: Terrestrial coordinates do not constitute a chart, strictly speaking. 
The terrestrial coordinates map $\psi$ in Remark 76.2.1 is not a chart for the manifold $S^2$ because it is not 
continuous, but it can be made into a chart by defining it as the inverse of the restriction of the map 
$\tilde\psi : \mathbb{R}^2 \to S^2$ to the set 
\[
\tilde U_0 = (-\pi, \pi) \times (-\pi/2, \pi/2) = \mathrm{Int}(\mathrm{Range}(\psi)),
\]
or equivalently, by restricting $\psi : S^2 \to \mathbb{R}^2$ to 
\[
\begin{aligned}
U_0 &= \{x \in S^2;\ x_1 > -(1 - x_3^2)^{1/2}\} \\
&= \{x \in S^2;\ x_1 > 0 \text{ or } x_2 \ne 0\} \\
&= \psi^{-1}(\tilde U_0),
\end{aligned}
\]
which removes the poles and the international dateline from the domain of $\psi$. If one defines $\psi_0 = \psi|_{U_0}$ and 
$\tilde\psi_0 = \tilde\psi|_{\tilde U_0}$, then $\psi_0 : U_0 \to \tilde U_0$ is a chart for $S^2$, and $\tilde\psi_0 = \psi_0^{-1}$. 


76.2.3 REMARK: Astronomical spherical coordinates. 
Astronomical spherical coordinates are obtained by forcing $\theta_1$ into the range $[0, 2\pi)$. This may be achieved 
by applying the binary modulo function in Definition 16.5.19 to the arctangent expression on line (76.2.1). 
Then one obtains 
\[
\begin{aligned}
\psi &: S^2 \to ([0, 2\pi) \times (-\pi/2, \pi/2)) \cup \{(0, -\pi/2), (0, \pi/2)\} \\
\psi &: x \mapsto (\theta_1, \theta_2) = (\arctan(x_1, x_2) \bmod 2\pi, \arcsin(x_3)).
\end{aligned}
\tag{76.2.3}
\]
For the quantum mechanical style of spherical coordinates mentioned in Remark 76.1.5, one may replace 
$\arcsin(x_3)$ with $\arccos(x_3)$ in line (76.2.3) to obtain $\theta_2 \in [0, \pi]$ as in the following map. 
\[
\begin{aligned}
\psi &: S^2 \to ([0, 2\pi) \times (0, \pi)) \cup \{(0, 0), (0, \pi)\} \\
\psi &: x \mapsto (\theta_1, \theta_2) = (\arctan(x_1, x_2) \bmod 2\pi, \arccos(x_3)).
\end{aligned}
\]
Terrestrial-style coordinates with $\theta_1 \in (-\pi, \pi]$ and $\theta_2 \in [-\pi/2, \pi/2]$ are assumed in Chapter 76 unless 
otherwise indicated. Coordinates $\theta_1$ and $\theta_2$ will generally be denoted as $\phi$ and $\theta$ respectively. 
76.3. The embedding of the two-sphere in Euclidean space 


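76.3.1 REMARK: Extrinsic tangent vectors for the 2-sphere embedded in 3-space. 
The extrinsic tangent vectors for parameters $\phi$ and $\theta$ in terrestrial spherical coordinates are: 
\[
\begin{aligned}
\frac{\partial x}{\partial\phi} &= \frac{\partial x(\phi, \theta)}{\partial\phi} = \begin{pmatrix} -\sin\phi \cos\theta \\ \cos\phi \cos\theta \\ 0 \end{pmatrix} \\
\frac{\partial x}{\partial\theta} &= \frac{\partial x(\phi, \theta)}{\partial\theta} = \begin{pmatrix} -\cos\phi \sin\theta \\ -\sin\phi \sin\theta \\ \cos\theta \end{pmatrix}.
\end{aligned}
\tag{76.3.1}
\]
The lengths of these vectors satisfy $|\partial x/\partial\phi| = \cos\theta$ and $|\partial x/\partial\theta| = 1$ in the ambient Euclidean space $\mathbb{R}^3$. 
A convenient abbreviation for these tangent vectors is $\partial_\phi = \partial x/\partial\phi$ and $\partial_\theta = \partial x/\partial\theta$. The extrinsic tangent 
vectors in line (76.3.1) correspond to the unit vectors in line (76.3.2). 
\[
\begin{aligned}
e_\phi(\phi, \theta) &= \sec\theta \, \partial_\phi = \begin{pmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{pmatrix} \\
e_\theta(\phi, \theta) &= \partial_\theta = \begin{pmatrix} -\cos\phi \sin\theta \\ -\sin\phi \sin\theta \\ \cos\theta \end{pmatrix}.
\end{aligned}
\tag{76.3.2}
\]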
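
The tangent vector formulas can be compared with central finite differences of the embedding. This is a rough numerical sketch only. The function names, the sample point and the step size `1e-6` are assumptions chosen for illustration.

```python
import math

def embed(phi, theta):
    """Embedding of terrestrial coordinates (phi, theta) into R^3."""
    return (math.cos(phi) * math.cos(theta),
            math.sin(phi) * math.cos(theta),
            math.sin(theta))

def d_phi(phi, theta):
    """Extrinsic tangent vector for the longitude parameter."""
    return (-math.sin(phi) * math.cos(theta), math.cos(phi) * math.cos(theta), 0.0)

def d_theta(phi, theta):
    """Extrinsic tangent vector for the latitude parameter."""
    return (-math.cos(phi) * math.sin(theta),
            -math.sin(phi) * math.sin(theta),
            math.cos(theta))

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def central_difference(phi, theta, d_phi_step, d_theta_step):
    """Approximate derivative of the embedding along the given coordinate step."""
    plus = embed(phi + d_phi_step, theta + d_theta_step)
    minus = embed(phi - d_phi_step, theta - d_theta_step)
    size = 2.0 * math.hypot(d_phi_step, d_theta_step)
    return tuple((a - b) / size for a, b in zip(plus, minus))

phi0, theta0 = 0.7, 0.4  # arbitrary sample point (assumption)
numeric_d_phi = central_difference(phi0, theta0, 1e-6, 0.0)
numeric_d_theta = central_difference(phi0, theta0, 0.0, 1e-6)
```

At the sample point, `norm(d_phi(phi0, theta0))` should equal `math.cos(theta0)`, `norm(d_theta(phi0, theta0))` should equal 1, and the two tangent vectors should be orthogonal, all up to rounding error.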


76.3.2 REMARK: Distance functions through the ambient space and within the manifold. 

Theorem 76.3.3 gives the distance between two points on a 2-sphere via a “short-cut” through the interior of 
the ball of which the sphere is the surface. One could think of this as the “ambient distance”. The distance 
in Theorem 76.3.5 is the corresponding distance when the path is constrained to lie within the sphere. 


76.3.3 THEOREM: Formulas for the ambient-space distance function. 
The distance $d(p_1, p_2)$ within $\mathbb{R}^3$ between two points $p_1$ and $p_2$ in $S^2$, whose spherical coordinates are $(\phi_1, \theta_1)$ 
and $(\phi_2, \theta_2)$ respectively, satisfies 
\[
\begin{aligned}
d(p_1, p_2)^2 &= 2\big(1 - \cos(\theta_2 - \theta_1) + \cos\theta_1 \cos\theta_2 (1 - \cos(\phi_2 - \phi_1))\big) && (76.3.3) \\
&= 4\sin^2(\tfrac{1}{2}(\theta_2 - \theta_1)) + 4\cos\theta_1 \cos\theta_2 \sin^2(\tfrac{1}{2}(\phi_2 - \phi_1)). && (76.3.4)
\end{aligned}
\]

PROOF: Equation (76.3.3) follows from a straightforward application of standard trigonometric formulas 
to the length of the chord line which joins $p_1$ and $p_2$. 
\[
\begin{aligned}
d(p_1, p_2)^2 &= (\cos\phi_2 \cos\theta_2 - \cos\phi_1 \cos\theta_1)^2 + (\sin\phi_2 \cos\theta_2 - \sin\phi_1 \cos\theta_1)^2 + (\sin\theta_2 - \sin\theta_1)^2 \\
&= \cos^2\theta_2 + \cos^2\theta_1 - 2\cos\theta_1 \cos\theta_2 (\cos\phi_1 \cos\phi_2 + \sin\phi_1 \sin\phi_2) + \sin^2\theta_2 + \sin^2\theta_1 - 2\sin\theta_1 \sin\theta_2 \\
&= 2 - 2\cos\theta_1 \cos\theta_2 (\cos\phi_1 \cos\phi_2 + \sin\phi_1 \sin\phi_2) - 2\sin\theta_1 \sin\theta_2 \\
&= 2 + 2\cos\theta_1 \cos\theta_2 (1 - \cos(\phi_2 - \phi_1)) - 2\cos(\theta_2 - \theta_1).
\end{aligned}
\]
Then line (76.3.4) follows from the half-angle rule for sines. 
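
The two closed forms for the squared chord length can be checked against a direct computation in the ambient space. The sketch below is illustrative only. The function names and the sample coordinates are hypothetical.

```python
import math

def embed(phi, theta):
    """Embedding of terrestrial coordinates (phi, theta) into R^3."""
    return (math.cos(phi) * math.cos(theta),
            math.sin(phi) * math.cos(theta),
            math.sin(theta))

def chord_sq_embedded(phi1, theta1, phi2, theta2):
    """Squared Euclidean distance between the two embedded points."""
    a, b = embed(phi1, theta1), embed(phi2, theta2)
    return sum((u - v) ** 2 for u, v in zip(a, b))

def chord_sq_cosine_form(phi1, theta1, phi2, theta2):
    """Squared chord length as in line (76.3.3)."""
    return 2.0 * (1.0 - math.cos(theta2 - theta1)
                  + math.cos(theta1) * math.cos(theta2) * (1.0 - math.cos(phi2 - phi1)))

def chord_sq_half_angle_form(phi1, theta1, phi2, theta2):
    """Squared chord length as in line (76.3.4)."""
    return (4.0 * math.sin(0.5 * (theta2 - theta1)) ** 2
            + 4.0 * math.cos(theta1) * math.cos(theta2) * math.sin(0.5 * (phi2 - phi1)) ** 2)

sample = (0.2, 0.5, 1.7, -0.9)  # (phi1, theta1, phi2, theta2), arbitrary (assumption)
direct = chord_sq_embedded(*sample)
cosine_form = chord_sq_cosine_form(*sample)
half_angle_form = chord_sq_half_angle_form(*sample)
```

All three values should agree to rounding error. For nearby points, the half-angle form avoids the cancellation in $1 - \cos(\cdot)$ and is therefore the numerically safer of the two closed forms.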


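76.3.4 REMARK: Distance-like properties of the ambient-space distance function. 

It follows from Theorem 76.3.3 line (76.3.4) that $d(p_1, p_2)^2 \le (\phi_2 - \phi_1)^2 + (\theta_2 - \theta_1)^2$ for all coordinate pairs $(\phi_1, \theta_1), (\phi_2, \theta_2) \in \mathbb{R}^2$, 
and the ambient-space distance function has the approximate form $d(p_1, p_2) \approx ((\Delta\phi)^2 \cos^2\theta + (\Delta\theta)^2)^{1/2}$ for 
small $(\Delta\phi, \Delta\theta)$, where $\Delta\phi = \phi_2 - \phi_1$, $\Delta\theta = \theta_2 - \theta_1$ and $\theta = (\theta_1 + \theta_2)/2$. 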


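76.3.5 THEOREM: Formulas for the geodesic distance function. 
The distance $d(p_1, p_2)$ within the sphere surface $S^2$, between two points $p_1$ and $p_2$ with spherical coordinates 
$(\phi_1, \theta_1)$ and $(\phi_2, \theta_2)$ respectively, is given by 
\[
\begin{aligned}
d(p_1, p_2) &= 2\arcsin\sqrt{\sin^2(\tfrac{1}{2}(\theta_2 - \theta_1)) + \cos\theta_1 \cos\theta_2 \sin^2(\tfrac{1}{2}(\phi_2 - \phi_1))} && (76.3.5) \\
&= \arccos\big(\sin\theta_1 \sin\theta_2 + \cos\theta_1 \cos\theta_2 \cos(\phi_2 - \phi_1)\big) && (76.3.6) \\
&= \arccos\big(\cos(\theta_2 - \theta_1) + \cos\theta_1 \cos\theta_2 (\cos(\phi_2 - \phi_1) - 1)\big). && (76.3.7)
\end{aligned}
\]
Hence the distance $d(p_1, p_2)$ satisfies 
\[
\sin^2(\tfrac{1}{2} d(p_1, p_2)) = \sin^2(\tfrac{1}{2}(\theta_2 - \theta_1)) + \cos\theta_1 \cos\theta_2 \sin^2(\tfrac{1}{2}(\phi_2 - \phi_1)) \tag{76.3.8}
\]
and 
\[
\cos(d(p_1, p_2)) = \cos(\theta_2 - \theta_1) + \cos\theta_1 \cos\theta_2 (\cos(\phi_2 - \phi_1) - 1). \tag{76.3.9}
\]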

PROOF: The distance $d(p_1, p_2)$ within the sphere may be calculated as either $2\arcsin(\tfrac{1}{2} d(p_1, p_2))$, where the 
distance inside the arcsine is the ambient-space distance of Theorem 76.3.3, or $\arccos(p_1 \cdot p_2)$, where “$\cdot$” 
denotes the usual scalar binary product operator. The arcsine formula for the angle subtended by a chord 
gives line (76.3.5) by substitution of line (76.3.4) for the ambient-space distance. The arccosine rule gives 
\[
\begin{aligned}
d(p_1, p_2) &= \arccos\big((\cos\phi_1 \cos\theta_1, \sin\phi_1 \cos\theta_1, \sin\theta_1) \cdot (\cos\phi_2 \cos\theta_2, \sin\phi_2 \cos\theta_2, \sin\theta_2)\big) \\
&= \arccos(\cos\phi_1 \cos\theta_1 \cos\phi_2 \cos\theta_2 + \sin\phi_1 \cos\theta_1 \sin\phi_2 \cos\theta_2 + \sin\theta_1 \sin\theta_2) \\
&= \arccos(\cos\theta_1 \cos\theta_2 (\cos\phi_1 \cos\phi_2 + \sin\phi_1 \sin\phi_2) + \sin\theta_1 \sin\theta_2) \\
&= \arccos(\sin\theta_1 \sin\theta_2 + \cos\theta_1 \cos\theta_2 \cos(\phi_2 - \phi_1)) \\
&= \arccos(\cos(\theta_2 - \theta_1) + \cos\theta_1 \cos\theta_2 (\cos(\phi_2 - \phi_1) - 1)),
\end{aligned}
\]
which verifies lines (76.3.6) and (76.3.7). Lines (76.3.8) and (76.3.9) are immediate consequences of lines 
(76.3.5) and (76.3.7) respectively. 
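
The three expressions for the geodesic distance can be compared numerically. The following is a minimal sketch with hypothetical function names. The clamping of arguments to the arcsine and arccosine is an added safeguard against rounding error and is not part of the theorem.

```python
import math

def clamp(value, low, high):
    return max(low, min(high, value))

def geodesic_arcsine(phi1, theta1, phi2, theta2):
    """Geodesic distance as in line (76.3.5)."""
    h = (math.sin(0.5 * (theta2 - theta1)) ** 2
         + math.cos(theta1) * math.cos(theta2) * math.sin(0.5 * (phi2 - phi1)) ** 2)
    return 2.0 * math.asin(math.sqrt(clamp(h, 0.0, 1.0)))

def geodesic_arccosine_a(phi1, theta1, phi2, theta2):
    """Geodesic distance as in line (76.3.6)."""
    c = (math.sin(theta1) * math.sin(theta2)
         + math.cos(theta1) * math.cos(theta2) * math.cos(phi2 - phi1))
    return math.acos(clamp(c, -1.0, 1.0))

def geodesic_arccosine_b(phi1, theta1, phi2, theta2):
    """Geodesic distance as in line (76.3.7)."""
    c = (math.cos(theta2 - theta1)
         + math.cos(theta1) * math.cos(theta2) * (math.cos(phi2 - phi1) - 1.0))
    return math.acos(clamp(c, -1.0, 1.0))

sample = (0.2, 0.5, 1.7, -0.9)  # (phi1, theta1, phi2, theta2), arbitrary (assumption)
d_arcsine = geodesic_arcsine(*sample)
d_arccosine_a = geodesic_arccosine_a(*sample)
d_arccosine_b = geodesic_arccosine_b(*sample)
```

The three values should agree to rounding error for well-separated points. For very close points, the arccosine forms lose accuracy because the cosine of a small angle is nearly 1, so the arcsine form is generally the better choice in floating-point arithmetic.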


76.4. The differential and the critical-point Hessian operator 


76.4.1 REMARK: The differential for real-valued functions. 
The differential for real-valued functions on differentiable manifolds is presented in Sections 58.1 and 58.2, 
in particular in Theorem 58.1.6 and Definitions 58.1.2 and 58.2.3. 


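76.4.2 THEOREM: The differential of a differentiable real-valued function on the two-sphere. 
Let $f \in C^1(S^2, \mathbb{R})$. Then the differential $df \in T^*(S^2)$ of f satisfies 
\[
\forall p \in U_0,\ \forall V \in T_p(S^2), \quad (df)_p(V) = D_V f = v^1 \partial_\phi f(\psi_0^{-1}(\phi, \theta)) + v^2 \partial_\theta f(\psi_0^{-1}(\phi, \theta)),
\]
where $\psi_0 : U_0 \to \mathbb{R}^2$ is the terrestrial coordinates chart in Remark 76.2.2, $V = t_{p,v,\psi_0}$ for some $v \in \mathbb{R}^2$, and 
$(\phi, \theta) = (\phi(p), \theta(p)) = \psi_0(p)$ for all $p \in U_0$. Let $\tilde f = f \circ \psi_0^{-1}$. Then 
\[
\forall (\phi, \theta) \in \tilde U_0,\ \forall v \in \mathbb{R}^2, \quad (df)_p(V) = D_V f = v^1 \partial_\phi \tilde f(\phi, \theta) + v^2 \partial_\theta \tilde f(\phi, \theta),
\]
where $p = p(\phi, \theta) = \psi_0^{-1}((\phi, \theta))$ for all $(\phi, \theta) \in \tilde U_0 = \mathrm{Range}(\psi_0)$, and $V = t_{p,v,\psi_0}$. 


PROOF: The expressions for the differential $(df)_p(V) = D_V f$ follow straightforwardly from Definition 58.1.2 
and Notation 54.11.12. 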


76.4.3 REMARK: Informal expressions for the differential of a real-valued function on a two-sphere. 

The differential of f in Theorem 76.4.2 may be written more informally as $df = (\partial_\phi f, \partial_\theta f)$, although such 
an expression ignores the difference between $f$ and $f \circ \psi_0^{-1}$. Strictly speaking, $(df)_p \in T_p^*(S^2)$ is a covector 
at each point $p \in S^2$, and $(df)_p$ maps $V = t_{p,v,\psi_0}$ to the real number $v^1 \partial_\phi f(\psi_0^{-1}(\phi, \theta)) + v^2 \partial_\theta f(\psi_0^{-1}(\phi, \theta))$ 
for each $p \in U_0$ and $v = (v^1, v^2) \in \mathbb{R}^2$. 


Although the notation $(df)_p(t_{p,v,\psi_0})$ is not particularly convenient to write, the notation $D_{t_{p,v,\psi_0}} f$ is even 
less convenient because of the triple subscript. Strictly speaking, p should be written as $\psi_0^{-1}(\phi, \theta)$. Then these 
two notations become somewhat painful, namely as $(df)_{\psi_0^{-1}(\phi,\theta)}(t_{\psi_0^{-1}(\phi,\theta),v,\psi_0})$ and $D_{t_{\psi_0^{-1}(\phi,\theta),v,\psi_0}} f$. Clearly 
informal expressions are preferable in application contexts, although it is important to be able to convert 
these to formal expressions whenever confusion arises. 


76.4.4 REMARK: Applications of the Hessian operator. 

The Hessian operator is applicable to the calculation of the Riemannian metric tensor field from the two- 
point distance function, as discussed in Section 73.9. Luckily, the Hessian in this role does not require a 
connection or metric to tensorise it because it is applied at a local extremum of a real-valued function in this 
case, namely at the global minimum of the square of the two-point distance function. (See Section 59.11 for 
the tensoriality of the Hessian operator at a critical point of a real-valued function.) 

The Laplacian can be easily computed as the trace of the Hessian, contracted with the metric tensor. 
However, in this case the Hessian must incorporate the Levi-Civita connection because the Laplacian is 
required to be valid at all points, not merely at critical points of functions which it acts on. (See Section 74.6 
for the Laplace-Beltrami operator on Riemannian manifolds.) 
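76.4.5 THEOREM: The Hessian operator at a critical point of a real-valued function on the 2-sphere. 
Let $u \in C^2(S^2, \mathbb{R})$. Then the Hessian $H(u)_p \in T_p^{0,2}(S^2)$ of u at a critical point p of u satisfies 
\[
\forall V, W \in T_p(S^2), \quad
H(u)_p(V, W) = v^1 w^1 \partial_\phi^2 u(\psi_0^{-1}(\phi, \theta)) + (v^1 w^2 + v^2 w^1) \partial_\phi \partial_\theta u(\psi_0^{-1}(\phi, \theta)) + v^2 w^2 \partial_\theta^2 u(\psi_0^{-1}(\phi, \theta)),
\]
where $\psi_0 : U_0 \to \mathbb{R}^2$ is the terrestrial coordinates chart in Remark 76.2.2, $V = t_{p,v,\psi_0}$ and $W = t_{p,w,\psi_0}$ for 
some $v, w \in \mathbb{R}^2$, and $(\phi, \theta) = (\phi(p), \theta(p)) = \psi_0(p)$. Let $\tilde u = u \circ \psi_0^{-1}$. Then 
\[
\forall v, w \in \mathbb{R}^2, \quad H(u)_p(V, W) = v^1 w^1 \partial_\phi^2 \tilde u(\phi, \theta) + (v^1 w^2 + v^2 w^1) \partial_\phi \partial_\theta \tilde u(\phi, \theta) + v^2 w^2 \partial_\theta^2 \tilde u(\phi, \theta),
\]
where $p = p(\phi, \theta) = \psi_0^{-1}((\phi, \theta))$ for some $(\phi, \theta) \in \tilde U_0 = \mathrm{Range}(\psi_0)$, and $V = t_{p,v,\psi_0}$ and $W = t_{p,w,\psi_0}$. 


PROOF: The expressions for $H(u)_p$ follow from Definition 59.11.4 for the Hessian at a critical point of a 
$C^2$ real-valued function on a $C^2$ differentiable manifold. 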


76.4.6 REMARK: Components of the Hessian at a critical point. 
The Hessian of a function $u \in C^2(S^2)$, at a critical point p of u, may be thought of as the matrix 
\[
H(u)_p = \begin{pmatrix} u_{\phi\phi} & u_{\phi\theta} \\ u_{\theta\phi} & u_{\theta\theta} \end{pmatrix}.
\]
These are the components of a symmetric tensor in $T_p^{0,2}(S^2)$. This object is tensorial because computation 
of the components in a different chart yields a matrix which is related by the tensor transformation rules 
for $T^{0,2}(S^2)$. This succeeds because the first derivatives $u_\phi$ and $u_\theta$ are equal to zero at a critical point of u. 


76.5. Metric tensor calculation from the distance function 


76.5.1 REMARK:  Derivation of the metric tensor from the distance function. 

Section 32.12.2 deals with the computation of the standard Riemannian metric tensor field on S? from the 
standard point-to-point distance function. This distance function has been familiar to mathematicians since 
ancient times. It is well-known how the distance function $d : M \times M \to \mathbb{R}_0^+$ may be derived from the 
metric tensor field $g \in T^{0,2}(M)$ for any Riemannian manifold M. The reverse calculation is not often shown 
in textbooks. (See Section 73.9 for this calculation.) But it is desirable to derive the new concept, the 
Riemannian metric, from the ancient concept, the distance function. 


Theorems 76.5.2, 76.5.4, 76.5.5 and 76.5.6 present a computation of the Riemannian metric tensor field for 
the terrestrial coordinates left-inverse chart for S? in Remark 76.2.1 line (76.2.2). 


Theorems 76.5.2 and 76.5.4 are intuitively fairly obvious. Moreover, Theorem 76.5.4 is clearly true for any 
$C^2$ function $F : [0, 1] \to \mathbb{R}$ with $F(0) = 0$ and $F'(0) = 1$, and may be easily extended to arbitrary 
non-negative $F'(0)$. (The numbers $\delta_1$ and $k_1$ in the proof of Theorem 76.5.2 depend on the choice of F.) The 
proof is not entirely trivial because the conditions on h do not imply that $h^{1/2}$ is differentiable. In fact, in 
the cases of interest, $h^{1/2}$ is usually not differentiable. Then the nonlinear arcsine function is applied to such 
a non-differentiable function, and the square of this is calculated. It is perhaps not immediately obvious how 
to prove by elementary methods that the square of a nonlinear function of such a square root is well-behaved. 


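76.5.2 THEOREM: Differentiability of the square of the arcsin of the square root of a function. 
Let $h \in C^1(\mathbb{R}^2, \mathbb{R})$ satisfy $h(0, 0) = 0$ and $\mathrm{Range}(h) \subseteq [0, 1]$. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x) = \arcsin(h(x)^{1/2})^2$ 
for all $x \in \mathbb{R}^2$. Then $f(0, 0) = 0$ and $\partial_1 f(0, 0) = \partial_2 f(0, 0) = 0$. 


PROOF: The equality $f(0, 0) = 0$ follows trivially from the equality $\arcsin(0) = 0$. From the assumptions 
$h \in C^1(\mathbb{R}^2, \mathbb{R})$, $h(0, 0) = 0$ and $\mathrm{Range}(h) \subseteq [0, 1]$, it follows that $\partial_1 h(0, 0) = \partial_2 h(0, 0) = 0$. 

Let $\varepsilon \in \mathbb{R}^+$. Since $\partial_t \arcsin(t)|_{t=0} = 1$ by Theorem 44.2.15, there exist $k_1 > 1$ and $\delta_1 > 0$ such that 
$\forall t \in [0, \delta_1],\ \arcsin(t) \le k_1 t$. (One may use $\delta_1 = 1$ and $k_1 = \pi/2$, but the choice of values does not matter.) 
The proposition $\forall \varepsilon_2 > 0,\ \exists \delta_2(\varepsilon_2) > 0,\ \forall s \in \mathbb{R},\ 0 < |s| < \delta_2(\varepsilon_2) \Rightarrow 0 \le h(s, 0) \le \varepsilon_2 |s|$ follows from the 
relations $h(0, 0) = 0$, $\partial_1 h(0, 0) = 0$ and $\mathrm{Range}(h) \subseteq [0, 1]$. Let $\varepsilon_2 = \varepsilon k_1^{-2}$. Let $\delta = \min(\delta_2(\varepsilon_2), \delta_1^2 \varepsilon_2^{-1})$. Then 
$0 < \delta \le \delta_1^2 \varepsilon_2^{-1}$. So $\forall s \in \mathbb{R},\ 0 < |s| < \delta \Rightarrow 0 \le h(s, 0)^{1/2} \le \varepsilon_2^{1/2} |s|^{1/2} < \varepsilon_2^{1/2} \delta^{1/2} \le \varepsilon_2^{1/2} \varepsilon_2^{-1/2} \delta_1 = \delta_1$. 
Therefore $\forall s \in \mathbb{R},\ 0 < |s| < \delta \Rightarrow 0 \le \arcsin(h(s, 0)^{1/2}) \le k_1 h(s, 0)^{1/2}$. So $\forall s \in \mathbb{R},\ 0 < |s| < \delta \Rightarrow 
0 \le \arcsin(h(s, 0)^{1/2})^2 \le k_1^2 h(s, 0) \le k_1^2 \varepsilon_2 |s| = \varepsilon |s|$. Hence $\partial_1 f(0, 0) = 0$. Similarly, $\partial_2 f(0, 0) = 0$. 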
76.5.3 REMARK: Translation invariance of square-of-arcsine-of-square-root property. 
Although Theorem 76.5.2 is stated for convenience at a single point $x = (0, 0) \in \mathbb{R}^2$, it is clear that the 
corresponding results $f(x) = 0$ and $\partial_1 f(x) = \partial_2 f(x) = 0$ hold at any point $x \in \mathbb{R}^2$ where $h(x) = 0$. 


76.5.4 THEOREM: Twice differentiability of the square of the arcsin of the square root of a function. 

Let $h \in C^2(\mathbb{R}^2, \mathbb{R})$ satisfy $h(0, 0) = 0$ and $\mathrm{Range}(h) \subseteq [0, 1]$. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x) = \arcsin(h(x)^{1/2})^2$ 
for all $x \in \mathbb{R}^2$. Then 

(1) $f(0, 0) = 0$, 

(2) $\partial_1 f(0, 0) = \partial_2 f(0, 0) = 0$, 

(3) $\partial_{ij} f(0, 0) = \partial_{ij} h(0, 0)$ for all $i, j \in \{1, 2\}$. 


PROOF: Parts (1) and (2) follow directly from Theorem 76.5.2. 


For part (3), the chain rules for differentiation are applicable at all points $x = (s, t) \in \mathbb{R}^2$ where $0 < h(x) < 1$. 
At such points, 
\[
\begin{aligned}
\partial_s \arcsin(h(s, t)^{1/2})^2 &= 2 \arcsin(h(s, t)^{1/2}) \, (1 - h(s, t))^{-1/2} \, \tfrac{1}{2} h(s, t)^{-1/2} \, \partial_s h(s, t) \\
&= \arcsin(h(s, t)^{1/2}) \, h(s, t)^{-1/2} \, (1 - h(s, t))^{-1/2} \, \partial_s h(s, t).
\end{aligned}
\]
This is close to $\partial_s h(s, t)$ when $h(s, t)$ is close to zero, but this “closeness” must be made more precise. 
From Definition 44.2.3 and elementary calculus, it follows that $\forall y \in (0, \tfrac{1}{2}],\ \arcsin(y)/y \in (1, 1 + \tfrac{1}{4} y^2)$. 
So $h(s, t) \in (0, \tfrac{1}{4}] \Rightarrow \arcsin(h(s, t)^{1/2}) h(s, t)^{-1/2} \in (1, 1 + \tfrac{1}{4} h(s, t))$. Likewise, elementary calculus (or 
arithmetic) yields the two-sided bound $h(s, t) \in (0, \tfrac{1}{2}] \Rightarrow (1 - h(s, t))^{-1/2} \in (1, 1 + 2h(s, t))$. Therefore 
$\lim_{s \to 0} s^{-1} \partial_s \arcsin(h(s, 0)^{1/2})^2 = \lim_{s \to 0} s^{-1} \partial_s h(s, 0)$ because $\lim_{s \to 0} h(s, 0) = 0$ and $\lim_{s \to 0} \partial_s h(s, 0) = 0$. 
(Note that, as mentioned in Remark 76.5.3, the first derivatives of f are zero wherever h is zero.) So 
$\lim_{s \to 0} s^{-1} \partial_s \arcsin(h(s, 0)^{1/2})^2 = \partial_{11} h(0, 0)$. Hence $\partial_{11} f(0, 0) = \partial_{11} h(0, 0)$. The other second derivatives in 
part (3) follow similarly. 
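
Part (3) can be illustrated numerically for a particular choice of h. The sketch below is only an illustration under stated assumptions: the function `h` is a hypothetical example chosen so that it is smooth, vanishes at the origin and takes values in [0, 1), and the second derivatives are approximated by central differences with an arbitrary step size.

```python
import math

def h(s, t):
    """Example function with h(0, 0) = 0 and values in [0, 1) (an assumption)."""
    q = s * s + s * t + t * t  # positive semi-definite quadratic form
    return q / (1.0 + q)

def f(s, t):
    """Square of the arcsine of the square root of h."""
    return math.asin(math.sqrt(h(s, t))) ** 2

def hessian_at_origin(func, step=1e-3):
    """Central-difference approximation to the second derivatives at (0, 0)."""
    e = step
    d11 = (func(e, 0.0) - 2.0 * func(0.0, 0.0) + func(-e, 0.0)) / (e * e)
    d22 = (func(0.0, e) - 2.0 * func(0.0, 0.0) + func(0.0, -e)) / (e * e)
    d12 = (func(e, e) - func(e, -e) - func(-e, e) + func(-e, -e)) / (4.0 * e * e)
    return [[d11, d12], [d12, d22]]

hessian_h = hessian_at_origin(h)
hessian_f = hessian_at_origin(f)
```

For this example the exact second derivatives of h at the origin are 2, 1 and 2, and both approximate matrices should be close to these values, even though $h^{1/2}$ itself is not differentiable at the origin.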


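76.5.5 THEOREM: Derivatives of the square of the distance function. 
Define $f_\theta : \mathbb{R}^2 \to \mathbb{R}$ for $\theta \in \mathbb{R}$ by 
\[
\forall x \in \mathbb{R}^2, \quad f_\theta(x) = 4 \arcsin\big( (\sin^2(\tfrac{1}{2} x_2) + \cos\theta \cos(\theta + x_2) \sin^2(\tfrac{1}{2} x_1))^{1/2} \big)^2.
\]
Then for all $\theta \in \mathbb{R}$, 

(1) $f_\theta(0) = 0$, 

(2) $\partial_1 f_\theta(0) = \partial_2 f_\theta(0) = 0$, 

(3) $\partial_{ij} f_\theta(0) = 2\cos^2\theta \, \delta_i^1 \delta_j^1 + 2 \delta_i^2 \delta_j^2$ for all $i, j \in \{1, 2\}$. 


PROOF: Let $h(x) = \sin^2(\tfrac{1}{2} x_2) + \cos\theta \cos(\theta + x_2) \sin^2(\tfrac{1}{2} x_1)$ for all $x \in \mathbb{R}^2$, for any $\theta \in \mathbb{R}$. Then parts (1), 
(2) and (3) follow from the corresponding parts of Theorem 76.5.4. 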


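76.5.6 THEOREM: Formula for the metric tensor field on the sphere in terrestrial coordinates. 
The components of the Riemannian metric tensor field $g \in X(T^{0,2}(S^2))$ with respect to the terrestrial 
coordinates chart for the two-sphere $S^2$ satisfy 
\[
\forall (\phi, \theta) \in \tilde U_0,\ \forall i, j \in \mathbb{N}_2, \quad g_{ij}(\tilde\psi_0(\phi, \theta)) = \cos^2\theta \, \delta_i^1 \delta_j^1 + \delta_i^2 \delta_j^2,
\]
where $\tilde U_0 \subseteq \mathbb{R}^2$ and $\tilde\psi_0 : \tilde U_0 \to S^2$ are as given in Remark 76.2.2. 

PROOF: By Definition 73.9.4, $g_{ij}(\tilde\psi_0(x)) = \tfrac{1}{2} \partial_{y_i} \partial_{y_j} d(\tilde\psi_0(x), \tilde\psi_0(y))^2 \big|_{y=x}$ for $x = (x_1, x_2) = (\phi, \theta) \in \tilde U_0$. So by 
Theorems 76.3.5 and 76.5.5, 
\[
\begin{aligned}
g_{ij}(\tilde\psi_0(x)) &= \tfrac{1}{2} \partial_{y_i} \partial_{y_j} 4 \arcsin\big( (\sin^2(\tfrac{1}{2}(y_2 - x_2)) + \cos x_2 \cos y_2 \sin^2(\tfrac{1}{2}(y_1 - x_1)))^{1/2} \big)^2 \Big|_{y=x} \\
&= \cos^2 x_2 \, \delta_i^1 \delta_j^1 + \delta_i^2 \delta_j^2.
\end{aligned}
\]
Hence $g_{ij}(\tilde\psi_0(\phi, \theta)) = \cos^2\theta \, \delta_i^1 \delta_j^1 + \delta_i^2 \delta_j^2$ for all $(\phi, \theta) \in \tilde U_0$ and $i, j \in \mathbb{N}_2$. 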
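
The derivation of the metric from the distance function can be imitated numerically by taking half of the second derivatives of the squared geodesic distance at coincident points. The following is a rough sketch, not the book's procedure: the derivatives are approximated by central differences, and the function names, sample point and step size are assumptions made for illustration.

```python
import math

def geodesic_sq(x, y):
    """Squared geodesic distance between coordinate pairs x and y, each (phi, theta)."""
    h = (math.sin(0.5 * (y[1] - x[1])) ** 2
         + math.cos(x[1]) * math.cos(y[1]) * math.sin(0.5 * (y[0] - x[0])) ** 2)
    h = min(1.0, max(0.0, h))  # guard against rounding error
    return 4.0 * math.asin(math.sqrt(h)) ** 2

def metric_from_distance(x, step=1e-4):
    """Approximate g_ij at x as half the second y-derivatives of d(x, y)^2 at y = x."""
    def f(d1, d2):
        return geodesic_sq(x, (x[0] + d1, x[1] + d2))

    e = step
    g11 = 0.5 * (f(e, 0.0) - 2.0 * f(0.0, 0.0) + f(-e, 0.0)) / (e * e)
    g22 = 0.5 * (f(0.0, e) - 2.0 * f(0.0, 0.0) + f(0.0, -e)) / (e * e)
    g12 = 0.5 * (f(e, e) - f(e, -e) - f(-e, e) + f(-e, -e)) / (4.0 * e * e)
    return [[g11, g12], [g12, g22]]

point = (0.3, 0.6)  # (phi, theta), arbitrary sample point (assumption)
g = metric_from_distance(point)
```

At the sample point, `g` should be close to the diagonal matrix with entries $\cos^2(0.6)$ and 1.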
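76.6. Affine connection and curvatures in terrestrial coordinates 


76.6.1 THEOREM: Some tensor calculus formulas for the standard metric in terrestrial coordinates. 
With respect to terrestrial coordinates on $S^2$, the metric tensor components $g_{ij}$, the Christoffel array $\Gamma^k_{ij}$ 
for the Levi-Civita connection, the Riemann curvature tensor components $R^i{}_{jk\ell}$, the Ricci curvature tensor 
components $R_{ij}$, the scalar curvature R, and the Einstein tensor components $G_{ij}$, have the following values. 
\[
\begin{aligned}
g_{ij} &= \cos^2\theta \, \delta_i^1 \delta_j^1 + \delta_i^2 \delta_j^2 \\
g^{ij} &= \sec^2\theta \, \delta^i_1 \delta^j_1 + \delta^i_2 \delta^j_2 \\
\Gamma^k_{ij} &= -\tan\theta \, \delta^k_1 (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) + \sin\theta \cos\theta \, \delta^k_2 \delta_i^1 \delta_j^1 \\
\Gamma^k_{ij,\ell} &= \big( -\sec^2\theta \, \delta^k_1 (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) + (\cos^2\theta - \sin^2\theta) \, \delta^k_2 \delta_i^1 \delta_j^1 \big) \delta_\ell^2 \\
R^i{}_{jk\ell} &= (\delta^i_1 \delta_j^2 - \cos^2\theta \, \delta^i_2 \delta_j^1)(\delta_k^1 \delta_\ell^2 - \delta_k^2 \delta_\ell^1) \\
R_{ijk\ell} &= \cos^2\theta \, (\delta_i^1 \delta_j^2 - \delta_i^2 \delta_j^1)(\delta_k^1 \delta_\ell^2 - \delta_k^2 \delta_\ell^1) \\
R_{ij} &= g_{ij} \\
R &= 2 \\
G_{ij} &= 0
\end{aligned}
\]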


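PROOF: The formula for $(g_{ij})_{i,j=1}^2$ follows from Theorem 76.5.6. The formula for $(g^{ij})_{i,j=1}^2$ is the matrix 
inverse of $(g_{ij})_{i,j=1}^2$. 


The formula for $(\Gamma^k_{ij})_{i,j,k=1}^2$ follows from Definition 74.3.4 because $g_{ij,k} = -2\sin\theta \cos\theta \, \delta_i^1 \delta_j^1 \delta_k^2$, and so 
\[
\begin{aligned}
\Gamma^k_{ij} &= \tfrac{1}{2} g^{k\ell} (g_{i\ell,j} + g_{\ell j,i} - g_{ij,\ell}) \\
&= \tfrac{1}{2} g^{k\ell} (-2\sin\theta \cos\theta)(\delta_i^1 \delta_\ell^1 \delta_j^2 + \delta_\ell^1 \delta_j^1 \delta_i^2 - \delta_i^1 \delta_j^1 \delta_\ell^2) \\
&= -\sin\theta \cos\theta \, (\sec^2\theta \, \delta^k_1 \delta^\ell_1 + \delta^k_2 \delta^\ell_2)(\delta_i^1 \delta_\ell^1 \delta_j^2 + \delta_\ell^1 \delta_j^1 \delta_i^2 - \delta_i^1 \delta_j^1 \delta_\ell^2) \\
&= -\sin\theta \cos\theta \, \big( \sec^2\theta \, \delta^k_1 (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) + \delta^k_2 (-\delta_i^1 \delta_j^1) \big) \\
&= -\tan\theta \, \delta^k_1 (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) + \sin\theta \cos\theta \, \delta^k_2 \delta_i^1 \delta_j^1.
\end{aligned}
\]
The formula for $(\Gamma^k_{ij,\ell})_{i,j,k,\ell=1}^2$ follows by straightforward differentiation of the formula for $(\Gamma^k_{ij})_{i,j,k=1}^2$. 
To prove the formula for $(R^i{}_{jk\ell})_{i,j,k,\ell=1}^2$ from Definition 71.11.8, note that 
\[
\Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} = \big( \sec^2\theta \, \delta^i_1 \delta_j^2 - (\cos^2\theta - \sin^2\theta) \, \delta^i_2 \delta_j^1 \big)(\delta_k^1 \delta_\ell^2 - \delta_k^2 \delta_\ell^1)
\]
and 
\[
\Gamma^i_{km} \Gamma^m_{j\ell} - \Gamma^i_{\ell m} \Gamma^m_{jk} = -(\tan^2\theta \, \delta^i_1 \delta_j^2 + \sin^2\theta \, \delta^i_2 \delta_j^1)(\delta_k^1 \delta_\ell^2 - \delta_k^2 \delta_\ell^1).
\]
Hence 
\[
\begin{aligned}
R^i{}_{jk\ell} &= \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \Gamma^i_{km} \Gamma^m_{j\ell} - \Gamma^i_{\ell m} \Gamma^m_{jk} \\
&= \big( (\sec^2\theta - \tan^2\theta) \, \delta^i_1 \delta_j^2 - (\cos^2\theta - \sin^2\theta + \sin^2\theta) \, \delta^i_2 \delta_j^1 \big)(\delta_k^1 \delta_\ell^2 - \delta_k^2 \delta_\ell^1) \\
&= (\delta^i_1 \delta_j^2 - \cos^2\theta \, \delta^i_2 \delta_j^1)(\delta_k^1 \delta_\ell^2 - \delta_k^2 \delta_\ell^1).
\end{aligned}
\]
The formula for $(R_{ijk\ell})_{i,j,k,\ell=1}^2$ follows by contraction of $(g_{im})_{i,m=1}^2$ with $(R^m{}_{jk\ell})_{m,j,k,\ell=1}^2$. 


The formula for $(R_{ij})_{i,j=1}^2$ follows from Definition 71.11.14 and the formula for $(R^i{}_{jk\ell})_{i,j,k,\ell=1}^2$: 
\[
\begin{aligned}
R_{ij} &= R^k{}_{ikj} \\
&= (\delta^k_1 \delta_i^2 - \cos^2\theta \, \delta^k_2 \delta_i^1)(\delta_k^1 \delta_j^2 - \delta_k^2 \delta_j^1) \\
&= \delta_i^2 \delta_j^2 + \cos^2\theta \, \delta_i^1 \delta_j^1.
\end{aligned}
\]
Then it follows from Definition 74.4.8 that the scalar curvature satisfies R = 2. The equation $G_{ij} = 0$ for 
the Einstein tensor follows from the formula $G_{ij} = R_{ij} - \tfrac{1}{2} R g_{ij}$ in Definition 75.3.3. 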
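
The curvature formulas can be cross-checked numerically from the Christoffel array. The sketch below is illustrative and makes several assumptions: indices are 0-based (0 for longitude, 1 for latitude), the derivative of the Christoffel array is approximated by a central difference with an arbitrary step size, and the curvature is assembled as $R^i{}_{jk\ell} = \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \Gamma^i_{km}\Gamma^m_{j\ell} - \Gamma^i_{\ell m}\Gamma^m_{jk}$. All function names are hypothetical.

```python
import math
from itertools import product

def christoffel(theta):
    """Gamma[k][i][j] for the terrestrial metric; index 0 is phi, index 1 is theta."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gamma[0][0][1] = gamma[0][1][0] = -math.tan(theta)
    gamma[1][0][0] = math.sin(theta) * math.cos(theta)
    return gamma

def christoffel_derivative(theta, step=1e-5):
    """d_gamma[l][k][i][j]: derivative of Gamma^k_ij with respect to coordinate l."""
    plus, minus = christoffel(theta + step), christoffel(theta - step)
    by_theta = [[[(plus[k][i][j] - minus[k][i][j]) / (2.0 * step) for j in range(2)]
                 for i in range(2)] for k in range(2)]
    by_phi = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]  # nothing depends on phi
    return [by_phi, by_theta]

def riemann(theta):
    """riem[i][j][k][l] approximating the components R^i_jkl."""
    gamma, d_gamma = christoffel(theta), christoffel_derivative(theta)
    riem = [[[[0.0, 0.0] for _ in range(2)] for _ in range(2)] for _ in range(2)]
    for i, j, k, l in product(range(2), repeat=4):
        value = d_gamma[k][i][j][l] - d_gamma[l][i][j][k]
        for m in range(2):
            value += gamma[i][k][m] * gamma[m][j][l] - gamma[i][l][m] * gamma[m][j][k]
        riem[i][j][k][l] = value
    return riem

def ricci(theta):
    """Ricci components R_ij as the contraction of R^k_ikj over k."""
    riem = riemann(theta)
    return [[sum(riem[k][i][k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def scalar_curvature(theta):
    """Contraction of the Ricci components with the inverse metric."""
    ric = ricci(theta)
    return ric[0][0] / math.cos(theta) ** 2 + ric[1][1]

theta0 = 0.6  # arbitrary sample latitude (assumption)
riem0 = riemann(theta0)
ricci0 = ricci(theta0)
scalar0 = scalar_curvature(theta0)
```

At the sample latitude, `ricci0` should be close to the metric components, and `scalar0` should be close to 2.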
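76.6.2 THEOREM: Sectional curvature in terrestrial coordinates. 
The sectional curvature $K_{\phi,\theta} : T_{\tilde\psi_0(\phi,\theta)}(S^2) \times T_{\tilde\psi_0(\phi,\theta)}(S^2) \to \mathbb{R}$ for $(\phi, \theta) \in \tilde U_0$, with respect to terrestrial 
coordinates on $S^2$, satisfies 
\[
\forall (\phi, \theta) \in \tilde U_0,\ \forall u, v \in T_{\tilde\psi_0(\phi,\theta)}(S^2), \quad u \wedge v \ne 0 \Rightarrow K_{\phi,\theta}(u, v) = 1.
\]

PROOF: Let $(\phi, \theta) \in \tilde U_0$ and $u, v \in T_{\tilde\psi_0(\phi,\theta)}(S^2)$. Then $R_{ijk\ell} = \cos^2\theta \, (\delta_i^1 \delta_j^2 - \delta_i^2 \delta_j^1)(\delta_k^1 \delta_\ell^2 - \delta_k^2 \delta_\ell^1)$ by 
Theorem 76.6.1. So $R_{ijk\ell} u^i v^j u^k v^\ell = \cos^2\theta \, (u^1 v^2 - u^2 v^1)^2$. But 
\[
\begin{aligned}
|u \wedge v|^2 &= |u|^2 |v|^2 - (u \cdot v)^2 \\
&= g_{ij} u^i u^j g_{k\ell} v^k v^\ell - (g_{ij} u^i v^j)^2 \\
&= (\cos^2\theta \, (u^1)^2 + (u^2)^2)(\cos^2\theta \, (v^1)^2 + (v^2)^2) - (\cos^2\theta \, u^1 v^1 + u^2 v^2)^2 \\
&= \cos^2\theta \, (u^1 v^2 - u^2 v^1)^2.
\end{aligned}
\]
Hence $K_{\phi,\theta}(u, v) = (R_{ijk\ell} u^i v^j u^k v^\ell) / |u \wedge v|^2 = 1$ whenever $u \wedge v \ne 0$. 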


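76.6.3 THEOREM: The Hessian operator in terrestrial coordinates. 
Let $u \in C^2(S^2, \mathbb{R})$. Then the Hessian $H(u) \in X^0(T^{0,2}(S^2))$ of u with respect to the terrestrial coordinates 
inverse chart $\tilde\psi_0 : \tilde U_0 \to S^2$ in Remark 76.2.2 satisfies 
\[
H(u)_{ij} = u_{,ij} + \tan\theta \, (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) u_\phi - \sin\theta \cos\theta \, \delta_i^1 \delta_j^1 u_\theta.
\]
In other words, 
\[
H(u) = \begin{pmatrix}
u_{\phi\phi} - \sin\theta \cos\theta \, u_\theta & u_{\phi\theta} + \tan\theta \, u_\phi \\
u_{\theta\phi} + \tan\theta \, u_\phi & u_{\theta\theta}
\end{pmatrix}.
\]

PROOF: By Remark 71.9.7, the Hessian operator for a function $u \in C^2(S^2)$ in terrestrial coordinates 
satisfies $H(u)_{ij} = u_{,ij} - \Gamma^k_{ij} u_{,k} = u_{,ij} + \tan\theta \, (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) u_\phi - \sin\theta \cos\theta \, \delta_i^1 \delta_j^1 u_\theta$ by Theorem 76.6.1. 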


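76.6.4 THEOREM: The Laplacian operator in terrestrial coordinates. 
Let $f \in C^2(S^2, \mathbb{R})$. The Laplacian $\Delta f \in X^0(T^{0,0}(S^2)) = C^0(S^2, \mathbb{R})$ of f with respect to the terrestrial 
coordinates inverse chart $\tilde\psi_0 : \tilde U_0 \to S^2$ in Remark 76.2.2 satisfies 
\[
\forall (\phi, \theta) \in \tilde U_0, \quad \Delta f(\tilde\psi_0(\phi, \theta)) = \sec^2\theta \, \partial_\phi^2 f(\tilde\psi_0(\phi, \theta)) + \partial_\theta^2 f(\tilde\psi_0(\phi, \theta)) - \tan\theta \, \partial_\theta f(\tilde\psi_0(\phi, \theta)).
\]
In other words, $\Delta = \sec^2\theta \, \partial_\phi^2 + \partial_\theta^2 - \tan\theta \, \partial_\theta$ and $\Delta f = \sec^2\theta \, f_{\phi\phi} + f_{\theta\theta} - \tan\theta \, f_\theta$. 

PROOF: From Definition 74.6.7, the Laplacian operator has the form $\Delta = g^{ij}(\partial_{ij} - \Gamma^k_{ij} \partial_k)$. Therefore 
\[
\begin{aligned}
\Delta &= g^{ij}(\partial_{ij} - \Gamma^k_{ij} \partial_k) \\
&= (\sec^2\theta \, \delta^i_1 \delta^j_1 + \delta^i_2 \delta^j_2)\big(\partial_{ij} - (-\tan\theta \, \delta^k_1 (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) + \sin\theta \cos\theta \, \delta^k_2 \delta_i^1 \delta_j^1) \partial_k\big) \\
&= \sec^2\theta \, \partial_{11} + \partial_{22} - \sec^2\theta \sin\theta \cos\theta \, \partial_2 \\
&= \sec^2\theta \, \partial_{11} + \partial_{22} - \tan\theta \, \partial_2.
\end{aligned}
\]
Hence $\Delta = \sec^2\theta \, \partial_\phi^2 + \partial_\theta^2 - \tan\theta \, \partial_\theta$ and $\Delta f = \sec^2\theta \, f_{\phi\phi} + f_{\theta\theta} - \tan\theta \, f_\theta$. 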
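
The coordinate formula for the Laplacian can be tested on functions whose Laplacian is known independently. The sketch below is an illustration, not part of the original text: it uses the restrictions to the sphere of the Cartesian coordinate functions $x_3 = \sin\theta$ and $x_1 = \cos\phi \cos\theta$, which are first-order spherical harmonics and satisfy $\Delta f = -2f$. Derivatives are approximated by central differences, and the function names and step size are assumptions.

```python
import math

def laplacian(f, phi, theta, step=1e-4):
    """Approximate sec^2(theta) f_phiphi + f_thetatheta - tan(theta) f_theta."""
    e = step
    f_phiphi = (f(phi + e, theta) - 2.0 * f(phi, theta) + f(phi - e, theta)) / (e * e)
    f_thetatheta = (f(phi, theta + e) - 2.0 * f(phi, theta) + f(phi, theta - e)) / (e * e)
    f_theta = (f(phi, theta + e) - f(phi, theta - e)) / (2.0 * e)
    return f_phiphi / math.cos(theta) ** 2 + f_thetatheta - math.tan(theta) * f_theta

def height(phi, theta):
    """The Cartesian coordinate x_3 restricted to the sphere."""
    return math.sin(theta)

def first_coordinate(phi, theta):
    """The Cartesian coordinate x_1 restricted to the sphere."""
    return math.cos(phi) * math.cos(theta)

phi0, theta0 = 0.7, 0.4  # arbitrary sample point (assumption)
lap_height = laplacian(height, phi0, theta0)
lap_first = laplacian(first_coordinate, phi0, theta0)
```

Here `lap_height` should be close to `-2 * height(phi0, theta0)`, and `lap_first` should be close to `-2 * first_coordinate(phi0, theta0)`.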


76.6.5 EXAMPLE: Parallel transport on the 2-sphere.

The Riemannian connection for $S^2$ defines parallel transport of tangent vectors on $S^2$. Figure 76.6.1 shows
parallel transport of a vector along two paths in $S^2$. Note that the region bounded by the paths has area $\pi/4$,
which equals the difference in orientation of the axes at the end-points.


Figure 76.6.1 Parallel transport on $S^2$ along boundary of $\phi \in (0, \pi/2)$, $\theta \in (\pi/6, \pi/2)$
To see how vectors are parallel-transported on the two-sphere as in Figure 76.6.1, note that parallel transport
along a curve $\gamma : I \to S^2$ of a vector field $V : I \to T(S^2)$ along $\gamma$, so that $V(t) \in T_{\gamma(t)}(S^2)$ for all $t \in I$, is
defined by $D_{\gamma'(t)}V(t) = 0$ for all $t \in I$, where $I$ is an open interval of $\mathbb{R}$. Expressed in terms of terrestrial
coordinates, this means that $\partial_t v^k(t) + \Gamma^k_{ij}(\psi_0(x(t)))\,v^i(t)\,\partial_t x^j(t) = 0$ for all $t \in I$, where $(v^i(t))_{i=1}^2$ is the
family of components of $V(t)$, and $(x^i(t))_{i=1}^2$ is the family of coordinates of $\gamma(t)$, for all $t \in I$. In abbreviated
form, $\dot v^k + \Gamma^k_{ij}v^i\dot x^j = 0$. Then by Theorem 76.6.1,
\[ \dot v^k = \bigl(\tan\theta\,\delta^k_1(\delta^1_i\delta^2_j + \delta^2_i\delta^1_j) - \sin\theta\cos\theta\,\delta^k_2\delta^1_i\delta^1_j\bigr)v^i\dot x^j = \tan\theta\,\delta^k_1(v^1\dot x^2 + v^2\dot x^1) - \sin\theta\cos\theta\,\delta^k_2\,v^1\dot x^1. \]

Consider the curve $\gamma : t \mapsto \psi_0(0,t)$. Then $\dot x(t) = (0,1)$ for all $t \in I = (-\pi/2, \pi/2)$. So $\dot v^k = \tan\theta\,\delta^k_1 v^1$.
For $k = 1$, $\dot v^1(t) = \tan t\,v^1(t)$ for $t \in I$, which has general solution $v^1(t) = c\sec t$ for some $c \in \mathbb{R}$. If
$v(0) = (1,0)$, then $v^1(t) = \sec t = \sec\theta$ for all $t \in I$. For $k = 2$, $\dot v^2 = 0$. So $v(t) = (\sec t, 0)$ for all $t \in I$.
However, the term $\cos^2\theta$ in the metric tensor implies $|V(t)| = 1$ for all $t \in I$. Consequently the vector $(1,0)$
is transported along the longitude $\phi = 0$ with constant length from the equator to the pole. Similarly, the
vector $(0,1)$ is transported as expected from equator to pole.




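For parallel transport along the latitude line $\theta = \pi/6$, the curve is defined by $\gamma : t \mapsto (t, \pi/6)$. Then $\dot x(t) = (1,0)$.
So the equation $\dot v^k = \tan\theta\,\delta^k_1(v^1\dot x^2 + v^2\dot x^1) - \sin\theta\cos\theta\,\delta^k_2 v^1\dot x^1$ yields $\dot v^k = \tan\theta\,\delta^k_1 v^2 - \sin\theta\cos\theta\,\delta^k_2 v^1$. Let
$w^1 = \cos\theta\,v^1$ and $w^2 = v^2$. (Thus $w$ gives the components of $v$ with respect to an orthonormal basis, so the
Euclidean norm of $w$ equals $|V|$.) Then $\dot w^1 = \sin\theta\,w^2$ and
$\dot w^2 = -\sin\theta\,w^1$. With initial condition $w(0) = (1,0)$, this has solution $w(t) = (\cos(-t\sin\theta), \sin(-t\sin\theta))$. In
particular, $w(\pi/2) = (\cos(-\pi/4), \sin(-\pi/4))$. Thus the vector is uniformly rotated clockwise through the angle $\pi/4$
as $t$ increases from $0$ to $\pi/2$.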


The transport from $(\pi/2, \pi/6)$ to the north pole is similar to the transport along the longitude line $\phi = 0$. Hence
parallel transport by the two paths illustrated in Figure 76.6.1 differs at the pole by the angle $\pi/4$.
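
The rotation angle for transport along the latitude line can also be obtained by integrating the transport equations numerically. The following Python sketch is illustrative only and not part of the original text. It assumes the component equations dv^1/dt = tan(theta) v^2 and dv^2/dt = -sin(theta) cos(theta) v^1 for constant theta, integrates them with a classical fourth-order Runge-Kutta scheme, and uses hypothetical names throughout.

```python
import math

def transport_along_latitude(theta, v0, phi_end, steps=2000):
    """Integrate the parallel transport equations along a latitude line from phi = 0 to phi_end."""
    a = math.tan(theta)
    b = -math.sin(theta) * math.cos(theta)

    def rhs(v):
        # dv^1/dt = tan(theta) v^2,  dv^2/dt = -sin(theta) cos(theta) v^1
        return (a * v[1], b * v[0])

    h = phi_end / steps
    v = tuple(v0)
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0,
             v[1] + h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0)
    return v

theta = math.pi / 6
# Start with orthonormal-frame components w(0) = (1, 0), i.e. v(0) = (sec(theta), 0).
v_end = transport_along_latitude(theta, (1.0 / math.cos(theta), 0.0), math.pi / 2)
w_end = (math.cos(theta) * v_end[0], v_end[1])
angle = math.atan2(w_end[1], w_end[0])
print(angle, -math.pi / 4)
```

The computed angle should agree with the clockwise rotation through pi/4, and the length of w should remain 1 along the path.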


76.7. Geodesic curves 


76.7.1 THEOREM: Derivation of terrestrial coordinates equation for geodesics from affine connection.
If the image of a geodesic curve on $S^2$ is locally expressible as the graph of a function from $\phi$ to $\theta$ in terrestrial
coordinates, then this function satisfies

\[ \exists k, \phi_0 \in \mathbb{R},\ \forall \phi \in (-\pi, \pi],\quad \theta(\phi) = \arctan(k\sin(\phi - \phi_0)). \]

Conversely, the graph of any function of this form is the image of a geodesic curve.


PROOF: It follows from the equations for a freely parametrised geodesic curve in 2-dimensional manifolds
with affine connection (Theorem 72.1.11 (iii)) that for a torsion-free connection,

\[ \theta'' - \Gamma^1_{22}(\theta')^3 + (\Gamma^2_{22} - 2\Gamma^1_{12})(\theta')^2 + (2\Gamma^2_{12} - \Gamma^1_{11})\theta' + \Gamma^2_{11} = 0. \]

On substituting the Christoffel array $\Gamma^k_{ij} = -\tan\theta\,\delta^k_1(\delta^1_i\delta^2_j + \delta^2_i\delta^1_j) + \sin\theta\cos\theta\,\delta^k_2\delta^1_i\delta^1_j$, the differential equation
for $\theta$ becomes

\[ \theta'' + 2(\theta')^2\tan\theta + \sin\theta\cos\theta = 0. \qquad (76.7.1) \]
But $(\tan\theta)'' = (\theta'\sec^2\theta)' = 2(\theta')^2\tan\theta\sec^2\theta + \theta''\sec^2\theta = \sec^2\theta\,(2(\theta')^2\tan\theta + \theta'')$. So $\theta'' + 2(\theta')^2\tan\theta =
(\tan\theta)''\cos^2\theta$. Therefore line (76.7.1) may be expressed as $(\tan\theta)'' + \tan\theta = 0$, which has the general
solution $\tan\theta = k\sin(\phi - \phi_0)$, as claimed. The converse follows from the same calculations.
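
The claimed solution can be substituted into equation (76.7.1) numerically. The Python sketch below is an illustration, not part of the original text. It approximates the derivatives of theta with respect to phi by central differences, and the function names and sample values are hypothetical.

```python
import math

def geodesic_theta(phi, k, phi0=0.0):
    """theta(phi) = arctan(k sin(phi - phi0))."""
    return math.atan(k * math.sin(phi - phi0))

def geodesic_residual(phi, k, phi0=0.0, h=1e-4):
    """Left-hand side of theta'' + 2 (theta')^2 tan(theta) + sin(theta) cos(theta) = 0."""
    t0 = geodesic_theta(phi, k, phi0)
    tp = geodesic_theta(phi + h, k, phi0)
    tm = geodesic_theta(phi - h, k, phi0)
    d1 = (tp - tm) / (2.0 * h)          # theta'
    d2 = (tp - 2.0 * t0 + tm) / h ** 2  # theta''
    return d2 + 2.0 * d1 ** 2 * math.tan(t0) + math.sin(t0) * math.cos(t0)

# Hypothetical sample: inclination 40 degrees, so k = tan(40 degrees).
k = math.tan(math.radians(40.0))
residuals = [geodesic_residual(phi, k) for phi in (-2.0, -0.5, 0.8, 2.5)]
print(residuals)
```

Each residual should vanish up to the finite-difference error.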


76.7.2 REMARK: Illustration of geodesic curves in terrestrial coordinates.
Figure 76.7.1 illustrates the images of the geodesic curves in Theorem 76.7.1 with $\phi_0 = 0$ and $k = \tan\alpha$,
where $\alpha = 10^\circ, 20^\circ, \ldots, 80^\circ$.


76.7.3 REMARK: Derivation of equation for geodesics from analytic geometry.
Another way to determine the coordinates of geodesics is to define a great circle through $(\phi,\theta) = (0,0)$, with
inclination $\alpha$ to the equator, to be the intersection of the sphere $S^2$ with the plane $\{(x,y,z);\ z\cos\alpha = y\sin\alpha\}$.
Then the equation in terms of $(\phi,\theta)$ is

\[ \tan\theta = \sin\phi\,\tan\alpha. \]

That is,
\[ \theta = \arctan(\sin\phi\,\tan\alpha). \]

This agrees with the particular case $\phi_0 = 0$ in Theorem 76.7.1 with $k = \tan\alpha$.
Figure 76.7.1 Geodesic curves in terrestrial coordinates on the two-sphere


76.7.4 REMARK: Formulas for affinely parametrised geodesics.

The equations for affinely parametrised geodesics may be obtained in several ways. One way is to
reparametrise the equations for a geodesic curve parametrised by $\phi$. Another way is to solve the equations for
a geodesic curve directly. And a third way is to use the symmetries of $S^2$ to generate families of geodesic
curves out of a simple geodesic, such as the equator, parametrised by $\phi$. This last approach gives

\[ \phi(t) = \phi_0 + \arctan(\cos\alpha\tan(t - t_0)) = \phi_0 + \arcsin\frac{\cos\alpha\tan(t - t_0)}{\sqrt{1 + \cos^2\alpha\tan^2(t - t_0)}} \]
\[ \theta(t) = \arctan\frac{\sin\alpha\sin(t - t_0)}{\sqrt{1 - \sin^2\alpha\sin^2(t - t_0)}} = \arctan\frac{\sin\alpha\tan(t - t_0)}{\sqrt{1 + \cos^2\alpha\tan^2(t - t_0)}} \]

by applying successively $R_3(-t_0)$, $R_1(\alpha)$ and $R_3(\phi_0)$ to the curve given by $\phi(t) = t$ and $\theta(t) = 0$. These
isometries of $S^2$ are discussed in Section 76.8.

For affine parametrisation, one needs $\ddot X^i + \Gamma^i_{jk}\dot X^j\dot X^k = 0$. That is, $\ddot\theta + \sin\theta\cos\theta\,\dot\phi^2 = 0$ and $\ddot\phi - 2\tan\theta\,\dot\phi\,\dot\theta = 0$.
For length-parametrised geodesics, one needs additionally $g_{ij}\dot X^i\dot X^j = 1$. That is, $\cos^2\theta\,\dot\phi^2 + \dot\theta^2 = 1$.
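
The length-parametrisation condition can be tested numerically on the explicit formulas. The Python sketch below is an illustration only and not part of the original text. It assumes the arctan forms phi(t) = phi_0 + arctan(cos(alpha) tan(t - t_0)) and theta(t) = arctan(sin(alpha) tan(t - t_0) / sqrt(1 + cos^2(alpha) tan^2(t - t_0))), restricted to |t - t_0| < pi/2, and uses hypothetical function names.

```python
import math

def geodesic_point(t, alpha, t0=0.0, phi0=0.0):
    """Terrestrial coordinates (phi, theta) of the assumed geodesic, valid for |t - t0| < pi/2."""
    s = t - t0
    ct = math.cos(alpha) * math.tan(s)
    phi = phi0 + math.atan(ct)
    theta = math.atan(math.sin(alpha) * math.tan(s) / math.sqrt(1.0 + ct ** 2))
    return phi, theta

def speed_squared(t, alpha, h=1e-5):
    """Central-difference estimate of cos^2(theta) phi'^2 + theta'^2."""
    phi_p, theta_p = geodesic_point(t + h, alpha)
    phi_m, theta_m = geodesic_point(t - h, alpha)
    _, theta = geodesic_point(t, alpha)
    dphi = (phi_p - phi_m) / (2.0 * h)
    dtheta = (theta_p - theta_m) / (2.0 * h)
    return math.cos(theta) ** 2 * dphi ** 2 + dtheta ** 2

# Hypothetical sample inclination and parameter values.
speeds = [speed_squared(t, 0.6) for t in (-1.0, 0.0, 0.4, 1.2)]
print(speeds)
```

Each value should be close to 1, which is consistent with the curve being parametrised by arc length.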


76.8. Isometries of the 2-sphere 


76.8.1 REMARK: Rotations of the 2-sphere.
Section 76.8 presents rotations of the 2-sphere in $\mathbb{R}^3$ about various axes. These rotations are elements of the
classical group SO(3).


76.8.2 REMARK: Isometries transform geodesics to geodesics.

Since the Riemannian manifold $S^2$ is symmetric with respect to elements of the orthogonal group O(3),
this group can be used to generate new geodesics from given geodesics. For example, the equator of $S^2$ is
$E = \{(x,y,z) \in \mathbb{R}^3;\ x^2 + y^2 = 1 \text{ and } z = 0\}$, which is the image of a geodesic curve. Therefore the image
$gE$ of $E$ for any group element $g \in O(3)$ is also a geodesic image in $S^2$.


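76.8.3 REMARK: One-parameter families of rotations generated by antisymmetric matrices.
Define three generators of SO(3) to be the antisymmetric matrices

\[ A_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
   A_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \quad
   A_3 = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \qquad (76.8.1) \]
Define the corresponding single parameter families of rotation matrices $R_1$, $R_2$ and $R_3$ by

\[ R_1(\alpha_1) = \exp(\alpha_1 A_1) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_1 & -\sin\alpha_1 \\ 0 & \sin\alpha_1 & \cos\alpha_1 \end{bmatrix} \]
\[ R_2(\alpha_2) = \exp(\alpha_2 A_2) = \begin{bmatrix} \cos\alpha_2 & 0 & \sin\alpha_2 \\ 0 & 1 & 0 \\ -\sin\alpha_2 & 0 & \cos\alpha_2 \end{bmatrix} \qquad (76.8.2) \]
\[ R_3(\alpha_3) = \exp(\alpha_3 A_3) = \begin{bmatrix} \cos\alpha_3 & -\sin\alpha_3 & 0 \\ \sin\alpha_3 & \cos\alpha_3 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

For $k = 1,2,3$, let $R_k(t)$ denote the corresponding linear transformations on $\mathbb{R}^3$, defined by $x \mapsto R_k(t)x$.
Then $R_k(t)(S^2) = S^2$ for all $t \in \mathbb{R}$ and $k = 1,2,3$.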
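
The relation between the generators and the rotation matrices can be verified numerically. The following Python sketch is an illustration, not part of the original text. It approximates the matrix exponential by a truncated Taylor series (an assumption made for simplicity, adequate for small arguments) and compares it with the closed forms in (76.8.2). All function names are hypothetical.

```python
import math

# Generators A_1, A_2, A_3 as in (76.8.1).
A = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def expm_series(m, terms=40):
    """Truncated Taylor series sum of m^n / n! for n < terms, for a 3x3 matrix m."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

def rotation(k, a):
    """Closed-form R_k(a) as in (76.8.2), for k = 1, 2, 3."""
    c, s = math.cos(a), math.sin(a)
    if k == 1:
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if k == 2:
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def max_error(k, a):
    """Largest entry-wise difference between exp(a A_k) and R_k(a)."""
    scaled = [[a * x for x in row] for row in A[k - 1]]
    e, r = expm_series(scaled), rotation(k, a)
    return max(abs(e[i][j] - r[i][j]) for i in range(3) for j in range(3))

for k in (1, 2, 3):
    print(k, max_error(k, 0.7))
```

In practice a library routine for the matrix exponential would be preferable. The series is used here only to keep the sketch free of dependencies.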


76.8.4 REMARK: Euler's angles, and yaw, pitch and roll.

Euler’s angles $\theta$, $\phi$ and $\psi$ are the parameters of transformations of $\mathbb{R}^3$ of the form $e'_i = R_3(\phi)R_2(\theta)R_3(\psi)e_i$
where $(e_i)_{i=1}^3$ is the standard sequence of orthonormal basis vectors of $\mathbb{R}^3$. The coordinate transformation
has the matrix $R_3(-\psi)R_2(-\theta)R_3(-\phi)$. Therefore the coordinates $(x', y', z')$ with respect to the modified
basis vectors are related to the original coordinates $(x, y, z)$ by

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = R_3(-\psi)R_2(-\theta)R_3(-\phi) \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]
\[ = \begin{bmatrix}
\cos\psi\cos\theta\cos\phi - \sin\psi\sin\phi & \sin\psi\cos\phi + \cos\psi\cos\theta\sin\phi & -\cos\psi\sin\theta \\
-\sin\psi\cos\theta\cos\phi - \cos\psi\sin\phi & \cos\psi\cos\phi - \sin\psi\cos\theta\sin\phi & \sin\psi\sin\theta \\
\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta
\end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}. \]

This transformation is supposed to be useful for analysing systems such as spinning tops. The idea is to align
the $z'$ coordinate (unit vector $e'_3$) with the axis of the top.
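
The entries of the coordinate transformation matrix can be cross-checked by multiplying out the three rotation factors numerically. The Python sketch below is illustrative and not part of the original text. It assumes the matrices R_2 and R_3 of (76.8.2) and compares the product R_3(-psi) R_2(-theta) R_3(-phi) with the closed-form entries. The function names and sample angles are hypothetical.

```python
import math

def r2(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def r3(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def euler_product(phi, theta, psi):
    """R_3(-psi) R_2(-theta) R_3(-phi), computed by matrix multiplication."""
    return matmul(r3(-psi), matmul(r2(-theta), r3(-phi)))

def euler_closed_form(phi, theta, psi):
    """The same matrix written out entry by entry."""
    cf, sf = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(psi), math.sin(psi)
    return [
        [cp * ct * cf - sp * sf, sp * cf + cp * ct * sf, -cp * st],
        [-sp * ct * cf - cp * sf, cp * cf - sp * ct * sf, sp * st],
        [st * cf, st * sf, ct],
    ]

def max_difference(phi, theta, psi):
    p, q = euler_product(phi, theta, psi), euler_closed_form(phi, theta, psi)
    return max(abs(p[i][j] - q[i][j]) for i in range(3) for j in range(3))

print(max_difference(0.3, 1.1, -0.8))
```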


Although Euler's angles are useful for mechanics, they are not at all useful as coordinates for the differentiable
manifold structure of the group SO(3) in a neighbourhood of the identity. For that purpose, the three
parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$ are perfectly suited. For instance, the transformation $R_3(\alpha_3)R_1(\alpha_1)R_2(\alpha_2) \in
SO(3)$ of unit vectors for $\alpha \in \mathbb{R}^3$ provides useful coordinates in a neighbourhood of the identity of SO(3).
This particular coordinatisation may be thought of in terms of an aeroplane flying in the direction of the
X-axis: $\alpha_2$ is the forward-tilt of the plane (the “pitch”), $\alpha_1$ is the right-roll of the plane (the “roll”), $\alpha_3$ is
the left-drift of the plane (the “yaw”). Each of the 6 orderings of the 3 rotations gives a different but
analytic-compatible chart in a neighbourhood of the identity.


76.8.5 REMARK: Rotation around a fixed point on the 2-sphere.
It is tempting to look for better parameters than Euler's angles. One interesting possibility would be to
define two parameters as the location of the fixed point in $S^2$ of a group element, with the third parameter
being the amount of rotation around that fixed point. The calculations for this are not completely trivial.
For instance, even if the Euler angle $\psi$ is set to zero, it may be shown (by matrix diagonalisation) that the
fixed points of the transformation $R_3(-\psi)R_2(-\theta)R_3(-\phi)$ satisfy


. Sin0(1-— cos $)(1 — cos ó cos 0), 
u cos à cos 0 


T 


and 
sin ọsin 0 
y= 25 
(1 — cos 9)(1 + cos 0) 


This leads to complicated expressions for the terrestrial coordinates of the fixed point. This does not seem
to be a useful way to parametrise rotations, although it does have the advantage of removing the special
significance of the axial directions.
76.8.6 REMARK: Vector fields induced on the 2-sphere.

For every vector $V \in T_e(G)$ for $G = SO(3)$, a vector field is induced on $S^2$ by the map $p \mapsto (dR_p)_e(V)$
for $p \in S^2$, where $R_p : G \to S^2$ is defined by $R_p : g \mapsto g.p$ for $g \in G$ and $p \in S^2$. This map is
chart-independent.

Define a chart $\psi_{123} : G \to \mathbb{R}^3$ so that $\psi_{123} : R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1) \mapsto (\alpha_1, \alpha_2, \alpha_3)$ in some neighbourhood
of $e \in G$. The rotation matrix is:

\[ R_3(\alpha_3)R_2(\alpha_2)R_1(\alpha_1) = \begin{bmatrix}
c_2c_3 & s_1s_2c_3 - c_1s_3 & c_1s_2c_3 + s_1s_3 \\
c_2s_3 & s_1s_2s_3 + c_1c_3 & c_1s_2s_3 - s_1c_3 \\
-s_2 & s_1c_2 & c_1c_2
\end{bmatrix}, \qquad (76.8.3) \]

where $s_k = \sin\alpha_k$ and $c_k = \cos\alpha_k$ for $k = 1,2,3$.

Let $V \in T_e(G)$ be the tangent vector at $e$ whose component tuple with respect to this chart is $v \in \mathbb{R}^3$. Define the chart $\psi_0 : S^2 \to \mathbb{R}^2$ for terrestrial coordinates as in
Section 76.2. Then to calculate $(dR_p)_e(V)$, it is necessary to differentiate the position of a point in $S^2$ with
respect to coordinates of $G$. To be precise, $((dR_p)_e(V))^k = \sum_{j=1}^3 v^j\,\partial_{\alpha_j}(\psi_0^k \circ R_p \circ \psi_{123}^{-1})\big|_{\alpha = \psi_{123}(e)}$, where
$\psi_0^1 = \phi$ and $\psi_0^2 = \theta$. When $v = (1,0,0)$, $(dR_p)_e(V) = -\cos\phi\tan\theta\,e^p_1 + \sin\phi\,e^p_2$, where $e^p_1$ and $e^p_2$ are the chart-basis
vectors at $p \in S^2$ with respect to $\psi_0$. Similarly for $v = (0,1,0)$, $(dR_p)_e(V) = -\sin\phi\tan\theta\,e^p_1 - \cos\phi\,e^p_2$,
and for $v = (0,0,1)$, $(dR_p)_e(V) = e^p_1$. Therefore for general $V \in T_e(G)$,

\[ (dR_p)_e(V) = \begin{bmatrix} e^p_1 & e^p_2 \end{bmatrix}
\begin{bmatrix} -\cos\phi\tan\theta & -\sin\phi\tan\theta & 1 \\ \sin\phi & -\cos\phi & 0 \end{bmatrix}
\begin{bmatrix} v^1 \\ v^2 \\ v^3 \end{bmatrix}. \]

This topic is also mentioned in Remark 63.5.11. See Figure 63.6.3 in Example 63.6.23 for an illustration of
some vector fields induced by SO(3) on $S^2$.
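
The components of the three induced vector fields can be recomputed from the generators. The Python sketch below is an illustration only and not part of the original text. It assumes the embedding p = (cos theta cos phi, cos theta sin phi, sin theta), uses the fact that the derivative of exp(t A_k) p at t = 0 is A_k p, and converts this ambient velocity to terrestrial components. All names are hypothetical.

```python
import math

# Antisymmetric generators A_1, A_2, A_3 of SO(3).
GENERATORS = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]

def point(phi, theta):
    # Assumed embedding of terrestrial coordinates into R^3.
    return (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), math.sin(theta))

def induced_field(k, phi, theta):
    """Components (dphi/dt, dtheta/dt) at t = 0 of the curve t -> exp(t A_k) p."""
    p = point(phi, theta)
    a = GENERATORS[k - 1]
    dx, dy, dz = (sum(a[i][j] * p[j] for j in range(3)) for i in range(3))
    x, y, z = p
    dphi = (x * dy - y * dx) / (x * x + y * y)  # from phi = atan2(y, x)
    dtheta = dz / math.sqrt(1.0 - z * z)        # from theta = asin(z)
    return (dphi, dtheta)

def expected_field(k, phi, theta):
    """Closed-form components for v = (1,0,0), (0,1,0), (0,0,1)."""
    if k == 1:
        return (-math.cos(phi) * math.tan(theta), math.sin(phi))
    if k == 2:
        return (-math.sin(phi) * math.tan(theta), -math.cos(phi))
    return (1.0, 0.0)

def max_difference(phi, theta):
    return max(abs(a - b)
               for k in (1, 2, 3)
               for a, b in zip(induced_field(k, phi, theta), expected_field(k, phi, theta)))

print(max_difference(0.4, 0.9))
```

By linearity, the field induced by a general component tuple v is the corresponding linear combination of these three fields.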
77.1. Chronology of mathematicians


77.1.1 REMARK: Chronological ordering of concepts assists memory. 

The purpose of Section 77.1 is to assist the learning of mathematics by arranging many of the famous names 
of mathematical history in chronological order. Memory is assisted by associating objects with locations. 
Similarly, the association of concepts with historical dates assists memory. 


Section 77.1 is a brief chronology of mathematicians who contributed directly or indirectly to the development 
of the topics in this book. The tables are based on Bynum et alia [238], EDM2 [113], Bell [234], KEM [103] 
and numerous other sources. Names are sorted by date of death because people are generally better known 
for their achievements and influence in their last ten years than in their first ten years. 


77.1.2 REMARK: Ancient history 


dates         name                    contribution
c639-c546BC   Thales of Miletus       First proof of geometric theorem.
572-492BC     Pythagoras of Samos     taught geometry; maybe first to prove Pythagoras’ theorem
c490-c425BC   Zeno of Elea            paradoxes regarding infinitesimals
480-411BC     Antiphon the Sophist    atomistic calculation of the area of circles; proposed a method
                                      of exhaustion
c470-c410BC   Hippocrates of Chios    wrote “Elements of Geometry” (lost)
c460-c400BC   Hippias of Elis         invented the quadratrix for trisecting angles
c460-c370BC   Democritus              computed volume of pyramids by dividing into “atomistic”
                                      laminas
c400-347BC    Eudoxus of Cnidus       attributed as having developed “method of exhaustion”.
c427-c347BC   Plato                   philosophy of mathematics
384-322BC     Aristotle               first sum of infinite series $\sum_n 4^{-n}$; provided basis for Euclid’s
                                      Elements? Basic formal logic.
c325-c265BC   Euclid of Alexandria    Elements; organised ruler/compass geometry axiomatically.
c287-212BC    Archimedes of Syracuse  rigorous treatment of areas and volumes bounded by curved
                                      lines and surfaces using the “method of exhaustion”.
c276-c194BC   Eratosthenes of Cyrene  measured distance of 1° on Earth
c262-c190BC   Apollonius of Perga     Konikon Biblia; conic sections.
c190-c120BC   Hipparchus of Nicaea    Founder of trigonometry.

Alan U. Kennington, “Differential geometry reconstructed: A unified systematic framework”. www.geometry.org/dg.html
The Mesopotamians and Egyptians made much progress in geometry before classical Greek mathematics,
but no personal names are associated with this very early geometry. It is not known, for example, who 
discovered Pythagoras’ theorem between about 2500BC and 1850BC. According to Bell [234], page 70, the 
first deductive proof of a geometric theorem is traditionally ascribed to Thales about 600BC. Maybe Thales 
needed to use deduction to fill in the gaps in his knowledge which he learned on a visit to Egypt. He probably 
forgot a few things while sailing back to Anatolia, the Egyptian priests didn’t like to tell everything, and 
Egyptian and Mesopotamian mathematics texts never gave proofs of rules and theorems. 


Euclid was more important as a collector and organiser of geometrical knowledge than as an inventor or
discoverer. It is the logical organisation of Euclid’s “Elements” which has had a profound effect on mathematics
and physics, not so much the particular set of theorems. Bell [234], page 71, says:


With the completion of Euclid's Elements, Greek elementary geometry, exclusive of the conics, 
attained its rigid perfection. It was wholly synthetic and metric. Its lasting contribution—and 
Euclid's—to mathematics was not so much the rich store of 465 propositions which it offered as the 
epoch-making methodology of it all. 


For the first time in history masses of isolated discoveries were unified and correlated by a single 
guiding principle, that of rigorous deduction from explicitly stated assumptions. Some of the 
Pythagoreans and Eudoxus before Euclid had executed important details of the grand design, but 
it remained for Euclid to see it all and see it whole. He is therefore the great perfector, if not the 
sole creator, of what is today called the postulational method, the central nervous system of living 
mathematics. 


Unification and organisation of mathematics is still an important task in the 21st century. Bell [233], page 299, 
says the following. 


Geometrical teaching was dominated by Euclid for over 2200 years. His part in the Elements 
appears to have been principally that of a coordinator and logical arranger of the scattered results 
of his predecessors and contemporaries, and his aim was to give a connected, reasoned account of 
elementary geometry such that every statement in the whole long book could be referred back to 
the postulates. Euclid did not attain this ideal or anything even distantly approaching it, although 
it was assumed for centuries that he had. 


77.1.3 REMARK: Dark Ages 


dates         name                    contribution
99BC-1BC
c10-c70       Heron of Alexandria     Areas and volumes of geometrical figures.
c85-c168      Ptolemy of Alexandria   Spherical trigonometry. Table of chords (the ancient equivalent
                                      of sines and cosines).
200-299
c290-c350     Pappus of Alexandria    Some theorems in geometry.
400-499
500-599
600-699
700-799
800-899
900-999
1000-1099
1100-1199
1200-1299
1323-1382     Nicole Oresme           Cartesian coordinates. Location of maxima using the gradient.


Progress in mathematics was woefully slow during the Dark Ages of the Roman Empire and Christianity 
until the Renaissance awakened Europe from its long sleep. There was some progress in arithmetic and 
algebra, but geometry and analysis largely stagnated. Bell [234], page 85, describes this as follows. 
It is customary in mathematical history to date the beginning of the sterile period from the onset
of the Dark Ages in Christian Europe. But mathematical decadence had begun much earlier, in 
one of the greatest material civilizations the world has known, in the Roman Empire at the height 
of its splendor. Mathematically, the Roman mind was crass. 


If the Romans had not subjugated Europe, Northern Africa and Western Asia, robots might have landed on
Mars in 400AD. Heath [245], pages 197-198, says the following.


With Archimedes and Apollonius Greek geometry reached its culminating point. There remained 
details to be filled in, and no doubt in a work such as, for instance, the Conics geometers of the 
requisite calibre could have found propositions containing the germ of theories which were capable 
of independent development. But, speaking generally, the further progress of geometry on general 
lines was practically barred by the restrictions of method and form which were inseparable from 
the classical Greek geometry. True, it was open to geometers to discover and investigate curves of 
a higher order than conics, such as spirals, conchoids, and the like. But the Greeks could not get 
very far even on these lines in the absence of some system of coordinates and without freer means 
of manipulation such as are afforded by modern algebra, in contrast to the geometrical algebra, 
which could only deal with equations connecting lines, areas, and volumes, but involving no higher 
dimensions than three, except in so far as the use of proportions allowed a very partial exemption 
from this limitation. [...] 


It might be thought that there was room for further extensions in the region of solid geometry. But 
the fundamental principles of solid geometry had also been laid down in Euclid, Books XI-XIII; the 
theoretical geometry of the sphere had been fully treated in the ancient sphaeric; and any further 
application of solid geometry, or of loci in three dimensions, was hampered by the same restrictions 
of method which hindered the further progress of plane geometry. 


This seems to put the blame for 1600 years of stagnation and decline on the ancient mathematicians who
made such astonishingly rapid and self-confident progress for 400 years. But the ancient mathematicians were
constantly going beyond the bounds of earlier mathematicians, and then at about the time of the subjugation
of the Hellenic world to Rome, radical innovation ceased. Mathematics was not the only intellectual domain
which slept for 1600 years and was awakened by the general renaissance in all domains.


77.1.4 REMARK: Renaissance 


dates         name                    contribution
1404-1472     Leone Battista Alberti  theory of perspective; vanishing line. 1435: Della pittura.
c1415-1492    Piero della Francesca   1478: De prospectiva pingendi; perspective of solid objects.
1473-1543     Nicolaus Copernicus     trigonometry book; heliocentric version of Ptolemy's epicycles.
1512-1594     Gerhard Kremer          a.k.a. Gerard Mercator. 1569: Mercator's projection.
1540-1603     François Viète          symbolic algebra; Newton’s method.
1571-1630     Johannes Kepler         principle of continuity; points at infinity; elliptical planet-orbits.
1564-1642     Galileo Galilei         1591/1612: dynamics.
1596-1650     René Descartes          1637: la Géométrie; analytic geometry.
1591-1661     Girard Desargues        1636-39: invented projective geometry; points at infinity.
1623-1662     Blaise Pascal           1636-39: synthetic projective geometry.
1601-1665     Pierre de Fermat        1629: analytic geometry; 1657/61: tangents to curves as limits
                                      of secants; variational principle in optics.
1630-1677     Isaac Barrow            Newton’s calculus teacher; 1670: “Lectiones geometricae”
1629-1695     Christiaan Huygens


The 17th century is notable for the rapid development of analysis, which is distinguished from other areas of
mathematics by the use of infinite and infinitesimal limits. The principal contribution of Isaac Newton was
not so much in purely mathematical analysis as in the application of analysis to physics. For this application,
he required some elucidation and development of the methods of analysis.


Practical analysis was developed initially by Archimedes. Limits and derivatives were written about by
some authors in the century or two before Newton. But these notions were elevated from curiosities to
fundamental physical modelling tools by Newton. It was then perhaps inevitable that analysis would be
applied to geometry to produce differential geometry.


Bell [233], page 96, has the following comment on how Newton learned calculus from Isaac Barrow. 


Barrow’s geometrical lectures dealt among other things with his own methods for finding areas 
and drawing tangents to curves—essentially the key problems of the integral and the differential 
calculus respectively, and there can be no doubt that these lectures inspired Newton to his own 
attack. 


Bell [234], pages 120-121, has the following comment on the earlier development of Newton’s method for
solution of polynomial equations by François Viète.


Improving on the devices of his European predecessors, Vieta gave a uniform method for the 
numerical solution of algebraic equations. Its nature is sufficiently recalled here by noting that it 
was essentially the same as Newton’s (1669) given in textbooks. 


77.1.5 REMARK: Enlightenment 


dates         name                       contribution
1616-1703     John Wallis                generalisation of superscript exponential notation
1654-1705     Jacob (Jacques) Bernoulli  integral calculus; calculus of variations; e; the lemniscate.
1646-1716     Gottfried Wilhelm Leibniz  1673/75: diff/int. calculus; fundamental theorem of calculus;
                                         calculus notation.
1642-1727     Isaac Newton               1666/84: diff/int. calculus; 1687: Principia; fund. theorem of
                                         calculus; celestial mechanics.
1685-1731     Brook Taylor
1698-1746     Colin Maclaurin
1667-1748     Johann Bernoulli           integration; geodesics on a surface; calculus of variations
1707-1783     Leonhard Euler             coined the term “affine”, 1748.


The 18th century was known as the “Age of Reason” because during this time the objections of European
religious authorities to scientific progress were overcome and finally made irrelevant. Once again, as during 
the golden age of classical Greece, critical thinking and insightful discovery replaced ignorant authority. An 
important step in this was the publication of Lagrange's work on mechanics in 1788. Bell [234], page 362, 
wrote the following on this subject. 


The eighteenth century has been called the Age of Reason, also an age of enlightenment, partly 
because the physical science of that century attained its freedom from theology. In the hundred 
years from the death of Newton in 1727 to that of Laplace in 1827, dogmatic authority suffered 
the most devastating of all defeats at the hands of scientific inquiry: indifference. It simply ceased 
to matter, so far as science was concerned, whether the assertions of the dogmatists were true or 
whether they were false. 


77.1.6 REMARK: Nineteenth century 


dates         name                                     contribution
1724-1804     Immanuel Kant                            “Proved” that Euclidean geometry was “known a priori”.
1736-1813     Joseph-Louis Lagrange                    calculus of variations; Lagrangian mechanics.
1746-1818     Gaspard Monge                            introduced differential geometry; created descriptive geometry.
1753-1823     Lazare Nicholas Marguerite Carnot        1803: Géométrie de position; 1806: Essai sur les transversailles;
                                                       projective geometry.
1749-1827     Pierre-Simon Laplace                     analysis, celestial mechanics, potential theory.
1802-1829     Niels Henrik Abel
1768-1830     Jean Baptiste Joseph Fourier
1811-1832     Evariste Galois                          finite groups.
1752-1833     Adrien Marie Legendre
1781-1840     Siméon Denis Poisson                     elliptic boundary value problems
1793-1841     George Green                             Green's theorem, Green's functions.
              Bernard Placidus Johann Nepomuk Bolzano  The mathematical set concept, limits, Cauchy sequences.
1804-1851     Carl Gustav Jacob Jacobi                 Hamilton-Jacobi equation; Jacobian determinant.
1777-1855     Johann Carl Friedrich Gauß               differential geometry; Gaußian curvature.
1789-1857     Augustin Louis Cauchy                    partial differential equations; Cauchy sequences.
1805-1859     Johann Peter Gustav Lejeune Dirichlet    boundary value problems.
1796-1863     Jakob Steiner                            contributor to projective geometry.
1815-1864     George Boole                             1854: “An investigation of the laws of thought”.
1805-1865     William Rowan Hamilton                   Hamiltonian mechanics.
1826-1866     Georg Friedrich Bernhard Riemann         1854: Über die Hypothesen welche der Geometrie zu Grunde
                                                       liegen; generalised Gaußian curvature to higher dimensions.
1798-1867     Karl Georg Christian von Staudt          elimination of metrical considerations from projective geometry
1788-1867     Jean-Victor Poncelet                     1822: Traité des propriétés projectives des figures; modern
                                                       projective geometry; introduced imaginary points.
1790-1868     August Ferdinand Möbius                  1827: Der barycentrische Calcul, included results on projective
                                                       and affine geometry; introduced barycentric coordinates.
1811-1874     Ludwig Otto Hesse
1809-1877     Hermann Günther Graßmann                 development of a general calculus for vectors.
1845-1879     William Kingdon Clifford
1831-1879     James Clerk Maxwell                      Electromagnetism. Maxwell's equations.
1793-1880     Michel Chasles                           1852: Traité de Géométrie discusses cross ratio.
1821-1881     Heinrich Eduard Heine                    Heine-Borel theorem. Uniformly continuous functions.
1808-1882     Johann Benedict Listing                  1847: Vorstudien zur Topologie; first printed use of word
                                                       “topology”.
1829-1891     Ludvig Valentin Lorenz                   1867: Lorenz EM gauge condition $\partial_\mu A^\mu = 0$.
1823-1891     Leopold Kronecker                        arithmetisation of mathematics.
1856-1894     Thomas Jan Stieltjes                     Stieltjes integral.
1821-1895     Arthur Cayley                            reduction of metrical geometry to projective geometry;
                                                       “invented” matrices.
1843-1896     Giulio Ascoli                            equicontinuity; Ascoli’s theorem
1815-1897     Karl Theodor Wilhelm Weierstraß          mathematical analysis
1814-1897     James Joseph Sylvester                   gave the name “matrix” to matrices
1842-1899     Marius Sophus Lie                        continuous transformation groups.

The dearth of British names among mathematicians who died between 1750 and 1900 is quite striking. This is 
sometimes attributed to the isolation of British mathematicians after the silly arguments about who invented 
the calculus. Newton ostensibly won the argument, but it was a self-defeating victory. British mathematics 
went into decline. Europeans dominated the development of mathematics thereafter. Bell [233], page 144, 
makes the following comment on this subject. 


The upshot of it all was that the obstinate British practically rotted mathematically for all of a 
century after Newton’s death, while the more progressive Swiss and French, following the lead of 
Leibniz, and developing his incomparably better way of merely writing the calculus, perfected the 
subject and made it the simple, easily applied implement of research that Newton’s immediate 
successors should have had the honor of making it. 
(See also some related comments and quotations in Remark 40.4.12 regarding the decline of English
mathematics and the first signs of emergence from decline in the second half of the nineteenth century.) Struik [249],
page 141, says the following about nineteenth century mathematics.


The new and turbulent mathematical productivity was not primarily due to the technical problems
raised by the new industries. England, the heart of the Industrial Revolution, remained
mathematically almost sterile for several decades. Mathematics progressed most healthily in France
and somewhat later in Germany, countries in which the ideological break with the past was most
sharply felt and where sweeping changes were made, or had to be made, to prepare the ground
for the new capitalist economic and political structure. The new mathematical research gradually
emancipated itself from the ancient tendency to see in mechanics and astronomy the final goal of
the exact sciences.


77.1.7 REMARK: Twentieth century


A few physicists have been included in the following table. It is often difficult to draw a clear distinction 
between mathematicians and physicists. Those who are listed here either contributed to the mathematical 
foundations of differential geometry or influenced it in some significant way. 


dates         name                                   contribution
1835-1900     Eugenio Beltrami                       Laplace-Beltrami operator
1829-1900     Elwin Bruno Christoffel                covariant differentiation.
1822-1901     Charles Hermite                        Hermitian matrix
1851-1901     George Francis Fitzgerald              Lorentz-Fitzgerald contraction
1819-1903     George Gabriel Stokes                  Hydrodynamics. Falsely given credit for the Stokes theorem.
1832-1903     Rudolf Otto Sigismund Lipschitz        Lipschitz continuity; differential equations
1864-1909     Hermann Minkowski                      1908: Minkowski space-time formulation of special relativity
1854-1912     Jules Henri Poincaré                   Relativity, algebraic topology, Poincaré conjecture.
1831-1916     (Julius Wilhelm) Richard Dedekind      set theory; real numbers.
1838-1916     Ernst Waldfried Josef Wenzel Mach      Relativity, Mach’s principle.
1873-1916     Karl Schwarzschild                     Singularities in general relativity.
1842-1917     Jean Gaston Darboux                    differential geometry, moving frames.
1845-1918     Georg Ferdinand Ludwig Philipp Cantor  Set theory. Ordinal numbers.
1850-1919     Woldemar Voigt                         1887: First formulation of Lorentz transformations.
1843-1921     Karl Hermann Amandus Schwarz           Cauchy-Schwarz inequality.
1838-1922     Marie Ennemond Camille Jordan          topology, functions of bounded variation, Jordan content
1847-1923     Wilhelm Karl Joseph Killing            Killing fields. Lie algebras.
1838-1923     Edward Williams Morley                 Michelson-Morley experiments.
1849-1925     Felix Klein                            Erlanger Programm; unification of geometries.
1848-1925     Friedrich Ludwig Gottlob Frege         logic and set theory foundations in quipu-diagrams; casualty of
                                                     Russell’s paradox
1853-1925     Gregorio Ricci-Curbastro               developed tensor calculus; absolute differential calculus
1853-1928     Hendrik Antoon Lorentz                 1895: Lorentz contraction. 1904: Lorentz relativity theorem.
1843-1930     Moritz Pasch                           1882: statement of geometry as a hypothetico-deductive system
1861-1931     Cesare Burali-Forti                    Burali-Forti paradox.
1852-1931     Albert Abraham Michelson               Michelson-Morley experiments.
1875-1932     Giuseppe Vitali                        proved “existence” of Lebesgue non-measurable sets.
1858-1932     Giuseppe Peano                         1891-95: Axiomatisation of mathematical logic and numbers.
                                                     1886: Existence for ODEs.
1879-1932     John Wesley Young                      gave a strict axiomatic basis for projective geometry
1863-1936     Gustav Ludwig Lange                    Introduced inertial frames of reference.
1878-1936     Marcel Grossman                        discovered relevance of tensor calculus to relativity?
1877-1938     Edmund Georg Hermann Landau            Landau's symbols
1875-1941     Henri Léon Lebesgue                    measure and integration
1856-1941     Charles Émile Picard                   Picard iteration method, ODEs, elliptic PDEs.
1873-1941     Tullio Levi-Civita                     1917: infinitesimal parallel transport; developed tensor calculus
1868-1942     Felix Hausdorff                        1918: Hausdorff measure and dimension.
1857-1942     Joseph Larmor                          1897: Lorentz transformations
1897-1942     Stanislaw Saks                         integration theory; Denjoy-Young-Saks theorem
1862-1943     David Hilbert                          1899: Grundlagen der Geometrie.
1874-1943     Friedrich Moritz Hartogs               1915: Hartogs’s theorem
1899-1943     Juliusz Pawel Schauder                 Schauder estimates, partial differential equations
1882-1944     Arthur Stanley Eddington
1909-1945     Gerhard Karl Erich Gentzen
1892-1945     Stefan Banach
1871-1947     Walter Kaufmann
1858-1947     Max Karl Ernst Ludwig Planck
1861-1947     Alfred North Whitehead
1873-1950     Constantin Carathéodory
1869-1951     Elie Cartan
1871-1953     Ernst Friedrich Ferdinand Zermelo
1879-1955     Albert Einstein
1885-1955     Hermann Klaus Hugo Weyl
1871-1956     Félix Edouard Justin Emile Borel
1878-1956     Jan Lukasiewicz
1887-1956     Johann Karl August Radon
1878-1956     Felix Bernstein
1903-1957     John (János) von Neumann
1880-1960     Oswald Veblen
1884-1962     Jakob Johann Laub
1885-1962     Niels Henrik David Bohr
1891-1965     Adolf Abraham Halevi Fraenkel
1930-1966     Edward John Lemmon
1881-1966     Luitzen Egbertus Jan Brouwer
1882-1970     Max Born
1872-1970     Bertrand Arthur William Russell
1878-1973     René Maurice Fréchet
1901-1976     Werner Karl Heisenberg
1921-1977     Alfred Schild
1888-1977     Paul Isaak Bernays
1906-1978     Kurt Friedrich Gödel
1909-1978     Eduard Ludwig Stiefel
1906-1979     Charles Ehresmann
1896-1980     Kazimierz (Casimir) Kuratowski
1901-1983     Alfred Tarski
1918-1988     Richard Phillips Feynman
1906-1993     Андрей Николаевич Тихонов
1927-1999     Robert Laurence Mills

1916: Riemannian and affine connections. 

Sequent calculus. Mathematical logic. Consistency of the 
integers. 

Banach spaces 

1901-1903: Velocity-dependent electron mass experiments. 
1900: Originator of quantum mechanics 


1910-13: Principia Mathematica. 

geometric measure theory 

1901: exterior derivative; 1923-25: defined general connections. 
1908: set theory axioms 


1916: general relativity. 
1916: Riemannian and affine connections. 


measure and integration 


Mathematical logic. “Polish notation". 
Radon measures. 


1898: Schróder-Bernstein theorem. 

1922: early version of Bernays-GOdel set theory. Ordinal 
numbers using successor sets. 

gave a strict axiomatic basis for projective geometry 

1908: With Einstein, opposed Minkowski's spacetime. 
quantum mechanics in the 1910s and 1920s 

1922: Added replacement axiom to Zermelo's set theory. 


1965: Textbook presenting tabular natural deduction system. 
1908: “The unreliability of the principles of logic”. Intuitionism. 


quantum mechanics in the 1920s 
1910-13: Principia Mathematica; Russell’s paradox. 


1906: topological compactness 

quantum mechanics in the 1920s 

1970: Schild’s ladder. 

Set theory. 

set theory 

1936: introduced fibre bundles as a distinct concept 
Ehresmann connection. 

Represented ordered pair (a,b) as {{a}, {a,b}}. Finite set 
definition. Topology. 

Finite sets. Logic. Model theory. 

Quantum electrodynamics. 

Andrei Nikolaevich Tikhonov. Compactness theorem. (Also 
sometimes transliterated as “Tychonoff” .) 

1954: Yang-Mills theory 


77.1.8 REMARK: Twenty-first century.

dates  name  contribution
1915-2001  Frederick (Fred) Hoyle  Astrophysics. Cosmology.
1915-2002  Laurent-Moise Schwartz  theory of distributions.
1934-2007  Paul Joseph Cohen  1963: Independence of AC and GCH from ZF set theory axioms.
1922-2014  Patrick Colonel Suppes  1957: Textbook presenting tabular natural deduction system.
1928-2016  Solomon Feferman  1964: Proved non-existence of real-number well-orderings.
1921-2018  Jean-Louis Koszul  The Koszul connection in terms of vector fields.
1922-  Yang Chen-Ning  1954: Yang-Mills theory. (Yáng Zhèn-Níng.)
1966-  Григорий Яковлевич Перельман  Grigori Yakovlevich Perelman. 2002: Proof of Poincaré conjecture.


77.2. Etymology of affine spaces 


77.2.1 REMARK: The perplexing mathematical term “affine”. 

Section 77.2 deals with the historical origin of the perplexing word “affine” in expressions such as “affine 
spaces” and “affine transformations”. It seems that the term was introduced by Euler in a slightly erroneous 
fashion in 1748, and was later defined in the modern sense by Möbius in 1827. 


77.2.2 REMARK: Some dictionary definitions of the word “affine”. 

The word “affine” is absent from some English dictionaries. The word "affin/affine" meant "similar" in 
French from the 12th to 16th centuries and then disappeared, but reappeared in the mid-19th century [483]. 
Some definitions from various dictionaries are summarised in Table 77.2.1. 


language  dictionary  word  definition
Latin  White [485]  affinis  bordering upon, adjacent to, allied, kindred; a connection or relation by marriage.
English  Oxford shorter [482]  affin/e  1509. A relation by marriage; a connection; closely related.
French  Petit Robert [483]  affin/e  which conserves invariant, by linear relations, transformations in the plane or in space
German  Wahrig [484]  affin/e  parallel-related (from Latin "affinis": adjacent, adjoining)
German  Duden [474]  affin  produced by parallel projection of one plane onto a second
Italian  Sansoni [479]  affine  similar, allied, kindred, alike
Spanish  Cassell [473]  afín  contiguous, adjacent; allied, related, similar

Table 77.2.1 Multilingual meanings for “affine” cognates 


77.2.3 REMARK: Euler’s introduction of the mathematical term “affine”. 

The geometrical term “affine” was introduced by Leonhard Euler in 1748 in his book Introductio in analysin 
infinitorum [217], volume 2, chapter 18, section 442. This is quoted by August Ferdinand Möbius in 1827 
in his book Der barycentrische Calcul [226], pages 194-195, as his source for this term. So apparently Euler 
was the first mathematician to use the word "affine" for linear transformations, and Möbius was the second. 
But the truth is more complex than this. 


Euler defined two figures to be affine if they could be oriented and translated so that one could be obtained 
from the other by a scaling in $\mathbb{R}^2$ such as $(x,y) \mapsto (ax, by)$ (whereas a similarity transformation has the form 
$(x,y) \mapsto (ax, ay)$, of course). Euler’s thinking on this seems to have been rather woolly. This relation is 
not an equivalence relation since it is not transitive. (For instance a square can be deformed to a rectangle 
through one axis, or to a diamond through an axis at 45° to this, but there is no single two-scale scaling 
which can transform the diamond into the rectangle.) Therefore the set of such transformations is not 
closed under composition and so does not form a group. Euler's use of this non-transitive relation is quite understandable since geometric 
thinking in terms of transformation groups, equivalence relations and invariants did not really take off until 
the 19th century. 
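
The square, rectangle and diamond example can be checked numerically. The following Python sketch is only an illustration, under the assumption that a "two-scale scaling" means a map of the form R D R^-1, with D diagonal and R a rotation, so that the same pair of orthogonal axes is used in both figures. Every such map has a symmetric matrix. The helper names (matmul, rotation, axis_scaling) and the scale factors 2 and 1 are hypothetical choices made for this sketch, not taken from Euler or Möbius.

```python
import math

def matmul(p, q):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation(theta):
    """Rotation of the plane by the angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def axis_scaling(a, b, theta=0.0):
    """Scale by a and b along orthogonal axes rotated by theta: R D R^-1."""
    diagonal = [[a, 0.0], [0.0, b]]
    return matmul(matmul(rotation(theta), diagonal), rotation(-theta))

# Square -> rectangle: stretch by 2 along the first coordinate axis.
square_to_rectangle = axis_scaling(2.0, 1.0)
# Diamond -> square: undo a stretch by 2 along the axis at 45 degrees.
diamond_to_square = axis_scaling(0.5, 1.0, math.pi / 4)
# Diamond -> rectangle: compose the two maps.
diamond_to_rectangle = matmul(square_to_rectangle, diamond_to_square)

# A scaling along any single pair of orthogonal axes has a symmetric matrix,
# so a non-zero asymmetry rules out every scaling of that kind.
asymmetry = diamond_to_rectangle[0][1] - diamond_to_rectangle[1][0]
print(diamond_to_rectangle)
print("asymmetry:", asymmetry)
```

Under the stated assumption, the composed matrix is not symmetric, so the map from the diamond to the rectangle is not a scaling along any one pair of orthogonal axes.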
77.2.4 REMARK: Möbius noticed the error in Euler’s concept of affinity of figures. 

Möbius noted that Euler claimed that for any two affine-related figures in the plane, there must always be 
a pair of axes for which by scaling the figure differently in those two directions the two figures could be 
matched. (See Möbius [226], page 195.) In other words, Euler effectively implied that an affine relation 
between two figures could be expressed as a translation combined with a transformation matrix such as 

$$\begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix},$$

for some $a, b, \theta_1, \theta_2 \in \mathbb{R}$. The composition of such matrices yields matrices with non-zero off-diagonal 
entries. The non-closure of Euler’s transformations implies that the relation is not transitive, and therefore 
his relation is not an equivalence relation as he probably had assumed it would be. 


77.2.5 REMARK: The relevance of affine transformations to centre-of-gravity calculations. 

Between Euler in 1748 and Möbius in 1827, there seems to have been little interest in affine spaces. Although 
Euler apparently coined the word, Möbius was probably the real inventor of affine transformations as a 
subject for study. His book on barycentric calculus contained a chapter on congruences and similarities. 
(See Möbius [226], pages 191-212.) 


The central concerns in geometry between the times of Desargues and Möbius were clearly metric-invariant, 
conformal-invariant and projective geometries, and somehow there was no motivation to consider 
affine-invariant geometry as a special topic. But Möbius found applications to the engineering problem of 
determining the centre of mass of a structure. The centre of mass is preserved under affine transformations but 
not under projections. In fact, the affine transformations make up precisely the group under which convex 
combinations such as the centre of gravity are invariant. 
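
The statement about the centre of mass can be illustrated with a small Python sketch. It assumes four equal point masses in the plane, and the particular maps affine_map and projective_map are hypothetical examples chosen for this sketch only. An affine map sends the centroid of the points to the centroid of the image points, whereas a projective map with a non-constant denominator in general does not.

```python
def centroid(points):
    """Centre of mass of equal point masses in the plane."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def affine_map(p):
    """Hypothetical affine map: invertible linear part plus a translation."""
    x, y = p
    return (2.0 * x + 1.0 * y + 3.0, -1.0 * x + 1.0 * y - 2.0)

def projective_map(p):
    """Hypothetical projective map with a non-constant denominator."""
    x, y = p
    w = 1.0 + 0.5 * x
    return (x / w, y / w)

def gap(f, points):
    """Largest coordinate difference between f(centroid) and centroid of f(points)."""
    image_of_centroid = f(centroid(points))
    centroid_of_image = centroid([f(p) for p in points])
    return max(abs(u - v) for u, v in zip(image_of_centroid, centroid_of_image))

points = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
affine_gap = gap(affine_map, points)
projective_gap = gap(projective_map, points)
print("affine gap:", affine_gap)
print("projective gap:", projective_gap)
```

The affine gap vanishes up to rounding, while the projective gap is clearly non-zero for this choice of points.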


77.2.6 REMARK: Unification of geometrical transformation groups within the Erlanger Programm. 
Within a few decades, the projective, affine, conformal and metric transformation groups were systematised 
within the framework of the so-called Erlanger Programm (named after Erlangen University, where Felix 
Klein proposed the program in 1872). The idea of the Erlanger Programm was to study a wide range of 
geometries, each specified as the set of properties and relations of special subsets (the “figures”) of a given 
set X which are invariant under a group G of transformations of X which define a generalised notion of 
congruence. In the case of affine spaces, X is a linear space and G is the set of all affine transformations, that is, the 
group of all combinations of translations and invertible linear transformations. It seems that this sort of 
meta-geometrical point of view did not originate in the Erlanger Programm but was rather merely systematised 
in Klein’s proposal. 


77.2.7 REMARK: Euler's original text concerning affinity. 

On reading Euler's original text on the “affinity” relation, it becomes clear that he was not much interested 
in its significance for geometry. He was interested in the graphs of parametrised families of algebraic 
functions rather than in the geometry of those graphs as geometrical objects. Here is paragraph 442 of Euler's 
Introductio in analysin infinitorum [217], volume 2, chapter 18. 


442. Quemadmodum in curvis similibus abscissae et applicatae homologae in eadem ratione sive 
augentur sive diminuuntur, ita, si abscissae aliam sequantur rationem, aliam vero applicatae, curvae 
non amplius orientur similes. Verum tamen, quia curvae hoc modo ortae inter se quandam 
affinitatem tenent, has curvas affines vocabimus; complectitur ergo affinitas sub se similitudinem 
tanquam speciem, quippe curvae affines in similes abeunt, si ambae illae rationes, quas abscissae 
et applicatae seorsim sequuntur, evadant aequales. Ex curva ergo quacunque data AMB innumerabiles 
curvae affines (Fig. 88 et 89) amb reperientur hoc modo: sumatur abscissa ap, ita ut 
sit AP : ap = 1 : m; harum rationum 1 : m et 1 : n vel alterutram vel utramque, innumerabiles 
prodibunt curvae, quae primae AMB erunt affines. 


This may be translated as follows. 


442. Just as the corresponding abscissae and ordinates in similar curves are augmented or diminished 
in the same ratio, so, if the abscissae follow one ratio and the ordinates follow a different 
ratio, the curves are no longer similar. Nevertheless, because curves arising in this way have a 
certain affinity to each other, we will call these curves affine; affinity therefore encompasses the 
similarity idea, so to speak; in fact, affine curves change to similar curves if both of those ratios, 
which the abscissae and ordinates separately follow, happen to be equal. Therefore from any given 
curve AMB, countless affine curves amb (Fig. 88 and 89) are found in this way: the abscissa ap is 
chosen so that AP : ap = 1 : m; then the ordinate pm is determined so that PM : pm = 1 : n; and 
thus by changing these ratios 1 : m and 1 : n, either both or one at a time, countless curves will be 
produced which are affine to the first AMB. 


Euler’s figures 88 and 89 are illustrated in Figure 77.2.2. The X-axis is vertical and the Y-axis is oriented to 
the left. The lines PM and pm represent the ordinates, whereas AP and ap represent the abscissae. 


[Figure: Fig. 88 and Fig. 89, with labelled points A, B, C, M, P and a, b, m, p.]

Figure 77.2.2 Euler's figures 88 and 89 in the Introductio 


The editor, Andreas Speiser [218], of the complete works of Euler makes this comment regarding chapter 18. 


Kapitel 18 handelt von ähnlichen und affinen Kurven, ersteres mit der Substitution x = mu, 
y = mv, letzteres mit der allgemeineren x = mu, y = nv. Der Ausdruck “affin” ist wohl hier von 
Euler eingeführt worden. 


This may be translated as follows. 


Chapter 18 deals with similar and affine curves, the former with the substitution x = mu, y = mv, 
the latter with the more general x = mu, y = nv. The expression “affine” was probably introduced 
here by Euler. 


In other words, this passage appears to be the origin of the term “affine” in this geometrical sense. But 
it seems that Euler did not do much with it. The subject seems not to have taken off until Möbius took 
it up and used the same term that Euler had introduced. Here is the relevant comment by Möbius [226], 
section 147, page 195, just after quoting Euler's paragraph 442. 


Der von Euler hier aufgestellte Begriff der Affinitas ist also ganz mit dem vorhin entwickelten 
einerlei, und ich will daher gleichfalls diese allgemeinere Verwandtschaft Affinität, und Figuren, 
zwischen denen sie statt findet, affine Figuren nennen. 


Translated into English, this is as follows. 


The concept of affinity proposed here by Euler is thus entirely the same as that which was developed 
above, and likewise, I want to call this more general relation affinity, and call figures between which 
this relation exists affine figures. 


Möbius then points out the limitations and errors in Euler’s discussion of affine transformations. 

In paragraph 443, Euler discusses how to substitute scaled variables in place of the x and y variables in a 
given equation. But in paragraph 444, he talks about the supposed distinction between similar and affine 
curve relationships. 


444. Discrimen autem inter curvas similes et affines hoc potissimum est notandum, quod curvae, 
quae sunt similes respectu unius axis vel puncti fixi, eaedem similes sint futurae respectu aliorum 
quorumvis axium seu punctorum homologorum. Curvae autem, quae tantum sunt affines, tales 
tantum sunt respectu eorum axium, ad quos referentur, neque pro lubitu alii axes seu puncta 
homologa in ipsis dantur, ad quae affinitas referri possit. [...] 


This may be translated as follows. 


444. Now the most powerful distinction to be noted between similar and affine curves is that curves 
which are similar in respect of a single axis or fixed point are going to be similar with respect to 
any other axes or corresponding points. On the other hand, curves which are only affine, are such 
only in respect of the axes to which they are referred, and other axes or corresponding points are 
not given in themselves arbitrarily, to which the affinity may be referred. [...] 


Euler says here that the axes to be used for a similarity transformation are arbitrary, which is true, but that 
the axes for an affinity transformation cannot be chosen arbitrarily. Möbius states that in fact the axes 
for an affinity transformation are not only of arbitrary orientation but are also not necessarily orthogonal, 
although he continues to use two scale factors m and n for the scalings in the two axial directions. Thus 
Möbius does not resort to linear combinations of coordinates. He still uses a diagonal matrix effectively, but 
axes are chosen at different angles in each of the two figures which are supposed to be affine. Clearly Euler 
thought only in terms of orthogonal coordinates in both figures, with no off-diagonal components in the 
transformation matrix. Oddly, though, in paragraphs 452-454, Euler discusses orthogonal transformations 
using sines and cosines of rotation angles. But he didn't say anything about making both the diagonal 
components different and also the off-diagonal components non-zero at the same time. This just shows that 
Euler was not thinking at all in terms of what we think of as affine transformations today. But in terms of 
etymology, there seems little doubt that Euler was the originator of the term “affine” for a concept which 
directly developed into the transformation group that we know today. 


77.3. History of relativity 


77.3.1 REMARK: The controversial attribution of the relativity theories. 

The history of relativity is a controversial subject with a substantial literature. Even a brief reading of the 
literature shows that Einstein was only one of the dozen or so mathematical physicists and mathematicians 
who built up the modern theories of Minkowski-space electromagnetic theory (special relativity) and pseudo- 
Riemannian-space gravity theory (general relativity). 


The history of special and general relativity may be compared with the history of differential and integral 
calculus. The relative credit to be given to Leibniz and Newton was very controversial. But the true answer is 
that really neither of them was the inventor of the calculus. The invention of the calculus must be attributed 
to Pythagoras, Zeno, Eudoxus, Aristotle, Euclid, Archimedes, Oresme, Viète, Stevin, Cavalieri, Torricelli, 
Kepler, Galileo, Descartes, Fermat, Wallis, Barrow, Leibniz, Newton, Maclaurin, Euler, Lagrange, Lacroix, 
Bolzano, d’Alembert, Cauchy, Weierstrass, Cantor and Dedekind, roughly in that order. The contributions 
of Leibniz and Newton were entirely unsatisfactory because they were based on logically self-contradictory 
concepts of infinitesimals and differentials. Most of the discoveries of Leibniz and Newton were previously 
known, although they did contribute useful general frameworks for practical applications. 


In the case of the relativity theories also, Einstein’s theories drew upon the work of many others, such as 
Riemann, Maxwell, Mach, Lange, Voigt, Fitzgerald, Lorentz, Poincaré and Ricci. Einstein’s theories also 
required much elaboration by later authors to bring them to the modern form. 


77.3.2 REMARK: Chronology of Lorentz transformations and relativity. 

Since pseudo-Riemannian geometry is largely motivated by applications to Lorentz transformations and 
relativistic physics, the history of developments in this subject is of some interest. This history is very 
briefly outlined in Table 77.3.1. 


date  name  contribution
1854  Riemann  The founding document of Riemannian geometry. [230]
1869  Christoffel  Christoffel arrays to tensorise covariant derivatives. [175]
1873  Maxwell  Fully developed form of Maxwell’s electromagnetism equations.
1881  Michelson  First attempt by Michelson to measure Earth’s velocity relative to the aether.
1883  Mach  Proposed abandonment of absolute time and space in favour of relative time and space. English translation: [464].
1885  Lange  Introduced inertial frames of reference. See [327] page 273.
1887  Michelson, Morley  More accurate attempt by Michelson and Morley to measure Earth’s velocity relative to the aether. [330]
1887  Voigt  Introduced Lorentz transformations in a paper about the Doppler effect. [336]
1889  Fitzgerald  Fitzgerald contraction to explain Michelson-Morley experiments. [325]
1892  Lorentz  Explanation of Michelson-Morley experiment null result in terms of Lorentz transformations for a fixed aether.
1895  Lorentz  Contraction law for moving objects. [286] See also English-translation excerpt in [263] pages 1-7.
1900  Poincaré  Gave the formula $m = E/c^2$ for electromagnetic energy, and a principle of relative motion: “principe du mouvement relatif”. [332]
1900  Ricci-Curbastro, Levi-Civita  Absolute differential calculus. Superscripts/subscripts corresponding to contravariant and covariant tensors. [194] See also [222] pages 479-559.
1901  Kaufmann  Measurements of relativistic mass of electrons, 1901-1903.
1904  Poincaré  Used the term “relativity theory” for the invariance of Maxwell’s equations with respect to Lorentz transformations. [333]
1904  Lorentz  Relativity theorem. Proved that Maxwell’s equations are invariant under Lorentz transformations. [328] Also in [263] pages 11-34.
1905  Einstein  Proposed fixed speed of light as an axiomatic principle. [263] pages 35-65.
1905  Einstein  Formula $m = E/c^2$. [319] English translation: [263] pages 67-71.
1908  Minkowski  Minkowski space-time geometrical formulation of relativity. [331]
1908  Einstein/Laub  Rejection of Minkowski space-time by Einstein and Laub. [324]
1915  Einstein  General relativity. [321, 322]
1916  Hilbert  Grundlagen der Physik. [326]

Table 77.3.1 Overview of history of Lorentz transformations and relativity 


77.3.3 REMARK: Some developments leading up to special relativity. 

Voigt noticed in 1887 the invariance of the wave equation $-c^{-2}\partial_t^2\phi + \sum_{i=1}^3 \partial_i^2\phi = 0$ with respect to Lorentz 
transformations. Lorentz and Poincaré noticed in 1904 and 1905 that Maxwell’s equations are invariant under 
Lorentz transformations. The quadratic pseudo-distance form $-c^2t^2 + |x|^2$ for $t \in \mathbb{R}$ and $x \in \mathbb{R}^3$ is also 
invariant under Lorentz transformations. The essence of the relativity idea is that the speed of light is the 
same for all inertial frames, which closely corresponds to the invariance of the wave equation. Consequently 
the relativistic explanation of the Michelson-Morley experiment is closely linked to the relativity theory. 
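
The invariance of the quadratic pseudo-distance form can be checked numerically. The following Python sketch is an illustration only. It assumes a single space dimension, units in which c = 1, and the standard boost formulas t' = γ(t − vx/c²), x' = γ(x − vt). The function names boost and pseudo_distance, and the sample event and velocity, are hypothetical choices for this sketch.

```python
import math

C = 1.0  # speed of light, in units chosen so that c = 1 (an assumption)

def boost(t, x, v):
    """Lorentz boost with velocity v along the single space axis x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (t - v * x / C ** 2), gamma * (x - v * t)

def pseudo_distance(t, x):
    """Quadratic form -c^2 t^2 + x^2, restricted to one space dimension."""
    return -(C * t) ** 2 + x ** 2

event = (2.0, 1.0)
boosted = boost(*event, v=0.6)
# A point on a light ray x = c t stays on a light ray x' = c t'.
light_ray = boost(1.0, 1.0, v=0.6)

print(pseudo_distance(*event), pseudo_distance(*boosted))
print(light_ray)
```

The two printed values of the quadratic form agree up to rounding, and the boosted light-ray event still satisfies x' = c t', which is the sense in which the speed of light is the same in both frames.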


The 1904 September 24 lecture delivered by Poincaré [333], page 306, is particularly interesting for this 
paragraph, which was one of a list of possible principles by which one could understand physics without 
needing to know the detailed mechanisms. (Poincaré gave the conservation of energy as an example of such 
a principle, which allows one to make deductions about systems in the absence of detailed knowledge of their 
inner mechanisms.) 


Le principe de la relativité, d’après lequel les lois des phénomènes physiques doivent être les mêmes, 
soit pour un observateur fixe, soit pour un observateur entraîné dans un mouvement de translation 
uniforme; de sorte que nous n'avons et ne pouvons avoir aucun moyen de discerner si nous sommes, 
oui ou non, emportés dans un pareil mouvement. 


This may be translated into English as follows. 


The principle of relativity, according to which the laws of physical phenomena must be the same, 
be it for a fixed observer or for an observer carried along in a uniform motion of translation; so that 
we do not and cannot have any means of discerning whether we are carried along in such a motion 
or not. 


This suggests that the relativity principle for inertial reference frames, which has been generally attributed 
to Einstein’s 1905 paper, was already under active consideration by others before 1905. This is indicated 
even more clearly on page 311 of the same lecture [333]. 


L’idée la plus ingénieuse a été celle du temps local. Imaginons deux observateurs qui veulent 
régler leurs montres par des signaux optiques; ils échangent des signaux, mais, comme ils savent 
que la transmission de la lumière n’est pas instantanée, ils prennent soin de les croiser. Quand la 
station B aperçoit le signal de la station A, son horloge ne doit pas marquer la même heure que celle 
de la station A au moment de l'émission du signal, mais cette heure augmentée d'une constante 
représentant la durée de la transmission. Supposons, par exemple, que la station A envoie son signal 
quand son horloge marque l'heure zéro, et que la station B l’aperçoive quand son horloge marque 
l'heure t. Les horloges sont réglées si le retard égal à t représente la durée de la transmission, et pour 
le vérifier la station B expédie à son tour un signal quand son horloge marque zéro, la station A 
doit alors l'apercevoir quand son horloge marque t. Les montres sont alors réglées. 


Et, en effet, elles marquent la même heure au même instant physique, mais à une condition, c'est 
que les deux stations soient fixes. Dans le cas contraire, la durée de la transmission ne sera pas la 
même dans les deux sens, puisque la station A, par exemple, marche au devant de la perturbation 
optique émanée de B, tandis que la station B fuit devant la perturbation émanée de A. Les montres 
réglées de la sorte ne marqueront donc pas le temps vrai, elles marqueront ce qu'on peut appeler 
le temps local, de sorte que l'une d'elles retardera sur l'autre. Peu importe, puisque nous n'avons 
aucun moyen de nous en apercevoir. Tous les phénomènes qui se produiront en A, par exemple, 
seront en retard, mais tous le seront également, et l'observateur ne s'en apercevra pas puisque sa 
montre retarde; ainsi, comme le veut le principe de relativité, il n'aura aucun moyen de savoir s'il 
est en repos ou en mouvement absolu. 


This may be translated into English as follows. 


The most ingenious idea was that of local time. Let us imagine two observers who wish to adjust 
their watches by optical signals; they exchange signals, but, as they know that the transmission 
of light is not instantaneous, they take care to cross them. When station B perceives the signal 
from station A, its clock must not mark the same time as that of station A at the moment of 
emission of the signal, but rather this time augmented by a constant representing the duration of 
the transmission. Let us suppose, for example, that station A sends its signal when its clock marks 
time zero, and that station B perceives it when its clock marks time t. The clocks are adjusted 
if the delay equal to t represents the duration of the transmission, and to verify this, station B 
transmits in its turn a signal when its clock marks zero. Then station A must perceive it when its 
clock marks t. The watches are then adjusted. 


And in fact they mark the same time at the same physical instant, but on the one condition, namely 
that the two stations are fixed. Otherwise the duration of the transmission will not be the same 
in the two directions, since station A, for example, moves ahead of the optical perturbation which 
emanated from B, while station B flees in front of the perturbation which emanated from A. The 
watches adjusted in this way will therefore not mark the true time; they will mark what one may 
call the local time, so that one of them will be slower than the other. It matters little, since we 
have no means of perceiving it. All phenomena which occur at A, for example, will be late, but 
they will all be equally so, and the observer will not perceive this because his watch is slow; thus, 
as the principle of relativity wants, he will have no means of knowing if he is at rest or in absolute 
motion. 


If one did not know that this had been written by Poincaré in 1904, one could easily imagine that it had 
been written by Einstein in 1905. 
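
Poincaré's synchronisation procedure can be sketched numerically. The following Python fragment is an illustration under stated assumptions, not a reconstruction of any calculation in the lecture. It assumes that both stations move with speed v in the direction from A to B relative to the frame in which light has speed c, that clock A shows the "true time" of that frame, and that clock B shows true time plus an unknown offset. The function name local_time_offset and the sample values are hypothetical.

```python
def local_time_offset(distance, v, c=1.0):
    """Offset of clock B relative to clock A after the signal exchange.

    Assumes 0 <= v < c and motion of both stations from A towards B.
    Returns (offset, reading), where reading is the common value t shown
    by each clock when it receives the other station's signal.
    """
    forward = distance / (c - v)   # A -> B: station B flees from the signal
    backward = distance / (c + v)  # B -> A: station A moves to meet the signal
    # A sends at clock reading 0; B receives at true time `forward`,
    # when clock B reads forward + offset.
    # B sends at clock reading 0, that is at true time -offset; A receives
    # at true time backward - offset, which is also the reading of clock A.
    # The clocks are "adjusted" when both readings are equal:
    #   forward + offset == backward - offset
    offset = (backward - forward) / 2.0
    reading = forward + offset
    return offset, reading

offset, reading = local_time_offset(distance=1.0, v=0.5)
print("offset of clock B:", offset)
print("common reading t:", reading)
```

Under these assumptions the offset equals -v·distance/(c² - v²), which is approximately -v·distance/c² for small v. This is the first-order "local time" correction, and it vanishes when the stations are at rest.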


The geometrisation of relativity is due to Minkowski. Einstein and Laub rejected Minkowski’s geometrical 
approach in a paper in April 1908. 

Peskin/Schroeder [298], page xvi, wrote the following about the attribution of ideas in physics. 


A principle of physics usually has a name that has been assigned according to the community's 
consensus on who deserves credit for its development. Usually the real credit is only partial, and 
the true historical development is quite complex. But the clear assignment of names is essential if 
physicists are to communicate with one another. 


77.4. Epilogue 


((2018-9-7. Section 77.4 is just a sketch. It will be fully rewritten when the book is finished.)) 


77.4.1 REMARK: Successes. 


This book certainly has many failures, and hopefully some successes, although the reader will be the better 
judge of this. There is no sharp dividing line between successes and failures. But the following items are 
currently considered by the author to be successes. This list is roughly in the same order as the topics in 
this book. 


(1) 


(7) 
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Logicisation of differential geometry. The most urgent and important task of the author at the 
commencement of writing this book was to remove geometric intuition as a necessity for proofs and 
definitions in differential geometry. This could be called the “mathematicisation of differential geometry” 
by analogy with Bertrand Russell’s “logicisation of mathematics”. (See Russell [388], pages v-ix, 3-9; 
Russell [389], pages 6-7; Kórner [461], pages 49-51; Wilder [403], pages 219-220, 243-244; Eves [353]; 
pages 267-268.) Although there are still many gaps which need to be filled, it seems that this objective 
has been achieved. It has been shown that differential geometry is a branch of pure mathematics, not 
a branch of experimental geometry or physics. Differential geometry can be learned in the axiomatic 
manner from books instead of acquiring folk wisdom and customs by apprenticeship. Intuition helps 
discovery, but is not a valid substitute for deduction. 


Logic chapters. The logic chapters are, in the author's opinion, the best part of this book. The natural 
deduction predicate calculus in Definition 6.3.9 has all of the desired qualities, such as ease of use and 
reliability, for which it was designed, as evidenced by the proofs of theorems in Sections 6.6 and 6.7, 
and also in Remark 7.3.10. All of the proofs in this book could be formalised within this logic system 
without undue effort, and in fact most proofs have been consciously written so as to facilitate this. 


Ubiquitous logical quantifiers. The ubiquitous application of universal and existential quantifiers in 
this book must surely be counted a success. Most of the mathematics literature, and the vast majority of 
the mathematical and theoretical physics literature, is sorely lacking these quantifiers. It was discovered 
during the medieval and early renaissance era that the use of symbolic notations enormously accelerated 
the progress of mathematics, but even in the 21st century, most of the logic in the mathematical literature 
is expressed in natural language which is confusing, imprecise and unreliable. In this book, all quantifiers 
are explicit, and their order is explicit. T'his removes all doubt as to what precisely is being asserted. 
Predicate calculus also has the huge advantage that it is an international language, a lingua franca! 


Axiom of choice. The principal success of the set theory chapters is the rejection of the axiom of 
choice, whose broad acceptance, in the author's opinion, will be seen by future historians as the greatest 
blunder of twentieth century mathematics. At present the AC-scepticism heresy is rarely expressed in 
the mathematics community, but the majority vote is less important than the truth. Most readers will 
currently regard the rejection of the axiom of choice as a naive provocation, but hopefully over time 
the advantages of providing real proofs and constructions instead of faith-based invocations will become 
clear to the majority. Probably not in this author's lifetime, but the dénouement is inevitable! 


Just as children eventually discover that presents don't come from Father Xmas, and the painted eggs 
in the garden are not put there by the Easter Bunny, and the sixpence under the pillow is not put 
there by the Tooth Fairy, so also will mathematicians some day realise that choice functions do not 
magically appear when the mantra “axiom of choice” is chanted. Real mathematicians prove assertions. 
They don't rely on esoteric mystical incantations when the going gets tough. Some time in the next 
50 to 100 years, hopefully most mathematicians will see that the axiom of choice has no clothes. (See 
Andersen [488].) The Faustian pact will unravel in the fullness of time. 


Finite ordinal numbers. The logical formula $\forall m \in N,\ (m = \emptyset \text{ or } \exists a \in N,\ m = a \cup \{a\})$ defines 
ordinal numbers $N \in \omega^+$. (See Definition 12.1.3.) This reversal of the usual formulas is possibly new. 
Equivalence to the customary definitions relies heavily on the ZF regularity axiom. 
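
The condition stated above (every element of N is either empty or is the successor a ∪ {a} of some element a of N) can be tested mechanically on small hereditarily finite sets. The following is a minimal sketch under stated assumptions, not the book's own construction: sets are modelled as nested `frozenset` objects, which are automatically well-founded, and the function names `successor`, `von_neumann` and `satisfies_formula` are hypothetical. It covers finite sets only, so it says nothing about the infinite ordinal which also satisfies the formula.

```python
def successor(a):
    """Return a ∪ {a} for a hereditarily finite set a."""
    return a | frozenset([a])


def von_neumann(n):
    """Return the finite ordinal n = {0, 1, ..., n-1} as nested frozensets."""
    current = frozenset()
    for _ in range(n):
        current = successor(current)
    return current


def satisfies_formula(N):
    """Check: for all m in N, (m is empty or there exists a in N with m = a ∪ {a})."""
    empty = frozenset()
    return all(m == empty or any(m == successor(a) for a in N) for m in N)


three = von_neumann(3)                                   # {0, 1, 2}
not_ordinal = frozenset([frozenset(), von_neumann(2)])   # {0, 2}: the element 1 is missing

print(satisfies_formula(three))        # True
print(satisfies_formula(not_ordinal))  # False
```

The set {0, 2} fails because its element 2 is not the successor of any element of the set, which is how the formula forces N to contain every predecessor of each of its elements.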

Infinite ordinal numbers. General ordinal numbers are defined in terms of set-inclusion well-ordering 
here instead of the customary set-membership well-ordering. (See Definition 12.5.7.) Equivalence to the 
definitions seen in typical textbooks relies heavily on the ZF regularity axiom. 


Infinite ordinal number cardinality. Georg Cantor's great reputation seems to be the cause of more 
than a hundred years of failed attempts to make the infinite ordinal numbers act as effective cardinality 
measuring-sticks. The irrationality of continuing with this failed cardinality system has been highlighted 
here. Existence of the required ordinal numbers requires the axiom of choice, but even if they exist, 
there is nothing useful that can be said about cardinality anyway. Hopefully sufficient ridicule heaped 
on this extremely onerous and useless cardinality system will help trigger its demise. 


Fibre bundle formalism. While filling in a hundred gaps in the differential geometry literature, it 
emerged that tangent bundles do not fit tidily into the abstract fibre bundle framework. The most 
prominent consequences of the discordances between abstract and concrete fibre bundles are the “drop 
function” in Definition 54.9.5 and Sections 59.2 and 64.6, the “swap function” in Section 59.6, and 
the differential form “short-cuts” described in Section 57.6. These are mostly handled in textbooks by 
temporarily abandoning the high abstractions of fibre bundles in favour of concrete computations with 
coordinates whenever the formalism fails. This book has grabbed the bull by the horns! (In the early 
years writing this book, the author had a negative attitude towards fibre bundles because they seemed 
unnecessarily abstract. Now they seem to be the core unifying concept of differential geometry.) 


Differential geometry structural layering. The layering of differential geometry structures may 
be one small success of this book. (See Section 1.1.) This approach is far preferable to the ad-hoc 
introduction of lower-layer topics as they become relevant in the middle of higher-layer topics. 


Tangent vectors represented as linearly parametrised Cartesian space lines. According to my 
logs, my “discovery” of the representation of tangent vectors as equivalence classes of chart/line pairs, 
where the lines are linearly parametrised Cartesian space lines, occurred on 2011-8-24. Implementation 
in this book showed up in the software repository in the early hours of 2011-8-27. Since that time, it 
has become increasingly certain that this is the “best of all possible tangent vector representations”, to 
paraphrase Voltaire [500, 501, 502]. It is best for both practical and philosophical purposes. 


Connection generators. The idea of defining all connections in terms of “connection generators” 
may be new. The author decided to express connection definitions in terms of infinitesimal generators 
(i.e. Lie algebra elements) on 2002 June 4. This is a stronger unifying concept than connection forms, 
although these concepts are of course related. As a result, associated connections are given a very tidy 
and obvious definition in this book, contrasted with numerous untidy definitions elsewhere. 


Conversion rules between the many styles of connection definitions. Since one of the principal 
objectives of this book was to create a “Rosetta stone” for DG definitions, it may be counted a success 
that at least in the case of connections (probably the most variable structure in the DG literature), 
a substantial proportion of the rules required to convert between the ten definition styles have been 
presented in this book. (See Section 69.15 for conversion rules between ten connection definition styles. 
See Remarks 69.15.2 and 69.15.3 for the “Rosetta stone”.) 


Abstract-before-concrete topic layering. Some authors commence with intuitively clear topics and 
then proceed to progressively higher abstractions. The modern axiomatic approach at a graduate level 
is the reverse. The mature mathematician commences with abstractions and has the patience to work 
through them until finally something recognisable emerges. Putting topics into abstract-before-concrete 
order, and general-before-specific, and define-before-use, has probably mostly succeeded here. 


77.4.2 REMARK: Failures. 
It is difficult to neatly divide the author’s own observations about this book into successes and failures. The 
following items are perhaps best described as “regrets”. 

Excessive length of book. The excessive length of this book must be considered to be a failure. 
Many physics books have a “Chapter 0” summary of differential geometry in about 10 to 50 pages 
before proceeding to the serious business. This author's attempt to do the same thing has clearly failed! 
The cause of this excessive length is the lack of prior understanding of the “size of the problem”, and 
also of the “shape” of the problem. Now that the size and shape of the problem have become clearer, it 
should be easier to do it better next time. This project has also taken too long. It should have appeared 
25 years ago. Such a long delay is clearly unacceptable! 


Perhaps one positive consequence of the length and time over-run of this book is the cautionary, deterrent 
effect. Students and other readers of mathematics texts are apt to complain that concepts are not 
adequately explained. This book shows what happens when one attempts to fill all the gaps. As the 
old saying goes, be careful what you wish for! 


Excessive informality. In many respects, this is more like a scrap-book of disconnected thoughts, 
despite the long sustained effort to “join the dots” between the numerous topics. Some of the join-up 
of distant topics has been successful, but the overall picture is not as aesthetically pleasing as it 
could be. In particular, this book lacks the elegance of Halmos’s “Naive set theory” [357] and Federer’s 
“Geometric measure theory” [69]. Ideally this book should have been like a crystal, symmetric and 
perfect, unable to be changed in any detail without losing its perfect symmetry. It is sadly very far from 
being “crystalline”. It is more like a lump of granite, containing some little crystals here and there. 


Excessive formality. By attempting to be as correct as possible, and fill as many gaps as possible, 
this book has made mathematics excessively tedious. The original objective had been to make life 
much easier for people who wish to learn differential geometry, but this book has achieved the opposite. 
A difficult-to-read subject has been made so full of formalities that it is almost unreadable. However, 
since the gaps have now been mostly filled, it should be possible to boil it all down to something which 
is in fact easy to read, but without sacrificing correctness. Some time in the future maybe! 


Numerous topic-gaps. Numerous gaps still remain in this book. Several important topics have been 
explicitly excluded from its scope in Remark 1.6.3, but many more topics and sub-topics have been 
presented incompletely, unclearly, or have been silently omitted. 


Numerous definition-gaps. An original aim for this book was to present a unified formalism which 
would combine almost all of the definitions which appear in 50 DG textbooks. This would have created 
a “Rosetta Stone” to unify all of the concepts and notations in a single formalism with conversion rules 
between them, not just between definitions of connections as in Remark 69.15.2. Probably up to half of 
all basic differential geometry definitions in the literature are still absent from this book. Remedying 
this deficit could require a year or two of further work. 


Lack of readership. This book is unlikely to find a wide readership because its principal objective 
is to find the meaning of mathematics. Students are more interested in passing exams. Teachers are 
more interested in reducing their preparation workload (particularly setting exams and exercises) while 
delivering industry-standard content in their courses. Professionals are looking for new results and 
techniques to help them write more research papers. The enthusiastic amateur is often not able to 
sustain the long, gruelling, austere discipline required for pure mathematics. Therefore not very many 
readers are expected for this book. 


No generalisation of PhD results to curved space. As mentioned in Remark 1.4.4, the original 
driving force for this book was an attempt to generalise the author’s 1984/85 PhD theorems [183, 184] 
from flat to curved space. After successfully extending these theorems in 1987 from elliptic boundary 
value problems to parabolic initial value problems [185], it seemed plausible that they could be extended 
from Cartesian spaces to differentiable manifolds. In late November 2015, it became clear that the 
author’s specific methods for demonstrating power-concavity of BVP solutions for convex regions could 
only be applied to spaces of non-negative constant sectional curvature, for which the results are already 
published. This book started as a “Chapter 0” to give enough DG definitions for the planned theorem 
generalisations. Since the expected theorems have now evaporated like the morning mist in a desert, 
the original driving force for the book shifted more towards applications to physics. 


77.4.3 REMARK: For future centuries. 


One of the author’s principal motivations for writing this book has some similarity, and some differences, to 
the motivation expressed by Michel de Montaigne [456], Volume 1, page 117, for creating his monumental 
book of essays in the late 16th century. (His book invented the literary form known as the “essay”.) 


Je l'ai voué à la commodité particulière de mes parents et amis: à ce que m'ayant perdu (ce qu'ils 
ont à faire bientôt) ils y puissent retrouver aucuns traits de mes conditions et humeurs, et que par 
ce moyen ils nourrissent plus entière et plus vive, la connaissance qu'ils ont eu de moi. 


This may be translated as follows. 


I have devoted it to the particular benefit of my relatives and friends so that, having lost me (as they 
must do soon), they may be able to rediscover in it some characteristics of my circumstances and 
temperament, and so by this means they may nourish, more fully and more alive, the acquaintance 
they have had of me. 


The present book is not written for relatives and friends, but rather for acquaintances in future centuries 
who will know the author only through his writing. Consequently this book expresses the author’s personal 
views, which do not necessarily agree on all points with the general mathematical community at this time. 
The views of the majority are already well represented in the published literature. This book is a personal 
statement, expressing the author’s stoic preference for truth whenever happiness has been the alternative. 


One human life is a mere blip in time, about a thousandth of the 75,000 year span of modern humanity on 
Earth since the Lake Toba supereruption, and a mere drop in the ocean of billions of other humans currently 
alive, and the tens of billions who did once live. A single human being has no more significance than an ant 
in a vast forest. Few people know much about their great grandparents, and very few can even name their 
great, great grandparents. Likewise, few individuals alive at present will be remembered 100 years from 
now, and far fewer will be remembered in 1000 years or 10,000 years. The invention of writing 5500 years 
ago has made possible the survival of identities and characteristics of individuals, but even if everything 
written now is physically preserved, very few individuals currently alive will come to the conscious attention 
of people 1000 years into the future. So from the perspective of future history, we are like fossils of ants in 
the primeval forests, like the images of ferns in the lumps of coal which are burned to make electricity. 


When one looks back in time more than 2000 years ago, the people who are remembered as individuals are 
those who wrote books or were mentioned in books. It is by books that people are remembered, not by 
their deeds or status in daily life. From the point of view of ancient history, anyone who was not named in 
a book did not have an individual existence. Thus the answer to the question “Do I exist?” is “No” from 
the distant future perspective, unless one survives in a book. But most books from ancient times are known 
only from brief references in books which survived, and vast numbers of books disappeared without even 
being mentioned. So to be remembered in a thousand years, one must write a book that someone will think 
is worth reading and preserving in every generation. When a book becomes extinct, it never comes back. A 
single generation of neglect condemns a book to oblivion. Thus, as with old Michel de Montaigne, who feared 
that his individual characteristics would not be remembered after his death, this book will hopefully indicate 
to future generations that its author did exist, and that he had some thoughts which were individual, not 
just paraphrases of the mainstream culture. And hopefully someone will copy it in every generation. 


77.4.4 REMARK: Mathematics is an escape from ephemeral imperfect reality to eternal perfectible reality. 
Writing a mathematics book is an escape from the very imperfect ephemeral reality which one perceives 
through the daily news into an apparently eternal reality which can approach some kind of perfection 
through deep contemplation. The mathematical ideas of the ancient Mesopotamians, Egyptians and Greeks 
are still alive and still progressing in the modern world after thousands of years, while the ephemeral world 
has been totally transformed, sometimes for the better, sometimes for the worse. 


A mathematics book is an attempt to gather together and unify the grains of knowledge accumulated by 
thousands of mathematicians over thousands of years into a meaningful integrated work which synthesises 
the grains of sand into a sand-castle which can be perceived as a single entity. 


Pure mathematics is a tough discipline, but its reward is the discovery of eternal truths which outlive 
individuals, nations and cultures. This book has been motivated by the desire to sweep out some of the 
accumulated chaos and disorder of differential geometry, to make it more mathematical so that the core ideas 
can shine through more brightly. This is in the spirit of Euclid’s “Elements”, a work which brought order to 
the chaotic sand-grains of ancient Greek geometry, inspiring many later attempts to unify various areas of 
mathematics through axiomatisation, replacing esoteric mystical initiation with deduction and consensus. 
Hopefully this book is a positive contribution towards the reunification of differential geometry, a subject 
which has developed too quickly to be comprehensively consolidated. Then some day, when this subject is 
successfully unified and perfected, newcomers will be able to digest it more easily and more effectively. 


77.4.5 REMARK: Pure mathematics versus applied mathematics. 

This book has taken a pure mathematical perspective because this is a path which leads out of chaos into 
the logically coherent order of an axiomatic framework. Pure and applied mathematics may be contrasted 
in simplistic terms as follows. 


• Pure mathematics is the contemplation of infinities. 
• Applied mathematics is the computation of finities. 


From the pure mathematics perspective, applied mathematics may seem to be the art of designing algorithms 
to find pragmatic approximations to the “true” exact solutions to problems. From the applied mathematics 
perspective, pure mathematics may seem to be the art of proving the obvious, proving for example existence, 
uniqueness, continuity, differentiability, linearity, boundedness, and other such abstract qualities which are 
intuitively clear to those who know their craft. 


A large proportion of mathematical analysis arrives at existence proofs by showing convergence of pragmatic 
approximations, and pragmatic approximations are often inspired by the constructions which are used in 
pure mathematical proofs. So pragmatic solutions are not to be sniffed at. 


Another simplistic way to distinguish the two perspectives is as follows. 


• Pure mathematics is the art of proving theorems. 
• Applied mathematics is the art of computing solutions. 


One applied mathematics professor (Ernie) told me very proudly in the tea-room in about 1980 that he had 
never published a single theorem in his whole life, although someone had attributed a theorem to him and 
had named it after him. 


Alternatively one could say the following. 


• Pure mathematics is the art of correct thinking. 
• Applied mathematics is the art of approximate thinking. 


The reader who seeks methods and algorithms for solving practical problems in differential geometry might 
not find very much which is directly valuable in this pure mathematical book. However, it is always nice 
to know that someone has checked the logical consistency of a conceptual framework, and it's also nice to 
be able to translate mathematical symbols into their meanings. Otherwise mathematics becomes an art of 
symbol manipulation, an imitative art learned by apprenticeship, as alluded to in Remark 1.4.13. 


And here's yet another simplistic comparison of pure and applied mathematics. 


• Pure mathematics is the art of sorting propositions into the “true” basket and the “false” basket. 
• Applied mathematics is the art of drawing error bars (or regions) around numbers. 


77.4.6 REMARK: Mathematical physics is a branch of mathematics. 

While attending lectures of Peter Szekeres (and other mathematical physicists) at Adelaide University in 
1973, 1974 and 1976, I must have acquired the view that physics is an axiomatic subject because that was 
how it was presented. This seemed more logical and appealing to me than the rough and ready ad-hoc 
methods of the experimental and theoretical physicists whose mathematical proofs were never convincing. 


So when writing this geometry book, it seemed that any physics it contained must necessarily be axiomatic, 
following from mathematics by writing equations for any required physical principles. Later it was quite 
shocking to realise that physicists do not in fact work from axioms to theorems and then to experimental 
verification, but rather in reverse. Relatively recently, I noticed the following paragraph in the preface of 
Szekeres [305], page ix. 


I believe that mathematical physicists put the mathematics first, while for theoretical physicists 
it is the physics which is uppermost. The latter seek out those areas of mathematics for the use 
they may be put to, while the former have a more unified view of the two disciplines. [...] In the 
big scheme of things both have their place but, as this book no doubt demonstrates, my personal 
preference is to view mathematical physics as a branch of mathematics. 


There does in fact seem to be a huge gulf between the mathematical physicists and theoretical physicists. The 
former are mathematicians who know some physics. The latter are physicists who know some mathematics. 
But this differential geometry book demonstrates this author’s view that mathematics should retain its 
axiomatic and self-consistent well-defined character even when applied to some areas of physics. In other 
words, there is no need to abandon mathematical and logical rigour whenever mathematics is applied to 
the real world. Laziness in this regard can only lead to chaos and confusion further down the track when 
sloppiness yields a symbol soup, full of contradictions and ambiguities. 


One arrives at the following division of labour. 


discipline              people 

experimental physics    physicists 
theoretical physics     physicists utilising mathematics 

mathematical physics    mathematicians axiomatising physics 
pure mathematics        mathematicians 


This book occasionally ventures into elementary mathematical physics to find applications and motivations 
for the mathematics. The physicists’ physics is out of bounds! (See Remark 1.6.3 item (11).) 


77.4.7 REMARK: Mathematics is “something to do”. 

One day in Adelaide, around about 1980, I was proceeding in a southerly direction out of Adelaide University 
to cross North Terrace. As I walked up to the traffic lights, I noticed Bert (i.e. Prof. Herbert Sydney Green, 
1920-1999) standing there waiting for the lights to change. So I took advantage of the situation to tell him 
that after dropping out of fourth year mathematical physics twice, I had done fourth year pure mathematics 
and was now working towards a Ph.D. in pure mathematics. (On one of my 4th year MP attempts, Bert 
had supervised my honours project on tachyons, a somewhat over-ambitious choice of topic!) 


I thought he would either be disappointed or impressed. But his reply was: “Well, I'm glad you found 
something to do.” Then the lights changed, and I never saw him again. 


It still perplexes me that he thought of a mathematics Ph.D. as “something to do”, as if otherwise one might 
be twiddling one's thumbs and staring at the carpet. But now I wonder if the present DG book is indeed 
just “something to do”. Maybe all of mathematics and physics is just “something to do”. It's better than 
having nothing to do, I suppose. 


77.4.8 REMARK: The accidental historical origins of this book. 

Perhaps it is self-indulgent to write about the origins of this book, given its somewhat limited impact on the 
academic community up to this time, but probably no great harm is done by writing a few words about it. 
This book is an accidental by-product of a series of career failures, not the culmination of a grand plan. 


Originally the author had hoped for a career in astronomy, making sense of the universe by staring through 
telescopes, ultimately moving to another planet or living in a spherical spaceship in splendid isolation. But 
early in his university education, he was informed by a physics tutor that there were insufficient telescopes in 
the world to employ all of the astronomy graduates. So this career path was supposedly unrealistic because 
no more astronomers were required. (Nowadays, of course, astronomers get remote access to telescopes, but 
the internet was not on the technological horizon at that time.) 


So the author turned his interest to nuclear fusion, in the hope that wars between nations could be sharply 
reduced by discovering how to obtain infinite energy from sea-water. In fourth year nuclear fusion lectures, 
it became clear that this technology would take at least thirty years to succeed, which would have been too 
close to the author’s retirement date. (And as everyone is fond of saying, nuclear fusion power generation 
still is and always will be thirty years away.) 


The author could not understand physics because the symbols did not make sense. So he studied another 
two years of mathematics to try to understand what the mathematical physicists were trying to communicate 
on their blackboards. But the mathematical physics lecture notes still did not make sense. So the author 
sought refuge in a PhD in mathematics, hoping one day to apply it to physics. After a couple of years of 
post-doctoral mathematics research, it became clear that none of it would be useful to humanity, and it was 
very hard work for such a modest and insecure income. So the author moved into software development and 
communications engineering, which were well paid and interesting, but lacked intellectual depth. 


So the author sought refuge in an old sketch of a PDE monograph that he had started writing, in the hope 
that publishing a book on this subject might get him back into academia. To make the PDE book more 
interesting, it seemed like a good idea to add at least one new research result, namely the generalisation of 
his own PDE publications to “curved space”. But this required an understanding of differential geometry. 
Therefore the PDE monograph sketch acquired a preliminary DG chapter, to assist the PDE research. But 
month by month, year by year, this preliminary chapter began to dwarf the PDE research chapters. So it 
seemed sensible to split off a DG book for a while, then return to PDE later. Thus a 160 page DG book 
sketch was created in October 2001, leaving behind a 40 page PDE monograph sketch. 


In November 2015, it became clear that the author’s favourite style of convexity maximum principle for 
proving properties of solutions of boundary and initial value problems was only applicable to Riemannian 
manifolds with constant non-negative sectional curvature, for which publications had already appeared. The 
higher-order terms which needed to be controlled could not be controlled enough to make the maximum 
principle work. That was the final nail in the coffin for the applicability to the author’s PDE research of the 
DG “preliminary chapter”, which by then had reached about 1600 pages. So the only reason to continue 
work on it was the hope that it could be useful to people who need to understand DG, for whatever reason. 


In summary, the author’s careers in astronomy, physics and partial differential equations all failed, and 
now the DG “preliminary chapter” is all that is left. It is an accidental by-product of failed careers, not 
the crowning achievement of a life-long plan. The gamble he took by starting a PDE monograph failed. 
Hopefully other people will learn from these mistakes! 


77.4.9 REMARK: The economic disadvantages of writing a serious book. 

Even more than at any previous time in history, it is now almost impossible to make an income by writing a 
serious book. The Internet has put an end to making money by extracting a small profit from each copy of 
a printed book. On the other hand, there is currently more economic “surplus” than at any previous time 
in history. A vast proportion of humanity is now in mortal danger from excessive nutrition, whereas in the 
past, the vast majority of humanity was in constant mortal danger from inadequate nutrition. Therefore 
most of humanity has leisure to philosophise and write, if one is willing to accept a standard of living which 
people a few hundred years ago considered entirely adequate. This is the context in which this book has 
been written. By forgoing an unnecessary million dollars of income, anyone can spend ten or more years 
writing and philosophising, although one’s real estate portfolio at the end of life will not be so impressive. 


The exclusion of the majority of people from the leisure to think was summarised in 1921 by a character in 
a novel by Aldous Huxley. (See Huxley [494], pages 106-107; quoted by Clark [454], page 72.) 


If you're to do anything reasonable in this world, you must have a class of people who are secure, safe 
from public opinion, safe from poverty, leisured, not compelled to waste their time in the imbecile 
routines that go by the name of Honest Work. You must have a class of which the members can 
think and, within the obvious limits, do what they please. You must have a class in which people 
who have eccentricities can indulge them and in which eccentricity in general will be tolerated and 
understood. That’s the important thing about an aristocracy. 


In this decade, probably a billion people can now “do what they please”, at least “within the obvious limits”. 


77.4.10 REMARK: All is forgiven. The intuitive approach to differential geometry is not so bad. 

As mentioned in Remark 2.2.12, Hans Lewy is reported to have written in regard to the very burdensome 
methods of Brouwer’s intuitionism: “If we have to go through so much trouble as Brouwer says, then nobody 
will want to be a mathematician any more.” After seeing this differential geometry book, some people may 
be tempted to say something similar: “If we have to go through so much trouble as Kennington says, nobody 
will want to do differential geometry any more.” That would probably not be far off the truth. 


It seems clear now that a logically self-consistent approach to differential geometry is excessively burdensome. 
If the basic concepts of manifolds with locally Cartesian charts, and fibre bundles with fibre charts, are applied 
consistently in all situations, the result is an unacceptable expansion in the volume of ideas which must be 
absorbed. Certainly the majority of students who might be initially attracted to differential geometry 
would abandon hope very soon after seeing what is required. Therefore “all is forgiven”, so to speak. It is 
now clear that the intuitive approach to differential geometry can be justified. An “axiomatically correct” 
approach is poorly suited to the needs of both students and practitioners. The iron discipline exercised in 
the development of the austere framework presented in this book has yielded a solid underpinning for the 
often informal or ambiguous definitions and assertions in the mainstream literature. But a strict axiomatic 
framework is possibly not the best way to initially learn the subject. Thus once again, it must be concluded 
that this book is a philosophical essay, not a textbook. 


77.4.11 REMARK: Future big projects. 

There is not enough time or wherewithal for me to continue this book project into further topics. If time 
and means permitted, one attractive project would be to continue this book to cover all of undergraduate 
mathematics, both pure and applied. That would require a further 20,000 to 40,000 hours of work by a highly 
cooperative team of enthusiastic mathematicians to create a single unified picture of all undergraduate 
mathematics topics. Such a work, totalling perhaps 4000 to 6000 pages, could then be handed to every 
beginning undergraduate, who could then select topics to learn. This would be better than a series of 
semi-detached single-semester courses which are only loosely connected. (This single-book plan may sound like 
the Bourbaki project, but the objective here is to provide a single eclectic work which could realistically be 
read by an undergraduate in three or four years.) 


Beyond the mathematics area, one could create a unified presentation of all undergraduate physics, chemistry, 
biology, neuroscience, geology, and other sciences. And beyond the sciences lie various humanities subjects, 
which could also be given the “unified systematic framework” treatment. If these frameworks are all joined 
into a single work, it could be a book of some 50,000 pages or so. 


An encyclopedia has the substantial disadvantage of being non-linear. Most encyclopedias are “random 
access” documents whose topics are linked in a more or less chaotic way. Random access to a million essays, 
no matter how well inter-linked and indexed, inevitably leads to confusion and disorientation. The most 
effective and satisfying learning can be achieved by reading a linear account of a subject. Even if one only dips 
into a small subset of a large linear text, one obtains at least the perspective of knowing what comes before 
and what comes after. As mentioned in Remarks 2.0.1 and 2.1.1, human knowledge has cyclic dependencies, 
and these cannot be removed. However, as the present work shows, it is possible to choose some kind of 
linear path through the network of mathematical knowledge so that forward references are minimised. The 
subject can be reorganised as a single coherent “story”, from “once upon a time” to “happily ever after”. 
Thus logic comes before set theory, which comes before numbers, and so forth. Such ordering decisions could 
be made for all of human knowledge. This could be achieved by a hundred experts working for ten years 
full-time for the benefit of human progress. Maybe some day this will happen! 


77.4.12 REMARK: Future small projects. 

Out of this book, many small books may be “spun off”. Four particular subject areas may be “boiled 
down” into smaller works without undue further expenditure of effort, namely books on logic, mathematical 
foundations (logic, set theory and number systems), differential geometry, and gauge theory. 


A minimal logic book would cover the topics listed in the table in Remark 3.0.7. (A not so minimal logic book 
has already been prepared by the author, derived from Chapters 1 to 6.) A foundations book would cover 
most of Part I of this book. The differential geometry book would cover Part IV, together with simplified, 
shortened versions of various prerequisites from Parts II and III, such as tensor algebra and fibre bundles. 
The gauge theory book would give sufficient differential geometry for its mathematicisation, together with 
derivations of various classical equations of motion for elementary particles. 


77.4.13 REMARK: The importance of saying what you really think. 

Too often, social or professional pressure prevents people from saying what they really think. But something 
which is regarded as wrong, ridiculous or absurd in one century may be regarded as a work of pure genius in 
the next. Instead of self-censoring in an attempt to win approval from peers and others, the more courageous 
path is to seek the truth and tell the truth, and then wait until the world catches up. Future centuries will 
forgive errors, but works which contain no original ideas will be forgotten. 


Some authors confuse originality with contrarianism. Anyone can assert the negative of any proposition. 
That is not originality. That is a mechanical exercise worthy of a robot. To be truly original one must seek 
truth, which will occasionally contradict mainstream views despite one’s best efforts. If an unpopular idea 
does arise in this way, the courageous path is to openly assert it. Some of the most original ideas in history 
were self-censored, only found later in private notes or correspondence, too late to be accorded priority. Even 
worse than losing priority is losing important ideas forever due to self-censorship. We will never know how 
many important ideas have been lost in this way. 


77.4.14 REMARK: The irony of a long book and short thesis. 

One of the little personal ironies of writing this book has been the contrast to the author’s conundrum when 
composing his PhD thesis in around 1983. Every week or so, the author found a significant shortening of 
the proofs of his theorems, reducing the total length to well below 100 pages, generally considered to be too 
small for a PhD thesis. Every couple of weeks, his PhD supervisor told him that he must find some new 
theorems to increase the size of the book. So the author dutifully found new applications of his theorems, 
and new extensions, which increased the size by 10 pages or more. But then the thesis shrank again when 
the author discovered yet more simplifications and short-cuts in the proofs. So the size kept going back to 
where it was. Finally the supervisor lost patience and said the thesis would need to be examined because 
he was in a hurry to retire and go skiing in Indiana with a life-long colleague. By some miracle (or mistake, 
or enlightenment), the value of the thesis content caused the examiners to overlook the tiny amount of text, 
under 100 pages, double spaced, 12-point font, with generous margins! 


Ironically, the exact reverse has happened in the case of this book. Every attempt at simplifications or 
short-cuts has led to significant expansion of the book. Just as a PhD thesis has a kind of minimum size 
criterion, a mathematics book likewise has a kind of maximum size criterion. Anything over a few hundred 
pages is unlikely to be read. At best, a book of more than 2000 pages could end up in the basement of a 
library in case someone specifically requests to see it. Such may, at best, be the fate of this attempt to give 
the world a brief summary of the basic concepts of differential geometry. 


77.4.15 REMARK: An optimistic note. 

The author’s search for truth doesn’t amount to a hill of beans in this crazy world. But some day, the penny 
will drop, the chickens will come home to roost, the sheep will be separated from the goats, the wheat will be 
separated from the chaff, the cows will come home, and there’ll be bluebirds over the white cliffs of Dover. 
Just you wait and see! 
NOTATIONS, ABBREVIATIONS, TABLES 
78.1. Notations 
notation reference meaning 
1. Introduction 
1.5.6 end of proof; quod erat demonstrandum 
X<Y 1.5.7 X is an abbreviation for Y 
Y>X 1.5.7 X is an abbreviation for Y 
2. General comments on mathematics 
3. Propositional logic 
proposition-tag “false” 
proposition-tag “true” 
the naive set {F, T}, the set of possible truth-values of propositions 
set of truth-value maps {τ : P → {F, T}} on concrete proposition domain P 
logical “not” (negation) 
logical “and” (conjunction) 
logical “or” (disjunction) 
logical implication operator (“implies”) 
logical equivalent operator (“if and only if”) 
logical reverse implication operator (“is implied by”) 
falsum; logical expression whose value is always false 
verum; logical expression whose value is always true 
alternative denial operator (NAND operator, Sheffer stroke) 
joint denial operator (NOR operator, Peirce arrow, Quine dagger) 
exclusive-or operator (XOR operator) 
dereferenced value of a logical proposition name φ 
equivalence relation between logical expressions on a proposition domain 
uniform substitution of symbol strings into a symbol string φ_0 
postfix logical expression space 
postfix logical expression interpretation map 
prefix logical expression space 
prefix logical expression interpretation map 
infix logical expression space 
infix logical expression interpretation map 
4. Propositional calculus 
⊢_X 4.3.11 unconditional assertion in an axiomatic system X 
⊢ 4.3.11 unconditional assertion in an implicit axiomatic system 
⊢_X 4.3.12 conditional assertion in an axiomatic system X 
⊢ 4.3.12 conditional assertion in an implicit axiomatic system 
⊣⊢_X 4.3.15 two-way assertion in a logical deduction system X 
⊣⊢ 4.3.15 two-way assertion in an implicit logical deduction system 
5. Predicate logic 
⊥ 5.1.10 falsum; logical predicate whose value is always false 
⊤ 5.1.10 verum; logical predicate whose value is always true 
∀ 5.2.2 for all; universal quantifier symbol 
∃ 5.2.2 for some; existential quantifier symbol 
6. Predicate calculus 
x = y 6.7.6 x equals y 
x ≠ y 6.7.7 x does not equal y; i.e. ¬(x = y) 
∃′x 6.8.3 for some unique x 
∃′x, P(x) 6.8.3 P(x) is true for one and only one x 
7. Set axioms 
x ∈ A 7.2.4 x is an element (or member) of a set A 
x ∉ A 7.2.5 x is not an element (or member) of set A 
∀x ∈ S, P(x) 7.2.7 ∀x, (x ∈ S ⇒ P(x)) (i.e. P(x) is true for all elements x of a set S) 
∃x ∈ S, P(x) 7.2.7 ∃x, (x ∈ S ∧ P(x)) (i.e. P(x) is true for some element x of a set S) 


A ⊆ B A is a subset of B 
A ⊇ B A is a superset of B; same as B ⊆ A 
A ⊈ B 7.3.3 A is not a subset of B 
A ⊉ B 7.3.3 A is not a superset of B; same as B ⊈ A 
A ⊊ B 7.3.7 A is a proper subset of B 
A ⊋ B 7.3.7 A is a proper superset of B; same as B ⊊ A 
{x} 7.5.7 the singleton set S satisfying z ∈ S ⇔ z = x 
∅ 7.6.4 the empty set; satisfies ∀x, x ∉ ∅ 
{x, y} 7.6.13 the unordered pair S satisfying z ∈ S ⇔ (z = x ∨ z = y) 
⋃K 7.6.17 union of set (of sets) K 
A ∪ B 7.6.20 union of sets A and B 
ℙ(X) 7.6.26 the power set of set X 
f(x) 7.7.7 the set y for which f(x, y) is true, for a set x and set-theoretic function f 
{x; P(x)} 7.7.10 the set S which satisfies z ∈ S ⇔ P(z) 
{x ∈ A; P(x)} 7.7.13 the set S which satisfies z ∈ S ⇔ (z ∈ A ∧ P(z)) 
{f(x); x ∈ A} 7.7.18 the set {y; ∃x ∈ A, f(x, y)}, for a set A and set-theoretic function f 
{A ⊆ X; P(A, X)} 7.7.23 the set {A ∈ ℙ(X); P(A, X)}, for a set X and set-theoretic formula P 

8. Set operations and constructions 
A ∪ B 8.1.3 union of sets A and B 
A ∩ B 8.1.3 intersection of sets A and B 
A \ B 8.2.2 complement of set B within set A 
A △ B 8.3.3 symmetric set difference of sets A and B 
⋃S 8.4.2 union of set of sets S 
⋂S 8.4.2 intersection of non-empty set of sets S 
⋃_{X∈S} f(X) 8.4.10 union of set of values of set-theoretic function f 
⋂_{X∈S} f(X) 8.4.10 intersection of set of values of set-theoretic function f 
X<Y 8.8.4 X is an abbreviation for Y 
Y>X 8.8.4 X is an abbreviation for Y 
9. Relations 
(a, b) 9.2.3 the ordered pair {{a}, {a, b}} for any a and b 
Left(p) 9.2.14 the left element a of an ordered pair p = (a, b) 
Right(p) 9.2.14 the right element b of an ordered pair p = (a, b) 
A × B 9.4.3 Cartesian product of sets A and B 
Ax? H 9.4.10 Cartesian product of sets A and B (strict set-product) 
Dom(R) 9.5.6 domain of relation or function R 
Range(R) 9.5.6 range of relation or function R 
Img(R) 9.5.6 image of relation or function R 
R(A) 9.5.17 the image of a set A by a relation R 
R⁻¹(B) 9.5.17 the pre-image of a set B by a relation R 
graph(R) 9.5.24 graph of relation or function R 
a R b 9.5.29 (a, b) ∈ R, where R is a relation 
R_1 ∘ R_2 9.6.3 composition of relations (or functions) R_1 and R_2 
id_X 9.6.10 identity relation on set X 
R⁻¹ 9.6.14 inverse of relation (or function) R 
R|_A 9.6.22 restriction of domain of relation R to set A 
R|^B 9.6.22 restriction of range of relation R to set B 
R|_A^B 9.6.22 restriction of domain/range of relation R to sets A/B 
R_1 × R_2 9.7.3 double-domain direct product of relations R_1 and R_2 
[x]_R 9.8.6 equivalence class of x ∈ X for an equivalence relation R 
[x] 9.8.6 equivalence class of x ∈ X for an equivalence relation R 
X/R 9.8.8 the quotient (set) of a set X with respect to an equivalence relation R 
10. Functions 
f : X → Y 10.2.3 f is a function from X to Y 
f(x) 10.2.9 the value of a function f for an argument x of f 
f_x 10.2.9 the value of a function f for an argument x of f 
(f_x)_{x∈X} 10.2.10 alternative notation for function f with domain X 
Y^X 10.2.17 the set of functions from X to Y 
{f : X → Y; P(f)} 10.2.18 the set {f ∈ Y^X; P(f)} for sets X and Y and set-theoretic formula P 
id_X 10.2.29 identity function on set X 
f|_A 10.4.4 the restriction of function f to set A 
F(x) ee 10.4.15 post-evaluation substitution of expression a for free variable x in expression F 
g ∘ f 10.4.18 composition of functions f and g 
goof 10.4.26 pointwise composition of function-valued functions f and g 
Inj(X, Y) 10.5.25 the set of injections from X to Y 
Surj(X, Y) 10.5.25 the set of surjections from X to Y 
Bij(X, Y) 10.5.25 the set of bijections from X to Y 
f(A) 10.6.4 image of a set A by a function f 
f⁻¹(B) 10.6.4 inverse image of a set B by a function f 
S_i 10.8.3 the value S(i) for a family of sets S, i.e. S_i = S(i) 
(S_i)_{i∈I} 10.8.3 a family of sets S with index set I, i.e. S = (S_i)_{i∈I} 
f_i 10.8.5 the value f(i) for a family of functions f, i.e. f_i = f(i) 
(f_i)_{i∈I} 10.8.5 a family of functions f with index set I, i.e. f = (f_i)_{i∈I} 
(A_i, B_i)_{i∈I} 10.8.7 family of pairs (A_i, B_i) in families (A_i)_{i∈I} and (B_i)_{i∈I} 
⋃_{i∈I} S_i 10.8.10 the union of the set-family S = (S_i)_{i∈I} 
⋂_{i∈I} S_i 10.8.10 the intersection of the set-family S = (S_i)_{i∈I} 
X>Y 10.9.3 set of partially-defined functions from X to Y 
f:X >Y 10.9.3 f is a partially-defined function from X to Y 
f(A) 10.9.10 image of a set A by a partial function f 
f⁻¹(B) 10.9.10 inverse image of a set B by a partial function f 
g ∘ f 10.10.7 the composition of partially defined functions f and g 
×_{i∈I} S_i 10.11.3 Cartesian product of family of sets (S_i)_{i∈I} 
Π_k 10.12.3 projection map of Cartesian set-product onto component k 
Slice_k^p(A) 10.12.6 the slice of A ⊆ S_1 × S_2 through p ∈ S_1 × S_2 along component k ∈ ℕ_2 
Slice_k^p(f) 10.12.11 slice of f : A → Y through p ∈ S_1 × S_2 along component k, A ⊆ S_1 × S_2 
Lift_k^p 10.12.14 lift map for component k ∈ ℕ_2 to a point p ∈ S_1 × S_2 
f(·, x_2) 10.12.18 partial map t ↦ f(t, x_2), f : X_1 × X_2 → Y, x_2 ∈ X_2 
f(x_1, ·) 10.12.18 partial map t ↦ f(x_1, t), f : X_1 × X_2 → Y, x_1 ∈ X_1 
Π_k 10.13.3 projection map x ↦ x_k from ×_{i∈I} S_i to S_k for k ∈ I 
Π_J 10.13.7 projection map of ×_{i∈I} S_i onto ×_{j∈J} S_j for J ⊆ I 
Slice_k^p(A) 10.13.11 the slice of A ⊆ ×_{i∈I} S_i through p ∈ ×_{i∈I} S_i along component k ∈ I 
Slice_k^p(f) 10.13.16 slice of f : A → Y through p ∈ ×_{i∈I} S_i along component k ∈ I, A ⊆ ×_{i∈I} S_i 
Lift_k^p 10.13.18 lift map for component k ∈ I to a point p ∈ ×_{i∈I} S_i 
f × g 10.14.4 double-domain direct product of functions f and g 
f × g 10.14.11 double-domain direct product of partial functions f and g 
f × g 10.15.3 common-domain direct product of functions f and g 
×_{i∈I} S_i 10.17.3 the partial Cartesian product of a family of sets (S_i)_{i∈I} 
X → Y 10.19.2 the set of functions from X to Y; same as Y^X 
f:GoF 10.19.8 f is a function from G to F → F; same as f : G → (F → F) 
G œ F 10.19.8 the set of functions from G to F → F; same as G → (F → F) 
11. Order 
Inc(X, Y) 11.1.32 set of increasing functions from X to Y, for partially ordered X, Y 
Dec(X, Y) 11.1.32 set of decreasing functions from X to Y, for partially ordered X, Y 
NonDec(X, Y) set of non-decreasing functions from X to Y, for partially ordered X, Y 
NonInc(X, Y) set of non-increasing functions from X to Y, for partially ordered X, Y 
inf(A) the infimum (if it exists) of a set A in an ordered set 
sup(A) the supremum (if it exists) of a set A in an ordered set 
min(A) the minimum (if it exists) of a set A in an ordered set 
max(A) the maximum (if it exists) of a set A in an ordered set 
inf(f) 11.3.3 infimum (if it exists) of function f : X → Y for ordered set Y 
sup(f) 11.3.3 supremum (if it exists) of function f : X → Y for ordered set Y 
min(f) 11.3.3 minimum (if it exists) of function f : X → Y for ordered set Y 
max(f) 11.3.3 maximum (if it exists) of function f : X → Y for ordered set Y 
inf_{x∈A} f(x) 11.3.4 infimum on A (if it exists) of function f : X → Y for ordered set Y 
sup_{x∈A} f(x) 11.3.4 supremum on A (if it exists) of function f : X → Y for ordered set Y 
min_{x∈A} f(x) 11.3.4 minimum on A (if it exists) of function f : X → Y for ordered set Y 
max_{x∈A} f(x) 11.3.4 maximum on A (if it exists) of function f : X → Y for ordered set Y 
X[a, b] the set {x ∈ X; a ≤ x ≤ b} for a, b ∈ X 
X[a, b) the set {x ∈ X; a ≤ x < b} for a, b ∈ X 
X(a, b] the set {x ∈ X; a < x ≤ b} for a, b ∈ X 
X(a, b) the set {x ∈ X; a < x < b} for a, b ∈ X 
X(−∞, b] the set {x ∈ X; x ≤ b} for b ∈ X 
X(−∞, b) the set {x ∈ X; x < b} for b ∈ X 
X[a, +∞) the set {x ∈ X; a ≤ x} for a ∈ X 
X[a, ∞) the set {x ∈ X; a ≤ x} for a ∈ X 
X(a, +∞) the set {x ∈ X; a < x} for a ∈ X 
X(a, ∞) the set {x ∈ X; a < x} for a ∈ X 
Π_{n_1}^{n_2} projection map of ×_{i∈I} S_i onto ×_{j∈J} S_j with J = {j ∈ I; n_1 ≤ j ≤ n_2} 
12. Ordinal numbers 
ω 12.1.29 the set of finite ordinal numbers 
X⁺ 12.2.18 the successor set X ∪ {X} of any set X 
Ord(X) 12.5.10 X is an ordinal number, i.e. ⋃X ⊆ X and ∀S ∈ ℙ(X) \ {∅}, ⋂S ∈ S 
13. Cardinality 
#(X) = #(Y) 13.1.5 X and Y are equinumerous sets 
#(X) ≥ #(Y) 13.1.11 set X dominates set Y 
#(X) ≤ #(Y) 13.1.11 set Y dominates set X 
#(X) > #(Y) 13.1.11 set X strictly dominates set Y 
#(X) < #(Y) 13.1.11 set Y strictly dominates set X 
card(X) 13.1.25 the cardinality of a well-orderable ZF set X 
H(X) 13.3.8 the Hartogs number of a ZF set X 
rank(X) 13.4.4 the rank of a ZF set X 
p(X) 13.4.7 the cardinality-rank of a ZF set X 
β(X) 13.4.10 the beta-cardinality of a ZF set X 
#(S) 13.5.6 the numerosity (or cardinality) of a finite set S 
#(S) < ∞ 13.5.8 S is a finite set 
Enum(K, Y) 13.5.11 the set of bijections from X to Y for X ∈ K 
N_1 + N_2 13.6.5 the sum of finite ordinal numbers N_1 and N_2 
N_1 − N_2 13.6.5 the difference of finite ordinal numbers N_1 and N_2 with N_2 ≤ N_1 
2^n 13.6.14 the nth power of 2, for a finite ordinal number n 
#(S) = ∞ 13.7.4 S is an infinite set 
#(S) 13.7.10 the numerosity (or cardinality) of a countably infinite set S (always equals ω) 
IP"(X) 13.12.2 the set (S € P(X); #(S) < m] for a set X and m € Zt 
P,(X) 13.12.2 the set (S € P(X); n € #(S)} for a set X and n € Zj 
P(X) 13.12.2 the set (S € P(X); n < #(S) < m) for a set X and m,n € Zj 
P(X) 13.12.5 the set (S € P(X); 4£(S) < oo) for a set X 
P(X) 13.12.5 theset (S € P(X); n < #(S) < oc] for a set X and n € Zj 
P“(X) 13.12.6 the set (S € P(X); #(S) < #(w)} for a set X 
Po(X) 13.12.6 the set (S € P(X); #(w) < #(S)} for a set X 
P2(X) 13.12.6 the set (S € P(X); n € #(S) € #(w)} for a set X and n € Zt 
P#(X) 13.12.6 the set (S € P(X); 4(S) = #(w)} for a set X 
14. Natural numbers and integers 
ℕ[1, x] 14.1.14 the leading interval up to x ∈ ℕ, for any natural number system ℕ 
[1, x] 14.1.14 abbreviation for ℕ[1, x] 
ℕ 14.1.19 the natural number system ω \ {0} = {1, 2, 3, ...} 
ℕ_n 14.1.21 the set {1, 2, ... n} for n ∈ ω 
ℤ 14.4.4 the integers {..., −2, −1, 0, 1, 2, ...} 
ℤ⁺ 14.4.5 the positive integers {1, 2, 3, ...}; equivalent to ℕ 
ℤ₀⁺ 14.4.5 non-negative integers {0, 1, 2, ...}; equivalent to ω 
ℤ⁻ 14.4.5 the negative integers {−1, −2, −3, ...} 
ℤ₀⁻ 14.4.5 non-positive integers {0, −1, −2, ...} 
ℤ_n 14.4.7 the set {i ∈ ℤ; 0 ≤ i < n} = {0, 1, ... n − 1} for n ∈ ℤ 
ℤ[m, n] 14.4.10 the set {i ∈ ℤ; m ≤ i ≤ n} for m, n ∈ ℤ 
ℤ[m, n) 14.4.10 the set {i ∈ ℤ; m ≤ i < n} for m, n ∈ ℤ 
ℤ(m, n] 14.4.10 the set {i ∈ ℤ; m < i ≤ n} for m, n ∈ ℤ 
ℤ(m, n) 14.4.10 the set {i ∈ ℤ; m < i < n} for m, n ∈ ℤ 
ℤ(−∞, n] 14.4.11 the set {i ∈ ℤ; i ≤ n} for n ∈ ℤ 
ℤ(−∞, n) 14.4.11 the set {i ∈ ℤ; i < n} for n ∈ ℤ 
ℤ[m, ∞) 14.4.11 the set {i ∈ ℤ; m ≤ i} for m ∈ ℤ 
ℤ(m, ∞) 14.4.11 the set {i ∈ ℤ; m < i} for m ∈ ℤ 
∑_{i=m}^n a_i 14.4.17 sum of a finite sequence of integers (a_i)_{i=m}^n 
∏_{i=m}^n a_i 14.4.18 product of a finite sequence of integers (a_i)_{i=m}^n 
∞ 14.5.1 the positive infinite pseudo-integer 
−∞ 14.5.1 the negative infinite pseudo-integer 
Z 14.5.3 the extended integers ℤ ∪ {∞, −∞} 
Z+ 14.5.4 the positive extended integers; equivalent to IN 
Zi 14.5.4 non-negative extended integers; equivalent to ω⁺ 
Z 14.5.4 the negative extended integers 
Zo 14.5.4 non-positive extended integers 
N 14.5.8 the extended natural numbers ℕ ∪ {∞} 
X^n 14.6.1 Cartesian product of n copies of set X for n ∈ ℤ₀⁺ 
II”, 14.6.11 subsequence map from X? to X"™—™+1 
χ_A 14.7.3 indicator function of a set A 
2^n 14.7.8 power-of-two function for integer argument n ∈ ℤ₀⁺ 
δ 14.7.11 Kronecker delta function δ : X × X → V on set X 
δ_ij, δ^ij, δ^i_j 14.7.11 Kronecker delta function; same as δ(i, j) 
supp(f) 14.7.18 the (non-topological) support of a function f 
δ(P) 14.7.21 Kronecker delta pseudo-function template for proposition P 
perm(X) 14.8.3 the set of permutations of a set X 
supp(P) 14.8.8 the support {x ∈ X; P(x) ≠ x} of a permutation P ∈ perm(X) 
perm_0(X) 14.8.41 the set of finite permutations of a set X 
parity(f) 14.8.23 parity of a finite permutation on a set X 
n! 14.8.27 the value of the factorial function for argument n 
(n)_k 14.8.30 the value of the Jordan factorial function for argument (n, k) 
ε(f) 14.8.34 Levi-Civita alternating symbol 
ε_{i_1...i_n} 14.8.35 Levi-Civita alternating symbol 
ε^{i_1...i_n} 14.8.35 Levi-Civita alternating symbol 
C? 14.9.3 combination symbol 
I? 14.10.3 set of increasing maps from IN, to IN, 
Ji 14.10.3 set of non-decreasing maps from IN, to Nn 
List(X) 14.12.3 list space on a set X 
length(ℓ) 14.12.6 length of a list ℓ 
concat(ℓ_1, ℓ_2) 14.12.6 concatenation of lists ℓ_1 and ℓ_2 
omit_j(ℓ) 14.12.6 operation to omit item j from list ℓ 
omit_{j,k}(ℓ) 14.12.6 operation to omit items j and k from list ℓ 
swap_{j,k}(ℓ) 14.12.6 operation to swap items j and k in list ℓ 
subs_{j,x}(ℓ) 14.12.6 operation to substitute item j with value x in list ℓ 
insert_{j,x}(ℓ) 14.12.6 operation to insert value x for item j in list ℓ 
subseq_{j,k}(ℓ) 14.12.6 operation to construct subsequence of list ℓ, start at j, extract k items 
List(X) 14.12.13 extended list space on a set X 
List_k(X) 14.12.18 list space with maximum length k on a set X for k ∈ ℤ₀⁺ 
Mult_k(I, Y) 14.12.21 multinomial coefficient space of degree k with index set I, value set Y 
Mult?(I, Y) 14.12.21 symmetric multinomial coefficient space, degree k, index set I, value set Y 
swap_{s,t}(f) 14.12.23 operation to swap values of function f at s and t 
subs_{x,y}(f) 14.12.23 operation to substitute value of function f at x with new value y 
15. Rational and real numbers 
ℚ 15.1.6 the set of rational numbers 
ℚ⁺ 15.1.11 the set of positive rational numbers 
ℚ₀⁺ 15.1.11 the set of non-negative rational numbers 
ℚ⁻ 15.1.11 the set of negative rational numbers 
ℚ₀⁻ 15.1.11 the set of non-positive rational numbers 
ℚ[a, b] 15.1.13 the set {q ∈ ℚ; a ≤ q ≤ b} for a, b ∈ ℚ 
ℚ[a, b) 15.1.13 the set {q ∈ ℚ; a ≤ q < b} for a, b ∈ ℚ 
ℚ(a, b] 15.1.13 the set {q ∈ ℚ; a < q ≤ b} for a, b ∈ ℚ 
ℚ(a, b) 15.1.13 the set {q ∈ ℚ; a < q < b} for a, b ∈ ℚ 
ℚ(−∞, b] 15.1.14 the set {q ∈ ℚ; q ≤ b} for b ∈ ℚ 
ℚ(−∞, b) 15.1.14 the set {q ∈ ℚ; q < b} for b ∈ ℚ 
ℚ[a, ∞) 15.1.14 the set {q ∈ ℚ; a ≤ q} for a ∈ ℚ 
ℚ(a, ∞) 15.1.14 the set {q ∈ ℚ; a < q} for a ∈ ℚ 
ℝ 15.4.9 the set of real numbers 
ℝ⁺ 15.6.7 the set of positive real numbers {x ∈ ℝ; x > 0} 
ℝ₀⁺ 15.6.7 the set of non-negative real numbers {x ∈ ℝ; x ≥ 0} 
ℝ⁻ 15.6.7 the set of negative real numbers {x ∈ ℝ; x < 0} 
ℝ₀⁻ 15.6.7 the set of non-positive real numbers {x ∈ ℝ; x ≤ 0} 
16. Real-number constructions 
[a, b] 16.1.2 the closed interval of real numbers {x ∈ ℝ; a ≤ x ≤ b} 
(a, b) 16.1.2 the open interval of real numbers {x ∈ ℝ; a < x < b} 
[a, b) 16.1.2 the closed-open interval of real numbers {x ∈ ℝ; a ≤ x < b} 
(a, b] 16.1.2 the open-closed interval of real numbers {x ∈ ℝ; a < x ≤ b} 
[a, ∞) 16.1.2 the semi-infinite closed interval of real numbers {x ∈ ℝ; a ≤ x} 
(a, ∞) 16.1.2 the semi-infinite open interval of real numbers {x ∈ ℝ; a < x} 
(−∞, b] 16.1.2 the semi-infinite closed interval of real numbers {x ∈ ℝ; x ≤ b} 
(−∞, b) 16.1.2 the semi-infinite open interval of real numbers {x ∈ ℝ; x < b} 
[a, b] 16.1.15 symmetrised closed interval of real numbers [a, b] ∪ [b, a] = [min(a, b), max(a, b)] 
∞ 16.2.2 infinity element of extended real numbers 
R 16.2.4 the set of extended real numbers ℝ ∪ {−∞, ∞} 
Rt 16.2.5 the set of positive extended real numbers {x ∈ R; x > 0} 
R 16.2.5 the set of non-negative extended real numbers {x ∈ R; x ≥ 0} 
R- 16.2.5 the set of negative extended real numbers {x ∈ R; x < 0} 
Ro 16.2.5 the set of non-positive extended real numbers {x ∈ R; x ≤ 0} 
Q 16.3.3 the set of extended rational numbers 
Qt 16.3.5 the set of positive extended rational numbers 
Q 16.3.5 the set of non-negative extended rational numbers 
Q 16.3.5 the set of negative extended rational numbers 
Qo 16.3.5 the set of non-positive extended rational numbers 
ℝ^n 16.4.1 the set of real-number n-tuples with index set ℕ_n for n ∈ ℤ₀⁺ 
Q_{m,n} 16.4.3 the m,n-concatenation operator for real number tuples for m, n ∈ ℤ₀⁺ 
|x| 16.5.2 the absolute value of x ∈ ℝ 
sign(x) 16.5.4 the sign of x ∈ ℝ 
H(x) 16.5.8 the Heaviside function of x ∈ ℝ 
floor(x) 16.5.11 the floor function of x ∈ ℝ 
ceiling(x) 16.5.12 the ceiling function of x ∈ ℝ 
frac(x) 16.5.16 the fractional part function of x ∈ ℝ 
round(x) 16.5.17 the round function of x ∈ ℝ 
x mod m 16.5.19 x modulo m for x ∈ ℝ and m ∈ ℝ \ {0} 
x^p 16.6.3 pth power of x ∈ ℝ for p ∈ ℤ₀⁺ 
x^p 16.6.4 pth power of x ∈ ℝ \ {0} for p ∈ ℤ 
YT 16.6.9 qth root of x ∈ ℝ \ {0} for q ∈ ℤ⁺ 
x^{1/q} 16.6.9 qth root of x ∈ ℝ for q ∈ ℤ⁺ 
x^{p/q} 16.6.13 p/qth power of x ∈ ℝ⁺ for p ∈ ℤ, q ∈ ℤ⁺ 
x^r 16.6.16 rth power of x ∈ ℝ⁺ for r ∈ ℝ 
∑_{i=m}^n a_i 16.7.2 sum of a finite sequence of real numbers (a_i)_{i=m}^n 
∏_{i=m}^n a_i 16.7.3 product of a finite sequence of real numbers (a_i)_{i=m}^n 
ℂ 16.8.2 the set of complex numbers ℝ × ℝ 
i 16.8.2 the complex number (0, 1) ∈ ℂ 
Re(z) 16.8.6 the real part of a complex number z 
Im(z) 16.8.6 the imaginary part of a complex number z 
17. Semigroups and groups 
∑f 17.1.18 sum of sequence f : ℕ_n → Γ for commutative semigroup Γ, n ∈ ℤ⁺ 
∑_{i=1}^n f_i 17.1.18 sum of sequence f : ℕ_n → Γ for commutative semigroup Γ, n ∈ ℤ⁺ 
∑_{i∈ℕ_n} f_i 17.1.18 sum of sequence f : ℕ_n → Γ for commutative semigroup Γ, n ∈ ℤ⁺ 
L_g 17.3.6 left action by a group element g ∈ G on elements of G 
R_g 17.3.6 right action by a group element g ∈ G on elements of G 
e 17.3.9 identity element of a group G 
g⁻¹ inverse of an element g of a group G 
Hom(G_1, G_2) 17.4.2 the set of group homomorphisms from G_1 to G_2 
Mon(G_1, G_2) 17.4.2 the set of group monomorphisms from G_1 to G_2 
Epi(G_1, G_2) 17.4.2 the set of group epimorphisms from G_1 to G_2 
Iso(G_1, G_2) 17.4.2 the set of group isomorphisms from G_1 to G_2 
End(G) 17.4.2 the set of group homomorphisms from G to G 
Aut(G) 17.4.2 the set of group isomorphisms from G to G 
g^n 17.4.9 the nth power of g in a group G for n ∈ ℤ 
gH 17.7.2 the left coset of a subgroup H of a group G by g ∈ G 
Hg 17.7.2 the right coset of a subgroup H of a group G by g ∈ G 
G/H 17.7.9 quotient of group G with respect to normal subgroup H of G 
$3 17.8.8 the left conjugate of subset S of group G by g ∈ G 
18. Rings and fields 
nr 18.1.22 the sum of n ∈ ℤ copies of r ∈ R, for a ring R 
1_R 18.2.4 the multiplicative identity of a ring R 
char(K) 18.7.11 the characteristic of a field K 
19. Modules and algebras 
Hom_A(M_1, M_2) 19.1.12 set of A-homomorphisms from module M_1 to module M_2 over set A 
End_A(M) 19.1.12 set of A-endomorphisms from module M to M over set A 
Aut_A(M) 19.1.12 set of A-automorphisms from module M to M over set A 
GL(M) 19.1.12 same as Aut_A(M) for module M over implicit set A 
Hom_R(M_1, M_2) 19.4.4 set of R-module homomorphisms from M_1 to M_2 over a ring R 
Mon_R(M_1, M_2) 19.4.4 set of R-module monomorphisms from M_1 to M_2 over a ring R 
Epi_R(M_1, M_2) 19.4.4 set of R-module epimorphisms from M_1 to M_2 over a ring R 
Iso_R(M_1, M_2) 19.4.4 set of R-module isomorphisms from M_1 to M_2 over a ring R 
End_R(M) 19.4.4 set of R-module endomorphisms of a module M over a ring R 
Aut_R(M) 19.4.4 set of R-module automorphisms of a module M over a ring R 
[X, Y] 19.10.5 the product of elements X and Y of a Lie algebra 
gl(M) 19.10.13 Lie algebra associated with associative algebra End_R(M) for ring-module M 
20. Transformation groups 
gx 20.1.4 action of g ∈ G on x ∈ X for a left transformation group (G, X) 
L 20.1.6 left action map by g ∈ G, for left transformation group (G, X, σ, μ) 
L_g 20.1.6 left action map by g ∈ G, for left transformation group (G, X, σ, μ) 
Gx 20.5.2 the orbit of a left transformation group (G, X) passing through x ∈ X 
X/G 20.5.7 the orbit space of a left transformation group (G, X) 
G_x 20.5.10 the stabiliser of x ∈ X for a left transformation group (G, X) 
xg 20.7.4 action of g ∈ G on x ∈ X for a right transformation group (G, X) 
Re 20.7.6 right action map by g ∈ G, for right transformation group (G, X, σ, μ) 
R_g 20.7.6 right action map by g ∈ G, for right transformation group (G, X, σ, μ) 
xG 20.7.20 the orbit of a right transformation group (G, X) passing through x ∈ X 
X/G 20.7.24 the orbit space of a right transformation group (G, X) 
21. Non-topological fibre bundles 
E_b 21.1.3 the fibre set of a fibration (E, π, B) at b ∈ B 
X(E, π, B) 21.3.4 set of global cross-sections of a non-topological fibration (E, π, B) 
X(E, π, B | U) 21.3.4 set of local cross-sections of non-topological fibration (E, π, B) on U ⊆ B 
X(E, π, B) 21.3.4 set of local cross-sections of a non-topological fibration (E, π, B) 
X(E, π, B) 21.4.10 set of global short-cut cross-sections of a non-topological fibration (E, π, B) 
X(E, π, B | U) 21.4.10 set of local short-cut cross-sections of non-top. fibration (E, π, B) on U ⊆ B 
Extn_φ(z) 21.6.8 constant cross-section extension of non-topological fibration, fibre chart φ, z ∈ E 
atlas(E, π, B) 21.8.4 the fibre atlas A_E^F of a fibre bundle (E, π, B, A_E^F) 
atlas_b(E, π, B) 21.8.4 the set of fibre charts φ ∈ atlas(E, π, B) with b ∈ π(Dom(φ)) 
A_{E,b}^F 21.8.4 the set of φ ∈ A_E^F with b ∈ π(Dom(φ)) for a fibre bundle (E, π, B, A_E^F) 
X_φ 21.10.3 identity cross-section of non-topological principal bundle, fibre chart φ 
R_g 21.11.6 right action by g ∈ G on a non-topological principal G-bundle P 
zg 21.11.6 right action by g ∈ G on z ∈ P in a non-topological principal G-bundle P 
22.Linear Spa6es x eg, sace e RD poen mo Sd odo RE Se ER A ee do 749 
UL(S, K) 22.2.6 unrestricted linear space on set S over field K 
FL(S, K) 22.2.11 free linear space on set S over field K 
Fin(S, K) 22.2.25 {f:S > K; #{x € S; f(x) Z 0k] « oo}, set S, field or linear space K 
span( S) 22.4.4 the linear span (set of linear combinations) of S in a linear space 
span(v) 22.4.5 the linear span of a family v = (v;);er of vectors in a linear space 
dim(V) 22.5.3 dimension of linear space V 
codim(W, V) 22.5.18 codimension of linear subspace W of linear space V 
e_i 22.7.9 unit vector i in the standard basis (e_i)_{i=1}^n for a Cartesian linear space ℝ^n
κ_B 22.8.6 component map for a basis set B ⊆ V of a linear space V
κ_B 22.8.7 component map for a basis family B = (e_i)_{i∈I} ∈ V^I of a linear space V
κ_B 22.8.8 component map for a basis B = (e_i)_{i=1}^n ∈ V^n of a linear space V
S_1 + S_2 22.10.2 the Minkowski sum of subsets S_1 and S_2 of a linear space
Σ_{i=1}^n S_i 22.10.2 the Minkowski sum of a finite family of subsets (S_i)_{i=1}^n of a linear space
λS 22.10.2 the Minkowski product of subset S of a linear space by a scalar λ
v + S 22.10.4 Minkowski sum of {v} and subset S of a linear space for a vector v
−S 22.10.5 Minkowski scalar multiple by −1 of a subset S of a linear space
S_1 − S_2 22.10.5 Minkowski sum S_1 + (−S_2) for subsets S_1 and S_2 of a linear space
Fing(S, K) 22.11.6 functions f ∈ Fin(S, K) which are non-negative and sum to 1
conv(S) 22.11.12 convex span of set S in a linear space
notation reference meaning 
23. Linear maps and dual spaces . . . . . . . . 789
Lin(V, W) 23.1.4 set of linear maps from linear space V to linear space W 
Hom(V_1, V_2) 23.1.9 the set of linear space homomorphisms from V_1 to V_2
Mon(V_1, V_2) 23.1.9 the set of linear space monomorphisms from V_1 to V_2
Epi(V_1, V_2) 23.1.9 the set of linear space epimorphisms from V_1 to V_2
Iso(V_1, V_2) 23.1.9 the set of linear space isomorphisms from V_1 to V_2
End(V) 23.1.9 the set of linear space endomorphisms on V 
Aut(V) 23.1.9 the set of linear space automorphisms on V 
GL(V) 23.1.12 the set (or group) of linear space automorphisms on V 
ker(φ) 23.1.23 kernel of linear map φ : V → W, linear spaces V and W
nullity(φ) 23.1.27 nullity, dim(ker(φ)), of linear map φ : V → W
κ_{B_1,B_2} 23.2.8 linear map component map for basis sets B_1, B_2 for V_1, V_2
κ_{B_1,B_2} 23.2.9 linear map component map for basis families B_1, B_2 for V_1, V_2
κ_{B_1,B_2} 23.2.10 linear map component map for finite bases B_1, B_2 for V_1, V_2
Tr(φ) 23.3.4 the trace of a finite-dimensional linear space endomorphism φ
V* 23.6.2 the set of linear functionals on a linear space V 
V* 23.6.5 the linear space of linear functionals on a linear space V 
φ^T 23.11.2 the transpose of a linear map φ : V → W for linear spaces V and W
24. Linear space constructions . . . . . . . . 817
V_1 ⊕ V_2 24.1.3 external direct sum of linear spaces V_1 and V_2
V_1 ⊕ V_2 24.1.10 internal direct sum of linear spaces V_1 and V_2
x + S 24.2.5 the left translate of S by x, for x ∈ V, S ⊆ V, linear space V
S + x 24.2.5 the right translate of S by x, for x ∈ V, S ⊆ V, linear space V
V/W 24.2.9 quotient of linear space V over linear space W
‖v‖ 24.7.8 norm of a vector v in a normed linear space
|v| 24.7.8 norm of a vector v in a normed linear space
|x|_p 24.7.13 p-norm of x ∈ ℝ^n
(x, y) 24.9.8 inner product of vectors x, y ∈ ℝ^n
x · y 24.9.8 inner product of vectors x, y ∈ ℝ^n
⟨x, y⟩ 24.9.8 inner product of vectors x, y ∈ ℝ^n
25. Matrix algebra . . . . . . . . 845
M_{m,n}(K) 25.2.4 the set of m × n matrices over a field K
M_{m,n}(ℝ) 25.2.4 the set of m × n real-valued matrices
M_{m,n} 25.2.5 same as M_{m,n}(ℝ)
Row_i(A) 25.2.16 row matrix i of a matrix A
Col_j(A) 25.2.16 column matrix j of a matrix A
AB 25.3.8 the product of matrices A and B 
I_n 25.4.3 the identity matrix in M_{n,n}(K) for n ∈ Z and field K
A^T 25.4.12 the transpose of matrix A ∈ M_{m,n}(K), m, n ∈ Za, field K
Rowspan(A) 25.5.3 the row matrix span of matrix A ∈ M_{m,n}(K), field K, m, n ∈ Zj
Colspan(A) 25.5.3 the column matrix span of matrix A ∈ M_{m,n}(K), field K, m, n ∈ Zt
Rowrank(A) 25.5.9 the row rank dim(Rowspan(A)) of a matrix A 
Colrank(A) 25.5.9 the column rank dim(Colspan(A)) of a matrix A 
Mim (K) 25.8.8 the set of n × n invertible matrices over a field K
A^{-1} 25.8.11 the inverse of an invertible square matrix A
A^p 25.8.15 the pth power of a square matrix A
Tr(A) 25.9.3 the trace of a square matrix A 
det(A) 25.10.4 the determinant of a square matrix A
λ^+(A) 25.11.2 the upper modulus of a real square matrix A
λ^−(A) 25.11.2 the lower modulus of a real square matrix A
μ^+(A) 25.12.2 the upper bound of a real square matrix A
μ^−(A) 25.12.2 the lower bound of a real square matrix A
Sym(n, ℝ) 25.13.4 the set of real symmetric n × n matrices
Sym_0^+(n, ℝ) 25.13.9 the set of positive semi-definite real symmetric n × n matrices
Sym_0^−(n, ℝ) 25.13.9 the set of negative semi-definite real symmetric n × n matrices
Sym^+(n, ℝ) 25.13.9 the set of positive definite real symmetric n × n matrices
Sym^−(n, ℝ) 25.13.9 the set of negative definite real symmetric n × n matrices
A((NG,)L 4; K) 25.15.3 set of multidimensional arrays over K with width sequence m € (Zj )" 
A(Nm,)14) 25.15.3 abbreviation for A((Nm,);_1; IR) 
A,(Nn; K) 25.15.3 set of multidimensional arrays over K with width sequence N7 
A. (N,) 25.15.3 abbreviation for .A, (IN; IR) 
At (Nn; K) 25.15.8 — set of symmetric multidimensional arrays over K, width n, degree r 
AT (Nn) 25.15.8 abbreviation for A7 (IN; IR) 
A (Nn; K) 25.15.9 set of antisymmetric multidimensional arrays over K, width n, degree r 
A; (Nn) 25.15.9 abbreviation for A; (Nn; IR) 
Chr 25.15.12 compression map for symmetric higher-degree arrays, n, r € Zu 
ego 25.15.13 compression map for antisymmetric higher-degree arrays, n,r € Zi 
Dj; 25.15.15 decompression map for symmetric higher-degree arrays, n,r € Ze 
D; 25.15.16 decompression map for antisymmetric higher-degree arrays, n,r € Zi 
26. Affine spaces . . . . . . . . 881
L_{p,g} 26.3.4 line on affine space over group, base point p, velocity g
T_p(X) 26.3.6 tangent space on affine space X over a group at base point p ∈ X
T(X) 26.3.8 tangent bundle on affine space X over a group
LY é 26.3.13 half-line on an affine space over a group, base point p, velocity g
T (X) 26.3.15 unidirectional tangent space on affine space X over a group at p ∈ X
TT (X) 26.3.17 unidirectional tangent bundle on affine space X over a group
L_{p,m} 26.4.6 line on affine space over module, base point p, velocity m
L_{p,m} 26.5.4 line on affine space over module over set, base point p, velocity m
T_p(X) 26.5.8 tangent space on affine space X over a module, base point p ∈ X
T(X) 26.5.10 tangent bundle on affine space X over a module
LY ” 26.7.6 half-line on affine space over module over ordered ring, base point p, velocity m
T (X) 26.7.8 unidirectional tangent space at p on affine space X over module/ordered ring
TT (X) 26.7.10 unidirectional tangent bundle on affine space X over ordered ring
TUE ) 26.8.3 tangent velocity space on affine space X over a module, base point p ∈ X
T(X) 26.8.5 tangent velocity bundle on affine space X over a module
X[p, q] 26.9.9 line segment through points p, q ∈ X
X[p_0, p_1, ... p_k] 26.9.15 hyperplane segment through points p_0, p_1, ... p_k ∈ X for k ∈ ZF
L_{p,v} 26.13.4 tangent-line vector at p with velocity v in Cartesian space ℝ^n
T_p(ℝ^n) 26.13.6 tangent-line vector set at p in a Cartesian space ℝ^n, n ∈ Zf, p ∈ ℝ^n
T(ℝ^n) 26.14.3 tangent-line vector bundle set on a Cartesian space ℝ^n, n ∈ Zj
B 26.14.7 velocity chart for tangent-line bundle total space for a Cartesian space
(L_1, L_2) 26.15.7 concatenation of Cartesian space tangent vectors L_1 and L_2
T(ℝ^n) 26.16.6 tangent velocity set at p ∈ ℝ^n, n ∈ Z?
T(ℝ^n) 26.16.10 tangent velocity bundle set on ℝ^n, n ∈ Zf
T_p^*(ℝ^n) 26.17.4 tangent covector space at p ∈ ℝ^n, n ∈ Zj
s ad 26.17.8 tangent covector at p with co-velocity w, p, w ∈ ℝ^n, n ∈ Zg
T*(ℝ^n) 26.18.7 tangent field covector set at p ∈ ℝ^n, n ∈ Zt
T(ℝ^n) 26.18.12 tagged tangent field covector set on ℝ^n, n ∈ Zf
T(ℝ^n) 26.18.14 tagged tangent field covector bundle set on ℝ^n, n ∈ Zf
27. Multilinear algebra . . . . . . . . 917
ℒ((V_α)_{α∈A}; U) 27.2.47 set of multilinear maps from ×_{α∈A} V_α to U
ℒ(V_1, ... V_m; U) 27.2.48 the set ℒ((V_α)_{α∈A}; U) with A = ℕ_m = {1, ... m}
ℒ_m(V; U) 27.2.19 same as ℒ((V_α)_{α∈A}; U) with A = ℕ_m and V_α = V for all α
ℒ((V_α)_{α∈A}) 27.3.2 set of multilinear functions on ×_{α∈A} V_α
ℒ(V_1, ... V_m) 27.3.3 the set ℒ((V_α)_{α∈A}) with A = ℕ_m = {1, ... m}
ℒ_m(V) 27.3.4 same as ℒ((V_α)_{α∈A}) with A = ℕ_m and V_α = V for all α
28. Tensor spaces . . . . . . . . 943
⊗_{α∈A} V_α 28.1.4 tensor product of linear spaces (V_α)_{α∈A}
⊗_{i=1}^m V_i 28.1.5 tensor product of linear spaces (V_i)_{i=1}^m
V_1 ⊗ ... ⊗ V_m 28.1.6 tensor product of linear spaces (V_i)_{i=1}^m
⊗^m V 28.1.7 tensor product of m copies of linear space V
⊗_{α∈A} v_α 28.4.3 tensor monomial corresponding to (v_α)_{α∈A}
⊗_{i=1}^m v_i 28.4.4 tensor monomial corresponding to (v_i)_{i=1}^m
v_1 ⊗ ... ⊗ v_m 28.4.5 tensor monomial corresponding to (v_i)_{i=1}^m
29. Tensor space constructions . . . . . . . . 963
⊗(F_1; F_2) 29.3.4 mixed tensor product of linear spaces F_1 = (V_α)_{α∈A}, F_2 = (W_β)_{β∈B}
⊗_{k=1}^r v_k 29.4.10 simple tensor for vectors (v_k)_{k=1}^r ∈ ×_{k=1}^r V_k
v_1 ⊗ ... ⊗ v_r 29.4.10 simple tensor for vectors (v_k)_{k=1}^r ∈ ×_{k=1}^r V_k
⊗_{ℓ=1}^s φ^ℓ 29.4.11 simple tensor for linear functionals (φ^ℓ)_{ℓ=1}^s ∈ ×_{ℓ=1}^s (W_ℓ)^*
φ^1 ⊗ ... ⊗ φ^s 29.4.11 simple tensor for linear functionals (φ^ℓ)_{ℓ=1}^s ∈ ×_{ℓ=1}^s (W_ℓ)^*
(⊗_{k=1}^r v_k) ⊗ (⊗_{ℓ=1}^s φ^ℓ) 29.4.14 multilinear-style mixed tensor constructed from (v_k)_{k=1}^r and (φ^ℓ)_{ℓ=1}^s
Gp Vk > Q1 $° 29.4.15 linear-style mixed tensor constructed from (v_k)_{k=1}^r and (φ^ℓ)_{ℓ=1}^s
ei? 29.4.17 mixed tensor basis vector (87. €k,i,) Q (&$.., E07)
e 29.4.18 mixed tensor basis vector (G5 4 egi, — B E67)
Gv 29.5.3 mixed tensor space of degree (r, s) on linear space V, r, s ∈ Zg
QUy 29.5.5 mixed tensor space of degree (r, s) on linear space V, r, s ∈ Zp
ℒ_{r,s}(V; U) 29.5.15 mixed multilinear maps on linear space V
ℒ_{r,s}(V, W; U) 29.5.18 mixed multilinear maps on linear spaces V and W
⊗_{k=1}^r v_k 29.6.6 simple tensor for vectors (v_k)_{k=1}^r ∈ V^r, linear space V
v_1 ⊗ ... ⊗ v_r 29.6.6 simple tensor for vectors (v_k)_{k=1}^r ∈ V^r, linear space V
⊗_{ℓ=1}^s φ^ℓ 29.6.7 simple tensor for linear functionals (φ^ℓ)_{ℓ=1}^s ∈ (W^*)^s, linear space W
φ^1 ⊗ ... ⊗ φ^s 29.6.7 simple tensor for linear functionals (φ^ℓ)_{ℓ=1}^s ∈ (W^*)^s, linear space W
(⊗_{k=1}^r v_k) ⊗ (⊗_{ℓ=1}^s φ^ℓ) 29.6.9 multilinear-style mixed tensor constructed from (v_k)_{k=1}^r and (φ^ℓ)_{ℓ=1}^s
GE Uk — G1, $ 29.6.10 linear-style mixed tensor constructed from (v_k)_{k=1}^r and (φ^ℓ)_{ℓ=1}^s
e; 29.6.12 mixed tensor basis vector (95 ., €:,) & (8j. €*)
e; 29.6.13 mixed tensor basis vector (G5, , ei, > @j_, ©”)
Con^α_β(A) 29.7.3 the contraction of mixed tensor A on upper index α and lower index β
AB 29.7.7 the juxtaposition product of tensors A and B
30. Multilinear map and tensor symmetries . . . . . . . . 989
ℒ_m^+(V; U) 30.1.4 the set of symmetric multilinear maps from V^m to U
ℒ_m^−(V; U) 30.1.5 the set of antisymmetric multilinear maps from V^m to U
ℒ_m^+(V) 30.1.9 the set of symmetric multilinear functions on V^m
ℒ_m^−(V) 30.1.10 the set of antisymmetric multilinear functions on V^m
Λ_m(V; W) 30.4.2 the set ℒ_m^−(V; W) with pointwise vector addition and scalar product
Λ_m V 30.4.3 same as Λ_m(V; K), where K is the field of V
Λ^m V 30.4.9 alternating tensor product of m copies of V; same as Λ_m(V; K)^*
∧_{i=1}^m v_i 30.4.23 a simple m-vector
TO RP) 30.6.6 mixed multilinear function space ®"* T, (IR^), pe R^, r,s € Zf, n € Zt 
TER 30.6.7 mixed tensor space 0^ T(R”), pe R”, r,s € Zf, n € Z} 
T”*(T(R")) 30.6.12 — tensor bundle set per: T7? (IR^), r,s € Zj and n € Z 
T'"'*(T(IR?)) 30.6.12 tensor bundle set Uper» Tp (IR^), r,s € Zi and n € Ze 
A,(T(R”)) 30.6.12 tensor bundle set [Jc As(Zp(R")), s € Zo and n € Zo 
A.(T(R"),W) | 30.6.12 tensor bundle set per» As(Tp(R”)), s € Zg and n € Zg, linear space W 
£;(T(R"),W) 30.6.12 tensor bundle set U,crn -2s(Zp(R")), s € Zi and n € Zj, linear space W 


X(E) 30.6.14 set of cross-sections of tangent bundle total space E with base space ℝ^n
X(E|U) 30.6.14 set of cross-sections of tangent bundle total space E with domain U ⊆ ℝ^n
Lpa 30.6.16 a multilinear-style tensor in 77? R”), r,s Zi, pER”, ae Nt, ne Zi 
Lra 30.6.17 a linear-style tensor in T/?*(IR"), r,s € Zt, p€ R”, a€ NS neat 
31. Topology . . . . . . . . 1021
Top(X) 31.3.4 the topology on a topological space X 
Top_x(X) 31.3.12 the set of open neighbourhoods of x ∈ X in a topological space X
Nbhd(X) 31.3.14 the set of point-neighbourhood pairs (x, Ω) ∈ X × Top(X) such that x ∈ Ω
Top( X) 31.4.4 the set of closed sets in a topological space X 
Int(S) 31.8.3 the interior of a set S in a topological space X 
S̄ 31.8.8 the closure of a set S in a topological space X
Int_T(S) 31.8.20 the interior of a set S in a topological space (X, T)
Clos_T(S) 31.8.21 the closure of a set S in a topological space (X, T)
Clos(S) 31.8.21 the closure of a set S in a topological space X 
Ext(S) 31.9.3 the exterior of a set S in a topological space X 
Bdy(S) 31.9.6 the boundary of a set S in a topological space X 
∂S 31.9.6 the boundary of a set S in a topological space X
Ext_T(S) 31.9.16 the exterior of a set S in a topological space (X, T)
Bdy_T(S) 31.9.16 the boundary of a set S in a topological space (X, T)
C(X,Y) 31.12.11 set of continuous functions from X to Y, for topological spaces X, Y 
C^0(X, Y) 31.12.11 same as C(X, Y)
C^0(X) 31.12.13 same as C(X, ℝ)
X ≈ Y 31.14.38 topological spaces X and Y are homeomorphic
f : X ≈ Y 31.14.3 f is a homeomorphism from X to Y
Iso(X, Y) 31.14.7 the set of all topological isomorphisms (i.e. homeomorphisms) from X to Y
Aut(X) 31.14.7 the set of all topological automorphisms on X, same as Iso(X, X)
32. Topological space constructions . . . . . . . . 1069
33. Topological space classes . . . . . . . . 1101
T_k 33.1.2 Alexandrov/Hopf topological separation class k = 0, 1, 2, 3, 3½, 4, 5, 6
dim(S) 33.8.3 Lebesgue (topological) dimension of subset S of a topological space
34. Topological connectedness . . . . . . . . 1135
Top^?""(X) 34.4.11 the set {Ω ∈ Top(X); Ω is connected} for topological space X
Top, (X) 34.4.11 the set {Ω ∈ Top_p(X); Ω is connected} for topological space X, p ∈ X
Top (X) 34.4.11 the set {F ∈ Top( X); F is connected} for topological space X
Top, (X) 34.4.11 the set {F ∈ Top,(X); F is connected} for topological space X, p ∈ X
ConnSet(A) 34.5.9 set of all connected components of A ∈ P(X), for topological space X
Conn_A(x) 34.6.3 connected component containing x in subset A of a topological space
35. Continuity and limits . . . . . . . . 1165
lim_{z→x} f(z) 35.8.17 limit of function f : X → Y at x ∈ X
lim_x f 35.3.19 limit of function f : X → Y at x ∈ X
lim_{i→∞} x_i 35.4.5 limit of infinite sequence (x_i)_{i=0}^∞
lim x 35.4.5 limit of infinite sequence x, same as lim_{i→∞} x_i
liminf_{x→a} f(x) 35.8.3 inferior limit of function f : X → ℝ at a point a ∈ X
limsup_{x→a} f(x) 35.8.3 superior limit of function f : X → ℝ at a point a ∈ X
liminf_{x→a−} f(x) 35.8.10 inferior left limit of function f : X → ℝ at a point a ∈ X
liminf_{x→a+} f(x) 35.8.10 inferior right limit of function f : X → ℝ at a point a ∈ X
limsup_{x→a−} f(x) 35.8.10 superior left limit of function f : X → ℝ at a point a ∈ X
limsup_{x→a+} f(x) 35.8.10 superior right limit of function f : X → ℝ at a point a ∈ X
36. Topological curves, paths and groups . . . . . . . . 1201
S(γ) 36.2.15 the initial point of a curve γ
T(γ) 36.2.15 the terminal point of a curve γ
𝒞_0(M) 36.2.17 the set of C^0 curves in topological space M
[γ]_0 36.8.3 the set of curves which are path-equivalent to a given curve γ
𝒫_0(M) 36.8.7 the set of C^0 paths in topological space M
37. Metric spaces . . . . . . . . 1233
B_{x,r} 37.3.2 open ball {y ∈ M; d(x, y) < r}, metric space M, centre x ∈ M, radius r ∈ Rj
B̄_{x,r} 37.3.2 closed ball {y ∈ M; d(x, y) ≤ r}, metric space M, centre x ∈ M, radius r ∈ Rj
B_r(x) 37.3.2 open ball {y ∈ M; d(x, y) < r}, metric space M, centre x ∈ M, radius r ∈ R$
B̄_r(x) 37.3.2 closed ball {y ∈ M; d(x, y) ≤ r}, metric space M, centre x ∈ M, radius r ∈ Rf
B_{x,r_1,r_2} 37.3.15 open annulus {y ∈ M; r_1 < d(x, y) < r_2}, x ∈ M, r_1, r_2 ∈ Ri
B̄_{x,r_1,r_2} 37.3.15 closed annulus {y ∈ M; r_1 ≤ d(x, y) ≤ r_2}, x ∈ M, r_1, r_2 ∈ Ri
Ḃ_{x,r} 37.3.5 punctured open ball {y ∈ M; 0 < d(x, y) < r}, x ∈ M, r ∈ Rf
Ḃ̄_{x,r} 37.3.15 punctured closed ball {y ∈ M; 0 < d(x, y) ≤ r}, x ∈ M, r ∈ R
B_{r_1,r_2}(x) 37.3.15 open annulus {y ∈ M; r_1 < d(x, y) < r_2}, x ∈ M, r_1, r_2 ∈ Ri
B̄_{r_1,r_2}(x) 37.3.15 closed annulus {y ∈ M; r_1 ≤ d(x, y) ≤ r_2}, x ∈ M, r_1, r_2 ∈ Ri
Ḃ_r(x) 37.3.5 punctured open ball {y ∈ M; 0 < d(x, y) < r}, x ∈ M, r ∈ Rb
Ḃ̄_r(x) 37.3.15 punctured closed ball {y ∈ M; 0 < d(x, y) ≤ r}, x ∈ M, r ∈ Ro
d(x, A) 37.4.1 distance between point x ∈ M and non-empty set A ⊆ M, metric space M
d(A, B) 37.4.4 distance between sets A, B ∈ P(M) \ {∅}, metric space M
diam_d(S) 37.4.7 diameter of non-empty set S ⊆ M, metric space (M, d)
diam(S) 37.4.7 diameter of non-empty set S ⊆ M, metric space M
radius(S) 37.4.11 radius of non-empty set S ⊆ M, metric space M
38. Metric space continuity . . . . . . . . 1263
Lip(f) 38.6.8 the infimum of Lipschitz constants for a Lipschitz function f 
C^{0,α}(S, M_2) 38.7.5 set of α-Hölder functions from S ⊆ M_1 to M_2, α ∈ (0, 1], metric spaces M_1, M_2
C^α(S, M_2) 38.7.5 the same as C^{0,α}(S, M_2) when α ∈ (0, 1)
L(γ) 38.8.3 the length of a curve γ in a metric space
L_J(γ) 38.8.3 the length of a restricted curve γ|_J in a metric space, for a real interval J
Ay 38.8.9 the partial curve length function for a curve γ
Ay zo 38.8.13 bidirectional partial curve length function for a curve γ with initial value x_0
39. Topological linear spaces . . . . . . . . 1291
Σ_{i∈I} a_i 39.2.10 sum of a convergent series (a_i)_{i∈I} in a T_1 topological linear space
Σ_{i=i_0}^∞ a_i 39.2.10 sum of a convergent series (a_i)_{i∈I} in a T_1 topological linear space, i_0 = min(I)
Σ a 39.2.10 sum of a convergent series a = (a_i)_{i∈I} in a T_1 topological linear space
O(g(x)) 39.7.9 at most of order g (Landau's symbol) 
o(g(x)) 39.7.10 of order less than g (Landau's symbol) 
40. Differential calculus . . . . . . . . 1303
f′ 40.4.5 derivative of a differentiable function f : U → ℝ for U ∈ Top(ℝ)
f′(p) 40.4.8 derivative of a differentiable function f : U → ℝ for U ∈ Top(ℝ) at p ∈ U
D^+ f(x) 40.10.4 the upper right Dini derivative of f at x
D_+ f(x) 40.10.4 the lower right Dini derivative of f at x
D^− f(x) 40.10.4 the upper left Dini derivative of f at x
D_− f(x) 40.10.4 the lower left Dini derivative of f at x
41. Multi-variable differential calculus . . . . . . . . 1329
∂_k f(p) 41.1.12 kth partial derivative of f at p ∈ U ∈ Top(ℝ^n), k ∈ ℕ_n, f : U → ℝ, n ∈ Zp
∂_k f 41.1.13 kth partial derivative of f, k ∈ ℕ_n, f : U → ℝ, U ∈ Top(ℝ^n), n ∈ Z
C^1(U, ℝ) 41.1.25 set of f : U → ℝ with continuous partial derivatives, U ∈ Top(ℝ^n), n ∈ Zp
∂_k f(p) 41.2.9 kth partial derivative of f : U → ℝ^m at p ∈ U, for U ∈ Top(ℝ^n), m, n ∈ ZÈ
∂_k f 41.2.10 kth partial derivative of f : U → ℝ^m, for U ∈ Top(ℝ^n), m, n ∈ Zp
C^1(U, ℝ^m) 41.2.19 set of f : U → ℝ^m with continuous partial derivatives, U ∈ Top(ℝ^n), m, n ∈ ZË
∂_v f(p) 41.4.5 directional derivative of f : U → ℝ^m at p ∈ U ∈ Top(ℝ^n) in direction v ∈ ℝ^n
df 41.6.8 total differential of totally differentiable function f : U → ℝ^m, U ∈ Top(ℝ^n)
(df)_p 41.6.8 alternative notation for total differential of f at p
∂_k f(p) 41.8.5 kth partial derivative of f : U → W at p ∈ U, for U ∈ Top(ℝ^n), n ∈ Zj
∂_k f 41.8.6 kth partial derivative of f : U → W, for U ∈ Top(ℝ^n), n ∈ Z
C^1(U, W) 41.8.10 set of f : U → W with continuous partial derivatives, U ∈ Top(ℝ^n), n ∈ Zp
∂_v f(p) 41.9.5 Banach space directional derivative, f : U → W, p ∈ U ∈ Top(V), v ∈ V
df 41.9.9 Banach space total differential of f : U → W, U ∈ Top(V)
(df)_p 41.9.9 Banach space total differential at p ∈ U of f : U → W, U ∈ Top(V)
42. Higher-order derivatives . . . . . . . . 1361
∂f 42.1.3 the derivative of a partial function f : ℝ → ℝ
∂^k f 42.1.5 the k-fold derivative of a partial function f : ℝ → ℝ for k ∈ Zj
D^k(U, ℝ) 42.1.9 set of k-times differentiable functions on U ∈ Top(ℝ) for k ∈ Zj
D^k(U) 42.1.9 abbreviation for D^k(U, ℝ)
C^k(U, ℝ) 42.1.10 set of k-times continuously differentiable functions on U ∈ Top(ℝ) for k ∈ Zj
C^k(U) 42.1.10 abbreviation for C^k(U, ℝ)
∂_i f 42.2.4 ith partial derivative of partial function f : ℝ^n → ℝ, n ∈ Zj, i ∈ ℕ_n
∂_α f 42.2.6 partial derivative of partial function f : ℝ^n → ℝ for multi-index α ∈ NE
D^k(U, ℝ) 42.2.8 set of k-times differentiable functions on U ∈ Top(ℝ^n) for n ∈ Zj, k ∈ Zj
D^k(U) 42.2.8 abbreviation for D^k(U, ℝ)
C^k(U, ℝ) 42.2.9 set of k-times continuously differentiable functions on U ∈ Top(ℝ^n) for k ∈ Z
C^k(U) abbreviation for C^k(U, ℝ)
∂_i f ith partial derivative of partial function f : ℝ^n → ℝ^m, n, m ∈ Zf, i ∈ ℕ_n
∂_α f partial derivative of partial function f : ℝ^n → ℝ^m for multi-index α ∈ NF
D^k(U, ℝ^m) functions U → ℝ^m with derivatives to order k ∈ Zg, U ∈ Top(ℝ^n)
C^k(U, ℝ^m) functions U → ℝ^m with continuous derivatives to order k ∈ Z;, U ∈ Top(ℝ^n)
C^k(Ω, ℝ^m) functions f ∈ C^k(U, ℝ^m) with U ∈ Top(Ω), k ∈ Zi, Ω ∈ Top(ℝ^n)
C^k(Ω, S) functions f ∈ C^k(Ω, ℝ^m) with Range(f) ⊆ S, k ∈ Zi, Ω ∈ Top(ℝ^n), S ⊆ ℝ^m
C^k(U, W) 5. functions U → W, continuous derivatives to k ∈ Z{, U ∈ Top(ℝ^n), linear space W
C^{k,α}(U, ℝ^m) 42.5.34 set of functions in C^k(U, ℝ^m) with α-Hölder kth derivatives, α ∈ (0, 1]
43. Integral calculus . . . . . . . . 1385
∫_a^b f(x) dx 43.2.6 integral of generic integrable function f : [a, b] → ℝ for a, b ∈ ℝ
Part(a, b) 43.3.4 set of interval partitions between a, b ∈ ℝ
Part_δ(a, b) 43.3.11 set of interval partitions between a, b ∈ ℝ with mesh not exceeding δ ∈ Rf
mesh(x) 43.3.11 the mesh of an interval partition x
S_C(f, x) 43.3.14 Cauchy sum of function f for interval partition x
Samp(x) 43.4.4 the set of sample sequences for an interval partition x
S_R(f, x, z) 43.4.5 Riemann sum of function f for interval partition x and z ∈ Samp(x)
S_D^+(f, x) 43.5.6 upper Darboux sum of f for interval partition x = (x_i)_{i=0}^n
S_D^−(f, x) 43.5.6 lower Darboux sum of f for interval partition x = (x_i)_{i=0}^n
Ap(f,x) 43.5.6 Darboux oscillation of f for interval partition x = (x_i)_{i=0}^n
I_D^+(f) 43.5.10 upper Darboux integral of f
I_D^−(f) 43.5.10 lower Darboux integral of f
osc_S(f) 43.6.5 oscillation of a function f on a set S
ω_f(t) 43.6.6 oscillation of a function f at a point t
∫_a^b f(x) dα(x) 43.10.6 Riemann-Stieltjes integral of f : [a, b] → ℝ for integrator curve α : [a, b] → ℝ
∫_a^b f dα 43.10.6 Riemann-Stieltjes integral of f : [a, b] → ℝ for integrator curve α : [a, b] → ℝ
44. Differential equations and special functions . . . . . . . . 1425
ln 44.1.2 the logarithm function ln : ℝ^+ → ℝ
exp 44.1.5 the exponential function exp : ℝ → ℝ^+
e 44.1.6 exp(1)
e^x 44.1.6 alternative notation for exp(x) for x ∈ ℝ
arctan 44.2.3 the arctangent function arctan : ℝ → [−π/2, π/2]
arcsin 44.2.3 the arcsine function arcsin : [−1, 1] → [−π/2, π/2]
arccos 44.2.3 the arccosine function arccos : [−1, 1] → [0, π]
π 44.2.5 4 arctan(1)
D^k(Ω, ℝ^m) 44.6.10 set of functions k times differentiable from Ω ⊆ ℝ^n to ℝ^m
BEI 44.6.11 set of locally k-power Riemann-integrable functions from Ω ⊆ ℝ^n to ℝ^m
C^k(Ω̄, ℝ^m) 44.8.10 set of C^k functions with continuous extensions of derivatives to Ω̄
45. Lebesgue measure zero . . . . . . . . 1457
46. Vector field calculus for Cartesian spaces . . . . . . . . 1477
X^k(T(ℝ^n)) 46.1.6 set of C^k vector fields on ℝ^n, with n ∈ Zj, k ∈ Zj
X^k(T(ℝ^n)|U) 46.1.6 set of C^k vector fields on U ∈ Top(ℝ^n), with n ∈ Zf, k ∈ Z?
∂_V Y 46.1.9 directional derivative of vector field Y by a vector V
X^k(Λ_m(T(ℝ^n))) 46.2.7 set of C^k real-valued m-forms on ℝ^n, with m, n ∈ Zj, k ∈ Zg
X^k(Λ_m(T(ℝ^n))|U) 46.2.7 set of C^k real-valued m-forms on U ∈ Top(ℝ^n), with m, n ∈ Zj, k ∈ Zg
X^k(Λ_m(T(ℝ^n), W)) 46.2.9 set of C^k W-valued m-forms on ℝ^n, with m, n ∈ Zi, k ∈ Zi
X^k(Λ_m(T(ℝ^n), W)|U) 46.2.9 set of C^k W-valued m-forms on U ∈ Top(ℝ^n), with m, n ∈ Zi, k ∈ Zt
X^k(ℝ^n) 46.3.5 set of C^k naive vector fields on ℝ^n, with n ∈ Zf, k ∈ Zp
X^k(ℝ^n|U) 46.3.5 set of C^k naive vector fields on U ∈ Top(ℝ^n), with n ∈ Zi, k ∈ Zj
J_φ 46.3.7 Jacobian matrix function for a C^1 local Cartesian space diffeomorphism φ
∂_{p,V} Y 46.4.3 directional derivative of naive vector field Y by vector V at point p
[X, Y] 46.4.8 Lie bracket of naive vector fields X and Y
X^k(Λ_m ℝ^n) 46.6.6 set of C^k real-valued naive m-forms on ℝ^n, with m, n ∈ Zt, k ∈ Zt
X^k(Λ_m ℝ^n|U) 46.6.6 set of C^k real-valued naive m-forms on U ∈ Top(ℝ^n), with m, n ∈ Zj, k ∈ Zg
X^k(Λ_m(ℝ^n, W)) 46.6.8 set of C^k W-valued naive m-forms on ℝ^n, with m, n ∈ Zp, k ∈ Zp
X^k(Λ_m(ℝ^n, W)|U) 46.6.8 set of C^k W-valued naive m-forms on U ∈ Top(ℝ^n), with m, n ∈ Zp, k ∈ Zg
dφ 46.7.4 exterior derivative of naive m-form φ ∈ X^1(Λ_m(ℝ^n, W)), vector version
dφ 46.8.10 exterior derivative of naive m-form φ ∈ X^1(Λ_m(ℝ^n, W)), vector-field version
47. Topological fibre bundles . . . . . . . . 1503
X(E, π, B) 47.4.3 set of global cross-sections of a topological fibration (E, π, B)
X(E, π, B|U) 47.4.3 set of local cross-sections of topological fibration (E, π, B) on U ⊆ B
X(E, π, B) 47.4.3 set of local cross-sections of a topological fibration (E, π, B)
X_loc(E, π, B) 47.4.3 set of open-domain local cross-sections of topological fibration (E, π, B)
X^0(E, π, B) 47.4.7 set of continuous global cross-sections of a topological fibration (E, π, B)
X^0(E, π, B|U) 47.4.7 set of continuous local cross-sections of topological fibration (E, π, B) on U
X^0(E, π, B) 47.4.7 set of continuous local cross-sections of a topological fibration (E, π, B)
X^0_loc(E, π, B) 47.4.7 set of continuous open-domain local cross-sections of top. fibration (E, π, B)
R? 47.8.9 right action map on a principal fibre bundle
R_g 47.8.9 right action map on a principal fibre bundle
(P × F)/G 47.11.6 orbit-space version of an associated topological fibre bundle
P ×_G F 47.11.13 alternative notation for (P × F)/G
X((P × F)/G) 47.12.4 set of contravariant F-valued principal bundle functions on P
X(P ×_G F) 47.12.4 alternative notation for X((P × F)/G)
X^0((P × F)/G) 47.12.14 set of continuous contravariant F-valued principal bundle functions on P
X^0(P ×_G F) 47.12.4 alternative notation for X^0((P × F)/G)
48. Parallelism on topological fibre bundles . . . . . . . . 1541
Isoc(Ep,,Ep,) 48.1.5 the set of topological isomorphisms from fibre Ey, to Ep, 
Autgc (E) 48.1.6 the set of topological automorphisms of fibre set Ey 
Ls 48.1.9 automorphism through the charts lp, o Lgo 9|, DE, £2 Ey 
e 48.3.2 parallelism map between parameters s and t of a curve γ
49. Locally Cartesian spaces . . . . . . . . 1567
dim(M) 49.4.11 dimension of a locally Cartesian topological space 
atlas(M) 49.7.10 the nominal atlas for a locally Cartesian space M 
atlas_p(M) 49.7.10 set of charts ψ ∈ atlas(M) such that p ∈ Dom(ψ)
atlas(M) 49.7.20 the maximal atlas for a topological manifold M
atlas_p(M) 49.7.20 set of charts ψ ∈ atlas(M) such that p ∈ Dom(ψ)
C(M, ℝ) 49.10.3 space of continuous real-valued functions on a manifold M
C̊(M, ℝ) 49.10.6 set of continuous real-valued local functions on a manifold M
C̊_p(M, ℝ) 49.10.7 set of continuous real-valued local functions on a manifold M; p ∈ M
C(M, W) 49.10.10 space of continuous W-valued functions on a manifold M; linear space W
C̊(M, W) 49.10.11 set of continuous W-valued partial functions on a manifold M; linear space W
C̊_p(M, W) 49.10.12 set of continuous W-valued partial functions on manifold M; linear space W, p ∈ M
C(M_1, M_2) 49.11.5 set of continuous maps from M_1 to M_2, for topological manifolds M_1, M_2
C^0(M_1, M_2) 49.11.5 set of continuous maps from M_1 to M_2, for topological manifolds M_1, M_2
50. Topological manifolds . . . . . . . . 1601
dim(M) 50.1.5 dimension of a topological manifold 
51. Differentiable manifolds . . . . . . . . 1625
atlas(M) 51.3.11 the atlas A_M for a differentiable manifold (M, A_M)
atlas_p(M) 51.3.11 the subset {ψ ∈ atlas(M); p ∈ Dom(ψ)} of atlas(M)
A_{M,p} 51.3.11 the subset {ψ ∈ A_M; p ∈ Dom(ψ)} of an atlas A_M
dim(M) 51.3.17 dimension of a differentiable manifold
atlas^k(M) 51.4.7 set of charts on M which are C^k compatible with atlas(M)
atlas^k_p(M) 51.4.7 set of C^k compatible charts ψ ∈ atlas^k(M) with p ∈ Dom(ψ)
J_{βα}(p) 51.4.18 chart transition matrix from ψ_α to ψ_β at p in a C^1 manifold
J_{βα}(p)^i_j 51.4.18 chart transition matrix from ψ_α to ψ_β at p, element ∂_j(ψ_β^i ∘ ψ_α^{-1})
C^k(M, ℝ) 51.6.3 set of C^k real functions on a C^k manifold M
C^k(M) 51.6.3 abbreviation for C^k(M, ℝ)
C^k(Ω, ℝ) 51.6.3 set of C^k real functions on Ω ∈ Top(M) for a C^k manifold M
C^k(Ω) 51.6.3 abbreviation for C^k(Ω, ℝ)
C̊^k(M, ℝ) 51.6.7 C^k real functions on open subsets of a C^k manifold M
C̊^k(M) 51.6.7 abbreviation for C̊^k(M, ℝ)
C̊^k_p(M, ℝ) 51.6.8 C^k real functions on open neighbourhoods of p ∈ M of a C^k manifold M
C̊^k_p(M) 51.6.8 abbreviation for C̊^k_p(M, ℝ)
C^k(Ω, ℝ^m) 51.7.3 set of C^k ℝ^m-valued functions on an open subset Ω of a C^k manifold M
C^k(Ω, W) 51.7.7 set of C^k W-vector-valued functions on Ω ∈ Top(M) of a C^k manifold M
52. Differentiable manifold maps and products . . . . . . . . 1647
C^k(M_1, M_2) 52.1.3 set of C^k maps from a C^k manifold M_1 to a C^k manifold M_2
C̊^k(M_1, M_2) 52.1.4 set of C^k maps from open subsets of C^k manifold M_1 to a C^k manifold M_2
M_1 × M_2 52.6.3 the direct product of differentiable manifolds M_1 and M_2
C^k(M_1, M_2; M_0) 52.6.12 set of componentwise C^k two-variable maps from M_1 and M_2 to M_0
53. Philosophy of tangent bundles . . . . . . . . 1679
𝓜(n, k) 53.4.3 the class of C^k differentiable n-dimensional manifolds, n ∈ Z and k ∈ Zt
𝓕(n, m, k) 53.4.4 class of C^k fibrations with n-dim base space, m-dim fibre space
54. Tangent bundles . . . . . . . . 1701
iow 54.1.2 tangent vector on C^1 manifold M, p ∈ M, v ∈ ℝ^n, ψ ∈ atlas_p(M)
T_p(M) 54.1.4 the set of tangent (line) vectors at a point p in a C^1 manifold M
T(M) 54.1.4 total space of tangent vector fibration of a C^1 manifold M
Lov 54.3.3 tangent vector on C^1 manifold M, p ∈ M, v ∈ ℝ^n, ψ ∈ atlas_p(M)
e dd 54.4.10 chart-basis tangent vector, direction i at p for chart ψ, for C^1 manifold
54.5.7 the velocity chart map for the tangent fibration of a C^1 manifold
54.5.10 pointwise velocity chart map at p for the tangent fibration of a C^1 manifold
tangent (line) (vector) bundle of a C^1 manifold M
54.5.21 manifold chart map for tangent bundle total space T(M) of a C^1 manifold
54.6.2 tangent vector embedding map from a manifold to a submanifold
i 54.7.2 direct product identification map i : T_{p_1}(M_1) × T_{p_2}(M_2) → T_{(p_1,p_2)}(M_1 × M_2)
i 54.7.6 direct product identification map i : T(M_1) × T(M_2) → T(M_1 × M_2)
(V_1, V_2) 54.7.9 “concatenation” of tangent vectors V_1 ∈ T(M_1) and V_2 ∈ T(M_2)



wp 54.9.5 pointwise drop function from T_p(F) to F for linear space F
wr 54.9.5 drop function from T(F) to F for linear space F
Epo 54.10.5 tangent velocity vector on C^1 manifold M, p ∈ M, v ∈ ℝ^n, ψ ∈ atlas_p(M)
T »(M) 54.10.8 linear space of tangent velocity vectors at p in C^1 manifold M
T(M) 54.10.11 total space of tangent velocity vectors of a C^1 manifold M
∂_{p,v,ψ} 54.11.3 tangent differential operator on C^1 manifold M, p ∈ M, v ∈ ℝ^n, ψ ∈ atlas_p(M)


[www.geometry.org/dg.html] [draft: UTC 2023-1-3 Tuesday 00:13]


T,(M ) 54.11.9 linear space of untagged tangent operators at p in a C^1 manifold M
∂_V 54.11.12 tangent differential operator corresponding to tangent vector V
T(M) 54.12.2 total space ⋃_{p∈M} T_p(M) of tangent operators on a C^1 manifold M
oP dd 54.13.5 tangent operator space basis vector, p ∈ M, component i, chart ψ
B ait) 54.14.2 — action of Opn, € T,(M) on f € C! (M,IR?), C! manifold M, m € Zj 
Baal) 54.14.4 action of Op4,y € Ñ, (M) on f € C! (M,W), C! manifold M, linear space W 
T,(M ) 54.15.4 linear space of tagged tangent operators at p in a C^1 manifold M
T(M) 54.15.6 total space of tagged tangent operators of a C^1 manifold M

55. Covector bundles and frame bundles
T^*_p(M) 55.2.2 tangent covector space at p ∈ M, for a C^1 manifold M
E usi 55.2.9 tangent covector in T^*_p(M), for p ∈ M, components w ∈ ℝ^n, ψ ∈ atlas_p(M)
ebd 55.3.3 chart-basis covector with index i at p ∈ M, chart ψ ∈ atlas_p(M)
T^*(M) 55.4.3 the tangent covector fibration total space for a C^1 manifold M
π^* 55.4.4 projection map for tangent covector fibration T^*(M)
o* 55.4.7 the velocity chart map for the covector fibration of a C^1 manifold
T^*(M) 55.4.11 the tangent covector bundle of a C^1 manifold M
T5(M) 55.5.5 the tangent vector r-tuple space at p in a C! manifold M for r € Zj 
T"(M) 55.5.7 the tangent vector r-tuple total space for a C! manifold M for r € Zj 
a 55.5.8 projection map for tangent vector r-tuple fibration T"(M), r € Z* 
55.5.14 tangent vector r-tuple in Tp (M), for p € M, w € (R")', v € atlas, (M), r € Zi 
en? 55.5.19  chart-basis tangent vector r-tuple, for p in a C! manifold M, chart v, i € N? 
o" 55.5.25 velocity chart map for the vector r-tuple fibration of a C! manifold, r € Zi 
wr 55.5.33 manifold chart map for the vector r-tuple fibration of a C! manifold, r € Zt 
$7 (a)! 55.5.40 multilinear function on tangent vector r-tuple space for chart v, r € Zt 
9 (y) 55.5.40 multilinear function on tangent vector r-tuple space for chart w at p, r € Zt 
Fp(M) 55.6.4 the tangent vector r-frame space at p in a C! manifold M for r € Zj 
Fp(M) 55.6.4 the tangent vector n-frame space at p in a C! manifold M, where n = dim(M) 
F'(M) 55.6.6 the tangent vector r-frame total space for a C! manifold M for r € Zi 
F(M) 55.6.6 the tangent vector n-frame total space for a C! manifold M, where n = dim(M) 
T” 55.6.8 projection map for tangent vector r-frame fibration F"(M) for r € Z* 
Fair 55.6.13 the set of linearly independent r-tuples of vectors in R” for n,r € Zi 
nd 55.6.15 tangent vector r-frame in F$ (M), for p € M, w € (R")', v € atlas (M), r € Zo 
Qo" 55.6.18 velocity chart map for the vector r-frame fibration of a C! manifold, r € Zi 
y" 55.6.26 manifold chart map for the vector r-frame fibration of a C! manifold, r € Z* 
nF 55.7.8 projection map for principal frame bundle F(M) 

56. Tensor bundles and multilinear map bundles
Ty (M) 56.1.5 multilinear-map tensor space, type (r,s), p € M, C! manifold M 
Tee | 56.1.6 linear-map tensor space, type (r,s), p € M, C! manifold M 
Am 56.1.41 tangent multilinear-map tensor, p € M, a: N^ —> R, Y € Am, r,s € ZÈ 
bcd 56.1.12 tangent linear-map tensor, p € M, a: NZ^5 —> R, y € Am, r,s € Zi 
T"*(M) 56.3.4 tensor fibration total space Ucn Tp (M) 
T"*(M) 56.3.5 tensor fibration total space LJ, e; Tp’ (M) 
qu 56.3.7 projection map for multilinear-style tensor fibration T^*(M) 
qo 56.3.8 projection map for linear-style tensor fibration T"**(M) 
Q^ 56.3.13 bundle chart map for multilinear-style tensor total space, r,s € Zi 
prs 56.3.14 bundle chart map for linear-style tensor total space, r,s € Zi 
wns 56.3.21 manifold chart map for multilinear-style tensor total space, r, s € Zg 
yr? 56.3.22 manifold chart map for linear-style tensor total space, r,s € Zi 
5 56.4.4 element of Z,(T,(M), re Zi, pe M,a: Nau) 7 R, v € atlas, (M) 
ℒ_r(T(M)) 56.4.9 multilinear function bundle ⋃_{p∈M} ℒ_r(T_p(M))
T*"(M) 56.4.9 alternative notation for ℒ_r(T(M))

ner 56.4.10 projection map for multilinear function fibration .Z;.(T'(M)) 

per 56.4.13 bundle chart map for multilinear function fibration T-^"(M) = %.(T(M)) 
Ur 56.4.18 manifold chart map for total space T^" (M) 

b 56.5.9 antisymmetric multilinear function in A,(T,(M)) = Zr (T;(M)) 

P 56.5.9 alternative notation for Eu 

ℒ^-_r(T(M)) 56.5.14 antisymmetric multilinear function bundle ⋃_{p∈M} ℒ^-_r(T_p(M))
Λ_r(T(M)) 56.5.14 antisymmetric multilinear function bundle ⋃_{p∈M} Λ_r(T_p(M))
T2"-(M) 56.5.14 alternative notation for ℒ^-_r(T(M))
T*"(M) 56.5.14 alternative notation for Λ_r(T(M))

‘in 56.5.16 projection map for antisymmetric multilinear function fibration Z-(T(M)) 
qr 56.5.17 projection map for antisymmetric multilinear function fibration A,(T(M)) 
Gen 56.5.20 bundle chart map for fibration Y-(T(M)) = A,(T(M)) = T^"(M) 

p^r 56.5.20 bundle chart map for fibration -27 (T(M)) = A,(T(M)) = T^"(M) 

pAr 56.5.25 manifold chart map for total space T^"(M) = A,(T(M)) = £7 (T(M)) 
ℒ^+_r(T(M)) 56.6.3 symmetric multilinear function bundle ⋃_{p∈M} ℒ^+_r(T_p(M))
T2"+(M) 56.6.3 alternative notation for ℒ^+_r(T(M))
qr 56.6.5 projection map for symmetric multilinear function fibration ℒ^+_r(T(M))
ℒ_r(T(M),W) 56.7.4 multilinear map bundle ⋃_{p∈M} ℒ_r(T_p(M), W), linear space W
ℒ^+_r(T(M),W) 56.7.4 symmetric multilinear map bundle ⋃_{p∈M} ℒ^+_r(T_p(M), W), linear space W
ℒ^-_r(T(M),W) 56.7.4 antisymmetric multilinear map bundle ⋃_{p∈M} ℒ^-_r(T_p(M), W), linear space W
Λ_r(T(M),W) 56.7.4 antisymmetric multilinear map bundle ⋃_{p∈M} Λ_r(T_p(M), W), linear space W
qn 56.7.6 projection map for antisymmetric multilinear function fibration A,.(T(M), W) 
is 56.7.8 vector-valued antisymmetric multilinear map on a tangent space 

qn W 56.7.12 bundle chart map for fibration A,(T(M), W) 

nw 56.7.16 manifold chart map for total space A,(T'(M), W) 


57. Vector fields, tensor fields, differential forms 
set of vector fields in a C^1 manifold M
set of local vector fields on a subset S of a C^1 manifold M
set of local vector fields on subsets of C^1 manifold M
set of local vector fields on open subsets of C^1 manifold M
differential operator field on a manifold, for a given function
differential operator field on a manifold, at a given point
chart-basis vector field for M with respect to chart ψ, i ∈ ℕ_n, n = dim(M)
constant vector field extension of vector z by a chart ψ
constant vector field extension of vector z by a chart ψ
set of C^k vector fields on a C^{k+1} manifold M
set of C^k local vector fields on a C^{k+1} manifold M, known subdomain U
set of C^k local vector fields on a C^{k+1} manifold M, unknown subdomain
set of tangent operator fields on a C^1 manifold M
set of C^k tangent operator fields on a C^{k+1} manifold M
set of tagged tangent operator fields on a C^1 manifold M
set of C^k tagged tangent operator fields on a C^{k+1} manifold M
chart-basis operator field for chart ψ, index i

chart-basis vector m-tuple field on M, chart y, i € N™, n = dim(M), m € Zj 


Extn, (V) 57.4.5 chart-dependent “constant extension” of vector-tuple V = (V4) 1, chart ψ
Extnj (V) 57.4.9 transposed chart-dependent “constant extension” of V = (V4) 1, chart ψ
X(T^{r,s}(M)) 57.5.3 set of tensor fields, type (r,s), on manifold M
X(T^{r,s}(M), S) 57.5.4 set of tensor fields, type (r,s), on subset S of manifold M
X^k(T^{r,s}(M)) 57.5.6 set of C^k tensor fields, type (r,s), on manifold M
A per 57.5.9 set of C^k tensor fields, type (r,s), on subset Q
X(Λ_m(T(M))) 57.6.3 set of differential m-forms on C^1 manifold M
X(Λ_m(T(M)) | S) 57.6.4 set of differential m-forms on subset S of C^1 manifold M
X(Λ_m(T(M),W)) 57.6.6 set of W-valued differential m-forms on C^1 manifold M
X(Λ_m(T(M),W) | S) 57.6.7 set of W-valued differential m-forms on subset S of C^1 manifold M
X^k(Λ_m(T(M))) 57.6.14 set of C^k differential m-forms on C^{k+1} manifold M




set of C^k W-valued differential m-forms on C^{k+1} manifold M
set of C^k differential m-forms on C^{k+1} manifold M, subdomain U
X*(An(T(M), W set of C^k W-valued differential m-forms on C^{k+1} manifold M, subdomain U
X(Λ_m(T(M))) 57.7.12 set of short-cut differential m-forms on a C^1 manifold M
X^k(Λ_m(T(M))) 57.7.12 set of short-cut C^k differential m-forms on a C^{k+1} manifold M
X(Λ_m(T(M),W)) 57.7.15 set of short-cut W-valued differential m-forms on a C^1 manifold M
X^k(Λ_m(T(M),W)) 57.7.15 set of short-cut C^k W-valued differential m-forms on a C^{k+1} manifold M
X(Λ_m(T(M)) | U) 57.7.23 short-cut differential m-forms, C^1 manifold M, subdomain U
X^k(Λ_m(T(M)) | U) 57.7.23 short-cut C^k differential m-forms, C^{k+1} manifold M, subdomain U
X(Λ_m(T(M),W) | U) 57.7.23 short-cut W-valued differential m-forms, C^1 manifold M, subdomain U
X^k(Λ_m(T(M),W) | U) 57.7.23 short-cut C^k W-valued differential m-forms, C^{k+1} manifold M, subdomain U
X_loc(Λ_m(T(M))) 57.7.24 local short-cut differential m-forms, C^1 manifold M
X^k_loc(Λ_m(T(M))) 57.7.24 local short-cut C^k differential m-forms, C^{k+1} manifold M
X_loc(Λ_m(T(M),W)) 57.7.24 local short-cut W-valued differential m-forms, C^1 manifold M
X^k_loc(Λ_m(T(M),W)) 57.7.24 local short-cut C^k W-valued differential m-forms, C^{k+1} manifold M
y 57.9.2 tangent vector field of a differentiable curve γ
Ovy(t) 57.9.7 tangent vector field of a differentiable curve γ, value at t ∈ Dom(γ)
58. Differentials of functions and maps
(df)_p 58.1.2 differential of real-valued function f at p in a differentiable manifold
df 58.2.3 differential of real-valued function f on a differentiable manifold
T_{p_1,p_2}(M_1, M_2) 58.3.3 pointwise induced-map tangent space, p_1 ∈ M_1, p_2 ∈ M_2, manifolds M_1, M_2
T(M_1, M_2) 58.3.5 total induced-map tangent space, manifolds M_1, M_2
T^*_{p_1,p_2}(M_1, M_2) 58.3.8 pointwise induced-map tangent covector space, p_1 ∈ M_1, p_2 ∈ M_2
T^*(M_1, M_2) 58.3.10 total induced-map tangent covector space, manifolds M_1, M_2
(dφ)_p 58.4.5 differential of C^1 map φ : M_1 → M_2 at p ∈ M_1, for C^1 manifolds M_1, M_2
(d$)$ 58.5.8 differential of φ at p for target (sub)manifold S
n% 58.5.8 tangent vector embedding map from submanifold S to ambient manifold M
(dφ)_{p_1,p_2} 58.6.2 direct product decomposition of differential of map φ at (p_1, p_2) ∈ M_1 × M_2
(d$); 58.8.2 pull-back of covectors via a C^1 map φ : M_1 → M_2 at p ∈ M_1
(d$); 58.8.7 pull-back of vector-valued covectors via a C^1 map φ : M_1 → M_2 at p ∈ M_1
dφ 58.9.2 differential of C^1 map φ : M_1 → M_2, for C^1 manifolds M_1, M_2
ds 58.9.4 induced map of C^1 map φ : M_1 → M_2, for C^1 manifolds M_1, M_2
ds 58.9.16 induced vector field map of C^1 map φ : M_1 → M_2, for C^1 manifolds M_1, M_2
ben 58.10.2 global direct product decomposition of differential of map φ on M_1 × M_2
o* 58.11.3 global pull-back of covectors via a C^1 diffeomorphism φ : M_1 → M_2
à" 58.11.5 pull-back of real-valued 1-forms via a C^1 map φ : M_1 → M_2
o* 58.11.10 pull-back of vector-valued r-forms via a C^1 map φ : M_1 → M_2
59. Recursive tangent bundles and differentials
T_z(T(M)) 59.1.9 second-level tangent-line set at z ∈ T(M), C^2 manifold M
be wp) 59.1.11 second-level tangent vector, z € T(M), w € IR?",  € atlas,(T(M)) 
B aod 59.1.12 second-level tangent vector, p € M, v, ù, ù% € R”, v € atlas, (T(M)) 
T^{(k)}(M) 59.1.26 level-k tangent bundle T(T^{(k-1)}(M)), k ∈ ℤ^+
T_{z,V}(T(M)) 59.2.5 the set of vectors in T_z(T(M)) which have horizontal component V
Wz 59.2.9 vertical drop function from ker((dπ)_z) to T_{π(z)}(M) for a C^2 manifold M
w 59.2.15 global vertical drop function from ⋃_{z∈T(M)} ker((dπ)_z) to T(M), C^2 manifold M
a 59.3.2 oblique drop function from T_z(T(M)) to T_{π(z)}(M) for a C^2 manifold M
a 59.3.5 oblique drop function from T(T(M)) to T(M), C^2 manifold M, ψ ∈ atlas(M)
Ey 59.6.3 swap function from T_{z,V}(T(M)) to T_{V,z}(T(M)) for C^2 manifold M
= 59.6.6 global swap function from T(T(M)) to T(T(M)) for C^2 manifold M
q" 59.8.2 the second-order tangent vector field of a C^2 curve in a C^2 manifold
d^k γ 59.8.10 differential d(d^{k-1} γ), where d^0 γ = γ, for curves γ in M
d^2 φ 59.12.2 double differential d(dφ) of a map φ : M_1 → M_2 for manifolds M_1, M_2
(d^2 φ)_z 59.12.2 double differential at z ∈ T(M) of a map φ : M_1 → M_2 for manifolds M_1, M_2
60. Higher-order tangent operators
OF iu 60.2.2 second-order tangent operator at p € M 
D (M) 60.2.4 set of second-order tangent operators 5 us at pE M 
TI(M) 60.2.4 set of second-order tangent operators abus on M 
P 60.2.12 tagged second-order operator, p € M, a € Sym(n, R), b € R”, v € atlas; (M) 
Te (M) 60.2.13 set of tagged second-order tangent operators Deas peM 
T? (M) 60.2.13 set of tagged second-order tangent operators PM on M 
ig 60.5.6 second-order tangent vector, p € M, a € Sym(n, IR), b € R^, v» € atlas; (M) 
T? (M) 60.5.8 set of second-order tangent vectors iin at pe M 
T? (M) 60.5.8 set of second-order tangent vectors T MR on M 


Ow 60.5.14 second-order tangent vector DD ial e TU (M) for W = hg e TPI(M) 
61. Vector field calculus
∂_X 61.1.3 action of a vector field X ∈ X(T(M)) on functions in C^1(M, ℝ)
∂_V Y 61.2.3 naive derivative of vector field Y by a vector V
∂_X Y 61.4.2 naive derivative of vector field Y by a vector field X
oF 61.4.6 action of a vector V on type (r,s) tensor fields
oF 61.4.7 action of a vector field X on type (r,s) tensor fields
[X,Y] 61.5.7 Lie bracket of vector fields X and Y
L_X Y 61.8.4 Lie derivative of vector field Y with respect to vector field X
dφ 61.10.3 exterior derivative of m-form φ ∈ X^1(Λ_m(T(M), W)), vector version
dφ 61.14.9 exterior derivative of m-form φ ∈ X^1(Λ_m(T(M), W)), vector-field version
62. Lie groups

Lg 62.3.3 left translation operator for elements of a Lie group 

Lg 62.3.3 left translation operator for real-valued functions on a Lie group 

ÈT 62.3.3 left translation operator for tangent operators on a Lie group 

LE 62.3.3 left translation operator for tangent operator fields on a Lie group 

L? 62.3.9 left translation operator for tangent vectors on a Lie group 

L 62.3.9 left translation operator for tangent vector fields on a Lie group 

X1(T(G)) 62.4.3 set of left-invariant vector fields on a Lie group G 

X*(T(G)) 62.4.3 set of left-invariant C^k vector fields on a Lie group G

XL 62.4.15 the left invariant vector field on a Lie group G such that X¢ (e) = V 
right translation operator for elements of a Lie group 

right translation operator for real-valued functions on a Lie group 
right translation operator for tangent operators on a Lie group 

right translation operator for tangent operator fields on a Lie group 
right translation operator for tangent vectors on a Lie group 

right translation operator for tangent vector fields on a Lie group 

set of right-invariant vector fields on a Lie group G 

set of right-invariant C^k vector fields on a Lie group G

the right invariant vector field on a Lie group G such that X#(e) = V 
the adjoint map for an element g of a Lie group G 


63. Lie transformation groups 


63.2.8 


the velocity field family of a diffeomorphism field y on a C^1 manifold
Lie group of linear transformations of finite-dimensional linear space V
Lie group of general linear transformations of ℝ^n
Lie group of general linear transformations of ℝ^n
vector field on M generated by U ∈ T_e(G), Lie left transf. group (G, M)
set of vector fields generated on M by Lie left transformation group (G, M)
map from vector fields Xj to generators U ∈ T_e(G)
vector field on M generated by U ∈ T_e(G), Lie right transf. group (G, M)


XE (E, m, M) 


OvY 
7] 
Tp 


2021 
the set (y € T;(E); 7.(y) = V) of vectors with horizontal component V 
horizontal fibre set vector field, OFB (E, π, M), V ∈ T(M), fibre chart φ
horizontal submanifold φ^{-1}({q}) of a fibration with fibre chart φ
drop function at z ∈ E for a C^1 fibration (E, π, B) and fibre chart φ
drop function for C^1 fibration (E, π, B) and fibre chart φ
set of C^k cross-sections of a differentiable fibration (E, π, M)
set of C^k local cross-sections of differentiable fibration (E, π, M), domain U
set of C^k local cross-sections of a differentiable fibration (E, π, M)
naive derivative of fibration cross-section Y by a vector V

fibre-set tangent vector embedding 1) : Je; T (Ep) > U.eg Tz,0(£) 
fibre-set tangent vector embedding np : T(E») > Uzer, Tzo(E), p € M 
fibre-set tangent vector embedding n? : TZ(Ez;(;)) > T,o(E),z€ E 
inverse of 7 embedding, Ñ : Uze g T2,0(E) > Upem T(E») 

inverse of np embedding, 7, : Uien, T,o(E)— T(Ep), pe M 

inverse of n” embedding, 5? : T; o(E) > T, (Er(2)), z € E 

vertical vector field induced on total space E by u € T,(G), chart à 
non-vertical vector field on fibre set E, for u € T.(G), chart 9, V € T,(M) 


Tp (E, ng, M) 


C 


65.3.5 
65.3.5 
65.4.2 
65.4.2 
65.7.4 
65.7.4 


vertical drop function at z € E for C! vector bundle (E,7, B, AZ) 
vertical drop function for C! vector bundle (E, 7, B, AL) 
oblique drop function at z € E for C! vector bundle (E, v, B, A5), 6 € At " 


oblique drop function for C1 vector bundle (E, mt, B, AL), 6 € AZ 
vector r-tuple space of vector bundle (E, tg, M) at p € M 

vector r-tuple total space of vector bundle (E, mpg, M) 

vector r-tuple bundle fibre atlas for vector bundle (E, ng, M) 
vector r-tuple bundle manifold atlas for vector bundle (E, ng, M) 
vector r-frame space of vector bundle (E, xg, M) at pe M 

vector r-frame total space of vector bundle (E, nrg, M) 

vector r-frame bundle fibre atlas for vector bundle (E, tg, M) 
vector r-frame bundle manifold atlas for vector bundle (E, mg, M) 


66. Differentiable principal bundles
R? 66.2.10 pointwise right action map on a principal G-bundle P for g ∈ G
R? X 66.2.21 right translation operator by g ∈ G for vector fields on principal bundle P
LE 66.4.2 left action g ↦ zg on principal G-bundle P for g ∈ G, z ∈ P
À 66.5.2 infinitesimal action map of Lie algebra on principal bundle P
Àz 66.5.2 infinitesimal action of Lie algebra on principal bundle P for z ∈ P
Àu 66.6.2 transposed infinitesimal action of Lie algebra on principal bundle P, u ∈ T_e(G)
X^k((P × F)/G) 66.8.3 set of C^k contravariant F-valued principal bundle functions on P
X^k(P ×_G F) 66.8.3 alternative notation for X^k((P × F)/G)


67. Connections on ordinary fibre bundles
0 67.4.2 horizontal lift function (i.e. connection) on a fibration 
0 67.5.4 horizontal lift function (i.e. connection) on fibre bundle 
lifto 67.5.9 lift of vector field by horizontal lift function 0 on fibre bundle 
lifto 67.5.10 lift of local vector field by horizontal lift function 0 on fibre bundle 
Qg(V, à) 67.6.5 generator of connection @ on fibre bundle (E, vg, M, AZ) for V € T(M), 6 € AE 
0? 67.7.2 localisation of horizontal lift function 0 via fibre chart ¢ 
0 67.1.2 localisation of horizontal lift function 0 via fibre chart ¢ for velocity V 
7 67.8.2 transposed horizontal lift function (i.e. connection) on a fibration 
lifts 67.8.6 lift of vector field by transposed horizontal lift function 0 on fibre bundle 
hz 67.9.2 horizontal component map at z € E for a connection on OFB E 
Qz 67.9.6 horizontal subspace at z € E for a connection on OFB E 
Q 67.9.6 horizontal subspace map for a connection on OFB E 
Uz 67.10.2 vertical component map at z € E for a connection on OFB E 

68. Connections on vector bundles
Do 68.2.5 chart-dependent covariant derivative on fibre bundle, connection 0, chart @ 
DY X 68.2.9 covariant derivative on vector bundle, connection 0, vector V, cross-section X 
wp) 68.3.2 Cartan-style connection coefficients on a vector bundle at p 

69. Connections on principal bundles
B 69.1.3 horizontal lift function (i.e. connection) on a principal bundle 
lift g 69.1.16 lift of vector field by horizontal lift function 6 on principal bundle 
3 69.2.2 transposed horizontal lift function (i.e. connection) on a principal bundle 
lifts 69.2.7 lift of vector field by transposed horizontal lift function 8 on principal bundle 
hz 69.3.2 horizontal component map at z € P for a connection on a PFB P 
Qz 69.3.7 horizontal subspace at z € P for a connection on a PFB P 
Wz 69.5.4 pointwise connection form at z € P for a principal bundle P 
Ww 69.5.4 global connection form for a principal bundle P 
Uz 69.4.2 vertical component map at z € P for a connection on a PFB P 
A” 69.11.3 gauge potential construction map for connection form ω
AX 69.11.3 gauge potential constructed from connection form ω and cross-section X
Ay 69.13.2 gauge potential component function for a principal bundle
AX y 69.13.3 connection form ω localised component function, cross-section X
AX v 69.13.7 connection form ω localised component function, cross-section X
EX uu 69.14.3 pulled-back lifted chart-basis vector on principal bundle
OF ea 69.14.4 pulled-back lifted chart-basis operator on principal bundle
70. Curvature of connections on fibre bundles
p 70.3.8 Riemann (parallelism) curvature of a connection θ at a point p
p? 70.3.9 Riemann (parallelism) curvature of a connection θ
Dω 70.5.2 curvature form of a connection form ω
Q8 70.5.4 curvature form for connection ( on principal bundle 
08 70.5.4 restriction oL. (py? of curvature form QÊ for z in a principal bundle P 
71. Affine connections on tangent bundles
LÀ. 71.2.2 components of the Christoffel array for an affine connection
D_V X 71.6.4 covariant derivative of vector field X by vector V
D_Y X 71.6.9 covariant derivative of vector field X by vector field Y
DX 71.7.2 covariant derivative of vector field X along a curve
DX (t) 71.7.4 covariant derivative of vector field X along a curve at parameter t ∈ Dom(X)
div X 71.8.4 the divergence of X ∈ X^1(M, ℝ) for a Riemannian manifold (M, g)
Hu 71.11.8 components of the Riemann curvature tensor of an affine connection
72. Geodesics and Jacobi fields
73. Riemannian manifolds
‖v‖_g 73.2.13 length of a vector v in a Riemannian manifold (M, g)
g^ 73.3.2 component array field of metric tensor field g with respect to chart ψ
DA k 73.3.6 kth partial derivative of component field g_{ij} for chart ψ
au 73.3.6 (k, ℓ)th partial derivative of component field g_{ij} for chart ψ
gu 73.3.8 inverse component array field of metric tensor field g with respect to chart ψ
ps 73.5.3 index-lowering isomorphism or flat musical isomorphism for metric g
Bu 73.5.4 index-raising isomorphism or sharp musical isomorphism for metric g
74. Levi-Civita parallelism and curvature
grad f 74.6.3 the gradient of f ∈ C^1(M, ℝ) for a Riemannian manifold (M, g)
75. Pseudo-Riemannian manifolds
76. Spherical geometry
78.2. Abbreviations


abbreviation reference meaning 
AC 7.1.10 axiom of choice 
AD Anno Domini [i.e. Common Era] 
a.k.a. also known as 
BC Before Christ [i.e. Before Common Era] 
BCE Before Common Era 
BG 7.1.10 Bernays-Gödel (set theory)
BV 38.10.1 bounded variation 
CC 7.1.10 countable choice axiom 
CCD 20.10.6 charge-coupled device 
CE Common Era 
DG differential geometry 
DNA 72.2.2 deoxyribonucleic acid 
EDM 79.2 Encyclopedic dictionary of mathematics [112, 113] 
FB 69.15.1 fibre bundle 
FLS 22.2.11 free linear space 
FOL+EQ 7.5.12 first-order language with equality
FTOC 43.8.1 fundamental theorem of calculus 
GL 25.14.1 general linear group 
GR general relativity 
HOL 10.2.20 higher-order logic 
IOU 7.10.3 I owe you 
KEM 79.2 Kleine Enzyklopädie Mathematik [103]
LTS 51.11.4 local transformation semigroup 
MM 3.2.4 metamathematics 
MP 4.3.17 modus ponens 
MSC 1.8 mathematics subject classification 
NAND 3.7.14 not AND 
NBG 7.1.10 Neumann-Bernays-Gödel (set theory) 
NOR 3.7.14 not OR 
O 25.14.1 orthogonal group 
ODE 44.0.1 ordinary differential equation 
OFB 21.8.1 ordinary fibre bundle 
PC 4.5.5 propositional calculus 
PDE 44.0.5 partial differential equation 
PFB 21.9.2 principal fibre bundle 
QC 6.1.1 predicate calculus [i.e. quantifier calculus] 
QED 1.5.6 quod erat demonstrandum 
RAA 3.1.4 reductio ad absurdum 
SL 25.14.1 special linear group 
SLR 20.10.6 single lens reflex 
SO 25.14.1 special orthogonal group 
SU 25.14.1 special unitary group 
TLS 39.1.1 topological linear space 
TS 55.0.1 tangent space 
TVS 39.1.1 topological vector space 
U 25.14.1 unitary group 
VB 69.15.1 vector bundle 
wf 4.1.6 well-formed formula 
wff 4.1.6 well-formed formula
XOR 3.7.16 exclusive OR 
ZF 7.1.10 Zermelo-Fraenkel (set theory) 
ZFC 7.11.9 Zermelo-Fraenkel (set theory) with axiom of choice 


78.3. Literature survey tables 


The following table is a list of literature survey tables in this book. 


part table table title 
I 3.7.1 Survey of logic operator notations 
4.2.1 Survey of propositional calculus formalisation styles 
4.8.1 Survey of deduction metatheorem presentations for propositional calculus 
5.2.1 Survey of logical quantifier notations 
6.1.2 Survey of predicate calculus formalisation styles 
14.1.1 Survey of definitions and notations for “natural numbers” 
II 18.11.2 Survey of algebraic set-class definition terminology
21.3.1 Survey of terminology for cross-sections 
21.5.2 Survey of fibre chart styles 
22.11.1 Survey of convex hull terminology and notation 
26.10.1 Survey of affine space definitions 
28.5.1 Survey of tensor space definitions 
29.5.1 Survey of tensor space degree and type terminology 
30.4.2 Survey of general and antisymmetric tensor space notations 
III 31.8.1 Survey of topological interior, closure and boundary notations
36.1.1 Survey of meanings of “curve” and “path” 
IV 49.2.3 Survey of manifold definition core structures
53.3.1 Survey of tangent vector definitions 
58.4.1 Survey of terminology for differentials and induced maps 
67.2.2 Survey of definitions which have the name “connection” 
70.0.1 Survey of presentations of curvature of connections 
71.11.2 Survey of Riemann curvature tensor component array notations 
Chapter 79
BIBLIOGRAPHY
79.1 Differential geometry books
79.2 Other mathematics books
79.3 Mathematics journal articles
79.4 Ancient and historical mathematics books
79.5 History of mathematics
79.6 Physics and mathematical physics books
79.7 Physics and mathematical physics journal articles
79.8 Logic and set theory books
79.9 Logic and set theory journal articles
79.10 Anthropology, linguistics and neuroscience
79.11 Philosophy and ancient history
79.12 Dictionaries, grammar books and encyclopedias
79.13 Other references
79.14 Personal communications
6. Yvonne Suzanne Marie-Louise Choquet-Bruhat, Géométrie différentielle et systèmes extérieurs,
abbreviated graph, 11.2.3
abbreviated notation for proposition name map, postfix,
3.11.10 

abbreviations, 78.2 

Abel, Niels Henrik, 17.3.1, 17.3.24, 77.1.6 

Abel, Niels Henrik, elliptic integrals, 44.2.20 

Abelian group, 17.3.24 

abgeschlossene Hülle, topological closure, 31.8.10 

absolute parallelism, 26.1.2 

absolute parallelism, non-topological fibration, 21.15.2 

absolute value, complex numbers, 16.8.8 

absolute value, real numbers, 16.5.2 

absolute value function, 18.5.6 

absolute value function, standard, 18.5.14 

absolute value function, trivial, 18.6.10 

absolute value function on ring, 18.5 

absorbing set in a linear space, 24.6.9 

abstract direct sum of linear spaces, 24.1.4 

abstract expressionism, 35.1.1 

abstract groups, 20.1.1 

abstract linear space, Cartesian, 26.11.1 

abstract linear spaces, differentiation of maps, 41.9.1 

abstraction, cosmic, north, 26.10.1 

absurd consequences, reductio ad absurdum, 3.1.5 

absurdly big sets, 12.5.4

absurdly infinite sets, 13.4.1 

AC (axiom of choice), 7.1.10 

AC-enhanced theorem, 7.11.14 

AC-free theorem example, 11.8.3 

AC pixie, 22.7.20 

AC-tainted theorem, 7.11.14, 10.3.9, 10.5.17, 10.11.10, 
11.6.23, 22.7.21, 22.7.27, 23.5.9, 23.5.10, 33.4.21 

AC-tainted theorem examples, 7.11.13 

Academy, St. Petersburg, 44.2.6 

acceleration function, second-order ODE right-hand side, 
44.3.3 

acceleration of curve, covariant, 71.7.5 

accumulation point, w, 31.10.16 

accumulation point of set, 31.10.2 

accumulation point of set, oo-, 31.10.17 

Achilles and tortoise, Zeno of Elea, 31.2.3 

act freely, left transformation group, 20.3.2 

act freely, right transformation group, 20.7.11 

act transitively, left transformation group, 20.4.1 

act transitively, right transformation group, 20.7.14 

action, differential, left, 63.6.1 

action, dual, family of diffeomorphisms, 63.3.3 

action, dual, of diffeomorphism family, 63.3 

action, dual differential, diffeomorphism family, 63.3.5 

action, infinitesimal, 67.5.1 
action, infinitesimal group, principal bundle, 66.5

action map, infinitesimal, identity chart, 66.5.9

action map, infinitesimal, of Lie algebra elements on a 
principal fibre bundle, 66.5.2 

action map, infinitesimal transposed, principal bundle, 66.6.2 

action map, left, 20.1.2

action map, left, by principal bundle element, 21.11.18, 66.4.2 

action map, left group, principal bundle, 66.4 

action map, pointwise right, principal fibre bundle, 66.2.10 

action map, right, 20.7.2

action map, right, effect on connection form, 69.8 

action map, right, non-topological principal bundle, 21.11 

action map, right, non-topological principal fibre bundle, 
21.11.4 

action map, right, principal bundle, 66.2.2 

action map, right, topological principal bundle, 47.8.7 

action map, transposed infinitesimal, properties, 66.6.8 

action of curve on vector field, manifold, 61.2.7 

action of group, differentiable, fibre sets, 64.13.1 

action of vector field on real-valued function, 61.1, 61.1.3 

action of vector field on tensor field, 61.4.7

action of vector field on vector field, 61.4.2 

action of vector on fibration cross-section, 64.7.7 

action of vector on naive vector field, Cartesian space, 46.4.2 

action of vector on real-valued function, 61.1.2 

action of vector on tensor field, 61.4.6 

action of vector on vector field, Cartesian space, 46.1.8 

action of vector on vector field, manifold, 61.2.3 

active set, algebra, 17.0.3 

actually infinite versus potentially infinite, 12.1.32, 13.12.4 

acyclic graph, directed, 11.2.3

addition, defective, extended real numbers, 16.2.8 

addition, finite ordinal number, commutativity, 13.6.6 

addition, finite ordinal number, existence, 13.6.2 

addition, finite ordinal numbers, 13.6

addition, real numbers, 15.7

addition, rectangular matrix, 25.3 

addition, vector, 22.1.1

addition of sets, Minkowski, 22.10.2, 22.10.4 

addition operation, finite ordinal numbers, 13.6.3 

addition operation, real numbers, 15.7.2 

address, matrix element, 25.2.11

adherent set, topological closure, 31.8.10 

adjoint map, conjugate map differential, 62.10.4 

adjoint map, Lie group, 62.10, 62.10.2 

adjoint map, linear space, 23.11.3 

adjoint map by Lie group element on Lie algebra, 62.10.1 

aesthetics, tangent vectors, 53.1.8 

aether, luminiferous, 48.2.4 

affine, etymology, 71.0.5 
affine combination of points, 24.4.3
affine connection, 26.1.3, 71, 71.1.2
affine connection, abstract, torsion, 71.12.4
affine connection, Cartan-style, 71.14.1
affine connection, Christoffel array, 71.2.2
affine connection, coefficient array field, 71.2.2
affine connection, covariant derivative, 71.6
affine connection, covariant derivative Leibniz rule, 71.6.7
affine connection, curl, 71.11.2
affine connection, curvature, 71.11
affine connection, grooves in space, 60.1.4
affine connection, metric-free parallelism, 71.6.1
affine connection, Ricci curvature tensor, 71.11.11
affine connection, Riemann curvature tensor field, 71.11.5
affine connection, tensorisation coefficients, 60.4.12
affine connection, terminology origin, Hermann Weyl, 67.1.1, 71.0.1
affine connection, torsion, 71.12
affine connection, transposed, 71.5.2
affine connection curvature, principal fibre bundle, 71.4.6
affine connection example, non-zero torsion, 71.12.6
affine connection on frame field, 71.14, 71.14.2
affine connection on tangent bundle, coefficients, 71.13
affine-invariant geometry, 77.2.5
affine manifold, 26.1.3
affine space, 26, 26.1.3
affine space, automorphism group, 26.1.6
affine space, convex combination of points, 26.9.14
affine space, etymology, 77.2
affine space, hyperplane segment through points, 26.9.12
affine space, hyperplane through points, 26.9.11
affine space, line segment, 26.9.7
affine space, line through points, 26.9.2
affine space, manifold chart, 26.2.7
affine space, span of points, 26.9.13
affine space definitions, survey, 26.10.1
affine space discussion, 26.1
affine space over a group, 26.2, 26.2.2
affine space over a group, tangent space, 26.3.2
affine space over a linear space, 26.10
affine space over group, half-line, 26.3.12
affine space over group, line, 26.3.3
affine space over group, tangent bundle, 26.3.7
affine space over group, tangent bundle, unidirectional, 26.3.16
affine space over group, tangent-line bundle, 26.3
affine space over group, tangent space, 26.3.5
affine space over group, tangent space, unidirectional, 26.3.14
affine space over linear space, 26.10.3
affine space over module, line, 26.4.5
affine space over module, tangent bundle, 26.5.9
affine space over module, tangent bundle, unidirectional, 26.7.9
affine space over module, tangent-line bundle, 26.5
affine space over module, tangent space, 26.5.7
affine space over module, tangent velocity bundle, 26.8.4
affine space over module, tangent velocity space, 26.8.2
affine space over module over group, 26.6, 26.6.2
affine space over module over ordered ring, half-line, 26.7.5
affine space over module over ordered ring, tangent space, unidirectional, 26.7.7
affine space over module over ring, 26.6, 26.6.6
affine space over module over set, 26.4
affine space over module over set, line, 26.5.3
affine space over module over unstructured set, 26.4.10
affine space over module without operator domain, 26.4.2
affine space over unitary module over field, 26.10.3
affine space over unitary module over ordered ring, 26.7
affine space over unitary module over ring, 26.7.2
affine space tangent bundle, fibre bundle, 26.6.4
affine structure function, affine space over a group, 26.2.2
affine transformation, 24.4, 26.1.3
affine transformation group, 77.2.6
affine transformation on linear space, 24.4.6
affinely parametrised geodesic conditions, 72.1.11
affinely parametrised geodesic curve, 72.1.2
a-fortiori, 41.5.3
aggregate existence, 9.1.1
Aharonov-Bohm effect, gauge potential, 69.11.5, 70.8.2
air gap, tangent to curve, 11.2.33
Alberti, Leone Battista, 77.1.4
aleph number, 13.2.7
aleph numbers, cardinality yardstick, 13.2.6
algebra, active set, 17.0.3
algebra, associative, 19.9
algebra, etymology, 22.0.3
algebra, Graßmann, 27.1.5
algebra, Lie, 19.10
algebra, Lie, computation using Lie group manifold, 62.8.5
algebra, Lie, definition styles, 19.10.1
algebra, Lie, of left invariant vector fields on a Lie group, 62.4.10
algebra, Lie, of Lie group, 62.8
algebra, Lie, of vector fields on a manifold, 61.5.16
algebra, Lie, over a ring, 19.10.3
algebra, Lie, real, 19.10.14
algebra, linear, 22, 23, 24
algebra, logical, 6.1.3
algebra, matrix, 25
algebra, matrix, history, 25.0.1
algebra, multilinear, 27, 27.1.2, 27.1.5
algebra, passive set, 17.0.3
algebra, sets, 8.1, 8.4
algebra, symbolic, 22.2.22
algebra, tangent, of diffeomorphism group, 63.2.5
algebra of ring-module endomorphisms, 19.9.5
algebra of sets, 18.11.15, 18.12
algebra of sets generated by a subset, 18.11.22
algebraic class of sets, 18.11
algebraic dual of linear space, 23.6.7
algebraic operation symbol, 17.0.4
algebraic structures, single-set, single operation, 17
algebraic structures, single-set, two operations, 18
algebraic structures summary table, 17.0.3
algebraic systems, family tree, 17.0.1
algebraic systems, plethora, 17.0.2
algebraic topology, 36.0.1
algebraic topology, out of scope, 1.6.3
algebras, 19
algebras and modules, family tree, 19.0.1
allowed homomorphism over a set, 19.1.10
almost-differentiable manifold, 53.2
almost-empty fibration, 21.2.11
almost-everywhere assertions, 45.1.1
alphabet, Greek, 31.4.2
alternate function, Lie bracket, 61.5.1
alternating m-form function, 30.1.8
alternating m-form map, 30.1.3
alternating multilinear function, 30.1.8
alternating multilinear map, 30.1.3
alternating symbol, Levi-Civita, 14.7.1, 14.8.34
alternating tensor, 30.4, 30.4.27
alternating tensor product, 30.4.8
alternative denial, 3.7.14
alternative denial operator, 3.7.8, 3.7.10
always-false, nullary operator, 3.6.16
always-true, nullary operator, 3.6.16

always-true set-theoretic formula, 10.2.20 

ambient space, locally Cartesian space, 49.6.9 

ambiguity, left/right transformation group, 8.8.7, 63.7.1 

ambiguity, zero tangent operator, 53.3.3, 54.12.4, 54.12.10,
54.15.1

amplifier, operational, noise sensitivity, 44.1.8, 44.6.2 

analysis, soft versus hard, 35.1.1 

analysis situs, 31.1.1 

analysis situs, combinatorial topology, 31.1.2 

analytic chart, 51.10.4 

analytic fibre bundle, 64.10 

analytic fibre bundle for Lie left transformation group, 64.10.1 

analytic function, 42.8 

analytic function, complex, 42.8.5 

analytic function, real, 42.8.3 

analytic function, real, coefficient sequence, 42.8.3 

analytic group, 62.2 

analytic group, real, 62.2.4 

analytic manifold, 51.10, 51.10.3 

analytic manifold atlas, 51.10.2 

analytic manifold atlas, equivalent, 51.10.5 

analytic real function, 16.8.10 

Analytical Society, Cambridge, derivative notation, 40.4.12 

anchor chart, 71.6.2 

ancient Greeks, 7.1.2 

and, 3.6.2

Andromeda Galaxy holiday, axiom of choice, 22.7.20 

angle between vectors, Riemannian manifold, 73.2.16 

angle brackets, inner product, 24.9.9 

angles, Euler’s, 76.8.4 

anholonomic vector fields, Riemann curvature, 70.3.2 

animal, half, 1.4.7 

annulus, closed, 37.3.14 

annulus, open, 37.3.14 

anointment axiom, ZF, 7.4.2, 12.1.27, 13.8.15 

ant on ruler, 69.10.5

ant on turntable on train, 69.10.5 

anthropomorphic principle, 48.2.4 

anticommutative product, 19.10.2 

anticommutativity, Lie bracket, 61.5.9 

antiderivative, 43.2.3 

antidifferentiation, 43.2.3 

Antiphon the Sophist, 77.1.2 

antisymmetric and general tensor space notations, survey, 
30.4.13 

antisymmetric higher-degree array compression, 25.15.13 

antisymmetric higher-degree array decompression, 25.15.16 

antisymmetric multilinear effect, 30.4.4 

antisymmetric multilinear function, 30.1.8 

antisymmetric multilinear function bundle, 56.5.26 

antisymmetric multilinear function bundle, non-topological, 
56.5.23 

antisymmetric multilinear function bundle manifold chart, 
56.5.24 

antisymmetric multilinear function bundle notation, 56.5.14 

antisymmetric multilinear function components, 30.3.10 

antisymmetric multilinear function fibration, 56.5.16, 56.5.17 

antisymmetric multilinear function fibration, vector-valued, 
56.7.6 

antisymmetric multilinear function on tangent space, 
particular, 56.5.9 

antisymmetric multilinear map, 30.1, 30.1.3 

antisymmetric multilinear map, vector-valued, 56.7.7 

antisymmetric multilinear map, vector-valued, fibre chart, 
56.7.10 

antisymmetric multilinear map bundle, 56.7.18 
antisymmetric multilinear map bundle, non-topological,
56.7.14 

antisymmetric multilinear map bundle manifold chart, 56.7.15 

antisymmetric multilinear map components, 30.3.2, 30.3.5 

antisymmetric multilinear map fibration, fibre chart, 56.5.19 

antisymmetric multilinear map fibration, vector-valued, fibre 
chart, 56.7.11 

antisymmetric multilinear map on tangent space, particular, 
56.7.8 

antisymmetric multilinear map space, canonical basis, 30.3.6 

antisymmetric multilinear map space basis, 30.3 

antisymmetric tangent multilinear function bundle, 56.5 

antisymmetric tangent multilinear map bundle, 56.7 

antisymmetric tensor spaces, 30.4

antisymmetry, multilinear map, 30 

antisymmetry, tensor, 30

antisymmetry of an order, 11.1.2 

Apianus, Petrus, 14.9.6 

Apollonius of Perga, 77.1.2 

Apollonius of Perga, coordinates, 26.11.5 

apología (ἀπολογία), 3.0.5

apple, force, 53.1.4

applicable set theory, 7.0.1 

application, logic, 2.4.3

applied mathematics, pure mathematics, comparison, 77.4.5 

apprenticeship, osmosis, 2.2.2

approximation cylinder, local, Peano ODE method, 44.4.4 

approximation of infimum from above, 16.1.18 

approximation of supremum from below, 16.1.18 

approximation rectangle, local, Peano ODE method, 44.3.9 

a-priori formula for vector field transport, 61.7.3 

a-priori knowledge, 4.3.2 

arc, 36.1.1

arc, Jordan, 36.2.11 

arccos function, 44.2.3 

arccos function, first derivative, 44.2.15 

arccos function, second derivative, 44.2.16 

archaeologist, future, ontology, 2.2.2 

Archimedean bi-ordering of group, 17.5.5 

Archimedean left-ordering of group, 17.5.5 

Archimedean order on a group, 17.5.4 

Archimedean ordered commutative group, 17.5.9 

Archimedean ordered ring, 18.1.1, 18.4, 18.4.2 

Archimedean ordering of commutative group, 17.5.9 

Archimedean ordering of ring, 18.4.2 

Archimedean right-ordering of group, 17.5.5 

Archimedean ring, module, Cauchy-Schwarz inequality, 19.8.9 

Archimedean tangent space, 26.19.1 

Archimedes of Syracuse, 77.1.2 

Archimedes of Syracuse, Archimedean order, 17.5.3 

Archimedes of Syracuse, coordinates, 26.11.5 

Archimedes of Syracuse, curves, time parameter, 40.1.3 

Archimedes of Syracuse, integral, similarity to Cauchy’s, 
43.3.1, 43.4.1 

Archimedes of Syracuse, integral, similarity to Darboux’s, 
43.5.3 

Archimedes of Syracuse, integration, 43.0.1 

Archimedes of Syracuse, lemmas, 2.4.6 

Archimedes of Syracuse, limit concepts, 35.3.10 

Archimedes of Syracuse, method of exhaustion, 43.1.2 

Archimedes of Syracuse, no word for radius, 44.2.22 

Archimedes of Syracuse, practical analysis, 77.1.4 

Archimedes of Syracuse, spiral, definition, 26.19.2 

Archimedes of Syracuse, spiral, time parameter, 26.19.1 

Archimedes spiral, 26.19.1 

architect’s model, 2.2.13 

architecture versus bricklaying, 1.4.6 

arcsin function, 44.2.3 
arcsin function, first derivative, 44.2.15

arcsin function, second derivative, 44.2.16 
arctan function, 44.2.3 

arctan function, first derivative, 44.2.15 

arctan function, second derivative, 44.2.16 
arctan function, two-parameter, 44.2.9 

area, directed, 30.4.25 

arena, mathematics, anointment of sets, 12.1.27 
arena, passive geometry, active physics, 22.0.3 
arena, passive space-time, active field theory, 49.5.10 
arena, ZF set theory, 13.8.15 

argument, logical, non-logical axiom, 4.3.2 
argument, principal, complex number, 44.2.10 
argument line entanglement, 6.6.26 

argument of function, 10.2.8 

argumentation versus truth, 3.1.4 
argumentative approach, Aristotle, 4.0.1 
Aristotle, 77.1.2
Aristotle, argumentative approach, 4.0.1 
arithmetic, consistency, 14.3.2
arithmetic, natural number, 14.3 

arithmetic equivalent, logic operator, 3.15.6
arithmetic triangle, 14.9.5

arity, logical operator, 3.11.7 

array, antisymmetric higher-degree, compression, 25.15.13 


array, antisymmetric higher-degree, decompression, 25.15.16 


array, Christoffel, affine connection, 67.1.1 
array, Christoffel, correction terms, 71.4.1 
array, Christoffel, geodesic curve, 72.1.6 

array, Christoffel, grooves in space, 60.1.4 
array, Christoffel, history, 67.1.4, 74.3.3 

array, Christoffel, horizontal lift function, 62.0.1, 67.7.10 
array, Christoffel, index order, 71.13.1 

array, Christoffel, on two-sphere, 76.6.1 

array, Christoffel, Riemann curvature, 70.4.3 
array, Christoffel, Schild's ladder, 72.2.2 

array, Christoffel, tensorisation, 60.4.1, 60.4.12 
array, Christoffel, torsion, 71.12.2 

array, Christoffel, vector bundle, 68.1.8 

array, higher-degree, 25.15.2

array, higher-degree, antisymmetric, 25.15.6 
array, higher-degree, symmetric, 25.15.5 

array, sparse, 22.2.23 

array, symmetric higher-degree, compression, 25.15.12 
array, symmetric higher-degree, decompression, 25.15.15 
arrow, blunt end, vector, 22.0.2 

arrow, circled, notation, 10.19.8 

arrow, flying, Zeno of Elea, 53.1.12 

arrow, Peirce, 3.7.8, 3.7.10, 3.7.14 

arrow, self, 11.2.3

arrow, transitivity, 11.1.11, 11.2.3 

arrow, transitivity, total order, 11.5.8 

arrow diagram, partial order, 11.2.3 
arrowhead, vector, 22.0.2

art, non-representational, 2.4.4 

art of thinking, 3.0.3

artificial intelligence, 2.2.3 

Ascoli, Giulio, 77.1.6 

Ascoli's theorem, 38.5, 38.5.5 

assertion, 4.3.9

assertion, conditional, 4.3.10 

assertion, Gentzen-style, 5.3.3 

assertion, Hilbert-style, 5.3.3 

assertion symbol, 4.3.11, 4.3.12 

assertion symbol, evolution of meaning, 5.3.6 
assertion symbol, two-way, 4.3.15


assertion symbol meaning, 4.3.7 


assertional logical system, 4. 
assertions, almost-everywhere, 45.1.1

assertions, denials, deception, 3.1 

assertions, uninteresting, 45.1.2

associated baseless figure bundle constructions, 20.12 

associated baseless figure bundles, 20.11 

associated baseless figure/frame bundles, 20.11.2 

associated connection, contravariant principal bundle 
function, 69.10.3 

associated connection, literature, 67.12.1 

associated connection, ordinary fibre bundle, 67.12, 67.12.3 

associated connection, principal bundle, 69.9 

associated connection, vector-tuple bundle, 59.7.1, 
71.3.2, 74.2.1 

associated contravariant function on principal bundle, 47.12, 
66.8 

associated cross-section, orbit-space, short-cut, 47.12, 66.8

associated cross-section, short-cut orbit-space, 471123 

associated differentiable fibre bundle, 66.7, 66.7.6, 66. 

associated differentiable fibre bundle, orbit-space method,
66.7.12 

associated differentiable fibre bundle, patchwork, 66.7.10 

associated fibre bundle, history, 21.12.1 

associated fibre bundle, orbit-space, patchwork comparison,
47.11.3

associated fibre bundle cross-section, orbit-space, short-cut, 
66.8.1 

associated non-topological fibre bundle, 21.12, 21.12.5 

associated non-topological fibre bundle, orbit-space, 21.14 

associated non-topological fibre bundle, orbit-space method, 
21.14.2 

associated non-topological fibre bundle, patchwork, 21.13, 
21.13.2 

associated OFB connection formula using PFB connection, 
69.9.4, 69.9.6 

associated orbit-space cross-section, short-cut map, 47.12.5 

associated parallelism, 48.4 

associated principal bundle, radiation field, gauge theory, 
21.12.9, 47.11.2 

associated tensor bundles, Lie derivative, 61.9.1 

associated topological fibre bundle, 47.9, 47.9.7 

associated topological fibre bundle, continuous-equivalent,
47.9.11 

associated topological fibre bundle, orbit-space, 47.11 

associated topological fibre bundle, orbit-space method, 
47.11.5 

associated topological fibre bundle, patchwork, 47.10, 47.10.4 

associated topological pathwise parallelism, 48.4.2 

associated transformation group, 20.9.1, 20.9.17, 2 17, 20.11

associated vector bundle, matter field, gauge theory, 21.12.9,
47.11.2

associated vector bundle, orbit-space, connection, 69.10 

association, fibre bundle, topological, 47.9.5 

association, tangent vector/covector bundles, topological, 
55.4.17 

association map, fibre bundle, differentiable, 66.7.5 

association map, fibre bundle, non-topological, 21.12.4 

association map, fibre chart, 66.7.5 

associative algebra, 19.9 

associative algebra, associated Lie algebra, 19.10.11 

associative algebra over commutative unitary ring, 19.9.2 

associativity, function composition, 10.4.21 

asterisk method of learning, 1.7.1 

astral plane, 2.2.14, 13.9.10, 22.7.22, 45.3.6 

astronomical spherical coordinates, 76.2.3 

atlas, analytic manifold, 51.10.2 

atlas, analytic manifold, equivalent, 51.10.5 

atlas, C^k-maximal, 51.4.4
atlas, complete, disadvantages, 50.4.9, 51.2.2, 51.4.4, 54.5.31, 54.9.12, 64.4.1
atlas, complete, locally Cartesian space, 49.7.18
atlas, complete, topological manifold, 49.7.18
atlas, complete C^k, differentiable manifold, 51.4.5
atlas, continuous, locally Cartesian space, 49.7
atlas, continuous locally Cartesian, induced topology, 49.8.12
atlas, curve, 36.1.4
atlas, differentiable, locally Cartesian, 51.5
atlas, differentiable locally Cartesian, induced topology, 51.5.8
atlas, differentiable manifold, 51.5.10
atlas, differentiable manifold, indexed, 51.5.10
atlas, differentiable submanifold, induced by ambient atlas, 52.4.7
atlas, fibre, differentiable fibration, 64.4
atlas, fibre, for topological fibration, 47.5.2
atlas, fibre, non-topological fibration, 21.7, 21.7.2
atlas, fibre, topological, 47.5
atlas, locally Cartesian, continuous, empty chart exclusion, 49.8.5
atlas, locally Cartesian, differentiable, 51.5, 51.5.2
atlas, locally Cartesian, induced topology, 49.8
atlas, locally Cartesian space, 49.7.3
atlas, locally Cartesian space, differentiable, 51.3.2
atlas, locally Cartesian space, differentiable, indexed, 51.3.2
atlas, locally Cartesian space, indexed, 49.7.3
atlas, manifold, for fibre set, 64.11.5
atlas, manifold, tangent bundle total space, 54.5.22
atlas, maximal, disadvantages, 50.4.9, 51.2.2, 51.4.4, 54.5.31, 54.9.12, 64.4.1
atlas, maximal, locally Cartesian space, 49.7.18
atlas, maximal, topological manifold, 49.7.18
atlas, maximal C^k, differentiable manifold, 51.4.5
atlas, non-topological, 49.3, 49.3.4
atlas, path, 36.1.4
atlas, pull-back, via homeomorphism, locally Cartesian space, 49.11.7
atlas, relative, on submanifold, 52.4.9, 52.7.4, 64.11.2
atlas, standard, for Cartesian space, 49.7.12
atlas, standard, for finite-dimensional linear space, 49.7.14
atlas, standard, for general linear group, 49.7.15
atlas, submanifold, construction, 52.4.11
atlas, submanifold, removal, 52.4.10
atlas, topological fibre, equivalent, 47.6.17
atlas, topological manifold, 50.1.11
atlas, topological manifold, indexed, 50.1.12
atlas, usual, for Cartesian space, 49.7.12
atlas for a set, continuous locally Cartesian, 49.8.2
atlas for manifold, locally Lipschitz, 50.6.10
atlas induced by fibre chart on fibre set, 64.3.7
atlas induced on fibre set by fibre chart, 64.11.3
atlas induced on fibre set by fibre space, 64.3.10
atlas induced on product-structured differentiable manifold, 52.7.1
atlas induced on product-structured topological manifold, 50.5.1
atlas induced on product-structured topological submanifold, 50.5.8
atlas induced on tangent bundle from base space, 54.5.18
atlas induced via homeomorphism, locally Cartesian space, 49.11.7
atlas of curves for a path, 36.1.8
atlas product, direct, non-topological manifolds, 49.3.10
atlas product, direct, topological manifolds, 50.4.6
atlas via diffeomorphism, induced, differentiable manifold, 52.2.8
atlas via diffeomorphism, pull-back, differentiable manifold, 52.2.8
atlas via trivialisations, induced, differentiable fibre bundle, 64.9.2
atlas via trivialisations, pull-back, differentiable fibre bundle, 64.9.2
atlases, equivalent, differentiable manifold, 51.4.10
atom, set theory, 9.2.8
atomic wff-wff, 4.5.15
attraction at a distance, James Clerk Maxwell, 21.0.7
attribute, invariant/covariant/contravariant, 20.9.19
Auswahlaxiom, 7.12.7
aut, 3.6.3
author biography, 0.0
automorphism, C^k differentiable, differentiable manifold, 52.2.2
automorphism, inner, 17.8.12
automorphism, linear space, 23.1.8
automorphism, linear space, contragredient, 23.11.18
automorphism, ordered field, 18.8.17
automorphism, ring, 18.1.20
automorphism, set, 10.5.21
automorphism, topological, 31.14.2
automorphism, transformation group, 20.6.2, 20.8.1
automorphism group, affine space, 26.1.6
automorphisms, topological, continuous family, 36.10.14
axiom, anointment, ZF, 7.4.2, 12.1.27, 13.8.15
axiom, extensionality, 7.5.1
axiom, infinity, faith, 7.10.3
axiom, infinity, ZF, 7.9
axiom, logical, 3.3.6
axiom, multiplicative, 7.12.2, 10.3.3
axiom, multiplicative, countable, 13.7.22
axiom, non-logical, logical argument, 4.3.2
axiom, nonlogical, 3.13.5, 3.14.7
axiom, power set, 7.6.23
axiom, reflexivity of equality, 6.7.5
axiom, regularity, ZF, 7.8
axiom, set existence, ZF, 7.4.2, 7.4.3
axiom, singleton, 7.6.15
axiom, specification, ZF, 7.7.3
axiom, substitutivity of equality, 6.7.5
axiom, union, 7.6.15
axiom, unordered pair, 7.6.10
axiom, ZF, productive, 7.4.2
axiom of choice, 5.2.7, 7.11.10, 32.1.11
axiom of choice, Andromeda Galaxy holiday, 22.7.20
axiom of choice, Cantor's intersection theorem, 37.9.9
axiom of choice, career implications, 2.1.5
axiom of choice, Cartesian product of sets, 10.11.9
axiom of choice, clothes, 77.4.1
axiom of choice, complete metric space, 37.8.10
axiom of choice, Easter Bunny, 13.8.15
axiom of choice, equinumerosity, 13.1.16
axiom of choice, existence of homeomorphisms, 31.14.5
axiom of choice, faith, 7.10.3
axiom of choice, false, 7.12.5
axiom of choice, families of sets and functions, 10.8.6
axiom of choice, Father Christmas, 13.8.15
axiom of choice, Faustian pact, 7.4.4, 77.4.1
axiom of choice, imaginary parachute, 7.11.1, 10.3.1
axiom of choice, independence from ZF axioms, 22.6.11
axiom of choice, linear functional extension, 23.5.1
axiom of choice, linear space basis, 22.7.20, 22.7.22, 27.5.2
axiom of choice, literature, 7.11.2
axiom of choice, local, 7.12.4

axiom of choice, not proven, Scottish law, 7.1.11 
axiom of choice, not to be taken literally, 45.3.7
axiom of choice, product topology, 32.12.5 

axiom of choice, quantifier swapping, 10.11.11, 13.8.9 


axiom of choice, reversing logical quantifiers, 38.2.2, 45.2.1 


axiom of choice, shim theorem, 13.8.14 

axiom of choice, socks metaphor, Russell, 7.12.2 
axiom of choice, theorems lost without it, 7.11.13 
axiom of choice, Tikhonov's theorem, 33.5.16 
axiom of choice, tooth fairy, 13.8.15 

axiom of choice, Urysohn's lemma, 33.3.22 

axiom of choice, useless, 7.4.3 

axiom of choice, weeded out, 1.6.2 

axiom of choice equivalents, literature, 7.11.12 
axiom of comprehension, Zermelo set theory, 7.7.3 
axiom of countable choice, 7.11.14, 13.7.21
axiom of countable choice, infinite sets, 13.7.5 
axiom of countable choice, magic wand, 13.10.1 
axiom of countable choice, separable space, 33.4.22 
axiom of countable choice doppelganger, 33.7.4 
axiom of dependent choice, 7.1.11 

axiom of foundation, 7.8.1 

axiom of infinite choice, 7.11, 7.12 

axiom of infinity, ultrafinitist, 12.1.27 

axiom of replacement, ZF, 7.7 

axiom of separation, Zermelo set theory, 7.7.3 
axiom of specification, 7.7.4, 9.1.1
axiom of substitution, ZF set theory, 7.7.1 

axiom schema, 4.1.6

axiom style, ZF, 7.9.2 

axiom template, 4.4.5 

axiomatic approach, definitions, 7.1.3 

axiomatic system, 4.1.6

axiomatic system, consistent, knowledge sets, 3.4.10 
axiomatic system, raison d'être, 4.9.5, 6.1.7
axiomatic system PC for propositional calculus, 4.4.3 


axiomatic system PC’ for propositional calculus, 4.9.2 


axiomatic system QC for predicate calculus, 6.3.9 


axiomatic system QC4-EQ for predicate calculus, 6.7.6 


axiomatic systems, declarations, tags, 3.2.4 

axioms, complete ordered field, real numbers, 15.9.3 
axioms, Euclidean geometry, 2.1.4 

axioms, non-standard, 7.1.11

axioms, Peano, for natural numbers, 14.1.5 

axioms, real number system, 15.9 

axioms, set theory, Zermelo-Fraenkel, 7.2, 7.2.4 
axioms, ZF set theory, basic four, 7.6


Babbage, Charles, derivative notation, 40.4.12 
backbone of set theory, ordinal numbers, 12.6.4 
backwards-deductive search, 4.5.13 

balanced set in a linear space, 24.6.9 

ball, closed, 37.3.1, 37.5.13 

ball, closed, punctured, 37.3.14 

ball, closed, with zero radius, 37.3.4 

ball, metric space, 37.3 

ball, open, 37.3.1, 37.5.12 

ball, open, punctured, 37.3.14 

ball, unit, seminorm recovery, 24.6.10 

ball centre, 37.3.1

ball radius, 37.3.1 

balls, unit, seminorm Minkowski functionals, 24.6.7 
balls, unit, seminorm properties, 24.6.8
Banach, Stefan, 77.1.7 

Banach space, 22.5.20, 39.4 

Banach space, complex, 39.4.4 

Banach space, differentiable fibre bundle, 64.1.4 
Banach space, directional derivative, 41.9.4 
Banach space, directionally differentiable function, 41.9.2

Banach space, real, 39.4.2 

Banach space, tangent vectors, drop function, 54.9.9 

Banach space, total differential, 41.9.8 

Banach space, totally differentiable function, 41.9.7 

Banach space tangent bundle, 53.3.13

Banach-Tarski theorem, 7.12.8 

bar, Halmos, QED symbol, 1.5.6 

bare continuum, 49.2.10, 59.1.13 

bare-handed techniques, 12.1.0

Barrow, Isaac, 77.1.4 

Barrow integral, 43.2 

barter, analogy for tangent vectors, 53.1.4 

base, open, 32.2 

base, open, countable, 32.2.10 

base, open, topology, 32.2.3 

base at a point, open, countable, 32.2.9 

base at a point, open, topology, 32.2.2 

base space, differentiable fibre bundle, 64.8.3 

base space, non-topological fibre bundle, 21.8.3 

base space, topological fibre bundle, 47.6.5 

baseless fibre bundle, 20.10.1 

baseless figure bundle, associated, constructions, 20.12 

baseless figure bundles, associated, 20.11

baseless figure/frame bundle, contravariant, 20.10.8 

baseless figure/frame bundles, 20.10 

baseless figure/frame bundles, associated, 20.11.2 

basic logical expression, concrete propositions, 3.6 

basic logical operation, on logical expressions, 3.9 

basis, canonical, antisymmetric multilinear map space, 30.3.6 

basis, canonical, multilinear dual function space, 27.7.2 

basis, canonical, multilinear function space, 27.6.9, 30.2.11, 
55.5.41 

basis, canonical, multilinear map space, 27.6.16, 30.2.7 

basis, dual, canonical, 23.8.2 

basis, dual linear space, 23.7 

basis, linear space, 22.7

basis bundle, tangent space, 55.7.2 

basis cardinality, finite-dimensional linear space, 22.7.16 

basis covector, chart, tangent space, 55.3.2

basis existence, linear space, 7.11.13, 22.7.21 

basis family, dual, canonical, 23.7.3 

basis family, linear space, 22.7.6 

basis family, ordered, linear space, 22.7.7 

basis family, totally ordered, linear space, 22.7.7 

basis family, well-ordered, linear space, 22.7.7 

basis field, standard induced, vector bundle fibre set, 65.1.8 

basis for linear space, finite, 22.8.3 

basis-free Minkowskian inner product criterion, 24.10.8 

basis set, dual, canonical, 23.7.2 

basis set, linear space, 22.7.2 

basis-transition component array, 22.9.6 

basis-transition component array multiplication conventions, 
22.9.10 

basis vector, chart, tangent operator space, 54.13.4 

basis vector, chart, tangent space, 54.4.9 

basis vector-tuple, chart, tangent space, 55.5.18 

baskets, true or false, 77.4.5 

beans, hill, crazy world, 77.4.15 

beauty, sacrifice for truth, 1.4.11 

bed, Procrustes, 53.2.2

bedrock of knowledge, 2.0.1 

bedrock of mathematics, 2.1 

Beltrami, Eugenio, ir 

Beltrami, Eugenio, absolute differential calculus, 27.1.5 

benefits, social, mechanisation of logic, 4.1.3 

Bernays, Paul Isaak, 77.1.7

Bernoulli, Jacob (Jacques), 16.2.2, 77.1.5 
Bernoulli, Johann, 77.1.5

Bernoulli, Johann, geodesics, 73.8.2 

Bernoulli, Johann, integral calculus name, 43.1.1 

Bernoulli’s lemniscate, 16.2.2 

Bernshtein’s theorem, 13.1.6 

Bernstein, Felix, 77.1.7 

beta-cardinality, 13.4 

beta-cardinality, cardinality yardstick, 12.6.4 

beta-cardinality of set, 13.4.9 

beta-set, 12.5.4, 13.4.12, 13.4.13 

beth number, 13.4.12 

BG (Bernays-Gödel), 7.1.10

bi-ordering of group, 17.5.1 

bi-ordering of group, Archimedean, 17.5.5 

bibliography, 79 

bidirectional length-parametrisation, locally rectifiable curve, 
38.9.11 

bidirectional length-parametrised curve, metric space, 38.9.11 

bidirectional partial curve length function, metric space, 
38.8.13 

bidirectional tangent space, 26.8.1 

bidual of linear space, 23.10.1 

big-endian integer representation, 53.1.14 

bijection, 10.5.2 

bijection, continuous, which is not a homeomorphism, 31.14.4 

bijection, local, 10.9.15 

bijective function, 10.5.2 

bijective partial function, 10.9.8 

bilinear effect, vector pair, 30.4.25 

bilinear form, 24.9.3 

bilinear function, 24.9.3 

bilinear function, non-singular symmetric, on linear space, 

24.10.2 

bilinear function, real negative definite, 30.5.3 

bilinear function, real negative semi-definite, 30.5.3 

bilinear function, real positive definite, 30.5.3 

bilinear function, real positive semi-definite, 30.5.3 

bilinear function on linear space, symmetric, index, 24.10.5 

bilinear map, 27.2.16 

bilinear map, universal, 28.6.4 

bilinear sectional curvature map, 74.5.2 

bilinear tensor map, canonical, 28.6.4 

binary Cartesian product, projection map, 10.12 

binary Cartesian product, projection slice, 10.12 

binary number, 14.4.12

binary set intersection properties, 8.1 

binary set union properties, 8.1 

binary truth function, 3.7

binary truth function classification, 3.8 

binding precedence rules, logical operator, 3.9.6 

biography of author, 0.0

blackboard bold font, 14.1.1 

bless existence of set, 45.3.7 

blind spot, modelling, 53.2.2 

bluebirds, white cliffs of Dover, 77.4.15 

blunt end, arrow, vector, 22.0.2 

Bohr, Niels Henrik David, 77.1.7 

boilerplate, tangent vector-frame spaces, 55.6.2 

boilerplate, tangent vector-tuple spaces, 55.5.1 

boilerplate, vector-valued multilinear map bundles, 56.7.17 

bold font, blackboard, 14.1.1 

Bolzano, Bernard Placidus Johann Nepomuk, 77.1.6 

Bolzano, Bernard Placidus Johann Nepomuk, Bolzano-Weierstraß
theorem, 35.7.4

Bolzano, Bernard Placidus Johann Nepomuk, set theory,
7.1.6

Bolzano-Weierstraß limit-point property, real numbers, 35.7.3
Bolzano-Weierstraß limit-point theorem, Cartesian space,
35.7.7

Bolzano-Weierstraß literature, 35.5.2

Bolzano-Weierstraß property, metric space, 37.7.21

Bolzano-Weierstraß sequence-limit property, real numbers,
35.7.8

Bolzano-Weierstraß set, 35.5.4

Bolzano-Weierstraß space, 35.5.5

Bolzano-Weierstraß theorem, history, 35.7.4

Bolzano-Weierstraß theorem, metric space, 37.7.21

bone-setting, 22.0.3 

bones without flesh, formal logical calculus, 7.3.10 

book, definition-centric, 1.4.14

Boole, George, 77.1.6 

Boole, George, the laws of thought, 3.2.7 

boolean prime ideal theorem, 13.8.8 

boot-strap, logic, 4.5.11 

boot-strapping of definitions, 2.1.1, 9.3.1 

Borel, Félix Édouard Justin Émile, 77.1.7

Borel, Félix Édouard Justin Émile, axiom of choice, 7.12.7

Borel, Félix Édouard Justin Émile, integration, 43.1.2

Born, Max, 77.1.7 

boson, radiation field, 47.12.7 

bosonic radiation field, gauge potential, 75.4.1 

bosonic radiation field, principal bundle, 21.12.9, 47.11.2, 
69.11.5, 71.6.6

bosonic radiation fields, gauge theory, 65.0.1 

bound, lower, real square matrix, 25.12 

bound, upper, real square matrix, 25.12 

bound for function, 11.3

bound function, lower, real square matrix, 25.12.2 

bound function, upper, real square matrix, 25.12.2 

bound of partially ordered set, lower, 11.2.4, 11.3.2 

bound of partially ordered set, upper, 11.2.4, 11.3.2 

bound variable, predicate calculus, 6.3.9 

boundaries of manifolds, out of scope, 1.6.3 

boundary, zero-thickness, 31.2.9 

boundary, zero-width, 31.2.2 

boundary of set, 31.9 

boundary of set, open/closed portions, 31.9.14 

boundary of set, topological, 31.9.5 

boundary operator, topological space, 31.9.7 

boundary point, topological manifold, 50.1.8 

boundary point of manifold, curvature, 70.4.5 

boundary point of set, 31.2.1 

bounded, pointwise, sequence of functions, 38.4.13 

bounded, pointwise, set of functions, 38.4.13 

bounded, uniformly, sequence of functions, 38.4.14 

bounded, uniformly, set of functions, 38.4.14 

bounded above, function, 11.3.7 

bounded above, set, 11.2.38 

bounded-above set, 11.2.5 

bounded below, function, 11.3.7 

bounded below, set, 11.2.38 

bounded-below set, 11.2.5 

bounded function, 11.3.7 

bounded set, 11.2.5, 11.2.3 

bounded set in metric space, 37.4.12 

bounded variation, locally, metric-space-valued function,
38.10.7

bounded variation, metric-space-valued function, 38.10.4 

bounded variation functions, 38.10 

bounds for sets, partial order, 11.2

box, Pandora’s, non-Hausdorff manifolds, 50.0.1 

brace-and-semicolon notations for sets, 7.7.8 

brachistochrone, history of geodesics, 73.8.2 

bracket, Lagrange, 61.5.2 

bracket, Lie, 61.5 
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bracket, Lie, alternative names, 61.5.1 

bracket, Lie, anticommutativity, 61.5.9 

bracket, Lie, Cartesian space, 46.4

bracket, Lie, Cartesian space, holonomy, 46.5 

bracket, Lie, comparison of interpretations, 61.5.5 

bracket, Lie, differentiability of output field, 61.5.13 

bracket, Lie, drop function, 61.5.6 

bracket, Lie, extension to tensor fields, 61.9.1 

bracket, Lie, integral curve, holonomy, 46.5.4 

bracket, Lie, integral curve, nonholonomy, 61.5.20 

bracket, Lie, interpretations, 61.5.3 

bracket, Lie, Lie algebra, 19.10.15 

bracket, Lie, Lie derivative, 61.8.1 

bracket, Lie, literature, 61.5.1 

bracket, Lie, map-related vector fields, 61.6 

bracket, Lie, naive vector field, 46.4.7

bracket, Lie, on manifold, computation example, 61.5.14 

bracket, Lie, on manifold, coordinates expression, 61.5.11 

bracket, Lie, on manifolds, 61.5.7 

bracket, Lie, second-order operator field, 60.3.2 

bracket, Lie, serendipity, 61.5.4

bracket, Lie, swap function, 61.5.6 

bracket, Lie, well-definition proof, 61.5.8 

bracket, Poisson, literature, 61.5.2 

bracket-level diagram, infix logical expression, 3.12.1 

bracket of vector fields, Lie, Cartesian space, 46.1.11 

brackets, angle, inner product, 24.9.9 

brain, human, 22.0.3 

bread and butter, general topology, 49.2.8 

breadthless length, Euclid's line definition, 40.2.1 

breakfast, dog's, differential forms, 61.12.1 

breakfast, dog's, infinite sets, 13.7.1 

bricklaying versus architecture, 1.4.6 

Brouwer, Luitzen Egbertus Jan, 77.1.7 

Brouwer, Luitzen Egbertus Jan, topology, 31.1.2 

Brouwer/Hilbert encounter, intuitionism versus formalism, 
2.2.12 

bug propagation, mathematical logic, 4.5.11 

bug versus feature, axiom of choice, 7.12.7

bull, horns, 77.4.1 

bundle, antisymmetric multilinear function, 56.5.26 

bundle, antisymmetric multilinear function, manifold chart, 
56.5.24 

bundle, antisymmetric multilinear function, non-topological, 
56.5.23 

bundle, antisymmetric multilinear function, notation, 56.5.14 

bundle, antisymmetric multilinear map, 56.7.18 

bundle, antisymmetric multilinear map, manifold chart, 
56.7.15 

bundle, antisymmetric multilinear map, non-topological, 
56.7.14 

bundle, associated baseless figure, constructions, 20.12 

bundle, associated differentiable fibre, orbit-space method, 
66.7.12 

bundle, associated differentiable fibre, patchwork, 66.7.10 

bundle, associated non-topological fibre, orbit-space, 21.14 

bundle, associated non-topological fibre, orbit-space method, 
21.14.2 

bundle, associated non-topological fibre, patchwork, 21.13.2 

bundle, associated topological fibre, orbit-space, 47.11 

bundle, associated topological fibre, orbit-space method, 
47.11.5 

bundle, associated topological fibre, patchwork, 47.10.4 

bundle, baseless figure/frame, contravariant, 20.10.8 

bundle, covector, on a manifold, 55 

bundle, differentiable fibre, 64 

bundle, differentiable fibre, connection, 67.5.4 

bundle, differentiable fibre, horizontal lift function, 67.5.4 
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bundle, differentiable fibre, pull-back atlas, 64.9 

bundle, differentiable fibre, with Lie structure group, 64.8.3 

bundle, differentiable principal, 66

bundle, differentiable principal, right action
chart-independence, 66.2.5

bundle, double tangent, 59.1 


bundle, fibre, analytic, 64.10 

bundle, fibre, baseless, 20.10.1 

bundle, fibre, curvature of connection, 70 

bundle, fibre, differentiable, 64.8

bundle, fibre, differentiable ordinary, 64.8.3 

bundle, fibre, empty, 21.8.15 

bundle, fibre, history, 47.0.3 

bundle, fibre, non-topological, 21.8.3 

bundle, fibre, topological, 47.5.5, 47.6, 47.6.5 

bundle, fibre, topological ordinary, 47.6.5 

bundle, fibre/frame, vector component transition array, 
22.9.12 

bundle, frame, cross-section, 57.11 

bundle, frame, on a manifold, 55

bundle, line, differentiable, 65.2.12 

bundle, multilinear function, 56.4.19 

bundle, multilinear function, manifold chart, 56.4.17 

bundle, multilinear function, non-topological, 56.4.1 

bundle, multilinear map, notations, 56.7.4 

bundle, multilinear map, on a manifold, 56 

bundle, non-topological fibre, 21

bundle, non-topological fibre, associated, 21.12, 21.12.5 

bundle, non-topological fibre, with fibre space F, 21.7

bundle, non-topological ordinary fibre, parallelism, 21.16 

bundle, non-topological principal, constant cross-section, 
21.10.2 

bundle, non-topological principal, identity chart for 
cross-section, 21.10.7 

bundle, non-topological principal, identity cross-section, 
21.10.3 

bundle, non-topological principal, right action 
chart-independence, 21.11.2 

bundle, non-topological principal, right action map, 21.11 

bundle, non-topological vector, linear space of cross-sections, 
24.11.4 

bundle, orbit-space associated vector, connection, 69.10 

bundle, ordinary, connection definition conversions, 67.11 

bundle, ordinary fibre, associated connection, 67.12, 67.12.3 

bundle, ordinary fibre, covariant derivative, 68.2.5 

bundle, ordinary fibre, differentiable cross-section, 64.7 

bundle, ordinary fibre, drop function, 64.6, 64.6.2, 64.6.6 

bundle, ordinary fibre, horizontal component, 64.5 

bundle, ordinary fibre, horizontal component map, 67.9.2 

bundle, ordinary fibre, horizontal subspace, 67.9.6 

bundle, ordinary fibre, naive derivative, 64.7 

bundle, ordinary fibre, parallel transport, 68.5 

bundle, ordinary fibre, Riemann curvature, 70.3, 70.4.3 

bundle, ordinary fibre, Riemann curvature, justification, 70.7 

bundle, ordinary fibre, vertical component map, 67.10.2

bundle, principal, associated connection, 69.9 

bundle, principal, associated contravariant function, 47.12, 
66.8

bundle, principal, connection definition conversions, 69.6

bundle, principal, contravariant function, 47.12.3, 66.8.2

bundle, principal, curvature of connection, 70.5 

bundle, principal, horizontal lift function, 69.1.3 

bundle, principal, infinitesimal group action, 66.5 

bundle, principal, left action map, 21.11.18, 66.4.2 

bundle, principal, left group action map, 66.4 

bundle, principal, right action map, 66.2.2

bundle, principal, right action map differential properties, 
66.2.18 
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bundle, principal, right action map properties, 66.2.14 

bundle, principal, right transformation group, 66.2.7 

bundle, principal, right translation of fundamental vector 
field, 66.6.10, 66.6.11 

bundle, principal, right translation operator for vector field, 
66.2.21 

bundle, principal, topological, right action map, 47.8.7 

bundle, principal, topological, right transformation group, 
47.8.14 

bundle, principal, transposed infinitesimal action map, 66.6.2 

bundle, principal, vertical component map, 69.4 

bundle, principal differentiable, right action map, 66.2 

bundle, principal fibre, affine connection curvature, 71.4.6 

bundle, principal fibre, connection, 69 

bundle, principal fibre, connection form, 69.5, 

bundle, principal fibre, differentiable, 66.1.2

bundle, principal fibre, empty, 21.9.5 

bundle, principal fibre, horizontal component map, 69.3.2 

bundle, principal fibre, horizontal lift function, 69.1 

bundle, principal fibre, horizontal lift function transposed, 
69.2 

bundle, principal fibre, horizontal subspace, 69.3.7 

bundle, principal fibre, infinitesimal action map of Lie algebra 
elements, 66.5.2 

bundle, principal fibre, non-topological, 21.9.4 

bundle, principal fibre, pointwise right action map, 66.2.10 

bundle, principal fibre, topological, 47.8.3 

bundle, principal fibre, vertical component map, 69.4.2 

bundle, principal frame, 55.7

bundle, second-level tangent, horizontal component swap 
function, 59.6, 59.6.3, 59.6.6 

bundle, submanifold tangent, tacit identification, 66.5.5 

bundle, symmetric multilinear function, notation, 56.6.3 

bundle, tangent, construction methods, 53.4 

bundle, tangent, cross-section, 57.1.3

bundle, tangent, differentiable manifold, 54, 54.5.30, 65.9 

bundle, tangent, differentiable manifold philosophy, 53

bundle, tangent, direct product for Cartesian spaces, 26.15 

bundle, tangent, direct product for manifolds, 54.7

bundle, tangent, double, 59.1.22 

bundle, tangent, for Banach space, 53.3.13 

bundle, tangent, higher-level, 59, 59.1.25 

bundle, tangent, horizontal lift function, 71.1 

bundle, tangent, induced atlas from base space, 54.5.18 

bundle, tangent, linear-style, 56.3.24 

bundle, tangent, local cross-section, 57.1.10 

bundle, tangent, multilinear-style, 56.3.23 

bundle, tangent, non-topological linear-style, 56.3.17 

bundle, tangent, non-topological multilinear-style, 56.3.16 

bundle, tangent, of submanifold, 54.6

bundle, tangent, of tangent vector-tuple bundle, 59.7 

bundle, tangent, open subset of manifold, 54.6.7

bundle, tangent, second-level, 59.1, 59.1.20

bundle, tangent, transposed horizontal lift function, 71.5 

bundle, tangent, true, 53.1.7

bundle, tangent, unidirectional, 54.16 

bundle, tangent covector, 55.4, 55.4.11 

bundle, tangent covector, Cartesian space, 26.17 

bundle, tangent field covector, Cartesian space, 26.18 

bundle, tangent-line, affine space over group, 26.3

bundle, tangent-line, Cartesian space, 26.14 

bundle, tangent-line, Cartesian space, philosophy, 26.12 

bundle, tangent-line, on affine space over module, 26.5 

bundle, tangent-line vector, 54.5.16

bundle, tangent-line vector, differentiable manifold, 54.5 

bundle, tangent multilinear function, 56.4

bundle, tangent multilinear function, antisymmetric, 56.5 

bundle, tangent multilinear function, symmetric, 56.6 
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bundle, tangent multilinear map, antisymmetric, 56.7 

bundle, tangent multilinear map, symmetric, 56.7

bundle, tangent space basis, 55.7.2

bundle, tangent tensor, differentiable manifold, 56.3 

bundle, tangent vector, total space manifold chart, 54.5.19 

bundle, tangent vector frame, 55.6, 55.6.22, 55.6.31 

bundle, tangent vector frame, manifold chart, 55.6.24 

bundle, tangent vector frame, total space, 55.6.30 

bundle, tangent vector frame, total space manifold atlas, 
55.6.28 

bundle, tangent vector principal frame, 55.7.8 

bundle, tangent vector-tuple, 55.5, 55.5.37 

bundle, tangent vector-tuple, fibre atlas, 55.5.28 

bundle, tangent vector-tuple, manifold chart, 55.5.31 

bundle, tangent velocity, affine space, 26.8 

bundle, tangent velocity, Cartesian space, 26.16 

bundle, tangent velocity vector, total space, 54.10.10 

bundle, tensor, on a manifold, 56 

bundle, tensor, on Cartesian space, 30.6 

bundle, topological fibre, 47

bundle, topological fibre, associated, 47.9 

bundle, topological fibre, equivalent, 47.7.6 

bundle, topological fibre, homomorphism, 47.7.3 

bundle, topological fibre, isomorphism, 47.7.5, 47.7.1 

bundle, topological fibre, parallelism, 48 

bundle, topological fibre, particle field, 47.12.2 

bundle, topological fibre/frame, contravariant, 47.13.2 

bundle, topological fibre/frame, local trivialisation, 47.13.4 

bundle, topological principal, right action chart-independence, 
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47.8.10 
bundle, trivial fibre, product-structured space, 10.15.11, 
21.0.7 


bundle, vector, 65.1 

bundle, vector, connection, 68, 68.1 

bundle, vector, covariant derivative, 68.2.9 

bundle, vector, covariant derivative, literature, 68.2.1 

bundle, vector, covariant derivative Leibniz rule, 68.2.16 

bundle, vector, differentiable, 65, 65.1.3 

bundle, vector, dual, 65.2.9 

bundle, vector, linear operations, 65.2 

bundle, vector, linearity of oblique drop function, 65.4.4 

bundle, vector, literature, 65.0.2 

bundle, vector, non-topological, 24.11, 24.11.2 

bundle, vector, oblique drop function, 65.4, 65.4.2 

bundle, vector, tangent, fibre bundle, 54.0.1 

bundle, vector, vertical drop function, 65.3, 65.3.5 

bundle, vector frame, manifold chart map, 55.6.25 

bundle, vector-frame, of a vector bundle, 65.8.10

bundle, vector-frame, on a vector bundle, 65.8 

bundle, vector-tuple, differential of real-valued function, 
59.7.3 

bundle, vector-tuple, of a vector bundle, 65.7.13 

bundle, vector-tuple, on a vector bundle, 65.7 

bundle association, tangent vector/covector, topological,
55.4.17 

bundle association map, fibre, differentiable, 66.7.5 

bundle cross-section, covariant tensor, short-cut, 46.2.1, 
57.7.1, 58.11.6 

bundle cross-sections, principal, connection form, 69.11.3 

bundle example, ordinary, real group on four-space, 64.8.8 

bundle example, principal, real group on four-space, 66.2.19, 
66.2.24, 66.5.4, 66.6.3 

bundle framework, fibre, discomfort, 57.7.1 

bundle manifold atlas, tangent covector, 55.4.8 

bundle manifold chart, tensor, linear-style, 56.3.20 

bundle manifold chart, tensor, multilinear-style, 56.3.19 

bundle notation, higher-level tangent, 59.1.26 

bundle product, tangent, identification map, 54.7 
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bundle set, tagged tangent field covector, Cartesian space, 


26.18.13 

bundle total space, tangent-line, Cartesian space, 26.14.2 
bundles, associated tensor, Lie derivative, 61.9.1 
bundles, baseless figure, associated, 20.11
bundles, baseless figure/frame, associated, 20.11.2 
bundles, fibre, tangent bundles, 54.5.33 

bundles, fibre, tangent covector bundles, 55.4.15 
bundles, fibre/frame combined, topological, 47.13 
bundles, figure/frame, baseless, 20.10
bundles, principal, principal frame bundles, 55.7.11 
bundles, principal frame, are principal bundles, 55.7.11 
bundles, tangent, are fibre bundles, 54.5.33 
bundles, tangent, recursive, 59 

bundles, tangent covector, are fibre bundles, 55.4.15 
bundles, topological fibre, equivalent, 47.7.12 
Bunny, Easter, axiom of choice, 13.8.15 

buoy, ocean current, Lie transport, 61.7.1 
Burali-Forti, Cesare, 77.1.7 

burden of hypotheses, 4.5.10 

bush, choice-function in the, 45.7.2 

business channel, definitions and theorems, 1.5.9 
but-not binary logic operator, 3.7.13 

butter and bread, general topology, 49.2.8 

BV function, 38.10.1 
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ation versus understanding, 1.4.6 


calculus, differential, 40

calculus, differential, chain rule, 40.5.15

calculus, differential, composition rule, 40.5.15
calculus, differential, constant multiplication rule, 40.5.9,

calculus, differential, product rule, 40.5.9, 41.1.18
calculus, differential, quotient rule, 40.5.9, 41.1.18
calculus, differential, reciprocal rule, 40.5.9, 41.1.18
calculus, differential, single variable rules, 40.5
calculus, differential, sum rule, 40.5.9, 41.1.18
calculus, exterior, Cartesian space, 46

calculus, fundamental theorem, 43.8

calculus, fundamental theorem 1, 43.8.3

calculus, fundamental theorem 1, vector-valued, 43.9.7
calculus, fundamental theorem 2, 43.8.5

calculus, fundamental theorem 2, vector-valued, 43.9.8
calculus, integral, 43

calculus, predicate, 5.1, 5.2, 6, 6.1

calculus, predicate, axiomatic system QC, 6.3.9
calculus, predicate, axiomatic system QC+EQ, 6.7.6
calculus, predicate, bound variable, 6.3.9

calculus, predicate, deduction metatheorem, 6.6.27
calculus, predicate, formalisations survey, 6.1.5
calculus, predicate, free variable, 6.3.9
calculus, predicate, international language, 77.4.1
calculus, predicate, with equality, 7.5.12
calculus, predicate tautology, 6.0.1

calculus, propositional, 4

calculus, propositional, axiomatic system PC, 4.4.3
calculus, propositional, axiomatic system PC’, 4.9.2
calculus, propositional, formalisation, 4.1
calculus, propositional, formalisations survey, 4.2.1

calculus, propositional, scroll management, 4.3.4, 4.3.7,
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4.3.10, 4.8.14 
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tensor, Levi-Civita connection, 743 
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calculus, tensor, metric tensor field, 73.3 

calculus, vector field, 61

calculus, vector field, Cartesian space, 46 

calculus of variations, extremisation of distance, 73.8 

calculus of variations, geodesics, history, 73.8.2

calculus of variations, literature, 44.8.1 

calculus of variations, trade-off, function versus derivative,
44.8.4

calculus of variations on Cartesian space, 44.8 

Cambridge Analytical Society, derivative notation, 40.4.12 

camel's back, straw, 5.2.8 

camera, orbit-space, associated fibre bundle, 47.11.1 

camera-versus-object argument, 69.10.5 

can of worms, 8.0.1 

cancellation ring, 18.1.1 

cancellative ring, 18.1.15 

cancellative semigroup, 17.1.14 

cancellative semigroup, ordered ring, 18.3.11 

canonical basis, antisymmetric multilinear map space, 30.3.6 

canonical basis, multilinear dual function space, 27.7.2 

canonical basis, multilinear function space, 27.6.9, 30.2.11, 
55.5.41

canonical basis, multilinear map space, 27.6.16, 30.2, 30.2.7 

canonical bilinear tensor map, 28.6.4

canonical dual basis, 23.8.2 

canonical dual basis family, 23.7.3 

canonical dual basis set, 23.7.2

canonical injection, dual vector in mixed tensor space, 29.5.8, 
29.5.10 

canonical injection, scalar in mixed tensor space, 29.5.8, 
29.5.10 

canonical injection, scalar in multilinear function space, 
27.3.7 

canonical injection, scalar in tensor space, 28.1.16 

canonical injection, vector in mixed tensor space, 29.5.8, 
29.5.10

canonical injection, vector in multilinear map space, 27.2.23 

canonical injection, vector in tensor space, 28.1.18 

canonical isomorphism, interpretation, 29.2.22 

canonical map from linear space to second dual, 23.10.4 

canonical multilinear function map, 27.4.4 

canonical multilinear function map transpose, 27.4.13 

canonical multilinear map, 28.2, 28.2.1, 28.2.2 

canonical tensor map, 28.2, 28.2.2, 28.6.2 

canonical tensor map, dual-space, 28.2.8 

Cantor, Georg Ferdinand Ludwig Philipp, 12.1.32, 77.1.7 

Cantor, Georg Ferdinand Ludwig Philipp, point-set topology, 
31.1.2 

Cantor, Georg Ferdinand Ludwig Philipp, set theory, 7.1.6 

Cantor-Bernstein-Dedekind theorem, 13.1.6

Cantor-Bernstein theorem, 13.1.6 

Cantor diagonalisation, constructible real numbers, 13.7.23 

Cantor diagonalisation procedure, 13.1.26

Cantor real number system, 15.3.8 

Cantor's intersection theorem, Cartesian space, 37.9.6, 
37.9.12 

Cantor's intersection theorem, complete metric space, 37.9.8 

Cantor's Paradise, David Hilbert, 12.5.5 

Cantor's theorem, 13.1.27 

Cantor's theorem, compared to Hartogs's theorem, 13.3.1 

capital, intellectual, 4.5.11 

Carathéodory, Constantin, 77.1.7 

Carathéodory measure, 43.1.3 

cardinal, homogeneous, 13.10.7 

cardinal, inductive, 13.10.7

cardinal, mediate, 13.10, 13.10.7, 13.11.6 

cardinal, reflexive, 13.10.7 

cardinal number comparability, 7.11.13 
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cardinality, 13 
cardinality, beta, 13.4 
cardinality, rational numbers, 15.2 

cardinality, trichotomy, 13.4.1 

cardinality, uniqueness, 6.8.7 

cardinality and tangent vectors, 53.1.17 
cardinality-constrained power set, 13.12 

cardinality of basis, finite-dimensional linear space, 22.7.16 
cardinality of countably infinite set, 13.7.9 

cardinality of finite set, 13.5.5 

cardinality of sets, comparability theorem, 13.1.20 
cardinality of sets, trichotomy theorem, 13.1.20 

cardinality of well-orderable set, 13.1.24 

cardinality-rank of set, 13.4.6 

cardinality representation by ordinal numbers, 13.2 
cardinality test using Peano axioms, 14.2.1
cardinality yardstick, aleph numbers, 13.2.6 

cardinality yardstick, beta-cardinality, 12.6.4, 13.4.2 
cardinality yardstick, beth number, 13.4.12 

cardinality yardstick, countable choice axiom, 14.2.8 
cardinality yardstick, finite ordinal numbers, 13.5.1, 14.2.1 
cardinality yardstick, infinite ordinal numbers useless, 12.6.7, 
13.1.21 
cardinality yardstick, mediate cardinals, 13.10.14

cardinality yardstick, ω, 13.7.5, 13.11.5

cardinality yardstick, ordinal numbers versus beta-sets, 12.5.4

cardinality yardstick, rational numbers, 15.2.1

cardinality yardstick, von Neumann universe, 12.6.3 

cardinality yardstick, Zermelo integer representation, 12.1.15 

cardinals, mediate, continuum hypothesis, 7.11.8 

Carnot, Lazare Nicholas Marguerite, 77.1.6 

carpet, red, 7.10.2 

carpet, sweep, variant notions of finiteness, 13.11.6 

carpet, sweep under, choice function, 45.3.6 

cart before horse, 49.7.5 

Cartan, Élie, 77.1.7 

Cartan, Élie, connection forms, 69.5.1

Cartan, Élie, exterior derivative, 46.7.1

Cartan, Élie, holonomy group, 70.1.4

Cartan, Élie, moving frames, 55.7.12

Cartan and Christoffel coefficient array equality, 68.3.6 

Cartan formalism literature, 69.5.1 

Cartan-style affine connection, 71.14.1 

Cartan-style connection form, 69.11.2

Cartan-style vector bundle connection, coefficient array, 
68.3.2 

Cartan-style vector bundle connection form array, 68.3.8 

Cartesian, locally, patchwork space, 49.9 

Cartesian abstract linear space, 26.11.11 

Cartesian atlas, locally, differentiable, 51.5.2 

Cartesian atlas, locally, topology induced on set, 51.5.7 

Cartesian atlas for a set, locally, continuous, 49.8.2 

Cartesian coordinates, 22.7.1 

Cartesian coordinates construction, 26.11.3 

Cartesian differentiable manifold, 51.4.22 

Cartesian geometry, 22.8.1 

Cartesian inner product space, 24.9.7, 26.11.1 

Cartesian linear space, 22.2.19, 26.11.1 

Cartesian linear space, standard basis, 22.7.9 

Cartesian locally Cartesian space, 49.4.16 

Cartesian metric linear space, 26.11.1

Cartesian metric space, 26.11.1 

Cartesian product, binary, projection map, 10.12 


Cartesian product, binary, projection slice, 10.12 
Cartesian product, countable set-family, 13.7.22 
Cartesian product, finite set-family, 13.7.17 
Cartesian product, function slice, 10.12.10, 10.13.15 


Cartesian product, general, projection map, 10.13 
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Cartesian product, general, projection slice, 10.13 

Cartesian product, general set-family, 10.11.10

Cartesian product, partial, 10.17, 10.17.2

Cartesian product, sequence, 14.6 

Cartesian product, standard subsequence map, 14.6.11 

Cartesian product component range projection map, 11.5.27 

Cartesian product of family of sets, 10.11, 10.11.2

Cartesian product of sets, 9.4, 9.4.2

Cartesian product of sets, properties, 9.4.6 

Cartesian set-product, concatenation map, 14.6.10 

Cartesian set-product, projection map, 10.12.2, 10.13.2, 
11.5.25 

Cartesian set-product, projection map, component subset,
10.13.6

Cartesian set-product, slice through point in set, 10.12.5,
10.13.10

Cartesian space, 26.11 

Cartesian space, atlas, standard, 49.7.12 

Cartesian space, atlas, usual, 49.7.12 

Cartesian space, Euclidean space, difference, 49.4.1 

Cartesian space, locally, 49 

Cartesian space, locally, differentiable, induced by atlas, 
51.5.14 

Cartesian space, locally, empty, dimension, 49.4.13 

Cartesian space, locally, Hausdorff, 33.1.32 

Cartesian space, locally, history, 49.4.8 

Cartesian space, locally, non-Hausdorff, 49.5, 51.8.2 

Cartesian space, locally, single-chart is manifold, 51.5.12 

Cartesian space, locally, T4 property, 49.4.22 

Cartesian space, locally, zero-dimensional, 49.4.9 

Cartesian space, metric space, completeness, 37.8.14 

Cartesian space, real, linear space structure, 22.2.19 

Cartesian space, tagged tangent field covector bundle set, 
26.18.13 

Cartesian space, tagged tangent field covector set, 26.18.11 

Cartesian space, tangent covector, 26.17.5, 26.17.6, 26.17.7 

Cartesian space, tangent covector space, 26.17.3 

Cartesian space, tangent field covector, 26.18.3, 26.18.4,
26.18.5

Cartesian space, tangent field covector set, 26.18.6 

Cartesian space, tangent field covector space, 26.18.8 

Cartesian space, tangent-line bundle total space, 26.14.2 

Cartesian space, tangent-line space, 26.13.11 

Cartesian space, tangent-line vector, 26.13.1, 26.13.2, 26.13.3 

Cartesian space, tangent-line vector set, 26.13.5 

Cartesian space, tangent-line vector velocity, 26.13.14 

Cartesian space, tensor, linear style, 30.6.10 

Cartesian space, tensor, multilinear style, 30.6.9 

Cartesian space, true tangent bundle, 53.1.8 

Cartesian space, vector field, 46.1 

Cartesian space, vector-valued function, differentiation, 41.8 

Cartesian space curve, mean value theorem, 40.8.6

Cartesian space diffeomorphism, 41.7.6, 42.7.2 

Cartesian space local diffeomorphism, 42.7 

Cartesian space map, constant, zero partial derivatives,
41.2.23

Cartesian space map common-domain product, differentiable, 
42.6.8 

Cartesian space map double-domain product, differentiable, 
42.6.9, 42.7.4 


Cartesian space maps, common-domain product, 
differentiability, 41.3.5 

Cartesian space maps, double-domain product, 
differentiability, 41.3.2 

Cartesian space p-norm, 24.7.11 

Cartesian space tangent covector bundle, 26.17 

Cartesian space tangent field covector bundle, 26.18 

Cartesian space tangent-line bundle, 26.14 
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Cartesian space tangent-line bundle, philosophy, 26.12 

Cartesian space tangent-line tangent space, 26.13 

Cartesian space tangent vector concatenation, 26.15.6 

Cartesian space tangent velocity bundle, 26.16 

Cartesian space tensor bundle, 30.6

Cartesian space topology, 32.6

Cartesian space valued function on manifold, differentiable, 
51.7.2 

Cartesian spaces, locally, philosophy, 49.2 

Cartesian spaces, locally, terminology, 49.2.8 

Cartesian spaces, ontology, 26.12.5 

Cartesian tangent vector, 26.19.1 

Cartesian topological linear space, 26.11.1 

Cartesian topological space, 26.11.1, 32.6.3, 49.4.3 

Cartesian topological space, locally, 49.4, 49.4.7 

Cartesian topological space, locally, dimension, 49.4.10 

Cartesian topological space, locally, variable-dimension, 
49.4.5 

Cartesian tuple space, 16.4.1, 26.11.1 

casing stones, Tura, 26.10.1 

cat, fox-hunter, 1.4.3 

cat, Schrödinger's, Hausdorff condition, 49.5.10

cat, visual cortex, 40.2.1 

catchment area, differential geometry, 1.4.1, 2.0.1 

category theory, commutative diagram, 10.15.9 

catenary, history of geodesics, 73.8.2 

cattle, 22.0.3

Cauchy, Augustin Louis, 77.1.6 

Cauchy, Augustin Louis, continuity of functions, 31.12.2 

Cauchy, Augustin Louis, finite groups, 17.3.1 

Cauchy, Augustin Louis, integration, 43.1.2

Cauchy convergent sequence of functions, uniformly, 38.4.3 

Cauchy-Euler method, ODE existence, 44.3.1 

Cauchy integrability, uniform continuity condition, 43.3.20 

Cauchy integrable function, real-valued, 43.3.16 

Cauchy integral, 43.1.4, 43.3, 43.3.15 

Cauchy integral, Cauchy net, 43.3.19 

Cauchy method, ODE existence, 44.3.1 

Cauchy net, 37.8.15, 38.3.13 

Cauchy net, Cauchy integral, 43.3.19 

Cauchy-Peano theorem, ODE existence, 44.3.1 

Cauchy-Riemann-Darboux integral, 43.5

Cauchy-Riemann-Darboux integral, basic properties, 43.7 

Cauchy-Riemann-Darboux integral, history, milestones, 
43.5.1 

Cauchy-Riemann-Darboux integral, vector-valued integrand, 
43.9 

Cauchy-Riemann-Darboux-Stieltjes integral, 43.10 

Cauchy-Riemann integral, 43.4 

Cauchy-Schwarz inequality, elementary version, 16.6.17 

Cauchy-Schwarz inequality, module over Archimedean ring, 
19.8.9 

Cauchy-Schwarz inequality, module over ring, 19.8, 19.8.2 

Cauchy-Schwarz inequality, real linear space, 24.9.14 

Cauchy sequence, 37.8

Cauchy sequence, metric space, 37.8.3 

Cauchy sequence of rational numbers, 15.3.7 

Cauchy sum, 43.3.14

cause, first, 3.1.6 

caveat, existence, 11.2.18 

Cayley, Arthur, 17.3.1, 22.0.1, 25.0.1, 77.1.6 

Cayley product of matrices, 25.1.1

Cayley table, 17.1.4, 25.1.3 

CC (axiom of countable choice), 7.1.10 

CC-tainted theorem, 7.11.14, 13.7.22, 13.8.12, 13.8.13, 13.9.9, 
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CC-tainted theorem examples, 7.11.13 

ceiling function, 16.5.12 

centre of ball, 37.3.1 

centroid, 20.9.20 

chaff versus wheat, 77.4.15 

chain, set-inclusion, 12.5.17 

chain, set membership, 7.8.9 

chain rule, Cartesian spaces, 41.7 

chain rule, composition of map with curve in manifold, 
58.4.11 

chain rule, composition of real function with curve in 
manifold, 58.1.19 

chain rule, differentiability, differentiable manifold map, 
52.1.17 

chain rule, differential calculus, 40.5.15 

chain rule, global differential, differentiable manifold map, 
58.9.8 

chain rule, inverse, partial derivatives, 41.7.7 

chain rule, partial derivatives, 41.7.2, 41.7.4 

chain rule, partial derivatives, higher-order, 42.5.27 

chain rule, partial derivatives, second-order, 42.5.25 

chain rule, pointwise differential, differentiable manifold map, 
58.4.13 

chained functions, fully, 10.4.16 

chained functions, partially, 10.4.24 

channels, stereo, book presentation, 1.5.9 

chapter groups, 1.3 

chapter page counts, 1.3.0 

chapters, overview, 1.2

characterisation, pointwise, interior and closure of set, 31.8.11 

characteristic, field, 18.7.10 

characteristic, unitary ring, 18.2.13 

characteristic function, alternative name for indicator 
function, 14.7.4 

chart, analytic, 51.10.4 

chart, anchor, 71.6.2 

chart, compatible, differentiable manifold, 51.4.2 

chart, continuous, locally Cartesian space, 49.6 

chart, differentiable manifold, empty, 51.5.5 

chart, fibre, antisymmetric multilinear map fibration, 56.5.19 

chart, fibre, association map, 66.7.5 

chart, fibre, compatible, 47.6.15 

chart, fibre, differentiable fibration, 64.2.2, 64.3, 64.3.2 

chart, fibre, differentiable fibre bundle, 64.8.3 

chart, fibre, differential transformation rule, 64.8.10 

chart, fibre, non-topological fibration, 21.5, 21.5.2 

chart, fibre, topological, 47.3

chart, fibre, topological fibration, 47.3.10 

chart, fibre, topological fibration with intrinsic fibre space, 
47.3.3 

chart, fibre, transition map subscript order, 47.6.11 

chart, fibre, vector-valued antisymmetric multilinear map
fibration, 56.7.11

chart, implicit, mathematical object, 26.11.6 

chart, locally Cartesian space, 49.6.2 

chart, manifold, affine space, 26.2.7 

chart, non-topological, 49.3, 49.3.3 

chart, tensor bundle manifold, linear-style, 56.3.20 

chart, tensor bundle manifold, multilinear-style, 56.3.19 

chart, topological manifold, 49.6.2

chart-basis covector, tangent space, 55.3.2 

chart-basis operator field, 57.3.5 

chart-basis operator field, lifted, on principal bundle, 69.14.4 

chart-basis vector, tangent operator space, 54.13.4 

chart-basis vector, tangent space, 54.4.9 

chart-basis vector field, 57.1.18 

chart-basis vector field, lifted, on principal bundle, 69.14.3 

chart-basis vector-tuple, tangent space, 55.5.18

chart-basis vector-tuple field, 57.4.2

chart constructor, fibre, antisymmetric multilinear function 
bundle, 56.5.20 

chart constructor, fibre, antisymmetric multilinear map 
bundle, 56.7.12 

chart constructor, fibre, list, 65.7.8 

chart constructor, fibre, multilinear function bundle, 56.4.13 

chart constructor, fibre, principal frame bundle, 55.7.6 

chart constructor, fibre, tangent bundle, 54.5.7 

chart constructor, fibre, tangent covector bundle, 55.4.7 

chart constructor, fibre, tangent vector-frame bundle, 55.6.18 

chart constructor, fibre, tangent vector-tuple bundle, 55.5.25 

chart constructor, fibre, tensor bundle, 56.3.13, 56.3.14 

chart constructor, fibre, vector-frame bundle, 65.8.7 

chart constructor, fibre, vector-tuple bundle, 65.7.9 

chart constructor, manifold, antisymmetric multilinear 
function bundle, 56.5.25 

chart constructor, manifold, antisymmetric multilinear map 
bundle, 56.7.16 

chart constructor, manifold, list, 65.7.10 

chart constructor, manifold, multilinear function bundle, 
56.4.18 

chart constructor, manifold, tangent bundle, 54.5.21

chart constructor, manifold, tangent covector bundle, 55.4.8

chart constructor, manifold, tangent vector-frame bundle,
55.6.26

chart constructor, manifold, tangent vector-tuple bundle,
55.5.33

chart constructor, manifold, tensor bundle, 56.3.21, 56.3.22

chart constructor, manifold, vector-frame bundle, 65.8.8

chart constructor, manifold, vector-tuple bundle, 65.7.11

chart for cross-section, identity, non-topological principal
bundle, 21.10.7

chart for path, 36.1.8

chart-independence, right action, differentiable principal
bundle, 66.2.5

chart-independence, right action, non-topological principal
bundle, 21.11.2

chart-independence, right action, topological principal
bundle, 47.8.10

chart map, manifold, vector frame bundle, 55.6.25

chart space, coordinate-free, 53.1.11

chart space, locally Cartesian space, 49.4.12

chart transition formula, tangent vector frames, 55.6.19

chart transition formula, tangent vector tuples, 55.5.27

chart transition map, differentiable fibre, 64.3.13

chart transition map, fibre, non-topological fibre bundle,
21.8.12

chart transition matrix, differentiable manifold, 51.4.18

chart transition rule, tangent vector components, 54.1.11

chart transition rules, covector, double contragredient, 55.3.10

chart transition rules, manifold, contragredient, 54.4.12,
55.2.16

charts, compatible, covering set is atlas, 51.4.13

Chasles, Michel, 77.1.6

check-accent, vertical component, 59.1.7

check accent on free variable, 6.3.16

chess, automated theorem proving, 4.5.12

chess, certification of axiom-set compliance, 7.4.1

chess and formalism, 2.2.12

chess-like decision trees, 2.4.8

chicken, egg, derivative and differentiability, 41.1.1

chicken, egg, integral curve, 44.5.4, 46.5.4

chicken-foot symbol, 1.5.7, 8.8.4, 9.5.28, 17.0.5

chicken-foot symbol, tensor space diagram, 28.1.9, 29.3.5

chickens, roost, 77.4.15

Chinese mathematics, 14.9.6

choice, dependent, axiom, 7.1.11

choice axiom, 5.2.7, 7.11.10, 32.1.11
choice axiom, Andromeda Galaxy holiday, 22.7.20
choice axiom, Cantor’s intersection theorem, 37.9.9
choice axiom, career implications, 2.1.5
choice axiom, Cartesian product of sets, 10.11.9
choice axiom, clothes, 77.4.1
choice axiom, complete metric space, 37.8.10
choice axiom, countable, 7.11.14, 13.7.21
choice axiom, countable, doppelgänger, 33.7.4
choice axiom, countable, infinite sets, 13.7.5
choice axiom, countable, magic wand, 13.10.1
choice axiom, countable, separable space, 33.4.22
choice axiom, Easter Bunny, 13.8.15
choice axiom, equinumerosity, 13.1.16
choice axiom, existence of homeomorphisms, 31.14.5
choice axiom, faith, 7.10.3
choice axiom, false, 7.12.5
choice axiom, families of sets and functions, 10.8.6
choice axiom, Father Christmas, 13.8.15
choice axiom, imaginary parachute, 7.11.1, 10.3.1
choice axiom, independence from ZF axioms, 22.6.11
choice axiom, infinite, 7.11, 7.12
choice axiom, linear functional extension, 23.5.1
choice axiom, linear space basis, 22.7.20, 22.7.22, 27.5.2
choice axiom, literature, 7.11.2
choice axiom, local, 7.12.4
choice axiom, not proven, Scottish law, 7.1.11
choice axiom, not to be taken literally, 45.3.7
choice axiom, product topology, 32.12.5
choice axiom, quantifier swapping, 10.11.11, 13.8.9
choice axiom, reversing logical quantifiers, 38.2.2, 45.2.1
choice axiom, shim theorem, 13.8.14
choice axiom, socks metaphor, Russell, 7.12.2
choice axiom, theorems lost without it, 7.11.13
choice axiom, Tikhonov’s theorem, 33.5.16
choice axiom, tooth fairy, 13.8.15
choice axiom, Urysohn’s lemma, 33.3.22
choice axiom, useless, 7.4.3
choice axiom, weeded out, 1.6.2
choice axiom equivalents, literature, 7.11.12
choice function, 10.3
choice function, open base, second countable space, 33.4.23
choice function, power-set, 10.3.7
choice function, sweep under carpet, 45.3.6
choice function for separable topology, 33.4.8
choice theorem, finite, Cartesian products, 13.7.17
chord, classical definition, 44.2.22
Christmas, Father, axiom of choice, 13.8.15
Christoffel, Elwin Bruno, 77.1.7
Christoffel, Elwin Bruno, absolute differential calculus, 27.1.5
Christoffel, Elwin Bruno, Christoffel symbol, 67.1.4
Christoffel, Elwin Bruno, relativity, 77.3.2
Christoffel and Cartan coefficient array equality, 68.3.6
Christoffel array, affine connection, 67.1.1, 71.2.2
Christoffel array, correction terms, 71.4.1
Christoffel array, geodesic curve, 72.1.6
Christoffel array, grooves in space, 60.1.4
Christoffel array, history, 67.1.4, 74.3.3
Christoffel array, horizontal lift function, 62.0.1, 67.7.10
Christoffel array, index order, 71.13.1
Christoffel array, negative sign, 71.2.2
Christoffel array, negative sign, covariant derivative, 68.1.9,
70.4.3, 71.4.1, 71.12.2
Christoffel array, non-tensorial, 74.3.8
Christoffel array, Riemann curvature, 70.4.3
Christoffel array, Schild’s ladder, 72.22
Christoffel array, tensorisation, 60.4.1, 60.4.12
Christoffel array, torsion, 71.12.2
Christoffel array, vector bundle, 68.1.8

Christoffel array field, affine connection, 71.2.1 

Christoffel array on two-sphere, 76.6.1 

Christoffel coefficient array, Levi-Civita connection, 74.3.4 
Christoffel symbol, affine connection, 71.2.1
Christoffel symbol, history, 74.3.3 

chronology, Lorentz transformations and relativity, 77.3.2 
chronology of mathematicians, 77.1 

circle, great, terrestrial coordinates, 76.7.3 

circle topology, semi-open interval, 32.6.14 

circle topology distance function, semi-open interval, 37.2.10 
circled arrow notation, 10.19.8 

circuit, digital electronic, 3.2.9 

clarity, ontological, tangent vector definition, 54.1.6 

clash, name, field, 18.7.6 

class, equivalence, 9.8.5 

class, mathematical, 8.8.2 

class of object, 8.8, 44.6.8 

class of object, coefficient matrix, 29.4.8 


63.7.1 
class of object, parametrisation by sets, 9.1.4 
class of object, predicate logic, 5.1.1
class of sets, algebraic, 18.11
class of subsets of a set, multiplicative, 18.11.4
class-tag dictionary, mathematical objects, 44.6.8
classes, infinite differentiability, interpretation, 42.1.12
classes, topological spaces, 33
classes, topology, 31.11.11
classes of sets, 18.12 
classical electromagnetism, gauge theory, 70.8.7 
classical matrix group, 25.14.1 
classification, topological fibre bundles, out of scope, 47.7.1 
classification of set, 9.8.9 
clear as mud, context-dependent, 63.6.7 
Clifford, William Kingdon, 77.1.6 
Clifford, William Kingdon, parallelism, 67.1.2 
cliffs of Dover, white, bluebirds, 77.4.15 
close enough is close enough, 44.6.16 
closed annulus, 37.3.14 
closed ball, 37.3.1, 37.5.13 
closed ball, punctured, 37.3.14 
closed ball with zero radius, 37.3.4 
closed curve, 36.2.11 
closed curve, simple, 36.2.11 
closed interval, 11.5.10, 16.1. 
closed interval notation, symmetrised, 16.1.15 
closed neighbourhood, 31.8.26 
closed neighbourhoods, 31.8.25 
closed-open interval, 11.5.10, 16.1. 
closed portion of boundary of set, 31.9.14 
closed set, 31.4.1 
closed set, alternative definition, 31.10.14 
closed set symbol F, 31.4.2 
closed-singleton topology, minimal, 31.11.8 
closure, exterior, topology, 31.9.1 
closure, infix logical expression space, substitution, 3.12.5, 
3.12.12 
closure, postfix logical expression space, substitution, 3.11.8 
closure, prefix logical expression space, substitution, 3.11.15 
closure, topological, 31.8.7 
closure, topological, notation, 31.8.10 
closure notations, topological, survey, 31.8.10 
closure of set, 31.8 
closure of set, pointwise characterisation, 31.8.11 
closure of set unions under arbitrary unions, 8.6 
closure operator, relation to strength of topology, 31.8.22
closure operator, topological, basic properties, 31.8.13,
31.8.14, 31.8.16, 31.8.17 

closure operator, topological space, 31.8.9 

clothes, axiom of choice, 77.4.1 

cloud of statistical variations, 2.2.5 

clouds on the horizon, physics, 36.3.1 

cluster point, ω, 31.10.16

cluster point of set, 31.10.2

cluster point of set, ω-, 31.10.17

co-existence, tangent spaces, 26.8.1, 53.3.2 

co-velocity of tangent covector, Cartesian space, 26.18.9 

coarse topological space, 31.3.18 

coarse topology, 31.3.18 

coarser, partition, 8.7.15 

coarsest common refinement of interval partitions, 43.3.7 

codimension, linear space, 22.5.17 

codomain of relation, 9.5.4 

coefficient array, Christoffel, Levi-Civita connection, 74.3.4 

coefficient array, moving-frame, vector bundle connection, 
68.3 

coefficient array field, affine connection, 71.2.2 

coefficient array for vector bundle connection, Cartan-style, 
68.3.2 

coefficient array of connection, vector bundle, 68.1.8 

coefficient sequence of real analytic function, 42.8.3 

coefficient space, multinomial, 14.12.20 

coefficient space, multinomial, symmetric, 14.12.20 

coefficient versus component versus coordinate, 53.3.15 

coefficients, tensorisation, 74.3.8 

coefficients of affine connection on tangent bundle, 71.13 

Cohen, Paul Joseph, 2.1.3, 77.1.8

coin, two sides, 20.0.01

collections, synonymous with sets, 7.1.7

column index, matrix element, 25.2.1

column matrix injection map, 25.2.18 

column matrix of a matrix, 25.2.14 

column matrix span, 25.5.2 

column null space, 25.6 

column null space of matrix, 25.6.2 

column nullity, 25.6 

column nullity of matrix, 25.6.3 

column of a matrix, 25.2.13 

column rank of matrix, 25.5.8 

column span, matrix, 25.5 

column vector of a matrix, 25.2.15 

comb space, not locally connected, 34.7.7 

combination, convex, 22.11.2, 24.4.1 

combination, convex, linear space, 22.11 

combination, convex, points in affine space, 26.9.14 

combination, convex, vectors, 20.9.21 

combination, linear, 20.9.20, 20.9.22, 22.3, 22.3.2, 22.5.20 

combination, linear, formal, 22.2.22

combination of points, affine, 24.4.3 

combination symbol, 14.9.2 

combinations, 14.9 

combinatorial topology, 31.1.2 

combinatorics, topology on finite set, 31.5.6 

combined topological fibre/frame bundles, 47.13 

comma-notated tuple concatenation, 46.8.7 

comma notation, partial derivatives, tensor calculus, 73.3.5 

commensurable lines, 26.11.3 

common divisor, greatest, 15.1.4, 40.2.3 

common-domain direct function-product, properties, 10.15.6 

common-domain direct product map, differentiability, 52.6.13 

common-domain direct product of functions, 10.15 

common-domain direct product of functions, continuity, 
32.11.2

common-domain direct product of functions, homeomorphism,
32.11.3 

common-domain direct product of functions, pointwise, 
10.15.2 

common-domain direct product of maps, differential, 54.7.3 

common-domain direct product of relations, 9.7.9

common-domain function-product homeomorphism, 32.11

common-domain product, Cartesian space maps,
differentiability, 41.3.5

common-domain product, Cartesian space maps, 
differentiable, 42.6.8 

common-domain product-map, differential, 58.7 

common-domain product map differential, component 
differentials, 58.7.9 

common-domain product of differentiable maps, 52.7.5 

common-domain product of maps, pointwise differential, 
58.7.7, 58.10.7 

common refinement, interval partitions, 43.3.5 

common refinement condition, directed set, 37.8.15 

common refinement of interval partitions, coarsest, 43.3.7 

communication network, data packet, 21.15.77

community, mathematical, 7.10.2 

commutative diagram, category theory, 10.15.9 

commutative division ring, 18.7.4 

commutative group, 17.3.23 

commutative group, ordered, 17.5.8 

commutative group, ordered, Archimedean, 17.5.9 

commutative group, ordering, Archimedean, 17.5.9 

commutative group ordering, 17.5.7 

commutative ring, 18.1.17 

commutative semigroup, 17.1.15 

commutative unitary ring, 18.2.14 

commutative unitary ring, associative algebra, 19.9.2 

commutative unitary ring, differentiable real functions, 51.6.4 

commutativity, partial derivative, theorem, 42.3.6, 42.5.23 

commutativity of partial derivatives, 42.3 

compact analytic manifold, 51.10.6

compact-interval curve, 36.2.9 

compact neighbourhood, 33.6.2 

compact-open topology, 33.5.20 

compact set, 33.5.10 

compact set, ω-limit-point, 35.5.9

compact set, ω-limit-point, ω-infinite-set, 35.5.12

compact set, ω-limit-point, sequence-range, 35.6.4

compact set, limit-point, 35.5.4

compact set, limit-point, ω-infinite-set, 35.5.12

compact set, limit-point, sequence-range, 35.6.4

compact set, locally, 33.6.4 

compact set, sequentially, 35.6.2 

compact set, strongly locally, 33.6.11 

compact set terminology, history, 33.5.1 

compact sets, closed in Hausdorff space, 33.5.14 

compact sets, real, explicit sequential compactness, 35.7.10 

compact sets, real, sequential compactness, 35.7.11 

compact space, limit-point, 35.5.5 

compact space, sequentially, 35.6.2 

compact topological space, 7.11.13. 

compact topology, locally, 33.6.3 

compact topology, strongly locally, 33.6.10 

compactness, cover-based, 33.5 

compactness, explicit sequential, real compact sets, 35.7.10 

compactness, Heine-Borel, 33.5.9 

compactness, hereditary, 33.5.12 

compactness, local, 33.6 

compactness, metric space, 37.7 

compactness, sequence-limit-based, 35.6 

compactness, sequential, real compact sets, 35.7.11 

compactness, set-limit-based, 35.5

compactness classes, topological spaces, cover-based, 33.7

comparability, well ordered sets, 11.7

comparability theorem, set cardinality, 13.1.20 

comparability theorem, well-ordered sets, 11.7.7 

compatibility, connection form localisations, 69.12.5 

compatible chart, differentiable manifold, 51.4.2 

compatible charts, covering set, atlas, 51.4.13 

compatible fibre chart, 47.6.15 

complement, set, 8.2, 8.2.1 

complement topology, finite, 31.11.7 

complete atlas, disadvantages, 50.4.9, 51.2.2, 51.4.4, 54.5.31,
54.9.12, 64.4.1

complete atlas, locally Cartesian space, 49.7.18 

complete atlas, topological manifold, 49.7.18 

complete C^k atlas, differentiable manifold, 51.4.5

complete lattice, 11.4.4 

complete metric space, 37.8.9 

complete metric space, nested set convergence, 37.9 

complete ordered field, 18.9, 18.9.3

complete ordered field axioms, real numbers, 15.9.3 

completely regular topological space, 33.3.18 

completely separable space, 33.4.11 

completeness, logical calculus, 6.6.14 

completeness, metric space, 37.8 

completeness, metric space, Cartesian space, 37.8.14 

completeness, metric space, real numbers, 37.8.14 

completeness theorem, geodesic, Hopf-Rinow, 73.7.2 

complex analytic function, 42.8.5 

complex Banach space, 39.4.4 

complex holomorphic function, 42.8.7 

complex Lie group, 62.2.10

complex linear space, 22.1.16 

complex number, 16.8 

complex number, ideal, 18.1.10 

complex number, imaginary part, 16.8.5 

complex number, real part, 16.8.5 

complex number principal argument, 44.2.10 

complex number system, 16.8.1 

complex number system, integral domain, 18.2.19 

complex numbers, absolute value, 16.8.8 

complex numbers, magic properties, 16.8.11 

complex numbers, out of scope, 1.6.3 

complex numbers, topology, usual, 32.6.12 

complex numbers, usual topology, 32.6.11 

complex topological linear space, 39.1.3 

component, connected, divide and conquer, 34.5.1, 34.5.16 

component, connected, of a point, 34.6.2 

component, connected, topological space, 34.5, 34.5.3 

component, connected, topological space subset, 34.5.6 

component, horizontal, of tangent vector, 64.5.3 

component, open interval, point in real-number open set, 
32.7.5 

component array, basis-transition, 22.9.6 

component array, basis-transition, multiplication conventions, 
22.9.10 

component array, linear map, 23.2.5 

component array field, Riemannian metric, 73.3.2 

component array field, Riemannian metric, inverse, 73.3.8 

component differentials, common-domain product map 
differential, 58.7.9 

component enumeration, connected, countable, 34.6.16 

component enumeration, open set of real numbers, 32.7 

component enumeration for open set, dartboard, 32.7.7,
34.8.3

component function for vector, 22.8.16 

component map, connected, 34.6.2 

component map, connected, topological space, 34.6 

component map, dual, 23.9.1, 23.9.4

component map, dual linear space, 23.9

component map, horizontal, connection on fibre bundle, 67.9 

component map, horizontal, ordinary fibre bundle, 67.9.2

component map, horizontal, principal fibre bundle, 69.3.2 

component map, linear map, 23.2.2, 23.2.8, 23.2.9, 

component map, linear space, 22.8.6, 22.8.7, 22.8.8 

component map, vector, 22.8 

component map, vertical, connection on fibre bundle, 67.10 

component map, vertical, ordinary fibre bundle, 67.10.2 

component map, vertical, principal bundle, 69.4 

component map, vertical, principal fibre bundle, 69.4.2 

component map for a linear map, 23.2 

component matrix, linear map, 25.7

component matrix of a linear map, 25.7.3 

component partition, countable, open set, 34.8.2 

component-projection map, differentiability, 42.6.17 

component space, baseless figure/frame bundle, 20.10.8 

component space, topological fibre/frame bundle, 47.13.2 

component spaces, infinite, multilinear map, 27.2.7 

component swap function, horizontal, global, 59.6.5 

component transition array, vector, fibre/frame bundle, 
22.9.12 

component transition map, vector, 22.9, 22.9.4 

component transition matrix, dual linear space, 23.9.7 

component tuple, second-order tangent, 60.5.3 

component tuple, tangent covector, 55.2.5 

component tuple for vector, 22.8.17 

component unpacking, double tangent vector, 59.3.6 

component versus coordinate, terminology, 54.1.5 

component versus coordinate versus coefficient, 53.3.15 

components, multilinear map, 27.5 

components, tangent vector, chart transition rule, 54.1.11 

components for mixed tensors, mixed primal spaces, 29.4 

components for mixed tensors, single primal space, 29.6 

components of open set, connected, enumeration, 34.8.1 

componentwise differentiable two-variable manifold map, 
52.6.11 

composite of functions, 10.4.17 

composite of partially defined functions, 10.10.6 

composition, pointwise, of function-valued functions, 10.4.26 

composition, relations, 9.6 

composition of functions, 10.4, 10.4.17 

composition of functions, associativity, 10.4.21 

composition of functions, notation, 10.4.18 

composition of partially defined functions, 10.10, 10.10.6 

composition of relations, 9.6.2

composition rule, differential calculus, 40.5.15 

composition rule, partial derivatives, 41.7.2, 41.7.4 

composition rule for differentiation, 42.1.17 

compound sequent, 5.3.14 

comprehension, definition of sets, uniqueness, 7.6.24 

comprehension, naive, set specification, 7.7.9 

comprehension, unrestricted, 3.0.5

comprehension axiom, Zermelo set theory, 7.7.3 

compression, antisymmetric higher-degree array, 25.15.13 

compression, symmetric higher-degree array, 25.15.12 

computation versus understanding, 1.4.6

computer, virtual memory, 7.10.2 

computer hardware, real numbers, 15.3.1 

computer operating system, 4.5.11 

computer program, analogy to mathematical theorems, 2.2.14 

concatenation, tuple, comma-notated, 46.8.7 

concatenation map, Cartesian set-products, 14.6.10 

concatenation of Cartesian space tangent vectors, 26.15.6 

concatenation of curves, 36.6 

concatenation of curves, domain-transformed, 36.6.9 

concatenation of curves, domain-translated, 36.6.7 

concatenation of families, juxtaposition notation, 29.3.9

concatenation of joinable curves, 36.6.5

concatenation of lists, 14.12.6 

concatenation of paths, 36.8.12 

concatenation of tangent vectors, 54.7.1 

concatenation of tangent vectors, differentiable manifolds, 
54.7.8 

concatenation operation for tuples, 16.4.3 

concave function, linear space, 23.12.2 

concavity implied by non-increasing derivative, 40.6.11 

concavity implied by non-positive second derivative, 42.1.22 

concept, intermediate, force, 53.1.4 

conceptual economy, 10.1.3 

concrete differentiable manifold structure, 51.5.17 

concrete proposition, 2.4.3 

concrete proposition domain, 3.2, 3.2.3 

concrete proposition domain examples, 3.2.9 

concrete variable space, 7.5.12 

concretely identical linear spaces, 59.6.2 

conditional assertion, 4.3.9, 4.3.10 

conditional proof, deduction rule, 4.8.4 

cone, forward, finite path, 21.15.97

cone, positive, field, 18.8.9 

cone, positive, ring, 18.3.17 

cone condition, exterior/interior, 26.19.4, 51.11.5 

conformable matrices for multiplication, 25.3.6 

conformal-invariant geometry, 77.2.5 

conformal map, 20.6.3 

conformal transformation group, 77.2.6 

conformality of transition maps, 54.5.31 


conjugate map differential, adjoint map, 62.10.4 
conjugate of subset of group, left, 17.8.2 
conjugate of subset of group, right, 17.8.3 
conjugate operator, linear space, 23.11.3 
conjugation map, 17.8.12 

conjugation maps in groups, 17.8 

conjunction, logical, 3.6.2

conjunctive normal form, 4.7.4 


connected, locally, separable space, 34.8 

connected, locally, subset of topological space, 34.7.5 
connected, locally, topological space, 34.7.3
connected, locally pathwise, topological space, 36.7.14
connected component, divide and conquer, 34.5.1, 34.5.16
connected component, topological space, 34.5, 34.5.3
connected component, topological space subset, 34.5.6
connected component enumeration, countable, 34.6.16
connected component map, 34.6.2 

connected component map, topological space, 34.6 
connected component of a point, 34.6.2 

connected components of open set, enumeration, 34.8.1 
connected function, forward, 35.2.4
connected function, strongly, 35.2.6 

connected function, weakly, 35.2.5 

connected point, locally, topological space, 34.7.2 
connected points, pathwise, 36.7.2, 36.7.5 

connected set, 34.1.6 

connected set, pathwise, 36.7, 36.7.6 

connected topological space, 4.1.3

connected topological space, pathwise, 36.7.7 

connected topological spaces, 34.1 

connectedness, topological, 34

connectedness, topological, literature, 34.1.1 

connectedness and continuity of functions, 31.2.8 
connectedness-based definition for function continuity, 35.2 
connectedness in strongly separated spaces, 34.2
connectedness of topological spaces, local, 34.7 
connectedness principle, transfinite induction, 11.8.5 
connectedness verification procedures, 34.4 

connection, affine, 26.1.3, 71, 71.1.2

connection, affine, Cartan-style, 71.14.1

connection, affine, Christoffel array, 71.2.2 

connection, affine, coefficient array field, 71.2.2 

connection, affine, covariant derivative, 71.6 

connection, affine, covariant derivative Leibniz rule, 71.6.7 

connection, affine, grooves in space, 60.1.4 

connection, affine, metric-free parallelism, 71.6.1 

connection, affine, on frame field, 71.14, 71.14.2 

connection, affine, Ricci curvature tensor, 71.11.11 

connection, affine, Riemann curvature tensor field, 71.11.5 

connection, affine, tensorisation coefficients, 60.4.12 

connection, affine, terminology due to Hermann Weyl, 67.1.1, 
71.0.1 

connection, affine, transposed, 71.5.2 

connection, associated, contravariant principal bundle 
function, 69.10.3 

connection, associated, literature, 67.12.1 

connection, associated, ordinary fibre bundle, 67.12, 67.12.3 

connection, associated, principal bundle, 69.9

connection, associated, vector-tuple bundle, 59.7.1, 71.3,
71.3.2, 74.2.1

connection, associated OFB, formula using PFB connection, 
69.9.4, 69.9.6 

connection, differentiable, ordinary fibre bundle, 67.7.6 

connection, differentiable, principal bundle, 69.1.13 

connection, general, torsion, 72.2.3

connection, information style, 67.2.1 

connection, Koszul, 67.2.1, 67.2.2, 71.6.9 

connection, Koszul, torsion, 71.12.2, 71.12.4 

connection, Levi-Civita, 74.2.4, 74.3.8

connection, Levi-Civita, abstract formula, 74.2.13, 74.2.16 

connection, Levi-Civita, Christoffel coefficient array, 74.3.4 

connection, Levi-Civita, Euclidean space, 26.11.1 

connection, Levi-Civita, horizontal lift function, 74.2 

connection, Levi-Civita, metric layer, 73.1.4

connection, Levi-Civita, on two-sphere, 76.6.1 

connection, Levi-Civita, orthogonal, 74.1.2 

connection, Levi-Civita, tensor calculus, 74.3 

connection, Levi-Civita, tensorisation, 60.4.12 

connection, Levi-Civita, torsion-free, 74.2.3 

connection, Lie, 61.7, 61.7.5 

connection, Lie, for vector fields, 61.7.9 

connection, Lie, geometric interpretation, 61.7.1 

connection, Lie, versus affine connection, 61.7.4 

connection, linear, vector bundle, 68.1.4 

connection, metric-compatible, 74.2.2 

connection, orbit-space associated vector bundle, 69.10 

connection, orthogonal, 73.1.2

connection, orthogonal, Levi-Civita, 74.1.2 

connection, PFB, formula for associated OFB connection, 
69.9.4, 69.9.6 

connection, reconstruction of parallel transport from, 67.3 

connection, torsion-free, 71.12.8

connection, torsion-free, interpretation, 74.2.10 

connection, vector bundle, Cartan-style coefficient array, 
68.3.2 

connection, vector bundle, linearity, 68.1.2 

connection, vector bundle, moving-frame coefficient array, 
68.3 

connection, vector bundle, oblique drop, linearity, 68.1.6 

connection coefficient array, vector bundle, 68.1.8 

connection curvature, affine, principal fibre bundle, 71.4.6 

connection definition conversions, ordinary bundle, 67.11 

connection definition conversions, principal bundle, 69.6 

connection definition conversions overview, 69.15

connection definition conversions table, 69.15. 

connection definitions, Rosetta stone, 69.15.3 

connection definitions list, 69.15.1

connection definitions survey, 67.2.2

connection form, Cartan-style, 69.11.2 

connection form, construction from cross-section localisations, 
69.12.6 

connection form, construction from gauge potentials, 69.12.7 

connection form, conversion to gauge potential, 69.11.3 

connection form, curvature, 70.5.2 

connection form, differentiation by tangent vector, 54.14.6 

connection form, effect of right action map, 69.8 

connection form, Ehresmann, 69.5.5

connection form, gauge transformation rule, 69.13.10 

connection form, localisation, via cross-section, 69.11.3 

connection form, Maurer-Cartan form, 62.10.13 

connection form, misnomer for covariant derivative form, 
69.5.2 

connection form, negative sign, covariant derivative, 69.7.8 

connection form, principal bundle, Minkowski space, 69.5.6 

connection form, principal fibre bundle, 69.5.4 

connection form, reconstruction from localisations, 69.12.3 

connection form array, vector bundle, Cartan-style, 68.3.8 

connection form equals vector minus its generator, 69.7.7 

connection form example, curvature, 70.5.3 

connection form example, exterior derivative, 69.5.8 

connection form for principal bundle cross-sections, 69.11.3 

connection form formula using connection generator, 69.7.3 

connection form is a covariant derivative form, 69.7.6 

connection form localisation, overlap consistency, 69.12.4 

connection form localisation component function, 69.13.3 

connection form localisation components, 69.13 

connection form localisation via cross-sections, 69.11 

connection form localisations, compatibility, 69.12.5 

connection form on principal fibre bundle, 69.5 

connection formulas, connection generator function, 67.11.6 

connection formulas using connection generator, 69.7.2 

connection generator, difficult extraction from connection, 
69.9.1 

connection generator, ordinary fibre bundle, 67.6.5 

connection generator, ordinary fibre bundle, transition rule, 
67.6.8 

connection generator function, 67.6 

connection generator function, connection formulas, 67.11.6 

connection generator function, principal bundle, 69.7 

connection generator function, terminology, 67.6.6

connection layer, 1.1

connection on differentiable fibration, 67.4.2 

connection on differentiable fibre bundle, 67.5.4 

connection on fibre bundle, curvature, 70 

connection on fibre bundle, horizontal component map, 67.9 

connection on fibre bundle, vertical component map, 67.10

connection on ordinary fibre bundle, 67

connection on ordinary fibre bundle, curvature, 70.2

connection on principal bundle, curvature, 70.5

connection on principal fibre bundle, 69 

connection on vector bundle, 68, 68.1

connection on vector bundle, curvature, 70.4 

connection representation, horizontal subspace, literature, 
67.9.8 

connections, history, 67.1 

connections, principal bundles, equi-informational structures, 
69.12.8 

connections on ordinary bundle, conversion rules, 67.11.2 

connections on principal bundle, conversion rules, 69.6.3 

connective, primitive, 4.1.6, 4.7.1 

conquer, divide and, connected components, 34.5.1, 34.5.16 

conservation equations, 31.2.2 

consistency of arithmetic, 14.3.2 

consistent axiomatic system, knowledge sets, 3.4.10 

constant, individual, 5.1.11

constant Cartesian space map, zero partial derivatives,
41.2.23 

constant continuous extension of function, 31.12.19 

constant cross-section, derivative, 64.7.18 

constant cross-section, differentiability, 64.7.16 

constant cross-section, locally, non-topological, 21.6 

constant cross-section, non-topological fibration, 21.6.6 

constant cross-section, non-topological principal bundle, 
21.10.2 

constant cross-section extension, non-topological fibration, 
21.6.8 

constant curve, 36.2.11 

constant curve, zero velocity vector field, 57.9.8 

constant-direction line, 53.1.11 

constant function, continuity, 31.12.9, 32.8.5 

constant function, zero derivative, 40.6.7 

constant function differentiability, summary table, 42.6.1 

constant-graph condition, regular differentiable submanifold, 
52.4.8 

constant-graph condition, regular differentiable submanifold 
point-set, 52.3.16 

constant integral curves of zero vector fields, 57.10.8 

constant logical function, 5.1.11 

constant map differentiability, differentiable manifold, 52.1.9 

constant multiplication rule, differential calculus, 40.5.9, 
41.1.18 

constant name, definition, 5.1.12 

constant predicate, 5.1.11 

constant real function, infinitely differentiable, 42.1.14, 
42.2.16

constant real function, zero derivative, 40.5.5 

constant real function, zero partial derivative, 41.1.16 

constant real-tuple map, infinitely differentiable, 42.6.2 

constant real-tuple-valued function on manifold, differentiable, 
51.7.4 

constant real-valued function, locally, global differentiable 
extension, 51.8.3 

constant real-valued function on manifold, differentiable, 
51.6.5 

constant real-valued function on manifold, zero derivative, 
54.11.6 

constant real-valued function on manifold, zero differential, 
54.11.18 

constant scale map, tangent bundle, 59.4 

constant-scale map, vector bundle, 65.5 

constant-scale tangent vector map, differential, 59.4.10 

constant stretch of curve, 36.4.2 

constant vector field, direct product, differentiability, 57.2.16 

constant vector field extension, 57.1.20 

constant vector field extension properties, 57.1.23 

constant vector-field tuple extension, 57.4.9 

constant vector-tuple field extension, 57.4.5 

constant vector-tuple fields, locally, properties, 57.4.7 

constant vector-valued function on manifold, differentiable, 
51.7.8 

constant vector-valued function on manifold, zero derivative, 
54.14.5 

constant vector-valued map, zero partial derivatives, 41.8.12 

constant-velocity line, 53.1.11 

constant vertical vector field, differentiability, 64.2.5 

constructible real numbers, Cantor diagonalisation, 13.7.23 

constructible set, 9.1.1 

constructible universe, 12.5.4 

constructible universe, literature, 12.6.8 

construction methods, tangent bundle, 53.4 

construction of connection form from gauge potentials, 
69.12.7 

construction stage, ZF set theory, 7.8.10 
constructions, topological space, 32

constructor, fibre chart, antisymmetric multilinear function 
bundle, 56.5.20 

constructor, fibre chart, antisymmetric multilinear map 
bundle, 56.7.12 

constructor, fibre chart, list, 65.7.8 

constructor, fibre chart, multilinear function bundle, 56.4.13 

constructor, fibre chart, principal frame bundle, 55.7.6 

constructor, fibre chart, tangent bundle, 54.5.7 

constructor, fibre chart, tangent covector bundle, 55.4.7 

constructor, fibre chart, tangent vector-frame bundle, 55.6.18 

constructor, fibre chart, tangent vector-tuple bundle, 55.5.25 

constructor, fibre chart, tensor bundle, 56.3.13, 56.3.14 

constructor, fibre chart, vector-frame bundle, 65.8.7 

constructor, fibre chart, vector-tuple bundle, 65.7.9 

constructor, manifold chart, antisymmetric multilinear 
function bundle, 56.5.25 

constructor, manifold chart, antisymmetric multilinear map 
bundle, 56.7.16 

constructor, manifold chart, list, 65.7.10 

constructor, manifold chart, multilinear function bundle, 
56.4.18 

constructor, manifold chart, tangent bundle, 54.5.21 

constructor, manifold chart, tangent covector bundle, 55.4.8 

constructor, manifold chart, tangent vector-frame bundle, 
55.6.26 

constructor, manifold chart, tangent vector-tuple bundle, 
55.5.33 

constructor, manifold chart, tensor bundle, 56.3.21, 56.3.22 

constructor, manifold chart, vector-frame bundle, 65.8.8 

constructor, manifold chart, vector-tuple bundle, 65.7.11 

containment theory, 7.2.10

content, determinable, set, 7.8.9 

content, exterior Jordan, 43.6.1 

content, Jordan, and Darboux integrability, 43.6 

content, Jordan, outer, 43.6.2

context, socio-mathematical, function composition, 10.10.8 

context-dependent, clear as mud, 63.6.7 

continuity, 35

continuity, constant function, 31.12.9, 32.8.5 

continuity, epsilon-delta, 38.1.2, 38.1.3, 38.1.7 

continuity, epsilon-delta, Cartesian space, 38.1.10 

continuity, Hölder, metric space, 38.7, 38.7.2

continuity, injectivity, implies strict monotonicity, 34.9.9 

continuity, keyhole test, 31.12.16 

continuity, Lipschitz, metric space, 38.6 

continuity, Lipschitz, multiple definitions, 38.6.1 

continuity, metric space, 38 

continuity, modulus, 38.3.9 

continuity, modulus, pointwise, 38.2.4 

continuity, modulus bound, 38.3.9 

continuity, pointwise, 35.3 

continuity, uniform, metric space, 38.3 

continuity condition, uniform, Cauchy integrability, 43.3.20 

continuity conditions, forward set-maps, 35.1.8 

continuity conditions, inverse set-maps, 35.1.2 

continuity implied by differentiability, single variable, 40.5.3 

continuity implied by differentiability, single variable,
pointwise, 40.5.2 

continuity implied by differentiability, vector-valued function, 
40.7.3 

continuity in topology, fundamental importance, 31.12.1 

continuity of functions, defined using connectedness, 35.2 

continuity of functions, history, 31.12.2

continuity of map, metric space, 38.1 

continuity of maps, explicit, metric spaces, 38.2 

continuity of seminorm, 24.6.5

continuous-associated topological fibre bundle, 47.9.11 
continuous at a point, 35.3.2

continuous atlas, locally Cartesian space, 49.7 

continuous bijection between metric spaces, continuous 
inverse, 38.1.13 

continuous bijection which is not a homeomorphism, 31.14.4 

continuous chart, locally Cartesian space, 49.6 

continuous curve, 36.2.3 

continuous derivative extensions of differentiable functions, 
44.8.10 

continuous differentiability, keyhole test, 41.1.26 

continuous differentiability of map, keyhole test, 41.2.20 

continuous embedding, topological manifold, 50.3.3 

continuous extension of function, constant, 31.12.19 

continuous family of topological automorphisms, 36.10.14 

continuous function, 31.12, 31.12.4 

continuous function, explicit, 38.2.9 

continuous function, explicit, at a point, 38.2.6 

continuous function, set-map properties, 35.1 

continuous function, uniformly, 38.3.2

continuous function on metric space, 38.1.1 

continuous functions on a locally Cartesian space, linear 
space, 49.10.2 

continuous inverse of continuous bijection between metric 
spaces, 38.1.13 

continuous inverse of continuous real-valued injection on 
interval, 34.9.25 

continuous locally Cartesian atlas, empty chart exclusion, 
49.8.5 

continuous locally Cartesian atlas, induced topology, 49.8.12 

continuous locally Cartesian atlas for a set, 49.8.2 

continuous one-parameter subgroup, 36.9.6 

continuous partial function, 31.13, 31.13.2 

continuous path, directed, 36.8.4 

continuous path, oriented, 36. 

continuous path, unoriented, 36.8.14 

continuous real-valued injection on interval has continuous 
inverse, 34.9.25 

continuous vector-valued functions on a locally Cartesian 
space, linear space, 49.10.9 

continuously directionally differentiable function, 41.4.11 

continuously partially differentiable function, 41.1.24, 41.2.18, 
42.5.11 

continuously partially differentiable function, vector-valued, 
41.8.9 

continuously totally differentiable function, 41.6.10 

continuum, bare, 49.2.10, 59.1.13 

continuum hypothesis, independence from ZF, 7.1.4, 7.1.5, 
7.1.11

continuum hypothesis, mediate cardinals, 7.11.8 

contour, 36.1.1

contraction, tensor, 29.7 

contraction (Zusammenziehung), sequent, 5.3.8 

contraction map, inverse function theorem, 41.10.1 

contraction map, ODE existence, 44.6.1 

contraction of indices, tensor calculus, 29.7.1 

contraction of Riemann curvature tensor, Ricci tensor, 
71.11.10 

contraction of tensor, basis-independent, 29.7.4 

contraction of tensor, multilinear-function-style, 29.7.2 

contraction of tensor indices, related to trace, 23.3.1, 25.9.5 

contraction of two matrices, 60.5.2 

contradiction, 3.13, 3.13.2 

contragredient, differential of manifold map, 58.8.1 




contragredient, double, covector chart transition rules, 55.3.10 


contragredient, linear space automorphism, 23.11.18 

contragredient, manifold chart transition rules, 54.4.12, 
55.2.16 

contragredient action, diffeomorphism family, 63.3.1 
contragredient relationship, ordinary/principal bundle, 63.5.2

contragredient representation, linear map, 23.11.19 

contragredient representation, linear transformation group, 
23.11.20 

contragredient transformation for covectors, 55.2.19 

contrapositive, logical, 4.5.9, 6.6.11, 12.1.12, 23.6.8 

contravariant, terminology, 20.10.11, 27.1.2, 28.5.7, 28.5.8, 
71.6.2 

contravariant attribute, 20.9.19 

contravariant baseless figure/frame bundle, 20.10.8 

contravariant function on principal bundle, associated, 47.12, 
66.8

contravariant principal bundle function, 47.12.2, 47.12.3,
66.8.2

contravariant principal bundle function, associated 
connection, 69.10.3 

contravariant principal bundle function, connection, 69.10.4 

contravariant principal bundle function, OFB cross-section 
interpretation, 47.12.6, 66.8.4 

contravariant principal bundle function, short-cut map, 
47.12.5 

contravariant principal bundle functions, covariant derivative,
69.10.2

contravariant topological fibre/frame bundle, 47.13.2 

conundrum, ontology, 2.2.13 

convergence, metric space, 37.7 

convergence, nested set, complete metric space, 37.9 

convergence, pointwise, topology, 35.3.27 

convergence, uniform, metric space, 38.4 

convergence of function, 35.3

convergence of sequence, 35.4 

convergence of sequence, metric space, 37.7.11 

convergent infinite series, topological linear space, 39.2.7 

convergent sequence, 35.4.2 

convergent sequence, Cauchy, metric space, 37.8.3 

convergent sequence of functions, Cauchy, uniformly, 38.4.3 

convergent sequence of functions, uniformly, 38.4.2 

converges at a point, function, 35.3.7 

conversion rules, connections on ordinary bundle, 67.11.2 

conversion rules, connections on principal bundle, 69.6.3 

convex combination, 24.4.1 

convex combination, linear space, 22.11 

convex combination, points in affine space, 26.9.14 

convex combination of vectors, 20.9.21, 22.11.2 

convex envelope, 22.11.20 

convex function, linear space, 23.12, 23.12.2 

convex generator, 22.11.18

convex hull, 22.11.19 

convex hull terminology, survey, 22.11.19 

convex set, linear space, 22.11 

convex span, 22.11.10

convex span of two real numbers, 16.1.14 

convexity implied by non-decreasing derivative, 40.6.11 

convexity implied by non-negative second derivative, 42.1.22 

convexity of real-number intervals, 22.11.24 

cook, vegetarian, 1.4.14 

coordinate-bound line definition, 26.3.9 

coordinate-bound tangent space, 53.1.11 

coordinate frames, holonomic versus moving, 55.7.12 

coordinate-free, 52.1.20, 53.3.9

coordinate-free, curve classes, 53.3.4 

coordinate-free, differential operator, 53.3.3 

coordinate-free, matrices and bases, 25.7.2, 25.7.6 

coordinate-free, tensor algebra, 29.2.7 

coordinate-free chart space, 53.1.11 

coordinate-free line definition, 26.3.9 

coordinate-free linear space, 26.1.1 

coordinate-free Minkowskian inner product definition, 24.10.6 




[ draft: UTC 2023-1-3 Tuesday 00:13] 




coordinate-free tangent bundle, 53.3.5 

coordinate-free vectorial geometry, total differential theorem, 
41.6.3 

coordinate-independent, tensor algebra, 27.1.4 

coordinate map, locally Cartesian space, 49.6.2 

coordinate map, topological manifold, 49.6.2 

coordinate neighbourhood, cross-section localisation, 21.6.1 

coordinate space, baseless figure/frame bundle, 20.10.8 

coordinate space, locally Cartesian space, 49.4.12 

coordinate space, topological fibre/frame bundle, 47.13.2 

coordinate tuple, tangent vector frame, 55.6.14 

coordinate tuple, tangent vector-tuple, 55.5.13 

coordinate versus component, terminology, 54.1.5 

coordinate versus component versus coefficient, 53.3.15 

coordinates, Cartesian, 22.7.1 

coordinates, Cartesian, construction, 26.11.3 

coordinates, hidden, 29.2.7

coordinates, spherical, astronomical, 76.2.3 

coordinates, spherical, recursive formula, 76.1.3 

coordinates, terrestrial, two-sphere, 76.2 

coordinates extraction, vertical component, double tangent 
vector, 59.3.7 

coordinates idiom, tensors in physics, 27.5.11 

corps (field), 18.7.6 

corpus, PDE, 1.4.4 

corpus, theorem, 7.9.2 

corpus (field), 18.7.6 

correct logic, 2.4.4 

correlation, test-retest, 40.1.1, 53.1.7 

correspondence principle, helicopter to supermarket, 54.9.9 

corresponding parameters of curves, 36.5.20 

cortex, early visual, 40.2.1 

cortex, visual, 22.0.3 

coset, linear space, 24.2.2 

coset of subgroup, 17.7.1 

cosine, classical definition, 44.2.21, 44.2.22 

cosine function, 44.2.14

cosmic abstraction, north, 26.10.1 

cotangent map, induced, pull-back differential, 58.11.1 

cotangent vector, 55.2

cotangent vector space, differentiable manifold, 55.2.1 

countability class, topological space, 33.4 

countability tests, Peano-axioms-style, 14.2 

countable, first, topological space, 33.4.12

countable, second, topological space, 33.4.13 

countable choice, axiom, infinite sets, 13.7.5 

countable choice, axiom, magic wand, 13.10.1 

countable choice axiom, 7.11.14, 13.7.21 

countable choice axiom, separable space, 33.4.22 

countable choice axiom doppelgänger, 33.7.4

countable component partition, open set, 34.8.2 

countable connected component enumeration, 34.6.16 

countable family, explicit, of sets of explicit measure zero, 
45.3.1 

countable family of explicit real-number measure-zero sets, 
union, 45.3.3 

countable multiplicative axiom, 13.7.22 

countable open base, 32.2.10 

countable open base at a point, 32.2.9 

countable set, 13.7.6 

countable set, Peano-style, 14.2.6 

countable space, second, open base choice function, 33.4.23 

countable union, finite set, 13.8 

countable union theorem, uses CC, 13.9.9 

countably infinite family of finite sets, explicit, 13.8.4 

countably infinite set, 13.7, 13.7.6 

countably infinite set cardinality, 13.7.9 

countably infinite set numerosity, 13.7.9 
counterpoint, 1.4.13

counting sheep, 40.2.2 

counting theorem, well-ordered sets, 13.2.4 

covariance of exterior derivative, Cartesian space, 46.7.8 

covariant, terminology, 20.10.11, 27.1.2, 28.5.7, 28.5.8 

covariant acceleration of curve, 71.7.5 

covariant attribute, 20.9.19 

covariant derivative, affine connection, 71.6 

covariant derivative, construction of horizontal lift, vector 
bundle, 68.2.18 

covariant derivative, function on total space, 68.4 

covariant derivative, gauge, 69.14, 69.14.4

covariant derivative, gauge, real-valued function, 69.14.7 

covariant derivative, gauge, vector-valued function, 69.14.9, 
69.14.10 

covariant derivative, Leibniz rule, affine connection, 71.6.7 

covariant derivative, Leibniz rule, vector bundle, 68.2.16

covariant derivative, OFB total space function, 68.4.2 

covariant derivative, ordinary fibre bundle, 68.2.5 

covariant derivative, terminology, 71.6.2 

covariant derivative, vector bundle, 68.2.9 

covariant derivative, vector bundle, literature, 68.2.1 

covariant derivative, vector bundle cross-section, 68.2 

covariant derivative for contravariant principal bundle 
functions, 69.10.2 

covariant derivative form, connection form, 69.7.6 

covariant derivative form, correct name for connection form, 
69.5.2 

covariant derivative map, trace, 71.8.2 

covariant derivative of vector field along curve, 71.7 

covariant derivative of vector field by vector, 71.6.4 

covariant derivative of vector field by vector field, 71.6.9 

covariant derivative of vector field on curve, 71.7.2 

covariant derivatives need vector bundles, 65.0.1, 65.1.1, 
65.3.4, 68.2.2 

covariant differential, map between manifolds, 71.10 

covariant fall map, affine connection, 71.10.3

covariant tensor bundle cross-section, short-cut, 46.2.1, 
57.7.1, 58.11.6 

covariant vector components, non-singular metric tensor, 
73.5.9 

covector, 30.4.20 

covector, chart-basis, tangent space, 55.3.2 

covector, tangent, 55.2 

covector, tangent, Cartesian space, 26.17.5, 26.17.6, 26.17.7 

covector, tangent, representations, 55.1 

covector, tangent, total space, 55.4.2

covector, tangent bundle, 55.4.11 

covector, tangent bundle manifold atlas, 55.4.8 

covector, tangent field, Cartesian space, 26.18.3, 26.18.4, 
26.18.5 

covector bundle, tangent, 55.4 

covector bundle, tangent, Cartesian space, 26.17 

covector bundle on a manifold, 55 

covector bundle set, tagged tangent field, Cartesian space, 
26.18.13 

covector bundles, tangent, are fibre bundles, 55.4.15 

covector chart transition rules, double contragredient, 55.3.10 

covector co-velocity, tangent, Cartesian space, 26.18.9 

covector component tuple, tangent, 55.2.5 

covector fibration velocity chart, 55.4.6 

covector fibration velocity chart map, 55.4.6 

covector field, differential, of real-valued function, 58.2.3 

covector set, tagged tangent field, Cartesian space, 26.18.11 

covector set, tangent field, Cartesian space, 26.18.6 

covector space, contragredient transformation, 55.2.19 

covector space, cotangent, differentiable manifold, 55.2.1 

covector space, dual transformation, 55.2.19 
covector space, induced-map tangent, pointwise, 58.3.7

covector space, induced-map tangent, total, 58.3.9 

covector space, tangent, basis, 55.3 

covector space, tangent, Cartesian space, 26.17.3 

covector space, tangent, linear space structure, 55.2.3 

covector space, tangent field, Cartesian space, 26.18.8 

covector with given coordinates, tangent, 55.2.7 

cover, indexed open, refinement, 33.5.7 

cover, locally finite, topological space, 33.7.14 

cover, open, 31.7, 31.7.2, 33.5 

cover, open, finite, 31.7.2

cover, open, indexed, 33.5.4 

cover, open, refinement, 31.7.4 

cover, open, subcover, 31.7.3 

cover, pointwise finite, topological space, 33.7.14 

cover-based compactness, 33.5 

cover of a set, 8.7, 8.7.7

cover of a set, indexed, 10.18, 10.18.2 

cover of a set, refinement, 8.7.9 

cover of a set, subcover, 8.7.8

cover of a subset, 8.7.2

cover of a subset, refinement, 8.7.4 

cover of a subset, subcover, 8.7.3

covering lemma, Lebesgue, 37.7.18 

covering set of compatible charts, atlas, 51.4.13 

cow, counting, 14.1.2 

cows, home, 77.4.15 

crazy world, hill of beans, 77.4.15 

creation myth, 3.1.6 

creep, mission, T41 

critical point, Hessian operator, 59.11, 59.11.4 

critical-point Hessian operator, terrestrial coordinates, 76.4 

cross-brace, 72.2.2 

cross product, naive, 9.2.4 

cross product of vectors, 17.1.4 

cross-section, associated, short-cut orbit-space, 47.12.3 

cross-section, associated orbit-space, short-cut map, 47.12.5 

cross-section, constant, derivative, 64.7.18 

cross-section, constant, differentiability, 64.7.16 

cross-section, constant, non-topological fibration, 21.6.6 

cross-section, constant, non-topological principal bundle, 
21.10.2 

cross-section, covariant tensor bundle, short-cut, 46.2.1, 
57.7.1, 58.11.6 

cross-section, differentiable, ordinary fibre bundle, 64.7 

cross-section, fibration, action of vector, 64.7.7 

cross-section, fibration, naive derivative, 64.7. 

cross-section, frame bundle, 57.11 

cross-section, identity, non-topological principal bundle, 
21.10, 21.10.3 

cross-section, identity chart, non-topological principal bundle, 
21.10.7 

cross-section, localisation via fibre chart, 21.6.2

cross-section, localised, reglobalisation, 21.6.4 

cross-section, locally constant, non-topological, 21.6 

cross-section, non-topological fibration, 21.3

cross-section, orbit-space associated, short-cut, 47.12, 66.8 

cross-section, orbit-space associated fibre bundle, short-cut, 
66.8.1 

cross-section, terminology, 21.3.1 

cross-section, topological fibration, 47.4 

cross-section, vector bundle, covariant derivative, 68.2 

cross-section, zero, 57.2.13 

cross-section differentiability test, fibre bundle, pull-back 
atlas, 64.9.11 

cross-section extension, constant, non-topological fibration, 
21.6.8 
cross-section localisation, connection form construction,
69.12.6 

cross-section localisation, coordinate neighbourhood, 21.6.1 

cross-section localisation, non-topological, 21.6 

cross-section of differentiable fibration, 64.7.1 

cross-section of fibration, differentiable, 64.7.2 

cross-section of non-topological fibration, 21.3.3 

cross-section of non-topological fibration, local, 21.3.3 

cross-section of tangent bundle, 57.1.3, 61.9.1 

cross-section of tangent bundle, local, 57.1.10 

cross-section of topological fibration, 47.4.2 

cross-section of topological fibration, continuous, 47.4.6 

cross-section of topological fibration, local, 47.4.2 

cross-section of vector bundle, naive derivative, Leibniz rule, 
65.6.2 

cross-section product by real function, vector bundle, 
differentiability, 65.2.4 

cross-section short-cut, form-style non-topological fibration, 
21.4

cross-section space, associated, short-cut orbit-space, 47.12.4 

cross-sections, linear space, of non-topological vector bundle, 
24.11.4 

cross-sections, principal bundle, connection form, 69.11.3 

cultural relativism, 2.2.7 

cumulative hierarchy of sets, set membership depth, 7.8.8 

cumulative hierarchy of sets, transfinite recursive, 12.6

cumulative type hierarchy, 12.6.1 

curl, 46.9.7

curl, affine connection, 71.11.2 

curvature, affine connection, 71.11 

curvature, affine connection, principal fibre bundle, 71.4.6 

curvature, parallelism, at a point, differentiable fibration, 
70.3.8 

curvature, parallelism, differentiable fibration, 70.3.9 

curvature, Riemann, at a point, differentiable fibration, 70.3.8 

curvature, Riemann, component array notation survey, 
71.11.9 

curvature, Riemann, differentiable fibration, 70.3.9 

curvature, Riemann, justification theorem, toy version, 70.7.6 

curvature, Riemann, ordinary fibre bundle, 70.3, 70.4.3 

curvature, Riemann, ordinary fibre bundle, justification, 70.7 

curvature, scalar, 74.4.7

curvature, sectional, 74.5 

curvature, sectional, quotient map, 74.5.4 

curvature, sectional, two-sphere, 76.6.2 

curvature example, connection form, 70.5.3 

curvature field, scalar, component version, 74.4.8 

curvature form, gauge localisation, 70.6, 70.6.2 

curvature form on principal bundle, structure equation, 70.5.7 

curvature map, sectional, bilinear, 74.5.2

curvature of connection form, 70.5.2 

curvature of connection on fibre bundle, 70 

curvature of connection on ordinary fibre bundle, 70.2 

curvature of connection on principal bundle, 70.5 

curvature of connection on vector bundle, 70.4

curvature tensor, Ricci, affine connection, 71.11.11 

curvature tensor, Ricci, components, 71.11.14 

curvature tensor, Ricci, Riemannian manifold, component 
version, 74.4.6 

curvature tensor, Riemann, 70.4.1 

curvature tensor, Riemann, component version, 74.4.4 

curvature tensor, Riemann, components, 71.11.8 

curvature tensor, Riemannian manifold, 74.4 

curvature tensor components, affine connection, 71.11.7 

curvature tensor field, Einstein, 75.3.3 

curvature tensor field, Riemann, affine connection, 71.11.5 

curve, bidirectional length-parametrised, metric space, 
38.9.11 
curve, closed, 36.2.11

curve, closed simple, 36.2.11 

curve, compact-interval, 36.2. 

curve, constant, 36.2.11 

curve, continuous, 36.2.3 

curve, corresponding parameters, 36.5.20 

curve, cyclic, 36.2.11 

curve, differentiable, in manifold, 51.9, 51.9.2 

curve, differential, 57.9.1

curve, geodesic, 2I 

curve, geodesic, affinely parametrised, 72.1.2 

curve, geodesic, history, 73.8.2 

curve, geodesic, two-sphere, 76.7 

curve, higher-order differential, 59.8 

curve, initial point, 36.2.13

curve, integral, existence theorem, 57.10.5 

curve, integral, literature, 57.10.1 

curve, integral, of vector field, 57.10.2 

curve, integral, uniqueness theorem, 57.10.6 

curve, integral of vector field, differentiable manifold, 57.10 

curve, integrator, for Riemann-Stieltjes integral, 43.10.3 

curve, length-parametrised, metric space, 38.9.3 

curve, level, 36.1.7 

curve, Lie transport, 61.7.10 

curve, locally rectifiable, bidirectional length-parametrisation, 
38.9.11 

curve, locally rectifiable, in locally Lipschitz manifold, 50.7.6 

curve, multiple point, 36.2.13 

curve, never-constant, 36.4.3 

curve, open-interval, 36.2.9 

curve, rectifiable, 38.9.2 

curve, rectifiable, in a manifold, 50.7 

curve, rectifiable, length-parametrisation, 38.9.3 

curve, rectifiable, metric space, 38.9 

curve, representative, 36.8.4

curve, scaling, tangent vector, 59.4 

curve, scaling, vector bundle, 65.5 

curve, simple, 36.2.11 

curve, sometimes-constant, 36.4.3 

curve, space-filling, 36.1.5, 36.3 

curve, space-filling, Peano, 36.3.1 

curve, tangent operator, 57.9.13 

curve, tangent vector field, 57.9.2 

curve, tangent vector field, linear space drop function, 57.9.5 

curve, tangent vector scaling, differential, 59.4.7 

curve, terminal point, 36.2.13 

curve, topological, 36 

curve, topological space, 36.2 

curve, vector-valued integrator, for Stieltjes integral, 43.12.2 

curve, velocity vector field, 57.9, 57.9.2

curve action on vector field, manifold, 61.2.7 

curve and path meanings, survey, 36.1.1 

curve atlas, 36.1.4, 36.1.8 

curve class, parallelism, 48.2.6 

curve concatenation, 36.6 

curve concatenation, domain-transformed, 36.6.9 

curve concatenation, domain-translated, 36.6.7 

curve concatenation, joinable, 36.6.5 

curve constant stretch, 36.4.2 

curve differential, regarded as manifold map, 57.9.9, 59.4.7 

curve equi-velocity, 53.1.17

curve equivalence class, tangent vector representation, 53.3.1 

curve existence, integral, applications, 57.10.4 

curve family, 36.0.1 

curve family, derivative swap, in a manifold, 59.6.12 

curve family, higher-order differential, 59.9 

curve family, parallel transport, 71.4.9

curve family, quadrilateral, generated by vector fields, 46.5.6 
curve in Cartesian space, differentiable, 40.8.4, 42.6.6

curve in Cartesian space, mean value theorem, 40.8.6 

curve length, metric space, 38.8, 38.8.2 

curve length, Riemannian manifold, 73.6, 73.6.2 

curve length function, bidirectional partial, metric space, 
38.8.13 

curve length function, partial, metric space, 38.8.8 

curve parameters, corresponding, 36.5.20

curve parametrisation, 36.1.6 

curve path-equivalence, 36.5, 36.5.3 

curve reparametrisation, length invariance, 38.8.5 

curve stationary interval removal, 36.4 

curve terminology, 36.1

curve topics summary, 51.9.7 

curve traversal order, 36.1.5 

curve velocity field, tangent vector scaling, verticality, 59.4.3 

curved space, PDE techniques, 1.4.4 

curves, joinable, 36.6.2

cut, Dedekind, 15.3.6, 15.4.4, 35.8.7 

cut, Dedekind, finite, 15.4.5 

cut, Dedekind, real number representation, 15.4 

cyclic curve, 36.2.11

cyclic definition, 2.0.1 

cylinder, local approximation, Peano ODE method, 44.4.4 


dagger, Quine, 3.7.8, 3.7.10, 3.7.14 

Daniell integral, 43.1.4

Darboux, Jean Gaston, 77.1.7 

Darboux, Jean Gaston, integral, 43.1.2 

Darboux, Jean Gaston, integral well-definition proof, 43.3.18, 
43.4.1, 43.5.1, 43.5.2 

Darboux, Jean Gaston, moving frames, 55.7.12 

Darboux integrable function, real-valued, 43.5.11 

Darboux integrable vector-valued function, 43.9.2 

Darboux integral, 43.1.4, 43.5

Darboux integral, basic properties, 43.7 

Darboux integral, real-valued function, 43.5.12 

Darboux integral, real-valued function, lower, 43.5.10 

Darboux integral, real-valued function, upper, 43.5.10 

Darboux integral, vector-valued integrand, 43.9 

Darboux integral of vector-valued function, 43.9.2 

Darboux sum, lower, 43.5.6 

Darboux sum, upper, 43.5.6 

Dark Ages, 77.1.3 

dark energy, 2.3.3 


dark matter, 2.3.3 
dark number, 2.3 
dark problem, 6.6.14 
dark set, 2.3 

dark theorem, 6.6.14 
dartboard, choice function for separable topology, 33.4.8 
dartboard, component enumeration for open set, 32.7.7, 
34.8.3
data packet, communication network, 21.15.7 

dateline, international, 76.2.2 

de Morgan's law, 8.2.3, 8.4.11 

debauch of indices, 73.3.1 

deception, assertions, denials, 3.1 

decimal representation, real number, 15.3.5 

declaration of function, inline, 43.2.7, 43.10.5 

declarations in axiomatic systems, tags, 3.2.4 
decomposition of differential of map, direct product, global, 
58.10.2 

decomposition space, topological, 32.13.7 

decompression, antisymmetric higher-degree array, 25.15.16 
decompression, symmetric higher-degree array, 25.15.15 
decreasing function, 11.1.30 

Dedekind, (Julius Wilhelm) Richard, 15.0.1, 17.3.1, 77.1.7 
Dedekind cut, 15.3.4, 15.3.6, 15.4.4, 35.8.7




80. Index 


Dedekind cut, finite, 15.4.5 

Dedekind cut, real number representation, 15.4 
Dedekind-cut real number system, 15.9.2
Dedekind-finite set, 7.11.13, 13.10.3 
Dedekind-infinite set, 13.10, 13.10.2 

deduction, logical, 4. 437

deduction, natural, 4.8.2, 4.8.5, 5.3

deduction, natural, tabular systems literature, 5.3.2 
deduction metatheorem, 4.3.18, 4.8
deduction metatheorem, motivational example, 4.8 
deduction metatheorem, predicate calculus, 6.6.27 
deduction metatheorem, purpose, 4.8.2 

deduction metatheorem literature survey, 4.8.6 
deduction rule, conditional proof, 4.8.4 

deduction theorem literature survey, 4.8.6 
defeatism, logic, 3.3.1

defective addition, extended real numbers, 16.2.8 
defective multiplication, extended real numbers, 16.2.8 
definite, negative, 25.11.7, 30.5.3 

definite, positive, 25.11.7, 30.5.3 

definition, cyclic, 2.0.1 


definition, metamathematical, 3.2.4 

definition, recursive, 20.9.12 

definition-centric book, 1.4.14 

definition minimalism, 18.11.13 

definition notation, 1. 155 

definition of “constant”, 5.1.12

definitions, boot-strapping, 2.1.1, 9.3.1

definitions, mathematical, naturalism, 7.1.3 

degeneracy, tangent vector 0-tuple, 55.5.15 

degenerate symmetric bilinear function on linear space, 
24.10.2 

degenerate total space, tangent vector 0-tuples, 55.5.11 
degree, multilinear map, 27.2.12 

degree, order, difference of meaning, 59.0.2, 60.0.1 

degree, tensor space, 28.1.11

degree and type, tensor space, terminology survey, 29.5.11 
delta function, Dirac, tangent vector representation, 53.3.1 
delta function, Kronecker, 14.7, 14.7.10, 54.4.8 

delta function, Kronecker, unit matrix, 25.4.1 

delta pseudo-notation for propositions, Kronecker, 14.7.20 
Democritus, 77.1.2 

denarius, 22.0.3

Deng Xiáo-Píng, seek truth from facts, 1.4.15 

denial, alternative, 3.7.14 

denial, joint, 3.7.14 

denial operator, alternative, 3.7.8, 3.7.10

denial operator, joint, 3.7.8, 3.7.10 

denials, assertions, deception, 3.1 

Denjoy integral, 43.1.4

dense set of function discontinuities, 41.1.22 

dense subset, nowhere, of topological space, 33.4.4 

dense subset of topological space, 33.4.2 

denumerable set, countably infinite, 13.7.12 

dependent choice, axiom, 7.1.11 

depth, rectangular matrix, 25.2.2 

depth of set membership, unbounded versus infinite, 7.8.8 
dereferencing logical expression names, 3.9.2
derivation, tangent vector representation, 53.3.1 
derivation rules, logic, 6.1.3
derivative, covariant, affine connection, 71.6 

derivative, covariant, for contravariant principal bundle 
functions, 69.10.2 

derivative, covariant, function on total space, 68.4 
derivative, covariant, gauge, 69.14, 69.14.4 

derivative, covariant, of vector field along curve, 71.7 
derivative, covariant, of vector field by vector, 71.6.4 
derivative, covariant, of vector field by vector field, 71.6.9
derivative, covariant, of vector field on curve, 71.7.2
derivative, covariant, OFB total space function, 68.4.2
derivative, covariant, ordinary fibre bundle, 68.2.5
derivative, covariant, terminology, 71.6.2
derivative, covariant, vector bundle, 68.2.9
derivative, covariant, vector bundle, literature, 68.2.1
derivative, covariant, vector bundle cross-section, 68.2
derivative, Dini, 40.10, 40.10.2
derivative, directional, 41.4.4
derivative, directional, Banach space, 41.9.4
derivative, directional, of vector-valued function, 41.8.14
derivative, directional, several real variables, 41.4
derivative, exterior, Cartesian space, 46.7, 46.7.4
derivative, exterior, Cartesian space vector fields, corrected, 46.8.10
derivative, exterior, Cartesian space vector fields, uncorrected, 46.8.3
derivative, exterior, chart-basis vector fields, 61.11
derivative, exterior, connection form example, 69.5.8
derivative, exterior, differentiable manifold, 61.10, 61.10.3
derivative, exterior, does not need swap-like function, 61.10.9
derivative, exterior, for vector fields, 46.8
derivative, exterior, for vector fields on a differentiable manifold, 61.14.9
derivative, exterior, literature, 46.7.1
derivative, exterior, of a 0-form, 61.10.7
derivative, exterior, of a 1-form, vector-valued, 61.10.5
derivative, exterior, on manifold versus Cartesian space, 61.12
derivative, exterior, target space verification, 61.10.4
derivative, exterior for vector fields, differentiable manifold, 61.14
derivative, gauge covariant, real-valued function, 69.14.77
derivative, gauge covariant, vector-valued function, 69.14.9, 69.14.10
derivative, higher-order, map between Cartesian spaces, 42.5
derivative, Lie, 61.8.4
derivative, Lie, drop function, 61.8.3
derivative, Lie, literature, 61.8.2
derivative, Lie, of vector field, 61.8
derivative, Lie, tensor field, 61.9
derivative, monomial function, 40.5.12
derivative, naive, fibration cross-section, 64.7.7
derivative, naive, ordinary fibre bundle, 64.7 47
derivative, naive, vector field, 61.2, 61.4
derivative, naive vector field, non-vectorial, 46.4.4
derivative, non-decreasing, implies convexity, 40.6.11
derivative, non-increasing, implies concavity, 40.6.11
derivative, non-negative second, implies convexity, 42.1.22
derivative, non-positive second, implies concavity, 42.1.22
derivative, partial, chain rule, 41.7.2, 41.7.4
derivative, partial, commutativity theorem, 42.3.6, 42.5.23
derivative, partial, composition rule, 41.7.2, 41.7.4
derivative, partial, higher-order chain rule, 42.5.27
derivative, partial, inverse chain rule, 41.7.7
derivative, partial, map between Cartesian spaces, 41.2
derivative, partial, of a function, 41.1.10, 41.2.7
derivative, partial, of partial function, 42.2.2, 42.5.3
derivative, partial, of vector-valued function, 41.8.4
derivative, partial, second-order chain rule, 42.5.25
derivative, partial, several real variables, 41.1
derivative, polynomial function, 40.5.13
derivative, real function of real variable, 40.4
derivative, real-valued function, 40.4.4
derivative, undefined higher-order partial, partial function, 42.2.7, 42.5.8
derivative, undefined k-fold, partial function, 42.1.7
derivative, unidirectional, several real variables, 41.5
derivative equals differential quotient limit, 40.8.5
derivative extensions, continuous, of differentiable functions, 
44.8.10 

derivative for several variables, higher-order, 42.2 

derivative map, covariant, trace, 71.8.2

derivative matrix, partial, of a function, 41.2.12 

derivative notation, 40.4.12, 57.9.12 

derivative of map, higher-order, applications, 42.6 
derivative of naive vector field, directional, Cartesian space, 
46.4.2 
derivative of partial function, 42.1.2 

derivative of partial function, higher-order, 42.1.4 
derivative of partial function, partial, higher-order, 42.2.5, 
42.5.6 
derivative of product of maps, naive, Leibniz rule, 61.3.1 
derivative of vector field, directional, Cartesian space, 46.1.8 
derivative operator, directional, line-based, 54.11.8 
derivative sequence, higher-order, 42.1.4 

derivative swap, curve family in a manifold, 59.6.12 
derivative tree, partial, for partial function, 42.2.5, 42.5.6 
derivative tuple, partial, of a function, 41.1.14 

derivative zero, constant function, 40.6.7 

derivative zero, constant real function, 40.5.5 

derivative zero, partial, constant real function, 41.1.16 
derivatives, higher-order, 42 

derivatives, higher-order, real-to-real functions, 42.1 
derivatives, multi-variable, 41 

derivatives, partial, commutativity, 42.3 

derivatives, zero partial, constant Cartesian space map, 
41.2.23 

derivatives, zero partial, constant vector-valued map, 41.8.12 
derived rule, propositional calculus, 4.5.10 

derived rules, validity, 4.3.6 

Desargues, Girard, 77.14 

Descartes, René, 2.2.8, 26.11.3, 77.1.4 

Descartes, René, Cartesian coordinates, 22.7.1, 23.3.1 
Descartes, René, coordinates, 22.0.3, 26.11.5 

desert, red carpet, 7.10.2 

destination set of relation, 9.5.21 

determinable content, set, 7.8.9 

determinant, history, 25.10.1

determinant of a matrix, 25.10, 25.10.3 

determinant of matrix, geometric interpretation, 25.10.14 
developable surface, 74.1.1 

deviation, geodesic, equation, 72.5 

deviation from flatness, parallelism, 70.4.1 

diagonal map, common-domain direct function-product, 
10.15.5 

diagonalisation, Cantor, constructible real numbers, 13.7.23 
diagonalisation of matrix, 76.8.5
diagonalisation procedure, Ascoli's theorem, 38.5.1 
diagonalisation procedure, Cantor, 13.1.26 

diagram, bracket-level, infix logical expression, 3.12.1
diagram, commutative, category theory, 10.15.9 

diagram, tensor space, chicken-foot symbol, 28.1.9, 29.3.5 
diameter, not radius, ancient Greeks, 44.2.22 

diameter of a set, 37.4

diameter of set, 37.4.6 

dictionary, polyglot, connection definitions, 69.15.3 
diffeomorphic differentiable manifolds, 52.2.3 
diffeomorphism, Cartesian space, 41.7.6, 42.7.2 
diffeomorphism, Cartesian spaces, 41.7 

diffeomorphism, differentiable manifold, 52.2, 52.2.2 
diffeomorphism, fibre-set, Cartesian spaces, 42.7.6 
diffeomorphism, global differential, inverse rule, 58.9.13 


diffeomorphism, local, Cartesian space, 42.7 
diffeomorphism, pointwise differential, inverse rule, 58.5.4 
diffeomorphism, product-structured manifold, 52.7
diffeomorphism, pull-back atlas, differentiable manifold, 
52.2.8 

diffeomorphism, pull-back differential, global, 58.11.3, 58.11.5,
58.11.10

diffeomorphism double-domain product, differentiable, 42.7.4 
diffeomorphism family, 63.2 

diffeomorphism family, contragredient action, 63.3.1 


diffeomorphism family, dual action, 63.3, 63.3.1, 63.3.3 
diffeomorphism family, dual differential action, 63.3.5 
diffeomorphism family, vector field, 63.2.3 

diffeomorphism family, velocity field family, 63.2.7 
diffeomorphism group, 62.0.3, 63.1, 63.1.2 

diffeomorphism group, tangent algebra, 63.2.5 
diffeomorphisms, differentiable, topological group, 63.1.6 
diffeomorphisms, family, differentiable, 63.2.2 
diffeomorphisms, group of, 63.1.4

difference, set, symmetric, 8.3 

difference-style connection, 67.2.1 

differences of this book from other DG texts, 1.6 
differentiability, chain rule, differentiable manifold map, 
52.1.17 

differentiability, component-projection map, 42.6.17 
differentiability, constant functions, summary table, 42.6.1 
differentiability, constant map, differentiable manifold, 52.1.9 
differentiability, continuous, keyhole test, 41.1.26 
differentiability, fractional, 50.6.2 

differentiability, index-selection map, 42.6.17 

differentiability, keyhole test, 40.3.5 

differentiability, map, joint-domain, 41.3 

differentiability, map, joint-range, 41.3

differentiability, one real variable, 40.3 

differentiability, partial, keyhole test, 41.1.5 

differentiability, unidirectional, real function, 40.9 
differentiability classes, infinite, interpretation, 42.1.12 
differentiability definition, test functions, pull-back, 52.1.18 
differentiability implies continuity, single variable, 40.5.3 
differentiability implies continuity, single variable, pointwise, 
40.5.2 
differentiability implies continuity, vector-valued function, 
40.7.3 
differentiability of differential forms on manifolds, 57.6.13 
differentiability of map, higher-order, keyhole test, 42.5.28 
differentiability of map, partial, keyhole test, 41.2.3 
differentiability of map, continuous, keyhole test, 41.2.20
differentiability test, pull-back atlas, for fibre bundle 
cross-section, 64.9.11 

differentiable, componentwise, two-variable manifold map, 
52.6.11 

differentiable, jointly, two-variable manifold map, 52.6.11 
differentiable action of group, fibre sets, 64.13.1 
differentiable connection, ordinary fibre bundle, 67.7.6 
differentiable connection, principal bundle, 69.1.13 
differentiable cross-section, ordinary fibre bundle, 64.7 
differentiable cross-section of fibration, 64.7.2
differentiable curve in Cartesian space, 40.8.4, 
differentiable curve in manifold, 51.9, 51.9.2 
differentiable diffeomorphisms, topological group, 63.1.6 
differentiable extension, global, locally constant real-valued 
function, 51.8.3 

differentiable F fibre bundle, 64.4.4 

differentiable family of diffeomorphisms, 63.2.2 
differentiable fibration, connection, 67.4.2 

differentiable fibration, cross-section, 64.7.1 

differentiable fibration, horizontal lift function, 67.4, 67.4. 
differentiable fibration, parallelism curvature, 70.3.9 
differentiable fibration, parallelism curvature at a point, 
70.3.8 
differentiable fibration, Riemann curvature, 70.3.9
differentiable fibration, Riemann curvature at a point, 70.3.8
differentiable fibration with intrinsic fibre space, 64.1.2
differentiable fibration with specified fibre space, 64.2.2
differentiable fibre atlas, 64.4.2
differentiable fibre bundle, 54.5.1, 64, 64.8
differentiable fibre bundle, associated, 66.7, 66.7.6, 66.7.8
differentiable fibre bundle, associated, orbit-space method, 66.7.10
differentiable fibre bundle, associated, patchwork,
differentiable fibre bundle, connection, 67.5.4
differentiable fibre bundle, horizontal lift function, 67.5.4
differentiable fibre bundle, lift function, transposed, 67.8.2
differentiable fibre bundle, ordinary, 64.8.3
differentiable fibre bundle, pull-back atlas, 64.9
differentiable fibre bundle, vector field, 64.13
differentiable fibre bundle connection, 67
differentiable fibre bundle with Lie structure group, 64.8.3
differentiable fibre chart transition map, 64.3.13
differentiable function, C^k, 42.5.11
differentiable function, continuously partially, 41.1.24, 42.5.11
differentiable function, continuously partially, vector-valued,
differentiable function, directionally, 41.4.2
differentiable function, directionally, Banach space, 41.9.2
differentiable function, directionally, continuously, 41.4.1
differentiable function, partially, 41.1.4, 41.2.2
differentiable function, partially, vector-valued, 41.8.2
differentiable function, totally, 41.6.4
differentiable function, totally, Banach space, 41.9.7
differentiable function, totally, continuously, 41.6.10
differentiable function, unidirectionally, 40.9.6, 41.5.2
differentiable functions with continuous derivative extensions, 44.8.10
differentiable group, 54.5.1, 62, 62.2, 62.2.2
differentiable locally Cartesian atlas, 51.5, 51.5.2
differentiable locally Cartesian atlas, induced topology, 51.5.8
differentiable locally Cartesian space, 51.3.7
differentiable locally Cartesian space, induced by atlas,
differentiable locally Cartesian space atlas, 51.3.2
differentiable locally Cartesian space atlas, indexed, 51.3.2
differentiable manifold, 51, 51.3, 51.3.8
differentiable manifold, Cartesian, 51.4.22
differentiable manifold, compatible chart, 51.4.2
differentiable manifold, complete C^k atlas, 51.4.
differentiable manifold, differentiable map, 52.1, 52.
differentiable manifold, differentiable real-valued function, 51.6.2
differentiable manifold, empty chart exclusion, 51.5.5
differentiable manifold, equivalent atlases, 51.4.10
differentiable manifold, finite-dimensional linear space,
differentiable manifold, general linear group, 51.4.24
differentiable manifold, induced by atlas, 51.5.15
differentiable manifold, maximal C^k atlas, 51.4.
differentiable manifold, product-structured, 52.7.8
differentiable manifold, product-structured, induced atlas,
differentiable manifold, product-structured, regular embedding, 52.7.5
differentiable manifold, regular embedding, 52.5.3
differentiable manifold, regular immersion, 52.5.4
differentiable manifold, regular submersion, 52.5.6
differentiable manifold, restriction to open set, 51.4.16
differentiable manifold, tangent bundle, 54, 54.5.30, 65.9
differentiable manifold, tangent bundle philosophy, 53
differentiable manifold, underlying topology, 51.3.14
differentiable manifold atlas, 51.5.10
differentiable manifold atlas, indexed, 51.5.10
differentiable manifold chart, torus, 51.10.8
differentiable manifold diffeomorphism, 52.2, 52.2.2
differentiable manifold dimension, 51.3.16
differentiable manifold direct product, 52.6, 52.6.2
differentiable manifold embedding, 52.5
differentiable manifold immersion, 52.5
differentiable manifold inclusion, 52.3.3
differentiable manifold map, 52
differentiable manifold map, differentiability chain rule, 52.1.17
differentiable manifold map, differential chain rule, global, 58.9.8
differentiable manifold map, differential chain rule, pointwise, 58.4.13
differentiable manifold map regularity, 52.5.1
differentiable manifold product, 52
differentiable manifold restriction, 51.4.15
differentiable manifold structure, concrete, 51.5.17
differentiable manifold submersion, 52.5
differentiable manifolds, C^k-diffeomorphic, 52.2.3
differentiable map, differential, global, 58.9, 58.9.2
differentiable map, differential, pointwise, 58.4
differentiable map, differential at a point, 58.4.5
differentiable map, induced map, global, 58.9, 58.9.4
differentiable map, pull-back differential, global, 58.11
differentiable map, pull-back differential, pointwise, 58.8
differentiable map between differentiable manifolds, 52.1,
differentiable maps, common-domain product, 52.7.5
differentiable one-parameter subgroup, 62.9.2
differentiable ordinary fibre bundle, 64.8.3
differentiable partial function, real, 42.1.7
differentiable partial function on Cartesian space, 42.2.7,
differentiable principal bundle, 66
differentiable principal bundle, right action chart-independence, 66.2.5
differentiable principal bundle, right action map, 66.2
differentiable principal fibre bundle, 66.1, 66.1.2
differentiable real-tuple-valued function on manifold, 51.7.2
differentiable real-valued function, 40.3.4
differentiable real-valued function on differentiable manifold, 51.6.2
differentiable real-valued function on manifold, 51.6
differentiable structure, submanifold, fibre-set, 64.11
differentiable structure, true, tangent bundle, 49.1.2, 53.1.1
differentiable submanifold, 52.4
differentiable submanifold, horizontal, product-structured manifold, 52.7.8
differentiable submanifold, regular, 52.4.2
differentiable submanifold, regular, basic properties, 52.4.4
differentiable submanifold, vertical, product-structured manifold, 52.7.8
differentiable submanifold atlas induced by ambient atlas,
differentiable submanifold point-set, 52.3
differentiable submanifold point-set, regular, 52.3.7
differentiable transformation group, 63.4.3
differentiable vector bundle, 65
differentiable vector-valued function, 40.7.2
differential, common-domain direct product of maps, 54.7.3
differential, common-domain product-map, 58.7
differential, common-domain product map, component differentials, 58.7.9
differential, constant-scale tangent vector map, 59.4.10
differential, covariant, map between manifolds, 71.10 
differential, exterior, 46.7.1 


differential, higher-order, 59 
differential, partial, of map on manifold product, 58.6 


differential, pull-back, vector-valued, 58.8.6 
differential, second-order, of map between manifolds, 59.12.2 
differential, tangent vector map, constant-scale, 59.4.10 


differential, tangent vector scaling curve, 59.4.7 

differential, terrestrial coordinates, 76.4 

differential, total, Banach space, 41.9.8 

differential, total, several real variables, 41.6, 41.6. 

differential, total, terminology, 41.6.1 

differential, total, theorem, 41.6.15

differential, vector integrator, Stieltjes integral, 43.12.1 

differential action, dual, diffeomorphism family, 63.3.5 

differential action, left, 63.6.1 

differential calculus, 40 

differential calculus, chain rule, 40.5.15 

differential calculus, composition rule, 40.5.15 

differential calculus, constant multiplication rule, 40.5.9, 

41.1.18 

differential calculus, product rule, 40.5.9, 41.1.18 

differential calculus, quotient rule, 40.5.9, 41.1.18 

differential calculus, reciprocal rule, 40.5.9, 41.1.18 

differential calculus, sum rule, 40.5.9, 41.1.18 

differential chain rule, global, differentiable manifold map, 

58.9.8 

differential chain rule, pointwise, differentiable manifold map, 
58.4.13 

differential equations, 44 

differential equations, ordinary, Peano method, 44.3 

differential equations, ordinary, Picard method, 44.6 

differential equations, ordinary, uniqueness, 44.5

differential equations, ordinary systems, Peano method, 44.4 

differential equations, ordinary systems, Picard method, 44.7 

differential field of real-valued function, 58.2.3


differential for differential operators, 58.12 

differential for tangent operator, 58.12.2, 58.12.7 

differential for vector-valued form, pull-back, 58.11.77 

differential form, Cartesian space, 46.2, 46.2.2 

differential form, Cartesian space, differentiable, 46.2.6, 46.2.8 

differential form, Cartesian space, vector-valued, 46.2.4 

differential form, differentiable naive, Cartesian space, 46.6.5, 
46.6.7 

differential form, geometric meaning, 57.6.9 

differential form, naive, Cartesian space, 46.6, 46.6.2, 46.6.3 

differential form, short-cut vector-valued, equi-informational, 
57.7.19 

differential form of degree 0 is a scalar function, 55.5.11 

differential form on a differentiable manifold, 57.6 

differential form on a manifold, 57, 57.6.2

differential form on a manifold, local, short-cut, 57.7.21 

differential form on a manifold, short-cut, 57.7.2 

differential form on a manifold, vector-valued, 57.6.5 

differential form on a manifold, vector-valued, short-cut, 
57.7.3 

differential form on fibration, gauge pull-back, 64.7.14 

differential form short-cut, differentiable manifold, 57.7 

differential form space, vector-valued, short-cut, 57.7.15 

differential forms on manifolds, differentiability, 57.6.13 

differential geometry, intrinsic, 49.1 

differential geometry, intrinsic, locally Cartesian, 49.1.1 

differential geometry, patch-free, 49.2.7

differential geometry history, 77 

differential geometry languages, Rosetta stone, 1.4.5 

differential layer, 1.1, 51.1.1

differential layer, overview, 51.1 
differential map of real-valued function, 58.2.2

differential of a map, target manifold dependence, 58.5.5, 
58.5.7, 58.7.4, 58.7.5, 58.7.16 

differential of common-domain product of maps, pointwise, 
58.7.7, 58.10.7 

differential of curve, 57.9.1 

differential of curve, higher-order, 59.8 

differential of curve family, higher-order, 59.9 

differential of curve regarded as manifold map, 57.9.9, 59.4.7 
differential of diffeomorphism, global, inverse rule, 58.9.13 
differential of diffeomorphism, pointwise, inverse rule, 58.5.4 
differential of diffeomorphism, pull-back, global, 58.11.3, 
58.11.5, 58.11.10 

differential of differentiable map, global, 58.9, 58.9.2 
differential of differentiable map, global pull-back, 58.11 
differential of differentiable map, pointwise, 58.4
differential of differentiable map, pointwise pull-back, 58.8 
differential of differentiable map at a point, 58.4.5
differential of function, 58 

differential of identity chart transition map, 66.3.4 
differential of identity map, 58.5.1 

differential of identity map, pointwise, 58.5 

differential of identity map, tautological tensor, 58.5.3 
differential of inclusion map, tangent vector embedding map, 
58.5.6 
differential of manifold embedding, tangent vector embedding 
map, 58.5.8 

differential of manifold map, contragredient, 58.8.1 
differential of map, 58 

differential of map, direct product decomposition, global, 
58.10.2 


differential of map at a point, pull-back, 58.8.2 

differential of map at a point, pull-back, vector-valued, 58.8.7 
differential of map at a point, transposed, 58.8.2 

differential of map between manifolds, higher-order, 59.12 
differential of real-valued function, 58.1.2
differential of real-valued function, global, 58.2 

differential of real-valued function, higher-order, 59.10 
differential of real-valued function, pointwise, 58.1
differential of real-valued function for tangent operators, 
58.1.3 
differential of real-valued function on vector-tuple bundle, 
59.7.3 
differential of submanifold restriction, tangent vector 
embedding map, 58.5.10 

differential of trivialisation, transition map, 64.8.12 
differential operator, coordinate-free, 53.3.3
differential operator, tangent, 54.11, 54.11.2 

differential operator, tangent, local functions, 54.12.6 
differential operator, tangent, tagged, 54.15.3 
differential operator, tangent, total space, 54.12 
differential operator in Riemannian manifold, 74.6 
differential operator space, tangent, 54.11.10 

differential quotient limit equals derivative, 40.8.5 
differential topology, out of scope, 1.6.3 

differential transformation rule, fibre chart, 64.8.10 
differential transition map, trivialisation, Maurer-Cartan 
form, 62.5.3 

differentials, component, common-domain product map 
differential, 58.7.9 

differentials, recursive, 59 

differentiation, composition rule, 42.1.17 

differentiation, maps between normed linear spaces, 41.9 
differentiation, multi-variable, 41
differentiation, vector-valued function, 40.7 
differentiation, vector-valued function on Cartesian space, 
41.8 

differentiation rules, single variable, 40.5 
differentiation theorem, Lebesgue, 45.7, 45.7.8, 45.7.9
digital electronic circuit, 3.2.9
dime a dozen, tensor space isomorphisms, 71.11.3
dimension, dual linear space, 23.7.11, 23.7.12
dimension, empty locally Cartesian space, 49.4.13
dimension, empty manifold, 49.4.10, 49.4.14
dimension, Lebesgue, 33.8.2
dimension, linear space, 22.5, 22.5.2
dimension, locally Cartesian topological space, 49.4.10
dimension, tensor space, 28.1.22
dimension, topological, 33.8
dimension of differentiable manifold, 51.3.16
dimension of linear space of multilinear function, 27.6.14
dimension of linear space of multilinear maps, 27.6.18
dimension of topological manifold, 50.1.4
dimension zero locally Cartesian space, 49.4.9
dimensional analysis, 46.9.4
diminishing returns, minimalist logic axioms, 4.9.5
Dini derivative, 40.10, 40.10.2
dinkum set, axiom of infinity, 12.1.27
Diophantus, regula falsi, 3.1.5
Dirac delta function, tangent vector representation, 53.3.1
Dirac equation, gauge theory, 47.12.7
direct product, common-domain, of relations, 9.7.9
direct product, differentiable manifolds, 52.6
direct product, relation, double-domain, 9.7.2
direct product, topological manifolds, 50.4.2
direct product atlas, non-topological manifolds, 49.3.10
direct product atlas, topological manifolds, 50.4.6
direct-product decomposed differential, 58.10.2
direct product decomposition of differential of map, global, 58.10.2
direct product identification map for Cartesian space tangent bundles, 26.15.4
direct product identification map for Cartesian space tangent spaces, 26.15.2
direct product identification map for manifold tangent bundles, 54.7.6
direct product identification map for manifold tangent spaces, 54.7.2
direct product map, common-domain, differentiability, 52.6.13
direct product map, double-domain, differentiability, 52.6.15
direct product of differentiable manifolds, 52.6.2
direct product of family of groups, 17.7.14
direct product of functions, common-domain, 10.15
direct product of functions, common-domain, continuity, 32.11.2
direct product of functions, common-domain, homeomorphism, 32.11.3
direct product of functions, common-domain, pointwise, 10.15.2
direct product of groups, 17.7.14
direct product of maps, common-domain, differential, 54.7.3
direct product of relations, double-domain, 9.7
direct product of tangent bundles, 54.7.5
direct product of tangent bundles of Cartesian spaces, 26.15
direct product of tangent bundles of manifolds, 54.7
direct product of tangent spaces, 54.7.1
direct product of tangent spaces of Cartesian spaces, 26.15.1
direct product topology, 32.9.4
direct product topology, family of spaces, 32.12.2
direct product topology, two spaces, 32.9
direct product via atlases, topological manifolds, 50.4.8
direct relation product, double-domain, 9.7.2
direct sum, linear spaces, 24.1
direct sum of linear space sequence standard injection, 24.1.5
direct sum of linear spaces, abstract, 24.1.4
direct sum of linear spaces, external, 24.1.2
direct sum of linear spaces, formal, 24.1.2
direct sum of linear spaces, internal, 24.1.6
directed acyclic graph, 11.2.3
directed area, 30.4.25
directed continuous path, 36.8.4
directed set, 37.8.15
direction of velocity, 40.1.2
directional derivative, 41.4.4
directional derivative, Banach space, 41.9.4
directional derivative, naive vector field, non-vectorial, 46.4.4
directional derivative, several real variables, 41.4
directional derivative of naive vector field, Cartesian space, 46.4.2
directional derivative of vector field, Cartesian space, 46.1.8
directional derivative of vector-valued function, 41.8.14
directional derivative operator, line-based, 54.11.8
directionally differentiable function, 41.4.2
directionally differentiable function, Banach space, 41.9.2
directionally differentiable function, continuously, 41.4.11
directionally differentiable function which is discontinuous, 41.4.8, 41.4.9
Dirichlet, Johann Peter Gustav Lejeune, 77.1.6
Dirichlet function, 43.2.3
disclaimer, 0.0
discomfort, fibre bundle framework, 57.7.1
disconnected, totally, subset of topological space, 34.3.14
disconnected, totally, topological space, 34.3.14
disconnected pair of sets, 34.3.6
disconnected set, 34.1.6
disconnected topological space, 34.1.3
disconnection, subset of topological space, 34.3
disconnection of subset of topological space, 34.3.4
disconnection of topological space, 34.3.2
discontinuous function which is directionally differentiable, 41.4.8, 41.4.9
discontinuous partially differentiable function example, 41.1.21, 41.1.22
discovery of proof, 4.5.13
discrete metric, 37.2.6
discrete topological space, 31.3.19
discrete topology, 31.3.19, 37.5.7
discrete topology, induced by discrete metric, 37.2.5
disjoint set-union topology, 32.14.11
disjoint sets, 8.1.8
disjoint union, 8.7.12
disjunction, exclusive, 3.8.1
disjunction, inclusive, 3.8.1
disjunction, logical, 3.62
disjunctive normal form, 4.7.4
displacement vector, 26.12.1
distance between point and set, 37.4.1
distance between sets, 37.4, 37144
distance extremisation, calculus of variations, 73.8
distance function, circle topology, semi-open interval, 37.2.10
distance function, general, 37.1.2
distance function, Hessian of second derivatives, 75.1.11
distance function, metric space, general, 37.1
distance function, metric space, real-valued, 37.2
distance function, metric tensor calculation from, 76.5
distance function, nearest integer, 16.5.22
distance function, point-to-point, 76.5.1
distance function, real-valued, 37.2.3
distance function, Riemannian manifold, 73.7
distance function, torus topology, semi-open interval, 37.2.12
distance function, valued in ordered commutative group, 37.1.2
distance function induced by norm, 39.3.2
distance functions versus metric tensors, 73.9
distance-squared function, pseudo, 75.1.8

distance transfer function envelope, continuity, 38.2.3 
distribution, Schwartz, 19.5.1, 43.1.3 

distribution, Schwartz, tangent vector representation, 53.3.1 
distribution, uneven, rational numbers, 15.1.17 
distribution theory, 31.9.8, 62.3.5 

distributivity logic axiom, 4.4.4 

divergence, Jacobi’s formula, 71.8.5 

divergence interpretation, trace of linear map, 23.3.5 
divergence of vector field, 71.8, 71.8.3 

divergence of vector field, trace of linear space endomorphism, 
71.8.1 

divergent infinite series, topological linear space, 39.2.7 
divergent sequence, 35.4.2 

divide and conquer, connected components, 34.5.1, 34.5.16 
division ring, 18.7.2
division ring, commutative, 18.7.4 

divisor, greatest common, 15.1.4, 40.2.3 

divisor, zero, 18.7.8 

divisor, zero, ring, 18.1.14 

DNA (deoxyribonucleic acid), 72.2.2 

dog, classification, 2.2.13 

dog, extratextual definition, 3.6.7 

dog, mouse-catcher, 1.4.3 

dog versus the word "dog", 3.3.1 

dog's breakfast, differential forms, 61.12.1 

dog's breakfast, infinite sets, 13.7.1 

domain, concrete proposition, 3.2, 3.2.3 

domain, integral, 18.2.15

domain, integral, complex number system, 18.2.19 
domain, integral, ordered, 18.6.13 

domain, multiple, function product, 55.5.26 

domain, operator, module without, 19.1.2 

domain, truth, of a truth value map, 3.5.2 

domain of function, 10.2.6

domain of interpretation, propositional calculus, 5.1.8 
domain of relation, 9.5.4, 9.5.6
domain restriction, relation, 9.6.21 

domain-transformed concatenation of curves, 36.6.9 


domain-translated concatenation of curves, 36.6.7 


domain/range specification, function, 10.2.1 
domination, numerosity, 13.1.9 

domination, numerosity, pseudo-notation, 13.1.10 
doppelgänger, countable choice axiom, 33.7.4 

dot notation, partial map of functions of two variables, 
10.12.18 

dotage, Newton’s notation, 40.4.12 

dots, ellipsis, ranges of integers, 14.1.20 


double differential of map between manifolds, 59.12.2 
double-domain direct product map, differentiability, 52.6.15 
double-domain direct product of relations, 9.7 
double-domain direct relation product, 9.7.2
double-domain function product, 10.14.3 
double-domain product, Cartesian space maps, 
differentiability, 41.3.2 

double-domain product, Cartesian space maps, differentiable, 
42.6.9, 42.7.4 

double-domain product, diffeomorphisms, differentiable, 
42.7.4 

double-domain product, partial functions, 10.14.10 
double-domain product of functions, 10.14 

double dual, linear space isomorphism, 23.10.7 

double dual, linear space monomorphism, 23.10.5 

double dual linear space, 23.10 

double dual of linear space, 23.10.1 

double-map-rule notation, 10.19.3 
double set-map, 10.7.4

double set-map, function, 10.7 

double shadow set, 45.6, 45.6.3 

double shadow-set interval list, 45.6.8 

double shadow set measure, upper bound, 45.6.9 
double shadow set properties, 45.6.6 


double tangent bundle, 59.1, 59.1.22 

double tangent space, oblique drop function, 59.3.2, 59.3.5
double tangent space, vertical drop function, 59.2.9, 59.2.15
double tangent vector, vertical component coordinates
extraction, 59.3.7

double tangent vector component unpacking, 59.3.6

Dover, white cliffs, bluebirds, 77.4.15

dovetailed functions, 10.4.16

drift, project, 1.4.1

drop function, acceleration of trajectory, 59.8.9

drop function, covariant derivative, 68.2.6, 71.6.5, 71.8.2,
T2137
drop function, covariant derivative of curve, 71.7.5
drop function, differentiable fibre bundle, 64.6.6
drop function, fibration, 64.6.2
drop function, Lie bracket, 61.5.6
drop function, Lie derivative, 61.8.3
drop function, linear space, 53.3.13
drop function, linear space, tangent vector field of curve,
57.9.5
drop function, linear space tangent bundle, vertical, 54.9.5
drop function, oblique, 59.3
drop function, oblique, double tangent space, 59.3.2, 59.3.5
drop function, oblique, linearity, vector bundle, 65.4.4
drop function, oblique, vector bundle, 65.4, 65.4.2
drop function, ordinary fibre bundle, 64.6, 64.6.2
drop function, pointwise vertical, linearity, tangent bundle,
59.2.10
drop function, pointwise vertical, linearity, vector bundle,
65.3.9
drop function, Riemann curvature, 70.3.10
drop function, second factor projection, 59.2.13
drop function, torsion, abstract affine connection, 71.12.4
drop function, vertical, chart-independence, tangent bundle,
59.2.11
drop function, vertical, chart-independence, vector bundle,
65.3.13
drop function, vertical, double tangent space, 59.2.9, 59.2.15
drop function, vertical, linear space tangent bundle, 54.9
drop function, vertical, total tangent space, 59.2
drop function, vertical, vector bundle, 65.3, 65.3.5

drop function, verticality, 59.2.8

dropped vector, fake, 68.2.14 

dual, finite-dimensional linear space, 23.8 

dual, multilinear, 27.2.25, 29.2.23

dual, multilinear, of a linear space family, 27.6.4 

dual, tensor space, 29.1

dual action, family of diffeomorphisms, 63.3.3 

dual action of diffeomorphism family, 63.3, 63.3.1 

dual basis, canonical, 23.8.2

dual basis family, canonical, 23.7.3 

dual basis set, canonical, 23.7.2 

dual component map, 23.9.1, 23.9.4 

dual differential action, diffeomorphism family, 63.3.5 
dual frame field, 57.11.4 

dual linear space, 23.6, 23.6.4 

dual linear space, component transition matrix, 23.9.7 
dual linear space, dimension, 23.7.11, 23.7.12
dual linear space, double, 23.10 

dual linear space basis, 23.7

dual linear space component map, 23.9 

dual map, linear space, 23.11.3
dual of linear space, double, 23.10.1

dual of linear space, second, 23.10.2 

dual operator, linear space, 23.11.3 

dual order, 11.1.13 

dual representation, linear map, 23.11.19 

dual representation, linear transformation group, 23.11.20 
dual space, linear, 23 

dual space, multilinear, 29.2.2 

dual space, true, 55.2.4 

dual-space canonical tensor map, 28.2.8 

dual transformation for covectors, 55.2.19 

dual vector, 28.5.7 

dual vector bundle, 65.2.9 

duality, logic quantifier, 5.2.7 

duality, topology, 31.5.9 

duals of linear subspaces, natural isomorphisms, 24.3 
dummy variable, 6.3.10
duple, ordered, 14.6.5 

duplicity, 6.8.17 

duplique, 6.8.17 

dynasty, Sung, 14.9.6 


early visual cortex, 40.2.1 

Earth chart, 49.6.8 

Easter Bunny, eggs, axiom of choice, 13.8.15 

economy, conceptual, 10.1.3 

Eddington, Arthur Stanley, 77.1.7 

effective left transformation group, 20.2, 20.2.1 

effective left transformation group of a topological space,
36.10.7

effective right transformation group, 20.7.9

effective right transformation group of topological space,
36.11.5

effective topological left transformation group of topological
space, 36.10.8

effective topological right transformation group, 36.11.6 

egg, chicken, derivative and differentiability, 41.1.1 

egg, chicken, integral curve, 44.5.4, 46.5.4 

eggs, Easter Bunny, axiom of choice, 13.8.15 

eggs, golden, 3.0.4

Egypt pyramid, precision, 26.10.1 

Egyptian mathematics, 77.1.2 

Ehresmann, Charles, 77.1.7

Ehresmann, Charles, associated connections, 67.12.1 

Ehresmann, Charles, associated fibre bundle history, 21.12.1 

Ehresmann, Charles, connection forms, 69.5.1 

Ehresmann connection form, 69.5.5 

Ehresmann formalism literature, 69.5.1 

eigenvalue, matrix, 25.3.1 

eigenvalues of metric tensor, 75.1.11 

eigenvector, matrix, 25.3.1 

Einstein, Albert, 77.1.7 

Einstein, Albert, absolute differential calculus, 27.1.5 

Einstein, Albert, ontology of mathematics, 2.2.2 

Einstein, Albert, relativity, 77.3.2

Einstein curvature tensor field, 75.3.3 

Einstein implicit summation convention, 22.3.10 

Einstein implicit summation convention, multiple ranges, 
71.10.1 

Einstein space, 75.3.4 

electricity line, 11.8.5 

electromagnetism, classical, gauge theory, 70.8.7 

electron, matter field, 47.12.7 

electronic circuit, digital, 3.2.9 

element, column index, matrix, 25.2.11 

element, row index, matrix, 25.2.11 

element address, matrix, 25.2.11 

element of matrix, 25.2.11 

element operator, left, for ordered pair, 9.2.13 
element operator, right, for ordered pair, 9.2.13

ellipsis dots, ranges of integers, 14.1.20 

elliptic integrals, simplicity of their inverses, 44.2.20 

embedded manifold, 50.6.4 

embedding, continuous, topological manifold, 50.3.3 

embedding, differentiable manifold, 52.5 

embedding, fibre set of fibre bundle, regular differentiable, 
64.11.3, 64.11.6 

embedding, fibre space in fibration, differentiable, 64.3.9 

embedding, fibre space in fibre bundle, differentiable, 64.11.3 

embedding, regular, differentiable manifold, 52.5.3 

embedding, regular, product-structured differentiable 
manifold, 52.7.5 

embedding, regular, topological manifold, 50.3.6 

embedding, terminology, 50.2.3 

embedding, topological manifold, 50.3 

embedding manifold, differential, tangent vector embedding 
map, 58.5.8 

embedding map, fibre-set tangent bundle, implicit, 66.5.5 

embedding map, fibre-set tangent vector, 64.12

embedding map, tangent vector, differential of inclusion map, 
58.5.6 

embedding map, tangent vector, differential of manifold 
embedding, 58.5.8 

embedding map, tangent vector, differential of submanifold 
restriction, 58.5.10 

embedding map, tangent vector, manifold product slice-set, 
54.8.3 

embedding map, tangent vector, open subset, 54.6.8 

embedding map, tangent vector, product-structured manifold, 
54.8, 58.7.5 

embedding map, tangent vector, submanifold, 54.6, 54.6.2 

empirical proposition, 5.2.7

empty chart, differentiable manifold, exclusion, 51.5.5 

empty fibration, 21.2.5 

empty fibre bundle, 21.8.15 

empty function, 10.2.21 

empty locally Cartesian space, dimension, 49.4.13 

empty manifold, dimension, 49.4.10, 49.4.14 

empty manifold, direct product, 50.4.3 

empty matrix, 25.2.7 

empty principal fibre bundle, 21.9.5 

empty product of family of sets, pathological, 32.12.5 

empty set, 7.6.3

empty set, uniqueness, 7.6.1 

empty set axiom, 7.6.1

empty set bounds, 11.2.15 

empty set topology, 31.3.9 

empty topology, 31.5.2 

empty universe, predicate calculus, 6.3.4 

endofunction, 10.5.21 

endomorphism, C s differentiable, differentiable manifold, 
52.2.2 

endomorphism, linear space, 23.1.8 

endomorphism, linear space, trace, 23.3.3 

endomorphism, monoid, 17.2.8 

endomorphism, ordered field, 18.8.17

endomorphism, ring, 18.1.20 

endomorphism, ring-module, algebra, 19.9.5 

endomorphism, set, 10.5.21 

endomorphism, transformation group, 20.6., 20.8.

endomorphism module for modules, 19.4.10

endomorphism trace, linear space, divergence of vector field,
71.8.1

endomorphism versus surjective homomorphism, 10.5.23

endomorphisms of a module, ring, 19.1.5

endomorphisms of a set, semigroup, 17.1.11

endomorphisms of module, ring, 19.1.15
energy, dark, 2.3.3

engineer’s model, 2.2.13 

English mathematics, isolation, serious evil, 40.4.12 

entanglement, argument lines, 6.6.26 

entier (total space of fibre bundle), 47.1.2 

entity bundle, baseless figure/frame bundle, 20.10.8 

entity bundle, topological fibre/frame bundle, 47.13.2 

entry matrix, 25.2.11 

enumeration, components, open set of real numbers, 32.7 

enumeration, connected component, countable, 34.6.16

enumeration, open intervals, 32.7.2 

enumeration, rational numbers, 15.2 

enumeration of components of open set, dartboard, 32.7.7, 
34.8.3 

enumeration of connected components of open set, 34.8.1 

enumeration of rational numbers, 15.2.4

enumeration theorem, well-ordered sets, 13.2.4 

enumerations of sets, 12.4 

envelope, convex, 22.11.20 

envelope, distance transfer function, continuity, 38.2.3 

epilogue, 77.4 

epimorphism, C* differentiable, differentiable manifold, 52.2.2 


epimorphism, linear space, 23.1.8, 23.11.9, 23.11.10 
epimorphism, ordered field, 18.8.17 

epimorphism, ring, 18.1.20 

epimorphism, set, 10.5.21 

epimorphism, transformation group, 20.6.2, 20.8.1 


epsilon, set membership notation, 7.2.9 

epsilon-delta approach to calculus, history, 40.2.5 

epsilon-delta continuity, 38.1.2, 38.1.3, 38.1.7 

epsilon-delta continuity, Cartesian space, 38.1.10 

epsilon-number, 12.6.5 

equality, definition, 6.7.5 

equality, predicate calculus, 6.7 

equality, predicate calculus with, 7.5.12 

equality, reflexivity, axiom, 6.7.5 

equality, substitution of, 7.5.12

equality, substitutivity, axiom, 6.7.5 

equality symbol, Robert Recorde, 6.7.11 

equation of geodesic deviation, 72.5 

equations, differential, 44

equations, ordinary differential, Peano method, 44.3 

equations, ordinary differential, Picard method, 44.6 

equations, ordinary differential, uniqueness, 44.5

equations, ordinary differential systems, Peano method, 44.4 

equations, ordinary differential systems, Picard method, 44.7 

equations of motion, gauge potential, 70.8.1

equi-informational, knowledge sets and truth tables, 3.14.5 

equi-informational, partitions and equivalence relations, 9.8.3 

equi-informational, power sets and indicator function sets, 
14.7.6 

equi-informational, two-valued maps and subsets, 3.5.1
equi-informational definitions, connection on manifold, 67.2.2
equi-informational definitions, fibre chart, 21.5.11
equi-informational definitions, pseudo-Riemannian metric,
15.1.3

equi-velocity of curves, 53.1.17

equicontinuous, pointwise, sequence of functions, 38.4.10

equicontinuous, uniformly, sequence of functions, 38.4.11

equinumerosity, 7.11.13, 13.1

equinumerosity, axiom of choice, 13.1.16

equinumerosity, pseudo-notation, 13.1.4

equinumerous, power set, indicator function set, 14.7.5

equinumerous ordinal numbers, 12.2.14

equinumerous sets, 13.1.2

equipotent sets, 13.1.2

equivalence, logical, 3.6.2

equivalence, logical expression, 3.10
equivalence class, 9.8.5

equivalence class of curves, tangent vector representation, 
53.3.1 

equivalence kernel, 9.8.10, 27.8.1 

equivalence kernel of a function, 10.16.7 

equivalence kernel of function, 10.16 

equivalence relation, 9.8, 9.8.2, 10.16 

equivalence relation, quotient map, 10.16.2 

equivalent atlases, differentiable manifold, 51.4.10 
equivalent logical expressions, 3.10.2 

equivalent norms, linear space, 24.8.11 

equivalent topological fibre atlases, 47.6.17 

equivalent topological fibre bundle, 47.7.6 

equivalent topological fibre bundles, 47.7.12 

equivariant 1-form of a connection, 69.8.1 

equivariant map, left transformation group, 20.6.6 
equivariant map, right action, principal bundle, 47.8.20 
equivariant map, right transformation group, 20.8.4 
equivariant map between transformation groups, 20.9.20 
Eratosthenes of Cyrene, 77.1.2
Erlanger Programm, 77.2.6 

erroneous use of symbol, exists, 5.2.3 

Eskimo vegetarianism, axiom of choice, 7.12.7 

essay, investigation, treatise, enquiry, 1.4.3 

essence of a group, 8.8.2

essential nature, 147 

etymology, affine, 71.0.5 

etymology, affine space, 77.2 

etymology, algebra, 22.0.3
etymology, folk, 3.1.6
etymology, homeomorphism, 31.14.1
etymology, naive, 2.1.2
etymology, sequence, 11.5.23
etymology, tangent, 53.1.2
etymology, topology, 31.1.1, 35.3.11
etymology, vector, 26.1.10 


Euclid of Alexandria, 53.3.12, 77.1.2 

Euclid of Alexandria, method of exhaustion, 43.1.2 
Euclid of Alexandria, static geometry, 26.19.1 
Euclid’s definition, tangent, 53.3.12
Euclid’s definitions of lines and points, 40.2.1 
Euclid’s Elements, 77.1.2 

Euclidean geometry, 2.1.4 

Euclidean geometry, ontology, 2.2.2 

Euclidean inner product, standard, 24.9.6 
Euclidean inner product space, 24.9.7, 26.11.1 
Euclidean linear space, 22.2.19, 22.2.20 

Euclidean linear space, standard basis, 22.7.9 
Euclidean metric linear space, 26.11.1 

Euclidean metric space, 26.11... 

Euclidean norm, 24.7.15 

Euclidean normed space, 24.7.16, 26.11.1 
Euclidean plane geometry, 26.11.1 

Euclidean solid geometry, 26.11.1 

Euclidean space, 26.11 

Euclidean space, Cartesian space, difference, 49.4.1 
Euclidean space, commutative, 26.11.3 

Euclidean space, extra-mathematical, 26.11.4 
Euclidean space, Levi-Civita connection, 26.11.1 
Euclidean space tangent covector bundle, 26.17 
Euclidean space tangent field covector bundle, 26.18 
Euclidean space tangent-line bundle, 26.14 
Euclidean space tangent-line bundle, philosophy, 26.12 
Euclidean space tangent-line tangent space, 26.13
Euclidean space tangent velocity bundle, 26.16
Euclidean topological space, locally, 49.4
Eudoxus of Cnidus, 77.1.2

Eudoxus of Cnidus, Archimedean order, 17.5.3 
Eudoxus of Cnidus, limit concepts, 35.3.10

Eudoxus of Cnidus, method of exhaustion, 43.1.2 

Euler, Leonhard, 24.4.1, 26.1.4, 40.2.2, 54.10.1, 71.0.5, 77.1.5,
77.2.3

Euler, Leonhard, combinatorial topology, 31.1.2 

Euler, Leonhard, geodesics, 73.8.2 

Euler-Lagrange equation, real function on real interval, 
44.8.12 

Euler’s angles, 76.8.4 

European renaissance, 77.1.3 

even integer, 5.2.7 

everything is relative, 20.10.2 

evil, greater, axiom of choice, 13.10.1 

evil, serious, isolation of English mathematics, 40.4.12 

ex falso sequitur quodlibet, 3.1.5 

ex machina path class, 48.2.2 

exact sequence of linear maps, 24.5, 24.5.2 

exact sequences, literature, 24.5.1

exchange (Vertauschung), sequent, 5.3.8 

exchange rate, analogy for tangent vectors, 53.1.4 

excluded middle, 3.1.4 

exclusion, Hausdorff separation, 33.1.25 

exclusion semantics, knowledge, 3.14.6, 6.4.6 

exclusion semantics, logical expression, 3.111 

exclusion semantics, truth function, 3.7.6 

exclusion semantics, truth value map, 3.7.4 

exclusive disjunction, 3.8.1 

exclusive or, 3.6.3

exclusive or, notation, 3.7.16 

exclusive-or operator, 3.7.8, 3.7.10 

exhaustion method, 17.5.3, 18.4.1, 43.0.1, 43.1.2 

exhaustive substitution, logical expression, 6.1.2 

existence, function, 10.2.5

existence, unique, 6.8.2 

existence, unique, quantifier notation, 6.8.3, 6.8.5 

existence axiom, set, ZF, 7.4.2, 7.4.3 

existence caveat, 11.2.18

existence in aggregate, 9.1.1 

existence of set, bless, 45.3.7 

existence theorem, integral curve, 57.10.5 

existence theorem, ODE, Peano, 44.3.20 

existence theorem, ODE, Picard, 44.6.20 

existence theorem, ODE systems, Peano, 44.4.9 

existence theorem, ODE systems, Picard, 44.7.1 


existence theorem, one-parameter subgroup, Lie group, 62.9.9 


existence theorem, Peano, literature, 44.3.1 

existential quantifier, 5.2.1 

existential quantifier, restricted, 7.2.7 

existential quantifier ontology, 13.7.15 

existential/universal quantifier, information content, 5.2.6 

exists, erroneous use of symbol, 5.2.3

expansion, name, 3.9.2

expansion, sequent, 6.5.9, 6.6.26 

explicit continuity of maps, metric spaces, 38.2 

explicit countable family of sets of explicit measure zero, 
45.3.1 

explicit countably infinite family of finite sets, 13.8.4 

explicit families of real-number sets of explicit measure zero,
45.3

explicit measure zero, explicit families of real-number sets, 
45.3 

explicit measure zero set, real-number explicit, properties, 
45.2.4 


explicit measure-zero set of real numbers, 45.2, 45.2.2 

explicit measure-zero set of real numbers, explicit family, 
45.3.2 

explicit measure zero sets, explicit countable family, 45.3.1 
explicit real-number measure-zero sets, family, countable
union, 45.3.3 

explicit right inverse of surjection, 10.5.17 

explicit sequential compactness, real compact sets, 35.7.10 

explicitly continuous function, 38.2.9 

explicitly continuous function at a point, 38.2.6 

exponential function, 44.1, 44.1.5 

exponential function, basic properties, 44.1.9 

exponential map, 72.3, 72.3.3 


expression, basic logical, concrete propositions, 3.6 
expression, logical, equivalence, 3.10
expression, logical, exhaustive substitution, 6.1.2
expression, logical, functional notation, 3.12.13
expression, logical, infix, 3.9.12, 3.12.2 

expression, logical, postfix, 3.11.4 

expression, logical, prefix, 3.11.12 

expression, logical, substitution, 3.10 

expression, logical, template, 3.9.9 


expression name, logical, dereferencing, 3.9.2 
expression name, logical, unquoting, 3.9.2
expression space, logical, infix, 3.12.3

expression space, logical, postfix, 3.11.5 

expression space, logical, prefix, 3.11.13 

expression style, logical, infix, 3.12 

expression style, logical, prefix and postfix, 3.11 
expression substitution notation, post-evaluation, 10.4.14 
expressionism, abstract, 35.1.1 

expressions, logical, equivalent, 3.10.2 

extended finite ordinal number, 12.1, 12.1.3 

extended finite ordinal numbers, order, 12.2 

extended finite ordinal numbers, standard order, 12.2.11 
extended integer, 14.5, 14.5.2 

extended integer, negative, 14.5.4 

extended integer, positive, 14.5.4

extended integer set notations, 14.5.4 

extended integers, non-negative, usual topology, 32.5.4 
extended integers, usual topology, 32.5.4 

extended list space, 14.12.12 

extended natural number, 14.5.7 


extended number notation, 1.5.1 


extended rational number, 16.3 


extended rational number notation, 1.5.1 
extended real number, 16.2, 16.2.3

extended real number notation, 1.5.1 

extended real number system, 16.2.7, 

extended real number system, non-negative, 16.2.10 

extended real number system, ordered, 16.2.13 

extension, constant cross-section, non-topological fibration, 
21.6.8 

extension, global differentiable, locally constant real-valued 
function, 51.8.3 

extension, integers, 15.1.2 

extension axiom, ZF, 7.5 

extension of a function, 10.4.12 

extension of function, 10.4 

extension of function, continuous, constant, 31.12.19 

extension of function to a set, 10.4.13 

extension of vector to constant vector field, 57.1.20 

extension of vector-tuple to constant vector-field tuple, 57.4.9 

extension of vector-tuple to constant vector-tuple field, 57.4.5 

extensionality, axiom, 7.5.1

extensions, continuous derivative, of differentiable functions, 
44.8.10 

exterior calculus, Cartesian space, 46 

exterior calculus, fundamental theorem, 46.9.1 

exterior closure, topology, 31.9.1 

exterior cone condition, 26.19.4, 51.11.5 

exterior derivative, boundary integral interpretation, 46.9.7 
exterior derivative, Cartesian space, 46.7, 46.7.4
exterior derivative, Cartesian space vector fields, corrected, 
46.8.10 
exterior derivative, Cartesian space vector fields, uncorrected, 
46.8.3 
exterior derivative, chart-basis vector fields, 61.11
exterior derivative, covariance, Cartesian space, 46.7.8
exterior derivative, differentiable manifold, 61.10, 61.10.3
exterior derivative, does not need swap-like function, 61.10.9
exterior derivative, literature, 46.7.1
exterior derivative, nonholonomic vector fields, 46.8, 61.14
exterior derivative, of a 0-form, 61.10.7
exterior derivative, of a 1-form, vector-valued, 61.10.5
exterior derivative, on manifold versus Cartesian space, 61.12
exterior derivative, Stokes theorem, Cartesian space, 46.9.5
exterior derivative, surface integral interpretation, 46.10.2

exterior derivative, target space verification, 61.10.4 

exterior derivative, vector fields, uncorrected, 61.14.2 

exterior derivative example, connection form, 69.5.8 

exterior derivative for vector fields, 46.8 

exterior derivative for vector fields, differentiable manifold, 
61.14 

exterior derivative for vector fields on a differentiable 
manifold, 61.14.9 

exterior differential, 46.7.1 

exterior form, 30.4.07 

exterior Jordan content, 43.6.1 

exterior Lebesgue measure, 43.6.3 

exterior of set, 31.9 

exterior of set, topological, 31.9.2 

exterior operator, topological space, 31.9.4 

exterior point of set, 31.2.1 

exterior sphere condition, 26.19.4 

exterior to a set, topological separation, 33.2.5 

external direct sum of linear spaces, 24.1.2 

extra-mathematical, Euclidean space, 26.11.4 

extra-mathematical, rational numbers, 15.1.1 

extra-mathematical points, 53.1.6 

extra-mathematical tangent vector, 53.1.14 

extraction of vertical component coordinates, double tangent 
vector, 59.3.7 

extraneous free variables, 6.3.12, 6.3.13 

extremisation of distance, calculus of variations, 73.8 

extremum, local, on differentiable manifold, 51.6.10

extremum of function, zero derivative, 40.6.1 

extrinsic tangent vector, 53.1.6 


F, hint-letter for closed set, 31.4.2 

F-fibration, non-topological, 21.7.5 

F fibre bundle, differentiable, 64.4.4 

factor projection, second, drop function, 59.2.13 

factorial function, 14.8.26 

factorial function, Jordan, 14.8.29 

factorial function, multi-index, 14.8.39 

factorisation property, universal, multilinear functions, 27.8,
27.8.7

factorisation property, universal, tensor product, 28.3, 28.3.7 

facts, seek truth from, 1.4.15

fairy, tooth, axiom of choice, 13.8.15 

faith axiom, choice, 22.7.20, 22.7.22 

faith axiom, infinity versus choice, 7.10.3 

fake, dropped vector, 68.2.14 

fall map, covariant, affine connection, 71.10.3 

false, proposition tag, 3.2.2 

false generalisation, 37.0.1

false zero-parameter predicate, 5.1.10 

falsi, regula, 3.1.5 

falsity, truth, 3.1.1 

falso, ex, sequitur quodlibet, 3.1.5 
families of real-number sets, explicit, of explicit measure zero,
45.3 

family, nested, of closed sets, 37.9.4 

family, parametrised proposition, 5.1 

family of curves, 36.0.1

family of diffeomorphisms, 63.2 

family of diffeomorphisms, differentiable, 63.2.2 

family of diffeomorphisms, vector field, 63.2.3 

family of elements of a set, totally ordered, 11.5.20 

family of explicit real-number measure-zero sets, countable
union, 45.3.3

family of finite sets, explicit countably infinite, 13.8.4 

family of functions, 10.8, 10.8.4

family of functions, totally ordered, 11.5.22 

family of geodesics, one-parameter, 72.5.2 

family of local diffeomorphisms, 61.9.1

family of sets, 10.8, 10.8.2 

family of sets, Cartesian product, 10.11, 10.11.2 

family of sets, intersection, 10.8.9

family of sets, totally ordered, 11.5.21 

family of sets, union, 10.8.9 

family of topological automorphisms, continuous, 36.10.14 

family of vectors, linearly independent, 22.6.5

family tree, algebraic systems, 17.0.1

family tree, classes of sets, 18.11.3

family tree, lattice, 11.4.1

family tree, modules and algebras, 19.0.1

family tree, order relations, 11.1.1

family tree, relations and functions, 9.0.2

family tree, rings, 18.1.1

family tree, set rings and algebras, 18.12.6

family tree, well-orderings, 11.6.3 

family versus function, terminology, 10.1.5 

Father Christmas, presents, axiom of choice, 13.8.15 

Faustian pact, axiom of choice, 7.4.4, 77.4.1 

FB (fibre bundle), 69.15.1

feature versus bug, axiom of choice, 7.12.7 

Feferman, Solomon, 77.1.8 

Feferman's paradox, axiom of choice, 7.12.6 

Fermat, Pierre de, 77.1.4

Fermat, Pierre de, calculus, derivative at extremum, 40.6.1 

Fermat, Pierre de, Cartesian coordinates, 23.3.1 

Fermat’s last theorem, 7.12.8 

fermion, matter field, 47.12.7 

fermionic matter field, vector bundle, 21.12.9, 47.11.2, 71.6.6,
15.4.1

fermionic matter fields, gauge theory, 65.0.1 

Feynman, Richard Phillips, 77.1.7 

Feynman, Richard Phillips, philosophy of science, 2.0.3 

fibration, almost-empty, 21.2.11

fibration, antisymmetric multilinear map, fibre chart, 56.5.19

fibration, differentiable, connection, 67.4.2

fibration, differentiable, cross-section, 64.7.1

fibration, differentiable, fibre atlas, 64.4

fibration, differentiable, fibre chart, 64.3, 64.3.2

fibration, differentiable, fibre space embedding, 64.3.9

fibration, differentiable, horizontal lift function, 67.4.2

fibration, differentiable, parallelism curvature, 70.3.9

fibration, differentiable, parallelism curvature at a point,
70.3.8

fibration, differentiable, Riemann curvature, 70.3.9

fibration, differentiable, Riemann curvature at a point, 70.3.8

fibration, differentiable, specified fibre space, 64.2

fibration, differentiable, unspecified fibre space, 64.1

fibration, differentiable, with intrinsic fibre space, 64.1.2

fibration, differentiable, with specified fibre space, 64.2.

fibration, differentiable cross-section, 64.7.2

fibration, differential form, gauge pull-back, 64.7.14


[draft: UTC 2023-1-3 Tuesday 00:13] 


fibration, drop function, 64.6.2 


80. Index 


fibration, empty, 21.2.5, 21.2.11
fibration, form-style non-topological, cross-section short-cut,
21.4
fibration, form-style non-topological, platform fibration,
21.4.6
fibration, form-style non-topological, short-cut map, 21.4.9
fibration, manifold tangent, projection map, 54.5.4
fibration, multilinear function, 56.4.10
fibration, multilinear function, antisymmetric, 56.5.16,
56.5.17
fibration, multilinear function, antisymmetric vector-valued,
56.7.6
fibration, multilinear function, notation, 56.4.9
fibration, multilinear function, symmetric, 56.6.5
fibration, non-surjective projection map, 21.2.6
fibration, non-topological, absolute parallelism, 21.15.2
fibration, non-topological, constant cross-section, 21.6.6
fibration, non-topological, constant cross-section extension,
21.6.8
fibration, non-topological, cross-section, 21.3, 21.3.3
fibration, non-topological, fibre atlas, aT MES
fibration, non-topological, fibre chart, 21.5
fibration, non-topological, form-style, 21.4.2
fibration, non-topological, linear-style tensor, 56.3.8
fibration, non-topological, local cross-section, 21.3.3
fibration, non-topological, local trivialisation, 21.5.7
fibration, non-topological, multilinear-style tensor, 56.3.7
fibration, non-topological, non-uniform, 21.1
fibration, non-topological, parallelism, 21.15
fibration, non-topological, uniform, 21.2
fibration, non-topological, with fibre space F, 21.2.10
fibration, non-uniform non-topological, 21.1.0
fibration, noumena, 21.5.8
fibration, partitioning by inverse function, 10.16.5
fibration, phenomena, 21.5.8
fibration, platform, of form-style non-topological fibration,
21.4.6
fibration, principal, meaningless, 21.9.1
fibration, second-order tangent, 60.5.12
fibration, self-projection, 21.2.5
fibration, single-base-point, 21.2.5
fibration, tangent, of manifold, 54.5.4
fibration, tangent vector frame,
fibration, tangent vector tuple,
fibration, topological, 47.3
fibration, topological, continuous cross-section, 47.4.6
fibration, topological, cross-section, 47.4, 47.4.2
fibration, topological, fibre atlas, 47.5.2
fibration, topological, fibre chart, 47.3.10
fibration, topological, fibre chart continuity, 47.3.4
fibration, topological, local cross-section, 47.4.2
fibration, topological, topological fibre set, 47.3.8
fibration, topological, with fibre space F, 47.3.6
fibration, topological, with intrinsic fibre space, 47.2.2
fibration, trivial, 21.2.5
fibration, uniform non-topological, 21.2.1
fibration, vector-frame, on vector bundle, 65.8.6
fibration, vector-tuple, on vector bundle, 65.7.7
fibration, vector-valued antisymmetric multilinear map, fibre
chart, 56.7.11
fibration cross-section, action of vector, 64.7.7
fibration cross-section, naive derivative, 64.
fibration total space, tensor, linear-style, 56.3.3
fibration total space, tensor, multilinear-style, 56.3.2
fibre, differentiable fibre bundle, 64.8.3
fibre, non-topological fibre bundle, 21.8.3
fibre, non-uniform non-topological fibration, 21.1.2
fibre, standard, topological fibration with intrinsic fibre space,
fibre, topological fibre bundle, 47.6.5
fibre, topological fibre/frame bundle, 47.13.2
fibre atlas, differentiable, 64.4.2
fibre atlas, differentiable fibration, 64.4
fibre atlas, differentiable fibre bundle, 64.8.3
fibre atlas, non-topological fibration, 21.7, 21
fibre atlas, non-topological fibre bundle, 21.8.3
fibre atlas, tangent vector-tuple bundle, 55.5.28
fibre atlas, topological, 47.5
fibre atlas, topological, equivalent, 47.6.17
fibre atlas, topological fibre bundle, 47.6.
fibre atlas for topological fibration, 47.5.2
fibre bundle, analytic, 64.10
fibre bundle, associated, differentiable, 66.7, 66.7.6, 66.7.8
fibre bundle, associated, history, 21.12.1
fibre bundle, associated differentiable, orbit-space method,
fibre bundle, associated differentiable, patchwork, 66.7.10
fibre bundle, associated non-topological, orbit-space method,
fibre bundle, associated non-topological, patchwork, 21.13.2
fibre bundle, associated topological, orbit-space method,
fibre bundle, associated topological, patchwork, 47.10.4
fibre bundle, baseless, 20.10.1
fibre bundle, curvature of connection, 70
fibre bundle, differentiable, 54.5.1, 64, 64.8
fibre bundle, differentiable, connection, 67.5.4
fibre bundle, differentiable, horizontal lift function, 67.5.4
fibre bundle, differentiable, pull-back atlas, 64.9
fibre bundle, differentiable, vector field, 64.13
fibre bundle, differentiable, with Lie structure group, 64.8.3
fibre bundle, empty, 21.8.15
fibre bundle, geometric nature, 21.0.1
fibre bundle, history, 47.0.3
fibre bundle, locally product-structured space, 21.0.7
fibre bundle, non-topological, 21, 21.8.3
fibre bundle, non-topological, associated, 21.12, 21.12.5
fibre bundle, non-topological, fibre chart transition map,
21.8.12
fibre bundle, non-topological, orbit-space associated, 21.14
fibre bundle, non-topological, patchwork associated, 21.13
fibre bundle, non-topological, with fibre space F, 21.7.5
fibre bundle, non-topological ordinary, parallelism, 21.16
fibre bundle, noumena, 47.13.6
fibre bundle, ordinary, associated connection, 67.12, 67.12.3
fibre bundle, ordinary, connection, 67
fibre bundle, ordinary, covariant derivative, 68.2.5
fibre bundle, ordinary, differentiable, 64.8.3
fibre bundle, ordinary, differentiable cross-section, 64.7
fibre bundle, ordinary, drop function, 64.6, 64.6.2, 64.6.6
fibre bundle, ordinary, horizontal component, 64.5
fibre bundle, ordinary, horizontal component map, 67.9.2
fibre bundle, ordinary, horizontal subspace, 67.9.6
fibre bundle, ordinary, naive derivative, 64.7
fibre bundle, ordinary, non-topological, 21.8
fibre bundle, ordinary, parallel transport, 68.5
fibre bundle, ordinary, Riemann curvature, 70.3, 70.4.3
fibre bundle, ordinary, Riemann curvature, justification, 70.7
fibre bundle, ordinary, topological, 47.6.5
fibre bundle, ordinary, vertical component map, 67.10.2
fibre bundle, phenomena, 47.13.6
fibre bundle, principal, affine connection curvature, 71.4.6
fibre bundle, principal, connection, 69
fibre bundle, principal, connection form, 69.5, 69.5.4
fibre bundle, principal, differentiable, 66.1 1, 66 66.1.2
fibre bundle, principal, empty, 21.9.5
fibre bundle, principal, exact sequences, 24.5.1
fibre bundle, principal, horizontal component map, 69.3.2
fibre bundle, principal, horizontal lift function, 69.1
fibre bundle, principal, horizontal lift function, transposed, 69.2
fibre bundle, principal, horizontal subspace, 69.3.7
fibre bundle, principal, infinitesimal action map of Lie algebra elements, 66.5.2
fibre bundle, principal, non-topological, 21.9, 21.9.4
fibre bundle, principal, non-topological, right action map, 21.11.4
fibre bundle, principal, non-topological, right transformation group, 21.11.8
fibre bundle, principal, pointwise right action map, 66.2.10
fibre bundle, principal, right action map, 66.2.2
fibre bundle, principal, right transformation group, 66.2.7
fibre bundle, principal, right transformation groups, 20.7.1,
fibre bundle, principal, structure group invariants, 20.9.22
fibre bundle, principal, subgroups, 17.7.13
fibre bundle, principal, topological, 47.8, 47.8.3
fibre bundle, principal, transformation groups, 20.3.6
fibre bundle, principal, transposed infinitesimal action map, 66.6.2
fibre bundle, principal, vertical component map, 69.4.2
fibre bundle, structure group invariants, 20.9.21
fibre bundle, tangent bundle on affine space, 26.6.4
fibre bundle, topological, 47, 47.5.5, 47.6, 47.6.5
fibre bundle, topological, associated, 47.9, 47.9.7
fibre bundle, topological, continuous-associated, 47.9.11
fibre bundle, topological, equivalent, 47.7.6
fibre bundle, topological, homomorphism, 47.7.3
fibre bundle, topological, isomorphism, 47.7.5, 47.
fibre bundle, topological,
fibre bundle, topological,
fibre bundle, topological, overview, 47.1
fibre bundle, topological, parallelism, 48
fibre bundle, topological, particle field, 47.12.2
fibre bundle, topological, patchwork associated, 47.10
fibre bundle, topological, pathwise parallelism, 48.3
fibre bundle, topological fibre/frame bundle, 47.13.2
fibre bundle, transformation group homomorphisms, 20.6.1
fibre bundle, transformation groups, 20.1.1
fibre bundle, trivial, product-structured space, 10.15.11, 21.0.7
fibre bundle association, topological, 47.9.5
fibre bundle association map, differentiable, 66.7.5
fibre bundle association map, non-topological, 21.12.
fibre bundle connection, generator function, 67.6.5
fibre bundle connection, horizontal component map, 67.9
fibre bundle connection, vertical component map, 67.10
fibre bundle cross-section, orbit-space associated, short-cut, 66.8.1
fibre bundle cross-section differentiability test, pull-back atlas, 64.9.11

fibre bundle framework, discomfort, 57.7.1 

fibre bundle homomorphism, 47.7 

fibre bundle isomorphism, 47.7

fibre bundle quintessence, local trivialisation, 10.15.11, 21.0.7 

fibre bundle structure, algebraic/topological/differentiable, 21.0.1
fibre bundles, tangent bundles, 54.5.33
fibre bundles, tangent covector bundles, 55.4.15
fibre bundles, topological, equivalent, 47.7.12
fibre bundles, topological classification, out of scope, 47.7.1
fibre chart, antisymmetric multilinear map fibration, 56.5.19
fibre chart, compatible, 47.6.15
fibre chart, differentiable fibration, 64.2.2, 64.3, 64.3.2
fibre chart, differentiable fibre bundle, 64.8.3
fibre chart, non-topological fibration, 21.5, 21.5.2
fibre chart, non-topological fibre bundle, 21.8.3
fibre chart, tangent-line bundle total space, Cartesian space, 26.14.7
fibre chart, topological, 47.3
fibre chart, topological fibration, 47.3.10
fibre chart, topological fibration with intrinsic fibre space, 47.3.3
fibre chart, topological fibre bundle, 47.6.5
fibre chart, vector-valued antisymmetric multilinear map fibration, 56.7.11
fibre chart association map, 66.7.5
fibre chart constructor, antisymmetric multilinear function bundle, 56.5.20
fibre chart constructor, antisymmetric multilinear map bundle, 56.7.12
fibre chart constructor, list, 65.7.8
fibre chart constructor, multilinear function bundle, 56.4.13
fibre chart constructor, principal frame bundle, 55.7.6
fibre chart constructor, tangent bundle, 54.5.7
fibre chart constructor, tangent covector bundle, 55.4.7
fibre chart constructor, tangent vector-frame bundle, 55.6.18
fibre chart constructor, tangent vector-tuple bundle, 55.5.25
fibre chart constructor, tensor bundle, 56.3.13, 56.3.14
fibre chart constructor, vector-frame bundle, 65.8.7
fibre chart constructor, vector-tuple bundle, 65.7.9
fibre chart continuity, topological fibration, 47.3.4
fibre chart differential transformation rule, 64.8.10
fibre-chart-induced atlas on fibre set, 64.3.7
fibre chart restricted to fibre set, homeomorphism, 47.3.11
fibre chart role, 47.3.12
fibre chart transition map, differentiable, 64.3.13
fibre chart transition map, non-topological fibre bundle, 21.8.12
fibre chart transition map subscript order, 47.6.11
fibre field, 21.3.1
fibre set, differentiable fibre bundle, 64.8.3
fibre set, induced linear structure, 24.11.2
fibre set, non-topological fibre bundle, 21.8.3
fibre set, non-uniform non-topological fibration, 21.1.2
fibre set, principal fibre bundle, group action, 47.8.19
fibre set, topological, of topological fibration, 47.3.8
fibre set, topological fibration, cross-section, 47.4.1
fibre set, topological fibration, homeomorphic to fibre space, 47.6.8
fibre set, topological fibration, intrinsic fibre space, 47.2.2
fibre set, topological fibration, pairwise homeomorphic, 47.2.5
fibre set, topological fibration, regularly embedded, 47.3.7
fibre set, topological fibration, structure group, 47.5.3
fibre set, topological fibre bundle, 47.6.5
fibre set, topological fibre bundle, algebraic structure, 47.6.14
fibre set, vector bundle, standard induced basis field, 65.1.8
fibre set atlas, fibre-chart-induced, 64.3.7
fibre set automorphism through the charts, 48.1.8
fibre-set diffeomorphism, Cartesian spaces, 42.7.6
fibre set glue, 64.1.1
fibre set isomorphism through the charts, 48.1.14
fibre set linear space, induced, 65.1.6
fibre set manifold atlas, 64.11.5
fibre set map, structure-preserving, 48.1
fibre-set-restricted fibre chart, homeomorphism, 47.3.11
fibre set submanifold differentiable structure, 64.11
fibre-set tangent bundle embedding map, implicit, 66.5.5
fibre-set tangent vector embedding map, 64.12
fibre-set tangent vector identification with vertical vectors, 64.12.7
fibre-set vector field, horizontal, 64.5.13
fibre-set vector field, horizontal submanifold, 64.5.14
fibre set vector field induced by Lie algebra, non-vertical, 64.14.2
fibre sets, pairwise disjoint, 47.1.2
fibre space, differentiable fibre bundle, 64.8.3
fibre space, intrinsic, topological fibration, 47.2.2
fibre space, non-topological fibration, 21.2.8
fibre space, non-topological fibre bundle, 21.8.3
fibre space, specified, differentiable fibration, 64.2
fibre space, topological fibration with intrinsic fibre space, 47.3.2
fibre space, topological fibre bundle, 47.6.5
fibre space, unspecified, differentiable fibration, 64.1
fibre space embedding in fibration, differentiable, 64.3.9
fibre/frame bundle, relativity condition, 20.10.18
fibre/frame bundle, topological, contravariant, 47.13.2
fibre/frame bundle, topological, local trivialisation, 47.13.4
fibre/frame bundle, vector component transition array, 22.9.12
fibre/frame bundles, topological, combined, 47.13
field, 18.7, 18.7.3
field, affine space over unitary module over, 26.10.3
field, characteristic, 18.7.10
field, complete ordered, 18.9, 18.9.3
field, differentiable naive vector, Cartesian space, 46.3.4
field, differential, of real-valued function, 58.2.3
field, dual frame, 57.11.4
field, fibre, 21.3.1
field, frame, 57.11
field, fundamental vector, principal bundle, 66.6.1
field, fundamental vertical vector, 66.6
field, fundamental vertical vector, principal bundle, 66.6.2
field, Jacobi, 72, 72.4, 72.4.5
field, lifted chart-basis operator, on principal bundle, 69.14.4
field, lifted chart-basis vector, on principal bundle, 69.14.3
field, matter, fermion, 47.12.7
field, metric tensor, tensor calculus, 73.3
field, multilinear, 27.1.1
field, naive vector, Cartesian space, 46.3, 46.3.3
field, name clash, 18.7.6
field, operator, chart-basis, 57.3.5
field, ordered, 18.8, 18.8.4
field, ordered, homomorphism, 18.8.16
field, ordered, morphisms, 18.8.17
field, particle, topological fibre bundle, 47.12.2
field, positive cone, 18.8.9
field, positive subset, 18.8.9
field, radiation, boson, 47.12.7
field, Riemannian metric tensor, 73.2.4
field, tangent operator, 57.3
field, tangent vector, 57.1.2, 57.1.9
field, tangent vector frame, 57.11.2
field, tensor, 57.5, 57.5.2
field, tensor, along curve, 57.8
field, tensor, Lie derivative, 61.9
field, tensor, on a manifold, 57
field, torsion, 71.12.5
field, unital morphism, 18.7.9
field, vector, 57.1
field, vector, action on real-valued function, 61.1, 61.1.3
field, vector, action on tensor field, 61.4.7
field, vector, action on vector field, 61.4.2
field, vector, along curve, 57.8
field, vector, Cartesian space, 46.1, 46.1.3
field, vector, Cartesian space, differentiable, 46.1.5
field, vector, constant extension, 57.1.20
field, vector, differentiability, 57.2
field, vector, divergence, 71.8, 71.8.3
field, vector, exterior derivative, 46.8
field, vector, integral curve, 57.10.2
field, vector, Lie derivative, 61.8
field, vector, lift by horizontal lift function, 67.5.9, 69.1.16
field, vector, lift by transposed horizontal lift function, 67.8.6, 69.2.7
field, vector, linear space, 57.2.8
field, vector, naive derivative, 61.2, 61.4
field, vector, on a manifold, 57
field, vector, Picard iteration method, 44.6.18
field, vector, zero, 57.2.13
field, vector-tuple, 57.4
field, vector-tuple, chart-basis, 57.4.2
field, vector-tuple, constant extension, 57.4.5
field along curve, tensor, 57.8.8
field along curve, vector, 57.8.2
field along curve, vector, continuous, 57.8.3
field along curve, vector, differentiable, 57.8.5
field along curve, velocity vector, 57.9
field covector, tangent, Cartesian space, 26.18.3, 26.18.4, 26.18.5
field covector bundle, tangent, Cartesian space, 26.18
field covector bundle set, tagged tangent, Cartesian space, 26.18.13
field covector set, tagged tangent, Cartesian space, 26.18.11
field covector set, tangent, Cartesian space, 26.18.6
field covector space, tangent, Cartesian space, 26.18.8
field family, vector, holonomic, 46.5.3
field family, velocity, of diffeomorphism family, 63.2.7
field of sets, 18.11.15, 18.12
field on principal bundle, fundamental vector, right translation, 66.6.10, 66.6.11
field ordering, 18.8.3
field regarded as a linear space, 22.1.9
fields, holonomic vector, Koszul formalism, 57.0.2
fields, holonomic vector, Riemann curvature, 70.3.2
fields, locally constant vector-tuple, properties, 57.4.7
fields, map-related vector, Lie bracket, 61.6
fields, vector, related by a map, 61.6.2
fifth problem, Hilbert’s, 62.1
figure, transformation group, 20.9, 20.9.2
figure bundle, associated baseless, constructions, 20.12
figure bundle, baseless figure/frame bundle, 20.10.8
figure bundles, baseless, associated, 20.11
figure space for transformation group, 20.9.13
figure/frame bundle, baseless, contravariant, 20.10.8
figure/frame bundles, baseless, 20.10
figure/frame bundles, baseless, associated, 20.11.2
finer partition, 8.7.15
finite basis for linear space, 22.8.3
finite choice theorem, Cartesian products, 13.7.17
finite-codimensional linear subspace, 22.5.19
finite-complement topology, 31.11.7
finite cover, locally, topological space, 33.7.14
finite cover, pointwise, topological space, 33.7.14
finite-dimensional linear space, 22.5.7
finite-dimensional linear space, differentiable manifold, 51.4.21
finite-dimensional linear space, dual, 23.8
finite-dimensional linear space, standard atlas, 49.7.14
finite-dimensional linear space, standard topology, 32.6.6
finite-dimensional linear space basis cardinality, 22.7.16
finite-dimensional normed linear space, 39.5
finite mathematical induction principle, 12.2.13
finite mathematical induction principle, wrapped, 12.2.24
finite open cover, 31.7.2
finite ordinal number, 12.1.34
finite ordinal number, addition operation, 13.6.3
finite ordinal number, extended, 12.1, 12.1.3
finite ordinal number, subtraction operation, 13.6.4
finite ordinal number addition, 13.6
finite ordinal number addition, commutativity, 13.6.6
finite ordinal number addition, existence, 13.6.2
finite ordinal numbers, extended, order, 12.2
finite ordinal numbers, extended, standard order, 12.2.11
finite ordinal numbers, standard order, 12.2.5
finite ordinal numbers are totally ordered, 12.2.6
finite ordinal numbers are well ordered, 12.2.7, 12.2.8
finite permutation, 14.8.10
finite permutation, cardinality conservation, 14.8.41
finite permutation, flow-balance equation, 14.8.40
finite real-number interval, 16.1.12
finite sequence, 12.3.1
finite sequence, sorted rearrangement, 14.11.3
finite sequence, sorting-permutation, 14.11.3
finite sequence, sum, semigroup, 17.1.17
finite sequence, sum and product, 16.7
finite set, 13.5, 13.5.2
finite set, countable union, 13.8
finite set, Dedekind, 13.10.3
finite set, Kuratowski, 13.11.7
finite set, Peano-style, 14.2.10
finite set, variant definitions, 13.11
finite set cardinality, 13.5.5
finite set concepts, 13.11.1
finite set numerosity, 13.5.5
finite set topology, combinatorics, 31.5.6
finite subsequence, 12.3.3
finite support, free linear space, 22.2.9
fire and smoke, 3.4.7, 3.7.4, 3.7.6
first cab off the rank, choice function, 33.4.8
first cause, 3.1.6
first countable topological space, 33.4.12
first fundamental form, 73.2.8
first-instance subsequence of a sequence, 12.4.16
first-order language, 6.9
first-order ODE, non-uniqueness, 44.3.2
first-order partial derivative vector field, 57.9.15
fish, kettle of, 6.4.9, 7.11.7
Fitzgerald, George Francis, 77.1.7
Fitzgerald, George Francis, relativity, 77.3.2
five layers, linguistic structure, predicate calculus, 5.1.6
fixed stars, 48.2.4, 59.8.1
flat musical isomorphism, 73.5.1, 73.5.3
flat space, 26.1.2
flatness deviation, parallelism, 70.4.1
flesh without bones, formal logical calculus, 7.3.10
floating-point hardware, 16.2.12
floating-point number, 8.8.1, 15.3.1
floor function, 16.5.11
flow, glacier, Lie transport, 61.7.1
flow-balance equation, finite permutation, 14.8.40
flow diagram, topics, 1.2
fluxions, 40.4.12
fluxions, Newtonian, 53.1.3
fly, ointment, 6.3.5, 7.8.10, 12.6.5, 15.8.3, 20.9.11, 22.6.1, 27.8.8, 39.7.1, 46.5.7, 47.1.2, 50.6.5, 51.8.1, 53.1.13, 60.1.3, 64.11.2, 65.6.3, 73.10...
flying arrow, Zeno of Elea, 53.1.12
folk etymology, 3.1.6
font, Fraktur, 31.4.3
font, lead, 5.2.3
foot, chicken, symbol, 1.5.7
football patches, 32.15.2
force, intermediate concept, 53.1.4
force function, second-order ODE right-hand side, 44.3.3
form, bilinear, 24.9.3
form, connection, Cartan-style, 69.11.2
form, connection, curvature, 70.5.2
form, connection, effect of right action map, 69.8
form, connection, Ehresmann, 69.5.5
form, connection, for principal bundle cross-sections, 69.11.3
form, connection, localisation component function, 69.13.3
form, connection, localisation components, 69.13
form, connection, localisation via cross-sections, 69.11
form, connection, on principal fibre bundle, 69.5
form, connection, principal fibre bundle, 69.5.4
form, connection, reconstruction from localisations, 69.12.3
form, differential, Cartesian space, 46.2, 46.2.2
form, differential, Cartesian space, differentiable, 46.2.6, 46.2.8
form, differential, Cartesian space, vector-valued, 46.2.4
form, differential, geometric meaning, 57.6.9
form, differential, on a differentiable manifold, 57.6
form, differential, on a manifold, 57, 57.6.2
form, differential, on a manifold, local short-cut, 57.7.21
form, differential, on a manifold, short-cut, 57.7.2
form, differential, on a manifold, vector-valued, 57.6.5
form, differential, on a manifold, vector-valued, short-cut, 57.7.3
form, differential, short-cut on differentiable manifold, 57.7
form, first fundamental, 73.2.8
form, Maurer-Cartan, left, 62.5.4
form, Maurer-Cartan, right, 62.7.13
form, naive differential, Cartesian space, 46.6, 46.6.2, 46.6.3
form, statement, 4.1.6
form short-cut, multilinear, general, 57.7.25
form-style non-topological fibration, 21.4.2
form-style non-topological fibration, cross-section short-cut, 21.4
form-style non-topological fibration, platform fibration, 21.4.6
form-style non-topological fibration, short-cut map, 21.4.9
formal direct sum of linear spaces, 24.1.2
formal linear combination of vectors, 22.2.22
formalisation, propositional calculus, 4.1
formalism, topology, 31.3.1
formalism, vector field calculus, Koszul, 57.0.2
formalism versus intuitionism, 2.2.12
formalisms and notations, plethora, 1.4.4, 1.4.5
Forms, Platonic, 2.2.10
forms on manifolds, differential, differentiability, 57.6.13
formula, set-theoretic, 7.2.2
formula, two-parameter set-theoretic function, 7.7.6
formula, well-formed, 4.1.6
fortune, wheel of, 1.8
forward cone, finite path, 21.15.7
forward-connected function, 35.2.4
forward set-map conditions for continuity, 35.1.8
foundation, mathematics, 7.1.1
foundation axiom, 7.8.1
foundations, sandy, intellectual towers, 2.0.2
Fourier, Jean Baptiste Joseph, 77.1.6
Fourier, Jean Baptiste Joseph, definite integral notation, 43.2.4
fox-hunter, cat, 1.4.3
fraction, simplified, 15.1.4
fractional differentiability, 50.6.2
fractional part function, 16.5.16
Fraenkel, Adolf Abraham Halevi, 77.1.7
Fraktur font, 31.4.3
frame, inertial, 48.2.4, 59.8.1
frame, inertial, moving train, 20.1.10, 20.10.2
frame, moving, 55.6.1, 55.7.12, 57.11
frame, tangent vector, bundle, 55.6.22

frame, tangent vector, fibre atlas, 55.6.21 

frame, tangent vector principal, fibre atlas, 55.7.7 

frame bundle, baseless figure/frame bundle, 20.10.8 

frame bundle, principal, 55.7 

frame bundle, principal, tangent vector, 55.7.8 

frame bundle, tangent vector, 55.6, 55.6.31 

frame bundle, tangent vector, manifold chart, 55.6.24 

frame bundle, tangent vector, total space, 55.6.30 

frame bundle, tangent vector, total space manifold atlas, 
55.6.28 

frame bundle, topological fibre/frame bundle, 47.13.2 

frame bundle, vector, manifold chart map, 55.6.25 

frame bundle cross-section, 57.11 

frame bundle on a manifold, 55 

frame bundles, principal, are principal bundles, 55.7.11 

frame coordinate tuple, tangent vector, 55.6.14 

frame fibration, tangent vector, 55.6.8 

frame fibration, vector, on vector bundle, 65.8.6 

frame field, 57.11

frame field, affine connection, 71.14, 71.14.2 

frame field, dual, 57.11.4

frame field, tangent vector, 57.11.2 

frame fields, juxtaposition product, 57.11.7 

frame space, tangent vector, 55.6.3 

frame space, vector, on vector bundle, 65.8.2 

frame transformation group, baseless figure/frame bundle, 20.10.8

frame transformation group, topological fibre/frame bundle, 47.13.2
frames, coordinate, holonomic versus moving, 55.7.12 
frames, tangent vector, chart transition formula, 55.6.19 
framework, fibre bundle, discomfort, 57.7.1 
framework, model validation, ZF, 7.1.4 
franca, lingua, predicate calculus, 77
Francesca, Piero della, 77.1.4 
Fréchet, René Maurice, 77.1.7 
Fréchet, René Maurice, compact sets, 33.5.1 
Fréchet, René Maurice, metric spaces, 31.1.3 
free group, 22.2.22
free left transformation group, 20.3, 20.3.2 
free linear space, 22.2, 22.2.10, 22.2.23 
free linear space, basis, 22.7.19 
free linear space, finite support, 22.2.9 
free linear space, standard immersion, 22.2.10 
free linear space, vector-valued, 22.2.28 
free linear space used to define tensor product, 28.7 
free right transformation group, 20.7.11 
free topological right transformation group, 36.11.6 
free variable, check accent, 6.3.16
free variable, hat accent, 6.3.16 
free variable, predicate calculus, 6.3.9 
free variable, selective quantification, 6.3.1 
free variables, extraneous, 6.3.12, 6.3.13 
freely, act, left transformation group, 20.3.2 
freely, act, right transformation group, 20.7.11 
Frege, Friedrich Ludwig Gottlob, 2.2.9, 77.1.7 
Frege, Friedrich Ludwig Gottlob, logic axioms, 4.4.1 
FTOC (fundamental theorem of calculus), 43.8.1
FTOC 1 (fundamental theorem of calculus 1), 43.8.3 
FTOC 1 for curves, vector-valued, 43.9.7 
FTOC 2 (fundamental theorem of calculus 2), 43.8.5 
FTOC 2 for curves, vector-valued, 43.9.8 
function, 10, 10.2, 10.2.2 
function, active nature, 10.1.3 
function, bijective, 10.5.2 
function, choice, 10.3 
function, continuous, 31.12, 31.12.4 


, 6.7.8, 6.7.10 
function, differentiable real-valued, on differentiable manifold,
51.6.2 

function, differentiable real-valued, on manifold, 51.6 

function, differentiable vector-valued, on manifold, 51.7 

function, domain/range specification, 10.2.1

function, empty, 10.2.21

function, function-valued, 10.19.1 

function, graph of, 10.2.6 

function, identity, 10.2.27 

function, injective, 10.5.2 

function, inline declaration, 43.2.7, 43.10.5 

function, inverse, 10.5.9, 10.5.10 

function, left inverse, 10.5.13 

function, local, 10.9.2 

function, locally defined, 10.9.2 

function, logical, constant, 5.1.11 

function, partial, 10.9, 10.9.2 

function, partial, bijective, 10.9.8 

function, partial, composition, 10.10 

function, partial, continuous, 31.13, 31.13.2 

function, partial, double-domain product, 10.14.10

function, partial, injective, 10.9.8 

function, partial, set-map properties, 10.9.12 

function, partial, surjective, 10.9.8 

function, partially defined, 10.9, 10.9.2 

function, partially defined, composition, 10.10 

function, procedures versus look-up tables, 10.1.4 

function, real-valued, basic, 16.5 

function, real-valued, higher-order differential, 59.10 

function, right inverse, 10.5.13

function, set-theoretic, 10.1.2 

function, special, 43.2.3, 43.2.8, 

function, special, fundamental theorems of calculus, 43.8.2 

function, surjective, 10.5.2

function, terminology, historical origin, 10.1.6 

function, truth, 3.7.2 

function argument, 10.2.8 

function bound, 11.3 

function composite, 10.4.17

function composition, 10.4, 10.4.1 

function composition, socio-mathematical context, 10.10.8 

function composition associativity, 10.4.21 

function composition notation, 10.4.18 

function continuity, history, 31.12.2 

function convergence at a point, 35.3.7 

function direct product, common-domain, pointwise, 10.15.2 

function domain, 10.2.6

function double set-map, 10.7 

function existence and uniqueness, 10.2.5 

function extension, 10.4, 10.4.12 

function extension to a set, 10.4.13 

function family, 10.8.4 

function formula, two-parameter set-theoretic, 7.7.6 

function graph, 9.1.2 

function image, 10.2.6, 10.2.7 

function inverse set-map, 10.6 

function limit at a point, 35.3.7 

function mixed set-map, 10.7

function of several variables, 10.2.31, 27.2.2 

function of two variables, 10.2.32 

function order, strong bound, normed linear space, 39.7.7 

function order, strong bound, normed module over ring, 
39.7.4 

function order, weak bound, normed linear space, 39.7.6 

function order, weak bound, normed module over ring, 39.7.3 

function-predicate versus function-set, 10.1.2 

function product, double-domain, 10.14.3 

function product, multiple-domain, 55.5.26 


function-product homeomorphism, common-domain, 32.11

function quotient, 10.5.27

function range, 10.2.6, 10.2.7 

function restriction, 10.4, 10.4.2 

function restriction to a set, 10.4.3 

function-sequence, equicontinuous, pointwise, 38.4.10 

function-sequence, equicontinuous, uniformly, 38.4.11 

function-sequence, pointwise bounded, 38.4.13 

function-sequence, uniformly bounded, 38.4.14 

function set, 10.2.16 

function set, notation, 10.2.17, 10.2.20 

function-set, pointwise bounded, 38.4.13 

function-set, uniformly bounded, 38.4.14 

function set-map, 10.6, 10.6.4 

function set-map for a partial function, 10.9.10 

function set-map properties, 10.6.7

function-set notation, 10.19 

function set restriction, notation, 10.2.18 

function-set versus function-predicate, 10.1.2 

function substitute-value operator, 14.12.23 

function swap-values operator, 14.12.23 

function target set, 10.2.7 

function template, 10.2.30, 14.7.1 

function template, Kronecker delta, 14.7.12 

function theorem, implicit, 41.10 

function theorem, inverse, 41.10, 41.10.4 

function through point along component, slice, 10.12.10, 
10.13.15 

function value, 10.2.8 

function-valued function, 10.19.1 

function-valued function transpose, 10.19.4 

function-valued functions, pointwise composition, 10.4.26 

functional, linear, 23.4, 23.4.2 

functional, linear, juxtaposition product by vector, 23.4.10 

functional, linear non-zero, existence, 23.5 

functional notation, logical expression, 3.12.13 

functions, chained, fully, 10.4.16 

functions, chained, partially, 10.4.24 

functions, direct product, common-domain, 10.15 

functions, dovetailed, 10.4.16 

functions, product, double-domain, 10.14 

functions, special, 44

functions of bounded variation, 38.10 




fundamental form, first, 73.2.8 

fundamental mysteries, real-world geometry, 49.3.11 

fundamental theorem of calculus, 43.8 

fundamental theorem of calculus 1, 43.8.3

fundamental theorem of calculus 1, vector-valued, 43.9.7 

fundamental theorem of calculus 2, 43.8.5 

fundamental theorem of calculus 2, vector-valued, 43.9.8 

fundamental theorem of calculus on manifolds, 46.9.1 

fundamental theorem of exterior calculus, 46.9.1 

fundamental vector field, infinitesimal transformation, 63.7.4 

fundamental vector field, Lie left transformation group, 63.6.6 

fundamental vector field, principal bundle, 66.6.1

fundamental vector field on Lie groups, 62.4.14 

fundamental vector field on principal bundle, right 
translation, 66.6.10, 66.6.11 

fundamental vector fields, transposed, Lie bracket, 66.6.13 

fundamental vertical vector field, 66.6 

fundamental vertical vector field, principal bundle, 66.6.2 


Fux, Johann Joseph, 1.4.13 


G, hint-letter for open set, 31.4.2 

Galaxy, Andromeda, axiom of choice, 22.7.20 
Galileo Galilei, 4.0.1, 22.0.3, 77.1.4
Galois, Evariste, 17.3.1, 77.1.6 

gap, air, tangent to curve, 11.2.33 

garbage in, garbage out, 22.7.22
gauge, local cross-section of principal bundle, 70.8.4
gauge condition, Lorenz, 77.1.6
gauge covariant derivative, 69.14, 69.14.4
gauge covariant derivative, real-valued function, 69.14.7
gauge covariant derivative, vector-valued function, 69.14.9, 69.14.10
gauge field, terminology to avoid, 69.11.5, 70.8.3
gauge localisation of curvature form, 70.6, 70.6.2
gauge potential, connection form construction theorem, 69.12.7
gauge potential, connection form localisation, 69.11, 69.11.1
gauge potential, constructed from connection form, 69.11.3
gauge potential, equations of motion, 70.8.1
gauge potential, principal bundle, Minkowski space, 69.11.4, 69.13.5
gauge potential, terminology, 69.11.5
gauge potential, vector fields on fibre bundles, 64.13.2
gauge potential component function, 69.13.2
gauge potential coordinate chart components, 69.13
gauge potential globalisation, 69.12
gauge pull-back of differential form on fibration, 64.7.14
gauge terminology, misnomer, 70.8.4
gauge theory, 70.8
gauge theory, classical electromagnetism, 70.8.7
gauge theory, Dirac equation, 47.12.7
gauge theory, isospin rotation, 70.8.3
gauge theory, pseudo-Riemannian space, 75.4
gauge theory, radiation and matter fields, 65.0.1
gauge theory history literature, 70.8.2
gauge transformation, between cross-sections for connections, 69.12.1
gauge transformation, fibre chart transition formula, 64.13.10
gauge transformation, for gauge potentials, 69.11.5, 69.12.11
gauge transformation, identity chart transition map, 66.3.3
gauge transformation, shorthand formulas, 69.13.11
gauge transformation, terminology, 70.8.4
gauge transformation, vector fields on fibre bundles, 64.13.2
gauge transformation rule, connection form, 69.13.10
gauge transformation shorthand, Maurer-Cartan form,
Gauß, Johann Carl Friedrich, 49.6.8, 73.2.8, 77.1.6
Gaußian integers, ring, 18.2.17
Geminus, lemmas, 2.4.6
general and antisymmetric tensor space notations, survey, 30.4.13
general connection, torsion, 72.2.3
general linear group, 25.14
general linear group, differentiable manifold, 51.4.24
general linear group, locally Cartesian space, 49.4.18
general linear group, standard atlas, 49.7.15
general linear group, standard topology, 32.6.8
general linear group for module over a set, 19.1.12
general linear Lie transformation group, 63.4.17
general linear transformations, Lie group, 63.4.18, 63.4.19
general ordinal numbers, standard order, 12.5.12
general relativity, 75.3
generalisation, false, 37.0.1
generalised function, tangent vector representation, 53.3.1
generated by a subset, subgroup, 17.6.5
generated by a subset, subring, 18.1.8
generated topology, 32.1.4
generating set, general meaning, 62.9.5
generation rule, 20.9.12
generator, convex, 22.11.18
generator, infinitesimal left transformation, 63.6.1
generator, infinitesimal transformation, linear dependence, 63.6.12
generator, Lie algebra element, 62.9.5
generator, ordinary fibre bundle connection, transition rule,
67.6.8 

generator function, connection, 67.6 

generator function, connection, terminology, 67.6.6 

generator function, principal bundle connection, 69.7 

generator function for connection on fibre bundle, 67.6.5 

generator map for infinitesimal transformations, 63.6.20 

generator of connection, difficult extraction from connection, 
69.9.1 

generator of infinitesimal transformation, 63.6.19 

Gentzen, Gerhard Karl Erich, 5.3.2, 77.1.7 

Gentzen, Gerhard Karl Erich, consistency of arithmetic, 
14.3.2 

Gentzen-style assertion, 5.3.3 

geodesic, 72

geodesic, affinely parametrised, conditions, 72.1.11 

geodesic, brachistochrone, calculus of variations, 73.8.2 

geodesic, catenary, calculus of variations, 73.8.2 

geodesic, length-parametrised, 74.3.10 

geodesic, tractrix, calculus of variations, 73.8.2 

geodesic completeness theorem, Hopf-Rinow, 73.7.2 

geodesic curve, 72.1 

geodesic curve, affinely parametrised, 72.1.2 

geodesic curve, calculus of variations, history, 73.8.2 

geodesic curve, two-sphere, 76.7 

geodesic deviation, equation, 72.5 

geodesics, family, one-parameter, 72.5.2 

geodesics and torsion, 72.2 

geometric measure theory, 30.4.26 

geometry, Cartesian, 22.8.1 


geometry, coordinate-free vectorial, total differential theorem, 41.6.3
geometry, differential, history, 77 
geometry, differential, intrinsic, 49.1 
geometry, differential, patch-free, 49.2.7 
geometry, Euclidean, 2.1.4 
geometry, Euclidean, ontology, 2.2.2 
geometry, information, out of scope, 1.6.3 
geometry, intrinsic, Hausdorff condition, 49.5.10 
geometry, native, 53.1.7
geometry, plane, Euclidean, 26.11.1 
geometry, pseudo-Riemannian, 75 
geometry, pseudo-Riemannian, overview, 75.1 
geometry, real-world, fundamental mysteries, 49.3.11 
geometry, solid, Euclidean, 26.11.1 
geometry, spherical, 76 
geometry languages, differential, Rosetta stone, 1.4.5 
geophysics, 16.8.10
Gilgamesh, Mesopotamia, 3.1.1 
glacier flow, Lie transport, 61.7.1 
global differentiable extension, locally constant real-valued 
function, 51.8.3 
global differential of diffeomorphism, inverse rule, 58.9.13 
global differential of differentiable map, 58.9.2 
global direct product decomposition of differential of map, 
58.10.2 
global neighbourhood, 31.2.6 
global pull-back differential of diffeomorphism, 58.11.3, 
58.11.5, 58.11.10 
globalisation, gauge potential, 69.12 
glue, fibre set, 64.1.1 
glue, topological, 31.2.4 
gluon, radiation field, 47.12.7 
goat, 22.0.3 
goats versus sheep, 77.4.15 
gobble, object left on postfix expression stack, 3.11.7 
gobbledygook, reciprocal of zero, 40.5.11 
Gödel, Kurt Friedrich, 2.1.3, 77.1.7


[www.geometry.org/dg.html]




golden axioms, 4.5.14 

golden eggs, 3.0.4 

golden tablets, 3.1.6, 6.4.6, 6.4.9, 62.0.1 

golden test of validity, 4.5.14 

goose, 3.0.4 

gordian knots, infinity concepts, 13.7.1 

Göttingen Mathematics Club, Brouwer/Hilbert encounter, 
2.2.12 

gradient of real-valued function, Riemannian manifold, 74.6.2 

Gradus ad Parnassum, 1.4.13 

graph, directed acyclic, 11.2.3 

graph of function, 9.1.2, 10.2.6 

graph of relation, 9.1.2, 9.5.21, 9.5.2 

Graßmann, Hermann Günther, 77.1.6 

Graßmann, Hermann Günther, linear algebra, 22.0.1 

Graßmann, Hermann Günther, multilinear algebra, 27.1.5 

Graßmann algebra, 27.1.5 

great circle, terrestrial coordinates, 76.7.3 

greatest common divisor, 15.1.4, 40.2.3 

greatest lower bound of partially ordered set, 11.2.4, 11.3.2 

Greek alphabet, 31.4.2 

Greeks, ancient, 7.1.2 

Green, George, 77.1.6 

Green, Herbert Sydney, traffic lights, 77.4.7 

Gregory integral, 43.2 

grooves in space, affine connection, 60.1.4 

Grossman, Marcel, 77.1.7 

group, 17.3, 17.3.2

group, Abelian, 17.3.24 

group, affine space over, 26.2, 26.2.2 

group, affine space over module over, 26.6.2 

group, analytic, 62.2 

group, analytic, real, 62.2.4 

group, bi-ordering, Archimedean, 17.5.5 

group, commutative, 17.3.23 

group, commutative, Archimedean ordered, 17.5.9 

group, commutative, Archimedean ordering, 17.5.9 

group, commutative, ordered, 17.5.8 

group, commutative, ordering, 17.5.7 

group, conjugate of subset, left, 17.8.2 

group, conjugate of subset, right, 17.8.3 

group, diffeomorphism, 62.0.3 

group, differentiable, 54.5.1, 62, 62.2, 62.2.2 

group, differentiable transformation, 63.4.3 

group, direct product, 17.7.14 

group, direct product of family, 17.7.14 

group, essence, 8.8.2

group, free, 22.2.22 

group, general linear, 25.14 

group, general linear, differentiable manifold, 51.4.24 

group, general linear, for module over a set, 19.1.12 

group, general linear, locally Cartesian space, 49.4.18 

group, general linear, standard atlas, 49.7.15 

group, general linear, standard topology, 32.6.8 

group, holonomy, orthogonal connection, 73.1.2 

group, left module, 19.2.6 

group, left-ordering, Archimedean, 17.5.5 

group, left transformation, 20.1 

group, left transformation, effective, 20.2, 20.2.1 

group, left transformation, equivariant map, 20.6.6 

group, left transformation, free, 20.3, 20.3.2

group, left transformation, homomorphism, 20.6 

group, left transformation, transitive, 20.4, 20.4.1

group, Lie, 62, 62.0.3, 62.2

group, Lie, abstraction levels, 62.2.11 

group, Lie, adjoint map, 62.10, 62.10.2 

group, Lie, complex, 62.2.30

group, Lie, left invariant vector field, 62.4 
group, Lie, left translation operator, 62.3, 62.3.3

group, Lie, Lie algebra, 62.8

group, Lie, matrix, 25.14.1

group, Lie, one-parameter subgroup, 62.9 

group, Lie, one-parameter subgroup existence theorem, 62.9.9 

group, Lie, one-parameter subgroup uniqueness theorem, 
62.9.7 

group, Lie, real, 62.2.4 

group, Lie, relation to topological group, 36.9.1 

group, Lie, right invariant vector field, 62.7 

group, Lie, right translation operator, 62.6, 62.6.2 

group, Lie, right translation operator for tangent vectors,
62.6.3

group, Lie left transformation, 63.4, 63.4.2 




group, Lie right transformation, 63.5

group, Lie transformation, 63

group, Lie transformation, general linear, 63.4.17 

group, line-to-line transformation, 26.20 

group, locally Cartesian, 62.1.0

group, matrix, 25.14 

group, matrix, classical, 25.14.1 

group, minimal structure, non-topological fibre bundle, 21.7.7 

group, module over, 19.2

group, ordered, 17.5 

group, ordered, homomorphism, 17.5.11 

group, orthogonal, 48.3.8, 76.8.2 

group, permutation, non-topological fibre bundle, 21.8.17 

group, permutations, 14.8.6 

group, quotient, 17.7.9 

group, reference-frame transition, topological fibre/frame 
bundle, 47.13.2 

group, reverse-operation, 17.4.12 

group, right-ordering, Archimedean, 17.5.5 

group, right transformation, 20.7 

group, right transformation, effective, 20.7.9 

group, right transformation, equivariant map, 20.8.4 

group, right transformation, free, 20.7.11 

group, right transformation, homomorphism, 20.8 

group, right transformation, Lie, 63.5

group, right transformation, transitive, 20.7.14 

group, structure, baseless figure/frame bundle, 20.10.8 

group, structure, differentiable fibre bundle, 64.8.3 

group, structure, non-topological fibre bundle, 21.8.2 

group, structure, topological fibre bundle, 47.6.5 

group, structure, topological fibre/frame bundle, 47.13.2 

group, topological, 36, 36.9, 36.9.2 

group, topological left transformation, self-acting, 36.10.12 

group, topological right transformation, self-acting, 36.11.9 

group, topological transformation, left, 36.10 

group, topological transformation, right, 36.11 

group, transformation, 20 

group, transformation, associated, 20.9.1, 20.9.17, 20.11 

group, transformation, figure space, 20.9.13

group, transformation, left, 20.1.2 

group, transformation, Lie, 62.0.3 

group, transformation, right, 20.7.2 

group, transformation, structure-preserving, 19.2.2 

group action, infinitesimal, principal bundle, 66.5 

group action map, left, principal bundle, 66.4 

group identity, 17.3.9

group inverse, 17.3.12 

group left inverse, 17.3.12 

group morphism, unital, 17.4.8, 36.9.5 

group morphism notations, 17.4.2 

group morphisms, 17.4, 17.4.1 

group of automorphisms, 17.3.18 

group of diffeomorphisms, 63.1, 63.1.4 

group of diffeomorphisms, tangent algebra, 63.2.5 
group of differentiable diffeomorphisms, topological, 63.1.6
group of linear transformations, topological, 39.6 

group order, Archimedean, 17.5.4

group ordering, 17.5.1 

group realisation, 20.1.11 

group right inverse, 17.3.12 

groups, abstract, 20.1.1 

groups, transformation, 20.1.1 


groups of chapters, 1.3 


Hadamard, Jacques Salomon, axiom of choice, 7.12.7 
Hahn-Banach theorem, 7.11.13 

half animal, 1.4.7 

half-line on affine space over group, 26.3.12 

half-line on affine space over module over ordered ring, 26.7.5 
half robot, 1.4.7 

halfline-to-halfline transformations, 26.20.2 

Halmos, Paul Richard, 1.5.6, 7.1.1 

Halmos bar, QED symbol, 1.5.6

Hamilton, William Rowan, 22.0.1, 77.1.6 

Hamilton, William Rowan, origin of word “vector”, 26.1.10 
Hamiltonian mechanics, Poisson bracket, literature, 61.5.2 
hand, construction in the, 45.7.2 

hard analysis, 35.1.1

hard predicate, 6.3.7 

hardware, floating-point, 16.2.12 

Hartogs, Friedrich Moritz, 77.1.7 

Hartogs number, 13.3.8 

Hartogs's theorem, 13.3, 13.3.2 

Hartogs’s theorem, compared to Cantor’s theorem, 13.3.1 
hat, rabbit, 6.6.5, 12.5.5, 45.3.6, 62.0.1 

hat, rabbit, complex numbers, 16.8.11

hat, rabbit, IOU, 7.10.3, 45.3.7 

hat-accent, horizontal component, 59.1.7 

hat accent on free variable, 6.3.16

Hausdorff, Felix, 77.1.7 

Hausdorff, Felix, integration, 43.1.2 

Hausdorff, Felix, metric spaces, 37.0.5 

Hausdorff, Felix, point-set topology, 31.1.2 

Hausdorff, Felix, topological space dense subset, 33.4.1 
Hausdorff, Felix, topological spaces, 31.1.3 

Hausdorff condition, intrinsic geometry, 49.5.10 
Hausdorff condition, Schrödinger's cat, 49.5.10
Hausdorff condition, topological manifold, 50.1.2 
Hausdorff property, hereditary, 33.1.33 

Hausdorff property, locally Cartesian space, 49.5.2 
Hausdorff property inheritance, direct product, finite, 33.1.35 
Hausdorff separation, 33.1.25
Hausdorff space, 33.1.24 

Hausdorff space, compact sets are closed, 33.5.14 
Hausdorff topological manifold, 50.1.3 

Hausdorff topology, 33.1.24 

Hausdorff's maximality theorem, 7.11.13 

Haydn, Joseph, 1.4.13 

heat equation, semigroup, 17.1.5, 17.1.6 

Heaviside function, 16.5.8

heavy machinery, 12.1.9 

height, rectangular matrix, 25.2.2 

Heine, Heinrich Eduard, 38.3.6, 77.1.6 

Heine-Borel compactness, 33.5.9 

Heine-Borel property, metric space, 37.7.20 

Heine-Borel theorem, Cartesian spaces, 34.9.14 
Heine-Borel theorem, real-number interval, 34.9.12 
Heine-Borel theorem, real-number sets, 34.9.13 
Heine-Borel theorem, real numbers, 34.9.11 

Heisenberg, Werner Karl, 77.1.7 

helicopter, supermarket, correspondence principle, 54.9.9 
hereditary, compactness, 33.5.12 

hereditary, Hausdorff property, 33.1.33 
hereditary, Lindelöf property, 33.7.9

hereditary properties, topology, 33.5.11 

Hermite, Charles, 2.2.8, 77.1.7 

Heron of Alexandria, 77.1.3 

Hesse, Ludwig Otto, 77.1.6 

Hessian, trace, Laplace-Beltrami operator, 74.6.4 

Hessian matrix at local extremum, 51.6.13 

Hessian of function, 71.9.5 

Hessian of second derivatives of distance function, 75.1.11 

Hessian operator, 71.9 

Hessian operator, critical-point, terrestrial coordinates, 76.4 

Hessian operator, metric tensor calculation, 73.2.6, 73.9.1, 
73.10.3 

Hessian operator, Riemannian metric, 73.1.1 

Hessian operator at a critical point, 59.11.4 

Hessian operator at a point, 71.9.6 

Hessian operator at critical point, 59.11 

Hessian operator on S² at a critical point, 76.4.5


Hessian operator on two-sphere, terrestrial coordinates, 76.6.3 


hidden coordinates, 29.2.7 

hierarchy, cumulative type, 12.6.1 

hierarchy of sets, cumulative, set membership depth, 7.8.8 

hierarchy of sets, cumulative, transfinite recursive, 12.6

high-level set language, 7.9.2

higher-degree array, 25.15

higher-degree array, antisymmetric, 25.15.6 

higher-degree array, antisymmetric, compression, 25.15.13 

higher-degree array, antisymmetric, decompression, 25.15.16 

higher-degree array, symmetric, 25.15.5 

higher-degree array, symmetric, compression, 25.15.12 

higher-degree array, symmetric, decompression, 25.15.15 

higher-dimensional spherical coordinates, 76.1 

higher-level tangent bundle, 59, 59.1.25

higher-level tangent bundle notation, 59.1.26 

higher-level tangent vector, manifold, 59.1.24 

higher-order derivative, map between Cartesian spaces, 42.5 

higher-order derivative for several variables, 42.2

higher-order derivative of map, applications, 42.6 

higher-order derivative sequence, 42.1.4

higher-order derivatives, 42 

higher-order derivatives, real-to-real functions, 42.1 

higher-order differentiability of map, keyhole test, 42.5.28 

higher-order differential, 59 

higher-order differential of curve, 59.8 

higher-order differential of curve family, 59.9 

higher-order differential of map between manifolds, 59.12

higher-order differential of real-valued function, 59.10 

higher-order partial derivative, undefined, partial function, 
42.2.7, 42.5.8 

higher-order partial derivative chain rule, 42.5.27 

higher-order partial derivative of partial function, 42.2.5, 
42.5.6 

higher-order tangent operator, 60 

higher-order tangent vector, 60.5 

Hilbert, David, 77.1.7

Hilbert, David, axiom of choice, 7.12.7 

Hilbert, David, Cantor's Paradise, 12.5.5 

Hilbert space, 22.5.20 

Hilbert-style assertion, 5.3.3 

Hilbert-style predicate calculus, 6.1.10, 6.3.3, 6.3.5 

Hilbert's fifth problem, 62.1

Hilbert/Brouwer encounter, intuitionism versus formalism, 
2.2.12 

hill of beans, crazy world, 77.4.15 

hint-letter U for neighbourhood, Umgebung, 31.4.2 

Hipparchus of Nicaea, 77.1.2 

Hippias of Elis, 77.1.2 

Hippias of Elis, quadratrix, 26.19.3 
Hippocrates of Chios, 77.1.2

history, associated fibre bundle, 21.12.1 

history, Cauchy-Riemann-Darboux integral, milestones, 
43.5.1 

history, compact set terminology, 33.5.1 

history, continuity of functions, 31.12.2

history, determinant, 25.10.1 

history, fibre bundle, 47.0.3 

history, fibre bundle, associated, 21.12.1 

history, gauge theory, literature, 70.8.2

history, geodesic curve, calculus of variations, 73.8.2 

history, holonomy group, 70.1.4 

history, Levi-Civita parallelism, 74.1.1 

history, locally Cartesian space, 49.4.8 

history, matrix algebra, 25.0.1 

history, notation for integrals, 43.2.4 

history, parallel transport and connections, 67.1 

history of differential geometry, 77

history of integration, milestones, 43.1.2 

history of relativity, 77.3 

hoard, theorems, 4.5.11, 4.8.14 

hog, whole, 4.8.8, 53.3.9 

Hölder condition on Riemannian manifold, 75.2.3

Hölder continuity, metric space, 38.7

Hölder continuous function, metric space, 38.7.2

Hölder continuous manifold, 50.6

holomorphic function, complex, 42.8.7 

holonomic coordinate frames versus moving frames, 55.7.12 

holonomic vector field family, 46.5.3 

holonomic vector field family, manifold, 61.5.19 

holonomic vector-field tuples, exterior derivative, 61.14.4 

holonomic vector fields, exterior derivative, 46.8 

holonomic vector fields, Koszul formalism, 57.0.2 

holonomic vector fields, Riemann curvature, 70.3.2 

holonomy, 70.1 

holonomy, Lie bracket, Cartesian space, 46.5 

holonomy, Lie bracket, integral curve, 46.5.4, 61.5.20 

holonomy deviation, infinitesimal loops, Lie algebra, 70.2.4 

holonomy-deviation-per-area limit, Riemann curvature, 70.2.3 

holonomy group, history, 70.1.4 

holonomy group, literature, 70.1.4 

holonomy group, orthogonal connection, 73.1.2 

home, cows, 77.4.15 

homeomorphic, 31.14.2 

homeomorphism, 31.14, 31.14.2 

homeomorphism, etymology, 31.14.1 

homeomorphism, fibre-set-restricted fibre chart, 47.3.11 

homeomorphism, function-product, common-domain, 32.11 

homeomorphism, local, 31.14.16 

homeomorphism, not implied by continuous bijection, 31.14.4 

homeomorphism, subspace, product-structured topological 
space, 32.11.4 

homeomorphism between subsets of topological spaces, 
31.14.12 

homeomorphism existence, axiom of choice, 31.14.5 

hominids, vocal-grooming, 3.0.2 

homogeneous cardinal, 13.10.7 

homomorphism, C^1 differentiable, Cartesian space, 41.7.6

homomorphism, C^k differentiable, Cartesian space,
42.7.2

homomorphism, C^k differentiable, differentiable manifold,
52.2.2 

homomorphism, fibre bundle, 47.7 

homomorphism, group, 17.4.1

homomorphism, injective, versus monomorphism, 10.5.23 

homomorphism, left transformation group, 20.6 

homomorphism, linear space, 23.1.8

homomorphism, modules over a ring, 19.4.3 

homomorphism, natural, quotient linear space, 24.2.12 


[draft: UTC 2023-1-3 Tuesday 00:13] 






homomorphism, order, strong, 11.1.21 

homomorphism, order, weak, 11.1.2 

homomorphism, order-preserving, 18.8.15 

homomorphism, ordered field, 18.8.16 

homomorphism, ordered group, 17.5.11 

homomorphism, ordered ring, 18.3.13 

homomorphism, right transformation group, 20.8 

homomorphism, ring, 18.1.19

homomorphism, semigroup, 17.1.8 

homomorphism, set, 10.5.21 

homomorphism, surjective, versus endomorphism, 10.5.23 

homomorphism, topological fibre bundle, 47.7.3 

homomorphism, transformation group, 20.6.2, 20.8.1 

homomorphism, unitary modules over a ring, 19.4.1 

homomorphism, unitary ring, 18.2.8 

homomorphism between modules over a set, 19.1.10 

homomorphism module, 19.1.4, 19.1.14 

homomorphism ring-module, 19.4.8 

Hopf-Rinow geodesic completeness theorem, 73.7.2 

horizon, clouds, physics, 36.3.1 

horizontal component, ordinary fibre bundle, 64.5 

horizontal component, second-level tangent vector, 59.1.13 

horizontal component, total tangent space, 59.2 

horizontal component map, connection on fibre bundle, 67.9 

horizontal component map, ordinary fibre bundle, 67.9

horizontal component map, principal bundle, 69.3 

horizontal component map, principal bundle, Minkowski 
space, 69.3.3 

horizontal component map, principal fibre bundle, 69.3.2 

horizontal component of tangent vector, 64.5.3 

horizontal component of total tangent space, 59.2.2 

horizontal component swap function, global, 59.6.5 

horizontal component swap function, second-level tangent 
bundle, 59.6, 59.6.3, 59.6.6 

horizontal differentiable submanifold, product-structured 
manifold, 52.7.8 

horizontal fibre-set vector field, 64.5.13 

horizontal lift, constructed from covariant derivative, vector 
bundle, 68.2.18 

horizontal lift, constructed from Koszul connection, 71.6.11, 
71.6.13

horizontal lift, Lie, 61.7.5 

horizontal lift, Lie, for vector fields, 61.7.9 

horizontal lift, principal bundle, Minkowski space, 69.1.9 

horizontal lift function, affine connection, 71.1.2 

horizontal lift function, Christoffel array, 62.0.1 

horizontal lift function, differentiable, ordinary fibre bundle, 
67.7.6 

horizontal lift function, differentiable, principal bundle, 
69.1.13 

horizontal lift function, differentiable fibration, 67.4, 67.4.2 

horizontal lift function, differentiable fibre bundle, 67.5.4 

horizontal lift function, differentiable fibre bundle, transposed, 
67.8.2 

horizontal lift function, interpretation of conditions, 67.4.3 

horizontal lift function, Levi-Civita connection, 74.2 

horizontal lift function, localisation, 67.7.2

horizontal lift function, localisation, principal bundle, 69.1.12 

horizontal lift function, ordinary fibre bundle, 67.5 

horizontal lift function, parallel transport, 71.4 

horizontal lift function, principal bundle, 69.1.3 

horizontal lift function, principal bundle, transposed, 69.2, 
69.2.2

horizontal lift function, principal fibre bundle, 69.1 

horizontal lift function, tangent bundle, 71.1

horizontal lift function, tensor calculus, 71.2 

horizontal lift function, transposed, OFB, 67.8 

horizontal lift function, transposed, tangent bundle, 71.5 
horizontal lift function differentiability, 67.7

horizontal lift localisation, principal bundle, Minkowski space, 
69.1.14 

horizontal lift map, Cartesian space, 70.7.2 

horizontal submanifold, fibre-set vector field, 64.5.14 

horizontal subset, product-structured set, 10.15.12 

horizontal subspace, mnemonic letter Q, 67.9.5 

horizontal subspace, ordinary fibre bundle, 67.9.6 

horizontal subspace, principal bundle, 69.3 

horizontal subspace, principal fibre bundle, 69.3.7 

horizontal subspace connection representation, literature, 
67.9.8 

horizontal topological submanifold, product-structured 
manifold, 50.5.10 

horizontal topological subspace, product-structured space, 
32.11.6 

horns, bull, 77.4.1 

horse after cart, 49.7.5 

horse-riding, ontology of mathematics, 1.4.6 

Hoyle, Frederick (Fred), 77.1.8

Hudde, Johann, calculus, derivative at extremum, 40.6.1

hull, convex, 22.11.19 

hull, convex, terminology survey, 22.11.19 

Hülle, abgeschlossene, topological closure, 31.8.10 

human brain, 22.0.3 

Huygens, Christiaan, 77.1.4 

Huygens, Christiaan, calculus, derivative at extremum, 40.6.1 

hyperbolic inner product, linear space, 24.10 

hyperbolic inner product, pseudo-metric field, 24.10.1 

hyperbolic inner product on linear space, 24.10.3 

hyperplane-constrained map, zero partial derivative, 41.2.26 

hyperplane segment, parametrised, 26.9 

hyperplane segment through points, affine space, 26.9.12

hyperplane through points, affine space, 26.9.11 

hypotheses, burden of, 4.5.10 

hypothesis, continuum, independence from ZF, 7.1.4, 7.1.5, 
7.1.11 

hypothesis, continuum, mediate cardinals, 7.11.8 

hypothesis, logical argument, 4.3.2 


I-finite set, 13.11.1 

Ia-finite set, 13.11.1 

ice, thin, skating, 11.2.18 

ideal complex number, 18.1.10 

ideal of a ring, 18.1.9 

ideal theorem, boolean prime, 13.8.8 

identical, concretely, linear spaces, 59.6.2 

identification, tacit, submanifold tangent bundle, 66.5.5 

identification map, direct product, for manifold tangent 
spaces, 54.7.2 

identification map, tangent bundle product, 54.7 

identification set, 9.8.9

identification space, 9.8.9 

identification space, topological, 32.13, 32.13.7 

identity, group, 17.3.9 

identity, Jacobi, 19.10.2, 19.10.3 

identity, Jacobi, interpretation, 19.10.9 

identity, negative, Lie group inversion map differential, 62.2.9 

identity, semigroup, 17.2.2 

identity chart, infinitesimal action map, 66.5.9 

identity chart, non-topological principal bundle, 21.10 

identity chart differential, principal bundle, 66.3 

identity chart for cross-section, non-topological principal 
bundle, 21.10.7 

identity chart transition map differential, 66.3.4 

identity cross-section, non-topological principal bundle, 21.10, 
21.10.3 

identity cross-section differential, principal bundle, 66.3 

identity element, multiplicative unitary ring, 18.2.2
identity function, 10.2.27

identity map, differential, 58.5.1 

identity map differential, pointwise, 58.5 

identity map on a manifold, differentiability, 52.1.8 

identity map on Cartesian space, differentiability, 42.5.21 

identity map on Cartesian space, partial differentiability, 
41.2.24 

identity map on real numbers, differentiability, 42.1.16 

identity matrix, 25.4, 25.4.2 

identity relation, 9.6.9

if and only if, 3.6.2

ignorance set, 34. 

II-finite set, 13.11.1 

III-finite set, 13.11.1 

Iliad, 7.1.2 

image, orbit-space, associated fibre bundle, 47.11.1 

image of function, 10.2.6, 10.2.7

image of path, 36.8.4

image of relation, 9.5.4, 9.5.6 

image of set by relation, 9.5.16 

imaginary parachute, axiom of choice, 7.11.1, 10.3.1 

imaginary part, complex number, 16.8.5 

imaginative process, predicate calculus, 6.1.3 

imbedding (see “embedding”), 50.2.1

immersion, differentiable manifold, 52.5 

immersion, regular, differentiable manifold, 52.5.4 

immersion, regular, topological manifold, 50.3.8 

immersion, standard, unrestricted linear space, 22.2.7 

immersion, terminology, 50.2.3 

immersion, topological manifold, 50.3 

implication, logical, 3.6.2

implication operator theorems, 4.6 

implicit chart, mathematical object, 26.11.6 

implicit fibre-set tangent bundle embedding map, 66.5.5 

implicit function theorem, 41.10 

implicit summation convention, Einstein, 22.3.10 

implicit summation convention, Einstein, multiple ranges, 
71.10.1 

implies, 3.6.2 

inclusion, differentiable manifold, 52.3.3 

inclusion, partial order, left segments, 11.2.13 

inclusion, sets, 7.3, 7.3.2 

inclusion map differential, tangent vector embedding map, 
58.5.6 

inclusive disjunction, 3.8.1 

inclusive or, 3.6.3

inclusive OR versus XOR, 3.8.2 

incommensurables, ancient Greek mathematics, 26.11.4 

incompressible number, 2.3.5 

increasing function, 11.3.30

independence, linear, 22.6

independent, linearly, family of vectors, 22.6.5 


independent, linearly, set of vectors, 22.6.3 
index, column, matrix element, 25.2.11 
index, initial, list, 14.12.4 

index, row, matrix element, 25.2.11 

index function, permutation, 14.8.22 

index lowering, tensor, 73.5 

index lowering, vector in metric space, 73.5.7 
index-lowering isomorphism, 73.5.3 

index map, subsequence, 12.3.2 


index of symmetric bilinear function on linear space, 24.10.5 


index of this book, 80 

index raising, tensor, 73.5 

index-raising isomorphism, 73.5.4 

index-selection map, differentiability, 42.6.17 

index set, integers, start at 0 or 1, 14.1.20, 14.6.9, 14.12.1, 
14.12.4, 16.4.2 
indexed cover of a set, 10.18, 10.18.2

indexed open cover, 33.5.4 

indexed topological manifold atlas, 50.1.12 

indicator function, 14.7, 14.7.2 

indicator function set, equinumerous to power set, 14.7.5 

indices, debauch of, 73.3.1 

indices, subscript/superscript, linear spaces, 22.8.21, 23.8.4

individual constant, 5.1.11 

induced atlas by fibre chart on fibre set, 64.3.7 

induced atlas from ambient space, differentiable submanifold, 
52.4.7 

induced atlas on fibre set by fibre chart, 64.11.3 

induced atlas on fibre set by fibre space, 64.3.10 

induced atlas on product-structured differentiable manifold, 
52.7.1 

induced atlas on product-structured topological manifold, 
50.5.1 

induced atlas on product-structured topological submanifold, 
50.5.8 

induced atlas on tangent bundle from base space, 54.5.18 

induced atlas via diffeomorphism, differentiable manifold, 
52.2.8, 52.2.8 

induced atlas via homeomorphism, locally Cartesian space,
49.11.7

induced atlas via trivialisations, differentiable fibre bundle, 

induced basis field, standard, vector bundle fibre set, 65.1.8 

induced by atlas, differentiable locally Cartesian space,
51.5.14

induced cotangent map, pull-back differential, 58.11.1 

induced distance function by a norm, 39.3.2 

induced equivalence relation by inverse function, 10.16.6 

induced linear space of fibre set, 65.1.6 

induced linear structure on fibre sets, 24.11.2 

induced linear transformation, differential of map, 58.4.1 

induced map for differential operators, 58.12 

induced map for tagged tangent operator, 58.12.11 

induced map for tangent operator, 58.12.8 

induced map for vector field, 58.9.16 

induced map of a curve, 57.9.9 

induced map of differentiable map, global, 58.9, 58.9.4 

induced-map tangent covector space, pointwise, 58.3.7 

induced-map tangent covector space, total, 58.3.9 

induced-map tangent space, 58.3 

induced-map tangent space, pointwise, 58.3.2 

induced-map tangent space, total, 58.3.4 

induced map versus differential, terminology, 58.4.1 

induced order by a bijection, 11.1.28 

induced pseudo-vector-fields by a map, 61.6.5 

induced quotient topology, 32.13.9 

induced relative topology, 31.6.2 

induced square-norm by inner product, 19.7.1 

induced tangent map, differential of map, 58.4.1 

induced topology, locally Cartesian atlas, 49.8 

induced topology, metric, 37.5 

induced topology by a metric, 37.5.2 

induced topology by a norm, 39.3.2 

induced topology by forward map, 32.13.5 

induced topology by norms on finite-dimensional linear space,
equivalent, 39.5.3

induced topology of continuous locally Cartesian atlas,
49.8.12

induced topology of differentiable locally Cartesian atlas,
51.5.8

induced topology on a set by a function, 32.8.4

induction, mathematical, Peano axiom, 14.1.7 

induction, mathematical, principle, 7.10.1 

induction, mathematical, principle (theorem), 12.2.12 
induction, naive, 9.3.4

induction, transfinite, 11.8, 22.7.21, 22.7.26

induction, transfinite, literature, 11.8.6 

induction, transfinite, principle, 11.8.2 

induction argument example, 31.3.7 

induction principle, mathematical, finite, 12.2.13 

induction principle, mathematical, wrapped finite, 12.2.24 

inductive cardinal, 13.10.7 

inductive definition, finite ordinal number addition, 13.6.1 

inductive set, 13.10.7 

inequalities, basic, real numbers and square roots, 16.6.18,
16.6.19

inequality, Cauchy-Schwarz, elementary version, 16.6.17 

inequality, Cauchy-Schwarz, module over Archimedean ring,
19.8.9

inequality, Cauchy-Schwarz, module over ring, 19.8, 19.8.2 

inequality, Cauchy-Schwarz, real linear space, 24.9.14 

inequality, triangle, 37.3.12 

inequality, triangle, equivalents, 37.3.8, 37.3.10 

inequality, triangle, general metric function, 37.1.2 

inequality, triangle, infinitesimal area elements, 43.9.3 

inequality, triangle, infinitesimal line elements, 40.8.2 

inequality, triangle, module over ring, 19.8 

inequality, triangle, real-valued metric function, 37.2.3 

inertia, 53.1.12 

inertial frame, 48.2.4, 59.8.1 

inertial frame, moving train, 20.1.10, 20.10.2 

inf map domain, 11.2.26 

inf notation, 11.2.16, 11.2.20, 11.2.22 

inferior limit, 35.8 

inferior limit, real function, one-sided, 35.8.9 

inferior limit, real-valued function, 35.8.2 

infimum, existence of upper approximations, 16.1.18 

infimum map, 11.2.21 

infimum of partially ordered set, 11.2.4, 11.3.2 

infimum set-map, 11.2.19 

infinite, countably, set cardinality, 13.7.9 

infinite, countably, set numerosity, 13.7.9 

infinite, potentially, 4.5.4 

infinite, potentially versus actually, 12.1.32, 13.12.4 

infinite choice axiom, 7.11, 7.12 

infinite-codimensional linear subspace, 22.5.19 

infinite differentiability classes, interpretation, 42.1.12 

infinite-dimensional linear space, 22.5.7 

infinite-dimensional multilinear map, 27.2.25 

infinite matrix, 25.1.4 

infinite matrix, sparse, 25.3.16 

infinite real-number interval, 16.1.12 

infinite sequence, 12.3.1 

infinite series, convergent, topological linear space, 39.2.7 

infinite series, divergent, topological linear space, 39.2.7

infinite series, sum, topological linear space, 39.2

infinite series, topological linear space, 39.2,

infinite set, 13.7.2

infinite set, countably, 13.7, 13.7.6

infinite set, Dedekind, 13.10, 13.10.2

infinite set, Peano-style, 14.2.2

infinite set, uncountable, 13.10.14

infinite subsequence, 12.3.3

infinite versus unbounded, set membership depth, 7.8.8

infinitesimal action, 67.5.1

infinitesimal action map, identity chart, 66.5.9

infinitesimal action map, transposed, principal bundle, 66.6.2

infinitesimal action map, transposed, properties, 66.6.8

infinitesimal action map of Lie algebra elements on a principal
fibre bundle, 66.5.2

infinitesimal group action, principal bundle, 66.5

infinitesimal transformation, differentiability on passive set,
63.6.11 
infinitesimal transformation, fibre sets, 64.13.1 

infinitesimal transformation, fundamental vector field, 63.7.4 
infinitesimal transformation, joint differentiability, 63.6.13 


infinitesimal transformation, left, 63.6 


infinitesimal transformation, linear dependence, generator, 
63.6.12 

infinitesimal transformation, right, 63.7 

infinitesimal transformation generator, 63.6.19 

infinitesimal transformation generator map, 63.6.20 

infinitesimal transformation induced on total space by Lie
algebra, 64.13.3

infinitesimal transformation of Lie right transformation 
group, 63.7.3 

infinitesimal transformation of Lie transformation group, 
63.6.5, 63.6.28 

infinity, logical quantifier, 5.2.7, 5.2.8 

infinity, pseudo-definition, 16.2.12 

infinity, real, definition, 16.2.2 

infinity axiom, faith, 7.10.3 

infinity axiom, ZF, 7.9 

∞-accumulation point of set, 31.10.17

∞-cluster point of set, 31.10.17

∞-limit-point compact set, 35.5.9

∞-limit-point compact set, ω-infinite-set, 35.5.12

∞-limit-point compact set, sequence-range, 35.6.4

∞-limit point of set, 31.10.17

∞-limit set of set, 31.10.17

infix logical expression, 3.9.12, 3.12.2 

infix logical expression, bracket-level diagram, 3.12.1 

infix logical expression, syntax specification, non-recursive,
3.12.1

infix logical expression interpretation map, 3.12.4

infix logical expression space, 3.12.3

infix logical expression space, substitution closure, 3.12.5,
3.12.12

infix logical expression style, 3.12

infix logical subexpression, 3.12.8

infix notation, relations, 9.5.30

infix syntax disadvantages, 3.12.7 

information content, topology, 31.11.11 

information content, universal/existential quantifier, 5.2.6 

information geometry, out of scope, 1.6.3

information style, connection, 67.2.1 

initial index, list, 14.12.4 

initial ordinal number, 13.2.7 

initial point of curve, 36.2.13 

initial point of path, 36.8.10 

initial segment, ordinal number, 12.1.8 

initial segment, weak, ordinal number, 12.5.27 

initial segment, weak, well-ordered set, 11.7.10 

initial segment, well-ordered set, 11.7.3 

initial value problem, ODE system, classical solutions, 44.6.12 

initial value problem, ODE system, weak solutions, 44.6.14 

injection, 10.5.2 

injection, canonical, dual vector in mixed tensor space, 29.5.8,
29.5.10

injection, canonical, scalar in mixed tensor space, 29.5.8,
29.5.10

injection, canonical, scalar in multilinear function space,
27.3.7

injection, canonical, scalar in tensor space, 28.1.16

injection, canonical, vector in mixed tensor space, 29.5.8,
29.5.10

injection, canonical, vector in multilinear map space, 27.2.23 

injection, canonical, vector in tensor space, 28.1.18 

injective function, 10.5.2 


injective homomorphism versus monomorphism, 10.5.23

injective partial function, 10.9.8 

injective relation, 9.6.18

injective sequence, 12.3.2 

injectivity, continuity, implies strict monotonicity, 34.9.9 

ink, extra, 19.2.5

ink, significance, 6.4.6 

ink, waste, 1.5.6, 24.7.9 

inline declaration of function, 43.2.7, 43.10.5 

inline proof, 4.8.11, 10.6.2

inner automorphism, 17.8.12 

inner product, Euclidean, standard, 24.9.6 

inner product, hyperbolic, linear space, 24.10 

inner product, hyperbolic, on linear space, 24.10.3 

inner product, hyperbolic, pseudo-metric field, 24.10.1 

inner product, linear space, 24.9 

inner product, module over a ring, positive definite, 19.7.3 

inner product, module over ring, 19.7 

inner product criterion, Minkowskian, basis-free, 24.10.8 

inner product definition, Minkowskian, coordinate-free, 
24.10.6 

inner product on module over a ring, 19.7.2 

inner product on real linear space, 24.9.4 

inner product on real linear space, Minkowskian, 24.10.6 

inner product real linear space, 24.9.5 

inner product space, Cartesian, 24.9.7, 26.11.1 

inner product space, Euclidean, 24.9.7, 26.11.1 

insert-item function for lists, 14.12.6 

insight, 1.7.2 

integer, 14, 14.4

integer, even, 5.2.7 

integer, extended, 14.5, 14.5.2 

integer, lowest, zero or one, 14.1.2, 14.6.2, 14.12.1, 14.12.4 

integer, lowest, zero or one survey, 14.1.1

integer, nearest, distance function, 16.5.22 

integer, nearest, function, 16.5.17 

integer, negative, 14.4.5 

integer, negative extended, 14.5.4 

integer, positive, 14.4.5 

integer, positive extended, 14.5.4 

integer, signed, 14.4.3 

integer index set, start at 0 or 1, 14.1.20, 14.6.9, 14.12.1, 
14.12.4, 16.4.2 

integer interval notation, 14.4.10 

integer interval notation, semi-infinite, 14.4.11 

integer notation summary, 1.5.1 

integer representation, 53.3.6 

integer representation, little-endian or big-endian, 53.1.14 

integer set notations, 14.4.5 

integer set notations, extended, 14.5.4 

integers, extended, usual topology, 32.5.4 

integers, extension, 15.1.2 

integers, Gaußian, ring, 18.2.17

integers, modelling the real world, 26.11.6 

integers, non-negative, usual topology, 32.5.3 

integers, non-negative extended, usual topology, 32.5.4 

integers, topology, usual, 32.5.3

integrability, Cauchy, uniform continuity condition, 43.3.20 

integrable function, Cauchy, real-valued, 43.3.16 

integrable function, Darboux, real-valued, 43.5.11 

integrable function, Riemann, real-valued, 43.4.7 

integrable vector-valued function, Darboux, 43.9.2 

integral, Barrow, 43.2 

integral, Cauchy, 43.1.4, 43.3, 43.3.1 

integral, Cauchy, Cauchy net, 43.3.19 

integral, Cauchy-Riemann, 43.4 

integral, Cauchy-Riemann-Darboux, 43.5 

integral, Cauchy-Riemann-Darboux, basic properties, 43.7 
integral, Cauchy-Riemann-Darboux, vector-valued integrand,
43.9 

integral, Cauchy-Riemann-Darboux-Stieltjes, 43.10 

integral, Daniell, 43.1.4 

integral, Darboux, 43.1.4, 43.5 

integral, Darboux, basic properties, 43.7 

integral, Darboux, real-valued function, 43.5.12 

integral, Denjoy, 43.1.4 

integral, Gregory, 43.2 

integral, Leibniz, 43.2 

integral, lower Darboux, real-valued function, 43.5.10 

integral, lower Riemann-Stieltjes, real-valued function, 
43.10.4 

integral, Newton, 43.1.4, 43.2 

integral, Newton’s, 43.2.3

integral, Perron, 43.1.4

integral, Riemann, 43.4, 43.4.6 

integral, Riemann-Stieltjes, 43.10 

integral, Riemann-Stieltjes, integrator curve, 43.10.3 

integral, Riemann-Stieltjes, real-valued function, 43.10.4 

integral, Stieltjes, linear-operator-valued function, 43.12.3 

integral, Stieltjes, operator-valued integrand, 43.12 

integral, Stieltjes, vector integrator differential, 43.12.1 

integral, Stieltjes, vector-valued function, 43.11.3 

integral, Stieltjes, vector-valued integrand, 43.11 

integral, Stieltjes, vector-valued integrator curve, 43.12.2 

integral, upper Darboux, real-valued function, 43.5.10 

integral, upper Riemann-Stieltjes, real-valued function, 
43.10.4 

integral calculus, 43 

integral curve, existence theorem, 57.10.5 

integral curve, Lie bracket, holonomy, 46.5.4 

integral curve, Lie bracket, nonholonomy, 61.5.20 

integral curve, literature, 57.10.1 

integral curve, uniqueness theorem, 57.10.6 

integral curve existence, applications, 57.10.4 

integral curve of vector field, 57.10.2

integral curve of vector field, differentiable manifold, 57.10 

integral curves of zero vector fields are constant, 57.10.8 

integral domain, 18.2.15 

integral domain, complex number system, 18.2.19 

integral domain, ordered, 18.6.13 

integral notation, history, 43.2.4 

integral of Riemannian metric, 73.9.2 

integral of vector-valued function, Darboux, 43.9.2 

integral power, negative, 16.6.4 

integral power, non-negative, 16.6.3 

integral root, positive, non-negative real number, 16.6.8 

integral system, 18.6.14 

integrals, tables, 43.2.3 

integrand, vector-valued, Cauchy-Riemann-Darboux integral, 
43.9 

integration, history, milestones, 43.1.2 

integration by parts, 43.8.11 

integration by parts, tactical retreat, 43.8.10 

integration styles, literature, 43.1.4 

integrator curve, vector-valued, for Stieltjes integral, 43.12.2 

integrator curve for Riemann-Stieltjes integral, 43.10.3 

integrator differential, vector, Stieltjes integral, 43.12.1 

intellectual capital, 4.5.11 

intellectual towers, sandy foundations, 2.0.2 

intelligence, artificial, 2.2.3

interior cone condition, 26.19.4, 51.11.5 

interior of set, 31.8 

interior of set, pointwise characterisation, 31.8.11 

interior of set, topological, 31.8.2 

interior operator, relation to strength of topology, 31.8.22 
interior operator, topological, basic properties, 31.8.13,
31.8.14, 31.8.16, 31.8.17 

interior operator, topological space, 31.8.4 

interior point of open set, 31.3.8 

interior point of set, 31.2.1 

interior sphere condition, 26.19.4 

intermediate concept, force, 53.1.4 

intermediate value theorem, 34.9.7 

internal direct sum of linear spaces, 24.1.6

international dateline, 76.2.2 

international language, predicate calculus, 77.4.1 

interpretation, domain, propositional calculus, 5.1.8 

interpretation, logic, 2.4.3

interpretation, predicate calculus, 6.4 

interpretation, recursive, infix logical expression, 3.9.12 

interpretation channel, remarks and diagrams, 1.5.9 

interpretation map, infix logical expression, 3.12.4 

interpretation map, postfix logical expression, 3.11. 

interpretation map, prefix logical expression, 3.11.14 

intersection of family of sets, 10.8.9 

intersection of sets, 8.1.2 

intersection of sets, properties, binary, 8.1 

intersection of sets, properties, general, 8.4 

intersection theorem, Cantor's, Cartesian space, 37.9.6, 
37.9.12 

intersection theorem, Cantor's, complete metric space, 37.9.8 

interval, leading, natural numbers, 14.1.21 

interval, left-compact, real-number, 34.9.16 




interval, left-open, real-number, 34.9.16 
interval, nested, theorem, 37.9.2, 37.9.3, 37.9.12 
interval, open, real numbers, 32.5.8 

interval, real-number, 16.1, 16.1.4 

interval, real-number, convexity, 22.11.24 
interval, real-number, finite, 16.1.12 

interval, real-number, infinite, 16.1.12 

interval, real-number, length, 16.1.13 

interval, real-number, semi-infinite, 16.1.12 
interval, right-compact, real-number, 34.9.16 
interval, right-open, real-number, 34.9.16 
interval, semi-open, circle topology, 32.6.14 
interval, semi-open, circle topology distance function, 37.2.10 


interval, semi-open, torus topology, 32.6.15 


interval, semi-open, torus topology distance function, 37.2.12 

interval, space-time, 20.1.10 

interval, totally ordered set, 11.5.10 

interval, unit, 16.1.7 

interval component, open, point in real-number open set, 
32.7.5 

interval enumeration, open, 32.7.2 

interval enumeration, open set of real numbers, 32.7 

interval list, double shadow-set, 45.6.8 

interval list, shadow-set, 45.5.6 

interval notation, closed, symmetrised, 16.1.15

interval notation, integer, 14.4.10 

interval notation, integer, semi-infinite, 14.4.11 

interval notation, rational number, 15.1.13 

interval notation, rational number, semi-infinite, 15.1.14 

interval notation, totally ordered set, 11.5.13 

interval notation, totally ordered set, semi-infinite, 11.5.15 

interval partition, 43.3.3 

interval partition, mesh, 43.3.10 

interval partition refinement, 43.3.5 

interval partition sample sequence, 43.4.3 

interval partitions, common refinement, coarsest, 43.3.7 

interval partitions, mesh-constrained set, 43.3.11 

interval partitions refinement, common, 43.3.5 


intrinsic differential geometry, 49.1 
intrinsic differential geometry, locally Cartesian, 49.1.1 
intrinsic fibre space, topological fibration, 47.2.2

intrinsic geometry, Hausdorff condition, 49.5.10 

introduction to the book, 1 

intuition, mathematical, versus logical rigour, 2.4. 

intuitionism versus formalism, 2.2.12

intuitive topology, 31.3.8 

invariant, transformation group, 20.9 

invariant attribute, 20.9.19

invariant vector field, left, generated from matrices, 62.4.17 

invariant vector field, right, 62.7.2

invariant vector field, translation, Lie group, 62.4.12, 62.10.10 

invariant vector field generated by vector, left, Lie group, 
62.4.15 

invariant vector field generated by vector, right, Lie group, 
62.7.7 

invariant vector field on Lie group, left, 62.4 

invariant vector field on Lie group, right, 62.7 

invariant vector fields on a Lie group, left, Lie algebra, 62.4.10 

invariant vector fields on Lie group, left, linear space, 62.4.8 

inverse, continuous, of continuous bijection between metric 
spaces, 38.1.13 

inverse, continuous, of continuous real-valued injection on 
interval, 34.9.25 

inverse, group, 17.3.12 

inverse, left, function, 10.5.13 

inverse, left, matrix, 25.4

inverse, right, function, 10.5.13 

inverse, right, matrix, 25.4 

inverse chain rule, partial derivatives, 41.7.7 

inverse function, 10.5.9 

inverse function theorem, 41.10, 41.10.4 

inverse image of set by relation, 0.5.16

inverse matrix, 25.8.10 

inverse matrix, left, 25.4.5 

inverse matrix, right, 25.4.5 

inverse matrix procedure, 24.9.10 

inverse of function, 10.5.10 

inverse of surjection, right, explicit, 10.5.17 

inverse relation, 9.6.13, 10.5.9 

inverse rule, global differential of diffeomorphism, 58.9.13 

inverse rule, pointwise differential of diffeomorphism, 58.5.4 

inverse set-map, function, 10.6 

inverse set-map, topology, 32.8.1, 32.8.3 

inverse set-map conditions for continuity, 35.1.2 

inverse set-map for a function, 10.6.4 

inverse set-map for a function, double, 10.7.5 

inverse set-map for a partial function, 10.9.10 

inverse trigonometric function, 44.2.3 

inversion, relations, 9.6 

inversion map differential, Lie group, negative identity, 62.2.9 

invertible matrix, 25.8.7 

invertible matrix properties, 25.8.13 

investigation, essay, treatise, enquiry, 1.4.3 

IOU, Andromeda Galaxy holiday, 22.7.20 

IOU, axiom of choice, 13.8.14 

IOU, rabbit, hat, 7.10.3, 45.3.7 

IOU, rabbit, open base point-choice function, 33.4.22 

irrational numbers, 15.0.1 

irrational real number, 15.5.6 

isolated point, topology, 31.10 

isolated point of set, 31.10.8 

isolation of English mathematics, serious evil, 40.4.12 

isometry on two-sphere, 76.8 

isomorphism, C^ differentiable, differentiable manifold, 52.2.2 

isomorphism, canonical, interpretation, 29.2.22 

isomorphism, linear space, 23.1.8 

isomorphism, musical, flat, 73.5.1, 73.5.3 

isomorphism, musical, sharp, 
isomorphism, natural, 22.2.3

isomorphism, natural, tensor space, 29.2.5 
isomorphism, order, 11.1.21
isomorphism, ordered field, 18.8.17 

isomorphism, ring, 18.1.20 

isomorphism, set, 10.5.21 

isomorphism, tensor space, unmixed, 29.2 
isomorphism, topological fibre bundle, 47.7.5, 47.7.11 
isomorphism, transformation group, 20.6.2, 20.8.1 
isomorphism theorem, linear space, 24.2.16 
isospin rotation, gauge theory, 70.8.3 

iteration method, Picard, interpretation, 44.6.18 
iteration method, Picard, literature, 44.6.1 
iteration method, Picard, properties, 44.6.7 
iteration sequence, Picard, 44.6.15 

IV-finite set, same as Dedekind-finite, 13.11.1 


Jacobi, Carl Gustav Jacob, 77.1.6 

Jacobi field, 72, 72.4, 72.4.5 

Jacobi identity, 19.10.2, 19.10.3 

Jacobi identity interpretation, 19.10.9 

Jacobi’s formula, divergence, 71.8.5 

Jacobian matrix, manifold chart transition, 51.4.1 
Jacobian matrix function, Cartesian space, 46.3.7 
Jacobian system, holonomic vector field family, 46.5.1 
joinable curve concatenation, 36.6.5 

joinable curves, 36.6.2 

joint denial, 3.7.14 

joint denial operator, 3.7.8, 3.7.10 

joint-domain map differentiability, 41.3 

joint-range map differentiability, 41.3

jointly differentiable two-variable manifold map, 52.6.11 
Jones, William, 44.2.6 

Jordan, Marie Ennemond Camille, 77.1.7 




Jordan, Marie Ennemond Camille, bounded variation, 38.10.1 


Jordan arc, 36.2.11 

Jordan content, exterior, 43.6.1 

Jordan content and Darboux integrability, 43.6 

Jordan factorial function, 14.8.29 

Jordan outer content, 43.6.2 

journey, 36.1.1 

justification, Riemann curvature, ordinary fibre bundle, 70.7 

justification theorem, Riemann curvature, toy version, 70.7.6 

juxtaposition notation, semigroup operation, 17.1.12 

juxtaposition of lists, concatenation of families, 29.3.9 

juxtaposition product, frame fields, 57.11.7 

juxtaposition product, linear functional by vector, 23.4.10, 
69.13.9 

juxtaposition product, multilinear function by vector, 27.2.31, 
56.7.9 

juxtaposition product, multilinear map space canonical basis, 
27.6.15 

juxtaposition product, tensors, 29.7, 57.11.7 

juxtaposition product, tensors, general, 29.7.5 

juxtaposition product, tensors, multilinear-function-style,
29.7.6


k-combination, 14.9.1 

k-permutation, 14.10.1 

Kant, Immanuel, 26.11.3, 77.1.6 
Kaufmann, Walter, 77.1.7 

Kaufmann, Walter, relativity, 77.3.2 
Kepler, Johannes, 77.1.4 

Kepler, Johannes, calculus, derivative at extremum, 40.6.1 
kernel, equivalence, 9.8.10, 27.8.1 

kernel, equivalence, function, 10.16 
kernel, equivalence, of a function, 10.16.7 
kernel, linear map, 23.1.22 

kettle of fish, 6.4.9, 7.11.7 
keyhole test for continuity, 31.12.16

keyhole test for continuous differentiability, 41.1.26 
keyhole test for continuous differentiability of map, 41.2.20 
keyhole test for differentiability, 40.3.5 

keyhole test for higher-order differentiability of map, 42.5.28 
keyhole test for partial differentiability, 41.1.5 

keyhole test for partial differentiability of map, 41.2.3 
Khufu’s pyramid, 26.10.1
Klein, Felix, 77.1.7, 77.2.6 

knots, gordian, infinity concepts, 13.7.1 

knowledge, a-priori, 4.3.2

knowledge, exclusion semantics, 3.14.6, 6.4.6 

knowledge, quantitative measure, 3.4.6 

knowledge bedrock, 2.0.1 

knowledge set, 3.4, 3.4.4 

knowledge set, using truth domains, 3.5 

knowledge set semantics, predicate calculus, 6.5.9 
knowledge sets, combination by intersection, 3.4.10 
knowledge space, 3.4.3 

knowledge versus truth, 3.1.3 

knowledge wheel, 2.0.1

Kontinuum, 49.2.10 

Körper (field), 18.7.6 

Koszul, Jean-Louis, 77.1.8 

Koszul connection, 67.2.1, 67.2.2, 71.6.9 

Koszul connection, torsion, 71.12.2, 71.12.4 
Lie algebra of vector fields on a manifold, 61.5.16

Lie bracket, anticommutativity, 61.5.9

Lie bracket, integral curve, nonholonomy, 61.5.20

Lie bracket, Lie derivative, 61.8.1

Lie bracket of vector fields, Cartesian space, 46.1.11

Lie bracket of vector fields, differentiability, 61.5.13

Lie derivative, 61.8.4

Lie derivative of vector field, 61.8

62.10.10

Lie group vector field, invariant, translation, 62.4.12, 62.10.10

Lie horizontal lift for vector fields, 61.7.9

Lie left transformation group, left translation operator,
63.4.7, 63.4.8

63.5.6, 63.5.7

Lie transport, glacier flow, 61.7.1

vector bundle, 68.2.18

lift, horizontal, constructed from Koszul connection, 71.6.11,
71.6.13

lift, horizontal, Lie, 61.7.5 

lift, horizontal, Lie, for vector fields, 61.7.9 

lift, vertical, 71.1.3

lift, vertical, by horizontal lift function, 67.4.3 

lift function, affine connection, transposed, 71.5.2 

lift function, differentiable fibre bundle, transposed, 67.8.2 

lift function, horizontal, affine connection, 71.1.2 

lift function, horizontal, Christoffel array, 62.0.1 

lift function, horizontal, differentiability, 67.7 

lift function, horizontal, differentiable, ordinary fibre bundle,
67.7.6

lift function, horizontal, differentiable, principal bundle,
69.1.13

lift function, horizontal, differentiable fibration, 67.4, 67.4.2 

lift function, horizontal, differentiable fibre bundle, 67.5.4 

lift function, horizontal, interpretation of conditions, 67.4.3 

lift function, horizontal, localisation, 67.7.2 

lift function, horizontal, localisation, principal bundle, 69.1.12 
lift function, horizontal, parallel transport, 71.4

lift function, horizontal, tangent bundle, 71

lift function, horizontal, tensor calculus, 71.2 

lift function, horizontal transposed, OFB, 67.8 

lift function, horizontal transposed, principal bundle, 69.2 
lift function, principal bundle, horizontal, 69.1.3 

lift function, principal bundle, horizontal transposed, 69.2.2 
lift function, principal fibre bundle, horizontal, 69.1 

lift function, tangent bundle, unidirectional, 54.16.4 

lift function, transposed horizontal, tangent bundle, 71.5 
lift map, horizontal, Cartesian space, 70.7.2
lift of local vector field by horizontal lift function, 67.5.10, 
69.1.16 

lift of vector field by horizontal lift function, 67.5.9, 69.1.1 
lift of vector field by transposed horizontal lift function,
67.8.6, 69.2.7 

lifted chart-basis operator field on principal bundle, 69.14.4 
lifted chart-basis vector field, pulled-back, principal bundle, 
69.14.3 

lifted chart-basis vector field on principal bundle, 69.14.3 
limit, 35 

limit, real function, one-sided inferior, 35.8.9 


limit, real function, one-sided superior, 35.8.9 
limit, sequence, real-number, 35.7 

limit-based compactness, limit points of sets, 35.5 
limit-based compactness, limits of sequences, 35.6 
limit notation, 35.3.17
limit of a function at a point, 35.3.7 

limit of a sequence, 35.4.2 

limit of function, 35.3 

limit of sequence, 35.4, 35.4.12 

limit of sequence in metric space, 37.7.12 
limit point, ω, 31.10.16
limit set of set, 31.10.2
Lindelöf property, hereditary, 33.7.9

Lindelöf set, 33.7.8 

Lindelöf topological space, 33.7.8

line, constant-direction, 53.1.11 

line, constant-velocity, 53.1.11 

line, parametrised, traversal mechanism, 40.2.3 
line, straight, differentiable manifold, 53.1.9 
line-based directional derivative operator, 54.11.8 
line bundle, differentiable, 65.2.12 

line bundle, tangent, affine space over group, 26.3 
line bundle, tangent, Cartesian space, 26.14 

line bundle, tangent, Cartesian space, philosophy, 26.12 
line bundle, tangent, on affine space over module, 26.5 
line bundle total space, tangent, Cartesian space, 26.14.2 
line definition, coordinate-free, 26.3.9 

line on affine space over group, 26.3.3 

line on affine space over module, 26.4.5 

line on affine space over module over set, 26.5.3 
line segment, affine space, 26.9.7 

line segment, parametrised, 26.9 

line space, tangent, Cartesian space, 26.13.11 
line-vector, tangent, Cartesian space, 26.13.1, 26.13.2, 26.13.3


line-vector, tangent, differentiable manifold, 54.1 


line-vector bundle, tangent, differentiable manifold, 54.5 


line-vector set, tangent, Cartesian space, 26.13.5 
line-vector space, tangent, differentiable manifold, 54.4, 
line-vector tangent bundle, 54.5.16
line-vector velocity, tangent, Cartesian space, 26.13.14 
linear algebra, 22, 23, 24 

linear combination, 22.3, 22.3.2 

linear combination of vectors, 20.9.20, 20.9.22, 22.5.20 
linear combination of vectors, formal, 22.2.22 

linear connection, vector bundle, 68.1.4 

linear functional, 23.4, 23.4.2 


linear functional, juxtaposition product by vector, 23.4.10 


linear functional, non-zero, existence, 23.5 

linear functional extension, axiom of choice, 23.5.1 
linear group, general, differentiable manifold, 51.4.24 
linear group, general, for module over a set, 19.1.12 
linear group, general, locally Cartesian space, 49.4.18 
linear group, general, standard atlas, 49.7.15 

linear group, general, standard topology, 32.6.8 
linear independence, 22.6 

linear map, 23, 23.1, 23.1.1

linear map, component array, 23.2.5 


linear map, component map, 23.2.2, 23.2.8, 23.2.9, 23.2.10 
linear map, component matrix, 25.7

linear map, contragredient representation, 23.11.19 
linear map, dual representation, 23.11.19 

linear map, exact sequence, 24.5 

linear map, kernel, 23.1.22

linear map, trace, 23.3

linear map, trace, divergence interpretation, 23.3.5 
linear-map component map, 23.2 

linear map component matrix, 25.7.3 

linear map for a component matrix, 25.7.5 


linear-map style tensor on differentiable manifold, 56.1.4 


linear maps, linear space of, 23.1.6 
linear operations, vector bundle, 65.2 


linear-operator-valued function, Stieltjes integral, 43.12.3 


linear space, 22, 22.1, 22.1.1 

linear space, absorbing sub, 24.6.9 

linear space, abstract, Cartesian, 26.11.1 
linear space, adjoint map, 23.11.3 

linear space, affine space over, 26.10, 26.10.3 
linear space, algebraic dual, 23.6.7 

linear space, balanced set, 24.6.9 

linear space, basis family, 22.7.6 

linear space, basis set, 22.7.2

linear space, Cartesian, 22.2.19, 26.11.1 
linear space, codimension, 22.5.1 
linear space, complex, 22.1.16 
linear space, component map, 22.8.6, 22.8.7, 22.8.8 
linear space, concave function, 23.12.2 

linear space, conjugate operator, 23.11.3 

linear space, convex function, 23.12, 23.12.2 
linear space, coordinate-free, 26.1.1

linear space, coset, 24.2.2
linear space, direct sum, 24.1
linear space, double dual, 23.10 

linear space, dual, 23.6 

linear space, dual, dimension, 23.7.11, 23.7.12 
linear space, dual map, 23.11.3 

linear space, dual operator, 23.11.3 
linear space, equivalent norms, 24.8.11

linear space, Euclidean, 22.2.19, 22.2.20 

linear space, field regarded as, 22.1.9 

linear space, finite-dimensional, 22.5.7 

linear space, finite-dimensional, basis cardinality, 22.7.16 

linear space, finite-dimensional, differentiable manifold, 
51.4.21 

linear space, finite-dimensional, dual, 23.8 

linear space, finite-dimensional, norms are equivalent, 39.5.3 

linear space, finite-dimensional, standard atlas, 49.7.14 

linear space, finite-dimensional, standard topology, 32.6.6 

linear space, free, 22.2, 22.2.10, 22.2.23 

linear space, free, basis, 22.7.19 
linear space, free, finite support, 22.2.9

linear space, free, vector-valued, 22.2.28

linear space, hyperbolic inner product, 24.10

linear space, infinite-dimensional, 22.5.7

linear space, inner product, 24.9

linear space, internal direct sum, 24.1.6

linear space, isomorphism theorem, 24.2.16

linear space, norm, 24.7, 24.7.2

linear space, norm, bounds, 24.8

linear space, normed, 24.7.5, 39.3

linear space, normed, finite-dimensional, 39.5

linear space, ordered basis family, 22.7.7

linear space, pull-back operator, 23.11.3

linear space, quotient, 24.2, 24.3.8

linear space, quotient, natural homomorphism, 24.2.12
linear space, real, 22.1.15

linear space, real, inner product, 24.9.5
linear space, real, normed, 24.7.7

linear space, seminorm, 24.6, 24.6.3
linear space, seminorm, bounds, 24.8
linear space, span of subset, 22.4.2, 22.4.
linear space, tangent vector, finite-dimensional, 54.1.14
linear space, tensor product, 28.1

linear space, topological, 39, 39.1, 39.1.3

linear space, topological, convergent infinite series, 39.2.7
linear space, topological, divergent infinite series, 39.2.7
linear space, topological, infinite series, 39.2, 39.2.2
linear space, topological, partial-sum sequence, 39.2.2

linear space, topological, sum of infinite series, 39.2.6, 39.2.
linear space, topological dual, 23.6.7

linear space, totally ordered basis family, 22.7.7

linear space, transpose map, 23.11, 23.11.2

linear space, transposed map, 23.11.3
linear space, unrestricted, 22.2, 22.2.5, 23.7.4

linear space, unrestricted, standard immersion, 22.2.7
linear space, unrestricted, vector-valued, 22.2.27
linear space, well-ordered basis family, 22.7.7

linear space automorphism, contragredient, 23.11.18
linear space basis, 22.7

linear space basis, dual, 23.7

linear space basis, finite, 22.8.3

linear space basis existence, 7.11.13, 22.7.21

linear space bidual, 23.10.1

linear space component map, dual, 23.9

linear space direct sum, abstract, 24.1.4

linear space direct sum, external, 24.1.

linear space direct sum, formal, 24.1.
linear space double bidual, 23.10.1

linear space drop function, curve tangent vector field, 57.9.5
linear space dual, 23.6.4

linear space dual, component transition matrix, 23.9.7
linear space endomorphism, trace, 23.3.3

linear space endomorphism trace, divergence of vector field,
linear space exact sequence, 24.5.2

linear space isomorphism, double dual, 23.10.7 

linear space locally Cartesian space, 49.4.17 

linear-space manifold, scaling curve velocity, 57.9.11 

linear space monomorphism, 23.11.9, 23.11.10 

linear space monomorphism, double dual, 23.10.5 

linear space morphism notations, 23.1.9 

linear space morphisms, 23.1.8 

linear space of continuous functions on a locally Cartesian 
space, 49.10.2 

linear space of continuous vector-valued functions on a locally 
Cartesian space, 49.10.9 

linear space of cross-sections of non-topological vector bundle, 
24.11.4 

linear space of fibre set, induced, 65.1.6 

linear space of left invariant vector fields on Lie group, 62.4.8 

linear space of linear maps, 23.1.6 

linear space of multilinear functions, 27.6.3 

linear space of multilinear maps, 27.6, 27.6.3 

linear space of multilinear maps on dual spaces, 27.7 

linear space of rectangular matrices, 25.3.2

linear space second dual, 23.10.2 

linear space sequence direct sum standard injection, 24.1.5 

linear space subspace, 22.1.10 

linear space tangent bundle, vertical drop function, 54.9, 
54.9.5 

linear spaces, concretely identical, 59.6.2 

linear spaces, normed, differentiation of maps, 41.9 

linear span, 22.4

linear span of set of vectors, 22.4.2, 22.4.3 

linear structure, induced, on fibre sets, 24.11. 

linear-style tensor, Cartesian space, 30.6.10 

linear-style tensor, specified by coordinates, 56.1.12 

linear-style tensor bundle manifold chart, 56.3.20 

linear-style tensor fibration total space, 56.3.3 

linear-style tensor space on Cartesian space, 30.6.5 

linear subspace, finite-codimensional, 22.5.19 

linear subspace, infinite-codimensional, 22.5.19 

linear subspace duals, natural isomorphisms, 24.3 

linear transformation, induced, differential of map, 58.4.1 

linear transformation group, contragredient representation, 
23.11.20 

linear transformation group, dual representation, 23.11.20 

linear transformations, general, Lie group, 63.4.18, 63.4.19 

linear transformations, topological group, 39.6 

linearity, oblique drop of vector bundle connection, 68.1.6 

linearity, vector bundle connection, 68.1.2 

linearity of oblique drop function, vector bundle, 65.4.4 

linearly independent family of vectors, 22.6.5 

linearly independent set of vectors, 22.6.3 

lingua franca, predicate calculus, 77.4.1 

linguistic structure, predicate calculus, five layers, 5.1.6 

lion, 3.0.1, 3.1.4

lion, truth, falsity, 3.1.1 

Lipschitz, Rudolf Otto Sigismund, 77.1.7 

Lipschitz, Rudolf Otto Sigismund, absolute differential 
calculus, 27.1.5 

Lipschitz atlas for manifold, locally, 50.6.10 

Lipschitz constant, global, uniform, 38.6. 

Lipschitz continuity, metric space, 38.6 

Lipschitz continuity, multiple definitions, 38.6.1 

Lipschitz continuous derivatives, 42.5.33

Lipschitz continuous function, global, uniform, 38.6.6 

Lipschitz continuous function, pointwise bound, pointwise 
locality, 38.6.11 

Lipschitz continuous function, pointwise bound, pointwise 
pair-locality, 38.6.14 
Lipschitz continuous function, uniform bound, uniform
locality, 38.6.10

Lipschitz continuous manifold, 50.6 

Lipschitz curve transition map, 36.1.8 

Lipschitz function, global, uniform, 38.6.6 

Lipschitz function, metric space, 38.0.1 

Lipschitz manifold, 48.2.7, 50.6.3 

Lipschitz manifold, locally, 50.6.11 

Lipschitz manifold, locally, rectifiable curve, 50.7 

Lipschitz manifold example, 53.2.4, 53.2.5, 53.2.6 

list, initial index, 14.12.4 

list concatenation function, 14.12.6 

list insert-item function, 14.12.6 

list length function, 14.12.6 

list omit-function, 14.12.6 

list omit-two-items function, 14.12.6 

list operation, general set, 14.12.6

list operation on ring, 18.10.3 

list operation on semigroup, 18.10.2 

list operations for sets with algebraic structure, 18.10 

list product function, 18.10.3 

list projection function, 18.10.2 

list space, 14.12.2 

list space, extended, 14.12.12 

list space, general sets, 14.12 

list space with maximum length, 14.12.17 

list spaces for sets with algebraic structure, 18.10 

list subsequence function, 14.12.6 

list substitute-item function, 14.12.6 

list sum function, 18.10.2 

list swap-items function, 14.12.6 

Listing, Johann Benedict, 77.1.6 

Listing, Johann Benedict, topology name, 31.1.1 

listing, recursive, of elements of a set, 7.8.9

literature, accumulation point, topology, 31.10.1 

literature, affine space definitions, 26.2.1 

literature, associated connection, 67.12.1 

literature, axiom of choice, 7.11.2

literature, axiom of choice equivalents, 7.11.12 

literature, Bolzano-Weierstraß, 35.5.2

literature, bounded variation functions of a real variable, 
38.10.1 

literature, calculus of variations, 44.8.1 

literature, Cartan formalism, 69.5.1 

literature, commutativity of partial derivatives, 42.3.1 

literature, complete metric spaces, 37.8.1 

literature, connection form defined using right action, 69.8.1 

literature, constructible universe, 12.6.8 

literature, covariant derivative, vector bundle, 68.2.1 

literature, curvature for general connections, 70.0.1 

literature, curvature of connection form, 70.5.1 

literature, Dedekind cuts, 15.4.1 

literature, deduction metatheorem, predicate calculus, 6.6.27 

literature, deduction metatheorem, propositional calculus, 
4.8.6 

literature, Dini derivatives, 40.10.1 

literature, Ehresmann formalism, 69.5.1 

literature, epsilon-delta approach to calculus, 40.2.5 

literature, exact sequences, 24.5.1 

literature, exhaustion method, application to quadrature, 
43.5.3 

literature, exterior derivative, 46.7.1 

literature, finite set definitions, 13.5.1 

literature, fundamental theorem of calculus, 43.8.1 

literature, fundamental vector field, principal bundle, 66.6.1 

literature, gauge covariant derivative, 69.14.77 

literature, gauge theory history, 70.8.2 

literature, gauge transformation formula shorthand, 69.13.11 
literature, holomorphic complex functions, 42.8.6

literature, holonomy group, 70.1.4 

literature, Hopf-Rinow theorem, 73.7.2 

literature, horizontal subspace connection representation, 
67.9.8 

literature, implicit function theorem, 41.10.5 

literature, integral curve, 57.10.1 

literature, integration styles, 43.1.4 

literature, isomorphism theorems, 24.2.15 

literature, Jordan content, 43.6.1

literature, Lebesgue differentiation theorem, 45.7.1 

literature, left conjugate, algebra, 17.8.1 

literature, Lie bracket, 61.5.1

literature, Lie bracket defined as commutator, 61.5.3 

literature, Lie derivative, 61.8.2 

literature, linear space basis cardinality, 22.7.17 

literature, logic formalisation terminology, 4.1.6 

literature, manifolds with boundaries, 50.138

literature, Maurer-Cartan form, 62.5.7, 64.8.13 

literature, Maurer-Cartan form, gauge transformation, 
69.13.11 

literature, monoid, algebra, 17.2.1 

literature, moving frames, 55.7.12 

literature, natural deduction, tabular systems, 5.3.2 

literature, open cover definitions, 33.5.3

literature, ordinal number definition, Robinson-style, 12.1.8 

literature, ordinal numbers, general, 12.5.2

literature, paracompactness, 33.7.13 

literature, Peano existence theorem, 44.3.1 

literature, Peano natural number axioms, 14.1.4 

literature, Picard iteration method, 44.6.1
literature, Poisson bracket, 61.5.2
literature, predicate calculus, Hilbert-style, 6.1.10
literature, predicate calculus formalisations, 6.1.5.
literature, pseudo-metric distance function, 37.1.9
literature, pseudo-Riemannian geometry, 75.0.2
literature, quadratrix, Hippias of Elis, 26.19.3
literature, real-number power series, 42.8.1
literature, rectifiable curves, 38.9.1
literature, regularity of manifold embedding or immersion,
52.5.1
literature, separable and second countable spaces, 33.4.17
literature, separable versus second countable spaces, 37.7.22
literature, Skolem’s paradox, 7.12.6
literature, space-filling curves, 36.3.1
literature, Stokes theorem, 46.9.1, 46.9.1
literature, structure equation for curvature forms, 70.5.6
literature, submanifolds, embeddings and immersions, 50.2.3
literature, tables, differential calculus, 40.0.1
literature, tables, integral calculus, 43.2.3
literature, test-function-based vector field differentiability,
57.2.6
literature, topological connectedness, 34.1.1
literature, topological identification spaces, 32.13.2
literature, topological linear spaces, 39.1.2
literature, topology on the empty set, 31.3.9
literature, transfinite induction, 11.8.6
literature, trichotomy of set cardinality, 13.1.20
literature, uniform continuity for uniform spaces, 38.3.13
literature, vector bundles, 65.0.2
literature, vector cross product, 17.1.4 
literature, vector fields, map-related, 61.6.1 
literature, well-ordering theorem proof, 11.6.23 
literature references, 79 
literature survey tables, 78.3 
literature versus typing, 1.4.6 
little-endian integer representation, 53.1.14 
local approximation cylinder, Peano ODE method, 44.4.4

local approximation rectangle, Peano ODE method, 44.3.9 

local axiom of choice, 7.12.4 

local bijection, 10.9.15 

local compactness, 33.6 

local connectedness of topological spaces, 34.7 

local diffeomorphism, Cartesian space, 42.7

local diffeomorphism family, 61.9.1

local differential form on a manifold, short-cut, 57.7.21 

local extremum, Hessian matrix, 51.6.13 

local extremum on differentiable manifold, 51.6.10 

local function, 10.9.2 

local homeomorphism, 31.14.16 

local maximum on differentiable manifold, 51.6.10 

local minimum of function on curve, 51.9.5 

local minimum on differentiable manifold, 51.6.1 

local trivialisation, differentiable fibre bundle, 64.8.3 

local trivialisation, differential transition map, 64.8.11 

local trivialisation, fibre bundle quintessence, 10.15.11, 21.0.7 

local trivialisation, non-topological fibration, 21.5.7 

local trivialisation, topological fibre bundle, 47.6.5

local trivialisation, topological fibre/frame bundle, 47.13.4 

localisation, connection form, components, 69.13 

localisation, connection form, overlap consistency, 69.12.4 

localisation, cross-section, connection form construction, 
69.12.6 

localisation, cross-section, coordinate neighbourhood, 21.6.1 

localisation, cross-section, non-topological, 21.6 

localisation, gauge, of curvature form, 70.6, 70.6.2 

localisation component function, connection form, 69.13.3 

localisation of connection form via cross-section, 69.11.3 

localisation of connection form via cross-sections, 69.11 


localisation of cross-section via fibre chart, 21.6.2 

localisation of horizontal lift function, 67.7.2 

localisation of horizontal lift function, principal bundle, 
69.1.12 

localisations, connection form, compatibility, 69.12.5 

localisations of connection form, reconstruction from, 69.12.3 

localised cross-section, reglobalisation, 21.6.4 

locality, tangent vector specification, 54.1.13 

locally bounded variation, metric-space-valued function,
38.10.7

locally Cartesian atlas, continuous, empty chart exclusion,
49.8.5

locally Cartesian atlas, differentiable, 51.5, 51.5.2 

locally Cartesian atlas, induced topology, 49.8 

locally Cartesian atlas, topology induced on set, 51.5.7 

locally Cartesian atlas for a set, continuous, 49.8.2

locally Cartesian group, 62.1.2 

locally Cartesian patchwork space, 49.9 

locally Cartesian space, 49

locally Cartesian space, ambient space, 49.6.9 

locally Cartesian space, Cartesian, 49.4.16 

locally Cartesian space, chart space, 49.4.12 

locally Cartesian space, coordinate space, 49.4.12 

locally Cartesian space, differentiable, 51.3.7 

locally Cartesian space, differentiable, induced by atlas,
51.5.14

locally Cartesian space, differentiable, underlying topology,
51.3.13

locally Cartesian space, empty, dimension, 49.4.13

locally Cartesian space, Hausdorff, 33.1.32

locally Cartesian space, history, 49.4.8

locally Cartesian space, linear space of continuous functions,
49.10.2

locally Cartesian space, linear space of continuous
vector-valued functions, 49.10.9

locally Cartesian space, non-Hausdorff, 49.5, 51.8.2

locally Cartesian space, single-chart, is manifold, 51.5.12

locally Cartesian space, T1 property, 49.4.22

locally Cartesian space, transition map, 49.6.4 

locally Cartesian space, zero-dimensional, 49.4.9 

locally Cartesian space atlas, 49.7.3 

locally Cartesian space atlas, differentiable, 51.3.2 

locally Cartesian space atlas, differentiable, indexed, 51.3.2 

locally Cartesian space atlas, indexed, 49.7.3 

locally Cartesian space atlas, transition map, 49.7.7 

locally Cartesian space chart, 49.6.2

locally Cartesian space coordinate map, 49.6.2 

locally Cartesian space for general linear group, 49.4.18 

locally Cartesian space for linear space, 49.4.17 

locally Cartesian spaces, philosophy, 49.2 

locally Cartesian topological space, 49.4, 49.4.7 

locally Cartesian topological space, variable-dimension, 49.4.5 

locally Cartesian topological space dimension, 49.4.10

locally compact set, 33.6.4 

locally compact set, strongly, 33.6.11 

locally compact topology, 33.6.3 

locally compact topology, strongly, 33.6.10 

locally connected point, topological space, 34.7.2 

locally connected separable space, 34.8 

locally connected subset of topological space, 34.7.5 

locally connected topological space, 34.7.3 

locally constant cross-section, non-topological, 21.6 

locally constant real-valued function, global differentiable 
extension, 51.8.3 

locally constant vector-tuple fields, properties, 57.4.7 

locally defined function, 10.9.2 

locally Euclidean topological space, 49.4 

locally finite cover, topological space, 33.7.14 

locally Lipschitz atlas for manifold, 50.6.10 

locally Lipschitz manifold, 50.6.11 

locally Lipschitz manifold, rectifiable curve, 50.7 

locally open implies open, condition on cover, 31.7.10 

locally open set, 31.7.6

locally pathwise connected topological space, 36.7.14 

locally product-structured space, fibre bundle, 21.0.7 

locally rectifiable curve, bidirectional length-parametrisation, 
38.9.11 

locally rectifiable curve in locally Lipschitz manifold, 50.7.6 

locus, 35.3.11 

logarithm function, 44.1, 44.1.2 

logarithm function, basic properties, 44.1.3 

logic, conjectural cognitive operational interpretation, 3.6.7 

logic, correct, 2.4.4

logic, naive, 4.8.8 

logic, predicate, 5 

logic, predicate, name-to-object map, 5.1.4 

logic, predicate, object class, 5.1.1 

logic, predicate, object oriented, 5.1.2 

logic, propositional, 3

logic, two-valued, 3.1.3 

logic, underlying set theory, 3.2.5 

logic application, 2.4.3

logic as a branch of applied mathematics, 3.0.5 

logic axiom of distributivity, 4.4.4 

logic axiom of restriction, 4.4.4

logic axioms, minimalist, diminishing returns, 4.9.5 

logic formalisation terminology, literature, 4.1.6 

logic interpretation, 2.4.3

logic language, low-level, 7.9.2 

logic mechanisation, social benefits, 4.1.3 

logic model, 2.4.3 

logic operator, arithmetic equivalent, 3.15.6

logic operator, binary, but-not, 3.7.13 

logic operator, binary, not-but, 3.7.13 
logic operator notations, 3.7.14

logic operator notations, survey, 3.7.14 

logic quantifier duality, 5.2.7

logic quantifier notations, survey, 5.2.4 

logic structure, 2.4.3

logic symbols, 3.6.3

logical algebra, 6.1.3 

logical argument, hypothesis, 4.3.2

logical argument, non-logical axiom, 4.3.2 

logical axiom, 3.3.6

logical calculus, completeness, 6.6.14 

logical conjunction, 3.6.2 

logical deduction, 4.3

logical defeatism, 3.3.1 

logical disjunction, 3.6.2

logical expression, basic, concrete propositions, 3.6 

logical expression, exclusion semantics, 3.11.1

logical expression, exhaustive substitution, 6.1.2 

logical expression, functional notation, 3.2.13

logical expression, infix, 3.9.12, 3.12.2 

logical expression, infix, bracket-level diagram, 3.12.1 

logical expression, infix, interpretation map, 3.12.4 

logical expression, infix, non-recursive syntax specification,
3.12.1

logical expression, postfix, 3.11.4 

logical expression, postfix, interpretation map, 3.11.6 

logical expression, prefix, 3.11.12

logical expression, prefix, interpretation map, 3.11.14 

logical expression equivalence, 3.10 

logical expression name, dereferencing, 3.9.2 

logical expression name, unquoting, 3.9.2 

logical expression space, infix, 3.12.3

logical expression space, infix, substitution closure, 3.12.5,
3.12.12

logical expression space, postfix, 3.11.5 

logical expression space, postfix, substitution closure, 3.11.8 

logical expression space, prefix, 3.11.13

logical expression space, prefix, substitution closure, 3.11.15 

logical expression style, infix, 3.12 

logical expression style, prefix and postfix, 3.11 

logical expression substitution, 3.10 

logical expression template, 3.9.9

logical expressions, equivalent, 3.10.2 

logical function, constant, 5.1.11 

logical operation, basic, on logical expressions, 3.9 

logical operator, arity, 3.11.7

logical operator binding precedence rules, 3.9.6 

logical predicate, zero-parameter, 5.1.10 

logical quantifier, infinity, 5.2.7, 5.2.8 

logical quantifier, semantics, 5.3.11

logical quantifiers, 5.2

logical quantifiers, power sets, 7.6.28 

logical quantifiers, reversing, axiom of choice, 45.2.1 

logical rigour versus mathematical intuition, 2.4.8 

logical subexpression, infix, 3.12.8 

logical theorem, 3.3.6 

logical voltage, 3.2.9 

logicisation of differential geometry, 77.4.1 

logicism, Bertrand Russell, 5.0.4 

logicism, set-ism, 12.0.2

logos (λόγος), 31.1.1

longitude, 49.6.8 

Lorentz, Hendrik Antoon, 77.1.7 

Lorentz, Hendrik Antoon, relativity, 77.3.2 

Lorentz group, space-time interval preservation, 20.1.10 

Lorentz transformation, pseudo-Riemannian manifolds, 75.1.1 

Lorentz transformation, tangent vector time parameter, 
53.1.3 
Lorentz transformations and relativity, chronology, 77.3.2
Lorenz, Ludvig Valentin, 77.1.6 

Lorenz gauge condition, 77.1.6. 

lost theorems without axiom of choice, 7.11.13 

low-level logic language, 7.9.2 

lower bound, real square matrix, 25.12 

lower bound function, real square matrix, 25.12.2 

lower bound of partially ordered set, 11.2.4, 11.3.2 

lower Darboux sum, 43.5.6 

lower modulus, real square matrix, 25.11, 25.11.2 

lower section, proper, well-ordered set, 11.7.4 

lower section, well-ordered set, 11.7.4 

lower well-ordering, 11.6.4 

lowering an index, vector in metric space, 73.5.7 
lowering-index isomorphism, 73.5.3 

lowering indices, tensor, 73.5 

lowest integer, zero or one, 14.1.2, 14.6.2, 14.12.1, 14.12.4 
lowest integer, zero or one, survey, dàii 7 
Łukasiewicz, Jan, 4.9.3, 77.1.7
Łukasiewicz, Jan, logic axioms, 4.4.1
luminiferous aether, 48.2.4


Mach, Ernst Waldfried Josef Wenzel, 77.1.7 
Mach, Ernst Waldfried Josef Wenzel, relativity, 77.3.2 
Mach's principle, 48.2.4
machina, ex, path class, 48.2.2 

machinery, heavy, 12.1.9 

Maclaurin, Colin, 77.1.5 

magic properties, complex numbers, 16.8.11 

magic wand, axiom of choice, 2.1.5 

magic wand, axiom of countable choice, 13.10.1 
magma, molten, 2.1.1
majority vote, truth not decided by, 1.4.15 

manifold, affine, 26.1.3 

manifold, almost-differentiable, 53.2 

manifold, analytic, 51.10, 51.10.3 

manifold, analytic, compact, 51.10.6 

manifold, C? versus topological, 51.2.5 

manifold, differentiable, 51, 51.3, 51.3. 
manifold, differentiable, Cartesian, 51.4.22 

manifold, differentiable, compatible chart, 51.4.2 

manifold, differentiable, diffeomorphism, 52.2 

manifold, differentiable, differentiable real-valued function, 
51.6.2 
manifold, differentiable, dimension, 51.3.16 

manifold, differentiable, direct product, 52.6 

manifold, differentiable, embedding, 52.5

manifold, differentiable, empty chart exclusion, 51.5.5 
manifold, differentiable, equivalent atlases, 51.4.10 
manifold, differentiable, finite-dimensional linear space, 
51.4.21 

manifold, differentiable, general linear group, 51.4.24 
manifold, differentiable, immersion, 52.5 
manifold, differentiable, inclusion, 52.3.3 
manifold, differentiable, induced by atlas, 51.5.15 
manifold, differentiable, map, 52
manifold, differentiable, map regularity, 52.5.1 
manifold, differentiable, product, 52 

manifold, differentiable, regular embedding, 52.5. 


manifold, differentiable, regular immersion, 52.5. 

manifold, differentiable, regular submersion, 52.5. 

manifold, differentiable, restriction, 51.4.15 

manifold, differentiable, restriction to open set, 51.4.16 

manifold, differentiable, submersion, 52.5 

manifold, differentiable, tangent bundle, 54, 54.5.30, 65.9 

manifold, differentiable, tangent bundle philosophy, 53 

manifold, differentiable, underlying topology, 51.3.14 

manifold, differentiable product-structured, regular 
embedding, 52.7.5 
manifold, differentiable real-valued function, 51.6
manifold, differentiable vector-valued function, 51.7
manifold, embedded, 50.6.4
manifold, empty, dimension, 49.4.10, 49.4.14 

manifold, empty, direct product, 50.4.3 

manifold, Hölder continuous, 50.6

manifold, level, 36.1.7

manifold, Lipschitz, 48.2.7, 50.6.3 

manifold, Lipschitz continuous, 50.6 

manifold, locally Lipschitz, 50.6.11

manifold, locally Lipschitz, rectifiable curve, 50.7 
manifold, non-topological, 49.3.8
manifold, product-structured, diffeomorphism, 52.7 
manifold, product-structured, tangent vector embedding 
map, 54.8, 58.7.5 

manifold, product-structured differentiable, induced atlas, 
52.7.1 

manifold, product-structured topological, induced atlas, 
50.5.1 

manifold, restriction, 52.4.16 

manifold, Riemannian, 73, 73.2.5, 73.2.11 

manifold, Riemannian, distance function, 73.7 

manifold, Riemannian, Laplacian operator, 74.6.4 
manifold, semi-Riemannian, 75.0.1 

manifold, tangent vector, 54.1.2 

manifold, topological, 50, 50.1, 50.1.1 

manifold, topological, boundary point, 50.1.8 

manifold, topological, complete atlas, 49.7.18 

manifold, topological, continuous embedding, 50.3.3 
manifold, topological, dimension, 50.1.4 

manifold, topological, embedding, 50.3 

manifold, topological, Hausdorff, 50.1.3 

manifold, topological, Hausdorff condition, 50.1.2 
manifold, topological, immersion, 50.3
manifold, topological, maximal atlas, 49.7.18 

manifold, topological, non-Hausdorff, 50.1.3 

manifold, topological, product, 50.4 

manifold, topological, product-structured submanifold, 50.5 
manifold, topological, regular embedding, 50.3.6
manifold, topological, regular immersion, 50.3.8 


manifold, topological, regular submersion, 50.3.13 

manifold, topological, regular topological submanifold, 50.2.8 

manifold, topological, submanifold, 50.2 

manifold, topological, submersion, 50.3 

manifold, topological, topological submanifold, 50.2.6 

manifold, topological, transition map, 49.6.4 

manifold, topological product-structured, regular 
submanifold, 50.5.4 

manifold, unidirectionally differentiable, 51.11 

manifold atlas, analytic, 51.10.2

manifold atlas, analytic, equivalent, 51.10.5 

manifold atlas, differentiable, 51.5.10 

manifold atlas, differentiable, indexed, 51.5.10 

manifold atlas, tangent bundle total space, 54.5.22 

manifold atlas, topological, 50.1.11 

manifold atlas, topological, indexed, 50.1.12 

manifold atlas, topological, transition map, 49.7.7 

manifold atlas for fibre set, 64.11.5

manifold chart, affine space, 26.2.7 

manifold chart, antisymmetric multilinear function bundle, 
56.5.24 

manifold chart, antisymmetric multilinear map bundle, 
56.7.15 

manifold chart, differentiable, torus, 51.10.8 

manifold chart, multilinear function bundle, 56.4.17 

manifold chart, tensor bundle, linear-style, 56.3.20 

manifold chart, tensor bundle, multilinear-style, 56.3.19 

manifold chart, topological, torus, 50.1.7 
manifold chart constructor, antisymmetric multilinear
function bundle, 56.5.25 

manifold chart constructor, antisymmetric multilinear map 
bundle, 56.7.16 

manifold chart constructor, list, 65.7.10 

manifold chart constructor, multilinear function bundle,
56.4.18

manifold chart constructor, tangent bundle, 54.5.21 

manifold chart constructor, tangent covector bundle, 55.4.8 

manifold chart constructor, tangent vector-frame bundle,
55.6.26

manifold chart constructor, tangent vector-tuple bundle,
55.5.33

manifold chart constructor, tensor bundle, 56.3.21, 56.3.22 

manifold chart constructor, vector-frame bundle, 65.8.8 

manifold chart constructor, vector-tuple bundle, 65.7.11 

manifold chart map, vector frame bundle, 55.6.25 

manifold chart transition rules, contragredient, 54.4.12,
55.2.16

manifold definition core structures, survey, 49.2.4

manifold dependence, target, differential of a map, 58.5.5,
58.5.7, 58.7.4, 58.7.5, 58.7.16

manifold direct product, differentiable, 52.6.2 

manifold direct product, topological, 50.4.2 

manifold direct product via atlases, topological, 50.4.8 


manifold embedding, differential, tangent vector embedding
map, 58.5.8

manifold example, Lipschitz, 53.2.4, 53.2.5, 53.2.6 
manifold map, differentiable, differentiability chain rule, 
52.1.17 

manifold map, differentiable, global differential chain rule, 
58.9.8 


manifold map, differentiable, pointwise differential chain rule,
58.4.13

manifold map, two-variable, componentwise differentiable, 
52.6.11 

manifold map, two-variable, jointly differentiable, 52.6.11 
manifold map differential, contragredient, 58.8.1 

manifold product, partial differential of map, 58.6 
manifold product, partial map differentiability, 52.6.8 
manifold product, slice-set submanifold, left, 52.6.17 
manifold product, slice-set submanifold, right, 52.6.17 
manifold product slice-set, tangent vector embedding map, 
54.8.3 

manifold structure, differentiable, concrete, 51.5.17 


manifold with boundary, curvature at boundary point, 70.4.5 


manifolds, differentiable, C^-diffeomorphic, 52.2.3 

manifolds, terminology, 49.2.8 

manifolds with boundaries, out of scope, 1.6.3 

Máo Zé-Dong, seek truth from facts, 1.4.15 

map, differentiable, between differentiable manifolds, 52.1,
52.1.2

map, differentiable manifold, 52 

map, differentiable manifold, differentiability chain rule, 
52.1.17 

map, exponential, 72.3, 72.3.3 

map, linear, 23, 23.1, 23.1.1 

map, multilinear, symmetry, 30 

map, name-to-object, predicate logic, 5.1.4 

map, truth value, 3.2.3 

map differentiability, joint-domain, 41.3 

map differentiability, joint-range, 41.3

map from linear space to second dual, canonical, 23.10.4 

map on ℝ regarded as a manifold, 57.9.9, 59.4.7

map-related vector fields, 61.6.2 

map-related vector fields, Lie bracket, 61.6 

map-rule notation, 10.19, 10.19.5

map-rule notation, double, 10.19.3 
map versus function, terminology, 10.1.5

mapping versus function, terminology, 10.1.5 

maps between normed linear spaces, differentiation, 41.9 
Mars robots, 77.1.3 

Mars robots, arithmetic is good enough, 14.3.2 
mathematical class, 8.8.2
mathematical class notations, 8.8.5 

mathematical class parameters, 8.8.5 
mathematical community, 7.10.2 
mathematical definitions, naturalism, 7.1.3 
mathematical induction, Peano axiom, 14.1.7 
mathematical induction, principle, 7.10.1 
mathematical induction principle, finite, 12.2.13 
mathematical induction principle, wrapped finite, 12.2.24 
mathematical induction principle (theorem), 12.2.12 
mathematical intuition versus logical rigour, 2.4.8 
mathematical object, 8.8.2
mathematical object, implicit chart, 26.11.6 
mathematical object parametrisation by sets, 9.1.4 
mathematicians, chronology, 77.1
mathematics, bedrock, 2.1

mathematics, naive, 2.1.1

mathematics, rigorous, 2.1.1 

mathematics foundation, 7.1.1 

mathematics heaven, 2.2.10

mathematics in stereo, 2.4.8 

mathematics learning, ET. 

mathematics ontology, 2.2 

mathematics package, computerised, 1.4.6 
mathematics philosophy, 2.0.3
matrices, conformable for multiplication, 25.3.6 
matrix, Cayley product, 25.1.4 

matrix, chart transition, differentiable manifold, 51.4.18 
matrix, column, span, 25.5.2 

matrix, column matrix, 25.2.14 

matrix, column vector, 25.2.15 

matrix, component, linear map, 25.7 

matrix, component transition, dual linear space, 23.9.7 
matrix, empty, 25.2.7 

matrix, Hessian, at local extremum, 51.6.13 
matrix, identity, 25.4 

matrix, infinite, 25.1.4 

matrix, infinite, sparse, 25.3.16 

matrix, invertible, 25.8.7 

matrix, invertible, properties, 25.8.13 

matrix, Jacobian, manifold chart transition, 51.4.18 
matrix, lower bound function, 25.12.2 

matrix, lower modulus, 25.11.2 

matrix, multidimensional, 25.15, 25.15.2 

matrix, multidimensional, antisymmetric, 25.15.6 
matrix, multidimensional, symmetric, 25.15.5 
matrix, non-invertible, 25.8.7 

matrix, nonsingular, 25.8.7 

matrix, orthogonal, 25.14.2 

matrix, partial derivative, of a function, 41.2.12 
matrix, real negative definite, 25.11.7 

matrix, real negative semi-definite, 25.11.7 

matrix, real positive definite, 25.11.7 

matrix, real positive semi-definite, 25.11.7 

matrix, real square, lower bound, 25.12 

matrix, real square, lower modulus, 25.11 

matrix, real square, upper bound, 25.12 

matrix, real square, upper modulus, 25.11 

matrix, real symmetric, 25.13, 25.13.3

matrix, rectangular, 25.2, 25.2.2 

matrix, row, span, 25.5.2 

matrix, row matrix, 25.2.14 

matrix, row vector, 25.2.15 
matrix, singular, 25.8.7

matrix, switching, non-Hausdorff locally Cartesian space, 
49.5.9 

matrix, symmetric, 25.13.2 

matrix, unit, 25.4.2 

matrix, upper bound function, 25.12.2 

matrix, upper modulus, 25.11.2 

matrix, zero, 25.2.8 

matrix addition, rectangular, 25.3 

matrix algebra, 25

matrix algebra, square, 25.8

matrix algebra history, 25.0.1 

matrix column, 25.2.13

matrix column null space, 25.6.2 

matrix column nullity, 25.6.3 

matrix column rank, 25.5.8 

matrix column span, 25.5 

matrix determinant, 25.10, 25.10.3 

matrix determinant, geometric interpretation, 25.10.14 

matrix diagonalisation, 76.8.5 

matrix eigenvalue, 25.3.1 

matrix eigenvector, 25.3.1 

matrix element, 25.2.11 

matrix element address, 25.2.11 

matrix element column index, 25.2.11 

matrix element row index, 25.2.11 

matrix entry, 25.2.11 

matrix function, Jacobian, Cartesian space, 46.3.7 

matrix group, 25.14 

matrix group, classical, 25.14.1 

matrix identity, 25.4.2 

matrix injection map, column, 25.2.18 

matrix injection map, row, 25.2.19 

matrix inverse, 25.8.10 

matrix inverse, left, 25.4. 

matrix inverse, right, 25.4.5 

matrix inverse procedure, 24.9.10 

matrix left inverse, 25.4 

matrix linear space, rectangular, 25.3.2 

matrix meanings, 25.1.2 

matrix multiplication, rectangular, 25.3 

matrix product, 25.3.7

matrix rank, 25.5 

matrix right inverse, 25.4 

matrix row, 25.2.13

matrix row null space, 25.6.2 

matrix row nullity, 25.6.3 

matrix row rank, 25.5.8 


matrix row span, 25.5

matrix trace, 25.9, 25.9.2 

matrix trace, geometric interpretation, 25.9.5 

matrix trace, invariant like the determinant, 25.10.14 
matrix trace, properties, 25.9.4 

matrix transpose, 25.4.11 

matter, dark, 2.3.3 


matter field, fermion, 47.12.7 


matter field, fermionic, vector bundle, 21.12.9, 47.11.2, 71.6.6,
15.4.1
matter fields, fermionic, gauge theory, 65.0.1 
Maurer-Cartan form, differential transition map, 
trivialisation, 62.5.3 
Maurer-Cartan form, fibre chart differential transition map, 
64.8.9 
Maurer-Cartan form, gauge transformation, literature, 
69.13.11 
Maurer-Cartan form, gauge transformation shorthand, 
69.13.12 
Maurer-Cartan form, left, 62.5.4 
Maurer-Cartan form, left invariant, right transformation,
62.10.14

Maurer-Cartan form, left invariant, standard, 62.5.4

Maurer-Cartan form, Lie algebra infinitesimal action map,
66.5.6

Maurer-Cartan form, literature, 62.5.7, 64.8.13

Maurer-Cartan form, related to connection forms and
generators, 69.7.4

Maurer-Cartan form, right, 62.7.13

Maurer-Cartan form, right invariant, left transformation,
62.10.14

Maurer-Cartan form, right invariant, non-standard, 62.7.12 

Maurer-Cartan form is a connection form, 62.10.13 

Maurer-Cartan forms on Lie groups, 62.5 

max map domain, 11.2.26

max notation, 11.2.16, 11.2.20, 11.2.22 

maximal atlas, C^, 51.4.4 

maximal atlas, disadvantages, 50.4.9, 51.2.2, 51.4.4, 54.5.31,
54.9.12, 64.4.1

maximal atlas, locally Cartesian space, 49.7.18 

maximal atlas, topological manifold, 49.7.18 

maximal C* atlas, differentiable manifold, 51.4.5 

maximal element, partial ordered set, 11.2.2 

maximal partial order, 11.5.5 

maximality theorem, Hausdorff's, 7.11.13 

maximum, local, on differentiable manifold, 51.6.10 

maximum, real-valued function, zero partial derivatives, 
51.6.12 

maximum map, 11.2.21 

maximum of partially ordered set, 11.2.4, 11.3.2 

maximum set-map, 11.2.19 

maximum/minimum of function, zero derivative, 40.6.1 

Maxwell, James Clerk, 77.1.6 

Maxwell, James Clerk, attraction at a distance, 21.0.7 

Maxwell, James Clerk, relativity, 77.3.2 

Maxwell's equations, gauge terminology, 70.8.4 

Maxwell's equations, gauge theory, 70.8.1 

Maxwell's equations, gauge theory example, 70.8.7 

Maxwell's equations, history of fibre bundles, 47.0.3 

Maxwell's equations, invariance property, 70.8.2 

Maxwell's equations, principal bundle, 66.1.6 

mean speed, bounded by maximum pointwise speed, 40.8.7 

mean value theorem, 40.6 

mean value theorem, curve in Cartesian space, 40.8.6 

mean value theorem, curves, 40.8 

mean value theorem, real-valued function of real numbers, 
40.6.6 

mean value theorem, several dependent variables, 40.8 

meaning, symbol-strings, 3.3.1

measure, double shadow set, upper bound, 45.6.9 

measure, exterior Lebesgue, 43.6.3 

measure, Lebesgue, 30.4.26 

measure, Lebesgue, outer, 43.6.4 

measure, shadow set, upper bound, 45.5.8 

measure and integration, research thrust, 43.1.3 

measure of open set of real numbers, 32.7.10 

measure theory, geometric, 30.4.26 

measure zero, explicit families of real-number sets, 45.3 

measure zero, Lebesgue, 45

measure zero, rational numbers, 45.1.5 

measure zero set, real-number, properties, 45.1.4 

measure zero set, real-number explicit, properties, 45.2.4 

measure-zero set of real numbers, 45.1, 45.1.3 

measure-zero set of real numbers, explicit, 45.2, 45.2.2 

measure-zero set of real numbers, explicit family, 45.3.2 

measure zero sets, explicit, explicit countable family, 45.3.1 

measure-zero sets, explicit real-number family, countable 
union, 45.3.3 
measurement, orbit-space, associated fibre bundle, 47.11.1 
measurement map, baseless figure/frame bundle, 20.10.8 
measurement process map, topological fibre/frame bundle, 47.13.2 
measurement space, baseless figure/frame bundle, 20.10.8 
measurement space, topological fibre/frame bundle, 47.13.2 
measurement transformation group, baseless figure/frame bundle, 20.10.8 
measurement transformation group, topological fibre/frame bundle, 47.13.2 
measurements, phenomena, real numbers, 15.0.3 
measuring stick, 53.1.3, 53.1.4, 59.1.2 
meat-centric cook, 1.4.14 
mechanics, quantum, out of scope, 1.6.3 
mechanisation of logic, social benefits, 4.1.3 
mediate cardinal, 13.10, 13.10.7, 13.11.6 
mediate cardinals, continuum hypothesis, 7.11.8 
megadrought, Late Pleistocene, 3.0.2 
membership chain, set, 7.8.9 
membership-predicate, relation representation, 9.1.1, 9.5.1 
membership relation, concrete propositions, 3240 
membership relation network traversal, 7.8.9 
membership relation on the left, 7.5.1 
membership symbol ∈, set, 7.2.9 
membership theory, 7.2.10 
memory, short-term, 7.10.2 
memory, virtual, computer, 7.10.2 
Mercator, Gerard, 77.1.4 
mesh, interval partition, 43.3.10 
mesh-constrained set of interval partitions, 43.3.11 
Mesopotamia, Gilgamesh, 3.1.1 
Mesopotamian mathematics, 77.1.2 
meta-function, 10.2.30 
metadefinition, tensor product, 28.6.2 
metadefinition, tensor product space, 28.6 
metamathematical definition, 3.2.4 
metaphysical set, 22.7.22 
metaphysics, limits, 5.2.8 
metatheorem, 4.8.8 
metatheorem, deduction, 4.3.18, 4.8 
metatheorem, deduction, literature survey, 4.8.6 
metatheorem, deduction, predicate calculus, 6.6.27 
method of exhaustion, 17.5.3, 18.4.1, 43.0.1, 43.1.2 
metric, discrete, 37.2.6 
metric, induced topology, 37.5 
metric, pseudo-Riemannian, differentiable, 75.2.2 
metric, Riemannian, 73.2.4 
metric, Riemannian, component array field, 73.3.2 
metric, Riemannian, differentiability, 73.2.10 
metric, Riemannian, Hessian operator, 73.1.1 
metric, Riemannian, inverse component array field, 73.3.8 
metric, two-point, 37.2.7 
metric, usual, on real numbers, 37.2.8 
metric-compatible connection, 74.2.2 
metric-free parallelism, affine connection, 71.6.1 
metric function, general, 37.1, 37.1.2 
metric function, real-valued, 37.2, 37.2.3 
metric function, Riemannian, 73.4, 73.4.2 
metric function, Riemannian, for tensor field, 73.4.3 
metric function, two-point, 73.9.1 
metric function, valued in ordered commutative group, 37.1.2 
metric-invariant geometry, 77.2.5 
metric layer, 1.1 
metric linear space, Cartesian, 26.11.1 
metric linear space, Euclidean, 26.11.1 
metric space, 37 
metric space, ball, 37.3 
metric space, bounded set, 37.4.12 
metric space, Cartesian, 26.11.1 
metric space, Cauchy sequence, 37.8.3 
metric space, compactness and convergence, 37.7 
metric space, complete, 37.8.9 
metric space, complete, nested set convergence, 37.9 
metric space, completeness, 37.8 
metric space, completeness, Cartesian space, 37.8.14 
metric space, completeness, real numbers, 37.8.14 
metric space, continuity, 38 
metric space, continuity of map, 38.1 
metric space, continuous function, 38.1.1 
metric space, convergence of sequence, 37.7.11 
metric space, curve length, 38.8, 38.8.2 
metric space, Euclidean, 26.11.1 
metric space, general, 37.1, 37.1.3 
metric space, Heine-Borel property, 37.7.20 
metric space, Hölder continuity, 38.7 
metric space, induced topology, 37.5.2 
metric space, Lipschitz continuity, 38.6 
metric space, paracompact, 37.5.11, 37.7.19 
metric space, radius of set, 37.4.11 
metric space, real-valued, 37.2, 37.2.4 
metric space, uniform continuity, 38.3 
metric space, uniform convergence, 38.4 
metric space, valued in ordered commutative group, 37.1.3 
metric spaces, continuous bijection, continuous inverse, 38.1.13 
metric spaces, explicit continuity of maps, 38.2 
metric subspace, 37.2.14 
metric tensor, misnomer for inner product field, 73.2.1 
metric tensor, non-singular, covariant vector components, 73.5.9 
metric tensor calculation, Hessian operator, 73.2.6, 73.9.1, 73.10.3 
metric tensor calculation from distance function, 76.5 
metric tensor field, Minkowskian time-space, 75.2.6 
metric tensor field, pseudo-Riemannian, 75.2, 75.2.1 
metric tensor field, Riemannian, 73.2, 73.2.4 
metric tensor field, Riemannian, induced by distance function, 73.9.4 
metric tensor field, space-time Minkowskian, 75.2.5 
metric tensor field, tensor calculus, 73.3 
metric tensor inverse, 73.5.1 
metric tensor on two-sphere, 76.6.1 
metric tensors versus distance functions, 73.9 
metric transformation group, 77.2.6 
metrisable topological space, 37.5.15 
Michelson, Albert Abraham, 77.1.7 
Michelson, Albert Abraham, relativity, 77.3.2 
microscope, human mind, 4.3.5 
middle, excluded, 3.1.4 
milestones, Cauchy-Riemann-Darboux integral, history, 43.5.1 
milestones, history of integration, 43.1.2 
Mills, Robert Laurence, 77.1.7 
min map domain, 11.2.26 
min notation, 11.2.16, 11.2.20, 11.2.22 
min/max equivalent, logic operator, 3.15.6 
mind, Roman, 77.1.3 
minimal closed-singleton topology, 31.11.8 
minimal element, partial ordered set, 11.2.2 
minimal structure group, non-topological fibre bundle, 21.7.7 
minimal T topology, 31.11.8 
minimalism, definitions, 18.11.13 
minimum, local, on differentiable manifold, 51.6.10 

minimum, real-valued function, zero partial derivatives, 
51.6.12 

minimum map, 11.2.21 

minimum of partially ordered set, 11.2.4, 11.3.2 

minimum set-map, 11.2.19 

minimum/maximum of function, zero derivative, 40.6.1 

Minkowski, Hermann, 77.1.7 

Minkowski, Hermann, relativity, 77.3.2 

Minkowski base space fibre bundle example list, 70.8.6 

Minkowski functional, seminorm unit ball recovery, 24.6.10 

Minkowski functionals of seminorm unit balls, 24.6.7 

Minkowski scalar multiple of a set, 22.10.2 

Minkowski scalar product of a set, 22.10 

Minkowski set subtraction, notation, 22.10.5 

Minkowski space, additive real ordinary bundle, 64.8.8 

Minkowski space, meaningfulness of geodesics, 72.0.1 

Minkowski space, principal bundle, connection form, 69.5.6 

Minkowski space, principal bundle, gauge potential, 69.11.4, 
69.13.5 

Minkowski space, principal bundle, gauge theory example, 
70.8.7 

Minkowski space, principal bundle, horizontal component 
map, 69.3.3 

Minkowski space, principal bundle, horizontal lift localisation, 
69.1.14 

Minkowski space, principal bundle, vertical component map, 
69.4.5 

Minkowski space, real principal bundle, 66.1.6 

Minkowski space, real principal bundle, horizontal lift, 69.1.9 

Minkowski space, real principal bundle, infinitesimal action, 
66.5.4 

Minkowski space, real principal bundle, infinitesimal action 
map inverse, 66.5.8 

Minkowski space, real principal bundle, right action, 66.2.19 

Minkowski space, real principal bundle, right action 
invariance, 69.1.10 

Minkowski space, real principal bundle, vector right action, 
66.2.24 

Minkowski space example, fundamental vertical vector field, 
66.6.3 

Minkowski space-time, 75.1.11 

Minkowski space-time, gauge theory construction stages, 
75.4.1 

Minkowski space-time, motivated by physics, 75.0.3 

Minkowski space-time, not true geometry, 75.1.2 

Minkowski space-time, pseudo-Riemannian manifolds, 75.1.1 

Minkowski sum of convex sets is convex, 22.11.23 

Minkowski sum of intervals is an interval, 22.11.24 

Minkowski sum of sets, 22.10, 22.10.2, 22.10.4 

Minkowskian inner product criterion, basis-free, 24.10.8 

Minkowskian inner product definition, coordinate-free, 
24.10.6 

Minkowskian inner product on real linear space, 24.10.6 

Minkowskian metric, pseudo-Riemannian metric, 75.2.4 

Minkowskian metric tensor field, space-time, 75.2.5 

Minkowskian metric tensor field, time-space, 75.2.6 

misnomer, Cartesian, for Oresmian, 26.11.1 

misnomer, connection form, for covariant derivative form, 
69.5.2 

misnomer, gauge, 70.8.4 

misnomer, metric tensor, for inner product field, 73.2.1 

misnomer, non-decreasing/non-increasing, partial order, 
11.1.29 

mission creep, 1.4.1 

mixed multilinear function space, 29.3.8, 29.5.2 

mixed set-map, function, 10.7 

mixed tensor product space, 29.3.3, 29.5.4 
mixed tensor product space, single primal space, 29.5 
mixed tensor product space on mixed primal spaces, 29.3 
mixed tensor space, traditional, 29.5.13 
mixed tensors, components, mixed primal spaces, 29.4 
mixed tensors, components, single primal space, 29.6 
mixed tensors, physics, 29.5.14 
mobile, repère, 55.6.1, 55.7.12, 57.11 

Möbius, August Ferdinand, 24.4.1, 26.1.4, 77.1.6, 77.2.3 
Möbius strip, topological fibre bundle, 47.6.18 

model, logic, 2.4.3 

model, name clash, 2.4.2 

model, set theory, Zermelo, 12.6.3 

model of observations, real numbers, 15.3.2, 53.1.7 
model theory, 6.1.6, 6.6.14 

model theory, meaning of assertion symbol, 5.3.6 

model theory, out of scope, 1.6.3 

model theory, underlying set theory, 3.16.2, 
model validation framework, ZF, 7.1.4 
modelling, blind spot, 53.2.2 
modelling the real world, integers, 26.11.6 
models used by architects and engineers, 2.2.13 
module, left, over a ring, 19.3.1 

module, left, over a ring, unitary, 19.3.6 
module, line on affine space over, 26.4.5 
module, ring of endomorphisms of, 19.1. 
module, unitary, affine space over, 26.7 

module, unitary, over a field, 19.3.16 

module, unitary left, 22.1.2 

module automorphism over a set, 19.1.10 

module homomorphism over a set, 19.1.10 

module morphism notations, 19.4.4 

module of homomorphisms, 19.1.4, 19.1.14 

module of module endomorphisms, 19.4.10 

module of ring-module homomorphisms, 19.4.8 

module operator domain, 19.1.7 

module over a ring, inner product, 19.7.2 

module over a ring, inner product, positive definite, 19.7.3 
module over a ring, norm, 18.5.11, 19.6.1, 19.6.2 

module over a ring, normed, 19.6.3 

module over a ring, submodule, 19.3.9 

module over a ring, unitary, morphisms, 19.4.11 

module over a ring, unitary, submodule, 19.3.14 

module over a ring morphisms, 19.4.3 

module over a set, 19.1.7 

module over Archimedean ring, Cauchy-Schwarz inequality, 
19.8.9 

module over field, unitary, affine space over, 26.10.3 

module over group, 19.2 

module over group, affine space over, 26.6, 26.6.2 

module over group, left, 19.2.6 

module over monoid, left, 19.2.5 

module over ordered ring, half-line on affine space over, 26.7.5 
module over ring, 19.3 

module over ring, affine space over, 26.6, 26.6.6 

module over ring, Cauchy-Schwarz inequality, 19.8.2 
module over ring, inner product, 19.7 

module over ring, morphism, 19.4 

module over ring, norm, 19.6 

module over ring, orthogonal vectors, 19.7.7 

module over ring, seminorm, 19.5, 19.5.3 

module over ring, square-norm, 19.7.1 

module over ring, unitary, affine space over, 26.7.2 

module over semigroup, 19.2 

module over semigroup, left, 19.2.4 

module over set, affine space over, 26.4 

module over set, line on affine space over, 26.5.3 

module over set, morphism notations, 19.1.12 

module over unstructured operator domain, 19.1 
module over unstructured set, affine space over, 26.4.10 

module with operator domain, 19.1.7 

module without operator domain, 19.1.2 

module without operator domain, affine space over, 26.4.2 

modules, 19 

modules and algebras, family tree, 19.0.1 

modulo function, 16.5.19 

modulus, lower, real square matrix, 25.11, 25.11.2 

modulus, upper, real square matrix, 25.11, 25.11.2 

modulus function, real number, 16.5.5 

modulus of continuity, 38.3.9 

modulus of continuity, pointwise, 38.2.4 

modulus of continuity bound, 38.3.9 

modus ponendo ponens, 4.8.15 

modus ponens, 4.1.6, 4.3.17 

modus tollendo tollens, 4.8.15 

molten magma, 2.1.1 

Monge, Gaspard, 77.1.6 

monkey, visual cortex, 40.2.1 

monoid, 17.2, 17.2.6 

monoid, left module, 19.2.5 

monomial, multilinear function, 27.4.8 

monomial function, derivative, 40.5.12 

monomorphism, C" differentiable, differentiable manifold, 
52.2.2 

monomorphism, linear space, 23.1.8, 23.11.9, 23.11.10 

monomorphism, order, 11.1.01] 

monomorphism, ordered field, 18.8.17 

monomorphism, ring, 18.1.20 

monomorphism, set, 10.5.21 

monomorphism, transformation group, 20.6.2, 20.8. 

monomorphism versus injective homomorphism, 10.5.23 

monotonicity, strict, implied by continuity and injectivity, 
34.9.9 

monsters, ordinal numbers and choice axiom, 13.2.1, 13.3.11 

Montaigne, Michel de, 1.4.3, 77.4.3 

Morgan’s law, de, 8.2.3, 8.4.11 

Morley, Edward Williams, 77.1.7 

Morley, Edward Williams, relativity, 77.3.2 

morphism, C! differentiable, Cartesian space, 41. 41.7.6 

morphism, CF differentiable, Cartesian space, 42. 42.7.2 

morphism, C^ differentiable, differentiable manifold, 52.2.2 

morphism, group, unital, 17.4.8, 36.9.5 

morphism, module over ring, 19.4 

morphism, set, 10.5.21 

morphism, unital, field, 18.7.9 

morphism, unital, unitary ring, 18.2.11 

morphism, unitary ring, 18.2.7 

morphism notations, linear space, 23.1.9 

morphisms, group, 17.4 

morphisms, groups, 17.4.1 

morphisms, linear space, 23.1.8 

morphisms, modules over a ring, 19.4.3 

morphisms, ordered field, 18.8.17 

morphisms, ring, 18.1.20 

morphisms, unitary modules over a ring, 19.4.11 

motion, equations, gauge potential, 70.8.1 

motivations of this book, 1.4 

Mount Olympus, 7.1.2 

mouse-catcher, dog, 1.4.3 

moving frame, 55.6.1, 55.7.12, 57.11 


moving-frame coefficient array, vector bundle connection, 68.3 


moving-frame connection, Cartan-style, 71.14.1 
moving-frame field affine connection, 71.14.2 
moving train, inertial frame, 20.1.10, 20.10.2 
MSC 2010 subject classification, 1.8 

mud, clear as, context-dependent, 63.6.7 
multi-index, 14.8.38, 14.8.39 
multi-index, higher-order partial derivatives, 42.2.1, 42.2.5, 
42.5.6, 42.5.7 

multi-index-degree-tuple, 14.8.39 

multi-index factorial function, 14.8.39 

multi-index-list, 14.8.38 

multi-index style comparison, 14.8.37 

multi-variable derivatives, 41 

multidimensional matrix, 25.15, 25.15.2 

multidimensional matrix, antisymmetric, 25.15.6 

multidimensional matrix, symmetric, 25.15.5 

multilinear algebra, 27, 27.1.2, 27.1.5 

multilinear dual, 27.2.25, 29.2.23 

multilinear dual function space, canonical basis, 27.7.2 

multilinear dual of a linear space family, 27.6.4 

multilinear dual space, 29.2.2 

multilinear effect, antisymmetric, 30.4.4 

multilinear field, 27.1.1 

multilinear form short-cut, general, 57.7.25 

multilinear function, 27.2.1, 27.2.3, 27.3 

multilinear function, alternating, 30.1. 

multilinear function, antisymmetric, 30.1.8 

multilinear function, antisymmetric, components, 30.3.10 

multilinear function, dual, simple, 27.4.14 

multilinear function, juxtaposition product by vector, 27.2.31, 
56.7.9 

multilinear function, linear space dimension, 27.6.14 

multilinear function, simple, 27.4, 27.4.6 

multilinear function, symmetric, 30.1.7 

multilinear function bundle, 56.4.19 

multilinear function bundle, antisymmetric, 56.5.26 

multilinear function bundle, antisymmetric, manifold chart, 
56.5.24 

multilinear function bundle, antisymmetric, non-topological, 
56.5.23 

multilinear function bundle, non-topological, 56.4.15 

multilinear function bundle, tangent, 56.4 

multilinear function bundle, tangent, antisymmetric, 56.5 

multilinear function bundle, tangent, symmetric, 56.6 

multilinear function bundle manifold chart, 56.4.17 

multilinear function bundle notation, antisymmetric, 56.5.14 

multilinear function bundle notation, symmetric, 56.6.3 

multilinear function fibration, 56.4.10 

multilinear function fibration, antisymmetric, 56.5.16, 56.5.17 

multilinear function fibration, antisymmetric vector-valued, 
56.7.6 

multilinear function fibration, symmetric, 56.6.5 

multilinear function fibration, vector-valued, 56.7.5 

multilinear function fibration notation, 56.4.9 

multilinear function map, canonical, 27.4.4 

multilinear function map transpose, canonical, 27.4.13 

multilinear function monomial, 27.4.8 

multilinear function on manifold, vector-valued, 56.7.1 

multilinear function on tangent space, antisymmetric, 
particular, 56.5.9 

multilinear function on tangent space, particular, 56.4.4 

multilinear function space, canonical basis, 27.6.9, 30.2.11, 
55.5.41 

multilinear function space, canonical injection, scalar, 27.3.7 

multilinear function space, mixed, 29.3.8, 29.5.2 

multilinear-function-style tensor juxtaposition product, 29.7.6 

multilinear functions, linear space, 27.6.3 

multilinear functions, universal factorisation property, 27.8, 
27.8.7 

multilinear map, 27.2, 27.2.3 

multilinear map, alternating, 30.1.3 

multilinear map, antisymmetric, 30.1, 30.1.3 

multilinear map, antisymmetric, components, 30.3.2, 30.3.5 


multilinear map, antisymmetric, vector-valued, 56.7.7 
multilinear map, canonical, 28.2, 28.2.1, 28.2.2 

multilinear map, components, 27.5 

multilinear map, degree, 27.2.12 

multilinear map, infinite component spaces, 27.2.7 
multilinear map, infinite-dimensional, 27.2.25 
multilinear map, linear space dimension, 27.6.18 

multilinear map, symmetric, 30.1, 30.1.2 

multilinear map bundle, antisymmetric, 56.7.18 

multilinear map bundle, antisymmetric, manifold chart, 
56.7.15 

multilinear map bundle, antisymmetric, non-topological, 
56.7.14 

multilinear map bundle, tangent, antisymmetric, 56.7 
multilinear map bundle, tangent, symmetric, 56.7 
multilinear map bundle notations, 56.7.4 
multilinear map bundle on a manifold, 56 

multilinear map fibration, antisymmetric, fibre chart, 56.5.19 
multilinear map fibration, vector-valued antisymmetric, fibre 
chart, 56.7.11 
multilinear map linear space, 27.6 

multilinear map on tangent space, antisymmetric, particular, 
56.7.8 

multilinear map space, antisymmetric, canonical basis, 30.3.6 
multilinear map space, canonical basis, 27.6.16, 30.2.7 
multilinear map space, canonical injection, vector, 27.2.23 
multilinear map space basis, antisymmetric, 30.3 

multilinear map space canonical basis, 30.2 


multilinear-map style tensor on differentiable manifold, 56.1.3 


multilinear map symmetry, 30 

multilinear maps, linear space, 27.6.3 

multilinear maps on dual spaces, linear space, 27.7 

multilinear-style tensor, Cartesian space, 30.6.9 

multilinear-style tensor, specified by coordinates, 56.1.11 

multilinear-style tensor bundle manifold chart, 56.3.19 

multilinear-style tensor fibration total space, 56.3.2 

multilinear-style tensor space on Cartesian space, 30.6.4 

multilingualism, differential geometry formalisms, 73.3.1 

multinomial coefficient space, 14.12.20 

multinomial coefficient space, symmetric, 14.12.20 

multiple-domain function product, 55.5.26 

multiple of a set, scalar, Minkowski, 22.10.2 

multiple point of curve, 36.2.13 

multiple point of path, 36.8.10 

multiple-valued function, 10.9.1 

multiplication, defective, extended real numbers, 16.2.8 

multiplication, real numbers, 15.8 

multiplication, rectangular matrix, 25.3 

multiplication, scalar, 22.1.1 

multiplication rule, constant, differential calculus, 40.5.9, 
41.1.18 

multiplicative axiom, 7.12.2, 10.3.3 

multiplicative axiom, countable, 13.7.22 

multiplicative class of subsets of a set, 18.11.4 

multiplicative identity element, unitary ring, 18.2.2 

multiplicity, 6.8 

multiplicity quantifier, 6.8.8 

multiplicity quantifier, ordered pair test, 9.2.15 

musical isomorphism, flat, 73.5.1, 73.5.3 

musical isomorphism, sharp, 73.5.1, 73.5.4 

mysteries, fundamental, real-world econ 49.3.11 

myth, 7.1.2 

myth, creation, 3.1.6 


naive, etymology, 2.1.2 

naive comprehension, set specification, 7.7.9 

naive cross product, 9.2.4 

naive derivative, fibration cross-section, 64.7.7 

naive derivative, Leibniz rule, tangent bundle vector field, 
61.3.3 
naive derivative, Leibniz rule, vector bundle cross-section, 
65.6, 65.6.2 

naive derivative, ordinary fibre bundle, 64.7 

naive derivative, vector field, 61.2, 61.4 

naive derivative, vector field, Leibniz rule, 61.3 

naive derivative of product of maps, Leibniz rule, 61.3.1 

naive differential form, Cartesian space, 46.6, 46.6.2, 46.6.3 

naive differential form, differentiable, Cartesian space, 46.6.5, 
46.6.7 

naive induction, 9.3.4 

naive logic, 48.8 

naive mathematics, 2.1.1, 2.1.2 

naive set, 3.2.3, 3.2.10, 6. 10, 6.5.9 

naive theorem, £38 

naive vector field, action of vector, Cartesian space, 46.4.2 

naive vector field, Cartesian space, 46.3, 46.3.3 

naive vector field, differentiable, Cartesian space, 46.3.4 

naive vector field, directional derivative, non-vectorial, 46.4.4 

naive vector field, Lie bracket, Cartesian space, 46.4.7 

naive vector field directional derivative, Cartesian space, 
46.4.2 

name, constant, definition, 5.1.12 

name, logical expression, dereferencing, 3.9.2 

name, logical expression, unquoting, 3.9.2 

name, statement, 4.1.6 

name clash, field, 18.7.6 

name clash, model, 2.4.2 

name expansion, 3.9.2 

name map, proposition, 3.3.2 

name scope, proposition, 3.3.4 

name space, proposition, 3.3, 3.3.2 

name-to-object map, piede EG 5.1.4 

NAND (not-and), 4.7.4 

NAND operator, 3. 7.8, 3.7.8, 3.7.10, 3.7.14 

NASA, Skylab, 26.10.1 

native geometry, 53.1.7 

native time, 53.1.7 

native velocity, 53.1.7 

native velocity, pull-back, 40.1.1 

natural deduction, 4.8.2, 4.8.5, 5.3 

natural deduction, tabular systems, literature, 5.3.2 

natural homomorphism, quotient linear space, 24.2.12 

natural isomorphism, 22.2.3 

natural isomorphism, tensor space, 29.2.5 

natural number, 14, 14.1 

natural number, extended, 14.5.7 

natural number arithmetic, 14.3 

natural number definitions and notations, survey, 14.1.1 

natural number system, Peano axioms, 14.1.5 

natural number system, standard, 14.1.18 

natural numbers, first number, 14.1.2 

natural numbers, leading interval, 14.1.21 

naturalism, mathematical definitions, 7.1.3 

nature, essential, 1.4.7 

nature, set, 7.8.9 

NBG (Neumann-Bernays-Gödel), 7.1.10 

nearest integer distance function, 16.5.22 

nearest integer function, 16.5.17 

negation, logical, 3.6.2 

negative definite, 25.11.7, 30.5.3 

negative identity, Lie group inversion map differential, 62.2.9 

negative integral power, 16.6.4 

negative number one's complement representation, 14.4.13 

negative number two’s complement representation, 14.4.12 

negative of real number, 15.7.6 

negative rational power, positive real number, 16.6.12 

negative semi-definite, 25.11.7, 30.5.3 

neighbourhood, closed, 31.8.26 
neighbourhood, compact, 33.6.2 


neighbourhood, coordinate, cross-section localisation, 21.6.1 


neighbourhood, global, 31.2.6 

neighbourhood, open, 31.3.11 

neighbourhood, per-point, 31.2.6 

neighbourhood of point in topological space, 31.3.12 

neighbourhood system, 31.2.6 

neighbourhood/point pairs, topological space, 31.3.14 

neighbourhoods, closed, 31.8.25 

nested family of closed sets, 37.9.4 

nested interval theorem, 37.9.2, 37.9.3, 37.9.12 

nested sequence of subsequences, Ascoli’s theorem, 38.5.1 

nested set convergence, complete metric space, 37.9 

net, Cauchy, 37.8.15, 38.3.13 

net, Cauchy, Cauchy integral, 43.3.19 

network, communication, data packet, 21.15.7 

network, socio-mathematical, 4.3.3, 7.10.2 

network traversal, membership relation, 7.8.9 

Neumann-Bernays-Gödel set theory, 7.5.11, 7.8.6 

neural path, 22.0.3 

never-constant curve, 36.4.3 

Newton, Isaac, 22.0.3, 77.1.5, 77.1.6 

Newton, Isaac, calculus learned from Isaac Barrow, 77.1.4 

Newton, Isaac, derivative notation, 40.4.12 

Newton, Isaac, integration, 43.1.2 

Newton integral, 43.1.4, 43.2 

Newton's integral, 43.2.3 

Newton's philosophising Rule I, 2.3.2 

noise sensitivity, operational amplifier, 44.1.8, 44.6.2 

non-Archimedean ordered ring, 18.4.4, 18.4.5, 18.4.7 

non-decreasing, misnomer, for a partial order, 11.1.29 

non-decreasing derivative implies convexity, 40.6.11 

non-decreasing function, 11.1.30 

non-empty set, 7.6.3 

non-Hausdorff locally Cartesian space, 49.5, 51.8.2 

non-Hausdorff manifold, pseudo-metric distance, 49.5.4 

non-Hausdorff topological manifold, 50.1.3 

non-Hausdorff topology, 33.1.27 

non-increasing, misnomer, for a partial order, 11.1.29 

non-increasing derivative implies concavity, 40.6.11 

non-increasing function, 11.1.30 

non-invertible matrix, 25.8.7 

non-logical axiom, logical argument, 4.3.2 

non-measurable set, Lebesgue, 5.2.7, 7.11.13, 7.12.7 

non-membership notation, set, 7.2.5 

non-negative extended integers, usual topology, 32.5.4 

non-negative extended real number system, 16.2.10 

non-negative integers, usual topology, 32.5.3 

non-negative integral power, 16.6.3 

non-negative rational power, non-negative real number, 
16.6.11 

non-negative second derivative implies convexity, 42.1.22 

non-positive second derivative implies concavity, 42.1.22 

non-recursive syntax specification, infix logical expression, 
3.12.1 

non-representational art, 2.4.4 

non-singular metric tensor, covariant vector components, 
73.5.9 

non-singular symmetric bilinear function on linear space, 
24.10.2 

non-surjective projection map, fibration, 21.2.6 

non-tensorial Christoffel array, 74.3.8 

non-topological atlas, 49.3, 49.3.4 

non-topological chart, 49.3, 49.3.3 

non-topological F-fibration, 21.7.5 

non-topological fibration, absolute parallelism, 21.15.2 

non-topological fibration, constant cross-section, 21.6.6 
non-topological fibration, constant cross-section extension, 21.6.8 
non-topological fibration, cross-section, 21.3, 21.3. 
non-topological fibration, fibre space, 21.2. 
non-topological fibration, form-style, 21.4.2 
non-topological fibration, form-style, cross-section short-cut, 21.4 
non-topological fibration, form-style, platform fibration, 21.4.6 
non-topological fibration, form-style, short-cut map, 21.4.9 
non-topological fibration, local cross-section, 21.3.3 
non-topological fibration, local trivialisation, 21.5.7 
non-topological fibration, non-uniform, 21.1, 21.1.2 
non-topological fibration, parallelism, 21.15 
non-topological fibration, uniform, 21.2, 21.2.1 
non-topological fibration with fibre space F, 21.2.10 
non-topological fibre bundle, 21, 21.8.3 
non-topological fibre bundle, associated, 21.12, 21.12.5 
non-topological fibre bundle, associated, orbit-space method, 21.14.2 
non-topological fibre bundle, associated, patchwork, 21.13.2 
non-topological fibre bundle, fibre chart transition map, 21.8.12 
non-topological fibre bundle, orbit-space associated, 21.14 
non-topological fibre bundle, patchwork associated, 21.13 
non-topological fibre bundle association map, 21.12.4 
non-topological fibre bundle with fibre space F, 21.7. 
non-topological locally constant cross-section, 21.6 
non-topological manifold, 49.3.8 
non-topological ordinary fibre bundle, 21.8 
non-topological ordinary fibre bundle, parallelism, 21.16 
non-topological principal bundle, constant cross-section, 21.10.2 
non-topological principal bundle, identity chart for cross-section, 21.10.7 
non-topological principal bundle, identity cross-section, 21.10.3 
non-topological principal bundle, right action chart-independence, 21.11.2 
non-topological principal bundle, right action map, 21.11 
non-topological principal fibre bundle, 21.9, 21.9.4 
non-topological principal fibre bundle, right action map, 21.11.4 
non-topological principal fibre bundle, right transformation group, 21.11.8 
non-topological vector bundle, 24.11, 24.11.2 
non-topological vector bundle, linear space of cross-sections, 24.11.4 
non-trivial ring, 18.1.5 
non-trivial topology, 31.11.11 
non-uniform non-topological fibration, 21.1, 21.1.2 
non-uniqueness, first-order ODE, 44.3.2 

non-vectorial directional derivative, naive vector field, 46.4.4 

non-vertical Lie-algebra-generated vector field transition rule, 
64.14.5 

non-vertical vector field generated by Lie algebra, 64.14 

non-vertical vector field induced on fibre set by Lie algebra, 
64.14.2 

non-zero ring, 18.1.5 

noncommutative ordered ring, 18.3.9 

nonholonomic vector fields, exterior derivative, 46.8, 61.14 

nonholonomic vector fields, Riemann curvature, 70.3.2 

nonholonomy, Lie bracket, integral curve, 61.5.20 

nonholonomy-per-area limit, Riemann curvature, 70.2.3 

nonlogical axiom, 3.13.5, 3.14.7 

nonsingular matrix, 25.8.7 

NOR operator, 3.7.8, 3.7.10, 3.7.14 

norm, distance function induced by, 39.3.2 
norm, Euclidean, 24.7.15 

norm, linear space, 24.7, 24.7.2 

norm, linear space, bounds, 24.8 

norm, module over ring, 19.6 

norm, power, Cartesian space, 24.7.11 

norm, topology induced by, 39.3.2 

norm on module over a ring, 18.5.11, 19.6.1, 19.6.2 
normal coordinates, 60.1.4 

normal form, conjunctive, 4.7.4 

normal form, disjunctive, 4.7.4 

normal space, 31.2.8 

normal subgroup, 17.7.7 

normal topological space, 33.3.24 

normed linear space, 24.7.5, 39.3 

normed linear space, finite-dimensional, 39.5 
normed linear spaces, differentiation of maps, 41.9 
normed module over a ring, 19.6.3 
normed real linear space, 24.7.7 

normed space, Euclidean, 24.7.16, 26.11.1 

norms, equivalent, linear space, 24.8.11 

north, cosmic abstraction, 26.10.1 

not, 3.6.2 

not-but binary logic operator, 3.7.13 

not proven, Scottish law, axiom of choice, 7.1.11 
notation, derivative, 40.4.12, 57.9.12 

notation, logic quantifier, survey, 5.2.4 

notation, map-rule, 10.19 

notation, set membership relation, 7.2.9 

notation, sets of functions, 10.19 

notation for definitions, 1.5.5 

notation for important sets of numbers, 1.5.1 
notation for integrals, history, 43.2.4 
notations, 78.1 

notations and formalisms, plethora, 1.4.4, 1.4.5 
noumena, fibration, 21.5.8 

noumena, fibre bundle, 47.13.6 

noumena, measurements, real numbers, 15.0.3 
nowhere dense subset of topological space, 33.4.4 
nowhere-differentiable continuous function, 44.3.2, 50.7.4 
null space, column, 25.6 
null space, column, of matrix, 25.6.2 

null space, row, 25.6 

null space, row, of matrix, 25.6.2 

nullary operator, always-false, 3.6.16 

nullary operator, always-true, 3.6.1 
nullity, column, 25.6 

nullity, column, of matrix, 25.6.3 

nullity, row, 25.6 

nullity, row, of matrix, 25.6.3 

number, aleph, 13.2.7 

number, beth, 13.4.12 

number, binary, 14.4.12 

number, complex, 16.8 

number, complex, ideal, 18.1.10 

number, dark, 2.3 

number, epsilon, 12.6.5 

number, etymology, 14.1.2 

number, extended finite ordinal, 12.1.3 
number, extended rational, 16.3 

number, extended real, 16.2, 16.2.3 

number, finite ordinal, addition, 13.6 
number, finite ordinal, addition operation, 13.6.3 
number, finite ordinal, subtraction operation, 13.6.4 
number, floating-point, 8.8.1, 15.3.1 

number, Hartogs, 13.3.8 

number, incompressible, 2.3.5 


number, initial ordinal, 13.2. 


number, Lebesgue, 37.7.17 
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number, lowest, zero or one, 14.1.2, 14.6.2, 14.12.1, 14.12.4 
number, lowest, zero or one survey, 14.1.1 

number, natural, 14, 14.1

number, natural, definitions and notations survey, 14.1.1 
number, natural, extended, 14.5.7
number, natural, standard system, 14.1.18 

number, ordinal, 12 

number, ordinal, extended finite, 12.1 

number, ordinal, general, 12.5, 12.5.7 

number, rational, 15, 15.1, 15.1.5 
number, real, 15, 15.3, 15.4.13 
number, real, constructions, 16 
number, real, decimal representation, 15.3.5 

number, real, irrational, 15.5.6 

number, real, negative of, 15.7.6 

number, real, powers and roots, 16.6 

number, real, rational, 15.5.6

number, real, reciprocal, 15.8.11 

number, unmentionable, 2.3.3 

number addition, finite ordinal, commutativity, 13.6.6 
number addition, finite ordinal, existence, 13.6.2 

number arithmetic, natural, 14.3 

number heaven, 2.2.10

number interval, real, 16.1 

number mysticism, 2238

number notation summary, 1.5.1 

number representation, rational, 15.1.3 

number representation, real, 15.3.4 

number system, complex, 16.8.1

number system, complex, integral domain, 18.2.19 
number system, extended real, 16.2.7
number system, extended real, non-negative, 16.2.10 
number system, extended real, ordered, 16.2.13 

number system, natural, Peano axioms, 14.1.5 

number system, ordered rational, abstract, 15.1.10 
number system, rational, abstract, 15.1.9
number system, real, axioms, 15.9 

number system, real, Cantor, 15.3.8 

number system, real, Dedekind-cut, 15.9.2 

number tuple, real, 16.4 

numbers, extended finite ordinal, order, 12.2 

numbers, extended finite ordinal, standard order, 12.2.11 
numbers, finite ordinal, are totally ordered, 12.2.6 
numbers, finite ordinal, are well ordered, 12.2.7, 12.2.8 
numbers, finite ordinal, standard order, 12.2.5 

numbers, general ordinal, literature, 12.5.2 

numbers, general ordinal, standard order, 12.5.12 
numbers, natural, first number, 14.1.2 

numbers, natural, leading interval, 14.1.21 

numbers, ordinal, backbone of set theory, 12.6.4 
numbers, ordinal, equinumerous, 12.2.14 

numbers, rational, cardinality, 15.2 

numbers, rational, enumeration, 15.2, 15.2.4 

numbers, rational, measure zero, 45.1.5 

numbers, rational, uneven distribution, 15.1.17 
numbers, real, addition, 15.7 

numbers, real, model of observations, 15.3.2, 53.1.7 
numbers, real, multiplication, 15.8 

numbers, real, order, 15.6

numbers, real, set, 15.4.8

numbers, real, standard topology, 32.5 

numbers, real, the most important number system, 32.5.1 
numerology, Pythagorean, 2.2.9
numerosity domination, 13.1.9 

numerosity domination, pseudo-notation, 13.1.10 
numerosity of countably infinite set, 13.7.9 
numerosity of finite set, 13.5.5 


object, mathematical, 8.8.2 
object, mathematical, implicit chart, 26.11.6

object bundle, baseless figure/frame bundle, 20.10.8 

object bundle, topological fibre/frame bundle, 47.13.2 

object class, 8.8, 44.6.8 

object class, coefficient matrix, 29.4.8 

object class, example, 8.8.2, 8.8.5, 8.8.7, 9.5.21, 20.7.19,
63.7.1

object class, predicate logic, 5.1.1 

object class parametrisation by sets, 9.1.4 

object class-tag dictionary, 44.6.8

object language, quantification, 6.4.1 

object oriented, predicate logic, 5.1.2 

objectives of this book, 1.4

oblique drop function, 59.3 

oblique drop function, double tangent space, 59.3.2, 59.3.5 

oblique drop function, linearity, vector bundle, 65.4.4 

oblique drop function, vector bundle, 65.4, 65.4.2 

oblique drop of vector bundle connection, linearity, 68.1.6 

oblique projection, 50.6.5, 50.6.6 

obol, 22.0.3 

observation map, baseless figure/frame bundle, 20.10.8 

observation process map, topological fibre/frame bundle, 
47.13.2 

observation space, baseless figure/frame bundle, 20.10.8 

observation space, topological fibre/frame bundle, 47.13.2 

observations, model, real numbers, 15.3.2, 53.1.7 

observer bundle, baseless figure/frame bundle, 20.10.8 

observer bundle, topological fibre/frame bundle, 47.13.2 

obvious assumptions, non-obvious inferences, 4.9.5, 6.1.7 

ocean current, buoy, Lie transport, 61.7.1

Ockham's razor, axiom of choice, 7.12.7 

Ockham's razor, Newton's philosophising Rule I, 2.3.2 

octuple, ordered, 14.6.5

ODE, first order, non-uniqueness, 44.3.2 

ODE (ordinary differential equation), 44.0.1 

ODE existence, Picard method, 44.6 

ODE existence theorem, Peano, 44.3.20 

ODE existence theorem, Peano, systems, 44.4.9 

ODE existence theorem, Picard, 44.6.20 

ODE existence theorem, Picard, systems, 44.7.1 

ODE problem formalisation, 44.6.8

ODE system, classical solutions, 44.6.12 

ODE system, weak solutions, 44.6.14 

ODE uniqueness theorem, 44.5.2 

ODE uniqueness theorem, systems, 44.5.3 

OFB (ordinary fibre bundle), 21.8.1, 47.0.4 

OFB connection, associated, formula using PFB connection, 
69.9.4, 69.9.6 

ointment, fly, 6.3.5, 7.8.10, 12.6.5, 15.8.3, 20.9.11, 22.6.1,
60.1.3, 64.11.2, 65.6.3, 73.10.2
Olympus, Mount, 7.1.2 
Ω, hint-letter for open set, 31.4.2
Omega, hint-letter for open set, 31.4.2
ω-accumulation point, 31.10.16
ω-cluster point, 31.10.16
ω-infinite, uncountable, 13.10.14
ω-infinite set, 13.7.6
ω-infinite-set ω-limit-point compact set, 35.5.12
ω-infinite-set limit-point compact set, 35.5.12
ω-limit point, 31.10.16
omit-function for lists, 14.12.6 
omit-two-items function for lists, 14.12.6 
on-demand sets, 7.10.2
1-1 function, 10.5.7 
one-parameter family of geodesics, 72.5.2 
one-parameter subgroup, continuous, 36.9.6 
one-parameter subgroup, differentiable, 62.9.2 
one-parameter subgroup, Lie group, 62.9

one-parameter subgroup existence theorem, Lie group, 62.9.9 

one-parameter subgroup uniqueness theorem, Lie group, 
62.9.7 

one-sided inferior limit, real function, 35.8.9 

one-sided superior limit, real function, 35.8.9 

one-to-one function, 10.5.7 

one’s complement representation of negative numbers, 14.4.13 

onto function, 10.5.7 

ontological clarity, tangent vector definition, 54.1.6 

ontology, Cartesian spaces, 26.12.5 

ontology, definition, 2.2.3 

ontology, existential quantifier, 13.7.15 

ontology, mathematics, 2.2 

ontology, Platonic, 2.2.10 

ontology, sets, TAY —— 

ontology, tangent vector, 53.1.15 

ontology, underlying, velocity vectors, 26.12.2 

open annulus, 37.3.14 

open ball, 37.3.1, 37.5.12 

open ball, punctured, 37.3.14 

open base, 32.2 

open base, countable, 32.2.10 

open base, topology, 32.2.3 

open base at a point, countable, 32.2.9 

open base at a point, topology, 32.2.2 

open base choice function, second countable space, 33.4.23 

open-closed interval, 11.5.10, 16.1.5 

open cover, 31.7, 31.7.2, 33.5 

open cover, finite, 3T -—- 

open cover, indexed, 33.5.4 

open cover, indexed, refinement, 33.5.7 

open cover, refinement, 31.7.4 

open cover, subcover, 31.7.3

open cover definitions, survey, 33.5.3 

open curve, tangent operator, 57.9.13 

open curve, tangent vector field, 57.9.2 

open curve, velocity vector field, 57.9

open interval, 11.5.10, 16.1.5 

open interval, real numbers, 32.5.8 

open interval component, point in real-number open set, 
32.7.5 

open-interval curve, 36.2.9 

open interval enumeration, 32.7.2 

open neighbourhood, 31.3.11 

open portion of boundary of set, 31.9.14 

open set, 31.3.3

open set, enumeration of connected components, 34.8.1 

open set, locally, 31.7.6 

open set of real numbers, interval enumeration, 32.7 

open set of real numbers, measure, 32.7.10 

open set symbol G, 31.4.2 

open set symbol Ω, 31.4.2

open subbase, 32.3 

open subbase, topology, 32.3.5 

open subbase at a point, topology, 32.3.2 

operating system, computer, 4.5.11 

operation, addition, finite ordinal numbers, 13.6.3 

operation, basic logical, on logical expressions, 3.9 

operation, Lie, 61.5.1

operation, subtraction, finite ordinal numbers, 13.6.4 

operational amplifier, noise sensitivity, 44.1.8, 44.6.2 

operator, differential, in Riemannian manifold, 74.6 

operator, differential, tangent, 54.11.2

operator, function substitute-value, 14.12.23 

operator, function swap-values, 14.12.23 

operator, Hessian, 71.9 

operator, Hessian, at a critical point, 59.11.4 
operator, Hessian, at critical point, 59.11
operator, Hessian, metric tensor calculation, 73.2.6, 73.9.1,
73.10.3
operator, Hessian, on S² at a critical point, 76.4.5
operator, Hessian, Riemannian metric, 73.1.1
operator, Laplace-Beltrami, 74.6.7
operator, Laplace-Beltrami, Riemannian manifold, 74.6.6
operator, Laplacian, 74.6.7
operator, Laplacian, Riemannian manifold, 74.6.4, 74.6.6
operator, left translation, 62.3.2
operator, list concatenation, 14.12.6
operator, list insert-item, 14.12.6
operator, list length, 14.12.6
operator, list omit-one-item, 14.12.6
operator, list omit-two-items, 14.12.6
operator, list product, 18.10.3
operator, list projection, 18.10.2
operator, list subsequence, 14.12.6
operator, list substitute-item, 14.12.6
operator, list sum, 18.10.2
operator, list swap-items, 14.12.6
operator, logic, arithmetic equivalent, 3.15.6
operator, logical, arity, 3.11.7
operator, logical, binding precedence rules, 3.9.6
operator, NAND, 3.7.14
operator, NOR, 3.7.14
operator, nullary, always-false, 3.6.16
operator, nullary, always-true, 3.6.16
operator, right translation, for vector field on principal
bundle, 66.2.21
operator, substitution, 27.2.26
operator, tangent, action on vector-valued function, 54.14
operator, tangent, higher-order, 60
operator, tangent, second-order, 60.2, 60.2.2
operator, tangent, second-order, tagged, 60.2.11
operator, tangent, tagged, 54.15
operator, tangent, total space, 54.12.1
operator, tangent differential, 54.11
operator, tangent differential, tagged, 54.15.3
operator, tangent differential, total space, 54.12
operator, XOR, 3.7.14
operator domain, module without, 19.1.2
operator domain, unstructured, module over, 19.1
operator domain of module, 19.1.7
operator field, chart-basis, 57.3.5
operator field, composition, 60.3.1
operator field, lifted chart-basis, on principal bundle, 69.14.4
operator field, tangent, 57.3
operator field, tangent, composition, 60.3
operator homomorphism over a set, 19.1.10
operator notations, logic, 3.7.14
operator notations, logic, survey, 3.7.14
operator on Lie group, left translation, 62.3
operator on Lie group, right translation, 62.6
operator space, differential, tangent, 54.11.10
operator space, tangent, chart-basis vector, 54.13.4
operator space, tangent, tagged, 54.15.5
or, 3.6.2
or, exclusive, 3.6.3
or, exclusive, notation, 3.7.16
or, inclusive, 3.6.3
orbit, left transformation group, 20.5.2
orbit, right transformation group, 20.7.20
orbit space, left transformation group, 20.5.6
orbit space, right transformation group, 20.7.23
orbit-space, short-cut, associated cross-section, 47.12.3
orbit-space, short-cut, associated cross-section space, 47.12.4
orbit-space associated cross-section, short-cut, 47.12, 66.8
orbit-space associated differentiable fibre bundle, 66.7.12
orbit-space associated fibre bundle cross-section, short-cut,
66.8.1
orbit-space associated non-topological fibre bundle, 21.14,
21.14.2
orbit-space associated topological fibre bundle, 47.11, 47.11.5
orbit-space associated vector bundle connection, 69.10
orbit-space cross-section, associated, short-cut map, 47.12.5
orbit-space method for associated fibre bundles, 47.11.1
order, 11
order, degree, difference of meaning, 59.0.2, 60.0.1
order, dual, 11.1.13
order, extended finite ordinal numbers, 12.2
order, lexicographic, 11.6.19
order, lexicographic, choice function, 37.7.9
order, lexicographic, generalised curves, 36.1.8
order, partial, 11.1, 11.1.2
order, partial, maximal, 11.5.5
order, partial, restriction, 11.1.19
order, partial, set bounds, 11.2
order, partial, set-inclusion relation, 11.1.12
order, partial, strongest, 11.1.10
order, partial, weakest, 11.1.9
order, real numbers, 15.6
order, real numbers, standard, 15.6.2
order, standard, extended finite ordinal numbers, 12.2.11
order, standard, finite ordinal numbers, 12.2.5
order, standard, general ordinal numbers, 12.5.12
order, total, 11.5, 11.5.1
order, well-ordering, 11.6
order by inclusion, partial, left segments, 11.2.13
order-homomorphism, 18.8.15
order homomorphism, strong, 11.1.21
order homomorphism, weak, 11.1.21
order induced by a bijection, 11.1.28
order isomorphism, 11.1.21
order monomorphism, 11.1.21
order notation, Landau, strong bound, 39.7.10
order notation, Landau, weak bound, 39.7.9
order of function, strong bound, normed linear space, 39.7.7
order of function, strong bound, normed module over ring,
39.7.4
order of function, weak bound, normed linear space, 39.7.6
order of function, weak bound, normed module over ring,
39.7.3
order on a group, Archimedean, 17.5.4
order-preserving homomorphism, 18.8.15
order relations, family tree, 11.1.1
order symbols, Landau, 39.7
ordered commutative group, 17.5.8
ordered commutative group, Archimedean, 17.5.9
ordered extended real number system, 16.2.13
ordered family of elements of a set, totally, 11.5.20
ordered family of functions, totally, 11.5.22
ordered family of sets, totally, 11.5.21
ordered field, 18.8, 18.8.4
ordered field, complete, 18.9, 18.9.3
ordered field homomorphism, 18.8.16
ordered field morphisms, 18.8.17
ordered group, 17.5
ordered group homomorphism, 17.5.11
ordered integral domain, 18.6.13
ordered pair, 9.2, 9.2.2
ordered pair, left element operator, 9.2.13
ordered pair, right element operator, 9.2.13
ordered pair, tuple-style, 14.6.5
ordered-pair-set, relation representation, 9.1.1, 9.5.1
ordered pair verification test, 9.2.15
ordered pair versus 2-tuple, 14.6.6

ordered quadruple, 9.3.3 

ordered rational number system, abstract, 15.1.10 
ordered ring, 18.3, 18.3.3 


ordered ring, Archimedean, 18.1.1, 18.4, 18.4.2 


ordered ring, cancellative semigroup, 18.3.1 
ordered ring, non-Archimedean, 18.4.4, 18.4.5, 18.4.7 
ordered ring, noncommutative, 18.3.9 

ordered ring homomorphism, 18.3.13 

ordered sample, 14.10.1 

ordered selection, 14.10, 14.10.1 

ordered selection, rearrangement, 14.11 

ordered selection, sorting, 14.11

ordered selection, strongly, 14.10.2 

ordered selection, weakly, 14.10.2 

ordered set, 11.1.2 

ordered set, partial, maximal element, 11.2.2 
ordered set, partial, minimal element, 11.2
ordered set, totally, 11.5.1 

ordered set, totally, interval notation, 11.5.13 

ordered set, totally, semi-infinite interval notation, 11.5.15 
ordered traversal, 11.5.28, 11.5.29, 36.1.8, 36.2.2 

ordered triple, 9.3.3 

ordered tuple, 14.6.5 

ordered tuples, low-level representation, 9.3 

ordered unitary ring, 18.6

ordering of commutative group, 17.5.7 

ordering of commutative group, Archimedean, 17.5.9 
ordering of field, 18.8.3 

ordering of group, 17.5.1 

ordering of ring, 18.3.2 

ordering of ring, Archimedean, 18.4.2 

ordinal, limit, 12.5.16 

ordinal, successor, 12.5.16 

ordinal number, 12 

ordinal number, extended finite, 12.1, 12.1.3 

ordinal number, finite, 12.1.34

ordinal number, finite, addition, 13.6 

ordinal number, finite, addition operation, 13.6.3 

ordinal number, finite, subtraction operation, 13.6.4 
ordinal number, general, 12.5, 12.5.7 

ordinal number, initial, 13.2.7 

ordinal number, weak initial segment, 12.5.27 

ordinal number 10, 12.2.2

ordinal number addition, finite, commutativity, 13.6.6 
ordinal number addition, finite, existence, 13.6.2 

ordinal number definition, Robinson-style, 12.1.8 

ordinal numbers, backbone of set theory, 12.6.4 

ordinal numbers, cardinality representation, 13.2 

ordinal numbers, equinumerous, 12.2.14

ordinal numbers, extended finite, order, 12.2 

ordinal numbers, extended finite, standard order, 12.2.11 
ordinal numbers, finite, are totally ordered, 12.2.6 

ordinal numbers, finite, are well ordered, 12.2.7, 12.2.8 
ordinal numbers, finite, standard order, 12.2.5 

ordinal numbers, general, literature, 12.5.2 

ordinal numbers, general, standard order, 12.5.12 

ordinary bundle, baseless figure/frame bundle, 20.10.8 
ordinary bundle, connection definition conversions, 67.11 
ordinary bundle, topological fibre/frame bundle, 47.13.2 
ordinary bundle connections, conversion rules, 67.11.2
ordinary bundle example, real group on four-space, 64.8.8 
ordinary differential equations, Peano method, 44.3 
ordinary differential equations, Peano method, systems, 44.4 
ordinary differential equations, Picard method, 44.6 
ordinary differential equations, Picard method, systems, 44.7 
ordinary differential equations, uniqueness, 44.5
ordinary fibre bundle, associated connection, 67.12, 67.12.3 
ordinary fibre bundle, covariant derivative, 68.2.5
ordinary fibre bundle, differentiable cross-section, 64.7
ordinary fibre bundle, drop function, 64.6, 64.6.2, 64.6.6
ordinary fibre bundle, horizontal component, 64.5
ordinary fibre bundle, horizontal component map, 67.9.2
ordinary fibre bundle, horizontal lift function, 67.5
ordinary fibre bundle, horizontal subspace, 67.9.6
ordinary fibre bundle, naive derivative, 67
ordinary fibre bundle, non-topological, 21.8
ordinary fibre bundle, non-topological, parallelism, 21.16
ordinary fibre bundle, parallel transport, 68.5
ordinary fibre bundle, Riemann curvature, 70.3, 70.4.3
ordinary fibre bundle, Riemann curvature, justification, 70.7
ordinary fibre bundle, vertical component map, 67.10.2
ordinary fibre bundle connection, 67 

ordinary fibre bundle connection, curvature, 70.2 

Oresme, Nicole, 77.1.3

Oresme, Nicole, calculus, derivative at extremum, 40.6.1 
oriented continuous path, 36.8.4 

origin set of relation, 9.5.21 

orthogonal connection, 73.1.2 

orthogonal connection, Levi-Civita, 74.1.2 

orthogonal group, 48.3.8, 76.8.2 

orthogonal matrix, 25.14.2 

orthogonal subset of linear space, 24.10.4 

orthogonal transformation, 77.2.7 

orthogonal vectors, module over ring, 19.7.7 

orthogonality of transition maps, 54.5.31 

oscillation of function at a point, 43.6.6 

oscillation of function on a set, 43.6
osmosis through apprenticeship, 2.2.2 
outer content, Jordan, 43.6.2 

outer measure, Lebesgue, 43.6.4 
overlapping set-union topology, 32.14.8 
overview of chapters, 12 
p-norm, Cartesian space, 24.7.11

packet, data, communication network, 21.15.7 
pact, Faustian, axiom of choice, 7.4.4, 77.4.1 
page counts, chapters, 1.3.0

pair, ordered, 9.2, 9.2.2 

pair, ordered, left element operator, 9.2.13 
pair, ordered, right element operator, 9.2.13 
pair, ordered, tuple-style, 14.6.5 

pair, ordered, verification test, 9.2.15 

pair, ordered, versus 2-tuple, 14.6.6 

pair, unordered, 7.6.12 

pair axiom, unordered, 7.6.10 

pair class, 9.2.4 

Pandora’s box, non-Hausdorff manifolds, 50.0.1 
panorama, photo-stitching software, 32.15.2 
Pappus of Alexandria, 77.1.3 

parachute, imaginary, axiom of choice, 7.11.1, 10.3.1 
paracompact, metric space, 37.7.19 
paracompact metric space, 37.5.11 
paracompact topology, 33.7.15 

paracompact topology, countably, 33.7.16 
paracompactness literature, 33.7.13 

paradox, Feferman's, axiom of choice, 7.12.6 
paradox, Russell's, 3.2.10, 7.8.6 

paradox, Skolem's, 12.6.4

paradox, Skolem's, axiom of choice, 7.12.6 
paradox, Skolem's, constructible real numbers, 13.7.23 
paradox, Zeno of Elea, Achilles and tortoise, 31.2.3
paradox, Zeno of Elea, flying arrow, 53.1.12 
parallel transport, 67.3.1, 72.2.2 

parallel transport, history, 67.1 

parallel transport, horizontal lift function, 71.4 
parallel transport, ordinary fibre bundle, 68.5
parallel transport, reconstruction from connection, 67.3

parallel transport of tangent space along curve, 71.4.3

parallel transport on curve-family, 71.4.9

parallel transport on the two-sphere, 71.0.3

parallelism, absolute, 26.1.2

parallelism, absolute, non-topological fibration, 21.15.2

parallelism, associated, 48.4

parallelism, associated topological pathwise, 48.4.2

parallelism, deviation from flatness, 70.4.1

parallelism, Levi-Civita, 74.1

parallelism, metric-free, affine connection, 71.6.1

parallelism, non-topological fibration, 21.15

parallelism, non-topological ordinary fibre bundle, 21.16

parallelism, pathwise, non-topological fibration, 21.15.5

parallelism, pathwise, topological, 48.3.2

parallelism curvature, differentiable fibration, 70.3.9

parallelism curvature at a point, differentiable fibration,
70.3.8

parallelism curve class, 48.2.6

parallelism on topological fibre bundle, 48

parallelism path class, 48.2, 48.2.6

parallelism rise map, affine connection, 71.10.2 

parallelism-style connection, 67.2.1 

parameter, proposition, 5.1.4 

parametrisation by length, bidirectional, locally rectifiable 
curve, 38.9.11 

parametrisation by length, rectifiable curve, 38.9.3 

parametrisation of curve, 36.1.6 

parametrisation of path, 36.8.17 

parametrised hyperplane segment, 26.9 

parametrised line, traversal mechanism, 40.2.3 

parametrised line segment, 26.9 

parametrised proposition family, 5.1 

parenthesis, Poisson’s, Lie bracket, 61.5.1 

parity function, permutation, 14.8.22 

park, walk in, 8.0.1 

Parnassum, Gradus ad, 1.4.13 

part, fractional, function, 16.5.16 

partial bijection, 10.9.15 

partial Cartesian product, 10.17, 10.17.2 

partial curve length function, bidirectional, metric space, 
38.8.13 

partial curve length function, metric space, 38.8.8 

partial derivative, map between Cartesian spaces, 41.2 

partial derivative, several real variables, 41.1 

partial derivative, undefined higher-order, partial function, 
42.2.7, 42.5.8 

partial derivative, zero, constant real function, 41.1.16 

partial derivative chain rule, 41.7.2, 41.7.4 

partial derivative commutativity theorem, 42.3.6, 42.5.23 

partial derivative composition rule, 41.7.2, 41.7.4 

partial derivative inverse chain rule, 41.7.7 

partial derivative matrix of a function, 41.2.12 

partial derivative of a function, 41.1.10, 41.2

partial derivative of partial function, 42.2.2, 42.5.3

partial derivative of partial function, higher-order, 42.2.5, 
42.5.6 

partial derivative of vector-valued function, 41.8.4 

partial derivative tree for partial function, 42.2.5, 42.5.6 

partial derivative tuple of a function, 41.1.14 

partial derivative vector field, first-order, 57.9.15 

partial derivative vector field, second-order, 59.9.3 

partial derivatives, commutativity, 42.3 

partial derivatives, zero, constant Cartesian space map, 
41.2.23 

partial derivatives, zero, constant vector-valued map, 41.8.12 

partial derivatives, zero, real-valued function at min/max, 
51.6.12 
partial differentiability, keyhole test, 41.1.5

partial differentiability of map, keyhole test, 41.2.3 
partial differential of map on manifold product, 58.6 
partial function, 10.9, 10.9.2 

partial function, bijective, 10.9.8 

partial function, composition, 10.10 

partial function, continuous, 31.13, 31.13.2 

partial function, derivative, 42.1.2 
partial function, derivative, higher-order, 42.1.4 


partial function, injective, 10.9.8 

partial function, partial derivative, 42.2.2, 42.5.3 

partial function, partial derivative, higher-order, 42.2.5, 
42.5.6 


partial function, partial derivative tree, 42.2.5, 42.5.6 

partial function, partial map, potential confusion, 10.12.16, 
42.5.2 

partial function, surjective, 10.9.8 

partial function on Cartesian space, differentiable, 42.2.7, 
42.5.8

partial function product, double-domain, 10.14.10 

partial function set-map properties, 10.9.12 

partial map, partial function, potential confusion, 10.12.16, 
42.5.2

partial map differentiability on manifold product, 52.6.8 

partial map of function of two variables, 10.12.17 

partial map of functions of two variables, dot notation, 
10.12.18 

partial order, 11.1, 11.1.2 

partial order, maximal, 11.5.5 

partial order, non-decreasing, misnomer, 11.1.29 

partial order, restriction, 11.1.19 

partial order, set bounds, 11.2 

partial order, set-inclusion relation, 11.1.12 

partial order, strongest, 11.1.10 

partial order, weakest, 11.1.9 

partial order by inclusion, left segments, 11.2.13 

partial ordered set, maximal element, 11.2.2 

partial ordered set, minimal element, 11.2

partial sequence, 10.17.5 

partial-sum sequence, topological linear space, 39.2.2 

partially defined function, 10.9, 10.9.2 

partially defined function, composite, 10.10.6 

partially defined function, composition, 10.10, 10.10.6 

partially differentiable, continuously, vector-valued function, 
41.8.9 

partially differentiable function, 41.1.4, 41.2

partially differentiable function, continuously, 41.1.24, 
41.2.18, 42.5.11 

partially differentiable function, discontinuous, example, 
41.1.22 

partially differentiable function, vector-valued, 41.8.2 

partially ordered set, 11.1.2 

particle field, topological fibre bundle, 47.12.2 

particle trajectory, 36.0.1 

particular-point topology, 31.11.13 

partition, coarser, 8.7.15 

partition, finer, 8.7.15 

partition, interval, 43.3.3 

partition, interval, mesh, 43.3.10 

partition, interval, refinement, 43.3.5 

partition, interval, sample sequence, 43.4.3 

partition, refinement, 8.7.16 

partition of a set, 8.7 

partition of set, 8.7.12, 9.8.9 

partitioning by inverse function, fibration, 10.16.5 

partitions, interval, coarsest common refinement, 43.3.7 

partitions, interval, common refinement, 43.3.5 


partitions, interval, mesh-constrained set, 43.3.11 
parts, integration by, 43.8.11

Pascal, Blaise, 14.9.5, 77.1.4 

Pascal’s triangle, 14.9.5, 14.9.6 

Pasch, Moritz, 77.1.7 

passive set, algebra, 17.0.3 

patch-free differential geometry, 49.2.7 

patches, football, 32.15.2 

patchwork associated differentiable fibre bundle, 66.7.10 

patchwork associated non-topological fibre bundle, 21.13, 
21.13.2

patchwork associated topological fibre bundle, 47.10, 47.10.4 

patchwork quilt, 49.2.2

patchwork space, 10.17.8 

patchwork space, locally Cartesian, 49.9 

patchwork space, non-topological, 10.17 

patchwork space, topological, 32.15 

patchwork space, topologically compatible, 32.15.3 

patchwork space representation, 10.17.9 

patchwork topology, 32.15.3 

path, 36.8, 36.8.4 

path, continuous, directed, 36.8.4 

path, continuous, oriented, 36.8.4 

path, continuous, unoriented, 36.8.14 

path, neural, 22.0.3 

path, rectifiable, 48.2.7 

path, topological, 36 

path, traversable, finite, 21.15.7 

path and curve meanings, survey, 36.1.1 

path atlas, 36.1.4, 36.1.8

path chart, 36.1.8 

path class, ex machina, 48.2.2 

path class, parallelism, 48.2, 48.2.6 

path concatenation, 36.8.12 

path-equivalence of curves, 36.5 

path-equivalent curves, 36.5.3

path image, 36.8.4 

path initial point, 36.8.10 

path multiple point, 36.8.10 

path parametrisation, 36.8.17 

path representative, 36.8.4 

path reversal, 36.8.8 

path terminal point, 36.8.10 

path terminology, 36.1 

path topics summary, 51.9.7 

pathological empty product of family of sets, 32.12.5 

pathological set, 10.11.9 

pathwise connected, locally, topological space, 36.7.14 

pathwise connected points, 36.7.2, 36.7.5 

pathwise connected set, 36.7, 36.7.6 

pathwise connected topological space, 36.7.7 

pathwise parallelism, non-topological fibration, 21.15.5 

pathwise parallelism, reversibility rule, 48.3.5 

pathwise parallelism, topological, 48.3.2 

pathwise parallelism, topological, associated, 48.4.2 

pathwise parallelism, transitivity rule, 48.3.4 

pathwise parallelism on topological fibre bundle, 48.3 

PC (propositional calculus), 4.5.5

PDE (partial differential equation), 44.0.5 

PDE corpus, 1.4.4 

PDE techniques, curved space, 1.4.4 

Peano, Giuseppe, 77.1.7

Peano, Giuseppe, falsum/verum notations, 3.6.14 

Peano, Giuseppe, logical operator notations, 3.7.15 

Peano, Giuseppe, natural number axioms, 14.1.4 

Peano, Giuseppe, set membership notation, 7.2.9 

Peano, Giuseppe, set theory, 7.1.6

Peano, Giuseppe, successor set notation, 12.2.19 

Peano axioms for natural numbers, 14.1.5 
Peano-axioms-style countability tests, 14.2

Peano-countable set, 14.2.6

Peano existence theorem, literature, 44.3.1 

Peano-finite set, 14.2.10 

Peano-infinite set, 14.2.2 

Peano method, ordinary differential equations, 44.3 

Peano method, ordinary differential equations, systems, 44.4 

Peano ODE existence theorem, 44.3.20

Peano ODE existence theorem, systems, 44.4.9 

Peano ODE method, local approximation cylinder, 44.4.4 

Peano ODE method, local approximation rectangle, 44.3.9 

Peano ODE method, polygonal approximation, 44.3.12 

Peano ODE method, polygonal approximation, systems, 
44.4.6 

Peano space-filling curve, 36.3.1 

Peirce, Charles Sanders, 3.7.9

Peirce arrow, 3.7.8, 3.7.10, 3.7.14 

pentuple, ordered, 14.6.7 

per-point neighbourhood, 31.2.6 

Perelman, Grigori Yakovlevich, 77.1.8 

permutation, 14.8, 14.8.2 

permutation, finite, 14.8.10 

permutation, finite, cardinality conservation, 14.8.41 

permutation, finite, flow-balance equation, 14.8.40 

permutation, parity function, 14.8.22 

permutation, set automorphism, 10.5.22 

permutation, sorting, for finite sequence, 14.11.3 

permutation, support, 14.8.7 

permutation group, 14.8.6 

permutation group, non-topological fibre bundle, 21.8.17 

permutation-invariant topology, 31.11.5 

Perron integral, 43.1.4 

perspective bundle, baseless figure/frame bundle, 20.10.8 

perspective bundle, topological fibre/frame bundle, 47.13.2 

Petrie, William Matthew Flinders, 26.10.1 

PFB (principal fibre bundle), 21.9.2, 47.0.4 

PFB connection, formula for associated OFB connection, 
69.9.4, 69.9.6 

phenomena, fibration, 21.5.8 

phenomena, fibre bundle, 47.13.6 

phenomena, measurements, real numbers, 15.0.3 

philosophy, locally Cartesian spaces, 49.2 

philosophy, tangent bundle, differentiable manifold, 53 

philosophy, vapid, 26.12.2 

philosophy channel, remarks and diagrams, 1.5.9 

philosophy of mathematics, 2.0.3

photo-stitching software, panorama, 32.15.2 

photon, radiation field, 47.12.7 

physics, 2-sphere, 76.0.1 

physics, coordinates idiom for tensors, 27.5.11 

physics, Lie groups, 62.0.3 

physics, mixed tensors, 29.5.14 

physics, parallelism, 48.2.4 

physics, power series, 16.8.10 

physics, theoretical, out of scope, 1.6.3 

physics, vector fields, 36.0.1

physics arena, passive space-time, active field theory, 49.5.10 

π, 2.2.5, 2.3.5, 44.2.5, 44.2.6

Picard, Charles Emile, 77.1.7 

Picard approximations example, 44.6.17 

Picard iteration method, interpretation, 44.6.18 

Picard iteration method, properties, 44.6.7 

Picard iteration method literature, 44.6.1 

Picard iteration sequence, 44.6.15 

Picard method, ordinary differential equations, 44.6 

Picard method, ordinary differential equations, systems, 44.7

Picard ODE existence theorem, 44.6.20

Picard ODE existence theorem, systems, 44.7.1 
pitch, 76.8.4

pixie, AC, 22.7.20 

pixie number, 2.3.3 

pixies at the bottom of the garden, 7.12.7 

Planck, Max Karl Ernst Ludwig, 77.1.7 

plane, astral, 2.2.14, 13.9.10, 22.7.22, 45.3.6 

plane geometry, Euclidean, 26.11.1

planet, force, 53.1.4 

platform fibration of form-style non-topological fibration, 
21.4.6 

Plato, 2.2.10, 77.1.2 

Platonic Forms, 2.2.8 

Platonic ontology, 2.2.10 

Pleistocene, Late, megadrought, 3.0.2 

plethora, algebraic systems, 1.02

plethora, differential geometry definitions, 49.6.9

plethora, finite set concepts, 13.11.1

plethora, formalisms, 3.0.3, 4.1.3

plethora, formalisms and notations, 1.4.4, 1.4.5

plethora, notations, 6.1.10

plethora, set theorems, 8.0.1

plethora, tangent covector representations, 55.1.1, 55.1.5

plethora, tensor space representations, 56.1.1

plethora, topological space classes, 33.7.7

plethora of lemmas, 2.4.6 

Poincaré, Jules Henri, 31.14.1, 77.1.7 

Poincaré, Jules Henri, axiom of choice, 7.12.7 

Poincaré, Jules Henri, combinatorial topology, 31.1.2 

Poincaré, Jules Henri, relativity, 77.3.2 

point, trig, 7.1.5 

point-set, differentiable submanifold, regular, 52.3.7 

point-set layer, 1.1 

point-set topology, 31.1.2 

point space, affine space over a group, 26.2.2 

point space, affine space over module over group, 26.6.2 

point space, affine space over module over ring, 26.6.6 

point space, affine space over module over set, 26.4.10 

point-to-point distance function, 76.5.1 

point/neighbourhood pairs, topological space, 31.3.14 

pointwise bounded sequence of functions, 38.4.13 

pointwise bounded set of functions, 38.4.13 
pointwise characterisation, interior and closure of set, 31.8.11


pointwise common-domain direct product of functions, 
10.15.2 

pointwise composition of function-valued functions, 10.4.26 

pointwise continuity, 35.3 

pointwise convergence topology, 33.5.19, 35.3.27 

pointwise differential of common-domain product of maps, 
58.7.7, 58.10.7 

pointwise differential of diffeomorphism, inverse rule, 58.5.4 

pointwise equicontinuous sequence of functions, 38.4.10 

pointwise finite cover, topological space, 33.7.14 

pointwise induced-map tangent covector space, 58.3.7 

pointwise induced-map tangent space, 58.3.2

pointwise modulus of continuity, 38.2.4 

pointwise speed, maximum, bounds mean speed, 40.8.7 

Poisson, Siméon Denis, 77.1.6 

Poisson bracket, literature, 61.5.2 

Poisson’s parenthesis, Lie bracket, 61.5.1 

polyglot dictionary, connection definitions, 69.15.3 

polygonal approximation, Peano ODE method, 44.3.12 

polygonal approximation, Peano ODE method, systems, 
44.4.6 

polynomial function, derivative, 40.5.13 

Poncelet, Jean-Victor, 77.1.6 

ponendo ponens, modus, 4.8.15 

ponens, modus, 4.3.17 

ponere, 4.8.15 
portion of boundary of set, open/closed, 31.9.14

poset, 11.1.1 

positive cone, field, 18.8.9 

positive cone, ring, 18.3.17 

positive definite, 25.11.7, 30.5.3 

positive definite inner product, module over a ring, 19.7.3 

positive integral root, non-negative real number, 16.6.8 

positive semi-definite, 25.11.7, 30.5.3 

positive subset, field, 18.8.9 

positive subset, ring, 18.3.17 

post-evaluation expression substitution notation, 10.4.14 

postfix expression, stack, 3.11.7 

postfix logical expression, 3.11.4 

postfix logical expression interpretation map, 3.11.6 

postfix logical expression space, 3.11.5 

postfix logical expression space, substitution closure, 3.11.8 

postfix logical expression style, 3.11

postulational method, 77.1.2

potential, gauge, connection form construction theorem, 
69.12.7 

potential, gauge, connection form localisation, 69.11, 69.11.1 

potential, gauge, constructed from connection form, 69.11.3 

potential, gauge, coordinate chart components, 69.13 

potential, gauge, equations of motion, 70.8.1 

potential, gauge, globalisation, 69.12 

potential, gauge, vector fields on fibre bundles, 64.13.2 

potentially infinite, 4.5.4 

potentially infinite versus actually infinite, 12.1.32, 13.12.4 

power, negative integral, 16.6.4 

power, negative rational, positive real number, 16.6.12 

power, non-negative integral, 16.6.3 

power, non-negative rational, non-negative real number, 
16.6.11 

power, rational, non-negative real number, 16.6.10 

power, real, non-negative real number, 16.6.14, 16.6.15 

power, real number, 16.6 

power norm, Cartesian space, 24.7.11 

power-of-two function, 14.7.7 

power series, 42.8 

power series, physics, 16.8.10 

power set, 7.6.25 

power set, cardinality-constrained, 13.12 

power set, equinumerous to indicator function set, 14.7.5 

power set axiom, 7.6.23 

power-set choice function, 10.3.7 

power set notation alternative, 7.6.27 

power set properties, 8.5, 8.5.2 EE 

power sets in logical quantifiers, 7.6.28 

powers of 2, finite ordinal numbers, 13.6.13 

pre-image of set by relation, 9.5.16 

precedence rules, binding, logical operator, 3.9.6 

predicate, constant, 5.1.11

predicate, hard, 6.3.7

predicate, logical, zero-parameter, 5.1.10

predicate, soft, 6.3.7, 6.3.9

predicate calculus, 5.1, 5.2, 6, 6.1 

predicate calculus, bound variable, 6.3.9 

predicate calculus, deduction metatheorem, 6.6.27 

predicate calculus, empty universe, 6.3.4 

predicate calculus, free variable, 6.3.9

predicate calculus, Hilbert-style, 6.1.10, 6.3.3, 

predicate calculus, imaginative process, 6.1.3 

predicate calculus, international language, 77.4.1 

predicate calculus, knowledge set semantics, 6.5.9 

predicate calculus, linguistic structure, five layers, 5.1.6 

predicate calculus, rules C and G, 6.1.10

predicate calculus, rules G and C, 6.3.19 

predicate calculus, sequent, 6.3.3 
predicate calculus axiomatic system QC, 6.3.9

predicate calculus axiomatic system QC+EQ, 6.7.6

predicate calculus formalisations survey, 6.1.5

predicate calculus interpretation, 6.4

predicate calculus rules, practical application, 6.5

predicate calculus with equality, 6.7, 7.5.12

predicate logic, 5

predicate logic, name-to-object map, 5.1.4

predicate logic, object class, 5.1.1

predicate logic, object oriented, 5.1.2 

predicate tautology calculus, 6.0.1 

prefix logical expression, 3.11.12 

prefix logical expression interpretation map, 3.11.14 

prefix logical expression space, 3.11.13 

prefix logical expression space, substitution closure, 3.11.15 

prefix logical expression style, 3.11 

presents, Father Christmas, axiom of choice, 13.8.15 

primacy, shared, sets and numbers, 12.0.2, 14.4.1 

primal vector, 28.5.7 

prime ideal theorem, boolean, 13.8.8 

primitive connective, 4.1.6, 4.7.1 

principal argument, complex number, 44.2.10 

principal bundle, associated, gauge theory radiation field, 
21.12.9, 47.11.2 

principal bundle, associated connection, 69.9 

principal bundle, associated contravariant function, 47.12, 
66.8

principal bundle, baseless figure/frame bundle, 20.10.8 

principal bundle, connection definition conversions, 69.6 

principal bundle, differentiable, 66 

principal bundle, differentiable, right action chart- 
independence, 66.2.5 

principal bundle, differentiable, right action map, 66.2 

principal bundle, differentiable, right transformation group is 
free and effective, 66.2.8 

principal bundle, horizontal component map, 69.3 

principal bundle, horizontal lift, Minkowski space, 69.1.9 

principal bundle, horizontal lift function, 69.1.3 

principal bundle, horizontal subspace, 69.3 

principal bundle, infinitesimal group action, 66.5 

principal bundle, left action map, 21.11.18, 66.4.2 

principal bundle, left group action map, 66.4 


principal bundle, Minkowski space, connection form, 69.5.6 

principal bundle, Minkowski space, gauge potential, 69.11.4, 
69.13.5 

principal bundle, Minkowski space, gauge theory example, 
70.8.7 

principal bundle, Minkowski space, horizontal component 
map, 69.3.3 

principal bundle, Minkowski space, horizontal lift localisation, 
69.1.14 

principal bundle, Minkowski space, right action invariance, 
69.1.10 

principal bundle, Minkowski space, vertical component map, 
69.4.5 

principal bundle, non-topological, constant cross-section, 
21.10.2 


principal bundle, non-topological, identity chart for 
cross-section, 21.10.7 

principal bundle, non-topological, identity cross-section, 
21.10.3 

principal bundle, non-topological, right action 
chart-independence, 21.11.2 

principal bundle, non-topological, right action map, 21.11 

principal bundle, non-topological, right transformation group 
is free and effective, 21.11.10 

principal bundle, right action map, 66.2.2 
principal bundle, right action map differential properties,
66.2.18 

principal bundle, right action map properties, 66.2.14 

principal bundle, right transformation group, 66.2.7 

principal bundle, right translation of fundamental vector 
field, 66.6.10, 66.6.11 

principal bundle, right translation operator for vector field, 
66.2.21 

principal bundle, topological, right action chart-independence, 
47.8.10 

principal bundle, topological, right action map, 47.8.7 

principal bundle, topological, right transformation group, 
47.8.14 

principal bundle, topological, right transformation group is 
free and effective, 47.8.16 

principal bundle, topological fibre/frame bundle, 47.13.2 

principal bundle, transposed infinitesimal action map, 66.6.2 

principal bundle, vertical component map, 69.4 

principal bundle connection, curvature, 705

principal bundle connection, generator function, 69.7 

principal bundle connections, conversion rules, 69.6.3 

principal bundle connections, equi-informational structures, 
69.12.8 

principal bundle cross-sections, connection form, 69.11.3 

principal bundle example, real group on four-space, 66.2.19, 
66.2.24, 66.5.4, 66.6.3 

principal bundle for Maxwell's equations, 66.1.6 

principal bundle function, contravariant, 47.12.2, 47.12.3, 
66.8.2 

principal bundle function, contravariant, associated 
connection, 69.10.3 

principal bundle function, contravariant, connection, 69.10.4 

principal bundle function, contravariant, OFB cross-section 
interpretation, 47.12.6, 66.8.4 

principal bundle function, contravariant, short-cut map, 
47.12.5 

principal bundle functions, contravariant, covariant 
derivative, 69.10.2 

principal bundles, principal frame bundles, 55.7.11 

principal fibration, meaningless, 21.9.1 

principal fibre bundle, affine connection curvature, 71.4.6 

principal fibre bundle, connection, 69

principal fibre bundle, connection form, 69.5, 69.5.4

principal fibre bundle, differentiable, 66.1, 66.1.2

principal fibre bundle, empty, 21.9.5

principal fibre bundle, exact sequences, 24.5.1 

principal fibre bundle, horizontal component map, 69.3.2 

principal fibre bundle, horizontal lift function, 69.1 

principal fibre bundle, horizontal lift function, transposed, 
69.2 

principal fibre bundle, horizontal subspace, 69.3.7 

principal fibre bundle, infinitesimal action map of Lie algebra 
elements, 66.5.2 

principal fibre bundle, non-topological, 21.9, 21.9.4 

principal fibre bundle, non-topological, right action map, 
21.11.4 

principal fibre bundle, non-topological, right transformation 
group, 21.11.8 

principal fibre bundle, pointwise right action map, 66.2.10 

principal fibre bundle, right transformation groups, 20.7.1, 
20.8.3 

principal fibre bundle, structure group invariants, 20.9.22 

principal fibre bundle, subgroups, 17.7.13 

principal fibre bundle, topological, 47.8, 47.8.3 

principal fibre bundle, transformation groups, 20.3.6 

principal fibre bundle, vertical component map, 69.4.2 

principal frame, tangent vector, fibre atlas, 55.7.7 

principal frame bundle, 55.7 
principal frame bundle, tangent vector, 55.7.8

principal frame bundles are principal bundles, 55.7.11 

principal G-bundle, topological, 47.8.3 

principal principle, mathematical induction, 12.1.7 

principal velocity chart map for tangent vector-frame 
fibration, 55.7.5 

principal velocity chart on tangent vector-frame fibration, 
55.7.5 

principle, anthropomorphic, 48.2.4 

principle, Mach’s, 48.2.4 

principle of mathematical induction, 7.10.1 

principle of mathematical induction, finite, 12.2.13 

principle of mathematical induction, Peano axiom, 14.1.7 

principle of mathematical induction, wrapped finite, 12.2.24 

principle of mathematical induction (theorem), 12.2.12 

principle of transfinite induction, 11.8.2

probability notation, 7.7.16 

probability theory, combination symbol, 14.9.1 

probability theory, out of scope, 1.6.3 

problem, dark, 6.6.14

problem, Hilbert’s fifth, 62.1 

problem, somebody else's, equality relation, 6.7.3 

problem, somebody else's, model theory, 2.1.3 

problem, somebody else's, ZF set theory model, 

problem formalisation, ODE, 44.6.8 

procedure, inverse matrix, 24.9.10 

Proclus, lemmas, 2.4.6

Procrustes bed, 53.2.2 

product, alternating tensor, 30.4.8 

product, anticommutative, 19.10.2 

product, binary Cartesian, projection map, 10.12 


product, binary Cartesian, projection slice, 10.12 

product, Cartesian, countable set-family, 13.7.22 

product, Cartesian, finite set-family, 13.7.17 

product, Cartesian, general set-family, 10.11.10 

product, Cartesian, partial, 10.17, 10.17.2 

product, Cartesian, sequence, 14.6 

product, common-domain, of differentiable maps, 52.7.5 

product, common-domain direct, of relations, 9.7.9 

product, differentiable manifold, 52

product, direct, differentiable manifolds, 52.6 

product, direct, functions, common-domain, 10.15 

product, direct, identification map for Cartesian space 
tangent bundles, 26.15.4 

product, direct, identification map for Cartesian space 
tangent spaces, 26.15.2 

product, direct, identification map for manifold tangent 
bundles, 54.7.6 

product, direct, identification map for manifold tangent 
spaces, 54.7.2 

product, direct, of tangent bundles, 54.7.5 

product, direct, of tangent bundles of Cartesian spaces, 26.15 

product, direct, of tangent bundles of manifolds, 54.7

product, direct, of tangent spaces, 54.7.1 




product, direct, of tangent spaces of Cartesian spaces, 26.15.1 


product, direct, topological manifolds, 50.4.2 
product, direct common-domain, function continuity, 32.11.2 
product, double-domain, partial function, 10.14.10 
product, finite sequence, 16.7 

product, function, multiple-domain, 55.5.26 
product, functions, double-domain, 10.14.3 
product, general Cartesian, projection map, 10.13 
product, general Cartesian, projection slice, 10.13 
product, hyperbolic inner, linear space, 24.10 
product, inner, hyperbolic on linear space, 24.10.3 
product, inner, linear space, 24.9 

product, inner, module over ring, 19.7 

product, inner, on module over a ring, 19.7.2 
product, inner, on real linear space, 24.9.4

product, juxtaposition, linear functional by vector, 23.4.10 

product, juxtaposition, multilinear function by vector, 
27.2.31, 56.7.9 

product, juxtaposition, tensors, 29.7 

product, manifolds, partial map differentiability, 52.6.8 

product, relation, double-domain direct, 9.7.2 

product, relations, double-domain direct, 9.7 

product, topological manifolds, 50.4

product, topological spaces, projected slice, 32.10

product, wedge, 30.4.11

product atlas, direct, non-topological manifolds, 49.3.10 

product atlas, direct, topological manifolds, 50.4.6 

product by real function, vector bundle cross-section, 
differentiability, 65.2.4 

product by real function, vector field, differentiability, 57.2.11 

product decomposition of differential of map, direct, global, 
58.10.2 

product function, list, 18.10.3 

product map, common-domain, differentiability, 52.6.13 

product-map, common-domain, differential, 58.7 

product map, double-domain, differentiability, 52.6.15 

product map differential, common-domain, component 
differentials, 58.7.9 

product of Cartesian space maps, common-domain, 
differentiable, 42.6.8 

product of Cartesian space maps, double-domain, 
differentiable, 42.6.9, 42.7.4 

product of diffeomorphisms, double-domain, differentiable, 
42.7.4 

product of functions, double-domain, 10.14 

product of manifolds, slice-set, tangent vector embedding 
map, 54.8.3 

product of maps, common-domain, pointwise differential, 
58.7.7, 58.10.7 

product of maps, common-domain direct, differential, 54.7.3 

product of maps, naive derivative, Leibniz rule, 61.3.1 

product of matrices, 25.3.7 

product of sets, Cartesian, properties, 9.4.6 

product of tensors, juxtaposition, 57.11.7

product of vectors, cross, 17.1.4 

product operation, scalar, total space of vector bundle, 65.2.2 

product rule, differential calculus, 40.5.9, 41.1.18 

product-structured differentiable manifold, 52.7.8 

product-structured differentiable manifold, induced atlas, 
52.7.1 

product-structured differentiable manifold, regular 
embedding, 52.7.5 

product-structured manifold, diffeomorphism, 52.7 

product-structured manifold, tangent vector embedding map, 
58.7.5 

product-structured manifold tangent vector embedding map, 
54.8

product-structured set, 10.15.12 

product-structured space, locally, fibre bundle, 21.0.7 

product-structured space, trivial fibre bundle, 10.15.11, 21.0.7 

product-structured submanifold, topological manifold, 50.5 

product-structured topological manifold, 50.5.10

product-structured topological manifold, induced atlas, 50.5.1 

product-structured topological manifold, regular submanifold, 
50.5.4 

product-structured topological space, 32.11.1, 32.11.6 

product-structured topological space, subspace 
homeomorphism, 32.11.4 

product-structured topological submanifold, induced atlas, 
50.5.8 

product topology, direct, 32.9.4 

product topology, direct, family of spaces, 32.12.2 
product topology, direct, two spaces, 32.9

product topology, family of spaces, 32.112 

product via atlases, direct, topological manifolds, 50.4.8 

productive ZF axiom, 7.4.2 

productivism, quantifier ontology, 13.7.15 

program, computer, analogy to mathematical theorems, 
2.2.14 

project drift, 1.4.1 

projected slice, topological space product, 32.10 

projected slice map, temporary notation, 32.10.2 

projected slice through point in subset of set-product, 10.12.7, 
10.13.12 

projection, Mercator, 77.1.4 

projection, oblique, 50.6.5, 50.6.6 

projection, second factor, drop function, 59.2.13 

projection function, list, 18.10.2

projection map, binary Cartesian product, 10.12 

projection map, Cartesian product component range, 11.5.27 

projection map, Cartesian set-product, 10.12.2, 10.13.2, 
11.5.25

projection map, Cartesian set-product, component subset, 
10.13.6 

projection map, differentiable fibre bundle, 64.8.3 

projection map, general Cartesian product, 10.13 

projection map, non-surjective, fibration, 21.2.6 

projection map, non-topological fibre bundle, 21.8.3 

projection map, tangent fibration, manifold, 54.5.4 

projection map, tangent-line bundle total space, Cartesian 
space, 26.14.7 

projection map, tangent vector tuple fibration, 55.5.8 

projection map, topological fibre bundle, 47.6.5 

projection slice, binary Cartesian product, 10.12 

projection slice, general Cartesian product, 10.13 

projective geometry, 77.2.5 

projective transformation group, 77.2.6 

proof, inline, 4.8.11, 10.6.2 

proof discovery, 4.5.13 

proof symbol, 1.5.6 

propagation, bug, mathematical logic, 4.5.11 

proper lower section, well-ordered set, 11.7.4 

proper subset, 7.3.6 

proper superset, 7.3.6 

properties, magic, complex numbers, 16.8.11 

proposition, concrete, 2.4.3 

proposition, empirical, 5.2.7 

proposition domain, concrete, 3.2, 3.2.3 

proposition domain, concrete, examples, 3.2.9 

proposition family, parametrised, 5.1 

proposition name map, 3.3.2

proposition name map, abbreviated notation, postfix, 3.11.10 

proposition name scope, 3.3.4 


proposition name space, 3.3, 3.3.2 


proposition parameter, 5.1.4 


proposition template, 7.2.3 
propositional calculus, 4, 4.5.7, 4.9.4
propositional calculus, domain of interpretation, 5.1.8
propositional calculus, scroll management, 4.3.4, 4.3.7, 4.3.10,
4.8.14
propositional calculus, semantics-free, 4.1.1
propositional calculus axiomatic system PC, 4.4.3
propositional calculus axiomatic system PC', 4.9.2
propositional calculus formalisation, 4.1
propositional calculus formalisation selection, 4.2
propositional calculus formalisations survey, 4.2.1
propositional logic, 3
propositional logic, punctuation, unnecessary, 3.11.2 
propositional tautology calculus, 4.3.2 
propositions, two possible truth values, 3.2.6 
proxies, real-valued test-functions for chart components,
61.1.4 

proxies, vector fields for coordinate basis fields, 57.0.2, 74.2.5 

pseudo-absolute value function, 18.5.2 

pseudo-definition, infinity, 16.2.12 

pseudo-distance function, 73.10.3, 75.1.3 

pseudo-distance function pair, 75.1.10 

pseudo-distance-squared function, 75.1.8 

pseudo-metric, pseudo-Riemannian manifold, 75.1.3 

pseudo-metric distance, non-Hausdorff manifold, 49.5.4 

pseudo-metric distance function, 37.1.9 

pseudo-metric field, hyperbolic inner product, 24.10.1 

pseudo-metric space, topological completeness, 37.8.16 

pseudo-metric space, trivial topology, 37.2.5 

pseudo-metric tensor field, 75.1.4, 75.1.6, 75.1.11 

pseudo-notation, 1.4.7 

pseudo-notation, atlas of locally Cartesian space, 49.7.9 

pseudo-notation, components of families, 10.13.5 

pseudo-notation, covariant derivative of vector field on a 
curve, 71.7.3 

pseudo-notation, equinumerosity, 13.1.4 

pseudo-notation, Landau order symbols, 39.7.8 

pseudo-notation, numerosity domination, 13.1.10 

pseudo-notation E", 26.11.2

pseudo-notation M^, 23.1.5, 26.11.2 

pseudo-product of tensors, 57.11.7 

pseudo-Riemannian geometry, 75 

pseudo-Riemannian geometry, literature, 75.0.2 

pseudo-Riemannian geometry, not true geometry, 75.1.2 

pseudo-Riemannian geometry, overview, 75.1 

pseudo-Riemannian metric, differentiable, 75.2.2 

pseudo-Riemannian metric tensor field, 75.2, 75.2.1 

pseudo-Riemannian space, gauge theory, 75.4 

pseudo-theorem, 4.8.8

pseudo-value, limits, 11.2.18 

pseudo-vector-fields, induced by a map, 61.6.5 

pseudometric space, 37.1.9 

Ptolemy of Alexandria [Claudius Ptolemaeus], 26.11.4, 77.1.3

Ptolemy of Alexandria [Claudius Ptolemaeus], Table of
Chords, 44.2.22

pull-back, differential of map between manifolds, 58.8.1 

pull-back, gauge, of differential form on fibration, 64.7.14 

pull-back, native velocity, 40.1.1 

pull-back, right conjugate, transformation group, 17.8.6 

pull-back, test functions, differentiability definition, 52.1.18 

pull-back, transpose of linear map, 23.11.6 

pull-back atlas differentiability test for fibre bundle
cross-section, 64.9.11

pull-back atlas via diffeomorphism, differentiable manifold,
52.2.8, 52.2.8

pull-back atlas via homeomorphism, locally Cartesian space,
49.11.7

pull-back atlas via trivialisations, differentiable fibre bundle, 

pull-back differential, vector-valued, 58.8.6 

pull-back differential for vector-valued form, 58.11.7 

pull-back differential of diffeomorphism, global, 58.11.3,
58.11.5, 58.11.10

pull-back differential of differentiable map, global, 58.11 

pull-back differential of differentiable map, pointwise, 58.8 

pull-back differential of map at a point, 58.8.2

pull-back differential of map at a point, vector-valued, 58.8.7 

pull-back of topology, 32.0.1, 32.8.1 

pull-back operator, linear space, 23.11.3 

pulled-back lifted chart-basis vector field, principal bundle, 
69.14.3 

punch line, logic proof, 4.3.7 

punctuation, propositional logic, unnecessary, 3.11.2 
punctured closed ball, 37.3.14

punctured open ball, 37.3.14 

pure mathematics, applied mathematics, comparison, 77.4.5 
push-forth, differential of second-order operator, 59.0.1 
push-forth, left conjugate, transformation group, 17.8.6 
pyramid, Egypt, precision, 26.10.1
pyrrhic victory, partial differentiability, 41.1.8 

Pythagoras of Samos, 77.1.2 

Pythagorean numerology, 2.2.9 

Pythagoreans, 77.1.2 


Q, mnemonic letter for horizontal subspace, 67.9.5 

Q (rational number system), 15.1.6

QC (predicate calculus), 6.1.1

QED (quod erat demonstrandum), 1.5.6

QED symbol, 1.5.6

quadratrix, Hippias of Elis, literature, 26.19.3

quadrilateral curve family generated by vector fields, 46.5.6

quadruple, ordered, 9.3.3, 14.6.5

quadruplicity, 6.8.17 

quadruplique, 6.8.17 

quantification, selective, of free variables, 6.3.11, 6.7.8, 6.7.10 

quantifier, existential, 5.2.1 

quantifier, existential, ontology, 13.7.15 

quantifier, existential, restricted, 7.2.7 

quantifier, logical, infinity, 5.2.7, 5.2.8 

quantifier, logical, semantics, 5.3.11 

quantifier, multiplicity, 6.8.8 

quantifier, multiplicity, ordered pair test, 9.2.15

quantifier, universal, 5.2.1 

quantifier, universal, restricted, 7.2.7 

quantifier duality, logic, 5.2.7 

quantifier notation, unique existence, 6.8.3, 6.8.5 

quantifier notations, logic, survey, 5.24

quantifier ontology, productivism, 13.7.15 

quantifier reversal, axiom of choice, 10.11.11, 13.8.9 

quantifier reversal theorem, 10.5.17 

quantifier swapping, axiom of choice, 10.11.11, 13.8.9 

quantifiers, logical, 5.2 

quantifiers, logical, power sets, 7.6.28 

quantifiers, reversing logical, axiom of choice, 45.2.1 

quantum mechanics, out of scope, 1.6.3 

quark, matter field, 47.12.7

quaternion, 16.8.10 

quaternion, non-commutative division ring, 18.7.1

quaternion, origin of word “vector”, 26.1.10 

quaternions, possible origin of linear algebra, 22.0.1 

quilt, patchwork, 49.2.2 

Quine dagger, 3.7.8, 3.7.10, 3.7.14 

quintessence, 22.0.3

quintessence of fibre bundle, local trivialisation, 10.15.11, 
21.0.7 

quintessence of velocity, 53.1.2 

quintuple, ordered, 14.6.5 

quipu-like logic notation, Frege, 4.4.1 

quod erat demonstrandum, 1.5.6 

quodlibet, ex falso sequitur, 3.1.5 

quotient, linear spaces, 24.2, 24.2.8 

quotient group, 17.7, 17.7.9

quotient limit, differential, equals derivative, 40.8.5 
quotient linear space, natural homomorphism, 24.2.12 
quotient map of equivalence relation, 10.16.2 
quotient of functions, 10.5.27 

quotient rule, differential calculus, 40.5.9, 41.1.18 
quotient set, equivalence relation, 9.8.7 

quotient space, topological, 32.13, 32.13.7 

quotient topology, 32.13.9


r-linear map, 27.2.16 
rabbit, hat, 6.6.5, 12.5.5, 45.3.6, 62.0.1

rabbit, hat, complex numbers, 16.8.11 

rabbit, hat, IOU, 7.10.3, 45.37 

rabbit, IOU, open base point-choice function, 33.4.22 

radiation field, boson, 47.12.7 

radiation field, bosonic, gauge potential, 75.4.1 

radiation field, bosonic, principal bundle, 21.12.9, 47.11.2, 
69.11.5, 71.6.6 

radiation fields, bosonic, gauge theory, 65.0.1 

radio, 1.7.1 

radius, no word in ancient Greek, 44.2.22 

radius of ball, 37.3.1 

radius of set, metric space, 37.4.11 

Radon, Johann Karl August, 77.1.7 

Radon, Johann Karl August, integration, 43.1.2 

Radon measure, 20.9.20, 43.1.3

raising-index isomorphism, 73.5.4 

raising indices, tensor, 73.5 

raison d'être, axiomatic system, 4.9.5, 6.1.7

range, meanings, 9.5.26

range of a sequence, 12.3.2 

range of function, 10.2.6, 10.2.7 

range of relation, 9.5.4, 9.5.6 

range restriction, relation, 9.6.21 

range/domain specification, function, 10.2.1 

rank, column, of matrix, 25.5.8 

rank, first cab off, choice function, 33.4.8 

rank, matrix, 25.5 

rank, row, of matrix, 25.5.8 

rank, von Neumann universe, 12.5.4 

rank-nullity theorem, 24.2.17 

rank of set, 13.4.3 

rational number, 15, 15.1, 15.1.5 

rational number, Cauchy sequence, 15.3.7 

rational number, extended, 16.3 

rational number embedding in real numbers, 15.5.3 

rational number interval notation, 15.1.13 

rational number interval notation, semi-infinite, 15.1.14 

rational number notation summary, 1.5.1 

rational number representation, 15.33

rational number representations, 15.1.4 

rational number system, abstract, 15.1.9 

rational number system, ordered, abstract, 15.1.10 

rational numbers, cardinality, 15.2 

rational numbers, enumeration, 15.2, 15.2.4 

rational numbers, measure zero, 45.1.5 

rational numbers, topology, usual, 32.5.6 

rational numbers, uneven distribution, 15.1.17 

rational power, negative, positive real number, 16.6.12 

rational power, non-negative, non-negative real number, 
16.6.11 

rational power, non-negative real number, 16.6.10 

rational real number, 15.5.6 

raven, logic, 5.2.7 

razor, Ockham's, axiom of choice, 7.12.7 

razor, Ockham's, Newton's philosophising Rule I, 2.3.2 

R (real number system), 15.4.9

R regarded as a manifold, 57.9.9, 59.4.7 

real analytic function, 42.8.3 

real-analytic function, 16.8.10 

real analytic function, coefficient sequence, 42.8.3 

real analytic group, 62.2.4 

real function, constant, infinitely differentiable, 42.1.14, 
42.2.16 

real function, constant, zero derivative, 40.5.5 

real function, constant, zero partial derivative, 41.1.16 

real function with C^k derivative is C^{k+1}, 42.1.15

real infinity, definition, 16.2.2 
real Lie algebra, 19.10.14

real Lie group, 62.2.4 

real line with two origins, not Hausdorff, 49.5.5 
real linear space, 22.1.15 

real linear space, inner product, 24.9.5 

real linear space, normed, 24.7.7 

real negative definite bilinear function, 30.5.3 

real negative definite matrix, 25.11.7 

real negative semi-definite bilinear function, 30.5.3 
real negative semi-definite matrix, 25.11.7
real number, 15, 15.3, 15.4.13 

real number, extended, 16. 
real number, irrational, 15.5.6 


real number, negative of, 15.7.6 

real number, powers and roots, 16.6 

real number, rational, 15.5.6

real number addition, 15.7

real-number constructions, 16 

real number decimal representation, 15.3.5 

real-number interval, 16.1, 16.1.4 

real-number interval, finite, 16.1.12 

real-number interval, infinite, 16.1.12 

real-number interval, left-compact, 34.9.16 

real-number interval, left-open, 34.9.16 

real-number interval, right-compact, 34.9.16 

real-number interval, right-open, 34.9.16 

real-number interval, semi-infinite, 16.1.12 

real-number interval convexity, 22.11.24 

real-number interval length, 16.1.13 

real number interval topology, 34.9 

real-number limit point, 35.7 

real-number measure-zero sets, explicit family, countable 
union, 45.3.3 

real number multiplication, 15.8 

real number notation summary, 1.5.1 

real-number open set interval enumeration, 32.7 

real-number open set measure, 32.7.10

real number order, 15.6 

real number reciprocal, 15.8.11 

real number representation, 15.3.4 

real number representation, Dedekind cut, 15.4 

real-number sequence limit, 35.7

real-number set of explicit measure zero, 45.2, 45.2.2 

real-number set of explicit measure zero, properties, 45.2.4 

real-number set of measure zero, 45.1, 45.1.3 

real-number set of measure zero, explicit family, 45.3.2 

real-number set of measure zero, properties, 45.1.4

real number system, Cantor, 15.3.8 

real number system, Dedekind-cut, 15.9.2 

real number system, extended, 16.27 

real number system, extended, non-negative, 16.2.10 

real number system, extended, ordered, 16.2.13 

real number system axioms, 15.9 

real-number topology, the most important, 32.5.1 

real number tuple, 16.4, 16.4.2 

real numbers, absolute value, 16.5.2 

real numbers, addition operation, 15.7.2 

real numbers, computer hardware, 15.3.1 


real numbers, constructible, Cantor diagonalisation, 13.7.23 


real numbers, countable union of countable sets, 45.3.6 
real numbers, metric space, completeness, 37.8.14 
real numbers, model of observations, 15.3.2, 53.1.7 
real numbers, order, standard, 15.6.2 

real numbers, phenomena, measurements, 15.0.3 

real numbers, rational number embedding, 15.5.3 

real numbers, regarded as a manifold, 57.9.9, 59.4.7 
real numbers, set, 15.4.8 

real numbers, standard topology, 32.5

real numbers, the most important number system, 32.5.1

real numbers, topology, usual, 32.5.7 

real numbers, topology based on total order, 32.5.5 

real numbers usual metric, 37.2.8 

real part, complex number, 16.8.5 

real positive definite bilinear function, 30.5.3 

real positive definite matrix, 25.11.7

real positive semi-definite bilinear function, 30.5.3 

real positive semi-definite matrix, 25.11.7 

real power, non-negative real number, 16.6.14, 16.6.1 

real square matrix, lower bound, 25.12 

real square matrix, lower modulus, 25.11 

real square matrix, upper bound, 25.12 

real square matrix, upper modulus, 25.11 

real symmetric matrix, 25.13, 25.13.3 

real-tuple map, constant, infinitely differentiable, 42.6.2 

real-tuple-valued function on manifold, constant, 
differentiable, 51.7.4 

real-tuple-valued function on manifold, differentiable, 51.7.2 

real-valued function, basic, 16.5 

real-valued function, differentiable, 40.3.4 

real-valued function, differentiable, on differentiable manifold, 
51.6.2 

real-valued function, differentiable, on manifold, 51.6 

real-valued function, differential, 58.1.2 

real-valued function, differential, for tangent operators, 58.1.3 

real-valued function, differential, global, 58.2 

real-valued function, differential, on vector-tuple bundle, 
59.7.3 

real-valued function, differential, pointwise, 58.1 

real-valued function, higher-order differential, 59.10 

real-valued function at min/max, zero partial derivatives, 
51.6.12 

real-valued function on manifold, constant, differentiable, 
51.6.5 

real-valued function on manifold, constant, zero derivative, 
54.11.6 

real-valued function on manifold, constant, zero differential, 
54.11.18 

real-valued injection on interval, continuous, has continuous 
inverse, 34.9.25 

realisation of a group, 20.1.11 

rearrangement, sorted, of finite sequence, 14.11.3 

rearrangement of ordered selection, 14.11 

reciprocal, real numbers, 15.8 

reciprocal of real number, 15.8.11

reciprocal rule, differential calculus, 40.5.9, 41.1.18 

reconstruction of connection form from gauge potentials, 
69.12.7 

reconstruction of connection form from localisations, 69.12.3 

reconstruction of parallel transport from connection, 67.3 

Recorde, Robert, equality symbol, 6.7.11

rectangle, local approximation, Peano ODE method, 44.3.9 

rectangular matrix, 25.2, 25.2.2 

rectangular matrix addition, 25.3 

rectangular matrix linear space, 25.3.2 

rectangular matrix multiplication, 25.3 

rectangular set, 9.4.4 

rectangular Stokes theorem, two dimensions, 46.9 

rectifiable curve, 38.9.2

rectifiable curve, length-parametrisation, 38.9.3 

rectifiable curve, locally, bidirectional length-parametrisation,
38.9.11

rectifiable curve, locally, in locally Lipschitz manifold, 50.7.6 

rectifiable curve, metric space, 38.0.1, 38.9 

rectifiable curve in a manifold, 50.7 

rectifiable path, 48.2.7

recursion, structure, run-away, 54.0.1

recursive catchment area, differential geometry, 1.4.1, 2.0.1

recursive cumulative hierarchy of sets, transfinite, 126

recursive definition, 20.9.12

recursive differentials, 59

recursive gap-filling, Zauberlehrling, Goethe, 1.4.1

recursive interpretation of infix logical expression, 3.9.12

recursive listing of elements of a set, 7.8.9

recursive tangent bundles, 59

red carpet, 7.10.2

reductio ad absurdum, 3.1.4, 4.1.6, 4.3.4, 4.4.4

reductio ad absurdum has absurd consequences, 3.1.5

reductionism, 2.0.1, 27.1.3

reductionist, ATA

redundancy, specification tuple, 31.3.5

redundant axiom, ZF set theory, 7.4.2

reference-frame transition group, topological fibre/frame
bundle, 47.13.2

references, literature, 79

refinement, common, interval partitions, 43.3.5

refinement, interval partition, 43.3.5

refinement, partition, 8.7.16

refinement condition, common, directed set, 37.8.15

refinement of a cover of a set, 8.7.9

refinement of a cover of a subset, 8.7.4

refinement of indexed cover, 10.18.4

refinement of indexed open cover, 33.5.7

refinement of interval partitions, common, coarsest, 43.3.7

refinement of open cover, 31.7.4

refinement of open cover, indexed, 33.5.7

reflexive cardinal, 13.10.7

reflexive relation, 9.6.15

reflexivity, equivalence relation, 9.8.2

reflexivity, weak, of an order, 1.1.3

reflexivity of equality axiom, 6.7.5

reglobalisation of localised cross-section, 21.6.4

regula falsi, 3.1.5

regular differentiable submanifold, 52.4.2

regular differentiable submanifold, basic properties, 52.4.4

regular differentiable submanifold, constant-graph condition,
52.4.8

regular differentiable submanifold point-set, 52.3.7

regular differentiable submanifold point-set, constant-graph
condition, 52.3.16

regular embedding, differentiable manifold, 52.5.3

regular embedding, fibre set of fibre bundle, differentiable,
64.11.3, 64.11.6

regular embedding, fibre space in fibration, differentiable,
64.3.9

regular embedding, product-structured differentiable
manifold, 52.7.5

regular embedding, topological manifold, 50.3.6

regular immersion, differentiable manifold, 52.5.4

regular immersion, topological manifold, 50.3.8

regular submanifold, fibre space in fibration, differentiable,
64.3.9

regular submanifold, product-structured topological manifold,
50.5.4

regular submersion, differentiable manifold, 52.5.6

regular submersion, topological manifold, 50.3.13

regular topological submanifold, topological manifold, 50.2.8

regularity, differentiable manifold map, 52.5.1

regularity, weak, 1.4.10

regularity axiom, ZF, 7.8

related vector fields by a map, 61.6.2

relation, 9, 9.5, 9.5.2

relation, codomain, 9.5.4

relation, composition, 9.6.2

relation, destination set, 9.5.21

relation, domain, 9.5.4, 9.5.6

relation, equivalence, 9.8, 9.8.2, 10.16

relation, equivalence, quotient map, 10.16.2

relation, graph of, 9.5.23

relation, identity, 9.6.9

relation, image, 9.5.4, 9.5.6

relation, injective, 9.6.18

relation, inverse, 9.6.13, 10.5.9

relation, origin set, 9.5.21

relation, range, 9.5.4, 9.5.6

relation, reflexive, 9.6.15

relation, set-inclusion, partial order, 11.1.12

relation, source set, 9.5.20, 9.5.21

relation, symmetric, 9.6.15

relation, target set, 9.5.20, 9.5.21

relation, transitive, 9.6.15

relation, univocal, 10.9.1

relation composition, 9.6

relation graph, 9.1.2, 9.5.21

relation image, 9.5.16

relation inverse image, 9.5.16

relation inversion, 9.6

relation network traversal, membership, 7.8.9

relation on the left, membership, 7.5.1

relation pre-image, 9.5.16

relation product, double-domain direct, 9.7.2

relation representation, 9.1.1

relation restriction, domain or range, 9.6.21

relation tuple, 9.5.28


[www.geometry.org/dg.html]




relations, direct product, double-domain, 9.7 

relations, family tree, 9.0.2 

relative, everything is, 20.10.2 

relative atlas on submanifold, 52.4.9, 52.7.4, 64.11.2 

relative set complement, 8.2.1

relative topology, 31.6, 31.6.2 

relativism, cultural, 2.2.7 

relativity, general, 75.33

relativity, history, 77.3 

relativity, special, 75.1.11 

relativity condition, fibre/frame bundle, 20.10.18 

renaissance, European, 77.1.3

rendezvous points, metamathematical definitions, 3.2.4 

reparametrisation of curve, length invariance, 38.8.5

repère mobile, 55.6.1, 55.7.12, 57.11

replacement axiom, ZF, 7.7 

representation, cardinality, by ordinal numbers, 13.2 

representation, contragredient, linear map, 23.11.19 

representation, contragredient, linear transformation group, 
23.11.20 

representation, dual, linear map, 23.11.19 

representation, dual, linear transformation group, 23.11.20 

representation, rational number, 15.1.3 

representation, real number, 15.3.4 

representation of integer, 53.3.6 

representation of patchwork space, 10.17.9 

representation of tangent vector, 53.3, 53.3.1 

representation theorem, well-ordered sets, 13.2.4 

representational art, 2.4.4 

representations, mathematical classes, 15.1.4 

representative curve, 36.8.4 

research, set theory, 7.0.1

research thrust, measure and integration, 43.1.3 

restricted existential quantifier, 7.2.7 

restricted universal quantifier, 7.2.7

restriction, function set, notation, 10.2.18 

restriction, well-ordering, 11.6.6 

restriction logic axiom, 4.4.4 

restriction of differentiable manifold, 51.4.15 


[draft: UTC 2023-1-3 Tuesday 00:13] 




restriction of differentiable manifold to open set, 51.4.16 

restriction of domain of relation, 9.6.21 

restriction of function, 10.4, 10.4.2

restriction of function to a set, 10.4.3 

restriction of manifold, 52.4.16 

restriction of relation, domain or range, 9.6.21 

restriction submanifold, differential, tangent vector 
embedding map, 58.5.10 

retina, sphere-surface, 49.2.7 

returns, diminishing, minimalist logic axioms, 4.9.5 

reversal of path, 36.8.8

reversal of quantifiers, theorem, 10.5.17 

reverse-operation group, 17.4.12

reversibility rule for pathwise parallelism, 48.3.5 

reversing logical quantifiers, axiom of choice, 45.2.1 

Ricci-Curbastro, Gregorio, 77.1.7 

Ricci-Curbastro, Gregorio, absolute differential calculus, 
27.1.5 

Ricci-Curbastro, Gregorio, relativity, 77.3.2 

Ricci-Curbastro, Gregorio, Ricci curvature tensor, 71.11.10 

Ricci-Curbastro, Gregorio, tensor calculus, 27.1.2 

Ricci curvature tensor, affine connection, 71.11.11 

Ricci curvature tensor, components, 71.11.14 

Ricci curvature tensor, Riemannian manifold, component 
version, 74.4.6 

Riemann, Georg Friedrich Bernhard, 77.1.6 

Riemann, Georg Friedrich Bernhard, Christoffel array, 67.1.4 

Riemann, Georg Friedrich Bernhard, manifold, 49.2.9 

Riemann, Georg Friedrich Bernhard, metric tensor, 27.1.5 

Riemann, Georg Friedrich Bernhard, relativity, 77.3.2 

Riemann curvature, differentiable fibration, 70.3.9 

Riemann curvature, ordinary fibre bundle, 70.3, 70.4.3 

Riemann curvature, ordinary fibre bundle, justification, 70.7 

Riemann curvature at a point, differentiable fibration, 70.3.8 

Riemann curvature component array notation survey, 71.11.9 

Riemann curvature justification theorem, toy version, 70.7.6 

Riemann curvature tensor, 70.4.1

Riemann curvature tensor, components, 71.11.8 

Riemann curvature tensor, Levi-Civita connection, 
component version, 74.4.4 

Riemann curvature tensor field, affine connection, 71.11.5 

Riemann-Darboux integral, 43.5 

Riemann-Darboux integral, basic properties, 43.7 

Riemann-Darboux-Stieltjes integral, 43.10

Riemann integrable function, real-valued, 43.4.7 

Riemann integral, 43.4, 43.4.6 

Riemann integral, vector-valued integrand, 43.9 

Riemann-Stieltjes integral, 43.10 

Riemann-Stieltjes integral, integrator curve, 43.10.3 

Riemann-Stieltjes integral, real-valued function, 43.10.4 

Riemann-Stieltjes integral, real-valued function, lower, 
43.10.4 

Riemann-Stieltjes integral, real-valued function, upper, 
43.10.4 

Riemann sum, 43.4.5 

Riemannian manifold, 73, 73.2.5, 73.2.11 

Riemannian manifold, angle between vectors, 73.2.16 

Riemannian manifold, curve length, 73.6, 73.6.2 

Riemannian manifold, distance function, 73.7 


Riemannian manifold, gradient of real-valued function, 74.6.2 


Riemannian manifold, Laplacian operator, 74.6.4 
Riemannian manifold, length of vector, 73.2.12 
Riemannian metric, 73.2.4 

Riemannian metric, Hessian operator, 73.1.1 


Riemannian metric component array field, 73.3.2 


Riemannian metric component array field, inverse, 73.3.8 


Riemannian metric differentiability, 73.2.10 
Riemannian metric function, 73.4, 73.4.2

Riemannian metric function for tensor field, 73.4.3

Riemannian metric integral, 73.9.2 

Riemannian metric tensor field, 73.2 

Riemannian metric tensor field induced by distance function, 
73.9.4 

right action, differentiable principal bundle,
chart-independence, 66.2.5

right action, non-topological principal bundle, 
chart-independence, 21.11.2 

right action, topological principal bundle, chart-independence, 
47.8.10 

right action invariance, principal bundle, Minkowski space, 
69.1.10 

right action map, 20.7.2 

right action map, effect on connection form, 69.8 

right action map, non-topological principal bundle, 21.11 

right action map, non-topological principal fibre bundle, 
21.11.4 

right action map, pointwise, principal fibre bundle, 66.2.10 

right action map, principal bundle, 66.2.2 

right action map, topological principal bundle, 47.8.7 

right action map differential properties, principal bundle, 
66.2.18 

right action map properties, principal bundle, 66.2.14 

right action on Lie transformation group, 63.64 

Right Ascension, 76.1.3 

right-compact real-number interval, 34.9.16 

right conjugate of subset of group, 17.8.3 

right conjugation map, 17.8.12 

right coset of subgroup, 17.7.1 

right-differentiable function, 40.9.4 

right element operator for ordered pair, 9.2.13 

right ideal of a ring, 18.1.9 

right infinitesimal transformation, 63.7 

right invariant vector field, 62.7.2

right invariant vector field, Lie group, 63.5.12 

right invariant vector field generated by vector, Lie group, 
62.7.7 

right invariant vector field on Lie group, 62.7 

right inverse, function, 10.5.13 

right inverse, group, 17.3.12 

right inverse, matrix, 25.4 

right inverse matrix, 25.4.5 

right inverse of surjection, explicit, 10.5.17 

right-open real-number interval, 34.9.16 

right-open set, 40.9.1 

right-ordering of group, 17.5.1 

right-ordering of group, Archimedean, 17.5.5 

right segment, partially ordered set, 11.2.12 

right slice-set submanifold of manifold product, 52.6.17 

right transformation group, 20.7, 20.7.2 

right transformation group, effective, 20.7.9 

right transformation group, effective topological, 36.11.6 

right transformation group, equivariant map, 20.8.4 

right transformation group, free, 20.7.11 

right transformation group, free topological, 36.11.6 

right transformation group, Lie, 63.5.3 

right transformation group, Lie, right translation operator, 
63.5.6, 63.5.7 

right transformation group, non-topological principal fibre 
bundle, 21.11.8 

right transformation group, principal bundle, 66.2.7 

right transformation group, topological, 36.11.3 

right transformation group, topological, self-acting, 36.11.9 

right transformation group, topological principal bundle, 
47.8.14 

right transformation group, transitive, 20.7.14 

right transformation group, transitive topological, 36.11.6

right transformation group homomorphism, 20.8, 20.8.
right transformation group of topological space, 36.11.2 
right transformation group of topological space, effective,
36.11.5
right translation, left invariant vector field, Lie group, 62.4.12, 
62.10.10 


right translation of fundamental vector field on principal 
bundle, 66.6.10, 66.6.11 

right translation operator for differential forms, 62.6.9, 63.5.9 

right translation operator for tangent covectors, 62.6.9, 63.5. 

right translation operator for tangent vectors, Lie group, 
62.6.3 

right translation operator for vector field on principal bundle, 
66.2.21 

right translation operator of Lie right transformation group, 
63.5.6, 63.5.7 

right translation operator on Lie group, 62.6, 62.6.2 

right well-ordering, 11.6.1

rigorous mathematics, 2.1.1 

rigour, logical, versus mathematical intuition, 2.4.8 

ring, 18.1, 18.1.2

ring, absolute value function, 18.5 

ring, affine space over module over, 26.6.6 

ring, affine space over unitary module over, 26.7.2 

ring, cancellation, 18.1.15

ring, cancellative ring, 18.1.15 

ring, commutative, 18.1.17 

ring, commutative unitary, 18.2.14 

ring, commutative unitary, differentiable real functions, 51.6.4 

ring, division, 18.7.2 

ring, division, commutative, 18.7.4 

ring, Gaußian integers, 18.2.17 

ring, ideal, 18.1.9 

ring, list operation, 18.10.3 

ring, module, inner product, 19.7.2 

ring, module, norm, 19.6.2 

ring, module, seminorm, 19.5.3 

ring, module over, 19.3 

ring, module over, inner product, 19.7 

ring, module over, norm, 19.6 

ring, module over, seminorm, 19.5 

ring, non-trivial, 18.1.5

ring, non-zero, 18.1.5 

ring, noncommutative, ordered, 18.3.9 

ring, normed module, 19.6.3 

ring, ordered, 18.3, 18.3.3 

ring, ordered, Archimedean, 18.1.1, 18.4, 18.4.2 

ring, ordered, cancellative semigroup, 18.3.11 

ring, ordered, homomorphism, 18.3.13 

ring, ordered, non-Archimedean, 18.4.4, 18.4.5, 18.4.7 

ring, ordered, noncommutative, 18.3.9 

ring, positive cone, 18.3.17 

ring, positive subset, 18.3.17 

ring, subring, 18.1.7 

ring, trivial, 18.1.5 

ring, unitary, 18.2, 18.2.2 

ring, unitary, characteristic, 18.2.13 

ring, unitary, homomorphism, 18.2.8 

ring, unitary, ordered, 18.6 

ring, unitary, unital morphism, 18.2.11 

ring, zero, 18.1.5 

ring homomorphism, 18.1.19 

ring-module, left, 19.3.1 

ring-module, morphism, 19.4 

ring-module, submodule, 19.3.9 

ring-module, unitary, submodule, 19.3.14 

ring-module, unitary left, 19.3.6 

ring-module endomorphism algebra, 19.9.5

ring-module morphisms, 19.4.3

ring-module morphisms, unitary, 19.4.11 

ring morphisms, 18.1.20

ring of endomorphisms of a module, 19.1.5 

ring of endomorphisms of module, 19.1.15 

ring of sets, 18.11.12, 18.12 

ring of sets generated by a set, 18.11.20 

ring ordering, 18.3.2 

ring ordering, Archimedean, 18.4.2 

ring with unity, 18.2.2 

rings, family tree, 18.1.1 

Rinow-Hopf geodesic completeness theorem, 73.7.2 

rise map, parallelism, affine connection, 71.10.2 

road, synthetic, to differential geometry, 74.2.6 

Robinson-style ordinal number definition, 12.1.8 

robot, half, 1.4.7 

robot task versus human task, 1.4.6 

robots, Mars, 77.1.3 

robots, Mars, arithmetic is good enough, 14.3.2 

roll, 76.8.4 

Rolle’s theorem, 40.6.4 

Roman Empire, 77.1.3 

Roman mind, 77.1.3 

roost, chickens, 77.4.15 

root, positive integral, non-negative real number, 16.6.5,
16.6.6, 16.6.8

root, real number, 16.6 

root, square, Cauchy-Schwarz inequality, not needed for 
module over ring, 19.8.8 

root, square, Cauchy-Schwarz inequality, real numbers, 
16.6.17 

root, square, enumerations of ω × ω, 13.9.4

root, square, needed for Cauchy-Schwarz inequality, 19.8.1,
19.8.4

root, square, real numbers, basic inequalities, 16.6.18, 16.6.19 

root, square, triangle inequality, not needed for modules over 
rings, 19.8.10 

root node, logical expression styles, 3.11.1 

Rosetta stone, connection definition conversion rules, 77.4.1 

Rosetta stone, connection definitions, 69.15.2 

Rosetta stone, differential geometry languages, 1.4.5 

Rosetta stone, generator function, connection conversions, 
69.15.3 

Rosetta stone, required for all geometry concepts, 77.4.2 

rotation of two-sphere, 76.8.1 

rote learning, 1.7.2 

round function, 16.5.17 

route, 36.1.1 

row index, matrix element, 25.2.11 

row matrix injection map, 25.2.19 

row matrix of a matrix, 25.2.14 

row matrix span, 25.5.2 

row null space, 25.6 

row null space of matrix, 25.6.2 

row nullity, 25.6 

row nullity of matrix, 25.6.3 

row of a matrix, 25.2.13 

row rank of matrix, 25.5.8 

row span, matrix, 25.5 

row vector of a matrix, 25.2.15 

rule, chain, differential calculus, 40.5.15 

rule, chain, partial derivatives, 41.7.2, 41.7.4 

rule, chain, partial derivatives, higher-order, 42.5.27 

rule, chain, partial derivatives, second-order, 42.5.25 

rule, composition, differential calculus, 40.5.15 

rule, composition, partial derivatives, 41.7.2, 41.7.4 

rule, derived, propositional calculus, 4.5.10 

rule, generation, 20.9.12

rule, product, differential calculus, 40.5.9, 41.1.18
rule, quotient, differential calculus, 40.5.9, 41.1.18 
rule, reciprocal, differential calculus, 40.5.9, 41.1.18 
rule, sum, differential calculus, 40.5.9, 41.1.18 
Rule C, predicate calculus, 4.8.5, 6.1.7, 6.1.10, 6.1.11, 6.3.19 
Rule G, predicate calculus, 4.8.5, 6.1.10, 6.1.11, 6.3.19 

ruled surface, 74.1.1

ruler with ant, 69.10.5 

rules, binding precedence, logical operator, 3.9.6 

rules, derivation, logic, 6.1.3
rules, derived, validity, 4.3.6 

rules, differentiation, single variable, 40.5 
run-away structure recursion, 54.0.1
Russell, Bertrand Arthur William, 2.1.4, 2.2.9, 
Russell, Bertrand Arthur William, moie 
Russell, Francis Stanley (Frank), 2.1.4 
Russell's paradox, 3.2.10, 7.8.6
Russell's socks metaphor, axiom of choice, 7.12.2 




sacrifice beauty for truth, 1.4.11 

Saks, Stanislaw, 77.1.7 

sample, ordered, 14.10.1 

sample, unordered, combination symbol, 14.9.1 

sample sequence, interval partition, 43.4.3 

sandy foundations, intellectual towers, 2.0.2 

sawtooth functions, 16.5.22

scalar curvature, 74.4.7 

scalar curvature field, component version, 74.4.8 

scalar multiple of a set, Minkowski, 22.10.2 

scalar multiplication, 22.1.1 

scalar product operation, total space, vector bundle, 65.2.2 

scalar space, affine space over module over group, 26.6.2

scalar space, affine space over module over ring, 26.6.6 

scalar space, affine space over module over set, 26.4.10 

scalar-valued multilinear map, 27.2.1 

scaled tangent vector map, constant, differential, 59.4.10 

scaling, second-level tangent vector, 59.4.1

scaling, uni-axial, 24.4.1, 26.1.3, 77.2.7 

scaling curve, finite-dimensional linear space, 57.9.10 

scaling curve, tangent vector, 59.4 

scaling curve, tangent vector, differential, 59.4.7 

scaling curve, vector bundle, 65.5 

scaling curve velocity, vector bundle element, 65.5.2 

scaling curve velocity field, tangent vector, verticality, 59.4.3 

scaling curve velocity in linear-space manifold, 57.9.11 

Schauder, Juliusz Pawel, 77.1.7 

schema, axiom, 4.1.6 

Schild, Alfred, 77.1.7 

Schild’s ladder, 72.2.2 

Schild’s ladder, geodesic curves, 72.2.3 

Schild’s ladder, geodesic spray, 59.5.3 

Schröder-Bernstein theorem, 13.1.8

Schrödinger's cat, Hausdorff condition, 49.5.10

Schwartz, Laurent-Moise, 77.1.8 

Schwartz distribution, 19.5.1, 43.1.3 

Schwartz distribution, tangent vector representation, 53.3.1 

Schwarz, Karl Hermann Amandus, 77.1.7 

Schwarzschild, Karl, 77.1.7 

science channel, definitions and theorems, 1.5.9 

scope, proposition name, 3.3.4

scope of this book, 1.6.3

Scottish law, not proven, axiom of choice, 7.1.11 

scroll management, propositional calculus, 4.3.4, 4.3.7, 4.3.10,
4.8.14

search, backwards-deductive, 4.5.13 

second countable space, open base choice function, 33.4.23 

second countable topological space, 33.4.13 

second countable versus separable spaces, 33.4.17 

second derivative, non-negative, implies convexity, 42.1.22

second derivative, non-positive, implies concavity, 42.1.22

second derivatives of distance function, Hessian, 75.1.11 

second dual of linear space, 23.10.2

second factor projection, drop function, 59.2.13 

second-level tangent bundle, 59.1, 59.1.4, 59.1.22 

second-level tangent bundle, horizontal component swap 
function, 59.6, 59.6.3, 59.6.6 

second-level tangent-line set, 59.1.8 

second-level tangent space, 59.1.19 

second-level tangent space, vertical subspace, 64.5.7 

second-level tangent vector, 59.1.6 

second-level tangent vector notation, 59.1.12 

second-level tangent vector scaling, 59.4.1

second-order differential of map between manifolds, 59.12.2 

second-order partial derivative chain rule, 42.5.25 

second-order partial derivative vector field, 59.9.3 


second-order tangent component tuple, 60.5.3 

second-order tangent fibration, 60.5.12 

second-order tangent operator, 60.2, 60.2.2 

second-order tangent operator, tagged, 60.2.11 

second-order tangent operator, tensorisation coefficients, 60.4,
60.4.9

second-order tangent space, 60.5.9 

second-order tangent vector, 60.5, 60.5. 

second-order tangent vector field, 59.8.2 

section, lower, well-ordered set, 11.7.4 

section, proper lower, well-ordered set, 11.7.4 

section means cross-section, 21.3.1 

section of topological fibration, 47.4.2 

section of topological fibration, continuous, 47.4.6 

section of topological fibration, local, 47.4.2 

sectional curvature, 74.5 

sectional curvature, two-sphere, 76.6.2 

sectional curvature map, bilinear, 74.5.2 

sectional curvature quotient map, 74. 

seek truth from facts, 1.4.15 

segment, hyperplane, parametrised, 26.9 

segment, initial, ordinal number, 12.1.8

segment, initial, well-ordered set, 11.7.3 

segment, left, partially ordered set, 11.2.12 

segment, line, affine space, 26.9.7 

segment, line, parametrised, 26.9 

segment, right, partially ordered set, 11.2.12 

segment, weak initial, ordinal number, 12.5.27 

segment, weak initial, well-ordered set, 11.7.10 

selection, ordered, 14.10, 14.10.1 

selection, ordered, rearrangement, 14.11 

selection, ordered, sorting, 14.11 

selection, strongly ordered, 14.10.2 

selection, unordered, combination symbol, 14.9.1 

selection, weakly ordered, 14.10.2 

selective quantification of free variables, 6.3.11, 6.7.8, 6.7.10 

self-arrow, 11.2.3

self-projection fibration, 21.2.5 

semantics, 2.2.3 

semantics, knowledge set, predicate calculus, 6.5.9 

semantics, logical quantifier, 5.3.11

semantics, universalisation, free variables in sequents, 6.4.1 

semantics-free propositional calculus, 4.1.1

semi-closed interval, 11.5.10, 16.1.5

semi-definite, negative, 25.11.7, 30.5.3 

semi-definite, positive, 25.11.7, 30.5.3 

semi-infinite real-number interval, 16.1.12 

semi-open interval, 11.5.10, 16.1.5 

semi-open interval, circle topology, 32.6.14 

semi-open interval, circle topology distance function, 37.2.10 

semi-open interval, torus topology, 32.6.15 

semi-open interval, torus topology distance function, 37.2.12

semi-Riemannian manifold, 75.0.1
semi-ring of sets, 18.11.7 

semicolon notation for sets, variants, 7.7.16 
semigroup, 17.1, 17.1.1 

semigroup, cancellative, 17.1.14 

semigroup, cancellative, ordered ring, 18.3.11 
semigroup, commutative, 17.1.15 
semigroup, identity, 17.2.2 

semigroup, left module, 19.2.4 

semigroup, list operation, 18.10.2 
semigroup, module over, 19.2 

semigroup, set-endomorphisms, 17.1.11 
semigroup, sum of finite sequence, 17.1.17 
semigroup homomorphism, 17.1.8 
seminorm, continuity, 24.6.5
seminorm, linear space, 24.6, 24.6.3 
seminorm, linear space, bounds, 24.8 
seminorm, module over ring, 19.5
seminorm on module over ring, 19.5.3 


seminorm unit ball recovery, Minkowski functional, 24.6.10 


seminorm unit balls, Minkowski functionals, 24.6.7 
seminorm unit balls, properties, 24.6.8 

sensitivity, noise, operational amplifier, 44.1.8, 44.6.2 
separability, topological space, 33.4
separable space, completely, 33.4.11 

separable space, locally connected, 34.8 

separable topological space, 33.4.6

separable topology choice function, 33.4.8 

separable versus second countable spaces, 33.4.17 
separated pair of sets, strongly, 33.2.10 

separated pair of sets, topologically, 33.2 

separated pair of sets, weakly, 33.2.3

separated topological spaces, strongly, connectedness, 34.2 
separation, Hausdorff, 33.1.25
separation axiom, Zermelo set theory, 7.7.3 

separation class, completely regular, 33.3.18 

separation class, Hausdorff, 33.1.24 
separation class, normal, 33.3.24 
separation class, T0, 33.1.5

separation class, T1, 31.3.8, 31.10.5, 33.1. 
separation class, T2, 33.1.24 

separation class, T3, 33.3.2 

separation class, T3½, 33.3.15


separation class, T4, 33.3.20 

separation class, T5, 33.3.25

separation class, T6, 33.3.30 

separation class, Vedenisov axiom, 33.3.29 

separation classes for topological spaces, stronger, 33.3 
separation classes for topological spaces, weaker, 33.11. 
septuple, ordered, 14.6.5 

sequence, 12.3, 12.3.1 

sequence, Cauchy, 37.8 

sequence, Cauchy, metric space, 37.8.3 

sequence, convergence, 35.4 

sequence, convergent, 35.4.2 

sequence, divergent, 35.4.2 

sequence, etymology, 11.5.23 

sequence, exact, linear maps, 24.5 

sequence, finite, 12.3.1 

sequence, finite, sorted rearrangement, 14.11.3 
sequence, finite, sorting-permutation, 14.11.3 
sequence, finite, sum and product, 16.7 

sequence, first-instance subsequence, 12.4.16 

sequence, infinite, 12.3.1 

sequence, injective, 12.3.2 

sequence, limit, 35.4, 35.4.2, 35.4.12 

sequence, partial, 10.17.5 
sequence, partial-sum, topological linear space, 39.2.2

sequence convergence, metric space, 37.7.11

sequence limit, real-number, 35.7 

sequence-limit-based compactness, 35.6 

sequence limit in metric space, 37.7.12 

sequence of functions, equicontinuous, pointwise, 38.4.10 
sequence of functions, equicontinuous, uniformly, 38.4.1 
sequence of functions, pointwise bounded, 38.4.13 
sequence of functions, uniformly bounded, 38.4.14 
sequence of functions, uniformly Cauchy convergent, 38.4.3 
sequence of functions, uniformly convergent, 38.4.0
sequence of subsequences, nested, Ascoli's theorem, 38.5.1 
sequence range, 12.3.2 

sequence-range ω-limit-point compact set, 35.6.4
sequence-range limit-point compact set, 35.6.4 

sequence sum, finite, semigroup, 17.1.17 

sequent, compound, 5.3.14 

sequent, predicate calculus, 6.3.3 

sequent, propositional calculus, 6.1.3 

sequent, simple, 5.3.14

sequent, Verdünnung (thinning), 5.3.8 

sequent, Vertauschung (exchange), 5.3.8 

sequent, Zusammenziehung (contraction), 5.3.8 

sequent calculus, 5.3, 5.3.9

sequent expansion, 6.5.9, 6.6.26 

sequential compactness, explicit, real compact sets, 35.7.10 
sequential compactness, real compact sets, 35.7.11 
sequentially compact set, 35.6.2
sequentially compact space, 35.6.2 

sequitur quodlibet, ex falso, 3.1.5 

serendipity, Lie bracket, 61.5.4

series, convergent infinite, topological linear space, 39.2.7 
series, divergent infinite, topological linear space, 39.2.7 
series, infinite, topological linear space, 39.2, 39.2.2 
series, power, 42.8
series, sum, topological linear space, 39.2.6, 39.2.9 

series, Taylor, 16.8.10, 44.2.2 

serpentine development, finite sets, 13.7.1 

set, beta, 13.4.12 

set, constructible, 9.1.1 

set, dark, 2.3

set, determinable content, 7.8.9 

set, directed, 37.8.15

set, empty, 7.6.3 

set, finite, 13.5.2 

set, incompressible, 2.3.5 

set, infinite, 13.7.2

set, naive, 3.2.3, 3.2.10, 6.5.9 




set, nature, 7.8.9 


set, non-empty, 7.6.3 

set, power, 7.6.25

set, singleton, 7.5.6 

set, unmentionable, 2.3.3 

set addition, Minkowski, 22.10.2, 22.10.4 

set algebra, 8.1, 8.4, 18.11.15 

set beta-cardinality, 13.4.9 

set boundary, topological, 31.9.5 

set cardinality, comparability theorem, 13.1.20 

set cardinality, trichotomy theorem, 13.1.20 

set cardinality-rank, 13.4.6 

set class, 9.2.4 

set class, algebraic, 18.11 

set classes, 18.12 

set complement, 8.2, 8.2.1 

set constructions

set cover, 8.7 

set cover, indexed, 10.18, 10.18.2 
set diameter, 37.4, 37.4.6 

set difference, symmetric, 8.3 
set distance, 37.4
set-endomorphism semigroup, 17.1.11
set existence axiom, ZF, 7.4.2, 7.4.3
set exterior, topological, 31.9.2
set family, 10.8.2
set family, Cartesian product, 10.11, 10.11.2
set field, 18.11.15
set inclusion, 7.3, 7.3.2
set-inclusion chain, 12.5.17
set-inclusion relation, partial order, 11.1.12
set interior, topological, 31.8.2
set intersection, 8.1.2
set intersection properties, binary, 8.1
set intersection properties, general, 8.4
set language, high-level, 7.9.2
set-limit-based compactness, 35.5
set-map, double, 10.7.4
set-map, forward, conditions for continuity, 35.1.8
set-map, function, 10.6
set-map, function, double, 10.7
set-map, function, mixed, 10.7
set-map, function, properties, 10.6.7
set-map, inverse, conditions for continuity, 35.1.2
set-map, inverse, function, 10.6
set-map, partial function, properties, 10.9.12
set-map for a function, 10.6.4
set-map for a function, double, 10.7.5
set-map for a function, inverse, 10.6.4
set-map for a function, inverse, double, 10.7.5
set-map for a partial function, 10.9.10
set-map for a partial function, inverse, 10.9.1
set-map properties, continuous function, 35.1
set membership chain, 7.8.9
set membership symbol ∈, 7.2.9
set morphism, 10.5.21
set non-membership notation, 7.2.5
set notations with braces and semicolons, 7.7.8
set of explicit measure zero, real-number, properties, 45.2.4
set of measure zero, real-number, properties, 45.1.4
set of real numbers of explicit measure zero, 45.2, 45.2.2
set of real numbers of measure zero, 45.1, 45.1.3
set of real numbers of measure zero, explicit family, 45.3.2
set ontology, 7.4.1
set operations, 8
set partition, 8.7, 8.7.12
set product, Cartesian, 9.4, 9.4.2
set-product, Cartesian, projection map, 10.12.2, 10.13.2, 11.5.25
set-product, Cartesian, projection map, component subset, 10.13.6
set-product, Cartesian, slice through point in set, 10.12.5, 10.13.10
set quotient, equivalence relation, 9.8.7
set rank, 13.4.3
set ring, 18.11.12
set rings and algebras, family tree, 18.12.6
set rings and fields, 18.12
set semi-ring, 18.11.7
set sigma-algebra, 18.12.5
set sigma-field, 18.12.5
set sigma-ring, 18.12.2
set σ-algebra, 18.12.5
set σ-field, 18.12.5
set σ-ring, 18.12.2
set specification, naive comprehension, 7.7.9
set sum, Minkowski, 22.10.2, 22.10.4
set-theoretic formula, 7.2.2
set-theoretic formula, always-true, 10.2.20
set-theoretic function, 10.1.2
set-theoretic function formula, two-parameter, 7.7.6
set theory, 7
set theory, applicable, 7.0.1
set theory, atom, 9.28
set theory, backbone, ordinal numbers, 12.6.4
set theory, Neumann-Bernays-Gödel, 7.5.11, 7.8.6
set theory, underlying, for logic, 3.2.5
set theory, underlying, for model theory, 3.16.2
set theory, underlying, model theory, 7.1.5
set theory, Zermelo, 7.7.3, 7.7.4
set theory, Zermelo-Fraenkel, concrete proposition domain, 3.2.9
set theory, Zermelo-Fraenkel, with axiom of choice, 7.11.10
set theory, Zermelo-Fraenkel, with axiom of countable choice, 13.7.21
set theory, ZF, 8-line summary, 7.3.9
set theory, ZF, redundant axiom, 7.4.2
set theory 8-line summary, ZF, 7.2.8
set theory axioms, social licence, 7.10.2
set theory axioms, Zermelo-Fraenkel, 7.2, 7.2.4
set theory axioms, ZF, basic four, 7.6
set theory construction stage, ZF, 7.8.10
set theory model, Zermelo, 12.6.3
set theory research, 7.0.1
set translate, 31113
set union, 8.1.2
set union properties, binary, 8.1
set union properties, general, 8.4
set union topology, 32.14, 32.14.6
set-union topology, disjoint, 32.14.11
set-union topology, overlapping, 32.14.8
sets defined by their comprehension, uniqueness, 7.6.24
several variables, higher-order derivative, 42.2
shadow set, 45.5, 45.5.2
shadow set, double, ED 363
shadow-set interval list, 45.5.6
shadow-set interval list, double, 45.6.8
shadow set measure, double, upper bound, 45.6.9
shadow set measure, upper bound, 45.5.8
shadow set, properties, 45.5.4
shadow set properties, double, 45.6.6
shadow set properties, torchlight, 45.6.5
shaft, vector, 22.0.2
shared primacy, sets and numbers, 12.0.2, 14.4.1
sharp musical isomorphism, 73.5.1, 73.5.4
sheep counting, 40.2.2
sheep versus goats, 77.4.15
Sheffer, Henry Maurice, 3.7.9
Sheffer stroke, 3.7.8, 3.7.10, 3.7.14
shim theorem, choice axiom, 13.8.14
short-cut, covariant tensor bundle cross-section, 46.2.1, 57.7.1, 58.11.6
short-cut, cross-section, form-style non-topological fibration, 21.4
short-cut, multilinear form, general, 57.7.25
short-cut differential form, differentiable manifold, 57.7
short-cut differential form space, vector-valued, 57.7.15
short-cut map, associated orbit-space cross-section, 47.12.5
short-cut map, contravariant principal bundle function, 47.12.5
short-cut map, form-style non-topological fibration, 21.4.9
short-cut orbit-space associated cross-section, 47.12, 47.12.3,
66.8

short-cut orbit-space associated cross-section space, 47.12.4 

short-cut orbit-space associated fibre bundle cross-section, 
66.8.1 

short-term memory, 7.10.2 

sigma-algebra of sets, 18.12.5 

sigma-field of sets, 18.12.5 

sigma-ring of sets, 18.12.2 

σ-algebra of sets, 18.12.5

σ-algebra of sets generated by a subset, 18.12.12

σ-field of sets, 18.12.5

σ-ring of sets, 18.12.2

σ-ring of sets generated by a set, 18.12.11

sign function, 16.5.4 

sign function, permutation, 14.8.22 

signed integer, 14.4.3 

signum function, real number, 16.5.5 

simple closed curve, 36.2.11 

simple curve, 36.2.11 

simple dual multilinear function, 27.4.14 

simple m-vector, 30.4.22 

simple multilinear function, 27.4, 27.4.6 

simple sequent, 5.3.14

simple tensor, 28.4, 28.4.2 

simplified fraction, 15.1.4 

sine, classical definition, 44.2.21, 44.2.22 

sine function, 44.2.14 

sine-of-reciprocal function graph closure, 34.7.8 

single-base-point fibration, 21.2.5 

single-chart locally Cartesian space is manifold, 51.5.12 

single substitution into symbol string, 3.9.5 

singleton axiom, 7.6.15

singleton set, 7.5.6 

singleton sets, uniqueness, 7.5.3 

singular matrix, 25.8.7

situs, analysis, 31.1.1 

situs, analysis, combinatorial topology, 31.1.2 

sixpence, tooth fairy, axiom of choice, 13.8.15 

skating, thin ice, 11.2.18 

skew-symmetric form, 30.4.27 

skew-symmetric tensor, 30.4.27 

Skolem's paradox, 12.6.4 

Skolem's paradox, axiom of choice, 7.12.6 

Skolem's paradox, constructible real numbers, 13.7.23 

Skylab, NASA, 26.10.1 

slice, projected, direct product of two topological spaces, 
32.10.1 

slice, projected, topological space product, 32.10 

slice, projection, binary Cartesian product, 10.12 

slice, projection, general Cartesian product, 10.13 

slice map, projected, temporary notation, 32.10. 

slice of function through point along component, 10.12.10, 
10.13.15 

slice-set, manifold product, tangent vector embedding map, 
54.8.3 

slice-set submanifold of manifold product, left, 52.6.17 

slice-set submanifold of manifold product, right, 52.6.17 

slice through point in subset of set-product, 10.12.5, 10.13.10 

slice through point in subset of set-product, projected, 
10.12.7, 10.13.12 

smallest topology, 31.5.1 

smoke and fire, 3.4.7, 3.7.4, 3.7.6 

snail trail, 59.1.2

snake, 2.1.1 

snakes and ladders, implications between compactness classes,
35.5.6 

Sobolev space, 19.5.1, 43.1.3 
social benefits, mechanisation of logic, 4.1.3

social licence, set theory axioms, 7.10.2 

socio-mathematical context, function composition, 10.10.8 

socio-mathematical network, 4.3.3, 7.10.2 

socks metaphor, Russell, axiom of choice, 7.12.2 

soft analysis, 35.1.1

soft predicate, 6.3.7, 6.3.9 

software error, 1185 

software library, 4.5.11 

solid geometry, Euclidean, 26.11.1 

somebody else's problem, equality relation, 6.7.3 

somebody else's problem, model theory, 2413 

somebody else's problem, ZF set theory model, 7.4.1 

sometimes-constant curve, 36.4.3 

sorted rearrangement of finite sequence, 14.11.3 

sorting of ordered selection, 14.11 

sorting-permutation for finite sequence, 14.11.3 

source set of relation, 9.5.20, 9.5.21 

space, affine, 26 

space, Euclidean, commutative, 26.11.3 

space, figure, for transformation group, 20.9.13 

space, flat, 26.1.2 

space, linear, 22.1, 22.1.1 

space, metric, 37

space, metric, continuity, 38 

space, patchwork, 10.17.8 

space, vector, 22.0.2

space-filling curve, 36.1.5, 36.3 

space-filling curve, Peano, 36.3.1 

space grooves, affine connection, 60.1.4 

space-time, Minkowski, 75.1.11 

space-time, Minkowski, gauge theory construction stages, 
75.4.1 

space-time, Minkowski, not true geometry, 75.1.2 

space-time, Minkowski, pseudo-Riemannian manifolds, 75.1.1 

space-time, Minkowskian metric tensor field, 75.2.5 

space-time interval, 20.1.10

span, column, matrix, 25.5 

span, column matrix, 25.5.2 

span, convex, 22.11.10 

span, convex, of two real numbers, 16.1.14 

span, linear, 22.4 

span, row, matrix, 25.5 

span, row matrix, 25.5.2 

span of points, affine space, 26.9.13 

span of subset, linear space, 22.4.2 

sparse array, 22.2.23 

sparse infinite matrix, 25.3.16 

spear, parallel transport, 71.0.3 

special function, 43.2.3, 43.2.8, 44.2.2 

special function, fundamental theorems of calculus, 43.8.2 

special functions, 44 

special relativity, 75.1.11 

specification axiom, 7.7.4, 9.1.1 

specification axiom, ZF, 7.7.3 

specification theorem, Zermelo-Fraenkel, 7.7.2 

specification tree versus tuple, 47.6.3

specification tuple, 8.8 

specification tuple, tangent velocity vector, 53.1.13 

specification tuple redundancy, 31.3.5 

speed, mean, bounded by maximum pointwise speed, 40.8.7 

sphere condition, exterior/interior, 26.19.4 

sphere-surface retina, 49.2.7 

spherical coordinates, astronomical, 76.2.3 

spherical coordinates, higher-dimensional, 76.1 

spherical coordinates, recursive formula, 76.1.3 

spherical coordinates, terrestrial, two-sphere, 76.2 

spherical geometry, 76 
spider, 3.6.7

spine of set theory, ordinal numbers, 12.6.4 

spiral, Archimedes, 26.19.1

Spivak, Michael David, 1.4.5 

spray on tangent bundle, 59.5 

spray on vector bundle, 65.5.8 

square function, 16.5.23 

square matrix, real, lower bound, 25.12 

square matrix, real, lower modulus, 25.11 

square matrix, real, upper bound, 25.12 

square matrix, real, upper modulus, 25.11 

square matrix algebra, 25.8 

square-norm, general module over ring, 19.7.1 

square root, Cauchy-Schwarz inequality, not needed for 
module over ring, 19.8.8 

square root, Cauchy-Schwarz inequality, real numbers, 16.6.17 

square root, enumerations of ω × ω, 13.9.4

square root, needed for Cauchy-Schwarz inequality, 19.8.1, 
19.8.4 

square root, real numbers, basic inequalities, 16.6.18, 16.6.19 

square root, triangle inequality, not needed for modules over 
rings, 19.8.10 

St. Petersburg Academy, 44.2.6 

stack, postfix expression, 3.11.7 

stage, construction, ZF set theory, 7.8.10 

stages, von Neumann universe, 12.6.4 

standard absolute value function, 18.5.14 

standard atlas for Cartesian space, 49.7.12 

standard atlas for finite-dimensional linear space, 49.7.14 

standard atlas for general linear group, 49.7.15 

standard basis, Cartesian linear space, 22.7.9 

standard basis, Euclidean linear space, 22.7. 

standard fibre, topological fibration with intrinsic fibre space, 

47.3.2 

standard immersion, free linear space, 22.2.10 

standard immersion, unrestricted linear space, 22.2.7 

standard immersion in tensor product using free linear space, 

28.7.2 

standard induced basis field, vector bundle fibre set, 65.1.8 

standard injection for direct sum of linear space sequence, 

24.1.5 

standard order, extended finite ordinal numbers, 12.2.11 

standard order, finite ordinal numbers, 12.2.5 

standard order, general ordinal numbers, 12.5.12 

standard topology, extended integers, 32.5.4 

standard topology, finite-dimensional linear space, 32.6.6 

standard topology, general linear group, 32.6.8 

standard topology, integers, 32.5.3 

standard topology, metric space, 37.5.2 

standard topology, quotient by equivalence kernel, 32.13.7 

standard topology, quotient by equivalence relation, 32.13. 

standard topology, rational numbers, 32.5.6 

standard topology, real-number Cartesian products, 32.6.1 

standard topology, real numbers, 32.5, 32.5.7

stars, fixed, 48.2.4, 59.8.1

start at 0 or 1, integer index set, 14.1.20, 14.6.9, 14.12.1,
14.12.4, 16.4.2

state bundle, baseless figure/frame bundle, 20.10.8 

state bundle, topological fibre/frame bundle, 47.13.2 

statement form, 4.1.6 

statement name, 4.1.6 

stationary interval of curve, removal, 36.4 

statistical variations, cloud, 2.2.5

Steiner, Jakob, 77.1.6

step function, 16.5.7 

stereo channels, book presentation, 1.5.9 

stereo mathematics, 2.4.8

stick, measuring, 53.1.3, 53.1.4, 59.1.2 
Stiefel, Eduard Ludwig, 77.1.7

Stiefel, Eduard Ludwig, fibre bundles, 47.0.3 

Stieltjes integral, linear-operator-valued function, 43.12.3 

Stieltjes integral, operator-valued integrand, 43.12 

Stieltjes integral, vector integrator differential, 43.12.1 

Stieltjes integral, vector-valued function, 43.11.3

Stieltjes integral, vector-valued integrand, 43.11 

Stieltjes integral, vector-valued integrator curve, 43.12.2 

Stifel, Michael, 14.9.6 

Stokes, George Gabriel, 77.1.7 

Stokes theorem, exterior derivative, 46.9.7 

Stokes theorem, motivation for exterior derivative, 46.7.12 

Stokes theorem, pathwise parallelism, 48.2.2

Stokes theorem, rectangular, in three dimensions, 46.10.1 

Stokes theorem, rectangular, three dimensions, 46.10 

Stokes theorem, rectangular, two dimensions, 46.9

Stokes theorem, rectangular in two dimensions, 46.9.3 

Stokes theorem literature, 46.9.1 

stone, Rosetta, connection definition conversion rules, 77.4.1 

stone, Rosetta, connection definitions, 69.15.2 

stone, Rosetta, differential geometry languages, 1.4.5 

stone, Rosetta, generator function, connection conversions, 
69.15.3 

Stone-Weierstraß theorem, generalised, 7.11.13

stones, casing, Tura, 26.10.1

straight line, differentiable manifold, 53.1.9 

straw, camel's back, 5.2.8 

stress-energy tensor, 30.5.1 

stretch of curve, constant, 36.4.2 

strip, Möbius, topological fibre bundle, 47.6.18

stroke, Sheffer, 3.7.8, 3.7.10, 3.7.14 

strong force, gauge theory, 70.8.1 

stronger topology, 31.3.23 

strongly connected function, 35.2.6 

strongly locally compact set, 33.6.11 

strongly locally compact topology, 33.6.10 

strongly ordered selection, 14.10.2 

strongly separated pair of sets, 33.2.10 

strongly separated topological spaces, connectedness, 34.2 

structural layers of differential geometry, 1.1 3 

structure, differentiable manifold, concrete, 51.5.17 

structure, induced linear, on fibre sets, 24.11.2 

structure, logic, 2.4.3 

structure constant, Lie algebra, 19.10.6 

structure equation, curvature form on principal bundle, 70.5.7 


structure equation for curvature forms, literature, 70.5.6 
structure group, baseless figure/frame bundle, 20.10.8 
structure group, differentiable fibre bundle, 64.8.3 
structure group, minimal, non-topological fibre bundle, 21.7.7 
structure group, non-topological fibre bundle, 21.8.2 
structure group, topological fibre bundle, 47.6.5 
structure group, topological fibre/frame bundle, 47.13.2 
structure-preserving fibre set map, 48.1, 48.1.2 
structure-preserving transformation group, 19.2.2 
structure recursion, run-away, 54.0.1 

style, ZF axioms, 7.9.2 

subalgebra, Lie, 19.10.17 

subbase, open, 32.3 

subbase, open, topology, 32.3.5 

subbase at a point, open, topology, 32.3.2 

subcover of cover of a set, 8.7.8

subcover of cover of a subset, 8.7.3 

subcover of open cover, 31.7.3 

subexpression, logical, infix, 3.12.8 

subfield, 18.7.13 

subgroup, 17.6, 17.6.1 

subgroup, continuous one-parameter, 36.9.6 

subgroup, differentiable one-parameter, 62.9.2 
subgroup, one-parameter, Lie group, 62.9
subgroup generated by a subset, 17.6.5
subgroup-set, 17.6.3
subject classification, MSC 2010, 1.8
submanifold, differentiable, 52.4
submanifold, fibre set, non-topological fibration, 21.5.7
submanifold, fibre set, standard manifold atlas, 64.11.5
submanifold, fibre space in fibration, regular differentiable, 64.3.9
submanifold, horizontal, fibre-set vector field, 64.5.14
submanifold, product-structured, topological manifold, 50.5
submanifold, product-structured topological, induced atlas, 50.5.8
submanifold, regular, product-structured topological manifold, 50.5.4
submanifold, regular differentiable, 52.4.2
submanifold, regular differentiable, basic properties, 52.4.4
submanifold, regular differentiable, constant-graph condition, 52.4.8
submanifold, regular topological, topological manifold, 50.2.8
submanifold, relative atlas, 52.4.9, 52.7.4, 64.11.2
submanifold, terminology, 50.2.3
submanifold, topological, topological manifold, 50.2.6
submanifold, topological manifold, 50.2
submanifold atlas, differentiable, induced by ambient atlas, 52.4.7
submanifold atlas, fibre set, fibre-chart-induced, 64.3.7
submanifold atlas construction, 52.4.11
submanifold atlas removal, 52.4.10
submanifold differentiable structure, fibre-set, 64.11
submanifold inclusion map differential, tangent vector embedding map, 58.5.6
submanifold of manifold product, slice-set, left, 52.6.17
submanifold of manifold product, slice-set, right, 52.6.17
submanifold point-set, differentiable, 52.3
submanifold point-set, differentiable, regular, 52.3.7
submanifold point-set, regular differentiable, constant-graph condition, 52.3.16
submanifold restriction, differential, tangent vector embedding map, 58.5.10
submanifold tangent bundle, 54.6
submanifold tangent bundle, tacit identification, 66.5.5
submanifold tangent vector embedding map, 54.6, 54.6.2
submanifold tangent vector embedding map, differential of manifold embedding, 58.5.8
submanifold tangent vector embedding map, differential of submanifold restriction, 58.5.10
submersion, differentiable manifold, 52.5
submersion, regular, differentiable manifold, 52.5.6
submersion, regular, topological manifold, 50.3.13
submersion, terminology, 50.2.3
submersion, topological manifold, 50.3
submodule of module over a ring, 19.3.9
submodule of unitary module over a ring,
subring, 18.1.7
subring generated by a subset, 18.1.8
subscript indices, linear spaces, 22.8.21, 23.8.4
subscript order, transition map, fibre chart, 47.6.11
subsequence, 12.3.3
subsequence, finite, 12.3.3
subsequence, infinite, 12.3.3
subsequence function for lists, 14.12.6
subsequence index map, 12.3.2
subsequence map, standard, for Cartesian product,
subsequence of a sequence, first-instance, 12.4.16
subset, 7.3.2
subsets of a set, multiplicative class, 18.11.4

subspace, horizontal, ordinary fibre bundle, 67.9.6 

subspace, horizontal, principal fibre bundle, 69.3.7 

subspace, metric, 37.2.14 

subspace, trivial, 22.1.14, 22.5.4 

subspace duals, linear, natural isomorphisms, 24.3 

subspace of linear space, 22.1.10 

subspace spanned by subset of linear space, 24.1.7 

substitute-item function for lists, 14.12.6 

substitute-value operator for functions, 14.12.23 

substitution, exhaustive, logical expression, 6.1.2 

substitution, logical expression, 3.10 7 

substitution, single, into symbol string, 3.9.5 

substitution, uniform, into symbol string, 3.10.7 

substitution axiom, ZF set theory, 7.7.1 

substitution notation, expression, post-evaluation, 10.4.14 

substitution of equality, 7.5.12

substitution operator, 27.2.26 

substitutivity of equality axiom, 6.7.5 

subtraction operation, finite ordinal numbers, 13.6.4 

successor function, not a ZF function, 12.2.16 

successor function, Peano axioms, 14.1.5 

successor function, wrapped, 12.2.22 

successor map, not a ZF function, 12.2.16 

successor ordinal, 12.5.16 

successor set, 12.2.17 

sum, Cauchy, 43.3. 

sum, Darboux, lower, 43.5.6 

sum, Darboux, upper, 43.5.6 

sum, direct, linear spaces, 24.1 

sum, finite sequence, 16.7 

sum, Riemann, 43.4.5

sum function, list, 18.10.2 

sum of finite sequence, semigroup, 17.1.17 

sum of infinite series, topological linear space, 39.2.6, 39.2.9 

sum of sets, Minkowski, 22.10.2, 22.10.4 

sum rule, differential calculus, 40.5.9, 41.1.18 

summation convention, implicit, Einstein, 22.3.10 

summation convention, implicit Einstein, multiple ranges, 
7140.1 

Sung dynasty, 14.9.6 

sup map domain, 11.2.26 

sup notation, 11.2.16, 11.2.20, 11.2.22 

superior limit, 35.8 

superior limit, real function, one-sided, 35.8.9 

superior limit, real-valued function, 35.8.2 

supermarket, helicopter, correspondence principle, 54.9.9 

superscript indices, linear spaces, 22.8.21, 23.8.4 

superset, 7.3.2 

superset, proper, 7.3.6 

Suppes, Patrick Colonel, 77.1.8 

support, finite, free linear space, 22.2.9 

support of function, 14.7.16 

support of permutation, 14.8.7 

supremum, existence of lower approximations, 16.1.18 

supremum map, 11.2.21 

supremum of partially ordered set, 11.2.4, 11.3.2 

supremum set-map, 11.2.19 

surface, developable, 74.1.1 

surface, ruled, 74.1.1 

surjection, 10.5.2

surjection, explicit right inverse, 10.5.17 

surjective function, 10.5.2 

surjective homomorphism versus endomorphism, 10.5.23 

surjective partial function, 10.9.8 

survey, affine space definitions, 26.10.1 

survey, Christoffel array index order, 71.13.1 

survey, connection definitions, 67.2.2 




survey, convex hull terminology, 22.11.19 

survey, cross-section terminology, 21.3.1 

survey, curve and path meanings, 36.1.1 

survey, deduction metatheorem literature, 4.8.6 

survey, differentials and induced maps terminology, 58.4.1 

survey, general and antisymmetric tensor space notations, 
30.4.13 

survey, logic operator notations, 3.7.14 

survey, logical quantifier notations, 5.2.4 

survey, manifold definition core structures, 49.2.4 

survey, natural number definitions and notations, 14.1.1 

survey, open cover definitions, 33.5.3 

survey, predicate calculus, Hilbert-style, 6.1.10 

survey, predicate calculus formalisations, 6.1.5 

survey, propositional calculus formalisations, 4.2.1 

survey, Riemann curvature component array notations, 
71.11.9 

survey, tangent vector definitions, 53.3.2 

survey, tensor space definitions, 28.5.3 

survey, tensor space degree and type terminology, 29.5.11 

survey, topological closure notations, 31.8.10 

survey tables, literature, 78.3 

swan, white, 5.2.7

swap derivatives, curve family in a manifold, 59.6.12 

swap function, horizontal component, global, 59.6.5 

swap function, horizontal component, second-level tangent 
bundle, 59.6, 59.6.3, 59.6.6 

swap function, Lie bracket, 61.5.6 

swap-items function for lists, 14.12.6 


swap-like function, not needed for exterior derivative, 61.10.9 


swap-values operator for functions, 14.12.23 

swapping logical quantifiers, axiom of choice, 38.2.2, 45.2.1 

swapping quantifiers, axiom of choice, 10.11.11, 13.8.9 

sweep under carpet, choice function, 45.3.6 

switching matrix, non-Hausdorff locally Cartesian space, 
49.5.9 

Sylvester, James Joseph, 25.0.1, 77.1.6 

symbol, chicken-foot, 1.5.7, 8.8, 8.8.4, 9.5.28, 17.0.5

symbol, chicken-foot, tensor space diagram, 28.1.9, 29.3.5 

symbol, Christoffel, affine connection, 71.2.11

symbol, Christoffel, history, 74.3.3 

symbol, conditional assertion, 4.3.12 

symbol, Levi-Civita, 14.8.35, 14.8.36 

symbol, two-way assertion, 4.3.15 

symbol, unconditional assertion, 4.3.11 

symbol −, history, 17.1.19

symbol +, history, 17.1.19 

symbol =, Robert Recorde, 6.7.11 

symbol ©, 4.7.2 

symbol γ for curves, 36.2.1

symbol ∧, 4.7.2

symbol ∨, 4.7.2

symbol string, single substitution into, 3.9.5

symbol string, uniform substitution into, 3.10. 

symbol-string meanings, 3.3.1 

symbolic algebra, 22.2322 

symbols, logic, 3.6.3 

symmetric bilinear function, non-singular, on linear space, 
24.10.2 

symmetric bilinear function on linear space, index, 24.10.5 

symmetric higher-degree array compression, 25.15.12 

symmetric higher-degree array decompression, 25.15.15 

symmetric matrix, 25.13.2 

symmetric matrix, real, 25.13 

symmetric multilinear function, 30.1.7 

symmetric multilinear function bundle notation, 56.6.3 

symmetric multilinear function fibration, 56.6.5 

symmetric multilinear map, 30.1, 30.1.2 
symmetric relation, 9.6.15

symmetric set difference, 8.3 

symmetric tangent multilinear function bundle, 56.6 

symmetric tangent multilinear map bundle, 56.7

symmetric tensor space, 30.5

symmetrised closed interval notation, 16.1.15 

symmetry, equivalence relation, 9.8.2 

symmetry, multilinear map, 30

symmetry, tensor, 30

syncretism, tangent bundle, 53.1.15 

syncretism, tensor space, 29.2.2 

syntax-directed translation, 3.11.7 

syntax specification, non-recursive, infix logical expression, 
3.12.1 

synthetic road to differential geometry, 74.2.6 

system, axiomatic, 4.1.6 

system, integral, 18.6.14 

system of ODEs, classical solutions, 44.6.12 

system of ODEs, weak solutions, 44.6.14 

Szekeres, Peter, mathematical physics is mathematics, 77.4.6 


T-finite set, same as II-finite, 13.11.1 

T0 separation class, topology, 33.1.5

property, locally Cartesian space, 49.4.22 
separation class, topology, 31.3.8, 31.10.5, 33.1.8 
topology, minimal, 31.11.8 

topology on finite set, 33.1.17 

space, 33.1.24 

T3 topological space, 33.3.2 

T3½ topological space, 33.3.15


T4 topological space, 33.3.20 

T5 topological space, 33.3.

Ts topological space, 

table, algebraic structures summary, 17.0.3 

table, Cayley, 17.1.4, 25.1.3 

Table of Chords, Ptolemy of Alexandria, 44.2.22 

tables, survey, literature, 78.3 

tables of integrals, 43.2.3 

tablets, golden, 3.1.6, 6.4.6, 6.4.9, 62.0.1 

tabular systems, natural deduction, literature, 5.3.2 

tacit identification, submanifold tangent bundle, 66.5.5 

tactical retreat, integration by parts, 43.8.10 

tagged second-order tangent operator, 60.2.11 

tagged tangent differential operator, 54.15.3 

tagged tangent field covector bundle set, Cartesian space, 
26.18.13 

tagged tangent field covector set, Cartesian space, 26.18.11 

tagged tangent operator, 54.15 

tagged tangent operator, differential, 58.12.11 

tagged tangent operator space, 54.155 

tagged tangent operator space chart-basis operator, 54.15.8 

tagging theorems, axioms of choice, 7.1.11 

tags for declarations in axiomatic systems, 3.2.4 

tainted theorem, AC, 7.11.14, 10.3.9, 10.5.17, 10.11.10, 
11.6.23, 22.7.21, 22.7.27, 23.5.9, 23.5.10, 33.4.21,
45.3.4, 45.3.5, 45.4.3, 45.4.7
tangent, classical definition, 44.2.21, 44.2.22 
tangent, etymology, 53.1.2 
tangent, Euclid's definition, 53.3.12 
tangent algebra of diffeomorphism group, 63.2.5 
tangent bundle, coordinate-free, 53.3.5 
tangent bundle, cross-section, 57.1.3 
tangent bundle, cross-section, local, 57.1.10 
tangent bundle, differentiable manifold, 54, 54.5.30, 65.9 
tangent bundle, double, 59.1, 59.1.22
tangent bundle, higher-level, 59, 59.1.2 
tangent bundle, horizontal lift function, 71.1
tangent bundle, induced atlas from base space, 54.5.18
tangent bundle, open subset of manifold, 54.6.7 


tangent bundle, philosophy, differentiable manifold, 53 

tangent bundle, second-level, 59.1, 59.1.4, 59.1.22

tangent bundle, second-level, horizontal component swap
function, 59.6, 59.6.3, 59.6.6


tangent bundle, spray, 59.5, 59.5.2
tangent bundle, submanifold, tacit identification, 66.5.5 
tangent bundle, transposed horizontal lift function, 71.5 
tangent bundle, true, 53.1.7 
tangent bundle, true, Cartesian space, 53.1.8 
tangent bundle, true differentiable structure, 49.1.2, 53.1.1 
tangent bundle, unidirectional, 54.16, 54.16.4 
tangent bundle atlas, unidirectional, 54.16.4 
tangent bundle chart, unidirectional, 54.16.4 
tangent bundle construction methods, 53.4 
tangent bundle direct product for Cartesian spaces, 26.15 
tangent bundle direct product for manifolds, 54.7
tangent bundle direct product identification map for 
Cartesian spaces, 26.15.4 
tangent bundle direct product identification map for 
manifolds, 54.7.6 
tangent bundle embedding map, fibre-set, implicit, 66.5.5 
tangent bundle for Banach space, 53.3.13 
tangent bundle is a differentiable manifold, 54.5.28 
tangent bundle lift function, unidirectional, 54.16.4 


tangent bundle notation, higher-level, 59.1.26 

tangent bundle of linear space, vertical drop function, 54.9,
54.9.5

tangent bundle of submanifold, 54.6

tangent bundle of tangent bundle, 59.1 

tangent bundle of tangent vector-tuple bundle, 59.7 
tangent bundle on affine space, fibre bundle, 26.6.4 
tangent bundle on affine space over group, 26.3.7 
tangent bundle on affine space over group, unidirectional,
26.3.16


tangent bundle on affine space over module, 26.5.9 
tangent bundle on affine space over module, unidirectional, 
26.7.9 


tangent bundle product identification map, 54.7 

tangent bundle projection map, unidirectional, 54.16.4 

tangent bundle total space, unidirectional, 54.16.4 

tangent bundle total space manifold, 54.5.26 

tangent bundle total space manifold atlas, 54.5.22 

tangent bundle vector field, naive derivative, Leibniz rule, 
61.3.3 

tangent bundle vertical drop function, pointwise, linearity, 
59.2.10 


tangent bundles, direct product, 54.7.5 

tangent bundles, recursive, 59 

tangent bundles are fibre bundles, 54.5.33 

tangent component tuple, second-order, 60.5.3 

tangent covector, 55.2 

tangent covector, Cartesian space, 26.17.5, 26.17.6, 26.17.7 
tangent covector bundle, 55.4, 55.4.11 

tangent covector bundle, Cartesian space, 26.17 
tangent covector bundle manifold atlas, 55.4.8 

tangent covector bundles are fibre bundles, 55.4.15 
tangent covector co-velocity, Cartesian space, 26.18.9 
tangent covector component tuple, 55.2.5 

tangent covector representations, 55.1 

tangent covector representations, plethora, 55.1.5 
tangent covector space, Cartesian space, 26.17.3 
tangent covector space, differentiable manifold, 55.2.1 
tangent covector space, induced-map, pointwise, 58.3.7 
tangent covector space, induced-map, total, 58.3.9 
tangent covector space, linear space structure, 55.2.3 
tangent covector space basis, 55.3

tangent covector total space, 55.4.2 

tangent covector with given coordinates, 55.2.7 
tangent differential operator, 54.11, 54.11.2 
tangent differential operator, local functions, 54.12.6 
tangent differential operator, tagged, 54.15.3 
tangent differential operator, total space, 54.12 
tangent differential operator space, 54.11.10
tangent fibration, manifold, projection map, 54.5.4 
tangent fibration, second-order, 60.5.12 

tangent fibration of manifold, 54.5.4 

tangent fibration velocity chart, 54.5.6, 54.5.9 
tangent fibration velocity chart map, 


54. 

tangent field covector, Cartesian space, 26.18.3, 26.18.4, 
26.18.5 

tangent field covector bundle, Cartesian space, 26.18 

tangent field covector bundle set, tagged, Cartesian space, 
26.18.13 

tangent field covector set, Cartesian space, 26.18.6 

tangent field covector set, tagged, Cartesian space, 26.18.11 

tangent field covector space, Cartesian space, 26.18.8 

tangent function, 44.2.14 

tangent generalisations, 26.19.4 

tangent-line bundle, affine space over group, 26.3 

tangent-line bundle, Cartesian space, 26.14

tangent-line bundle, Cartesian space, philosophy, 26.12 

tangent-line bundle on affine space over module, 26.5 

tangent-line bundle total space, Cartesian space, 26.14.2 

tangent-line bundle total space, Cartesian space, fibre chart, 
26.14. 

tangent-line bundle total space, Cartesian space, projection 
map, 26.14.7 

tangent-line bundle total space, parametrisation, 26.14.5 

tangent-line set, second-level, 59.1.8 

tangent-line space, Cartesian space, 26.13.11 

tangent-line tangent space, Cartesian space, 26.13 

tangent-line vector, Cartesian space, 26.13.1, 26.13.2, 26.13.3 

tangent-line vector, differentiable manifold, 54.1 

tangent-line vector, manifold, trivial, 54.1.17

tangent-line vector bundle, 54.5.16 

tangent-line vector bundle, differentiable manifold, 54.5 

tangent-line vector set, Cartesian space, 26.13.5

tangent-line vector space, differentiable manifold, 54.4, 54.4.4 

tangent-line vector velocity, Cartesian space, 26.13.14

tangent map, induced, differential of map, 58.4.1 

tangent multilinear function bundle, 56.4 

tangent multilinear function bundle, antisymmetric, 56.5 

tangent multilinear function bundle, symmetric, 56.0

tangent multilinear map bundle, antisymmetric, 56.7 

tangent multilinear map bundle, symmetric, 567 

tangent operator, action on real-tuple-valued functions, 
54.14.2 

tangent operator, action on vector-valued function, 54.14 

tangent operator, action on vector-valued functions, 54.14.4 

tangent operator, differential, 58.12.2, 58.12.7, 58.12.8 

tangent operator, higher-order, 60 

tangent operator, second-order, 60.2, 60.2.2 

tangent operator, second-order, tagged, 60.2.11 

tangent operator, second-order, tensorisation coefficients, 
60.4, 60.4.9 

tangent operator, tagged, 54.15 

tangent operator, tagged, differential, 58.12.11 

tangent operator, zero, ambiguity, 53.3.3, 54.12.4, 54.12.10, 
54.15.1

tangent operator field, 57.3 

tangent operator field composition, 60.3 

tangent operator of curve, 57.9.13

tangent operator space, tagged, 54.15.5 
tangent operator space chart-basis operator, tagged, 54.15.8

tangent operator space chart-basis vector, 54.13.4

tangent operator total space, 54.12.1

tangent space, affine space over a group, 26.3.2

tangent space, Archimedean, 26.19.1

tangent space, bidirectional, 26.8.1

tangent space, coordinate-bound, 53.1.11

tangent space, double, oblique drop function, 59.3.2, 59.3.5

tangent space, double, vertical drop function, 59.2.9, 59.2.15

tangent space, induced-map, 58.3

tangent space, induced-map, pointwise, 58.3.2

tangent space, induced-map, total, 58.3.4

tangent space, second-level, 59.1.19

tangent space, second-level, vertical subspace, 64.5.7

tangent space, second-order, 60.5.9

tangent space basis bundle, 55.7.2

tangent space chart-basis covector, 55.3.2

tangent space chart-basis vector, 54.4.9

tangent space chart-basis vector-tuple, 55.5.18

tangent space direct product, 54.7.1

tangent space direct product for Cartesian spaces, 26.15.1 

tangent space direct product identification map for Cartesian 
spaces, 26.15.2 

tangent space direct product identification map for manifolds, 
54.7.2 

tangent space on affine space over group, 26.3.5 

tangent space on affine space over group, unidirectional, 
26.3.14 

tangent space on affine space over module, 26.5.7 

tangent space on affine space over module over ordered ring, 
unidirectional, 26.7.7 

tangent spaces, co-existence, 26.8.1, 53.3.2 

tangent tensor bundle, differentiable manifold, 56.3 

tangent tensor space, differentiable manifold, 56.1

tangent vector, Cartesian, 26.19.1

tangent vector, differentiable manifold, 54.1 

tangent vector, double, component unpacking, 59.3.6 

tangent vector, double, vertical component coordinates 
extraction, 59.3.7 

tangent vector, extra-mathematical, 53.1.14 

tangent vector, extrinsic, 53.1.6 

tangent vector, fibre-set, identification with vertical vectors, 
64.12.7 


tangent vector, finite-dimensional linear space, 54.1.14 
tangent vector, higher-level, manifold, 59.1.24 

tangent vector, higher-order, 60.5 

tangent vector, manifold, 54.1.2 

tangent vector, ontology, 53.1.15 

tangent vector, second-level, 59.1.6 

tangent vector, second-level, notation, 59.1.12 

tangent vector, second-level, scaling, 59.4.1 

tangent vector, second-order, 60.5.5 

tangent vector, survey of definitions, 53.3.2 

tangent vector, true nature, 53.1 

tangent vector, ubiquitous zero, 58.12.6, 58.12.10 
tangent vector, unidirectional, 54.16.4 

tangent vector, vertical, on total space of fibration, 64.5.4 
tangent vector, zero, 54.1.17 

tangent vector 0-tuple degeneracy, 55.5.15 

tangent vector bundle, differentiable manifold, 54.5 
tangent vector bundle, fibre bundle, 54.0.1
tangent vector bundle total space manifold chart, 54.5.19 
tangent vector components, chart transition rule, 54.1.11 
tangent vector concatenation, 54.7.1 

tangent vector concatenation, Cartesian space, 26.15.6 
tangent vector concatenation, differentiable manifolds, 54.7.8 
tangent vector definition, ontological clarity, 54.1.6 
tangent vector embedding map, differential of inclusion map,
58.5.6 

tangent vector embedding map, differential of manifold 
embedding, 58.5.8 

tangent vector embedding map, differential of submanifold 
restriction, 58.5.10 

tangent vector embedding map, manifold product slice-set, 
54.8.3 

tangent vector embedding map, open subset, 54.6.8 

tangent vector embedding map, product-structured manifold, 
54.8, 58.7.5 

tangent vector embedding map, submanifold, 54.6, 54.6.2 

tangent vector field, 57.1.2, 57.1.9

tangent vector field, second-order, 59.8.2 

tangent vector field of curve, 57.9.2

tangent vector field of curve, drop function, linear space, 
57.9.5 

tangent vector frame bundle, 55.6, 55.6.22, 55.6.31 

tangent vector frame bundle manifold chart, 55.6.24 

tangent vector frame bundle total space, 55.6.30 

tangent vector frame bundle total space manifold atlas, 
55.6.28 

tangent vector frame coordinate tuple, 55.6.14 

tangent vector frame fibration, 55.6.8 

tangent vector-frame fibration principal velocity chart, 55.7.5 

tangent vector-frame fibration principal velocity chart map, 
55.7.5 

tangent vector-frame fibration velocity chart, 55.6.17 

tangent vector-frame fibration velocity chart map, 55.6.17 

tangent vector frame fibre atlas, 55.6.21 

tangent vector frame field, 57.11.2 

tangent vector frame space, 55.6.3 

tangent vector frame total space, 55.6.5 

tangent vector frames, chart transition formula, 55.6.19 

tangent vector map, constant-scale, differential, 59.4.10 

tangent vector on fibre-set, embedding map, 64.12 

tangent vector parametrisation, extended, 54.3.3 

tangent vector principal frame bundle, 55.7.8 

tangent vector principal frame fibre atlas, 55.7.7 

tangent vector representation, 53.3, 53.3.1 

tangent vector scaling curve, differential, 59.4.7 

tangent vector scaling curve velocity field, verticality, 59.4.3 

tangent vector space, differentiable manifold, 54.4 

tangent vector specification, locality, 54.1.13 

tangent vector-tuple bundle, 55.5, 55.5.30, 55.5.3 

tangent vector-tuple bundle, tangent bundle of, 5 

tangent vector-tuple bundle fibre atlas, 55.5.28 

tangent vector-tuple bundle manifold chart, 55.5.31 

tangent vector-tuple bundle total space, 55.5.36 

tangent vector-tuple bundle total space manifold atlas, 
55.5.34 

tangent vector-tuple coordinate tuple, 55.5.13 

tangent vector tuple fibration, 55.5.8 

tangent vector-tuple fibration velocity chart, 55.5.24 

tangent vector-tuple fibration velocity chart map, 55.5.24 

tangent vector tuple space, 55.5.4 

tangent vector-tuple total space, 55.5.6 

tangent vector tuples, chart transition formula, 55.5.27 

tangent-vector-valued function on manifold, derivative, 
54.14.7 

tangent vector/covector bundle association, topological, 
55.4.17 

tangent vectors and cardinality, 53.1.17 

tangent vectors are functions, 53.1.13, 54.2.1 

tangent vectors map charts to lines, 53.1.13, 54.2.1 

tangent velocity bundle, 26.16.9 

tangent velocity bundle, affine space, 26.8 

tangent velocity bundle, Cartesian space, 26.16 
tangent velocity bundle on affine space over module, 26.8.4

tangent velocity set, 26.16.5 

tangent velocity space, 26.16. 

tangent velocity space, affine space, 26.8 

tangent velocity space on affine space over module, 26.8.2 

tangent velocity vector, 26.16.2, 26.16.3, 26.16.4, 54.10.4 

tangent velocity vector, differentiable manifold, 54.10

tangent velocity vector bundle total space, 54.10.10

tangent velocity vector space, differentiable manifold, 54.10.7 

tangent velocity vector specification tuple, 53.1.13 

target manifold dependence, differential of a map, 58.5.5, 
58.5.7, 58.7.4, 58.7.5, 58.7.16 

target set of function, 10.2.7 

target set of relation, 9.5.20, 9.5.21 

Tarski, Alfred, 77.1.7 

tautological tensor, differential of identity map, 58.5.3 

tautology, 3.13, 3.13.2

tautology calculus, predicate, 6.0.1 

tautology calculus, propositional, 4.3.2 

Taylor, Brook, 77.1.5

Taylor series, 16.8.10, 44.2.2 

Taylor series, combination symbol, 14.9.1 

techniques, PDE, curved space, 1.4.4 

template, axiom, 4.4.5

template, function, Kronecker delta, 14.7.12 

template, logical expression, 3.9.9 

template, proposition, 7.2.3

template function, 10.2.30 

tensor, 28.1.2 

tensor, alternating, 30.4 

tensor, curvature, Ricci, affine connection, 71.11.11 

tensor, curvature, Ricci, Riemannian manifold, component 
version, 74.4.6 

tensor, curvature, Riemann, 70.4.1 

tensor, curvature, Riemannian manifold, 74.4 

tensor, differentiable manifold, imported from Cartesian 
space, 56.2 

tensor, Levi-Civita, 14.8.34 

tensor, linear style, Cartesian space, 30.6.10 

tensor, linear-style, specified by coordinates, 56.1.12 

tensor, metric, calculation from distance function, 76.5 

tensor, metric, inverse, 73.5.1

tensor, multilinear style, Cartesian space, 30.6.9 

tensor, multilinear-style, specified by coordinates, 56.1.11 

tensor, Riemann curvature, component version, 74.4.4 

tensor, simple, 28.4, 28.4.2 

tensor, stress-energy, 30.5.1 

tensor, tautological, differential of identity map, 58.5.3 

tensor, two-point, 58.3.11 

tensor algebra, coordinate-independent, 27.1.4 

tensor algebra, meaning, 27.1.2

tensor bundle, linear-style, 56.3.24 

tensor bundle, multilinear-style, 56.3.23 

tensor bundle, non-topological linear-style, 56.3.17 

tensor bundle, non-topological multilinear-style, 56.3.16 

tensor bundle, tangent, differentiable manifold, 56.3 

tensor bundle cross-section, covariant, short-cut, 46.2.1, 
57.7.1, 58.11.6 

tensor bundle manifold chart, linear-style, 56.3.20 

tensor bundle manifold chart, multilinear-style, 56.3.19 

tensor bundle on a manifold, 56 

tensor bundle on Cartesian space, 30.6 

tensor bundles, associated, Lie derivative, 61.9.1 

tensor calculus, contraction of indices, 29.7.1 


tensor calculus, horizontal lift function, 71.2 
tensor calculus, Levi-Civita connection, 74.3 
tensor calculus, metric tensor field, 73.3 


tensor contraction, 29.7 
tensor contraction, basis-independent, 29.7.4

tensor contraction, multilinear-function-style, 29.7.2 

tensor fibration, non-topological, linear-style, 56.3.8 

tensor fibration, non-topological, multilinear-style, 56.3.7 

tensor fibration total space, linear-style, 56.3.3 

tensor fibration total space, multilinear-style, 56.3.2 

tensor field, 57.5, 57.5.2

tensor field, action by a vector, 61.4.6 

tensor field, action by a vector field, 61.4.7 

tensor field, Lie derivative, 61.9 

tensor field, metric, Riemannian, 73.2 

tensor field, metric, tensor calculus, 73.3 

tensor field, pseudo-metric, 75.1.4, 75.1.6, 75.1.11 

tensor field, Riemann curvature, affine connection, 71.11.5 

tensor field, Riemannian metric, 73.2.4 

tensor field, Riemannian metric, differentiability, 73.2.10 

tensor field along curve, 57.8, 57.8.8 

tensor field on a manifold, 57 

tensor index, lowering and raising, 13.5 

tensor juxtaposition product, 29.7, 57.11.7 

tensor juxtaposition product, multilinear-function-style, 
29.7.6 

tensor map, canonical, 28.2, 28.2.2, 28.6.2 

tensor map, canonical, dual-space, 28.2.8 

tensor of degree 1, 28.1.17 

tensor on differentiable manifold, linear-map style, 56.1.4 

tensor on differentiable manifold, multilinear-map style, 
56.1.3 

tensor product, alternating, 30.4.8 

tensor product, universal factorisation property, 28.3, 28.3.7 

tensor product defined via free linear space, 28.7 

tensor product in terms of free linear space, 28.7.2 

tensor product metadefinition, 28.6.2 

tensor product of linear spaces, 28.1 

tensor product space, 28.1.2 

tensor product space, mixed, 29.3.3, 29.5.4 

tensor product space, mixed, on mixed primal spaces, 29.3 

tensor product space, mixed, single primal space, 29.5 

tensor product space metadefinition, 28.6

tensor product standard immersion using free linear space, 
28.7.2 

tensor pseudo-product, 57.11.7 

tensor space, 28, 28.1.2 

tensor space, canonical injection, scalar, 28.1.16 

tensor space, canonical injection, vector, 28.1.18 

tensor space, canonical tensor map, 28.6.2 

tensor space, degree, 28.1.11 

tensor space, mixed, dual vector canonical injection, 29.5.8, 
29.5.10 

tensor space, mixed, scalar canonical injection, 29.5.8, 29.5.10 

tensor space, mixed, traditional, 29.5.13 

tensor space, mixed, vector canonical injection, 29.5.8, 29.5.10 

tensor space, natural isomorphism, 29.2.5 

tensor space, symmetric, 30.5

tensor space, tangent, differentiable manifold, 56.1 

tensor space definition styles, 28.5

tensor space definitions, survey, 28.5.3 

tensor space degree and type terminology, survey, 29.5.11 

tensor space diagram, chicken-foot symbol, 28.1.9, 29.3.5 

tensor space dimension, 28.1.22 

tensor space dual, 29.1

tensor space isomorphism, unmixed, 29.2 

tensor space notations, general and antisymmetric, survey, 
30.4.13 

tensor space on Cartesian space, linear style, 30.6.5 

tensor space on Cartesian space, multilinear style, 30.6.4 

tensor space representations, plethora, 56.1.1 

tensor spaces, antisymmetric, 30.4 
tensor symmetry, 30

tensorisation, Christoffel array, 60.4.1 

tensorisation, Levi-Civita connection, 60.4.12 

tensorisation coefficients, 74.3.8 

tensorisation coefficients, affine connection, 60.4.12 

tensorisation coefficients, torsion, 60.4.11 

tensorisation coefficients for second-order tangent operator, 
60.4, 60.4.9 

tensors, coordinates idiom, physics, 27.5.11 

tensors, mixed, components, mixed primal spaces, 29.4 

tensors, mixed, components, single primal space, 29.6

terminal point of curve, 36.2.13

terminal point of path, 36.8.10 

terminology, contravariant, 20.10.11, 27.1.2, 28.5.7, 28.5.8, 
11.6.2 

terminology, covariant, 20.10.11, 27.1.2, 28.5.7, 28.5.8 

terminology, covariant derivative, 71.6.2 

terminology, differentiable manifolds, 51.2 

terminology, embedding, 50.2.3

terminology, immersion, 50.2.3 

terminology, induced map versus differential, 58.4.1 

terminology, locally Cartesian spaces, 49.2.8 

terminology, manifolds, 49.2.8

terminology, submanifold, 50.2.3 

terminology, submersion, 50.2.3 

terrestrial coordinates, 76.6.1 

terrestrial coordinates, critical-point Hessian operator, 76.4 

terrestrial coordinates, differential, 76.4

terrestrial coordinates, two-sphere, 76.2 

test for continuity, keyhole, 31.12.16 

test for continuous differentiability, keyhole, 41.1.26 

test for continuous differentiability of map, keyhole, 41.2.20 

test for differentiability, keyhole, 40.3.5

test for higher-order differentiability of map, keyhole, 42.5.28 

test for partial differentiability, keyhole, 41.1.5 

test for partial differentiability of map, keyhole, 41.2.3 

test functions, differentiability definition, pull-back, 52.1.18 

test particle, 36.0.1 

test-retest correlation, 40.1.1, 53.1.7 

tetrahedron, convex span, 22.11.16

textbook, not a, 1.4.3, 1.6.1 

Thales of Miletus, 77.1.2

the, 6.8.1 

theorem, Ascoli's, 38.5, 38.5.5 

theorem, Banach-Tarski, 7.12.8 

theorem, Bernshtein’s, 13.1.6 

theorem, Bolzano-Weierstraß, history, 35.7.4

theorem, boolean prime ideal, 13.8.8 

theorem, calculus, fundamental 1, 43.8.3 

theorem, calculus, fundamental 1 vector-valued, 43.9.7 

theorem, calculus, fundamental 2, 43.8.5 

theorem, calculus, fundamental 2 vector-valued, 43.9.8 

theorem, Cantor’s, 13.1.27 

theorem, Cantor’s, compared to Hartogs’s theorem, 13.3.1 

theorem, comparability of set cardinality, 13.1.20 

theorem, counting, well-ordered sets, 13.2.4 

theorem, dark, 6.6.14

theorem, deduction, literature survey, 4.8.6 

theorem, enumeration, well-ordered sets, 13.2.4 

theorem, existence, integral curve, 57.10.5 

theorem, Fermat's last, 7.12.8

theorem, Hahn-Banach, 7.11.13 

theorem, Hartogs’s, 13.3, 13.3.2 

theorem, Hartogs’s, compared to Cantor’s theorem, 13.3.1 

theorem, Heine-Borel, Cartesian spaces, 34.9.14 

theorem, Heine-Borel, real-number interval, 34.9.12 

theorem, Heine-Borel, real-number sets, 34.9.13 

theorem, Hopf-Rinow geodesic completeness, 73.7.2 
theorem, implicit function, 41.10

theorem, intermediate value, 34.9.7 

theorem, inverse function, 41.10, 41.10.4 

theorem, Krein-Milman, 7.11.13 

theorem, Lebesgue differentiation, 45.7 

theorem, linear space isomorphism, 24.2.16 

theorem, logical, 3.3.6

theorem, maximality, Hausdorff's, 7.11.13 

theorem, mean value, 40.6 

theorem, naive, 4.8.8

theorem, nested interval, 37.9.2, 37.9.3, 37.9.12 

theorem, ODE uniqueness, 44.5.2 

theorem, ODE uniqueness, systems, 44.5.3 

theorem, one-parameter subgroup existence, Lie group, 62.9.9 

theorem, one-parameter subgroup uniqueness, Lie group, 
62.9.7 

theorem, partial derivative commutativity, 42.3.6, 42.5.23 

theorem, PC (propositional calculus), 4.5.5 

theorem, Peano ODE existence, 44.8.00

theorem, Peano ODE existence, systems, 44.4.9 

theorem, Picard ODE existence, 44.6.20 

theorem, Picard ODE existence, systems, 44.7.1 

theorem, QC (predicate calculus), 6.1.1 

theorem, rank-nullity, 24.2.17

theorem, representation, well-ordered sets, 13.2.4 

theorem, Rolle's, 40.6.4 

theorem, Schröder-Bernstein, 13.1.8

theorem, shim, choice axiom, 13.8.14 

theorem, Stokes, exterior derivative, 46.9.7 

theorem, Stokes, literature, 46.9.1 

theorem, Stokes, pathwise parallelism, 48.2.2 

theorem, Stokes, rectangular in two dimensions, 46.9.3 

theorem, Stone-Weierstraß, generalised, 7.11.13

theorem, Tikhonov, axiom of choice, 33.5.16 

theorem, Tikhonov's, 7.11.13 

theorem, total differential, 41.6.15 

theorem, trichotomy of set cardinality, 13.1.20 

theorem, uniqueness, integral curve, 57.10.6 

theorem, well-ordering, 7.12.7, 22.7.21 

theorem corpus, 7.9.2 

theorem hoard, 4.5.11, 4.8.14 

theorem of calculus, fundamental, 43.8 

theorem tagging, axioms of choice, 7.1.11 

theorem using choice axiom, 10.3.9, 10.5.17, 10.11.10, 11.6.23, 
22.7.21, 22.7.27, 23.5.9, 23.5.10, 33.4.21

theorem using countable choice axiom, 13.7.22, 13.8.12, 
13.8.13, 13.9.9, 13.10.10, 13.10.11, 33.4.25, 33.7.6, 


theorems, trivial, purpose, 26.14.4 

theoretical physics, out of scope, 1.6.3 

theory, containment, 7.2.10

theory, gauge, 70.8 

theory, membership, 7.2.10 

theory, model, 6.1.6, 6.6.14 

theory, set, 7

theory, Yang-Mills, history of gauge theory, 70.8.2 
theory of distributions, 31.9.8 

thin ice, skating, 11.2.18 

thinking, the art of, 3.0.3 

thinning (Verdünnung), sequent, 5.3.8 

Thomson, William (Lord Kelvin), 46.9.1 
three-storey model for differential geometry, 49.2.10 
thrust, research, measure and integration, 43.1.3
Tietze’s second axiom, topology separation class, 33.3.25 
Tikhonov, Andrei Nikolaevich, 77.1.7 

Tikhonov’s theorem, 7.11.13 

Tikhonov’s theorem, axiom of choice, 33.5.16 
timber store, wood block, 9.4.4
time, native, 53.1.7
time-space, Minkowskian metric tensor field, 75.2.6
Toba, Lake, supereruption, 77.4.3
tollendo tollens, modus, 4.8.15
tollere, 4.8.15
tooth fairy, sixpence, axiom of choice, 13.8.15
topic flow diagram, 1.2
topological, coarse, 31.3.1
topological, trivial, 31.3.18
topological automorphism, 31.14.2
topological automorphisms, continuous family, 36.10.14
topological boundary of set, 31.9.5
topological classification, fibre bundles, out of scope, 47.7.1
topological closure, 31.8.7
topological closure notations, survey, 31.8.10
topological connectedness, 34
topological connectedness, literature, 34.1.1
topological curve, 36
topological decomposition space, 32.13.7
topological dimension, 33.8
topological dual of linear space, 23.6.7
topological exterior of set, 31.9.2
topological fibration, 47.3
topological fibration, continuous cross-section, 47.4.6
topological fibration, cross-section, 47.4, 47.4.2
topological fibration, fibre atlas, 47.5.2
topological fibration, fibre chart, 47.3.10
topological fibration, fibre chart continuity, 47.3.4
topological fibration, local cross-section, 47.4.2
topological fibration, topological fibre set, 47.3.8
topological fibration with fibre space F, 47.3.6
topological fibration with intrinsic fibre space, 47.2, 47.2.2
topological fibre atlas, 47.5
topological fibre atlas, equivalent,
topological fibre bundle, 47, 47.5.5, 47.6, 47.6.5
topological fibre bundle, associated, 47.9, 47.9.7
topological fibre bundle, associated, orbit-space method,
topological fibre bundle, associated, patchwork, 47.10.4
topological fibre bundle, continuous-associated, 47.9.11
topological fibre bundle, equivalent, 47.7.6
topological fibre bundle, orbit-space associated, 47.11
topological fibre bundle, ordinary, 47.6.5
topological fibre bundle, parallelism, 48
topological fibre bundle, particle field, 47.12.2
topological fibre bundle, patchwork associated, 47.10
topological fibre bundle, pathwise parallelism, 48.3
topological fibre bundle association, 47.9.5
topological fibre bundle classification, out of scope, 47.7.1
topological fibre bundle homomorphism, 47.7.3, 47.7.8
topological fibre bundle isomorphism, 47.7.5, 47.7.11
topological fibre bundles, equivalent, 47.7.12
topological fibre chart, 47.3
topological fibre set of topological fibration, 47.3.8
topological fibre/frame bundle, contravariant, 47.13.2
topological fibre/frame bundle, local trivialisation, 47.13.4
topological fibre/frame bundles, combined, 47.13
topological G-bundle, 47.8.3
topological glue, 31.2.4
topological group, 36, 36.9, 36.9.2
topological group of differentiable diffeomorphisms, 63.1.6
topological group of linear transformations, 39.6
topological identification space, 32.13, 32.13.7
topological interior of set, 31.8.2
topological layer, 1.1
topological left transformation group, self-acting, 36.10.12
topological left transformation group homomorphism, 36.10.16
topological left transformation group of topological space, 36.10.4
topological left transformation group of topological space, effective, 36.10.8
topological linear space, 39, 39.1, 39.1.3
topological linear space, Cartesian, 26.11.1
topological linear space, convergent infinite series, 39.2.7
topological linear space, divergent infinite series, 39.2.7
topological linear space, infinite series, 39.2, 39.2.2
topological linear space, partial-sum sequence, 39.2.2
topological linear space, sum of infinite series, 39.2.6, 39.2.9
topological manifold, 50, 50.1, 50.1.1
topological manifold, boundary point, 50.1.8
topological manifold, complete atlas, 49.7.18
topological manifold, continuous embedding, 50.3.3
topological manifold, direct product, 50.4.2
topological manifold, Hausdorff, 50.1.3
topological manifold, Hausdorff condition, 50.1.2
topological manifold, maximal atlas, 49.7.18
topological manifold, non-Hausdorff, 50.1.
topological manifold, product-structured, 50.5.10
topological manifold, product-structured, induced atlas,
topological manifold, product-structured, regular submanifold, 50.5.4
topological manifold, product-structured submanifold, 50.5
topological manifold, regular embedding,
topological manifold, regular immersion, 50.3.8
topological manifold, regular submersion, 50.3.13
topological manifold, regular topological submanifold, 50.2.8
topological manifold, submanifold, 50.2
topological manifold, topological submanifold, 50.2.6
topological manifold, transition map, 49.6.4
topological manifold atlas, 50.1.11
topological manifold atlas, indexed, 50.1.12
topological manifold atlas, transition map, 49.7.7
topological manifold chart, 49.6.2
topological manifold chart, torus, 50.1.7
topological manifold coordinate map, 49.6.2
topological manifold dimension, 50.1.4
topological manifold embedding, 50.3
topological manifold immersion, 50.3.
topological manifold product, 5 I
topological manifold submersion, 50.3
topological manifold versus C? manifold, 51.2.5
topological ordinary fibre bundle, 47.6.5
topological patchwork space 3248
topological path, 36
topological pathwise parallelism, 48.3.2
topological pathwise parallelism, associated, 48.4.2
topological principal bundle, right action chart-independence, 47.8.10
topological principal bundle, right action map, 47.8.7
topological principal bundle, right transformation group, 47.8.14
topological principal fibre bundle, 47.8, 47.8.3
topological principal G-bundle, 47.8.3
topological quotient space, 32.13, 32.13.7
topological right transformation group, 36.11.3
topological right transformation group, effective, 36.11.6
topological right transformation group, free, 36.11.6
topological right transformation group, self-acting, 36.11.9
topological right transformation group, transitive, 36.11.6
topological space, 31.3, 31.3.3
topological space, boundary operator, 31.9.7
topological space, Cartesian, 26.11.1, 32.6.3, 49.4.3
topological space, closure operator, 31.8.9
topological space, compact, 33.5.10
topological space, connected, 34.1, 34.1.3
topological space, connected component, 34.5, 34.5.3
topological space, connected component of a subset, 34.5.6
topological space, countability class, 33.4
topological space, curve, 36.2
topological space, dense subset, 33.4.2
topological space, disconnected, 34.1.3
topological space, disconnection of subset, 34.3.4
topological space, discrete, 31.3.19
topological space, exterior operator, 31.9.4
topological space, first countable, 33.4.12
topological space, homeomorphism between subsets, 31.14.12
topological space, interior operator, 31.8.4
topological space, Lindelöf, 33.7.8
topological space, local connectedness, 34.7
topological space, locally Cartesian, 49.4, 49.4.7
topological space, locally Cartesian, dimension, 49.4.10
topological space, locally Cartesian, variable-dimension, 49.4.5
topological space, locally connected, 34.7.3
topological space, locally connected point, 34.7.2
topological space, locally Euclidean, 49.4
topological space, locally pathwise connected, 36.7.14
topological space, metrisable, 37.5.15
topological space, neighbourhood of point, 31.3.12
topological space, normal, 31.2.8
topological space, nowhere dense subset, 33.4.4
topological space, pathwise connected, 36.7.7
topological space, product-structured, 32.11.1, 32.11.6
topological space, product-structured, subspace homeomorphism, 32.11.4
topological space, second countable, 33.4.13
topological space, separability, 33.4
topological space, separable, 33.4.6
topological space, separation classes, stronger, 33.3
topological space, separation classes, weaker, 33.1
topological space, sequentially compact, 35.6.2
topological space, subset, locally connected, 34.7.5
topological space classes, 33
topological space compactness classes, cover-based, 33.7
topological space constructions, 32
topological space direct product, 32.9.4
topological space disconnection, 34.3.2
topological space product, projected slice, 32.10
topological spaces, strongly separated, connectedness, 34.2
topological submanifold, horizontal, product-structured manifold, 50.5.10
topological submanifold, product-structured, induced atlas, 50.5.8
topological submanifold, regular, topological manifold, 50.2.8
topological submanifold, topological manifold, 50.2.6
topological submanifold, vertical, product-structured manifold, 50.5.10
topological subspace, horizontal, product-structured space, 32.11.6
topological subspace, vertical, product-structured space, 32.11.6
topological tangent vector/covector bundle association, 55.4.17
topological transformation group, left, 36.10
topological transformation group, right, 36.11
topological vector space, terminology, 39.1
topological vector spaces, out of scope, 1.6.3
topologically separated pair of sets, 33.2
topology, 31, 31.3.2
topology, algebraic, 36.0.1
topology, algebraic, out of scope, 1.6.3
topology, Cartesian space, 32.6
topology, coarse, 31.3.18
topology, combinatorial, 31.1.2
topology, compact-open, 33.5.20
topology, complex numbers, usual, 32.6.11, 32.6.12
topology, differential, out of scope, 1.6.3
topology, direct product, 32.9.4
topology, direct product, family of spaces, 32.12.2
topology, direct product, two spaces, 32.9
topology, discrete, 31.3.19, 37.5.7
topology, discrete, induced by discrete metric, 37.2.5
topology, duality, 31.5.9
topology, empty, 31.5.2
topology, etymology, 31.1.1, 35.3.11
topology, extended integers, usual, 32.5.4
topology, finite-complement, 31.11.7
topology, finite set, 31.5
topology, fundamental importance of continuity, 31.12.1
topology, Hausdorff, 33.1.24
topology, hereditary properties, 33.5.11
topology, induced, locally Cartesian atlas, 49.8
topology, infinite set, simple, 31.11
topology, information content, 31.11.11
topology, integers, usual, 32.5.3
topology, intuitive, 31.3.8
topology, inverse set-map, 32.8.1, 32.8.3
topology, isolated point, 31.10.8
topology, largest, 31.3.17, 31.5.1
topology, locally compact, 33.6.3
topology, minimal closed-singleton, 31.11.8
topology, minimal T1, 31.11.8
topology, most important, real numbers, 32.5.1
topology, non-Hausdorff, 33.1.27
topology, non-negative extended integers, usual, 32.5.4
topology, non-negative integers, usual, 32.5.3
topology, non-trivial, 31.11.11
topology, paracompact, 33.7.15
topology, paracompact, countably, 33.7.16
topology, particular-point, 31.11.13
topology, patchwork, 32.15.3
topology, permutation-invariant, 31.11.5
topology, point-set, 31.1.2
topology, product, family of spaces, 32.12
topology, pull-back, 32.8.1
topology, quotient, 32.13.9
topology, rational numbers, usual, 32.5.6
topology, real numbers, the most important topology, 32.5.1
topology, real numbers, usual, 32.5.7
topology, relative, 31.6, 31.6.2
topology, separable, choice function, 33.4.8
topology, set union, 32.14.6
topology, set-union, disjoint, 32.14.11
topology, set-union, overlapping, 32.14.8
topology, smallest, 31.3.17, 31.5.1
topology, standard, finite-dimensional linear space, 32.6.6
topology, standard, general linear group, 32.6.8
topology, strength, 31.3.22
topology, stronger, 31.3.23
topology, strongly locally compact, 33.6.10
topology, subject matter, 31.1.2
topology, T0 separation class, 33.1.5
topology, T1 separation class, 31.3.8, 31.10.5, 33.1.8
topology, translation-invariant, 31.11.3, 31.11.5
topology, trivial, 31.3.18
topology, trivial, pseudo-metric space, 37.2.5
topology, underlying, differentiable locally Cartesian space, 51.3.13
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topology, underlying, differentiable manifold, 51.3.14 

topology, weak, generated by family of maps to topological spaces, 32.4.8

topology, weaker, 31.3.23 

topology, weakness, 31.3.22 

topology classes, 31.11.11

topology for real numbers, standard, 32.5 

topology formalism, 31.3.1

topology generated by collection of subsets of set, 32.1.4 

topology generation from set collections, 32.1 

topology induced by a metric, 37.5, 37.5.2 

topology induced by continuous locally Cartesian atlas, 49.8.12

topology induced by differentiable locally Cartesian atlas, 51.5.8

topology induced by forward map, 32.13.5 

topology induced by norm, 39.3.2 

topology induced by norms on finite-dimensional linear space, equivalent, 39.5.3

topology induced on a set by a function, 32.8.4 

topology induced on set by locally Cartesian atlas, 51.5.7 

topology of pointwise convergence, 33.5.19, 35.3.27 

topology of real number intervals, 34.9 

topology on finite set, T1, 33.1.17

topology on finite set, uniform, 31.5.10 

topology on the empty set, 31.3.9

topology partitions sets into interior, exterior and boundary, 31.9.12

topology pull-back, 32.0.1 

topology separation class, Tietze's second axiom, 33.3.25 

topology separation class, Vedenisov axiom, 33.3.29 

topos (τόπος), 31.1.1, 35.3.11

torchlight shadow set properties, 45.6.5 

torsion, abstract affine connection, 71.12.4 

torsion, affine connection, 71.12 

torsion, affine connection, chart alignment, 65.9.2 

torsion, conservation of momentum, rotational slippage, 
67.1.3 

torsion, convention for index order, 71.13.1 

torsion, geometric meaning, 71.12.1 

torsion, index order convention, 71.12.2 

torsion, non-zero, affine connection example, 71.12.6 

torsion, not defined for general connections, 71.0.1 

torsion, rotational slippage, parallel transport, 67.1.2 

torsion, tensorisation coefficients, 60.4.11 

torsion, zero, Levi-Civita connection, 73.1.4 

torsion and geodesics, 72.2 

torsion field, 71.12.5

torsion for a general connection, 72.2.3 

torsion-free, Levi-Civita connection, 74.2.3 

torsion-free connection, 71.12.8, 72.5.2 

torsion-free connection, interpretation, 74.2.10 

torsion-free connection, terrestrial coordinates, 76.7.1 

tortoise, Achilles, Zeno of Elea, 31.2.3

torus differentiable manifold chart, 51.10.8 

torus topological manifold chart, 50.1.7 

torus topology, semi-open interval, 32.6.15 

torus topology distance function, semi-open interval, 37.2.12 

total differential, Banach space, 41.9.8 

total differential, several real variables, 41.6, 41.6.7 

total differential, terminology, 41.6.1

total differential theorem, 41.6.15 

total differential theorem, coordinate-free vectorial geometry, 
41.6.3 

total induced-map tangent covector space, 58.3.9 

total induced-map tangent space, 58.3.4 

total order, 11.5, 11.5.1 


total ordering of finite ordinal numbers, 12.2.6
total space, differentiable fibre bundle, 64.8.3

total space, non-topological fibre bundle, 21.8. 

total space, tangent operator, 54.12.1 

total space, tensor fibration, linear-style, 56.3. 

total space, tensor fibration, multilinear-style, 56.3.2 

total space, topological fibre bundle, 47.6.5 

total space, vector bundle, scalar product operation, 65.2.2 

total space manifold, tangent bundle, 54.5.26 

total space manifold atlas, tangent bundle, 54.5.22 

total space vector field induced by Lie algebra, 64.1 

total tangent space, horizontal component, 59.2, 5 

total tangent space, vertical drop function, 59.2 

total tangent space, vertical vector, 59.2.3 

total variation, metric-space-valued function, 38.10.3 

totally differentiable function, 41.6.4 

totally differentiable function, Banach space, 41.9.7 

totally differentiable function, continuously, 41.6.1 

totally disconnected subset of topological space, 34.3.14 

totally disconnected topological space, 34.3.14 

totally ordered family of elements of a set, 11.5.20 

totally ordered family of functions, 11.5.22 

totally ordered family of sets, 11.5.21 

totally ordered set, 11.5.1 

totally ordered set, interval notation, 11.5.13 

totally ordered set, interval notation, semi-infinite, 11.5.15 

towers, intellectual, sandy foundations, 2.0.2 

trace-back, definitions and theorems, 2.0.2, 4.2.4 

trace-back, logic theorems, 4.3.4, 4.3.17 

trace of covariant derivative map, 71.8.2 

trace of Hessian, Laplace-Beltrami operator, 74.6.4 

trace of linear map, 23.3

trace of linear map, basis-independent, 23.3.2 

trace of linear map, basis-independent, tangent space, 
71.11.12 

trace of linear map, divergence interpretation, 23.3.5 

trace of linear map, used for Ricci curvature, 71.11.10 

trace of linear map, used in definition of divergence, 71.8.3 

trace of linear space endomorphism, 23.3.3

trace of linear space endomorphism, divergence of vector 
field, 71.8.1 

trace of matrix, 25.9, 25.9.2 

trace of matrix, geometric interpretation, 25.9.5 

trace of matrix, invariant like the determinant, 25.10.14 

trace of matrix, properties, 25.9.4

track, 36.1.1 

tractrix, history of geodesics, 73.8.2 

trade-off, calculus of variations, function versus derivative, 
44.8.4 

trade-off, meaningfulness versus convenience, 28.1.19 

trade-off, truth versus beauty, 1.4.11 

trail, snail, 59.1.2

train, moving, inertial frame, 20.1.10, 20.10.2 

train with ant on turntable, 69.10.5 

trajectory, 11.5.28, 36.1.1 

tranche of stages, von Neumann universe, 12.6.5 

transcendental functions, 44.1.8 

transfer function envelope, distance, continuity, 38.2.3 

transfinite induction, 11.8, 22.7.21, 22.7.26

transfinite induction, connectedness principle, 11.8.5 

transfinite induction, literature, 11.8.6 

transfinite induction, principle, 11.8.2 

transfinite recursive cumulative hierarchy of sets, 12.6 

transformation, affine, 24.4, 26.1.3

transformation, affine, on linear space, 24.4.6 

transformation, gauge, between cross-sections for connections, 69.12.1

transformation, gauge, fibre chart transition formula, 64.13.10 

transformation, gauge, for gauge potentials, 69.11.5, 69.12.11
transformation, gauge, identity chart transition map, 66.3.3
transformation, gauge, shorthand formulas, 69.13.11
transformation, gauge, terminology, 70.8.4
transformation, gauge, vector fields on fibre bundles, 64.13.2
transformation, infinitesimal, fibre sets, 64.13.1
transformation, infinitesimal, left, 63.6
transformation, infinitesimal, right, 63.7
transformation, orthogonal, 77.2.7
transformation group, 20
transformation group, associated, 20.9.1, 20.9.17, 20.11
transformation group, differentiable, 63.43
transformation group, figure space, 20.9.13
transformation group, left, 20.1, 20.1.2
transformation group, left, effective, 20.2, 20.2.1
transformation group, left, equivariant map, 20.6.6
transformation group, left, free, 20.3, 20.3.2
transformation group, left, homomorphism, 20.6
transformation group, left, Lie, 63.4, 63.4.2
transformation group, left, topological self-acting, 36.10.12
transformation group, left, transitive, 20.4, 20.4.1
transformation group, Lie, 62.0.3, 63
transformation group, Lie, general linear, 63.4.17
transformation group, Lie left, left translation operator, 63.4.7, 63.4.8
transformation group, Lie right, right translation operator, 63.5.6, 63.5.7
transformation group, line-to-line, 26.20
transformation group, right, 20.7, 20.7.2
transformation group, right, effective, 20.7.9
transformation group, right, equivariant map, 20.8.4
transformation group, right, free, 20.7.11
transformation group, right, homomorphism, 20.8
transformation group, right, Lie, 63.5, 63.5.3
transformation group, right, non-topological principal fibre bundle, 21.11.8
transformation group, right, principal bundle, 66.2.7
transformation group, right, topological principal bundle, 47.8.14
transformation group, right, topological self-acting, 36.11.9
transformation group, right, transitive, 20.7.14
transformation group, structure-preserving, 19.2.2

transformation group, topological, left, 36.10 

transformation group, topological, right, 36.11 

transformation group ambiguity, left/right, 8.8.7, 63.7.1 

transformation group figure, 20.9, 20.9.2

transformation group homomorphism, 20.6.2, 20.8.1 

transformation group homomorphism, topological left, 36.10.16

transformation group invariant, 20.9 

transformation group of a topological space, effective left, 36.10.7

transformation group of topological space, left, 36.10.3 

transformation group of topological space, right, 36.11.2 

transformation group of topological space, topological left, 36.10.4

transformation groups, 20.1.1 

transformation rule, differential, fibre chart, 64.8.10 

transformations, general linear, Lie group, 63.4.18, 63.4.19 

transformations, linear, topological group, 39.6 

transition array, vector components, fibre/frame bundle, 22.9.12

transition component array, basis, 22.9.6 

transition component array, basis, multiplication conventions, 
22.9.10 

transition formula, chart, tangent vector frames, 55.6.19 

transition formula, chart, tangent vector tuples, 55.5.27 

transition group, reference-frame, topological fibre/frame bundle, 47.13.2
transition map, components, vector, 22.9

transition map, differentiable fibre chart, 64.3.13 

transition map, differential, trivialisation, Maurer-Cartan 
form, 62.5.3 

transition map, differential of trivialisation, 64.8.12 

transition map, fibre chart, non-topological fibre bundle, 
21.8.12 

transition map, locally Cartesian space, 49.6.4 

transition map, locally Cartesian space atlas, 49.7.7 

transition map, topological manifold, 49.6.4

transition map, topological manifold atlas, 49.7.7 

transition map, vector components, 22.9.4 

transition map conformality, 54.5.31 

transition map differential, identity chart, 66.3.4 

transition map orthogonality, 54.5.31 

transition map subscript order, fibre chart, 47.6.11 

transition matrix, chart, differentiable manifold, 51.4.18 

transition matrix, components, dual linear space, 23.9.7 

transition rule, chart, tangent vector components, 54.1.11 

transition rule, connection generator, ordinary fibre bundle, 
67.6.8 

transition rule, differential, fibre chart, 64.8.10 

transition rule, Lie-algebra-generated vector field, 
non-vertical, 64.14.5 

transition rule, Lie-algebra-generated vector field, vertical, 
64.13.11 

transition rules, covector chart, double contragredient, 55.3.10 

transition rules, manifold chart, contragredient, 54.4.12, 
55.2.16 

transitive left transformation group, 20.4, 20.4.1 

transitive relation, 9.6.15 

transitive right transformation group, 20.7.14 

transitive topological right transformation group, 36.11.6 

transitivity, equivalence relation, 9.8.2

transitivity arrow, 11.1.11, 11.2.3 

transitivity arrow, total order, 11.5.8 

transitivity of an order, 11.1.2 

transitivity rule for pathwise parallelism, 48.3.4 

translate of set, 31.11.3 

translation, invariant vector field, Lie group, 62.4.12, 62.10.10 

translation, right, of fundamental vector field on principal bundle, 66.6.10, 66.6.11

translation, syntax-directed, 3.11.7 

translation-invariant topology, 31.11.3, 31.11.5 

translation operator, left, 62.3.0

translation operator, left, of Lie left transformation group, 63.4.7, 63.4.8

translation operator, right, for vector field on principal bundle, 66.2.21

translation operator, right, of Lie right transformation group, 63.5.6, 63.5.7

translation operator for differential forms, left, 62.3.15, 
63.4.10 

translation operator for differential forms, right, 62.6.9, 63.5.9 


translation operator for tangent covectors, left, 62.3.15, 63.4.10
translation operator for tangent covectors, right, 62.6.9, 
63.5.9 


translation operator for tangent vectors, left, 62.3.9 
translation operator for tangent vectors, Lie group, right, 
62.6.3 

translation operator on Lie group, left, 62.3, 62.3.3 
translation operator on Lie group, right, 62.6, 62.6.2 
transport, Lie, 61.7.11 

transport, Lie, curve, 61.7.10 

transport, Lie, glacier flow, 61.7.1 

transport, parallel, 67.3.1, 72.2.2 

transport, parallel, horizontal lift function, 71.4
transport, parallel, of tangent space along curve, 71.4.3
transport, parallel, on curve-family, 71.4.9

transport, parallel, on the two-sphere, 71.0.3 

transport, parallel, ordinary fibre bundle, 68.5 

transport, vector field, a-priori formula, 61.7.3 

transpose map, linear space, 23.11, 23.11.2 

transpose of a matrix, 25.4.11 

transposed affine connection, 71.5.2 

transposed differential of map at a point, 58.8.2 

transposed fundamental vector fields, Lie bracket, 66.6.13 

transposed horizontal lift function, OFB, 67.8 

transposed horizontal lift function, principal bundle, 69.2, 69.2.2

transposed horizontal lift function, tangent bundle, 71.5 

transposed horizontal lift function on differentiable fibre bundle, 67.8.2

transposed infinitesimal action map, principal bundle, 66.6.2 

transposed infinitesimal action map, properties, 66.6.8 

transposed map, linear space, 23.11.3

transposition of function-valued function, 10.19.4 

transposition operation on set, 14.8.14 

traversable path, finite, 21.15.7 

traversal, 36.1.1 

traversal, membership relation network, 7.8.9 

traversal, ordered, 11.5.28, 11.5.29, 36.1.8, 36.2.2 

traversal mechanism, parametrised line, 40.2.3


tree, family, algebraic systems, 17.0.1 

tree, family, classes of sets, 18.11.3 

tree, family, lattice, 11.4.1 

tree, family, modules and algebras, 19.0.1 

tree, family, order relations, 11.1.1 

tree, family, relations and functions, 9.0.2 

tree, family, rings, 18.1.1

tree, family, set rings and algebras, 18.12.6 

tree, family, well-orderings, 11.6.3 

tree, partial derivative, for partial function, 42.2.5, 42.5.6 


tree versus tuple, specification, 47.6.3 
Trennungsaxiom (separation axiom), 33.1.2 

triangle, arithmétic, 14.9.5 

triangle inequality, 37.3.12 

triangle inequality, general metric function, 37.1.2 
triangle inequality, infinitesimal area elements, 43.9.3 
triangle inequality, infinitesimal line elements, 40.8.2 
triangle inequality, module over ring, 19.8 

triangle inequality, real-valued metric function, 37.2.3 
triangle inequality equivalents, metric space, 37.3.8 
triangle inequality equivalents, open balls, 37.3.10
Triangle inequality for modules over rings, 19.8.11 
triangle of Pascal, 14.9.5 

triangulation of fields, 22.0.3 

trichotomy, 7.11.12 

trichotomy, cardinality, 13.4.1 

trichotomy theorem, set cardinality, 13.1.20 

trig point, 7.1.5
trigonometric function, 44.2 

trigonometric function, inverse, 44.2.3 

trilinear map, 27.2.16

triple, ordered, 9.3.3, 14.6.5 

triplicity, 6.8.17 

triplique, 6.8.17 

tripliqueness, 6.8.10 

trivial absolute value function, 18.6.10 

trivial fibration, 21.2.5 

trivial fibre bundle, product-structured space, 10.15.11, 21.0.7 
trivial ring, 18.1.5 

trivial subspace, 22.1.14, 22.5.4 

trivial theorems, purpose, 26.14.4 

trivial topological space, 31.3.18
trivial topology, 31.3.18

trivial topology, pseudo-metric space, 37.2.5 

trivialisation, differential, transition map, 64.8.12 

trivialisation, differential transition map, Maurer-Cartan 
form, 62.5.3 

trivialisation, local, differentiable fibre bundle, 64.8.3 

trivialisation, local, differential transition map, 64.8.11 

trivialisation, local, fibre bundle quintessence, 10.15.11, 21.0.7 

trivialisation, local, non-topological fibration, 21.5.7 

trivialisation, local, topological fibre bundle, 47.6.5 

trivialisation, local, topological fibre/frame bundle, 47.13.4 

trivialisations, induced atlas, differentiable fibre bundle, 
64.9.2 

trivialisations, pull-back atlas, differentiable fibre bundle, 
64.9.2 

true, proposition tag, 3.2.2 

true differentiable structure, tangent bundle, 49.1.2, 53.1.1 

true dual space, 55.2.4 

true nature, tangent vector, 53.1 

true tangent bundle, 53.1.7 

true tangent bundle, Cartesian space, 53.1.8 

true zero-parameter predicate, 5.1.10 

truth, falsity, 3.1.1 

truth, sacrifice for beauty, 1.4.11 

truth, vacuous, 7.6.8 

truth domain of a truth value map, 3.5.2 

truth function, 3.7.2 

truth function, binary, 3.7 

truth function, binary, classification, 3.8 


truth function, exclusion semantics, 3.7.6 
truth not decided by majority vote, 141 
truth table, set operation properties, 8.3.6 

truth table, set operation propositions, 8.0.1 

truth table applicability, predicate calculus, 6.1.2 

truth value map, 3.2.3

truth value map, exclusion semantics, 3.7.4 

truth value map, truth domain, 3.5.2 

truth values, two possible, propositions, 3.2.6 

truth versus argumentation, 3.1.4

truth versus knowledge, 3.1.3 

tuple, ordered, 14.6.5

tuple, real number, 16.4 

tuple, second-order tangent component, 60.5.3 

tuple, specification, 8.8 

tuple, specification, redundancy, 31.3.5 

tuple, specification, tangent velocity vector, 53.1.13 
tuple, tangent covector component, 55.2.5
tuple concatenation, comma-notated, 46.8.7 

tuple concatenation operation, 16.4.3 

tuple fibration, tangent vector, 55.5.8 

tuple fibration, vector, on vector bundle, 65.7.7 

tuple of real numbers, 16.4.2 

tuple space, Cartesian, 16.4.1, 26.11.1 

tuple space, tangent vector, 55.5.4 

tuple space, vector, on vector bundle, 65.7.3 

tuple versus tree, specification, 47.6.3 

tuples, ordered, low-level representation, 9.3 

tuples, tangent vector, chart transition formula, 55.5.27 
Tura casing stones, 26.10.1 

turntable with ant on train, 69.10.5 
two-parameter arctan function, 44.2.9 
two-parameter set-theoretic function formula, 
two-point metric, 37.2.7 

two-point tensor, 58.3.11 

two-port object, 62.7.10 

two-sided ideal of a ring, 18.1.9 

two-sphere, geodesic curve, 76.7 

two-sphere isometry, 76.8
two-sphere rotation, 76.8.1

two-valued logic, 3.1.3 

two-valued maps, equi-informational with subsets, 3.5.1 

two-variable function, 10.2.32

two-variable manifold map, componentwise differentiable, 
52.6.11 

two-variable manifold map, jointly differentiable, 52.6.11 

two-way assertion symbol, 4.3.15 

two’s complement representation of negative numbers, 14.4.12 

Tychonoff, see Tikhonov, 77.1.7 

type and degree, tensor space, terminology survey, 29.5.11 

type hierarchy, cumulative, 12.6.1 

typesetting logical quantifiers, 5.2.3 

typing versus literature, 1.4.6 


U, hint-letter for neighbourhood, Umgebung, 31.4.2 

ubiquitous zero tangent vector, 58.12.6, 58.12.10 

ultrafinitist, axiom of infinity, 12.1.27 

Umgebung, hint-letter U for neighbourhood, 31.4.2 

unbounded set, 11.2.5 

unbounded versus infinite, set membership depth, 7.8.8 

unconditional assertion, 4.3.9

uncountable infinite set, 13.10.14 

uncountable ω-infinite, 13.10.14

uncut space, 22.2.22

undefined higher-order partial derivative, partial function, 42.2.7, 42.5.8

undefined k-fold derivative, partial function, 42.1.7 

undefined value, limits, 11.2.18 

underlying ontology, velocity vectors, 26.12.2 

underlying set theory, model theory, 7.1.5 

underlying set theory for logic, 3.2.5 

underlying set theory for model theory, 3.16.2 

underlying topology, differentiable locally Cartesian space, 51.3.13

underlying topology, differentiable manifold, 51.3.14 

understanding versus computation, 1.4.6 

uneven distribution, rational numbers, 15.1.17 

uni-axial scaling, 24.4.1, 26.1.3, 77.2.7 

unidirectional derivative, several real variables, 41.5 

unidirectional differentiability, real function, 40.9 

unidirectional tangent bundle, 54.16, 54.16.4

unidirectional tangent bundle atlas, 54.16.4 

unidirectional tangent bundle chart, 54.16.4 

unidirectional tangent bundle lift function, 54.16.4 

unidirectional tangent bundle on affine space over group, 26.3.16
unidirectional tangent bundle on affine space over module, 26.7.9
unidirectional tangent bundle projection map, 54.16.4
unidirectional tangent bundle total space, 54.16.4
unidirectional tangent space on affine space over group, 26.3.14
unidirectional tangent space on affine space over module over ordered ring, 26.7.7
unidirectional tangent vector, 54.16.4
unidirectionally differentiable atlas for topological manifold, 
51.11.2 
unidirectionally differentiable function, 40.9.6 
unidirectionally differentiable manifold, 51.1 
uniform continuity, metric space, 38.3
uniform continuity condition, Cauchy integrability, 43.3.20 
uniform continuity for uniform spaces, literature, 38.3.13 
uniform convergence, metric space, 38.4
uniform non-topological fibration, 21.2, 21.2.1
uniform space, completeness, 37.816
uniform spaces, uniform continuity, literature, 38.3.13
uniform substitution into symbol string, 3.10.7
uniform topology on finite set, 31.5.10

uniformities, out of scope, 1.6.3 

uniformly bounded sequence of functions, 38.4.14 

uniformly bounded set of functions, 38.4.14 

uniformly Cauchy convergent sequence of functions, 38.4.3 

uniformly continuous function, 38.3.2 

uniformly convergent sequence of functions, 38.4.2 

uniformly equicontinuous sequence of functions, 38.4.11 

uniformly Hölder continuous function, metric space, 38.7.2 

uninteresting assertions, 4.5.12 

union, countable, finite set, 13.8 

union, disjoint, 8.7.12

union, set, topology, 32.14.6 

union axiom, 7.6.15

union of a pair of sets, 7.6.19 

union of a set of sets, 7.6.16 

union of countable family of explicit real-number measure-zero 
sets, 45.3.3 

union of family of sets, 10.8.9 

union of sets, 8.1.2 

union of sets, properties, binary, 8.1 

union of sets, properties, general, 8.4 

union theorem, countable, uses CC, 13.9.9 

unions, closure under arbitrary unions, 8.6 

unique existence, 6.8.2

unique existence quantifier notation, 6.8.3, 6.8.5 

uniqueness, 6.8

uniqueness, cardinality, 6.8.7 

uniqueness, empty set, 7.6.1 

uniqueness, function, 10.2.5 

uniqueness failure, first-order ODE, 44.3.2 

uniqueness of sets defined by their comprehension, 7.6.24 

uniqueness theorem, integral curve, 57.10.6 

uniqueness theorem, ODE, 44.5.2

uniqueness theorem, ODE systems, 44.5.3 

uniqueness theorem, one-parameter subgroup, Lie group, 
62.9.7 

unit ball recovery, seminorm, Minkowski functional, 24.6.10 

unit balls, seminorm, Minkowski functionals, 24.6.7 

unit balls, seminorm, properties, 24.6.8

unit element, unitary ring, 18.2.2 

unit interval, 16.1.7 

unit matrix, 25.4.2 

unit step function, 16.5.7 

unit vector, Cartesian linear space, 22.7.9 

unital group morphism, 17.4.8, 36.9.5 

unital morphism, field, 18.7.9 

unital morphism, unitary ring, 18.2.11 

unitary left module, 22.1.2 

unitary left module over a field, 19.3.16 

unitary left module over a ring, 19.3.6 

unitary left module over a ring, submodule, 19.3.14 

unitary module, affine space over, 26.7 

unitary module over field, affine space over, 26.10.3 

unitary module over ring, affine space over, 26.7.2 

unitary ring, 18.2, 18.2.2 

unitary ring, characteristic, 18.2.13 

unitary ring, commutative, 18.2.14 

unitary ring, commutative, differentiable real functions, 
51.6.4 

unitary ring, ordered, 18.6 

unitary ring, unital morphism, 18.2.11 

unitary ring homomorphism, 18.2.8 

unitary ring-module morphisms, 19.4.11 

unity, unitary ring, 18.2.2 

universal bilinear map, 28.6.4 

universal factorisation property, multilinear functions, 27.8, 27.8.7
universal factorisation property, tensor product, 28.3, 28.3.7
universal quantifier, 5.2.1
universal quantifier, restricted, 7.2.7 

universal/existential quantifier, information content, 5.2.6 
universalisation semantics, free variables in sequents, 6.4.1 
universe, constructible, 12.5.4 

universe, constructible, literature, 12.6.8 

universe, empty, predicate calculus, 6.3.4 

universe, von Neumann, 12.6.1 

universe, von Neumann, finite stages, 12.6.2 

universe, von Neumann, rank, 12.5.4

universe, von Neumann, set membership depth, 7.8.8 
universe, von Neumann, tranche of stages, 12.6.5 
Universe is a fibre bundle, 21.0.7, 21.12.2 

univocal relation, 10.9.1 

unmentionable number, 2.3.3 

unordered pair, 7.6.12

unordered pair axiom, 7.6.10 

unordered sample, combination symbol, 14.9.1 

unordered selection, combination symbol, 14.9.1 
unoriented continuous path, 36.8.14 

unpacking components, double tangent vector, 59.3.6 
unpleasant task, tensors, 27.0.3 

unquoting logical expression names, 3.9.2 

unrestricted comprehension, 3.0.5

unrestricted linear space, 22.2, 22.2.5, 23.7.4 

unrestricted linear space, standard immersion, 22.2.7 
unrestricted linear space, vector-valued, 22.2.27 
unstructured operator domain, module over, 19.1 

upgrade, vector tuple map to tensor map, 28.3.2.
upper bound, real square matrix, 25.12
upper bound function, real square matrix, 25.12.2
upper bound of partially ordered set, 11.2.4, 11.3.2
upper Darboux sum, 43.5.6
upper modulus, real square matrix, 25.11, 25.11.2
upper well-ordering, 11.6.4
Urysohn's lemma, axiom of choice, 33.3.22

useless axiom of choice, 7.4.3 

usual atlas for Cartesian space, 49.7.12 

usual metric on real numbers, 37.2.8 


usual topology for ℝⁿ, 32.6.1




V-finite set, 13.11.1 

vacuous truth, 7.6.8 

validity, golden test, 4.5.14 

value, absolute, complex numbers, 16.8.8 

value, absolute, function, 18.5.6

value, absolute, real numbers, 16.5.2 

value, intermediate, theorem, 34.9.7 

value, pseudo-absolute, function, 18.5.2 

value of function, 10.2.8 

value theorem, mean, 40.6 

vapid philosophy, 26.12.2 

variable, bound, predicate calculus, 6.3.9 

variable, dummy, 6.3.10

variable, free, check accent, 6.3.16 

variable, free, hat accent, 6.3.16 

variable, free, predicate calculus, 6.3.9 

variable, free, selective quantification, 6.3.11, 6.7.8, 6.7.10 

variable-dimension locally Cartesian topological space, 49.4.5 

variable space, concrete, 7.5.12 

variables, free, extraneous, 6.3.12, 6.3.13 

variables, several, function, 10.2.31 

variation, bounded, functions, 38.10 

variation, bounded, metric-space-valued function, 38.10.4 

variation, locally bounded, metric-space-valued function, 
38.10.7 

variation, total, metric-space-valued function, 38.10.3
variations, calculus, geodesics, history, 73.8.2

variations, calculus, literature, 44.8.1 

variations, calculus, on Cartesian space, 44.8 

variations, calculus of, extremisation of distance, 73.8 

VB (vector bundle), 69.15.1, 70.0.1

Veblen, Oswald, 77.1.7 

Veblen, Oswald, topology, 31.1.1 

vector, action, on real-valued function, 61.1.2 

vector, chart-basis, tangent space, 54.4.9 

vector, component function, 22.8.10 

vector, component tuple, 22.8.17 

vector, cotangent, 55.2 

vector, displacement, 26.12.1 

vector, double tangent, component unpacking, 59.3.6 

vector, double tangent, vertical component coordinates 
extraction, 59.3.7 

vector, dual, 28.5.7 

vector, etymology, 26.1.10 

vector, fibre-set tangent, identification with vertical vectors, 
64.12.7 

vector, primal, 28.5.7 

vector, second-level tangent, notation, 59.1.12 

vector, second-level tangent, scaling, 59.4.1

vector, second-order tangent, 60.5.5 

vector, simple, 30.4.22 

vector, tangent, concatenation, 54.7.1 

vector, tangent, differentiable manifold, 54.1 

vector, tangent, extra-mathematical, 53.1.14 

vector, tangent, finite-dimensional linear space, 54.1.14 

vector, tangent, manifold, 54.1.2 

vector, tangent, ontology, 53.1.15 

vector, tangent, survey of definitions, 53.3.2 

vector, tangent, true nature, 53.1 

vector, tangent-line, Cartesian space, 26.13.1, 26.13.2, 26.13.3 

vector, tangent-line, differentiable manifold, 54.1 

vector, tangent velocity, 54.10.4

vector, tangent velocity, differentiable manifold, 54.10 

vector, tangent velocity, specification tuple, 53.1.13 

vector, unit, Cartesian linear space, 22.7.9 

vector action on fibration cross-section, 64.7.7 

vector action on naive vector field, Cartesian space, 46.4.2 

vector action on tensor field, 61.4.6 

vector action on vector field, Cartesian space, 46.1.8 

vector action on vector field, manifold, 61.2.3

vector addition, 22.1.1 

vector angle, Riemannian manifold, 73.2.16 

vector bundle, 65.1 

vector bundle, associated, gauge theory matter field, 21.12.9, 
47.11.2 

vector bundle, Christoffel array, 68.1.8
vector bundle, connection coefficient array, 68.1.8
vector bundle, covariant derivative, 68.2.9
vector bundle, covariant derivative, literature, 68.2.1
vector bundle, covariant derivative Leibniz rule, 68.2.16
vector bundle, differentiable, 65, 65.1.3


vector bundle, dual, 65.2.9 

vector bundle, linear connection, 68.1.4 

vector bundle, linear operations, 65.2

vector bundle, linearity of oblique drop function, 65.4.4 

vector bundle, non-topological, 24.11, 24.11.2 

vector bundle, non-topological, linear space of cross-sections, 24.11.4

vector bundle, oblique drop function, 65.4.2 

vector bundle, orbit-space associated, connection, 69.10 

vector bundle, spray, 65.5.8 

vector bundle, tangent, fibre bundle, 54.0.1 

vector bundle, tangent, total space manifold chart, 54.5.19 

vector bundle, tangent-line, 54.5.16
vector bundle, tangent-line, differentiable manifold, 54.5
vector bundle, tangent velocity, total space, 54.10.10
vector bundle, total space, scalar product operation, 65.2.2
vector bundle, vertical drop function, 65.3.5
vector bundle connection, 68, 68.1
vector bundle connection, Cartan-style, coefficient array,
vector bundle connection, curvature, 70.4
vector bundle connection, linearity, 68.1.2
vector bundle connection, moving-frame coefficient array, 68.3
vector bundle connection, oblique drop, linearity, 68.1.6

vector bundle connection form array, Cartan-style, 68.3.8 

vector bundle cross-section, covariant derivative, 68.2 

vector bundle cross-section, naive derivative, Leibniz rule, 
65.6, 65.6.2 

vector bundle cross-section product by real function, 
differentiability, 65.2.4 

vector bundle element scaling curve velocity, 65.5.2

vector bundle fibre set, standard induced basis field, 65.1.8

vector bundle literature, 65.0.2

vector bundle oblique drop function, 65.4

vector bundle vector-frame bundle, 65.8.10

vector-bundle vector frame fibration, 65.8.6 

vector-bundle vector-frame space, 65.8.2 

vector-bundle vector-frame total space, 65.8.4 

vector-bundle vector-tuple bundle, 65.7.13

vector-bundle vector tuple fibration, 65.7. 

vector-bundle vector-tuple space, 65.7.3 

vector-bundle vector-tuple total space, 65.7.5 

vector bundle vertical drop function, 65.3 

vector bundle vertical drop function, pointwise, linearity, 
65.3.9 

vector component, horizontal, ordinary fibre bundle, 64.5 

vector component, second-level tangent vector, 59.1.13 

vector component map, 22.8

vector component transition array, fibre/frame bundle, 
22.9.12 

vector component transition map, 22.9, 22.9. 

vector components, covariant, non-singular metric tensor, 
73.5.9 

vector components, tangent, chart transition rule, 54.1.11 

vector concatenation, tangent, Cartesian space, 26.15.6 

vector definition, tangent, ontological clarity, 54.1.6 

vector embedding map, tangent, differential of inclusion map, 
58.5.6 

vector embedding map, tangent, differential of manifold 
embedding, 58.5.8 

vector embedding map, tangent, differential of submanifold 
restriction, 58.5.10 

vector embedding map, tangent, manifold product slice-set, 
54.8.3 

vector embedding map, tangent, open subset, 54.6.8 

vector embedding map, tangent, product-structured manifold, 
54.8, 58.7.5 

vector embedding map, tangent, submanifold, 54.6, 54.6.2 

vector family, linearly independent, 22.6.5

vector field, 57.1 

vector field, action by curve, manifold, 61.2.7 

vector field, action by vector, manifold, 61.2.3

vector field, action of vector, Cartesian space, 46.1.8

vector field, action of vector, coordinates expression, 61.2.10, 
61.2.11 

vector field, action of vector, point/velocity/chart version, 
61.2.12 

vector field, action on real-valued function, 61.1, 61.1.3 

vector field, action on tensor field, 61.4.7

vector field, action on vector field, 

vector field, Cartesian space, 46.1, 46.1.3

vector field, Cartesian space, differentiable, 46.1.5

vector field, chart-basis, 57.1.18 

vector field, constant direct product, differentiability, 57.2.16 

vector field, constant vertical, differentiability, 64.2.5 

vector field, differentiable naive, Cartesian space, 46.3.4 

vector field, exterior derivative, 46.8 

vector field, fibre-set, horizontal, 64.5.13 

vector field, fibre-set, horizontal submanifold, 64.5.14 

vector field, fundamental, Lie left transformation group, 
63.6.6 

vector field, fundamental, on Lie groups, 62.4.14 

vector field, fundamental, principal bundle, 66.6.1 

vector field, fundamental vertical, 66.6 

vector field, fundamental vertical, principal bundle, 66.6.2 

vector field, induced map, 58.9.16 

vector field, integral curve, 57.10.2 

vector field, integral curve, differentiable manifold, 57.10 

vector field, invariant, translation, Lie group, 62.4.12, 
62.10.10 

vector field, left invariant, generated from matrices, 62.4.17 

vector field, Lie derivative, 61.8 

vector field, Lie group, left invariant, 62.4.2, 63.4.14 

vector field, Lie group, right invariant, 62.7, 63. 

vector field, lift by horizontal lift function, 67.5.9, 69.1.16 

vector field, lift by transposed horizontal lift function, 67.8.6, 
69.2.7 

vector field, lifted chart-basis, on principal bundle, 69.14.3 

vector field, linear space, 57.2.8 


vector field, local, lift by horizontal lift function, 67.5.10,
69.1.16
vector field, naive, Cartesian space, 46.3, 46.3.3 
vector field, naive, Lie bracket, 46.47
vector field, naive, non-vectorial directional derivative, 46.4.4 
vector field, naive derivative, 61.2, 61.4
vector field, naive derivative, Leibniz rule, 61.3 
vector field, non-vertical Lie-algebra-generated, transition 
rule, 64.14.5 
vector field, partial derivative, first-order, 57.9.15 


vector field, partial derivative, second-order, 59.9.3 

vector field, Picard iteration method, 44.6.18 

vector field, right invariant, 62.7.2

vector field, tangent, 57.1.2, 57.1.9

vector field, tangent, second-order, 59.8.2

vector field, vertical, differentiability, 64.2.6

vector field, vertical Lie-algebra-generated, transition rule, 
64.13.11 

vector field, zero, 57.2.13 

vector field, zero, differentiability, 57.2.14

vector field along curve, 57.8, 57.8.2

vector field along curve, continuous, 57.8.3

vector field along curve, covariant derivative, 71.7

vector field along curve, differentiable, 57.8.5

vector field along curve, velocity, 57.9 

vector field calculus, 61

vector field calculus, Cartesian space, 46

vector field calculus formalism, Koszul, 57.0.2

vector field differentiability, 57.2

vector field differentiability, given differentiable velocity, 
57.2.21 

vector field directional derivative, Cartesian space, 46.1.8 

vector field directional derivative, naive, Cartesian space,
46.4.2

vector field divergence, 71.8, 71.8.3 

vector field divergence, trace of linear space endomorphism, 
71.8.1 

vector field extension, constant, 57.1.20 

vector field extension, constant, properties, 57.1.23

vector field family, holonomic, 46.5.3

vector field family, holonomic, manifold, 61.5.19

vector field generated by family of diffeomorphisms, 63.2.3 
vector field generated by Lie algebra, non-vertical, 64.14
vector field generated by Lie algebra, vertical, 64.13 
vector field generated by vector, Lie group, left invariant,
62.4.15

vector field generated by vector, Lie group, right invariant, 
62.7.7 

vector field generated on manifold by Lie algebra, 63.6.5, 
63.6.28, 63.7.3 


vector field generated quadrilateral curve family, 46.5.6 
vector field induced on fibre set by Lie algebra, non-vertical, 
64.14.2 


vector field induced on total space by Lie algebra, 64.13.3 

vector field of tangent bundle, naive derivative, Leibniz rule,
61.3.3

vector field on a manifold, 57 

vector field on differentiable fibre bundle, 64.13 

vector field on Lie group, left invariant, 62.4 

vector field on principal bundle, fundamental, right 
translation, 66.6.10, 66.6.11 

vector field on principal bundle, right translation operator, 
66.2.21 

vector field product by real function, differentiability, 57.2.11 

vector field transport, a-priori formula, 61.7.3 

vector-field tuple extension, constant, 57.4.9 

vector fields, holonomic, Koszul formalism, 57.0.2 

vector fields, holonomic, Riemann curvature, 70.3.2 

vector fields, Lie bracket, Cartesian space, 46.1.11 

vector fields, map-related, Lie bracket, 61.6 

vector fields, zero, integral curves are constant, 57.10.8 

vector fields on a Lie group, left invariant, Lie algebra, 62.4.10 

vector fields on a manifold, Lie algebra, 61.5.16 

vector fields on Lie group, left invariant, linear space, 62.4.8 

vector fields related by a map, 61.6.2 

vector fields versus vector-valued functions, 57.1.1 

vector-frame, on vector bundle, total space, 65.8.4 

vector frame, tangent, bundle, 55.6.22 

vector frame, tangent, fibre atlas, 55.6.21 

vector frame, tangent, total space, 55.6.5 

vector frame bundle, manifold chart map, 55.6.25 

vector frame bundle, tangent, 55.6, 55.6.31 

vector frame bundle, tangent, manifold chart, 55.6.24 

vector frame bundle, tangent, total space, 55.6.30 

vector frame bundle, tangent, total space manifold atlas, 
55.6.28 

vector-frame bundle of a vector bundle, 65.8.10 

vector-frame bundle on a vector bundle, 65.8 

vector frame coordinate tuple, tangent, 55.6.14 

vector-frame fibration, on vector bundle, 65.8.6 

vector frame fibration, tangent, 55.6.8 

vector frame field, tangent, 57.11.2 

vector-frame space, on vector bundle, 65.8.2 

vector frame space, tangent, 55.6.3 

vector frames, tangent, chart transition formula, 55.6.19 

vector integrator differential, Stieltjes integral, 43.12.1 

vector length, Riemannian manifold, 73.2.12 

vector map, constant-scale tangent, differential, 59.4.10 

vector principal frame, tangent, fibre atlas, 55.7.7 

vector principal frame bundle, tangent, 55.7.8 

vector product, 17.1.4 

vector scaling curve, tangent, differential, 59.4.7 

vector set, tangent-line, Cartesian space, 26.13.5 

vector space, 22.0.2

vector space, affine space over a group, 26.2.2 

vector space, affine space over module over group, 26.6.2 

vector space, affine space over module over ring, 26.6.6 

vector space, affine space over module over set, 26.4.10
vector space, cotangent, differentiable manifold, 55.2.1
vector space, tangent-line, differentiable manifold, 54.4, 54.4.4 
vector space, tangent velocity, differentiable manifold, 54.10.7 
vector space, topological, terminology, 39.1 
vector-tuple, chart basis, tangent space, 55.5.18 
vector-tuple, on vector bundle, total space, 65.7.5 
vector-tuple, tangent, bundle, 55.5, 55.5.30
vector-tuple, tangent, total space, 55.5.6 
vector-tuple bundle, associated connection, 59.7.1, 71.3,
71.3.2, 74.2.1
vector-tuple bundle, differential of real-valued function, 59.7.3 
vector-tuple bundle, tangent, 55.5.37 
vector-tuple bundle, tangent, fibre atlas, 55.5.28 
vector-tuple bundle, tangent, manifold chart, 55.5.31 
vector-tuple bundle, tangent, total space, 55.5.36 
vector-tuple bundle, tangent, total space manifold atlas,
55.5.34
vector-tuple bundle of a vector bundle, 65.7.13 
vector-tuple bundle on a vector bundle, 65.7 
vector-tuple coordinate tuple, tangent, 55.5.13 
vector-tuple fibration, on vector bundle, 65.7. 
vector tuple fibration, tangent, 55.5.8 



vector-tuple field, 57.4 

vector-tuple field, chart-basis, 57.4.2 

vector-tuple field extension, constant, 57.4.5 

vector-tuple fields, locally constant, properties, 57.4.7 

vector-tuple space, on vector bundle, 65.7.3 

vector tuple space, tangent, 55.5.4 

vector tuples, tangent, chart transition formula, 55.5.27 

vector-valued 1-form, exterior derivative, 61.10.5 

vector-valued antisymmetric multilinear function fibration, 
56.7.6 

vector-valued antisymmetric multilinear map, 56.7.7 

vector-valued antisymmetric multilinear map, fibre chart, 
56.7.10 

vector-valued antisymmetric multilinear map bundle, 56.7.18 

vector-valued antisymmetric multilinear map bundle, 
non-topological, 56.7.14 

vector-valued antisymmetric multilinear map fibration, fibre 
chart, 56.7.11 

vector-valued C* function on Cartesian space, 42.5.31 

vector-valued continuously partially differentiable function, 
41.8.9 

vector-valued covector, pull-back differential, 58.8.7 

vector-valued differential form, exterior derivative, 61.10.9 

vector-valued differential form, short-cut, equi-informational, 
57.7.19 

vector-valued differential form on a manifold, 57.6.5 

vector-valued differential form on a manifold, short-cut, 
57.7.3 

vector-valued form differential, pull-back, 58.11.7 

vector-valued function, action of tangent operator, 54.14.4 

vector-valued function, Cartesian space, differentiation, 41.8 

vector-valued function, Darboux integrable, 43.9.2

vector-valued function, Darboux integral, 43.9.2

vector-valued function, differentiable, 40.7.2

vector-valued function, differentiable, on manifold, 51.7

vector-valued function, differentiation, 40.7

vector-valued function, directional derivative, 41.8.14

vector-valued function, partial derivative, 41.8.4

vector-valued function, tangent operator action, 54.14

vector-valued function on linear space, differentiation, 41.9

vector-valued function on linear space, directional derivative, 
41.9.4 

vector-valued function on linear space, directionally 
differentiable, 41.9.2 

vector-valued function on linear space, total differential,

vector-valued function on linear space, totally differentiable,
41.9.7

vector-valued function on manifold, constant, differentiable, 
51.7.8 

vector-valued function on manifold, constant, zero derivative, 
54.14.5 


vector-valued function on manifold, differentiable, 51.7.6 

vector-valued function on manifold, false derivative, 54.9.2 

vector-valued functions versus vector fields, 57.1.1

vector-valued integrand, Cauchy-Riemann-Darboux integral, 
43.9 

vector-valued integrator curve for Stieltjes integral, 43.12.2 

vector-valued local differential form, short-cut, 57.7.21

vector-valued map, constant, zero partial derivatives, 41.8.12 

vector-valued multilinear function fibration, 56.7.5 

vector-valued multilinear function on manifold, 56.7.1 

vector-valued multilinear map, components, 56.7.3 

vector-valued multilinear map bundle, 56.7 

vector-valued partially differentiable function, 41.8.2 

vector-valued pull-back differential, 58.8.6 

vector-valued short-cut differential form space, 57.7.15 

vector velocity, tangent-line, Cartesian space, 26.13.14 

vectorial geometry, coordinate-free, total differential theorem, 
41.6.3 

vectoriality of Lie bracket, Cartesian space, 46.4.10 

vectors, orthogonal, module over ring, 19.7.7 

vectors, tangent, are functions, 53.1.13, 54.2.1 

vectors, vector, map charts to lines, 53.1.13, 54.2.1 

Vedenisov, Nikolai Borisovich, 33.3.20

Vedenisov axiom, topology separation class, 33.3.29 

vegetarian cook, 1.4.14 

vegetarianism, Eskimo, axiom of choice, 7.12.7 

vel, 3.6.3

velocity, 40.1 

velocity, native, 53.1.7 

velocity, native, pull-back, 40.1.1 

velocity, quintessence, 53.1.2 

velocity, scaling curve, vector bundle element, 65.5.2 

velocity, tangent-line vector, Cartesian space, 26.13.14 

velocity bundle, tangent, 26.16.9

velocity bundle, tangent, affine space, 26.8 

velocity bundle, tangent, Cartesian space, 26.16 

velocity bundle, tangent, on affine space over module, 26.8.4 

velocity chart map for covector fibration, 55.4.6 

velocity chart map for tangent fibration, 54.5.6, 54.5.9 


velocity chart map for tangent vector-frame fibration,
principal, 55.7.5

velocity chart map for tangent vector-tuple fibration, 55.5.24 
velocity chart on covector fibration, 55.4.6 

velocity chart on tangent fibration, 54.5.6, 54.5.9 

velocity chart on tangent vector-frame fibration, 55.6.17 
velocity chart on tangent vector-frame fibration, principal, 
55.7.5 

velocity chart on tangent vector-tuple fibration, 55.5.24 
velocity field, tangent vector scaling curve, verticality, 59.4.3 
velocity field family of diffeomorphism family, 63.2.7
velocity function, first-order ODE right-hand side, 44.3.3 
velocity of scaling curve in linear-space manifold, 57.9.11 
velocity set, tangent, 26.16.5 

velocity space, tangent, 26.16.7 

velocity space, tangent, affine space, 26.8 

velocity space, tangent, on affine space over module, 26.8.2 
velocity space, tangent vectors, 53.1.2
velocity vector, tangent, 26.16.2, 26.16.3, 26.16.4, 54.10.4 
velocity vector, tangent, differentiable manifold, 54.10 
velocity vector, tangent, specification tuple, 53.1.13 

velocity vector bundle, tangent, total space, 54.10.10
velocity vector field of constant curve, zero, 57.9.8

velocity vector field of curve, 57.9, 57.9.2 

velocity vector space, tangent, differentiable manifold, 54.10.7 
velocity vectors, underlying ontology, 26.12.2 

Venn diagram, set operation properties, 8.3.6 

verboten, class of all functions, 42.1.12

Verdünnung (thinning), sequent, 5.3.8 


Vertauschung (exchange), sequent, 5.3.8 

vertical component, check-accent, 59.1.7 

vertical component, common-domain product map 
differential, 58.7.8 

vertical component, cross-section of tangent bundle, 57.1.3 

vertical component, fibre-chart-dependent, 68.5.1 

vertical component, horizontal lift, 67.5.7 

vertical component, horizontal lift, Christoffel array, 67.7.10 

vertical component, horizontal lift, Lie algebra, 67.6.2 

vertical component, horizontal lift function, 69.2.5 

vertical component, linearity of connections, 68.1.2 

vertical component, oblique drop function, 59.3.1

vertical component, product-structured manifold, 58.7.20 

vertical component, product-structured manifold vectors, 
58.7.10 

vertical component, second-level tangent vector, 59.1.10, 
59.2.4 

vertical component, second-level tangent vector, chart 
transition, 59.1.14 

vertical component, second-order derivative of curve, 59.8.7 

vertical component, submanifold tangent vectors, 58.7.14 

vertical component, swap function, 59.6.2 

vertical component, transposed horizontal lift, 67.8.1 

vertical component, vector field naive derivative, 61.2.1 

vertical component, vertical drop function, 59.2.13 

vertical component coordinates extraction, double tangent 
vector, 59.3.7 

vertical component map, basic properties, 67.10.3 

vertical component map, common-domain product-map, 
58.7.16 

vertical component map, connection, 67.10.1 

vertical component map, connection on fibre bundle, 67.10 

vertical component map, formula for covariant derivative, 
68.2.13 

vertical component map, ordinary fibre bundle, 67.10.2 

vertical component map, principal bundle, 69.4 

vertical component map, principal bundle, Minkowski space, 
69.4.5 

vertical component map, principal fibre bundle, 69.4.2 

vertical component map, relation to connection form, 69.5.2 

vertical component map, transposed horizontal lift formula, 
67.11.4 

vertical component map differential, inverse, embedded, 
58.7.19 

vertical differentiable submanifold, product-structured 
manifold, 52.7.8 

vertical drop function, chart-independence, tangent bundle, 
59.2.11 

vertical drop function, chart-independence, vector bundle, 
65.3.13 

vertical drop function, double tangent space, 59.2.9, 59.2.15 

vertical drop function, linear space tangent bundle, 54.9,
54.9.5

vertical drop function, pointwise, linearity, tangent bundle, 
59.2.10 

vertical drop function, pointwise, linearity, vector bundle, 
65.3.9 

vertical drop function, total tangent space, 59.2 

vertical drop function, vector bundle, 65.3, 65.3.5 

vertical Lie-algebra-generated vector field transition rule,
64.13.11

vertical lift, 71.1.3

vertical lift by horizontal lift function, 67.4.3 

vertical subset, product-structured set, 10.15.12 

vertical subspace, second-level tangent space, 64.5.7 

vertical tangent vector on total space of fibration, 64.5.4 

vertical topological submanifold, product-structured 
manifold, 50.5.10 

vertical topological subspace, product-structured space, 
32.11.6 

vertical vector, identification with fibre-set tangent vector, 
64.12.7 

vertical vector field, constant, differentiability, 64.2.5 

vertical vector field, differentiability, 64.2.6

vertical vector field, fundamental, 66.6 

vertical vector field, fundamental, principal bundle, 66.6.2 

vertical vector field generated by Lie algebra, 64.13 

vertical vector of total tangent space, 59.2.3 

verticality, drop function, 59.2.8 

verticality, tangent vector scaling curve velocity field, 59.4.3 

VI-finite set, 13.11.1 

victory, pyrrhic, partial differentiability, 41.1.8 

Viète, François, 77.1.4

viewpoint bundle, baseless figure/frame bundle, 20.10.8 

viewpoint bundle, topological fibre/frame bundle, 47.13.2 

VII-finite set, 13.11.1 

virtual memory, computer, 7.10.2 

visual art, 2.4.4 

visual cortex, 22.0.3 

visual cortex, early, 40.2.1 

Vitali, Giuseppe, 77.1.7 

vocal-grooming hominids, 3.0.2 

Voigt, Woldemar, 77.1.7 

Voigt, Woldemar, relativity, 77.3.2 

voltage, logical, 3.2.9

von Neumann, John (János), 12.1.1, 12.6.1, 77.1.7 

von Neumann universe, 12.6.1

von Neumann universe, finite stages, 12.6.2 

von Neumann universe, rank, 12.5.4 

von Neumann universe, set membership depth, 7.8.8 

von Neumann universe, stages, 12.6.4 

von Neumann universe, tranche of stages, 12.6.5 

von Staudt, Karl Georg Christian, 77.1.6 

vote, majority, truth not decided by, 1.4.15 


walk in park, 8.0.1 

Wallis, John, 77.1.5 

wand, magic, axiom of choice, 2.1.5 

wand, magic, axiom of countable choice, 13.10.1 

waste of ink, 1.5.6, 24.7.9 

water line, 11.8.5

weak force, gauge theory, 70.8.1 

weak initial segment, ordinal number, 12.5.27 

weak initial segment, well-ordered set, 11.7.10 

weak reflexivity of an order, 11.1.2 

weak regularity, 1.4.10 

weak topology generated by family of maps to topological 
spaces, 32.4.8 

weaker topology, 31.3.23 

weakly connected function, 35.2.5 

weakly ordered selection, 14.10.2 

weakly separated pair of sets, 33.2.2 

wedge product, 30.4.11 

weeding out axioms of choice, 1.6.2 

Weierstraß, Karl Theodor Wilhelm, 77.1.6

Weierstraß, Karl Theodor Wilhelm, epsilon-delta continuity,
38.1.2

Weierstraß property, topological space, 35.7.4

well-defined function, 10.9.1 

well-formed formula, 4.1.6
well-orderable set, cardinality, 13.1.24

well ordered by set-inclusion, 11.6.17 
well-ordered set, 11.6.1 

well-ordered set, lower section, 11.7.4 
well-ordered set, proper lower section, 11.7.4 


well-ordered set, weak initial segment, 11.7.10 

well-ordered set comparability theorem, 11.7. 

well ordered sets, comparability, 11.7 

well-ordered sets, counting theorem, 13.2.4 

well-ordered sets, enumeration theorem, 13.2.4 

well-ordered sets, representation theorem, 13.2.4 

well-ordering, 11.6, 11.6.1 

well-ordering, induced by bijection, 11.6.8 

well-ordering, lower, 11.6.4 

well-ordering, restriction, 11.6.6 

well-ordering, upper, 11.6.4 

well-ordering of finite ordinal numbers, 12.2.7, 12.2.8 

well-ordering theorem, 7.12.7, 22.7.21 

well-ordering theorem, proof, literature, 11.6.23 

well-orderings, family tree, 11.6.3 

Weyl, Hermann Klaus Hugo, 77.1.7 

Weyl, Hermann Klaus Hugo, affine connection, 59.1.13,
67.1.1, 71.0.1

Weyl, Hermann Klaus Hugo, axioms for real numbers, 22.1.6

Weyl, Hermann Klaus Hugo, Christoffel array, 67.1.4 

Weyl, Hermann Klaus Hugo, gauge theory, 70.8.2 

Weyl, Hermann Klaus Hugo, manifold definition, 49.4.8 

Weyl, Hermann Klaus Hugo, three-layer differential geometry 
model, 49.2.10 

wff (well-formed formula), 4.1.6 

wff-wff, atomic, 4.5.15 

wheat versus chaff, 77.4.15 

wheel of fortune, 1.8 

wheel of knowledge, 2.0.1 

white cliffs of Dover, bluebirds, 77.4.15 

white swan, 5.2.7 

Whitehead, Alfred North, 2.1.4, 77.1.7 

Whitney sum, tangent vector bundles, 55.5.1 

whole hog, 4.8.8, 53.3.9

Widman, Johann, 17.1.19 

width, rectangular matrix, 25.2.2 

William Tell notation, 10.19.8 

wimps, understanding mathematics, 1.4.8 

wood block, timber store, 9.4.4

world, crazy, hill of beans, 77.4.15 

wormholes, pathwise parallelism, 48.2.2 

worms, can of, 8.0.1 

wrapped finite mathematical induction principle, 12.2.24 

wrapped successor function, 12.2.22


x 


XOR (exclusive OR), 3.7.8, 3.7.10 
XOR notation, 3.7.16
XOR operator, 3.7.14 

XOR versus inclusive OR, 3.8.2 


Yang, Chen-Ning, 77.1.8 

Yang Hui, 14.9.6

Yang-Mills theory, history of gauge theory, 70.8.2 

yardstick, cardinality, aleph numbers, 13.2.6 

yardstick, cardinality, beta-cardinality, 12.6.4, 13.4.2 

yardstick, cardinality, beth number, 13.4.12 

yardstick, cardinality, countable choice axiom, 14.2.8 

yardstick, cardinality, finite ordinal numbers, 13.5.1, 14.2.1 

yardstick, cardinality, infinite ordinal numbers useless, 12.6.7, 
13.1.21 

yardstick, cardinality, mediate cardinals, 13.10.14 

yardstick, cardinality, ω, 13.7.5, 13.11.5

yardstick, cardinality, ordinal numbers versus beta-sets,
12.5.4
yardstick, cardinality, rational numbers, 15.2.1
yardstick, cardinality, von Neumann universe, 12.6.3 


yardstick, cardinality, Zermelo integer representation, 12.1.15 


yaw, 76.8.4 
Young, John Wesley, 77.1.7 


Z (Zermelo set theory), 7.7.3 

Zauberlehrling, Goethe, recursive gap-filling, 1.4.1 

Zeno of Elea, 77.1.2

Zeno of Elea, Achilles and tortoise, 31.2.3 

Zeno of Elea, flying arrow, 53.1.12 

Zermelo, Ernst Friedrich Ferdinand, 77.1.7 

Zermelo, Ernst Friedrich Ferdinand, axiom of choice, 7.12.7 

Zermelo, Ernst Friedrich Ferdinand, axiom of specification, 
7.7.3 

Zermelo, Ernst Friedrich Ferdinand, cumulative hierarchy, 
12.6.1 

Zermelo, Ernst Friedrich Ferdinand, ordinal numbers, 7.9.5, 
12.1.15 

Zermelo-Fraenkel set theory, concrete proposition domain, 
3.2.9 

Zermelo-Fraenkel set theory axioms, 7.2, 7.2.4 

Zermelo-Fraenkel set theory with axiom of choice, 7.11.10 

Zermelo-Fraenkel set theory with axiom of countable choice, 
13.7.21 

Zermelo-Fraenkel specification theorem, 7.7.2 

Zermelo set theory, 7.7.3, 7.7.4

Zermelo set theory model, 12.6.3 

Zermelo's theorem (well-ordering), 11.6.22 

zero, not a natural number, 14.1.2 

zero cross-section, 57.2.13 

zero derivative, constant function, 40.6.7 

zero derivative, constant real function, 40.5.5 

zero derivative, constant real-valued function on manifold,
54.11.6

zero derivative, constant vector-valued function on manifold, 
54.14.5 

zero differential, constant real-valued function on manifold, 
54.11.18 


zero-dimensional locally Cartesian space, 49.4.9 
zero divisor, 18.7.8
zero divisor, ring, 18.1.14 

zero matrix, 25.2.8 

zero measure, Lebesgue, 45
zero-parameter predicate, logical, 5.1.10

zero partial derivative, constant real function, 41.1.16 

zero partial derivatives, constant Cartesian space map, 
41.2.23 

zero partial derivatives, constant vector-valued map, 41.8.12 

zero partial derivatives, real-valued function at min/max, 
51.6.12 

zero ring, 18.1.5 

zero tangent operator ambiguity, 53.3.3, 54.12.4, 54.12.10, 
54.15.1 

zero tangent vector, ubiquitous, 58.12.6, 58.12.10 

zero-thickness boundary, 31.2.9

zero vector, tangent, 54.1.17 

zero vector field, 57.2.13 

zero vector field, differentiability, 57.2.14 

zero vector fields, integral curves are constant, 57.10.8 

zero velocity vector field, constant curve, 57.9.8 

zero-width boundary, 31.2.2 

zeroth power, all sets have the same, 14.6.4 

zeroth power of any set is the empty set, 14.6.3 

ZF (Zermelo-Fraenkel), 7.1.10 

ZF axiom, anointment, 7.4.2, 12.1.27, 13.8.15 

ZF axiom, productive, TAD 

ZF axiom style, 7.9.2 

ZF extension axiom, 7.5 

ZF infinity axiom, 7.9

ZF regularity axiom, 7.8 

ZF replacement axiom, 7.7 

ZF set existence axiom, 7.4.2, 7.4.3 

ZF set theory, four basic axioms, 7. 


ZF set theory, redundant axiom, 7.4.2 
ZF set theory 8-line summary, 7.2.8, 7.3.9 
ZF set theory axioms, 7.2 

ZF set theory construction stage, 7.8.10 

ZF specification axiom, 7.7.3

ZF with axiom of choice, ZF+AC, 7.11.10 

ZF with axiom of countable choice, ZF+CC, 13.7.21 
ZF+AC set theory, ZF with axiom of choice, 7.11.1 
ZF+CC set theory, ZF with countable choice axiom, 13.7.21 
ZFC (Zermelo-Fraenkel axioms plus axiom of choice), 7.11.9 
Zhu Shijié, 14.9.6 

Zorn’s lemma, 2.4.6, 32.7.1 

Zusammenhang (connection), 49.2.10 

Zusammenziehung (contraction), sequent, 5.3.8

pages     2530    definitions  1549    theorems    1829
chapters    80    notations     578    proofs      1817
sections   759    examples      191    diagrams     525
remarks   4571    tables         60    references   503


Things to do = 114. Theorems to prove = 12.